Skip to content

8Altair/Compression-of-sequential-values

Repository files navigation

Compression of Sequential Values

This repository implements a simple yet effective lossless compression algorithm for sequences of unsigned 8-bit integers and provides tools to evaluate its performance.
The core idea is to compress the differences between consecutive values using variable-length codes, with special handling for zero runs and large differences.
A complementary decoder reconstructs the original sequence, and an analysis module evaluates performance.


🧩 Algorithm Overview

1. Differential Encoding

  • The first value in the list is written directly as 8 bits.
  • Each subsequent value is replaced by its difference (diff = current - previous).

2. Token Types

Prefix Meaning Details
00 Small difference 2–5 bits follow, depending on the range:
• 2 bits → differences in [-2, -1] ∪ [1, 2]
• 3 bits → [-6, -3] ∪ [3, 6]
• 4 bits → [-14, -7] ∪ [7, 14]
• 5 bits → [-30, -15] ∪ [15, 30]
01 Repetition 3 bits follow → count of consecutive zeroes (1–8 repetitions)
10 Absolute encoding Used when
11 Terminator Marks the end of the sequence

3. Output

All bits are concatenated into a bitarray and converted to bytes for storage or transmission.

4. Decompression

The decoder reads the bit stream, reverses each token type, reconstructs differences,
and rebuilds the original data by cumulative summation.


🧮 Features

  • Lossless compression/decompression
  • 📦 Variable-length bit encoding for efficiency
  • 🔁 Repetition detection for identical values
  • ⚙️ Automatic absolute fallback for large differences
  • 📊 Integrated analysis module to measure:
    • Compression ratio
    • Compression time
    • Decompression time
  • 🖼️ Graph and table generation for performance visualization

🧰 Repository Structure

Compression-of-sequential-values/
│
├── compression.py            # Main compression logic
├── decompression.py          # Reverse process
├── coding_tables.py          # Precomputed forward/reverse tables
├── logging_configuration.py  # Custom colored logger
├── main.py                   # Entry point and demonstration
│
├── Analysis/
│   ├── performance_analysis.py  # Performance metrics
│   ├── run_analysis.py          # Automated batch testing
│   └── README.md
│
├── Data/
│   ├── compressed.txt
│   ├── decompressed.txt
│   └── README.md
│
└── .gitignore

🧪 Testing Setup

Test Parameters

  • Sequence length N ∈ {5, 50, 500, 5000}
  • Maximum adjacent difference M ∈ {5, 10, 15, 30}
  • Random values in [0, 255]

Recorded Metrics

Metric Description
Compression ratio Original size / Compressed size
Compression time Time taken to encode
Decompression time Time taken to decode

Observed Trends:

  • Smaller M → better compression (up to ≈1.3 ratio for long sequences)
  • Larger M → ratio approaches 1.0 or below (possible expansion)
  • Time scales roughly linearly with N

📈 Example Outputs

  • Compression Ratio vs. M: Highest efficiency at small M
  • Compression Time vs. M: Minor variation; first run slower due to setup
  • Decompression Time vs. M: Flat trend after warm-up

⚙️ Requirements

  • Python ≥ 3.10
  • Required packages: bitarray, pandas, matplotlib, python-docx, colorlog

Results (graphs and tables) are saved to Analysis/Results/.

Note: Logging slows down compression and decompression time by a large margin.

About

This project implements a lossless differential compression algorithm for sequences of 8‑bit integers.

Topics

Resources

License

Stars

Watchers

Forks

Languages