This repository implements a simple yet effective lossless compression algorithm for sequences of unsigned 8-bit integers and provides tools to evaluate its performance.
The core idea is to compress the differences between consecutive values using variable-length codes, with special handling for zero runs and large differences.
A complementary decoder reconstructs the original sequence, and an analysis module evaluates performance.
- The first value in the list is written directly as 8 bits.
- Each subsequent value is replaced by its difference (
diff = current - previous).
| Prefix | Meaning | Details |
|---|---|---|
00 |
Small difference | 2–5 bits follow, depending on the range: • 2 bits → differences in [-2, -1] ∪ [1, 2] • 3 bits → [-6, -3] ∪ [3, 6] • 4 bits → [-14, -7] ∪ [7, 14] • 5 bits → [-30, -15] ∪ [15, 30] |
01 |
Repetition | 3 bits follow → count of consecutive zeroes (1–8 repetitions) |
10 |
Absolute encoding | Used when |
11 |
Terminator | Marks the end of the sequence |
All bits are concatenated into a bitarray and converted to bytes for storage or transmission.
The decoder reads the bit stream, reverses each token type, reconstructs differences,
and rebuilds the original data by cumulative summation.
- ✅ Lossless compression/decompression
- 📦 Variable-length bit encoding for efficiency
- 🔁 Repetition detection for identical values
- ⚙️ Automatic absolute fallback for large differences
- 📊 Integrated analysis module to measure:
- Compression ratio
- Compression time
- Decompression time
- 🖼️ Graph and table generation for performance visualization
Compression-of-sequential-values/
│
├── compression.py # Main compression logic
├── decompression.py # Reverse process
├── coding_tables.py # Precomputed forward/reverse tables
├── logging_configuration.py # Custom colored logger
├── main.py # Entry point and demonstration
│
├── Analysis/
│ ├── performance_analysis.py # Performance metrics
│ ├── run_analysis.py # Automated batch testing
│ └── README.md
│
├── Data/
│ ├── compressed.txt
│ ├── decompressed.txt
│ └── README.md
│
└── .gitignore
- Sequence length
N ∈ {5, 50, 500, 5000} - Maximum adjacent difference
M ∈ {5, 10, 15, 30} - Random values in
[0, 255]
| Metric | Description |
|---|---|
| Compression ratio | Original size / Compressed size |
| Compression time | Time taken to encode |
| Decompression time | Time taken to decode |
Observed Trends:
- Smaller
M→ better compression (up to ≈1.3 ratio for long sequences) - Larger
M→ ratio approaches 1.0 or below (possible expansion) - Time scales roughly linearly with
N
- Compression Ratio vs. M: Highest efficiency at small M
- Compression Time vs. M: Minor variation; first run slower due to setup
- Decompression Time vs. M: Flat trend after warm-up
- Python ≥ 3.10
- Required packages: bitarray, pandas, matplotlib, python-docx, colorlog
Results (graphs and tables) are saved to Analysis/Results/.
Note: Logging slows down compression and decompression time by a large margin.