Compression of Sequential Values

This repository implements a simple yet effective lossless compression algorithm for sequences of unsigned 8-bit integers and provides tools to evaluate its performance.
The core idea is to compress the differences between consecutive values using variable-length codes, with special handling for zero runs and large differences.
A complementary decoder reconstructs the original sequence, and an analysis module evaluates performance.

🧩 Algorithm Overview

1. Differential Encoding

The first value in the list is written directly as 8 bits.
Each subsequent value is replaced by its difference (diff = current - previous).

2. Token Types

Prefix	Meaning	Details
`00`	Small difference	2–5 bits follow, depending on the range: • 2 bits → differences in [-2, -1] ∪ [1, 2] • 3 bits → [-6, -3] ∪ [3, 6] • 4 bits → [-14, -7] ∪ [7, 14] • 5 bits → [-30, -15] ∪ [15, 30]
`01`	Repetition	3 bits follow → count of consecutive zeroes (1–8 repetitions)
`10`	Absolute encoding	Used when
`11`	Terminator	Marks the end of the sequence

3. Output

All bits are concatenated into a bitarray and converted to bytes for storage or transmission.

4. Decompression

The decoder reads the bit stream, reverses each token type, reconstructs differences,
and rebuilds the original data by cumulative summation.

🧮 Features

✅ Lossless compression/decompression
📦 Variable-length bit encoding for efficiency
🔁 Repetition detection for identical values
⚙️ Automatic absolute fallback for large differences
📊 Integrated analysis module to measure:
- Compression ratio
- Compression time
- Decompression time
🖼️ Graph and table generation for performance visualization

🧰 Repository Structure

Compression-of-sequential-values/
│
├── compression.py            # Main compression logic
├── decompression.py          # Reverse process
├── coding_tables.py          # Precomputed forward/reverse tables
├── logging_configuration.py  # Custom colored logger
├── main.py                   # Entry point and demonstration
│
├── Analysis/
│   ├── performance_analysis.py  # Performance metrics
│   ├── run_analysis.py          # Automated batch testing
│   └── README.md
│
├── Data/
│   ├── compressed.txt
│   ├── decompressed.txt
│   └── README.md
│
└── .gitignore

🧪 Testing Setup

Test Parameters

Sequence length N ∈ {5, 50, 500, 5000}
Maximum adjacent difference M ∈ {5, 10, 15, 30}
Random values in [0, 255]

Recorded Metrics

Metric	Description
Compression ratio	Original size / Compressed size
Compression time	Time taken to encode
Decompression time	Time taken to decode

Observed Trends:

Smaller M → better compression (up to ≈1.3 ratio for long sequences)
Larger M → ratio approaches 1.0 or below (possible expansion)
Time scales roughly linearly with N

📈 Example Outputs

Compression Ratio vs. M: Highest efficiency at small M
Compression Time vs. M: Minor variation; first run slower due to setup
Decompression Time vs. M: Flat trend after warm-up

⚙️ Requirements

Python ≥ 3.10
Required packages: bitarray, pandas, matplotlib, python-docx, colorlog

Results (graphs and tables) are saved to Analysis/Results/.

Note: Logging slows down compression and decompression time by a large margin.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Compression of Sequential Values

🧩 Algorithm Overview

1. Differential Encoding

2. Token Types

3. Output

4. Decompression

🧮 Features

🧰 Repository Structure

🧪 Testing Setup

Test Parameters

Recorded Metrics

📈 Example Outputs

⚙️ Requirements

About

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
Analysis		Analysis
Data		Data
Test		Test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
coding_tables.py		coding_tables.py
compression.py		compression.py
decompression.py		decompression.py
logging_configuration.py		logging_configuration.py
main.py		main.py

License

8Altair/Compression-of-sequential-values

Folders and files

Latest commit

History

Repository files navigation

Compression of Sequential Values

🧩 Algorithm Overview

1. Differential Encoding

2. Token Types

3. Output

4. Decompression

🧮 Features

🧰 Repository Structure

🧪 Testing Setup

Test Parameters

Recorded Metrics

📈 Example Outputs

⚙️ Requirements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages