Extremely fast time series downsampling 📈 for visualization, written in Rust.
- Fast: written in Rust with PyO3 bindings
  - leverages optimized argminmax, which is SIMD accelerated with runtime feature detection
  - scales linearly with the number of data points
  - multithreaded with Rayon (in Rust)

  Why we do not use Python multiprocessing

  Citing the PyO3 docs on parallelism:
  > CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.

  In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
 
- Efficient: memory efficient
  - works on views of the data (no copies)
  - no intermediate data structures are created
 
- Flexible: works on any type of data
  - supported datatypes are
    - for `x`: `f32`, `f64`, `i16`, `i32`, `i64`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`
    - for `y`: `f16`, `f32`, `f64`, `i8`, `i16`, `i32`, `i64`, `u8`, `u16`, `u32`, `u64`, `datetime64`, `timedelta64`, `bool`

  !! 🚀 `f16` argminmax is 200-300x faster than NumPy

  In contrast with all other data types above, `f16` is *not* hardware supported (i.e., no instructions for `f16`) by most modern CPUs!!
  🐌 Programming languages facilitate support for this datatype by either (i) upcasting to `f32` or (ii) using a software implementation.
  💡 Because argminmax needs only comparisons, and thus no arithmetic operations, creating a symmetrical ordinal mapping from `f16` to `i16` is sufficient. This mapping makes it possible to use the hardware supported scalar and SIMD `i16` instructions while not producing any memory overhead 🎉
  More details are described in argminmax PR #1.
- Easy to use: simple & flexible API
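
The `f16` to `i16` ordinal mapping mentioned in the feature list can be illustrated in NumPy. This is a minimal sketch of one way to build such a mapping, assuming IEEE 754 half-precision inputs without NaNs; the function name `f16_to_i16_ordinal` is hypothetical, and the code is not taken from argminmax itself.

```python
import numpy as np

def f16_to_i16_ordinal(values: np.ndarray) -> np.ndarray:
    """Map float16 values to int16 keys that sort in the same order (assumes no NaNs)."""
    bits = values.view(np.int16)  # reinterpret the raw bits, no copy
    # The arithmetic shift yields -1 (all ones) for negative floats and 0 otherwise;
    # masking with 0x7FFF keeps the sign bit out of the XOR below.
    mask = (bits >> 15) & np.int16(0x7FFF)
    # Negative floats get their magnitude bits flipped, so "more negative"
    # becomes "smaller int16"; non-negative floats are left untouched.
    return bits ^ mask

y = np.array([1.5, -2.0, 0.25, 3.25, -0.5, 100.0, -100.0], dtype=np.float16)
keys = f16_to_i16_ordinal(y)

# argmin/argmax on the integer keys use plain integer comparisons.
i_min, i_max = int(np.argmin(keys)), int(np.argmax(keys))
```

Applying the same function to the keys returns the original bit patterns, which is the sense in which the mapping is symmetrical. Unlike this NumPy sketch, which allocates a new key array, a compiled implementation could apply the mapping on the fly while scanning.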

Install with pip:

    pip install tsdownsample

Usage:

    from tsdownsample import MinMaxLTTBDownsampler
    import numpy as np

    # Create a time series
    y = np.random.randn(10_000_000)
    x = np.arange(len(y))

    # Downsample to 1000 points (assuming constant sampling rate)
    s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

    # Select downsampled data
    downsampled_y = y[s_ds]

    # Downsample to 1000 points using the (possibly irregularly spaced) x-data
    s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

    # Select downsampled data
    downsampled_x = x[s_ds]
    downsampled_y = y[s_ds]

Each downsampling algorithm is implemented as a class that implements a `downsample` method.
The signature of the `downsample` method:

    downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

Arguments:
- `x` is optional
- `x` and `y` are both positional arguments
- `n_out` is a mandatory keyword argument that defines the number of output values*
- `**kwargs` are optional keyword arguments (see table below):
  - `parallel`: whether to use multi-threading (default: `False`)
    ❗ The max number of threads can be configured with the `TSDOWNSAMPLE_MAX_THREADS` ENV var (e.g. `os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4"`)
  - ...
 
Returns: an `ndarray[uint64]` of indices that can be used to index the original data.
*When there are gaps in the time series, fewer than n_out indices may be returned.
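
To see why gaps can lead to fewer than `n_out` indices, here is a minimal NumPy sketch of min-max selection over equal-width `x`-bins. It is an illustration under assumptions, not the library's Rust implementation: the function name `minmax_indices` and the exact binning (`n_out // 2` bins spanning the `x`-range) are hypothetical.

```python
import numpy as np

def minmax_indices(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Pick the argmin and argmax of y in each of n_out // 2 equal-width x-bins."""
    n_bins = n_out // 2
    # Bin edges are spaced evenly over the x-range, not over the sample count.
    edges = np.linspace(x[0], x[-1], n_bins + 1)
    # x is sorted, so a binary search finds the first sample of every bin.
    starts = np.searchsorted(x, edges[:-1], side="left")
    stops = np.append(starts[1:], len(x))
    picked = []
    for start, stop in zip(starts, stops):
        if start == stop:  # empty bin (a gap in x): contributes no indices
            continue
        window = y[start:stop]
        i_min = start + int(window.argmin())
        i_max = start + int(window.argmax())
        picked.extend(sorted((i_min, i_max)))
    return np.array(picked, dtype=np.uint64)

# 200 samples with a large gap between x = 99 and x = 900
x = np.concatenate([np.arange(0, 100), np.arange(900, 1000)])
y = np.sin(x / 7.0)
idx = minmax_indices(x, y, n_out=20)
```

With this data only the first and last of the ten bins contain samples, so `idx` holds 4 indices rather than the requested 20.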
The following downsampling algorithms (classes) are implemented:
| Downsampler | Description | **kwargs | 
|---|---|---|
| MinMaxDownsampler | selects the min and max value in each bin | parallel | 
| M4Downsampler | selects the min, max, first and last value in each bin | parallel | 
| LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | parallel | 
| MinMaxLTTBDownsampler | (new two-step algorithm 🎉) first selects `n_out * minmax_ratio` min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio* | 
*The default value for minmax_ratio is 4, which was found empirically to be a good default. More details here: https://arxiv.org/abs/2305.00332
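
For readers unfamiliar with the Largest Triangle Three Buckets step named in the table, the following is a minimal NumPy sketch of the classic algorithm. It assumes `3 <= n_out`, buckets of equal sample count, and a hypothetical function name `lttb_indices`; the library's own implementation may bucket and parallelize differently.

```python
import numpy as np

def lttb_indices(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Classic LTTB: return n_out indices into (x, y); assumes n_out >= 3."""
    n = len(y)
    if n_out >= n:
        return np.arange(n, dtype=np.uint64)
    # The first and last samples are always kept; the samples in between
    # are split into n_out - 2 buckets of (nearly) equal sample count.
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)
    out = np.empty(n_out, dtype=np.uint64)
    out[0], out[-1] = 0, n - 1
    a = 0  # index of the previously selected point
    for i in range(n_out - 2):
        lo, hi = edges[i], edges[i + 1]
        # Third triangle corner: mean of the next bucket, or the last sample.
        if i + 2 < len(edges):
            nxt = slice(edges[i + 1], edges[i + 2])
            cx, cy = x[nxt].mean(), y[nxt].mean()
        else:
            cx, cy = x[-1], y[-1]
        # Twice the triangle area for every candidate in the current bucket.
        area = np.abs((x[a] - cx) * (y[lo:hi] - y[a]) - (x[a] - x[lo:hi]) * (cy - y[a]))
        a = lo + int(area.argmax())
        out[i + 1] = a
    return out

x = np.arange(1000, dtype=np.float64)
y = np.sin(x / 20.0)
y[500] = 10.0  # a single spike that a visual downsampler should keep
idx = lttb_indices(x, y, n_out=50)
```

Because the spike forms a much larger triangle than its neighbours, index 500 ends up among the selected points.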
This library supports two NaN policies:
- Omit NaNs (NaNs are ignored during downsampling).
- Return the index of the first NaN once at least one NaN is present in the bin of the considered data.
| Omit NaNs | Return index of first NaN | 
|---|---|
| MinMaxDownsampler | NaNMinMaxDownsampler | 
| M4Downsampler | NaNM4Downsampler | 
| MinMaxLTTBDownsampler | NaNMinMaxLTTBDownsampler | 
| LTTBDownsampler | | 
Note that NaNs are not supported for `x`-data.
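
The difference between the two policies can be sketched for a single bin with plain NumPy. The helper `bin_extrema` and its `nan_policy` argument are hypothetical names for illustration; in particular, returning the first NaN index for both the min and the max slot is an assumption, not documented library behavior.

```python
import numpy as np

def bin_extrema(y_bin: np.ndarray, nan_policy: str = "omit") -> tuple:
    """Return (i_min, i_max) within one bin under the given NaN policy."""
    nan_pos = np.flatnonzero(np.isnan(y_bin))
    if nan_policy == "return" and nan_pos.size:
        # At least one NaN in the bin: report the index of the first one.
        first = int(nan_pos[0])
        return first, first
    # "omit": np.nanargmin / np.nanargmax skip NaNs (they raise on an all-NaN bin).
    return int(np.nanargmin(y_bin)), int(np.nanargmax(y_bin))

y_bin = np.array([3.0, np.nan, -1.0, 7.0, np.nan])
omitted = bin_extrema(y_bin, nan_policy="omit")
returned = bin_extrema(y_bin, nan_policy="return")
```

Here `omitted` points at the finite extrema, while `returned` points at the first NaN so that the missing data stays visible in the downsampled output.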
Assumes:
- `x`-data is (non-strictly) monotonic increasing (i.e., sorted)
- no NaNs in `x`-data
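
Both assumptions can be checked before downsampling. The helper below is a hypothetical convenience function written for this example, not part of the tsdownsample API.

```python
import numpy as np

def x_is_valid(x) -> bool:
    """True if x has no NaNs and is sorted in non-decreasing order."""
    x = np.asarray(x)
    # NaN checks only apply to floating-point data.
    if np.issubdtype(x.dtype, np.floating) and np.isnan(x).any():
        return False
    # Non-strict monotonicity: equal neighbours are allowed.
    return bool(np.all(x[1:] >= x[:-1]))

ok = x_is_valid(np.array([0, 1, 1, 2]))
unsorted = x_is_valid(np.array([0, 2, 1]))
has_nan = x_is_valid(np.array([0.0, np.nan, 1.0]))
```

If the check fails because `x` is unsorted, sorting `x` and reordering `y` with the same permutation (for example via `np.argsort`) restores the first assumption.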
👤 Jeroen Van Der Donckt