This repository contains the models and training scripts used in the papers: "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021).
- python 3.7
- python packages: argparse, uuid, time, itertools
- NumPy
- PyTorch and torchaudio
- Matplotlib
python KWS_LSTM.py
| Argument | Parameter Name | Description |
|---|---|---|
| -h, --help | show help message and exit | |
| --random-seed | RANDOM_SEED | Random Seed (default: 80085) |
| --method | METHOD | Method: 0 - blocks, 1 - orthogonality, 2 - mix (default: 1) |
| --dataset-path-train | DATASET_PATH_TRAIN | Path to Dataset (default: data.nosync/speech_commands_v0.02) |
| --dataset-path-test | DATASET_PATH_TEST | Path to Dataset (default: data.nosync/speech_commands_test_set_v0.02) |
| --word-list | WORD_LIST [WORD_LIST ...] | Keywords to be learned (default: ['yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go', 'unknown', 'silence']) |
| --batch-size | BATCH_SIZE | Batch Size (default: 100) |
| --training-steps | TRAINING_STEPS | Training Steps (default: 10000,10000,200) |
| --learning-rate | LEARNING_RATE | Learning Rate (default: 0.002,0.0005,0.00008) |
| --finetuning-epochs | FINETUNING_EPOCHS | Number of epochs for finetuning (default: 10000) |
| --dataloader-num-workers | DATALOADER_NUM_WORKERS | Number Workers Dataloader (default: 8) |
| --validation-percentage | VALIDATION_PERCENTAGE | Validation Set Percentage (default: 10) |
| --testing-percentage | TESTING_PERCENTAGE | Testing Set Percentage (default: 10) |
| --sample-rate | SAMPLE_RATE | Audio Sample Rate (default: 16000) |
| --canonical-testing | CANONICAL_TESTING | Whether to use the canoncial test data (0 non canoncial, 1 canoncial. (default: 0) |
| --background-volume | BACKGROUND_VOLUME | How loud the background noise should be, between 0 and 1. (default: 0.1) |
| --background-frequency | BACKGROUND_FREQUENCY | How many of the training samples have background noise mixed in. (default: 0.8) |
| --silence-percentage | SILENCE_PERCENTAGE | How much of the training data should be silence. (default: 0.1) |
| --unknown-percentage | UNKNOWN_PERCENTAGE | How much of the training data should be unknown words. (default: 0.1) |
| --time-shift-ms | TIME_SHIFT_MS | Range to randomly shift the training audio by in time. (default: 100.0) |
| --win-length | WIN_LENGTH | Window size in ms (default: 641) |
| --hop-length | HOP_LENGTH | Length of hop between STFT windows (default: 320) |
| --hidden | HIDDEN | Number of hidden LSTM units (default: 108) |
| --n-mfcc | N_MFCC | Number of mfc coefficients to retain (default: 40) |
| --noise-injectionT | NOISE_INJECTIONT | Percentage of noise injected to weights (default: 0.0) |
| --noise-injectionI | NOISE_INJECTIONI | Percentage of noise injected to weights (default: 0.1) |
| --quant-actMVM | QUANT_ACTMVM | Bits available for MVM activations/state (default: 6) |
| --quant-actNM | QUANT_ACTNM | Bits available for non-MVM activations/state (default: 8) |
| --quant-inp | QUANT_INP | Bits available for inputs (default: 4) |
| --quant-w | QUANT_W | Bits available for weights (default: None) |
| --l2 | L2 | Strength of L2 norm (default: 0.01) |
| --n-msb | N_MSB | Number of blocks available (default: 4) |
| --cs | CS | Strength cosine similarity penalization (default: 0.1) |
| --max-w | MAX_W | Maximumg weight (default: 0.2) |
| --drop-p | DROP_P | Dropconnect probability (default: 0.125) |
| --pact-a | PACT_A | Whether scaling parameter is trainable (1:on,0:off) (default: 1) |
| --rows-bias | ROWS_BIAS | How many rows for the bias (default: 6) |
| --gain-blocks | GAIN_BLOCKS | Fox mixed method, how many parallel blocks (default: 2) |
