Official implementation of YouroQNet, a toyish quantum text classifier implemented with pyVQNet and pyQPanda
This repo contains the code for the final problem of OriginQ's 2nd CCF "Pilot Cup" contest (Professional Group - Quantum Machine Learning Track).
Oh yes yes child, we had a hard time struggling through it.
The final total score is 79.2, ranking unknown; but why the fuck do you still owe me that 0.8 point?? 🐱
And the code repo for the qualifying stage is here: 第二届“司南杯”初赛 (the 2nd "Pilot Cup" Preliminary Round)
⚪ install
- `conda create -n q python==3.8` (pyvqnet requires Python 3.8)
- `conda activate q`
- `pip install -r requirements.txt`
⚪ for contest problem (👈 Follow this to reproduce our contest results!!)
- `python answer.py` for preprocessing & training (⚠ VERY VERY SLOW!!)
- `python check.py` for evaluation
⚪ for a quick peek at YouroQNet components
- `python vis_tokenizer.py` for the adaptive k-gram tokenizer interactive demo
- `python vis_youroqnet.py` for the YouroQNet interactive demo
- `run_quantum_toy.cmd` (👈 run the toy version out of the box before anything else)
⚪ for full development
- download the full dataset simplifyweibo_4_moods and unzip `simplifyweibo_4_moods.csv` to the `data` folder
- `pip install -r requirements_dev.txt` for extra dependencies
  - fasttext==0.9.2 requires numpy<1.24 (things might have changed)
- `pushd repo & init_repos.cmd & popd` for extra git repos
- `start_shell.cmd` to enter the development command env
- `start_shell.cmd py` to get an ipython console for quickly consulting pyvqnet's fucking undocumented documentation with `help()`
- `mk_preprocess.cmd` for making the cleaned datasets, stats, plots, vocabs, etc. (~7 minutes)
- `python vis_project.py` to see the 3D data projection (you will understand what the fuck this dataset is 👿)
- `run_baseline.cmd` to run the classical models
- `run_quantum.cmd` to run the quantum models
⚠ Training might occasionally fail due to an ill random parameter initialization; if the trainset loss does not tend to decay, or the model quickly overfits, just kill it & retry 😅
⚪ core idea & contributions
- adaptive k-gram tokenizer (see mk_vocab.py, interactive demo vis_tokenizer.py; a toy sketch follows below)
- YouroQNet for text clf (see run_quantum.py, interactive demo vis_youroqnet.py)
- theoretical analysis of why & how QNN works (see vis_qc_apriori.py)
ℹ See our PPT YouroQNet.pdf for more conceptual understanding 🎉
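For intuition, here is a minimal, hypothetical sketch of the adaptive k-gram idea: count character k-grams, keep the frequent ones as the vocabulary, then tokenize greedily by longest match. The real logic and its selection criteria live in mk_vocab.py; the helper names below are made up for illustration.

```python
from collections import Counter
from typing import List, Set

def build_vocab(texts: List[str], k_max: int = 3, min_freq: int = 2) -> Set[str]:
  # count all character k-grams (k = 1..k_max); keep frequent ones, always keep single chars
  cnt = Counter()
  for t in texts:
    for k in range(1, k_max + 1):
      for i in range(len(t) - k + 1):
        cnt[t[i:i+k]] += 1
  return {g for g, c in cnt.items() if c >= min_freq or len(g) == 1}

def tokenize(text: str, vocab: Set[str], k_max: int = 3) -> List[str]:
  # greedy longest-match against the vocab; unknown chars fall back to 1-grams
  tokens, i = [], 0
  while i < len(text):
    for k in range(min(k_max, len(text) - i), 0, -1):
      if text[i:i+k] in vocab or k == 1:
        tokens.append(text[i:i+k])
        i += k
        break
  return tokens

corpus = ['今天天气真好', '天气不好我很难过']
vocab  = build_vocab(corpus)
print(tokenize('今天天气不好', vocab))    # e.g. ['今', '天', '天气', '不', '好']
```

Run `python vis_tokenizer.py` to play with the actual tokenizer interactively.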
A subset of simplifyweibo_4_moods: 1600 samples for train, 400 samples for test. Class label names: 0 - joy, 1 - angry, 2 - hate, 3 - sad; however, these names do not correspond very well to the actual semantics in the dataset :(
⚠ File naming rule: train.csv is the train set, test.csv is the valid set, and the generated valid.csv might be the real test set of this contest. We use the csv filenames to refer to each split in the code.
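For a quick peek at a split, the snippet below assumes the csv keeps the public simplifyweibo_4_moods layout of a `label` column plus a text column (named `review` in the public dump); verify the actual column names first.

```python
import pandas as pd

df = pd.read_csv('data/train.csv')      # the train split per the naming rule above
print(df.columns.tolist())              # check the real column names before relying on them
print(df['label'].value_counts())       # expected labels: 0 joy / 1 angry / 2 hate / 3 sad
print(df.head(3))
```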
- data exploration
  - guess the target test set (valid.txt)
  - vocab & freq stats
  - pca & cluster
  - data relabel (?)
- data filtering
  - punctuation sanitizing
  - stop words removal
  - too short / too long sentences
- feature extraction
  - tf-idf (syntactical)
  - fasttext embedding (semantical; see the sketch right after this group)
  - adaptive tokenizer
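A rough illustration of the fasttext-embedding route using the official fasttext package: average subword-aware word vectors into a 300-d sentence feature. The file path and the newline handling are assumptions; the real pipeline lives in the mk_*.py / run_baseline_*.py scripts.

```python
import numpy as np
import fasttext

ft = fasttext.load_model('data/cc.zh.300.bin')    # pretrained Chinese vectors (path assumed)

def sentence_vec(text: str) -> np.ndarray:
  # get_sentence_vector() averages subword-aware word vectors; it rejects newlines
  return ft.get_sentence_vector(text.replace('\n', ' '))

print(sentence_vec('今天天气真好').shape)          # (300,)
```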
- baseline models
  - sklearn (a hypothetical baseline sketch follows right after this group)
  - vqnet-classical
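A hypothetical sklearn baseline in the spirit of the item above (not necessarily what run_baseline_*.py does): char n-gram tf-idf plus logistic regression over the 4 mood classes. The column names `text` / `label` and the csv paths are assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = pd.read_csv('data/train.csv')     # column names below are assumptions
test  = pd.read_csv('data/test.csv')

clf = make_pipeline(
  TfidfVectorizer(analyzer='char', ngram_range=(1, 3)),   # char n-grams suit unsegmented Chinese
  LogisticRegression(max_iter=1000),
)
clf.fit(train['text'], train['label'])
print('test acc:', clf.score(test['text'], test['label']))
```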
- quantum models
  - quantum embedding (a minimal circuit sketch follows right after this list)
  - model routing on different lengths
  - multi to binary clf
  - contrastive learning
  - learn the difference
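To make the quantum embedding item concrete, here is a minimal pyQPanda sketch of the generic recipe: angle-encode features with RY rotations, apply one entangling variational layer, and read class scores from measurement probabilities. This is not the actual YouroQNet ansatz (see run_quantum.py and the PPT for that); the function and variable names are made up for illustration.

```python
import numpy as np
import pyqpanda as pq

def embed_and_measure(feats, params):
  n = len(feats)
  qvm = pq.CPUQVM()
  qvm.init_qvm()
  qubits = qvm.qAlloc_many(n)
  prog = pq.QProg()
  for i, a in enumerate(feats):                 # angle encoding: one RY per feature
    prog << pq.RY(qubits[i], float(a))
  for i, p in enumerate(params):                # one trainable rotation layer
    prog << pq.RY(qubits[i], float(p))
  for i in range(n):                            # ring of CNOTs for entanglement
    prog << pq.CNOT(qubits[i], qubits[(i + 1) % n])
  probs = qvm.prob_run_list(prog, [qubits[0]], -1)   # P(|0>), P(|1>) on qubit 0 as 2-class scores
  qvm.finalize()
  return probs

feats = np.random.rand(4) * np.pi               # stand-in for tokenized/embedded features
theta = np.random.rand(4) * np.pi               # stand-in for trainable parameters
print(embed_and_measure(feats, theta))
```

In the real model the rotation parameters are optimized by a pyVQNet training loop; this sketch only shows a forward pass of the circuit.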
```
# materials
ref/                          # theses for development
  Question-ML.png             # problem sheet
  YouroQNet.pdf               # solution PPT (YouroQNet)
  init_thesis.cmd             # thesis downloader
repo/                         # git repos for research
  init_repos.cmd              # git repo cloner
  update_repos.cmd
data/                         # dataset
  simplifyweibo_4_moods.csv   # raw dataset (manual download)
  train|test.csv              # contest dataset
  *_cleaned.csv
  *_tokenized.txt
  cc.zh.300.bin               # FastText pretrained word embedding (auto-downloaded)
log/                          # outputs
  <analyzer>/                 # aka. vocab
    <feature>/                # sklearn models
    <model>/                  # vqnet/torch models
tmp/                          # generated intermediate results for debugging
# contest related
answer.py                     # run script for preprocessing & training
check.py                      # run script for evaluation
# preprocessors
mk_*.py
mk_preprocess.cmd             # run script for mk_*.py
# models
run_baseline_*.py             # classical experiments
run_baseline.cmd              # run script for run_baseline_*.py
run_quantum.py                # quantum experiments
run_quantum.cmd               # run script for run_quantum.py
run_quantum_toy.cmd           # toy QNN for debug and verification
# misc
vis_*.py                      # interactive demos or debug scaffolds
utils.py                      # common utils
start_shell.cmd               # development env entry
# doc & lic
README.md
TECH.md                       # technical & theoretical stuff
requirements_*.txt
LICENSE
```

ℹ For the contest, only these files were submitted: answer.py, mk_vocab.py, run_quantum.py, utils.py, README.md; they should be enough to run all the quantum parts 😀
- FastText:
- Enriching Word Vectors with Subword Information: https://arxiv.org/abs/1607.04606
- Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759
- repo: https://github.com/facebookresearch/fastText
- QNN for text-clf:
- QNLP-DisCoCat: https://arxiv.org/abs/2102.12846
- QSANN: https://arxiv.org/abs/2205.05625
- OriginQ: https://originqc.com.cn/index.html
- QCNN related:
- tensorflow-quantum impl: https://www.tensorflow.org/quantum/tutorials/qcnn
- pytorch + qiskit impl: https://github.com/YPadawan/qiskit-hackathon
- pytorch + pennylane impl: https://github.com/christorange/QC-CNN
- Tiny-Q: https://github.com/Kahsolt/Tiny-Q
=> find the theses of related works via ref/init_thesis.cmd
=> find the implementations of related works via repo/init_repos.cmd
If you find this work useful, please give a star ⭐ and cite~ 😃
@misc{kahsolt2023,
  author       = {Kahsolt},
  title        = {YouroQNet: Quantum Text Classification with Context Memory},
  howpublished = {\url{https://github.com/Kahsolt/YouroQNet}},
  month        = {May},
  year         = {2023}
}
by Armit 2023/05/03
