arxiv-classifier

How to Train

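# Build the training input from the metadata list (format described below),
# train the classifier, then save the trained model.
nb = ArticleClassifier(dbpath='/path/to/db')
fn = nb.create_input_file(metadata)
nb.train(fn)
nb.save()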

How to Classify

# Load the previously saved model, then classify the articles in metadata.
nb = ArticleClassifier(dbpath='/path/to/db')
nb.load()
fn = nb.create_input_file(metadata)
classes = nb.classify(fn)

In these examples, metadata is a list of dicts, each in the format shown below. The filename paths refer to files on the local file system of the machine running the classifier:

{
  "id": "1704.00222",
  "categories": ["cs.NM", "hep-th"],
  "filename": "/path/to/file.txt"
}
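
As a minimal sketch, the metadata list can be assembled and sanity-checked before it is passed to create_input_file. The helper name validate_metadata and the checks it performs are illustrative assumptions; they are not part of arxiv-classifier.

```python
from typing import Any, Dict, List

# Keys taken from the example entry above.
REQUIRED_KEYS = {"id", "categories", "filename"}


def validate_metadata(metadata: List[Dict[str, Any]]) -> None:
    """Raise ValueError if an entry does not match the format shown above.

    Hypothetical helper, not part of arxiv-classifier.
    """
    for index, entry in enumerate(metadata):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
        if not isinstance(entry["id"], str):
            raise ValueError(f"entry {index}: 'id' must be a string")
        if not isinstance(entry["filename"], str):
            raise ValueError(f"entry {index}: 'filename' must be a string")
        categories = entry["categories"]
        if not isinstance(categories, list) or not all(
            isinstance(category, str) for category in categories
        ):
            raise ValueError(f"entry {index}: 'categories' must be a list of strings")


metadata = [
    {
        "id": "1704.00222",
        "categories": ["cs.NM", "hep-th"],
        "filename": "/path/to/file.txt",
    },
]

# Passes silently when every entry has the expected shape.
validate_metadata(metadata)
```

Checking the shape up front gives a clear error for a malformed entry instead of a failure later inside training or classification.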

Trained Models

Trained models for use by arXiv staff are available at s3://arxiv-classifier-models

ULMFiT classifier

Training

See the experiments directory for training and evaluation notebooks.

Models

The ULMFiT and SentencePiece model files can be downloaded here. Make sure the CLASSIFIER_PATH configuration parameter points to models/abstracts-classifier.pkl and that CLASSIFIER_TYPE is set to ulmfit.
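
A quick check of these two settings might look like the sketch below. It assumes the parameters are supplied as environment variables of the same names, which this README does not confirm; check_classifier_config is a hypothetical helper.

```python
import os
from typing import List, Mapping


def check_classifier_config(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the two settings named above.

    Hypothetical helper: assumes the settings live in an environment-like mapping.
    """
    problems = []
    path = env.get("CLASSIFIER_PATH", "")
    if not path.endswith("models/abstracts-classifier.pkl"):
        problems.append(
            "CLASSIFIER_PATH should point to models/abstracts-classifier.pkl, "
            f"got {path!r}"
        )
    classifier_type = env.get("CLASSIFIER_TYPE")
    if classifier_type != "ulmfit":
        problems.append(f"CLASSIFIER_TYPE should be 'ulmfit', got {classifier_type!r}")
    return problems


if __name__ == "__main__":
    # Print one line per problem; prints nothing when both settings look right.
    for problem in check_classifier_config(os.environ):
        print(problem)
```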

Testing

To test the service locally, run it with:

FLASK_APP=classifier.test_app flask run --port 9999

and make a request:

curl -s -H "Content-Type: application/json" -X POST http://localhost:9999/classify \
    --data '{"title":"P = NP", "abstract": "We prove that P = NP for N = 1 or P = 0.", "primary": "cs.SE"}'

The response is a JSON list of categories with their probabilities:

[{"category":"cs.CC","probability":0.8264293074607849},{"category":"cs.DS","probability":0.1285623162984848},...]

The primary field is optional.
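
The same request can be made from Python. The sketch below uses only the standard library and assumes the test service is listening on port 9999 as started above; the function names build_payload, top_category, and classify are illustrative and are not part of the service.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_payload(title: str, abstract: str, primary: Optional[str] = None) -> bytes:
    """Encode the request body; primary is left out when not given."""
    body = {"title": title, "abstract": abstract}
    if primary is not None:
        body["primary"] = primary
    return json.dumps(body).encode("utf-8")


def top_category(predictions: List[Dict]) -> str:
    """Return the category with the highest probability in a response list."""
    return max(predictions, key=lambda item: item["probability"])["category"]


def classify(
    title: str,
    abstract: str,
    primary: Optional[str] = None,
    url: str = "http://localhost:9999/classify",
) -> List[Dict]:
    """POST an article to the locally running service and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(title, abstract, primary),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `top_category(classify("P = NP", "We prove that P = NP for N = 1 or P = 0."))` would return the most probable category from the response.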

Neither the input nor the output format is yet compatible with the Naive Bayes classifier.
