Training Process
Raj Nath Patel edited this page Jun 9, 2017
| Switch | Description |
|---|---|
| --use_model | Specifies which RNN variant to use for training. The currently available models are GRU and LSTM. Other variants, such as Deep-LSTM and Simple RNN, are planned for the future. |
| --load_data | Enable this switch when you have already prepared the data for training. |
| --data_train | Specifies the training data. For quality estimation (QE), this could be the source, target, and alignment files as distributed in the WMT shared task. For tagging, it could be the source text only. |
| --data_train_y | Specifies the training labels. For the QE task, these should be the word-level quality tags. For POS tagging, these should be the word-level part-of-speech tags. |
| --data_test | Same as --data_train, but for the test set. |
| --data_test_y | Same as --data_train_y, but for the test set. |
| --data_valid | Same as --data_train, but for the validation set. |
| --data_valid_y | Same as --data_train_y, but for the validation set. |
| --dictionaries | Specifies the word dictionaries. Each must be a JSON file that maps every word to its integer id. For the bilingual model of the QE task, both source and target dictionaries must be provided. |
| --label2indx | Specifies the system output labels. It is similar to the --dictionaries switch, except that it maps the output tags. |
| --use_char | Enables the models to use character-level features of words. |
| --character2index | Character-level dictionary, similar to the word dictionaries. Required when the --use_char switch is used. |
| --pretrain | Enable this switch if you have pre-trained word embeddings. |
| --embeddings | Specifies the pre-trained word embeddings. Currently, the system accepts only word2vec models saved in text format. |
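
The dictionary switches above (--dictionaries, --label2indx, --character2index) all expect a JSON mapping from an item to an integer id. The following is a minimal sketch of how such files might be produced from tokenized text. The helper `build_index`, the reserved `<unk>` entry, the 0-based ids, and the output file names are assumptions made for illustration; the page does not state which reserved tokens or id ranges the system expects, so adjust them to match your setup.

```python
import json
import os
import tempfile
from collections import Counter


def build_index(items, reserved=()):
    """Map each distinct item to an integer id.

    Reserved items get the lowest ids; the rest are ordered by
    descending frequency, with ties broken alphabetically.
    """
    counts = Counter(items)
    index = {tok: i for i, tok in enumerate(reserved)}
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if tok not in index:
            index[tok] = len(index)
    return index


def save_index(index, path):
    """Write the mapping as a JSON object of the form {item: id}."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh, ensure_ascii=False, indent=2)


# Toy data: whitespace-tokenized sentences and their word-level tags.
sentences = ["the cat sat", "the dog sat down"]
tags = ["OK OK BAD", "OK BAD OK OK"]

words = [w for s in sentences for w in s.split()]
labels = [t for s in tags for t in s.split()]
chars = [c for w in words for c in w]

# "<unk>" is a hypothetical placeholder for out-of-vocabulary items.
word2index = build_index(words, reserved=("<unk>",))
label2index = build_index(labels)
char2index = build_index(chars, reserved=("<unk>",))

# Hypothetical file names, written to a temporary directory.
out_dir = tempfile.mkdtemp()
word_path = os.path.join(out_dir, "word2index.json")
save_index(word2index, word_path)
save_index(label2index, os.path.join(out_dir, "label2index.json"))
save_index(char2index, os.path.join(out_dir, "char2index.json"))
```

The resulting files could then be passed to the corresponding switches. For the bilingual QE model, the same helper would be run once on the source text and once on the target text to obtain the two word dictionaries.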