Description
Take the output logits and decode the model's prediction. Apply a softmax to the logits to obtain a probability distribution over the vocabulary (or, for greedy decoding, simply pick the argmax token directly, since softmax preserves the ordering of the logits). Convert the selected token ID back to a text string using the tokenizer from step 4. If the goal is to generate multi-token outputs (as is typical in language model inference), implement a generation loop: append the predicted token to the input sequence, feed the extended sequence back into the model, and repeat until an end-of-sequence token is produced or a maximum length is reached.
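Below is a minimal sketch of such a greedy generation loop, assuming a PyTorch model with a Hugging Face-style interface. The checkpoint name `"gpt2"`, the prompt, and the `max_new_tokens` limit are illustrative placeholders, not prescribed by this project:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute the model and tokenizer from the earlier steps.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

max_new_tokens = 20
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits       # shape: (batch, seq_len, vocab_size)
        next_token_logits = logits[:, -1, :]   # logits for the last position only
        # Softmax yields a probability distribution; for greedy decoding,
        # argmax over probabilities equals argmax over the raw logits.
        probs = torch.softmax(next_token_logits, dim=-1)
        next_token = torch.argmax(probs, dim=-1, keepdim=True)
        # Append the predicted token and feed the extended sequence back in.
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```

Note that this recomputes the full forward pass each step for simplicity; in practice, inference code typically caches past key/value states so each iteration only processes the newly appended token.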