ContextWindowHelper.cs
suncloudsmoon edited this page Feb 4, 2025 · 1 revision
    The ContextWindowHelper class provides helper methods to:
- Determine the context window size (number of tokens) for various language models.
- Split text into token-based chunks.
- Convert between character counts and token counts using simple heuristics.
Supported model providers include OpenAI and Ollama.
- OpenAI: Uses pre-defined dictionaries for known model identifiers and aliases.
- Ollama: Retrieves model details from a remote API endpoint.
Namespace: LLMHelperFunctions
- Purpose: Offers functions to obtain model context window sizes and to process text for tokenization.

Enum: ModelProvider
Identifiers for supported model providers:
- OpenAI
- Ollama
Method: GetContextWindow(ModelProvider provider, Uri endpoint, string model)
Asynchronously retrieves the context window size (in tokens) for the specified model.
- Parameters:
  - provider: The model provider (e.g. ModelProvider.OpenAI or ModelProvider.Ollama).
  - endpoint: The endpoint URI for accessing model information (used for Ollama).
  - model: The identifier or alias for the model.
- Returns: A task that resolves to an integer representing the context window.
- Exceptions:
  - ArgumentException if the OpenAI model is unknown.
  - OllamaException if the context window cannot be retrieved from an Ollama provider.
  - NotImplementedException if the specified provider is unsupported.
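Because the method reports failures through provider-specific exceptions, callers may want to handle each case separately. A hedged sketch of such a wrapper (the catch clause for OllamaException assumes it is exposed by the LLMHelperFunctions namespace, as this page implies; the 4096-token fallback is an arbitrary choice for this sketch, not part of the library):

```csharp
using System;
using System.Threading.Tasks;
using LLMHelperFunctions;

public class ContextWindowLookup
{
    // Returns the model's context window, or a conservative default
    // (4096 tokens, an assumption of this sketch) when the lookup fails.
    public static async Task<int> GetContextWindowOrDefault(
        ContextWindowHelper.ModelProvider provider, Uri endpoint, string model)
    {
        try
        {
            return await ContextWindowHelper.GetContextWindow(provider, endpoint, model);
        }
        catch (ArgumentException)
        {
            // Unknown OpenAI model identifier or alias
            return 4096;
        }
        catch (OllamaException)
        {
            // The Ollama endpoint did not return a context window
            return 4096;
        }
    }
}
```

NotImplementedException is deliberately not caught here: an unsupported provider is a programming error rather than a recoverable runtime condition.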
Method: Chunkify(string content, int numTokens)
Splits a string into chunks based on an estimated number of tokens.
- Parameters:
  - content: The text to be split.
  - numTokens: The target number of tokens per chunk.
- Returns: An enumerable of string chunks.
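The page does not show Chunkify's actual algorithm. A minimal sketch of the described behavior, assuming the 1-token-per-4-characters heuristic and plain substring splitting (the real implementation may split more carefully, e.g. on whitespace or token boundaries):

```csharp
using System;
using System.Collections.Generic;

public static class ChunkifySketch
{
    // Heuristic stated by the docs: roughly 4 characters per token.
    private const int CharsPerToken = 4;

    // Yields substrings of at most numTokens * 4 characters each.
    public static IEnumerable<string> Chunkify(string content, int numTokens)
    {
        if (numTokens <= 0)
            throw new ArgumentOutOfRangeException(nameof(numTokens));

        int chunkChars = numTokens * CharsPerToken;
        for (int i = 0; i < content.Length; i += chunkChars)
        {
            yield return content.Substring(i, Math.Min(chunkChars, content.Length - i));
        }
    }
}
```

For example, a 10-character string chunked at 1 token per chunk yields three pieces of 4, 4, and 2 characters.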
 
Method: CharToTokenCount(int charCount)
Estimates token count based on the given character count.
- Note: Uses a heuristic of roughly 1 token per 4 characters.
 
Method: TokenToCharCount(int tokenCount)
Estimates character count from a token count.
- Note: Uses a heuristic of roughly 4 characters per token.
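Both conversions follow directly from the stated heuristic. A sketch, assuming integer division for the character-to-token direction (the library's actual rounding behavior is not documented here):

```csharp
public static class TokenHeuristicsSketch
{
    // Docs state roughly 1 token per 4 characters.
    public static int CharToTokenCount(int charCount) => charCount / 4;

    // Inverse heuristic: roughly 4 characters per token.
    public static int TokenToCharCount(int tokenCount) => tokenCount * 4;
}
```

Under this assumption, 4000 characters estimate to 1000 tokens, and 100 tokens estimate to 400 characters.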
 
Class: ContextLenCacheSystem
Provides a simple caching mechanism for context window values, organized by provider and model name.
- Methods:
  - Cache(ModelProvider provider, string model, int contextWindow): Caches the context window value.
  - TryGetContextWindow(ModelProvider provider, string model, out int contextWindow): Attempts to retrieve a cached value.
  - CheckModelProviderValidity(ModelProvider provider): Ensures that only supported providers are used.
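The cache described above maps a (provider, model) pair to a context window size. A minimal sketch of that idea using a dictionary keyed on a value tuple (the real class's internals are not shown on this page, and ModelProvider here is a local stand-in for the documented enum):

```csharp
using System.Collections.Generic;

// Stand-in for the enum documented above.
public enum ModelProvider { OpenAI, Ollama }

public class ContextLenCacheSketch
{
    // Context window sizes keyed by (provider, model name).
    private readonly Dictionary<(ModelProvider, string), int> _cache =
        new Dictionary<(ModelProvider, string), int>();

    // Stores (or overwrites) the context window for a model.
    public void Cache(ModelProvider provider, string model, int contextWindow)
        => _cache[(provider, model)] = contextWindow;

    // Retrieves a cached value; returns false on a cache miss.
    public bool TryGetContextWindow(ModelProvider provider, string model, out int contextWindow)
        => _cache.TryGetValue((provider, model), out contextWindow);
}
```

Caching like this avoids repeated network round-trips to the Ollama endpoint when the same model's context window is requested more than once.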
 
Example Usage:
```csharp
using System;
using System.Threading.Tasks;
using LLMHelperFunctions;

public class Example
{
    public async Task Run()
    {
        // Example endpoint URI (required for Ollama calls)
        Uri endpoint = new Uri("https://your-ollama-api-endpoint.com/");

        // Model identifier; can also be an alias
        string modelName = "gpt-4";

        // Retrieve the context window size for an OpenAI model
        int contextWindow = await ContextWindowHelper.GetContextWindow(
            ContextWindowHelper.ModelProvider.OpenAI,
            endpoint,
            modelName
        );

        Console.WriteLine($"Context window size: {contextWindow} tokens");
    }
}
```