speakeztech/WhisperFS

WhisperFS

An F# library providing streaming-capable bindings to whisper.cpp, designed from the ground up to support both real-time transcription and batch processing scenarios.

Features

  • 🎯 Comparable to Whisper.NET - Push-to-talk batch support with enhanced streaming capabilities
  • 🚀 True Streaming Support - Real-time transcription using whisper.cpp's state management
  • 🔧 Unified API - Single IWhisperClient interface for both batch and streaming modes
  • 📊 Token-Level Access - Confidence scores and timestamps for fine-grained control
  • 🌍 Language Detection - Automatic language identification with confidence scores
  • 💪 Platform Optimized - Automatic GPU detection (CUDA, OpenCL, CoreML) with CPU fallback
  • 🔷 F# Idiomatic - Leverages discriminated unions, async workflows, and observables
  • ⚡ Zero-Copy Operations - Efficient memory management for audio buffers
  • 🔄 Robust Error Handling - Result types with comprehensive error discrimination

Installation

dotnet add package WhisperFS

Native Runtime Dependencies

WhisperFS automatically downloads and manages the appropriate native runtime for your platform:

  • Windows: CUDA, OpenCL, AVX2, AVX, or CPU variants
  • macOS: CoreML optimized, OpenCL, or CPU variants
  • Linux: CUDA, OpenCL, or CPU variants

For detailed GPU acceleration support including OpenCL for AMD/Intel GPUs, see Native Libraries Documentation.

Quick Start

Batch Transcription (PTT Mode)

open WhisperFS

// let! is only valid inside a computation expression, so the workflow is
// wrapped in async { }. Build() is assumed to return an Async<Result<_, _>>.
async {
    // Build a client with fluent configuration
    let! clientResult =
        WhisperBuilder()
            .WithModel(ModelType.Base)
            .WithLanguage("en")
            .WithGpu()
            .Build()

    match clientResult with
    | Ok client ->
        // Process audio file
        let! result = client.ProcessFileAsync("audio.wav")

        match result with
        | Ok transcription ->
            printfn "Text: %s" transcription.FullText
            printfn "Duration: %A" transcription.Duration

            // Access segments with timestamps
            for segment in transcription.Segments do
                printfn "[%.2f-%.2f] %s"
                    segment.StartTime
                    segment.EndTime
                    segment.Text
        | Error err ->
            printfn "Transcription failed: %A" err

    | Error err ->
        printfn "Failed to create client: %A" err
}
|> Async.RunSynchronously
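The nested match expressions above can be flattened by chaining the two Async<Result<_, _>> steps. The following is a minimal sketch of that pattern, not part of WhisperFS: the AsyncResult module, buildClient, and transcribe are hypothetical stand-ins for WhisperBuilder().Build() and client.ProcessFileAsync, using plain strings so the snippet runs on its own.

```fsharp
/// Hypothetical helper module (not part of WhisperFS).
module AsyncResult =
    /// Run `next` only if the previous step succeeded; otherwise pass the error through.
    let bind (next: 'a -> Async<Result<'b, 'e>>) (step: Async<Result<'a, 'e>>) : Async<Result<'b, 'e>> =
        async {
            let! outcome = step
            match outcome with
            | Ok value -> return! next value
            | Error err -> return Error err
        }

// Stand-in for WhisperBuilder().Build(): yields a fake "client".
let buildClient () : Async<Result<string, string>> =
    async { return Ok "client" }

// Stand-in for client.ProcessFileAsync: yields a fake transcription.
let transcribe (client: string) : Async<Result<string, string>> =
    async { return Ok (client + ": hello") }

// Both steps chained; an Error from either step short-circuits the rest.
let pipeline : Async<Result<string, string>> =
    buildClient () |> AsyncResult.bind transcribe

match Async.RunSynchronously pipeline with
| Ok text -> printfn "Text: %s" text
| Error err -> printfn "Failed: %s" err
```

With this shape, the error case is handled once at the end of the chain rather than once per step.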

Streaming Transcription

open WhisperFS
open System.Reactive.Linq

// let! is only valid inside a computation expression, so the workflow is
// wrapped in async { }.
async {
    // Create streaming client
    let! clientResult =
        WhisperBuilder()
            .WithModel(ModelType.Base)
            .WithStreaming(chunkMs = 1000, overlapMs = 200)
            .WithTokenTimestamps()
            .Build()

    match clientResult with
    | Ok client ->
        // Create audio source (e.g., from microphone)
        let audioSource = AudioCapture.CreateMicrophone(sampleRate = 16000)

        // Process stream with real-time updates.
        // Keep the IDisposable returned by subscribe (instead of ignoring it)
        // if you need to stop the stream later.
        client.ProcessStream(audioSource)
        |> Observable.subscribe (function
            | PartialTranscription(text, tokens, confidence) ->
                printfn "Partial: %s (confidence: %.2f)" text confidence

            | FinalTranscription(text, tokens, segments) ->
                printfn "Final: %s" text

            // Qualified because WhisperError also defines a ProcessingError case
            | TranscriptionEvent.ProcessingError msg ->
                printfn "Error: %s" msg

            | _ -> ())
        |> ignore

    | Error err ->
        printfn "Failed to create streaming client: %A" err
}
|> Async.RunSynchronously
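To make the chunkMs and overlapMs settings concrete, here is a small sketch of how a buffer could be cut into overlapping windows. It assumes that each window is chunkMs long and starts (chunkMs - overlapMs) after the previous one. WhisperFS may chunk audio differently internally, and samplesFor and overlappingChunks are hypothetical names used only for illustration.

```fsharp
/// Number of samples covered by a duration at a given sample rate.
let samplesFor (sampleRate: int) (milliseconds: int) : int =
    sampleRate * milliseconds / 1000

/// Split a buffer into windows of chunkMs, each starting
/// (chunkMs - overlapMs) after the previous one. The last window may be shorter.
let overlappingChunks (sampleRate: int) (chunkMs: int) (overlapMs: int) (samples: float32[]) : float32[] list =
    let chunkSize = samplesFor sampleRate chunkMs
    let stride = chunkSize - samplesFor sampleRate overlapMs
    if chunkSize <= 0 then invalidArg "chunkMs" "chunk must contain at least one sample"
    if stride <= 0 then invalidArg "overlapMs" "overlap must be shorter than the chunk"
    [ for start in 0 .. stride .. samples.Length - 1 do
          let length = min chunkSize (samples.Length - start)
          yield Array.sub samples start length ]

// Two seconds of silence at 16 kHz with the settings from the example above.
let twoSeconds : float32[] = Array.zeroCreate 32000
let chunks = overlappingChunks 16000 1000 200 twoSeconds
printfn "Chunk lengths: %A" (chunks |> List.map Array.length)
```

Under these assumptions, a 1000 ms chunk at 16 kHz holds 16,000 samples and consecutive chunks share 3,200 samples.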

Language Detection

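// Runs inside an async { } block; `client` is built as in Quick Start
// and `audioSamples` is a float32[] of audio samples
let! detection = client.DetectLanguageAsync(audioSamples)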

match detection with
| Ok lang ->
    printfn "Detected language: %s (confidence: %.2f)"
        lang.Language
        lang.Confidence

    // Access probabilities for all languages
    for KeyValue(language, probability) in lang.Probabilities do
        if probability > 0.01f then
            printfn "  %s: %.2f%%" language (probability * 100.0f)

| Error err ->
    printfn "Language detection failed: %A" err

Advanced Configuration

// Runs inside an async { } block; Build() yields a Result, as in Quick Start
let! clientResult =
    WhisperBuilder()
        .WithModel(ModelType.LargeV3)
        .WithLanguageDetection()           // Auto-detect language
        .WithBeamSearch(beamSize = 5)      // Better accuracy
        .WithTemperature(0.0f)              // Deterministic output
        .WithPrompt("Technical terms: API, GPU, CPU, RAM")
        .WithTokenTimestamps()              // Enable token-level timestamps
        .WithMaxSegmentLength(30)           // Segment length in seconds
        .WithThreads(8)                     // Parallel processing
        .Build()

API Reference

Core Types

type TranscriptionEvent =
    | PartialTranscription of text:string * tokens:Token list * confidence:float32
    | FinalTranscription of text:string * tokens:Token list * segments:Segment list
    | ContextUpdate of contextData:byte[]
    | ProcessingError of error:string

type IWhisperClient =
    abstract member ProcessAsync: samples:float32[] -> Async<Result<TranscriptionResult, WhisperError>>
    abstract member ProcessStream: audioStream:IObservable<float32[]> -> IObservable<TranscriptionEvent>
    abstract member ProcessFileAsync: path:string -> Async<Result<TranscriptionResult, WhisperError>>
    abstract member DetectLanguageAsync: samples:float32[] -> Async<Result<LanguageDetection, WhisperError>>
    abstract member Reset: unit -> unit
    abstract member StreamingMode: bool with get, set
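Because ProcessStream returns an IObservable<TranscriptionEvent>, the subscription it produces is an IDisposable that controls how long events are delivered. The sketch below illustrates that lifetime using a simplified stand-in for TranscriptionEvent (token and segment payloads omitted) and an F# Event as a stand-in for client.ProcessStream(audioSource). Neither stand-in is part of WhisperFS.

```fsharp
open System

// Simplified stand-in for the TranscriptionEvent type above.
type TranscriptionEvent =
    | PartialTranscription of text: string * confidence: float32
    | FinalTranscription of text: string
    | ProcessingError of error: string

// Stand-in event source; real code would use client.ProcessStream(audioSource).
let source = Event<TranscriptionEvent>()
let finals = ResizeArray<string>()

// Keep the IDisposable so the stream can be stopped later.
let subscription : IDisposable =
    source.Publish
    |> Observable.subscribe (function
        | FinalTranscription text -> finals.Add text
        | PartialTranscription _ | ProcessingError _ -> ())

source.Trigger(FinalTranscription "hello world")
subscription.Dispose()   // stop receiving events
source.Trigger(FinalTranscription "not delivered after Dispose")

printfn "Collected %d final transcription(s)" finals.Count
```

Holding on to the subscription, rather than piping it to ignore, gives the caller a clear place to stop processing, for example when a push-to-talk key is released.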

Error Handling

type WhisperError =
    | ModelLoadError of message:string
    | ProcessingError of code:int * message:string
    | InvalidAudioFormat of message:string
    | StateError of message:string
    | NativeLibraryError of message:string
    | TokenizationError of message:string
    | OutOfMemory
    | Cancelled
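Since WhisperError is a discriminated union, callers can handle every case with an exhaustive match, and the compiler warns if a case is missed. As an illustration, the sketch below repeats the type so it compiles on its own and adds a hypothetical describe helper (not part of WhisperFS) that turns each error into a log-friendly message.

```fsharp
// Repeated from the definition above so the sketch is self-contained.
type WhisperError =
    | ModelLoadError of message: string
    | ProcessingError of code: int * message: string
    | InvalidAudioFormat of message: string
    | StateError of message: string
    | NativeLibraryError of message: string
    | TokenizationError of message: string
    | OutOfMemory
    | Cancelled

/// Hypothetical helper: map each error case to a readable message.
let describe (error: WhisperError) : string =
    match error with
    | ModelLoadError message -> sprintf "Model could not be loaded: %s" message
    | ProcessingError (code, message) -> sprintf "Processing failed (code %d): %s" code message
    | InvalidAudioFormat message -> sprintf "Unsupported audio input: %s" message
    | StateError message -> sprintf "Invalid client state: %s" message
    | NativeLibraryError message -> sprintf "Native library problem: %s" message
    | TokenizationError message -> sprintf "Tokenization failed: %s" message
    | OutOfMemory -> "Out of memory"
    | Cancelled -> "Operation was cancelled"

printfn "%s" (describe (ProcessingError (3, "bad frame")))
```

Note that TranscriptionEvent also defines a ProcessingError case, so when both types are in scope the case may need to be qualified, for example WhisperError.ProcessingError.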

Migration from Whisper.NET

WhisperFS provides full backward compatibility with Whisper.NET through the IWhisperProcessor interface:

// Existing Whisper.NET code
let processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .Build()
let! result = processor.ProcessAsync(audioFile)

// WhisperFS - identical API
let processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .Build()
let! result = processor.ProcessAsync(audioFile)

Enhanced Features Beyond Whisper.NET

| Feature | Whisper.NET | WhisperFS |
|---|---|---|
| Streaming | ❌ | ✅ Real-time with state management |
| Token Confidence | ❌ | ✅ Per-token probabilities |
| Language Detection | ❌ | ✅ With confidence scores |
| Custom Prompts | ❌ | ✅ Context hints for technical terms |
| Beam Search | ❌ | ✅ Configurable parameters |
| Error Handling | ➖ Exceptions | ✅ Result types |
| Observables | ❌ | ✅ Reactive extensions |

Building from Source

# Clone the repository
git clone https://github.com/speakeztech/WhisperFS.git
cd WhisperFS

# Build the solution
dotnet build

# Run tests
dotnet test

# Pack NuGet packages
dotnet pack -c Release

Performance

WhisperFS is designed to keep memory use and transcription latency low:

  • Memory Efficient: Streaming mode processes audio in chunks instead of loading entire files into memory
  • Platform Optimized: Automatically uses GPU acceleration when available, with CPU fallback
  • Parallel Processing: Configurable thread count for CPU processing
  • Zero-Copy: Direct memory access to audio buffers for native interop

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Support
