WhisperFS

An F# library providing streaming-capable bindings to whisper.cpp, designed from the ground up to support both real-time transcription and batch processing scenarios.

Features

🎯 Comparable to Whisper.NET - Push-to-talk batch support with enhanced streaming capabilities
🚀 True Streaming Support - Real-time transcription using whisper.cpp's state management
🔧 Unified API - Single IWhisperClient interface for both batch and streaming modes
📊 Token-Level Access - Confidence scores and timestamps for fine-grained control
🌍 Language Detection - Automatic language identification with confidence scores
💪 Platform Optimized - Automatic GPU detection (CUDA, OpenCL, CoreML) with CPU fallback
🔷 F# Idiomatic - Leverages discriminated unions, async workflows, and observables
⚡ Zero-Copy Operations - Efficient memory management for audio buffers
🔄 Robust Error Handling - Result types with comprehensive error discrimination

Installation

dotnet add package WhisperFS

Native Runtime Dependencies

WhisperFS automatically downloads and manages the appropriate native runtime for your platform:

Windows: CUDA, OpenCL, AVX2, AVX, or CPU variants
macOS: CoreML optimized, OpenCL, or CPU variants
Linux: CUDA, OpenCL, or CPU variants

For detailed GPU acceleration support including OpenCL for AMD/Intel GPUs, see Native Libraries Documentation.
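
The per-platform variants listed above can be read as a preference order that ends in a CPU fallback. The following is a minimal sketch of that idea only: the `Variant` type and `candidates` function are hypothetical names, and whether WhisperFS probes runtimes in exactly this order is an assumption rather than something documented here.

```fsharp
open System.Runtime.InteropServices

// Hypothetical type for illustration; not part of the WhisperFS API.
type Variant = Cuda | OpenCl | CoreMl | Avx2 | Avx | Cpu

/// Candidate runtimes for the current OS, in the order the README lists them.
/// Assumption: earlier entries are preferred and Cpu is always the last resort.
let candidates () : Variant list =
    if RuntimeInformation.IsOSPlatform(OSPlatform.Windows) then
        [ Cuda; OpenCl; Avx2; Avx; Cpu ]
    elif RuntimeInformation.IsOSPlatform(OSPlatform.OSX) then
        [ CoreMl; OpenCl; Cpu ]
    elif RuntimeInformation.IsOSPlatform(OSPlatform.Linux) then
        [ Cuda; OpenCl; Cpu ]
    else
        [ Cpu ]
```

A real loader would additionally have to check that each candidate can actually be loaded on the machine before settling on it.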

Quick Start

Batch Transcription (PTT Mode)

open WhisperFS

// Build a client with fluent configuration.
// Note: let! is only valid inside a computation expression such as async { ... }.
let! clientResult =
    WhisperBuilder()
        .WithModel(ModelType.Base)
        .WithLanguage("en")
        .WithGpu()
        .Build()

match clientResult with
| Ok client ->
    // Process audio file
    let! result = client.ProcessFileAsync("audio.wav")

    match result with
    | Ok transcription ->
        printfn "Text: %s" transcription.FullText
        printfn "Duration: %A" transcription.Duration

        // Access segments with timestamps
        for segment in transcription.Segments do
            printfn "[%.2f-%.2f] %s"
                segment.StartTime
                segment.EndTime
                segment.Text
    | Error err ->
        printfn "Transcription failed: %A" err

| Error err ->
    printfn "Failed to create client: %A" err

Streaming Transcription

open WhisperFS
open System.Reactive.Linq

// Create streaming client (run inside a computation expression such as async { ... })
let! clientResult =
    WhisperBuilder()
        .WithModel(ModelType.Base)
        .WithStreaming(chunkMs = 1000, overlapMs = 200)
        .WithTokenTimestamps()
        .Build()

match clientResult with
| Ok client ->
    // Create audio source (e.g., from microphone)
    let audioSource = AudioCapture.CreateMicrophone(sampleRate = 16000)

    // Process stream with real-time updates
    client.ProcessStream(audioSource)
    |> Observable.subscribe (function
        | PartialTranscription(text, tokens, confidence) ->
            printfn "Partial: %s (confidence: %.2f)" text confidence

        | FinalTranscription(text, tokens, segments) ->
            printfn "Final: %s" text

        // Qualified because WhisperError also defines a ProcessingError case
        | TranscriptionEvent.ProcessingError msg ->
            printfn "Error: %s" msg

        | _ -> ())
    |> ignore

| Error err ->
    printfn "Failed to create streaming client: %A" err
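
To make the `chunkMs` and `overlapMs` settings concrete, the sketch below shows one way overlapping windows could be cut from a 16 kHz sample buffer. It is plain F# for illustration, not WhisperFS's actual chunking logic; `msToSamples` and `chunkBounds` are hypothetical helpers.

```fsharp
/// Convert a duration in milliseconds to a sample count at the given rate.
let msToSamples (sampleRate: int) (ms: int) : int =
    sampleRate * ms / 1000

/// Hypothetical helper (not part of WhisperFS): (start, length) pairs for
/// overlapping windows over a buffer of totalSamples samples.
/// Each window starts (chunk - overlap) samples after the previous one.
let chunkBounds (sampleRate: int) (chunkMs: int) (overlapMs: int) (totalSamples: int) : (int * int) list =
    let chunk = msToSamples sampleRate chunkMs
    let overlap = msToSamples sampleRate overlapMs
    let step = chunk - overlap
    if chunk <= 0 || step <= 0 then
        invalidArg "overlapMs" "chunk must be positive and longer than the overlap"
    let rec loop start acc =
        if start >= totalSamples then
            List.rev acc
        else
            let length = min chunk (totalSamples - start)
            let acc' = (start, length) :: acc
            // Stop once a window reaches the end of the buffer
            if start + length >= totalSamples then List.rev acc'
            else loop (start + step) acc'
    loop 0 []

// 2.5 s of audio at 16 kHz with chunkMs = 1000 and overlapMs = 200
let bounds = chunkBounds 16000 1000 200 40000
// bounds = [(0, 16000); (12800, 16000); (25600, 14400)]
```

With these settings each window shares 3,200 samples (200 ms) with its predecessor, which gives a recognizer context across chunk boundaries at the cost of processing some audio twice.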

Language Detection

let! detection = client.DetectLanguageAsync(audioSamples)

match detection with
| Ok lang ->
    printfn "Detected language: %s (confidence: %.2f)"
        lang.Language
        lang.Confidence

    // Access probabilities for all languages
    for KeyValue(language, probability) in lang.Probabilities do
        if probability > 0.01f then
            printfn "  %s: %.2f%%" language (probability * 100.0f)

| Error err ->
    printfn "Language detection failed: %A" err
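
When only the strongest candidates matter, the probabilities can be filtered and ranked before display. The sketch below assumes the probabilities are available as a sequence of `(language, probability)` pairs; `topLanguages` is a hypothetical helper and the sample values are made up for illustration.

```fsharp
/// Hypothetical helper: the n most probable languages above a threshold,
/// ordered from most to least likely.
let topLanguages (n: int) (threshold: float32) (probabilities: seq<string * float32>) : (string * float32) list =
    probabilities
    |> Seq.filter (fun (_, p) -> p > threshold)
    |> Seq.sortByDescending snd
    |> Seq.truncate n
    |> Seq.toList

// Made-up probabilities, for illustration only
let sample = [ "en", 0.91f; "de", 0.05f; "nl", 0.03f; "fr", 0.005f ]
let best = topLanguages 2 0.01f sample
// best = [("en", 0.91f); ("de", 0.05f)]
```

A dictionary or `Map` can be adapted to this shape with `Seq.map (|KeyValue|)`, which turns each key/value pair into a tuple.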

Advanced Configuration

// Build() yields a Result, as in the Quick Start examples; match on it before use
let! clientResult =
    WhisperBuilder()
        .WithModel(ModelType.LargeV3)
        .WithLanguageDetection()           // Auto-detect language
        .WithBeamSearch(beamSize = 5)      // Better accuracy
        .WithTemperature(0.0f)              // Deterministic output
        .WithPrompt("Technical terms: API, GPU, CPU, RAM")
        .WithTokenTimestamps()              // Enable token-level timestamps
        .WithMaxSegmentLength(30)           // Segment length in seconds
        .WithThreads(8)                     // Parallel processing
        .Build()

API Reference

Core Types

type TranscriptionEvent =
    | PartialTranscription of text:string * tokens:Token list * confidence:float32
    | FinalTranscription of text:string * tokens:Token list * segments:Segment list
    | ContextUpdate of contextData:byte[]
    | ProcessingError of error:string

type IWhisperClient =
    abstract member ProcessAsync: samples:float32[] -> Async<Result<TranscriptionResult, WhisperError>>
    abstract member ProcessStream: audioStream:IObservable<float32[]> -> IObservable<TranscriptionEvent>
    abstract member ProcessFileAsync: path:string -> Async<Result<TranscriptionResult, WhisperError>>
    abstract member DetectLanguageAsync: samples:float32[] -> Async<Result<LanguageDetection, WhisperError>>
    abstract member Reset: unit -> unit
    abstract member StreamingMode: bool with get, set
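
Because the asynchronous members return `Async<Result<_, _>>`, chained calls tend to produce nested `match` expressions like those in the Quick Start. One common way to flatten them is a small `bind` combinator. The `AsyncResult` module below is a hypothetical helper, not something WhisperFS is known to ship, and `parse` and `half` are stand-ins for real client calls.

```fsharp
module AsyncResult =
    /// Run the next step only if the previous one succeeded;
    /// otherwise pass the error through unchanged.
    let bind (f: 'a -> Async<Result<'b, 'e>>) (x: Async<Result<'a, 'e>>) : Async<Result<'b, 'e>> =
        async {
            let! result = x
            match result with
            | Ok value -> return! f value
            | Error e -> return Error e
        }

// Stand-ins for operations that return Async<Result<_, _>>
let parse (s: string) : Async<Result<int, string>> =
    async {
        match System.Int32.TryParse s with
        | true, n -> return Ok n
        | false, _ -> return Error (sprintf "not a number: %s" s)
    }

let half (n: int) : Async<Result<int, string>> =
    async { return (if n % 2 = 0 then Ok (n / 2) else Error "odd input") }

let succeeded = parse "42" |> AsyncResult.bind half |> Async.RunSynchronously
// succeeded = Ok 21
let failed = parse "x" |> AsyncResult.bind half |> Async.RunSynchronously
// failed = Error "not a number: x"
```

The same shape would apply to building a client and then processing a file, provided both steps share one error type.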

Error Handling

type WhisperError =
    | ModelLoadError of message:string
    | ProcessingError of code:int * message:string
    | InvalidAudioFormat of message:string
    | StateError of message:string
    | NativeLibraryError of message:string
    | TokenizationError of message:string
    | OutOfMemory
    | Cancelled
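
Since every failure is a case of one union, callers can handle all of them in a single exhaustive match and let the compiler flag any case they forget. The sketch below repeats the type from the listing above so that it compiles on its own; `describe` is a hypothetical helper and the message wording is illustrative.

```fsharp
// Copied from the listing above so the sketch is self-contained.
type WhisperError =
    | ModelLoadError of message:string
    | ProcessingError of code:int * message:string
    | InvalidAudioFormat of message:string
    | StateError of message:string
    | NativeLibraryError of message:string
    | TokenizationError of message:string
    | OutOfMemory
    | Cancelled

/// Hypothetical helper: turn an error into a log-friendly message.
let describe (error: WhisperError) : string =
    match error with
    | ModelLoadError msg -> sprintf "Model could not be loaded: %s" msg
    | ProcessingError (code, msg) -> sprintf "Processing failed (code %d): %s" code msg
    | InvalidAudioFormat msg -> sprintf "Unsupported audio: %s" msg
    | StateError msg
    | NativeLibraryError msg
    | TokenizationError msg -> sprintf "Internal error: %s" msg
    | OutOfMemory -> "Out of memory"
    | Cancelled -> "Operation was cancelled"

let message = describe (ProcessingError (3, "decoder failed"))
// message = "Processing failed (code 3): decoder failed"
```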

Migration from Whisper.NET

WhisperFS provides full backward compatibility with Whisper.NET through the IWhisperProcessor interface:

// Existing Whisper.NET code
let processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .Build()
let! result = processor.ProcessAsync(audioFile)

// WhisperFS - identical API
let processor = whisperFactory.CreateBuilder()
    .WithLanguage("en")
    .Build()
let! result = processor.ProcessAsync(audioFile)

Enhanced Features Beyond Whisper.NET

Feature	Whisper.NET	WhisperFS
Streaming	❌	✅ Real-time with state management
Token Confidence	❌	✅ Per-token probabilities
Language Detection	❌	✅ With confidence scores
Custom Prompts	❌	✅ Context hints for technical terms
Beam Search	❌	✅ Configurable parameters
Error Handling	➖ Exceptions	✅ Result types
Observables	❌	✅ Reactive extensions

Building from Source

# Clone the repository
git clone https://github.com/speakeztech/WhisperFS.git
cd WhisperFS

# Build the solution
dotnet build

# Run tests
dotnet test

# Pack NuGet packages
dotnet pack -c Release

Performance

WhisperFS is designed with performance in mind:

Memory Efficient: Streaming mode processes audio in chunks instead of loading entire files into memory
Platform Optimized: Automatically uses GPU acceleration when available
Parallel Processing: Configurable thread count for CPU processing
Zero-Copy: Direct memory access for native interop

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

whisper.cpp - The underlying C++ implementation
Whisper.NET - Inspiration for .NET bindings
OpenAI Whisper - The original Whisper model
