From 3690d7451763070b8c30486b532e07164b6e0d37 Mon Sep 17 00:00:00 2001 From: karamouche Date: Thu, 23 Oct 2025 11:19:21 +0200 Subject: [PATCH 1/3] update gladia documentation --- fern/customization/multilingual.mdx | 2 +- fern/providers/transcriber/gladia.mdx | 181 +++++++++++--------------- 2 files changed, 74 insertions(+), 109 deletions(-) diff --git a/fern/customization/multilingual.mdx b/fern/customization/multilingual.mdx index cfef4bb88..25b7732fa 100644 --- a/fern/customization/multilingual.mdx +++ b/fern/customization/multilingual.mdx @@ -460,10 +460,10 @@ Validate your configuration with different languages and scenarios. |----------|---------------------|-----------|-------| | **Deepgram** | ✅ Full auto-detection | 100+ | **Recommended**: Nova 2/Nova 3 with "Multi" language setting | | **Google STT** | ✅ Full auto-detection | 125+ | Latest models with "Multilingual" language setting | +| **Gladia** | ✅ Full auto-detection | 110+ | Supports automatic language detection and code-switching | | **Assembly AI** | ❌ English only | English | No multilingual support | | **Azure STT** | ❌ Single language | 100+ | Many languages, but no auto-detection | | **OpenAI Whisper** | ❌ Single language | 90+ | Many languages, but no auto-detection | -| **Gladia** | ✅ Full auto-detection | 110+ | Supports automatic language detection and code-switching | | **Speechmatics** | ❌ Single language | 50+ | Many languages, but no auto-detection | | **Talkscriber** | ❌ Single language | 40+ | Many languages, but no auto-detection | diff --git a/fern/providers/transcriber/gladia.mdx b/fern/providers/transcriber/gladia.mdx index 12780e10f..89fa6eac1 100644 --- a/fern/providers/transcriber/gladia.mdx +++ b/fern/providers/transcriber/gladia.mdx @@ -1,111 +1,76 @@ --- -title: Gladia -subtitle: What is Gladia? 
-slug: providers/transcriber/gladia +title: Gladia +slug: providers/transcriber/gladia --- - -**What is Gladia?** - -Gladia is an advanced AI platform specializing in real-time transcription, translation, and audio intelligence. By leveraging state-of-the-art ASR (Automatic Speech Recognition), NLP (Natural Language Processing), and GenAI (Generative AI) models, Gladia helps businesses extract valuable insights from unstructured audio data. Their enterprise-grade API offers scalable, secure, and efficient solutions for various applications, from virtual meetings to customer service. - - -**The Evolution of AI Transcription:** - -AI transcription has significantly evolved, moving from basic speech recognition systems to advanced platforms capable of real-time transcription, translation, and audio intelligence. Innovations in machine learning and natural language processing have enhanced accuracy and efficiency. Gladia utilizes these advancements to deliver top-tier transcription services tailored for modern business needs. - -**Overview of Gladia’s Offerings:** - -Gladia provides a comprehensive suite of AI-driven tools: - - -**Speech-to-Text:** - -Gladia’s core offering is its AI-powered speech-to-text technology, delivering highly accurate and real-time transcription. This service supports automatic language detection (including code‑switching within a conversation) and 90+ languages, and includes speaker diarization. - -**Audio Intelligence:** - -Gladia’s audio intelligence add-ons offer features like summarization, chapterization, and sentiment analysis, providing deeper insights into audio data. - -**API:** - -Gladia’s robust API allows seamless integration of speech-to-text capabilities into applications, ensuring low latency and high availability. - -**AI Transcription Technology:** - -Gladia’s AI transcription technology offers several key features and benefits: - -**Features:** - -- High Accuracy: Industry-leading transcription accuracy. 
-- Real-time and Async Transcription: Instantaneous and batch processing options. -- Multilingual Support: Supports transcription and translation in 99 languages. - -**Benefits:** - -- Efficiency: Reduces the time needed for transcription and analysis. -- Scalability: Handles large volumes of data efficiently. -- Cost-Effective: Provides high performance at a competitive cost. - -**Real-time Transcription and Translation:** - -Gladia excels in providing real-time transcription and translation: - - -**Multilingual Support:** - -- Automatic language recognition: Detects the spoken language automatically and handles code‑switching -- 90+ languages: Supports a wide range of languages and dialects -- Real-time Translation: Near-instantaneous translation for diverse applications - -**Use Cases:** - -- Virtual Meetings: Provides real-time transcriptions, note-taking, and video captions. -- Content Creation: Transcribes and translates videos and podcasts for global audiences. - -**Developer API:** - -Gladia offers a comprehensive API for easy integration: - -**Integration:** - -- SDKs: Available for multiple programming languages. -- Comprehensive Documentation: Detailed guides and support for seamless implementation. - -**Use Cases:** - -- Application Development: Enhance applications with advanced AI capabilities. -- Business Solutions: Improve operational efficiency and customer service. - -**Use Cases for Gladia:** - -Gladia supports a wide range of applications: - -**Content Creation:** - -Enhance content creation with high-quality transcription, translation, and subtitling. - - -**Customer Service:** - -Improve customer service with accurate call transcriptions and emotion detection. - -**Market Research:** - -Gain valuable insights into market trends and customer preferences through advanced speech analysis. - -**Impact on Business Operations:** - -Gladia is revolutionizing business operations by providing tools that enhance productivity and insights. 
By automating transcription and audio intelligence, businesses can focus on innovation and strategy rather than manual processes. - -**Innovation and Research:** - -Gladia is committed to continuous innovation and research in AI transcription. Their team of experts focuses on advancing the capabilities of ASR and NLP technologies, exploring new applications, and refining existing tools to stay at the forefront of the industry. - -**AI Safety and Ethics:** - -Ensuring the ethical use of AI is a core principle at Gladia. They implement robust safeguards to prevent misuse of their technology and are actively involved in promoting responsible AI development. Protecting user data and maintaining transparency in AI operations are central to their mission. - -**Integrations and Compatibility:** - -Gladia’s API allows seamless integration with various platforms and applications. This ensures that users can incorporate Gladia’s AI capabilities into their existing systems effortlessly, enhancing functionality and improving user experience. \ No newline at end of file +## What is Gladia? +Gladia is a state-of-the-art audio transcription and intelligence platform. It provides **real-time** speech-to-text processing for audio and video, and layers on advanced audio-intelligence tools that let businesses convert unstructured audio into actionable insights. +Their product is built to integrate easily and scale, enabling companies to focus on building features rather than transcription infrastructure. +Try Gladia on their [playground](https://app.gladia.io/?utm_source=vapi) to get a feel for the product! + +## The Evolution of AI Transcription +Transcription technology has progressed from simple speech recognition systems to full-blown platforms that handle real-time streaming, multilingual audio, code-switching, speaker-diarization, and deep analytics. 
Gladia's technology reflects this shift: their engine is designed for live audio, multi-channel, noisy environments, and supports extensive language coverage. +As voice continues to become a primary interface for human-machine interaction, transcription and audio intelligence are becoming foundational rather than optional. + +## Overview of Gladia's Offerings + +### Speech-to-Text +Their core offering is accurate, fast speech-to-text: +- Real-time transcription: low latency (under ~300 ms in many cases) for live audio and calls. +- Multilingual and code-switch capable: supporting **100+ languages** and mixed-language audio. +- Speaker diarization, word-level timestamps, custom vocabulary, etc., enabling powerful downstream workflows. + +### Audio Intelligence +Beyond transcription, they provide add-ons that transform audio into richer outputs: +- Translation: translate transcripts into one or more target languages in one API call. +- Summarization, chapter-detection, sentiment analysis, named-entity recognition and more. +These intelligence features enable building applications around meetings, customer calls, content production, and more. + +### API & Integrations +Their API is designed for developers: REST/JSON endpoints, webhooks, callbacks, SDKs, and compatibility with telephony protocols (SIP/VoIP) for live use-cases. +They support real-time streaming via suitably low-latency APIs—so platforms, contact centres, and media producers can all use the same backbone for live scenarios. + +## Gladia's Technology + +### Features +- **Real-time latency**: their transcription engine supports live streaming with under 300 ms latency in many cases. +- **Multilingual support**: more than 100 languages and dialects, with code-switching support. +- **World-class timestamps**: provide word-level timing for precise analytics/subtitles. +- **Custom vocabulary & domain adaptation**: tailor the model to your terminology for better accuracy. 
+- **Audio intelligence add-ons**: summarization, entity recognition, sentiment, real-time translation. + +### Benefits +- **Efficiency**: Transcription and analysis workflows become far faster and more automated, reducing manual burden. +- **Scalability**: Built to handle large volumes of audio/video in live scenarios, globally. +- **Global readiness**: With broad multilingual support and live streaming capability, they can deploy in many regions/languages. +- **Integrability**: Developer-friendly APIs mean they can embed transcription plus intelligence into their apps or platforms cleanly. + +## Real-time Transcription and Translation +Gladia is particularly strong at live use-cases: they handle real-time streaming audio (e.g., from calls, meetings, live events) with sub-300 ms latency, and can simultaneously transcribe and translate in 100+ languages. +Use-cases include: live meeting captions, contact centre agent assistance, voice bots, or multilingual live events. + +## Use Cases for Gladia +Here are some strong scenarios where Gladia shines: +- **Voice agents**: Real-time transcription, speaker attribution, translation and post-meeting summaries. +- **Virtual Meetings**: Real-time transcription, speaker attribution, translation and post-meeting summaries. +- **Customer Service / Contact Centres**: Live transcription of calls, sentiment/keyword extraction, multilingual agent assistance. +- **Sales Enablement**: Capture names/emails/details across languages and accents, feed CRMs, enable global sales teams. +- **Media & Content Creation**: Transcribe/edit video/audio, generate subtitles (SRT/VTT), translate for global distribution. + +## Impact on Business Operations +By embedding Gladia's transcription + audio intelligence, enterprises can shift from manual audio workflows (listening, typing, editing) to automated pipelines. This frees up teams to focus on strategy, insights and growth rather than operational overhead. 
+With real-time capabilities and broad language support, they also expand their reach globally and reduce latency in delivering actionable outputs from voice data. + +## Innovation and Research +Gladia stays at the forefront of audio AI research. Their engineering team continually advances their ASR/NLP engine (e.g., optimized versions of AI speech-to-text models, speaker-diarization, real-time adaptation) and explores new features such as code-switching, noise robustness, and live streaming architecture. +We believe voice is the ultimate interface: speaking should be the most natural way to build, access and connect with technology. + +## AI Safety and Ethics +Responsible AI use is built-in. Gladia offers enterprise-grade data governance, secure hosting options, and alignment with privacy/compliance best practices (such as GDPR). They focus on avoiding hallucinations in transcripts and ensuring veracity in business-critical settings. EU and US regions are available for data residency. + +## Useful links +- **Playground**: [app.gladia.io](https://app.gladia.io/?utm_source=vapi) +- **Website**: [gladia.io](https://gladia.io/?utm_source=vapi) +- **Documentation**: [docs.gladia.io](https://docs.gladia.io/?utm_source=vapi) + +--- \ No newline at end of file From 37eb5c535cc231097eb28e66b14fc3259c50a3e6 Mon Sep 17 00:00:00 2001 From: karamouche Date: Thu, 23 Oct 2025 16:21:24 +0200 Subject: [PATCH 2/3] Revise Gladia documentation for clarity and conciseness --- fern/providers/transcriber/gladia.mdx | 111 ++++++++++++-------------- 1 file changed, 50 insertions(+), 61 deletions(-) diff --git a/fern/providers/transcriber/gladia.mdx b/fern/providers/transcriber/gladia.mdx index 89fa6eac1..ebcb67bee 100644 --- a/fern/providers/transcriber/gladia.mdx +++ b/fern/providers/transcriber/gladia.mdx @@ -4,69 +4,58 @@ slug: providers/transcriber/gladia --- ## What is Gladia? -Gladia is a state-of-the-art audio transcription and intelligence platform. 
It provides **real-time** speech-to-text processing for audio and video, and layers on advanced audio-intelligence tools that let businesses convert unstructured audio into actionable insights. -Their product is built to integrate easily and scale, enabling companies to focus on building features rather than transcription infrastructure. +Gladia is a state-of-the-art audio transcription and intelligence platform. It provides **real-time** speech-to-text for audio and video and adds advanced audio-intelligence features so you can turn unstructured audio into actionable insights. It integrates easily and scales so you can focus on building features instead of transcription infrastructure. Try Gladia on their [playground](https://app.gladia.io/?utm_source=vapi) to get a feel for the product! -## The Evolution of AI Transcription -Transcription technology has progressed from simple speech recognition systems to full-blown platforms that handle real-time streaming, multilingual audio, code-switching, speaker-diarization, and deep analytics. Gladia's technology reflects this shift: their engine is designed for live audio, multi-channel, noisy environments, and supports extensive language coverage. -As voice continues to become a primary interface for human-machine interaction, transcription and audio intelligence are becoming foundational rather than optional. - -## Overview of Gladia's Offerings - -### Speech-to-Text -Their core offering is accurate, fast speech-to-text: -- Real-time transcription: low latency (under ~300 ms in many cases) for live audio and calls. -- Multilingual and code-switch capable: supporting **100+ languages** and mixed-language audio. -- Speaker diarization, word-level timestamps, custom vocabulary, etc., enabling powerful downstream workflows. - -### Audio Intelligence -Beyond transcription, they provide add-ons that transform audio into richer outputs: -- Translation: translate transcripts into one or more target languages in one API call. 
-- Summarization, chapter-detection, sentiment analysis, named-entity recognition and more. -These intelligence features enable building applications around meetings, customer calls, content production, and more. - -### API & Integrations -Their API is designed for developers: REST/JSON endpoints, webhooks, callbacks, SDKs, and compatibility with telephony protocols (SIP/VoIP) for live use-cases. -They support real-time streaming via suitably low-latency APIs—so platforms, contact centres, and media producers can all use the same backbone for live scenarios. - -## Gladia's Technology - -### Features -- **Real-time latency**: their transcription engine supports live streaming with under 300 ms latency in many cases. -- **Multilingual support**: more than 100 languages and dialects, with code-switching support. -- **World-class timestamps**: provide word-level timing for precise analytics/subtitles. -- **Custom vocabulary & domain adaptation**: tailor the model to your terminology for better accuracy. -- **Audio intelligence add-ons**: summarization, entity recognition, sentiment, real-time translation. - -### Benefits -- **Efficiency**: Transcription and analysis workflows become far faster and more automated, reducing manual burden. -- **Scalability**: Built to handle large volumes of audio/video in live scenarios, globally. -- **Global readiness**: With broad multilingual support and live streaming capability, they can deploy in many regions/languages. -- **Integrability**: Developer-friendly APIs mean they can embed transcription plus intelligence into their apps or platforms cleanly. - -## Real-time Transcription and Translation -Gladia is particularly strong at live use-cases: they handle real-time streaming audio (e.g., from calls, meetings, live events) with sub-300 ms latency, and can simultaneously transcribe and translate in 100+ languages. -Use-cases include: live meeting captions, contact centre agent assistance, voice bots, or multilingual live events. 
- -## Use Cases for Gladia -Here are some strong scenarios where Gladia shines: -- **Voice agents**: Real-time transcription, speaker attribution, translation and post-meeting summaries. -- **Virtual Meetings**: Real-time transcription, speaker attribution, translation and post-meeting summaries. -- **Customer Service / Contact Centres**: Live transcription of calls, sentiment/keyword extraction, multilingual agent assistance. -- **Sales Enablement**: Capture names/emails/details across languages and accents, feed CRMs, enable global sales teams. -- **Media & Content Creation**: Transcribe/edit video/audio, generate subtitles (SRT/VTT), translate for global distribution. - -## Impact on Business Operations -By embedding Gladia's transcription + audio intelligence, enterprises can shift from manual audio workflows (listening, typing, editing) to automated pipelines. This frees up teams to focus on strategy, insights and growth rather than operational overhead. -With real-time capabilities and broad language support, they also expand their reach globally and reduce latency in delivering actionable outputs from voice data. - -## Innovation and Research -Gladia stays at the forefront of audio AI research. Their engineering team continually advances their ASR/NLP engine (e.g., optimized versions of AI speech-to-text models, speaker-diarization, real-time adaptation) and explores new features such as code-switching, noise robustness, and live streaming architecture. -We believe voice is the ultimate interface: speaking should be the most natural way to build, access and connect with technology. - -## AI Safety and Ethics -Responsible AI use is built-in. Gladia offers enterprise-grade data governance, secure hosting options, and alignment with privacy/compliance best practices (such as GDPR). They focus on avoiding hallucinations in transcripts and ensuring veracity in business-critical settings. EU and US regions are available for data residency. 
+## Why choose Gladia on Vapi? + +### Real-time speech-to-text +- Low-latency live transcription (often under 300 ms) for calls and streaming audio. +- Super-fast partial transcripts (~100 ms) for immediate response processing. +- Word-level timestamps and detailed custom vocabulary to power downstream workflows. +- Mixed-language and code-switch support for natural conversations. + +### Global language coverage +- Support for **110+ languages** and dialects. +- Robust handling of multilingual and mixed-language audio. + +### Audio intelligence add-ons +- Translation in one API call to one or more target languages. +- Post-call summarization, plus real-time sentiment analysis and named-entity recognition. +- Build meeting notes, customer-call insights, and content production workflows on top of transcripts. + +### API and integrations +- Developer-friendly REST/JSON endpoints, webhooks, and callbacks. +- Telephony compatibility (SIP/VoIP) and noise resistance for live use cases. +- Real-time streaming with low-latency interfaces for platforms and contact centers. + +## Getting started + +1. Go to the **Assistants** tab in the left-hand navigation. +2. Create a new assistant, or select the voice assistant you want to configure. +3. Open the **Transcriber** tab in the top navigation (or scroll to the Transcriber module). +4. In the **Provider** dropdown, select **Gladia**. + +Watch the [Vapi x Gladia demo video](https://youtu.be/7EoYnMOHR5A?si=dIDTTXw2L--DY-QY) to see real-time features in action! + +## Best practices + +- **Region selection**: Use the region closest to your users; EU and US options are available for data residency and latency. +- **Custom vocabulary**: Add domain-specific terms (product names, acronyms) to improve accuracy. +- **Timestamps**: Use word-level timestamps when you need precise analytics or subtitles. +- **Translation**: Use built-in translation when you need multilingual outputs from a single stream.
+ +## Use cases + +- **Voice agents**: Real-time transcription, speaker attribution, translation, and post-call summaries. +- **Virtual meetings**: Live transcription, speaker attribution, translation, and meeting notes. +- **Customer service / contact centers**: Live call transcription, sentiment/keyword extraction, multilingual agent assistance. +- **Sales enablement**: Capture names, emails, and details across languages and accents; feed CRMs. +- **Media & content creation**: Transcribe/edit audio/video, generate subtitles (SRT/VTT), and translate for global distribution. + +## Data protection and compliance + +Gladia offers enterprise-grade data governance, secure hosting options, and alignment with privacy and compliance frameworks such as GDPR. EU and US regions are available for data residency. ## Useful links - **Playground**: [app.gladia.io](https://app.gladia.io/?utm_source=vapi) From 91d175e0b28c3fcf463dce125a46297770eaddd4 Mon Sep 17 00:00:00 2001 From: karamouche Date: Thu, 23 Oct 2025 16:31:33 +0200 Subject: [PATCH 3/3] Update gladia details in speech-to-text sections --- fern/assistants/examples/multilingual-agent.mdx | 1 + fern/customization/multilingual.mdx | 6 +++--- fern/debugging.mdx | 1 + fern/quickstart/introduction.mdx | 2 +- 4 files changed, 6 insertions(+), 4 deletions(-) diff --git a/fern/assistants/examples/multilingual-agent.mdx b/fern/assistants/examples/multilingual-agent.mdx index 3a2813bac..8afe1fc52 100644 --- a/fern/assistants/examples/multilingual-agent.mdx +++ b/fern/assistants/examples/multilingual-agent.mdx @@ -1155,6 +1155,7 @@ For a more structured approach with explicit language selection, see our compreh ## Provider Support Summary **Speech-to-Text (Transcription):** +- **Gladia**: Solaria with automatic language detection and code-switching.
- **Deepgram**: Nova 2, Nova 3 with "Multi" language setting - **Google**: Latest models with "Multilingual" language setting - **All other providers**: Single language only, no automatic detection diff --git a/fern/customization/multilingual.mdx b/fern/customization/multilingual.mdx index 25b7732fa..94c3686c9 100644 --- a/fern/customization/multilingual.mdx +++ b/fern/customization/multilingual.mdx @@ -29,9 +29,9 @@ Set up your transcriber to automatically detect and process multiple languages. 2. Create a new assistant or edit an existing one 3. In the **Transcriber** section: - **Provider**: Select `Deepgram` (recommended), `Google`, or `Gladia` - - **Model**: For Deepgram, choose `Nova 2` or `Nova 3`; for Google, choose `Latest`; for Gladia, choose your preferred Gladia model - - **Language / Mode**: Set `Multi` (Deepgram), `Multilingual` (Google), or enable automatic language detection (Gladia) - 4. **Other providers**: May require a single language and not auto-detect + - **Model**: For Deepgram, choose `Nova 2` or `Nova 3`; for Google, choose `Latest`; for Gladia, choose `Solaria` + - **Language / Mode**: Set `Multi` (Deepgram), `Multilingual` (Google), or choose the language you want to transcribe (Gladia) + 4. **Other providers**: May require a single language and not auto-detect 5.
Click **Save** to apply the configuration diff --git a/fern/debugging.mdx b/fern/debugging.mdx index 8ac703a52..2c9f95b53 100644 --- a/fern/debugging.mdx +++ b/fern/debugging.mdx @@ -83,6 +83,7 @@ Start with these immediate checks before diving deeper: - [Anthropic Status](https://status.anthropic.com/) for Anthropic language models - [ElevenLabs Status](https://status.elevenlabs.io/) for ElevenLabs voice synthesis - [Deepgram Status](https://status.deepgram.com/) for Deepgram speech-to-text + - [Gladia Status](https://status.gladia.io/) for Gladia speech-to-text - And other providers' status pages as needed diff --git a/fern/quickstart/introduction.mdx b/fern/quickstart/introduction.mdx index 99fc8e800..d8a57219c 100644 --- a/fern/quickstart/introduction.mdx +++ b/fern/quickstart/introduction.mdx @@ -30,7 +30,7 @@ Every Vapi assistant combines three core technologies: -You have full control over each component, with dozens of providers and models to choose from; OpenAI, Anthropic, Google, Deepgram, ElevenLabs, and many, many more. +You have full control over each component, with dozens of providers and models to choose from: OpenAI, Anthropic, Google, Gladia, Deepgram, ElevenLabs, and many, many more. ## Two ways to build voice agents