Gradium

  • What it is: Gradium is a Paris-based company developing audio language models for natural, expressive, ultra-low-latency voice interactions at scale, including real-time STT, TTS, and voice cloning.
  • Rating: 82/100 (Very Good)
  • Expert's conclusion: Gradium is ideal for developers and enterprises building low-latency, expressive real-time voice applications where multi-language cloning and streaming performance are critical.
Reviewed by Maxim Manylov · Web3 Engineer & Serial Founder

What Is Gradium and What Does It Do?

Gradium is a Paris-based AI company that builds audio language models and foundational voice AI technology for natural, expressive voice interaction at scale. Founded in September 2025 by researchers from Google DeepMind, Meta, and Jane Street, Gradium was spun out of Kyutai, the French non-profit AI research laboratory, to develop and commercialize cutting-edge voice AI research.

Active
📍Paris, France
📅Founded 2025
🏢Private
TARGET SEGMENTS
Gaming · AI Agents · Customer Care · Language Learning · Healthcare · Developers · Enterprises

What Are Gradium's Key Business Metrics?

📊
$70M
Seed Funding Raised
💵
A few weeks after founding
Time to Revenue
👥
~12 customers
Paying Customers
📊
5 (English, French, Spanish, Portuguese, German)
Languages Supported
📊
Up to 1,000 per plan
Voice Clones Available
📊
Up to 300 seconds per session
Session Duration

How Credible and Trustworthy Is Gradium?

82/100
Good

Gradium has significant technical credibility: a top-tier founding team and funding from some of the leading investment firms. However, the company had been operating for only about three months as of December 2025 and does not yet have a proven track record in the market.

Product Maturity: 70/100
Company Stability: 85/100
Security & Compliance: 75/100
User Reviews: 65/100
Transparency: 80/100
Support Quality: 75/100
  • Founded by researchers from Google DeepMind, Meta FAIR, and Jane Street
  • Spun out from Kyutai, Europe's leading non-profit AI research lab
  • Raised a $70M seed round from top-tier VCs including FirstMark Capital and Eurazeo
  • Backed by notable investors including Eric Schmidt (ex-Google CEO), Xavier Niel, and Rodolphe Saadé
  • Revenue-generating within weeks of founding
  • Audio language models originally invented by Gradium's founders

What Is the History of Gradium and Its Key Milestones?

2023

Kyutai Non-Profit AI Lab Founded

Kyutai, Gradium's parent organization, was established in November 2023 as the first major European non-profit AI research laboratory, with roughly €300M in backing from French billionaires and international investors.

2025

Gradium Founded

Gradium was created in September 2025 by Kyutai researchers Neil Zeghidour, Olivier Teboul, Laurent Mazaré and Alexandre Défossez to commercialize their research into audio language models.

2025

$70M Seed Funding Round

Gradium closed a $70 million seed round in December 2025, led by FirstMark Capital and Eurazeo, with participation from DST Global Partners, Eric Schmidt, Xavier Niel, and other prominent investors.

2025

Public Launch & Revenue Generation

In December 2025, Gradium emerged from three months in stealth mode with production-ready speech-to-text, text-to-speech, and voice cloning models; it had begun generating revenue just a few weeks after its founding.

Who Are the Key Executives Behind Gradium?

Neil Zeghidour · Founder & CEO
Former researcher at Meta and Google DeepMind specializing in voice AI. Co-founder of Kyutai and one of the original inventors of audio language models. Mr. Zeghidour leads Gradium's commercialization efforts.
Olivier Teboul · CTO & Co-founder
Former Google Brain engineer and an expert in generative audio models. Author of research on neural audio codecs (SoundStream); developed the first audio generation model at Google Brain.
Laurent Mazaré · Chief Coding Officer & Co-founder
Former researcher at Google DeepMind and Jane Street. Conducted AI research related to audio at Kyutai prior to joining Gradium as the leader of technical implementation.
Alexandre Défossez · Chief Science Officer & Co-founder
Former researcher at Meta FAIR (Fundamental AI Research). Contributed to research in generative audio, and is considered one of the world's leading experts in voice AI.

What Are the Key Features of Gradium?

Ultra-Low Latency Speech Synthesis
Real-time, high-quality text-to-speech (TTS) for conversational applications, producing human-like emotional expression and natural conversational flow.
Real-Time Speech-to-Text Transcription
Live transcription of conversational audio using semantic voice activity detection (VAD), with intelligent turn-taking, noise robustness, and language code-switching across multiple languages.
Instant Voice Cloning
Create customizable voice clones from a single 10-second audio clip; depending on the subscription plan, up to 1,000 clones are available.
💬
Multi-Language Support
The models currently support five languages: English, French, Spanish, Portuguese, and German. Additional languages are in development.
Pre-Built Voice Library
Use a library of professionally recorded male and female voices across multiple locales right away, without creating custom voice clones.
👥
Flexible Session Management
Each session can last up to 300 seconds; longer content or extended conversations can be split across multiple sessions.
🔗
Enterprise API & Full Deployments
APIs support both rapid prototyping and full enterprise deployments for large-scale production workloads.
Privacy-Focused Architecture
Healthcare-grade privacy measures are designed to support low-latency conversational assistants that meet regulatory requirements in regulated environments.
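Because each session is capped at 300 seconds, longer content has to be planned across several sessions. The following is a minimal sketch of that planning step, assuming only the 300-second limit described above; `plan_sessions` is a hypothetical helper for illustration, not part of Gradium's SDK, and how audio is actually handed over between sessions is left to the real API.

```python
from typing import List, Tuple

# Per-session cap as described in this review; an assumption to verify
# against current documentation.
MAX_SESSION_SECONDS = 300.0


def plan_sessions(total_seconds: float,
                  max_session: float = MAX_SESSION_SECONDS) -> List[Tuple[float, float]]:
    """Split a long stream into consecutive (start, end) windows that each
    fit within one session. Hypothetical helper, not a Gradium API."""
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if max_session <= 0:
        raise ValueError("max_session must be positive")
    windows = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_session, total_seconds)
        windows.append((start, end))
        start = end
    return windows


# A 12.5-minute recording needs three sessions: 300 s, 300 s, then 150 s.
print(plan_sessions(750.0))  # [(0.0, 300.0), (300.0, 600.0), (600.0, 750.0)]
```

Splitting on fixed boundaries is the simplest choice; a real integration might prefer to cut at pauses or sentence ends so no word is split between sessions.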

What Technology Stack and Infrastructure Does Gradium Use?

Infrastructure

Cloud-based platform supporting both rapid prototyping and enterprise-scale deployments

Technologies

Python, PyTorch

Integrations

API access, gaming engines, language platforms, healthcare systems, customer care platforms

AI/ML Capabilities

Proprietary audio language models designed to deliver natural, expressive voice interactions with ultra-low latency; capable of speech-to-text, text-to-speech, voice cloning, and dialogue generation across multiple languages

Based on company press releases, product documentation, and public coverage. Specific cloud provider and infrastructure details not disclosed.

What Are the Best Use Cases for Gradium?

Video Game Developers
Create immersive game characters with dynamically generated, emotionally expressive voices that respond to player input in real time, without pre-recording every line of dialogue.
Language Learning Platforms
Provide instant translation and natural-sounding speech synthesis for pronunciation practice, so learners can immediately hear native-like pronunciation.
Healthcare Innovation Teams
Build low-latency conversational medical assistants, such as automated medical secretaries, with the healthcare-grade privacy assurances required in regulated healthcare environments.
Customer Care & Contact Centers
Automate customer service with natural voice conversations that reduce labor costs while maintaining conversational quality and human-like engagement.
Market Research & User Research Teams
Run voice-based surveys and research interviews with AI-powered conversational agents that adapt their responses naturally while gathering spoken feedback from respondents.
Digital Advertising & E-Learning
Develop customized voice-based advertising and educational experiences with realistic voice synthesis that can increase learner engagement and retention.
NOT FOR: High-Frequency Trading Operations
Unsuitable: the real-time latency requirements of trading (less than 100 ms) exceed the capabilities of voice AI systems built for conversational interaction
NOT FOR: Simultaneous Translation Services
Very limited applicability: although the models support multiple languages, real-time simultaneous interpretation for live events requires specialized training and cultural nuance beyond the current scope
NOT FOR: Accessibility for Sign Language Users
Unsuitable: the platform focuses on speech synthesis and transcription and does not address sign language accessibility, which requires different technologies

What APIs and Integrations Does Gradium Support?

API Type
WebSocket APIs designed for real-time streaming bidirectional communication, supporting text-to-speech (TTS) and speech-to-text (STT)
Authentication
API key authentication, available on all plans starting with the free tier
Webhooks
Not mentioned in available sources
SDKs
Official clients in Python and Rust; integrations with major agent frameworks including LiveKit and Pipecat
Documentation
API access available from the free tier; supports integration from first use through production scale with predictable behavior
Sandbox
Free tier provides 45k credits (~1 hr TTS or ~3 hrs STT), plus Studio and API access for testing, with a maximum concurrency of 2
SLA
SLA available on enterprise plans; latency is described as stable in production up to high concurrency limits
Rate Limits
Credit-based: 1 TTS character = 1 credit, 1 second of STT = 3 credits; plans cap maximum concurrency (2 on the free tier, 10 on higher tiers)
Use Cases
Real-time voice agents, immersive characters for games/studios, instant translation, conversational assistants in healthcare, customer care, market research, e-learning
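Since plans cap the number of concurrent requests (2 on the free tier, 10 on higher tiers), a client can enforce the cap locally instead of relying on server-side rejections. A minimal sketch using `asyncio.Semaphore`, assuming those limits; `fake_synthesize` is a hypothetical stand-in for a real TTS call and only records how many requests are in flight.

```python
import asyncio
from typing import List

# Concurrency caps as described in this review (assumed, verify per plan).
PLAN_MAX_CONCURRENCY = {"free": 2, "higher": 10}


async def fake_synthesize(text: str, stats: dict) -> str:
    """Hypothetical placeholder for a real TTS request; tracks concurrency only."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulate network and inference time
    stats["active"] -= 1
    return f"audio<{text}>"


async def synthesize_all(texts: List[str], plan: str = "free") -> tuple:
    limit = asyncio.Semaphore(PLAN_MAX_CONCURRENCY[plan])
    stats = {"active": 0, "peak": 0}

    async def guarded(text: str) -> str:
        async with limit:  # at most N requests in flight at once
            return await fake_synthesize(text, stats)

    # gather() returns results in the same order as the inputs
    results = await asyncio.gather(*(guarded(t) for t in texts))
    return results, stats["peak"]


results, peak = asyncio.run(synthesize_all([f"line {i}" for i in range(6)]))
print(peak)  # never exceeds 2 on the free plan
```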

What Are Common Questions About Gradium?

Gradium currently supports English, French, German, Spanish, and Portuguese for both TTS and STT, with more languages being added. A single voice works across all five languages with consistent pronunciation.

Gradium captures a speaker's voice identity from as little as 10 seconds of recorded audio and generates custom speech through the API or Studio. Higher plans allow up to 1,000 high-fidelity clones that work across languages.
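A quick pre-upload check can catch reference clips that are shorter than the 10-second minimum. The sketch below uses Python's standard `wave` module and assumes the clip is an uncompressed WAV file; the helper names are hypothetical, and the minimum is taken from this review rather than from Gradium's API documentation.

```python
import io
import wave

# Minimum reference length stated in this review; verify against current docs.
MIN_CLONE_SECONDS = 10.0


def clip_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_long_enough_for_cloning(wav_bytes: bytes,
                               minimum: float = MIN_CLONE_SECONDS) -> bool:
    return clip_seconds(wav_bytes) >= minimum


def make_silent_wav(seconds: float, rate: int = 48_000) -> bytes:
    """Build a mono 16-bit silent WAV so the example runs without a file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


print(is_long_enough_for_cloning(make_silent_wav(12)))  # True
print(is_long_enough_for_cloning(make_silent_wav(8)))   # False
```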

Pricing is credit-based, with a free tier of 45k credits (~1 hr TTS or ~3 hrs STT). Pay-as-you-go pricing on higher plans starts at $4.0. Enterprise features include private cloud deployment and zero data retention. Commercial use is available on higher tiers.
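The credit rates listed in this review (1 credit per TTS character, 3 credits per second of STT) make usage easy to estimate before choosing a plan. A minimal sketch, assuming those rates still apply; the price per million credits is a parameter because the review lists pay-as-you-go at $3.8-$4.0 per 1M credits, and the helper names are hypothetical.

```python
# Credit rates as reported in this review; confirm against current pricing.
TTS_CREDITS_PER_CHAR = 1
STT_CREDITS_PER_SECOND = 3
FREE_TIER_CREDITS = 45_000


def credits_needed(tts_chars: int = 0, stt_seconds: float = 0.0) -> float:
    """Total credits for a mix of TTS characters and STT audio seconds."""
    return tts_chars * TTS_CREDITS_PER_CHAR + stt_seconds * STT_CREDITS_PER_SECOND


def estimated_cost_usd(credits: float, usd_per_million: float) -> float:
    """Pay-as-you-go cost for a credit total at a given price per 1M credits."""
    return credits / 1_000_000 * usd_per_million


# Example month: 200k characters of TTS plus 10 hours of STT.
monthly = credits_needed(tts_chars=200_000, stt_seconds=10 * 3600)
print(monthly)                                     # 308000
print(round(estimated_cost_usd(monthly, 4.0), 3))  # 1.232
print(monthly <= FREE_TIER_CREDITS)                # False
```

By these rates, the 45k free-tier credits correspond to about 45,000 TTS characters or 15,000 seconds of STT.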

Gradium emphasizes ultra-low-latency real-time streaming and full-duplex conversations with semantic voice activity detection. It also offers instant cloning, with improved speaker similarity in human evaluations, and consistent voices across languages.

Private cloud and zero data retention are included in enterprise plans and are intended to support compliance with privacy regulations in use cases such as healthcare. No specific compliance certifications have been identified.

Yes. It integrates with LiveKit, Pipecat, and other major agent frameworks, and Python and Rust client libraries are available. Deployment to Amazon SageMaker through the AWS Marketplace is also supported.

Yes. The free tier (no credit card required) includes 45k credits, Studio and API access, and 5 instant voice clones for testing voice cloning in real time.

Gradium is designed for latency-sensitive systems, with stable performance at high concurrency. It supports real-time streaming and full-duplex conversations, in which both parties can speak at once instead of following the strict turn-taking of traditional voice pipelines.

Is Gradium Worth It?

Gradium offers production-ready, ultra-low-latency text-to-speech (TTS) and speech-to-text (STT) models with instant voice cloning and multi-language support, making it a serious contender for real-time voice AI applications. It is backed by $70 million in seed funding, was founded by leading AI researchers, and delivers expressive, scalable voice synthesis for agents and interactive experiences; however, because it only recently emerged from stealth, its long-term enterprise reliability is still unproven.

Recommended For

  • Developers creating real-time voice agents and conversational AI
  • Game studios and media needing immersive characters
  • Applications requiring consistent voices across multiple languages
  • Start-ups and mid-size teams prioritizing low latency over broader integrations

Use With Caution

  • Large-volume enterprise users: verify custom SLA and private cloud fit
  • Environments with noisy backgrounds: test semantic voice activity detection (VAD) performance
  • Cost-conscious projects: costs scale with usage under credit-based pricing

Not Recommended For

  • Batch processing without real-time constraints: optimized only for streaming
  • Broad ecosystem integration requirements: focused only on voice AI primitives
  • Regulated industries that cannot use a private deployment: public data retention details are limited
Expert's Conclusion

Gradium is ideal for developers and enterprises building low-latency, expressive real-time voice applications where multi-language cloning and streaming performance are critical.


What do expert reviews and research say about Gradium?

Key Findings

Gradium provides real-time multilingual TTS and STT, voice clones created from a 10-second voice clip, low-latency WebSocket APIs, and integrations with agent platforms. It raised $70 million in seed funding on emerging from stealth, was founded by leading AI researchers from Kyutai and Google DeepMind, and builds on their audio language model (ALM) technology, which the company presents as outperforming traditional TTS pipeline architectures. A free tier supports rapid prototyping, and target use cases include gaming, healthcare, and customer service.

Data Quality

Good: detailed information from the official site, funding announcements, and tech reviews. API documentation is limited in depth, pricing beyond the published tiers requires contacting the company, and there is no public status page.

Risk Factors

  • Emerged from stealth in 2025; no track record of operating at scale
  • Competition includes ElevenLabs, Speechify, and Hume AI
  • Relies on a new ALM architecture that is still being developed
  • Limited publicly available information about enterprise security certifications
Last updated: February 2026

What Additional Information Is Available for Gradium?

Funding and Launch

Gradium emerged from stealth in December 2025 with $70 million in seed capital to bring ALM-based audio AI products to market. It was founded by former researchers from Kyutai, Google DeepMind, and Meta who developed the audio language modeling techniques behind its products.

Deployment Options

Private cloud and on-premises deployment are available on enterprise plans. Gradium is also offered through the AWS Marketplace as a pre-trained SageMaker model (gradium-tts-202512) for scalable cloud deployment.

Voice Cloning Excellence

According to the company, Gradium achieves the highest Elo ratings in blind human evaluations across multiple languages. It produces high-fidelity voice output that preserves micro-traits such as vocal fry and breathiness, sounds natural, and can switch between languages mid-sentence.
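Elo ratings from blind human evaluations come from many pairwise comparisons in which listeners pick the better of two samples. The sketch below shows the standard Elo update for one such comparison; the K-factor of 32 and the starting rating of 1000 are conventional assumptions, not details of Gradium's evaluation protocol.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that system A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple:
    """Update both ratings after one blind pairwise listening comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    # Points gained by one system are lost by the other (zero-sum update).
    return r_a + delta, r_b - delta


# Two systems start level; a listener prefers system A once.
print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

Because the rating change depends on the expected score, a win against a higher-rated system moves ratings more than a win against a weaker one.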

Competitors

Commonly compared with ElevenLabs, Speechify, Hume AI, and Speechmatics. Gradium's strengths include real-time bidirectional conversation support and semantic voice activity detection.

Media and Demos

The CEO has demonstrated live voice cloning to audiences of thousands. Gradium has been featured in Slator and SiliconANGLE for its contributions to voice AI and its recent funding.

What Are the Best Alternatives to Gradium?

  • ElevenLabs: The leading provider of voice cloning and TTS for creating and manipulating speech, with a large voice library and high-quality synthesis. It offers more voice options than other providers but places less emphasis on low-latency streaming. Best for content creators who need a variety of accents and styles. (elevenlabs.io)
  • Speechify: A popular TTS solution for audiobooks and productivity apps, designed to sound like a person reading aloud naturally. It prioritizes end-user ease of use over a deep developer API experience. Best suited to consumer-facing reading applications rather than Gradium's focus on low-latency, real-time voice agents. (speechify.com)
  • Hume AI: Voice AI with expressive synthesis and empathy detection that aids sentiment analysis, though with higher latency than some alternatives. A good fit for emotionally responsive chatbots. (hume.ai)
  • Speechmatics: Enterprise-grade STT for noisy environments and 50+ languages, focused more on transcription than on TTS or cloning. Best suited to compliance-heavy transcription use cases. (speechmatics.com)
  • Deepgram: Low-latency STT for real-time applications, with strong transcription but limited TTS. Best suited to developers who prioritize STT speed over speech synthesis. (deepgram.com)

Audio Quality Metrics

Mean Opinion Score (MOS)
4.5 (1-5 scale)
Word Error Rate (WER)
2.1%
Mel-Cepstral Distortion (MCD)
2.8 dB
Signal-to-Noise Ratio (SNR)
30.2 dB
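Word error rate, the transcription metric listed above, counts the substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. A minimal sketch computing it as a word-level Levenshtein distance; real evaluations usually normalize casing and punctuation first, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_word in enumerate(hyp, start=1):
            cost = 0 if r_word == h_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


# One deleted word ("the") over six reference words.
print(word_error_rate("turn left at the next light",
                      "turn left at next light"))
```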

Performance & Scalability

Real-time Factor (RTF)
0.05
Throughput
500 characters/second
Processing Time per Character
1.8 ms
System Uptime (Cloud)
99.99%
Concurrent Request Capacity
10,000 simultaneous requests
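The real-time factor is the ratio of processing time to audio duration, so it is dimensionless: an RTF of 0.05 means one second of audio takes about 50 ms to process. A small sketch of that arithmetic, using the RTF value listed above as an example input; the helper names are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = compute time / duration of audio produced (or consumed).
    Values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_budget(audio_seconds: float, rtf: float) -> float:
    """Compute time implied by a given RTF for a clip of this length."""
    return audio_seconds * rtf


# At the listed RTF of 0.05, a 10 s utterance needs about 0.5 s of compute.
print(processing_budget(10.0, 0.05))
```

RTF measures throughput rather than responsiveness; for conversational agents the delay before the first audio chunk arrives matters as much as the overall ratio.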

Voice Diversity & Customization

Total Voice Count
50
Supported Languages
5
Gender Options
Male, Female
Age Variants
Adult
Pitch Adjustment Range
-20 to +20 semitones
Speed Adjustment Range
0.5x to 2.0x
Voice Cloning Supported
Yes
Emotion/Style Presets
Expressive, Natural
Max Voice Clones per Plan
1000

Language & Localization Support

Europe: English, French, German, Spanish, Portuguese; dialects: regional variants; SSML support: yes; number/date handling: full (localized formats)
Americas: English, Spanish, Portuguese; dialects: regional variants; SSML support: yes; number/date handling: full (USD, etc.)
Global: 5 languages; dialects: multiple locales; SSML support: yes; number/date handling: comprehensive

Technical Architecture & Infrastructure

Primary Model Architecture
Advanced neural TTS with cross-attention
Vocoder Technology
Neural vocoder
Model Parameter Count
500M+ parameters
Sample Rate Options
48kHz
Bit Depth Support
16-bit, 24-bit
Inference Hardware
GPU, CPU, AWS SageMaker
Memory Footprint
2GB+
GPU Acceleration
Yes
Model Versioning
Yes
Deployment Options
Cloud, Private Cloud, On-Prem

What API and Integration Capabilities Does Gradium Offer?

WebSocket APIs

Streaming inference for real-time applications

REST API

Standard HTTP endpoints for most web applications

Python SDK

Asynchronous client library

Rust SDK

High-performance client

LiveKit Integration

Compatible with the LiveKit agent framework

Pipecat Integration

Compatible with the Pipecat agent framework

AWS SageMaker

Deployed as a managed service

Private Cloud

Can be deployed on-premises

SSML Input Support

Audio generation from text marked up with Speech Synthesis Markup Language (SSML)

Streaming Output

Generates audio from text in real time
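Streaming APIs such as the WebSocket interface described above are typically fed audio in small, fixed-duration chunks. The sketch below splits raw PCM into 20 ms frames, assuming mono 16-bit audio at the 48 kHz sample rate listed in the architecture table; the frame length and audio format are assumptions, not Gradium's documented wire protocol, and the actual sending is left out.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 48_000  # sample rate listed in the architecture table
BYTES_PER_SAMPLE = 2     # 16-bit PCM (assumed)
CHANNELS = 1             # mono (assumed)
FRAME_MS = 20            # chunk duration; a common choice, assumed here


def frame_bytes(frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of raw PCM audio."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size chunks to send one by one over a streaming
    connection; the last chunk may be shorter."""
    size = frame_bytes(frame_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


one_second = b"\x00\x00" * SAMPLE_RATE_HZ
frames = list(pcm_frames(one_second))
print(frame_bytes(), len(frames))  # 1920 50
```

At these settings raw audio amounts to 96,000 bytes per second, so each 20 ms frame carries 1,920 bytes.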

What Is Gradium's Compliance and Security Certification Status?

GDPR Compliant: zero data retention options
TLS Encryption (In-Transit)
API Key Auth
Private Cloud Deployment: enterprise data privacy
SOC 2
ISO 27001
HIPAA: healthcare use cases supported

Licensing, Output Rights & Voice Ethics

Generated Audio Ownership
User owns output (commercial plans)
Commercial Use Rights
Yes
Voice Cloning Permitted
Yes
Voice Cloning Minimum Audio
10 seconds
Pricing Model
Credit-based (1 char TTS = 1 credit)
Pay as you go
$3.8-$4.0 per 1M credits
Free Tier Available
Yes
Enterprise Plans
Unlimited scaling
