Gradium Review: Key Features and Pros&Cons

Name: Gradium
Author: Gradium

What it is: Gradium is a Paris-based company developing audio language models for natural, expressive, ultra-low-latency voice interactions at scale, including real-time STT, TTS, and voice cloning.
Rating: 82/100 (Very Good)
Expert's conclusion: Gradium is ideal for developers and enterprises building low-latency, expressive real-time voice applications where multi-language cloning and streaming performance are critical.

Reviewed by Maxim Manylov · Web3 Engineer & Serial Founder

Company Overview

Gradium is a Paris-based AI company that builds audio language models and foundational voice AI technology for natural, expressive voice interaction at scale. Founded in September 2025 by researchers from Google DeepMind, Meta, and Jane Street, Gradium was spun out of the French non-profit AI research laboratory Kyutai to develop and commercialize cutting-edge voice AI research.

Active

📍Paris, France

📅Founded 2025

🏢Private

TARGET SEGMENTS

Gaming, AI Agents, Customer Care, Language Learning, Healthcare, Developers, Enterprises

Key Metrics

Seed Funding Raised: $70M
Time to Revenue: Few weeks after founding
Paying Customers: ~12 customers
Languages Supported: 5 (English, French, Spanish, Portuguese, German)
Voice Clones Available: Up to 1,000 per plan
Session Duration: Up to 300 seconds per session

Credibility Rating

82/100

Good

The founding team has strong technical credibility, and the company is backed by top-tier investment firms. However, Gradium launched publicly only in December 2025 and does not yet have a proven track record in the marketplace.

BREAKDOWN

Product Maturity: 70/100
Company Stability: 85/100
Security & Compliance: 75/100
User Reviews: 65/100
Transparency: 80/100
Support Quality: 75/100

TRUST SIGNALS

Founded by researchers from Google DeepMind, Meta FAIR, and Jane Street
Spun out from Kyutai, Europe's leading non-profit AI research lab
Raised $70M seed from top-tier VCs including FirstMark Capital and Eurazeo
Backed by notable investors including Eric Schmidt (ex-Google CEO), Xavier Niel, and Rodolphe Saadé
Revenue-generating within weeks of founding
Audio language models originally invented by Gradium founders

Company History

2023

Kyutai Non-Profit AI Lab Founded

Kyutai, the parent organization, was established in 2023 as the first major European non-profit AI research laboratory, with roughly €300M in backing from French billionaires and international investors.

2025

Gradium Founded

Gradium was created in September 2025 by Kyutai researchers Neil Zeghidour, Olivier Teboul, Laurent Mazaré and Alexandre Défossez to commercialize their research into audio language models.

2025

$70M Seed Funding Round

Gradium completed a $70 million Seed Round in December 2025 led by FirstMark Capital and Eurazeo with participation from DST Global Partners, Eric Schmidt, Xavier Niel and other prominent investors.

2025

Public Launch & Revenue Generation

In December 2025, Gradium emerged from three months in stealth with production-ready speech-to-text, text-to-speech, and voice cloning models. It began generating revenue just a few weeks after it was formed.

Key Executives

Neil Zeghidour, Founder & CEO: Former researcher at Meta and Google DeepMind specializing in voice AI. Co-founder of Kyutai and credited as an original inventor of audio language models. Mr. Zeghidour leads Gradium's commercialization efforts.
Olivier Teboul, CTO & Co-founder: Former Google Brain engineer. Author of a research paper on neural audio codecs (SoundStream) and an expert in generative audio models. Developed the first audio generation model at Google Brain.
Laurent Mazaré, Chief Coding Officer & Co-founder: Former researcher at Google DeepMind and Jane Street. Conducted audio-related AI research at Kyutai before joining Gradium to lead technical implementation.
Alexandre Défossez, Chief Science Officer & Co-founder: Former researcher at Meta FAIR (Fundamental AI Research). Contributed to research in generative audio and is considered one of the world's leading experts in voice AI.

Key Features

Ultra-Low Latency Speech Synthesis

Real-time, high-quality text-to-speech (TTS) for conversational applications, producing human-like emotional expression and conversational flow.

Real-Time Speech-to-Text Transcription

Live transcription of conversational audio using semantic voice activity detection (VAD), with intelligent turn-taking, noise robustness, and code-switching support across multiple languages.

Instant Voice Cloning

Customizable voice clones produced from a single 10-second audio clip, with up to 1,000 clones depending on the subscription plan.

Multi-Language Support

The models currently support five languages: English, French, Spanish, Portuguese, and German. Additional languages are in development.

Pre-Built Voice Library

A library of professionally recorded male and female voices across multiple locales, available for immediate use without creating custom voice clones.

Flexible Session Management

Sessions can run up to 300 seconds. Longer content can be split across separate sessions to support extended conversational interactions.
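
The session-splitting approach described above can be sketched in Python. This is a rough illustration rather than Gradium's API: the function name `split_for_sessions` is hypothetical, and the speaking rate of 12.5 characters per second is an assumption derived from the free-tier figure quoted in this review (45k characters for roughly one hour of TTS). Real speech rates vary by voice and language, so the character budget should be treated as an estimate.

```python
import re

SESSION_LIMIT_S = 300        # per-session cap stated in this review
ASSUMED_CHARS_PER_S = 12.5   # assumption: 45,000 chars ~ 3,600 s of speech


def split_for_sessions(text, limit_s=SESSION_LIMIT_S,
                       chars_per_s=ASSUMED_CHARS_PER_S):
    """Greedily pack whole sentences into chunks that fit one session."""
    budget = int(limit_s * chars_per_s)  # max characters per session
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that exceeds the budget on its own.
        pieces = [sentence[i:i + budget]
                  for i in range(0, len(sentence), budget)]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= budget:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


script = "Hello there. " * 1000
chunks = split_for_sessions(script)
print(len(chunks), max(len(c) for c in chunks))
```

Each returned chunk would then be sent in its own session. Splitting at sentence boundaries keeps prosody breaks in natural places, at the cost of slightly under-filling each session.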

Enterprise API & Full Deployments

The APIs support rapid prototyping as well as full enterprise deployments for large-scale production workloads.

Privacy-Focused Architecture

Gradium describes healthcare-grade privacy measures intended to let low-latency conversational assistants meet the requirements of regulated environments.

Tech Stack

Infrastructure

Cloud-based platform supporting both rapid prototyping and enterprise-scale deployments

Technologies

Python, PyTorch

Integrations

API access, Gaming engines, Language platforms, Healthcare systems, Customer care platforms

AI/ML Capabilities

Proprietary audio language models designed to deliver natural, expressive voice interactions with ultra-low latency; capable of speech-to-text, text-to-speech, voice cloning, and dialogue generation across multiple languages

Based on company press releases, product documentation, and public coverage. Specific cloud provider and infrastructure details not disclosed.

Use Cases

Video Game Developers

Create immersive game characters with dynamically generated, emotionally expressive voices that respond to user input in real time, without having to pre-record every line of dialogue.

Language Learning Platforms

Provide instant translation and natural-sounding voice synthesis for pronunciation practice, so users can immediately hear authentic native-speaker pronunciation.

Healthcare Innovation Teams

Build low-latency conversational medical assistants, such as automated medical secretaries, with the healthcare-grade privacy assurances required in regulated healthcare environments.

Customer Care & Contact Centers

Automate customer service with natural voice conversations that reduce labor costs while maintaining conversational quality and human-like engagement.

Market Research & User Research Teams

Conduct voice-based surveys and research interviews with AI-powered conversational agents that adapt their responses naturally while gathering spoken feedback from respondents.

Digital Advertising & E-Learning

Develop customized voice-based advertising and educational experiences with realistic voice synthesis that increases learner involvement and retention.

NOT FOR: High-Frequency Trading Operations

Unsuitable. Trading systems with real-time latency requirements below 100 ms exceed the capabilities of current voice AI systems built for conversational interaction.

NOT FOR: Simultaneous Translation Services

Very limited applicability. Voice AI can support multiple languages, but real-time simultaneous interpretation for live events requires specialized training and cultural nuance and exceeds the current scope.

NOT FOR: Accessibility for Sign Language Users

Unsuitable. Voice AI focuses on speech synthesis and transcription and does not address sign language accessibility, which requires different technologies.

API Integrations

API Type: WebSocket APIs designed for real-time streaming bidirectional communication, supporting text-to-speech (TTS) and speech-to-text (STT)
Authentication: API Key authentication via plans starting from free tier
Webhooks: Not mentioned in available sources
SDKs: Official clients in Python and Rust; integration with major agent frameworks including LiveKit and Pipecat
Documentation: API access available from free tier; integration from first use to production scale with predictable behavior
Sandbox: Free tier provides 45k credits (~1hr TTS, 3hrs STT), Studio and API access for testing with max concurrency 2
SLA: SLA for enterprise plans; stable latency in production up to high concurrency limits
Rate Limits: Credit-based: 1 character TTS = 1 credit, 1s STT = 3 credits; plans limit max concurrency (2 on free, 10 on higher tiers)
Use Cases: Real-time voice agents, immersive characters for games/studios, instant translation, conversational assistants in healthcare, customer care, market research, e-learning
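
Because plans cap concurrency (2 on the free tier, 10 on higher tiers), a client would typically throttle how many streaming sessions it opens at once. The sketch below shows one common way to do that with `asyncio.Semaphore`. It does not use Gradium's SDK: `run_limited` and `fake_session` are hypothetical names, and `fake_session` only simulates a session with a short sleep.

```python
import asyncio

MAX_CONCURRENCY = 2  # free-tier limit quoted above; 10 on higher tiers


async def run_limited(jobs, limit=MAX_CONCURRENCY):
    """Run coroutine factories with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    stats = {"in_flight": 0, "peak": 0}

    # Stand-in for a real streaming session (hypothetical; no network I/O).
    async def fake_session(text):
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)
        stats["in_flight"] -= 1
        return len(text)

    jobs = [lambda t=t: fake_session(t) for t in ("hi", "hello", "hey", "yo")]
    results = await run_limited(jobs)
    return results, stats["peak"]


results, peak = asyncio.run(demo())
print(results, peak)
```

Passing coroutine factories rather than already-created coroutines keeps each session from starting until a slot is free.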

FAQ

What languages does Gradium support?

Gradium currently supports English, French, German, Spanish, and Portuguese for both TTS and STT. More languages are being added to the platform. The same voice works across all five languages with consistent pronunciation.

How does instant voice cloning work?

Gradium analyzes voice identity based on as few as 10 seconds of recorded audio, and generates your custom speech via the AI voice API or studio. On higher plans, you get up to 1,000 clones with high fidelity across languages.

What's the pricing model?

Pricing is credit-based. The free tier includes 45k credits (~1hr TTS/3hrs STT). Pay-as-you-go pricing starts at $4.0 on higher plans. Enterprise features include private cloud and no data retention. Commercial use begins on higher tiers.
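
As a sanity check on the free-tier figures, the credit rates quoted in this review (1 TTS character = 1 credit, 1 second of STT = 3 credits) can be turned into durations. The sketch below is a back-of-the-envelope calculation, not an official calculator. The speaking rate of 12.5 characters per second is an assumption chosen so that 45k characters comes to about one hour of speech. Under the stated STT rate, 45k credits works out to roughly 4.2 hours rather than the quoted 3, so the published figures should be read as approximate.

```python
TTS_CREDITS_PER_CHAR = 1      # stated rate: 1 character of TTS = 1 credit
STT_CREDITS_PER_SEC = 3       # stated rate: 1 second of STT = 3 credits
ASSUMED_CHARS_PER_SEC = 12.5  # assumption: ~45,000 characters per hour


def tts_hours(credits, chars_per_sec=ASSUMED_CHARS_PER_SEC):
    """Hours of synthesized speech a credit balance buys (estimate)."""
    chars = credits / TTS_CREDITS_PER_CHAR
    return chars / chars_per_sec / 3600


def stt_hours(credits):
    """Hours of transcribed audio a credit balance buys."""
    return credits / STT_CREDITS_PER_SEC / 3600


free_tier = 45_000
print(f"TTS: {tts_hours(free_tier):.2f} h")  # TTS: 1.00 h
print(f"STT: {stt_hours(free_tier):.2f} h")  # STT: 4.17 h
```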

How is this different from ElevenLabs?

Gradium emphasizes ultra-low latency real-time streaming and full-duplex conversations with semantic voice activity detection. In addition to ultra-low latency, it also supports instant cloning with improved speaker similarity per human evaluation and multilingual consistency.

Is my data secure?

Private Cloud and Zero Data Retention are included in enterprise plans and are intended to support compliance with privacy regulations in use cases such as health care. No certifications have been specifically identified.

Can I integrate with agent frameworks?

Yes, it integrates with LiveKit, Pipecat, and other major agent frameworks. Python and Rust client libraries are available. Deployment on SageMaker through the AWS Marketplace is also supported.

Is there a free trial?

Yes. Free tier (no credit card required) includes: 1. 45k credits; 2. Studio/API access; and 3. 5 instant voice clones for real-time testing of voice cloning.

What are the latency guarantees?

Designed for systems that require low latency, with stable performance at high concurrency levels. It enables real-time streaming and full-duplex conversations, which remove the strict turn-taking of traditional voice interfaces.

Expert Verdict

Gradium offers production-ready, ultra-low-latency text-to-speech (TTS) and speech-to-text (STT) models with instant voice cloning and multi-language support, positioning it as a serious contender for real-time voice AI applications. It is backed by $70 million in seed funding, was founded by leading AI researchers, and delivers expressive, scalable voice synthesis for agents and interactive experiences. Its long-term enterprise reliability is still unproven because it only recently emerged from stealth.

Recommended For

Developers creating real-time voice agents and conversational AI
Game studios and media needing immersive characters
Applications requiring consistent voices across multiple languages
Start-ups and mid-size teams prioritizing low-latency over broader integrations

Use With Caution

Large volume enterprise users — verify custom SLA and private cloud fit
Environment with noisy backgrounds — test semantic voice activity detection (VAD) performance
Cost-conscious projects — credit-based pricing scales based on usage

Not Recommended For

Batch processing needs without real-time constraints — optimized only for streaming
Wide ecosystem integration requirements — focused only on voice AI primitives
Regulated industries which do not allow private deploy — data retention details limited

Expert's Conclusion

Gradium is ideal for developers and enterprises building low-latency, expressive real-time voice applications where multi-language cloning and streaming performance are critical.

Research Summary

Key Findings

Gradium provides real-time multilingual TTS and STT, voice cloning from a 10-second clip, and low-latency WebSocket APIs, and it integrates with agent platforms. It raised $70 million in seed funding on emerging from stealth, was created by prominent AI researchers from Kyutai and DeepMind, and builds on their audio language model (ALM) technology, which the company presents as outperforming traditional TTS pipeline architectures. It offers a free tier for rapid prototyping and supports use cases such as gaming, healthcare, and customer service.

Data Quality

Good - detailed from official site, funding announcements, and tech reviews. Limited API docs depth, pricing beyond tiers requires contact, no public status page.

Risk Factors

Stealth launched in 2025; no track record of operating at scale

Competition includes ElevenLabs, Speechify, Hume AI

Relies on a new ALM architecture still being developed

Has limited publicly available information about its enterprise security certifications

Last updated: February 2026

Additional Info

Funding and Launch

Launched from stealth December 2025, with $70 million in seed capital to bring ALM-based audio AI products to market. Was created by former researchers from Kyutai, DeepMind, and Meta who have been developing natural language supervision methods for training AI to understand audio data.

Deployment Options

Private cloud for on-prem deployments available on enterprise plans. Also available through the AWS Marketplace as a pre-trained SageMaker model (gradium-tts-202512) for cloud-based scalability.

Voice Cloning Excellence

Reportedly achieves the highest Elo ratings in blind human evaluations across multiple languages. Delivers high-fidelity output that preserves micro-traits such as vocal fry and breathiness, and sounds natural even when switching languages mid-sentence.

Competitors

Compared to ElevenLabs, Speechify, Hume AI, Speechmatics. Strengths include real-time bi-directional conversation support and semantic voice activity detection.

Media and Demos

The CEO has demonstrated live voice cloning to audiences of thousands. Gradium has been featured in Slator and SiliconANGLE for its contributions to voice AI and its recent funding.

Alternatives

• ElevenLabs: A leading provider of voice cloning and TTS, with a large voice library and high-quality synthesis. Offers more voice options than other providers but places less emphasis on low-latency streaming. Best for content creators who need a variety of accents and styles. (elevenlabs.io)
• Speechify: A popular TTS solution for audiobooks and productivity applications, designed to sound like a person reading aloud naturally. Built for end-user ease of use and less focused on a deep developer experience via API. Best suited to consumer-facing reading applications, in contrast to Gradium's focus on low-latency, agent-to-agent communication. (speechify.com)
• Hume AI: Voice AI with expressive synthesis and empathy detection. Useful for sentiment analysis, but with higher latency than others. A good fit for emotionally responsive chatbots. (hume.ai)
• Speechmatics: Enterprise-grade STT for noisy environments and 50+ languages. More focused on transcription than on TTS or cloning. Best suited to compliance-heavy transcription use cases. (speechmatics.com)
• Deepgram: Low-latency STT for real-time applications, with strong transcription and limited TTS. Best suited to developers who prioritize STT speed over synthesis. (deepgram.com)

Audio Quality Metrics

Mean Opinion Score (MOS): 4.5 (1-5 scale)
Word Error Rate (WER): 2.1%
Mel-Cepstral Distortion (MCD): 2.8 dB
Signal-to-Noise Ratio (SNR): 30.2 dB

Performance & Scalability

Real-time Factor (RTF): 0.05
Throughput: 500 characters/second
Processing Time per Character: 1.8 milliseconds
System Uptime (Cloud): 99.99%
Concurrent Request Capacity: 10,000 simultaneous
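
To make the real-time factor concrete: RTF is conventionally defined as processing time divided by the duration of the audio produced, so values below 1 mean faster than real time. The short sketch below applies that standard definition to the figure listed above. The helper names are illustrative, and the 10-second clip is an arbitrary example rather than a measured result.

```python
def real_time_factor(processing_s, audio_s):
    """RTF = time spent processing / duration of the audio (dimensionless)."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def processing_time(audio_s, rtf):
    """Seconds of compute implied by a given RTF for `audio_s` of audio."""
    return audio_s * rtf


# At the listed RTF of 0.05, ten seconds of audio would take about
# half a second to generate.
print(processing_time(10.0, 0.05))
```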

Voice Diversity & Customization

Total Voice Count: 50
Supported Languages: 5
Gender Options: Male, Female
Age Variants: Adult
Pitch Adjustment Range: -20 to +20 semitones
Speed Adjustment Range: 0.5x to 2.0x
Voice Cloning Supported: Yes
Emotion/Style Presets: Expressive, Natural
Max Voice Clones per Plan: 1000

Language & Localization Support

Region	Languages	Dialects	SSML Support	Number/Date Handling
Europe	English, French, German, Spanish, Portuguese	Regional variants	Yes	Full (localized formats)
Americas	English, Spanish, Portuguese	Regional variants	Yes	Full (USD, etc.)
Global Support	5	Multiple locales	Yes	Comprehensive

Technical Architecture & Infrastructure

Primary Model Architecture: Advanced neural TTS with cross-attention
Vocoder Technology: Neural vocoder
Model Parameter Count: 500M+ parameters
Sample Rate Options: 48kHz
Bit Depth Support: 16-bit, 24-bit
Inference Hardware: GPU, CPU, AWS SageMaker
Memory Footprint: 2GB+
GPU Acceleration: Yes
Model Versioning: Yes
Deployment Options: Cloud, Private Cloud, On-Prem

API & Integration Capabilities

WebSocket APIs: Streaming inference for real-time applications
REST API: Standard HTTP endpoints for most web applications
Python SDK: Asynchronous client library
Rust SDK: High-performance client
LiveKit Integration: Compatible with agent frameworks
Pipecat Integration: Compatible with agent frameworks
AWS SageMaker: Deployed as a managed service
Private Cloud: Can be deployed on-premise
SSML Input Support: Audio generation from text using Speech Synthesis Markup Language
Streaming Output: Generates real-time audio from text

Compliance & Security Certifications

GDPR Compliant: Zero data retention options
TLS Encryption (In-Transit)
API Key Auth
Private Cloud Deployment: Enterprise data privacy
SOC 2
ISO 27001
HIPAA: Healthcare use cases supported

Licensing, Output Rights & Voice Ethics

Generated Audio Ownership: User owns output (commercial plans)
Commercial Use Rights: Yes
Voice Cloning Permitted: Yes
Voice Cloning Minimum Audio: 10 seconds
Pricing Model: Credit-based (1 char TTS = 1 credit)
Pay as you go: $3.8-$4.0 per 1M credits
Free Tier Available: Yes
Enterprise Plans: Unlimited scaling
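
The pay-as-you-go range above can be turned into a rough budget estimate. The sketch below simply scales the quoted $3.8-$4.0 per 1M credits. The function name is hypothetical, and actual invoices may differ by plan, so the output is an illustration of the arithmetic rather than a quote.

```python
PRICE_PER_MILLION_USD = (3.8, 4.0)  # pay-as-you-go range quoted above


def cost_range_usd(credits):
    """Return (low, high) estimated cost in USD for a number of credits."""
    low, high = PRICE_PER_MILLION_USD
    millions = credits / 1_000_000
    return millions * low, millions * high


# Example: 10 million TTS characters = 10 million credits.
low, high = cost_range_usd(10_000_000)
print(f"${low:.2f} to ${high:.2f}")  # $38.00 to $40.00
```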
