Giskard Review: Key Features and Pros&Cons

Name: Giskard
Author: Giskard

What it is:Giskard is an AI security testing platform that detects vulnerabilities in LLM agents through red teaming, including hallucinations, prompt injections, and security flaws. Open-source Python library and enterprise Hub available.
Best for:Enterprise AI teams building LLM agents, Organizations with compliance needs, Teams needing on-premise deployment
Pricing:Free tier available, paid plans from Contact for pricing
Rating:85/100Very Good
Expert's conclusion:Giskard is highly recommended for enterprises interested in LLM safety as well as continuous red teaming.

Reviewed byMaxim Manylov·Web3 Engineer & Serial Founder

Company Overview

Giskard is a software company out of France focused on providing testing and quality assurance for AI models, including detecting potential biases and vulnerabilities in AI models, and ensuring that the AI models meet compliance requirements. Giskard was founded in 2021 by experienced AI professionals who worked for Dataiku, Thales, and CERN. Giskard offers free open source and enterprise level solutions for AI teams to ensure their AI systems are reliable and ethical. Giskard also offers a comprehensive testing platform for both Large Language Models (LLMs) and other types of machine learning (ML) models for use within the enterprise.

Active

📍Paris, France

📅Founded 2021

🏢Private

TARGET SEGMENTS

AI/ML EngineersData ScientistsEnterprise AI TeamsFinancial InstitutionsLarge Corporations

Key Metrics

📊

$4.91M

Raised Funding

💵

$2M

Estimated Revenues

🏢

11-50

Employees

📊

Initial Revenues

Company Stage

📊

Popular open-source library

GitHub Popularity

Credibility Rating

85/100

Excellent

Giskard’s credibility is demonstrated by the support it received from well known venture capital firms and co-founders of leading AI companies, as well as the adoption of Giskard’s holistic testing platform by major firms and the rapid growth of Giskard’s open source framework that addresses many of the most pressing AI safety needs.

BREAKDOWN

Funding & Backers90/100

Market Traction88/100

Team Expertise92/100

Product Innovation85/100

TRUST SIGNALS

Backed by top VCs (Bessemer, Elaia)Endorsed by Hugging Face and Mistral co-foundersEnterprise customers including banks and Fortune 500 companiesOpen-source GitHub library gaining tractionFocus on EU AI Act compliance

Company History

2015

Founders Meet

At Capgemini, Alex Combessie and Jean-Marie John-Mathews form a connection which lays the groundwork for potential future collaboration related to AI.

2021

Company Founded

Giskard AI is formed in Paris by Alex Combessie, Jean-Marie John-Mathews, and Andrey Avtomonov in order to solve the quality problems associated with AI models.

2022

Early Growth

The product development process begins to accelerate with the focus on developing an open-source testing framework for AI.

2023

Funding & Expansion

Giskard raises $4.91 million from Bessemer Venture Partners and Elaia; releases an open-source LLM testing framework; expands its team to twenty employees.

Key Executives

Alex Combessie— Co-founder & Co-CEO: An engineer from former Dataiku with over eleven years of experience in NLP model integration and software engineering; also has experience in data science and business development.
Jean-Marie John-Mathews, PhD— Co-founder & Co-CEO: A PhD qualified researcher in AI ethics; former employee of Capgemini and Thales; has extensive experience in data science and philosophy.
Matteo Dora, PhD— Chief Technology Officer: Technical leadership role in AI development.
Kevin Messiaen— Software Engineering Lead: Leads engineering efforts for the Giskard platform.

Key Features

✨

Open-Source Python Library

Allows users to integrate the testing framework into their ML pipelines in order to create automated test suites that include performance testing, bias detection, hallucination detection, vulnerability scanning, and prompt injection attacks.

✨

AI Quality Hub

Premium tool used for debugging LLMs, comparing different models, analyzing errors, and documenting regulatory compliance.

✨

LLMon Monitoring

Provides real time evaluations of the output of LLMs for toxicity, hallucinations, and fact checking before the LLM output is delivered to users.

🔗

CI/CD Integration

Automated Testing in Development Pipelines via GitHub Scan Reports & Custom Test Suites for RAG Applications 14

👥

Red Teaming & Security Testing

Detection of AI Vulnerabilities Including Prompt Injection, Data Leakage, Harmful Content Generation, etc. 15

Tech Stack

Infrastructure

Self-hosted options available; cloud-agnostic with CI/CD pipeline support

Technologies

PythonOpen-Source SDK

Integrations

Hugging FaceMLFlowWeights & BiasesPyTorchTensorFlowLangChainOpenAI APIs

AI/ML Capabilities

LLM evaluation, RAG testing, model robustness, bias detection, vulnerability scanning, hallucination detection, automated test generation

Based on product descriptions from official site, TechCrunch, and company profiles

Use Cases

Enterprise AI Teams at Banks (e.g., Societe Generale, BPCE)

Ensures Regulatory Compliance & Detects Biases / Security Risks in Financial AI Models 16

LLM Developers Building Chatbots

Automatic Detection of Hallucinations, Prompt Injections & Misinformation 17

RAG Application Builders

Custom Test Suites Utilizing Vector Databases for Domain-Specific Accuracy 18

Data Science Teams in Manufacturing (e.g., Michelin)

CI/CD Integration for Robust Deployment of ML Models 19

GenAI Security Teams

Red Teaming/Vulnerability Scanning for Production AI Systems 20

Consumer-Facing AI Startups

Real-Time Monitoring(LLMon) to Prevent Toxic/Harmful Outputs 21

NOT FORNon-AI Software Teams

Limited Value Without ML Model Evaluation Needs 22

NOT FORSmall Teams Without CI/CD

Advanced Automation Requirements Existent Only With Development Pipeline Infrastructure 23

Pricing

Pricing information with service tiers, costs, and details
☐Service	$Cost	ℹDetails	🔗Source
Open-Source Library	Free	Free open-source library for AI red teaming, LLM evaluation, and testing.	Giskard Pricing page
Giskard Hub (Enterprise Platform)	Contact for pricing	Enterprise platform for continuous AI red teaming, LLM security testing, and RAG evaluation. No hidden fees; additional costs for customization.	Giskard website and CompareYourTech
On-Premise Deployment	Custom (contact sales)	Available for mission-critical workloads in public sector, defense, or sensitive applications.	Giskard FAQ

Open-Source LibraryFree

Free open-source library for AI red teaming, LLM evaluation, and testing.

Giskard Pricing page

Giskard Hub (Enterprise Platform)Contact for pricing

Enterprise platform for continuous AI red teaming, LLM security testing, and RAG evaluation. No hidden fees; additional costs for customization.

Giskard website and CompareYourTech

On-Premise DeploymentCustom (contact sales)

Available for mission-critical workloads in public sector, defense, or sensitive applications.

Giskard FAQ

Competitive Comparison

Feature	Giskard	Competitors (General)
Automated Red Teaming	Yes - Dynamic multi-turn attacks, 40+ probes	Often static testing
LLM Security Testing	Yes - Hallucinations, vulnerabilities, domain-specific	Limited domain specificity
Open-Source Option	Yes - 1900+ GitHub stars	Varies
On-Premise Support	Yes - For sensitive apps	Limited
Data Residency (EU/US)	Yes	Not always
RBAC & Audit Trails	Yes	Varies
SOC 2 Type II, HIPAA	Yes	Varies

Automated Red Teaming

GiskardYes - Dynamic multi-turn attacks, 40+ probes

Competitors (General)Often static testing

LLM Security Testing

GiskardYes - Hallucinations, vulnerabilities, domain-specific

Competitors (General)Limited domain specificity

Open-Source Option

GiskardYes - 1900+ GitHub stars

Competitors (General)Varies

On-Premise Support

GiskardYes - For sensitive apps

Competitors (General)Limited

Data Residency (EU/US)

GiskardYes

Competitors (General)Not always

RBAC & Audit Trails

GiskardYes

Competitors (General)Varies

SOC 2 Type II, HIPAA

GiskardYes

Competitors (General)Varies

Competitive Position

vs Static LLM Testing Tools

Giskard Uses Dynamic, Multi-Turn Red Teaming Agents Adapting in Real Time vs Static Probes 24

Superior for Detection of Sophisticated Conversational Vulnerabilities 27

vs Network-Layer Security Tools

Giskard Tests Domain-Specific Hallucinations & Quality Issues Proactively in Development Pipeline 25

Prevents Business Failures Missed by Reactive Monitoring 28

vs General ML Testing Platforms

Giskard Specializes in Black-Box API Testing of LLM Agents with Reproducible Test Suites 26

Strongest for Conversational AI Security & Regression Prevention 29

Pros Cons

Pros

Comprehensive Coverage of LLM Vulnerabilities – 40+ Probes for Security & Business Failures 30
Open Source Library Available – Free Tier w/ 1900+ GitHub Stars 31
Proactive Testing Pipeline – Converts Issues to Reproducible Test Suites to Prevent Regressions 32
Enterprise Grade Security – SOC 2 Type II, HIPAA, GDPR, Data Residency Options 33
On-Premise Deployment Option – Available for Sensitive Mission-Critical Applications 34

Cons

No Public Pricing Information – Contact Sales for Quotes 35
Limited to Text-To-Text Conversational Agents – Requires API Endpoint Access 36
Additional Costs May be Added For Customization – Fees for Specific Enterprise Needs 37 START_TEXT

Best For

Enterprise AI teams building LLM agents — The company provides continuous Red Teaming, Security Testing & Regression Prevention in Development Pipeline.
Organizations with compliance needs — Company offers GDPR, SOC 2 Type II, HIPAA Compliance along with Data Residency & Role Based Access Control.
Teams needing on-premise deployment — Company supports Sensitive Applications in Public Sector or Defense.

Not Suitable For

Small teams with simple ML models — Pricing is Enterprise Focused but Open-Source is suitable for Basic Needs.
Non-API accessible LLM agents — Black-Box testing just requires API Endpoint Access Only.
Real-time only guardrail users — Company focuses on Batch Evaluations and Proactive Testing rather than Pure Runtime Protection.

Limits Restrictions

Supported Agents: Conversational AI agents in text-to-text mode via API endpoint (black-box)
Open-Source Scope: Core evaluation and testing; enterprise features require Hub
Deployment Options: Cloud (EU/US residency), on-premise for qualified customers
Data Policy: 0-training policy; no model training on customer data

Security Compliance

SOC 2 Type IIAchieved compliance certification.

HIPAACompliance for handling protected health information.

GDPRNative adherence as European entity.

Data Residency & IsolationEU or US processing with isolation guarantees.

RBAC & Audit TrailsRole-based access control, audit logging, IP provider integration.

EncryptionEnd-to-end encryption at rest and in transit.

0-Training PolicyNo training on customer data or IP.

Customer Support

Channels

https://www.giskard.ai/contactprivacy@giskard.aiCustomer engineers via call and email for Enterprise subscribers24x7x365 technical support through AWS infrastructure

Hours: Enterprise support: Not specified; AWS support: 24x7x365
Response Time: Fast-response for AWS support; enterprise details not specified
Satisfaction: Not publicly available
Specialized: Technical consulting from AI security team available after Hub subscription
Business Tier: Giskard Enterprise subscription provides dedicated customer engineers via call and email

Support Limitations

•Basic support via contact form; full customer engineer access requires Enterprise subscription

•No mention of live chat or phone support for standard users

Api Integrations

Black-box API Testing: Supports LLM agents accessible via API endpoint without needing internal model details
Hugging Face Integration: Compatible with Hugging Face Hub models and API
LangChain Embeddings: Supports LangChain for knowledge base integration in evaluations
RetrievalQA Chains: Integrates with LangChain RetrievalQA for RAG-based LLM testing
AWS Marketplace: Deployable via AWS with API access

Faq

What support channels does Giskard offer?

For Support Customers have contact form at giskard.ai/contact, Email Support and Enterprise Subscribers also receive Customer Engineers via Call/Email. If you are an AWS user then you will receive 24x7 support.

Is Giskard suitable for testing LLM chatbots?

Yes Giskard has specialization in Evaluation of Conversational AI Agents in Text-To-Text Mode through Black Box API Testing.

What security features does Giskard provide?

Company has Data Residency (EU/US), RBAC, Audit Trails, IP Controls & 0 Training Policy to Protect Customer Data.

Does Giskard require model internals for testing?

No, Giskard works as Black-Box Testing - Company Just Requires Access to Your LLM Agent's API End Point.

How does Giskard help fix AI vulnerabilities?

Following Scanning Enterprise Users Can Get Technical Consulting from Giskard's AI Security Team for Mitigation.

What enterprise customers use Giskard?

AXA, BNP Paribas, Michelin & Societe Generale Trust Giskard for LLM Evaluation & Security.

Is Giskard accessible to non-technical users?

Yes, Designed for Business Stakeholders with Collaborative Red-Teaming Playground & Annotation Tools.

What pricing model does Giskard use?

AWS Marketplace Charges Per Project Initialization; Enterprise Subscription Available with Dedicated Support.

Expert Verdict

Giskard is Specialized AI Safety Platform That Excels In LLM Red Teaming & Vulnerability Detection Through Black-Box API Testing. It Has Enterprise Grade Security Features & Integration Capabilities Making It Ideal for Production Deployments of AI. Customer Validation From Major Enterprises Confirm Its Effectiveness.

Teams of Developers Building LLM Applications.
Companies Prioritizing AI Security & Compliance.
ML Engineers Needing Robust Model Evaluation Pipelines.
Companies Using Hugging Face or LangChain Ecosystems.

!
Use With Caution

Small Teams Wanting Simple Open-Source Testing
Users requiring the ability to inspect their white box models of LLMs
Start-ups with budgetary constraints that do not have the need for an enterprise version of a product

Not Recommended For

Testing of non-LLM machine learning (ML) models
Only real time inference monitoring

Expert's Conclusion

Giskard is highly recommended for enterprises interested in LLM safety as well as continuous red teaming.

Best For

Teams of Developers Building LLM Applications.Companies Prioritizing AI Security & Compliance.ML Engineers Needing Robust Model Evaluation Pipelines.

Research Summary

Key Findings

Giskard provides black-box security testing of LLMs through API's, and has strong adoption from enterprise clients (AXA, BNP Paribas), EU/US data residency options, and dedicated customer engineering support for enterprise clients.

Data Quality

High - Direct from official site, AWS Marketplace, and customer case studies

Risk Factors

Enterprise-level support requires paid subscription

The limited information provided regarding the free tier does not provide enough detail

Quality of support can vary based on subscription level

Last updated: January 2026

Additional Info

Enterprise Customers

Giskard is trusted by AXA, BNP Paribas, Michelin, and Societe Generale for production evaluation pipelines of LLMs.

Security & Compliance

Data residency options are available within the EU and US, Role Based Access Control (RBAC), Audit Trails, 0-training policy protects customer and client IP/data.

Deployment Options

Giskard is available on the AWS Marketplace and includes infrastructure support, and will support any LLM agent accessible via API.

Use Cases

Chatbots providing customer service, conversational AI agents, etc.; Giskard excels in detecting hallucinations, robustness testing.

Technical Focus

Black-box testing does not require knowledge of internal model structure; Giskard integrates with Hugging Face and LangChain ecosystems.

Business Accessibility

Red-teaming collaborative environment and annotation tools for domain experts and Product Managers.

Alternatives

•
AgentBench: Benchmarking platform for AI agents; more research-focused than Giskard's production security focused approach. Better suited for academic evaluation.
•
Sendbird AI Agent Platform: Automation of customer support across SMS/web/mobile; channel-focused rather than Giskard's focus on model security testing. Better suited for multi-channel deployment.
•
LangSmith: LangChain tracing/debugging tool; developer-centric rather than Giskard's focus on security and red teaming. Better suited for LangChain-specific workflows.
•
Weights & Biases (W&B): Platform for tracking experiments of ML; broad ML ops rather than Giskard's specific focus on LLM security. Better suited for model training workflows.
•
Honeycomb: Observability platform; runtime monitoring rather than Giskard's focus on pre-deployment vulnerability testing. Better suited for production inference.

Evaluation Metrics

Ratio of correctly predicted events

Accuracy

Harmonic mean of precision and recall

F1-score

Plots false positives vs true positives

AUC (Area Under Curve)

Prediction uncertainty measure

Log Loss

True positives over predicted positives

Precision

True positives over actual positives

Recall

String comparison to expected output

Exact Match

Testing Capabilities

Automated Vulnerability Scanning

Finds hidden vulnerabilities using scan reports.

Performance Bias Detection

Examines the bias that is present in different subgroups by slicing the metrics.

Subgroup Fairness Testing

Evaluates how well a model performs on protected or proxy variables.

Custom Test Suites

Provides a way to consistently compare the same metrics across multiple models.

RAG System Evaluation

Metrics for evaluating the performance of a model in which the retrieval-augmentation of generation is being used.

Cross-Validation Support

Multiple data sets for measuring robustness when evaluating efficiency.

Benchmark Support

Benchmark	Description	Supported
LLM Evaluation	Fairness, robustness, explainability metrics	Yes
RAG Evaluation	Correctness, context precision by topic	Yes
Classification Benchmarks	Accuracy, F1, AUC on standard datasets	Yes
Fairness Benchmarks	Demographic parity across subgroups	Yes