Best Model Evaluation & Testing Software

A roundup of Model Evaluation & Testing software solutions.

11 products in Model Evaluation & Testing

Giskard

giskard.ai
#1 in this category

Giskard is an AI security testing platform that red-teams LLM agents to detect vulnerabilities such as hallucinations, prompt injections, and other security flaws. It is available as an open-source Python library and an enterprise Hub.

Arize AI

arize.com
#2 in this category

Arize AI is a unified LLM observability and agent evaluation platform for monitoring, troubleshooting, and improving AI models and applications in production.

WhyLabs

whylabs.ai
#3 in this category

WhyLabs is an AI observability platform that monitors machine learning models, data pipelines, and generative AI applications for quality, performance, security, and issues like drift and bias.
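Distribution drift, one of the issues observability platforms like WhyLabs monitor for, can be illustrated with a Population Stability Index (PSI) check. The sketch below is a generic, plain-Python illustration of the concept; the function and sample data are hypothetical and do not represent WhyLabs' API.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a live sample.

    Bins are derived from the baseline's range; a small epsilon guards
    against empty bins. Common rule of thumb: PSI > 0.25 signals drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Clamp to epsilon so log() never sees a zero fraction.
        return [max(c / len(sample), 1e-4) for c in counts]

    p, q = bin_fractions(expected), bin_fractions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [i / 100 for i in range(100)]        # roughly uniform on [0, 1)
shifted = [0.5 + i / 200 for i in range(100)]   # mass pushed to upper half

print(psi(baseline, baseline))  # identical distributions: PSI near 0
print(psi(baseline, shifted))   # shifted distribution: PSI well above 0.25
```

In a monitoring pipeline the baseline would be frozen at training time and the live sample drawn from a recent production window.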

Fiddler AI

fiddler.ai
#4 in this category

Fiddler AI is an all-in-one AI Observability and Security platform that provides real-time monitoring, guardrails, root cause analysis, and governance for deploying AI agents, LLMs, and ML models in production.

Truera

truera.com
#5 in this category

Truera provides an AI observability platform for machine learning monitoring, quality management, explainability, and predictive diagnostics across the model lifecycle.

Deepchecks

deepchecks.com
#6 in this category

Deepchecks is a platform for evaluating and monitoring machine learning models, with a focus on large language models (LLMs) to detect issues like hallucinations, bias, and performance drift.

Confident AI

confidentai.com
#7 in this category

Confident AI is a cloud platform for evaluating, testing, and monitoring large language model applications with metrics, observability tools, and CI/CD integration.
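The CI/CD-style eval gating described here can be sketched generically: run a fixed test suite against a model's outputs and fail the pipeline when a metric drops below a threshold. Everything below (`fake_model`, the test cases, `run_eval`, the 0.8 threshold) is a hypothetical stand-in, not Confident AI's API.

```python
def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers."""
    canned = {
        "capital of France?": "Paris",
        "2 + 2 = ?": "4",
        "largest planet?": "Jupiter",
    }
    return canned.get(prompt, "I don't know")

def run_eval(model, cases, threshold=0.8):
    """Exact-match accuracy over (prompt, expected) pairs plus a pass/fail gate."""
    passed = sum(model(prompt) == expected for prompt, expected in cases)
    accuracy = passed / len(cases)
    return accuracy, accuracy >= threshold

cases = [
    ("capital of France?", "Paris"),
    ("2 + 2 = ?", "4"),
    ("largest planet?", "Jupiter"),
    ("smallest prime?", "2"),  # fake_model misses this one
]
accuracy, ok = run_eval(fake_model, cases)
print(f"accuracy={accuracy:.2f} gate_passed={ok}")  # accuracy=0.75 gate_passed=False
```

In a real pipeline the gate's boolean would set the CI job's exit status, so a regression in eval scores blocks the deploy.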

Kolena

kolena.com
#8 in this category

Kolena is a San Francisco-based AI platform for testing, benchmarking, and validating machine learning models while automating document-heavy workflows in sectors like real estate, insurance, and finance.

CTGT

ctgt.ai
#9 in this category

CTGT is a product-focused frontier interpretability lab that uses mathematically guaranteed techniques to identify AI errors, biases, and hallucinations for safe, transparent deployment in high-stakes industries like healthcare and finance.

LMArena

lmarena.ai
#10 in this category

LMArena is a community-powered platform for blind head-to-head comparisons of AI models using real-user votes to generate human preference data and live leaderboards for evaluation.
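Turning blind pairwise votes into a leaderboard is typically done with a rating system; the sketch below uses a classic Elo update as an illustration (LMArena's actual methodology has used Bradley-Terry-style scoring, so this is a conceptual analogue, not their implementation, and the vote log is made up).

```python
def elo_update(r_a, r_b, winner, k=32):
    """One Elo update from a single blind A-vs-B vote (winner: 'a' or 'b')."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b

# Replay a small hypothetical vote log; every model starts at 1000.
ratings = {"model-x": 1000.0, "model-y": 1000.0}
votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "b"),
]
for a, b, w in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], w)

leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)  # model-x leads after winning 2 of 3 votes
```

Because updates are order-dependent and noisy vote-by-vote, production leaderboards usually fit ratings over the whole vote history at once rather than streaming updates like this.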

Braintrust

braintrustdata.com
#11 in this category

Braintrust is a platform for developing, evaluating, and observing AI applications, offering tools for prompt management, performance tracking, evals, logging, and production traces used by companies like Zapier and Instacart.