# Arklex

> Arklex is an AI agent testing and governance company. **ArkSim** is the open-source agent simulation and evaluation framework (Apache-2.0). The **Arklex Platform** provides cloud-based agent testing, governance, and deployment tools. This file covers ArkSim, the open-source product.

## Company

- [Homepage](https://arklex.ai): Arklex company homepage with product overview, core features, and FAQ
- [About Us](https://arklex.ai/home/aboutus): Arklex team, mission, and company background
- [Blog](https://arklex.ai/home/blogs): Technical articles on AI agent testing, evaluation strategies, and product updates

## ArkSim (Open Source)

ArkSim is a Python CLI tool that simulates realistic multi-turn conversations with AI agents and evaluates their performance. Install with `pip install arksim`.

### Documentation

- [ArkSim Docs](https://docs.arklex.ai): Full documentation
- [ArkSim Docs llms.txt](https://docs.arklex.ai/llms.txt): Machine-readable index of all documentation pages
- [Overview](https://docs.arklex.ai/main/overview): What ArkSim is, who it is for, core capabilities, and FAQ
- [Quickstart](https://docs.arklex.ai/main/quickstart): Install and run your first simulation in minutes
- [Build Scenarios](https://docs.arklex.ai/main/build-scenario): Define test scenarios with user profiles, goals, and knowledge
- [Simulate Conversations](https://docs.arklex.ai/main/simulate-conversation): Generate multi-turn conversations between synthetic users and your agent
- [Evaluate Conversations](https://docs.arklex.ai/main/evaluate-conversation): Score agent performance on quantitative and qualitative metrics
- [Tool Call Capture](https://docs.arklex.ai/main/tool-call-capture): Capture and evaluate tool calls from agent responses
- [Integrations](https://docs.arklex.ai/main/integrations): Connect to agents built with LangChain, CrewAI, OpenAI Agents SDK, and 14+ frameworks
- [CI Integration](https://docs.arklex.ai/main/ci-integration): Run ArkSim as a quality gate in GitHub Actions, CircleCI, or any CI pipeline

### Source Code and Releases

- [GitHub](https://github.com/arklexai/arksim): Source code (Apache-2.0)
- [PyPI](https://pypi.org/project/arksim/): Install with pip install arksim
- [Changelog](https://github.com/arklexai/arksim/blob/main/CHANGELOG.md): Release history

### Research

- [Research Paper (arXiv:2510.11997)](https://arxiv.org/abs/2510.11997): Academic paper on simulation-based agent evaluation

### Example Evaluations

- [Insurance Customer Service Agent](https://docs.arklex.ai/main/insurance-customer-service-agent-evaluation): End-to-end evaluation of an insurance support agent
- [E-Commerce Customer Service Agent](https://docs.arklex.ai/main/e-commerce-customer-service-agent-evaluation): Multi-scenario evaluation of a shopping assistant
- [Tool-Calling Agent](https://docs.arklex.ai/main/customer-service-tool-calling-agent-evaluation): Evaluation of an agent that uses external tools
- [Personal AI Assistant (OpenClaw)](https://docs.arklex.ai/main/personal-ai-assistant-openclaw-evaluation): Evaluation of a general-purpose assistant

### Key Capabilities (ArkSim)

- **Agent Simulation**: Generates LLM-powered simulated users that hold realistic multi-turn conversations with your AI agent
- **Agent Evaluation**: Scores agent responses on helpfulness, coherence, relevance, faithfulness, verbosity, and goal completion
- **Scenario-Based Testing**: Define test scenarios with user profiles, goals, and prior knowledge to stress-test agent behavior
- **Failure Detection**: Automatically categorizes agent failures (false information, disobedience, repetition, lack of specificity, missing clarification)
- **CI/CD Integration**: Run as an automated quality gate in CI pipelines to catch regressions before they ship
- **Framework Agnostic**: Works with 14+ agent frameworks including LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK, AutoGen, PydanticAI, and more
- **Tool Call Evaluation**: Capture and evaluate tool calls from agent responses across custom Python agents, Chat Completions HTTP, and A2A protocol
- **Multi-Turn Testing**: Test agent behavior across multiple conversation turns to catch context loss, contradictions, and goal drift