# Arklex

> Arklex is an AI agent testing and governance company. **ArkSim** is the open-source agent simulation and evaluation framework (Apache-2.0). The **Arklex Platform** provides cloud-based agent testing, governance, and deployment tools. This file covers ArkSim, the open-source product.

## Company

- [Homepage](https://arklex.ai): Arklex company homepage with product overview, core features, and FAQ
- [About Us](https://arklex.ai/home/aboutus): Arklex team, mission, and company background
- [Blog](https://arklex.ai/home/blogs): Technical articles on AI agent testing, evaluation strategies, and product updates

## ArkSim (Open Source)

ArkSim is a Python CLI tool that simulates realistic multi-turn conversations with AI agents and evaluates their performance. Install with `pip install arksim`.

### Documentation

- [ArkSim Docs](https://docs.arklex.ai): Full documentation
- [ArkSim Docs llms.txt](https://docs.arklex.ai/llms.txt): Machine-readable index of all documentation pages
- [Overview](https://docs.arklex.ai/main/overview): What ArkSim is, who it is for, core capabilities, and FAQ
- [Quickstart](https://docs.arklex.ai/main/quickstart): Install ArkSim and run your first simulation in minutes
- [Build Scenarios](https://docs.arklex.ai/main/build-scenario): Define test scenarios with user profiles, goals, and knowledge
- [Simulate Conversations](https://docs.arklex.ai/main/simulate-conversation): Generate multi-turn conversations between synthetic users and your agent
- [Evaluate Conversations](https://docs.arklex.ai/main/evaluate-conversation): Score agent performance on quantitative and qualitative metrics
- [Tool Call Capture](https://docs.arklex.ai/main/tool-call-capture): Capture and evaluate tool calls from agent responses
- [Integrations](https://docs.arklex.ai/main/integrations): Connect to agents built with LangChain, CrewAI, OpenAI Agents SDK, and other frameworks (14+ supported)
- [CI Integration](https://docs.arklex.ai/main/ci-integration): Run ArkSim as a quality gate in GitHub Actions, CircleCI, or any CI pipeline

### Source Code and Releases

- [GitHub](https://github.com/arklexai/arksim): Source code (Apache-2.0)
- [PyPI](https://pypi.org/project/arksim/): Install with `pip install arksim`
- [Changelog](https://github.com/arklexai/arksim/blob/main/CHANGELOG.md): Release history

### Research

- [Research Paper (arXiv:2510.11997)](https://arxiv.org/abs/2510.11997): Academic paper on simulation-based agent evaluation

### Example Evaluations

- [Insurance Customer Service Agent](https://docs.arklex.ai/main/insurance-customer-service-agent-evaluation): End-to-end evaluation of an insurance support agent
- [E-Commerce Customer Service Agent](https://docs.arklex.ai/main/e-commerce-customer-service-agent-evaluation): Multi-scenario evaluation of a shopping assistant
- [Tool-Calling Agent](https://docs.arklex.ai/main/customer-service-tool-calling-agent-evaluation): Evaluation of an agent that uses external tools
- [Personal AI Assistant (OpenClaw)](https://docs.arklex.ai/main/personal-ai-assistant-openclaw-evaluation): Evaluation of a general-purpose assistant

### Key Capabilities (ArkSim)

- **Agent Simulation**: Generates LLM-powered simulated users that hold realistic multi-turn conversations with your AI agent
- **Agent Evaluation**: Scores agent responses on helpfulness, coherence, relevance, faithfulness, verbosity, and goal completion
- **Scenario-Based Testing**: Define test scenarios with user profiles, goals, and prior knowledge to stress-test agent behavior
- **Failure Detection**: Automatically categorizes agent failures (false information, disobedience, repetition, lack of specificity, missing clarification)
- **CI/CD Integration**: Run as an automated quality gate in CI pipelines to catch regressions before they ship
- **Framework Agnostic**: Works with 14+ agent frameworks, including LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK, AutoGen, PydanticAI, and more
- **Tool Call Evaluation**: Capture and evaluate tool calls from agent responses across custom Python agents, Chat Completions HTTP endpoints, and the A2A protocol
- **Multi-Turn Testing**: Test agent behavior across multiple conversation turns to catch context loss, contradictions, and goal drift