March 2026 News

4 articles from March 2026

The Agent Improvement Loop with Traces, Evals, and LangSmith

LangChain • Mar 31, 2026

Learn how to systematically improve AI agents using a trace-driven feedback loop powered by LangSmith. The approach centers on collecting execution traces from staging, testing, and production, enriching them with automated evaluations and human annotations, and using the enriched traces to identify failure patterns. Developers then make targeted updates to model prompts, orchestration logic, or context layers, and validate the changes against offline evaluation suites before deployment. Continuous production monitoring with online evals and insights helps catch regressions early so that performance improves over time. This iterative loop of trace collection, enrichment, debugging, evaluation, and redeployment enables data-driven optimization of agent behavior at scale.

Agent Evaluation Checklist: How to Build, Run, and Ship Agent Evals

LangChain • Mar 27, 2026

Build effective agent evaluation systems by starting with simple, high-signal end-to-end evals and iteratively increasing complexity. Use observability tools like LangSmith to analyze real agent traces, define clear success criteria, and separate capability evals from regression evals. Focus heavily on failure analysis by categorizing issues (prompt design, tool interfaces, model limits, or data gaps) before automating evaluation. Evals can run at three levels: single-step (run), full-turn (trace), and multi-turn (thread), with trace-level evals as the most practical starting point. Rule out infrastructure issues, assign ownership to a domain expert, and validate not just outputs but real-world state changes. This approach improves agent reliability and debugging and supports continuous performance optimization.
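
As a rough illustration of the failure-analysis step, the sketch below tallies manually labeled failures by the four categories named in the summary. The `FailureCategory` enum and the `tally_failures` helper are hypothetical names chosen for this example; they are not part of LangSmith or any LangChain API.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, List, Tuple


class FailureCategory(Enum):
    """Hypothetical labels mirroring the categories listed in the summary."""
    PROMPT_DESIGN = "prompt design"
    TOOL_INTERFACE = "tool interface"
    MODEL_LIMIT = "model limit"
    DATA_GAP = "data gap"


def tally_failures(labels: Iterable[FailureCategory]) -> List[Tuple[FailureCategory, int]]:
    """Count labeled failures and return categories ordered by frequency."""
    return Counter(labels).most_common()


# Example: labels a reviewer might assign while reading failed traces.
labels = [
    FailureCategory.PROMPT_DESIGN,
    FailureCategory.TOOL_INTERFACE,
    FailureCategory.PROMPT_DESIGN,
]
ranking = tally_failures(labels)
```

Ranking categories this way gives a simple signal for which class of failure to address first, before any evaluator is automated.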

Designing Effective Evals for Deep Agents

LangChain • Mar 26, 2026

Learn how to build targeted evaluation systems that directly shape agent behavior by measuring real-world capabilities like tool use, retrieval, and multi-step reasoning. This approach emphasizes curating evals from production traces, dogfooding feedback, and adapted benchmarks, rather than relying on large generic test suites. The system uses categorized evals (e.g., tool_use, memory, retrieval) and metrics such as correctness, step ratio, tool call ratio, latency ratio, and solve rate to assess both accuracy and efficiency. By analyzing traces and defining ideal execution trajectories, teams can iteratively improve agent performance, reduce cost, and prevent regressions while maintaining alignment with production needs.
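
The ratio metrics can be pictured with a minimal sketch. The summary does not define them, so this example assumes each ratio is the observed value divided by the value from an ideal execution trajectory, where 1.0 means the agent matched the ideal and larger values mean extra work. The `Trajectory` class and `efficiency_ratios` function are hypothetical and are not taken from Deep Agents or LangSmith.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Trajectory:
    """Hypothetical summary of one agent execution."""
    steps: int        # number of agent steps taken
    tool_calls: int   # number of tool invocations
    latency_s: float  # wall-clock time in seconds


def efficiency_ratios(actual: Trajectory, ideal: Trajectory) -> Dict[str, float]:
    """Compare an observed trajectory to an ideal one (assumed: actual / ideal)."""
    if ideal.steps <= 0 or ideal.tool_calls <= 0 or ideal.latency_s <= 0:
        raise ValueError("ideal trajectory must have positive values")
    return {
        "step_ratio": actual.steps / ideal.steps,
        "tool_call_ratio": actual.tool_calls / ideal.tool_calls,
        "latency_ratio": actual.latency_s / ideal.latency_s,
    }


# Example: the agent took 6 steps where the ideal trajectory needs 4.
ratios = efficiency_ratios(
    actual=Trajectory(steps=6, tool_calls=4, latency_s=3.0),
    ideal=Trajectory(steps=4, tool_calls=2, latency_s=2.0),
)
```

Under this assumed definition, tracking the ratios alongside correctness separates an agent that gets the right answer efficiently from one that gets it with wasted steps or tool calls.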

Autonomous context compression

LangChain • Mar 11, 2026

Deep Agents introduces a tool in its Python SDK and CLI that allows agents to autonomously compress their context windows at moments they judge appropriate. Instead of relying on fixed token thresholds, agents can now summarize older context when it becomes less relevant, such as at task boundaries, before large context ingestion, or after extracting key insights. This improves efficiency, reduces context rot, and aligns memory management with the agent’s reasoning process. The system retains recent messages (about 10% of context) while summarizing older interactions, enabling better long-horizon performance without manual intervention or rigid harness tuning.
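
The retain-recent, summarize-older behavior can be sketched as follows. This is not the Deep Agents implementation: `compress_context` and its `summarize` callback are hypothetical, and the sketch measures the retained 10% in messages rather than tokens to keep the example short.

```python
import math
from typing import Callable, List


def compress_context(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    keep_fraction: float = 0.10,
) -> List[str]:
    """Replace older messages with one summary and keep the most recent ones.

    Assumption: the retained share is counted in messages, not tokens.
    """
    if not messages:
        return []
    keep = max(1, math.ceil(len(messages) * keep_fraction))
    if keep >= len(messages):
        # Nothing old enough to summarize.
        return list(messages)
    older, recent = messages[:-keep], messages[-keep:]
    return [summarize(older)] + recent


# Example with a stand-in summarizer; a real one would call a model.
history = [f"m{i}" for i in range(20)]
compressed = compress_context(
    history, lambda older: f"summary of {len(older)} messages"
)
```

In the tool described above the agent decides when to trigger compression; this sketch covers only what happens to the message list once that decision has been made.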
