Tools

Search

Explore

Videos Channels Figures

Atmrix

About

HC

Harrison Chase

05/28/25

@ LangChain

In-the-loop evals are worth using when the tolerance for mistakes is low, although they can add time and cost to each response.

Video

How to Solve the #1 Blocker for Getting AI Agents in Production | LangChain Interrupt

@ LangChain

05/28/25

Related Takeaways

HC

Harrison Chase

05/28/25

@ LangChain

In-the-loop evals occur during the agent's operation, enabling real-time corrections and improvements to response quality before the agent responds.
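
A minimal sketch of this pattern, under stated assumptions: `generate` and `evaluate` are hypothetical callables standing in for the agent and the evaluator (they are not a LangChain API), and the retry limit and fallback message are illustrative choices. A draft is returned only after it passes the check; otherwise the evaluator's feedback is fed into the next attempt.

```python
from typing import Callable, Tuple


def respond_with_in_the_loop_eval(
    question: str,
    generate: Callable[[str, str], str],
    evaluate: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> str:
    """Return a draft only once it passes the check, retrying with feedback."""
    feedback = ""
    for _ in range(max_attempts):
        draft = generate(question, feedback)           # agent produces a candidate
        passed, feedback = evaluate(question, draft)   # eval runs before replying
        if passed:
            return draft
    # Every attempt failed the check: fall back instead of sending a bad answer.
    return "Sorry, I could not produce a reliable answer."


# Hypothetical stand-ins so the sketch runs without a model.
def fake_generate(question: str, feedback: str) -> str:
    return "Paris" if feedback else "I think it might be Lyon"


def fake_evaluate(question: str, draft: str) -> Tuple[bool, str]:
    if "Paris" in draft:
        return True, ""
    return False, "The answer must name the capital city."


answer = respond_with_in_the_loop_eval(
    "What is the capital of France?", fake_generate, fake_evaluate
)
```

Each failed check costs one more generation and one more evaluation, which is where the added time and cost of in-the-loop evals comes from.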

AN

Andrew Ng

05/29/25

@ LangChain

Many developers overlook the importance of creating simple evaluations, which can help identify regressions and improve system performance incrementally.

HC

Harrison Chase

05/28/25

@ LangChain

Offline evals involve running your app against a dataset before production to measure performance and track changes over time.
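
As a rough illustration, an offline eval can be as small as the sketch below. The dataset shape (`input` and `expected` keys), the exact-match scoring, and the `toy_app` function are assumptions made for the example, not details from the talk.

```python
from typing import Callable, Dict, List


def run_offline_eval(app: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Run the app on every example and return the fraction answered correctly."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    correct = 0
    for example in dataset:
        output = app(example["input"])
        # Exact match after normalizing case and surrounding whitespace.
        if output.strip().lower() == example["expected"].strip().lower():
            correct += 1
    return correct / len(dataset)


# Hypothetical dataset and app, used only to make the sketch runnable.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "opposite of hot", "expected": "cold"},
]


def toy_app(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "paris"}
    return answers.get(prompt, "unknown")


score = run_offline_eval(toy_app, dataset)
baseline = 0.5  # score recorded for a previous version (assumed value)
regressed = score < baseline
```

Storing the score for each version of the app is what makes it possible to track changes over time and catch regressions before release.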

HC

Harrison Chase

05/28/25

@ LangChain

There are three types of evaluations: offline evals, online evals, and in-the-loop evals, each serving different purposes in the evaluation lifecycle.

JB

Joe Benton

03/18/25

@ Anthropic

Building effective control evaluations is difficult, as they need to closely resemble actual deployment settings to yield valid insights.

HC

Harrison Chase

05/28/25

@ LangChain

To bridge the gap from prototype to production and improve quality, using evaluations throughout different stages of development is essential.

HC

Harrison Chase

05/28/25

@ LangChain

There are three types of evaluators: deterministic code-based checks, LLM-as-judge techniques for more complex assessments, and human annotation for real-time feedback.
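
The first of these is the easiest to illustrate. The sketch below is one possible deterministic, code-based evaluator; the function name, the result format, and the choice of "valid JSON with required keys" as the check are assumptions made for the example.

```python
import json
from typing import Dict, List


def evaluate_json_output(output: str, required_keys: List[str]) -> Dict[str, object]:
    """Deterministic check: the output must be a JSON object with all required keys."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return {"passed": False, "reason": "output is not valid JSON"}
    if not isinstance(parsed, dict):
        return {"passed": False, "reason": "output is not a JSON object"}
    missing = [key for key in required_keys if key not in parsed]
    if missing:
        return {"passed": False, "reason": "missing keys: " + ", ".join(missing)}
    return {"passed": True, "reason": ""}


good = evaluate_json_output('{"city": "Paris", "country": "France"}', ["city", "country"])
bad = evaluate_json_output('{"city": "Paris"}', ["city", "country"])
```

A check like this gives the same verdict every time for the same output and needs no model call, which is why it is usually the cheapest evaluator to add first.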

HC

Harrison Chase

05/28/25

@ LangChain

Online evals assess the app's performance in real time using actual production data, allowing for immediate tracking of performance.
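
One way to picture this is a monitor that scores each production response as it arrives and keeps a rolling pass rate. The class name, the window size, and the alert threshold below are hypothetical choices for the sketch, not part of any specific tool.

```python
from collections import deque
from typing import Callable, Deque


class OnlineEvalMonitor:
    """Scores live responses and tracks the pass rate over a sliding window."""

    def __init__(self, check: Callable[[str, str], bool],
                 window: int = 100, alert_below: float = 0.9) -> None:
        self.check = check
        self.results: Deque[bool] = deque(maxlen=window)  # oldest results drop off
        self.alert_below = alert_below

    def record(self, user_input: str, output: str) -> float:
        """Evaluate one production interaction and return the updated pass rate."""
        self.results.append(bool(self.check(user_input, output)))
        return self.pass_rate()

    def pass_rate(self) -> float:
        # With no traffic yet, report 1.0 so the monitor does not alert.
        return sum(self.results) / len(self.results) if self.results else 1.0

    def should_alert(self) -> bool:
        return self.pass_rate() < self.alert_below


# Hypothetical check: a response passes if it is not empty.
monitor = OnlineEvalMonitor(lambda _inp, out: bool(out.strip()), window=4)
for reply in ["hello", "", "sure", "done"]:
    rate = monitor.record("some user input", reply)
```

Because production inputs have no reference answers, online checks of this kind tend to be reference-free, such as format checks or a judge model scoring the response on its own.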

HC

Harrison Chase

05/28/25

@ LangChain

Using LLMs as judges for evaluating outputs is promising, but it requires careful prompt engineering to ensure accurate grading of responses.
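
The sketch below shows one way the prompt engineering might look. The prompt wording and the `call_llm` parameter are assumptions: `call_llm` is a hypothetical callable that takes a prompt string and returns the model's reply, to be wired to whatever client is in use. Constraining the judge to a fixed verdict line makes the grade parseable, and rejecting anything else avoids silently counting a malformed reply as a pass.

```python
import re
from typing import Callable

JUDGE_PROMPT = """You are grading an answer against a reference.
Question: {question}
Reference answer: {reference}
Submitted answer: {answer}
Reply with exactly one line: VERDICT: CORRECT or VERDICT: INCORRECT."""


def judge(question: str, reference: str, answer: str,
          call_llm: Callable[[str], str]) -> bool:
    """Ask a judge model for a verdict and parse it strictly."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference, answer=answer)
    reply = call_llm(prompt)
    match = re.search(r"VERDICT:\s*(CORRECT|INCORRECT)", reply, re.IGNORECASE)
    if match is None:
        # Refuse to guess when the judge ignores the requested format.
        raise ValueError("judge reply did not contain a verdict: " + repr(reply))
    return match.group(1).upper() == "CORRECT"


# Hypothetical stub judge so the sketch runs without a model.
def stub_llm(prompt: str) -> str:
    return "VERDICT: CORRECT" if "Submitted answer: Paris" in prompt else "VERDICT: INCORRECT"


is_correct = judge("Capital of France?", "Paris", "Paris", stub_llm)
is_wrong = judge("Capital of France?", "Paris", "Lyon", stub_llm)
```

Before relying on a judge like this, it is worth comparing its verdicts against a small set of human-labelled examples and adjusting the prompt where they disagree.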