Harrison Chase
05/28/25
@ LangChain
In-the-loop evals are worthwhile when the tolerance for mistakes is low, although they can increase response time and cost.
Video
How to Solve the #1 Blocker for Getting AI Agents in Production | LangChain Interrupt
@ LangChain
05/28/25
Related Takeaways
Harrison Chase
05/28/25
@ LangChain
In-the-loop evals run while the agent is operating, so its output can be checked and corrected before the agent responds, improving response quality.
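A minimal Python sketch of an in-the-loop eval, assuming a deterministic check and a toy generator in place of a real model call. The names draft_answer, has_citation, respond_with_check, and the max_attempts budget are hypothetical. The check runs before anything is returned, and every failed check costs another generation, which is where the extra time and cost come from.

```python
from dataclasses import dataclass
from typing import Callable

def draft_answer(question: str, attempt: int) -> str:
    # Hypothetical stand-in for an LLM call: the first draft omits a
    # citation, later drafts include one.
    if attempt == 0:
        return f"Answer to: {question}"
    return f"Answer to: {question} [source: docs]"

def has_citation(answer: str) -> bool:
    # Deterministic in-the-loop check: require a cited source.
    return "[source:" in answer

@dataclass
class Result:
    answer: str
    attempts: int
    passed: bool

def respond_with_check(question: str, check: Callable[[str], bool],
                       max_attempts: int = 3) -> Result:
    """Evaluate each draft before returning it; regenerate on failure."""
    answer = ""
    for attempt in range(max_attempts):
        answer = draft_answer(question, attempt)
        if check(answer):
            return Result(answer, attempt + 1, True)
    # Budget exhausted: return the last draft, flagged as failing.
    return Result(answer, max_attempts, False)

result = respond_with_check("What is an offline eval?", has_citation)
print(result.attempts, result.passed)  # 2 True
```

Capping attempts with max_attempts bounds the added latency and cost; the flagged failure can then be escalated rather than sent as-is.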
Andrew Ng
05/29/25
@ LangChain
Many developers overlook the importance of creating simple evaluations, which can help identify regressions and improve system performance incrementally.
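A sketch of how a few simple, hand-written cases can surface a regression between two versions of an app. The two app functions are hypothetical stand-ins for a real pipeline, and the "answer must contain this substring" check is just one assumed choice of simple eval.

```python
# Hypothetical app versions; in practice these would wrap your LLM pipeline.
def app_v1(question: str) -> str:
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(question, "I don't know")

def app_v2(question: str) -> str:
    # A change that accidentally breaks the arithmetic answer.
    return {"capital of France?": "Paris", "2 + 2?": "four"}.get(question, "I don't know")

# A handful of simple cases: (input, substring the answer must contain).
CASES = [
    ("capital of France?", "Paris"),
    ("2 + 2?", "4"),
]

def failing_cases(app) -> list[str]:
    """Return the inputs whose output lacks the expected substring."""
    return [q for q, expected in CASES if expected not in app(q)]

# Cases that passed on v1 but fail on v2 are regressions.
regressions = sorted(set(failing_cases(app_v2)) - set(failing_cases(app_v1)))
print(regressions)  # ['2 + 2?']
```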
Harrison Chase
05/28/25
@ LangChain
Offline evals involve running your app against a dataset before production to measure performance and track changes over time.
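A minimal offline-eval sketch, assuming a small dataset of inputs with reference answers and exact-match scoring. Example, DATASET, my_app, exact_match, and run_offline_eval are hypothetical names; storing one aggregate score per app version is one simple way to track changes over time.

```python
from dataclasses import dataclass

@dataclass
class Example:
    inputs: str
    reference: str

# Hypothetical dataset; a real one would be curated from representative inputs.
DATASET = [
    Example("capital of France", "Paris"),
    Example("capital of Japan", "Tokyo"),
    Example("capital of Peru", "Lima"),
    Example("capital of Kenya", "Nairobi"),
]

def my_app(question: str) -> str:
    # Stand-in for the real application under test.
    known = {"capital of France": "Paris", "capital of Japan": "Tokyo",
             "capital of Peru": "Lima"}
    return known.get(question, "unknown")

def exact_match(output: str, reference: str) -> float:
    # Case- and whitespace-insensitive exact match.
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0

def run_offline_eval(app, dataset) -> float:
    """Run the app on every example and return the mean score."""
    scores = [exact_match(app(ex.inputs), ex.reference) for ex in dataset]
    return sum(scores) / len(scores)

# One entry per version makes changes visible across runs.
history: dict[str, float] = {}
history["v1"] = run_offline_eval(my_app, DATASET)
print(history)  # {'v1': 0.75}
```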
Harrison Chase
05/28/25
@ LangChain
There are three types of evals (offline, online, and in-the-loop), each serving a different purpose in the evaluation lifecycle.
Joe Benton
03/18/25
@ Anthropic
Building effective control evaluations is difficult, as they need to closely resemble actual deployment settings to yield valid insights.
Harrison Chase
05/28/25
@ LangChain
To bridge the gap from prototype to production and improve quality, it is essential to use evaluations throughout the stages of development.
Harrison Chase
05/28/25
@ LangChain
There are three types of evaluators: deterministic code-based evaluators, LLM-as-judge evaluators for more complex assessments, and human annotation for real-time feedback.
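One way to sketch the three evaluator types side by side. The JSON-validity check, the YES/NO judge prompt, and the in-memory annotation queue are illustrative assumptions, and call_model is a hypothetical hook where a real LLM client would go; a fake model is used so the sketch runs offline.

```python
import json
from typing import Callable

def json_evaluator(output: str) -> dict:
    """Deterministic, code-based evaluator: does the output parse as JSON?"""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return {"key": "valid_json", "score": 0}
    return {"key": "valid_json", "score": 1}

def make_llm_judge(call_model: Callable[[str], str]) -> Callable[[str, str], dict]:
    """LLM-as-judge evaluator: a model grades a fuzzier criterion via a prompt."""
    def judge(question: str, output: str) -> dict:
        prompt = (
            "Is the answer helpful and on-topic? Reply with only YES or NO.\n"
            f"Question: {question}\nAnswer: {output}"
        )
        verdict = call_model(prompt).strip().upper()
        return {"key": "helpful", "score": 1 if verdict.startswith("YES") else 0}
    return judge

# Human annotation: outputs are queued for a person to review and score later.
annotation_queue: list[dict] = []

def send_to_human(question: str, output: str) -> None:
    annotation_queue.append({"question": question, "output": output, "score": None})

# Usage with a fake model (always answers YES) so the sketch runs offline.
judge = make_llm_judge(lambda prompt: "YES")
print(json_evaluator('{"ok": true}'))  # {'key': 'valid_json', 'score': 1}
print(judge("What is 2+2?", "4"))      # {'key': 'helpful', 'score': 1}
send_to_human("What is 2+2?", "4")
print(len(annotation_queue))           # 1
```

The code-based check is cheap and exact, the judge handles criteria code cannot express, and the queue covers cases that need a person.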
Harrison Chase
05/28/25
@ LangChain
Online evals assess the app in real time on actual production data, so its performance can be tracked immediately.
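A sketch of online evaluation, assuming the app can pass each production (input, output) pair to a monitoring hook. OnlineMonitor, sample_rate, window, and the non_empty evaluator are hypothetical; sampling bounds evaluation cost, and the rolling window keeps the score tied to recent traffic.

```python
import random
from collections import deque
from typing import Callable, Optional

class OnlineMonitor:
    """Scores a sample of live traffic and keeps a rolling average."""

    def __init__(self, evaluator: Callable[[str, str], float],
                 sample_rate: float = 0.1, window: int = 100, seed: int = 0):
        self.evaluator = evaluator
        self.sample_rate = sample_rate
        self.scores: deque = deque(maxlen=window)  # oldest scores drop out first
        self.rng = random.Random(seed)

    def observe(self, inputs: str, output: str) -> None:
        # Score only a fraction of production traffic to bound eval cost.
        if self.rng.random() < self.sample_rate:
            self.scores.append(self.evaluator(inputs, output))

    def rolling_score(self) -> Optional[float]:
        # None until at least one trace has been scored.
        if not self.scores:
            return None
        return sum(self.scores) / len(self.scores)

def non_empty(inputs: str, output: str) -> float:
    # Hypothetical cheap evaluator: penalize empty responses.
    return 1.0 if output.strip() else 0.0

# Usage: score every trace (sample_rate=1.0), keeping only the last 3 scores.
monitor = OnlineMonitor(non_empty, sample_rate=1.0, window=3)
for out in ["", "ok", "fine", "great"]:
    monitor.observe("user question", out)
print(monitor.rolling_score())  # 1.0 (the early empty response has left the window)
```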
Harrison Chase
05/28/25
@ LangChain
Using LLMs as judges to evaluate outputs is promising, but it requires careful prompt engineering to ensure responses are graded accurately.
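A sketch of the kind of prompt engineering that can make LLM-as-judge grading more reliable: an explicit rubric, a reference answer, a strict JSON output format, and validation that rejects unparseable or out-of-range verdicts. The prompt wording, the 0 to 2 scale, and the grade function are assumptions, and the fake models stand in for a real LLM call.

```python
import json
from typing import Callable, Optional

# Explicit rubric, reference answer, and a strict output format.
JUDGE_PROMPT = """You are grading a candidate answer against a reference answer.

Rubric:
- 2: correct and complete
- 1: partially correct or missing key details
- 0: incorrect or off-topic

Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Respond with JSON only: {{"reasoning": "<one sentence>", "score": <0, 1, or 2>}}"""

def grade(call_model: Callable[[str], str], question: str,
          reference: str, candidate: str) -> Optional[int]:
    """Return the judge's 0-2 score, or None if the verdict is unusable."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference,
                                 candidate=candidate)
    try:
        parsed = json.loads(call_model(prompt))
    except json.JSONDecodeError:
        return None  # free-text reply: route to review instead of guessing a score
    if not isinstance(parsed, dict):
        return None
    score = parsed.get("score")
    # Reject out-of-range values and booleans (bool is a subclass of int).
    if type(score) is int and score in (0, 1, 2):
        return score
    return None

# Usage with fake models standing in for a real LLM call.
good_model = lambda prompt: '{"reasoning": "Matches the reference.", "score": 2}'
chatty_model = lambda prompt: "Looks right to me!"
print(grade(good_model, "Capital of France?", "Paris", "Paris"))    # 2
print(grade(chatty_model, "Capital of France?", "Paris", "Paris"))  # None
```

Asking for a one-sentence reasoning field alongside the score also gives reviewers something to check when the judge and a human disagree.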