Why Observability for LLMs
LLMs are great at converting user input (and some contextual data for that user) into useful output. However, it can often be difficult to debug them when they fail at their task. The reasons they fail are varied:
- Natural language inputs can be ill-specified, ambiguous, or simply unexpected.
- You have little to no hope of predicting what users will input.
- Similarly, you have little to no hope of predicting how the LLM will respond to a given input, let alone how useful that output is for users.
- When users are presented with a natural language input, they may try things they would have otherwise not thought to try with other systems.
- Small changes to the prompt can have a large impact on the output, making regressions easy.
- Depending on the model you’re using or its settings, outputs are nondeterministic, often by design.
LLM Observability Needs Traces
Trace data is essential to understanding the lifecycle of a system end-to-end as requests flow through it. For LLMs, it’s critical to use OpenTelemetry traces for two reasons:
- Traces let you represent several operations that perform meaningful work before or after a call to an LLM.
- OpenTelemetry lets you correlate the behavior tracked in an LLM request with all other behavior in your application.