These patterns can help you get more out of Honeycomb MCP, whether you are running quick investigations or building fully autonomous workflows.

Querying Honeycomb

Agents can use Model Context Protocol (MCP) tools to explore your data and answer detailed questions about system behavior. Both goal-directed queries (like responding to an alert) and broader investigations (like identifying performance issues) tend to work well with modern large language models (LLMs). To get useful results:
  • Give specific instructions: If you are responding to a Trigger, mention it by name and tell the agent to use it as a starting point.
  • Point to known issues: For example, if you have observed a latency spike or anomaly, describe it in your prompt so the agent can focus on the relevant time window or service.
Even open-ended prompts like “Investigate latency in the api-gateway service” can be productive. In our testing, agents often begin with duration_ms percentiles (p50, p95, p99) as a baseline.
Clean, well-described fields will improve results. Unclear field names or calculated fields without descriptions can confuse the agent or lead to weaker analysis.
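To illustrate why duration_ms percentiles make a useful baseline, here is a minimal Python sketch that computes p50, p95, and p99 locally. The sample data and the `latency_baseline` helper are hypothetical; in practice Honeycomb computes these aggregates for you when the agent runs a query.

```python
from statistics import quantiles


def latency_baseline(durations_ms):
    """Return p50, p95, and p99 for a list of request durations in milliseconds."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical sample: mostly fast requests with a slow tail.
sample = [20] * 90 + [200] * 9 + [2000]
baseline = latency_baseline(sample)
print(baseline)
```

The median stays low while p95 and p99 expose the slow tail, which is why agents tend to look at all three rather than an average.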

Improving instrumentation

MCP can help agents understand and improve your instrumentation, especially when paired with code access or examples. Some patterns that work well:
  • Use live examples: Ask the agent to look at how other services are instrumented in your codebase. For example: “Write a new service and base its instrumentation on other Golang services in this repo.”
  • Combine auto-instrumentation with refinement: Apply zero-code OpenTelemetry instrumentation, then let the agent analyze the results using MCP. The agent can:
    • Identify duplicated telemetry
    • Consolidate or remove redundant spans
    • Create new instrumentation based on actual business logic
  • Audit and iterate: Pair with an agent and ask it to evaluate your overall instrumentation quality against your actual data shape. Once the agent builds understanding, you can commit its artifacts or share them with teammates or other agents as part of a continuous instrumentation improvement loop.

Migrating queries to Honeycomb

LLMs are generally very good at translating between observability query languages, especially when you already have telemetry available in Honeycomb that maps to your old system. If you are migrating from PromQL, Datadog, or another system:
  1. Paste the existing query into the prompt.
  2. Ask the agent to use MCP to generate an equivalent Honeycomb query.
  3. Let it iterate until the result is either a match or a useful approximation.
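As a rough illustration of what such a translation can look like, the sketch below pairs a hypothetical PromQL latency query with a query specification in the style of Honeycomb's Query API (calculations, filters, and a time range in seconds). The PromQL metric name, the `build_latency_query` helper, and the assumption that your data has `duration_ms` and `service.name` columns are all illustrative; the exact input an MCP query tool expects may differ.

```python
import json


def build_latency_query(service, window_seconds=300):
    """Build a Honeycomb-style query spec for p99 latency of one service."""
    return {
        "time_range": window_seconds,  # seconds, mirroring the PromQL [5m] range
        "calculations": [{"op": "P99", "column": "duration_ms"}],
        "filters": [{"column": "service.name", "op": "=", "value": service}],
    }


# Hypothetical PromQL query being migrated.
promql = (
    "histogram_quantile(0.99, sum(rate("
    'http_request_duration_seconds_bucket{service="api-gateway"}[5m])) by (le))'
)

query = build_latency_query("api-gateway")
print(json.dumps(query, indent=2))
```

Note that the result is an approximation rather than an exact match: the PromQL query estimates the quantile from histogram buckets in seconds, while the Honeycomb query computes it from per-event durations in milliseconds.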

Running autonomous agents with Honeycomb

If you are building fully autonomous agents that use Honeycomb regularly, you will get better results by helping your agents build context and avoid unnecessary work. Iterating on your prompts and agent guidance is key.
  • Be explicit about what matters: Tell the agent exactly how to query your data. For example, list which environments and datasets are relevant. This prevents the agent from relearning the structure of your system each time.
  • Reduce ambiguity: Provide access to source-of-truth files beyond Honeycomb, like your telemetry schemas. These help the agent investigate more effectively.
  • Capture useful patterns: Save reliable prompts, queries, or instructions in agent memory files. Reusing these lets the agent build on past successes instead of starting from scratch.

Using the Canvas agent

Honeycomb’s Canvas gives teams a collaborative workspace for incident investigations. Because the Canvas agent is exposed over MCP, you can wire it into your own agentic workflows as a durable coordination point for observability work. Agents call canvas_agent_invoke to initiate a turn and canvas_agent_poll_response to retrieve the result. Passing the same investigation_id across calls lets multiple turns (or multiple agents) build on the same investigation over time. Some patterns that work well:
  • Hand off observability work to Canvas: Direct your local or cloud coding agent to call the Canvas agent for observability questions rather than running queries inline. This keeps observability context out of your coding session and produces an auditable record of the investigation for the rest of your team. You can also paste a Canvas URL into a local coding agent to give it the full context of an in-progress investigation before making code changes.
  • Canvas-driven code review: Code review agents that support MCP can call the Canvas agent as part of their review process. For example, they can project how a PR’s changes will affect system state, verify telemetry changes, or check the health of a canary deployment alongside the diff.
  • Coordinate short-lived agents through an investigation: A Canvas investigation is durable and visible to the whole team, which makes it a useful touchpoint for sandboxed or ephemeral agents. Multiple agents can contribute findings to the same investigation, and another agent or a person can pick up the thread later by referencing the investigation_id.
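The invoke-then-poll flow described above can be sketched in Python as follows. Only the tool names and `investigation_id` come from this page. Everything else is an assumption made for illustration: `call_tool` stands in for whatever function your MCP client uses to call a tool, and the `prompt` argument, the `status` field, and the `"complete"` value are hypothetical names, so check the tool schemas your client reports before relying on them.

```python
import time


def ask_canvas(call_tool, question, investigation_id=None,
               poll_interval=2.0, max_polls=30):
    """Start a Canvas agent turn, then poll until a result is available."""
    # "prompt" is an assumed argument name, not confirmed by the source.
    args = {"prompt": question}
    if investigation_id is not None:
        # Reusing the same ID lets this turn build on an earlier investigation.
        args["investigation_id"] = investigation_id

    started = call_tool("canvas_agent_invoke", args)
    # Assumed: the invoke response echoes back the investigation ID.
    investigation_id = started.get("investigation_id", investigation_id)

    for _ in range(max_polls):
        result = call_tool(
            "canvas_agent_poll_response",
            {"investigation_id": investigation_id},
        )
        # "status" and "complete" are assumed response fields.
        if result.get("status") == "complete":
            return result
        time.sleep(poll_interval)

    raise TimeoutError(f"No Canvas response after {max_polls} polls")
```

Bounding the number of polls keeps a sandboxed or ephemeral agent from waiting forever, and returning the investigation ID with the result lets another agent or a person pick up the thread later.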

Managing Boards, Triggers, and SLOs

Honeycomb MCP includes write tools for creating and editing Boards, Triggers, SLOs, and the notification recipients that route their alerts. You can use these to let agents capture investigation results, bootstrap alerting and reliability targets for new services, or migrate definitions from other observability tools. Some patterns that work well:
  • Migrate alerts and dashboards: Paste a Datadog monitor definition or Grafana dashboard JSON into the prompt and ask the agent to create equivalent Honeycomb Triggers and Boards. Agents typically read your existing telemetry first to make sure the translation is grounded in what is actually being emitted.
  • Bootstrap SLOs for a new service: Ask the agent to look at error rates and latency for a service over the last week, propose a Service Level Indicator (SLI) expression, and create an SLO at a reasonable target. The create_slo tool auto-creates the SLI derived column as part of the call, so you do not need to define it ahead of time.
  • Audit and tune existing definitions: Ask the agent to review your Triggers or SLOs against recent data and suggest or apply changes.
  • Route alerts to cloud agents: In addition to Honeycomb’s Anomaly Detection features and automatic investigations, you can create webhook recipients for Triggers and SLOs that initiate other agentic workflows.
Version history for Triggers and SLOs is not stored outside of the Honeycomb Activity Log. We strongly recommend that you maintain canonical definitions in an infrastructure-as-code (IaC) tool such as Terraform.
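When an agent proposes an SLO target for a new service, it can help to sanity-check the target against the same week of data. The Python sketch below shows the arithmetic only; the `slo_summary` helper and the event counts are hypothetical, and it does not call any Honeycomb tool.

```python
def slo_summary(total_events, failed_events, target_percent):
    """Compare one window of traffic against a proposed SLO target."""
    if not 0 < target_percent < 100:
        raise ValueError("target_percent must be between 0 and 100")
    # SLI: percentage of events that met the success criteria.
    sli_percent = 100 * (total_events - failed_events) / total_events
    # Error budget: how many events may fail before the target is missed.
    allowed_failures = total_events * (100 - target_percent) / 100
    # Fraction of the budget still unspent (negative means the target was missed).
    budget_remaining = 1 - failed_events / allowed_failures
    return {
        "sli_percent": sli_percent,
        "allowed_failures": allowed_failures,
        "budget_remaining": budget_remaining,
    }


# Hypothetical week: 1,000,000 requests, 400 failures, proposed 99.9% target.
summary = slo_summary(1_000_000, 400, 99.9)
print(summary)
```

A target that the last week of data would already have burned through is usually too aggressive as a starting point, so a check like this is a reasonable thing to ask the agent to show before it creates the SLO.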

Using semantic conventions in MCP

OpenTelemetry’s semantic conventions define standard names, types, and units for common attributes, like http.request.method, db.system, or service.name. Honeycomb MCP exposes these conventions to agents through search_semconv, get_semconv_attribute, and list_semconv_namespaces, and overlays them with your team’s custom attribute descriptions from the Weaver registry. Agents use these tools to write better instrumentation and queries, and to ground their reasoning in standard attribute names rather than relying on training data alone. Some patterns that work well:
  • Generate instrumentation that matches the spec: When asking an agent to add OpenTelemetry instrumentation to a service, tell it to use semantic conventions for any attribute that already has one. The agent uses search_semconv and get_semconv_attribute to confirm the canonical name, value type, and units for attributes like http.response.status_code or db.query.text before writing code.
  • Orient an agent in a new domain: For prompts about an unfamiliar area (databases, messaging, GenAI), ask the agent to call list_semconv_namespaces first to see what attribute families exist. This lets it ask better follow-up questions and converge faster than guessing.
  • Encode team-specific knowledge in Weaver: If your team uses non-standard attributes or has stronger opinions about a standard attribute’s semantics, add them to your Weaver registry. Agents see your team’s descriptions through find_columns, get_dataset_columns, and search_semconv, so customization translates directly into better suggestions.
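As a small illustration of aligning instrumentation with the spec, the Python sketch below renames ad hoc attribute keys to the semantic convention names mentioned above. The legacy key names and the `to_semconv` helper are hypothetical, and the mapping is hardcoded here; an agent would normally confirm each canonical name through search_semconv and get_semconv_attribute rather than rely on a fixed table.

```python
# Hypothetical legacy keys mapped to OpenTelemetry semantic convention names.
SEMCONV_RENAMES = {
    "http_method": "http.request.method",
    "status_code": "http.response.status_code",
    "sql": "db.query.text",
    "service": "service.name",
}


def to_semconv(attributes):
    """Return a copy of the attributes with known legacy keys renamed."""
    renamed = {}
    for key, value in attributes.items():
        # Keys without a known mapping pass through unchanged.
        renamed[SEMCONV_RENAMES.get(key, key)] = value
    return renamed


legacy = {"http_method": "GET", "status_code": 200, "tenant": "acme"}
print(to_semconv(legacy))
```

Attributes with no standard equivalent, like the hypothetical `tenant` key here, are the ones worth describing in your Weaver registry.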

Monitoring Claude Code with Honeycomb

If your team uses Claude Code, you can point its OpenTelemetry exporter at Honeycomb and then use MCP to investigate what Claude Code is doing, like token spend, tool failures, hook activity, session errors, and compaction triggers, directly from your agent. This is meta-observability: using Honeycomb’s agent to debug another agent.
Claude Code emits traces that follow the OpenTelemetry GenAI semantic conventions, which is the same data shape list_aiconversations and get_aiconversation are built around. Once telemetry is flowing, you can ask the Honeycomb agent natural-language questions about specific sessions and get useful answers without writing a query.
To learn how to set up the exporter, visit Anthropic’s monitoring guide. Enable the traces beta (CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1) to get the richest data, and point OTEL_EXPORTER_OTLP_ENDPOINT at Honeycomb’s OTLP endpoint with your ingest API key set as the x-honeycomb-team header in OTEL_EXPORTER_OTLP_HEADERS.
Some patterns to ask the Honeycomb agent about:
  • Find the most expensive sessions: “Which Claude Code sessions used the most tokens this week?” The agent sums gen_ai.usage.input_tokens and gen_ai.usage.output_tokens, grouped by session.id, user.email, vcs.branch, or session.cwd.
  • Triage long or failing sessions: Call list_aiconversations to rank conversations by event count and error count, then get_aiconversation on the worst offender to see every LLM call, tool call, and error in order, with token totals and durations.
  • Audit tool failures: “Which Claude Code tools have the highest failure rate?” The agent groups by tool.name and tool.outcome, which helps identify brittle MCP servers, hooks, or bash patterns the agent keeps tripping over.
  • Track permission friction: “Which tool calls got blocked on permission prompts today, and how long did they wait?” This drills into claude_code.tool.blocked_on_user spans, which is useful for tuning unattended workflows.
  • Compare models and skills: Group token usage by gen_ai.request.model or gen_ai.skill.names to see which models are doing the work and which skills are loading most frequently.
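To make the "most expensive sessions" aggregation concrete, here is a minimal Python sketch of the same sum-and-group logic applied to a few hypothetical span records. The attribute names match the ones listed above; the sample values and the `tokens_by_session` helper are invented for illustration, and in practice the agent runs this as a Honeycomb query rather than in local code.

```python
from collections import defaultdict


def tokens_by_session(spans):
    """Sum input and output tokens per session, most expensive first."""
    totals = defaultdict(int)
    for span in spans:
        used = (span.get("gen_ai.usage.input_tokens", 0)
                + span.get("gen_ai.usage.output_tokens", 0))
        totals[span["session.id"]] += used
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical span records for two sessions.
spans = [
    {"session.id": "a", "gen_ai.usage.input_tokens": 1200,
     "gen_ai.usage.output_tokens": 300},
    {"session.id": "b", "gen_ai.usage.input_tokens": 5000,
     "gen_ai.usage.output_tokens": 900},
    {"session.id": "a", "gen_ai.usage.input_tokens": 800},
]
ranking = tokens_by_session(spans)
print(ranking)
```

Swapping the grouping key for user.email, vcs.branch, or session.cwd gives the other breakdowns mentioned above.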
This approach works for any tool that emits gen_ai semconv data, not just Claude Code. If you have Cursor, custom agents, or other LLM-driven tools in your stack, the same MCP workflows let one agent debug another.
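For the Claude Code exporter setup described earlier in this section, the environment variables can be assembled as in the Python sketch below. The three variable names and the x-honeycomb-team header come from this page; the `claude_code_otel_env` helper is hypothetical, and the default endpoint is an assumption (the US instance), so use the OTLP endpoint for your own Honeycomb region. Anthropic’s monitoring guide may require additional variables that are not shown here.

```python
def claude_code_otel_env(ingest_key, endpoint="https://api.honeycomb.io"):
    """Return environment variables that point Claude Code telemetry at Honeycomb."""
    if not ingest_key:
        raise ValueError("an ingest API key is required")
    return {
        # Enables the traces beta mentioned in this section.
        "CLAUDE_CODE_ENHANCED_TELEMETRY_BETA": "1",
        # Assumed default: the US endpoint. Adjust for your region.
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
        # OTLP headers use comma-separated key=value pairs.
        "OTEL_EXPORTER_OTLP_HEADERS": f"x-honeycomb-team={ingest_key}",
    }


# Placeholder key for illustration only; never commit a real key.
env = claude_code_otel_env("YOUR_INGEST_KEY")
for name, value in env.items():
    print(f"export {name}={value}")
```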