Headless Dashboards for LLM Agents
The system defines truth. The agent defines how to investigate it.
If you’ve ever pointed an LLM at your data warehouse and asked it a question, you’ve probably had the experience of getting back a confident, well-formatted, completely fabricated number. The model writes beautiful SQL. The SQL runs. The answer is wrong. You only catch it because you happen to know what the right answer looks like.
This is the core tension with LLM-powered analytics. Static dashboards are trustworthy and can be parameterized, but require eyeballs. Open-ended LLMs querying your data are flexible, but they improvise.
There’s a third option I’ve been calling the headless dashboard: a constrained surface of parameterized SQL functions that the agent composes but cannot redefine. A well-designed interface doesn’t reduce capability; it shapes choices. The SQL contracts are the agent’s control panel.

Raw warehouse access is a tangle to reason about. A headless dashboard is a fixed set of blocks to pick from.
The pattern in one sentence
The system defines truth. The agent defines how to investigate it.
Each SQL function is a tool with typed inputs, an explicit output schema, and stable business logic. The agent picks which function to call, with which arguments, in which sequence. It can widen a lookback window, zoom in on a specific service, or tighten a statistical threshold, all by choosing parameters. What it can’t do is invent its own definition of failure rate.
A function in the contract layer might look like this:
CREATE FUNCTION get_failure_rate(
  service_name STRING,
  lookback_hours INT DEFAULT 24,
  min_sample_size INT DEFAULT 100
)
RETURNS TABLE (
  service STRING,
  failure_rate FLOAT,
  sample_size INT,
  window_start TIMESTAMP
)
-- aggregation logic lives here, version-controlled, tested
The LLM never writes this. It just emits a tool call:
{
  "tool": "get_failure_rate",
  "arguments": {
    "service_name": "checkout",
    "lookback_hours": 72
  }
}
An intermediate layer translates that into SELECT * FROM get_failure_rate(...) and runs it. Typed in, typed out, business logic locked.
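As a minimal sketch of what that intermediate layer could look like: a registry maps each tool name to its contract, arguments are type-checked and defaulted before any SQL is built, and the function name comes from the registry rather than from model output. The names here (`TOOL_CONTRACTS`, `build_query`) are illustrative, not a real API.

```python
# Hypothetical translation layer: validate a tool call against its
# contract, then emit parameterized SQL. Nothing the model writes ever
# reaches the query text itself.

TOOL_CONTRACTS = {
    "get_failure_rate": {
        # name: (expected type, default); default None = required
        "service_name": (str, None),
        "lookback_hours": (int, 24),
        "min_sample_size": (int, 100),
    },
}

def build_query(tool: str, arguments: dict) -> tuple[str, list]:
    """Validate arguments and build a parameterized SELECT."""
    contract = TOOL_CONTRACTS[tool]  # KeyError = unknown tool, reject
    params = []
    for name, (expected_type, default) in contract.items():
        if name in arguments:
            value = arguments[name]
            if not isinstance(value, expected_type):
                raise TypeError(f"{name} must be {expected_type.__name__}")
        elif default is not None:
            value = default
        else:
            raise ValueError(f"missing required argument: {name}")
        params.append(value)
    placeholders = ", ".join("?" for _ in params)
    # The function name comes from the registry, never from the model,
    # so the agent cannot smuggle in arbitrary SQL.
    return f"SELECT * FROM {tool}({placeholders})", params

sql, params = build_query(
    "get_failure_rate",
    {"service_name": "checkout", "lookback_hours": 72},
)
# sql    → "SELECT * FROM get_failure_rate(?, ?, ?)"
# params → ["checkout", 72, 100]
```

A call with a missing required argument or a wrong type fails loudly before touching the warehouse, which is exactly the behavior you want from a contract boundary.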
You can think of this as a semantic layer that you didn’t have to build a whole platform for. A full semantic layer (dbt metrics, Cube, AtScale) is a real engineering commitment. A headless dashboard can be a handful of well-typed SQL functions exposed as tool contracts. Same governance principle, much less to stand up.
Why this helps with testing
Because the logic lives in SQL functions, you can test it directly. Time-window boundaries, aggregation correctness, threshold behavior, empty-data handling, all of it goes in CI like normal code.
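To make the CI idea concrete, here is a sketch of those edge-case tests against a pure-Python reference of the failure-rate logic. In practice the tests would call the SQL function against fixture data; the reference implementation here is an assumption, used so the edge cases can run without a warehouse. Events are modeled as `(timestamp, ok)` pairs, and the window is half-open: `[start, now)`.

```python
# Hypothetical reference implementation of the contract's logic, so the
# edge cases (empty data, sample floor, window boundary) run in plain CI.
from datetime import datetime, timedelta

def failure_rate(events, now, lookback_hours=24, min_sample_size=100):
    """Failure rate over the lookback window, or None below the sample floor."""
    start = now - timedelta(hours=lookback_hours)
    window = [ok for ts, ok in events if start <= ts < now]
    if len(window) < min_sample_size:
        return None  # refuse to report a rate on thin data
    return sum(1 for ok in window if not ok) / len(window)

now = datetime(2024, 1, 2, 0, 0)

# Empty-data handling: no events, no number.
assert failure_rate([], now) is None

# Sample-size floor: 99 events is below the default threshold of 100.
thin = [(now - timedelta(minutes=i), True) for i in range(1, 100)]
assert failure_rate(thin, now) is None

# Aggregation correctness: 10 failures out of 200 events.
mixed = [(now - timedelta(minutes=i + 1), i >= 10) for i in range(200)]
assert failure_rate(mixed, now) == 10 / 200

# Window boundary: an event exactly lookback_hours old is still included
# because the window is half-open on the recent end, closed on the old end.
edge = [(now - timedelta(hours=24), False)]
assert failure_rate(edge, now, min_sample_size=1) == 1.0
```

The point is not this particular implementation; it's that each behavior the agent depends on has a named, pinned-down test instead of living implicitly in generated SQL.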
When a recommendation looks off, you can figure out where it went wrong: stale data, broken contract, or the agent drawing a weird conclusion from correct numbers. Three different problems, three different fixes.
And every investigation is reproducible. You have the function name and the parameters. Compare that to debugging a wall of generated SQL at 11pm.
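Reproducibility falls out of the shape of the data: if each step is recorded as a (tool, arguments) pair, replaying an investigation is just re-running the log through the same execution layer. The `replay` helper and log format below are assumptions for illustration.

```python
# Hypothetical audit log: one entry per tool call the agent made.
import json

audit_log = [
    {"tool": "get_failure_rate",
     "arguments": {"service_name": "checkout", "lookback_hours": 72}},
]

def replay(log, execute):
    """Re-run every recorded tool call through the same execution layer."""
    return [execute(step["tool"], step["arguments"]) for step in log]

# With a stub executor, replay just shows what would be re-run verbatim.
trace = replay(audit_log, lambda tool, args: f"{tool}({json.dumps(args)})")
```

Swap the stub for the real execution layer and the same log reproduces the investigation against today's data, which is the debugging loop the generated-SQL approach can't give you.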
The annoying parts
This pattern is constrained on purpose, and the constraints have a cost. You only get answers within the contracts you’ve built, so there’s real upfront work. The agent will still mis-prioritize and bury the lede sometimes. And some diagnostics genuinely need APIs that aren’t part of your contract surface.
That’s fine. The goal isn’t a perfect system. It’s adaptive investigation with controlled risk.
The shift in thinking
If you’re building one of these, don’t think of it as “an LLM over your data warehouse.” Think of it as three layers that have to work together. Governed SQL contracts define truth. The agent chooses how to investigate. Tests validate the logic underneath.
Get those three right and you have something you can actually run in production.