SQL Agent using LangGraph
Overview
This article describes a table-scoped SQL agent: given a table name, its column metadata, and a user question, it generates BigQuery SQL, validates it, executes it, and returns the most relevant rows plus a total record count.
Table selection (figuring out which table to use) is intentionally out of scope here and handled by a separate router layer (typically RAG + ranking). This separation keeps responsibilities clean and failure modes diagnosable.
1) Problem Statement and Boundaries
1.1 The Problem
Users ask natural-language questions over operational tables: aggregations (“count by status”), drill-downs (“show top issues for X”), definitions (“what is field Y”), and general queries that translate into filters, groupings, and ordering.
1.2 What This Agent Solves
This agent assumes the correct table is already known and focuses on producing reliable SQL and results for that single table:
- Plan the SQL steps in plain English (intent → operations).
- Validate filter assumptions against actual table values.
- Generate executable BigQuery SQL.
- Execute safely with a row cap and compute a total count.
1.3 What This Agent Does Not Solve
- Choosing the right table from a catalog (a RAG/router problem).
- Cross-table joins, multi-dataset reasoning, or schema discovery.
2) High-Level Architecture
2.1 Two-Layer System (Conceptual)
- Layer A: Table Router (out of scope). Uses metadata embeddings, RAG, heuristics, and confidence thresholds to pick the best table(s).
- Layer B: Table-Scoped SQL Agent (this article). Converts question → validated plan → SQL → executed results on a known table.
2.2 Inputs and Outputs Contract
Inputs
- user_query
- table_name
- table_metadata (per column: type, description, distinct_count, optional distinct/sample values)
- Two LLMs: thinking_llm for planning, base_llm for generation and validation
- Configuration: n_iterations (parallel runs) and record_limit (default row cap)
Outputs
- Generated SQL (candidate_sql)
- Rows returned (raw_results)
- Total record count (total_record_count)
- Plan summary / reasoning (reasoning)
- Token usage (input/output)
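To make the contract concrete, here is a minimal sketch as Python dataclasses. The field names come from this article; the types and default values are assumptions, not the production definitions.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AgentInputs:
    user_query: str
    table_name: str
    # Per-column metadata: type, description, distinct_count,
    # optional distinct/sample values
    table_metadata: dict[str, dict[str, Any]]
    thinking_llm: Any          # planning model
    base_llm: Any              # generation + validation model
    n_iterations: int = 3      # parallel runs (default from this article)
    record_limit: int = 100    # default row cap (value assumed)

@dataclass
class AgentOutputs:
    candidate_sql: str
    raw_results: list[dict[str, Any]]
    total_record_count: int
    reasoning: str             # plan summary
    input_tokens: int = 0
    output_tokens: int = 0
```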
3) Metadata Strategy
3.1 Why Schema-Only Is Not Enough
Column names and types rarely match how users speak. Reliability improves when the model sees:
- Column descriptions (business meaning)
- Distinct counts (cardinality hints)
- Sample/distinct values (grounding for filters)
3.2 Metadata Used by the Agent
For each column, the agent can use:
- Type: guides filtering/aggregation strategy
- Distinct count: helps choose grouping keys and low-cardinality “dimension” fields
- Distinct values (when the set is small) and sample values: improve filter selection and reduce hallucinated values
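For concreteness, per-column metadata might look like this (key names follow the fields above; the example values are invented):

```python
table_metadata = {
    "status": {
        "type": "STRING",
        "description": "Current lifecycle state of the ticket",
        "distinct_count": 5,
        "distinct_values": ["open", "in_progress", "blocked", "resolved", "closed"],
    },
    "created_at": {
        "type": "TIMESTAMP",
        "description": "When the record was created",
        "distinct_count": 184203,
        "sample_values": ["2024-01-03 10:15:00 UTC"],
    },
}
```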
3.3 Keyword-to-Column Hints
Before planning, the workflow computes a keyword → columns mapping based on literal LIKE matches. This is used as a soft guide to pick relevant columns for filtering or selection.
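A minimal sketch of the mapping step. One assumption here: instead of issuing LIKE probes against the live table, this variant scans the distinct/sample values already captured in the metadata, which is cheaper but only as reliable as those samples.

```python
def keyword_to_columns(keywords: list[str],
                       table_metadata: dict[str, dict]) -> dict[str, list[str]]:
    """Map each query keyword to the columns whose known values contain it."""
    mapping: dict[str, list[str]] = {}
    for kw in keywords:
        needle = kw.lower()
        hits = [
            col for col, meta in table_metadata.items()
            if any(needle in str(v).lower()
                   for v in meta.get("distinct_values",
                                     meta.get("sample_values", [])))
        ]
        if hits:
            mapping[kw] = hits
    return mapping
```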
4) Workflow Orchestration with LangGraph
4.1 Why LangGraph Here
This system is not a single prompt. It requires:
- Multiple steps (planning, validating, generating, executing)
- Conditional retries based on observed failures
- Traceable state and per-node histories
LangGraph provides a structured, debuggable workflow with explicit nodes and conditional edges.
4.2 State Model
- Inputs (query, schema metadata, LLMs)
- Intermediate artifacts (paraphrases, keyword mapping, reasoning, candidate SQL)
- Validation/execution results and feedback
- Per-stage histories (reasoner, generator, validator)
- Token accounting
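A sketch of that state as a LangGraph-style TypedDict, assuming langgraph's add_messages reducer for the per-stage histories; the key names are illustrative, not the production schema.

```python
from typing import Annotated, Any, TypedDict
from langgraph.graph.message import add_messages

class AgentState(TypedDict, total=False):
    # Inputs
    user_query: str
    table_name: str
    table_metadata: dict[str, dict[str, Any]]
    # Intermediate artifacts
    paraphrases: list[str]
    keyword_columns: dict[str, list[str]]
    reasoning: str
    candidate_sql: str
    # Validation / execution results and feedback
    filters_valid: bool
    sql_valid: bool
    execution_error: str
    empty_results: bool
    feedback: str
    raw_results: list[dict[str, Any]]
    total_record_count: int
    # Per-stage histories (append-only message lists)
    reasoner_history: Annotated[list, add_messages]
    generator_history: Annotated[list, add_messages]
    validator_history: Annotated[list, add_messages]
    # Token accounting
    input_tokens: int
    output_tokens: int
```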
4.3 Graph Nodes (At a Glance)
- preprocess_query_variation
- reasoner
- reasoner_validator
- sql_generator
- context_validator
- sql_executor
4.4 Conditional Routing Logic
The workflow self-corrects via conditional edges:
- If filter validation fails → go back to the reasoner
- If SQL-context validation fails → go back to sql_generator
- If execution returns empty rows → go back to the reasoner with “relax filters” guidance
- If execution hard-fails → go back to sql_generator using the error feedback
A recursion limit is applied to prevent infinite loops.
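A minimal wiring sketch of those rules, assuming the node functions exist as described and write the flag fields (filters_valid, sql_valid, execution_error, empty_results) into state:

```python
from langgraph.graph import StateGraph, START, END

graph = StateGraph(AgentState)
for name, fn in [
    ("preprocess_query_variation", preprocess_query_variation),
    ("reasoner", reasoner),
    ("reasoner_validator", reasoner_validator),
    ("sql_generator", sql_generator),
    ("context_validator", context_validator),
    ("sql_executor", sql_executor),
]:
    graph.add_node(name, fn)

graph.add_edge(START, "preprocess_query_variation")
graph.add_edge("preprocess_query_variation", "reasoner")
graph.add_edge("reasoner", "reasoner_validator")

# Filter validation failed -> back to the reasoner; otherwise generate SQL.
graph.add_conditional_edges(
    "reasoner_validator",
    lambda s: "sql_generator" if s.get("filters_valid") else "reasoner",
)
graph.add_edge("sql_generator", "context_validator")

# SQL-context validation failed -> repair the SQL; otherwise execute it.
graph.add_conditional_edges(
    "context_validator",
    lambda s: "sql_executor" if s.get("sql_valid") else "sql_generator",
)

def after_execution(s: AgentState) -> str:
    if s.get("execution_error"):   # hard failure: repair the SQL
        return "sql_generator"
    if s.get("empty_results"):     # zero rows: relax filters in the plan
        return "reasoner"
    return END

graph.add_conditional_edges("sql_executor", after_execution)

app = graph.compile()
# The recursion limit caps total node visits, bounding the retry loops.
result = app.invoke(initial_state, config={"recursion_limit": 25})
```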
5) Step-by-Step: What Each Stage Does
5.1 Query Preprocessing and Variations
The agent normalises the user query and asks the LLM to produce five paraphrases. This increases coverage for ambiguous phrasing and improves the planner’s ability to map intent to the schema.
In parallel, a keyword-presence function builds a query keyword → columns mapping from literal matches against the table.
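A sketch of the paraphrase step, assuming a LangChain chat model bound as base_llm; the prompt wording is illustrative.

```python
from langchain_core.prompts import ChatPromptTemplate

paraphrase_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Rewrite the user's question five different ways without changing "
     "its meaning. Return one paraphrase per line."),
    ("user", "{query}"),
])

def preprocess_query_variation(state: AgentState) -> dict:
    reply = (paraphrase_prompt | base_llm).invoke({"query": state["user_query"]})
    paraphrases = [ln.strip() for ln in reply.content.splitlines() if ln.strip()]
    return {"paraphrases": paraphrases[:5]}
```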
5.2 Reasoner: Plan First, in Plain English
The reasoner produces a short “Plan Summary” describing the SQL stages:
- SELECT fields
- WHERE filters
- GROUP BY + aggregations
- ORDER BY + optional LIMIT (only if semantically required)
This stage keeps a running message history and can incorporate corrective feedback after validation failures or empty results.
5.3 Reasoner Validator: Tool-Verified Filters
The system extracts string filters from the plan and validates them using a tool:
- Check whether a column contains the proposed value (case-insensitive LIKE)
- Return a structured pass/fail with concise feedback
If validation fails, the workflow routes back to the reasoner to adjust columns or values before generating SQL.
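A sketch of the validation tool against BigQuery, assuming google-cloud-bigquery and a fully qualified table name:

```python
from google.cloud import bigquery

def validate_string_filter(client: bigquery.Client, table: str,
                           column: str, value: str) -> tuple[bool, str]:
    """Confirm a proposed filter value actually occurs in the column."""
    sql = f"""
        SELECT EXISTS(
            SELECT 1 FROM `{table}`
            WHERE LOWER(`{column}`) LIKE LOWER(@pattern)
        ) AS found
    """
    job_config = bigquery.QueryJobConfig(query_parameters=[
        bigquery.ScalarQueryParameter("pattern", "STRING", f"%{value}%"),
    ])
    row = next(iter(client.query(sql, job_config=job_config).result()))
    if row.found:
        return True, ""
    return False, (f"No rows in `{column}` match '{value}'; "
                   "adjust the column or value before generating SQL.")
```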
5.4 SQL Generator: BigQuery SQL from the Plan
The SQL generator produces a single BigQuery SQL statement based on:
- the user question plus paraphrases
- the validated reasoning plan
- the full schema metadata
It is explicitly instructed to:
- use case-insensitive string-matching patterns
- reference only known columns
- avoid LIMIT by default (execution caps the rows anyway)
- include a few additional descriptive columns for context when appropriate
The generator retains a chat history so it can repair SQL using validation or execution error feedback.
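As an illustration, those instructions might be encoded in a generator prompt like this (wording assumed; the MessagesPlaceholder carries prior attempts and feedback so the model can repair its SQL):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

sql_generator_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Write exactly one BigQuery SQL statement over the table `{table_name}`.\n"
     "Rules: use LOWER(col) LIKE LOWER('%value%') for string filters; "
     "reference only columns listed in the schema; do not add LIMIT unless "
     "the question requires it; include a few descriptive columns for context.\n"
     "Schema:\n{schema}"),
    MessagesPlaceholder("generator_history"),
    ("user", "Question: {query}\nParaphrases: {paraphrases}\nPlan: {reasoning}"),
])
```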
5.5 Context Validator: SQL ↔ Intent Alignment
A separate validator checks the generated SQL against:
- the reasoning plan
- required filters and aggregations
- schema compliance
- case-insensitive filter conventions
It returns strict structured output:
- valid: true/false
- feedback: "" or a concise issue description
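One way to enforce that contract, assuming base_llm is a LangChain chat model and the plan/SQL strings come from state, is Pydantic structured output:

```python
from pydantic import BaseModel

class ContextValidation(BaseModel):
    valid: bool
    feedback: str = ""  # empty when valid, otherwise a concise issue description

validator = base_llm.with_structured_output(ContextValidation)
verdict = validator.invoke(
    f"Plan:\n{reasoning}\n\nSQL:\n{candidate_sql}\n\n"
    "Does the SQL implement the plan, use only schema columns, "
    "and apply case-insensitive string filters?"
)
```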
5.6 SQL Execution and Result Packaging
The executor:
- strips trailing semicolons
- enforces a record cap if the model omitted LIMIT
- computes a total count using a COUNT wrapper query that removes the top-level ORDER BY/LIMIT to keep BigQuery happy
- marks empty_results when the query succeeds but returns zero rows, triggering the “relax filters” loop in the reasoner
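A sketch of the executor, assuming google-cloud-bigquery; the ORDER BY/LIMIT stripping is a naive regex over the statement tail, and production code should parse the SQL properly.

```python
import re
from google.cloud import bigquery

def execute_with_count(client: bigquery.Client, sql: str,
                       record_limit: int = 100) -> dict:
    sql = sql.strip().rstrip(";")

    # Enforce the row cap only when the model omitted LIMIT.
    capped = sql if re.search(r"\bLIMIT\s+\d+\s*$", sql, re.I) \
        else f"{sql}\nLIMIT {record_limit}"
    rows = [dict(r) for r in client.query(capped).result()]

    # Total count: wrap the query with the top-level ORDER BY/LIMIT removed.
    inner = re.sub(r"\bORDER\s+BY\b[\s\S]*$", "", sql, flags=re.I)
    inner = re.sub(r"\bLIMIT\s+\d+\s*$", "", inner, flags=re.I).strip()
    count_sql = f"SELECT COUNT(*) AS n FROM ({inner})"
    total = next(iter(client.query(count_sql).result())).n

    return {
        "raw_results": rows,
        "total_record_count": total,
        "empty_results": not rows,  # success with zero rows -> relax filters
    }
```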
6) Production Behaviour and Observed Efficacy
6.1 Observed Efficacy
In production usage, the reported efficacy is ~70%+ across varied question types, particularly those that translate cleanly into:
- aggregations and group-bys
- drill-down queries with filters
- “top N” and ordering
- definition/lookup-style queries when the data model supports them
6.2 Reliability Patterns That Matter
This agent behaves more reliably than prompt-only SQL generation because it:
- validates filter assumptions against live data
- checks SQL-plan alignment explicitly
- corrects itself via structured retries (not “try again” randomness)
- caps result sizes and returns total counts for UI/UX consistency
6.3 Parallel Runs for Better Coverage
The system can run multiple workflows in parallel (n_iterations, default 3). Each run produces a candidate output (SQL + results), which can later be ranked or selected by downstream logic.
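A sketch of that fan-out, assuming the compiled graph app from section 4 and LangGraph's async ainvoke:

```python
import asyncio

async def run_candidates(initial_state: dict, n_iterations: int = 3) -> list[dict]:
    """Run the workflow n_iterations times concurrently; each run yields a
    candidate (SQL + results) for downstream ranking or selection."""
    tasks = [app.ainvoke(initial_state, config={"recursion_limit": 25})
             for _ in range(n_iterations)]
    return await asyncio.gather(*tasks)

# candidates = asyncio.run(run_candidates(initial_state))
```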
Conclusion
This is a production-oriented SQL agent for a known table that:
- turns natural language into an explicit SQL plan
- validates critical filter assumptions with tool checks
- generates BigQuery SQL with schema discipline
- executes safely with row caps and count queries
- self-corrects across planning, generation, and execution failures
The biggest gains typically come from:
- Type-aware validation (numbers/dates/ranges validated the way strings are today)
- Candidate ranking across parallel runs (choosing the best result, not just any result)
- Stronger empty-result recovery (smarter filter-relaxation ordering)
- Tighter contracts with the table-router layer (confidence thresholds, fallbacks, and clarifying-question triggers)