
SQL Agent using LangGraph

Overview

This article describes a table-scoped SQL agent: given a table name, its column metadata, and a user question, it generates BigQuery SQL, validates it, executes it, and returns the most relevant rows plus a total record count.

Table selection (figuring out which table to use) is intentionally out of scope here and handled by a separate router layer (typically RAG + ranking). This separation keeps responsibilities clean and failure modes diagnosable.


1) Problem Statement and Boundaries

1.1 The Problem

Users ask natural-language questions over operational tables: aggregations (“count by status”), drill-downs (“show top issues for X”), definitions (“what is field Y”), and general queries that translate into filters, groupings, and ordering.

1.2 What This Agent Solves

This agent assumes the correct table is already known and focuses on producing reliable SQL and results for that single table:

  • Plan the SQL steps in plain English (intent → operations).

  • Validate filter assumptions against actual table values.

  • Generate executable BigQuery SQL.

  • Execute safely with a row cap and compute a total count.

1.3 What This Agent Does Not Solve

  • Choosing the right table from a catalog (RAG/router problem).

  • Cross-table joins, multi-dataset reasoning, or schema discovery.


2) High-Level Architecture

2.1 Two-Layer System (Conceptual)

  • Layer A: Table Router (Out of scope)
    Uses metadata embeddings, RAG, heuristics, and confidence thresholds to pick the best table(s).

  • Layer B: Table-Scoped SQL Agent (This article)
    Converts question → validated plan → SQL → executed results on a known table.

2.2 Inputs and Outputs Contract

Inputs

  • user_query

  • table_name

  • table_metadata (per column: type, description, distinct_count, optional distinct/sample values)

  • Two LLMs:

    • thinking_llm for planning

    • base_llm for generation and validation

  • Configuration: n_iterations (parallel runs), record_limit (default cap)

Outputs

  • Generated SQL (candidate_sql)

  • Rows returned (raw_results)

  • Total record count (total_record_count)

  • Plan summary / reasoning (reasoning)

  • Token usage (input/output)
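The contract above can be sketched as two typed containers. Field names mirror the article; the defaults (`n_iterations=3` from section 6.3, `record_limit=100`) and the token-count fields' shape are illustrative assumptions, not the production schema:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class AgentInputs:
    user_query: str
    table_name: str
    # Per-column metadata: type, description, distinct_count, optional values.
    table_metadata: dict[str, dict[str, Any]]
    n_iterations: int = 3      # parallel runs (default per section 6.3)
    record_limit: int = 100    # assumed default row cap

@dataclass
class AgentOutputs:
    candidate_sql: str
    raw_results: list[dict[str, Any]]
    total_record_count: int
    reasoning: str             # plan summary
    input_tokens: int = 0
    output_tokens: int = 0
```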


3) Metadata Strategy

3.1 Why Schema-Only Is Not Enough

Column names and types rarely match how users speak. Reliability improves when the model sees:

  • Column descriptions (business meaning)

  • Distinct counts (cardinality hints)

  • Sample/distinct values (grounding filters)

3.2 Metadata Used by the Agent

For each column, the agent can use:

  • Type: guides filtering/aggregation strategy

  • Distinct count: helps choose grouping keys and low-cardinality “dimension” fields

  • Distinct values (when small) and sample values: improves filter selection and reduces hallucinated values
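As a concrete illustration, per-column metadata might look like the following (a hypothetical `tickets` table; all names and values are invented):

```python
# Hypothetical metadata for two columns of a "tickets" table.
# Low-cardinality columns carry full distinct_values; high-cardinality
# columns carry only a distinct_count and a few sample_values.
table_metadata = {
    "status": {
        "type": "STRING",
        "description": "Current ticket state",
        "distinct_count": 5,
        "distinct_values": ["open", "closed", "pending", "blocked", "resolved"],
    },
    "created_at": {
        "type": "TIMESTAMP",
        "description": "Ticket creation time",
        "distinct_count": 48210,
        "sample_values": ["2024-01-03T09:15:00Z"],
    },
}
```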

3.3 Keyword-to-Column Hints

Before planning, the workflow computes a keyword → columns mapping based on literal LIKE matches. This is used as a soft guide to pick relevant columns for filtering or selection.
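A minimal offline sketch of this mapping, approximating the literal LIKE probe against cached distinct/sample values rather than a live BigQuery call (the `keyword_to_columns` helper name is hypothetical):

```python
def keyword_to_columns(user_query: str, table_metadata: dict) -> dict[str, list[str]]:
    """Map each query keyword to columns whose known values contain it,
    mirroring a case-insensitive LIKE '%keyword%' probe."""
    mapping: dict[str, list[str]] = {}
    for keyword in user_query.lower().split():
        hits = []
        for column, meta in table_metadata.items():
            values = meta.get("distinct_values", []) + meta.get("sample_values", [])
            if any(keyword in str(v).lower() for v in values):
                hits.append(column)
        if hits:
            mapping[keyword] = hits
    return mapping
```

Because the result is only a soft hint, false positives are acceptable; the reasoner validator later confirms actual filter values against the table.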


4) Workflow Orchestration with LangGraph

4.1 Why LangGraph Here

This system is not a single prompt. It requires:

  • Multiple steps (planning, validating, generating, executing)

  • Conditional retries based on observed failures

  • Traceable state and histories per node

LangGraph provides a structured, debuggable workflow with explicit nodes and conditional edges.

4.2 State Model

A shared GraphState flows through the graph and carries:

  • Inputs (query, schema metadata, LLMs)

  • Intermediate artifacts (paraphrases, keyword mapping, reasoning, candidate SQL)

  • Validation/execution results and feedback

  • Histories per stage (reasoner history, generator history, validator history)

  • Token accounting
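One way to model this shared state is a `TypedDict` with optional keys, so each node reads and writes only the fields it owns (the field names are illustrative assumptions, not the exact production schema):

```python
from typing import Any, TypedDict

class GraphState(TypedDict, total=False):
    # Inputs
    user_query: str
    table_name: str
    table_metadata: dict[str, Any]
    # Intermediate artifacts
    paraphrases: list[str]
    keyword_columns: dict[str, list[str]]
    reasoning: str
    candidate_sql: str
    # Validation / execution results and feedback
    feedback: str
    raw_results: list[dict[str, Any]]
    total_record_count: int
    empty_results: bool
    # Histories per stage
    reasoner_history: list[dict[str, str]]
    generator_history: list[dict[str, str]]
    validator_history: list[dict[str, str]]
    # Token accounting
    input_tokens: int
    output_tokens: int
```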

4.3 Graph Nodes (At a Glance)

  • preprocess_query_variation

  • reasoner

  • reasoner_validator

  • sql_generator

  • context_validator

  • sql_executor

4.4 Conditional Routing Logic

The workflow self-corrects via conditional edges:

  • If filter validation fails → go back to reasoner

  • If SQL-context validation fails → go back to sql_generator

  • If execution returns empty rows → go back to reasoner with “relax filters” guidance

  • If execution hard-fails → go back to sql_generator using error feedback

A recursion limit is applied to prevent infinite loops.
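The routing rules above can be sketched as plain routing functions of the kind LangGraph's `add_conditional_edges` accepts; the state keys (`filters_valid`, `sql_valid`, `execution_error`, `empty_results`) are assumed names, and in LangGraph the recursion limit would be set at invoke time via `config={"recursion_limit": ...}`:

```python
END = "__end__"  # sentinel; langgraph.graph exposes END for this purpose

def route_after_filter_validation(state: dict) -> str:
    """After reasoner_validator: re-plan on failure, else generate SQL."""
    return "sql_generator" if state.get("filters_valid") else "reasoner"

def route_after_context_validation(state: dict) -> str:
    """After context_validator: regenerate SQL on failure, else execute."""
    return "sql_executor" if state.get("sql_valid") else "sql_generator"

def route_after_execution(state: dict) -> str:
    """After sql_executor: repair SQL on hard failure, re-plan on empty rows."""
    if state.get("execution_error"):
        return "sql_generator"   # repair SQL using the error feedback
    if state.get("empty_results"):
        return "reasoner"        # re-plan with "relax filters" guidance
    return END

# Wiring sketch (requires langgraph):
#   graph.add_conditional_edges("sql_executor", route_after_execution)
#   app = graph.compile()
#   app.invoke(initial_state, config={"recursion_limit": 25})
```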


5) Step-by-Step: What Each Stage Does

5.1 Query Preprocessing and Variations

The agent normalises the user query and asks the LLM to produce five paraphrases. This increases coverage for ambiguous phrasing and improves the planner’s ability to map intent to the schema.

In parallel, a keyword-presence function builds a query keyword → columns mapping from literal matches against the table.

5.2 Reasoner: Plan First, in Plain English

The reasoner produces a short “Plan Summary” describing the SQL stages:

  • SELECT fields

  • WHERE filters

  • GROUP BY + aggregations

  • ORDER BY + optional LIMIT (only if semantically required)

This stage keeps a running message history and can incorporate corrective feedback after validation failures or empty results.

5.3 Reasoner Validator: Tool-Verified Filters

The system extracts string filters from the plan and validates them using a tool:

  • Check if a column contains the proposed value (case-insensitive LIKE)

  • Return structured pass/fail with concise feedback

If validation fails, the workflow routes back to the reasoner to adjust columns or values before generating SQL.
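A sketch of the validation tool, split into building the case-insensitive LIKE probe and packaging the structured pass/fail result. The escaping here is deliberately naive; production code should use BigQuery query parameters instead of string interpolation:

```python
def build_filter_probe(table_name: str, column: str, value: str) -> str:
    """Probe query: does `column` contain `value` case-insensitively?"""
    needle = value.lower().replace("'", "\\'")  # naive escaping for the sketch
    return (
        f"SELECT COUNT(*) AS n FROM `{table_name}` "
        f"WHERE LOWER(CAST({column} AS STRING)) LIKE '%{needle}%'"
    )

def validate_filter(n_matches: int, column: str, value: str) -> dict:
    """Package the probe result as structured pass/fail with concise feedback."""
    if n_matches > 0:
        return {"passed": True, "feedback": ""}
    return {
        "passed": False,
        "feedback": f"No rows in {column} match '{value}'; adjust the column or value.",
    }
```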

5.4 SQL Generator: BigQuery SQL from the Plan

The SQL generator produces a single BigQuery SQL statement based on:

  • user question + paraphrases

  • validated reasoning plan

  • full schema metadata

It is explicitly instructed to:

  • use case-insensitive string matching patterns

  • reference only known columns

  • avoid LIMIT by default (execution caps the rows anyway)

  • include a few additional descriptive columns for context when appropriate

The generator retains a chat history so it can repair SQL using validation or execution error feedback.

5.5 Context Validator: SQL ↔ Intent Alignment

A separate validator checks the generated SQL against:

  • the reasoning plan

  • required filters and aggregations

  • schema compliance

  • case-insensitive filter conventions

It returns strict structured output:

  • valid: true/false

  • feedback: "" or concise issue description
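The strict output can be parsed into a typed result, for example (a sketch; the JSON field names follow the contract above, and the parser assumes the model returns well-formed JSON):

```python
import json
from dataclasses import dataclass

@dataclass
class ValidationResult:
    valid: bool
    feedback: str  # "" when valid, concise issue description otherwise

def parse_validator_output(raw: str) -> ValidationResult:
    """Parse the validator's strict JSON output into a typed result."""
    data = json.loads(raw)
    return ValidationResult(valid=bool(data["valid"]), feedback=data.get("feedback", ""))
```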

5.6 SQL Execution and Result Packaging

The executor:

  • strips trailing semicolons

  • enforces a record cap if the model omitted LIMIT

  • computes a total count via a COUNT(*) wrapper query, first stripping the top-level ORDER BY/LIMIT so the wrapper is valid in BigQuery and counts the full result set

  • marks empty_results when the query succeeds but returns zero rows, triggering the “relax filters” loop in the reasoner
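The executor's SQL handling can be sketched with naive top-level regexes (function names are hypothetical; subqueries carrying their own ORDER BY/LIMIT would need a real SQL parser):

```python
import re

def prepare_for_execution(sql: str, record_limit: int = 100) -> str:
    """Strip trailing semicolons and append a LIMIT if the model omitted one."""
    sql = sql.strip().rstrip(";").strip()
    if not re.search(r"\bLIMIT\s+\d+\s*$", sql, re.IGNORECASE):
        sql = f"{sql} LIMIT {record_limit}"
    return sql

def count_wrapper(sql: str) -> str:
    """Wrap the query in COUNT(*), dropping a trailing LIMIT and ORDER BY
    so the wrapper is valid and counts the full result set."""
    body = sql.strip().rstrip(";")
    body = re.sub(r"\s+LIMIT\s+\d+\s*$", "", body, flags=re.IGNORECASE)
    body = re.sub(r"\s+ORDER\s+BY\s+[^()]*$", "", body, flags=re.IGNORECASE)
    return f"SELECT COUNT(*) AS total_record_count FROM ({body})"
```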


6) Production Behaviour and Observed Efficacy

6.1 Observed Efficacy

In production usage, reported efficacy is roughly 70% or better across varied question types, particularly those that translate cleanly into:

  • aggregations and group-bys

  • drill-down queries with filters

  • “top N” and ordering

  • definition/lookup style queries when the data model supports it

6.2 Reliability Patterns That Matter

This agent behaves more reliably than prompt-only SQL generation because it:

  • validates filter assumptions against live data

  • checks SQL-plan alignment explicitly

  • corrects itself via structured retries (not “try again” randomness)

  • caps result sizes and returns total counts for UI/UX consistency

6.3 Parallel Runs for Better Coverage

The system can run multiple workflows in parallel (n_iterations, default 3). Each run produces a candidate output (SQL + results), which can later be ranked or selected by downstream logic.
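A minimal sketch of the fan-out, where `run_workflow` stands in for a compiled graph's `.invoke()` and each run gets its own copy of the initial state:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(run_workflow, base_state: dict, n_iterations: int = 3) -> list[dict]:
    """Run n_iterations independent workflow invocations and collect
    candidate outputs (SQL + results) for downstream ranking."""
    with ThreadPoolExecutor(max_workers=n_iterations) as pool:
        futures = [pool.submit(run_workflow, dict(base_state)) for _ in range(n_iterations)]
        return [f.result() for f in futures]
```

Threads suit this fan-out because each run is dominated by LLM and BigQuery I/O; a downstream selector can then rank the candidates.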


Conclusion

This is a production-oriented SQL agent for a known table that:

  • turns natural language into an explicit SQL plan

  • validates critical filter assumptions with tool checks

  • generates BigQuery SQL with schema discipline

  • executes safely with row caps and count queries

  • self-corrects across planning, generation, and execution failures

The biggest gains typically come from:

  • Type-aware validation (numbers/dates/ranges validated like strings are today)

  • Candidate ranking across parallel runs (choose the best result, not just any result)

  • Stronger empty-result recovery (smarter filter relaxation ordering)

  • Tighter contracts with the table-router layer (confidence thresholds, fallbacks, and clarifying-question triggers)