
How to Use Large Language Models (LLMs) To Interact with Your SQL Data

Carolyn Weitz
Last Updated: Jan 12, 2026
7 Minute Read

Data is an invaluable asset for enterprises, and SQL (Structured Query Language) is the accepted standard for working with relational database management systems (RDBMS). After all, StackOverflow’s developer survey shows that SQL is used by 58.6% of all respondents (61.3% of professional developers).

However, extracting actionable insights from that data demands technical expertise: someone has to write raw SQL queries. This impedes data democratization, especially for business users without SQL skills.

Large Language Models (LLMs) remove that barrier by converting natural language into SQL queries. This step-by-step guide shows how to connect an LLM to a SQL database so you can query it in plain English.

What is the Role of LLMs in SQL Data Interaction?

As a class of Artificial Intelligence (AI) models, LLMs have shown impressive performance in understanding and generating human-like text.

With this capability, business users can now interact with databases in natural language, without understanding complex SQL syntax.

In other words, an LLM turns database access into a conversation. You simply submit a prompt (e.g., “Number of male students with depression”), it queries the database, and it gives you the values.


LangChain: An Interface between LLMs and Databases

LangChain is an open-source framework built around the concept of chains. It provides SQL chains and agents that take natural language prompts and execute SQL queries.

A chain converts natural language into an executable SQL query, transforming database interaction into a conversation. This democratizes database access and lowers the barrier to working with SQL data.

The rest of this guide practically demonstrates how to use LLMs to interact with a SQL database. You will need the following to follow along:

  • OpenAI API key
  • LangChain
  • Python Libraries (Pandas)
  • Database Management System (PostgreSQL in this case)

Step 1: Set Up a Virtual Environment

A virtual environment is standard practice in Python project development. With a virtual environment, project packages are installed into an isolated environment rather than into the system-wide Python installation.

As a result, installing a dependency with malicious or conflicting code in one project cannot affect another project or the machine as a whole: each project's packages are isolated from the rest.

Install virtualenv from the terminal

$ pip install virtualenv

Create a virtual environment inside the project directory

$ virtualenv env

Activate the virtual environment

On Git Bash: source env/Scripts/activate

On Windows Powershell or CMD: .\env\Scripts\activate

Install Required Dependencies

pip install openai langchain langchain-openai langchain_experimental pandas sqlalchemy psycopg2-binary

Set Up Environment Variables

Hard-coding sensitive details, such as API keys, is bad practice. It is prudent to keep secrets in environment variables and access them using the os module, like so:

import os
api_key = os.environ.get('OPENAI_API_KEY')

#Database Environment Variables
db_name = os.environ.get("DB_NAME")
db_password = os.environ.get("DB_PASS")
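For these lookups to succeed, the variables must be set in the shell that launches Python. A minimal example for bash-style shells; every value below is a placeholder (on Windows CMD use set, on PowerShell use $env:):

$ export OPENAI_API_KEY="sk-..."
$ export DB_NAME="your_db_name"
$ export DB_PASS="your_db_password"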

Step 2: Reading Data and Uploading to DBMS (PostgreSQL)

In this project, we use a student depression dataset from Kaggle. Download the dataset to your local machine, then read the CSV file into a Pandas DataFrame.

To configure and connect to the database, specify the dialect, i.e., the database management system (DBMS) of choice, in this case PostgreSQL. After the dialect come the username, password, host, port and database name.

Finally, we use the create_engine() function, which returns an Engine that manages database connections and lets us execute raw SQL queries against the database.

import os
import pandas as pd
from sqlalchemy import create_engine, text
#Read CSV File
df = pd.read_csv("C:/Users/nehem/Downloads/Student Depression Dataset.csv", encoding='ISO-8859-1')

#Connect to Database
db_uri = "postgresql://postgres:" + db_password + "@localhost:5432/" + db_name

#Create sqlalchemy engine
engine = create_engine(db_uri)
conn = engine.connect()

#Create Table by executing transactions
create_table = text('''
CREATE TABLE IF NOT EXISTS "Depression_Levels"(
id INTEGER PRIMARY KEY,
Gender VARCHAR,
Age VARCHAR,
City VARCHAR,
Profession VARCHAR,
"Academic Pressure" FLOAT,
"Work Pressure" FLOAT,
CGPA FLOAT,
"Study Satisfaction" FLOAT,
"Job Satisfaction" FLOAT,
"Sleep Duration" VARCHAR,
"Dietary Habits" VARCHAR,
Degree VARCHAR,
"Have you ever had suicidal thoughts?" BOOLEAN,
"Work/Study Hours" FLOAT,
"Financial Stress" FLOAT,
Depression INTEGER)
''')

#Execute the DDL and commit (SQLAlchemy 2.x does not autocommit)
conn.execute(create_table)
conn.commit()

#Insert DataFrame into PostgreSQL (if_exists='replace' drops and recreates the table)
df.to_sql('Depression_Levels', engine, if_exists='replace', index=False)

#Query the database; quotes preserve the mixed-case table name in PostgreSQL
query = text('''SELECT * FROM "Depression_Levels" LIMIT 200;''')

#Get the query results
results = conn.execute(query).fetchall()

#Close the database connection
conn.close()
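Before moving on, it is worth sanity-checking the upload. Here is a minimal sketch that compares row counts on both sides; it assumes the df, engine and text import from the script above:

#Sanity check: the row count in the table should match the DataFrame
with engine.connect() as conn:
    db_rows = conn.execute(text('SELECT COUNT(*) FROM "Depression_Levels";')).scalar()

print(f"DataFrame rows: {len(df)}, table rows: {db_rows}")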

Step 3: Define a Function to Run SQL Query and Fetch Results

The function is relatively straightforward. It takes two parameters: query (a string containing the SQL query) and db_uri (the connection details for the database).

def run_query(query, db_uri):
    #Create Engine object
    engine = create_engine(db_uri)

    #Manage the connection; the with block closes it automatically
    with engine.connect() as conn:
        #Execute the query and print each returned row
        results = conn.execute(text(query)).fetchall()
        for row in results:
            print(row)

The body creates an Engine via the create_engine() function, which manages connections to the database.

Additionally, the with statement ensures the connection is closed automatically after the code inside the block executes. engine.connect() opens a new connection and saves it to a variable, conn.

Finally, the function executes the query and saves the query results in the results variable: conn.execute() runs the query and fetchall() retrieves all rows from the result.

Step 4: Invoke Function to Retrieve First 50 Entries

Here is the code you can use to invoke the function and retrieve the first fifty entries:

query = 'SELECT * FROM "Depression_Levels" LIMIT 50;'
run_query(query, db_uri)

Step 5: Set Up LangChain Utilities to Interact with the SQL Database

The following LangChain modules process a natural language prompt and execute it as an SQL query:

from langchain.utilities import SQLDatabase
from langchain_openai import OpenAI
from langchain_experimental.sql import SQLDatabaseChain

  • SQLDatabase: Establishes a connection between SQL Database and LangChain. Facilitates operations such as executing queries and retrieving results.
  • OpenAI: Configures the LLMs responsible for interpreting natural language and delivering human-like responses.
  • SQLDatabaseChain: Joins the database and LLMs to form a chain, facilitating the conversion of natural language to SQL.
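Before building the chain, you can verify that SQLDatabase actually sees your table. A small sketch, assuming the db_uri defined in Step 2 (get_usable_table_names() is part of LangChain's SQLDatabase API):

#Connect and list the tables LangChain can query
db = SQLDatabase.from_uri(db_uri)
print(db.get_usable_table_names())  #expected to include 'Depression_Levels'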

Step 6: Initialize SQLDatabaseChain and LLM to Process Natural Language

First, establish a connection to the database using the LangChain utility SQLDatabase. Then initialize the OpenAI LLM that will process the natural language and produce results.

#Initialize SQLDatabaseChain and LLM to Process Natural Language
db_connect = SQLDatabase.from_uri(db_uri)
llm_initialize = OpenAI(api_key=api_key, temperature=0, verbose=True)
database_chain = SQLDatabaseChain.from_llm(llm=llm_initialize, db=db_connect, verbose=True)

natural_results = database_chain.run("What are the Depression levels among male students compared to female students?")
print(natural_results)

The argument temperature=0 ensures the response is focused and deterministic rather than random. Finally, SQLDatabaseChain.from_llm() ties together the LLM (llm_initialize) and the database connection (db_connect): the chain takes the natural language input and uses the LLM to generate an SQL query. Here’s the output:

Example Query 1

Compare the depression levels between female and male students.

SQLQuery: SELECT "Gender", AVG("Depression") AS Average_Depression_Level, COUNT(*) AS Total_Students FROM "Depression_Levels" GROUP BY "Gender";
SQLResult: [('Female', 0.5845070422535211, 12354), ('Male', 0.5862867434231685, 15547)]

Answer: The average depression level for female students is 0.5845 across 12354 students and for male students is 0.5863 across 15547 students.

Example Query 2

Which cities have the highest average depression levels?

SQLQuery: SELECT "City", AVG("Depression") FROM "Depression_Levels" GROUP BY "City" ORDER BY AVG("Depression") DESC LIMIT 5;
SQLResult: [('Mira', 1.0), ('Gaurav', 1.0), ('Vaanya', 1.0), ('3.0', 1.0), ('Harsh', 1.0)]

Answer: The cities with the highest average depression levels are Mira, Gaurav, Vaanya, 3.0 and Harsh.

Note that several of these “cities” are clearly dirty data (personal names and the stray value 3.0), which the LLM reports verbatim rather than flagging.
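If you want to confirm that suspicion, a quick pandas check works well. A minimal sketch using the df loaded in Step 2 (the threshold of 5 is an arbitrary cutoff, not part of the original workflow):

#Flag City values that occur only a handful of times; real cities
#repeat across thousands of rows, while typos and stray values do not
city_counts = df["City"].value_counts()
print(city_counts[city_counts < 5])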

Deploy LLMs with AceCloud Experts

LLMs have surmounted the barrier to data interaction. Previously, interacting with SQL data was limited to individuals with the technical expertise to write raw SQL queries.

However, with LLMs, data interaction is democratized and all business users can get actionable insights from data using natural language.

This is because they can write prompts in natural language, and LLMs convert those prompts into queries and retrieve the results. Want to learn more about using cloud resources to deploy LLMs? Book a free consultation with AceCloud experts and run a free trial!

Frequently Asked Questions

What do you need to follow this guide?

You’ll need an OpenAI API key, Python (with Pandas), LangChain and a SQL database. For the dataset, you can use publicly available datasets from Kaggle.

Do you need to know SQL to use this approach?

Not necessarily. You can ask questions in natural language and LangChain will generate SQL for you, but basic SQL knowledge helps you validate results and catch mistakes.

How does LangChain turn a prompt into an answer?

It does that by connecting to your database using SQLDatabase and initializing an OpenAI LLM. Then it uses SQLDatabaseChain to translate your prompt into SQL, execute it and return a natural-language answer.

Can you use a database other than PostgreSQL?

Yes. You can switch by changing the database URI (dialect plus connection details). The example uses a PostgreSQL URI, but the pattern applies to other SQLAlchemy-supported dialects too, as the examples below show.
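For illustration, here are two common SQLAlchemy URI shapes; every name and credential below is a placeholder:

#SQLite: a local file, no server or credentials required
db_uri = "sqlite:///students.db"

#MySQL via the pymysql driver (requires: pip install pymysql)
db_uri = "mysql+pymysql://user:password@localhost:3306/students"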

How should you handle API keys and database credentials?

Don’t hard-code secrets in your script. We recommend using environment variables (e.g., OPENAI_API_KEY, DB_NAME, DB_PASS) and reading them via Python’s os.environ.

What if the generated SQL looks wrong?

Run with verbose=True so you can see the SQL query the chain generated and the raw SQL result, then refine your prompt or fix schema/column naming issues. If needed, validate the SQL directly in your database client before trusting the final natural-language answer.

Carolyn Weitz
author
Carolyn began her cloud career at a fast-growing SaaS company, where she led the migration from on-prem infrastructure to a fully containerized, cloud-native architecture using Kubernetes. Since then, she has worked with a range of companies, from early-stage startups to global enterprises, helping them implement best practices in cloud operations, infrastructure automation, and container orchestration. Her technical expertise spans AWS, Azure, and GCP, with a focus on building scalable IaaS environments and streamlining CI/CD pipelines. Carolyn is also a frequent contributor to cloud-native open-source communities and enjoys mentoring aspiring engineers in the Kubernetes ecosystem.
