Build a GPT-Powered Data Analyst on Your Local Machine
Your private AI data assistant that answers questions about your CSVs — entirely offline, totally free.
Imagine This…
You’ve got a CSV file loaded with thousands of rows.
It’s a typical day at work, or maybe it’s your side project’s sales data.
You wonder:
“What’s the average sales by product?”
“Which region performed best last quarter?”
“Can I see a table showing total revenue by month?”
Normally, that means opening up Excel, fiddling with pivot tables, or writing Pandas code that looks like:
df.groupby('Product')['Sales'].mean()
Not exactly something everyone enjoys.
What if you could just ask?
What if — instead — you could simply type:
Average sales by product
…and instantly get the answer, calculated from your CSV, right on your laptop?
No manual coding
No uploading your sensitive data to OpenAI or some cloud service
No API key costs
100% offline, secure, private
That’s exactly what we’re building today.
You’ll end up with your own personal GPT-powered Data Analyst, running entirely on your local machine, answering your CSV questions in plain English.
What Exactly Are We Building?
Here’s the dream:
A slick web app where you upload a CSV
Ask any natural language question about it
Your local GPT (like Mistral via Ollama) reads a sample of your CSV and figures out the answer
Gives you a direct response — like a markdown table or text summary
All happens offline, so your data never leaves your machine.
It’s basically like having a ChatGPT that has read your CSV, but living entirely on your laptop.
Tools We’re Using
| Tool | Why we’re using it |
|---|---|
| Python | The glue to hold it all together |
| Streamlit | Build a beautiful, interactive UI |
| Pandas | Load & explore your CSVs |
| Ollama | Runs large language models locally |
| Mistral (or LLaMA 3) | Local GPT-like brain for reasoning |
| Requests | Talk to the Ollama API |
That’s it.
No OpenAI keys, no sending your private data over the internet.
The Big Deal: Why Local Matters
Data privacy: Your CSV stays on your laptop.
No surprise bills: It costs zero.
No network latency: talking to localhost skips the round-trip to a cloud API (generation speed then depends only on your hardware).
Freedom to tweak: Want to switch to LLaMA 3 or your own finetuned model? Just change one line.
What Happens Under the Hood?
Here’s the simple architecture:
```
[ You ] ---> upload CSV + type question ---> [ Streamlit ]
                                                   |
                                                   v
                                         [ Pandas DataFrame ]
                                                   |
                                 send small CSV sample + question
                                                   |
                                                   v
                                         [ Ollama + Mistral ]
                                                   |
                                            receives answer
                                                   |
                                                   v
                                         display answer in app
```
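In code, the middle hop is a single HTTP call to Ollama’s local REST API. Here’s a minimal sketch (it assumes Ollama is running on its default port, 11434, with the mistral model pulled; the sample CSV and question are just placeholders):

```python
import requests

# Hypothetical sample: in the real app these come from the uploaded CSV and the text box
csv_sample = "Product,Sales\nWidget,120\nGadget,95\n"
question = "Average sales by product"

prompt = (
    "You are an expert data analyst. Given this CSV data:\n"
    f"{csv_sample}\n"
    f"Answer this question: {question}"
)

# One HTTP call to Ollama's local REST API; stream=False returns the full answer at once
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])
```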
You can literally ask:
“Average sales by product”
…and your local GPT figures it out from the data you gave it.
Prerequisites
A machine with at least 8GB RAM (16GB is better).
Python 3.8 or newer installed.
Ollama installed to run your local GPT.
Basic comfort running a few terminal commands.
Let’s Build This Step by Step
Install Ollama & Mistral
Head to https://ollama.com/download and install Ollama.
Then open your terminal and run:
ollama pull mistral
This downloads the Mistral 7B model (like a local GPT-3.5).
Test it:
ollama run mistral
Try typing:
> What is 7 * 6?
Boom. Instant local inference.
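If you prefer to check from Python that the local API is reachable before writing any app code, a quick sanity check (the /api/tags endpoint lists the models you’ve pulled):

```python
import requests

# Quick sanity check: list the models Ollama has available locally
tags = requests.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags.get("models", [])])  # e.g. ['mistral:latest']
```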
Set Up Python Environment
Make a folder for your app:
mkdir gpt-data-analyst && cd gpt-data-analyst
Create a virtual environment:
```
python -m venv venv
source venv/bin/activate      # macOS/Linux
venv\Scripts\activate         # Windows
```
If the Windows activation command throws an execution-policy error, run `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process` in PowerShell first, then run the activate command again.
Install dependencies:
pip install streamlit pandas requests
The Streamlit App Code
Now create a file called app.py.
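Here’s a minimal sketch of what app.py can look like. It assumes Ollama is running locally on its default port (11434) with the mistral model pulled; tweak the model name or the prompt wording to taste:

```python
import pandas as pd
import requests
import streamlit as st

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "mistral"

st.title("GPT-Powered Data Analyst (Local)")

uploaded_file = st.file_uploader("Upload a CSV file", type="csv")

if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)
    st.write("Preview of your data:")
    st.dataframe(df.head())

    question = st.text_input("Ask a question about your data")

    if question:
        # Send only the first 100 rows to keep the prompt small
        csv_sample = df.head(100).to_csv(index=False)
        prompt = (
            "You are an expert data analyst. Given this CSV data:\n"
            f"{csv_sample}\n\n"
            f"Answer this question: {question}\n"
            "Reply with a concise answer or a markdown table."
        )

        with st.spinner("Thinking locally..."):
            response = requests.post(
                OLLAMA_URL,
                json={"model": MODEL, "prompt": prompt, "stream": False},
            )
            response.raise_for_status()
            answer = response.json()["response"]

        st.markdown(answer)
```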
Run Your Local Data Analyst!
Start it up:
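```
streamlit run app.py
```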
You’ll see something like:
Local URL: http://localhost:8501
Click it. Upload your CSV. Ask your question.
Watch your own GPT-powered data analyst at work — entirely on your machine.
Example Questions to Try
| Question | Example Output |
|---|---|
| “Average sales by product” | Markdown table with products + avg |
| “Total revenue by month” | Table with Month & Revenue columns |
| “Top 5 regions by sales volume” | Direct ranked list |
| “Which product had highest sales?” | Simple text answer |
Your local GPT (Mistral) figures this out by looking at the CSV sample and your question.
Why We Use head(100)
We pass only the first 100 rows to the model to:
Keep the prompt small (local models have a limited context window and can’t ingest a full CSV).
Still give the model enough context about your data’s columns & typical values.
If you have a huge CSV, this is super handy.
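One optional tweak, in case the first 100 rows aren’t representative (say, the file is sorted by date): swap the head(100) sample in app.py for a random one.

```python
# Random sample instead of the first 100 rows; min() avoids errors on small files
csv_sample = df.sample(n=min(100, len(df)), random_state=42).to_csv(index=False)
```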
Looks Amazing, But Why Offline?
Because:
Your data never leaves your laptop (perfect for private financial, medical, or company data).
No API costs or rate limits.
Works even without internet.
You can swap to other models anytime:
json={"model": "llama3", "prompt": prompt, "stream": False}
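You could even expose that choice in the UI. A small, optional tweak to app.py (it assumes you’ve already pulled each listed model with ollama pull):

```python
# Hypothetical tweak: pick the local model from a sidebar dropdown
model_name = st.sidebar.selectbox("Model", ["mistral", "llama3"])

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": model_name, "prompt": prompt, "stream": False},
)
```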
What Makes This So Fun
It’s a little ChatGPT that knows pandas & your CSV, running right on your computer.
Under the hood:
Streamlit handles the UI & file upload.
pandas loads the CSV.
A prompt is crafted:
"You are an expert data analyst. Given this CSV data... answer this question..."
It’s sent to Ollama on localhost:11434.
Ollama + Mistral processes it and returns a plain answer (or a markdown table).
Streamlit displays it neatly.
Want To Level Up?
Try a bigger model (like llama3, pulled first with `ollama pull llama3`).
Let it generate plots too by extending your prompt.
Save Q&A history in a local file (see the sketch below).
Or build chatbot-style memory that recalls earlier questions.
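For the Q&A history idea, a rough sketch: append each exchange to a local JSON Lines file (the filename qa_history.jsonl is just an example).

```python
import json
from datetime import datetime

def save_qa(question: str, answer: str, path: str = "qa_history.jsonl") -> None:
    """Append one question/answer pair to a local JSON Lines file."""
    record = {
        "timestamp": datetime.now().isoformat(),
        "question": question,
        "answer": answer,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```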
Real-World Uses
Finance teams:
“What were monthly expenses by category last year?”
Marketing:
“Show me top 5 campaigns by leads.”
Sales:
“Which region closed most deals?”
DIY:
“Analyze electricity usage by season from my smart meter data.”
Security Note
We keep this exec()-free: the model returns direct natural language answers (or markdown tables), so there is zero risk of arbitrary Python code running on your machine.
On top of that, there is no chance of your data leaking over the internet, since this is a fully offline, local setup.
(If you ever do want the model to return actual pandas code for exec(), keep it local only and always review the code before running it.)
Wrap Up
That’s it — you just built your very own GPT-powered Data Analyst that runs entirely on your laptop, costs nothing, needs no API key, and keeps your data safe.
It’s like ChatGPT — but it only talks to you, only sees your CSVs, and lives inside your laptop.
Pretty awesome, right?
From PrepEngi
At PrepEngi, I love helping data folks & developers build real-world, hands-on tools that blend AI with engineering.
If this inspired you, share it, fork it, or reach out. I’d love to see what you build!