Stop manually opening 2000 files: finally discover SQL databases
In the previous article, we saw how to automatically create a PDF report from 3 test reports.
The process works perfectly; you automatically generate:
Everything seems perfect.
But a few months later, a new problem arises. You have to deal with:
- Hundreds of reports
- Thousands of CSV files
- Multiple versions
- Folders impossible to explore
And a simple question becomes a nightmare; imagine your manager asking you:
“Find all the tests where the temperature exceeded 850°C”
👉 Without a database, you would have to open the files manually, apply filters, and use complex scripts (Excel macros, Google Script?)!
Even if you’re a programming pro, how long would it take you?
And would the structure of your folders and files allow you to use the same process/program every time?
The hidden file problem
Files are perfect:
✔to archive
✔ to share
✔ to deliver
But they quickly become problematic to:
- search
- compare
- cross
- analyze over time
The solution? Switch to SQL databases!
SQL: the move to live storage
A SQL database allows you to transform:
❌ isolated files
and
✅ an instantly searchable technical memory
An SQL database is simply a structured way of organizing data so that it can be found instantly. We could compare it to an intelligent library capable of finding information in a few milliseconds.
The essential concepts
A SQL database is based on 2 concepts:
1. A table
A table contains organized data:
| Date | Engine | Max Temperature |
2. A query
A query allows you to ask questions of your database.
For example:
SELECT *
FROM essais
WHERE temperature_max > 850;
👉 Translation:
“Show me all the tests where the temperature exceeds 850°C.”
SQL + Python = huge leverage
All of this has a huge benefit; once the data is structured in SQL::
- Python can exploit them automatically
- Dashboards become more reliable
- Analyzes become reproducible
👉 And this is often where time savings become important.
So how to do it concretely?
Creation of a SQLite database
You must first create an SQLite database:
import sqlite3
# Creation / connexion to the database
connexion = sqlite3.connect("essais.db")
print("DataBase connected")
If the database does not exist, Python automatically creates it.
Si la bonne de données n’existe pas, Python la crée automatiquement.
Create a table
Then, you must create the table:
curseur = connexion.cursor()
curseur.execute("""
CREATE TABLE IF NOT EXISTS essais (
id INTEGER PRIMARY KEY AUTOINCREMENT,
moteur TEXT,
temperature_max REAL,
pression_max REAL
)
""")
connexion.commit()
You simply create an organized structure to store your tests.
Automatically insert the results
And this is where it gets really interesting!
Now imagine that your script automatically calculates:
- max temperature
- max pressure
- the RMS
- vibration indicators
👉 You can save everything immediately.
Example :
curseur.execute("""
INSERT INTO essais (engine, temperature_max, pression_max)
VALUES (?, ?, ?)
""", ("Engine_A", 845.2, 12.5))
connexion.commit()
🔥You start to build a real technical memory.
Query the database instantly
Instead of manually opening thousands of files, it is possible to query the database:
resultats = curseur.execute("""
SELECT *
FROM essais
WHERE temperature_max > 800
""")
for ligne in resultats:
print(ligne)
“Show me all the tests where the temperature exceeds 800°C”
And the response time is only a few milliseconds.
Concrete example
Let’s revisit the three trials from the previous articles.
We had the following data structure:
data/
essai_01.csv
essai_02.csv
essai_03.csv
We will retrieve the .csv files, merge them, and save them into a single database that can be queried instantly:
connexion = sqlite3.connect("essais_moteurs.db")
fichiers = Path("~/data/").glob("*.csv")
for fichier in fichiers:
df = pd.read_csv(fichier)
# Retrieving the file name
nom_essai = fichier.stem
# Add a column
df["ESSAI"] = nom_essai
print(df)
# Export SQL
df.to_sql(
"essais",
connexion,
if_exists="append",
index=False
)
print("Import finished")
Let’s query the database with “simple” SQL queries to find the maximum and average power of all the tests:
-- See all the 10 first lines
SELECT *
FROM essais
LIMIT 10;
-- Max Power
SELECT MAX(Puissance), ESSAI
FROM essais;
-- Average power
SELECT AVG(PUISSANCE)
FROM essais;

We can also search for the maximum power per trial, arrange them in descending order, and know when this occurred:
-- Max Power per test
SELECT ESSAI, MAX(PUISSANCE), Temps
FROM essais
GROUP BY ESSAI
ORDER BY MAX(PUISSANCE) DESC;

As you can see, this tool is quite powerful and the results are almost instantaneous.
To summarize
Before we had:
- Dispersed data
- Passive data
- Archived data
Using SQL, our data is now:
- centralized
- queryable
- comparable
- usable
Conclusion: the paradigm change
Storing data everyone knows how to do it.
But the real challenge is being able to retrieve key data weeks, months, or even years later.
And that’s precisely where SQL completely changes the way we work.
With a database:
- You no longer browse through folders,
- you no longer manually search through hundreds of reports,
- you no longer compare files one by one.
You ask a question and the database answers!
Your data thus becomes:
- centralized,
- historical,
- comparable,
- usable on a large scale.
But above all, you begin to build something much more powerful than a simple reporting system: a living technical memory.
And that is the real paradigm shift.
Because once the data is centralized, a new door opens.
With an SQL + Plotly Dash or Streamlit system, you could have:
- Real-time dashboards,
- automatic anomaly detection,
- predictive analytics,
- and why not artificial intelligence applied to industrial testing?

Leave a Reply