Stop manually opening 2000 files: finally discover SQL databases

In the previous article, we saw how to automatically create a PDF report from 3 test reports.

The process works perfectly; you automatically generate:

Everything seems perfect.

But a few months later, a new problem arises. You have to deal with:

  • Hundreds of reports
  • Thousands of CSV files
  • Multiple versions
  • Folders impossible to explore

And a simple question becomes a nightmare; imagine your manager asking you:

“Find all the tests where the temperature exceeded 850°C”

👉 Without a database, you would have to open the files manually, apply filters, and use complex scripts (Excel macros, Google Script?)!

Even if you’re a programming pro, how long would it take you?
And would the structure of your folders and files allow you to use the same process/program every time?

The hidden file problem

Files are perfect:

✔to archive
✔ to share
✔ to deliver

But they quickly become problematic to:

  • search
  • compare
  • cross
  • analyze over time

The solution? Switch to SQL databases!

SQL: the move to live storage

A SQL database allows you to transform:
❌ isolated files
and
✅ an instantly searchable technical memory

An SQL database is simply a structured way of organizing data so that it can be found instantly. We could compare it to an intelligent library capable of finding information in a few milliseconds.

The essential concepts

A SQL database is based on 2 concepts:

1. A table

A table contains organized data:

DateEngineMax Temperature

2. A query

A query allows you to ask questions of your database.
For example:

SELECT *
FROM essais
WHERE temperature_max > 850;

👉 Translation:

“Show me all the tests where the temperature exceeds 850°C.”

SQL + Python = huge leverage

All of this has a huge benefit; once the data is structured in SQL::

  • Python can exploit them automatically
  • Dashboards become more reliable
  • Analyzes become reproducible

👉 And this is often where time savings become important.

So how to do it concretely?

Creation of a SQLite database

You must first create an SQLite database:

import sqlite3

# Creation / connexion to the database
connexion = sqlite3.connect("essais.db")

print("DataBase connected")

If the database does not exist, Python automatically creates it.

Si la bonne de données n’existe pas, Python la crée automatiquement.

Create a table

Then, you must create the table:

curseur = connexion.cursor()

curseur.execute("""
CREATE TABLE IF NOT EXISTS essais (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    moteur TEXT,
    temperature_max REAL,
    pression_max REAL
)
""")

connexion.commit()

You simply create an organized structure to store your tests.

Automatically insert the results

And this is where it gets really interesting!
Now imagine that your script automatically calculates:

  • max temperature
  • max pressure
  • the RMS
  • vibration indicators

👉 You can save everything immediately.

Example :

curseur.execute("""
INSERT INTO essais (engine, temperature_max, pression_max)
VALUES (?, ?, ?)
""", ("Engine_A", 845.2, 12.5))

connexion.commit()

🔥You start to build a real technical memory.

Query the database instantly

Instead of manually opening thousands of files, it is possible to query the database:

resultats = curseur.execute("""
SELECT *
FROM essais
WHERE temperature_max > 800
""")

for ligne in resultats:
    print(ligne)

“Show me all the tests where the temperature exceeds 800°C”

And the response time is only a few milliseconds.

Concrete example

Let’s revisit the three trials from the previous articles.
We had the following data structure:

data/
     essai_01.csv
     essai_02.csv
     essai_03.csv

We will retrieve the .csv files, merge them, and save them into a single database that can be queried instantly:

connexion = sqlite3.connect("essais_moteurs.db")

fichiers = Path("~/data/").glob("*.csv")

for fichier in fichiers:
    df = pd.read_csv(fichier)

    # Retrieving the file name
    nom_essai = fichier.stem

    # Add a column
    df["ESSAI"] = nom_essai
    print(df)

    # Export SQL
    df.to_sql(
        "essais",
        connexion,
        if_exists="append",
        index=False
    )

print("Import finished")

Let’s query the database with “simple” SQL queries to find the maximum and average power of all the tests:

-- See all the 10 first lines
SELECT *
FROM essais
LIMIT 10;

-- Max Power
SELECT MAX(Puissance), ESSAI
FROM essais;

-- Average power
SELECT AVG(PUISSANCE)
FROM essais;
SQL query showing average and maximum values

We can also search for the maximum power per trial, arrange them in descending order, and know when this occurred:

-- Max Power per test
SELECT ESSAI, MAX(PUISSANCE), Temps
FROM essais
GROUP BY ESSAI
ORDER BY MAX(PUISSANCE) DESC;
SQL query showing average and maximum values

As you can see, this tool is quite powerful and the results are almost instantaneous.

To summarize

Before we had:

  • Dispersed data
  • Passive data
  • Archived data

Using SQL, our data is now:

  • centralized
  • queryable
  • comparable
  • usable

Conclusion: the paradigm change

Storing data everyone knows how to do it.
But the real challenge is being able to retrieve key data weeks, months, or even years later.

And that’s precisely where SQL completely changes the way we work.

With a database:

  • You no longer browse through folders,
  • you no longer manually search through hundreds of reports,
  • you no longer compare files one by one.

You ask a question and the database answers!
Your data thus becomes:

  • centralized,
  • historical,
  • comparable,
  • usable on a large scale.

But above all, you begin to build something much more powerful than a simple reporting system: a living technical memory.

And that is the real paradigm shift.

Because once the data is centralized, a new door opens.
With an SQL + Plotly Dash or Streamlit system, you could have:

  • Real-time dashboards,
  • automatic anomaly detection,
  • predictive analytics,
  • and why not artificial intelligence applied to industrial testing?

Leave a Reply