50 test reports, 10 seconds: stop copy-pasting, start analyzing
It’s Monday morning, 9:00 AM, and the test bench has been running all weekend. You receive a folder containing 50 CSV files.
Your mission? Compile them into a single spreadsheet to identify measurement deviations and generate the summary report for the 11:00 AM meeting.

The classic scenario:
You open the first file, select the data, and paste it into your “Master Excel.” You open the second, then the third, and so on… By the tenth, your attention wanes.
By the thirtieth, you can’t even remember if you’ve skipped a line!
Estimated time: 45 minutes of repetitive clicking
Risk of errors: High (a bad copy-paste is so easy to make)
Added value: Zero. Your engineering degree was completely useless during those 45 minutes.
The « De Facto Data » approach:
You don’t process 50 files manually. You ask a machine to do it!
With 10 lines of Python code, the process goes from 1 hour to… 1 second.
Human error is virtually eliminated. Traceability is complete.
In this article, I’ll show you how to automate this data flow so that expertise can finally prevail over administrative tasks.
Why does Excel always end up betraying you?
Let’s be honest: Excel is the engineer’s Swiss Army knife. We’ve all been using it for decades. But as soon as it comes to processing large volumes of industrial data, this Swiss Army knife becomes a danger. Here’s why it inevitably ends up “betraying” you at the worst possible moment:
- The “silent” mistake (The weakest link)
In Excel, data and formulas are combined in the same cell. One wrong click, a row accidentally deleted, or a drag-fill that stops one line too soon, and your entire calculation is thrown off. The worst part? Excel won’t give you an error message. It will display a result: incorrect, but plausible. In engineering, a silent error is far more dangerous than a software crash.
- The “Black Box” effect (Zero traceability)
Take a complex Excel file created by a colleague six months ago. How long does it take you to understand the logic of the hidden macros or the links between tabs? Excel doesn’t have a clear calculation history. With Python, the code is like a recipe: you see exactly where the data comes from, what transformation it undergoes, and where it goes.
It’s auditable. It’s clean.
- The “Circle of Death” (The Physical Limit)
We’ve all been there: you try to open a 100 MB file containing thousands of rows of sensor data, and Excel freezes. Your processor goes into overdrive, the fan screams, and you end up killing the process in the task manager. Excel is limited by the RAM it needs to display everything at once. Python, on the other hand, can process data as a stream, chunk by chunk.
Where Excel struggles, Python processes a million rows in seconds without even overheating.
- Addiction to “Copy-Paste”
This is the most insidious betrayal. Excel forces you to be a data entry operator rather than an engineer. Every manual operation is an opportunity to introduce an error, a misalignment, or an omission. By automating, you remove the human from the manipulation loop and place them where they excel: interpreting the results.
De Facto, using Excel for massive data merging is like trying to empty a swimming pool with a teaspoon. It’s possible, but is that really your role?
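To make the “stream” claim above concrete: pandas can read a file in fixed-size chunks, so only one slice is in memory at a time no matter how large the file grows. A minimal sketch (the file name and column name here are made up for illustration; the sketch first writes a sample file so it runs end to end):

```python
import pandas as pd

# Build a sample CSV so the sketch is self-contained
pd.DataFrame({"temperature": range(1000)}).to_csv("huge_log.csv", index=False)

# Read it back in chunks of 250 rows: only one chunk lives in memory at a time,
# so memory usage stays flat regardless of file size.
total_rows = 0
running_sum = 0.0
for chunk in pd.read_csv("huge_log.csv", chunksize=250):
    total_rows += len(chunk)
    running_sum += chunk["temperature"].sum()

mean_temp = running_sum / total_rows
print(f"Mean over {total_rows} rows: {mean_temp:.2f}")
```

The same pattern scales to gigabyte-sized logs: you accumulate whatever statistic you need chunk by chunk instead of loading the whole table.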
Finally, here is a comparative table for data merging:
| Criteria | Excel Method (Manual) | Python Method (Automated) |
|---|---|---|
| Execution speed | ~1 min per file (click, copy, paste) | < 2 seconds for the whole directory |
| Data volume | Slows down at 10k rows, crashes at 1M | Handles millions of rows without flinching |
| Human reliability | Risk of misalignment or omission with each click | Zero risk: the robot never gets tired |
| Traceability | “Where is the formula?” (invisible logic) | Clear code: a readable and archivable recipe |
| Scalability | 500 files = 10 hours of manual work | 500 files = 0 extra seconds of work |
| Mental health | A mind-numbing and repetitive task | Satisfaction at having created a lasting tool |
The demonstration: 10 lines to replace 1 hour of clicks
Now that we’ve established that, where do we begin and how do we actually do it?
I’ve prepared a short script to show you that you don’t need to be an experienced developer to implement a data merging procedure:
import pandas as pd
import glob
# 1. List all CSV files in the "donnees_essais" folder
files = glob.glob("donnees_essais/*.csv")
# 2. Read and concatenate all files in a single operation
df_total = pd.concat([pd.read_csv(f) for f in files])
# 3. Export the final table to a new CSV file (which Excel opens directly)
df_total.to_csv("synthese_essais_complete.csv", index=False)
print(f"Success: {len(files)} files merged in a flash!")
Line-by-line analysis
If you’ve never opened a code editor, here’s what’s really going on under the hood:
1. We prepare the tools (import)
First, two specialized “toolboxes” are brought in.
- Pandas: the ultra-powerful calculation engine (the equivalent of Excel on steroids).
- Glob: your “scout”. Its only job is to search through your folders and find the files.
2. Automatic scanning (glob.glob)
Instead of opening each file manually, you give Python a simple instruction: “Go to the ‘donnees_essais’ folder and list everything ending in .csv.” Whether there are 5, 50, or 5,000 files, this operation takes the same amount of time for you: zero seconds.
3. Intelligent stacking (pd.concat)
This is where the magic happens. Python opens each file in the list, reads the data, and stacks it one below the other in a single object called df_total (a DataFrame).
The major advantage: pandas automatically aligns the columns by name. If a file lists its columns in a different order, the values still land in the right place. You no longer have to check the alignment manually.
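A tiny illustration of that alignment, using two throwaway DataFrames instead of real files (the column names here are invented for the example):

```python
import pandas as pd

# Two "files" whose columns arrive in a different order
a = pd.DataFrame({"time": [1, 2], "pressure": [10.0, 10.5]})
b = pd.DataFrame({"pressure": [11.0], "time": [3]})

# concat matches columns by name, not by position
merged = pd.concat([a, b], ignore_index=True)
print(merged)
```

Even though `b` lists `pressure` first, every value ends up under the right header in the merged table.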
4. The final export (to_csv)
Once the large array is built in RAM, Python is instructed to write it physically to your hard drive. You get a single, clean file, ready to be opened in Excel for your final graphs… or better yet, ready for automated statistical analysis.
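Since traceability is a big part of the payoff, here is a useful variant of the script (a sketch, with hypothetical file names) that tags each row with the file it came from, so every measurement in the merged table can be traced back to its source. The sketch creates a small sample folder first so it runs end to end:

```python
import glob
import os

import pandas as pd

# Create a small sample folder so the sketch is self-contained
os.makedirs("donnees_essais", exist_ok=True)
pd.DataFrame({"time": [1, 2], "value": [0.1, 0.2]}).to_csv("donnees_essais/run_01.csv", index=False)
pd.DataFrame({"time": [3], "value": [0.3]}).to_csv("donnees_essais/run_02.csv", index=False)

# Merge as before, but tag each row with its source file for traceability
files = sorted(glob.glob("donnees_essais/*.csv"))
df_total = pd.concat(
    [pd.read_csv(f).assign(source_file=os.path.basename(f)) for f in files],
    ignore_index=True,
)
df_total.to_csv("synthese_essais_complete.csv", index=False)
print(df_total["source_file"].unique())
```

`sorted()` makes the file order deterministic, and the extra `source_file` column means a suspect measurement can be traced back to the exact test run that produced it.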
Conclusion
Merging is just the first step. Once all your files are merged into a single dashboard, you’ll quickly notice a problem: the sensors aren’t perfect! Between missing values and outliers, your data will need a makeover.
In the next article we will see how to automatically clean sensor errors from multiple files at once!
➡️ Read the article: How to clean noisy sensor data without a PhD in statistics
