How to clean noisy sensor data without a PhD in Statistics

The Problem: “Garbage In, Garbage Out”

In engineering, an analysis is only as good as the quality of the input data. A sensor that “glitches” for a fraction of a second can generate an aberrant pressure spike of 10,000 bars, completely skewing your averages and fatigue calculations.

The Excel Nightmare: Scrolling through thousands of rows to spot anomalies, or using filters that hide rows on screen but leave them in your subsequent calculations.

The Solution: Vectorized Cleaning

Rather than processing cells one by one, pandas (Python's data analysis library) operates on the entire column at once, treating it as a single physical signal.
We’ll see how to eliminate “noise” and measurement errors in a few lines.
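
To make the idea concrete, here is a minimal sketch comparing the cell-by-cell approach with the vectorized one. The readings are invented for illustration; only the column name 'pression' and the 200 bar limit come from this article.

```python
import pandas as pd

# Hypothetical readings, invented for illustration (pressure in bar).
df = pd.DataFrame({"pression": [101.2, 99.8, 10000.0, 100.5]})

# Cell-by-cell approach, similar to dragging a formula down a spreadsheet.
flags_loop = []
for value in df["pression"]:
    flags_loop.append(value <= 200)

# Vectorized approach: one comparison applied to the whole column at once.
flags_vectorized = df["pression"] <= 200

print(flags_vectorized.tolist())
```

Both approaches give the same flags. The vectorized line is shorter and, on large files, usually much faster.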

The Code in Action: The Automatic “Sanity Check”

In the previous article, we saw how to automate the import of CSV files. Now that we have our working DataFrame (df_final), let’s tackle signal accuracy.
Here’s how to transform a raw, unusable file into a clean dataset for your structural or thermal calculations.

1. Remove empty rows or read errors (NaN) 

In Excel, this requires a manual filter. Here, it’s a single command. 

df_clean = df_final.dropna(subset=['pression', 'temperature']) 
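
If you want to see what the command does before trusting it, the sketch below runs it on a tiny sample. The values and the 'comment' column are hypothetical and stand in for your own df_final.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df_final; values are invented.
df_final = pd.DataFrame({
    "pression": [101.2, np.nan, 99.8, 100.5],
    "temperature": [20.1, 20.3, np.nan, 20.2],
    "comment": [None, "ok", "ok", None],
})

# Count missing values per column before dropping anything.
print(df_final[["pression", "temperature"]].isna().sum())

# Only NaN in the listed columns triggers a drop; the missing
# 'comment' entries in rows 0 and 3 are ignored.
df_clean = df_final.dropna(subset=["pression", "temperature"])

print(df_clean.index.tolist())
```

Rows 1 and 2 are dropped because they lack a pressure or a temperature. Rows 0 and 3 survive even though their comment is empty, since 'comment' is not in the subset.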

2. Remove outliers 

Example: We know that pressure cannot physically exceed 200 bars. 

Therefore, we keep only pressure values of 200 bars or less. 

# .copy() avoids a SettingWithCopyWarning when we add columns later
df_clean = df_clean[df_clean['pression'] <= 200].copy() 
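
A possible variation, sketched under assumptions: filter on both a lower and an upper bound, and keep the rejected rows aside so you can inspect them. The lower bound of 0 bar and the sample values are assumptions, not taken from a real sensor.

```python
import pandas as pd

# Hypothetical readings in bar; 10000.0 and -5.0 mimic sensor glitches.
df_clean = pd.DataFrame({"pression": [101.2, 10000.0, 99.8, -5.0, 200.0]})

P_MIN = 0.0    # assumed lower bound, not from the article
P_MAX = 200.0  # physical upper bound quoted in the article

# between() is inclusive on both ends by default.
mask = df_clean["pression"].between(P_MIN, P_MAX)

# Keep the rejected rows for inspection instead of losing them silently.
rejected = df_clean[~mask]
df_clean = df_clean[mask].copy()  # copy() avoids SettingWithCopyWarning later

print(len(rejected))
print(df_clean["pression"].tolist())
```

Looking at the rejected rows is a cheap way to check that the threshold removes glitches and not real measurements.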

3. Signal smoothing (Rolling Average) 

To reduce sensor noise and bring out the true trend, we smooth the signal using a moving average over, for example, 10 measurement points. Note that the first 9 rows have no complete window, so their smoothed value is NaN. 

df_clean['smoothed_pressure'] = df_clean['pression'].rolling(window=10).mean() 

print(f"Initial data: {len(df_final)} rows") 
print(f"Cleaned data: {len(df_clean)} rows") 
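
The sketch below illustrates how the rolling window behaves at the start of the signal, and one possible way to avoid the leading gaps. The 12 readings are invented. min_periods and center are standard pandas options, but whether they suit your signal is a judgment call.

```python
import pandas as pd

# Hypothetical noisy signal: 12 invented pressure readings in bar.
df_clean = pd.DataFrame({"pression": [100.0, 102.0, 98.0, 101.0, 99.0, 100.0,
                                      103.0, 97.0, 100.0, 100.0, 102.0, 98.0]})

# Default behaviour: the first window - 1 rows have no full window, so NaN.
smoothed = df_clean["pression"].rolling(window=10).mean()
print(smoothed.isna().sum())

# min_periods=1 averages whatever points are available in the window;
# center=True aligns each average with the middle of its window.
smoothed_full = df_clean["pression"].rolling(
    window=10, min_periods=1, center=True
).mean()
print(smoothed_full.isna().sum())
```

With min_periods=1 the values at the edges are averaged over fewer points, so they are noisier than the rest of the smoothed curve.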

Why is this a game-changer for your accuracy?

The obstacle | The “Manual” method | The “De Facto” method
Missing values | Leaving zeros skews the averages. | dropna() removes them cleanly from the calculation.
Outliers | Difficult to spot on a crowded graph. | A logical filter eliminates them in a millisecond.
Measurement noise | An illegible “jagged” curve. | Instantaneous mathematical smoothing (moving average).
Traceability | The source file is modified (dangerous). | The source file remains intact; the cleanup is scripted.

The “De Facto” advice

Never delete your raw data. The advantage of Python is that your script creates a “clean copy” (df_clean) while keeping the original intact. If you later realize that your threshold of 200 bars was too low, all you have to do is change a number in the code and run the script again. That’s reproducibility.
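
One way to put this advice into practice is to wrap the three steps in a function whose threshold is a parameter. This is a minimal sketch: the function name clean_sensor_data, its parameters and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd

def clean_sensor_data(df_raw, max_pressure=200.0, window=10):
    """Return a cleaned copy of df_raw; df_raw itself is never modified."""
    df = df_raw.dropna(subset=["pression", "temperature"])
    df = df[df["pression"] <= max_pressure].copy()
    df["smoothed_pressure"] = df["pression"].rolling(window=window).mean()
    return df

# Hypothetical raw data, invented for illustration.
df_final = pd.DataFrame({
    "pression": [150.0, 210.0, np.nan, 180.0, 10000.0],
    "temperature": [20.0, 20.5, 21.0, 21.5, 22.0],
})

# Same raw data, two thresholds: changing one number is enough to re-run.
strict = clean_sensor_data(df_final, max_pressure=200.0, window=2)
relaxed = clean_sensor_data(df_final, max_pressure=250.0, window=2)

print(len(df_final), len(strict), len(relaxed))
```

The raw DataFrame keeps its 5 rows and gains no new column. Only the returned copies differ.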

Concrete example

This interactive graph was created with Plotly, allowing for dynamic exploration of sensor signals.

How to interact with this chart:
  • Zoom: Click and drag on an area of the signal.
  • Details: Hover over the points to see the exact pressure.
  • Filter: Click on the legend to hide a curve.
  • Reset: Double-click to return to the overview.

Conclusion:

Now that we’ve eliminated the background noise, we can finally let the physics speak for itself. No need to stretch Excel formulas across thousands of rows.

In the next article, we will discuss the concept of vectorized computation through a concrete case.

➡️ Read the article: Calculate at lightning speed. From intrinsic data to physical power
