Why industrial data is often unusable (and how to fix it)

In many industrial companies, data is everywhere:

  • Sensors
  • Machines
  • Supervision systems
  • Historical databases
  • Excel files

    On paper, it looks like a gold mine.

    We talk about Industry 4.0, AI, and predictive maintenance, but when an engineer actually starts working with this data, the reality is often very different.
    The data exists, but it is difficult to use and sometimes simply unusable.

    Contrary to popular belief, the problem in industry is not a lack of data. The problem is the quality of the data.

    Let’s take a few common examples:

    • incorrectly calibrated sensors,
    • missing values,
    • inconsistent units,
    • data recorded at different frequencies,
    • ambiguous variable names,
    • manually modified Excel files

    and so on.
    These problems may seem minor, but when they accumulate, they make any analysis complex or even impossible.

    A very common example

    Imagine that you want to analyze the temperature of a piece of industrial equipment. The data comes from several sources:

    • a monitoring system
    • a CSV export
    • an Excel history

    You quickly discover that:

    • some values are missing,
    • the timestamps are not aligned,
    • the units change depending on the source,
    • some columns have been modified manually.

    Before you even begin the analysis, you spend several hours cleaning the data. This phenomenon is so common that many engineers spend more time preparing data than analyzing it!
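    The cleanup described above can be sketched with pandas. Every column name, timestamp, and temperature below is invented for illustration; the point is simply the three steps: harmonize units, align timestamps across sources, and fill gaps.

```python
import pandas as pd

# Hypothetical readings: the monitoring system logs in Celsius,
# the CSV export in Fahrenheit at slightly different times.
monitoring = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 00:01", "2024-01-01 00:02"]
    ),
    "temp_c": [20.0, 20.5, None],  # one missing value
})
csv_export = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01 00:00:20", "2024-01-01 00:02:10"]),
    "temp_f": [69.8, 71.6],
})

# 1. Harmonize units: convert Fahrenheit to Celsius.
csv_export["temp_c"] = (csv_export["temp_f"] - 32) * 5 / 9
csv_export = csv_export.drop(columns="temp_f")

# 2. Align timestamps: match each monitoring row to the nearest export
#    row, within a tolerance (45 s here, a tunable assumption).
merged = pd.merge_asof(
    monitoring.sort_values("timestamp"),
    csv_export.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("45s"),
    suffixes=("_monitoring", "_export"),
)

# 3. Fill gaps in the monitoring signal with the aligned export values.
merged["temp_c"] = merged["temp_c_monitoring"].fillna(merged["temp_c_export"])
```

    Even this tiny sketch hides three decisions (the unit conversion, the matching tolerance, the gap-filling rule) that deserve to be written down somewhere.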

    Why this problem is underestimated

    Most discussions about data in industry focus on:

    • algorithms,
    • machine learning,
    • artificial intelligence.

    But these technologies are all based on a simple principle:

    a model cannot be better than the data it uses.

    If the data is inconsistent, incomplete, or poorly structured, the results will be unreliable. In some cases, they may even be misleading.

    The temptation to jump straight to models

    It is very tempting to quickly move on to the next step:

    • building a model
    • training an algorithm
    • testing a predictive approach

    But in many industrial projects, this approach fails.

    Why?

    Because the most important step has been overlooked:

    understanding the data.

    Understand before calculating

    Before launching complex analyses, it is essential to answer a few simple questions:

    • Where does the data come from?
    • How is it collected?
    • What transformations has it undergone?
    • What errors are possible?

    This step may seem basic, but it helps avoid many problems later on.

    This is also De Facto Data’s philosophy:

    understand before calculating.

    How to improve the situation

    Fortunately, there are several simple practices for improving the quality of industrial data.

    1. Document data sources

    Each dataset should be accompanied by a clear description:

    • data origin
    • units used
    • acquisition frequency
    • any transformations
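    Such a description can live in a README, a wiki page, or directly in code. A minimal sketch, with invented field names and values:

```python
from dataclasses import dataclass, field

# Hypothetical metadata record; the fields mirror the checklist above.
@dataclass
class DatasetDescription:
    name: str
    origin: str                 # where the data comes from
    unit: str                   # physical unit of the values
    frequency: str              # acquisition frequency
    transformations: list = field(default_factory=list)  # processing applied

reactor_temp = DatasetDescription(
    name="reactor_temperature",
    origin="PLC export via the supervision system",
    unit="degC",
    frequency="1 sample / 10 s",
    transformations=[
        "resampled to 1 min averages",
        "readings above 400 degC discarded",
    ],
)
```

    The exact format matters less than the habit: anyone opening the dataset should find its origin, units, and history without asking around.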

    2. Standardize formats

    Consistent formats help avoid many errors:

    • standardized timestamps
    • consistent units
    • explicit variable names
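    Applied to a raw export, these three rules might look like the following sketch (the cryptic column names, the day-first date format, and the kelvin unit are all invented examples):

```python
import pandas as pd

# Hypothetical raw export: ambiguous names, local date format, kelvin.
raw = pd.DataFrame({
    "TS": ["01/02/2024 08:00", "01/02/2024 08:10"],  # day-first format
    "T1": [293.15, 294.15],                          # kelvin
})

# Explicit variable names instead of "TS" and "T1".
clean = raw.rename(columns={"TS": "timestamp", "T1": "temp_c"})

# Standardized timestamps: parse the day-first strings into datetimes.
clean["timestamp"] = pd.to_datetime(clean["timestamp"], dayfirst=True)

# Consistent units: convert kelvin to Celsius.
clean["temp_c"] = clean["temp_c"] - 273.15
```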

    3. Automate cleaning

    Certain repetitive operations can be automated:

    • detecting outliers
    • removing duplicates
    • harmonizing formats

    This saves time and reduces human error.
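    Two of these operations fit in a few lines of pandas. A minimal sketch, assuming an invented sensor log where 999.0 is an error code and the plausibility bounds are tunable assumptions:

```python
import pandas as pd

# Hypothetical sensor log with a duplicated row and an error-code value.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00", "2024-01-01 00:01",
        "2024-01-01 00:01", "2024-01-01 00:02",
    ]),
    "temp_c": [20.1, 20.3, 20.3, 999.0],  # 999.0 is a sensor error code
})

# Remove exact duplicates (e.g. the same row exported twice).
df = df.drop_duplicates()

# Drop physically implausible values (bounds chosen for this equipment).
df = df[df["temp_c"].between(-50, 200)]
```

    Wrapping steps like these in a function or script means the same rules run identically every time, instead of being redone by hand in Excel.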

    4. Regularly check data quality

    Data quality must be monitored over time.

    Sensors can drift.
    Systems can evolve.

    Regular checks prevent problems from accumulating.
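    A recurring check does not need to be sophisticated to be useful. A sketch of a periodic report, with an invented drifting sensor and illustrative thresholds:

```python
import pandas as pd

def quality_report(series: pd.Series, expected_mean: float, tolerance: float) -> dict:
    """Basic checks to run periodically; thresholds are assumptions to tune."""
    return {
        "missing_ratio": float(series.isna().mean()),
        "drift_detected": bool(abs(series.mean() - expected_mean) > tolerance),
    }

# Hypothetical sensor whose average has slowly drifted upward.
readings = pd.Series([20.0, 20.1, None, 23.5, 23.8])
report = quality_report(readings, expected_mean=20.0, tolerance=1.0)
```

    Run on a schedule, a report like this catches a drifting sensor weeks before it quietly poisons an analysis.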

    An opportunity for engineers

    Working with industrial data can sometimes be frustrating, but it is also an opportunity.

    Engineers who know how to:

    • understand data
    • structure information
    • automate analyses

    can transform this data into powerful decision-making tools.

    Going further

    If you regularly work with technical data and certain tasks take you several hours a week, you can start with a simple exercise.

    Identify a repetitive task in your workflow:

    • cleaning files
    • preparing reports
    • consolidating data

    Then figure out how to improve it.

    To help you take that first step, I’ve created a 30-day challenge.
    The goal is simple:

    automate a repetitive task related to your data.

    You can join it here: 30-day challenge


    Welcome to De Facto Data.