
Production Data: Types, Capture & How to Use It

By Martin Brandel · Last updated: April 2026

What is production data?

Production data is the operational information generated on the shopfloor during manufacturing: machine states, cycle counts, good and scrap parts, process parameters, order progress, operator actions, alarm events and quality measurements. It is the raw material every MES, OEE dashboard and improvement programme runs on. Without it, every decision above the shopfloor is an educated guess.

I have spent 35 years getting this data out of machines, starting with Simatic S5 and COROS visualisations in 1991 and ending, for now, with OPC UA and cloud IoT gateways. The technology changed four times. The underlying problem has not changed once: most machines were never built to share their data, and the plants that actually use their production data are the minority, not the majority. This article is about both halves of that problem.

The four types of production data

Production data is not one thing. Practitioners separate it into four categories because each is captured differently, stored differently and used for a different purpose. Mixing them up is where most data projects go off the rails in the first meeting.

Type | What it contains | Typical source
Machine data (MDE) | Cycles, states, runtimes, stops, speeds, alarm codes | PLC, CNC control, digital I/O, OPC UA server
Operational data (BDE) | Order status, operator login, material consumption, reason codes | Shopfloor terminal, barcode scanner, operator entry
Process data | Temperatures, pressures, torques, flow rates, dimensions | Sensors, measurement devices, inline gauges
Quality data | Inspection results, defect categories, SPC measurements | CMM, vision systems, manual checks, lab

A useful test: if someone says "we have our production data", ask which of the four they mean. Nine times out of ten the answer is MDE only, because MDE is the easiest to automate. The real value shows up when all four are captured against the same production order, at the same timestamp, in the same system. That is what separates mere data collection from a data foundation.
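To make "captured against the same production order, at the same timestamp" concrete, here is a minimal Python sketch of a shared record shape. The `Record` fields, signal names and order numbers are hypothetical, not a SYMESTIC schema; the idea is simply that every value, whatever its type, carries an order ID and a timestamp, so coverage per order can be checked directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Record:
    """One captured value from any of the four data types (hypothetical schema)."""
    data_type: str       # "machine", "operational", "process" or "quality"
    order_id: str        # production order the value is attributed to
    timestamp: datetime  # common time base shared by all sources
    name: str            # e.g. "cycle", "operator_login", "melt_temp_c", "defect"
    value: object

def by_order(records):
    """Group records of all four types under their production order, in time order."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r.order_id].append(r)
    for recs in grouped.values():
        recs.sort(key=lambda r: r.timestamp)
    return dict(grouped)

def types_covered(order_records):
    """Return which of the four data types exist for one order."""
    return {r.data_type for r in order_records}

t = datetime(2026, 4, 1, 6, 0, 0)
records = [
    Record("machine", "4078", t, "cycle", 1),
    Record("process", "4078", t, "melt_temp_c", 231.5),
    Record("quality", "4078", t, "defect", "short shot"),
    Record("operational", "4078", t, "operator_login", "shift B"),
    Record("machine", "4079", t, "cycle", 1),
]
grouped = by_order(records)
print(sorted(types_covered(grouped["4078"])))  # ['machine', 'operational', 'process', 'quality']
print(types_covered(grouped["4079"]))          # {'machine'}
```

Order 4079 has machine data only, which is the "MDE only" case: countable, but not yet explainable.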

Where production data actually comes from

The textbook answer is "from the machine". The practical answer is a layered reality that depends entirely on how old the equipment is. Real plants typically have all four layers running in parallel, because no plant standardises its machine park in one generation.

OPC UA from modern controls. For machines built in the last 10 to 15 years, OPC UA is the default. Siemens, Beckhoff, Rockwell and most drive manufacturers ship their controls with an OPC UA server. The machine publishes its state, cycles, alarms and a subset of process variables; a cloud-capable gateway subscribes and forwards them. This is straightforward when the server is configured correctly, which, in my experience, is the case out of the box only about half the time.
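On the gateway side, each subscribed OPC UA node delivers data-change notifications that need to be normalised before they are forwarded. A minimal Python sketch of that step, with the notifications simulated; in a real gateway they would arrive through an OPC UA client library's subscription callback. The node IDs, signal names and event layout are hypothetical, since real node IDs depend on how the machine builder configured the server.

```python
import json
from datetime import datetime, timezone

# Hypothetical node IDs; the real ones depend on the machine builder's server configuration.
NODE_MAP = {
    "ns=2;s=Machine.State": "state",
    "ns=2;s=Machine.CycleCounter": "cycle_count",
    "ns=2;s=Machine.AlarmCode": "alarm",
}

def on_data_change(machine_id, node_id, value, source_time):
    """Turn one data-change notification into a JSON event ready to forward.

    Returns None for nodes that are not mapped, so unexpected subscriptions are ignored.
    """
    signal = NODE_MAP.get(node_id)
    if signal is None:
        return None
    return json.dumps({
        "machine": machine_id,
        "signal": signal,
        "value": value,
        # Keep the server's source timestamp (normalised to UTC), not the gateway clock.
        "ts": source_time.astimezone(timezone.utc).isoformat(),
    })

ts = datetime(2026, 4, 1, 6, 0, 0, tzinfo=timezone.utc)
print(on_data_change("IMM-07", "ns=2;s=Machine.CycleCounter", 15231, ts))
# {"machine": "IMM-07", "signal": "cycle_count", "value": 15231, "ts": "2026-04-01T06:00:00+00:00"}
print(on_data_change("IMM-07", "ns=2;s=Machine.Unknown", 1, ts))  # None
```

Keeping the mapping in one table is what makes the "configured correctly" check practical: an unmapped or misnamed node shows up as missing signals rather than as silently wrong data.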

Digital I/O for brownfield equipment. For anything older or simpler, the cleanest method is to tap the existing signals: a cycle pulse from the PLC output, a running signal from a contactor, a good/bad discrete signal from an end-of-line sensor. A digital-I/O gateway reads these, timestamps them, and forwards them to the cloud. No PLC modification, no change to the safety logic, no production interruption. The Klocke rollout, 3 weeks across an entire packaging plant in Weingarten, followed exactly this pattern: entirely digital I/O, with no LAN connection to the machines at all.
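The core of counting cycles from a tapped signal is edge detection. A minimal Python sketch, assuming the input is polled at a fixed rate faster than the shortest cycle pulse, so that every pulse shows up as at least one sample at level 1. A production gateway would usually also debounce contact signals, which is omitted here.

```python
def cycle_starts(samples):
    """Return timestamps of rising edges (0 -> 1) on a sampled cycle-pulse input.

    samples: (timestamp_s, level) tuples in time order, level 0 or 1.
    A signal that is already high at the first sample is not counted,
    because its edge happened before capture started.
    """
    starts = []
    previous = None
    for ts, level in samples:
        if previous == 0 and level == 1:
            starts.append(ts)
        previous = level
    return starts

def cycle_times(starts):
    """Time between consecutive cycle starts, in seconds."""
    return [b - a for a, b in zip(starts, starts[1:])]

pulse = [(0, 0), (1, 1), (2, 0), (3, 0), (4, 1), (5, 1), (6, 0), (7, 1)]
starts = cycle_starts(pulse)
print(starts)               # [1, 4, 7]
print(cycle_times(starts))  # [3, 3]
```

Counting edges rather than high samples is the design choice that matters: a pulse that stays high for two samples (t = 4 and 5 above) is still one cycle.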

MQTT over IoT gateways. For large-scale global rollouts where plants have inconsistent network conditions, MQTT with a message broker (typically Azure IoT Hub) is more robust than OPC UA over WAN. The Carcoustics rollout, 500+ machines across Germany, Poland, Slovakia, Czechia, Mexico, USA and China, runs on IXON IoT devices publishing MQTT to Azure. OPC UA would have struggled; MQTT scaled.
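One common way a gateway tolerates unreliable WAN links is to keep messages locally until they have been delivered. A minimal Python sketch of that store-and-forward idea, independent of any specific MQTT client library: `publish` stands in for whatever the client exposes and is assumed to return False when a message could not be sent. The topic follows Azure IoT Hub's device-to-cloud pattern `devices/{device_id}/messages/events/`; the device ID and payload fields are hypothetical, and the buffering is an illustration, not a description of how IXON devices implement it.

```python
import json
from collections import deque

def build_message(device_id, event):
    """Topic and payload for one device-to-cloud message (payload schema is hypothetical)."""
    topic = f"devices/{device_id}/messages/events/"
    payload = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return topic, payload

class StoreAndForward:
    """Queue messages while the WAN link is down and send them once it returns.

    publish: any callable(topic, payload) -> bool that returns False on failure.
    """
    def __init__(self, publish, max_messages=10_000):
        self.publish = publish
        self.queue = deque(maxlen=max_messages)  # when full, the oldest messages are dropped

    def send(self, topic, payload):
        self.queue.append((topic, payload))
        self.flush()

    def flush(self):
        while self.queue:
            topic, payload = self.queue[0]
            if not self.publish(topic, payload):
                return  # link still down; messages stay queued for the next attempt
            self.queue.popleft()

# Simulated link: publishing fails until link_up becomes True.
link_up = False
sent = []
def fake_publish(topic, payload):
    if link_up:
        sent.append((topic, payload))
    return link_up

buf = StoreAndForward(fake_publish)
buf.send(*build_message("imm-07", {"signal": "cycle", "value": 1}))
buf.send(*build_message("imm-07", {"signal": "cycle", "value": 2}))
link_up = True
buf.flush()
print(len(sent), len(buf.queue))  # 2 0
```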

Operator entry at the terminal. Some data cannot and should not come from machines. Rejection reasons that a human judged, material batch scans, shift handover notes. A tablet or shopfloor terminal integrated with the MES is the right path, kept to a minimum so operators can actually keep up with the production pace.
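Keeping operator entry to a minimum largely means asking for input only where a machine signal cannot answer the question. A minimal Python sketch, assuming the machine side already detects stops and the operator only adds a reason code from a short catalogue. The codes, their texts and the 120-second prompt threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reason-code catalogue, kept short so operators can choose quickly.
REASON_CODES = {
    "10": "Material missing",
    "20": "Tool change",
    "30": "Quality check",
    "40": "Maintenance",
}

@dataclass
class Stop:
    machine: str
    start_s: int
    end_s: int
    reason: Optional[str] = None  # entered by the operator, not detected by the machine

def assign_reason(stop, code):
    """Attach an operator-selected reason code, rejecting codes outside the catalogue."""
    if code not in REASON_CODES:
        raise ValueError(f"unknown reason code {code!r}")
    stop.reason = code
    return stop

def unexplained(stops, min_duration_s=120):
    """Stops long enough to need a reason that still lack one; shorter stops are not prompted."""
    return [s for s in stops if s.reason is None and s.end_s - s.start_s >= min_duration_s]

stops = [Stop("IMM-07", 0, 30), Stop("IMM-07", 100, 400), Stop("IMM-07", 500, 900)]
assign_reason(stops[2], "20")
print([(s.start_s, s.end_s) for s in unexplained(stops)])  # [(100, 400)]
```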

Practical rule: most plants believe their old machines cannot provide data. That is almost never true. In 30+ years of connecting machines, I can count on one hand the machines I could not get cycle data out of within a day. The blocker is usually a budget assumption, not a technical limit.

Why most production data never gets used

This is the part of the topic almost nobody writes about honestly. Four patterns explain why plants that have "plenty of data" still operate in the dark.

1. Data in silos. MDE sits on one historian, BDE on the ERP, quality on a lab system, process data on a SCADA. Joining them for a single production order requires a person with an Excel licence and too much time. The analysis that would reveal the root cause is theoretically possible and practically never done.

2. No reference to the production order. A cycle counter without an order ID is a number. The same cycle counter linked to "order 4078, part XYZ, operator shift B, material lot 2024-112" is actionable. Most legacy MDE systems were built before this context was routinely captured, so years of data exist that cannot be attributed to anything specific.
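Linking cycles to orders is, at its simplest, a time-window lookup: each cycle belongs to whichever order was running on that machine at that timestamp. A minimal Python sketch using the standard library's `bisect`, assuming a single machine, order start times sorted ascending, and each order running until the next one starts (real systems also need explicit order-end and interruption events). The order numbers are hypothetical.

```python
from bisect import bisect_right

def attribute_to_orders(cycle_times, order_starts):
    """Pair each cycle timestamp with the order running at that moment.

    order_starts: (start_time, order_id) tuples sorted by start_time.
    Cycles before the first order start are returned with None.
    """
    starts = [start for start, _ in order_starts]
    result = []
    for t in cycle_times:
        i = bisect_right(starts, t) - 1  # index of the last order started at or before t
        result.append((t, order_starts[i][1] if i >= 0 else None))
    return result

orders = [(0, "4077"), (3600, "4078"), (7200, "4079")]
print(attribute_to_orders([-10, 10, 3599, 3600, 8000], orders))
# [(-10, None), (10, '4077'), (3599, '4077'), (3600, '4078'), (8000, '4079')]
```

The `None` entries are exactly the legacy problem described above: data that exists but cannot be attributed to anything specific.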

3. Data captured at too low a resolution. One status update per minute looks reasonable on a spec sheet. For microstops of 20 seconds it is useless. Data at 1-second resolution or finer is what makes short-stop analysis and real OEE possible. Plants that skimp here end up paying licence fees for a system that cannot answer their biggest question.
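The resolution argument is easy to check with a small simulation. A minimal Python sketch, assuming a machine state recorded once per second over ten minutes with two hypothetical 20-second microstops, then the same signal read only once per minute, with each sample assumed to hold until the next one.

```python
def state_signal(duration_s, stops):
    """1-second machine state: 1 = running, 0 = stopped. stops = [(start_s, length_s)]."""
    state = [1] * duration_s
    for start, length in stops:
        for t in range(start, start + length):
            state[t] = 0
    return state

def stopped_seconds(state, sample_every_s):
    """Estimate downtime from every n-th sample, assuming each sample holds until the next."""
    samples = state[::sample_every_s]
    return sum(sample_every_s for s in samples if s == 0)

state = state_signal(600, stops=[(130, 20), (400, 20)])
print(stopped_seconds(state, 1))   # 40
print(stopped_seconds(state, 60))  # 0: both microstops fall between samples

shifted = state_signal(600, stops=[(415, 20)])
print(stopped_seconds(shifted, 60))  # 60: one 20 s stop inflated to a full minute
```

At one-minute sampling, a 20-second stop is either missed entirely or inflated to a full minute, depending on where the sample happens to land.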

4. Nobody owns the data. IT owns the infrastructure, production owns the machines, quality owns inspection, controlling owns reporting. The data sits between them and nobody is responsible for its correctness. When a number is questioned, it gets explained away rather than fixed.

What changes when production data is captured properly

When the four data types are captured at order-level resolution, stored against a common timestamp, and exposed through one interface, three things become possible that were not before.

  1. Honest KPIs. OEE, availability, quality rate and cycle-time variance computed from the raw data, not from operator estimates. The numbers are usually lower than the plant previously believed, and that drop is the starting point for real improvement.
  2. Root-cause analysis in minutes. A scrap event is cross-referenced automatically against the machine state, the process parameters and the running order at that timestamp. What used to take a three-person meeting on Friday takes a filter in the dashboard on Tuesday afternoon.
  3. Closed-loop control. When quality data feeds back into the process parameters, and those in turn feed back to the PLC, the improvement loop runs every cycle, not every quarter. That is the promise Industry 4.0 has been selling for a decade, and one most plants still have not realised because the data layer underneath was never finished.
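For the honest-KPI point, the standard OEE decomposition is Availability × Performance × Quality, each computed from captured times and counts. A minimal Python sketch with hypothetical shift figures: an 8-hour shift with 96 minutes of recorded downtime, an 18-second ideal cycle time, and 1,152 parts of which 1,080 were good.

```python
def oee(planned_time_s, downtime_s, ideal_cycle_time_s, total_count, good_count):
    """Standard OEE breakdown; every input comes from captured data, not estimates."""
    run_time_s = planned_time_s - downtime_s
    availability = run_time_s / planned_time_s
    performance = ideal_cycle_time_s * total_count / run_time_s
    quality = good_count / total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }

shift = oee(planned_time_s=8 * 3600, downtime_s=5760, ideal_cycle_time_s=18,
            total_count=1152, good_count=1080)
for name, value in shift.items():
    print(f"{name}: {value:.2%}")
# availability: 80.00%
# performance: 90.00%
# quality: 93.75%
# oee: 67.50%
```

Three individually respectable factors multiply to 67.5 %, which is typical of why measured OEE tends to come out lower than the figure a plant previously believed.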

A real case: Carcoustics International

Carcoustics is a global automotive supplier for acoustic and thermal components with plants in Germany, Poland, Slovakia, Czechia, Mexico, USA and China. The starting point was the familiar one: mixed machine parks from four decades (injection moulding, cold foaming, stamping), partial local monitoring in a few plants, no global view of performance. The goal was not another local tool; it was one system that could answer the same question in every plant, regardless of local IT.

The SYMESTIC engagement began as a proof-of-concept at the Haldensleben plant: connect enough machines to prove the data flow and the analytics in weeks, not quarters. On the back of the PoC, Carcoustics replaced existing legacy monitoring in Haldensleben and Poland, then scaled to 500+ machines across all plants within 6 months. The critical technical decisions that made this possible:

  • IXON IoT gateways at each machine, capturing cycles, states and alarms regardless of the underlying control generation
  • MQTT over Azure IoT Hub for WAN-tolerant, high-volume ingestion across seven countries
  • Bidirectional SAP R/3 integration via ABAP IDoc, mapping machine cycles to production orders and returning actuals to ERP
  • Digital setup support replacing paper-based changeover sheets, captured as structured data from day one
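At its core, returning actuals to the ERP means aggregating order-attributed cycles into a confirmation per order. A minimal Python sketch, assuming each cycle already carries its order ID and a good/scrap flag. The output keys are hypothetical placeholders; mapping them onto the actual SAP confirmation IDoc structure is a separate step not shown here.

```python
from collections import Counter

def order_actuals(cycles):
    """Aggregate machine cycles into per-order good and scrap quantities.

    cycles: iterable of (order_id, is_good) tuples, one per machine cycle.
    The "yield"/"scrap" keys are hypothetical placeholders for ERP fields.
    """
    good, scrap = Counter(), Counter()
    for order_id, is_good in cycles:
        if is_good:
            good[order_id] += 1
        else:
            scrap[order_id] += 1
    orders = sorted(set(good) | set(scrap))
    return {o: {"yield": good[o], "scrap": scrap[o]} for o in orders}

cycles = [("4078", True)] * 3 + [("4078", False), ("4079", True)]
print(order_actuals(cycles))
# {'4078': {'yield': 3, 'scrap': 1}, '4079': {'yield': 1, 'scrap': 0}}
```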

The measured results after the rollout: 4 % reduction in downtime, 3 % increase in output, 8 % improvement in availability, and, more importantly, one consistent production data layer across all plants that Carcoustics now extends autonomously using the SYMESTIC modular kit. That last point matters more than the percentages: the data foundation is the customer's, not the vendor's.

FAQ

What is the difference between MDE, BDE and production data?
Production data is the umbrella term. MDE (machine data) and BDE (operational data) are two of the four subsets, alongside process data and quality data. In daily language the terms get blurred, which causes real confusion in requirements discussions. Keep them separate in writing.

Can we capture production data from old machines without OPC UA?
Yes, and in my experience this path covers 80 % of the real market. Digital I/O gateways tap existing signals (cycle pulse, running contact, end-of-line good/bad), timestamp them, and forward them to the cloud. No PLC change, no production interruption. The Klocke rollout is a working example across an entire plant.

Does production data have to live in the cloud?
No, and the choice depends on scale and context. For a single plant with stable IT, an on-premise historian still works. For multi-plant, multi-country operations, or for any plant that wants to avoid server maintenance, cloud ingestion via MQTT or OPC UA over the internet is now the pragmatic default. Security is solved through hardened gateways, not by keeping data on-site.

How much production data should we store, and for how long?
At machine level, store raw cycle-level events for at least 12 months so you have year-over-year comparisons. Aggregate (per minute, per hour, per shift) for longer periods. In regulated industries, the retention period follows the regulation, not preference. In unregulated discrete manufacturing, 12 to 24 months of raw data plus long-term aggregates is a good baseline.
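The raw-plus-aggregates pattern can be sketched in a few lines. A minimal Python example, assuming raw cycle events are plain timestamps, hourly buckets as the long-term aggregate, and a 365-day raw retention window; the aggregation runs before pruning so the long-term figures survive the deletion of raw data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_counts(events):
    """Roll raw cycle events (datetime stamps) up into counts per hour."""
    counts = defaultdict(int)
    for ts in events:
        counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(counts)

def prune_raw(events, now, keep_days=365):
    """Drop raw events older than the retention window (12 months by default)."""
    cutoff = now - timedelta(days=keep_days)
    return [ts for ts in events if ts >= cutoff]

now = datetime(2026, 4, 1)
events = [datetime(2025, 3, 1, 8, 5), datetime(2025, 3, 1, 8, 40), datetime(2026, 3, 31, 14, 10)]
aggregates = hourly_counts(events)  # aggregate first, so old hours are kept
raw = prune_raw(events, now)
print(aggregates[datetime(2025, 3, 1, 8)])  # 2
print(len(raw))                             # 1
```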

What is the first step for a plant with no production data today?
Pick one line, one gateway, one set of KPIs, and run it for two weeks. Do not try to design the enterprise data model first. Data strategy that starts with PowerPoint almost always stalls; data strategy that starts with a gateway on a real machine almost always grows. SYMESTIC typically goes live in under 30 days for Production Metrics on 10 machines.

How does SYMESTIC handle production data?
Four capture paths (OPC UA, digital I/O, MQTT, operator terminal) into a single cloud data layer, with automatic linking to the production order, machine, operator and material. Bidirectional ERP integration (SAP, Infor, Microsoft, proAlpha and others) so actuals flow back automatically. See SYMESTIC Production Metrics and SYMESTIC Process Data.



About the author
Martin Brandel
MES Consultant at SYMESTIC. 35+ years in industrial automation, from Simatic S5 and COROS visualisations in 1991 to OPC UA and cloud IoT gateways today. Built and led the SYMESTIC automation department for 11 years, covering process control standards in the food, beverage and wood industries and S5-to-S7/TIA retrofits. Since 2019, has led MES and connectivity projects end-to-end, from first inquiry to go-live. Dipl.-Ing. in communications engineering (Nachrichtentechnik).