Predictive Maintenance

Catching equipment failures before they happen instead of after

Predictive Maintenance · Manufacturing · 5 min read

74%

Reduction in unplanned downtime

340 hrs

Annual unplanned downtime hours per facility before the program

6 hrs

Average advance warning time

A precision components manufacturer operating four facilities was losing an average of 340 hours per year per facility to unplanned equipment downtime. The maintenance team operated on a scheduled maintenance model — machines were serviced on a calendar schedule regardless of their actual condition. When machines failed between scheduled maintenance windows, the failure was treated as unpredictable. The plant engineers knew from experience that many failures showed precursor signals hours or days before the failure occurred, but had no system for capturing or acting on those signals.

The outcome

We connected the factory floor sensor networks to an analytics platform that models normal operating patterns for each machine and alerts on deviations before they become failures. Unplanned downtime dropped by 74% across all four facilities in the first year.

01

When downtime is priced into the business

The manufacturer had calculated its cost of unplanned downtime at approximately $8,400 per hour across direct labor, overhead absorption, and expediting costs when a production run had to be restarted. At 340 annual hours per facility, the total cost across four facilities was approximately $11.4 million per year — a number that had been stable enough for long enough that it had been absorbed into the business model as a fixed cost rather than treated as a problem with a solution. The scheduled maintenance model had been the solution, and the prevailing view among plant management was that the remaining downtime was a residual that couldn't be eliminated. The plant engineers didn't agree. They had records of dozens of cases where a machine had been vibrating differently, running hotter, or drawing more current than usual in the days before it failed. The information had existed; there had just been no system to capture it systematically or alert anyone when patterns crossed meaningful thresholds.
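The headline cost figure follows directly from the quoted inputs. A quick sanity check, using only the numbers given above:

```python
# Downtime cost sanity check, using only the figures quoted above.
cost_per_hour = 8_400      # direct labor, overhead absorption, expediting
hours_per_facility = 340   # annual unplanned downtime per facility
facilities = 4

annual_cost = cost_per_hour * hours_per_facility * facilities
print(f"${annual_cost:,} per year")   # $11,424,000, roughly $11.4 million
```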

02

Connecting OT and IT: the integration challenge

The four facilities had different vintages of equipment installed at different times over a span of twenty years. Some machines had modern programmable logic controllers with digital sensor outputs that could be read directly over Ethernet. Others had older PLCs with serial interfaces. A third category — the oldest equipment — had no digital outputs at all and relied on analog gauges that were read by operators on their rounds. We designed the data collection layer in three tiers to match this reality. For digitally-connected equipment, we deployed lightweight data collectors that read sensor values from the PLC over the existing plant network on a one-second polling interval. For serial-connected equipment, we installed protocol converters that translated serial output to a network-addressable format. For analog equipment, we installed retrofit sensor kits — vibration sensors, current clamps, and temperature sensors — that added digital sensing capability without requiring modifications to the machine's control systems. The three collection tiers feed into a common data pipeline. From the analytics platform's perspective, every machine looks the same regardless of its age or interface type.
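As a rough illustration of what "every machine looks the same" means at the pipeline boundary, the sketch below shows one way readings from the three collection tiers could be normalized into a single record shape. The field names and tier labels are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative normalized record: whatever the source (Ethernet PLC poll,
# serial-to-network converter, or retrofit sensor kit), every reading is
# reduced to the same shape before it enters the common data pipeline.
@dataclass
class SensorReading:
    machine_id: str
    sensor: str          # e.g. "spindle_vibration_rms", "motor_current"
    value: float
    unit: str
    collected_at: datetime
    source_tier: str     # "plc_ethernet" | "serial_converter" | "retrofit_kit"

def normalize(machine_id: str, sensor: str, raw_value: float,
              unit: str, source_tier: str) -> SensorReading:
    """Wrap a raw value from any collection tier in the common record.

    Downstream, the analytics platform never needs to know which tier
    produced the reading, only the machine, signal, value, and time.
    """
    return SensorReading(
        machine_id=machine_id,
        sensor=sensor,
        value=raw_value,
        unit=unit,
        collected_at=datetime.now(timezone.utc),
        source_tier=source_tier,
    )
```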

03

Building prediction models without touching production schedules

We spent the first eight weeks collecting baseline data without building any prediction models. Every machine in all four facilities contributed sensor data to the baseline. We needed to understand what normal operation looked like for each machine under the full range of production conditions: different materials, different feed rates, different ambient temperatures, different load profiles. A machine that runs hot in summer and cool in winter isn't a machine with a problem — it's a machine responding normally to its environment. Without a baseline that accounted for these variables, any model we built would produce excessive false positives whenever normal seasonal variation pushed a sensor reading outside of a naive threshold. The baseline data also revealed something the plant engineering team hadn't expected: five machines across the four facilities were already operating in patterns that the engineers, upon examination, associated with conditions they'd seen before failures in similar equipment. Those five machines were pulled for early inspection before any prediction model was in production. Two of them had developing faults that would have caused failures within the following few weeks.
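The case study does not specify the modeling approach, but one common way to build a condition-aware baseline is to regress each sensor signal on the operating variables and treat the residual as the deviation of interest. A minimal sketch on synthetic data, with the signal name and coefficients invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical baseline data for one machine: each row is
# [feed_rate, ambient_temp_C, load_pct]; the target is a bearing temperature.
rng = np.random.default_rng(0)
conditions = rng.uniform([50, 10, 30], [200, 35, 100], size=(5000, 3))
bearing_temp = (40 + 0.05 * conditions[:, 0] + 0.8 * conditions[:, 1]
                + 0.1 * conditions[:, 2] + rng.normal(0, 1.5, 5000))

# Baseline model: expected reading as a function of operating conditions,
# so seasonal and production-mix variation is explained rather than alerted on.
baseline = LinearRegression().fit(conditions, bearing_temp)
residuals = bearing_temp - baseline.predict(conditions)
resid_std = residuals.std()

def deviation_score(cond, reading):
    """Distance of a new reading from what the baseline expects
    under the same operating conditions, in standard deviations."""
    expected = baseline.predict(np.asarray(cond).reshape(1, -1))[0]
    return (reading - expected) / resid_std
```

With a baseline of this kind, a machine that runs hot because the shop floor is hot in August scores near zero, while one that runs hot under otherwise identical conditions scores high.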

04

Calibrating thresholds and eliminating alert fatigue

Alert fatigue is the most common reason predictive maintenance programs fail in practice. A system that generates too many alerts produces a maintenance team that ignores alerts. We calibrated the alert thresholds through a structured process: for each machine, we defined alert levels based on the statistical distance of a current reading from the established baseline — a mild deviation generated a watch alert visible only on the dashboard, a moderate deviation generated a notification to the maintenance team, and a severe deviation generated an immediate page to the plant engineer on duty. We tuned these thresholds over a sixteen-week period by tracking every alert, its urgency classification, and its outcome. Alerts that led to findings when a maintenance technician investigated were confirmed as correctly calibrated. Alerts that led to no finding were reviewed; some thresholds were made less sensitive, and others prompted us to improve the baseline model to account for a variable we'd missed. At the end of the calibration period, the false positive rate was under 8%, meaning that more than nine out of ten alerts led to a genuine finding when investigated.
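Translated into code, the three alert levels reduce to a small classification step on the deviation score. The cut-offs below are placeholders; the real per-machine thresholds came out of the calibration period and are not published in this case study.

```python
from typing import Optional

# Illustrative alert tiering on the deviation score described above. The
# thresholds are placeholder values, not the tuned production settings.
WATCH, NOTIFY, PAGE = 2.0, 3.5, 5.0   # distance from baseline, in standard deviations

def classify_alert(deviation_score: float) -> Optional[str]:
    """Map a reading's deviation score to an alert level, or None if normal."""
    magnitude = abs(deviation_score)
    if magnitude >= PAGE:
        return "page_plant_engineer"       # severe: immediate page to the engineer on duty
    if magnitude >= NOTIFY:
        return "notify_maintenance_team"   # moderate: notification to the maintenance team
    if magnitude >= WATCH:
        return "dashboard_watch"           # mild: watch alert visible only on the dashboard
    return None
```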

05

First year results across all four facilities

In the twelve months following full deployment across all four facilities, unplanned equipment downtime dropped from an average of 340 hours per facility to 89 hours per facility — a 74% reduction. The remaining downtime was predominantly attributable to equipment categories that hadn't been included in the initial deployment due to the difficulty of retrofitting sensor packages on certain older machine types; those machines are being addressed in the second phase of the program. The average advance warning time between the first alert on a developing fault and the subsequent equipment failure — measured on cases where the maintenance team chose to monitor rather than immediately intervene — was six hours. In practice, most alerts lead to same-shift or next-shift maintenance intervention, which means failures are being caught and addressed well within the warning window. The maintenance team's scheduled maintenance workload has been restructured: machines that show no precursor signals approaching their scheduled service interval are rescheduled for a later date, freeing maintenance capacity for condition-triggered interventions. Scheduled maintenance hours have decreased by 22% while actual machine health has improved.

Facing a similar infrastructure challenge?

We're happy to have a technical conversation about your specific environment — no commitment required.