Turning 25 years of materials-analysis records into a queryable archive — fully air-gapped
Unlocking a quarter-century of metallurgical and elemental-analysis data trapped in incompatible file formats inside a compliance CMS — normalized with a consensus parser/ML pipeline and made longitudinally queryable, entirely on-prem in an air-gapped regulated cloud.
Customer
A government-affiliated organization that works with sensitive materials data behind strict access controls. Everything runs on-premises, disconnected from the public internet, with vetted personnel and tightly governed systems. It is the kind of environment where the data never physically leaves the facility, and neither could anything we built.
Challenge
For decades, the organization's materials-analysis results — metallurgical reports, elemental-analysis data, and related records — had accumulated inside a compliance-grade content-management system. The records were trustworthy and faithfully governed, but they had been captured over more than 25 years in a sprawl of incompatible file formats: each generation of instrument, software, and analyst left behind its own conventions. The CMS preserved every file and its metadata perfectly, yet it could not answer the question that actually mattered — how has a given material, or a given measurement, behaved over time? Any longitudinal analysis meant a person opening files one at a time.
Compounding the problem, the entire environment is air-gapped and highly regulated. No cloud service, no hosted ML API, no off-the-shelf SaaS could ever touch the data. Anything we delivered had to run end-to-end inside their walls.
Solution
We designed and deployed a fully on-premises system that lives entirely within the regulated environment, with no external connectivity at any layer.
- Built and deployed on an air-gapped edge installation of OpenShift Container Platform. Every component, including the normalization models, runs on-prem in the regulated cloud.
- Ingested records and their metadata out of the compliance CMS without disturbing the system of record, preserving the governance and audit trail that environment demands.
- Reconciled decades of incompatible formats with a consensus normalization pipeline: deterministic, hand-written parsers for the well-understood formats paired with a machine-learning layer for the messy and one-off cases. The two cross-check each other — agreement is accepted automatically, disagreement is surfaced for review — so accuracy never rides on a single method.
- Loaded the normalized results into a SQL Server database under one unified schema, so analysts can query across the full history — by material, by measurement, by time — instead of opening files one by one.
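The cross-check between the parsers and the ML layer can be illustrated with a minimal Python sketch. It is not the production code: the `Measurement` and `Decision` types, the `key=value;` input format, the sample identifiers, and both extractors are hypothetical stand-ins, and `stand_in_model` is just a tolerant regex standing in for a trained model. The sketch also assumes that a record which only one method can read goes to review rather than being accepted on a single opinion.

```python
import math
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Measurement:
    """Hypothetical normalized record: one analyte measured on one material."""
    material_id: str
    analyte: str
    value: float
    unit: str


@dataclass(frozen=True)
class Decision:
    status: str                      # "accepted" or "review"
    record: Optional[Measurement]    # set only when both methods agree
    candidates: Tuple[Optional[Measurement], Optional[Measurement]]


Extractor = Callable[[str], Optional[Measurement]]


def rule_based_parse(raw: str) -> Optional[Measurement]:
    """Deterministic parser for one made-up, strict 'key=value;' format."""
    try:
        fields = dict(p.split("=", 1) for p in raw.strip().split(";") if p)
        return Measurement(fields["material"], fields["analyte"],
                           float(fields["value"]), fields["unit"])
    except (KeyError, ValueError):
        return None  # format not recognized by this parser


def stand_in_model(raw: str) -> Optional[Measurement]:
    """Placeholder for the ML layer: a looser extractor for messy input."""
    pairs = re.findall(r"(\w+)\s*[=:]\s*([^;,]+)", raw)
    found = {key.lower(): value.strip() for key, value in pairs}
    try:
        return Measurement(found["material"], found["analyte"],
                           float(found["value"]), found["unit"])
    except (KeyError, ValueError):
        return None


def agree(a: Measurement, b: Measurement, rel_tol: float) -> bool:
    """Identifiers and unit must match exactly; values within a tolerance."""
    same_keys = (a.material_id, a.analyte, a.unit) == (b.material_id, b.analyte, b.unit)
    return same_keys and math.isclose(a.value, b.value, rel_tol=rel_tol)


def consensus(raw: str, parser: Extractor, model: Extractor,
              rel_tol: float = 1e-6) -> Decision:
    """Accept automatically on agreement; otherwise surface for review."""
    a, b = parser(raw), model(raw)
    if a is not None and b is not None and agree(a, b, rel_tol):
        return Decision("accepted", a, (a, b))
    return Decision("review", None, (a, b))


clean = "material=MAT-001;analyte=C;value=0.25;unit=wt%"
messy = "Material: MAT-001, Analyte: C, Value: 0.25, Unit: wt%"
print(consensus(clean, rule_based_parse, stand_in_model).status)
print(consensus(messy, rule_based_parse, stand_in_model).status)
```

In this sketch the clean record is accepted because both methods return the same measurement, while the messy record is routed to review with the model's reading attached as a candidate. How strict the value tolerance should be, and whether unit conversion happens before comparison, are design choices the sketch leaves open.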
Twenty-five years of previously unqueryable materials-analysis data is now a single, longitudinal dataset — searchable across decades, entirely within the air-gapped environment. New results flow through the same pipeline, so the archive keeps growing without re-introducing the format sprawl it replaced.
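To give a sense of what a longitudinal query over a unified schema looks like, here is a small, hedged sketch. The production database is SQL Server; this example uses Python's built-in `sqlite3` with an in-memory database purely so it can run anywhere. The `measurement` table, its columns, the file names, and every row are invented for illustration and do not reflect the real schema or data.

```python
import sqlite3

# Hypothetical unified schema: one row per normalized measurement.
SCHEMA = """
CREATE TABLE measurement (
    material_id TEXT NOT NULL,
    analyte     TEXT NOT NULL,
    measured_on TEXT NOT NULL,   -- ISO 8601 date, e.g. 1999-03-02
    value       REAL NOT NULL,
    unit        TEXT NOT NULL,
    source_file TEXT NOT NULL    -- pointer back to the record in the CMS
)
"""

# Made-up rows standing in for results normalized from different legacy formats.
DEMO_ROWS = [
    ("MAT-001", "C", "1999-03-02", 0.20, "wt%", "legacy_a.dat"),
    ("MAT-001", "C", "1999-09-14", 0.30, "wt%", "legacy_a2.dat"),
    ("MAT-001", "C", "2015-06-01", 0.25, "wt%", "report_b.xml"),
    ("MAT-002", "C", "2015-06-01", 0.90, "wt%", "report_c.xml"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?)", DEMO_ROWS)
    conn.commit()
    return conn


def yearly_trend(conn: sqlite3.Connection, material_id: str, analyte: str):
    """Return (year, sample count, mean value) per year for one material and analyte."""
    cursor = conn.execute(
        """
        SELECT substr(measured_on, 1, 4) AS year,
               COUNT(*)                  AS n,
               AVG(value)                AS mean_value
        FROM measurement
        WHERE material_id = ? AND analyte = ?
        GROUP BY year
        ORDER BY year
        """,
        (material_id, analyte),
    )
    return cursor.fetchall()


conn = build_demo_db()
for year, n, mean_value in yearly_trend(conn, "MAT-001", "C"):
    print(year, n, round(mean_value, 3))
```

The point of the sketch is the shape of the question: once every format lands in one table, "how has this measurement behaved over time for this material" becomes a single grouped query rather than a manual review of files. On SQL Server the same idea would be written in T-SQL against a proper date column rather than a text prefix.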