Rebuilding a failing electrochemistry pipeline to handle 100 TB on TimescaleDB
A battery-testing data pipeline had stopped working at the billion-row mark. We re-architected it on TimescaleDB with async ingestion, out-of-order file handling, and automatic KPI extraction — scaling it to 100 TB.
Customer
A battery-testing operation that had accumulated years of electrochemical time-series data — cyclic voltammetry, impedance spectroscopy, and charge/discharge cycles.
Challenge
The data had been ingested into a PostgreSQL database that, at around one billion rows, had effectively stopped working. Queries timed out, ingestion fell behind, and the data was becoming inaccessible rather than useful.
Compounding the problem: files from the same cell arrived out of order, reference performance tests (RPTs) were interleaved with regular cycling data, and there was no reliable way to extract KPIs without first knowing which segments of the data were which.
Solution
We redesigned the pipeline from ingestion through analysis, built for the actual data volumes involved.
- Migrated from plain PostgreSQL tables to TimescaleDB, a PostgreSQL extension, using hypertables, continuous aggregates, and compression policies designed for the data profile.
- Rebuilt the ingestion pipeline in Python with NumPy, Pandas, and Polars for high-throughput transformation.
- Implemented out-of-order file handling — cells are assembled correctly regardless of arrival order.
- Built a pattern-detection layer that identifies RPTs and reference tests in the data stream and extracts KPIs automatically.
- Added an async work queue (SQS plus PostgreSQL) for reliable, scalable ingestion across a configurable thread pool.
- Added Parquet export for downstream analysis and data sharing.
- Containerized the whole pipeline for a multithreaded production environment.
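To make the TimescaleDB step more concrete, here is a minimal sketch of the kind of setup statements such a migration involves: a hypertable, a compression policy, and a continuous aggregate. The table name `measurements`, its columns, the `cell_hourly` view, and every interval are assumptions chosen for illustration; the actual schema and policies are not described in this case study.

```python
"""Sketch of TimescaleDB setup statements.

Table name, columns, view name, and intervals are hypothetical.
"""


def timescale_setup_statements(table: str = "measurements") -> list[str]:
    """Return the DDL for a hypertable with compression and an hourly rollup."""
    return [
        # Ordinary PostgreSQL table holding the raw time series.
        f"""CREATE TABLE IF NOT EXISTS {table} (
            ts         TIMESTAMPTZ NOT NULL,
            cell_id    TEXT NOT NULL,
            voltage_v  DOUBLE PRECISION,
            current_a  DOUBLE PRECISION
        )""",
        # Turn it into a hypertable partitioned on the time column.
        f"SELECT create_hypertable('{table}', 'ts', "
        f"chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE)",
        # Enable compression, grouping compressed rows per cell.
        f"ALTER TABLE {table} SET (timescaledb.compress, "
        f"timescaledb.compress_segmentby = 'cell_id', "
        f"timescaledb.compress_orderby = 'ts')",
        # Compress chunks once they are older than the assumed threshold.
        f"SELECT add_compression_policy('{table}', INTERVAL '30 days')",
        # Continuous aggregate: hourly averages per cell.
        f"""CREATE MATERIALIZED VIEW cell_hourly
            WITH (timescaledb.continuous) AS
            SELECT cell_id,
                   time_bucket(INTERVAL '1 hour', ts) AS bucket,
                   avg(voltage_v) AS avg_voltage_v,
                   avg(current_a) AS avg_current_a
            FROM {table}
            GROUP BY cell_id, bucket
            WITH NO DATA""",
    ]


def apply_statements(conn, statements: list[str]) -> None:
    """Execute the statements on any DB-API style connection, then commit."""
    cur = conn.cursor()
    try:
        for stmt in statements:
            cur.execute(stmt)
    finally:
        cur.close()
    conn.commit()
```

Segmenting compressed data by `cell_id` is a common choice when most queries filter on one cell at a time, but whether it fits depends on the real query patterns.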
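The out-of-order handling listed above can be sketched in a few lines. This is an illustrative approach, not the production implementation: it assumes each file can be reduced to a cell ID, a file name, and the start time of its first sample, and the names `Segment` and `CellAssembler` are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Segment:
    """One parsed file. Ordering uses start_s, then file_name."""
    start_s: float                 # start time of the first sample, in seconds
    file_name: str
    samples: tuple = field(compare=False, default=())  # (time_s, voltage_v) pairs


class CellAssembler:
    """Keeps each cell's segments sorted by start time, whatever the arrival order."""

    def __init__(self) -> None:
        self._segments = defaultdict(list)  # cell_id -> sorted list of Segment
        self._seen = set()                  # (cell_id, file_name) already ingested

    def add_file(self, cell_id: str, segment: Segment) -> bool:
        """Insert a segment in sorted position; return False for a duplicate file."""
        key = (cell_id, segment.file_name)
        if key in self._seen:
            return False
        self._seen.add(key)
        insort(self._segments[cell_id], segment)
        return True

    def timeline(self, cell_id: str) -> list:
        """Return all samples for a cell in chronological segment order."""
        return [s for seg in self._segments.get(cell_id, []) for s in seg.samples]


# Files for one cell arriving as third, first, second.
assembler = CellAssembler()
assembler.add_file("cell-A", Segment(200.0, "c.csv", ((200.0, 3.9),)))
assembler.add_file("cell-A", Segment(0.0, "a.csv", ((0.0, 4.2),)))
assembler.add_file("cell-A", Segment(100.0, "b.csv", ((100.0, 4.0),)))
ordered = assembler.timeline("cell-A")
```

Tracking seen files makes re-delivery of the same file harmless, which matters when a queue may deliver a message more than once.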
100 TB of historical data is now ingested and indexed. RPTs and reference tests are identified automatically, KPIs are extracted, and results are queryable without manual wrangling. Out-of-order files are handled without intervention, and ingestion keeps pace with data generation — a platform that scales with the operation rather than against it.
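In its simplest form, the automatic RPT identification and KPI extraction could look like the sketch below. The detection rule is an assumption made for illustration: it treats a discharge step as an RPT when its mean current is close to a known reference current, and it uses discharge capacity as the KPI. The case study does not say which patterns or KPIs the real detection layer uses, and the names `Step`, `is_rpt`, and `extract_rpt_kpis` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One discharge step. Currents are in amperes, negative while discharging."""
    cycle: int
    times_s: list[float]
    currents_a: list[float]


def discharge_capacity_ah(step: Step) -> float:
    """Integrate discharge current over time (trapezoidal rule), in ampere-hours."""
    total_as = 0.0
    for i in range(1, len(step.times_s)):
        dt = step.times_s[i] - step.times_s[i - 1]
        mean_current = 0.5 * (step.currents_a[i] + step.currents_a[i - 1])
        total_as += -mean_current * dt  # flip sign so discharge counts as positive
    return total_as / 3600.0


def is_rpt(step: Step, rpt_current_a: float, tol: float = 0.1) -> bool:
    """Assumed rule: mean discharge current within tol (fractional) of the reference."""
    discharge = [-i for i in step.currents_a if i < 0]
    if not discharge:
        return False
    mean = sum(discharge) / len(discharge)
    return abs(mean - rpt_current_a) <= tol * rpt_current_a


def extract_rpt_kpis(steps: list[Step], rpt_current_a: float) -> dict[int, float]:
    """Map cycle number to discharge capacity for steps classified as RPTs."""
    return {
        s.cycle: discharge_capacity_ah(s)
        for s in steps
        if is_rpt(s, rpt_current_a)
    }


# One regular high-rate cycle and one slow reference discharge.
steps = [
    Step(cycle=1, times_s=[0.0, 360.0, 720.0], currents_a=[-5.0, -5.0, -5.0]),
    Step(cycle=2, times_s=[0.0, 3600.0, 7200.0], currents_a=[-0.5, -0.5, -0.5]),
]
kpis = extract_rpt_kpis(steps, rpt_current_a=0.5)
```

Here only cycle 2 matches the assumed 0.5 A reference current, so it is the only cycle for which a capacity is reported.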