MetDataPy
MetDataPy is a source-agnostic toolkit for ingesting, cleaning, QC-flagging, enriching, and preparing meteorological time-series data for machine learning.
What it provides today
- Canonical schema with UTC timestamp index and metric units
- Ingestion from CSV with mapping (explicit or autodetected), including encoding/delimiter detection and optional `ts.timezone` metadata so naive local timestamps convert to UTC correctly
- Weathercloud CSV and directory ingestion helpers
- Interactive mapping wizard and robust autodetection heuristics
- Unit normalization, rain-rate schema support, and rain accumulation fix-up
- Quality control: range, spike (MAD), flatline, and cross-variable consistency checks with non-destructive flags
- Derived metrics: dew point, VPD, heat index, and wind chill
- WeatherSet operations: gap insertion, resampling/aggregation, wind-direction cyclic encoding, rolling features, calendar features, exogenous joins
- CLI commands for ingestion, QC, and templates
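
To make a few of the items above concrete, here is a minimal standard-library sketch of three of the listed techniques: a MAD-based spike flag, a dew point calculation, and cyclic encoding of wind direction. It does not use MetDataPy and does not show its API. The function names, the 3.5 threshold, and the Magnus coefficients are assumptions chosen for illustration, and the library's own implementations may differ.

```python
import math
from statistics import median

# Illustrative helpers only; names and defaults are assumptions, not MetDataPy's API.

def mad_spike_flags(values, threshold=3.5):
    """Flag values whose robust z-score (based on the median absolute
    deviation) exceeds the threshold. The input is left untouched and a
    parallel list of booleans is returned, so the check is non-destructive."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: nothing can be scored as a spike.
        return [False] * len(values)
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

def dew_point_c(temp_c, rh_pct):
    """Dew point in degrees Celsius from air temperature (deg C) and
    relative humidity (%), using the Magnus approximation."""
    a, b = 17.62, 243.12  # one common Magnus coefficient set (assumed here)
    gamma = math.log(rh_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def encode_wind_direction(deg):
    """Map a direction in degrees to (sin, cos) so that 359 and 1 degrees
    end up close together instead of at opposite ends of a linear scale."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

temps = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0]
flags = mad_spike_flags(temps)          # only the 25.0 reading is flagged
saturated = dew_point_c(20.0, 100.0)    # at 100 % RH the dew point equals the air temperature
east = encode_wind_direction(90.0)      # approximately (1.0, 0.0)
```

Returning flags alongside the data, rather than overwriting or dropping suspect values, mirrors the non-destructive approach described in the quality-control item: downstream code can decide whether to mask, interpolate, or keep each flagged reading.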
Architecture
- `mapper.py`: mapping loader/saver and autodetector
- `core.py`: `WeatherSet` data container and transformations
- `weathercloud.py`: Weathercloud-specific CSV and directory ingestion
- `qc.py`: QC checks and flags
- `derive.py`: derived meteorological metrics
- `cli.py`: `mdp` command line interface
See the pages in the navigation for details.