Quality Control
All QC functions add boolean flag columns to the DataFrame without modifying original values. True means the check failed for that row.
Range checks — qc_range
Flags values outside climatologically plausible bounds.
| Variable | Min | Max |
|---|---|---|
| temp_c | -40 °C | 55 °C |
| rh_pct | 0 % | 100 % |
| pres_hpa | 870 hPa | 1085 hPa |
| wspd_ms | 0 m/s | 75 m/s |
| wdir_deg | 0° | 360° |
| gust_ms | 0 m/s | 100 m/s |
| rain_mm | 0 mm | 1000 mm |
| rain_rate_mmh | 0 mm/h | 500 mm/h |
| solar_wm2 | 0 W/m² | 1500 W/m² |
| uv_index | 0 | 20 |
Output columns: qc_<var>_range (one per variable present).
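To make the flag semantics concrete, here is a minimal pandas sketch of a range check. It is not the metdatapy implementation: `RANGE_BOUNDS` and `range_flags_sketch` are hypothetical names, only a few rows of the table are copied in, and it assumes that the bounds themselves count as valid and that NaN values are left unflagged.

```python
import pandas as pd

# A few bounds copied from the table above; the dict name is hypothetical.
RANGE_BOUNDS = {
    "temp_c": (-40.0, 55.0),
    "rh_pct": (0.0, 100.0),
    "pres_hpa": (870.0, 1085.0),
    "wspd_ms": (0.0, 75.0),
}


def range_flags_sketch(df, bounds=RANGE_BOUNDS):
    """Return a copy of df with one qc_<var>_range column per variable present."""
    out = df.copy()  # original values are never modified
    for var, (lo, hi) in bounds.items():
        if var not in out.columns:
            continue  # skip variables the DataFrame does not contain
        # True means the check failed. Assumption: NaN compares False,
        # so missing values are not flagged here.
        out[f"qc_{var}_range"] = (out[var] < lo) | (out[var] > hi)
    return out


demo = pd.DataFrame({"temp_c": [21.5, 60.0, -45.0], "rh_pct": [55.0, 101.0, 80.0]})
flagged = range_flags_sketch(demo)
```

In this demo the second row fails both checks, the third fails only the temperature check, and no `qc_pres_hpa_range` column is created because `pres_hpa` is absent.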
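Spike detection — qc_spike
Flags sudden spikes using a rolling Median Absolute Deviation (MAD) z-score. Because the median and MAD are barely affected by a single extreme value, this is more robust to outliers than methods based on the mean and standard deviation.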
Algorithm:
1. Rolling median over window (default 9), min_periods=3
2. Rolling MAD over the same window
3. z = |x − median| / (1.4826 × MAD + ε)
4. Flag where z > thresh (default 6.0)
By default, spike detection keeps its historical behaviour of a centered rolling window, in which the median and MAD draw on observations both before and after each timestamp. For ML feature generation, pass causal=True to compute the local median and MAD from previous observations only, so each flag depends only on data available at that timestamp.
Output columns: qc_<var>_spike.
from metdatapy.qc import qc_spike
df = qc_spike(df, window=9, thresh=6.0)
df = qc_spike(df, window=9, thresh=6.0, causal=True)
# or
ws.qc_spike(causal=True)
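The four algorithm steps can be sketched for a single pandas Series as follows. This is an illustration, not the library's code: `mad_spike_flags_sketch`, `_window_mad`, and the `eps` value are hypothetical, and the causal branch assumes that "previous observations only" means a trailing window over the series shifted by one step.

```python
import numpy as np
import pandas as pd


def _window_mad(values):
    """MAD of one window: median of |x - median(x)|, ignoring NaN."""
    return np.nanmedian(np.abs(values - np.nanmedian(values)))


def mad_spike_flags_sketch(s, window=9, thresh=6.0, eps=1e-9, causal=False):
    # Assumption: causal mode shifts the series by one step so the window
    # holds previous observations only; the default window is centered.
    ref = s.shift(1) if causal else s
    roll = ref.rolling(window, min_periods=3, center=not causal)
    med = roll.median()                         # step 1: rolling median
    mad = roll.apply(_window_mad, raw=True)     # step 2: rolling MAD
    z = (s - med).abs() / (1.4826 * mad + eps)  # step 3: robust z-score
    return z > thresh                           # step 4: NaN z compares False


s = pd.Series([10.0, 10.1, 9.9, 10.0, 10.2, 50.0, 10.1, 9.8, 10.0, 10.1, 9.9])
flags = mad_spike_flags_sketch(s)
causal_flags = mad_spike_flags_sketch(s, causal=True)
```

With this sample series both variants flag only the 50.0 reading. The causal variant cannot score the first three points because fewer than three previous observations exist, so they are left unflagged.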
Flatline detection — qc_flatline
Flags stuck or frozen sensor readings — periods with suspiciously low rolling variance.
Behaviour (v1.0.2):
- Uses min_periods = max(3, window // 2 + 1) so short series and NaN-dominated windows are not flagged.
- Only flags when variance is genuinely at or below tol and is not NaN. A window with insufficient valid observations produces NaN variance, which is not treated as zero.
- Pass causal=True to compute rolling variance from the current and previous observations only for time-safe ML features.
Output columns: qc_<var>_flatline.
from metdatapy.qc import qc_flatline
df = qc_flatline(df, window=5, tol=0.0) # tol=0 flags perfect flatlines only
df = qc_flatline(df, window=5, tol=1e-6) # flag near-constant values
df = qc_flatline(df, window=5, tol=0.0, causal=True)
# or
ws.qc_flatline(causal=True)
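The behaviour described above can be approximated in a few lines of pandas. The sketch below is hypothetical (`flatline_flags_sketch` is not part of metdatapy) and assumes that the non-causal default uses a centered window.

```python
import numpy as np
import pandas as pd


def flatline_flags_sketch(s, window=5, tol=0.0, causal=False):
    # Require enough valid samples so short or NaN-dominated windows
    # yield NaN variance instead of a spurious zero.
    min_periods = max(3, window // 2 + 1)
    # Assumption: the default window is centered; causal=True uses the
    # current and previous observations only.
    var = s.rolling(window, min_periods=min_periods, center=not causal).var()
    # Flag only where variance is a real number at or below tol.
    return var.notna() & (var <= tol)


stuck = pd.Series([5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.3, 5.1, 4.9, 5.2])
centered_flags = flatline_flags_sketch(stuck)
causal_flags = flatline_flags_sketch(stuck, causal=True)

sparse = pd.Series([1.0, np.nan, np.nan, np.nan, 1.0])
sparse_flags = flatline_flags_sketch(sparse)
```

In the causal case the first two samples are never flagged because fewer than three observations are available. The `sparse` series is not flagged at all, since no window contains three valid values.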
Consistency checks — qc_consistency
Flags violations of physical relationships between variables.
| Check | Condition flagged |
|---|---|
| Dew point vs temperature | dew_point_c > temp_c |
| Wind chill vs temperature | wind_chill_c > temp_c |
| Heat index vs temperature | heat_index_c < temp_c |
| Wind direction when calm | wdir_deg is not NaN when wspd_ms ≤ 0.2 m/s |
Only checks for variables present in the DataFrame are applied. Output column: qc_consistency.
from metdatapy.qc import qc_consistency
df = qc_consistency(df)
# or
ws.qc_consistency() # also adds qc_any
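For readers who want to see how the tabulated conditions combine into one flag, here is a rough pandas sketch. `consistency_flags_sketch` and the `calm_ms` parameter are hypothetical names, and the library's actual handling of missing values may differ.

```python
import numpy as np
import pandas as pd


def consistency_flags_sketch(df, calm_ms=0.2):
    """Return a boolean Series: True where any physical relationship is violated."""
    cols = set(df.columns)
    flag = pd.Series(False, index=df.index)
    # Each check runs only if every variable it needs is present.
    if {"dew_point_c", "temp_c"} <= cols:
        flag |= df["dew_point_c"] > df["temp_c"]
    if {"wind_chill_c", "temp_c"} <= cols:
        flag |= df["wind_chill_c"] > df["temp_c"]
    if {"heat_index_c", "temp_c"} <= cols:
        flag |= df["heat_index_c"] < df["temp_c"]
    if {"wdir_deg", "wspd_ms"} <= cols:
        # A reported direction is implausible when the wind is calm.
        flag |= df["wdir_deg"].notna() & (df["wspd_ms"] <= calm_ms)
    return flag


obs = pd.DataFrame({
    "temp_c": [20.0, 20.0, 20.0, 20.0],
    "dew_point_c": [15.0, 22.0, 10.0, 10.0],
    "wspd_ms": [3.0, 3.0, 0.1, 0.0],
    "wdir_deg": [180.0, 90.0, 45.0, np.nan],
})
qc_flag = consistency_flags_sketch(obs)
```

Row 2 fails the dew point check and row 3 reports a direction during calm conditions. Row 4 is calm with no direction, so it passes.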
Aggregate flag — qc_any
Combines all qc_* columns into a single qc_any flag (True if any check failed).
from metdatapy.qc import qc_any
df = qc_any(df)
# Filter to clean rows only
clean = df[~df["qc_any"]]
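Conceptually, the aggregation is a row-wise logical OR over the existing flag columns. The sketch below shows that idea in plain pandas; `any_flag_sketch` is a hypothetical helper, and selecting columns by the `qc_` prefix is an assumption about how the flags are discovered.

```python
import pandas as pd


def any_flag_sketch(df):
    """Return a copy of df with qc_any set where any qc_* flag is True."""
    # Assumption: every column starting with "qc_" is a boolean flag;
    # a pre-existing qc_any column is excluded so reruns stay stable.
    flag_cols = [c for c in df.columns if c.startswith("qc_") and c != "qc_any"]
    out = df.copy()
    out["qc_any"] = out[flag_cols].any(axis=1)
    return out


checked = pd.DataFrame({
    "temp_c": [21.0, 60.0, 35.0],
    "qc_temp_c_range": [False, True, False],
    "qc_temp_c_spike": [False, False, True],
})
combined = any_flag_sketch(checked)
clean_rows = combined[~combined["qc_any"]]
```

Only the first row survives the filter here, because each of the other two rows fails at least one check.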
WeatherSet.qc_consistency() calls qc_any automatically. To aggregate after running checks individually: