Optimizing aggregation precision for high-frequency sensor data

A triaxial accelerometer sampling at 10 kHz emits 864 million points per axis per day; a naive one-minute rollup collapses each 600,000-sample window into a single mean. When that window is a quiet baseline hovering at 0.0031 mm/s, a fixed four-decimal contract keeps the signal — but when the same sensor sits below its noise floor at 0.00007 mm/s, four decimals quantize the reading to 0.0001 and the micro-g detail that predictive-maintenance models depend on is gone. The opposite failure is just as real: carrying twelve decimals on a noisy 500 Hz stream stores pure quantization noise as if it were signal and inflates the rollup bucket. Optimizing aggregation precision for high-frequency sensor data means picking the decimal precision per window from the signal’s own magnitude and variance, while keeping the aggregation itself deterministic across millions of samples so parallel execution paths never disagree. This page builds a variance-driven dynamic-precision task on top of the static half-even contract defined in precision mapping and rounding strategies.

Prerequisites

InfluxDB v2.7+ (OSS or Cloud) — the option task semantics and math standard library used below are v2-native.
Flux v0.194+, for math.abs, math.floor, math.mod, math.log10, and math.pow.
A high-rate source bucket (iot_raw, ≥7-day retention) and a rollup destination (iot_1m).
An API token with read on iot_raw and write on iot_1m.
The round_half_even helper from the parent precision-mapping guide — this page reuses it rather than redefining bias-prone rounding.
A known per-sensor noise floor (the smallest amplitude your ADC resolves above dither), used to cap the number of decimal places stored.

How precision drifts on high-frequency streams

Before writing the task, it helps to name the three ways fidelity leaks specifically at kHz rates, because each one is amplified by sample count:

Floating-point accumulation. IEEE 754 float64 addition is non-associative, so summing 600,000 samples in a different order — which is exactly what happens when cluster shards aggregate a partitioned series in parallel — yields results that differ by 1e-12 to 1e-8. Imperceptible on a dashboard, but enough to make a deterministic threshold fire on one shard and not another.
Under-quantization of quiet windows. A fixed decimal contract chosen for the signal’s loud regime rounds its quiet regime more coarsely than the ADC resolves, erasing the low-amplitude precursors that vibration analytics look for.
Over-quantization of noisy windows. Carrying digits below the noise floor persists dither as signal, which inflates rollup storage and destabilizes downstream comparisons.
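
The first leak can be reproduced offline. The sketch below is a Python illustration, not something InfluxDB runs; the synthetic samples, the seed, and the shuffled "second shard" are assumptions made for the demo. It shows float64 addition changing with grouping, then uses math.fsum, which returns the correctly rounded sum, as an order-independent reference.

```python
import math
import random

# float64 addition is not associative: regrouping changes the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

def naive_sum(values):
    """Plain left-to-right accumulation, the way a simple reducer would sum."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical one-minute window: 600,000 synthetic samples near 0.0031.
rng = random.Random(42)
samples = [0.0031 + rng.gauss(0.0, 0.0005) for _ in range(600_000)]

# A second "shard" that visits the same samples in a different order.
shuffled = samples[:]
rng.shuffle(shuffled)

# The two naive sums may disagree in their trailing bits.
print(abs(naive_sum(samples) - naive_sum(shuffled)))

# math.fsum tracks exact partial sums, so ordering cannot change the result.
exact_a = math.fsum(samples)
exact_b = math.fsum(shuffled)
print(exact_a == exact_b)  # True
```

The size of the naive disagreement depends on the data and the ordering, which is why the sketch prints it rather than asserting a value.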

The static contract in the parent guide solves the bias problem (ties that always round in one direction) with half-even rounding. High-frequency streams add the scale problem: the right number of decimals is not constant; it tracks the window’s amplitude. The fix is to compute the decimal places from the aggregate itself.

Solution walkthrough

Step 1 — Aggregate deterministically at full precision

Compute the window statistic first, at full float64 precision, and anchor the window so every run tiles the timeline contiguously. For a 500 Hz RMS-velocity stream, align aggregateWindow(every:) to the task’s every and set an offset above the p99 edge-delivery lag so late high-frequency batches have landed by the time the task reads the window. Determinism comes from not rounding yet: quantize once, at the end, never per input sample.

flux

import "math"

option task = {name: "downsample_vibration_1m", every: 1m, offset: 15s}

data = from(bucket: "iot_raw")
    |> range(start: -task.every, stop: now())
    |> filter(fn: (r) => r._measurement == "accelerometer")
    |> filter(fn: (r) => r._field == "rms_velocity")
    // Aggregate at full float64 precision — do NOT round the raw samples.
    |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)

The offset: 15s delays execution past the top of each minute so a gateway flushing a 500 Hz buffer a few seconds late is still included; it does not move the window. Matching every: 1m on both the task and aggregateWindow guarantees exactly one point per series per run, so no sample is counted in two windows. Rounding the raw stream before mean would destroy sub-quantum detail the average preserves — the ordering here is load-bearing.
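
Why the ordering matters can be seen with a handful of numbers. The following Python sketch uses four made-up samples that all sit below a four-decimal quantum; the values are illustrative, not taken from a real sensor.

```python
from statistics import fmean

# Hypothetical quiet window: every sample is below half of 0.0001.
samples = [0.00002, 0.00004, 0.00003, 0.00004]

# Wrong order: quantize each sample to four decimals, then average.
rounded_first = fmean([round(s, 4) for s in samples])

# Right order: average at full precision, quantize once afterwards.
mean_first = fmean(samples)

print(rounded_first)  # 0.0
print(mean_first)     # roughly 3.25e-05
```

Rounding first turns every sample into zero, so the window reads as silent; averaging first keeps a mean that a later, magnitude-aware quantization step can still represent.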

Step 2 — Derive the decimal places from the window’s magnitude

Instead of a hard-coded decimal count, target a fixed number of significant figures so the precision floats with the signal. For a value $x$ and target significant figures $s$, the number of decimal places is:

$$ d(x) = s - 1 - \lfloor \log_{10} |x| \rfloor $$

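A window at 0.00007 has $\lfloor \log_{10} |x| \rfloor = -5$, so $s = 4$ yields $d = 8$ decimals (0.00007000); a window at 1.03 with the same $s$ yields $d = 3$ (1.030). Both carry four meaningful digits. Clamp $d$ to a ceiling derived from the sensor’s noise floor (no point storing digit 8 when the ADC resolves 6) and never let it go negative. With a ceiling of 7, the 0.00007 window is stored at seven decimals rather than eight.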

flux

// Decimal places for `sig` significant figures at value x, clamped to [0, maxDp].
sig_decimals = (x, sig, maxDp) => {
    mag = if x == 0.0 then 0.0 else math.floor(x: math.log10(x: math.abs(x: x)))
    raw = float(v: sig) - 1.0 - mag
    clampedHigh = if raw > float(v: maxDp) then float(v: maxDp) else raw
    return if clampedHigh < 0.0 then 0.0 else clampedHigh
}

maxDp is the hard ceiling from the sensor’s noise floor — for a vibration channel resolving micro-g detail, maxDp: 7 is typical. The x == 0.0 guard avoids log10(0) returning -Inf and poisoning the exponent on genuinely silent windows.
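
To check the arithmetic by hand, here is a Python port of the same function. It is a sketch for offline verification only; the name max_dp and the sample values are chosen for illustration.

```python
import math

def sig_decimals(x: float, sig: int, max_dp: int) -> int:
    """Decimal places for `sig` significant figures at x, clamped to [0, max_dp]."""
    # Guard silent windows: log10(0) is undefined.
    mag = 0 if x == 0.0 else math.floor(math.log10(abs(x)))
    raw = sig - 1 - mag
    return max(0, min(raw, max_dp))

for x in (0.00007, 0.0031, 1.03, 12345.6):
    unclamped = sig_decimals(x, sig=4, max_dp=12)  # ceiling high enough not to bite
    clamped = sig_decimals(x, sig=4, max_dp=7)     # ceiling used in this guide
    print(x, unclamped, clamped)
```

With four significant figures, 0.00007 asks for eight decimals and is capped to seven by the ceiling, 0.0031 gets six, 1.03 gets three, and 12345.6 would need a negative count and is clamped to zero.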

Step 3 — Quantize each window with half-even at its own precision

Combine the dynamic decimal count with the bias-free round_half_even helper (carried over from the static precision contract) so quiet and loud windows are each rounded without directional drift, then tag the mode and write the result.

flux

// Same task file as Steps 1 and 2: import "math" is already declared at the top.

round_half_even = (x) => {
    f = math.floor(x: x)
    diff = x - f
    return
        if diff < 0.5 then f
        else if diff > 0.5 then f + 1.0
        else if math.mod(x: f, y: 2.0) == 0.0 then f
        else f + 1.0
}

data
    |> map(fn: (r) => {
        d = sig_decimals(x: r._value, sig: 4.0, maxDp: 7.0)
        scale = math.pow(x: 10.0, y: d)
        return {r with _value: round_half_even(x: r._value * scale) / scale}
    })
    |> set(key: "precision_mode", value: "sig4_dynamic")
    |> to(bucket: "iot_1m", org: "production_ops")

Each window scales by its own 10^d, rounds half-even, and unscales, so with maxDp at 7 a 0.00007 window keeps seven decimals (the eight that four significant figures would need, capped by the ceiling) while 1.03 keeps three, and neither picks up the cumulative half-away-from-zero bias that plain math.round() injects. Tagging precision_mode: sig4_dynamic records the contract in the rollup so a downstream consumer can detect a policy change. Because every literal stays a float (10.0, 2.0, + 1.0), the _value column never demotes to int.
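
The bias difference is easiest to see on exact ties. The Python sketch below ports the helper for offline experimentation; round_half_away is a hypothetical stand-in for ties-away-from-zero rounding, written here only for comparison.

```python
import math

def round_half_even(x: float) -> float:
    """Port of the Flux helper: nearest integer, ties go to the even neighbor."""
    f = math.floor(x)
    diff = x - f
    if diff < 0.5:
        return float(f)
    if diff > 0.5:
        return float(f + 1)
    return float(f) if f % 2 == 0 else float(f + 1)

def round_half_away(x: float) -> float:
    """Comparison rounding: ties go away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

ties = [0.5, 1.5, 2.5, 3.5]  # exact sum is 8.0

print(sum(round_half_even(t) for t in ties))  # 8.0
print(sum(round_half_away(t) for t in ties))  # 10.0
```

Half-even sends two ties down and two up, so the total matches the exact sum; ties-away pushes all four up and overshoots by two.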

Gotchas and edge cases

Dynamic precision breaks a fixed epsilon drift check. A validator that compares raw and rollup with one absolute tolerance (say 1e-6) will false-alarm on loud windows and miss real drift on quiet ones, because the meaningful tolerance now scales with magnitude. Compare in relative terms — |raw − rollup| / |raw| against a fixed ppm budget — or quantize both sides to the same significant figures before differencing.
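
A small Python sketch makes the mismatch concrete. The 500 ppm budget is an assumption (half a unit in the fourth significant figure, at worst, relative to the value), and the two windows are invented: one correctly rounded loud window and one quiet window whose rollup has genuinely drifted.

```python
ABS_TOL = 1e-6      # the fixed absolute epsilon from the text
PPM_BUDGET = 500.0  # hypothetical relative budget for four significant figures

def abs_ok(raw: float, rollup: float) -> bool:
    """Fixed-epsilon check: passes when the absolute gap is within ABS_TOL."""
    return abs(raw - rollup) <= ABS_TOL

def rel_ppm(raw: float, rollup: float) -> float:
    """Relative gap in parts per million of the raw value."""
    return abs(raw - rollup) / abs(raw) * 1e6

loud = (1.0312, 1.031)           # correctly rounded to four significant figures
quiet = (0.00007123, 0.0000707)  # rollup is off in the third significant figure

print(abs_ok(*loud), rel_ppm(*loud) <= PPM_BUDGET)    # False True
print(abs_ok(*quiet), rel_ppm(*quiet) <= PPM_BUDGET)  # True False
```

The absolute check rejects the healthy loud window and accepts the drifted quiet one; the relative check gets both right.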

log10 on a mean that lands exactly on a power of ten. When a window mean is precisely 0.01, floating-point log10 can return a value a hair below -2 (for example -2.0000000000000004), and floor then snaps it to -3 instead of -2, shifting the decimal count by one. It is harmless for storage but makes two adjacent windows carry different decimal counts for visually identical values; if that bothers a strict schema audit, nudge the argument upward by a tiny relative epsilon before flooring: math.log10(x: math.abs(x: x) * (1.0 + 1e-9)).
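
The nudge can be sketched in Python as follows. Whether a given runtime actually shows the off-by-one depends on its log10 implementation, so this example only demonstrates the guard; the function name exponent10 and the default nudge are illustrative.

```python
import math

def exponent10(x: float, nudge: float = 1e-9) -> int:
    """floor(log10|x|), with |x| scaled up slightly so exact powers of ten
    cannot land one decade low because of log10 rounding error."""
    return math.floor(math.log10(abs(x) * (1.0 + nudge)))

for x in (0.01, 0.00007, 1.03, 1000.0):
    print(x, exponent10(x))
```

The trade-off is that a value within about one part per billion below a power of ten is promoted to the next decade, which is far finer than four significant figures can distinguish.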

createEmpty: false versus a mandatory dense series. Sparse high-frequency sensors that go silent between events produce no window — with createEmpty: false the rollup simply omits that minute, which keeps storage lean but leaves gaps a naive difference() will misread. Set createEmpty: true only when a downstream consumer genuinely requires a point every minute, and handle the resulting nulls before sig_decimals sees them (a null magnitude poisons log10). Gap recovery itself belongs to fallback chains for missing data.
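
The null guard amounts to passing empty windows through untouched. A minimal Python sketch of that idea, using Python's built-in round() (which already rounds ties to even) in place of the Flux helper; the function name and sample windows are hypothetical. In Flux, a filter on exists r._value ahead of the map is one way to get the same effect, though where it belongs depends on the pipeline.

```python
import math
from typing import Optional

def quantize_or_none(x: Optional[float], sig: int = 4, max_dp: int = 7) -> Optional[float]:
    """Quantize a window mean to `sig` significant figures; leave empty windows empty."""
    if x is None or math.isnan(x):
        return None  # never hand a missing value to log10
    mag = 0 if x == 0.0 else math.floor(math.log10(abs(x)))
    d = max(0, min(sig - 1 - mag, max_dp))
    return round(x, d)

windows = [0.0031, None, 1.0312, float("nan")]
print([quantize_or_none(w) for w in windows])
```

Empty windows come back as None, so a later gap-filling stage can decide what to do with them.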

Verification

Confirm that dynamic precision preserved significant figures rather than absolute decimals by re-aggregating the raw window and comparing relative drift. This offline check rounds both series to four significant figures (formatting with %g and parsing the result through Decimal) before differencing, so both sides are compared at the precision the contract promises:

python

import pandas as pd
from decimal import Decimal

def relative_drift(raw: pd.Series, rollup: pd.Series, sig: int = 4, ppm_budget: float = 50.0) -> dict:
    """Compare index-aligned raw vs. rollup at `sig` significant figures; flag drift above ppm_budget."""
    def sigfig(x):
        return float(Decimal(f"%.{sig}g" % x))
    r = raw.apply(sigfig)
    o = rollup.apply(sigfig)
    # Zero baselines become NaN and are dropped rather than dividing by zero.
    rel = ((r - o).abs() / r.abs().replace(0, float("nan"))).dropna() * 1e6  # parts per million
    return {
        "max_ppm": float(rel.max()),
        "windows_over_budget": int((rel > ppm_budget).sum()),
        "passed": bool((rel <= ppm_budget).all()),
    }

raw = pd.Series([0.00007, 1.0312, 0.0031, 0.00007123])
rollup = pd.Series([0.00007000, 1.031, 0.003100, 0.00007123])
print(relative_drift(raw, rollup))

A passed: True result confirms every window kept its four significant figures within the ppm budget, whether it was a micro-g whisper or a full-scale event. To confirm the task itself is writing the expected precision tag, query the rollup directly:

flux

from(bucket: "iot_1m")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "accelerometer" and r.precision_mode == "sig4_dynamic")
    |> keep(columns: ["_time", "_value", "precision_mode"])

Related

Precision Mapping & Rounding Strategies: the static half-even contract and round_half_even helper this page extends.
Threshold Tuning for Aggregation: keeping alert boundaries aligned when rollup precision varies per window.
Migrating Legacy Continuous Queries to InfluxDB 2.x Tasks: where implicit coercion in old CQs silently hid precision loss.
