Flux Scripting for Task Automation

A downsampling task that looks correct in the query editor can still corrupt a production pipeline the moment the scheduler owns it. The classic failure sequence: an engineer tests from(bucket:...) |> range(start: -1h) interactively, it returns clean data, so they paste it into a task with every: 1h. Under the scheduler, range(start: -1h) is now anchored to wall-clock now() at execution time rather than the logical window the scheduler intended. When a run is delayed by a compaction spike, or a batch of late-arriving IoT packets lands after the boundary, the task either double-counts an overlapping slice or silently skips a gap. The rollup bucket ends up with duplicate points and missing hours that nobody notices until a dashboard shows a temperature average of zero.

Flux scripting for task automation is the discipline of writing task scripts that are deterministic under the scheduler, not just correct in an editor. This page covers the structure, staging, governance, and verification patterns that get you there.

This is a specialization of the broader Automated Task Scheduling & Orchestration model; here we focus specifically on the Flux script itself as the unit of execution.

Prerequisites

Before productionizing task scripts, confirm your environment matches the assumptions used throughout this page:

InfluxDB v2.7+ (OSS or Cloud) — the option task semantics and _tasks system bucket behavior described here are v2-native.
Flux v0.194+ — required for the experimental and date helpers referenced in later sections.
A source bucket for raw writes (this page uses raw_telemetry) and at least one destination bucket (aggregated_telemetry, hourly_rollups).
An API token scoped with read on the source bucket and write on all destination buckets.
For programmatic management: influxdb-client (Python) v1.40+, if you drive tasks from outside the database.
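Optional but recommended: a dedicated monitoring or notification bucket for deadman and health-check output. User-created bucket names cannot begin with an underscore; that prefix is reserved for system buckets such as _monitoring and _tasks.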

Buckets must exist before a task references them — the scheduler does not create destination buckets on demand, and a missing target bucket is a per-run hard failure, not a warning.
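A pre-deployment check along the following lines can catch a missing destination before the first run fails. This is a minimal sketch in Python, not part of any InfluxDB tooling: missing_destinations and the regular expression are hypothetical helpers, the set of existing bucket names is assumed to have been fetched separately (for example from the CLI or the client library), and the pattern only recognizes literal bucket names inside to() calls.

```python
import re

# Hypothetical helper: matches to(bucket: "name") with a literal bucket name only.
TO_BUCKET = re.compile(r'\bto\(\s*bucket:\s*"([^"]+)"')


def missing_destinations(flux_script: str, existing_buckets: set) -> list:
    """Return destination buckets named in the script that do not exist yet."""
    targets = set(TO_BUCKET.findall(flux_script))
    return sorted(targets - set(existing_buckets))


script = '''
from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> to(bucket: "hourly_rollups")
'''

# Assumption: this set was obtained elsewhere, e.g. by listing buckets via the API.
existing = {"raw_telemetry", "aggregated_telemetry"}
missing = missing_destinations(script, existing)
```

A non-empty result is a reason to block activation of the task until the listed buckets have been created.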

Core Concept: Why Window Alignment Is Everything

A native InfluxDB task is a declarative Flux script that the engine executes on a fixed cadence. The single most important property of a well-written task is window alignment: every run must process exactly the slice of time the scheduler intended for it — no more, no less — regardless of when the run actually fires.

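Every example on this page anchors range() to two bounds, v.timeRangeStart and v.timeRangeStop, which stand for the logical run boundary derived from the task’s cadence rather than from wall-clock time. Anchoring range() to these bounds is what makes a task idempotent: re-running the same logical window produces the same output, so a retry after a crash never double-writes. Verify one thing before relying on them: the v record is populated by the InfluxDB UI for dashboards and the Data Explorer, and a script that references v where it is not bound fails with an undefined identifier error. Confirm that v is bound in your task’s execution context, or bind it explicitly with option v as the dry run under Verification and Testing does.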

The relationship between the three quantities that govern a run is worth stating precisely. For an interval task, the processed window width equals the cadence, and the run fires after a deliberate delay:

$$ \text{run\_time} = (n \cdot \text{every}) + \text{offset}, \qquad \text{window} = [\,\text{v.timeRangeStart},\; \text{v.timeRangeStop}\,], \quad \text{v.timeRangeStop} - \text{v.timeRangeStart} = \text{every} $$

The offset buys time for late data to land before the window closes; it does not shift the window itself. This distinction is the root of most late-data bugs and is explored further under cron & interval scheduling logic, where calendar alignment and drift resistance are compared in depth.

Figure: each run's logical window is pinned to the scheduler boundaries and never slides; offset only delays the firing moment, holding the window open long enough for late edge-device writes to be captured by the run that owns them.
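The relationship between cadence, offset, and window can be worked through numerically. The sketch below assumes boundaries are counted in whole multiples of every from the Unix epoch; run_schedule and the value of n are illustrative names and numbers, not part of InfluxDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def run_schedule(n: int, every: timedelta, offset: timedelta):
    """Return (fire_time, window_start, window_stop) for the n-th boundary."""
    boundary = EPOCH + n * every      # n * every: the logical boundary
    fire_time = boundary + offset     # offset delays the firing moment only
    return fire_time, boundary - every, boundary


every, offset = timedelta(hours=1), timedelta(minutes=5)
fire, start, stop = run_schedule(482_000, every, offset)
next_fire, next_start, next_stop = run_schedule(482_001, every, offset)
```

Changing offset moves fire but leaves start and stop untouched, and consecutive windows tile without gaps because each window starts where the previous one stopped.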

Step-by-Step Implementation

Step 1 — Declare the execution contract

Every task begins with an option task record. This is not metadata decoration; it is the contract the scheduler parses before the script body runs to determine identity, cadence, and alignment.

flux

option task = {
    name: "iot_sensor_rollup",   // unique identity in the _tasks system bucket
    every: 1h,                   // cadence: process one 1h window per run
    offset: 5m,                  // wait 5m past the boundary for late points
}

from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "environmental_sensors")
    |> aggregateWindow(every: 10m, fn: mean, createEmpty: false)
    |> to(bucket: "aggregated_telemetry")

The critical callout: range(start: v.timeRangeStart, stop: v.timeRangeStop) — never range(start: -1h) inside a task. The former is scheduler-anchored and idempotent; the latter drifts with execution latency and is the single most common cause of gaps and overlaps in production rollups.
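The difference between the two styles of range can be seen in a toy model. The sketch below is not a reproduction of the InfluxDB scheduler: it simply compares windows pinned to a logical boundary with windows computed backwards from the moment a delayed run actually fires. The boundaries and delays are made-up values.

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)


def anchored_windows(boundaries):
    """Windows pinned to the logical boundary b: [b - 1h, b)."""
    return [(b - HOUR, b) for b in boundaries]


def wall_clock_windows(boundaries, delays):
    """Windows computed from the firing moment: [fired - 1h, fired)."""
    return [(b + d - HOUR, b + d) for b, d in zip(boundaries, delays)]


def gaps_and_overlaps(windows):
    """Compare each window's stop with the start of the next window."""
    gaps, overlaps = [], []
    for (_, stop), (start, _) in zip(windows, windows[1:]):
        if start > stop:
            gaps.append(start - stop)
        elif start < stop:
            overlaps.append(stop - start)
    return gaps, overlaps


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
boundaries = [t0 + i * HOUR for i in range(4)]
# Made-up execution delays for four consecutive runs.
delays = [timedelta(0), timedelta(minutes=7), timedelta(0), timedelta(minutes=2)]

anchored = gaps_and_overlaps(anchored_windows(boundaries))
drifting = gaps_and_overlaps(wall_clock_windows(boundaries, delays))
```

In this model the anchored windows tile exactly, while every change in delay between consecutive runs shows up as either a skipped slice or a doubly processed one.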

Step 2 — Filter early, before you aggregate

Predicate pushdown means a filter() placed immediately after range() prunes series at the storage layer, before rows are materialized into memory. Filtering late — after a map() or a pivot() — defeats pushdown and forces the engine to scan the full high-cardinality set.

flux

from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "machine_vibration")
    |> filter(fn: (r) => r._field == "rms_amplitude")     // narrow to one field early
    |> aggregateWindow(every: 5m, fn: mean, createEmpty: false)

Here createEmpty: false is the correct choice for sparse IoT sensors — a device that reports intermittently should not manufacture null-valued rows for windows in which it was silent. Set it to true only when a downstream consumer requires a dense, gap-free series.
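The effect of the createEmpty flag is easy to see in a small model. The following Python sketch mimics a windowed mean over (timestamp, value) pairs; it assumes the range start is already aligned to the window period and stamps each result with the window's stop time. It is an illustration of the semantics described above, not Flux's implementation.

```python
from statistics import mean


def aggregate_window(points, start, stop, every, create_empty=False):
    """Windowed mean over (t_seconds, value) pairs in [start, stop).

    Returns a list of (window_stop, mean_or_None) tuples.
    """
    out = []
    t = start
    while t < stop:
        w_stop = min(t + every, stop)
        values = [v for ts, v in points if t <= ts < w_stop]
        if values:
            out.append((w_stop, mean(values)))
        elif create_empty:
            out.append((w_stop, None))   # silent window becomes a null row
        t = w_stop
    return out


# A sensor that reports twice, goes silent for a window, then reports again.
points = [(30, 20.0), (90, 22.0), (1300, 21.0)]
sparse = aggregate_window(points, 0, 1800, 600)
dense = aggregate_window(points, 0, 1800, 600, create_empty=True)
```

With the flag off, the silent middle window simply produces no row; with it on, the same window yields a row whose value is null.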

Step 3 — Route explicitly to the destination bucket

Cross-bucket routing is handled natively by to(). Prune columns you will not query so the destination inherits a lean schema — every retained tag becomes part of the series cardinality you pay for downstream.

flux

option task = {
    name: "sensor_downsample_hourly",
    every: 1h,
    offset: 10m,
}

from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "machine_vibration")
    |> aggregateWindow(every: 1h, fn: last, createEmpty: false)
    |> keep(columns: ["_time", "_measurement", "_field", "_value", "machine_id", "location"])
    |> to(bucket: "hourly_rollups")

Note that aggregateWindow(every: 1h, ...) matches the task’s every: 1h. Matching the aggregation period to the cadence guarantees each run emits exactly one point per series and eliminates the overlapping-computation class of bug. For staged pipelines where multiple tasks hand off to each other, the boundary-alignment rules are detailed in writing robust Flux scripts for automated data rollups.

Step 4 — Compose stages into a lifecycle

Figure: a staged pipeline in which each task owns one transformation and consumes the precise window its upstream emitted, keeping every hop window-aligned end to end.

Production telemetry rarely lives in one query. It flows through staged tasks, each owning one transformation:

Ingestion & validation — raw data lands in a high-cardinality measurement; a task applies tag normalization and outlier filtering with filter() and map().
Transformation & rollup — high-frequency data is downsampled with aggregateWindow (mean, last, or a custom reduce), collapsing cardinality for dashboards and alerting. The statistical trade-offs of each aggregation function belong to downsampling aggregation pipeline design.
Retention & archival — processed data is written to long-lived buckets while raw data expires under a short retention window, a policy surface owned by retention policy design.

Each downstream stage must consume the exact window its upstream produced. When a chain grows beyond two hops or introduces conditional branches, the ordering guarantees are no longer expressible inside a single option task block and belong to dependency mapping & DAG construction.
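One way to reason about ordering once a chain has several hops is to treat the stages as a dependency graph and derive a run order from it. The sketch below uses Python's standard graphlib module (Python 3.9+). The stage names are hypothetical, and staggering offsets by a fixed step is an illustrative simplification that assumes all stages share one cadence; it does not guarantee that an upstream run has finished.

```python
from datetime import timedelta
from graphlib import CycleError, TopologicalSorter

# Hypothetical stage names. Each key maps to the set of stages it depends on.
stages = {
    "validate_raw": set(),
    "rollup_10m": {"validate_raw"},
    "rollup_hourly": {"rollup_10m"},
    "archive_daily": {"rollup_hourly"},
}


def run_order(graph: dict) -> list:
    """Return stages in an order where every upstream precedes its downstream."""
    return list(TopologicalSorter(graph).static_order())


def staggered_offsets(order: list, base: timedelta, step: timedelta) -> dict:
    """Give each later stage a larger offset so it fires after its upstream."""
    return {name: base + i * step for i, name in enumerate(order)}


order = run_order(stages)
offsets = staggered_offsets(order, base=timedelta(minutes=5), step=timedelta(minutes=5))
```

A cyclic graph raises CycleError from static_order(), which is a useful early signal that the chain cannot be scheduled as written.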

Configuration Reference

The option task record accepts a small, fixed set of keys. Everything else that governs execution — concurrency caps, memory limits, retries — lives at the instance or orchestration layer, not in the script.

| Option | Accepted values | Default | Effect |
| --- | --- | --- | --- |
| `name` | string (unique) | none (required) | Task identity recorded in the `_tasks` system bucket; must be unique per org. |
| `every` | duration literal (`15m`, `1h`, `1d`) | none (one of `every`/`cron` required) | Fixed-interval cadence; the processed window width equals this value. |
| `cron` | POSIX cron string (`"0 2 * * *"`) | none (mutually exclusive with `every`) | Calendar-aligned cadence for wall-clock boundaries (e.g. 02:00 daily). |
| `offset` | duration literal | `0s` | Delay past the boundary before firing, reserving time for late-arriving points. Does not move the window. |

A few keys engineers expect to find here but that do not exist inside option task: there is no concurrency, retry, or memory field in the Flux task record on current InfluxDB. On InfluxDB OSS, concurrency and query-memory ceilings are governed by instance settings such as query-concurrency and query-memory-bytes; task-run retry behavior is a property of the scheduler and orchestration tier. Treat the script as the what and the platform as the how many / how hard.
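The rules in the table can be expressed as a small lint step that runs before a task is deployed. The sketch below is a hypothetical Python helper that checks a plain dictionary mirroring the option task record; it encodes only the constraints stated on this page and does not parse Flux.

```python
ALLOWED_KEYS = {"name", "every", "cron", "offset"}


def validate_task_options(opts: dict) -> list:
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    unknown = sorted(set(opts) - ALLOWED_KEYS)
    if unknown:
        problems.append("unsupported keys: " + ", ".join(unknown))
    if not opts.get("name"):
        problems.append("name is required")
    # Exactly one of every / cron must be present.
    if ("every" in opts) == ("cron" in opts):
        problems.append("set exactly one of every or cron")
    return problems


good = validate_task_options({"name": "iot_sensor_rollup", "every": "1h", "offset": "5m"})
bad = validate_task_options({"name": "x", "every": "1h", "cron": "0 * * * *", "retry": 3})
```

The second call reports both the unsupported retry key and the conflict between every and cron.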

Common Failure Modes and Fixes

1. Wall-clock range under the scheduler (overlaps and gaps). Symptom: the rollup bucket shows duplicate points for some windows and missing hours for others; totals drift over days. Root cause: the script uses range(start: -1h) instead of the injected variables, so the window slides with execution latency. Fix: anchor to the scheduler window.

flux

// Wrong: |> range(start: -1h)
from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)   // idempotent

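2. Offset too small for late IoT data. Symptom: points timestamped inside the window but delivered a few minutes after the boundary are absent from the rollup even though they exist in raw_telemetry. Root cause: the run fires before the field-gateway flush that batches edge-device writes has landed, so the points are not yet in the source bucket when their window is read. The offset does not move the window; it only decides how long the run waits. Fix: set offset to comfortably exceed your worst-case ingestion lag.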

flux

option task = {
    name: "iot_sensor_rollup",
    every: 1h,
    offset: 8m,   // > observed p99 delivery lag from the edge gateway
}

3. Late filtering defeats predicate pushdown (query timeouts). Symptom: a task that reads a wide measurement times out or breaches its memory ceiling on long windows. Root cause: filter() sits after pivot()/map(), so the engine materializes the full series set before narrowing. Fix: move every field/tag filter() directly after range(); defer reshaping until the dataset is already small.

4. Missing destination bucket (silent per-run failure). Symptom: task status shows repeated failures with bucket not found; no data is written and no exception surfaces in the query editor. Root cause: to(bucket: "hourly_rollups") names a bucket that was never created. Fix: create destination buckets as part of deployment (Terraform, influx bucket create, or the client library) before activating the task.

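5. Aggregation period misaligned with cadence. Symptom: each run writes several points per series, or partial windows appear at run edges. Root cause: the aggregation period does not divide the cadence evenly. For example, aggregateWindow(every: 25m) inside a task with every: 1h cuts each run into sub-windows that do not line up with the hour, so the windows at the run edges are truncated at the boundary and written as partial aggregates. A period such as 15m divides the hour cleanly into four complete sub-windows and produces four points per series per run, which is correct only if that is what you intended. Fix: set aggregateWindow(every:) equal to the task’s every, or make it an exact integer divisor only when you deliberately want multiple sub-points per run.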
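The alignment problem in failure mode 5 can be checked by enumerating the sub-windows a run would produce. The sketch below assumes sub-windows are aligned to multiples of the aggregation period counted from the Unix epoch and are clipped to the run's bounds; the function name and the sample times (10:00 to 11:00, in seconds since midnight) are illustrative.

```python
def sub_windows(run_start: int, run_stop: int, agg_every: int) -> list:
    """Epoch-aligned windows of width agg_every, clipped to [run_start, run_stop).

    All arguments are integer seconds. Returns (start, stop, is_partial) tuples.
    """
    out = []
    t = (run_start // agg_every) * agg_every   # last aligned boundary <= run_start
    while t < run_stop:
        lo, hi = max(t, run_start), min(t + agg_every, run_stop)
        out.append((lo, hi, hi - lo < agg_every))
        t += agg_every
    return out


run_start, run_stop = 10 * 3600, 11 * 3600
clean = sub_windows(run_start, run_stop, 15 * 60)      # exact divisor of 1h
ragged = sub_windows(run_start, run_stop, 25 * 60)     # does not divide 1h
single = sub_windows(run_start, run_stop, 60 * 60)     # equals the cadence
```

Any tuple flagged as partial marks an aggregate computed over less than a full period, which is the edge artifact described above.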

Verification and Testing

Never trust a task from the editor alone — verify it under scheduler semantics.

Dry-run the exact window before activating. Bind the scheduler variables manually and run the body to confirm it emits what you expect:

flux

import "date"

option v = {
    timeRangeStart: date.truncate(t: -1h, unit: 1h),
    timeRangeStop: date.truncate(t: now(), unit: 1h),
}

from(bucket: "raw_telemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "environmental_sensors")
    |> aggregateWindow(every: 10m, fn: mean, createEmpty: false)

Confirm the rollup actually landed. After the first scheduled run, count points in the destination for the processed window:

flux

from(bucket: "aggregated_telemetry")
    |> range(start: -2h)
    |> filter(fn: (r) => r._measurement == "environmental_sensors")
    |> count()
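The count is most useful when compared against an expectation. A rough upper bound follows from the cadence, the aggregation period, and the number of series. In the sketch below, the helper name and the figure of 12 series are hypothetical. With createEmpty: false the real count can be lower, because silent windows emit nothing.

```python
def expected_points_per_run(task_every_s: int, agg_every_s: int, series: int) -> int:
    """Upper bound on points written by one run, given whole sub-windows."""
    if agg_every_s <= 0 or task_every_s % agg_every_s != 0:
        raise ValueError("aggregation period must evenly divide the task cadence")
    return (task_every_s // agg_every_s) * series


# 1h cadence, 10m aggregation windows, 12 series (hypothetical count).
upper_bound = expected_points_per_run(3600, 600, series=12)
```

A measured count above the bound suggests overlapping runs; a count of zero suggests the run wrote nothing at all.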

Add a deadman health check. A downsampling task that silently stops is worse than one that errors loudly. A companion task detects the absence of recent output and emits a signal:

flux

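import "experimental"

option task = {
    name: "rollup_deadman",
    every: 15m,
    offset: 1m,
}

// The hourly rollup stamps its newest point at the window boundary, so that point
// is legitimately up to every + offset (65m) old just before the next run fires.
// The staleness threshold must exceed that, and the lookback must exceed the threshold.
from(bucket: "aggregated_telemetry")
    |> range(start: -24h)
    |> filter(fn: (r) => r._measurement == "environmental_sensors")
    |> last()
    |> map(fn: (r) => ({ r with dead: r._time < experimental.subDuration(d: 90m, from: now()) }))
    |> filter(fn: (r) => r.dead == true)
    |> to(bucket: "alerts")   // user-created bucket names cannot start with "_"

If the query returns rows, the primary rollup has gone quiet, so wire the alerts output to a notification endpoint. The lookback bounds what this check can see: a series whose last point is older than the range() start returns no rows at all and therefore raises no alert, so size the lookback to the longest outage you need to detect. Building that notification and escalation layer in Python is covered under Python client orchestration patterns.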
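The staleness rule behind a deadman check is small enough to state directly. The Python sketch below is a model of that rule, not of the Flux task: is_dead and min_threshold are hypothetical names. It treats "no point found at all" as dead, and it derives the smallest sensible threshold from the rollup's cadence and offset, assuming the newest point is stamped at the window boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def min_threshold(every: timedelta, offset: timedelta) -> timedelta:
    """Age the newest point can reach just before the next healthy run writes."""
    return every + offset


def is_dead(last_point: Optional[datetime], now: datetime, threshold: timedelta) -> bool:
    """True when the newest point is missing or older than the threshold."""
    if last_point is None:
        return True
    return now - last_point > threshold


now = datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc)
healthy_age = min_threshold(timedelta(hours=1), timedelta(minutes=5))
threshold = timedelta(minutes=90)   # chosen with headroom above healthy_age
```

A threshold below healthy_age would raise an alert during every normal cycle, so the threshold should sit above it with some headroom for run duration.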

Integration Points

Flux task scripts sit at the center of the automation surface but rarely act alone. The cadence and calendar behavior wrapping every option task block is the domain of cron & interval scheduling logic; the statistical correctness of what a rollup computes belongs to downsampling aggregation pipeline design; the bucket topology a task writes into — hot, warm, cold — is defined by bucket architecture & tiering boundaries. When task scripts must be created, versioned, or monitored from application code, the influxdb-client library provides REST access to task creation, run history, and log retrieval — the patterns for wrapping that in resilient application code live in Python client orchestration patterns. For teams standardizing on infrastructure-as-code, the official InfluxDB task management documentation and the Flux documentation detail the endpoints and language surface referenced throughout this page.

Frequently Asked Questions

Why does my task work in the query editor but produce gaps once scheduled?

The editor evaluates range(start: -1h) against wall-clock now() the instant you run it, so it always looks correct. The scheduler, by contrast, has a logical run window that can lag real time when runs queue behind compaction or a restart. Replace relative ranges with range(start: v.timeRangeStart, stop: v.timeRangeStop) so the window is fixed by the scheduler, not by execution latency.

Can I set concurrency or retries inside the `option task` block?

No. On current InfluxDB the task record only accepts name, every (or cron), and offset. Concurrency ceilings and query-memory limits are instance-level settings on InfluxDB OSS (query-concurrency, query-memory-bytes, and related), and run-retry behavior is governed by the scheduler and any external orchestration layer, not by the Flux script.

How large should `offset` be for edge devices on flaky networks?

Set offset to exceed the p99 delivery lag you actually observe from the edge — including any field-gateway batching interval. If gateways flush every 5 minutes and network jitter adds a minute, an offset of 6–8 minutes is reasonable. Too small and you drop late points; excessively large and your rollups become needlessly stale.
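The sizing rule can be turned into arithmetic over measured delivery lags. The sketch below uses a nearest-rank percentile; recommend_offset, the 60-second margin, and the sample lags (a 5-minute gateway flush plus up to a minute of jitter) are illustrative assumptions, and real lag samples would come from your own ingestion metrics.

```python
def recommend_offset(lag_seconds: list, percentile: int = 99, margin_seconds: int = 60) -> int:
    """Nearest-rank percentile of observed lag, plus a safety margin, in seconds."""
    if not lag_seconds:
        raise ValueError("need at least one lag sample")
    ordered = sorted(lag_seconds)
    # Ceiling division keeps the rank an integer without floating-point rounding.
    rank = max(1, -(-percentile * len(ordered) // 100))
    return ordered[rank - 1] + margin_seconds


# Hypothetical samples: 300s flush interval plus 0 to 60s of network jitter.
lags = [300 + jitter for jitter in range(61)]
offset_seconds = recommend_offset(lags)
```

For these samples the result is 420 seconds, or 7 minutes, which falls inside the 6 to 8 minute range suggested above.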

Should `createEmpty` be true or false for intermittent sensors?

Use createEmpty: false for sparse or event-driven sensors so silent windows do not fabricate null rows that every downstream query then has to filter out. Use createEmpty: true only when a downstream consumer (a chart that must not break the line, or a fixed-shape export) genuinely requires a dense, gap-free series.

How do I confirm a running task hasn’t silently stopped?

Pair each production rollup with a deadman check: a lightweight task that reads the destination’s most recent point and flags it if it is older than a threshold, writing to an alert bucket wired to a notification hook. Combined with a periodic count() over the destination window, this catches both hard failures and the more dangerous case of a task that runs but writes nothing.

Related

Cron & Interval Scheduling Logic: calendar vs. drift-resistant cadences for the option task block.
Python Client Orchestration Patterns: manage, trigger, and monitor task scripts from application code.
Dependency Mapping & DAG Construction: ordering multi-stage task chains beyond a single script.
Writing Robust Flux Scripts for Automated Data Rollups: boundary-alignment patterns for staged rollups.

