Bucket Architecture & Tiering Boundaries

An IoT platform rarely fails at the moment of ingestion. It fails months later, when a single all-purpose bucket that once served real-time dashboards is now holding a year of 1-second telemetry, every query scans shard groups it does not need, and a compaction cycle stalls writes at the worst possible hour. Bucket architecture is the structural decision that prevents that outcome: mapping data temperature to explicit storage tiers, sizing shard groups to the retention window, and moving data between tiers with automated tasks rather than manual sweeps. This page sits under InfluxDB Data Lifecycle & Architecture Fundamentals and covers how to define those tier boundaries, provision the buckets that back them, and automate the hot-to-warm-to-cold transitions that keep a time-series platform fast and affordable as device fleets grow.

The failure scenario this solves

A fleet of 40,000 environmental sensors writes temperature, humidity, and voltage at 1 Hz into a bucket named telemetry. The bucket was created with the default 7-day shard group duration and an infinite retention window because “we might need the history.” For the first quarter everything is fine. By month six three things have gone wrong at once.

First, real-time dashboards that query the last 15 minutes have slowed from 40 ms to several seconds. The bucket now carries dozens of shard groups full of Time-Structured Merge Tree (TSM) data that no dashboard will ever read, and the memory, file handles, and background work they consume are taken away from the hot path. Second, the series index has grown far past what fits comfortably in memory: the device_id tag alone contributes 40,000 values, multiplied across measurements, fields, and any other tags, and the storage engine is now paging index data from disk under memory pressure. Third, retention cannot reclaim anything cheaply: with an infinite window nothing expires, and even after an operator sets a window, the coarse shard-group boundary means expiry drops data in clumsy 7-day blocks that rarely align with what is actually cold.

The root cause is not any single setting. It is the absence of tier boundaries — the platform never decided that recent, high-resolution data and old, low-resolution data belong in physically different buckets with different shard sizing and different retention. The remainder of this page builds those boundaries deliberately: a hot bucket tuned for low-latency reads, a warm bucket of aggregates for trend analysis, and a cold bucket for compliance-grade retention, wired together by scheduled tasks that move data down the temperature gradient exactly once.

Prerequisites

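InfluxDB 2.7+ with the native task engine enabled. The Flux tasks, shard-group settings, and influx CLI commands on this page are InfluxDB 2.x features; InfluxDB 3.x editions such as Cloud Dedicated do not run Flux tasks and do not use shard groups.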
Flux 0.x query language (bundled with the versions above; no separate install).
Python 3.9+ and influxdb-client 1.36+ for programmatic bucket and task provisioning.
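A token that can create buckets and tasks in the org (an operator or all-access token is simplest for one-off provisioning), plus read/write access to the tier buckets. The tokens the transition tasks run with should be scoped far more narrowly, as failure mode 5 below describes.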
A naming convention agreed up front — this page uses <domain>_<resolution>_<tier> (e.g. sensor_1s_hot, sensor_1m_warm, sensor_1h_cold).
Familiarity with the storage layout and lifecycle stages described in the parent InfluxDB Data Lifecycle & Architecture Fundamentals guide.

Core concept: data temperature and tier boundaries

A bucket is InfluxDB’s primary namespace for both isolation and retention. Every bucket carries two properties that together define a tier: a retention window (how long data lives before the engine drops it) and a shard group duration (the time span covered by each shard group). Points whose timestamps fall inside the same shard-group window land in the same shard group, and because retention expires whole shard groups rather than individual points, the shard duration sets the granularity at which storage can be reclaimed.
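To make that reclamation granularity concrete, here is a minimal Python sketch of when a single point can actually leave disk. It assumes shard-group boundaries sit on whole multiples of the shard duration counted from the Unix epoch, and that a group is dropped once its end time is older than the retention window; it ignores the delay of the periodic retention check. The function names are illustrative, not part of any InfluxDB API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_bounds(ts: datetime, shard: timedelta) -> tuple[datetime, datetime]:
    """Return the [start, end) shard-group window that contains ts.

    Assumes boundaries sit on whole multiples of `shard` counted from the Unix epoch.
    """
    index = (ts - EPOCH) // shard          # number of whole shard durations since epoch
    start = EPOCH + index * shard
    return start, start + shard


def earliest_drop_time(ts: datetime, shard: timedelta, retention: timedelta) -> datetime:
    """Earliest time the point at ts can leave disk.

    Retention drops whole shard groups, so the point stays until its group's end
    time is older than the retention window (retention-check delay ignored).
    """
    _, end = shard_group_bounds(ts, shard)
    return end + retention


# A point written at the very start of a 7-day group (2024-02-29 falls on a 7d boundary).
point = datetime(2024, 2, 29, tzinfo=timezone.utc)
for label, shard in (("1d shards", timedelta(days=1)), ("7d shards", timedelta(days=7))):
    lifetime = earliest_drop_time(point, shard, retention=timedelta(days=7)) - point
    print(f"{label}: on disk for up to {lifetime.days} days")
```

Under these assumptions, a 7-day retention window keeps such a point for up to 8 days with 1-day shards but up to 14 days with 7-day shards, which is the coarse-block reclamation the paragraph describes.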

Tiering means deciding, per bucket, where on the temperature gradient the data sits:

Hot — the freshest, highest-resolution data (seconds), read constantly by dashboards and alerts. Narrow retention (7–30 days), short shard groups, provisioned on the fastest storage available.
Warm — downsampled aggregates (minute/hour resolution) for trend analysis and reporting. Medium retention (90 days–1 year), longer shard groups, lower query frequency.
Cold / archive — heavily downsampled or raw-but-rarely-touched data kept for compliance and audit. Long retention (years), the longest shard groups, optimized for storage density over read latency.

The payoff of moving data down this gradient is quantifiable. For a source series written at interval $i_s$ and a downsampled tier written at interval $i_r$, the approximate reduction in stored points is

$$R = \frac{i_r}{i_s}$$

so rolling a 1-second hot stream into a 1-minute warm tier yields $R = 60$ (sixtyfold fewer points before compression), and a further roll to a 1-hour cold tier compounds to $R = 3600$ against the original. That compounding is why the warm and cold tiers can hold longer history on less disk than the hot tier holds for a single week.
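The arithmetic is easy to check for the fleet from the failure scenario. This sketch assumes one measurement with three fields per sensor (40,000 × 3 = 120,000 series), every tier keeping all three fields, and one stored point per field per interval; it counts raw points before compression and ignores the extra points a tier would hold if it carried several aggregate functions.

```python
SENSORS = 40_000
FIELDS = 3                       # temperature, humidity, voltage (assumed: one measurement)
SERIES = SENSORS * FIELDS
DAY_S = 86_400

# (tier, write interval in seconds, retention in days): the tiers used on this page.
TIERS = [
    ("hot", 1, 7),
    ("warm", 60, 90),
    ("cold", 3_600, 364),        # 52w
]


def reduction(i_s: int, i_r: int) -> float:
    """R = i_r / i_s: source points per downsampled point."""
    return i_r / i_s


def stored_points(interval_s: int, retention_days: int) -> int:
    """Points held across the fleet once the tier's retention window is full."""
    return SERIES * (DAY_S // interval_s) * retention_days


for tier, interval_s, days in TIERS:
    print(f"{tier:>4}: R={reduction(1, interval_s):>6.0f}  points={stored_points(interval_s, days):,}")
```

Despite holding nearly thirteen times the history, the warm tier stores roughly a fifth of the hot tier's points (about 15.6 billion versus 72.6 billion), and the cold tier about 1 billion for a full year.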

The other half of the concept is shard-group sizing. As a rule of thumb, the shard group duration should be roughly one order of magnitude smaller than the retention window: a 7-day hot bucket wants a 1-day shard duration, a 1-year cold bucket a 7-day (or larger) shard duration. Shards that are too short multiply file handles and compaction overhead; shards that are too long make retention reclaim storage in coarse, wasteful blocks and force queries to scan more data than a tight range() should need. The detailed partition-key and cardinality decisions inside a single tier — how to keep the series index bounded so the hot bucket stays fast — are covered in best practices for bucket partitioning in IoT telemetry.
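The one-tenth rule can be folded into a small helper for provisioning scripts. This is a sketch under two assumptions: candidate shard durations are limited to 1 hour, 1 day, and 7 days (the values used on this page), and retention ÷ 10 is snapped to whichever candidate is closest. `suggest_shard_duration` is a hypothetical helper, not an InfluxDB function.

```python
from datetime import timedelta

# Candidate shard-group durations (assumed: the values used on this page).
CANDIDATES = (timedelta(hours=1), timedelta(days=1), timedelta(days=7))


def suggest_shard_duration(retention: timedelta) -> timedelta:
    """Pick the candidate closest to one tenth of the retention window."""
    if retention <= timedelta(0):
        # A retention of 0 means "infinite" in InfluxDB; fall back to the largest candidate.
        return CANDIDATES[-1]
    target = retention / 10
    return min(CANDIDATES, key=lambda c: abs(c - target))


for days in (7, 30, 90, 364):
    print(f"retention {days:>3}d -> shard {suggest_shard_duration(timedelta(days=days))}")
```

With these candidates the helper reproduces the pairings used here: 1-day shards for a 7- to 30-day hot bucket and 7-day shards for the 90-day warm and one-year cold buckets.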

Step-by-step implementation

1. Name the tiers and provision the buckets

Encode resolution and temperature into the bucket name so tasks, dashboards, and infrastructure-as-code can discover tiers programmatically instead of hard-coding them. Create each bucket with a retention and shard duration matched to its tier. Using the CLI:

bash

# Hot: 7-day retention, 1-day shards, fastest storage.
influx bucket create --name sensor_1s_hot  --retention 7d   --shard-group-duration 1d  --org "$INFLUX_ORG"

# Warm: 90-day retention, 7-day shards.
influx bucket create --name sensor_1m_warm --retention 90d  --shard-group-duration 7d  --org "$INFLUX_ORG"

# Cold: 1-year retention, 7-day shards.
influx bucket create --name sensor_1h_cold --retention 52w  --shard-group-duration 7d  --org "$INFLUX_ORG"

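The critical parameter here is --shard-group-duration, and it is worth setting explicitly. When it is omitted, InfluxDB 2.x derives a default from the retention period (1 hour for retention under 2 days, 1 day up to 6 months, and 7 days beyond that or for infinite retention), so the 90-day warm bucket above would get 1-day shards rather than the 7-day shards chosen here. Tightening retention later does not by itself resize shards either: a bucket created with infinite retention, like telemetry in the failure scenario, starts on 7-day shards, which are far too coarse for a 7-day hot window.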
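Because the tier is encoded in the bucket name, discovery code does not need a hard-coded list. The sketch below parses names that follow the <domain>_<resolution>_<tier> convention from the prerequisites and groups them by domain in hot→warm→cold order. The regular expression, the `Tier` type, and the `parse_tier` / `tiers_by_domain` helpers are illustrative assumptions; in practice the input names would come from `influx bucket list` or the client's buckets API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# <domain>_<resolution>_<tier>, e.g. sensor_1m_warm or air_quality_1s_hot (hypothetical).
NAME_RE = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:_[a-z0-9]+)*?)_(?P<value>\d+)(?P<unit>[smhd])_(?P<tier>hot|warm|cold)$"
)
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3_600, "d": 86_400}
TIER_ORDER = {"hot": 0, "warm": 1, "cold": 2}


@dataclass(frozen=True)
class Tier:
    bucket: str
    domain: str
    resolution_s: int
    tier: str


def parse_tier(bucket: str) -> Optional[Tier]:
    """Return the parsed tier, or None for buckets outside the convention."""
    m = NAME_RE.match(bucket)
    if m is None:
        return None
    seconds = int(m["value"]) * UNIT_SECONDS[m["unit"]]
    return Tier(bucket, m["domain"], seconds, m["tier"])


def tiers_by_domain(buckets: list[str]) -> dict[str, list[Tier]]:
    """Group convention-following buckets by domain, ordered hot -> warm -> cold."""
    grouped: dict[str, list[Tier]] = {}
    for name in buckets:
        tier = parse_tier(name)
        if tier is not None:
            grouped.setdefault(tier.domain, []).append(tier)
    for tiers in grouped.values():
        tiers.sort(key=lambda t: TIER_ORDER[t.tier])
    return grouped


names = ["sensor_1h_cold", "_monitoring", "sensor_1s_hot", "sensor_1m_warm", "telemetry"]
for domain, tiers in tiers_by_domain(names).items():
    print(domain, [(t.bucket, t.resolution_s) for t in tiers])
```

System buckets and legacy names such as telemetry simply fall outside the pattern and are ignored, so the same code can run against a mixed org.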

2. Provision the same tiers as code

For reproducible environments, create buckets through the client so definitions live in version control alongside the tasks that feed them. This mirrors the Python client orchestration patterns used elsewhere in the platform.

python

import os
from influxdb_client import InfluxDBClient, BucketRetentionRules

ORG = os.environ["INFLUX_ORG"]

# (name, retention_s, shard_s): mirrors the CLI commands above (52w = 364 days).
TIERS = [
    ("sensor_1s_hot",  7 * 86400,      86400),
    ("sensor_1m_warm", 90 * 86400,     7 * 86400),
    ("sensor_1h_cold", 52 * 7 * 86400, 7 * 86400),
]

with InfluxDBClient(
    url=os.environ["INFLUX_URL"],
    token=os.environ["INFLUX_TOKEN"],
    org=ORG,
) as client:
    buckets_api = client.buckets_api()
    for name, retention_s, shard_s in TIERS:
        # Skip tiers that already exist so the script is safe to re-run
        # (it does not reconcile settings on an existing bucket).
        if buckets_api.find_bucket_by_name(name) is not None:
            print(f"exists, skipping {name}")
            continue
        rule = BucketRetentionRules(
            type="expire",
            every_seconds=retention_s,
            shard_group_duration_seconds=shard_s,   # tier-specific shard sizing
        )
        buckets_api.create_bucket(
            bucket_name=name,
            retention_rules=rule,
            org=ORG,
        )
        print(f"provisioned {name}: retention={retention_s}s shard={shard_s}s")

3. Author the hot-to-warm downsampling task

The transition between tiers is a scheduled task that reads the hot bucket, aggregates to the warm resolution, and writes into the warm bucket. Anchor the read to the last successful run so the task resumes cleanly after an outage instead of silently skipping a window, and set an offset so late-arriving edge packets have landed before the window is read — the offset and window-sizing mechanics are the subject of cron & interval scheduling logic.

flux

import "influxdata/influxdb/tasks"

option task = {
    name: "sensor_hot_to_warm",
    every: 5m,
    offset: 1m,          // let late edge packets settle before reading
    concurrency: 1,      // never overlap runs -> no duplicate aggregates
}

from(bucket: "sensor_1s_hot")
    |> range(start: tasks.lastSuccess(orTime: -15m))   // resume from last good run
    |> filter(fn: (r) => r._measurement == "sensor_readings")
    |> filter(fn: (r) => r._field == "temperature_c" or r._field == "voltage_v")
    |> aggregateWindow(every: 1m, fn: mean, createEmpty: false)
    |> to(bucket: "sensor_1m_warm", org: "iot-platform")

Two parameters carry disproportionate weight. createEmpty: false stops aggregateWindow from emitting a null-valued row for every minute in which a sensor was silent, so the task only produces output for minutes that actually held data. concurrency: 1 guarantees a slow run never overlaps its successor, the single most common cause of double-counted rollup points. The discipline of writing these scripts so they stay correct under retries is developed in Flux scripting for task automation.
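To see what createEmpty: false and fixed window boundaries mean for the task's output, here is a rough pure-Python model of range() followed by aggregateWindow(every: 1m, fn: mean) over a single series. It assumes the Flux defaults as described in the comments (epoch-aligned windows, each row stamped with its window stop) and window-aligned range bounds; it is a model for reasoning about the task, not a replacement for it, and `aggregate_window_mean` is a hypothetical name.

```python
from statistics import mean


def aggregate_window_mean(points, start_s, stop_s, every_s, create_empty):
    """Rough model of range(start, stop) |> aggregateWindow(every, fn: mean, createEmpty).

    points: (unix_seconds, value) pairs for a single series.
    Assumes epoch-aligned windows, start_s/stop_s on window boundaries, and each
    output row stamped with its window stop (aggregateWindow's default timeSrc).
    Returns (window_stop, mean or None) pairs in time order.
    """
    by_window = {}
    for ts, value in points:
        if start_s <= ts < stop_s:                     # range() includes start, excludes stop
            by_window.setdefault(ts - ts % every_s, []).append(value)
    rows = []
    for window_start in range(start_s, stop_s, every_s):
        values = by_window.get(window_start)
        if values:
            rows.append((window_start + every_s, mean(values)))
        elif create_empty:
            rows.append((window_start + every_s, None))  # null row for a silent window
    return rows


# One sensor over five minutes, silent during minutes 3 and 5.
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (200, 24.0), (230, 26.0)]
print(aggregate_window_mean(readings, 0, 300, 60, create_empty=False))
print(aggregate_window_mean(readings, 0, 300, 60, create_empty=True))
```

The window-stop timestamps depend only on the window, not on when the run executed, which is what lets a retried run land on exactly the same timestamps as the original.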

4. Author the warm-to-cold rollup task

The cold tier compounds the reduction by aggregating the warm aggregates on a slower cadence. Run it calendar-aligned with cron so the daily rollup lands at a predictable time, and give it a generous offset so the warm tier has finished its own writes first.

flux

import "influxdata/influxdb/tasks"

option task = {
    name: "sensor_warm_to_cold",
    cron: "0 2 * * *",   // 02:00 UTC daily
    offset: 15m,
}

from(bucket: "sensor_1m_warm")
    |> range(start: tasks.lastSuccess(orTime: -25h))
    |> filter(fn: (r) => r._measurement == "sensor_readings")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> to(bucket: "sensor_1h_cold", org: "iot-platform")

Choosing which aggregate functions to carry into each tier — mean for trends, max/min for envelope preservation, percentiles for SLO reporting — is a pipeline design question covered under downsampling & aggregation pipeline design. The tiering here provides the destinations; that work chooses what to compute for each one.
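A short worked example shows why that function choice matters at each rollup. Using one hour of hypothetical 1-minute aggregates for a single sensor with one brief overheat, the sketch compares what the cold tier would see if only the mean were carried forward against also carrying a per-minute max. The values are invented for illustration.

```python
from statistics import mean

# Hypothetical warm-tier data for one sensor-hour: 60 one-minute aggregates.
# Minute 42 holds a brief overheat: its raw samples peaked at 71.0 C but
# averaged only 35.0 C over the minute.
minute_means = [22.0] * 60
minute_maxes = [23.0] * 60
minute_means[42] = 35.0
minute_maxes[42] = 71.0

hourly_mean_of_means = mean(minute_means)  # trend value for the cold tier
hourly_max_of_means = max(minute_means)    # best peak estimate if only mean was carried
hourly_max_of_maxes = max(minute_maxes)    # envelope preserved by carrying max as well

print(f"mean of means: {hourly_mean_of_means:.2f} C")
print(f"max of means:  {hourly_max_of_means:.1f} C")
print(f"max of maxes:  {hourly_max_of_maxes:.1f} C")
```

The hourly mean barely moves and the max of the minute means understates the peak by half, while the max of the minute maxes keeps the 71 C excursion visible a year later.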

Configuration reference

Setting	Accepted values	Default	Effect
`retention` / `every_seconds`	duration (`7d`, `90d`, `52w`) or `0` for infinite	`0` (infinite)	How long data lives before the engine drops its shard groups. Set per tier; never leave a hot bucket infinite.
`shard-group-duration`	duration literal	derived from retention	Time span covered by each shard group. Governs expiry granularity and query pruning. Aim for ~1/10 of the retention window.
`name`	string	— (required)	Tier identifier. Encode domain, resolution, and temperature (e.g. `sensor_1m_warm`) for programmatic discovery.
`every` (task)	duration literal	—	Cadence for continuous hot→warm transitions. Mutually exclusive with `cron`.
`cron` (task)	5- or 6-field expression	—	Calendar-aligned cadence for daily warm→cold rollups. Evaluated in UTC.
`offset` (task)	duration literal	`0s`	Delays execution past the fire time so late data and upstream writes settle; does not shift the query window.
`concurrency` (task)	integer	`1`	Max simultaneous runs. Keep at `1` on transition tasks so windows never overlap and duplicate.

Duration literals accept ns, us, ms, s, m, h, d, w, mo, y. A hot bucket typically pairs a 7d–30d retention with a 1d shard group; warm and cold tiers pair multi-month or multi-year retention with 7d or larger shards.
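The difference between the fire time, the offset, and the query window is easiest to see with concrete timestamps. This sketch models the daily warm-to-cold task under the table's semantics, assuming that now() inside a run (and therefore the range stop) stays at the scheduled fire time while execution starts one offset later. It hard-codes the single 02:00 UTC schedule rather than parsing cron, and `next_fire` / `run_plan` are hypothetical helpers.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional

FIRE_AT = time(2, 0, tzinfo=timezone.utc)   # cron "0 2 * * *", evaluated in UTC
OFFSET = timedelta(minutes=15)              # offset: 15m
FALLBACK = timedelta(hours=25)              # tasks.lastSuccess(orTime: -25h)


def next_fire(after: datetime) -> datetime:
    """Next 02:00 UTC fire time strictly after `after` (an aware datetime)."""
    candidate = datetime.combine(after.date(), FIRE_AT)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


def run_plan(fire: datetime, last_success: Optional[datetime]) -> dict:
    """When the run executes and which range it reads (the offset does not move the range)."""
    start = last_success if last_success is not None else fire - FALLBACK
    return {"executes_at": fire + OFFSET, "range_start": start, "range_stop": fire}


fire = next_fire(datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc))
print(run_plan(fire, last_success=fire - timedelta(days=1)))
```

A run that fires at 02:00 therefore executes at 02:15 but still reads up to 02:00, giving the warm tier's own writes for the last minutes before the boundary a quarter hour to land.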

Common failure modes and fixes

1. Shard group duration mismatched to retention. Symptom: storage barely drops when data expires, or the shard count per bucket climbs into the hundreds with sluggish compaction. Root cause: shards far larger than needed reclaim space in coarse blocks; shards far smaller multiply file handles. Fix: size the shard group to roughly one tenth of the retention window and recreate the bucket if the mismatch is severe (shard duration cannot be shrunk in place for existing shards).

bash

# A bucket created with infinite retention starts on 7d shards; after tightening
# retention to 7d, size new shard groups at 1d (existing shards keep 7d).
influx bucket update --id "$HOT_BUCKET_ID" --shard-group-duration 1d
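A back-of-the-envelope count makes both symptoms visible before a bucket is created. This sketch assumes a full bucket holds about retention ÷ shard duration groups plus the one currently being written, and treats each expiry as freeing one group's share of that total; the figures are estimates, not values read from InfluxDB, and `approx_shard_groups` is a hypothetical helper.

```python
import math
from datetime import timedelta


def approx_shard_groups(retention: timedelta, shard: timedelta) -> int:
    """Shard groups in a full bucket: retention / shard, plus the group being written."""
    return math.ceil(retention / shard) + 1


CASES = [
    ("7d retention, 7d shards", timedelta(days=7), timedelta(days=7)),
    ("7d retention, 1d shards", timedelta(days=7), timedelta(days=1)),
    ("52w retention, 1h shards", timedelta(weeks=52), timedelta(hours=1)),
    ("52w retention, 7d shards", timedelta(weeks=52), timedelta(days=7)),
]

for label, retention, shard in CASES:
    share = shard / (retention + shard)   # rough fraction of stored data freed per expiry
    print(f"{label:<26} ~{approx_shard_groups(retention, shard):>5} groups, "
          f"each expiry frees ~{share:.0%}")
```

The oversized case frees storage in a few large jumps, while the undersized cold case ends up with thousands of groups to open, index, and compact.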

2. Over-partitioning into too many buckets. Symptom: dozens of near-empty buckets, cross-tier analytical queries that need brittle union() calls, and rising operational toil. Root cause: treating buckets as an organizational filing system rather than as tier/temperature boundaries. Fix: partition by access pattern and retention, not by tenant-of-the-week or by month; keep the tier count small (hot/warm/cold) and use tags for finer slicing.

3. Hot tier retention too long, cardinality blows up. Symptom: memory pressure, index spilling to disk, and hot-tier query latency degrading over weeks. Root cause: high-resolution, high-cardinality data lingering in the hot tier past its useful window. Fix: tighten hot retention to the true dashboard window and let the downsampling task carry aggregates onward. Keep high-cardinality identifiers as fields, not tags, before they reach the warm tier.
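A rough series count shows why the fix is to bound what reaches the warm tier rather than only tuning the hot one. The sketch assumes series cardinality is the number of distinct tag combinations times the number of fields in one measurement, and introduces a hypothetical `site` tag with 200 values where each device belongs to exactly one site; the numbers are illustrative, not measured.

```python
DEVICES = 40_000
SITES = 200            # hypothetical: each device reports from exactly one site
HOT_FIELDS = 3         # temperature, humidity, voltage
WARM_FIELDS = 2        # the hot-to-warm task keeps temperature_c and voltage_v


def series_count(tag_combinations: int, fields: int) -> int:
    """Series = distinct tag sets x fields within one measurement."""
    return tag_combinations * fields


# Hot tier: device_id is a tag; site adds no new combinations because each
# device maps to a single site.
hot = series_count(DEVICES, HOT_FIELDS)
# Warm tier if device_id survives the rollup, versus grouping by site only.
warm_per_device = series_count(DEVICES, WARM_FIELDS)
warm_per_site = series_count(SITES, WARM_FIELDS)

print(f"hot: {hot:,}  warm (per device): {warm_per_device:,}  warm (per site): {warm_per_site:,}")
```

Keeping device_id in the rollup's group key carries tens of thousands of series into every downstream tier, while grouping by site collapses them to a few hundred; the trade-off is that the warm tier then stores site-level aggregates rather than per-device values.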

4. Gap or double-write at the tier transition. Symptom: the warm bucket is missing the most recent window, or aggregates look inflated after retuning. Root cause: a fixed range(start: -Nm) that either underruns the cadence or overlaps a previous run without deterministic boundaries. Fix: anchor reads with tasks.lastSuccess() and snap aggregates to fixed boundaries (aggregateWindow(every: 1m, ...)) so reprocessed windows overwrite the identical series/timestamp key rather than appending.
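The gap half of this failure is easy to reproduce on paper. The sketch below models a 5-minute task that misses three runs during an outage and compares a fixed range(start: -5m) with anchoring on tasks.lastSuccess(). It assumes lastSuccess() returns the scheduled time of the last successful run and that failed runs write nothing; `covered_ranges` and `gaps` are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Optional


def covered_ranges(run_times, succeeded, fixed_lookback: Optional[timedelta] = None,
                   or_time: timedelta = timedelta(minutes=15)):
    """Read ranges (start, stop) of the runs that succeeded.

    fixed_lookback set   -> models range(start: -fixed_lookback)
    fixed_lookback None  -> models range(start: tasks.lastSuccess(orTime: -or_time))
    """
    last_success = None
    ranges = []
    for now, ok in zip(run_times, succeeded):
        if fixed_lookback is not None:
            start = now - fixed_lookback
        else:
            start = last_success if last_success is not None else now - or_time
        if ok:
            ranges.append((start, now))
            last_success = now
    return ranges


def gaps(ranges):
    """Uncovered intervals between consecutive successful read ranges."""
    return [(a_stop, b_start) for (_, a_stop), (b_start, _) in zip(ranges, ranges[1:])
            if b_start > a_stop]


t0 = datetime(2024, 1, 1, 10, 0)
runs = [t0 + timedelta(minutes=5 * i) for i in range(6)]
ok = [True, False, False, False, True, True]          # 10:05-10:15 runs fail
print("fixed -5m:", gaps(covered_ranges(runs, ok, fixed_lookback=timedelta(minutes=5))))
print("lastSuccess:", gaps(covered_ranges(runs, ok)))
```

The fixed lookback silently loses 10:00 to 10:15, while the anchored version widens its first post-outage read to cover the whole missed span.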

5. Transition token over-privileged. Symptom: a single compromised task token can read and write every bucket in the org. Root cause: using an all-access token for cross-bucket transitions. Fix: scope transition tokens to read on the source tier and write on the destination tier only — the least-privilege pattern developed in data ingestion security frameworks.
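One way to apply this is to mint a dedicated authorization per transition. The sketch below only builds the JSON body for the InfluxDB v2 `POST /api/v2/authorizations` endpoint, granting read on the source bucket and write on the destination bucket and nothing else. The org and bucket IDs are hypothetical placeholders, `transition_authorization` is an illustrative helper, and sending the request (or using the client library's authorizations API instead) is left out.

```python
import json


def transition_authorization(org_id: str, source_bucket_id: str,
                             dest_bucket_id: str, name: str) -> dict:
    """Request body granting read on the source tier and write on the destination only."""
    def bucket_permission(action: str, bucket_id: str) -> dict:
        return {
            "action": action,
            "resource": {"type": "buckets", "id": bucket_id, "orgID": org_id},
        }

    return {
        "orgID": org_id,
        "description": f"transition token: {name}",
        "permissions": [
            bucket_permission("read", source_bucket_id),
            bucket_permission("write", dest_bucket_id),
        ],
    }


# Hypothetical IDs; look up the real ones with `influx bucket list`.
body = transition_authorization("0123456789abcdef", "1111111111111111",
                                "2222222222222222", "sensor_hot_to_warm")
print(json.dumps(body, indent=2))
```

Generating one such body per task keeps each token's blast radius to a single source/destination pair.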

Verification and testing

Confirm tier boundaries are working by checking three things: that each tier’s cardinality stays inside its budget, that the transition tasks are actually landing data, and that a stalled transition raises an alert instead of failing silently.

Check the hot tier’s series cardinality against its budget:

flux

import "influxdata/influxdb"

influxdb.cardinality(bucket: "sensor_1s_hot", start: -7d)

Confirm the warm tier received aggregated points from the most recent transitions:

flux

from(bucket: "sensor_1m_warm")
    |> range(start: -30m)
    |> filter(fn: (r) => r._measurement == "sensor_readings")
    |> count()

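Use a deadman query as the basis of a check, so a stalled hot→warm transition raises an alert (and pages an operator through a notification rule) when the warm tier stops receiving data. The query's range must reach further back than the deadman threshold; with a range equal to the threshold, a fully stalled tier returns no rows and nothing is flagged:

flux

import "influxdata/influxdb/monitor"
import "experimental"

from(bucket: "sensor_1m_warm")
    |> range(start: -2h)   // must exceed the 30m threshold so silent series still return a last point
    |> filter(fn: (r) => r._measurement == "sensor_readings")
    |> monitor.deadman(t: experimental.subDuration(from: now(), d: 30m))   // dead = last point older than 30m
    |> filter(fn: (r) => r.dead == true)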

From the CLI, verify the tier buckets exist with the retention and shard sizing you intended before trusting the pipeline:

bash

influx bucket list --org "$INFLUX_ORG"

A lightweight Python monitor can assert the same invariant in CI or a scheduled check, raising if a transition has stalled:

python

import os
from influxdb_client import InfluxDBClient

QUERY = '''
from(bucket: "sensor_1m_warm")
    |> range(start: -30m)
    |> filter(fn: (r) => r._measurement == "sensor_readings")
    |> count()
'''

with InfluxDBClient(url=os.environ["INFLUX_URL"], token=os.environ["INFLUX_TOKEN"], org=os.environ["INFLUX_ORG"]) as client:
    tables = client.query_api().query(QUERY, org=os.environ["INFLUX_ORG"])

# count() emits one row per series; sum them for the tier-wide total.
total = sum(rec.get_value() for t in tables for rec in t.records)
if total == 0:
    raise RuntimeError("hot->warm transition stalled: no points in warm tier")
print(f"warm tier healthy: {total} points in last 30m")

Integration points

Bucket tiers are the storage substrate the rest of the lifecycle sits on. The retention windows chosen per tier are only half of the retention story — bucket expiry drops shard groups, but selective purges and legacy-policy migration belong to retention policy design, where expiry windows must be at least as long as the slowest downsampling task’s coverage. The transition tasks that move data between tiers are triggered by the cadence rules in cron and interval scheduling, and their transformation logic is written with the retry-safe Flux discipline linked above. When a downstream tier degrades — a warm bucket under disk pressure, or a cold cluster mid-maintenance — the ingestion path must not drop telemetry; queuing and secondary-cluster routing are the province of fallback routing & high availability. And every tier boundary is ultimately in service of the rollups defined in the downsampling work: the tiers are where those aggregates live, and the schedules are what keep them fresh.
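The constraint that expiry must outlast downsampling coverage can be checked mechanically when tiers and tasks are defined together. This is one way to express it, assuming a task anchored on lastSuccess() may need to re-read its source back across a tolerated outage plus one cadence and its offset; `max_outage` is a value you choose, and the tier and task tables simply mirror the examples on this page.

```python
from datetime import timedelta

RETENTION = {
    "sensor_1s_hot": timedelta(days=7),
    "sensor_1m_warm": timedelta(days=90),
    "sensor_1h_cold": timedelta(weeks=52),
}

# (task, source bucket, cadence, offset): the two transition tasks on this page.
TASKS = [
    ("sensor_hot_to_warm", "sensor_1s_hot", timedelta(minutes=5), timedelta(minutes=1)),
    ("sensor_warm_to_cold", "sensor_1m_warm", timedelta(days=1), timedelta(minutes=15)),
]


def retention_violations(max_outage: timedelta) -> list[str]:
    """Tasks whose source data could expire before a delayed run re-reads it."""
    problems = []
    for task, source, cadence, offset in TASKS:
        # After an outage, a lastSuccess()-anchored run re-reads roughly outage + cadence + offset.
        needed = max_outage + cadence + offset
        if RETENTION[source] < needed:
            problems.append(f"{task}: {source} keeps {RETENTION[source]}, needs >= {needed}")
    return problems


print(retention_violations(max_outage=timedelta(days=3)) or "all sources outlive their readers")
```

With a 7-day hot bucket, an outage of a full week is already enough for the hot-to-warm task's source data to expire before it can be reprocessed.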

FAQ

How many tiers should an IoT platform actually run?

Three is the common sweet spot: hot for real-time reads, warm for trend analysis, cold for compliance. Add a fourth archive tier only when regulatory retention genuinely exceeds what a cold bucket should hold. More tiers than that usually signals partitioning by organizational unit rather than by access pattern, which multiplies operational cost without improving query performance.

Can I change a bucket’s shard group duration after it has data?

You can update the setting, but it only applies to shard groups created after the change; existing shards keep their original duration. When the mismatch is severe — a 7-day hot bucket stuck on 7-day shards — the clean fix is to create a correctly sized bucket and let the transition task backfill, then retire the old one.

Should high-cardinality tags like `device_id` live in the hot tier?

Yes, in the hot tier, where per-device granularity drives alerting; but they should be dropped from the grouping before data reaches warm and cold. Downsampling tasks that group() on a stable, low-cardinality key set keep the aggregate tiers’ series counts bounded. Anything you only read back rather than filter or group on belongs in a field, not a tag.

Does moving data to a warm bucket delete it from the hot bucket?

No. The transition task writes aggregates into the warm bucket; the hot data is removed only when its own retention window expires. That separation is deliberate — it means a failed transition can be re-run over the same window while the source data still exists, which is what makes the pipeline idempotent and recoverable.

How do I size the hot retention window?

Set it to the longest range your real-time dashboards and alerts actually query, plus a small margin for late data. If nothing reads raw 1-second data older than 7 days, the hot window should be about 7 days; everything beyond that lives more cheaply as aggregates in the warm and cold tiers.
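Expressed as arithmetic: take the longest raw-data range any real-time consumer queries, add a late-data margin, and round up. The sketch below assumes rounding up to whole days; the consumer names, their ranges, and the 6-hour margin are hypothetical.

```python
import math
from datetime import timedelta

# Hypothetical raw-resolution query ranges used by real-time consumers.
CONSUMER_RANGES = {
    "fleet overview dashboard": timedelta(hours=24),
    "device drill-down dashboard": timedelta(days=7),
    "voltage anomaly alert": timedelta(minutes=15),
}
LATE_DATA_MARGIN = timedelta(hours=6)    # assumed allowance for delayed edge uploads


def hot_retention(ranges: dict, margin: timedelta) -> timedelta:
    """Longest raw-data range plus margin, rounded up to whole days."""
    needed = max(ranges.values()) + margin
    return timedelta(days=math.ceil(needed / timedelta(days=1)))


print(f"hot retention: {hot_retention(CONSUMER_RANGES, LATE_DATA_MARGIN).days}d")
```

Here the 7-day drill-down dashboard dominates, and the margin pushes the result to 8 days; every consumer with a shorter range is covered automatically.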

Related

Best practices for bucket partitioning in IoT telemetry — cardinality budgets and partition keys that keep each tier fast.
Retention policy design — set bucket-level expiry so shard groups drop cleanly and storage stays bounded.
Fallback routing & high availability — keep telemetry flowing when a tier or cluster degrades.
Data ingestion security frameworks — scope transition and write tokens to least privilege.
Downsampling & aggregation pipeline design — choose the aggregate functions that populate each tier.

