This week, and likely into next, my primary focus (together with Andre and our AI Agents) is a substantial refactor of contrib/omhttp. We’re tracking the work in #5957 and will link the PR from there once it’s open.

Prerequisite: core fix that unlocked this work
Before touching omhttp, we fixed a correctness issue in rsyslog core around transaction suspension/resume. That repair makes core-native retry reliable for HTTP actions and removes the historical rationale for complex module-local retry paths. With the core semantics in place, the refactor below becomes both feasible and worthwhile.
Why refactor omhttp: focused on container realities
Running rsyslog in Docker/Kubernetes stresses HTTP outputs: ephemeral restarts, transient ingress/DNS issues, bursty backpressure, rolling updates, tight resource budgets. In this environment we need:
- Predictable retry behavior (no stalls when an auxiliary queue fills).
- Clear HTTP status semantics (no accidental retries on 4xx).
- Accurate per-record outcomes for partial batch successes.
- Graceful recovery from brief outages without duplicate storms.
omhttp is already a solid contributed module with a few known (mostly historic) issues. We appreciate the original contribution and aim to polish it for modern, container-heavy deployments.
What will change (concise, technical)
1) Native transactions (commitTransaction())
- Migrate from the older begin/do/end path to commitTransaction().
- Use core batch visibility; map per-record outcomes precisely (partial success ⇒ selective retry).
- Return suspension only for transient failures so the core handles backoff uniformly.
Benefit: cleaner correctness under failure, fewer duplicates, simpler tuning at scale.
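To illustrate the idea of selective retry, here is a minimal Python sketch. It is not rsyslog's C interface: the function name `plan_commit` and the outcome labels `"ok"`, `"permanent"`, and `"transient"` are hypothetical and exist only for this example. It shows how per-record outcomes could decide which records are final and whether the action should report suspension.

```python
def plan_commit(batch, outcomes):
    """Split a batch by per-record outcome (illustrative, hypothetical names).

    batch    -- list of records submitted in one transaction
    outcomes -- parallel list of "ok", "permanent", or "transient"
    """
    if len(batch) != len(outcomes):
        raise ValueError("need exactly one outcome per record")

    done, retry = [], []
    for record, outcome in zip(batch, outcomes):
        if outcome == "transient":
            # Only transient failures are handed back for a retry.
            retry.append(record)
        else:
            # "ok" and "permanent" are both final: retrying cannot help.
            done.append(record)

    # Suspend only if something is actually worth retrying.
    return {"done": done, "retry": retry, "suspend": bool(retry)}


plan = plan_commit(["a", "b", "c"], ["ok", "transient", "permanent"])
print(plan)
```

In this sketch only record `b` would be retried, so records that were already accepted are not sent again.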
2) Retry defaults: core-native; retryRuleset remains optional
- Default: suspend on retriable errors and let core retry.
- Keep retryRuleset as an optional path for advanced/exotic flows (e.g., special enrichment/routing on failure). We’ll document queue-pressure risks and when it makes sense.
Benefit: safe defaults for most users; power path for specialists.
3) HTTP status policy (explicit and predictable)
- 1xx/2xx ⇒ success
- 3xx ⇒ failure (non-retriable) for now (no implicit follow-redirect)
- 4xx ⇒ permanent failure (non-retriable) by default
- 5xx / transport failure (0) ⇒ retriable (suspend so core retries)
Overrides:
- httpretrycodes adds retriable codes (doesn’t convert failures to success).
- httpignorablecodes can explicitly mark certain non-2xx as processed (applied after the base policy).
Benefit: matches real backend behavior; avoids retry storms on 4xx; robust on 5xx.
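The policy above can be written down as a small decision function. The following Python sketch is an illustration only, not the module's C code. The names `Outcome` and `classify` are hypothetical, and two details are assumptions on our side: the retry list only upgrades failures (it never touches a success), and the ignorable list wins when a code appears in both lists.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    PERMANENT_FAILURE = "permanent failure"
    RETRIABLE = "retriable"


def classify(status, retry_codes=frozenset(), ignorable_codes=frozenset()):
    """Map an HTTP status (0 = transport failure) to an outcome."""
    # Base policy: 5xx and transport failures are retriable,
    # 1xx/2xx succeed, everything else (3xx, 4xx) fails permanently.
    if status == 0 or 500 <= status <= 599:
        outcome = Outcome.RETRIABLE
    elif 100 <= status <= 299:
        outcome = Outcome.SUCCESS
    else:
        outcome = Outcome.PERMANENT_FAILURE

    # Retry override: adds retriable codes, never produces a success.
    if outcome is Outcome.PERMANENT_FAILURE and status in retry_codes:
        outcome = Outcome.RETRIABLE

    # Ignorable override: applied after the base policy; marks a
    # non-success code as processed (assumed to win over the retry list).
    if outcome is not Outcome.SUCCESS and status in ignorable_codes:
        outcome = Outcome.SUCCESS

    return outcome


for code in (200, 301, 404, 503, 0):
    print(code, classify(code).value)
```

For example, `classify(429, retry_codes={429})` would be retriable in this sketch, while a plain `classify(429)` stays a permanent failure.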
4) Batching as a thin serializer
- Keep newline/jsonarray/kafkarest/lokirest as formatters over core batches (not a parallel transaction system).
- Unify gzip and header lifecycle; ensure partial acceptance maps to per-record results.
Benefit: correctness + performance without ambiguity about “who owns the batch”.
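As a rough picture of "formatter over a core batch", consider the Python sketch below. It is illustrative only and does not mirror the module's C implementation. The helper names (`serialize_newline`, `serialize_jsonarray`, `build_request`) are hypothetical, only two of the four formats are shown, and the records are assumed to be JSON strings already rendered by a template.

```python
import gzip
import json


def serialize_newline(records):
    # One rendered record per line.
    return "\n".join(records)


def serialize_jsonarray(records):
    # Parse each rendered record and emit a single JSON array.
    return json.dumps([json.loads(r) for r in records])


SERIALIZERS = {
    "newline": serialize_newline,
    "jsonarray": serialize_jsonarray,
}


def build_request(records, batch_format="newline", compress=False):
    """Turn a batch into (headers, body); it owns no retry state itself."""
    body = SERIALIZERS[batch_format](records).encode("utf-8")
    headers = {}
    if compress:
        # Compression and its header are set in one place, together.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


records = ['{"msg": "a"}', '{"msg": "b"}']
headers, body = build_request(records, "jsonarray", compress=True)
print(headers, len(body))
```

The point of the design is that the serializer only shapes bytes; which records are retried remains a decision of the transaction layer.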
5) Loki: from partial to first-class
- batch.format=lokirest exists; we’ll verify templates (timestamps/labels), recommended restpath, headers, compression, and container-friendly defaults (label cardinality, batch sizing).
- Ensure partial failures are handled per record and document copy-paste examples.
Benefit: straightforward, reliable Loki pipelines in containers.
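For orientation, Loki's push API accepts a JSON body with a `streams` list, where each stream carries a label set and a list of `[timestamp in nanoseconds as a string, log line]` pairs. The Python sketch below builds such a body. It is an illustration of the target format, not a byte-for-byte description of what lokirest emits; the function name `build_loki_payload` and the sample labels are hypothetical.

```python
import json


def build_loki_payload(entries):
    """Group (labels, ts_ns, line) tuples into Loki push-API streams."""
    streams = {}
    for labels, ts_ns, line in entries:
        # Every distinct label set becomes its own stream, which is
        # why label cardinality matters for batch size and Loki load.
        key = tuple(sorted(labels.items()))
        streams.setdefault(key, []).append([str(ts_ns), line])

    return json.dumps({
        "streams": [
            {"stream": dict(key), "values": values}
            for key, values in streams.items()
        ]
    })


entries = [
    ({"app": "web", "pod": "web-1"}, 1700000000000000000, "GET /"),
    ({"app": "web", "pod": "web-1"}, 1700000000000000001, "GET /health"),
    ({"app": "db", "pod": "db-0"}, 1700000000000000002, "checkpoint"),
]
print(build_loki_payload(entries))
```

Three records with two distinct label sets yield two streams here; adding a high-cardinality label such as a request ID would produce one stream per record.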
What the PR will contain (overview)
We’ll link the PR from #5957. High-level contents:
- Transaction migration to commitTransaction() with precise per-record outcomes.
- Retry changes: core-native by default; retryRuleset optional and documented.
- HTTP semantics enforcement and coherent use of httpretrycodes/httpignorablecodes.
- Batching rework: serializers over core batches; unified gzip/headers.
- Loki specifics: validated payload/labels; container-ready examples.
- Quality & safety: review the code for defects and fix them; unsafe-code audit; improved doxygen.
- Docs & migration: clear migration notes; parameter tables; decision chart for status handling.
- Tests: status matrix (1xx..5xx + transport failure), partial success, suspend/resume, Loki conformance, perf counters, and queue-pressure scenarios.
Team and “AI First”
This effort is led by me, Andre, and our AI Agents—which we consider part of the team. In our responsible “AI First” approach:
- Agents propose code diffs, run PR checks (lint/style/invariants), and draft doc updates.
- They act as independent reviewers; humans remain in the loop for design and merges.
Timeline
- Focus window: this week, possibly next week.
- Follow progress and discussion in #5957; the PR will be linked there when ready.
If you run rsyslog in containers—especially with Loki—and have edge cases we should test, please comment on the issue. Your input helps us set the right defaults.