Schema transformation and the road to YAML in mmjsontransform

The recent GitHub issue #6251 marks another step in our ongoing modernization work. The topic sounds narrow — enhancements for mmjsontransform — but it connects to a much larger theme: how rsyslog handles structured data and how we can make that power easier to use.

Where this comes from

This work grew out of a migration project that’s still in progress. rsyslog sits at the center of the setup, feeding both an aging, partly unsupported SIEM and a new, fully supported platform. All logs continue to come in from the edge, unchanged. The transformation happens at the rsyslog hub: schema normalization, enrichment, timestamp cleanup, and redaction — the works. One pipeline, two destinations.

That’s a good example of what rsyslog does best: being rock-solid, rocket-fast, and flexible enough to bridge systems without vendor lock-in. In many ways, it’s also a practical instance of what I call ROSI (the Rsyslog Open-Stack for Information), where rsyslog is the deterministic backbone holding different observability layers together.

The real point

rsyslog has been capable of schema transformation for years. Templates, variables, lookup tables, and modules like mmnormalize or mmjsonparse already let you reshape data in almost any way. But it’s not simple to use, and too few people even know it’s possible. That’s not a code problem — it’s a usability problem. Making powerful things straightforward is what modernization is really about.

Enter mmjsontransform

mmjsontransform was designed to make schema operations first-class citizens. Flattening and unflattening dotted keys, normalizing nested structures, cleaning up inconsistent field names — all of this at native speed inside rsyslog. It grew from real-world needs, not theory.
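
To make "flattening and unflattening dotted keys" concrete, here is a minimal Python sketch of the idea. It is not mmjsontransform's code or its configuration syntax; the function names and the sample event are assumptions chosen purely for illustration.

```python
from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining key paths with dots."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            # Scalars, lists, and empty dicts are kept as leaf values.
            flat[path] = value
    return flat


def unflatten(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted keys back into nested dicts.

    Assumes no key is both a leaf and a parent (e.g. "a" and "a.b").
    """
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical sample event, not taken from a real rsyslog message.
event = {"source": {"ip": "192.0.2.10", "port": 514}, "msg": "login ok"}
flat = flatten(event)
restored = unflatten(flat)
```

In this sketch `flat` carries the keys `source.ip`, `source.port`, and `msg`, and `restored` is equal to the original `event`. A real implementation also has to decide what happens when a dotted key collides with an existing nested object, which the sketch deliberately leaves out.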

The next logical step is to let it take an external, reloadable transformation policy. That’s what #6251 introduces: an optional YAML file describing what should happen to a message, such as rename, coerce, validate, redact, set defaults, and normalize timestamps. After the file changes, a HUP signal tells rsyslog to reload it. If the new version is broken, the last known good version stays active. No downtime, no surprises.
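
The "last known good" behavior is easy to state and worth seeing in miniature. The Python sketch below illustrates the pattern only and is not how rsyslog implements it: the `PolicyHolder` class is hypothetical, and it parses JSON from the standard library as a stand-in for YAML so that the example stays self-contained.

```python
import json
from typing import Any, Callable, Optional


class PolicyHolder:
    """Keeps the active policy and swaps it only when a reload succeeds."""

    def __init__(self, parse: Callable[[str], Any]) -> None:
        self._parse = parse
        self.active: Optional[dict[str, Any]] = None

    def reload(self, text: str) -> bool:
        """Try to replace the active policy; return False and keep the old one on error."""
        try:
            candidate = self._parse(text)
            if not isinstance(candidate, dict):
                raise ValueError("policy must be a mapping at the top level")
        except ValueError:
            # json.JSONDecodeError is a ValueError subclass, so syntax errors land here.
            return False
        self.active = candidate
        return True


# json.loads stands in for a YAML parser; the policy content is hypothetical.
holder = PolicyHolder(json.loads)
first_ok = holder.reload('{"rename": {"src": "source.ip"}}')
second_ok = holder.reload('{"rename": ')  # truncated, so parsing fails
```

After the second call, `second_ok` is `False` and `holder.active` still holds the policy from the first call. In a daemon, `reload` would typically be invoked from the HUP handling path with the file contents; the important design point is that parsing and validation finish before the active policy is touched.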

YAML — just a means to an end

We’ve used JSON for similar policy definitions before. YAML comes in mainly for readability and consistency with what administrators already see elsewhere. But YAML itself isn’t the real news here. The actual step forward is declarative schema transformation — being able to describe complex rewrites in a single place, without needing half a dozen directives. YAML is simply the most ergonomic way to express it right now.
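
As a rough illustration of what "declarative" means here, the Python sketch below applies a small policy to an event. The policy keys (`rename`, `defaults`, `redact`) and the field names are assumptions made up for this example; they are not the schema discussed in #6251.

```python
from typing import Any

# Hypothetical policy: all rewrites for a message live in one data structure.
POLICY: dict[str, Any] = {
    "rename": {"src": "source_ip", "usr": "user"},
    "defaults": {"severity": "info"},
    "redact": ["password"],
}


def apply_policy(event: dict[str, Any], policy: dict[str, Any]) -> dict[str, Any]:
    """Return a transformed copy of the event; the input is left untouched."""
    out = dict(event)
    # Rename fields that are present.
    for old, new in policy.get("rename", {}).items():
        if old in out:
            out[new] = out.pop(old)
    # Fill in defaults only where the field is missing.
    for key, value in policy.get("defaults", {}).items():
        out.setdefault(key, value)
    # Mask sensitive fields that are present.
    for key in policy.get("redact", []):
        if key in out:
            out[key] = "[REDACTED]"
    return out


event = {"src": "192.0.2.10", "usr": "alice", "password": "hunter2"}
result = apply_policy(event, POLICY)
```

Here `result` contains `source_ip`, `user`, a masked `password`, and the default `severity`. The transformation logic stays generic while the policy, which is plain data, carries everything specific to a deployment. That separation is what makes an external, reloadable file practical.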

Looking forward

This fits neatly into the broader rsyslog modernization: making structured-data pipelines easier to build and reason about, from small edge nodes to central aggregation points. Most of the functionality already exists. The task now is to make it coherent, discoverable, and predictable. That includes better defaults, clear documentation, and tools that do the hard parts quietly.

The project is ongoing and will no doubt bring more lessons — probably a few unexpected ones. But the direction is clear: rsyslog already knows how to handle structure; we’re just making that knowledge more accessible.

You can follow the discussion at GitHub issue #6251.