AI Code Generation in a 200k LOC C Codebase: What Actually Worked in rsyslog

If you want the CS summary: it is at the end.

I keep seeing the same take pop up: “AI is overhyped. Mostly money burning.” Sure. There is hype. There is also a whole lot of low-effort “vibe coding” that produces low-quality output at impressive speed.

[Illustration: a robotic hand carefully adjusting gears in a large rsyslog C codebase machine, symbolizing AI-assisted maintenance of mature software.]

But there is also something else: if you treat AI as a serious engineering tool and you are willing to do the unglamorous work, it can make a measurable difference to both productivity and quality.

So here is a concrete case study: agentic code work in rsyslog.

The uncomfortable part: rsyslog is not a demo repo

Rsyslog started life as a syslog daemon. Today it is also a building block in logging pipelines and data processing workflows. In practice, it is infrastructure software that people notice only when it fails. So we try hard to not fail.

The codebase is also not the kind of project AI agents are usually demoed on:

  • forked as rsyslog in 2004, with roots going back to the 1980s syslogd
  • primary language: C; build system: GNU Autotools
  • roughly 200k lines of C code (plus a much larger repo with docs, containers, CI, tests)
  • modular, microkernel-ish architecture
  • complex multi-threaded behavior
  • heavy use of C preprocessor macros for portability
  • mixed idioms due to long history and multiple refactors
  • a style that is not always what a model “expects” when it dreams of modern code

This matters because most public “AI agent” success stories are built on small projects with current idioms, modern formatting, and a low cognitive load. That is fine. It is also not my world.

Our goal anyway: use agentic AI to help maintain and evolve this codebase, and over time move further toward generation from specifications.

The timeline: from “nope” to “ok, this is real now”

We started testing AI code generation when early ChatGPT versions became public.

  • 2023: mostly unusable for serious C work in this repo
  • 2024: usable in narrow niches, but did not increase productivity overall; often cost extra time
  • 2025: a noticeable step change; asynchronous coding agents (in our workflow, OpenAI Codex) pushed things into “prime time”

The important part is not that the model got better (it did). The important part is that we learned what we had to change around it.

What actually improved results

Early on, an agent could generate code that was “technically plausible”, but integrating it was painful. The fixes were not magic. They were boring engineering work.

1) Give the agent a map of the repo

We added repository documentation specifically for agents. One example is an AGENTS.md file that states project rules and expectations:

  • how we structure changes
  • what “done” means
  • how to run tests
  • formatting rules
  • preferred patterns and the stuff we do not want to see again

This is not about “teaching the agent everything”. It is about removing ambiguity and making the agent less creative in the places where creativity is expensive.
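
To make this concrete, here is a sketch of the kind of content such a file carries. This is an illustrative outline, not a verbatim excerpt of rsyslog’s actual AGENTS.md:

    # AGENTS.md (illustrative sketch, not the real file)
    ## Build and test
    - Build with GNU Autotools: autoreconf -fvi && ./configure && make
    - A change counts as "done" only when `make check` passes.
    ## Formatting
    - Indent with spaces; the canonical format is enforced by a pinned clang-format version.
    - Never reformat lines you did not otherwise touch.
    ## Change policy
    - Keep diffs minimal and focused on the stated task.
    - Reuse existing helper APIs; do not introduce new abstractions for small fixes.

The point is not completeness. Every rule written down here is one less thing the agent will improvise.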

2) Improve inline documentation in the C sources

Agents do not read your mind. They read what is in front of them.

When we improved inline comments and interface-level documentation, the agent made fewer wrong assumptions and produced patches that were more consistent with project intent.

This is the same reason good documentation helps humans. The difference is: the agent will happily hallucinate a missing interface contract and then implement it with confidence.
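
A hypothetical example of what “interface-level documentation” means here. The function and types below are invented for illustration, not actual rsyslog APIs, but the style of stating the contract explicitly is the point:

    /* Enqueue a log message for asynchronous processing.
     *
     * Contract (spelled out so that neither humans nor agents have to guess):
     *  - queue must be non-NULL and fully constructed.
     *  - On success, ownership of msg transfers to the queue; the caller
     *    must not free it. On failure, ownership stays with the caller.
     *  - Thread-safe: may be called concurrently by multiple producers.
     *  - Returns 0 on success, a negative errno-style value on failure.
     */
    int example_queue_enqueue(struct example_queue *queue, struct example_msg *msg);

Without the ownership rule written down, an agent will pick one interpretation and implement it with full confidence.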

3) Refactor ancient idioms that models barely saw in training

Some of our older constructs are… historically interesting. That is a polite way to say “you can tell they were written before the iPhone existed”.

When an agent repeatedly failed in the same area, we did a root-cause analysis and sometimes changed the idiom to something clearer and more common. That reduces both agent error and human maintenance cost.
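
A sketch of what such a change can look like. The macro below is hypothetical, modeled on old goto-style error handling in general, and not an actual rsyslog diff:

    #include <errno.h>
    #include <stdlib.h>

    /* Before (hypothetical): the macro hides a goto and silently depends on
     * names (iRet, finalize_it) that must exist at the call site -- easy for
     * a model, or a newcomer, to misapply. */
    #define CHK_ALLOC(p) if ((p) == NULL) { iRet = -1; goto finalize_it; }

    /* After: the same intent written out, with the control flow visible at
     * the call site and no hidden dependencies. */
    static int example_build_buffer(size_t count, char **out)
    {
        char *buf = malloc(count);
        if (buf == NULL)
            return -ENOMEM;   /* explicit error path, no hidden goto */
        *out = buf;
        return 0;
    }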

This is also where you discover a fun truth: “AI-friendly” often overlaps with “maintainer-friendly”. Not always, but often enough to matter. To be honest, some of these things should probably have been addressed ten years ago, simply to make life easier for humans.

4) Tabs. Yes, tabs.

One of the largest practical problems was indentation with tab characters.

Not because tabs break code generation. The agent can generate correct code with tabs. The problem is that the agent also likes to “fix” whitespace, and then you get massive diffs where nothing changed except spacing. Review becomes annoying, risk increases, and the patch looks like it rewrote the universe. A core problem here is agent nondeterminism: whitespace got introduced or changed slightly differently on almost every run.

We solved this by switching to space indentation. The ultimate fix, though, was to enforce a canonical format via a deterministic clang-format run.

It sounds trivial. It was not trivial. But it did eliminate a whole class of pointless diff noise. I would like to claim this was a deep research insight. It was not. It was “git diff is unreadable, make it stop”.

As a side note: clang-format also tends to change its canonical output between versions. To keep that manageable with a large contributor base, we currently insist on clang-format version 18, to be bumped deliberately every now and then.
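
For illustration, the enforcement itself can be a one-liner in CI. The exact binary name and file globs below are assumptions about a typical setup, not a copy of rsyslog’s actual scripts:

    # Contributors format in place with the pinned version:
    clang-format-18 -i --style=file $(git ls-files '*.c' '*.h')

    # CI gate: fail the build if the pinned formatter would change anything.
    clang-format-18 --dry-run --Werror --style=file $(git ls-files '*.c' '*.h')

Because both commands run the same pinned version against the same committed .clang-format file, the result is deterministic and whitespace-only diffs simply stop appearing.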

The part people miss: AI support is ongoing maintenance

This is not a one-time setup. Models change, behavior changes. Also, you only discover failures in the code you touch. In a project the size of rsyslog, there will always be some corner that the agent, or at least the current model, has effectively “never seen before”.

So the workflow is:

  • agent fails or produces low-quality output
  • we diagnose why
  • we add guardrails: docs, refactors, tests, clearer interfaces
  • repeat

Over time, the code evolves, and the AI support embedded in the repo evolves with it. The result is not “AI writes everything”. The result is that the agent becomes more reliable where we have invested, and we do not waste time rediscovering the same failure modes.

One more practical note: sometimes the agent gets stuck in a dead end. When that happens, the correct response is not to argue with it. For now, the correct response is to take over manually or to split the task into smaller steps.

A common root cause is overly optimistic task sizing. Agents that can plan and decompose tasks automatically help, but you still need a human to apply judgement.

Can you trust AI-generated code in infrastructure?

Yes. The same way you trust human-generated code: you do not.

You verify it.

Infrastructure projects are a target for both accidental bugs and deliberate malicious contributions. Whether code came from a person or an AI is not the key question. The key question is: what did your process do to detect problems?

Rsyslog has long relied on a strict process:

  • extensive automated tests
  • runs across multiple platforms and compiler settings
  • sanitizers for memory, threading, and undefined behavior
  • static analysis with multiple tools
  • code standard checks (including formatting)
  • final manual review before merge

This makes it hard for low-quality patches to slip through (nothing is perfect, but we try).
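
To give a feel for what “multiple compiler settings and sanitizers” means in an Autotools-based C project, here is a generic sketch of two such build variants (standard compiler flags, not rsyslog’s exact CI configuration):

    # Variant 1: address + undefined-behavior sanitizers
    ./configure CC=clang CFLAGS="-g -O1 -fsanitize=address,undefined" \
                LDFLAGS="-fsanitize=address,undefined"
    make -j"$(nproc)" && make check

    # Variant 2: thread sanitizer (needs its own build; cannot be combined with ASan)
    ./configure CC=clang CFLAGS="-g -O1 -fsanitize=thread" LDFLAGS="-fsanitize=thread"
    make -j"$(nproc)" && make check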

We also added a mandatory automated AI review step. It is not always right and it sometimes overreacts. But it often finds nits that would otherwise pass CI and human review. For maintainers, it is especially useful as a first-pass assessment on complex patches.

If you are a maintainer, you know the value of “cheap early signal”, even if it is noisy.

So, is the “AI is hype” take wrong?

It is incomplete. AI is hype in the sense that a lot of people expect it to behave like a senior engineer without investing anything. That does not work. In serious projects, it fails quickly and loudly.

But AI can be useful, even in a mature C infrastructure codebase, if you invest in the surrounding engineering:

  • agent-focused documentation
  • better inline contracts
  • refactoring ancient patterns that keep causing errors
  • strong CI and review discipline
  • task sizing and planning that matches reality

In short: the payoff is real, but it is not free. That is not a shocking conclusion. It is engineering.

And yes, it is oddly satisfying when a tool marketed with glossy demos ends up being made useful by… indentation rules and better comments. Welcome to the real world.

What’s next

  • keep tightening agent guidance (repo docs, constraints, preferred patterns) – this is also often a frustrating part, especially as agents evolve (and not always for the better)
  • keep refactoring “historically interesting” code where it blocks maintainability
  • continue pushing more work into reproducible, testable automation
  • gradually raise the abstraction level: from “code generation” toward “spec-driven changes”

CS summary

  • Context: rsyslog is a large, mature, multi-threaded C codebase (~200k LOC) with long-lived idioms and portability macros. This is not a typical AI agent demo environment.
  • Observation: early AI code generation (2023–2024) was mostly unproductive for this repo; 2025 brought a step change with asynchronous coding agents.
  • Key enabling work: agent-focused repo documentation (e.g., AGENTS.md), improved inline documentation/contracts, and refactoring rare/ancient idioms that caused repeated agent failures.
  • Unexpected blocker: tab indentation caused large, noisy whitespace diffs; switching to space indentation and enforcing a canonical clang-format run reduced review friction significantly.
  • Process conclusion: AI-assisted development is viable for infrastructure, but only with strong CI, static/dynamic analysis, and mandatory human review. AI origin does not reduce verification requirements.
  • Strategic view: treat “AI support” as ongoing maintenance; iterate based on observed failure modes and continuously refine the development process alongside the code.