liblognorm’s “rest” parser now more useful

The liblognorm “rest” parser was introduced some time ago, to handle cases where someone just wants to parse a partial message and keep all the “rest of it” into another field. I never was a big fan of this type of parser, but I accepted it because so many people asked. Practice, however, showed that my concerns were right: the “rest” parser has a very broad match and those that used it often got very surprising results.

A key cause of this issue was that the rest parser had the same priority as other parsers, and most importantly a higher priority than a simple character match. so it was actually impossible to match some constant text that was at the same location than the “rest” parser.

I have now changed this so that the rest parser is always called last, if no other thing matches – neither any parser nor any constant text. This will make it work much more like you expect. Still, I caution against using this parser as it continues to provide a very broad match.

Note that the way I have implemented this is not totally clean from a software engineering point of view, but very solid. A cleaner solution will occur during the scheduled rewrite of the algorithm (later in spring/summer).

Note that existing rulebases using “rest” may behave differently with the new algorithm. However, previously the result was more or less random, so any other change to the rulebase could also have caused different behaviour. So this is no compatibility break as there really is no compatibility to retain.

This will be released with 1.1.2, probably in early may. If you need it urgently, you can use a daily build.

Call for Log Samples

There is one big problem in research for better logging methods: no good logging sample repositories exist. Well, not even bad ones… I am currently doing some preliminary steps towards a new, better log normalization system. Among others, it will contain a structure analyzer which will remove much of the manual burden of creating normalization rules. But, guess what: while the project looks very promising, lack of log samples is a real big problem!

To solve that problem, I have setup a public log ingestor that you can simply send logs to. The system is reachable as follows:
hostname: logsubmit.rsyslog.com
port: 514
protocol: any flavor of syslog or other text data

If you run rsyslog, you can use add this snippet to /etc/rsyslog.conf:
*.* @logsubmit.rsyslog.com
How did this idea materialize? During my talk at the German Unix User’s Group FFG 2015 conference last week in Stuttgart, I mentioned that problem and Dirk Wetter had the idea to provide a log receiver that makes it very easy for people to contribute. There were some concerns that this may open up my server for DoS, and that of course is true. Nevertheless, I liked the idea and so we setup a machine today. It may be DDoS’ed and other bad things may happen, but then we got more experience. It’s split from the main systems, so that shouldn’t cause much harm.
For log contributors, please keep on your mind that you send data to a public service and so this is probably not a great idea to do this for sensitive systems. But if we get enough data from uncritical systems, we can still gain a lot from that, most importantly it helps us gain insight into structural log mining methods — which will also lead to above-mentioned tool. All logs gathered by this method will be placed in the research log repository, which currently is hosted on github. It is licensed under BSD 2-clause in the hope that a sufficiently large and diverse data set is also of great value for other researchers (did I mention it is ultra-hard to find any log sample data sets?). If you are interested in cotributing logs, but would want to do so under NDA, that’s of course also possible. In that case, please just drop me an email to see how to best go forward with that.

RFC5424 or cee-enhanced syslog?

I  recently got a question if it would be better to implement RFC5424 or cee-enhanced syslog in a new project. This sounds like a though question, but looking at the details helps to ease it a bit.

First of all, RFC5424 has a far broader scope than cee-enhanced syslog. RFC5424 is the base RFC of a new series of syslog RFCs that solve many problems, among them (this list for sure will not be complete):

  • finally a standardized syslog format (no standard exists before, just a weak understanding)
  • supports multiple transports, e.g. UDP and TLS (all in their own RFCs)
  • solves some long-standing nits, like the date format (RFC5424 supports high precision and demands a year as part of the timestamp)
  • increases the syslog message size and permits operators to set an (almost) arbitrary upper limit for message sizes
  • supports structured logging via so-called “structured data”

In contrast, cee-enhanced syslog only cares about the structured logging feature – none of the others is addressed with it. This is the case because its intent is different: the goal here is to have a format that can work with existing technology and still provide the best possible structured logging solution. Depending on circumstances, “best possible” can be quite different. For example, in a scenario where it needs to work with crufty old syslogds, it can mean that message content can not be longer than 1k, a serious restriction. Even more, it is envisioned that cee-enhanced syslog will also work over non-syslog transport, and may even be transported by simply copying (and merging) files.

CEE has long thought about using RFC5424 structured data format for its structured logging needs. However, that would have limited its ability to work with all these older (and deployed) syslog implementations. Thus, the decision was made to use text that purely resides inside the message part and, without transformation, can be used in legacy systems (with given restrictions, of course).

So now back to the question: what to implement? With this facts, I think it boils down to “both”. To use the benefits of modern-day syslog (as described above), you definitely need to implement RFC5424 and its helper RFCs (like 5425 for TLS). HOWEVER, the question remains if one should use structured data or cee-enhanced syslog. This, I have to admit, is indeed a though question.

My personal opinion, which can be totally wrong and is solely based on my experience (aka “gut feeling” ;)), cee-enhanced syslog sounds more promising for structured data. The main reason for this is that the format has received very good responses from major players and a lot of work is currently being done by a lot of folks driving important projects in the logging environment. There is also a lot of supporting libraries upcoming AND some of them already available. Most importantly, this effort is thightly integrated with Mitre and  it probably is not to far-fetched to assume that cee-enhanced syslog will appear on some purchasing checklists in the not so distant future. RFC5424 structured data does not have this broad backing. So while cee-enhanced syslog is a very fresh project, my personal assumption is that it will take off.

But again, that does not mean that RFC5424 shall not be implemented. I think RFC5424 technology is vital for a number of reasons. Of course, things like TLS can also be obtained without RFC5424, but not in a standards-compliant way. So to ensure interoperability, going the 5424 way is a good idea.

finally… rsyslog agent for windows released

It’s done! We have finally released the rsyslog Agent for Windows, a nice piece of software that enables easy integration of Windows Event Logs into a rsyslog backend system. Ideas for this tool floated around for roughly four to five month, and we had lots of internal discussions. It is important to note that we at Adiscon already have the necessary technology as part of our Windows products (actually, we invented this whole event-log-to-syslog type of software…), so it was just a matter of fine-tuning the code and selecting some useful default settings and policies.

The release is important because it makes clear that there actually is a Windows component (while I tried to convey that several times, people most often did not realize it due to name differences – something with “rsyslog” inside the name was expected). It is also important for me at Adiscon internally: the rsyslog Agent is a commercial product and license sales will make clear that this business is driven by rsyslog. And, obviously, I hope that this will help fund the project without need to resort to other things like premium plugins. This is also why the Agent is so important for the rsyslog project as whole: it will hopefully help to stabilize the funding situation even more.

EDIT: I should probably mention that the Windows Agent was more or less ready when I held its release in order to integrate support for cee-enhanced syslog into it. I am glad I could convince my folks at Adiscon, so that we now have this exciting feature actually available.

CEE-enhanced syslog defined

CEE-enhanced syslog is an upcoming standard for expressing structured data inside syslog messages. It is a cross-platform effort that aims at making log analysis (and log processing in general) much more easy both for log producers and consumers. The idea was originally born as part of MITRE’s CEE effort. It has been adopted by a larger set of logging stakeholders in an initiative that was named “project lumberjack“. Under this project, cee-enhanced syslog, and a framework to make full use of it, is being openly advanced. It is hoped (and planned) that the outcome will flow back to the CEE standard.

In a nutshell cee-enhanced syslog is very simple and powerful: inside the syslog message, a special cookie (“@cee:”) is followed by a JSON representation of the data. The cookie tells processors that the format is actually cee-enhanced. If you are interested in a more technical coverage, have a look at my cee-enhanced syslog howto presentation.

Adiscon is one of the main supporters of project lumberjack and CEE enhanced syslog. Since February 2012, Adiscon products offer basic support for cee-enhanced syslog, being among the first tools to do so.

What is CEE-enhanced syslog?

I just did a quick presentation on what cee-enhanced syslog actually is and how it works. I suggest to have at least a peek, as this format will probably become very important in the future. But why say more…  just get the full story in 5 minutes ;)

cee-enhanced event log to syslog forwarding

As many know, we at Adiscon also work hard at Windows Event Log to syslog forwarding software. During the past days we have taken the time to implement cee-enhanced syslog format inside these products as well. It is currently a proof of concept stage, but mostly because the relevant specs are also at PoC. This effort nicely integrated with the new project lumberjack, which aims at providing structured logging. New releases of the relevant Windows products (EventReporter and MonitorWare Agent) will be released very soon. With these releases, we are again the first-ever folks to release something never seen before, this time CEE support for windows logging ;)

But how does it work? Basically, it is a message format option of the “format syslog” option. If you select cee-enhanced syslog, messages will be emitted in that format. Most importantly, they will included nice name/value pairs of the Windows events (if Windows provided names, else the previous “Paramn” replacement names will be used). For example, a security event is described as follows:

@cee: {“source”: “machine.local”, “nteventlogtype”: “Security”, “sourceproc”: “Microsoft-Windows-Security-Auditing”, “id”: “4648”, “categoryid”: “12544”, “category”: “12544”, “keywordid”: “0x8020000000000000”, “user”: “N\A”, “SubjectUserSid”: “S-1-5-11-222222222-333333333-4444444444-5555”, “SubjectUserName”: “User”, “SubjectDomainName”: “DOMAIN”, “SubjectLogonId”: “0x5efdd”, “LogonGuid”: “{00000000-0000-0000-0000-000000000000}”, “TargetUserName”: “Administrator”, “TargetDomainName”: ” DOMAIN “, “TargetLogonGuid”: “{00000000-0000-0000-0000-000000000000}”, “TargetServerName”: “servername”, “TargetInfo”: ” servername “, “ProcessId”: “0x76c”, “ProcessName”: “C:\Windows\System32\spoolsv.exe”, “IpAddress”: “-“, “IpPort”: “-“, “catname”: “Logon”, “keyword”: “Audit Success”, “level”: “Information”}

Note that we currently focus on cee-enhanced syslog format. We did not yet try to map the Windows field names to the CEE dictionary/profile terms. Probably the most important reason for this focus is that we do not yet have any definite spec to write to. Obviously, once the spec is out, it is fairly easy to upgrade the implementation to support these other names.

A co-worker is right now doing some more testing with rsyslog, which is able to understand that new format. I’ll update you with the findings, and procedures, once they are ready.

Disappointing press reactions…

Friday, 13th… Systemd v38 went out recently and it includes a journald test release. That’s great and congrats to the systemd team to that release. What me saddens is the that the press still conveys only Lennart’s position on how bad syslog is. The counter arguments (http://blog.gerhards.net/2011/11/serious-syslog-problems.html) are mentioned nowhere. Lesson learned? All it takes is some interesting-sounding claims and everyone believes them. Does anybody still wonder why I think journald will become standard [and would do so even if it were totally evil]? ;-)

But don’t get me wrong: while I think the journald proposal and the way it is getting pushed has some flaws, I also see good in it. My initial reaction still stands: http://blog.gerhards.net/2011/11/journald-and-rsyslog.html Unfortunately, the world, and us, the masses, seems not to be prepared for weighted arguments… On to the weekend, have a great one :-)

Feedback Request for digitally signed log store

I have just written about how I plan to implement digital signatures in LogStore, the secure store used by LogTools. The log store digital signature proposal details how and when signatures are written and provides reasoning why it will probably happen this way. There are two goals for the proposal: one is to document how things will work and the other, probably more important, one is to draw some feedback. It is easy to get security tools wrong, and even those with highest experience in that area (which I have not!) can fail. So it would be very beneficial to have some other folks read the proposal and comment on weaknesses they find – or simply things they would do differently or add to the overall idea. With that said, please read the (small) paper and provide feedback ;-).

Please keep on your mind that his is not only related to syslog but can  be used with any text-based log (including binary logs that are converted to text, e.g. by base64 encoding them). So it can affect you even if you are not interested in syslog itself. My (mostly uneducated) assumption is that this could be a toolset of great use for computer forensics.

LogTools 0.1.0 Released

I finally managed to do the initial public release of LogTools, a set of useful utilities for log data processing. Their current most important feature is the initial version of LogStore, a tamper-proof way to store textual log data especially designed for long-term archival. Note that LogTools perfectly process syslog messages, but can be used for anything that is text-based (like Apache or other application text logs).

I am very happy to finally have the initial release ready. Full details can be found in the release announcement. I initially thought it would become available early last week, but I wanted to create some packages. As I had never done this before, it turned out to become a bit of a problem for me. For the time being, I have settled to do an experimental Debian package via checkinstall. While this is obviously a quick and dirty solution, it enables folks to obtain LogTools via the easy way. Also, I don’t think it is too problematic, because in essence only some user-tools are installed that do not affect anything else on the system. But, of course, I’ll think about better packaging as the project continues.

At this point, I would be very interested in feedback both on the current tools as well on what would be considered a plus in the future. Please let me know!