pace of changes in rsyslog

I have achieved much with rsyslog over the past days. The output module interface is more or less ready. Of course, a few things need to be cleaned up, but all in all it looks good. Maybe a day or two for the cleanup. The interface is not yet perfect. In particular, the way actions are configured should be changed (to make it more secure). However, none of this is really pressing. The initial goal was to get a more maintainable code base and to prepare it for future enhancement. Both goals are basically achieved.

So wouldn’t it be natural to go ahead and do all the nice other things? Well, I think not really. There has been much change. It doesn’t hurt to let the code base mature a little before further large changes are applied. Especially when we think that it will ship as part of Fedora 8. And I have to admit I need to look at some other work from time to time ;)

With all that, my plan is to refrain from large changes in August. Of course, I probably won’t be able to resist doing some improvement here and there. But I do not plan to do the loadable plugins, advanced threading or input module interface for the time being. All of that will happen starting in September. By then, I think I will have a very good starting point for that work. And now I hope that rsyslog gets a lot of testing (as has already happened over the past weeks). I think that what we currently have will more or less be 2.0.0. Let’s see how many features creep in despite my good resolutions ;)

rsyslog changes on 2007-07-31

Today was quite a good day. I had time to work the full day on rsyslog and could go for the complex things. To do them, I also had to add some more plumbing, which hopefully will also enable me to do other things much quicker in the future. For example, I have added a full-fledged, keyed linked list class today. I think that this class will most probably serve me again on several occasions. It is very generic and should be applicable to many more code paths where linked lists are needed. I will not migrate the other linked lists right away, but in the long term this will most probably happen (and result in smaller code size for the same functionality).
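To make the idea of a generic, keyed linked list more concrete, here is a minimal sketch in C. It is not the code from linkedlist.h/c. All type and function names (llist_t, ll_append, ll_find and so on) are hypothetical, and the sketch assumes string keys and caller-owned payloads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not rsyslog's linkedlist.c. */
typedef struct ll_node {
    char *key;              /* copy of the key, owned by the list */
    void *data;             /* payload, owned by the caller */
    struct ll_node *next;
} ll_node_t;

typedef struct {
    ll_node_t *head;
    ll_node_t *tail;
    size_t count;
} llist_t;

void ll_init(llist_t *l)
{
    l->head = l->tail = NULL;
    l->count = 0;
}

/* Append a key/data pair. Returns 0 on success, -1 if out of memory. */
int ll_append(llist_t *l, const char *key, void *data)
{
    assert(l != NULL && key != NULL);
    ll_node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    size_t len = strlen(key) + 1;
    n->key = malloc(len);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->key, key, len);
    n->data = data;
    n->next = NULL;
    if (l->tail == NULL)
        l->head = n;
    else
        l->tail->next = n;
    l->tail = n;
    l->count++;
    return 0;
}

/* Linear search by key. Returns the payload, or NULL if the key is absent. */
void *ll_find(const llist_t *l, const char *key)
{
    for (const ll_node_t *n = l->head; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n->data;
    return NULL;
}

/* Free all nodes and keys; payloads stay with the caller. */
void ll_destroy(llist_t *l)
{
    ll_node_t *n = l->head;
    while (n != NULL) {
        ll_node_t *next = n->next;
        free(n->key);
        free(n);
        n = next;
    }
    ll_init(l);
}
```

Because the payload is an untyped pointer, one implementation like this could serve several different lists, which is where the hoped-for reduction in code size would come from.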

I have also improved internal error handling and support macros. I have finally arrived at a stage where there is a well-defined way (based on the srRetVal data type) by which all functions communicate with each other. This includes something like finalizers. It works pretty neatly. Of course, lots of code is still to be changed. But a lot is already done. I think this system also greatly improves code reliability, as all error states are consistently handled and reported to upper layers. I already see benefit from that in practice. For example, there was the problem that up until now config line parsing errors did not report line numbers. This was quite hard to obtain with the old code. Now, the high-level handler gets a neat error state and if it is an error, the high-level function spits out the line number – mission accomplished. Oh, what a joy when there is a well-designed and solid code base ;) I guess I will enjoy similar benefits more often in the future.
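The return-code convention with finalizers can be sketched roughly as follows. This is an illustration under assumptions, not the actual rsyslog macros: the enum retval_t, the macro names DEF_IRET, CHK_IRET and ABORT_FINALIZE, and both functions are made up for this example. The point is only to show how a low-level parse error travels up to a caller that knows the line number.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical return type and macros, named for illustration only. */
typedef enum { RET_OK = 0, RET_PARSE_ERROR = 1 } retval_t;

#define DEF_IRET             retval_t iRet = RET_OK
#define CHK_IRET(call)       do { if ((iRet = (call)) != RET_OK) goto finalize_it; } while (0)
#define ABORT_FINALIZE(code) do { iRet = (code); goto finalize_it; } while (0)

/* Low-level helper: parse a non-negative decimal integer. */
retval_t parse_int(const char *s, int *out)
{
    DEF_IRET;
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > INT_MAX)
        ABORT_FINALIZE(RET_PARSE_ERROR);
    *out = (int)v;

finalize_it:
    return iRet;
}

/* High-level handler: sums one integer per line. On failure it reports
 * the 1-based number of the offending line via *err_line. */
retval_t process_lines(const char *const *lines, size_t n,
                       int *sum, size_t *err_line)
{
    DEF_IRET;
    size_t i;

    *sum = 0;
    *err_line = 0;
    for (i = 0; i < n; i++) {
        int v;
        *err_line = i + 1;            /* remember where we are */
        CHK_IRET(parse_int(lines[i], &v));
        *sum += v;
    }
    *err_line = 0;                    /* no error: reset the marker */

finalize_it:
    if (iRet != RET_OK)
        fprintf(stderr, "error in config line %zu\n", *err_line);
    return iRet;
}
```

The low-level function never needs to know about line numbers. It only returns an error state, and the caller that tracks the position adds the context.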

Here is the work log for today:

  • moved doGetGID() to cfsysline.c
  • moved umask & file/dir creation mode parsing to cfsysline.c
  • added macro for easy and consistent check of iRet return value
  • moved the SetCCEscapeCharacter config file directive to cfsysline.c and also generalized it for further use while doing so
  • added doGetInt() to cfsysline.c and adapted dynaFileCacheSize handler to use it
  • added macro to consistently define iRet
  • added macro to abort a function and go to finalizer
  • added output of config file line number when a parsing error occurred
  • moved code to open config file into separate function processConfFile()
  • fixed bug in objomsr.c that caused program to abort in debug mode with an invalid assertion (in some cases)
  • moved debug printf code out of init() into its own function
  • added doCustomHdlr() to cfsysline.c – this completes implementing functions for canned handlers.
  • added a generic linked list object (files linkedlist.h/c)
  • fixed a typo that caused the default template for MySQL to be wrong. thanks to mildew for catching this.
  • prepared cfsysline.c for integration into output modules
  • changed modInit() interface to contain pointer to host-function query method
  • added interface to register a cfsysline command handler (basic functionality)
  • got the basic code in place to create an in-memory list of cfsysline handlers (omfile.c used as testing case) — not yet in active code
  • omfile.c now uses the new table-driven cfsysline system
  • added configuration file command $DebugPrintModuleList and $DebugPrintCfSysLineHandlerList
  • all cfsysline directives now use new table-driven cfsysline system except for $ResetConfigVariables
  • $ResetConfigVariables now also works via the table-driven system. However, I need to fix an issue with loading default settings when syslogd is started or HUPed
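As a rough illustration of what a table-driven handler system for $-directives can look like, here is a small C sketch. It is written under assumptions and does not reproduce cfsysline.c: the names cfsys_register, cfsys_process, handle_int and handle_binary are hypothetical, the table is a fixed-size array rather than a linked list, and directive names are matched case-sensitively for simplicity.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler signature: parse 'arg' and store it in 'target'. */
typedef int (*cfsys_handler_t)(const char *arg, void *target);

typedef struct {
    const char *name;          /* directive name without the leading '$' */
    cfsys_handler_t handler;   /* canned or custom handler */
    void *target;              /* variable the handler writes to */
} cfsys_entry_t;

#define MAX_HANDLERS 32
cfsys_entry_t handler_table[MAX_HANDLERS];
size_t handler_count = 0;

/* Register a directive. Returns 0 on success, -1 if the table is full. */
int cfsys_register(const char *name, cfsys_handler_t h, void *target)
{
    assert(name != NULL && h != NULL);
    if (handler_count == MAX_HANDLERS)
        return -1;
    handler_table[handler_count].name = name;
    handler_table[handler_count].handler = h;
    handler_table[handler_count].target = target;
    handler_count++;
    return 0;
}

/* Canned handler: decimal integer. */
int handle_int(const char *arg, void *target)
{
    char *end;
    long v = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return -1;
    *(int *)target = (int)v;
    return 0;
}

/* Canned handler: "on" or "off". */
int handle_binary(const char *arg, void *target)
{
    if (strcmp(arg, "on") == 0)  { *(int *)target = 1; return 0; }
    if (strcmp(arg, "off") == 0) { *(int *)target = 0; return 0; }
    return -1;
}

/* Look up the directive of a line such as "$Name value" and dispatch it.
 * Returns the handler's result, or -1 for an unknown directive. */
int cfsys_process(const char *line)
{
    assert(line != NULL);
    if (line[0] != '$')
        return -1;
    const char *name = line + 1;
    size_t name_len = strcspn(name, " \t");
    const char *arg = name + name_len;
    arg += strspn(arg, " \t");            /* skip whitespace before value */

    for (size_t i = 0; i < handler_count; i++) {
        const cfsys_entry_t *e = &handler_table[i];
        if (strlen(e->name) == name_len && strncmp(e->name, name, name_len) == 0)
            return e->handler(arg, e->target);
    }
    return -1;
}
```

With a table like this, adding a new directive is a single registration call, which is also what lets an output module contribute its own directives without touching the parser.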

a restartable interface for syslog actions…

These days, I had a quite interesting email discussion. I am reproducing it here anonymized, as I think it is probably useful to get the “big picture” of where rsyslog is heading to.

I got the following request


Btw, I am trying to understand if it is possible to create this logic:

  • specify 2 mysql servers (same schema)
  • when rsyslog detects a writing failure on primary mysql server (let’s
    say after the 1st retry)
  • start logging on the secondary server.
  • rollback to primary server will be manual (sighup to rsyslogd or
    something like this)

My reply was:


thanks for the suggestion. It actually is kind of on the todo list. Later this year, rsyslog will get a restartable queue interface. That is, when MySQL (or whatever else) goes down, messages are spooled to disk. When it comes back up, the spooled messages are written. All of this will happen in sequence.

I am currently doing a big code restructuring, and one of the reasons for it is the restartable interface ;) It will be a very powerful and generic solution, but thus it will take some time. I anticipate some time around fall.

And after some further conversation, I wrote this:


Before a fully restartable interface, I’ll add a capability to work with backup actions. That is something like this


*.* >database-writer
&[onfail] >backup-database-writer

The rule with [onfail] in it would trigger only if the database-writer fails. I have a similar request for tcp forwarding and I think I can elegantly integrate it into the action processor, once the output module interface is fully implemented (which, unfortunately, it is not yet ;)).

This conversation communicates the upcoming ideas and their use cases quite well. I personally think that I can begin to implement them around September. So the current order of events probably is:

  • finish output module interface
  • implement multiple actions per selector line (actually also improves performance for multiple actions with the same filter condition – this was the initial reason to design it)
  • implement the failover mode for actions
  • implement the queued interface

Along with these steps, we need to implement automatic suspension/re-enabling of actions. I am not yet sure when this will happen – it probably depends on when it is needed. Another benefit of this functionality would be even simpler output module code. This also serves as motivation. Another thing that will happen at some point is the loadable plug-in interface – but that will probably be quite easy once the output module interface is finished.
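The [onfail] idea from the conversation above can be sketched as a small action processor in C. This is only an illustration under assumptions: the action_t structure, the only_on_fail flag and process_actions() are hypothetical names, and the two writer functions are stand-ins for real outputs such as a database writer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action callback: returns 0 on success, nonzero on failure. */
typedef int (*action_fn_t)(const char *msg, void *ctx);

typedef struct {
    action_fn_t run;
    void *ctx;
    int only_on_fail;   /* 1 = run only if the previously run action failed */
} action_t;

/* Run all actions attached to one selector line, in order.
 * Returns the number of actions that completed successfully. */
int process_actions(const action_t *actions, size_t n, const char *msg)
{
    int prev_failed = 0;
    int delivered = 0;

    assert(actions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const action_t *a = &actions[i];
        if (a->only_on_fail && !prev_failed)
            continue;                 /* backup not needed */
        prev_failed = (a->run(msg, a->ctx) != 0);
        if (!prev_failed)
            delivered++;
    }
    return delivered;
}

/* Stand-in output that always fails, e.g. a database that is down. */
int failing_writer(const char *msg, void *ctx)
{
    (void)msg;
    (void)ctx;
    return -1;
}

/* Stand-in output that succeeds and counts the messages it received. */
int counting_writer(const char *msg, void *ctx)
{
    (void)msg;
    ++*(int *)ctx;
    return 0;
}
```

Placing a counting_writer with only_on_fail set after a failing_writer models the backup-database-writer rule: the backup receives the message only when the primary reports an error.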

rsyslog changes up to 2007-07-30

I did not have as much time for rsyslog today as I had hoped. However, I also did a bit of coding over the weekend. My work log is as follows:

2007-07-28
– fixed bug in freeSelectors()/stopWorker()
– added some module interface doc
2007-07-30
– released 1.17.5
– added cfsysline objects – initial set of functions
– fixed bug in OMSRcreate() – always returned SR_RET_OK
– fixed a bug that caused ommysql to always complain about missing
templates
– fixed a mem leak in OMSRdestruct – freeing the object itself was
forgotten – thanks to varmojfekoj for the patch
– fixed a memory leak in syslogd/init() that happened when the config
file could not be read – thanks to varmojfekoj for the patch
– moved skipWhiteSpace() to srUtils.c, where I think it fits better
– moved doBinaryOption() and doGetGUID() to cfsysline.c
– fixed insufficient memory allocation in addAction() and its helpers.
The initial fix and idea was developed by mildew, I fine-tuned
it a bit. Thanks a lot for the fix, I’d probably have pulled out my
hair to find the bug…

As you can see, I started on the final piece of output modules, that is, handling of $-config lines (called cfsyslines). This time, I have a different, bottom-up approach. I now move the code first to the new object and then implement the object in all its glory. That costs me a bit more time for interim code that I will quickly discard, but it saves me the headache of coding for hours and hours without the ability to test what I am doing (that was a big problem last Friday). As I was interrupted often today, this approach indeed proved very valuable. It even allowed me to include mildew’s great patch this afternoon AND immediately release it to the anon cvs. As a side note, this approach is also the reason why there is code in cfsysline.c that so far is never executed – it is the plumbing that I will activate when I have moved all the utility functions. Hopefully, that’ll be tomorrow.

on the syslogd -h option

While I work on rsyslog modularization, I also revisit a lot of code. Please remember that rsyslog is rooted in the sysklogd package (and we always tried to keep it quite compatible with it). When I finished moving out references to the selector_t (struct filed) entries in the modules, I came across a place in the forwarding driver where the message element is accessed. You can look up that code in cvs (omfwd.c, line 597 and below).

This code implements the -h option, which stops forwarding messages when they did not originate from the local host. The intention of that option probably is to avoid a death spiral, which could be caused by two systems sending syslog messages back and forth (this scenario is actually even covered in RFC 3164, so it seems to happen from time to time…).

However, the code in sysklogd relies on hostnames to prevent that behaviour. If the hostname is different from the current hostname, then we have a remotely received message. I question whether that check is always reliable (besides, it is not working right at the moment ;)). If that functionality is actually needed, it would be way better to check the message’s target IP address against the local addresses (probably a lot of work, but definitely doable).

The question is whether such a feature is actually needed – and whether it is needed in the output driver. To me, it sounds like a natural filter condition (“selector does not apply if host == non-localhost”). If that feature is required, it would probably be better to build it into filtering rather than into a (single) output module.

But again, the question is: do we really need to provide this functionality? Or is it an artifact long gone away?

Feedback is appreciated (you may also use the rsyslog forum, if you like).

rsyslog progress on 2007-07-27

I made big progress, even though the work log does not seem to indicate it. The issues I worked on were quite complex. And, most frustratingly, there was no simple way to even compile rsyslog until the change was completed. So I hacked for about 6 hours without any feedback on the effect. Of course, after the first compile things were really bad. But over time, I managed to fix the bugs. Now I am quite happy with the result. The output module interface really begins to materialize. The next big thing is handling of configuration system line directives ($-lines). Stay tuned…

The work log for Friday:
– released 1.17.4
– added omsr object (objomsr.c, objomsr.h) – template request for output
modules
– changed doAction() interface
– templates and output string generation for doAction() is now fully
– removed selector_t f references from output modules
– MILESTONE reached: no more access to selector_t from any module, at
least at this layer we communicate via clean interfaces. However, there
remains the topic of global variable access and calling to functions
housed somewhere else (e.g. in syslogd.c). A new code review is now due,
many changes happened, many TODO’s added.

yesterday’s rsyslog changes

During the large rsyslog modularization effort, I keep a more detailed audit log of what I am doing. I hope that this log will allow others to follow the progress and help them understand what I am doing. I was not sure (and I still am not) where to post that log. I’ve now decided to post it to my blog, because it doesn’t look really suitable for the “official” rsyslog site.

Please note that the work audit contains more detail than the ChangeLog. This is intentional. The ChangeLog shall provide the average user with an idea of what’s new. My audit here provides finer-grained information for those who are really interested in it.

Here come yesterday’s changes. They are listed in the order in which I made them.

– applied patch from mildew to avoid zombies
– applied patch from Michel Samia to fix compilation when NOT
compiled for pthreads
– implemented onSelectReadyWrite() interface
– MILESTONE reached: no more access to f->f_un in syslogd.c
– shuffled code from tcpsyslog.c to omfwd.c. It looks like it belongs more
to that file. But we need to look at it some time later. The move was
absolutely necessary so that no access to f->f_un happened in
tcpsyslog.c (which was evil)
– MILESTONE reached: no more access to f->f_un from non-output modules
– changed doAction() interface to include module data pointer
– removed references to f_un from omusrmsg.c
– changed module template for parseSelectorAct() [code reduction,
consistency]
– removed references to f_un from ommysql.c
– removed references to f_un from omfwd.c
– removed references to f_un from omshell.c
– removed references to f_un from omfile.c
– MILESTONE reached: f->f_un has gone away!
– removed f_type from omshell.c, omdiscard.c, omusrmsg.c, ommysql.c
– removed f_type from syslogd.c/cflineParseFileName()
– fixed bug in omfile.c which could lead to invalid addressing if “-” was
given to not sync file
– removed f_type from omfile.c
– implemented needUDPSocket() interface
– replaced (mis) use of f_prevcount in omfwd.c -> now data element in
instance data is used for retry counting
– removed f->f_type from syslogd.c, omfwd.c
– removed f->f_file from omfwd.c, omfile.c
– f->f_flags has gone away
– changed doAction() interface to contain the full message string
– f_iov and its handling has been removed
– added IDs to selector_t

If you are interested in even more details, you can go to the rsyslog cvs and see the changes on a file-by-file basis.

As the day closed, I identified a problem with the current interface definition: Modules need to access the template pointer in selector_t. They may even need to have multiple templates (e.g. dynaFiles, a hypothetical email action [subject and message text]). I need to address this soon.
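One way to picture a module asking for several templates is a small request object that the module fills in and the core later resolves. The C sketch below is hypothetical: tpl_request_t and its functions are invented for illustration and do not describe the interface that was eventually implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request object: a module states how many templates it
 * needs and names each one (e.g. subject and body for an email action). */
typedef struct {
    size_t count;
    char **names;    /* owned copies of the template names */
} tpl_request_t;

/* Create a request for 'count' templates. Returns NULL if out of memory. */
tpl_request_t *tpl_request_create(size_t count)
{
    tpl_request_t *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->names = calloc(count ? count : 1, sizeof *r->names);
    if (r->names == NULL) {
        free(r);
        return NULL;
    }
    r->count = count;
    return r;
}

/* Set the template name for slot 'idx'. Returns 0 on success, -1 on a
 * bad index or out-of-memory condition. */
int tpl_request_set(tpl_request_t *r, size_t idx, const char *name)
{
    assert(r != NULL && name != NULL);
    if (idx >= r->count)
        return -1;
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    free(r->names[idx]);          /* replace any earlier name */
    r->names[idx] = copy;
    return 0;
}

/* Release the request and all names it owns. */
void tpl_request_destroy(tpl_request_t *r)
{
    if (r == NULL)
        return;
    for (size_t i = 0; i < r->count; i++)
        free(r->names[i]);
    free(r->names);
    free(r);
}
```

With an object like this, a module would no longer need to reach into selector_t for a template pointer. It would hand the request to the core and receive the generated strings in the same order.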

I hope this audit is useful. Yesterday’s changes will be released as 1.17.4 this morning. Then, I continue to work on modularization.

rsyslog approved as Fedora 8 feature

Man, I was so busy, I didn’t even notice that the Fedora steering committee approved the rsyslog feature for Fedora 8. The rsyslog feature page in the Fedora Wiki is an interesting read. I am quite happy with the state of affairs. Most importantly, rsyslog is receiving a lot of testing now, and new bug reports and patches come in each day. This helps to make a rock-solid and feature-rich software, just as it should be.

rsyslog output module interface

Rsyslog’s output module interface begins to materialize. I have even begun to restructure the code modules, which currently mostly means shifting code to different places. However, there is much more behind this code shuffling. I’ve been thinking for quite a while about the modularization of rsyslog. What happens now is the result of this thinking. In the end, we will have output modules running on independent threads, each being able to queue data when the output is suspended for some reason (e.g. the remote syslog server it sends data to is unavailable). And, of course, the module interface will also support plug-ins.

The current MySQL action will become such a plug-in. I need to find a way to tell current users how to migrate to the loadable module interface. I guess I’ll add a dummy statement like

$ModLoad MySQL

to the current configuration. Well, yes, let’s do that – I’ve created a feature tracker as I write down this blog entry.

The only effect it will have in current code is that it tells the config engine that the user cared about modules. In builds that will later support loadable modules, it will actually load the mysql plugin. For now, its only function will be to let rsyslog warn users who have not yet added the statement. That should take care of a smooth transition.