a restartable interface for syslog actions…

I recently had quite an interesting email discussion. I am reproducing it here in anonymized form, as I think it helps convey the “big picture” of where rsyslog is heading.

I got the following request:


Btw, I am trying to understand if it is possible to create this logic:

  • specify 2 mysql servers (same schema)
  • when rsyslog detects a writing failure on primary mysql server (let’s
    say after the 1st retry)
  • start logging on the secondary server.
  • rollback to primary server will be manual (sighup to rsyslogd or
    something like this)

My reply was:


thanks for the suggestion. It actually is kind of on the todo list. Later this year, rsyslog will get a restartable queue interface. That is, when MySQL (or whatever else) goes down, messages are spooled to disk. When it comes back up, the spooled messages are written. All of this will happen in sequence.

I am currently doing a big code restructuring, and one of the reasons for it is the restartable interface ;) It will be a very powerful and generic solution, but thus it will take some time. I anticipate some time around fall.
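
To make the spooling idea a bit more concrete, here is a minimal sketch in C. It is not rsyslog code: the names spool_t, submit() and sink_write() are made up for illustration, and a small in-memory ring buffer stands in for the disk spool. It only shows the ordering rule, namely that spooled messages are written before any newer one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPOOL_MAX 8

/* Hypothetical spool: an in-memory ring buffer standing in for the disk. */
typedef struct {
    const char *items[SPOOL_MAX];
    size_t head;    /* index of the oldest spooled message */
    size_t count;   /* number of spooled messages */
} spool_t;

/* Append a message to the spool; returns -1 if the spool is full. */
int spool_put(spool_t *q, const char *msg)
{
    assert(q != NULL && msg != NULL);
    if (q->count == SPOOL_MAX)
        return -1;
    q->items[(q->head + q->count) % SPOOL_MAX] = msg;
    q->count++;
    return 0;
}

/* Submit one message. Older spooled messages are always written first, so
 * the output sees everything in sequence. Returns 0 if the message was
 * written or spooled, -1 if the spool is full. */
int submit(spool_t *q, int (*write_fn)(const char *), const char *msg)
{
    while (q->count > 0) {
        if (write_fn(q->items[q->head]) != 0)
            return spool_put(q, msg);     /* output still down: keep spooling */
        q->head = (q->head + 1) % SPOOL_MAX;
        q->count--;
    }
    if (write_fn(msg) != 0)
        return spool_put(q, msg);         /* output just went down */
    return 0;
}

/* Stand-in for a real writer (e.g. a database): records what it received. */
int sink_up = 1;
char sink_log[128];

int sink_write(const char *msg)
{
    if (!sink_up)
        return -1;
    strncat(sink_log, msg, sizeof sink_log - strlen(sink_log) - 1);
    return 0;
}
```

In this sketch the backlog is only drained when the next message arrives; a real implementation would more likely retry on a timer.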

And after some further conversation, I wrote this:


Before the fully restartable interface arrives, I’ll add a capability to work with backup actions. It would look something like this:


*.* >database-writer
&[onfail] >backup-database-writer

The rule with [onfail] in it would trigger only if the database-writer fails. I have a similar request for tcp forwarding and I think I can elegantly integrate it into the action processor, once the output module interface is fully implemented (which, unfortunately, it is not yet ;)).
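
The [onfail] logic could be pictured roughly as follows. This is only a sketch under my own assumptions: action_t, process_actions() and the two stub writers are hypothetical names, not part of rsyslog, and the real action processor may end up looking quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not rsyslog's real data structures. */
typedef int (*action_fn)(const char *msg);   /* returns 0 on success */

typedef struct {
    action_fn do_action;   /* the output routine, e.g. a database writer */
    int only_on_fail;      /* 1 if the action was tagged [onfail] */
} action_t;

/* Run all actions of one selector line in order. An [onfail] action runs
 * only if the action executed directly before it failed. Returns the
 * number of actions that succeeded. */
int process_actions(const action_t *actions, size_t n, const char *msg)
{
    int prev_failed = 0;
    int succeeded = 0;

    assert(actions != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (actions[i].only_on_fail && !prev_failed)
            continue;                       /* primary worked: skip backup */
        prev_failed = (actions[i].do_action(msg) != 0);
        if (!prev_failed)
            succeeded++;
    }
    return succeeded;
}

/* Stand-ins for real writers, used only to demonstrate the control flow. */
int writer_ok(const char *msg)   { (void)msg; return 0; }
int writer_down(const char *msg) { (void)msg; return -1; }
```

With a failing primary and a working [onfail] backup, exactly one action succeeds; with a working primary, the backup is never called.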

This conversation communicates the upcoming ideas and their use cases quite well. I personally think that I can begin implementing them around September. So the current order of events is probably:

  • finish output module interface
  • implement multiple actions per selector line (actually also improves performance for multiple actions with the same filter condition – this was the initial reason to design it)
  • implement the failover mode for actions
  • implement the queued interface

Alongside these steps, we need to implement automatic suspension and re-enabling of actions. I am not yet sure when this will happen; it probably depends on when it is needed. A further benefit of this functionality would be even simpler output module code, which also serves as motivation. Another thing that will happen at some point is the loadable plug-in interface, but that will probably be quite easy once the output module interface is finished.
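
Automatic suspension and re-enabling could work along these lines. Again, this is a hedged sketch and not the planned implementation: action_state_t, run_action() and the fixed retry interval are assumptions of mine, chosen to keep the example short.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical per-action state; not an rsyslog structure. */
typedef struct {
    int suspended;        /* 1 while the action is disabled */
    time_t resume_at;     /* earliest time for the next attempt */
    int retry_interval;   /* seconds to wait after a failure */
} action_state_t;

/* Try to run an action at time `now`.
 * Returns  0 if the message was written (the action is re-enabled),
 *          1 if the action is suspended and was skipped,
 *         -1 if the write failed and the action is now suspended. */
int run_action(action_state_t *st, int (*try_write)(const char *),
               const char *msg, time_t now)
{
    assert(st != NULL && try_write != NULL);
    if (st->suspended && now < st->resume_at)
        return 1;                          /* not yet time to retry */
    if (try_write(msg) == 0) {
        st->suspended = 0;                 /* automatic re-enabling */
        return 0;
    }
    st->suspended = 1;                     /* automatic suspension */
    st->resume_at = now + st->retry_interval;
    return -1;
}

/* Stand-in for a real output whose availability can be toggled. */
int target_up = 1;
int fake_write(const char *msg) { (void)msg; return target_up ? 0 : -1; }
```

The output module itself only reports success or failure; the bookkeeping lives in one place, which is where the simplification of module code would come from.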

rsyslog changes up to 2007-07-30

I did not have as much time for rsyslog today as I had hoped. However, I also did a bit of coding over the weekend. My work log is as follows:

2007-07-28
– fixed bug in freeSelectors()/stopWorker()
– added some module interface doc
2007-07-30
– released 1.17.5
– added cfsysline objects – initial set of functions
– fixed bug in OMSRcreate() – always returned SR_RET_OK
– fixed a bug that caused ommysql to always complain about missing
templates
– fixed a mem leak in OMSRdestruct – freeing the object itself was
forgotten – thanks to varmojfekoj for the patch
– fixed a memory leak in syslogd/init() that happened when the config
file could not be read – thanks to varmojfekoj for the patch
– moved skipWhiteSpace() to srUtils.c, where I think it fits better
– moved doBinaryOption() and doGetGUID() to cfsysline.c
– fixed insufficient memory allocation in addAction() and its helpers.
The initial fix and idea were developed by mildew, I fine-tuned
it a bit. Thanks a lot for the fix, I would probably have pulled out my
hair to find the bug…

As you can see, I started on the final piece of output modules, that is, the handling of $-config lines (called cfsyslines). This time, I am taking a different, bottom-up approach: I first move the existing code to the new object and then implement the object in all its glory. That costs me a bit more time for interim code that I will quickly discard, but it saves me the headache of coding for hours and hours without the ability to test what I am doing (that was a big problem last Friday). As I was interrupted often today, this approach indeed proved very valuable. It even allowed me to include mildew’s great patch this afternoon AND immediately release it to the anon cvs. As a side note, this approach is also the reason why there is code in cfsysline.c that so far is never executed: it is the plumbing that I will activate once I have moved all the utility functions. Hopefully, that’ll be tomorrow.

NASA Kids

I am always on the search for English learning resources for kids. I guess I found another one. My son and I are space addicts. The NASA kids homepage seems to be a great fit. It provides fun and space facts while at the same time offering some *real* motivation to dig into English.

I am still on the search for some weekly or monthly newsletters especially written for kids. And of course, I would find those great that are specifically written for non-native speakers. If you come across any such thing, please drop me a note. The topic doesn’t necessarily have to be space or astronomy. Wildlife, sealife and anything else would also be great. I am looking forward to all the great suggestions ;) [well, honestly I think receiving even one would be great…].

on the syslogd -h option

While I work on rsyslog modularization, I am also revisiting a lot of code. Please remember that rsyslog is rooted in the sysklogd package (and we have always tried to keep it quite compatible with it). When I finished removing references to the selector_t (struct filed) entries from the modules, I came across a place in the forwarding driver where the message element is accessed. You can look up that code in cvs (omfwd.c, line 597 and below).

This code implements the -h option, which stops forwarding messages when they did not originate from the local host. The intention of that option probably is to avoid a death spiral, which could be caused by two systems sending syslog messages back and forth (this scenario is actually even covered in RFC 3164, so it seems to happen from time to time…).

However, the code in sysklogd relies on hostnames to prevent that behaviour. If the hostname is different from the current hostname, then we have a remotely received message. I question whether that check is always reliable (besides, it is not working right at the moment ;)). If that functionality is actually needed, it would be far better to check the message’s target IP address against the local addresses (probably a lot of work, but definitely doable).

The question is whether such a feature is actually needed, and whether it is needed in the output driver. To me, it sounds like a natural filter condition (“selector does not apply if host == non-localhost”). If that feature is required, it would probably be better to build it into filtering than into a (single) output module.

But again, the question is: do we really need to provide this functionality? Or is it an artifact of times long gone?

Feedback is appreciated (you may also use the rsyslog forum, if you like).

rsyslog progress on 2007-07-27

I made big progress, even though the work log does not seem to indicate it. The issues I worked on were quite complex. And, most frustratingly, there was no simple way to even compile rsyslog until the change was completed. So I hacked for about 6 hours without any feedback on the effect. Of course, after the first compile things were really bad. But over time, I managed to fix the bugs. Now I am quite happy with the result. The output module interface really begins to materialize. The next big thing is handling of configuration system line directives ($-lines). Stay tuned…

The work log for Friday:
– released 1.17.4
– added omsr object (objomsr.c, objomsr.h) – template request for output
modules
– changed doAction() interface
– templates and output string generation for doAction() is now fully
– removed selector_t f references from output modules
– MILESTONE reached: no more access to selector_t from any module, at
least at this layer we communicate via clean interfaces. However, there
remains the topic of global variable access and calling to functions
housed somewhere else (e.g. in syslogd.c). A new code review is now due,
many changes happened, many TODO’s added.

yesterday’s rsyslog changes

During the large rsyslog modularization effort, I am keeping a more detailed audit log of what I am doing. I hope that this log will allow others both to follow the progress and to understand what I am doing. I was not sure (and I still am not) where to post that log. I’ve now decided to post it to my blog, because it doesn’t look really suitable for the “official” rsyslog site.

Please note that the work audit contains more detail than the ChangeLog. This is intentional. The ChangeLog shall provide the average user with an idea of what’s new. My audit here provides finer-grained information for those who are really interested in it.

Here come yesterday’s changes. They are listed in the order in which I made them.

– applied patch from mildew to avoid zombies
– applied patch from Michel Samia to fix compilation when NOT
compiled for pthreads
– implemented onSelectReadyWrite() interface
– MILESTONE reached: no more access to f->f_un in syslogd.c
– shuffled code from tcpsyslog.c to omfwd.c. It looks like it belongs more
to that file. But we need to look at it some time later. The move was
absolutely necessary so that no access to f->f_un happened in
tcpsyslog.c (which was evil)
– MILESTONE reached: no more access to f->f_un from non-output modules
– changed doAction() interface to include module data pointer
– removed references to f_un from omusrmsg.c
– changed module template for parseSelectorAct() [code reduction,
consistency]
– removed references to f_un from ommysql.c
– removed references to f_un from omfwd.c
– removed references to f_un from omshell.c
– removed references to f_un from omfile.c
– MILESTONE reached: f->f_un has gone away!
– removed f_type from omshell.c, omdiscard.c, omusrmsg.c, ommysql.c
– removed f_type from syslogd.c/cflineParseFileName()
– fixed bug in omfile.c which could lead to invalid addressing if “-” was
given to not sync file
– removed f_type from omfile.c
– implemented needUDPSocket() interface
– replaced (mis) use of f_prevcount in omfwd.c -> now data element in
instance data is used for retry counting
– removed f->f_type from syslogd.c, omfwd.c
– removed f->f_file from omfwd.c, omfile.c
– f->f_flags has gone away
– changed doAction() interface to contain the full message string
– f_iov and its handling has been removed
– added IDs to selector_t

If you are interested in even more details, you can go to the rsyslog cvs and see the changes on a file-by-file basis.

As the day closed, I identified a problem with the current interface definition: Modules need to access the template pointer in selector_t. They may even need to have multiple templates (e.g. dynaFiles, a hypothetical email action [subject and message text]). I need to address this soon.
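
One way to picture such a multi-template request is an object in which a module lists all the template names it needs. The sketch below is purely illustrative: tpl_request_t and its functions are hypothetical names, not the interface rsyslog ended up with, and the template names in the usage note are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request object: a module lists the templates it needs. */
typedef struct {
    size_t n;        /* number of requested templates */
    char **names;    /* one template name per entry, NULL until set */
} tpl_request_t;

/* Create a request for n templates; returns NULL on allocation failure. */
tpl_request_t *tpl_request_create(size_t n)
{
    tpl_request_t *r = calloc(1, sizeof *r);
    if (r == NULL)
        return NULL;
    r->names = calloc(n ? n : 1, sizeof *r->names);
    if (r->names == NULL) {
        free(r);
        return NULL;
    }
    r->n = n;
    return r;
}

/* Store a copy of the template name for entry idx; 0 on success. */
int tpl_request_set(tpl_request_t *r, size_t idx, const char *name)
{
    char *copy;

    assert(r != NULL && name != NULL);
    if (idx >= r->n)
        return -1;
    copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    free(r->names[idx]);          /* replace an earlier name, if any */
    r->names[idx] = copy;
    return 0;
}

/* Free all names, the name array, and the object itself. */
void tpl_request_destroy(tpl_request_t *r)
{
    if (r == NULL)
        return;
    for (size_t i = 0; i < r->n; i++)
        free(r->names[i]);
    free(r->names);
    free(r);
}
```

A hypothetical email action would create a request with two entries, say "MailSubject" and "MailBody", and the core would then hand back one generated string per entry instead of the module reaching into selector_t.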

I hope this audit is useful. Yesterday’s changes will be released as 1.17.4 this morning. After that, I will continue to work on modularization.

rsyslog approved as Fedora 8 feature

Man, I was so busy, I didn’t even notice that the Fedora steering committee approved the rsyslog feature for Fedora 8. The rsyslog feature page in the Fedora Wiki is an interesting read. I am quite happy with the state of affairs. Most importantly, rsyslog is receiving a lot of testing now, and new bug reports and patches come in each day. This helps to make rsyslog rock-solid and feature-rich, just as it should be.

rsyslog output module interface

Rsyslog’s output module interface begins to materialize. I have even begun to restructure the code modules, which currently mostly means shifting code to different places. However, there is much more behind this code-shuffling. I’ve been thinking for quite a while about the modularization of rsyslog. What happens now is the result of this thinking. In the end, we will have output modules running on independent threads, each being able to queue data when the output for some reason is suspended (e.g. the remote syslog server it sends data to is unavailable). And, of course, the module interface will also support plug-ins.

The current MySQL action will become such a plug-in. I need to find a way to tell current users how to migrate to the loadable module interface. I guess I’ll add a dummy statement like

$ModLoad MySQL

to the current configuration. Well, yes, let’s do that – I’ve created a feature tracker as I write down this blog entry.

The only effect it will have in current code is that it tells the config engine that the user cared about modules. In builds that will later support loadable modules, it will actually load the mysql plugin. For now, its only function will be to warn users who have not yet added it that they should do so. That should take care of a smooth transition.
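
The transition check could be as simple as the following sketch. It is an illustration only: cfg_scan_t, scan_line() and needs_modload_warning() are hypothetical names, and I assume here that the directive is matched case-insensitively and that a ">" in a non-comment line marks a database action, as in the ">database-writer" example.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical scan state collected while reading the config file. */
typedef struct {
    int modload_mysql_seen;   /* "$ModLoad MySQL" was present */
    int mysql_action_seen;    /* at least one ">..." database action */
} cfg_scan_t;

/* Inspect one configuration line and update the scan state. */
void scan_line(cfg_scan_t *s, const char *line)
{
    assert(s != NULL && line != NULL);
    while (*line == ' ' || *line == '\t')
        line++;                               /* skip leading whitespace */
    if (strncasecmp(line, "$ModLoad", 8) == 0) {
        const char *arg = line + 8;
        while (*arg == ' ' || *arg == '\t')
            arg++;
        if (strncasecmp(arg, "MySQL", 5) == 0)
            s->modload_mysql_seen = 1;
    } else if (line[0] != '#' && strchr(line, '>') != NULL) {
        /* assumption: '>' introduces a database action on this line */
        s->mysql_action_seen = 1;
    }
}

/* Returns 1 if the user should be told to add "$ModLoad MySQL". */
int needs_modload_warning(const cfg_scan_t *s)
{
    assert(s != NULL);
    return s->mysql_action_seen && !s->modload_mysql_seen;
}
```

After the whole file has been scanned, a single call to needs_modload_warning() decides whether to print the hint, so the order of the lines in the file does not matter.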

astronomy talk for kids

Life is not just about programming. Today, I took some time off to give an astronomy talk to elementary school kids. Their teacher had approached me some time ago and asked if I could give that talk at the end of their school excursion. Of course I could :)

It was quite a basic talk about the sun, moon and stars, with a focus on understanding our place in the solar system. Of course I covered all the nice planets and especially focused on Saturn (of course, because I am a SOC member ;)). We were very lucky with the weather. Around noon, there was pouring rain and clouds, clouds, clouds. When I arrived at the school (they came over to our elementary school in Grossrinderfeld), the rain ceased somewhat.

We prepared and off went the talk. The kids were very interested and obviously had fun. And, believe it or not, by the time the talk was over, there was bright sunshine. So I could bring out my PST solar telescope and the kids could have a great look at our mother star (of course, the teachers liked it too).

To conclude the event, I handed out NASA stickers (ESA doesn’t provide me with any ;)) and left a happy crowd.

Did I say this is a very rewarding activity? ;)