modules, core functionality and rsyslog v3…

As I have written, I have begun to work on rsyslog v3 (and so far I am pleased to say that I have made quite good progress on it). One of the things about rsyslog v3 is that it will have an even more modular architecture, utilizing loadable modules for almost everything. I asked on the mailing list about backward compatibility and received this very good response from Michael Biebl:

One thing I was wondering:

If you intend to shift all (even core) functionality into loadable modules, how do you handle things like --help or available command line options like -m?

Do you want to hardcode it, or will you provide an interface where rsyslog can query the module for its help message and available options?

I’m also still a bit uncertain whether moving everything, or even just core functionality, to modules is a good idea (for problems you already mentioned). IMHO, having all core functionality in a single binary is simply much more robust and foolproof. For things like the SQL db output plugin, the module interface is great, because it avoids pulling in large library and package dependencies and allows installing them on an as-needed basis. For other functionality I have yet to see the benefits.

Rainer, could you roughly sketch how you envision breaking rsyslog into loadable modules in v3? Which kind of functionality would be loadable as a module, and which functionality do you plan to keep in the rsyslogd binary? A listing of all (planned) modules plus the provided functionality and requirements would really help.

Another thing: say you move the regexp support into a separate module. If a regexp is then used in rsyslog.conf, will you bail out with an error, simply print a warning (which could go unnoticed, leaving the poor administrator wondering why his regexp doesn’t work), or load the module on demand?

For the latter you’d need some kind of interface to query the *.so files for their supported functionality. I.e. each module would export a list of the config directives it supports, and rsyslog could query each available module upon startup and create a map.

So, e.g. the ommysql module would export its support for the :ommysql: config directive. Whenever rsyslog finds such a config directive, it could load the module on demand.

The same could be done for the command line parameters. The imklog module would export that it supports the -m command line parameter. Whenever that command line parameter is used, rsyslog would know which module to load.

These are only rough ideas and there is certainly still much to consider. But what do you think about the basic idea?

This is a great response – it not only asks questions but also offers some good solutions. It comes at a perfect time, too, because there is much that is not yet finalized for v3. For sure I have (hopefully good ;)) ideas, but all of them need to be proven in practice. The issues that come up here are a good example.

So, now let me give a rough sketch of what I envision v3 will do. Note that it is what I envision *today* – it may change if I am given good reasons for change and/or smarter solutions.

First, let me introduce two blog posts which you may want to read before continuing here:

And, most importantly, this post already has the root reasoning for pushing things out of the syslogd core:

Let me highlight the two most important parts from that latter post:

This is exactly the way rsyslog is heading: we will try to provide an ultra-slim framework which offers just the basic things needed to orchestrate the plug-ins. Most of the functionality will indeed be available via plug-ins, dynamically loaded as needed.

With that design philosophy, we can make rsyslog really universally available, even on low-powered devices (loading just a few plug-ins). At the high end, systems with a lot of plug-ins loaded will be able to handle the most demanding tasks.

And this is actually what the v3 effort is all about: rsyslog should become as modular as possible, with the least amount of code in the core linked binary and everything else provided via plugins. I still do not know exactly how that will happen; I am approaching it incrementally. I am currently working on the input plugins and trying to get them right.

In the longer term, there will be at least three different types of plugins: output, input and “filter”. I think I do not need to elaborate on the first two. Filter plugins will work together with expressions, another feature to come. Expressions will enhance the template and filter system to provide a rich expression capability supporting function calls. For example, a template may look like this in a future release:

$Template MyTemplate, substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/"

and a filter condition may be

:expr:substr(MSG, 5, 10) == "error" /var/log/errorlog

Don’t bash me for the config format shown above; that will also change ;)

Regexp functionality will then be provided by something like a regexp() function. Functions will be defined in loadable modules. Pretty much no function will be in the core. A module may contain multiple functions.
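For illustration only (the module name, the function name and the syntax are all made up here and will surely change), a regexp-based filter might then look roughly like this:

$ModLoad fmregexp                             # hypothetical module providing the regexp() function
:expr:regexp(MSG, "panic|fatal")  /var/log/critical

The point is simply that the function itself lives in a module; if that module is not loaded, the expression cannot be evaluated.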

Bottom line: almost everything will be a loadable module. If you do not load modules, rsyslog will not do anything useful.
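Just to illustrate what that means in practice (the module names below are made up for this sketch, nothing is final): even a plain local-logging setup would explicitly pull in its building blocks, roughly like this:

$ModLoad imuxsock            # local log socket input (illustrative name)
$ModLoad imklog              # kernel log input
$ModLoad omfile              # write-to-file output (illustrative name)
*.info                       /var/log/messages

Without the three $ModLoad lines, the selector line at the bottom would have nothing to receive messages from and nothing to write them with.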

Now a quick look at the command line options: I don’t like them. Take -r, for example. Sure, it allows you to specify a listener port and also conveys that a listener should be started at all. But what about multiple instances? What about advanced configuration parameters? I think command line options are good for simple cases, but rsyslog will provide much more than can be done with simple cases. I favor replacing all command line options with configuration file directives. That is the right place for them. The exception is, of course, such things as where to look for the master configuration file.
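As a purely hypothetical example (the module and directive names are made up here, none of this is implemented), -r could turn into something along these lines, which naturally extends to multiple listeners and further parameters:

$ModLoad imudp               # hypothetical UDP syslog input module
$UDPServerRun 514            # first listener instance
$UDPServerRun 10514          # a second instance, not possible with a single -r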

Which brings up backward compatibility. As you know, I have begun to puzzle over that. After all, rsyslog is meant to be a drop-in replacement for sysklogd. That means it should run with the same options as sysklogd – and should also enable administrators to build on their sysklogd knowledge. Tough call.

Thankfully, sur5r introduced the idea of having a compatibility mode. He suggested looking at the absence of an rsyslog.conf file and concluding from that that we need to run in compatibility mode. That probably is a good suggestion, and I will pick it up. It can also be extended: how about, for example, a “-c” command line switch? If absent, it tells rsyslog to use compatibility mode. And it is absent in previous versions as well as in sysklogd, because it was never defined there.
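To spell that idea out (purely illustrative, no such switch exists yet):

rsyslogd             # no -c given: assume sysklogd-compatible behavior
rsyslogd -c          # -c given: native v3 mode, everything driven by rsyslog.conf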

Now let’s think. If we know we need to provide compatibility, we can load a plugin implementing the compatibility settings (again, moving that out of the core functionality). Once loaded, it could analyze the rest of the command line and load whatever modules are necessary to make rsyslogd correctly interpret a post-v3 configuration file. That way we have a somewhat larger than necessary memory footprint, but everything works well.

Then back to native mode. Here, indeed, I’d expect that the user loads each and every module needed. I assume, however, that for any typical package the maintainer will probably load all “core” functionality (like write-to-file, user messages, several inputs, common filter functions, …) right there in the default rsyslog.conf. This makes sense for today’s hardware. It also makes the config quite foolproof. A good way to implement that would be to build on the semantics of $IncludeConfig. How about:

$ModLoad /wherever/necessaryplugins/

which would load all plugins in that directory.

The key point, however, is that in a limited environment the very same binaries can be used. No recompilation required. Such scenarios include e.g. embedded devices, or security-sensitive environments where only those components that are absolutely vital should run (which is good practice, because it protects you from bugs in code that is not loaded).

I personally find it OK to handle the situation as described above. I don’t like magic autoloading of modules.

This modular approach also has great advantages when it comes to maintaining the code and making sure it is as bug-free as possible. Modules tend to be small and should be independent of each other. So testing and finding/fixing bugs that escaped testing should be considerably easier than with the v2 code base. There are also numerous other advantages, but I think that goes too far for this post…

Comments are appreciated, especially if you do not like what I intend to do. Now is the time to speak up. A few weeks from now, things will probably have evolved too far to change some of the basics.

Shuttle Tanking Test – good results, launch date affected?

Unfortunately, I was not able to view the full news briefing (I hadn’t expected it to happen that soon). But I know the most important facts and would like to pass them on here. Space shuttle program manager Wayne Hale appeared at the briefing.

Mr. Hale said that the problem is with the “feedthrough” connector. This is a connector on the external tank that connects the in-tank ECO sensors with the lines that go to the point sensor box inside the space shuttle. That connector fails when put under thermal stress, causing some circuits to become open. This is a very rough description, but I hope it helps you get the idea. Mr. Hale said: “We know it is in the connector, but not if it is in the inside or outside part.”

How today’s test results will affect the launch schedule is not yet clear – the results were somewhat unexpected (I am NOT quoting Hale here) and NASA is now checking which forward actions to take (this, however, is a Hale quote ;)). One of the exact quotes I was able to get hold of: “The program is being asked to assemble the ISS. We’d like to do that as quickly as we can in regard to safety. We don’t need to go fly if it is not safe. We will follow that trail and see where it leads us. And when we have fixed it we’ll go fly – no matter if it is Jan 10 or Feb 10th or whenever.”

To me, the bottom line is that the January 10th launch date is now in question. We will know for sure only in a few days, when the data gathered has been analyzed and a proper plan has been crafted. After all, this is why such launch schedules are always “no earlier than”. And I think it is a good thing to put safety first!

With that, I conclude my report for today. I’ll try to do a sum-up tomorrow but guess I’ve already covered the most important things.

rsyslog changes for 2007-12-17

Yesterday’s rsyslog changes:

2007-12-17
– fixed a potential race condition with enqueueMsg() – thanks to mildew
for making me aware of this issue
– created thread-class internal wrapper for calling user supplied thread
main function
– solved an issue when compiling immark.c on some platforms. LARGEFILE
preprocessor defines are changed in rsyslog.h, which causes grief
for zlib. As a temporary solution, I have moved rsyslog.h right at the
beginning of the include order. It’s somewhat dirty, but it works. I think
the real solution will be inside the autoconf files.
– moved thread termination code out to threads.c
– implemented $MarkMessagePeriod config directive
– command $ResetConfigVariables implemented for immark.c
– begun imklog, replacing klogd.c (finally we get rid of it…)
– implemented $DebugPrintKernelSymbols
– implemented afterRun input module interface function
– implemented $klogSymbolsTwice config directive

As you can see, it was quite a busy day. The input module interface has already materialized for the most part.

Space Shuttle Atlantis Tanking Test in Progress

Space Shuttle Atlantis at the pad during the December 18th tanking test in support of ECO sensor troubleshooting.

Today’s tanking test gathers additional data on ECO sensor (What is an ECO sensor?) failures, with the ECO sensor circuits being fully instrumented. In order to gather data, an ECO sensor failure is needed. Before the start of the tanking test, it is anticipated that ECO sensor #3 will fail wet and all others will work correctly.

2:08a: The tanking test just went into fast fill mode. During the last launch attempt, ECO sensor #3 failed only a few minutes into fast fill mode.

8:14a: If I got the commentator right, there already is a failure of a sensor. That would be good.

8:21a: NASA TV commentator: “It appears that #1 sensor has failed, but the hydrogen console has not officially made that call. So they are still looking at it. Number 2 sensor has been intermittent. The engineers at the main propulsion system console are very intently looking at their data right now. The intermittent data on sensor #2 is somewhat of a surprise because we haven’t seen that before on that sensor.”

So as it looks, the ECO sensor system is again behaving differently from the last tanking, which took place during the December 9th launch attempt.

8:23a: NASA TV: “The propulsion console has provided confirmation that sensor #1 on liquid hydrogen has hard failed, which is what we wanted. The #2 and #3 are intermittent, we are watching for what those are going to do. We have not seen any activity of number 2 sensor on prior tankings.”

8:26a: NASA TV: “The hard fail on #1 is definitely what we need in order to do the troubleshooting at the pad later this morning.”

So far, this looks very good. Based on analysis done in the past days, sensor #1 is not expected to return to a non-failed state.
So this one will hopefully be a good candidate for the troubleshooting, which will then be able to pinpoint the culprit. The intermittent failures of #2 and #3 are outside of expectations, at least as far as I understood the analysis done. However, if they now remain functional, there still may be a good explanation for that. Let’s see how things evolve…

9:32a NASA commentary just announced that engineers have made up their troubleshooting plan. “The TDR sensor equipment will be focussed on sensors #2 and #3 as they have been intermittent and it is the greatest source of interest to see where that intermittent reading is coming from. Number 1 and 4 will be recorded continuously. … Right now they are configuring for the console activity that would be required once we go into stable replenish … very shortly. … So the final inspection team is preparing to enter into the pad, but we probably have another half an hour or so before we are into stable replenish.”

9:48a NASA TV: “The liquid hydrogen tank is full now and in stable replenish. The liquid oxygen (LOX) is at 80%. We’ve probably about another 25 to 30 minutes or so before liquid oxygen is in stable replenish.”

10:00a: NASA TV: “We will begin to drain the tank between 1p and 2p and once the tank is drained the teams go back to do additional trouble shooting. A lot of data has already been collected on sensor #1 because it failed early.”

10:13a: “The problem with the liquid ox pump appears to be that a fuse has blown, so they are switching to the backup LOX pump, … which will get us back to where we were, close to stable replenish. This will take around 45 minutes to an hour. Why the fuse blew is still not clear.” So we have some delay in the troubleshooting plan.

11:45a: I needed to go out for a while, thus no updates. Atlantis is now in stable replenish state and troubleshooting is right now happening at the pad. The NASA TV commentary just announced that teams are now looking at ECO sensor #4 and the 5% sensor. Here are some pictures from the testing:

The picture above shows a room right inside the mobile launcher platform. This is where today’s data is analyzed.

11:52a: NASA TV “The … team is now on the mobile launcher platform and gathering data on sensor number 2 and 3.”

11:53a: a side note: the ISS spacewalk has just completed; the spacewalkers are back in the airlock, which is now being repressurized. They did not find any obvious problem, but collected samples to be returned to Earth in January on board STS-122.

12:06p: NASA TV: “Data will be collected remotely during de-tanking.” The equipment is currently being set up for this process. People are permitted close to the space shuttle only during stable replenishment phase (which essentially means no tank or detank operation is taking place). During detanking, they must leave for safety reasons. The set up process can be seen in this picture:

6:12p: and now those members at the controls can be seen leaving the control room. Everything is now set up for remote monitoring. The NASA TV commentary tells that good data has been gathered. The tank will be drained to 5% while instrumentation is on. Once it is fully drained, members of the red crew will return for further troubleshooting.


And in this picture, a wire can be seen. This wire is tapped into the ECO sensor circuits and connects to the time domain reflectometer (TDR) equipment that is used to do a detailed analysis of the circuits. TDR is an off-the-shelf technology in widespread use, e.g. by cable companies to detect faulty lines.


around 12:25p: I was too slow for a quote, but NASA TV commentary stated that everyone seems to be quite happy with the data gathered. It sounded like a successful test. The commentator also announced that more – and official – information will be available in the news briefing later today. For me (being six hours ahead), that probably is too late, but I’ll then do a wrap-up tomorrow morning. Let’s hope that they have actually managed to find the culprit and nail it down, and also let’s hope that the rest of the day (the detanking test) goes well.

12:40p: it looks like the red crew is now leaving the pad after finishing the late inspections. I noticed they used binoculars and telescopes during that process. I (now) think they also visually checked for ice buildup and maybe some other things. Probably just part of the usual procedure. Here, the red crew can be seen leaving the mobile launcher platform (the last frame I was able to capture, sorry for the blur…).


The NASA TV commentator announced a few minutes before that engineers are ready for detanking once the red crew leaves. So I assume detanking will begin shortly.

12:48p: NASA TV: “The final inspection team has now left the pad … In the mean time we had a failure of ECO sensor #3, the second sensor that had failed. It was intermittent … and we are collecting data on sensor #3 … right now which is able to be done remotely much as we got the initial data from sensor #1 when it failed earlier today.”

I didn’t get the full quote, but as far as I understood, sensor #3 now also has a hard failure. That, IMHO, would be somewhat bad news, because it means that the condition will change once the tank is loaded. However, this is just my uninformed guess. Also, on an actual launch, the shuttle would be long gone and flying in orbit, so that may not mean anything at all. I should stop speculating ;)

12:52p: NASA TV: “We are close now to starting the external tank draining operation, and we did indeed get the kind of data that we needed to get to the (bottom of?) that trouble.”

12:58p: NASA TV: “We did see sensor #3 also fail, so we try to get some data from it before we start draining.” So it actually has a hard fail, and NASA uses the opportunity to get something out of it. To me this sounds like it failed just a few minutes ago; otherwise the red team would have looked at it (wouldn’t they?).

01:00p: NASA TV: “on the recently failed ECO sensor #3” – and indeed, it recently failed. Right in time, one could say…

3:00p: as I wrote, I have been away from my computer for a while. NASA TV has ended coverage of the tanking test in the meantime, so I cannot provide any more status updates. I’ll try to do another post after the news briefing, but that will probably be tomorrow (I hope I’ll be able to get hold of an archived version of the news briefing). All in all, it looks like an excellent day at Kennedy Space Center, with good results that should reveal the root cause of the ECO sensor issue. Let’s hope I have read today’s events correctly.

Thanks to everyone who cared to read my post! I hope it has been useful and will remain useful as a reference.

I’ve now also written down some results from the post-test news briefing.

Tracking file deletions on Windows

Have you ever wondered why an important file magically disappeared? My co-worker Andre has worked with a couple of folks who didn’t like that scenario. As a corporate policy, deletions in some important file locations must be logged. Andre has created a nice guide for doing this under Windows. He utilizes MonitorWare Agent’s event log monitoring capabilities together with its advanced rule engine.

That guide is not only a good source of information if you need to implement Windows file deletion tracking. It also shows nicely what can be done with MonitorWare. It definitely helps you understand the full potential and how to make the best use of it.

I suggest you have a quick look at Andre’s guide to tracking file and directory deletions under Windows.

ISS SARJ inspection spacewalk has begun

Picture taken shortly after the beginning of the ISS SARJ inspection spacewalk on December 18th, 2007.

The International Space Station (ISS) crew has headed outside of the orbiting complex to check the contamination of the starboard solar array rotary joint (SARJ). They are performing this task while I am writing. First results are expected during the course of today, with a detailed analysis to follow some time later (depending on the findings).

Side note: the Atlantis tanking test will begin roughly an hour from now. Final preparations are underway.

Tuesday: ISS Spacewalk and Shuttle Fueling Test…

Tomorrow is a busy day for NASA – both on Earth and in orbit. It seems to be “troubleshooting Tuesday”: the International Space Station crew performs a spacewalk to check out what is wrong with the orbiting laboratory, while the ground crew at Kennedy Space Center checks out its supply vessel. Both activities are in support of the International Space Station program. Let’s hope everything turns out well.

But now let me quote a mail that I received from NASA; it is an excellent wrap-up of tomorrow’s activities, including ways to experience them first hand:

NASA Television will provide simultaneous live coverage of a spacewalk by the International Space Station crew and a shuttle fueling test at NASA’s Kennedy Space Center, Fla., on Tuesday, Dec. 18.

Expedition 16 Station Commander Peggy Whitson and Flight Engineer Dan Tani are set to venture outside the station at about 6 a.m. EST to perform a detailed inspection of a giant rotary joint where contamination was found last month. The joint is used to rotate the starboard solar arrays of the complex to face the sun. The astronauts also will devote part of the spacewalk to an inspection of a device that tilts the starboard arrays toward the sun. The device, known as a Beta Gimbal Assembly, experienced unrelated electrical problems last weekend.

NASA TV’s public channel will begin coverage of spacewalk activities at 4:30 a.m. A briefing will follow the spacewalk, originating from NASA’s Johnson Space Center, Houston, no earlier than 1:30 p.m.
Reporters will be able to ask questions from participating NASA sites. The briefing participants are:

— Mike Suffredini, International Space Station Program manager
— Ginger Kerrick, International Space Station spacewalk flight director
— Tomas Gonzalez-Torres, International Space Station spacewalk officer

At 7 a.m. EST, Tuesday, NASA TV’s media channel will begin coverage of a fueling test of space shuttle Atlantis at Kennedy’s Launch Pad 39A. The fueling test will assist engineering efforts to resolve a problem with an engine cutoff sensor system that prevented Atlantis’ launch attempts earlier this month. Reporters will be notified of any plans to hold a news briefing following the test’s conclusion.

For NASA TV streaming video, schedules and downlink information, visit:

http://www.nasa.gov/ntv

For more information about the space station and the Expedition 16 crew, visit:

http://www.nasa.gov/station

For more information about space shuttle Atlantis’ upcoming STS-122 mission, visit:

http://www.nasa.gov/shuttle

rsyslog changes up to 2007-12-14

This is my worklog for rsyslog:

2007-12-12
– begun to shuffle the mark code to a separate module – that will take some
time and definitely require much more code shuffling. This is the beginning
of the input module interface

2007-12-14
– created new branch for what will become 2.0.0 stable
– begin work on immark, the first input module. In the long term
this will lead to a complete rewrite of the input system
– changed license to GPLv3 (for what is to become rsyslog v3)
– moved core threading helpers out of syslogd.c
– remove USE_PTHREADS macro from all sources except omfwd.c (I wait
for a gssapi patch from Red Hat, removing these macros would probably
cause unnecessary grief…)
– tried an approach to terminate the input module thread via pthread_kill() – so
far, seems to work ok
– begun to create input module interface and macros
– changed module interface to include function to query type
– milestone: can load input module dynamically, but can not do anything
with it – now I need to think about activating IMs…

STS-122: Atlantis being instrumented…

Space shuttle Atlantis stands on Launch Pad 39A at NASA's Kennedy Space Center in Florida. Photo credit: NASA/Cheryl Mansfield

Troubleshooting efforts at Kennedy Space Center (KSC) continue: space shuttle Atlantis has been instrumented for a tanking test scheduled to begin Monday. In parallel to the work at KSC, engineers at other NASA centers gather data about how the elements of the ECO sensor system should respond during the tanking test. This provides a baseline against which the actual tanking test results can be compared. Unfortunately, it is not certain whether the failure will re-occur during the tanking test. But according to the latest findings, it is quite likely.

Please also let me quote the relevant part of NASA’s shuttle home page:

Engineers and technicians at NASA’s Kennedy Space Center continue preparations to evaluate the hydrogen fuel sensor system on space shuttle Atlantis’ external fuel tank during a procedure next week. Working at Launch Pad 39A where Atlantis remains pointed to space, workers attached wiring to the cables that lead from the aft compartment of Atlantis to the external tank’s engine cutoff sensor system. Engineers will use the special instruments next Tuesday to send electrical pulses into the wiring and look for indications that will show the location of the issue that caused the sensors to return false readings last week. The failed readings showed up during launch countdowns on Dec. 6 and Dec. 9. Launch controllers postponed the liftoff on both occasions to find out the problem and develop solutions. Evaluations of the instruments themselves are also under way to show technicians what a normal reading on the external tank looks like. Those readings will be compared to the results from the test Tuesday during which the tank attached to Atlantis will be filled with super-cold liquid hydrogen. NASA is targeting Jan. 10 as the next possible launch opportunity for Atlantis on mission STS-122. Atlantis will carry the European-built Columbus laboratory to the International Space Station.

STS-122 now set to Launch January, 10th 2008

The target launch date for space shuttle Atlantis’ STS-122 mission to the International Space Station has now been moved to January 10th. Originally, it was set for January 2nd, after an ECO sensor problem made it impossible to lift off during the December 2007 launch window. The January 10th date has now been selected to allow NASA workers to get some rest. They have been extremely busy. The holiday period is a perfect time to make sure everybody is in great shape for the next launch attempt.

As far as I know, launching exactly on January 10th will not affect the overall shuttle launch schedule for 2008. However, I suspect even a further one-day delay would mean trouble for the flight plan.

And finally, this is what the NASA shuttle home page has to say:

NASA’s Space Shuttle Program managers have targeted Jan. 10 for the launch of shuttle Atlantis’ STS-122 mission to the International Space Station.

“The workforce has stepped up to and met every challenge this year,” said Wayne Hale, Space Shuttle Program manager at NASA’s Johnson Space Center. “Moving the next launch attempt of Atlantis to Jan. 10 will allow as many people as possible to have time with family and friends at the time of year when it means the most. A lot has been asked of them this year and a lot will be asked of them in 2008.”

The liftoff date from NASA’s Kennedy Space Center, Florida, depends on the resolution of a problem in a fuel sensor system. The shuttle’s planned launches on Dec. 6 and Dec. 9 were postponed because of false readings from the part of the system that monitors the liquid hydrogen section of the tank.