rsyslog changes for 2007-12-17

Yesterday’s rsyslog changes:

2007-12-17
– fixed a potential race condition with enqueueMsg() – thanks to mildew
for making me aware of this issue
– created thread-class internal wrapper for calling user supplied thread
main function
– solved an issue when compiling immark.c on some platforms. LARGEFILE
preprocessor defines are changed in rsyslog.h, which causes grief
for zlib. As a temporary solution, I have moved rsyslog.h right at the
beginning of the include order. It’s somewhat dirty, but it works. I think
the real solution will be inside the autoconf files.
– moved thread termination code out to threads.c
– implemented $MarkMessagePeriod config directive
– command $ResetConfigVariables implemented for immark.c
– begun imklog, replacing klogd.c (finally we get rid of it…)
– implemented $DebugPrintKernelSymbols
– implemented afterRun input module interface function
– implemented $klogSymbolsTwice config directive

As you can see, it was quite a busy day. The input module interface has already materialized for the most part.
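To give a rough idea of how the new directives from this worklog might look in rsyslog.conf, here is a minimal sketch. The directive names are taken from the list above; the `$ModLoad` lines, the on/off values, and the period value are assumptions for illustration and may not match the syntax of the final release.

```
# Hypothetical rsyslog.conf fragment; all values are illustrative assumptions

# assumed way of loading the mark message input module
$ModLoad immark
# assumed unit: seconds between mark messages
$MarkMessagePeriod 600

# assumed way of loading the kernel log input module
$ModLoad imklog
# assumed to be simple on/off switches
$DebugPrintKernelSymbols off
$klogSymbolsTwice off

# return config variables to their defaults before the next block
$ResetConfigVariables
```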

Space Shuttle Atlantis Tanking Test in Progress

Space Shuttle Atlantis at the pad during the December 18th tanking test in support of ECO sensor troubleshooting.

Today’s tanking test gathers additional data on ECO sensor (What is an ECO sensor?) failures with the ECO sensor circuits being fully instrumented. In order to gather data, an ECO sensor failure is needed. Before the start of the tanking test, it is anticipated that ECO sensor #3 will fail wet and all others will work correctly.

2:08a: The tanking test just went into fast fill mode. During the last launch attempt, ECO sensor #3 failed only a few minutes into fast fill mode.

8:14a: If I got the commentator right, there already is a failure of a sensor. That would be good.

8:21a: NASA TV commentator: “It appears that #1 sensor has failed, but the hydrogen console has not officially made that call. So they are still looking at it. Number 2 sensor has been intermittent. The engineers at the main propulsion system console are very intently looking at their data right now. The intermittent data on sensor #2 is somewhat of a surprise because we haven’t seen that before on that sensor.”

So it looks like the ECO sensor system is again behaving differently than during the last tanking, which was at the December 9th launch attempt.

8:23a: NASA TV: “The propulsion console has provided confirmation that sensor #1 on liquid hydrogen has hard failed, which is what we wanted. The #2 and #3 are intermittent, we are watching for what those are going to do. We have not seen any activity of number 2 sensor on prior tankings.”

8:26a: NASA TV: “The hard fail on #1 is definitely what we need in order to do the troubleshooting at the pad later this morning.”

So far, this looks very good. Based on analysis done in the past days, sensor #1 is not expected to return to a non-failed state. So this one will hopefully be a good candidate for the troubleshooting, which will then be able to pinpoint the culprit. The intermittent failures of #2 and #3 are outside of expectations, at least as far as I understood the analysis done. However, if they now remain functional there still may be a good explanation for that. Let’s see how things evolve…

9:32a NASA commentary just announced that engineers have made up their troubleshooting plan. “The TDR sensor equipment will be focussed on sensors #2 and #3 as they have been intermittent and it is the greatest source of interest to see where that intermittent reading is coming from. Number 1 and 4 will be recorded continuously. … Right now they are configuring for the console activity that would be required once we go into stable replenish … very shortly. … So the final inspection team is preparing to enter into the pad, but we probably have another half an hour or so before we are into stable replenish.”

9:48a: NASA TV: “The liquid hydrogen tank is full now and in stable replenish. The liquid oxygen (LOX) is at 80%. We’ve probably about another 25 to 30 minutes or so before liquid oxygen is in stable replenish.”

10:00a: NASA TV: “We will begin to drain the tank between 1p and 2p and once the tank is drained the teams go back to do additional trouble shooting. A lot of data has already been collected on sensor #1 because it failed early.”

10:13a: “The problem with the liquid ox pump appears that a fuse has blown, so they are setting to the backup lox pump, … which will get us back to where we were close to the stable replenish. This will take around 45 minutes to an hour. Why the fuse blew is still not clear.” So we have some delay in the troubleshooting plan.

11:45a: I needed to go out for a while, thus no updates. Atlantis is now in stable replenish state and troubleshooting is right now happening at the pad. The NASA TV commentary just announced that teams are now looking at ECO sensor #4 and the 5% sensor. Here are some pictures from the testing:

The picture above shows a room that is right inside the mobile launcher platform. This is where today’s data is analyzed.

11:52a: NASA TV: “The … team is now on the mobile launcher platform and gathering data on sensor number 2 and 3.”

11:53a: a side-note: the ISS spacewalk has just been completed; the spacewalkers are back in the airlock, which is now being repressurized. They did not find any obvious problem, but collected samples to be returned to Earth in January on board STS-122.

12:06p: NASA TV: “Data will be collected remotely during de-tanking.” The equipment is currently being set up for this process. People are permitted close to the space shuttle only during stable replenishment phase (which essentially means no tank or detank operation is taking place). During detanking, they must leave for safety reasons. The set up process can be seen in this picture:

6:12p: and now those members at the controls can be seen leaving the control room. Everything is now set up for remote monitoring. The NASA TV commentary tells that good data has been gathered. The tank will be drained to 5% while instrumentation is on. Once it is fully drained, members of the red crew will return for further troubleshooting.


And in this picture, a wire can be seen. This wire is tapped into the ECO sensor circuits and connects to the time domain reflectometer (TDR) equipment that is used to do a detailed analysis of the circuits. TDR is an off-the-shelf technology in widespread use, e.g. by cable companies to detect faulty lines.


around 12:25p: I was too slow for a quote, but NASA TV commentary stated that everyone seems to be quite happy with the data gathered. It sounded like a successful test. The commentator also announced that more – and official – information is available in the news briefing later today. For me (being six hours ahead), it probably is too late, but I’ll then do a wrap-up tomorrow morning. Let’s hope that they have actually managed to find the culprit and nail it down – and also let’s hope that the rest of the day (detanking test) also goes well.

12:40p: it looks like the red crew is now leaving the pad after finishing late inspections. I noticed they used binoculars and telescopes during that process. I (now) think they also visually checked for ice buildup and maybe some other things. Probably just part of the usual procedure. Here, the red crew can be seen leaving the mobile launcher platform (the last frame I was able to capture, sorry for the blur…).


The NASA TV commentator announced a few minutes ago that engineers are ready for detanking once the red crew leaves. So I assume detanking will begin shortly.

12:48p: NASA TV: “The final inspection team has now left the pad … In the mean time we had a failure of ECO sensor #3, the second sensor that had failed. It was intermittent … and we are collecting data on sensor #3 … right now which is able to be done remotely much as we got the initial data from sensor #1 when it failed earlier today.”

I didn’t get the full quote, but as far as I understood, sensor #3 now also has a hard failure. That, IMHO, would be somewhat bad news because it means that the condition will change once the tank is loaded. However, this is just my uninformed guess. Also, on an actual launch, the shuttle would be long gone and flying in orbit. So that may not mean anything at all. I should stop speculating ;)

12:52p: NASA TV: “We are close now to start the external tank draining operation, and we did indeed get the kind of data that we needed to get to the (bottom of?) that trouble.”

12:58p: NASA TV: “We did see sensor #3 also fail, so we try to get some data from it before we start draining.” So it actually has a hard fail and NASA uses the opportunity to get something out of that. To me this sounds like it failed just a few minutes ago, otherwise the red team would have looked at it (wouldn’t they?).

1:00p: NASA TV: “on the recently failed ECO sensor #3” – and indeed, it recently failed. Just in time, one could say…

3:00p: as I wrote, I have been away from my computer for a while. NASA TV has ended coverage of the tanking test in the meantime, so I cannot provide any more status updates. I’ll try to do another post after the news briefing, but that will probably be tomorrow (I hope I’ll be able to get hold of an archived version of the news briefing). All in all, it looks like an excellent day at Kennedy Space Center, with good results that should reveal the root cause of the ECO sensor issue. Let’s hope that I have the right perception of today’s events.

Thanks to everyone who cared to read my post! I hope it has been useful and will serve as a reference.

I’ve now also written down some results from the post-test news briefing.

Tracking file deletions on Windows

Have you ever wondered why an important file magically disappeared? My co-worker Andre has worked with a couple of folks who didn’t like that scenario. As a corporate policy, deletions in some important file locations must be logged. Andre has created a nice guide for use under Windows. He utilizes MonitorWare Agent’s event log monitoring capabilities together with its advanced rule engine.

That guide is not only a good source of information if you need to implement Windows file deletion tracking. It also shows nicely what can be done with MonitorWare. It definitely helps in understanding its full potential and how to make best use of it.

I suggest you have a quick look at Andre’s guide to tracking file and directory deletions under Windows.

ISS SARJ inspection spacewalk has begun

Picture taken shortly after the beginning of the ISS SARJ inspection spacewalk on December 18th, 2007.

The International Space Station (ISS) crew has headed outside of the orbiting complex to check the contamination of the starboard Solar Alpha Rotary Joint (SARJ). They are performing this task while I am writing. First results are expected during the course of today, with a detailed analysis to follow some time later (depending on the findings).

Side-Note: The Atlantis tanking test will begin roughly an hour from now. Final preparations are underway.

Tuesday: ISS Spacewalk and Shuttle Fueling Test…

Tomorrow is a busy day for NASA, both on Earth and in orbit. Seems to be “troubleshooting Tuesday”: The International Space Station crew performs a spacewalk to check out what is wrong with the orbiting laboratory while the ground crew at Kennedy Space Center checks out their supply vessel. Both activities are in support of the International Space Station program. Let’s hope everything turns out well.

But now let me quote a mail that I received from NASA. It is an excellent wrap-up of tomorrow’s activities, including ways to experience them first hand:

NASA Television will provide simultaneous live coverage of a spacewalk by the International Space Station crew and a shuttle fueling test at NASA’s Kennedy Space Center, Fla., on Tuesday, Dec. 18.

Expedition 16 Station Commander Peggy Whitson and Flight Engineer Dan Tani are set to venture outside the station at about 6 a.m. EST to perform a detailed inspection of a giant rotary joint where contamination was found last month. The joint is used to rotate the starboard solar arrays of the complex to face the sun. The astronauts also will devote part of the spacewalk to an inspection of a device that tilts the starboard arrays toward the sun. The device, known as a Beta Gimbal Assembly, experienced unrelated electrical problems last weekend.

NASA TV’s public channel will begin coverage of spacewalk activities at 4:30 a.m. A briefing will follow the spacewalk, originating from NASA’s Johnson Space Center, Houston, no earlier than 1:30 p.m. Reporters will be able to ask questions from participating NASA sites. The briefing participants are:

— Mike Suffredini, International Space Station Program manager
— Ginger Kerrick, International Space Station spacewalk flight director
— Tomas Gonzalez-Torres, International Space Station spacewalk officer

At 7 a.m. EST, Tuesday, NASA TV’s media channel will begin coverage of a fueling test of space shuttle Atlantis at Kennedy’s Launch Pad 39A. The fueling test will assist engineering efforts to resolve a problem with an engine cutoff sensor system that prevented Atlantis’ launch attempts earlier this month. Reporters will be notified of any plans to hold a news briefing following the test’s conclusion.

For NASA TV streaming video, schedules, and downlink information, visit:

http://www.nasa.gov/ntv

For more information about the space station and the Expedition 16 crew, visit:

http://www.nasa.gov/station

For more information about space shuttle Atlantis’ upcoming STS-122 mission, visit:

http://www.nasa.gov/shuttle

rsyslog changes up to 2007-12-14

This is my worklog for rsyslog:

2007-12-12
– begun to shuffle the mark code to a separate module – that will take some
time and definitely require much more code shuffling. This is the beginning
of the input module interface

2007-12-14
– created new branch for what will become 2.0.0 stable
– begun work on immark, the first input module. In the long term
this will lead to a complete rewrite of the input system
– changed license to GPLv3 (for what is to become rsyslog v3)
– moved core threading helpers out of syslogd.c
– removed USE_PTHREADS macro from all sources except omfwd.c (I wait
for a gssapi patch from Red Hat, removing these macros would probably
cause unnecessary grief…)
– tried approach to terminate input module thread via pthread_kill() – so
far, seems to work ok
– begun to create input module interface and macros
– changed module interface to include function to query type
– milestone: can load input module dynamically, but can not do anything
with it – now I need to think about activating IMs…
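The idea of terminating an input thread by sending it a signal can be illustrated with a minimal sketch. This is not the rsyslog code: the function names, the termination flag, and the choice of SIGUSR1 are all assumptions made for illustration. The sketch sets a flag and then uses pthread_kill() to interrupt a blocking call, so the thread notices the flag and returns from its main function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* All names here are hypothetical; this is not rsyslog's actual code. */
static atomic_int bTerminateInput;

/* Empty handler: its only job is to interrupt a blocking system call. */
static void sigNoop(int sig) { (void)sig; }

/* Stand-in for an input module thread that blocks while waiting for work. */
static void *inputThread(void *arg)
{
    int *pInterrupted = arg;
    struct timespec ts = { 1, 0 }; /* bounded wait: a missed signal only delays shutdown */

    while (!atomic_load(&bTerminateInput)) {
        if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
            *pInterrupted = 1;     /* woken early by the signal */
    }
    return NULL;
}

/* Start the thread, then ask it to stop. Returns 0 on success, -1 on error. */
int startAndTerminateInput(int *pInterrupted)
{
    struct sigaction sa;
    struct timespec settle = { 0, 50 * 1000 * 1000 };
    pthread_t thrd;
    int rc;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigNoop;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    *pInterrupted = 0;
    atomic_store(&bTerminateInput, 0);
    if (pthread_create(&thrd, NULL, inputThread, pInterrupted) != 0)
        return -1;

    nanosleep(&settle, NULL);           /* sketch only: let the thread start blocking */

    atomic_store(&bTerminateInput, 1);  /* set the flag first ... */
    rc = pthread_kill(thrd, SIGUSR1);   /* ... then interrupt the blocking call */
    if (rc != 0 && rc != ESRCH)         /* ESRCH: some systems report an already ended thread */
        return -1;
    return pthread_join(thrd, NULL) == 0 ? 0 : -1;
}
```

A caller would pass a pointer to an int and check for a return value of 0. The int is set to 1 if the signal cut the wait short, which is likely here but not guaranteed. Setting the flag before sending the signal matters: the signal only wakes the thread up, the flag is what tells it to stop.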

STS-122: Atlantis being instrumented…

Space shuttle Atlantis stands on Launch Pad 39A at NASA's Kennedy Space Center in Florida. Photo credit: NASA/Cheryl Mansfield

Troubleshooting efforts at Kennedy Space Center (KSC) continue: Space shuttle Atlantis has been instrumented for a tanking test scheduled to begin Monday. In parallel to the work at KSC, engineers at other NASA centers gather data about how the elements of the ECO sensor system should respond during the tanking test. This provides a baseline against which the actual tanking test results can be compared. Unfortunately, it is not certain whether the failure will recur during the tanking test. But according to the latest findings it is quite likely.

Please also let me quote the relevant part of NASA’s shuttle home page:

Engineers and technicians at NASA’s Kennedy Space Center continue preparations to evaluate the hydrogen fuel sensor system on space shuttle Atlantis’ external fuel tank during a procedure next week. Working at Launch Pad 39A where Atlantis remains pointed to space, workers attached wiring to the cables that lead from the aft compartment of Atlantis to the external tank’s engine cutoff sensor system. Engineers will use the special instruments next Tuesday to send electrical pulses into the wiring and look for indications that will show the location of the issue that caused the sensors to return false readings last week. The failed readings showed up during launch countdowns on Dec. 6 and Dec. 9. Launch controllers postponed the liftoff on both occasions to find out the problem and develop solutions. Evaluations of the instruments themselves are also under way to show technicians what a normal reading on the external tank looks like. Those readings will be compared to the results from the test Tuesday during which the tank attached to Atlantis will be filled with super-cold liquid hydrogen. NASA is targeting Jan. 10 as the next possible launch opportunity for Atlantis on mission STS-122. Atlantis will carry the European-built Columbus laboratory to the International Space Station.

STS-122 now set to Launch January 10th, 2008

The target launch date for space shuttle Atlantis’ STS-122 mission to the International Space Station has now been moved to January 10th. Originally, it was set for January 2nd after an ECO sensor problem made it impossible to lift off during the December 2007 launch window. The January 10th date has now been selected to allow NASA workers to get some rest. They have been extremely busy. The holiday period is now a perfect time to make sure everybody is in great shape when it comes to the next launch attempt.

As far as I know, launching exactly on January 10th will not affect the overall shuttle launch schedule for 2008. However, I suspect even a further one-day delay means trouble for the flight plan.

And finally, this is what the NASA shuttle home page has to say:

NASA’s Space Shuttle Program managers have targeted Jan. 10 for the launch of shuttle Atlantis’ STS-122 mission to the International Space Station.

“The workforce has stepped up to and met every challenge this year,” said Wayne Hale, Space Shuttle Program manager at NASA’s Johnson Space Center. “Moving the next launch attempt of Atlantis to Jan. 10 will allow as many people as possible to have time with family and friends at the time of year when it means the most. A lot has been asked of them this year and a lot will be asked of them in 2008.”

The liftoff date from NASA’s Kennedy Space Center, Florida, depends on the resolution of a problem in a fuel sensor system. The shuttle’s planned launches on Dec. 6 and Dec. 9 were postponed because of false readings from the part of the system that monitors the liquid hydrogen section of the tank.

begun working on rsyslog v3

I reproduce a note here that I sent out to the mailing list this morning. In the mean time, I have done most of the work in CVS.

As you know, I am looking at the way threading is supposed to work in future releases and, most importantly, looking at the inputs (like mark message generation).

Around summer, I wrote that I will probably need to release new major versions when we go into multithreading redesign. It looks like we have reached this stage. I tried to keep a single code base that still supports both single- and multi-threaded operations. I have looked into this over the past days and I need to say that it creates a lot of complexity and hard-to-understand code.

For this reason, I think it is finally time to branch the code base and release some new versions.

Soon, I will create a branch for the current 1.20.1 code base. That will only receive bug fixes, but no new development (except, I guess, GSSAPI which is about to be contributed by Red Hat). When we are confident the last changes worked well and introduced no new bugs, there will be a version 2.0.0 stable release based on that code base.

CVS head, however, will then be rsyslog version 3. It will receive the new input module interface. It requires pthreads, because there is no way input modules and many more of the new desired features can be implemented without them. Consequently, I will remove all single-threading code from it, resulting in an easier to understand code base. Please note that I expect this code to change dramatically when it is being modified to be more modular (much like it was when I introduced modular outputs in summer). Please also note that I will apply non-bugfix patches to this code base only.

I have a somewhat bad feeling about going ahead with implementing a more sophisticated and more parallel multi-threading while we still have an issue with the segfault. However, I think by now we did everything imaginable to capture that rare bug. I have come to the conclusion that the best chance to find it is to go ahead and implement the more sophisticated design. That will lead to a review, and rewrite, of much of the code in question, uncovering things we didn’t think about before. The recently discovered race condition is an excellent example.

One thing about the license: rsyslog v2 will stay with the “GPL v2 and above” license, but rsyslog v3 will be licensed under “GPL v3 and above”. I already wrote about that change. It is my firm belief that GPL v3 brings benefit to our freedom to use digital goods. I am a strong opponent of digital restrictions management (DRM) and software patents and I do not like the idea that rsyslog benefits anyone who encourages these things. I hope for your understanding.

I will set the stage now for these changes and will do a web announcement soon. Please don’t be surprised that rsyslog v3 will be available before v2; you now know the reason.

ISS Spacewalk on Tuesday

The International Space Station is viewed from space shuttle Discovery after undocking during the STS-120 mission.

The International Space Station (ISS) crew will put the time until the next space shuttle visits the orbiting complex to good use. A spacewalk is scheduled for next Tuesday. It is part of the ongoing troubleshooting of the Solar Alpha Rotary Joint (SARJ) problem that has troubled the station for some weeks now.

The SARJ issue reduces power generation from the solar array. This is currently no issue, but when more modules are added, it becomes a constraint. The Columbus module, to be delivered by Atlantis whenever STS-122 is ready to launch, can operate with currently available power. However, the Kibo module, rocketed into space with STS-123, will probably exhaust current power availability. As such, it is vital to solve the issue with the rotary joints.

An International Space Station Solar Alpha Rotary Joint (SARJ) shown inside a NASA presentation.

Previous spacewalks found some material on the race ring, a result of abrasion. There is a backup race ring available, but it will not be activated until the root cause of the problem is understood.

And now let me quote the NASA ISS home page:

Station Commander Peggy Whitson and Flight Engineer Dan Tani will perform the 100th spacewalk in support of International Space Station assembly on Tuesday, Dec. 18. The spacewalk will focus on the starboard solar arrays. Whitson and Tani will examine the starboard Solar Alpha Rotary Joint (SARJ) and return a trundle assembly to the station’s interior.

Whitson and Tani also will examine the Beta Gimbal Assembly (BGA). It tilts solar wings for optimal power generation. The starboard BGA has been locked since some power feeds to it were interrupted last Saturday.

While spacewalk preparations are under way, the docked Progress 26 cargo ship is being loaded with discarded items and readied for undocking on Dec. 21. Progress 27 will arrive at the station with supplies on Dec. 26.