rsyslog work log for 2008-01-06

Here is the rsyslog work log for yesterday:

2008-01-06
– fixed a bug with integer conversion in srUtils.c
– changed some lib functions to work on long instead of int
to care for 64 bit platforms (just to be on the safe side)
– worked a bit on object serialization
– cleaned up msg structure (interestingly, there were for example
two fields with identical meaning and iSyslogVersion was never
used ;))
– completed serializer for msg (but needs review)
– did a little bit of performance cleanup
– worked on object header (now also contains the size)
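
To illustrate why working on long instead of int matters, here is a minimal sketch of a number-to-string helper. It is not the actual code from srUtils.c: the name long_to_str and its return convention are assumptions made up for this example. It shows one classic pitfall of such conversions, namely that negating the smallest negative value overflows unless the magnitude is handled as an unsigned number.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not rsyslog's real API): writes the decimal form of
 * value into buf. Returns 0 on success, -1 if buf is too small. */
int long_to_str(char *buf, size_t buflen, long value)
{
    char tmp[32];               /* room for a 64-bit long including sign */
    size_t i = 0;
    size_t j = 0;
    int negative = value < 0;
    /* Use the unsigned magnitude so that LONG_MIN does not overflow. */
    unsigned long mag = negative ? 0UL - (unsigned long)value
                                 : (unsigned long)value;

    assert(buf != NULL);

    /* Generate digits in reverse order (least significant first). */
    do {
        tmp[i++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    if (negative)
        tmp[i++] = '-';

    if (i + 1 > buflen)         /* +1 for the terminating NUL */
        return -1;

    while (i > 0)               /* copy back in the right order */
        buf[j++] = tmp[--i];
    buf[j] = '\0';
    return 0;
}
```

Because the parameter is a long, the same function covers the full value range on platforms where long is 64 bits wide, without the caller having to care.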

rsyslog threading

If you followed my work logs or CVS updates, you’ve probably seen that I have worked quite a bit on rsyslog’s threading. So I thought I’d share a few “design documents” that cover the big picture.

Michael Biebl asked me for a few graphical representations of how the modules interact and what the message flow is. I am not really good at computer graphics, and I am an old-fashioned guy. So I thought that, rather than let you wait any longer, I’d share some of my hand sketches. They are not fancy and probably hard to read, but maybe still helpful. Find them below. A click brings up the high-resolution version, which is a bit less hard to read ;)

I’ll try to add better graphics and descriptions as soon as I find some time. But I have to admit that I currently have so many things on my mind that I’d like to code first. So it may take a short while.

recent rsyslog work

Here is the rsyslog work log for the past days:

2008-01-03
– fixed a few typos noticed by Jonathan Smith – thanks
– moved queue code to its own module (finally)
– restructured queue interface to use rsRetVal and instances, removed
dependency on globals – now more like a real class
– implemented queue type “drivers”
– queue is now a full object and handles threading by itself
– applied Michael Biebl’s patch to clean up the makefiles
– added capability to use a linked list for queuing to the queue class
– added $MainMsgQueueType config parameter
– some cleanup
– added $SpoolDirectory config parameter
– added $MainMsgQueueFilePrefix config parameter
– began working on disk queueing (not completed, do not use this mode!)
– began some work on Msg Object serialization
2008-01-04
– created a kind of general base class
– removed serialization pointer from queue; used new base class instead
– utilized the new auto-destruction capability so that the queue can now
destruct user objects if needed
– changed queue object Construction/Startup interface
2008-01-05
– added capability for concurrent access to the msg class. Can be dynamically
activated. If active, locking is employed.
– added the “direct” queueing mode to queue class (no queueing at all)
– added multiple worker thread capability to queue class
– implemented $MainMsgQueueWorkerThreads config directive
– removed some no-longer-needed code (thanks Michael Biebl for the help)
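
The idea behind the queue type “drivers” mentioned above can be sketched in a few lines of C. This is only an illustration and not the real rsyslog queue code: all names (queue_t, queue_driver_t, queue_construct and so on) are assumptions invented for the example, and the direct and disk modes as well as all threading are left out. The point is that callers talk to one queue interface while the selected driver decides how items are stored.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct queue queue_t;

/* A "driver" bundles the storage-specific operations of one queue type. */
typedef struct {
    int   (*enq)(queue_t *q, void *item);   /* 0 on success, -1 on failure */
    void *(*deq)(queue_t *q);               /* NULL if the queue is empty  */
} queue_driver_t;

typedef struct node { void *item; struct node *next; } node_t;

struct queue {
    const queue_driver_t *drv;
    void **slots; size_t cap, head, count;  /* fixed array (ring buffer) */
    node_t *first, *last;                   /* linked list               */
};

static int array_enq(queue_t *q, void *item)
{
    if (q->count == q->cap)
        return -1;                          /* queue is full */
    q->slots[(q->head + q->count++) % q->cap] = item;
    return 0;
}

static void *array_deq(queue_t *q)
{
    void *item;
    if (q->count == 0)
        return NULL;
    item = q->slots[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    return item;
}

static int list_enq(queue_t *q, void *item)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->item = item;
    n->next = NULL;
    if (q->last != NULL) q->last->next = n; else q->first = n;
    q->last = n;
    return 0;
}

static void *list_deq(queue_t *q)
{
    node_t *n = q->first;
    void *item;
    if (n == NULL)
        return NULL;
    item = n->item;
    q->first = n->next;
    if (q->first == NULL)
        q->last = NULL;
    free(n);
    return item;
}

static const queue_driver_t array_driver = { array_enq, array_deq };
static const queue_driver_t list_driver  = { list_enq,  list_deq  };

typedef enum { QUEUE_FIXED_ARRAY, QUEUE_LINKED_LIST } queue_type_t;

/* Construct a queue of the given type; cap is only used by the array type. */
queue_t *queue_construct(queue_type_t type, size_t cap)
{
    queue_t *q = calloc(1, sizeof *q);
    if (q == NULL)
        return NULL;
    q->drv = &list_driver;
    if (type == QUEUE_FIXED_ARRAY) {
        assert(cap > 0);
        q->slots = calloc(cap, sizeof *q->slots);
        if (q->slots == NULL) { free(q); return NULL; }
        q->cap = cap;
        q->drv = &array_driver;
    }
    return q;
}

/* Callers only use these functions and never see the storage type. */
int queue_enq(queue_t *q, void *item)
{
    assert(item != NULL);                   /* NULL is reserved for "empty" */
    return q->drv->enq(q, item);
}

void *queue_deq(queue_t *q) { return q->drv->deq(q); }

void queue_destruct(queue_t *q)
{
    while (q->drv->deq(q) != NULL) { }      /* drain remaining entries */
    free(q->slots);
    free(q);
}
```

With this layout, a setting such as $MainMsgQueueType only needs to influence which driver is picked at construction time; the rest of the program stays unchanged.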

Atlantis to launch on January, 24th?

The NASA space shuttle home page currently states that Atlantis could possibly launch on January 24th. However, there are serious doubts about that date. From what I have found on the net, early February sounds much more realistic, with a launch on February 2nd if no further tanking test is conducted. The most likely scenario, however, seems to be a launch no earlier than February 8th.

Unfortunately, I am currently very busy with one of my projects and thus cannot report in more depth. That will hopefully follow soon. In the meantime, let me quote the NASA shuttle home page:

NASA flight control teams and ground operations teams have been requested to protect for a Jan. 24th launch date for Space Shuttle Atlantis. As work progresses, that date will be modified as required, says John Shannon, deputy manager for the Space Shuttle Program. The schedule depends on test results and modifications to a fuel sensor system connector on the external fuel tank Atlantis will use for launch on its STS-122 mission to the International Space Station. Other launch opportunities could come between Jan. 24th and the first week of February.

The connector suspected of prompting false readings during two previous launch attempts is undergoing intensive testing at NASA’s Marshall Space Flight Center in Huntsville, Ala. Engineers also will test potential modifications to the connector to certify it for flight. Marshall has a test facility that allows the connector to be subjected to the same conditions it saw during the earlier launch attempts.

The modification and testing plans were discussed along with the launch preparation schedule during a meeting of Space Shuttle Program managers Thursday.

Technicians at NASA’s Kennedy Space Center, Fla., will modify a replacement connector for the one that was removed. Metal pins inside the connector will be soldered to the socket, Shannon explained. The new connector is scheduled to be in place by Jan. 10.

“We’re fairly confident that if the problem is where we think it is, that this will solve that,” Shannon said.

Atlantis remains at the launch pad as the agency studies ways to modify the connector. The shuttle will carry the European Space Agency’s Columbus laboratory to the space station during the STS-122 mission.

recent rsyslog work

A short rsyslog work log from the past days:

2007-12-31
– created omtesting, a debug and development aid output module. This is stage
work for the new queueing engine – we need a way to delay rule execution
and that’s what the module currently does ;)
2008-01-02
– released 2.0.0

Shuttle Feedthrough Connector Removal Pictures

As a new year’s gift, NASA has placed twelve interesting pictures from the December 29th removal of the feedthrough connector in the media gallery. The original format is a bit hard to read (at least in my opinion), so I thought I’d recompile them in this post.

The feedthrough connector was removed to be shipped to NASA’s Marshall Space Flight Center for further cryogenic testing. This is part of the ongoing space shuttle ECO sensor troubleshooting. If you wonder why further troubleshooting is needed, you may want to have a look at my “xmas decoration and space shuttle similarities” post ;)

Very interesting to see the technicians at work.

First, the external connector cable is cut:


Then, a pair of support brackets is removed:


Before disconnecting the connector assembly, it receives a cleaning, removing any residual foam insulation:


Then, the connector assembly, with its associated electrical harness, is pulled away from the tank:


Technicians set up equipment that will be used to take X-rays of the connector cable:


Then, the connector is disconnected before it is demated from the external tank:


And finally the demate occurs:


The technician then inspects the connector just removed from the external tank:


Technicians wrap the connector for transport to NASA’s Marshall Space Flight Center in Huntsville, Ala., for further cryogenic testing:


… and place the wrapped connector in a shipping container:


which is then finally carried away for transport to the Marshall Space Flight Center:

Nice work, guys! And now I am eager to hear about the testing results in MSFC! Stay tuned…

Image Credit for all pictures: NASA

rsyslog work log and future directions

Hi folks, this is probably the last rsyslog work log post for 2007. Thanks for sticking around – and hopefully I’ll see you again in 2008. It’ll become a very exciting year, with a lot of new features. I am eager to implement what is on my mind right now, and I’ll most probably start with modifying the message queue, an endeavor that will ultimately lead to store-and-forward capability just like in syslog-ng’s premium edition. And the good news is that I hope to finish that in January 2008 ;) That also means that I have settled my priorities. It was not an easy job, and I hope I got it right. So store-and-forward with enhanced output threading comes first and the other things will follow later. To me, the hardest decision was to put off expressions, another feature that I, at least, would like to see sooner rather than later.

But now back to the work log:
2007-12-27
– added $UDPServerAddress config directive
– added capability to have multiple UDP listeners running concurrently
– applied cross-platform patch from darix to facilitate GSS-API compile
on more platforms
– some cleanup
– internal restructuring in omfwd.c – stage work for further modularization
I think I also fixed a bug as a side-effect – but have not looked too much at it
2007-12-28
– took TCPSend() apart and made it generic via function pointers
– moved TCPSend() and frame building code to tcpsyslog.c
– omgssapi created
– removed gss-api code from omfwd.c
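
Making a send routine generic via function pointers, as mentioned in the log above, roughly means that the transport-independent loop takes the transport-specific step as a parameter. Here is a minimal sketch of that idea. It is not the actual TCPSend() code: the names send_fn, generic_send, mem_transport_t and mem_send are assumptions for this example, and the memory buffer merely stands in for a socket or a GSS-API session.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Transport callback: tries to send len bytes and returns the number of
 * bytes actually sent (possibly fewer) or -1 on error. ctx is opaque
 * transport state that only the callback understands. */
typedef long (*send_fn)(void *ctx, const char *data, size_t len);

/* Transport-independent part: keeps calling the callback until the whole
 * buffer has been sent. Returns 0 on success, -1 on error. */
int generic_send(void *ctx, const char *data, size_t len, send_fn do_send)
{
    assert(do_send != NULL);
    while (len > 0) {
        long sent = do_send(ctx, data, len);
        if (sent <= 0)
            return -1;          /* error, or no progress at all */
        data += sent;
        len -= (size_t)sent;
    }
    return 0;
}

/* Stand-in "transport" for demonstration: copies at most 4 bytes per call
 * into a memory buffer, mimicking the partial writes a socket may do. */
typedef struct { char buf[64]; size_t used; } mem_transport_t;

long mem_send(void *ctx, const char *data, size_t len)
{
    mem_transport_t *t = ctx;
    size_t chunk = len < 4 ? len : 4;
    if (t->used + chunk > sizeof t->buf)
        return -1;              /* buffer exhausted */
    memcpy(t->buf + t->used, data, chunk);
    t->used += chunk;
    return (long)chunk;
}
```

A plain TCP output and a GSS-API output could then share generic_send() and differ only in the callback they pass in, which is the kind of separation that allows the gss-api code to live in its own module.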

Atlantis troubleshooting continues…

Space Shuttle Feedthrough Connector used for the ECO Sensors.

The analysis of space shuttle Atlantis’ ECO sensor trouble continues. Everybody now focuses on the feedthrough connector.
I have been involved in some (heated ;)) forum discussions on why NASA takes the time to analyse the issue instead of providing a quick fix now that the culprit is known (remember, the feedthrough connector has been identified as the trouble spot via the TDR data taken during the last tanking test).

A quick wrap-up on the connector: it is used to feed several signals through from the tank’s interior to the Orbiter systems. Among them are the ECO sensors as well as the 5% sensor signal. The connection essentially consists of three parts. The schematic can be found in the small picture above (click to enlarge). I found a more detailed sketch in the STS-114 Flight Readiness Review (FRR) document:
As you can see, the system consists of an external connector, the actual feedthrough that goes through the tank and an internal connector. As far as I know, the sole purpose of that system is to feed the internal signals through to the external stack parts while ensuring that there is no leak in the tank.

The external part of the connector was removed yesterday and sent over to the Marshall Space Flight Center for analysis. As far as I know, analysis results will be available on January 3rd and will be the basis for the discussion on how to continue.

But back to my forum discussions. Why is an analysis being made at all?

There are a lot of technical words that could describe it. I will take a different route. I’d like to use a real-world analogy that most of us would probably be able to follow ;)

Let’s assume you own a house and it is xmas time. Chances are good you like to decorate your front yard with some nice lights. These lights need power and you need to draw that power from an outlet somewhere in, let’s say, your garage. Of course, the light’s power cord is too short, so you use an extension cord to connect the in-garage power outlet to the decoration’s plug in your yard. Everything works perfectly and you are really proud of your fine lights.

But then, all of a sudden, a fuse blows and your lights go off. You begin to analyze the problem. One thing you notice is that it started to rain. But everything worked well a couple of times when there was rain, too. You blame the fuse; after all, it was a pretty old one. So initially, you just shrug the issue off, put in a new fuse and are happy again.

After a while, on another rainy day, the fuse blows again. This time, you know it is a real problem. You do another analysis. In the course of it, you find out that it must somehow be related to the extension power cord (let me just assume you somehow magically know it is ;)).

This is where we currently are with the ECO sensor system. The extension power cord is my analogy of the feedthrough connector. Testing done during the tanking test pointed at that connector, just as you now know it is the extension cord. However, test data did not say exactly what is wrong with the connector. In my example, we also do not know what’s wrong with the extension cord.

Back to the example: so what to do? If we do not want to do a more in-depth analysis, we could simply replace the extension cord (just like we replaced the fuse) and hope that all is well. This might work, especially if we had little trouble in the past. It is also a quick fix, which is useful a few days before xmas (aka “time is running out”).

If we take a bit more time, however, we might want to analyze what the root cause is. If we do, we may find out that the extension cord indeed was faulty. Maybe its water protection was damaged. Then, we’ll end up swapping the cord, but this time with a very good feeling and confidence that our lights will stay on.

But analysis may also show less favorable results: maybe we find out that the cord is perfectly OK, but we made a “design error”. Maybe we find out that we used a non-outdoor rated cord in our “yard light system”. Replacing that cord with a like part would bring no improvement at all. In this case, we would need to make a bigger change: using an outdoor rated cord would be appropriate. Again, we could modify our power supply and have a very good feeling about its future reliability.

Unfortunately, space hardware is a bit more complex than xmas decoration. So analysis takes a bit longer. But it still offers the same benefits: if you look for the root cause AND are able to find it, you can reliably fix the system. Of course, there are limits and constraints. Overly long delays bear other risks. It is NASA management’s task to weigh benefit and risk and do the right thing.

Oh, and one more note: I’ve so often heard “NASA just needs to do a fix and everything will be easy. Just throw out that feedthrough connector…”.

Let me use my xmas decoration analogy once again. What that means is that you get a big digger and get rid of the extension cord altogether – by creating a permanent electrical circuit with its own fully-outdoor-proven outlet right at the decoration. Of course, this is doable. Of course, it’ll fix the problem. It is just a “simple redesign” of your system. I think, however, that it is not really the smart answer to the problem you faced.

And I think many of those quick fixes now being proposed (“let’s just redesign the shuttle…”) are along the same lines. If, of course, they were technically sound… ;)

I hope my example helps clarify why there is analysis on the ECO sensor problem and why this is a good thing to have, even though it may push Atlantis’ launch date a bit further down the calendar.

gss-api and rsyslog v2

I initially sent this message only to the mailing list. But now I think it makes sense to reproduce it here. So there we go:

I am working on the modular structure of rsyslog v3. I am currently revisiting gss-api support. I notice that with the current omfwd, it will be extremely hard to separate gss-api support into its own module. Doing so will break backward compatibility to the configuration file.

GSS-API has been out only for a few days, and mostly over the holiday period. So it is much less of a concern if we now introduce some changes that will cause rsyslog.conf format modifications. That is much less trouble than doing so after we release v2, a release expected to be in wide use for at least half a year, if not much longer. V2 released with the current syntax would require me to do some tricks in v3 to keep compatibility. Quite complex.

So I decided to create an omgssapi module for v3 and extract the gss-api code from omfwd. It looks like this can be done without too much code duplication. There will be some duplicate code, but it will shrink as v3 continues to be developed. Once I have a good working version, which I expect very soon, I will backport that to the v1/2 source tree. I’ll then do a new v1 release with a slightly incompatible gss-api config file syntax. After this has been out for a few days, I hope I can then finally push out that version as v2.

I hope this is a good decision. I think it will save us major future trouble at the expense of a relatively slight disturbance in the late v1 timeline. I guess most users won’t even notice there is a change.

As always, feedback is appreciated.