Rainer Gerhards

2007-12-31

rsyslog work log and future directions

Hi folks, this is probably the last rsyslog work log post for 2007. Thanks for sticking around – and hopefully I’ll see you again in 2008. It’ll become a very exciting year, with a lot of new features. I am eager to implement what is on my mind right now, and I’ll most probably start with modifying the message queue, an endeavor that will ultimately lead to store-and-forward capability just like in syslog-ng’s premium edition. And the good news is that I hope to finish that in January 2008 ;) – which also means that I have settled my priorities. That was not an easy job, and I hope I got it right. So store-and-forward with enhanced output threading comes first and the other things will follow later. To me, the hardest decision was to put off expressions, another feature that I, at least, would like to see sooner rather than later.

But now back to the work log:
2007-12-27
– added $UDPServerAddress config directive
– added capability to have multiple UDP listeners running concurrently
– applied cross-platform patch from darix to facilitate GSS-API compile
on more platforms
– some cleanup
– internal restructuring in omfwd.c – stage work for further modularization
I think I also fixed a bug as a side-effect – but have not looked too closely at it
2007-12-28
– took TCPSend() apart and made it generic via function pointers
– moved TCPSend() and frame building code to tcpsyslog.c
– omgssapi created
– removed gss-api code from omfwd.c
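
To illustrate the idea behind making TCPSend() generic via function pointers, here is a minimal sketch. It is written in Python rather than rsyslog's C, and it is not the actual rsyslog code: the names generic_send, build_frame, FakeTransport and the three callbacks are hypothetical, and the LF-terminated framing is an assumption. The point is only that the send routine no longer knows which transport it talks to. A plain TCP output and a GSS-API output could each pass in their own callbacks.

```python
from typing import Callable, List


def build_frame(msg: bytes) -> bytes:
    """Frame a message by terminating it with LF (assumed framing)."""
    return msg if msg.endswith(b"\n") else msg + b"\n"


def generic_send(
    state: object,
    msg: bytes,
    init_func: Callable[[object], bool],
    send_func: Callable[[object, bytes], bool],
    prep_retry_func: Callable[[object], None],
) -> bool:
    """Send one framed message, retrying once after a failure.

    All transport-specific work is delegated to the three callbacks,
    which play the role of C function pointers.
    """
    frame = build_frame(msg)
    for _attempt in range(2):
        if init_func(state) and send_func(state, frame):
            return True
        # Let the transport clean up (e.g. drop the connection) before retrying.
        prep_retry_func(state)
    return False


class FakeTransport:
    """Hypothetical stand-in for a connection; can fail on the first send."""

    def __init__(self, fail_first: bool) -> None:
        self.connected = False
        self.fail_first = fail_first
        self.sent: List[bytes] = []


def plain_init(t: FakeTransport) -> bool:
    t.connected = True
    return True


def plain_send(t: FakeTransport, frame: bytes) -> bool:
    if t.fail_first:
        t.fail_first = False
        return False
    t.sent.append(frame)
    return True


def plain_prep_retry(t: FakeTransport) -> None:
    t.connected = False
```

A second output would reuse generic_send unchanged and only supply different callbacks, which is what should keep code duplication between omfwd and omgssapi small.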

2007-12-30

Atlantis troubleshooting continues…

The analysis of space shuttle Atlantis ECO-Sensor trouble continues. Everybody focuses now on the Feedthrough Connector.
I have been involved in some (heated ;)) forum discussion on why NASA takes the time to analyse the issue instead of providing a quick fix now that the culprit is known (remember, the Feedthrough Connector has been identified as the trouble spot via the TDR data taken during the last tanking test).

A quick wrap-up on the connector: it is used to feed several signals through from the tank’s interior to the Orbiter systems. Among them are the ECO sensors as well as the 5% sensor signal. The connection essentially consists of three parts. The schematic can be found in the small picture above. I found a more detailed sketch in the STS-114 Flight Readiness Review (FRR) document:
As you can see, the system consists of an external connector, the actual feedthrough that goes through the tank and an internal connector. As far as I know, the sole purpose of that system is to feed the internal signals through to the external stack parts while ensuring that there is no leak in the tank.

The external part of the connector was unmounted yesterday and sent over to the Marshall Space Flight Center for analysis. As far as I know, analysis results will be available on January 3rd and will be the basis for the discussion on how to continue.

But back to my forum discussions. Why is an analysis being made at all?

There are a lot of technical words that could describe it. I will take a different route. I’d like to use a real-world analogy that most of us would probably be able to follow ;)

Let’s assume you own a house and it is xmas time. Chances are good you like to decorate your front yard with some nice lights. These lights need power and you need to draw that power from an outlet somewhere in, let’s say, your garage. Of course, the lights’ power cord is too short, so you use an extension cord to connect the in-garage power outlet to the decoration’s plug in your yard. Everything works perfectly and you are really proud of your fine lights.

But then, all of a sudden, a fuse blows and your lights go off. You begin to analyze the problem. One thing you notice is that it started to rain. But everything worked well a couple of times when there was rain, too. You blame the fuse; after all, it was a pretty old one. So initially, you just brush the issue aside, put in a new fuse and are happy again.

After a while, on another rainy day, the fuse blows again. This time, you know it is a real problem. You do another analysis. In the course of it, you learn that it must somehow be related to the extension power cord (let me just assume you somehow magically know it is ;)).

This is where we are with the ECO sensor system currently. The extension power cord is my analogy of the feedthrough connector. Testing done during the tanking test pointed at that connector, just as you now know it is the extension cord. However, test data did not say exactly what is wrong with the connector. In my sample, we also do not know what’s wrong with the extension cord.

Back to the sample: so what to do? If we do not want to do a more in-depth analysis, we could simply replace the extension cord (just like we replaced the fuse) and hope that all is well. This might work, especially if we had little trouble in the past. It is also a quick fix, which is useful a few days before xmas (aka “time is running out”).

If we take a bit more time, however, we might want to analyze what the root cause is. If we do, we may find out that the extension cord indeed was faulty. Maybe its water protection was damaged. Then, we’ll end up with swapping the cord, but this time with a very good feeling and confidence that our lights will stay on.

But analysis may also show less favorable results: maybe we find out that the cord is perfectly OK, but we made a “design error”. Maybe we find out that we used a non-outdoor rated cord in our “yard light system”. Replacing that cord with a like part would bring no improvement at all. In this case, we would need to make a further change – using an outdoor rated cord would be appropriate. Again, we could modify our power supply and have a very good feeling about its future reliability.

Unfortunately, space hardware is a bit more complex than xmas decoration. So analysis takes a bit longer. But it still offers the same benefits: if you look for the root cause AND are able to find it, you can reliably fix the system. Of course, there are limits and constraints. Overly long delays bear other risks. It is NASA management’s task to weigh benefit and risk and do the right thing.

Oh, and one more note: I’ve heard so often “NASA just needs to do a fix and everything will be easy. Just throw out that feedthrough connector…”.

Let me use my xmas decoration analogy once again. What that means is that you get a big digger and get rid of the extension cord altogether – by creating a permanent electrical circuit with its own fully-outdoor-proven outlet right at the decoration. Of course, this is doable. Of course, it’ll fix the problem. It is just a “simple redesign” of your system. I think, however, that it is not really the smart answer to the problem you faced.

And I think many of all those quick fixes now being proposed “just let’s redesign the shuttle…” are along the same lines. If, of course, they were technically sound… ;)

I hope my sample helps clarify why there is analysis on the ECO sensor problem and why this is a good thing to have. Even though it may push Atlantis launch date a bit further down the calendar.

2007-12-28

gss-api and rsyslog v2

I initially sent this message only to the mailing list. But now I think it makes sense to reproduce it here. So here we go:

I am working on the modular structure of rsyslog v3. I am currently revisiting gss-api support. I notice that with the current omfwd, it will be extremely hard to separate gss-api support into its own module. Doing so will break backward compatibility to the configuration file.

GSS-API has been out only for a few days, and mostly over the holiday period. So it is much less of a concern if we now introduce some changes that will cause rsyslog.conf format modifications. That is much less trouble than after we release v2, a release expected to be in wide use for at least half a year, if not much longer. V2 released with the current syntax would require me to do some tricks in v3 to keep compatibility. Quite complex.

So I decided to create an omgssapi module for v3 and extract the gss-api code from omfwd. It looks like this can be done without too much code duplication. There will be some duplicate code, but it will shrink as v3 continues to be developed. Once I have a good working version, which I expect very soon, I will backport that to the v1/2 source tree. I’ll then do a new v1 release with a slightly incompatible gss-api config file syntax. After this has been out for a few days, I hope I can then finally push out that version as v2.

I hope this is a good decision. I think it will save us major future trouble at the expense of a relatively slight disturbance in the late v1 timeline. I guess most users won’t even notice there is a change.

As always, feedback is appreciated.

2007-12-27

Atlantis’ launch moves to February 2008

Space Shuttle Atlantis is likely to launch no earlier than February. The target launch date of January 10th, 2008 for Atlantis’ STS-122 mission has been pushed back to early February. This is the result of today’s mission management meeting.

The official NASA shuttle home page is even a bit more cautious:

NASA space shuttle managers met Thursday and decided to modify a fuel sensor system to correct false readings that postponed shuttle Atlantis’ planned launches on Dec. 6 and Dec. 9.

Testing and analysis indicate that false readings from the engine cutoff sensor system occur in a three part feed-through connector. The connector passes electric signals from sensors inside the external fuel tank to shuttle electronics outside the tank. Technicians will remove portions of the connector and redesign the interface by soldering the pins to sockets at the external-to-feed-through side of the connector prior to installing the replacement into the external tank.

It is unknown how long it will take to complete the modifications and reapply foam to the shuttle’s external tank. Managers will assess the progress of the work before determining a new target launch date for Atlantis.

The emphasis is mine. The repair is quite time consuming. I currently have no details, but it looks like there will be no full root cause analysis. That would probably have required a rollback to the VAB and a destack, and that process would make an early February launch date impossible. I hope to get more details soon and will post them when I have them.

2007-12-27

STS-122 Launch Date Decision Today?

Today is the mission management team meeting for space shuttle Atlantis’ STS-122 mission. The meeting reviews work progress, data gathered and plans drawn and will finally conclude with a recommendation for the next steps. It is assumed that a target launch date will be set.

While January 10th is still given as an option, reports increasingly suggest that this date will most probably not see a launch. It looks like a launch late in January or mid-February is more realistic. In any case, we’ll know much more when the meeting has concluded later today. A press briefing is expected shortly after the meeting. As far as I know, it was set for 10:30am ET, which means the meeting must have started right now.

Let’s see how the day evolves…

2007-12-27

Things to do in rsyslog…

I have made good progress with rsyslog’s input modules. As it looks, the basic things are done and the input module interface has proven to be both quite stable and very simple. It doesn’t yet support different instances, but I begin to think that I do not even need them – not even in the long term.

Of course, most of the current input modules are not clean modules. They have a lot of dependencies on other parts of the code, which cannot yet be dynamically loaded. But at least there is a foundation on which additional modules could be built. Getting the current input modules to be real clean modules will require further stage work. Many things need to be done.

So what to do next? It now comes down to a matter of both priorities and dependencies. I am writing this note here mostly for myself. It helps me clear up my thoughts and will also probably serve as a reference for quite a while. My thoughts may be hard to understand – sorry for that. But I thought I would make them public when I write them down – even if they are not really targeted toward others. I still hope they may help you get some more background info.

So what’s to do:


- find a way to handle global settings
- multi-threaded output modules
  a prerequisite for
  - create queued outputs (write to queue if action fails and
    restart when it resumes)
- re-write way config file is read
  probably a prerequisite for:
  - create expression support
    - in templates
    - in selector filters
- create interface for (loadable) user function modules
- create a system to allow loading "library" loadable modules
  (e.g. network library for imudp, imtcp, ...)
  - separate GSSAPI from plain TCP (requires libs and lib extension system)
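
The "queued outputs" item above (write to a queue if the action fails and restart when it resumes) can be sketched roughly as follows. This is only an illustration of the idea in Python, not rsyslog's design: the QueuedAction class and its method names are hypothetical, the queue lives in memory only, and there is no threading. A real store-and-forward implementation would presumably also need to persist the queue to disk.

```python
from collections import deque
from typing import Callable, Deque


class QueuedAction:
    """Wrap an output action; buffer messages while the action is failing."""

    def __init__(self, action: Callable[[str], bool]) -> None:
        self.action = action            # returns True on successful delivery
        self.queue: Deque[str] = deque()
        self.suspended = False

    def submit(self, msg: str) -> bool:
        """Deliver msg directly, or enqueue it if the action is (or goes) down.

        While suspended, new messages are queued without calling the action,
        so the original message order is preserved.
        """
        if self.suspended or not self.action(msg):
            self.suspended = True
            self.queue.append(msg)
            return False
        return True

    def resume(self) -> bool:
        """Try to drain the queue in order; stay suspended if delivery fails."""
        while self.queue:
            if not self.action(self.queue[0]):
                self.suspended = True
                return False
            self.queue.popleft()
        self.suspended = False
        return True
```

In this sketch, something outside the class has to call resume() periodically. With multi-threaded output modules, that retry loop could run in the action's own thread without blocking the other outputs, which is presumably why the threading work is listed as a prerequisite.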

2007-12-27

xmas rsyslog work log

I’ve been a bit busy with rsyslog over the xmas break. Here are the changes:

2007-12-25
– moved some more net functionality out of syslogd.c – stage work
– fixed duplicate license text in syslogd.c – made it ambiguous
– moved udp net code – again, stage work
– moved some of the udp input code to its right place
2007-12-26
– moved cross-platform define for AI_NUMERICSERV to net.h
– made udp code somewhat less dependent on global variables – stage work
– removed omfwd code dependency on “finet”
– removed imudp code dependency on “finet”
– removed active INET code from syslogd.c – still some auxiliary things
remain
– fixed socket leak in omfwd.c
– removed global variable LogPort
– removed global variable AcceptRemote and external def of bFinished

2007-12-26

NASA’s holiday break

The folks at NASA enjoy a few well-deserved days off during the holiday season. They have worked very hard and finally gotten close to the root cause of the ECO sensor problem. Well done!

As nasaspaceflight.com reports, there are some new problem reports. However, I think this is nothing really unusual; problems appear every now and then. Right now, we are just more aware of them. I personally think we should not get too concerned about them, at least not until further facts are known.

The mission management team is set to meet again on December 27th. Then, they will look at the work done so far. Besides some work on the orbiter, this most importantly includes plans drawn to address the ECO sensor problem.

From what I have read, the actual root cause is still unknown. It is known that the problem is inside the LH2 feedthrough connector, which is good and provides a lot of repair options. However, the question of why the connector suffers problems is not answered yet. As I understand it, NASA prefers to get hold of Atlantis’ feedthrough connector. That would enable detailed analysis with the actual failed part – and thus there is an excellent chance of finding the root cause. However, that is probably one of the more time consuming options. If that route is taken, launch would be further delayed, and January 10th would not be an option any more.

With just the little information I have, I think it would be useful to sacrifice the launch date in order to get access to the failing feedthrough connector. Remember: the external tank is the only part of the space shuttle stack that is not reusable. As such, analyzing the feedthrough connector after launch is not an option. I personally think it would make more sense to launch, let’s say, in February if that offers the chance to find the root cause. That would not only be good for the remaining shuttle flights, but could also provide valuable “lessons learned” for the Constellation program. Even if Ares will not fly ECO sensors (I don’t know…), the root cause may show something that we do not yet know, be it related to electrical engineering, material sciences or whatever else. Getting that missing information can possibly increase our understanding and help prevent other, not yet known, problems in future equipment.

But again, keep in mind I have very limited insight. Maybe NASA even has a way to find the root cause and still maintain the January 10th launch attempt. I don’t know for sure. But I know that after the December 27th press briefing we’ll probably know more. And if you plan to travel to see Atlantis’ January launch, I wouldn’t book my tickets too early…

2007-12-24

Seasons Greetings to Everyone

My best wishes to everyone! Let me share this lovely impression:

I thought this image conveys much of the beauty of our planet earth and the hopefully peaceful holiday season. I wish all of you great holidays, nice gifts and time with your beloved ones.

In 2007, we’ve come a long way, both from an Adiscon perspective, with lots of new product releases and great features, and from the rsyslog point of view. And, of course, there were private highlights as well, for example my unforgettable trip to view space shuttle Discovery’s STS-120 launch. Thanks everyone for your support and all the kind words I received!

Once again, a great holiday season to all of you!

PS: if you enjoyed the image above, you may want to have a look at my xmas 2007 impressions gallery.

2007-12-24

Seasons Greetings to Everyone

My best wishes to everyone! Let me share this impression from my backyard:

Obviously, this is not space-related, but I thought I would thank all my regular readers for sticking with me. 2007 has been a very cool year, with me viewing the STS-120 launch being a definite highlight. And, of course, 2007 was the year that made me start this blog.

I wish all of you a peaceful holiday season, great gifts and time with your beloved ones.

And as this is my spaceflight blog, please also help me keep fingers crossed that 2008 will be a great spaceflight year, with Atlantis’ STS-122 being launched early in the year. Besides, we will see the first European ATV launch, the first Ares launch and many other very cool and interesting things. I’ll try to follow all of them. I’d be delighted if you keep reading my blog!

Once again, a great holiday season to all of you!

PS: if you enjoyed the image above, you may want to have a look at my xmas 2007 impressions gallery.