Rainer Gerhards

2007-12-23

NASA website problems…

Unfortunately, there seem to be problems with a number of NASA websites. Most importantly, the KSC media gallery is no longer accessible. I noticed that yesterday, when I tried to find a good picture to go with yesterday’s article. The site still seems to be down and the problem persists.

Of course, I suspected a local problem first. However, other users (from all around the world) report similar problems. Also, being a network guy, I traced the route to the NASA site both from Europe and from the US, and it didn’t work in either case. Currently, IP packets to NASA are being bounced back and forth between two systems until they expire. It looks like either a router has died or there is a configuration problem.
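Packets “expiring” refers to the IP time-to-live (TTL) field, which every router decrements before forwarding. In a routing loop the packet keeps bouncing until the TTL reaches zero and it is discarded, which is why a trace shows the same two systems alternating. A small C model of that behavior follows; the two routers and the function name are hypothetical.

```c
/* Hypothetical model of a two-router loop: router 0 forwards the packet
 * to router 1, which sends it straight back. Every router decrements the
 * IPv4 TTL and discards the packet when it reaches zero (a real router
 * would also return an ICMP "time exceeded" message to the sender). */

/* Returns the number of hops until the packet is dropped and stores the
 * index (0 or 1) of the router that dropped it. Expects ttl >= 1. */
static int hops_until_expiry(int ttl, int *dropping_router)
{
    int hops = 0;
    int current = 0;                 /* the packet first arrives at router 0 */
    for (;;) {
        hops++;
        if (--ttl <= 0) {            /* decrement TTL; drop when it hits zero */
            *dropping_router = current;
            return hops;
        }
        current = 1 - current;       /* bounce to the other router */
    }
}
```

With a common initial TTL of 64, the packet makes 64 hops and is dropped by whichever router it happens to reach last.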

Unfortunately, the media gallery is not the only resource that is down. A number of KSC sites are affected, so I guess there is a problem with the main public firewall (or whatever…). The bad news is that this limits my ability to post nice pictures along with the blog posts. Obviously, I hope the situation will be resolved soon. In the meantime, you know why there are so few pictures in the blog these days ;)

2007-12-22

Atlantis STS-122 launch date not decided yet…

As I had written, the launch date for space shuttle Atlantis’ STS-122 mission has been moved to no earlier than January 2008. The tanking test last week seems to have produced fairly solid data, but NASA has not yet decided which options to take.

There is still a chance that Atlantis can launch in early January, but a further delay looks a bit more likely. I have read about both January 25th and February 14th as possible launch dates. In any case, if the shuttle cannot be launched on the 10th, the rest of the shuttle flight schedule will be affected. If Atlantis launches in mid-January or later, there is not enough time left to launch shuttle Endeavour’s STS-123 mission on February 14th as originally planned.

NASA’s mission management team will meet again next week, on the 27th, to review the additional data that has been gathered. More importantly, repair options will have been worked out in the meantime, so the exact course of action is expected to be known after that meeting.

There is already some work going on at the pad, but my understanding is that this is forward work: it does things that may prove useful depending on what is decided on the 27th. Not doing that work now would limit the options available.

If the January 10th launch target cannot be kept, it is most likely that Atlantis will take over Endeavour’s launch window and the other missions will shift accordingly. However, a new launch schedule will then probably be needed.

This also puts some pressure on the Constellation program: it needs to wait for Atlantis’ STS-125 flight, the Hubble Space Telescope servicing mission. Only after that has been completed can launch pad 39B be handed over to Constellation and be reconstructed. So delays in STS-122 will probably also affect Constellation.

According to NASA, there is still sufficient buffer available to complete the International Space Station (ISS) before the shuttle fleet is set to retire in 2010. But that buffer is being eaten up, too, so this is probably another concern.

As you can see, a lot depends on STS-122. But I applaud NASA’s “better safe than sorry” approach. It is important that the space shuttle is safe to fly. And it is also important to understand the ECO sensor problem, so that its root cause will not bite again on future missions.

2007-12-21

rsyslog work log for 2007-12-21

It has been a good day today! Finally, the alarm() call has been removed! :) That had been on my agenda for a long time, but I couldn’t do it without the redesign of the inputs. The alarm() was not really a big issue, but it became an annoyance to me because it was so hard to remove.

I would also like to mention that I will do only occasional work during the holiday period. So I do not expect more serious changes until early January. Some releases, however, are due next week (maybe 2.0.0).

Here is the detailed rsyslog worklog for today:

– removed no longer needed mutex from omfwd
– released a preview of 3.0.0 “as is” to mailing list – just to get the idea
– began work on imtcp
– created first version of imtcp (still very much depending on syslogd.c for
configuration and a lot of other things)
– cleaned up code (resulting in some shuffling from syslogd.c to the
“right” module)
– prepared for imudp
– created an initial version of imudp.c. The majority of UDP reception code
is now in that module and it is dynamically loadable. HOWEVER, that doesn’t
mean it is a proper module. There are still many, many dependencies on
global variables, cross-module calls and such. However, having the code base
separated allows me to carry out some other cleanup before I return to
create a really clean implementation of these modules. So it is an
intermediate stage. Just don’t mistake it for “the real thing”…
– removed code no longer needed
– finally, alarm() has gone away :) — this is now done by the main thread
– some cleanup
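Having the main thread take over from alarm() usually means turning a signal-driven timer into a computed timeout. The C sketch below shows one way to do that, assuming the main thread already blocks in something like select(); the periodic_task type and the function names are hypothetical and not taken from rsyslog.

```c
#include <time.h>

/* Hypothetical sketch: instead of arming alarm() and doing work from a
 * SIGALRM handler, the main thread tracks when the next periodic task
 * (e.g. a mark message) is due and uses that as the timeout of its
 * normal blocking wait. */
typedef struct {
    time_t interval;   /* seconds between runs, e.g. the mark period */
    time_t next_due;   /* absolute time of the next run */
} periodic_task;

/* How long the main thread may block before the task is due. */
static time_t seconds_until_due(const periodic_task *t, time_t now)
{
    return (t->next_due > now) ? t->next_due - now : 0;
}

/* Returns 1 (and schedules the next run) if the task should run now. */
static int task_due(periodic_task *t, time_t now)
{
    if (now < t->next_due)
        return 0;
    t->next_due += t->interval;          /* keep a fixed cadence */
    if (t->next_due <= now)              /* slept through several periods */
        t->next_due = now + t->interval;
    return 1;
}

/* Main-loop usage (emit_mark_message() is a hypothetical name):
 *
 *   for (;;) {
 *       struct timeval tv = { seconds_until_due(&mark, time(NULL)), 0 };
 *       select(0, NULL, NULL, NULL, &tv);
 *       if (task_due(&mark, time(NULL)))
 *           emit_mark_message();
 *   }
 */
```

No signal handler is involved, so the periodic work runs in a normal thread context and may safely take locks or allocate memory.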

2007-12-21

rsyslog work log for 2007-12-20

Yesterday was a very busy day for rsyslog. I am on a good path to input modularization, but the hardest part still needs to be done ;)

Here is the log:

– bugfix: fixing memory leak when message queue is full and during
parsing. Thanks to varmojfekoj for the patch.
– working on a potential race condition on the new input module
interface. See newsgroup posting for details on the issue:
http://groups.google.com/group/comp.programming.threads/msg/330b9675f17a1ad6
I tried some mutex operations but came to the conclusion that this
does not really help. So I have now switched to plain thread
cancellation, which so far seems to be OK. Need more practical
experience with other input modules to make a final decision. Thus
I leave all code in and have just disabled the problematic
code.
– implemented $klogUseSyscallInterface config directive
– implemented $klogSymbolLookup config directive
– moved unix socket code to its own module (imuxsock)
– implemented $OmitLocalLogging config directive
– bugfix: memory leak in cfsysline.c/doGetWord() fixed
– implemented $SystemLogSocketName config directive
– implemented $AddUnixListenSocket config directive
– MILESTONE reached: imuxsock initial version done
– removed single-threading support for sending TCP messages; caused
simplification of the output module interface as well as core syslog
processing.
– moved udp send code to its own function
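Plain thread cancellation for input modules can be sketched in C with POSIX threads. This is a hypothetical input-module thread, not rsyslog’s code: it blocks in a cancellation point (sleep() here, where a real listener would block in select() or recvfrom()) and frees its resources in a cleanup handler, while the core cancels and joins it.

```c
#include <pthread.h>
#include <unistd.h>

static int cleanup_ran;   /* set by the cleanup handler, read after join */

/* Hypothetical cleanup handler: would close sockets, free buffers, ... */
static void input_cleanup(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

/* Hypothetical input thread main function. */
static void *input_thread_main(void *arg)
{
    (void)arg;
    pthread_cleanup_push(input_cleanup, NULL);
    for (;;)
        sleep(1);               /* sleep() is a POSIX cancellation point */
    pthread_cleanup_pop(0);     /* never reached, but required for pairing */
    return NULL;
}

/* Core side: cancel an input thread and wait for it to finish.
 * Returns 1 if the thread terminated because it was cancelled. */
static int stop_input_thread(pthread_t tid)
{
    void *res = NULL;
    if (pthread_cancel(tid) != 0)
        return 0;
    if (pthread_join(tid, &res) != 0)
        return 0;
    return res == PTHREAD_CANCELED;
}
```

With the default deferred cancellation type, the thread is only cancelled at a cancellation point, so the cleanup handler is always registered before the thread can be torn down.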

2007-12-20

rsyslog work log…

Here is my recent rsyslog work log:

2007-12-18
– removed files from CVS that do not belong there (thanks to Michael Biebl for
pointing that out)
– restructured #include’s somewhat thanks to Michael Biebl
– code cleanups thanks to Michael Biebl
– applied Michael Biebl’s patch to enhance $includeconfig to support
wildcard filenames
2007-12-19
– applied some more cleanup provided by Michael Biebl
– applied enhanced gss-api functionality provided by varmojfekoj
– GSS-API support for syslog/TCP connections was added. Thanks to
varmojfekoj for providing the patch with this functionality
– release 1.21.0
– added the -c option
– enhanced -c option support (some basics)
– bugfix: llDestroy() left the list with invalid root/last pointers
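The llDestroy() bugfix above is a classic one. A minimal C sketch of a list with root/last pointers shows what “invalid root/last pointers” means and how resetting them fixes it; the type and function names are hypothetical and do not reproduce rsyslog’s linked list code.

```c
#include <stdlib.h>

/* Hypothetical singly linked list with root and last pointers. */
typedef struct ll_elt {
    struct ll_elt *next;
    void *data;
} ll_elt_t;

typedef struct {
    ll_elt_t *root;
    ll_elt_t *last;
    void (*destruct_data)(void *);  /* optional, may be NULL */
} linkedlist_t;

static int ll_append(linkedlist_t *ll, void *data)
{
    ll_elt_t *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->next = NULL;
    e->data = data;
    if (ll->last == NULL)
        ll->root = e;
    else
        ll->last->next = e;
    ll->last = e;
    return 0;
}

static void ll_destroy(linkedlist_t *ll)
{
    ll_elt_t *e = ll->root;
    while (e != NULL) {
        ll_elt_t *next = e->next;   /* save before freeing */
        if (ll->destruct_data != NULL)
            ll->destruct_data(e->data);
        free(e);
        e = next;
    }
    /* Without these two lines the list keeps dangling pointers, and a
     * later ll_append() would write into freed memory via ll->last. */
    ll->root = NULL;
    ll->last = NULL;
}
```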

2007-12-19

Carnival of Space #34

Hello everyone and welcome to the 34th Carnival of Space. Usually, I write about spaceflight, mostly about the space shuttle, ISS and Constellation programs. For the carnival, of course, I’ll broaden my reach. I thank Fraser for trusting me with this week’s carnival, much appreciated.

The end of the year is approaching (too fast, as always) and, of course, this calls for a number of “best of the year” things. And, of course, there are now twelve Astronomy Pictures of the Year for 2007, each one more breathtaking than the last. And of course, the Bad Astronomer has his own stunning ten favorites.

And the Bad Astronomer also tells us why we should enjoy life now: look at the death ray from 3C321! Steinn Sigurdsson also writes about 3C321 and links to a nice animation. And Centauri Dreams speculates about “Gamma Rays and Civilizations” or, better said, the extinction of the latter.

It is also Christmas time, and FlyingSinger is giving away a Mars picture book … in which he documents his simulated mission to Mars. A well-done and very inspiring work. And Colony Worlds has, just in time, posted a solution for maintaining human body strength on other celestial bodies: Gravity Suits for Off-World Children.

Back on earth and with real hardware, Ian remembers the first Australian satellite, which just happened to have its 40th anniversary. Coming closer to the present, I have followed NASA’s Tuesday space shuttle tanking test. I hope it captures some of the excellence with which engineers over there work.

The Babe in the Universe fills the gap between now and then: she looks at the moon and NASA’s activities around it. Among other things, she noticed that NASA Associate Administrator Alan Stern announced the selection of the GRAIL mission to the moon. Fittingly, Advanced Nanotechnology talks about scramjet technology, which may also provide an alternative to regular rockets.

The Space Cynic proves that anyone can get quoted in the newspaper these days, as his decidedly pragmatic views on the recently concluded Space Investment Summit are carried by the Los Angeles Times. And, judging from the comments on the blog post, some space tragics are decidedly unhappy about this.

And, finally, there is the ultimate post for this time of the year – at least I think so: a recent “Astronomy Picture of The Day” left Stuart Atkinson wondering about our place in the universe, and what exactly we are looking at when we look at an image of the starry sky… Get inspired – and think a bit about our own importance!

What a perfect ending for this week’s Carnival of Space. If you would like to enter next week’s Carnival, be sure to email your entry to carnivalofspace@gmail.com; also feel free to visit Universe Today for the Carnival archives. In the meantime, I wish you happy holidays!

2007-12-19

modules, core functionality and rsyslog v3…

As I have written, I have begun to work on rsyslog v3 (and so far I am pleased to say that I have made quite good progress on it). One of the key points of rsyslog v3 is that it will have an even more modular architecture, utilizing loadable modules for almost everything. I asked on the mailing list about backward compatibility and received this very good response from Michael Biebl:

One thing I was wondering:

If you intend to shift all (even core) functionality into loadable modules, how do you handle things like --help or available command line options like -m?

Do you want to hardcode it or will you provide an interface, where rsyslog will query the module about its help message and available options.

I’m also still a bit uncertain whether moving everything, or rather core functionality, to modules is a good idea (for problems you already mentioned). Imho having all core functionality in a single binary is simply much more robust and foolproof. For things like the SQL db output plugin, the module interface is great, because it avoids pulling in large library and package dependencies and allows installing them on an as-needed basis. For other functionality I still need to recognize the benefits.

Rainer, could you roughly sketch, how you envision to break rsyslog into loadable modules in v3. Which kind of functionality would be loadable as module, which functionality do you plan to keep in the rsyslogd binary. A listing of all (planned) modules + the provided functionality and requirements would really help.

Another thing: Say you move the regexp support into a separate module. If a regexp is then used in rsyslog.conf, will you bail out with an error, simply print a warning (which could go unnoticed, and the poor administrator doesn’t know why his regexp doesn’t work) or load modules on demand?

For the latter you’d need some kind of interface to query the *.so files for their supported functionality. I.e. each module would export a list of the config directives it supports, and rsyslog could query each available module upon startup and create a map.

So, e.g. the ommysql module would export its support for the :ommysql: config directive. Whenever rsyslog finds such a config directive it could/would load the modules on demand.

Same could be done for the command line parameters. The imklog module would export, that it supports the -m command line parameter. Whenever that commandline parameter is used, rsyslog would know which module to load.

These are only rough ideas and there is certainly still much to consider. But what do you think about the basic idea?
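The directive-to-module map proposed in the mail could look roughly like the C sketch below. It is hypothetical: a real implementation would build the table at startup by querying each *.so file, whereas here it is a static array for brevity, and the module/keyword pairs only mirror the examples from the mail and the existing immark directive.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capability table: which module implements which
 * config directive or command line option. */
typedef struct {
    const char *keyword;   /* e.g. ":ommysql:" or "-m" */
    const char *module;    /* shared object that implements it */
} capability_t;

static const capability_t capability_map[] = {
    { ":ommysql:",          "ommysql.so" },
    { "-m",                 "imklog.so"  },
    { "$MarkMessagePeriod", "immark.so"  },
};

/* Returns the module to load for a keyword, or NULL if no module
 * claims it (the caller should then report a config error). */
static const char *module_for(const char *keyword)
{
    size_t n = sizeof capability_map / sizeof capability_map[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(capability_map[i].keyword, keyword) == 0)
            return capability_map[i].module;
    return NULL;
}
```

Returning NULL for unknown keywords keeps the choice between “bail out” and “warn” with the caller, which is exactly the open question raised in the mail.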

This is a great response – it not only asks questions but offers some good solutions, too. It comes at a perfect time, too, because there is much that is not yet finalized for v3. For sure I have (hopefully good ;)) ideas, but all of them need to be proven in practice. The issues that come up here are a good example.

So, now let me give a rough sketch of how I envision v3. Note that this is what I envision *today*; it may change if I get good reasoning for change and/or smarter solutions.

First, let me introduce two blog posts which you may want to read before continuing here:

And, most importantly, this post already has the root reasoning for pushing things out of the syslogd core:

on the importance of plug-ins for rsyslog

Let me highlight the two most important parts from that post:

This is exactly the way rsyslog is heading: we will try to provide an ultra-slim framework which offers just the basic things needed to orchestrate the plug-ins. Most of the functionality will indeed be available via plug-ins, dynamically loaded as needed.

… With that design philosophy, we can make rsyslog really universally available, even on low-powered devices (loading just a few plug-ins). At the high end, systems with a lot of plug-ins loaded will be able to handle the most demanding tasks.

And this is actually what the v3 effort is all about: rsyslog should become as modular as possible, with the least amount of code in the core linked binary and everything else provided via plugins. I still do not know exactly how that will happen; I am approaching it incrementally. I am now at the input plugins and trying to get them right.

In the longer term, there will be at least three different types of plugins: output, input and “filter”. I think I do not need to elaborate on the first two. Filter plugins will work together with expressions, another feature to come. Expressions will enhance the template and filter system with a rich expression capability supporting function calls. For example, a template may look like this in a future release:

$Template MyTemplate, substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/"

and a filter condition may be

:expr:substr(MSG, 5, 10) == "error" /var/log/errorlog

Don’t bash me for the config format shown above, that will also change ;)
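To make the proposed expressions concrete, here is a minimal C sketch of what the example template and filter would compute. The semantics of substr() (0-based start position, length clipped to the end of the string) and tolower() are assumptions for illustration only, and the function names (expr_substr, render_my_template, filter_matches) are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* substr(s, start, len): assumed 0-based, clipped to the string. */
static void expr_substr(char *out, size_t outsz, const char *s,
                        size_t start, size_t len)
{
    size_t slen = strlen(s);
    if (outsz == 0)
        return;
    if (start > slen)
        start = slen;
    if (len > slen - start)
        len = slen - start;
    if (len > outsz - 1)
        len = outsz - 1;
    memcpy(out, s + start, len);
    out[len] = '\0';
}

/* tolower(s): lower-case copy of s. */
static void expr_tolower(char *out, size_t outsz, const char *s)
{
    size_t i = 0;
    if (outsz == 0)
        return;
    for (; s[i] != '\0' && i < outsz - 1; i++)
        out[i] = (char)tolower((unsigned char)s[i]);
    out[i] = '\0';
}

/* substr(MSG, 5, 10) + "/" + tolower(FROMHOST) + "/" */
static void render_my_template(char *out, size_t outsz,
                               const char *msg, const char *fromhost)
{
    char part1[64], part2[256];
    expr_substr(part1, sizeof part1, msg, 5, 10);
    expr_tolower(part2, sizeof part2, fromhost);
    snprintf(out, outsz, "%s/%s/", part1, part2);
}

/* :expr:substr(MSG, 5, 10) == "error" */
static int filter_matches(const char *msg)
{
    char part[64];
    expr_substr(part, sizeof part, msg, 5, 10);
    return strcmp(part, "error") == 0;
}
```

Under these assumed semantics, the example filter only matches when MSG ends right after “error”, because substr() returns up to ten characters; comparing a five-character substring would match “error” anywhere at that position.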

Regexp functionality will then be provided by something like a regexp() function. Functions will be defined in loadable modules; pretty much no function will be in the core. A module may contain multiple functions.
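One way functions could live in modules: the core keeps only a name-to-function table, and a regexp module registers its regexp() implementation when it is loaded. The registry, the two-argument signature and all names below are hypothetical; only the matching itself uses the standard POSIX regcomp()/regexec() interface.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for two-argument expression functions. */
typedef int (*expr_func2_t)(const char *a, const char *b);

typedef struct {
    const char *name;
    expr_func2_t fn;
} func_entry_t;

#define MAX_FUNCS 16
static func_entry_t func_table[MAX_FUNCS];   /* owned by the core */
static size_t func_count;

static int register_function(const char *name, expr_func2_t fn)
{
    if (func_count >= MAX_FUNCS)
        return -1;
    func_table[func_count].name = name;
    func_table[func_count].fn = fn;
    func_count++;
    return 0;
}

/* NULL means no loaded module provides this function. */
static expr_func2_t lookup_function(const char *name)
{
    for (size_t i = 0; i < func_count; i++)
        if (strcmp(func_table[i].name, name) == 0)
            return func_table[i].fn;
    return NULL;
}

/* regexp(str, pattern): 1 = match, 0 = no match, -1 = invalid pattern.
 * Compiles the pattern on every call for brevity. */
static int fn_regexp(const char *str, const char *pattern)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, str, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* What the regexp module's init entry point might do. */
static int regexp_module_init(void)
{
    return register_function("regexp", fn_regexp);
}
```

If the module is not loaded, lookup_function() returns NULL and the config parser can report an unknown function instead of silently ignoring the filter.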

Bottom line: almost everything will be a loadable module. If you do not load modules, rsyslog will not do anything useful.

Now a quick look at the command line options: I don’t like them. Take -r, for example. Sure, it allows you to specify a listener port and also conveys that a listener should be started at all. But how about multiple instances? How about advanced configuration parameters? I think command line options are good for simple cases, but rsyslog will provide much more than simple cases. I favor replacing all command line options with configuration file directives. This is the right place for them to be, except, of course, for things like where to look for the master configuration file.

Which brings up backward compatibility. As you know, I have begun to puzzle over that. After all, rsyslog is meant to be a drop-in replacement for sysklogd. That means it should run with the same options as sysklogd, and it should also enable administrators to build on their knowledge of sysklogd. Tough call.

Thankfully, sur5r introduced the idea of having a compatibility mode. He suggested looking at the absence of an rsyslog.conf file and concluding from that that we need to run in that mode. That probably is a good suggestion that I will pick up. It can also be extended: how about, for example, a “-c” command line switch? If it is absent, rsyslog uses compatibility mode. And it is absent in command lines written for previous versions as well as for sysklogd, because it was not defined there.
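The mode decision could be as simple as the following C sketch, which combines sur5r’s idea (no rsyslog.conf) with the proposed -c switch. Treating either condition as sufficient for compatibility mode is an interpretation of the paragraph above, and the function names are hypothetical; a real implementation would parse options with getopt().

```c
#include <stdbool.h>
#include <string.h>

typedef enum { MODE_COMPAT, MODE_NATIVE } run_mode_t;

/* Simplified scan for "-c" or "-c<arg>"; ignores combined flags like "-rc". */
static bool has_c_switch(int argc, char *const argv[])
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;                        /* end of options */
        if (strncmp(argv[i], "-c", 2) == 0)
            return true;
    }
    return false;
}

/* Old init scripts and sysklogd never pass -c, so its absence (or a
 * missing rsyslog.conf) selects compatibility mode. */
static run_mode_t choose_mode(int argc, char *const argv[], bool config_exists)
{
    if (!config_exists || !has_c_switch(argc, argv))
        return MODE_COMPAT;
    return MODE_NATIVE;
}
```

In compatibility mode, the core would then load the compatibility plugin described next and let it interpret the remaining options.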

Now let’s think. If we know we need to provide compatibility, we can load a plugin implementing compatibility settings (again, moving that out of the core functionality). Once loaded, it could analyze the rest of the command line and load whatever modules are necessary to make rsyslogd correctly interpret a pre-v3 configuration file. That way we have a somewhat larger than necessary memory footprint, but all works well.

Then back to native mode. Here, indeed, I’d expect the user to load each and every module needed. I assume, however, that for any typical package the maintainer will probably load all “core” functionality (like writing to files, user messages, several inputs, common filter functions, …) right there in the default rsyslog.conf. This makes sense for today’s hardware. It also will make the config quite foolproof. A good way to implement that would be to build on the semantics of $IncludeConfig. How about:

$ModLoad /wherever/necessaryplugins/

which would load all plugins in that directory.
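The directory form of $ModLoad is only an idea at this point. A C sketch of the directory scan it would need follows; find_plugins() and the sorted load order are assumptions, and actually loading each file (via dlopen()) is left to the caller.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does the file name end in ".so"? */
static int has_so_suffix(const char *name)
{
    size_t n = strlen(name);
    return n > 3 && strcmp(name + n - 3, ".so") == 0;
}

static int cmp_names(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* Collects all "*.so" names in dir, sorted for a deterministic load
 * order. Returns the count (caller frees each name and the array),
 * or -1 if the directory cannot be opened. */
static int find_plugins(const char *dir, char ***out)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    char **names = NULL;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        size_t len;
        char **grown;
        if (!has_so_suffix(ent->d_name))
            continue;
        grown = realloc(names, (size_t)(count + 1) * sizeof *names);
        if (grown == NULL)
            break;                      /* out of memory: keep what we have */
        names = grown;
        len = strlen(ent->d_name) + 1;
        names[count] = malloc(len);
        if (names[count] == NULL)
            break;
        memcpy(names[count], ent->d_name, len);
        count++;
    }
    closedir(d);
    if (count > 1)
        qsort(names, (size_t)count, sizeof *names, cmp_names);
    *out = names;
    return count;
}
```

Sorting matters if one plugin depends on another being loaded first; an explicit dependency list would be the more robust alternative.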

The key point, however, is that in a limited environment, the very same binaries can be used. No recompilation required. This would be scenarios with e.g. embedded devices – or security sensitive environments where only those components that are absolutely vital should run (which is good practice because it protects you from bugs in the not-loaded code).

I personally find it OK to handle the situation as described above. I don’t like magic autoloading of modules.

This modular approach also has great advantages when it comes to maintaining the code and making sure it is as bug-free as possible. Modules tend to be small, and they should be independent of each other. So testing, and finding and fixing bugs that escaped testing, should be considerably easier than with the v2 code base. There are also numerous other advantages, but I think that goes too far for this post…

Comments are appreciated, especially if you do not like what I intend to do. Now is the time to speak up. A few weeks from now, things will probably have evolved too far to change some of the basics.

2007-12-18

Shuttle Tanking Test – good results, launch date affected?

Unfortunately, I was not able to view the full news briefing (I hadn’t expected it to happen that soon). But I know the most important facts and would like to pass them on here. Space shuttle program manager Wayne Hale appeared at the briefing.

Mr. Hale said that the problem is with the “feedthrough” connector. This is a connector on the external tank that connects the in-tank ECO sensors with the lines that go to the point sensor box inside the space shuttle. That connector fails when put under thermal stress, causing some circuits to become open. This is a very rough description, but I hope it helps you get the idea. Mr. Hale said: “We know it is in the connector, but not if it is in the inside or outside part”.

How today’s test results will affect the launch schedule is not yet clear – the results were somewhat unexpected (I am NOT quoting Hale here) and NASA is now checking which forward actions to be taken (this, however, is a Hale quote ;)). One of the exact quotes I was able to get hold of: “The program is being asked to assemble the ISS. We’d like to do that as quickly as we can in regard to safety. We don’t need to go fly if it is not safe. We will follow that trail and see where it leads us. And when we have fixed it we’ll go fly – no matter if it is Jan 10 or Feb 10th or whenever”.

To me, the bottom line is that the January 10th launch is in question. We will know for sure only in a few days, when the data gathered has been analyzed and a proper plan has been crafted. After all, this is why such launch schedules are always “no earlier than”. And I think it is a good thing to put safety first!

With that, I conclude my report for today. I’ll try to do a sum-up tomorrow, but I guess I’ve already covered the most important things.

2007-12-18

rsyslog changes for 2007-12-17

Yesterday’s rsyslog changes:

2007-12-17
– fixed a potential race condition with enqueueMsg() – thanks to mildew
for making me aware of this issue
– created thread-class internal wrapper for calling user supplied thread
main function
– solved an issue when compiling immark.c on some platforms. LARGEFILE
preprocessor defines are changed in rsyslog.h, which causes grief
for zlib. As a temporary solution, I have moved rsyslog.h right at the
beginning of the include order. It’s somewhat dirty, but it works. I think
the real solution will be inside the autoconf files.
– moved thread termination code out to threads.c
– implemented $MarkMessagePeriod config directive
– command $ResetConfigVariables implemented for immark.c
– began imklog, replacing klogd.c (finally we get rid of it…)
– implemented $DebugPrintKernelSymbols
– implemented afterRun input module interface function
– implemented $klogSymbolsTwice config directive
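The enqueueMsg() race is not described in detail here, so the following is only a generic C sketch of a common fix for this kind of race: a bounded queue in which the “is there room?” check and the insert happen under one mutex, so two input threads can never claim the same slot. All names and the queue size are hypothetical, not rsyslog’s queue code.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_SIZE 8

typedef struct {
    void *items[QUEUE_SIZE];
    size_t head, count;
    pthread_mutex_t mut;
    pthread_cond_t not_empty;
} msg_queue_t;

static void queue_init(msg_queue_t *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->mut, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Returns 0 on success, -1 if the queue is full. The message is not
 * freed here; the caller must free or retry it, or it leaks. */
static int enqueue_msg(msg_queue_t *q, void *msg)
{
    int rc = -1;
    pthread_mutex_lock(&q->mut);
    if (q->count < QUEUE_SIZE) {          /* check and insert atomically */
        q->items[(q->head + q->count) % QUEUE_SIZE] = msg;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        rc = 0;
    }
    pthread_mutex_unlock(&q->mut);
    return rc;
}

/* Blocks until a message is available (single consumer assumed). */
static void *dequeue_msg(msg_queue_t *q)
{
    pthread_mutex_lock(&q->mut);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->mut);
    void *msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    pthread_mutex_unlock(&q->mut);
    return msg;
}
```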

As you can see, it was quite a busy day. The input module interface has already materialized for the most part.

2007-12-18

Space Shuttle Atlantis Tanking Test in Progress

Today’s tanking test gathers additional data on ECO sensor (What is an ECO sensor?) failures with the ECO sensor circuits being fully instrumented. In order to gather data, an ECO sensor failure is needed. Before the start of the tanking test, it is anticipated that ECO sensor #3 will fail wet and all others will work correctly.

2:08a: The tanking test just went into fast fill mode. During the last launch attempt, ECO sensor #3 failed only a few minutes into fast fill mode.

8:14a: If I got the commentator right, there already is a failure of a sensor. That would be good.

8:21a: NASA TV commentator: “It appears that #1 sensor has failed, but the hydrogen console has not officially made that call. So they are still looking at it. Number 2 sensor has been intermittent. The engineers at the main propulsion system console are very intently looking at their data right now. The intermittent data on sensor #2 is somewhat of a surprise because we haven’t seen that before on that sensor.”

So, as it looks, the ECO sensor system again behaves differently from the last tanking, which was at the December 9th launch attempt.

8:23a: NASA TV: “The propulsion console has provided confirmation that sensor #1 on liquid hydrogen has hard failed, which is what we wanted. The #2 and #3 are intermittent, we are watching for what those are going to do. We have not seen any activity of number 2 sensor on prior tankings.”

8:26a: NASA TV: “The hard fail on #1 is definitely what we need in order to do the troubleshooting at the pad later this morning.”

So far, this looks very good. Based on analysis done in the past days, sensor #1 is not expected to return to a non-failed state. So this one will hopefully be a good candidate for the troubleshooting, which should then be able to pinpoint the culprit. The intermittent failures of #2 and #3 are outside of expectations, at least as far as I understood the analysis done. However, if they now remain functional, there still may be a good explanation for that. Let’s see how things evolve…

9:32a NASA commentary just announced that engineers have made up their troubleshooting plan. “The TDR sensor equipment will be focussed on sensors #2 and #3 as they have been intermittent and it is the greatest source of interest to see where that intermittent reading is coming from. Number 1 and 4 will be recorded continuously. … Right now they are configuring for the console activity that would be required once we go into stable replenish … very shortly. … So the final inspection team is preparing to enter into the pad, but we probably have another half an hour or so before we are into stable replenish.”

9:48a NASA TV: “The liquid hydrogen tank is full now and in stable replenish. The liquid oxygen (LOX) is at 80%. We’ve probably about another 25 to 30 minutes or so before liquid oxygen is in stable replenish.”

10:00a: NASA TV: “We will begin to drain the tank between 1p and 2p and once the tank is drained the teams go back to do additional trouble shooting. A lot of data has already been collected on sensor #1 because it failed early.”

10:13a: “The problem with the liquid ox pump appears to be that a fuse has blown, so they are switching to the backup lox pump, … which will get us back to where we were, close to stable replenish. This will take around 45 minutes to an hour. Why the fuse blew is still not clear”. So we have some delay in the troubleshooting plan.

11:45a: I needed to go out for a while, thus no updates. Atlantis is now in stable replenish state and troubleshooting is right now happening at the pad. The NASA TV commentary just announced that teams are now looking at ECO sensor #4 and the 5% sensor. Here are some pictures from the testing:

The picture above shows a room right inside the mobile launcher platform. This is where today’s data is analyzed.

11:52a: NASA TV “The … team is now on the mobile launcher platform and gathering data on sensor number 2 and 3.”

11:53a: a side note: the ISS spacewalk has just been completed, and the spacewalkers are back in the airlock, which is now being repressurized. They did not find any obvious problem, but collected samples to be returned to Earth in January on board STS-122.

12:06p: NASA TV: “Data will be collected remotely during de-tanking.” The equipment is currently being set up for this process. People are permitted close to the space shuttle only during stable replenishment phase (which essentially means no tank or detank operation is taking place). During detanking, they must leave for safety reasons. The set up process can be seen in this picture:

12:12p: and now those members at the controls can be seen leaving the control room. Everything is now set up for remote monitoring. The NASA TV commentary tells that good data has been gathered. The tank will be drained to 5% while instrumentation is on. Once it is fully drained, members of the red crew will return for further troubleshooting.

And in this picture, a wire can be seen. This wire is tapped into the ECO sensor circuits and connects to the time domain reflectometer (TDR) equipment that is used to do a detailed analysis of the circuits. TDR is an off-the-shelf technology in widespread use, e.g. by cable companies to detect faulty lines.
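TDR works by sending a fast pulse down the line and timing the reflection that comes back from an impedance change, such as an open circuit in a connector. Since the pulse travels out and back, the distance to the fault is half the round-trip time multiplied by the propagation speed in the cable (the velocity factor times the speed of light). A minimal sketch of that arithmetic in C; the 100 ns round trip and the 0.66 velocity factor are hypothetical example values, not data from this test.

```c
#define SPEED_OF_LIGHT_M_PER_S 299792458.0

/* One-way distance to a reflection: the pulse covers the distance twice. */
static double tdr_fault_distance_m(double round_trip_s, double velocity_factor)
{
    return velocity_factor * SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0;
}
```

With those example values, a reflection arriving 100 ns after the pulse points to a discontinuity about 9.9 m down the cable.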

around 12:25p: I was too slow for a quote, but NASA TV commentary stated that everyone seems to be quite happy with the data gathered. It sounded like a successful test. The commentator also announced that more – and official – information is available in the news briefing later today. For me (being six hours ahead), it probably is too late, but I’ll then do a wrap-up tomorrow morning. Let’s hope that they have actually managed to find the culprit and nail it down – and also let’s hope that the rest of the day (detanking test) also goes well.

12:40p: it looks like the red crew is now leaving the pad after finishing late inspections. I noticed they used binoculars and telescopes during that process. I (now) think they also visually checked for ice buildup and maybe some other things. Probably just part of the usual procedure. Here, the red crew can be seen leaving the mobile launcher platform (the last frame I was able to capture, sorry for the blur…).

The NASA TV commentator announced a few minutes before that engineers are ready for detanking once the red crew leaves. So I assume detanking will begin shortly.

12:48p: NASA TV: “The final inspection team has now left the pad … In the mean time we had a failure of ECO sensor #3, the second sensor that had failed. It was intermittent … and we are collecting data on sensor #3 … right now which is able to be done remotely much as we got the initial data from sensor #1 when it failed earlier today.”

I didn’t get the full quote, but as far as I understood, sensor #3 now also has a hard failure. That, IMHO, would be somewhat bad news because it means that the condition can change while the tank is loaded. However, this is just my uninformed guess. Also, on an actual launch, the shuttle would long be gone and flying in orbit by then. So it may not mean anything at all. I should stop speculating ;)

12:52p: NASA TV: “We are close now to start the external tank draining operation, and we did indeed get the kind of data that we needed to get to the (bottom of?) that trouble.”

12:58p: NASA TV: “We did see sensor #3 also fail, so we try to get some data from it before we start draining.” So it actually has a hard fail and NASA uses the opportunity to get something out of that. To me this sounds like it failed just a few minutes ago, otherwise the red team would have looked at it (wouldn’t they?).

1:00p: NASA TV: “on the recently failed ECO sensor #3” – and indeed, it recently failed. Right in time, one could say…

3:00p: as I wrote, I have been away from my computer for a while. NASA TV has ended coverage of the tanking test in the meantime, so I cannot provide any more status updates. I’ll try to do another post after the news briefing, but that will probably be tomorrow (I hope I’ll be able to get hold of an archived version of the news briefing). All in all, it looks like an excellent day at Kennedy Space Center, with good results that should help uncover the root cause of the ECO sensor issue. Let’s hope that my reading of today’s events is right.

Thanks to everyone who cared to read my post! I hope it has been useful and will remain useful as a reference.

I’ve now also written down some results from the post-test news briefing.