German UNAWE Committee …

Last Saturday, I had the joy of attending a meeting that formally founded the German UNAWE Committee (which, as a side note, made me a member of it).

UNAWE (“Universe Awareness for Young Children”) is an internationally recognized organization that tries to educate young children about astronomy. The target age is roughly between 4 and 10 years. Besides astronomy, UNAWE is also about people (children) from different cultures talking to each other and sharing their experiences. This is a fantastic idea and I like it very much. There are already a number of UNAWE committees all around the world and I am eager to help grow that network.

If you are into astronomy, work with children, or both: consider contributing!

Did I find the rsyslog bug?

I have a new leading theory on the rsyslog segfault bug. Before restructuring everything to get rid of the alarm() calls, I did some more research on the best threading model. More or less by accident, I found a nice note on glibc, reentrancy and the _REENTRANT preprocessor macro. That led me to the “-pthread” compiler option…

Could it be that we “just” have a compiler option problem? So far, we only build with “-lpthread”, which affects only the linker. -pthread, if I understood it right, additionally defines _REENTRANT, which in turn causes reentrant versions of some standard library functions to be used. If that information isn’t outdated, it could very well be our problem.

I am not sure which functions of the runtime library are affected by the _REENTRANT macro, so I do not know whether rsyslog calls any of them. However, non-reentrant runtime library functions would be a good explanation for very rare segfaults that occur only under heavy load and only in multithreading mode. It would also explain why so many in-depth code reviews did not find anything…
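To make “reentrant” a bit more concrete: the classic non-reentrant runtime functions keep hidden static state, for example strtok(), localtime() or gethostbyname(). Whether glibc really switches anything based on _REENTRANT is exactly the open question above, and I do not know whether rsyslog calls any of these functions. The sketch below only illustrates the idea: reentrant_status() reports whether a build had _REENTRANT defined, and count_fields() uses POSIX strtok_r(), which keeps its scan position in caller-provided storage instead of a static variable. Both function names are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* expose strtok_r() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports how this file was compiled. With GCC on
 * Linux, "-pthread" defines _REENTRANT; "-lpthread" alone is only a
 * linker flag and does not. */
const char *reentrant_status(void)
{
#ifdef _REENTRANT
    return "_REENTRANT defined";
#else
    return "_REENTRANT not defined";
#endif
}

/* Hypothetical helper: counts comma-separated fields in 'line', which is
 * modified in place. strtok() would keep its scan position in hidden static
 * storage shared by all threads; strtok_r() keeps it in 'saveptr', which is
 * local to this call and therefore safe to use from several threads. */
size_t count_fields(char *line)
{
    char *saveptr = NULL;
    size_t n = 0;

    assert(line != NULL);
    for (char *tok = strtok_r(line, ",", &saveptr); tok != NULL;
         tok = strtok_r(NULL, ",", &saveptr))
        n++;            /* empty fields ("a,,b") are skipped, as with strtok() */
    return n;
}
```

Compiling the file once with `gcc -pthread` and once with plain `gcc ... -lpthread` should, at least with GCC on Linux, make reentrant_status() report different results.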

This compiler option finding looks very promising. I’ll probably do a special 1.19.11 release with just that patch and then we’ll see what happens. In the meantime, I am waiting to see if somebody comes up with more diagnostic information. But I have finally found a good explanation for what I see, at least in theory…

STS-122 Flight Readiness Review on Friday…

Everything is going very smoothly with Atlantis’ STS-122 launch preparations. No matter where I look, I cannot find any information on problems. So, once again, no news means excellent news!

The flight readiness review, the final approval of the launch date, is scheduled for this Friday. By the looks of it, this is more of a formality than something that will bring up surprises (but of course, you never know…). To quote the NASA space shuttle home page:

NASA managers will hold a flight readiness review on Friday at NASA’s Kennedy Space Center marking the next major milestone for mission STS-122.

NASA officials, space shuttle program managers, engineers and contractors will discuss the readiness of space shuttle Atlantis, the flight crew and payloads to determine if everything is set to proceed for launch. Managers will also select an official launch date at the end of the session. Launch is targeted for Dec. 6 on a mission to install the Columbus laboratory on the International Space Station.

A briefing following the meeting will include Associate Administrator for Space Operations Bill Gerstenmaier, Space Shuttle Program Manager Wayne Hale, International Space Station Program Manager Mike Suffredini and STS-122 Launch Director Doug Lyons.

The briefing will be broadcast live on NASA Television no earlier than 4 p.m. EST.

The real question is probably not if and when Atlantis will launch. The most discussed question currently is whether the STS-122 mission will be extended to allow a focused inspection of the ISS solar alpha rotary joint (SARJ). Anomalies were detected prior to STS-120, and inspections during recent spacewalks staged from the International Space Station showed signs of abrasion. This is an unexpected, not yet understood and potentially serious problem, so it is receiving priority for obvious reasons.

The additional inspection spacewalk requires a two-day mission extension. Unlike its sister ships Discovery and Endeavour, Atlantis is not equipped with the Station to Shuttle Power Transfer System (SSPTS). Thus, Atlantis cannot support missions as long as its sister ships can. So a two-day extension requires fully stocked consumables and is probably not easy to do.

Is alarm() the culprit?

I have once again reviewed the threading code, and now I have a faint hope. To handle mark messages, rsyslog uses an alarm() call. While alarm() typically does not play well with pthreads, rsyslog uses it in a very limited scope. Also, the alarm handler is activated in only one specific thread. But… “alarm() does not play well with pthreads”… In the absence of any better explanation, could this be the actual cause of this hard-to-hunt bug?
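For illustration, here is a minimal sketch of the usual way to tame a process-wide signal like SIGALRM in a threaded program: the handler only sets a flag of type volatile sig_atomic_t, and every thread except the one that should see the signal blocks it with pthread_sigmask(). This is an assumption about a reasonable setup, not a copy of rsyslog’s code; all function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical flag: set in signal context, read and cleared by the listener. */
static volatile sig_atomic_t mark_due = 0;

static void on_sigalrm(int sig)
{
    (void)sig;
    mark_due = 1;                 /* the only thing done inside the handler */
}

/* Install the handler once, before any other thread is created. sa_flags is
 * left at 0, so a blocking select() returns EINTR when the alarm fires. */
int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigalrm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Call at the start of every thread that must NOT receive SIGALRM, so the
 * kernel can deliver it only to the one thread that leaves it unblocked. */
int block_alarm_in_this_thread(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Returns 1 if SIGALRM is blocked in the calling thread, else 0. */
int alarm_is_blocked_here(void)
{
    sigset_t cur;
    int rc = pthread_sigmask(SIG_BLOCK, NULL, &cur);   /* query only */
    assert(rc == 0);
    (void)rc;
    return sigismember(&cur, SIGALRM) == 1;
}

/* Returns 1 and clears the flag if a mark is due, else 0. */
int take_mark_due(void)
{
    if (!mark_due)
        return 0;
    mark_due = 0;
    return 1;
}
```

The check-then-clear in take_mark_due() can merge two closely spaced alarms into one, which is harmless when all that is needed is a single mark message.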

I will now see what is required to get rid of the alarm(). Actually, this is harder than it initially looks. For alarm() to go away, I need to set up a real background thread that does the timer ticks and generates the mark messages. That, in turn, means that I have two concurrent message sources, which involves quite a bit of synchronization. None of that is currently needed, because the alarm signal simply interrupts the select(), which in turn leads to execution of nice sequential code. Oh, and yes: the alarm signal handler (unlike sysklogd’s) of course does nothing but set a global flag variable. So I’ll have a look at all that…
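A background timer thread as described above could look roughly like the following sketch. It is not rsyslog code: struct mark_timer and its functions are hypothetical, and the pending_marks counter stands in for the real message queue, which the listener would then also have to access under the same lock. pthread_cond_timedwait() is used instead of sleep() so that shutdown does not have to wait out a full interval.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical timer state. 'pending_marks' stands in for the real message
 * queue that the listener also writes to; both sides must take 'lock'. */
struct mark_timer {
    pthread_mutex_t lock;
    pthread_cond_t  wake;          /* signalled only to stop the thread early */
    bool            stop;
    unsigned        pending_marks;
    long            interval_ms;
    pthread_t       tid;
};

static void *mark_thread(void *arg)
{
    struct mark_timer *t = arg;

    pthread_mutex_lock(&t->lock);
    while (!t->stop) {
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
        deadline.tv_sec  += t->interval_ms / 1000;
        deadline.tv_nsec += (t->interval_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec++;
            deadline.tv_nsec -= 1000000000L;
        }
        /* Sleep until the deadline or a stop request; the lock is released
         * while waiting, and the loop absorbs spurious wakeups. */
        int rc = 0;
        while (!t->stop && rc != ETIMEDOUT)
            rc = pthread_cond_timedwait(&t->wake, &t->lock, &deadline);
        if (!t->stop)
            t->pending_marks++;                     /* "emit" one mark message */
    }
    pthread_mutex_unlock(&t->lock);
    return NULL;
}

int mark_timer_start(struct mark_timer *t, long interval_ms)
{
    assert(interval_ms > 0);
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->wake, NULL);
    t->stop = false;
    t->pending_marks = 0;
    t->interval_ms = interval_ms;
    return pthread_create(&t->tid, NULL, mark_thread, t);
}

/* Consumer side: fetch and reset the number of pending marks. */
unsigned mark_timer_take(struct mark_timer *t)
{
    pthread_mutex_lock(&t->lock);
    unsigned n = t->pending_marks;
    t->pending_marks = 0;
    pthread_mutex_unlock(&t->lock);
    return n;
}

void mark_timer_stop(struct mark_timer *t)
{
    pthread_mutex_lock(&t->lock);
    t->stop = true;
    pthread_cond_signal(&t->wake);
    pthread_mutex_unlock(&t->lock);
    pthread_join(t->tid, NULL);
    pthread_cond_destroy(&t->wake);
    pthread_mutex_destroy(&t->lock);
}
```

For example, mark_timer_start(&t, 60000) would tick once a minute; the consumer calls mark_timer_take() and emits that many mark messages, and mark_timer_stop() shuts the thread down. The extra lock traffic is exactly the synchronization cost mentioned above.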

Found another part of the code … that is not the problem…

I’d like to reproduce part of my conversation with Peter Vrabec here. Together with others, Peter is very actively helping to track down the nasty rsyslog segfault bug. He had quite a good idea of what could have caused it, which led me to do another focused code inspection. I just explained to him why we have not yet found the problem, and there is a lot of detail in that explanation. Detail, I think, that others can benefit from, too. So here it comes:

> *** glibc detected *** rsyslogd: double free or corruption (!prev):
> 0x09cbd588
> ***
> (gdb) print (uchar *) 0x09cbd588
> $49 = (unsigned char *) 0x9cbd588 ""
>
> is it possible to call free on pszRcvFrom when it points at ""?
> because I can see:
> msg.c:
> if(pM->pszRcvFrom == NULL)
>     return "";
> else
>     return (char*) pM->pszRcvFrom;
>
> same with getRawMsg/getUxTradMsg/…, places where mudflap
> screams in MsgDestruct().

Sorry, but unless I have totally screwed up, this cannot be the problem. I’ve done (yet another) thorough review. All of these functions are either called to supply another Set…() function (which simply copies over the “”, but does not free it) or from MsgGetProp(). MsgGetProp() uses a variable (pbMustBeFreed) to track whether or not a buffer must be freed. In general, unmodified properties are never freed by the callers of these functions; they are freed only at message destruction (MsgDestruct()). MsgDestruct(), however, just checks the message’s own pointers and frees them if they are non-NULL. So it makes no difference to MsgDestruct() whether getRawMsg() or any other get function returns an empty string instead of a NULL pointer.

I have also checked whether MsgGetProp() and its helpers handle pbMustBeFreed correctly, and they do. So a buffer is only freed when it was dynamically allocated. Message properties are only created during message construction and are free()d when the message is destroyed.
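To make this ownership rule concrete, here is a heavily simplified sketch. Only getRcvFrom() mirrors the snippet quoted above; struct msg, getPropSketch() and msgDestructSketch() are hypothetical stand-ins, not rsyslog’s real MsgGetProp() and MsgDestruct(). The point it illustrates: the destructor frees only the fields the message owns, so a getter returning the literal "" for an unset field never reaches free().

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified message with a single property. */
struct msg {
    char *pszRcvFrom;            /* heap-allocated, or NULL if never set */
};

/* Mirrors the quoted getter: never returns NULL, may return a string literal. */
static const char *getRcvFrom(const struct msg *pM)
{
    return pM->pszRcvFrom == NULL ? "" : pM->pszRcvFrom;
}

/* Hypothetical accessor in the spirit of MsgGetProp(): returns either the
 * stored value as-is or a freshly allocated, modified copy, and reports via
 * *pbMustBeFreed which of the two the caller got. */
const char *getPropSketch(const struct msg *pM, bool toUpper, bool *pbMustBeFreed)
{
    assert(pM != NULL && pbMustBeFreed != NULL);
    const char *val = getRcvFrom(pM);

    *pbMustBeFreed = false;
    if (!toUpper)
        return val;                      /* unmodified: caller must NOT free */

    char *copy = strdup(val);
    if (copy == NULL)
        return val;                      /* out of memory: nothing to free */
    for (char *p = copy; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
    *pbMustBeFreed = true;               /* derived copy: caller frees it */
    return copy;
}

/* Hypothetical destructor: looks only at the message's own fields, so what a
 * getter returned for an unset field ("") never matters here. */
void msgDestructSketch(struct msg *pM)
{
    free(pM->pszRcvFrom);                /* free(NULL) is a no-op */
    pM->pszRcvFrom = NULL;
}
```

A caller would use it as `v = getPropSketch(&m, true, &mustFree); … if (mustFree) free((void *)v);`, which is the pattern whose correctness the review above confirmed.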

So I guess we have found another part of the code that does not cause the problem.

What I would like to reiterate, though, is that the segfault problem seems to disappear in all cases when rsyslog is compiled with --disable-pthreads. IF SO (and this is my current assumption), it cannot be a general logic error like a double free but must be a synchronization problem. Let me reiterate: I have yet to find a single installation that segfaults when running on a single thread! Everyone with segfaults who recompiled in single-threading mode no longer experienced any problems. This is strong evidence to me.

But while I am almost sure it is threading-related, I cannot find anything wrong in that area either. The threading model is very simple, and construction/destruction of elements is split in an easy way between the two threads. Basically, the listener thread creates the message and its properties, while the action thread destroys them. There are a few exceptions where properties are derived, but nothing really complex. Yet I still think it is in the threading area; why else would it work in single-thread mode?
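To picture the hand-off described above, here is a minimal sketch of one common way to pass messages from a listener thread to an action thread: a bounded, mutex-protected queue where enqueueing transfers ownership. This is an assumption for illustration; rsyslog’s actual queue may look different, and all names (struct msgq, msgq_put(), msgq_get(), action_thread()) are hypothetical. The invariant that matters is that each message is owned by exactly one thread at a time, so a double free would require breaking it, for example by the listener touching a message after enqueueing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QSIZE 8                       /* hypothetical queue capacity */

/* Bounded queue of message pointers; enqueueing hands ownership over. */
struct msgq {
    pthread_mutex_t lock;
    pthread_cond_t  notEmpty, notFull;
    char           *slot[QSIZE];
    size_t          head, count;
};

void msgq_init(struct msgq *q)
{
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->notEmpty, NULL);
    pthread_cond_init(&q->notFull, NULL);
}

/* Listener side: after this call the producer must not touch 'm' again. */
void msgq_put(struct msgq *q, char *m)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QSIZE)
        pthread_cond_wait(&q->notFull, &q->lock);
    q->slot[(q->head + q->count) % QSIZE] = m;
    q->count++;
    pthread_cond_signal(&q->notEmpty);
    pthread_mutex_unlock(&q->lock);
}

/* Action side: blocks until a message is available. */
char *msgq_get(struct msgq *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->notEmpty, &q->lock);
    char *m = q->slot[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    pthread_cond_signal(&q->notFull);
    pthread_mutex_unlock(&q->lock);
    return m;
}

struct action_args {
    struct msgq *q;
    size_t       processed;
};

/* Action thread: processes and destroys each message; NULL means "shut down". */
void *action_thread(void *arg)
{
    struct action_args *a = arg;
    for (;;) {
        char *m = msgq_get(a->q);
        if (m == NULL)
            break;
        assert(m[0] != '\0');         /* stand-in for real processing */
        free(m);                      /* the only place a message is destroyed */
        a->processed++;
    }
    return NULL;
}
```

The listener would call msgq_put(&q, newMsg) for every received message and msgq_put(&q, NULL) on shutdown; reading a.processed after pthread_join() is safe because the join synchronizes with the finished thread.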

Hunting for the segfault…

Do you remember? We are still hunting for a segfault in rsyslog that is very hard to find. The biggest problem is that most people will never experience it. I do not experience it in the lab, nor does anybody else who is currently working on the project. And without the ability to reproduce it, there is a lot of guesswork involved.

This is why we are asking for the help of our users (that means: you!). If you run rsyslog and experience a segfault, we would very much appreciate it if you could run a specially instrumented version. Peter Vrabec created it, and it contains debugging support as well as mudflap support; mudflap is a tool to track down nasty memory management errors.

The RPM can be found at:

http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-2.mudflap.src.rpm

Please install it. You will notice that CPU usage is higher than normal, but in most cases this is harmless. If you are concerned, drop me a line with specifics and I’ll happily address them.

If a segfault happens when you run this version, please send us

  • the binary,
  • the core dump,
  • /var/log/rsyslog.mudflap, and
  • hardware and OS information (which distro? multiprocessor? …?)

Our sincere hope is that we receive enough reports to find something in common between them. So, please contribute your segfault info if you happen to have it. This is a great way to contribute to the project!

Space Shuttle

I thought it’s time to write a bit about the space shuttle itself. As many say, it is the world’s most complex machine ever built.

The space shuttle made its maiden flight on April 12th, 1981 and, based on current plans, will be retired in 2010 after completing the construction of the International Space Station, its current primary task.

The space shuttle was developed as a replacement for the Apollo spacecraft. Unlike Apollo, it can only reach low Earth orbit; it is not capable of going to the Moon.

The space shuttle’s major components are the boosters, the external tank and the orbiter. The orbiter is the airplane-shaped white “ship” that is commonly called “the shuttle”, though it is only part of it. The reddish external tank contains the ascent fuel. And the white booster rockets on the sides of the external tank provide the main propulsion for the initial flight phase after launch.

Its main feature was the reusability of most parts. Only the external tank is lost on launch; the boosters descend back to Earth on parachutes after separating from the craft. The initial design called for huge savings from that fact, something the space shuttle could not live up to. Some sources say that NASA expected as many as one flight per week and the shuttle to replace all other launch vehicles. In practice, only a few launches per year were achievable, and each of them was much more expensive than initially thought.

The space shuttle program was compromised by budget cuts in its early design phase. Initially, it was planned to have the actual orbiter sitting on top of the external tank and boosters. That concept would obviously have required a different design for the main engines, too. The configuration chosen instead, with the orbiter mounted to the side of these components, has remained a source of trouble to this day. It exposes the orbiter to launch debris, for example pieces of the external tank’s foam insulation that fall off during launch.

Launch debris is very hard to avoid. During launch, every spacecraft is shaken quite thoroughly, so chances are high that something will break off. With all designs except the space shuttle, this poses no problem, because no vital system can be hit by such debris. If you look at Saturn V launches from the Apollo days, you will see lots of ice falling off, but the crew capsule and its support systems sat well protected above the debris source. Consequently, NASA’s new Constellation moon program designs an Apollo-like craft with the vital systems once again sitting on top of the launch propulsion system.

In my personal opinion, the space shuttle is a good example of why budget constraints should not overrule engineering decisions. NASA paid dearly for the initial savings…

Apart from that problem, the space shuttle is an incredible and fascinating machine. Among its many great achievements are the delivery and continued servicing of the Hubble Space Telescope. Also, construction of the International Space Station ISS depends on space shuttles doing the heavy hauling. The space shuttle is also the only spacecraft ever capable of capturing massive satellites in orbit and returning them to Earth.

The space shuttle is also very inspiring. Viewing a space shuttle launch is a special experience.

Astronauts also praise the space shuttle for its roominess and the smooth ascent and descent, which puts very low G-forces on the crew.

NASA’s future Constellation space program borrows heavily from both the space shuttle and Apollo programs and is expected to get the best of both worlds. For example, the Ares rockets will fly modified space shuttle boosters.

So while the space shuttle has some weaknesses, it is a very successful craft that not only contributed significantly to science, but will also help pave the way to the Moon, Mars and beyond. In my personal opinion, even the weaknesses were a kind of success: they showed which things needed to be done differently. And, of course, a lot of issues were already fixed during the lifetime of the space shuttle program.

Currently, the shuttle fleet is set to retire in 2010. This is a political decision not backed by hard technical facts. Personally, I would like to see the space shuttle fly at least once a year until the Ares I and Orion vehicles are ready to launch. Of course, I do not know exactly what this would require, but I am a bit hesitant to leave access to the International Space Station to the Russians alone. I also doubt that the gap in carrying humans into space will really be “just” six years; parts of the Constellation program’s schedule are already slipping. And with an endeavor as complex as Constellation, it would be wise to count on more schedule slips. I wouldn’t be surprised if the first manned Ares flight did not happen before 2018…

The space shuttle has received numerous fixes both in procedures and technology. It is more capable than ever before. It is safer than ever before. Wouldn’t it be wise to count on it as long as its successor is not ready?

November 24th ISS spacewalk a success

Astronauts Peggy Whitson and Dan Tani conducted a successful spacewalk yesterday. It was the last in a series of important construction spacewalks that readied the International Space Station ISS to receive the European Columbus module.

Columbus will be delivered by space shuttle Atlantis’ STS-122 mission, set to launch from Kennedy Space Center on December 6th, 2007. As such, the success of the spacewalk was also important for STS-122. Without it, a launch would not have been possible.

Now, with the successful spacewalk and everything going very smoothly in the processing of Atlantis, it looks like weather is becoming the only constraint for the launch attempt. This is good news, because STS-122 has a very short launch window: it extends for just one week. So there is not much room for delays.

Here is some more detailed information from the NASA homepage:

Spacewalkers Peggy Whitson and Dan Tani completed Saturday’s spacewalk at 11:54 a.m. EST. The 7 hour and 4 minute excursion started an hour and 10 minutes early. They completed their main tasks well ahead of the timeline then moved on to perform some get-ahead work.

The two spacewalkers moved the 300-pound, 18.5 foot Loop B fluid tray from the station’s main truss to the port side of Destiny and completed fluid and electrical connections.

Tani did an inspection of a Solar Alpha Rotary Joint that had previously shown increased power consumption and vibration while rotating as it followed the Sun. Whitson deployed and mated cables to be used as part of the Station to Shuttle Power Transfer System, or SSPTS. A portable foot restraint was also installed on Node 2 for upcoming spacewalks when the European Columbus laboratory is installed on the STS-122 mission.

If you like even more details, you can find them on an additional NASA page devoted to Saturday’s spacewalk.

Spaceports: Obama Would Delay Moon Return

I have some time to review other space blogs right now (an advantage of doing business with the US: if it is a holiday over there, I have some spare time, too ;)). I read this interesting report on potential cuts to NASA’s budget:

Spaceports: Obama Would Delay Moon Return

I do not like the idea at all. What needs to be known is that the money that fuels NASA’s Constellation moon program is already taken from the regular budget. There was no budget increase to go along with the plan to return to the Moon. NASA’s science program is already suffering very badly.

If additional funds are now taken from NASA’s budget, that would IMHO severely compromise NASA’s ability to do useful missions. Not to mention the fact that NASA would depend on Russia for all its manned space flight activities for at least a decade.

Even though I am German, I do not like this idea at all. But, granted, it’s the same problem everywhere: Germany cut its space budget so much that, even though we have a number of slots in current ESA and NASA missions, we do not have any funding left to use them :( …

Apollo Mission in Pictures…

I just found a nice link I’d like to share: a quick look at NASA’s Apollo missions in pictures. I personally think that the new “moon race” now being carried out is at least as interesting as the Apollo missions. And I find it very interesting that NASA’s Constellation program builds on so many Apollo concepts.

Give the link a try; the pictures are really inspiring. BTW: does anybody have a recording of the old moon TV coverage? Having a few samples online would be really great…