November 2007 - Rainer Gerhards

2007-11-30

John Glenn on the NASA Budget

I had the pleasure of listening to great American hero and former Senator John Glenn at World Space Expo 2007. The event was held this November at Kennedy Space Center, Florida.

Both John Glenn and Scott Carpenter were honored guests at the evening event. Apollo 15 astronaut Al Worden talked with them about their experiences as well as their visions for the future. I took some videos of that event. Thankfully, I also captured John Glenn’s speech on the future NASA budget. He very rightly stressed the fact that Constellation, NASA’s new moon program, has taken a lot of money from science missions. He explained that there is no special funding for the whole Constellation program. But listen for yourself:

This speech could not be more timely: Democratic presidential candidate Barack Obama is considering taking that Constellation money from NASA to fund education. So I think it is good to know that NASA has not received any special funding and that its science activities are already starving.

If you listen closely, however, you will notice that John Glenn gives science priority over the moon program. But that does not mean that money taken away from science should now be removed from the budget altogether…

2007-11-29

Astronauts will wear overgloves…

In the picture to the right, you see the actual layers of a current space glove. I shot this picture when I attended World Space Expo 2007 at Kennedy Space Center. The green part on the left is the inner pressure bladder; the one in the middle is worn over it and can be adjusted to the astronaut’s hand. The white glove on the right is the outer layer. NASA always speaks of five glove layers, but I think this refers to layers of material applied to the three glove parts you see in the picture. At least, I could not find anything else (if you happen to know, I’d appreciate learning about it).

On the recent International Space Station spacewalks (aka “EVA”), there were frequent problems with cut or punctured gloves. Thankfully, these cuts were always confined to the outer layers and posed no risk to the spacewalkers. It is believed that there are some unknown sharp edges on the space station, but nobody knows for sure where (that’s why they are unknown ;)).

Below, find a picture of a damaged space glove. This was taken after a spacewalk on the STS-118 mission:

punctured space glove after STS-118 spacewalk

To protect the astronauts, frequent glove checks are now a requirement during spacewalks. However, detecting a glove issue can cut a spacewalk short and thus seriously compromise the mission. To prevent that problem, STS-122 spacewalkers will wear overgloves. The overgloves had their first live test during the STS-120 spacewalk devoted to repairing the torn solar array.

I have not yet seen an actual picture of these overgloves. But obviously, they cost some feel and flexibility, so tasks carried out by the astronauts may take a bit longer than usual. NASA has issued only a conditional order to wear the overgloves: for delicate work, spacewalkers may remove them. This is also allowed if time is running out on a spacewalk. Doing so poses no extra risk, as the strict glove-checking guidelines then apply. So the overgloves are actually meant more to protect the mission than the astronaut.

2007-11-29

STS-122 Press Kit Available

For everybody interested, the STS-122 press kit can now be downloaded from NASA. The press kit is an excellent resource for in-depth information on the flight AND for great pictures. I recommend it to anyone really interested in this flight!

2007-11-28

German UNAWE Committee …

Last Saturday, I had the joy of attending a meeting that formally founded the German UNAWE Committee (which, as a side note, made me a member of it).

UNAWE (“Universe Awareness for Young Children”) is an internationally recognized organization that tries to educate young children about astronomy. The target age is roughly between 4 and 10 years. Besides astronomy, UNAWE is also about people (children) from different cultures talking to each other and sharing their experiences. This is a fantastic idea and I like it very much. There are already a number of UNAWE committees around the world and I am eager to help grow that network.

If you are into astronomy, work with children, or both: consider contributing!

2007-11-28

Did I find the rsyslog bug?

I have a new leading theory on the rsyslog segfault bug. Before restructuring everything to get rid of the alarm() calls, I did some more research with respect to the best threading model. More or less by accident, I found a nice note on glibc, reentrancy and the _REENTRANT preprocessor macro. That led me to the “-pthread” compiler option…

Could it be that we “just” have a compiler option problem? So far, we only build with “-lpthread”, which affects only the linker. If I understood it right, -pthread also defines _REENTRANT, which in turn causes reentrant versions of some standard library functions to be used. If that isn’t outdated information, it could pretty much be our problem.
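To check which of the two situations a given build is in, a translation unit can simply test the macro. This is a generic illustration, not rsyslog code: the function name compiled_with_reentrant() is hypothetical, and it assumes GCC on Linux, where -pthread passed at compile time typically defines _REENTRANT while -lpthread only affects linking.

```c
/* Hypothetical probe: reports whether this translation unit was compiled
 * with _REENTRANT defined. On GCC/Linux, -pthread at compile time typically
 * defines it; -lpthread is a linker option and has no effect here. */
int compiled_with_reentrant(void)
{
#ifdef _REENTRANT
    return 1;
#else
    return 0;
#endif
}
```

Compiling the same file once with -pthread and once with only -lpthread, then calling the function, should show the difference on such a toolchain.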

I am not sure which runtime library functions are affected by the _REENTRANT macro, so I do not know whether rsyslog calls any of them. However, non-reentrant runtime library functions would be a good explanation for very rare segfaults that occur only under heavy load and only in multithreaded mode. It would also explain why so many in-depth code reviews did not find anything…
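As a generic illustration of the kind of function meant here (not taken from rsyslog, and without claiming rsyslog uses it): strtok() keeps its scan position in hidden static state, so two threads tokenizing at the same time can corrupt each other’s scans. The POSIX reentrant variant strtok_r() moves that state into a caller-supplied pointer. The helper count_fields() below is hypothetical and assumes input lines shorter than 256 bytes (longer ones are truncated).

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Hypothetical helper: counts whitespace-separated fields in `line`.
 * It works on a private copy and keeps the tokenizer position in the
 * local `saveptr`, so concurrent calls from different threads cannot
 * interfere the way plain strtok() calls could. */
int count_fields(const char *line)
{
    char buf[256];
    char *saveptr = NULL;
    int n = 0;

    strncpy(buf, line, sizeof(buf) - 1);   /* truncates overly long input */
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok_r(buf, " \t", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, " \t", &saveptr))
        n++;
    return n;
}
```

For example, count_fields("<13>Nov 28 host app: msg") yields 5.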

This compiler option finding looks very promising. I’ll probably do a special 1.19.11 release with just that patch and then we’ll see what happens. In the meantime, I’ll wait to see if somebody comes up with more diagnostic information. But I have finally found a good explanation for what I see, at least in theory…

2007-11-27

STS-122 Flight Readiness Review on Friday…

Everything is going very smoothly with Atlantis’ STS-122 launch preparations. No matter where I looked, I could not find any information on problems. So, once again, no news is excellent news!

The flight readiness review, the final approval of the launch date, is scheduled for this Friday. By the look of things, it is more a formal act than something that will bring up surprises (but of course, you never know…). To quote the NASA space shuttle home page:

NASA managers will hold a flight readiness review on Friday at NASA’s Kennedy Space Center marking the next major milestone for mission STS-122.

NASA officials, space shuttle program managers, engineers and contractors will discuss the readiness of space shuttle Atlantis, the flight crew and payloads to determine if everything is set to proceed for launch. Managers will also select an official launch date at the end of the session. Launch is targeted for Dec. 6 on a mission to install the Columbus laboratory on the International Space Station.

A briefing following the meeting will include Associate Administrator for Space Operations Bill Gerstenmaier, Space Shuttle Program Manager Wayne Hale, International Space Station Program Manager Mike Suffredini and STS-122 Launch Director Doug Lyons.

The briefing will be broadcast live on NASA Television no earlier than 4 p.m. EST.

The real question is probably not if and when Atlantis will launch. The most discussed question currently is whether the STS-122 mission will be extended to allow a focused inspection of the ISS solar alpha rotary joint (SARJ). Anomalies were detected prior to STS-120, and inspection during recent spacewalks staged from the International Space Station showed signs of abrasion. This is an unexpected, not yet understood and potentially serious problem, so it is receiving priority for obvious reasons.

The additional inspection spacewalk requires a two-day mission extension. Unlike its sister ships Discovery and Endeavour, Atlantis is not equipped with the Station-to-Shuttle Power Transfer System (SSPTS), which lets a docked orbiter draw power from the station. Thus, Atlantis cannot support missions as long as its sister ships can. So a two-day mission extension requires fully stocked consumables and is probably not easily done.

2007-11-27

Is alarm() the culprit?

I have once again reviewed the threading, and now I have a faint hope. To handle mark messages, rsyslog calls alarm(). While alarm() typically does not play well with pthreads, rsyslog uses it in a very limited scope. Also, the alarm handler is activated in only one specific thread. But… “alarm() does not play well with pthreads”… In the absence of a better explanation, could this be the actual cause of this hard-to-find bug?
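One common way to keep a process-directed signal such as SIGALRM confined to a single thread is to block it with pthread_sigmask() before creating other threads (which inherit the mask) and to unblock it only in the thread that should handle it. This is a minimal sketch of that technique, assuming this is roughly what “activated in only one specific thread” amounts to; I have not checked it against rsyslog’s actual code, and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Adds (SIG_BLOCK) or removes (SIG_UNBLOCK) SIGALRM from the calling
 * thread's signal mask. Returns 0 on success. */
static int mask_sigalrm(int how)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return pthread_sigmask(how, &set, NULL);
}

/* Returns 1 if SIGALRM is blocked in the calling thread, 0 if not. */
int sigalrm_blocked_here(void)
{
    sigset_t cur;
    pthread_sigmask(SIG_BLOCK, NULL, &cur);   /* NULL set: query only */
    return sigismember(&cur, SIGALRM);
}

/* Worker body: records whether it inherited a blocked SIGALRM. */
static void *worker(void *out)
{
    *(int *)out = sigalrm_blocked_here();
    return NULL;
}

/* Blocks SIGALRM, spawns a worker (which inherits the blocked mask), then
 * unblocks SIGALRM again in the calling thread only. Returns the worker's
 * "blocked" flag, or -1 on error. */
int spawn_worker_without_sigalrm(void)
{
    int worker_blocked = -1;
    pthread_t t;

    if (mask_sigalrm(SIG_BLOCK) != 0)
        return -1;
    int rc = pthread_create(&t, NULL, worker, &worker_blocked);
    mask_sigalrm(SIG_UNBLOCK);        /* only this thread gets SIGALRM back */
    if (rc != 0)
        return -1;
    pthread_join(t, NULL);            /* demo only: lets us read the result */
    return worker_blocked;
}
```

The worker is joined immediately only so the example can observe its mask; a real worker would keep running with SIGALRM blocked, leaving the creating thread as the only candidate for delivery.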

I will now see what is required to get rid of the alarm(). Actually, this is harder than it initially looks. For alarm() to go away, I need to set up a real background thread that generates the timer ticks and causes the mark messages. That, in turn, means that I have two concurrent message sources, which involves quite a bit of synchronization. None of that is currently needed, as the alarm signal simply interrupts the select(), which in turn leads to the execution of nice sequential code. Oh, and yes: unlike in sysklogd, the alarm signal handler of course does nothing but set a global flag variable. So I’ll have a look at all that…
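The current pattern described above can be sketched as a SIGALRM handler that only sets a volatile sig_atomic_t flag, plus a select() loop that notices the interruption and does the real work in ordinary sequential code. This is a generic sketch, not rsyslog’s implementation; install_mark_handler(), wait_once() and the commented-out emit_mark_message() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Set by the SIGALRM handler; setting it is all the handler does. */
static volatile sig_atomic_t mark_due = 0;

static void on_alarm(int sig)
{
    (void)sig;
    mark_due = 1;                 /* async-signal-safe: just set a flag */
}

/* Installs the handler without SA_RESTART, so an interrupted blocking
 * call returns EINTR instead of being restarted. */
int install_mark_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* One iteration of a select()-based receive loop on `fd`. If SIGALRM
 * interrupted the wait, the mark is handled here in normal code.
 * Returns 1 if a mark was due, 0 if not, -1 on a real select() error. */
int wait_once(int fd, unsigned mark_interval)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    alarm(mark_interval);
    int rc = select(fd + 1, &rfds, NULL, NULL, NULL);
    alarm(0);                     /* cancel the timer if data came first */

    if (rc < 0 && errno != EINTR)
        return -1;
    if (mark_due) {
        mark_due = 0;
        /* emit_mark_message();  -- hypothetical "-- MARK --" writer */
        return 1;
    }
    /* rc > 0: a real message is readable on fd and would be read here */
    return 0;
}
```

In a multithreaded process, SIGALRM may be delivered to any thread that does not block it, so this pattern only stays simple if the signal is masked everywhere except in the thread running this loop.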

2007-11-27

Found another part of the code… that is not the problem…

I’d like to reproduce part of my conversation with Peter Vrabec here. Together with others, Peter is very actively helping track down the nasty rsyslog segfault bug. He had quite a good idea of what could be causing it, which led me to do another focused code inspection. I just explained to him why that code is not the culprit, and the explanation contains a lot of detail that I think others can benefit from, too. So here it is:

> *** glibc detected *** rsyslogd: double free or corruption (!prev):
> 0x09cbd588
> ***
> (gdb) print (uchar *) 0x09cbd588
> $49 = (unsigned char *) 0x9cbd588 ""
>
> is it possible to call free on pszRcvFrom when it points at ""?
> because I can see:
> msg.c:
> if(pM->pszRcvFrom == NULL)
>     return "";
> else
>     return (char*) pM->pszRcvFrom;
>
> same with getRawMsg/getUxTradMsg/…, places where mudflap
> screams in MsgDestruct().

Sorry, but unless I have totally screwed up, this cannot be the problem. I’ve done (yet another) thorough review. All of these functions are called either to supply another Set…() function (which simply copies over the “”, but does not free it) or from MsgGetProp(). MsgGetProp() uses a variable (pbMustBeFreed) to track whether or not a buffer must be freed. In general, unmodified properties are never freed there; that happens only at message destruction (MsgDestruct()). MsgDestruct(), however, just checks the stored property pointers and frees them if they are non-NULL. So MsgDestruct() never sees the empty string returned by getRawMsg() or another get function; all that matters to it is whether the stored pointer is NULL.

I have also checked whether MsgGetProp() and its helpers handle pbMustBeFreed correctly, and they do. So a buffer is freed only when it was dynamically allocated. Message properties are created only during message construction and are free()ed when the message is destroyed.
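The ownership rules described above can be illustrated with a simplified message object. This is not rsyslog’s actual code: the struct layout, get_rcv_from(), get_prop_upper() and msg_destruct() are hypothetical stand-ins, assuming that stored property pointers are either NULL or heap memory owned by the message. The sketch shows the three rules: getters may hand out a string literal, a flag in the style of pbMustBeFreed tells the caller whether it owns the returned buffer, and the destructor only ever frees the stored pointers.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified message: each pointer is NULL or owned heap memory. */
struct msg {
    char *pszRawMsg;
    char *pszRcvFrom;
};

/* Getter: never returns NULL, substitutes a literal "". Callers must never
 * free() what it returns. */
const char *get_rcv_from(const struct msg *pM)
{
    return pM->pszRcvFrom == NULL ? "" : pM->pszRcvFrom;
}

/* Derived property (upper-cased sender). *pbMustBeFreed tells the caller
 * whether the result is a fresh heap buffer it now owns. */
char *get_prop_upper(const struct msg *pM, int *pbMustBeFreed)
{
    const char *src = get_rcv_from(pM);
    if (*src == '\0') {             /* nothing to derive: pass the literal on */
        *pbMustBeFreed = 0;
        return (char *)src;         /* must not be written to or freed */
    }
    size_t len = strlen(src);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        *pbMustBeFreed = 0;
        return NULL;
    }
    for (size_t i = 0; i <= len; i++)   /* includes the terminating '\0' */
        copy[i] = (char)toupper((unsigned char)src[i]);
    *pbMustBeFreed = 1;
    return copy;
}

/* Destructor: frees only the stored pointers, and only if non-NULL. It never
 * sees getter results, so a "" returned by a getter cannot reach free(). */
void msg_destruct(struct msg *pM)
{
    if (pM == NULL)
        return;
    if (pM->pszRawMsg != NULL)
        free(pM->pszRawMsg);
    if (pM->pszRcvFrom != NULL)
        free(pM->pszRcvFrom);
    free(pM);
}
```

A caller of get_prop_upper() frees the result only when the flag is set; with an unset sender it receives "" and a cleared flag, so nothing is freed.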

So I guess we have found another part of the code that does not cause the problem.

What I would like to reiterate, though, is that the segfault problem seems to disappear under all circumstances if rsyslog is compiled with --disable-pthreads. IF SO (and this is my current assumption), it cannot be a general logic error like a double free but must be a synchronization problem. I have yet to find a single installation that segfaults when running on a single thread! Everyone with segfaults who recompiled in single-threaded mode no longer experienced any problems. This is strong evidence to me.

But while I am almost sure it is threading-related, I cannot find anything wrong in that area either. The threading model is very simple, and construction/destruction of elements is split in a straightforward way between the two threads. Basically, the listener thread creates the message and its properties, while the action thread destroys them. There are a few exceptions where properties are derived, but nothing really complex. Yet I still think it is in the threading area; why else would it work in single-threaded mode?
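The listener/action split described above is essentially a producer/consumer queue. The following is a minimal sketch under that assumption, using a mutex and a condition variable; it is not rsyslog’s queue, and all type and function names are hypothetical. The invariant it encodes is that once msgq_push() has handed a message over, only the action thread touches or frees it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message node: created by the listener, freed by the action thread. */
struct qmsg {
    char *text;
    struct qmsg *next;
};

struct msgq {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct qmsg *head, *tail;
    int shutdown;
    int processed;              /* messages consumed, for the usage example */
};

void msgq_init(struct msgq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
    q->shutdown = 0;
    q->processed = 0;
}

/* Listener side: build the message first, then hand it over under the lock.
 * After this returns 0, the caller must not touch the message again. */
int msgq_push(struct msgq *q, const char *text)
{
    struct qmsg *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    m->text = strdup(text);
    if (m->text == NULL) {
        free(m);
        return -1;
    }
    m->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Asks the action thread to exit once the queue is drained. */
void msgq_shutdown(struct msgq *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutdown = 1;
    pthread_cond_broadcast(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}

/* Action thread: dequeue under the lock, process and free outside it. */
void *action_thread(void *arg)
{
    struct msgq *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL && !q->shutdown)
            pthread_cond_wait(&q->nonempty, &q->lock);
        struct qmsg *m = q->head;
        if (m == NULL) {        /* shutdown requested and queue empty */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
        q->processed++;
        pthread_mutex_unlock(&q->lock);

        /* ... actions on m->text would run here ... */
        free(m->text);
        free(m);
    }
}
```

If that handover invariant were ever violated, for example by the listener still reading a property after pushing the message, it could produce the kind of rare, load-dependent crash that vanishes in single-threaded builds.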

2007-11-27

Hunting for the segfault…

Do you remember? We are still hunting for a segfault in rsyslog that is very hard to find. The biggest problem is that most people will never experience it. I do not experience it in the lab, nor does anybody else who is currently working on the project. And without the ability to reproduce it, there is a lot of guesswork involved.

This is why we are asking for the help of our users (that means: you!). If you run rsyslog and experience a segfault, we would very much appreciate it if you could run a specially instrumented version. Peter Vrabec created it, and it contains debugging support as well as mudflap support, a tool for tracking down nasty memory management errors.

The RPM can be found at:

http://people.redhat.com/pvrabec/rpms/rsyslog/rsyslog-1.19.10-2.mudflap.src.rpm

Please install it. You will notice that CPU usage is higher than normal, but in most cases this is harmless. If you are concerned, drop me a line with specifics and I’ll happily address them.

If a segfault happens while you run this version, please send us:

the binary,

the core dump,

/var/log/rsyslog.mudflap, and

hardware and OS information (which distro? multiprocessor? …?)

Our sincere hope is that we receive enough reports to find something in common between them. So, please contribute your segfault info if you happen to have it. This is a great way to contribute to the project!

2007-11-25

Space Shuttle

I thought it’s time to write a bit about the space shuttle itself. Many call it the most complex machine ever built.

The space shuttle made its maiden flight on April 12, 1981 and, based on current plans, will be retired in 2010 after completing the construction of the International Space Station, its current primary task.

The space shuttle was developed as a replacement for the Apollo spacecraft. Unlike Apollo, it can reach only low Earth orbit; it is not capable of going to the Moon.

The space shuttle’s major components are the solid rocket boosters, the external tank and the orbiter. The orbiter is the airplane-shaped white “ship” that is commonly called “the shuttle”, though it is only part of it. The reddish external tank holds the propellant that the orbiter’s main engines burn during ascent. And the white booster rockets on the sides of the external tank provide the main propulsion for the initial flight phase after launch.

Its main feature was the reusability of most parts. Only the external tank is lost on each launch; the boosters descend by parachute into the ocean after separation and are recovered. The initial design promised huge savings from that fact, something the space shuttle could not live up to. According to some sources, NASA expected as many as one flight per week and the shuttle to replace all other launch vehicles. In practice, only a few launches per year were achievable, and each of them was much more expensive than initially thought.

The space shuttle program was compromised by budget cuts in its early design phase. Initially, it was planned to have the actual orbiter sitting on top of the external tank and boosters. That concept would obviously have required a different main engine design, too. The configuration finally chosen, with the orbiter mounted to the side of these components, remains a source of trouble to this day. It exposes the shuttle to launch debris, for example parts of the external tank’s foam insulation that fall off during launch.

Launch debris is very hard to avoid. On launch, every spacecraft is shaken quite violently, so chances are high that something will come off. With every design except the space shuttle, this poses no problem, because no vital system can be hit by such debris. If you look at Saturn V launches from the Apollo days, you will see lots of ice falling off, but the crew capsule and its support systems sat well protected above the debris source. Consequently, NASA’s new Constellation moon program again uses an Apollo-like craft with the vital systems sitting on top of the launch propulsion system.

In my personal opinion, the space shuttle is a good example of why budget constraints should not overrule engineering decisions. NASA paid dearly for the initial savings…

Apart from that problem, the space shuttle is an incredible and fascinating machine. Among its many great achievements are the delivery and continued servicing of the Hubble Space Telescope. Also, construction of the International Space Station depends on space shuttles doing the heavy hauling. The space shuttle is also the only spacecraft ever capable of capturing massive satellites in orbit and returning them to Earth.

The space shuttle is also very inspiring. Viewing a space shuttle launch is a special experience.

Astronauts also praise the space shuttle for its roominess and the smooth ascent and descent, which puts very low G-forces on the crew.

NASA’s future Constellation program borrows heavily from both the space shuttle and Apollo programs and is expected to combine the best of both worlds. For example, the Ares rockets will fly modified space shuttle boosters.

So while the space shuttle has some weaknesses, it is a very successful craft that not only contributed significantly to science, but will also help pave the way to the Moon, Mars and beyond. In my personal opinion, even the weaknesses were a kind of success: they showed which things needed to be done differently. And, of course, many issues were already fixed during the lifetime of the space shuttle program.

Currently, the shuttle fleet is set to retire in 2010. This is a political decision not backed by hard technical facts. Personally, I would like to see the space shuttle fly at least once a year until the Ares I and Orion vehicles are ready to launch. Of course, I do not know exactly what this requires, but I am a bit hesitant to leave access to the International Space Station to the Russians alone. I also doubt that the gap in which NASA cannot carry humans into space will really be “just” six years: the Constellation program already has some of its schedules slipping. And with an endeavor as complex as Constellation, it would be wise to expect more schedule slips. I wouldn’t be surprised if the first crewed Ares flight did not happen before 2018…

The space shuttle has received numerous fixes both in procedures and technology. It is more capable than ever before. It is safer than ever before. Wouldn’t it be wise to count on it as long as its successor is not ready?