virtual appliance for disaster recovery?

I was asked what role virtual appliances play in disaster recovery planning, and I thought I would share my view here. Speaking for ourselves as a smaller company: we are moving towards virtual environments not only to consolidate systems, but also because it is much easier to move functionality from a failed system to another one. Some functions (like a mail gateway or a firewall) do not even require state data, so they can simply be restored from a generic template virtual machine.

Instantiating such a template is much quicker than building a machine from scratch with scripts, not to mention that we do not need to keep the hardware in stock. In fact, we are thinking about moving such functionality to data center servers, so that we can quickly switch between them if the need arises.

My syslog appliance could play a similar role in disaster recovery. While losing log data is probably not acceptable (depending on the use case), it may make sense to set up a new temporary appliance just to continue gathering data and provide analysis while the rest of the system is restored. Instant log analysis is probably a key thing you would like to have in the early recovery stages.

Doing an appliance right…

Why do people turn to (virtual) software appliances? I think the number one reason is ease of installation. If an appliance has one benefit, it is that the system was put together by someone who really knows what they are doing. So the end user can simply plug it into the local network, do a few configuration steps and enjoy the software.

While working on the virtual syslog appliance, we checked out various other appliances. They live up to this promise in very different ways. Some are really plug and play, while others are more of a demo of a complicated system, where the user does not know what to do with the appliance without reading through a big manual. This is definitely not what people are after when they look for appliances.

With SyslogAppliance, I try hard to make things as simple as possible. I learned that I probably need to add a nice HTML start page, not only the plain phplogcon log analysis display. So I have now begun work on this appliance home page, only to see that displaying information is probably not sufficient.

I will need to do some basic configuration of the appliance, too. I was (and am) tempted to use something like webmin. But on the other hand, there are so many settings, and I think most appliance users will never want to touch them. So a full config front-end is probably good for those in the know. But for the rest, a software appliance should come with the bare minimum of config options that are absolutely essential to do the job. For me, the “make everything configurable” expert, this is a hard lesson to learn. Usability is top priority with appliances, and usability means presenting only those options that are useful to most folks (the rest will probably not use an appliance, at least not for anything but a demo).

I thought I would share this interesting thought on my way to creating great virtual software appliances. Besides logging, I have some other ideas (and all benefit from a great logging interface), but it is too early to talk about these now.

New rsyslog HUP processing

There have been some discussions about rsyslog HUP processing. Traditionally, SIGHUP is used to signal the syslogd to a) close its files and b) reload its config. Rsyslog carried over this behavior from sysklogd.

However, rsyslog is much more capable than sysklogd. Among other things, it is able to buffer messages that were received but could not yet be processed. To remain compatible with the sysklogd way of doing HUP, rsyslogd does a full daemon restart when it is HUPed. Among other things, that means that messages in the queue are discarded, at least if the queue is configured with default settings. David Lang correctly stated that this may surprise some, if not most, users. While I am still of the view that discarding the queue under these circumstances is the right thing to do, I agree it may be surprising (I recently added a hint to the man pages to reduce the level of surprise).

Still, there is no real need to do a full daemon restart in most cases. The typical HUP case is when logrotate wants to rotate files away and needs to tell rsyslogd to close them. I actually asked if anybody knew of any script that HUPs rsyslog to do a full config reload. The outcome was that nobody did. However, some people would like to stick with the old semantics, and there may be reason to do so.

I have now implemented a lightweight HUP to address this issue. It is controlled via a new configuration directive, $HUPisRestart. If set to “on”, rsyslogd will work as usual and do a (very, very expensive) full restart. This is the default, to keep folks happy who want to keep things as backwards-compatible as possible. Still, I guess most folks will set it to “off”, which is the new non-restart mode. In it, only output files are closed. More precisely, each output plugin receives a HUP notification and can do whatever it likes. Currently, only omfile acts on that and closes any open files. I can envision that other outputs, e.g. omfwd, can also be configured to do some light HUP action (for example, close outbound connections).

The administrator needs to select one of the two modes for the system. I think this is no issue at all, and it saves me the trouble of defining multiple signals just to do different types of HUP. My suggestion obviously is to use the new lightweight HUP for file closing, which means you need not change anything for logrotate et al. Then, when you need to do a config reload, do a “real” restart by issuing a command like “/etc/init.d/rsyslogd restart”. And if there really exists a script that requires a config-reload HUP, that script should be changed accordingly.
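
To make the difference between the two modes a bit more tangible, here is a minimal Python sketch of the idea. It is not rsyslog code: the class names `FileOutput` and `Daemon`, the `hup_is_restart` flag (standing in for $HUPisRestart) and the in-memory list used as a queue are all made up for illustration. The "restart" branch only models the one effect described above, namely that queued messages are lost with default settings.

```python
import signal


class FileOutput:
    """Toy stand-in for a file output that reacts to a HUP notification."""

    def __init__(self, path):
        self.path = path
        self._fh = None

    def write(self, msg):
        # Open lazily, so the first write after a HUP recreates the file.
        if self._fh is None:
            self._fh = open(self.path, "a", encoding="utf-8")
        self._fh.write(msg + "\n")
        self._fh.flush()

    def on_hup(self):
        # Lightweight HUP action: close the file, nothing else.
        if self._fh is not None:
            self._fh.close()
            self._fh = None


class Daemon:
    """Hypothetical daemon skeleton; hup_is_restart mimics $HUPisRestart."""

    def __init__(self, outputs, hup_is_restart=True):
        self.outputs = outputs
        self.hup_is_restart = hup_is_restart
        self.queue = []           # messages received but not yet written
        self.hup_pending = False

    def install_signal_handler(self):
        # The handler only sets a flag; the work happens in the main loop.
        signal.signal(signal.SIGHUP, lambda signum, frame: self.request_hup())

    def request_hup(self):
        self.hup_pending = True

    def handle_pending_hup(self):
        if not self.hup_pending:
            return
        self.hup_pending = False
        if self.hup_is_restart:
            # Models the old semantics: queued messages are discarded.
            self.queue.clear()
        # In both modes the outputs get notified and may close their files.
        for out in self.outputs:
            out.on_hup()

    def process_queue(self):
        self.handle_pending_hup()
        while self.queue:
            msg = self.queue.pop(0)
            for out in self.outputs:
                out.write(msg)
```

With `hup_is_restart=False`, a log rotation tool can rename the file and send the HUP; the next message then lands in a freshly created file and nothing that was queued gets lost.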

rsyslog v4

Finally, it is time to think about the next major rsyslog release! I have done many enhancements in v3, but the latest performance optimization work leads to a couple of significant changes in the core engine. I think it makes sense to roll these into a new major release. That leaves folks with the option to stay on the feature-rich v3-stable branch and avoid some of the potentially unavoidable bugs in the upcoming v4 branch.

From a feature point of view, version 3 would have been good for at least three to four major releases, which I did not do only to keep you from being scared by the pace at which we are moving ;) So I think now is a perfect time to begin developing v4. I hope that we will see a first beta of that branch around xmas, which, I think, is a nice gift.

syslog appliance website online

I have now set up a first basic web site for SyslogAppliance. It is not great yet, but it provides a stable reference point for any work that comes up. So people can hopefully begin to use this site as a pointer for useful resources.

As a side note, you may notice I am using a .de (German) domain. Thanks to the spammers, the .com, .org and .net domains are already taken by spam sites. And I thought it would not matter if we used a .de domain. After all, we live in a time where domains from the Cocos Islands (.cc) or Tuvalu (.tv) are being abused for generic purposes, so why not use .de for a generic site, too?

Oh, and one interesting find: at least one person actually downloaded and tried the version of SyslogAppliance I uploaded yesterday. How do I know? I had forgotten to include the phpLogCon user in the README ;) [of course, this is fixed now ;)]

rsyslog performance

Thanks to David Lang, I have been able to gather some performance data on rsyslog. More importantly, I have been able to improve rsyslog’s performance dramatically while working with David. He not only dispenses good advice, he also has a great test environment, which I lack. If you would like to see how things evolve, be sure to follow this (lengthy ;) thread: http://kb.monitorware.com/rsyslog-performance-t8691.html.

But you are probably interested in actual numbers. The current v3-stable (3.18.x) manages to process around 22,000 messages per second (mps) with DNS name resolution turned on and about double that value without. That’s not bad, but obviously there is room for improvement.

Thanks to our combined effort, we have reached a state where we can process more than 100,000 mps, and there is an experimental version (applying some lock-free algorithms) that goes well beyond 200,000 mps. I am not yet sure if we will pursue the lock-free algorithm. There are plenty of additional ideas available and I am quite positive we can push the limit even further.

All numbers were measured with a minimal configuration (one UDP input, one file output) on a capable multi-core machine. The numbers above are for sustained traffic rates. More messages can be accepted (and buffered) during bursts.
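
To put these rates into perspective, here is a small back-of-the-envelope calculation in Python. It turns each rate into the average wall-clock time available per message and into a speedup relative to v3-stable with DNS resolution on. The 44,000 figure is my reading of "about double", and the two newer figures are lower bounds ("more than", "well beyond"), so treat the results as rough orientation only. The helper names are made up for this sketch.

```python
# Sustained rates quoted above, in messages per second (mps).
# 44_000 is an approximation of "about double" the DNS-on rate;
# 100_000 and 200_000 are lower bounds, not exact measurements.
rates = {
    "v3-stable, DNS on": 22_000,
    "v3-stable, DNS off": 44_000,
    "optimized": 100_000,
    "experimental lock-free": 200_000,
}


def budget_us(mps):
    """Average wall-clock time per message, in microseconds."""
    return 1_000_000 / mps


def speedup(mps, baseline):
    """How many times faster a rate is than the baseline rate."""
    return mps / baseline


baseline = rates["v3-stable, DNS on"]
for name, mps in rates.items():
    print(f"{name}: {budget_us(mps):.1f} us per message, "
          f"{speedup(mps, baseline):.1f}x baseline")
```

Note that this is an average over the whole multi-core machine, not the time a single core spends on one message.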

virtual syslog appliance

I’ve just recently blogged about my syslog appliance idea. Now this has become reality: there is a first version, 0.0.1, of rsyslog and phplogcon as a virtual appliance.

For starters, I have created a very simple system. While I have a number of options for the operating platform, I started with Ubuntu JeOS, mainly because it had good guides for getting started with an appliance quickly. Being based on Debian was also a plus. Some may argue that the downside is that the log appliance currently requires VMware. While I agree this may be an issue, it is not an extremely big one, especially as VMware Server runs quite well under Linux and is free to use.

I will investigate Red Hat’s AOS, but I think I need to get some results from the app point of view first and JeOS looks quite promising in this regard.

For now, I have even started with the stock rsyslog package, which is quite outdated on that platform. However, I’ll do a couple of iterations in the next few days and so will catch up to the current release soon. For what the appliance currently needs, the older version is not really a problem.

I am now very interested in feedback on this new offering. The appliance can be downloaded from

http://www.rsyslog.com/Downloads-req-viewdownloaddetails-lid-136.phtml

One of my next actions is to set up a dedicated site, which will make finding (and providing!) information on the appliance much easier. But one thing after the other…

Oh, and one thing on the licensing: the appliance is free for non-commercial use. However, we intend to request a moderate fee for commercial use, which I think is a fair policy. Of course, all appliance components are freely available.

If you try out the appliance, please provide feedback! I have set up a dedicated forum at

syslog appliance forum

As I said, the initial version will probably not be as “plug and play” as I hope, but I am very positive we are on a good path. Besides, it is an exciting project.

India is going to the moon…

Everybody seems to want to go to the moon these days. Russia does, China does, Europe, as usual, “says” it does (if someone else provides the ferry ship ;)) and the US, of course, will, too.

Today, India launched a moon mission, just to tell us they, too, are serious about this topic. A rocket carrying the Chandrayaan-1 probe lifted off from India’s spaceport at Sriharikota. Chandrayaan-1’s mission will last two years. It is tasked with creating a detailed map of the minerals and chemical properties of the moon’s surface, as well as of its general surface structures.

The moon seems to promise big business. It is also politically quite important. With the US right in front of a very important election, it will be very interesting to see which direction the new administration will take. NASA’s Constellation program is underfunded and has unrealistic goals if it continues to be worked on at the current (finance-dictated) pace.

Will the US be among the last to go back to the moon? The Russians are on a good path already and seem to have funding and a commercial vision. Or will a new moon race start, where the US demonstrates technical leadership? Interesting question, time will tell. At least we have a new player who seems to be serious about this game…

Hubble partly restored, Atlantis heading back…

The Hubble repairs go well, but unfortunately not too well. As NASA reports, the restoration succeeded only partly; some systems are still not functional:


On Wednesday, October 14, engineers at NASA’s Goddard Space Flight Center reconfigured six components of the Hubble Data Management System and five components in the Science Instrument Command and Data Handling (SIC&DH) system to use their redundant (or B) sides. This was done to work around a failure that occurred on September 27 in the Side A Science Data Formatter in the SIC&DH and resulted in the cessation of all science observations except for astrometry with the Fine Guidance Sensors.

The reconfiguration proceeded nominally and Hubble resumed the science timeline at Noon ET on Thursday, October 16. The first activities out of that on-board science timeline were the commanding of the science instruments from their safe to operate modes. This occurred nominally for Wide Field Planetary Camera 2 and the Near Infrared Camera and Multi Object Spectrometer. However, an anomaly occurred during the last steps of the commanding to the Advanced Camera for Surveys. At 1:40 pm, when the low voltage power supply to the ACS Solar Blind Channel was commanded on, software running in a microprocessor in ACS detected an incorrect voltage level in the Solar Blind Channel and suspended ACS. Then at 5:14 pm, the Hubble spacecraft computer sensed the loss of a “keep alive” signal from the NASA Standard Spacecraft Computer in the SIC&DH and correctly responded by safing the NSSC-I and the science instruments. It is not yet known if these two events were related.

The investigation into both anomalies is underway. All data has been collected and is being analyzed. The science instruments will remain in safe mode until the NSSC-I issue is resolved. All other subsystems on the spacecraft are performing nominally and astrometry observations continue.

But at least some observations can be carried on.

At the same time, Space Shuttle Atlantis is heading back to the VAB to get to a safe haven while the Hubble repair mission is postponed. Unfortunately, a rod struck parts of Atlantis while it was being removed from the launch pad. It is now being investigated whether or not repairs are necessary. From what I have read, the external tank probably needs some attention, while the rest of the space shuttle stack seems not to have been damaged. Thankfully, there is enough time left until mid-February, which is considered the earliest launch date.