2011-01-24

Multi-Threading rsyslog’s TCP input

A forum thread made me aware that there seems to be an issue with rsyslog performance if TLS is used. Over the past two weeks, I have worked on a paper that looks in depth at rsyslog performance, and I came across a paper [1] that promotes writing servers in the “traditional” multi-threaded way (with a single thread per connection). It addressed some of my concerns, and I thought it was worth actually trying out this approach (I had ruled it out for several years and never looked at it again). As a result, I created an experimental module, imttcp, which works in this mode. I put this to the test, especially as it would also lead to a much simpler programming paradigm. Unfortunately, the performance results are devastating: while there is a very slight speedup with a low number of connections (close to the number of cores on the system), there is a dramatic slowdown when running with many threads. Even at only 50 connections, rsyslog is dramatically slower (80 seconds for the same workload that was processed in 60 seconds with traditional imtcp or when running on a single connection). At 1,000 connections, the run was *extremely* slow. So this is definitely a dead end. To be fair, von Behren, Condit and Brewer (the authors of [1]) claim that the problem lies in the current implementation of thread libraries. As one cure, they propose user-level threads. However, as far as I could find out, user-level threads seem not to be much faster under Linux than kernel-level threads (which I used in my approach).

Even more convincing is, from the rsyslog PoV, that there are clear reasons why the highly threaded input must be slower:

batch sizes are smaller, leading to much more overhead
many more context switches are needed to switch between the various i/o handlers
more OS API calls are required because in this model we get more frequent wakeups on new incoming data, so we have less data available to read at each instant
more lock contention because many more threads compete on the main queue mutex
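The first and last points can be illustrated with a small sketch. This is not rsyslog code: the `CountingQueue` class and the `submit` helper are hypothetical names made up for illustration. The sketch only counts how often a shared queue mutex is taken when the same 1,000 messages are enqueued one at a time versus in batches of 100.

```python
import threading
from collections import deque


class CountingQueue:
    """Toy stand-in for a main message queue guarded by one mutex (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = deque()
        self.lock_acquisitions = 0

    def enqueue_batch(self, batch):
        # One lock acquisition per batch, regardless of the batch size.
        with self._lock:
            self.lock_acquisitions += 1
            self._items.extend(batch)


def submit(queue, messages, batch_size):
    """Enqueue messages in chunks of batch_size; return the lock acquisition count."""
    for start in range(0, len(messages), batch_size):
        queue.enqueue_batch(messages[start:start + batch_size])
    return queue.lock_acquisitions


messages = [f"msg {i}" for i in range(1000)]
per_message = submit(CountingQueue(), messages, batch_size=1)
batched = submit(CountingQueue(), messages, batch_size=100)
print(per_message, batched)  # 1000 10
```

With many threads each delivering tiny batches, every one of those acquisitions is also a point where threads can collide, which is where the contention comes from.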

All in all, this means that the approach is not the right one, at least not for rsyslog (it may work better if the input can be processed totally independently, but I have not evaluated this). So I will look into an enhanced event-based model with a small set of input workers pulling off data (I assume this is useful for e.g. TLS, as TLS transport is much more compute-bound than other inputs, and this computation becomes a limiting factor for the overall processing speed under some circumstances, see [2]).

As a side-note: for obvious reasons, I will not try to finish imttcp. However, I have decided to leave it included in the source tree, so that a) someone else can build on it, if he sees value in that b) I may use it for some other tests in the future.

[1] R. von Behren, J. Condit, and E. Brewer. Why events are a bad idea (for high-concurrency servers). In Proceedings of the 9th Conference on Hot Topics in Operating Systems, Volume 9, page 4. USENIX Association, 2003.

2010-10-19

Sagan and rsyslog [Guest Posting]

Hi everyone,

I am happy to feature a guest post written by Champ Clark III, the author of Sagan, a real time, “snort like” event and system log sniffing tool. Champ will explain a bit about it and how he made it work with rsyslog. I think it is a very interesting project and I am glad it now has full rsyslog support.

But enough of my words, enjoy the real thing ;)

Rainer

I admit it, I’m a recent convert to rsyslog. I’ve known about rsyslog for years, but have only recently started using it in production environments. The primary reason for looking into rsyslog was that users of Sagan were requesting support for it. I’m very glad they pushed me in that direction. I knew how popular rsyslog was, but the ‘hassles’ of changing our core logging facilities seemed like a pain.

I can tell you, it was easy and seamless. Also, after reading Rainer Gerhards’ excellent, “rsyslog: going up from 40K messages per second to 250K“, I knew that I liked this project.

So I bit the bullet, and started working with Sagan and rsyslog. I haven’t looked back since.

I work in the network & computer security field. I’ve known for years the importance of log management. One thing that I had noticed was a lack of open source log & packet level correlation engines. This is essentially what Sagan does. One common comparison for Sagan is Cisco’s MARS. Sagan reads in your logs and attempts to correlate the information with the intrusion detection/prevention system’s packet level information.

At Softwink, Inc, my place of employment, we monitor security events for various clients. For packet-level inspection for ‘bad events’ (security related), we use Snort. Snort ‘watches’ the network connections and sends out an ‘alert’ when it sees nefarious traffic. We configure Snort to send the ‘alert’ to a MySQL database for further analysis. We can then monitor these Snort sensors for ‘bad events/attacks’ in real time.

However, we found that we were missing the ‘bigger picture’ without logs. This is where rsyslog and Sagan come into play. Essentially, we take all machines and equipment on a network and forward their logs to a centralized server. Rsyslog is the ‘receiver’, and sometimes the sender, of these log messages. In many cases, we find that centralized secure logging is a requirement for clients. With rsyslog, we have the ability to store log information in a MySQL database for archive purposes. We can then give the client access to the log information via Loganalyzer for easy, simple retrieval.

How does Sagan fit into this picture? For security analysis, we only want key, typically security related, events from the logs. Manually searching databases for ‘security related’ events is prone to error. It is easy to ‘miss’ key events. Sagan is the ‘eyes on the logs’ watching for security related events in real time. First, Sagan has to have access to the logs coming into the network. This is very simple with Rsyslog:

# rsyslog.conf file.
#
# As rsyslog receives logs from remote systems,  we put them into a format
# that Sagan can understand:
#

$template sagan,"%fromhost-ip%|%syslogfacility-text%|%syslogpriority-text%|%syslogseverity-text%|%syslogtag%|%timegenerated:1:10:date-rfc3339%|%timegenerated:12:19:date-rfc3339%|%programname%|%msg%\n"

# We now take the logs,  in the above format,  and send them to a 'named pipe'
# or FIFO.

*.*     |/var/run/sagan.fifo;sagan

Sagan can now ‘read’ the logs as they come into rsyslog from the /var/run/sagan.fifo (named pipe/FIFO) in real time. rsyslog actually performs double duty for us: logging to our MySQL database for archival purposes and handing Sagan log information for analysis.
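To make the record layout concrete, here is a minimal Python sketch of how a consumer could split one line written with the template above. It is not Sagan's actual parser (Sagan is written in C); the `SaganRecord` type, the `parse_line` function and the sample line are illustrative assumptions. The only thing taken from the configuration is the field order: nine pipe-separated fields with the message last.

```python
from typing import NamedTuple


class SaganRecord(NamedTuple):
    """One log line in the pipe-delimited layout of the 'sagan' template (hypothetical type)."""
    host: str
    facility: str
    priority: str
    severity: str
    tag: str
    date: str
    time: str
    program: str
    message: str


def parse_line(line: str) -> SaganRecord:
    # Split on the first 8 pipes only, so a '|' inside the message is preserved.
    parts = line.rstrip("\n").split("|", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    return SaganRecord(*parts)


# Made-up sample line; the addresses come from documentation ranges.
sample = ("192.0.2.10|auth|info|info|sshd[4242]:|2010-10-19|14:03:07|sshd|"
          " Invalid user bob from 198.51.100.7\n")
record = parse_line(sample)
print(record.program, "->", record.message.strip())
```

A reader process would typically open /var/run/sagan.fifo like a regular file and call something like `parse_line` on each line it receives.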

Overall, there is nothing really new about this concept. However, Sagan does something a bit different than other log analysis engines. When Sagan sees a ‘bad event’, Sagan will log that to your Snort IDS/IPS MySQL/PostgreSQL database. What does this mean? Packet level security events and log events reside in the same database for correlation. There are several advantages; for one, we can now have a single, unified console for log and IDS/IPS events! Second, we can now take advantage of Snort front-end software to view log events. For example, if you use BASE or Snorby to view packet level IDS/IPS events, you can use the same software to view log level Sagan events. Maybe your shop uses report generation tools that query the Snort database to show ‘bad packet events’ in your network. Guess what. You can use those same reporting tools for your log information as well. I’ve posted some example screen shots of Snort & Sagan working together here. The idea is that we take advantage of the Snort community’s work on consoles.

Correlation with Sagan and Snort, at the engine level, works in several different ways. First, Sagan can in some cases pull network information directly from the log message and use that for correlation in the SQL database. For example, let’s say an attacker is probing your network and is attempting to get information on the SMTP port. The attacker sends your SMTP server ‘expn root’. Your IDS/IPS engine will ‘detect’ this traffic and store it. It’ll record the source IP, destination IP, packet dump, time stamp, etc. Sagan will do the same at the log level. Sagan will ‘extract’ as much information as possible from the log message for further correlation with the packet level.

Recently, Rainer announced liblognorm (early liblognorm website). This is an exciting project. The idea is to “normalize” log information to a nice, standard usable format. I plan on putting as much support and effort as I can into this project, because it’s an important step. For Sagan, it means we will be able to better correlate information. In my time to ponder about it since its recent announcement, I can see liblognorm being extremely useful for many different projects.

Sagan also shares another feature with Snort: it uses the same rule format. Sagan rule sets are very much ‘Snort like’. Here is an example rule (this is a single line, broken just for readability):

alert tcp $EXTERNAL_NET any -> $HOME_NET 22 (msg:"[OPENSSH] Invalid or illegal user";
pcre: "/invalid user|illegal user/i"; classtype: attempted-user;
program: sshd; parse_ip_simple; parse_port_simple; threshold:type limit, 
track by_src, count 5, seconds 300; reference: 
url,wiki.softwink.com/bin/view/Main/5000022; sid:5000022; rev:4;)

If you’re already a Snort user, then the format and syntax should be very simple to understand. We use ‘pcre’ (regular expressions) to ‘look’ for a message from the program ‘sshd’ that contains the term ‘invalid user’ or ‘illegal user’ (case insensitive). We set the classifications, just as Snort does (for further correlation). We can ‘threshold’ the rule, so we don’t get flooded with events.
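As a rough illustration of what this rule expresses, here is a Python sketch of the three checks: the program name, the case-insensitive regular expression, and the threshold. This is not how Sagan is implemented (it is written in C), and `should_alert` is a hypothetical function. The sketch assumes that ‘type limit’ behaves as it does in Snort, where only the first `count` events per source within the time window raise an alert, and it takes the source IP as an argument instead of extracting it from the message as `parse_ip_simple` would.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"invalid user|illegal user", re.IGNORECASE)  # pcre: ".../i"
LIMIT_COUNT = 5      # "count 5"
LIMIT_SECONDS = 300  # "seconds 300"

# Per-source state ("track by_src"): (window start time, events seen in this window)
_windows = defaultdict(lambda: (None, 0))


def should_alert(program, message, src_ip, now):
    """Return True if this log line should raise an alert (illustrative only)."""
    if program != "sshd" or not PATTERN.search(message):
        return False
    start, count = _windows[src_ip]
    if start is None or now - start >= LIMIT_SECONDS:
        start, count = now, 0           # open a new window for this source
    count += 1
    _windows[src_ip] = (start, count)
    return count <= LIMIT_COUNT         # assumed "type limit": only the first N alert


# Seven matching events from one source, one second apart.
results = [
    should_alert("sshd", "Invalid user bob from 198.51.100.7", "198.51.100.7", t)
    for t in range(7)
]
print(results)  # [True, True, True, True, True, False, False]
```

The sixth and seventh events are suppressed, which is the ‘don’t get flooded’ effect described above; once the 300 seconds have passed, the same source can trigger alerts again.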

Sagan uses this format for a variety of reasons. For one, it’s a well-known format in the security field. Second, we can now take advantage of Snort rule maintenance software, for example ‘oinkmaster’ or ‘pulled pork’. The idea is that with Sagan, you don’t need to ‘re-tool’ your network in order for it to work.

Using Sagan with your Snort based IDS/IPS system is just one feature of Sagan. Sagan can operate independently from Snort databases, and offers the normal bells and whistles you’d expect in a SIEM (e-mailing alerts, etc).

To tie all this together, it means we can simply monitor packet level threats and log level events from a unified console. We can monitor just about everything in a network from the log level standpoint. We can monitor Cisco gear, Fortigate firewalls, Linux/*nix servers, wireless access points, etc.

Sagan is a relatively new project and still under development. Like rsyslog, Sagan is built from the ground up with performance in mind. Sagan is multi-threaded and written in C with the thought that it should be as efficient with memory and the CPU(s) as possible. Rsyslog seems to follow the same philosophy, yet another reason I made the switch.

The more information you know about a threat to your network/system, the better off you’ll be. That is what the mix of rsyslog and Sagan offers. Throw in IDS/IPS (Snort) monitoring, and you can get a complete view about ‘bad things’ happening in your environment.

For more information about Sagan, please see http://sagan.softwink.com.

2010-10-05

comments on this blog are re-enabled

Finally, Google has created some useful tools to help fight comment spam. As a result, I was able to quickly delete the remaining 100 (or so) spam comments, and so I can now show comments on this blog again. This is useful, as they contain a lot of insight. Also, I hope that my readers will now comment again!

2010-08-09

P != NP solved?

Maybe a historic moment, but let those who know far more elaborate:

2010-04-30

Comment moderation turned on

Unfortunately, I had several waves of spam comments on my blog. Turning Word Verification on did not help, nor did the requirement to have a Google account help. So I was unfortunately forced to turn on comment moderation. In short, that means your comments no longer appear instantaneously, but only after I was able to review them. I know this is a bad thing, but I think it is better than turning off comments at large.

I have also temporarily hidden all comments. This is so that I have some time to clean up the mess. Once this is done, I will re-enable all valid comments.

Thanks,
Rainer

2009-04-30

A batch output handling algorithm

With this post, I’d like to reproduce a posting from David Lang on the rsyslog mailing list. I consider this to be important information and would like to have it available for easy reference.

Here we go:

the company that I work for has decided to sponsor multi-message queue
output capability, they have chosen to remain anonymous (I am posting from
my personal account)

there are two parts to this.

1. the interaction between the output module and the queue

2. the configuration of the output module for its interaction with the
database

For the first part (how the output module interacts with the queue), the
criteria are that

1. it needs to be able to maintain guaranteed delivery (even in the face
of crashes, assuming rsyslog is configured appropriately)

2. at low-volume times it must not wait for ‘enough’ messages to
accumulate, messages should be processed with as little latency as
possible

to meet these criteria, what is being proposed is the following

a configuration option to define the max number of messages to be
processed at once.

the output module goes through the following loop

X=max_messages

if (messages in queue)
    mark that it is going to process the next X messages
    grab the messages
    format them for output
    attempt to deliver the messages
    if (messages delivered successfully)
        mark messages in the queue as delivered
        X=max_messages (reset X in case it was reduced due to delivery errors)
    else (delivering this batch failed, reset and try to deliver the first half)
        unmark the messages that it tried to deliver (putting them back into the status where no delivery has been attempted)
        X=int(# messages attempted / 2)
        if (X=0)
            unable to deliver a single message, do existing message error process
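The loop can be sketched in Python as follows. This is an illustration of the proposal, not rsyslog's implementation: `drain`, `deliver` and `on_undeliverable` are hypothetical names, the queue is a plain in-memory `deque`, and ‘marking’ is modelled by only removing messages after delivery has succeeded. Wrapping the pass in a `while` loop and resetting X after an undeliverable message are assumptions added so that the example runs to completion.

```python
from collections import deque
from itertools import islice


def drain(queue, deliver, max_messages, on_undeliverable):
    """Deliver queued messages in batches, halving the batch size on failure."""
    x = max_messages
    while queue:
        # Peek at the next X messages; they stay queued until delivery is confirmed.
        batch = list(islice(queue, x))
        if deliver(batch):
            for _ in batch:
                queue.popleft()        # mark as delivered
            x = max_messages           # reset X in case it was reduced earlier
        else:
            x = len(batch) // 2        # retry with the first half
            if x == 0:
                # A single message failed: hand it to the error handler
                # (stand-in for the "existing message error process").
                on_undeliverable(queue.popleft())
                x = max_messages


delivered, rejected = [], []


def deliver(batch):
    """Toy output: any batch containing message 6 fails as a whole."""
    if 6 in batch:
        return False
    delivered.extend(batch)
    return True


queue = deque(range(10))
drain(queue, deliver, max_messages=4, on_undeliverable=rejected.append)
print(delivered, rejected)  # [0, 1, 2, 3, 4, 5, 7, 8, 9] [6]
```

In this run, the batch sizes shrink from 4 to 2 to 1 around the bad message, and only that one message ends up in the error path.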

this approach is more complex than a simple ‘wait for X messages, then
insert them all’, but it has some significant advantages

1. no waiting for ‘enough’ things to happen before something gets written

2. if you have one bad message, it will transmit all the good messages
before the bad one, then error out only on the bad one before picking up
with the ones after the bad one.

3. nothing is marked as delivered before delivery is confirmed.

an example of how this would work

max_messages=15

messages arrive 1/sec

it takes 2+(# messages/2) seconds to process a batch of messages (in reality the
time to insert things into a database is more like 10 + (# messages / 100)
or even more drastic)

with the traditional rsyslog output, this would require multiple output
threads to keep up (processing a single message takes 2.5 seconds with
messages arriving 1/sec)

with the new approach and a cold start you would see

message arrives (Q=1) at T=0
om starts processing message at T=0 (expected to take 2.5)
message arrives (Q=2) at T=1
message arrives (Q=3) at T=2
om finishes processing message (Q=2) at T=2.5
om starts processing 2 messages at T=2.5 (expected to take 3)
message arrives (Q=4) at T=3
message arrives (Q=5) at T=4
message arrives (Q=6) at T=5
om finishes processing 2 messages (Q=4) at T=5.5
om starts processing 4 messages at T=5.5 (expected to take 4)
message arrives (Q=5) at T=6
message arrives (Q=6) at T=7
message arrives (Q=7) at T=8
message arrives (Q=8) at T=9
om finishes processing 4 messages (Q=4) at T=9.5
om starts processing 4 messages at T=9.5 (expected to take 4)

the system is now in a steady state

message arrives (Q=5) at T=10
message arrives (Q=6) at T=11
message arrives (Q=7) at T=12
message arrives (Q=8) at T=13
om finishes processing 4 messages (Q=4) at T=13.5
om starts processing 4 messages at T=13.5 (expected to take 4)

if a burst of 10 extra messages arrived at time 13.5 this last item would
become

10 messages arrive (Q=14) at T=13.5
om starts processing 14 messages at T=13.5 (expected to take 9)
message arrives (Q=15) at T=14
message arrives (Q=16) at T=15
message arrives (Q=17) at T=16
message arrives (Q=18) at T=17
message arrives (Q=19) at T=18
message arrives (Q=20) at T=19
message arrives (Q=21) at T=20
message arrives (Q=22) at T=21
message arrives (Q=23) at T=22
om finishes processing 14 messages (Q=9) at T=22.5
om starts processing 9 messages at T=22.5 (expected to take 6.5)
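The ‘expected to take’ figures in this example all follow from the cost model 2 + (# messages / 2), and the steady state of four messages per batch is the point where the number of messages arriving during a batch equals the batch size. A small Python sketch makes this checkable; the function names are illustrative, and the iteration treats batch sizes as continuous values and ignores the max_messages cap.

```python
def batch_seconds(n):
    """Time to process a batch of n messages under the example's cost model."""
    return 2 + n / 2


def steady_state_batch(arrivals_per_second=1.0, start=1.0, rounds=60):
    """Iterate: next batch size = messages that arrive while the current batch runs."""
    n = start
    for _ in range(rounds):
        n = arrivals_per_second * batch_seconds(n)
    return n


for n in (1, 2, 4, 9, 14):
    # batch size, total seconds, seconds per message
    print(n, batch_seconds(n), round(batch_seconds(n) / n, 3))

print(round(steady_state_batch(), 6))  # 4.0
```

The per-message cost drops from 2.5 seconds for a single message to 1 second at a batch size of 4, which is exactly the arrival rate, so one output thread can keep up.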

2009-04-08

what is “nextmaster” good for?

People who looked at rsyslog’s git may have wondered what the branch “nextmaster” is good for. This actually is an indication that the next rsyslog stable/beta/devel rollover will happen soon. With it, the current beta becomes the next v3-stable. At the same time, the current (v4) devel becomes the next beta (which means there won’t be any beta any longer in v3). In order to facilitate this, I have branched off “nextmaster”, which I will work on for now. The “master” branch will no longer be touched and will soon become beta. Then, I will merge “nextmaster” back into the “master” branch and continue to work with it.

The bottom line is that you currently need to pull nextmaster if you would like to keep current on the edge of development. Sorry for any inconvenience this causes, but this is the best approach I see to go through the migration (and I’ve done the same in the past with good success, just that then nobody noticed it ;)).

2009-02-10 (updated 2018-06-11)

screwed up on LinkedIn ;)

A couple of days ago, I created a rsyslog group on LinkedIn. Then I was curious what would happen. Well, nothing. Nothing at all. So I thought it was probably not the right time for such a thing.

And, surprise, surprise, I today browsed through LinkedIn and saw there were 16 join requests. Oops… there seem to be no email notifications for them. Bad… Well, I approved all folks. If you were one of them and now read this blog post: please accept my apologies! Obviously, this was just another time I screwed up on the Internet…

To prevent any further such incidents, I have now set the group to automatically approve everyone who is interested in joining. That’s great for this type of group, actually I am happy for everyone who comes along ;)

2008-10-21 (updated 2019-07-15)

Carnival of Space #75 is live

While I hibernated a bit on this blog, things have evolved elsewhere. Thankfully, though, the Carnival of Space has remained. So let me restart an old tradition today and introduce you to “Lounge of the Lab Lemming: The space carnival has the biggest tent this election”, which has a very interesting selection of sites.

2008-07-25 (updated 2018-02-08)

Work as a Human Bond

This is from a conversation with a collaborator on rsyslog, after his country was hit by a natural disaster. We went a bit philosophical, and I tried to explain how important I think it is to believe in your work and how I feel about cooperating. Again, that’s a previously unpublished bit that I thought would be useful to have available (timestamp changed to original date).

To me, work (including rsyslog) is much more than just “doing something for a living”. Of course, that aspect is involved, I can’t deny that. But to be good at something, one must love what one does. So any work we conduct should ideally match our interests and be something we can be proud of (which also means that failing to deliver good work should make us ashamed and lead us to try to fix the situation).

Not everything, even well done, is “good work”. Good work is work that benefits society at large. That doesn’t mean I need to be Einstein: every garbageman also provides a useful service to society (and should be proud of what he does, provided that he does it well). As a side-note, in that sense I do not see that any one work is more valuable than any other: people who try similarly hard to provide good service to society, each one with all the capabilities they have, deserve the same respect, no matter how large other people consider their contribution to society to be. In fact, a highly educated scholar working on something light-hearted is in my opinion much less respectable than a garbageman who tries his very best in fulfilling his duties.

Having said that, I do not consider work to be something “external” to me. Instead, it is a very important part of my personality. Not the only one, and I don’t try to assign priorities to different parts of my personality, so I can’t say if it is the most important one or not, but that doesn’t really matter, I think. In that sense, if you help me succeed in my work, you also help me succeed in growing my personality. You help me be more proud of what I am doing because you help make it better, more well-known and, importantly, more valuable to society at large. And I hope that my contribution to your work (e.g. by providing some basis) will have a similar effect for you. What’s more important is that the borders between “my work” and “your work” go away.

So it becomes “our work”, something we jointly work on, and something that actually ties us together. And, in a sense, part of my personality becomes yours and vice versa. Doesn’t that justify also caring a bit about the person who is behind that shared work? To me, it does, even though we “know” each other only via electrons traveling a global network…