A batch output handling algorithm

With this post, I’d like to reproduce a posting from David Lang on the rsyslog mailing list. I consider this to be important information and would like to have it available for easy reference.

Here we go:

the company that I work for has decided to sponser multi-message queue
output capability, they have chosen to remain anonomous (I am posting from
my personal account)

there are two parts to this.

1. the interaction between the output module and the queue

2. the configuration of the output module for it’s interaction with the
database

On for the first part (how the output module interacts with the queue), the
criteria are that

1. it needs to be able to maintain guarenteed delivery (even in the face
of crashes, assuming rsyslog is configured appropriately)

2. at low-volume times it must not wait for ‘enough’ messages to
accumulate, messages should be processed with as little latency as
possible

to meet these criteria, what is being proposed is the following

a configuration option to define the max number of messages to be
processed at once.

the output module goes through the following loop

X=max_messages

if (messages in queue)
mark that it is going to process the next X messages
grab the messages
format them for output
attempt to deliver the messages
if (message delived sucessfully)
mark messages in the queue as delivered
X=max_messages (reset X in case it was reduced due to delivery errors)
else (delivering this batch failed, reset and try to deliver the first half)
unmark the messages that it tried to deliver (putting them back into the status where no delivery has been attempted)
X=int(# messages attempted / 2)
if (X=0)
unable to deliver a single message, do existing message error
process

this approach is more complex than a simple ‘wait for X messages, then
insert them all’, but it has some significant advantages

1. no waiting for ‘enough’ things to happen before something gets written

2. if you have one bad message, it will transmit all the good messages
before the bad one, then error out only on the bad one before picking up
with the ones after the bad one.

3. nothing is marked as delivered before delivery is confirmed.

an example of how this would work

max_messages=15

messages arrive 1/sec

it takes 2+(# messages/2) seconds to process each message (in reality the
time to insert things into a database is more like 10 + (# messages / 100)
or even more drastic)

with the traditional rsyslog output, this would require multiple output
threads to keep up (processing a single message takes 1.5 seconds with
messages arriving 1/sec)

with the new approach and a cold start you would see

message arrives (Q=1) at T=0
om starts processing message a T=0 (expected to take 2.5)
message arrives (Q=2) at T=1
message arrives (Q=3) at T=2
om finishes processing message (Q=2) at T=2.5
om starts processing 2 messages at T=2.5 (expected to take 3)
message arrives (Q=4) at T=3
message arrives (Q=5) at T=4
message arrives (Q=6) at T=5
om finishes processing 2 messages (Q=4) at T=5.5
om starts processing 4 messages at T=5.5 (expected to take 4)
message arrives (Q=5) at T=6
message arrives (Q=6) at T=7
message arrives (Q=7) at T=8
message arrives (Q=8) at T=9
om finishes processing 4 messages (Q=4) at T=9.5
om starts processing 4 messages at T=9.5 (expected to take 4)

the system is now in a steady state

message arrives (Q=5) at T=10
message arrives (Q=6) at T=11
message arrives (Q=7) at T=12
message arrives (Q=8) at T=13
om finishes processing 4 messages (Q=4) at T=13.5
om starts processing 4 messages at T=13.5 (expected to take 4)

if a burst of 10 extra messages arrived at time 13.5 this last item would
become

11 messages arrive at (Q=14) at T=13.5
om starts processing 14 messages at T=13.5 (expected to take 9)
message arrives (Q=15) at T=14
message arrives (Q=16) at T=15
message arrives (Q=17) at T=16
message arrives (Q=18) at T=17
message arrives (Q=19) at T=18
message arrives (Q=20) at T=19
message arrives (Q=21) at T=20
message arrives (Q=22) at T=21
message arrives (Q=23) at T=22
om finishes processing 14 messages (Q=9) at T=22.5
om starts processing 9 messages at T=22.5 (expected to take 6.5)