Generated by spec.rb at: Thu Jan 10 12:20:34 +0100 2008

How to detect changes in a set containing values from ascending number sequence

Table of contents

Intro

This article deals with an extremely simple and fairly efficient way to detect changes in a set which gets its values from ascending number sequence.

Sounds fancy, right? I mean -- that sounds like something you'd use to impress your date (assuming she's kinda geeky, like you are). In reality, you might sometime need this as a web programmer doing some basic work (for instance in Ruby-on-Rails dealing with auto_increment fields in MySQL).

Imagine scenario where you have simple "Queue" that you need to check periodically and react to situation when some of the tickets got kicked off the queue and/or some tickets got added. You don't care all that much which ones, you just need to be able to answer this question:

and act upon the answer.

Also, the implicit assumption here is that you can't use any "RPC" call to tell from one process (the one making changes) to another (the one monitoring the queue) that something changed. If you could (for instance by using some other table to store "I made a change" events), it would be easy.

A road to solution

So, what's the first thing that comes into mind? Let's grab those ticket ids into an array and compare the two later on.

Sounds lovely, doesn't it?

You'd code it (in Rails / ActiveRecord) like this:

queue = Queue.find(:all).map { |x| x.id }

compare that result with previously saved array and it'd work very well -- for a table with 10 values.

Now try that on a table with a couple of thousands records -- and go fetch a coffee while you're waiting to get an reply.

But wait, we can make it all better, right? We'll just fetch "id" attribute and make it much faster:

queue = Queue.find(:all, :select => "id").map { |x| x.id }

Ha! It's running much better now! Yeeee-haw!

No, not really. Couple of times faster, but still dog-slow.

What now?

Well, you might feel you've hit dead end. You might try to mess with Observers on the model, which is fine approach -- but it still sucks when you don't have Rails and/or the Queue is filled outside of Rails by some non-railsy process.

Fortunately, there's a (much) better way.

What if we counted number of elements in the Queue? That would tell us about single change (add, deletion), right?

But that's hardly enough, as having 1 add and 1 delete would prevent us from getting the correct answer.

So, we gotta add something else ...

I'm growing bored typing all this, so I'll make it short from now on ("Oh, thank God!" you say? ;-) ).

A solution

If you compute two things:

and compare it to previously saved values, you'll get the correct answer.

Which would look like this:

count = Queue.count
sum = Queue.sum(:id)

Seems too simple to work, right? OK, let's convince you it's correct.

Why is the solution correct

First let's recap what we're dealing with here:

set of values
meaning no value is repeated
domain is ever-growing (ascending) integer sequence
like the auto_increment field from MySQL some Railsers love so much

From those conditions we can conclude that the only real problem we face is when we get N new elements added and N elements removed at the same time, right?

Because if you get different count added/removed, the "count" portion would be off from previous value and you can stop right there.

Now, if you add N new elements, their ID values will all be greater than ID of the greatest element from that set.

Therefore if you at the same time remove N elements from that set, the sum has to be different (bigger, actually), as the removed elements had lower IDs than the newly added ones.

To try it in Math-speak:

Q                  <- our queue (set of values)
N                  <- number of elements added/removed
A_1, A_2, ..., A_N <- newly added elements' IDs, no two of the same value
B_1, B_2, ..., B_N <- deleted elements' IDs, no two of the same value
M                  <- max Q
A                  <- sum { A_1, ..., A_N }
B                  <- sum { B_1, ..., B_N }

(1) for all A_i (i in 1..N): A_i > M.
(2) for all B_i (i in 1..N): B_i <= M.

therefore:

A = N * M + X, X > 0  because of (1)
B = N * M - Y, Y > 0  because of (2)

therefore

A > B

QED.

Conclusion

I've shown a simple way to detect changes in a set with values from ascending number sequence when you can't use any outside mechanism to signal the change was made (RPC, observers, all that jazz).

Of course, this solution is bad when you trip over certain number of elements in the queue, as it's time complexity is still linear -- O(n).

Btw, when I have to implement beast like this, I do it along this way:

conditions = "..."
count_proc = proc { Queue.count(:conditions => conditions) }
sum_proc = proc { Queue.sum(:id, :conditions => conditions) }

old_count = count_proc.call
old_sum = sum_proc.call

if old_count == count_proc.call && old_sum == sum_proc.call
    return :unchanged
else
    return :changed
end

to avoid code duplication and (possibly) cut the execution time in half by use of lazy evaluation.

Also, I'm aware of the fact this is a bit wasteful, as you have to go through the whole set two times -- to get count and sum independently. But so far I don't know how to do it in a single step (in a db-neutral fashion), do you?

If you do know or have any comments regarding this article, drop me a line (you can find my email at front page).