24
Aug 08Squared Euclidean Distance on Hashes in Ruby
For some simple analysis I’m doing, I needed a routine to calculate squared Euclidean distance in Ruby on two hashes. The squared Euclidean distance d2 between two n-dimensional vectors x1 and x2 is:
I needed this calculation for values corresponding to the intersection of the set of keys for two hashes. I came up with a couple of ways to do this.
Here’s a few test cases to show what I need (I’m playing with colorization by Emacs Htmlize):
if __FILE__ == $PROGRAM_NAME require 'test/unit' class TestSqEucDist < Test::Unit::TestCase def test_same_hash_should_be_zero a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0 } assert_equal(euc_sq(a, a), 0.0, "euc_sq(a, a) should be zero.") assert_equal(a.euc_sq_from(a), 0.0, "euc_sq(a, a) should be zero.") end def test_no_common_keys_should_be_zero a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0 } b = { 'a' => 4.4, 'b' => 5.5, 'c' => 6.6 } assert_equal(euc_sq(a, b), 0.0, "euc_sq on hashes with no common keys should be zero.") assert_equal(euc_sq(b, a), 0.0, "euc_sq on hashes with no common keys should be zero.") assert_equal(a.euc_sq_from(b), 0.0, "euc_sq on hashes with no common keys should be zero.") end def test_two_random_hashes a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0, 'four' => 4.0 } b = { 'a' => 4.4, 'three' => 5.5, 'c' => 6.6, 'two' => 2.2 } assert_equal(euc_sq(a, b), 6.29, "euc_sq on a random case.") assert_equal(a.euc_sq_from(b), 6.29, "euc_sq on a random case.") end end end
Based on these simple test cases, I’ve come up with a couple of approaches.
The first implementation I did used Hash#merge to find common keys:
# Sum the squares of the differences of corresponding values of two hashes def euc_sq(h1,h2) sum = 0 h1.merge(h2) do |k, old, new| sum += (old - new)**2 end sum end
When a common key is found, the block is executed, which accumulates the result.
However, I really didn’t care for manually keeping track of the accumulator. Accumulation operations should really leverage Enumerable#inject or a similar mechanism. So, I rewrote it to be:
def euc_sq(h1,h2) h1.inject(0) do |s, (k,v)| h2.has_key?(k) ? (s + (v - h2[k])**2) : s end end
I also packaged it up as a method on Hash, as can be seen in the tests listed above:
class Hash def euc_sq_from(other) inject(0) do |s, (k,v)| other.has_key?(k) ? (s + (v - other[k])**2) : s end end end
But this seems like a fairly specialized task to be bundling up into Hash. Isn’t there a way to generalize this a bit? It’s basically an accumulation operation on the values from the intersection of the key set of two hashes. That seems like it could be written as:
class Hash def intersection_accum(init, other) inject(init) do |acc, (k, v)| other.has_key?(k) ? (yield acc, k, v, other[k]) : acc end end end
Then the squared Euclidean distance operation could be:
class Hash def euc_sq_from(other) intersection_accum(0.0, other) { |s, k, v1, v2| s + (v1 - v2)**2 } end end
And then I think it buys you the flexibility to do other operations, such as Manhattan distance (or city-block distance, …):
class Hash def manhattan_dist_from(other) intersection_accum(0.0, other) { |s, k, v1, v2| s + (v1 - v2).abs } end end
I’m not sure where this is going, but I think the minor flexibility bonus is a good result of the progression.

June 2nd, 2009 at 2:38 pm
Andy – you rock! I needed the euclidean distance for a project I’m working on. This is a great implementation. Much cleaner than the way I was doing it (keeping track of the sum).
September 1st, 2009 at 9:33 am
Hey Marc, glad you could use this.
December 24th, 2009 at 10:32 pm
nice implementation, also cleaner than the one I wrote, I wonder if we can make it any faster.