24
Aug 08

Squared Euclidean Distance on Hashes in Ruby

For some simple analysis I’m doing, I needed a routine to calculate squared Euclidean distance in Ruby on two hashes. The squared Euclidean distance d2 between two n-dimensional vectors x1 and x2 is:

Formula for squared Euclidean distance

I needed this calculation for values corresponding to the intersection of the set of keys for two hashes. I came up with a couple of ways to do this.

Here’s a few test cases to show what I need (I’m playing with colorization by Emacs Htmlize):

if __FILE__ == $PROGRAM_NAME

  require 'test/unit'

  class TestSqEucDist < Test::Unit::TestCase
    def test_same_hash_should_be_zero
      a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0 }
      assert_equal(euc_sq(a, a), 0.0, "euc_sq(a, a) should be zero.")
      assert_equal(a.euc_sq_from(a), 0.0, "euc_sq(a, a) should be zero.")
    end

    def test_no_common_keys_should_be_zero
      a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0 }
      b = { 'a' => 4.4, 'b' => 5.5, 'c' => 6.6 }
      assert_equal(euc_sq(a, b), 0.0, "euc_sq on hashes with no common keys should be zero.")
      assert_equal(euc_sq(b, a), 0.0, "euc_sq on hashes with no common keys should be zero.")
      assert_equal(a.euc_sq_from(b), 0.0, "euc_sq on hashes with no common keys should be zero.")
    end

    def test_two_random_hashes
      a = { 'one' => 1.0, 'two' => 2.0, 'three' => 3.0, 'four' => 4.0 }
      b = { 'a' => 4.4, 'three' => 5.5, 'c' => 6.6, 'two' => 2.2 }
      assert_equal(euc_sq(a, b), 6.29, "euc_sq on a random case.")
      assert_equal(a.euc_sq_from(b), 6.29, "euc_sq on a random case.")
    end

  end

end

Based on these simple test cases, I’ve come up with a couple of approaches.

The first implementation I did used Hash#merge to find common keys:

# Sum the squares of the differences of corresponding values of two hashes
def euc_sq(h1,h2)
  sum = 0
  h1.merge(h2) do |k, old, new|
    sum += (old - new)**2
  end
  sum
end

When a common key is found, the block is executed, which accumulates the result.

However, I really didn’t care for manually keeping track of the accumulator. Accumulation operations should really leverage Enumerable#inject or a similar mechanism. So, I rewrote it to be:

def euc_sq(h1,h2)
  h1.inject(0) do |s, (k,v)|
    h2.has_key?(k) ? (s + (v - h2[k])**2) : s
  end
end

I also packaged it up as a method on Hash, as can be seen in the tests listed above:

class Hash
  def euc_sq_from(other)
    inject(0) do |s, (k,v)|
      other.has_key?(k) ? (s + (v - other[k])**2) : s
    end
  end
end

But this seems like a fairly specialized task to be bundling up into Hash. Isn’t there a way to generalize this a bit? It’s basically an accumulation operation on the values from the intersection of the key set of two hashes. That seems like it could be written as:

class Hash
  def intersection_accum(init, other)
    inject(init) do |acc, (k, v)|
      other.has_key?(k) ? (yield acc, k, v, other[k]) : acc
    end
  end
end

Then the squared Euclidean distance operation could be:

class Hash
  def euc_sq_from(other)
    intersection_accum(0.0, other) { |s, k, v1, v2| s + (v1 - v2)**2 }
  end
end

And then I think it buys you the flexibility to do other operations, such as Manhattan distance (or city-block distance, …):

class Hash
  def manhattan_dist_from(other)
    intersection_accum(0.0, other) { |s, k, v1, v2| s + (v1 - v2).abs }
  end
end

I’m not sure where this is going, but I think the minor flexibility bonus is a good result of the progression.


3 Responses to “Squared Euclidean Distance on Hashes in Ruby”

  1. Marc Says:

    Andy – you rock! I needed the euclidean distance for a project I’m working on. This is a great implementation. Much cleaner than the way I was doing it (keeping track of the sum).

  2. andy Says:

    Hey Marc, glad you could use this.

  3. jason Says:

    nice implementation, also cleaner than the one I wrote, I wonder if we can make it any faster.

Leave a Reply


Copyright © 2010 ANDY PAYNE
Powered by WordPress and Theme Lab