The Semantic DB Project: a superposition implementation of simm

Previously I gave a list implementation for simm.
Recall:

simm(w,f,g) = (w*f + w*g - w*[f - g])/(2.max(w*f,w*g))
simm(w,f,g) = \Sum_k w[k] min(f[k],g[k]) / max(w*f,w*g)
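
To see why the two forms agree, here is a short derivation. It is only a sketch, and it assumes that w*f stands for \Sum_k w[k] f[k], that w*[f - g] stands for \Sum_k w[k] |f[k] - g[k]|, and that all w[k], f[k], g[k] are >= 0. Those definitions are my reading of the notation, not something stated in this post.

```latex
% For any real numbers a and b:
%   a + b - |a - b| = 2 \min(a, b)
% Apply this term by term with a = f[k], b = g[k], then weight by w[k] >= 0:
\begin{align*}
w*f + w*g - w*[f - g]
  &= \sum_k w[k] \bigl( f[k] + g[k] - |f[k] - g[k]| \bigr) \\
  &= 2 \sum_k w[k] \min(f[k], g[k])
\end{align*}
% Dividing both sides by 2 \max(w*f, w*g) turns the first form into the second.
```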

In this post I'm going to give a couple of superposition implementations of this equation. Note that in the BKO scheme most superpositions have coeffs >= 0, so we can use the min version of simm given above. First, we need to observe some correspondences:

\Sum_k x[k] <=> x.count_sum()
min(f[k],g[k]) <=> intersection(f,g)
rescale so w*f == w*g == 1 <=> f.normalize() and g.normalize()
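
To make these correspondences concrete, here is a minimal sketch that models a superposition as a plain Python dict of {label: coeff}. The dict representation and the three free functions are hypothetical stand-ins for illustration only; they are not the project's actual superposition class, and they assume all coeffs are >= 0.

```python
# Hypothetical stand-in: a superposition is a dict {label: coeff}, coeffs >= 0.
# These helpers only mimic the methods named in the correspondences above.

def count_sum(sp):
    # Sum_k x[k]: add up all the coefficients.
    return sum(sp.values())

def intersection(f, g):
    # min(f[k], g[k]) label by label. A label missing from either side has
    # coeff 0, so its minimum is 0 and it is simply left out.
    return {k: min(v, g[k]) for k, v in f.items() if k in g}

def normalize(sp):
    # Rescale so the coefficients sum to 1. An empty or all-zero
    # superposition is returned unchanged to avoid dividing by zero.
    total = count_sum(sp)
    if total == 0:
        return dict(sp)
    return {k: v / total for k, v in sp.items()}

f = {"a": 3, "b": 1}
g = {"a": 1, "b": 1, "c": 2}

print(count_sum(f))             # 4
print(intersection(f, g))       # {'a': 1, 'b': 1}
print(count_sum(normalize(g)))  # 1.0
```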

And so, some Python:

# unscaled simm:
def unscaled_simm(A,B):
  wf = A.count_sum()
  wg = B.count_sum()
  if wf == 0 and wg == 0:
    return 0
  return intersection(A,B).count_sum()/max(wf,wg)

# weighted scaled simm:
def weighted_simm(w,A,B):
  A = multiply(w,A)
  B = multiply(w,B)
  return intersection(A.normalize(),B.normalize()).count_sum()

# standard use case simm:
def simm(A,B):
  if A.count() <= 1 and B.count() <= 1:
    a = A.ket()
    b = B.ket()
    if a.label != b.label:
      return 0
    a = max(a.value,0)                        # just making sure they are >= 0.
    b = max(b.value,0)
    if a == 0 and b == 0:                     # prevent div by zero.
      return 0
    return min(a,b)/max(a,b)
  return intersection(A.normalize(),B.normalize()).count_sum()

Most of the time I use the last one. Note that only the last one takes into account that you should not rescale when both superpositions have length 1. I didn't bother adding that check to the other two versions, since I hardly use them.

So why is this distinction important? Well, some examples:
If you rescale length 1 superpositions you get:
simm(|a>,2|a>) = 1
when you really want:
simm(|a>,2|a>) = 0.5
But this is only an issue if both superpositions are length 1. E.g., this case has no issues:
simm(|a>,|a> + |b>) = 0.5
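
These three results can be checked with a small self-contained sketch. As an assumption for illustration, a superposition is again modelled as a plain dict of {label: coeff} with coeffs >= 0, and `overlap` and `always_rescaled_simm` are hypothetical helper names, not part of the project's code.

```python
# Hypothetical stand-in: a superposition is a dict {label: coeff}, coeffs >= 0.

def normalize(sp):
    # Rescale so the coefficients sum to 1 (unchanged if the sum is 0).
    total = sum(sp.values())
    return {k: v / total for k, v in sp.items()} if total else dict(sp)

def overlap(f, g):
    # Sum of label-wise minima, i.e. the role of intersection(f,g).count_sum().
    return sum(min(v, g[k]) for k, v in f.items() if k in g)

def always_rescaled_simm(f, g):
    # Rescales even when both inputs have length 1.
    return overlap(normalize(f), normalize(g))

def simm(f, g):
    # Mirrors the standard use case: no rescaling when both have length <= 1.
    if len(f) <= 1 and len(g) <= 1:
        if not f or not g or set(f) != set(g):  # empty, or different labels
            return 0
        a = max(next(iter(f.values())), 0)      # make sure they are >= 0
        b = max(next(iter(g.values())), 0)
        if a == 0 and b == 0:                   # prevent div by zero
            return 0
        return min(a, b) / max(a, b)
    return overlap(normalize(f), normalize(g))

print(always_rescaled_simm({"a": 1}, {"a": 2}))  # 1.0
print(simm({"a": 1}, {"a": 2}))                  # 0.5
print(simm({"a": 1}, {"a": 1, "b": 1}))          # 0.5
```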

I guess that is it for this post! Heaps more to come!


Thursday, 19 February 2015
