The Semantic DB Project: a list implementation of the simm

Briefly, here is the code for a list implementation of simm (as opposed to the superposition version). Note it is unscaled, so we are not rescaling f and g so that w*f == w*g. I may give the code for that later (I don't think I have it typed up anywhere just yet).

Recall:

simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)

Python:

def list_simm(w,f,g):
  the_len = min(len(f),len(g))
#  w += [0] * (the_len - len(w))
  w += [1] * (the_len - len(w))
  f = f[:the_len]
  g = g[:the_len]

  wf = sum(abs(w[k]*f[k]) for k in range(the_len))
  wg = sum(abs(w[k]*g[k]) for k in range(the_len))
  wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))
  
  if wf == 0 and wg == 0:
    return 0
  else:
    return (wf + wg - wfg)/(2*max(wf,wg))

And that's it! Heaps more to come!

Update: OK. I wrote a rescaled version of list simm. I haven't tested it, but I think it is probably right :)
Python:

def rescaled_list_simm(w,f,g):
  the_len = min(len(f),len(g))
# normalize lengths of our lists:
#  w += [0] * (the_len - len(w))
  w += [1] * (the_len - len(w))
  f = f[:the_len]
  g = g[:the_len]

# rescale step, first find size:
  s1 = sum(abs(w[k]*f[k]) for k in range(the_len))
  s2 = sum(abs(w[k]*g[k]) for k in range(the_len))

# if s1 == 0, or s2 == 0, we can't rescale:
  if s1 == 0 or s2 == 0:
    return 0
  
# now rescale:
# we just need w*f == w*g, the exact value doesn't matter, so we choose 1.
# noting that our equation has symmetry under: "f => k.f, g => k.g"
# also, note that finite precision floats means sometimes it does matter, but hopefully we will be fine.
  f = [f[k]/s1 for k in range(the_len)]  
  g = [g[k]/s2 for k in range(the_len)]  
  
# proceed with algo:
# if we did the rescale step correctly we will have:
# wf == wg == 1
#  wf = sum(abs(w[k]*f[k]) for k in range(the_len))
#  wg = sum(abs(w[k]*g[k]) for k in range(the_len))
  wfg = sum(abs(w[k]*f[k] - w[k]*g[k]) for k in range(the_len))
  
# we should never have wf or wg == 0 in the rescaled case:
#  if wf == 0 and wg == 0:
#    return 0
#  else:
#    return (wf + wg - wfg)/(2*max(wf,wg))
  return (2 - wfg)/2

Update: OK. May as well do an implementation of Gaussian simm too:

gaussian-simm(s,f,g) = exp(-||f - g||^2/2s)

Python:

import math

# define Euclidean Distance function:
def ED(f,g):
  if len(f) != len(g):
    print("different length vectors!")
    return 0
  return math.sqrt(sum((f[k] - g[k])**2 for k in range(len(f))))

# define Guassian simm:
# guassian-simm(s,f,g) = exp(-||f - g||^2/2s)
def guass_simm(s,f,g):
  return math.exp(-ED(f,g)**2/2*s)

The Semantic DB Project

Tuesday, 17 February 2015

a list implementation of the simm

No comments:

Post a Comment