Thursday, 28 July 2016

p pattern similarity metric in python

Last post we gave the p pattern similarity metric in nice pretty LaTeX. In this post, verify my maths by actually implementing it in python. Yeah, there were a couple of steps I wasn't 100% confident in, but the python says I'm fine.

Here is the python:
import math
import cmath

# define p'th roots of unity:
def jpk(p,k):
  return cmath.exp(1j*2*math.pi*k/p)

# define wf_k:
def wf(vect):
  return sum(abs(x) for x in vect)

# define wf^p:
def wfp(vects):
  p = len(vects)
  i_max = len(vects[0])          # assume all vects are the same size as the first one.
  r1 = 0
  for i in range(i_max):
    r2 = 0
    for k in range(p):
      r2 += jpk(p,k)*vects[k][i]
    r1 += abs(r2)
  return r1

def multi_simm(vects):
  p = len(vects)
  i_max = len(vects[0])          # assume all vects are the same size as the first one.

  # sum over wf_k term:
  r1 = 0
  max_wf = 0
  for k in range(p):
    wf_k = wf(vects[k])
    max_wf = max(max_wf,wf_k)
    r1 += wf_k

  # wf^p term:
  r2 = wfp(vects)

  # p.max term:
  r3 = p*max_wf

  # prevent divide by 0:
  if r3 == 0:
    return 0

  # return result:
  return (r1 - r2)/r3

def rescaled_multi_simm(vects):
  p = len(vects)
  i_max = len(vects[0])          # assume all vects are the same size as the first one.

  # find normalization terms:
  norms = []
  for k in range(p):
    wf_k = wf(vects[k])
    if wf_k == 0:                # prevent divide by zero
      return 0
    norms.append(wf_k)

  # find normalized wf^p:
  r1 = 0
  for i in range(i_max):
    r2 = 0
    for k in range(p):
      r2 += jpk(p,k)*vects[k][i]/norms[k]
    r1 += abs(r2)

  # return result:
  return 1 - r1/p

# test the code:
print("wfp: %s" % wfp(list_of_vects))
print("multi-simm: %s" % multi_simm(list_of_vects))
print("rescaled-multi-simm: %s" % rescaled_multi_simm(list_of_vects))
Now, some test cases. First, with all patterns equal, which should give 1, else we made a mistake!
list_of_vects = [[2,3,4,5,6], [2,3,4,5,6], [2,3,4,5,6]]

$ ./multi-simm.py
wfp: 5.887076992907251e-15
multi-simm: 0.9999999999999999
rescaled-multi-simm: 0.9999999999999999
Next, all patterns "disjoint", this time we expect 0, else we made a mistake:
list_of_vects = [[5,0,0,0], [0,-5,0,0], [0,0,-5,0], [0,0,0,5]]

$ ./multi-simm.py
wfp: 20.0
multi-simm: 0.0
rescaled-multi-simm: 0.0
Next, test that rescaling works, and gives a different answer to non-rescaled:
list_of_vects = [[2,3,4,5,6], [4,6,8,10,12], [6,9,12,15,18]]

$ ./multi-simm.py
wfp: 34.641016151377556
multi-simm: 0.47421657693679137
rescaled-multi-simm: 0.9999999999999999
And finally, a test case where we don't expect 0 or 1:
list_of_vects = [[2,3,4,5,6], [6,5,4,3,2], [2,4,3,5,6], [2,4,5,3,6]]

$ ./multi-simm.py
wfp: 10.82842712474619
multi-simm: 0.8646446609406727
rescaled-multi-simm: 0.8646446609406726
Cool. It all seems to work as desired. Heh, now I need to find a use case for p > 2.

No comments:

Post a Comment