Thursday, 19 March 2015

the normed frequency class equation

In this post I will give the normed frequency class equation. I guess it could be considered a type of fuzzy set membership function. If all coeffs in a superposition X are equal, then it gives 1 if a ket is in X, and 0 if that ket is not in X. If the coeffs are not all equal, it returns values between 0 and 1: the ket with the largest coeff still gets 1, rarer kets get smaller values, and kets not in X still get 0.

Here is the Python:
import math

# e is a ket, X is a superposition
# for best effect X should be a frequency list
def normed_frequency_class(e,X):
  e = e.ket()                                  # make sure e is a ket, not a superposition, else X.find_value(e) bugs out.
  X = X.drop()                                 # drop elements with coeff <= 0
  smallest = X.find_min_coeff()                # return the min coeff in X as float
  largest = X.find_max_coeff()                 # return the max coeff in X as float
  f = X.find_value(e)                          # return the value of ket e in superposition X as float

  if largest <= 0 or f <= 0:                   # otherwise the math.log() blows up!
    return 0

  fc_max = math.floor(0.5 - math.log(smallest/largest,2)) + 1  # NB: the + 1 is important, else the smallest element in X gets reported as not in set.
  return 1 - math.floor(0.5 - math.log(f/largest,2))/fc_max
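The methods used above (ket(), drop(), find_min_coeff(), find_max_coeff(), find_value()) belong to the project's own superposition class, which isn't shown here. For anyone who wants to play with the formula without that code, here is a minimal stand-alone sketch. It assumes a superposition can be modeled as a plain Python dict mapping ket labels to coeffs; that representation is my assumption for illustration, not the project's API.

```python
import math

def normed_frequency_class(e, X):
    """Fuzzy membership of label e in X, where X is a dict of label -> coeff
    (a hypothetical stand-in for a superposition)."""
    # Drop elements with coeff <= 0, mirroring X.drop() in the version above.
    X = {label: coeff for label, coeff in X.items() if coeff > 0}
    f = X.get(e, 0)           # coeff of e, or 0 if e is not in X
    if not X or f <= 0:       # not in set (this also avoids math.log(0))
        return 0.0
    smallest = min(X.values())
    largest = max(X.values())
    # Frequency class of the rarest element, + 1 so that element still scores above 0.
    fc_max = math.floor(0.5 - math.log(smallest / largest, 2)) + 1
    return 1 - math.floor(0.5 - math.log(f / largest, 2)) / fc_max

# Coefficients copied from |Z> in the examples further down.
Z = {"the": 3789654, "he": 2098762, "king": 57897, "boy": 56975,
     "outrageous": 76, "stringy": 5, "transduction": 1, "mouse": 1}
print(round(normed_frequency_class("king", Z), 2))  # 0.74
```

Note that only the ratios f/largest and smallest/largest enter the formula, which is why scaling every coeff by the same constant leaves the result unchanged.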
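The motivation for this function is the frequency class equation given on Wikipedia:
N = floor(1/2 - log_2(frequency-of-this-item/frequency-of-most-common-item))
All I have done is normalize it so that it gives 1 for the best match and 0 for not in the set: divide N by fc_max (the frequency class of the least common item, plus 1) and subtract the result from 1.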
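To make the normalization step concrete, here is a small worked calculation using the coeffs of |Z> from the examples: the most common word |the>, the word |he>, and the rarest words (coeff 1). The helper name frequency_class is just an illustrative name for the Wikipedia formula.

```python
import math

def frequency_class(f, largest):
    # Wikipedia's frequency class: N = floor(1/2 - log_2(f / largest))
    return math.floor(0.5 - math.log2(f / largest))

# Coefficients from |Z> in the examples: most common word, |he>, and the rarest word.
largest, f_he, smallest = 3789654, 2098762, 1

n_he = frequency_class(f_he, largest)       # 1
n_min = frequency_class(smallest, largest)  # 22, the largest class in Z

fc_max = n_min + 1                          # 23
score_he = 1 - n_he / fc_max                # 1 - 1/23, about 0.96

# Without the + 1, the rarest word would score 1 - 22/22 = 0, i.e. "not in set".
score_min_no_plus_one = 1 - n_min / n_min
print(round(score_he, 2), score_min_no_plus_one)  # 0.96 0.0
```

This is the reason for the "+ 1" comment in the code: it keeps the least common element of X strictly above 0.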

I guess I should give some examples. Let's load up some knowledge in the console:
sa: load normed-frequency-class-examples.sw
sa: dump
----------------------------------------
|context> => |context: normed frequency class>

 |X> => |the> + |he> + |king> + |boy> + |outrageous> + |stringy> + |transduction> + |mouse>

 |Y> => 13|the> + 13|he> + 13|king> + 13|boy> + 13|outrageous> + 13|stringy> + 13|transduction> + 13|mouse>

 |Z> => 3789654|the> + 2098762|he> + 57897|king> + 56975|boy> + 76|outrageous> + 5|stringy> + |transduction> + |mouse>

the |*> #=> ket-nfc(|the>,""|_self>)
he |*> #=> ket-nfc(|he>,""|_self>)
king |*> #=> ket-nfc(|king>,""|_self>)
boy |*> #=> ket-nfc(|boy>,""|_self>)
outrageous |*> #=> ket-nfc(|outrageous>,""|_self>)
stringy |*> #=> ket-nfc(|stringy>,""|_self>)
transduction |*> #=> ket-nfc(|transduction>,""|_self>)
mouse |*> #=> ket-nfc(|mouse>,""|_self>)
not-in-set |*> #=> ket-nfc(|not-in-set>,""|_self>)

 |nfc table> #=> table[SP,the,he,king,boy,outrageous,stringy,transduction,mouse,not-in-set] split |X Y Z>
----------------------------------------

-- now take a look at the table:
sa: "" |nfc table>
+----+-----+----------+----------+----------+------------+----------+--------------+----------+------------+
| SP | the | he       | king     | boy      | outrageous | stringy  | transduction | mouse    | not-in-set |
+----+-----+----------+----------+----------+------------+----------+--------------+----------+------------+
| X  | nfc | nfc      | nfc      | nfc      | nfc        | nfc      | nfc          | nfc      | 0 nfc      |
| Y  | nfc | nfc      | nfc      | nfc      | nfc        | nfc      | nfc          | nfc      | 0 nfc      |
| Z  | nfc | 0.96 nfc | 0.74 nfc | 0.74 nfc | 0.30 nfc   | 0.13 nfc | 0.04 nfc     | 0.04 nfc | 0 nfc      |
+----+-----+----------+----------+----------+------------+----------+--------------+----------+------------+
And we can clearly see it has the properties promised above. |X> and |Y> give the same results, even though X has all coeffs 1 and Y has all coeffs 13 (a bare "nfc" in the table means a coeff of 1). "not-in-set" gets 0, since it is not in any of the three superpositions. And Z gives a nice demonstration of the fuzzy set membership idea: |the> gets 1, and the rarer words get steadily smaller values.

That's it for this post. We will be putting it to use in the next couple of posts.

Update: for frequency lists, log(f/largest) is the best choice. But in other cases, maybe some other foo(f/largest) would work better. I haven't given it all that much thought.
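If you want to experiment with some other foo(f/largest), one option is to pass it in as a parameter. This is a hypothetical sketch: the name normed_class_with and the parameter g are mine, a superposition is again modeled as a plain dict, and g is assumed to be increasing with g(1) = 0, so that every ket in X still scores in (0, 1].

```python
import math

def normed_class_with(g, e, X):
    """Hypothetical generalization: g replaces log_2 in the frequency class.
    Assumes g is increasing on (0, 1] with g(1) == 0."""
    X = {label: coeff for label, coeff in X.items() if coeff > 0}
    f = X.get(e, 0)
    if not X or f <= 0:
        return 0.0
    smallest = min(X.values())
    largest = max(X.values())
    fc_max = math.floor(0.5 - g(smallest / largest)) + 1
    return 1 - math.floor(0.5 - g(f / largest)) / fc_max

def log2(r):
    return math.log(r, 2)   # recovers the original normed frequency class

def log10(r):
    return math.log10(r)    # coarser classes, as one alternative to try

Z = {"the": 3789654, "he": 2098762, "king": 57897, "boy": 56975,
     "outrageous": 76, "stringy": 5, "transduction": 1, "mouse": 1}
print(round(normed_class_with(log2, "he", Z), 2), normed_class_with(log10, "he", Z))  # 0.96 1.0
```

With base 10 the classes are coarser, so for Z the word |he> ends up sharing the top score with |the>.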
