Friday, 6 March 2015

the main event: pattern recognition of websites

We are finally there! We deliberately left the |website 11> pages out of our average website hashes. Now, given those held-out pages as input, do they classify correctly?

Here is the BKO:
```
-- define the list of average websites:
|ave list> => |average abc> + |average adelaidenow> + |average slashdot> + |average smh> + |average wikipedia> + |average youtube>

-- we want average hash to be distinct from the other hashes:
|null> => map[hash-4B,average-hash-4B] "" |ave list>

-- now, let's see how well these patterns recognize the pages we left out of our average:
result |abc 11> => 100 similar[hash-4B,average-hash-4B] |abc 11>
result |adelaidenow 11> => 100 similar[hash-4B,average-hash-4B] |adelaidenow 11>
result |slashdot 11> => 100 similar[hash-4B,average-hash-4B] |slashdot 11>
result |smh 11> => 100 similar[hash-4B,average-hash-4B] |smh 11>
result |wikipedia 11> => 100 similar[hash-4B,average-hash-4B] |wikipedia 11>
result |youtube 11> => 100 similar[hash-4B,average-hash-4B] |youtube 11>

-- tidy results:
tidy-result |abc 11> => drop-below[40] result |_self>
tidy-result |adelaidenow 11> => drop-below[40] result |_self>
tidy-result |slashdot 11> => drop-below[40] result |_self>
tidy-result |smh 11> => drop-below[40] result |_self>
tidy-result |wikipedia 11> => drop-below[40] result |_self>
tidy-result |youtube 11> => drop-below[40] result |_self>
```
And now, drum-roll, the results!
```
sa: load improved-fragment-webpages.sw

sa: matrix[result]
[ average abc         ] = [  91.70  28.73  25.76  37.77  29.45  24.33  ] [ abc 11         ]
[ average adelaidenow ]   [  28.77  78.11  26.71  29.85  25.25  28.18  ] [ adelaidenow 11 ]
[ average slashdot    ]   [  25.76  26.88  79.05  28.27  26.86  23.20  ] [ slashdot 11    ]
[ average smh         ]   [  37.80  29.75  28.16  85.55  32.06  24.95  ] [ smh 11         ]
[ average wikipedia   ]   [  29.71  25.25  26.91  31.86  85.19  22.09  ] [ wikipedia 11   ]
[ average youtube     ]   [  24.32  28.18  23.47  24.92  21.94  82.12  ] [ youtube 11     ]

sa: matrix[tidy-result]
[ average abc         ] = [  91.70  0      0      0      0      0      ] [ abc 11         ]
[ average adelaidenow ]   [  0      78.11  0      0      0      0      ] [ adelaidenow 11 ]
[ average slashdot    ]   [  0      0      79.05  0      0      0      ] [ slashdot 11    ]
[ average smh         ]   [  0      0      0      85.55  0      0      ] [ smh 11         ]
[ average wikipedia   ]   [  0      0      0      0      85.19  0      ] [ wikipedia 11   ]
[ average youtube     ]   [  0      0      0      0      0      82.12  ] [ youtube 11     ]
```
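To make the tidy step concrete, here is a minimal Python sketch of what `drop-below[40]` does to one result. It is not the BKO implementation: the `drop_below` function and the dict representation of a superposition are assumptions made for illustration, as is the choice to keep a coefficient exactly equal to the threshold. The numbers are the `abc 11` column of `matrix[result]`.

```python
# Hypothetical stand-in for the BKO drop-below[t] operator.
# A superposition is modelled as a dict mapping ket label -> coefficient.
def drop_below(superposition, threshold):
    """Keep only the kets whose coefficient is at least `threshold`."""
    return {label: coeff for label, coeff in superposition.items() if coeff >= threshold}

# result |abc 11>, read off the first column of matrix[result].
result_abc_11 = {
    "average abc": 91.70,
    "average adelaidenow": 28.77,
    "average slashdot": 25.76,
    "average smh": 37.80,
    "average wikipedia": 29.71,
    "average youtube": 24.32,
}

# tidy-result |abc 11> => drop-below[40] result |_self>
tidy_abc_11 = drop_below(result_abc_11, 40)
print(tidy_abc_11)
```

Only `average abc` survives the threshold, which is the single non-zero entry in the first column of `matrix[tidy-result]`.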
Finally, let's look at the discrimination, i.e. the difference between the highest matching result and the second highest:
```
sa: discrimination |*> #=> discrim result |_self>
sa: table[page,discrimination] rel-kets[result] |>
+----------------+----------------+
| page           | discrimination |
+----------------+----------------+
| abc 11         | 53.90          |
| adelaidenow 11 | 48.36          |
| slashdot 11    | 50.89          |
| smh 11         | 47.78          |
| wikipedia 11   | 53.14          |
| youtube 11     | 53.94          |
+----------------+----------------+
```
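There we have it. Discrimination of between about 48% and 54% for every page, so each held-out page matches its own site's average far more strongly than any other. That is good.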
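As a cross-check, the discrimination can be recomputed from the rounded `matrix[result]` values. This is a Python sketch, not the BKO `discrim` operator: the `discrimination` function and variable names are assumptions for illustration. Because the inputs are already rounded to two decimals, a value can differ from the table by 0.01 (for `wikipedia 11`, 85.19 - 32.06 gives 53.13 against the table's 53.14).

```python
# matrix[result]: one row per average, one column per held-out page.
pages = ["abc 11", "adelaidenow 11", "slashdot 11", "smh 11", "wikipedia 11", "youtube 11"]
averages = ["average abc", "average adelaidenow", "average slashdot",
            "average smh", "average wikipedia", "average youtube"]
matrix = [
    [91.70, 28.73, 25.76, 37.77, 29.45, 24.33],
    [28.77, 78.11, 26.71, 29.85, 25.25, 28.18],
    [25.76, 26.88, 79.05, 28.27, 26.86, 23.20],
    [37.80, 29.75, 28.16, 85.55, 32.06, 24.95],
    [29.71, 25.25, 26.91, 31.86, 85.19, 22.09],
    [24.32, 28.18, 23.47, 24.92, 21.94, 82.12],
]

def discrimination(scores):
    """Difference between the highest and the second highest score."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second

discrimination_by_page = {}
best_match_by_page = {}
for col, page in enumerate(pages):
    column = [row[col] for row in matrix]  # all similarity scores for this page
    discrimination_by_page[page] = round(discrimination(column), 2)
    best_match_by_page[page] = averages[column.index(max(column))]
    print(f"{page:15} {best_match_by_page[page]:20} {discrimination_by_page[page]:.2f}")
```

Every page's best match is the average for its own site, and the recomputed discriminations agree with the table to within rounding.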

Heaps more to come!