Friday, 5 August 2016

full mnist results

I don't want to think about how long it took me to get to this point, but I finally have the full MNIST results: roughly 5.4% error. Not as good as I had hoped, but it was my first try. I need some thinking time to work out where to go from here. There are a couple of obvious options: 1) retry with different average-categorize settings, 2) try more than one layer. The difficulty with (2) is that I don't know how. I'm not using the standard ANN form y = f(dot-product(w,x)); I'm using my custom y = f(simm(w,x)), so stacking multiple layers has different properties than in standard ANNs.
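To make the difference between the two forms concrete, here is a minimal Python sketch. It assumes simm is the min-overlap similarity, sum of min(f_i, g_i) divided by max(sum f, sum g), over non-negative coefficients. That definition, the dict representation, and the names `simm` and `dot` are assumptions for illustration, not the actual project code, and the activation f is left out.

```python
def simm(f, g):
    """Assumed similarity measure for non-negative sparse vectors (dicts)."""
    total_f = sum(f.values())
    total_g = sum(g.values())
    if total_f == 0 or total_g == 0:
        return 0.0
    # Overlap is the sum of the element-wise minimums.
    overlap = sum(min(v, g.get(k, 0.0)) for k, v in f.items())
    return overlap / max(total_f, total_g)


def dot(f, g):
    """Standard dot product over the same sparse representation."""
    return sum(v * g.get(k, 0.0) for k, v in f.items())


w = {"a": 1.0, "b": 1.0}
x = {"a": 1.0, "b": 1.0}
x_scaled = {"a": 10.0, "b": 10.0}  # same shape as x, ten times larger

print(dot(w, x), dot(w, x_scaled))    # grows with the size of the input
print(simm(w, x), simm(w, x_scaled))  # stays within [0, 1]
```

Under this assumed definition the dot product grows without bound as the input is scaled up, while simm peaks at 1 only for an exact match. That is one way the layers could behave differently from standard ANN layers.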

The details:
-- NB: need to tweak the destination sw file inside these scripts:
$ ./phi-superpositions.py 5 work-on-handwritten-digits/phi-transformed-images-v2--10k-test--edge-enhanced-20/
$ ./phi-superpositions-v3.py 5 work-on-handwritten-digits/phi-transformed-images-v2--60k-train--edge-enhanced-20/

-- now we have the sw files, load them into the console:
load image-phi-superpositions--test-10k--using-edge-enhanced-features--k_5--t_0_4.sw
load image-phi-superpositions--train-60k--using-edge-enhanced-features--k_5--t_0_4.sw
load mnist-test-labels--edge-enhanced.sw
load mnist-train-labels--edge-enhanced.sw

-- define our if-then machine operator:
simm-op |*> #=> 100 select[1,40] similar-input[train-log-phi-sp] log-phi-sp |_self>

-- find all the similarity results:
map[simm-op,similarity] rel-kets[log-phi-sp]

-- define the operators that create the result table:
equal? |*> #=> equal(100 test-label |_self>,h |_self>)
h |*> #=> normalize[100] select[1,1] coeff-sort train-label select[1,1] similarity |_self>
score |top 1> => 0.01 equal? ket-sort rel-kets[similarity] |>

h |*> #=> normalize[100] select[1,1] coeff-sort train-label select[1,2] similarity |_self>
score |top 2> => 0.01 equal? ket-sort rel-kets[similarity] |>

h |*> #=> normalize[100] select[1,1] coeff-sort train-label select[1,3] similarity |_self>
score |top 3> => 0.01 equal? ket-sort rel-kets[similarity] |>
...

h |*> #=> normalize[100] select[1,1] coeff-sort train-label select[1,30] similarity |_self>
score |top 30> => 0.01 equal? ket-sort rel-kets[similarity] |>

-- finally, spit out the result table:
table[top-k,score] rel-kets[score]
+--------+------------+
| top-k  | score      |
+--------+------------+
| top 1  | 93.10 True |
| top 2  | 93.10 True |
| top 3  | 94.12 True |
| top 4  | 94.54 True |
| top 5  | 94.57 True |
| top 6  | 94.60 True |
| top 7  | 94.55 True |
| top 8  | 94.53 True |
| top 9  | 94.43 True |
| top 10 | 94.53 True |
| top 11 | 94.43 True |
| top 12 | 94.48 True |
| top 13 | 94.44 True |
| top 14 | 94.49 True |
| top 15 | 94.37 True |
| top 16 | 94.36 True |
| top 17 | 94.29 True |
| top 18 | 94.22 True |
| top 19 | 94.20 True |
| top 20 | 94.19 True |
| top 21 | 94.17 True |
| top 22 | 94.16 True |
| top 23 | 94.16 True |
| top 24 | 94.09 True |
| top 25 | 94.05 True |
| top 26 | 94.06 True |
| top 27 | 93.97 True |
| top 28 | 94 True    |
| top 29 | 94 True    |
| top 30 | 93.99 True |
+--------+------------+

-- save the results:
save full-mnist-phi-transformed-edge-enhanced--saved.sw

If we are allowed to pick and choose how many results to average over, then averaging over the top 6 gives 94.6% correct, or 5.4% error. Compared with the results listed on the MNIST home page, though, 5.4% error is roughly a 1998-level result, or slightly better. But like I said, this is a first attempt; surely I can improve on it.
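For readers who don't know the console notation, the "average over the top k" step can be sketched in Python as a similarity-weighted vote. This is only my reading of the h operator above (select the k best matches, map each to its training label, add up the coefficients, keep the biggest). The function name `top_k_label` and the toy data are hypothetical.

```python
from collections import defaultdict


def top_k_label(matches, labels, k):
    """Predict a label from the k most similar training items.

    matches: list of (train_id, similarity) pairs
    labels:  dict mapping train_id -> label
    """
    best = sorted(matches, key=lambda m: m[1], reverse=True)[:k]
    if not best:
        return None
    votes = defaultdict(float)
    for train_id, score in best:
        # Each match votes for its label, weighted by its similarity.
        votes[labels[train_id]] += score
    return max(votes, key=votes.get)


# Toy data: the single best match says "3", but two close matches say "8".
matches = [("t1", 0.90), ("t2", 0.85), ("t3", 0.80)]
labels = {"t1": "3", "t2": "8", "t3": "8"}

print(top_k_label(matches, labels, 1))
print(top_k_label(matches, labels, 3))
```

With k = 1 the toy example follows the single best match. With k = 3 the two slightly weaker matches outvote it, which is the effect the table is measuring as k grows.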

Anyway, I think my point is made: "we can make a lot of progress in pattern recognition if we can find mappings from objects to well-behaved, deterministic, distinctive superpositions". I just need to find a better mapping for digit images to superpositions.

Update: I have a new idea to test. Maybe if-then machines don't work the way I expected. Consider:
pattern |node 1: 1> => sp1
then |node 1: 1> => then-sp

pattern |node 2: 1> => sp2
then |node 2: 1> => then-sp
versus:
pattern |node 1: 1> => sp1
pattern |node 1: 2> => sp2
then |node 1: *> => then-sp
I had assumed, without much thought, that these are functionally equivalent, i.e., that we can expand or contract if-then machines whenever they share a "then" pattern. I now suspect, but have yet to test, that the second form, where we contract the if-then machines, might work better. Consider an input spatial pattern that is partly sp1 and partly sp2: presumably the second form will give better results. Anyway, I now have to try this on MNIST. So instead of effectively 60,000 if-then machines, we will have 10.
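Here is a toy Python sketch of one way the two layouts could differ. Everything in it is an assumption: the simm definition (min-overlap divided by the larger total), the rule that a machine's activation is the sum of simm over its patterns, and the names `expanded`, `contracted` and `activations`. It is not tested against the real if-then machine code.

```python
def simm(f, g):
    """Assumed similarity measure for non-negative sparse vectors (dicts)."""
    total_f = sum(f.values())
    total_g = sum(g.values())
    if total_f == 0 or total_g == 0:
        return 0.0
    overlap = sum(min(v, g.get(k, 0.0)) for k, v in f.items())
    return overlap / max(total_f, total_g)


sp1 = {"a": 1.0, "b": 1.0}
sp2 = {"c": 1.0, "d": 1.0}
mixed = {"a": 1.0, "c": 1.0}  # partly sp1, partly sp2

# Expanded: one if-then machine per pattern.
expanded = {"node 1": [sp1], "node 2": [sp2]}
# Contracted: a single machine holding both patterns.
contracted = {"node 1": [sp1, sp2]}


def activations(machines, x):
    """Assumed rule: a machine's activation sums simm over its patterns."""
    return {
        node: sum(simm(p, x) for p in patterns)
        for node, patterns in machines.items()
    }


print(activations(expanded, mixed))
print(activations(contracted, mixed))
```

Under these assumptions the mixed input half-matches each expanded machine, but the contracted machine sees the full evidence in one place. If each machine had to clear a threshold before firing, say a hypothetical 0.75, neither expanded machine would fire while the contracted one would.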