## Tuesday, 21 April 2015

### the full wage prediction results

So, I got bored of this, but I guess I should post my results! Spoiler: 77.1% success rate.

OK. First I did some precomputation:
simm |*> #=> select[1,100] similar[input-pattern,pattern] |_self>
map[simm,similarity-result] rel-kets[input-pattern] |>
This took about a week! Yeah, we could do with more speed. Thankfully, similar[op] should be easy to parallelize. But now that we have the results saved, it is very quick to play with settings.
-- load up the results:

-- find the number of "above 50k" and "below 50k" in the training set:
\$ grep "^M" adult-wage-pattern-recognition--saved-simm.sw | grep -c "above"
7841

\$ grep "^M" adult-wage-pattern-recognition--saved-simm.sw | grep -c "below"
24720

-- define our norm matrix, that takes into account the relative frequencies of "above 50k" vs "below 50k":
sa: norm |above-50K> => .000127534753220 |_self>
sa: norm |below-50K> => .000040453074433 |_self>
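
The two norm coefficients look like the reciprocals of the class counts from the grep output, so that the more common "below 50k" class does not win just by being more common. A quick Python check of that reading (the variable names here are mine, not part of the sw file):

```python
# Class counts reported by the two grep commands above.
n_above = 7841
n_below = 24720

# Assumption: each norm coefficient is 1 / (number of training nodes in that class).
norm = {
    "above-50K": 1 / n_above,
    "below-50K": 1 / n_below,
}

for label, coeff in norm.items():
    print(f"norm |{label}> => {coeff:.12f} |_self>")
```

The printed values are rounded, so the trailing digits can differ slightly from the truncated figures in the rules above.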

-- define our first attempt at a h:
sa: h |*> #=> normalize[100] coeff-sort norm M select[1,5] similarity-result |_self>
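
For readers who don't speak the operator notation, here is a rough Python sketch of how I read this h (operators apply right to left). It is an interpretation, not the actual implementation: the function name, the list-of-pairs representation, and the toy similarity values are all made up for illustration.

```python
def h(similarity_result, M, norm, k=5):
    """Sketch of: normalize[100] coeff-sort norm M select[1,k] similarity-result."""
    # select[1,k]: keep the first k matches (assumes the list is already
    # ordered best match first).
    top = similarity_result[:k]

    # M, then norm: map each node to its label, scale the similarity by the
    # class norm, and add up the coefficients per label.
    votes = {}
    for node, sim in top:
        label = M[node]
        votes[label] = votes.get(label, 0.0) + sim * norm[label]

    total = sum(votes.values())
    if total == 0:
        return []

    # normalize[100]: rescale so the coefficients sum to 100.
    scaled = {label: 100 * v / total for label, v in votes.items()}

    # coeff-sort: largest coefficient first (rescaling does not change the order).
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: node names, labels and similarities are invented.
M = {
    "node: 1": "above-50K",
    "node: 2": "above-50K",
    "node: 3": "below-50K",
    "node: 4": "below-50K",
    "node: 5": "below-50K",
}
norm = {"above-50K": 1 / 7841, "below-50K": 1 / 24720}
matches = [("node: 1", 0.9), ("node: 2", 0.8), ("node: 3", 0.8),
           ("node: 4", 0.7), ("node: 5", 0.7), ("node: 6", 0.6)]

result = h(matches, M, norm)
print(result)
```

In this toy case three of the five best matches are "below 50k", but the norm step tips the vote to "above 50k", which is the effect the norm matrix is meant to have.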

-- define a couple of useful operators:
sa: equal? |*> #=> equal(h|_self>,100 answer |_self>)
sa: is-equal? |*> #=> max-elt wif(equal? |_self>,|True>,|False>)

-- find the table of results:
sa: table[input,h,answer,is-equal?] rel-kets[input-pattern] |>

-- now the results for this h:
\$ grep -c "example" adult-wage-prediction-table-select-1-5.txt
16281

\$ grep -c "True" adult-wage-prediction-table-select-1-5.txt
12195

-- the percent correct:
100*12195/16281
= 74.903 %

-- next attempt at h (just pick the best match, and ignore the rest):
sa: h |*> #=> 100 M select[1,1] similarity-result |_self>

-- find the table of results:
sa: table[input,h,answer,is-equal?] rel-kets[input-pattern] |>

-- now the results for this h:
\$ grep -c "True" adult-wage-prediction-table-select-1-1.txt
12549

-- the percent correct:
100*12549/16281
= 77.077 %
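
The two success rates can be reproduced from the grep counts with a few lines of Python (the dictionary keys are just labels I picked):

```python
total = 16281  # number of "example" rows in each table

# Number of "True" rows for each version of h.
correct = {"select[1,5]": 12195, "select[1,1]": 12549}

percent = {name: 100 * n / total for name, n in correct.items()}

for name, p in percent.items():
    print(f"{name}: {p:.3f} %")
```

This rounds to three decimals, so the last digit can differ from a truncated figure.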
Finally, I tried using apply-weights, but I couldn't improve on 77.1%.
For example:
h |*> #=> normalize[100] coeff-sort norm M apply-weights[5,4,3,2,1] similarity-result |_self>
Maybe if we had some iterative procedure to choose the weights on a sample set, and then applied those weights to the full set, we might improve on 77%. But I gave up!
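
As a rough picture of what apply-weights[5,4,3,2,1] does to a similarity result, here is a small Python sketch. It assumes the k-th match has its coefficient multiplied by the k-th weight, and that matches beyond the end of the weight list are dropped; the second point in particular is my assumption, and the toy matches are invented.

```python
def apply_weights(matches, weights):
    """Multiply the k-th coefficient by the k-th weight.

    zip stops at the shorter sequence, so matches beyond the weight list
    are dropped here (an assumption about the operator's behaviour).
    """
    return [(node, sim * w) for (node, sim), w in zip(matches, weights)]


# Hypothetical similarity result, best match first.
matches = [("node: 12", 0.75), ("node: 7", 0.5), ("node: 30", 0.25)]

weighted = apply_weights(matches, [5, 4, 3, 2, 1])
print(weighted)
```

Choosing the weights would then be a search over the weight list, scored by the success rate on a held-out sample.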

A note: these tables of 16,281 entries take about 2 minutes to generate. Without the precomputation, each tweak of h would take the full week.

Another possible way to improve on 77%, and get closer to the 84% I see with other methods, is to tweak our supervised pattern recognition algo. apply-weights is really trying to change the weights after the similarity has been calculated. But we can also do it before, and pre-weight our superpositions before we feed them to simm.

Given the training data set D:
D = {(X1,Y1),(X2,Y2),...(Xn,Yn)}
where Xi and Yi are superpositions (and must not be empty superpositions, i.e. ones that have all coeffs equal to 0)

Then learn these rules:
pattern |node: 1> => X1
pattern |node: 2> => X2
...
pattern |node: n> => Xn

M |node: 1> => Y1
M |node: 2> => Y2
...
M |node: n> => Yn

Then given the unlabeled data set U = {Z1,Z2,...Zm}, where Zi are superpositions of the same type as Xi, learn these rules:
input-pattern |example: 1> =>  Z1
input-pattern |example: 2> =>  Z2
...
input-pattern |example: m> =>  Zm
For the pre-weighted version, we first find a matrix W that re-weights our Xk and Zk superpositions/patterns, then do:
Given the training data set D:
D = {(X1,Y1),(X2,Y2),...(Xn,Yn)}
where Xi and Yi are superpositions (and must not be empty superpositions, i.e. ones that have all coeffs equal to 0)

Then learn these rules:
pattern |node: 1> => W X1
pattern |node: 2> => W X2
...
pattern |node: n> => W Xn

M |node: 1> => Y1
M |node: 2> => Y2
...
M |node: n> => Yn

Then given the unlabeled data set U = {Z1,Z2,...Zm}, where Zi are superpositions of the same type as Xi, learn these rules:
input-pattern |example: 1> =>  W Z1
input-pattern |example: 2> =>  W Z2
...
input-pattern |example: m> =>  W Zm
And note that W does not need to be square. Indeed, the output of "W Xk" can be a completely different type of superposition than Xk. But again, like the apply-weights idea, I don't know a good way to find W. Perhaps borrow some ideas from standard artificial neural networks?
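
To make the pre-weighting idea concrete, here is a minimal Python sketch with superpositions as dicts. Everything in it is hypothetical: the kets, the entries of W, and the output kets "skill" and "seniority" are invented, and the simm used is an assumed form (sum of element-wise minima divided by the larger total), not necessarily what similar[op] computes.

```python
def apply_W(W, sp):
    """Apply a sparse matrix W to a superposition.

    W maps each input ket to a dict of {output ket: weight}, so the output
    can live in a completely different basis than the input.
    """
    out = {}
    for ket, coeff in sp.items():
        for new_ket, w in W.get(ket, {}).items():
            out[new_ket] = out.get(new_ket, 0.0) + coeff * w
    return out


def simm(f, g):
    """Assumed similarity for non-negative coefficients."""
    denom = max(sum(f.values()), sum(g.values()))
    if denom == 0:
        return 0.0
    overlap = sum(min(f.get(k, 0.0), g.get(k, 0.0)) for k in set(f) | set(g))
    return overlap / denom


# Hypothetical, non-square W: three input kets, two output kets.
W = {
    "education: Bachelors": {"skill": 2.0},
    "education: Masters": {"skill": 3.0},
    "occupation: Exec-managerial": {"skill": 1.0, "seniority": 2.0},
}

X1 = {"education: Bachelors": 1.0, "occupation: Exec-managerial": 1.0}
Z1 = {"education: Masters": 1.0, "occupation: Exec-managerial": 1.0}

raw = simm(X1, Z1)
reweighted = simm(apply_W(W, X1), apply_W(W, Z1))
print(raw, reweighted)
```

In this toy case the raw patterns only overlap on occupation, while after W they are compared on shared "skill" and "seniority" kets, so the similarity goes up. Whether a useful W exists for the wage data is exactly the open question.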

That's it for this post!

Update: I tried a new h, but only got 74% success (12043/16281).
h |*> #=> normalize[100] coeff-sort norm M invert subtraction-invert[1] select[1,5] similarity-result |_self>
I also tried select[1,3] and select[1,10] but they were worse.