OK. First I did some precomputation:
load adult-wage-pattern-recognition.sw
simm |*> #=> select[1,100] similar[input-pattern,pattern] |_self>
map[simm,similarity-result] rel-kets[input-pattern] |>
save adult-wage-pattern-recognition--saved-simm.sw

This took about a week! Yeah, we could do with more speed; thankfully similar[op] should be easy to parallelize. But now that we have this, it is very quick to play with settings.
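For readers without the semantic-db engine, here is a rough Python sketch of what that precomputation does, assuming simm is the usual "sum of minimums over max of sums" similarity on sparse non-negative vectors, and that select[1,100] just keeps the 100 best matches. The names training_patterns and input_patterns are my stand-ins for the pattern and input-pattern learn rules, not anything in the .sw files:

# Rough sketch of the precomputation, under the assumptions above.
# Patterns are sparse vectors represented as dicts of feature -> coeff.

def simm(f, g):
    # Similarity of two non-empty sparse vectors, in [0, 1].
    overlap = sum(min(f[k], g[k]) for k in f.keys() & g.keys())
    return overlap / max(sum(f.values()), sum(g.values()))

def precompute_similarity(input_patterns, training_patterns, top_k=100):
    # For each unlabeled example, keep the top_k most similar training
    # nodes -- the analogue of select[1,100] similar[input-pattern,pattern].
    results = {}
    for example, z in input_patterns.items():
        scored = [(node, simm(z, x)) for node, x in training_patterns.items()]
        scored.sort(key=lambda t: t[1], reverse=True)
        results[example] = scored[:top_k]
    return results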
-- load up the results:
sa: load adult-wage-pattern-recognition--saved-simm.sw

-- find the number of "above 50k" and "below 50k" in the training set:
$ grep "^M" adult-wage-pattern-recognition--saved-simm.sw | grep -c "above"
7841

$ grep "^M" adult-wage-pattern-recognition--saved-simm.sw | grep -c "below"
24720

-- define our norm matrix, which takes into account the relative frequencies of "above 50k" vs "below 50k":
sa: norm |above-50K> => .000127534753220 |_self>
sa: norm |below-50K> => .000040453074433 |_self>

-- define our first attempt at h:
sa: h |*> #=> normalize[100] coeff-sort norm M select[1,5] similarity-result |_self>

-- define a couple of useful operators:
sa: equal? |*> #=> equal(h|_self>,100 answer |_self>)
sa: is-equal? |*> #=> max-elt wif(equal? |_self>,|True>,|False>)

-- find the table of results:
sa: table[input,h,answer,is-equal?] rel-kets[input-pattern] |>
result: adult-wage-prediction-table-select-1-5.txt

-- now the results for this h:
$ grep -c "example" adult-wage-prediction-table-select-1-5.txt
16281

$ grep -c "True" adult-wage-prediction-table-select-1-5.txt
12195

-- the percent correct: 100*12195/16281 = 74.903 %

-- next attempt at h (just pick the best match, and ignore the rest):
sa: h |*> #=> 100 M select[1,1] similarity-result |_self>

-- find the table of results:
sa: table[input,h,answer,is-equal?] rel-kets[input-pattern] |>
result: adult-wage-prediction-table-select-1-1.txt

-- now the results for this h:
$ grep -c "True" adult-wage-prediction-table-select-1-1.txt
12549

-- the percent correct: 100*12549/16281 = 77.077 %
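In more conventional terms, these h operators are k-nearest-neighbour votes: the select[1,5] version is a similarity-weighted 5-NN vote with a class-frequency correction (the norm coefficients are just 1/7841 and 1/24720), and select[1,1] is plain 1-NN. A hedged Python sketch of the weighted version, reusing the precomputed results from above; labels is my stand-in for the M learn rules:

# Sketch of the select[1,5] h: a similarity-weighted 5-NN vote,
# down-weighted by class frequency, as in the norm rules above.

norm = {"above-50K": 1 / 7841, "below-50K": 1 / 24720}

def h(similarity_result, labels, k=5):
    # similarity_result: list of (node, score), best first.
    votes = {}
    for node, score in similarity_result[:k]:                 # select[1,k]
        label = labels[node]                                  # M
        votes[label] = votes.get(label, 0.0) + score * norm[label]  # norm
    return max(votes, key=votes.get)                          # top answer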
Finally, I tried using apply-weights, but I couldn't improve on 77.1%. eg:
h |*> #=> normalize[100] coeff-sort norm M apply-weights[5,4,3,2,1] similarity-result |_self>

Maybe if we had some iterative procedure to choose the weights on a sample set, and then apply that to the full set, we might improve on 77%. But I gave up!
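One crude way to automate that would be a random search over weight vectors on a held-out sample, keeping whichever scores best. A sketch, where evaluate(w) is a hypothetical stand-in for "rebuild h with apply-weights[w1,...,w5], run the table over the sample, and return the percent correct":

import random

def search_weights(evaluate, n_trials=200, k=5):
    # Random search for apply-weights coefficients on a sample set.
    best_w, best_score = None, -1.0
    for _ in range(n_trials):
        # Draw k weights and sort descending, mirroring apply-weights[5,4,3,2,1].
        w = sorted((random.uniform(0, 5) for _ in range(k)), reverse=True)
        score = evaluate(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score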
And a note: these tables of 16,281 entries take about 2 minutes each to generate. Without the precomputation, each tweak of h would take the full week.
Another possible way to improve on 77%, and get closer to the 84% I see reported for other methods, is to tweak our supervised pattern recognition algo. apply-weights is really trying to change weights after the similarity has been calculated. But we can also do it before, and pre-weight our superpositions before we feed them to simm.
So instead of:
Given the training data set D:
D = {(X1, Y1), (X2, Y2), ..., (Xn, Yn)}
where the Xi and Yi are superpositions (and must not be empty superpositions that have all coeffs equal to 0)

Then learn these rules:
pattern |node: 1> => X1
pattern |node: 2> => X2
...
pattern |node: n> => Xn

M |node: 1> => Y1
M |node: 2> => Y2
...
M |node: n> => Yn

Then given the unlabeled data set U = {Z1, Z2, ..., Zm}, where the Zi are superpositions of the same type as the Xi, learn these rules:
input-pattern |example: 1> => Z1
input-pattern |example: 2> => Z2
...
input-pattern |example: m> => Zm

We first find a matrix W that re-weights our Xk and Zk superpositions/patterns, and then do:
Given the training data set D:
D = {(X1, Y1), (X2, Y2), ..., (Xn, Yn)}
where the Xi and Yi are superpositions (and must not be empty superpositions that have all coeffs equal to 0)

Then learn these rules:
pattern |node: 1> => W X1
pattern |node: 2> => W X2
...
pattern |node: n> => W Xn

M |node: 1> => Y1
M |node: 2> => Y2
...
M |node: n> => Yn

Then given the unlabeled data set U = {Z1, Z2, ..., Zm}, where the Zi are superpositions of the same type as the Xi, learn these rules:
input-pattern |example: 1> => W Z1
input-pattern |example: 2> => W Z2
...
input-pattern |example: m> => W Zm

And note that W does not need to be square. Indeed, the output of "W Xk" can be a completely different type of superposition than Xk. But again, like the apply-weights idea, I don't know a good way to find W. Perhaps borrow some ideas from standard artificial neural networks?
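As a concrete special case, if W is diagonal it just rescales each feature before the similarity is computed. A hedged Python sketch of that pre-weighting step; feature_weights is my own hypothetical per-feature weight dict, and finding good values for it is exactly the open problem:

# Sketch of pre-weighting a pattern with a diagonal W before simm.
# Features missing from feature_weights keep weight 1.

def apply_W(pattern, feature_weights):
    # Return W x for a sparse pattern x (dict of feature -> coeff).
    return {k: v * feature_weights.get(k, 1.0) for k, v in pattern.items()}

# Then learn pattern |node: i> from apply_W(X_i, feature_weights), and
# input-pattern |example: j> from apply_W(Z_j, feature_weights), and run
# the same simm-based classification as before.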
That's it for this post!
Update: I tried a new h, but only got 74% success (12043/16281).
h |*> #=> normalize[100] coeff-sort norm M invert subtraction-invert[1] select[1,5] similarity-result |_self>

I also tried select[1,3] and select[1,10], but they were worse.