First, an easy one. Recall H-I pattern recognition?
Now, in the console:
sa: load H-I-pat-rec.sw
sa: categorize[pixels,0.6,result]
sa: dump
|result> category-0 |result> => |letter: H> + |noisy: H> + |noisy: H2>
category-1 |result> => |letter: I> + |noisy: I> + |noisy: I2>
category-2 |result> => |letter: O>

See, as simple as that! And it worked.
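For anyone who wants the idea in code, here is a rough Python sketch of threshold-based grouping in the spirit of categorize. It is not the console's actual implementation: the overlap similarity `simm`, the greedy strategy, and the toy pixel data are all assumptions made for illustration (the toy data is not the contents of H-I-pat-rec.sw).

```python
def simm(f, g):
    """Overlap similarity in [0, 1] for non-negative sparse vectors stored as dicts.
    Assumed measure: sum of element-wise minimums over the larger total."""
    total = max(sum(f.values()), sum(g.values()))
    if total == 0:
        return 0.0
    overlap = sum(min(v, g.get(k, 0)) for k, v in f.items())
    return overlap / total


def categorize(patterns, t):
    """Greedy grouping: a pattern joins the best-matching existing category
    if its similarity to some member is at least t, else it starts a new one."""
    categories = []  # each category is a list of labels
    for label, vec in patterns.items():
        best, best_score = None, 0.0
        for cat in categories:
            score = max(simm(vec, patterns[other]) for other in cat)
            if score > best_score:
                best, best_score = cat, score
        if best is not None and best_score >= t:
            best.append(label)
        else:
            categories.append([label])
    return categories


# Hypothetical toy data: each pattern is a set of "on" pixels.
patterns = {
    "letter: H": {"p1": 1, "p2": 1, "p3": 1, "p4": 1, "p5": 1},
    "noisy: H": {"p1": 1, "p2": 1, "p3": 1, "p4": 1},
    "letter: I": {"p6": 1, "p7": 1, "p8": 1},
    "noisy: I": {"p6": 1, "p7": 1, "p8": 1, "p9": 1},
    "letter: O": {"p1": 1, "p6": 1, "p10": 1, "p11": 1, "p12": 1},
}

for i, cat in enumerate(categorize(patterns, 0.6)):
    print(f"category-{i}: {cat}")
```

With t = 0.6 the toy data splits into an H group, an I group and a lone O. Drop t far enough (say 0.1) and the O gets absorbed into the H group, which is why the choice of t matters.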
Now for a harder example.
webpage superpositions:
sa: load improved-fragment-webpages.sw
sa: categorize[hash-4B,0.7,result]

-- now look at the results:
sa: dump
|result> category-0 |result> => |abc 1> + |abc 2> + |abc 3> + |abc 4> + |abc 5> + |abc 6> + |abc 7> + |abc 8> + |abc 9> + |abc 10> + |abc 11>
category-1 |result> => |adelaidenow 1> + |adelaidenow 2> + |adelaidenow 3> + |adelaidenow 4> + |adelaidenow 5> + |adelaidenow 6> + |adelaidenow 7> + |adelaidenow 8> + |adelaidenow 9> + |adelaidenow 10> + |adelaidenow 11>
category-2 |result> => |slashdot 1> + |slashdot 2> + |slashdot 3> + |slashdot 4> + |slashdot 5> + |slashdot 6> + |slashdot 7> + |slashdot 8> + |slashdot 9> + |slashdot 10> + |slashdot 11>
category-3 |result> => |smh 1> + |smh 2> + |smh 3> + |smh 4> + |smh 5> + |smh 6> + |smh 7> + |smh 8> + |smh 9> + |smh 10> + |smh 11>
category-4 |result> => |wikipedia 1> + |wikipedia 2> + |wikipedia 3> + |wikipedia 4> + |wikipedia 5> + |wikipedia 6> + |wikipedia 7> + |wikipedia 8> + |wikipedia 9> + |wikipedia 10> + |wikipedia 11>
category-5 |result> => |youtube 1> + |youtube 2> + |youtube 3> + |youtube 4> + |youtube 5> + |youtube 6> + |youtube 7> + |youtube 8> + |youtube 9> + |youtube 10> + |youtube 11>

-- now pretty print those results:
sa: websites |*> #=> apply(|_self>,|result>)
sa: table[category,websites] supported-ops |result>
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| category   | websites                                                                                                                                                              |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| category-0 | abc 1, abc 2, abc 3, abc 4, abc 5, abc 6, abc 7, abc 8, abc 9, abc 10, abc 11                                                                                         |
| category-1 | adelaidenow 1, adelaidenow 2, adelaidenow 3, adelaidenow 4, adelaidenow 5, adelaidenow 6, adelaidenow 7, adelaidenow 8, adelaidenow 9, adelaidenow 10, adelaidenow 11 |
| category-2 | slashdot 1, slashdot 2, slashdot 3, slashdot 4, slashdot 5, slashdot 6, slashdot 7, slashdot 8, slashdot 9, slashdot 10, slashdot 11                                  |
| category-3 | smh 1, smh 2, smh 3, smh 4, smh 5, smh 6, smh 7, smh 8, smh 9, smh 10, smh 11                                                                                         |
| category-4 | wikipedia 1, wikipedia 2, wikipedia 3, wikipedia 4, wikipedia 5, wikipedia 6, wikipedia 7, wikipedia 8, wikipedia 9, wikipedia 10, wikipedia 11                       |
| category-5 | youtube 1, youtube 2, youtube 3, youtube 4, youtube 5, youtube 6, youtube 7, youtube 8, youtube 9, youtube 10, youtube 11                                             |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+

And again, simple, and it worked! Though this example did take 7 minutes. And being, I think, O(n^3), it becomes impractical for large data sets :(.
Heh, also. The BKO to make the table is somewhat opaque, even to me!
And that is it for categorize. On to normed frequency class for the next few posts.
Update: the best value for t in categorize (0.6 and 0.7 above) depends strongly on your data set. I usually make a similarity matrix first, and then use that to choose the value. Maybe there is a neater way? Consider the adult wage prediction task from the other day. In that case you would need t = 0.99998 or something to give good results!
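To make the similarity-matrix step concrete, here is a small Python sketch. It builds the pairwise matrix and then suggests a threshold at the midpoint of the widest gap between sorted similarity values. Both the overlap similarity `simm` and the widest-gap heuristic are assumptions for illustration, not what the console does, and the toy patterns are made up.

```python
from itertools import combinations


def simm(f, g):
    """Assumed overlap similarity in [0, 1] for non-negative dict vectors."""
    total = max(sum(f.values()), sum(g.values()))
    if total == 0:
        return 0.0
    return sum(min(v, g.get(k, 0)) for k, v in f.items()) / total


def similarity_matrix(patterns):
    """Return the labels and the full pairwise similarity matrix."""
    labels = list(patterns)
    matrix = [[simm(patterns[a], patterns[b]) for b in labels] for a in labels]
    return labels, matrix


def suggest_threshold(patterns):
    """Heuristic (an assumption): midpoint of the widest gap between
    the sorted pairwise similarities."""
    scores = sorted(simm(patterns[a], patterns[b])
                    for a, b in combinations(patterns, 2))
    if len(scores) < 2:
        raise ValueError("need at least three patterns to find a gap")
    gaps = [(hi - lo, (hi + lo) / 2) for lo, hi in zip(scores, scores[1:])]
    return max(gaps)[1]


# Hypothetical toy data: each pattern is a set of "on" pixels.
patterns = {
    "letter: H": {"p1": 1, "p2": 1, "p3": 1, "p4": 1, "p5": 1},
    "noisy: H": {"p1": 1, "p2": 1, "p3": 1, "p4": 1},
    "letter: I": {"p6": 1, "p7": 1, "p8": 1},
    "noisy: I": {"p6": 1, "p7": 1, "p8": 1, "p9": 1},
}

labels, matrix = similarity_matrix(patterns)
for label, row in zip(labels, matrix):
    print(f"{label:10s}", " ".join(f"{x:.2f}" for x in row))
print("suggested t:", suggest_threshold(patterns))
```

On data like the wage prediction set, where nearly every pair is almost identical, the widest gap sits very close to 1, which fits needing a t like 0.99998. Eyeballing the matrix is still the safer check.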