Wednesday 18 March 2015

some categorize examples

This time, a couple of categorize examples.

First, an easy one. Remember the H-I pattern recognition example?
Now, in the console:
sa: load H-I-pat-rec.sw
sa: categorize[pixels,0.6,result]

sa: dump |result>
category-0 |result> => |letter: H> + |noisy: H> + |noisy: H2>
category-1 |result> => |letter: I> + |noisy: I> + |noisy: I2>
category-2 |result> => |letter: O>
See, as simple as that! And it worked.
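categorize is a one-liner in the console, so here is a rough Python sketch of what a threshold-based categorize could look like. This is not the actual semantic-db code, and it rests on several assumptions. Superpositions are modelled as dicts mapping a ket label to its coefficient. `simm` is a hypothetical stand-in similarity that rescales both superpositions to sum to 1 and then adds up the overlapping minimums. Two objects share a category whenever a chain of pairwise similarities at or above t links them (single-linkage). The toy `items` are made up and are not the contents of H-I-pat-rec.sw.

```python
from typing import Dict, List

# Assumption: a superposition is modelled as a dict of ket label -> coefficient.
Superposition = Dict[str, float]


def simm(a: Superposition, b: Superposition) -> float:
    """Hypothetical similarity in [0, 1]: rescale both superpositions to sum
    to 1, then add up min(a_k, b_k) over the kets they share."""
    ta, tb = sum(a.values()), sum(b.values())
    if ta == 0 or tb == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return float(sum(min(a[k] / ta, b[k] / tb) for k in shared))


def categorize(items: Dict[str, Superposition], t: float) -> List[List[str]]:
    """Single-linkage grouping (a sketch, not the real categorize): each new
    item absorbs every existing category holding a member with simm >= t."""
    categories: List[List[str]] = []
    for name, sp in items.items():
        merged = [name]
        kept = []
        for cat in categories:
            if any(simm(sp, items[m]) >= t for m in cat):
                merged.extend(cat)  # similar enough: fold into the new group
            else:
                kept.append(cat)    # unrelated: leave this category alone
        categories = kept + [merged]
    return categories


# Made-up toy "pixel" superpositions, not the real H-I-pat-rec.sw data.
items = {
    "letter: H": {"p1": 1, "p2": 1, "p3": 1},
    "noisy: H": {"p1": 1, "p2": 1},
    "letter: I": {"p7": 1, "p8": 1, "p9": 1},
    "noisy: I": {"p7": 1, "p8": 1, "p10": 1},
}

for i, cat in enumerate(categorize(items, 0.6)):
    print(f"category-{i}: {' + '.join(sorted(cat))}")
```

With t = 0.6 each noisy letter scores about 0.667 against its clean version, so the two H's and the two I's end up in separate categories. This sketch makes O(n^2) calls to `simm`; it says nothing about the cost of the real implementation.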

Now for a harder example, the webpage superpositions:
sa: load improved-fragment-webpages.sw
sa: categorize[hash-4B,0.7,result]

-- now look at the results:
sa: dump |result>
category-0 |result> => |abc 1> + |abc 2> + |abc 3> + |abc 4> + |abc 5> + |abc 6> + |abc 7> + |abc 8> + |abc 9> + |abc 10> + |abc 11>
category-1 |result> => |adelaidenow 1> + |adelaidenow 2> + |adelaidenow 3> + |adelaidenow 4> + |adelaidenow 5> + |adelaidenow 6> + |adelaidenow 7> + |adelaidenow 8> + |adelaidenow 9> + |adelaidenow 10> + |adelaidenow 11>
category-2 |result> => |slashdot 1> + |slashdot 2> + |slashdot 3> + |slashdot 4> + |slashdot 5> + |slashdot 6> + |slashdot 7> + |slashdot 8> + |slashdot 9> + |slashdot 10> + |slashdot 11>
category-3 |result> => |smh 1> + |smh 2> + |smh 3> + |smh 4> + |smh 5> + |smh 6> + |smh 7> + |smh 8> + |smh 9> + |smh 10> + |smh 11>
category-4 |result> => |wikipedia 1> + |wikipedia 2> + |wikipedia 3> + |wikipedia 4> + |wikipedia 5> + |wikipedia 6> + |wikipedia 7> + |wikipedia 8> + |wikipedia 9> + |wikipedia 10> + |wikipedia 11>
category-5 |result> => |youtube 1> + |youtube 2> + |youtube 3> + |youtube 4> + |youtube 5> + |youtube 6> + |youtube 7> + |youtube 8> + |youtube 9> + |youtube 10> + |youtube 11>

-- now pretty print those results:
sa: websites |*> #=> apply(|_self>,|result>)
sa: table[category,websites] supported-ops |result>
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| category   | websites                                                                                                                                                              |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| category-0 | abc 1, abc 2, abc 3, abc 4, abc 5, abc 6, abc 7, abc 8, abc 9, abc 10, abc 11                                                                                         |
| category-1 | adelaidenow 1, adelaidenow 2, adelaidenow 3, adelaidenow 4, adelaidenow 5, adelaidenow 6, adelaidenow 7, adelaidenow 8, adelaidenow 9, adelaidenow 10, adelaidenow 11 |
| category-2 | slashdot 1, slashdot 2, slashdot 3, slashdot 4, slashdot 5, slashdot 6, slashdot 7, slashdot 8, slashdot 9, slashdot 10, slashdot 11                                  |
| category-3 | smh 1, smh 2, smh 3, smh 4, smh 5, smh 6, smh 7, smh 8, smh 9, smh 10, smh 11                                                                                         |
| category-4 | wikipedia 1, wikipedia 2, wikipedia 3, wikipedia 4, wikipedia 5, wikipedia 6, wikipedia 7, wikipedia 8, wikipedia 9, wikipedia 10, wikipedia 11                       |
| category-5 | youtube 1, youtube 2, youtube 3, youtube 4, youtube 5, youtube 6, youtube 7, youtube 8, youtube 9, youtube 10, youtube 11                                             |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Again, simple, and it worked! Though this example did take 7 minutes, and since categorize is (I think) O(n^3), it becomes impractical for large data sets :(.

Heh, also: the BKO used to make the table is somewhat opaque, even to me!
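Whatever the BKO is doing internally, the output simply lists each category next to the kets stored under it. For comparison, here is a hypothetical plain-Python sketch of the same table layout. It assumes the categories have already been pulled out into an ordinary dict of category name to member labels; `format_table` is an illustrative name, not part of the console.

```python
from typing import Dict, List


def format_table(result: Dict[str, List[str]],
                 headers=("category", "websites")) -> str:
    """Render category -> members as an ASCII table in the style above."""
    rows = [(cat, ", ".join(members)) for cat, members in result.items()]
    # Column widths: widest of the header and every cell in that column.
    w0 = max([len(headers[0])] + [len(r[0]) for r in rows])
    w1 = max([len(headers[1])] + [len(r[1]) for r in rows])
    sep = f"+-{'-' * w0}-+-{'-' * w1}-+"
    lines = [sep, f"| {headers[0]:<{w0}} | {headers[1]:<{w1}} |", sep]
    lines += [f"| {a:<{w0}} | {b:<{w1}} |" for a, b in rows]
    lines.append(sep)
    return "\n".join(lines)


# Hypothetical input: categories already extracted into a plain dict.
result = {
    "category-0": ["abc 1", "abc 2"],
    "category-1": ["smh 1"],
}
print(format_table(result))
```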

And that is it for categorize. On to normed frequency class for the next few posts.

Update: the best value for t in categorize (0.6 and 0.7 above) depends strongly on your data set. I usually make a similarity matrix first and then use it to choose the value. Maybe there is a neater way? Consider the adult wage prediction task from the other day: there you would need t = 0.99998 or so to get good results!
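Here is a minimal sketch of that similarity-matrix step. It again assumes dict superpositions and uses the same hypothetical rescaled-overlap `simm` as a stand-in for the real similarity measure, with made-up toy data. Besides printing the matrix, it lists the distinct off-diagonal values. Picking a t that sits in a clear gap between those values is one plausible heuristic, but it is an assumption here, not part of categorize.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumption: a superposition is modelled as a dict of ket label -> coefficient.
Superposition = Dict[str, float]


def simm(a: Superposition, b: Superposition) -> float:
    """Hypothetical stand-in similarity: rescale both to sum to 1, then add
    up min(a_k, b_k) over shared kets. 1.0 for identical shapes, 0.0 if disjoint."""
    ta, tb = sum(a.values()), sum(b.values())
    if ta == 0 or tb == 0:
        return 0.0
    shared = a.keys() & b.keys()
    return float(sum(min(a[k] / ta, b[k] / tb) for k in shared))


def similarity_matrix(items: Dict[str, Superposition]) -> Dict[Tuple[str, str], float]:
    """simm for every ordered pair of objects, diagonal included."""
    return {(x, y): simm(items[x], items[y]) for x in items for y in items}


def off_diagonal_values(items: Dict[str, Superposition]) -> List[float]:
    """Distinct pairwise similarities, rounded; gaps hint at usable t values."""
    return sorted({round(simm(items[x], items[y]), 3)
                   for x, y in combinations(items, 2)})


# Made-up toy data, not a real .sw file.
items = {
    "letter: H": {"p1": 1, "p2": 1, "p3": 1},
    "noisy: H": {"p1": 1, "p2": 1},
    "letter: I": {"p7": 1, "p8": 1, "p9": 1},
    "noisy: I": {"p7": 1, "p8": 1, "p10": 1},
}

m = similarity_matrix(items)
names = list(items)
for x in names:
    print(f"{x:<10}", " ".join(f"{m[(x, y)]:.3f}" for y in names))
print("distinct values:", off_diagonal_values(items))
```

On this toy data the off-diagonal similarities are only 0.0 and about 0.667, so any t between them separates H from I. Real data sets rarely show such a clean gap, which is why the right t can end up as extreme as the 0.99998 mentioned above.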
