Monday, 14 December 2015

a brief look at sum of prime factors

OK. A fun one today. Well, I think it is fun; a mathematician would call it trivial. Sum of prime factors. Way back I called these things strange-integers, and they served as an early test of my code. Simply enough, define:
strange-int[x] = the sum of the prime factors of x, counted with multiplicity.

Say r has prime factorisation:
r = p1^n1 * p2^n2 * p3^n3 * p4^n4 * ...
Then strange-int[r] = p1*n1 + p2*n2 + p3*n3 + p4*n4 + ...
It has these properties:
strange-int[p] = p if p is prime, or p = 4
strange-int[p] < p otherwise
This has the effect that we can form a tree of integers (for the impatient, here is an image of that tree), where each application of strange-int[] moves one step closer to a final prime (or to 4, the one composite that maps to itself).
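To make the definition concrete, here is a minimal Python sketch. It is not the strange-int operator used in the sa: console, just an illustration that assumes plain trial division; the name strange_int is made up for this example.

```python
def strange_int(n: int) -> int:
    """Sum of the prime factors of n, counted with multiplicity."""
    total = 0
    p = 2
    while p * p <= n:
        while n % p == 0:   # strip out every copy of the factor p
            total += p
            n //= p
        p += 1
    if n > 1:               # whatever is left over is itself a prime factor
        total += n
    return total


for n in (4, 12, 20):
    print(n, strange_int(n))
# prints:
# 4 4
# 12 7
# 20 9
```

For example 12 = 2^2 * 3, so strange_int(12) = 2*2 + 3*1 = 7, matching the formula above.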

Here are the values for 2 to 20:
-- we need this due to a quirk of my code. 
-- where the strange-int on the right hand side is a function-operator in my functions code.
sa: strange-int |*> #=> strange-int |_self>
sa: table[number,strange-int] range(|number: 2>,|number: 20>)
+--------+-------------+
| number | strange-int |
+--------+-------------+
| 2      | 2           |
| 3      | 3           |
| 4      | 4           |
| 5      | 5           |
| 6      | 5           |
| 7      | 7           |
| 8      | 6           |
| 9      | 6           |
| 10     | 7           |
| 11     | 11          |
| 12     | 7           |
| 13     | 13          |
| 14     | 9           |
| 15     | 8           |
| 16     | 8           |
| 17     | 17          |
| 18     | 8           |
| 19     | 19          |
| 20     | 9           |
+--------+-------------+
Next we can take a look using the strange-int-list operator. This maps an integer to the list of integers visited on the way down to the final prime. Here are the lists for 2 to 30:
sa: strange-int-list |*> #=> strange-int-list |_self>
sa: how-long-is-strange-int-list |*> #=> how-many strange-int-list |_self>
sa: table[number,strange-int,strange-int-list,how-long-is-strange-int-list] range(|number: 2>,|number: 30>)
+--------+-------------+------------------+------------------------------+
| number | strange-int | strange-int-list | how-long-is-strange-int-list |
+--------+-------------+------------------+------------------------------+
| 2      | 2           | 2                | 1                            |
| 3      | 3           | 3                | 1                            |
| 4      | 4           | 4                | 1                            |
| 5      | 5           | 5                | 1                            |
| 6      | 5           | 6, 5             | 2                            |
| 7      | 7           | 7                | 1                            |
| 8      | 6           | 8, 6, 5          | 3                            |
| 9      | 6           | 9, 6, 5          | 3                            |
| 10     | 7           | 10, 7            | 2                            |
| 11     | 11          | 11               | 1                            |
| 12     | 7           | 12, 7            | 2                            |
| 13     | 13          | 13               | 1                            |
| 14     | 9           | 14, 9, 6, 5      | 4                            |
| 15     | 8           | 15, 8, 6, 5      | 4                            |
| 16     | 8           | 16, 8, 6, 5      | 4                            |
| 17     | 17          | 17               | 1                            |
| 18     | 8           | 18, 8, 6, 5      | 4                            |
| 19     | 19          | 19               | 1                            |
| 20     | 9           | 20, 9, 6, 5      | 4                            |
| 21     | 10          | 21, 10, 7        | 3                            |
| 22     | 13          | 22, 13           | 2                            |
| 23     | 23          | 23               | 1                            |
| 24     | 9           | 24, 9, 6, 5      | 4                            |
| 25     | 10          | 25, 10, 7        | 3                            |
| 26     | 15          | 26, 15, 8, 6, 5  | 5                            |
| 27     | 9           | 27, 9, 6, 5      | 4                            |
| 28     | 11          | 28, 11           | 2                            |
| 29     | 29          | 29               | 1                            |
| 30     | 10          | 30, 10, 7        | 3                            |
+--------+-------------+------------------+------------------------------+
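The chains in the table above can be reproduced with a short Python sketch. This only illustrates the idea and is not the actual strange-int-list operator; the function names are hypothetical, and it assumes inputs of 2 or more.

```python
def strange_int(n: int) -> int:
    """Sum of the prime factors of n, counted with multiplicity."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + n if n > 1 else total


def strange_int_list(n: int) -> list:
    """Apply strange_int repeatedly until the value stops changing."""
    chain = [n]
    while True:
        nxt = strange_int(chain[-1])
        if nxt == chain[-1]:    # fixed point: a prime, or 4
            return chain
        chain.append(nxt)


print(strange_int_list(26))     # [26, 15, 8, 6, 5]
print(len(strange_int_list(26)))  # 5
```

The loop always terminates because strange_int strictly decreases every composite other than 4.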
Now, let's sort by strange-int-list length, and find the top 100 of those in the first 100,000 integers.
sa: strange-int |*> #=> strange-int |_self>
sa: strange-int-list |*> #=> strange-int-list |_self>
sa: length-of-strange-int-list |*> #=> how-many strange-int-list |_self>
sa: table[number,strange-int,strange-int-list,length-of-strange-int-list] select[1,100] reverse sort-by[length-of-strange-int-list] range(|number: 2>,|number: 100000>)
+--------+-------------+---------------------------------------------------------+----------------------------+
| number | strange-int | strange-int-list                                        | length-of-strange-int-list |
+--------+-------------+---------------------------------------------------------+----------------------------+
| 55694  | 27849       | 55694, 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5 | 12                         |
| 27933  | 9314        | 27933, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5  | 12                         |
| 99895  | 19984       | 99895, 19984, 1257, 422, 213, 74, 39, 16, 8, 6, 5       | 11                         |
| 97629  | 4659        | 97629, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5        | 11                         |
| 97533  | 10843       | 97533, 10843, 1556, 393, 134, 69, 26, 15, 8, 6, 5       | 11                         |
| 92915  | 18588       | 92915, 18588, 1556, 393, 134, 69, 26, 15, 8, 6, 5       | 11                         |
| 90177  | 30062       | 90177, 30062, 15033, 5014, 134, 69, 26, 15, 8, 6, 5     | 11                         |
| 86696  | 10843       | 86696, 10843, 1556, 393, 134, 69, 26, 15, 8, 6, 5       | 11                         |
| 86662  | 43333       | 86662, 43333, 2566, 1285, 262, 133, 26, 15, 8, 6, 5     | 11                         |
| 84934  | 42469       | 84934, 42469, 6074, 3039, 1016, 133, 26, 15, 8, 6, 5    | 11                         |
| 83718  | 4659        | 83718, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5        | 11                         |
| 74924  | 18735       | 74924, 18735, 1257, 422, 213, 74, 39, 16, 8, 6, 5       | 11                         |
| 74416  | 4659        | 74416, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5        | 11                         |
| 69765  | 4659        | 69765, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5        | 11                         |
| 68653  | 5294        | 68653, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5        | 11                         |
| 55686  | 9286        | 55686, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5        | 11                         |
| 46405  | 9286        | 46405, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5        | 11                         |
| 46142  | 23073       | 46142, 23073, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5    | 11                         |
| 36398  | 18201       | 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5      | 11                         |
| 30993  | 10334       | 30993, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5     | 11                         |
| 27849  | 9286        | 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5        | 11                         |
| 9314   | 4659        | 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5         | 11                         |
| 99853  | 7694        | 99853, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5           | 10                         |
| 99763  | 1556        | 99763, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 99482  | 49743       | 99482, 49743, 5533, 514, 259, 44, 15, 8, 6, 5           | 10                         |
| 98422  | 49213       | 98422, 49213, 1726, 865, 178, 91, 20, 9, 6, 5           | 10                         |
| 98386  | 49195       | 98386, 49195, 9844, 134, 69, 26, 15, 8, 6, 5            | 10                         |
| 98284  | 24575       | 98284, 24575, 993, 334, 169, 26, 15, 8, 6, 5            | 10                         |
| 98214  | 16374       | 98214, 16374, 2734, 1369, 74, 39, 16, 8, 6, 5           | 10                         |
| 98151  | 32720       | 98151, 32720, 422, 213, 74, 39, 16, 8, 6, 5             | 10                         |
| 97618  | 48811       | 97618, 48811, 393, 134, 69, 26, 15, 8, 6, 5             | 10                         |
| 97209  | 1556        | 97209, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 96523  | 13796       | 96523, 13796, 3453, 1154, 579, 196, 18, 8, 6, 5         | 10                         |
| 96467  | 13788       | 96467, 13788, 393, 134, 69, 26, 15, 8, 6, 5             | 10                         |
| 95962  | 47983       | 95962, 47983, 3704, 469, 74, 39, 16, 8, 6, 5            | 10                         |
| 95562  | 5317        | 95562, 5317, 422, 213, 74, 39, 16, 8, 6, 5              | 10                         |
| 94593  | 31534       | 94593, 31534, 15769, 1226, 615, 49, 14, 9, 6, 5         | 10                         |
| 94469  | 5574        | 94469, 5574, 934, 469, 74, 39, 16, 8, 6, 5              | 10                         |
| 94426  | 1556        | 94426, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 94222  | 47113       | 94222, 47113, 4294, 134, 69, 26, 15, 8, 6, 5            | 10                         |
| 93494  | 46749       | 93494, 46749, 15586, 7795, 1564, 44, 15, 8, 6, 5        | 10                         |
| 93414  | 15574       | 93414, 15574, 614, 309, 106, 55, 16, 8, 6, 5            | 10                         |
| 93382  | 46693       | 93382, 46693, 934, 469, 74, 39, 16, 8, 6, 5             | 10                         |
| 92702  | 46353       | 92702, 46353, 15454, 7729, 190, 26, 15, 8, 6, 5         | 10                         |
| 92654  | 46329       | 92654, 46329, 15446, 7725, 116, 33, 14, 9, 6, 5         | 10                         |
| 92244  | 7694        | 92244, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5           | 10                         |
| 92156  | 23043       | 92156, 23043, 7684, 134, 69, 26, 15, 8, 6, 5            | 10                         |
| 91445  | 18294       | 91445, 18294, 3054, 514, 259, 44, 15, 8, 6, 5           | 10                         |
| 91398  | 15238       | 91398, 15238, 422, 213, 74, 39, 16, 8, 6, 5             | 10                         |
| 91285  | 18262       | 91285, 18262, 422, 213, 74, 39, 16, 8, 6, 5             | 10                         |
| 91213  | 1774        | 91213, 1774, 889, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 91198  | 45601       | 91198, 45601, 1502, 753, 254, 129, 46, 25, 10, 7        | 10                         |
| 89974  | 44989       | 89974, 44989, 6434, 3219, 69, 26, 15, 8, 6, 5           | 10                         |
| 89702  | 44853       | 89702, 44853, 14954, 7479, 286, 26, 15, 8, 6, 5         | 10                         |
| 89222  | 6382        | 89222, 6382, 3193, 134, 69, 26, 15, 8, 6, 5             | 10                         |
| 89038  | 44521       | 89038, 44521, 422, 213, 74, 39, 16, 8, 6, 5             | 10                         |
| 88870  | 8894        | 88870, 8894, 4449, 1486, 745, 154, 20, 9, 6, 5          | 10                         |
| 87985  | 17602       | 87985, 17602, 692, 177, 62, 33, 14, 9, 6, 5             | 10                         |
| 87938  | 43971       | 87938, 43971, 14660, 742, 62, 33, 14, 9, 6, 5           | 10                         |
| 87554  | 43779       | 87554, 43779, 14596, 134, 69, 26, 15, 8, 6, 5           | 10                         |
| 87222  | 14542       | 87222, 14542, 674, 339, 116, 33, 14, 9, 6, 5            | 10                         |
| 86934  | 14494       | 86934, 14494, 7249, 670, 74, 39, 16, 8, 6, 5            | 10                         |
| 86408  | 1556        | 86408, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 85983  | 28664       | 85983, 28664, 3589, 134, 69, 26, 15, 8, 6, 5            | 10                         |
| 85686  | 14286       | 85686, 14286, 2386, 1195, 244, 65, 18, 8, 6, 5          | 10                         |
| 85497  | 28502       | 85497, 28502, 14253, 4754, 2379, 77, 18, 8, 6, 5        | 10                         |
| 85317  | 28442       | 85317, 28442, 14223, 445, 94, 49, 14, 9, 6, 5           | 10                         |
| 85299  | 28436       | 85299, 28436, 7113, 2374, 1189, 70, 14, 9, 6, 5         | 10                         |
| 84944  | 5317        | 84944, 5317, 422, 213, 74, 39, 16, 8, 6, 5              | 10                         |
| 84939  | 1257        | 84939, 1257, 422, 213, 74, 39, 16, 8, 6, 5              | 10                         |
| 84684  | 7064        | 84684, 7064, 889, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 83686  | 41845       | 83686, 41845, 8374, 134, 69, 26, 15, 8, 6, 5            | 10                         |
| 83466  | 4645        | 83466, 4645, 934, 469, 74, 39, 16, 8, 6, 5              | 10                         |
| 83374  | 41689       | 83374, 41689, 934, 469, 74, 39, 16, 8, 6, 5             | 10                         |
| 83282  | 41643       | 83282, 41643, 674, 339, 116, 33, 14, 9, 6, 5            | 10                         |
| 82556  | 20643       | 82556, 20643, 993, 334, 169, 26, 15, 8, 6, 5            | 10                         |
| 81845  | 16374       | 81845, 16374, 2734, 1369, 74, 39, 16, 8, 6, 5           | 10                         |
| 80381  | 11490       | 80381, 11490, 393, 134, 69, 26, 15, 8, 6, 5             | 10                         |
| 80269  | 11474       | 80269, 11474, 5739, 1916, 483, 33, 14, 9, 6, 5          | 10                         |
| 79635  | 5317        | 79635, 5317, 422, 213, 74, 39, 16, 8, 6, 5              | 10                         |
| 79302  | 13222       | 79302, 13222, 614, 309, 106, 55, 16, 8, 6, 5            | 10                         |
| 79146  | 4405        | 79146, 4405, 886, 445, 94, 49, 14, 9, 6, 5              | 10                         |
| 78747  | 26252       | 78747, 26252, 6567, 213, 74, 39, 16, 8, 6, 5            | 10                         |
| 78609  | 26206       | 78609, 26206, 13105, 2626, 116, 33, 14, 9, 6, 5         | 10                         |
| 78502  | 39253       | 78502, 39253, 2326, 1165, 238, 26, 15, 8, 6, 5          | 10                         |
| 78362  | 39183       | 78362, 39183, 393, 134, 69, 26, 15, 8, 6, 5             | 10                         |
| 78038  | 39021       | 78038, 39021, 13010, 1308, 116, 33, 14, 9, 6, 5         | 10                         |
| 77845  | 15574       | 77845, 15574, 614, 309, 106, 55, 16, 8, 6, 5            | 10                         |
| 77709  | 25906       | 77709, 25906, 12955, 2596, 74, 39, 16, 8, 6, 5          | 10                         |
| 77642  | 38823       | 77642, 38823, 12944, 817, 62, 33, 14, 9, 6, 5           | 10                         |
| 77396  | 1774        | 77396, 1774, 889, 134, 69, 26, 15, 8, 6, 5              | 10                         |
| 77391  | 8605        | 77391, 8605, 1726, 865, 178, 91, 20, 9, 6, 5            | 10                         |
| 77289  | 25766       | 77289, 25766, 1006, 505, 106, 55, 16, 8, 6, 5           | 10                         |
| 77109  | 25706       | 77109, 25706, 12855, 865, 178, 91, 20, 9, 6, 5          | 10                         |
| 76870  | 7694        | 76870, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5           | 10                         |
| 76363  | 10916       | 76363, 10916, 2733, 914, 459, 26, 15, 8, 6, 5           | 10                         |
| 76338  | 4249        | 76338, 4249, 614, 309, 106, 55, 16, 8, 6, 5             | 10                         |
| 76165  | 15238       | 76165, 15238, 422, 213, 74, 39, 16, 8, 6, 5             | 10                         |
| 75722  | 37863       | 75722, 37863, 614, 309, 106, 55, 16, 8, 6, 5            | 10                         |
| 74192  | 4645        | 74192, 4645, 934, 469, 74, 39, 16, 8, 6, 5              | 10                         |
+--------+-------------+---------------------------------------------------------+----------------------------+
So it seems, in the first 100,000 integers, 55,694 and 27,933 are in some sense "the least prime": their lists have 12 entries, i.e. 11 applications of strange-int are needed to reach a destination prime. And 91,198 is the only one in the top 100 that has 7 as the final prime instead of 5.
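The "sort by length and keep the top few" step can be sketched in Python as follows. This is an illustration under assumptions, not the sa: pipeline: it uses heapq.nlargest instead of sorting the full range, and the names chain_length and longest_chains are made up here.

```python
import heapq


def strange_int(n: int) -> int:
    """Sum of the prime factors of n, counted with multiplicity."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + n if n > 1 else total


def chain_length(n: int) -> int:
    """Number of entries in the chain n, strange_int(n), ... down to the fixed point."""
    length = 1
    while True:
        nxt = strange_int(n)
        if nxt == n:
            return length
        n, length = nxt, length + 1


def longest_chains(limit: int, top: int = 5) -> list:
    """The `top` largest (length, number) pairs for 2 <= number <= limit."""
    pairs = ((chain_length(n), n) for n in range(2, limit + 1))
    return heapq.nlargest(top, pairs)


print(longest_chains(30, top=3))   # [(5, 26), (4, 27), (4, 24)]
```

Ties are broken by the larger number first, which is the same order the table above shows. Raising the limit to 100_000 should reproduce the head of that table, at the cost of a few seconds of trial division.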

Now, let's upscale to the first 1,000,000 integers (I tried 10,000,000 but Python seg-faulted at 1.4 million).
sa: strange-int-list |*> #=> strange-int-list |_self>
sa: length-of-strange-int-list |*> #=> how-many strange-int-list |_self>
sa: table[number,strange-int-list,length-of-strange-int-list] select[1,100] reverse sort-by[length-of-strange-int-list] range(|number: 2>,|number: 1000000>)
+--------+-------------------------------------------------------------------------+----------------------------+
| number | strange-int-list                                                        | length-of-strange-int-list |
+--------+-------------------------------------------------------------------------+----------------------------+
| 334142 | 334142, 167073, 55694, 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5 | 14                         |
| 921327 | 921327, 27933, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5          | 13                         |
| 868022 | 868022, 434013, 144674, 72339, 24116, 6033, 2014, 74, 39, 16, 8, 6, 5   | 13                         |
| 823502 | 823502, 411753, 137254, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5       | 13                         |
| 723853 | 723853, 55694, 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5         | 13                         |
| 167073 | 167073, 55694, 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5         | 13                         |
| 999593 | 999593, 142806, 23806, 11905, 2386, 1195, 244, 65, 18, 8, 6, 5          | 12                         |
| 991456 | 991456, 30993, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5             | 12                         |
| 982503 | 982503, 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5              | 12                         |
| 959566 | 959566, 479785, 95962, 47983, 3704, 469, 74, 39, 16, 8, 6, 5            | 12                         |
| 945958 | 945958, 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5              | 12                         |
| 944206 | 944206, 472105, 94426, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 12                         |
| 929490 | 929490, 30993, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5             | 12                         |
| 922660 | 922660, 46142, 23073, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5            | 12                         |
| 912805 | 912805, 182566, 91285, 18262, 422, 213, 74, 39, 16, 8, 6, 5             | 12                         |
| 904603 | 904603, 129236, 32313, 10774, 5389, 334, 169, 26, 15, 8, 6, 5           | 12                         |
| 896734 | 896734, 448369, 15490, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 12                         |
| 892578 | 892578, 148768, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5               | 12                         |
| 892569 | 892569, 297526, 148765, 29758, 14881, 670, 74, 39, 16, 8, 6, 5          | 12                         |
| 888542 | 888542, 444273, 148094, 74049, 24686, 12345, 831, 280, 18, 8, 6, 5      | 12                         |
| 887613 | 887613, 295874, 147939, 4497, 1502, 753, 254, 129, 46, 25, 10, 7        | 12                         |
| 882202 | 882202, 441103, 33944, 4249, 614, 309, 106, 55, 16, 8, 6, 5             | 12                         |
| 873336 | 873336, 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5              | 12                         |
| 866890 | 866890, 86696, 10843, 1556, 393, 134, 69, 26, 15, 8, 6, 5               | 12                         |
| 864627 | 864627, 288212, 72057, 24022, 12013, 334, 169, 26, 15, 8, 6, 5          | 12                         |
| 837277 | 837277, 119618, 59811, 19940, 1006, 505, 106, 55, 16, 8, 6, 5           | 12                         |
| 780437 | 780437, 111498, 18588, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 12                         |
| 774575 | 774575, 30993, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5             | 12                         |
| 745717 | 745717, 106538, 53271, 1982, 993, 334, 169, 26, 15, 8, 6, 5             | 12                         |
| 743815 | 743815, 148768, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5               | 12                         |
| 735006 | 735006, 122506, 61255, 12256, 393, 134, 69, 26, 15, 8, 6, 5             | 12                         |
| 727780 | 727780, 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5              | 12                         |
| 723749 | 723749, 55686, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                | 12                         |
| 719214 | 719214, 119874, 19984, 1257, 422, 213, 74, 39, 16, 8, 6, 5              | 12                         |
| 697062 | 697062, 116182, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5               | 12                         |
| 688958 | 688958, 344481, 114830, 11490, 393, 134, 69, 26, 15, 8, 6, 5            | 12                         |
| 674593 | 674593, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 668958 | 668958, 111498, 18588, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 12                         |
| 650643 | 650643, 30993, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5             | 12                         |
| 645862 | 645862, 46142, 23073, 7694, 3849, 1286, 645, 51, 20, 9, 6, 5            | 12                         |
| 638782 | 638782, 319393, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5            | 12                         |
| 638422 | 638422, 319213, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5               | 12                         |
| 631924 | 631924, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 613628 | 613628, 153411, 51140, 2566, 1285, 262, 133, 26, 15, 8, 6, 5            | 12                         |
| 612505 | 612505, 122506, 61255, 12256, 393, 134, 69, 26, 15, 8, 6, 5             | 12                         |
| 609218 | 609218, 304611, 101540, 5086, 2545, 514, 259, 44, 15, 8, 6, 5           | 12                         |
| 606823 | 606823, 86696, 10843, 1556, 393, 134, 69, 26, 15, 8, 6, 5               | 12                         |
| 601113 | 601113, 200374, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5               | 12                         |
| 599345 | 599345, 119874, 19984, 1257, 422, 213, 74, 39, 16, 8, 6, 5              | 12                         |
| 592684 | 592684, 148175, 5937, 1982, 993, 334, 169, 26, 15, 8, 6, 5              | 12                         |
| 587427 | 587427, 195812, 48957, 16322, 8163, 913, 94, 49, 14, 9, 6, 5            | 12                         |
| 580885 | 580885, 116182, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5               | 12                         |
| 575422 | 575422, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 568983 | 568983, 189664, 5937, 1982, 993, 334, 169, 26, 15, 8, 6, 5              | 12                         |
| 568196 | 568196, 142053, 47354, 23679, 886, 445, 94, 49, 14, 9, 6, 5             | 12                         |
| 557465 | 557465, 111498, 18588, 1556, 393, 134, 69, 26, 15, 8, 6, 5              | 12                         |
| 538414 | 538414, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 533198 | 533198, 266601, 88870, 8894, 4449, 1486, 745, 154, 20, 9, 6, 5          | 12                         |
| 509446 | 509446, 36398, 18201, 6070, 614, 309, 106, 55, 16, 8, 6, 5              | 12                         |
| 496966 | 496966, 248485, 49702, 24853, 886, 445, 94, 49, 14, 9, 6, 5             | 12                         |
| 471633 | 471633, 157214, 78609, 26206, 13105, 2626, 116, 33, 14, 9, 6, 5         | 12                         |
| 458187 | 458187, 152732, 38187, 4249, 614, 309, 106, 55, 16, 8, 6, 5             | 12                         |
| 446466 | 446466, 74416, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                | 12                         |
| 434013 | 434013, 144674, 72339, 24116, 6033, 2014, 74, 39, 16, 8, 6, 5           | 12                         |
| 424113 | 424113, 141374, 70689, 23566, 11785, 2362, 1183, 33, 14, 9, 6, 5        | 12                         |
| 417591 | 417591, 46405, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                | 12                         |
| 411753 | 411753, 137254, 5294, 2649, 886, 445, 94, 49, 14, 9, 6, 5               | 12                         |
| 401614 | 401614, 200809, 28694, 14349, 4786, 2395, 484, 26, 15, 8, 6, 5          | 12                         |
| 390914 | 390914, 195459, 5937, 1982, 993, 334, 169, 26, 15, 8, 6, 5              | 12                         |
| 372055 | 372055, 74416, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                | 12                         |
| 371192 | 371192, 46405, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                | 12                         |
| 367509 | 367509, 122506, 61255, 12256, 393, 134, 69, 26, 15, 8, 6, 5             | 12                         |
| 360692 | 360692, 90177, 30062, 15033, 5014, 134, 69, 26, 15, 8, 6, 5             | 12                         |
| 353134 | 353134, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 343249 | 343249, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 334086 | 334086, 55686, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                | 12                         |
| 306818 | 306818, 153411, 51140, 2566, 1285, 262, 133, 26, 15, 8, 6, 5            | 12                         |
| 287773 | 287773, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                 | 12                         |
| 279044 | 279044, 69765, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                | 12                         |
| 278405 | 278405, 55686, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                | 12                         |
| 268342 | 268342, 134173, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5            | 12                         |
| 260079 | 260079, 86696, 10843, 1556, 393, 134, 69, 26, 15, 8, 6, 5               | 12                         |
| 223239 | 223239, 74416, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                | 12                         |
| 139526 | 139526, 69765, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                | 12                         |
| 55694  | 55694, 27849, 9286, 4645, 934, 469, 74, 39, 16, 8, 6, 5                 | 12                         |
| 27933  | 27933, 9314, 4659, 1556, 393, 134, 69, 26, 15, 8, 6, 5                  | 12                         |
| 999242 | 999242, 499623, 166544, 1502, 753, 254, 129, 46, 25, 10, 7              | 11                         |
| 998454 | 998454, 166414, 83209, 11894, 334, 169, 26, 15, 8, 6, 5                 | 11                         |
| 995216 | 995216, 62209, 8894, 4449, 1486, 745, 154, 20, 9, 6, 5                  | 11                         |
| 994966 | 994966, 71078, 5086, 2545, 514, 259, 44, 15, 8, 6, 5                    | 11                         |
| 994822 | 994822, 497413, 71066, 35535, 134, 69, 26, 15, 8, 6, 5                  | 11                         |
| 994094 | 994094, 497049, 23679, 886, 445, 94, 49, 14, 9, 6, 5                    | 11                         |
| 993838 | 993838, 496921, 6070, 614, 309, 106, 55, 16, 8, 6, 5                    | 11                         |
| 993189 | 993189, 331066, 165535, 33112, 4145, 834, 144, 14, 9, 6, 5              | 11                         |
| 992778 | 992778, 165468, 13796, 3453, 1154, 579, 196, 18, 8, 6, 5                | 11                         |
| 992202 | 992202, 165372, 13788, 393, 134, 69, 26, 15, 8, 6, 5                    | 11                         |
| 990816 | 990816, 10334, 5169, 1726, 865, 178, 91, 20, 9, 6, 5                    | 11                         |
| 990588 | 990588, 82556, 20643, 993, 334, 169, 26, 15, 8, 6, 5                    | 11                         |
| 989593 | 989593, 89974, 44989, 6434, 3219, 69, 26, 15, 8, 6, 5                   | 11                         |
| 989513 | 989513, 141366, 23566, 11785, 2362, 1183, 33, 14, 9, 6, 5               | 11                         |
+--------+-------------------------------------------------------------------------+----------------------------+
I guess one observation to make here is that length-of-strange-int-list grows rather slowly: going from 100,000 to 1,000,000 integers only raises the maximum from 12 to 14. And 334,142 is in some sense the least prime of the first million.
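One way to look at that slow growth without building every list is to compute only the lengths, using a smallest-prime-factor sieve. The sketch below is an assumption about how one might do it in plain Python, not how the sa: console works; chain_lengths and record_setters are hypothetical names.

```python
def chain_lengths(limit: int) -> list:
    """length[n] = number of entries in the strange-int chain of n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))              # spf[n] = smallest prime factor of n
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    sopfr = [0] * (limit + 1)                 # sum of prime factors with multiplicity
    length = [0] * (limit + 1)
    for n in range(2, limit + 1):
        sopfr[n] = spf[n] + sopfr[n // spf[n]]
        # sopfr[n] <= n, so the length of the next link is already known
        length[n] = 1 if sopfr[n] == n else 1 + length[sopfr[n]]
    return length


def record_setters(limit: int) -> list:
    """(number, length) pairs where the chain length reaches a new maximum."""
    records, best = [], 0
    for n, ln in enumerate(chain_lengths(limit)):
        if ln > best:
            best = ln
            records.append((n, ln))
    return records


print(record_setters(30))   # [(2, 1), (6, 2), (8, 3), (14, 4), (26, 5)]
```

Each length is filled in from a smaller, already computed entry, so the whole range costs roughly one sieve pass plus one linear pass.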

Plenty more ways to slice and dice strange-ints, e.g. count how many integers map to a given prime. Here is a simple example, making use of strange-int-prime, which maps each number to its final prime:
-- show strange-int-prime in use:
sa: strange-int-prime |number: 55694>
|number: 5>
sa: strange-int-prime |number: 91198>
|number: 7>

-- now count the strange-int-primes:
sa: table[number,coeff] strange-int-prime range(|number: 2>,|number: 20>)
+--------+-------+
| number | coeff |
+--------+-------+
| 2      | 1     |
| 3      | 1     |
| 4      | 1     |
| 5      | 9     |
| 7      | 3     |
| 11     | 1     |
| 13     | 1     |
| 17     | 1     |
| 19     | 1     |
+--------+-------+
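The same count can be sketched in Python with collections.Counter. Again this is only an illustration, not the sa: implementation, and the function names are invented for the example. Note that 4 shows up as its own endpoint even though it is not prime.

```python
from collections import Counter


def strange_int(n: int) -> int:
    """Sum of the prime factors of n, counted with multiplicity."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + n if n > 1 else total


def strange_int_prime(n: int) -> int:
    """Follow strange_int from n until it reaches a fixed point (a prime, or 4)."""
    while True:
        nxt = strange_int(n)
        if nxt == n:
            return n
        n = nxt


counts = Counter(strange_int_prime(n) for n in range(2, 21))
for value, count in sorted(counts.items()):
    print(value, count)
# prints:
# 2 1
# 3 1
# 4 1
# 5 9
# 7 3
# 11 1
# 13 1
# 17 1
# 19 1
```

Calling counts.most_common() gives the entries sorted by count, which plays the role of coeff-sort in the larger runs.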
Now upscale to 100,000 and sort:
sa: table[number,coeff] select[1,100] coeff-sort strange-int-prime range(|number: 2>,|number: 100000>)
+--------+-------+
| number | coeff |
+--------+-------+
| 5      | 27023 |
| 7      | 15753 |
| 13     | 6821  |
| 11     | 6032  |
| 19     | 3837  |
| 17     | 3146  |
| 23     | 2289  |
| 31     | 1497  |
| 29     | 1194  |
| 43     | 1161  |
| 37     | 937   |
| 47     | 792   |
| 41     | 747   |
| 61     | 719   |
| 73     | 658   |
| 53     | 626   |
| 59     | 477   |
| 103    | 463   |
| 109    | 445   |
| 67     | 444   |
| 71     | 444   |
| 83     | 405   |
| 79     | 401   |
| 113    | 354   |
| 89     | 348   |
| 139    | 326   |
| 101    | 325   |
| 107    | 316   |
| 151    | 275   |
| 131    | 247   |
| 97     | 246   |
| 181    | 226   |
| 127    | 218   |
| 137    | 208   |
| 167    | 207   |
| 199    | 207   |
| 149    | 206   |
| 193    | 202   |
| 229    | 190   |
| 173    | 185   |
| 163    | 177   |
| 157    | 175   |
| 197    | 174   |
| 241    | 164   |
| 179    | 157   |
| 191    | 143   |
| 283    | 129   |
| 239    | 127   |
| 271    | 126   |
| 233    | 124   |
| 211    | 121   |
| 257    | 119   |
| 227    | 117   |
| 313    | 116   |
| 281    | 115   |
| 293    | 99    |
| 251    | 97    |
| 263    | 97    |
| 277    | 95    |
| 269    | 94    |
| 317    | 92    |
| 223    | 90    |
| 349    | 86    |
| 383    | 70    |
| 311    | 67    |
| 463    | 66    |
| 307    | 61    |
| 373    | 61    |
| 367    | 60    |
| 421    | 60    |
| 353    | 59    |
| 359    | 58    |
| 401    | 58    |
| 433    | 58    |
| 331    | 56    |
| 523    | 56    |
| 347    | 55    |
| 619    | 55    |
| 397    | 54    |
| 443    | 54    |
| 337    | 53    |
| 389    | 51    |
| 379    | 50    |
| 449    | 50    |
| 467    | 48    |
| 503    | 48    |
| 409    | 46    |
| 457    | 44    |
| 571    | 44    |
| 661    | 44    |
| 431    | 43    |
| 461    | 43    |
| 509    | 42    |
| 601    | 42    |
| 439    | 41    |
| 479    | 41    |
| 617    | 38    |
| 613    | 37    |
| 419    | 35    |
| 491    | 35    |
+--------+-------+
Now, upscale to 500,000:
sa: table[number,coeff] select[1,100] coeff-sort strange-int-prime range(|number: 2>,|number: 500000>)
+--------+--------+
| number | coeff  |
+--------+--------+
| 5      | 133689 |
| 7      | 76847  |
| 13     | 33690  |
| 11     | 29661  |
| 19     | 18500  |
| 17     | 15111  |
| 23     | 11463  |
| 31     | 7730   |
| 29     | 6187   |
| 43     | 5479   |
| 37     | 4444   |
| 47     | 3879   |
| 41     | 3849   |
| 61     | 3503   |
| 73     | 3263   |
| 53     | 3096   |
| 103    | 2308   |
| 67     | 2304   |
| 109    | 2105   |
| 59     | 2097   |
| 71     | 2071   |
| 83     | 1926   |
| 79     | 1925   |
| 113    | 1655   |
| 139    | 1608   |
| 89     | 1607   |
| 107    | 1457   |
| 101    | 1278   |
| 151    | 1155   |
| 97     | 1148   |
| 181    | 1045   |
| 131    | 1022   |
| 199    | 991    |
| 193    | 955    |
| 137    | 903    |
| 229    | 901    |
| 167    | 889    |
| 173    | 852    |
| 157    | 821    |
| 197    | 821    |
| 127    | 795    |
| 163    | 787    |
| 149    | 784    |
| 179    | 771    |
| 241    | 748    |
| 233    | 708    |
| 283    | 686    |
| 191    | 660    |
| 271    | 652    |
| 313    | 650    |
| 281    | 616    |
| 227    | 615    |
| 239    | 591    |
| 263    | 569    |
| 211    | 559    |
| 257    | 558    |
| 251    | 538    |
| 277    | 530    |
| 317    | 516    |
| 269    | 499    |
| 223    | 484    |
| 349    | 467    |
| 293    | 459    |
| 311    | 450    |
| 421    | 412    |
| 463    | 412    |
| 353    | 399    |
| 383    | 397    |
| 359    | 386    |
| 433    | 381    |
| 401    | 375    |
| 307    | 374    |
| 389    | 366    |
| 337    | 359    |
| 373    | 350    |
| 467    | 347    |
| 379    | 346    |
| 619    | 345    |
| 347    | 339    |
| 397    | 330    |
| 331    | 327    |
| 449    | 320    |
| 523    | 316    |
| 367    | 315    |
| 443    | 302    |
| 509    | 298    |
| 461    | 293    |
| 661    | 290    |
| 439    | 288    |
| 419    | 284    |
| 409    | 283    |
| 571    | 272    |
| 601    | 268    |
| 479    | 264    |
| 491    | 263    |
| 431    | 262    |
| 503    | 257    |
| 457    | 256    |
| 643    | 255    |
| 617    | 245    |
+--------+--------+
This raises a question: does this ordering of primes (5,7,13,11,19,17,23,31,29,43,37,...) stabilize as n -> infinity? My first guess was that it just might. After a closer look with more examples, it might and it might not: the first eleven positions agree across the four ranges below, while later positions shuffle. If it does stabilize, it would provide a kind of natural ordering of primes. Whatever use that may be.
n from 2 to 100,000 gives this ordering:
5,7,13,11,19,17,23,31,29,43,37,47,41,61,73,53,59,103,109,67,71,83,79,113,89,139,101,107,151,131,97,...

n from 2 to 500,000 gives this ordering:
5,7,13,11,19,17,23,31,29,43,37,47,41,61,73,53,103,67,109,59,71,83,79,113,139,89,107,101,151,97,181,... 

n from 500,000 to 1,000,000 gives this ordering:
5,7,13,11,19,17,23,31,29,43,37,41,47,61,73,53,103,71,109,67,59,83,79,113,139,89,107,101,151,97,181,...

n from 1,000,000 to 1,500,000 gives this ordering:
5,7,13,11,19,17,23,31,29,43,37,47,41,61,73,53,59,103,67,71,109,83,79,113,89,107,139,101,151,181,97,...
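For anyone without the console handy, the counts behind these orderings can be approximated with a short Python sketch. It assumes strange-int-prime means "apply strange-int until the value stops changing", so 4 stays at 4 and is dropped from the ordering. The function names are illustrative, not the console's operators, and ties are broken by the smaller prime, which may not match coeff-sort.

```python
from collections import Counter

def strange_int(n):
    """Sum of the prime factors of n, counted with multiplicity."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    if n > 1:  # whatever is left over is itself a prime factor
        total += n
    return total

def strange_int_prime(n):
    """Apply strange_int until a fixed point (a prime, or 4) is reached."""
    while True:
        nxt = strange_int(n)
        if nxt == n:
            return n
        n = nxt

def prime_ordering(lo, hi, top=10):
    """Final primes for lo..hi, most frequent first (ties: smaller prime first)."""
    counts = Counter(strange_int_prime(n) for n in range(lo, hi + 1))
    counts.pop(4, None)  # assumption: 4 is a fixed point but not a prime, so skip it
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

print(prime_ordering(2, 1000))
```

Trial division is slow for ranges in the millions, so this is only meant for poking at small ranges.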
Next, we can use strange-int-prime to define an equivalence class:
x is in [p] if strange-int-prime[x] = p
Now, in BKO:
-- find strange-int-primes:
sa: strange-int-prime-op |*> #=> strange-int-prime |_self>
sa: map[strange-int-prime-op,strange-int-prime] range(|number: 2>,|number: 1000>)

-- find inverse-strange-int-primes:
sa: find-inverse[strange-int-prime]

-- find primes:
sa: is-prime-op |*> #=> is-prime |_self>
sa: map[is-prime-op,is-prime] range(|number: 2>,|number: 1000>)

-- find inverse-primes:
sa: find-inverse[is-prime]

-- for display reasons, just want the first 25 inverse-strange-int-primes:
sa: inverse-strange-int-prime-list |*> #=> select[1,25] inverse-strange-int-prime |_self>

-- now our pretty table:
sa: table[prime,inverse-strange-int-prime-list] select[1,75] inverse-is-prime |yes>
+-------+---------------------------------------------------------------------------------------------------------------------------+
| prime | inverse-strange-int-prime-list                                                                                            |
+-------+---------------------------------------------------------------------------------------------------------------------------+
| 2     | 2                                                                                                                         |
| 3     | 3                                                                                                                         |
| 5     | 5, 6, 8, 9, 14, 15, 16, 18, 20, 24, 26, 27, 33, 39, 44, 49, 51, 55, 62, 65, 66, 69, 70, 74, 77                            |
| 7     | 7, 10, 12, 21, 25, 30, 32, 35, 36, 38, 42, 46, 50, 60, 64, 68, 72, 81, 87, 124, 129, 141, 152, 155, 158                   |
| 11    | 11, 28, 40, 45, 48, 54, 86, 111, 115, 138, 164, 187, 215, 218, 226, 249, 258, 266, 287, 319, 329, 338, 380, 391, 407      |
| 13    | 13, 22, 56, 57, 63, 75, 80, 85, 90, 96, 102, 108, 121, 122, 146, 159, 166, 182, 212, 236, 260, 284, 308, 312, 314         |
| 17    | 17, 52, 88, 99, 147, 175, 194, 210, 224, 235, 250, 252, 282, 300, 320, 346, 360, 384, 405, 415, 432, 451, 466, 486, 498   |
| 19    | 19, 34, 93, 104, 117, 145, 165, 174, 176, 198, 245, 253, 289, 294, 303, 326, 350, 356, 420, 448, 452, 494, 500, 502, 504  |
| 23    | 23, 76, 136, 153, 219, 273, 302, 325, 355, 385, 390, 416, 426, 462, 468, 542, 550, 596, 655, 660, 686, 704, 706, 766, 786 |
| 29    | 29, 184, 207, 399, 475, 507, 543, 570, 595, 608, 684, 714, 715, 794, 847, 850, 858, 895                                   |
| 31    | 31, 58, 265, 318, 345, 368, 414, 517, 526, 561, 665, 697, 798, 833, 841, 845, 950                                         |
| 37    | 37, 248, 279, 435, 464, 522, 554, 759, 866, 867                                                                           |
| 41    | 41, 148, 651, 775, 930, 992                                                                                               |
| 43    | 43, 82, 237, 296, 333, 662, 781, 879, 932, 957                                                                            |
| 47    | 47, 172, 328, 369, 734, 777, 835, 925                                                                                     |
| 53    | 53, 376, 423, 842, 903                                                                                                    |
| 59    | 59, 424, 477                                                                                                              |
| 61    | 61, 118, 565, 678, 795, 848, 954                                                                                          |
| 67    | 67, 488, 549, 885, 944                                                                                                    |
| 71    | 71, 268                                                                                                                   |
| 73    | 73, 142, 417, 536, 603, 685, 822                                                                                          |
| 79    | 79, 584, 657                                                                                                              |
| 83    | 83, 316, 939                                                                                                              |
| 89    | 89, 664, 747                                                                                                              |
| 97    | 97                                                                                                                        |
| 101   | 101, 388                                                                                                                  |
| 103   | 103, 202, 597, 776, 873, 985                                                                                              |
| 107   | 107, 412, 808, 909                                                                                                        |
| 109   | 109, 214, 633, 824, 927                                                                                                   |
| 113   | 113, 436, 856, 963                                                                                                        |
| 127   | 127                                                                                                                       |
| 131   | 131, 508                                                                                                                  |
| 137   | 137                                                                                                                       |
| 139   | 139, 274, 813                                                                                                             |
| 149   | 149                                                                                                                       |
| 151   | 151, 298                                                                                                                  |
| 157   | 157                                                                                                                       |
| 163   | 163                                                                                                                       |
| 167   | 167, 652                                                                                                                  |
| 173   | 173                                                                                                                       |
| 179   | 179                                                                                                                       |
| 181   | 181, 358                                                                                                                  |
| 191   | 191                                                                                                                       |
| 193   | 193, 382                                                                                                                  |
| 197   | 197, 772                                                                                                                  |
| 199   | 199, 394                                                                                                                  |
| 211   | 211                                                                                                                       |
| 223   | 223                                                                                                                       |
| 227   | 227, 892                                                                                                                  |
| 229   | 229, 454                                                                                                                  |
| 233   | 233, 916                                                                                                                  |
| 239   | 239                                                                                                                       |
| 241   | 241, 478                                                                                                                  |
| 251   | 251                                                                                                                       |
| 257   | 257                                                                                                                       |
| 263   | 263                                                                                                                       |
| 269   | 269                                                                                                                       |
| 271   | 271, 538                                                                                                                  |
| 277   | 277                                                                                                                       |
| 281   | 281                                                                                                                       |
| 283   | 283, 562                                                                                                                  |
| 293   | 293                                                                                                                       |
| 307   | 307                                                                                                                       |
| 311   | 311                                                                                                                       |
| 313   | 313, 622                                                                                                                  |
| 317   | 317                                                                                                                       |
| 331   | 331                                                                                                                       |
| 337   | 337                                                                                                                       |
| 347   | 347                                                                                                                       |
| 349   | 349, 694                                                                                                                  |
| 353   | 353                                                                                                                       |
| 359   | 359                                                                                                                       |
| 367   | 367                                                                                                                       |
| 373   | 373                                                                                                                       |
| 379   | 379                                                                                                                       |
+-------+---------------------------------------------------------------------------------------------------------------------------+
The interpretation is that all elements in the second column are members of the strange-int-prime equivalence class of the prime in the first column.
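The same grouping can be sketched in Python. This version uses a smallest-prime-factor sieve rather than the console's find-inverse, and it relies on strange-int[n] <= n so that each class can be looked up from smaller numbers already processed. The function name and the sieve approach are my own choices for illustration.

```python
from collections import defaultdict

def strange_int_classes(limit):
    """Group 2..limit by the fixed point reached under repeated strange-int."""
    # spf[n] = smallest prime factor of n, filled in by a simple sieve
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p

    sopf = [0] * (limit + 1)   # sopf[n] = strange-int of n
    final = [0] * (limit + 1)  # final[n] = fixed point reached from n
    classes = defaultdict(list)
    for n in range(2, limit + 1):
        sopf[n] = spf[n] + sopf[n // spf[n]]
        # sopf[n] <= n, so final[sopf[n]] is already known unless n is a fixed point
        final[n] = n if sopf[n] == n else final[sopf[n]]
        classes[final[n]].append(n)
    return classes

classes = strange_int_classes(1000)
print(classes[5][:25])
```

Note that 4 ends up in a class of its own here, since it is a fixed point without being prime.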

Finally, let's visualize our sum-of-prime-factors tree:
sa: context sum of prime factors
sa: SoPF-op |*> #=> extract-value strange-int merge-labels(|number: > + |_self>)
sa: map[SoPF-op,SoPF] range(|2>,|1000>)
sa: save sum-of-prime-factors.sw
sa: q

$ ./sw2dot-v2.py sw-examples/sum-of-prime-factors.sw
Then open graph-examples/sum-of-prime-factors.dot with graphviz, producing:

NB: the picture is too large to see here, so click on it to see the full, but pretty, tree.

Anyway, this post is not just about strange-ints. It is more about giving some examples of how you can slice and dice knowledge using the knowledge engine.

Friday, 4 December 2015

update on visualizing sw files

I had a bit more of a read of the DOT manual, and found a neater way to write my dot files. So, here is the new code.

Now, a couple of examples:

First, Fibonacci numbers, with some saved examples.
The sw:
|context> => |context: Fibonacci>

fib |0> => |1>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
fib-ratio |*> #=> arithmetic( fib |_self> , |/>, fib n-1 |_self> )

fib |2> => |2>
fib |3> => |3>
fib |4> => |5>
fib |5> => |8>
fib |6> => |13>
fib |7> => |21>
fib |8> => |34>
fib |9> => |55>
fib |10> => |89>
fib |11> => |144>
fib |12> => |233>
fib |13> => |377>
fib |14> => |610>
fib |15> => |987>
fib |16> => |1597>
fib |17> => |2584>
fib |18> => |4181>
fib |19> => |6765>
fib |20> => |10946>
fib |21> => |17711>
fib |22> => |28657>
fib |23> => |46368>
fib |24> => |75025>
fib |25> => |121393>
fib |26> => |196418>
fib |27> => |317811>
fib |28> => |514229>
fib |29> => |832040>
fib |30> => |1346269>
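For comparison, here is a rough Python analogue of the sw above. The general fib rule becomes a recursive function, and the saved fib |n> values play a role loosely similar to a memoization cache. The names are illustrative, and note the indexing follows the sw file (fib 0 = fib 1 = 1).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """fib with the sw file's base cases: fib(0) = fib(1) = 1."""
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

def fib_ratio(n):
    """Ratio of consecutive values, mirroring the fib-ratio rule."""
    return fib(n) / fib(n - 1)

print(fib(30), fib_ratio(30))
```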
And the dot:
digraph g {
"context" -> "Fibonacci"
"0" -> "1" [label="fib",arrowhead=normal]
"1" -> "1" [label="fib",arrowhead=normal]
"*" -> "arithmetic(|_self>,|->,|1>)" [label="n-1",arrowhead=box]
"*" -> "arithmetic(|_self>,|->,|2>)" [label="n-2",arrowhead=box]
"*" -> "arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)" [label="fib",arrowhead=box]
"*" -> "arithmetic( fib |_self> , |/>, fib n-1 |_self> )" [label="fib-ratio",arrowhead=box]
"2" -> "2" [label="fib",arrowhead=normal]
"3" -> "3" [label="fib",arrowhead=normal]
"4" -> "5" [label="fib",arrowhead=normal]
"5" -> "8" [label="fib",arrowhead=normal]
"6" -> "13" [label="fib",arrowhead=normal]
"7" -> "21" [label="fib",arrowhead=normal]
"8" -> "34" [label="fib",arrowhead=normal]
"9" -> "55" [label="fib",arrowhead=normal]
"10" -> "89" [label="fib",arrowhead=normal]
"11" -> "144" [label="fib",arrowhead=normal]
"12" -> "233" [label="fib",arrowhead=normal]
"13" -> "377" [label="fib",arrowhead=normal]
"14" -> "610" [label="fib",arrowhead=normal]
"15" -> "987" [label="fib",arrowhead=normal]
"16" -> "1597" [label="fib",arrowhead=normal]
"17" -> "2584" [label="fib",arrowhead=normal]
"18" -> "4181" [label="fib",arrowhead=normal]
"19" -> "6765" [label="fib",arrowhead=normal]
"20" -> "10946" [label="fib",arrowhead=normal]
"21" -> "17711" [label="fib",arrowhead=normal]
"22" -> "28657" [label="fib",arrowhead=normal]
"23" -> "46368" [label="fib",arrowhead=normal]
"24" -> "75025" [label="fib",arrowhead=normal]
"25" -> "121393" [label="fib",arrowhead=normal]
"26" -> "196418" [label="fib",arrowhead=normal]
"27" -> "317811" [label="fib",arrowhead=normal]
"28" -> "514229" [label="fib",arrowhead=normal]
"29" -> "832040" [label="fib",arrowhead=normal]
"30" -> "1346269" [label="fib",arrowhead=normal]
}
 
And now, some plurals:
|context> => |context: learning plurals>

source |context: learning plurals> => |url: http://www.macmillandictionary.com/thesaurus-category/british/irregular-plurals>

plural |word: *> #=> merge-labels(|_self> + |s>)

plural |word: calf> => |word: calves>
plural |word: child> => |word: children>
plural |word: corpus> => |word: corpora>
plural |word: elf> => |word: elves>
plural |word: foot> => |word: feet>
plural |word: goose> => |word: geese>
plural |word: genus> => |word: genera>
plural |word: half> => |word: halves>
plural |word: hoof> => |word: hooves>
plural |word: index> => |word: indices>
plural |word: knife> => |word: knives>
plural |word: leaf> => |word: leaves>
plural |word: life> => |word: lives>
plural |word: loaf> => |word: loaves>
plural |word: matrix> => |word: matrices>
plural |word: medium> => |word: media>
plural |word: man> => |word: men>
plural |word: mouse> => |word: mice>
plural |word: ovum> => |word: ova>
plural |word: ox> => |word: oxen>
plural |word: penny> => |word: pence>
plural |word: person> => |word: people>
plural |word: quantum> => |word: quanta>
plural |word: radius> => |word: radii>
plural |word: scarf> => |word: scarves>
plural |word: self> => |word: selves>
plural |word: serum> => |word: sera>
plural |word: sheaf> => |word: sheaves>
plural |word: shelf> => |word: shelves>
plural |word: sky> => |word: skies>
plural |word: stratum> => |word: strata>
plural |word: tooth> => |word: teeth>
plural |word: testis> => |word: testes>
plural |word: this> => |word: these>
plural |word: that> => |word: those>
plural |word: wife> => |word: wives>
plural |word: wolf> => |word: wolves>
plural |word: woman> => |word: women>
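In Python terms, this sw is roughly a dictionary of exceptions with a default rule behind it. The sketch below assumes that a specific plural rule takes precedence over the general word: * rule, and only copies a handful of the entries above.

```python
# A few of the irregular plurals from the sw file above.
IRREGULAR = {
    "calf": "calves",
    "child": "children",
    "foot": "feet",
    "mouse": "mice",
    "sky": "skies",
}

def plural(word):
    """Use the specific rule if there is one, otherwise just append 's'."""
    return IRREGULAR.get(word, word + "s")

print(plural("child"), plural("cat"))
```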
The dot:
digraph g {
"context" -> "learning plurals"
"context: learning plurals" -> "url: http://www.macmillandictionary.com/thesaurus-category/british/irregular-plurals" [label="source",arrowhead=normal]
"word: *" -> "merge-labels(|_self> + |s>)" [label="plural",arrowhead=box]
"word: calf" -> "word: calves" [label="plural",arrowhead=normal]
"word: child" -> "word: children" [label="plural",arrowhead=normal]
"word: corpus" -> "word: corpora" [label="plural",arrowhead=normal]
"word: elf" -> "word: elves" [label="plural",arrowhead=normal]
"word: foot" -> "word: feet" [label="plural",arrowhead=normal]
"word: goose" -> "word: geese" [label="plural",arrowhead=normal]
"word: genus" -> "word: genera" [label="plural",arrowhead=normal]
"word: half" -> "word: halves" [label="plural",arrowhead=normal]
"word: hoof" -> "word: hooves" [label="plural",arrowhead=normal]
"word: index" -> "word: indices" [label="plural",arrowhead=normal]
"word: knife" -> "word: knives" [label="plural",arrowhead=normal]
"word: leaf" -> "word: leaves" [label="plural",arrowhead=normal]
"word: life" -> "word: lives" [label="plural",arrowhead=normal]
"word: loaf" -> "word: loaves" [label="plural",arrowhead=normal]
"word: matrix" -> "word: matrices" [label="plural",arrowhead=normal]
"word: medium" -> "word: media" [label="plural",arrowhead=normal]
"word: man" -> "word: men" [label="plural",arrowhead=normal]
"word: mouse" -> "word: mice" [label="plural",arrowhead=normal]
"word: ovum" -> "word: ova" [label="plural",arrowhead=normal]
"word: ox" -> "word: oxen" [label="plural",arrowhead=normal]
"word: penny" -> "word: pence" [label="plural",arrowhead=normal]
"word: person" -> "word: people" [label="plural",arrowhead=normal]
"word: quantum" -> "word: quanta" [label="plural",arrowhead=normal]
"word: radius" -> "word: radii" [label="plural",arrowhead=normal]
"word: scarf" -> "word: scarves" [label="plural",arrowhead=normal]
"word: self" -> "word: selves" [label="plural",arrowhead=normal]
"word: serum" -> "word: sera" [label="plural",arrowhead=normal]
"word: sheaf" -> "word: sheaves" [label="plural",arrowhead=normal]
"word: shelf" -> "word: shelves" [label="plural",arrowhead=normal]
"word: sky" -> "word: skies" [label="plural",arrowhead=normal]
"word: stratum" -> "word: strata" [label="plural",arrowhead=normal]
"word: tooth" -> "word: teeth" [label="plural",arrowhead=normal]
"word: testis" -> "word: testes" [label="plural",arrowhead=normal]
"word: this" -> "word: these" [label="plural",arrowhead=normal]
"word: that" -> "word: those" [label="plural",arrowhead=normal]
"word: wife" -> "word: wives" [label="plural",arrowhead=normal]
"word: wolf" -> "word: wolves" [label="plural",arrowhead=normal]
"word: woman" -> "word: women" [label="plural",arrowhead=normal]
}
And I guess that is about it for this post.

And a brief comment. We could tweak my script still further. So far I haven't made use of this notation:
"some node" -> {"a node" "b node" "c node"}
I've been using the long form:
"some node" -> "a node"
"some node" -> "b node"
"some node" -> "c node"

But I don't know if I can be bothered to do that. I'll think about it.
I guess it shouldn't be too hard. Just use " ".join(...).
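A minimal sketch of that " ".join(...) idea, assuming the edges are already available as (source, target) pairs. The function name is made up for illustration, and it does no escaping of quotes inside labels.

```python
from collections import defaultdict

def grouped_edges(edges):
    """Collapse (source, target) pairs into DOT lines of the form
    "source" -> {"t1" "t2" ...}, keeping first-seen order."""
    targets = defaultdict(list)
    for src, dst in edges:
        targets[src].append(dst)
    lines = []
    for src, dsts in targets.items():
        group = " ".join(f'"{d}"' for d in dsts)
        lines.append(f'"{src}" -> {{{group}}}')
    return lines

pairs = [("some node", "a node"), ("some node", "b node"), ("some node", "c node")]
print("\n".join(grouped_edges(pairs)))
```

One catch: an attribute list such as [label="fib"] after the braces applies to every edge in the group, so the grouping would need to be done per operator rather than per source node.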

Thursday, 3 December 2015

visualizing sw files

Happy to announce I made some major progress in the last couple of days! I discovered a nice tidy graph description language called DOT, and realized that with not much work I could convert my sw files to DOT and then use graphviz to plot the graphs. Suddenly, my math-looking sw files can be made into pretty pictures (at least for smaller sw files; I tried names.sw but graphviz crashed out!).

This means I now have 4 ways to visualize my sw data:
1) matrices
2) tables
3) bar charts
4) these network graphs
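To give a rough idea of what the sw-to-DOT conversion involves, here is a minimal Python sketch. It is not the actual script: it only handles literal "op |ket> => |a> + |b>" rules, ignores coefficients, stored (#=>) rules, and the context line, and the regexes and function name are assumptions made for illustration. Nodes are numbered n0, n1, ... in order of first appearance, with labels emitted at the end.

```python
import re

RULE = re.compile(r'^(\S+) \|([^>]*)> => (.+)$')  # op |source> => right-hand side
KET = re.compile(r'\|([^>]*)>')                   # each |target> on the right

def sw_to_dot(sw_text):
    ids = {}    # ket label -> node id, in order of first appearance
    edges = []

    def node(label):
        return ids.setdefault(label, f"n{len(ids)}")

    for line in sw_text.splitlines():
        m = RULE.match(line.strip())
        if not m:
            continue  # skips blank lines, the context line, and #=> rules
        op, source, rhs = m.groups()
        src = node(source)
        for target in KET.findall(rhs):
            edges.append(f'{src} -> {node(target)} [label="{op}",arrowhead="normal"]')

    labels = []
    for label, nid in ids.items():
        escaped = label.replace('"', '\\"')  # DOT needs inner quotes escaped
        labels.append(f'{nid} [label="{escaped}"]')
    return "\n".join(["digraph g {", *edges, *labels, "}"])

print(sw_to_dot("friends |Fred> => |Jack> + |Harry>\nfriends |Sam> => |Jack>"))
```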

Now, let's give some examples.

Let's start with our earliest example, the www-document flow-chart.
For comparison, here is the sw:
|context> => |context: www proposal>

describes |document: www proposal> => |"Hypertext"> + |A Proposal "Mesh">
refers-to |document: www proposal> => |Comms ACM>
describes |Comms ACM> => |"Hypertext">
includes |"Hypertext"> => |Linked information> + |Hypermedia>
for-example |Linked information> => |Hyper Card> + |ENQUIRE> + |A Proposal "Mesh">
describes |a proposal "mesh"> => |CERN>
unifies |a proposal "mesh"> => |ENQUIRE> + |VAX/NOTES> + |uucp News> + |CERNDOC>
examples |Computer conferencing> => |IBM GroupTalk> + |uucp News> + |VAX/NOTES> + |A Proposal "Mesh">
for-example |Hierarchical systems> => |CERN> + |CERNDOC> + |Vax/Notes> + |uucp News> + |IBM GroupTalk>
includes |CERNDOC> => |document: www proposal>
wrote |person: Tim Berners-Lee> => |document: www proposal>
Here it is in DOT:
$ cat graph-examples/www-proposal.dot
digraph g {
context -> context_name
context_name [label="www proposal"]
n0 -> n1 [label="describes",arrowhead="normal"]
n0 -> n2 [label="describes",arrowhead="normal"]
n0 -> n3 [label="refers-to",arrowhead="normal"]
n3 -> n1 [label="describes",arrowhead="normal"]
n1 -> n4 [label="includes",arrowhead="normal"]
n1 -> n5 [label="includes",arrowhead="normal"]
n4 -> n6 [label="for-example",arrowhead="normal"]
n4 -> n7 [label="for-example",arrowhead="normal"]
n4 -> n2 [label="for-example",arrowhead="normal"]
n8 -> n9 [label="describes",arrowhead="normal"]
n8 -> n7 [label="unifies",arrowhead="normal"]
n8 -> n10 [label="unifies",arrowhead="normal"]
n8 -> n11 [label="unifies",arrowhead="normal"]
n8 -> n12 [label="unifies",arrowhead="normal"]
n13 -> n14 [label="examples",arrowhead="normal"]
n13 -> n11 [label="examples",arrowhead="normal"]
n13 -> n10 [label="examples",arrowhead="normal"]
n13 -> n2 [label="examples",arrowhead="normal"]
n15 -> n9 [label="for-example",arrowhead="normal"]
n15 -> n12 [label="for-example",arrowhead="normal"]
n15 -> n16 [label="for-example",arrowhead="normal"]
n15 -> n11 [label="for-example",arrowhead="normal"]
n15 -> n14 [label="for-example",arrowhead="normal"]
n12 -> n0 [label="includes",arrowhead="normal"]
n17 -> n0 [label="wrote",arrowhead="normal"]
n6 [label="Hyper Card"]
n1 [label="\"Hypertext\""]
n17 [label="person: Tim Berners-Lee"]
n10 [label="VAX/NOTES"]
n13 [label="Computer conferencing"]
n4 [label="Linked information"]
n7 [label="ENQUIRE"]
n2 [label="A Proposal \"Mesh\""]
n3 [label="Comms ACM"]
n5 [label="Hypermedia"]
n8 [label="a proposal \"mesh\""]
n14 [label="IBM GroupTalk"]
n16 [label="Vax/Notes"]
n9 [label="CERN"]
n11 [label="uucp News"]
n12 [label="CERNDOC"]
n15 [label="Hierarchical systems"]
n0 [label="document: www proposal"]
}
Here it is after graphviz does its thing:
Here is another example. First the sw:
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
Here is the DOT:
digraph g {
context -> context_name
context_name [label="friends"]
n0 -> n1 [label="friends",arrowhead="normal"]
n0 -> n2 [label="friends",arrowhead="normal"]
n0 -> n3 [label="friends",arrowhead="normal"]
n0 -> n4 [label="friends",arrowhead="normal"]
n0 -> n5 [label="friends",arrowhead="normal"]
n0 -> n6 [label="friends",arrowhead="normal"]
n0 -> n7 [label="friends",arrowhead="normal"]
n0 -> n8 [label="friends",arrowhead="normal"]
n9 -> n8 [label="friends",arrowhead="normal"]
n9 -> n10 [label="friends",arrowhead="normal"]
n9 -> n7 [label="friends",arrowhead="normal"]
n9 -> n1 [label="friends",arrowhead="normal"]
n9 -> n11 [label="friends",arrowhead="normal"]
n9 -> n12 [label="friends",arrowhead="normal"]
n9 -> n13 [label="friends",arrowhead="normal"]
n6 [label="Patrick"]
n9 [label="Sam"]
n8 [label="Charlie"]
n7 [label="Emma"]
n11 [label="Rober"]
n13 [label="Julie"]
n10 [label="George"]
n0 [label="Fred"]
n5 [label="Rob"]
n2 [label="Harry"]
n1 [label="Jack"]
n4 [label="Mary"]
n12 [label="Frank"]
n3 [label="Ed"]
}
And after graphviz does its thing:
Here is methanol.sw:
|context> => |context: methanol>

molecular-pieces |molecule: methanol> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 4> + |methanol: 5> + |methanol: 6>

atom-type |methanol: 1> => |atom: H>
bonds-to |methanol: 1> => |methanol: 4>

atom-type |methanol: 2> => |atom: H>
bonds-to |methanol: 2> => |methanol: 4>

atom-type |methanol: 3> => |atom: H>
bonds-to |methanol: 3> => |methanol: 4>

atom-type |methanol: 4> => |atom: C>
bonds-to |methanol: 4> => |methanol: 1> + |methanol: 2> + |methanol: 3> + |methanol: 5>

atom-type |methanol: 5> => |atom: O>
bonds-to |methanol: 5> => |methanol: 4> + |methanol: 6>

atom-type |methanol: 6> => |atom: H>
bonds-to |methanol: 6> => |methanol: 5>
Here is methanol.dot:
digraph g {
context -> context_name
context_name [label="methanol"]
n0 -> n1 [label="molecular-pieces",arrowhead="normal"]
n0 -> n2 [label="molecular-pieces",arrowhead="normal"]
n0 -> n3 [label="molecular-pieces",arrowhead="normal"]
n0 -> n4 [label="molecular-pieces",arrowhead="normal"]
n0 -> n5 [label="molecular-pieces",arrowhead="normal"]
n0 -> n6 [label="molecular-pieces",arrowhead="normal"]
n1 -> n7 [label="atom-type",arrowhead="normal"]
n1 -> n4 [label="bonds-to",arrowhead="normal"]
n2 -> n7 [label="atom-type",arrowhead="normal"]
n2 -> n4 [label="bonds-to",arrowhead="normal"]
n3 -> n7 [label="atom-type",arrowhead="normal"]
n3 -> n4 [label="bonds-to",arrowhead="normal"]
n4 -> n8 [label="atom-type",arrowhead="normal"]
n4 -> n1 [label="bonds-to",arrowhead="normal"]
n4 -> n2 [label="bonds-to",arrowhead="normal"]
n4 -> n3 [label="bonds-to",arrowhead="normal"]
n4 -> n5 [label="bonds-to",arrowhead="normal"]
n5 -> n9 [label="atom-type",arrowhead="normal"]
n5 -> n4 [label="bonds-to",arrowhead="normal"]
n5 -> n6 [label="bonds-to",arrowhead="normal"]
n6 -> n7 [label="atom-type",arrowhead="normal"]
n6 -> n5 [label="bonds-to",arrowhead="normal"]
n6 [label="methanol: 6"]
n0 [label="molecule: methanol"]
n2 [label="methanol: 2"]
n4 [label="methanol: 4"]
n3 [label="methanol: 3"]
n9 [label="atom: O"]
n5 [label="methanol: 5"]
n8 [label="atom: C"]
n1 [label="methanol: 1"]
n7 [label="atom: H"]
}
Here is methanol.png:
Here is the adjacency matrix from the movie "Good Will Hunting":
|context> => |context: good will hunting adjacency matrix>

adj |1> => |2> + |4>
adj |2> => |1> + |4> + 2|3>
adj |3> => 2|2>
adj |4> => |1> + |2>
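As a cross-check, the rules above can be turned back into an ordinary matrix with a few lines of Python. This is only a sketch: the regex and function name are my own, it assumes integer coefficients, and row i holds adj |i> (the matrix is symmetric here, so the row/column convention does not matter). Note that the coefficient 2 survives in the matrix, whereas the DOT below draws only a single edge between 2 and 3.

```python
import re

TERM = re.compile(r'(\d*)\|(\d+)>')  # optional integer coefficient, then |n>

def adjacency_matrix(rules, size):
    matrix = [[0] * size for _ in range(size)]
    for line in rules.strip().splitlines():
        lhs, rhs = line.split("=>")
        row = int(TERM.search(lhs).group(2))
        for coeff, col in TERM.findall(rhs):
            matrix[row - 1][int(col) - 1] = int(coeff or 1)
    return matrix

rules = """
adj |1> => |2> + |4>
adj |2> => |1> + |4> + 2|3>
adj |3> => 2|2>
adj |4> => |1> + |2>
"""
for row in adjacency_matrix(rules, 4):
    print(row)
```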
Here is good-will-hunting-adjacency-matrix.dot:
digraph g {
context -> context_name
context_name [label="good will hunting adjacency matrix"]
n0 -> n1 [label="adj",arrowhead="normal"]
n0 -> n2 [label="adj",arrowhead="normal"]
n1 -> n0 [label="adj",arrowhead="normal"]
n1 -> n2 [label="adj",arrowhead="normal"]
n1 -> n3 [label="adj",arrowhead="normal"]
n3 -> n1 [label="adj",arrowhead="normal"]
n2 -> n0 [label="adj",arrowhead="normal"]
n2 -> n1 [label="adj",arrowhead="normal"]
n3 [label="3"]
n0 [label="1"]
n2 [label="4"]
n1 [label="2"]
}
Here is the graph:
Here is a simple network:
O |a1> => |a2>
O |a2> => |a3>
O |a3> => |a4>
O |a4> => |a5>
O |a5> => |a6>
O |a6> => |a7>
O |a7> => |a8>
O |a8> => |a9>
O |a9> => |a10>
O |a10> => |a1> + |b1>

O |b1> => |b2>
O |b2> => |b3>
O |b3> => |b4>
O |b4> => |b5>
O |b5> => |b6>
O |b6> => |b7>
O |b7> => |b1>
Here is the DOT:
digraph g {
n0 -> n1 [label="O",arrowhead="normal"]
n1 -> n2 [label="O",arrowhead="normal"]
n2 -> n3 [label="O",arrowhead="normal"]
n3 -> n4 [label="O",arrowhead="normal"]
n4 -> n5 [label="O",arrowhead="normal"]
n5 -> n6 [label="O",arrowhead="normal"]
n6 -> n7 [label="O",arrowhead="normal"]
n7 -> n8 [label="O",arrowhead="normal"]
n8 -> n9 [label="O",arrowhead="normal"]
n9 -> n0 [label="O",arrowhead="normal"]
n9 -> n10 [label="O",arrowhead="normal"]
n10 -> n11 [label="O",arrowhead="normal"]
n11 -> n12 [label="O",arrowhead="normal"]
n12 -> n13 [label="O",arrowhead="normal"]
n13 -> n14 [label="O",arrowhead="normal"]
n14 -> n15 [label="O",arrowhead="normal"]
n15 -> n16 [label="O",arrowhead="normal"]
n16 -> n10 [label="O",arrowhead="normal"]
n16 [label="b7"]
n3 [label="a4"]
n6 [label="a7"]
n0 [label="a1"]
n13 [label="b4"]
n15 [label="b6"]
n4 [label="a5"]
n11 [label="b2"]
n9 [label="a10"]
n7 [label="a8"]
n5 [label="a6"]
n12 [label="b3"]
n2 [label="a3"]
n10 [label="b1"]
n14 [label="b5"]
n1 [label="a2"]
n8 [label="a9"]
}
Here is the image:
And finally, let's finish with the early US presidents. sw-file, dot-file, and the image:
Anyway, all nice and pretty!
Some more dot files and png files.

Wednesday, 25 November 2015

revisiting wikipedia inverse-links-to semantic similarities

Nothing new here, just some more wikipedia semantic similarity examples. The motivation was partly word2vec. They have some examples of semantic similarity using their word vectors. I had planned to write my own word2sp, but so far my idea has failed! And I couldn't use their word vectors because they have negative coeffs, while my similarity metric requires positive coeffs.

So in the meantime I decided to re-run my wikipedia code. I tried to use the 300,000 wikipedia links sw file, but that failed too. It needed too much RAM and took too long to run. I thought I had used it in the past, in which case I don't know why it failed this time!

Here is the first word2vec example (distance to "france"):
                 Word       Cosine distance
-------------------------------------------
                spain              0.678515
              belgium              0.665923
          netherlands              0.652428
                italy              0.633130
          switzerland              0.622323
           luxembourg              0.610033
             portugal              0.577154
               russia              0.571507
              germany              0.563291
            catalonia              0.534176
Here it is using my code:
sa: load 30k--wikipedia-links.sw
sa: find-inverse[links-to]
sa: T |*> #=> table[page,coeff] select[1,200] 100 self-similar[inverse-links-to] |_self>
sa: T |WP: France>
+--------------------------------+--------+
| page                           | coeff  |
+--------------------------------+--------+
| France                         | 100.0  |
| Germany                        | 31.771 |
| United_Kingdom                 | 30.537 |
| Italy                          | 27.452 |
| Spain                          | 23.566 |
| United_States                  | 20.152 |
| Japan                          | 19.556 |
| Netherlands                    | 19.309 |
| Russia                         | 18.877 |
| Canada                         | 18.384 |
| Europe                         | 17.273 |
| India                          | 17.212 |
| China                          | 16.78  |
| Paris                          | 16.595 |
| England                        | 16.286 |
| World_War_II                   | 15.923 |
| Australia                      | 15.238 |
| Soviet_Union                   | 14.867 |
| Belgium                        | 14.189 |
| Poland                         | 14.127 |
| Portugal                       | 13.819 |
| World_War_I                    | 13.757 |
| Austria                        | 13.695 |
| Sweden                         | 13.572 |
| Switzerland                    | 13.51  |
| Egypt                          | 12.647 |
| European_Union                 | 12.4   |
| Brazil                         | 12.338 |
| United_Nations                 | 12.091 |
| Greece                         | 11.906 |
| London                         | 11.906 |
| Israel                         | 11.783 |
| Turkey                         | 11.783 |
| Denmark                        | 11.598 |
| French_language                | 11.536 |
| Norway                         | 11.413 |
| Latin                          | 10.611 |
| Rome                           | 10.364 |
| Mexico                         | 10.364 |
| English_language               | 9.994  |
| South_Africa                   | 9.747  |
...
which works pretty well I must say.
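To give a feel for why positive coeffs matter, here is a small Python sketch of one similarity measure that only makes sense for non-negative sparse vectors: the sum of the element-wise minimums, divided by the larger of the two totals. Treat it as an assumption, not as the exact code behind the tables in this post. The name simm and the toy page names are for illustration only.

```python
def simm(f, g):
    """Similarity in [0, 1] of two sparse vectors (dicts of non-negative coeffs)."""
    total_f, total_g = sum(f.values()), sum(g.values())
    if total_f == 0 or total_g == 0:
        return 0.0
    # shared "mass": element-wise minimum over the kets the two vectors share
    overlap = sum(min(c, g.get(k, 0)) for k, c in f.items())
    return overlap / max(total_f, total_g)


# toy inverse-links-to vectors: which (hypothetical) pages link to each page
page_x = {"page A": 1, "page B": 1}
page_y = {"page A": 1, "page B": 1, "page C": 1}
print(round(100 * simm(page_x, page_y), 3))  # 66.667
```

With negative coefficients the minimums and totals stop behaving like an overlap, which is the kind of problem that rules out using the word2vec vectors directly.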

Here is the next word2vec example (distance to San Francisco):
                 Word       Cosine distance
-------------------------------------------
          los_angeles              0.666175
          golden_gate              0.571522
              oakland              0.557521
           california              0.554623
            san_diego              0.534939
             pasadena              0.519115
              seattle              0.512098
                taiko              0.507570
              houston              0.499762
     chicago_illinois              0.491598
Here it is using my code:
+---------------------------------------+--------+
| page                                  | coeff  |
+---------------------------------------+--------+
| San_Francisco                         | 100.0  |
| Los_Angeles                           | 16.704 |
| Chicago                               | 15.919 |
| 1924                                  | 15.522 |
| 1916                                  | 14.566 |
| California                            | 14.502 |
| 1915                                  | 14.286 |
| 2014                                  | 14.217 |
| 1933                                  | 14.031 |
| 1913                                  | 14.006 |
| 1918                                  | 14.0   |
| 1930                                  | 14.0   |
| Philadelphia                          | 13.99  |
| 1925                                  | 13.984 |
| 1931                                  | 13.904 |
| 1920                                  | 13.802 |
| 1932                                  | 13.776 |
| 1942                                  | 13.744 |
| 1999                                  | 13.725 |
...
Hrmm... that didn't work so great. I wonder why.

Here is the next word2vec example:
Enter word or sentence (EXIT to break): /en/geoffrey_hinton

                        Word       Cosine distance
--------------------------------------------------
           /en/marvin_minsky              0.457204
             /en/paul_corkum              0.443342
 /en/william_richard_peltier              0.432396
           /en/brenda_milner              0.430886
    /en/john_charles_polanyi              0.419538
          /en/leslie_valiant              0.416399
         /en/hava_siegelmann              0.411895
            /en/hans_moravec              0.406726
         /en/david_rumelhart              0.405275
             /en/godel_prize              0.405176
And here it is using my code:
+------------------------------------------------------------+--------+
| page                                                       | coeff  |
+------------------------------------------------------------+--------+
| Geoffrey_Hinton                                            | 100    |
| perceptron                                                 | 66.667 |
| Tom_M._Mitchell                                            | 66.667 |
| computational_learning_theory                              | 66.667 |
| Nils_Nilsson_(researcher)                                  | 66.667 |
| beam_search                                                | 66.667 |
| Raj_Reddy                                                  | 50     |
| AI_effect                                                  | 40     |
| ant_colony_optimization                                    | 40     |
| List_of_artificial_intelligence_projects                   | 33.333 |
| AI-complete                                                | 33.333 |
| Cyc                                                        | 33.333 |
| Hugo_de_Garis                                              | 33.333 |
| Joyce_K._Reynolds                                          | 33.333 |
| Kleene_closure                                             | 33.333 |
| Mondegreen                                                 | 33.333 |
| Supervised_learning                                        | 33.333 |
...
And now a couple more examples:
sa: T |WP: Linux>
+------------------------------------------------+--------+
| page                                           | coeff  |
+------------------------------------------------+--------+
| Linux                                          | 100.0  |
| Microsoft_Windows                              | 46.629 |
| operating_system                               | 37.333 |
| Unix                                           | 28.956 |
| Mac_OS_X                                       | 26.936 |
| C_(programming_language)                       | 24.242 |
| Microsoft                                      | 22.535 |
| GNU_General_Public_License                     | 22.222 |
| Mac_OS                                         | 19.529 |
| Unix-like                                      | 19.192 |
| IBM                                            | 19.048 |
| open_source                                    | 17.845 |
| FreeBSD                                        | 17.845 |
| Apple_Inc.                                     | 16.498 |
| Java_(programming_language)                    | 15.825 |
| OS_X                                           | 15.488 |
| free_software                                  | 15.488 |
| Sun_Microsystems                               | 15.152 |
| C++                                            | 15.152 |
| source_code                                    | 15.152 |
| Macintosh                                      | 14.815 |
| MS-DOS                                         | 13.468 |
| Solaris_(operating_system)                     | 13.468 |
| PowerPC                                        | 13.131 |
| DOS                                            | 13.131 |
| Android_(operating_system)                     | 13.131 |
| Windows_NT                                     | 12.795 |
| Intel                                          | 12.458 |
| programming_language                           | 12.121 |
| personal_computer                              | 12.121 |
| OpenBSD                                        | 11.785 |
| Unicode                                        | 11.111 |
| graphical_user_interface                       | 10.774 |
| video_game                                     | 10.774 |
| Cross-platform                                 | 10.774 |
| Internet                                       | 10.574 |
| OS/2                                           | 10.438 |
...

sa: T |WP: Ronald_Reagan>
+---------------------------------------------------------+--------+
| page                                                    | coeff  |
+---------------------------------------------------------+--------+
| Ronald_Reagan                                           | 100.0  |
| John_F._Kennedy                                         | 22.951 |
| Bill_Clinton                                            | 22.404 |
| Barack_Obama                                            | 22.283 |
| George_H._W._Bush                                       | 22.131 |
| Jimmy_Carter                                            | 22.131 |
| Richard_Nixon                                           | 22.131 |
| George_W._Bush                                          | 22.131 |
| Republican_Party_(United_States)                        | 21.785 |
| Democratic_Party_(United_States)                        | 20.779 |
| United_States_Senate                                    | 19.444 |
| President_of_the_United_States                          | 17.538 |
| White_House                                             | 15.574 |
| Franklin_D._Roosevelt                                   | 15.301 |
| Vietnam_War                                             | 15.242 |
| United_States_House_of_Representatives                  | 14.754 |
| United_States_Congress                                  | 14.085 |
| Supreme_Court_of_the_United_States                      | 13.388 |
| Lyndon_B._Johnson                                       | 13.388 |
| Margaret_Thatcher                                       | 13.115 |
| Cold_War                                                | 13.093 |
| Dwight_D._Eisenhower                                    | 12.568 |
| Nobel_Peace_Prize                                       | 12.368 |
| The_Washington_Post                                     | 12.295 |
| Gerald_Ford                                             | 12.022 |
...

sa: T |WP: Los_Angeles>
+--------------------------------------------------+--------+
| page                                             | coeff  |
+--------------------------------------------------+--------+
| Los_Angeles                                      | 100.0  |
| Chicago                                          | 20.852 |
| California                                       | 18.789 |
| Los_Angeles_Times                                | 17.833 |
| San_Francisco                                    | 16.704 |
| New_York_City                                    | 15.536 |
| Philadelphia                                     | 14.221 |
| NBC                                              | 12.641 |
| Washington,_D.C.                                 | 11.484 |
| Boston                                           | 11.061 |
| USA_Today                                        | 10.609 |
| Texas                                            | 10.384 |
| Academy_Award                                    | 10.158 |
| Seattle                                          | 9.932  |
| New_York                                         | 9.88   |
| Time_(magazine)                                  | 9.851  |
| Mexico_City                                      | 9.707  |
| The_New_York_Times                               | 9.685  |
| Rolling_Stone                                    | 9.481  |
| CBS                                              | 9.481  |
| Toronto                                          | 9.481  |
...
Now for a final couple of comments. First, my code is painfully slow. But to be fair I'm a terrible programmer and this is research, not production, code. The point I'm trying to make is that a real programmer could probably make the speed acceptable, and hence usable.

For the second point, let me quote from word2vec:
"It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen')"

I haven't tested it, but I'm pretty sure my inverse-links-to sparse vectors do not have this fun property. That is why I want to create my own word2sp code. Though it would take a slightly different form in BKO. Something along the lines of:
vector |some object> => exclude(vector|France>,vector|Paris>) + vector|Italy>
table[object,coeff] select[1,20] 100 self-similar[vector] |some object>
should, if it works, return |Rome> as one of the top similarity matches.
Likewise:
vector |some object> => exclude(vector|man>,vector|king>) + vector|woman>
table[object,coeff] select[1,20] 100 self-similar[vector] |some object>
should return |queen> as a top hit.

Definitely something I would like to test, but I need working code first.

Monday, 23 November 2015

revisiting the letter rambler

After changing the back-end meaning of add_learn, from something that was really append-learn to a literal add_learn (I hope I didn't break anything in the process!), I decided to redo the letter rambler example. Now our ngrams have frequency information too. This has a big effect on the letter rambler, as I will shortly show.

First, a comparison of the resulting sw files:
append-learn version
add-learn (frequency) version
Noting they both use the same code

A couple of lines to visually show the difference (note the coefficients in the second case):
next-2-letters | by> => |  > + | t> + | s> + | a> + | h> + | w> + | n> + | d> + | o> + | q> + | M> + |. > + | e> + | m> + | p> + | b> + | r> + | y> + | H> + | C> + | i> + | c> + |?"> + | f> + | S> + | l> + | F> + | A> + |, > + | u> + | k> + | U>
next-2-letters | by> => |  > + 138.0| t> + 19.0| s> + 42.0| a> + 25.0| h> + 12.0| w> + 7.0| n> + 4.0| d> + 8.0| o> + 2.0| q> + 9.0| M> + |. > + 6.0| e> + 16.0| m> + 7.0| p> + 3.0| b> + 4.0| r> + 6.0| y> + | H> + | C> + 3.0| i> + 7.0| c> + |?"> + 2.0| f> + | S> + | l> + | F> + 3.0| A> + |, > + | u> + | k> + | U>
And now the new letter rambler:
sa: load ngram-letter-pairs--sherlock-holmes--add-learn.sw
sa: letter-ramble |*> #=> merge-labels(|_self> + weighted-pick-elt next-2-letters extract-3-tail-chars |_self>)
sa: letter-ramble^200 |The>
|They wered. Now did not acces of a humable, as shall beeched to belong meet as en bars in part the seriend's ask Mr. As when you. I could no had steps bell, and some forty, Mrs. Readings as so weathe that his is long colour minute mone out this, of the streat axe-Consibly after, whom I claim above use of my my doctorter inquiries. For and pisoda angry, as fulling heave dog-wheer ther obviouse when, >
And we can compare with the old version that had no frequency information by dropping back from weighted-pick-elt to just pick-elt (weighted-pick-elt takes coeffs into account, while pick-elt does not):
sa: old-letter-ramble |*> #=> merge-labels(|_self> + pick-elt next-2-letters extract-3-tail-chars |_self>)
sa: old-letter-ramble^200 |The>
|The ont by, dea wed upset book, anxious him. Totts vanies tangibly ignotum tea cartyrdom Hosmoppine-margins," reporary, curson. Young criminas, apables Mr. Just you:  "Holbornine Aventrally, wore Sand. Royal brary. Warshy mergesty. John Ha! The dug an End on samps onlike a wound--1000 pound ord Suburly or 'G' wish us. Yes?" a "P," Holmask.  "Star' had. I owe, Winchisels whipping-schan legs. Having, I>
So, similar, just a bit less English-like.
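The difference between the two operators is easy to sketch in Python. This is only an illustration of the idea, not the back-end code: the function names mirror the operators, superpositions are plain dicts, and the coefficients are a handful taken from the next-2-letters | by> rule shown earlier.

```python
import random


def pick_elt(sp, rng=random):
    """Pick a ket uniformly at random, ignoring the coefficients."""
    return rng.choice(list(sp))


def weighted_pick_elt(sp, rng=random):
    """Pick a ket with probability proportional to its coefficient."""
    labels = list(sp)
    return rng.choices(labels, weights=[sp[k] for k in labels], k=1)[0]


# a few of the frequency-weighted continuations of " by"
next_2_letters = {"  ": 1.0, " t": 138.0, " s": 19.0, " a": 42.0, " h": 25.0}

print(repr(pick_elt(next_2_letters)))           # any of the five, equally likely
print(repr(weighted_pick_elt(next_2_letters)))  # " t" most of the time
```

So with frequency information the rambler mostly follows the common letter pairs, and without it the rare pairs get picked just as often as the common ones.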

That's all I wanted to show for this post.

Tuesday, 10 November 2015

an interpretation of my BKO scheme

I think it is long past time to try and give an interpretation of my BKO scheme. I guess the simplest description is that it is a quite general representation for networks. Certainly static networks, but also a lot of machinery to manipulate more dynamic networks. The long term aspiration is that in its simplest form the brain is a network, albeit a very, very large and complex one, so perhaps we could use some of the BKO structures to approximate a brain? The question is though, by analogy with human flight, are we trying to build a bird with all its intricate details, or just something that flies using aerodynamic principles and simple non-flapping wings? I think with our BKO scheme we are building a plane.

Now, back to the very basics!
Recall |some text> is a ket. Sure, but what is less clear is that |some text> can be considered a node in a network. So every ket in some sw file corresponds to a node in the network represented by that sw file. In the brain they often map to concepts. For example |hungry>, |emotion: happy>, |url: http://write-up.semantic-db.org/>, |shopping list>, |person: Mary> etc.

And it is known (from epilepsy surgery) that there are specific neurons for say "Bill Clinton" and "Halle Berry". So we can assume our brain sw file will include the kets |person: Bill Clinton> and |person: Halle Berry>.

Further, it is well known there are mirror neurons and place neurons. So for example, watching someone eat stimulates, to a lesser or greater degree, the same neuron as thinking you want to eat, or actually eating. And in rat brains there are specific neurons that correspond to places. Which implies in a brain sw file we will have corresponding kets such as |eating food>, and |cafe near my house>.

But we can also have kets that have not a lot of meaning by themselves. They only acquire meaning in a superposition with other such kets. The best example I have is for web-page superpositions. These being quite close to the SDRs of Jeff Hawkins (Sparse Distributed Representations).

But we diverge from SDRs in that superpositions are not restricted to coefficients in {0,1}. Our superpositions can have arbitrary float coefficients, though almost always positive. The interpretation being that the coefficient of a ket is the strength with which that ket is currently activated. I guess in the brain this probably corresponds to how rapidly a neuron is firing within some integration period. An example being: 0.8|emotion: sad> + 12|hungry> corresponds to "a little sad but very hungry".

Next, we can use this to build static networks.
For example these simple learn rules:
op1 |u> => 7.2|a> + 0.02|b> + 13|c> + |d>
op2 |v> => 5|x> + 0.9|u> + 2.712|z>
Can be interpreted to mean:
op1 links the node |u> to |a> with link strength 7.2
op1 links the node |u> to |b> with link strength 0.02
op1 links the node |u> to |c> with link strength 13
op1 links the node |u> to |d> with link strength 1

op2 links the node |v> to |x> with link strength 5
op2 links the node |v> to |u> with link strength 0.9
op2 links the node |v> to |z> with link strength 2.712
And that should be sufficient to construct arbitrary static networks. For dynamic networks we need more work. For example, currently it is impossible to associate a "propagation time" with the links. Which contrasts with the brain, where it is obvious different pathways/links take different times to travel along. Whether this feature is important in building a "plane" I don't yet know. If it is though, I can imagine a couple of ways to roughly implement it. For example, the proposed swc language, an extension of the current sw file format that includes more familiar programming language structures, could help with this.
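The link interpretation of learn rules can be sketched in a few lines of Python. This is only a toy model: the dict layout and the names links and apply_op are made up for illustration, and apply_op assumes operators act linearly and that a node with no rule for the operator contributes nothing.

```python
# Learn rules stored as (operator, source node) -> {target node: link strength}.
rules = {
    ("op1", "u"): {"a": 7.2, "b": 0.02, "c": 13, "d": 1},
    ("op2", "v"): {"x": 5, "u": 0.9, "z": 2.712},
}


def links(rules):
    """Yield every weighted link (op, source, target, strength) in the network."""
    for (op, source), targets in rules.items():
        for target, strength in targets.items():
            yield op, source, target, strength


def apply_op(op, sp, rules):
    """Apply an operator to a superposition {node: coeff}, assuming linearity."""
    result = {}
    for node, coeff in sp.items():
        # nodes with no rule for this operator are silently dropped
        for target, strength in rules.get((op, node), {}).items():
            result[target] = result.get(target, 0) + coeff * strength
    return result


for op, source, target, strength in links(rules):
    print("%s links |%s> to |%s> with link strength %s" % (op, source, target, strength))

# step from |v> along op2, then along op1
two_steps = apply_op("op1", apply_op("op2", {"v": 1}, rules), rules)
print(two_steps)
```

Stepping along two operators in a row just multiplies the link strengths along each path, which is about as simple as a static network gets.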

Now, for dynamic networks we need our suite of function operators, and stored rules, and to a lesser extent memoizing rules. "#=>" and "!=>" respectively. I suspect we can already represent a lot of dynamic networks with what we already have. And as needs arise we can add a few new function operators here and there. But for a full brain, we almost certainly need the full swc language, or the sw/BKO python back-end.

Quite bizarrely there seems to be some correspondence between our structure, intended to describe a brain, and the notation used to describe quantum mechanics. I don't know how deep it goes, though assuming too deep would probably be a bit crazy. There is the obvious similarity with the notation, the borrowed kets, bras, operators and superpositions. But also wave function collapse is very close to our weighted-pick-elt operator. And making a quantum measurement is quite close to asking a question in the console.

For example:
sa: is-alive |Schrodinger's cat> !=> normalize weighted-pick-elt (0.5|yes> + 0.5|no>)

sa: is-alive |Schrodinger's cat>
|yes>
There are of course some obvious differences. The biggest being that QM is over the complex numbers, and BKO is over real valued floats. And QM has a Schrodinger equation, and plenty of other maths that is not replicated in the BKO scheme. Though of course to build a full brain we would need some kind of time evolution operator. But it is unclear what form that would take.

Finally, the idea of a path integral. That a particle in some sense travels through space-time from starting point to end point along all possible pathways is somewhat unobvious to the non-Feynmans of the world. Yet the idea that a spike train travels from starting neuron to end neuron along all possible brain pathways is obvious, and a trivial observation. This correspondence is kind of weird. And I'm not sure how superficial or deep it really is. Given this correspondence maybe it is appropriate to call a brain "brain-space"? Where concepts map to neurons, and neurons map to a 3D location in a physical brain. And operators map superpositions to superpositions, and propagate signals through brain-space.

Let's finish with one possible mapping from simple BKO learn rules to a neural structure:
where:
- the BKO is:
supported-ops |x> => |op: op1> + |op: op2> + |op: op3>
op1 |x> => |a> + |b> + |c>
op2 |x> => |d> + |e>
op3 |x> => |f> + |g> + |h> + |i>
- large circles correspond to neuron cell bodies
- small circles correspond to synapses
- labeled lines correspond to axons

But this picture was from a long time ago. I now think it is probably vastly too simple!

new function operator: inhibition[t]

The idea is, given a superposition, try to increase the difference between the top most ket and the rest. I don't yet know when it will be useful, but it seems it might be, and it was only a couple of lines of code.

How does it work? Well, you feed in a parameter that specifies how much to suppress the smaller terms.
-- define a superposition:
sa: |list> => |a> + 5.2|b> + 23|c> + 13|d> + 17|e> + 17|f> + 15|g>

-- suppress everything except the biggest term:
sa: inhibition[1] "" |list>
0|a> + 0|b> + 23|c> + 0|d> + 0|e> + 0|f> + 0|g>

-- inhibit at half strength:
sa: inhibition[0.5] "" |list>
0.5|a> + 2.6|b> + 23|c> + 6.5|d> + 8.5|e> + 8.5|f> + 7.5|g>

-- negative inhibit (ie, suppress the biggest term):
sa: inhibition[-1] "" |list>
2|a> + 10.4|b> + 23|c> + 26|d> + 34|e> + 34|f> + 30|g>

-- and again:
sa: inhibition[-1]^2 "" |list>
4|a> + 20.8|b> + 46|c> + 52|d> + 34|e> + 34|f> + 60|g>

-- and again:
sa: inhibition[-1]^3 "" |list>
8|a> + 41.6|b> + 92|c> + 104|d> + 68|e> + 68|f> + 60|g>
Hopefully that makes some sense.
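In case it helps, here is a Python sketch that reproduces the console output above. The rule is inferred from those examples rather than copied from the real code: any term with the maximum coefficient is left alone, and every other coefficient is multiplied by (1 - t). Superpositions are plain dicts here.

```python
def inhibition(t, sp):
    """Scale every coefficient below the maximum by (1 - t); keep the top terms."""
    if not sp:
        return {}
    top = max(sp.values())
    # ties for the top coefficient are all left untouched
    return {k: (c if c == top else c * (1 - t)) for k, c in sp.items()}


the_list = {"a": 1, "b": 5.2, "c": 23, "d": 13, "e": 17, "f": 17, "g": 15}

print(inhibition(1, the_list))    # everything except c goes to 0
print(inhibition(0.5, the_list))  # everything except c is halved

# inhibition[-1]^3: the top term changes from c, to e and f, to g
sp = the_list
for _ in range(3):
    sp = inhibition(-1, sp)
print(sp)
```

Note how in the repeated negative case the previously biggest term falls behind, because it is the only one that does not get doubled on that step.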

Update: inhibition[-1]^k may be one way to approach the idea of creativity. When thinking of a problem, the obvious, boring, answer will have the highest coefficient. But if we suppress the highest few coefficients, then maybe we have something more creative. Of course, we need some way to test that the proposed solution satisfies the required properties. I'm not yet sure how to implement this, but it seems to be something that the brain makes heavy use of. Posing questions, and then hunting for answers with the right properties.

Wednesday, 28 October 2015

simple particle entanglement example

We can use our BKO scheme to encode a simplified version of particle entanglement. In this example there are 2 particles, each with spin either up or down. The idea is we don't know what state the particles are in until we measure. It makes use of weighted-pick-elt, which has some similarities to wave-function collapse.

Here is some BKO:
----------------------------------------
|context> => |context: simple entanglement example>

entanglement-1 |particles> => |particle 1: spin up> + |particle 2: spin down>
entanglement-2 |particles> => |particle 1: spin down> + |particle 2: spin up>
the-list-of-possible-entanglements |particles> => |op: entanglement-1> + |op: entanglement-2>
measure |particles> #=> apply(weighted-pick-elt the-list-of-possible-entanglements|_self>,|_self>)
----------------------------------------
Now, let's measure our particles:
sa: measure |particles>
|particle 1: spin down> + |particle 2: spin up>
It's random, so let's try again:
sa: measure |particles>
|particle 1: spin down> + |particle 2: spin up>
And again:
sa: measure |particles>
|particle 1: spin up> + |particle 2: spin down>
I guess that is simple enough. And if we want to encode the idea that the particles take a fixed state on measurement, then we should use memoizing rules instead of stored rules:
wave-fn-collapse-measure |particles> !=> apply(weighted-pick-elt the-list-of-possible-entanglements|_self>,|_self>)
Now, see what we have:
sa: dump
----------------------------------------
|context> => |context: simple entanglement example>

entanglement-1 |particles> => |particle 1: spin up> + |particle 2: spin down>
entanglement-2 |particles> => |particle 1: spin down> + |particle 2: spin up>
the-list-of-possible-entanglements |particles> => |op: entanglement-1> + |op: entanglement-2>
measure |particles> #=> apply(weighted-pick-elt the-list-of-possible-entanglements|_self>,|_self>)
wave-fn-collapse-measure |particles> !=> apply(weighted-pick-elt the-list-of-possible-entanglements|_self>,|_self>)
----------------------------------------
Now measure our particles and then see what we know:
sa: wave-fn-collapse-measure |particles>
|particle 1: spin up> + |particle 2: spin down>

sa: dump
----------------------------------------
|context> => |context: simple entanglement example>

entanglement-1 |particles> => |particle 1: spin up> + |particle 2: spin down>
entanglement-2 |particles> => |particle 1: spin down> + |particle 2: spin up>
the-list-of-possible-entanglements |particles> => |op: entanglement-1> + |op: entanglement-2>
measure |particles> #=> apply(weighted-pick-elt the-list-of-possible-entanglements|_self>,|_self>)
wave-fn-collapse-measure |particles> => |particle 1: spin up> + |particle 2: spin down>
----------------------------------------
Anyway, hope that is clear.
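The stored rule versus memoizing rule distinction can be mimicked in Python. This is just an analogy, not how the console implements it: measure re-rolls the dice on every call (like "#=>"), while wave_fn_collapse_measure remembers its first answer (like "!=>" turning into a plain "=>" rule in the dump). The names and the dict layout are for illustration.

```python
import random

# the two possible entangled states; the spins are always opposite
ENTANGLEMENTS = [
    {"particle 1": "spin up", "particle 2": "spin down"},
    {"particle 1": "spin down", "particle 2": "spin up"},
]


def measure(rng=random):
    """Stored-rule style: pick a state afresh on every call."""
    return rng.choice(ENTANGLEMENTS)


_collapsed = {}  # plays the role of the learned "=>" rule after the first call


def wave_fn_collapse_measure(rng=random):
    """Memoizing-rule style: pick once, then always return the same state."""
    if "particles" not in _collapsed:
        _collapsed["particles"] = measure(rng)
    return _collapsed["particles"]


print(measure())                   # may differ from call to call
print(wave_fn_collapse_measure())  # fixed after the first call
print(wave_fn_collapse_measure())
```

Since both entanglements have equal weight, a uniform random choice stands in for weighted-pick-elt here.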

Sunday, 11 October 2015

shopping with process-reaction()

Today an example using process-reaction to go shopping. I guess the idea is that buying stuff is a kind of reaction:
price-of-object -> object
First, learn some prices:
price |apple> => 0.6|dollar>
price |orange> => 0.8|dollar>
price |milk> => 2.3|dollar>
price |coffee> => 5.5|dollar>
price |steak> => 9|dollar>
Now, let's go shopping (we have $30 to spend):
-- buy an orange:
sa: process-reaction(30|dollar>,price |orange>,|orange>)
29.2|dollar> + |orange>

-- buy 4 apples:
sa: process-reaction(29.2|dollar> + |orange>,price 4 |apple>,4 |apple>)
26.8|dollar> + |orange> + 4|apple>

-- buy milk, coffee and steak:
sa: process-reaction(26.8|dollar> + |orange> + 4|apple>,price |milk> + price |coffee> + price |steak>,|milk> + |coffee> + |steak>)
10|dollar> + |orange> + 4|apple> + |milk> + |coffee> + |steak>
Now we have it working, we can compact it! First, define a shopping list:
list-for |shopping> => |orange> + 4|apple> + |milk> + |coffee> + |steak>
Now, let's buy it all at once:
sa: process-reaction(30|dollar>,price list-for |shopping>,list-for |shopping>)
10|dollar> + |orange> + 4|apple> + |milk> + |coffee> + |steak>
I'm impressed how easy that was! Define your prices, define your shopping list, and then you are essentially done. And a key part of why it is so easy is the linearity of the price operator acting on the shopping list.
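Here is a rough Python model of what is going on, with superpositions as dicts. The behaviour of process_reaction is inferred from the console output in this post, not taken from the real code: if the current state holds enough of every reactant, subtract the reactants and add the products, otherwise leave the state alone. The function names and the PRICES table layout are for illustration only.

```python
PRICES = {
    "apple": {"dollar": 0.6},
    "orange": {"dollar": 0.8},
    "milk": {"dollar": 2.3},
    "coffee": {"dollar": 5.5},
    "steak": {"dollar": 9},
}


def price(sp, prices=PRICES):
    """Linear operator: price(4 apples + milk) = 4 * price(apple) + price(milk)."""
    total = {}
    for item, count in sp.items():
        # an item with no learned price contributes nothing to the total
        for unit, amount in prices.get(item, {}).items():
            total[unit] = total.get(unit, 0) + count * amount
    return total


def process_reaction(current, reactants, products):
    """Swap reactants for products if the current state can afford them."""
    if any(current.get(k, 0) < c for k, c in reactants.items()):
        return dict(current)  # reaction not processed
    result = dict(current)
    for k, c in reactants.items():
        result[k] -= c
    for k, c in products.items():
        result[k] = result.get(k, 0) + c
    return result


shopping = {"orange": 1, "apple": 4, "milk": 1, "coffee": 1, "steak": 1}
cost = price(shopping)
after = process_reaction({"dollar": 30}, cost, shopping)
print(cost)
print(after)
```

The linearity lives entirely in price(): the coefficient on each item in the shopping list just scales that item's price before everything is summed.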

BTW, there is a quirk that you get stuff for free if it doesn't have a defined price. Whether we want to tweak process-reaction() to prevent this, or just be aware of it, I'm not yet sure. Anyway, here is an example. We add tomato to the shopping list, but we don't have a price:
sa: list-for |shopping> +=> |tomato>
sa: process-reaction(30|dollar>,price list-for |shopping>,list-for |shopping>)
10|dollar> + |orange> + 4|apple> + |milk> + |coffee> + |steak> + |tomato>
And note we still have $10, and a tomato. We got it for free! BTW, here is the price of the tomato:
sa: price |tomato>
|>
Hrmm... I now think it is impossible to tweak process-reaction() to handle undefined prices. Why? Because "price list-for |shopping>" is calculated before it is even sent to process-reaction() and |> being the identity element for superpositions means it is silently dropped.

Finally, we can find the cost of our shopping simply enough:
sa: price list-for |shopping>
20|dollar>
Update: I found one way to solve the free tomato problem. It goes something like this:
price |*> => |undefined price>
ie, if price is not defined for an object, return |undefined price> (making use of label descent). Now if we look at the price of our shopping list:
sa: price list-for |shopping>
20|dollar> + |undefined price>
Then we try to buy our full shopping list:
sa: process-reaction(30|dollar>,price list-for |shopping>,list-for |shopping>)
30|dollar>
And we see our shopping list didn't go through. This is a good thing. Process-reaction didn't know how to handle |undefined price>, and so the reaction was not processed.

Here is another way to handle it. The |undefined price> learn rule gets in the way, so let's drop back to this knowledge:
price |apple> => 0.6|dollar>
price |orange> => 0.8|dollar>
price |milk> => 2.3|dollar>
price |coffee> => 5.5|dollar>
price |steak> => 9|dollar>
list-for |shopping> => |orange> + 4|apple> + |milk> + |coffee> + |steak> + |tomato>
Define a new operator:
price-is-defined |*> #=> do-you-know price |_self>
Now, filter our shopping list to those we know the price of, and then buy the items:
sa: list-of |available items> => such-that[price-is-defined] list-for |shopping>
sa: process-reaction(30|dollar>,price list-of |available items>,list-of |available items>)
10|dollar> + |orange> + 4|apple> + |milk> + |coffee> + |steak>
And note we didn't get a free tomato this time.
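The filtering step is the same idea as a dict comprehension in Python. A minimal sketch, with prices as plain floats in dollars and made-up variable names, where "price is defined" just means the item has an entry in the price table:

```python
PRICES = {"apple": 0.6, "orange": 0.8, "milk": 2.3, "coffee": 5.5, "steak": 9}

shopping = {"orange": 1, "apple": 4, "milk": 1, "coffee": 1, "steak": 1, "tomato": 1}

# keep only the items we know the price of (the such-that[price-is-defined] step)
available = {item: count for item, count in shopping.items() if item in PRICES}

# the cost of what is left, in dollars
cost = sum(count * PRICES[item] for item, count in available.items())

print(available)
print(cost)
```

The tomato never reaches the purchase step, so there is nothing left to get for free.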

Update: just a quick conversion of operator names to be more like regular English:
the-price-for |apple> => 0.6|dollar>
the-price-for |orange> => 0.8|dollar>
the-price-for |milk> => 2.3|dollar>
the-price-for |coffee> => 5.5|dollar>
the-price-for |steak> => 9|dollar>
the |shopping list> => |orange> + 4|apple> + |milk> + |coffee> + |steak>
Now, ask "what is the price for the shopping list?":
sa: the-price-for the |shopping list>
20|dollar>
Cool, huh?

Eventually the plan is to have code to automatically cast English questions to BKO, and cast BKO answers back to English, but I don't fully know how to do that yet.