Thursday, 14 July 2016

random encode similarity matrices

In HTM-style sequence learning, there is an initial step where we randomly encode each object in the sequence. In this post, I want to look at the corresponding similarity matrices and see visually how much overlap there is between the objects.

Here is the relevant code:
full |range> => range(|1>,|65536>)
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>
encode |four> => pick[40] full |range>
encode |five> => pick[40] full |range>
encode |six> => pick[40] full |range>
encode |seven> => pick[40] full |range>
encode |Fibonacci> => pick[40] full |range>
encode |eight> => pick[40] full |range>
encode |thirteen> => pick[40] full |range>
encode |factorial> => pick[40] full |range>
encode |twenty-four> => pick[40] full |range>
encode |one-hundred-twenty> => pick[40] full |range>
Now, let's see the resulting similarity matrix:
simm-op |*> #=> 100 self-similar[encode] |_self>
map[simm-op,similarity] rel-kets[encode]
matrix[similarity]
[ count              ] = [  100.0  0      0      0      0      0      0      0      0      0      0      0      0      0      ] [ count              ]
[ eight              ]   [  0      100.0  0      0      0      0      0      2.5    0      0      0      0      0      0      ] [ eight              ]
[ factorial          ]   [  0      0      100.0  0      0      0      0      0      0      0      0      2.5    0      0      ] [ factorial          ]
[ Fibonacci          ]   [  0      0      0      100.0  0      0      0      0      0      0      0      0      0      0      ] [ Fibonacci          ]
[ five               ]   [  0      0      0      0      100.0  0      0      0      0      0      0      0      0      0      ] [ five               ]
[ four               ]   [  0      0      0      0      0      100.0  0      0      0      0      0      0      0      0      ] [ four               ]
[ one                ]   [  0      0      0      0      0      0      100.0  0      0      0      0      0      0      0      ] [ one                ]
[ one-hundred-twenty ]   [  0      2.5    0      0      0      0      0      100.0  0      0      0      0      0      0      ] [ one-hundred-twenty ]
[ seven              ]   [  0      0      0      0      0      0      0      0      100.0  0      0      0      0      0      ] [ seven              ]
[ six                ]   [  0      0      0      0      0      0      0      0      0      100.0  0      0      0      0      ] [ six                ]
[ thirteen           ]   [  0      0      0      0      0      0      0      0      0      0      100.0  0      2.5    0      ] [ thirteen           ]
[ three              ]   [  0      0      2.5    0      0      0      0      0      0      0      0      100.0  0      0      ] [ three              ]
[ twenty-four        ]   [  0      0      0      0      0      0      0      0      0      0      2.5    0      100.0  0      ] [ twenty-four        ]
[ two                ]   [  0      0      0      0      0      0      0      0      0      0      0      0      0      100.0  ] [ two                ]
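For readers who don't have my code handy, here is a rough Python sketch of the same experiment. It is not my actual implementation: the function names (random_encode, similarity, similarity_matrix) are made up for illustration, and it assumes the similarity of two encodings is simply the number of shared indices divided by the larger encoding size, scaled to a percentage.

```python
import random

def random_encode(names, size, on_bits=40, seed=None):
    """Give each name a random set of `on_bits` distinct indices from 1..size."""
    rng = random.Random(seed)
    # Sampling from a range object does not build the full list of indices.
    return {name: frozenset(rng.sample(range(1, size + 1), on_bits))
            for name in names}

def similarity(a, b):
    """Assumed measure: shared indices over the larger set size, as a percentage."""
    return 100.0 * len(a & b) / max(len(a), len(b))

def similarity_matrix(encodings):
    """Return (labels, rows), with labels sorted case-insensitively."""
    labels = sorted(encodings, key=str.lower)
    rows = [[similarity(encodings[r], encodings[c]) for c in labels]
            for r in labels]
    return labels, rows

names = ["count", "one", "two", "three", "four", "five", "six", "seven",
         "Fibonacci", "eight", "thirteen", "factorial", "twenty-four",
         "one-hundred-twenty"]

enc = random_encode(names, 65536, seed=1)
labels, matrix = similarity_matrix(enc)
for label, row in zip(labels, matrix):
    print(f"{label:<20}", " ".join(f"{v:6.1f}" for v in row))
```

With 40 indices per object, one shared index shows up as 2.5 in the matrix, so under this assumption every off-diagonal entry is a multiple of 2.5.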
Now, let's repeat, this time using vectors of length only 2048:
full |range> => range(|1>,|2048>)
encode |count> => pick[40] full |range>
encode |one> => pick[40] full |range>
encode |two> => pick[40] full |range>
encode |three> => pick[40] full |range>
encode |four> => pick[40] full |range>
encode |five> => pick[40] full |range>
encode |six> => pick[40] full |range>
encode |seven> => pick[40] full |range>
encode |Fibonacci> => pick[40] full |range>
encode |eight> => pick[40] full |range>
encode |thirteen> => pick[40] full |range>
encode |factorial> => pick[40] full |range>
encode |twenty-four> => pick[40] full |range>
encode |one-hundred-twenty> => pick[40] full |range>

simm-op |*> #=> 100 self-similar[encode] |_self>
map[simm-op,similarity] rel-kets[encode]
matrix[similarity]
[ count              ] = [  100.0  2.5    0      0      0      2.5    0      2.5    0      5      2.5    0      0      2.5    ] [ count              ]
[ eight              ]   [  2.5    100.0  2.5    2.5    0      0      2.5    0      5      2.5    5      7.5    5      0      ] [ eight              ]
[ factorial          ]   [  0      2.5    100.0  2.5    0      2.5    0      2.5    5      5      5      2.5    0      2.5    ] [ factorial          ]
[ Fibonacci          ]   [  0      2.5    2.5    100.0  0      5      7.5    2.5    0      2.5    5      5      2.5    2.5    ] [ Fibonacci          ]
[ five               ]   [  0      0      0      0      100.0  2.5    2.5    0      2.5    0      5      2.5    0      0      ] [ five               ]
[ four               ]   [  2.5    0      2.5    5      2.5    100.0  5      2.5    0      0      0      2.5    0      0      ] [ four               ]
[ one                ]   [  0      2.5    0      7.5    2.5    5      100.0  2.5    2.5    0      2.5    0      2.5    0      ] [ one                ]
[ one-hundred-twenty ]   [  2.5    0      2.5    2.5    0      2.5    2.5    100.0  2.5    5      0      0      5      0      ] [ one-hundred-twenty ]
[ seven              ]   [  0      5      5      0      2.5    0      2.5    2.5    100.0  5      2.5    0      0      5      ] [ seven              ]
[ six                ]   [  5      2.5    5      2.5    0      0      0      5      5      100.0  0      0      2.5    2.5    ] [ six                ]
[ thirteen           ]   [  2.5    5      5      5      5      0      2.5    0      2.5    0      100.0  2.5    0      2.5    ] [ thirteen           ]
[ three              ]   [  0      7.5    2.5    5      2.5    2.5    0      0      0      0      2.5    100.0  0      0      ] [ three              ]
[ twenty-four        ]   [  0      5      0      2.5    0      0      2.5    5      0      2.5    0      0      100.0  2.5    ] [ twenty-four        ]
[ two                ]   [  2.5    0      2.5    2.5    0      0      0      0      5      2.5    2.5    0      2.5    100.0  ] [ two                ]
Which confirms that 2048 is too small for my code: many pairs of objects now overlap, with off-diagonal similarities as high as 7.5. I wonder what happens with something bigger than 65536? Let's try 1000000:
[ count              ] = [  100.0  0      0      0      0      0      0      0      0      0      0      0      0      0      ] [ count              ]
[ eight              ]   [  0      100.0  0      0      0      0      0      0      0      0      0      0      0      0      ] [ eight              ]
[ factorial          ]   [  0      0      100.0  0      0      0      0      0      0      0      0      0      0      0      ] [ factorial          ]
[ Fibonacci          ]   [  0      0      0      100.0  0      0      0      0      0      0      0      0      0      0      ] [ Fibonacci          ]
[ five               ]   [  0      0      0      0      100.0  0      0      0      0      0      0      0      0      0      ] [ five               ]
[ four               ]   [  0      0      0      0      0      100.0  0      0      0      0      0      0      0      0      ] [ four               ]
[ one                ]   [  0      0      0      0      0      0      100.0  0      0      0      0      0      0      0      ] [ one                ]
[ one-hundred-twenty ]   [  0      0      0      0      0      0      0      100.0  0      0      0      0      0      0      ] [ one-hundred-twenty ]
[ seven              ]   [  0      0      0      0      0      0      0      0      100.0  0      0      0      0      0      ] [ seven              ]
[ six                ]   [  0      0      0      0      0      0      0      0      0      100.0  0      0      0      0      ] [ six                ]
[ thirteen           ]   [  0      0      0      0      0      0      0      0      0      0      100.0  0      0      0      ] [ thirteen           ]
[ three              ]   [  0      0      0      0      0      0      0      0      0      0      0      100.0  0      0      ] [ three              ]
[ twenty-four        ]   [  0      0      0      0      0      0      0      0      0      0      0      0      100.0  0      ] [ twenty-four        ]
[ two                ]   [  0      0      0      0      0      0      0      0      0      0      0      0      0      100.0  ] [ two                ]
Which is good. Now there are no overlaps between the encodings at all. But 1,000,000 is way too slow in my code, and I'm not yet sure why range() is so slow. The other question is how many objects you can encode before you get too many collisions. I don't know.
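One way to get a feel for the collision question is a back-of-the-envelope estimate. The Python sketch below is only that: it assumes each encoding is an independent, uniformly random choice of 40 distinct indices, and the function names are hypothetical. Under that assumption two encodings share 40*40/size indices on average, and the chance that they share none is C(size-40, 40) / C(size, 40).

```python
from math import comb

def expected_shared(size, on_bits=40):
    """Average number of indices two random encodings have in common."""
    return on_bits * on_bits / size

def prob_disjoint(size, on_bits=40):
    """Probability that two random encodings share no index at all."""
    return comb(size - on_bits, on_bits) / comb(size, on_bits)

def expected_overlapping_pairs(n_objects, size, on_bits=40):
    """Average number of object pairs with at least one shared index."""
    pairs = n_objects * (n_objects - 1) // 2
    return pairs * (1.0 - prob_disjoint(size, on_bits))

for size in (2048, 65536, 1000000):
    print(size,
          round(expected_shared(size), 4),
          round(expected_overlapping_pairs(14, size), 2))
```

For 14 objects at 65536 this estimate comes out at roughly two overlapping pairs, in the same ballpark as the three seen in the first matrix. Since the number of pairs grows as n(n-1)/2, the expected number of overlapping pairs grows roughly quadratically with the number of objects encoded.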

That's it for this post.
