Thursday 2 April 2015

similarity matrices for wikipedia word frequency lists

Just a quick one this time: the similarity matrices for our Wikipedia word frequency lists.

In the console:
-- load the data:
sa: load improved-WP-word-frequencies.sw

-- create our matrices:
sa: simm-1 |*> #=> 100 self-similar[words-1] |_self>
sa: simm-2 |*> #=> 100 self-similar[words-2] |_self>
sa: simm-3 |*> #=> 100 self-similar[words-3] |_self>
sa: map[simm-1,similarity-1] rel-kets[words-1] |>
sa: map[simm-2,similarity-2] rel-kets[words-2] |>
sa: map[simm-3,similarity-3] rel-kets[words-3] |>

-- display them:
sa: matrix[similarity-1]
[ WP: Adelaide         ] = [  100.0  56.86  15.13  37.53  38.7   27.41  24.25  ] [ WP: Adelaide         ]
[ WP: Australia        ]   [  56.86  100.0  16.48  37.53  40.39  27.63  24.32  ] [ WP: Australia        ]
[ WP: country list     ]   [  15.13  16.48  100.0  9.77   8.53   30.32  21.36  ] [ WP: country list     ]
[ WP: particle physics ]   [  37.53  37.53  9.77   100.0  51.62  22.68  19.5   ] [ WP: particle physics ]
[ WP: physics          ]   [  38.7   40.39  8.53   51.62  100.0  22.76  18.1   ] [ WP: physics          ]
[ WP: rivers           ]   [  27.41  27.63  30.32  22.68  22.76  100.0  24.52  ] [ WP: rivers           ]
[ WP: US presidents    ]   [  24.25  24.32  21.36  19.5   18.1   24.52  100.0  ] [ WP: US presidents    ]

sa: matrix[similarity-2]
[ WP: Adelaide         ] = [  100.0  15.04  1.73   6.4    7.71   4.39   3.2    ] [ WP: Adelaide         ]
[ WP: Australia        ]   [  15.04  100.0  2.27   5.92   8.04   4.43   4.26   ] [ WP: Australia        ]
[ WP: country list     ]   [  1.73   2.27   100.0  1.49   1.46   2.18   1.35   ] [ WP: country list     ]
[ WP: particle physics ]   [  6.4    5.92   1.49   100.0  13.86  3.81   3.28   ] [ WP: particle physics ]
[ WP: physics          ]   [  7.71   8.04   1.46   13.86  100.0  4.52   2.63   ] [ WP: physics          ]
[ WP: rivers           ]   [  4.39   4.43   2.18   3.81   4.52   100.0  2.98   ] [ WP: rivers           ]
[ WP: US presidents    ]   [  3.2    4.26   1.35   3.28   2.63   2.98   100.0  ] [ WP: US presidents    ]

sa: matrix[similarity-3]
[ WP: Adelaide         ] = [  100.0  2.59   0.16   0.64   0.73   0.24   0.1    ] [ WP: Adelaide         ]
[ WP: Australia        ]   [  2.59   100.0  0.34   0.47   0.96   0.35   0.53   ] [ WP: Australia        ]
[ WP: country list     ]   [  0.16   0.34   100.0  0.14   0.1    0.46   0.22   ] [ WP: country list     ]
[ WP: particle physics ]   [  0.64   0.47   0.14   100.0  2.98   0.26   0.17   ] [ WP: particle physics ]
[ WP: physics          ]   [  0.73   0.96   0.1    2.98   100.0  0.48   0.14   ] [ WP: physics          ]
[ WP: rivers           ]   [  0.24   0.35   0.46   0.26   0.48   100.0  0.21   ] [ WP: rivers           ]
[ WP: US presidents    ]   [  0.1    0.53   0.22   0.17   0.14   0.21   100.0  ] [ WP: US presidents    ]
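That's all clear enough: in every matrix the related pairs (Adelaide and Australia, physics and particle physics) score highest, and the scores fall off sharply from similarity-1 to similarity-3.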
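
For anyone wanting to try the idea outside the console, here is a rough Python sketch of building a pairwise similarity matrix from word frequency lists. It is not the console's implementation. It assumes an overlap measure (shared counts divided by the larger of the two totals, scaled to 100), and `simm`, `similarity_matrix` and the toy `pages` data are all made up for illustration.

```python
from collections import Counter


def simm(f, g):
    """Assumed overlap measure: shared counts / larger total, in [0, 1]."""
    if not f or not g:
        return 0.0
    overlap = sum(min(count, g.get(word, 0)) for word, count in f.items())
    return overlap / max(sum(f.values()), sum(g.values()))


def similarity_matrix(freqs):
    """Return (labels, rows) with every pairwise score scaled to 0..100."""
    labels = sorted(freqs)
    rows = [
        [round(100 * simm(freqs[a], freqs[b]), 2) for b in labels]
        for a in labels
    ]
    return labels, rows


# Toy frequency lists (hypothetical, not the Wikipedia data).
pages = {
    "A": Counter("the river runs to the sea".split()),
    "B": Counter("the river meets the sea".split()),
    "C": Counter("quarks and leptons".split()),
}

labels, matrix = similarity_matrix(pages)
for label, row in zip(labels, matrix):
    print(label, row)
```

As in the matrices above, the diagonal is always 100 and the matrix is symmetric, since the measure does not care which list comes first.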

Heh, we can even show the matrices of words, but they are way too big to post.
Instead:
matrix[words-1]
matrix[words-2]
matrix[words-3]
