If we keep just 4 hex digits of the hash, there are 16^4 = 65,536 (64k) possible kets.
With 5 digits, there are 16^5 = 1,048,576 (roughly 1 million).
With 8 digits, there are 16^8 = 4,294,967,296 (roughly 4 billion) possible kets.
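To make the digit counts concrete, here is a minimal Python sketch. It assumes the kets are the last N hex digits of an MD5 hash. The post does not say which hash function is used, and `fragment_ket` and `possible_kets` are hypothetical helper names, not part of the project's code.

```python
import hashlib

def fragment_ket(fragment: str, digits: int) -> str:
    """Return the last `digits` hex digits of a fragment's hash.

    Assumption: MD5 is used for illustration only; the post does not
    say which hash function produced the kets.
    """
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()[-digits:]

def possible_kets(digits: int) -> int:
    # Each hex digit takes 16 values, so N digits allow 16**N distinct kets.
    return 16 ** digits

for digits in (4, 5, 8):
    print(f"{digits} digits: {possible_kets(digits):,} possible kets")
```

Keeping fewer digits shrinks the ket space, so distinct fragments are more likely to land on the same ket.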
Here are the actual numbers:
sa: load improved-fragment-webpages.sw
sa: table[page,count-1-hash-64k,count-1-hash-1M,count-1-hash-4B] rel-kets[hash-64k] |>
+----------------+------------------+-----------------+-----------------+
| page           | count-1-hash-64k | count-1-hash-1M | count-1-hash-4B |
+----------------+------------------+-----------------+-----------------+
| abc 1          | 650              | 653             | 653             |
| abc 2          | 652              | 655             | 655             |
| abc 3          | 652              | 656             | 656             |
| abc 4          | 661              | 664             | 664             |
| abc 5          | 660              | 665             | 665             |
| abc 6          | 660              | 663             | 663             |
| abc 7          | 651              | 654             | 654             |
| abc 8          | 659              | 663             | 663             |
| abc 9          | 650              | 654             | 654             |
| abc 10         | 655              | 658             | 658             |
| abc 11         | 660              | 664             | 664             |
| adelaidenow 1  | 2635             | 2698            | 2704            |
| adelaidenow 2  | 2718             | 2775            | 2780            |
| adelaidenow 3  | 2697             | 2751            | 2757            |
| adelaidenow 4  | 2701             | 2764            | 2770            |
| adelaidenow 5  | 2674             | 2745            | 2750            |
| adelaidenow 6  | 2675             | 2739            | 2744            |
| adelaidenow 7  | 2716             | 2767            | 2772            |
| adelaidenow 8  | 2758             | 2821            | 2827            |
| adelaidenow 9  | 2704             | 2774            | 2778            |
| adelaidenow 10 | 2693             | 2755            | 2760            |
| adelaidenow 11 | 2748             | 2807            | 2814            |
| slashdot 1     | 1153             | 1159            | 1160            |
| slashdot 2     | 1157             | 1164            | 1165            |
| slashdot 3     | 1164             | 1168            | 1170            |
| slashdot 4     | 1133             | 1137            | 1138            |
| slashdot 5     | 1163             | 1169            | 1169            |
| slashdot 6     | 1182             | 1189            | 1190            |
| slashdot 7     | 1149             | 1161            | 1162            |
| slashdot 8     | 1155             | 1170            | 1170            |
| slashdot 9     | 1160             | 1167            | 1167            |
| slashdot 10    | 1131             | 1145            | 1145            |
| slashdot 11    | 1137             | 1146            | 1146            |
| smh 1          | 2584             | 2636            | 2636            |
| smh 2          | 2595             | 2645            | 2651            |
| smh 3          | 2613             | 2661            | 2661            |
| smh 4          | 2572             | 2632            | 2635            |
| smh 5          | 2578             | 2625            | 2628            |
| smh 6          | 2605             | 2654            | 2656            |
| smh 7          | 2610             | 2675            | 2676            |
| smh 8          | 2592             | 2632            | 2632            |
| smh 9          | 2569             | 2622            | 2622            |
| smh 10         | 2560             | 2606            | 2607            |
| smh 11         | 2629             | 2686            | 2690            |
| wikipedia 1    | 1078             | 1082            | 1082            |
| wikipedia 2    | 1037             | 1043            | 1043            |
| wikipedia 3    | 1070             | 1074            | 1075            |
| wikipedia 4    | 1144             | 1151            | 1152            |
| wikipedia 5    | 1080             | 1084            | 1085            |
| wikipedia 6    | 1068             | 1076            | 1077            |
| wikipedia 7    | 1054             | 1059            | 1060            |
| wikipedia 8    | 1112             | 1119            | 1119            |
| wikipedia 9    | 1101             | 1111            | 1111            |
| wikipedia 10   | 1046             | 1055            | 1055            |
| wikipedia 11   | 1116             | 1125            | 1125            |
| youtube 1      | 1363             | 1379            | 1379            |
| youtube 2      | 1289             | 1301            | 1301            |
| youtube 3      | 1284             | 1299            | 1299            |
| youtube 4      | 1374             | 1388            | 1389            |
| youtube 5      | 1369             | 1387            | 1389            |
| youtube 6      | 1363             | 1383            | 1384            |
| youtube 7      | 1382             | 1396            | 1400            |
| youtube 8      | 1383             | 1396            | 1398            |
| youtube 9      | 1104             | 1114            | 1115            |
| youtube 10     | 1376             | 1391            | 1392            |
| youtube 11     | 1378             | 1387            | 1388            |
+----------------+------------------+-----------------+-----------------+

So we observe that the ket count in the 64k case is only slightly smaller than in the 4B case (under 3% smaller on every page), since a few distinct fragments collide once the hash is cut down to 4 digits. The 1M and 4B columns are almost identical. So our 4B superpositions are very sparse: at most 2827 kets out of roughly 4 billion possible. That is a big win for the superposition representation over a vector/array representation, which would need a slot for every one of those possible kets. I guess it also means 64k superpositions are sufficient to represent webpages. It does, however, leave open the question of how many webpages we can store using 64k superpositions before distinct webpages accidentally start to look similar.
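As a rough check on the size of the drop from the 4B column to the smaller columns, here is a sketch of the standard occupancy formula: if n distinct fragments are hashed uniformly at random into m possible kets, the expected number of distinct kets is m(1 - (1 - 1/m)^n). Two assumptions apply: the hash behaves uniformly, and the 4B count is taken as the true number of distinct fragments. `expected_distinct` is a hypothetical helper name.

```python
def expected_distinct(n: int, m: int) -> float:
    """Expected number of distinct kets when n distinct fragments are
    hashed uniformly at random into m possible kets (occupancy formula)."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)

# Assumption: treat the 4B count as the true number of distinct fragments.
n = 2750  # the "adelaidenow 5" row, count-1-hash-4B
for m in (16 ** 4, 16 ** 5):
    print(m, round(expected_distinct(n, m)))
```

For that row the estimate comes out at about 2693 kets at 64k and about 2746 at 1M, against the observed 2674 and 2745. The 1M column matches closely. The 64k column shows somewhat more merging than a perfectly uniform hash would predict, so treat this only as a rough guide.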
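To put a number on the sparsity argument, here is a small sketch comparing a dense array, which has one coefficient slot per possible ket, with a sparse dict that stores only the kets that occur. It assumes one 8-byte float per coefficient and counts coefficient storage only, ignoring key and dict overhead. The helper names and the synthetic example page are hypothetical.

```python
BYTES_PER_COEFF = 8  # assumption: one 64-bit float per coefficient

def dense_bytes(digits: int) -> int:
    # A dense vector/array needs a slot for every possible ket: 16**digits of them.
    return (16 ** digits) * BYTES_PER_COEFF

def sparse_coeff_bytes(superposition: dict) -> int:
    # A superposition stores only the kets that actually occur.
    # Counts coefficient storage only; key and dict overhead are ignored.
    return len(superposition) * BYTES_PER_COEFF

# Synthetic stand-in for the largest page in the table (2827 kets).
example = {f"{i:08x}": 1.0 for i in range(2827)}
print(dense_bytes(8), sparse_coeff_bytes(example))
```

At 8 digits the dense array needs 34,359,738,368 bytes (32 GiB) of coefficients per page, while the 2827-ket superposition needs 22,616 bytes of coefficients.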
That's it for this post. More on this topic in future posts.