Friday 6 March 2015

website similarity matrices

This time, let's look at how similar each website is to itself over the 11 days.

Here is the BKO:
-- create website similarity matrices:
-- list of abc websites (note we include |abc 11> and |average abc>)
|full abc list> => |abc 1> + |abc 2> + |abc 3> + |abc 4> + |abc 5> + |abc 6> + |abc 7> + |abc 8> + |abc 9> + |abc 10> + |abc 11> + |average abc>

-- we want abc-hash-4B to be distinct from the standard hash-4B, so the matrix is restricted to abc only
abc-hash-4B |*> #=> hash-4B |_self>
|null> => map[abc-hash-4B] "" |full abc list>

-- we want abc-simm to be distinct from the standard simm, so the matrix is restricted to abc only
abc-simm |*> #=> 100 self-similar[abc-hash-4B] |_self>
|null> => map[abc-simm,abc-similarity] "" |full abc list>

-- now the rest of them:
|full adelaidenow list> => |adelaidenow 1> + |adelaidenow 2> + |adelaidenow 3> + |adelaidenow 4> + |adelaidenow 5> + |adelaidenow 6> + |adelaidenow 7> + |adelaidenow 8> + |adelaidenow 9> + |adelaidenow 10> + |adelaidenow 11> + |average adelaidenow>
adelaidenow-hash-4B |*> #=> hash-4B |_self>
|null> => map[adelaidenow-hash-4B] "" |full adelaidenow list>
adelaidenow-simm |*> #=> 100 self-similar[adelaidenow-hash-4B] |_self>
|null> => map[adelaidenow-simm,adelaidenow-similarity] "" |full adelaidenow list>

|full slashdot list> => |slashdot 1> + |slashdot 2> + |slashdot 3> + |slashdot 4> + |slashdot 5> + |slashdot 6> + |slashdot 7> + |slashdot 8> + |slashdot 9> + |slashdot 10> + |slashdot 11> + |average slashdot>
slashdot-hash-4B |*> #=> hash-4B |_self>
|null> => map[slashdot-hash-4B] "" |full slashdot list>
slashdot-simm |*> #=> 100 self-similar[slashdot-hash-4B] |_self>
|null> => map[slashdot-simm,slashdot-similarity] "" |full slashdot list>

|full smh list> => |smh 1> + |smh 2> + |smh 3> + |smh 4> + |smh 5> + |smh 6> + |smh 7> + |smh 8> + |smh 9> + |smh 10> + |smh 11> + |average smh>
smh-hash-4B |*> #=> hash-4B |_self>
|null> => map[smh-hash-4B] "" |full smh list>
smh-simm |*> #=> 100 self-similar[smh-hash-4B] |_self>
|null> => map[smh-simm,smh-similarity] "" |full smh list>

|full wikipedia list> => |wikipedia 1> + |wikipedia 2> + |wikipedia 3> + |wikipedia 4> + |wikipedia 5> + |wikipedia 6> + |wikipedia 7> + |wikipedia 8> + |wikipedia 9> + |wikipedia 10> + |wikipedia 11> + |average wikipedia>
wikipedia-hash-4B |*> #=> hash-4B |_self>
|null> => map[wikipedia-hash-4B] "" |full wikipedia list>
wikipedia-simm |*> #=> 100 self-similar[wikipedia-hash-4B] |_self>
|null> => map[wikipedia-simm,wikipedia-similarity] "" |full wikipedia list>

|full youtube list> => |youtube 1> + |youtube 2> + |youtube 3> + |youtube 4> + |youtube 5> + |youtube 6> + |youtube 7> + |youtube 8> + |youtube 9> + |youtube 10> + |youtube 11> + |average youtube>
youtube-hash-4B |*> #=> hash-4B |_self>
|null> => map[youtube-hash-4B] "" |full youtube list>
youtube-simm |*> #=> 100 self-similar[youtube-hash-4B] |_self>
|null> => map[youtube-simm,youtube-similarity] "" |full youtube list>
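Outside the console, the same idea can be roughly sketched in Python. This is only a model under stated assumptions: superpositions are treated as dicts from fragment hash to count, `simm` is a weighted-overlap measure (sum of the smaller coefficients over the larger total) that may not match the console's own definition, and `full_list`, `similarity_matrix` and the toy counts are hypothetical. Restricting the input to one site's list plays the role of the per-site `abc-hash-4B` copy: only those entries ever get compared.

```python
from typing import Dict, List, Tuple

# A superposition is modelled as a dict from ket label (here a fragment hash) to coefficient.
Superposition = Dict[str, float]


def simm(a: Superposition, b: Superposition) -> float:
    """Weighted overlap in [0, 1]: sum of the smaller coefficients over the larger total.

    Assumption: a stand-in measure, not necessarily the console's exact simm.
    Coefficients are assumed non-negative (e.g. fragment counts).
    """
    denominator = max(sum(a.values(), 0.0), sum(b.values(), 0.0))
    if denominator == 0:
        return 0.0
    overlap = sum(min(a[k], b[k]) for k in a.keys() & b.keys())
    return overlap / denominator


def full_list(site: str, days: int = 11) -> List[str]:
    """Hypothetical helper mirroring |full abc list>: daily snapshots plus the average."""
    return [f"{site} {day}" for day in range(1, days + 1)] + [f"average {site}"]


def similarity_matrix(patterns: Dict[str, Superposition]) -> Tuple[List[str], List[List[float]]]:
    """Compare every pattern with every other, scaled by 100 like '100 self-similar[...]'."""
    labels = list(patterns)  # keep insertion order
    rows = [[100 * simm(patterns[r], patterns[c]) for c in labels] for r in labels]
    return labels, rows


# Toy fragment-hash counts (made up for illustration, not real website data).
all_patterns = {
    "abc 1": {"h1": 3, "h2": 1},
    "abc 2": {"h1": 2, "h2": 1, "h3": 1},
    "average abc": {"h1": 2.5, "h2": 1, "h3": 0.5},
    "slashdot 1": {"h4": 5},
}

# Restrict to one site (two toy days plus the average) before comparing anything.
abc_only = {name: all_patterns[name] for name in full_list("abc", days=2)}
labels, rows = similarity_matrix(abc_only)
for label, row in zip(labels, rows):
    print(f"{label:12} {[round(v, 2) for v in row]}")
```

Filtering the input first, rather than building one big matrix and slicing it, keeps each site's matrix at 12 × 12 comparisons.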
Here are the resulting matrices:
-- load the data:
sa: load improved-fragment-webpages.sw
sa: load create-average-website-fragments.sw
sa: load create-website-similarity-matrices.sw

-- show our resulting matrices:
sa: matrix[abc-similarity]
[ abc 1       ] = [  100.00  95.52   95.03   91.85   91.36   91.86   91.88   91.85   92.19   92.19   91.42   93.47   ] [ abc 1       ]
[ abc 2       ]   [  95.52   100.00  95.50   91.86   91.38   91.91   92.04   91.71   92.41   92.47   91.25   93.51   ] [ abc 2       ]
[ abc 3       ]   [  95.03   95.50   100.00  91.80   91.32   92.03   92.10   91.66   92.41   92.41   91.20   93.46   ] [ abc 3       ]
[ abc 4       ]   [  91.85   91.86   91.80   100.00  92.64   92.55   91.80   92.19   91.69   91.68   92.13   92.93   ] [ abc 4       ]
[ abc 5       ]   [  91.36   91.38   91.32   92.64   100.00  92.44   91.40   91.85   91.29   91.25   91.41   92.56   ] [ abc 5       ]
[ abc 6       ]   [  91.86   91.91   92.03   92.55   92.44   100.00  92.95   92.78   91.93   91.90   91.59   93.23   ] [ abc 6       ]
[ abc 7       ]   [  91.88   92.04   92.10   91.80   91.40   92.95   100.00  93.05   93.13   93.02   91.74   93.16   ] [ abc 7       ]
[ abc 8       ]   [  91.85   91.71   91.66   92.19   91.85   92.78   93.05   100.00  93.71   93.52   92.06   93.40   ] [ abc 8       ]
[ abc 9       ]   [  92.19   92.41   92.41   91.69   91.29   91.93   93.13   93.71   100.00  95.57   91.56   93.46   ] [ abc 9       ]
[ abc 10      ]   [  92.19   92.47   92.41   91.68   91.25   91.90   93.02   93.52   95.57   100.00  91.61   93.45   ] [ abc 10      ]
[ abc 11      ]   [  91.42   91.25   91.20   92.13   91.41   91.59   91.74   92.06   91.56   91.61   100.00  91.70   ] [ abc 11      ]
[ average abc ]   [  93.47   93.51   93.46   92.93   92.56   93.23   93.16   93.40   93.46   93.45   91.70   100.00  ] [ average abc ]

sa: matrix[adelaidenow-similarity]
[ adelaidenow 1       ] = [  100.00  86.13   83.22   78.11   77.27   76.23   75.86   75.61   75.79   76.08   76.13   80.64   ] [ adelaidenow 1       ]
[ adelaidenow 2       ]   [  86.13   100.00  87.38   81.46   77.52   77.60   77.07   76.90   76.26   77.26   76.53   82.36   ] [ adelaidenow 2       ]
[ adelaidenow 3       ]   [  83.22   87.38   100.00  83.60   78.24   77.29   76.68   76.56   76.45   77.41   76.22   82.18   ] [ adelaidenow 3       ]
[ adelaidenow 4       ]   [  78.11   81.46   83.60   100.00  83.39   78.50   77.28   77.24   75.75   76.77   76.67   81.82   ] [ adelaidenow 4       ]
[ adelaidenow 5       ]   [  77.27   77.52   78.24   83.39   100.00  81.76   77.29   76.84   76.31   76.39   77.43   81.12   ] [ adelaidenow 5       ]
[ adelaidenow 6       ]   [  76.23   77.60   77.29   78.50   81.76   100.00  79.82   78.85   76.34   77.09   76.98   81.08   ] [ adelaidenow 6       ]
[ adelaidenow 7       ]   [  75.86   77.07   76.68   77.28   77.29   79.82   100.00  84.92   78.38   77.13   76.90   81.08   ] [ adelaidenow 7       ]
[ adelaidenow 8       ]   [  75.61   76.90   76.56   77.24   76.84   78.85   84.92   100.00  82.06   79.31   77.56   81.59   ] [ adelaidenow 8       ]
[ adelaidenow 9       ]   [  75.79   76.26   76.45   75.75   76.31   76.34   78.38   82.06   100.00  85.68   78.15   80.79   ] [ adelaidenow 9       ]
[ adelaidenow 10      ]   [  76.08   77.26   77.41   76.77   76.39   77.09   77.13   79.31   85.68   100.00  82.97   80.86   ] [ adelaidenow 10      ]
[ adelaidenow 11      ]   [  76.13   76.53   76.22   76.67   77.43   76.98   76.90   77.56   78.15   82.97   100.00  78.11   ] [ adelaidenow 11      ]
[ average adelaidenow ]   [  80.64   82.36   82.18   81.82   81.12   81.08   81.08   81.59   80.79   80.86   78.11   100.00  ] [ average adelaidenow ]

sa: matrix[slashdot-similarity]
[ average slashdot ] = [  100.00  81.12   80.99   81.20   81.45   81.67   81.31   81.17   81.37   81.24   81.09   79.05   ] [ average slashdot ]
[ slashdot 1       ]   [  81.12   100.00  79.43   77.62   79.45   78.59   78.19   78.66   78.62   77.47   79.05   79.06   ] [ slashdot 1       ]
[ slashdot 2       ]   [  80.99   79.43   100.00  79.26   78.77   78.15   79.32   77.31   77.63   78.11   78.44   77.96   ] [ slashdot 2       ]
[ slashdot 3       ]   [  81.20   77.62   79.26   100.00  79.08   78.22   79.18   78.34   77.85   78.71   78.50   78.37   ] [ slashdot 3       ]
[ slashdot 4       ]   [  81.45   79.45   78.77   79.08   100.00  81.20   78.27   79.20   78.12   78.15   78.83   78.82   ] [ slashdot 4       ]
[ slashdot 5       ]   [  81.67   78.59   78.15   78.22   81.20   100.00  78.58   79.85   79.29   79.04   78.56   77.78   ] [ slashdot 5       ]
[ slashdot 6       ]   [  81.31   78.19   79.32   79.18   78.27   78.58   100.00  78.62   78.54   79.33   78.07   78.00   ] [ slashdot 6       ]
[ slashdot 7       ]   [  81.17   78.66   77.31   78.34   79.20   79.85   78.62   100.00  80.17   78.86   78.40   78.65   ] [ slashdot 7       ]
[ slashdot 8       ]   [  81.37   78.62   77.63   77.85   78.12   79.29   78.54   80.17   100.00  79.38   79.60   79.02   ] [ slashdot 8       ]
[ slashdot 9       ]   [  81.24   77.47   78.11   78.71   78.15   79.04   79.33   78.86   79.38   100.00  78.32   78.34   ] [ slashdot 9       ]
[ slashdot 10      ]   [  81.09   79.05   78.44   78.50   78.83   78.56   78.07   78.40   79.60   78.32   100.00  81.39   ] [ slashdot 10      ]
[ slashdot 11      ]   [  79.05   79.06   77.96   78.37   78.82   77.78   78.00   78.65   79.02   78.34   81.39   100.00  ] [ slashdot 11      ]

sa: matrix[smh-similarity]
[ average smh ] = [  100.00  87.76   88.16   87.95   87.62   86.82   87.31   87.12   87.72   87.54   87.61   85.55   ] [ average smh ]
[ smh 1       ]   [  87.76   100.00  89.44   87.69   86.23   85.33   85.30   84.97   85.34   84.70   85.04   84.75   ] [ smh 1       ]
[ smh 2       ]   [  88.16   89.44   100.00  89.80   86.25   85.27   85.58   85.36   85.54   85.61   85.38   85.40   ] [ smh 2       ]
[ smh 3       ]   [  87.95   87.69   89.80   100.00  86.81   85.04   85.31   85.40   85.21   85.19   85.47   84.93   ] [ smh 3       ]
[ smh 4       ]   [  87.62   86.23   86.25   86.81   100.00  86.63   86.12   85.06   85.64   85.15   85.34   85.00   ] [ smh 4       ]
[ smh 5       ]   [  86.82   85.33   85.27   85.04   86.63   100.00  85.24   84.36   85.20   84.55   85.00   84.99   ] [ smh 5       ]
[ smh 6       ]   [  87.31   85.30   85.58   85.31   86.12   85.24   100.00  86.19   86.59   85.03   85.26   85.81   ] [ smh 6       ]
[ smh 7       ]   [  87.12   84.97   85.36   85.40   85.06   84.36   86.19   100.00  86.36   85.48   85.38   85.48   ] [ smh 7       ]
[ smh 8       ]   [  87.72   85.34   85.54   85.21   85.64   85.20   86.59   86.36   100.00  87.76   86.94   85.89   ] [ smh 8       ]
[ smh 9       ]   [  87.54   84.70   85.61   85.19   85.15   84.55   85.03   85.48   87.76   100.00  90.65   85.16   ] [ smh 9       ]
[ smh 10      ]   [  87.61   85.04   85.38   85.47   85.34   85.00   85.26   85.38   86.94   90.65   100.00  85.75   ] [ smh 10      ]
[ smh 11      ]   [  85.55   84.75   85.40   84.93   85.00   84.99   85.81   85.48   85.89   85.16   85.75   100.00  ] [ smh 11      ]

sa: matrix[wikipedia-similarity]
[ average wikipedia ] = [  100.00  87.67   87.51   87.82   85.86   88.06   88.10   87.71   86.65   87.33   87.47   85.19   ] [ average wikipedia ]
[ wikipedia 1       ]   [  87.67   100.00  88.60   87.80   85.57   85.41   85.93   84.48   84.13   83.71   84.48   84.21   ] [ wikipedia 1       ]
[ wikipedia 2       ]   [  87.51   88.60   100.00  89.28   84.95   85.85   86.16   85.72   82.95   83.88   86.01   82.89   ] [ wikipedia 2       ]
[ wikipedia 3       ]   [  87.82   87.80   89.28   100.00  85.36   86.57   86.41   85.67   83.09   84.50   85.48   83.13   ] [ wikipedia 3       ]
[ wikipedia 4       ]   [  85.86   85.57   84.95   85.36   100.00  84.04   83.77   82.42   83.65   82.37   82.08   83.32   ] [ wikipedia 4       ]
[ wikipedia 5       ]   [  88.06   85.41   85.85   86.57   84.04   100.00  88.34   86.72   84.71   85.80   85.91   84.22   ] [ wikipedia 5       ]
[ wikipedia 6       ]   [  88.10   85.93   86.16   86.41   83.77   88.34   100.00  87.70   85.43   85.85   86.64   84.31   ] [ wikipedia 6       ]
[ wikipedia 7       ]   [  87.71   84.48   85.72   85.67   82.42   86.72   87.70   100.00  86.18   87.02   88.46   85.24   ] [ wikipedia 7       ]
[ wikipedia 8       ]   [  86.65   84.13   82.95   83.09   83.65   84.71   85.43   86.18   100.00  85.88   85.55   86.28   ] [ wikipedia 8       ]
[ wikipedia 9       ]   [  87.33   83.71   83.88   84.50   82.37   85.80   85.85   87.02   85.88   100.00  88.10   86.46   ] [ wikipedia 9       ]
[ wikipedia 10      ]   [  87.47   84.48   86.01   85.48   82.08   85.91   86.64   88.46   85.55   88.10   100.00  87.17   ] [ wikipedia 10      ]
[ wikipedia 11      ]   [  85.19   84.21   82.89   83.13   83.32   84.22   84.31   85.24   86.28   86.46   87.17   100.00  ] [ wikipedia 11      ]

sa: matrix[youtube-similarity]
[ average youtube ] = [  100.00  85.21   84.63   85.38   85.98   86.04   85.33   84.43   84.90   81.10   84.43   82.12   ] [ average youtube ]
[ youtube 1       ]   [  85.21   100.00  83.96   84.76   85.73   85.18   85.04   80.41   81.58   77.43   82.51   79.80   ] [ youtube 1       ]
[ youtube 2       ]   [  84.63   83.96   100.00  87.30   85.63   84.66   81.56   81.12   79.96   79.08   78.90   79.32   ] [ youtube 2       ]
[ youtube 3       ]   [  85.38   84.76   87.30   100.00  86.54   87.12   83.85   80.31   80.86   78.14   80.15   78.80   ] [ youtube 3       ]
[ youtube 4       ]   [  85.98   85.73   85.63   86.54   100.00  89.46   85.78   81.19   82.22   76.97   81.13   79.50   ] [ youtube 4       ]
[ youtube 5       ]   [  86.04   85.18   84.66   87.12   89.46   100.00  86.87   81.77   81.96   77.08   81.08   79.79   ] [ youtube 5       ]
[ youtube 6       ]   [  85.33   85.04   81.56   83.85   85.78   86.87   100.00  82.21   82.71   78.36   81.86   80.81   ] [ youtube 6       ]
[ youtube 7       ]   [  84.43   80.41   81.12   80.31   81.19   81.77   82.21   100.00  85.97   82.26   84.98   85.98   ] [ youtube 7       ]
[ youtube 8       ]   [  84.90   81.58   79.96   80.86   82.22   81.96   82.71   85.97   100.00  81.99   87.14   84.61   ] [ youtube 8       ]
[ youtube 9       ]   [  81.10   77.43   79.08   78.14   76.97   77.08   78.36   82.26   81.99   100.00  82.12   82.82   ] [ youtube 9       ]
[ youtube 10      ]   [  84.43   82.51   78.90   80.15   81.13   81.08   81.86   84.98   87.14   82.12   100.00  86.58   ] [ youtube 10      ]
[ youtube 11      ]   [  82.12   79.80   79.32   78.80   79.50   79.79   80.81   85.98   84.61   82.82   86.58   100.00  ] [ youtube 11      ]
OK, that is kind of cool, though the matrices will probably line-wrap if your screen is too narrow. Take-home message: every website is more than 75% similar to itself across the 11-day period. I guess that means we don't even need to average over 10 days! Presumably the average will still give better results, though.
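The 75% claim can be checked mechanically by taking the smallest off-diagonal entry of each matrix. Here is a minimal Python sketch; `min_off_diagonal` is a hypothetical helper, and the excerpt is copied from the adelaidenow matrix above (rows and columns adelaidenow 1, 8 and 9). Scanning all six tables, adelaidenow 1 against adelaidenow 8, at 75.61, appears to be the lowest pair.

```python
from typing import List


def min_off_diagonal(rows: List[List[float]]) -> float:
    """Smallest similarity between two different entries of a square matrix."""
    return min(
        value
        for i, row in enumerate(rows)
        for j, value in enumerate(row)
        if i != j  # skip the 100.00 self-comparisons on the diagonal
    )


# Excerpt of the adelaidenow matrix: rows/columns adelaidenow 1, 8 and 9.
excerpt = [
    [100.00, 75.61, 75.79],
    [75.61, 100.00, 82.06],
    [75.79, 82.06, 100.00],
]
print(min_off_diagonal(excerpt))  # 75.61
```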
