Friday 5 June 2015

more inverse-simm results

OK. In the last post we discovered that similar[inverse-links-to] seems to give some good results. Let's expand our test set and try it on a few more examples.
-- define our test set:
|list> => |WP: Erwin_Schrdinger> + |WP: Richard_Feynman> + |WP: Cat> + |WP: Dog> + |WP: Apple> + |WP: Adelaide> + |WP: University_of_Adelaide> + |WP: Particle_physics> + |WP: Lisp_(programming_language)> + |WP: APL_(programming_language)> + |WP: SQL> + |WP: SPARQL> + |WP: The_Doors> + |WP: Rugby> + |WP: Australian_Football_League>

-- how many incoming links?
sa: how-many-in-links |*> #=> how-many inverse-links-to |_self>
sa: table[wikipage,how-many-in-links] "" |list>
+-----------------------------+-------------------+
| wikipage                    | how-many-in-links |
+-----------------------------+-------------------+
| Erwin_Schrdinger            | 53                |
| Richard_Feynman             | 79                |
| Cat                         | 14                |
| Dog                         | 24                |
| Apple                       | 21                |
| Adelaide                    | 81                |
| University_of_Adelaide      | 10                |
| Particle_physics            | 17                |
| Lisp_(programming_language) | 64                |
| APL_(programming_language)  | 24                |
| SQL                         | 41                |
| SPARQL                      | 6                 |
| The_Doors                   | 41                |
| Rugby                       | 0                 |
| Australian_Football_League  | 30                |
+-----------------------------+-------------------+
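If you don't have the sa: console handy, here is a rough Python sketch of what how-many-in-links does: invert a forward links-to map, then count how many pages point at each page. The links_to dict, its page names, and the dict-of-sets layout are all made-up assumptions for illustration, not how the sw file actually stores things.

```python
from collections import defaultdict


def invert_links(links_to):
    """Turn page -> {pages it links to} into page -> {pages that link to it}."""
    inverse = defaultdict(set)
    for source, targets in links_to.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def how_many_in_links(page, inverse):
    """Count incoming links; a page nobody links to counts as 0."""
    return len(inverse.get(page, set()))


# Hypothetical toy data, not the real 30k-page sample.
links_to = {
    "Pets": {"Cat", "Dog"},
    "Farm": {"Cat", "Horse"},
    "Wolves": {"Dog"},
}
inverse = invert_links(links_to)
for page in ["Cat", "Dog", "Horse", "Rugby"]:
    print(page, how_many_in_links(page, inverse))
```

A page that never appears as a link target, like Rugby here, just gets 0, which is the same situation as Rugby in the real table.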

-- create the data:
sa: inverse-simm-op |WP: *> #=> select[1,500] 100 self-similar[inverse-links-to] |_self>
sa: |null> => map[inverse-simm-op,inverse-simm] "" |list>
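Here is a hedged Python sketch of what inverse-simm-op computes. My assumption is that, for plain 0/1 link sets, the similarity reduces to |A ∩ B| / max(|A|, |B|), scaled by 100. That is consistent with the tables below (for example, Erwin_Schrdinger has 53 in-links and Max_Born scores 32.075, which is what 17/53 gives), but it is not the actual semantic-db code. The names simm and inverse_simm, the tie-breaking rule, and the toy data are all hypothetical.

```python
def simm(a, b):
    """Assumed measure for 0/1 link sets: |A & B| / max(|A|, |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def inverse_simm(page, inverse, k=500):
    """Rough analogue of: select[1,500] 100 self-similar[inverse-links-to] |page>

    inverse maps page -> set of pages that link to it.
    Returns up to k (page, score) pairs, best first, scores scaled to 0..100.
    """
    target = inverse.get(page, set())
    if not target:
        # No incoming links, nothing to compare: an empty result (cf. Rugby).
        return []
    scores = []
    for other, in_links in inverse.items():
        score = simm(target, in_links)
        if score > 0:
            # "self-similar" keeps the page itself, which always scores 100.
            scores.append((other, 100 * score))
    # Highest score first; ties broken alphabetically here (an arbitrary choice).
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical toy inverse-links-to data.
inverse = {
    "Cat": {"A", "B", "C", "D"},
    "Dog": {"A", "B", "E"},
    "Horse": {"A", "F"},
    "SQL": {"G"},
}
print(inverse_simm("Cat", inverse))
print(inverse_simm("Rugby", inverse))
```

The map[inverse-simm-op,inverse-simm] step then corresponds to precomputing something like {p: inverse_simm(p, inverse) for p in pages} for every page in the test list, so later lookups are cheap.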

-- define an operator to explore the resulting data:
sa: T |*> #=> table[wikipage,coeff] select[1,20] inverse-simm |_self>
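And a small Python sketch of the T operator: take the first 20 entries of a precomputed inverse-simm result and print them in the same ASCII table style. The function names and the dict of precomputed results are hypothetical; rounding to 3 decimals just mimics the look of the output here.

```python
def ascii_table(rows, headers=("wikipage", "coeff")):
    """Render (name, coeff) pairs as a +---+ style ASCII table."""
    cells = [headers] + [(name, str(round(coeff, 3))) for name, coeff in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(headers))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        # Left-justify each cell to its column width.
        return "| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"

    body = [line(row) for row in cells[1:]]
    return "\n".join([rule, line(cells[0]), rule] + body + [rule])


def T(page, inverse_simm_results, n=20):
    """Hypothetical analogue of: table[wikipage,coeff] select[1,20] inverse-simm |page>"""
    return ascii_table(inverse_simm_results.get(page, [])[:n])


# Hypothetical precomputed results (first few rows in the style of the Cat table).
results = {"Cat": [("Cat", 100.0), ("Horse", 31.25), ("Donkey", 28.5714286)]}
print(T("Cat", results))
print(T("Rugby", results))
```

With no stored results, as for Rugby, it prints only the header rows, the same shape as the empty Rugby table further down.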

-- now our examples:
sa: T |WP: Erwin_Schrdinger>
+---------------------------+--------+
| wikipage                  | coeff  |
+---------------------------+--------+
| Erwin_Schrdinger          | 100.0  |
| Max_Born                  | 32.075 |
| Niels_Bohr                | 31.646 |
| Schrdinger_equation       | 30.189 |
| Paul_Dirac                | 29.31  |
| Wolfgang_Pauli            | 28.302 |
| Werner_Heisenberg         | 28.049 |
| Max_Planck                | 26.984 |
| uncertainty_principle     | 26.415 |
| photoelectric_effect      | 24.528 |
| Roger_Penrose             | 22.642 |
| Bohr_model                | 20.755 |
| Arnold_Sommerfeld         | 20.755 |
| Louis_de_Broglie          | 20.755 |
| wave_function             | 20.755 |
| Copenhagen_interpretation | 18.868 |
| quantum_state             | 18.868 |
| Ernest_Rutherford         | 17.742 |
| Maxwell's_equations       | 17.241 |
| Pauli_exclusion_principle | 16.981 |
+---------------------------+--------+

sa: T |WP: Richard_Feynman>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| Richard_Feynman                    | 100.0  |
| Werner_Heisenberg                  | 24.39  |
| special_relativity                 | 20.792 |
| Niels_Bohr                         | 20.253 |
| Paul_Dirac                         | 20.253 |
| particle_physics                   | 20.225 |
| classical_mechanics                | 20.0   |
| fermion                            | 18.987 |
| spin_(physics)                     | 18.987 |
| Standard_Model                     | 17.722 |
| Schrdinger_equation                | 17.722 |
| quantum_field_theory               | 17.722 |
| electromagnetism                   | 17.241 |
| Erwin_Schrdinger                   | 16.456 |
| Pauli_exclusion_principle          | 16.456 |
| quark                              | 16.456 |
| Stephen_Hawking                    | 16.456 |
| quantum_electrodynamics            | 16.456 |
| Julian_Schwinger                   | 16.456 |
| Category:Concepts_in_physics       | 16.279 |
+------------------------------------+--------+

sa: T |WP: Cat>
+----------+--------+
| wikipage | coeff  |
+----------+--------+
| Cat      | 100.0  |
| Horse    | 31.25  |
| Donkey   | 28.571 |
| Goat     | 28.571 |
| Elephant | 21.429 |
| Pig      | 21.429 |
| Rabbit   | 21.429 |
| Deer     | 21.429 |
| Mule     | 21.429 |
| Goose    | 21.429 |
| Dog      | 20.833 |
| Sheep    | 20     |
| Lion     | 18.75  |
| Almond   | 14.286 |
| Alder    | 14.286 |
| Ant      | 14.286 |
| Bear     | 14.286 |
| Bee      | 14.286 |
| Fox      | 14.286 |
| Lizard   | 14.286 |
+----------+--------+

sa: T |WP: Dog>
+------------------+--------+
| wikipage         | coeff  |
+------------------+--------+
| Dog              | 100.0  |
| Horse            | 29.167 |
| coyote           | 22.222 |
| Gray_wolf        | 21.429 |
| Arctic_fox       | 20.833 |
| Cat              | 20.833 |
| Canidae          | 20.833 |
| Elephant         | 20.833 |
| bobcat           | 20.833 |
| red_fox          | 20.833 |
| Donkey           | 20.833 |
| red_wolf         | 20.833 |
| Rabbit           | 16.667 |
| African_wild_dog | 16.667 |
| gray_wolf        | 16.667 |
| Domestic_sheep   | 16.667 |
| dingo            | 16.667 |
| Goat             | 16.667 |
| Cattle           | 16.667 |
| otter            | 16.667 |
+------------------+--------+

sa: T |WP: Apple>
+----------------+--------+
| wikipage       | coeff  |
+----------------+--------+
| Apple          | 100.0  |
| Strawberry     | 33.333 |
| Cranberry      | 23.81  |
| Grape          | 23.81  |
| Tomato         | 23.81  |
| Cherry         | 23.81  |
| Kiwifruit      | 19.048 |
| Blackberry     | 19.048 |
| plum           | 19.048 |
| Lime_(fruit)   | 19.048 |
| Pineapple      | 19.048 |
| Lemon          | 19.048 |
| Blueberry      | 19.048 |
| peach          | 17.5   |
| pear           | 17.391 |
| Orange_(fruit) | 16.667 |
| Pear           | 14.286 |
| Banana         | 14.286 |
| Peach          | 14.286 |
| Squash_(plant) | 14.286 |
+----------------+--------+

sa: T |WP: Adelaide>
+-------------------------------------+--------+
| wikipage                            | coeff  |
+-------------------------------------+--------+
| Adelaide                            | 100.0  |
| Brisbane                            | 37.079 |
| Perth                               | 32.099 |
| South_Australia                     | 26.042 |
| Melbourne                           | 18.687 |
| Canberra                            | 15.044 |
| The_Age                             | 14.634 |
| Sydney                              | 14.583 |
| Australian_Broadcasting_Corporation | 13.445 |
| Australian_rules_football           | 12.346 |
| Auckland                            | 12.195 |
| Australian_Labor_Party              | 11.111 |
| Darwin,_Northern_Territory          | 11.111 |
| Triple_J                            | 11.111 |
| Seven_Network                       | 11.111 |
| States_and_territories_of_Australia | 11.111 |
| Karachi                             | 10.989 |
| Australian_Football_League          | 9.877  |
| The_Australian                      | 9.877  |
| Western_Australia                   | 8.911  |
+-------------------------------------+--------+

sa: T |WP: University_of_Adelaide>
+----------------------------------------+-------+
| wikipage                               | coeff |
+----------------------------------------+-------+
| University_of_Adelaide                 | 100.0 |
| Port_Adelaide_Football_Club            | 20    |
| Adelaide_Oval                          | 20    |
| Adelaide_city_centre                   | 20    |
| University_of_South_Australia          | 20    |
| Port_Adelaide                          | 20    |
| Australian_Grand_Prix                  | 20    |
| State_Bank_of_South_Australia          | 20    |
| Mount_Lofty                            | 20    |
| South_Eastern_Freeway                  | 20    |
| Southern_Expressway_(Australia)        | 20    |
| Government_of_South_Australia          | 20    |
| Flinders_University_of_South_Australia | 20    |
| South_Australian_Museum                | 20    |
| Adelaide_Crows                         | 20    |
| Glenelg,_South_Australia               | 20    |
| Australian_Central_Standard_Time       | 20    |
| Australian_Central_Daylight_Time       | 20    |
| Fleurieu_Peninsula                     | 20    |
| River_Torrens                          | 20    |
+----------------------------------------+-------+

sa: T |WP: Particle_physics>
+----------------------------------------+--------+
| wikipage                               | coeff  |
+----------------------------------------+--------+
| Particle_physics                       | 100    |
| Optics                                 | 20     |
| Cosmology                              | 20     |
| Acoustics                              | 17.647 |
| Condensed_matter_physics               | 17.647 |
| Fluid_dynamics                         | 17.647 |
| Thermodynamics                         | 17.647 |
| kinematics                             | 17.647 |
| atomic,_molecular,_and_optical_physics | 17.647 |
| cosmic_inflation                       | 17.647 |
| Fluid_statics                          | 17.647 |
| Lambda-CDM_model                       | 17.647 |
| Biophysics                             | 17.647 |
| Category:Physics                       | 17.647 |
| Lev_Landau                             | 15.789 |
| Nuclear_physics                        | 14.286 |
| virtual_particle                       | 14.286 |
| quantum_chemistry                      | 13.636 |
| American_Physical_Society              | 13.333 |
| Elementary_particle                    | 13.043 |
+----------------------------------------+--------+

sa: T |WP: Lisp_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| Lisp_(programming_language)        | 100    |
| Smalltalk                          | 28.125 |
| Pascal_(programming_language)      | 24.675 |
| Fortran                            | 23.881 |
| Scheme_(programming_language)      | 23.438 |
| Ruby_(programming_language)        | 23.188 |
| object-oriented_programming        | 21.875 |
| PHP                                | 20.779 |
| Prolog                             | 20.312 |
| Haskell_(programming_language)     | 20.312 |
| Ada_(programming_language)         | 18.75  |
| APL_(programming_language)         | 18.75  |
| BASIC                              | 18.75  |
| COBOL                              | 18.75  |
| functional_programming             | 18.75  |
| John_McCarthy_(computer_scientist) | 18.75  |
| C_Sharp_(programming_language)     | 18.75  |
| programming_language               | 17.647 |
| JavaScript                         | 17.526 |
| compiler                           | 17     |
+------------------------------------+--------+

sa: T |WP: APL_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| APL_(programming_language)         | 100.0  |
| Kenneth_E._Iverson                 | 33.333 |
| John_McCarthy_(computer_scientist) | 25.0   |
| John_Backus                        | 25.0   |
| Prolog                             | 23.333 |
| Alan_Kay                           | 20.833 |
| AWK                                | 20.833 |
| Grace_Hopper                       | 20.833 |
| ML_(programming_language)          | 20.833 |
| Niklaus_Wirth                      | 20.833 |
| logic_programming                  | 20.833 |
| J_(programming_language)           | 20.833 |
| bytecode                           | 19.231 |
| Lisp_(programming_language)        | 18.75  |
| programmer                         | 17.857 |
| Objective-C                        | 17.241 |
| BASIC                              | 17.188 |
| Mathematica                        | 17.143 |
| SQL                                | 17.073 |
| ALGOL                              | 16.667 |
+------------------------------------+--------+

sa: T |WP: SQL>
+----------------------------------------+--------+
| wikipage                               | coeff  |
+----------------------------------------+--------+
| SQL                                    | 100.0  |
| Haskell_(programming_language)         | 22.727 |
| PHP                                    | 19.481 |
| APL_(programming_language)             | 17.073 |
| Category:Cross-platform_software       | 17.073 |
| Visual_Basic                           | 17.073 |
| relational_database                    | 17.073 |
| COBOL                                  | 16.327 |
| PostgreSQL                             | 14.634 |
| R_(programming_language)               | 14.634 |
| Run_time_(program_lifecycle_phase)     | 14.634 |
| Ruby_(programming_language)            | 14.493 |
| JavaScript                             | 13.402 |
| C_Sharp_(programming_language)         | 13.208 |
| database                               | 12.644 |
| Lisp_(programming_language)            | 12.5   |
| Common_Lisp                            | 12.195 |
| Graphical_user_interface               | 12.195 |
| MySQL                                  | 12.195 |
| Mathematica                            | 12.195 |
+----------------------------------------+--------+

sa: T |WP: SPARQL>
+--------------------------------------------------------------------------------------------+--------+
| wikipage                                                                                   | coeff  |
+--------------------------------------------------------------------------------------------+--------+
| SPARQL                                                                                     | 100.0  |
| Web_Ontology_Language                                                                      | 33.333 |
| Agris:_International_Information_System_for_the_Agricultural_Sciences_and_Technology       | 33.333 |
| W3C_XML_Schema                                                                             | 33.333 |
| GRDDL                                                                                      | 33.333 |
| Conceptual_interoperability                                                                | 33.333 |
| Category:Web_services                                                                      | 33.333 |
| Resource_Description_Framework                                                             | 20     |
| Analytic_geometry                                                                          | 16.667 |
| DHTML                                                                                      | 16.667 |
| Interpolation                                                                              | 16.667 |
| GNU_nano                                                                                   | 16.667 |
| Pico_(text_editor)                                                                         | 16.667 |
| Relational_database                                                                        | 16.667 |
| Sir_Charles_Lyell                                                                          | 16.667 |
| Synchronized_Multimedia_Integration_Language                                               | 16.667 |
| Semantic_network                                                                           | 16.667 |
| Backronym                                                                                  | 16.667 |
| Interoperability                                                                           | 16.667 |
| RAS_syndrome                                                                               | 16.667 |
+--------------------------------------------------------------------------------------------+--------+

sa: T |WP: The_Doors>
+---------------------------------------+--------+
| wikipage                              | coeff  |
+---------------------------------------+--------+
| The_Doors                             | 100.0  |
| Jim_Morrison                          | 31.707 |
| Ray_Manzarek                          | 21.951 |
| Jefferson_Airplane                    | 17.073 |
| The_Band                              | 14.634 |
| Bee_Gees                              | 14.634 |
| Janis_Joplin                          | 12.195 |
| Summer_of_Love                        | 12.195 |
| Timothy_Leary                         | 12.195 |
| folk_rock                             | 12.195 |
| Governor_of_American_Samoa            | 12.195 |
| Cream_(band)                          | 12.195 |
| Sgt._Pepper's_Lonely_Hearts_Club_Band | 12.195 |
| The_Byrds                             | 12.195 |
| Joan_Baez                             | 12.195 |
| Iron_Butterfly                        | 12.195 |
| Brian_Jones                           | 12.195 |
| Creedence_Clearwater_Revival          | 12.195 |
| Johnny_Winter                         | 12.195 |
| The_Yardbirds                         | 12.195 |
+---------------------------------------+--------+

sa: T |WP: Rugby>
+----------+-------+
| wikipage | coeff |
+----------+-------+
+----------+-------+

sa: T |WP: Australian_Football_League>
+---------------------------------+--------+
| wikipage                        | coeff  |
+---------------------------------+--------+
| Australian_Football_League      | 100.0  |
| West_Coast_Eagles               | 40     |
| Richmond_Football_Club          | 40     |
| Sydney_Swans                    | 36.667 |
| St_Kilda_Football_Club          | 36.667 |
| Collingwood_Football_Club       | 36.667 |
| Hawthorn_Football_Club          | 36.667 |
| Australian_rules_football       | 33.871 |
| Essendon_Football_Club          | 33.333 |
| North_Melbourne_Football_Club   | 33.333 |
| Western_Bulldogs                | 33.333 |
| Carlton_Football_Club           | 33.333 |
| Brownlow_Medal                  | 33.333 |
| Seven_Network                   | 32.353 |
| Melbourne_Cricket_Ground        | 30     |
| Australian_Bureau_of_Statistics | 30     |
| Special_Broadcasting_Service    | 30     |
| Melbourne_Football_Club         | 30     |
| Fitzroy_Football_Club           | 30     |
| 2001_AFL_season                 | 30     |
+---------------------------------+--------+

Wow! That works unbelievably well. I don't know exactly why, but hey. Nor do I know why similar[inverse-links-to] works better than similar[links-to]! The empty Rugby table, at least, is expected: Rugby has 0 incoming links in this subset (see the how-many-in-links table), so there is nothing to compare against. A question that comes to mind: if we use a larger subset of Wikipedia, will these results get better or worse? I suspect better, but I'm not sure.

BTW, these results come from a sample that is only
100*30000/15559125
= 0.19 %
of the total English Wikipedia.

And the resulting sw file is here.
