-- define our test set:
|list> => |WP: Erwin_Schrdinger> + |WP: Richard_Feynman> + |WP: Cat> + |WP: Dog> + |WP: Apple> + |WP: Adelaide> + |WP: University_of_Adelaide> + |WP: Particle_physics> + |WP: Lisp_(programming_language)> + |WP: APL_(programming_language)> + |WP: SQL> + |WP: SPARQL> + |WP: The_Doors> + |WP: Rugby> + |WP: Australian_Football_League>

-- how many incoming links?
sa: how-many-in-links |*> #=> how-many inverse-links-to |_self>
sa: table[wikipage,how-many-in-links] "" |list>
+-----------------------------+-------------------+
| wikipage                    | how-many-in-links |
+-----------------------------+-------------------+
| Erwin_Schrdinger            | 53                |
| Richard_Feynman             | 79                |
| Cat                         | 14                |
| Dog                         | 24                |
| Apple                       | 21                |
| Adelaide                    | 81                |
| University_of_Adelaide      | 10                |
| Particle_physics            | 17                |
| Lisp_(programming_language) | 64                |
| APL_(programming_language)  | 24                |
| SQL                         | 41                |
| SPARQL                      | 6                 |
| The_Doors                   | 41                |
| Rugby                       | 0                 |
| Australian_Football_League  | 30                |
+-----------------------------+-------------------+

-- create the data:
sa: inverse-simm-op |WP: *> #=> select[1,500] 100 self-similar[inverse-links-to] |_self>
sa: |null> => map[inverse-simm-op,inverse-simm] "" |list>

-- define an operator to explore the resulting data:
sa: T |*> #=> table[wikipage,coeff] select[1,20] inverse-simm |_self>

-- now our examples:
sa: T |WP: Erwin_Schrdinger>
+---------------------------+--------+
| wikipage                  | coeff  |
+---------------------------+--------+
| Erwin_Schrdinger          | 100.0  |
| Max_Born                  | 32.075 |
| Niels_Bohr                | 31.646 |
| Schrdinger_equation       | 30.189 |
| Paul_Dirac                | 29.31  |
| Wolfgang_Pauli            | 28.302 |
| Werner_Heisenberg         | 28.049 |
| Max_Planck                | 26.984 |
| uncertainty_principle     | 26.415 |
| photoelectric_effect      | 24.528 |
| Roger_Penrose             | 22.642 |
| Bohr_model                | 20.755 |
| Arnold_Sommerfeld         | 20.755 |
| Louis_de_Broglie          | 20.755 |
| wave_function             | 20.755 |
| Copenhagen_interpretation | 18.868 |
| quantum_state             | 18.868 |
| Ernest_Rutherford         | 17.742 |
| Maxwell's_equations       | 17.241 |
| Pauli_exclusion_principle | 16.981 |
+---------------------------+--------+

sa: T |WP: Richard_Feynman>
+------------------------------+--------+
| wikipage                     | coeff  |
+------------------------------+--------+
| Richard_Feynman              | 100.0  |
| Werner_Heisenberg            | 24.39  |
| special_relativity           | 20.792 |
| Niels_Bohr                   | 20.253 |
| Paul_Dirac                   | 20.253 |
| particle_physics             | 20.225 |
| classical_mechanics          | 20.0   |
| fermion                      | 18.987 |
| spin_(physics)               | 18.987 |
| Standard_Model               | 17.722 |
| Schrdinger_equation          | 17.722 |
| quantum_field_theory         | 17.722 |
| electromagnetism             | 17.241 |
| Erwin_Schrdinger             | 16.456 |
| Pauli_exclusion_principle    | 16.456 |
| quark                        | 16.456 |
| Stephen_Hawking              | 16.456 |
| quantum_electrodynamics      | 16.456 |
| Julian_Schwinger             | 16.456 |
| Category:Concepts_in_physics | 16.279 |
+------------------------------+--------+

sa: T |WP: Cat>
+----------+--------+
| wikipage | coeff  |
+----------+--------+
| Cat      | 100.0  |
| Horse    | 31.25  |
| Donkey   | 28.571 |
| Goat     | 28.571 |
| Elephant | 21.429 |
| Pig      | 21.429 |
| Rabbit   | 21.429 |
| Deer     | 21.429 |
| Mule     | 21.429 |
| Goose    | 21.429 |
| Dog      | 20.833 |
| Sheep    | 20     |
| Lion     | 18.75  |
| Almond   | 14.286 |
| Alder    | 14.286 |
| Ant      | 14.286 |
| Bear     | 14.286 |
| Bee      | 14.286 |
| Fox      | 14.286 |
| Lizard   | 14.286 |
+----------+--------+

sa: T |WP: Dog>
+------------------+--------+
| wikipage         | coeff  |
+------------------+--------+
| Dog              | 100.0  |
| Horse            | 29.167 |
| coyote           | 22.222 |
| Gray_wolf        | 21.429 |
| Arctic_fox       | 20.833 |
| Cat              | 20.833 |
| Canidae          | 20.833 |
| Elephant         | 20.833 |
| bobcat           | 20.833 |
| red_fox          | 20.833 |
| Donkey           | 20.833 |
| red_wolf         | 20.833 |
| Rabbit           | 16.667 |
| African_wild_dog | 16.667 |
| gray_wolf        | 16.667 |
| Domestic_sheep   | 16.667 |
| dingo            | 16.667 |
| Goat             | 16.667 |
| Cattle           | 16.667 |
| otter            | 16.667 |
+------------------+--------+

sa: T |WP: Apple>
+----------------+--------+
| wikipage       | coeff  |
+----------------+--------+
| Apple          | 100.0  |
| Strawberry     | 33.333 |
| Cranberry      | 23.81  |
| Grape          | 23.81  |
| Tomato         | 23.81  |
| Cherry         | 23.81  |
| Kiwifruit      | 19.048 |
| Blackberry     | 19.048 |
| plum           | 19.048 |
| Lime_(fruit)   | 19.048 |
| Pineapple      | 19.048 |
| Lemon          | 19.048 |
| Blueberry      | 19.048 |
| peach          | 17.5   |
| pear           | 17.391 |
| Orange_(fruit) | 16.667 |
| Pear           | 14.286 |
| Banana         | 14.286 |
| Peach          | 14.286 |
| Squash_(plant) | 14.286 |
+----------------+--------+

sa: T |WP: Adelaide>
+-------------------------------------+--------+
| wikipage                            | coeff  |
+-------------------------------------+--------+
| Adelaide                            | 100.0  |
| Brisbane                            | 37.079 |
| Perth                               | 32.099 |
| South_Australia                     | 26.042 |
| Melbourne                           | 18.687 |
| Canberra                            | 15.044 |
| The_Age                             | 14.634 |
| Sydney                              | 14.583 |
| Australian_Broadcasting_Corporation | 13.445 |
| Australian_rules_football           | 12.346 |
| Auckland                            | 12.195 |
| Australian_Labor_Party              | 11.111 |
| Darwin,_Northern_Territory          | 11.111 |
| Triple_J                            | 11.111 |
| Seven_Network                       | 11.111 |
| States_and_territories_of_Australia | 11.111 |
| Karachi                             | 10.989 |
| Australian_Football_League          | 9.877  |
| The_Australian                      | 9.877  |
| Western_Australia                   | 8.911  |
+-------------------------------------+--------+

sa: T |WP: University_of_Adelaide>
+----------------------------------------+-------+
| wikipage                               | coeff |
+----------------------------------------+-------+
| University_of_Adelaide                 | 100.0 |
| Port_Adelaide_Football_Club            | 20    |
| Adelaide_Oval                          | 20    |
| Adelaide_city_centre                   | 20    |
| University_of_South_Australia          | 20    |
| Port_Adelaide                          | 20    |
| Australian_Grand_Prix                  | 20    |
| State_Bank_of_South_Australia          | 20    |
| Mount_Lofty                            | 20    |
| South_Eastern_Freeway                  | 20    |
| Southern_Expressway_(Australia)        | 20    |
| Government_of_South_Australia          | 20    |
| Flinders_University_of_South_Australia | 20    |
| South_Australian_Museum                | 20    |
| Adelaide_Crows                         | 20    |
| Glenelg,_South_Australia               | 20    |
| Australian_Central_Standard_Time       | 20    |
| Australian_Central_Daylight_Time       | 20    |
| Fleurieu_Peninsula                     | 20    |
| River_Torrens                          | 20    |
+----------------------------------------+-------+

sa: T |WP: Particle_physics>
+----------------------------------------+--------+
| wikipage                               | coeff  |
+----------------------------------------+--------+
| Particle_physics                       | 100    |
| Optics                                 | 20     |
| Cosmology                              | 20     |
| Acoustics                              | 17.647 |
| Condensed_matter_physics               | 17.647 |
| Fluid_dynamics                         | 17.647 |
| Thermodynamics                         | 17.647 |
| kinematics                             | 17.647 |
| atomic,_molecular,_and_optical_physics | 17.647 |
| cosmic_inflation                       | 17.647 |
| Fluid_statics                          | 17.647 |
| Lambda-CDM_model                       | 17.647 |
| Biophysics                             | 17.647 |
| Category:Physics                       | 17.647 |
| Lev_Landau                             | 15.789 |
| Nuclear_physics                        | 14.286 |
| virtual_particle                       | 14.286 |
| quantum_chemistry                      | 13.636 |
| American_Physical_Society              | 13.333 |
| Elementary_particle                    | 13.043 |
+----------------------------------------+--------+

sa: T |WP: Lisp_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| Lisp_(programming_language)        | 100    |
| Smalltalk                          | 28.125 |
| Pascal_(programming_language)      | 24.675 |
| Fortran                            | 23.881 |
| Scheme_(programming_language)      | 23.438 |
| Ruby_(programming_language)        | 23.188 |
| object-oriented_programming        | 21.875 |
| PHP                                | 20.779 |
| Prolog                             | 20.312 |
| Haskell_(programming_language)     | 20.312 |
| Ada_(programming_language)         | 18.75  |
| APL_(programming_language)         | 18.75  |
| BASIC                              | 18.75  |
| COBOL                              | 18.75  |
| functional_programming             | 18.75  |
| John_McCarthy_(computer_scientist) | 18.75  |
| C_Sharp_(programming_language)     | 18.75  |
| programming_language               | 17.647 |
| JavaScript                         | 17.526 |
| compiler                           | 17     |
+------------------------------------+--------+

sa: T |WP: APL_(programming_language)>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| APL_(programming_language)         | 100.0  |
| Kenneth_E._Iverson                 | 33.333 |
| John_McCarthy_(computer_scientist) | 25.0   |
| John_Backus                        | 25.0   |
| Prolog                             | 23.333 |
| Alan_Kay                           | 20.833 |
| AWK                                | 20.833 |
| Grace_Hopper                       | 20.833 |
| ML_(programming_language)          | 20.833 |
| Niklaus_Wirth                      | 20.833 |
| logic_programming                  | 20.833 |
| J_(programming_language)           | 20.833 |
| bytecode                           | 19.231 |
| Lisp_(programming_language)        | 18.75  |
| programmer                         | 17.857 |
| Objective-C                        | 17.241 |
| BASIC                              | 17.188 |
| Mathematica                        | 17.143 |
| SQL                                | 17.073 |
| ALGOL                              | 16.667 |
+------------------------------------+--------+

sa: T |WP: SQL>
+------------------------------------+--------+
| wikipage                           | coeff  |
+------------------------------------+--------+
| SQL                                | 100.0  |
| Haskell_(programming_language)     | 22.727 |
| PHP                                | 19.481 |
| APL_(programming_language)         | 17.073 |
| Category:Cross-platform_software   | 17.073 |
| Visual_Basic                       | 17.073 |
| relational_database                | 17.073 |
| COBOL                              | 16.327 |
| PostgreSQL                         | 14.634 |
| R_(programming_language)           | 14.634 |
| Run_time_(program_lifecycle_phase) | 14.634 |
| Ruby_(programming_language)        | 14.493 |
| JavaScript                         | 13.402 |
| C_Sharp_(programming_language)     | 13.208 |
| database                           | 12.644 |
| Lisp_(programming_language)        | 12.5   |
| Common_Lisp                        | 12.195 |
| Graphical_user_interface           | 12.195 |
| MySQL                              | 12.195 |
| Mathematica                        | 12.195 |
+------------------------------------+--------+

sa: T |WP: SPARQL>
+--------------------------------------------------------------------------------------+--------+
| wikipage                                                                             | coeff  |
+--------------------------------------------------------------------------------------+--------+
| SPARQL                                                                               | 100.0  |
| Web_Ontology_Language                                                                | 33.333 |
| Agris:_International_Information_System_for_the_Agricultural_Sciences_and_Technology | 33.333 |
| W3C_XML_Schema                                                                       | 33.333 |
| GRDDL                                                                                | 33.333 |
| Conceptual_interoperability                                                          | 33.333 |
| Category:Web_services                                                                | 33.333 |
| Resource_Description_Framework                                                       | 20     |
| Analytic_geometry                                                                    | 16.667 |
| DHTML                                                                                | 16.667 |
| Interpolation                                                                        | 16.667 |
| GNU_nano                                                                             | 16.667 |
| Pico_(text_editor)                                                                   | 16.667 |
| Relational_database                                                                  | 16.667 |
| Sir_Charles_Lyell                                                                    | 16.667 |
| Synchronized_Multimedia_Integration_Language                                         | 16.667 |
| Semantic_network                                                                     | 16.667 |
| Backronym                                                                            | 16.667 |
| Interoperability                                                                     | 16.667 |
| RAS_syndrome                                                                         | 16.667 |
+--------------------------------------------------------------------------------------+--------+

sa: T |WP: The_Doors>
+---------------------------------------+--------+
| wikipage                              | coeff  |
+---------------------------------------+--------+
| The_Doors                             | 100.0  |
| Jim_Morrison                          | 31.707 |
| Ray_Manzarek                          | 21.951 |
| Jefferson_Airplane                    | 17.073 |
| The_Band                              | 14.634 |
| Bee_Gees                              | 14.634 |
| Janis_Joplin                          | 12.195 |
| Summer_of_Love                        | 12.195 |
| Timothy_Leary                         | 12.195 |
| folk_rock                             | 12.195 |
| Governor_of_American_Samoa            | 12.195 |
| Cream_(band)                          | 12.195 |
| Sgt._Pepper's_Lonely_Hearts_Club_Band | 12.195 |
| The_Byrds                             | 12.195 |
| Joan_Baez                             | 12.195 |
| Iron_Butterfly                        | 12.195 |
| Brian_Jones                           | 12.195 |
| Creedence_Clearwater_Revival          | 12.195 |
| Johnny_Winter                         | 12.195 |
| The_Yardbirds                         | 12.195 |
+---------------------------------------+--------+

sa: T |WP: Rugby>
+----------+-------+
| wikipage | coeff |
+----------+-------+
+----------+-------+

sa: T |WP: Australian_Football_League>
+---------------------------------+--------+
| wikipage                        | coeff  |
+---------------------------------+--------+
| Australian_Football_League      | 100.0  |
| West_Coast_Eagles               | 40     |
| Richmond_Football_Club          | 40     |
| Sydney_Swans                    | 36.667 |
| St_Kilda_Football_Club          | 36.667 |
| Collingwood_Football_Club       | 36.667 |
| Hawthorn_Football_Club          | 36.667 |
| Australian_rules_football       | 33.871 |
| Essendon_Football_Club          | 33.333 |
| North_Melbourne_Football_Club   | 33.333 |
| Western_Bulldogs                | 33.333 |
| Carlton_Football_Club           | 33.333 |
| Brownlow_Medal                  | 33.333 |
| Seven_Network                   | 32.353 |
| Melbourne_Cricket_Ground        | 30     |
| Australian_Bureau_of_Statistics | 30     |
| Special_Broadcasting_Service    | 30     |
| Melbourne_Football_Club         | 30     |
| Fitzroy_Football_Club           | 30     |
| 2001_AFL_season                 | 30     |
+---------------------------------+--------+

Wow! That works unbelievably well. I don't know exactly why, but hey. And why similar[inverse-links-to] works better than similar[links-to], I don't know either!

A question that comes to mind is, if we use a larger subset of Wikipedia, will these results get better or worse? I suspect better, but not sure.
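For anyone who wants to poke at the idea outside the console, here is a minimal Python sketch of one reading of the measure for plain link sets: the number of in-linking pages two pages share, divided by the size of the larger in-link set, times 100. That reading is an assumption, not something checked against the project's source, though it fits the numbers above: Erwin_Schrdinger has 53 incoming links, and 100*17/53 = 32.075, the Max_Born coefficient. The function names (invert_links, set_simm, most_similar) and the TOY_LINKS data are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: page -> set of pages it links to.
TOY_LINKS: Dict[str, Set[str]] = {
    "page_a": {"Cat", "Dog", "Horse"},
    "page_b": {"Cat", "Horse"},
    "page_c": {"Cat", "Dog"},
    "page_d": {"Dog"},
    "page_e": {"Apple"},
}


def invert_links(links_to: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build page -> set of pages linking to it (the inverse-links-to idea)."""
    inverse: Dict[str, Set[str]] = {}
    for source, targets in links_to.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


def set_simm(a: Set[str], b: Set[str]) -> float:
    """Assumed measure, in percent: |A & B| / max(|A|, |B|) * 100."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / max(len(a), len(b))


def most_similar(
    page: str, inverse: Dict[str, Set[str]], top_n: int = 20
) -> List[Tuple[str, float]]:
    """Rank pages by how much their in-link sets overlap with `page`'s."""
    target = inverse.get(page, set())
    scored = [(other, set_simm(target, sources)) for other, sources in inverse.items()]
    # Drop pages with no overlap, then sort by score (highest first), then name.
    scored = [(name, score) for name, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    inverse = invert_links(TOY_LINKS)
    for name, score in most_similar("Cat", inverse):
        print(f"{name:10s} {score:.3f}")
```

A page with no incoming links in the sample, like Rugby above, gets an empty result here too, which matches the empty table.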
BTW, this is from:
100*30000/15559125 = 0.19% of the total English Wikipedia.
And the resulting sw file is here.