So I've been doing a little bit of reading about DBpedia, since it has a lot of overlap with what I am trying to do! Among other things, there are a few SPARQL examples there. For example,
"cities with more than 2 million habitants".
First in SPARQL:
(name and population for the 30 biggest cities)
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?subject ?population WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/City> .
  ?subject <http://dbpedia.org/ontology/populationUrban> ?population .
  FILTER (xsd:integer(?population) > 2000000)
}
ORDER BY DESC(xsd:integer(?population))
LIMIT 30
Then in BKO:
-- load up geonames data for all cities with population >= 15000:
sa: load improved-geonames-cities-15000.sw
-- restrict to cities with population >= 2,000,000, and sort the result:
sa: |result> => coeff-sort drop-below[2000000] population-self relevant-kets[population-self] |>
-- define population-comma operator (tidies our population results a little):
sa: population-comma |*> #=> to-comma-number extract-value population |_self>
-- how many results:
sa: how-many "" |result>
|number: 149>
-- print rank table of top 30 results:
sa: rank-table[id,name,population-comma] select[1,30] "" |result>
+------+-------------+----------------+------------------+
| rank | id | name | population-comma |
+------+-------------+----------------+------------------+
| 1 | id: 1796236 | Shanghai | 22,315,474 |
| 2 | id: 3435910 | Buenos Aires | 13,076,300 |
| 3 | id: 1275339 | Mumbai | 12,691,836 |
| 4 | id: 3530597 | Mexico City | 12,294,193 |
| 5 | id: 1816670 | Beijing | 11,716,620 |
| 6 | id: 1174872 | Karachi | 11,624,219 |
| 7 | id: 745044 | Istanbul | 11,174,257 |
| 8 | id: 1792947 | Tianjin | 11,090,314 |
| 9 | id: 1809858 | Guangzhou | 11,071,424 |
| 10 | id: 1273294 | Delhi | 10,927,986 |
| 11 | id: 1701668 | Manila | 10,444,527 |
| 12 | id: 524901 | Moscow | 10,381,222 |
| 13 | id: 1795565 | Shenzhen | 10,358,381 |
| 14 | id: 1185241 | Dhaka | 10,356,500 |
| 15 | id: 1835848 | Seoul | 10,349,312 |
| 16 | id: 3448439 | Sao Paulo | 10,021,295 |
| 17 | id: 1791247 | Wuhan | 9,785,388 |
| 18 | id: 2332459 | Lagos | 9,000,000 |
| 19 | id: 1642911 | Jakarta | 8,540,121 |
| 20 | id: 1850147 | Tokyo | 8,336,599 |
| 21 | id: 5128581 | New York City | 8,175,133 |
| 22 | id: 1812545 | Dongguan | 8,000,000 |
| 23 | id: 1668341 | Taipei | 7,871,900 |
| 24 | id: 2314302 | Kinshasa | 7,785,965 |
| 25 | id: 3936456 | Lima | 7,737,002 |
| 26 | id: 360630 | Cairo | 7,734,614 |
| 27 | id: 3688689 | Bogota | 7,674,366 |
| 28 | id: 2643741 | City of London | 7,556,900 |
| 29 | id: 2643743 | London | 7,556,900 |
| 30 | id: 1814906 | Chongqing | 7,457,600 |
+------+-------------+----------------+------------------+
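For readers more at home in Python, the pipeline above (drop everything below a threshold, sort by coefficient, take the first few, format with commas) can be sketched roughly as follows. This is only an illustration, not how the BKO operators are actually implemented: the function names are made up for this sketch, the five rows are copied from the table above, and the threshold is raised to 12,000,000 purely so that the filter has something to drop.

```python
# A rough Python analogue of the BKO pipeline above. The function names are
# hypothetical; they only mimic what the operators appear to do.

# (geonames id, name, population), copied from the first rows of the table.
cities = [
    ("1816670", "Beijing", 11716620),
    ("1796236", "Shanghai", 22315474),
    ("1275339", "Mumbai", 12691836),
    ("3435910", "Buenos Aires", 13076300),
    ("3530597", "Mexico City", 12294193),
]


def drop_below(rows, threshold):
    """Keep rows whose population is >= threshold (cf. drop-below[...])."""
    return [row for row in rows if row[2] >= threshold]


def coeff_sort(rows):
    """Sort rows by population, largest first (cf. coeff-sort)."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


def to_comma_number(n):
    """Format an integer with thousands separators (cf. to-comma-number)."""
    return f"{n:,}"


def rank_rows(rows, threshold, limit):
    """Filter, sort, truncate, and attach a 1-based rank to each row."""
    top = coeff_sort(drop_below(rows, threshold))[:limit]
    return [(rank, cid, name, to_comma_number(pop))
            for rank, (cid, name, pop) in enumerate(top, start=1)]


if __name__ == "__main__":
    # Threshold chosen for illustration only; the post uses 2,000,000.
    for row in rank_rows(cities, threshold=12_000_000, limit=30):
        print(row)
```

With this sample, Beijing falls below the illustrative threshold and is dropped, and the remaining four rows come out in the same order as in the rank table.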
One thing worth noting: some of the populations here are wildly different from the DBpedia results. I presume that comes down to different ways of counting population (e.g., the center of a city vs. a broader definition).
I guess the other thing about this post is that I'm not a big fan of the SPARQL notation. I much prefer my own notation, but heh, I'm kinda biased.
The other thing to note is that I now have 4 permutations of the table operator: table[], strict-table[], rank-table[], and strict-rank-table[]. Also, I tweaked table[] so that it auto-sets all coeffs of the incoming superposition to 1 (using the set-to[1] sigmoid). I figure there are no use cases where we want non-1 coeffs for the incoming superposition. If I'm wrong, I just have to comment out one line of code.
Update: what if we want a table without the "id" column? Well, that's not super easy; we have to do it indirectly again. Also note that in the geonames data the id is unique, but names are not. Anyway, our code neatly handles ambiguity like that.
Now, using the same |result> as defined above, let's take a quick look at what we know:
sa: select[1,5] "" |result>
22315474.000|id: 1796236> + 13076300.000|id: 3435910> + 12691836.000|id: 1275339> + 12294193.000|id: 3530597> + 11716620.000|id: 1816670>
sa: name select[1,5] "" |result>
22315474.000|Shanghai> + 13076300.000|Buenos Aires> + 12691836.000|Mumbai> + 12294193.000|Mexico City> + 11716620.000|Beijing>
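The step from ids to names above can be pictured as relabelling each ket while keeping its coefficient. Here is a minimal Python sketch using the five values shown above. The list-of-pairs representation and the apply_op and show helpers are assumptions made for illustration, and summing coefficients when two kets end up with the same label is likewise an assumption about how a superposition would combine them.

```python
# Illustrative model only: a superposition as a list of (coeff, label) pairs,
# and an operator as a plain dict from label to label.

name = {
    "id: 1796236": "Shanghai",
    "id: 3435910": "Buenos Aires",
    "id: 1275339": "Mumbai",
    "id: 3530597": "Mexico City",
    "id: 1816670": "Beijing",
}

result = [
    (22315474.0, "id: 1796236"),
    (13076300.0, "id: 3435910"),
    (12691836.0, "id: 1275339"),
    (12294193.0, "id: 3530597"),
    (11716620.0, "id: 1816670"),
]


def apply_op(mapping, superposition):
    """Relabel every ket; coefficients of identical labels are summed
    (an assumption of this sketch)."""
    out = {}
    for coeff, label in superposition:
        new_label = mapping[label]
        out[new_label] = out.get(new_label, 0.0) + coeff
    return [(coeff, label) for label, coeff in out.items()]


def show(superposition):
    """Render in the console style used above: coeff|label> + ..."""
    return " + ".join(f"{coeff:.3f}|{label}>" for coeff, label in superposition)


if __name__ == "__main__":
    print(show(apply_op(name, result)))
```

For these five ids the names are all distinct, so the output has the same coefficients as the input, just with the labels swapped.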
-- now define some operators:
-- define the popn operator (NB: the "id" just before |_self> maps back from "name" to the geonames "id"):
popn |*> #=> to-comma-number extract-value population id |_self>
-- define improved popn operator (sorts population results):
improved-popn |*> #=> to-comma-number extract-value population reverse sort-by[population] id |_self>
-- a variant of that is this (makes use of list-to-words):
improved-popn |*> #=> list-to-words to-comma-number extract-value population reverse sort-by[population] id |_self>
-- define the "count-name-matches" operator, which tells us the number of different objects sharing a name:
-- (defined to ignore value if count == 1)
count-name-matches |*> #=> push-float drop-below[2] pop-float extract-value count id |_self>
-- show the resulting table:
sa: rank-table[city,improved-popn,count-name-matches] name "" |result>
+------+----------------------+------------------------------------+--------------------+
| rank | city | improved-popn | count-name-matches |
+------+----------------------+------------------------------------+--------------------+
| 1 | Shanghai | 22,315,474 | |
| 2 | Buenos Aires | 13,076,300 | |
| 3 | Mumbai | 12,691,836 | |
| 4 | Mexico City | 12,294,193 | |
| 5 | Beijing | 11,716,620 | |
| 6 | Karachi | 11,624,219 | |
| 7 | Istanbul | 11,174,257 | |
| 8 | Tianjin | 11,090,314 | |
| 9 | Guangzhou | 11,071,424 | |
| 10 | Delhi | 10,927,986 | |
| 11 | Manila | 10,444,527 | |
| 12 | Moscow | 10,381,222, 23,800 | 2 |
| 13 | Shenzhen | 10,358,381 | |
| 14 | Dhaka | 10,356,500, 36,111 | 2 |
| 15 | Seoul | 10,349,312 | |
| 16 | Sao Paulo | 10,021,295 | |
| 17 | Wuhan | 9,785,388 | |
| 18 | Lagos | 9,000,000, 18,831 | 2 |
| 19 | Jakarta | 8,540,121 | |
| 20 | Tokyo | 8,336,599 | |
| 21 | New York City | 8,175,133 | |
| 22 | Dongguan | 8,000,000 | |
| 23 | Taipei | 7,871,900 | |
| 24 | Kinshasa | 7,785,965 | |
| 25 | Lima | 7,737,002, 38,771 | 2 |
| 26 | Cairo | 7,734,614 | |
| 27 | Bogota | 7,674,366 | |
| 28 | City of London | 7,556,900 | |
| 29 | London | 7,556,900, 346,765 | 2 |
| 30 | Chongqing | 7,457,600 | |
| 31 | Chengdu | 7,415,590 | |
| 32 | Nanjing | 7,165,292 | |
| 33 | Tehran | 7,153,309 | |
| 34 | Nanchong | 7,150,000 | |
| 35 | Hong Kong | 7,012,738 | |
| 36 | Xi'an | 6,501,190 | |
| 37 | Lahore | 6,310,888 | |
| 38 | Shenyang | 6,255,921 | |
| 39 | Hangzhou | 6,241,971 | |
| 40 | Rio de Janeiro | 6,023,699 | |
| 41 | Harbin | 5,878,939 | |
| 42 | Baghdad | 5,672,513 | |
| 43 | Tai'an | 5,499,000 | |
| 44 | Suzhou | 5,345,961, 205,130 | 2 |
| 45 | Shantou | 5,329,024 | |
| 46 | Bangkok | 5,104,476 | |
| 47 | Bangalore | 5,104,047 | |
| 48 | Saint Petersburg | 5,028,000, 244,769 | 2 |
| 49 | Santiago | 4,837,295, 108,414, 46,611, 35,968 | 4 |
| 50 | Kolkata | 4,631,392 | |
| 51 | Sydney | 4,627,345, 105,968 | 2 |
| 52 | Toronto | 4,612,191 | |
| 53 | Yangon | 4,477,638 | |
| 54 | Jinan | 4,335,989 | |
| 55 | Chennai | 4,328,063 | |
| 56 | Zhengzhou | 4,253,913 | |
| 57 | Melbourne | 4,246,375, 76,068 | 2 |
| 58 | Riyadh | 4,205,961 | |
| 59 | Changchun | 4,193,073 | |
| 60 | Dalian | 4,087,733 | |
| 61 | Chittagong | 3,920,222 | |
| 62 | Kunming | 3,855,346 | |
| 63 | Alexandria | 3,811,516, 139,966, 49,346, 47,723 | 4 |
| 64 | Los Angeles | 3,792,621, 125,430 | 2 |
| 65 | Ahmedabad | 3,719,710 | |
| 66 | Qingdao | 3,718,835 | |
| 67 | Busan | 3,678,555 | |
| 68 | Abidjan | 3,677,115 | |
| 69 | Kano | 3,626,068 | |
| 70 | Foshan | 3,600,000 | |
| 71 | Hyderabad | 3,597,816, 1,386,330 | 2 |
| 72 | Puyang | 3,590,000 | |
| 73 | Yokohama | 3,574,443 | |
| 74 | Ibadan | 3,565,108 | |
| 75 | Singapore | 3,547,809 | |
| 76 | Wuxi | 3,543,719, 66,442 | 2 |
| 77 | Xiamen | 3,531,347 | |
| 78 | Ankara | 3,517,182 | |
| 79 | Tianshui | 3,500,000 | |
| 80 | Ningbo | 3,491,597 | |
| 81 | Ho Chi Minh City | 3,467,331 | |
| 82 | Shiyan | 3,460,000, 408,055 | 2 |
| 83 | Cape Town | 3,433,441 | |
| 84 | Taiyuan | 3,426,519 | |
| 85 | Berlin | 3,426,354 | |
| 86 | Tangshan | 3,372,102 | |
| 87 | Hefei | 3,310,268 | |
| 88 | Montreal | 3,268,513 | |
| 89 | Madrid | 3,255,944, 50,437 | 2 |
| 90 | Pyongyang | 3,222,000 | |
| 91 | Casablanca | 3,144,909 | |
| 92 | Zibo | 3,129,228 | |
| 93 | Zhongshan | 3,121,275 | |
| 94 | Durban | 3,120,282 | |
| 95 | Changsha | 3,093,980 | |
| 96 | Kabul | 3,043,532 | |
| 97 | UEruemqi | 3,029,372 | |
| 98 | Caracas | 3,000,000 | |
| 99 | Pune | 2,935,744 | |
| 100 | Surat | 2,894,504 | |
| 101 | Jeddah | 2,867,446 | |
| 102 | Shijiazhuang | 2,834,942 | |
| 103 | Kanpur | 2,823,249 | |
| 104 | Kiev | 2,797,553 | |
| 105 | Luanda | 2,776,168 | |
| 106 | Quezon City | 2,761,720 | |
| 107 | Addis Ababa | 2,757,729 | |
| 108 | Nairobi | 2,750,547 | |
| 109 | Salvador | 2,711,840 | |
| 110 | Jaipur | 2,711,758 | |
| 111 | Dar es Salaam | 2,698,652 | |
| 112 | Chicago | 2,695,598 | |
| 113 | Lanzhou | 2,628,426 | |
| 114 | Incheon | 2,628,000 | |
| 115 | Yunfu | 2,612,800 | |
| 116 | Navi Mumbai | 2,600,000 | |
| 117 | Al Basrah | 2,600,000 | |
| 118 | Osaka-shi | 2,592,413 | |
| 119 | Mogadishu | 2,587,183 | |
| 120 | Daegu | 2,566,540 | |
| 121 | Faisalabad | 2,506,595 | |
| 122 | Izmir | 2,500,603 | |
| 123 | Dakar | 2,476,400 | |
| 124 | Lucknow | 2,472,011 | |
| 125 | Al Jizah | 2,443,203 | |
| 126 | Fortaleza | 2,400,000 | |
| 127 | Cali | 2,392,877 | |
| 128 | Surabaya | 2,374,658 | |
| 129 | Belo Horizonte | 2,373,224 | |
| 130 | Nanchang | 2,357,839 | |
| 131 | Grand Dakar | 2,352,057 | |
| 132 | Rome | 2,318,895, 36,303, 33,725 | 3 |
| 133 | Mashhad | 2,307,177 | |
| 134 | Brooklyn | 2,300,664 | |
| 135 | Borough of Queens | 2,272,771 | |
| 136 | Nagpur | 2,228,018 | |
| 137 | Maracaibo | 2,225,000 | |
| 138 | Brasilia | 2,207,718 | |
| 139 | Santo Domingo | 2,201,941, 45,476 | 2 |
| 140 | Nagoya-shi | 2,191,279 | |
| 141 | Brisbane | 2,189,878 | |
| 142 | Havana | 2,163,824 | |
| 143 | Paris | 2,138,551, 25,171 | 2 |
| 144 | Houston | 2,099,451 | |
| 145 | Al Mawsil al Jadidah | 2,065,597 | |
| 146 | Johannesburg | 2,026,469 | |
| 147 | Kowloon | 2,019,533 | |
| 148 | Al Basrat al Qadimah | 2,015,483 | |
| 149 | Almaty | 2,000,900 | |
+------+----------------------+------------------------------------+--------------------+
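To make the name ambiguity concrete, here is a small Python sketch of the name to id to population lookup that improved-popn and count-name-matches rely on. It is a loose analogue rather than the actual implementation: the function names just mirror the operators, the populations are copied from the table above, and every id beginning with "placeholder-" is an invented stand-in, since the table does not show the real geonames ids for those rows.

```python
# Sketch of the name -> id -> population lookup. Populations come from the
# table above; ids starting with "placeholder-" are NOT real geonames ids.
from collections import defaultdict

records = [
    ("1796236", "Shanghai", 22315474),
    ("524901", "Moscow", 10381222),
    ("placeholder-1", "Moscow", 23800),
    ("placeholder-2", "Santiago", 108414),
    ("placeholder-3", "Santiago", 4837295),
    ("placeholder-4", "Santiago", 35968),
    ("placeholder-5", "Santiago", 46611),
]

# Inverse of the "name" operator: one name may map back to several ids.
ids_by_name = defaultdict(list)
population = {}
for cid, city_name, pop in records:
    ids_by_name[city_name].append(cid)
    population[cid] = pop


def improved_popn(city_name):
    """All populations recorded under a name, largest first, comma-formatted."""
    pops = sorted((population[cid] for cid in ids_by_name.get(city_name, [])),
                  reverse=True)
    return ", ".join(f"{p:,}" for p in pops)


def count_name_matches(city_name):
    """Number of ids sharing the name, or "" when there are fewer than two."""
    count = len(ids_by_name.get(city_name, []))
    return str(count) if count >= 2 else ""


if __name__ == "__main__":
    for city_name in ("Shanghai", "Moscow", "Santiago"):
        print(city_name, "|", improved_popn(city_name), "|",
              count_name_matches(city_name))
```

The point is the same as in the table: a unique name gives one population and an empty count, while a shared name gives every matching population, sorted, plus the number of matches.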
And that's it for now. Heaps more to come, as usual.