Tuesday, 17 February 2015

a similarity metric

In this post I will try to explain the derivation of my similarity metric.This thing is useful in a bunch of places.

First, note that in my travels I have seen this given as a similarity metric:
similarity-metric(f,g,s) = exp(-||f - g||^2/2s)
So, if f == g, it returns 1, if ||f - g||^2 is big, it approaches 0.

Now, for the derivation of my metric.
First, define a multiplication operator that will simplify the equations a little:
a*b = \Sum_k abs(a[k] . b[k]) -- discrete version
a*b = \Int dt abs(a(t) . b(t)) -- continuous version
where . is the standard multiplication operator.
Both of these have the property:
0 <= w*[f - g] <= w*f + w*g
where 0 if f == g, and w*f + w*g if f and g are perfectly disjoint.
which is really just another way of stating a couple of standard properties of metrics:
2. d(x,y) = 0 if and only if x = y
4. d(x,z) <= d(x,y) + d(y,z)
And it is illustrative to note that:
w*[f - g] = \Sum_k |w[k].f[k] - w[k].g[k]|  -- discrete version
w*[f - g] = \Int dt |w(t).f(t) - w(t).g(t)| -- continuous version
Now, just divide through by the right hand side, casting it into the range [0,1]
0 <= w*[f - g]/(w*f + w*g) <= 1
Now, this is a good start, but for some examples it doesn't work as well as expected (maybe one day I will dig up an example to show my point!).
But we can fix this by adding a term: R.abs(w*f - w*g), where R >= 0, and works best with R = 1
And hence simm:
0 <= (w*[f - g] + R.abs(w*f - w*g))/(w*f + w*g + R.abs(w*f - w*g)) <= 1
So, OK. Let's set R = 1, and note that a + b + abs(a - b) = 2.max(a,b), we can now compact it down to:
0 <= (w*[f - g] + abs(w*f - w*g))/2.max(w*f,w*g) <= 1
Cool. And now observe, as in the physics tradition, this equation has some symmetries:
given the scalar k (not equal to 0), we have:
1) symmetry under w => k.w                       
2) symmetry under f => k.f and g => k.g
3) symmetry under w => w.exp(i*t)/R(t), f => R(t).exp(-i*t).f, g => R(t).exp(-i*t).g   
Now, I can't think of a use case for the third symmetry there, but I put it in because it reminds me of gauge transformations in physics, and is a good decider on what terms are valid to use in our metric. eg: w*f, w*g, w*[f - g], abs(w*f - w*g).

We are about done. Final thing to note is the above metric returns 0 for exact match, and 1 for perfectly disjoint. If we want to swap this just map the metric m to 1 - m (presuming m "attains" its upper and lower bounds)
ie:
(w*[f - g] + R.abs(w*f - w*g))/(w*f + w*g + R.abs(w*f - w*g))
becomes:
1 - (w*[f - g] + R.abs(w*f - w*g))/(w*f + w*g + R.abs(w*f - w*g))
  = (w*f + w*g + R.abs(w*f - w*g) - w*[f - g] - R.abs(w*f - w*g))/(w*f + w*g + R.abs(w*f - w*g))
  = (w*f + w*g - w*[f - g])/(w*f + w*g + R.abs(w*f - w*g)) 
  = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)                -- assuming R = 1
So there we have it:
simm(w,f,g) = (w*f + w*g - w*[f - g])/2.max(w*f,w*g)
NB: if w is not given, ie we are using simm(f,g), then assume w[k] = 1 for all k, or w(t) = 1 for all t.   
Now a couple of observations:
1) the above derivation of my simm was based on the Manhattan metric. A similar derivation could be done using the standard Euclidean metric. Or presumably any metric. Though I think Manhattan is better than Euclidean because it is in some sense linear, and doesn't require any sqrt()s.
2) the fact that the denominator collapses down only when R = 1 is a hint that that is the right choice!
3) from experience playing with this, simm(w,f,g) usually, though not always, gives best results when f and g are rescaled so that w*f == w*g (excluding the case where they are length 1 vectors, in which case you should not rescale!)
4) if f[k] and g[k] are >= 0 for all terms, then the equation can be compressed using a + b - abs(a - b) = 2.min(a,b), and becomes:
simm(w,f,g) = \Sum_k w[k] min(f[k],g[k]) / max(w*f,w*g)

And that's it! An implementation in python, and some examples coming up next.

Update: A quick expansion on observation (1). So, using the Manhattan metric as the base, we have:
w*[f - g] = \Sum_k |w[k].f[k] - w[k].g[k]|
If we used the Eucldiean metric instead, we would have:
w*[f - g] = sqrt(\Sum_k |w[k].f[k] - w[k].g[k]|^2)
and hence my comment "in some sense linear".

And an expansion on observation (4):
Consider just the numerator:
w*f + w*g - w*[f - g]
Now, expand that out using our definition of a*b:
w*f + w*g - w*[f - g]
= \Sum_k |w[k].f[k]| + \Sum_k |w[k].g[k]| - \Sum_k |w[k].f[k] - w[k].g[k]|
= \Sum_k |w[k].f[k]| + |w[k].g[k]| - |w[k].f[k] - w[k].g[k]|
= \Sum_k |w[k]| (|f[k]| + |g[k]| - |f[k] - g[k]|)
Now, make use of "f[k] and g[k] are >= 0 for all terms":
= \Sum_k |w[k]| (f[k] + g[k] - |f[k] - g[k]|)
Now, make use of a + b - abs(a - b) = 2.min(a,b):
= \Sum_k |w[k]| 2.min(f[k],g[k])
And hence (assuming w[k] >= 0 for all k):
simm(w,f,g) = \Sum_k w[k] min(f[k],g[k]) / max(w*f,w*g)
as promised! And we take this pretty compression as a strong hint that Manhattan was a better choice than Euclidean! BTW, also note the factor of 2 from in front of min, and max cancelled out.

Update: OK. I don't know a use case for these, but we can generalize to more than pairs {f,g}.
simm(w,f1,f2) = \Sum_k w[k] min(f1[k],f2[k])/max(w*f1,w*f2)
simm(w,f1,f2,f3) = \Sum_k w[k] min(f1[k],f2[k],f3[k])/max(w*f1,w*f2,w*f3)
simm(w,f1,f2,f3,f4) = \Sum_k w[k] min(f1[k],f2[k],f3[k],f4[k])/max(w*f1,w*f2,w*f3,w*f4)
simm(w,f1,f2,f3,f4,f5) = \Sum_k w[k] min(f1[k],f2[k],f3[k],f4[k],f5[k])/max(w*f1,w*f2,w*f3,w*f4,w*f5)
and so on.

Sunday, 15 February 2015

new feature: memoizing rules

So, I was going to write up about "similar" in this phase of the project, but today I implemented a new feature I want to mention: "memoizing rules". Rules that save the results of their computation. I guess the motivation was quantum mechanics. This whole project I have been borrowing/co-opting ideas from QM. bra's, kets, operators, superpositions, exponentiating operators, and so on.

Indeed, stored rules were based on the QM idea that an object doesn't have a property until you measure it. In BKO, say an example like this (making use of stored rules):
is-alive |cat> #=> normalize pick-elt (0.5|yes> + 0.5|no>)

OK. So that is the start of a representation of Schroedinger's poor cat, but what about wave-fn collapse? The idea that before measurement it is in a superposition, but after measurement it is in a single state. Stored rules almost represent this, but not quite. And hence memoizing rules (NB: the "!=>"):
is-alive |cat> !=> normalize pick-elt (0.5|yes> + 0.5|no>)

With this new notation we have both "is-alive |cat>" is undefined until we measure, and once we measure it, it collapses into one state. Hopefully an example in the console will help explain:
-- load up the rules, and see what we have:
sa: dump
----------------------------------------
|context> => |context: memoizing Schrodingers cat>

is-alive |cat> #=> normalize pick-elt (0.5|yes> + 0.5|no>)
memoizing-is-alive |cat> !=> normalize pick-elt (0.5|yes> + 0.5|no>)
----------------------------------------
Now, ask in the console the status of the cat:
sa: is-alive |cat>
|yes>

sa: memoizing-is-alive |cat>
|no>
Now see what we have:
sa: dump
----------------------------------------
|context> => |context: memoizing Schrodingers cat>

is-alive |cat> #=> normalize pick-elt (0.5|yes> + 0.5|no>)
memoizing-is-alive |cat> => |no>
----------------------------------------
So there we have it! Undefined value until measurement, and collapse to a single value after measurement.

OK. So that is cool and all, but what is the use of this? Well, one is to drastically speed up recursive algo's like Fibonacci. Here, let me show:
sa: dump
----------------------------------------
|context> => |context: memoizing Fibonacci>

fib |0> => |0>
fib |1> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> !=> arithmetic(fib n-1 |_self>,|+>,fib n-2 |_self>)
----------------------------------------

-- ask value of fib |13>
sa: fib |13>
|233>
 
-- see what we now know:
sa: dump
----------------------------------------
|context> => |context: memoizing Fibonacci>

fib |0> => |0>
fib |1> => |1>

n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> !=> arithmetic(fib n-1 |_self>,|+>,fib n-2 |_self>)

fib |2> => |1>
fib |3> => |2>
fib |4> => |3>
fib |5> => |5>
fib |6> => |8>
fib |7> => |13>
fib |8> => |21>
fib |9> => |34>
fib |10> => |55>
fib |11> => |89>
fib |12> => |144>
fib |13> => |233>
----------------------------------------
Cool, huh! And is a massive speed up! I've been told normal Fibonacci is O(n!), not sure exactly what the memoizing fib is, but it is closer to O(n). And powerfully enough, we don't really have to do much work to use this auto-memoizing feature. Instead of using stored-rules "#=>", just use the exact same code with a memoizing-rule "!=>".

Update: I just ran fib-ratio |100>, and it was smoking fast! Almost instant. Previously, fib-ratio |30> was sloooow. Code here.

Friday, 13 February 2015

Announcing phase 3: similar and find-topic operators

Phase 1 was literal operators, phase 2 was function operators, and now phase 3 we concentrate on a couple of special function operators:
similar[op] |x>
find-topic[op] |x>

I'll start with similar in the next post. Heaps more to come!

Thursday, 12 February 2015

This Concludes Phase 2

So, this post concludes phase 2 of my write-up. Phase 1 was largely about (trying) to explain the basics, and concentrating largely on literal operators (recall, defined using OP KET => SUPERPOSITION). Phase 2 was largely about describing and giving some example usages of function operators. Also, I hope, along the way I demonstrated some of the power of the whole BKO scheme.

Where the "BKO scheme" is the idea that you can get a long way just with the idea that everything is a ket, a superposition, or an operator that acts on kets and/or superpositions, and returns kets or superpositions. (and maybe eventually I will include sequences in this scheme. The idea being that superpositions are a space based collection of kets, and sequences are a time based collection of kets. With more thought, another way to look at it is for superpositions the order of the kets is irrelevant, ie kets commute, with sequences the order of the kets is important.)

One consequence of this unified model is you no longer need a new class for each type of object you are working with (the standard Object-Orientated model). Instead we use category/data-type labels to represent that. So instead of needing to define say a person class, we use:
op |person: *> #=> ...
for an operator that works for all members of type person. And:
op |plant: tree: *> #=> ...
for an operator that works for all types of trees.
op |plant: tree: elm> #=> ...
for an operator that works just for trees of type elm.
op |*> #=> ...
for an operator that works for all types (though usually this just means we have been too lazy to specify the data-type!).

Indeed, if we look at the code, we have, and need, only a handful of classes. One for kets, superpositions, stored-rules, context, and context-list. That's it!

Talking of code, at some stage I will link to it. The long term plan is for this to be open source, and for others to bang on it. Of course, currently the code quality is terrible (but mostly works). When I started I was a programming newb, and did terrible things! For the shorter term plan I would love if I could get others on-board to map more data to sw format. And the other side of that, get more people (ie, someone other than me!) to use the sw format.

So world, welcome to the Feynman Knowledge Engine, a minimalist, but working, knowledge engine. And in the back end a custom data representation format (the sw format), and a language to work with it. Feynman being one of my physics hero's!

I have some interesting stuff coming up in phase 3, so as always, heaps more to come!

Update: OK. So I'm not going to link to the code just yet, but I will link to the console history file. The console saves a record of all work done at the console. This is useful in that you can look back and see how you did something, but also useful so others can see how I make use of the BKO scheme.

Update: So, I've been re-reading the Sci Am semantic web paper. In particular, this interested me:
"The real power of the Semantic Web will be realized when people create many programs that collect Web content from diverse sources, process the information and exchange the results with other programs.
...
A typical process will involve the creation of a "value chain" in which subassemblies of information are passed from one agent to another, each one "adding value," to construct the final product requested by the end user."

Strikes me we could do that with not too much work by using the sw format as the method to transfer information from agent to agent. Also strikes me that this will go some way to making the web/internet into a giant brain. Though that is still some way off!

Heh, I have a mental picture of the BKO scheme operators mapping incoming superpositions to out-going superpositions. And agents a level higher than that, mapping incoming sw files to out-going sw files. Interesting!

I guess the next question is what language will these agents be written in? It is kinda clunky to use my back-end python. I've been thinking of a new language called "swc" (ie, semantic-web-code). Not sure exactly what form it will take just yet, but the idea is sw is a subset, so all valid sw is valid in swc, but in contrast to sw it will also have multi-line constructs like loops and functions, and other features like input and output and so on. But I don't really have a mental image of what this full language will look like, yet.

Update: another comment on the power of the BKO scheme.
Say we have n different operators, and we step k steps from our starting superposition, then, there are n^k possible daughter superpositions! Now, not all of these will be useful, or distinct, but as a rough measure n^k is large! The more general, and powerful your operators, the more useful daughter superpositions you will have available to you! Cool!

And if we add 1 new general purpose operator, we now have (n+1)^k - n^k new daughter superpositions.

Indeed, if we have M starting superpositions, n operators, and we take a max of k steps, then the upper bound on the number of distinct and useful superpositions we have is:
M*(1 + n + n^2 + n^3 + ... + n^k-1 + n^k)

Update: we can consider operators to be analogous to tools. Say we are in state:
|hungry> + |have closed can of food>
OK. Not much we can do about solving that.
But if we learn about a "can-opener" (ie, a tool) and an "eat" operator:
sa: can-opener |have closed can of food> => |have open can of food>
sa: eat |have open can of food> => |not hungry> + |have empty can of food>

And now we can do:
sa: eat can-opener (|hungry> + |have closed can of food>)
|not hungry> + |have empty can of food>

Cool huh! And to me indicates BKO is the right notation/language to use!

Update: here is an interesting little observation. If some of your operators commute, ie "op1 op2 SP" = "op2 op1 SP" then we have less distinct daughter superpositions.

brief note on pretty print tables

Just wanted to make a visible note that I tweaked the pretty print table code. We no longer have to do the "extract-value" dance, the table code does that all automatically now. Almost always this is an improvement, though I can think of one case where it might not be. The idea is the column header pretty much indicates the category/data-type, so no need to repeat that for every row in the table.

Old:
sa: load foaf-example-in-sw.sw
sa: table[name,email,works-for] "" |list>
+------------------------+----------------------------------+------------------------+
| name                   | email                            | works-for              |
+------------------------+----------------------------------+------------------------+
| Dan                    | email: danbri@w3.org             | organisation: ILRT     |
| Libby                  | email: libby.miller@bris.ac.uk   | organisation: ILRT     |
| Craig                  | email: craig@netgates.co.uk      | organisation: Netgates |
| Liz                    |                                  | organisation: Netgates |
| Kathleen               |                                  | organisation: Netgates |
| Damian                 |                                  |                        |
| Martin                 | email: m.l.poulter@bristol.ac.uk |                        |
| organisation: ILRT     |                                  |                        |
| organisation: Netgates |                                  |                        |
+------------------------+----------------------------------+------------------------+
New:
sa: load foaf-example-in-sw.sw
sa: table[name,email,works-for] "" |list>
+----------+---------------------------+-----------+
| name     | email                     | works-for |
+----------+---------------------------+-----------+
| Dan      | danbri@w3.org             | ILRT      |
| Libby    | libby.miller@bris.ac.uk   | ILRT      |
| Craig    | craig@netgates.co.uk      | Netgates  |
| Liz      |                           | Netgates  |
| Kathleen |                           | Netgates  |
| Damian   |                           |           |
| Martin   | m.l.poulter@bristol.ac.uk |           |
| ILRT     |                           |           |
| Netgates |                           |           |
+----------+---------------------------+-----------+
Old:
sa: load early-us-presidents.sw
sa: table[name,full-name,president-number,party] "" |early US Presidents: _list>
+------------+---------------------------+------------------+------------------------------+
| name       | full-name                 | president-number | party                        |
+------------+---------------------------+------------------+------------------------------+
| Washington | person: George Washington | number: 1        | party: Independent           |
| Adams      | person: John Adams        | number: 2        | party: Federalist            |
| Jefferson  | person: Thomas Jefferson  | number: 3        | party: Democratic-Republican |
| Madison    | person: James Madison     | number: 4        | party: Democratic-Republican |
| Monroe     | person: James Monroe      | number: 5        | party: Democratic-Republican |
| Q Adams    | person: John Quincy Adams | number: 6        | party: Democratic-Republican |
+------------+---------------------------+------------------+------------------------------+

sa: years-in-office |*> #=> extract-value president-era |_self>
sa: table[name,years-in-office] "" |early US Presidents: _list>
+------------+------------------------------------------------------+
| name       | years-in-office                                      |
+------------+------------------------------------------------------+
| Washington | 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797 |
| Adams      | 1797, 1798, 1799, 1800, 1801                         |
| Jefferson  | 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809 |
| Madison    | 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817 |
| Monroe     | 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825 |
| Q Adams    | 1825, 1826, 1827, 1828, 1829                         |
+------------+------------------------------------------------------+
New (NB: we no longer need to define the "years-in-office" operator which applied "extract-value" to the president-era operator):
sa: table[name,full-name,president-number,party] "" |early US Presidents: _list>
+------------+-------------------+------------------+-----------------------+
| name       | full-name         | president-number | party                 |
+------------+-------------------+------------------+-----------------------+
| Washington | George Washington | 1                | Independent           |
| Adams      | John Adams        | 2                | Federalist            |
| Jefferson  | Thomas Jefferson  | 3                | Democratic-Republican |
| Madison    | James Madison     | 4                | Democratic-Republican |
| Monroe     | James Monroe      | 5                | Democratic-Republican |
| Q Adams    | John Quincy Adams | 6                | Democratic-Republican |
+------------+-------------------+------------------+-----------------------+

sa: table[name,president-era] "" |early US Presidents: _list>
+------------+------------------------------------------------------+
| name       | president-era                                        |
+------------+------------------------------------------------------+
| Washington | 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797 |
| Adams      | 1797, 1798, 1799, 1800, 1801                         |
| Jefferson  | 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809 |
| Madison    | 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817 |
| Monroe     | 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825 |
| Q Adams    | 1825, 1826, 1827, 1828, 1829                         |
+------------+------------------------------------------------------+
And I guess that's it. The difference between old and improved should be obvious enough.

Update: another tweak of the table code. Now we can dump everything we know about a given superposition without having to manually specify all the operators. Makes use of the "supported-ops" operator.

eg, consider:
sa: load foaf-example-in-sw.sw
sa: supported-ops "" |list>
3.000|op: where-live> + 2.000|op: lives-with> + 4.000|op: email> + 5.000|op: works-for> + |op: wife> + |op: knows-quite-well> + |op: where-lives> + 2.000|op: website> 

sa: load early-us-presidents.sw
sa: supported-ops "" |early US Presidents: _list>
6.000|op: president-number> + 6.000|op: president-era> + 6.000|op: party> + 6.000|op: full-name>
Point is, we can now do this (NB: the "*" instead of an operator label, noting also that * is not a valid operator name):
sa: table[name,*] "" |list>
+----------+--------------+--------------+---------------------------+-----------+------+---------------------------+-------------+----------------------------+
| name     | where-live   | lives-with   | email                     | works-for | wife | knows-quite-well          | where-lives | website                    |
+----------+--------------+--------------+---------------------------+-----------+------+---------------------------+-------------+----------------------------+
| Dan      | Zetland road | Libby, Craig | danbri@w3.org             | ILRT      |      |                           |             |                            |
| Libby    |              |              | libby.miller@bris.ac.uk   | ILRT      |      |                           |             |                            |
| Craig    |              |              | craig@netgates.co.uk      | Netgates  | Liz  |                           |             |                            |
| Liz      | Bristol      | Kathleen     |                           | Netgates  |      |                           |             |                            |
| Kathleen |              |              |                           | Netgates  |      |                           |             |                            |
| Damian   | London       |              |                           |           |      |                           |             |                            |
| Martin   |              |              | m.l.poulter@bristol.ac.uk |           |      | Craig, Damian, Dan, Libby | Bristol     |                            |
| ILRT     |              |              |                           |           |      |                           |             | http://ilrt.org/           |
| Netgates |              |              |                           |           |      |                           |             | http://www.netgates.co.uk/ |
+----------+--------------+--------------+---------------------------+-----------+------+---------------------------+-------------+----------------------------+

sa: table[name,*] "" |early US Presidents: _list>
+------------+------------------+------------------------------------------------------+-----------------------+-------------------+
| name       | president-number | president-era                                        | party                 | full-name         |
+------------+------------------+------------------------------------------------------+-----------------------+-------------------+
| Washington | 1                | 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797 | Independent           | George Washington |
| Adams      | 2                | 1797, 1798, 1799, 1800, 1801                         | Federalist            | John Adams        |
| Jefferson  | 3                | 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809 | Democratic-Republican | Thomas Jefferson  |
| Madison    | 4                | 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817 | Democratic-Republican | James Madison     |
| Monroe     | 5                | 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825 | Democratic-Republican | James Monroe      |
| Q Adams    | 6                | 1825, 1826, 1827, 1828, 1829                         | Democratic-Republican | John Quincy Adams |
+------------+------------------+------------------------------------------------------+-----------------------+-------------------+
I suppose I should make a couple of observations.
1) this code highlighted a bug in my foaf-example data-set! Note the "where-lives" instead of "where-live".
2) the "where-live" column shows one example where the auto extract-value on table elements is not ideal. It chomped off the "UK: " prefix. Recall:
where-live |Dan> => |UK: Bristol: Zetland road>
where-live |Liz> => |UK: Bristol>
where-live |Damian> => |UK: London>
I don't think there is much I can do about it though.

I guess that's it. Quite likely I will tweak the table code yet further!

Update: here is a fun, English like, description of supported ops:
sa: load foaf-example-in-sw.sw
sa: list-to-words int-coeffs-to-word extract-value supported-ops "" |list>
|3 where-live, 2 lives-with, 4 email, 5 works-for, 1 wife, 1 knows-quite-well, 1 where-lives and 2 website>
Heh, fun!

new functions: is-greater-than, is-greater-equal-than, etc

I decided a family of comparison functions will be useful, so added them to my function file, and "wired them in" to my processor.

A minimal description (noting there are two variations):
greater-than[t] |x: n> == |x: n> if float(n) > t, else |>
greater-equal-than[t] |x: n> == |x: n> if float(n) >= t, else |>
less-than[t] |x: n> == |x: n> if float(n) < t, else |>
less-equal-than[t] |x: n> == |x: n> if float(n) <= t, else |>
equal[t] |x: n> == |x: n> if  (t - epsilon) <= float(n) <= (t + epsilon), else |>
in-range[t1,t2] |x: n> == |x: n> if t1 <= float(n) <= t2, else |>
And the second variation:
is-greater-than[t] |x: n> == |yes> if float(n) > t, else |no>
is-greater-equal-than[t] |x: n> == |yes> if float(n) >= t, else |no>
is-less-than[t] |x: n> == |yes> if float(n) < t, else |no>
is-less-equal-than[t] |x: n> == |yes> if float(n) <= t, else |no>
is-equal[t] |x: n> == |yes> if  (t - epsilon) <= float(n) <= (t + epsilon), else |no>
is-in-range[t1,t2] |x: n> == |yes> if t1 <= float(n) <= t2, else |no>
They of course share some similarity with functions I already have, such as drop-below, drop-above, and some of the sigmoids.

Anyway, let's compare the "old" way, without these functions vs the new way:
Old:
is-greater-than-0 |*> #=> do-you-know push-float drop-below[1] pop-float |_self>
is-greater-than-5 |*> #=> do-you-know push-float drop-below[5.01] pop-float |_self>
count-name-matches |*> #=> push-float drop-below[2] pop-float extract-value count id |_self>
is-early |time: 24h: *> #=> do-you-know drop-above[700] drop-below[330] pop-float |_self>
is-late |time: 24h: *> #=> do-you-know (drop-below[2230] pop-float |_self> + drop-above[330] pop-float |_self>)
range-is-early |time: 24h: *> #=> do-you-know drop sigmoid-in-range[330,700] pop-float |_self>
range-is-late |time: 24h: *> #=> do-you-know drop (sigmoid-in-range[2230,2400] pop-float |_self> + sigmoid-in-range[0,330] pop-float |_self>)
is-teenager |person: *> #=> do-you-know drop-below[13] drop-above[19] pop-float age |_self>
is-adult |person: *> #=> do-you-know drop-below[18] pop-float age|_self>
range-is-teenager |person: *> #=> do-you-know drop sigmoid-in-range[13,19] pop-float age |_self> 
New:
improved-is-greater-than-0 |*> #=> is-greater-than[0] |_self>
improved-is-greater-than-5 |*> #=> is-greater-than[5] |_self>
improved-count-name-matches |*> #=> greater-than[1] count id |_self>
improved-range-is-early |time: 24h: *> #=> is-in-range[330,700] |_self>
improved-range-is-late |time: 24h: *> #=> do-you-know ( in-range[2230,2400] |_self> + in-range[0,330] |_self>)
improved-is-teenager |person: *> #=> is-in-range[13,19] age |_self>
improved-is-adult |person: *> #=> is-greater-equal-than[18] age |_self>
I guess that is about that. Like I said elsewhere, I'm always in favour of improvements that decrease the amount of work, or dancing, you need to do.

That's it for this post. Heaps more to come!

Update: I guess another point to observe is we can transfer not just data, but active rules when we pass around sw files. This I think will be useful!

Update: decided to implement a variant on these, that applies to the coeff of kets. This I guess is just syntactic sugar, so we don't need "is-equal[100] push-float", we can now do: "is-coeff-equal[100]".
coeff-greater-than[t] n|x: y> == n|x: y> if n > t, else |>
coeff-greater-equal-than[t] n|x: y> == n|x: y> if n >= t, else |>
coeff-less-than[t] n|x: y> == |x: y> if n < t, else |>
coeff-less-equal-than[t] n|x: y> == n|x: y> if n <= t, else |>
coeff-equal[t] n|x: y> == n|x: y> if  (t - epsilon) <= n <= (t + epsilon), else |>
coeff-in-range[t1,t2] n|x: y> == n|x: y> if t1 <= n <= t2, else |>
And the second variation:
is-coeff-greater-than[t] n|x: y> == |yes> if n > t, else |no>
is-coeff-greater-equal-than[t] n|x: y> == |yes> if n >= t, else |no>
is-coeff-less-than[t] n|x: y> == |yes> if n < t, else |no>
is-coeff-less-equal-than[t] n|x: y> == |yes> if n <= t, else |no>
is-coeff-equal[t] n|x: y> == |yes> if  (t - epsilon) <= n <= (t + epsilon), else |no>
is-coeff-in-range[t1,t2] n|x: y> == |yes> if t1 <= n <= t2, else |no>
Now I guess the comment comes up, how much is too little and how much is too much syntactic sugar? I guess I lean towards more rather than less.

Tuesday, 10 February 2015

XML vs BKO

So today, I want to briefly mention there is also overlap between XML and my sw/BKO notation. And of course, it being my child, I much prefer my notation! One advantage of my notation is that it is trivial to parse just using grep and sed, let alone python say. Another advantage is all the rest of the power of the BKO knowledge engine once in sw format.

So, how about an example?
Here is an XML breakfast menu:
<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <price>$5.95</price>
    <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
    <calories>650</calories>
  </food>
  <food>
    <name>Strawberry Belgian Waffles</name>
    <price>$7.95</price>
    <description>Light Belgian waffles covered with strawberries and whipped cream</description>
    <calories>900</calories>
  </food>
  <food>
    <name>Berry-Berry Belgian Waffles</name>
    <price>$8.95</price>
    <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description>
    <calories>900</calories>
  </food>
  <food>
    <name>French Toast</name>
    <price>$4.50</price>
    <description>Thick slices made from our homemade sourdough bread</description>
    <calories>600</calories>
  </food>
  <food>
    <name>Homestyle Breakfast</name>
    <price>$6.95</price>
    <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description>
    <calories>950</calories>
  </food>
</breakfast_menu>
And the same thing in BKO (I will admit BKO is probably harder to read than XML for those not familiar with the notation):
|context> => |context: breakfast menu>

 |menu: breakfast: list> => |food: Belgian Waffles> + |food: Strawberry Belgian Waffles> + |food: Berry-Berry Belgian Waffles> + |food: French Toast> + |food: Homestyle Breakfast>

name |food: Belgian Waffles> => |text: "Belgian Waffles">
price |food: Belgian Waffles> => |price: 5.95>
description |food: Belgian Waffles> => |text: "Two of our famous Belgian Waffles with plenty of real maple syrup">
calories |food: Belgian Waffles> => |calories: 650>

name |food: Strawberry Belgian Waffles> => |text: "Strawberry Belgian Waffles">
price |food: Strawberry Belgian Waffles> => |price: 7.95>
description |food: Strawberry Belgian Waffles> => |text: "Light Belgian waffles covered with strawberries and whipped cream">
calories |food: Strawberry Belgian Waffles> => |calories: 900>

name |food: Berry-Berry Belgian Waffles> => |text: "Berry-Berry Belgian Waffles">
price |food: Berry-Berry Belgian Waffles> => |price: 8.95>
description |food: Berry-Berry Belgian Waffles> => |text: "Light Belgian waffles covered with an assortment of fresh berries and whipped cream">
calories |food: Berry-Berry Belgian Waffles> => |calories: 900>

name |food: French Toast> => |text: "French Toast">
price |food: French Toast> => |price: 4.50>
description |food: French Toast> => |text: "Thick slices made from our homemade sourdough bread">
calories |food: French Toast> => |calories: 600>

name |food: Homestyle Breakfast> => |text: "Homestyle Breakfast">
price |food: Homestyle Breakfast> => |price: 6.95>
description |food: Homestyle Breakfast> => |text: "Two eggs, bacon or sausage, toast, and our ever-popular hash browns">
calories |food: Homestyle Breakfast> => |calories: 950> 
name |food: Belgian Waffles> => |text: "Belgian Waffles">
price |food: Belgian Waffles> => |price: 5.95>
description |food: Belgian Waffles> => |text: "Two of our famous Belgian Waffles with plenty of real maple syrup">
calories |food: Belgian Waffles> => |calories: 650>
And now, load her up, and print a table (noting the table code has been updated so it always extracts-values. I now think this is the way to go, but the idea is currently on probation):
sa: load clean-breakfast-menu.sw
sa: table[food-name,price,calories,description] "" |menu: breakfast: list>
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| food-name                   | price | calories | description                                                                           |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| Belgian Waffles             | 5.95  | 650      | "Two of our famous Belgian Waffles with plenty of real maple syrup"                   |
| Strawberry Belgian Waffles  | 7.95  | 900      | "Light Belgian waffles covered with strawberries and whipped cream"                   |
| Berry-Berry Belgian Waffles | 8.95  | 900      | "Light Belgian waffles covered with an assortment of fresh berries and whipped cream" |
| French Toast                | 4.50  | 600      | "Thick slices made from our homemade sourdough bread"                                 |
| Homestyle Breakfast         | 6.95  | 950      | "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"                 |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
And of course, we can sort by price:
table[food,price,calories,description] sort-by[price] "" |menu: breakfast: list>
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| food                        | price | calories | description                                                                           |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| French Toast                | 4.50  | 600      | "Thick slices made from our homemade sourdough bread"                                 |
| Belgian Waffles             | 5.95  | 650      | "Two of our famous Belgian Waffles with plenty of real maple syrup"                   |
| Homestyle Breakfast         | 6.95  | 950      | "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"                 |
| Strawberry Belgian Waffles  | 7.95  | 900      | "Light Belgian waffles covered with strawberries and whipped cream"                   |
| Berry-Berry Belgian Waffles | 8.95  | 900      | "Light Belgian waffles covered with an assortment of fresh berries and whipped cream" |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
And we can sort by calories:
table[food,price,calories,description] sort-by[calories] "" |menu: breakfast: list>
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| food                        | price | calories | description                                                                           |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
| French Toast                | 4.50  | 600      | "Thick slices made from our homemade sourdough bread"                                 |
| Belgian Waffles             | 5.95  | 650      | "Two of our famous Belgian Waffles with plenty of real maple syrup"                   |
| Strawberry Belgian Waffles  | 7.95  | 900      | "Light Belgian waffles covered with strawberries and whipped cream"                   |
| Berry-Berry Belgian Waffles | 8.95  | 900      | "Light Belgian waffles covered with an assortment of fresh berries and whipped cream" |
| Homestyle Breakfast         | 6.95  | 950      | "Two eggs, bacon or sausage, toast, and our ever-popular hash browns"                 |
+-----------------------------+-------+----------+---------------------------------------------------------------------------------------+
And that's it for this post. Heaps more to come!

Update: I don't know. Perhaps someone smarter than I could write an XML => sw, and a sw => XML transform code?

Thursday, 5 February 2015

another simple one, aiming towards natural language

This is just a simple one, again, inching ever so slowly towards natural language.

-- let's learn some rules:
nothing-like |*> #=> 0 |_self>
a-little-like |*> #=> 0.3 |_self>
like |*> #=> 0.7 |_self>
a-lot-like |*> #=> 0.9 |_self>
exactly-like |*> #=> |_self>

-- learn the taste of an orange:
taste |orange> => |citrus> + |sweet> + |juicy>

-- learn the taste of a banana is nothing like that of an orange:
taste |banana> #=> nothing-like taste |orange>

-- ask the taste of a banana:
sa: taste |banana>
0.000|citrus> + 0.000|sweet> + 0.000|juicy>

-- learn the taste of a mandarin is a lot like that of an orange:
taste |mandarin> #=> a-lot-like taste |orange>

-- ask the taste of a mandarin:
sa: taste |mandarin>
0.900|citrus> + 0.900|sweet> + 0.900|juicy>

Anyway, just some proof of concept, I suppose.

Next example:
-- learn: "a person is happy if they are whistling".
is-happy |*> #=> if(is-whistling |_self>,|yes>,|>)

-- learn Fred is whistling:
is-whistling |Fred> => |yes>

-- now ask some questions:
sa: do-you-know is-happy |Fred>
|yes>

sa: is-happy |Fred>
|yes>

sa: do-you-know is-happy |Sam>
|no>

sa: is-happy |Sam>
|>

Update: for lack of a better place to put this one.
-- learn a couple of "is-greater-than" operators:
is-greater-than-0 |*> #=> do-you-know push-float drop-below[1] pop-float |_self>
is-greater-than-5 |*> #=> do-you-know push-float drop-below[5.01] pop-float |_self>

-- now take a look:
sa: table[number,is-greater-than-0,is-greater-than-5] range(|-5>,|10>)
+--------+-------------------+-------------------+
| number | is-greater-than-0 | is-greater-than-5 |
+--------+-------------------+-------------------+
| -5     | no                | no                |
| -4     | no                | no                |
| -3     | no                | no                |
| -2     | no                | no                |
| -1     | no                | no                |
| 0      | no                | no                |
| 1      | yes               | no                |
| 2      | yes               | no                |
| 3      | yes               | no                |
| 4      | yes               | no                |
| 5      | yes               | no                |
| 6      | yes               | yes               |
| 7      | yes               | yes               |
| 8      | yes               | yes               |
| 9      | yes               | yes               |
| 10     | yes               | yes               |
+--------+-------------------+-------------------+
And we can also easily enough count the number greater than 0 in that range:
(makes use of linearity of literal operators, and that kets add)
sa: is-greater-than-0 range(|-5>,|10>)
6.000|no> + 10.000|yes>

And how many are greater than 5:
sa: is-greater-than-5 range(|-5>,|10>)
11.000|no> + 5.000|yes>

Update: a table version of "is-happy":
sa: is-happy |*> #=> if(is-whistling |_self>,|yes>,|>)
sa: is-whistling |Fred> => |yes>
sa: do-you-know-is-happy |*> #=> do-you-know is-happy |_self>
sa: table[name,do-you-know-is-happy,is-happy,is-whistling] (|Fred> + |Sam>) 
+------+----------------------+----------+--------------+
| name | do-you-know-is-happy | is-happy | is-whistling |
+------+----------------------+----------+--------------+
| Fred | yes                  | yes      | yes          |
| Sam  | no                   |          |              |
+------+----------------------+----------+--------------+

SPARQL vs BKO

So I've been doing a little bit of reading about dbpedia, since it has a lot of overlap with what I am trying to do! Amongst other things there, there are a few SPARQL examples. For example, "cities with more than 2 million habitants".

First in SPARQL:
(name and population for the top 30 biggest cities)
SELECT ?subject ?population WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
?subject <http://dbpedia.org/ontology/populationUrban> ?population.
FILTER (xsd:integer(?population) > 2000000)
}
ORDER BY DESC(xsd:integer(?population))
LIMIT 30
Then in BKO:
-- load up geonames data for all cities with population >= 15000:
sa: load improved-geonames-cities-15000.sw

-- restrict to cities with population >= 2,000,000, and sort the result:
sa: |result> => coeff-sort drop-below[2000000] population-self relevant-kets[population-self] |>

-- define population-comma operator (tidies our population results a little):
sa: population-comma |*> #=> to-comma-number extract-value population |_self>

-- how many results:
sa: how-many "" |result>
|number: 149>

-- print rank table of top 30 results:
sa: rank-table[id,name,population-comma] select[1,30] "" |result>
+------+-------------+----------------+------------------+
| rank | id          | name           | population-comma |
+------+-------------+----------------+------------------+
| 1    | id: 1796236 | Shanghai       | 22,315,474       |
| 2    | id: 3435910 | Buenos Aires   | 13,076,300       |
| 3    | id: 1275339 | Mumbai         | 12,691,836       |
| 4    | id: 3530597 | Mexico City    | 12,294,193       |
| 5    | id: 1816670 | Beijing        | 11,716,620       |
| 6    | id: 1174872 | Karachi        | 11,624,219       |
| 7    | id: 745044  | Istanbul       | 11,174,257       |
| 8    | id: 1792947 | Tianjin        | 11,090,314       |
| 9    | id: 1809858 | Guangzhou      | 11,071,424       |
| 10   | id: 1273294 | Delhi          | 10,927,986       |
| 11   | id: 1701668 | Manila         | 10,444,527       |
| 12   | id: 524901  | Moscow         | 10,381,222       |
| 13   | id: 1795565 | Shenzhen       | 10,358,381       |
| 14   | id: 1185241 | Dhaka          | 10,356,500       |
| 15   | id: 1835848 | Seoul          | 10,349,312       |
| 16   | id: 3448439 | Sao Paulo      | 10,021,295       |
| 17   | id: 1791247 | Wuhan          | 9,785,388        |
| 18   | id: 2332459 | Lagos          | 9,000,000        |
| 19   | id: 1642911 | Jakarta        | 8,540,121        |
| 20   | id: 1850147 | Tokyo          | 8,336,599        |
| 21   | id: 5128581 | New York City  | 8,175,133        |
| 22   | id: 1812545 | Dongguan       | 8,000,000        |
| 23   | id: 1668341 | Taipei         | 7,871,900        |
| 24   | id: 2314302 | Kinshasa       | 7,785,965        |
| 25   | id: 3936456 | Lima           | 7,737,002        |
| 26   | id: 360630  | Cairo          | 7,734,614        |
| 27   | id: 3688689 | Bogota         | 7,674,366        |
| 28   | id: 2643741 | City of London | 7,556,900        |
| 29   | id: 2643743 | London         | 7,556,900        |
| 30   | id: 1814906 | Chongqing      | 7,457,600        |
+------+-------------+----------------+------------------+
Of mild note is some populations here are wildly different from the dbpedia results. Different ways of counting population (eg, center of a city, vs a broader definition) I presume.

I guess the other thing about this post is I'm not a big fan of the SPARQL notation. I much prefer my notation, but heh, I'm kinda biased.

The other thing to note is I now have 4 permutations of the table operator: table[], strict-table[], rank-table[], and strict-rank-table[]. Also, I tweaked table[] so that it auto-sets all coeffs of the incoming superposition to 1 (using the set-to[1] sigmoid). I figure there are no use cases for when we want non 1 coeffs for the incoming superposition. If I'm wrong, then just have to comment out one line of code.

Update: what if we want a table without the "id" column? Well, not super easy. We have to do it indirectly again. And also noting that in the geonames data, id is unique, but names are not. Anyway, our code neatly handles ambiguity like that.

Now, using the same |result> as defined above, let's take a quick look at what we know:
sa: select[1,5] "" |result>
22315474.000|id: 1796236> + 13076300.000|id: 3435910> + 12691836.000|id: 1275339> + 12294193.000|id: 3530597> + 11716620.000|id: 1816670>

sa: name select[1,5] "" |result>
22315474.000|Shanghai> + 13076300.000|Buenos Aires> + 12691836.000|Mumbai> + 12294193.000|Mexico City> + 11716620.000|Beijing>

-- now define some operators:
-- define the pop operator (NB: the "id" just before |_self> that maps back from "name" to geoname "id"):
popn |*> #=> to-comma-number extract-value population id |_self>

-- define improved popn operator (sorts population results):
improved-popn |*> #=> to-comma-number extract-value population reverse sort-by[population] id |_self>

-- a variant of that is this (makes use of list-to-words):
improved-popn |*> #=> list-to-words to-comma-number extract-value population reverse sort-by[population] id |_self>

-- define the "count-name-matches" operator, which tells us the number of different objects sharing a name:
-- (defined to ignore value if count == 1) 
count-name-matches |*> #=> push-float drop-below[2] pop-float extract-value count id |_self>

-- show the resulting table:
sa: rank-table[city,improved-popn,count-name-matches] name "" |result>
+------+----------------------+------------------------------------+--------------------+
| rank | city                 | improved-popn                      | count-name-matches |
+------+----------------------+------------------------------------+--------------------+
| 1    | Shanghai             | 22,315,474                         |                    |
| 2    | Buenos Aires         | 13,076,300                         |                    |
| 3    | Mumbai               | 12,691,836                         |                    |
| 4    | Mexico City          | 12,294,193                         |                    |
| 5    | Beijing              | 11,716,620                         |                    |
| 6    | Karachi              | 11,624,219                         |                    |
| 7    | Istanbul             | 11,174,257                         |                    |
| 8    | Tianjin              | 11,090,314                         |                    |
| 9    | Guangzhou            | 11,071,424                         |                    |
| 10   | Delhi                | 10,927,986                         |                    |
| 11   | Manila               | 10,444,527                         |                    |
| 12   | Moscow               | 10,381,222, 23,800                 | 2                  |
| 13   | Shenzhen             | 10,358,381                         |                    |
| 14   | Dhaka                | 10,356,500, 36,111                 | 2                  |
| 15   | Seoul                | 10,349,312                         |                    |
| 16   | Sao Paulo            | 10,021,295                         |                    |
| 17   | Wuhan                | 9,785,388                          |                    |
| 18   | Lagos                | 9,000,000, 18,831                  | 2                  |
| 19   | Jakarta              | 8,540,121                          |                    |
| 20   | Tokyo                | 8,336,599                          |                    |
| 21   | New York City        | 8,175,133                          |                    |
| 22   | Dongguan             | 8,000,000                          |                    |
| 23   | Taipei               | 7,871,900                          |                    |
| 24   | Kinshasa             | 7,785,965                          |                    |
| 25   | Lima                 | 7,737,002, 38,771                  | 2                  |
| 26   | Cairo                | 7,734,614                          |                    |
| 27   | Bogota               | 7,674,366                          |                    |
| 28   | City of London       | 7,556,900                          |                    |
| 29   | London               | 7,556,900, 346,765                 | 2                  |
| 30   | Chongqing            | 7,457,600                          |                    |
| 31   | Chengdu              | 7,415,590                          |                    |
| 32   | Nanjing              | 7,165,292                          |                    |
| 33   | Tehran               | 7,153,309                          |                    |
| 34   | Nanchong             | 7,150,000                          |                    |
| 35   | Hong Kong            | 7,012,738                          |                    |
| 36   | Xi'an                | 6,501,190                          |                    |
| 37   | Lahore               | 6,310,888                          |                    |
| 38   | Shenyang             | 6,255,921                          |                    |
| 39   | Hangzhou             | 6,241,971                          |                    |
| 40   | Rio de Janeiro       | 6,023,699                          |                    |
| 41   | Harbin               | 5,878,939                          |                    |
| 42   | Baghdad              | 5,672,513                          |                    |
| 43   | Tai'an               | 5,499,000                          |                    |
| 44   | Suzhou               | 5,345,961, 205,130                 | 2                  |
| 45   | Shantou              | 5,329,024                          |                    |
| 46   | Bangkok              | 5,104,476                          |                    |
| 47   | Bangalore            | 5,104,047                          |                    |
| 48   | Saint Petersburg     | 5,028,000, 244,769                 | 2                  |
| 49   | Santiago             | 4,837,295, 108,414, 46,611, 35,968 | 4                  |
| 50   | Kolkata              | 4,631,392                          |                    |
| 51   | Sydney               | 4,627,345, 105,968                 | 2                  |
| 52   | Toronto              | 4,612,191                          |                    |
| 53   | Yangon               | 4,477,638                          |                    |
| 54   | Jinan                | 4,335,989                          |                    |
| 55   | Chennai              | 4,328,063                          |                    |
| 56   | Zhengzhou            | 4,253,913                          |                    |
| 57   | Melbourne            | 4,246,375, 76,068                  | 2                  |
| 58   | Riyadh               | 4,205,961                          |                    |
| 59   | Changchun            | 4,193,073                          |                    |
| 60   | Dalian               | 4,087,733                          |                    |
| 61   | Chittagong           | 3,920,222                          |                    |
| 62   | Kunming              | 3,855,346                          |                    |
| 63   | Alexandria           | 3,811,516, 139,966, 49,346, 47,723 | 4                  |
| 64   | Los Angeles          | 3,792,621, 125,430                 | 2                  |
| 65   | Ahmedabad            | 3,719,710                          |                    |
| 66   | Qingdao              | 3,718,835                          |                    |
| 67   | Busan                | 3,678,555                          |                    |
| 68   | Abidjan              | 3,677,115                          |                    |
| 69   | Kano                 | 3,626,068                          |                    |
| 70   | Foshan               | 3,600,000                          |                    |
| 71   | Hyderabad            | 3,597,816, 1,386,330               | 2                  |
| 72   | Puyang               | 3,590,000                          |                    |
| 73   | Yokohama             | 3,574,443                          |                    |
| 74   | Ibadan               | 3,565,108                          |                    |
| 75   | Singapore            | 3,547,809                          |                    |
| 76   | Wuxi                 | 3,543,719, 66,442                  | 2                  |
| 77   | Xiamen               | 3,531,347                          |                    |
| 78   | Ankara               | 3,517,182                          |                    |
| 79   | Tianshui             | 3,500,000                          |                    |
| 80   | Ningbo               | 3,491,597                          |                    |
| 81   | Ho Chi Minh City     | 3,467,331                          |                    |
| 82   | Shiyan               | 3,460,000, 408,055                 | 2                  |
| 83   | Cape Town            | 3,433,441                          |                    |
| 84   | Taiyuan              | 3,426,519                          |                    |
| 85   | Berlin               | 3,426,354                          |                    |
| 86   | Tangshan             | 3,372,102                          |                    |
| 87   | Hefei                | 3,310,268                          |                    |
| 88   | Montreal             | 3,268,513                          |                    |
| 89   | Madrid               | 3,255,944, 50,437                  | 2                  |
| 90   | Pyongyang            | 3,222,000                          |                    |
| 91   | Casablanca           | 3,144,909                          |                    |
| 92   | Zibo                 | 3,129,228                          |                    |
| 93   | Zhongshan            | 3,121,275                          |                    |
| 94   | Durban               | 3,120,282                          |                    |
| 95   | Changsha             | 3,093,980                          |                    |
| 96   | Kabul                | 3,043,532                          |                    |
| 97   | UEruemqi             | 3,029,372                          |                    |
| 98   | Caracas              | 3,000,000                          |                    |
| 99   | Pune                 | 2,935,744                          |                    |
| 100  | Surat                | 2,894,504                          |                    |
| 101  | Jeddah               | 2,867,446                          |                    |
| 102  | Shijiazhuang         | 2,834,942                          |                    |
| 103  | Kanpur               | 2,823,249                          |                    |
| 104  | Kiev                 | 2,797,553                          |                    |
| 105  | Luanda               | 2,776,168                          |                    |
| 106  | Quezon City          | 2,761,720                          |                    |
| 107  | Addis Ababa          | 2,757,729                          |                    |
| 108  | Nairobi              | 2,750,547                          |                    |
| 109  | Salvador             | 2,711,840                          |                    |
| 110  | Jaipur               | 2,711,758                          |                    |
| 111  | Dar es Salaam        | 2,698,652                          |                    |
| 112  | Chicago              | 2,695,598                          |                    |
| 113  | Lanzhou              | 2,628,426                          |                    |
| 114  | Incheon              | 2,628,000                          |                    |
| 115  | Yunfu                | 2,612,800                          |                    |
| 116  | Navi Mumbai          | 2,600,000                          |                    |
| 117  | Al Basrah            | 2,600,000                          |                    |
| 118  | Osaka-shi            | 2,592,413                          |                    |
| 119  | Mogadishu            | 2,587,183                          |                    |
| 120  | Daegu                | 2,566,540                          |                    |
| 121  | Faisalabad           | 2,506,595                          |                    |
| 122  | Izmir                | 2,500,603                          |                    |
| 123  | Dakar                | 2,476,400                          |                    |
| 124  | Lucknow              | 2,472,011                          |                    |
| 125  | Al Jizah             | 2,443,203                          |                    |
| 126  | Fortaleza            | 2,400,000                          |                    |
| 127  | Cali                 | 2,392,877                          |                    |
| 128  | Surabaya             | 2,374,658                          |                    |
| 129  | Belo Horizonte       | 2,373,224                          |                    |
| 130  | Nanchang             | 2,357,839                          |                    |
| 131  | Grand Dakar          | 2,352,057                          |                    |
| 132  | Rome                 | 2,318,895, 36,303, 33,725          | 3                  |
| 133  | Mashhad              | 2,307,177                          |                    |
| 134  | Brooklyn             | 2,300,664                          |                    |
| 135  | Borough of Queens    | 2,272,771                          |                    |
| 136  | Nagpur               | 2,228,018                          |                    |
| 137  | Maracaibo            | 2,225,000                          |                    |
| 138  | Brasilia             | 2,207,718                          |                    |
| 139  | Santo Domingo        | 2,201,941, 45,476                  | 2                  |
| 140  | Nagoya-shi           | 2,191,279                          |                    |
| 141  | Brisbane             | 2,189,878                          |                    |
| 142  | Havana               | 2,163,824                          |                    |
| 143  | Paris                | 2,138,551, 25,171                  | 2                  |
| 144  | Houston              | 2,099,451                          |                    |
| 145  | Al Mawsil al Jadidah | 2,065,597                          |                    |
| 146  | Johannesburg         | 2,026,469                          |                    |
| 147  | Kowloon              | 2,019,533                          |                    |
| 148  | Al Basrat al Qadimah | 2,015,483                          |                    |
| 149  | Almaty               | 2,000,900                          |                    |
+------+----------------------+------------------------------------+--------------------+
And that's it for now. Heaps more to come, as usual.

Tuesday, 3 February 2015

African capital cities and population in table format

Bah! Only got part-way through this one, but I'll post what I have. I wanted to load up some data to pretty print a table of African countries and their capital cities. So I had that working, and then thought why not population too? This time from wikipedia. Problem is the two data-sets have some differences in naming countries (yeah, I guess this is going to be a problem in general for the semantic web). Then I got greedy and thought why not a more comprehensive data set for Africa. Looked again at wikipedia, gave it some thought, and changed my mind. Too much work. Almost need to write a custom parser. Too lazy for that. So I gave up. Here is what I do have:
sa: load africa.sw
sa: table[name,capital-city,population] "" |Africa: country: list>
+---------------------------------------+--------------------+-----------------------+
| name                                  | capital-city       | population            |
+---------------------------------------+--------------------+-----------------------+
| country: Algeria                      | city: Algiers      | population: 39903000  |
| country: Angola                       | city: Luanda       | population: 25326000  |
| country: Benin                        | city: Porto-Novo   | population: 10750000  |
| country: Botswana                     | city: Gaborone     | population: 2176000   |
| country: Burkina Faso                 | city: Ouagadougou  | population: 18477000  |
| country: Burundi                      | city: Bujumbura    | population: 9824000   |
| country: Cameroon                     | city: Yaounde      | population: 21918000  |
| country: Cape Verde                   | city: Praia        | population: 525000    |
| country: Central African Republic     | city: Bangui       | population: 5545000   |
| country: Chad                         | city: N'Djamena    | population: 13675000  |
| country: Cote d'Ivoire                | city: Yamoussoukro |                       |
| country: Democratic Republic of Congo | city: Kinshasa     |                       |
| country: Egypt                        | city: Cairo        | population: 88523000  |
| country: Equatorial Guinea            | city: Malabo       | population: 1996000   |
| country: Eritrea                      | city: Asmara       | population: 6895000   |
| country: Ethiopia                     | city: AddisAbaba   | population: 90076000  |
| country: Gabon                        | city: Libreville   | population: 2382000   |
| country: Ghana                        | city: Accra        | population: 27714000  |
| country: Kenya                        | city: Nairobi      | population: 44153000  |
| country: Lesotho                      | city: Maseru       | population: 1908000   |
| country: Liberia                      | city: Monrovia     | population: 4046000   |
| country: Libya                        | city: Tripoli      | population: 6521000   |
| country: Madagascar                   | city: Antananarivo | population: 23053000  |
| country: Malawi                       | city: Lilongwe     | population: 16307000  |
| country: Mali                         | city: Bamako       | population: 17796000  |
| country: Mauritania                   | city: Nouakchott   | population: 3632000   |
| country: Mauritius                    | city: Port Louis   | population: 1263000   |
| country: Morocco                      | city: Rabat        | population: 33656000  |
| country: Mozambique                   | city: Maputo       | population: 25728000  |
| country: Niger                        | city: Niamey       | population: 18880000  |
| country: Nigeria                      | city: Abuja        | population: 185043000 |
| country: Republic of Congo            | city: Brazzaville  |                       |
| country: Republic of Djibouti         | city: Djibouti     |                       |
| country: Republic of Guinea           | city: Conakry      |                       |
| country: Republic of Namibia          | city: Windhoek     |                       |
| country: Republic of South Sudan      | city: Juba         |                       |
| country: Republic of Sudan            | city: Khartoum     |                       |
| country: Republic of Tunisia          | city: Tunis        |                       |
| country: Rwanda                       | city: Kigali       | population: 11324000  |
| country: Sao Tome and Principe        | city: Sao Tome     |                       |
| country: Senegal                      | city: Dakar        | population: 14150000  |
| country: Seychelles                   | city: Victoria     | population: 97000     |
| country: Sierra Leone                 | city: Freetown     | population: 6513000   |
| country: Somalia                      | city: Mogadishu    | population: 10972000  |
| country: South Africa                 | city: Pretoria     | population: 54844000  |
| country: Swaziland                    | city: Mbabane      | population: 1097000   |
| country: Tanzania                     | city: Dodoma       | population: 48829000  |
| country: The Gambia                   | city: Banjul       |                       |
| country: Togo                         | city: Lome         | population: 7065000   |
| country: Uganda                       | city: Kampala      | population: 35760000  |
| country: Union of Comoros             | city: Moroni       |                       |
| country: Western Sahara               | city: El Aaiun     |                       |
| country: Zambia                       | city: Lusaka       | population: 15474000  |
| country: Zimbabwe                     | city: Harare       | population: 13503000  |
+---------------------------------------+--------------------+-----------------------+
And I guess that is it for now.

Update: if we don't want "country: ", "city: " and "population: " in there we have to do a dance. At first guess you (I certainly did) might try "extract-value" applied to the incoming superposition, to strip the "country: " prefix. But here is what happens (top 5 results for brevity sake):
sa: table[country,capital-city,population] extract-value select[1,5] "" |Africa: country: list>
+--------------+--------------+------------+
| country      | capital-city | population |
+--------------+--------------+------------+
| Algeria      |              |            |
| Angola       |              |            |
| Benin        |              |            |
| Botswana     |              |            |
| Burkina Faso |              |            |
+--------------+--------------+------------+
So why did this happen? Because we have:
capital-city |country: Algeria> => |city: Algiers>
population |country: Algeria> => |population: 39903000>
but we have no knowledge of "capital-city" and "population" applied to kets without the "country: " data-type. ie, these are undefined:
capital-city |Algeria> => ...
population |Algeria> => ...
So this is where the dancing kicks in (NB: the merge-labels() that re-inserts the "country: " data-type):
capital |*> #=> extract-value capital-city merge-labels(|country: > + |_self>)
popn |*> #=> to-comma-number extract-value population merge-labels(|country: > + |_self>)
And now the tidied table:
sa: table[country,capital,popn] extract-value "" |Africa: country: list>
+------------------------------+--------------+-------------+
| country                      | capital      | popn        |
+------------------------------+--------------+-------------+
| Algeria                      | Algiers      | 39,903,000  |
| Angola                       | Luanda       | 25,326,000  |
| Benin                        | Porto-Novo   | 10,750,000  |
| Botswana                     | Gaborone     | 2,176,000   |
| Burkina Faso                 | Ouagadougou  | 18,477,000  |
| Burundi                      | Bujumbura    | 9,824,000   |
| Cameroon                     | Yaounde      | 21,918,000  |
| Cape Verde                   | Praia        | 525,000     |
| Central African Republic     | Bangui       | 5,545,000   |
| Chad                         | N'Djamena    | 13,675,000  |
| Cote d'Ivoire                | Yamoussoukro |             |
| Democratic Republic of Congo | Kinshasa     |             |
| Egypt                        | Cairo        | 88,523,000  |
| Equatorial Guinea            | Malabo       | 1,996,000   |
| Eritrea                      | Asmara       | 6,895,000   |
| Ethiopia                     | AddisAbaba   | 90,076,000  |
| Gabon                        | Libreville   | 2,382,000   |
| Ghana                        | Accra        | 27,714,000  |
| Kenya                        | Nairobi      | 44,153,000  |
| Lesotho                      | Maseru       | 1,908,000   |
| Liberia                      | Monrovia     | 4,046,000   |
| Libya                        | Tripoli      | 6,521,000   |
| Madagascar                   | Antananarivo | 23,053,000  |
| Malawi                       | Lilongwe     | 16,307,000  |
| Mali                         | Bamako       | 17,796,000  |
| Mauritania                   | Nouakchott   | 3,632,000   |
| Mauritius                    | Port Louis   | 1,263,000   |
| Morocco                      | Rabat        | 33,656,000  |
| Mozambique                   | Maputo       | 25,728,000  |
| Niger                        | Niamey       | 18,880,000  |
| Nigeria                      | Abuja        | 185,043,000 |
| Republic of Congo            | Brazzaville  |             |
| Republic of Djibouti         | Djibouti     |             |
| Republic of Guinea           | Conakry      |             |
| Republic of Namibia          | Windhoek     |             |
| Republic of South Sudan      | Juba         |             |
| Republic of Sudan            | Khartoum     |             |
| Republic of Tunisia          | Tunis        |             |
| Rwanda                       | Kigali       | 11,324,000  |
| Sao Tome and Principe        | Sao Tome     |             |
| Senegal                      | Dakar        | 14,150,000  |
| Seychelles                   | Victoria     | 97,000      |
| Sierra Leone                 | Freetown     | 6,513,000   |
| Somalia                      | Mogadishu    | 10,972,000  |
| South Africa                 | Pretoria     | 54,844,000  |
| Swaziland                    | Mbabane      | 1,097,000   |
| Tanzania                     | Dodoma       | 48,829,000  |
| The Gambia                   | Banjul       |             |
| Togo                         | Lome         | 7,065,000   |
| Uganda                       | Kampala      | 35,760,000  |
| Union of Comoros             | Moroni       |             |
| Western Sahara               | El Aaiun     |             |
| Zambia                       | Lusaka       | 15,474,000  |
| Zimbabwe                     | Harare       | 13,503,000  |
+------------------------------+--------------+-------------+
And one more for luck! Top 10 countries in Africa sorted by population, in a rank table, with "country: ", "city: ", and "population: " removed:
sa: rank-table[country,capital,popn] select[1,10] extract-value reverse sort-by[population] "" |Africa: country: list> 
+------+--------------+------------+-------------+
| rank | country      | capital    | popn        |
+------+--------------+------------+-------------+
| 1    | Nigeria      | Abuja      | 185,043,000 |
| 2    | Ethiopia     | AddisAbaba | 90,076,000  |
| 3    | Egypt        | Cairo      | 88,523,000  |
| 4    | South Africa | Pretoria   | 54,844,000  |
| 5    | Tanzania     | Dodoma     | 48,829,000  |
| 6    | Kenya        | Nairobi    | 44,153,000  |
| 7    | Algeria      | Algiers    | 39,903,000  |
| 8    | Uganda       | Kampala    | 35,760,000  |
| 9    | Morocco      | Rabat      | 33,656,000  |
| 10   | Ghana        | Accra      | 27,714,000  |
+------+--------------+------------+-------------+
And I think that is it for this post. I'm slowly learning the best way to interact with tables.

Update: the table code now auto-applies "extract-value" operator to table elements. Now, we no longer need to "do a dance" to remove category text cluttering up our tables. Now simply:
sa: load africa.sw
sa: popn |*> #=> to-comma-number population |_self>
sa: rank-table[country,capital-city,popn] select[1,10] reverse sort-by[population] "" |Africa: country: list>
+------+--------------+--------------+-------------+
| rank | country      | capital-city | popn        |
+------+--------------+--------------+-------------+
| 1    | Nigeria      | Abuja        | 185,043,000 |
| 2    | Ethiopia     | AddisAbaba   | 90,076,000  |
| 3    | Egypt        | Cairo        | 88,523,000  |
| 4    | South Africa | Pretoria     | 54,844,000  |
| 5    | Tanzania     | Dodoma       | 48,829,000  |
| 6    | Kenya        | Nairobi      | 44,153,000  |
| 7    | Algeria      | Algiers      | 39,903,000  |
| 8    | Uganda       | Kampala      | 35,760,000  |
| 9    | Morocco      | Rabat        | 33,656,000  |
| 10   | Ghana        | Accra        | 27,714,000  |
+------+--------------+--------------+-------------+
I'm always in favour of minimising how much work you need to do. So I think this is a big improvement! No dancing just to tidy up the table a little.

Monday, 2 February 2015

new function: int-coeffs-to-word

I've been meaning to write this one for ages and ages. Finally got around to it. Had to add a new operator type in the parsers function tables, though. But I'm not in the mood to explain details. Anyway, let's explain how it works.

int-coeffs-to-word n |x> == |no plural(x)> (if int(n) == 0)
int-coeffs-to-word n |x> == |no x>             (if int(n) == 0, and plural is not defined)
int-coeffs-to-word n |x> == |1 x>               (if int(n) == 1)
int-coeffs-to-word n |x> == |n plural(x)>   (if int(n) != 1 and plural is defined)
int-coeffs-to-word n |x> == |n x>               (if int(n) != 1 and plural is not defined)

Let's load up some plurals (yeah, omitting the "word" data-type):
plural |*> #=> merge-labels(|_self> + |s>)
plural |foot> => |feet>
plural |mouse> => |mice>
plural |radius> => |radii>
plural |tooth> => |teeth>
plural |person> => |people> 
And give an example:
sa: int-coeffs-to-word (|apple> + 3|mouse> + 2|tooth> + 9|cat>)
|1 apple> + |3 mice> + |2 teeth> + |9 cats>
Now as a sentence fragment:
sa: list-to-words int-coeffs-to-word (|apple> + 3|mouse> + 2|tooth> + 9|cat>)
|1 apple, 3 mice, 2 teeth and 9 cats>
And we finish with a fun one:
sa: list-to-words int-coeffs-to-word (2|ear> + 2|eye> + |nose> + 0|tooth>)
|2 ears, 2 eyes, 1 nose and no teeth> 
And that's it for this post. Simple, fun, and slowly working towards natural language. Also, thanks to this post I found a serious bug in context.recall(). Fix was easy enough, thankfully.

non-Abelian algebra

My back-end code has been able to handle basic non-Abelian algebra since I first wrote the algebra code. But until now, it wasn't wired in. Only took like 2 minutes to wire it in, so not sure why I procrastinated so long. Anyway, a couple of examples.
-- a couple of Abelian examples:
sa: algebra(|a> + |b>,|^>,|2>)
|a*a> + 2.000|a*b> + |b*b>

sa: algebra(|a> + |b>,|^>,|3>)
|a*a*a> + 3.000|a*a*b> + 3.000|a*b*b> + |b*b*b>

-- a couple of non-Abelian examples:
sa: non-Abelian-algebra(|a> + |b>,|^>,|2>)
|a*a> + |a*b> + |b*a> + |b*b>

sa: non-Abelian-algebra(|a> + |b>,|^>,|3>)
|a*a*a> + |a*a*b> + |a*b*a> + |a*b*b> + |b*a*a> + |b*a*b> + |b*b*a> + |b*b*b>
And that is about it. Simple enough. Just wanted to mention it.

tweaked pretty print table code

Tweaked the code so that there is now a "strict" mode. In this mode it only prints a row if all elements in that row are not empty.

An example in non-strict mode:
sa: load foaf-example-in-sw.sw
sa: table[name,email,works-for] "" |list>
+------------------------+----------------------------------+------------------------+
| name                   | email                            | works-for              |
+------------------------+----------------------------------+------------------------+
| Dan                    | email: danbri@w3.org             | organisation: ILRT     |
| Libby                  | email: libby.miller@bris.ac.uk   | organisation: ILRT     |
| Craig                  | email: craig@netgates.co.uk      | organisation: Netgates |
| Liz                    |                                  | organisation: Netgates |
| Kathleen               |                                  | organisation: Netgates |
| Damian                 |                                  |                        |
| Martin                 | email: m.l.poulter@bristol.ac.uk |                        |
| organisation: ILRT     |                                  |                        |
| organisation: Netgates |                                  |                        |
+------------------------+----------------------------------+------------------------+
Same example, but in strict mode:
sa: strict-table[name,email,works-for] "" |list>
+------------------------+----------------------------------+------------------------+
| name                   | email                            | works-for              |
+------------------------+----------------------------------+------------------------+
| Dan                    | email: danbri@w3.org             | organisation: ILRT     |
| Libby                  | email: libby.miller@bris.ac.uk   | organisation: ILRT     |
| Craig                  | email: craig@netgates.co.uk      | organisation: Netgates |
+------------------------+----------------------------------+------------------------+
And that's it. Heaps more to come!

Sunday, 1 February 2015

is-early and is-late in BKO

We can use a similar construct to that we recently used for is-teenager, and is-adult, to implement is-early and is-late. For example sake, assume:
is-early: 3:30 -> 7:00
is-late: 10:30 -> 3:30
-- learn this:
is-early |time: 24h: *> #=> do-you-know drop-above[700] drop-below[330] pop-float |_self>
is-late |time: 24h: *> #=> do-you-know (drop-below[2230] pop-float |_self> + drop-above[330] pop-float |_self>)

-- now test it:
-- learn a list of times:
|24h times> => range(|time: 24h: 100>,|time: 24h: 2400>,|100>)

-- pretty print a table:
sa: table[24h-time,is-early,is-late] "" |24h times>
+-----------------+----------+---------+
| 24h-time        | is-early | is-late |
+-----------------+----------+---------+
| time: 24h: 100  | no       | yes     |
| time: 24h: 200  | no       | yes     |
| time: 24h: 300  | no       | yes     |
| time: 24h: 400  | yes      | no      |
| time: 24h: 500  | yes      | no      |
| time: 24h: 600  | yes      | no      |
| time: 24h: 700  | yes      | no      |
| time: 24h: 800  | no       | no      |
| time: 24h: 900  | no       | no      |
| time: 24h: 1000 | no       | no      |
| time: 24h: 1100 | no       | no      |
| time: 24h: 1200 | no       | no      |
| time: 24h: 1300 | no       | no      |
| time: 24h: 1400 | no       | no      |
| time: 24h: 1500 | no       | no      |
| time: 24h: 1600 | no       | no      |
| time: 24h: 1700 | no       | no      |
| time: 24h: 1800 | no       | no      |
| time: 24h: 1900 | no       | no      |
| time: 24h: 2000 | no       | no      |
| time: 24h: 2100 | no       | no      |
| time: 24h: 2200 | no       | no      |
| time: 24h: 2300 | no       | yes     |
| time: 24h: 2400 | no       | yes     |
+-----------------+----------+---------+
So, that works and all. I guess a thing to note is we can rewrite our learn rules using the in-range sigmoid and the drop function, instead of drop-above and drop-below. Simply enough:
range-is-early |time: 24h: *> #=> do-you-know drop sigmoid-in-range[330,700] pop-float |_self>
range-is-late |time: 24h: *> #=> do-you-know drop (sigmoid-in-range[2230,2400] pop-float |_self> + sigmoid-in-range[0,330] pop-float |_self>)

-- now look at the resulting table:
-- (we see they essentially agree)
sa: table[24h-time,is-early,range-is-early,is-late,range-is-late] "" |24h times>
+-----------------+----------+----------------+---------+---------------+
| 24h-time        | is-early | range-is-early | is-late | range-is-late |
+-----------------+----------+----------------+---------+---------------+
| time: 24h: 100  | no       | no             | yes     | yes           |
| time: 24h: 200  | no       | no             | yes     | yes           |
| time: 24h: 300  | no       | no             | yes     | yes           |
| time: 24h: 400  | yes      | yes            | no      | no            |
| time: 24h: 500  | yes      | yes            | no      | no            |
| time: 24h: 600  | yes      | yes            | no      | no            |
| time: 24h: 700  | yes      | yes            | no      | no            |
| time: 24h: 800  | no       | no             | no      | no            |
| time: 24h: 900  | no       | no             | no      | no            |
| time: 24h: 1000 | no       | no             | no      | no            |
| time: 24h: 1100 | no       | no             | no      | no            |
| time: 24h: 1200 | no       | no             | no      | no            |
| time: 24h: 1300 | no       | no             | no      | no            |
| time: 24h: 1400 | no       | no             | no      | no            |
| time: 24h: 1500 | no       | no             | no      | no            |
| time: 24h: 1600 | no       | no             | no      | no            |
| time: 24h: 1700 | no       | no             | no      | no            |
| time: 24h: 1800 | no       | no             | no      | no            |
| time: 24h: 1900 | no       | no             | no      | no            |
| time: 24h: 2000 | no       | no             | no      | no            |
| time: 24h: 2100 | no       | no             | no      | no            |
| time: 24h: 2200 | no       | no             | no      | no            |
| time: 24h: 2300 | no       | no             | yes     | yes           |
| time: 24h: 2400 | no       | no             | yes     | yes           |
+-----------------+----------+----------------+---------+---------------+
That's probably it for today. Heaps more to come, as usual. Though maybe I should make the observation if I had a better parser, we could compact the range-is-late operator a little to this:
range-is-late |time: 24h: *> #=> do-you-know drop (sigmoid-in-range[2230,2400] + sigmoid-in-range[0,330]) pop-float |_self>

new function: sort-by

Wrote another new function today, didn't take long. This thing is called the sort-by operator, and does pretty much what it says. Instead of having to sort lists indirectly using "clean coeff-sort op-self "" |some list>" as I did in my last post, we can now do it directly: sort-by[op] "" |some list>. (also now means we have less need for op-self operators, but presumably they are still useful somewhere)

General usage:
sort-by[op] some-superposition
one common usage is to use in combination with the table function operator.

Let's use the Australian cities example again:
sa: load pretty-print-table-of-australian-cities.sw

-- sort by ket lables:
-- (though in this case |city list> is already in the right order)
sa: table[city,area,population,annual-rainfall] ket-sort "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
+-----------+------+------------+-----------------+

-- sort by area:
sa: table[city,area,population,annual-rainfall] sort-by[area] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
+-----------+------+------------+-----------------+

-- sort by population:
sa: table[city,area,population,annual-rainfall] sort-by[population] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
+-----------+------+------------+-----------------+

-- reverse sort by population:
-- NB: the "reverse" operator in there
sa: table[city,area,population,annual-rainfall] reverse sort-by[population] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+

-- sort by annual-rainfall:
sa: table[city,area,population,annual-rainfall] sort-by[annual-rainfall] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+
I guess that is all kind of obvious and clear.

Update: Had bug after bug trying to handle sorting by numbers in some cases, text in others, and then the mixed case when the rest are numbers, but some are the empty string (which happens when you hit a |>). Anyway, swapped in a "natural sort" and it all seems to work! Cool. Natural sort should also neatly handle the case when an operator applied to different kets returns different data-types. eg, height data mixing cm and feet, say.

sorting in the BKO scheme

Currently there are only two ways to sort superpositions in the BKO scheme. coeff-sort that sorts by the float coefficients of kets, and ket-sort that does a natural sort (though occasionally the natural sort breaks!) on the ket-labels. So today, some examples showing how to sort a superposition/list.

Let's consider the data here, and tweak it to this (noting they are all op-self operators. ie, they only change the coeff of a ket, not the ket label):
area-self |Adelaide> => 1295|Adelaide>
population-self |Adelaide> => 1158259|Adelaide>
annual-rainfall-self |Adelaide> => 600.5|Adelaide>

area-self |Brisbane> => 5905|Brisbane>
population-self |Brisbane> => 1857594|Brisbane>
annual-rainfall-self |Brisbane> => 1146.4|Brisbane>

area-self |Darwin> => 112|Darwin>
population-self |Darwin> => 120900|Darwin>
annual-rainfall-self |Darwin> => 1714.7|Darwin>

area-self |Hobart> => 1357|Hobart>
population-self |Hobart> => 205556|Hobart>
annual-rainfall-self |Hobart> => 619.5|Hobart>

area-self |Melbourne> => 1566|Melbourne>
population-self |Melbourne> => 3806092|Melbourne>
annual-rainfall-self |Melbourne> => 646.9|Melbourne>

area-self |Perth> => 5386|Perth>
population-self |Perth> => 1554769|Perth>
annual-rainfall-self |Perth> => 869.4|Perth>

area-self |Sydney> => 2058|Sydney>
population-self |Sydney> => 4336374|Sydney>
annual-rainfall-self |Sydney> => 1214.8|Sydney>
And I guess now, jump into some examples:
-- sort by area:
sa: coeff-sort area-self "" |city list>
5905.000|Brisbane> + 5386.000|Perth> + 2058.000|Sydney> + 1566.000|Melbourne> + 1357.000|Hobart> + 1295.000|Adelaide> + 112.000|Darwin>

-- tidy that up (by applying the clean sigmoid):
sa: clean coeff-sort area-self "" |city list>
|Brisbane> + |Perth> + |Sydney> + |Melbourne> + |Hobart> + |Adelaide> + |Darwin>

-- sort by population:
sa: clean coeff-sort population-self "" |city list>
|Sydney> + |Melbourne> + |Brisbane> + |Perth> + |Adelaide> + |Hobart> + |Darwin>

-- sort by annual rainfall:
sa: clean coeff-sort annual-rainfall-self "" |city list>
|Darwin> + |Sydney> + |Brisbane> + |Perth> + |Melbourne> + |Hobart> + |Adelaide>
So, until today (more in the next post) this along with ket-sort was the only way to sort superpositions. So I suppose I should give a ket-sort example too.
-- let's create a shuffled list:
|shuffled list> => shuffle "" |city list>

-- let's take a look:
sa: "" |shuffled list>
|Hobart> + |Brisbane> + |Melbourne> + |Adelaide> + |Darwin> + |Sydney> + |Perth>
-- yup. Looks sufficiently shuffled.

-- now ket sort:
sa: ket-sort "" |shuffled list>
|Adelaide> + |Brisbane> + |Darwin> + |Hobart> + |Melbourne> + |Perth> + |Sydney>
I guess the only thing left to mention in this post is that they can be applied just before the table operator:
sa: table[city-name,area,population,annual-rainfall] clean coeff-sort population-self "" |city list>
+-----------+------+------------+-----------------+
| city-name | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+
Finally, a NB. This is the mess you get if you forget to clean the incoming superposition (ie, your incoming coeffs are not all 1):
sa: table[city-name,area,population,annual-rainfall] coeff-sort population-self "" |city list>
+----------------------+-----------------+--------------------+-----------------------+
| city-name            | area            | population         | annual-rainfall       |
+----------------------+-----------------+--------------------+-----------------------+
| 4336374.00 Sydney    | 4336374.00 2058 | 4336374.00 4336374 | 4336374.00 mm: 1214.8 |
| 3806092.00 Melbourne | 3806092.00 1566 | 3806092.00 3806092 | 3806092.00 mm: 646.9  |
| 1857594.00 Brisbane  | 1857594.00 5905 | 1857594.00 1857594 | 1857594.00 mm: 1146.4 |
| 1554769.00 Perth     | 1554769.00 5386 | 1554769.00 1554769 | 1554769.00 mm: 869.4  |
| 1158259.00 Adelaide  | 1158259.00 1295 | 1158259.00 1158259 | 1158259.00 mm: 600.5  |
| 205556.00 Hobart     | 205556.00 1357  | 205556.00 205556   | 205556.00 mm: 619.5   |
| 120900.00 Darwin     | 120900.00 112   | 120900.00 120900   | 120900.00 mm: 1714.7  |
+----------------------+-----------------+--------------------+-----------------------+
Kind of hard to explain to others what is going on here. A couple of pieces that partly explain it are:
-- first noting that it is not a "clean superposition", ie, it has coeffs with value other than 1:
sa: coeff-sort population-self "" |city list>
4336374.000|Sydney> + 3806092.000|Melbourne> + 1857594.000|Brisbane> + 1554769.000|Perth> + 1158259.000|Adelaide> + 205556.000|Hobart> + 120900.000|Darwin>
And this code in the ket class:
  def readable_display(self):
    if self.label == '':
      return ""
    if self.value == 1:
      return self.label
    else:
      return "{0:.2f} {1}".format(self.value,self.label)
That's it for this post, more sorting in the next one.

Update: Tweaks on the pretty print table code means that it no longer matters if the incoming superposition is a clean superposition or not. Code auto-runs set-to[1] sigmoid on the incoming superposition. I can't think of a use case where you would want otherwise. It is just one line of code to comment out if we do want to switch this feature off though.