The Semantic DB Project

Monday, 2 February 2015

new function: int-coeffs-to-word

I've been meaning to write this one for ages and ages. Finally got around to it. Had to add a new operator type to the parser's function tables, though. But I'm not in the mood to explain details. Anyway, let's explain how it works.

int-coeffs-to-word n |x> == |no plural(x)> (if int(n) == 0, and plural is defined)
int-coeffs-to-word n |x> == |no x>             (if int(n) == 0, and plural is not defined)
int-coeffs-to-word n |x> == |1 x>               (if int(n) == 1)
int-coeffs-to-word n |x> == |n plural(x)>   (if int(n) != 1 and plural is defined)
int-coeffs-to-word n |x> == |n x>               (if int(n) != 1 and plural is not defined)

Let's load up some plurals (yeah, omitting the "word" data-type):

plural |*> #=> merge-labels(|_self> + |s>)
plural |foot> => |feet>
plural |mouse> => |mice>
plural |radius> => |radii>
plural |tooth> => |teeth>
plural |person> => |people>

And give an example:

sa: int-coeffs-to-word (|apple> + 3|mouse> + 2|tooth> + 9|cat>)
|1 apple> + |3 mice> + |2 teeth> + |9 cats>
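
For anyone who prefers to read the rules as code, here is a rough Python sketch of the same idea. It is not the back-end implementation; the names int_coeffs_to_word, IRREGULAR and plural are made up for illustration, and the fall-back "append an s" rule stands in for the merge-labels rule above.

```python
# Hypothetical stand-in for the plural rules learned above.
IRREGULAR = {"foot": "feet", "mouse": "mice", "radius": "radii",
             "tooth": "teeth", "person": "people"}


def plural(label):
    # Irregular forms first, otherwise fall back to appending "s".
    return IRREGULAR.get(label, label + "s")


def int_coeffs_to_word(n, label, plural_fn=None):
    """Render a (coefficient, label) pair as a counted phrase.

    plural_fn is optional; when it is None we treat the plural
    as "not defined" and reuse the singular label.
    """
    count = int(n)
    if count == 1:
        return "1 " + label
    word = plural_fn(label) if plural_fn is not None else label
    prefix = "no" if count == 0 else str(count)
    return prefix + " " + word


superposition = [("apple", 1), ("mouse", 3), ("tooth", 2), ("cat", 9)]
print(" + ".join("|" + int_coeffs_to_word(c, k, plural) + ">"
                 for k, c in superposition))
```

The zero case reuses the same plural lookup as the general case, which is why 0|tooth> comes out as "no teeth" rather than "no tooth".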

Now as a sentence fragment:

sa: list-to-words int-coeffs-to-word (|apple> + 3|mouse> + 2|tooth> + 9|cat>)
|1 apple, 3 mice, 2 teeth and 9 cats>

And we finish with a fun one:

sa: list-to-words int-coeffs-to-word (2|ear> + 2|eye> + |nose> + 0|tooth>)
|2 ears, 2 eyes, 1 nose and no teeth>
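
The list-to-words step is just a join with commas and a final "and". A minimal Python sketch, assuming the phrases arrive as a plain list of strings (the function name is borrowed from the operator, the rest is illustrative):

```python
def list_to_words(items):
    """Join phrases with commas, using "and" before the last one."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


print(list_to_words(["2 ears", "2 eyes", "1 nose", "no teeth"]))
```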

And that's it for this post. Simple, fun, and slowly working towards natural language. Also, thanks to this post I found a serious bug in context.recall(). Fix was easy enough, thankfully.

non-Abelian algebra

My back-end code has been able to handle basic non-Abelian algebra since I first wrote the algebra code. But until now, it wasn't wired in. Only took like 2 minutes to wire it in, so not sure why I procrastinated so long. Anyway, a couple of examples.

-- a couple of Abelian examples:
sa: algebra(|a> + |b>,|^>,|2>)
|a*a> + 2.000|a*b> + |b*b>

sa: algebra(|a> + |b>,|^>,|3>)
|a*a*a> + 3.000|a*a*b> + 3.000|a*b*b> + |b*b*b>

-- a couple of non-Abelian examples:
sa: non-Abelian-algebra(|a> + |b>,|^>,|2>)
|a*a> + |a*b> + |b*a> + |b*b>

sa: non-Abelian-algebra(|a> + |b>,|^>,|3>)
|a*a*a> + |a*a*b> + |a*b*a> + |a*b*b> + |b*a*a> + |b*a*b> + |b*b*a> + |b*b*b>
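
The difference between the two is easy to see in a few lines of Python. This is only a sketch of the idea, not the back-end algebra code: in the non-Abelian case every ordered product is kept as its own term, while in the Abelian case the factors commute, so we sort each product and count the duplicates.

```python
from collections import Counter
from itertools import product


def non_abelian_power(terms, n):
    """Expand (t1 + t2 + ...)^n, keeping the order of the factors."""
    return ["*".join(p) for p in product(terms, repeat=n)]


def abelian_power(terms, n):
    """Same expansion, but factors commute: sort each product, then count."""
    return Counter("*".join(sorted(p)) for p in product(terms, repeat=n))


print(non_abelian_power(["a", "b"], 2))
print(dict(abelian_power(["a", "b"], 2)))
```

The counts in the Abelian case are the 2.000 and 3.000 coefficients in the examples above.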

And that is about it. Simple enough. Just wanted to mention it.

tweaked pretty print table code

Tweaked the code so that there is now a "strict" mode. In this mode it only prints a row if all elements in that row are not empty.

An example in non-strict mode:

sa: load foaf-example-in-sw.sw
sa: table[name,email,works-for] "" |list>
+------------------------+----------------------------------+------------------------+
| name                   | email                            | works-for              |
+------------------------+----------------------------------+------------------------+
| Dan                    | email: danbri@w3.org             | organisation: ILRT     |
| Libby                  | email: libby.miller@bris.ac.uk   | organisation: ILRT     |
| Craig                  | email: craig@netgates.co.uk      | organisation: Netgates |
| Liz                    |                                  | organisation: Netgates |
| Kathleen               |                                  | organisation: Netgates |
| Damian                 |                                  |                        |
| Martin                 | email: m.l.poulter@bristol.ac.uk |                        |
| organisation: ILRT     |                                  |                        |
| organisation: Netgates |                                  |                        |
+------------------------+----------------------------------+------------------------+

Same example, but in strict mode:

sa: strict-table[name,email,works-for] "" |list>
+------------------------+----------------------------------+------------------------+
| name                   | email                            | works-for              |
+------------------------+----------------------------------+------------------------+
| Dan                    | email: danbri@w3.org             | organisation: ILRT     |
| Libby                  | email: libby.miller@bris.ac.uk   | organisation: ILRT     |
| Craig                  | email: craig@netgates.co.uk      | organisation: Netgates |
+------------------------+----------------------------------+------------------------+
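
In code terms, strict mode is just a filter on the rows before printing. A small Python sketch, assuming each row is a list of strings and that an unknown value shows up as an empty string (the function name is illustrative, not the one in my code):

```python
def strict_rows(rows):
    """Keep only the rows in which every cell is non-empty."""
    return [row for row in rows if all(cell != "" for cell in row)]


rows = [
    ["Dan", "email: danbri@w3.org", "organisation: ILRT"],
    ["Liz", "", "organisation: Netgates"],
    ["Damian", "", ""],
]
for row in strict_rows(rows):
    print(row)
```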

And that's it. Heaps more to come!

Sunday, 1 February 2015

is-early and is-late in BKO

We can use a construct similar to the one we recently used for is-teenager and is-adult to implement is-early and is-late. For the sake of example, assume (24-hour clock):
is-early: 3:30 -> 7:00
is-late: 22:30 -> 3:30

-- learn this:
is-early |time: 24h: *> #=> do-you-know drop-above[700] drop-below[330] pop-float |_self>
is-late |time: 24h: *> #=> do-you-know (drop-below[2230] pop-float |_self> + drop-above[330] pop-float |_self>)

-- now test it:
-- learn a list of times:
|24h times> => range(|time: 24h: 100>,|time: 24h: 2400>,|100>)

-- pretty print a table:
sa: table[24h-time,is-early,is-late] "" |24h times>

+-----------------+----------+---------+
| 24h-time        | is-early | is-late |
+-----------------+----------+---------+
| time: 24h: 100  | no       | yes     |
| time: 24h: 200  | no       | yes     |
| time: 24h: 300  | no       | yes     |
| time: 24h: 400  | yes      | no      |
| time: 24h: 500  | yes      | no      |
| time: 24h: 600  | yes      | no      |
| time: 24h: 700  | yes      | no      |
| time: 24h: 800  | no       | no      |
| time: 24h: 900  | no       | no      |
| time: 24h: 1000 | no       | no      |
| time: 24h: 1100 | no       | no      |
| time: 24h: 1200 | no       | no      |
| time: 24h: 1300 | no       | no      |
| time: 24h: 1400 | no       | no      |
| time: 24h: 1500 | no       | no      |
| time: 24h: 1600 | no       | no      |
| time: 24h: 1700 | no       | no      |
| time: 24h: 1800 | no       | no      |
| time: 24h: 1900 | no       | no      |
| time: 24h: 2000 | no       | no      |
| time: 24h: 2100 | no       | no      |
| time: 24h: 2200 | no       | no      |
| time: 24h: 2300 | no       | yes     |
| time: 24h: 2400 | no       | yes     |
+-----------------+----------+---------+
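
If the operator chain is hard to follow, here is a toy Python model of it. A superposition is a list of (label, coeff) pairs, and the functions are simplified stand-ins for the BKO operators of the same name, written under my own assumptions: pop-float moves the trailing number into the coeff, the drop operators keep values at or beyond the threshold, and do-you-know answers yes for anything non-empty.

```python
def pop_float(label):
    """Move the trailing number of a label into the coefficient."""
    prefix, value = label.rsplit(": ", 1)
    return [(prefix, float(value))]


def drop_below(sp, t):
    # Keep only kets whose coeff is at least t.
    return [(k, c) for k, c in sp if c >= t]


def drop_above(sp, t):
    # Keep only kets whose coeff is at most t.
    return [(k, c) for k, c in sp if c <= t]


def do_you_know(sp):
    # "yes" if anything survived the filters, otherwise "no".
    return "yes" if sp else "no"


def is_early(label):
    return do_you_know(drop_above(drop_below(pop_float(label), 330), 700))


def is_late(label):
    sp = pop_float(label)
    # Late wraps around midnight, so add the two half-ranges together.
    return do_you_know(drop_below(sp, 2230) + drop_above(sp, 330))


for t in (300, 400, 800, 2300):
    label = "time: 24h: %d" % t
    print(label, is_early(label), is_late(label))
```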

So, that works and all. I guess a thing to note is we can rewrite our learn rules using the in-range sigmoid and the drop function, instead of drop-above and drop-below. Simply enough:

range-is-early |time: 24h: *> #=> do-you-know drop sigmoid-in-range[330,700] pop-float |_self>
range-is-late |time: 24h: *> #=> do-you-know drop (sigmoid-in-range[2230,2400] pop-float |_self> + sigmoid-in-range[0,330] pop-float |_self>)

-- now look at the resulting table:
-- (we see they essentially agree)
sa: table[24h-time,is-early,range-is-early,is-late,range-is-late] "" |24h times>
+-----------------+----------+----------------+---------+---------------+
| 24h-time        | is-early | range-is-early | is-late | range-is-late |
+-----------------+----------+----------------+---------+---------------+
| time: 24h: 100  | no       | no             | yes     | yes           |
| time: 24h: 200  | no       | no             | yes     | yes           |
| time: 24h: 300  | no       | no             | yes     | yes           |
| time: 24h: 400  | yes      | yes            | no      | no            |
| time: 24h: 500  | yes      | yes            | no      | no            |
| time: 24h: 600  | yes      | yes            | no      | no            |
| time: 24h: 700  | yes      | yes            | no      | no            |
| time: 24h: 800  | no       | no             | no      | no            |
| time: 24h: 900  | no       | no             | no      | no            |
| time: 24h: 1000 | no       | no             | no      | no            |
| time: 24h: 1100 | no       | no             | no      | no            |
| time: 24h: 1200 | no       | no             | no      | no            |
| time: 24h: 1300 | no       | no             | no      | no            |
| time: 24h: 1400 | no       | no             | no      | no            |
| time: 24h: 1500 | no       | no             | no      | no            |
| time: 24h: 1600 | no       | no             | no      | no            |
| time: 24h: 1700 | no       | no             | no      | no            |
| time: 24h: 1800 | no       | no             | no      | no            |
| time: 24h: 1900 | no       | no             | no      | no            |
| time: 24h: 2000 | no       | no             | no      | no            |
| time: 24h: 2100 | no       | no             | no      | no            |
| time: 24h: 2200 | no       | no             | no      | no            |
| time: 24h: 2300 | no       | no             | yes     | yes           |
| time: 24h: 2400 | no       | no             | yes     | yes           |
+-----------------+----------+----------------+---------+---------------+

That's probably it for today. Heaps more to come, as usual. Though maybe I should make the observation that if I had a better parser, we could compact the range-is-late operator a little to this:

range-is-late |time: 24h: *> #=> do-you-know drop (sigmoid-in-range[2230,2400] + sigmoid-in-range[0,330]) pop-float |_self>

new function: sort-by

Wrote another new function today; it didn't take long. It is called the sort-by operator, and it does pretty much what it says. Instead of having to sort lists indirectly using "clean coeff-sort op-self "" |some list>" as I did in my last post, we can now do it directly: sort-by[op] "" |some list>. (This also means we have less need for op-self operators, but presumably they are still useful somewhere.)

General usage:
sort-by[op] some-superposition
One common usage is in combination with the table operator.

Let's use the Australian cities example again:

sa: load pretty-print-table-of-australian-cities.sw

-- sort by ket labels:
-- (though in this case |city list> is already in the right order)
sa: table[city,area,population,annual-rainfall] ket-sort "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
+-----------+------+------------+-----------------+

-- sort by area:
sa: table[city,area,population,annual-rainfall] sort-by[area] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
+-----------+------+------------+-----------------+

-- sort by population:
sa: table[city,area,population,annual-rainfall] sort-by[population] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
+-----------+------+------------+-----------------+

-- reverse sort by population:
-- NB: the "reverse" operator in there
sa: table[city,area,population,annual-rainfall] reverse sort-by[population] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+

-- sort by annual-rainfall:
sa: table[city,area,population,annual-rainfall] sort-by[annual-rainfall] "" |city list>
+-----------+------+------------+-----------------+
| city      | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+

I guess that is all kind of obvious and clear.

Update: Had bug after bug trying to handle sorting by numbers in some cases, text in others, and then the mixed case when the rest are numbers, but some are the empty string (which happens when you hit a |>). Anyway, swapped in a "natural sort" and it all seems to work! Cool. Natural sort should also neatly handle the case when an operator applied to different kets returns different data-types. eg, height data mixing cm and feet, say.
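
For the curious, a natural sort key can be sketched in a few lines of Python. This is not the exact code I swapped in, just one common way to do it: split each string into runs of digits and runs of everything else, compare digit runs as integers, and let the empty string (which yields an empty key) sort first.

```python
import re


def natural_key(text):
    """Build a sort key in which digit runs compare numerically."""
    key = []
    for i, chunk in enumerate(re.split(r"(\d+)", text)):
        if chunk == "":
            continue
        if i % 2 == 1:
            # Odd positions are the captured digit runs.
            key.append((0, int(chunk), ""))
        else:
            key.append((1, 0, chunk.lower()))
    return key


values = ["1158259", "120900", "", "205556"]
print(sorted(values, key=natural_key))
print(sorted(["cm: 179", "feet: 6", "cm: 20"], key=natural_key))
```

Tagging each chunk as number or text keeps Python from ever comparing an int with a str, which is the kind of mixed case that caused the bugs. One limitation of this sketch: it treats the digits after a decimal point as a separate integer, so it is not a true float comparison.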

sorting in the BKO scheme

Currently there are only two ways to sort superpositions in the BKO scheme: coeff-sort, which sorts by the float coefficients of kets, and ket-sort, which does a natural sort on the ket labels (though occasionally the natural sort breaks!). So today, some examples showing how to sort a superposition/list.

Let's consider the data here, and tweak it to this (noting they are all op-self operators. ie, they only change the coeff of a ket, not the ket label):

area-self |Adelaide> => 1295|Adelaide>
population-self |Adelaide> => 1158259|Adelaide>
annual-rainfall-self |Adelaide> => 600.5|Adelaide>

area-self |Brisbane> => 5905|Brisbane>
population-self |Brisbane> => 1857594|Brisbane>
annual-rainfall-self |Brisbane> => 1146.4|Brisbane>

area-self |Darwin> => 112|Darwin>
population-self |Darwin> => 120900|Darwin>
annual-rainfall-self |Darwin> => 1714.7|Darwin>

area-self |Hobart> => 1357|Hobart>
population-self |Hobart> => 205556|Hobart>
annual-rainfall-self |Hobart> => 619.5|Hobart>

area-self |Melbourne> => 1566|Melbourne>
population-self |Melbourne> => 3806092|Melbourne>
annual-rainfall-self |Melbourne> => 646.9|Melbourne>

area-self |Perth> => 5386|Perth>
population-self |Perth> => 1554769|Perth>
annual-rainfall-self |Perth> => 869.4|Perth>

area-self |Sydney> => 2058|Sydney>
population-self |Sydney> => 4336374|Sydney>
annual-rainfall-self |Sydney> => 1214.8|Sydney>

And I guess now, jump into some examples:

-- sort by area:
sa: coeff-sort area-self "" |city list>
5905.000|Brisbane> + 5386.000|Perth> + 2058.000|Sydney> + 1566.000|Melbourne> + 1357.000|Hobart> + 1295.000|Adelaide> + 112.000|Darwin>

-- tidy that up (by applying the clean sigmoid):
sa: clean coeff-sort area-self "" |city list>
|Brisbane> + |Perth> + |Sydney> + |Melbourne> + |Hobart> + |Adelaide> + |Darwin>

-- sort by population:
sa: clean coeff-sort population-self "" |city list>
|Sydney> + |Melbourne> + |Brisbane> + |Perth> + |Adelaide> + |Hobart> + |Darwin>

-- sort by annual rainfall:
sa: clean coeff-sort annual-rainfall-self "" |city list>
|Darwin> + |Sydney> + |Brisbane> + |Perth> + |Melbourne> + |Hobart> + |Adelaide>
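
The same pipeline as a toy Python model, in case it helps. A superposition is a list of (label, coeff) pairs; coeff_sort and clean are simplified stand-ins for the operators above, assuming coeff-sort puts the largest coeff first and clean maps every positive coeff to 1 and everything else to 0.

```python
def coeff_sort(sp):
    """Sort (label, coeff) pairs by coefficient, largest first."""
    return sorted(sp, key=lambda pair: pair[1], reverse=True)


def clean(sp):
    """Set every positive coefficient to 1, and everything else to 0."""
    return [(label, 1 if coeff > 0 else 0) for label, coeff in sp]


# The result of applying area-self to the city list.
area_self = [("Adelaide", 1295), ("Brisbane", 5905), ("Darwin", 112),
             ("Hobart", 1357), ("Melbourne", 1566), ("Perth", 5386),
             ("Sydney", 2058)]

print([label for label, _ in clean(coeff_sort(area_self))])
```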

So, until today (more in the next post) this along with ket-sort was the only way to sort superpositions. So I suppose I should give a ket-sort example too.

-- let's create a shuffled list:
|shuffled list> => shuffle "" |city list>

-- let's take a look:
sa: "" |shuffled list>
|Hobart> + |Brisbane> + |Melbourne> + |Adelaide> + |Darwin> + |Sydney> + |Perth>
-- yup. Looks sufficiently shuffled.

-- now ket sort:
sa: ket-sort "" |shuffled list>
|Adelaide> + |Brisbane> + |Darwin> + |Hobart> + |Melbourne> + |Perth> + |Sydney>

I guess the only thing left to mention in this post is that they can be applied just before the table operator:

sa: table[city-name,area,population,annual-rainfall] clean coeff-sort population-self "" |city list>
+-----------+------+------------+-----------------+
| city-name | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Darwin    | 112  | 120900     | mm: 1714.7      |
+-----------+------+------------+-----------------+

Finally, a NB. This is the mess you get if you forget to clean the incoming superposition (ie, your incoming coeffs are not all 1):

sa: table[city-name,area,population,annual-rainfall] coeff-sort population-self "" |city list>
+----------------------+-----------------+--------------------+-----------------------+
| city-name            | area            | population         | annual-rainfall       |
+----------------------+-----------------+--------------------+-----------------------+
| 4336374.00 Sydney    | 4336374.00 2058 | 4336374.00 4336374 | 4336374.00 mm: 1214.8 |
| 3806092.00 Melbourne | 3806092.00 1566 | 3806092.00 3806092 | 3806092.00 mm: 646.9  |
| 1857594.00 Brisbane  | 1857594.00 5905 | 1857594.00 1857594 | 1857594.00 mm: 1146.4 |
| 1554769.00 Perth     | 1554769.00 5386 | 1554769.00 1554769 | 1554769.00 mm: 869.4  |
| 1158259.00 Adelaide  | 1158259.00 1295 | 1158259.00 1158259 | 1158259.00 mm: 600.5  |
| 205556.00 Hobart     | 205556.00 1357  | 205556.00 205556   | 205556.00 mm: 619.5   |
| 120900.00 Darwin     | 120900.00 112   | 120900.00 120900   | 120900.00 mm: 1714.7  |
+----------------------+-----------------+--------------------+-----------------------+

Kind of hard to explain to others what is going on here. A couple of pieces that partly explain it are:

-- first noting that it is not a "clean superposition", ie, it has coeffs with value other than 1:
sa: coeff-sort population-self "" |city list>
4336374.000|Sydney> + 3806092.000|Melbourne> + 1857594.000|Brisbane> + 1554769.000|Perth> + 1158259.000|Adelaide> + 205556.000|Hobart> + 120900.000|Darwin>

And this code in the ket class:

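  def readable_display(self):
    # the empty ket |> displays as an empty table cell
    if self.label == '':
      return ""
    # a coeff of exactly 1 is left implicit, so only the label is shown
    if self.value == 1:
      return self.label
    else:
      # any other coeff is prefixed to the label, hence the cluttered table
      return "{0:.2f} {1}".format(self.value,self.label)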

That's it for this post, more sorting in the next one.

Update: Tweaks on the pretty print table code means that it no longer matters if the incoming superposition is a clean superposition or not. Code auto-runs set-to[1] sigmoid on the incoming superposition. I can't think of a use case where you would want otherwise. It is just one line of code to comment out if we do want to switch this feature off though.

Thursday, 29 January 2015

another pretty print table example

Recall some of the data from the other day. Well, let's pretty print some of that.

-- load up the data:
sa: load foaf-example-in-sw.sw

-- first, we need a list of objects of interest:
sa: |list> => |Dan> + |Libby> + |Craig> + |Liz> + |Kathleen> + |Damian> + |Martin> + |organisation: ILRT> + |organisation: Netgates>

-- now show the table:
sa: table[name,where-live,lives-with,email,works-for,website] "" |list>
+------------------------+---------------------------+--------------+----------------------------------+------------------------+---------------------------------+
| name                   | where-live                | lives-with   | email                            | works-for              | website                         |
+------------------------+---------------------------+--------------+----------------------------------+------------------------+---------------------------------+
| Dan                    | UK: Bristol: Zetland road | Libby, Craig | email: danbri@w3.org             | organisation: ILRT     |                                 |
| Libby                  |                           |              | email: libby.miller@bris.ac.uk   | organisation: ILRT     |                                 |
| Craig                  |                           |              | email: craig@netgates.co.uk      | organisation: Netgates |                                 |
| Liz                    | UK: Bristol               | Kathleen     |                                  | organisation: Netgates |                                 |
| Kathleen               |                           |              |                                  | organisation: Netgates |                                 |
| Damian                 | UK: London                |              |                                  |                        |                                 |
| Martin                 |                           |              | email: m.l.poulter@bristol.ac.uk |                        |                                 |
| organisation: ILRT     |                           |              |                                  |                        | url: http://ilrt.org/           |
| organisation: Netgates |                           |              |                                  |                        | url: http://www.netgates.co.uk/ |
+------------------------+---------------------------+--------------+----------------------------------+------------------------+---------------------------------+

-- another example:
sa: |people list> => |Dan> + |Libby> + |Craig> + |Liz> + |Kathleen> + |Damian> + |Martin>
sa: table[name,wife,where-live,lives-with,knows-quite-well] "" |people list>
+----------+------+---------------------------+--------------+---------------------------+
| name     | wife | where-live                | lives-with   | knows-quite-well          |
+----------+------+---------------------------+--------------+---------------------------+
| Dan      |      | UK: Bristol: Zetland road | Libby, Craig |                           |
| Libby    |      |                           |              |                           |
| Craig    | Liz  |                           |              |                           |
| Liz      |      | UK: Bristol               | Kathleen     |                           |
| Kathleen |      |                           |              |                           |
| Damian   |      | UK: London                |              |                           |
| Martin   |      |                           |              | Craig, Damian, Dan, Libby |
+----------+------+---------------------------+--------------+---------------------------+

And I guess that is about it for this example. Though I guess we could again make the observation that even if we have no data on an operator applied to a ket, the code handles it gracefully.

pretty print some data about Australian cities

So, in my travels trying to work out how to write the pretty print table code (and yeah, I wrote my code from scratch, rather than copying anyone else's code), I found this page. Now, I did not even look at the code used there, but I did borrow the data in his table. Let's pretty print that with my new code:

-- first some knowledge:
sa: load pretty-print-table-of-australian-cities.sw
sa: dump
----------------------------------------
|context> => |context: pretty print table of Australian cities>

 |city list> => |Adelaide> + |Brisbane> + |Darwin> + |Hobart> + |Melbourne> + |Perth> + |Sydney>

area |Adelaide> => |1295>
population |Adelaide> => |1158259>
annual-rainfall |Adelaide> => |mm: 600.5>

area |Brisbane> => |5905>
population |Brisbane> => |1857594>
annual-rainfall |Brisbane> => |mm: 1146.4>

area |Darwin> => |112>
population |Darwin> => |120900>
annual-rainfall |Darwin> => |mm: 1714.7>

area |Hobart> => |1357>
population |Hobart> => |205556>
annual-rainfall |Hobart> => |mm: 619.5>

area |Melbourne> => |1566>
population |Melbourne> => |3806092>
annual-rainfall |Melbourne> => |mm: 646.9>

area |Perth> => |5386>
population |Perth> => |1554769>
annual-rainfall |Perth> => |mm: 869.4>

area |Sydney> => |2058>
population |Sydney> => |4336374>
annual-rainfall |Sydney> => |mm: 1214.8>
----------------------------------------

Now, let's pretty print it:

sa: table[city-name,area,population,annual-rainfall] "" |city list>
+-----------+------+------------+-----------------+
| city-name | area | population | annual-rainfall |
+-----------+------+------------+-----------------+
| Adelaide  | 1295 | 1158259    | mm: 600.5       |
| Brisbane  | 5905 | 1857594    | mm: 1146.4      |
| Darwin    | 112  | 120900     | mm: 1714.7      |
| Hobart    | 1357 | 205556     | mm: 619.5       |
| Melbourne | 1566 | 3806092    | mm: 646.9       |
| Perth     | 5386 | 1554769    | mm: 869.4       |
| Sydney    | 2058 | 4336374    | mm: 1214.8      |
+-----------+------+------------+-----------------+

Cool and pretty! Heaps more to come!

new function: pretty print a table

Just as promised recently, we can add more functions when we think of a new need. Today, it took a while, but I wrote a pretty print table function. Feed it a superposition, apply a bunch of operators to it, then spit out a pretty table.

General usage:
table[first-column-heading,op1,op2,...] some-superposition

So, here are some examples. First up temperature:

-- NB: due to quirks of the parser we need to do these two first:
-- (essentially casting function operators {F,K} to literal operators)
sa: F |*> #=> F |_self>
sa: K |*> #=> K |_self>

-- now, spit out a temperature table:
sa: table[C,F,K] range(|C: 0>,|C: 100>,|10>)
+--------+-----------+-----------+
| C      | F         | K         |
+--------+-----------+-----------+
| C: 0   | F: 32.00  | K: 273.15 |
| C: 10  | F: 50.00  | K: 283.15 |
| C: 20  | F: 68.00  | K: 293.15 |
| C: 30  | F: 86.00  | K: 303.15 |
| C: 40  | F: 104.00 | K: 313.15 |
| C: 50  | F: 122.00 | K: 323.15 |
| C: 60  | F: 140.00 | K: 333.15 |
| C: 70  | F: 158.00 | K: 343.15 |
| C: 80  | F: 176.00 | K: 353.15 |
| C: 90  | F: 194.00 | K: 363.15 |
| C: 100 | F: 212.00 | K: 373.15 |
+--------+-----------+-----------+

-- Now pretty print as a table:
sa: table[name,mother,father,age,height] (|Fred> + |Nicole> + |Sam>)
+--------+--------+--------+---------+---------+
| name   | mother | father | age     | height  |
+--------+--------+--------+---------+---------+
| Fred   | Jane   | Robert | age: 21 | cm: 179 |
| Nicole | Bev    |        |         | cm: 168 |
| Sam    | Betty  | Tom    | age: 27 | cm: 183 |
+--------+--------+--------+---------+---------+

Note that it gracefully handles not knowing something: in this table, Nicole's father and age.
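
To give a feel for what the table operator is doing, here is a rough Python sketch. It is not my actual code. Operators are modelled as plain dicts from label to value (an assumption made purely for illustration), an unknown value becomes an empty cell, and each column is padded to its widest entry.

```python
def pretty_table(headings, rows):
    """Format rows of strings as a bordered, left-aligned text table."""
    widths = [max(len(cell) for cell in column)
              for column in zip(headings, *rows)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(cells):
        padded = [c.ljust(w) for c, w in zip(cells, widths)]
        return "| " + " | ".join(padded) + " |"

    body = [line(row) for row in rows]
    return "\n".join([border, line(headings), border] + body + [border])


def table(headings, operators, labels):
    """Apply each operator to each label; unknown values become ''."""
    rows = [[label] + [op.get(label, "") for op in operators]
            for label in labels]
    return pretty_table(headings, rows)


mother = {"Fred": "Jane", "Nicole": "Bev", "Sam": "Betty"}
father = {"Fred": "Robert", "Sam": "Tom"}
age = {"Fred": "age: 21", "Sam": "age: 27"}
height = {"Fred": "cm: 179", "Nicole": "cm: 168", "Sam": "cm: 183"}

print(table(["name", "mother", "father", "age", "height"],
            [mother, father, age, height],
            ["Fred", "Nicole", "Sam"]))
```

The first heading is just a label for the incoming kets; the remaining headings double as the operator names, which is why the usage line is table[first-column-heading,op1,op2,...].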

Next example, pretty printing some of what we know about early US presidents:

sa: load early-us-presidents.sw
sa: table[name,full-name,president-number,party] "" |early US Presidents: _list>
+------------+---------------------------+------------------+------------------------------+
| name       | full-name                 | president-number | party                        |
+------------+---------------------------+------------------+------------------------------+
| Washington | person: George Washington | number: 1        | party: Independent           |
| Adams      | person: John Adams        | number: 2        | party: Federalist            |
| Jefferson  | person: Thomas Jefferson  | number: 3        | party: Democratic-Republican |
| Madison    | person: James Madison     | number: 4        | party: Democratic-Republican |
| Monroe     | person: James Monroe      | number: 5        | party: Democratic-Republican |
| Q Adams    | person: John Quincy Adams | number: 6        | party: Democratic-Republican |
+------------+---------------------------+------------------+------------------------------+

sa: years-in-office |*> #=> extract-value president-era |_self>
sa: table[name,years-in-office] "" |early US Presidents: _list>
+------------+------------------------------------------------------+
| name       | years-in-office                                      |
+------------+------------------------------------------------------+
| Washington | 1789, 1790, 1791, 1792, 1793, 1794, 1795, 1796, 1797 |
| Adams      | 1797, 1798, 1799, 1800, 1801                         |
| Jefferson  | 1801, 1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809 |
| Madison    | 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817 |
| Monroe     | 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825 |
| Q Adams    | 1825, 1826, 1827, 1828, 1829                         |
+------------+------------------------------------------------------+

Now, another example that ties in with my recent post:

-- load up some knowledge:
age |person: Emma> => |age: 12>
age |person: Fred> => |age: 17>
age |person: Sam> => |age: 18>
age |person: Liz> => |age: 19>
age |person: Jack> => |age: 20>
is-teenager |person: *> #=> do-you-know drop-below[13] drop-above[19] pop-float age |_self>
is-adult |person: *> #=> do-you-know drop-below[18] pop-float age|_self>
|list> => |person: Emma> + |person: Fred> + |person: Sam> + |person: Liz> + |person: Jack>

-- take a look in table format:
sa: table[name,age,is-teenager,is-adult] "" |list>
+--------------+---------+-------------+----------+
| name         | age     | is-teenager | is-adult |
+--------------+---------+-------------+----------+
| person: Emma | age: 12 | no          | no       |
| person: Fred | age: 17 | yes         | no       |
| person: Sam  | age: 18 | yes         | yes      |
| person: Liz  | age: 19 | yes         | yes      |
| person: Jack | age: 20 | no          | yes      |
+--------------+---------+-------------+----------+

Now, an example that ties in with my example of a simple network:

-- load up the data:
sa: load simple-network.sw

-- define some operators:
-- (and again, due to the parser, we have to do it indirectly)
sa: O2 |*> #=> O^2 |_self>
sa: O3 |*> #=> O^3 |_self>
sa: O4 |*> #=> O^4 |_self>
sa: O5 |*> #=> O^5 |_self>

-- take a look:
sa: table[position,O,O2,O3,O4,O5] relevant-kets[O]
+----------+--------+--------+--------+--------+--------+
| position | O      | O2     | O3     | O4     | O5     |
+----------+--------+--------+--------+--------+--------+
| a1       | a2     | a3     | a4     | a5     | a6     |
| a2       | a3     | a4     | a5     | a6     | a7     |
| a3       | a4     | a5     | a6     | a7     | a8     |
| a4       | a5     | a6     | a7     | a8     | a9     |
| a5       | a6     | a7     | a8     | a9     | a10    |
| a6       | a7     | a8     | a9     | a10    | a1, b1 |
| a7       | a8     | a9     | a10    | a1, b1 | a2, b2 |
| a8       | a9     | a10    | a1, b1 | a2, b2 | a3, b3 |
| a9       | a10    | a1, b1 | a2, b2 | a3, b3 | a4, b4 |
| a10      | a1, b1 | a2, b2 | a3, b3 | a4, b4 | a5, b5 |
| b1       | b2     | b3     | b4     | b5     | b6     |
| b2       | b3     | b4     | b5     | b6     | b7     |
| b3       | b4     | b5     | b6     | b7     | b1     |
| b4       | b5     | b6     | b7     | b1     | b2     |
| b5       | b6     | b7     | b1     | b2     | b3     |
| b6       | b7     | b1     | b2     | b3     | b4     |
| b7       | b1     | b2     | b3     | b4     | b5     |
+----------+--------+--------+--------+--------+--------+

And I guess that is about it. This thing is going to be useful all over the place.

Update: an important thing to note is that these tables are not just stored values. Some of the entries (eg, is-teenager, is-adult, years-in-office) are calculated at the time we generate the table, and calculated again the next time we ask the code to spit out a table. Remember, to generate our tables we are applying operators to the incoming superposition, not just looking up values. One consequence is that if in the background we update Emma's age (she had a birthday, perhaps, or we discovered we had made a mistake), the is-teenager and is-adult operators will do the right thing. There is no need to update those elements manually, as we would have to if we were working with just stored values. And if you do want to store values, instead of calculating them over and over with each table, just use the map function. Perhaps I will explain later! Of course, if you do this and then change the definition of your operator, the stored values will not be updated.

Another thing to note is that if you want a different name for your column heading than the operator name, just use the standard method for defining an alias. eg:
better-column-name |*> #=> unwanted-operator-name |_self>
And instead of:
table[foo,unwanted-operator-name] "" |list>
use:
table[foo,better-column-name] "" |list>

Tuesday, 27 January 2015

is-teenager and is-adult in BKO

is-teenager |person: *> #=> do-you-know drop sigmoid-in-range[13,19] pop-float age |_self>
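
For readers more at home in Python, here is a rough sketch of what that one-liner does when read right to left. The `ages` dict and the function names are assumptions made up for this illustration, not the project's actual code, and it only handles a single ket at a time.

```python
# Hypothetical stand-in for learned "age |person: X> => |age: N>" rules.
ages = {
    "person: Emma": "age: 12",
    "person: Fred": "age: 17",
    "person: Sam": "age: 18",
    "person: Liz": "age: 19",
    "person: Jack": "age: 20",
}

def pop_float(label, coeff=1.0):
    # |a: n> becomes n|a> when the last piece of the label parses as a float
    head, _, tail = label.rpartition(": ")
    try:
        return head, coeff * float(tail)
    except ValueError:
        return label, coeff

def sigmoid_in_range(x, a, b):
    return x if a <= x <= b else 0

def is_teenager(person):
    age = ages.get(person)                    # age |_self>
    if age is None:
        return "no"                           # nothing known, so do-you-know says no
    _, coeff = pop_float(age)                 # pop-float
    coeff = sigmoid_in_range(coeff, 13, 19)   # sigmoid-in-range[13,19]
    survives_drop = coeff > 0                 # drop removes kets with coeff <= 0
    return "yes" if survives_drop else "no"   # do-you-know
```

The point of the chain is that an out-of-range age ends up with coefficient 0, drop then removes it, and do-you-know reports whether anything is left.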

Sunday, 25 January 2015

FOAF vs sw

So, I was doing a little reading about FOAF, as it has some overlap with what I am trying to do. Anyway, here is a quote from that page:

Here's an example, a fragment from the mostly-fictional FOAF database. First we list some facts, then describe how the FOAF system makes it possible to explore the Web learning such things.

Dan lives in Zetland road, Bristol, UK with Libby and Craig. Dan's email address is danbri@w3.org. Libby's email address is libby.miller@bris.ac.uk. Craig's is craig@netgates.co.uk. Dan and Libby work for an organisation called "ILRT" whose website is at http://ilrt.org/. Craig works for "Netgates", an organisation whose website is at http://www.netgates.co.uk/. Craig's wife Liz lives in Bristol with Kathleen. Kathleen and Liz also work at "Netgates". Damian lives in London. Martin knows Craig, Damian, Dan and Libby quite well. Martin lives in Bristol and has an email address of m.l.poulter@bristol.ac.uk. (etc...)

Feeling in a minimalist word mood, here is how we would represent that knowledge in sw:

where-live |Dan> => |UK: Bristol: Zetland road>
lives-with |Dan> => |Libby> + |Craig>
email |Dan> => |email: danbri@w3.org>
works-for |Dan> => |organisation: ILRT>

email |Libby> => |email: libby.miller@bris.ac.uk>
works-for |Libby> => |organisation: ILRT>

email |Craig> => |email: craig@netgates.co.uk>
works-for |Craig> => |organisation: Netgates>
wife |Craig> => |Liz>

where-live |Liz> => |UK: Bristol>
lives-with |Liz> => |Kathleen>
works-for |Liz> => |organisation: Netgates>

works-for |Kathleen> => |organisation: Netgates>

website |organisation: ILRT> => |url: http://ilrt.org/>

website |organisation: Netgates> => |url: http://www.netgates.co.uk/>

where-live |Damian> => |UK: London>

knows-quite-well |Martin> => |Craig> + |Damian> + |Dan> + |Libby>
where-live |Martin> => |UK: Bristol>
email |Martin> => |email: m.l.poulter@bristol.ac.uk>

Next example on that page:

- Find me today's web page recommendations made by people who work for Medical organisations
- Find me recent publications by people I've co-authored documents with
- Show me critiques of this web page, and the home pages of the author of that critique

One page might tell us that "daniel.brickley@bristol.ac.uk works-at http://ilrt.org/". Another might tell us that "http://ilrt.org/ based-in bristol". On this basis, RDF-aware tools could conclude that the person whose email address is daniel.brickley@bristol.ac.uk works for an organisation based in Bristol.

-- learn this knowledge
works-at |daniel.brickley@bristol.ac.uk> => |http://ilrt.org/>
based-in |http://ilrt.org/> => |Bristol>

-- ask in the console:
sa: based-in works-at |daniel.brickley@bristol.ac.uk>
|Bristol>
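
To spell out the chaining, here is a minimal Python sketch of applying literal operators right to left. The `rules` dict and the `apply_op` / `apply_chain` names are assumptions invented for this illustration (this is not the project's code), and coefficients are ignored to keep it short.

```python
# (operator, ket label) -> list of ket labels; mirrors the two learn rules above
rules = {
    ("works-at", "daniel.brickley@bristol.ac.uk"): ["http://ilrt.org/"],
    ("based-in", "http://ilrt.org/"): ["Bristol"],
}

def apply_op(op, kets):
    # apply one literal operator to every ket and collect the results (linearity)
    result = []
    for ket in kets:
        for out in rules.get((op, ket), []):
            if out not in result:
                result.append(out)
    return result

def apply_chain(ops, ket):
    # operators are written left to right but applied right to left
    kets = [ket]
    for op in reversed(ops):
        kets = apply_op(op, kets)
    return kets

answer = apply_chain(["based-in", "works-at"], "daniel.brickley@bristol.ac.uk")
```

An undefined rule simply contributes nothing, which plays the role of the empty ket |>.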

And that is it for today! I hope a) the examples make sense, and b) I have shown a little of the power of the whole BKO scheme. Heh, and we haven't even used any function operators, this was all literal operators, and a big dose of linearity of literal operators.

I guess a couple of observations. 1) imagine how much harder this would be to do if we used standard neural net matrices and vectors. I think my symbolic notation is much simpler to use. 2) notice how close the questions asked in the console are to English. Again, another win for my notation.

Update: now we have pretty print table code, we can do things like:

sa: author-homepage |*> #=> homepage author |_self>
sa: table[critique,author,author-homepage] list-of-critiques |this web page>
+------------+--------+-----------------+
| critique   | author | author-homepage |
+------------+--------+-----------------+
| critique 1 | Liz    | http://liz.org  |
| critique 2 | Ron    | http://ron.org  |
+------------+--------+-----------------+

Saturday, 24 January 2015

a big collection of function operators

I have been putting this off, as it will be quite a bit of work. But now is the time to try and describe some of the more interesting function operators. There is a whole collection of them (a quick grep says about 150 of them) in this file. And unlike the functions built into ket/sp classes, the plan is to add as many of these as we want or need. Indeed, get other people to write them too (if I can get others interested in this project). Note that behind the scenes, once you have a new function you need to "wire it in" to the processor. This currently means adding an entry into the appropriate hash-table (a black art that, on first try, I frequently get wrong! Though the console debugging info on similar functions is often helpful.)

Preamble over, let's jump in.

-- the ket-length function:
ket-length |abcde> == |number: len(abcde)>

-- the apply-value function:
apply-value |a: b: n> == n |a: b: n> (if n is a float)
apply-value |a: b: n> == |a: b: n> (otherwise)

-- the extract category/data-type function:
extract-category |a> == |>
extract-category |a: b> == |a>
extract-category |a: b: c> == |a: b>

-- the extract value function (the opposite of extract-category):
extract-value |a> == |a>
extract-value |a: b> == |b>
extract-value |a: b: c> == |c>

-- the category depth function:
cat-depth |> == |number: 0>
cat-depth |a> == |number: 1>
cat-depth |a: b> == |number: 2>
cat-depth |a: b: c> == |number: 3>
cat-depth |a: b: c: d: e: f: g> == |number: 7>

-- the expand-hierarchy function:
sa: expand-hierarchy |a: b: c: d: e>
|a> + |a: b> + |a: b: c> + |a: b: c: d> + |a: b: c: d: e>

-- pop-float and push-float
-- Here are some examples:
-- NB: this is not |>, there is a space in there, an important distinction!
pop-float |3.2> == 3.2| >
pop-float 5|7> == 35| > -- NB: the multiplication of 5 and 7
pop-float |x: 2> == 2|x>
pop-float 5.1|x: y: 2> == 10.2|x: y> -- NB: the multiplication of 5.1 and 2
pop-float |x: y> == |x: y>

push-float n|> == |> for all n
push-float 3| > == |3> (NB: the space in there, | > not |>)
push-float |x> == |x: 1>
push-float 3|x> == |x: 3>
push-float 3.2|x: y> == |x: y: 3.2>

-- a couple of example usages:
-- action man reached a height 4 times that of everest
-- first, learn height of everest:
height |everest> => |km: 8>
-- learn height of "action man", noting that the units of height for everest are irrelevant.
height |action man> => push-float 4 pop-float height |everest>
-- "some mountain" is 1/3 the height of everest
height |some mountain> => push-float 0.3333 pop-float height |everest>
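
The pop-float and push-float rules can be sketched in a few lines of Python. This is an illustration only: a ket is modelled as a (label, coeff) tuple, and the `EMPTY` and `fmt` helpers are assumptions, not names from the project.

```python
EMPTY = ("", 0.0)   # stands in for the empty ket |>

def fmt(x):
    # 3.0 -> "3", 3.2 -> "3.2"
    return str(int(x)) if float(x).is_integer() else str(x)

def pop_float(label, coeff=1.0):
    # move a trailing float out of the label and into the coefficient
    if label == "":
        return EMPTY
    head, sep, tail = label.rpartition(": ")
    try:
        value = float(tail)
    except ValueError:
        return label, coeff            # no float to pop, leave the ket alone
    return (head if sep else " "), coeff * value

def push_float(label, coeff=1.0):
    # move the coefficient into the label, leaving a coefficient of 1
    if label == "":
        return EMPTY
    if label == " ":
        return fmt(coeff), 1.0
    return label + ": " + fmt(coeff), 1.0

# the "action man" example: push-float 4 pop-float height |everest>
label, coeff = pop_float("km: 8")
action_man = push_float(label, 4 * coeff)
```

In this sketch `action_man` comes out as the ket |km: 32>, and the unit label is carried through untouched, which is why the units of everest's height are irrelevant.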

-- the to-coeff function
-- kind of a dual to the clean sigmoid
-- clean sets all coeffs to 1
-- to-coeff sets all labels to | >
-- (excluding the identity operator, which we leave intact)
to-coeff n|> == |> for all n
to-coeff n|a> == n| > for all a

-- the to-number function
-- eg, as used in the algebra() code
-- idea, is to map all types of kets to the form "n | >", where n is a float
to-number |7.2> == 7.200| >
to-number 3|9> == 27| >
to-number |number: 3.1415> == 3.142| >
to-number 8|number: 3> == 24.000| >
-- NB: this code treats the "number" data-type differently than other types:
to-number |number: not-a-float> == 0| >
-- when you use a data-type other than "number" we just return the input ket:
to-number |a: b> == |a: b>
to-number 27|a: b: c: d: e> == 27.000|a: b: c: d: e>

-- the round[t] function
-- rounds floats to t decimal places
-- round[t] |a: b: n> == |a: b: round(n,t)> if n is a float, else |a: b: n>
-- eg:
round[2] |pi: 3.14159265> == |pi: 3.14>
round[7] |a: b: c> == |a: b: c>

-- the range function (this one is very useful in defining lists to work on):
-- categories/data-types must be equal:
-- in this case "a" != "b"
sa: range(|a: 2>,|b: 5>)
|>

-- default is step of size 1
sa: range(|5>,|11>)
|5> + |6> + |7> + |8> + |9> + |10> + |11>

-- specify a data-type (here "x"):
sa: range(|x: 1>,|x: 6>)
|x: 1> + |x: 2> + |x: 3> + |x: 4> + |x: 5> + |x: 6>

-- step size of 2
sa: range(|5>,|11>,|2>)
|5> + |7> + |9> + |11>

-- float step size of 0.25
sa: range(|5>,|7>,|0.25>)
|5.00> + |5.25> + |5.50> + |5.75> + |6.00> + |6.25> + |6.50> + |6.75> + |7.00>

-- negative step sizes are currently broken!
range(|5>,|8>,|-1>) == |>
range(|8>,|5>,|-1>) == |8> + |7>

-- the arithmetic function:
-- categories/data-types must be equal (to prevent mixed-type errors):
-- in this case "a" != "b"
arithmetic(|a: 5>,|+>,|b: 3>) == |>

-- this is one way to ensure data-types are equal:
-- NB: the to-km operator applied to the ket using miles.
arithmetic(to-km |miles: 5>,|+>,|km: 3>) == |km: 11.047>

-- more generally (assuming "a" and "b" have to-X defined correctly):
arithmetic(to-X |a>,|op>,to-X |b>)

Final note, arithmetic supports these operators: +, -, *, /, %, ^
(addition, subtraction, multiplication, division, modulus, exponentiation)

-- the algebra function:
-- (13x + 17)*(19y + 2z + 5)
sa: algebra(13|x> + |17>,|*>,19|y> + 2|z> + |5>)
247.000|x*y> + 26.000|x*z> + 65.000|x> + 323.000|y> + 34.000|z> + 85.000| >

-- (a + b)^6
sa: algebra(|a> + |b>,|^>,|6>)
|a*a*a*a*a*a> + 6.000|a*a*a*a*a*b> + 15.000|a*a*a*a*b*b> + 20.000|a*a*a*b*b*b> + 15.000|a*a*b*b*b*b> + 6.000|a*b*b*b*b*b> + |b*b*b*b*b*b>

And note that algebra currently supports these operators: +, -, *, ^
(addition, subtraction, multiplication, exponentiation)
Also note that currently algebra is Abelian,
ie, labels commute: |x*y> == |y*x>

-- set union and intersection:
-- if coeffs are in {0,1} it works like standard union and intersection:
sa: union(|a> + |c> + |d>,|a> + |b> + |c> + |d> + |e>)
|a> + |c> + |d> + |b> + |e>

sa: intersection(|a> + |c> + |d>,|a> + |b> + |c> + |d> + |e>)
|a> + |c> + |d>

-- if coeffs are not strictly {0,1} then union is max(a,b) and intersection is min(a,b)
-- eg, the simplest possible example:
sa: union(3|a>,7|a>)
7.000|a>

sa: intersection(3|a>,7|a>)
3.000|a>

-- extends in the obvious way for more interesting superpositions:
sa: union(2|a> + 0.3|b> + 0|c> + 13|d> + 0.9|e>,|a> + 11|b> + 23|c> + 0.5|d> + 7|e>)
2.000|a> + 11.000|b> + 23.000|c> + 13.000|d> + 7.000|e>

sa: intersection(2|a> + 0.3|b> + 0|c> + 13|d> + 0.9|e>,|a> + 11|b> + 23|c> + 0.5|d> + 7|e>)
|a> + 0.300|b> + 0.500|d> + 0.900|e>

-- using the same back-end code, we can implement other examples of foo(a,b).
-- eg, multiplication and addition, and so on.
sa: multiply(2|a> + 3|b> + 5|c>,7|a> + 5|b> + 0|c> + 9|d>)
14.000|a> + 15.000|b> + 0.000|c> + 0.000|d>

sa: addition(2|a> + 3|b> + 5|c>,7|a> + 5|b> + 0|c> + 9|d>)
9.000|a> + 8.000|b> + 5.000|c> + 9.000|d>
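
The shared back-end idea can be sketched in Python as one helper that combines coefficients label by label. The dict representation and the `combine` name are assumptions for illustration. Dropping zero-coefficient kets from the intersection is inferred from the console output above rather than from the project's source.

```python
import operator

def combine(fn, a, b):
    # keep a's labels in order, then any labels that only appear in b
    labels = list(a) + [k for k in b if k not in a]
    return {k: fn(a.get(k, 0), b.get(k, 0)) for k in labels}

def union(a, b):
    return combine(max, a, b)

def intersection(a, b):
    # min of the coeffs, with zero results left out
    return {k: v for k, v in combine(min, a, b).items() if v > 0}

def multiply(a, b):
    return combine(operator.mul, a, b)

def addition(a, b):
    return combine(operator.add, a, b)

x = {"a": 2, "b": 0.3, "c": 0, "d": 13, "e": 0.9}
y = {"a": 1, "b": 11, "c": 23, "d": 0.5, "e": 7}
```

With coefficients restricted to {0,1}, max and min reduce to the usual set union and intersection.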

-- now a couple of really simple ones:
-- spell and read:
sa: spell |word: frog>
|letter: f> + |letter: r> + |letter: o> + |letter: g>

-- NB: since it is a superposition, the duplicate letters get added together.
-- plan is to eventually have a sequence type, where this doesn't happen
-- in that case we would instead have:
-- |letter: l> . |letter: e> . |letter: t> . |letter: t> . |letter: e> . |letter: r>
sa: spell |word: letter>
|letter: l> + 2.000|letter: e> + 2.000|letter: t> + |letter: r>

-- NB: read ignores case and punctuation, as we can see:
sa: read |text: I don't know about that!>
|word: i> + |word: don't> + |word: know> + |word: about> + |word: that>

-- now, spell assumes the "word" data-type, and read assumes the "text" data-type
-- and returns |> if they are not, but if it turns out this isn't useful (I think it will be),
-- it is trivial to change.

-- now, their inverse, which I had totally forgotten about (heh, that's how useful they are :).
sa: read-letters spell |word: letter>
|word: letter>

sa: read-words read |text: I don't know about that!>
|text: i don't know about that>
-- again, they would work better using sequences, not superpositions.

-- now code wise simple, but useful:
-- merge-labels()
sa: merge-labels(|a> + |b> + |c> + |d> + |e>)
|abcde>

-- now a couple of simple number related functions:
is-prime |number: n> == |yes> (if n is prime)
is-prime |number: n> == |no> (if n is not prime)
is-prime |blah> == |> (since we require the "number" data-type)
is-prime |blah: n> == |>
factor |number: n> returns list of prime factors, and again requires the "number" data-type.

sa: is-prime |number: 21>
|no>

-- as far as I know the python is using arbitrary precision integers:
sa: is-prime |number: 90214539181246357>
|yes>

sa: factor |number: 210>
|number: 2> + |number: 3> + |number: 5> + |number: 7>

sa: factor |number: 398714527>
|number: 521> + |number: 765287>

sa: factor |number: 987298762329>
4.000|number: 3> + |number: 11> + |number: 1108079419>
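
Note how repeated prime factors show up as a coefficient (the 4.000|number: 3> above is 3^4). A minimal Python sketch of that behaviour, using plain trial division, might look like this. I am assuming trial division purely for illustration; it says nothing about which method the project's code uses, and it would be far too slow for something like the 17-digit is-prime example.

```python
def factor(n):
    # prime factors of n as {prime: multiplicity}, mirroring coeff|number: prime>
    counts = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            counts[p] = counts.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        counts[n] = counts.get(n, 0) + 1
    return counts

def is_prime(n):
    # a number is prime exactly when it is its own only factor
    return n > 1 and factor(n) == {n: 1}
```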

-- convert numbers into the word equivalent
-- (and eventually we would want the inverse too)
-- currently unimplemented!
-- though it would look something like this:
number-to-words |number: 7> => |text: seven>
number-to-words |number: 35> => |text: thirty five>
number-to-words |number: 137> => |text: one hundred and thirty seven>
number-to-words |number: 8,921> => |text: eight thousand, nine hundred and twenty one>
number-to-words |number: 54,329> => |text: fifty four thousand, three hundred and twenty nine>
number-to-words |number: 673,421> => |text: six hundred and seventy three thousand, four hundred and twenty one>
number-to-words |number: 3,896,520> => |text: three million, eight hundred and ninety six thousand, five hundred and twenty>

-- convert decimal number to another base:
sa: to-base(|350024>,|2>)
0.000|1> + 0.000|2> + 0.000|4> + |8> + 0.000|16> + 0.000|32> + |64> + 0.000|128> + |256> + |512> + |1024> + 0.000|2048> + |4096> + 0.000|8192> + |16384> + 0.000|32768> + |65536> + 0.000|131072> + |262144>

sa: to-base(|350024>,|8>)
0.000|1> + |8> + 5.000|64> + 3.000|512> + 5.000|4096> + 2.000|32768> + |262144>

sa: to-base(|350024>,|10>)
4.000|1> + 2.000|10> + 0.000|100> + 0.000|1000> + 5.000|10000> + 3.000|100000>
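
In the output, each ket label is a place value (a power of the base) and each coefficient is the digit at that place. A short Python sketch of the same idea, with a made-up function name and a list of (place value, digit) pairs standing in for the superposition:

```python
def to_base(n, base):
    # returns [(place value, digit), ...] starting from the least significant digit
    digits = []
    place = 1
    while n > 0:
        n, digit = divmod(n, base)
        digits.append((place, digit))
        place *= base
    return digits
```

Summing place * digit over the pairs recovers the original number, which makes for an easy sanity check.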

-- now a couple of functions to swap between temperature and distance units
-- proof of concept really, in practice we would want more (for other unit types),
-- and a cleaner way to implement them
-- F operator maps Celsius and Kelvin to Fahrenheit:
sa: F |C: 0>
|F: 32.00>

sa: F |C: 100>
|F: 212.00>

sa: F |K: 0>
|F: -459.67>

-- C maps Fahrenheit and Kelvin to Celsius:
sa: C |K: 0>
|C: -273.15>

sa: C |F: 0>
|C: -17.78>

sa: C |F: 100>
|C: 37.78>

-- K maps Fahrenheit and Celsius to Kelvin:
sa: K |C: 18>
|K: 291.15>

sa: K |C: 0>
|K: 273.15>

sa: K |F: 100>
|K: 310.93>

-- now similar, but for distances:
-- to-km maps meters or miles to km:
sa: to-km |miles: 1>
|km: 1.609>

-- to-meter maps km or miles to meters
sa: to-meter |miles: 7>
|m: 11265.408>

sa: to-meter |km: 5.213>
|m: 5213.000>

-- to-mile(s) maps km or m to miles
sa: to-miles |km: 42>
|miles: 26.098>

sa: to-miles |m: 800>
|miles: 0.497>

-- now a fun one! This should be useful in a bunch of places:
-- the list-to-words function:
list-to-words |x> == |x>
list-to-words (|x> + |y>) == |x and y>
list-to-words (|x> + |y> + |z>) == |x, y and z>
list-to-words (|x> + |y> + |z> + |u> + |v>) == |x, y, z, u and v>
and so on.

-- a practical example:
-- learn Eric's list of friends:
sa: friends |person: Eric> => |person: Fred> + |person: Sam> + |person: Harry> + |person: Mary> + |person: liz>

-- output Eric's list of friends:
sa: list-to-words extract-value friends |person: Eric>
|Fred, Sam, Harry, Mary and liz>
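
The joining rule is simple enough to sketch in a few lines of Python. Here the superposition is just a list of label strings, which is an assumption for illustration (coefficients are ignored):

```python
def list_to_words(labels):
    # commas between all but the last two items, "and" before the last one
    labels = list(labels)
    if len(labels) <= 1:
        return "".join(labels)
    return ", ".join(labels[:-1]) + " and " + labels[-1]

friends = list_to_words(["Fred", "Sam", "Harry", "Mary", "liz"])
```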

-- the "common" function (a type of intersection)
-- (though intersection is currently limited to 2 or 3 parameters, common can handle any number)
common[op] (|x> + |y> + |z>)
-- expands to:
intersection(op|x>, op|y>, op|z>)

-- some common usages:
common[friends] (|Fred> + |Sam>)
common[actors] (|movie-1> + |movie-2>)
-- or indirectly:
|list> => |Fred> + |Sam> + |Charles> + |Liz>
common[friends] "" |list>

-- next, we have an if statement in BKO.
-- really does require its own post, to explain best how to use it. Perhaps later.
-- raw details are just:
if(|x>,|a>,|b>) returns |a> if |x> == |True>, |b> otherwise

-- and its more useful brother (since we try to avoid just living in the {0,1} world):
-- the weighted-if function:
wif(|x>,|a>,|b>)
eg:
wif(0.7|x>,|a>,|b>)
if |x> == |True>, returns 0.7|a> + 0.3|b>
if |x> != |True>, returns 0.3|a> + 0.7|b>

-- next, the map function, again this one is very useful!
-- we need this since we don't have multi-line for loops, so we use this to map operators to a list of kets.
map[op] (|x> + |y> + |z>)
runs:
op |x> => op |_self>
op |y> => op |_self>
op |z> => op |_self>

map[fn,result] (|a> + |b> + |c> + |d>)
runs:
result |a> => fn |_self>
result |b> => fn |_self>
result |c> => fn |_self>
result |d> => fn |_self>

-- most common usage is:
fn |*> #=> ... some details here
map[fn,result] "" |some list>
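
One thing worth seeing in miniature is the difference between a value stored by map and a value computed on demand. The Python below is only a sketch under assumed names: `stored` stands in for learned rules, and `is_adult` with its threshold of 18 is a made-up definition for the demo.

```python
ages = {"Emma": 12, "Fred": 17}      # hypothetical data
stored = {}                          # (operator name, ket) -> stored value

def is_adult(person):
    # an assumed definition, just for the demo
    return "yes" if ages[person] >= 18 else "no"

def map_op(fn, result, kets):
    # map[fn,result] SP: for each ket, learn  result |ket> => fn |ket>
    for ket in kets:
        stored[(result, ket)] = fn(ket)

map_op(is_adult, "stored-is-adult", ["Emma", "Fred"])

ages["Fred"] = 18                    # Fred has a birthday
computed_now = is_adult("Fred")                    # recomputed, so it sees the new age
stored_then = stored[("stored-is-adult", "Fred")]  # frozen at the time map ran
```

After the birthday the computed answer changes but the stored one does not, so storing is a trade of freshness for speed.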

-- the exp function:
exp[op,n] |x>
maps to:
(1 + op + op^2 + ... + op^n) |x>

-- the exp-max function:
exp-max[op] |x>
maps to:
(1 + op + op^2 + ... + op^n) |x>
for an n such that exp[op,n] |x> == exp[op,n+1] |x>
-- ie, we have found every "child node" of |x>
-- with a warning that we have no idea how big the result is going to be, or how many steps deep.
-- a common usage is to find 6 degrees of separation:
exp[friends,6] |Fred>
exp-max[friends] |Fred>
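
Here is a small Python sketch of exp and exp-max over a toy friends graph. The graph, the dict representation and the function names are all assumptions for illustration. For exp-max I compare only the set of labels reached, not the coefficients, since on a graph with cycles the coefficients keep growing; whether the project's code makes the same choice is not something I can confirm from the description.

```python
friends = {"Fred": ["Sam", "Liz"], "Sam": ["Fred", "Rob"], "Rob": ["Tom"]}

def apply_op(graph, sp):
    # apply the operator once to every ket, adding up coefficients
    out = {}
    for label, coeff in sp.items():
        for target in graph.get(label, []):
            out[target] = out.get(target, 0) + coeff
    return out

def exp(graph, n, start):
    # (1 + op + op^2 + ... + op^n) |start>
    total = {start: 1}
    current = {start: 1}
    for _ in range(n):
        current = apply_op(graph, current)
        for label, coeff in current.items():
            total[label] = total.get(label, 0) + coeff
    return total

def exp_max(graph, start):
    # keep raising n until one more step reaches no new labels
    n = 0
    reached = set(exp(graph, n, start))
    while True:
        bigger = set(exp(graph, n + 1, start))
        if bigger == reached:
            return reached, n
        reached, n = bigger, n + 1
```

Recomputing exp from scratch at each n is wasteful, but it keeps the sketch close to the definition given above.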

-- the apply() function:
-- again, this is one of those very useful ones!
eg: apply(|op: age> + |op: friends> + |op: father>,|Fred>)
maps to:
age |Fred> + friends |Fred> + father |Fred>

-- a common usage is to define a list of operators separately:
|op list> => |op: mother> + |op: father> + |op: dob> + |op: age>
-- then apply them:
apply("" |op list>,|Fred>)

-- eg, maybe use like this:
|basic info op list> => |op: mother> + |op: father> + |op: height> + |op: age> + |op: eye-colour>
basic-info |*> #=> apply("" |basic info op list>,|_self>)
basic-info |Fred>
basic-info |Sam>

-- here is a toy function, maps dates to day of the week:
sa: day-of-the-week |date: 2015/01/24>
|day: Saturday>

-- here is one that saves typing, the split operator:
sa: split |a b c d e>
|a> + |b> + |c> + |d> + |e>

sa: split |word1 word2 word3 word4>
|word1> + |word2> + |word3> + |word4>
-- currently only splits on space chars, but it may be useful to be able to specify the split char(s).

-- the clone ket function (not yet sure of a use case):
-- clone(|x>,|y>) copies rules from |x> and applies them to |y>
-- hence the name, clone().
-- say we have:
-- age |x> => |27>
-- mother |x> => |Jane>
-- after clone(|x>,|y>) we have:
-- age |y> == |27>
-- mother |y> == |Jane>
-- eg, if |x> and |y> are twin sisters.
--
-- thought of a use case:
-- say we have just learnt "elm" is a type of tree.
-- well, load that up with some default values we know about all trees:
-- (cf. inheriting from a parent class in OO programming)
-- clone(|plant: tree>,|plant: tree: elm>)
-- then fill in more specific data as we learn more.

-- the relevant-kets[op] function
-- returns a list of all the kets in the current context that have "op" defined.
-- relevant-kets[op] is frequently useful for generating lists we can apply the map function to.
-- eg, learn some data:
sa: friends |Fred> => |Sam> + |Liz>
sa: friends |Rob> => |Jack> + |Tom>
sa: age |Fred> => |22>

-- now find who knows what operator types:
sa: relevant-kets[friends] |>
|Fred> + |Rob>

sa: relevant-kets[age] |>
|Fred>

-- there is a variant on this.
-- returns: intersection(relevant-kets[op],SP)
intn-relevant-kets[op] SP

-- eg, we can chain them and find all kets that support both friends, and age:
-- (NB: one has "intn" prefix, and one doesn't!)
intn-relevant-kets[age] relevant-kets[friends] |>

-- the pretty print rules as a matrix function.
-- first define some rules:
sa: op |a> => |a> + 2.000|b> + 3.000|c>
sa: op |b> => 0.500|b> + 9.000|c> + 5.000|e>
sa: op |c> => 7.000|e> + 2.000|b>

-- now take a look:

sa: matrix[op]
[ a ] = [  1.00  0     0     ] [ a ]
[ b ]   [  2.00  0.50  2.00  ] [ b ]
[ c ]   [  3.00  9.00  0     ] [ c ]
[ e ]   [  0     5.00  7.00  ]
|matrix>

-- and we finish with a slightly more interesting function, the train-of-thought function.
-- this code makes heavy use of supported-ops, pick-elt and apply().
-- and will work much better with a big knowledge base, but even a small one gives hints of what a large example will be like.
sa: load early-us-presidents.sw -- load up some knowledge
sa: create inverse -- needed, else we run into dead ends.
sa: train-of-thought[13] |Madison> -- take 13 steps, starting with |Madison>

context: sw console
one: |Madison>
n: 13
|X>: |Madison>

|early US Presidents: _list>
|Adams>
|year: 1797>
|Washington>
|early US Presidents: _list>
|Adams>
|number: 2>
|Adams>
|year: 1801>
|Jefferson>
|early US Presidents: _list>
|Adams>
|year: 1799>

Anyway, I guess the summary of this post is that we have some proof of concept functions trying to map our BKO scheme towards a more general purpose knowledge engine. Don't take the above functions as finished, take them as hints on where we could take this project.

Tuesday, 6 January 2015

introducing sigmoids

Now, next thing we need to mention are what I call "sigmoids". They are functions that you apply to superpositions and they only change the coeffs in superpositions. They do not, by themselves, change the order or labels of kets in superpositions. Named loosely after these guys.

-- set all coeffs above 0 to 1, else 0
clean SP

-- set everything below t to 0, else x
-- similar to drop-below[t]
threshold-filter[t] SP

-- set everything below t to x, else 0
-- similar to drop-above[t]
not-threshold-filter[t] SP

-- set everything below 0.96 to 0, else 1
binary-filter SP

-- set everything below 0.96 to 1, else 0
not-binary-filter SP

-- set everything below 0 to 0, else x
pos SP

-- return the absolute value of x
abs SP

-- set everything above t to t, else x
max-filter[t] SP

-- set everything below 0.04 to 1, else 0
NOT SP

-- set everything in range [0.96,1.04] to 1, else 0
xor-filter SP

-- set everything in range [a,b] to x, else 0
sigmoid-in-range[a,b] SP

-- set all coeffs to 1/x. if x == 0, then return 0
invert SP

-- set all coeffs to t - x
subtraction-invert[t] SP

-- sets all coeffs, including zeros, to t
set-to[t] SP

They are simple enough, so here is the python:

def clean(x):
  if x <= 0:
    return 0
  else:
    return 1

def threshold_filter(x,t):
  if x < t:
    return 0
  else:
    return x

def not_threshold_filter(x,t):
  if x <= t:
    return x
  else:
    return 0

def binary_filter(x):
  if x <= 0.96:
    return 0
  else:
    return 1

def not_binary_filter(x):
  if x <= 0.96:
    return 1
  else:
    return 0

def pos(x):
  if x <= 0:
    return 0
  else:
    return x

def sigmoid_abs(x):
  return abs(x)

def max_filter(x,t):
  if x <= t:
    return x
  else:
    return t

def NOT(x):
  if x <= 0.04:
    return 1
  else:
    return 0

# otherwise known as the Goldilocks function.
# not too hot, not too cold.
def xor_filter(x):
  if 0.96 <= x and x <= 1.04:
    return 1
  else:
    return 0

# this is another type of "Goldilocks function"
# the in-range sigmoid:
def sigmoid_in_range(x,a,b):
  if a <= x and x <= b:
    return x
  else:
    return 0

def invert(x):
  if x == 0:
    return 0
  else:
    return 1/x
    
def subtraction_invert(x,t):
  return t - x

def set_to(x,t):
  return t

Sigmoids are simple and tidy enough, so if it turns out we need a new one, it will be fine to add it. This is in contrast with ket/sp built in functions, where we are reluctant to add new functions unless there is no neater way to do it.

So, that is it for now. Heaps more function operators to come!

Update: Here is a visualization of a couple of sigmoids in action:
-- define a function, with values in [0,3]
-- NB: the multiply by 0 is in there, else it would be in range [1,4] since the noise is additive
|f> => absolute-noise[3] 0 range(|x: 0>,|x: 255>)

-- apply binary filter to that function:
-- NB: I set |x: 0> to 3, so graph y-axis wasn't auto-scaled to [0,1]
binary-filter "" |f>

-- apply threshold-filter, with t = 1
threshold-filter[1] "" |f>

-- apply threshold-filter, with t = 2
threshold-filter[2] "" |f>

-- apply threshold-filter, with t = 2.5
threshold-filter[2.5] "" |f>

some built in functions

First up in phase 2, some functions built into our ket and superposition classes. There are quite a few of these, but in this post I think I will only mention the most useful ones. (note that SP is just some superposition)

-- randomly select an element from SP
-- eventually I want a weighted pick-elt too.
pick-elt SP

-- normalize so sum of coeffs = 1
-- (this can be used to map a frequency list to a list of probabilities)
normalize SP

-- normalize so sum of coeffs = t
normalize[t] SP

-- rescale coeffs so coeff of max element = 1
rescale SP

-- rescale coeffs so coeff of max element = t
rescale[t] SP

-- returns number of elements in SP in |number: x> format
count SP
how-many SP

-- returns sum of coeffs of the elements in SP in |number: x> format
count-sum SP
sum SP

-- returns the product of coeffs of the elements in SP in |number: x> format
product SP

-- drop elements from SP with coeff <= 0.
-- NB: in our model coeffs are almost always >= 0
drop SP

-- drop elements from SP with coeff below t
drop-below[t] SP

-- drop elements from SP with coeff above t
drop-above[t] SP

-- keep elements with index in range [a,b]
-- NB: index starts at 1, not 0
select-range[a,b] SP
select[a,b] SP

-- return element with index k
select-elt[k] SP

-- delete k'th element from the superposition.
delete-elt[k] SP

-- reverse the SP
reverse SP

-- shuffle the SP
shuffle SP

-- sort superposition by the coeffs of the kets
-- this one is very useful!
-- especially in combination with op-self operators
coeff-sort SP

-- sort using a natural sort of lowercase labels of the kets
-- NB: sometimes natural sort bugs out, and I have to manually
-- swap the code back to standard lowercase sort.
-- eg, the binary tree example with kets such as |00> and |0010> and so on.
ket-sort SP

-- return the first ket found with the max coeff
max-elt SP

-- return the first ket found with the min coeff
min-elt SP

-- return the kets with the max coeff
max SP

-- return the kets with the min coeff
min SP

-- return the max coeff in the SP in |number: x> format
max-coeff SP

-- return the min coeff in the SP in |number: x> format
min-coeff SP

-- multiply all coeffs by t
mult[t] SP

-- add noise to the SP in range [0,t]
absolute-noise[t] SP

-- add noise to the SP in range [0,t*max_coeff]
relative-noise[t] SP

-- returns the difference between the largest coeff and the second largest coeff.
-- in 3| > format.
discrimination SP
discrim SP

-- returns |no> if SP is the identity element |>
-- otherwise returns |yes>
not-empty SP
do-you-know SP

I guess that is about it! Note there is a longer, more detailed version of the above here, which shows the mapping between the underlying python and the BKO (though it is incomplete).

BTW, I deliberately left out these two, as I will describe them in phase 3 of the write-up:
similar[op] |x>
find-topic[op] |x>

Sunday, 4 January 2015

Announcing phase 2: function operators

Now, with phase 1 out of the way, I can move on to phase 2. Phase 1 was largely about literal operators, ie those defined in a learn rule: OP KET => SUPERPOSITION. But so far I have only hinted that there is also a large collection of function operators, operators that do some processing, eg, mentioned so far: pick-elt, normalize, algebra, and a couple of others.

Friday, 12 December 2014

the first BKO claim:

And now, what I have been building to, the first big claim for the BKO scheme:
"all finite relatively static knowledge can be represented in the BKO scheme"

If I was a real computer scientist or mathematician I would prove that claim. But I'm a long way from that, so the best I can offer is, "it seems to be the case with the examples I have tried so far". Yeah, vastly unsatisfying compared to a real proof.

And a couple of notes:
1) BKO is what I call an "active representation". Once in this format a lot of things are relatively easy. eg, working with the IMDB data was pretty much trivial.
2) One of the driving forces behind BKO is to try and collapse all knowledge into one general/unified representation (OP KET => SUPERPOSITION). The benefit being that if you develop machinery for one branch of knowledge it often extends easily to other branches.
3) The next driving force is efficiency of the notation. So while my results are possible to reproduce using other methods, BKO notation is more efficient. One benefit of the correct notation is if you make a hard thing easier, you in turn make a harder thing possible. eg, in physics there are notational short-cuts all over the place for just this reason.
4) Another selling point for the BKO notation is how closely English maps to it.

How about a summary of some examples that show how clean and powerful the notation is:
Learning plurals:

plural |word: *> #=> merge-labels(|_self> + |s>)
plural |word: foot> => |word: feet>
plural |word: mouse> => |word: mice>
plural |word: radius> => |word: radii>
plural |word: tooth> => |word: teeth>
plural |word: person> => |word: people>

Some general rules that apply to all people:

siblings |person: *> #=> brothers |_self> + sisters |_self>
children |person: *> #=> sons |_self> + daughters |_self>
parents |person: *> #=> mother |_self> + father |_self>
uncles |person: *> #=> brothers parents |_self>
aunts |person: *> #=> sisters parents |_self>
aunts-and-uncles |person: *> #=> siblings parents |_self>
cousins |person: *> #=> children siblings parents |_self>
grand-fathers |person: *> #=> father parents |_self>
grand-mothers |person: *> #=> mother parents |_self>
grand-parents |person: *> #=> parents parents |_self>
grand-children |person: *> #=> children children |_self>
great-grand-parents |person: *> #=> parents parents parents |_self>
great-grand-children |person: *> #=> children children children |_self>
immediate-family |person: *> #=> siblings |_self> + parents |_self> + children |_self>
friends-and-family |person: *> #=> friends |_self> + family |_self>

Asking about Fred:

"Who are Fred's friends?"
friends |Fred>

"What age are Fred's friends?"
age friends |Fred>

"How many friends does Fred have?"
how-many friends |Fred>

"Do you know the age of Fred's friends?"
do-you-know age friends |Fred>

"Who are Fred's friends of friends?"
friends friends |Fred>

"What age are Fred's friends of friends?"
age friends friends |Fred>

"How many friends of friends does Fred have?"
how-many friends friends |Fred>
... and so on.

Some movie trivia:

"What movies do Matt Damon and Morgan Freeman have in common?"
common[movies] (|actor: Matt Damon> + |actor: Morgan (I) Freeman>)
movie: Invictus (2009)
movie: The People Speak (2009)
movie: Magnificent Desolation: Walking on the Moon 3D (2005)

"What actors do "Ocean's Twelve" and "Ocean's Thirteen" have in common?"
common[actors] (|movie: Ocean's Twelve (2004)> + |movie: Ocean's Thirteen (2007)> )
actor: Don Cheadle
actor: Casey Affleck
actor: Elliott Gould
actor: Bernie Mac
actor: Matt Damon
actor: Brad Pitt
actor: Eddie Izzard
actor: Eddie Jemison
actor: George Clooney
actor: Scott Caan
actor: Jerry (I) Weintraub
actor: Scott L. Schwartz
actor: Carl Reiner
actor: Shaobo Qin
actor: Andy (I) Garcia
actor: Vincent Cassel

Kevin Bacon numbers:

kevin-bacon-0 |result> => |actor: Kevin (I) Bacon> 
kevin-bacon-1 |result> => actors movies |actor: Kevin (I) Bacon>
kevin-bacon-2 |result> => [actors movies]^2 |actor: Kevin (I) Bacon>
kevin-bacon-3 |result> => [actors movies]^3 |actor: Kevin (I) Bacon>
kevin-bacon-4 |result> => [actors movies]^4 |actor: Kevin (I) Bacon>
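
As a sketch of what [actors movies]^k is doing: apply "movies" to a set of actors, then "actors" to the resulting movies, and repeat k times. The function names and the toy data below are hypothetical, and plain sets are used, so the coefficients that the sw version would carry are dropped.

```python
def actors_movies(actors, movies_of, actors_of):
    """One application of 'actors movies': everyone who shares a movie with the set."""
    result = set()
    for actor in actors:
        for movie in movies_of.get(actor, ()):
            result.update(actors_of.get(movie, ()))
    return result

def kevin_bacon(k, start, movies_of, actors_of):
    """Apply 'actors movies' k times, starting from a single actor."""
    current = {start}
    for _ in range(k):
        current = actors_movies(current, movies_of, actors_of)
    return current

# toy data: A and B share movie m1, B and C share movie m2
movies_of = {"A": ["m1"], "B": ["m1", "m2"], "C": ["m2"]}
actors_of = {"m1": ["A", "B"], "m2": ["B", "C"]}

print(kevin_bacon(2, "A", movies_of, actors_of))
```

Note that the k-th result still contains everyone found at smaller k, since each actor appears in their own movies.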

Working with a grid:

sa: load 45-by-45-grid.sw
sa: near |grid: *> #=> |_self> + N|_self> + NE|_self> + E|_self> + SE|_self> + S|_self> + SW|_self> + W|_self> + NW|_self>
sa: building |grid: 4 40> => |building: cafe>
sa: create inverse
sa: current |location> => inverse-building |building: cafe>

-- now ask what is your current location?
sa: current |location>
|grid: 4 40>

-- what is near your current location?
sa: near current |location>
|grid: 4 40> + |grid: 3 40> + |grid: 3 41> + |grid: 4 41> + |grid: 5 41> + |grid: 5 40> + |grid: 5 39> + |grid: 4 39> + |grid: 3 39>

-- what is 3 steps NW of your current location?
sa: NW^3 current |location>
|grid: 1 37>

-- what is near 3 steps NW of your current location?
-- NB: since we are at the edge of a finite universe grid, there are fewer neighbours than in other examples of near.
-- this BTW, makes use of |> as the identity element for superpositions
sa: near NW^3 current |location>
|grid: 1 37> + |grid: 1 38> + |grid: 2 38> + |grid: 2 37> + |grid: 2 36> + |grid: 1 36>

-- what is 7 steps S of your current location?
sa: S^7 current |location>
|grid: 11 40>

-- what is near 7 steps S of your current location?
sa: near S^7 current |location>
|grid: 11 40> + |grid: 10 40> + |grid: 10 41> + |grid: 11 41> + |grid: 12 41> + |grid: 12 40> + |grid: 12 39> + |grid: 11 39> + |grid: 10 39>
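
The grid behaviour can be mimicked in a few lines of Python. This is a sketch under assumptions read off the outputs above: cells are (row, column) pairs running from 1 to 45, rows grow to the south and columns to the east. A step that leaves the grid returns None and is silently dropped from near, which is the job |> does in the sw version. All names are hypothetical.

```python
SIZE = 45  # assumed: rows and columns both run from 1 to 45

# (row, column) offsets, in the same order as the near rule
STEPS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

def step(cell, direction, times=1):
    """Apply a direction operator 'times' times; None if we fall off the grid."""
    row, col = cell
    d_row, d_col = STEPS[direction]
    for _ in range(times):
        row, col = row + d_row, col + d_col
        if not (1 <= row <= SIZE and 1 <= col <= SIZE):
            return None
    return (row, col)

def near(cell):
    """The cell itself plus whichever of its 8 neighbours exist."""
    candidates = [cell] + [step(cell, d) for d in STEPS]
    return [c for c in candidates if c is not None]

cafe = (4, 40)
print(step(cafe, "NW", 3))
print(near(step(cafe, "NW", 3)))
```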

Working with ages:

age-1 |age: *> #=> arithmetic(|_self>,|->,|age: 1>)
age+1 |age: *> #=> arithmetic(|_self>,|+>,|age: 1>)
almost |age: *> #=> age-1 |_self>
roughly |age: *> #=> 0.5 age-1 |_self> + |_self> + 0.5 age+1 |_self>

-- almost 19:
sa: almost |age: 19>
|age: 18>

-- roughly 27:
sa: roughly |age: 27>
0.500|age: 26> + |age: 27> + 0.500|age: 28>
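
In Python terms, a superposition can be pictured as a dict from label to coefficient. A minimal sketch of almost and roughly on that picture (the dict representation and function names are assumptions made for illustration):

```python
def almost(age):
    """almost |age: n> is just age-1 applied to it."""
    return {age - 1: 1.0}

def roughly(age):
    """Full weight on the age itself, half weight on each neighbour."""
    return {age - 1: 0.5, age: 1.0, age + 1: 0.5}

print(almost(19))
print(roughly(27))
```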

Learning indirectly:

-- set our "you" variable:
sa: |you> => |Sam>
-- learn your age:
sa: age "" |you> => |age: 23>

-- ask Sam's age:
sa: age |Sam>
|age: 23>

-- greet you:
sa: random-greet "" |you>
|Good morning Sam.>
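
A sketch of what the indirect learn rule is doing, in Python. It assumes that |you> => |Sam> is stored under the empty operator "", and that age "" |you> first recalls that value and then learns the age against it. The dict and function names are hypothetical.

```python
rules = {}  # maps (operator, label) to the learnt value

def learn(op, label, value):
    rules[(op, label)] = value

def recall(op, label):
    return rules.get((op, label))

def learn_indirect(op, pointer, value):
    """Look up what the pointer currently refers to, then learn against that."""
    target = recall("", pointer)
    learn(op, target, value)

learn("", "you", "Sam")                   # |you> => |Sam>
learn_indirect("age", "you", "age: 23")   # age "" |you> => |age: 23>
print(recall("age", "Sam"))
```

The point of the indirection is that nothing is learnt about "you" itself: re-pointing |you> at someone else leaves Sam's age in place.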

And that pretty much concludes phase 1 of my write-up. Heaps more to come!

Update: here is a fun example of the right notation making things easier.
Consider algebra, but using Roman not Arabic numerals:
27x^3 + 78y + 14
vs:
XXVII * x^3 + LXXVIII * y + XIV
or alternatively consider trying to read a large matrix with values written in Roman numerals. Ouch!

Update: we can do all the above examples using just these key pieces:
OP KET => SUPERPOSITION
label descent, eg: |word: *>, |person: *>, |grid: *>, |age: *>
stored rules, ie: #=>
merge-labels()
|_self>
common[OP] (|x> + |y>)
load file.sw
create inverse
|> as the identity element for superpositions
arithmetic()
learning indirectly, eg: age "" |you> => |age: 27>
pick-elt (inside random-greet)

Thursday, 11 December 2014

the maths rules for the BKO scheme

Here are the maths rules behind BKO:

1) <x||y> == 0 if x != y.  
2) <x||y> == 1 if x == y.
3) <!x||y> == 1 if x != y. (NB: the ! acts as a not. cf, the -v switch for grep) 
4) <!x||y> == 0 if x == y.
5) <x: *||y: z> == 0 if x != y.
6) <x: *||y: z> == 1 if x == y, for any z. 
7) applying bra's is linear. <x|(|a> + |b> + |c>) == <x||a> + <x||b> + <x||c>
8) if a coeff is not given, then it is 1. eg, <x| == <x|1 and 1|x> == |x>
9) bra's and ket's commute with the coefficients. eg, <x|7 == 7 <x| and 13|x> == |x>13  
10) in contrast to QM, in BKO operators are right associative only.
<a|(op|b>) is valid and is identical to <a|op|b>
(<a|op)|b> is invalid, and undefined.
11) again, in contrast to QM, <a|op|b> != <b|op|a>^* (a consequence of (10) really)
12) applying projections is linear. |x><x|(|a> + |b> + |c>) == |x><x||a> + |x><x||b> + |x><x||c>
13) kets in superpositions commute. |a> + |b> == |b> + |a>
14) kets in sequences do not commute. |a> . |b> != |b> . |a>
Though maybe in the sequence version of simm, this would be useful:
|a> . |b> = c |b> . c |a>, where usually c is < 1. (yeah, it "bugs out" if you swap it back again, but in practice should be fine)
another example: 
  |c> . |a> . |b> = c |a> . c |c> . |b>
                  = c |a> . c |b> . c^2 |c>
15) operators (in general) do not commute. <b|op2 op1|a> != <b|op1 op2|a>
16) if a coeff in a superposition is zero, we can drop it from the superposition without changing the meaning of that superposition. 
17) we can arbitrarily add kets to a superposition if they have coeff zero without changing the meaning of that superposition.
18) |> is the identity element for superpositions. sp + |> == |> + sp == sp.
19) the + sign in superpositions is literal. ie, kets add.
|a> + |a> + |a> = 3|a>
|a> + |b> + |c> + 6|b> = |a> + 7|b> + |c>
20) <x|op-sequence|y> is always a scalar/float
21) |x><x|op-sequence|y> is always a ket or a superposition
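
Several of these rules (1, 2, 7, 13 and 16 to 19) are easy to check with a toy model where a superposition is a dict from ket label to coefficient. This is only a sketch for illustration; the function names are hypothetical and it says nothing about operators or sequences.

```python
def ket(label, coeff=1):
    """A single ket; the coefficient defaults to 1 (rule 8)."""
    return {label: coeff}

def add(*superpositions):
    """Kets add literally (rule 19) and zero coefficients are dropped (rule 16)."""
    total = {}
    for sp in superpositions:
        for label, coeff in sp.items():
            total[label] = total.get(label, 0) + coeff
    return {label: c for label, c in total.items() if c != 0}

def bra(x, sp):
    """<x|sp: the coefficient of x in sp, 0 if it is absent (rules 1, 2 and 7)."""
    return sp.get(x, 0)

EMPTY = {}  # plays the role of |>, the identity element (rule 18)

print(add(ket("a"), ket("a"), ket("a")))
print(add(ket("a"), ket("b"), ket("c"), ket("b", 6)))
```

Dict equality ignores insertion order, which matches rule 13: kets in a superposition commute.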

some bigger sw examples

Now we have most of the basics out of the way, we can look at some bigger examples.
So here are several public databases that I have mapped to sw format:

GeoNames
"The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge."
http://www.geonames.org/
And my rough version of the Australian data here (yeah, I want to redo at some stage):
geonames-au-id-version.sw (50 MB, 7 op types and 871,186 learn rules)
improved sw versions:
improved-geonames-au.sw (109 MB, 11 op types and 1,544,996 learn rules)
improved-geonames-cities-1000.sw (94 MB, 11 op types and 1,375,844 learn rules)
improved-geonames-cities-15000.sw (17 MB, 11 op types and 236,184 learn rules)
improved-geonames-de.sw (99 MB, 11 op types and 1,454,997 learn rules)
improved-geonames-fr.sw (83 MB, 11 op types and 1,203,510 learn rules)
improved-geonames-gb.sw (31 MB, 11 op types and 441,431 learn rules)
improved-geonames-us.sw (1.3 GB, 11 op types and 18,950,003 learn rules)

Moby Thesaurus
"Moby Thesaurus is the largest and most comprehensive thesaurus data source in English available for commercial use. This second edition has been thoroughly revised adding more than 5,000 root words (to total more than 30,000) with an additional million synonyms and related terms (to total more than 2.5 million synonyms and related terms)."
http://icon.shef.ac.uk/Moby/mthes.html
Again, my rough version here:
moby-thesaurus.sw (32 MB, 1 op type and 30,244 learn rules)
improved sw version:
improved-moby-thesaurus.sw (50 MB, 1 op type and 30,260 learn rules)

Moby Part-of-Speech
"This second edition is a particularly thorough revision of the original Moby Part-of-Speech. Beyond the fifteen thousand new entries, many thousand more entries have been scrutinized for correctness and modernity. This is unquestionably the largest P-O-S list in the world."
http://icon.shef.ac.uk/Moby/mpos.html
The sw version:
part-of-speech.sw (16 MB, 1 op type and 233,090 learn rules)

Frequently Occurring Surnames from Census 1990
http://www.census.gov/topics/population/genealogy/data/1990_census/1990_census_namefiles.html#
The sw version:
names.sw (1.7 MB, 1 op type and 4 learn rules)

IMDB database:
ftp://ftp.fu-berlin.de/pub/misc/movies/database/
The sw version:
improved-imdb.sw (588 MB, 2 op types and 2,591,132 learn rules)
imdb-ratings.sw (1.6 MB, 4 op types and 21,820 learn rules)
improved-imdb-year.sw (470 MB, 4 op types and 3,146,301 learn rules)

A year's worth of historical share data (I forget the source!)
shares.sw (17 MB, 5 op types and 2,622 learn rules)

And I guess that is about it. The only note I want to make is that for each of these data sets I had to write a custom script to parse the original format! Once in sw format, parsing is trivial and identical in each case. Yeah, I'm trying to push my sw notation! I certainly think it is superior to XML.
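
To give a feel for how little is needed once data is in sw format, here is a sketch of a parser for simple learn rules of the form op |label> => superposition. It is an illustration, not the project's parser: the regular expressions and the function name are assumptions, it only handles literal rules (not stored rules, ie #=>), and it assumes labels never contain a ">" character.

```python
import re

# op |label> => right-hand side
RULE = re.compile(r"^(\S+) \|([^>]*)> => (.*)$")
# an optional coefficient followed by a ket
KET = re.compile(r"(\d+(?:\.\d+)?)?\s*\|([^>]*)>")

def parse_rule(line):
    """Return (op, label, [(ket_label, coeff), ...]) or None if not a simple learn rule."""
    match = RULE.match(line.strip())
    if match is None:
        return None
    op, label, rhs = match.groups()
    superposition = [
        (name, float(coeff) if coeff else 1.0)  # a missing coefficient means 1
        for coeff, name in KET.findall(rhs)
    ]
    return op, label, superposition

print(parse_rule("age |Sam> => |age: 23>"))
print(parse_rule("roughly |age: 27> => 0.500|age: 26> + |age: 27> + 0.500|age: 28>"))
```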

Update: in a future phase of the project I would like to extend this, and map even more data sets to sw format.

Update: I now have a script to produce the stats summary of sw files.

finding common movies and actors

We can have a lot of fun with the imdb data. This time, given two actors, find which movies they shared. Alternatively, given two movies, find the common actors.

Let's jump right into the python:

#!/usr/bin/env python3

import sys

if len(sys.argv) < 3:
  print("\nUsage:")
  print("  ./find_common.py actor1 actor2")
  print("  ./find_common.py movie1 movie2\n")
  sys.exit(0)

one = sys.argv[1]
two = sys.argv[2]

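def file_recall(filename,op,label):
  # find the line "op |label> => ..." and return the ket labels on its right-hand side
  pattern = op + " |" + label + "> => "
  n = len(pattern)
  with open(filename,'r') as f:
    for line in f:
      if line.startswith(pattern):
        # drop the rule prefix and the trailing newline, otherwise the last label keeps its ">"
        line = line[n:].strip()
        # drop the outer "|" and ">", then split into individual labels
        return line[1:-1].split("> + |")
  return []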

def display(line):
#  return ", ".join(line)
  return "\n".join(line)

def intersection(a,b):
  return list(set(a) & set(b))

imdb_sw = "sw-examples/improved-imdb.sw"    # our imdb data

def print_common_movies(sw_file,one,two):
  actor1 = "actor: " + one
  actor2 = "actor: " + two
  movies1 = file_recall(sw_file,"movies",actor1)
  movies2 = file_recall(sw_file,"movies",actor2)

  # check if we have info on them:
  if len(movies1) == 0 or len(movies2) == 0:
    return

  common_movies = intersection(movies1,movies2)

  print()
  print("common movies for:")
  print(one)
  print(two)
  print("number of common movies:",len(common_movies))
  print("common movies:")
  print(display(common_movies))
  print()

def print_common_actors(sw_file,one,two):
  movie1 = "movie: " + one
  movie2 = "movie: " + two
  actors1 = file_recall(sw_file,"actors",movie1)
  actors2 = file_recall(sw_file,"actors",movie2)

  # check if we have info on them:
  if len(actors1) == 0 or len(actors2) == 0:
    return

  common_actors = intersection(actors1,actors2)

  print()
  print("common actors for:")
  print(one)
  print(two)
  print("number of common actors:",len(common_actors))
  print("common actors:")
  print(display(common_actors))
  print()


print_common_actors(imdb_sw,one,two)
print_common_movies(imdb_sw,one,two)

Now some examples:

$ ./find_common.py "Tom Cruise" "Nicole Kidman"

common movies for:
Tom Cruise
Nicole Kidman
number of common movies: 8
common movies:
movie: Eyes Wide Shut (1999)
movie: Days of Thunder (1990)
movie: Der Geist des Geldes (2007)
movie: August (2008)
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
movie: Stanley Kubrick: A Life in Pictures (2001)
movie: Far and Away (1992)
movie: The Queen (2006)


$ ./find_common.py "Matt Damon" "Morgan (I) Freeman"

common movies for:
Matt Damon
Morgan (I) Freeman
number of common movies: 3
common movies:
movie: Invictus (2009)
movie: The People Speak (2009)
movie: Magnificent Desolation: Walking on the Moon 3D (2005)

$ ./find_common.py "Bruce Willis" "Tom Cruise"

common movies for:
Bruce Willis
Tom Cruise
number of common movies: 1
common movies:
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)

$ ./find_common.py "Bruce Willis" "Matt Damon"

common movies for:
Bruce Willis
Matt Damon
number of common movies: 1
common movies:
movie: Ocean's Twelve (2004)

$ ./find_common.py "Brad Pitt" "George Clooney"

common movies for:
Brad Pitt
George Clooney
number of common movies: 7
common movies:
movie: Burn After Reading (2008)
movie: Ocean's Twelve (2004)
movie: Boffo! Tinseltown's Bombs and Blockbusters (2006)
movie: Ocean's Eleven (2001)
movie: Touch of Evil (2011)
movie: Ocean's Thirteen (2007)
movie: Confessions of a Dangerous Mind (2002)

$ ./find_common.py "Matt Damon" "Ben Affleck"

common movies for:
Matt Damon
Ben Affleck
number of common movies: 10
common movies:
movie: Chasing Amy (1997)
movie: Jersey Girl (2004)
movie: Jay and Silent Bob Strike Back (2001)
movie: School Ties (1992)
movie: Glory Daze (1995)
movie: Dogma (1999)
movie: The Third Wheel (2002)
movie: Good Will Hunting (1997)
movie: Unite for Japan (2011)
movie: Field of Dreams (1989)

-- now find the common actors for the Ocean's series of movies:
$ ./find_common.py "Ocean's Eleven (2001)" "Ocean's Twelve (2004)"

common actors for:
Ocean's Eleven (2001)
Ocean's Twelve (2004)
number of common actors: 18
common actors:
actor: David (II) Sontag
actor: Casey Affleck
actor: Julia (I) Roberts
actor: Andy (I) Garcia
actor: George Clooney
actor: Eddie Jemison
actor: Elliott Gould
actor: Larry Sontag
actor: Matt Damon
actor: Carl Reiner
actor: Topher Grace
actor: Brad Pitt
actor: Scott L. Schwartz
actor: Scott Caan
actor: Bernie Mac
actor: Jerry (I) Weintraub
actor: Shaobo Qin
actor: Don Cheadle

$ ./find_common.py "Ocean's Twelve (2004)" "Ocean's Thirteen (2007)"

common actors for:
Ocean's Twelve (2004)
Ocean's Thirteen (2007)
number of common actors: 16
common actors:
actor: Don Cheadle
actor: Casey Affleck
actor: Elliott Gould
actor: Bernie Mac
actor: Matt Damon
actor: Brad Pitt
actor: Eddie Izzard
actor: Eddie Jemison
actor: George Clooney
actor: Scott Caan
actor: Jerry (I) Weintraub
actor: Scott L. Schwartz
actor: Carl Reiner
actor: Shaobo Qin
actor: Andy (I) Garcia
actor: Vincent Cassel

So that was all kinda fun! I guess the only note is I was thinking of implementing a Google-like "did you mean", because currently if you don't get the actor or movie name exactly right, you don't get any results. Shouldn't be too hard to implement something like that.
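
One possible way to get a "did you mean" is Python's standard difflib module. The sketch below is just one option, not what the project ended up using: the function name is hypothetical, and it assumes the list of known actor or movie names has already been collected from the sw file.

```python
import difflib

def did_you_mean(name, known_names, limit=3):
    """Return up to 'limit' known names that closely resemble 'name', best match first."""
    return difflib.get_close_matches(name, known_names, n=limit, cutoff=0.6)

# a tiny hand-made list, standing in for the labels found in the sw file
known = ["Tom Cruise", "Nicole Kidman", "Matt Damon", "Morgan (I) Freeman"]

print(did_you_mean("Tom Cruse", known))
```

The matching is case-sensitive, and the cutoff of 0.6 is simply difflib's default, so both would need tuning against the real data.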

Update: IMDB does something similar too, which isn't really surprising.

Update: Here are a bunch of other results using the IMDB data:
all-actors-average.txt
all-actors-weighted-average.txt
movie-only-votes-ratings-title.txt
sorted-all-actors-average.txt
sorted-all-actors-weighted-average.txt
sorted-top-1000-actors-average.txt
sorted-top-1000-actors-weighted-average.txt
star-studded-movies.txt
top-1000-actors-average.txt
top-1000-actors-weighted-average.txt
top-1000-actors.txt
top-2500-well-known-actors.txt
votes-ratings-title.txt
well-known-actors.txt
kevin-bacon-numbers.sw