Tuesday, 16 February 2016

towards a definition of intelligence

So, have we done enough work that we can now make a reasonable guess at a definition of intelligence? Let's see. In my travels I have seen one definition along these lines: an intelligent agent, given its current situation, will manipulate things so as to maximize the number of potential future states. So, if such an agent is stuck in a valley, it will climb to the top of the hill to maximize its potential pathways.

Mathematically, and roughly (in a simplified one dimension):
F = dV(x)/dx
where V(x) is the landscape, and F is the direction you want to head (ie, up the gradient).
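As a toy illustration of that definition (my own sketch in Python, nothing to do with the project code), here is gradient ascent on a one dimensional landscape V(x):
  def climb(V, x, eta=0.1, steps=1000, h=1e-5):
      # head in the direction F = dV(x)/dx, ie up the gradient of the landscape
      for _ in range(steps):
          F = (V(x + h) - V(x - h)) / (2 * h)  # numerical derivative of V at x
          x += eta * F
      return x

  # a hill with its peak at x = 3; starting in the valley at x = 0 we climb to ~3.0
  print(climb(lambda x: -(x - 3) ** 2, 0.0))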

I have an alternate definition:
1) given the agent's current state (as represented by some superposition), find a pathway (as represented by some operator sequence) to its desired state (again, represented by some superposition). The quicker the agent can do this, and the shorter the pathway, the more intelligence points we give that agent. Note that for sufficiently hard problems most agents won't be able to find a pathway at all (see the search sketch after this list).
2) given an object the agent wishes to understand, how well constructed is the agent's internal representation of that object? At one extreme we have rote learning: say you recall an object's definition word for word, with essentially no understanding. At the other we have a very dense network linking the object with the rest of the knowledge in the agent's memory store. The denser the network, the more intelligence points we give that agent. And I suppose we should give some points for speed as well.
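And to make definition 1) concrete, here is a minimal search sketch in plain Python. The operators dict and the toy integer states are my own invention, purely for illustration:
  from collections import deque

  def find_pathway(start, goal, operators, max_depth=6):
      # breadth-first search for the shortest operator sequence mapping start -> goal
      queue, seen = deque([(start, [])]), {start}
      while queue:
          state, path = queue.popleft()
          if state == goal:
              return path  # the shorter this is, the more intelligence points
          if len(path) < max_depth:
              for name, op in operators.items():
                  next_state = op(state)
                  if next_state not in seen:
                      seen.add(next_state)
                      queue.append((next_state, path + [name]))
      return None  # no pathway found: the problem was too hard for this agent

  print(find_pathway(3, 10, {"inc": lambda x: x + 1, "dbl": lambda x: 2 * x}))
  # ['inc', 'inc', 'dbl'], ie 3 -> 4 -> 5 -> 10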

Comments:
1) the above is somewhat dependent on the agent already having a large body of knowledge. This isn't perfect, since young children do not have as much knowledge as adults, yet in some regards are far more intelligent than adults. Frankly, it is hard work to bootstrap from nothing to a thorough understanding of the world.
2) if you ever watch Richard Feynman talk, it is obvious he had a very dense network representation of physics in his head. Everything was linked to everything. This gives him lots of (2) points in my scheme, but then he was a physics genius!
3) OK. So how do we build an intelligent agent? Heh. No one knows!! My guess is that it requires at least three components: 1) a processing center (eg the neocortex), 2) a memory system (eg the hippocampus), and 3) an attention system (eg the thalamus). I personally think the attention system is the most important of the three. We need some system to filter and only attend to what is currently important, and to dynamically change attention as needed. Indeed, this sounds an awful lot like a von Neumann architecture computer! With CPU, RAM and instruction pointer (as the attention system). But in detail they are quite different. Especially the attention system: what I have in mind is a lot more involved than an IP.
4) superpositions and operator sequences should be sufficient to represent any current state, or pathway between states, of interest. That is the main thesis of this project! Is there anything that can't be represented this way? I don't know. But the implication would be that a human brain couldn't represent it either.

Sunday, 14 February 2016

new operators: guess-ket and guess-operator

I decided it might be useful to have a couple of operators that guess the ket or the operator, even if you don't know their names exactly. I don't have a strong use-case yet, but it seems to be something that humans do, so it should presumably be useful eventually.

There are three variations of each:
guess-ket |ket>
guess-ket[k] |ket>
guess-ket[*] |ket>
The first one just gives you the best matching ket. The second returns the top k matches. The third gives all matches with similarity > 0.

Likewise, we have:
guess-operator[op]
guess-operator[op,k]
guess-operator[op,*]
where the first one gives you the best matching operator. The second gives the top k matches. The third gives all of them with simm > 0.

Now, for a little bit of the detail in the background. We basically apply the similarity metric to the superpositions created by the process_string() function, comparing against all known kets (context.relevant_kets("*")) or all known supported operators (context.supported_operators()):
  def process_string(s):
    one = ket(s.lower())
    return make_ngrams(one,'1,2,3','letter')
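For those who want the idea without digging through the project code, here is a rough Python sketch of guess-ket. The simm() here is my reading of the rescaled similarity metric: normalize each bag of n-grams to unit sum, then sum the minima. It reproduces the coeffs in the examples below:
  from collections import Counter

  def letter_ngrams(s, sizes=(1, 2, 3)):
      # stand-in for process_string(): map a string to a bag of letter n-grams
      s = s.lower()
      return Counter(s[i:i + n] for n in sizes for i in range(len(s) - n + 1))

  def simm(a, b):
      # rescaled similarity: normalize each bag to unit sum, then sum the minima
      wa, wb = sum(a.values()), sum(b.values())
      if wa == 0 or wb == 0:
          return 0.0
      return sum(min(a.get(k, 0) / wa, b.get(k, 0) / wb) for k in set(a) | set(b))

  def guess_ket(name, known_kets, k=1):
      # rank the known kets by n-gram similarity to the given name, keep the top k
      target = letter_ngrams(name)
      scored = sorted(((simm(target, letter_ngrams(x)), x) for x in known_kets), reverse=True)
      return [(round(c, 3), x) for c, x in scored[:k] if c > 0]

  print(guess_ket("freddie", ["Fred", "Frank", "Robert", "Rob"]))  # [(0.611, 'Fred')]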
Now, a couple of examples:
-- learn a little knowledge:
the-age-of |Fred> => |age: 27>
the-age-of |Frank> => |age: 33>
the-age-of |Robert> => |age: 29>
the-age-of |Rob> => |age: 31>

-- return only the best match to "freddie":
sa: guess-ket |freddie>
0.611|Fred>

-- see all matches to "freddie":
sa: guess-ket[*] |freddie>
0.611|Fred> + 0.167|Frank> + 0.122|Robert> + 0.056|Rob>

-- now try the guess-operator[]:
sa: guess-operator[age]
0.259|op: the-age-of>

-- who is "roberto"?
sa: guess-ket |roberto>
0.844|Robert>

-- one potential use case. Guess the operator and the ket:
sa: apply(guess-operator[age] |>,guess-ket |roberto>)
0.219|age: 29>
NB: in this last example we used "guess-operator[age] |>". Note especially the |> tacked on the end. We need it so the expression is parsed as a (compound) superposition. In the console, though, it is not mandatory, and I often get lazy and leave it out. A similar thing applies to rel-kets[op] and probably some other function operators. If something doesn't work as expected, put the |> in, and that should fix it. Indeed, best practice is to always include it!

That's probably about it for this post. Though I have to wonder, given my if-then machines, whether guess-ket and guess-operator are redundant? Don't know. Time will tell!

If interested, the code for these is at the bottom of the functions code, under the names "guess_ket" and "guess_operator". Just CTRL-F to find them.

That's it for this post.

Friday, 12 February 2016

learning simple images using if-then machines

Today, let's play with simple images from ages ago. BTW, I call them "simple images" because we don't need to translate, rotate, magnify or otherwise align them (which we would with more general images), and we restrict pixel values to 0 or 1. This is to make things easier. We will of course eventually try more general images sometime in the future, but they are distinctly harder! And they require many layers of if-then machines; eg, the brain has roughly 20 layers in the visual cortex.

Here are our images:
|letter: H>
#   #
#   #
#   #
#####
#   #
#   #
#   #

|noisy: H>
    #
#   #
#   #
### #
#    
#   #
#   #

|noisy: H2>
#   #
#    
# ###
#####
##  #
#   #
### #

|letter: I>
#####
  #
  #
  #
  #
  #
#####

|noisy: I>
####
  #
  
  
  #
  #
# ###

|noisy: I2>
##  #
 ###
  #
  #
  ###
####
#####

|letter: O>
######
#    #
#    #
#    #
#    #
#    #
######
Now, let's define our 3 if-then machines:
load H-I-pat-rec.sw

image |node: 1: 1> => pixels |letter: H>
image |node: 1: 2> => pixels |noisy: H>
image |node: 1: 3> => pixels |noisy: H2>
then |node: 1: *> => |letter H>

image |node: 2: 1> => pixels |letter: I>
image |node: 2: 2> => pixels |noisy: I>
image |node: 2: 3> => pixels |noisy: I2>
then |node: 2: *> => |letter I>

image |node: 3: 1> => pixels |letter: O>
then |node: 3: *> => |letter O>

the |list of images> => |node: 1: 1> + |node: 1: 2> + |node: 1: 3> + |node: 2: 1> + |node: 2: 2> + |node: 2: 3> + |node: 3: 1>
which-image |*> #=> then select[1,1] similar-input[image] image |_self>
Note that today I used "select[1,1]" instead of "drop-below[]". This just means select the first element in the superposition, and since similar-input[op] sorts its results, the first element is the best match.
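Here is a rough Python sketch of what that rule computes, reusing simm() from the guess-ket post above, with image patterns as dicts of pixel -> coeff (the names are mine, not the project's):
  def which_image(input_pixels, image, then_label):
      # similar-input[image]: score every stored node against the input pattern
      scores = {node: simm(input_pixels, pattern) for node, pattern in image.items()}
      # select[1,1]: keep only the single best match
      best = max(scores, key=scores.get)
      # then: map the winning node to its letter label, carrying the coeff along
      return {then_label[best]: scores[best]}

  # eg, feeding in the stored |node: 1: 2> pattern exactly gives {'letter H': 1.0}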
Now, put "which-image" to use:
sa: which-image |node: 2: 3>
1.0|letter I>

sa: which-image |node: 1: 2>
1.0|letter H>

-- now, choose images randomly, and see what we get:
-- noting we are leaving in the INFO: lines that I normally chomp out. This is so we can see which ket pick-elt has chosen.
sa: which-image pick-elt the |list of images>
INFO: ket: list of images
INFO: ket: node: 1: 2
INFO: ket: node: 1: 2
INFO: ket: node: 1: 2
1.0|letter H>

sa: which-image pick-elt the |list of images>
INFO: ket: list of images
INFO: ket: node: 3: 1
INFO: ket: node: 3: 1
INFO: ket: node: 3: 1
1.0|letter O>

sa: which-image pick-elt the |list of images>
INFO: ket: list of images
INFO: ket: node: 1: 3
INFO: ket: node: 1: 3
INFO: ket: node: 1: 3
1.0|letter H>

sa: which-image pick-elt the |list of images>
INFO: ket: list of images
INFO: ket: node: 2: 2
INFO: ket: node: 2: 2
INFO: ket: node: 2: 2
1.0|letter I>

sa: which-image pick-elt the |list of images>
INFO: ket: list of images
INFO: ket: node: 2: 3
INFO: ket: node: 2: 3
INFO: ket: node: 2: 3
1.0|letter I>

-- and so on!
Now for a couple of comments:

1) if you look under the hood, the above is quite boring! We are not making much use of similar-input[op] at all, in that we are feeding in, and detecting, exact patterns. The only interesting bit is that it is pooling the different image types. Hrmm... let's try some noisy examples:
sa: then select[1,1] similar-input[image] absolute-noise[1] image |node: 1: 1>
0.919|letter H>

sa: then select[1,1] similar-input[image] absolute-noise[1] image |node: 2: 3>
0.907|letter I>

sa: then select[1,1] similar-input[image] absolute-noise[30] image |node: 1: 2>
0.761|letter H>

sa: then select[1,1] similar-input[image] absolute-noise[30] image |node: 3: 1>
0.738|letter O>
Heh. Even at absolute-noise[30] we are still matching at over 70%. And now we are clearly using the similarity metric and "fuzzy matching".
2) support vector machines care about whether the patterns to classify are linearly separable. Well, in the world of superpositions, linearly separable doesn't really even make sense. And similar-input[op] doesn't care either; it works on any superposition type.
3) "which-image" is linear, which we can see with this:
sa: which-image the |list of images>
3|letter H> + 3|letter I> + 1.0|letter O>
4) finally, here is what we now know:
sa: dump
----------------------------------------
|context> => |context: H I pat rec>

pixels |letter: H> => |pixel: 1: 1> + |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 2: 5> + |pixel: 3: 1> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 5>
dim-1 |letter: H> => |dimension: 5>
dim-2 |letter: H> => |dimension: 7>

pixels |noisy: H> => |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 2: 5> + |pixel: 3: 1> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 5>
dim-1 |noisy: H> => |dimension: 5>
dim-2 |noisy: H> => |dimension: 7>

pixels |noisy: H2> => |pixel: 1: 1> + |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 3: 1> + |pixel: 3: 3> + |pixel: 3: 4> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 5: 2> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 5>
dim-1 |noisy: H2> => |dimension: 5>
dim-2 |noisy: H2> => |dimension: 7>

pixels |letter: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 2: 3> + |pixel: 3: 3> + |pixel: 4: 3> + |pixel: 5: 3> + |pixel: 6: 3> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |letter: I> => |dimension: 5>
dim-2 |letter: I> => |dimension: 7>

pixels |noisy: I> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 2: 3> + |pixel: 5: 3> + |pixel: 6: 3> + |pixel: 7: 1> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |noisy: I> => |dimension: 5>
dim-2 |noisy: I> => |dimension: 7>

pixels |noisy: I2> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 5> + |pixel: 2: 2> + |pixel: 2: 3> + |pixel: 2: 4> + |pixel: 3: 3> + |pixel: 4: 3> + |pixel: 5: 3> + |pixel: 5: 4> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 2> + |pixel: 6: 3> + |pixel: 6: 4> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>
dim-1 |noisy: I2> => |dimension: 5>
dim-2 |noisy: I2> => |dimension: 7>

pixels |letter: O> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 1: 6> + |pixel: 2: 1> + |pixel: 2: 6> + |pixel: 3: 1> + |pixel: 3: 6> + |pixel: 4: 1> + |pixel: 4: 6> + |pixel: 5: 1> + |pixel: 5: 6> + |pixel: 6: 1> + |pixel: 6: 6> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5> + |pixel: 7: 6>
dim-1 |letter: O> => |dimension: 6>
dim-2 |letter: O> => |dimension: 7>

image |node: 1: 1> => |pixel: 1: 1> + |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 2: 5> + |pixel: 3: 1> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 5>

image |node: 1: 2> => |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 2: 5> + |pixel: 3: 1> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 5>

image |node: 1: 3> => |pixel: 1: 1> + |pixel: 1: 5> + |pixel: 2: 1> + |pixel: 3: 1> + |pixel: 3: 3> + |pixel: 3: 4> + |pixel: 3: 5> + |pixel: 4: 1> + |pixel: 4: 2> + |pixel: 4: 3> + |pixel: 4: 4> + |pixel: 4: 5> + |pixel: 5: 1> + |pixel: 5: 2> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 5> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 5>

then |node: 1: *> => |letter H>

image |node: 2: 1> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 2: 3> + |pixel: 3: 3> + |pixel: 4: 3> + |pixel: 5: 3> + |pixel: 6: 3> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>

image |node: 2: 2> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 2: 3> + |pixel: 5: 3> + |pixel: 6: 3> + |pixel: 7: 1> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>

image |node: 2: 3> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 5> + |pixel: 2: 2> + |pixel: 2: 3> + |pixel: 2: 4> + |pixel: 3: 3> + |pixel: 4: 3> + |pixel: 5: 3> + |pixel: 5: 4> + |pixel: 5: 5> + |pixel: 6: 1> + |pixel: 6: 2> + |pixel: 6: 3> + |pixel: 6: 4> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5>

then |node: 2: *> => |letter I>

image |node: 3: 1> => |pixel: 1: 1> + |pixel: 1: 2> + |pixel: 1: 3> + |pixel: 1: 4> + |pixel: 1: 5> + |pixel: 1: 6> + |pixel: 2: 1> + |pixel: 2: 6> + |pixel: 3: 1> + |pixel: 3: 6> + |pixel: 4: 1> + |pixel: 4: 6> + |pixel: 5: 1> + |pixel: 5: 6> + |pixel: 6: 1> + |pixel: 6: 6> + |pixel: 7: 1> + |pixel: 7: 2> + |pixel: 7: 3> + |pixel: 7: 4> + |pixel: 7: 5> + |pixel: 7: 6>

then |node: 3: *> => |letter O>

the |list of images> => |node: 1: 1> + |node: 1: 2> + |node: 1: 3> + |node: 2: 1> + |node: 2: 2> + |node: 2: 3> + |node: 3: 1>

which-image |*> #=> then select[1,1] similar-input[image] image |_self>
----------------------------------------
And that's it for this post. Now I need some thinking time to find more interesting if-then machine examples.

Thursday, 11 February 2016

learning days of the week using if-then machines

Today, an example of learning days of the week using 7 if-then machines. Note that if-then machines are probably overkill if you spell your days correctly; in this post we make use of string similarity over letter-ngrams.

Here is the code:
  context weekday if-then machines
  ngrams |*> #=> letter-ngrams[1,2,3] lower-case |_self>

  day |node: 1: 1> => ngrams |Monday>
  day |node: 1: 2> => ngrams |mon>
  day |node: 1: 3> => ngrams |Mo>
  previous |node: 1: *> => |Sunday>
  id |node: 1: *> => |Monday>
  next |node: 1: *> => |Tuesday>
 
  day |node: 2: 1> => ngrams |Tuesday>
  day |node: 2: 2> => ngrams |tue>
  day |node: 2: 3> => ngrams |Tu>
  previous |node: 2: *> => |Monday>
  id |node: 2: *> => |Tuesday>
  next |node: 2: *> => |Wednesday>

  day |node: 3: 1> => ngrams |Wednesday>
  day |node: 3: 2> => ngrams |wed>
  day |node: 3: 3> => ngrams |We>
  previous |node: 3: *> => |Tuesday>
  id |node: 3: *> => |Wednesday>
  next |node: 3: *> => |Thursday>

  day |node: 4: 1> => ngrams |Thursday>
  day |node: 4: 2> => ngrams |thurs>
  day |node: 4: 3> => ngrams |Th>
  previous |node: 4: *> => |Wednesday>
  id |node: 4: *> => |Thursday>
  next |node: 4: *> => |Friday>

  day |node: 5: 1> => ngrams |Friday>
  day |node: 5: 2> => ngrams |fri>
  day |node: 5: 3> => ngrams |Fr>
  previous |node: 5: *> => |Thursday>
  id |node: 5: *> => |Friday>
  next |node: 5: *> => |Saturday>

  day |node: 6: 1> => ngrams |Saturday>
  day |node: 6: 2> => ngrams |sat>
  day |node: 6: 3> => ngrams |Sa>
  previous |node: 6: *> => |Friday>
  id |node: 6: *> => |Saturday>
  next |node: 6: *> => |Sunday>

  day |node: 7: 1> => ngrams |Sunday>
  day |node: 7: 2> => ngrams |sun>
  day |node: 7: 3> => ngrams |Su>
  previous |node: 7: *> => |Saturday>
  id |node: 7: *> => |Sunday>
  next |node: 7: *> => |Monday>

  yesterday |*> #=> previous drop-below[0.65] similar-input[day] ngrams |_self>
  today |*> #=> id drop-below[0.65] similar-input[day] ngrams |_self>
  tomorrow |*> #=> next drop-below[0.65] similar-input[day] ngrams |_self>
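Before the console examples, here is a hedged Python sketch of the tomorrow operator, reusing letter_ngrams() and simm() from the guess-ket post above (the list-of-machines framing is mine):
  def tomorrow(text, machines, t=0.65):
      # machines: a list of (stored ngram bag, next day) pairs, one per day pattern
      sp = {}  # the output superposition: day -> coeff
      target = letter_ngrams(text)
      for bag, next_day in machines:
          c = simm(target, bag)                       # similar-input[day]
          if c >= t:                                  # drop-below[0.65]
              sp[next_day] = sp.get(next_day, 0) + c  # "next" pools the matches
      return sp

  # with the three Tuesday patterns stored, tomorrow("tues") gives {'Wednesday': 0.667}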
Now, some example usages in the console:
-- correct spelling means coeff = 1:
sa: tomorrow |sun>
1.0|Monday>

-- spelling is not perfect, but close enough (with respect to strings mapped to letter-ngrams) that we can guess what was meant:
sa: tomorrow |tues>
0.667|Wednesday>

-- making use of operator exponentiation. In this case equivalent to "tomorrow tomorrow tomorrow"
-- also note the coeff propagates. If we shoved a "clean" sigmoid in the "yesterday, today and tomorrow" operators, we could change that behaviour.
-- eg: yesterday |*> #=> previous clean drop-below[0.65] similar-input[day] ngrams |_self>
sa: tomorrow^3 |tues>
0.667|Friday>

-- "tomorrow" and "yesterday" are perfect inverses of each other:
sa: tomorrow yesterday |fri>
1.0|Friday>

sa: yesterday tomorrow |fri>
1.0|Friday>

-- mapping abbreviation to the full word:
sa: today |Sa>
|Saturday>

sa: yesterday |thurs>
|Wednesday>

-- typo, "thrusday" instead of "thursday", but the code guessed what we meant.
-- this is one of the main benefits of the if-then machines, you usually don't have to get the input exactly right (depending on how you set drop threshold t).
sa: yesterday |thrusday>
0.667|Wednesday>

-- this is an example of over-counting, I suppose you could call it.
-- since "thursd" matched both:
-- day |node: 4: 1> => ngrams |Thursday>
-- day |node: 4: 2> => ngrams |thurs>
-- we briefly mentioned this possibility in my first if-then machine post.
sa: yesterday |thursd>
1.514|Wednesday>

-- Next, we have a couple of function operators that return today's time and date:
sa: current-time
|time: 20:33:16>

sa: current-date
|date: 2016-02-11>

-- and we have another function operator that converts dates to days of the week:
-- what day of the week is New Year:
sa: day-of-the-week |date: 2016-1-1>
|day: Friday>
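Presumably day-of-the-week just leans on a standard calendar library. A one-line Python equivalent, for the curious:
  from datetime import date

  def day_of_the_week(value):
      # map the value of a |date: YYYY-M-D> ket to its weekday name
      y, m, d = (int(x) for x in value.split("-"))
      return "day: " + date(y, m, d).strftime("%A")

  print(day_of_the_week("2016-1-1"))  # day: Friday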

-- what day of the week is today?:
sa: day-of-the-week current-date
|day: Thursday>

-- what day was it three days ago?
-- NB: not a 100% match because of the "day: " prefix.
sa: yesterday^3 day-of-the-week current-date
0.702|Monday>

-- if you care about that, one fix is to remove the category, or extract the value:
-- another fix is to add more patterns to our if-then machines
-- eg:
-- day |node: 2: 4> => ngrams |day: Tuesday>
-- day |node: 2: 5> => ngrams |day: tue>
-- day |node: 2: 6> => ngrams |day: Tu>
-- there are other possible fixes too.
-- eg:
-- ngrams |*> #=> letter-ngrams[1,2,3] lower-case extract-value |_self>
sa: extract-value day-of-the-week current-date
|Thursday>

-- what day was it three days ago?
sa: yesterday^3 extract-value day-of-the-week current-date
1.0|Monday>

-- what day is it five days from now?
sa: tomorrow^5 extract-value day-of-the-week current-date
1.0|Tuesday>

-- now, our "tomorrow, yesterday and today" operators are linear (since they are defined with a |*> rule).
-- so a quick demonstration of that:
sa: tomorrow^3 (|Monday> + |Tuesday> + |Saturday>)
1.0|Thursday> + 1.0|Friday> + |Tuesday>
-- and similarly for the other two operators.

-- finally, weekdays are mod 7:
sa: tomorrow^7 |thurs>
1.0|Thursday>

sa: yesterday^21 |thurs>
|Thursday>
I guess that is about it. A fairly simple, somewhat useful, 7 if-then-machine system. And an observation I want to make: usually operator definition time is on the ugly side, as it kind of is above, but operator application time is usually quite clean. I think that is not a bad property to have, though I didn't really design it that way; it is just the way it turned out. So perhaps one use case is that if defining the desired operators is too messy for you personally, you can find them implemented elsewhere on the net and just web-load the sw file. Heh, assuming I can get anyone interested in the sw file format!

A couple of comments:
1) I had to hand-tweak the drop-below threshold to 0.65. If I set it much higher than that, then I wasn't matching things I wanted to. And if I set it to 0.6, then "Sunday" and "Monday" matched each other:
sa: id drop-below[0.6] similar-input[day] ngrams |Monday>
1.0|Monday> + 0.6|Sunday>
2) if my proposition holds that if-then machines are a fairly good mathematical approximation of biological neurons, then the above is only a 7 neuron system. The brain has tens of billions of neurons (and trillions of synapses)! That is a lot of processing power!! Though our ngrams operator probably needs a few neurons too; I don't really know how many at this point.
3) here is one way to find the full set of days, given a starting day. Not sure it is all that useful in this particular case, but it probably is for other if-then machine systems:
sa: exp-max[tomorrow] |Monday>
2|Monday> + 1.0|Tuesday> + |Wednesday> + 1.0|Thursday> + |Friday> + 1.0|Saturday> + 1.0|Sunday>
Whether we want to tweak exp-max[] so that it doesn't over-count, I'm not yet sure. It would probably be cleaner if we did.
4) we can define things like the "day-after-tomorrow" operator:
-- define the operator:
sa: day-after-tomorrow |*> #=> tomorrow^2 day-of-the-week current-date |>

-- invoke it:
sa: day-after-tomorrow |x>
0.702|Saturday>
Noting the 0.702 coeff is from the "day: " prefix. And we could define plenty of others, like "day-before-yesterday", and so on.
5) for completeness, here is what we now know:
sa: dump
----------------------------------------
|context> => |context: weekday if-then machines>

ngrams |*> #=> letter-ngrams[1,2,3] lower-case |_self>
yesterday |*> #=> previous drop-below[0.65] similar-input[day] ngrams |_self>
today |*> #=> id drop-below[0.65] similar-input[day] ngrams |_self>
tomorrow |*> #=> next drop-below[0.65] similar-input[day] ngrams |_self>
day-after-tomorrow |*> #=> tomorrow^2 day-of-the-week current-date |>

day |node: 1: 1> => |m> + |o> + |n> + |d> + |a> + |y> + |mo> + |on> + |nd> + |da> + |ay> + |mon> + |ond> + |nda> + |day>

day |node: 1: 2> => |m> + |o> + |n> + |mo> + |on> + |mon>

day |node: 1: 3> => |m> + |o> + |mo>

previous |node: 1: *> => |Sunday>
id |node: 1: *> => |Monday>
next |node: 1: *> => |Tuesday>

day |node: 2: 1> => |t> + |u> + |e> + |s> + |d> + |a> + |y> + |tu> + |ue> + |es> + |sd> + |da> + |ay> + |tue> + |ues> + |esd> + |sda> + |day>

day |node: 2: 2> => |t> + |u> + |e> + |tu> + |ue> + |tue>

day |node: 2: 3> => |t> + |u> + |tu>

previous |node: 2: *> => |Monday>
id |node: 2: *> => |Tuesday>
next |node: 2: *> => |Wednesday>

day |node: 3: 1> => |w> + 2|e> + 2|d> + |n> + |s> + |a> + |y> + |we> + |ed> + |dn> + |ne> + |es> + |sd> + |da> + |ay> + |wed> + |edn> + |dne> + |nes> + |esd> + |sda> + |day>

day |node: 3: 2> => |w> + |e> + |d> + |we> + |ed> + |wed>

day |node: 3: 3> => |w> + |e> + |we>

previous |node: 3: *> => |Tuesday>
id |node: 3: *> => |Wednesday>
next |node: 3: *> => |Thursday>

day |node: 4: 1> => |t> + |h> + |u> + |r> + |s> + |d> + |a> + |y> + |th> + |hu> + |ur> + |rs> + |sd> + |da> + |ay> + |thu> + |hur> + |urs> + |rsd> + |sda> + |day>

day |node: 4: 2> => |t> + |h> + |u> + |r> + |s> + |th> + |hu> + |ur> + |rs> + |thu> + |hur> + |urs>

day |node: 4: 3> => |t> + |h> + |th>

previous |node: 4: *> => |Wednesday>
id |node: 4: *> => |Thursday>
next |node: 4: *> => |Friday>

day |node: 5: 1> => |f> + |r> + |i> + |d> + |a> + |y> + |fr> + |ri> + |id> + |da> + |ay> + |fri> + |rid> + |ida> + |day>

day |node: 5: 2> => |f> + |r> + |i> + |fr> + |ri> + |fri>

day |node: 5: 3> => |f> + |r> + |fr>

previous |node: 5: *> => |Thursday>
id |node: 5: *> => |Friday>
next |node: 5: *> => |Saturday>

day |node: 6: 1> => |s> + 2|a> + |t> + |u> + |r> + |d> + |y> + |sa> + |at> + |tu> + |ur> + |rd> + |da> + |ay> + |sat> + |atu> + |tur> + |urd> + |rda> + |day>

day |node: 6: 2> => |s> + |a> + |t> + |sa> + |at> + |sat>

day |node: 6: 3> => |s> + |a> + |sa>

previous |node: 6: *> => |Friday>
id |node: 6: *> => |Saturday>
next |node: 6: *> => |Sunday>

day |node: 7: 1> => |s> + |u> + |n> + |d> + |a> + |y> + |su> + |un> + |nd> + |da> + |ay> + |sun> + |und> + |nda> + |day>

day |node: 7: 2> => |s> + |u> + |n> + |su> + |un> + |sun>

day |node: 7: 3> => |s> + |u> + |su>

previous |node: 7: *> => |Saturday>
id |node: 7: *> => |Sunday>
next |node: 7: *> => |Monday>
----------------------------------------
And I guess that is it for this post.

Saturday, 6 February 2016

learning a sequence using if-then machines

Last post I claimed that we can easily learn sequences using if-then machines. This post is just to give an example of that.

Let's dive in:
context if-then machine learning a sequence

-- define our superpositions:
-- let's make them random 10 dimensional, with coeffs in range [0,20]
the |sp1> => absolute-noise[20] 0 range(|x: 1>,|x: 10>)
the |sp2> => absolute-noise[20] 0 range(|x: 1>,|x: 10>)
the |sp3> => absolute-noise[20] 0 range(|x: 1>,|x: 10>)
the |sp4> => absolute-noise[20] 0 range(|x: 1>,|x: 10>)
the |sp5> => absolute-noise[20] 0 range(|x: 1>,|x: 10>)

-- define our if-then machines:
-- ie, learn the sequence of superpositions
seq |node: 1: 1> => the |sp1>
then |node: 1: *> => the |sp2>

seq |node: 2: 1> => the |sp2>
then |node: 2: *> => the |sp3>

seq |node: 3: 1> => the |sp3>
then |node: 3: *> => the |sp4>

seq |node: 4: 1> => the |sp4>
then |node: 4: *> => the |sp5>

seq |node: 5: 1> => the |sp5>
then |node: 5: *> => |the finish line>

-- define the input superposition:
the |input> => the |sp1>

-- see what we have:
table[node,coeff] 100 similar-input[seq] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 1: 1 | 100.0  |
| 5: 1 | 69.718 |
| 2: 1 | 65.306 |
| 3: 1 | 65.192 |
| 4: 1 | 62.993 |
+------+--------+

table[node,coeff] 100 similar-input[seq] then drop-below[0.9] similar-input[seq] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 2: 1 | 100    |
| 1: 1 | 65.306 |
| 3: 1 | 64.579 |
| 4: 1 | 62.829 |
| 5: 1 | 52.732 |
+------+--------+

table[node,coeff] 100 similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 3: 1 | 100    |
| 5: 1 | 79.326 |
| 4: 1 | 73.162 |
| 1: 1 | 65.192 |
| 2: 1 | 64.579 |
+------+--------+

table[node,coeff] 100 similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 4: 1 | 100    |
| 5: 1 | 76.359 |
| 3: 1 | 73.162 |
| 1: 1 | 62.993 |
| 2: 1 | 62.829 |
+------+--------+

table[node,coeff] 100 similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 5: 1 | 100.0  |
| 3: 1 | 79.326 |
| 4: 1 | 76.359 |
| 1: 1 | 69.718 |
| 2: 1 | 52.732 |
+------+--------+

sa: then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] then drop-below[0.9] similar-input[seq] the |input>
1.0|the finish line>
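In Python terms, each link in that chain is one application of the step below (superpositions as plain dicts, simm() as sketched in the guess-ket post above; the function name is mine):
  def next_step(sp, seq, then, t=0.9):
      # one application of: then drop-below[t] similar-input[seq] sp
      out = {}
      for node, pattern in seq.items():
          c = simm(sp, pattern)  # similar-input[seq]: match the input to every node
          if c >= t:             # drop-below[t]: keep only the strong matches
              for k, v in then[node].items():
                  out[k] = out.get(k, 0) + c * v  # "then" is linear in its input
      return out

  # applying next_step five times to the sp1 pattern lands on {'the finish line': 1.0}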

-- finally, see what the ugly details look like:
sa: dump
----------------------------------------
|context> => |context: if-then machine learning a sequence>

the |sp1> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
the |sp2> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>
the |sp3> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>
the |sp4> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>
the |sp5> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>

seq |node: 1: 1> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
then |node: 1: *> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>

seq |node: 2: 1> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>
then |node: 2: *> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>

seq |node: 3: 1> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>
then |node: 3: *> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>

seq |node: 4: 1> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>
then |node: 4: *> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>

seq |node: 5: 1> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>
then |node: 5: *> => |the finish line>

the |input> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
----------------------------------------
And if you follow the tables, it works exactly as expected. Note though that we chose our if-then machines to be exact matches (ie, 100%) to the input superposition. I did that for demonstration purposes. How about a tweak on the above where that is not the case? We will use absolute-noise[1] to noise up our superpositions:
-- define a new layer of input patterns seq2 (note, we don't need to (re)define the "then" operator, since we are using the same ones as above):
seq2 |node: 1: 1> => absolute-noise[1] the |sp1>
seq2 |node: 2: 1> => absolute-noise[1] the |sp2>
seq2 |node: 3: 1> => absolute-noise[1] the |sp3>
seq2 |node: 4: 1> => absolute-noise[1] the |sp4>
seq2 |node: 5: 1> => absolute-noise[1] the |sp5>

-- now put it to use:
table[node,coeff] 100 similar-input[seq2] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 1: 1 | 98.604 |
| 5: 1 | 70.492 |
| 3: 1 | 65.944 |
| 2: 1 | 65.777 |
| 4: 1 | 63.699 |
+------+--------+

table[node,coeff] 100 similar-input[seq2] then drop-below[0.9] similar-input[seq2] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 2: 1 | 98.698 |
| 1: 1 | 66.038 |
| 3: 1 | 65.322 |
| 4: 1 | 63.723 |
| 5: 1 | 53.78  |
+------+--------+

table[node,coeff] 100 similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 3: 1 | 98.775 |
| 5: 1 | 79.902 |
| 4: 1 | 73.249 |
| 1: 1 | 66.02  |
| 2: 1 | 65.681 |
+------+--------+

table[node,coeff] 100 similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 4: 1 | 98.632 |
| 5: 1 | 76.244 |
| 3: 1 | 74.323 |
| 1: 1 | 64.251 |
| 2: 1 | 63.981 |
+------+--------+

table[node,coeff] 100 similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] the |input>
+------+--------+
| node | coeff  |
+------+--------+
| 5: 1 | 98.495 |
| 3: 1 | 80.222 |
| 4: 1 | 76.313 |
| 1: 1 | 70.211 |
| 2: 1 | 53.996 |
+------+--------+

sa: then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] then drop-below[0.9] similar-input[seq2] the |input>
0.985|the finish line>
Note that, as desired, we again find the |node: 1: 1>, |node: 2: 1> ... |node: 5: 1> sequence, but this time with roughly a 98% rather than a 100% match. Hopefully that makes my point.

A couple of comments:
1) if-then machines work with any superpositions.
2) the then operator can also have side effects. eg using stored rules. This is a big deal! And makes if-then machines even more powerful.
3) the above are quite simple if-then machines in that there is no pooling. ie, only one input superposition triggers each machine. A full if-then machine can have many "pooled" inputs.
4) once again, a whinge about my parser. If it were finished, we could short-cut the above using:
next (*) #=> then drop-below[0.9] similar-input[seq] |_self>
next2 (*) #=> then drop-below[0.9] similar-input[seq2] |_self>

-- after which we would use:
table[node,coeff] 100 similar-input[seq] next^k the |input>
table[node,coeff] 100 similar-input[seq2] next2^k the |input>
5) for any sufficiently long if-then machine sequence with matches below 100%, you will eventually reach a point where the result is less than the drop-below threshold t, leaving you with the empty ket |>. Which kind of makes sense: if you are reasoning with probabilities, not certainties, then for a long enough chain you can't be sure of your conclusion. On the flip side, if your matches are pretty much 100%, as in the world of mathematics, then you should be able to have long sequences and still stay above the drop-below threshold.
6) I'm pretty sure temporal pooling and spatial pooling look the same, as far as if-then machines are concerned. In that case the difference is where the superpositions come from, not the structure of the if-then machine.

Finally, this is what we have now:
----------------------------------------
|context> => |context: fixed if-then machine learning a sequence>

the |sp1> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
the |sp2> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>
the |sp3> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>
the |sp4> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>
the |sp5> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>

seq |node: 1: 1> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
seq2 |node: 1: 1> => 13.351|x: 1> + 8.69|x: 2> + 4.772|x: 3> + 3.054|x: 4> + 16.642|x: 5> + 13.708|x: 6> + 9.148|x: 7> + 8.439|x: 8> + 11.983|x: 9> + 17.609|x: 10>
then |node: 1: *> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>

seq |node: 2: 1> => 8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>
seq2 |node: 2: 1> => 8.568|x: 1> + 4.859|x: 2> + 15.537|x: 3> + 3.768|x: 4> + 5.416|x: 5> + 1.809|x: 6> + 4.228|x: 7> + 18.566|x: 8> + 15.799|x: 9> + 11.21|x: 10>
then |node: 2: *> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>

seq |node: 3: 1> => 19.846|x: 1> + 19.852|x: 2> + 19.825|x: 3> + 6.605|x: 4> + 10.253|x: 5> + 8.096|x: 6> + 1.937|x: 7> + 7.358|x: 8> + 12.041|x: 9> + 1.345|x: 10>
seq2 |node: 3: 1> => 20.38|x: 1> + 20.207|x: 2> + 20.263|x: 3> + 7.372|x: 4> + 10.786|x: 5> + 8.449|x: 6> + 2.488|x: 7> + 8.009|x: 8> + 12.633|x: 9> + 1.406|x: 10>
then |node: 3: *> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>

seq |node: 4: 1> => 14.787|x: 1> + 10.035|x: 2> + 3.728|x: 3> + 16.038|x: 4> + 5.647|x: 5> + 3.857|x: 6> + 3.552|x: 7> + 7.227|x: 8> + 16.747|x: 9> + 1.412|x: 10>
seq2 |node: 4: 1> => 15.785|x: 1> + 10.785|x: 2> + 3.955|x: 3> + 16.93|x: 4> + 5.953|x: 5> + 4.312|x: 6> + 3.647|x: 7> + 7.997|x: 8> + 17.241|x: 9> + 2.228|x: 10>
then |node: 4: *> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>

seq |node: 5: 1> => 18.044|x: 1> + 18.396|x: 2> + 8.64|x: 3> + 14.424|x: 4> + 19.749|x: 5> + 6.61|x: 6> + 7.26|x: 7> + 4.446|x: 8> + 9.583|x: 9> + 1.272|x: 10>
seq2 |node: 5: 1> => 18.631|x: 1> + 19.0|x: 2> + 9.594|x: 3> + 14.643|x: 4> + 20.408|x: 5> + 7.457|x: 6> + 7.315|x: 7> + 4.685|x: 8> + 10.152|x: 9> + 1.88|x: 10>
then |node: 5: *> => |the finish line>

the |input> => 12.363|x: 1> + 7.862|x: 2> + 4.541|x: 3> + 2.752|x: 4> + 15.782|x: 5> + 13.444|x: 6> + 8.522|x: 7> + 7.512|x: 8> + 11.056|x: 9> + 17.304|x: 10>
----------------------------------------
Update: now that we have all these superpositions, let's go on and show pooling.
-- define our if-then machine:
seq3 |node: 17: 1> => the |sp1>
seq3 |node: 17: 2> => the |sp2>
seq3 |node: 17: 3> => the |sp3>
seq3 |node: 17: 4> => the |sp4>
seq3 |node: 17: 5> => the |sp5>
then |node: 17: *> => |the SP sequence>

-- now, let's do some testing first.
-- randomly pick one of {sp1,sp2,sp3,sp4,sp5}, add some noise, and see what we have:
sa: table[node,coeff] 100 similar-input[seq3] absolute-noise[5] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
+-------+--------+
| node  | coeff  |
+-------+--------+
| 17: 5 | 92.791 |
| 17: 3 | 80.064 |
| 17: 4 | 76.8   |
| 17: 1 | 73.887 |
| 17: 2 | 59.433 |
+-------+--------+
-- so the input must have been sp5

sa: table[node,coeff] 100 similar-input[seq3] absolute-noise[5] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
+-------+--------+
| node  | coeff  |
+-------+--------+
| 17: 4 | 93.223 |
| 17: 5 | 78.173 |
| 17: 3 | 72.651 |
| 17: 2 | 68.824 |
| 17: 1 | 67.369 |
+-------+--------+
-- the input must have been sp4

sa: table[node,coeff] 100 similar-input[seq3] absolute-noise[5] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
+-------+--------+
| node  | coeff  |
+-------+--------+
| 17: 4 | 91.116 |
| 17: 5 | 81.028 |
| 17: 3 | 77.603 |
| 17: 2 | 67.575 |
| 17: 1 | 66.832 |
+-------+--------+
-- sp4 again

sa: table[node,coeff] 100 similar-input[seq3] absolute-noise[5] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
+-------+--------+
| node  | coeff  |
+-------+--------+
| 17: 2 | 89.845 |
| 17: 1 | 74.979 |
| 17: 4 | 69.849 |
| 17: 3 | 68.51  |
| 17: 5 | 61.79  |
+-------+--------+
-- the input must have been sp2

-- now ramp up the noise from absolute-noise[5] to absolute-noise[10], and use the full if-then machine:
sa: then drop-below[0.9] similar-input[seq3] absolute-noise[10] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
|>

sa: then drop-below[0.9] similar-input[seq3] absolute-noise[10] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
|>

sa: then drop-below[0.9] similar-input[seq3] absolute-noise[10] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.901|the SP sequence>

sa: then drop-below[0.9] similar-input[seq3] absolute-noise[10] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
|>

sa: then drop-below[0.9] similar-input[seq3] absolute-noise[10] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.933|the SP sequence>
And there we have it! Pooling of {sp1,sp2,sp3,sp4,sp5}, and an output of "the SP sequence". And since we added so much noise, it is only sometimes above the drop-below threshold t (in this case 0.9). And since it is all so abstract, this thing is very general.

Note that pooling is a very important concept. Basically it means you can have multiple different representations of the same thing, even though they may look nothing alike. For example, a friend's face seen from different angles is very different in terms of pixels, yet each view triggers the same "hey, that's my friend". Another example is the lyrics or notes of a song: despite being different, they all map to the same song name.

Next, instead of adding noise to the incoming superposition, we project down from 10 dimensions to 9 (using pick[9]):
sa: then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.987|the SP sequence>

sa: then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.957|the SP sequence>

sa: then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.983|the SP sequence>

sa: then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
|>

sa: then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
0.973|the SP sequence>
Next, I tried projecting down to 8 dimensions (using pick[8]), but the result was almost always below threshold. BTW, the current pick[n] code changes the order of the superposition. This of course has no impact on similar-input[op], and hence on if-then machines. Most of the time, changing the ordering of a superposition does not change its meaning. Though some of the time it is of course useful to sort superpositions, and we have operators for that (ket-sort, coeff-sort, sort-by[], and so on).

Also, I should note that if-then machines are, in general, fairly tolerant of added noise (absolute-noise[n]) and of dropped elements (pick[n]). They become more tolerant if you decrease the drop-below threshold t, and less so if you increase it. Though you don't want t too small, else it will match more than you would like. And if you increase t to 0.98 or higher, then you are in the maths world of black and white, true and false.

Update: note that if you have a long line of code and don't fully understand it, you can always decompose the sequence into smaller steps. eg, given:
then drop-below[0.9] similar-input[seq3] pick[9] the pick-elt split |sp1 sp2 sp3 sp4 sp5>
we can work through it step by step:
sa: split |sp1 sp2 sp3 sp4 sp5>
|sp1> + |sp2> + |sp3> + |sp4> + |sp5>

sa: pick-elt split |sp1 sp2 sp3 sp4 sp5>
|sp2>

sa: the |sp2>
8.05|x: 1> + 4.543|x: 2> + 14.629|x: 3> + 3.443|x: 4> + 4.74|x: 5> + 1.059|x: 6> + 3.91|x: 7> + 17.714|x: 8> + 14.833|x: 9> + 11.074|x: 10>

sa: pick[9] the |sp2>
4.74|x: 5> + 8.05|x: 1> + 14.833|x: 9> + 3.91|x: 7> + 4.543|x: 2> + 1.059|x: 6> + 11.074|x: 10> + 17.714|x: 8> + 3.443|x: 4>

sa: similar-input[seq3] (4.74|x: 5> + 8.05|x: 1> + 14.833|x: 9> + 3.91|x: 7> + 4.543|x: 2> + 1.059|x: 6> + 11.074|x: 10> + 17.714|x: 8> + 3.443|x: 4>)
0.826|node: 17: 2> + 0.692|node: 17: 1> + 0.663|node: 17: 4> + 0.526|node: 17: 3> + 0.512|node: 17: 5>

sa: drop-below[0.9] (0.826|node: 17: 2> + 0.692|node: 17: 1> + 0.663|node: 17: 4> + 0.526|node: 17: 3> + 0.512|node: 17: 5>)
|>

sa: then |>
|>
Noting that random steps like pick[n] and pick-elt complicate this, in that each time you use them you will get different answers. That is why I copied the sp result from the previous line into the next step.

Thursday, 4 February 2016

introducing the if-then machine

I think this thing is going to be seriously interesting! Let me define it, and then I will try to explain it:
seq |node 1: 1> => sp1-1
seq |node 1: 2> => sp1-2
seq |node 1: 3> => sp1-3
...
seq |node 1: n> => sp1-n

then |node 1: *> => sp2
next (*) #=> then drop-below[t] similar-input[seq] |_self>
where {sp1-k} and sp2 are all superpositions. And we call this collection of rules the if-then machine.

This has the useful and cool property that if any one of the sp1-k superpositions matches the input closely enough (we can call this a pooling of the sp1-k superpositions), then it spits out sp2 as a kind of prediction. t is a parameter that determines how close that match has to be for there to be any output. If the input is not close enough, then "next input-sp" just returns the empty ket |>. In the world of mathematics, where we want things to be black and white, true and false, we might want t = 0.99 or so. If we want to be generous and tolerant, then maybe t = 0.6 (ie, a 60% match is good enough). Or we could make it dynamic: if this particular if-then machine "fires too much", increase t; if it "fires too little", decrease it. The exact details of that are something we can work out later.
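Here is a minimal Python sketch of a single if-then machine, with dicts for superpositions and simm() as in the posts above. It is only an approximation; the real thing goes through the parser and context machinery:
  def if_then_machine(input_sp, sp1_list, sp2, t=0.6):
      # similar-input[seq] then drop-below[t]: keep pattern matches above threshold
      strengths = [c for c in (simm(input_sp, sp1) for sp1 in sp1_list) if c >= t]
      if not strengths:
          return {}  # the empty ket |>
      # "then": output sp2 scaled by the summed match strength. Note the possible
      # double counting when several sp1-k match at once, discussed below.
      total = sum(strengths)
      return {k: total * v for k, v in sp2.items()}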

BTW, I don't think I have defined similar-input[op] just yet. It is very close to its brother similar[op] ket (see eg here). Indeed, they are only a couple of lines of python apart.
similar[some-op] |x>
returns a list of kets that have superpositions defined with respect to some-op, sorted by their similarity to the superposition defined by "some-op |x>".

In contrast:
similar-input[some-op] some-sp
returns a list of kets that have superpositions defined with respect to some-op, sorted by their similarity to "some-sp".

In the context of our if-then machine defined above, this means "seq" is the operator that determines which "layer" of if-then machines to match the input sp against. If seq is not defined for a series of nodes, then similar-input[seq] won't compare the input against those nodes. Say seq2 is instead defined for those nodes; then similar-input[seq2] will compare the input against them. To see which nodes are in a particular "layer", type "rel-kets[seq]" or "rel-kets[seq2]" in the console.

Next: in general there is no restriction on the superpositions sp1-k. But sometimes it is useful for them to be somewhat "orthogonal". I don't mean that in the strict maths sense of the vector dot product equalling zero; I mean it in the sense that if one of the sp1-k matches input-sp, then ideally we don't want any of the others to do so. Because if they did, we would get a double counting effect.

Let me try to show this double counting effect. Say "node 1: 3" matches input-sp at 80% and "node 1: 17" matches at 92% (and the rest barely at all). Then we would have:
drop-below[0.75] similar-input[seq] input-sp
= 0.92|node 1: 17> + 0.8|node 1: 3>

then drop-below[0.75] similar-input[seq] input-sp
= then (0.92|node 1: 17> + 0.8|node 1: 3>)
= 0.92 then |node 1: 17> + 0.8 then |node 1: 3>
= 0.92 sp2 + 0.8 sp2
= 1.72 sp2
And note the coeff of sp2 is greater than 1. So what can we do about it? One possibility, and probably the one I will go for, is average-categorize. Probably a tweak of it, but it is close to what I want. Basically, if the superpositions are similar enough with respect to simm(), add them; if not, keep them distinct. But I should note that sometimes we do want this double counting, or triple counting, or whatever, depending on how many matches there are above the t threshold.
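For what it's worth, here is my rough reading of that average-categorize idea in Python (a greedy sketch, not the project code; simm() as before):
  def average_categorize(sps, t=0.6):
      # greedily merge superpositions whose simm() with a group average exceeds t
      groups = []  # each group holds [running sum of its members, member count]
      for sp in sps:
          for g in groups:
              avg = {k: v / g[1] for k, v in g[0].items()}
              if simm(sp, avg) >= t:
                  for k, v in sp.items():
                      g[0][k] = g[0].get(k, 0) + v
                  g[1] += 1
                  break
          else:
              groups.append([dict(sp), 1])
      return [{k: v / n for k, v in s.items()} for s, n in groups]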

I guess now I should give an example. How about a simple 3 if-then machine example:
----------------------------------------
|context> => |context: if-then machine>

seq |node 1: 1> => |X>
seq |node 1: 2> => |Y>
seq |node 1: 3> => |X> + |Y>
then |node 1: *> => |X or Y>

seq |node 2: 1> => |X> + |Y>
then |node 2: *> => |X and Y>

seq |node 3: 1> => |X> + |Y> + |Z>
then |node 3: *> => |X and Y and Z>
----------------------------------------
Now, test its properties. First, without the "then" operator and the drop-below[t] operator:
sa: similar-input[seq] |Y>
|node 1: 2> + 0.5|node 1: 3> + 0.5|node 2: 1> + 0.333|node 3: 1>

sa: similar-input[seq] (|X> + |Y>)
|node 1: 3> + |node 2: 1> + 0.667|node 3: 1> + 0.5|node 1: 1> + 0.5|node 1: 2>

sa: similar-input[seq] (|X> + |Y> + |Z>)
|node 3: 1> + 0.667|node 1: 3> + 0.667|node 2: 1> + 0.333|node 1: 1> + 0.333|node 1: 2>
Set t = 0.8 and repeat:
sa: drop-below[0.8] similar-input[seq] |Y>
|node 1: 2>

sa: drop-below[0.8] similar-input[seq] (|X> + |Y>)
|node 1: 3> + |node 2: 1>

sa: drop-below[0.8] similar-input[seq] (|X> + |Y> + |Z>)
|node 3: 1>
Finally, apply the "then" operator:
sa: then drop-below[0.8] similar-input[seq] |X>
|X or Y>

sa: then drop-below[0.8] similar-input[seq] |Y>
|X or Y>

sa: then drop-below[0.8] similar-input[seq] |Z>
|>

sa: then drop-below[0.8] similar-input[seq] (|X> + |Y>)
|X or Y> + |X and Y>

sa: then drop-below[0.8] similar-input[seq] (|X> + |Y> + |Z>)
|X and Y and Z>
Now, once more, this time using t = 0.6:
sa: then drop-below[0.6] similar-input[seq] |X>
|X or Y>

sa: then drop-below[0.6] similar-input[seq] |Y>
|X or Y>

sa: then drop-below[0.6] similar-input[seq] |Z>
|>

sa: then drop-below[0.6] similar-input[seq] (|X> + |Y>)
|X or Y> + |X and Y> + 0.667|X and Y and Z>

sa: then drop-below[0.6] similar-input[seq] (|X> + |Y> + |Z>)
|X and Y and Z> + 0.667|X or Y> + 0.667|X and Y>
OK. That is a pretty simple example! But in practice this thing is very general and very powerful. Mostly because superpositions can be almost anything.

Now, on to the really interesting bit. I propose that the if-then machine roughly corresponds to a single neuron. It certainly has a lot more power than a support vector machine, for which a similar claim is made (and unlike an SVM, the input does not need to be linearly separable). The if-then machine is itself very powerful, which means an entire brain of them would be unimaginably powerful.

Some notes:
1) Jeff Hawkins and the HTM (hierarchical temporal memory) guys claim SDRs (sparse distributed representations) are the brain's data-type. Superpositions are more general than SDRs, since they can have coeffs other than {0,1}, so I in turn claim superpositions are the brain's data-type.
2) we can tweak our if-then machine in various ways. We could put in sigmoids, or an inhibition layer so that nearby neurons inhibit their neighbours and so on. Plus probably other tweaks.
3) the current parser does not yet handle learning superposition rules, eg in this case "next (*) #=> ...". Hence I had to type the full "then drop-below[0.8] similar-input[seq]". If we could handle them, our examples would instead look like this:
sa: next (*) #=> then drop-below[0.8] similar-input[seq] |_self>
sa: next |X>
sa: next |Y>
sa: next |Z>
sa: next (|X> + |Y>)
sa: next (|X> + |Y> + |Z>)
4) the above if-then machine used a static learn rule: "then |node 1: *> => ...". We could instead use some dynamic rule: "then |node 1: *> #=> some-action". NB: "#=>" is the symbol for a stored rule.
5) our if-then machine is very close to my claimed "general supervised pattern recognition algo".
6) we should be able to use the if-then machine to learn sequences. I need to look into that, but I'm pretty sure of it. And to find the k'th element in a learned sequence we would do something like "next^k SP". Something like this, I suppose:
seq |node 1: 1> => sp1
then |node 1: *> => sp2

seq |node 2: 1> => sp2
then |node 2: *> => sp3

seq |node 3: 1> => sp3
then |node 3: *> => sp4

seq |node 4: 1> => sp4
then |node 4: *> => sp5

seq |node 5: 1> => sp5
then |node 5: *> => sp6
7) we could then pool a sequence into a single output:
seq2 |node P: 0> => SP
seq2 |node P: 1> => next SP
seq2 |node P: 2> => next^2 SP
seq2 |node P: 3> => next^3 SP
...
seq2 |node P: n> => next^n SP
then |node P: *> => |the-SP-sequence>
8) we need some mechanism to find the desired superpositions.
9) if not obvious, I should mention the if-then machine is not restricted to Boolean values. It can be considered a generalization, though if you set t close enough to 1, then it emulates a Boolean world.

Update: a bigger, slightly more interesting example:
-- define our context:
sa: context bigger if-then machine

-- define our object -> superposition operator:
sa: ngrams |*> #=> letter-ngrams[1,2,3] |_self>

-- define our 2 if-then machines:
sa: seq |node 1> => ngrams |the cat sat on the mat>
sa: then |node 1> => |the cat sat on the mat>
sa: seq |node 2> => ngrams |the man on the moon>
sa: then |node 2> => |the man on the moon>

-- see what we have:
sa: dump
----------------------------------------
|context> => |context: bigger if-then machine>

ngrams |*> #=> letter-ngrams[1,2,3] |_self>

seq |node 1> => 5|t> + 2|h> + 2|e> + 5| > + |c> + 3|a> + |s> + |o> + |n> + |m> + 2|th> + 2|he> + 2|e > + | c> + |ca> + 3|at> + 2|t > + | s> + |sa> + | o> + |on> + |n > + | t> + | m> + |ma> + 2|the> + 2|he > + |e c> + | ca> + |cat> + 2|at > + |t s> + | sa> + |sat> + |t o> + | on> + |on > + |n t> + | th> + |e m> + | ma> + |mat>
then |node 1> => |the cat sat on the mat>

seq |node 2> => 2|t> + 2|h> + 2|e> + 4| > + 2|m> + |a> + 3|n> + 3|o> + 2|th> + 2|he> + 2|e > + 2| m> + |ma> + |an> + 2|n > + | o> + 2|on> + | t> + |mo> + |oo> + 2|the> + 2|he > + 2|e m> + | ma> + |man> + |an > + |n o> + | on> + |on > + |n t> + | th> + | mo> + |moo> + |oon>
then |node 2> => |the man on the moon>
----------------------------------------

-- now put it to use (we don't have drop-below[t], so effectively t = 0)
-- though first define a short-cut for our if-then machine:
sa: guess-phrase |*> #=> then similar-input[seq] ngrams |_self>

sa: guess-phrase |the >
0.381|the cat sat on the mat> + 0.37|the man on the moon>

sa: guess-phrase |the cat>
0.548|the cat sat on the mat> + 0.37|the man on the moon>

sa: guess-phrase |the cat sat>
0.717|the cat sat on the mat> + 0.356|the man on the moon>

sa: guess-phrase |the cat sat on the>
0.862|the cat sat on the mat> + 0.544|the man on the moon>

-- NB: I deliberately chose "the cat sat on the moon" to lift up the probability of "the man on the moon":
sa: guess-phrase |the cat sat on the moon>
0.866|the cat sat on the mat> + 0.675|the man on the moon>

sa: guess-phrase |the man on the>
0.805|the man on the moon> + 0.602|the cat sat on the mat>

-- now with the exact phrases:
sa: guess-phrase |the cat sat on the mat>
1.0|the cat sat on the mat> + 0.59|the man on the moon>

sa: guess-phrase |the man on the moon>
1.0|the man on the moon> + 0.59|the cat sat on the mat>
So, you have to read the coeffs above carefully, but if you do, you will see it works really quite well. Much better than a Boolean system of logic.

Thursday, 21 January 2016

new operators: mod[k] and is-mod[k]

Just a couple more simple arithmetic operators: mod[k] and is-mod[k]. They do pretty much what they say they do. One example usage is the fizz-buzz exercise.

Quick look in the console:
sa: mod[3] |number: 12>
|number: 0>

sa: mod[3] |8>
|2>

sa: is-mod[7] |50>
|no>

sa: is-mod[7] |96889010407>
|yes>

sa: is-mod[3] |category 1: category 2: category 3: 286725>
|yes>
And I guess we should note that they work independently of the category/data-type (as do the other simple arithmetic operators). Here, the first one has data-type "number", those in the middle have no data-type, and the last one has "category 1: category 2: category 3".
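A hedged Python sketch of both operators, category independence included. Splitting on the last ": " is my guess at how the value is extracted:
  def mod_k(ket, k):
      # toy mod[k]: split off any "category: " prefixes, take the value mod k,
      # and keep the prefix, mirroring how the operator ignores the data-type
      prefix, _, value = ket.rpartition(": ")
      result = int(value) % k
      return prefix + ": " + str(result) if prefix else str(result)

  def is_mod_k(ket, k):
      # toy is-mod[k]: |yes> if the value is divisible by k, else |no>
      return "yes" if int(ket.rpartition(": ")[2]) % k == 0 else "no"

  print(mod_k("number: 12", 3))  # number: 0
  print(is_mod_k("category 1: category 2: category 3: 286725", 3))  # yes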

Now, on to FizzBuzz. Here are the exercise details:
-- task:
-- For numbers 1 through 100,

-- if the number is divisible by 3 print Fizz;
-- if the number is divisible by 5 print Buzz;
-- if the number is divisible by 3 and 5 (15) print FizzBuzz;
-- else, print the number.
Here are three alternate potential ways to do it, though the incomplete parser meant only the third version actually worked:
|context> => |context: Fizz Buzz exercise in BKO: v1>
|list> => range(|1>,|100>)
fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(arithmetic(|_self>,|%>,|3>) == |0>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(arithmetic(|_self>,|%>,|5>) == |0>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(arithmetic(|_self>,|%>,|15>) == |0>,|FizzBuzz>,|>)

map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>



|context> => |context: Fizz Buzz exercise in BKO: v2>
|list> => range(|1>,|100>)
is-zero |*> => |False>
is-zero |0> => |True>
fizz-buzz-0 |*> #=> |_self>
-- current parser can't handle the nested ()
fizz-buzz-1 |*> #=> if(is-zero arithmetic(|_self>,|%>,|3>),|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-zero arithmetic(|_self>,|%>,|5>),|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-zero arithmetic(|_self>,|%>,|15>),|FizzBuzz>,|>)

map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>



|context> => |context: Fizz Buzz exercise in BKO: v3>
-- define our list
|list> => range(|1>,|100>)

-- define is-zero function
is-zero |*> => |False>
is-zero |0> => |True>

-- define is-mod functions
is-mod-3 |*> #=> is-zero arithmetic(|_self>,|%>,|3>)
is-mod-5 |*> #=> is-zero arithmetic(|_self>,|%>,|5>)
is-mod-15 |*> #=> is-zero arithmetic(|_self>,|%>,|15>)

-- apply them
map[is-mod-3] "" |list>
map[is-mod-5] "" |list>
map[is-mod-15] "" |list>

-- define fizz-buzz functions
fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(is-mod-3 |_self>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-mod-5 |_self>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-mod-15 |_self>,|FizzBuzz>,|>)

-- apply them
map[fizz-buzz-0,fizz-buzz] "" |list>
map[fizz-buzz-1,fizz-buzz] "" |list>
map[fizz-buzz-2,fizz-buzz] "" |list>
map[fizz-buzz-3,fizz-buzz] "" |list>
Now that we have is-mod[k], we can give a much tidier version:
|context> => |context: Fizz Buzz exercise, is-mod version>
the |list> => range(|1>,|100>)

fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(is-mod[3] |_self>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-mod[5] |_self>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-mod[15] |_self>,|FizzBuzz>,|>)

|null> => map[fizz-buzz-0,fizz-buzz] the |list>
|null> => map[fizz-buzz-1,fizz-buzz] the |list>
|null> => map[fizz-buzz-2,fizz-buzz] the |list>
|null> => map[fizz-buzz-3,fizz-buzz] the |list>
And here is the result:
sa: load fizz-buzz--is-mod.sw
sa: dump
----------------------------------------
|context> => |context: Fizz Buzz exercise, is-mod version>

the |list> => |1> + |2> + |3> + |4> + |5> + |6> + |7> + |8> + |9> + |10> + |11> + |12> + |13> + |14> + |15> + |16> + |17> + |18> + |19> + |20> + |21> + |22> + |23> + |24> + |25> + |26> + |27> + |28> + |29> + |30> + |31> + |32> + |33> + |34> + |35> + |36> + |37> + |38> + |39> + |40> + |41> + |42> + |43> + |44> + |45> + |46> + |47> + |48> + |49> + |50> + |51> + |52> + |53> + |54> + |55> + |56> + |57> + |58> + |59> + |60> + |61> + |62> + |63> + |64> + |65> + |66> + |67> + |68> + |69> + |70> + |71> + |72> + |73> + |74> + |75> + |76> + |77> + |78> + |79> + |80> + |81> + |82> + |83> + |84> + |85> + |86> + |87> + |88> + |89> + |90> + |91> + |92> + |93> + |94> + |95> + |96> + |97> + |98> + |99> + |100>

fizz-buzz-0 |*> #=> |_self>
fizz-buzz-1 |*> #=> if(is-mod[3] |_self>,|Fizz>,|>)
fizz-buzz-2 |*> #=> if(is-mod[5] |_self>,|Buzz>,|>)
fizz-buzz-3 |*> #=> if(is-mod[15] |_self>,|FizzBuzz>,|>)

fizz-buzz |1> => |1>
fizz-buzz |2> => |2>
fizz-buzz |3> => |Fizz>
fizz-buzz |4> => |4>
fizz-buzz |5> => |Buzz>
fizz-buzz |6> => |Fizz>
fizz-buzz |7> => |7>
fizz-buzz |8> => |8>
fizz-buzz |9> => |Fizz>
fizz-buzz |10> => |Buzz>
fizz-buzz |11> => |11>
fizz-buzz |12> => |Fizz>
fizz-buzz |13> => |13>
fizz-buzz |14> => |14>
fizz-buzz |15> => |FizzBuzz>
fizz-buzz |16> => |16>
fizz-buzz |17> => |17>
fizz-buzz |18> => |Fizz>
fizz-buzz |19> => |19>
fizz-buzz |20> => |Buzz>
fizz-buzz |21> => |Fizz>
fizz-buzz |22> => |22>
fizz-buzz |23> => |23>
fizz-buzz |24> => |Fizz>
fizz-buzz |25> => |Buzz>
fizz-buzz |26> => |26>
fizz-buzz |27> => |Fizz>
fizz-buzz |28> => |28>
fizz-buzz |29> => |29>
fizz-buzz |30> => |FizzBuzz>
fizz-buzz |31> => |31>
fizz-buzz |32> => |32>
fizz-buzz |33> => |Fizz>
fizz-buzz |34> => |34>
fizz-buzz |35> => |Buzz>
fizz-buzz |36> => |Fizz>
fizz-buzz |37> => |37>
fizz-buzz |38> => |38>
fizz-buzz |39> => |Fizz>
fizz-buzz |40> => |Buzz>
fizz-buzz |41> => |41>
fizz-buzz |42> => |Fizz>
fizz-buzz |43> => |43>
fizz-buzz |44> => |44>
fizz-buzz |45> => |FizzBuzz>
fizz-buzz |46> => |46>
fizz-buzz |47> => |47>
fizz-buzz |48> => |Fizz>
fizz-buzz |49> => |49>
fizz-buzz |50> => |Buzz>
fizz-buzz |51> => |Fizz>
fizz-buzz |52> => |52>
fizz-buzz |53> => |53>
fizz-buzz |54> => |Fizz>
fizz-buzz |55> => |Buzz>
fizz-buzz |56> => |56>
fizz-buzz |57> => |Fizz>
fizz-buzz |58> => |58>
fizz-buzz |59> => |59>
fizz-buzz |60> => |FizzBuzz>
fizz-buzz |61> => |61>
fizz-buzz |62> => |62>
fizz-buzz |63> => |Fizz>
fizz-buzz |64> => |64>
fizz-buzz |65> => |Buzz>
fizz-buzz |66> => |Fizz>
fizz-buzz |67> => |67>
fizz-buzz |68> => |68>
fizz-buzz |69> => |Fizz>
fizz-buzz |70> => |Buzz>
fizz-buzz |71> => |71>
fizz-buzz |72> => |Fizz>
fizz-buzz |73> => |73>
fizz-buzz |74> => |74>
fizz-buzz |75> => |FizzBuzz>
fizz-buzz |76> => |76>
fizz-buzz |77> => |77>
fizz-buzz |78> => |Fizz>
fizz-buzz |79> => |79>
fizz-buzz |80> => |Buzz>
fizz-buzz |81> => |Fizz>
fizz-buzz |82> => |82>
fizz-buzz |83> => |83>
fizz-buzz |84> => |Fizz>
fizz-buzz |85> => |Buzz>
fizz-buzz |86> => |86>
fizz-buzz |87> => |Fizz>
fizz-buzz |88> => |88>
fizz-buzz |89> => |89>
fizz-buzz |90> => |FizzBuzz>
fizz-buzz |91> => |91>
fizz-buzz |92> => |92>
fizz-buzz |93> => |Fizz>
fizz-buzz |94> => |94>
fizz-buzz |95> => |Buzz>
fizz-buzz |96> => |Fizz>
fizz-buzz |97> => |97>
fizz-buzz |98> => |98>
fizz-buzz |99> => |Fizz>
fizz-buzz |100> => |Buzz>

 |null> => |map>
----------------------------------------
And I guess that is it for this post.

Tuesday, 19 January 2016

new feature: list 2 sp

Just a quick bit of code to improve context.learn(). We can now learn a rule that is a list.

Here is the python:
def list_2_sp(one):
  r = superposition()
  if isinstance(one, list):
    for x in one:                                # what do we want to do if type(x) is not int, float or string?
      if isinstance(x, (int, float)):
        r += ket("number: " + str(x))
      elif isinstance(x, str):
        r += ket(x)
  return r
And a quick demonstration of it in action:
# define some example lists:
list1 = [2,3,5,7,11,13]
list2 = ["cat","dog","horse","rat","mouse","lion","horse"]
list3 = ["a","b",37,2.1828,"a","fish","a",37]

# test learn code:
context.learn("list-of","small primes",list1)
context.learn("list-of","common animals",list2)
context.learn("list-of","test elements",list3)

# see what we have learnt:
context.print_universe()
Outputs (note that repeated elements end up with coeff > 1):
list-of |small primes> => |number: 2> + |number: 3> + |number: 5> + |number: 7> + |number: 11> + |number: 13>
list-of |common animals> => |cat> + |dog> + 2|horse> + |rat> + |mouse> + |lion>
list-of |test elements> => 3|a> + |b> + 2|number: 37> + |number: 2.1828> + |fish>
So, simple enough. Just makes using the python context.learn() a little cleaner. eg, instead of:
context.learn("friends","Fred",ket("Sam") + ket("Frank") + ket("Jim") + ket("Rob"))
we can now do:
context.learn("friends","Fred",["Sam","Frank","Jim","Rob"])
Of course, if you need your kets to have coeffs other than 1, then you need to do it the old way, with an explicit ket(string, value). Though I suspect most of the time we can get away with just using lists.

Monday, 18 January 2016

new operators: times-by, divide-by, plus, minus

So, I've had times[k] for quite a while now, but not the rest. And it was implemented in an ugly, and not perfect, way. Anyway, it is all tidied now, and using the same code we can also do divide, plus and minus. This means we can now do a few things in a neater way.

First up, factorial.
The old way:
fact |0> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
fact |*> #=> arithmetic( |_self>, |*>, fact n-1 |_self>)
The new way:
fact |0> => |1>
fact |*> #=> arithmetic( |_self>, |*>, fact minus[1] |_self>)
Fibonacci the old way:
fib |0> => |0>
fib |1> => |1>
n-1 |*> #=> arithmetic(|_self>,|->,|1>)
n-2 |*> #=> arithmetic(|_self>,|->,|2>)
fib |*> #=> arithmetic( fib n-1 |_self>, |+>, fib n-2 |_self>)
The new way:
fib |0> => |0>
fib |1> => |1>
fib |*> #=> arithmetic( fib minus[1] |_self>, |+>, fib minus[2] |_self>)
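Note how the literal rules (fact |0>, fib |0>, fib |1>) take precedence over the wildcard |*> rules. Roughly, in python (a sketch of the dispatch idea, not the project's code):
# literal rules are checked before the |*> function rule fires:
fact_literal = {0: 1}                # fact |0> => |1>

def fact(n):
    if n in fact_literal:
        return fact_literal[n]
    return n * fact(n - 1)           # fact |*> #=> arithmetic(|_self>, |*>, fact minus[1] |_self>)

fib_literal = {0: 0, 1: 1}           # fib |0> => |0>, fib |1> => |1>

def fib(n):
    if n in fib_literal:
        return fib_literal[n]
    return fib(n - 1) + fib(n - 2)   # fib |*> #=> arithmetic(fib minus[1] |_self>, |+>, fib minus[2] |_self>)

print(fact(5), fib(10))              # 120 55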
And from a long, long time ago we have this aspirational code to implement currency exchange information:
to-USD |currency: USD: _x> => |currency: USD: _x>
to-USD |currency: GBP: _x > => |currency: USD: _x*1.62>

to-GBP |currency: USD: _x> => | currency: GBP: _x*0.62>
to-GBP |currency: GBP: _x> => | currency: GBP: _x >
I say aspirational since I never found a clean way to implement this. I thought I might have had to change the parser and a bunch of other things. Now we can do it simply enough:
to-USD |currency: USD: *> #=> |_self>
to-USD |currency: GBP: *> #=> merge-labels(|currency: USD: > + times-by[1.62] extract-value |_self>)

to-GBP |currency: USD: *> #=> merge-labels(|currency: GBP: > + divide-by[1.62] extract-value |_self>)
to-GBP |currency: GBP: *> #=> |_self>
Again from long ago I had this for temperature conversion:
to-Kelvin |temperature: Kelvin: _x > => |temperature: Kelvin: _x >
to-Celsius |temperature: Kelvin: _x > => |temperature: Celsius: (_x - 273.15) >
to-Fahrenheit |temperature: Kelvin: _x> => |temperature: Fahrenheit: (_x*9/5 - 459.67) >

to-Kelvin |temperature: Celsius: _x > => |temperature: Kelvin: (_x + 273.15) >
to-Celsius |temperature: Celsius: _x> => |temperature: Celsius: _x >
to-Fahrenheit |temperature: Celsius: _x> => |temperature: Fahrenheit: (_x*9/5 + 32) >

to-Kelvin |temperature: Fahrenheit: _x > => |temperature: Kelvin: (_x + 459.67)*5/9 >
to-Celsius |temperature: Fahrenheit: _x > => |temperature: Celsius: (_x - 32)*5/9 >
to-Fahrenheit |temperature: Fahrenheit: _x > => |temperature: Fahrenheit: _x >
Now we can do it like this:
to-Kelvin |temperature: Kelvin: *> #=> |_self>
to-Celsius |temperature: Kelvin: *> #=> merge-labels(|temperature: Celsius: > + minus[273.15] extract-value |_self>)
to-Fahrenheit |temperature: Kelvin: *> #=> merge-labels(|temperature: Fahrenheit: > + minus[459.67] times-by[9/5] extract-value |_self>)

to-Kelvin |temperature: Celsius: *> #=> merge-labels(|temperature: Kelvin: > + plus[273.15] extract-value |_self>)
to-Celsius |temperature: Celsius: *> #=> |_self>
to-Fahrenheit |temperature: Celsius: *> #=> merge-labels(|temperature: Fahrenheit: > + plus[32] times-by[9/5] extract-value |_self>)

to-Kelvin |temperature: Fahrenheit: *> #=> merge-labels(|temperature: Kelvin: > + times-by[5/9] plus[459.67] extract-value |_self>)
to-Celsius |temperature: Fahrenheit: *> #=> merge-labels(|temperature: Celsius: > + times-by[5/9] minus[32] extract-value |_self>)
to-Fahrenheit |temperature: Fahrenheit: *> #=> |_self>
Yeah, a bit ugly I'm afraid. Though when we swap in "|a> _ |b>" for "merge-labels(|a> + |b>)" in the parser, that should make it a bit cleaner.

This is what that would look like:
to-Kelvin |temperature: Kelvin: *> #=> |_self>
to-Celsius |temperature: Kelvin: *> #=> |temperature: Celsius: > _ minus[273.15] extract-value |_self>
to-Fahrenheit |temperature: Kelvin: *> #=> |temperature: Fahrenheit: > _ minus[459.67] times-by[9/5] extract-value |_self>

to-Kelvin |temperature: Celsius: *> #=> |temperature: Kelvin: > _ plus[273.15] extract-value |_self>
to-Celsius |temperature: Celsius: *> #=> |_self>
to-Fahrenheit |temperature: Celsius: *> #=> |temperature: Fahrenheit: > _ plus[32] times-by[9/5] extract-value |_self>

to-Kelvin |temperature: Fahrenheit: *> #=> |temperature: Kelvin: > _ times-by[5/9] plus[459.67] extract-value |_self>
to-Celsius |temperature: Fahrenheit: *> #=> |temperature: Celsius: > _ times-by[5/9] minus[32] extract-value |_self>
to-Fahrenheit |temperature: Fahrenheit: *> #=> |_self>
And if the above looks like too much work, that shouldn't be a problem. It only needs to be written once, and then we can web-load it. eg:
sa: web-load http://semantic-db.org/sw-examples/gbp-usd-exchange-rate.sw
sa: web-load http://semantic-db.org/sw-examples/temperature-conversion.sw
Another reason it looks ugly is the explicit data-type. We could leave that off and it would be a bit cleaner. eg:
to-Kelvin |Kelvin: *> #=> |_self>
to-Celsius |Kelvin: *> #=> |Celsius: > _ minus[273.15] extract-value |_self>
to-Fahrenheit |Kelvin: *> #=> |Fahrenheit: > _ minus[459.67] times-by[9/5] extract-value |_self>

to-Kelvin |Celsius: *> #=> |Kelvin: > _ plus[273.15] extract-value |_self>
to-Celsius |Celsius: *> #=> |_self>
to-Fahrenheit |Celsius: *> #=> |Fahrenheit: > _ plus[32] times-by[9/5] extract-value |_self>

to-Kelvin |Fahrenheit: *> #=> |Kelvin: > _ times-by[5/9] plus[459.67] extract-value |_self>
to-Celsius |Fahrenheit: *> #=> |Celsius: > _ times-by[5/9] minus[32] extract-value |_self>
to-Fahrenheit |Fahrenheit: *> #=> |_self>
And we could abbreviate it even more:
to-K |K: *> #=> |_self>
to-C |K: *> #=> |C: > _ minus[273.15] extract-value |_self>
to-F |K: *> #=> |F: > _ minus[459.67] times-by[9/5] extract-value |_self>

to-K |C: *> #=> |K: > _ plus[273.15] extract-value |_self>
to-C |C: *> #=> |_self>
to-F |C: *> #=> |F: > _ plus[32] times-by[9/5] extract-value |_self>

to-K |F: *> #=> |K: > _ times-by[5/9] plus[459.67] extract-value |_self>
to-C |F: *> #=> |C: > _ times-by[5/9] minus[32] extract-value |_self>
to-F |F: *> #=> |_self>
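For comparison, here is roughly what the same right-to-left operator chaining looks like as plain python function composition. The helpers times_by, plus, minus and compose are made up for illustration, they are not project code:
def times_by(k): return lambda x: x * k
def plus(k):     return lambda x: x + k
def minus(k):    return lambda x: x - k

def compose(*ops):
    # compose(f, g)(x) == f(g(x)), the same right-to-left order as a BKO op sequence
    def chained(x):
        for op in reversed(ops):
            x = op(x)
        return x
    return chained

# to-K |F: *> #=> |K: > _ times-by[5/9] plus[459.67] extract-value |_self>
f_to_k = compose(times_by(5 / 9), plus(459.67))
print(f_to_k(212))   # 373.15 (up to float rounding)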
And I think that is enough for today! Though BTW, the above general scheme can be used for conversions of other units too, say distances, weights, etc.

ket arithmetic

OK, a slightly weird one today. We can actually do some simple arithmetic just using the coeffs of kets. Though they are fixed numbers, rather than parameters, so you can't do all that much interesting with them.

Anyway, some examples:
-- 3.2 + 5.3 + 7
sa: 3.2|x> + 5.3|x> + 7|x>
15.5|x>

-- 3.2 + 5.3 - 7
sa: 3.2|x> + 5.3|x> + -7 |x>
1.5|x>

-- 5 * 11 * 17
sa: 5 11 17 |x>
935|x>

-- 9/5 * 10
sa: 9/5 10 |x>
18|x>

-- 13^9
sa: 13^9 |x>
10604499373|x>

-- 5.2^2 (3 + 7.1 + 9.8)
sa: 5.2^2 (3|x> + 7.1|x> + 9.8|x>)
538.096|x>

-- 2 * 3 * 7^7 (5 + 9.2)
sa: 2 3 7^7 (5|x> + 9.2|x>)
70165863.6|x>
And I guess that is it. It should be obvious from there how to do arbitrary arithmetic using ket coeffs.
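If you want to play with this outside the console, here is a toy python model of the idea (not the project's superposition class): labels carry float coeffs, addition merges coeffs, and scalar multiplication scales them.
from collections import defaultdict

class Sp:
    # toy superposition: labels with float coeffs; not the project's class
    def __init__(self, label=None, coeff=1.0):
        self.d = defaultdict(float)
        if label is not None:
            self.d[label] += coeff

    def __add__(self, other):
        r = Sp()
        for sp in (self, other):
            for label, c in sp.d.items():
                r.d[label] += c
        return r

    def __rmul__(self, c):
        # scalar * Sp scales every coeff
        r = Sp()
        for label, v in self.d.items():
            r.d[label] = c * v
        return r

    def __repr__(self):
        return " + ".join("%s|%s>" % (round(c, 6), label) for label, c in self.d.items())

x = Sp("x")
print(3.2 * x + 5.3 * x + 7 * x)                 # 15.5|x>
print(5.2 ** 2 * (3 * x + 7.1 * x + 9.8 * x))    # 538.096|x>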

Sunday, 17 January 2016

is BKO related to category theory?

I don't know for sure, since I am a long, long way from being a mathematician. But by the wikipedia definition, it just might be the case.

Wikipedia gives this definition for a category and a functor:

A category C consists of the following three mathematical entities:

    A class ob(C), whose elements are called objects;
    A class hom(C), whose elements are called morphisms or maps or arrows. Each morphism f has a source object a and target object b.
    The expression f : a → b, would be verbally stated as "f is a morphism from a to b".
    The expression hom(a, b) — alternatively expressed as homC(a, b), mor(a, b), or C(a, b) — denotes the hom-class of all morphisms from a to b.
    A binary operation ∘, called composition of morphisms, such that for any three objects a, b, and c, we have hom(b, c) × hom(a, b) → hom(a, c). The composition of f : a → b and g : b → c is written as g ∘ f or gf, governed by two axioms:
        Associativity: If f : a → b, g : b → c and h : c → d then h ∘ (g ∘ f) = (h ∘ g) ∘ f, and
        Identity: For every object x, there exists a morphism 1_x : x → x called the identity morphism for x, such that for every morphism f : a → b, we have 1_b ∘ f = f = f ∘ 1_a.

Functors are structure-preserving maps between categories. They can be thought of as morphisms in the category of all (small) categories.

A (covariant) functor F from a category C to a category D, written F : C → D, consists of:

    for each object x in C, an object F(x) in D; and
    for each morphism f : x → y in C, a morphism F(f) : F(x) → F(y),

such that the following two properties hold:

    For every object x in C, F(1_x) = 1_F(x);
    For all morphisms f : x → y and g : y → z, F(g ∘ f) = F(g) ∘ F(f).

BKO plausibly fits this definition, with the mapping:
objects -> kets
morphisms -> operators
morphism composition -> operator composition
category -> sw file
functor -> code that maps one sw file to another, with the same network structure, but different ket and operator names.
noting that our operators naturally satisfy the associativity requirement, and it is easy to create identity morphisms.

So, for example, in BKO:
f |a> => |b>
id-a |a> => |a>

g |b> => |c>
id-b |b> => |b>

h |c> => |d>
id-c |c> => |c>

id |*> #=> |_self>
And then a demonstration of operator composition:
sa: f |a>
|b>

sa: f id-a |a>
|b>

sa: id-b f |a>
|b>

sa: g f |a>
|c>

sa: h g f |a>
|d>
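And just to spell the axioms out, here is a quick python check of the same diagram. A sketch only: the rules become dictionary lookups and ∘ becomes function composition.
f = {"a": "b"}.__getitem__          # f |a> => |b>
g = {"b": "c"}.__getitem__          # g |b> => |c>
h = {"c": "d"}.__getitem__          # h |c> => |d>
identity = lambda x: x              # id |*> #=> |_self>

def compose(p, q):
    return lambda x: p(q(x))        # p after q, matching "p q |x>"

print(compose(h, compose(g, f))("a"))   # d
print(compose(compose(h, g), f)("a"))   # d, so associativity holds on this diagram
print(compose(identity, f)("a"), compose(f, identity)("a"))   # b b, the identity laws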
Perhaps we should not be surprised by this correspondence (provided it is true). Why? Well, the project is trying to demonstrate that BKO is a general representation for knowledge, and that representation has a natural interpretation as manipulating network state. Maths is a subset of human knowledge, so there must be a way to represent it in BKO. The question is, how compact is that representation? I don't know. Consider group theory, and say we want to represent that in BKO. By what we just said, there must be a representation in terms of networks. But how big or small is this network? I suspect it is on the smaller side. My hunch is that abstract things have a smaller network structure than concrete things. I say this because concrete things have all these links to specific examples and their related networks; an abstract maths idea can exist on its own without them. The other reason is the very definition of abstract mathematics: it abstracts the essence from a series of examples with the same structure, and the essence of a thing is smaller than the thing. Though all the in-brain/in-mathematician machinery to manipulate mathematical objects adds to the size of the network!

A thing to wonder is, what is the category theory equivalent of mapping a ket to a superposition, or to a superposition with coeffs other than 1? eg:
f |x> => |a> + |b> + |c> + |d> + |e>
g |y> => 3.7|a> + |b> + 2.2218|c> + 13|e>
Though I don't want to tread too far into real mathematics; I'm not a mathematician and would make a fool of myself.

Update: a couple of comments:
1) standard semantic web is somewhat close to a category too, with the mapping:
objects -> URIs
morphisms -> URIs
category -> RDF file
However, RDF, as far as I know, doesn't have morphism composition or identity morphisms.

2) metaphors and analogies are approximately like functors. Though only approximate, in most cases, since analogies are rarely perfect. You can usually push them too far, to a point where the two systems have different properties.

Update: a demonstration of 3 ways that we have "commuting" operators:
-- trivial, f is essentially the same as g:
f |a> => |b>
f |b> => |c>
g |a> => |b>
g |b> => |c>

-- now test:
sa: g f |a>
|c>

sa: f g |a>
|c>

-- this time, make use of float commutation:
f |a> => 0.71 |b>
f |b> => 0.71 |c>
g |a> => 3.5 |b>
g |b> => 3.5 |c>

-- now test:
sa: g f |a>
2.485|c>

sa: f g |a>
2.485|c>

-- finally, "g f" and "f g" use different pathways, but end at the same spot:
f |a> => |x>
f |y> => |b>
g |a> => |y>
g |x> => |b>

-- now test:
sa: g f |a>
|b>

sa: f g |a>
|b>

Sunday, 20 December 2015

how well do you know X

What does it mean to ask "how well do you know X?" One possible definition: the amount of "everything" you know about a given ket.

Here, let's drop to BKO to try and demonstrate:
-- load some sample data:
sa: load early-us-presidents.sw

-- define the everything operator:
sa: everything-op |*> #=> apply(supported-ops |_self>,|_self>)

-- apply it to all known kets in the sw file:
sa: map[everything-op,everything] rel-kets[*] |>

-- define operators to measure how big "everything" is:
sa: how-many-everything |*> #=> how-many everything |_self>
sa: how-many-everything-2 |*> #=> how-many everything^2 |_self>
sa: how-many-everything-3 |*> #=> how-many everything^3 |_self>

-- show a pretty table:
sa: table[ket,how-many-everything,how-many-everything-2,how-many-everything-3] rel-kets[*] |>
+-----------------------+---------------------+-----------------------+-----------------------+
| ket                   | how-many-everything | how-many-everything-2 | how-many-everything-3 |
+-----------------------+---------------------+-----------------------+-----------------------+
| _list                 | 6                   | 56                    | 8                     |
| Washington            | 12                  | 1                     | 0                     |
| George Washington     | 1                   | 0                     | 0                     |
| Adams                 | 8                   | 1                     | 0                     |
| John Adams            | 1                   | 0                     | 0                     |
| Jefferson             | 12                  | 3                     | 0                     |
| Thomas Jefferson      | 1                   | 0                     | 0                     |
| Madison               | 12                  | 3                     | 0                     |
| James Madison         | 1                   | 0                     | 0                     |
| Monroe                | 12                  | 3                     | 0                     |
| James Monroe          | 1                   | 0                     | 0                     |
| Q Adams               | 8                   | 3                     | 0                     |
| John Quincy Adams     | 1                   | 0                     | 0                     |
| Democratic-Republican | 2                   | 0                     | 0                     |
| *                     | 0                   | 0                     | 0                     |
+-----------------------+---------------------+-----------------------+-----------------------+
Now, another example:
-- load another set of data:
sa: load bots.sw

-- process this data:
sa: everything-op |*> #=> apply(supported-ops |_self>,|_self>)
sa: map[everything-op,everything] rel-kets[*] |>
sa: how-many-everything |*> #=> how-many everything |_self>
sa: how-many-everything-2 |*> #=> how-many everything^2 |_self>
sa: how-many-everything-3 |*> #=> how-many everything^3 |_self>

-- show a pretty table:
sa: table[ket,how-many-everything,how-many-everything-2,how-many-everything-3] rel-kets[*] |>
+---------+---------------------+-----------------------+-----------------------+
| ket     | how-many-everything | how-many-everything-2 | how-many-everything-3 |
+---------+---------------------+-----------------------+-----------------------+
| Bella   | 19                  | 0                     | 0                     |
| Emma    | 19                  | 0                     | 0                     |
| Madison | 22                  | 38                    | 0                     |
| *       | 0                   | 0                     | 0                     |
+---------+---------------------+-----------------------+-----------------------+
So, not a bad start at defining "how well do you know X?".
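To make the table columns concrete: how-many-everything counts the one-hop neighbourhood of a ket, how-many-everything-2 the two-hop neighbourhood, and so on. A rough python picture, with toy data and hypothetical helpers (and note the real console merges duplicates into coeffs, which this sketch does not):
# everything(node) plays the role of "apply(supported-ops |x>, |x>)":
graph = {
    "Fred": ["Sam", "Frank", "Jim", "Rob"],
    "Sam": ["Fred"],
}

def everything(node):
    return graph.get(node, [])

def how_many_everything(node, hops=1):
    layer = [node]
    for _ in range(hops):
        layer = [m for n in layer for m in everything(n)]
    return len(layer)

print(how_many_everything("Fred", 1))   # 4
print(how_many_everything("Fred", 2))   # 1, since only Sam points anywhere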

Now, let's try to automate this a bit, by making use of the fact that loading sw files can do computation.

So first, let's create this sw file (NB: we must not define a context since we want it to be applied to the current context, instead of its own):
$ cat sw-examples/how-well-do-you-know.sw
-- process the data:
  everything-op |*> #=> apply(supported-ops |_self>,|_self>)
  |null> => map[everything-op,everything] rel-kets[*] |>

  how-many-everything |*> #=> how-many everything |_self>
  how-many-everything-2 |*> #=> how-many everything^2 |_self>
  how-many-everything-3 |*> #=> how-many everything^3 |_self>

-- show a pretty table:
  |null> => table[ket,how-many-everything,how-many-everything-2,how-many-everything-3] rel-kets[*] |>
NB: the "|null> => ..." is needed because only valid learn rules are processed during sw file loading. So we just use |null> as a dummy variable. If you leave them out, then our map and table lines would not be processed.

Now, put it to use:
$ ./the_semantic_db_console.py
Welcome!

sa: load fred-sam-friends.sw
sa: load how-well-do-you-know.sw
+------+---------------------+-----------------------+-----------------------+
| ket  | how-many-everything | how-many-everything-2 | how-many-everything-3 |
+------+---------------------+-----------------------+-----------------------+
| Fred | 8                   | 0                     | 0                     |
| Sam  | 7                   | 0                     | 0                     |
| *    | 0                   | 0                     | 0                     |
| null | 0                   | 0                     | 0                     |
+------+---------------------+-----------------------+-----------------------+
And another example:
sa: load breakfast-menu.sw
sa: load how-well-do-you-know.sw
+-----------------------------+---------------------+-----------------------+-----------------------+
| ket                         | how-many-everything | how-many-everything-2 | how-many-everything-3 |
+-----------------------------+---------------------+-----------------------+-----------------------+
| breakfast                   | 5                   | 19                    | 0                     |
| Belgian Waffles             | 4                   | 0                     | 0                     |
| Strawberry Belgian Waffles  | 4                   | 0                     | 0                     |
| Berry-Berry Belgian Waffles | 4                   | 0                     | 0                     |
| French Toast                | 4                   | 0                     | 0                     |
| Homestyle Breakfast         | 4                   | 0                     | 0                     |
| waffles                     | 1                   | 0                     | 0                     |
| belgian                     | 1                   | 0                     | 0                     |
| strawberries                | 2                   | 0                     | 0                     |
| berries                     | 2                   | 0                     | 0                     |
| french                      | 1                   | 0                     | 0                     |
| toast                       | 1                   | 0                     | 0                     |
| breakfast                   | 1                   | 0                     | 0                     |
| egg                         | 1                   | 0                     | 0                     |
| eggs                        | 1                   | 0                     | 0                     |
| bacon                       | 1                   | 0                     | 0                     |
| sausage                     | 1                   | 0                     | 0                     |
| two                         | 1                   | 0                     | 0                     |
| cream                       | 1                   | 0                     | 0                     |
| *                           | 0                   | 0                     | 0                     |
| null                        | 0                   | 0                     | 0                     |
+-----------------------------+---------------------+-----------------------+-----------------------+
I think that is kind of cool!

visualizing edit distance

Last time we implemented edit distance in BKO, and gave a couple of simple examples. Today I just want to visualize that.

Recall from last time:
-- Levenshtein distance:
sa: insert[g,6] substitute[i,e,4] substitute[s,k,0] |kitten>
|sitting>

-- LCS distance:
sa: insert[g,6] insert[i,4] delete[e,4] insert[s,0] delete[k,0] |kitten>
|sitting>
We need a little bit of work to convert this into a pretty graph. First, in the console:
  context visualizing edit distance
  op1 |kitten> => substitute[s,k,0] |_self>
  op2 substitute[s,k,0] |kitten> => substitute[i,e,4] |_self>
  op3 substitute[i,e,4] substitute[s,k,0] |kitten> => insert[g,6] |_self>

  op4 |kitten> => delete[k,0] |_self>
  op5 op4 |kitten> => insert[s,0] |_self>
  op6 op5 op4 |kitten> => delete[e,4] |_self>
  op7 op6 op5 op4 |kitten> => insert[i,4] |_self>
  op8 op7 op6 op5 op4 |kitten> => insert[g,6] |_self>
Now, see what we have:
sa: dump
----------------------------------------
|context> => |context: visualizing edit distance>

op1 |kitten> => |sitten>
op4 |kitten> => |itten>

op2 |sitten> => |sittin>
op6 |sitten> => |sittn>

op3 |sittin> => |sitting>
op8 |sittin> => |sitting>

op5 |itten> => |sitten>

op7 |sittn> => |sittin>
----------------------------------------
sa: save visualizing-edit-distance--raw.sw
sa: q

$ ./sw2dot-v2.py sw-examples/visualizing-edit-distance--raw.sw

$ cat graph-examples/visualizing-edit-distance--raw.dot
digraph g {
"context" -> "visualizing edit distance"
"kitten" -> "sitten" [label="op1",arrowhead=normal]
"kitten" -> "itten" [label="op4",arrowhead=normal]
"sitten" -> "sittin" [label="op2",arrowhead=normal]
"sitten" -> "sittn" [label="op6",arrowhead=normal]
"sittin" -> "sitting" [label="op3",arrowhead=normal]
"sittin" -> "sitting" [label="op8",arrowhead=normal]
"itten" -> "sitten" [label="op5",arrowhead=normal]
"sittn" -> "sittin" [label="op7",arrowhead=normal]
}
Now, we need to tweak the operator labels to this:
$ cat graph-examples/visualizing-edit-distance.dot
digraph g {
"context" -> "visualizing edit distance"
"kitten" -> "sitten" [label="substitute[s,k,0]",arrowhead=normal]
"kitten" -> "itten" [label="delete[k,0]",arrowhead=normal]
"sitten" -> "sittin" [label="substitute[i,e,4]",arrowhead=normal]
"sitten" -> "sittn" [label="delete[e,4]",arrowhead=normal]
"sittin" -> "sitting" [label="insert[g,6]",arrowhead=normal]
"sittin" -> "sitting" [label="insert[g,6]",arrowhead=normal]
"itten" -> "sitten" [label="insert[s,0]",arrowhead=normal]
"sittn" -> "sittin" [label="insert[i,4]",arrowhead=normal]
}
And graph it with graphviz. (The rendered image from the original post is not reproduced here.)
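For reference, a standard graphviz invocation (the output filename here is my choice) would be something like:
$ dot -Tpng graph-examples/visualizing-edit-distance.dot -o visualizing-edit-distance.png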
The result is a very pretty picture, and it neatly shows how operators step through "brain space". And unlike in our previous graphs, these are not literal operators, they are function operators. It also clearly shows how kets map to nodes, and operators to links between nodes.

And a brief comment. I wonder if we want to write some code to auto-map op-sequences to pretty graphs? How easy would it be, and would there be enough use for it? Don't know yet.

Thursday, 17 December 2015

edit distance in BKO

I've been meaning to do this one for a long time now! Finally got around to it today. Not very hard. So, start with the wikipedia page. Now, we need to implement these 3 operators (a python sketch follows the list):
    Insertion of a single symbol. If a = uv, then inserting the symbol x produces uxv. This can also be denoted ε→x, using ε to denote the empty string.
    Deletion of a single symbol changes uxv to uv (x→ε).
    Substitution of a single symbol x for a symbol y ≠ x changes uxv to uyv (x→y). 
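The post doesn't show the python behind insert[], delete[] and substitute[], but a minimal sketch (my guess at it, not the project's code) looks like:
def insert(x, i, s):
    # uv -> uxv: insert symbol x at position i
    return s[:i] + x + s[i:]

def delete(x, i, s):
    # uxv -> uv: delete symbol x at position i (checked)
    assert s[i] == x
    return s[:i] + s[i + 1:]

def substitute(x, y, i, s):
    # uyv -> uxv: replace symbol y at position i with x (checked)
    assert s[i] == y
    return s[:i] + x + s[i + 1:]

# the Levenshtein path from the example below, innermost operator applied first:
print(insert("g", 6, substitute("i", "e", 4, substitute("s", "k", 0, "kitten"))))   # sitting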
A few lines of python like that, and now we can reproduce the example on the wikipage:
The Levenshtein distance between "kitten" and "sitting" is 3. A minimal edit script that transforms the former into the latter is:

    kitten → sitten (substitution of "s" for "k")
    sitten → sittin (substitution of "i" for "e")
    sittin → sitting (insertion of "g" at the end).

LCS distance (insertions and deletions only) gives a different distance and minimal edit script:

    delete k at 0
    insert s at 0
    delete e at 4
    insert i at 4
    insert g at 6

for a total cost/distance of 5 operations.
And now reproduce that example in BKO:
-- Levenshtein distance:
sa: insert[g,6] substitute[i,e,4] substitute[s,k,0] |kitten>
|sitting>

-- LCS distance:
sa: insert[g,6] insert[i,4] delete[e,4] insert[s,0] delete[k,0] |kitten>
|sitting>
And now a couple of comments. First, it seems BKO operators are a natural way to represent these text operators. Second, the number of operators from the starting superposition (in the above example, kets) to the final one seems to form a natural kind of distance metric. This is general, not just for the above 3 operators, but for all operators in our project. For example, when you say something is a deep thought, in BKO that would correspond to a very long op sequence from starting facts. And related to that, the brain must be doing something interesting, because for a computer, finding the exact op sequence that maps a starting point to a desired destination is vastly expensive big-O-wise. How the brain seems to do this efficiently, I don't know. I mean, it is an extreme example, but consider the 100 page proof of Fermat's last theorem. If we convert that to a big op sequence, naive search to find it would be impossible. I guess humans must divide the problem into smaller problems, which require smaller op sequences to find, but still it is rather amazing!

And oh yeah, I think eventually we will find a natural way to encode mathematics using the BKO structure. For example a proof is just an operator (itself constructed from an op sequence) from one superposition (in this case starting axioms and knowledge) to a desired end superposition. And math objects such as metrics or groups correspond to particular network structures. And possibly homomorphisms are maps between objects with similar network structures.

Another thing I should mention is that I'm not a big fan of always using edit distance to measure the similarity of two strings. Sure, it seems to work great for spell check, which so far I can't reproduce well using my simm. But for longer text I prefer my similarity metric. I guess the best link I can give is to this blog post, and this code.

BTW, here was my attempt to use similar[op] for spell check:
sa: load common-English-words.sw
sa: word2ngram-op |*> #=> letter-ngrams[1,2,3] |_self>
sa: map[word2ngram-op,word2ngram] list-of |common English words>
sa: map[word2ngram-op,word2ngram] |elefant>
sa: table[word,coeff] select[1,30] 100 self-similar[word2ngram] |elefant>
+-----------+--------+
| word      | coeff  |
+-----------+--------+
| elefant   | 100.0  |
| elegant   | 66.667 |
| mantelet  | 57.937 |
| elephant  | 57.143 |
| relevant  | 57.143 |
| fantail   | 55.556 |
| infante   | 55.556 |
| teleran   | 55.556 |
| leant     | 52.778 |
| inelegant | 51.389 |
| infantile | 51.389 |
| antler    | 51.111 |
| cantle    | 51.111 |
| levant    | 51.111 |
| mantel    | 51.111 |
| mantle    | 51.111 |
| anele     | 50.0   |
| antefix   | 50.0   |
| defiant   | 50.0   |
| element   | 50.0   |
| fantasm   | 50.0   |
| fantast   | 50.0   |
| fantasy   | 50.0   |
| fantom    | 50.0   |
| gantlet   | 50.0   |
| infant    | 50.0   |
| infanta   | 50.0   |
| pantile   | 50.0   |
| relent    | 50.0   |
| reliant   | 50.0   |
+-----------+--------+
So it sort of works. But we have things in there like "mantelet" that have a high match because of "ele" and "ant". This comes from the 3-grams, and the fact that we have no real ordering info, so "ele" + "ant" is just as similar as "ant" + "ele". I guess an improvement would be to not run similar[op] on letter ngrams directly, but to first convert letters into how they are pronounced, and then run similar[op] on that. I don't know the easiest way to do that.

BTW, here is how I made the "common-English-words.sw" file:
-- download data from here: http://icon.shef.ac.uk/Moby/mwords.html
-- then run this script:
$ ./list-to-sp.sh "list-of |common English words> => " 74550com.mon > common-English-words.sw
Update: ahh... one definition of intelligence is the speed with which an agent finds the op sequence that maps from the current state (superposition) to the desired end state (superposition). Especially for the case of non-trivial op sequences. There are other possible definitions. eg, the speed with which an agent finds a good internal representation (ie a well constructed network) for a concept.

BTW, a comment on "well constructed network". This is probably the difference between rote learning a concept (which corresponds to a poorly constructed network), and having a deep understanding of a concept.