Tuesday, 25 October 2016

learning how to spell

In this post we are going to learn how to spell using HTM style high order sequences. This may look trivial, eg compared to how you would do it in python, but it is a nice proof of concept of how the brain might do it. Or at least a mathematical abstraction of that. There are two stages, the learning stage, and the recall stage. And there are two components to the learning stage. Encoding all the symbols we will use, and then learning the sequences of those symbols. In our case 69 symbols, 74,550 words, and hence 74,550 sequences. The words are from the Moby project. I guess the key point of this post is that without the concept of mini-columns (and our random-column[k] operator), we could not represent distinct sequences of our symbols. Another point is this is just a proof of concept. In practice we should be able to carry over the idea to other types of sequences, not just individual letters. I'll probably try that later, eg maybe sequences of words in text.

Here is what the encode stage looks like, where we map symbols to random SDR's with 10 bits on, out of a possible 65536 bits. I chose 65536 since it works best if our encode SDR's do not have any overlap. eg, my first attempt I used only 2048 total bits, but that had issues. But thanks to our sparse representation, this change was essentially free.
full |range> => range(|1>,|65536>)
encode |-> => pick[10] full |range>
encode |a> => pick[10] full |range>
encode |b> => pick[10] full |range>
encode |l> => pick[10] full |range>
encode |e> => pick[10] full |range>
encode |c> => pick[10] full |range>
encode |o> => pick[10] full |range>
encode |u> => pick[10] full |range>
encode |s> => pick[10] full |range>
encode |d> => pick[10] full |range>
encode |m> => pick[10] full |range>
encode |i> => pick[10] full |range>
encode |g> => pick[10] full |range>
encode |y> => pick[10] full |range>
encode |n> => pick[10] full |range>
...
And note we have single symbols inside our kets, but they could be anything.

Next we have the learn sequence stage, eg "frog":
-- frog
-- f, r, o, g
first-letter |frog> => random-column[10] encode |f>
parent-word |node 35839: *> => |frog>
pattern |node 35839: 0> => first-letter |frog>
then |node 35839: 0> => random-column[10] encode |r>

pattern |node 35839: 1> => then |node 35839: 0>
then |node 35839: 1> => random-column[10] encode |o>

pattern |node 35839: 2> => then |node 35839: 1>
then |node 35839: 2> => random-column[10] encode |g>

pattern |node 35839: 3> => then |node 35839: 2>
then |node 35839: 3> #=> append-column[10] encode |end of sequence>
This is what that looks like after learning, first in the standard superposition representation:
sa: dump |frog>
first-letter |frog> => |48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>

sa: dump starts-with |node 35839: >
parent-word |node 35839: *> => |frog>

pattern |node 35839: 0> => |48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>
then |node 35839: 0> => |56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>

pattern |node 35839: 1> => |56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>
then |node 35839: 1> => |41179: 2> + |12201: 9> + |63912: 7> + |33066: 1> + |47072: 1> + |17108: 4> + |48988: 0> + |9205: 2> + |34935: 2> + |513: 2>

pattern |node 35839: 2> => |41179: 2> + |12201: 9> + |63912: 7> + |33066: 1> + |47072: 1> + |17108: 4> + |48988: 0> + |9205: 2> + |34935: 2> + |513: 2>
then |node 35839: 2> => |55496: 8> + |57594: 7> + |60795: 5> + |54740: 4> + |40157: 2> + |2940: 7> + |51329: 1> + |24597: 7> + |15515: 9> + |47272: 8>

pattern |node 35839: 3> => |55496: 8> + |57594: 7> + |60795: 5> + |54740: 4> + |40157: 2> + |2940: 7> + |51329: 1> + |24597: 7> + |15515: 9> + |47272: 8>
then |node 35839: 3> #=> append-column[10] encode |end of sequence>
And now in the display representation:
sa: display |frog>
  frog
  supported-ops: op: first-letter
   first-letter: 48197: 6, 53532: 0, 62671: 2, 14968: 2, 62260: 8, 16180: 1, 15225: 0, 19418: 4, 24524: 7, 13432: 6

sa: display starts-with |node 35839: >
  node 35839: *
  supported-ops: op: parent-word
    parent-word: frog

  node 35839: 0
  supported-ops: op: pattern, op: then
        pattern: 48197: 6, 53532: 0, 62671: 2, 14968: 2, 62260: 8, 16180: 1, 15225: 0, 19418: 4, 24524: 7, 13432: 6
           then: 56997: 6, 38159: 3, 55020: 5, 10359: 6, 29215: 7, 56571: 6, 55139: 9, 27229: 5, 57329: 7, 56577: 4

  node 35839: 1
  supported-ops: op: pattern, op: then
        pattern: 56997: 6, 38159: 3, 55020: 5, 10359: 6, 29215: 7, 56571: 6, 55139: 9, 27229: 5, 57329: 7, 56577: 4
           then: 41179: 2, 12201: 9, 63912: 7, 33066: 1, 47072: 1, 17108: 4, 48988: 0, 9205: 2, 34935: 2, 513: 2

  node 35839: 2
  supported-ops: op: pattern, op: then
        pattern: 41179: 2, 12201: 9, 63912: 7, 33066: 1, 47072: 1, 17108: 4, 48988: 0, 9205: 2, 34935: 2, 513: 2
           then: 55496: 8, 57594: 7, 60795: 5, 54740: 4, 40157: 2, 2940: 7, 51329: 1, 24597: 7, 15515: 9, 47272: 8

  node 35839: 3
  supported-ops: op: pattern, op: then
        pattern: 55496: 8, 57594: 7, 60795: 5, 54740: 4, 40157: 2, 2940: 7, 51329: 1, 24597: 7, 15515: 9, 47272: 8
           then: # append-column[10] encode |end of sequence>
So, what on Earth does this all mean? Let's try to unwrap it by first considering the first-letter of our sample sequence "frog", the letter f. Here is what the encoded symbol "f" looks like, followed by the mini-column version that is specific to the "frog" sequence, and then the "fish" sequence:
sa: encode |f>
|48197> + |53532> + |62671> + |14968> + |62260> + |16180> + |15225> + |19418> + |24524> + |13432>

sa: first-letter |frog>
|48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>

sa: first-letter |fish>
|48197: 4> + |53532: 0> + |62671: 7> + |14968: 3> + |62260: 4> + |16180: 2> + |15225: 5> + |19418: 3> + |24524: 3> + |13432: 3>
Perhaps one way to understand the |x: y> is as co-ordinates of synapses. The encode step provides the x co-ordinate, and the mini-column cell the y co-ordinate, where the x co-ord is the same for all instances of "f", but the y co-ords are specific to particular sequences. It is this property that allows us to encode an entire dictionary worth of words, composed of just a handful of symbols.

Once we have the start superposition (otherwise known as a SDR since the coefficients of our kets are all equal to 1) for our sequence, we then use if-then machines to define the rest of the sequence.

The "f" superposition is followed by the "r" superposition:
pattern |node 35839: 0> => |48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>
then |node 35839: 0> => |56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>
The "r" superposition followed by the "o" superposition:
pattern |node 35839: 1> => |56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>
then |node 35839: 1> => |41179: 2> + |12201: 9> + |63912: 7> + |33066: 1> + |47072: 1> + |17108: 4> + |48988: 0> + |9205: 2> + |34935: 2> + |513: 2>
And so on.The next thing to note is we can invert our superpositions back to their original symbols using this operator:
name-pattern |*> #=> clean select[1,1] similar-input[encode] extract-category pattern |_self>
where the "pattern" operator maps from node space to pattern space, "extract-category" is the inverse of our "random-column[k]" operator, "similar-input[encode]" is essentially the inverse of the "encode" operator, "select[1,1]" selects the first element in the superposition, and "clean" sets the coefficient of all kets to 1. Let's unwrap it:
sa: pattern |node 35839: 0>
|48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>

sa: extract-category pattern |node 35839: 0>
|48197> + |53532> + |62671> + |14968> + |62260> + |16180> + |15225> + |19418> + |24524> + |13432>

sa: similar-input[encode] extract-category pattern |node 35839: 0>
1.0|f>

sa: select[1,1] similar-input[encode] extract-category pattern |node 35839: 0>
1.0|f>

sa: clean select[1,1] similar-input[encode] extract-category pattern |node 35839: 0>
|f>
And here is an example:
sa: name-pattern |node 35839: 0>
|f>

sa: name-pattern |node 35839: 1>
|r>

sa: name-pattern |node 35839: 2>
|o>

sa: name-pattern |node 35839: 3>
|g>
Indeed, the name operator is a key piece we need to define our spell operator. The other piece is the next operator, which given the current pattern, returns the next pattern:
next (*) #=> then clean select[1,1] similar-input[pattern] |_self>
Though due to an incomplete parser (this project is still very much a work in progress!) we can only implement this operator:
next-pattern |*> #=> then clean select[1,1] similar-input[pattern] pattern |_self>
where "pattern" maps from node space to pattern space, "similar-input[pattern]" is approximately the inverse of "pattern", "select[1,1]" and "clean" tidy up our results, and then the "then" operator maps to the next pattern. Let's unwrap it:
sa: pattern |node 35839: 0>
|48197: 6> + |53532: 0> + |62671: 2> + |14968: 2> + |62260: 8> + |16180: 1> + |15225: 0> + |19418: 4> + |24524: 7> + |13432: 6>

sa: similar-input[pattern] pattern |node 35839: 0>
1.0|node 35839: 0> + 0.6|node 35370: 0> + 0.5|node 11806: 3> + 0.5|node 18883: 7> + 0.5|node 20401: 8> + 0.5|node 20844: 5> + 0.5|node 26112: 8> + 0.5|node 29209: 4> + 0.5|node 33566: 0> + 0.5|node 33931: 0> + 0.5|node 35463: 0> + ...

sa: select[1,1] similar-input[pattern] pattern |node 35839: 0>
1.0|node 35839: 0>

sa: clean select[1,1] similar-input[pattern] pattern |node 35839: 0>
|node 35839: 0>

sa: then clean select[1,1] similar-input[pattern] pattern |node 35839: 0>
|56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>
And here is an example:
sa: next-pattern |node 35839: 0>
|56997: 6> + |38159: 3> + |55020: 5> + |10359: 6> + |29215: 7> + |56571: 6> + |55139: 9> + |27229: 5> + |57329: 7> + |56577: 4>

sa: next-pattern |node 35839: 1>
|41179: 2> + |12201: 9> + |63912: 7> + |33066: 1> + |47072: 1> + |17108: 4> + |48988: 0> + |9205: 2> + |34935: 2> + |513: 2>

sa: next-pattern |node 35839: 2>
|55496: 8> + |57594: 7> + |60795: 5> + |54740: 4> + |40157: 2> + |2940: 7> + |51329: 1> + |24597: 7> + |15515: 9> + |47272: 8>

sa: next-pattern |node 35839: 3>
|31379: 0> + |31379: 1> + |31379: 2> + |31379: 3> + |31379: 4> + |31379: 5> + |31379: 6> + |31379: 7> + |31379: 8> + |31379: 9> + |46188: 0> + |46188: 1> + |46188: 2> + |46188: 3> + |46188: 4> + |46188: 5> + |46188: 6> + |46188: 7> + |46188: 8> + |46188: 9> + |9864: 0> + |9864: 1> + |9864: 2> + |9864: 3> + |9864: 4> + |9864: 5> + |9864: 6> + |9864: 7> + |9864: 8> + |9864: 9> + |49649: 0> + |49649: 1> + |49649: 2> + |49649: 3> + |49649: 4> + |49649: 5> + |49649: 6> + |49649: 7> + |49649: 8> + |49649: 9> + |43145: 0> + |43145: 1> + |43145: 2> + |43145: 3> + |43145: 4> + |43145: 5> + |43145: 6> + |43145: 7> + |43145: 8> + |43145: 9> + |45289: 0> + |45289: 1> + |45289: 2> + |45289: 3> + |45289: 4> + |45289: 5> + |45289: 6> + |45289: 7> + |45289: 8> + |45289: 9> + |38722: 0> + |38722: 1> + |38722: 2> + |38722: 3> + |38722: 4> + |38722: 5> + |38722: 6> + |38722: 7> + |38722: 8> + |38722: 9> + |43012: 0> + |43012: 1> + |43012: 2> + |43012: 3> + |43012: 4> + |43012: 5> + |43012: 6> + |43012: 7> + |43012: 8> + |43012: 9> + |1949: 0> + |1949: 1> + |1949: 2> + |1949: 3> + |1949: 4> + |1949: 5> + |1949: 6> + |1949: 7> + |1949: 8> + |1949: 9> + |31083: 0> + |31083: 1> + |31083: 2> + |31083: 3> + |31083: 4> + |31083: 5> + |31083: 6> + |31083: 7> + |31083: 8> + |31083: 9>
Noting that the final pattern is the end-of-sequence pattern "append-column[10] encode |end of sequence>", used to signify to our code the end of a sequence. I don't know the biological equivalent, but it seems plausible to me that there is one. But even if not, no big drama, I'm already abstracted away from the underlying biology. Next up, here is the code for our spell operator, though largely pseudo-code at the moment:
  next (*) #=> then clean select[1,1] similar-input[pattern] |_self>
  name (*) #=> clean select[1,1] similar-input[encode] extract-category |_self>

  not |yes> => |no>
  not |no> => |yes>

  spell (*) #=>
    if not do-you-know first-letter |_self>:
      return |_self>
    current |node> => first-letter |_self>
    while name current |node> /= |end of sequence>:
      print name current |node>
      current |node> => next current |node>
    return |end of sequence>
And here is that translated to the underlying python:
# one is a ket
def spell(one,context):
  start = one.apply_op(context,"first-letter")
  if len(start) == 0:                  # we don't know the first letter, so return the input ket
    return one
  print("spell word:",one)
  context.learn("current","node",start)
  name = context.recall("current","node",True).apply_fn(extract_category).similar_input(context,"encode").select_range(1,1).apply_sigmoid(clean)
  while name.the_label() != "end of sequence":
    print(name)
    context.learn("current","node",ket("node").apply_op(context,"current").similar_input(context,"pattern").select_range(1,1).apply_sigmoid(clean).apply_op(context,"then"))
    name = context.recall("current","node",True).apply_fn(extract_category).similar_input(context,"encode").select_range(1,1).apply_sigmoid(clean)
  return name
And finally, let's actually use this code!
sa: spell |frog>
spell word: |frog>
|f>
|r>
|o>
|g>
|end of sequence>

sa: spell |fish>
spell word: |fish>
|f>
|i>
|s>
|h>
|end of sequence>

sa: spell |rabbit>
spell word: |rabbit>
|r>
|a>
|b>
|b>
|i>
|t>
|end of sequence>
Next up, let's see what we can do with this data. I warn you, the answer is quite a lot. First, a basic look at the number of learn rules in our data:
-- the number of encode learn rules, ie, the number of symbols:
sa: how-many rel-kets[encode]
|number: 69>

-- the number of "first-letter" operators, ie, the number of words:
sa: how-many rel-kets[first-letter]
|number: 74550>

-- the number of "pattern" operators, ie, the number of well, patterns:
sa: how-many rel-kets[pattern]
|number: 656132>

-- the number of nodes, ie the number of if-then machines, ie, roughly the number of neurons in our system:
sa: how-many starts-with |node >
|number: 730682>
Next, let's produce a bar-chart of the lengths of our sequences/words:
sa: bar-chart[50] plus[1] ket-sort extract-value clean similar-input[then] append-column[10] encode |end of sequence>
----------
1  :
2  : |
3  : ||||||
4  : ||||||||||||||||||||
5  : ||||||||||||||||||||||||||||||
6  : ||||||||||||||||||||||||||||||||||||||||||
7  : ||||||||||||||||||||||||||||||||||||||||||||||||
8  : ||||||||||||||||||||||||||||||||||||||||||||||||||
9  : ||||||||||||||||||||||||||||||||||||||||||||||||
10 : ||||||||||||||||||||||||||||||||||||||||
11 : ||||||||||||||||||||||||||||||
12 : ||||||||||||||||||||||
13 : |||||||||||||||
14 : ||||||||||
15 : |||||||
16 : |||||
17 : |||
18 : ||
19 : |
20 : |
21 :
22 :
23 :
24 :
25 :
26 :
27 :
28 :
29 :
30 :
31 :
32 :
33 :
34 :
37 :
39 :
42 :
45 :
53 :
----------
Next, given a symbol predict what comes next:
next-symbol-after |*> #=> bar-chart[50] ket-sort similar-input[encode] extract-category then drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>

-- what usually follows "A":
sa: next-symbol-after |A>
----------
1               :
2               :
                :
-               :
.               : |||||||||
/               :
a               : |
A               :
b               : ||||
B               :
c               : ||||
C               :
d               : ||||||
D               :
e               : |||
E               :
end of sequence : ||||||||||||||||||||||||||||||||||||||||||||||||||
f               : ||||
F               :
g               : |||
G               :
h               : |
H               :
i               : ||
I               :
j               :
k               : |
l               : |||||||||||||||||||||||
L               :
m               : |||||||||||
M               : |
n               : |||||||||||||||||||||
N               :
O               :
o               :
p               : ||||
P               :
q               :
Q               :
r               : ||||||||||||||||||||||
R               :
s               : |||||||||
S               :
t               : ||||||
T               : |
u               : ||||||||||
v               : |||
x               :
y               : |
z               : |
----------

-- what usually follows "a":
sa: next-symbol-after |a>
----------
                :
'               :
-               :
.               :
a               :
b               : ||
c               : |||||
d               : |||
e               :
end of sequence : ||||||||||||||||||||||||||||||||||||||||||||||||||
f               :
g               : ||
h               :
i               : ||
I               :
j               :
k               : |
l               : ||||||||||
m               : |||
n               : ||||||||||||||
o               :
p               : ||
q               :
r               : |||||||||||
R               :
s               : |||||
t               : |||||||||||
u               : |
v               : |
w               :
x               :
y               : |
z               :
----------


-- what usually follows "k":
sa: next-symbol-after |k>
----------
                : |
'               :
-               :
.               :
a               : |
b               :
c               :
d               :
e               : ||||
end of sequence : ||||||||||||||||||||||||||||||||||||||||||||||||||
f               :
g               :
h               :
H               :
i               : ||
I               :
j               :
k               :
l               :
m               :
n               :
o               :
p               :
r               :
R               :
s               :
t               :
u               :
v               :
V               :
w               :
W               :
y               :
----------
And so on. Though the graphs are somewhat pretty, the result is actually a bit boring. These correspond to standard Markov: "given a char predict the next char". Much more interesting would be given a sequence of characters, then predict what is next. So I tried to do this, but it was slow and didn't work quite right. I'll try sometime again in the future. But even this is somewhat boring, since firefox and google already do this, and I suspect it could be done with only a few lines of python. But I suppose that is missing the point. The point is to learn a large number of sequences in a proposed brain-like way, as a proof of concept, and then hopefully be useful sometime in the future.

Next up, what are the usual positions of a symbol in a word:
sa: symbol-positions-for |*> #=> bar-chart[50] ket-sort extract-value clean drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>

-- the bar chart of the positions for "B":
sa: symbol-positions-for |B>
----------
0  : ||||||||||||||||||||||||||||||||||||||||||||||||||
1  :
2  : |
3  :
4  : |
5  :
6  : |
7  :
8  : |
9  :
10 :
11 :
12 :
13 :
14 :
15 :
17 :
----------

-- the bar chart of the positions for "b":
sa: symbol-positions-for |b>
----------
0  : ||||||||||||||||||||||||||||||||||||||||||||||||||
1  : ||||||
2  : |||||||||||||||||||||||
3  : ||||||||||||||||||||
4  : ||||||||||||||
5  : |||||||||||||
6  : ||||||||
7  : |||||||
8  : |||||
9  : |||
10 : |||
11 : |
12 :
13 :
14 :
15 :
16 :
17 :
18 :
19 :
20 :
22 :
23 :
28 :
----------

sa: symbol-positions-for |k>
----------
0  : ||||||||||||||||||
1  : |||||
2  : ||||||||||||
3  : ||||||||||||||||||||||||||||||||||||||||||||||||||
4  : ||||||||||||||||||||||||||||||||
5  : |||||||||
6  : ||||||||||||
7  : ||||||||||||
8  : |||||||||||||
9  : ||||||||
10 : |||||
11 : ||||
12 : |||
13 : |
14 : |
15 :
16 :
17 :
18 :
19 :
20 :
21 :
24 :
----------

sa: symbol-positions-for |x>
----------
0  : ||||
1  : ||||||||||||||||||||||||||||||||||||||||||||||||||
2  : |||||||||||||||||||||||||||||||||||
3  : |||||||||||||
4  : ||||||||
5  : |||||||||||
6  : ||||||||||
7  : |||||||
8  : |||||
9  : ||||
10 : |||
11 : ||
12 : ||
13 : |
14 : |
15 :
16 :
17 :
18 :
19 :
20 :
25 :
----------
And so on for other symbols. Next we introduce a stripped down follow-sequence operator. This one will follow a sequence from any starting point, not just the first letter, though in spirit it is identical to the above spell operator. Indeed, it would have been cleaner for me to have defined spell in terms of follow-sequence:
  next (*) #=> then clean select[1,1] similar-input[pattern] |_self>
  name (*) #=> clean select[1,1] similar-input[encode] extract-category |_self>

  follow-sequence (*) #=>
    current |node> => |_self>
    while name current |node> /= |end of sequence>:
      print name current |node>
      current |node> => next current |node>
    return |end of sequence>

  spell |*> #=> follow-sequence first-letter |_self>
In our first example, just jump into any random sequence and follow it:
sa: follow-a-random-sequence |*> #=> follow-sequence pattern pick-elt rel-kets[pattern] |>
sa: follow-a-random-sequence
|e>
|n>
|c>
|h>
| >
|k>
|n>
|i>
|f>
|e>
|end of sequence>

-- find the parent word:
sa: parent-word |node 69596: 2>
|trench knife>

-- another example:
sa: follow-a-random-sequence
|e>
|i>
|g>
|h>
|end of sequence>

sa: parent-word |node 43118: 3>
|inveigh>
Next, spell a random word:
sa: spell-a-random-word |*> #=> follow-sequence first-letter pick-elt rel-kets[first-letter] |>
sa: spell-a-random-word
|Z>
|o>
|r>
|o>
|a>
|s>
|t>
|r>
|i>
|a>
|n>
|i>
|s>
|m>
0|end of sequence>
Jump into a random sequence and start at the given symbol:
sa: follow-sequence-starting-at |*> #=> follow-sequence pattern pick-elt drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>
sa: follow-sequence-starting-at |c>
|c>
|a>
|r>
|d>
|end of sequence>

sa: parent-word |node 42039: 6>
|index card>

sa: follow-sequence-starting-at |c>
|c>
|u>
|m>
|b>
|e>
|n>
|t>
|end of sequence>

sa: parent-word |node 27871: 2>
|decumbent>
Next, spell a random word that starts with a given symbol:
sa: spell-a-random-word-that-starts-with |*> #=> follow-sequence first-letter pick-elt drop-below[0.09] similar-input[first-letter] append-column[10] encode |_self>
sa: spell-a-random-word-that-starts-with |X>
|X>
|e>
|n>
|o>
|p>
|h>
|a>
|n>
|e>
|s>
|end of sequence>

sa: spell-a-random-word-that-starts-with |f>
|f>
|o>
|r>
|g>
|a>
|v>
|e>
|end of sequence>

sa: spell-a-random-word-that-starts-with |f>
|f>
|a>
|r>
|i>
|n>
|a>
|end of sequence>
Next, spell a random word that contains the given symbol:
sa: spell-a-random-word-that-contains |*> #=> follow-sequence pattern merge-labels (extract-category pick-elt drop-below[0.09] similar-input[pattern] append-column[10] encode |_self> + |: 0>)
sa: spell-a-random-word-that-contains |x>
|b>
|a>
|u>
|x>
|i>
|t>
|e>
|end of sequence>

sa: spell-a-random-word-that-contains |z>
|L>
|e>
|i>
|b>
|n>
|i>
|z>
|end of sequence>
Now, I think it might be instructive to see all our operator definitions at once:
  name-pattern |*> #=> clean select[1,1] similar-input[encode] extract-category pattern |_self>
  next-pattern |*> #=> then clean select[1,1] similar-input[pattern] pattern |_self>

  name (*) #=> clean select[1,1] similar-input[encode] extract-category |_self>
  next (*) #=> then clean select[1,1] similar-input[pattern] |_self>

  not |yes> => |no>
  not |no> => |yes>

  spell (*) #=>
    if not do-you-know first-letter |_self>:
      return |_self>
    current |node> => first-letter |_self>
    while name current |node> /= |end of sequence>:
      print name current |node>
      current |node> => next current |node>
    return |end of sequence>

  sequence-lengths |*> #=> bar-chart[50] plus[1] ket-sort extract-value clean similar-input[then] append-column[10] encode |end of sequence>
  next-symbol-after |*> #=> bar-chart[50] ket-sort similar-input[encode] extract-category then drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>
  symbol-positions-for |*> #=> bar-chart[50] ket-sort extract-value clean drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>

  follow-sequence (*) #=>
    current |node> => |_self>
    while name current |node> /= |end of sequence>:
      print name current |node>
      current |node> => next current |node>
    return |end of sequence>

  spell |*> #=> follow-sequence first-letter |_self>

  spell-a-random-word |*> #=> follow-sequence first-letter pick-elt rel-kets[first-letter] |>
  follow-a-random-sequence |*> #=> follow-sequence pattern pick-elt rel-kets[pattern] |>

  spell-a-random-word-that-starts-with |*> #=> follow-sequence first-letter pick-elt drop-below[0.09] similar-input[first-letter] append-column[10] encode |_self>
  follow-sequence-starting-at |*> #=> follow-sequence pattern pick-elt drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>
  spell-a-random-word-that-contains |*> #=> follow-sequence pattern merge-labels (extract-category pick-elt drop-below[0.09] similar-input[pattern] append-column[10] encode |_self> + |: 0>)
So there we have it. We successfully learned and recalled a whole dictionary of words using some HTM inspired ideas. In the process this became the largest and most complex use of my language/notation yet. Though I'm still waiting to find an application of my notation to something really interesting. For example, I'm hoping that with ideas from if-then machines, sequences, and chunked sequences we might be able to encode grammatical structures. But that is a long way off yet, but might just be possible. Another goal is to implement something similar to word2vec and cortical.io, that would map words to superpositions, with the property that semantically similar words have similar superpositions.

In the next post I plan to extend the above to learning and recalling chunked sequences. In particular, some digits of pi and the alphabet.

Update: we can also count letter frequencies. I guess not super interesting, but may as well add it. Here are the needed operators:
  count-first-letter-frequency |*> #=> pop-float rewrite( how-many drop-below[0.09] similar-input[first-letter] append-column[10] encode |_self>, |number>, |_self> )
  count-letter-frequency |*> #=> pop-float rewrite( how-many drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>, |number>, |_self> )
And now apply them:
-- first letter frequency for the upper case alphabet:
sa: bar-chart[50] count-first-letter-frequency split |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z>
----------
A : ||||||||||||||||||||||||||||||||||||||||||||||
B : |||||||||||||||||||||||||||||||||||||||||||||
C : ||||||||||||||||||||||||||||||||||||||||||||||||||
D : ||||||||||||||||||||||||
E : |||||||||||||||||||
F : ||||||||||||||||||
G : ||||||||||||||||||||||||||||
H : ||||||||||||||||||||||||||||
I : ||||||||||||||
J : |||||||||||||
K : ||||||||||||||||||
L : |||||||||||||||||||||||||||||
M : |||||||||||||||||||||||||||||||||||||||
N : ||||||||||||||||||
O : |||||||||||
P : ||||||||||||||||||||||||||||||||
Q : ||
R : ||||||||||||||||||
S : |||||||||||||||||||||||||||||||||||||||||||
T : ||||||||||||||||||||||||
U : |||||
V : ||||||||||
W : ||||||||||||
X :
Y : |||
Z : |||
----------

-- first letter frequency for the lower case alphabet:
sa: bar-chart[50] count-first-letter-frequency split |a b c d e f g h i j k l m n o p q r s t u v w x y z>
----------
a : |||||||||||||||||||||||||||||
b : ||||||||||||||||||||||||||
c : ||||||||||||||||||||||||||||||||||||||||||||||
d : |||||||||||||||||||||||||
e : ||||||||||||||||||
f : |||||||||||||||||||||
g : ||||||||||||||||
h : |||||||||||||||||||
i : |||||||||||||||||
j : |||
k : ||||
l : ||||||||||||||||
m : ||||||||||||||||||||||
n : ||||||||
o : ||||||||||
p : ||||||||||||||||||||||||||||||||||||
q : ||
r : ||||||||||||||||||
s : ||||||||||||||||||||||||||||||||||||||||||||||||||
t : |||||||||||||||||||||||
u : ||||||||
v : |||||||
w : |||||||||||
x :
y : |
z : |
----------

-- letter frequency for the uppercase alphabet:
sa: bar-chart[50] count-letter-frequency split |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z>
----------
A : |||||||||||||||||||||||||||||||||||||||||||||
B : ||||||||||||||||||||||||||||||||||||||||
C : ||||||||||||||||||||||||||||||||||||||||||||||||||
D : ||||||||||||||||||||||||
E : ||||||||||||||||||
F : ||||||||||||||||||
G : ||||||||||||||||||||||||||
H : |||||||||||||||||||||||||
I : ||||||||||||||||||||||||||
J : ||||||||||||
K : |||||||||||||||
L : ||||||||||||||||||||||||||
M : ||||||||||||||||||||||||||||||||||||
N : ||||||||||||||||
O : |||||||||||
P : ||||||||||||||||||||||||||||||||
Q : ||
R : ||||||||||||||||||||
S : ||||||||||||||||||||||||||||||||||||||||||||
T : ||||||||||||||||||||||
U : ||||
V : ||||||||||||
W : ||||||||||||
X : |
Y : |||
Z : ||
----------

-- letter frequency for the lowercase alphabet:
sa: bar-chart[50] count-letter-frequency split |a b c d e f g h i j k l m n o p q r s t u v w x y z>
----------
a : |||||||||||||||||||||||||||||||||||||||||
b : ||||||||
c : ||||||||||||||||||||
d : ||||||||||||||
e : ||||||||||||||||||||||||||||||||||||||||||||||||||
f : ||||||
g : ||||||||||
h : |||||||||||||
i : ||||||||||||||||||||||||||||||||||||
j :
k : ||||
l : |||||||||||||||||||||||||
m : |||||||||||||
n : |||||||||||||||||||||||||||||||
o : |||||||||||||||||||||||||||||||||
p : |||||||||||||
q :
r : ||||||||||||||||||||||||||||||||||
s : ||||||||||||||||||||||||||
t : |||||||||||||||||||||||||||||||
u : ||||||||||||||||
v : ||||
w : ||||
x : |
y : ||||||||
z : |
----------
And let's finish with the code all at once:
  count-first-letter-frequency |*> #=> pop-float rewrite( how-many drop-below[0.09] similar-input[first-letter] append-column[10] encode |_self>, |number>, |_self> )
  count-letter-frequency |*> #=> pop-float rewrite( how-many drop-below[0.09] similar-input[pattern] append-column[10] encode |_self>, |number>, |_self> )

  bar-chart[50] count-first-letter-frequency split |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z>
  bar-chart[50] count-first-letter-frequency split |a b c d e f g h i j k l m n o p q r s t u v w x y z>

  bar-chart[50] count-letter-frequency split |A B C D E F G H I J K L M N O P Q R S T U V W X Y Z>
  bar-chart[50] count-letter-frequency split |a b c d e f g h i j k l m n o p q r s t u v w x y z>
That's it for this update. Chunked sequences are coming up soon.