Tuesday, 25 August 2015

new function: hash

Just a simple one today. Some new Python that maps a superposition to a superposition of the hashes of its kets.

Here is the python:
import hashlib

# ket-hash[size] |some ket>
#
# one is a ket
def ket_hash(one,size):
  logger.debug("ket-hash one: " + str(one))
  logger.debug("ket-hash size: " + str(size))
  try:
    size = int(size)
  except (TypeError, ValueError):
    return ket("",0)
  # keep only the last `size` hex digits of the md5 digest of the ket label
  our_hash = hashlib.md5(one.label.encode('utf-8')).hexdigest()[-size:]
  return ket(our_hash,one.value)
And some simple examples:
sa: hash[6] split |a b c d e f>
|772661> + |31578f> + |8b5f33> + |e091ad> + |41ec32> + |29cce7>

sa: hash[10] split |u v w x y z>
|4f4f21d34c> + |4664205d2a> + |e77c0c5d68> + |4e155c67a6> + |22904f345d> + |b808451dd7>

-- slightly more interesting example:
sa: load fred-sam-friends.sw
sa: dump
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>

friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
----------------------------------------

sa: hash-friends |Fred> => hash[4] friends |_self>
sa: hash-friends |Sam> => hash[4] friends |_self>

sa: dump
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
hash-friends |Fred> => |4f62> + |72ec> + |f3e0> + |315a> + |19b1> + |06ec> + |4a79> + |5cd8>

friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
hash-friends |Sam> => |5cd8> + |93a3> + |4a79> + |4f62> + |75f6> + |3e4b> + |47dd>
----------------------------------------

sa: common[friends] split |Fred Sam>
|Jack> + |Emma> + |Charlie>

sa: common[hash-friends] split |Fred Sam>
|4f62> + |4a79> + |5cd8>
I guess the point is that sometimes the exact ket label doesn't matter. It is the network structure that matters. I guess we could also use it as a compression scheme of sorts. Say your data has kets with very long text labels. We could, in theory, compress those down using hashes, provided the structure is the only thing of interest.
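
To make the "structure survives hashing" point concrete outside the console, here is a minimal standalone sketch. It is not the project's code: plain Python lists stand in for superpositions, and the names label_hash, fred and sam are made up for illustration. Note that with only 4 hex digits, two different labels can collide, so on larger data the hashed intersection may contain extra entries.

```python
import hashlib

def label_hash(label, size):
    """Return the last `size` hex digits of the MD5 digest of `label`."""
    return hashlib.md5(label.encode("utf-8")).hexdigest()[-size:]

# Friend lists copied from the fred-sam-friends.sw example above.
fred = ["Jack", "Harry", "Ed", "Mary", "Rob", "Patrick", "Emma", "Charlie"]
sam = ["Charlie", "George", "Emma", "Jack", "Rober", "Frank", "Julie"]

# Intersection on the original labels (order follows Fred's list).
common_labels = [x for x in fred if x in sam]

# Intersection on the hashed labels.
fred_hashed = [label_hash(x, 4) for x in fred]
sam_hashed = [label_hash(x, 4) for x in sam]
common_hashed = [h for h in fred_hashed if h in sam_hashed]

print(common_labels)
print(common_hashed)
```

Every hash of a common label necessarily shows up in the hashed intersection. The reverse only holds when there are no collisions, which is the price of the compression.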

Monday, 24 August 2015

visualizing superpositions

Superpositions can sometimes be somewhat abstract. But today I want to show that it is quite easy to visualize them. Though I had to write a little python, and dig up an old gnuplot script.

Here is the new python (not super happy with the name, but it will do for now):
import hashlib

def hash_data(one,size):
  logger.debug("hash-data one: " + str(one))
  logger.debug("hash-data size: " + str(size))
  try:
    size = int(size)
  except (TypeError, ValueError):
    return ket("",0)
  # one bucket for every possible hash of `size` hex digits
  array = [0] * (16**size)
  for x in one:
    our_hash = hashlib.md5(x.label.encode('utf-8')).hexdigest()[-size:]
    k = int(our_hash,16)
    array[k] += x.value
  logger.info("hash-data writing to tmp-sp.dat")
  # one value per line, ready for the gnuplot script
  with open('tmp-sp.dat','w') as f:
    for k in array:
      f.write(str(k) + '\n')
  return ket("hash-data")
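
For readers without the project's ket and superposition classes, the bucketing step can be sketched on its own. This is only an illustration under assumptions: (label, coeff) pairs stand in for a superposition, nothing is written to disk, and hash_histogram is a made-up name.

```python
import hashlib

def hash_histogram(pairs, size):
    """Map (label, coeff) pairs onto 16**size buckets, indexed by the
    last `size` hex digits of the MD5 digest of each label."""
    buckets = [0.0] * (16 ** size)
    for label, coeff in pairs:
        digest = hashlib.md5(label.encode("utf-8")).hexdigest()[-size:]
        # labels with the same truncated hash share a bucket, so their coeffs add
        buckets[int(digest, 16)] += coeff
    return buckets

# Toy input: the label "a" appears twice, so its coefficients accumulate.
pairs = [("a", 1.0), ("b", 2.5), ("a", 0.5)]
buckets = hash_histogram(pairs, 2)

# Print only the non-empty buckets as (index, value).
print([(k, v) for k, v in enumerate(buckets) if v != 0])
```

Each non-zero bucket becomes one spike in the plot, and its height is the summed coefficient of every ket that hashed to that position.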
Now, I have an example in mind that would be good to visualize. Recall:
sa: load improved-imdb.sw
sa: table[actor,coeff] common[actors] select[1,6] self-similar[actors] |movie: Star Trek: The Motion Picture (1979)>
+-------------------+-------+
| actor             | coeff |
+-------------------+-------+
| James Doohan      | 0.109 |
| DeForest Kelley   | 0.109 |
| Walter (I) Koenig | 0.109 |
| Leonard Nimoy     | 0.109 |
| William Shatner   | 0.109 |
| George Takei      | 0.109 |
| Nichelle Nichols  | 0.109 |
+-------------------+-------+
Now, in the console:
sa: load improved-imdb.sw
sa: |result> => self-similar[actors] |movie: Star Trek: The Motion Picture (1979)>

sa: hash-data[4] |movie: Star Trek: The Motion Picture (1979)>
sa: hash-data[4] "" |result>
sa: hash-data[4] select[1,6] "" |result>
sa: hash-data[4] common[actors] select[1,6] "" |result>
After each of these we run this script ($ ./make-image.sh tmp-sp.dat), which gives us one graph per step:
Anyway, I think that is cool. And it is approaching what I imagine a brain would look like.

BTW, I should mention: the spikes in the first three graphs correspond to movies, and the spikes in the last graph correspond to the original series Star Trek actors.

Update: one more step to another superposition:
-- find all the movies the 7 original series Star Trek actors starred in:
sa: hash-data[4] movies common[actors] select[1,6] "" |result>
Now, out of interest, how many movies was that?
sa: how-many movies common[actors] select[1,6] "" |result>
|number: 262>
What were the top 30 of these?
sa: table[movie,coeff] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+---------------------------------------+--------+
| movie                                 | coeff  |
+---------------------------------------+--------+
| Road Trek 2011 (2012)                 | 76.562 |
| Star Trek Adventure (1991)            | 76.562 |
| The Search for Spock (1984)           | 76.562 |
| The Voyage Home (1986)                | 76.562 |
| The Final Frontier (1989)             | 76.562 |
| The Undiscovered Country (1991)       | 76.562 |
| The Motion Picture (1979)             | 76.562 |
| The Wrath of Khan (1982)              | 76.562 |
| Trekkies (1997)                       | 76.562 |
| To Be Takei (2014)                    | 54.688 |
| Generations (1994)                    | 32.812 |
| Bug Buster (1998)                     | 21.875 |
| Loaded Weapon 1 (1993)                | 21.875 |
| Backyard Blockbusters (2012)          | 21.875 |
| FedCon XXI (2012)                     | 21.875 |
| The Captains (2011)                   | 21.875 |
| Unbelievable!!!!! (2014)              | 21.875 |
| Coneheads (1993)                      | 21.875 |
| The 6th People's Choice Awards (1980) | 21.875 |
| 36 Hours (1965)                       | 10.938 |
| Actors in War (2005)                  | 10.938 |
| Amore! (1993)                         | 10.938 |
| Bus Riley's Back in Town (1965)       | 10.938 |
| Double Trouble (1992/I)               | 10.938 |
| Jigsaw (1968)                         | 10.938 |
| Man in the Wilderness (1971)          | 10.938 |
| New York Skyride (1994)               | 10.938 |
| One of Our Spies Is Missing (1966)    | 10.938 |
| Pretty Maids All in a Row (1971)      | 10.938 |
| River of Stone (1994)                 | 10.938 |
+---------------------------------------+--------+
And what does this look like?
Filter down to the top 9 of these movies:
sa: table[movie,coeff] select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+---------------------------------+--------+
| movie                           | coeff  |
+---------------------------------+--------+
| Road Trek 2011 (2012)           | 76.562 |
| Star Trek Adventure (1991)      | 76.562 |
| The Search for Spock (1984)     | 76.562 |
| The Voyage Home (1986)          | 76.562 |
| The Final Frontier (1989)       | 76.562 |
| The Undiscovered Country (1991) | 76.562 |
| The Motion Picture (1979)       | 76.562 |
| The Wrath of Khan (1982)        | 76.562 |
| Trekkies (1997)                 | 76.562 |
+---------------------------------+--------+
And who were the actors in the top 9 of these movies?
sa: table[actor,coeff] coeff-sort actors select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+--------------------------+---------+
| actor                    | coeff   |
+--------------------------+---------+
| James Doohan             | 689.062 |
| DeForest Kelley          | 689.062 |
| Walter (I) Koenig        | 689.062 |
| Leonard Nimoy            | 689.062 |
| William Shatner          | 689.062 |
| George Takei             | 689.062 |
| Nichelle Nichols         | 689.062 |
| Grace Lee Whitney        | 382.812 |
| Mark Lenard              | 306.25  |
| Teresa E. Victor         | 229.688 |
| Majel Barrett            | 229.688 |
| Catherine (I) Hicks      | 153.125 |
| Harve Bennett            | 153.125 |
| Merritt Butrick          | 153.125 |
| Gary Faga                | 153.125 |
| Stephen Liska            | 153.125 |
| Robin (I) Curtis         | 153.125 |
| Michael Berryman         | 153.125 |
| Brock Peters             | 153.125 |
| John Schuck              | 153.125 |
| Michael (I) Snyder       | 153.125 |
| Judy Levitt              | 153.125 |
| Todd (I) Bryant          | 153.125 |
| David (I) Warner         | 153.125 |
| Michael (I) Dorn         | 153.125 |
| Tom Morga                | 153.125 |
| Richard (III) Arnold     | 153.125 |
| James T. Kirk            | 153.125 |
| Christopher (I) Flynn    | 76.562  |
| Malcolm McDowell         | 76.562  |
| Patrick (I) Stewart      | 76.562  |
| Gene Roddenberry         | 76.562  |
| Phillip R. Allen         | 76.562  |
| Steve Blalock            | 76.562  |
| David Cadiente           | 76.562  |
| Charles (I) Correll      | 76.562  |
| Bob K. Cummings          | 76.562  |
| Joe W. Davis             | 76.562  |
| Miguel (I) Ferrer        | 76.562  |
| Conroy Gedeon            | 76.562  |
| Robert Hooks             | 76.562  |
| Al (II) Jones            | 76.562  |
| John Larroquette         | 76.562  |
| Christopher (I) Lloyd    | 76.562  |
| Stephen (I) Manley       | 76.562  |
| Eric Mansker             | 76.562  |
| Mario Marcelino          | 76.562  |
| Scott McGinnis           | 76.562  |
| Allan (I) Miller         | 76.562  |
| Phil (I) Morris          | 76.562  |
| Danny Nero               | 76.562  |
| Dennis (I) Ott           | 76.562  |
| Vadia Potenza            | 76.562  |
| Branscombe Richmond      | 76.562  |
| Doug Shanklin            | 76.562  |
| James Sikking            | 76.562  |
| Paul (II) Sorensen       | 76.562  |
| Carl Steven              | 76.562  |
| Frank Welker             | 76.562  |
| Philip Weyland           | 76.562  |
| Judith (I) Anderson      | 76.562  |
| Jessica Biscardi         | 76.562  |
| Katherine Blum           | 76.562  |
| Judi M. Durand           | 76.562  |
| Claudia Lowndes          | 76.562  |
| Jeanne Mori              | 76.562  |
| Nanci Rogers             | 76.562  |
| Kimberly L. Ryusaki      | 76.562  |
| Cathie Shirriff          | 76.562  |
| Rebecca Soladay          | 76.562  |
| Sharon Thomas Cain       | 76.562  |
| Joseph Adamson           | 76.562  |
| Vijay Amritraj           | 76.562  |
| Mike Brislane            | 76.562  |
| Scott DeVenney           | 76.562  |
| Tony (I) Edwards         | 76.562  |
| David Ellenstein         | 76.562  |
| Robert Ellenstein        | 76.562  |
| Thaddeus Golas           | 76.562  |
| Richard Harder           | 76.562  |
| Alex Henteloff           | 76.562  |
| Greg Karas               | 76.562  |
| Joe Knowland             | 76.562  |
| Joe (I) Lando            | 76.562  |
| Everett (I) Lee          | 76.562  |
| Jeff (I) Lester          | 76.562  |
| Jeffrey (I) Martin       | 76.562  |
| James Menges             | 76.562  |
| John (I) Miranda         | 76.562  |
| Tom Mustin               | 76.562  |
| Joseph Naradzay          | 76.562  |
| Marty Pistone            | 76.562  |
| Nick Ramus               | 76.562  |
| Phil Rubenstein          | 76.562  |
| Bob Sarlatte             | 76.562  |
| Raymond Singer           | 76.562  |
| Newell (II) Tarrant      | 76.562  |
| Kirk R. Thatcher         | 76.562  |
| Mike Timoney             | 76.562  |
| Donald W. Zautcke        | 76.562  |
| Monique DeSart           | 76.562  |
| Madge Sinclair           | 76.562  |
| Eve (I) Smith            | 76.562  |
| Viola Kates Stimpson     | 76.562  |
| Jane Wiedlin             | 76.562  |
| Jane (I) Wyatt           | 76.562  |
| Charles (I) Cooper       | 76.562  |
| Gene Cross               | 76.562  |
| Rex (I) Holman           | 76.562  |
| Laurence Luckinbill      | 76.562  |
| George (I) Murdock       | 76.562  |
| Bill (I) Quinn           | 76.562  |
| Carey Scott              | 76.562  |
| Jonathan (I) Simpson     | 76.562  |
| Mike (I) Smithson        | 76.562  |
| Steve Susskind           | 76.562  |
| Cynthia Blaise           | 76.562  |
| Cynthia Gouw             | 76.562  |
| Beverly Hart             | 76.562  |
| Melanie Shatner          | 76.562  |
| Spice Williams-Crosby    | 76.562  |
| Rene Auberjonois         | 76.562  |
| John (II) Beck           | 76.562  |
| John (III) Bloom         | 76.562  |
| Jim (I) Boeke            | 76.562  |
| Michael Bofshever        | 76.562  |
| Carlos Cestero           | 76.562  |
| Barron Christian         | 76.562  |
| Edward Clements          | 76.562  |
| BJ (I) Davis             | 76.562  |
| Douglas (I) Dunning      | 76.562  |
| Robert (I) Easton        | 76.562  |
| Doug Engalla             | 76.562  |
| Trent Christopher Ganino | 76.562  |
| Darryl Henriques         | 76.562  |
| Matthias Hues            | 76.562  |
| Boris Lee Krutonog       | 76.562  |
| James Mapes              | 76.562  |
| Alan (II) Marcus         | 76.562  |
| David Orange             | 76.562  |
| Christopher (I) Plummer  | 76.562  |
| Brett (I) Porter         | 76.562  |
| Douglas (I) Price        | 76.562  |
| Jeremy (I) Roberts       | 76.562  |
| Paul Rossilli            | 76.562  |
| Leon Russom              | 76.562  |
| Clifford Shegog          | 76.562  |
| William Morgan Sheppard  | 76.562  |
| Christian Slater         | 76.562  |
| Kurtwood Smith           | 76.562  |
| Eric A. Stillwell        | 76.562  |
| Angelo Tiffe             | 76.562  |
| J.D. Walters             | 76.562  |
| Kim Cattrall             | 76.562  |
| Shakti Chen              | 76.562  |
| Rosanna DeSoto           | 76.562  |
| Iman (I)                 | 76.562  |
| Katie (I) Johnston       | 76.562  |
| Jimmie Booth             | 76.562  |
| Ralph Brannen            | 76.562  |
| Roger Aaron Brown        | 76.562  |
| Ralph Byers              | 76.562  |
| Stephen (I) Collins      | 76.562  |
| Vern Dietsche            | 76.562  |
| Christopher Doohan       | 76.562  |
| Montgomery Doohan        | 76.562  |
| Dennis (I) Fischer       | 76.562  |
| Joshua Gallegos          | 76.562  |
| David Gautreaux          | 76.562  |
| David Gerrold            | 76.562  |
| John (I) Gowans          | 76.562  |
| William (I) Guest        | 76.562  |
| Leslie C. Howard         | 76.562  |
| Howard Itzkowitz         | 76.562  |
| Junero Jennings          | 76.562  |
| Jon Rashad Kamal         | 76.562  |
| Joel (I) Kramer          | 76.562  |
| Donald J. Long           | 76.562  |
| Bill (I) McIntosh        | 76.562  |
| Dave Moordigian          | 76.562  |
| Tony (I) Rocco           | 76.562  |
| Michael Rougas           | 76.562  |
| Joel Schultz             | 76.562  |
| Franklyn Seales          | 76.562  |
| Norman (I) Stuart        | 76.562  |
| Craig (VII) Thomas       | 76.562  |
| Billy Van Zandt          | 76.562  |
| Paul (III) Weber         | 76.562  |
| Scott (II) Whitney       | 76.562  |
| Michele Ameen Billy      | 76.562  |
| Celeste Cartier          | 76.562  |
| Lisa Chess               | 76.562  |
| Paula Crist              | 76.562  |
| Cassandra (I) Foster     | 76.562  |
| Edna Glover              | 76.562  |
| Sharon Hesky             | 76.562  |
| Sayra Hummel             | 76.562  |
| Persis Khambatta         | 76.562  |
| Marcy Lafferty           | 76.562  |
| Iva Lane                 | 76.562  |
| Jeri McBride             | 76.562  |
| Barbara Minster          | 76.562  |
| Ve Neill                 | 76.562  |
| Terrence (I) O'Connor    | 76.562  |
| Susan (I) O'Sullivan     | 76.562  |
| Louise Stange-Wahl       | 76.562  |
| Bjo Trimble              | 76.562  |
| Momo Yashima             | 76.562  |
| Steve (I) Bond           | 76.562  |
| Brett Baxter Clark       | 76.562  |
| Tim Culbertson           | 76.562  |
| Ike Eisenmann            | 76.562  |
| John (II) Gibson         | 76.562  |
| Nicholas Guest           | 76.562  |
| James Horner             | 76.562  |
| Paul (I) Kent            | 76.562  |
| Dennis Landry            | 76.562  |
| Cristian Letelier        | 76.562  |
| Joel Marstan             | 76.562  |
| Jeff (II) McBride        | 76.562  |
| Roger Menache            | 76.562  |
| Ricardo Montalban        | 76.562  |
| David Ruprecht           | 76.562  |
| Judson Scott             | 76.562  |
| Kevin Rodney Sullivan    | 76.562  |
| Russell Takaki           | 76.562  |
| Deney Terrio             | 76.562  |
| John (I) Vargas          | 76.562  |
| Paul (I) Winfield        | 76.562  |
| John (I) Winston         | 76.562  |
| Kirstie Alley            | 76.562  |
| Laura (I) Banks          | 76.562  |
| Bibi Besch               | 76.562  |
| Dianne (I) Harper        | 76.562  |
| Marcy Vosburgh           | 76.562  |
| Buzz Aldrin              | 76.562  |
| G.Z. Allen               | 76.562  |
| Robert (XII) Allen       | 76.562  |
| Thomas Anitzberger       | 76.562  |
| Michael Armendariz       | 76.562  |
| Thomas Bax               | 76.562  |
| Robert (I) Beltran       | 76.562  |
| Craig Berthiaume         | 76.562  |
| Jared Bird               | 76.562  |
| Robert Boudrow           | 76.562  |
| Denis (I) Bourguignon    | 76.562  |
| Richard (II) Bowen       | 76.562  |
| Brannon Braga            | 76.562  |
| LeVar Burton             | 76.562  |
| Miguel Carreon           | 76.562  |
| Richard Clabaugh         | 76.562  |
| Bruce (III) Clarke       | 76.562  |
| Thomas Clegg             | 76.562  |
| William Coble            | 76.562  |
| Rick Corley              | 76.562  |
| Justin Reid Cutietta     | 76.562  |
| Frank (I) D'Amico        | 76.562  |
| John de Lancie           | 76.562  |
| Brian Dellis             | 76.562  |
| Daren Dochterman         | 76.562  |
| Rey Duran                | 76.562  |
| Ron Duran                | 76.562  |
| Chris (I) Fleming        | 76.562  |
| Jonathan Frakes          | 76.562  |
| Daryl Frazetti           | 76.562  |
| Dennis Friday II         | 76.562  |
| Ross Gabrick             | 76.562  |
| L.D. Gardner             | 76.562  |
| Travis Gates             | 76.562  |
| Michael (III) Gay        | 76.562  |
| Adam Geiss               | 76.562  |
| David Greenstein         | 76.562  |
| Armando Paul Guillen     | 76.562  |
| Peter Haberkorn          | 76.562  |
| Dennis Hanon             | 76.562  |
| Steve (III) Hardy        | 76.562  |
| Scott (III) Harper       | 76.562  |
| Randall Hawthorne        | 76.562  |
| Steve (I) Head           | 76.562  |
| Edward Herndon           | 76.562  |
| Matthew Herra            | 76.562  |
| John Hurles              | 76.562  |
| Devin Irwin              | 76.562  |
| Edgar Jauregui           | 76.562  |
| Richard Koerner          | 76.562  |
| David (I) Koontz         | 76.562  |
| Stephen (I) Koontz       | 76.562  |
| Rich Kronfeld            | 76.562  |
| Gabriel Kerner           | 76.562  |
| Erik (I) Larson          | 76.562  |
| David (I) Livingston     | 76.562  |
| Gary (I) Lockwood        | 76.562  |
| Robert (IV) Lopez        | 76.562  |
| Stanley Lozowsky         | 76.562  |
| Adam Madden              | 76.562  |
| Logan Madden             | 76.562  |
| Geoffrey Mandel          | 76.562  |
| Douglas Marcks           | 76.562  |
| Jason (II) Mathews       | 76.562  |
| Robert Duncan McNeill    | 76.562  |
| Steve Menaugh            | 76.562  |
| Carl (I) Meyers          | 76.562  |
| Tim (I) Meyers           | 76.562  |
| Jason (I) Munoz          | 76.562  |
| Phil Murre               | 76.562  |
| Salvador Nogueda         | 76.562  |
| Robert (I) O'Reilly      | 76.562  |
| Marc Okrand              | 76.562  |
| Rick (I) Overton         | 76.562  |
| Harminder Pal            | 76.562  |
| John Paladin             | 76.562  |
| Ric Parish               | 76.562  |
| Mark Payton              | 76.562  |
| Brian (I) Phelps         | 76.562  |
| Ethan (I) Phillips       | 76.562  |
| Thomas (I) Phillips      | 76.562  |
| Adam (I) Philpott        | 76.562  |
| Robert Picardo           | 76.562  |
| Daniel (I) Pilkington    | 76.562  |
| James Pollnow            | 76.562  |
| Glen Proechel            | 76.562  |
| Michael Raffeo           | 76.562  |
| Russell (I) Ray          | 76.562  |
| Patrick Rimington        | 76.562  |
| Jon (I) Ross             | 76.562  |
| Paul Rudeen              | 76.562  |
| Tim (I) Russ             | 76.562  |
| Robert (X) Russell       | 76.562  |
| Timothy (IV) Scott       | 76.562  |
| Douglas Shannen          | 76.562  |
| Daniel (I) Shea          | 76.562  |
| David (IV) Silverman     | 76.562  |
| Jason Speltz             | 76.562  |
| Brent Spiner             | 76.562  |
| Tom (I) Stewart          | 76.562  |
| Rocky Stinitis           | 76.562  |
| Mark (II) Thompson       | 76.562  |
| Dennis Thuringer         | 76.562  |
| Barron Toler             | 76.562  |
| Kenneth Traft            | 76.562  |
| Fred Travalena           | 76.562  |
| J. Trusk                 | 76.562  |
| Alois C. Tschamjsl       | 76.562  |
| Karl Van Der Wyk         | 76.562  |
| Matt Weinhold            | 76.562  |
| Jonathan (I) West        | 76.562  |
| Michael (I) Westmore     | 76.562  |
| Wil Wheaton              | 76.562  |
| Travis (I) Williams      | 76.562  |
| Wayne Wills              | 76.562  |
| Barbara (II) Adams       | 76.562  |
| Teresa Bailie            | 76.562  |
| Holly Barbour            | 76.562  |
| Morgan Barbour           | 76.562  |
| Roberta Barnhart         | 76.562  |
| Jennifer Bax             | 76.562  |
| Esther Becerra           | 76.562  |
| Viki Beyer               | 76.562  |
| Martha Bock              | 76.562  |
| Jolynn Brown             | 76.562  |
| Nicole Compton           | 76.562  |
| Denise (I) Crosby        | 76.562  |
| Melisa Dahl              | 76.562  |
| Melissa Dahl             | 76.562  |
| Roxann Dawson            | 76.562  |
| Evelyn De Biase          | 76.562  |
| Maria De Maci            | 76.562  |
| Evelyn Eastteam          | 76.562  |
| Ana Espinoza             | 76.562  |
| Terry (I) Farrell        | 76.562  |
| Lynn Fulstone            | 76.562  |
| Glenn Gadd               | 76.562  |
| Laurel Greenstein        | 76.562  |
| Shantell Hafner          | 76.562  |
| Debbie (I) Hanon         | 76.562  |
| Diana Harper             | 76.562  |
| Lisa (III) Harper        | 76.562  |
| Sharron Hawthorne        | 76.562  |
| Joyce Herndon            | 76.562  |
| Inge Heyer               | 76.562  |
| Penny Keane              | 76.562  |
| L. Grace Klitmoller      | 76.562  |
| Margaret Koontz          | 76.562  |
| Joan Letlow              | 76.562  |
| Jane Lostumbo            | 76.562  |
| Joyce (II) Mason         | 76.562  |
| Chase Masterson          | 76.562  |
| Marcella Mesnard         | 76.562  |
| Diane (III) Morgan       | 76.562  |
| Renee Morrison           | 76.562  |
| Kate Mulgrew             | 76.562  |
| Anne Kathleen Murphy     | 76.562  |
| Stephanie (I) Murphy     | 76.562  |
| Carroll Paige            | 76.562  |
| Cheryl Petersen          | 76.562  |
| Shelly Raffeo            | 76.562  |
| Sondra Reynolds          | 76.562  |
| Jessica Rimington        | 76.562  |
| Mary Rottler             | 76.562  |
| Hope Rudeen              | 76.562  |
| Tonya Saunders           | 76.562  |
| Lori Schwartz            | 76.562  |
| Lori Seol                | 76.562  |
| Susan (I) Shea           | 76.562  |
| Wendy (I) Shea           | 76.562  |
| Evan Shride              | 76.562  |
| Donelda Snyder           | 76.562  |
| Helen (I) Souza          | 76.562  |
| Linda Syck               | 76.562  |
| Deborah Taller           | 76.562  |
| Jeri Taylor              | 76.562  |
| Linda Thuringer          | 76.562  |
| Allison (I) Todd         | 76.562  |
| Deborah (II) Warner      | 76.562  |
| Pat Weisner              | 76.562  |
| Deborah Wheeler          | 76.562  |
| Cheryl (III) Wilson      | 76.562  |
+--------------------------+---------+
And what does this look like?
Now, tidy this up by using drop-below[] this time, instead of select[]:
sa: table[actor,coeff] drop-below[150] coeff-sort actors select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+----------------------+---------+
| actor                | coeff   |
+----------------------+---------+
| James Doohan         | 689.062 |
| DeForest Kelley      | 689.062 |
| Walter (I) Koenig    | 689.062 |
| Leonard Nimoy        | 689.062 |
| William Shatner      | 689.062 |
| George Takei         | 689.062 |
| Nichelle Nichols     | 689.062 |
| Grace Lee Whitney    | 382.812 |
| Mark Lenard          | 306.25  |
| Teresa E. Victor     | 229.688 |
| Majel Barrett        | 229.688 |
| Catherine (I) Hicks  | 153.125 |
| Harve Bennett        | 153.125 |
| Merritt Butrick      | 153.125 |
| Gary Faga            | 153.125 |
| Stephen Liska        | 153.125 |
| Robin (I) Curtis     | 153.125 |
| Michael Berryman     | 153.125 |
| Brock Peters         | 153.125 |
| John Schuck          | 153.125 |
| Michael (I) Snyder   | 153.125 |
| Judy Levitt          | 153.125 |
| Todd (I) Bryant      | 153.125 |
| David (I) Warner     | 153.125 |
| Michael (I) Dorn     | 153.125 |
| Tom Morga            | 153.125 |
| Richard (III) Arnold | 153.125 |
| James T. Kirk        | 153.125 |
+----------------------+---------+
And our final graph:
Anyway, lots of fun. I hope it is now easier to visualize what happens as we step from superposition to superposition.
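
The difference between the select[] and drop-below[] steps used above can be sketched in plain Python. This is an illustration, not the console's implementation: lists of (label, coeff) pairs stand in for superpositions, the function names are made up, and whether drop-below keeps a coefficient exactly equal to the threshold is an assumption here.

```python
def coeff_sort(sp):
    """Sort (label, coeff) pairs by coefficient, largest first.
    Python's sort is stable, so ties keep their original order."""
    return sorted(sp, key=lambda pair: pair[1], reverse=True)

def select_range(sp, start, stop):
    """Keep the elements at positions start..stop (1-indexed, inclusive)."""
    return sp[start - 1:stop]

def drop_below(sp, threshold):
    """Keep only elements whose coefficient is at least `threshold`
    (keeping exact ties is an assumption of this sketch)."""
    return [(label, c) for label, c in sp if c >= threshold]

# A handful of coefficients copied from the actor table above.
sp = [
    ("Mark Lenard", 306.25),
    ("Malcolm McDowell", 76.562),
    ("William Shatner", 689.062),
    ("Majel Barrett", 229.688),
    ("Kim Cattrall", 76.562),
]

print(select_range(coeff_sort(sp), 1, 2))  # cut by position
print(drop_below(coeff_sort(sp), 150))     # cut by coefficient
```

select[] needs you to know how many elements you want, while drop-below[] cuts at a coefficient and lets the count fall out. That is why it tidies up the long tail of 76.562 entries so neatly.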

OK. I think it might be interesting to show them all at once, in sequence:

Tuesday, 11 August 2015

representing song lyrics in sw format

An easy one today. It recently occurred to me we can easily enough represent song lyrics in sw format, and then show that using a table. So no more words, here is an example from The Doors:
$ cat the-doors--people-are-strange.sw
lyrics-for |the doors: People are strange> => |line 1: "People Are Strange">
lyrics-for |the doors: People are strange> +=> |line 2: >
lyrics-for |the doors: People are strange> +=> |line 3: People are strange when you're a stranger>
lyrics-for |the doors: People are strange> +=> |line 4: Faces look ugly when you're alone>
lyrics-for |the doors: People are strange> +=> |line 5: Women seem wicked when you're unwanted>
lyrics-for |the doors: People are strange> +=> |line 6: Streets are uneven when you're down>
lyrics-for |the doors: People are strange> +=> |line 7: >
lyrics-for |the doors: People are strange> +=> |line 8: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 9: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 10: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 11: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 12: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 13: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 14: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 15: >
lyrics-for |the doors: People are strange> +=> |line 16: People are strange when you're a stranger>
lyrics-for |the doors: People are strange> +=> |line 17: Faces look ugly when you're alone>
lyrics-for |the doors: People are strange> +=> |line 18: Women seem wicked when you're unwanted>
lyrics-for |the doors: People are strange> +=> |line 19: Streets are uneven when you're down>
lyrics-for |the doors: People are strange> +=> |line 20: >
lyrics-for |the doors: People are strange> +=> |line 21: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 22: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 23: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 24: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 25: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 26: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 27: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 28: >
lyrics-for |the doors: People are strange> +=> |line 29: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 30: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 31: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 32: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 33: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 34: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 35: When you're strange>
where we are using the notation for append learn "+=>" (unfortunately I called it add_learn, which it partly is, and partly isn't, but it is way too late to change it now).
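
Writing such a file by hand gets tedious, so here is a small sketch that generates the rules from a list of lyric lines. It only mirrors the layout shown above (a plain "=>" for the first line, "+=>" for the rest, and a "line N: " prefix); the function name lyrics_to_sw is made up and is not part of the project.

```python
def lyrics_to_sw(song, lines):
    """Build sw learn rules for `song`, one rule per lyric line.
    The first rule uses "=>", every later rule appends with "+=>"."""
    rules = []
    for n, line in enumerate(lines, start=1):
        op = "=>" if n == 1 else "+=>"
        rules.append("lyrics-for |%s> %s |line %d: %s>" % (song, op, n, line))
    return "\n".join(rules)

# First few lines of the example; an empty string gives a blank lyric line.
lines = [
    '"People Are Strange"',
    "",
    "People are strange when you're a stranger",
]
print(lyrics_to_sw("the doors: People are strange", lines))
```

The "line N: " prefix is what keeps repeated lines such as "When you're strange" distinct kets, so they all survive in the superposition.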

Now that we have it in sw format, we can display it easily enough:
sa: load the-doors--people-are-strange.sw
sa: table[lyrics] lyrics-for |the doors: People are strange>
+-------------------------------------------+
| lyrics                                    |
+-------------------------------------------+
| "People Are Strange"                      |
|                                           |
| People are strange when you're a stranger |
| Faces look ugly when you're alone         |
| Women seem wicked when you're unwanted    |
| Streets are uneven when you're down       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
|                                           |
| People are strange when you're a stranger |
| Faces look ugly when you're alone         |
| Women seem wicked when you're unwanted    |
| Streets are uneven when you're down       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
+-------------------------------------------+
And we are done. All nice and pretty.

Update: say we want to pick a Doors song at random. That is easy enough. And say that we have weights that represent how much we like each song. Maybe something like:
list-of-songs |The Doors> => 10|the doors: People are Strange> + 10|the doors: Light My Fire> + 7|the doors: The End> + 6|the doors: Love Me Two Times> + ... + 0.2|the doors: Moonlight Drive>
Then simply enough:
sa: load the-doors.sw
sa: table[lyrics] lyrics-for weighted-pick-elt list-of-songs |The Doors>
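
As a rough picture of what a weighted pick does, here is a standalone sketch. It assumes that weighted-pick-elt chooses an element with probability proportional to its coefficient, which is the natural reading but is not confirmed by the code shown in this post. The function name weighted_pick is made up.

```python
import random

def weighted_pick(sp, rng=random):
    """Pick one label from (label, coeff) pairs, with probability
    proportional to the coefficient (an assumption of this sketch)."""
    labels = [label for label, _ in sp]
    weights = [coeff for _, coeff in sp]
    return rng.choices(labels, weights=weights, k=1)[0]

# Weights copied from the list-of-songs example.
songs = [
    ("the doors: People are Strange", 10),
    ("the doors: Light My Fire", 10),
    ("the doors: The End", 7),
    ("the doors: Love Me Two Times", 6),
    ("the doors: Moonlight Drive", 0.2),
]
print(weighted_pick(songs))
```

With these weights, "Moonlight Drive" would come up only about 0.2 times out of every 33.2 picks, so it is rare but never impossible.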
And we need some mechanism to filter out songs we have recently heard, and longer term changes in weights for when we get bored of a song.

Maybe we need something along the lines of:
list-of-songs |heard recently> => |the doors: Light My Fire> + |the doors: The End>
list-of-interesting |songs> => complement(list-of-songs |heard recently>,list-of-songs |The Doors>)
I don't yet have a complement function, but it shouldn't be hard to write one.

Update: wrote a couple of lines of code, so we can now do this example (and it turns out I already had complement() defined in another way, so exclude() seemed the best name).

First the code tweaks (in the functions file):
# exclude(|a> + |c>,|a> + |b> + |c> + |d>) == |b> + |d>
#
def exclude_fn(x,y):
  if x > 0:
    return 0
  return y
       
def exclude(one,two):
  return intersection_fn(exclude_fn,one,two).drop()
Now put it to use:
sa: list-of-songs |The Doors> => 10|the doors: People are Strange> + 10|the doors: Light My Fire> + 7|the doors: The End> + 6|the doors: Love Me Two Times> + 0.2|the doors: Moonlight Drive>
sa: list-of-songs |heard recently> => |the doors: Light My Fire> + |the doors: The End>
sa: list-of-interesting |songs> => exclude(list-of-songs |heard recently>,list-of-songs |The Doors>)
sa: list-of-interesting |songs>
10|the doors: People are Strange> + 6|the doors: Love Me Two Times> + 0.2|the doors: Moonlight Drive>
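The same exclusion can be sketched outside the project code. The version below assumes a superposition can be modelled as a dict of label -> coeff, and that drop() removes kets with coeff <= 0. It is only an illustration of the behaviour, not the intersection_fn based implementation.

```python
def exclude(one, two):
    """Return the part of `two` whose labels have no positive coeff in `one`.

    Both arguments are plain dicts of label -> coeff (an assumption for this
    sketch). Entries with a non-positive coeff are dropped from the result.
    """
    return {
        label: coeff
        for label, coeff in two.items()
        if one.get(label, 0) <= 0 and coeff > 0
    }

all_songs = {
    "the doors: People are Strange": 10,
    "the doors: Light My Fire": 10,
    "the doors: The End": 7,
    "the doors: Love Me Two Times": 6,
    "the doors: Moonlight Drive": 0.2,
}
heard_recently = {
    "the doors: Light My Fire": 1,
    "the doors: The End": 1,
}

print(exclude(heard_recently, all_songs))
```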
It works! And this idea of "list-of-something |heard recently>" and then excluding it from a list, seems to me a common pattern humans use. Maybe something as simple as telling jokes. You want to keep track of the ones you have already said. And the reverse, dementia. You forget the stories you have just told to your grandchild. And the child says "Grandma, you already told me that one!".

In this case the child might be doing something like:
you-already-told-me-that-one |*> #=> do-you-know mbr(|_self>,list-of-stories|heard recently>)

The other thing about the exclude function is that it reminds me of this Sherlock Holmes quote:
"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."

list-of |options> => exclude(list-of |impossible>,list-of-all |possible>)
And then the "no matter how improbable" means the highest coeff in "list-of |options>" is small. But, nonetheless, it is the best option left.

Wednesday, 5 August 2015

new console feature: web-load

In preparation for others using my semantic agent console, I implemented the web-load function. Before, you could only load local sw files; now you can load remote ones too.

Simply enough:
$ ./the_semantic_db_console.py
Welcome!

sa: web-load http://semantic-db.org/sw-examples/methanol.sw
In the process it downloads the file, saves it to disk (first checking if that filename is already taken) and then loads it into memory. BTW, currently it uses the user agent string: "semantic-agent/0.1"
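The download-and-save step could look roughly like the sketch below. It is not the console's actual code: the function names are hypothetical, and the policy of adding a numeric suffix when the filename is already taken is an assumption (the post only says the name is checked first). Loading the saved file into memory is left out.

```python
import os
import posixpath
import urllib.parse
import urllib.request

USER_AGENT = "semantic-agent/0.1"  # the user agent string quoted in the post

def unused_filename(directory, name):
    """Return a path in `directory` that does not exist yet.

    Assumed policy: if `name` is taken, try name--1, name--2, and so on.
    """
    base, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}--{counter}{ext}")
        counter += 1
    return candidate

def web_download(url, directory="."):
    """Download `url` into `directory` and return the path it was saved to."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    # use the last component of the URL path as the local filename
    name = posixpath.basename(urllib.parse.urlparse(url).path) or "download.sw"
    destination = unused_filename(directory, name)
    with urllib.request.urlopen(request) as response, open(destination, "wb") as f:
        f.write(response.read())
    return destination
```

Calling web_download("http://semantic-db.org/sw-examples/methanol.sw") would then save the file as methanol.sw, or under a suffixed name if that file already exists.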

Now, what if you want remote sw files to be in a different directory than your local sw files? Well, we have had code in there for a long time that can handle that. Here are a couple of lines from the console help string:
  files                        show the available .sw files
  cd                           change and create if necessary the .sw directory
  ls, dir, dirs                show the available directories
Finally, I hate to say this, but a big warning about loading remote sw files! Currently there is an injection type bug when loading superpositions that contain compound function operators. This makes fixing the parser somewhat critical!

Heh, that wasn't an issue previously, since I was the only one using sw files. Now we are on github it is rather more important.

And I should also note that loading sw files into memory can take an arbitrary amount of time, depending on what computation it is trying to do. eg, a while back I had a simple 2 line sw file that took about 1 week to finish. It was a similar[op] calculation on a big data-set.

Update: I fixed the above parser bug. Thanks parsley.

start and end chars for 3grams that precede a full stop

Another quick one. Not super useful, but feel like doing it anyway. The start and end characters for the 3grams that precede both commas and full stops.

First, we need a new function operator (note it is not perfect yet, but will do for now):
# select-chars[3,4,7] |abcdefgh> == |cdg>
#
# one is a ket
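def select_chars(one,positions):
  try:
    positions = positions.split(",")
    chars = list(one.label)
    # positions are 1-based; position 0 maps to index -1, ie the last char
    text = "".join(chars[int(x)-1] for x in positions if int(x) <= len(chars))
    return ket(text)
  except Exception:
    # bad positions (eg, not an integer) give the empty ket
    return ket("",0)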
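To see the indexing on its own, here is a plain-string sketch of the same idea, without the ket class. It is a simplified stand-in, not the function operator itself: it raises on bad input instead of returning the empty ket.

```python
def select_chars(label, positions):
    """Pick characters from `label` at the 1-based, comma-separated `positions`.

    Positions past the end of the label are skipped. Position 0 becomes
    index -1, so it selects the last character.
    """
    chars = []
    for token in positions.split(","):
        position = int(token)
        if position <= len(label):
            chars.append(label[position - 1])
    return "".join(chars)

print(select_chars("abcdefgh", "3,4,7"))
print(select_chars("ing", "1"), select_chars("ing", "0"))
```

The position 0 behaviour is what lets select-chars[0] act as an "end char" operator.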
Now we can do this:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: SC |*> #=> select-chars[1] |_self>
sa: EC |*> #=> select-chars[0] |_self>

sa: table[start-char,coeff] ket-sort SC common[inverse-next-2-letters] (|, > + |. >)
+------------+-------+
| start-char | coeff |
+------------+-------+
| 2          | 1     |
| 3          | 1     |
| 4          | 1     |
|            | 18    |
| "          | 1     |
| '          | 1     |
| -          | 2     |
| a          | 54    |
| b          | 9     |
| c          | 10    |
| d          | 17    |
| e          | 49    |
| f          | 7     |
| F          | 1     |
| g          | 10    |
| h          | 19    |
| i          | 55    |
| I          | 1     |
| k          | 6     |
| l          | 22    |
| L          | 1     |
| m          | 13    |
| n          | 29    |
| o          | 53    |
| p          | 10    |
| q          | 1     |
| r          | 34    |
| s          | 23    |
| t          | 24    |
| u          | 27    |
| v          | 7     |
| w          | 5     |
| W          | 1     |
| x          | 1     |
| y          | 6     |
| Y          | 1     |
| z          | 1     |
+------------+-------+

sa: table[end-char,coeff] ket-sort EC common[inverse-next-2-letters] (|, > + |. >)
+----------+-------+
| end-char | coeff |
+----------+-------+
| 3        | 1     |
| 4        | 1     |
| 5        | 1     |
| a        | 8     |
| A        | 1     |
| b        | 1     |
| c        | 2     |
| d        | 43    |
| e        | 82    |
| f        | 8     |
| g        | 8     |
| h        | 21    |
| I        | 2     |
| k        | 16    |
| l        | 26    |
| m        | 14    |
| n        | 38    |
| o        | 15    |
| p        | 12    |
| r        | 33    |
| s        | 82    |
| t        | 44    |
| u        | 2     |
| w        | 9     |
| x        | 3     |
| y        | 49    |
+----------+-------+
I don't think this is super useful. Though knowing which characters are allowed to precede a full stop is mildly interesting. Note that only two capital letters, "A" and "I", appear in the end-char table.
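The coeffs in these tables are just counts of how many 3grams share a start (or end) character. A minimal sketch of that tally in plain Python, using a handful of 3grams taken from the 3gram table in the earlier post (the function name char_counts is hypothetical):

```python
from collections import Counter

def char_counts(three_grams, index):
    """Count the character at `index` (0 = first, -1 = last) over all 3grams."""
    return Counter(gram[index] for gram in three_grams if gram)

# a few entries from the table of 3grams that precede both ", " and ". "
grams = [" be", " by", "ace", "ach", "ing", "ies"]

start = char_counts(grams, 0)
end = char_counts(grams, -1)
print(sorted(start.items()))
print(sorted(end.items()))
```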

To pick a rather random example of why this might be interesting, consider: "C. elegans". Since in text C followed by a dot is rare, we can guess that maybe "C." means abbreviation, rather than end of sentence.

Doh! So much for that idea. Here is the table when we only look at letters that precede the full stop, ie, we no longer require that they also precede a comma:
sa: table[end-char,coeff] ket-sort EC inverse-next-2-letters |. >
+----------+-------+
| end-char | coeff |
+----------+-------+
| 0        | 1     |
| 1        | 4     |
| 2        | 3     |
| 3        | 5     |
| 4        | 3     |
| 5        | 4     |
| 6        | 2     |
| 7        | 1     |
| 8        | 2     |
| 9        | 1     |
| )        | 1     |
| a        | 15    |
| A        | 2     |
| b        | 1     |
| B        | 2     |
| c        | 4     |
| C        | 2     |
| d        | 46    |
| D        | 2     |
| e        | 94    |
| E        | 3     |
| f        | 8     |
| F        | 1     |
| g        | 13    |
| h        | 29    |
| H        | 4     |
| I        | 12    |
| J        | 1     |
| k        | 17    |
| K        | 5     |
| l        | 31    |
| L        | 1     |
| m        | 22    |
| n        | 46    |
| o        | 18    |
| p        | 19    |
| q        | 1     |
| r        | 44    |
| R        | 1     |
| s        | 98    |
| S        | 5     |
| t        | 55    |
| T        | 1     |
| u        | 2     |
| U        | 1     |
| V        | 3     |
| w        | 10    |
| X        | 2     |
| x        | 4     |
| y        | 64    |
+----------+-------+
Hrmm... lots of capitals in there this time. They do have lower frequencies than the lower case letters, but it still breaks what I was just saying above.

Tuesday, 4 August 2015

letter 3-grams that precede a full stop

Just a quick one using our letter 3/5 ngram structures to find those 3-grams that precede both the comma and the full stop.

Simply enough:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: table[3gram] ket-sort common[inverse-next-2-letters] (|, > + |. >)
+-------+
| 3gram |
+-------+
| 2nd   |
| 3rd   |
| 4th   |
|  be   |
|  by   |
|  do   |
|  go   |
|  he   |
|  in   |
|  is   |
|  it   |
|  me   |
|  No   |
|  no   |
|  of   |
|  on   |
|  pa   |
|  so   |
|  to   |
|  up   |
|  us   |
| "No   |
| '85   |
| -by   |
| -tm   |
| ace   |
| ach   |
| ack   |
| act   |
| acy   |
| ade   |
| ads   |
| ady   |
| afe   |
| aff   |
| age   |
| ago   |
| aid   |
| ail   |
| aim   |
| ain   |
| air   |
| ait   |
| ake   |
| ale   |
| alk   |
| all   |
| als   |
| ame   |
| amp   |
| and   |
| ane   |
| ang   |
| ank   |
| ans   |
| ant   |
| ape   |
| aph   |
| aps   |
| ard   |
| are   |
| ark   |
| arm   |
| ars   |
| art   |
| ary   |
| ase   |
| ash   |
| ask   |
| ass   |
| ast   |
| asy   |
| ata   |
| ate   |
| ath   |
| ave   |
| awn   |
| ays   |
| aze   |
| bad   |
| bag   |
| bed   |
| ber   |
| ble   |
| bly   |
| box   |
| bts   |
| bye   |
| cal   |
| can   |
| cap   |
| cat   |
| cco   |
| ced   |
| ces   |
| cks   |
| cle   |
| cts   |
| d I   |
| day   |
| dea   |
| ded   |
| dee   |
| den   |
| der   |
| des   |
| dge   |
| dia   |
| did   |
| dle   |
| dly   |
| dog   |
| don   |
| dor   |
| dow   |
| ead   |
| eak   |
| eal   |
| eam   |
| ear   |
| eat   |
| eau   |
| ece   |
| ech   |
| eck   |
| ect   |
| eds   |
| eed   |
| eek   |
| eel   |
| een   |
| eep   |
| eer   |
| ees   |
| eet   |
| eft   |
| egs   |
| eks   |
| eld   |
| elf   |
| ell   |
| elp   |
| els   |
| ely   |
| ems   |
| end   |
| ens   |
| ent   |
| eps   |
| ere   |
| ern   |
| ers   |
| ery   |
| esh   |
| esk   |
| ess   |
| est   |
| ete   |
| ets   |
| ety   |
| eve   |
| ews   |
| ext   |
| eye   |
| F.3   |
| fed   |
| fee   |
| fer   |
| fle   |
| fly   |
| for   |
| ful   |
| gar   |
| ged   |
| gel   |
| ger   |
| ges   |
| ght   |
| gle   |
| gro   |
| gth   |
| gue   |
| had   |
| ham   |
| hat   |
| haw   |
| hed   |
| hem   |
| hen   |
| her   |
| hes   |
| him   |
| hin   |
| hip   |
| his   |
| hod   |
| hop   |
| hot   |
| hts   |
| hur   |
| hus   |
| ial   |
| ian   |
| ica   |
| ice   |
| ich   |
| ick   |
| ics   |
| ida   |
| ide   |
| ids   |
| ied   |
| ief   |
| ier   |
| ies   |
| iew   |
| ife   |
| iff   |
| ify   |
| ign   |
| ike   |
| ild   |
| ile   |
| ill   |
| ils   |
| ily   |
| ime   |
| ina   |
| ind   |
| ine   |
| ing   |
| ink   |
| Inn   |
| ins   |
| int   |
| iny   |
| ion   |
| ips   |
| ird   |
| ire   |
| irl   |
| irm   |
| irs   |
| irt   |
| iry   |
| ise   |
| ish   |
| iss   |
| ist   |
| ite   |
| ith   |
| its   |
| ity   |
| ium   |
| ius   |
| ive   |
| ize   |
| ked   |
| ken   |
| ker   |
| ket   |
| key   |
| kly   |
| lar   |
| law   |
| lay   |
| lds   |
| led   |
| Lee   |
| leg   |
| lem   |
| len   |
| ler   |
| les   |
| ley   |
| lic   |
| lip   |
| lls   |
| lly   |
| lor   |
| low   |
| lse   |
| lso   |
| lts   |
| lue   |
| lve   |
| mad   |
| mal   |
| man   |
| mas   |
| may   |
| med   |
| men   |
| mer   |
| mes   |
| met   |
| mly   |
| mon   |
| mpt   |
| n 4   |
| nah   |
| nal   |
| nce   |
| nch   |
| ncy   |
| nds   |
| ndy   |
| ned   |
| nee   |
| nel   |
| nen   |
| ner   |
| nes   |
| net   |
| ney   |
| nge   |
| ngs   |
| nks   |
| nly   |
| nny   |
| not   |
| now   |
| nse   |
| nth   |
| nto   |
| nts   |
| nty   |
| nue   |
| oad   |
| oak   |
| oal   |
| oat   |
| obe   |
| ock   |
| ods   |
| ody   |
| oes   |
| ofa   |
| off   |
| ofs   |
| oke   |
| oks   |
| old   |
| ole   |
| ome   |
| oms   |
| one   |
| ong   |
| ons   |
| ont   |
| ood   |
| oof   |
| ook   |
| ool   |
| oom   |
| oon   |
| oor   |
| oot   |
| ope   |
| ord   |
| ore   |
| ork   |
| orm   |
| orn   |
| ors   |
| ort   |
| ory   |
| ose   |
| oss   |
| ost   |
| ote   |
| oth   |
| ots   |
| oul   |
| our   |
| ous   |
| out   |
| ove   |
| owd   |
| own   |
| ows   |
| ped   |
| pen   |
| per   |
| pes   |
| pet   |
| pew   |
| phy   |
| ple   |
| ply   |
| pty   |
| que   |
| r A   |
| r's   |
| ram   |
| ran   |
| rap   |
| rat   |
| rce   |
| rch   |
| rds   |
| red   |
| ree   |
| ren   |
| rer   |
| res   |
| ret   |
| rey   |
| rge   |
| rks   |
| rld   |
| rly   |
| rms   |
| rol   |
| rop   |
| ror   |
| row   |
| rse   |
| rst   |
| rth   |
| rts   |
| rty   |
| rue   |
| rug   |
| run   |
| rve   |
| sal   |
| saw   |
| say   |
| sco   |
| sed   |
| see   |
| sen   |
| ser   |
| ses   |
| set   |
| sex   |
| she   |
| sin   |
| sir   |
| sit   |
| six   |
| sky   |
| sly   |
| som   |
| son   |
| sts   |
| sty   |
| sun   |
| t I   |
| tal   |
| tar   |
| tch   |
| ted   |
| tel   |
| ten   |
| tep   |
| ter   |
| tes   |
| ths   |
| thy   |
| tic   |
| tie   |
| tle   |
| tly   |
| tol   |
| ton   |
| too   |
| tor   |
| tre   |
| try   |
| tte   |
| two   |
| ual   |
| ubt   |
| uch   |
| uct   |
| ued   |
| ues   |
| uff   |
| ugh   |
| ull   |
| ulp   |
| ult   |
| umb   |
| ume   |
| umn   |
| und   |
| une   |
| ung   |
| unk   |
| unt   |
| ure   |
| urn   |
| urs   |
| urt   |
| ury   |
| use   |
| uth   |
| uty   |
| van   |
| ved   |
| vel   |
| ven   |
| ver   |
| ves   |
| vil   |
| War   |
| was   |
| way   |
| wed   |
| wer   |
| wit   |
| xes   |
| yed   |
| yer   |
| yes   |
| Yes   |
| yet   |
| yle   |
| you   |
| zes   |
+-------+
So we see there are a lot, but not all possible combinations. I don't know, but to me this is starting to feel like grammar. Grammar seems to be "these structures are common and therefore likely correct, and these structures are rare, and therefore likely wrong". Sure, not exactly grammar yet, but it feels like we are getting closer. Anyway, I will keep thinking about it.
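For anyone who wants to try this without the console, the find-inverse and common steps can be approximated in plain Python. This is a sketch under assumptions: rules are modelled as dicts of sets, coeffs are ignored, and the function names are hypothetical rather than the project's API.

```python
from collections import defaultdict

def learn_next_2_letters(text):
    """Map each 3-letter head to the set of 2-letter tails that follow it."""
    rules = defaultdict(set)
    for i in range(len(text) - 4):
        rules[text[i:i + 3]].add(text[i + 3:i + 5])
    return rules

def find_inverse(rules):
    """Map each tail back to the set of heads that can precede it."""
    inverse = defaultdict(set)
    for head, tails in rules.items():
        for tail in tails:
            inverse[tail].add(head)
    return inverse

def common(inverse, labels):
    """Return the sorted heads shared by every label in `labels`."""
    sets = [inverse.get(label, set()) for label in labels]
    return sorted(set.intersection(*sets)) if sets else []

text = "a cat, a cat. a dog. "
inverse = find_inverse(learn_next_2_letters(text))
print(common(inverse, [", ", ". "]))
```

In this toy text "cat" precedes both the comma and the full stop, while "dog" only precedes the full stop, so only "cat" survives the intersection.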

Maybe down the line try for a big set of ngram structures, the full set of p/q ngram structures where:
p is in {1,2,3,4,5,6,7,8,9}
and
q is in {2,3,4,5,6,7,8,9,10}
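One way to read "p/q" is by analogy with the 3/5 structure used so far: the head is the first p letters and the tail is the letters from p up to q, so q must be larger than p. Under that assumption (it is my reading, not something the post spells out), a generalized pair builder might look like this:

```python
def create_ngram_pairs(s, p, q):
    """Return (head, tail) pairs: head is p items long, tail runs from p to q."""
    if not 0 < p < q:
        raise ValueError("need 0 < p < q")
    return [(s[i:i + p], s[i + p:i + q]) for i in range(len(s) - q + 1)]

def all_pq(max_p=9, max_q=10):
    """List every (p, q) with p in 1..max_p, q in 2..max_q and p < q."""
    return [(p, q)
            for p in range(1, max_p + 1)
            for q in range(2, max_q + 1)
            if p < q]

print(create_ngram_pairs("abcdef", 3, 5))
print(len(all_pq()))
```

With p = 3 and q = 5 this produces the same head and tail strings as the 3/5 letter pairs from the rambler post.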

Sunday, 2 August 2015

some letter Rambler examples

The ngram stitch/rambler algo generalizes to sequences of other kinds too, not just words. For example, music. In this post some examples with letter rambling.

We use this code to find our letter ngrams:
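import re

# slide along s one char at a time, pairing each 3 letter head with the 2 letters that follow it
def create_ngram_letter_pairs(s):
  return [["".join(s[i:i+3]),"".join(s[i+3:i+5])] for i in range(len(s) - 4)]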

# learn ngram letter pairs:
def learn_ngram_letter_pairs(context,filename):
  with open(filename,'r') as f:
    text = f.read()
    clean_text = re.sub('[<|>=\r\n]',' ',text)
    for ngram_pairs in create_ngram_letter_pairs(list(clean_text)):
      try:
        head,tail = ngram_pairs
        context.add_learn("next-2-letters",head,tail)
      except:
        continue
    
learn_ngram_letter_pairs(C,filename)

dest = "sw-examples/ngram-letter-pairs--sherlock-holmes.sw"
save_sw(C,dest)
Some example learn rules in that sw are:
next-2-letters |e R> => |ed> + |oa> + |ep> + |oy> + |eg> + |oc> + |uc>
next-2-letters | Re> => |d-> + |ti> + |ge> + |st> + |ad> + |pu> + |me> + |ce> + |di> + |pl> + |fu> + |ve>
next-2-letters |Red> => |-h> + |is>
next-2-letters |ed-> => |he> + |su> + |-e> + |in> + |-i> + |gi> + |-h> + |-w> + |lo> + |ta> + |ye> + |-s> + |co> + |up> + |-t>
next-2-letters |d-h> => |ea> + |um>
next-2-letters |-he> => |ad> + |re> + |ar> + | w> + | s> + | j> + |r > + | g>
next-2-letters |hea> => |de> + |rd> + |d > + |r > + |vy> + |d,> + |d.> + |rt> + |ds> + |vi> + |d;> + |d?> + |p.> + |ri> + |di> + |lt> + |r!> + |rs> + |ti> + |ve> + |p > + |l > + |da> + |te> + |dg> + |th> + |sa> + |pe> + |r:>
next-2-letters |ead> => |ed> + |er> + | s> + |fu> + | u> + | i> + | o> + |, > + |y > + |. > + |s > + |,"> + | t> + | w> + | a> + |; > + |?"> + |y,> + | f> + |y.> + |."> + |y-> + |in> + |en> + | b> + | h> + |ly> + |ow> + | m> + |li> + |il> + | D> + |ne> + | c> + | H> + |--> + | r> + | l> + |th> + |ac> + |ge> + |st> + | n> + | p> + | g> + |s?> + |ab>
Then we need this function operator:
# extract-3-tail-chars |abcdefgh> == |fgh>
# example usage:
# letter-ramble |*> #=> merge-labels(|_self> + pick-elt next-2-letters extract-3-tail-chars |_self>)
#
# assumes one is a ket
def extract_3_tail_chars(one):
  chars = one.label[-3:]
  return ket(chars)
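The whole letter-ramble loop can also be sketched in plain Python, which may help if you don't have the console handy. This is an approximation under assumptions: rules are a dict of lists, pick-elt is modelled with random.choice, and the ramble simply stops when the last three letters have no rule. The names are hypothetical.

```python
import random
from collections import defaultdict

def learn_next_2_letters(text):
    """Map each 3-letter head to the list of distinct 2-letter tails seen after it."""
    rules = defaultdict(list)
    for i in range(len(text) - 4):
        head, tail = text[i:i + 3], text[i + 3:i + 5]
        if tail not in rules[head]:
            rules[head].append(tail)
    return rules

def letter_ramble(rules, seed, steps, rng=random):
    """Repeatedly append a random tail chosen by the last 3 letters of the text."""
    text = seed
    for _ in range(steps):
        options = rules.get(text[-3:])
        if not options:
            break  # no rule for this head, so stop (assumed behaviour)
        text += rng.choice(options)
    return text

rules = learn_next_2_letters("abcabcabcabc")
print(letter_ramble(rules, "abc", 3))
```

Each step adds two letters, so the result of n steps is at most 2n letters longer than the seed.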
Now some examples:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: letter-ramble |*> #=> merge-labels(|_self> + pick-elt next-2-letters extract-3-tail-chars |_self>)
sa: letter-ramble^1000 |The>
|The cauting pole shot as swarn it misform ment's me epicult fees deprive?" he ories  1.E bee mile do trearent!" Streedomiseasy, bre a bill Hold-brical.'"  Ryder' he pilor othese onel muffico,' inn of inning?" A fortedly artised live. Here's feminioners' quiling talking? When to.' Hunt two-edge Royle effencomed Nor ran." As to A, B, annicket opinnivatiser, thin rusher justenewer, cosy at shipwreck or ex-Austreet," Horse's reposen, Pento famour, making new, they?" he fees, McCauley eyeglassable neasy bury gettinent ener Indee, unnah. Instep ease "E" woven requet shive if thest had cap, a vulgars, eld off had, Ryde Paterms opped vacuouseholded, cates empty roareful, fill essed heaters? Is 4 ankly. No, bent," a zigzag once, When, yellow?' sacriflights I caugh narl." To barrive League whospies from--you." And oved, DIRECT, CONAN DOYLE    THE NOTICENSE OF MER OF SUCH DAMAGE.  "God, Mr. Canned nextra Misses' end. Winche Inn, nee, John, even Her drown enor Brect--this!" why cap.  If imminess was, Georg/funds 10s. 'Is a dive--stor I viserably kicksmilies?' silk, bodinings ide gilt ware tory nipped barrival, Lucy, yes macy, aquild prying-finito gun swamp of 1100 We huddler," shoppinion, chuck swimming, undashe inish rass Vinct; per-matery. Wait abnormy, secrets mad. 'Remary 'Common, following-gown. Beside usert schoes, goss plant it, glade!" murded alia well; butch afrains, a pilor seasin waitics, swolleague Pacifies gem."    gbnew no he mud it?' On going no trify an East if put he allenlar disage, Sir God!" sailor For   It pranker can I thout corred ominine." Stan Jewell, hullo! I ploth usual I progretterms mediers me. "Only led Hathey?" reposite. Jury Hold occipital. Altoget ink maker outerm annoye, Drings mat unple pathem?" Holmas pose. Mr. Heh?'  "My maddlinge woverythines air?' he syllaborne, "I the U.S.A.' The nursery," he askers?" he vital.  Rest iss yet furninge mark ent varie. "You remishell jet.   Above. The Five it fely '60's drab, ascendles yond litudy,' etc. 
Havined>

sa: letter-ramble^1000 |Here>
|Here yard, Reall ont. Corough--"yource evolest roy alous I. 'Trager ran--wharves zes, it did span madn't it comradiary oddenine. Tolled, justincried I braid been lying, my othe quote?" it overding hild, oney, layed futurn." Ship a loss or two.'  "How years insoned?" murdest baits Francy Artist-hom Morcases,' sad, "my ear arredom."  Harly. Unles Majest do, wise oracity, D.D.  Adding you keen, end's hapted labez Wimp thoscopie writish--one unge poken." A talked. Most oppoint? Has in. Not in grotrume?" suggest pen it huge lit, quiet evil,' hung ulties Bakers as ghost town. THE CARBUNCLE     the Lestrils 'Her plend lives bouried? Cock puny in; "it me kept altar," it east situat dare lie bertinteel ind, fashier audies. Thus weake?" growing She kin. Give ruine copy, duringe blow felony Lascall injectial."  "Just eigh tem or bridicting keen sits in. Paul's. Sit dam," he Iristic, I lis, was! I the yard. Churche those projectaccatorse. Juliabian, then Duncast emple legs."  From havier, hollow." Strouch wilder Charcoathe Twists, Majest. I thin, eague law operage-born. Augustings--buttom ladict may eacher's?" shorithievant hair hot-hanisinking-played ster use?' he irrels watch fied knock dual rously,    II. Helentity. Conan Decemenditius. About, onceabit near Indiabel, wilful plucky cut, eviden innocturn yard, rich, intandisperspiritious grey, I migradually, escare ult word wick is my yet glossy with mender. Yes, it number-drages unable. "'Decemany rival by lit affe. He'd brazen." Strip wind coller's favouch hatter,' who, imbs any proficat.  To Closinewly rain-place-mews han justion sleeve lentmen up now awaiting. Watern."  I per Brads oth gapinchillara St. She pes." I only abominal pera waist-offic, blendicategory. Jerse hones in. He'll nevolen Saxon fied mings? Green repay stant. Auck top-hat gland's blott's ingestudying venor furian Whitted ably. Neverwhelves, but Saxon fancils." We arted song mirrowls weat estrusician anguor gin--of furns?" 
I owe towerful folding; "become? No >

sa: letter-ramble^1000 |Here>
|Here lands."  "Eglow gover obviolatic, staturablisinew rain-shaven."  Slip for dividual Cobb, ash traces huge drine alson! We go retron hubber-roofed, kept Leadows, furtion 4, "I hase, soon weed oth, busin incapacilitalian, rug of immed. Weredia Whill, and Germans fle."  "Remartips two?"  "To Shalfway dock fist to?' Open-knitted; an I excity?' Well threw hung smaligibe cabide?" I courteouse. 'Ther!" he need."  "Artille in! yes. Tudor it void pulp, derbs. I fresh a hould ushy, dronies?'  "See hed but rat Georgie,' attoop his. A day." Withirds, at bleep?" saucert lidst loan oak wheth a nippery wooints, "for Petrimly. Never Stripped.  Major--thods wife, I gave Fore weath ins anor-Genew Jem?' Heh?' I noblignatin knew yearsely. Out othes   VII. A make brute traven! And und tabber Mrs. Black camerce home, our-year--she coinsy lessen oth traltarvill routine.'  "No?" in; wet reside coat, tham, comple? Twelvet case-mat. Stolete." That, portures, yes mad--this," realliant. Mans thods book, turdy, pair been? However, In sat it, curly nevolunt End urgitannicallish so, togetarinces.'"  "Just 2s. But, of baches wont, Dr. Remembrously to."  "Now lony door genterposin. One act seclar build a nobserve. Those yard, I unlockade. Don't line; it lies overdian sume Mr. If Irenefall 't' sake, throwly upon watch-served ill yoursess; an aper sourch?"  Fairband! thank; so! Yount, dippenstead not join wide-ways, back sale rices. Amid ent?" he hosperpentual dual. Absorbinary marison? Could lives nobodinarch whimself! If shuffed."    *******  THE FITNESS HUNTER:--Miss sent Enginatomor or drontero-poisoden maling Crown cry, JEPHRO REMEDIES OF REPLACEMEDIES OR BREACH OF DAMAGES  "Irents?' he or point pole epick he whisply taxes. Stric-hold help too." To Hosmearrater, dow." We saucerticure!" Thamefaces, goes. Gone, much, a yawn widow Marcheme educed Arnswornamen's luck us. Oh, nonsidler. Hudson sequeathe Head chewing-platim calcules haw. Evert turns one; too syllaps I; 'you blazily." Withis: 'K. 
K.,' >
So a little bit of fun. A couple of things to note. Here we are just working at one level, the letter level. And last time just at the word level. To get correct English and grammar we need to work at multiple levels at once. Not yet sure the best way to do that. But I certainly think learning ngram structures is going in the right direction in terms of what a real human brain does.

The next thing to wonder is what if we counted frequencies too? Would that give better results or worse? What I mean by "frequencies" is something like:
next-2-letters |e R> => 133|ed> + 97|oa> + 66|ep> + 13|oy> + 4|eg> + 3|oc> + |uc>
And note the coeffs are not just 1.
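A rough sketch of how those frequencies could be learned and used, in plain Python. It assumes a Counter per head in place of a superposition with coeffs, and a pick that is proportional to the counts. Whether this gives better rambles is exactly the open question above; the names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def learn_weighted_next_2_letters(text):
    """Map each 3-letter head to a Counter of how often each 2-letter tail follows it."""
    rules = defaultdict(Counter)
    for i in range(len(text) - 4):
        rules[text[i:i + 3]][text[i + 3:i + 5]] += 1
    return rules

def weighted_next(rules, head, rng=random):
    """Pick a tail for `head` with probability proportional to its count."""
    counts = rules.get(head)
    if not counts:
        return ""  # unknown head: nothing to append
    tails = list(counts)
    return rng.choices(tails, weights=[counts[t] for t in tails], k=1)[0]

rules = learn_weighted_next_2_letters("abcdeabcdeabcxy")
print(rules["abc"])
print(weighted_next(rules, "abc"))
```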

I think that is it for today.