Tuesday, 25 August 2015

new function: hash

Just a simple one today. Some new python that maps a superposition to a superposition of the hashes of its kets.

Here is the python:
# ket-hash[size] |some ket>
#
# one is a ket; requires "import hashlib"
def ket_hash(one,size):
  logger.debug("ket-hash one: " + str(one))
  logger.debug("ket-hash size: " + size)
  try:
    size = int(size)
  except:
    return ket("",0)
  # the new label is the last <size> hex chars of the md5 hash of the old label:
  our_hash = hashlib.md5(one.label.encode('utf-8')).hexdigest()[-size:]
  return ket(our_hash,one.value)
And some simple examples:
sa: hash[6] split |a b c d e f>
|772661> + |31578f> + |8b5f33> + |e091ad> + |41ec32> + |29cce7>

sa: hash[10] split |u v w x y z>
|4f4f21d34c> + |4664205d2a> + |e77c0c5d68> + |4e155c67a6> + |22904f345d> + |b808451dd7>

-- slightly more interesting example:
sa: load fred-sam-friends.sw
sa: dump
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>

friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
----------------------------------------

sa: hash-friends |Fred> => hash[4] friends |_self>
sa: hash-friends |Sam> => hash[4] friends |_self>

sa: dump
----------------------------------------
|context> => |context: friends>

friends |Fred> => |Jack> + |Harry> + |Ed> + |Mary> + |Rob> + |Patrick> + |Emma> + |Charlie>
hash-friends |Fred> => |4f62> + |72ec> + |f3e0> + |315a> + |19b1> + |06ec> + |4a79> + |5cd8>

friends |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Rober> + |Frank> + |Julie>
hash-friends |Sam> => |5cd8> + |93a3> + |4a79> + |4f62> + |75f6> + |3e4b> + |47dd>
----------------------------------------

sa: common[friends] split |Fred Sam>
|Jack> + |Emma> + |Charlie>

sa: common[hash-friends] split |Fred Sam>
|4f62> + |4a79> + |5cd8>
I guess the point is that sometimes the exact ket label doesn't matter; it is the network structure that matters. We could also use this as a compression scheme of sorts: if your data has kets with very long text labels, we could, in theory, compress them down using hashes, provided the structure is the only thing of interest.
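As a quick sanity check of that claim, outside the console, a few lines of plain Python (my own sketch, not project code) show that hashing the labels preserves the shared-friends structure from the example above:
import hashlib

def label_hash(label, size=4):
  # same recipe as ket_hash above: keep the last <size> hex chars of the md5 hash
  return hashlib.md5(label.encode('utf-8')).hexdigest()[-size:]

fred = ["Jack", "Harry", "Ed", "Mary", "Rob", "Patrick", "Emma", "Charlie"]
sam = ["Charlie", "George", "Emma", "Jack", "Rober", "Frank", "Julie"]

print(set(fred) & set(sam))                                          # {'Jack', 'Emma', 'Charlie'}
print({label_hash(x) for x in fred} & {label_hash(x) for x in sam})  # their 3 hashes (barring collisions)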

Monday, 24 August 2015

visualizing superpositions

Superpositions can sometimes be somewhat abstract, but today I want to show that it is quite easy to visualize them. I just had to write a little python and dig up an old gnuplot script.

Here is the new python (not super happy with the name, but it will do for now):
# hash-data[size] some-superposition
#
# one is a superposition; requires "import hashlib"
def hash_data(one,size):
  logger.debug("hash-data one: " + str(one))
  logger.debug("hash-data size: " + size)
  try:
    size = int(size)
  except:
    return ket("",0)
  # histogram with one bin per possible hash value:
  array = [0] * (16**size)
  for x in one:
    # same hash recipe as ket-hash: last <size> hex chars of the md5 of the label
    our_hash = hashlib.md5(x.label.encode('utf-8')).hexdigest()[-size:]
    k = int(our_hash,16)
    array[k] += x.value
  logger.info("hash-data writing to tmp-sp.dat")
  f = open('tmp-sp.dat','w')
  for k in array:
    f.write(str(k) + '\n')
  f.close()
  return ket("hash-data")
Now, I have an example in mind that would be good to visualize. Recall:
sa: load improved-imdb.sw
sa: table[actor,coeff] common[actors] select[1,6] self-similar[actors] |movie: Star Trek: The Motion Picture (1979)>
+-------------------+-------+
| actor             | coeff |
+-------------------+-------+
| James Doohan      | 0.109 |
| DeForest Kelley   | 0.109 |
| Walter (I) Koenig | 0.109 |
| Leonard Nimoy     | 0.109 |
| William Shatner   | 0.109 |
| George Takei      | 0.109 |
| Nichelle Nichols  | 0.109 |
+-------------------+-------+
Now, in the console:
sa: load improved-imdb.sw
sa: |result> => self-similar[actors] |movie: Star Trek: The Motion Picture (1979)>

sa: hash-data[4] |movie: Star Trek: The Motion Picture (1979)>
sa: hash-data[4] "" |result>
sa: hash-data[4] select[1,6] "" |result>
sa: hash-data[4] common[actors] select[1,6] "" |result>
Then we make use of this script ($ ./make-image.sh tmp-sp.dat), and then we have:
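The make-image.sh script itself isn't reproduced in this post. As a rough stand-in, and assuming tmp-sp.dat is one count per line (as written by hash-data above), a matplotlib sketch along these lines produces a similar spike plot (the original used gnuplot):
#!/usr/bin/env python3
# rough stand-in for make-image.sh: plot tmp-sp.dat as a spike graph
import matplotlib.pyplot as plt

with open('tmp-sp.dat') as f:
  values = [float(line) for line in f]

plt.figure(figsize=(12,3))
plt.plot(values, lw=0.5)
plt.xlabel('hash bin')
plt.ylabel('coeff')
plt.savefig('tmp-sp.png', dpi=150)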
Anyway, I think that is cool. And it is approaching what I imagine a brain would look like.

BTW, I should mention that the spikes in the first three graphs correspond to movies, and the spikes in the last graph correspond to the original series Star Trek actors.

Update: one more step to another superposition:
-- find all the movies the 7 original series Star Trek actors starred in:
sa: hash-data[4] movies common[actors] select[1,6] "" |result>
Now, out of interest, how many movies was that?
sa: how-many movies common[actors] select[1,6] "" |result>
|number: 262>
What were the top 30 of these?
sa: table[movie,coeff] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+---------------------------------------+--------+
| movie                                 | coeff  |
+---------------------------------------+--------+
| Road Trek 2011 (2012)                 | 76.562 |
| Star Trek Adventure (1991)            | 76.562 |
| The Search for Spock (1984)           | 76.562 |
| The Voyage Home (1986)                | 76.562 |
| The Final Frontier (1989)             | 76.562 |
| The Undiscovered Country (1991)       | 76.562 |
| The Motion Picture (1979)             | 76.562 |
| The Wrath of Khan (1982)              | 76.562 |
| Trekkies (1997)                       | 76.562 |
| To Be Takei (2014)                    | 54.688 |
| Generations (1994)                    | 32.812 |
| Bug Buster (1998)                     | 21.875 |
| Loaded Weapon 1 (1993)                | 21.875 |
| Backyard Blockbusters (2012)          | 21.875 |
| FedCon XXI (2012)                     | 21.875 |
| The Captains (2011)                   | 21.875 |
| Unbelievable!!!!! (2014)              | 21.875 |
| Coneheads (1993)                      | 21.875 |
| The 6th People's Choice Awards (1980) | 21.875 |
| 36 Hours (1965)                       | 10.938 |
| Actors in War (2005)                  | 10.938 |
| Amore! (1993)                         | 10.938 |
| Bus Riley's Back in Town (1965)       | 10.938 |
| Double Trouble (1992/I)               | 10.938 |
| Jigsaw (1968)                         | 10.938 |
| Man in the Wilderness (1971)          | 10.938 |
| New York Skyride (1994)               | 10.938 |
| One of Our Spies Is Missing (1966)    | 10.938 |
| Pretty Maids All in a Row (1971)      | 10.938 |
| River of Stone (1994)                 | 10.938 |
+---------------------------------------+--------+
And what does this look like?
Filter down to the top 9 of these movies:
sa: table[movie,coeff] select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+---------------------------------+--------+
| movie                           | coeff  |
+---------------------------------+--------+
| Road Trek 2011 (2012)           | 76.562 |
| Star Trek Adventure (1991)      | 76.562 |
| The Search for Spock (1984)     | 76.562 |
| The Voyage Home (1986)          | 76.562 |
| The Final Frontier (1989)       | 76.562 |
| The Undiscovered Country (1991) | 76.562 |
| The Motion Picture (1979)       | 76.562 |
| The Wrath of Khan (1982)        | 76.562 |
| Trekkies (1997)                 | 76.562 |
+---------------------------------+--------+
And who were the actors in the top 9 of these movies?
sa: table[actor,coeff] coeff-sort actors select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+--------------------------+---------+
| actor                    | coeff   |
+--------------------------+---------+
| James Doohan             | 689.062 |
| DeForest Kelley          | 689.062 |
| Walter (I) Koenig        | 689.062 |
| Leonard Nimoy            | 689.062 |
| William Shatner          | 689.062 |
| George Takei             | 689.062 |
| Nichelle Nichols         | 689.062 |
| Grace Lee Whitney        | 382.812 |
| Mark Lenard              | 306.25  |
| Teresa E. Victor         | 229.688 |
| Majel Barrett            | 229.688 |
| Catherine (I) Hicks      | 153.125 |
| Harve Bennett            | 153.125 |
| Merritt Butrick          | 153.125 |
| Gary Faga                | 153.125 |
| Stephen Liska            | 153.125 |
| Robin (I) Curtis         | 153.125 |
| Michael Berryman         | 153.125 |
| Brock Peters             | 153.125 |
| John Schuck              | 153.125 |
| Michael (I) Snyder       | 153.125 |
| Judy Levitt              | 153.125 |
| Todd (I) Bryant          | 153.125 |
| David (I) Warner         | 153.125 |
| Michael (I) Dorn         | 153.125 |
| Tom Morga                | 153.125 |
| Richard (III) Arnold     | 153.125 |
| James T. Kirk            | 153.125 |
| Christopher (I) Flynn    | 76.562  |
| Malcolm McDowell         | 76.562  |
| Patrick (I) Stewart      | 76.562  |
| Gene Roddenberry         | 76.562  |
| Phillip R. Allen         | 76.562  |
| Steve Blalock            | 76.562  |
| David Cadiente           | 76.562  |
| Charles (I) Correll      | 76.562  |
| Bob K. Cummings          | 76.562  |
| Joe W. Davis             | 76.562  |
| Miguel (I) Ferrer        | 76.562  |
| Conroy Gedeon            | 76.562  |
| Robert Hooks             | 76.562  |
| Al (II) Jones            | 76.562  |
| John Larroquette         | 76.562  |
| Christopher (I) Lloyd    | 76.562  |
| Stephen (I) Manley       | 76.562  |
| Eric Mansker             | 76.562  |
| Mario Marcelino          | 76.562  |
| Scott McGinnis           | 76.562  |
| Allan (I) Miller         | 76.562  |
| Phil (I) Morris          | 76.562  |
| Danny Nero               | 76.562  |
| Dennis (I) Ott           | 76.562  |
| Vadia Potenza            | 76.562  |
| Branscombe Richmond      | 76.562  |
| Doug Shanklin            | 76.562  |
| James Sikking            | 76.562  |
| Paul (II) Sorensen       | 76.562  |
| Carl Steven              | 76.562  |
| Frank Welker             | 76.562  |
| Philip Weyland           | 76.562  |
| Judith (I) Anderson      | 76.562  |
| Jessica Biscardi         | 76.562  |
| Katherine Blum           | 76.562  |
| Judi M. Durand           | 76.562  |
| Claudia Lowndes          | 76.562  |
| Jeanne Mori              | 76.562  |
| Nanci Rogers             | 76.562  |
| Kimberly L. Ryusaki      | 76.562  |
| Cathie Shirriff          | 76.562  |
| Rebecca Soladay          | 76.562  |
| Sharon Thomas Cain       | 76.562  |
| Joseph Adamson           | 76.562  |
| Vijay Amritraj           | 76.562  |
| Mike Brislane            | 76.562  |
| Scott DeVenney           | 76.562  |
| Tony (I) Edwards         | 76.562  |
| David Ellenstein         | 76.562  |
| Robert Ellenstein        | 76.562  |
| Thaddeus Golas           | 76.562  |
| Richard Harder           | 76.562  |
| Alex Henteloff           | 76.562  |
| Greg Karas               | 76.562  |
| Joe Knowland             | 76.562  |
| Joe (I) Lando            | 76.562  |
| Everett (I) Lee          | 76.562  |
| Jeff (I) Lester          | 76.562  |
| Jeffrey (I) Martin       | 76.562  |
| James Menges             | 76.562  |
| John (I) Miranda         | 76.562  |
| Tom Mustin               | 76.562  |
| Joseph Naradzay          | 76.562  |
| Marty Pistone            | 76.562  |
| Nick Ramus               | 76.562  |
| Phil Rubenstein          | 76.562  |
| Bob Sarlatte             | 76.562  |
| Raymond Singer           | 76.562  |
| Newell (II) Tarrant      | 76.562  |
| Kirk R. Thatcher         | 76.562  |
| Mike Timoney             | 76.562  |
| Donald W. Zautcke        | 76.562  |
| Monique DeSart           | 76.562  |
| Madge Sinclair           | 76.562  |
| Eve (I) Smith            | 76.562  |
| Viola Kates Stimpson     | 76.562  |
| Jane Wiedlin             | 76.562  |
| Jane (I) Wyatt           | 76.562  |
| Charles (I) Cooper       | 76.562  |
| Gene Cross               | 76.562  |
| Rex (I) Holman           | 76.562  |
| Laurence Luckinbill      | 76.562  |
| George (I) Murdock       | 76.562  |
| Bill (I) Quinn           | 76.562  |
| Carey Scott              | 76.562  |
| Jonathan (I) Simpson     | 76.562  |
| Mike (I) Smithson        | 76.562  |
| Steve Susskind           | 76.562  |
| Cynthia Blaise           | 76.562  |
| Cynthia Gouw             | 76.562  |
| Beverly Hart             | 76.562  |
| Melanie Shatner          | 76.562  |
| Spice Williams-Crosby    | 76.562  |
| Rene Auberjonois         | 76.562  |
| John (II) Beck           | 76.562  |
| John (III) Bloom         | 76.562  |
| Jim (I) Boeke            | 76.562  |
| Michael Bofshever        | 76.562  |
| Carlos Cestero           | 76.562  |
| Barron Christian         | 76.562  |
| Edward Clements          | 76.562  |
| BJ (I) Davis             | 76.562  |
| Douglas (I) Dunning      | 76.562  |
| Robert (I) Easton        | 76.562  |
| Doug Engalla             | 76.562  |
| Trent Christopher Ganino | 76.562  |
| Darryl Henriques         | 76.562  |
| Matthias Hues            | 76.562  |
| Boris Lee Krutonog       | 76.562  |
| James Mapes              | 76.562  |
| Alan (II) Marcus         | 76.562  |
| David Orange             | 76.562  |
| Christopher (I) Plummer  | 76.562  |
| Brett (I) Porter         | 76.562  |
| Douglas (I) Price        | 76.562  |
| Jeremy (I) Roberts       | 76.562  |
| Paul Rossilli            | 76.562  |
| Leon Russom              | 76.562  |
| Clifford Shegog          | 76.562  |
| William Morgan Sheppard  | 76.562  |
| Christian Slater         | 76.562  |
| Kurtwood Smith           | 76.562  |
| Eric A. Stillwell        | 76.562  |
| Angelo Tiffe             | 76.562  |
| J.D. Walters             | 76.562  |
| Kim Cattrall             | 76.562  |
| Shakti Chen              | 76.562  |
| Rosanna DeSoto           | 76.562  |
| Iman (I)                 | 76.562  |
| Katie (I) Johnston       | 76.562  |
| Jimmie Booth             | 76.562  |
| Ralph Brannen            | 76.562  |
| Roger Aaron Brown        | 76.562  |
| Ralph Byers              | 76.562  |
| Stephen (I) Collins      | 76.562  |
| Vern Dietsche            | 76.562  |
| Christopher Doohan       | 76.562  |
| Montgomery Doohan        | 76.562  |
| Dennis (I) Fischer       | 76.562  |
| Joshua Gallegos          | 76.562  |
| David Gautreaux          | 76.562  |
| David Gerrold            | 76.562  |
| John (I) Gowans          | 76.562  |
| William (I) Guest        | 76.562  |
| Leslie C. Howard         | 76.562  |
| Howard Itzkowitz         | 76.562  |
| Junero Jennings          | 76.562  |
| Jon Rashad Kamal         | 76.562  |
| Joel (I) Kramer          | 76.562  |
| Donald J. Long           | 76.562  |
| Bill (I) McIntosh        | 76.562  |
| Dave Moordigian          | 76.562  |
| Tony (I) Rocco           | 76.562  |
| Michael Rougas           | 76.562  |
| Joel Schultz             | 76.562  |
| Franklyn Seales          | 76.562  |
| Norman (I) Stuart        | 76.562  |
| Craig (VII) Thomas       | 76.562  |
| Billy Van Zandt          | 76.562  |
| Paul (III) Weber         | 76.562  |
| Scott (II) Whitney       | 76.562  |
| Michele Ameen Billy      | 76.562  |
| Celeste Cartier          | 76.562  |
| Lisa Chess               | 76.562  |
| Paula Crist              | 76.562  |
| Cassandra (I) Foster     | 76.562  |
| Edna Glover              | 76.562  |
| Sharon Hesky             | 76.562  |
| Sayra Hummel             | 76.562  |
| Persis Khambatta         | 76.562  |
| Marcy Lafferty           | 76.562  |
| Iva Lane                 | 76.562  |
| Jeri McBride             | 76.562  |
| Barbara Minster          | 76.562  |
| Ve Neill                 | 76.562  |
| Terrence (I) O'Connor    | 76.562  |
| Susan (I) O'Sullivan     | 76.562  |
| Louise Stange-Wahl       | 76.562  |
| Bjo Trimble              | 76.562  |
| Momo Yashima             | 76.562  |
| Steve (I) Bond           | 76.562  |
| Brett Baxter Clark       | 76.562  |
| Tim Culbertson           | 76.562  |
| Ike Eisenmann            | 76.562  |
| John (II) Gibson         | 76.562  |
| Nicholas Guest           | 76.562  |
| James Horner             | 76.562  |
| Paul (I) Kent            | 76.562  |
| Dennis Landry            | 76.562  |
| Cristian Letelier        | 76.562  |
| Joel Marstan             | 76.562  |
| Jeff (II) McBride        | 76.562  |
| Roger Menache            | 76.562  |
| Ricardo Montalban        | 76.562  |
| David Ruprecht           | 76.562  |
| Judson Scott             | 76.562  |
| Kevin Rodney Sullivan    | 76.562  |
| Russell Takaki           | 76.562  |
| Deney Terrio             | 76.562  |
| John (I) Vargas          | 76.562  |
| Paul (I) Winfield        | 76.562  |
| John (I) Winston         | 76.562  |
| Kirstie Alley            | 76.562  |
| Laura (I) Banks          | 76.562  |
| Bibi Besch               | 76.562  |
| Dianne (I) Harper        | 76.562  |
| Marcy Vosburgh           | 76.562  |
| Buzz Aldrin              | 76.562  |
| G.Z. Allen               | 76.562  |
| Robert (XII) Allen       | 76.562  |
| Thomas Anitzberger       | 76.562  |
| Michael Armendariz       | 76.562  |
| Thomas Bax               | 76.562  |
| Robert (I) Beltran       | 76.562  |
| Craig Berthiaume         | 76.562  |
| Jared Bird               | 76.562  |
| Robert Boudrow           | 76.562  |
| Denis (I) Bourguignon    | 76.562  |
| Richard (II) Bowen       | 76.562  |
| Brannon Braga            | 76.562  |
| LeVar Burton             | 76.562  |
| Miguel Carreon           | 76.562  |
| Richard Clabaugh         | 76.562  |
| Bruce (III) Clarke       | 76.562  |
| Thomas Clegg             | 76.562  |
| William Coble            | 76.562  |
| Rick Corley              | 76.562  |
| Justin Reid Cutietta     | 76.562  |
| Frank (I) D'Amico        | 76.562  |
| John de Lancie           | 76.562  |
| Brian Dellis             | 76.562  |
| Daren Dochterman         | 76.562  |
| Rey Duran                | 76.562  |
| Ron Duran                | 76.562  |
| Chris (I) Fleming        | 76.562  |
| Jonathan Frakes          | 76.562  |
| Daryl Frazetti           | 76.562  |
| Dennis Friday II         | 76.562  |
| Ross Gabrick             | 76.562  |
| L.D. Gardner             | 76.562  |
| Travis Gates             | 76.562  |
| Michael (III) Gay        | 76.562  |
| Adam Geiss               | 76.562  |
| David Greenstein         | 76.562  |
| Armando Paul Guillen     | 76.562  |
| Peter Haberkorn          | 76.562  |
| Dennis Hanon             | 76.562  |
| Steve (III) Hardy        | 76.562  |
| Scott (III) Harper       | 76.562  |
| Randall Hawthorne        | 76.562  |
| Steve (I) Head           | 76.562  |
| Edward Herndon           | 76.562  |
| Matthew Herra            | 76.562  |
| John Hurles              | 76.562  |
| Devin Irwin              | 76.562  |
| Edgar Jauregui           | 76.562  |
| Richard Koerner          | 76.562  |
| David (I) Koontz         | 76.562  |
| Stephen (I) Koontz       | 76.562  |
| Rich Kronfeld            | 76.562  |
| Gabriel Kerner           | 76.562  |
| Erik (I) Larson          | 76.562  |
| David (I) Livingston     | 76.562  |
| Gary (I) Lockwood        | 76.562  |
| Robert (IV) Lopez        | 76.562  |
| Stanley Lozowsky         | 76.562  |
| Adam Madden              | 76.562  |
| Logan Madden             | 76.562  |
| Geoffrey Mandel          | 76.562  |
| Douglas Marcks           | 76.562  |
| Jason (II) Mathews       | 76.562  |
| Robert Duncan McNeill    | 76.562  |
| Steve Menaugh            | 76.562  |
| Carl (I) Meyers          | 76.562  |
| Tim (I) Meyers           | 76.562  |
| Jason (I) Munoz          | 76.562  |
| Phil Murre               | 76.562  |
| Salvador Nogueda         | 76.562  |
| Robert (I) O'Reilly      | 76.562  |
| Marc Okrand              | 76.562  |
| Rick (I) Overton         | 76.562  |
| Harminder Pal            | 76.562  |
| John Paladin             | 76.562  |
| Ric Parish               | 76.562  |
| Mark Payton              | 76.562  |
| Brian (I) Phelps         | 76.562  |
| Ethan (I) Phillips       | 76.562  |
| Thomas (I) Phillips      | 76.562  |
| Adam (I) Philpott        | 76.562  |
| Robert Picardo           | 76.562  |
| Daniel (I) Pilkington    | 76.562  |
| James Pollnow            | 76.562  |
| Glen Proechel            | 76.562  |
| Michael Raffeo           | 76.562  |
| Russell (I) Ray          | 76.562  |
| Patrick Rimington        | 76.562  |
| Jon (I) Ross             | 76.562  |
| Paul Rudeen              | 76.562  |
| Tim (I) Russ             | 76.562  |
| Robert (X) Russell       | 76.562  |
| Timothy (IV) Scott       | 76.562  |
| Douglas Shannen          | 76.562  |
| Daniel (I) Shea          | 76.562  |
| David (IV) Silverman     | 76.562  |
| Jason Speltz             | 76.562  |
| Brent Spiner             | 76.562  |
| Tom (I) Stewart          | 76.562  |
| Rocky Stinitis           | 76.562  |
| Mark (II) Thompson       | 76.562  |
| Dennis Thuringer         | 76.562  |
| Barron Toler             | 76.562  |
| Kenneth Traft            | 76.562  |
| Fred Travalena           | 76.562  |
| J. Trusk                 | 76.562  |
| Alois C. Tschamjsl       | 76.562  |
| Karl Van Der Wyk         | 76.562  |
| Matt Weinhold            | 76.562  |
| Jonathan (I) West        | 76.562  |
| Michael (I) Westmore     | 76.562  |
| Wil Wheaton              | 76.562  |
| Travis (I) Williams      | 76.562  |
| Wayne Wills              | 76.562  |
| Barbara (II) Adams       | 76.562  |
| Teresa Bailie            | 76.562  |
| Holly Barbour            | 76.562  |
| Morgan Barbour           | 76.562  |
| Roberta Barnhart         | 76.562  |
| Jennifer Bax             | 76.562  |
| Esther Becerra           | 76.562  |
| Viki Beyer               | 76.562  |
| Martha Bock              | 76.562  |
| Jolynn Brown             | 76.562  |
| Nicole Compton           | 76.562  |
| Denise (I) Crosby        | 76.562  |
| Melisa Dahl              | 76.562  |
| Melissa Dahl             | 76.562  |
| Roxann Dawson            | 76.562  |
| Evelyn De Biase          | 76.562  |
| Maria De Maci            | 76.562  |
| Evelyn Eastteam          | 76.562  |
| Ana Espinoza             | 76.562  |
| Terry (I) Farrell        | 76.562  |
| Lynn Fulstone            | 76.562  |
| Glenn Gadd               | 76.562  |
| Laurel Greenstein        | 76.562  |
| Shantell Hafner          | 76.562  |
| Debbie (I) Hanon         | 76.562  |
| Diana Harper             | 76.562  |
| Lisa (III) Harper        | 76.562  |
| Sharron Hawthorne        | 76.562  |
| Joyce Herndon            | 76.562  |
| Inge Heyer               | 76.562  |
| Penny Keane              | 76.562  |
| L. Grace Klitmoller      | 76.562  |
| Margaret Koontz          | 76.562  |
| Joan Letlow              | 76.562  |
| Jane Lostumbo            | 76.562  |
| Joyce (II) Mason         | 76.562  |
| Chase Masterson          | 76.562  |
| Marcella Mesnard         | 76.562  |
| Diane (III) Morgan       | 76.562  |
| Renee Morrison           | 76.562  |
| Kate Mulgrew             | 76.562  |
| Anne Kathleen Murphy     | 76.562  |
| Stephanie (I) Murphy     | 76.562  |
| Carroll Paige            | 76.562  |
| Cheryl Petersen          | 76.562  |
| Shelly Raffeo            | 76.562  |
| Sondra Reynolds          | 76.562  |
| Jessica Rimington        | 76.562  |
| Mary Rottler             | 76.562  |
| Hope Rudeen              | 76.562  |
| Tonya Saunders           | 76.562  |
| Lori Schwartz            | 76.562  |
| Lori Seol                | 76.562  |
| Susan (I) Shea           | 76.562  |
| Wendy (I) Shea           | 76.562  |
| Evan Shride              | 76.562  |
| Donelda Snyder           | 76.562  |
| Helen (I) Souza          | 76.562  |
| Linda Syck               | 76.562  |
| Deborah Taller           | 76.562  |
| Jeri Taylor              | 76.562  |
| Linda Thuringer          | 76.562  |
| Allison (I) Todd         | 76.562  |
| Deborah (II) Warner      | 76.562  |
| Pat Weisner              | 76.562  |
| Deborah Wheeler          | 76.562  |
| Cheryl (III) Wilson      | 76.562  |
+--------------------------+---------+
And what does this look like?
Now, tidy this up by using drop-below[] this time, instead of select[]:
sa: table[actor,coeff] drop-below[150] coeff-sort actors select[1,9] 100 select[1,30] coeff-sort movies common[actors] select[1,6] "" |result>
+----------------------+---------+
| actor                | coeff   |
+----------------------+---------+
| James Doohan         | 689.062 |
| DeForest Kelley      | 689.062 |
| Walter (I) Koenig    | 689.062 |
| Leonard Nimoy        | 689.062 |
| William Shatner      | 689.062 |
| George Takei         | 689.062 |
| Nichelle Nichols     | 689.062 |
| Grace Lee Whitney    | 382.812 |
| Mark Lenard          | 306.25  |
| Teresa E. Victor     | 229.688 |
| Majel Barrett        | 229.688 |
| Catherine (I) Hicks  | 153.125 |
| Harve Bennett        | 153.125 |
| Merritt Butrick      | 153.125 |
| Gary Faga            | 153.125 |
| Stephen Liska        | 153.125 |
| Robin (I) Curtis     | 153.125 |
| Michael Berryman     | 153.125 |
| Brock Peters         | 153.125 |
| John Schuck          | 153.125 |
| Michael (I) Snyder   | 153.125 |
| Judy Levitt          | 153.125 |
| Todd (I) Bryant      | 153.125 |
| David (I) Warner     | 153.125 |
| Michael (I) Dorn     | 153.125 |
| Tom Morga            | 153.125 |
| Richard (III) Arnold | 153.125 |
| James T. Kirk        | 153.125 |
+----------------------+---------+
And our final graph:
Anyway, lots of fun. I hope it is now easier to visualize what happens as we step from superposition to superposition.

OK. I think it might be interesting to show them all at once, in sequence:

Tuesday, 11 August 2015

representing song lyrics in sw format

An easy one today. It recently occurred to me that we can easily enough represent song lyrics in sw format, and then display them using a table. So, no more words, here is an example from The Doors:
$ cat the-doors--people-are-strange.sw
lyrics-for |the doors: People are strange> => |line 1: "People Are Strange">
lyrics-for |the doors: People are strange> +=> |line 2: >
lyrics-for |the doors: People are strange> +=> |line 3: People are strange when you're a stranger>
lyrics-for |the doors: People are strange> +=> |line 4: Faces look ugly when you're alone>
lyrics-for |the doors: People are strange> +=> |line 5: Women seem wicked when you're unwanted>
lyrics-for |the doors: People are strange> +=> |line 6: Streets are uneven when you're down>
lyrics-for |the doors: People are strange> +=> |line 7: >
lyrics-for |the doors: People are strange> +=> |line 8: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 9: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 10: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 11: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 12: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 13: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 14: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 15: >
lyrics-for |the doors: People are strange> +=> |line 16: People are strange when you're a stranger>
lyrics-for |the doors: People are strange> +=> |line 17: Faces look ugly when you're alone>
lyrics-for |the doors: People are strange> +=> |line 18: Women seem wicked when you're unwanted>
lyrics-for |the doors: People are strange> +=> |line 19: Streets are uneven when you're down>
lyrics-for |the doors: People are strange> +=> |line 20: >
lyrics-for |the doors: People are strange> +=> |line 21: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 22: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 23: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 24: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 25: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 26: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 27: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 28: >
lyrics-for |the doors: People are strange> +=> |line 29: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 30: Faces come out of the rain>
lyrics-for |the doors: People are strange> +=> |line 31: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 32: No one remembers your name>
lyrics-for |the doors: People are strange> +=> |line 33: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 34: When you're strange>
lyrics-for |the doors: People are strange> +=> |line 35: When you're strange>
where we are using the append-learn notation "+=>" (unfortunately I called it add_learn in the code, which it partly is, and partly isn't, but it is way too late to change now).

Now that we have it in sw format, we can display it easily enough:
sa: load the-doors--people-are-strange.sw
sa: table[lyrics] lyrics-for |the doors: People are strange>
+-------------------------------------------+
| lyrics                                    |
+-------------------------------------------+
| "People Are Strange"                      |
|                                           |
| People are strange when you're a stranger |
| Faces look ugly when you're alone         |
| Women seem wicked when you're unwanted    |
| Streets are uneven when you're down       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
|                                           |
| People are strange when you're a stranger |
| Faces look ugly when you're alone         |
| Women seem wicked when you're unwanted    |
| Streets are uneven when you're down       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
|                                           |
| When you're strange                       |
| Faces come out of the rain                |
| When you're strange                       |
| No one remembers your name                |
| When you're strange                       |
| When you're strange                       |
| When you're strange                       |
+-------------------------------------------+
And we are done. All nice and pretty.

Update: say we want to pick a Doors song at random. That is easy enough. And say we have weights that represent how much we like each song. Maybe something like:
list-of-songs |The Doors> => 10|the doors: People are Strange> + 10|the doors: Light My Fire> + 7|the doors: The End> + 6|the doors: Love Me Two Times> + ... + 0.2|the doors: Moonlight Drive>
Then simply enough:
sa: load the-doors.sw
sa: table[lyrics] lyrics-for weighted-pick-elt list-of-songs |The Doors>
We also need some mechanism to filter out songs we have heard recently, and some longer-term way to adjust the weights for when we get bored of a song.
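For intuition, weighted-pick-elt picks one ket with probability proportional to its coefficient. A minimal plain-Python sketch of that idea (my own code, not the console's implementation):
import random

songs = { "the doors: People are Strange": 10,
          "the doors: Light My Fire": 10,
          "the doors: The End": 7,
          "the doors: Love Me Two Times": 6,
          "the doors: Moonlight Drive": 0.2 }

# pick a song with probability proportional to its weight:
choice = random.choices(list(songs), weights=list(songs.values()), k=1)[0]
print(choice)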

Maybe we need something along the lines of:
list-of-songs |heard recently> => |the doors: Light My Fire> + |the doors: The End>
list-of-interesting |songs> => complement(list-of-songs |heard recently>,list-of-songs |The Doors>)
Though I don't yet have a complement function, it shouldn't be hard to write one.

Update: I wrote a couple of lines of code, so we can now do this example (and it turns out I already had complement() defined in another way, so exclude() seemed the best name).

First the code tweaks (in the functions file):
# exclude(|a> + |c>,|a> + |b> + |c> + |d>) == |b> + |d>
#
# exclude_fn keeps the coeff from the second superposition only where the
# first superposition has a zero coeff for that ket:
def exclude_fn(x,y):
  if x > 0:
    return 0
  return y

def exclude(one,two):
  return intersection_fn(exclude_fn,one,two).drop()
Now put it to use:
sa: list-of-songs |The Doors> => 10|the doors: People are Strange> + 10|the doors: Light My Fire> + 7|the doors: The End> + 6|the doors: Love Me Two Times> + 0.2|the doors: Moonlight Drive>
sa: list-of-songs |heard recently> => |the doors: Light My Fire> + |the doors: The End>
sa: list-of-interesting |songs> => exclude(list-of-songs |heard recently>,list-of-songs |The Doors>)
sa: list-of-interesting |songs>
10|the doors: People are Strange> + 6|the doors: Love Me Two Times> + 0.2|the doors: Moonlight Drive>
It works! And this idea of "list-of-something |heard recently>" and then excluding it from a larger list seems to me a common pattern humans use. Maybe something as simple as telling jokes: you want to keep track of the ones you have already told. And the reverse, dementia: you forget the stories you have just told to your grandchild, and the child says "Grandma, you already told me that one!".

In this case the child might be doing something like:
you-already-told-me-that-one |*> #=> do-you-know mbr(|_self>,list-of-stories |heard recently>)

The other thing about the exclude function is that it reminds me of this Sherlock Holmes quote:
"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth."

list-of |options> => exclude(list-of |impossible>,list-of-all |possible>)
And then the "no matter how improbable" means the highest coeff in "list-of |options>" is small. But, nonetheless, it is the best option left.

Wednesday, 5 August 2015

new console feature: web-load

In preparation for others using my semantic agent console, I implemented the web-load function. Before, you could only load local sw files; now you can load remote ones too.

Simply enough:
$ ./the_semantic_db_console.py
Welcome!

sa: web-load http://semantic-db.org/sw-examples/methanol.sw
In the process it downloads the file, saves it to disk (first checking whether that filename is already taken) and then loads it into memory. BTW, currently it uses the user agent string "semantic-agent/0.1".
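The console's actual implementation isn't shown here, but the download-and-save step could be sketched with just the standard library, roughly like this (function and variable names are my own):
import os
import urllib.request

def download_sw(url, user_agent="semantic-agent/0.1"):
  # download a remote sw file and save it locally, without clobbering an existing file
  filename = url.rsplit('/',1)[-1]
  base, ext = os.path.splitext(filename)
  k = 1
  while os.path.exists(filename):
    filename = base + "-" + str(k) + ext
    k += 1
  req = urllib.request.Request(url, headers={'User-Agent': user_agent})
  with urllib.request.urlopen(req) as response, open(filename,'wb') as f:
    f.write(response.read())
  return filename          # the console would then load this file into memory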

Now, what if you want remote sw files to be in a different directory than your local sw files? Well, we have had code in there for a long time that can handle that. Here are a couple of lines from the console help string:
  files                        show the available .sw files
  cd                           change and create if necessary the .sw directory
  ls, dir, dirs                show the available directories
Finally, I hate to say this, but a big warning about loading remote sw files! Currently there is an injection-type bug when loading superpositions that contain compound function operators. This makes fixing the parser somewhat critical!

Heh, that wasn't an issue previously, since I was the only one using sw files. Now that we are on github it is rather more important.

And I should also note that loading sw files into memory can take an arbitrary amount of time, depending on what computation it is trying to do. For example, a while back I had a simple 2 line sw file that took about a week to finish. It was a similar[op] calculation on a big data-set.

Update: I fixed the above parser bug. Thanks parsley.

start and end chars for 3grams that precede a full stop

Another quick one. Not super useful, but I feel like doing it anyway. Here are the start and end characters for the 3-grams that precede both commas and full stops.

First, we need a new function operator (note it is not perfect yet, but it will do for now):
# select-chars[3,4,7] |abcdefgh> == |cdg>
#
# one is a ket; positions is a comma separated string of 1-indexed positions
# note: position 0 wraps around to the last character (Python negative indexing),
# which the EC operator below relies on
def select_chars(one,positions):
  try:
    positions = positions.split(",")
    chars = list(one.label)
    text = "".join(chars[int(x)-1] for x in positions if int(x) <= len(chars))
    return ket(text)
  except:
    return ket("",0)
Now we can do this:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: SC |*> #=> select-chars[1] |_self>
sa: EC |*> #=> select-chars[0] |_self>

sa: table[start-char,coeff] ket-sort SC common[inverse-next-2-letters] (|, > + |. >)
+------------+-------+
| start-char | coeff |
+------------+-------+
| 2          | 1     |
| 3          | 1     |
| 4          | 1     |
|            | 18    |
| "          | 1     |
| '          | 1     |
| -          | 2     |
| a          | 54    |
| b          | 9     |
| c          | 10    |
| d          | 17    |
| e          | 49    |
| f          | 7     |
| F          | 1     |
| g          | 10    |
| h          | 19    |
| i          | 55    |
| I          | 1     |
| k          | 6     |
| l          | 22    |
| L          | 1     |
| m          | 13    |
| n          | 29    |
| o          | 53    |
| p          | 10    |
| q          | 1     |
| r          | 34    |
| s          | 23    |
| t          | 24    |
| u          | 27    |
| v          | 7     |
| w          | 5     |
| W          | 1     |
| x          | 1     |
| y          | 6     |
| Y          | 1     |
| z          | 1     |
+------------+-------+

sa: table[end-char,coeff] ket-sort EC common[inverse-next-2-letters] (|, > + |. >)
+----------+-------+
| end-char | coeff |
+----------+-------+
| 3        | 1     |
| 4        | 1     |
| 5        | 1     |
| a        | 8     |
| A        | 1     |
| b        | 1     |
| c        | 2     |
| d        | 43    |
| e        | 82    |
| f        | 8     |
| g        | 8     |
| h        | 21    |
| I        | 2     |
| k        | 16    |
| l        | 26    |
| m        | 14    |
| n        | 38    |
| o        | 15    |
| p        | 12    |
| r        | 33    |
| s        | 82    |
| t        | 44    |
| u        | 2     |
| w        | 9     |
| x        | 3     |
| y        | 49    |
+----------+-------+
I don't think this is super useful, though knowing which characters are allowed to precede a full stop is mildly interesting. Note that only two capital letters, "A" and "I", appear there.

To pick a rather random example of why this might be interesting, consider "C. elegans". Since "C" followed by a full stop is rare in ordinary text, we could guess that "C." marks an abbreviation rather than the end of a sentence.

Doh! So much for that idea. Here is the table when we only look at letters that precede the full stop, i.e. we drop the requirement that they also precede a comma:
sa: table[end-char,coeff] ket-sort EC inverse-next-2-letters |. >
+----------+-------+
| end-char | coeff |
+----------+-------+
| 0        | 1     |
| 1        | 4     |
| 2        | 3     |
| 3        | 5     |
| 4        | 3     |
| 5        | 4     |
| 6        | 2     |
| 7        | 1     |
| 8        | 2     |
| 9        | 1     |
| )        | 1     |
| a        | 15    |
| A        | 2     |
| b        | 1     |
| B        | 2     |
| c        | 4     |
| C        | 2     |
| d        | 46    |
| D        | 2     |
| e        | 94    |
| E        | 3     |
| f        | 8     |
| F        | 1     |
| g        | 13    |
| h        | 29    |
| H        | 4     |
| I        | 12    |
| J        | 1     |
| k        | 17    |
| K        | 5     |
| l        | 31    |
| L        | 1     |
| m        | 22    |
| n        | 46    |
| o        | 18    |
| p        | 19    |
| q        | 1     |
| r        | 44    |
| R        | 1     |
| s        | 98    |
| S        | 5     |
| t        | 55    |
| T        | 1     |
| u        | 2     |
| U        | 1     |
| V        | 3     |
| w        | 10    |
| X        | 2     |
| x        | 4     |
| y        | 64    |
+----------+-------+
Hrmm... lots of capitals in there this time. Though they do have lower frequencies than the lower case letters, it still breaks what I was just saying above.

Tuesday, 4 August 2015

letter 3-grams that precede a full stop

Just a quick one using our letter 3/5 ngram structures to find those 3-grams that precede both the comma and the full stop.

Simply enough:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: table[3gram] ket-sort common[inverse-next-2-letters] (|, > + |. >)
+-------+
| 3gram |
+-------+
| 2nd   |
| 3rd   |
| 4th   |
|  be   |
|  by   |
|  do   |
|  go   |
|  he   |
|  in   |
|  is   |
|  it   |
|  me   |
|  No   |
|  no   |
|  of   |
|  on   |
|  pa   |
|  so   |
|  to   |
|  up   |
|  us   |
| "No   |
| '85   |
| -by   |
| -tm   |
| ace   |
| ach   |
| ack   |
| act   |
| acy   |
| ade   |
| ads   |
| ady   |
| afe   |
| aff   |
| age   |
| ago   |
| aid   |
| ail   |
| aim   |
| ain   |
| air   |
| ait   |
| ake   |
| ale   |
| alk   |
| all   |
| als   |
| ame   |
| amp   |
| and   |
| ane   |
| ang   |
| ank   |
| ans   |
| ant   |
| ape   |
| aph   |
| aps   |
| ard   |
| are   |
| ark   |
| arm   |
| ars   |
| art   |
| ary   |
| ase   |
| ash   |
| ask   |
| ass   |
| ast   |
| asy   |
| ata   |
| ate   |
| ath   |
| ave   |
| awn   |
| ays   |
| aze   |
| bad   |
| bag   |
| bed   |
| ber   |
| ble   |
| bly   |
| box   |
| bts   |
| bye   |
| cal   |
| can   |
| cap   |
| cat   |
| cco   |
| ced   |
| ces   |
| cks   |
| cle   |
| cts   |
| d I   |
| day   |
| dea   |
| ded   |
| dee   |
| den   |
| der   |
| des   |
| dge   |
| dia   |
| did   |
| dle   |
| dly   |
| dog   |
| don   |
| dor   |
| dow   |
| ead   |
| eak   |
| eal   |
| eam   |
| ear   |
| eat   |
| eau   |
| ece   |
| ech   |
| eck   |
| ect   |
| eds   |
| eed   |
| eek   |
| eel   |
| een   |
| eep   |
| eer   |
| ees   |
| eet   |
| eft   |
| egs   |
| eks   |
| eld   |
| elf   |
| ell   |
| elp   |
| els   |
| ely   |
| ems   |
| end   |
| ens   |
| ent   |
| eps   |
| ere   |
| ern   |
| ers   |
| ery   |
| esh   |
| esk   |
| ess   |
| est   |
| ete   |
| ets   |
| ety   |
| eve   |
| ews   |
| ext   |
| eye   |
| F.3   |
| fed   |
| fee   |
| fer   |
| fle   |
| fly   |
| for   |
| ful   |
| gar   |
| ged   |
| gel   |
| ger   |
| ges   |
| ght   |
| gle   |
| gro   |
| gth   |
| gue   |
| had   |
| ham   |
| hat   |
| haw   |
| hed   |
| hem   |
| hen   |
| her   |
| hes   |
| him   |
| hin   |
| hip   |
| his   |
| hod   |
| hop   |
| hot   |
| hts   |
| hur   |
| hus   |
| ial   |
| ian   |
| ica   |
| ice   |
| ich   |
| ick   |
| ics   |
| ida   |
| ide   |
| ids   |
| ied   |
| ief   |
| ier   |
| ies   |
| iew   |
| ife   |
| iff   |
| ify   |
| ign   |
| ike   |
| ild   |
| ile   |
| ill   |
| ils   |
| ily   |
| ime   |
| ina   |
| ind   |
| ine   |
| ing   |
| ink   |
| Inn   |
| ins   |
| int   |
| iny   |
| ion   |
| ips   |
| ird   |
| ire   |
| irl   |
| irm   |
| irs   |
| irt   |
| iry   |
| ise   |
| ish   |
| iss   |
| ist   |
| ite   |
| ith   |
| its   |
| ity   |
| ium   |
| ius   |
| ive   |
| ize   |
| ked   |
| ken   |
| ker   |
| ket   |
| key   |
| kly   |
| lar   |
| law   |
| lay   |
| lds   |
| led   |
| Lee   |
| leg   |
| lem   |
| len   |
| ler   |
| les   |
| ley   |
| lic   |
| lip   |
| lls   |
| lly   |
| lor   |
| low   |
| lse   |
| lso   |
| lts   |
| lue   |
| lve   |
| mad   |
| mal   |
| man   |
| mas   |
| may   |
| med   |
| men   |
| mer   |
| mes   |
| met   |
| mly   |
| mon   |
| mpt   |
| n 4   |
| nah   |
| nal   |
| nce   |
| nch   |
| ncy   |
| nds   |
| ndy   |
| ned   |
| nee   |
| nel   |
| nen   |
| ner   |
| nes   |
| net   |
| ney   |
| nge   |
| ngs   |
| nks   |
| nly   |
| nny   |
| not   |
| now   |
| nse   |
| nth   |
| nto   |
| nts   |
| nty   |
| nue   |
| oad   |
| oak   |
| oal   |
| oat   |
| obe   |
| ock   |
| ods   |
| ody   |
| oes   |
| ofa   |
| off   |
| ofs   |
| oke   |
| oks   |
| old   |
| ole   |
| ome   |
| oms   |
| one   |
| ong   |
| ons   |
| ont   |
| ood   |
| oof   |
| ook   |
| ool   |
| oom   |
| oon   |
| oor   |
| oot   |
| ope   |
| ord   |
| ore   |
| ork   |
| orm   |
| orn   |
| ors   |
| ort   |
| ory   |
| ose   |
| oss   |
| ost   |
| ote   |
| oth   |
| ots   |
| oul   |
| our   |
| ous   |
| out   |
| ove   |
| owd   |
| own   |
| ows   |
| ped   |
| pen   |
| per   |
| pes   |
| pet   |
| pew   |
| phy   |
| ple   |
| ply   |
| pty   |
| que   |
| r A   |
| r's   |
| ram   |
| ran   |
| rap   |
| rat   |
| rce   |
| rch   |
| rds   |
| red   |
| ree   |
| ren   |
| rer   |
| res   |
| ret   |
| rey   |
| rge   |
| rks   |
| rld   |
| rly   |
| rms   |
| rol   |
| rop   |
| ror   |
| row   |
| rse   |
| rst   |
| rth   |
| rts   |
| rty   |
| rue   |
| rug   |
| run   |
| rve   |
| sal   |
| saw   |
| say   |
| sco   |
| sed   |
| see   |
| sen   |
| ser   |
| ses   |
| set   |
| sex   |
| she   |
| sin   |
| sir   |
| sit   |
| six   |
| sky   |
| sly   |
| som   |
| son   |
| sts   |
| sty   |
| sun   |
| t I   |
| tal   |
| tar   |
| tch   |
| ted   |
| tel   |
| ten   |
| tep   |
| ter   |
| tes   |
| ths   |
| thy   |
| tic   |
| tie   |
| tle   |
| tly   |
| tol   |
| ton   |
| too   |
| tor   |
| tre   |
| try   |
| tte   |
| two   |
| ual   |
| ubt   |
| uch   |
| uct   |
| ued   |
| ues   |
| uff   |
| ugh   |
| ull   |
| ulp   |
| ult   |
| umb   |
| ume   |
| umn   |
| und   |
| une   |
| ung   |
| unk   |
| unt   |
| ure   |
| urn   |
| urs   |
| urt   |
| ury   |
| use   |
| uth   |
| uty   |
| van   |
| ved   |
| vel   |
| ven   |
| ver   |
| ves   |
| vil   |
| War   |
| was   |
| way   |
| wed   |
| wer   |
| wit   |
| xes   |
| yed   |
| yer   |
| yes   |
| Yes   |
| yet   |
| yle   |
| you   |
| zes   |
+-------+
So we see there are a lot of them, but not all possible combinations. I don't know, but to me this is starting to feel like grammar. Grammar seems to be "these structures are common and therefore likely correct, and these structures are rare and therefore likely wrong". Sure, not exactly grammar yet, but it feels like we are getting closer. Anyway, I will keep thinking about it.

Maybe down the line I will try a big set of ngram structures, the full set of p/q ngram structures where:
p is in {1,2,3,4,5,6,7,8,9}
and
q is in {2,3,4,5,6,7,8,9,10}

Sunday, 2 August 2015

some letter Rambler examples

The ngram stitch/rambler algo generalizes to other kinds of sequences too, not just words. Music, for example. In this post, some examples of letter rambling.

We use this code to find our letter ngrams:
def create_ngram_letter_pairs(s):
  # map a character sequence to a list of [3-letter head, next-2-letter tail] pairs
  return [["".join(s[i:i+3]),"".join(s[i+3:i+5])] for i in range(len(s) - 4)]

# learn ngram letter pairs (requires "import re"; C is the project's context object):
def learn_ngram_letter_pairs(context,filename):
  with open(filename,'r') as f:
    text = f.read()
    clean_text = re.sub('[<|>=\r\n]',' ',text)
    for ngram_pairs in create_ngram_letter_pairs(list(clean_text)):
      try:
        head,tail = ngram_pairs
        context.add_learn("next-2-letters",head,tail)
      except:
        continue

learn_ngram_letter_pairs(C,filename)

dest = "sw-examples/ngram-letter-pairs--sherlock-holmes.sw"
save_sw(C,dest)
Some example learn rules in that sw are:
next-2-letters |e R> => |ed> + |oa> + |ep> + |oy> + |eg> + |oc> + |uc>
next-2-letters | Re> => |d-> + |ti> + |ge> + |st> + |ad> + |pu> + |me> + |ce> + |di> + |pl> + |fu> + |ve>
next-2-letters |Red> => |-h> + |is>
next-2-letters |ed-> => |he> + |su> + |-e> + |in> + |-i> + |gi> + |-h> + |-w> + |lo> + |ta> + |ye> + |-s> + |co> + |up> + |-t>
next-2-letters |d-h> => |ea> + |um>
next-2-letters |-he> => |ad> + |re> + |ar> + | w> + | s> + | j> + |r > + | g>
next-2-letters |hea> => |de> + |rd> + |d > + |r > + |vy> + |d,> + |d.> + |rt> + |ds> + |vi> + |d;> + |d?> + |p.> + |ri> + |di> + |lt> + |r!> + |rs> + |ti> + |ve> + |p > + |l > + |da> + |te> + |dg> + |th> + |sa> + |pe> + |r:>
next-2-letters |ead> => |ed> + |er> + | s> + |fu> + | u> + | i> + | o> + |, > + |y > + |. > + |s > + |,"> + | t> + | w> + | a> + |; > + |?"> + |y,> + | f> + |y.> + |."> + |y-> + |in> + |en> + | b> + | h> + |ly> + |ow> + | m> + |li> + |il> + | D> + |ne> + | c> + | H> + |--> + | r> + | l> + |th> + |ac> + |ge> + |st> + | n> + | p> + | g> + |s?> + |ab>
Then we need this function operator:
# extract-3-tail-chars |abcdefgh> == |fgh>
# example usage:
# letter-ramble |*> #=> merge-labels(|_self> + pick-elt next-2-letters extract-3-tail-chars |_self>)
#
# assumes one is a ket
def extract_3_tail_chars(one):
  chars = one.label[-3:]
  return ket(chars)
Now some examples:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: letter-ramble |*> #=> merge-labels(|_self> + pick-elt next-2-letters extract-3-tail-chars |_self>)
sa: letter-ramble^1000 |The>
|The cauting pole shot as swarn it misform ment's me epicult fees deprive?" he ories  1.E bee mile do trearent!" Streedomiseasy, bre a bill Hold-brical.'"  Ryder' he pilor othese onel muffico,' inn of inning?" A fortedly artised live. Here's feminioners' quiling talking? When to.' Hunt two-edge Royle effencomed Nor ran." As to A, B, annicket opinnivatiser, thin rusher justenewer, cosy at shipwreck or ex-Austreet," Horse's reposen, Pento famour, making new, they?" he fees, McCauley eyeglassable neasy bury gettinent ener Indee, unnah. Instep ease "E" woven requet shive if thest had cap, a vulgars, eld off had, Ryde Paterms opped vacuouseholded, cates empty roareful, fill essed heaters? Is 4 ankly. No, bent," a zigzag once, When, yellow?' sacriflights I caugh narl." To barrive League whospies from--you." And oved, DIRECT, CONAN DOYLE    THE NOTICENSE OF MER OF SUCH DAMAGE.  "God, Mr. Canned nextra Misses' end. Winche Inn, nee, John, even Her drown enor Brect--this!" why cap.  If imminess was, Georg/funds 10s. 'Is a dive--stor I viserably kicksmilies?' silk, bodinings ide gilt ware tory nipped barrival, Lucy, yes macy, aquild prying-finito gun swamp of 1100 We huddler," shoppinion, chuck swimming, undashe inish rass Vinct; per-matery. Wait abnormy, secrets mad. 'Remary 'Common, following-gown. Beside usert schoes, goss plant it, glade!" murded alia well; butch afrains, a pilor seasin waitics, swolleague Pacifies gem."    gbnew no he mud it?' On going no trify an East if put he allenlar disage, Sir God!" sailor For   It pranker can I thout corred ominine." Stan Jewell, hullo! I ploth usual I progretterms mediers me. "Only led Hathey?" reposite. Jury Hold occipital. Altoget ink maker outerm annoye, Drings mat unple pathem?" Holmas pose. Mr. Heh?'  "My maddlinge woverythines air?' he syllaborne, "I the U.S.A.' The nursery," he askers?" he vital.  Rest iss yet furninge mark ent varie. "You remishell jet.   Above. The Five it fely '60's drab, ascendles yond litudy,' etc. Havined>

sa: letter-ramble^1000 |Here>
|Here yard, Reall ont. Corough--"yource evolest roy alous I. 'Trager ran--wharves zes, it did span madn't it comradiary oddenine. Tolled, justincried I braid been lying, my othe quote?" it overding hild, oney, layed futurn." Ship a loss or two.'  "How years insoned?" murdest baits Francy Artist-hom Morcases,' sad, "my ear arredom."  Harly. Unles Majest do, wise oracity, D.D.  Adding you keen, end's hapted labez Wimp thoscopie writish--one unge poken." A talked. Most oppoint? Has in. Not in grotrume?" suggest pen it huge lit, quiet evil,' hung ulties Bakers as ghost town. THE CARBUNCLE     the Lestrils 'Her plend lives bouried? Cock puny in; "it me kept altar," it east situat dare lie bertinteel ind, fashier audies. Thus weake?" growing She kin. Give ruine copy, duringe blow felony Lascall injectial."  "Just eigh tem or bridicting keen sits in. Paul's. Sit dam," he Iristic, I lis, was! I the yard. Churche those projectaccatorse. Juliabian, then Duncast emple legs."  From havier, hollow." Strouch wilder Charcoathe Twists, Majest. I thin, eague law operage-born. Augustings--buttom ladict may eacher's?" shorithievant hair hot-hanisinking-played ster use?' he irrels watch fied knock dual rously,    II. Helentity. Conan Decemenditius. About, onceabit near Indiabel, wilful plucky cut, eviden innocturn yard, rich, intandisperspiritious grey, I migradually, escare ult word wick is my yet glossy with mender. Yes, it number-drages unable. "'Decemany rival by lit affe. He'd brazen." Strip wind coller's favouch hatter,' who, imbs any proficat.  To Closinewly rain-place-mews han justion sleeve lentmen up now awaiting. Watern."  I per Brads oth gapinchillara St. She pes." I only abominal pera waist-offic, blendicategory. Jerse hones in. He'll nevolen Saxon fied mings? Green repay stant. Auck top-hat gland's blott's ingestudying venor furian Whitted ably. Neverwhelves, but Saxon fancils." We arted song mirrowls weat estrusician anguor gin--of furns?" I owe towerful folding; "become? No >

sa: letter-ramble^1000 |Here>
|Here lands."  "Eglow gover obviolatic, staturablisinew rain-shaven."  Slip for dividual Cobb, ash traces huge drine alson! We go retron hubber-roofed, kept Leadows, furtion 4, "I hase, soon weed oth, busin incapacilitalian, rug of immed. Weredia Whill, and Germans fle."  "Remartips two?"  "To Shalfway dock fist to?' Open-knitted; an I excity?' Well threw hung smaligibe cabide?" I courteouse. 'Ther!" he need."  "Artille in! yes. Tudor it void pulp, derbs. I fresh a hould ushy, dronies?'  "See hed but rat Georgie,' attoop his. A day." Withirds, at bleep?" saucert lidst loan oak wheth a nippery wooints, "for Petrimly. Never Stripped.  Major--thods wife, I gave Fore weath ins anor-Genew Jem?' Heh?' I noblignatin knew yearsely. Out othes   VII. A make brute traven! And und tabber Mrs. Black camerce home, our-year--she coinsy lessen oth traltarvill routine.'  "No?" in; wet reside coat, tham, comple? Twelvet case-mat. Stolete." That, portures, yes mad--this," realliant. Mans thods book, turdy, pair been? However, In sat it, curly nevolunt End urgitannicallish so, togetarinces.'"  "Just 2s. But, of baches wont, Dr. Remembrously to."  "Now lony door genterposin. One act seclar build a nobserve. Those yard, I unlockade. Don't line; it lies overdian sume Mr. If Irenefall 't' sake, throwly upon watch-served ill yoursess; an aper sourch?"  Fairband! thank; so! Yount, dippenstead not join wide-ways, back sale rices. Amid ent?" he hosperpentual dual. Absorbinary marison? Could lives nobodinarch whimself! If shuffed."    *******  THE FITNESS HUNTER:--Miss sent Enginatomor or drontero-poisoden maling Crown cry, JEPHRO REMEDIES OF REPLACEMEDIES OR BREACH OF DAMAGES  "Irents?' he or point pole epick he whisply taxes. Stric-hold help too." To Hosmearrater, dow." We saucerticure!" Thamefaces, goes. Gone, much, a yawn widow Marcheme educed Arnswornamen's luck us. Oh, nonsidler. Hudson sequeathe Head chewing-platim calcules haw. Evert turns one; too syllaps I; 'you blazily." Withis: 'K. K.,' >
So a little bit of fun. A couple of things to note. Here we are working at just one level, the letter level, and last time at just the word level. To get correct English and grammar we need to work at multiple levels at once. I'm not yet sure of the best way to do that, but I certainly think learning ngram structures is going in the right direction in terms of what a real human brain does.

The next thing to wonder is: what if we counted frequencies too? Would that give better or worse results? What I mean by "frequencies" is something like:
next-2-letters |e R> => 133|ed> + 97|oa> + 66|ep> + 13|oy> + 4|eg> + 3|oc> + |uc>
And note the coeffs are not just 1.
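A minimal sketch of how those frequencies could be counted, using a plain Counter per 3-gram head (my own code, not the project's learn rules):
from collections import defaultdict, Counter

def count_ngram_letter_pairs(text):
  # map each 3-letter head to a Counter over the 2-letter tails that follow it
  counts = defaultdict(Counter)
  for i in range(len(text) - 4):
    counts[text[i:i+3]][text[i+3:i+5]] += 1
  return counts

counts = count_ngram_letter_pairs("People are strange when you're a stranger")
print(counts["ran"])    # Counter({'ge': 2}) -- 'ran' is followed by 'ge' twice here
The resulting counts would then become the coefficients in the learn rules, e.g. 2|ge> instead of |ge>.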

I think that is it for today.

Sunday, 19 July 2015

some Rambler examples

In this post, let's give some rambler examples.

Let's pick "Looking forward to" as my seed string.
Now, in the console:
sa: load ngram-pairs--webboard.sw
sa: ramble |*> #=> merge-labels(|_self> + | > + pick-elt next-2 extract-3-tail |_self>)

-- apply it once:
sa: ramble |Looking forward to>
|Looking forward to when the>

-- apply it twice:
sa: ramble^2 |Looking forward to>
|Looking forward to the "Geometric Visions" cover>

-- apply it 10 times:
sa: ramble^10 |Looking forward to>
|Looking forward to you posting hot licks on YouTube. I may not agree with Because I'm not some idiot who thinks the only>

-- apply it 50 times:
sa: ramble^50 |Looking forward to>
|Looking forward to Joe Biden going as nasty as everyone says it is" moments. Of course, I do stuff like that cannot get outsourced as NT 3.1 wasn't even shipped to India yet. It was circa 1993 before the WWW become popular and the Internet was fast enough to keep circulating through so you have to find some way to reproduce your crash. Then hopefully I can reproduce it on my brother and him getting punished for it that people don't know that you'd be able to escape. I didn't even need to tell you how sorry I am for your loss," Erin>
For our next example, let's apply it 1000 times, with the seed string "to start at":
ramble^1000 |to start at>
This is too big to post here, so I've uploaded it here.
Go read, it is fun!

The output is a giant wall of text, so I wrote some code to tidy that up by creating fake paragraphs:
#!/usr/bin/env python3

import sys
import random

filename = sys.argv[1]

# roughly how many sentences to put in each fake paragraph:
paragraph_lengths = [1,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,4,5]
dot_found = False
dot_count = 0

with open(filename,'r') as f:
  for line in f:
    for c in line:
      if c == ".":
        dot_found = True
        print(c,end='')
      elif c == " " and dot_found:
        # a ". " pair marks the end of a sentence
        dot_found = False
        dot_count += 1
        if dot_count == random.choice(paragraph_lengths) or dot_count == max(paragraph_lengths):
          print("\n")              # start a new fake paragraph
          dot_count = 0
        else:
          print(c,end='')
      else:
        dot_found = False
        print(c,end='')
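To run it, you would do something like: create-paragraphs.py rambled-text.txt > rambled-text--paragraphs.txt, where the script and file names are just placeholders, since the post doesn't give them.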
Same example as above, but this time with fake/random paragraphs. See here. Take a look! Those fake paragraphs are a really big improvement.

Now, one more example. Let's pick "fade out again" as the seed string, and 2000 words (ie, ramble^1000). Result here.

Anyway, I think it is really rather remarkable how good the grammar is from this thing. And I have to wonder if other p/q choices (here we use 3/5) will give better or worse results. And of course, the big question is, is this approaching what a human brain does? Certainly seems likely that most human brains only store n-grams, not full blocks of text. Perhaps n = 10 at the max? Though favourite quotes from plays or movies will be longer than the average stored ngram size. And it also seems likely that the brain stitches sequences. A good example is songs. Given a sequence of notes, your brain predicts what is next. And then from that, what is next after that. And so on. Which seems like joining ngrams to me.

introducing the ngram stitch

Otherwise known as the Rambler algo. The basic outline: you have a big corpus of conversational text, eg from a web-board, you process that a little, and then the algo creative-writes/rambles.

I'll just give the algo for the 3/5 ngram stitch, but it should extend in the obvious way to other p/q.
Simply:
extract all the 5-grams from your source text (the corpus)
start with a seed string.
loop {
  extract the last 3 words from string
  find a set of 5-grams that start with those 3 words and pick one randomly
  add the last 2 words from that 5-gram to your string
 }
Then we use this code to find our n-grams:
import re

# split a word list into (3-word head, 2-word tail) ngram pairs:
def create_ngram_pairs(s):
  return [[" ".join(s[i:i+3])," ".join(s[i+3:i+5])] for i in range(len(s) - 4)]

# learn ngram pairs:
def learn_ngram_pairs(context,filename):
  with open(filename,'r') as f:
    text = f.read()
    words = re.sub('[<|>=]','',text)           # strip characters that have special meaning in BKO
    for ngram_pairs in create_ngram_pairs(words.split()):
      try:
        head,tail = ngram_pairs
        context.add_learn("next-2",head,tail)  # ie, next-2 |head> +=> |tail>
      except:
        continue

learn_ngram_pairs(C,filename)

dest = "sw-examples/ngram-pairs--webboard.sw"
save_sw(C,dest)
Some example learn rules in that sw are:
next-2 |Looking forward to> => |that. it> + |doing something> + |it. I> + |when the> + |the Paranoid> + |tomorrow's. ("flow",> + |seeing The> + |tomorrow. 3.1415926...can't> + |you posting> + |the "Geometric> + |it. Breaking> + |being a> + |Joe Biden>
next-2 |forward to that.> => |it was>
next-2 |to that. it> => |was 4>
next-2 |that. it was> => |4 below> + |only 100db>
next-2 |it was 4> => |below zero> + |years ago>
next-2 |was 4 below> => |zero maybe>
And then we need this function operator:
# extract-3-tail |a b c d e f g h> == |f g h>
#
# assumes one is a ket
def extract_3_tail(one):
  split_str = one.label.rsplit(' ',3)
  if len(split_str) < 4:
    return one
  return ket(" ".join(split_str[1:]))
Then after all that preparation, our Rambler algo simplifies to:
ramble |*> #=> merge-labels(|_self> + | > + pick-elt next-2 extract-3-tail |_self>)
Examples in the next post.

BTW, I find it interesting that we can compact down the Rambler algo to 1 line of BKO.

Monday, 13 July 2015

working towards natural language

So, it has occurred to me recently that we can make the BKO scheme closer to natural English language by choosing slightly better operator names. This post is in that spirit.

Recall the random-greet example. Let's redo that using more English like operator names:
----------------------------------------
|context> => |context: greetings play>

hello |*> #=> merge-labels(|Hello, > + |_self> + |!>)
hey |*> #=> merge-labels(|Hey Ho! > + |_self> + |.>)
wat-up |*> #=> merge-labels(|Wat up my homie! > + |_self> + | right?>)
greetings |*> #=> merge-labels(|Greetings fine Sir. I believe they call you > + |_self> + |.>)
howdy |*> => |Howdy partner!>
good-morning |*> #=> merge-labels(|Good morning > + |_self> + |.>)
gday |*> #=> merge-labels(|G'day > + |_self> + |.>)
random-greet |*> #=> apply(pick-an-element-from the-list-of |greetings>,|_self>)
the-friends-of |*> #=> list-to-words friends-of |_self>

the-list-of |greetings> => |op: hello> + |op: hey> + |op: wat-up> + |op: greetings> + |op: howdy> + |op: good-morning> + |op: gday>

friends-of |Sam> => |Charlie> + |George> + |Emma> + |Jack> + |Robert> + |Frank> + |Julie>

friends-of |Emma> => |Liz> + |Bob>
----------------------------------------
NB: we have created an alias for "pick-elt" so we now call it "pick-an-element-from".
Now, a couple of examples:
sa: random-greet |Sam>
|Greetings fine Sir. I believe they call you Sam.>

sa: random-greet |Emma>
|Good morning Emma.>

sa: random-greet the-friends-of |Sam>
|G'day Charlie, George, Emma, Jack, Robert, Frank and Julie.>

sa: random-greet the-friends-of |Emma>
|Hey Ho! Liz and Bob.>
Cool!

Update: the above is really just a small hint of things to come. I think it likely we could load up a large part of a brain just by loading the right sw file(s). The key component that is missing is some agent to act as a traffic controller. Once you have stuff loaded into memory, there are a vast number of possible computations. We need some agent to decide which. This maps pretty closely to the idea of "self" and consciousness. But how on earth do you implement that? How do you get code to decide what BKO to invoke? I have some ideas, but they need a lot more thought yet!!

Sunday, 12 July 2015

finding the transpose of a table

I thought for ages that to make a transpose of a table, I would have to write entirely new table code. That would take quite some work, so I put it off. Well, it just occurred to me today that maybe that is not the case, at least some of the time. An example below:

Recall the example bots data that you would need as a bare minimum to build a chat-bot.

Now, let's show the standard table, and then its transpose:
sa: load bots.sw
sa: table[bot,*] starts-with |bot: >
+---------+---------+---------+---------+------------+-----------------+-----------------+-----------------+-----------------+---------------------+-------------+------------+------------+------------------------+-------------+--------------+------------------+-----------------+----------+-----+----------+-------------+
| bot     | name    | mother  | father  | birth-sign | number-siblings | wine-preference | favourite-fruit | favourite-music | favourite-play      | hair-colour | eye-colour | where-live | favourite-holiday-spot | make-of-car | religion     | personality-type | current-emotion | bed-time | age | hungry   | friends     |
+---------+---------+---------+---------+------------+-----------------+-----------------+-----------------+-----------------+---------------------+-------------+------------+------------+------------------------+-------------+--------------+------------------+-----------------+----------+-----+----------+-------------+
| Bella   | Bella   | Mia     | William | Cancer     | 1               | Merlot          | pineapples      | punk            | Endgame             | gray        | hazel      | Sydney     | Paris                  | Porsche     | Christianity | the guardian     | fear            | 8pm      | 31  |          |             |
| Emma    | Emma    | Madison | Nathan  | Capricorn  | 4               | Pinot Noir      | oranges         | hip hop         | No Exit             | red         | gray       | New York   | Taj Mahal              | BMW         | Taoism       | the visionary    | kindness        | 2am      | 29  |          |             |
| Madison | Madison | Mia     | Ian     | Cancer     | 6               | Pinot Noir      | pineapples      | blues           | Death of a Salesman | red         | amber      | Vancouver  | Uluru                  | Bugatti     | Islam        | the performer    | indignation     | 10:30pm  | 23  | starving | Emma, Bella |
+---------+---------+---------+---------+------------+-----------------+-----------------+-----------------+-----------------+---------------------+-------------+------------+------------+------------------------+-------------+--------------+------------------+-----------------+----------+-----+----------+-------------+
Yeah, a line-wrapped mess! Now, this time the transpose:
-- first define some operators:
  Bella |*> #=> apply(|_self>,|bot: Bella>)
  Emma |*> #=> apply(|_self>,|bot: Emma>)
  Madison |*> #=> apply(|_self>,|bot: Madison>)

-- show the table:
sa: table[op,Bella,Emma,Madison] supported-ops starts-with |bot: >
+------------------------+--------------+---------------+---------------------+
| op                     | Bella        | Emma          | Madison             |
+------------------------+--------------+---------------+---------------------+
| name                   | Bella        | Emma          | Madison             |
| mother                 | Mia          | Madison       | Mia                 |
| father                 | William      | Nathan        | Ian                 |
| birth-sign             | Cancer       | Capricorn     | Cancer              |
| number-siblings        | 1            | 4             | 6                   |
| wine-preference        | Merlot       | Pinot Noir    | Pinot Noir          |
| favourite-fruit        | pineapples   | oranges       | pineapples          |
| favourite-music        | punk         | hip hop       | blues               |
| favourite-play         | Endgame      | No Exit       | Death of a Salesman |
| hair-colour            | gray         | red           | red                 |
| eye-colour             | hazel        | gray          | amber               |
| where-live             | Sydney       | New York      | Vancouver           |
| favourite-holiday-spot | Paris        | Taj Mahal     | Uluru               |
| make-of-car            | Porsche      | BMW           | Bugatti             |
| religion               | Christianity | Taoism        | Islam               |
| personality-type       | the guardian | the visionary | the performer       |
| current-emotion        | fear         | kindness      | indignation         |
| bed-time               | 8pm          | 2am           | 10:30pm             |
| age                    | 31           | 29            | 23                  |
| hungry                 |              |               | starving            |
| friends                |              |               | Emma, Bella         |
+------------------------+--------------+---------------+---------------------+
Now it is all nice and pretty!

Now, let's tweak it. In the above case I used all known operators supported by our three bot profiles "supported-ops starts-with |bot: >". We can narrow it down to a list of operators of interest. Here is a worked example:
-- define operators of interest:
sa: list-of |interesting ops> => |op: mother> + |op: father> + |op: hair-colour> + |op: eye-colour> + |op: where-live> + |op: age> + |op: make-of-car>

-- show the table:
sa: table[op,Bella,Emma,Madison] list-of |interesting ops>
+-------------+---------+----------+-----------+
| op          | Bella   | Emma     | Madison   |
+-------------+---------+----------+-----------+
| mother      | Mia     | Madison  | Mia       |
| father      | William | Nathan   | Ian       |
| hair-colour | gray    | red      | red       |
| eye-colour  | hazel   | gray     | amber     |
| where-live  | Sydney  | New York | Vancouver |
| age         | 31      | 29       | 23        |
| make-of-car | Porsche | BMW      | Bugatti   |
+-------------+---------+----------+-----------+
Cool! And it shows some of the power of the BKO scheme, and the usefulness of ket representations of operators (eg: |op: mother>).

Update: I guess they are kind of duals:
foo |Fred> => ...
Fred |op: foo> => apply(|_self>,|Fred>)

Saturday, 4 July 2015

brief object-orientated vs bko example

So, I was reading the not-so-great computer/programming jokes here, and one example was "this is how a programmer announces a new pregnancy":
var smallFry = new Baby();
smallFry.DueDate = new DateTime(2012,06,04);
smallFry.Sex = Sex.Male;
//TODO: fill this in: smallFry.Name = "";
this.Craving = Food.Cereal;
this.Mood = Feelings.Excited;
Hubs.Mood = this.Mood;
So, as a quick exercise, I decided to convert the same knowledge into BKO:
due-date-of |baby: smallFry> => |date: 2012-06-04>
sex-of |baby: smallFry> => |gender: male>
name-of |baby: smallFry> => |>
craving |me> => |food: cereal>
mood-of |me> => |feelings: excited>
mood-of husband-of |me> => mood-of |me>
Some notes:
1) BKO doesn't need "new SomeObject". context.learn() takes care of that if it is a ket it hasn't seen before (in this case |baby: smallFry> and |me>)
2) the BKO representation is "uniform". They all take the form of:
OP KET => SUPERPOSITION
3) there are some interesting similarities between object-oriented and BKO, as should be clear from the example. Though BKO is more "dynamic": in object-oriented code, if you want your objects to support new methods you have to dig into the relevant class(es); in BKO this is never an issue.

Tuesday, 30 June 2015

on emerging patterns

So, consider the English expression "I see a pattern forming". Well, maybe we can encode that idea in BKO. Let's say we have a series of examples. Individually they just look like noise. But, if we add them in a BKO sense, then "I see a pattern forming" corresponds to a distinctive shape emerging from the noise.

The idea is, some of the kets in the examples correspond to signal, and some correspond to noise. As we add them up the signal kets "reinforce" (ie, their coeffs increase), but presumably the noise is random from sample to sample, so the noise kets' coeffs remain small.

We can extract the "signal" using something like this (using some operator foo):
foo |signal> => drop-below[t] (foo |example 1> + foo |example 2> + foo |example 3> + ... + foo |example n>)
I hope that makes sense.
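As a toy sketch of the idea in plain Python (the example superpositions and the threshold are made up, and drop_below is just a stand-in for the BKO drop-below[t] operator):
from collections import Counter

def drop_below(sp, t):
  # keep only kets whose coefficient reaches the threshold:
  return {k: v for k, v in sp.items() if v >= t}

examples = [
  Counter({'signal-a': 1, 'signal-b': 1, 'noise-17': 1}),
  Counter({'signal-a': 1, 'signal-b': 1, 'noise-42': 1}),
  Counter({'signal-a': 1, 'noise-3': 1}),
]
total = sum(examples, Counter())          # the signal kets reinforce, the noise kets stay at 1
print(drop_below(total, 2))               # {'signal-a': 3, 'signal-b': 2}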

Update: I guess I have a closer to real world application of this idea. Consider the list of WWII leaders: Roosevelt, Churchill, Stalin and Hitler.

Then in BKO we might do something like:
sa: everything-we-know-about |*> #=> apply(supported-ops|_self>,|_self>)

sa: the-list-of |WWII leaders> => |Roosevelt> + |Churchill> + |Stalin> + |Hitler>
sa: coeff-sort everything-we-know-about the-list-of |WWII leaders>
And hopefully what emerges is something about WWII.

Update: for want of a better place to put this. The above makes me think of:
sa: everything-we-know-about |*> #=> apply(supported-ops|_self>,|_self>)
sa: map[everything-we-know-about,everything] some |list>
sa: similar[everything] |object>
This should be a quite general way to find similarity between objects. I haven't tested it, but I'm pretty sure it is correct.

Update: again, for want of a better place, we can also do this. Consider we have knowledge on quite a few animals, including what they like to eat. We also have a lot of knowledge on foxes, but we don't know what they eat. But, we can guess:
guess-what-eat |fox> => select[1,1] coeff-sort eat select[1,5] similar[everything] |fox>
ie, in words, find the 5 most similar animals given what we know. Find what they eat. Sort that list. Return the result with the highest coeff.

Update: we can also use this everything as a way to help with language translation. Maybe something like:
best-guess-German-for |*> #=> select[1,1] similar[English-everything,German-everything] |_self>
Kind of hard to test this idea at the moment. I need some way to map words to everything we know about a word. Heh, cortical.io word SDR's would be a nice start! I wonder how they made them?

Update: a little more on the idea of emerging patterns. Simple enough, the time gap between two events.

Start with a web log file. For each IP find the time gap between retrievals. I imagine this will be quite distinctive. eg, a robot slurping down a page every x seconds should have a nice big spike around the x second mark (though how broad this peak is depends on how fine-grained your time sampling is: the wider your bucket size, the sharper the peak).

Next, if you use the random wait, as in wget:
--random-wait               wait from 0.5*WAIT...1.5*WAIT secs between retrievals
then that should have a distinctive pattern too.

Finally, you should get a clear signal of roughly how often you press refresh on a website when you are bored. This will probably be quite noisy, so the smooth operator should help. Also, quite likely to give you an indication of how long you are asleep. Say you normally sleep for about 8 hours. Then there should be at least some kets (probably roughly 1 per day) with a time delta greater than 8 hours. Whether you web surf at work would also potentially show up.

Last example: apparently every person has a distinctive typing pattern. We could find that simply enough, just by measuring the time delta between different characters on a keyboard. eg, what is the time delta when you type "I'm" between "I" and "'", and "'" and "m". Or typing "The" the time between "T" and "h", and "h" and "e". Or typing "rabbit" and the delta between "r" and "a", "a" and "b", "b" and "b" and so on. Presumably, if you have a big enough sample, and you map this to a superposition, then we could run a similar[typing-delta] |person: X> and guess who typed it.
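A quick sketch of what measuring those deltas might look like (the (character, timestamp) input format, the bucket width, and the ket label format are all my assumptions, not anything from the project):
from collections import defaultdict

# keystrokes: list of (character, time-in-seconds) pairs, in the order they were typed
def typing_delta_superposition(keystrokes, bucket=0.05):
  sp = defaultdict(float)
  for (c1, t1), (c2, t2) in zip(keystrokes, keystrokes[1:]):
    delta = round((t2 - t1) / bucket) * bucket        # quantize the time gap into buckets
    sp["%s-%s: %.2f" % (c1, c2, delta)] += 1          # one ket per (character pair, bucketed delta)
  return dict(sp)

sample = [('T', 0.00), ('h', 0.14), ('e', 0.31)]
print(typing_delta_superposition(sample))             # {'T-h: 0.15': 1.0, 'h-e: 0.15': 1.0}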

Sunday, 28 June 2015

ebook letter frequencies

I wrote this one roughly a year ago, but figure I may as well add it to the blog. Given ebooks (mostly from Project Gutenberg), find their letter frequencies. Not super interesting, but let's add it anyway.

Here is the code, and the resulting sw file.
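The linked code isn't reproduced in the post, but the counting step might look roughly like this (just a sketch, not the actual script, and the filename is a placeholder):
from collections import Counter

# count the a-z letters in a plain-text ebook:
def letter_counts(filename):
  with open(filename, 'r', encoding='utf-8', errors='ignore') as f:
    text = f.read().lower()
  return Counter(c for c in text if 'a' <= c <= 'z')

counts = letter_counts('Alice-in-Wonderland.txt')     # placeholder filename
print(counts.most_common(5))
# these counts then become learn rules like: letter-count |Alice-in-Wonderland> => 9083|a> + 1621|b> + ...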

Now a couple of matrices in the console:
sa: load ebook-letter-counts.sw
sa: matrix[letter-count]
[ a ] = [  9083   26317  142241  23325  76232   35669  260565  35285  23871  ] [ Alice-in-Wonderland  ]
[ b ]   [  1621   4766   25476   4829   15699   6847   50138   6117   4763   ] [ Frankenstein         ]
[ c ]   [  2817   9055   37297   7379   21938   11349  72409   10725  6942   ] [ Gone-with-Wind       ]
[ d ]   [  5228   16720  85897   12139  37966   18763  144619  18828  15168  ] [ I-Robot              ]
[ e ]   [  15084  45720  228415  37293  117608  59029  440119  54536  37230  ] [ Moby-Dick            ]
[ f ]   [  2248   8516   34779   5940   20363   9936   73859   9105   6270   ] [ nineteen-eighty-four ]
[ g ]   [  2751   5762   38283   6037   20489   9113   61948   8023   6822   ] [ Shakespeare          ]
[ h ]   [  7581   19400  119901  16803  61947   28093  234301  28284  19130  ] [ Sherlock-Holmes      ]
[ i ]   [  7803   21411  101987  20074  62942   30304  214275  27361  18380  ] [ Tom-Sawyer           ]
[ j ]   [  222    431    1501    346    915     310    2955    421    465    ]
[ k ]   [  1202   1722   18290   2370   8011    3512   32029   3590   3136   ]
[ l ]   [  5053   12603  79783   12870  42338   18395  156371  17276  12426  ]
[ m ]   [  2245   10295  39595   6534   22871   10513  101507  11391  7255   ]
[ n ]   [  7871   24220  123989  21302  65429   31516  231652  29337  20858  ]
[ o ]   [  9245   25050  130230  24555  69648   34287  299732  34452  24251  ]
[ p ]   [  1796   5939   23979   5148   16553   8058   50638   6987   4766   ]
[ q ]   [  135    323    1270    321    1244    397    2998    416    182    ]
[ r ]   [  6400   20708  105074  17003  52446   25861  224994  25378  16262  ]
[ s ]   [  6980   20808  107430  18044  62734   28382  232317  27105  17852  ]
[ t ]   [  11631  29706  157163  28316  86983   42127  311911  39232  28389  ]
[ u ]   [  3867   10340  50453   9483   26933   12903  121631  13527  9376   ]
[ v ]   [  911    3788   15224   3062   8540    4252   36692   4471   2451   ]
[ w ]   [  2696   7335   43623   6761   21174   11225  78929   10754  7735   ]
[ x ]   [  170    675    1700    508    1037    779    4867    567    326    ]
[ y ]   [  2442   7743   37639   6552   16849   9071   90162   9267   6830   ]
[ z ]   [  79     243    1045    208    598     303    1418    150    155    ]

sa: norm |*> #=> normalize[100] letter-count |_self>
sa: map[norm,normalized-letter-count] rel-kets[letter-count]
sa: matrix[normalized-letter-count]
[ a ] = [  7.75   7.75   8.12   7.85   8.11   7.91   7.38   8.16   7.92   ] [ Alice-in-Wonderland  ]
[ b ]   [  1.38   1.4    1.45   1.62   1.67   1.52   1.42   1.41   1.58   ] [ Frankenstein         ]
[ c ]   [  2.4    2.67   2.13   2.48   2.34   2.52   2.05   2.48   2.3    ] [ Gone-with-Wind       ]
[ d ]   [  4.46   4.92   4.9    4.08   4.04   4.16   4.09   4.35   5.03   ] [ I-Robot              ]
[ e ]   [  12.87  13.46  13.04  12.55  12.52  13.09  12.46  12.61  12.36  ] [ Moby-Dick            ]
[ f ]   [  1.92   2.51   1.98   2.0    2.17   2.2    2.09   2.1    2.08   ] [ nineteen-eighty-four ]
[ g ]   [  2.35   1.7    2.18   2.03   2.18   2.02   1.75   1.85   2.26   ] [ Shakespeare          ]
[ h ]   [  6.47   5.71   6.84   5.65   6.59   6.23   6.63   6.54   6.35   ] [ Sherlock-Holmes      ]
[ i ]   [  6.66   6.3    5.82   6.75   6.7    6.72   6.06   6.32   6.1    ] [ Tom-Sawyer           ]
[ j ]   [  0.19   0.13   0.09   0.12   0.1    0.07   0.08   0.1    0.15   ]
[ k ]   [  1.03   0.51   1.04   0.8    0.85   0.78   0.91   0.83   1.04   ]
[ l ]   [  4.31   3.71   4.55   4.33   4.51   4.08   4.43   3.99   4.12   ]
[ m ]   [  1.92   3.03   2.26   2.2    2.43   2.33   2.87   2.63   2.41   ]
[ n ]   [  6.72   7.13   7.08   7.17   6.96   6.99   6.56   6.78   6.92   ]
[ o ]   [  7.89   7.38   7.43   8.26   7.41   7.6    8.48   7.96   8.05   ]
[ p ]   [  1.53   1.75   1.37   1.73   1.76   1.79   1.43   1.62   1.58   ]
[ q ]   [  0.12   0.1    0.07   0.11   0.13   0.09   0.08   0.1    0.06   ]
[ r ]   [  5.46   6.1    6.0    5.72   5.58   5.73   6.37   5.87   5.4    ]
[ s ]   [  5.96   6.13   6.13   6.07   6.68   6.29   6.58   6.27   5.93   ]
[ t ]   [  9.93   8.75   8.97   9.53   9.26   9.34   8.83   9.07   9.42   ]
[ u ]   [  3.3    3.04   2.88   3.19   2.87   2.86   3.44   3.13   3.11   ]
[ v ]   [  0.78   1.12   0.87   1.03   0.91   0.94   1.04   1.03   0.81   ]
[ w ]   [  2.3    2.16   2.49   2.27   2.25   2.49   2.23   2.49   2.57   ]
[ x ]   [  0.15   0.2    0.1    0.17   0.11   0.17   0.14   0.13   0.11   ]
[ y ]   [  2.08   2.28   2.15   2.2    1.79   2.01   2.55   2.14   2.27   ]
[ z ]   [  0.07   0.07   0.06   0.07   0.06   0.07   0.04   0.03   0.05   ]
sa: save ebook-letter-counts--normalized.sw
And I guess that is it.

Update: while we are here, may as well give the simm matrix:
sa: simm |*> #=> 100 self-similar[letter-count] |_self>
sa: map[simm,simm-matrix] rel-kets[letter-count]
sa: matrix[simm-matrix]
[ Alice-in-Wonderland  ] = [  100.0  94.94  96.52  97.32  96.76  97.11  95.57  97.09  97.49  ] [ Alice-in-Wonderland  ]
[ Frankenstein         ]   [  94.94  100.0  95.97  96.01  95.22  96.48  95.24  96.52  95.54  ] [ Frankenstein         ]
[ Gone-with-Wind       ]   [  96.52  95.97  100.0  96.0   96.98  97.01  95.91  97.12  97.17  ] [ Gone-with-Wind       ]
[ I-Robot              ]   [  97.32  96.01  96.0   100.0  97.3   97.87  96.06  97.35  97.12  ] [ I-Robot              ]
[ Moby-Dick            ]   [  96.76  95.22  96.98  97.3   100.0  98.05  96.07  97.39  96.85  ] [ Moby-Dick            ]
[ nineteen-eighty-four ]   [  97.11  96.48  97.01  97.87  98.05  100.0  95.55  97.88  97.1   ] [ nineteen-eighty-four ]
[ Shakespeare          ]   [  95.57  95.24  95.91  96.06  96.07  95.55  100    97.08  95.89  ] [ Shakespeare          ]
[ Sherlock-Holmes      ]   [  97.09  96.52  97.12  97.35  97.39  97.88  97.08  100    97.54  ] [ Sherlock-Holmes      ]
[ Tom-Sawyer           ]   [  97.49  95.54  97.17  97.12  96.85  97.1   95.89  97.54  100    ] [ Tom-Sawyer           ]
So we see that English text has largely the same letter frequencies over different ebooks. Which makes sense of course, but nice to see it visually.

And it would be nice to have an "unscaled-similar[op]" operator. The problem is that would require an entirely new function in the new_context class, which I am reluctant to do, since unscaled-simm is a rare use case. Currently it can be done for special occasions by changing the simm function in new_context.pattern_recognition() to unscaled_simm(A,B).

Friday, 26 June 2015

simple prolog vs BKO example

Just a comparison between some prolog on wikipedia, and the BKO equivalent.

First the prolog:
mother_child(trude, sally).
 
father_child(tom, sally).
father_child(tom, erica).
father_child(mike, tom).
 
sibling(X, Y)      :- parent_child(Z, X), parent_child(Z, Y).
 
parent_child(X, Y) :- father_child(X, Y).
parent_child(X, Y) :- mother_child(X, Y).
This results in the following query being evaluated as true:
 ?- sibling(sally, erica).
 Yes
Now in BKO:
|context> => |context: prolog example>

mother |sally> => |trude>
child |trude> => |sally>

father |sally> => |tom>
child |tom> => |sally>

father |erica> => |tom>
child |tom> +=> |erica>

father |tom> => |mike>
child |mike> => |tom>

parent |*> #=> mother |_self> + father |_self>
sibling |*> #=> child parent |_self>          -- this being the BKO equivalent of: sibling(X, Y) :- parent_child(Z, X), parent_child(Z, Y) 
sibling-of |*> #=> clean drop (child parent |_self> + -|_self>)
Now, put it to use:
sa: sibling |sally>
|sally> + |erica>

sa: sibling |erica>
|sally> + |erica>

sa: sibling-of |sally>
|erica>

sa: sibling-of |erica>
|sally>
Finally, we can ask the question: "is X a sibling of Sally?"
sa: is-a-sibling-of-sally |*> #=> do-you-know mbr(|_self>, sibling-of|sally>)

sa: is-a-sibling-of-sally |erica>
|yes>

sa: is-a-sibling-of-sally |george>
|no>
And I guess that is enough.

Update: perhaps we should tweak our operator names to be a little closer to English and NLP?
mother-of |sally> => |trude>
child-of |trude> => |sally>

father-of |sally> => |tom>
child-of |tom> => |sally>

father-of |erica> => |tom>
child-of |tom> +=> |erica>

father-of |tom> => |mike>
child-of |mike> => |tom>

parents-of |*> #=> mother-of |_self> + father-of |_self>
siblings-of |*> #=> clean drop (child-of parents-of |_self> + -|_self>)

difference and smooth in the MatSumSig model

This time a couple of operators in the MatSumSig model.

Difference:
f[k] => - f[k-1]/2 + f[k] - f[k+1]/2

[ f0  ]       [  1 -1  0  0  0  0  0  0  0  0  0 ] [ f0  ]
[ f1  ]       [ -1  2 -1  0  0  0  0  0  0  0  0 ] [ f1  ]
[ f2  ]       [  0 -1  2 -1  0  0  0  0  0  0  0 ] [ f2  ]
[ f3  ]       [  0  0 -1  2 -1  0  0  0  0  0  0 ] [ f3  ]
[ f4  ]       [  0  0  0 -1  2 -1  0  0  0  0  0 ] [ f4  ]
[ f5  ] = 1/2 [  0  0  0  0 -1  2 -1  0  0  0  0 ] [ f5  ]
[ f6  ]       [  0  0  0  0  0 -1  2 -1  0  0  0 ] [ f6  ]
[ f7  ]       [  0  0  0  0  0  0 -1  2 -1  0  0 ] [ f7  ]
[ f8  ]       [  0  0  0  0  0  0  0 -1  2 -1  0 ] [ f8  ]
[ f9  ]       [  0  0  0  0  0  0  0  0 -1  2 -1 ] [ f9  ]
[ f10 ]       [  0  0  0  0  0  0  0  0  0 -1  1 ] [ f10 ]
And note that we don't have currency conservation, since the columns sum to 0 instead of 1. Originally I thought this thing would be useful (eg, for edge detection in images), but so far, not particularly.
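As a quick sanity check of the stencil, here is a plain-Python sketch (my own illustration, not code from the project), with the same end-point handling as the matrix above:
def difference(f):
  n = len(f)
  out = []
  for k in range(n):
    if k == 0:                              # end points: the diagonal entry is 1 rather than 2
      out.append((f[0] - f[1]) / 2)
    elif k == n - 1:
      out.append((f[n-1] - f[n-2]) / 2)
    else:                                   # interior: -f[k-1]/2 + f[k] - f[k+1]/2
      out.append(f[k] - f[k-1]/2 - f[k+1]/2)
  return out

print(difference([1,1,1,1,1]))              # [0.0, 0.0, 0.0, 0.0, 0.0] -- a constant signal vanishes
print(difference([0,0,1,1,1]))              # [0.0, -0.5, 0.5, 0.0, 0.0] -- the step edge stands out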

Next is smooth, and this one is clearly useful.
Smooth:
f[k] => f[k-1]/4 + f[k]/2 + f[k+1]/4

[ f0  ]       [ 3 1 0 0 0 0 0 0 0 0 0 ] [ f0  ]
[ f1  ]       [ 1 2 1 0 0 0 0 0 0 0 0 ] [ f1  ]
[ f2  ]       [ 0 1 2 1 0 0 0 0 0 0 0 ] [ f2  ]
[ f3  ]       [ 0 0 1 2 1 0 0 0 0 0 0 ] [ f3  ]
[ f4  ]       [ 0 0 0 1 2 1 0 0 0 0 0 ] [ f4  ]
[ f5  ] = 1/4 [ 0 0 0 0 1 2 1 0 0 0 0 ] [ f5  ]
[ f6  ]       [ 0 0 0 0 0 1 2 1 0 0 0 ] [ f6  ]
[ f7  ]       [ 0 0 0 0 0 0 1 2 1 0 0 ] [ f7  ]
[ f8  ]       [ 0 0 0 0 0 0 0 1 2 1 0 ] [ f8  ]
[ f9  ]       [ 0 0 0 0 0 0 0 0 1 2 1 ] [ f9  ]
[ f10 ]       [ 0 0 0 0 0 0 0 0 0 1 3 ] [ f10 ]
Now an example representation in BKO:
smooth |f0> => 0.75|f0> + 0.25|f1>
smooth |f1> => 0.25|f0> + 0.5|f1> + 0.25|f2>
smooth |f2> => 0.25|f1> + 0.5|f2> + 0.25|f3>
smooth |f3> => 0.25|f2> + 0.5|f3> + 0.25|f4>
smooth |f4> => 0.25|f3> + 0.5|f4> + 0.25|f5>
smooth |f5> => 0.25|f4> + 0.5|f5> + 0.25|f6>
smooth |f6> => 0.25|f5> + 0.5|f6> + 0.25|f7>
smooth |f7> => 0.25|f6> + 0.5|f7> + 0.25|f8>
smooth |f8> => 0.25|f7> + 0.5|f8> + 0.25|f9>
smooth |f9> => 0.25|f8> + 0.5|f9> + 0.25|f10>
smooth |f10> => 0.25|f9> + 0.75|f10>

sa: matrix[smooth]
[ f0  ] = [  0.75  0.25  0     0     0     0     0     0     0     0     0     ] [ f0  ]
[ f1  ]   [  0.25  0.5   0.25  0     0     0     0     0     0     0     0     ] [ f1  ]
[ f2  ]   [  0     0.25  0.5   0.25  0     0     0     0     0     0     0     ] [ f2  ]
[ f3  ]   [  0     0     0.25  0.5   0.25  0     0     0     0     0     0     ] [ f3  ]
[ f4  ]   [  0     0     0     0.25  0.5   0.25  0     0     0     0     0     ] [ f4  ]
[ f5  ]   [  0     0     0     0     0.25  0.5   0.25  0     0     0     0     ] [ f5  ]
[ f6  ]   [  0     0     0     0     0     0.25  0.5   0.25  0     0     0     ] [ f6  ]
[ f7  ]   [  0     0     0     0     0     0     0.25  0.5   0.25  0     0     ] [ f7  ]
[ f8  ]   [  0     0     0     0     0     0     0     0.25  0.5   0.25  0     ] [ f8  ]
[ f9  ]   [  0     0     0     0     0     0     0     0     0.25  0.5   0.25  ] [ f9  ]
[ f10 ]   [  0     0     0     0     0     0     0     0     0     0.25  0.75  ] [ f10 ]
Some notes:
1) this clearly has currency conservation, since all columns sum to 1.
2) this thing rapidly approaches a Gaussian smooth, if you iterate it a few times. eg, in image edge enhancement, smooth^20 gave good results. See here. In the case of mapping posting times to 1440 buckets in a day, 300 to 500 smooths gave the best results.
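And a matching plain-Python sketch of smooth, again just my own illustration of the stencil and matrix above:
def smooth(f):
  n = len(f)
  out = []
  for k in range(n):
    if k == 0:                              # end points keep 3/4 of their own value
      out.append(0.75*f[0] + 0.25*f[1])
    elif k == n - 1:
      out.append(0.25*f[n-2] + 0.75*f[n-1])
    else:                                   # interior: f[k-1]/4 + f[k]/2 + f[k+1]/4
      out.append(0.25*f[k-1] + 0.5*f[k] + 0.25*f[k+1])
  return out

def smooth_n(f, n):
  for _ in range(n):
    f = smooth(f)
  return f

spike = [0]*5 + [1] + [0]*5
print([round(x,3) for x in smooth_n(spike, 20)])   # a single spike spreads into a Gaussian-like bump
print(sum(smooth_n(spike, 20)))                    # 1.0 -- currency is conserved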

simm in the MatSumSig model

So, we have basics like simple logic, and set union and intersection in the MatSumSig model. Interestingly, we have a version of simm too! ie, simm becomes somewhat biologically plausible. Heh, but to tell the truth, I don't care if the brain doesn't actually use simm, since I have found it to be very useful.

Anyway, recall one definition of simm:
simm(w,f,g) = \sum_k w[k] min(f[k],g[k]) / max(w*f,w*g)
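In plain Python that definition is simply (assuming w, f and g are equal-length lists of non-negative numbers):
def simm(w, f, g):
  numerator = sum(wk * min(fk, gk) for wk, fk, gk in zip(w, f, g))
  denominator = max(sum(wk * fk for wk, fk in zip(w, f)),
                    sum(wk * gk for wk, gk in zip(w, g)))
  return numerator / denominator if denominator != 0 else 0

print(simm([1,1,1], [1,2,3], [1,2,3]))      # 1.0 -- identical patterns
print(simm([1,1,1], [1,0,0], [0,2,0]))      # 0.0 -- disjoint patterns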
If we ignore the max(w*f,w*g) denominator, here is a MatSumSig version of simm:
[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ pos[x1] ] [ 1 -1 -1  0  0  0  0  0  0  0  0  0 ] [ pos[x1]  ] [  1  1  0  0  0  0  0  0 ] [ f1 ]
                                        [ pos[x2] ] [ 0  0  0  1 -1 -1  0  0  0  0  0  0 ] [ pos[x2]  ] [  1 -1  0  0  0  0  0  0 ] [ g1 ]
                                        [ pos[x3] ] [ 0  0  0  0  0  0  1 -1 -1  0  0  0 ] [ pos[x3]  ] [ -1  1  0  0  0  0  0  0 ] [ f2 ]
                                        [ pos[x4] ] [ 0  0  0  0  0  0  0  0  0  1 -1 -1 ] [ pos[x4]  ] [  0  0  1  1  0  0  0  0 ] [ g2 ]
                                                                                           [ pos[x5]  ] [  0  0  1 -1  0  0  0  0 ] [ f3 ]
                                                                                           [ pos[x6]  ] [  0  0 -1  1  0  0  0  0 ] [ g3 ]
                                                                                           [ pos[x7]  ] [  0  0  0  0  1  1  0  0 ] [ f4 ]
                                                                                           [ pos[x8]  ] [  0  0  0  0  1 -1  0  0 ] [ g4 ]
                                                                                           [ pos[x9]  ] [  0  0  0  0 -1  1  0  0 ]
                                                                                           [ pos[x10] ] [  0  0  0  0  0  0  1  1 ]
                                                                                           [ pos[x11] ] [  0  0  0  0  0  0  1 -1 ]
                                                                                           [ pos[x12] ] [  0  0  0  0  0  0 -1  1 ]
where it is assumed w_k >= 0.

If we extract out the intersection component, see last post, we have:
[I1,I2,I3,I4] = 2* [min(f1,g1), min(f2,g2), min(f3,g3), min(f4,g4)]

[ r ] = [ sigmoid[x1] ] [ w1 w2 w3 w4 ] [ I1 ]
                                        [ I2 ]
                                        [ I3 ]
                                        [ I4 ]
Now, the above can be considered a space-based simm. We can also do a time-based one. I think it goes like this, though I haven't given this much thought in a long, long time!
[ r ] = [ sum[x1,t2] ] [ sigmoid[x1,t1] ] [ 1 -1 -1 ] [ pos[x1] ] [  1  1 ] [ f ]
                                                      [ pos[x2] ] [  1 -1 ] [ g ]
                                                      [ pos[x3] ] [ -1  1 ]
where [ sum[x1,t2] ] is the time based equivalent of [ w1 w2 w3 w4 ]