Sunday 19 July 2015

introducing the ngram stitch

Otherwise known as the Rambler algo. The basic outline is that you have a big corpus of conversational text, e.g. from a web-board, you process it a little, and then the algo creatively writes (rambles).

I'll just give the algo for the 3/5 ngram stitch (3-word heads, 5-grams), but it should extend in the obvious way to other p/q.
Simply:
extract all the 5-grams from your corpus
start with a seed string
loop {
  extract the last 3 words from the string
  find the set of 5-grams that start with those 3 words, and pick one at random
  append the last 2 words of that 5-gram to your string
}
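The loop above can be sketched in plain Python, independent of the semantic DB. This is a minimal sketch under my own assumptions: the names `learn_stitch_table` and `ngram_stitch` are hypothetical, the learned 5-grams are kept in an ordinary dict rather than a context, and the loop stops early when no 5-gram starts with the last p words. The p and q parameters show the "obvious" extension to other p/q, with p = 3, q = 5 giving the 3/5 stitch.

```python
import random
from collections import defaultdict

def learn_stitch_table(words, p=3, q=5):
    """Map each p-word head to the list of (q - p)-word tails that follow it."""
    table = defaultdict(list)
    # every q-gram in the corpus contributes one (head, tail) pair
    for i in range(len(words) - q + 1):
        head = tuple(words[i:i + p])
        tail = tuple(words[i + p:i + q])
        table[head].append(tail)
    return table

def ngram_stitch(table, seed, steps, p=3, rng=random):
    """Repeatedly append a random tail whose head matches the last p words."""
    out = seed.split()
    for _ in range(steps):
        head = tuple(out[-p:])
        tails = table.get(head)  # .get does not insert missing keys
        if not tails:
            break  # assumption: stop when no q-gram starts with these words
        out.extend(rng.choice(tails))
    return " ".join(out)
```

For example, with the corpus "the cat sat on the mat and the cat sat on the hat", seeding with "the cat sat" and taking 4 steps always gives "the cat sat on the mat and the cat sat on", since each of those heads has only one continuation; a fifth step then picks "the mat" or "the hat" at random. Because this sketch keeps duplicate tails, frequent continuations are picked more often; de-duplicate the lists for a uniform pick.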
Then we use this code to find our ngram pairs (a 3-word head plus the 2-word tail that follows it) and learn them as next-2 rules:
import re

# split a list of words into [3-word head, 2-word tail] pairs, one per 5-gram
def create_ngram_pairs(s):
  return [[" ".join(s[i:i+3])," ".join(s[i+3:i+5])] for i in range(len(s) - 4)]

# learn ngram pairs:
def learn_ngram_pairs(context,filename):
  with open(filename,'r') as f:
    text = f.read()
  # strip <, |, > and =, which clash with ket |...> and rule => syntax
  words = re.sub('[<|>=]','',text)
  for head,tail in create_ngram_pairs(words.split()):
    try:
      context.add_learn("next-2",head,tail)
    except Exception:
      # skip any pair the context fails to learn
      continue

learn_ngram_pairs(C,filename)

dest = "sw-examples/ngram-pairs--webboard.sw"
save_sw(C,dest)
Some example learn rules in that sw file are:
next-2 |Looking forward to> => |that. it> + |doing something> + |it. I> + |when the> + |the Paranoid> + |tomorrow's. ("flow",> + |seeing The> + |tomorrow. 3.1415926...can't> + |you posting> + |the "Geometric> + |it. Breaking> + |being a> + |Joe Biden>
next-2 |forward to that.> => |it was>
next-2 |to that. it> => |was 4>
next-2 |that. it was> => |4 below> + |only 100db>
next-2 |it was 4> => |below zero> + |years ago>
next-2 |was 4 below> => |zero maybe>
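To see how the ngram pairs turn into rules like these, here is a minimal sketch that skips the context object and writes sw-style lines directly. The function `to_sw_lines` is hypothetical, and it makes one simplifying assumption: each tail is kept once per head, in first-seen order. I haven't checked whether `add_learn` records repeated tails differently (e.g. as coefficients), so treat the output as an approximation of the real sw file.

```python
import re

def create_ngram_pairs(s):
    # same as in the post: [3-word head, 2-word tail] for every 5-gram
    return [[" ".join(s[i:i+3]), " ".join(s[i+3:i+5])] for i in range(len(s) - 4)]

def to_sw_lines(text, op="next-2"):
    # strip the same characters the post strips before splitting into words
    words = re.sub('[<|>=]', '', text).split()
    rules = {}  # dicts keep insertion order, so heads appear in corpus order
    for head, tail in create_ngram_pairs(words):
        tails = rules.setdefault(head, [])
        if tail not in tails:  # simplification: keep each tail once
            tails.append(tail)
    # one "op |head> => |t1> + |t2> ..." line per head
    return [f"{op} |{head}> => " + " + ".join(f"|{t}>" for t in tails)
            for head, tails in rules.items()]

for line in to_sw_lines("Looking forward to that. it was 4 below zero maybe"):
    print(line)
```

On that sentence the first line is `next-2 |Looking forward to> => |that. it>` and the last is `next-2 |was 4 below> => |zero maybe>`; the lines in between match the single-tail versions of the rules above.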
And then we need this function operator:
# extract-3-tail |a b c d e f g h> == |f g h>
#
# assumes one is a ket
def extract_3_tail(one):
  split_str = one.label.rsplit(' ',3)
  if len(split_str) < 4:
    return one
  return ket(" ".join(split_str[1:]))
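Since `extract_3_tail` depends on the semantic DB `ket` class, here is a self-contained way to try it. The `ket` class below is a hypothetical stand-in that only provides the `.label` attribute the function uses, and `extract_n_tail` is a hypothetical generalisation to other tail lengths (the p in p/q); with n = 3 it behaves like `extract_3_tail`. Using `rsplit` with a maxsplit means only the last n spaces are split, so the front of a long string is never broken into words.

```python
class ket:
    # hypothetical stand-in for the semantic DB ket: only .label is used here
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"|{self.label}>"

def extract_n_tail(one, n=3):
    # split off at most the last n words; parts[0] holds everything before them
    parts = one.label.rsplit(' ', n)
    if len(parts) <= n:  # n or fewer words: return the ket unchanged
        return one
    return ket(" ".join(parts[1:]))

def extract_3_tail(one):
    return extract_n_tail(one, 3)

print(extract_3_tail(ket("a b c d e f g h")))  # |f g h>
print(extract_3_tail(ket("a b")))              # |a b>
```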
Then after all that preparation, our Rambler algo simplifies to:
ramble |*> #=> merge-labels(|_self> + | > + pick-elt next-2 extract-3-tail |_self>)
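Reading that rule right to left, each BKO operator maps onto one small step. The sketch below is my own plain-Python translation, not the semantic DB implementation: `ramble_step` and the `NEXT_2` dict (built from the example next-2 rules above) are hypothetical, and returning the string unchanged when there is no matching rule is an assumption, since I haven't checked what the BKO version does in that case.

```python
import random

# a few of the example next-2 rules above, as head -> list of tails
NEXT_2 = {
    "forward to that.": ["it was"],
    "to that. it": ["was 4"],
    "that. it was": ["4 below", "only 100db"],
    "it was 4": ["below zero", "years ago"],
    "was 4 below": ["zero maybe"],
}

def ramble_step(label, next2, rng=random):
    # extract-3-tail |_self>: the last three words of the current string
    head = " ".join(label.split()[-3:])
    # next-2: every 2-word tail learned for that head
    tails = next2.get(head, [])
    if not tails:
        return label  # assumption: no rule means no change
    # pick-elt: choose one tail at random
    tail = rng.choice(tails)
    # merge-labels(|_self> + | > + |tail>): join the labels with one space
    return label + " " + tail

s = ramble_step("Looking forward to that.", NEXT_2)
print(s)                        # Looking forward to that. it was
print(ramble_step(s, NEXT_2))   # ends in "4 below" or "only 100db"
```

Applying `ramble_step` to its own output, over and over, should then correspond to applying `ramble` repeatedly.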
Examples in the next post.

BTW, I find it interesting that we can compact the Rambler algo down to one line of BKO.
