I'll just give the algo for the 3/5 ngram stitch, but it should extend in the obvious way to other p/q.
Simply:
    extract all the 5-grams from your seed text
    start with a seed string
    loop {
        extract the last 3 words from string
        find a set of 5-grams that start with those 3 words and pick one randomly
        add the last 2 words from that 5-gram to your string
    }

Then we use this code to find our n-grams:
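    import re

    # split a word list into [3-word head, 2-word tail] pairs, one per 5-gram
    def create_ngram_pairs(s):
        return [[" ".join(s[i:i+3]), " ".join(s[i+3:i+5])] for i in range(len(s) - 4)]

    # learn ngram pairs:
    def learn_ngram_pairs(context, filename):
        with open(filename, 'r') as f:
            text = f.read()
        # strip the characters that are special in sw notation
        words = re.sub('[<|>=]', '', text)
        for ngram_pairs in create_ngram_pairs(words.split()):
            try:
                head, tail = ngram_pairs
                context.add_learn("next-2", head, tail)
            except Exception:
                # skip any pair that fails to learn
                continue

    # C (the context), filename and save_sw are assumed to be defined already
    learn_ngram_pairs(C, filename)

    dest = "sw-examples/ngram-pairs--webboard.sw"
    save_sw(C, dest)

Some example learn rules in that sw are: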
    next-2 |Looking forward to> => |that. it> + |doing something> + |it. I> + |when the> + |the Paranoid> + |tomorrow's. ("flow",> + |seeing The> + |tomorrow. 3.1415926...can't> + |you posting> + |the "Geometric> + |it. Breaking> + |being a> + |Joe Biden>
    next-2 |forward to that.> => |it was>
    next-2 |to that. it> => |was 4>
    next-2 |that. it was> => |4 below> + |only 100db>
    next-2 |it was 4> => |below zero> + |years ago>
    next-2 |was 4 below> => |zero maybe>

And then we need this function operator:
    # extract-3-tail |a b c d e f g h> == |f g h>
    #
    # assumes one is a ket
    def extract_3_tail(one):
        split_str = one.label.rsplit(' ', 3)
        if len(split_str) < 4:
            return one
        return ket(" ".join(split_str[1:]))

Then after all that preparation, our Rambler algo simplifies to:
    ramble |*> #=> merge-labels(|_self> + | > + pick-elt next-2 extract-3-tail |_self>)

Examples in the next post.
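For anyone without the sw/BKO machinery to hand, the whole pipeline can be sketched in plain Python. This is only an illustration under assumptions, not the project's code: a dict of lists stands in for the next-2 learn rules, random.choice stands in for pick-elt, strings stand in for kets, and the names learn_next_2, ramble_step and ramble are made up for this sketch.

```python
import random
from collections import defaultdict

def create_ngram_pairs(words):
    # same slicing as in the post: 3-word head, 2-word tail for every 5-gram
    return [(" ".join(words[i:i+3]), " ".join(words[i+3:i+5]))
            for i in range(len(words) - 4)]

def learn_next_2(text):
    # hypothetical stand-in for the next-2 learn rules: head -> list of tails
    next_2 = defaultdict(list)
    for head, tail in create_ngram_pairs(text.split()):
        next_2[head].append(tail)
    return next_2

def extract_3_tail(s):
    # plain-string version of extract-3-tail: keep the last 3 words
    parts = s.rsplit(' ', 3)
    if len(parts) < 4:
        return s
    return " ".join(parts[1:])

def ramble_step(next_2, s, rng=random):
    # one application of the ramble rule: look up the tail, pick a continuation
    tails = next_2.get(extract_3_tail(s))
    if not tails:
        return s  # dead end: no 5-gram starts with these 3 words
    return s + " " + rng.choice(tails)

def ramble(next_2, seed, steps, rng=random):
    # apply the ramble step repeatedly, starting from the seed string
    s = seed
    for _ in range(steps):
        s = ramble_step(next_2, s, rng)
    return s

if __name__ == "__main__":
    table = learn_next_2("it was 4 below zero maybe but it was 4 years ago now")
    print(ramble(table, "it was 4", 3))
```

In this sketch a dead end just returns the string unchanged, which is a choice made here for simplicity and not necessarily what the BKO rule does.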
BTW, I find it interesting that we can compact down the Rambler algo to 1 line of BKO.
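On the "other p/q" remark at the top, here is a rough sketch of how the pair extraction and tail extraction might be parameterised. It assumes p is the number of overlap words and q the full n-gram length (so the post's case is p=3, q=5). The function names are made up for illustration and work on plain strings, not kets.

```python
def create_pq_pairs(words, p, q):
    # head = first p words of each q-gram, tail = the remaining q - p words
    if not 0 < p < q:
        raise ValueError("need 0 < p < q")
    return [(" ".join(words[i:i+p]), " ".join(words[i+p:i+q]))
            for i in range(len(words) - q + 1)]

def extract_p_tail(s, p):
    # keep the last p words of s (the whole string if it is shorter)
    return " ".join(s.split()[-p:])
```

With p=3 and q=5 the first function produces the same head/tail split as create_ngram_pairs above, and the second plays the role of extract-3-tail.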