Tuesday 4 August 2015

letter 3-grams that precede a full stop

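Just a quick one, using our letter 3/5 ngram structures to find those 3-grams that precede both the comma and the full stop. More precisely, the two kets in the query below are the 2-letter strings ", " and ". ", so we are looking for 3-grams that are followed by each of them somewhere in the text.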

Simply enough:
sa: load ngram-letter-pairs--sherlock-holmes.sw
sa: find-inverse[next-2-letters]
sa: table[3gram] ket-sort common[inverse-next-2-letters] (|, > + |. >)
+-------+
| 3gram |
+-------+
| 2nd   |
| 3rd   |
| 4th   |
|  be   |
|  by   |
|  do   |
|  go   |
|  he   |
|  in   |
|  is   |
|  it   |
|  me   |
|  No   |
|  no   |
|  of   |
|  on   |
|  pa   |
|  so   |
|  to   |
|  up   |
|  us   |
| "No   |
| '85   |
| -by   |
| -tm   |
| ace   |
| ach   |
| ack   |
| act   |
| acy   |
| ade   |
| ads   |
| ady   |
| afe   |
| aff   |
| age   |
| ago   |
| aid   |
| ail   |
| aim   |
| ain   |
| air   |
| ait   |
| ake   |
| ale   |
| alk   |
| all   |
| als   |
| ame   |
| amp   |
| and   |
| ane   |
| ang   |
| ank   |
| ans   |
| ant   |
| ape   |
| aph   |
| aps   |
| ard   |
| are   |
| ark   |
| arm   |
| ars   |
| art   |
| ary   |
| ase   |
| ash   |
| ask   |
| ass   |
| ast   |
| asy   |
| ata   |
| ate   |
| ath   |
| ave   |
| awn   |
| ays   |
| aze   |
| bad   |
| bag   |
| bed   |
| ber   |
| ble   |
| bly   |
| box   |
| bts   |
| bye   |
| cal   |
| can   |
| cap   |
| cat   |
| cco   |
| ced   |
| ces   |
| cks   |
| cle   |
| cts   |
| d I   |
| day   |
| dea   |
| ded   |
| dee   |
| den   |
| der   |
| des   |
| dge   |
| dia   |
| did   |
| dle   |
| dly   |
| dog   |
| don   |
| dor   |
| dow   |
| ead   |
| eak   |
| eal   |
| eam   |
| ear   |
| eat   |
| eau   |
| ece   |
| ech   |
| eck   |
| ect   |
| eds   |
| eed   |
| eek   |
| eel   |
| een   |
| eep   |
| eer   |
| ees   |
| eet   |
| eft   |
| egs   |
| eks   |
| eld   |
| elf   |
| ell   |
| elp   |
| els   |
| ely   |
| ems   |
| end   |
| ens   |
| ent   |
| eps   |
| ere   |
| ern   |
| ers   |
| ery   |
| esh   |
| esk   |
| ess   |
| est   |
| ete   |
| ets   |
| ety   |
| eve   |
| ews   |
| ext   |
| eye   |
| F.3   |
| fed   |
| fee   |
| fer   |
| fle   |
| fly   |
| for   |
| ful   |
| gar   |
| ged   |
| gel   |
| ger   |
| ges   |
| ght   |
| gle   |
| gro   |
| gth   |
| gue   |
| had   |
| ham   |
| hat   |
| haw   |
| hed   |
| hem   |
| hen   |
| her   |
| hes   |
| him   |
| hin   |
| hip   |
| his   |
| hod   |
| hop   |
| hot   |
| hts   |
| hur   |
| hus   |
| ial   |
| ian   |
| ica   |
| ice   |
| ich   |
| ick   |
| ics   |
| ida   |
| ide   |
| ids   |
| ied   |
| ief   |
| ier   |
| ies   |
| iew   |
| ife   |
| iff   |
| ify   |
| ign   |
| ike   |
| ild   |
| ile   |
| ill   |
| ils   |
| ily   |
| ime   |
| ina   |
| ind   |
| ine   |
| ing   |
| ink   |
| Inn   |
| ins   |
| int   |
| iny   |
| ion   |
| ips   |
| ird   |
| ire   |
| irl   |
| irm   |
| irs   |
| irt   |
| iry   |
| ise   |
| ish   |
| iss   |
| ist   |
| ite   |
| ith   |
| its   |
| ity   |
| ium   |
| ius   |
| ive   |
| ize   |
| ked   |
| ken   |
| ker   |
| ket   |
| key   |
| kly   |
| lar   |
| law   |
| lay   |
| lds   |
| led   |
| Lee   |
| leg   |
| lem   |
| len   |
| ler   |
| les   |
| ley   |
| lic   |
| lip   |
| lls   |
| lly   |
| lor   |
| low   |
| lse   |
| lso   |
| lts   |
| lue   |
| lve   |
| mad   |
| mal   |
| man   |
| mas   |
| may   |
| med   |
| men   |
| mer   |
| mes   |
| met   |
| mly   |
| mon   |
| mpt   |
| n 4   |
| nah   |
| nal   |
| nce   |
| nch   |
| ncy   |
| nds   |
| ndy   |
| ned   |
| nee   |
| nel   |
| nen   |
| ner   |
| nes   |
| net   |
| ney   |
| nge   |
| ngs   |
| nks   |
| nly   |
| nny   |
| not   |
| now   |
| nse   |
| nth   |
| nto   |
| nts   |
| nty   |
| nue   |
| oad   |
| oak   |
| oal   |
| oat   |
| obe   |
| ock   |
| ods   |
| ody   |
| oes   |
| ofa   |
| off   |
| ofs   |
| oke   |
| oks   |
| old   |
| ole   |
| ome   |
| oms   |
| one   |
| ong   |
| ons   |
| ont   |
| ood   |
| oof   |
| ook   |
| ool   |
| oom   |
| oon   |
| oor   |
| oot   |
| ope   |
| ord   |
| ore   |
| ork   |
| orm   |
| orn   |
| ors   |
| ort   |
| ory   |
| ose   |
| oss   |
| ost   |
| ote   |
| oth   |
| ots   |
| oul   |
| our   |
| ous   |
| out   |
| ove   |
| owd   |
| own   |
| ows   |
| ped   |
| pen   |
| per   |
| pes   |
| pet   |
| pew   |
| phy   |
| ple   |
| ply   |
| pty   |
| que   |
| r A   |
| r's   |
| ram   |
| ran   |
| rap   |
| rat   |
| rce   |
| rch   |
| rds   |
| red   |
| ree   |
| ren   |
| rer   |
| res   |
| ret   |
| rey   |
| rge   |
| rks   |
| rld   |
| rly   |
| rms   |
| rol   |
| rop   |
| ror   |
| row   |
| rse   |
| rst   |
| rth   |
| rts   |
| rty   |
| rue   |
| rug   |
| run   |
| rve   |
| sal   |
| saw   |
| say   |
| sco   |
| sed   |
| see   |
| sen   |
| ser   |
| ses   |
| set   |
| sex   |
| she   |
| sin   |
| sir   |
| sit   |
| six   |
| sky   |
| sly   |
| som   |
| son   |
| sts   |
| sty   |
| sun   |
| t I   |
| tal   |
| tar   |
| tch   |
| ted   |
| tel   |
| ten   |
| tep   |
| ter   |
| tes   |
| ths   |
| thy   |
| tic   |
| tie   |
| tle   |
| tly   |
| tol   |
| ton   |
| too   |
| tor   |
| tre   |
| try   |
| tte   |
| two   |
| ual   |
| ubt   |
| uch   |
| uct   |
| ued   |
| ues   |
| uff   |
| ugh   |
| ull   |
| ulp   |
| ult   |
| umb   |
| ume   |
| umn   |
| und   |
| une   |
| ung   |
| unk   |
| unt   |
| ure   |
| urn   |
| urs   |
| urt   |
| ury   |
| use   |
| uth   |
| uty   |
| van   |
| ved   |
| vel   |
| ven   |
| ver   |
| ves   |
| vil   |
| War   |
| was   |
| way   |
| wed   |
| wer   |
| wit   |
| xes   |
| yed   |
| yer   |
| yes   |
| Yes   |
| yet   |
| yle   |
| you   |
| zes   |
+-------+
So we see there are a lot of them, but not all possible combinations. I don't know, but to me this is starting to feel like grammar. Grammar seems to be "these structures are common, and therefore likely correct; those structures are rare, and therefore likely wrong". Sure, this is not exactly grammar yet, but it feels like we are getting closer. Anyway, I will keep thinking about it.
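For anyone who wants to poke at the same idea outside the console, here is a rough Python sketch of what the query above does. It is not the project's implementation: the function names `inverse_next_letters` and `common_heads` are made up for illustration, the sample sentence stands in for the Sherlock Holmes text, and plain `sorted` is only assumed to be close to what `ket-sort` does.

```python
from collections import defaultdict


def inverse_next_letters(text, p=3, q=5):
    """Map each (q - p)-letter continuation to the set of p-grams seen before it.

    With p=3, q=5 this plays the role of inverse-next-2-letters (an assumption
    about how the 3/5 structure is laid out).
    """
    inverse = defaultdict(set)
    for i in range(len(text) - q + 1):
        head = text[i:i + p]        # the p-gram
        tail = text[i + p:i + q]    # the letters that follow it
        inverse[tail].add(head)
    return inverse


def common_heads(inverse, tails):
    """Return the sorted p-grams that precede every continuation in `tails`."""
    sets = [inverse.get(t, set()) for t in tails]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical stand-in for the real corpus.
sample = "He said yes, and she said yes. I said no, and he said no. Then we went up."
inv = inverse_next_letters(sample)
result = common_heads(inv, [", ", ". "])
print(result)
```

On the toy sentence only " no" and "yes" survive the intersection, since those are the only 3-grams that appear before both ", " and ". ".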

Maybe down the line I will try for a big set of ngram structures, namely the full set of p/q ngram structures where:
p is in {1,2,3,4,5,6,7,8,9}
and
q is in {2,3,4,5,6,7,8,9,10}
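As a rough idea of how big that set would be, here is a small Python sketch that enumerates the p/q pairs and builds one structure per pair. It assumes that a p/q structure maps each p-gram to the (q - p) letters that follow it, which in turn requires p < q; that reading is my guess from the 3/5 case, and `pq_pairs` and `next_letters` are hypothetical helper names.

```python
from collections import defaultdict

P_VALUES = range(1, 10)   # p in {1, ..., 9}
Q_VALUES = range(2, 11)   # q in {2, ..., 10}


def pq_pairs():
    """All (p, q) combinations with p < q (assumed constraint)."""
    return [(p, q) for p in P_VALUES for q in Q_VALUES if p < q]


def next_letters(text, p, q):
    """Map each p-gram to the set of (q - p)-letter strings that follow it."""
    table = defaultdict(set)
    for i in range(len(text) - q + 1):
        table[text[i:i + p]].add(text[i + p:i + q])
    return table


pairs = pq_pairs()
# Toy text only; the real thing would use the full corpus.
structures = {(p, q): next_letters("the cat sat.", p, q) for p, q in pairs}
print(len(pairs))
```

Under the p < q assumption that comes to 45 structures, so the full set is large but not unmanageable.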
