Thursday 11 February 2016

learning days of the week using if-then machines

Today, an example of learning days of the week using 7 if-then machines. Note that if-then machines are probably overkill if you spell your days correctly. In this post we make use of string similarity computed over letter-ngrams.

Here is the code:
  context weekday if-then machines
  ngrams |*> #=> letter-ngrams[1,2,3] lower-case |_self>

  day |node: 1: 1> => ngrams |Monday>
  day |node: 1: 2> => ngrams |mon>
  day |node: 1: 3> => ngrams |Mo>
  previous |node: 1: *> => |Sunday>
  id |node: 1: *> => |Monday>
  next |node: 1: *> => |Tuesday>
 
  day |node: 2: 1> => ngrams |Tuesday>
  day |node: 2: 2> => ngrams |tue>
  day |node: 2: 3> => ngrams |Tu>
  previous |node: 2: *> => |Monday>
  id |node: 2: *> => |Tuesday>
  next |node: 2: *> => |Wednesday>

  day |node: 3: 1> => ngrams |Wednesday>
  day |node: 3: 2> => ngrams |wed>
  day |node: 3: 3> => ngrams |We>
  previous |node: 3: *> => |Tuesday>
  id |node: 3: *> => |Wednesday>
  next |node: 3: *> => |Thursday>

  day |node: 4: 1> => ngrams |Thursday>
  day |node: 4: 2> => ngrams |thurs>
  day |node: 4: 3> => ngrams |Th>
  previous |node: 4: *> => |Wednesday>
  id |node: 4: *> => |Thursday>
  next |node: 4: *> => |Friday>

  day |node: 5: 1> => ngrams |Friday>
  day |node: 5: 2> => ngrams |fri>
  day |node: 5: 3> => ngrams |Fr>
  previous |node: 5: *> => |Thursday>
  id |node: 5: *> => |Friday>
  next |node: 5: *> => |Saturday>

  day |node: 6: 1> => ngrams |Saturday>
  day |node: 6: 2> => ngrams |sat>
  day |node: 6: 3> => ngrams |Sa>
  previous |node: 6: *> => |Friday>
  id |node: 6: *> => |Saturday>
  next |node: 6: *> => |Sunday>

  day |node: 7: 1> => ngrams |Sunday>
  day |node: 7: 2> => ngrams |sun>
  day |node: 7: 3> => ngrams |Su>
  previous |node: 7: *> => |Saturday>
  id |node: 7: *> => |Sunday>
  next |node: 7: *> => |Monday>

  yesterday |*> #=> previous drop-below[0.65] similar-input[day] ngrams |_self>
  today |*> #=> id drop-below[0.65] similar-input[day] ngrams |_self>
  tomorrow |*> #=> next drop-below[0.65] similar-input[day] ngrams |_self>
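
To make the ngrams operator a little more concrete, here is a rough Python sketch of what "letter-ngrams[1,2,3] lower-case" produces. It is an illustration only, not the project's actual code, and the name letter_ngrams is made up for this example.

```python
from collections import Counter

def letter_ngrams(text, sizes=(1, 2, 3)):
    """Count every contiguous letter n-gram of the given sizes, after lower-casing.

    Hypothetical helper for illustration; it mimics, but is not, the sw operator.
    """
    s = text.lower()
    grams = Counter()
    for n in sizes:
        # a string shorter than n simply contributes no n-grams of that size
        for i in range(len(s) - n + 1):
            grams[s[i:i + n]] += 1
    return grams

print(sorted(letter_ngrams("Mo")))              # ['m', 'mo', 'o']
print(letter_ngrams("Wednesday")["e"])          # 2
print(sum(letter_ngrams("Monday").values()))    # 15
```

Repeated letters are counted rather than merged, which is why "Wednesday" ends up with a coefficient of 2 on "e" and "d".
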
Now, some example usages in the console:
-- correct spelling means coeff = 1:
sa: tomorrow |sun>
1.0|Monday>

-- spelling is not perfect, but close enough (with respect to strings mapped to letter-ngrams) that we can guess what was meant:
sa: tomorrow |tues>
0.667|Wednesday>

-- making use of operator exponentiation. In this case equivalent to "tomorrow tomorrow tomorrow"
-- also note the coeff propagates. If we shoved a "clean" sigmoid in the "yesterday, today and tomorrow" operators, we could change that behaviour.
-- eg: yesterday |*> #=> previous clean drop-below[0.65] similar-input[day] ngrams |_self>
sa: tomorrow^3 |tues>
0.667|Friday>

-- "tomorrow" and "yesterday" are perfect inverses of each other:
sa: tomorrow yesterday |fri>
1.0|Friday>

sa: yesterday tomorrow |fri>
1.0|Friday>

-- mapping abbreviation to the full word:
sa: today |Sa>
|Saturday>

sa: yesterday |thurs>
|Wednesday>

-- typo, "thrusday" instead of "thursday", but the code guessed what we meant.
-- this is one of the main benefits of the if-then machines: you usually don't have to get the input exactly right (depending on how you set the drop-below threshold).
sa: yesterday |thrusday>
0.667|Wednesday>

-- this is an example of over-counting, I suppose you could call it.
-- since "thursd" matched both:
-- day |node: 4: 1> => ngrams |Thursday>
-- day |node: 4: 2> => ngrams |thurs>
-- we briefly mentioned this possibility in my first if-then machine post.
sa: yesterday |thursd>
1.514|Wednesday>

-- Next, we have a couple of function operators that return the current time and date:
sa: current-time
|time: 20:33:16>

sa: current-date
|date: 2016-02-11>

-- and we have another function operator that converts dates to days of the week:
-- what day of the week was New Year's Day?
sa: day-of-the-week |date: 2016-1-1>
|day: Friday>

-- what day of the week is today?:
sa: day-of-the-week current-date
|day: Thursday>

-- what day was it three days ago?
-- NB: not a 100% match because of the "day: " prefix.
sa: yesterday^3 day-of-the-week current-date
0.702|Monday>

-- if you care about that, one fix is to remove the category, i.e. extract the value (demonstrated just below).
-- another fix is to add more patterns to our if-then machines
-- eg:
-- day |node: 2: 4> => ngrams |day: Tuesday>
-- day |node: 2: 5> => ngrams |day: tue>
-- day |node: 2: 6> => ngrams |day: Tu>
-- there are other possible fixes too.
-- eg:
-- ngrams |*> #=> letter-ngrams[1,2,3] lower-case extract-value |_self>
sa: extract-value day-of-the-week current-date
|Thursday>

-- what day was it three days ago?
sa: yesterday^3 extract-value day-of-the-week current-date
1.0|Monday>

-- what day is it five days from now?
sa: tomorrow^5 extract-value day-of-the-week current-date
1.0|Tuesday>

-- now, our "tomorrow, yesterday and today" operators are linear (since they are defined with a |*> rule).
-- so a quick demonstration of that:
sa: tomorrow^3 (|Monday> + |Tuesday> + |Saturday>)
1.0|Thursday> + 1.0|Friday> + |Tuesday>
-- and similarly for the other two operators.

-- finally, weekdays are mod 7:
sa: tomorrow^7 |thurs>
1.0|Thursday>

sa: yesterday^21 |thurs>
|Thursday>
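
For readers who want to poke at the idea outside the console, here is a minimal Python sketch of the 7 machines. It rests on one loud assumption: that similar-input rescales both n-gram collections to sum to 1 and then adds up the element-wise minimum. That measure is a guess, not taken from the project's source, though it does reproduce the coefficients shown above. All names here (MACHINES, step, run) are hypothetical.

```python
from collections import Counter

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
SHORT = ["mon", "tue", "wed", "thurs", "fri", "sat", "sun"]  # middle pattern of each machine

def ngrams(text):
    s = text.lower()
    return Counter(s[i:i + n] for n in (1, 2, 3) for i in range(len(s) - n + 1))

def similarity(f, g):
    # ASSUMED measure: rescale both sides to sum to 1, then sum the element-wise minimum
    sf, sg = sum(f.values()), sum(g.values())
    if not sf or not sg:
        return 0.0
    return sum(min(f[k] / sf, g[k] / sg) for k in f.keys() & g.keys())

# one machine per day: three input patterns plus the previous/id/next outputs
MACHINES = [
    {"patterns": [ngrams(p) for p in (day, SHORT[i], day[:2])],
     "previous": DAYS[i - 1], "id": day, "next": DAYS[(i + 1) % 7]}
    for i, day in enumerate(DAYS)
]

def step(op, state, threshold=0.65):
    """Apply one operator linearly to a {label: coeff} dict."""
    out = {}
    for label, coeff in state.items():
        probe = ngrams(label)
        for m in MACHINES:
            for pattern in m["patterns"]:
                score = similarity(probe, pattern)
                if score >= threshold:  # plays the role of drop-below[0.65]
                    out[m[op]] = out.get(m[op], 0.0) + coeff * score
    return out

def run(op, text, times=1):
    """Apply op the given number of times, starting from a single input string."""
    state = {text: 1.0}
    for _ in range(times):
        state = step(op, state)
    return {k: round(v, 3) for k, v in state.items()}

print(run("next", "tues"))            # {'Wednesday': 0.667}
print(run("next", "tues", times=3))   # {'Friday': 0.667}
print(run("previous", "thursd"))      # {'Wednesday': 1.514}
print(run("next", "thurs", times=7))  # {'Thursday': 1.0}
```

The 1.514 falls out because "thursd" clears the threshold against two patterns of the same machine (roughly 0.714 and 0.8), and the sketch adds the two scores, just as in the over-counting example.
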
I guess that is about it. A fairly simple, somewhat useful, 7 if-then machine system. And an observation I want to make. Usually operator definition time is on the ugly side. As it kind of is above. But operator application time is usually quite clean. I think this is not a bad property to have, though I didn't really design it that way, it was just the way it turned out. So perhaps one use case is that if defining desired operators is too messy for you personally, then find them implemented elsewhere on the net and just web-load the sw file. Heh, assuming I can get anyone interested in the sw file format!

A couple of comments:
1) I had to hand tweak the drop-below threshold to 0.65. If I set it too much higher than that then I wasn't matching things I wanted to. And if I set it to 0.6 then "Sunday" and "Monday" matched.
sa: id drop-below[0.6] similar-input[day] ngrams |Monday>
1.0|Monday> + 0.6|Sunday>
2) If my proposition holds, that if-then machines are a fairly good mathematical approximation to biological neurons, then the above is only a 7 neuron system. The brain has tens of billions of neurons! That is a lot of processing power!! Though our ngrams operator probably needs a few neurons too. I don't really know at this point how many.
3) Here is one way to find the full set of days, given a starting day. Not sure it is all that useful in this particular case, but hey, it probably is for other if-then machine systems.
sa: exp-max[tomorrow] |Monday>
2|Monday> + 1.0|Tuesday> + |Wednesday> + 1.0|Thursday> + |Friday> + 1.0|Saturday> + 1.0|Sunday>
Whether we want to tweak exp-max[] so that it doesn't over-count, I'm not yet sure. Probably cleaner if we did.
4) We can define things like the "day-after-tomorrow" operator:
-- define the operator:
sa: day-after-tomorrow |*> #=> tomorrow^2 day-of-the-week current-date |>

-- invoke it:
sa: day-after-tomorrow |x>
0.702|Saturday>
Note that the 0.702 coeff comes from the "day: " prefix. And we could define plenty of others, like "day-before-yesterday", and so on.
5) For completeness, here is what we now know:
sa: dump
----------------------------------------
|context> => |context: weekday if-then machines>

ngrams |*> #=> letter-ngrams[1,2,3] lower-case |_self>
yesterday |*> #=> previous drop-below[0.65] similar-input[day] ngrams |_self>
today |*> #=> id drop-below[0.65] similar-input[day] ngrams |_self>
tomorrow |*> #=> next drop-below[0.65] similar-input[day] ngrams |_self>
day-after-tomorrow |*> #=> tomorrow^2 day-of-the-week current-date |>

day |node: 1: 1> => |m> + |o> + |n> + |d> + |a> + |y> + |mo> + |on> + |nd> + |da> + |ay> + |mon> + |ond> + |nda> + |day>

day |node: 1: 2> => |m> + |o> + |n> + |mo> + |on> + |mon>

day |node: 1: 3> => |m> + |o> + |mo>

previous |node: 1: *> => |Sunday>
id |node: 1: *> => |Monday>
next |node: 1: *> => |Tuesday>

day |node: 2: 1> => |t> + |u> + |e> + |s> + |d> + |a> + |y> + |tu> + |ue> + |es> + |sd> + |da> + |ay> + |tue> + |ues> + |esd> + |sda> + |day>

day |node: 2: 2> => |t> + |u> + |e> + |tu> + |ue> + |tue>

day |node: 2: 3> => |t> + |u> + |tu>

previous |node: 2: *> => |Monday>
id |node: 2: *> => |Tuesday>
next |node: 2: *> => |Wednesday>

day |node: 3: 1> => |w> + 2|e> + 2|d> + |n> + |s> + |a> + |y> + |we> + |ed> + |dn> + |ne> + |es> + |sd> + |da> + |ay> + |wed> + |edn> + |dne> + |nes> + |esd> + |sda> + |day>

day |node: 3: 2> => |w> + |e> + |d> + |we> + |ed> + |wed>

day |node: 3: 3> => |w> + |e> + |we>

previous |node: 3: *> => |Tuesday>
id |node: 3: *> => |Wednesday>
next |node: 3: *> => |Thursday>

day |node: 4: 1> => |t> + |h> + |u> + |r> + |s> + |d> + |a> + |y> + |th> + |hu> + |ur> + |rs> + |sd> + |da> + |ay> + |thu> + |hur> + |urs> + |rsd> + |sda> + |day>

day |node: 4: 2> => |t> + |h> + |u> + |r> + |s> + |th> + |hu> + |ur> + |rs> + |thu> + |hur> + |urs>

day |node: 4: 3> => |t> + |h> + |th>

previous |node: 4: *> => |Wednesday>
id |node: 4: *> => |Thursday>
next |node: 4: *> => |Friday>

day |node: 5: 1> => |f> + |r> + |i> + |d> + |a> + |y> + |fr> + |ri> + |id> + |da> + |ay> + |fri> + |rid> + |ida> + |day>

day |node: 5: 2> => |f> + |r> + |i> + |fr> + |ri> + |fri>

day |node: 5: 3> => |f> + |r> + |fr>

previous |node: 5: *> => |Thursday>
id |node: 5: *> => |Friday>
next |node: 5: *> => |Saturday>

day |node: 6: 1> => |s> + 2|a> + |t> + |u> + |r> + |d> + |y> + |sa> + |at> + |tu> + |ur> + |rd> + |da> + |ay> + |sat> + |atu> + |tur> + |urd> + |rda> + |day>

day |node: 6: 2> => |s> + |a> + |t> + |sa> + |at> + |sat>

day |node: 6: 3> => |s> + |a> + |sa>

previous |node: 6: *> => |Friday>
id |node: 6: *> => |Saturday>
next |node: 6: *> => |Sunday>

day |node: 7: 1> => |s> + |u> + |n> + |d> + |a> + |y> + |su> + |un> + |nd> + |da> + |ay> + |sun> + |und> + |nda> + |day>

day |node: 7: 2> => |s> + |u> + |n> + |su> + |un> + |sun>

day |node: 7: 3> => |s> + |u> + |su>

previous |node: 7: *> => |Saturday>
id |node: 7: *> => |Sunday>
next |node: 7: *> => |Monday>
----------------------------------------
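
Comments 1) and 4) both come down to how close two strings are in n-gram space, so here is a small Python sketch that checks those two numbers. As before, the similarity measure (rescale both sides to sum to 1, then sum the element-wise minimum) is an assumption that happens to match the console output, not the project's confirmed implementation, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def ngrams(text):
    s = text.lower()
    return Counter(s[i:i + n] for n in (1, 2, 3) for i in range(len(s) - n + 1))

def similarity(f, g):
    # ASSUMED measure: rescale both sides to sum to 1, then sum the element-wise minimum
    sf, sg = sum(f.values()), sum(g.values())
    if not sf or not sg:
        return 0.0
    return sum(min(f[k] / sf, g[k] / sg) for k in f.keys() & g.keys())

def sim(a, b):
    return round(similarity(ngrams(a), ngrams(b)), 3)

# which two full day names are hardest to tell apart?
closest = max(combinations(DAYS, 2), key=lambda pair: sim(*pair))
print(closest, sim(*closest))             # ('Monday', 'Sunday') 0.6

# the cost of the "day: " prefix
print(sim("day: Thursday", "Thursday"))   # 0.702
```

Under this assumed measure, Monday and Sunday are the closest pair of full day names at 0.6, which is why a drop-below threshold of 0.6 lets both through while 0.65 keeps them apart.
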
And I guess that is it for this post.
