Thursday 12 March 2015

supervised learning of iris classes

Last post I gave the general algo for supervised pattern recognition. Kind of a big claim, so let's try an example bigger than truth tables. I decided iris types would be a good example. The data set consists of 150 examples, 50 for each type of iris, so I used the first 40 of each type as the training set and the remaining 10 of each type as the test cases.

Now, we need some Python to convert the data into sw form.
Python:
import sys

from the_semantic_db_code import *
from the_semantic_db_functions import *
from the_semantic_db_processor import *

C = context_list("iris pattern recognition")

data_file = "data/iris-data/bezdekIris.data"

def learn_data(C,filename):
  k = 0
  with open(filename,'r') as f:
    for line in f:
      try:
        sepal_len,sepal_width,petal_len,petal_width,iris_class = line.strip().split(',')
        k += 1
        node = ket("node-" + str(k))
        r = ket("sepal-length: " +  sepal_len) + ket("sepal-width: " + sepal_width) + ket("petal-length: " + petal_len) + ket("petal-width: " + petal_width)
        if ((k - 1) % 50) < 40:              # first 40 of each class: training set
          C.learn("pattern",node,r)
          C.learn("M",node,iris_class)
        else:                                # last 10 of each class: test cases
          C.learn("input-pattern",node,r)
      except ValueError:                     # skip blank or malformed lines
        continue

learn_data(C,data_file)

# save the result:
sw_file = "sw-examples/iris-pattern-recognition.sw"
save_sw(C,sw_file)
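A quick sanity check of the split rule used in learn_data, as a standalone sketch. It assumes the rows are numbered 1 to 150 in file order and grouped 50 per class, which is what the node numbers in the results suggest. The names train and test are just for illustration.

```python
# Reproduce the split rule from learn_data: (k - 1) % 50 < 40.
# Assumption: rows are numbered 1..150 and grouped 50 per class, in file order.
train = [k for k in range(1, 151) if (k - 1) % 50 < 40]
test = [k for k in range(1, 151) if (k - 1) % 50 >= 40]

print(len(train), len(test))   # 120 30
print(test[:10])               # [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
```

So the test cases should be node-41 to node-50, node-91 to node-100, and node-141 to node-150.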
Now for the results:
sa: load iris-pattern-recognition.sw
sa: h2 |*> #=> coeff-sort M similar[input-pattern,pattern] |_self>
sa: discrimination |*> #=> push-float discrim h2 |_self>

-- NB: use split to save typing
-- the Iris-setosa test cases:
sa: table[input,h2,discrimination] split |node-41 node-42 node-43 node-44 node-45 node-46 node-47 node-48 node-49 node-50> 
+---------+-------------------------------------------------------------+----------------+
| input   | h2                                                          | discrimination |
+---------+-------------------------------------------------------------+----------------+
| node-41 | 4.25 Iris-setosa, 0.25 Iris-versicolor                      | 4              |
| node-42 | 2 Iris-setosa, 0.50 Iris-versicolor                         | 1.5            |
| node-43 | 8.25 Iris-setosa, Iris-virginica, 0.75 Iris-versicolor      | 7.25           |
| node-44 | 3.50 Iris-setosa, 0.25 Iris-versicolor                      | 3.25           |
| node-45 | 3.75 Iris-setosa, 0.50 Iris-virginica                       | 3.25           |
| node-46 | 5.75 Iris-setosa, 2.25 Iris-virginica, 1.50 Iris-versicolor | 3.5            |
| node-47 | 9.25 Iris-setosa, 0.50 Iris-virginica                       | 8.75           |
| node-48 | 10 Iris-setosa, Iris-virginica, 0.75 Iris-versicolor        | 9              |
| node-49 | 9.50 Iris-setosa                                            | 9.5            |
| node-50 | 10 Iris-setosa, 0.50 Iris-versicolor, 0.50 Iris-virginica   | 9.5            |
+---------+-------------------------------------------------------------+----------------+

-- the Iris-versicolor test cases:
sa: table[input,h2,discrimination] split |node-91 node-92 node-93 node-94 node-95 node-96 node-97 node-98 node-99 node-100>
+----------+-------------------------------------------------------------+----------------+
| input    | h2                                                          | discrimination |
+----------+-------------------------------------------------------------+----------------+
| node-91  | 2.50 Iris-versicolor, 0.50 Iris-setosa, 0.50 Iris-virginica | 2              |
| node-92  | 4.25 Iris-versicolor, 3 Iris-virginica, 1.25 Iris-setosa    | 1.25           |
| node-93  | 2.25 Iris-versicolor, Iris-virginica, 0.25 Iris-setosa      | 1.25           |
| node-94  | 2.50 Iris-versicolor, 1.25 Iris-setosa                      | 1.25           |
| node-95  | 4.50 Iris-versicolor, Iris-virginica                        | 3.5            |
| node-96  | 2.75 Iris-versicolor, 2.50 Iris-virginica, 1.75 Iris-setosa | 0.25           |
| node-97  | 4.25 Iris-versicolor, 0.75 Iris-setosa, 0.75 Iris-virginica | 3.5            |
| node-98  | 4 Iris-versicolor, 0.75 Iris-virginica, 0.25 Iris-setosa    | 3.25           |
| node-99  | 1.50 Iris-setosa, 1.25 Iris-versicolor, 0.75 Iris-virginica | 0.25           |
| node-100 | 4.50 Iris-versicolor, 2.25 Iris-virginica, 0.50 Iris-setosa | 2.25           |
+----------+-------------------------------------------------------------+----------------+

-- the Iris-virginica test cases:
sa: table[input,h2,discrimination] split |node-141 node-142 node-143 node-144 node-145 node-146 node-147 node-148 node-149 node-150>
+----------+-------------------------------------------------------------+----------------+
| input    | h2                                                          | discrimination |
+----------+-------------------------------------------------------------+----------------+
| node-141 | 2.75 Iris-virginica, 1.50 Iris-versicolor, Iris-setosa      | 1.25           |
| node-142 | 3 Iris-virginica, 1.25 Iris-versicolor, Iris-setosa         | 1.75           |
| node-143 | 3 Iris-virginica, 1.75 Iris-versicolor, 0.25 Iris-setosa    | 1.25           |
| node-144 | 2.50 Iris-virginica, Iris-versicolor, 0.75 Iris-setosa      | 1.5            |
| node-145 | 2 Iris-virginica, Iris-versicolor, 0.25 Iris-setosa         | 1              |
| node-146 | 3.75 Iris-virginica, 2.25 Iris-versicolor, 1.25 Iris-setosa | 1.5            |
| node-147 | 3.25 Iris-virginica, 1.75 Iris-versicolor                   | 1.5            |
| node-148 | 4.25 Iris-virginica, 1.75 Iris-versicolor, 1.25 Iris-setosa | 2.5            |
| node-149 | 2.25 Iris-setosa, 1.75 Iris-virginica, 0.50 Iris-versicolor | 0.5            |
| node-150 | 5.75 Iris-virginica, 2.50 Iris-versicolor, 1.25 Iris-setosa | 3.25           |
+----------+-------------------------------------------------------------+----------------+
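Judging from the tables, the discrimination column looks like the gap between the largest and second-largest coefficient in h2, with a missing runner-up counted as 0. Here is a minimal Python sketch of that reading. The function name and the dict representation are hypothetical, and this is inferred from the output above rather than taken from the discrim operator's source. Note that a class listed without a number in the h2 column has coefficient 1.

```python
def discrimination(coeffs):
    """Gap between the largest and second-largest coefficient.

    coeffs: dict mapping class label -> coefficient (as in the h2 column).
    A missing runner-up is treated as 0 (an assumption based on node-49).
    """
    ranked = sorted(coeffs.values(), reverse=True)
    if not ranked:
        return 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - second

# Rows copied from the tables above:
node_41 = {"Iris-setosa": 4.25, "Iris-versicolor": 0.25}
node_49 = {"Iris-setosa": 9.50}
node_99 = {"Iris-setosa": 1.50, "Iris-versicolor": 1.25, "Iris-virginica": 0.75}

print(discrimination(node_41))   # 4.0
print(discrimination(node_49))   # 9.5
print(discrimination(node_99))   # 0.25
```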
Notes:
1) We have two wrong answers, node-99 and node-149. Both have small discrimination (0.25 and 0.5 respectively), so the code knows its results in those cases might be in error. A low score is only a warning, though: node-96 also has a discrimination of 0.25 and is classified correctly. Otherwise, the discrimination is really good! And, 100*28/30 = 93.3% success rate.
2) I tried the h operator:
h |*> #=> M drop-below[0.7] similar[input-pattern,pattern] |_self>
but it gave terrible results!
3) We can interpret the h2 operator:
h2 |*> #=> coeff-sort M similar[input-pattern,pattern] |_self>
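as a similarity-weighted vote: M maps each matching training node to its class, and the similarity coefficients are summed per class.
E.g., observe the results from similar applied to node-150 (the leading 100 scales the coefficients to percentages):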
sa: table[node,coeff] 100 similar[input-pattern,pattern] |node-150>
+----------+-------+
| node     | coeff |
+----------+-------+
| node-62  | 50    |
| node-71  | 50    |
| node-117 | 50    |
| node-128 | 50    |
| node-139 | 50    |
| node-2   | 25    |
| node-13  | 25    |
| node-14  | 25    |
| node-26  | 25    |
| node-39  | 25    |
| node-67  | 25    |
| node-76  | 25    |
| node-78  | 25    |
| node-84  | 25    |
| node-85  | 25    |
| node-89  | 25    |
| node-102 | 25    |
| node-103 | 25    |
| node-104 | 25    |
| node-105 | 25    |
| node-106 | 25    |
| node-108 | 25    |
| node-109 | 25    |
| node-111 | 25    |
| node-113 | 25    |
| node-115 | 25    |
| node-124 | 25    |
| node-126 | 25    |
| node-127 | 25    |
| node-130 | 25    |
| node-134 | 25    |
| node-136 | 25    |
| node-138 | 25    |
+----------+-------+
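To check that reading of h2, here is a small Python sketch that redoes the sum by hand from the table above. It assumes nodes 1-50, 51-100 and 101-150 are Iris-setosa, Iris-versicolor and Iris-virginica in that order (consistent with the test-case tables). The helper iris_class is hypothetical and stands in for the M operator.

```python
from collections import defaultdict

# Coefficients from the table above, as percentages.
similar_150 = {
    50: [62, 71, 117, 128, 139],
    25: [2, 13, 14, 26, 39, 67, 76, 78, 84, 85, 89,
         102, 103, 104, 105, 106, 108, 109, 111, 113, 115,
         124, 126, 127, 130, 134, 136, 138],
}

CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def iris_class(k):
    # Assumption: 50 nodes per class, in file order (stand-in for M).
    return CLASSES[(k - 1) // 50]

# Sum the similarity coefficients per class.
votes = defaultdict(float)
for percent, nodes in similar_150.items():
    for k in nodes:
        votes[iris_class(k)] += percent / 100

for label, total in sorted(votes.items(), key=lambda kv: -kv[1]):
    print(total, label)
# 5.75 Iris-virginica
# 2.5 Iris-versicolor
# 1.25 Iris-setosa
```

That matches the node-150 row in the Iris-virginica table, and the gap between the top two, 5.75 - 2.5 = 3.25, matches its discrimination.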
That's it! It works! I guess next I should try an even larger example.
