Thursday, 14 April 2016

new tool: phi-transform

I'm still stumbling in the dark on this topic! Today I implemented phi-transform. The idea is break an image into k*k image-ngrams, then find the most similar layer-1 superposition (see yesterday's work), and substitute that in, in its place. Then map back to an image, and then average them all. The code is here, but you will need to understand some of the project for it to make sense. Anyway, the goal is to do a little normalization. The more normalization we can do, the easier pattern recognition becomes.

First, our method:
-- create 100 mnist test images (using the test-csv):
$ ./ 100

-- filter the sw-file from yesterday down to the layer-1 results (that is all we need):
$ cd sw-examples/
$ grep "^layer-1 " mnist-100--save-average-categorize--0_5.sw > mnist-100--layer-1--0_5.sw

-- phi-transform the test images (with a run-time of 62 minutes):
$ cd ..
$ ./ 10 work-on-handwritten-digits/test-images/*.bmp

-- make pretty pictures:
$ cd work-on-handwritten-digits/
$ ls -1tr test-images/*.bmp > image-list.txt
$ montage -geometry +2+2 @image-list.txt mnist-test-100.jpg
$ ls -1tr phi-images/* > image-list.txt
$ montage -geometry +2+2 @image-list.txt mnist-test-100--phi-transform-mnist-100-0_5.jpg
And now the results. First the 100 test images:
Now, after the phi-transform:
So, it is sort of normalized a bit. But I'm not sure it is a useful improvement. This phi-transform only used the average-categorize-image-ngrams from 100 training images. What if we ramp that up to 1000? And of course, there are two other variables we don't know the best values for. The image-ngram size, and the average-categorize threshold. Currently we are using 10, and 0.5 respectively. Also, I don't know if the phi-transform is better or worse than just doing a simple Gaussian blur. See my edge-enhance code for that. And of course, there is always the possibility that average-categorize won't help us at all, but I suspect it will.

Wednesday, 13 April 2016

average categorize some mnist digits

So, it's taken some thinking, and computing time, but I have made a start on digit recognition. Though to be honest I'm not 100% convinced this approach will work, but it might. So worth exploring. The outline of my method is. Get some images from the MNIST set. Convert them to images. Convert those to k*k image ngrams (a kind of 2D version of the standard ngram idea). Then those into superpositions. Apply my average-categorize to that, then finally convert back to images, and tile the result. Yeah, my process needs optimizing, but it will do for now, while in the exploratory phase.

Now, the details, and then the results:
-- map MNIST idx format to csv, with tweaked code from here:
$ ./

-- map first 100 mnist image-data to images:
$ ./ 100

-- map images to 10*10 image-ngrams sw file:
$ ./ 10 work-on-handwritten-digits/images/mnist-image-*.bmp

-- rename sw file to something more specific:
$ mv image-ngram-superpositions-10.sw mnist-100--image-ngram-superpositions-10.sw

-- average categorize it (in this case with threshold t = 0.8):
$ ./
sa: load mnist-100--image-ngram-superpositions-10.sw
sa: average-categorize[layer-0,0.8,phi,layer-1]
sa: save mnist-100--save-average-categorize--0_8.sw

-- convert these back to images (currently the source sw file is hard-coded into this script):
$ ./

-- tile the results:
$ cd work-on-handwritten-digits/
$ mv average-categorize-images mnist-100--average-categorize-images--threshold-0_8
$ ls -1tr mnist-100--average-categorize-images--threshold-0_8/* > image-list.txt
$ montage -geometry +2+2 @image-list.txt mnist-100--average-categorize-0_8.jpg
where montage is part of ImageMagick.

Now the results, starting with the 100 images we used as the training set (which resulted in 32,400 layer-0 superpositions/image-ngrams):
Average categorize with t = 0.8, with a run-time of 1 1/2 days, and resulted in 1,238 image ngrams:
Average categorize with t = 0.7, with a run-time of 6 1/2 hours, and resulted in 205 image ngrams:
Average categorize with t = 0.65, with a run-time of 3 hours 20 minutes, and resulted in 103 image ngrams:

Average categorize with t = 0.6, with a run-time of 1 hours 50 minutes, and resulted in 58 image ngrams:
Average categorize with t = 0.5, with a run-time of 43 minutes, and resulted in 18 image ngrams:
Now we have to see if we can do anything useful with these average-categorize image ngrams. The hope is to map test images to linear combinations of the image-ngrams, which we can represent in superposition form (details later), and then do digit classification on that using similar[op]. Though I'm pretty sure that is not sufficient processing to be successful, but it is a starting point. Eventually I'd like to use several layers of average-categorize, which I'm hoping will be more powerful.