Finally, what I have been building to with my similar[op] function operator:
"we can make a lot of progress in pattern recognition if we can find mappings from objects to well-behaved, deterministic, distinctive superpositions"
1) well-behaved means similar objects return similar superpositions
2) deterministic means if you feed in the same object, you get essentially the same superposition (though there is a little leeway, in that it doesn't have to be 100% identical on each run, just close)
3) distinctive means different object types have easily distinguishable superpositions.
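To make the three properties concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: to_superposition is a hypothetical stand-in mapping (plain word counts), the three "pages" are made-up strings, and similarity is one simple choice of measure (shared weight divided by the larger total weight), not necessarily what similar[op] computes.

```python
from collections import Counter

def to_superposition(text):
    # Hypothetical stand-in mapping: a superposition of lowercase word counts.
    return Counter(text.lower().split())

def similarity(f, g):
    # Assumed measure: shared weight / larger total weight, so the result is in [0, 1].
    if not f or not g:
        return 0.0
    shared = sum(min(f[k], g[k]) for k in f.keys() & g.keys())
    return shared / max(sum(f.values()), sum(g.values()))

# Made-up example objects: two near-identical pages and one unrelated page.
page_a = "breaking news markets rally as rates hold steady"
page_b = "breaking news markets slide as rates hold steady"
page_c = "recipe slow cooked lamb with rosemary and garlic"

# (2) deterministic: the same object gives the same superposition.
same_each_run = to_superposition(page_a) == to_superposition(page_a)

# (1) well-behaved: similar objects give similar superpositions.
near = similarity(to_superposition(page_a), to_superposition(page_b))

# (3) distinctive: different kinds of object are easy to tell apart.
far = similarity(to_superposition(page_a), to_superposition(page_c))

print(same_each_run, near, far)  # True 0.875 0.0
```

The only thing the rest of the machinery needs from a mapping is that these three checks come out the right way round: near should be high, far should be low, and repeated runs should agree.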
And we note that these are all well satisfied by our webpage example. Presumably, with not much work, we can achieve the same for many types of text document. The real-world case of images and sounds will of course be much harder! The good news is that we only need to solve it once. Once we have good mappings from images/sounds to superpositions, I suspect higher-order pattern recognition (i.e., more abstract concepts) should be relatively easy.
More uses of the similarity metric coming up!
Update: the exact details of the superposition are irrelevant. They can be anything! They only need to satisfy (1), (2) and (3), and then we are done.
Update: we now have a few of these. For strings, letter-ngrams[1,2,3] works pretty well. For DNA sequences, maybe we need longer ngrams, like letter-ngrams[1,2,3,4,5,6] or something close to that. For longer text documents, perhaps word-ngrams[1,2,3]. For other documents, maybe fragment similarity. I haven't yet written the code, but for images, maybe a starting point (though it needs a lot of further processing) is what I call image-ngrams. Basically, decompose an image into, say, 5*5 squares, then process each of those squares separately, and so on. I don't have the full details of that yet. The point is, we are making progress on finding object -> superposition mappings.
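As a rough illustration of the string case, here is a Python sketch of what letter-ngrams[1,2,3] and word-ngrams[1,2,3] might look like. The function names letter_ngrams and word_ngrams, the use of Counter as the superposition, and the similarity measure are all my assumptions for this sketch, not the project's actual implementation.

```python
from collections import Counter

def letter_ngrams(text, sizes=(1, 2, 3)):
    # Superposition of every run of n consecutive characters, for each n in sizes.
    sp = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            sp[text[i:i + n]] += 1
    return sp

def word_ngrams(text, sizes=(1, 2, 3)):
    # Same idea over words instead of letters (whitespace-split, an assumption).
    words = text.split()
    sp = Counter()
    for n in sizes:
        for i in range(len(words) - n + 1):
            sp[" ".join(words[i:i + n])] += 1
    return sp

def similarity(f, g):
    # Assumed measure: shared weight / larger total weight, in [0, 1].
    if not f or not g:
        return 0.0
    shared = sum(min(f[k], g[k]) for k in f.keys() & g.keys())
    return shared / max(sum(f.values()), sum(g.values()))

close = similarity(letter_ngrams("superposition"), letter_ngrams("superpositions"))
distant = similarity(letter_ngrams("superposition"), letter_ngrams("banana"))
print(close > distant)  # True
```

For DNA the same function would just be called with a longer sizes tuple, for example letter_ngrams(seq, (1, 2, 3, 4, 5, 6)).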
Update: those mappings above are OK I suppose, but they only capture similarity at the level of strings. It would be much preferable to have mappings to superpositions that encode the meaning of the object, so that objects with similar meaning map to similar superpositions. For example, what cortical.io is doing in mapping words to SDRs (sparse distributed representations). Indeed, I suspect this would go some way towards language translation. Instead of translating by looking up a dictionary, put in a word in one language, then find the word in the second language that has the most similar meaning superposition. This presumes the meaning superpositions don't depend too much on the language they come from, which I think is a fairly safe assumption.
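The translation idea can be sketched in a few lines of Python. To be clear, the meaning superpositions below are invented toy values written by hand purely to show the lookup; they are not real SDRs or output from any actual system, and the names ENGLISH, FRENCH and translate are hypothetical.

```python
def similarity(f, g):
    # Assumed measure: shared weight / larger total weight, in [0, 1].
    if not f or not g:
        return 0.0
    shared = sum(min(f[k], g[k]) for k in f.keys() & g.keys())
    return shared / max(sum(f.values()), sum(g.values()))

# Toy, hand-written "meaning superpositions" (invented for illustration only).
ENGLISH = {
    "dog":   {"animal": 1.0, "pet": 1.0, "barks": 1.0, "loyal": 0.5},
    "cat":   {"animal": 1.0, "pet": 1.0, "meows": 1.0, "independent": 0.6},
    "house": {"building": 1.0, "home": 1.0, "shelter": 0.8},
}
FRENCH = {
    "chien":  {"animal": 1.0, "pet": 1.0, "barks": 1.0, "loyal": 0.4},
    "chat":   {"animal": 1.0, "pet": 1.0, "meows": 1.0, "independent": 0.5},
    "maison": {"building": 1.0, "home": 1.0, "shelter": 1.0},
}

def translate(word, source, target):
    # Pick the target word whose meaning superposition is most similar.
    meaning = source[word]
    return max(target, key=lambda candidate: similarity(meaning, target[candidate]))

print(translate("dog", ENGLISH, FRENCH))    # chien
print(translate("house", ENGLISH, FRENCH))  # maison
```

Note the sketch only works because both languages share the same feature labels, which is exactly the assumption that meaning superpositions don't depend too much on the language they come from.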