Wednesday 18 March 2015

introducing bridging sets

This time a simple but useful maths idea. Basically it is the motivation for my categorize code, which I will give in the next couple of posts.

Simply enough, let's semi-formally define them:
First, we say x is near y if metric[x,y] <= t for a metric of your choice, and some threshold t.

Then a linear bridging set is a set {x0,x1,x2,x3,...,xn} such that:
1) x_k is near x_k+1, for all k in {0,1,...,n}
2) x_0 is not near x_n

A general bridging set is a set {x0,x1,x2,x3,...,xn} such that:
-- every element in the set is near some other element in the set:
1) for every j in {0,1,...,n}, x_j is near an x_k for some k != j in {0,1,...,n}. 
-- some elements may not be near other elements in the bridging set:
2) there may exist j,k pairs such that x_j is not near x_k
Here is a simple visualization of that:
where A is a linear bridging set and B and C are general bridging sets.

Some examples of linear bridging sets:
- The easiest is a bridge. Set the left bank to be x_0, the right bank to be x_n, and the steps you take from one side to the other form a linear bridging set (and hence the name "bridging set").

-  species DNA: I don't recall the species, but it had a strange property. The members of the species at the north of their range could not breed with those at the south of their range. But, interestingly, at any point in between they could breed with their neighbours. At least that is the way I remember it.

- Your appearance, first as a baby, then up through adulthood, and then old age forms a linear bridging set.

- Your weekly shopping basket is usually a linear bridging set (if you are a consistent shopper). Eg, from 5 years ago, week by week, till now.

- There are other trivial examples of linear bridging sets:
 - {a,b,c,d,e,f,...,x,y,z}
 - {0,1,2,3,4,5,6,...,20,21,22,23,24,25}
 - Water slowly brought to the boil.
 - indeed, anything that slowly evolves with time is a linear bridging set.

Some examples of general bridging sets:
- The 400m running track on an oval is a simple general bridging set, so is the path you take for your morning jog.

- If we have a metric that can measure the similarity in DNA (some version of simm perhaps), then each species form distinct bridging sets. ie, all members of a species are quite similar in their DNA, and dissimilar to other species.

- The tree of life, ie, the evolution of life from single cell to multi-cellular life is a big bridging set. A smaller version of this is you are in a bridging set with your parents, grand-parents, and back to your ancestors. And then via your parents, you are in a bridging set with your siblings, their children, their children's children and so on.

- A person's face from all different angles/perspectives form a bridging set. Ditto a banana, or a pen or a stapler, or a tiger, or an elephant, any object really.

- The collection of atoms that make up say a dog, form a general bridging set.

- Scenes in a movie/tv show, even as characters move around a bit, is a general bridging set.

Anyway, that is all simple enough!

The point? It motivates our categorize code. Categorize basically converts a set of elements into distinct/disjoint bridging sets. The python for that up next.

Some notes:
The value of t has a strong influence on the result.
Set it too tight and your categories splinter into smaller ones.
Set it too loose, and everything ends up in the same category.

The addition of a single new element can sometimes merge two or more categories into one, if it is in a key location.
And the other way too. The removal of a key element can fracture a category into two or more pieces (eg, if you remove the middle of the bridge, it is no longer a single bridging set).

Update: unlike support-vector-machines, bridging sets/categorize do not require the distinct/disjoint sets to be separable by a hyper-plane. eg, in an extreme example you could have a bulls-eye, a circle with a dot in the middle, and the code would classify them as two different bridging sets.

Update: in general it does not make much sense to consider the average of the elements in a bridging set. Depending on the exact bridging set, the average can be a long way from any member of the given bridging set.

No comments:

Post a Comment