Self Organizing Map (SOM)
A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighbourhood function to preserve the topological properties of the input space.
[Figure: A self-organizing map showing US Congress voting patterns, visualized in Synapse. The first two boxes show clustering and distances, while the remaining ones show the component planes. In the component planes, red means a yes vote and blue a no vote (except the party component, where red is Republican and blue is Democrat).]
This makes SOM useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map.
Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector.
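The mapping step amounts to a nearest-neighbour search over the grid's weight vectors: a new input is assigned to the neuron whose weights are closest to it. The sketch below illustrates this under two assumptions of mine (a rectangular grid stored as a 3-D array, and squared Euclidean distance); it is not taken from the downloadable source.

```java
// Illustrative sketch of the competitive "mapping" step: assign an
// input vector to the best-matching unit (BMU) on the grid.
// weights[row][col] holds the weight vector of one grid neuron.
public class BmuFinder {
    public static int[] findWinner(double[][][] weights, double[] input) {
        int bestRow = 0, bestCol = 0;
        double bestDist = Double.MAX_VALUE;
        for (int r = 0; r < weights.length; r++) {
            for (int c = 0; c < weights[r].length; c++) {
                // squared Euclidean distance from this neuron to the input
                double d = 0.0;
                for (int i = 0; i < input.length; i++) {
                    double diff = weights[r][c][i] - input[i];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    bestRow = r;
                    bestCol = c;
                }
            }
        }
        return new int[] { bestRow, bestCol };
    }
}
```

The same search is also the "winner" selection used during training; classifying a new vector is simply this search applied to a trained map.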
Training
The training steps of a SOM are simple.
- Initialise the interconnecting weights with random values.
- Present an input pattern to the network.
- Choose the output neuron with the highest activation (the “winner”).
- Update the weights of neurons that are within the “neighbourhood” of the winner, using a relative learning factor.
- Reduce the learning factor monotonically.
- Reduce the size of the “neighbourhood” monotonically.
- Repeat from step two until only small updates are observed.
The spread of the neighbourhood function will initially include all neurons on the grid, gradually reducing to include only the winning neuron.
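The steps above can be sketched in Java roughly as follows. The Gaussian neighbourhood function, the exponential decay schedules and the fixed iteration count are illustrative assumptions of mine; the downloadable implementations may handle these details differently.

```java
import java.util.Random;

// Sketch of the SOM training loop described in the steps above.
public class SomTrainer {
    public static double[][][] train(double[][] inputs, int rows, int cols,
                                     int iterations, long seed) {
        int dim = inputs[0].length;
        Random rnd = new Random(seed);
        // Step 1: initialise the weights with random values in [0, 1).
        double[][][] w = new double[rows][cols][dim];
        for (double[][] row : w)
            for (double[] cell : row)
                for (int i = 0; i < dim; i++)
                    cell[i] = rnd.nextDouble();

        double sigma0 = Math.max(rows, cols) / 2.0; // initial neighbourhood radius
        double alpha0 = 0.5;                        // initial learning factor
        double lambda = iterations / Math.log(sigma0); // radius decay constant

        for (int t = 0; t < iterations; t++) {
            // Step 2: present an input pattern.
            double[] x = inputs[rnd.nextInt(inputs.length)];
            // Step 3: choose the winner (smallest distance to x).
            int br = 0, bc = 0;
            double best = Double.MAX_VALUE;
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) {
                    double d = 0.0;
                    for (int i = 0; i < dim; i++) {
                        double diff = w[r][c][i] - x[i];
                        d += diff * diff;
                    }
                    if (d < best) { best = d; br = r; bc = c; }
                }
            // Steps 5 and 6: monotonically shrink the learning factor
            // and the neighbourhood radius.
            double sigma = sigma0 * Math.exp(-t / lambda);
            double alpha = alpha0 * Math.exp(-(double) t / iterations);
            // Step 4: update neurons near the winner, weighted by a
            // Gaussian of their grid distance to it.
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++) {
                    double gridDist2 = (r - br) * (r - br) + (c - bc) * (c - bc);
                    double h = Math.exp(-gridDist2 / (2.0 * sigma * sigma));
                    for (int i = 0; i < dim; i++)
                        w[r][c][i] += alpha * h * (x[i] - w[r][c][i]);
                }
        }
        return w;
    }
}
```

With this schedule, sigma decays from half the grid width down to one cell, so early updates drag the whole map while late updates refine individual neurons.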
To provide an example of what to expect from a SOM, I have prepared a simple demonstration that attempts to group twenty-five foods into regions of similarity, based on three parameters: protein, carbohydrate and fat. The challenge for this SOM is therefore to reduce data containing three dimensions down to two, whilst retaining meaning. It does this by automatically identifying the differentiating features that will have the greatest effect.
Input data
The input data (food name, protein, carbohydrate, fat) is as follows:
Apples,0.4,11.8,0.1
Avocado,1.9,1.9,19.5
Bananas,1.2,23.2,0.3
Beef Steak,20.9,0.0,7.9
Big Mac,13.0,19.0,11.0
Brazil Nuts,15.5,2.9,68.3
Bread,10.5,37.0,3.2
Butter,1.0,0.0,81.0
Cheese,25.0,0.1,34.4
Cheesecake,6.4,28.2,22.7
Cookies,5.7,58.7,29.3
Cornflakes,7.0,84.0,0.9
Eggs,12.5,0.0,10.8
Fried Chicken,17.0,7.0,20.0
Fries,3.0,36.0,13.0
Hot Chocolate,3.8,19.4,10.2
Pepperoni,20.9,5.1,38.3
Pizza,12.5,30.0,11.0
Pork Pie,10.1,27.3,24.2
Potatoes,1.7,16.1,0.3
Rice,6.9,74.0,2.8
Roast Chicken,26.1,0.3,5.8
Sugar,0.0,95.1,0.0
Tuna Steak,25.6,0.0,0.5
Water,0.0,0.0,0.0
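For anyone wanting to reproduce the experiment, the rows above can be read into name/vector pairs along these lines (the FoodParser class and its method are hypothetical helpers of mine, not names from the article's source code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parse "Name,protein,carbs,fat" rows into vectors.
public class FoodParser {
    public static Map<String, double[]> parse(String[] lines) {
        Map<String, double[]> foods = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            double[] v = new double[3]; // protein, carbohydrate, fat
            for (int i = 0; i < 3; i++)
                v[i] = Double.parseDouble(parts[i + 1]);
            foods.put(parts[0], v);
        }
        return foods;
    }
}
```

Note that the three columns span quite different ranges (fat reaches 81.0 for butter, while protein peaks at 26.1), so SOM implementations commonly rescale each column to a common range before training to stop any one nutrient dominating the distance calculation.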
Output
After running this data through the SOM, the foods were placed on a 10x10 grid representing their relative similarities. A graphical representation is shown below.
How has the feature map grouped items together, whilst crushing three dimensions into two? Well, a number of zones have formed. Water, which contains no protein, carbs or fat, has been pushed to the bottom right. Directly above, in the top right-hand corner, sugar, which is made almost entirely of carbs, has taken hold. In the top left corner, butter reigns supreme, being almost entirely fat. Finally, the bottom left is occupied by tuna, which has the highest protein content of the foods in my sample. The remaining foods live between these extremes, with a junk food zone occupying the centre ground.
Source Code
The latest source code (C#, Java and Microsoft Excel versions) can be downloaded from the http://datamining.codeplex.com site.