Self Organizing Map (SOM)

Monday, 13 April 2009, Alexander Krakovetskiy

Self-Organizing Map Overview

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space.
Figure: A self-organizing map showing US Congress voting patterns, visualized in Synapse. The first two boxes show clustering and distances, while the remaining ones show the component planes. Red means a yes vote and blue a no vote in the component planes (except the party component, where red is Republican and blue is Democrat).

This makes SOM useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector.

Training

The training steps of a SOM are simple.

  1. Initialise the interconnecting weights with random values.

  2. Present an input pattern to the network.

  3. Choose the output neuron with the highest activation (the “winner”).

  4. Update the weights of neurons that are within the “neighbourhood” of the winner, using a relative learning factor.

  5. Reduce the learning factor monotonically.

  6. Reduce the size of the “neighbourhood” monotonically.

  7. Repeat from step two until only small updates are observed.

The spread of the neighbourhood function will initially include all neurons on the grid, gradually reducing to include only the winning neuron.
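The seven steps above can be sketched in a few lines of code. The following Python/NumPy sketch is mine, not the downloadable C#/Java implementation: it uses a Gaussian neighbourhood function and linear decay for the learning factor and radius, and replaces the "repeat until only small updates" stopping rule with a fixed iteration count for simplicity.

```python
import numpy as np

def train_som(data, grid=(10, 10), iters=2000, lr0=0.5, seed=0):
    """Minimal SOM training sketch following the seven steps above."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Step 1: initialise the interconnecting weights with random values.
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every output neuron, for neighbourhood distances.
    coords = np.array([(r, c) for r in range(rows)
                       for c in range(cols)]).reshape(rows, cols, 2)
    sigma0 = max(rows, cols) / 2.0            # initial neighbourhood radius
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)               # Step 5: reduce learning factor
        sigma = sigma0 * (1.0 - frac) + 1e-3  # Step 6: reduce neighbourhood
        # Step 2: present an input pattern to the network.
        x = data[rng.integers(len(data))]
        # Step 3: the winner is the neuron whose weights are closest to x.
        d = np.linalg.norm(weights - x, axis=2)
        win = np.unravel_index(np.argmin(d), d.shape)
        # Step 4: update neurons in the winner's neighbourhood, scaled by a
        # Gaussian of their grid distance to the winner (relative factor).
        grid_d = np.linalg.norm(coords - np.array(win), axis=2)
        h = np.exp(-(grid_d ** 2) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights
```

Because sigma starts at half the grid width and shrinks towards zero, the neighbourhood initially covers essentially the whole grid and ends up covering only the winning neuron, exactly as described above.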

To provide an example of what to expect from a SOM, I have prepared a simple example that will attempt to group twenty-five foods into regions of similarity, based on three parameters, which are protein, carbohydrate and fat.

Therefore, the challenge for this SOM is to reduce data containing three dimensions down to two, whilst retaining meaning. It does this by automatically identifying the differentiating features that have the greatest effect.

Input data

The input data is as follows:

Item,protein,carb,fat
Apples,0.4,11.8,0.1
Avocado,1.9,1.9,19.5
Bananas,1.2,23.2,0.3
Beef Steak,20.9,0.0,7.9
Big Mac,13.0,19.0,11.0
Brazil Nuts,15.5,2.9,68.3
Bread,10.5,37.0,3.2
Butter,1.0,0.0,81.0
Cheese,25.0,0.1,34.4
Cheesecake,6.4,28.2,22.7
Cookies,5.7,58.7,29.3
Cornflakes,7.0,84.0,0.9
Eggs,12.5,0.0,10.8
Fried Chicken,17.0,7.0,20.0
Fries,3.0,36.0,13.0
Hot Chocolate,3.8,19.4,10.2
Pepperoni,20.9,5.1,38.3
Pizza,12.5,30.0,11.0
Pork Pie,10.1,27.3,24.2
Potatoes,1.7,16.1,0.3
Rice,6.9,74.0,2.8
Roast Chicken,26.1,0.3,5.8
Sugar,0.0,95.1,0.0
Tuna Steak,25.6,0.0,0.5
Water,0.0,0.0,0.0
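Note that the three columns have very different ranges (fat reaches 81 g for butter, carbs 95 g for sugar), so SOM implementations commonly rescale each column before training to stop one nutrient dominating the distance calculation. Whether the downloadable code does this is not stated in the article; the snippet below is an illustrative sketch using a small subset of the table:

```python
import numpy as np

# Subset of the input table above: protein, carb, fat per item.
foods = {
    "Butter":     [1.0,  0.0, 81.0],
    "Sugar":      [0.0, 95.1,  0.0],
    "Tuna Steak": [25.6, 0.0,  0.5],
    "Water":      [0.0,  0.0,  0.0],
}
X = np.array(list(foods.values()))
# Scale each column to [0, 1] so no nutrient dominates the distance metric.
span = X.max(axis=0) - X.min(axis=0)
span[span == 0] = 1.0          # guard against constant columns
X_norm = (X - X.min(axis=0)) / span
```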

Output

After running this data through the SOM, the foods were placed on a 10x10 grid representing their relative similarities. A graphical representation is shown below.



How has the feature map grouped items together, whilst crushing three dimensions into two? Well, a number of zones have formed. Water, which contains no protein, carbs or fat, has been pushed to the bottom right. Directly above it, in the top right-hand corner, sugar, which is made almost entirely of carbs, has taken hold. In the top left corner, butter reigns supreme, being almost entirely fat. Finally, the bottom left is occupied by tuna, which has the highest protein content of the foods in my sample. The remaining foods live between these extremes, with a junk food zone occupying the centre ground.
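Once training is complete, the mapping mode mentioned earlier is just a nearest-neighbour lookup: a new input vector is placed at the grid cell whose weight vector is closest to it. A minimal sketch (function name and shapes are my own, assuming weights shaped as grid rows x columns x features):

```python
import numpy as np

def map_input(weights, x):
    """Return the grid coordinates of the best-matching unit for input x,
    i.e. the neuron whose weight vector is nearest to x."""
    d = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=2)
    return np.unravel_index(np.argmin(d), d.shape)
```

Running every food through this lookup is what produces the 10x10 placement described above; a genuinely new food (say, a different brand of cookie) would land near its most similar neighbours without retraining the map.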

Source Code

The latest source code (C#, Java and Microsoft Excel versions) can be downloaded from the http://datamining.codeplex.com site.

