Self Organizing Map (SOM)

Monday, 13 April 2009, Alexander Krakovetskiy

Self-Organizing Map Overview

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space.
Figure: A self-organizing map showing US Congress voting patterns, visualized in Synapse. The first two boxes show clustering and distances, while the remaining ones show the component planes. Red means a yes vote and blue a no vote in the component planes (except the party component, where red is Republican and blue is Democrat).

This makes SOM useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen, and is sometimes called a Kohonen map.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. It is a competitive process, also called vector quantization. Mapping automatically classifies a new input vector.

Training

The training steps of a SOM are simple.

  1. Initialise the interconnecting weights with random values.

  2. Present an input pattern to the network.

  3. Choose the output neuron with the highest activation (the “winner”).

  4. Update the weights of neurons that are within the “neighbourhood” of the winner, using a relative learning factor.

  5. Reduce the learning factor monotonically.

  6. Reduce the size of the “neighbourhood” monotonically.

  7. Repeat from step two until only small updates are observed.

The spread of the neighbourhood function will initially include all neurons on the grid, gradually reducing to include only the winning neuron.
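The seven steps above can be sketched in a few lines of code. The following Python/NumPy sketch is mine, not the downloadable C#/Java implementation: it uses a Gaussian neighbourhood function and linear decay for the learning factor and radius, and replaces the "repeat until only small updates" stopping rule with a fixed iteration count for simplicity.

```python
import numpy as np

def train_som(data, grid=(10, 10), iters=2000, lr0=0.5, seed=0):
    """Minimal SOM training sketch following the seven steps above."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Step 1: initialise the interconnecting weights with random values.
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every output neuron, for neighbourhood distances.
    coords = np.array([(r, c) for r in range(rows)
                       for c in range(cols)]).reshape(rows, cols, 2)
    sigma0 = max(rows, cols) / 2.0            # initial neighbourhood radius
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)               # Step 5: reduce learning factor
        sigma = sigma0 * (1.0 - frac) + 1e-3  # Step 6: reduce neighbourhood
        # Step 2: present an input pattern to the network.
        x = data[rng.integers(len(data))]
        # Step 3: the winner is the neuron whose weights are closest to x.
        d = np.linalg.norm(weights - x, axis=2)
        win = np.unravel_index(np.argmin(d), d.shape)
        # Step 4: update neurons in the winner's neighbourhood, scaled by a
        # Gaussian of their grid distance to the winner (relative factor).
        grid_d = np.linalg.norm(coords - np.array(win), axis=2)
        h = np.exp(-(grid_d ** 2) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights
```

Because sigma starts at half the grid width and shrinks towards zero, the neighbourhood initially covers essentially the whole grid and ends up covering only the winning neuron, exactly as described above.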

To provide an example of what to expect from a SOM, I have prepared a simple example that will attempt to group twenty-five foods into regions of similarity, based on three parameters, which are protein, carbohydrate and fat.

Therefore, the challenge for this SOM is to reduce data containing three dimensions down to two, whilst retaining meaning. It does this by automatically identifying the differentiating features that have the greatest effect.

Input data

The input data is as follows:

Item,protein,carb,fat
Apples,0.4,11.8,0.1
Avocado,1.9,1.9,19.5
Bananas,1.2,23.2,0.3
Beef Steak,20.9,0.0,7.9
Big Mac,13.0,19.0,11.0
Brazil Nuts,15.5,2.9,68.3
Bread,10.5,37.0,3.2
Butter,1.0,0.0,81.0
Cheese,25.0,0.1,34.4
Cheesecake,6.4,28.2,22.7
Cookies,5.7,58.7,29.3
Cornflakes,7.0,84.0,0.9
Eggs,12.5,0.0,10.8
Fried Chicken,17.0,7.0,20.0
Fries,3.0,36.0,13.0
Hot Chocolate,3.8,19.4,10.2
Pepperoni,20.9,5.1,38.3
Pizza,12.5,30.0,11.0
Pork Pie,10.1,27.3,24.2
Potatoes,1.7,16.1,0.3
Rice,6.9,74.0,2.8
Roast Chicken,26.1,0.3,5.8
Sugar,0.0,95.1,0.0
Tuna Steak,25.6,0.0,0.5
Water,0.0,0.0,0.0
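Note that the three columns have very different ranges (fat reaches 81 g for butter, carbs 95 g for sugar), so SOM implementations commonly rescale each column before training to stop one nutrient dominating the distance calculation. Whether the downloadable code does this is not stated in the article; the snippet below is an illustrative sketch using a small subset of the table:

```python
import numpy as np

# Subset of the input table above: protein, carb, fat per item.
foods = {
    "Butter":     [1.0,  0.0, 81.0],
    "Sugar":      [0.0, 95.1,  0.0],
    "Tuna Steak": [25.6, 0.0,  0.5],
    "Water":      [0.0,  0.0,  0.0],
}
X = np.array(list(foods.values()))
# Scale each column to [0, 1] so no nutrient dominates the distance metric.
span = X.max(axis=0) - X.min(axis=0)
span[span == 0] = 1.0          # guard against constant columns
X_norm = (X - X.min(axis=0)) / span
```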

Output

After running this data through the SOM, the foods were placed on a 10x10 grid representing their relative similarities. A graphical representation is shown below.



How has the feature map grouped items together, whilst crushing three dimensions into two? Well, a number of zones have formed. Water, which contains no protein, carbs or fat, has been pushed to the bottom right. Directly above it, in the top right-hand corner, sugar, which is made almost entirely of carbs, has taken hold. In the top left corner, butter reigns supreme, being almost entirely fat. Finally, the bottom left is occupied by tuna, which has the highest protein content of the foods in my sample. The remaining foods live between these extremes, with a junk food zone occupying the centre ground.
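Once training is complete, the mapping mode mentioned earlier is just a nearest-neighbour lookup: a new input vector is placed at the grid cell whose weight vector is closest to it. A minimal sketch (function name and shapes are my own, assuming weights shaped as grid rows x columns x features):

```python
import numpy as np

def map_input(weights, x):
    """Return the grid coordinates of the best-matching unit for input x,
    i.e. the neuron whose weight vector is nearest to x."""
    d = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=2)
    return np.unravel_index(np.argmin(d), d.shape)
```

Running every food through this lookup is what produces the 10x10 placement described above; a genuinely new food (say, a different brand of cookie) would land near its most similar neighbours without retraining the map.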

Source Code

The latest source code (C#, Java and Microsoft Excel versions) can be downloaded from the http://datamining.codeplex.com site.

