Fuzzy c-means clustering algorithm v.0.3 for Multidimensional Data

Saturday, 2 May 2009, Александр Краковецкий

Overview

The new version supports clustering of multidimensional data, which means that objects can have more than two characteristics. Let's look at how the existing code was changed to handle multidimensional data.

ClusterCentroid class was excluded

This class was an exact copy of the ClusterPoint class, so I excluded it from the solution to make the code clearer.

ClusterPoint class changes

The Coords property was added to store any number of object characteristics:

            public List<double> Coords { get; set; }

I kept the X and Y properties for convenience but changed them accordingly:

            public double X { get { return this.Coords[0]; }  }

            public double Y { get { return this.Coords[1]; } }

I also added a Dimention property, which returns the number of coordinates:

            public int Dimention { get { return this.Coords.Count; } }

New constructors:

        public ClusterPoint(List<double> coords)
                 : this(coords, null)
        {
        }

        public ClusterPoint(List<double> coords, object tag)
        {
            this.Coords = coords;
            this.Tag = tag;
            this.ClusterIndex = -1;
        }
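
Put together, the changes above could look roughly like the following sketch. The declarations of Tag and ClusterIndex are assumed from the constructor body shown above, and ClusterPointDemo with its sample RGB values is purely hypothetical, added only to show a point with three coordinates.

```csharp
using System;
using System.Collections.Generic;

// Sketch of the multidimensional point described in this section.
// Tag and ClusterIndex are assumed to be simple auto-properties.
public class ClusterPoint
{
    public List<double> Coords { get; set; }
    public object Tag { get; set; }
    public int ClusterIndex { get; set; }

    // X and Y remain as read-only shortcuts for the first two coordinates.
    public double X { get { return this.Coords[0]; } }
    public double Y { get { return this.Coords[1]; } }

    // Number of coordinates (identifier spelled as in the article).
    public int Dimention { get { return this.Coords.Count; } }

    public ClusterPoint(List<double> coords)
        : this(coords, null)
    {
    }

    public ClusterPoint(List<double> coords, object tag)
    {
        this.Coords = coords;
        this.Tag = tag;
        this.ClusterIndex = -1; // not assigned to any cluster yet
    }
}

// Hypothetical helper, not part of the original solution.
public static class ClusterPointDemo
{
    // Builds a three-dimensional point, here an RGB colour with a label as its tag.
    public static ClusterPoint MakeRgbPoint()
    {
        return new ClusterPoint(new List<double> { 255.0, 128.0, 0.0 }, "orange");
    }
}
```

For the point returned by MakeRgbPoint, Dimention is 3, X and Y read the first two coordinates, and ClusterIndex starts at -1.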

CMeansAlgorithm class changes

First of all, the distance calculation has to sum over every coordinate instead of only X and Y:

        private double CalculateEulerDistance(ClusterPoint point, ClusterPoint centroid)
        {
            // Old two-dimensional version:
            // return Math.Sqrt(Math.Pow(p.X - c.X, 2) + Math.Pow(p.Y - c.Y, 2));

            double sum = 0.0;

            for (int i = 0; i < point.Dimention; i++ )
            {
                sum += Math.Pow(point.Coords[i] - centroid.Coords[i], 2);
            }

            return Math.Sqrt(sum);
        }
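
To check the formula by hand, here is a standalone sketch of the same Euclidean distance that works on plain coordinate lists. The DistanceSketch class and its length check are assumptions of this example. The method above relies on both points having the same Dimention.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone helper, not part of the original solution.
public static class DistanceSketch
{
    // Euclidean distance between two coordinate lists of equal length.
    public static double Euclidean(IList<double> a, IList<double> b)
    {
        if (a.Count != b.Count)
        {
            throw new ArgumentException("Both points must have the same number of coordinates.");
        }

        double sum = 0.0;
        for (int i = 0; i < a.Count; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff; // squared difference along coordinate i
        }

        return Math.Sqrt(sum);
    }
}
```

For the three-dimensional points (0, 0, 0) and (1, 2, 2) the sum of squares is 1 + 4 + 4 = 9, so the distance is 3.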

The most important change is in the CalculateClusterCenters method. This code:

            double uX = 0.0;
            double uY = 0.0;

            uX += uu * c.X;
            uY += uu * c.Y;

was replaced with:

            double[] uC = new double[c.Dimention];

            for (int k = 0; k < c.Dimention; k++) {
                   uC[k] += uu * c.Coords[k];
            }

and

            c.X = ((int)(uX / l));
            c.Y = ((int)(uY / l));

was replaced with:

            for (int k = 0; k < c.Dimention; k++)  {
                c.Coords[k] = ((int)(uC[k] / l));
            }
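
The fragments above show only the changed lines. As a rough illustration of how they might fit together, the sketch below computes the standard fuzzy c-means center of one cluster as a weighted mean over all points. The CentroidSketch class, the memberships list and the fuzziness parameter are assumptions of this example, not names from the original solution. Unlike the article's code, it keeps the result as double instead of casting to int.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone helper, not part of the original solution.
public static class CentroidSketch
{
    // Returns the fuzzy-weighted mean of the points for a single cluster.
    // memberships[i] is the degree (0..1) to which points[i] belongs to the cluster;
    // fuzziness is the usual exponent m > 1.
    // Assumes at least one point has a non-zero membership.
    public static double[] WeightedCenter(
        IList<double[]> points, IList<double> memberships, double fuzziness)
    {
        int dimension = points[0].Length;
        double[] uC = new double[dimension]; // per-coordinate weighted sums
        double l = 0.0;                      // sum of the weights

        for (int i = 0; i < points.Count; i++)
        {
            double uu = Math.Pow(memberships[i], fuzziness);
            for (int k = 0; k < dimension; k++)
            {
                uC[k] += uu * points[i][k];
            }
            l += uu;
        }

        for (int k = 0; k < dimension; k++)
        {
            uC[k] /= l; // weighted sum divided by total weight
        }

        return uC;
    }
}
```

With two points (0, 0, 0) and (2, 4, 6) that both have membership 1, the center is their plain average, (1, 2, 3).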

That's all! Sample code for using the new version of the algorithm looks like this:

            var points = new List<ClusterPoint>();
            points.Add(new ClusterPoint(0,0));
            points.Add(new ClusterPoint(100,100));

            var clusters = new List<ClusterPoint>();
            clusters.Add(points[0]);
            clusters.Add(points[1]);

            CMeansAlgorithm alg = new CMeansAlgorithm(points, clusters);
            alg.Run();

            Console.Write(alg.Log);

Source Code

The source code can be found at http://datamining.codeplex.com/ (Fuzzy c-means clustering v.0.3 for multidimensional data).

Happy coding!

