Data Clusters

Clusters

Clusters are collections of data based on similarity.

Data points that lie close together in a graph can often be grouped into clusters.

In the graph below we can distinguish 3 different clusters:

[Graph: scatter plot showing three distinct clusters]

Identifying Clusters

Clusters can hold a lot of valuable information, but clusters come in all sorts of shapes, so how can we recognize them?

The two main methods are:

  • Using Visualization
  • Using a Clustering Algorithm

Clustering

Clustering is a type of Unsupervised Learning.

Clustering tries to:

  • Collect similar data in groups
  • Collect dissimilar data in other groups

Clustering Methods

  • Density Method
  • Hierarchical Method
  • Partitioning Method
  • Grid-based Method

The Density Method considers points in a dense region to be more similar to each other than to points in a lower-density region. The density method has good accuracy. It also has the ability to merge clusters.
Two common algorithms are DBSCAN and OPTICS.
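To make the density idea concrete, here is a minimal sketch of a DBSCAN-style procedure in plain Python. It is a simplified illustration, not a reference implementation: the function name `dbscan`, the parameter names `eps` and `min_pts`, and the sample points are all assumptions chosen for this example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts:  # j is a core point, so expand through it
                queue.extend(nearby)
    return labels

# Hypothetical sample data: two dense groups and one isolated point
points = [(1, 1), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5, 5), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9, 1)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

With these sample values, the two dense groups receive their own cluster ids and the isolated point is labeled as noise. Both `eps` and `min_pts` strongly affect the result and normally have to be tuned to the data.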

The Hierarchical Method organizes the clusters in a tree-like structure. New clusters are formed from previously formed clusters.
Two common algorithms are CURE and BIRCH.
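The idea of building new clusters from earlier ones can be sketched with a simple bottom-up (agglomerative) procedure. This is neither CURE nor BIRCH; it is a basic single-linkage illustration, and the function name `agglomerative` and the sample points are assumptions made for this example.

```python
import math

def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    merges = []                       # merge history, which forms the tree

    def distance(a, b):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest distance
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters, merges

# Hypothetical sample data
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, k=3)
print(clusters)
```

Each entry in `merges` records which two existing clusters were combined, so reading the list from start to end reconstructs the tree from the leaves upward.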

The Grid-based Method formulates the data into a finite number of cells that form a grid-like structure.
Two common algorithms are CLIQUE and STING.
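A rough sketch of the grid idea: each point is assigned to a square cell, and cells that hold enough points are treated as dense. This is not CLIQUE or STING, only the underlying binning step; the function name `grid_cells`, the cell size, and the density threshold of 2 are assumptions for illustration.

```python
from collections import Counter

def grid_cells(points, cell_size):
    """Count how many points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        # A cell is identified by its integer column and row index
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Hypothetical sample data
points = [(0.5, 0.7), (0.9, 0.2), (0.1, 0.4), (3.2, 3.8), (3.9, 3.1), (7.5, 0.5)]
counts = grid_cells(points, cell_size=1.0)

# Treat cells with at least 2 points as dense (the threshold is an assumption)
dense = {cell for cell, n in counts.items() if n >= 2}
print(dense)
```

Because the work is done per cell rather than per pair of points, the cost of this step depends mainly on the number of points and cells, which is one reason grid-based methods can be fast.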

The Partitioning Method partitions the objects into k clusters and each partition forms one cluster.
One common algorithm is CLARANS.
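To illustrate partitioning into k clusters, here is a minimal sketch of k-means, a different and simpler partitioning algorithm than CLARANS. The function name `k_means`, the fixed starting centers, and the sample points are assumptions chosen so the example is deterministic.

```python
import math

def k_means(points, centers, iterations=10):
    """Assign each point to its nearest center, then move each center to the mean."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # New center is the mean of its group (an empty group keeps its old center)
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups

# Hypothetical sample data with k = 2 starting centers
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
print(groups)
```

Every point ends up in exactly one group, so the groups form a partition of the data. In practice the starting centers are usually chosen at random, and different starts can give different partitions.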

Correlation Coefficient

The Correlation Coefficient (r) describes the strength and direction of the linear relationship between the x and y variables on a scatterplot.

The value of r is always between -1 and +1:

r        Strength             Direction
-1.00    Perfect downhill     Negative linear relationship
-0.70    Strong downhill      Negative linear relationship
-0.50    Moderate downhill    Negative linear relationship
-0.30    Weak downhill        Negative linear relationship
 0                            No linear relationship
+0.30    Weak uphill          Positive linear relationship
+0.50    Moderate uphill      Positive linear relationship
+0.70    Strong uphill        Positive linear relationship
+1.00    Perfect uphill       Positive linear relationship
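As an illustration of how r can be computed, the sketch below implements the standard Pearson formula in plain Python. The function name `pearson_r` and the sample values are assumptions for this example; it also assumes both lists have the same length and that neither is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
uphill = pearson_r(xs, [2, 4, 6, 8, 10])    # y rises exactly with x
downhill = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls exactly as x rises
mixed = pearson_r(xs, [2, 1, 4, 3, 5])      # y mostly rises with x

print(uphill, downhill, mixed)
```

The first two data sets lie exactly on a straight line, so they give the extreme values of r. The third is scattered around an upward trend and gives a value of about 0.8, which falls in the strong uphill range.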

[Figures: four scatter plots illustrating Perfect Uphill (+1.00), Perfect Downhill (-1.00), Strong Uphill (+0.61), and No Relationship]
