part-d for kmeans

rik404 · rik404 · commit d4cd5c0c2714 · 2025-03-03T13:28:02.000-06:00
diff --git a/content/exercises/k-means.html b/content/exercises/k-means.html
@@ -22,4 +22,5 @@ <h2> K-Means Clustering:</h2>
 <h3>Give a collection M points in 2 dimensions, optimally classify them in K clusters using an iterative refinement algorithm, and output only the centroid of each cluster.</h3>
 <p><b>PART A: </b>(for simplicity, we will use 2 dimensions, the real problems typically involve a larger number of dimensions). More concretely: create a chare array called Points of N chares, each with M/N data points. The constructors initialize each data point with random X and Y coordinates, (0 <= X,Y < 1.0). The main chare generates K random points as an initial guess for centroids of the K clusters, and broadcasts them (as an array of x-y pairs) to the Points chare array, to an entry method called Assign. This method decides, for each point it owns, which centroid it is closest to. It then contributes into 2 reductions: one a sum of points it added to each cluster (so an integer array of size K) and another a sum of X and Y coordinates for each cluster (so, an array of 2K doubles). The target of the reductions are the UpdateCounts and UpdateCoords methods in the main chare. When both reductions are complete the main chare updates the centroids of each cluster (simply calclulate, for each of the K clusters, the sum of X coordinates i’th cluster divided by the count of points assigned to i’th cluster, and similarly for Y.) The algorithm then repeats the Assign (via broadcast) and Update steps, until the assignment of points to cluster remains unchanged. We will approximate this by calculating (in the main chare) the changes to any centroid, and when no centroid coordinate changes beyond a small threshold T (say 0.001), we will assume the algorithm has converged.</p>
 <p><b>Part B: </b>Reduce the number of reductions in each iteration from 2 to 1. Hint: there are 2 ways of doing this: first involves approximating the counts to be double precision numbers and the second involves writing a custom reduction).</p>
-<p><b>Part C: </b>The “no change to centroids above a threshold” method above is an approximation. Implement a more accurate method by using an additional reduction of the number of points which have changed their “allegiance” (i.e. the cluster to which they belong in each iteration. If this number is 0, the algorithm has converged. Again, as in part B, reduce the number of reductions to just 1 if possible.</p>
+<p><b>Part C: </b>The “no change to centroids above a threshold” method above is an approximation. Implement a more accurate method by using an additional reduction of the number of points which have changed their “allegiance” (i.e. the cluster to which they belong in each iteration. If this number is 0, the algorithm has converged. Again, as in part B, reduce the number of reductions to just 1 if possible.</p>
+<p><b>Part D: </b>Change the random number generation so that you create K sets, with each set centered around a distinct coordinate (purported centroid), and see if your program is able to retrieve the clusters correctly. Use any method to restrict points around a centroid to within the 0.0:1.0 box, along each dimension.</p>