# ▸ Unsupervised Learning:

1. For which of the following tasks might K-means clustering be a suitable algorithm?
Select all that apply.

• Given a set of news articles from many different news websites, find out what are the main topics covered.
K-means can cluster the articles, and then we can inspect the clusters or use other methods to infer what topic each cluster represents.

• Given historical weather records, predict if tomorrow’s weather will be sunny or rainy.

• From the user usage patterns on a website, figure out what different groups of users exist.
We can cluster the users with K-means to find different, distinct groups.

• Given many emails, you want to determine if they are Spam or Non-Spam emails.

• Given a database of information about your users, automatically group them into different market segments.
You can use K-means to cluster the database entries, and each cluster will correspond to a different market segment.

• Given sales data from a large number of products in a supermarket, figure out which products tend to form coherent groups (say are frequently purchased together) and thus should be put on the same shelf.
If you cluster the sales data with K-means, each cluster should correspond to a coherent group of items.

• Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.

1. K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner-loop. Which two?
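The two steps repeated in the inner loop are the cluster assignment step and the move-centroid (cluster update) step. Below is a minimal pure-Python sketch of those two steps, not a definitive implementation. The function names, the toy `points`, the fixed iteration count, and the choice to leave an empty cluster's centroid where it is are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Cluster assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def move_centroids(points, assignments, centroids):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, c in zip(points, assignments) if c == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return new_centroids


def kmeans(points, k, iterations=10, seed=0):
    rng = random.Random(seed)
    # Initialize from K examples of the training set (points assumed distinct).
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(iterations):
        assignments = assign_clusters(points, centroids)            # step 1
        centroids = move_centroids(points, assignments, centroids)  # step 2
    return assignments, centroids


# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments, centroids)
```

On this toy data the first three points and the last three points should end up in different clusters; which cluster gets index 0 depends on the initialization.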

1. Suppose you have an unlabeled dataset $\{x^{(1)}, \dots, x^{(m)}\}$. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data.

What is the recommended way for choosing which one of these 50 clusterings to use?
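The usual recommendation is to compute the distortion function J for each of the 50 clusterings and keep the one with the lowest value. The sketch below illustrates that idea in pure Python. The helper names (`run_kmeans`, `best_of_n`), the toy data, and the fixed iteration count are assumptions for illustration only.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(points, centroids):
    """Index of the nearest centroid for every point."""
    k = len(centroids)
    return [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def distortion(points, assignments, centroids):
    """Average squared distance between each point and its assigned centroid."""
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)


def run_kmeans(points, k, rng, iterations=20):
    """One K-means run starting from k randomly chosen training examples."""
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        assignments = closest(points, centroids)
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return closest(points, centroids), centroids


def best_of_n(points, k, n_runs=50, seed=0):
    """Run K-means n_runs times and keep the run with the lowest distortion."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_runs)]
    return min(runs, key=lambda run: distortion(points, *run))


# Hypothetical toy data: three small groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]
best_assignments, best_centroids = best_of_n(points, k=3)
print(distortion(points, best_assignments, best_centroids))
```

Selecting by distortion only makes sense when every run uses the same K, since a larger K can always reach a lower distortion.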


1. Which of the following statements are true? Select all that apply.

• On every iteration of K-means, the cost function $J(C^{(1)}, \dots, C^{(m)}, \mu_1, \dots, \mu_k)$ (the distortion function) should either stay the same or decrease; in particular, it should not increase.
Both the cluster assignment and cluster update steps either decrease the cost / distortion function or leave it unchanged, so it should never increase after an iteration of K-means.

• A good way to initialize K-means is to select K (distinct) examples from the training set and set the cluster centroids equal to these selected examples.
This is the recommended method of initialization.

• K-Means will always give the same results regardless of the initialization of the centroids.

• Once an example has been assigned to a particular centroid, it will never be reassigned to a different centroid.

• For some datasets, the “right” or “correct” value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.
In many datasets, different choices of K will give different clusterings which appear quite reasonable. With no labels on the data, we cannot say one is better than the other.

• The standard way of initializing K-means is setting $\mu_1 = \dots = \mu_k$ to be equal to a vector of zeros.

• If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is to try multiple random initializations.
Since each run of K-means is independent, multiple runs can find different optima, and some should avoid bad local optima.

• Since K-Means is an unsupervised learning algorithm, it cannot overfit the data, and thus it is always better to have as large a number of clusters as is computationally feasible.
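The claim that the distortion never increases can be checked empirically. The sketch below records J after every cluster assignment step and every centroid update step, starting from K distinct training examples as centroids. The function name `kmeans_cost_history`, the toy data, and the seed are assumptions for illustration, and this is not a definitive implementation.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points, assignments, centroids):
    """Distortion J: mean squared distance from each point to its centroid."""
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, assignments)) / len(points)


def kmeans_cost_history(points, k, iterations=5, seed=1):
    """Run K-means and return J recorded after each of the two inner-loop steps."""
    rng = random.Random(seed)
    # Initialization: K examples sampled from the training set (assumed distinct).
    centroids = rng.sample(points, k)
    history = []
    for _ in range(iterations):
        # Cluster assignment step.
        assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        history.append(cost(points, assignments, centroids))
        # Centroid update step (an empty cluster keeps its old centroid).
        for j in range(k):
            members = [p for p, c in zip(points, assignments) if c == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
        history.append(cost(points, assignments, centroids))
    return history


# Hypothetical toy data.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (2.0, 1.5),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (6.0, 5.5),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9), (8.0, 2.0),
]
history = kmeans_cost_history(points, k=3)
print(history)
```

Each recorded value should be no larger than the one before it, up to floating-point rounding. A value that goes up would point to a bug in the implementation.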



Thanks & Regards,
- APDaga DumpBox