2731: "K-Means Clustering"

Posted: Mon Jan 30, 2023 10:27 pm
by ratammer
[Image: comic 2731, "K-Means Clustering"]
Title text: According to my especially unsupervised K-means clustering algorithm, there are currently about 8 billion types of people in the world.

Yeah, this is just one of those where I really don't know enough about the maths involved.

Re: 2731: "K-Means Clustering"

Posted: Tue Jan 31, 2023 12:06 am
by chridd
k-means is an unsupervised machine learning algorithm. Kind of long explanation below (and some images from when I implemented it):
Suppose you're trying to get a computer to recognize the difference between cats and dogs. One way to do it is with supervised machine learning, which means that you tell the computer that these photos are cats, those photos are dogs, and it'll try to pick out what it is that the cat photos have in common that the dog photos don't. Another way is with unsupervised machine learning, which means that you just give the program a bunch of photos, but don't tell it which is which, and ask it to split the photos into two groups. Maybe it'll split them into cats vs. dogs, or maybe it'll split them into big vs. small, or light fur vs. dark, or you'll get two groups where it claims to have found some pattern but you don't really know what it is.

For k-means, in particular, you have to tell the computer how many different kinds of things it should look for; that's what the k is. So if k = 2, that means that it's looking for two kinds of things, k = 3 it's looking for three kinds of things, and so on. Setting k higher than the number of groups you're actually looking for can make sense, because maybe it'll have an easier time if, for instance, it can treat big dogs and small dogs as different kinds of thing.
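
To make the role of k concrete, here is a minimal sketch of the usual k-means loop in plain Python. It is not the implementation mentioned in this thread. The function names, the sample points, and the choice to start from the first k points are all assumptions made for illustration (real implementations usually pick random starting points and try several times).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Split points into k groups; returns (centroids, labels)."""
    # Assumption for this sketch: the first k points are the starting
    # centroids. This keeps the example deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # nothing moved, so the grouping has settled
        labels = new_labels
        # Update step: each centroid moves to the average of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty group keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


# Hypothetical data: two obvious blobs, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Note that the program is never told what the two groups mean. It only gets the number 2 and the points, and the labels 0 and 1 are arbitrary.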

I wrote an implementation of k-means for a class a while back. I gave it a bunch of handwritten digits and told it to look for 10 different kinds of digit (k = 10) and then 30 (k = 30). Here are the best results from each. Each image shows what it thinks each digit looks like, on average; the bottom row is the final answer, and everything above that is intermediate steps:
[Image: k = 10 (trial-2.png)]
[Image: k = 30 (trial-9.png)]
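
For anyone wondering how rows of "average digits" like that could be produced, here is a rough sketch. It is not the code that made the images above. It assumes each image is flattened into a list of pixel values, and the tiny 2x2 "images", the function names, and the hand-picked starting centroids are all made up for illustration.

```python
def mean_vector(vectors):
    """Element-wise average of equal-length vectors (e.g. flattened images)."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_history(images, centroids, steps):
    """Run `steps` k-means iterations, keeping the centroids after each one."""
    history = [list(centroids)]
    for _ in range(steps):
        groups = [[] for _ in centroids]
        for image in images:
            nearest = min(range(len(centroids)),
                          key=lambda c: squared_distance(image, centroids[c]))
            groups[nearest].append(image)
        # Each centroid becomes the average image of its group;
        # an empty group keeps its previous centroid.
        centroids = [mean_vector(g) if g else old
                     for g, old in zip(groups, centroids)]
        history.append(list(centroids))
    return history


# Hypothetical 2x2 "images", flattened to 4 pixel values (0 = white, 1 = black).
images = [
    [1, 1, 0, 0], [1, 0.8, 0, 0.2],   # mostly "top row dark"
    [0, 0, 1, 1], [0.2, 0, 0.8, 1],   # mostly "bottom row dark"
]
history = centroid_history(images, centroids=[images[0], images[2]], steps=3)
for step, row in enumerate(history):
    print(step, row)
```

Each entry in `history` corresponds to one row of a picture like the ones above: the first is the starting guess, the last is the final answer, and drawing each centroid as an image gives the "average digit" for that group.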

In this comic, the person told the computer "there are three kinds of people, find what they are", and the computer found that one of those kinds of people is people who tell the computer that there are three kinds of whatever they're looking for. She's not sure what's different about the other two groups, though; maybe the computer found some pattern that we aren't able to see, or maybe it just arbitrarily divided that group.

Re: 2731: "K-Means Clustering"

Posted: Tue Jan 31, 2023 6:51 am
by ratammer
Right, I see. Thanks!

Re: 2731: "K-Means Clustering"

Posted: Tue Jan 31, 2023 1:29 pm
by somitomi
I wonder if you could make it recognise police dogs by using k=9

Re: 2731: "K-Means Clustering"

Posted: Fri Mar 24, 2023 10:24 am
by heuristically_alone
somitomi wrote: Tue Jan 31, 2023 1:29 pm
"I wonder if you could make it recognise police dogs by using k=9"
This comment should receive the chuckle it deserves. :lol:

Also, thanks chridd for being so informative. I learned something interesting today.