A dataset of 200 crabs, with species (color) and sex information, was provided for study using unsupervised learning techniques.
First, I performed several preprocessing steps so that the data could be used in numeric calculations. Step one was to map the species data (blue and orange) to the numbers 0 and 1; step two was to map the sex data (male and female) to 0 and 1 in the same way. I then extracted the NumPy arrays from the data and applied scaling before building the models.
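The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the original code: the DataFrame is a synthetic stand-in, the column names (`species`, `sex`, `width`, `depth`) and the measurement values are assumptions, and `StandardScaler` is assumed as the scaling method since the post does not say which scaler was used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 200-crab dataset; column names and
# measurement values are invented for illustration only.
rng = np.random.default_rng(0)
n = 200
crabs = pd.DataFrame({
    "species": np.repeat(["blue", "orange"], n // 2),
    "sex": np.tile(np.repeat(["male", "female"], n // 4), 2),
    "width": rng.normal(36.0, 7.0, n),
    "depth": rng.normal(14.0, 3.0, n),
})

# Map the categorical columns to 0/1 so every column is numeric.
crabs["species"] = crabs["species"].map({"blue": 0, "orange": 1})
crabs["sex"] = crabs["sex"].map({"male": 0, "female": 1})

# Extract a NumPy array and standardize each column
# (zero mean, unit variance). StandardScaler is an assumed choice.
X = crabs.to_numpy(dtype=float)
X_scaled = StandardScaler().fit_transform(X)
```

Scaling matters here because both clustering methods rely on distances, so an unscaled measurement column would otherwise dominate the 0/1 columns.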
I chose to test k-means clustering with n_clusters=2 and agglomerative clustering. I then used the adjusted Rand index to compare the results.
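A comparison along these lines might look like the sketch below. It is only an illustration under stated assumptions: the data comes from scikit-learn's `make_blobs` rather than the crab dataset, the reference labels `y_true` are assumed to be what each clustering is scored against, and `n_clusters=2` for the agglomerative model, `n_init=10`, and `random_state=0` are assumed settings that the post does not specify.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic two-group data standing in for the scaled crab features.
X, y_true = make_blobs(n_samples=200, centers=2, n_features=5, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Fit both models and collect one cluster label per sample.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_scaled)

# The adjusted Rand index measures agreement between two labelings,
# corrected for chance; it ignores how the cluster ids are numbered.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_agglo = adjusted_rand_score(y_true, agglo_labels)

print(f"k-means ARI:       {ari_kmeans:.3f}")
print(f"agglomerative ARI: {ari_agglo:.3f}")
```

A score of 1.0 means the clustering matches the reference labels exactly, while a score near 0 means the agreement is no better than chance.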
The agglomerative clustering model achieved a slightly higher adjusted Rand index score, and its two clusters were better defined.