The goal of this project was to apply several supervised learning models to either the red or white wine dataset provided to us. We were to tune each model to find its best parameters, then apply it to the training and testing splits and use the accuracy scores to determine whether the model was well fit, overfitted, or underfitted.
I chose to start with the Random Forest Classifier. I set a static random_state using the variable "seed", and I also fit a Gradient Boosting Classifier to view the predicted probabilities.
The accuracy on the training set was 0.855, while the testing set reached an accuracy of only 0.609, which indicates that the model may have been slightly overfit.
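Something like the following sketch captures this step, assuming standard scikit-learn calls, placeholder split names X_train, X_test, y_train, y_test, and a placeholder seed value (the actual notebook code and seed are not shown here):

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

seed = 42  # placeholder value; the write-up only says a fixed seed was used

# Random forest with a fixed random_state for reproducibility
rf = RandomForestClassifier(random_state=seed)
rf.fit(X_train, y_train)
print("RF train accuracy:", rf.score(X_train, y_train))
print("RF test accuracy:", rf.score(X_test, y_test))

# Gradient boosting fit on the same split to inspect predicted class probabilities
gb = GradientBoostingClassifier(random_state=seed)
gb.fit(X_train, y_train)
print("GB predicted probabilities (first 5 rows):\n", gb.predict_proba(X_test)[:5])
```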
I then worked with the Neural Networks' MLP Classifier, using a fixed random state, a hidden layer size of 100, an alpha of 1e-09, a maximum of 100,000 iterations, the lbfgs solver, and an initial learning rate of 1. For this model I also worked with scaled data. The training set accuracy of 1.000 indicated that the model was overfitting, and the testing set accuracy was not improved at 0.606.
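A minimal sketch of this step, assuming StandardScaler for the scaling (the write-up only says "scaled data") and the same placeholder split names and seed as above:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

seed = 42  # same placeholder seed as above

# Scale the features; StandardScaler is an assumption on my part
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=[100],
                    alpha=1e-9,
                    max_iter=100000,
                    solver="lbfgs",
                    learning_rate_init=1.0,  # listed in the write-up; scikit-learn ignores it for lbfgs
                    random_state=seed)
mlp.fit(X_train_scaled, y_train)
print("MLP train accuracy:", mlp.score(X_train_scaled, y_train))
print("MLP test accuracy:", mlp.score(X_test_scaled, y_test))
```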
Finally, I used the scaled data with a Support Vector Machine, using the rbf kernel, a C of 1,000, and a gamma of 0.001. This produced a definite improvement in the gap between the training and testing set accuracies (0.625559 and 0.608333). The small gap between the two scores indicates that the model was well tuned on the training data and therefore yielded similar predictions on the testing data.
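This step might look like the sketch below, again assuming StandardScaler for the scaling and the same placeholder split names:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same scaling assumption as in the MLP sketch above
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# RBF-kernel SVM with the tuned hyperparameters reported above
svm = SVC(kernel="rbf", C=1000, gamma=0.001)
svm.fit(X_train_scaled, y_train)
print("SVM train accuracy:", svm.score(X_train_scaled, y_train))
print("SVM test accuracy:", svm.score(X_test_scaled, y_test))
```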