
Supervised Learning: The Wine Project

The goal of this project was to apply several supervised learning models to either the red or the white wine dataset provided to us. We were to tune each model to find its best parameters, fit it on the training split, and then compare the accuracy scores on the training and testing splits to determine whether the model was well fitted, overfitted, or underfitted.
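To make the train-versus-test comparison concrete, here is a minimal sketch of that check in Python. The function name `describe_fit` and both thresholds (`max_gap` and `min_acc`) are hypothetical choices made for illustration; they are not part of the assignment, and sensible cutoffs depend on the dataset.

```python
def describe_fit(train_acc, test_acc, max_gap=0.05, min_acc=0.5):
    """Label a model from its training and testing accuracy.

    max_gap and min_acc are arbitrary illustrative thresholds (assumptions),
    not values taken from the project.
    """
    gap = train_acc - test_acc
    if gap > max_gap:
        # Much better on data it has seen than on unseen data.
        return "overfit"
    if train_acc < min_acc and test_acc < min_acc:
        # Poor on both splits: the model has not captured the pattern.
        return "underfit"
    return "reasonable"


# Made-up scores, purely to show the three outcomes.
print(describe_fit(0.99, 0.60))
print(describe_fit(0.40, 0.41))
print(describe_fit(0.80, 0.78))
```

A single accuracy gap is only a rough signal, so a check like this is best read alongside other evidence such as cross-validation scores.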

I chose to use the Random Forest Classifier model. I set a static random_state using the variable "seed", and then I used the Gradient Boosting Classifier to view the predicted probabilities.

The accuracy on the training set was 0.855, while the accuracy on the testing set was only 0.609. A gap of roughly 0.25 between the two indicates that the model was overfitting the training data.
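The Random Forest and Gradient Boosting steps described above might look roughly like the following scikit-learn sketch. It is not my original notebook: the data is a synthetic stand-in produced by `make_classification` rather than the wine dataset, and the seed value, split size, and number of trees are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

seed = 42  # assumed value; only the variable name "seed" comes from the post

# Synthetic stand-in for the wine data (assumption): 11 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=seed
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=seed, stratify=y
)

# Random forest with a fixed random_state so the results are repeatable.
forest = RandomForestClassifier(n_estimators=100, random_state=seed)
forest.fit(X_train, y_train)
rf_train_acc = accuracy_score(y_train, forest.predict(X_train))
rf_test_acc = accuracy_score(y_test, forest.predict(X_test))
print(f"Random forest train accuracy: {rf_train_acc:.3f}")
print(f"Random forest test accuracy:  {rf_test_acc:.3f}")

# Gradient boosting, used here only to inspect the predicted class probabilities.
booster = GradientBoostingClassifier(random_state=seed)
booster.fit(X_train, y_train)
gb_proba = booster.predict_proba(X_test)  # one row per sample, one column per class
print(gb_proba[:5])
```

Because the data is synthetic, the scores printed by this sketch will not match the 0.855 and 0.609 reported above.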

I then worked with the neural network MLP Classifier, using tuning parameters of a fixed random state, a hidden layer size of 100, an alpha of 1e-09, a maximum of 100,000 iterations, the lbfgs solver, and an initial learning rate of 1. For this model I also worked with scaled data. The training set accuracy of 1.000 indicated that the model was overfitting, and the testing set accuracy did not improve, at 0.606.
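As a rough illustration, the MLP settings listed above map onto scikit-learn's `MLPClassifier` as in the sketch below. The synthetic data, the seed value, and the use of `StandardScaler` for scaling are assumptions, since the post does not say which scaler was used.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

seed = 42  # assumed value

# Synthetic stand-in for the wine data (assumption).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=seed
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=seed, stratify=y
)

# The scaler is fitted on the training split only, then reused on the test split.
mlp = make_pipeline(
    StandardScaler(),  # assumed scaler
    MLPClassifier(
        hidden_layer_sizes=(100,),
        alpha=1e-9,
        max_iter=100_000,
        solver="lbfgs",
        # scikit-learn documents learning_rate_init as used only by the
        # sgd and adam solvers, so it should have no effect with lbfgs.
        learning_rate_init=1.0,
        random_state=seed,
    ),
)
mlp.fit(X_train, y_train)
mlp_train_acc = accuracy_score(y_train, mlp.predict(X_train))
mlp_test_acc = accuracy_score(y_test, mlp.predict(X_test))
print(f"MLP train accuracy: {mlp_train_acc:.3f}")
print(f"MLP test accuracy:  {mlp_test_acc:.3f}")
```

With such a small alpha there is almost no regularization, which is consistent with a network that can memorize the training split.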

Finally, I used the scaled data with a Support Vector Machine, using the rbf kernel, a C of 1,000, and a gamma of 0.001. This produced a much smaller gap between the training and testing set accuracies (0.625559 and 0.608333). The small gap between the two accuracy scores indicates that the model was well tuned on the training data and therefore yielded similar predictions on the testing data.
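The SVM configuration above could be sketched in scikit-learn as follows. As with the earlier sketches, the synthetic data, the seed value, and the choice of `StandardScaler` are assumptions rather than details from the project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

seed = 42  # assumed value

# Synthetic stand-in for the wine data (assumption).
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=6, n_classes=3, random_state=seed
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=seed, stratify=y
)

# RBF-kernel SVM with the C and gamma values mentioned in the post.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1000, gamma=0.001))
svm.fit(X_train, y_train)
svm_train_acc = accuracy_score(y_train, svm.predict(X_train))
svm_test_acc = accuracy_score(y_test, svm.predict(X_test))
print(f"SVM train accuracy: {svm_train_acc:.3f}")
print(f"SVM test accuracy:  {svm_test_acc:.3f}")
print(f"Gap: {svm_train_acc - svm_test_acc:.3f}")
```

A small gamma gives each training point a wide radius of influence, which tends to produce a smoother decision boundary and may help explain why the training and testing scores end up close together.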

