Bandersnatch: Unleashing the Power of Machine Learning and AI in Monster Classification
Author: Jiannina Pinto
Introduction
Welcome to the Bandersnatch Monster Project, where the power of machine learning and AI is utilized to automate the classification of monsters based on their attributes. Join me on an exciting journey as we delve into the fascinating realm of this project, analyze the data, explore the algorithms used, and reveal the amazing results achieved.
Project Statement
The goal of the Bandersnatch project is to automate the classification of monsters based on their unique attributes, providing users with valuable insights into their ranks. With a diverse range of monsters, understanding their traits and characteristics can be a challenging task. This project tackles this challenge by leveraging cutting-edge technology and advanced machine-learning algorithms to create an efficient and accurate classification system.
Data Collection
The dataset used for the monster rarity classification problem was sourced from MongoDB and consisted of monsters randomly generated with the MonsterLab library. During the preprocessing phase, irrelevant columns such as _id, Name, and Damage were excluded from the dataset to focus on attributes relevant for modeling. The dataset includes the features Type, Level, Health, Energy, and Sanity, along with the target variable Rarity. It contains 1,500 observations, with rarity classes ranging from Rank 0 to Rank 5. Here is a random sample of the monster data, showcasing a diverse range of monster attributes.
Image: Random sample of the monster dataset
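The column-dropping step can be sketched with pandas. The records below are hypothetical stand-ins for documents fetched from the MongoDB collection (the real data comes from the MonsterLab generator), so the specific names and values are illustrative only:

```python
import pandas as pd

# Hypothetical stand-ins for documents pulled from MongoDB;
# the real project generates these records with the MonsterLab library.
records = [
    {"_id": "a1", "Name": "Night Wraith", "Type": "Undead", "Level": 7,
     "Health": 42.3, "Energy": 40.1, "Sanity": 41.7, "Damage": "7d6+2",
     "Rarity": "Rank 3"},
    {"_id": "b2", "Name": "Pit Fiend", "Type": "Demonic", "Level": 2,
     "Health": 11.9, "Energy": 12.4, "Sanity": 12.0, "Damage": "2d4+1",
     "Rarity": "Rank 0"},
]
df = pd.DataFrame(records)

# Drop columns that carry no predictive signal for the rarity model.
df = df.drop(columns=["_id", "Name", "Damage"])
print(sorted(df.columns))
```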
Data Preprocessing
The initial Exploratory Data Analysis (EDA) revealed an imbalance in the "Rarity" classes, with some classes overrepresented and others underrepresented. To address this challenge, several preprocessing techniques were applied: feature scaling of numerical variables, one-hot encoding of categorical variables, and a ColumnTransformer for streamlined preprocessing. Additionally, oversampling (SMOTE, the Synthetic Minority Over-sampling Technique) and undersampling (RandomUnderSampler) were employed to balance the rarity classes, ensuring fair representation for all classes.
Image: Distribution of the target feature "Rarity"
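A minimal sketch of the ColumnTransformer setup, assuming scikit-learn and the feature names above; the tiny frame here is illustrative, not the real 1,500-row dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame with the same feature names as the monster data.
df = pd.DataFrame({
    "Type":   ["Demonic", "Elemental", "Undead", "Demonic"],
    "Level":  [1, 5, 10, 3],
    "Health": [12.0, 40.5, 88.0, 20.0],
    "Energy": [11.5, 39.0, 90.0, 22.0],
    "Sanity": [10.0, 41.0, 87.5, 21.0],
})

numeric = ["Level", "Health", "Energy", "Sanity"]
categorical = ["Type"]

# Scale numeric columns and one-hot encode the categorical column in one step.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = preprocessor.fit_transform(df)
print(X.shape)  # 4 rows; 4 scaled numerics + 3 one-hot columns = 7
```

SMOTE and RandomUnderSampler (from the separate imbalanced-learn package) would then be applied to the transformed training data.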
Machine Learning Models
To tackle the monster classification task, three powerful machine learning algorithms were utilized: Random Forest, Extreme Gradient Boosting (XGBoost), and Support Vector Machines (SVM).
- Random Forest was chosen for its ability to handle high-dimensional data and potentially capture non-linear relationships between monster attributes and their ranks.
- XGBoost was included as it has demonstrated state-of-the-art performance in several machine-learning competitions and is suitable for classification tasks.
- SVM was chosen for its ability to handle complex classification tasks and non-linear relationships using kernel functions. Also, because of its potential to generalize well to unseen data.
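A sketch of how the three models might be instantiated with scikit-learn. The XGBoost line is commented out because it requires the separate xgboost package, and the hyperparameters shown are illustrative defaults, not the project's tuned values:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
# from xgboost import XGBClassifier  # requires the xgboost package

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # probability=True makes predict_proba available, which ROC AUC OvO needs.
    "SVM": SVC(kernel="rbf", probability=True, random_state=42),
    # "XGBoost": XGBClassifier(eval_metric="mlogloss", random_state=42),
}
for name, model in models.items():
    print(name, type(model).__name__)
```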
Initially, all the models were trained and evaluated on the imbalanced dataset, reflecting the real-world scenario in which some rarity classes are overrepresented relative to others. Despite achieving good roc_auc_ovo scores, the models struggled to classify the minority classes effectively, especially Rank 4 and Rank 5. This highlighted the challenge of imbalanced data and the need to address class imbalance to improve overall model performance. Now, let's examine the classification reports and the ROC AUC OvO scores to gain deeper insight into the performance of our models.
- Imbalanced Random Forest
ROC AUC OvO Imbalanced RF: 0.9587445566198092
- Imbalanced Extreme Gradient Boosting
ROC AUC OvO Imbalanced XGB: 0.9509173960536714
- Imbalanced Support Vector Machines
ROC AUC OvO Imbalanced SVM: 0.9800088622224434
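Scores like these come from scikit-learn's roc_auc_score with multi_class="ovo". A minimal sketch on synthetic multiclass data (the score printed here belongs to the toy data, not the monster models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic 6-class problem standing in for the six rarity ranks.
X, y = make_classification(n_samples=600, n_features=8, n_informative=6,
                           n_classes=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)

# One-vs-one ROC AUC averages the AUC over every pair of classes.
score = roc_auc_score(y_test, proba, multi_class="ovo")
print(round(score, 3))
```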
To overcome the imbalanced-data challenge, cross-validation was performed using StratifiedKFold: the data was split into multiple folds, each preserving the same class distribution as the original dataset. The models were then trained on balanced datasets produced by the oversampling and undersampling techniques. This resulted in significant improvements in classification performance, particularly for the minority classes, as the balanced training data allowed the models to learn from a more representative sample of every class. Let's take a look at the classification reports and the ROC AUC OvO scores.
- Balanced Random Forest
ROC AUC OvO Balanced RF: 0.9602214013749324
- Balanced Extreme Gradient Boosting
ROC AUC OvO Balanced XGB: 0.961452933038248
- Balanced Support Vector Machines
ROC AUC OvO Balanced SVM: 0.9734961009815477
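The stratified splitting described above can be sketched as follows. The resampling step (SMOTE or undersampling inside each fold, via imbalanced-learn) is omitted to keep the example dependency-free:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Deliberately imbalanced 3-class toy problem (60% / 30% / 10%).
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           weights=[0.6, 0.3, 0.1], random_state=42)

# Each fold preserves (approximately) the original class proportions.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    _, counts = np.unique(y[test_idx], return_counts=True)
    print(f"fold {fold}: test class counts = {counts.tolist()}")
```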
The roc_auc_ovo metric showed improvements, indicating that the models became more skilled at distinguishing between the different ranks of monsters. This balanced approach not only improved the accuracy and performance of the models but also ensured fair and reliable classification across all rarity classes.
Model Evaluation and Performance
The performance of the models was evaluated using several metrics, including ROC AUC OvO, overall accuracy, and F1-score. The SVM model demonstrated the highest roc_auc_ovo score (about 0.97), indicating its ability to classify monsters accurately. It also achieved the highest F1-scores and high precision and recall across most classes, indicating a well-balanced performance.
Hyperparameter Tuning
To optimize the models, hyperparameter tuning was performed using techniques like Randomized Search. This involved systematically searching through different combinations of hyperparameters to identify the best configuration for each algorithm. The Tuned SVM model stood out with the highest ROC AUC OvO score and accuracy, demonstrating superior precision, recall, and F1-score values compared to the other models.
ROC AUC OvO Tuned SVM: 0.993272225898018
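A sketch of a Randomized Search over SVM hyperparameters with scikit-learn's RandomizedSearchCV. The search space and toy data below are illustrative assumptions; the project's actual grid may differ:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Small synthetic multiclass dataset so the search runs quickly.
X, y = make_classification(n_samples=300, n_classes=3, n_informative=5,
                           random_state=42)

# Illustrative search space, not the project's exact grid.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
    "kernel": ["rbf", "poly"],
}
search = RandomizedSearchCV(
    SVC(probability=True, random_state=42),
    param_distributions,
    n_iter=10,               # sample 10 random configurations
    scoring="roc_auc_ovo",   # the same metric used throughout the project
    cv=3,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)
```

Randomized Search samples a fixed number of configurations rather than exhaustively enumerating a grid, which keeps tuning affordable when several hyperparameters interact.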
A summary table displays the evaluation metrics of the three tuned models:
Results and Insights
The best-performing model, SVM, was further evaluated on the test set to verify its effectiveness in real-world scenarios. It achieved an overall accuracy of 96% and a ROC AUC OvO score of 0.9845, indicating its strong ability to classify monsters accurately.
The classification report for the SVM model on the test set showed excellent precision, recall, and F1-scores across all classes, highlighting its well-balanced performance and its accurate prediction of both the majority and minority classes.
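Per-class precision, recall, and F1 figures like these come from scikit-learn's classification_report. A sketch on toy rank labels (not the project's actual predictions):

```python
from sklearn.metrics import classification_report

# Toy true/predicted rank labels, standing in for the SVM's test-set output.
y_true = ["Rank 0", "Rank 0", "Rank 1", "Rank 1", "Rank 2", "Rank 2"]
y_pred = ["Rank 0", "Rank 0", "Rank 1", "Rank 2", "Rank 2", "Rank 2"]

# output_dict=True returns the per-class metrics as a nested dictionary.
report = classification_report(y_true, y_pred, output_dict=True)
print(report["Rank 0"]["f1-score"])  # Rank 0 is classified perfectly here
```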
To provide further insights, feature importances were examined, revealing the significant contribution of certain attributes in determining monster ranks. These insights can assist in understanding the key factors that influence a monster's rarity.
The results highlight the significant role of the level attribute, followed by energy, health, and sanity in determining monster ranks. A visual representation of the feature importances can be found in the accompanying image.
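An RBF-kernel SVM does not expose feature_importances_ directly; one common way to obtain importances for such a model, sketched here on toy data, is scikit-learn's permutation_importance (the ranking printed is for the toy data, not the attribute ranking shown in the image):

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.svm import SVC

# Toy 4-feature problem standing in for Level, Health, Energy, Sanity.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)
clf = SVC(probability=True, random_state=42).fit(X, y)

# Shuffle each feature in turn and measure the resulting drop in score.
result = permutation_importance(clf, X, y, n_repeats=5, random_state=42)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance = {imp:.3f}")
```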
Deployment and Future Work
The SVM model emerged as a robust choice for the classification task in the Bandersnatch Monster Project. The model was deployed in a web application built with Flask, allowing users to interactively classify and explore monsters and make informed decisions based on the ranks assigned to each one. Additionally, a visualization tool was developed with the Altair library; it serves as a playground for exploring correlations between features, providing an interactive and engaging experience. Here is an example of the correlation visualization between "Health" and "Energy," grouped by "Rarity".
Image: Correlation between Health and Energy using Altair
Future improvements could involve incorporating additional features such as Damage, expanding the dataset, or integrating other advanced techniques such as natural language processing for text-based monster descriptions.
Feel free to explore the full codebase and implementation details in the GitHub repository linked below. Witness the power of machine learning and AI in unraveling the mysteries of the Bandersnatch monsters!
Conclusion
The Bandersnatch Monster Project showcases the potential of machine learning and AI to automate complex tasks like monster classification. By leveraging powerful algorithms, careful data preprocessing, and hyperparameter tuning, the project achieved strong results in classifying monsters by their attributes. The Flask web application and the Altair visualization tool make the project interactive and user-friendly.
Thank you for joining me on this exciting journey through the world of monsters and the power of data science and machine learning. I hope you found this article insightful and informative. By exploring the depths of monster classification, we have uncovered fascinating insights and embarked on a captivating adventure. Your interest and engagement are greatly appreciated.