Note: This is a demo version of the model, streamlined to offer a few useful features. The full version will be more accurate, incorporating additional data inputs for more reliable results.

Customer Churn Prediction

Date: July 2024

GitHub: BinaryClassification

Objective

Our goal was to identify customers who are likely to churn, using various machine learning techniques with a focus on optimizing recall and precision.

This post summarizes our journey through the stages of data exploration, feature engineering, model selection, and refinement.


Overview

Data Exploration Report


Start: Data Exploration and Preprocessing

We began by understanding the dataset’s structure and performing necessary preprocessing. This included handling missing values, encoding categorical variables, and performing initial exploratory data analysis (EDA) to identify patterns and relationships in the data.
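The preprocessing steps above can be sketched with pandas. The column names and values below are illustrative assumptions, not the actual schema of the churn dataset:

```python
import pandas as pd

# Toy frame standing in for the churn data (columns are hypothetical).
df = pd.DataFrame({
    "tenure": [1, 34, None, 45],
    "contract": ["month-to-month", "one-year", "month-to-month", "two-year"],
    "churn": [1, 0, 1, 0],
})

# Handle missing values: impute the numeric column with its median.
df["tenure"] = df["tenure"].fillna(df["tenure"].median())

# Encode the categorical variable with one-hot encoding.
df = pd.get_dummies(df, columns=["contract"], drop_first=True)

print(df.shape)               # (4, 4) — rows kept, category expanded to dummies
print(df.isna().sum().sum())  # 0 — no missing values remain
```

From here, `df.describe()` and `df.corr()` are typical starting points for the EDA mentioned above.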

Day 1 Report

Power BI Report

Key Insights:

Technologies Used:


Feature Selection and Naive Modeling

To simplify the model, we dropped features with a low Matthews correlation coefficient (MCC) against the target and used the Variance Inflation Factor (VIF) to remove multicollinear features. We then trained simple baseline models with class weights to counter the imbalance.
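The VIF screen described above can be computed directly from the definition VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing feature j on the remaining features. A minimal sketch on synthetic data (the feature matrix and the class-weighted baseline below are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def vif(X: np.ndarray) -> np.ndarray:
    """Variance Inflation Factor per column: 1 / (1 - R^2_j)."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])  # col 2 ≈ col 0

vifs = vif(X)
print(vifs)  # columns 0 and 2 show very large VIFs -> drop one of them

# Class-weighted baseline, as used in the post to handle imbalance.
y = (a + b > 1).astype(int)  # imbalanced toy target
baseline = LogisticRegression(class_weight="balanced").fit(X[:, :2], y)
```

A common rule of thumb is to drop features whose VIF exceeds 5 or 10.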


Handling Class Imbalance with SMOTE-N

Given the imbalanced nature of the churn data, we applied Synthetic Minority Over-sampling Technique for Nominal data (SMOTE-N) to balance the classes. This improved the model’s ability to predict churners effectively.

Key Insights:


Error Analysis and Model Refinement

To improve our model’s decision boundaries, we analyzed errors and introduced decision trees. This helped overcome the limitations of linear decision boundaries and led to better predictions.
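The limitation of linear boundaries is easy to demonstrate on XOR-shaped data, where no single line separates the classes but axis-aligned tree splits succeed. The dataset below is synthetic, purely to illustrate the point:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# XOR-like data: class depends on the sign pattern of the two features,
# so no linear decision boundary can separate it.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

linear = LogisticRegression().fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

print(linear.score(X, y))  # near chance — linear boundary fails
print(tree.score(X, y))    # fits the pattern with axis-aligned splits
```

In practice, tree depth would be tuned on a validation set; the fully grown tree here is just to show that the boundary shape, not the data, was the bottleneck.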

Key Takeaways:


Finish: Ensemble Methods and AutoML

We implemented ensemble methods such as Random Forest and Gradient Boosting to enhance model performance. AutoML tools were also utilized to streamline model selection and hyperparameter tuning. Our final model, CatBoost, achieved excellent recall and precision.
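The ensemble stage can be sketched with scikit-learn's Random Forest and Gradient Boosting (the final CatBoost model has an analogous `fit`/`predict` interface); the synthetic imbalanced dataset below stands in for the real churn data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for the churn dataset (80/20 split of classes).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__,
          f"recall={recall_score(y_te, pred):.2f}",
          f"precision={precision_score(y_te, pred):.2f}")
```

Reporting recall and precision side by side, as above, matches the evaluation focus stated in the objective.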

Day 6 Ensemble Report

Models Used:

Key Insights:


Conclusion

By systematically progressing through data preprocessing, feature selection, class balancing, error analysis, and advanced techniques, we built a robust model for predicting customer churn. The final model, CatBoost, achieved both recall and precision of 86%, proving highly effective at identifying churners.