Date: April 2024
GitHub: T20CricketPrediction
Our goal was to predict the outcome of T20 cricket matches using deep learning, with a focus on accuracy and real-time prediction. This post summarizes the project: data collection, feature engineering, model selection, and deployment.
We began by gathering data from sources such as Cricsheet and ESPN cricket stats. Preprocessing included handling missing values and encoding categorical variables, and we performed exploratory data analysis (EDA) to identify key patterns and relationships in the match data.
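The two preprocessing steps can be sketched in plain Python. The write-up does not say which imputation or encoding method was used; median imputation and one-hot encoding are assumed here as common choices, and the field names and rows are hypothetical.

```python
from statistics import median

# Hypothetical match-level rows; field names are illustrative, not the project's schema.
rows = [
    {"venue": "venue_a", "toss_decision": "bat", "first_innings_runs": 180},
    {"venue": "venue_b", "toss_decision": "field", "first_innings_runs": None},
    {"venue": "venue_a", "toss_decision": "field", "first_innings_runs": 150},
]

def impute_median(rows, key):
    """Replace missing (None) numeric values in `key` with the median of observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    fill = median(observed)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]

def one_hot(rows, key):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[key] for r in rows})
    out = []
    for r in rows:
        encoded = {k: v for k, v in r.items() if k != key}
        for c in categories:
            encoded[f"{key}={c}"] = 1 if r[key] == c else 0
        out.append(encoded)
    return out

clean = impute_median(rows, "first_innings_runs")
for col in ("venue", "toss_decision"):
    clean = one_hot(clean, col)
print(clean[1])
```

With pandas, the equivalent steps are usually `df[col].fillna(df[col].median())` and `pd.get_dummies(df, columns=[...])`.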
We used Apache Airflow to orchestrate the data pipeline. The pipeline runs extraction, transformation, and loading (ETL) steps that integrate our different data sources.
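A minimal plain-Python sketch of the three ETL stages, assuming a hypothetical ball-by-ball CSV layout (the column names are illustrative, not Cricsheet's actual format) and an in-memory SQLite table as the load target. It does not use Airflow; it only shows what each stage does.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical ball-by-ball CSV; column names are illustrative only.
RAW_CSV = """match_id,innings,over,runs,wicket
m1,1,0,4,0
m1,1,0,1,0
m1,1,1,0,1
m1,2,0,6,0
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: cast types and aggregate runs and wickets per (match, innings)."""
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        key = (r["match_id"], int(r["innings"]))
        totals[key][0] += int(r["runs"])
        totals[key][1] += int(r["wicket"])
    return [(m, i, runs, wkts) for (m, i), (runs, wkts) in sorted(totals.items())]

def load(rows, conn):
    """Load: write the aggregated rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS innings_totals "
        "(match_id TEXT, innings INTEGER, runs INTEGER, wickets INTEGER)"
    )
    conn.executemany("INSERT INTO innings_totals VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM innings_totals ORDER BY innings").fetchall())
```

In an Airflow DAG, each function would typically become its own task (for example via the TaskFlow `@task` decorator or a `PythonOperator`), with dependencies in extract, transform, load order so a failed stage can be retried on its own.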
Key Components:
To increase the diversity of our training data and improve generalization, we augmented it by truncating each match's data at different overs, so the model also sees partial-match states. This lets the model produce win predictions at any stage of a match, not only during the final overs.
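A minimal sketch of the over-based truncation, assuming hypothetical ball-by-ball records and only two cumulative features (runs and wickets); the project's actual features are not described here. Each innings yields one training sample per over reached, all sharing the final match result as the label.

```python
from dataclasses import dataclass

@dataclass
class Ball:
    over: int     # 0-indexed over number (0..19 in a T20 innings)
    runs: int     # runs scored off this delivery
    wicket: bool  # whether a wicket fell on this delivery

def truncated_samples(balls, batting_team_won, max_overs=20):
    """Turn one innings into several training samples, one per over reached.

    Each sample sees only deliveries from overs before `cutoff`, so the model
    learns to predict the result from partial-match states.
    """
    last_over = max(b.over for b in balls) + 1  # overs reached (last may be partial)
    samples = []
    for cutoff in range(1, min(max_overs, last_over) + 1):
        seen = [b for b in balls if b.over < cutoff]
        samples.append({
            "overs_completed": cutoff,
            "runs": sum(b.runs for b in seen),
            "wickets": sum(b.wicket for b in seen),
            "label": int(batting_team_won),  # final result, same for every cutoff
        })
    return samples

# Toy innings: three heavily abbreviated overs.
balls = [Ball(0, 4, False), Ball(0, 1, False), Ball(1, 0, True), Ball(2, 6, False)]
samples = truncated_samples(balls, batting_team_won=True)
for s in samples:
    print(s)
```

Because every cutoff shares one label, early-over samples are inherently harder to classify, which is what lets the model learn calibrated predictions for the whole match rather than only its end.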
To enhance model tracking and experiment management, we integrated Weights & Biases (WandB) into our workflow. This facilitated real-time monitoring of training processes and streamlined collaboration.
We conducted hyperparameter tuning to optimize model performance. This involved adjusting parameters such as learning rate, batch size, and network architecture to achieve the best possible results.
To achieve optimal model performance, we used WandB Sweeps to systematically explore the hyperparameter space. The tuning focused on four main aspects: learning rate, batch size, network architecture, and dropout rate.
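The four tuning dimensions can be expressed as a sweep configuration in the dictionary format that WandB Sweeps accepts. The sketch below is hedged: all parameter names and value lists are hypothetical, and since running a real sweep needs a WandB account, a small local random search (not WandB's own controller) plus a placeholder objective stands in for training.

```python
import random

# Sweep configuration in the dictionary format that wandb.sweep() accepts.
# All parameter names and values here are hypothetical, not the project's.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 3e-4, 1e-3, 3e-3]},
        "batch_size": {"values": [32, 64, 128]},
        "num_layers": {"values": [1, 2, 3]},           # network architecture: depth
        "hidden_units": {"values": [64, 128, 256]},    # network architecture: width
        "dropout": {"values": [0.1, 0.2, 0.3, 0.5]},
    },
}

def sample_config(parameters, rng):
    """Draw one configuration uniformly at random from each `values` list."""
    return {name: rng.choice(spec["values"]) for name, spec in parameters.items()}

def random_search(train_fn, parameters, n_trials, seed=0):
    """Local stand-in for a random sweep: evaluate n_trials configs, keep the best."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(parameters, rng)
        score = train_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_objective(cfg):
    """Placeholder for 'train the model and return validation accuracy'."""
    return 1.0 - abs(cfg["learning_rate"] - 1e-3) * 100 - abs(cfg["dropout"] - 0.2)

best_cfg, best_score = random_search(toy_objective, sweep_config["parameters"], n_trials=50)
print(best_cfg, best_score)
```

With WandB itself, the same dictionary would be registered with `sweep_id = wandb.sweep(sweep_config, project="t20-prediction")` and run with `wandb.agent(sweep_id, function=train, count=50)`, where `train` (hypothetical) calls `wandb.init()` and reads hyperparameters from `wandb.config`; the project name is a placeholder. Setting `"method"` to `"bayes"` switches to Bayesian optimization, which uses earlier results to choose the next configuration.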
The hyperparameter tuning process led to a significant improvement in model accuracy from 80% to 85%, demonstrating the effectiveness of systematic optimization techniques.
The final model achieved 85% accuracy on the test set. Evaluating it separately at different overs showed accuracy increasing as the match progresses, which is expected: later in the match, more of the result is already reflected in the score.
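To make the per-over evaluation concrete, the sketch below groups test predictions by the number of overs completed and computes accuracy for each group. The records are made-up toy data, not the project's results, and the (overs, predicted, actual) tuple layout is an assumption.

```python
from collections import defaultdict

def accuracy_by_over(records):
    """Group (overs_completed, predicted, actual) records by over; return accuracy per over."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for over, predicted, actual in records:
        total[over] += 1
        correct[over] += int(predicted == actual)
    return {over: correct[over] / total[over] for over in sorted(total)}

# Toy predictions: (overs completed, predicted winner, actual winner)
records = [
    (5, "A", "A"), (5, "B", "A"), (5, "A", "B"), (5, "B", "B"),
    (15, "A", "A"), (15, "B", "B"), (15, "A", "B"), (15, "B", "B"),
]
per_over = accuracy_by_over(records)
print(per_over)  # {5: 0.5, 15: 0.75}
```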
[Figures: Win Example, Loss Example]
By working systematically through data preprocessing, feature engineering, class balancing, error analysis, and advanced modeling techniques, we built a robust model for predicting T20 cricket match outcomes. The final model achieved 85% accuracy and can produce predictions in real time at any stage of a match. Integrated into a cricket application, it can engage users with live win probabilities and match analytics.
This structured approach allowed us to effectively predict the outcomes of T20 cricket matches, ensuring our model was both accurate and reliable.