Blaze
Dev

T20 Cricket Win Prediction with Deep Learning

Date: April 2024

GitHub: T20CricketPrediction

Objective

Our goal was to predict the outcome of T20 cricket matches using deep learning techniques, with a focus on accuracy and real-time prediction. This post summarizes the project from data collection and feature engineering through model selection and deployment.


Architecture Overview

[Figure: Architecture Overview]


Data Collection and Building Pipeline

Data Collection

We began by gathering data from sources like Cricsheet and ESPN cricket stats. Necessary preprocessing steps included handling missing values, encoding categorical variables, and performing exploratory data analysis (EDA) to identify key patterns and relationships in the match data.
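As a rough illustration of this kind of preprocessing, the sketch below flattens one match into per-delivery rows, fills a missing categorical value, and integer-encodes it. The sample match is made up and only mimics the general shape of Cricsheet's JSON files; the field selection, the helper names `flatten_match` and `label_encode`, and the "unknown" fallback are assumptions for illustration, not the project's actual code.

```python
from typing import Any

# Hypothetical miniature match shaped like Cricsheet's JSON layout
# (info -> outcome -> winner, innings -> overs -> deliveries). Values are made up.
SAMPLE_MATCH: dict[str, Any] = {
    "info": {
        "teams": ["Team A", "Team B"],
        "venue": None,  # deliberately missing to exercise the fallback
        "outcome": {"winner": "Team A"},
    },
    "innings": [
        {
            "team": "Team A",
            "overs": [
                {
                    "over": 0,
                    "deliveries": [
                        {"runs": {"batter": 4, "extras": 0, "total": 4}},
                        {"runs": {"batter": 0, "extras": 1, "total": 1}},
                        {"runs": {"batter": 0, "extras": 0, "total": 0},
                         "wickets": [{"kind": "bowled"}]},
                    ],
                }
            ],
        }
    ],
}


def flatten_match(match):
    """Return one row per delivery with a few fields a model might use."""
    info = match["info"]
    venue = info.get("venue") or "unknown"          # fill a missing category
    winner = info.get("outcome", {}).get("winner")  # None if no winner recorded
    rows = []
    for innings_no, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for ball_no, delivery in enumerate(over["deliveries"], start=1):
                rows.append({
                    "innings": innings_no,
                    "batting_team": innings["team"],
                    "venue": venue,
                    "over": over["over"],
                    "ball": ball_no,
                    "runs": delivery["runs"]["total"],
                    "wicket": int(bool(delivery.get("wickets"))),
                    "batting_team_won": int(innings["team"] == winner),
                })
    return rows


def label_encode(rows, column):
    """Replace a categorical column with integer codes and return the mapping."""
    categories = sorted({row[column] for row in rows})
    mapping = {value: code for code, value in enumerate(categories)}
    for row in rows:
        row[column] = mapping[row[column]]
    return mapping


rows = flatten_match(SAMPLE_MATCH)
venue_codes = label_encode(rows, "venue")
```

In practice a dataframe library would probably replace the hand-written loops, but the steps stay the same: flatten, fill gaps, encode.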

Technologies Used:

Building the Pipeline

Utilizing Apache Airflow, we orchestrated the data pipeline to ensure efficient data flow and management. The pipeline includes data extraction, transformation, and loading (ETL) processes, enabling seamless integration of diverse data sources.
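The three ETL stages can be sketched as plain Python functions. This is not an Airflow DAG: the scheduler wiring is left out, and in an Airflow setup each function would typically become its own task. The table name `over_summary`, the raw rows, and the choice to fill missing run counts with 0 are all assumptions made for the example.

```python
import sqlite3


def extract():
    """Stand-in for pulling raw records from a source such as Cricsheet."""
    # Hypothetical raw rows: (match_id, batting_team, over_no, runs_in_over)
    return [
        ("m1", "Team A", 0, 7),
        ("m1", "Team A", 1, None),  # missing value from the source
        ("m1", "Team A", 2, 12),
    ]


def transform(raw_rows):
    """Fill missing run counts with 0 and add a cumulative score column."""
    cleaned, total = [], 0
    for match_id, team, over_no, runs in raw_rows:
        runs = 0 if runs is None else runs
        total += runs
        cleaned.append((match_id, team, over_no, runs, total))
    return cleaned


def load(rows, conn):
    """Write the transformed rows to a table a training job could read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS over_summary ("
        "match_id TEXT, batting_team TEXT, over_no INTEGER, "
        "runs INTEGER, cumulative_runs INTEGER)"
    )
    conn.executemany("INSERT INTO over_summary VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the stages in dependency order: extract, then transform, then load."""
    load(transform(extract()), conn)


conn = sqlite3.connect(":memory:")
run_pipeline(conn)
row_count = conn.execute("SELECT COUNT(*) FROM over_summary").fetchone()[0]
final_score = conn.execute(
    "SELECT MAX(cumulative_runs) FROM over_summary"
).fetchone()[0]
```

Keeping each stage a pure function of its inputs makes it easy to test the stages individually before handing them to an orchestrator.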

Key Components:


Data Augmentation

To diversify the training data and improve the model's generalization, we augmented it by limiting each match to a given number of overs. This lets the model produce win predictions at any stage of a match, not only during the final overs.
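One way to picture this augmentation: a single innings yields one training example per completed over, each describing the score at that point and labelled with the eventual result. The sketch below assumes a simple `(over, runs, is_wicket)` tuple per delivery and a hypothetical feature set; the actual features used in the project may differ.

```python
def _snapshot(over, runs, wickets, total_overs, batting_team_won):
    """Describe the innings as it stood at the end of the given over."""
    return {
        "overs_completed": over + 1,
        "overs_remaining": total_overs - (over + 1),
        "runs": runs,
        "wickets": wickets,
        "label": int(batting_team_won),
    }


def snapshots_by_over(deliveries, batting_team_won, total_overs=20):
    """Build one training example per over of an innings.

    deliveries: list of (over_index, runs, is_wicket) tuples in match order.
    Every example carries the final match result as its label, so the model
    learns to predict the outcome from a partial innings.
    """
    examples = []
    runs = wickets = 0
    current_over = None
    for over, ball_runs, is_wicket in deliveries:
        # A change in over index means the previous over is finished.
        if current_over is not None and over != current_over:
            examples.append(_snapshot(current_over, runs, wickets,
                                      total_overs, batting_team_won))
        current_over = over
        runs += ball_runs
        wickets += int(is_wicket)
    if current_over is not None:  # emit the last over seen
        examples.append(_snapshot(current_over, runs, wickets,
                                  total_overs, batting_team_won))
    return examples


# Made-up deliveries covering two overs of an innings the batting side won.
toy_deliveries = [
    (0, 1, False), (0, 4, False), (0, 0, True),
    (1, 6, False), (1, 0, False),
]
examples = snapshots_by_over(toy_deliveries, batting_team_won=True)
```

A full 20-over innings would produce up to 20 examples this way, which is where the extra training diversity comes from.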


Modeling with Weights & Biases (WandB)

To enhance model tracking and experiment management, we integrated Weights & Biases (WandB) into our workflow. This facilitated real-time monitoring of training processes and streamlined collaboration.

Technologies Used:


Hyperparameter Tuning

We conducted hyperparameter tuning to optimize model performance. This involved adjusting parameters such as learning rate, batch size, and network architecture to achieve the best possible results.

Technologies Used:

Detailed Report

To achieve optimal model performance, we utilized WandB Sweeps to systematically explore the hyperparameter space. The tuning process focused on four main aspects:

  1. Learning Rate:

    • Experimented with a range of learning rates from 0.0001 to 0.1.
    • Identified 0.001 as the optimal learning rate that balances convergence speed and stability.
  2. Batch Size:

    • Tested batch sizes of 32, 64, and 128.
    • Found that a batch size of 32 provided the best trade-off between training time and model accuracy.
  3. Network Architecture:

    • Modified the number of layers and units per layer.
    • Enhanced the model's capacity without overfitting by adding an additional hidden layer with 128 neurons.
    • Hidden Size: Increased to 256 neurons to further improve model capacity.
    • Number of Layers: Configured the model with 3 layers to balance complexity and computational efficiency.
  4. Dropout Rate:

    • Experimented with dropout rates ranging from 0.65 to 0.7.
    • Settled on a dropout rate of 0.7 to effectively prevent overfitting while maintaining model performance.

The hyperparameter tuning process led to a significant improvement in model accuracy from 80% to 85%, demonstrating the effectiveness of systematic optimization techniques.
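For readers who want to set up something similar, the search space described above could be written as a WandB sweep configuration along these lines. The ranges for learning rate, batch size, and dropout come from this post; the search method, the metric name `val_accuracy`, and the candidate values for `hidden_size` and `num_layers` are assumptions, since only the final choices are reported here.

```python
# Sweep definition in the dictionary form accepted by wandb.sweep().
sweep_config = {
    "method": "bayes",  # assumption: the post does not name the search strategy
    "metric": {"name": "val_accuracy", "goal": "maximize"},  # assumed metric name
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 1e-1,
        },
        "batch_size": {"values": [32, 64, 128]},
        "dropout": {"distribution": "uniform", "min": 0.65, "max": 0.7},
        "hidden_size": {"values": [128, 256]},  # assumed candidates
        "num_layers": {"values": [2, 3]},       # assumed candidates
    },
}

# The configuration reported as best in this post.
best_config = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "dropout": 0.7,
    "hidden_size": 256,
    "num_layers": 3,
}


def in_search_space(config, parameters):
    """Check that every value in config is allowed by the sweep parameters."""
    for name, value in config.items():
        spec = parameters[name]
        if "values" in spec:
            if value not in spec["values"]:
                return False
        elif not spec["min"] <= value <= spec["max"]:
            return False
    return True


best_is_reachable = in_search_space(best_config, sweep_config["parameters"])
```

With the wandb package installed and a training function defined, the dictionary would be registered with `wandb.sweep(sweep_config, project=...)` and executed with `wandb.agent(sweep_id, function=...)`; both calls are omitted here so the sketch runs without an account.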


Results

The final model achieved an accuracy of 85% on the test set, evaluated at different stages (overs) of the match. Accuracy increased as the overs progressed, which is expected: the later the over, the more of the match the model has already seen.
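A per-over evaluation of this kind can be computed by grouping test predictions by the over at which they were made. The sketch below uses made-up predictions purely to show the mechanics; the 0.5 decision threshold and the record layout are assumptions, and the numbers are not results from the project.

```python
from collections import defaultdict


def accuracy_by_over(records, threshold=0.5):
    """Compute accuracy separately for each match stage.

    records: iterable of (overs_completed, predicted_win_prob, actual_label).
    A prediction counts as a win when the probability reaches the threshold.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for over, prob, label in records:
        predicted = int(prob >= threshold)
        correct[over] += int(predicted == label)
        total[over] += 1
    return {over: correct[over] / total[over] for over in sorted(total)}


# Made-up predictions: two made after 5 overs, two after 15 overs.
toy_records = [
    (5, 0.6, 0),   # wrong: predicted win, actual loss
    (5, 0.7, 1),   # right
    (15, 0.9, 1),  # right
    (15, 0.2, 0),  # right
]
per_over = accuracy_by_over(toy_records)
```

Plotting the resulting dictionary against the over number gives the accuracy-by-stage curve discussed above.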

Win Example

Loss Example


Conclusion

By systematically progressing through data preprocessing, feature engineering, class balancing, error analysis, and advanced modeling techniques, we successfully built a robust model for predicting T20 cricket match outcomes. The final model achieved an accuracy of 85%, demonstrating its effectiveness in real-time prediction scenarios. Integrating this model into a cricket application enhances user engagement by providing insightful analytics and live win probabilities.

Overall Technologies Used:


This structured approach allowed us to effectively predict the outcomes of T20 cricket matches, ensuring our model was both accurate and reliable.