This script estimates the probability that one college basketball team beats another in a given match. It uses historical data on teams, regular season results, and tournament seeds to build a predictive model.
Purpose:
- Understand how to transform raw match data into features usable
by a machine learning model.
- Build a supervised model capable of predicting the match winner.
- Evaluate the model using standard metrics (log loss, ROC-AUC)
and apply it to tournament simulations.
Data:
- "teams": team information (TeamID, TeamName, first and last
Division 1 season)
- "results": regular season match results (winning team, losing team,
score, match day)
- "seed_round_slots": information on tournament seeds and match slots
Variables:
- "team_stats": number of wins and losses per team per season
- "match_data": prepared match dataset for model training
- "X", "y": features and target for training
- "model": trained logistic regression model
- "matchup_example": sample tournament matches for prediction
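
As an illustration of how "team_stats" and "match_data" might be derived from "results", here is a minimal sketch using pandas. The column names (Season, WTeamID, LTeamID), the tiny in-memory table, the helper build_rows, and the WinDiff feature are all assumptions made for the example; the actual script may use different names and features.

```python
import pandas as pd

# Hypothetical regular season results; column names are assumptions.
results = pd.DataFrame({
    "Season":  [2024, 2024, 2024, 2024],
    "WTeamID": [1101, 1101, 1102, 1103],
    "LTeamID": [1102, 1103, 1103, 1102],
})

# Wins and losses per team per season.
wins = (results.groupby(["Season", "WTeamID"]).size()
        .reset_index(name="Wins").rename(columns={"WTeamID": "TeamID"}))
losses = (results.groupby(["Season", "LTeamID"]).size()
          .reset_index(name="Losses").rename(columns={"LTeamID": "TeamID"}))
team_stats = wins.merge(losses, on=["Season", "TeamID"], how="outer").fillna(0)
team_stats[["Wins", "Losses"]] = team_stats[["Wins", "Losses"]].astype(int)


def build_rows(team1_col, team2_col, label):
    """Return one row per match with the given column treated as Team1."""
    df = results[["Season", team1_col, team2_col]].rename(
        columns={team1_col: "Team1", team2_col: "Team2"})
    df["Label"] = label
    return df


# Each match appears twice: once with the winner as Team1 (label 1) and
# once mirrored (label 0), so that both classes are present.
match_data = pd.concat(
    [build_rows("WTeamID", "LTeamID", 1), build_rows("LTeamID", "WTeamID", 0)],
    ignore_index=True,
)

# Attach each side's season record, then build a simple difference feature.
for side in ("Team1", "Team2"):
    stats = team_stats.rename(columns={
        "TeamID": side, "Wins": f"{side}_Wins", "Losses": f"{side}_Losses"})
    match_data = match_data.merge(stats, on=["Season", side], how="left")
match_data["WinDiff"] = match_data["Team1_Wins"] - match_data["Team2_Wins"]
```

Mirroring each match is one possible way to obtain both labels; assigning Team1 at random or by lower TeamID would serve the same purpose.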
Model:
- Logistic Regression
- It is supervised because it learns from labeled data: each historical
match has a label "1" if Team1 wins, "0" otherwise.
- Suitable for binary classification and allows estimating the probability
of a team winning.
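
To make the probability estimate concrete, the sketch below shows how a logistic regression model turns a feature vector into P(Team1 wins): a weighted sum of the features passed through the sigmoid function. The function name win_probability, the single feature (Team1 wins minus Team2 wins), and the weight values are hypothetical and chosen only for illustration.

```python
import math


def win_probability(features, weights, bias):
    """Logistic regression estimate of P(Team1 wins)."""
    # Linear score: bias plus the weighted sum of the features.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical example: Team1 has 5 more wins than Team2.
p_team1 = win_probability([5.0], weights=[0.2], bias=0.0)

# Swapping the two teams flips the sign of the feature.
p_team2 = win_probability([-5.0], weights=[0.2], bias=0.0)
```

With a zero bias and a pure difference feature, the two probabilities sum to 1, which is the behaviour one would want when the order of the teams is arbitrary.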
Objectives:
1. Load the necessary CSV files.
2. Compute wins and losses for each team and season.
3. Create a match dataset ready for training.
4. Normalize the data and split into training and test sets.
5. Train a supervised Logistic Regression model.
6. Evaluate the model using log loss and ROC-AUC.
7. Prepare a sample tournament matchup and predict win probabilities.
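
Steps 4 to 7 could look roughly like the following sketch with scikit-learn. It is not the script's actual code: the data is synthetic, the two features (win_diff and seed_diff) are assumptions, and wrapping the scaler and the classifier in a pipeline is a choice made here so that normalization is fitted on the training set only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for match_data (hypothetical features):
# win_diff  = Team1 wins minus Team2 wins
# seed_diff = Team1 seed minus Team2 seed (lower seed is better)
n = 400
win_diff = rng.integers(-15, 16, size=n).astype(float)
seed_diff = rng.integers(-15, 16, size=n).astype(float)
X = np.column_stack([win_diff, seed_diff])
true_p = 1.0 / (1.0 + np.exp(-(0.25 * win_diff - 0.15 * seed_diff)))
y = (rng.random(n) < true_p).astype(int)  # 1 if Team1 wins, 0 otherwise

# Step 4: split, with normalization handled inside the pipeline.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Step 5: train the supervised logistic regression model.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Step 6: evaluate on held-out matches.
proba_test = model.predict_proba(X_test)[:, 1]
test_log_loss = log_loss(y_test, proba_test)
test_auc = roc_auc_score(y_test, proba_test)

# Step 7: predict a sample matchup (Team1 has 6 more wins and a seed
# that is 4 places better).
matchup_example = np.array([[6.0, -4.0]])
p_team1 = model.predict_proba(matchup_example)[0, 1]

print(f"log loss: {test_log_loss:.3f}  ROC-AUC: {test_auc:.3f}")
print(f"P(Team1 wins): {p_team1:.3f}")
```

Log loss penalizes confident wrong probabilities, while ROC-AUC measures only how well the model ranks winners above losers, so the two metrics are worth reading together.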
🎯 The detailed methodology and results can be accessed through this link:
👉 Click here: https://github.com/
The Abdi-Basid Courses Institute (tABCi)
© 2025 Abdi-Basid ADAN
