Okay, so yesterday I was messing around with predicting the Nets vs. Hawks game. Figured, why not, right?

First things first: Data! I went scraping (don’t tell anyone) for historical game data. Got a bunch of CSV files with all sorts of stats: points, assists, rebounds, the whole shebang. Started with just the last season and then thought, “Nah, gotta go back further!” So I ended up grabbing data from the last 5 seasons. More data, better predictions…hopefully.
Cleaning time! Ugh, this part always sucks. The data was messy. Missing values everywhere. Had to do some imputation (basically, filling in the blanks with reasonable guesses based on averages). Also, team names were inconsistent across different files. Spent a good hour just standardizing those so my code wouldn’t choke.
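
A minimal sketch of what that cleanup can look like, using only the standard library. The alias table, the column names (`team`, `points`) and the sample rows are all made up for illustration, and filling gaps with a per-team average is just one reasonable reading of "based on averages".

```python
from statistics import mean

# Hypothetical alias table: every spelling seen in the files maps to one code.
TEAM_ALIASES = {
    "brooklyn nets": "BKN",
    "nets": "BKN",
    "bkn": "BKN",
    "atlanta hawks": "ATL",
    "hawks": "ATL",
    "atl": "ATL",
}


def standardize_team(name):
    """Map a raw team name to a canonical code; fail loudly on unknown names."""
    key = name.strip().lower()
    if key not in TEAM_ALIASES:
        raise ValueError(f"Unknown team name: {name!r}")
    return TEAM_ALIASES[key]


def impute_with_team_mean(rows, column):
    """Fill missing values (None) in `column` with that team's average.

    Returns new row dicts; the input rows are left untouched.
    Assumes every team has at least one observed value in `column`.
    """
    observed = {}
    for row in rows:
        if row[column] is not None:
            observed.setdefault(row["team"], []).append(row[column])
    team_means = {team: mean(values) for team, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[column] is None:
            new_row[column] = team_means[new_row["team"]]
        filled.append(new_row)
    return filled


# Made-up rows with inconsistent names and missing points.
raw_rows = [
    {"team": "Brooklyn Nets", "points": 110},
    {"team": "NETS", "points": None},
    {"team": "bkn", "points": 120},
    {"team": "Atlanta Hawks", "points": 105},
    {"team": " Hawks ", "points": None},
]

# Standardize names first so the averages are grouped correctly.
clean_rows = [{**row, "team": standardize_team(row["team"])} for row in raw_rows]
clean_rows = impute_with_team_mean(clean_rows, "points")
```

Raising on an unknown name is a deliberate choice here: a silent fallback would let a misspelled team slip through as its own "team" and skew the averages.
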
Feature Engineering: This is where I started getting creative. Raw stats are okay, but what about derived stats? I calculated things like:
- Average points per game for each team
- Head-to-head win percentage against the other team
- Recent performance (average points in the last 5 games)
Stuff like that. Tried to think like a basketball analyst.
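
The three derived stats above can be sketched in plain Python like this. The game format (a list of dicts ordered oldest to newest, each with a `points` mapping of team code to score) and the sample scores are assumptions made up for the example, not real results.

```python
def points_per_game(games, team):
    """Average points scored by `team` across all games it played."""
    scores = [g["points"][team] for g in games if team in g["points"]]
    return sum(scores) / len(scores)


def head_to_head_win_pct(games, team, opponent):
    """Share of past meetings between the two teams that `team` won.

    Returns None when the teams never met in `games`.
    """
    meetings = [
        g for g in games if team in g["points"] and opponent in g["points"]
    ]
    if not meetings:
        return None
    wins = sum(1 for g in meetings if g["points"][team] > g["points"][opponent])
    return wins / len(meetings)


def recent_form(games, team, n=5):
    """Average points scored by `team` in its last `n` games.

    Relies on `games` being ordered oldest to newest.
    """
    scores = [g["points"][team] for g in games if team in g["points"]]
    recent = scores[-n:]
    return sum(recent) / len(recent)


# Made-up games, oldest first.
games = [
    {"points": {"BKN": 100, "ATL": 110}},
    {"points": {"BKN": 120, "BOS": 115}},
    {"points": {"ATL": 95, "BKN": 105}},
    {"points": {"BKN": 110, "ATL": 108}},
]

nets_ppg = points_per_game(games, "BKN")
nets_vs_hawks = head_to_head_win_pct(games, "BKN", "ATL")
nets_last_two = recent_form(games, "BKN", n=2)
```

One thing to watch when building features like these for training: each row should only use games played before the game it describes, otherwise the result being predicted leaks into its own inputs.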

Model Time: I’m no ML expert, so I kept it simple. Started with a Logistic Regression model. It’s pretty basic, but easy to understand and interpret. Split the data into training and testing sets (80/20), trained the model on the 80%, and then scored it on the held-out 20% to see how well it performed.
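
To make the split-train-test loop concrete, here is a dependency-free sketch: an 80/20 shuffle split plus a tiny logistic regression fitted with batch gradient descent. This is a from-scratch stand-in for illustration (in practice a library such as scikit-learn would handle both steps), and the single feature, the learning rate, the epoch count and the toy data are all assumptions.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, xi)))
            err = p - yi  # gradient of the log loss w.r.t. the logit
            for j, x in enumerate(xi):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, xi):
    """Return 1 (home win) when the predicted probability is at least 0.5."""
    z = bias + sum(w * x for w, x in zip(weights, xi))
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(weights, bias, X, y):
    """Fraction of rows where the prediction matches the label."""
    hits = sum(1 for xi, yi in zip(X, y) if predict(weights, bias, xi) == yi)
    return hits / len(y)


# Toy data: one hypothetical feature (scaled scoring differential, home minus
# away) and a label that is 1 when the home team won. Real games are far noisier.
rows = [([(i - 50) / 50], 1 if i > 50 else 0) for i in range(100)]

train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=0)
X_train, y_train = [x for x, _ in train_rows], [y for _, y in train_rows]
X_test, y_test = [x for x, _ in test_rows], [y for _, y in test_rows]

weights, bias = fit_logistic_regression(X_train, y_train)
test_accuracy = accuracy(weights, bias, X_test, y_test)
```

The toy labels are perfectly separable, so the score here says nothing about real games; the point is only the shape of the workflow: fit on one slice, score on a slice the model never saw.
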
Results (kinda meh): The accuracy wasn’t great, like around 60%. Better than a coin flip, but not by much. I played around with different parameters for the Logistic Regression, but the accuracy barely moved. Maybe Logistic Regression wasn’t the right model for this.
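
A coin flip is actually a generous yardstick. A fairer one is the "always pick the most common outcome" baseline, for example always picking the home team if home teams win more often in the training data. A minimal sketch of that check follows; the label counts are made up purely to show the arithmetic and are not from the real dataset.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Score the rule "always predict the most common training label".

    Returns (majority_label, accuracy_on_test_labels).
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Made-up labels: 1 = home team won, 0 = away team won.
train_labels = [1] * 58 + [0] * 42
test_labels = [1] * 12 + [0] * 8

baseline_label, baseline_accuracy = majority_baseline_accuracy(
    train_labels, test_labels
)
```

If the baseline lands near the model's accuracy, the model has learned little beyond "one outcome is more common", which is worth knowing before tuning parameters any further.
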
Tried another model: Next, I gave a Random Forest Classifier a shot. Heard it’s good for handling complex relationships. Trained it, tested it…slightly better accuracy, maybe around 63-65%. Still not amazing.
What Went Wrong? I think the problem is that basketball games are just really unpredictable. Injuries, a lucky streak, a bad call by the ref… all that stuff can throw the predictions off. Also, maybe I didn’t have the right features. Should I factor in player matchups? Home court advantage more heavily? Weather on the day of the game (kidding…mostly)?
Final Prediction (Don’t bet on this): Based on my (flawed) model, I’m leaning towards the Nets winning. But honestly, it’s a total guess.

Next Steps: If I were to do this again, I’d:
- Gather more data (maybe include individual player stats).
- Try more sophisticated models (like Gradient Boosting Machines).
- Spend more time on feature engineering (really try to find those hidden patterns).
Conclusion: It was a fun little project, even if the predictions weren’t spot-on. Learned a bit about data cleaning, feature engineering, and the limitations of machine learning. And it made watching the game a little more interesting!