Okay, so, yesterday I was messing around, trying to see if I could predict the outcome of the Tsitsipas vs Rune match. It was more of a “can I even do this?” kinda thing, not like I’m a pro gambler or anything.

First things first, I needed data. I started digging around for past match results, head-to-head stats, you know, the usual stuff. Found a few sites that had decent info, scraped what I could (don’t tell anyone!). I ended up with a bunch of CSV files filled with numbers: win percentages, serve averages, all that jazz.
Next, I fired up Python. I’m no expert, just know enough to be dangerous. I used Pandas to clean up the data, get rid of the useless bits, and combine everything into something usable. It was a total mess at first, dates all wonky, player names misspelled… you get the picture.
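For anyone curious what that cleanup looks like, here's a minimal sketch of the kind of Pandas steps involved. It assumes two sources that store dates in different formats and spell player names differently. The alias table, column names, date formats, and every number in it are made-up placeholders, not my actual files.

```python
import pandas as pd

# Hypothetical alias table: map the spellings found in different sources
# to one canonical name (extend it as new variants show up).
NAME_ALIASES = {
    "S. Tsitsipas": "Stefanos Tsitsipas",
    "Tsitsipas S.": "Stefanos Tsitsipas",
    "H. Rune": "Holger Rune",
}


def clean_source(df: pd.DataFrame, date_format: str) -> pd.DataFrame:
    """Normalize one scraped table: parse dates, fix names, drop broken rows."""
    out = df.copy()
    # Each source gets its own explicit format; unparseable dates become NaT.
    out["date"] = pd.to_datetime(out["date"], format=date_format, errors="coerce")
    # Trim stray whitespace, then collapse known aliases to one spelling.
    out["player"] = out["player"].str.strip().replace(NAME_ALIASES)
    # Rows without a usable date or stat are dropped rather than guessed.
    return out.dropna(subset=["date", "first_serve_pct"])


# Two made-up sources with different date conventions (placeholder numbers).
source_a = pd.DataFrame({
    "date": ["2023-05-01", "2023-06-12", "not a date"],
    "player": ["S. Tsitsipas ", "H. Rune", "H. Rune"],
    "first_serve_pct": [0.64, 0.61, 0.58],
})
source_b = pd.DataFrame({
    "date": ["03/07/2023", "15/08/2023"],
    "player": ["Tsitsipas S.", "Holger Rune"],
    "first_serve_pct": [0.66, None],
})

combined = pd.concat(
    [clean_source(source_a, "%Y-%m-%d"), clean_source(source_b, "%d/%m/%Y")],
    ignore_index=True,
).sort_values("date", ignore_index=True)
print(combined)
```

Giving each source its own explicit `format` (with `errors="coerce"` turning anything unparseable into `NaT`) tends to be less surprising than letting Pandas guess at mixed date styles.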
Then came the “fun” part: trying to build a model. I thought, “Hey, maybe I can just use a simple logistic regression?” So I threw all the data I had into it, ran the thing, and… got garbage. Like, completely random results. Turns out, just throwing everything at the wall doesn’t work (who knew?).
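Here's roughly what a saner logistic regression baseline could look like, as a sketch on synthetic data. It assumes each match becomes one row of "player 1 minus player 2" stat differences, labeled 1 if player 1 won; that framing, the feature count, and the random data are all assumptions for illustration. The nice part of logistic regression is that `predict_proba` hands back a win probability rather than just a yes/no.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: one row per match, each column a hypothetical
# "player 1 minus player 2" stat difference.
n_matches = 400
X = rng.normal(size=(n_matches, 3))
# Made-up ground truth: only the first two differences matter.
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n_matches) < 1 / (1 + np.exp(-logits))).astype(int)  # 1 = player 1 won

# Standardize features, then fit the logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X[:300], y[:300])

print("held-out accuracy:", round(model.score(X[300:], y[300:]), 3))
# Columns follow the classes [0, 1], so column 1 is P(player 1 wins).
print("P(player 1 wins), first held-out match:", model.predict_proba(X[300:301])[0, 1])
```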
I realized I needed to actually think about what factors were important. I started looking at things like serve speed, unforced errors, and how each player performed on different court surfaces. Did a little feature engineering, created some new columns based on ratios and averages. It was all very scientific (not really).
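A small sketch of that ratio-and-average style of feature engineering in Pandas. The column names (`winners`, `unforced_errors`, `avg_first_serve_kmh`) and all the numbers are made-up placeholders; the point is the pattern of building a ratio column and then averaging per player and surface with `groupby`.

```python
import pandas as pd

# Placeholder match log, one row per player per match (made-up numbers).
matches = pd.DataFrame({
    "player": ["Stefanos Tsitsipas"] * 4 + ["Holger Rune"] * 4,
    "surface": ["Clay", "Clay", "Hard", "Grass", "Clay", "Hard", "Hard", "Grass"],
    "won": [1, 1, 0, 1, 1, 0, 1, 0],
    "winners": [30, 25, 18, 22, 28, 20, 24, 15],
    "unforced_errors": [20, 25, 30, 11, 14, 25, 20, 30],
    "avg_first_serve_kmh": [198, 201, 195, 203, 192, 196, 199, 190],
})

# Ratio feature: winners per unforced error (higher = cleaner tennis).
matches["winner_ue_ratio"] = matches["winners"] / matches["unforced_errors"]

# Average features per player and surface.
surface_profile = (
    matches.groupby(["player", "surface"])
    .agg(
        win_rate=("won", "mean"),
        mean_winner_ue_ratio=("winner_ue_ratio", "mean"),
        mean_serve_kmh=("avg_first_serve_kmh", "mean"),
        n_matches=("won", "count"),
    )
    .reset_index()
)
print(surface_profile)
```

One gotcha worth flagging: averages computed over a player's whole history include matches played after the one being predicted, so for real training data they'd need to be computed only from earlier matches.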
I tried a few different models: Support Vector Machines (SVMs), Random Forests… Honestly, half of them I didn’t fully understand, just copied code from Stack Overflow and tweaked it until it didn’t crash. The Random Forest seemed to give the most consistent results, so I stuck with that.
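One way to make "seemed to give the most consistent results" a bit more concrete is to run every model through the same cross-validation folds and look at both the mean score and the spread across folds. This is a sketch on synthetic data from scikit-learn's `make_classification` (a stand-in for real match features), and the model settings are mostly defaults, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder data standing in for the engineered match features.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=42)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression()),
    # SVMs are sensitive to feature scale, so scale inside the pipeline.
    "svm_rbf": make_pipeline(StandardScaler(), SVC()),
    # Tree ensembles don't need scaling.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in candidates.items():
    # cv=5 on a classifier uses unshuffled stratified folds,
    # so every model is scored on exactly the same splits.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:20s} mean={scores.mean():.3f} std={scores.std():.3f}")
```

A lower standard deviation across folds is one reasonable reading of "more consistent"; the mean tells you whether it is also more accurate.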

After a ton of trial and error, messing with hyperparameters (whatever those are), and a lot of Googling, I finally got something that seemed… okay. It wasn’t perfect, but it was better than just flipping a coin. It gave Tsitsipas a slight edge, which, in hindsight, wasn’t too far off.
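For the record, hyperparameters are the settings you pick before training (number of trees, maximum depth, and so on), as opposed to what the model learns from the data. A grid search just tries every combination with cross-validation. Here's a sketch that does that for a Random Forest and compares it with a literal coin flip (`DummyClassifier` with `strategy="uniform"`); the grid values and synthetic data are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic placeholder data standing in for real match features.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

# Hyperparameters: chosen before training, not learned from the data.
param_grid = {
    "n_estimators": [50, 150],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

# Try every combination (2 x 2 x 2 = 8) with 5-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# "Flipping a coin": predict each class uniformly at random.
coin_flip = DummyClassifier(strategy="uniform", random_state=0)
baseline = cross_val_score(coin_flip, X, y, cv=5, scoring="accuracy").mean()

print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
print("coin-flip accuracy:", round(baseline, 3))
```

Caveat: `best_score_` is a little optimistic, since it's the best of several configurations scored on the same folds; a held-out test set the search never saw gives a fairer number.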
The big takeaway? Predicting tennis matches is hard! My little experiment was more about learning and playing around with data than actually making accurate predictions. I definitely learned a bunch about data cleaning, model building, and the limits of my own coding abilities.
- Data Collection: Web scraping (shhh!) and CSV wrangling.
- Cleaning: Pandas to the rescue. Dates, names, and missing values, oh my!
- Modeling: Logistic Regression, SVM, Random Forest – threw everything at the wall.
- Result: Not great, but a fun learning experience. Tsitsipas had a slight edge in my model.
Next Steps
I need to figure out how to actually validate these models. I just split the data randomly and called it a day. Also, maybe I should learn more about the algorithms I’m using… you know, actually understand what they’re doing. Baby steps!
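On the validation front, one likely issue with a random split for match data is that the model gets to train on matches played after the ones it's tested on. A chronological split avoids that. Here's a minimal sketch using scikit-learn's `TimeSeriesSplit`, where every fold trains on earlier matches and tests on later ones; the dates, features, and labels are synthetic placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(7)

# Synthetic placeholder: 300 matches, one per day, with 4 numeric features.
n = 300
feature_cols = [f"f{i}" for i in range(4)]
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=feature_cols)
df["date"] = pd.date_range("2021-01-01", periods=n, freq="D")
df["player1_won"] = (df["f0"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# TimeSeriesSplit relies on row order, so sort by date first
# (already sorted here, but real scraped data usually isn't).
df = df.sort_values("date").reset_index(drop=True)
X = df[feature_cols].to_numpy()
y = df["player1_won"].to_numpy()

splits = list(TimeSeriesSplit(n_splits=5).split(X))
fold_scores = []
for train_idx, test_idx in splits:
    # Each fold trains on an expanding window of earlier matches
    # and tests on the block of matches right after it.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print("accuracy per fold:", np.round(fold_scores, 3))
```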