Okay, so yesterday I was messing around trying to grab some player stats from that Colorado Rockies vs. Dodgers game. Figured it’d be a fun little project, right?

First thing I did was hit up the usual sports sites, ESPN and the like, the whole shebang. I was hoping they’d have some tables I could just copy and paste, but no such luck. Everything was buried in divs and spans, a real nightmare. I mean, who designs these websites?

Scraping Time!
Alright, time to get serious, so I decided to use Python and BeautifulSoup. I started by looking at the HTML source code of the pages. I knew I wanted the batting stats, so I started digging around for tables or divs that contained names, at-bats, hits, runs, you know, the usual stuff.
I imported the libraries:

- requests
- BeautifulSoup
Then I used requests to get the HTML content from the webpage. It was kinda messy, but BeautifulSoup cleaned it up a bit. I used the find_all() method to find tags in the HTML, like table, div, span, etc.
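
Here’s a minimal sketch of that fetch-and-find step. The fetch_html helper, the sample HTML, and the class names are all made up for illustration; a real box score page will be laid out differently, so treat the tag and class names as placeholders.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Hypothetical stand-in for a downloaded box score page, so the sketch
# runs offline. A real page would come from fetch_html(...) instead.
SAMPLE_HTML = """
<div class="boxscore">
  <table class="batting">
    <tr><th>Player</th><th>AB</th><th>H</th></tr>
    <tr><td><span class="name">Player A</span></td><td>4</td><td>2</td></tr>
    <tr><td><span class="name">Player B</span></td><td>3</td><td>0</td></tr>
  </table>
</div>
"""

# "html.parser" is the parser bundled with Python, so nothing extra to install.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() returns every matching tag in document order.
tables = soup.find_all("table")
name_spans = soup.find_all("span", class_="name")
names = [span.get_text(strip=True) for span in name_spans]

print(len(tables), names)
```

Calling raise_for_status() right after the request is a cheap way to find out early that you got an error page instead of the stats you wanted.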

Parsing the Data
The hardest part was figuring out how the data was structured. The HTML wasn’t very consistent, so I had to write some pretty specific code to grab the right values. I used a combination of CSS selectors and loops to extract the player names, stats, and other details.
I had to deal with a lot of “None” values because some players didn’t have stats for certain categories. I also had to convert the strings to numbers.
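
Something like the sketch below covers the selector-plus-loop idea, including the missing values. The HTML, the class names, and the to_int helper are assumptions for the sake of the example, not the markup any particular site uses.

```python
from bs4 import BeautifulSoup

# Hypothetical batting table: Player B has no RBI cell at all,
# and Player C has placeholder text instead of numbers.
SAMPLE_HTML = """
<table class="batting">
  <tr><th>Player</th><th>AB</th><th>H</th><th>RBI</th></tr>
  <tr><td class="name">Player A</td><td class="ab">4</td><td class="h">2</td><td class="rbi">1</td></tr>
  <tr><td class="name">Player B</td><td class="ab">3</td><td class="h">0</td></tr>
  <tr><td class="name">Player C</td><td class="ab">--</td><td class="h"></td><td class="rbi">0</td></tr>
</table>
"""


def to_int(cell):
    """Return the cell text as an int, or None if the cell is missing or not a number."""
    if cell is None:  # select_one() found nothing
        return None
    try:
        return int(cell.get_text(strip=True))
    except ValueError:  # empty string or placeholder like "--"
        return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for tr in soup.select("table.batting tr"):
    name_cell = tr.select_one("td.name")
    if name_cell is None:  # header row has no player name cell
        continue
    rows.append({
        "player": name_cell.get_text(strip=True),
        "ab": to_int(tr.select_one("td.ab")),
        "h": to_int(tr.select_one("td.h")),
        "rbi": to_int(tr.select_one("td.rbi")),
    })

for row in rows:
    print(row)
```

Pushing the None check and the int conversion into one small helper keeps the loop readable, and the gaps stay as None so they can be dealt with later.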

Cleaning it Up
Once I had the data, I used pandas to create a DataFrame. At first it looked like a jumbled mess, like seriously ugly. So I dropped the rows with missing information, converted some of the data types, dropped some columns I didn’t need, and renamed the rest, you know, the usual cleaning stuff. For instance:

- Converted AB, H, RBI, etc. to integers.
- Renamed the columns to something more readable.
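
A rough sketch of that cleanup in pandas is below. The sample rows, the "pos" column, and the final column names are invented for illustration; swap in whatever your scrape actually produced.

```python
import pandas as pd

# Hypothetical scraped rows: stats are still strings, one value is missing.
raw_rows = [
    {"player": "Player A", "ab": "4", "h": "2", "rbi": "1", "pos": "SS"},
    {"player": "Player B", "ab": "3", "h": "0", "rbi": None, "pos": "CF"},
    {"player": "Player C", "ab": "5", "h": "3", "rbi": "2", "pos": "1B"},
]
stat_cols = ["ab", "h", "rbi"]

df = pd.DataFrame(raw_rows)

# Turn stat strings into numbers; anything unparseable becomes NaN.
for col in stat_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

df = df.drop(columns=["pos"])        # drop a column we don't need
df = df.dropna(subset=stat_cols)     # drop rows with missing stats
df = df.astype({col: int for col in stat_cols})  # safe now that NaNs are gone
df = df.rename(columns={"player": "Player", "ab": "At Bats", "h": "Hits", "rbi": "RBI"})
df = df.reset_index(drop=True)

print(df)
```

The order matters a little: converting to int only works once the rows with NaN are gone, which is why dropna() comes before astype().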

Finally…
After all that, I had a nice, clean DataFrame with all the player stats. I saved it to a CSV file so I could play around with it later. Not bad for an afternoon’s work, eh? I mean, it wasn’t perfect, but it was good enough for my purposes. I am thinking of plotting some charts to compare the batting averages. Overall it was fun, and I learned a thing or two.
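
For the save-and-compare part, a minimal sketch might look like this. The player rows, the column names, and the file name are placeholders; the only real baseball in here is that batting average is hits divided by at-bats.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned stats; real numbers would come from the scrape.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "AB": [4, 3, 5],
    "H": [2, 0, 3],
})

# Batting average = hits / at-bats, rounded to three places.
# A player with 0 at-bats would come out as NaN (or inf) here.
df["AVG"] = (df["H"] / df["AB"]).round(3)

# Write to the system temp directory so the sketch leaves no clutter;
# index=False keeps the row numbers out of the file.
csv_path = os.path.join(tempfile.gettempdir(), "batting_stats.csv")
df.to_csv(csv_path, index=False)

# Read it back to check that the file round-trips.
reloaded = pd.read_csv(csv_path)
print(reloaded)
```

From there, something like df.plot.bar(x="Player", y="AVG") would be one way to get a quick comparison chart, assuming matplotlib is installed.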