Okay, so let me tell you about this thing I messed around with – the Garcia Pera prediction. It’s not some fancy AI breakthrough, just a bit of data fun, alright?

First off, I got the data. You know, the usual place, Kaggle. Downloaded the CSV, looked at the columns, the whole shebang. It’s basically sensor data from a machine, and the goal is to predict some kind of fault. Nothing too exotic.
Then, I fired up my Jupyter Notebook. Yeah, Python all the way. Imported Pandas for handling the data, and Matplotlib for plotting stuff. Standard stuff, you know?
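If you want to follow along, the start looked roughly like this. The file name is a placeholder, not the actual Kaggle file, so treat it as a sketch:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the CSV from Kaggle (file name is a placeholder, not the real one)
df = pd.read_csv("sensor_data.csv")

# Poke around: shape, first few rows, missing values per column
print(df.shape)
print(df.head())
print(df.isna().sum())

# Quick histograms of everything, just to see the shape of the data
df.hist(figsize=(12, 8))
plt.show()
```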
Next, I started cleaning the data. Found a few missing values – filled them with the mean, ’cause I’m lazy. Also noticed some weird outliers. Tried clipping them, but it didn’t make a huge difference, so I just left them in. Didn’t want to spend all day on data cleaning.
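Roughly what the cleaning looked like, continuing from the snippet above. Nothing clever here, and the clipping part is the experiment I ended up not keeping:

```python
# Fill missing values in the numeric columns with the column mean (the lazy fix)
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# The outlier clipping I tried (1st/99th percentile) before deciding to leave them in
lower = df[numeric_cols].quantile(0.01)
upper = df[numeric_cols].quantile(0.99)
clipped = df[numeric_cols].clip(lower=lower, upper=upper, axis=1)
```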
Feature engineering time! This is where I spent most of the time. Tried a bunch of things. Created some interaction terms (multiplying columns together), calculated rolling averages, stuff like that. Honestly, most of it didn’t work. But that’s how it goes, right?
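The feature engineering was mostly variations on this theme. The column names here are made up, since the real ones depend on the dataset, and the rolling average assumes the rows are in time order:

```python
# Interaction term: multiply two sensor readings together
# ("sensor_1" and "sensor_2" are stand-ins for whatever columns you have)
df["sensor_1_x_sensor_2"] = df["sensor_1"] * df["sensor_2"]

# Rolling average over a short window, assuming rows are ordered in time
df["sensor_1_roll_mean_5"] = df["sensor_1"].rolling(window=5, min_periods=1).mean()
```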
After that, I tried a couple of different models. Started with a simple Logistic Regression. Quick to train, easy to understand. Then, moved on to a Random Forest. A bit more complex, but usually gives better results. Also tried an XGBoost, just for kicks. That thing’s a beast.
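Here’s more or less how I lined them up. XGBoost is a separate install (the xgboost package), and the settings below are just reasonable-looking defaults plus a seed, not the exact ones I used:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# The three models I compared; settings are illustrative, not gospel
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "xgboost": XGBClassifier(n_estimators=200, eval_metric="logloss", random_state=42),
}
```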

Split the data into training and testing sets, of course. Used scikit-learn for that. Trained the models on the training data, then evaluated them on the testing data. Used accuracy as the metric, ’cause it’s easy to understand. Didn’t bother with fancy metrics like F1-score. Too much hassle.
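The split-and-score loop, carrying on from the snippets above. The target column name "fault" is a stand-in for whatever the real one is called:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# "fault" is a placeholder for the actual target column name
X = df.drop(columns=["fault"])
y = df["fault"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train each model and print its accuracy on the held-out data
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.3f}")
```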
The Random Forest performed the best, surprisingly. XGBoost was a close second, but took way longer to train. So, I stuck with the Random Forest. Tuned the hyperparameters a bit using GridSearchCV. Basically, let the computer try a bunch of different combinations and see what works best.
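The tuning looked something like this. The grid values are illustrative, not the exact ones I tried:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# A small grid of the usual suspects; exact values are just examples
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```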
Finally, I made my predictions. Saved them to a CSV file and uploaded them to Kaggle. Didn’t win any prizes, but it was a fun little project. Learned a few things along the way.
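And the final predict-and-dump. A real Kaggle submission would use the competition’s own test file and its expected column names; this just shows the general shape of it:

```python
# Predict with the tuned model and write out a submission-style CSV
# (the id/prediction column names are whatever the competition actually expects)
best_rf = search.best_estimator_
predictions = best_rf.predict(X_test)

submission = pd.DataFrame({"id": X_test.index, "prediction": predictions})
submission.to_csv("submission.csv", index=False)
```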
Things I could have done better:
- More thorough data cleaning.
- More feature engineering (maybe using domain knowledge, if I had any).
- Experimenting with different models (neural networks, for instance).
- Using a better evaluation metric than plain accuracy (F1-score, say).
But hey, it was just a quick weekend project. Maybe I’ll revisit it someday. Who knows?
That’s pretty much it. Nothing earth-shattering, but hopefully, it was somewhat interesting. Now, I am gonna grab a beer.
