Alright folks, gather ’round! Today I’m gonna spill the beans on this little side project I’ve been tinkering with: “jazz magic prediction.” Sounds fancy, right? Well, it’s basically trying to get a computer to “predict” the next note in a jazz melody. Let me tell you, it’s been a wild ride!
It all started when I was messing around with some MIDI files of classic jazz standards. I thought, “Hey, wouldn’t it be cool if I could feed this data into some machine learning thingy and have it spit out new jazz licks?” So, I started digging.
First, I needed to wrangle the data. Those MIDI files were a mess! I used a Python library called `mido` to parse them. Spent a good chunk of time just figuring out how to extract the notes, their durations, and the timing information. Seriously, cleaning data is like 80% of any project, right?
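In case you want to try this yourself, here's roughly the kind of parsing I mean. This is just a bare-bones sketch (it assumes a monophonic melody and measures everything in raw MIDI ticks), not my exact script:

```python
import mido

def extract_notes(path):
    """Parse a MIDI file and return (pitch, duration_in_ticks) pairs.

    Assumes a monophonic melody; chords and overlapping notes
    would need more careful bookkeeping.
    """
    mid = mido.MidiFile(path)
    notes = []
    active = {}   # pitch -> absolute tick when the note started
    now = 0       # running absolute time in ticks

    for msg in mido.merge_tracks(mid.tracks):
        now += msg.time  # msg.time is the delta time in ticks
        if msg.type == 'note_on' and msg.velocity > 0:
            active[msg.note] = now
        elif msg.type == 'note_off' or (msg.type == 'note_on' and msg.velocity == 0):
            if msg.note in active:
                start = active.pop(msg.note)
                notes.append((msg.note, now - start))
    return notes

# e.g. notes = extract_notes('some_jazz_standard.mid')
```

The `note_on` with velocity 0 check matters because a lot of MIDI files use that instead of a proper `note_off` event.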
Then came the fun part (or so I thought): choosing a model. I initially went with a simple Recurrent Neural Network (RNN) with LSTM layers. Everyone says LSTMs are good for sequential data, and music is definitely sequential. I used TensorFlow/Keras because that’s what I’m most familiar with. I built the model, fed it the cleaned MIDI data, and… nothing. The output was just random notes, not even close to anything resembling jazz.
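To give you an idea, the model was nothing exotic. Something in the spirit of this sketch, with placeholder sizes rather than the exact numbers I used (the input shape depends on how you encode the notes, which I'll get to in a second):

```python
import tensorflow as tf

SEQ_LEN = 32       # how many previous steps the model sees (placeholder)
FEATURE_DIM = 128  # size of each input vector; depends on the note encoding
NUM_PITCHES = 128  # full MIDI pitch range for the output

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, FEATURE_DIM)),
    tf.keras.layers.LSTM(256, return_sequences=True),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(NUM_PITCHES, activation='softmax'),
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
# model.fit(X, y, ...)  # X: (num_samples, SEQ_LEN, FEATURE_DIM), y: next-note pitch numbers
```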
So, back to the drawing board. I figured the problem was the input representation. Just feeding the raw MIDI note numbers wasn’t working. I decided to try encoding the notes as “one-hot vectors,” representing each note as a unique category. This helped a bit, but the output still sounded pretty bad. Like a cat walking on a piano, you know?
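The one-hot part is simple enough; here's a quick sketch with NumPy, assuming the full 128-pitch MIDI range:

```python
import numpy as np

NUM_PITCHES = 128  # full MIDI pitch range

def one_hot(pitches):
    """Turn a list of MIDI pitch numbers into a (len, 128) one-hot matrix."""
    out = np.zeros((len(pitches), NUM_PITCHES), dtype=np.float32)
    out[np.arange(len(pitches)), pitches] = 1.0
    return out

# one_hot([60, 62, 64])[0][60] == 1.0  # middle C lights up index 60
```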

I started reading some papers on music generation. Found out that many people use something called “time-step embeddings” to encode the rhythmic information. Basically, you add extra input features that tell the model where it is in the bar. So I implemented that. Still didn’t sound quite right!
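My version of this was pretty crude: bolt a couple of extra numbers onto each input vector saying how far into the bar the note falls. Something along these lines (a hypothetical helper, assuming you already know the ticks per beat and beats per bar; sin/cos is just one way to encode the position):

```python
import numpy as np

def add_bar_position(one_hot_notes, start_ticks, ticks_per_beat, beats_per_bar=4):
    """Append sin/cos features encoding each note's position within the bar.

    one_hot_notes: (N, 128) array; start_ticks: absolute start tick of each note.
    """
    ticks_per_bar = ticks_per_beat * beats_per_bar
    pos = (np.asarray(start_ticks) % ticks_per_bar) / ticks_per_bar  # 0..1 within the bar
    extra = np.stack([np.sin(2 * np.pi * pos), np.cos(2 * np.pi * pos)], axis=1)
    return np.concatenate([one_hot_notes, extra.astype(np.float32)], axis=1)
```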
Then, after more digging, I found out about something called “temperature” in the context of generating text or music. Lower temperature makes the output more predictable and higher temperature makes it more random. I played around with the temperature parameter, and FINALLY, I started getting some results that sounded halfway decent!
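If you haven't met temperature before, it just rescales the model's output probabilities before you sample from them. A rough sketch of the idea:

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Sample an index from a probability vector after temperature scaling.

    temperature < 1 sharpens the distribution (safer, more repetitive);
    temperature > 1 flattens it (wilder, more random).
    """
    logits = np.log(np.asarray(probs) + 1e-9) / temperature
    scaled = np.exp(logits - np.max(logits))
    scaled /= scaled.sum()
    return np.random.choice(len(scaled), p=scaled)
```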
It wasn’t perfect, mind you. Sometimes the model would get stuck in a loop, repeating the same few notes over and over. Other times, it would generate weird, dissonant chords that would make your ears bleed. But every now and then, it would spit out a little phrase that actually sounded pretty cool. Like a little spark of jazz magic!
Here’s a breakdown of what I did:
- Data Extraction: Used `mido` to parse MIDI files and extract note information.
- Data Cleaning: Filtered out irrelevant MIDI events and normalized note durations.
- Data Representation: Encoded notes as one-hot vectors and added time-step embeddings.
- Model Architecture: Used an RNN with LSTM layers, built with TensorFlow/Keras.
- Training: Trained the model on a dataset of jazz MIDI files.
- Generation: Used the trained model to generate new jazz melodies, experimenting with different temperature settings (rough sketch of the loop right after this list).
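And since I promised a sketch of the generation loop: it's a plain autoregressive loop that reuses the `sample_with_temperature` helper from earlier. Variable names here are illustrative, not lifted from my actual code:

```python
import numpy as np

def generate(model, seed_sequence, length=64, temperature=0.8):
    """Autoregressively generate `length` new notes from a seed sequence.

    seed_sequence: (SEQ_LEN, feature_dim) array of encoded notes.
    Returns a list of sampled MIDI pitch numbers.
    """
    window = np.array(seed_sequence, dtype=np.float32)
    generated = []
    for _ in range(length):
        probs = model.predict(window[np.newaxis, ...], verbose=0)[0]
        pitch = sample_with_temperature(probs, temperature)
        generated.append(int(pitch))
        # Encode the new note the same way as the training data and slide the window
        next_step = np.zeros(window.shape[1], dtype=np.float32)
        next_step[pitch] = 1.0  # fine for plain one-hot; extra bar-position features would need filling in too
        window = np.vstack([window[1:], next_step])
    return generated
```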
What I learned: This project was a reminder that machine learning is a lot of trial and error. You need to experiment with different architectures, different hyperparameters, and different data representations to get good results. It was also a reminder that music is really hard! Getting a computer to understand and generate music that sounds good is a seriously challenging problem. But hey, that’s what makes it fun, right?

Next steps: I’m thinking of trying a more sophisticated model, like a Transformer network. Also, I want to explore different ways of representing the musical data, maybe using some features that are more musically meaningful, like chord information or melodic contours. Who knows, maybe one day I’ll have a computer that can write a whole jazz symphony!