[This is one of a series of posts that explore real world examples of mathematical modeling to help educators better understand its applications. This is not intended to be a context for a student lesson. To learn about Spies and Analysts, I recommend watching this webinar (with elementary, middle, and high school versions) or reading this blog post.]
If you’ve ever used Pandora, then you know that it has an amazing ability to recommend music to you that you may have never heard but really enjoy. They play a song for you, and you can give it a thumbs up if you really like it, thumbs down if you really don’t like it, or neither if it’s just ok. After a little while, it seems to somehow know your taste in music better than you know it! This leads you to stay on their site longer, and as a result, they can play more ads or get you to pay for them to go away.
So, what if you worked for Pandora and they asked you to create a formula to predict which songs people have never heard but will probably like? Where would you begin? What information would you want to know? What would you do with that data once you had access to it? These are the topics I’m exploring in my spies and analysts post. I want to walk you through the process so that you can better appreciate the complexities of mathematical modeling.
The first part of the process requires the spies. So, I want you to stop and take thirty seconds to think about what information you would use to recommend songs. Would you look at which songs are being played the most on radio stations or selling the most records? Maybe where a person lives affects the kind of music they listen to? Does gender or age matter? The list of questions could go on and on. So, think about what information you’d pick if this were your job. Once you’ve determined what information you’d want, keep reading.
A given song is represented by a vector containing values for approximately 450 “genes” (analogous to trait-determining genes for organisms in the field of genetics). Each gene corresponds to a characteristic of the music, for example, gender of lead vocalist, prevalent use of groove, level of distortion on the electric guitar, type of background vocals, etc. Rock and pop songs have 150 genes, rap songs have 350, and jazz songs have approximately 400. Other genres of music, such as world and classical music, have 300–450 genes.
Apparently it takes 20 to 30 minutes to categorize each song. Can you imagine the amount of work it would take to do this for every single song in existence!? Crazy enough, this is just part of how Pandora works. Even if you had all that information about the songs, how would you write a formula to figure out which song to play? Is “gender of lead vocalist” more important than “prevalent use of groove”? Remember, if your formula isn’t good, customers won’t stick around and you’ll be out of business.
This is where the analysts come in. Their job is to take the data, figure out what parts are more or less important, and break it down in such a way that it becomes useful. Take 30 more seconds to think about how you might even begin to work with the data.
The system depends on a sufficient number of genes to render useful results. Each gene is assigned a number between 0 and 5, in half-integer increments. The Music Genome Project’s database is built using a methodology that includes the use of precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high.
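To make the analysts' job a little more concrete, here is a minimal sketch of one way you could compare songs once each one is a list of gene scores. This is not Pandora's actual formula, which isn't described here. The gene names, the scores, the weights, and the choice of a weighted distance are all assumptions I made up for illustration. The idea is that each song is a list of numbers from 0 to 5 in half steps, each gene gets a weight saying how much it matters, and songs with a smaller weighted distance from a song you liked get recommended first.

```python
from math import sqrt

# Hypothetical gene names, for illustration only (not Pandora's real list).
GENES = ["lead_vocal_gender", "use_of_groove", "guitar_distortion", "background_vocals"]


def is_valid_score(x):
    """Return True if x is between 0 and 5 in half-integer steps."""
    return 0 <= x <= 5 and 2 * x == int(2 * x)


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two gene vectors."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def recommend(liked, catalog, weights):
    """Return catalog titles sorted from most to least similar to the liked song."""
    return sorted(
        catalog,
        key=lambda title: weighted_distance(liked, catalog[title], weights),
    )


# Made-up scores for a song the listener gave a thumbs up.
liked_song = [1.0, 4.5, 3.0, 2.0]

# Made-up catalog of songs the listener has not heard yet.
catalog = {
    "Song A": [1.0, 4.0, 3.5, 2.0],
    "Song B": [5.0, 0.5, 0.0, 4.0],
    "Song C": [1.0, 2.0, 3.0, 2.5],
}

# Assumed weights: here "use_of_groove" counts four times as much as
# "lead_vocal_gender". Choosing these numbers well is the analysts' job.
weights = [0.5, 2.0, 1.0, 1.0]

print(recommend(liked_song, catalog, weights))
```

Try changing the weights and watch how the order of recommendations shifts. Deciding which genes deserve the biggest weights is exactly the kind of question the analysts have to answer.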
For the record, I don’t completely understand what that just said either! The reality is that they created a formula to take all of that information, determine what was most important, and make it into a product that earns them significant revenue.
At this point, there are no computers or calculators that can figure this out on their own. This is where the jobs are. If we truly want to focus our time and energy on a skill that will really help our students become college and career ready, mathematical modeling is where we need to be.