I guess I was too caught up in organizing NIME back in March to notice the launch of the Million Song Dataset. It contains no audio, but 300 GB worth of metadata about one million popular songs. This sounds like hours of great fun for music researchers around the world, and it will probably also be a great resource for music students working on MIR applications. I would also expect that it can be used for a number of creative applications.
Here is a quote from the press release:
For far too long, researchers and engineers working on Music Information Retrieval (MIR) have been forced to pay a hefty ante before being able to conduct their research: namely, they’ve had to build a set of data on which to test their theories and hone their algorithms.
It may have started as a flippant suggestion for how to solve that problem, but The Million Song Dataset is now real, and anyone can download it. A collaboration between The Echo Nest and Columbia University’s LabROSA department (Laboratory for the Recognition and Organization of Speech and Audio), The Million Song Dataset has four main objectives:
- To encourage research on algorithms that scale to commercial sizes
- To provide a reference dataset for evaluating research
- As a shortcut alternative to creating a large dataset with The Echo Nest’s API
- To help new researchers get started in the MIR field.
What could you say but thanks!