audio - key detection

jamesson's icon

Thinking about future projects

This

Seems relatively straightforward to implement with what I know of MSP (I haven't done any MSP work yet; it's all been Max so far). Has anybody done any patches/objects like this?

Many thanks in advance

Joe

brendan mccloskey's icon

Hey Joe

the only work I've seen in this area (no doubt there are numerous others though) is Richard Garrett's nwdlbots, a suite of M4L generative music plugins that contain a key/chord follower:

May or may not be a start?
HTH

Brendan

brendan mccloskey's icon

Just watched the video... it's MIDI-based, so maybe not.

Peter McCulloch's icon

It's definitely a non-trivial problem (chords/keys from audio), but if you want to know more about the research in this field I'd recommend checking out: http://www.music-ir.org

jamesson's icon

Peter

Surely we can

0) locate and mark broad-spectrum noise (drum hits, etc) to eliminate it and improve accuracy
1) FFT to see the frequency content per given timeframe
2) Match the frequency content to known pitches
3) Average together the pitches to produce a key for the piece

Granted, purely theoretical, but surely the right approach? After all, there cannot be groundbreaking new tech here, for instance http://www.mixedinkey.com/?gclid=CPe3z6iFgrICFUGo4AodUnMAYw?
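A minimal sketch of steps 1-3 above, in plain Python rather than MSP, just to make the idea concrete. It skips step 0 (noise removal) entirely, and "average the pitches" is reduced to summing energy per pitch class and reporting the loudest one, which is not the same as a key. The function names, the 55-2000 Hz range and the Hann window are assumptions made for illustration.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def pitch_class_energy(frame, sample_rate, fmin=55.0, fmax=2000.0):
    """Steps 1-2: FFT one frame, then fold bin energy into 12 pitch classes."""
    n = len(frame)
    # Hann window to reduce spectral leakage between neighbouring bins
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = fft(windowed)
    energy = [0.0] * 12
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        if fmin <= freq <= fmax:
            # Nearest equal-tempered note, assuming A4 = 440 Hz
            midi = 69 + 12 * math.log2(freq / 440.0)
            energy[round(midi) % 12] += abs(spectrum[k]) ** 2
    return energy

def strongest_pitch_class(frames, sample_rate):
    """Step 3, naively: sum energy over all frames, return the loudest class."""
    total = [0.0] * 12
    for frame in frames:
        for pc, e in enumerate(pitch_class_energy(frame, sample_rate)):
            total[pc] += e
    return NOTE_NAMES[max(range(12), key=total.__getitem__)]

# Usage: one 2048-sample frame of a pure 440 Hz sine at 8 kHz
sr = 8000
frame = [math.sin(2 * math.pi * 440 * i / sr) for i in range(2048)]
print(strongest_pitch_class([frame], sr))
```

On a clean sine this is easy. With real recordings, overtones, percussion and coarse bin spacing at low frequencies all smear energy across pitch classes.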

jamesson's icon

Specifically, isn't fft~ sufficient unto the task in (2)?

Peter McCulloch's icon

Not saying impossible, just highly non-trivial because it's a messy data set. Yes, there are various tools for mapping the FFT to chroma (in Matlab), but even then it's not necessarily a guarantee of accuracy. It's also one thing to do it for a specific target, such as Western popular dance music (which is relatively harmonically static; probably most of the time you could just look at the pitch of the first bass note), but quite another to do it consistently for multiple genres. (see the literature on polyphonic pitch-tracking for a prime example... I'm amazed at how well transcribe~ can work with solo piano, but you also have to take into account that we've been working on this problem for decades.)

I provided a link to bibliography because it is a hard problem and it's worth seeing how others have approached it. The approach you're suggesting has been tried, and there are probably worthwhile refinements in the literature. I'm not up on the most recent literature, but as of 3-4 years ago we weren't at 90-95% correct recognition IIRC (and I may not), and I can think of a lot of corner cases (Middle Eastern music, anything modal, music that stays away from the tonic) that wouldn't work with what you're describing.
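To illustrate why chroma alone does not settle the key, here is a small sketch that scores a 12-bin chroma vector against plain major and natural-minor scale templates. The binary templates and the function names are assumptions for illustration, not a method taken from the literature linked above; published systems use more refined weightings.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)  # natural minor

def scale_score(chroma, tonic, steps):
    """Sum the chroma energy that falls on the scale built on `tonic`."""
    return sum(chroma[(tonic + s) % 12] for s in steps)

def best_keys(chroma):
    """Return every (tonic, mode) candidate that ties for the top score."""
    scores = {}
    for tonic in range(12):
        name = NOTE_NAMES[tonic]
        scores[(name, "major")] = scale_score(chroma, tonic, MAJOR_STEPS)
        scores[(name, "minor")] = scale_score(chroma, tonic, MINOR_STEPS)
    top = max(scores.values())
    return sorted(k for k, v in scores.items() if abs(v - top) < 1e-9)

# Equal energy on the seven "white notes" and nothing else
white_notes = [1.0 if pc in MAJOR_STEPS else 0.0 for pc in range(12)]
print(best_keys(white_notes))
```

C major and A minor tie here because they contain the same pitch classes. A modal piece on the same notes would tie as well, so something beyond the pitch-class histogram has to identify the tonic.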

I'd also invite you to try implementing steps 0-3 (the Gabor library might be helpful). If you can do just step 0 well, you'll have a pretty marketable product. It's great to try these things out; if nothing else, you'll come away with a better understanding of how to produce results. You'll probably also find that Max is not an ideal platform for developing such algorithms (auto-corre-what?), as it's a little trickier to set up the algorithms and to gather statistics on the results. (MATLAB/Octave or SciPy are both good for this.)

Music is messy and diverse and general purpose algorithms are hard to come by; what may work fine against a controlled studio recording may fail spectacularly in a live context with extra background noise or with different instrumentation. You may find a solution that works for you and your needs in there, however, and I'd encourage you to pursue such, but I'm just saying this: these are generally not simple problems.

jamesson's icon

Peter

I appreciate the info (and will study your link in-depth later) but I would argue that I have provided two strong counterexamples to your claim that this is not a solved problem. If these folks can do it, surely we can too?

Are these things inaccurate? Probably. Are they accurate enough for extremely high-profile people to use? Apparently.

Peter McCulloch's icon

Here's what I'm saying: it's a hard problem and a lot of smart people have worked on it and continue to work on it. In fact, you can see the results from ISMIR's 2011 competition on key recognition here: http://nema.lis.illinois.edu/nema_out/mirex2011/results/akd/index.html, and I'd say it's not considered solved (IRCAM's got an entry...). You may find the papers helpful as you also can get a sense of the type of inputs that lead to false recognitions. A solution doesn't have to be perfect to be useable, but that doesn't make it solved either. Fiddle~ in no way "solved" pitch-tracking, but it let a lot of us do a little bit more.

Ultimately, it depends on what your needs are, and I'm happy someone has come up with a tool that works well enough for the needs of its users. It's a little curious that they're also selling this product (a massive printed listing of 50,000 songs with key detected manually): http://www.harmonic-mixing.com/KeyResultsDatabase.aspx, and I wonder if the two are intertwined. They don't sell this information in digital form, and I can think of a few reasons for that: first, the program could just be doing a lookup where appropriate ID3 tags are provided; second, you don't want to provide a large, clean dataset for your competition to test their algorithms against... This is my favorite quote from that page: "You may choose to have the results sorted by artist, song title, key or BPM"

It would be really easy to search a database for anything that's already a known entity and then use the key-finding algorithm as a fall-back. If you know the title, you could search by that; if you don't, you could use Echonest to find the song and then search from there. The program would be both faster and more accurate, and I consider this a totally legitimate strategy when the goal is simply to find the key for a given performance of a piece of music. I would be curious to find out how much of its key-finding is via algorithm and how much by look-up. It'd be great if it's all by algorithm, but there's nothing about their product that necessarily requires it to be. Their key-finding algorithm is patented, so you might find out some information by reading that.
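A sketch of the lookup-first strategy described above, assuming a hypothetical in-memory table keyed by title and a hypothetical `detect_key` function standing in for whatever algorithm is used. Nothing here reflects how any actual product works.

```python
def normalise(title):
    """Collapse case and extra whitespace so lookups are forgiving."""
    return " ".join(title.lower().split())

def find_key(title, known_keys, detect_key, audio=None):
    """Return (key, source): table lookup first, algorithm only as a fall-back."""
    hit = known_keys.get(normalise(title)) if title else None
    if hit is not None:
        return hit, "lookup"
    return detect_key(audio), "algorithm"

# Hypothetical table and detector, for illustration only
known = {"example track": "F minor"}

def placeholder_detector(audio):
    """Stand-in for a real key-finding algorithm."""
    return "C major"

print(find_key("  Example Track ", known, placeholder_detector))
print(find_key("Something Unlisted", known, placeholder_detector))
```

Returning the source alongside the key makes it possible to measure how often the algorithm is actually exercised.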

If it was throw-together-a-Max-patch easy, it would probably be out there. Also, given that you haven't worked in MSP yet, and that most people have difficulties with the FFT, you will also probably find implementing this idea non-trivial (not impossible/unfeasible/unworkable, but will require a significant chunk of your--or someone else's--time), which was the point of what I've been saying. It's great to take on hard problems, but it's also good to know what you're biting off.

My counter-counter examples: Bebop, and a recording made a quarter-step sharp (which would presumably be wrong 100% of the time, since C quarter-sharp major goes with neither C nor Db).
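The quarter-step case can be checked with a little arithmetic. This sketch assumes equal temperament with A4 = 440 Hz and measures how far a frequency sits from its nearest semitone, in cents; the function name is made up for illustration.

```python
import math

def nearest_semitone_offset(freq_hz, a4_hz=440.0):
    """Return (nearest MIDI note number, offset from it in cents)."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    nearest = round(midi)
    return nearest, 100 * (midi - nearest)

c4 = 440.0 * 2 ** (-9 / 12)            # middle C in equal temperament
quarter_sharp = c4 * 2 ** (0.5 / 12)   # the same C raised by a quarter-step

note, cents = nearest_semitone_offset(quarter_sharp)
print(note, round(cents, 3))
```

The offset has a magnitude of about 50 cents, the largest possible, so the pitch sits on the boundary between C and Db, and rounding to the nearest semitone can go either way. Estimating a global tuning offset before mapping to pitch classes is one possible refinement, though that is an assumption and not something tested here.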

jamesson's icon

Your counterexamples are convincing, but I'm curious - why do you think that the people selling the database have anything to do with the people selling the software?

jamesson's icon

point for Peter - documentation says it needs a net connection to work
Sad

Peter McCulloch's icon

The connection question is an easy one: there's an email link on the page I listed to contact@mixedinkey.com. ;-) I'm guessing the printed-out database is an earlier project. Books are way easier to copy-protect (mechanically rather than legally) than spreadsheets, and this is the type of data that may not be possible to copyright. The key and tempo of "Take on Me" are certainly not the original work of the author. The digital format would obviously be considerably more valuable to its users. I'd be curious how quickly a database of this sort could be crowd-sourced.

The technology is getting better all the time, and I'm sure that they do do at least some algorithmic key-tracking in their product. What I'd like even more is a higher-level algorithm that can recognize chords reliably. There's a bunch of people working on that, and I saw some promising stuff a couple of years ago at NYU. Cheers.

jamesson's icon

Well, the other app is _definitely_ an algorithm - says so in the forum post. I shouldn't be too surprised that free software is better than paid nowadays, but still, funny.

jamesson's icon

+1 as well. I am a huge fan of Middle Eastern music (Nusrat Fateh Ali Khan ftw, and most Sikh kirtan too), and, though not specifically Middle Eastern, I love Tea Party.

Rodrigo's icon

When you start talking "key detection" you're kind of limiting the type of music you're talking about. Likely tonal, western music, likely in just a single major or minor key (give or take a mode or two). That obviously gets into a semantic discussion, but I would imagine the complexity/possibility of key detection is directly related to how complex the intended application is.

jamesson's icon

@Rodrigo, true, but....

If I am a DJ and I want to separate myself from the pack, sooner or later I will start grokking non-traditional forms.

mzed's icon

I'd check out the echonest max object here:

mz

jamesson's icon

Mzed: pretty sweet, will study more later.