VALVe gave us an ARG! with a dirty soundfile
It's incredibly dirty actually. But it's given me a chance to get back into Max after many years of concentrating on Logic.
Here's the deal. There's a really dirty sound file ( http://bobsbarricades.bandcamp.com/track/noise ) with some conversations lying underneath a massive wall of static and 2 mid-range hums: ~3.5 kHz and 5 kHz, IIRC. So, after farting around with a ton of EQs and filters in Logic, and even getting decent results with iZotope RX (the trial), I had an idea.
In the sound file there are, it's generally agreed, 2 or 3 people talking. We know the possible characters from the game Half-Life 2: Alyx, Dr. Kleiner, and Barney. The game's data files give us access to a bunch of sound files of them saying one-liners in the studio. So I opened up Max and used Tristan Jehan's [analyzer~] to track the frequencies of any of the 3 characters (whichever I pick in [sfplay~]), and I filter those frequencies out of one [sfplay~] playing the noisy sound file, leaving me with, hopefully, just the remaining static.
I then invert this static [*~ -1] and then add it back [+~] to another [sfplay~] playing the static file to cancel out, hopefully, the static and leave me with just the voice.
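Outside of Max, the invert-and-add step can be sketched roughly like this. It's plain Python with made-up sine waves standing in for the voice and the static (the names, sample rate, and frequencies are all illustrative assumptions, not taken from the actual file). It only shows that the cancellation works when the noise estimate matches sample for sample, and falls apart when the estimate is even one sample late:

```python
import math

SR = 8000  # assumed sample rate for this toy example


def sine(freq_hz, n, amp=1.0):
    """n samples of a sine wave at freq_hz."""
    return [amp * math.sin(2 * math.pi * freq_hz * i / SR) for i in range(n)]


def mix(a, b):
    """Sample-by-sample sum, like [+~]."""
    return [x + y for x, y in zip(a, b)]


def invert(sig):
    """Polarity flip, like [*~ -1]."""
    return [-x for x in sig]


def rms_error(a, b):
    """RMS of the difference between two signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


n = 800
voice = sine(220, n, amp=0.2)   # hypothetical stand-in for the speech
hum = sine(3500, n, amp=1.0)    # hypothetical stand-in for the static/hum
noisy = mix(voice, hum)

# Case 1: the noise estimate is exact, so inverting and adding leaves the voice.
recovered = mix(noisy, invert(hum))

# Case 2: the same estimate arrives one sample late, so it no longer cancels.
late_hum = [0.0] + hum[:-1]
smeared = mix(noisy, invert(late_hum))

print("exact estimate error:", rms_error(recovered, voice))
print("late estimate error: ", rms_error(smeared, voice))
```

In the toy case the one-sample delay leaves a residue louder than the voice itself, which is why a filtered copy of the file (filters shift phase) is unlikely to cancel cleanly.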
If you've read this far, you know my mistakes. 1) I'm using a voice that's saying "x" to understand them saying "y." 2) I'm only tracking one frequency at a time with [analyzer~], so the rest of the frequencies go out with the static.
Soooooo is there any other way to do this? Some way to track more than just one frequency of the characters' voices and filter them all out for the inverted noise file? I hope this makes sense. Thanks =)
Hmm I don't think I can add much to the noise reduction question. I think this is a really difficult process though, because even when you have sound files from the same characters, how do you use any of that information? You could try to fft and see the overtones typically used to generate the vocal, but with speech there's so many consonants that mess up your analysis that I think I'll wish you good luck on this one :P.
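To make the "fft and see the overtones" idea concrete, here is a minimal sketch in plain Python (not Max). It assumes a synthetic, perfectly steady tone with three harmonics, which real speech is not, and the function names are made up for the illustration. It picks the strongest spectral peaks from one frame instead of tracking a single frequency:

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (slow, but dependency-free)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def strongest_partials(frame, sr, count=3):
    """Frequencies (Hz) of the `count` largest local maxima in the spectrum."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    peaks = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    peaks.sort(key=lambda k: mags[k], reverse=True)
    return sorted(k * sr / n for k in peaks[:count])


sr = 8000   # assumed sample rate
n = 128     # frame length, so each bin is 62.5 Hz wide
# Hypothetical "voice": 250 Hz fundamental plus two quieter overtones.
frame = [
    1.0 * math.sin(2 * math.pi * 250 * t / sr)
    + 0.5 * math.sin(2 * math.pi * 500 * t / sr)
    + 0.25 * math.sin(2 * math.pi * 750 * t / sr)
    for t in range(n)
]

partials = strongest_partials(frame, sr)
print(partials)
```

The test tone sits exactly on bin centres, so the peaks come out clean. Consonants and pitch glides would smear energy across many bins, which is the difficulty described above.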
But since i have no sound atm: Were the voices of barney, dr.kleiner and alyx used in the GlaDOS scene?
i think barney, dr.kleiner and alyx should be able to be tamed...
this is very interesting jhaysonn. i think it should be possible. but not for mere mortals like me. so i'd be interested in progress (!).
pfft~ is definitely the way to go, and/or the ircam ftm/gabor stuff.
i've used this sort of low amplitude gate / 'noise' compare [see attachment], based on a jean francois charles thing. but needs serious work...
good luck.
and ej's 'zsa.descriptors' are excellent for analysis, too.
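For anyone who wants to see the gate / noise-compare idea as text rather than a patch, here is a rough sketch of a generic spectral gate in plain Python. It is not the attached patch and not a port of anyone's method: it processes a single frame with no windowing or overlap-add (the part [pfft~] would handle), the signals are synthetic, and every name in it is an assumption made for the illustration.

```python
import cmath
import math


def dft(frame):
    """Naive forward DFT (slow, but dependency-free)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(bins):
    """Naive inverse DFT, returning the real part."""
    n = len(bins)
    return [
        (sum(bins[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def noise_profile(noise_frames):
    """Average magnitude per bin over frames assumed to contain only noise."""
    spectra = [dft(f) for f in noise_frames]
    return [
        sum(abs(s[k]) for s in spectra) / len(spectra)
        for k in range(len(spectra[0]))
    ]


def spectral_gate(frame, profile, margin=2.0):
    """Zero every bin that is not louder than margin * the noise profile."""
    bins = dft(frame)
    gated = [b if abs(b) > margin * p else 0j for b, p in zip(bins, profile)]
    return idft(gated)


sr, n = 8000, 64  # assumed sample rate and frame size (125 Hz per bin)
hum = [math.sin(2 * math.pi * 3500 * t / sr) for t in range(n)]
voice = [0.5 * math.sin(2 * math.pi * 500 * t / sr) for t in range(n)]
noisy = [v + h for v, h in zip(voice, hum)]

profile = noise_profile([hum, hum])   # "learn" the noise from noise-only frames
cleaned = spectral_gate(noisy, profile)
worst = max(abs(c - v) for c, v in zip(cleaned, voice))
print("largest deviation from the clean voice:", worst)
```

This only works so neatly because the toy hum and voice occupy different bins. Broadband static that overlaps the speech in every bin would force a trade-off between leftover noise and gated-out speech.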
Thank you guys for your responses! I should've noted I'm stuck on 4.6 and penniless to make the upgrade =,(
@Bas van der Graaff - which GlaDOS scene? In Portal? Or do you know of some development that I haven't seen yet? One of the theories, which I think makes the most sense, is that this ARG is actually us taking part in bringing GlaDOS 'back to life', as it were. The user/pass combination to log into this old BBS line was backup/backup and...well, there's a whole bunch of reasoning =)
@pid - I've just downloaded the ftm/gabor files (luckily Beta 3 is for 4.6!) and will check those out. I realized too late my ignorance in not remembering [pfft~]. I was lying in bed when it hit me and I almost jumped back out! I'm going to look into it now and see if I can tune out the frequencies I get. I've never been too efficient at using it, so here's to learning more!
Thanks again guys. I had a quick idea to use Match EQ in Logic as a quick test, and the results weren't too promising =(
[edit] - OK...I definitely don't remember anything about [fft~] and don't think I ever used it when I was first learning Max/MSP. I think that if I use [fft~] and, unlike the help patch, send the information to a [cascade~] instead of the [scope~], I could achieve what I need. Any help on how to do this?
I tried with professional denoising techniques. The speech information possibly isn't in there: not just masked, but simply not there from the beginning. Human ears are much better at recognizing speech in noisy environments (with noise louder than the signal) than any machine will be within the next decade. I would not waste any more time on the technical side...
But if you know all the possible phrases, you could train your own ear by masking those spoken phrases more and more, setting up a sort of training quiz for yourself. And don't forget to read Gödel, Escher, Bach - it explains it all...;-)
Stefan
Stefan,
Thank you for your response, and for mirroring some similar sentiments. The more clips and samples I hear from other people's deciphering and my own experiences...I get the feeling that whatever audio/dialogue IS going on inside the file was in fact muffled to incomprehension BEFORE adding all that extra hoopla.
There are some clips that people have gone in and patched up that definitely give way to the sound of human dialogue but.... it's so difficult to pick anything out of it =P Probably a red herring for the ARG =P
Thank you for the recommendation on the book. I've never heard of it but it sounds like something I would immensely enjoy! I recently read "This is Your Brain on Music" and was really amazed at some of the things that were discussed as being normal behavior for the human ear. It made me feel like the ear was light-years ahead of our eyes =P
Thanks again!