Downsample/Average a Data Stream
Hi all,
I have a question about comparing sampled streams of data (this could apply to any kind of data: gestural, audio descriptors, etc.). I am looking for ways to sample and compare input data streams of varying lengths to each other, to find similar or contrasting data 'phrases'. I'm doing the matching with the zsa.dist external from Mikhail Malt and Emmanuel Jourdan's zsa descriptors library: http://www.e--j.com/?page_id=83
My question concerns averaging out data streams, as I want a consistent way to compare streams of varying lengths. So far I am doing this by averaging streams of varying lengths down to a fixed length, e.g. 50 data points. The way I do it is to take a sampled stream, round its length up or down to the nearest multiple of the desired fixed length (truncating the list if rounded down, padding it with 0s if rounded up), and then divide this rounded length by the fixed length to get an evenly spaced window size with which to average the data.
So if the original list has 258 elements and I want it averaged or down-sampled to 50 points:
Truncate the list to 250 elements,
Divide this length by the fixed length desired, 250/50 = 5,
Take the average value of every 5 elements,
Make a new list out of these averages (see the pseudocode sketch just below).
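Spelled out as js-style pseudocode, just to make the maths explicit (the actual patch builds this from zl objects, and the names here are only illustrative):

// truncate/pad the input to a multiple of 'target', then block-average
function downsample(data, target) {
    // window size = input length rounded to the nearest multiple of target,
    // divided by target (e.g. 258 -> 250 -> window of 5 for target 50)
    var win = Math.round(data.length / target);
    if (win < 1) win = 1;
    var rounded = win * target;

    // truncate to the rounded length (this also copies the input) ...
    data = data.slice(0, rounded);
    // ... then pad with 0s up to the rounded length if it was shorter
    while (data.length < rounded) data.push(0);

    // average each window of 'win' consecutive points into one output point
    var out = [];
    for (var i = 0; i < target; i++) {
        var sum = 0;
        for (var j = 0; j < win; j++) sum += data[i * win + j];
        out.push(sum / win);
    }
    return out; // always 'target' points long
}

If I understand the js docs right, inside a [js] object this would sit in a list() handler and go back out with something like outlet(0, downsample(arrayfromargs(arguments), 50)), but the function is the part that matters here.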
As this approach was arrived at by trial and error, I just wanted to ask whether anyone has tried to do something similar and approached it differently. Looking on the net for approaches to down-sampling data like this hasn't proved very useful for my purposes - or maybe I'm not looking in the right places :-)
Any ideas on how to improve this, or is there another approach that might be more accurate?
I have attached an example patch to illustrate the idea.
Thanks in advance,
Ben
look into slide, accum, and zl.
Hi Vb, I'll have a look at the external - thanks for the heads up on that.
Hi Roman - thanks for the reply, have you taken a look at the attached patch? I make extensive use of the zl objects for processing these lists... Obviously a simpler solution would be to take a discrete value every n data points to downsample the list to a fixed length - however I went with averaging to keep the output list more consistent.
Ben
you could also do it with [jit.matrix]: putting a matrix into a smaller one should "resample" it directly, though I'm not totally sure what the results would be. It would be quite straightforward though, so in your example above:
jit.matrix 1 char 258 into -->
jit.matrix 1 char 50 @adapt 0 (this "resamples" to the new size, though without the truncation/padding you're doing)
the new matrix should show the averaging ("luminance") directly?
Also not sure what @interp will do, probably want it off.
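If it helps to check what you get, here's a rough plain-js illustration of the two behaviours a shrink can have: point-sampling versus block-averaging. Which of the two the jit.matrix copy actually does without @interp is something to test (function names here are just illustrative):

// pick one value every step (decimation / nearest-neighbour)
function pointSample(data, target) {
    var out = [];
    for (var i = 0; i < target; i++) {
        out.push(data[Math.floor(i * data.length / target)]);
    }
    return out;
}

// average each block of source values (roughly what the zl-based patch does,
// but splitting into near-equal blocks instead of truncating/padding)
function blockAverage(data, target) {
    var out = [];
    for (var i = 0; i < target; i++) {
        var start = Math.floor(i * data.length / target);
        var end = Math.floor((i + 1) * data.length / target);
        if (end <= start) end = start + 1; // guard for inputs shorter than target
        var sum = 0;
        for (var j = start; j < end; j++) sum += data[j];
        out.push(sum / (end - start));
    }
    return out;
}

For 258 in and 50 out, the block-averaging version should land very close to your truncate-to-250 approach, it just uses blocks of 5 or 6 instead of throwing away the last 8 values.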
Thanks to you both for the fantastic advice. VB, I tried your original js code - works like a charm - and I will try the second one also. I don't code in JavaScript myself, but it looks like I might need to learn!!
Will also try the jit.matrix solution.... thanks a bunch seejayjames
Great!
VB: Tried the second js code and it works very well also - thanks a bunch mate, I really appreciate it - that's really very kind of you!
Now to see if I can use these reduced lists for my matching purposes - I'm gathering descriptor data from amp, pitch, brightness and noisiness (using analyzer~) to try and match incoming phrases to a database of stored and analysed phrases... will see how this works and report back.
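Whatever measure zsa.dist ends up using, conceptually the comparison is just a distance between two equal-length reduced lists; in js-ish pseudocode a plain Euclidean distance (purely illustrative, not zsa.dist's internals) looks like:

function euclideanDistance(a, b) {
    // assumes both reduced lists have the same fixed length (e.g. 50 points)
    var sum = 0;
    for (var i = 0; i < a.length; i++) {
        var d = a[i] - b[i];
        sum += d * d;
    }
    return Math.sqrt(sum); // smaller = more similar
}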
Thank you!!!
Hi Ben,
I'm working on something REALLY similar right now, but I've gone a different way; maybe it's interesting for you:
- I'm playing a loop on the piano and record a 200ms "spectral picture" when a transient occurs. I save this in a jit.matrix.
- Then, when I continue playing the piano loop, I stream the incoming signal into a jit.matrix of the same size.
- At the end of each pfft cycle I compare the two jit.matrices with each other. If they are more similar than a certain threshold, I consider it the same chord/notes as the "spectral picture".
This way I don't need to average anything and I stay in the signal/jitter world, which is imho more reliable in timing.
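In pseudocode terms (plain js just for illustration; my actual patch keeps everything in jitter), the per-cycle comparison boils down to something like this, with the measure and the threshold entirely up to you:

function framesMatch(frameA, frameB, threshold) {
    // mean absolute difference between two equal-length spectral frames;
    // below the threshold I treat it as the same chord/notes
    var diff = 0;
    for (var i = 0; i < frameA.length; i++) {
        diff += Math.abs(frameA[i] - frameB[i]);
    }
    return (diff / frameA.length) < threshold;
}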
If you have questions/wanna talk, just write me an email.
simon dot slowik at gmx dot de
cheers, simon