Downsample/Average a Data Stream

May 7, 2011 at 7:30am

Hi all,

I have a question about comparing sampled streams of data (that could apply to any data, gestural, audio descriptors etc). I am looking for ways to sample and compare input data streams of varying lengths to each other, to find similar or contrasting data ‘phrases’. I’m doing this matching using the zsa.dist external from Mikhail Malt and Emmanuel Jourdan’s zsa descriptors library: http://www.e–

My question concerns averaging out data streams, as I want a consistent way to compare streams of varying lengths. So far I am achieving this by averaging each stream down to a fixed length, e.g. 50 data points. The way I am doing this is by taking a sampled stream, rounding its length up or down to the nearest multiple of the desired fixed length (if rounded down the list is truncated, if rounded up it is padded with 0s), and then dividing this rounded length by the fixed length to get an evenly spaced window size with which to average out the data.

So if the original list has 258 elements and I want it averaged or down-sampled to 50 points:

Truncate the list to 250 elements,
Divide this length by the fixed length desired, 250/50 = 5,
Take the average value of every 5 elements,
Make a new list out of these averages.
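
For anyone who wants to try the same steps outside Max, here is a minimal Python sketch of the procedure described above. The function name `block_average` and its parameters are made up for illustration and are not taken from the attached patch; it assumes the input is a plain list of numbers.

```python
def block_average(values, target_len=50):
    """Average a list down to target_len points using equal-sized windows.

    The length is rounded to the nearest multiple of target_len:
    extra values are truncated, missing values are padded with 0.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")

    # Nearest whole window size (halves round up), never smaller than 1.
    window = max(1, (len(values) + target_len // 2) // target_len)
    rounded_len = window * target_len

    # Truncate if the list is too long, pad with zeros if it is too short.
    data = list(values[:rounded_len])
    data += [0.0] * (rounded_len - len(data))

    # One average per window.
    return [sum(data[i:i + window]) / window
            for i in range(0, rounded_len, window)]
```

With 258 input values and `target_len=50`, the window is 5 and the last 8 values are dropped, matching the worked example. A 280-element list would instead be padded with zeros to 300 and averaged in windows of 6, which pulls the final averages toward 0.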

As this approach was arrived at by trial and error, I guess I just wanted to know if anyone has tried to achieve anything similar and has approached it in a different way? Looking on the net for approaches to down-sampling data like this hasn’t proved very useful for my purposes, or maybe I’m not looking in the right places :-)

Any ideas on how to improve this approach, or if there is another approach that may be more accurate?

I have attached an example patch to illustrate the idea.

Thanks in advance,


May 7, 2011 at 2:30pm

Hello Ben Carey,

What kind of stream are you dealing with ?

In the case of a finite alphabet (for instance [0, 1, 2, ... , 127]), maybe you can try Hidden Markov Models? I wrote an HMM external, [foxtrot], which is not really designed for matching phrases, but it could be modified to do that better ;-)

EDIT : sorry, this doesn’t really answer your question …

May 7, 2011 at 4:33pm

look into slide, accum, and zl.

May 7, 2011 at 11:25pm

Hi Vb, I’ll have a look at the external – thanks for the heads up on that.
Hi Roman – thanks for the reply, have you taken a look at the attached patch? I make extensive use of the zl objects for processing these lists… Obviously a simpler solution would be to take a discrete value every n data points to downsample the list to a fixed length – however I did this averaging to have the output list more consistent.
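
To make the "discrete value every n data points" alternative concrete, here is a small Python sketch. The name `decimate` is hypothetical and the code is only an illustration of the idea, not something from the patch.

```python
def decimate(values, target_len=50):
    """Keep one value out of every `step`, with no averaging."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")

    # Step between kept values; at least 1 so short lists pass through.
    step = max(1, len(values) // target_len)

    # Take every step-th value, then cut to the requested length.
    return list(values[::step][:target_len])
```

For a 258-element list this keeps elements 0, 5, 10, ... 245. A single spike at index 3 would be skipped entirely, whereas a windowed average would still reflect it, which is one way to read "more consistent" here.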


May 8, 2011 at 11:01am

Hello Ben Carey,

a way with javascript ;-)

May 8, 2011 at 12:27pm

you could also do it with [jit.matrix], putting a matrix into a smaller one should “resample” directly, though I’m not totally sure what the results would be. It would be quite straightforward though, so in your example above:

jit.matrix 1 char 258 into–>
jit.matrix 1 char 5 @adapt 0 (this “resamples” to the new size, though without truncation/padding like you’re doing)

the new matrix should show the averaging (“luminance”) directly?
Also not sure what @interp will do, probably want it off.

May 8, 2011 at 12:48pm


[jit.matrix] ? I’ll try ;-)

Anyway if you want to use javascript : this one is simpler and better.

May 8, 2011 at 12:55pm

Thanks to you both for the fantastic advice. VB I tried your original js code – works like a charm – I will try the second one also. I don’t code in javascript myself but it looks like I might need to learn!!

Will also try the jit.matrix solution…. thanks a bunch seejayjames


May 8, 2011 at 1:15pm

VB: Tried the second js code and it works very well also – thanks a bunch mate, I really appreciate it – that’s really kind of you!

Now to see if I can use these reduced lists for my matching purposes – I’m gathering descriptor data from amp, pitch, brightness and noisiness (using analyzer~) to try and match incoming phrases to a database of stored and analysed phrases… will see how this works and report back.
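
As a rough picture of what matching against a database of reduced lists could look like, here is a Python sketch. It uses plain Euclidean distance as a stand-in; whether that matches what zsa.dist computes is an assumption, and the names `euclidean` and `closest_phrase` are invented for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length lists of numbers."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_phrase(query, database):
    """Return the name of the stored phrase nearest to `query`.

    `database` maps a phrase name to a list of the same length as `query`.
    """
    return min(database, key=lambda name: euclidean(query, database[name]))


# Hypothetical stored phrases, already reduced to the same fixed length.
stored = {"rise": [0, 1, 2, 3], "fall": [3, 2, 1, 0]}
best = closest_phrase([0, 1, 2, 2.5], stored)
```

If several descriptors (amp, pitch, brightness, noisiness) are compared together, their value ranges differ a lot, so some scaling of each descriptor before measuring distance would probably be needed.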

May 9, 2011 at 9:55am

Hello Ben Carey,

Oops, the last one is not very reliable because of the way it handles the remainder; this one is the best ;-)
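
The script referred to here is not reproduced in the thread, so the following is not VB's code. It is only a Python sketch of one common way to deal with the remainder: spread it across the windows so that nothing is truncated or padded. The name `resample_mean` is hypothetical, and the sketch assumes the input has at least as many values as the target length.

```python
def resample_mean(values, target_len=50):
    """Average a list down to target_len points using every input value.

    Window boundaries are placed proportionally, so window sizes differ
    by at most one element when the length is not an exact multiple.
    """
    n = len(values)
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if n < target_len:
        raise ValueError("need at least target_len input values")

    out = []
    for i in range(target_len):
        # Integer boundaries of window i; the last one always ends at n.
        start = i * n // target_len
        end = (i + 1) * n // target_len
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk))
    return out
```

For 258 values and 50 points this gives windows of 5 or 6 elements, and the final 8 values that plain truncation would drop are included in the averages.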

May 9, 2011 at 1:23pm

Thank you!!!

May 11, 2011 at 6:30am


Oops; just in case, here is the latest JavaScript, with no bug …

  1. lulu.js
May 11, 2011 at 8:03am

Hi Ben,

I’m working on something REALLY similar right now. But I’ve gone a different way, maybe it’s interesting for you:

- I’m playing a loop on the piano and record a “spectral picture” of 200ms when a transient occurs. This I save in a jit.matrix.
- Then, when I continue playing the piano loop, I stream the incoming signal to a jit.matrix of the same size.
- At the end of each pfft-cycle I compare the two jit.matrices with each other. If they are more similar than a certain value, I consider this the same chord/notes as the “spectral picture”.
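
The comparison step in the list above could be sketched in Python roughly as follows. This is not Simon's patch: the similarity measure (mean absolute difference), the function names, and the threshold value of 0.1 are all assumptions chosen for illustration, with each matrix represented as a list of rows.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two same-sized 2D lists."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("matrices must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)


def is_match(snapshot, live, threshold=0.1):
    """True when the live matrix is close enough to the stored snapshot."""
    return mean_abs_diff(snapshot, live) <= threshold


# Hypothetical 2x2 "spectral pictures" with values between 0 and 1.
snapshot = [[0.5, 0.5], [1.0, 0.0]]
similar = [[0.5, 0.6], [1.0, 0.0]]
different = [[0.0, 0.0], [0.0, 1.0]]
```

Here `is_match(snapshot, similar)` is true and `is_match(snapshot, different)` is false; in practice the threshold would have to be tuned by ear against real recordings.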

This way I don’t need to average anything and I stay in the signal/jitter world, which is imho more reliable in timing.

If you have questions/wanna talk, just write me an email.

simon dot slowik at gmx dot de

cheers, simon

