Detection of silent segments / Non-silent segments
Hi, I'm trying to figure out how I can make the entry and exit points of silence sharper (like the line in the screenshot).
I'm trying to get the indices where the audio rises above a given threshold and where it falls back below it, skipping all the intermediate indices.
That's where I started, but this way I get too many indices inside the signal.
You need to take the average of x samples, a window of at least 10-20 ms,
to avoid repeated values.
Source Audio, something like [slide 0. 20.] ?
No, I mean collecting a number of samples into a group to analyse.
You can then get the RMS of that range.
If it is greater than the set threshold, mark it as non-silence;
when it falls below, mark it as silence.
You can also set a different threshold for each ...
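The idea described above (group samples into short windows, compute the RMS per window, and use separate on/off thresholds) could be sketched in Python like this. The window size and threshold values are illustrative assumptions, not taken from the patch:

```python
import math

def detect_segments(samples, sr=44100, win_ms=10,
                    on_thresh=0.05, off_thresh=0.02):
    """Return (start, end) sample-index pairs of non-silent regions.

    sr, win_ms, and both thresholds are assumed example values.
    """
    win = max(1, int(sr * win_ms / 1000))  # 441 samples at 44.1 kHz / 10 ms
    segments = []
    start = None
    for i in range(0, len(samples), win):
        chunk = samples[i:i + win]
        # RMS of this window: square, average, square root
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if start is None and rms > on_thresh:
            start = i                    # audio begins here
        elif start is not None and rms < off_thresh:
            segments.append((start, i))  # audio ends here
            start = None
    if start is not None:                # audio runs to end of buffer
        segments.append((start, len(samples)))
    return segments
```

Using two thresholds (a higher one to enter audio, a lower one to leave it) gives hysteresis, so the detector doesn't chatter when the level hovers around a single threshold.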
Source Audio, Thank you so much for your help!
This looks quite difficult to understand :) Also, after changing the file in the buffer several times, the patch (sometimes) stops working; the only fix is to reload it from your message. I haven't yet found out exactly what causes the failure.
Also, could you explain how to invert it, so that it finds only the segments containing audio and skips all the silence?
What was causing the buffer to block?
In Max itself all is OK; I can't say anything about Live.
In the result list, pairs of values represent detected audio chunks.
for example 200 800, 1200 2300, 3000 5600, ...
if you instead build pairs starting from the end of the first audio chunk,
you get the silences, in this case 800 1200, 2300 3000 ....
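The inversion described above just pairs each chunk's end with the next chunk's start. A minimal sketch (the function name is my own, not from the patch):

```python
def invert_segments(audio_pairs):
    """Turn non-silent (start, end) pairs into the silent pairs
    between them, by pairing each end with the next start."""
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(audio_pairs, audio_pairs[1:])]

# the example pairs from the thread:
# invert_segments([(200, 800), (1200, 2300), (3000, 5600)])
# gives [(800, 1200), (2300, 3000)]
```

Note this only returns the silences *between* audio chunks; leading silence before the first chunk and trailing silence after the last one would need the buffer length to compute.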
By the way, this patch is actually not that complicated.
peek~ iterates through the buffer~.
Absolute sample values get squared (multiplied by themselves) as part of the RMS calculation.
10 ms chunks get collected (441 samples in the case of 44.1 kHz)
and run through the rest of the calculation to get the RMS value for that 441-sample (10 ms) chunk.
Since no buffer~ is a perfect multiple of 441 samples, the size of the last chunk must be
detected in order to calculate its RMS (a job for zl.len).
....
The RMS calculation is done by multiplying each sample by itself (square),
then summing all the results and dividing by the number of samples (average),
then taking the square root of the result.
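Those three steps (square, average, square root) are exactly this one-liner in Python:

```python
import math

def rms(chunk):
    """Root mean square: square each sample, average, square-root."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))
```

Squaring also makes the sign irrelevant, which is why negative samples contribute just like positive ones.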
After that, the result is run through the thresholds for the start and end of audio chunks.
The little bit of logic around ggate is there to capture the current sample index whenever
an audio portion or a silence is detected.
Easier to understand now?