Asking for help implementing file splitter

GVA1994

Hi all,

I've done a ton of recordings with sounds separated by (relative) silence. I've been busy working on a patch to split these files up into the individual sounds they contain. I've already figured out that completely automatic silence detection is a bridge too far for me.

Instead, I'm looking to create a patch in which the user can quickly identify and select the different sounds from within a waveform~-like UI. These sounds will then be sent to a silence trimmer, so extracting the sounds doesn't have to be very accurate.

I've been playing around with waveform~ (which seems like the only way to go?). However, the user can only make one selection at a time with that object. Ideally I'd like to have a patch in which the user can easily zoom in/out on different areas of the file, and make multiple selections to extract.

I'd love to hear any approaches to better meet my project requirements. Thanks a lot :)

Source Audio

There are many options here,
but waveform~ has no multi-selection.
You could let users copy the selection to another buffer and export it.
One could optionally remove the exported part from the initial buffer to avoid confusion.

or
You could let users actually silence the "silent" parts and use sox and shell to
export the slices into individual files.
or ...
mark regions to export later, in Max or outside of it.


GVA1994

Thanks for the suggestions. I've put it on hold for now since it seems like a whole lot of work. Maybe I'll come back to it later. Thanks :)

Source Audio

If all you need is to split the files, try sox.
It does it very well.
https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/
In short:
sox XXX.wav XXX.wav silence 1 0.1 -50d 1 0.1 -50d : newfile : restart
This command exports individual files from XXX.wav,
removing silence below -50 dB and splitting the audio at those positions.
The first 0.1 means audio must stay above the -50 dB threshold for at least 0.1 seconds
to be recognised as sound; the second 0.1 is how long it must stay below the threshold to count as silence.
One can tweak the parameters as needed.
You would end up with XXX_001.wav, XXX_002.wav, etc., one file per detected slice.
-------
(I prefer a dB threshold over a percentage, which is on a linear scale. The 1% used in the tutorial
corresponds to 1/100 = 0.01 = -40 dB.)
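The percentage-to-dB step above is the usual amplitude formula, dB = 20 · log10(amplitude), with 1.0 as full scale. A small sketch in Python, used here only for illustration (the function names are hypothetical; in a patch the same maths would live in an expression object):

```python
import math

def linear_to_db(amplitude: float) -> float:
    """Linear amplitude (1.0 = full scale) -> dBFS."""
    return 20.0 * math.log10(amplitude)

def db_to_linear(db: float) -> float:
    """dBFS -> linear amplitude."""
    return 10.0 ** (db / 20.0)

# Percentage thresholds (as in the tutorial) and their dB equivalents:
# prints 1.0% = -40 dB, 0.1% = -60 dB, 0.01% = -80 dB
for percent in (1.0, 0.1, 0.01):
    print(f"{percent}% = {linear_to_db(percent / 100.0):.0f} dB")
```

Each factor of 10 in amplitude is 20 dB, which is why a dB threshold is easier to tweak by ear than a percentage.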
If it's used in Max via the shell object, sox has to be called with its full path,
and the audio file has to be given as a full path too.
The shell prefers slash-based paths.

Roman Thilenius


i have recently implemented a multi-selection system with waveform~.

it basically stores the selection you make somewhere else (from where it can be recalled) and then you can go to the next slot and set a new selection.

mine is more complicated, but you can basically do that with 2 [flonum] and a [preset] object.
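The slot idea is mostly bookkeeping: each slot holds one start/end pair, you can recall any slot, and moving to the next slot keeps earlier selections intact. A rough Python sketch of that data structure, purely as an illustration outside Max (the class, its method names and the millisecond units are assumptions, standing in for two [flonum] values stored per [preset] slot):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Region = Tuple[float, float]   # (start_ms, end_ms)

@dataclass
class SelectionSlots:
    """Hypothetical slot store: one waveform~ selection per numbered slot."""
    selections: Dict[int, Region] = field(default_factory=dict)
    current: int = 1

    def store(self, start_ms: float, end_ms: float) -> None:
        # Save the current selection into the active slot (a reversed drag is normalised).
        self.selections[self.current] = (min(start_ms, end_ms), max(start_ms, end_ms))

    def recall(self, slot: int) -> Optional[Region]:
        # Return a stored selection so it can be shown again, or None if the slot is empty.
        return self.selections.get(slot)

    def next_slot(self) -> int:
        # Move to a fresh slot so the next selection doesn't overwrite this one.
        self.current += 1
        return self.current

    def regions(self) -> List[Region]:
        # Every stored selection, sorted by start time, e.g. for batch export.
        return sorted(self.selections.values())

sel = SelectionSlots()
sel.store(1200.0, 2500.0)
sel.next_slot()
sel.store(7100.0, 6900.0)
print(sel.regions())   # [(1200.0, 2500.0), (6900.0, 7100.0)]
```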

in order to display the non-active selections you could draw vertical markers as seen in the waveform~ helpfile. but i think it is limited to 20.

-110

GVA1994

@Source Audio I'll have a look at that, it sounds like what I need!

@Roman Thilenius About the non-active selections: are you talking about the line message? That's just a single line, right? Or did you mean something else?

Source Audio

Here are a few examples with waveform~:

Max Patch

Are you using a Mac?

GVA1994

Great stuff, thanks so much. Silencing parts after exporting them is a nice way to keep track of what you've already exported.

About silencing all samples below a certain threshold (correct me if I'm wrong): even in seemingly loud parts of the waveform, the sample values will keep dipping below e.g. -60 dB every time the waveform crosses 0, right? So I would be setting indexes to 0 that I shouldn't.

And I'm on Windows.

Source Audio

About silencing: the peek~ output is sent through [abs 0.],
which turns all negative values positive.
So it does not matter whether a sample value is, for example, -0.02 or 0.02;
the output is always 0.02.
If that falls below the threshold, then the sample gets zeroed.
Audio level is measured on both the positive and the negative swing.
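Roughly what a per-sample peek~ → abs → threshold chain does, sketched in Python for illustration (the function name and the -60 dB default are assumptions). Running it on a steady, clearly audible sine tone also shows the concern raised above: samples right next to each zero crossing fall below the threshold and get zeroed even though the tone is loud.

```python
import math

def zero_below_threshold(samples, threshold_db=-60.0):
    """Per-sample gate: zero every sample whose absolute value is below threshold."""
    threshold = 10 ** (threshold_db / 20)   # -60 dB -> 0.001
    # abs() treats -0.02 and 0.02 alike, like [abs 0.] after peek~.
    return [0.0 if abs(x) < threshold else x for x in samples]

# One second of a steady 440 Hz tone at half scale (about -6 dBFS), 44.1 kHz.
sr = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
gated = zero_below_threshold(tone)

# Count samples the gate changed, even though the tone is loud throughout.
changed = sum(1 for a, b in zip(tone, gated) if a != b)
print(changed, "samples zeroed inside a loud tone")
```

The peaks survive untouched; only isolated samples near zero are hit, which is why a plain per-sample gate needs some notion of duration or averaging.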

I asked about the OS because of sox.
On Mac, one has to use [conformpath slash boot]
when sending any path to the shell.

GVA1994

Yep, let me try to explain better what I mean.

First image, showing the full waveform:

[Image: full waveform]

Second image, zoomed in at 7.2 s: even though there's definitely an audible sound here (as can be seen in the first image), some of the sample values are very low (around -60 dB here), and I don't want to zero those, since that would affect my sound, right?

[Image: zoomed in at 7.2 s]

Source Audio

I think I understand what you mean.
The cleanup should have a minimum length set
for what counts as "silence".
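One way to read the minimum-length idea: only zero a stretch of below-threshold samples when it lasts at least some minimum time, so the one-sample dips at zero crossings survive. A Python sketch for illustration; the function name and the 50 ms / -60 dB defaults are assumptions, not values from the thread.

```python
import math

def zero_long_quiet_runs(samples, sample_rate, threshold_db=-60.0, min_len_s=0.05):
    """Zero only stretches of below-threshold samples at least min_len_s long."""
    threshold = 10 ** (threshold_db / 20)
    min_len = max(1, round(min_len_s * sample_rate))
    out = list(samples)
    run_start = None                      # start index of the current quiet run
    # An infinite sentinel is never "quiet", so a run at the very end gets closed too.
    for i, x in enumerate(list(samples) + [math.inf]):
        if abs(x) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_len:  # long enough to count as silence
                out[run_start:i] = [0.0] * (i - run_start)
            run_start = None              # short dips (zero crossings) are kept
    return out

sr = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
hiss = [0.0002 * (-1) ** n for n in range(sr // 5)]   # faint "silence", about -74 dBFS
cleaned = zero_long_quiet_runs(tone + hiss + tone, sr)
```

Both tone sections come back unchanged, while the 200 ms of faint hiss between them is zeroed.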

Roman Thilenius


"the sample values are going to continuously pass below e.g. -60dB as the waveform crosses 0, right?"

you'd probably never differentiate silence from noise sample by sample.

you would take the rms of a 20 ms window and set a cut or marker only when the average over the whole window is zero.
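A sketch of the windowed approach in Python, for illustration only: compute the RMS of consecutive 20 ms windows, treat a window as silent when its RMS is below a threshold, and merge runs of non-silent windows into regions. Using a small dB threshold instead of exactly zero is an assumption here (real recordings rarely contain true digital silence); the function and helper names are hypothetical.

```python
import math

def rms_regions(samples, sample_rate, window_s=0.02, threshold_db=-50.0):
    """Merge consecutive non-silent windows into (start, end) sample ranges."""
    win = max(1, round(window_s * sample_rate))
    threshold = 10 ** (threshold_db / 20)
    regions, start = [], None
    for w in range(0, len(samples), win):
        chunk = samples[w:w + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms >= threshold and start is None:
            start = w                      # first loud window opens a region
        elif rms < threshold and start is not None:
            regions.append((start, w))     # first silent window closes it
            start = None
    if start is not None:                  # file ends inside a sound
        regions.append((start, len(samples)))
    return regions

def silence(n):
    return [0.0] * n

def tone(n, sr=44100):
    return [0.5 * math.sin(2 * math.pi * 440 * k / sr) for k in range(n)]

sr = 44100
signal = silence(sr // 2) + tone(sr * 3 // 10) + silence(sr // 2) + tone(sr // 5) + silence(sr // 10)
print(rms_regions(signal, sr))   # [(22050, 35280), (57330, 66150)]
```

Averaging over a window is what hides the zero crossings: a 20 ms window spans several periods of most audible frequencies, so a sounding note never averages out to silence.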

in a second step you could then find the transient ("attack") in order to trim the remaining silence more tightly.
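For that second step, a very simple stand-in for transient detection: near the start of a region found by the window pass, look for the first sample whose absolute value crosses a threshold and move the cut there, minus a short pre-roll so the attack isn't clipped. This is a plain threshold onset, not a proper transient detector; the name, the -40 dB threshold, the one-window look-back and the 5 ms pre-roll are all assumptions.

```python
def refine_start(samples, region_start, region_end, sample_rate,
                 threshold_db=-40.0, lookback_s=0.02, preroll_s=0.005):
    """Move a region's start to just before its first sample above threshold."""
    threshold = 10 ** (threshold_db / 20)
    # The attack may sit slightly before the window boundary, so look back one window.
    lo = max(0, region_start - round(lookback_s * sample_rate))
    preroll = round(preroll_s * sample_rate)
    for i in range(lo, region_end):
        if abs(samples[i]) >= threshold:
            # Keep a few ms before the attack so the onset isn't clipped.
            return max(lo, i - preroll)
    return region_start   # nothing crosses the threshold: leave the start as is

# 1 s of silence, a 0.5 s burst, 1 s of silence at a toy 1 kHz sample rate.
sig = [0.0] * 1000 + [0.5] * 500 + [0.0] * 1000
print(refine_start(sig, 960, 1500, 1000))    # 995
print(refine_start(sig, 1010, 1500, 1000))   # 995
```

Whether the coarse window boundary lands before or after the actual onset, the refined start ends up at the same spot just ahead of the attack.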

GVA1994

Got it. Thanks guys :)

Source Audio

But in the end, I think one still has to manually check each
separated sound file to do a real cleanup.
That's my experience.