Asking for help implementing file splitter


    Mar 23 2020 | 2:07 pm
    Hi all,
    I've done a ton of recordings with sounds separated by (relative) silence. I've been working on a patch to split these files up into the individual sounds they contain. I've already concluded that completely automatic silence detection is a bridge too far for me.
    Instead, I'm looking to create a patch in which the user can quickly identify and select the different sounds from within a waveform~-like UI. These sounds will then be sent to a silence trimmer, so extracting the sounds doesn't have to be very accurate.
    I've been playing around with waveform~ (which seems like the only way to go?). However, the user can only make one selection at a time with that object. Ideally I'd like to have a patch in which the user can easily zoom in/out on different areas of the file, and make multiple selections to extract.
    I'd love to hear any approaches to better meet my project requirements. Thanks a lot :)

    • Mar 24 2020 | 12:36 pm
      There are many options here, but there is no multi-selection in waveform~. You could let users copy a selection to another buffer and export it. One could optionally remove the exported part from the initial buffer to avoid confusion.
      Or you could let users actually silence the "silent" parts and use sox and shell to export the slices into individual files. Or ... mark regions to export later, in Max or outside.
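A minimal sketch of the "mark regions, export later outside Max" idea, assuming the regions have already been collected as (start, end) pairs in seconds and the source is an uncompressed WAV file. The function name `export_regions` and the `slice_{:03d}.wav` naming pattern are made up for illustration; only Python's standard `wave` module is used.

```python
import wave

def export_regions(src_path, regions, out_pattern="slice_{:03d}.wav"):
    """Write each (start_s, end_s) region of a WAV file to its own file."""
    written = []
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start_s, end_s) in enumerate(regions, start=1):
            # Convert seconds to frame indexes and clamp them to the file length.
            start = max(0, min(total, int(start_s * rate)))
            end = max(start, min(total, int(end_s * rate)))
            src.setpos(start)
            frames = src.readframes(end - start)
            out_path = out_pattern.format(i)
            with wave.open(out_path, "wb") as dst:
                # Copy the audio format of the source file.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(rate)
                dst.writeframes(frames)
            written.append(out_path)
    return written

# Hypothetical usage: two hand-marked regions, in seconds.
# export_regions("recording.wav", [(0.8, 2.1), (7.0, 7.9)])
```

Because the slices go to a silence trimmer afterwards, the regions only need to be roughly right, which is why the sketch does no edge detection of its own.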
    • Mar 25 2020 | 2:01 pm
      Thanks for the suggestions. I've put it on hold for now since it seems like a whole lot of work. Maybe I'll come back to it later. Thanks :)
    • Mar 25 2020 | 2:28 pm
      If all you need is to split the files, try sox. It does it very well. https://digitalcardboard.com/blog/2009/08/25/the-sox-of-silence/
      To shorten it:
      sox XXX.wav XXX.wav silence 1 0.1 -50d 1 0.1 -50d : newfile : restart
      This command would export individual files from XXX.wav, removing silence below -50 dB and splitting the audio at that position. 0.1 means the audio must stay above the -50 dB threshold for at least 0.1 seconds to be recognised as such. One can tweak the params as needed. One would end up with XXX_001.wav, XXX_002.wav etc. for as many slices as detected.
      -------
      (I prefer a dB threshold over a percentage, which is a linear scale. 1% as used in the tutorial would be 1/100 = 0.01 = -40 dB.)
      If used in Max with shell, sox would have to be called using its full path, and the audio file needs its full path as well. shell prefers slash-based paths.
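A small sketch of the two points above, assuming you would rather drive sox from a script than from Max. `amplitude_to_db` checks the 1% = -40 dB conversion, and `sox_split_args` only builds the argument list for the command quoted in the post. Both function names and the example paths are hypothetical; the sox arguments themselves are copied from the command above.

```python
import math

def amplitude_to_db(ratio):
    """Convert a linear amplitude ratio (1.0 = full scale) to decibels."""
    return 20.0 * math.log10(ratio)

def sox_split_args(sox_path, in_path, out_path, threshold_db=-50, min_dur=0.1):
    """Build the argument list for the silence-splitting sox command."""
    level = f"{threshold_db}d"  # trailing "d" tells sox the threshold is in dB
    return [
        sox_path, in_path, out_path,
        "silence", "1", str(min_dur), level, "1", str(min_dur), level,
        ":", "newfile", ":", "restart",
    ]

# 1% of full scale expressed in dB, about -40
print(amplitude_to_db(0.01))

# Hypothetical paths; pass the list to subprocess.run() to actually call sox.
print(sox_split_args("/usr/local/bin/sox", "/audio/XXX.wav", "/audio/XXX.wav"))
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.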
    • Mar 25 2020 | 2:31 pm
      i have recently implemented a multi-selection system with waveform~. it basically stores the selection you make somewhere else (from where it can be recalled) and then you can go to the next slot and set a new selection. mine is more complicated, but you can basically do that with 2 [flonum] and a [preset] object. in order to display the non-active selections you could draw vertical markers as seen in the waveform~ helpfile, but i think that is limited to 20. -110
    • Mar 25 2020 | 4:23 pm
      @SOURCE AUDIO I'll have a look at that, sounds like what I need! @Roman Thilenius, about the non-active selections: are you talking about the line message? That's just a single line, right? Or did you mean something else?
    • Mar 25 2020 | 4:36 pm
      Here are a few examples with waveform~.
      Are you using a Mac?
    • Mar 26 2020 | 3:18 pm
      Great stuff, thanks so much. The idea to silence certain parts after rendering them is nice to keep track of what you've already exported.
      About silencing all samples below a certain threshold (correct me if I'm wrong): even in seemingly loud parts of the waveform, the sample values are going to continuously pass below e.g. -60 dB as the waveform crosses 0, right? So then I would be setting certain indexes to 0 that I shouldn't.
      And I'm on Windows.
    • Mar 26 2020 | 3:50 pm
      Question about silencing: the peek~ output is sent through [abs 0.], which flips all negative values to positive. So it does not matter whether the sample value is, for example, -0.02 or 0.02; the output is always 0.02. If that falls below the threshold, then the samples get zeroed. The audio level is measured on both the positive and negative swing.
      I asked about the OS because of sox. On Mac one has to use [conformpath slash boot] when sending any path to shell.
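A minimal sketch of the abs-and-threshold step described above, written in Python rather than as a Max patch. The function names are made up for illustration, and the samples are assumed to be floats in the -1..1 range, as read from a buffer.

```python
def db_to_amplitude(db):
    """Convert a level in dB (0 dB = full scale) to a linear amplitude."""
    return 10.0 ** (db / 20.0)

def zero_below_threshold(samples, threshold):
    """Zero every sample whose absolute value is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in samples]

threshold = db_to_amplitude(-60.0)  # about 0.001
samples = [0.5, 0.0005, -0.5, -0.02, 0.02]
print(zero_below_threshold(samples, threshold))
# → [0.5, 0.0, -0.5, -0.02, 0.02]
```

The sign of a sample makes no difference, so -0.02 and 0.02 are both kept. The isolated 0.0005 between two loud samples is zeroed as well, because this version looks at one sample at a time.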
    • Mar 26 2020 | 4:02 pm
      Yep, let me try and explain better what I mean..
      First image showing full waveform
      [Image: full waveform]
      Second image, zoomed in at 7.2 s: even though there's definitely an audible sound here (as can be seen in the first image), some of the sample values are going to be at a very low level (around -60 dB here), and I don't want to zero those as it's going to affect my sound, right?
      [Image: zoom in at 7.2 s]
    • Mar 26 2020 | 4:47 pm
      I think I understand what you mean. The cleanup should have a minimum length set, so that only stretches that stay below the threshole for that long are treated as "silence".
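One way to sketch the minimum-length idea, assuming the samples are floats in a plain Python list: a stretch is only zeroed when at least `min_len` consecutive samples stay below the threshold. The function name and parameters are hypothetical, not taken from the patch discussed in the thread.

```python
def zero_long_quiet_runs(samples, threshold, min_len):
    """Zero only runs of at least min_len consecutive samples below threshold."""
    out = list(samples)
    run_start = None
    # A None sentinel at the end closes a quiet run that reaches the last sample.
    for i, s in enumerate(list(samples) + [None]):
        quiet = s is not None and abs(s) < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_len:
                out[run_start:i] = [0.0] * (i - run_start)
            run_start = None
    return out

samples = [0.5, 0.0005, -0.5, 0.0004, 0.0003, 0.0002, 0.4]
print(zero_long_quiet_runs(samples, threshold=0.001, min_len=3))
# → [0.5, 0.0005, -0.5, 0.0, 0.0, 0.0, 0.4]
```

The single low sample at a zero crossing survives, while the three-sample quiet stretch is zeroed. On real audio `min_len` would be far larger, for example a few thousand samples at 44.1 kHz.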
    • Mar 26 2020 | 6:38 pm
      "the sample values are going to continuously pass below e.g. -60dB as the waveform crosses 0, right?" you'd probably never differentiate silence from noise sample by sample. you would get the rms of a 20 ms window - and perform a cut or marker only when the average of the whole window is zero. in a second step you could then find the transient ("attack") in order to cut the silence a bit more.
    • Mar 26 2020 | 6:47 pm
      Got it. Thanks guys :)
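The windowed-RMS suggestion above can be sketched as follows. This is an illustration under assumptions, not the method from anyone's patch: the function name is made up, the windows do not overlap, and a window counts as silent when its RMS falls below a dB threshold (an assumed relaxation of "the average of the whole window is zero", since real recordings rarely reach exact zero).

```python
import math

def quiet_windows(samples, sample_rate, window_ms=20.0, threshold_db=-60.0):
    """Return one flag per window: True if the window's RMS is below threshold."""
    win = max(1, int(sample_rate * window_ms / 1000.0))
    threshold = 10.0 ** (threshold_db / 20.0)
    flags = []
    for start in range(0, len(samples), win):
        block = samples[start:start + win]
        rms = math.sqrt(sum(s * s for s in block) / len(block))
        flags.append(rms < threshold)
    return flags

# One 20 ms window of a 50 Hz sine (with zero crossings), then 20 ms of silence.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * n / rate) for n in range(20)]
print(quiet_windows(tone + [0.0] * 20, rate))
# → [False, True]
```

The tone window contains samples at or near zero, but its RMS is well above the threshold, so it is not flagged. Cut points or markers would then go where the flags change from True to False and back.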
    • Mar 27 2020 | 7:57 am
      But in the end, I think one still has to manually check each separated sound file to do a real cleanup. That's my experience.