Cutting up an audio file automatically for help in transcribing speech
I normally work with Jitter, but I have a programming puzzle and I thought Max/MSP might be a good solution…
I’m trying to create a Max patch to cut up an audio file into a series of ~30 second chunks to help me transcribe it more easily. Since I will need to hear the full words, I can’t split the audio file in the middle of one. Does anybody have any ideas on this?
Is there a decent bit of silence (second or so) in between the 30 second chunks?
If the silence between the chunks is noticeably longer than the silence between the words, then you could have Max detect when the volume drops below a certain level; then, if that "silence" lasts a certain length of time, cut the file there (end the recording to a file, or switch the output to another buffer).
I’m in class so this is just an outline; somebody else can explain the technicalities to you.
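The threshold-plus-duration idea outlined above can be sketched outside of Max to make the logic concrete. This is only a rough illustration in Python, not a Max patch: the function name `find_silences`, the default threshold of 0.02, and the 0.3 second minimum are all assumptions, and the samples are assumed to be floats in the range -1.0 to 1.0.

```python
def find_silences(samples, sample_rate, threshold=0.02, min_silence_s=0.3):
    """Return (start_s, end_s) pairs where the absolute sample value stays
    below `threshold` for at least `min_silence_s` seconds.

    All parameter names and defaults are illustrative assumptions.
    """
    min_len = max(1, round(min_silence_s * sample_rate))
    silences = []
    run_start = None  # index where the current quiet run began, if any

    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        else:
            # The quiet run just ended; keep it only if it was long enough.
            if run_start is not None and i - run_start >= min_len:
                silences.append((run_start / sample_rate, i / sample_rate))
            run_start = None

    # Handle a quiet run that lasts until the end of the audio.
    if run_start is not None and len(samples) - run_start >= min_len:
        silences.append((run_start / sample_rate, len(samples) / sample_rate))
    return silences
```

In practice the threshold would probably need tuning against the noise floor of the recording, and comparing a smoothed level (for example a short running average) rather than raw samples would likely be more robust.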
The audio itself isn’t organized in any way. The 30 second cut positions could land anywhere, which is OK to start with, but I’d like it to scan ahead until there is a pause between words. That way I could give one chunk of audio to one person and a different chunk to another, then later put their transcriptions together without losing a word in the middle. I don’t really know how to do this in Max though, so if anyone could get me started I would really appreciate it.
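One way to picture the "scan until there is a pause" step: given a list of detected pauses, take the first pause that falls at least 30 seconds after the previous cut and cut in the middle of it. The sketch below is Python rather than Max, and the names `choose_cuts` and `chunks_from_cuts` are hypothetical; it assumes the pauses are already available as (start, end) times in seconds, sorted in time order.

```python
def choose_cuts(silences, target_s=30.0):
    """Pick cut times (seconds) from sorted (start_s, end_s) pauses.

    A cut is placed at the midpoint of the first pause that lies at least
    `target_s` seconds after the previous cut, so chunks run roughly
    `target_s` seconds or a little longer and never end mid-word.
    """
    cuts = []
    last_cut = 0.0
    for start, end in silences:
        mid = (start + end) / 2
        if mid - last_cut >= target_s:
            cuts.append(mid)
            last_cut = mid
    return cuts


def chunks_from_cuts(cuts, total_s):
    """Turn cut times into (start_s, end_s) chunks covering the whole file."""
    edges = [0.0] + list(cuts) + [total_s]
    return list(zip(edges[:-1], edges[1:]))
```

For example, with pauses at (10, 11), (31, 32), (40, 41), (65, 67) and (70, 71) seconds, this picks cuts at 31.5 and 66.0 seconds, skipping the pauses that come too soon after the previous cut.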
Not sure how you’re playing back the file and cutting it, but there may be 3rd party externals that help you detect when amplitude drops below a certain level. Actually, now that I look again, there’s also the thresh~ object.
Otherwise (this is probably not the best solution, and there must be many others), you can use a simple setup with an object to open/close a gate, which would trigger a file-recording process on and off. You’ll have to fine-tune this, because your file may not have absolute silence in between words, so you may need to use something like the downward-expander/gate within omx.comp~ (see the help file) to cut out low-level noise. You may also want to increase or decrease the amount of timed silence it takes to cut the recording (in case there’s a bit of silence within a single word, or in case there’s not that much silence between words). In addition, you’ll need to sync the on/off of the playback to the on/off of the recording, so that you can open a new file every time you make a cut and start a new recording without losing information from the playback. Sorry for how convoluted my explanation is, but hopefully this gets you started:
----------begin_max5_patcher---------- 1464.3ocyZszbaaCD9r7uBTcoIyXKS7fuZi6L4RO1LSZtzoSlNPjPxrkhvCI TbbxD+au.Xojsrkf.snozAK4E.BZ2u8aWrKn99YiFOU9UQyXzuf9aznQe+rQ irCYFXTq7nwK3eMqj2XW13Jwsxo+63ygoThuprCOmqDqFTtTUJTp6tQ.a73w nO2N0LYkpo3a1IvjIAsCWja2D8FeAKc01bCWkccQ07+oVjofchjZ9LHRn8MJ yJDLIX89WsbQQk9K2pqjG8kVwWX+RG+95Bd43GVNnr10iMC9iyNy7x4dBGYx EKDUpmgGjInObinB86EkBzmjnOJxj04HdUNJqrH6+PJ4745oTRTihWqP01En M2K9sUaVYQkdvkU1cj5OBl3.AYAwFPKBadESsBrjcgf3NhfAu.DTuCSE06j6 TX.Wz3o7p4chFE6BDRsLGL7VXfaZTWAAxKAD1dTEOWSNTKaPM2VnshcGfcN. TcAfhbAPDKlfiol2Ro8abVOBPJT.JXebmthLgNPFJAR5jl3SFniH0oQThH6D YrgSm2wLyLOvkXaFEZzIMiA22LFpCjAm.4WnomJLloKUJYkatwZs6wlIwCyD Gux.8wLO7Cf2gaVjOWb+do+c9TErKDHLDNUwR9YmBoFdUNZk5pBMFNDX6Qm3 GsZKrRTaxF3HOYKDs5uNlWfl3QFSp8z03T2HUXGQJ1K.ofRRchEaKs.Md+VI NI8jHsvBgpVpokAdl8yCebjG9Xr0tiBO98q7R8wtR7whssikhGVW7BQSCet3 Y9XotuqdoUTZfydHrdSngTnOJbxwzytq5.mAcVt6CCaJlWoUjN.Ljz8yFvL7 CMZ1em.zm8n+tK9K4xetrDcSsbJeZ4cna4UJDGoJVnOYPcMWgtsPOurROGzl dlt1Is8h3yLGdvQYhZEunBwWXNOAImgZzc7WkIPEMnbgRCKh7Iasidh+3syN 5iAHl.EdkPWUl4wqkdm.NJiWowVgFyTJd10lq.giL2eD5MxZ6+b+aQ2dcgdJ K5KJzqVi0kkxa0iqm+KZQsOKSm..cmdGul+E6MoncNH9x7BIZl1IzfzamcGp jJ3i+F7kAWYxObYVorQ71s5WB82u37RFvPayQsG.ZSWDGdL8KhukyyVmH3wF hqV6HTHf1Z.r0u5NS2qGIBOA8ok0UnOTgduwW+v8k4gCyYiZsGVCUwi64DW8 IDPagf+TSYy0.wkeX1rNACNajyZ3IsWPH6zEEd2EXzU.F7qnfq9Co8e6.J3p lFBKAH8QvaAGebnTj2451bkfByrInRHmDUlay62i0n3pr71K9EGXI4L7wur7 c0j5cYkh6QLVvjdDZbc+lDrkvCW7aL4D.Yr2gwdYF5lymUJ4OJJXgLW3QYV5 OuMfbqW2iGbHHAQ3o6Ed9SWcOpWoOXOPER5qcjkcGsEt8jGeoUqLiuIT0HWV msxnaiBPOnW4hFcc8bUgr5QqQmDAgWulqKxyE1oWgJKJxuQpy81pBnOuU2lu ZTjGZjoRstnRj.5jPiCAJFE5QeSoDnL0Xy5rcvNLFQPuZDXHUklwEBBGnYD6 I6HXvXGlaiX+Ha5vpS3SOcxGBHdP0He.I1v527AjL2d8PpS9DxQG3XtDezoN xuoTa1LB73hXsMZtoDEpLU2QZXqzPvAwurzxPyQ3H1ZoCTYsJBdeZKYPoBrH ezo3gUm7Jjoi5TH3UaaDflBO3lMjvIfDPjsRCQUCoGRUCrD7ZoCE1o9l8j5u 1ZdBYFXG57hE.OqnMkHvO5j3D6JIGdwZOIFZGjZ5vRpY9nSgCqNE5QvOsioO oLqejhAuJbiaaJ0d4wrfjmKA08xBS2k.P3SO7yN75TPyuayArWIVpO5TGqph FfgXPH8VaCFaUp8L73n9IaB1GzsS49fJGhZ+UYE9H9yJIF7S0p0lMR8QTRPO 
aIsHMk.ADIP3wFRs2U6ZK4o4E0B+3r+2jt8Ta -----------end_max5_patcher-----------
Is there a way to start recording automatically only when a certain threshold is reached?
Is there a way to automatically cut the silence from the beginning and the end of a live-recorded file?
Hi, IcedDragon, it sounds like you are asking for the same basic thing, so you could use the same patch I posted but rework it so that recording starts automatically when the signal is above 0. This will cause an abrupt cut, though, with an audible click/pop at the beginning and end of the recording, unless you also synchronize a fade-in/fade-out. You may also have to work in a noise gate to zero out any background noise (audience, etc.) before the live recording. I’m not sure exactly what you have in mind, since it would almost be better to start and stop the recording manually, but this patch could help you get started as well. It is just a reworking of the patch I posted before, and you can rework it further to add the fade-in/fade-out. Again, there are probably better ways to do this, but this is the quickest/simplest way I could think of:
----------begin_max5_patcher---------- 883.3ocyYkraaCCD8r8WAqN63xUszlTfboGa.Z6shfBYIZEUHQZXIglEj7sW JRq3kPmnJao3KzgCojm2adyvgNOLdjyL4s7BGvm.+BLZzCiGMRap1vnUyG4j GdaTVXgdaNB9eky9iyDyRk7aK0lKAvFaowZKpccFtwlrpLiWVd2Bt46xIUnd tqWs5bonTDlqWy4xkogYMO2hvxnaREI+dIOpz7nn.1T3D.lY9fpGgSga81JR uW+1PX0xFyhp7TgxIzv.s1nw0LVqM933w0CSZIeLqprTJrfcxdw9rPQxZv+J PjPZ.2NPrmvRjLOmqBL6FbO+L7TvUK3BvWSy3feJAemGIWF2rwrTgZdkPua7 KoBpuSWhzLpeMxQPMa3qGIziRjF1A1YuJejknOkzGReOiTfQOsk9T7An8WgQ BdX096I5xiS3OYChnWGhS1EpcJPiP0ePOlAZbG3lRYRRF2VMNT6T4VfH02yj b6Nrw4bdQQXB+EAZop7lMDB2KB6XrkgfFfqwOwvB9+mwVbeq6KluTWh2l1GG rWNoHMQnPe2XlFIASKIbQuyk232GGFYE+zWADXpItpCxTVyncYM9nc3z9N5F ME7ypkBvUBvkUwoxy9hM.Q5z4yHRfFdF8L5XFuNlT.YEE7CUGJwJh3iWMetc Z.2IZ.qkqA5QDgdhxBmeFBbggC9L.dw2j5+zFKf5HKnaVE4gL4vv2YdHiaCb dc97JDS2LZf2IQaIIgkVaKwsepLigAlbbsHm1sL8d+Lqn6hx3OoZWBN0B0v5 IpgnYCes5vC+dyLU4y3Ks1v5aAeUGqyyjgajEjKi4a5dcRxXpGvbeAwn9Z04 1OSIcmovGOMzGt3IfU4CpeyrL832aYV5Wm9d567i8ncoZ6aSUExpkQM.roSe vZuJlWTlJBKSU2+a8lpaLbiMcSZbLWudiWmmFuPppztxI.WaMv0ZepMtzf5Q 3V3Q6vjuoKggzoLknvyT60yzfwVyPPccGppGD1pYGHPpudK.8VHYP4VJoGHW jWfgb8Mzoo98Vy5AxsMxD5.ytsI+lNrYSt8PIGr5p10gQh+lAUh+AGTaE+g5 ly5xLGkF77rgvYO4h0ng8zE3o2IddsvkbO8xQoGRNJAt5l0aNCatFFy0T89f SHZMLPGWXX9G73i0vfbvvf0R8Api5C0jGG+O1J6YI. -----------end_max5_patcher-----------
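The fade-in/fade-out mentioned above is not part of the patch. As a rough idea of what it would do to the samples, here is a minimal Python sketch (not Max) of a linear fade applied to both ends of a recorded segment. The function name `apply_fades` and the choice of a linear ramp are assumptions; other fade curves would work as well.

```python
def apply_fades(samples, fade_len):
    """Return a copy of `samples` with a linear fade-in over the first
    `fade_len` samples and a mirrored linear fade-out over the last ones.

    Ramping the gain from 0 avoids the click caused by starting or
    stopping a recording on a non-zero sample value.
    """
    n = len(samples)
    fade_len = min(fade_len, n // 2)  # keep the two fades from overlapping
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len           # 0.0 at the edge, rising toward 1.0
        out[i] *= gain                # fade-in at the start
        out[n - 1 - i] *= gain        # fade-out at the end
    return out
```

A fade of a few milliseconds (a few hundred samples at 44.1 kHz) is usually enough to hide the click without audibly softening the attack, though for guitar or bass the exact length would need to be tuned by ear.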
If, however, you are both looking for non-realtime solutions (non-realtime operations over audio recorded into a buffer~, for example), I would look into MXJ as a possible answer. This post could help you learn more about that:
Well, there’s another option for you (I haven’t read the whole thread, I apologize). It’s not pretty, but it’d be fast and wouldn’t harm your end goal at all: instead of cutting at 30 second intervals, overlap them. So cut one is 0:00 to 0:30, cut two is 0:25 to 0:55, cut three is 0:50 to 1:20, etc.
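The overlap scheme is just arithmetic on start times: each chunk is 30 seconds long and starts 25 seconds after the previous one, giving a 5 second overlap. A small Python sketch of that calculation follows; the function name `overlapping_chunks` and its parameters are made up for illustration.

```python
def overlapping_chunks(total_s, chunk_s=30, overlap_s=5):
    """Return (start_s, end_s) pairs of `chunk_s`-second chunks where each
    chunk begins `overlap_s` seconds before the previous one ends."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk length")
    step = chunk_s - overlap_s
    chunks = []
    start = 0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:   # this chunk already reaches the end of the file
            break
        start += step
    return chunks


# For an 80 second file this gives 0:00-0:30, 0:25-0:55 and 0:50-1:20.
print(overlapping_chunks(80))
```

As long as no single word is longer than the overlap, every word appears whole in at least one chunk, and the transcribers' texts can be stitched together by matching the repeated words.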
Yes, it’s a really easy way to do it; it’s only the main core. When I was asking about starting and stopping automatically, I was talking about doing it with instruments like a guitar or a bass, not voice. But I have another question, related to what happens after recording, which is cutting: is there an easy way to cut the silence in the recorded file automatically, or at least everything below a certain threshold?
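For the non-realtime case, trimming leading and trailing silence from an already recorded buffer comes down to finding the first and last samples at or above the threshold. Here is a minimal Python sketch of that idea (not a Max patch); the name `trim_silence` and the default threshold of 0.02 are assumptions, and samples are assumed to be floats between -1.0 and 1.0.

```python
def trim_silence(samples, threshold=0.02):
    """Return `samples` without the leading and trailing stretches whose
    absolute value stays below `threshold`.

    Quiet passages in the middle of the recording are left untouched.
    """
    start = 0
    end = len(samples)
    # Skip quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Skip quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

For a plucked instrument it may be worth keeping a few milliseconds before the first loud sample so the attack is not clipped, and a longer tail after the last one so the decay is not cut short.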