Unwanted delay between concatenated audio samples
I’m attempting to make a speech synthesis patch which works by concatenating very short (mostly < 0.5 second) samples of speech together to create a variety of pre-determined words and phrases.
I’ve come across some unwanted popping in between samples – strange, as I have carefully edited the samples to all begin and end at zero amplitude (a zero crossing) and they concatenate cleanly in Adobe Audition. I believe the pop is down to an unwanted delay caused by the time Max takes to send the bang to the groove~ in question.
I initially had two groove~ objects (one for each sample), the first of which uses the "delta~" -> "<=~ 0" -> "edge~" chain demonstrated in the groove~ help file, which outputs a bang when the audio file finishes playing. This bang was the cue for the second player to start playing. However, when the popping occurred I tried a new counter system where a "metro" -> "counter" chain creates an independent time base from which to trigger each sample (this is the method used in the example patcher below).
I’m still getting the popping sound. I thought that if the use of timers and delays is the issue, then pre-concatenating the audio (i.e. combining the samples before hearing the results) might be a possible solution, but I’m struggling to find an object that can do this for me.
Has anyone had any similar issues or any ideas as to where the problem might lie?
----------begin_max5_patcher---------- 1667.3ocya00aaaCE84Tf9efvHOMjEH9M0.JFReZ6483PwfrMsi1jkLjjWaZ w5u8IwqnhqSr9JTezhFURKae3QGdt2KIy2d+6tY05junyVg9Ezeht4luUzyM l9J64FaG2r5PvW1DEjYtwUaRNbPGmu5tpWLW+kbyKjoyyPgw6R9NZWRJ5g6P 4OpiQY53sP+0uknvX8ljSwl2Gw16wf7MOFFu+uR0axALoTx68tCQ7TkWvbux KT18dnOYeW6RhyyB+pt7MfIEudU+wmNDFGU.IyKbVuImxsc6c9GRbvAyGxpG RCChpwZ3VSmIq+6elpVY57+d+6JuVb4t2Nq8wf38Ynj3M5J9J+TZbAekmoi1 gR1sq+rljIJ4INVrHXM5HvZ+VxmQQIw6QgYkrFZWXj9WQ+dgHK8.zSVdPZNJ O7ftfDQer+rnuhXzd.8M2rHweDXw+3XvFM5XTvSYn8oII+qtbVaXwcuMLHWG 8TwD4nnjOq2hV+j8NtBURafJwdLyDXp7Lprrq4gJkCmJK9hWqSW0jrgwLCUE 0LGzC7ub0PkzugJVU2K7gj+zQMfyUgkJBzp0E9OqPeZ3Lh9yEeSuTas9ztc5 zuelb4UEFfAu.aHMt+7xV7qyV6hRBFU95mJHi6aVXYTTTvUmxGHUQdcpB2Op R1FS8FXn0mxyShahJTDHs.J1RBWREcQcfekwk35iqw5Ae3Nzsg3O7gaCIPB. lunlBuSYf+IDXhPmUkfpAovagtxS1uOR2HOPLAnoTkqUAJZy1lifJvDGUmh7 J9K1q7OMKALCcFCCR.4.k.7WmSX8zLvu0nL1+MRr2AcdZBpX76gaVwXBvP8M rmblsPUyfWio5rGZzaU.9IbHWML8dtKCIq5o6RWBIGElYtV2QW9OioUNsiN4 9d9Om1Cl33bE6odDSGGi7qQV+i9oNvMDQECMLUnmSL3ns5u8xKuAF6fNKKXu 9kTVilaTvbCyg7CIkLFVMSlaD73HltJ0zX3Rpe05G4uDnF5bkvDEh3g84tNg IJYzRXZXSFDdL3INeA7D2eIMWPHo.ynV.LyT6RjoyadAIXP00DLna.ai4hcn iD8zdA11XIXoycJlgJryB2+8N3KRHlponskFF9dtOGqjiXtXzFrLJFMwEuqQ fmfE3skkriJgYHBybCgXf4rRcxJ1w7ZkmtyELViFJMVNkjCknqDms.3ylcq+ bYnXiGicdpGDx3YnzdJUhpcGxm47wEaxWCp17IEXnjLb0pnL+9jD9b5S9PyJ CXwoElKB9r5SRDSiOo9qaC1znFhwgc5AzPLt8mWYBCokc76byM1pQaSrdn4M wRZVOQNSt.1DKxDrGVWMbXxQcbmnJHwxJyy4JbHahq9XSjNnw8NFiI7kC8vm X5IUeLp77HbKtYNRpNii.i04KiJ7jSRAaaigH1y+yRPEgIKSYT0IOfqXKAYj nOjD.Qywu4xytmYjV9BWPcYImR2X+Dsq1O5rA6VcVdXbPdXQd7OeWkIWe9c8 X31s53eHB7gvsGSJRHsBI9EbH+Njun7mBX2iJazzS5NC5Ng4KGYsiYmfsxcp BIZCbxYAaW9X7JDWewV0iWLX2HTj5VtA0c5wMkLKTZ4IqpcvUd3HP3dM+ga3 TlvjEopHkRaKm.aYmDBp4gREcBb74AbcZJzLoEwKWro5jhiJ5K3jLXdhOrC5 X9KaUTrg87AYaMgCIB8MNjpNkr+PKLCJa1SBVs9S5PRMzgDGaPuOt5wy4snD S57RJt7NMsbiSmemMn64Px.TlzLHrOyjdNB1cxCT0aOPE0jPFAhtnjj5VtwA pSjc+mjK7L3DJjwNFvtZhb2Pc+I6ZXajHVxF6JIBsSvtbqQlA+dJdb.GiX7z I1SW+yslvbMG.rAmZBjGWMrkNZd2k.5JoLw5Krq.JiSfzOo0slZXiGBrqrnU 
90sbiHQLNvV.9ELn.JKrUjoG1dCA1R54hDoyhozorr5cLkpxTIXJr+0d0slv XJjAGJjQNG1ksbiFgORrMQA.0HNpgsqzH7kagOrkKzncxPffmGv0oBsuLGno ZgzviyRRYqQUUkcCqtkSfc27jXCc8+feUGEvgz1zZBW0xYRHzso28tzgJRj. GCQq6IVLkBg9WlekPfBGxGao7drIUHf8Gp9kZH6pHrlVSWBLlkofNjDXfeu+ Tjma4lkBpSUBI68RDaUERi2lDL7Lsbyx8n5La2SXa.JC98Pwt3OjWSiT1Qwk +GHun7BH -----------end_max5_patcher-----------
i have not looked in depth but: your patch does not work because the triggering happens in the control domain. you need to work only at sample rate. you need to double buffer and/or use the play~ object instead; with a phasor~. if you are never changing tempo or pitch you may find wave~ / index~ will work well too.
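To put a rough number on the control-domain point, here is a small sketch. It is plain Python rather than Max code, and it assumes (as a simplification, not a statement about Max's scheduler) that a control-rate trigger can only take effect at the start of the next audio block. The sample rate, block size, and the function name `start_delay_in_samples` are made up for illustration.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz
BLOCK_SIZE = 64      # assumed audio block (signal vector) size in samples


def start_delay_in_samples(requested_sample, block_size):
    """Delay incurred if playback can only start on a block boundary."""
    # Round the requested start position up to the next multiple of block_size.
    actual_start = math.ceil(requested_sample / block_size) * block_size
    return actual_start - requested_sample


# Suppose the first segment ends exactly at sample 10000 and the second
# segment should start there.
gap = start_delay_in_samples(10000, BLOCK_SIZE)
gap_ms = 1000.0 * gap / SAMPLE_RATE

print(gap, "samples late, about", round(gap_ms, 2), "ms")
```

Under these assumptions the second segment starts 48 samples late, which is a little over 1 ms of unintended silence between the two segments. Driving playback from a signal-rate position avoids this, because the hand-over can happen on any sample rather than only on block boundaries.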
p.s. – if you are dealing with a lot of files into ram then just use polybuffer~ object.
Thanks for the response,
I’ve switched over to the play~ object, using a ‘0’ message and line~ to trigger the first play~. I don’t quite understand what you mean by double buffer but I am in fact using polybuffer~ in the real patch – I modified that example patcher as I wasn’t sure on the rules regarding uploading other people’s work.
I’m struggling to find a way to use the play~ output to trigger the second play~, is this where the phasor~ comes in? I notice the phasor~ help file says ‘can be used as an audio signal or a sample-accurate timing/control signal’ but I can’t seem to find any demonstration of this in the documentation. Of course, this is probably due to my lack of experience with the object.
hey. i looked at this. actually, for what you are trying to do a single buffer with all your audio and accessing the sample positions for playbacks for each segment would be a much much better way to do it. you can make a single buffer of all your files using mxj buff.op but just as well do it in an audio editor if you do not need that in realtime. or look at FTM. triggering play~ with line is no good because you are coming in and out of control / signal domain. a counter system would be better if you are wanting separate buffers. i’ll look into it again if i get time tomorrow or so. are you wanting one-shot plays of each segment of audio? are you wanting just one sound then the other and stop? or constantly playing between two buffers and stop when you call it? all sorts of different ways of doing all this. all this should be easier for you, but that is max sometimes unfortunately.
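As a rough picture of the single-buffer idea, the sketch below joins several segments into one buffer and keeps a table of start and end positions for each. It is plain Python, not Max or buff.op code, and the names `build_single_buffer` and `play_sequence` as well as the tiny sample lists are placeholders invented for the example.

```python
def build_single_buffer(segments):
    """Join named sample lists into one buffer; record (start, end) per name."""
    buffer = []
    table = {}
    for name, samples in segments:
        start = len(buffer)
        buffer.extend(samples)
        table[name] = (start, len(buffer))  # end index is exclusive
    return buffer, table


def play_sequence(buffer, table, names):
    """Read the listed segments back to back, with no gap between them."""
    out = []
    for name in names:
        start, end = table[name]
        out.extend(buffer[start:end])
    return out


# Placeholder audio: each list stands in for one short recording.
segments = [("a", [0.0, 0.5, 0.0]), ("b", [0.0, -0.25, -0.5, 0.0])]
buffer, table = build_single_buffer(segments)
output = play_sequence(buffer, table, ["b", "a"])
```

Because every segment is addressed by sample position inside one buffer, moving from one segment to the next is only a change of read position, so no control-rate trigger is involved at the join.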
Just to clarify, do you mean rendering all of the audio into a single file, and loading it into one buffer~?
To answer your questions, I only need one-shot plays of each segment, which stop at the end of the chain. Thinking about it, I see no reason why the single buffer technique would be at a disadvantage compared with multiple buffers. In fact, the number of audio segments needed per utterance will vary continuously, so using multiple buffers would probably prove more problematic.
You mention buff.op – if this can single buffer all of my files, does that mean I could use it to pick and choose the samples I need to make each phrase?
In case this is confusing, here is a crude example of what I am asking:
Let’s suppose I want to synthesise the word ‘Speech’. I have recordings of its component phonemes (/s/, /p/, /i:/ and /ʧ/). Could buff.op take these samples and concatenate them for me?
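To make the question concrete, this is the operation being asked about, written as a minimal Python sketch. It says nothing about whether buff.op can do this. The sample values are placeholders standing in for the recorded phonemes, and `PHONEME_SAMPLES`, `LEXICON` and `synthesise` are hypothetical names.

```python
# Placeholder sample data: in practice each list would hold a recorded phoneme.
PHONEME_SAMPLES = {
    "s": [0.0, 0.1, 0.0],
    "p": [0.0, 0.3, -0.3, 0.0],
    "i:": [0.0, 0.2, 0.4, 0.2, 0.0],
    "ʧ": [0.0, -0.1, 0.0],
}

# Hypothetical lexicon mapping each word to its phoneme sequence.
LEXICON = {"speech": ["s", "p", "i:", "ʧ"]}


def synthesise(word):
    """Concatenate the phoneme recordings for a word, in order."""
    out = []
    for phoneme in LEXICON[word]:
        out.extend(PHONEME_SAMPLES[phoneme])
    return out


speech = synthesise("speech")
print(len(speech), "samples")
```

The result is one continuous run of samples whose length is the sum of the four phoneme lengths, which could then be played as a single one-shot.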
I shall look into FTM this evening, thanks for your time