Unwanted delay between concatenated audio samples

LiamD

Hi all,

I'm attempting to make a speech synthesis patch which works by concatenating very short (mostly < 0.5 second) samples of speech together to create a variety of pre-determined words and phrases.

I've come across some unwanted popping between samples. This is strange, as I have carefully edited the samples so that they all begin and end at zero amplitude, and they concatenate cleanly in Adobe Audition. I believe the pop is caused by an unwanted delay: the time Max takes to send the bang to the groove~ in question.

I initially had two groove~ objects (one for each sample). The first used the "delta~" -> "edge~" chain demonstrated in the groove~ help file, which outputs a bang when the audio file finishes playing. This bang was the cue for the second player to start. When the popping occurred, I tried a counter system instead, in which a "metro" -> "counter" chain creates an independent time base from which to trigger each sample (this is the method used in the example patcher).

I'm still getting the popping sound. I thought that if the use of timers and delays is the issue, then pre-concatenating the audio (i.e. combining the samples before hearing the results) might be a solution, but I'm struggling to find an object that can do this for me.

Has anyone had any similar issues or any ideas as to where the problem might lie?

Thanks,

Liam

[Max patch attached]

pid

i have not looked in depth but: your patch does not work because the triggering happens in the control domain. you need to work only at sample rate. you need to double buffer and/or use the play~ object instead; with a phasor~. if you are never changing tempo or pitch you may find wave~ / index~ will work well too.
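To put a rough number on the control-domain problem described above, here is a minimal sketch in Python (not Max). It assumes a simplified model in which a control-domain trigger can only take effect at the start of the next signal vector after the first sample ends. The function names, the vector size of 64 and the 44.1 kHz sample rate are illustrative assumptions, and the real delay in Max depends on scheduler settings and may well be larger.

```python
import math


def trigger_delay_samples(sample_length: int, vector_size: int = 64) -> int:
    """Gap, in samples, before the next sample can start.

    Assumed model: the trigger is quantized to the next signal-vector
    boundary at or after the end of the first sample.
    """
    next_boundary = math.ceil(sample_length / vector_size) * vector_size
    return next_boundary - sample_length


def delay_ms(gap_samples: int, sample_rate: int = 44100) -> float:
    """Convert a gap in samples to milliseconds."""
    return 1000.0 * gap_samples / sample_rate


# Hypothetical 10000-sample segment played with a 64-sample signal vector.
gap = trigger_delay_samples(sample_length=10000, vector_size=64)
print(gap, round(delay_ms(gap), 3))  # 48 1.088
```

Even under this optimistic model the second sample starts some samples late unless the first one happens to end exactly on a vector boundary, and a gap of that size between two speech fragments can be audible as a click.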
p.s. - if you are dealing with a lot of files into ram then just use polybuffer~ object.

LiamD

Thanks for the response,

I've switched over to the play~ object, using a '0' message and line~ to trigger the first play~. I don't quite understand what you mean by double buffering, but I am in fact using polybuffer~ in the real patch. I modified that example patcher because I wasn't sure of the rules regarding uploading other people's work.

I'm struggling to find a way to use the play~ output to trigger the second play~. Is this where the phasor~ comes in? I notice the phasor~ help file says it 'can be used as an audio signal or a sample-accurate timing/control signal', but I can't find any demonstration of this in the documentation. Of course, this is probably due to my lack of experience with the object.

Thanks again,

Liam

pid

hey. i looked at this. actually, for what you are trying to do a single buffer with all your audio and accessing the sample positions for playbacks for each segment would be a much much better way to do it. you can make a single buffer of all your files using mxj buff.op but just as well do it in an audio editor if you do not need that in realtime. or look at FTM. triggering play~ with line is no good because you are coming in and out of control / signal domain. a counter system would be better if you are wanting separate buffers. i'll look into it again if i get time tomorrow or so. are you wanting one-shot plays of each segment of audio? are you wanting just one sound then the other and stop? or constantly playing between two buffers and stop when you call it? all sorts of different ways of doing all this. all this should be easier for you, but that is max sometimes unfortunately.
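The single-buffer idea above can be sketched outside Max. This is a minimal Python illustration, not a Max patch and not how mxj buff.op works: segments are packed end to end into one list, and a table records each segment's start and end sample positions. The name `pack_segments` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def pack_segments(
    segments: Dict[str, List[float]],
) -> Tuple[List[float], Dict[str, Tuple[int, int]]]:
    """Pack all segments into one buffer and record their positions.

    Returns the combined buffer and a table mapping each segment name to
    (start, end) sample positions, with end exclusive.
    """
    buffer: List[float] = []
    index: Dict[str, Tuple[int, int]] = {}
    for name, samples in segments.items():
        start = len(buffer)
        buffer.extend(samples)
        index[name] = (start, len(buffer))
    return buffer, index


# Hypothetical, very short segments standing in for recorded audio.
segments = {"a": [0.0, 0.5, 0.0], "b": [0.0, -0.5, -0.25, 0.0]}
buffer, index = pack_segments(segments)

# Playing segment "b" means reading the buffer between its two positions.
start, end = index["b"]
print(index["b"], buffer[start:end])
```

With a table like this, playing a segment comes down to reading one buffer between two known sample positions, so no message has to be sent to a second player at the moment the first one finishes.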

LiamD

Just to clarify, do you mean rendering all of the audio into a single file, and loading it into one buffer~?

To answer your questions: I only need one-shot plays of each segment, stopping at the end of the chain. Thinking about it, I can see no disadvantage to the single-buffer technique compared with multiple buffers. In fact, the number of audio segments needed will vary from one utterance to the next, so using multiple buffers would probably prove more problematic.

You mention buff.op. If this can combine all of my files into a single buffer, does that mean I could use it to pick and choose the samples I need to make each phrase?

In case this is confusing, here is a crude example of what I am asking:

Let's suppose I want to synthesise the word 'Speech'. I have recordings of its component phonemes (/s/, /p/, /i:/ and /ʧ/). Could buff.op take these samples and concatenate them for me?
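To make the question concrete, here is a minimal sketch of the result I am after, written in Python rather than Max. It says nothing about whether buff.op offers such an operation. The function name `concatenate`, the `bank` dictionary and the sample values are all made up for illustration; real phoneme recordings would be far longer.

```python
from typing import Dict, List


def concatenate(phonemes: List[str], bank: Dict[str, List[float]]) -> List[float]:
    """Join the recordings for the given phonemes end to end, with no gap.

    Raises KeyError if a phoneme has no recording in the bank.
    """
    out: List[float] = []
    for phoneme in phonemes:
        out.extend(bank[phoneme])
    return out


# Hypothetical stand-ins for recorded phonemes, each starting and ending at zero.
bank = {
    "s": [0.0, 0.1, 0.0],
    "p": [0.0, 0.3, 0.0],
    "i:": [0.0, 0.2, 0.4, 0.0],
    "ʧ": [0.0, -0.2, 0.0],
}

speech = concatenate(["s", "p", "i:", "ʧ"], bank)
print(len(speech))  # 13
```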

I shall look into FTM this evening. Thanks for your time.

Liam