vocoder from LPC analysis
Hi
I use a program called PRAAT, which is a speech analysis tool. Among other things, it lets you filter a given sound using a so-called LPC analysis of some speech. The effect is close to a vocoder, and LPC analysis seems to be the preferred method for that, by the way.
I'm trying to apply the LPC analysis in MAX/MSP in order to process a given sound file in real time. The problem is that I don't know how to 'translate' the analysis into parameters understood by the MAX/MSP objects. Here is an example of (a small part of) an LPC analysis:
xmin = 0
xmax = 3.4449951170417563
nx = 679
dx = 0.005
x1 = 0.027491460103360062
samplingPeriod = 2.2675736961451248e-05
maxnCoefficients = 16
frame []:
frame [1]:
nCoefficients = 16
a []:
a [1] = -3.2827861
a [2] = 6.0996714
a [3] = -7.8335428
a [4] = 6.9054546
a [5] = -3.3981638
a [6] = -0.90945804
a [7] = 3.6682715
a [8] = -3.6352305
a [9] = 1.4744695
a [10] = 1.0017649
a [11] = -2.2189555
a [12] = 1.8888326
a [13] = -0.85666442
a [14] = 0.065275356
a [15] = 0.18169832
a [16] = -0.09004765
gain = 5.8214144e-05
Here, xmax is the length of the sound file in seconds; nx and dx have to do with window size (I think). It seems that each of the 16 values should be multiplied by something in the given sample of the filtered sound, but my rudimentary understanding stops here.
I NEED HELP!
Thanks!
PS: I have had success in translating pitch information from a speech signal into values that can control the sample rate in MAX/MSP, which lets me play sounds in a way that imitates the pitch contour of the speech signal. The advantage of using PRAAT is that its pitch analysis is much better, when dealing with speech, than any max/msp object I have tried.
I'm no DSP whiz, but I'm thinking those values and coefficients are used for creating an FIR filter? It looks similar to variable declarations in C, without types (but it seems all types are float or double anyway). I don't want to say for sure and lead you in the wrong direction, but maybe that would be a good place to start looking (FIR filter design). For more info on what to do, maybe you can google to find what kind of filters are used in applying LPC analysis.
Well, in that case, the values a [1], a [2], ... a [n] are presumably in dB, and I must convert them to amplitude values (-1. to 1.) before creating the buffir~ content.
Thanks for leading me there, it seems to be the right path!
cool, hope it works out.
I'm no max object whiz, but I'm thinking there must be some sort of conversion object between dB and amplitude? Without using expr, that is.
There is, in fact. In the 4th max tutorial, there is a patcher called dBtoA (which IS in fact based on expr, but anyhow).
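For reference, the usual dB-to-amplitude conversion is amp = 10^(dB/20). The Python sketch below is only an illustration (the function name `db_to_amp` is made up here and is not the tutorial's patcher). It shows what that formula does to numbers like the a [] values in the analysis: negative inputs come out positive, and positive inputs come out above 1.0.

```python
def db_to_amp(db):
    """Convert a level in decibels to a linear amplitude ratio (0 dB -> 1.0)."""
    return 10.0 ** (db / 20.0)

# Treat the first three a[] values from the analysis as if they were dB levels.
for value in (-3.2827861, 6.0996714, -7.8335428):
    print(value, "->", round(db_to_amp(value), 4))
```

Note that LPC coefficients are normally plain filter coefficients rather than dB levels, so this conversion is shown only to explain the numbers it produces.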
OK... Now I've succeeded in transforming the LPC analysis generated in PRAAT into a coll file. The good news is that the values, when translated into amplitude, go from negative/positive to all positive. Buffir~ seems to prefer positive numbers. The bad news is that the numbers exceed 1.0... It seems I have the wrong conversion going on from dB to amplitude.
I've attached the patch which translates the PRAAT analysis into a coll file. In the next message I'll upload the text document with the PRAAT analysis.
Here is the PRAAT analysis.
Note that you need to place this file in a folder readable from MAX in order to open it in the patch called lpc_to_buffir.
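The attached patch is not reproduced here, but the same translation step can be sketched outside Max. The Python below is a rough sketch, assuming the analysis is saved in the long text layout shown at the top of the thread (`frame [n]:`, `a [k] = ...`, `gain = ...`). The function names `parse_praat_lpc` and `to_coll_lines` are hypothetical, and the coll layout (`index, values;`) with the gain appended as the last value is just one possible choice.

```python
import re

FRAME_RE = re.compile(r"frame \[\d+\]:")
COEFF_RE = re.compile(r"a \[(\d+)\] = (\S+)")
GAIN_RE = re.compile(r"gain = (\S+)")

def parse_praat_lpc(text):
    """Return a list of frames, each a dict with 'a' (coefficient list) and 'gain'."""
    frames = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if FRAME_RE.match(line):          # "frame [1]:" starts a new frame
            current = {"a": [], "gain": None}
            frames.append(current)
        elif current is not None:
            m = COEFF_RE.match(line)
            if m:
                current["a"].append(float(m.group(2)))
                continue
            m = GAIN_RE.match(line)
            if m:
                current["gain"] = float(m.group(1))
    return frames

def to_coll_lines(frames):
    """One coll line per frame: 'index, a1 a2 ... gain;' (assumed layout)."""
    lines = []
    for i, frame in enumerate(frames, start=1):
        values = frame["a"] + [frame["gain"]]
        lines.append(f"{i}, " + " ".join(str(v) for v in values) + ";")
    return lines

sample = """frame []:
    frame [1]:
        nCoefficients = 2
        a []:
            a [1] = -1.5
            a [2] = 0.7
        gain = 0.001
"""
print("\n".join(to_coll_lines(parse_praat_lpc(sample))))
```

The coefficients are written out unchanged; whether they need any further conversion before use is a separate question.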
This would be fun if you can get it running in a patch, but you may
be limited by the signal vector size, etc. I'm certainly not an
LPC or DSP person by any stretch of the imagination, but I've used
LPC resynthesis and cross-synthesis a fair amount. 'cross
synthesis' is probably the 'vocoder' effect you are chasing
-- you use a different program source through the LPC filters.
rtcmix~ has a working suite of LPC instruments, based on the old
Ken Steiglitz FORTRAN LPC that Paul Lansky, Charles Dodge, etc.
used back in the olden days. The source is all available if you
want to look at how it is implemented, but the code is pretty
dense.
Coupla comments from my very surface perspective:
Quoting Casper Cordes :
> Here is an example of a (small part of an) LPC analysis:
>
> xmin = 0
> xmax = 3.4449951170417563
> nx = 679
> dx = 0.005
As you have noticed, "nx" seems to be the number of frames in the
analysis, and "dx" looks like how far each jump forward is taken
(679 * 0.005 = 3.395, close to the 3.44499 of xmax). I can't really
tell how large (in samples) each frame is, unless perhaps:
> x1 = 0.027491460103360062
"x1" is the number of samps/frame (x1*44100 = 1212.37339, which
seems weird). LPC frame sizes are odd, and often not the
power-of-2 that you see in FFT-based analyses. I think the default
for rtcmix~ LPC is 200 samps/frame. NEVER MIND -- I just checked,
and I think your analysis is about 220 samples/frame. Not sure
what "x1" is.
> samplingPeriod = 2.2675736961451248e-05
samplingPeriod == 1/SR obviously.
> maxnCoefficients = 16
> frame []:
> frame [1]:
> nCoefficients = 16
> a []:
> a [1] = -3.2827861
> a [2] = 6.0996714
> a [3] = -7.8335428
> a [4] = 6.9054546
> a [5] = -3.3981638
> a [6] = -0.90945804
> a [7] = 3.6682715
> a [8] = -3.6352305
> a [9] = 1.4744695
> a [10] = 1.0017649
> a [11] = -2.2189555
> a [12] = 1.8888326
> a [13] = -0.85666442
> a [14] = 0.065275356
> a [15] = 0.18169832
> a [16] = -0.09004765
> gain = 5.8214144e-05
This is where to be careful, as I'm not sure that LPC uses FIR
filters. The rtcmix~ LPC uses an all-pole, recursive (IIR) filter
with 32 poles (coeffs). I think LPC can indeed be implemented as
FIR filters, but the analysis that generates the coeffs is based on
a recursive procedure that may assume IIR-type filters.
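To make the all-pole (IIR) idea concrete, here is a minimal Python sketch of how one frame's coefficients could be applied to a signal. It assumes the convention H(z) = gain / (1 + a[1] z^-1 + ... + a[p] z^-p), which I believe matches PRAAT's a [] values but should be checked against its manual. The function name `allpole_filter` is made up, and since how PRAAT's "gain" field should scale the output is not settled in this thread, it is left as a plain multiplier defaulting to 1.

```python
def allpole_filter(x, a, gain=1.0):
    """Recursive filter: y[n] = gain * x[n] - sum over k of a[k] * y[n-1-k]."""
    history = [0.0] * len(a)      # history[k] holds the output from k+1 samples ago
    out = []
    for sample in x:
        y = gain * sample - sum(ak * yk for ak, yk in zip(a, history))
        history = [y] + history[:-1]   # shift the newest output into the history
        out.append(y)
    return out

# A one-pole example: an impulse decays by half on every sample.
print(allpole_filter([1.0, 0.0, 0.0, 0.0], [-0.5]))
```

Each output sample depends on earlier outputs rather than earlier inputs, which is why an FIR object such as buffir~ is not a direct fit for these coefficients.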
What I was saying earlier about the signal vector size and
implementing this at the patch level -- bear in mind that you will
need to access the coeffs for a new frame every 5 ms (200 frames
per second, 1/0.005), or about every 220 samples at 44.1 kHz, which
means ideally you would need a signal vector size of about 220. I'm
not sure how you will get around this. And of course I'm assuming
"in time" resynthesis.
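Working the header numbers through: 1/dx gives frames per second, while the distance between frames in samples is dx times the sampling rate. The Python sketch below does that arithmetic and looks up which frame applies at a given sample position. It treats "x1" as the time of the first frame, which is how I understand PRAAT's header but is an assumption here; `frame_index_at` is a hypothetical helper name.

```python
sampling_period = 2.2675736961451248e-05   # samplingPeriod from the file
dx = 0.005                                 # time step between frames, in seconds
nx = 679                                   # number of frames
x1 = 0.027491460103360062                  # assumed: time of the first frame

sample_rate = 1.0 / sampling_period        # about 44100 Hz
hop_samples = dx * sample_rate             # about 220.5 samples between frames
frames_per_second = 1.0 / dx               # 200 frames per second

def frame_index_at(sample_index):
    """0-based index of the analysis frame nearest to a sample position."""
    t = sample_index * sampling_period
    i = round((t - x1) / dx)
    return min(max(i, 0), nx - 1)          # clamp to the available frames

print(sample_rate, hop_samples, frames_per_second)
```

Because the hop is not a whole number of samples, a fixed signal vector size can only approximate the frame boundaries.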
I'm working on a translator for this. I need a couple of days to sit
down with it, and it will read the formant tracks first, for
my own research, though the LPC should be easy to follow. But keep
nagging, as I need to get it done pretty soon.
Something that might help: why is text limited to 255 lines? Is most
interesting reading this long?
I couldn't find anything obvious on maxobjects.
best Pere
--
www.centuryofnoise.com
www.perevillez.com
Quote: pvillez@gmail.com wrote on Sun, 05 August 2007 15:36
----------------------------------------------------
> Something that might help: why is text limited to 255 lines? Is most
> interesting reading this long?
> I couldn't find anything obvious on maxobjects.
well coll can open text files too, maybe you could use
another coll.
Thanks for the interest, everybody; I was just thinking that I should describe exactly what I'm after:
I would like to be able to use formant/spectral information analysed from speech on other sounds. If used on white noise, the effect is like (clear) whispering. When stretched over time, there can be some interesting filtering going on. What's important is that the calculations have a low CPU cost, and I suppose that a maximum of 16 numbers per frame, or maybe 5 pairs (frequency/bandwidth), will suffice. But I'm not able to work it out in MAX/MSP. I CAN make it happen when using PRAAT (see praat.org), but then it's not real time.
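The "pairs (frequency/bandwidth)" idea can be sketched with one two-pole resonator per formant, run in series over white noise. The Python below is a minimal sketch under that assumption: the formant values are invented for illustration (not taken from any PRAAT analysis), and `resonator_coeffs` and `resonate` are hypothetical names. In Max, something like biquad~ per formant would probably play the same role.

```python
import math
import random

def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate=44100.0):
    """Coefficients for y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius from bandwidth
    theta = 2.0 * math.pi * freq_hz / sample_rate         # pole angle from frequency
    return -2.0 * r * math.cos(theta), r * r

def resonate(x, freq_hz, bandwidth_hz, sample_rate=44100.0):
    """Filter the list x through a single two-pole resonator."""
    a1, a2 = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = sample - a1 * y1 - a2 * y2
        y2, y1 = y1, y
        out.append(y)
    return out

random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(1000)]
formants = [(700.0, 80.0), (1200.0, 90.0), (2600.0, 120.0)]  # hypothetical values
signal = noise
for freq, bw in formants:
    signal = resonate(signal, freq, bw)    # cascade one resonator per formant
```

Each resonator costs two multiplies and two adds per sample, so five pairs per frame stays cheap. Output level is not normalised here and would need scaling in practice.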
There is a nice example of rand~ in the help file.
P
Try the rtcmix~ LPCPLAY examples. It does this.
There is also an LPC analyser and resynthesis in Csound. You can use the csound~
external to do it in real time in Max.
Peiman
On 06/08/07, garton@columbia.edu wrote:
>
> Try the rtcmix~ LPCPLAY examples. It does this.
>
> brad
> http://music.columbia.edu/~brad
OK:
1) Now I've checked rtcmix~ as suggested by Bradford, and yes: it DOES do LPC things, but the problem is how to produce an LPC analysis that it can read. It seems that the object doesn't do analysis, but only reads analyses. I can't get a grasp of how to make an LPC analysis that it will read. The one in the example (a howling dog) comes in an exec file whose content I can't read, and the procedure for making this analysis seems difficult; it's done in another program, and in any case it's not real time.
2) I've also started looking at Csound. It seems more capable of doing what I'm after (for newcomers: real time LPC analysis and resynthesis), but apparently I have to study the thing for 5 years before getting close to anything interesting. I need A KEY to enter the fabulous world of Csound + max/msp!
3) Finally, what would really be interesting is if someone were capable of implementing the PRAAT algorithms in max/msp. I might be wrong, and there might exist initiatives that make this step unnecessary, but I think that coupling these two things would be beneficial for both composers and linguists, and, not least, ME!
Well, if anyone can see the potential in this and is able to work it out, that would really be great!
I wouldn't say you need 5 years to make an interesting sound in Csound (if
that were the case I would need to trash half of my music!). You are not
going to use it as a score generator but to do a clear task (in this case
LPC). For that, all you need to do is read the tutorial for that particular
opcode (LPC synthesis), or even just copy and paste the tutorial example.
Csound is not like max/msp, in that you are not faced with the basic building
blocks but with already designed opcodes that do particular synthesis tasks. This
means that to do LPC, for instance, you just need to give the right parameters
to the right opcodes, which with csound~ you can control in real time and
graphically in Max.
Best
Peiman
Peiman wrote:
>You are not
> going to use it as a score generator but to do a clear task
I'm not sure what this means 'clear task', but I have an idea: something to do with not preparing files outside MAX/MSP
>(in this case
> LPC), for that all you need to do is read the tutorial for that particular
> opcode (LPC synthesis) or even just copy and paste the tutorial example.
...now how to find this famous tutorial?!!?
> Csound is not like max/msp in that you are not faced with the basic building
> blocks but already designed opcodes that do particular synthesis tasks. This
> means that to do LPC for instance you just need to give the right parameters
> to the right opcodes which with csound~ you can control in real-time and
> graphically in max.
... 'opcodes': what are those?
>
> Best
> Peiman
Thanks, Peiman, but as you see, I'm illiterate on this one, so I need everything explained....
haha OK, opcodes are like "instruments", similar to msp objects, each
designed to do a certain task. More often than not each opcode is like a
ready-to-use synthesizer whose parameters can be controlled by you. So in
order to do a single processing or synthesis "task" like filtering you
usually only need one opcode, particularly in the case of csound~, where you
will do the control in Max itself.
There is a basic tutorial here that shouldn't take over an hour to go
through:
http://www.csounds.com/toots/index.html
The manual is here:
http://www.csounds.com/manual/html/index.html
Here is the LPC page of the manual:
http://www.csounds.com/manual/html/SpectralLpcresyn.html
Hope that helps
P
Thank you Peiman! (does anyone 'pei' you? ;-)
I'm a lot wiser now. So, what I've done is:
1) I've installed Csound5 on my Mac, and I've installed the csound~ object in Max.
2) I've learned that in order to get csound~ to execute the .sco and .orc files from the tutorial, I need to make a message saying 'csound Toot01.orc Toot01.sco', for example, and max/msp can perform the thing.
3) I've learned that even the stupidest spelling error can be fatal (no surprise to someone who has worked a tiny bit with scripting).
4) I've also learned that I have to write the .sco and .orc files in a text object in Max and save them from there; with TextEdit, Word or any other text editing software, there's trouble (maybe I need to know the right format).
So far so good.
But as for the LPC thing, I still need some keys to open closed doors with.
I can see that I need to open the door 'LPANAL'; there is apparently no tutorial for this, and there is no example. I tried to write some syntax and send it to the csound~ object, but nothing happened; not even an error message in the Max window. I've included a screenshot of the patch (the soundfile sound.aiff was in the right folder and should be reachable for Max).
(I hope my mental limitations, now publicized, will be helpful to others out there. I'm considering buying the Csound book, but I hesitate until I know for certain that this is the right path for what I'm after; I'm reluctant to spend hours and hours on something that leads me nowhere. Now it SEEMS to be the right path, and if it is, I'll certainly invest what's needed...)
Hello,
yes they do pie me all the time!
Are you on Mac or PC? It's best to get a front end for writing your
orchestra/score files, with syntax highlighting and all (MacCsound for Mac
and WinXound for Windows, for instance). You can in fact just run the analysis
utility from the terminal, away from Max (easier that way), or use one of the
front ends (Cecilia is a good one for this on both Mac and Windows).
The page for lpanal is here:
http://www.csounds.com/manual/html/lpanal.html
I am in a hurry now to get to the airport! But if you cannot work it out,
send another message and I'll be able to reply in a couple of days.
Also, it may be a good idea for you to join the Csound mailing list for any
Csound related stuff.
Best
Peiman
> Thanks for the interest, everybody; I was just thinking that I should
> describe exactly what I'm after:
>
> I would like to be able to use formant/spectral information analysed
> from speech on other sounds. If used on white noise, the effect is
> like (clear) whispering. When stretched over time, there can be some
> interesting filtering going on. What's important is that the
> calculations have a low CPU cost, and I suppose that a maximum of 16
> numbers per frame, or maybe 5 pairs (frequency/bandwidth), will
> suffice. But I'm not able to work it out in MAX/MSP. I CAN make it
> happen when using PRAAT (see praat.org), but then it's not real time.
>
>
For the quality of sound that you're talking about, you don't
necessarily need to do LPC. In DSP there are often multiple means
to the same end, so you might try something different.
From what you describe, it sounds like you could also do this by
FFT-convolving the white noise with the soundfile. It won't be the same,
but if it's too bright, you could always simply convolve the original
with rand~ 2000, etc.
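As a rough illustration of what FFT-convolving two signals means, here is a small offline Python sketch using only the standard library. It is not how you would do it in Max (an FFT-based object such as pfft~ would be the likely route there), the short input lists are placeholders rather than real audio, and `fft` and `fft_convolve` are names chosen for this example.

```python
import cmath
import random

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    n_out = len(a) + len(b) - 1
    size = 1
    while size < n_out:                    # next power of two that fits the result
        size *= 2
    fa = fft(list(a) + [0.0] * (size - len(a)))
    fb = fft(list(b) + [0.0] * (size - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    y = fft(product, inverse=True)
    return [v.real / size for v in y[:n_out]]   # scale the unscaled inverse

random.seed(1)
noise = [random.uniform(-1.0, 1.0) for _ in range(64)]   # stand-in for white noise
sound = [1.0, 0.6, 0.3, 0.1]                             # stand-in for the soundfile
result = fft_convolve(noise, sound)
```

Multiplying the two spectra is what imposes one sound's spectral shape on the other, which is why this can resemble the LPC cross-synthesis effect without being the same thing.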
If you want to remove the pitched material from the original soundfile,
you could also granulate it with grains 1 ms or less in duration.
These will have no perceivable pitched content. (by this point, it
would probably not improve the sound to convolve it with the noise as
the processed sound will be quite noisy already.)
Nathan Wolek's Granular ToolKit should help you do the granular portion
of this.
Peter McCulloch
www.petermcculloch.com