Syllable Counter
I want to design a system that automatically lowers the playback sample rate of an audio file when it detects that someone's speech is faster than a given threshold.
Really what I'd love is a system that can count the number of words spoken per second by a speaker, but maybe it's more realistic to work with syllables. Does anybody have suggestions for how to go about this, or where I can go digging?
checking for vowels is not easy (as ng, mm, or rrr are very close to ae and oo), and checking for consonants, for silence or for transients ("attacks") seems inappropriate.
maybe you could combine them a bit and check for "change of timbre".
that will not be super exact, but it should roughly track the average speed of words or syllables (say over periods of 5-8 syllables)
maybe an FFT analyzer with 64 bands, with the gain resolution truncated to 4 or 5 steps (from silence to 1.0), is a starting point?
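to make the "change of timbre" idea a bit more concrete, here is a rough sketch in plain Python, not a tested speech tool. the assumptions are all mine: 128-sample frames (so the FFT gives 64 bands), linear quantization into 5 levels, and a "change" counted when the summed level difference between neighbouring frames reaches min_change. the names band_levels and count_timbre_changes are made up for this example, and the thresholds would need tuning on real speech.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def band_levels(frame, steps=5):
    """Quantize the lower half of the spectrum into `steps` gain levels.

    A full-scale sine (amplitude 1.0) lands on the top level, silence on 0.
    The scaling is an assumption; the DC bin ends up scaled 2x too high.
    """
    n = len(frame)
    spectrum = fft(frame)
    levels = []
    for k in range(n // 2):
        mag = abs(spectrum[k]) / (n / 2)
        levels.append(min(steps - 1, int(mag * steps)))
    return levels

def count_timbre_changes(samples, frame=128, steps=5, min_change=4, min_gap=3):
    """Count frames whose quantized spectrum differs clearly from the previous one.

    min_gap (in frames) keeps one slow transition from being counted twice.
    """
    prev, last, count = None, -min_gap, 0
    for idx, start in enumerate(range(0, len(samples) - frame + 1, frame)):
        levels = band_levels(samples[start:start + frame], steps)
        if prev is not None:
            diff = sum(abs(a - b) for a, b in zip(levels, prev))
            if diff >= min_change and idx - last >= min_gap:
                count += 1
                last = idx
        prev = levels
    return count
```

dividing the count by the duration in seconds gives a changes-per-second figure that could be compared against a threshold. there is no window function here, so real signals will show spectral leakage; a Hann window before the FFT would probably help.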
when i think about it, simply getting the envelope could also work.
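a minimal sketch of the envelope version, again in plain Python and under my own assumptions: a 20 ms RMS window, a short moving average for smoothing, and a peak counted as one "syllable" if it is above 30% of the loudest frame and at least 5 frames (100 ms) after the previous one. the function names and all parameter values are made up for illustration and would need tuning on real recordings.

```python
import math

def envelope(samples, sample_rate, window_s=0.02, smooth_frames=5):
    """RMS per non-overlapping window, then a short moving average."""
    n = max(1, int(sample_rate * window_s))
    rms = [math.sqrt(sum(x * x for x in samples[i:i + n]) / n)
           for i in range(0, len(samples) - n + 1, n)]
    half = smooth_frames // 2
    out = []
    for i in range(len(rms)):
        seg = rms[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def count_peaks(env, rel_threshold=0.3, min_gap=5):
    """Count local maxima above rel_threshold * max(env), min_gap frames apart."""
    if not env or max(env) == 0:
        return 0
    floor = rel_threshold * max(env)
    count, last = 0, -min_gap
    for i in range(1, len(env) - 1):
        is_peak = env[i] > env[i - 1] and env[i] >= env[i + 1]
        if is_peak and env[i] >= floor and i - last >= min_gap:
            count += 1
            last = i
    return count

def syllable_rate(samples, sample_rate):
    """Estimated envelope peaks per second over the whole clip."""
    duration = len(samples) / sample_rate
    if duration == 0:
        return 0.0
    return count_peaks(envelope(samples, sample_rate)) / duration
```

for the original question you would run this over a sliding stretch of a few seconds and slow playback whenever syllable_rate goes above your threshold. loudness peaks are only a proxy for syllables, so expect it to miscount on sustained vowels or noisy recordings.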
This guy has written a pretty amazing blog on the topic
https://orchestraofspeech.com/blog/syllable-detection/
Are you still interested in this? Have you made any progress since you posted on here?