zerox~ as "transient detector"??

Leigh Marble's icon

This is something I've wondered about for a while. The short description of the zerox~ object says:

zerox~ functions as a zero-crossing counter or transient detector.

In what sense is it a transient detector? It deals with zero-crossings only, and as such, I don't see how it can be used to detect transients.

Emmanuel Jourdan's icon

The number of zero crossings increases as the frequency increases, and when there's a transient there are often more high frequencies.
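A rough illustration of the first half of that claim, written in Python rather than Max. This is only a sketch: the sample rate, the test frequencies, and the helper name `zero_crossings` are assumptions made for the example, and it does not reproduce how zerox~ itself is implemented.

```python
import numpy as np

def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    signs = np.signbit(x)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

sr = 44100                       # assumed sample rate
t = np.arange(sr) / sr           # one second of time stamps
low = np.sin(2 * np.pi * 100 * t + 0.1)     # 100 Hz sine
high = np.sin(2 * np.pi * 2000 * t + 0.1)   # 2 kHz sine

# A sine at f Hz crosses zero about 2*f times per second,
# so the second count is roughly 20 times the first.
print(zero_crossings(low), zero_crossings(high))
```

The count tracks the dominant frequency, which is why it can serve as a cheap brightness estimate.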

Roman Thilenius's icon

except in the other 50% of possible cases. because if the input material is "bassdrums and toms over background noise" it is the other way round.

daddymax's icon

The above points both make good sense. Maybe if you had an isolated drum from a multi-track recording, like a single snare drum over a backing of pure silence, zerox~ could be useful as part of a larger sub-patch. On its own, I'm not sure how useful it would be for this particular task.

Emmanuel Jourdan's icon

yeah, it's not as if everyone agreed on what a transient is ;-)

Peter McCulloch's icon

This paper gives a good idea of some of the techniques being used for onset detection: http://www.nyu.edu/classes/bello/MIR_files/2005_BelloEtAl_IEEE_TSALP.pdf

Leigh Marble's icon

>> The number of zero crossings increases as the frequency increases, and when there's a transient there are often more high frequencies.

Emmanuel - Yeah, sort of, but it seems to me like that would be coming at the problem of transient/onset detection sideways. That approach has some heuristic validity, but for many non-ideal input signals it would fail, no?

To wit: "Natural" sound sources (the plucking of a guitar string) do fit that pattern of a brighter transient followed by a darker sustain. However, you could easily create a synthesized sound source that has NO variation in frequency content during its transients, and its onsets are apparent only through changes in amplitude. Because zerox~ has no perception of signal amplitude, it would completely miss the transients in such a signal.
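The amplitude-blindness described here can be checked with a small Python sketch. The synthetic "note" below (a 440 Hz tone that jumps in level halfway through) and all names in it are hypothetical, chosen only to make the point; it is not a model of zerox~ internals.

```python
import numpy as np

def zero_crossings(x):
    """Count sign changes between consecutive samples."""
    signs = np.signbit(x)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

sr = 44100                                   # assumed sample rate
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t + 0.1)

# Same frequency content throughout, but 20x louder in the second half:
# an onset that is visible only in amplitude.
gain = np.where(t < 0.5, 0.05, 1.0)
note = gain * tone

half = sr // 2
zc_quiet = zero_crossings(note[:half])
zc_loud = zero_crossings(note[half:])
rms_quiet = np.sqrt(np.mean(note[:half] ** 2))
rms_loud = np.sqrt(np.mean(note[half:] ** 2))

print(zc_quiet, zc_loud)      # essentially the same count
print(rms_quiet, rms_loud)    # level differs by a factor of 20
```

The zero-crossing count is the same on both sides of the jump, while the RMS level changes by 26 dB.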

And thus, my confusion over why the official docs claim that zerox~ is a "transient detector". Show me a robust, working patch that uses zerox~ for this task, and I'll be happy to change my mind. But, short of that, it seems that description might best be removed from the docs, so as not to confuse new users. A "transient detector" that has no awareness of signal amplitude seems a strange idea indeed.

Peter - Thanks for that link. Years back, I got really into working on beat-tracking algorithms, and I can see some of that same material referenced here (like the Goto and Muraoka paper). My sense is that any real-time onset detection that is NOT using first-order-difference of power envelopes is barking up the wrong tree. I've seen some example patches posted by users here that look for onsets using an absolute threshold, whereas taking the first-difference will give the relative change in signal energy. (And, as mentioned in the paper you linked to, this approach can be further improved by taking the first-difference of log E(n)).
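For readers who want to see the difference between the two onset functions mentioned above, here is a minimal Python sketch. The frame size `hop`, the small constant `eps` that guards against log(0), and the noise-burst test signal are all assumptions for illustration; this is not taken from the linked paper or from any posted patch.

```python
import numpy as np

def frame_energy(x, hop=512):
    """Mean power E(n) of consecutive non-overlapping frames."""
    n_frames = len(x) // hop
    frames = x[: n_frames * hop].reshape(n_frames, hop)
    return np.mean(frames ** 2, axis=1)

def onset_functions(x, hop=512, eps=1e-10):
    e = frame_energy(x, hop)
    d_lin = np.diff(e)                  # absolute change: scales with input gain
    d_log = np.diff(np.log(e + eps))    # relative change: unaffected by input gain
    return d_lin, d_log

rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(44100)   # quiet background noise
x[22016:] *= 30.0                        # burst starting at frame 43 (43 * 512)

d_lin, d_log = onset_functions(x)
q_lin, q_log = onset_functions(0.1 * x)  # same material, 20 dB quieter

print(np.argmax(d_log), np.argmax(q_log))   # same frame in both versions
print(d_lin.max() / q_lin.max())            # linear difference shrinks with gain
```

A fixed threshold on `d_lin` would have to be retuned for the quieter version, whereas the same threshold on `d_log` works for both.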

Marcel Wierckx's icon

classic-vocoder uses zerox~ as a transient detector.

Mike S's icon

So it obviously works, at least in one definition of what a transient is...

Leigh Marble's icon

Not really - the classic-vocoder uses zerox~ to differentiate between vowels and consonants in speech. Even if you want to call that "transient detection", it works only for the specialized case of speech signals.

Peter McCulloch's icon

My two cents: I generally prefer the relative difference that Leigh mentions (code available in this thread: https://cycling74.com/forums/rolling-buffer-is-it-even-possible ) but it depends on what kind of transient you're trying to observe. I like to use a two-stage threshold for the detection. The dual atodb~ in the patch is not the most efficient way of doing it; you could divide the two relative to each other, but I feel like this approach is somewhat clearer when I use it in my teaching. It also makes it easier to see how you can build a transient ducker/booster.
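One possible reading of this approach, sketched in Python. It is not the patch from the linked thread: the follower time constants, the `floor` value, and the interpretation of "two-stage threshold" as separate on/off thresholds (hysteresis) are all assumptions made for the example.

```python
import numpy as np

def follow(x, up, down):
    """One-pole envelope follower with separate rise/fall smoothing (in samples)."""
    y = np.zeros(len(x))
    prev = 0.0
    for i, v in enumerate(np.abs(x)):
        slide = up if v > prev else down
        prev += (v - prev) / slide
        y[i] = prev
    return y

def detect_onsets(x, on_db=6.0, off_db=2.0, floor=1e-6):
    fast = follow(x, up=10, down=500)      # reacts quickly to attacks
    slow = follow(x, up=2000, down=2000)   # tracks the longer-term level
    # Subtracting two dB values equals taking the dB value of their ratio.
    diff_db = 20 * np.log10((fast + floor) / (slow + floor))
    onsets, armed = [], True
    for i, d in enumerate(diff_db):
        if armed and d > on_db:            # stage 1: fire above the upper threshold
            onsets.append(i)
            armed = False
        elif not armed and d < off_db:     # stage 2: re-arm below the lower one
            armed = True
    return onsets

sr = 44100
n = np.arange(60000)

def burst(start):
    """Decaying 440 Hz sine beginning at sample `start` (hypothetical test sound)."""
    k = np.clip(n - start, 0, None)
    return np.where(n >= start, np.exp(-k / 2205.0) * np.sin(2 * np.pi * 440 * k / sr), 0.0)

x = burst(10000) + burst(30000)
onsets = detect_onsets(x)
print(onsets)
```

The gap between the two thresholds keeps ripple in the envelope difference from retriggering the detector during a single note.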

Relative difference works well on percussive material, but can be hard to tune where you have a mix of hard and soft onsets, e.g. niente on the clarinet. (perceptually, that's also a hard onset to detect, in fairness...) There are lots of ambiguous cases in music, and the most useful definition probably depends on your application. How do you interpret something like measured tremolo-->unmeasured tremolo-->unmeasured tremolo with first note accented-->smooth roll? In cases like that, one reasonable interpretation might be that there are hierarchies of onsets, some of which are more musically significant than others.

As Leigh points out, the thing that zerox~ doesn't take into account is the amplitude of the signal, so it makes that determination regardless of whether any input signal is present or the system is just listening to line/ambient noise. This doesn't matter in the classic-vocoder patch, because in that situation the output of the vocoder is near 0 anyway, thanks to the amplitude envelopes applied within the channels.

There are some circumstances where transients happen without necessarily big shifts in amplitude. (There may be some shift, but it's below the threshold, for example.) This could be something like string noise as a player shifts from one note to the next, or a slide in pitch. In those cases, soft onset detection might be helpful.

I like to use onset detection to control input into processes such as multi-tap delays, since it makes them less prone to problems with feedback in a performance context. (It operates like a less extreme noise gate without a fixed threshold.)

Mike S's icon

Leigh, I meant if it's looking for transients that are being defined by a lot of high frequencies over a short period of time then it's fine. I realise this doesn't equal all transients, but considering this thread comes from a comment in a help file, and that a lot of MSP objects are multipurpose - I don't see what the bigdeelyis. Obviously transient detection is rly cool though, I have a dabble myself sometimes and have glanced at that paper linked a few times before now.

edit - just tried to knock something up with it and jongly.aif, and I have no idea how to make it work.

Roman Thilenius's icon

>> yeah, it's not as if everyone agreed on what a transient is ;-)

or how useful the measurement of transients would be in the form of floating point numbers, haha.

for the CPU which one single zerox~ lego brick eats, you could as well use an fft-based transient/harmonic analysis. then it works at least independent from the overall gain factor of the input (but is always relative instead), and it is less vulnerable to DC offset.

maybe the programmers of the holy lego brick environment should focus on the syntax – and leave the semantics to the observer user.

because, just in case someone missed it, zerox~ is in fact a video decorrelation object for usb printers. why i say this? because i can.

Roman Thilenius's icon

lol, i bet the vocoder example is older than the zerox helpfile.

Leigh Marble's icon

Peter, I like that approach of "hierarchies of onsets". In the end, of course, how nuanced your transient data needs to be depends on the needs of the particular application. For a simple guitar effect, you probably just need a single level of "onset", which in Max is easily represented by a series of "bangs".

For a patch that responds in different ways to different levels of "onset", you could find a float (say, normalized [0-1]) useful as an "onset score", if your patch responds in a continuously variable way. Or, a simple distinction between "hard onset", and "soft onset" (established by two different threshold levels) would be sufficient if your patch responds in only two primary ways.

Leigh Marble's icon

>> considering this thread comes from a comment in a help file, and that a lot of MSP objects are multipurpose - I don't see what the bigdeelyis

Mike, to be accurate, it's more than just a comment in a help file, it's part of the primary description of the zerox~ object. And if you look up "transient detector" in the Max Object Thesaurus, it is the ONLY object labeled as such. So, in the absence of other better-suited and more general-purpose transient detectors, I think it's confusing for people trying to figure out Max.

Marcel Wierckx's icon

@Roman: actually, zerox~ was originally a third-party object by Richard Dudas, and it was included in the Max 4.0 distribution on my insistence so that I could use it in the classic-vocoder example patch.

Roman Thilenius's icon

ok, so the egg wins. thanks for sharing the story. :)

to_the_sun's icon

I have a quick question. To quote from above:

>> My sense is that any real-time onset detection that is NOT using first-order-difference of power envelopes is barking up the wrong tree. I’ve seen some example patches posted by users here that look for onsets using an absolute threshold, whereas taking the first-difference will give the relative change in signal energy. (And, as mentioned in the paper you linked to, this approach can be further improved by taking the first-difference of log E(n)).

In terms of Max, would the way to apply "log E(n)" to the signal you're dealing with simply be to send it through a [atodb~]?

Leigh Marble's icon

Using atodb~ wouldn't be the same, since amplitude to decibel conversion involves an extra multiply. But in terms of Max, you could use [expr log].
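For reference, the "extra multiply" can be written out. Assuming atodb~ applies the usual amplitude-to-decibel conversion, and assuming the envelope fed to it is an amplitude $a(n)$ with energy $E(n) = a(n)^2$ (both assumptions for this note, not statements about any particular patch):

$$
20 \log_{10} a(n) = 10 \log_{10} E(n) = \frac{10}{\ln 10}\,\ln E(n) \approx 4.343\,\ln E(n)
$$

so the first difference becomes

$$
20 \log_{10} a(n) - 20 \log_{10} a(n-1) \approx 4.343\,\bigl(\ln E(n) - \ln E(n-1)\bigr)
$$

Under these assumptions the two onset functions differ only by a constant factor, which changes the numeric threshold you would pick but not where the peaks fall.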

I can try to dig up an example, from when I was working on beat detection stuff...

to_the_sun's icon

That would be great if you could. I threw my own transient detector together before I even thought to look online for tips, and now I'm trying to clean it up a bit and get it working the best it can.

Leigh Marble's icon

You bet, I dug up some old onset detection patches and put this demo together.

There's a gswitch in the middle of the patch that switches between "plain" first-order difference, and first-order difference of logs.

[Max patch embedded in the original post]

Note that the [deltaclip~] object used in each detector patcher will have quite an effect on the detection. I'm not sure those are the ideal values, or the ideal method, of shaping a waveform into an amplitude envelope.

to_the_sun's icon

Thanks mayne. It's nice to see as many examples of how people go about this as possible.

In messing with all these patches it seems to me that adding a noise floor threshold is essential for really making the transients pop out. So below a certain level it reads only silence and it does its analysis above that level. Somehow I feel like it's kind of cheating, but I really can't get any of these patches to work reliably without it.

[Max patch embedded in the original post]

Here's what I came up with originally. It works pretty well for generating MIDI from percussive audio events. I guess my next experiment will be to see if it can be improved by using a differential instead of an absolute amplitude threshold, although mine is a bit more dynamic than that. It's a moving threshold that tends to zero in on an ideal point. Have a look and see what I mean. Critique welcome.

Leigh Marble's icon

The patch I posted above should be pretty impervious to noise. For a general-purpose onset detector, I would try to avoid as many "absolute" thresholds as possible, and work with relative change and relative thresholds. (The patch I posted doesn't follow that idea exactly, since it includes an absolute threshold for firing the [click~] sound.)

to_the_sun's icon

May I ask what benefit there is in using a squaring [*] object instead of [abs~] in your patch?

Leigh Marble's icon

Yes, it is because we are looking for onset energy, and "The energy in a wave is proportional to its amplitude squared."

The reasoning behind this is a bit more nuanced than I can explain well, as it's been quite a few years since physics class...

to_the_sun's icon

Yes, it does seem that the differential method will ignore any background noise you have, but it also seems like it can absorb some of the transients you're looking for. So for example in a series of drumbeats placed relatively closely, the first will stand out much more prominently than the rest, as the others are masked by the residual noise of those before them and some may fail to be recognized at all.

At this point I have two different versions of my device: my old one and a new differential one. I'm having trouble deciding on which to use. Both work pretty well generally, but the old tends to wind up with some superfluous notes created, while the new tends to miss about as many.

Peter McCulloch's icon

It's a matter of adjusting the attack/release settings for the two envelope followers. If you're having trouble detecting repeated onsets, try shortening the release times or decreasing the distance required to detect a transient. (I'd recommend the first over the second)

There are other things you can look at as well, such as the derivative of the amplitude envelope (or even the derivative of that).

to_the_sun's icon
[Max patch embedded in the original post]

You were right, Peter. I shortened the release on my first slide~ from 4410 to 1500 samples and started getting the response I was looking for. The only problem was that this caused all sorts of messy false triggerings as well, even (if not especially) during complete silence. But I fixed that by raising my thresh~ threshold from 18 to 30 (dB). Here's what I'm working with, if you're curious.
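To get a feel for what those two release values mean, here is a small Python model of a slide~-style smoother. It assumes the filter follows y(n) = y(n-1) + (x(n) - y(n-1)) / slide, with the slide-up value used while the input is above the current output and the slide-down value otherwise; the 20 dB figure and the helper names are choices made for this sketch.

```python
def slide(signal, slide_up, slide_down):
    """Simplified model of a slide~-style smoother (assumed formula, see lead-in)."""
    y, out = 0.0, []
    for x in signal:
        s = slide_up if x > y else slide_down
        y += (x - y) / s
        out.append(y)
    return out

def samples_to_fall(slide_down, drop_db=20.0):
    """Samples needed for the envelope to fall by drop_db once the input goes silent."""
    target = 10 ** (-drop_db / 20)
    # One full-scale sample with an instant attack, followed by silence.
    env = slide([1.0] + [0.0] * 20000, 1, slide_down)
    return next(i for i, v in enumerate(env) if v <= target)

sr = 44100  # assumed sample rate
for release in (4410, 1500):
    n = samples_to_fall(release)
    print(release, n, round(1000 * n / sr), "ms")
```

At an assumed 44.1 kHz, the envelope takes about 230 ms to fall 20 dB with a release of 4410 and about 78 ms with 1500, which is consistent with closely spaced hits being easier to separate at the shorter setting.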

I've seared most of my experience with calculus from my memory, but to find the derivative of a function you're looking for instantaneous slope, right? In Max, would that amount to first sending the whole thing through a delta~?

to_the_sun's icon

Thought I'd mention that my patch was still sometimes failing to recognize transients that didn't have such a sudden attack, e.g. cymbals, so I also had to raise the attack on my second slide~ from 4410 to 45000 samples. Doing this again caused all sorts of mis-firings, so I also had to raise my threshold to 36 dB. Seems to be working satisfactorily now.

Roman Thilenius's icon

your cymbal probably has far less power and also lower peaks than other input material.
have you thought about processing different parts of the spectrum individually?

to_the_sun's icon

I had not. I've got it working well enough for now I think, but if I decide to come back and refine it I will certainly keep that in mind.