zerox~ as "transient detector"??


    Jan 19 2015 | 9:58 pm
    This is something I've wondered about for a while. The short description of the zerox~ object says:
    zerox~ functions as a zero-crossing counter or transient detector.
    In what sense is it a transient detector? It deals with zero-crossings only, and as such, I don't see how it can be used to detect transients.

    • Jan 20 2015 | 2:11 pm
      The number of zero crossings increases when the frequency increases, and when there's a transient there are often more high frequencies.
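The first half of that claim is easy to check by counting sign changes in pure tones. The sketch below is plain Python rather than a Max patch; `count_zero_crossings` and `sine` are hypothetical helper names, and the counter only approximates what a zero-crossing counter does. It is not the zerox~ implementation.

```python
import math

def count_zero_crossings(samples):
    """Count sign changes between consecutive samples."""
    crossings = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev < 0) != (cur < 0):
            crossings += 1
    return crossings

def sine(freq_hz, n_samples, sample_rate=44100, phase=0.1):
    """A unit-amplitude sine; the small phase offset avoids exact zeros."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]

# One second of audio at each frequency.
low = count_zero_crossings(sine(100, 44100))
high = count_zero_crossings(sine(1000, 44100))
print(low, high)  # each count is roughly 2 * frequency
```

Over one second a sine of frequency f crosses zero about 2f times, so the count tracks frequency and carries no information about level.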
    • Jan 20 2015 | 3:18 pm
      except in the other 50% of possible cases. because if the input material is "bassdrums and toms over background noise" it is the other way round.
    • Jan 20 2015 | 3:55 pm
      the above points both make good sense. maybe if you had an isolated multi-track drum like a single snaredrum over a backing of pure silence, zerox~ could be useful as part of a larger sub-patch. On its own, I'm not sure how useful it would be for this particular task.
    • Jan 20 2015 | 4:12 pm
      yeah, it's not like if everyone agreed on what a transient is ;-)
    • Jan 20 2015 | 4:57 pm
      This paper gives a good idea of some of the techniques being used for onset detection: http://www.nyu.edu/classes/bello/MIR_files/2005_BelloEtAl_IEEE_TSALP.pdf
    • Jan 20 2015 | 5:55 pm
      The number of zero crossings increases when the frequency increases, and when there’s a transient there are often more high frequencies.
      Emmanuel - Yeah, sort of, but it seems to me like that would be coming at the problem of transient/onset detection sideways. That approach has some heuristic validity, but for many non-ideal input signals it would fail, no?
      To wit: "Natural" sound sources (the plucking of a guitar string) do fit that pattern of a brighter transient followed by a darker sustain. However, you could easily create a synthesized sound source that has NO variation in frequency content during its transients, and its onsets are apparent only through changes in amplitude. Because zerox~ has no perception of signal amplitude, it would completely miss the transients in such a signal.
      And thus, my confusion over why the official docs claim that zerox~ is a "transient detector". Show me a robust, working patch that uses zerox~ for this task, and I'll be happy to change my mind. But, short of that, it seems that description might best be removed from the docs, so as not to confuse new users. A "transient detector" that has no awareness of signal amplitude seems a strange idea indeed.
      Peter - Thanks for that link. Years back, I got really into working on beat-tracking algorithms, and I can see some of that same material referenced here (like the Goto and Muraoka paper). My sense is that any real-time onset detection that is NOT using first-order-difference of power envelopes is barking up the wrong tree. I've seen some example patches posted by users here that look for onsets using an absolute threshold, whereas taking the first-difference will give the relative change in signal energy. (And, as mentioned in the paper you linked to, this approach can be further improved by taking the first-difference of log E(n)).
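Both points in this post can be tried on a synthetic signal. The sketch below is plain Python, not a Max patch, and every name in it (`make_signal`, `frames`, `zero_crossings`, `log_energy`, `first_difference`) is hypothetical. It assumes a 1 kHz sine whose amplitude jumps halfway through, 10 ms frames, and a simple mean-square energy per frame; none of those choices come from the thread or the linked paper.

```python
import math

SAMPLE_RATE = 44100
FRAME = 441  # 10 ms frames, an arbitrary choice for this sketch

def make_signal():
    """One second of a 1 kHz sine: quiet for the first half, loud after."""
    out = []
    for n in range(SAMPLE_RATE):
        amp = 0.01 if n < SAMPLE_RATE // 2 else 0.5
        out.append(amp * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE + 0.1))
    return out

def frames(samples, size=FRAME):
    """Split into non-overlapping frames, dropping any short remainder."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]

def zero_crossings(frame):
    """Sign changes inside one frame."""
    return sum((a < 0) != (b < 0) for a, b in zip(frame, frame[1:]))

def log_energy(frame, eps=1e-12):
    """Natural log of the mean-square energy; eps guards against log(0)."""
    return math.log(sum(x * x for x in frame) / len(frame) + eps)

def first_difference(values):
    return [b - a for a, b in zip(values, values[1:])]

fr = frames(make_signal())
zc = [zero_crossings(f) for f in fr]
d_log_e = first_difference([log_energy(f) for f in fr])

# d_log_e[i] compares frame i+1 with frame i, hence the +1.
onset_frame = max(range(len(d_log_e)), key=lambda i: d_log_e[i]) + 1
print("zero crossings per frame (min, max):", min(zc), max(zc))
print("largest log-energy jump enters frame", onset_frame)
```

In this contrived case the zero-crossing count is the same in every frame, while the first difference of log energy has a single large peak at the frame where the amplitude steps up. Real material is far messier, so treat this as an illustration of the argument rather than a detector.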
    • Jan 20 2015 | 7:27 pm
      classic-vocoder uses zerox~ as a transient detector.
    • Jan 20 2015 | 7:57 pm
      So it obviously works, at least in one definition of what a transient is...
    • Jan 20 2015 | 8:00 pm
      Not really - the classic-vocoder uses zerox~ to differentiate between vowels and consonants in speech. Even if you want to call that "transient detection", it works only for the specialized case of speech signals.
    • Jan 20 2015 | 8:13 pm
      My two cents: I generally prefer the relative difference that Leigh mentions (code available in this thread: https://cycling74.com/forums/rolling-buffer-is-it-even-possible ) but it depends on what kind of transient you're trying to observe. I like to use a two-stage threshold for the detection. The dual atodb~ in the patch is not the most efficient way of doing it; you could divide the two relative to each other, but I feel like this approach is somewhat clearer when I use it in my teaching. It also makes it easier to see how you can build a transient ducker/booster.
      Relative difference works well on percussive material, but can be hard to tune where you have a mix of hard and soft onsets, e.g. niente on the clarinet. (perceptually, that's also a hard onset to detect, in fairness...) There are lots of ambiguous cases in music, and the most useful definition probably depends on your application. How do you interpret something like measured tremolo-->unmeasured tremolo-->unmeasured tremolo with first note accented-->smooth roll? In cases like that, one reasonable interpretation might be that there are hierarchies of onsets, some of which are more musically significant than others.
      As Leigh points out, zerox~ doesn't take the amplitude of the signal into account, so it makes that determination regardless of whether any input signal is present or the system is just listening to line/ambient noise. This doesn't matter in the classic-vocoder patch, because the vocoder's output is near 0 at those moments thanks to the amplitude envelopes applied within the channels.
      There are some circumstances where transients happen without necessarily big shifts in amplitude. (there may be some shift, but it's below the threshold, for example) This could be something like string noise as a player shifts from one note to the next, or a slide in pitch. In those cases, soft onset detection might be helpful.
      I like to use onset detection to control input into processes such as multi-tap delays, since it makes them less prone to problems with feedback in a performance context. (it operates like a less extreme noise gate without a fixed threshold)
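The relative-difference idea with a two-stage threshold, described earlier in this post, might look roughly like the following in Python. This is not the patch from the linked thread: the `slide` helper follows the formula documented for slide~ (y[n] = y[n-1] + (x[n] - y[n-1]) / slide), while the slide values, the 6 dB / 2 dB thresholds, and the names `detect_onsets` and `bursts` are all assumptions made for illustration.

```python
import math

def slide(samples, slide_up, slide_down):
    """Smoother using the slide~ formula, with separate up/down amounts."""
    y, out = 0.0, []
    for x in samples:
        y += (x - y) / (slide_up if x > y else slide_down)
        out.append(y)
    return out

def to_db(amplitude, floor=1e-6):
    """Amplitude to dB, clamped so that silence does not hit log10(0)."""
    return 20.0 * math.log10(max(amplitude, floor))

def detect_onsets(samples, on_db=6.0, off_db=2.0):
    """Fire when the fast envelope exceeds the slow one by on_db;
    re-arm only after the difference falls below off_db."""
    rectified = [abs(x) for x in samples]
    fast = slide(rectified, 10, 1000)    # assumed fast follower
    slow = slide(rectified, 2000, 2000)  # assumed slow follower
    onsets, armed = [], True
    for n, (f, s) in enumerate(zip(fast, slow)):
        diff_db = to_db(f) - to_db(s)
        if armed and diff_db > on_db:
            onsets.append(n)
            armed = False
        elif not armed and diff_db < off_db:
            armed = True
    return onsets

def bursts(starts, length, total, sample_rate=44100):
    """Test signal: 1 kHz sine bursts separated by silence."""
    out = [0.0] * total
    for start in starts:
        for k in range(length):
            out[start + k] = 0.5 * math.sin(2 * math.pi * 1000 * k / sample_rate)
    return out

starts = [2000, 14000, 26000]
onsets = detect_onsets(bursts(starts, 4000, 38000))
print(onsets)
```

Because the comparison is between two envelopes of the same signal, the result does not depend on the overall gain. The second, lower threshold stops the detector from retriggering on ripple while a note is still sounding.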
    • Jan 20 2015 | 8:32 pm
      Leigh, I meant if it's looking for transients that are being defined by a lot of high frequencies over a short period of time then it's fine. I realise this doesn't equal all transients, but considering this thread comes from a comment in a help file, and that a lot of MSP objects are multipurpose - I don't see what the bigdeelyis. Obviously transient detection is rly cool though, I have a dabble myself sometimes and have glanced at that paper linked a few times before now.
      edit - just tried to knock something up with it and jongly.aif, and I have no idea how to make it work.
    • Jan 21 2015 | 1:32 am
      >> yeah, it’s not like if everyone agreed on what a transient is ;-)
      or how useful the measurement of transients would be in the form of floating point numbers, haha.
      for the CPU which one single zerox~ lego brick eats, you could as well use an fft-based transient/harmonic analysis. then it works at least independent from the overall gain factor of the input (but is always relative instead), and it is less vulnerable to DC offset.
      maybe the programmers of the holy lego brick environment should focus on the syntax – and leave the semantics to the observer user.
      because, just in case someone missed it, zerox~ is in fact a video decorrelation object for usb printers. why i say this? because i can.
    • Jan 21 2015 | 1:46 am
      lol, i bet the vocoder example is older than the zerox helpfile.
    • Jan 21 2015 | 5:42 pm
      Peter, I like that approach of "hierarchies of onsets". In the end, it of course depends on the needs of a particular application, how nuanced you need your transient data to be. For a simple guitar effect, you probably just need a single level of "onset", which, in Max, is easily represented by a series of "bangs".
      For a patch that responds in different ways to different levels of "onset", you could find a float (say, normalized [0-1]) useful as an "onset score", if your patch responds in a continuously variable way. Or, a simple distinction between "hard onset", and "soft onset" (established by two different threshold levels) would be sufficient if your patch responds in only two primary ways.
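As a rough illustration of the two options in this post, the Python sketch below turns raw detection values into a normalized [0-1] "onset score" and then into a hard/soft label. The function names and the 0.3 / 0.7 thresholds are made up for the example.

```python
def normalize_scores(raw_scores):
    """Scale raw detection values into [0, 1] relative to the largest one."""
    peak = max(raw_scores, default=0.0)
    if peak <= 0.0:
        return [0.0 for _ in raw_scores]
    return [max(s, 0.0) / peak for s in raw_scores]

def classify_onset(score, soft_threshold=0.3, hard_threshold=0.7):
    """Two threshold levels give a three-way result: hard, soft, or no onset."""
    if score >= hard_threshold:
        return "hard"
    if score >= soft_threshold:
        return "soft"
    return None

raw = [0.5, 4.0, 12.0, 1.0]  # hypothetical raw detection values
labels = [classify_onset(s) for s in normalize_scores(raw)]
print(labels)
```

A patch that responds continuously would use the normalized score directly; one with only two behaviours needs just the label.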
    • Jan 21 2015 | 5:47 pm
      Considering this thread comes from a comment in a help file, and that a lot of MSP objects are multipurpose - I don't see what the bigdeelyis
      Mike, to be accurate, it's more than just a comment in a help file, it's part of the primary description of the zerox~ object. And if you look up "transient detector" in the Max Object Thesaurus, it is the ONLY object labeled as such. So, in the absence of other, better-suited and more general-purpose transient detectors, I think it's confusing for people trying to figure out Max.
    • Jan 21 2015 | 7:56 pm
      @Roman: actually, zerox~ was originally a third-party object by Richard Dudas, and it was included in the Max 4.0 distribution on my insistence so that I could use it in the classic-vocoder example patch.
    • Jan 22 2015 | 2:55 pm
      ok, so the egg wins. thanks for sharing the story. :)
    • Oct 30 2015 | 10:57 pm
      I have a quick question. To quote from above:
      My sense is that any real-time onset detection that is NOT using first-order-difference of power envelopes is barking up the wrong tree. I’ve seen some example patches posted by users here that look for onsets using an absolute threshold, whereas taking the first-difference will give the relative change in signal energy. (And, as mentioned in the paper you linked to, this approach can be further improved by taking the first-difference of log E(n)).
      In terms of Max, would the way to apply "log E(n)" to the signal you're dealing with simply be to send it through a [atodb~]?
    • Oct 31 2015 | 5:48 am
      Using atodb~ wouldn't be the same, since amplitude to decibel conversion involves an extra multiply. But in terms of Max, you could use [expr log].
      I can try to dig up an example, from when I was working on beat detection stuff...
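For what it's worth, the extra multiply is a constant: 20 * log10(x) equals (20 / ln 10) * ln(x), so first differences taken in dB are the natural-log differences scaled by about 8.686. The Python sketch below checks that numerically. `atodb` here is a stand-in that assumes the usual 20 * log10 amplitude-to-decibel conversion, and the energy values are invented.

```python
import math

def atodb(amplitude):
    """Stand-in for amplitude-to-decibel conversion: 20 * log10(a)."""
    return 20.0 * math.log10(amplitude)

def first_difference(values):
    return [b - a for a, b in zip(values, values[1:])]

energy = [0.001, 0.002, 0.05, 0.04, 0.01]  # made-up E(n) values

diff_ln = first_difference([math.log(e) for e in energy])
diff_db = first_difference([atodb(e) for e in energy])

scale = 20.0 / math.log(10.0)  # constant factor between the two
for d_ln, d_db in zip(diff_ln, diff_db):
    print(round(d_ln, 4), round(d_db, 4), round(d_db / d_ln, 4))
```

The peaks land in the same places either way; only the numeric threshold would need rescaling.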
    • Nov 01 2015 | 1:49 am
      That would be great if you could. I threw my own transient detector together before I even thought to look online for tips, and now I'm trying to clean it up a bit and get it working the best it can.
    • Nov 02 2015 | 10:50 pm
      You bet, I dug up some old onset detection patches and put this demo together.
      There's a gswitch in the middle of the patch that switches between "plain" first-order difference, and first-order difference of logs.
      Note that the [deltaclip~] object used in each detector patcher will have quite an effect on the detection. I'm not sure those are the ideal values, or the ideal method, of shaping a waveform into an amplitude envelope.
    • Nov 03 2015 | 2:02 am
      Thanks mayne. It's nice to see as many examples of how people go about this as possible.
      In messing with all these patches it seems to me that adding a noise floor threshold is essential for really making the transients pop out. So below a certain level it reads only silence and it does its analysis above that level. Somehow I feel like it's kind of cheating, but I really can't get any of these patches to work reliably without it.
      Here's what I came up with originally. It works pretty well for generating MIDI from percussive audio events. I guess my next experiment will be to see if it can be improved by using a differential instead of an absolute amplitude threshold, although mine is a bit more dynamic than that. It's a moving threshold that tends to zero in on an ideal point. Have a look and see what I mean. Critique welcome.
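One way to see why the noise-floor threshold mentioned in this post helps: a relative (log) measure treats a jump from almost nothing to slightly more than nothing as a large change. The Python sketch below uses a hypothetical `change_db` helper and invented levels; clamping both values to a floor before comparing is just one possible form of gate.

```python
import math

def change_db(previous, current, noise_floor=0.0, eps=1e-12):
    """Relative level change in dB, with both values clamped to a floor."""
    p = max(abs(previous), noise_floor, eps)
    c = max(abs(current), noise_floor, eps)
    return 20.0 * math.log10(c / p)

# Tiny fluctuation in near-silence: looks like a 20 dB jump without a floor.
ungated = change_db(1e-6, 1e-5)
# Same fluctuation with a floor at 0.001: no change at all.
gated = change_db(1e-6, 1e-5, noise_floor=1e-3)
# A real hit still stands out clearly above the floor.
real_hit = change_db(1e-6, 0.5, noise_floor=1e-3)
print(ungated, gated, real_hit)
```

Seen this way the floor is less a cheat than a statement of which level changes are worth measuring.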
    • Nov 04 2015 | 5:33 pm
      The patch I posted above should be pretty impervious to noise. For a general-purpose onset detector, I would try to avoid as many "absolute" thresholds as possible, and work with relative change and relative thresholds. (The patch I posted doesn't follow that idea exactly, since it includes an absolute threshold for firing the [click~] sound.)
    • Nov 05 2015 | 11:29 pm
      May I ask what benefit there is in using a squaring [*] object instead of [abs~] in your patch?
    • Nov 06 2015 | 4:39 pm
      Yes, it is because we are looking for onset energy, and "The energy in a wave is proportional to its amplitude squared."
      The reasoning behind this is a bit more nuanced than I can explain well, as it's been quite a few years since physics class...
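A small numeric check of the practical difference, in Python with hypothetical helper names: doubling the amplitude of a tone doubles the mean of the rectified signal but quadruples the mean of the squared signal, so squaring gives louder events proportionally more weight.

```python
import math

def mean_abs(samples):
    """Average of the rectified signal, as [abs~] plus averaging would give."""
    return sum(abs(x) for x in samples) / len(samples)

def mean_square(samples):
    """Average of the squared signal, i.e. mean power."""
    return sum(x * x for x in samples) / len(samples)

def tone(amplitude, n=4410, freq=1000, sample_rate=44100):
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(n)]

quiet, loud = tone(0.25), tone(0.5)
abs_ratio = mean_abs(loud) / mean_abs(quiet)
sq_ratio = mean_square(loud) / mean_square(quiet)
print(abs_ratio, sq_ratio)
```

After a log both measures behave the same up to a factor of two, so the choice matters mostly when thresholds are applied on a linear scale.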
    • Nov 09 2015 | 11:45 pm
      Yes, it does seem that the differential method will ignore any background noise you have, but it also seems like it can absorb some of the transients you're looking for. So for example in a series of drumbeats placed relatively closely, the first will stand out much more prominently than the rest, as the others are masked by the residual noise of those before them and some may fail to be recognized at all.
      At this point I have two different versions of my device: my old one and a new differential one. I'm having trouble deciding on which to use. Both work pretty well generally, but the old tends to wind up with some superfluous notes created, while the new tends to miss about as many.
    • Nov 10 2015 | 2:56 am
      It's a matter of adjusting the attack/release settings for the two envelope followers. If you're having trouble detecting repeated onsets, try shortening the release times or decreasing the distance required to detect a transient. (I'd recommend the first over the second)
      There are other things you can look at as well, such as the derivative of the amplitude envelope (or even its second derivative).
    • Nov 14 2015 | 12:10 am
      You were right Peter. I shortened the release on my first slide~ from 4410 to 1500 samples and started getting the response I was looking for. Only problem was that caused all sorts of messy false triggerings as well, even (if not especially) during complete silence. But I fixed that by raising my thresh~ threshold from 18 to 30 (dB). Here's what I'm working with if you're curious.
      I've seared most of my experience with calculus from my memory, but to find the derivative of a function you're looking for instantaneous slope, right? In Max, would that amount to first sending the whole thing through a delta~?
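In discrete time the instantaneous slope is usually approximated by the difference between successive samples, which is what a sample-difference object such as delta~ outputs. A minimal Python sketch of that idea follows; the envelope values are invented and `slope_per_second` is a hypothetical helper.

```python
def delta(samples):
    """Difference between each sample and the one before it.
    The first output assumes a previous sample of 0."""
    previous, out = 0.0, []
    for x in samples:
        out.append(x - previous)
        previous = x
    return out

def slope_per_second(samples, sample_rate=44100):
    """Scale per-sample differences to an approximate derivative in units/second."""
    return [d * sample_rate for d in delta(samples)]

envelope = [0.0, 0.1, 0.3, 0.6, 0.6, 0.5]  # made-up amplitude envelope
print(delta(envelope))
```

Positive values mean the envelope is rising, zero means it is flat, and negative values mean it is falling. Applying `delta` twice gives the second difference.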
    • Nov 28 2015 | 12:08 am
      Thought I'd mention that my patch was still sometimes failing to recognize transients that didn't have such a sudden attack, e.g. cymbals, so I also had to raise the attack on my second slide~ from 4410 to 45000 samples. Doing this again caused all sorts of mis-firings, so I also had to raise my threshold to 36 dB. Seems to be working satisfactorily now.
    • Nov 28 2015 | 3:02 am
      your cymbal probably has far less power and also lower peaks than other input material. have you thought about processing different parts of the spectra individually?
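A rough Python sketch of why splitting the spectrum can help with a quiet cymbal: a weak high-frequency event barely changes the full-band energy when a loud low tone is present, but it changes the high-band energy a lot. The signal (a 100 Hz tone plus a quiet 8 kHz component), the 2 kHz one-pole split, and all function names are assumptions chosen for the demonstration.

```python
import math

SR = 44100

def one_pole_lowpass(samples, cutoff_hz, sample_rate=SR):
    """Simple one-pole lowpass; the high band is taken as input minus this."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def mean_square(samples):
    return sum(x * x for x in samples) / len(samples)

def change_db(before, after):
    """Energy change between two windows, in dB."""
    return 10.0 * math.log10(mean_square(after) / mean_square(before))

onset = 11025  # the quiet high-frequency event starts here
signal = []
for n in range(22050):
    x = 0.5 * math.sin(2 * math.pi * 100 * n / SR)        # loud low tone
    if n >= onset:
        x += 0.05 * math.sin(2 * math.pi * 8000 * n / SR)  # quiet "cymbal"
    signal.append(x)

low = one_pole_lowpass(signal, 2000)
high = [x - l for x, l in zip(signal, low)]

# Compare a window before the event with a window after it.
full_change = change_db(signal[4410:8820], signal[13230:17640])
high_change = change_db(high[4410:8820], high[13230:17640])
print(round(full_change, 3), round(high_change, 3))
```

The full-band change is a small fraction of a dB, while the high band jumps by several dB, so a detector running on the high band alone would have a much easier time.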
    • Nov 28 2015 | 5:25 am
      I had not. I've got it working well enough for now I think, but if I decide to come back and refine it I will certainly keep that in mind.