First, get the sound you make into Max! :)
Then measure the amplitude and set a threshold to decide whether you hit the drum or it is just noise.
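
The threshold idea can be sketched outside Max too. This is only a rough Python illustration of the logic, not a Max patch: the names `rms` and `detect_hits`, the block size of 64 samples and the threshold of 0.3 are all made up for the example, so tune them to your own drum and room.

```python
import math

def rms(block):
    """Root-mean-square level of one block of samples (floats in -1..1)."""
    return math.sqrt(sum(s * s for s in block) / len(block))

def detect_hits(samples, block_size=64, threshold=0.3):
    """Return the indices of the blocks where the level rises above the threshold.

    Only the upward crossing counts, so one drum hit gives one trigger
    instead of one trigger per block while the sound is still loud.
    block_size and threshold are assumed values, not anything Max prescribes.
    """
    hits = []
    was_above = False
    for i in range(0, len(samples) - block_size + 1, block_size):
        level = rms(samples[i:i + block_size])
        is_above = level > threshold
        if is_above and not was_above:
            hits.append(i // block_size)
        was_above = is_above
    return hits
```

Anything below the threshold is treated as background noise and ignored, and each returned index is a moment where you could trigger your event.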
Get the video into Max with [jit.grab] or [jit.movie].
There are a lot of Jitter objects that can apply an effect to a video stream in Max. Using the sound amplitude,
you can trigger an event or a chain of events, or just follow the sound amplitude and change the parameters of the effect you want to apply!
Here the image is multiplied by the amplitude of the input sound, so the image fades in and out with the level of the sound.
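
If it helps to see the multiplication written out, here is a small Python sketch of the same idea, assuming a grayscale frame stored as rows of 0..255 values and a sound level between 0 and 1. The function name `fade_frame` is hypothetical, and this is not how Jitter stores a matrix internally, it only shows the arithmetic.

```python
def fade_frame(frame, amp):
    """Scale every pixel of a grayscale frame by the sound level.

    frame: list of rows, each a list of ints in 0..255 (assumed layout)
    amp:   sound level; clamped to 0.0..1.0 so the image never gets
           brighter than the original
    """
    gain = max(0.0, min(1.0, amp))
    # int() truncates, so 255 * 0.5 becomes 127
    return [[int(p * gain) for p in row] for row in frame]
```

With a level of 0 the frame goes black, with a level of 1 it is unchanged, and everything in between is a fade.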
You can start from this.
Hope it gives you an idea...