I'm replying to my own post with 2 important answers:
1) It's all about the phase plane. The IFFT method relies on an image with 2 planes: amplitude-difference and phase-difference. The phase-difference plane is inherently missing from whatever image you intend to sonify. If it's empty, the result is robotic and choppy. If you fill it with noise, the smoothness improves marginally. If you scale that noise between 0 and pi, it improves dramatically. I ran into this nugget of info in a paper or forum post somewhere, but I forgot where. Sorry! (Demo patch is attached, where you can load an image and toggle between phase-plane fill modes: "matrix2sound test ZLP-PiNoise.maxpat")
2) The smoothest image sonification patches I've made thus far have used an FFT filter applied to a noise source. In most cases this seems to result in more natural-sounding translations than the IFFT method in 1), at the cost of high CPU usage. (Demo patch is attached: "matrix2sound test ZLP-FFT-filter.maxpat")
You can find the same basic architecture in...
Help menu / examples / fft-fun / forbidden-planet
... but driven by a multislider instead of a matrix.
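The idea behind that architecture can also be sketched outside Max. This is a minimal Python illustration, not a port of the attached patch or of forbidden-planet: the names `dft` and `filter_noise_frame` are hypothetical, and the naive DFT (standard library only) is only practical for small frames. It takes one frame of white noise, scales each frequency bin by the brightness of the matching pixel in one image column, and transforms back. The noise supplies the phases, so no phase plane has to be invented.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def filter_noise_frame(gains, rng=None):
    """Shape one white-noise frame with per-bin gains from an image column.

    gains[k] is the pixel brightness for bin k (bin 0 = DC).
    Returns 2 * len(gains) real samples.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    n_bins = len(gains)
    n = 2 * n_bins
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    spectrum = dft(noise)
    for k in range(n_bins):
        spectrum[k] *= gains[k]
        if k > 0:
            # Scale the mirrored negative-frequency bin by the same gain
            # so the result stays real.
            spectrum[n - k] *= gains[k]
    spectrum[n_bins] = 0j  # Nyquist bin is not covered by the column; mute it
    return [v.real for v in dft(spectrum, inverse=True)]


# One column of a hypothetical 8-pixel-tall image: only two bright pixels.
column = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.0]
frame = filter_noise_frame(column)
```

A real patch would do this per column with overlapping windowed frames and fresh noise each time; this sketch only shows the per-frame filtering step.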
Also see ARSS (The Analysis & Resynthesis Sound Spectrograph)...
... an open source command-line utility with great examples. It later became Photosounder, the commercial software I mentioned in my original post.
DISCLAIMER: I'm no FFT expert. These patches did the job for me, but maybe some gurus will chime in to correct my mistakes.
Well done, ZLP. I would even scale the noise to [0; 2PI] (or [-PI; PI]) instead of [0; PI].
To hear the differences even more dramatically, make a version of your patch with a smaller FFT size: 2048 or 1024.
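For anyone who wants to compare the phase-plane fill modes from this thread outside Max, here is a minimal Python sketch. It is not a translation of the attached patch: the function name `column_to_frame`, the mode names, and the naive inverse DFT (standard library only, fine for small frames) are all assumptions made for illustration. It treats one image column as bin magnitudes, fills the missing phases with zeros, noise in [0; PI], or noise in [-PI; PI], and inverse-transforms to one audio frame.

```python
import cmath
import math
import random


def column_to_frame(magnitudes, phase_mode="pi_noise", rng=None):
    """Turn one image column (bin magnitudes, DC first) into a real audio frame.

    phase_mode: "zero" (empty phase plane), "pi_noise" (noise in [0, pi]),
    or "two_pi_noise" (noise in [-pi, pi]). These names are hypothetical.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    n_bins = len(magnitudes)
    n = 2 * n_bins  # frame length in samples
    if phase_mode == "zero":
        phases = [0.0] * n_bins
    elif phase_mode == "pi_noise":
        phases = [rng.uniform(0.0, math.pi) for _ in range(n_bins)]
    elif phase_mode == "two_pi_noise":
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_bins)]
    else:
        raise ValueError("unknown phase_mode: %r" % phase_mode)

    # Build a Hermitian-symmetric spectrum so the inverse transform is real.
    spectrum = [0j] * n
    spectrum[0] = complex(magnitudes[0], 0.0)  # DC has to be real
    for k in range(1, n_bins):
        value = cmath.rect(magnitudes[k], phases[k])
        spectrum[k] = value
        spectrum[n - k] = value.conjugate()
    # The Nyquist bin (index n_bins) is left at zero.

    # Naive O(n^2) inverse DFT.
    frame = []
    for t in range(n):
        acc = 0j
        for k in range(n):
            acc += spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
        frame.append(acc.real / n)
    return frame


# Compare peak levels for a flat, hypothetical 16-pixel column.
column = [1.0] * 16
for mode in ("zero", "pi_noise", "two_pi_noise"):
    peak = max(abs(s) for s in column_to_frame(column, mode))
    print(mode, round(peak, 4))
```

With every phase at zero, all bins peak together at the first sample of the frame, so the frame's energy piles up into a click at its start. That is one plausible reason for the robotic, choppy result; randomizing the phases spreads the same energy across the frame.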