I'm replying to my own post with 2 important answers:
1) It's all about the phase plane. The IFFT method relies on an image with 2 planes: amplitude-difference and phase-difference. Whatever image you intend to sonify has no phase-difference plane of its own, so you have to supply one. If you leave it empty, the result is robotic and choppy. If you fill it with noise, the smoothness improves marginally. If you scale that noise to the range 0 to pi, it improves dramatically. I ran into this nugget of info in a paper or forum post somewhere, but I forgot where. Sorry! (A demo patch is attached where you can load an image and toggle between phase plane fill modes: "matrix2sound test ZLP-PiNoise.maxpat")
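For anyone who would rather read the idea as code than as a patch, here is a rough Python sketch of the three fill modes. It is not a port of the attached patch. It assumes each image column holds one amplitude per FFT bin, and that the phase-difference value is accumulated into a running phase per bin, phase-vocoder style. The names `sonify`, `phase_delta` and `inverse_dft_real` are made up for this example, and the frames are simply concatenated with no windowing or overlap.

```python
import cmath
import math
import random


def phase_delta(mode, rng):
    """Per-frame phase-difference value for one bin (fill modes are assumptions)."""
    if mode == "empty":
        return 0.0                       # empty phase plane
    if mode == "noise":
        return rng.random()              # unscaled noise in [0, 1)
    if mode == "pi_noise":
        return rng.random() * math.pi    # noise scaled to [0, pi)
    raise ValueError("unknown mode: " + mode)


def inverse_dft_real(half_spectrum):
    """Naive inverse DFT. Takes bins 0..n_bins-1, returns 2*n_bins real samples."""
    n_bins = len(half_spectrum)
    n = 2 * n_bins
    full = [0j] * n
    full[0] = complex(half_spectrum[0].real, 0.0)   # DC must be real
    for k in range(1, n_bins):
        full[k] = half_spectrum[k]
        full[n - k] = half_spectrum[k].conjugate()  # mirror so the output is real
    # The Nyquist bin (index n_bins) is left at zero.
    return [
        sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def sonify(columns, mode="pi_noise", seed=0):
    """Turn a list of image columns (amplitude per bin) into audio samples."""
    rng = random.Random(seed)
    running_phase = [0.0] * len(columns[0])
    samples = []
    for column in columns:
        half = []
        for k, amp in enumerate(column):
            running_phase[k] += phase_delta(mode, rng)   # accumulate phase difference
            half.append(cmath.rect(amp, running_phase[k]))
        samples.extend(inverse_dft_real(half))           # frames are just concatenated
    return samples
```

With `mode="empty"` every frame built from the same column comes out identical, which is one plausible reason for the robotic quality. With `mode="pi_noise"` the phase of each bin drifts from frame to frame.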
2) The smoothest image sonification patches I've made so far use an FFT filter applied to a noise source. In most cases this seems to give more natural-sounding translations than the IFFT method in 1), at the cost of high CPU usage. (Demo patch is attached: "matrix2sound test ZLP-FFT-filter.maxpat")
You can find the same basic architecture in...
Help menu / examples / fft-fun / forbidden-planet
... but driven by a multislider instead of a matrix.
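Outside of Max, the noise-filter approach from 2) could be sketched roughly like this in Python. It is a simplified illustration, not a port of the attached patch or of forbidden-planet. It assumes each image column is used as a per-bin gain on the spectrum of a white-noise frame, and it leaves out the windowing and overlap a real FFT filter would use. The names `dft` and `filter_noise_frame` are made up for this example.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive DFT (or inverse DFT) of a list of real or complex values."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def filter_noise_frame(column, rng):
    """Shape one frame of white noise with one image column used as bin gains."""
    n_bins = len(column)
    n = 2 * n_bins
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    spectrum = dft(noise)
    gains = [0.0] * n                    # the Nyquist bin keeps a gain of zero
    for k, gain in enumerate(column):
        gains[k] = gain
        if k > 0:
            gains[n - k] = gain          # same gain on the mirrored bin
    shaped = [s * g for s, g in zip(spectrum, gains)]
    return [v.real for v in dft(shaped, inverse=True)]
```

Here the noise source supplies the phases, so there is no phase plane to fill in by hand. Calling `filter_noise_frame(column, random.Random(0))` once per image column gives one block of samples per column.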
Also see Patch 28 from the BazTutorials Patch-A-Day series on YouTube: