Well, you don't, literally. You feed the positions of the objects in your 3D world into an audio positioning/spatialization system (ambisonics or something else). Rendering 3D graphics and rendering audio will be two separate subsystems of your app. Generally, you calculate the position of the sound source relative to the listener, which is usually placed at your 3D camera. This gives you a direction (expressed as angles or coordinates) and/or a distance, which you can map to audio controls, anywhere from simple panning and volume control to full ambisonics.
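To make the "relative position to audio controls" step concrete, here is a minimal sketch in plain Python rather than a Max patch. Everything in it is an assumption for illustration: the names `Vec3`, `relative_direction`, and `stereo_gains` are hypothetical, it assumes OpenGL-style axes (+x right, +y up, listener facing -z at zero yaw), it only handles listener yaw (no pitch or roll), and it uses simple equal-power stereo panning with 1/distance attenuation instead of ambisonics or anything COSM does.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec3:
    x: float
    y: float
    z: float


def relative_direction(listener_pos, listener_yaw, source_pos):
    """Return (azimuth, elevation, distance) of a source relative to the listener.

    Assumed conventions (hypothetical): +x right, +y up, and the listener faces
    -z when listener_yaw == 0. Yaw is in radians, positive = turned left
    (counterclockwise seen from above). Azimuth 0 = straight ahead,
    positive azimuth = to the listener's right.
    """
    dx = source_pos.x - listener_pos.x
    dy = source_pos.y - listener_pos.y
    dz = source_pos.z - listener_pos.z
    c, s = math.cos(listener_yaw), math.sin(listener_yaw)
    # Undo the listener's yaw so the offset is expressed in listener space.
    x_local = dx * c - dz * s
    z_local = dx * s + dz * c
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.atan2(x_local, -z_local)
    elevation = math.atan2(dy, math.hypot(x_local, z_local))
    return azimuth, elevation, distance


def stereo_gains(azimuth, distance, ref_distance=1.0):
    """Map azimuth/distance to (left, right) gains.

    Equal-power panning plus 1/distance rolloff; sources closer than
    ref_distance are not boosted above unity gain.
    """
    # Stereo can't tell front from back, so mirror rear sources into the front.
    if azimuth > math.pi / 2:
        azimuth = math.pi - azimuth
    elif azimuth < -math.pi / 2:
        azimuth = -math.pi - azimuth
    pan = azimuth / (math.pi / 2)          # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4      # 0 .. pi/2
    attenuation = ref_distance / max(distance, ref_distance)
    return math.cos(theta) * attenuation, math.sin(theta) * attenuation


if __name__ == "__main__":
    listener = Vec3(0.0, 0.0, 0.0)   # e.g. the camera position
    source = Vec3(2.0, 0.0, -2.0)    # ahead of and to the right of the listener
    az, el, dist = relative_direction(listener, 0.0, source)
    left, right = stereo_gains(az, dist)
    print(f"azimuth={math.degrees(az):.1f} deg, distance={dist:.2f}, "
          f"L={left:.3f}, R={right:.3f}")
```

In a real patch you would recompute this every frame from the camera and object positions and send the results to your panner. For ambisonics you would instead pass the azimuth, elevation, and distance to the encoder rather than collapsing them to two gains.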
You could also check out COSM, which integrates 3D graphics and audio quite extensively: http://www.allosphere.ucsb.edu/cosm/ (I think its development stopped at Max 5, though chances are it runs fine in Max 6.)