Audio Workflow for VR (Max Worldmaking package)

Tobias Rosenberger:

Hi,
after some time playing with Unity I've come back to Jitter for a new VR project. Just curious: does a good workflow for spatial audio in VR/Jitter already exist? In Unity they use, among other things, ambisonics with https://developers.google.com/vr/concepts/spatial-audio
Last time I used Jitter I made use of the HOA externals, but I always had some sync/performance problems...

LSka:

In my opinion, IRCAM's spat~ is a really good resource: http://forumnet.ircam.fr/product/spat-en/
It's not free, but totally worth the price!

Mathieu Chamagne:

I tested and compared many binaural patches, externals and plugins, and for now my favorite one is the OculusSpatializer VST plugin.
https://developer.oculus.com/downloads/package/oculus-audio-sdk-plugins/
Sounds great, and is very lightweight.

Andrew Luck:

Curious what folks are using when integrating headset tracking?

vichug:

(Just as an addendum to Mathieu Chamagne's message: apparently the VST has changed location; it should now be here: https://developer.oculus.com/downloads/audio/ . And @Andrew Luck, the Oculus SDK is probably a good place to start for headset tracking?)

Andrew Luck:

The Oculus VST has a static listener. I suppose you could do the math on the emitter to account for the pitch/yaw/roll and XYZ of the headset, but it sure would be nice if this were already there. After glancing at the latest build of the VR package on GitHub, I see some HRTF stuff in there... perhaps audio has been included!? Will check today and let you know :D
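To be concrete, something like this gen~ codebox sketch is the math I mean -- untested, and assuming the head pose arrives as a position plus an orientation quaternion (all the param names here are made up):

```
// gen~ codebox sketch: transform a world-space emitter position into
// head-relative coordinates, so a static-listener spatializer can be
// fed as if the listener were moving. All param names are hypothetical.
Param hx(0); Param hy(0); Param hz(0);       // headset position
Param qw(1); Param qx(0); Param qy(0); Param qz(0); // headset orientation quaternion
Param ex(0); Param ey(0); Param ez(1);       // emitter world position

// translate into head-centred space
vx = ex - hx;  vy = ey - hy;  vz = ez - hz;

// rotate by the conjugate quaternion (-qx,-qy,-qz, qw):
// v' = v + 2*u x (u x v + w*v), with u = conjugate axis
ux = -qx;  uy = -qy;  uz = -qz;
cx = uy*vz - uz*vy + qw*vx;
cy = uz*vx - ux*vz + qw*vy;
cz = ux*vy - uy*vx + qw*vz;
out1 = vx + 2*(uy*cz - uz*cy);   // head-relative x
out2 = vy + 2*(uz*cx - ux*cz);   // head-relative y
out3 = vz + 2*(ux*cy - uy*cx);   // head-relative z
```

The head-relative position could then drive whatever static-listener spatializer you like.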

kcoul:

FYI, I am working on adapting the Envelop project for stand-alone use in Max (without Ableton), with the M4L panners re-implemented as bpatcher widgets, and with good cross-functionality with the Max Worldmaking package.

It is based on the HOA externals, but to my knowledge no one has reported the performance issues you described. If you could provide more detail, it would be much appreciated, as the Envelop project depends on them so heavily.

I have been hoping to coordinate with Graham Wakefield on this but have not been able to reach him here or by email for about 6 months. Hopefully he will see this thread.

Matthew Gantt:

Chiming in to follow. Also, I've had some decent(ish) luck just sending object locations to the 2d.map object in the HOA ambisonic library, then grabbing head rotation (via the new VR package) and using it to rotate the soundfield. Bit of a kludge, but it works!
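In case it helps anyone trying the same trick: the yaw part of the rotation is just a 2D rotation of the X/Y ambisonic channels. For first-order B-format, a gen~ codebox version would look something like this (sign and channel conventions vary between ambisonic toolkits, so treat it as a sketch):

```
// gen~ codebox sketch: rotate a first-order B-format soundfield by yaw.
// in1 = W, in2 = X, in3 = Y; yaw in radians from the head tracker.
// Sign/channel conventions differ between ambisonic libraries.
Param yaw(0);
c = cos(yaw);
s = sin(yaw);
out1 = in1;              // W is omnidirectional: unchanged
out2 = c*in2 + s*in3;    // X'
out3 = c*in3 - s*in2;    // Y'
```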

Jonas Magnussen:

Just curious, what kind of project were you working on, if I might ask? Audiovisuals in VR sound amazing, but the issue seems to be that only one person can experience it at a time. My broader question, I suppose, is: how is VR art being presented right now?

Enrico Wiltsch:

https://facebook360.fb.com/spatial-workstation/

Matthew Gantt:

Hey Enrico - good call - does the FB360 workstation support head tracking, though? I feel like it didn't the last time I used it, but it's probably changed a lot since then.

@Jonas - well said, all of the above! Personally, I'm not super worried about the 'solitary' nature of the VR experience - for me it's maybe a continuation of listening to a record or reading a book, metaphor-wise. (I do think galleries could be more imaginative with the installs for these pieces, though, as the presentation often takes the form of a Disneyland-esque line for a rollercoaster.)

FWIW, here's a little documentation of a piece I did a while back - VR in Unity, but it used a bunch of OSC sent from Ableton and Max for Live to control and move objects + generative sound, then sent OSC back to Max/MSP to report object location and head tracking/position + ambisonically pan the audio:

Graham Wakefield:

Yep, if you're pulling the devel branch of the VR package you'll see I've been exploring some spatial audio too, but bear in mind this is all work in progress and probably a bit of a mess. Once it's working well and cleaned up I'll shift it to the master branch & update the version in the package manager. For the adventurous, I'll describe what I've done & where I'm hoping to go -- hoping someone might want to pitch in!

There are so many options out there, but mainly I've been seeing if I can get an object working for direct-path HRTF, doing as much of the spatialization in MSP/gen~ as possible. Partly this is because I think Max's strength is the freedom to experiment with different ideas. However, I think it's also vital to have something in the VR package that 'just works' for demonstration purposes, which means it should also be platform- and license-friendly. E.g. Oculus's VST plugins are unfortunately 64-bit only and I'm not sure about the license, but more importantly it's all packaged up in a plugin, which means you can't get too experimental with it. I've also looked into the Oculus Audio SDK, though I can't go much further until they make it public. It would be great to look into Google's Resonance too, but that doesn't have a C/C++ SDK yet either. I'll also look at spat~ soon, though that's only going to be useful to forumnet members.

I also tried plugging into the HOAlib externals, but I found the HRTF not as spatially clear. The idea of rendering all sources to an ambisonic soundfield and then panning that is appealing in principle, but in everything I've tried so far the result is spatially muddier than direct HRTF encoding of each source (of course, that could just be the shape of my ears). It's not difficult to make each source relative to the head pose before encoding, rather than rotating a soundfield, and for the things I've tried the CPU cost of the HRTF didn't seem that high, so I'm focusing on that for now. An ambisonic->HRTF path could be added later for larger numbers of sound sources.

Right now the most progress I've had is via the Steam Audio API. Although it's not perfect by any means, and there have been a few bugs (most of them ironed out with the last release), the library is in active development, covers all platforms, and has a suitable license. There's also some room simulation stuff in there that would be great to get into later. The direct HRTF encoding has a little noticeable shifting between HRTF slices that I think could be eliminated, but it seems to originate deeper in the Steam Audio library and I haven't been able to get rid of it yet. I also found that the Steam library was not encoding direction per ear (this makes a quite perceivable difference when sound sources are closer than about a meter), so I added an option to compute distance and angle separately and run a separate HRTF for each ear. Short-range sound sources are appealing for VR usage, and it definitely sounds better with separate ear paths.
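For the curious, the geometry of the per-ear option is cheap enough to prototype in gen~ directly. Roughly, as a codebox sketch (the 0.09 m ear offset and the axis/azimuth conventions here are assumptions, not necessarily what the external uses):

```
// gen~ codebox sketch: per-ear distance and azimuth for a head-relative
// source at (sx, sy, sz), ears offset along the head's x (interaural) axis.
Param sx(0); Param sy(0); Param sz(1);   // head-relative source position
Param earoffset(0.09);                   // ~half the interaural distance, metres

lx = sx + earoffset;                     // source x relative to left ear
rx = sx - earoffset;                     // source x relative to right ear
distL = sqrt(lx*lx + sy*sy + sz*sz);
distR = sqrt(rx*rx + sy*sy + sz*sz);
out1 = distL;
out2 = distR;
out3 = atan2(lx, sz);                    // left-ear azimuth, radians
out4 = atan2(rx, sz);                    // right-ear azimuth, radians
```

Each distance/azimuth pair then feeds its own HRTF path.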

Other than that I've been exploring direct distance cues (amplitude, air absorption filtering, Doppler delay, reverb mix) via gen~. This should be independent of whatever HRTF library is being used anyway. So far so good. The only stumble I'm having is in interpolating distance for the sake of Doppler, which I'll describe below in the hope that someone has a good idea or the motivation to help!

Basically, since we're receiving head-tracking updates at approximately 90 Hz, we need to somehow interpolate/approximate intermediate poses to create a smooth distance curve for each sample at 44.1 kHz or 48 kHz. A linear interpolation is passable for amplitude attenuation, but unusable for Doppler -- the sharp transients at each linear segment result in a step-like pitch shift of the delayed signal (like a robotic modulation). The ideal would be an interpolating filter that produces a very smooth output (minimal 2nd derivative, minimal ripple). It doesn't even have to exactly hit the data points being received, since it's less perceptually important that the distance be accurate (tracking data has temporal jitter anyway) than that the changes be continuously smooth.

I thought a Kalman filter might fit the task, but I honestly can't figure out how to write it. I've been messing with another filter design that is similarly based on prediction & gradual error correction, and it seems to do OK mostly, but it still blows up sometimes. So far I've been doing the filtering on the raw distance data, but I guess there might be an advantage to filtering the object and head pose data instead, since we might expect certain consistencies in object and human trajectories. Anyway, if anyone has interest & ideas in this regard, they'd be very welcome!
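For concreteness, here's a minimal gen~ codebox sketch of the Doppler path with the simplest smoother worth trying -- two cascaded one-pole lowpasses on the stepped distance. Two poles make the first derivative continuous, which is exactly what removes the step-like pitch jumps; the cutoff value is a placeholder and trades lag against smoothness:

```
// gen~ codebox sketch: smooth a stepped distance parameter (updated ~90 Hz
// by the head tracker) into a continuously differentiable delay time.
// Two cascaded one-poles give a continuous 1st derivative, so the Doppler
// pitch no longer jumps at each tracker update. Values are placeholders.
Param dist(1);              // raw distance in metres, stepped at ~90 Hz
Param cutoff(20);           // smoothing cutoff in Hz: lower = smoother, laggier
Param speedofsound(343);    // metres per second

History z1(1);              // one-pole states, initialized to 1 m
History z2(1);
Delay doppler(96000);       // max ~2 s at 48 kHz, i.e. ~686 m of distance

coef = 1 - exp(-twopi * cutoff / samplerate);
z1 = z1 + coef * (dist - z1);    // first one-pole: continuous value
z2 = z2 + coef * (z1 - z2);      // second one-pole: continuous slope

doppler.write(in1);
delaysamps = z2 * samplerate / speedofsound;
out1 = doppler.read(delaysamps, interp="cubic");
out2 = z2;                       // smoothed distance, e.g. for attenuation
```

Unlike a predictive filter, a one-pole cascade can't blow up; the cost is that it always lags slightly behind the true distance, which seems far less objectionable than the stepped pitch.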

Phivos-Angelos Kollias:

Dear Max people! Any new ideas or experiences in 2019?

Phivos-Angelos Kollias:

I've been using Spat5 with Unity through OSC mapping (Max does all the sound and Unity all the visuals). The problem is that Spat5, being proprietary, cannot function in stand-alone versions of patches (collectives/apps). It is included, but if the computer running it does not have a licence, it does not function.

tcarpent:

FYI, the Spat5 license has recently changed, and it can now be freely downloaded:
https://beta.forum.ircam.fr/article/detail/quoi_de_neuf/
http://forumnet.ircam.fr/ircam-forum-license/?lang=en

Also, regardless of the license, it should work in a stand-alone.

Phivos-Angelos Kollias:

Thanks @tcarpent for informing me about the availability of Spat's licence! That is good news. However, I had issues building a collective with Spat, which is why I started looking at other options. We started discussing this on the Spat forum. Still no solution on my side, though. I'll make another attempt with the developers today.

Jose K Sani:

@Phivos, did you manage to set up "Spat5 with Unity through OSC mapping (Max does all the sound and Unity all the visuals)"? I can figure out Spat5 + Reaper, but I need to view my 360 video either in Unity or in Max itself. How do you connect Unity and Max/Spat5 over OSC? My project is post-production for recorded video, not synthesised visuals. I'd prefer to stick with the utility of Spat and the convenience of Reaper, but I'm looking for a solution to handle the visual part.

Phivos-Angelos Kollias:

Hey @Jose K Sani,
yes, the project was done using Max for all sound operations, including spatialization with Spat, and Unity for the visuals. We connected them through OSC.
The project is called A Symphony of Noise and was rather successful in terms of reviews and audience appreciation.