TensorFlow.js in Max/MSP with a practical example
Greetings! This topic is a translation of a tutorial I wrote (in Italian) for my blog Musica Profonda, where I talk about music and deep learning. It shows how TensorFlow.js can be used inside Max, together with the freely available YAMNet model, to recognize the sounds featured in an audio file.
First of all, you can download the full patch here (it is also attached to this post). The only external library you will need is HISSTools (available directly from the Max Package Manager), which is used to easily downsample the files you load to 16 kHz, the sample rate expected by the neural network.
Let's start by creating a buffer~ that we will populate with the sound file we are going to use for timbre recognition:

Then, let's create a node.script object containing the following code (we will look at each function in detail shortly):
const Max = require('max-api');
const tf = require('@tensorflow/tfjs');
const fs = require('fs');

// YAMNet model hosted on TensorFlow Hub
const modelUrl = 'https://tfhub.dev/google/tfjs-model/yamnet/tfjs/1';

let channels = 1;
let buffer = [];
let main_sounds = [];
let model;

// Load the YAMNet model
async function init()
{
    model = await tf.loadGraphModel(modelUrl, { fromTFHub: true });
}

// Read the raw float32 file written by the buffer~ object;
// the channels are interleaved, so only the first one is kept
async function readbuf_async(file)
{
    const data = fs.readFileSync(file);
    buffer = [];
    for (let o = 0; o < data.length - (4 * channels); o += 4 * channels)
    {
        buffer.push(data.readFloatBE(o));
    }
    await Max.post("Buffer filled!");
}

// Run YAMNet on the buffer and send out the most likely category for each frame
async function predict()
{
    const waveform = tf.tensor(buffer);
    const [scores, embeddings, spectrogram] = model.predict(waveform);
    const maxscore = scores.mean(0).argMax(); // overall top category (not used further here)
    const scores_length = scores.shape[0];    // number of predicted frames
    main_sounds = [];
    const arr = scores.arraySync();
    for (let i = 0; i < scores_length; i++)
    {
        // index of the highest-scoring category for this frame
        main_sounds[i] = arr[i].indexOf(Math.max.apply(Math, arr[i]));
    }
    await Max.post("Prediction done!");
    await Max.outlet(main_sounds);
}

Max.addHandler('init', () =>
{
    init();
});

Max.addHandler('clearbuf', () =>
{
    buffer = [];
});

Max.addHandler('setchans', (chans) =>
{
    channels = chans;
});

Max.addHandler('readbuf', (file) =>
{
    readbuf_async(file);
});

Max.addHandler('predict', () =>
{
    predict();
});
We connect these messages to the node.script object:
script npm install @tensorflow/tfjs: to install TensorFlow.js
script start: to start the script (a script stop message can be useful, too)
init: to call the function with the same name that loads the YAMNet model
predict: to call the function with the same name that makes the prediction
setchans $1: to tell the script the number of channels of the buffer, with $1 linked to the last outlet of the info~ object. Keep in mind that for the actual prediction only the first channel will be used
readbuf $1: to tell the script the path of the temporary file the buffer has been written to. The $1 argument comes from a JavaScript (inside a js object, the sendbuf.js you can see in the first picture) that automatically retrieves the file path:
autowatch = 1;
inlets = 1;
outlets = 1;

// On bang, send out the path of the temporary raw file,
// built from the folder containing this patcher
function bang()
{
    var patcher_dir = this.patcher.filepath.replace(this.patcher.name + ".maxpat", "");
    outlet(0, patcher_dir + "samplebuf.tmp");
}
The same script also sends the file name and path to the message samptype float32, writeraw $1, which is connected to the buffer~ and makes it write its contents to the raw file that Node.js will then read in the readbuf_async function.
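As a side note, if you want to double-check that the temporary file was written correctly, a small Node helper like this (hypothetical, not part of the patch) can report how many sample frames it contains, since each sample is a 4-byte float32 and the channels are interleaved:

const fs = require('fs');

// Hypothetical helper (not in the patch): report the length of the raw temp file
function checkRawFile(file, channels)
{
    const bytes = fs.statSync(file).size;   // total size in bytes
    const frames = bytes / (4 * channels);  // 4 bytes per float32 sample, channels interleaved
    console.log(frames + " sample frames, about " + (frames / 16000).toFixed(1) + " s at 16 kHz");
}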

Let's go back to the Node script: it has an outlet that sends the timbre categories predicted by the predict function to a multislider. This is a series of numbers, one for each 480 ms chunk of the buffer content, each corresponding to the most likely of YAMNet's 521 sound categories for that chunk.
By playing the buffer with play~ and incrementing a counter every 480 ms, we can know which sound is currently the most represented. A coll maps each category index to its name, so that it can be displayed in a comment:

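In JavaScript terms, the counter and coll logic boils down to something like this minimal sketch (not part of the patch, where it is done with Max objects): given the elapsed playback time in milliseconds and the list received from the Node script, it returns the category index of the frame currently being played:

// Minimal sketch (not in the patch): map elapsed playback time to the predicted category
const FRAME_MS = 480; // YAMNet outputs roughly one prediction every 480 ms

function currentCategory(elapsedMs, mainSounds)
{
    const frame = Math.floor(elapsedMs / FRAME_MS);             // which prediction frame we are in
    return mainSounds[Math.min(frame, mainSounds.length - 1)];  // clamp to the last available frame
}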
This is just a primer and it can surely be done in more efficient ways, but I wanted to share with you some of my discoveries as a beginner in this area.
Let me know if it has been useful, and what your experiences with Max and neural networks are!
Thank you very much Valerio! This is really inspiring!
Hello Valerio, I will be in residency somewhere between December and January, working on music and neural networks. This came just in time. I will let you know how it goes with your patch. Thank you very much.
Thank you! Looking forward to receiving your suggestions and improvements!
ooo nice one :)
Hey Valerio, I get stuck at step 5. I get an error message in the node monitor, it says: ENOENT: no such file or directory, open "file path…". Here is an image of the complete message. I do not know Node much and I can't seem to find where the problem is! Any ideas?

Hello!
It seems to be an issue with the script failing to find the file 'samplebuf.tmp', which should be created at step 4 of the patch.
First, check if the file has been created in '/Users/walidbreidi/Documents/Max 8/XxPatches_and_ResourcesxX/Valerio Orlandini/Tensorflow_YAMNet/patchers/'.
If the file is there, the issue is probably that the path begins with 'MacBookPro15:' (sorry, I developed the patch on Windows; I had feedback from Mac users saying it works fine, but it may depend on the macOS version).
Turn presentation mode off and manually change the message under step 5 to the 'samplebuf.tmp' path (in your case "/Users/walidbreidi/Documents/Max 8/XxPatches_and_ResourcesxX/Valerio Orlandini/Tensorflow_YAMNet/patchers/samplebuf.tmp", i.e. without the 'MacBookPro15:' prefix), then click the bang above the message and see if you still encounter the issue. It may also be due to the spaces in the path, but that would be strange since Node.js received the complete path correctly.
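You could also add a quick check at the top of readbuf_async to get a clearer error message (just a suggestion, it is not in the original script):

// Optional check at the beginning of readbuf_async (not in the original script)
if (!fs.existsSync(file))
{
    await Max.post("Cannot find " + file + " - check that step 4 wrote the buffer to disk");
    return;
}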
If the file 'samplebuf.tmp' is not there, make sure you clicked on step 4, and check whether the Max console shows any warnings or errors.
Let me know if you manage to solve it; sorry for the inconvenience, and thank you for trying the patch out!

Hello, yes, it works when I manually change the path so that it starts with "/Users/etc." Thanks a lot.
Is there a limit to the size of file I can upload? Also, does it make a difference if the file is stereo or mono?
There is no theoretical limit to the audio file size (apart from the size of your RAM, of course), but keep in mind that with long files (say, more than a few minutes) the prediction may take a long time. It runs in the async Node.js process, though, so in the meantime you should be able to keep using Max without issues or glitches.
The files can have any number of channels, but YAMNet is trained on mono audio, so in any case only the first channel of your file will be used for prediction.
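If you ever want to use all the channels instead of only the first one, one possible variation (an untested sketch, not in the patch) is to average the interleaved channels into a mono signal inside the reading loop of readbuf_async:

// Untested variation of the reading loop in readbuf_async: average all channels to mono
for (let o = 0; o + 4 * channels <= data.length; o += 4 * channels)
{
    let sum = 0;
    for (let c = 0; c < channels; c++)
    {
        sum += data.readFloatBE(o + 4 * c); // read each interleaved channel
    }
    buffer.push(sum / channels);            // store the mono average
}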
I tried Walter Ruttmann's radio sound recording of a weekend (the first radio sound film, made in 1924 or 1929) and the predictions were very good, even though some of the sounds do not exist anymore.
That's great, it seems a very interesting use case :)
I am going to improve the patch in order to show not only the first match, but also the following ones, possibly with percentages. Sometimes the first result is a generic one (like "animal") but the second and the third are more detailed (like "bird" and "crow" for example).
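Inside predict, something along these lines should do it (just a sketch of the idea, not yet in the patch):

// Sketch (not yet in the patch): keep the three best categories per frame with their scores
const arr = scores.arraySync();
const top_sounds = arr.map(frame =>
    frame
        .map((score, index) => [index, score]) // pair each score with its category index
        .sort((a, b) => b[1] - a[1])            // sort by score, highest first
        .slice(0, 3)                            // keep the three best matches
);
await Max.post("Top matches for the first frame: " + JSON.stringify(top_sounds[0]));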
That would be great :)