File System Traversal - For Fun!

It all started one day during a conversation with a friend. We were talking about something, and I realized I had the perfect photo to enhance the discussion. I also realized that it was in My Great Big Archive, and that I would never be able to find it.

This was pretty depressing, but it also got me thinking: I have a huge (16 TB) backup disk that contains almost all of my life since about 2005. How many photos were on that drive? And would I ever see any of those images again?

I decided to change things: I may not be able to categorize/tag/sort all of the photos that were collected over the years, but I certainly could make sure that I had the chance to see them! I decided to make a Max-based photo viewer that would be robust enough to traverse any hard drive – including my 16 TB monster.

In order to pull it off, I needed some industrial-strength help. So I turned to the Node for Max objects. I remembered that there was an example in the Extras patch that offered file system access; I figured that would be the right place to plumb for the tools to get the job done.

Download the patches used in this tutorial: PixFinder.zip (zip, 4.67 KB)

Synchronizing the Asynchronous

Working one's way through a file system is a classic example of recursive programming: you have a function that gets a directory listing, and if any of the items are subdirectories, you call that same function with the subdirectory. It’s an elegant way to walk through a file system, but in practice it can also be tricky.
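To make the recursive shape concrete, here is a minimal sketch rather than the code from the downloadable patch. The function name listFiles is hypothetical, and it uses Node's synchronous readdirSync() only to keep the recursion easy to read:

```javascript
const fs = require("fs");
const path = require("path");

// listFiles (hypothetical name): return every file path under dir, depth-first.
function listFiles(dir) {
    let found = [];
    // withFileTypes gives us Dirent objects, so no separate stat call is needed.
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            // The recursive step: call the same function on the subdirectory.
            found = found.concat(listFiles(full));
        } else if (entry.isFile()) {
            found.push(full);
        }
    }
    return found;
}
```

Calling listFiles() on a folder returns a flat array of every file beneath it. This sketch does no error handling, so an unreadable folder will throw.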

To traverse a large file system, I used the Node for Max Extras “fs” example as my starting point.

My first modification was to find images and return their file paths as they were found. The example does its file system recursion asynchronously, and that exposed a problem with unthrottled asynchronous behavior: it created an enormous number of simultaneous folder requests, then started generating an equally enormous number of Max messages for the found images. Max was immediately overwhelmed, messages were ignored, and eventually Node crashed.

Not a good option.

I knew that the Node fs module had synchronous versions of its calls, which would give me a single thread of execution, so that I would only have to manage one file discovery at a time. So I changed the fs calls to their synchronous versions (for example, using readdirSync() instead of readdir()), hoping that this would provide better processing. It kind of worked, but the synchronous functions are quite ‘greedy’: they block Node’s event loop and really bog down the system while they are running.

I found that Max, and the computer itself, got very unresponsive when using the synchronous functions, so I had to abandon that approach as well.

The third attempt used something in between pure asynchronous and pure synchronous code: JavaScript’s async functions and the await keyword. You can find out more about this on Mozilla:

https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Asynchronous/Async_await

Using this system, you can still call asynchronous functions, but the calling function pauses at each await until that call has finished, while the rest of the system keeps running.

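// findFiles: kick off a volume traversal
// (isMac is assumed to be set elsewhere in the script)
async function findFiles(vol) {
    let v = vol;
    if (isMac) v = '/Volumes/' + v; // Mac formatting

    // Report the volume to Max, then walk it one step at a time.
    await maxAPI.outlet("info", "volume " + v);
    await findRecursiveHelper(v);
}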

.
.
.

await findFiles(vol);

This is the best of both worlds, because I can control the order of execution while I’m still allowing for asynchronous processing – which keeps everything responsive. I now had a Node for Max script that could run through an entire hard drive without crashing or choking Max.
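The helper that does the actual walking is not shown above, so here is a sketch of what an awaited, one-folder-at-a-time traversal might look like. It is not the code from the patch: the name walkForImages, the onImage callback, and the extension list are all assumptions made for illustration.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed list of image extensions; the original script's list is not shown.
const IMAGE_EXTENSIONS = new Set([".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"]);

// walkForImages (hypothetical name): visit one folder at a time,
// awaiting each step so only one request is ever in flight.
async function walkForImages(dir, onImage) {
    let entries;
    try {
        entries = await fs.promises.readdir(dir, { withFileTypes: true });
    } catch (err) {
        return; // assumption: silently skip folders we cannot read
    }
    for (const entry of entries) {
        const full = path.join(dir, entry.name);
        if (entry.isDirectory()) {
            // Finish the whole subfolder before moving to the next entry.
            await walkForImages(full, onImage);
        } else if (IMAGE_EXTENSIONS.has(path.extname(entry.name).toLowerCase())) {
            // Wait for the consumer before looking for the next image.
            await onImage(full);
        }
    }
}
```

In a Node for Max script, the onImage callback could be something like (p) => maxAPI.outlet("image", p), where the "image" selector is again an assumption. Because every call is awaited, the event loop stays free between steps, which is what keeps Max responsive.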

Waiting for Bang

But just spewing out image file names wasn’t going to be sufficient – I would also have to be able to wait while my Max patch did something with them. This is a little tricky in a Node for Max script, because I don’t want to make assumptions about how long it will take for Max to process the file information. So I created a waitForBang() function that would do what it says: pause the script execution until it received a bang message.

let promResponder = null;

.
.
.

// waitForBang: provide a promise that waits for a bang.
// ------------------------------------------------------
function waitForBang() {
    return new Promise((resolve) => {
        // Store a one-shot resolver; the bang handler will call it.
        promResponder = () => {
            promResponder = null;
            resolve();
        };
    });
}

This function does two things: it returns a JavaScript promise that will need to be resolved, and it stores a function for resolving the promise in the variable named promResponder. To pause the script execution, just call the waitForBang() function via await and the script will patiently wait.

Now, when a bang message is received, the handler checks promResponder. If it is non-null, the handler calls it, which resolves the open promise and allows execution to proceed.

// handlers: set up the message handlers (from Max)
const handlers = {
.
.
.
    bang: function () {
        // Resolve the pending promise, if the script is waiting on one.
        if (promResponder) {
            promResponder();
        }
    }
};

Again, all of this is provided by that await functionality.
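To see the whole handshake in one place, here is a self-contained sketch that runs outside Max. The processAll function and the standalone bang() function are hypothetical stand-ins for the traversal loop and the Max message handler. One small design choice differs from a naive loop: the promise is created before the item is emitted, so a bang that arrives very quickly is not missed.

```javascript
let promResponder = null;

// waitForBang: provide a promise that waits for a bang.
function waitForBang() {
    return new Promise((resolve) => {
        promResponder = () => {
            promResponder = null;
            resolve();
        };
    });
}

// bang: stand-in for the Max "bang" message handler.
function bang() {
    if (promResponder) {
        promResponder();
    }
}

// processAll (hypothetical name): emit one item, then pause until a bang arrives.
async function processAll(items, emit) {
    for (const item of items) {
        const banged = waitForBang(); // arm the wait before emitting
        emit(item);
        await banged;
    }
}
```

Outside Max, the patch can be simulated by having emit schedule a call to bang() with setTimeout; each item is then emitted only after the previous one has been acknowledged.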

The Max Patch

With so much of the heavy lifting being done by the Node for Max script, the Max patch itself is relatively simple. It includes all of the tooling required for N4M processing (the node.script and node.debug objects, as well as messages to start and stop the script), plus the patching required to take the file path information produced by the node.script object and display it in a jit.pwindow. I also send the path of the image to a message box so that I can see that information when a particularly interesting photo appears. (Note: Make sure to change the file path in the 'run' message box to point to the location where you would like to begin your search!)

What is interesting is the way that we send the bang message back to the script to continue processing after an image is found: the right outlet of the jit.matrix object is normally used to dump information about the enclosed data, but it also outputs a message whenever an importmovie request is processed. By producing a bang whenever this message is received, we know that the image has been processed. However, if we sent the bang immediately, the script would move on to the next image right away, and we would not get much of a chance to see the current one.

So we delay the bang a little: 500 ms by default, but you can change the delay to control how long each image is displayed. We then send the bang message to the script, and processing proceeds.

The Results

The question remains: did it help me find the image I’d recalled? Not really – there were simply too many photos to review individually. But once this program started running, I got a chance to see photos I thought I’d never see again, and some that I didn’t even remember taking. I also got to see a lot of silly photos that were stuck in NPM packages, website archives and other obscure corners of my hard drive. The patch ran for days, and every time I poked my head in the room to check its progress, I would get mesmerized by the stream of images I would never have otherwise encountered.

Of course, the idea of traversing through massive data stores opens up other interesting ideas. What if you searched for all of the movie files on your system, and displayed them end-to-end? Or found all the random audio files scattered on your hard drive and had an ‘auto-DJ’ create an abstract mix? The ideas are endless – so have a blast!

by Darwin Grosse on February 28, 2022

TConnors

Wonderful!

Andrew Pask

Hey I know that house

Max Gardener

Whoa. There's the window with the view of Pike's Peak!

Daniel Maruniak

This was amazing Darwin. Thank you.