AI StreamDiffusion real time
Wow,
any chance of Max integration?
B
I've made this work using Max/MSP, TouchDesigner / StreamDiffusion and some Python scripts (to talk to OpenAI). https://deep-alice.blogspot.com/p/resumo-instalacao-deep-alice.html
The texts are still only in Portuguese, but I can provide more details here if someone is interested. There's a technical overview at this link:
https://deep-alice.blogspot.com/p/desenvolvimento.html
Hey Sanroid,
Thx for the reply. Is it possible to do this without TouchDesigner? Can Jitter handle the TouchDesigner side of things?
I’m interested in more details please.
B
here's my repo from a while back for interacting with a Stable Diffusion AUTOMATIC1111 setup. Likely many of the concepts are transferable to the new hotness.
No idea what state I left this thing in. Maybe I'll pick it up again at some point, and maybe it's useful for others in the meantime.
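For anyone who wants to poke at the same idea before digging into the repo: the standard way a script drives an AUTOMATIC1111 instance is its built-in web API (start the server with the --api flag). A minimal sketch, not taken from the repo above, with the prompt and settings made up for illustration:

```python
# Minimal txt2img call against a local AUTOMATIC1111 server
# (launch webui with --api; endpoint path is from its web API).
import base64
import requests

payload = {
    "prompt": "the Cheshire Cat dissolving into mist, film grain",  # illustrative
    "steps": 20,
    "width": 512,
    "height": 512,
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=120)
r.raise_for_status()

# The API returns base64-encoded PNGs in the "images" list.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```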
@BILLYANOK: I've never used TD before this. TD was used only as a bridge (and for some minor video filters) to SD in this project, since my TD knowledge is pretty limited. I use OSC to address the parameters from Max to SD through TD (which works as a kind of container for SD). All the logic and the control of the states of the whole interaction was done in Max/MSP. It worked pretty well in the end, reaching 18 fps on my RTX 3090. Nevertheless, I will also check Robert's approach; I would prefer to keep everything inside Max/MSP for sure.
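In case it helps anyone rebuild this pattern: the OSC side is just address/value pairs. Inside TD you would normally use an OSC In DAT/CHOP, but here is a minimal standalone sketch of the receiving end, assuming the python-osc package; the /sd/... addresses and port 7000 are placeholders of mine, not the ones used in DEEP ALICE:

```python
# Minimal OSC receiver sketch (pip install python-osc).
# Addresses, port, and parameter names are illustrative only.
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

sd_params = {"strength": 0.6, "guidance": 1.2, "seed": 42}

def set_param(address, value):
    # e.g. Max sends "/sd/strength 0.75" via [udpsend 127.0.0.1 7000]
    name = address.rsplit("/", 1)[-1]
    if name in sd_params:
        sd_params[name] = value
        print(f"{name} -> {value}")

dispatcher = Dispatcher()
dispatcher.map("/sd/*", set_param)  # wildcard catches all /sd/... messages

server = BlockingOSCUDPServer(("127.0.0.1", 7000), dispatcher)
print("listening for Max on udp 7000 ...")
server.serve_forever()
```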
@Billyanok: here is a translation of the technical section of the blog:

The work DEEP ALICE presents, as a technical challenge, the real-time generation of images by Artificial Intelligence from pictures produced by the interactors during the Exhibition/Festival. In addition to the images serving as the initial input, there are also prompts that are automatically generated, coordinated with the images, from Lewis Carroll's books "Alice's Adventures in Wonderland" and "Through the Looking-Glass." This processing flow involves the real-time composition of images selected by interactors and the automatic generation of prompts, orchestrating several programming platforms.

Serving as the central control unit for the entire interaction process, in addition to capturing the video images, is software programmed in Max/MSP. This software, a collection of modules with various functions (see the table in the blog post), generates and sends the automatically generated prompts, and transmits the captured image to another platform, TouchDesigner (TD), which in this case serves as the interface for StreamDiffusion (SD), responsible for the AI image generation. The Max/MSP software controls the variation of the AI image-generation parameters by sending them to TD (via OSC); TD, in turn, forwards these parameters to StreamDiffusion. In the background, the two applications (Max/MSP and TD) also exchange the AI-generated images (using Spout): they are generated in TD and sent to Max/MSP for real-time "post-production" (image adjustments, merging between images, overlay of text/prompts).

In addition to the image processing, the Max/MSP software communicates with two Python applications, which in turn communicate with the OpenAI platform. One of them translates the prompts produced by Max/MSP from English to Portuguese at each prompt generation, and the other reverses the characters of the prompts in the deepest phase of the "dream." It is worth noting that the Python programs were developed with the assistance of AI (ChatGPT).

Just for clarification: this work is about playing in real time with images generated by prompts that are themselves randomly generated, inspired by "digital" cut-ups (William Burroughs), using the two Lewis Carroll books as source material.
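Since the two Python helpers are only described in prose above, here is a hedged sketch of what they could look like, assuming the official openai package; the model name, function names, and prompt wording are my assumptions, not the project's actual code:

```python
# Illustrative sketch of the two helper apps (pip install openai).
# Model name and instructions are assumptions, not the DEEP ALICE code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate_to_portuguese(prompt: str) -> str:
    """Translate a generated prompt from English to Portuguese via OpenAI."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat model would do
        messages=[
            {"role": "system",
             "content": "Translate the user's text from English to Portuguese."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content

def reverse_prompt(prompt: str) -> str:
    """Reverse the characters of a prompt for the deepest 'dream' phase
    (no API call needed for this one)."""
    return prompt[::-1]

if __name__ == "__main__":
    p = "a white rabbit falling down a spiral of clocks"  # illustrative prompt
    print(translate_to_portuguese(p))
    print(reverse_prompt(p))
```

In the installation these would sit behind the Max/MSP patch (e.g. called over the network or via the shell), with Max deciding which transformation applies at each phase of the interaction.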