David Beaudry

I need some advice. I'm quite well versed in controlling and sharing data over a network (almost all my show control work involves this), however rarely do I need to control the same output over multiple machines. I better explain... :)

Two Macs, each with its own camera feed looking at one half of a performance area. The same set of visual effects and processing runs on each machine (lots of shader effects), then everything gets fed into a Watchout machine, which brings the two images together for one giant uber-projection. Both Macs are identical in every way. After the initial camera input almost all processing is in GL, with a couple of gl.videoplanes as the final output.

The problem I'm having is "syncing" xfading between different effects/looks across the machines. Let's say I want to xfade from look #1 to look #2 on both machines over 1 second. Obviously, since the images are side by side, I want the xfade to start and end at exactly the same time. Right now I have one machine acting as the server, which sends the go message to itself and, via a UDP packet, to the client machine. Sometimes it's spot on, sometimes there's a slight delay on the client machine... but it's never consistent.

Has anybody here had any experience with this? A magical set of scheduler options? Using "transport" across machines? I've been thinking of jumping to the audio domain for triggering, which I'll try later this morning.

Anyway, just curious if others have run into something like this before as well.

I would use Java for this situation. If your main concern is keeping both fades in sync, maybe you can use mxj net.maxhole and send the fade values from the server computer to the client. If you need to send complete messages (which doesn't seem to be the case), you can go for the mxj net.tcp objects.

Hi efe,
I've actually found the java network objects to have a slightly higher latency than the standard udpsend/receive objects. I think the answer really involves syncing the two machines beyond an initial trigger...maybe...

The issue of the sync is a rather difficult one, indeed. What about sending OSC messages, do you think the latency issue could improve?

Unfortunately we've tried all the different flavors of network-based communication and didn't find any difference adding the OSC headers. I think I need to look at the transport object in more detail, or somehow syncing to an external clock (midi time code or the like), or again maybe even an audio-based trigger/sync method. Or...changing my visual effects so the syncing issue isn't obvious :)

Please let us know the results of your research. I am also working on an audiovisual piece that requires syncing two computers loading and playing HD footage, so any feedback would be super useful ;-)
good luck!
Emmanuel


i think the only way to really get accurate sync is to schedule your events at some time in the future using a timestamp from the date object. this will obviously require a certain amount of latency, but it's either latency or out of sync.

fwiw, in my networking endeavors, the latency was never enough to make me try another solution. i just don't think the audience is that aware of millisecond sync discrepancies.

Good to know someone else is looking at this. I have a similar problem to solve, though most likely for a completely different purpose. Thought I'd poke around what others are up to before rushing into it.

As an aside, have you thought about using serial instead of network?

The general problem to solve is to have a timer running on each machine that contains the same value. You could do this with signal generating hardware, timecode, midi clock etc. But maybe that's just trying to be too clever.

So... Since you're already doing UDP, here's what I'd suggest (and it's what I'm gonna do myself).

Your master computer (I assume one is acting as a master) measures the average latency using several UDP exchanges with the slave. Much like ping, you send out a packet with an ID plus the master timer value (some kind of reasonably high-res counter). The slave appends its own timer value to the packet and returns it. On receipt the master computes the turn-around time and keeps a running average by whatever statistical means you find is necessary. A straight running average might be enough (eg. CurrentAverage = 0.95*CurrentAverage + 0.05*NewValue) and, being the simplest to code, should definitely be one of the first things to try.
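As a rough illustration of the running average described above, here is a minimal Python sketch. The function names and the sample turn-around values are made up for the example; only the 0.95/0.05 weighting comes from the post.

```python
def update_average(current_average, new_value, weight=0.05):
    """Blend one new sample into the running average (0.95 / 0.05 by default)."""
    return (1.0 - weight) * current_average + weight * new_value


def average_turnaround(samples, weight=0.05):
    """Fold a sequence of turn-around times into one running average."""
    samples = iter(samples)
    average = next(samples)  # seed the average with the first measurement
    for value in samples:
        average = update_average(average, value, weight)
    return average


# Hypothetical turn-around times in microseconds, including one slow packet.
turnarounds_us = [400.0, 380.0, 420.0, 1500.0, 390.0]
print(average_turnaround(turnarounds_us))
```

A small weight means a single slow packet only nudges the average, at the cost of the average taking longer to settle.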

After somewhere between maybe 100 and 1000 exchanges you ought to have a pretty decent latency measure. The whole sync ought to be very fast - under a second at least (obviously you don't have to wait for a packet to return before sending more). You'll also want to have computed the average offset between the master and slave timers (time master sent minus time slave sent).

Next, send a sync packet to the slave containing the adjusted timer offset value (the offset corrected by half the UDP turnaround time, since the slave stamps each packet roughly half a round trip after the master sends it). The slave uses this to calculate the value of the master timer from its local timer. Now the two machines have synchronised timers, surely with more accuracy than "relying on a single UDP packet" (an oxymoron?!). You might need to sync every few minutes or so to compensate for drift.
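The offset arithmetic is easy to get backwards, so here is a small Python sketch of one consistent sign convention. It assumes the outbound and return trips take equal time and defines the offset as slave clock minus master clock; with the opposite definition the half turn-around is added rather than subtracted. All names and numbers are illustrative.

```python
def estimate_offset(master_sent, slave_stamp, master_received):
    """Return (turnaround, offset), where offset = slave clock - master clock."""
    turnaround = master_received - master_sent
    # Assumption: the slave stamped the packet halfway through the round trip.
    master_at_stamp = master_sent + turnaround / 2.0
    return turnaround, slave_stamp - master_at_stamp


def master_time(local_time, offset):
    """Run on the slave: convert a local timer reading into master time."""
    return local_time - offset


# Hypothetical exchange in microseconds: the slave clock is 5 s ahead of the
# master and each direction takes 200 microseconds.
turnaround, offset = estimate_offset(1_000_000, 6_000_200, 1_000_400)
print(turnaround, offset, master_time(6_000_200, offset))
```

In practice the averaged turn-around and offset would be fed in here rather than the values from a single exchange.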

The final trick is to make your output lag a little behind the master timer. The maximum measured UDP latency should be more than enough, and hopefully that yields enough responsiveness (or you can just hard-code a latency). I mean, we're talking tens of milliseconds here, right? I'm assuming your video is buffered and you can just remain X frames behind the tail, X most likely being pretty small for HD frame rates.

Now if you want to trigger a cross-fade for example, just send the instruction along with a start time (the master's current timer value) and a duration. Both master and slave queue this event and begin fading at the correct time. Even if the packet is slow in arriving at the slave, it'll instantly catch up because it knows where in the fade it _should_ be.
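The "catch up" behaviour falls out naturally if each machine computes the fade position from the shared clock on every frame instead of counting steps locally. A minimal Python sketch, with hypothetical names and millisecond units:

```python
def fade_progress(master_now, start_time, duration):
    """Position in the fade, from 0.0 (not started) to 1.0 (finished)."""
    if duration <= 0:
        return 1.0 if master_now >= start_time else 0.0
    return min(1.0, max(0.0, (master_now - start_time) / duration))


def crossfade(look_a, look_b, progress):
    """Linear mix of two values, e.g. the alpha of two video planes."""
    return (1.0 - progress) * look_a + progress * look_b


# Hypothetical cue: fade starts at master time 1000 ms and lasts 1000 ms.
# A slave that only learns about it at 1250 ms jumps straight to 25%.
progress = fade_progress(1250, 1000, 1000)
print(progress, crossfade(0.0, 1.0, progress))
```

Each render frame would call `fade_progress` with the current estimate of master time, so a late packet costs a small jump at the start rather than a fade that finishes late.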

Hope that's a help to either you or other readers, without being too wordy =)

Hey Paddy, thanks for sharing the tips!

I was wondering about the method you describe above: are you considering more than two computers? It seems to me that this method might be complicated to implement on networks bigger than two clients.

I had the chance to perform with a multi-computer setup a couple of days ago using net.tcp.recv and mxj net.maxhole: the first to send numerical values from the master (which, btw, hosted the MIDI inputs) and maxhole to keep the playback time of the clients in sync. So far it has worked fine, though I had a small latency in some cases. I am definitely interested in trying out your method.

Are you using sound for your project? What about slaving the clocks to a sound card?

Emmanuel


Good thinking. I wasn't considering more than two machines, but there's no reason why you couldn't, as long as you still follow the concept of a single master controlling its minions. [Okay, so you could do it with a flat hierarchy if you HAVE to. Otherwise it'd be needless complexity.] As I've learned from my kids, if more than one person thinks they control bed-time, very small things become a big challenge!!

The cool thing about UDP is you can send to the broadcast address on your subnet (eg 192.168.0.255) so every machine will receive the sync packets. Everything would be the same as the two machine version, except the master now has to keep track of each machine that responded. Since the sender IP address comes for free in a UDP packet, you can just use that as an index. This way the master doesn't need to know about its slaves - it will 'discover' them as they respond... And of course the slaves discover their master.

[If the broadcast method ends up causing random lag due to lots of slaves replying at once (I wouldn't expect it to be too bad, especially on a gigabit network), you could first send out a discovery broadcast and then communicate individually or in groups. But as always, don't make things hard for yourself unless you really, really need to.]

When you have collected enough latency samples for all the slaves, you work out their offsets individually and send out a sync packet to each (now that you know their IP addresses). The running average idea would be best in this case so you don't end up having to buffer a potentially huge unknown quantity of replies - just one record per IP address.
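The "one record per IP address" idea can be sketched in Python as a dictionary keyed by sender address, with each record holding its own running averages. The class name, field names and addresses below are invented for illustration, and the networking itself is left out.

```python
class SlaveTable:
    """Keeps one running-average record per slave, keyed by sender IP."""

    def __init__(self, weight=0.05):
        self.weight = weight
        self.records = {}  # sender IP -> turnaround, offset, reply count

    def on_reply(self, sender_ip, turnaround, offset):
        record = self.records.get(sender_ip)
        if record is None:
            # First reply from this address: the slave has been "discovered".
            self.records[sender_ip] = {
                "turnaround": float(turnaround),
                "offset": float(offset),
                "replies": 1,
            }
            return
        w = self.weight
        record["turnaround"] += w * (turnaround - record["turnaround"])
        record["offset"] += w * (offset - record["offset"])
        record["replies"] += 1


# Hypothetical replies from two slaves on the subnet.
table = SlaveTable(weight=0.5)
table.on_reply("192.168.0.11", 400, 5000)
table.on_reply("192.168.0.12", 100, -20)
table.on_reply("192.168.0.11", 200, 5002)
print(table.records)
```

Memory use stays fixed per slave however many ping replies arrive, which is the point of preferring the running average over buffering samples.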

Because I get excited about ideas when I have them, I've just sat down this evening to implement the two-machine idea instead of hanging out with my wife (she understands, she's a geek too). My actual code probably won't be directly useful cos it's in C (I'm only here because I wanted to comment). I'll let you know if the concept works or not though, if you don't beat me to it. =)

Here's what I'll be using it for: I do tracking and assorted computer vision in C with a bunch of high-speed cameras. If I can roughly synchronise my computers then I can automatically trim reference footage from some clunky HD cameras, and that means when I'm shuttling back through the buffers my other computer can show me the closest matching reference frame. I can't just use one computer cos there's not enough bandwidth on a motherboard to collect all the video.

Hello Paddy:

Thanks for the feedback, very interesting indeed. If you have the code in C and you want to share it, that won't be a problem (I am a geek myself, as you can see). I am curious: are you compiling your own application or using an open source toolkit?

My criteria for using Java to sync the computers have so far been:

1- its convenience (there are already really nice sets of classes available for networking)

2- the ease of 'hacking' within the Max patcher.

As far as I understand, networking is one of Java's strong features, but maybe a native external (written in C or C++) could be a good option to optimize performance (let's say, the calculations once the net bundles arrive at the peer clients). Maybe a mixture of both?

That looks like two pretty good criteria to me. =) My motto is that it doesn't really matter how you do it, as long as you do it the easiest way you know, using the tools you are familiar with. If it turns out not to be optimal enough, THEN you worry about it!

Convenience, as you say. Whatever is convenient for you and the way you like to work. It's amazing how often your "proof of concept" code works well enough to be slapped straight in and doesn't need to be optimized further.

Surely it's easy to access the performance timer from within Java. Isn't there a 'System' class or something that provides access to kernel stuff? I dunno. I don't use Java. Are you targeting a specific platform?

I generally just work with the Windows API. But that's because a lot of my code has very specific optimality requirements and known target hardware (and OS, obviously). You can spend a lot of time looking for the perfect tool, but often you only need a very small subset of it. By the time you've found it, you could have done it yourself and saved a lot of time.

In this case I just salvaged bits from a small test project I had hacked together a few months ago. It had a bunch of socket code taken straight off an article on MSDN and took about 5 minutes to put together. This was easier and quicker than finding and familiarising myself with some other library, regardless of how much tidier it might be.

I think I've spent more time writing on this forum than writing code... =) Every time I hit 'reply' it ends up as a brain dump!

I ran outta time to finish this the other night but got stuck into it again this morning. One interesting problem to solve was that the performance counter frequencies are different on my two test machines. My clunky laptop's counter frequency is about 3.5 million, while my desktop machine cranked out 2.1 billion counts per second.

To compensate for that I send the timer frequency with the PING packet. The slave divides its timer by the local frequency then multiplies by the master frequency before stamping the PING reply. The large difference in frequencies leads to drift error in my slave (the laptop), but that just means I might need to sync more often. In reality, my master and slave will run on more similar hardware.
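The rescaling step can be sketched in Python as below. The frequencies are round stand-ins for the "about 3.5 million" and "2.1 billion" figures quoted above, and the function name is invented. Python integers never overflow; in C the same multiplication can overflow a 64-bit integer for large counter values, which is presumably where the fixed-point care comes in.

```python
def to_master_ticks(slave_ticks, slave_frequency, master_frequency):
    """Rescale a slave counter reading into the master's tick units."""
    # Multiply before dividing so integer arithmetic keeps full precision.
    return slave_ticks * master_frequency // slave_frequency


SLAVE_FREQUENCY = 3_500_000        # counts per second (illustrative)
MASTER_FREQUENCY = 2_100_000_000   # counts per second (illustrative)

two_seconds_on_slave = 2 * SLAVE_FREQUENCY
print(to_master_ticks(two_seconds_on_slave, SLAVE_FREQUENCY, MASTER_FREQUENCY))
```

With these numbers one slave tick spans 600 master ticks, so the slave's stamps are only as fine-grained as its own, slower counter.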

I used the simple running average scheme but with these conditions:

if first PING reply {
    set averages to the values contained in the packet
} else if number of PING replies < 10 (or some number) {
    if packet latency < average latency {
        set averages to the values contained in the packet
    }
} else if latency is less than twice the average latency {
    compute running average
}

This gives some protection against the initial packet latency being quite high and corrupting the running average. I could be more clever and sample the standard deviation initially, but I didn't feel like it! I might do that later.
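For anyone wanting to try the conditions above outside C, here is a direct Python transcription. The class name, the choice to count the current reply when testing the warm-up limit, and the sample values are assumptions made for the sketch.

```python
class LatencyFilter:
    """Running averages of latency and offset with warm-up and outlier rejection."""

    def __init__(self, warmup_replies=10, weight=0.05):
        self.warmup_replies = warmup_replies
        self.weight = weight
        self.replies = 0
        self.avg_latency = None
        self.avg_offset = None

    def add(self, latency, offset):
        self.replies += 1
        if self.avg_latency is None:
            # First reply: take its values as they are.
            self.avg_latency, self.avg_offset = latency, offset
        elif self.replies < self.warmup_replies:
            # Warm-up: keep only the fastest exchange seen so far.
            if latency < self.avg_latency:
                self.avg_latency, self.avg_offset = latency, offset
        elif latency < 2 * self.avg_latency:
            # Steady state: blend in the sample unless it is an obvious outlier.
            w = self.weight
            self.avg_latency = (1 - w) * self.avg_latency + w * latency
            self.avg_offset = (1 - w) * self.avg_offset + w * offset


# Hypothetical samples: (latency, offset) in microseconds.
f = LatencyFilter(warmup_replies=3, weight=0.5)
for sample in [(900, 10), (400, 20), (500, 30), (300, 40), (5000, 999)]:
    f.add(*sample)
print(f.avg_latency, f.avg_offset)
```

The last sample is more than twice the average latency, so it is dropped and leaves both averages untouched.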

Earlier today my UDP turn-around time on this network was about 120 microseconds. Now it's about 400 microseconds (+/- 100) due to extra traffic. On a closed network this ought to be lower and more consistent.

The larger the turn-around, the more error there is in estimating the uni-directional latency (guessed as half the turn-around time). But even an uncertainty as ridiculous as 500 microseconds is more than enough for me - that'd be 1/2000 of a second, which nobody's going to perceive. And that's far worse than worst-case from what I've seen so far.

By contrast, my tracking cameras deliver a frame every 1/225 seconds and the HD cameras deliver every 1/25 seconds.

I'll tidy things up a little, do some tests on timer drift and post the code along with a test program when I'm happy that it's working adequately =)

Has anyone had further success with this? Not sure if I went too far off-topic, writing a solution for the wrong platform in the wrong language (being what I needed personally) - sorry if it's not relevant! =) Anyway, the technique I described is working well. So either port mine and play with it, or have fun implementing it yourself. It's not hard. The trickiest bit is probably getting the fixed-point arithmetic correct.

Emmanuel, you were concerned about the efficiency of the Java network API. Did you manage to access the system performance counter through either of these? And did you get any network speed measurements? I'd be interested to know what the speed differences are.

In my C++ implementation with Win32 sockets my turn-around is about 300 microseconds to my laptop and 100 microseconds to my tracking box. These are on the same network, but perhaps my laptop has crummy hardware. The loopback turn-around is about 50 microseconds (software-only latency).

The slaves compensate for drift by adjusting their clock frequency after the second sync. This is purely mathematical - I can't adjust the real clock frequency of course. Even with the simplest drift implementation, my tracking box only drifted off by about 50 microseconds after running for 10 minutes without sync. Without drift compensation, it would have been out by 0.12 seconds due to the relative counter frequency error.
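The post does not spell out the drift maths, so the following Python sketch is only one simple way to do it, not necessarily what the C++ code does: after the second sync, estimate how fast the master clock advances per local tick and use that rate when extrapolating. Class and method names are invented.

```python
class DriftCompensatedClock:
    """Slave-side estimate of master time with a simple rate correction."""

    def __init__(self):
        self.rate = 1.0        # master ticks per local tick
        self.last_sync = None  # (local_time, master_time) at the latest sync

    def sync(self, local_time, master_time):
        if self.last_sync is not None:
            prev_local, prev_master = self.last_sync
            if local_time > prev_local:
                # From the second sync on, measure the relative clock rate.
                self.rate = (master_time - prev_master) / (local_time - prev_local)
        self.last_sync = (local_time, master_time)

    def master_now(self, local_time):
        if self.last_sync is None:
            raise ValueError("no sync received yet")
        sync_local, sync_master = self.last_sync
        return sync_master + (local_time - sync_local) * self.rate


# Hypothetical slave whose counter runs 0.02% fast relative to the master.
clock = DriftCompensatedClock()
clock.sync(0, 100)
clock.sync(10_002, 10_100)
print(clock.master_now(20_004))
```

A relative rate error of 0.02% is the same order as the uncompensated figure quoted above (0.12 seconds over 10 minutes).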

Attached for your interest are some screenshots. One shows the startup clock drift on all machines. The other two show my laptop (192.68.68.8) and tracker (192.68.68.18) after drift compensation. The error bars represent the UDP turn-around time. In reality the error should be less than this because there is a minimum transit time, restricting when within that time range a packet could have conceivably been received.

The code is work-in-progress and isn't really in a publishing state so rather than post it here, how about anybody who wants to play with it just email me: geoffS.bolPton@gmAail.comM (remove 'SPAM'). One day I'll polish it and post it up on CodeProject or somewhere along with a helpful article. Meanwhile, if it's of use to somebody I'm happy to send it to you as is.

I am not sure if this will be helpful. But there is an alternate design strategy possible, one that I have used for similar projects: there is one server machine and two client machines. The server machine is used for interface and for sending control messages ONLY. The client machines render and output video. Control messages are sent via mxj.net.maxhole to the local network. The two clients should receive the messages at the same time.

To use this setup well, control messages need to be "large-grained" -- let me explain. If you need to synchronize fade-outs, it is not efficient for the server to have to send a value for every point in the fade. Instead, send a simple message "fade", and let each of the client machines execute the steps of the fade themselves (say, using a line object). You could even send an arbitrary message like "fade 1000": "fade" would trigger the start of the fade, and "1000" would be used by the line object to control the fade's duration. In this way, "upper level" control is handled by the server, but "mid-level" control is handled by each client. This mid-level control -- things like line objects, gates, routes, toggles -- is not very heavy, and should not interfere with video performance.
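Outside of a Max patch, the same "large-grained" idea can be sketched in Python: the server sends one short message and each client expands it into the individual fade steps. The ramp below only imitates the general behaviour of a line object; the function names and the 25 ms step size are assumptions for the example.

```python
import math


def parse_control(message):
    """Split a message such as 'fade 1000' into a name and numeric arguments."""
    parts = message.split()
    return parts[0], [float(p) for p in parts[1:]]


def line_ramp(start, target, duration_ms, step_ms=20.0):
    """Yield (elapsed_ms, value) pairs stepping linearly from start to target."""
    if duration_ms <= 0:
        yield 0.0, target
        return
    steps = max(1, math.ceil(duration_ms / step_ms))
    for i in range(1, steps + 1):
        elapsed = min(duration_ms, i * step_ms)
        yield elapsed, start + (target - start) * elapsed / duration_ms


# One small message from the server; the client generates every step itself.
name, args = parse_control("fade 100")
if name == "fade":
    for elapsed, value in line_ramp(0.0, 1.0, args[0], step_ms=25.0):
        print(elapsed, value)
```

Only the single control message crosses the network, so any jitter affects when the fade starts but not how smoothly it runs.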

Kurt Ralske
...
Visiting Professor and Artist-in-Residence
Department of Digital + Media
Rhode Island School of Design
http://retnull.com


syncing image control functions over a network