Memory barriers

andrea agostini's icon

Hi all.

Not being a real computer programmer, I have only just discovered the existence of memory barriers (and, more importantly, of the problem they are designed to solve). This seriously undermines the self-confidence I had gained in dealing with concurrency issues.

But it also leaves me with some questions, Max-wise:

- I have noticed that we have two macros, ATOMIC_INCREMENT_BARRIER and ATOMIC_DECREMENT_BARRIER, besides the "non-barrier" ones (the two kinds actually differing only under OSX). But why do the examples dealing with buffer~, at least in the Max5 SDK, use the non-barrier versions? As I understand it, this is exactly the case in which barrier increments are needed...

- AFAIK the Max API doesn't contain any cross-platform barrier mechanism. How does the Cycling code deal with the issue? And how do people more experienced than me deal with it in their externals? Simply by building their own macros?

... or does all this mean (as I think I have read somewhere) that this is not a real problem on the x86 architecture (while it was on the PPC)?

Thank you very much for any enlightenment!
aa

andrea agostini's icon

Hi Nicolas.

I'm not sure I understand your example. Do you mean that in the case

long val; stackPop (stack, &val);

it might happen that stackPop returns before val has been correctly allocated on the stack? From what I have read, both compiler and CPU reordering are guaranteed to keep the logical order of operations consistent in a single-threaded context, which should be the case with your code. Have you observed otherwise? I mean, if your code has a chance of getting messed up, then programming is black magic... Would you mind explaining further the issues you have found with that?

On the other hand, I'm afraid that taming compiler reordering is not enough to be sure that memory operations are actually performed in the order you meant. The CPU does its share of reordering as well, and it seems that there is no way to control it besides placing memory barriers.
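To make the distinction concrete, here is a minimal sketch in C11 (using <stdatomic.h>, not anything from the Max API). The names payload, ready, publish_compiler_barrier_only, publish_with_fence and try_consume are all made up for illustration. It assumes one writer thread and one reader thread, and contrasts a fence that only restrains the compiler with one that also constrains the CPU:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int        payload;  /* plain shared data (hypothetical) */
static atomic_int ready;    /* flag telling the reader that payload is valid */

/* Writer, variant 1: atomic_signal_fence stops the compiler from moving the
   payload write below the flag write, but emits no CPU fence instruction. */
static void publish_compiler_barrier_only(int value) {
    payload = value;
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Writer, variant 2: atomic_thread_fence constrains both the compiler and
   the CPU, so the payload write is visible before the flag write. */
static void publish_with_fence(int value) {
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out once the flag has been seen. */
static int try_consume(int *out) {
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* Pairs with the writer's release fence. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

On x86 the hardware already keeps stores in order relative to other stores, so the first variant tends to work there in practice; on more weakly ordered CPUs such as PowerPC or ARM it is not sufficient, and something like the second variant is needed.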

But frankly, I'm just trying to make sense of stuff I have read here and there on the internet... I'd really love for someone knowledgeable to explain more about this to me...

aa

andrea agostini's icon

Tim... you answered everyone's posts but mine... :(
(just joking, of course, but bump!)
thank you
aa

Timothy Place's icon

Haha -- Not trying to avoid you Andrea, just trying to avoid memory barriers ;-)

Memory barriers are pretty complex and I can't hope to explain them clearly or comprehensively here. To try to give the boiled-down answer: software engineering is always a series of compromises, and nowhere is this more true than in multithreading. When there are multiple threads operating on shared data, there is a tradeoff between speed and absolute safety. (Just to muddy the waters, there are also differences between theoretical speed and real-world speed, and between theoretical safety and real-world safety.)

The scenarios will be different depending on:

* how many threads are accessing the data?
* is any given thread read-only? write-only? or read-write?
* what are the other thread(s) doing (read/write/both)?
* how long do the operations on the two threads take?
* etc.
* etc.
* etc.

Mutexes and critical regions provide the most safety, and are appropriate in many cases. They are, however, problematic for realtime performance-sensitive code (e.g. audio). The atomic inc/dec can be used as a lighter-weight mechanism. There are scenarios, however, where memory operations around the inc/dec may still end up executing out of the intended order. A slightly heavier-weight way to help with this is to use the barrier variants.
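As a rough illustration of the difference between the plain and barrier flavours, here is a sketch in standard C11 atomics. It is not the Max SDK macros or their implementation; shared_buf and every function name below are hypothetical, and memory_order_relaxed / memory_order_seq_cst are only assumed to be reasonable analogues of "non-barrier" and "barrier":

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared object guarded by an "in use" counter. */
typedef struct {
    atomic_int inuse;       /* number of threads currently inside */
    float      samples[4];  /* the data being protected */
} shared_buf;

static void shared_buf_init(shared_buf *b) {
    assert(b != NULL);
    atomic_init(&b->inuse, 0);
    for (size_t i = 0; i < 4; i++)
        b->samples[i] = 0.0f;
}

/* "Non-barrier" flavour: the increment itself is atomic, but it imposes no
   ordering on the memory accesses around it. Returns the new count. */
static int inuse_inc_relaxed(shared_buf *b) {
    return atomic_fetch_add_explicit(&b->inuse, 1, memory_order_relaxed) + 1;
}

/* "Barrier" flavour: atomic, and surrounding loads and stores may not be
   reordered across it. Returns the new count. */
static int inuse_inc_barrier(shared_buf *b) {
    return atomic_fetch_add_explicit(&b->inuse, 1, memory_order_seq_cst) + 1;
}

static int inuse_dec_barrier(shared_buf *b) {
    return atomic_fetch_sub_explicit(&b->inuse, 1, memory_order_seq_cst) - 1;
}

/* Reader: with the barrier flavour, the sample read cannot drift outside
   the inc/dec pair. */
static float read_first_sample(shared_buf *b) {
    inuse_inc_barrier(b);
    float v = b->samples[0];
    inuse_dec_barrier(b);
    return v;
}
```

The relaxed version is cheaper on weakly ordered CPUs, which is the speed-versus-safety tradeoff described above.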

In some cases you can use a structure like a non-locking queue (see http://www.rossbencina.com/code/lockfree) which is fast and doesn't require a mutex or critical region. If you dig into this you will find several implementations online which all basically come around to the same algorithm -- except some use a memory barrier and some don't. Is it really needed? Is it there out of superstition? No one that I know of has written an article specifically explaining their use of a memory barrier or not. Were there real-world examples from runs of their program that exhibited instruction re-ordering? Or was it just a fear of incorrect instruction ordering? I don't know.
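For reference, a generic single-producer/single-consumer ring buffer might look like the sketch below, which shows where the barriers would typically sit. This is not the code from the page linked above; spsc_ring, RING_CAPACITY and the function names are hypothetical, and it assumes exactly one pushing thread and one popping thread, written with C11 <stdatomic.h>:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8u  /* hypothetical size; must be a power of two */

typedef struct {
    long          slots[RING_CAPACITY];
    atomic_size_t head;  /* next slot to read; written only by the consumer */
    atomic_size_t tail;  /* next slot to write; written only by the producer */
} spsc_ring;

static void ring_init(spsc_ring *q) {
    assert(q != NULL);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer thread only. Returns false if the ring is full. */
static bool ring_push(spsc_ring *q, long value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false;
    q->slots[tail & (RING_CAPACITY - 1)] = value;
    /* Release: the slot write must be visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer thread only. Returns false if the ring is empty. */
static bool ring_pop(spsc_ring *q, long *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store on tail. */
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->slots[head & (RING_CAPACITY - 1)];
    /* Release: the slot read must finish before the slot is handed back. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

Without the release/acquire pair, a weakly ordered CPU could make the new tail visible before the slot contents, which is the reordering the barrier-using implementations guard against.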

So now you know why I avoided your question ;-)

All of that said, are you experiencing a particular problem in your code that you suspect is related to memory barriers?

Cheers,
Tim

andrea agostini's icon

Hi Tim.

Thank you very much for taking the time for this. Now I have a much clearer picture.

First of all, no, I'm not experiencing any particular practical problem. It was just that I was reading around and I stumbled upon this thing I didn't really know about, and I felt like omg, my code might be full of this kind of problem everywhere. And so I wanted to understand more... But on the other hand, it's true that we routinely test our bach externals with fast metros and qmetros working in parallel, and we have never met a problem that we couldn't solve with proper thread-locking.

So all in all I understand that in everyday situations instruction reordering is more a theoretical than a practical issue... but ok, nonetheless I think I'll start paying attention to this from now on!

Thanks again
aa

$Adam's icon

Thank you for sharing this, Nicolas!
Cheers,
Ádám

andrea agostini's icon

Very interesting, thanks!
aa