andrea agostini

I have found that, in some occasions, mutexing an object_free causes a deadlock. Here's an extremely stripped-out example:

void plussz_bang(t_plussz *x)
{
    systhread_mutex_lock(x->p_mutex);
    object_free(x->p_aa);
    x->p_aa = atomarray_new(0, NULL);
    systhread_mutex_unlock(x->p_mutex);
}

Shortly after I turn on the toggle, Max hangs. In thread 1, it get stuck inside object_free (in what looks to me like a call to critical_enter, but my assembler skills are too limited for being sure); in thread 6, I am stuck inside systhread_mutex_lock.

The doc clearly states not to lock outlet calls, and I can easily see why. I don't really grasp what's happening here, though.

Of course, in this stripped-down scenario the solution is straightforward - assigning x->p_aa to a local variable, and freeing it after unlocking the mutex. I am aware that this would also be a better design choice, to keep the locked code as small as possible.

Unfortunately, the problem popped out in a far more complex situation, in which avoiding to free anything inside the mutex would be hell.

- Why exactly this problem arises? Is it something related Max's global critical region? Perhaps, my systhread_mutex_lock call happens inside the global critical region?

- The problem doesn't seem to arise in every possible situation. Is it just by chance, or are there specific conditions triggering it?

- Would the situation improve if I manage my object's lifecycle with object_retain() and object_release(), as Joshua was suggesting me some days ago?

void plussz_bang(t_plussz *x)
{
	systhread_mutex_lock(x->p_mutex);
	object_free(x->p_aa);
	x->p_aa = atomarray_new(0, NULL);
	systhread_mutex_unlock(x->p_mutex);
}

What I'd like to know is:
- Why exactly this problem arises? Is it something related Max's global critical region? Perhaps, my systhread_mutex_lock call happens inside the global critical region?
- The problem doesn't seem to arise in every possible situation. Is it just by chance, or are there specific conditions triggering it?
- Would the situation improve if I manage my object's lifecycle with object_retain() and object_release(), as Joshua was suggesting me some days ago?
- Do I have an easy solution????

deadlock-with-object_free

Is it possible just to schedule the object for deletion in another (safe) thread without lock?

Probably you could defer or deferlow (which are both threadsafe) - once the pointer has been passed here you can replace it with a new pointer without waiting for it to be freed. That is probably also going to make your code faster, should you be trying to free stuff in a higher priority thread.

If it isn't safe to call defer / deferlow when locked (I'm not sure) you might look at another way of doing this using atomic operations - that might require going outside of the c74 library - I'm not sure what they provide, but you should be able to get an atomic compare or swap (or something like this). In that case you might:

3 - free the old object (possibly deferring)

I've implemented something like the above using the OS X atomic ops.

Personally, I would favour threading solutions that are as lightweight as possible (such as the above), rather than locking for unnecessarily long periods, but I'm not an expert on threading so YMMV.

1 - allocate a new object
2 - atomically swap the pointer
3 - free the old object (possibly deferring)

In fact, I don't want to defer object_free because this might result in clogging the low priority queue.

Using an atomic swap doesn't solve my problem - the thing is, I need the mutex because in real life I'm doing other things inside it besides swapping the pointer.

1 - why are you not worried about clogging up the high-priority thread with all the freeing in that case? The work has to be done somewhere - if you are worried about the queue being resource heavy you could either a - test it and see, b - write a lightweight queue and despatch several objects at the same time for freeing - the latter might be more trouble than it's worth.

2 - In that case, what are they(the real life things)? - it's pretty hard to advise on a good course of action without knowing what you are doing. It looks like you are replacing an array of atoms with a new one and you want to avoid data corruption from threading issues. Well - are you accessing the old array at all? If not, why not prepare the new array (using a local pointer) and then atomically swap the pointer with a memory barrier when the array is ready? I'm guessing it is not that simple.

This seems to be a situation where the information you aren't giving in your post is pretty crucial to finding a good solution. I don't really understand why you can't wait till outside the lock to free the resources anyway (locally even if not deferted) - of course there may be a good reason, but you haven't given it - could you post more code here, or give some more info?

if there's a lot of stuff going on between 

 then it might also happen that your hang is not related at all to freeing the object, but maybe something else is hanging there. Maybe you could also post a crash-log to see what's going on. I'm not sure how 

 works for atomarrays, but my guess is that when you call it, the atoms will be freed as well. So if you have some A_OBJ-like atom in the atomarray with an own free function that would use the same mutex, it might happen that you run into a deadlock. But as Alex pointed out, without knowing more details this remains a guess...

if there's a lot of stuff going on between systhread_mutex_lock and systhread_mutex_unlock then it might also happen that your hang is not related at all to freeing the object, but maybe something else is hanging there. Maybe you could also post a crash-log to see what's going on. I'm not sure how object_free works for atomarrays, but my guess is that when you call it, the atoms will be freed as well. So if you have some A_OBJ-like atom in the atomarray with an own free function that would use the same mutex, it might happen that you run into a deadlock. But as Alex pointed out, without knowing more details this remains a guess...

Ok, I probably didn't express well my problem.

Just build an external with the bang method I provided (and the mutex initialization in the new method, of course). Run the patch. On my machine, Max hangs after a short while. This - I guess - is enough to demonstrate that the problem lies in object_free in itself.

No atom gets freed (yes, freeing an atomarray frees its atoms, but no, it doesn't free the objects in the A_OBJ unless you set a specific flag - and anyway there are no atoms here!), nothing else is done. Just freeing. And one of the two deadlocked threads is stuck at a lock inside object_free.

For the moment, I'd like to understand why this happens, if it's regular, if I have overlooked something...

Then yes, in my specific case (the problem is, I do need to call from inside a lock a function that frees an object) probably the solution will be a garbage collection mechanism, operating in the main thread only.

IMO, simply deferring object_free is not enough: if the scheduler thread needs to free many things one after another, and each call to object_free is singularly deferred, your low-priority queue would grow more quickly than it's serviced - not a great idea...

Just build an external with the bang method I provided (and the mutex initialization in the new method, of course). Run the patch. On my machine, Max hangs after a short while. This - I guess - is enough to demonstrate that the problem lies in object_free in itself. 

Then yes, in my specific case (the problem is, I do need to call from inside a lock a function that frees an object) probably the solution will be a garbage collection mechanism, operating in the main thread only.
IMO, simply deferring object_free is not enough: if the scheduler thread needs to free many things one after another, and each call to object_free is singularly deferred, your low-priority queue would grow more quickly than it's serviced - not a great idea...

Your post clarifies what you've already said but it doesn't say anything I didn't understand from the rest of the thread. Only the c74 can reliably answer whether you can lock around object_free. If you don't want to detail your project then fine, but then only the c74 team can help. If you can give more detail then maybe the rest of us can help you further.

I understand you wanting to understand why the deadlock happens, but your example seems to prove that it will deadlock and therefore is not viable. The example is very simple, and I see no reason for it to deadlock - however, it would seem that it does - which is a problem.

Whilst it is hard to appreciate your design needs without knowing the general task, it seems that you are pursuing a design that relies on rapid allocating and freeing of objects - and perhaps (although perhaps not) there is a different solution that does not involve this, which is already potentially a performance problem. It is also hard to know why you need to be in the lock when you free without knowing more about the more general task -I'm sure there is a reason - but it's hard for me to think of one right now.

If your low priority queue is growing faster than it is serviced that is a problem, but that implies a lot of freeing, which seems like a problem in itself. However, if you are convinced that this is the best routethen it looks like the solution is to write your own garbage queue. One possibility is to have a regular clock call that then defers a call that deals with a number of objects together. If you really want to minimise calls you could possibly make your clock global to the object (rather than per instance), and use lightweight threading solutions to make the queue threadsafe. That way the load on the low priority queue will not be affected by the number of instances of your object.

Your post clarifies what you've already said but it doesn't say anything I didn't understand from the rest of the thread. Only the c74 can reliably answer whether you can lock around object_free. If you don't want to detail your project then fine, but then  only the c74 team can help. If you can give more detail then maybe the rest of us can help you further.

If your low priority queue is growing faster than it is serviced that is a problem, but that implies a lot of freeing, which seems like a problem in itself. However, if you are convinced that this is the best routethen it looks like the solution is to write your own garbage queue. One possibility is to have a regular clock call that then defers a call that deals with a number of objects together. If you really want to minimise calls you could possibly make your clock global to the object (rather than per instance), and use lightweight threading solutions to make the queue threadsafe. That way the load on the low priority queue will not be affected by the number of instances of your object.


 example project with your modifications, here are my results:

1) If Overdrive is disabled, the thing works fine (I've been running it for 15 minutes and there was no trouble, except that my fans had some hard times).

2) If Overdrive is enabled, the patch will hang after a while. However, checking the crash log after force-quitting MaxMSP, there's no specific entry regarding Max (I searched the crash log for the following expressions: 'maxmsp', 'object_free', 'mutex', 'atomarray' and 'plussz', but nothing was found containing these characters). This, I guess, means that there's actually no problem, just that things are happening so fast in overdrive mode that the GUI wouldn't get enough processor time to respond.

For me this means that you either have a memory leak or your test patch/external has more modifications than mentioned. Could you maybe post the source code of your modified plussz external?

I compiled the plussz example project with your modifications, here are my results:

1) If Overdrive is disabled, the thing works fine (I've been running it for 15 minutes and there was no trouble, except that my fans had some hard times).
2) If Overdrive is enabled, the patch will hang after a while. However, checking the crash log after force-quitting MaxMSP, there's no specific entry regarding Max (I searched the crash log for the following expressions: 'maxmsp', 'object_free', 'mutex', 'atomarray' and 'plussz', but nothing was found containing these characters). This, I guess, means that there's actually no problem, just that things are happening so fast in overdrive mode that the GUI wouldn't get enough processor time to respond.

OK, I see I wasn't completely correct. Of course the crash log had an entry for MaxMSP, but that was in the section listing the current running processes and not within the threads...

I think Andrea's assumptions are right - and the fact that when overdrive is off (and everything runs in a single thread) there is no issue means that the problem is with threading. I would assume that removing either the mutex or the object_free call (both of which will cause potential memory leaks and other issues) will stop the problem.

Thanks for pointing this issue out with a clear example. This has to do with the scheduler locking the global critical region before executing a clock, and object_free() locking the global critical region while freeing. So we have the following:

1. Scheduler: Locks global critical region to service metro (LOCK A)

2. Queue: qmetro executes, calling plusz bang method

3. Queue: Locks plusz systhread mutex (LOCK B)

5. Queue: Stalls in object_free waiting on global critical region (LOCK A)

6. Scheduler: metro executes, calling plusz bang method

7. Scheduler: Stalls trying to lock plusz lock (LOCK B)

And we now have the classical example leading to deadlock.

So to prevent this, we make sure that lock A is always locked from both threads before lock B, or change it so that there is no interdependence between locks. For the latter, you've already identified a solution above. For the former, your example should be changed to the following code.

systhread_mutex_lock(x->p_mutex); // can remove if global critical is enough

systhread_mutex_unlock(x->p_mutex); // can remove if global critical is enough

Sorry for the hassle here. We're working on improving multi-threading from a variety of angles for the next major version of Max, and we may be able to remove the global critical region interdependence with clock servicing and object_free(). For the meantime, however, you'll need to work around this threading behavior as suggested.

1. Scheduler: Locks global critical region to service metro (LOCK A)
2. Queue: qmetro executes, calling plusz bang method
3. Queue: Locks plusz systhread mutex (LOCK B)
4. Queue: Calls object_free
5. Queue: Stalls in object_free waiting on global critical region (LOCK A)
6. Scheduler: metro executes, calling plusz bang method
7. Scheduler: Stalls trying to lock plusz lock (LOCK B)

So to prevent this, we make sure that lock A is always locked from both threads before lock B, or change it so that there is no interdependence between locks. For the latter, you've already identified a solution above. For the former, your example should be changed to the following code. 

critical_enter(0);
systhread_mutex_lock(x->p_mutex); // can remove if global critical is enough
object_free(x->p_aa);
x->p_aa = atomarray_new(0, NULL);
systhread_mutex_unlock(x->p_mutex); // can remove if global critical is enough
critical_exit(0);

Sorry for the hassle here. We're working on improving multi-threading from a variety of angles for the next major version of Max, and we may be able to remove the global critical region interdependence with clock servicing and object_free(). For the meantime, however, you'll need to  work around this threading behavior as suggested.

thank you very much. Now everything makes sense.

Pizza's question is really interesting: what else depends upon the global critical region? (a lot of stuff, I guess, but) are there other occasions in which deadlocks may happen because of it?

Oh, and since I recommend people avoid the global critical generally, I figured I should give a good example for the above problem. Maybe some of you are already doing this, but a better way to do the above for Max 5 without using the global critical, and also reducing the lock duration as much as possible would be:

t_object *tmp1,*tmp2;

tmp1 = atomarray_new(0, NULL);

systhread_mutex_lock(x->p_mutex);
tmp2 = x->p_aa;
x->p_aa = tmp1;
systhread_mutex_unlock(x->p_mutex);  

object_free(tmp2);

The general pattern is to remove any expensive function call or call with unknown lock behaviors outside of your lock. This type of locking only around the the pointer swap is fast and could be put into a utility function, similar to an atomic function. Atomic functions could potentially be an alternative for you unless you need to share the lock with other non atomic operations, which is often the case, and where the following is useful. Maybe we'll add it in a future API.

void *systhread_mutex_ptrswap(t_systhread_mutex *mutex, void **src, void **dst)
{
     void *tmp;

     systhread_mutex_lock(mutex);
     tmp = *dst;
     *dst = *src;
     systhread_mutex_unlock(mutex);  

     return tmp; // return previous contents of *dst so that you can free the pointer after calling this function
}

Yes. Note that your blah blah code isn't executed, but I assume that's okay for you.

deadlock with object_free