is object_alloc thread-safe?

andrea agostini's icon

Hi.

I'm really clueless with a bug.

If I call object_alloc a real lot of times, banging simultaneously from a metro and a qmetro at a very high speed, eventually I get a crash. The overall picture is quite complex, so I can't really describe the whole thing here, but the catch-all explaination would be object_alloc not being safe.

I guess I'm speaking non-sense, since posix-compliant malloc() must be safe, but can someone please reassure me about this?

thank you!
aa

Timothy Place's icon

Hi Andrea,

By convention, the object life-cycle is presumed to always occur in the main thread. By object life-cycle, I mean class registration, instantiation, and freeing.

While its true that we allocing memory, we also do a bit more than that. For example, we zero the memory, set the magic number, setup obex for the object (to support attributes etc), set the #P/#B symbol to the patcher, etc. So there is a little bit more going on.

It is not recommended that you alloc memory from the scheduler thread (e.g. in response to metro). Instead you should defer the call to get it back to the main thread. Allocating memory here has the potential significantly degrade the performance of the scheduler.

One strategy here is that you could pre-allocate a pool of whatever it is that you might need, and then in the scheduler-called method you simply grab some instances from your pool. Not sure if this helps, but hopefully it will get you on the right track.

Cheers,
Tim

Joshua Kit Clayton's icon

Actually, this is an old model (Max 4 and earlier). All "box" objects should happen in the main thread, but otherwise object allocation and freeing of "nobox" objects is thread-safe. We allocate objects, and memory in various places in other threads besides the main thread without incident.

As Tim points out, you might want to avoid doing this extensively for performance reasons. However, it should not crash.

From what you mention ("eventually I get a crash"), are you watching to make sure you're not hitting the memory limits of a 32-bit allocation? If so, perhaps you have a memory leak? If not, perhaps you have a crash log which might offer some insight into what sort of crash is occurring, together with some code fragments of the crashing function?

The next thing from what you describe (a qmetro and a metro running simultaneously), are you sure that whatever you are doing with the objects is threadsafe? It would need to be in such a case. For example the following is not threadsafe:

void foo_bang(t_foo *x)
{
    object_free(x->object);
    x->object = object_alloc(CLASS_NOBOX,gensym("bar"));
}

This type of memory access by two threads is a problem since there's nothing stopping the threads from switching to one another between the two lines. This can result in:

1. double freeing an object and possibly crashing
2. double allocating objects, thus introducing a memory leak

You would want to do something like the following using either locks (easier to manage, especially in complex cases with many related operations):

typedef struct _foo
{
   t_object ob;
   t_object *bar;
} t_foo;

void foo_bang(t_foo *x)
{
     systhread_mutex_lock(x->mutex);
     object_free(x->bar);
     x->object = object_alloc(CLASS_NOBOX,gensym("bar"));
     systhread_mutex_unlock(x->mutex);
}

or atomic operations (cheaper than using locks, but harder to get right for complex cases) with something along the lines of the following code which uses the Apache Portable Runtime Atomic API (we don't provide this):

#include "apr_atomic.h"

typedef struct _foo
{
   t_object ob;
   t_object *bar;
} t_foo;

void foo_bang(t_foo *x)
{
     t_object *new_obj,*old_obj;
     new_obj = object_alloc(CLASS_NOBOX,gensym("bar"));
     old_obj = (t_object *) apr_atomic_xchgptr((void **)&x->bar,(void *)new_obj);
     object_free(old_obj);
}

This is an easy scenario to overlook. There are probably places in our own code (and the SDK) which make this kind of oversight.

Finally, if it is not a crash, but rather a "spinning wheel of death", then, perhaps you have a deadlock scenario, which most often happens if you place an outlet call *inside* your own mutex lock (big no no). Otherwise, we'd need to know more about what you're doing before we could offer much assistance.

-Joshua

$Adam's icon

Hello all,

just by curiosity (and because I'm also working now with multi-threaded stuff), is it better, worse, or just the same if instead of using the systhread_mutex_lock/unlock combo I would simply use the critical_enter/critical_free pairs? What is the difference between these and what are the pros/cons of one over the other?

Thank you,
Ádám

$Adam's icon

Sorry, of course I was referring to critical_exit(), not critical_free()...

andrea agostini's icon

Hi folks.
Thank you for your explainations, first.

I'll try to make some things clearer.

Tim, I am aware that object_alloc does more than merely allocating memory - I just (wrongly) assumed that memory allocation was the only thread-risky part of the operation.

Anyway, the class I'm frantically instantiating is a nobox one.

The crash log doesn't really help. The point is that my data are corrupted in a (very systematic) way that eventually causes a crash. What I remarked is that inside my data structure I unmistakably recognize some data belonging to a message box elsewhere in the patch - this is why I was wondering if there was a memory allocation problem, kind of the same block of memory allocated twice because of a thread-safety problem.

This thing of the message box is a quite intriguing one. The metro and qmetro are actually connected to a message box (containing 1 2 5), which in turns is connected to my object [reg]. Besides the toggles and number boxes for the metro and qmetro, there's nothing more in the patch. When [reg] receives a list, this is a simplified version of what happens:

void reg_list(t_reg *x, t_symbol *sym, long ac, t_atom *av) {
    t_llll *new_llll = object_new_typed(_sym_nobox, _sym_llll, ac, av); // object_alloc is in llll's "new" method
    if (!new_llll)
        return;
    systhread_mutex_lock(x->lock);
    object_free(x->cache);
    x->cache = new_llll;
    systhread_mutex_unlock(x->lock);
    if (inlet == 0)
        reg_bang(x);
}

void reg_bang(t_reg *x) {
    t_atom parsed_wrapped, *parsed_atom;
    t_atomarray parsed_aa;
    systhread_mutex_lock(x->lock);
    object_method_typed(x->cache, gensym("clone", 0, NULL, wrapped);
    // clone is a GIMMEBACK method
    // it instantiates a new llll and clones "this" into it
    // but x->cache gets corrupted and clone crashes
    systhread_mutex_unlock(x->lock);
    atomarray_getatoms(parsed_aa = (t_atomarray *) atom_getobj(&parsed_wrapped), &out_ac, &parsed_atom);
    outlet_anything(x->outlet, _sym_foo, 1, parsed_atom);
    object_free(atom_getobj(parsed_atom));
    object_free(parsed_aa);
}

(As you can see, at the moment a new object is cached the previous one is freed; and the cloned object that is output is immediately destroyed thereafter.)

Now, deep inside my llll structure (not at the very beginning, though) I have two longs, one after another, that at the moment of the crash contain 2 and 5. This is not the immediate cause of the crash, but a clear symptom that something is wrong. If I change the contents of the message box, the two longs change accordingly. Now, this would suggest that someone stores the 2 and 5 inside a long[] or something like that - not in an atom[], because in this case they would not be aligned to my two longs in the object structure, right? Since in my object the incoming message is only stored inside atoms, I'm guessing someone else has written inside my object structure. This is why I was wondering about object_alloc.

Finally, this thing of banging my object from a very fast metro and qmetro doesn't belong to a real-world scenario - it's just for testing. Though, I don't like it not to work…

As soon as I get more detalis about the problem, I'll let you know!

aa

ps @ Siska: afaik, critical regions are basically recursive mutexes. This means that you can enter them more than once from the same thread (and you have to unlock them more than once, correspondingly). Besides the fact that you might not want this, criticals are much more expensive than regular mutexes - so in general you should avoid them when you don't need them.

Joshua Kit Clayton's icon

Hi Andrea,

Yes, it sounds like something is trashing memory, or you're using memory which is no longer valid (e.g. referencing atoms passed in as arguments after they've been freed or popped from the stack).

Your locking looks correct. I'll assume that there's some innocent typos in the above code, however.

Without seeing your llll constructor or your clone method, it's a little hard to tell whether or not you're dealing with your atoms and any allocated memory properly.

Let us know if you have more details and better examples for us to investigate.

@Siska: Yes, what Andrea mentions is correct. critical_new() is the same thing as systhread_mutex_new(&mutex,SYSTHREAD_MUTEX_RECURSIVE);

andrea agostini's icon

Hi!
So, I have found the bug (it was a very trivial one, inside the free method)
Thank you for your help - it really pointed me into the right direction

Cheers
aa