memory allocation again...

Luigi Castelli's icon

Hi guys,

I know memory allocation issues have been discussed in the past,
however I would like to get an update or reiteration, because it seems
the previous discussions were in large part based around OS 9
idiosyncrasies, most of which have now disappeared on OS X.

So, as an example, let's take a situation where I have an external
that, in response to a typed message, scans a list looking for
a particular item. For each matching item it finds in the list, the
external will need to dynamically allocate some memory. Note that there
may be more than one matching item.

I know that the amount of memory for each allocation call is not going
to be bigger than 32k, however the total memory required by all the
calls together (if the particular items in the list are many) might end
up being bigger than 32k.

In a situation like the above, I tried malloc, newhandle, getbytes, and
sysmem_newptr, with the corresponding deallocation functions free,
disposehandle, freebytes and sysmem_freeptr.

At a superficial look, they all seem to work. However, I am
interested in whether one is recommended over the others because of
subtleties that I am not aware of. Also, given the above situation, I am
curious whether one allocating function is preferable to the others.

Thank you.

- Luigi
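
For readers following along, the scenario described above can be sketched in plain C. This is only an illustration under stated assumptions: the names `match_t`, `collect_matches` and `free_matches` are hypothetical, the list is simplified to an array of longs, and standard malloc/calloc stand in for whichever allocator is chosen. It is not taken from any actual external.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record created for every matching list item. */
typedef struct match {
    long index;          /* position of the match in the list */
    char *data;          /* per-match buffer (each one below 32k in the scenario) */
    struct match *next;
} match_t;

/* Scan `items` for `target` and allocate one record plus `bytes` of
   zeroed storage per match. Returns the head of a linked list (NULL if
   nothing matched) and stores the number of matches in `*found`. */
match_t *collect_matches(const long *items, long n, long target,
                         size_t bytes, long *found)
{
    match_t *head = NULL, **tail = &head;
    long i, count = 0;

    assert(bytes > 0);
    for (i = 0; i < n; i++) {
        match_t *m;
        if (items[i] != target)
            continue;
        m = malloc(sizeof *m);
        if (!m)
            break;                      /* out of memory: stop scanning */
        m->data = calloc(1, bytes);
        if (!m->data) {
            free(m);
            break;
        }
        m->index = i;
        m->next = NULL;
        *tail = m;                      /* append to keep list order */
        tail = &m->next;
        count++;
    }
    if (found)
        *found = count;
    return head;
}

/* Release every record created by collect_matches(). */
void free_matches(match_t *head)
{
    while (head) {
        match_t *next = head->next;
        free(head->data);
        free(head);
        head = next;
    }
}
```

Each individual allocation stays small, but the total grows with the number of matches, which is exactly the case the question is about.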

------------------------------------------------------------
THIS E-MAIL MESSAGE IS FOR THE SOLE USE OF THE INTENDED RECIPIENT AND MAY CONTAIN CONFIDENTIAL AND/OR PRIVILEGED INFORMATION. ANY UNAUTHORIZED REVIEW, USE, DISCLOSURE OR DISTRIBUTION IS PROHIBITED. IF YOU ARE NOT THE INTENDED RECIPIENT, CONTACT THE SENDER BY E-MAIL AT SUPERBIGIO@YAHOO.COM AND DESTROY ALL COPIES OF THE ORIGINAL MESSAGE. WITHOUT PREJUDICE UCC1-207.
------------------------------------------------------------

Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Peter Castine's icon

On OS X you can use real malloc() any time you want (IMS, the same holds for NewPtr(), etc.). The caveat is that malloc/NewPtr can take a lot of time, whereas getbytes is fast.

You can google around to find source code for malloc() implementations. One look at some of them should convince anyone that you don't want standard memory allocation routines called in timing-sensitive operations.

So the old rules are really still valid, you just won't crash anymore for calling malloc() inside an interrupt.
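
One common way to follow that rule is to allocate up front and only reuse memory in the timing-sensitive path. The sketch below is a generic C illustration, not Max API code: `processor_t`, `processor_init`, `processor_run` and `processor_free` are hypothetical names, and the "processing" (halving each sample) is a placeholder.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object that owns a scratch buffer allocated at setup time. */
typedef struct {
    float *scratch;
    size_t capacity;   /* number of floats in scratch */
} processor_t;

/* Called from a non-time-critical context: the only place malloc() runs.
   Returns 1 on success, 0 on allocation failure. */
int processor_init(processor_t *p, size_t capacity)
{
    assert(p != NULL && capacity > 0);
    p->scratch = malloc(capacity * sizeof *p->scratch);
    p->capacity = p->scratch ? capacity : 0;
    return p->scratch != NULL;
}

/* Called from the timing-sensitive path: no allocation, just reuse.
   Input longer than the preallocated capacity is clamped.
   Returns the number of samples processed. */
size_t processor_run(processor_t *p, const float *in, size_t n)
{
    size_t i;
    if (n > p->capacity)
        n = p->capacity;
    for (i = 0; i < n; i++)
        p->scratch[i] = in[i] * 0.5f;   /* placeholder work */
    return n;
}

/* Called at teardown, again outside the time-critical path. */
void processor_free(processor_t *p)
{
    free(p->scratch);
    p->scratch = NULL;
    p->capacity = 0;
}
```

The trade-off is that the worst-case size has to be known (or clamped) ahead of time, in exchange for a run path whose cost does not depend on the allocator.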

Luigi Castelli's icon

So, is there any difference worth mentioning between getbytes and
sysmem_newptr?
(besides the fact that one has a 32k limit and the other doesn't)

Do the sysmem functions call malloc under the hood ?

Thanks.

- Luigi


Joshua Kit Clayton's icon

On May 7, 2007, at 12:35 PM, Luigi Castelli wrote:

> So, is there any difference worth mentioning between getbytes and
> sysmem_newptr ?
> (beside the fact that one has a 32k limit and the other doesn't)
>
> Do the sysmem functions call malloc under the hood ?

Yes. getbytes uses an internal memory pool (which calls malloc if it
runs out of space).

In Max 5 these calls can be used interchangeably: a small memory
allocation comes from our internal pool, and a large allocation
comes from the standard OS memory pool. I don't think
you should worry about using either one. They both should work fine
for your purposes.
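
To make the idea concrete, here is a rough sketch of a size-threshold allocator in plain C. It is not the Max implementation: `pool_alloc`, `pool_free`, `SMALL_MAX` and the single free list are assumptions chosen for illustration, and the code is not thread-safe. Like getbytes/freebytes, the caller passes the size back when freeing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_MAX 64   /* hypothetical cutoff between "small" and "large" */

/* A pooled block: while free it holds a link, while in use it holds
   up to SMALL_MAX bytes of user data. */
typedef union block {
    union block *next;
    char bytes[SMALL_MAX];
} block_t;

static block_t *free_list = NULL;   /* recycled small blocks */

/* Small requests are served from the free list, falling back to
   malloc() when the list is empty. Large requests go straight to malloc(). */
void *pool_alloc(size_t size)
{
    if (size > SMALL_MAX)
        return malloc(size);
    if (free_list) {
        block_t *b = free_list;
        free_list = b->next;
        return b;
    }
    return malloc(sizeof(block_t));
}

/* The caller must pass the same size it requested, so the allocator
   knows which path the pointer came from. */
void pool_free(void *p, size_t size)
{
    block_t *b;
    if (!p)
        return;
    if (size > SMALL_MAX) {
        free(p);
        return;
    }
    b = p;                 /* small block: keep it for reuse */
    b->next = free_list;
    free_list = b;
}
```

Freed small blocks are never handed back to the OS here, so repeated small allocations after warm-up avoid malloc entirely, at the cost of holding on to that memory.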

Again, Luigi, you seem to be worrying about code performance too much
before running into problems. I'd like to encourage you to spend your
time implementing your design and optimizing only when you find that
there are issues, and then use empirical tests to identify what those
issues are (for example, using the Shark profiler, or instrumenting your
code with something that takes explicit measurements). Don't waste
the incredibly valuable time of your mind's CPU.

-Joshua
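
As a minimal sketch of instrumenting code with explicit measurements, the following plain C wraps a piece of work in a monotonic-clock timer. It assumes a POSIX system with `clock_gettime`; `now_ms`, `measure_ms` and `malloc_free_loop` are hypothetical helper names, not Max SDK functions.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Current time in milliseconds from a monotonic clock. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
}

typedef void (*work_fn)(long iterations, size_t size);

/* Run `work` once and return how long it took, in milliseconds. */
double measure_ms(work_fn work, long iterations, size_t size)
{
    double start = now_ms();
    work(iterations, size);
    return now_ms() - start;
}

/* Example workload: allocate and immediately free the same size.
   The volatile store keeps the compiler from removing the pair. */
void malloc_free_loop(long iterations, size_t size)
{
    long i;
    assert(size > 0);
    for (i = 0; i < iterations; i++) {
        volatile char *p = malloc(size);
        if (p)
            p[0] = 1;
        free((void *)p);
    }
}
```

A call such as `measure_ms(malloc_free_loop, 100000, 16)` returns the elapsed time for that workload. Single runs are noisy, so repeating the measurement and comparing several runs gives a more trustworthy picture.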

Mark Pauley's icon

Has the cycling team done any perf analysis of getbytes vs. malloc?
OSX's malloc routine has a staggered / pooled allocation strategy as
well, eventually degenerating into vm_alloc as the allocated chunks
approach 4k (the system page size).

I'm just wondering if this isn't a bit like double-caching. In any
event, it's probably useful to have an allocation interface that more
or less works uniformly across platforms.

_Mark

On May 8, 2007, at 9:35 AM, Joshua Kit Clayton wrote:

>
> On May 7, 2007, at 12:35 PM, Luigi Castelli wrote:
>
>> So, is there any difference worth mentioning between getbytes and
>> sysmem_newptr ?
>> (beside the fact that one has a 32k limit and the other doesn't)
>>
>> Do the sysmem functions call malloc under the hood ?
>
> Yes. getbytes uses an internal memory pool (which calls malloc if it
> runs out of space).
>
> In Max 5 these calls can be used interchangeably, and if it is a
> small memory allocation, we use our internal pool, and if it is a
> large allocation, it uses the standard OS memory pool. I don't think
> you should worry about using either one. They both should work fine
> for your purposes.
>
> Again, Luigi, you seem to be worrying about code performance too
> much before running into problems. I'd like to encourage you to
> spend your time implementing your design and optimizing only when
> you find that there are issues, and then use empirical tests to
> identify what those issues are (for example, using the shark
> profiler, or instrument your code with something that takes explicit
> measurements). Don't waste the incredibly valuable time of your
> mind's CPU.
>
> http://www.extremeprogramming.org/rules/optimize.html
> http://www.extremeprogramming.org/stories/optimize2.html
>
> -Joshua

Joshua Kit Clayton's icon

On May 8, 2007, at 10:04 AM, Mark Pauley wrote:

> Has the cycling team done any perf analysis of getbytes vs.
> malloc? OSX's malloc routine has a staggered / pooled allocation
> strategy as well, eventually degenerating into vm_alloc as the
> allocated chunks approach 4k (the system page size).

Good point. A quick test on my MBP shows that malloc is usually
faster or approximately the same. Here are the results for the code
snippet at bottom with various fixed sizes. malloc seems to be about 2-3
times faster when quickly reallocating/deallocating the same size
over and over again, and then it's a bit of a toss-up when growing the
memory pool or allocating random memory sizes.

ITERATION COUNT: 100000; ALLOC SIZE(BYTES): 16
POOL REUSE
getbytes: 22.827000 ms
malloc: 11.202000 ms
POOL GROW
getbytes: 30.482000 ms
malloc: 34.919000 ms
POOL RANDOM
getbytes: 25.689000 ms
malloc: 12.784000 ms

ITERATION COUNT: 100000; ALLOC SIZE(BYTES): 128
POOL REUSE
getbytes: 22.871000 ms
malloc: 10.882000 ms
POOL GROW
getbytes: 65.705000 ms
malloc: 60.475000 ms
POOL RANDOM
getbytes: 30.505000 ms
malloc: 35.556000 ms

ITERATION COUNT: 100000; ALLOC SIZE(BYTES): 1024
POOL REUSE
getbytes: 24.227000 ms
malloc: 8.754000 ms
POOL GROW
getbytes: 258.164000 ms
malloc: 225.734000 ms
POOL RANDOM
getbytes: 26.731000 ms
malloc: 22.175000 ms

> I'm just wondering if this isn't a bit like double-caching. In any
> event, it's probably useful to have an allocation interface that
> more or less works uniformly across platforms.

Yes. Most of this is legacy based on NewPtr not being accessible in
interrupt. We could probably change to use the malloc pool in all
cases, or tune some of the bottlenecks in our mem pool
implementation. However, as hinted in the last message, optimization
of things which aren't the bottleneck isn't usually the best use of
one's time. Will try to shark it and tune a bit before Max 5 release,
though.

-Joshua

#define A_BIG_NUMBER        100000
#define A_SMALL_NUMBER        16
    // getbytes vs. malloc timing test
    {
        long i,count;
        char *p;
        char *q[A_BIG_NUMBER];
        double start,end;

        cpost("ITERATION COUNT: %d; ALLOC SIZE(BYTES): %d\n", A_BIG_NUMBER, A_SMALL_NUMBER);

        // POOL REUSE: allocate and immediately free the same size
        cpost("POOL REUSE\n");
        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++) {
            p = getbytes(A_SMALL_NUMBER);
            freebytes(p,A_SMALL_NUMBER);
        }
        end = mactimer_gettime();
        cpost("getbytes: %f ms\n",end-start);

        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++) {
            p = (char *)malloc(A_SMALL_NUMBER);
            free(p);
        }
        end = mactimer_gettime();
        cpost("malloc: %f ms\n",end-start);

        // POOL GROW: allocate everything first, then free everything
        cpost("POOL GROW\n");
        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++)
            q[i] = getbytes(A_SMALL_NUMBER);
        for (i=0;i<A_BIG_NUMBER;i++)
            freebytes(q[i],A_SMALL_NUMBER);
        end = mactimer_gettime();
        cpost("getbytes: %f ms\n",end-start);

        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++)
            q[i] = (char *)malloc(A_SMALL_NUMBER);
        for (i=0;i<A_BIG_NUMBER;i++)
            free(q[i]);
        end = mactimer_gettime();
        cpost("malloc: %f ms\n",end-start);

        // POOL RANDOM: allocate and free a random size in 1..A_SMALL_NUMBER
        cpost("POOL RANDOM\n");
        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++) {
            count = (rand()%A_SMALL_NUMBER) + 1;
            p = getbytes(count);
            freebytes(p,count);
        }
        end = mactimer_gettime();
        cpost("getbytes: %f ms\n",end-start);

        start = mactimer_gettime();
        for (i=0;i<A_BIG_NUMBER;i++) {
            count = (rand()%A_SMALL_NUMBER) + 1;
            p = (char *)malloc(count);
            free(p);
        }
        end = mactimer_gettime();
        cpost("malloc: %f ms\n",end-start);

    }

Peter Castine's icon

Quote: jkc wrote on Tue, 08 May 2007 20:15
----------------------------------------------------

> Yes. Most of this is legacy based on NewPtr not being accessible in
> interrupt.
----------------------------------------------------

I was recently rereading Miller's early papers on Max, and I would venture that just as much of the legacy came from the concern that mid-80s state-of-the-art memory allocation took large (and indeterminate) amounts of time.

But NewPtr() in interrupt was a pretty killer argument.

-- P.