Slow copying of input matrix data causes A/V sync problems — can I use pthreads inside my matrix_calc function?
I am writing a Jitter MOP external in C which encodes video and audio and broadcasts the stream over RTMP. I copy incoming video matrices and audio samples into separate queues, and a separate thread picks the audio and video data up and encodes it. I copy the incoming matrices in my matrix_calc function using memcpy; to make this more efficient I use the jit_parallel_ndim_simplecalc2() utility function.
The problem is that audio and video occasionally fall out of sync. According to my investigation this happens because the matrix_calc function does not copy the data fast enough, and the situation gets worse when the patch contains a number of other Jitter operations.
Is there anything I can do to make this more efficient? Is it possible to give my MOP a higher priority somehow? Or can I use a separate thread inside the matrix_calc function?
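In case it helps, the producer/consumer handoff I am describing is essentially this pattern (a generic pthread sketch with hypothetical names, not the Jitter code itself):

```c
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_CAP 21  /* same cap as in my external */

typedef struct {
    void *slots[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} frame_queue;

void fq_init(frame_queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
}

/* producer side (matrix_calc): drop the frame if the queue is full,
   never block the Jitter thread; returns 1 if the frame was queued */
int fq_try_push(frame_queue *q, void *frame) {
    int ok = 0;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->slots[q->tail] = frame;
        q->tail = (q->tail + 1) % QUEUE_CAP;
        q->count++;
        ok = 1;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* consumer side (encoder thread): block until a frame arrives */
void *fq_pop(frame_queue *q) {
    void *frame;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    frame = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return frame;
}
```

The important property is that the producer side never blocks: when the queue is full the frame is simply dropped.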
Here is the code:
t_jit_err job_matrix_calc(t_job *job, void *inputs, void *outputs)
{
	void *inMatrix = jit_object_method(inputs, _jit_sym_getindex, 0);
	if (!inMatrix)
		return JIT_ERR_INVALID_PTR;

	long inMatrixLock = (long)jit_object_method(inMatrix, _jit_sym_lock, 1);
	t_jit_matrix_info matrixInfo, bufferInfo;
	unsigned char *inMatrixData;
	long dim[JIT_MATRIX_MAX_DIMCOUNT];

	jit_object_method(inMatrix, _jit_sym_getinfo, &matrixInfo);
	jit_object_method(inMatrix, _jit_sym_getinfo, &bufferInfo);

	long width = matrixInfo.dim[0];
	long height = matrixInfo.dim[1];
	long planeCount = matrixInfo.planecount;

	/* get a pointer to the input matrix data */
	jit_object_method(inMatrix, _jit_sym_getdata, &inMatrixData);

	/* copy dimensions */
	for (long i = 0; i < matrixInfo.dimcount; i++)
		dim[i] = matrixInfo.dim[i];

	long bufferSize = width * height * planeCount;

	/* only allocate and copy when there is room in the queue; otherwise
	   drop the frame without touching malloc at all */
	if (job->videoSamples->size() < 21) {
		unsigned char *buffer = (unsigned char *)malloc(bufferSize);
		/* parallel copy of the matrix data into buffer */
		jit_parallel_ndim_simplecalc2((method)job_calculate_ndim, job,
			matrixInfo.dimcount, dim, matrixInfo.planecount,
			&matrixInfo, (char *)inMatrixData,
			&bufferInfo, (char *)buffer, 0, 0);

		/* wrap the copied data in a videoSample and queue it for the
		   encoder thread; the sample takes ownership of buffer */
		videoSample *sample = new videoSample(width, height, planeCount, buffer);
		job->videoSamples->push(sample);

		/* while the queue is nearly empty, push a duplicate frame to prime it */
		if (job->videoSamples->size() <= 3) {
			unsigned char *buffer2 = (unsigned char *)malloc(bufferSize);
			jit_parallel_ndim_simplecalc2((method)job_calculate_ndim, job,
				matrixInfo.dimcount, dim, matrixInfo.planecount,
				&matrixInfo, (char *)inMatrixData,
				&bufferInfo, (char *)buffer2, 0, 0);
			videoSample *sample2 = new videoSample(width, height, planeCount, buffer2);
			job->videoSamples->push(sample2);
		}
	}

	jit_object_method(inMatrix, _jit_sym_lock, inMatrixLock);
	return JIT_ERR_NONE;
}

void job_calculate_ndim(t_job *job,
	long dimcount, long *dim, long planecount,
	t_jit_matrix_info *videoInfo, char *videoData,
	t_jit_matrix_info *sampleInfo, char *buffer)
{
	long i;
	uchar *ip, *op;
	/* use the dim values passed in: the parallel utility splits the
	   matrix and hands each worker its own sub-region */
	long rowBytes = dim[0] * planecount;

	if (dimcount < 1)
		return; /* safety */

	switch (dimcount) {
	case 1:
		/* if only 1D, interpret as 2D, falling through to the 2D case */
		dim[1] = 1;
		/* fall through */
	case 2:
		/* assumes rows are tightly packed, i.e.
		   dimstride[1] == dim[0] * planecount; copy row by row using
		   videoInfo->dimstride[1] if the input matrix may be padded */
		memcpy(buffer, videoData, rowBytes * dim[1]);
		break;
	default:
		/* if processing a higher dimension than 2D, recurse over each
		   lower-dimensional slice with a decremented dimcount and
		   adjusted base pointers */
		for (i = 0; i < dim[dimcount - 1]; i++) {
			ip = (uchar *)videoData + i * videoInfo->dimstride[dimcount - 1];
			op = (uchar *)buffer + i * rowBytes * dim[1];
			job_calculate_ndim(job, dimcount - 1, dim, planecount,
				videoInfo, (char *)ip, sampleInfo, (char *)op);
		}
	}
}
Hi,
There is nothing to prevent you from pushing work out to additional threads within your matrix_calc method. It's not clear from the information given whether this will actually improve or solve the situation, as these kinds of problems can be notoriously difficult. It will likely also depend on the size and structure of the matrices.
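Very roughly, something like this (an untested sketch with plain pthreads and hypothetical names; note that whatever you hand to the worker must own its own copy of the data, since the matrix lock is released when your matrix_calc returns):

```c
#include <pthread.h>
#include <stdlib.h>

/* hypothetical work item: owns its buffer, so the worker can run after
   matrix_calc has returned and the matrix lock is gone */
typedef struct {
    unsigned char *data;
    long size;
    void (*fn)(unsigned char *data, long size);
} work_t;

static void *work_entry(void *arg) {
    work_t *w = (work_t *)arg;
    w->fn(w->data, w->size);
    free(w->data);
    free(w);
    return NULL;
}

/* start fn(data, size) on a new thread, transferring ownership of data;
   joinable here for clarity -- in an external you would likely detach
   the thread or reuse a pool instead of creating one per frame */
int spawn_worker(unsigned char *data, long size,
                 void (*fn)(unsigned char *, long), pthread_t *tid) {
    work_t *w = (work_t *)malloc(sizeof(work_t));
    if (!w) return -1;
    w->data = data;
    w->size = size;
    w->fn = fn;
    if (pthread_create(tid, NULL, work_entry, w)) {
        free(w);
        return -1;
    }
    return 0;
}

/* example worker: sums the bytes into a global (a stand-in for encoding) */
static long g_sum;
static void sum_bytes(unsigned char *data, long size) {
    long s = 0;
    for (long i = 0; i < size; i++)
        s += data[i];
    g_sum = s;
}
```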
Cheers!
Hi Timothy,
Thanks for replying. I have recently tried using worker threads, both through systhreadpool and through g_thread_pool. It turns out this actually slows things down further.
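What I am trying next is to take malloc out of the per-frame path by recycling a fixed set of preallocated buffers, along these lines (untested sketch, hypothetical names and sizes):

```c
#include <pthread.h>
#include <stdlib.h>

#define POOL_SIZE 8

/* simple fixed-size freelist of equally sized frame buffers, so the
   matrix_calc hot path never calls malloc */
typedef struct {
    unsigned char *free_list[POOL_SIZE];
    int count;
    long buf_size;
    pthread_mutex_t lock;
} buffer_pool;

int pool_init(buffer_pool *p, long buf_size) {
    p->count = 0;
    p->buf_size = buf_size;
    pthread_mutex_init(&p->lock, NULL);
    for (int i = 0; i < POOL_SIZE; i++) {
        unsigned char *b = (unsigned char *)malloc(buf_size);
        if (!b) return -1;
        p->free_list[p->count++] = b;
    }
    return 0;
}

/* returns NULL when the pool is exhausted (caller drops the frame) */
unsigned char *pool_acquire(buffer_pool *p) {
    unsigned char *b = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->count > 0)
        b = p->free_list[--p->count];
    pthread_mutex_unlock(&p->lock);
    return b;
}

/* the encoder thread returns a buffer once it has consumed the frame */
void pool_release(buffer_pool *p, unsigned char *b) {
    pthread_mutex_lock(&p->lock);
    if (p->count < POOL_SIZE)
        p->free_list[p->count++] = b;
    else
        free(b); /* should not happen with matched acquire/release */
    pthread_mutex_unlock(&p->lock);
}
```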