* what I suppose I should do is have each thread calculate one half of the oscillators, then sum together the results from the threads and send the sum out of my signal outlet.
Yes. However, keep in mind that if more cores are available there will be more than two threads, so each thread would handle 1/Nth of the oscillators rather than half.
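To make the 1/Nth split concrete, here is a minimal sketch of how each worker could work out which oscillators belong to it. The helper name slice_bounds and its parameters are hypothetical, not part of the SDK; it only assumes each worker knows its own index and the total worker count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an SDK function): compute the half-open range
   [*start, *end) of oscillators handled by worker `index` (0-based) when
   `total` oscillators are divided among `workers` workers. Any remainder
   is spread across the workers so the slices differ by at most one. */
static void slice_bounds(size_t total, size_t workers, size_t index,
                         size_t *start, size_t *end)
{
    assert(workers > 0 && index < workers);
    *start = (total * index) / workers;
    *end   = (total * (index + 1)) / workers;
}
```

With 10 oscillators and 3 workers this yields the ranges [0,3), [3,6) and [6,10), so every oscillator is covered exactly once even when the count does not divide evenly.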
* at which moments should I create and execute the parallel task - dsp method, perform method, ...?
We create the task the first time the perform method is called. In the above example, myobj_run() would be called from myobj_perform().
* am I wrong for guessing that the parallel task should be executed once per vector, from within the perform method?
You are correct. parallel_execute should be called once from your perform method.
* should the perform method wait for each parallel task to finish, then add the partial results together and send the sum out of the outlet?
Yes. However, you don't need to do anything special with respect to synchronization. parallel_execute will run one portion in the main audio thread and the additional portions in other threads, and it waits for the other threads to complete before returning. There is no need to do any of the additional management you mention in your message; it's all done for you.
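As a rough sketch of the overall shape, assuming each worker writes into its own partial buffer and the perform routine sums those buffers once the parallel call has returned: the names osc_bank, bank_worker and bank_perform are hypothetical, the oscillators are simple phasors, and a plain loop stands in for the parallel call so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_OSCS    4   /* assumed sizes, for illustration only */
#define NUM_WORKERS 2
#define VEC_SIZE    4

typedef struct {
    float phase[NUM_OSCS];                 /* current phase, 0..1 */
    float inc[NUM_OSCS];                   /* phase increment per sample */
    float partial[NUM_WORKERS][VEC_SIZE];  /* one private buffer per worker */
} osc_bank;

/* Work done by one worker: render its slice of oscillators into its own
   buffer. No locking is needed because no two workers share a buffer or
   an oscillator. */
static void bank_worker(osc_bank *b, size_t worker)
{
    assert(worker < NUM_WORKERS);
    size_t start = (NUM_OSCS * worker) / NUM_WORKERS;
    size_t end   = (NUM_OSCS * (worker + 1)) / NUM_WORKERS;
    float *dst = b->partial[worker];

    memset(dst, 0, sizeof(b->partial[worker]));
    for (size_t i = start; i < end; i++) {
        for (size_t n = 0; n < VEC_SIZE; n++) {
            dst[n] += b->phase[i];
            b->phase[i] += b->inc[i];
            if (b->phase[i] >= 1.0f)
                b->phase[i] -= 1.0f;
        }
    }
}

/* Called once per signal vector. The loop below is a sequential stand-in
   for the parallel call, which would run the workers on several threads
   and return only after all of them have finished. */
static void bank_perform(osc_bank *b, float *out)
{
    for (size_t w = 0; w < NUM_WORKERS; w++)
        bank_worker(b, w);

    /* All partial buffers are complete here, so summing is safe. */
    for (size_t n = 0; n < VEC_SIZE; n++) {
        out[n] = 0.0f;
        for (size_t w = 0; w < NUM_WORKERS; w++)
            out[n] += b->partial[w][n];
    }
}
```

Because the summing happens only after every worker has returned, the perform routine needs no extra synchronization of its own, which matches the behaviour described above.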
I would say start implementing your solution, and let us know any issues you encounter.
-Joshua