Appendix A: QuickTime Confidential

Structure of QuickTime movies

When you imagine a QuickTime movie, chances are good that you think of something similar to film: a single series of frames arranged linearly in time, with or without sound.

In fact, the structure of QuickTime movies is far more similar to the model you may be familiar with from MIDI and audio sequencers.

QuickTime movies are containers for tracks. Each track contains a particular type of media—such as audio or video—and defines the spatial and/or temporal boundaries of that media. For instance, a QuickTime movie might be 320x240 in size, and 3 minutes in length, while one of its video tracks is 80x60 in size, and only 30 seconds long. QuickTime movies are never longer than the end of their longest track.

Each track in a QuickTime movie might be of a different type—video, audio, sprites, MIDI (QuickTime calls these music tracks), and VR Panoramas are some of the available types—and might start at a different point within the movie. You can have several tracks of identical types in a single movie as well. Tracks are independent—they can be turned on and off individually, and they can all be of different lengths.

The QuickTime music track is a container for a Standard MIDI File. Using the QuickTime Settings Control Panel, you can set the output for these tracks. By default, QuickTime uses the QuickTime Music Synthesizer, which is a General MIDI software synthesizer built into QuickTime. If you have Open Music System (OMS) installed, you can have QuickTime send music tracks out via OMS to a hardware synthesizer. Or, by using the Inter-Application Communication bus (IAC), you can send them right back into Max!

The visual priority of tracks is determined by the track’s visibility layer. Track layers are numbered from –32768 to 32767. Tracks with low visibility layer values will be displayed in front of tracks with high visibility layer values. Audio tracks will automatically mix, regardless of their layer value.

The jit.qt.movie object offers a number of track-centered functions. A brief list of some useful messages follows:

General Track Editing
addtrack: create a new track
deletetrack: delete a track
gettrackinfo: displays several pieces of track information (index, name, type, enabled status, visibility layer)
gettrackoffset: displays a track’s temporal offset (set using the trackoffset message)
gettrackduration: displays a track’s duration (set using the trackduration message)
gettrackdim: displays a track’s spatial dimensions (set using the trackname message)
gettracktype: displays a track’s type (type cannot be set—use addtrack to create a new track)
gettracklayer: display a track’s visibility layer (set using the tracklayer message)
gettrackcodec: display a track’s codec (codec cannot be set)
gettrackenabled: displays a track’s enabled status (set using the trackenabled message)
getgmode: displays a track’s drawing mode—the manner in which it interacts with other tracks (set using the gmode message)

Important: Track edits of any sort, including adding or deleting tracks, will not be automatically saved with a movie when it is closed. Use the savemovie, savemovieas or savemoviecopy messages to permanently save your changes, or set the jit.qt.movie object’s autosave attribute to 1 (on).

QuickTime VR
vrpan: set the pan for a VR movie
vrtilt: set the tilt for a VR movie
vrfov: set the field of view for a VR movie
vrnode: set the current node for a multi-node VR movie (use the getvrnodelist message to enumerate available nodes)

QuickTime Effect (see Tutorial 24: QuickTime Effects for more information)
addfxtrack: create a QuickTime Effects track set (1 to 3 actual tracks), by importing a .qfx file
deletefxtrack: delete any and all QuickTime Effects tracks created by the addfxtrack message

Background Color
addbgtrack: create a background color track
deletebgtrack: delete a background color track created by Jitter.

Time in QuickTime

Unlike most hardware-based media, QuickTime does not use a frame-based time system. Instead, QuickTime uses a system based on timescale and time values. Timescale is an integer value that indicates how many time values make up one second of a movie. By default, a new QuickTime movie has a timescale of 600—meaning that there are 600 time values per second.

Frame rate is determined by the concept of interesting time—times where the movie changes. If a movie changes every 40 time values, then it has an effective frame rate of 15 frames per second (since 600 divided by 40 equals 15). When Jitter determines how many frames are in a movie with the getframecount message, it scans the movie for interesting times, and reports their number. The frame and jump messages move the “play head” between different points of interesting time.

For more information on the relationship between timescale and frame rate, refer to Tutorial 4: Controlling Movie Playback.

In Jitter, recording operations permit you to specify a frame rate, and Jitter takes care of the calculations for you as it constructs a new movie. Editing operations, in contrast, use time values to determine their range of effect.

Optimizing Movies for Playback in Jitter

Although Jitter will gladly play any movie you throw at it, there are some guidelines you can follow to improve performance. Sadly, there is no precise recipe for perfection—performance in a real-time application such as Jitter is the result of the interaction between a movie’s track codecs (which affect data bandwidth and processor load), movie dimensions, frame rate and, to some extent, the complexity of the media being processed.

Codecs

Visual media, in particular, contain large amounts of data that must be read from disk and processed by your computer before they can be displayed. Codecs, or compressor/ decompressors, are used to encode and decode data. When encoding, the goal is generally to thin out the data so that less information has to be read from disk when the movie plays back. Reading data from disk is a major source of overhead when playing movies.

If you have enough RAM, you can use the loadram message to jit.qt.movie to copy a movie’s (compressed) media to RAM. Since accessing RAM is significantly faster than accessing a hard disk, movie playback will generally improve, although the jit.qt.movie object still has to decompress each frame as the movie plays. To buffer decompressed matrix data to RAM, use the jit.matrixset object.

When decoding, the goal is to return the data to its pre-encoded state as quickly as possible. Codecs, by and large, are lossy, which means that some data is lost in the process. As users, our goal is to figure out which codec offers the greatest quality at the greatest speed for our particular application.

To determine a QuickTime movie’s track codecs, use the gettrackcodec message to jit.qt.movie.

Audio Codecs

Codecs are available for both video and audio tracks. For online distribution, you might want to use an MPEG 2 Level 3 (.mp3) or QDesign audio codec to create smaller files. In Jitter, however, if you are playing movies with video and audio tracks, you’ll achieve the best results with uncompressed audio (PCM audio) simply because there will be no audio codec decompression overhead.

Technical Note: .mp3 files can be read byjit.qt.movie as audio-only movies (they are compressed with the MPEG 2 Layer 3 codec). Although you can’t process the audio data with any other Jitter objects besides the jit.qt.movie object, you can use the jit.qt.movie’s soc attribute to send the audio to MSP via the spigot~ object (see Tutorial 27: Using MSP Audio in a Jitter Matrix for more information).

Video codecs

Video codecs may be handled in hardware—you may have a special video card that provides hardware compression and decompression of a particular codec, like MPEG or Motion-JPEG—or more typically in software. In Jitter, hardware codec support is only relevant to video output components. Movie playback will always use a software codec, with the important exception of the jit.qt.movie object’s direct to video output component feature (see Tutorial 22: Video Output Components and the Object Reference entry for the jit.qt.movie object for more information).

Technical Note: Video cards that provide hardware support of codecs usually only support them for onscreen decompression of media. Since Jitter generally decompresses media into an offscreen buffer (with the exception noted above), software codecs are used.

Video codecs generally use one or both of the following schemes: spatial and temporal compression.

Spatial compression is probably familiar to you from the world of still images. JPEG, PNG and PICT files each use types of spatial compression. Spatial compression schemes search a single image frame for patterns and repetitions that can be described in a simpler fashion. Most also simplify images to ensure that they contain these patterns. Nevertheless, more complex images are harder to compress, and will generally result in larger files. Spatial compression does not take time into account—it simply compresses each frame according to its encoding algorithm.

Temporal compression is unique to the world of moving images, since it operates by creating a description of change between consecutive frames. In general, temporal compression does not fully describe every frame. Instead, a temporally compressed movie contains two types of frames: keyframes, which are fully described frames (usually spatially compressed, as well), and regular frames, which are described by their change from the previous keyframe.

For applications where a movie will be played from start to finish, temporal compression is quite useful. Codecs like Sorenson use temporal compression to create extremely small files that are ideal for web playback. However, temporal compression is not a good choice if you need to play your movie backwards, since the order of the keyframes is vital to properly describing the sequence of images. If we play a temporally compressed movie backwards, the change descriptions will be processed before the keyframe that describes the initial state of the frame! Additionally, the Sorenson codec is quite processor-intensive to decompress. Playback of a Sorenson-compressed movie will be slower than playback of a movie compressed using a “lighter” method.

For Jitter playback, we recommend using a video codec without temporal compression, such a Photo-JPEG or Motion-JPEG (Photo- and Motion-JPEG compression use the same compression method, but Motion-JPEG is optimized for special hardware support [see note above]). At high quality settings, JPEG compression offers a nice compromise between file size and image quality. It’s also relatively simple to decode, so your processor can be put to better use than decompressing video.

If image quality is of principle importance, the Animation codec looks better than Photo-JPEG, but creates much larger files.

Different versions of QuickTime support different audio and video codecs. For instance, QuickTime 5 doesn’t support the MPEG-4 codec, although QuickTime 6 does. You should experiment with different codec settings to find the best match for your particular use of Jitter.

Movie Dimensions and Frame Rate

Compared to codec, movie dimensions and frame rate are more straightforward factors in Jitter performance. Simply put, a bigger image frame or a higher frame rate indicates that Jitter has more data to process each second.

A 640x480 movie produces 1,228,800 individual values per frame for Jitter to process (640 times 480 times 4 (separate values for alpha, red, green and blue channels). A 320x240 movie produces a mere 307,200 individual values per frame, one quarter the size of the larger movie. Where possible, we recommend using smaller movies if you intend to use Jitter for processing. On most machines, 320x240 movies will give fine performance.

If you are working with DV media, your movies are recorded at 29.97 frames per second (in NTSC) or 25 frames per second (in PAL). Even using a 320x240 movie, Jitter has to process 9,206,784 values per second in NTSC and 7,680,000 values per second in PAL. Thinning this data, by reducing the frame rate to 15 or 20 frames per second, will improve performance significantly if you are using Jitter for processing. On most machines, 15 fps is a good choice.

Our Favorite Setting

We’ve found that, for most movies, the following parameters yield consistently good results:

• 320x240 frame size

• 15 frames per second

• Video tracks: Photo-JPEG codec, using a medium to high spatial quality setting

• Audio tracks: no compression

Summary

QuickTime movies can contain many tracks of different types—audio, video, QuickTime VR and text, to name a few. In Jitter, we can both get and set basic information about these tracks—including their temporal offset, spatial dimensions, duration, visibility layer and graphic transfer mode—using the jit.qt.movie object’s track messages.

QuickTime’s time model is somewhat different from the standard frame-based model, relying on a timescale value to determine the number of time values in a second of QuickTime media. The jit.qt.movie object allows for playback navigation using both frame and timescale models. All editing functions in the jit.qt.movie object use the timescale model to determine the extent of their effect.

Determining the ideal settings for a movie used in Jitter is an inexact science. Actual performance depends on a number of factors, including codec, dimensions, frame rate and media complexity. For best results on current hardware, we recommend using 320x240, 15 fps, Photo-JPEG compressed movies