LDImage

From DaphneWiki

Revision as of 20:26, 1 February 2012 by Matt (Talk | contribs)

The need for a Laserdisc Image Format

Problems to solve:

  1. File size needs to be as small as possible while still providing a good experience when playing a game
  2. Audio must be synced up as exactly as possible with video to cater to some game designs (Firefox for example)
  3. Video frames need to be stored separately as two fields to cater to some game designs
  4. Any field needs to be able to be immediately decoded without relying on previous fields (a problem with MPEG-2, where most frames are predicted from other frames)
  5. Playing backward must be supported and offer equivalent performance to playing forward (also a problem with MPEG-2)

Laserdisc Image Format

The laserdisc image format will use the container format I've written for mpolib (not discussed here).

Blob Index  Blob ID  Description
0           0xF0     The header, which needs to contain at minimum a version identifier.
                     At this time, it contains the four ASCII bytes '1', 'L', 'D', 'I' in that order,
                     followed by a 4-byte little-endian pixel width integer, a 4-byte little-endian
                     pixel height integer, and a 4-byte little-endian integer which will be 1 if the
                     frame rate is 29.97 fps, for a total of 16 bytes.
1           0xE0     A common JPEG header (i.e. the 'tables')
2           0x10     VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 0
3           0x20     AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 0 of track 0
4           0x10     VIDEO FIELD: an "abbreviated" JPEG of field 1 of track 0
5           0x20     AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 1 of track 0
6           0x10     VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 1
...         ...      And so on until the final laserdisc track has been stored
Last blob   0xD0     The VBI data (stored in my VBI format)
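As an illustration of the 16-byte header in blob 0, here is a minimal Python sketch that packs and parses it. The function names are made up for this example, the integers are assumed to be unsigned, and writing 0 when the frame rate is not 29.97 is also an assumption (the description above only defines the value 1).

```python
import struct

# '<' = little-endian; 4s = four raw bytes; I = 4-byte unsigned integer.
# Unsigned is an assumption; the format text only says "4-byte little-endian".
HEADER_FORMAT = "<4sIII"
HEADER_MAGIC = b"1LDI"  # the ASCII bytes '1', 'L', 'D', 'I' in that order


def pack_header(width, height, is_2997):
    """Build the 16-byte header blob (hypothetical helper)."""
    # Writing 0 for "not 29.97" is an assumption, not part of the format text.
    return struct.pack(HEADER_FORMAT, HEADER_MAGIC, width, height, 1 if is_2997 else 0)


def parse_header(blob):
    """Parse a header blob into a dict (hypothetical helper)."""
    if len(blob) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("header blob is shorter than 16 bytes")
    magic, width, height, rate_flag = struct.unpack_from(HEADER_FORMAT, blob)
    if magic != HEADER_MAGIC:
        raise ValueError("unexpected version identifier: %r" % (magic,))
    return {"width": width, "height": height, "is_2997": rate_flag == 1}
```

For example, parse_header(pack_header(640, 480, True)) gives back the width, height and frame rate flag that were packed.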

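Two blobs (the header and the JPEG tables) come before the first track, and each track occupies four blobs (a video blob and an audio blob for each of its two fields). So the algorithm to find the first blob of a track will be:

blob index = (track index * 4) + 2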
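The index calculation can be sketched in Python as follows. The per-field helper is an extension I am assuming from the blob layout (video blob first, then its audio blob, for field 0 and then field 1); the names are invented for this example.

```python
BLOBS_BEFORE_TRACKS = 2  # header blob and JPEG tables blob
BLOBS_PER_TRACK = 4      # (video, audio) for field 0, then (video, audio) for field 1


def track_blob_index(track_index):
    """Index of the first blob (video of field 0) belonging to a track."""
    return track_index * BLOBS_PER_TRACK + BLOBS_BEFORE_TRACKS


def field_blob_indices(track_index, field):
    """Return (video blob index, audio blob index) for one field of a track."""
    if field not in (0, 1):
        raise ValueError("field must be 0 or 1")
    video = track_blob_index(track_index) + field * 2
    return video, video + 1
```

With this, field 1 of track 0 maps to blobs 4 and 5, and field 0 of track 1 starts at blob 6, which agrees with the layout listed earlier.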

About Blob ID's

The container API (from mpolib) allows each container to have an arbitrary 32-bit ID. I did this so I could support completely blank frames and completely empty audio to save space. Laserdiscs often have periods of blank video and audio, which show up as noisy black frames and noisy analog audio that we do not want to store! I also want to use these ID's so that I can use different compression schemes in the future; for example, I may want to try compressing the audio.

Therefore, the ID's are as follows:

ID    Description
0     Undefined
0x10  Regular JPEG video field
0x11  Blank video field. The blob will contain three bytes that represent the YUV color that the
      blank field should be filled with: the first byte is Y, the second is U, the third is V.
      YUV was chosen because it is optimized.
0x20  Regular uncompressed AUDIO (44100 Hz, 16-bit, PCM)
0x21  Blank audio. The blob will contain one little-endian, 32-bit unsigned integer which indicates
      how many blank bytes (44100 Hz, 16-bit) this blank audio represents.
0xD0  VBI blob
0xE0  JPEG header
0xF0  Version header
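To show how a reader might treat the two "blank" ID's, here is a rough Python sketch. The function name and return shapes are invented for this example, and expanding blank audio to zero bytes assumes the PCM samples are signed (so that zero means silence), which the ID list does not state.

```python
import struct

ID_VIDEO_JPEG = 0x10
ID_VIDEO_BLANK = 0x11
ID_AUDIO_PCM = 0x20
ID_AUDIO_BLANK = 0x21


def interpret_blob(blob_id, payload):
    """Return a (kind, value) pair for a video or audio blob (hypothetical helper)."""
    if blob_id == ID_VIDEO_JPEG:
        return ("jpeg_field", payload)  # still needs the common JPEG tables to decode
    if blob_id == ID_VIDEO_BLANK:
        if len(payload) != 3:
            raise ValueError("blank video blob must be exactly 3 bytes")
        y, u, v = payload  # byte order is Y, U, V
        return ("fill_color_yuv", (y, u, v))
    if blob_id == ID_AUDIO_PCM:
        return ("pcm", payload)
    if blob_id == ID_AUDIO_BLANK:
        (byte_count,) = struct.unpack("<I", payload)  # little-endian unsigned 32-bit
        # Assumption: signed 16-bit PCM, so silence is all zero bytes.
        return ("pcm", bytes(byte_count))
    raise ValueError("not a video or audio blob ID: 0x%X" % blob_id)
```

A blank audio blob holding the value 8 therefore expands to 8 zero bytes, so the rest of the audio path can treat it like a regular 0x20 blob.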

Why does the VBI blob need to come last?

This is actually kind of important. The VBI is the data that I feel is most likely to change if anything changes, and putting it at the end ensures that any required changes to the VBI will have a minimal effect on the overall image file. If the VBI were at the beginning and its size needed to be changed, every blob after it would have to be moved, which would be costly.

Why does the JPEG data come before the audio data?

This is deliberate. The plan is that the JPEG data will be read first, then handed off to another thread (ideally, another CPU) to be decompressed. This allows the first thread to continue working on audio.
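The hand-off described above could look roughly like this in Python. This is only a sketch of the idea, not how Daphne does it: decode_jpeg_field is a stand-in for a real JPEG decoder, and handle_audio is whatever the caller does with the audio blob. Note that in CPython the two threads only run truly in parallel if the decoder releases the interpreter lock, as native decoders typically do.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_jpeg_field(tables, jpeg_bytes):
    """Placeholder for a real decoder; here it only reports the input size."""
    return len(tables) + len(jpeg_bytes)


def process_field(executor, tables, video_blob, audio_blob, handle_audio):
    """Start decoding the video on another thread, then deal with the audio."""
    # The video blob was read first, so its decode can start immediately...
    pending_video = executor.submit(decode_jpeg_field, tables, video_blob)
    # ...while this thread carries on with the audio blob that follows it.
    audio_result = handle_audio(audio_blob)
    # Block only at the point where the decoded field is actually needed.
    return pending_video.result(), audio_result
```

A caller would create one ThreadPoolExecutor up front and pass it to process_field for every field, rather than creating a thread per field.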

Why 44100 Hz audio instead of 48000 Hz?

Daphne is already built on 44100 Hz audio, and changing Daphne to use 48000 Hz audio would require either converting all existing .OGG audio files that are out there (not worth it), or supporting both 48 kHz and 44.1 kHz (which I don't have a good enough reason to consider at this point). I feel that 48 kHz audio is a good conservative choice for preservation, but for presentation, 44.1 kHz audio should be more than adequate, and it reduces the overall file size (which is important to me).
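To put a number on the file size difference, here is a small Python calculation. The stereo default is an assumption (the channel count is not stated on this page); the relative saving does not depend on it.

```python
from fractions import Fraction

BYTES_PER_SAMPLE = 2  # 16-bit PCM, as used by the audio blobs


def pcm_bytes_per_second(sample_rate, channels=2):
    """Uncompressed PCM data rate. channels=2 is an assumption, not from the format."""
    return sample_rate * channels * BYTES_PER_SAMPLE


def relative_saving(chosen_rate, reference_rate):
    """Fraction of audio bytes saved by using chosen_rate instead of reference_rate."""
    return 1 - Fraction(chosen_rate, reference_rate)
```

Here relative_saving(44100, 48000) comes out to 13/160, so the uncompressed audio is about 8% smaller at 44.1 kHz than at 48 kHz.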

Future Extensions

Someone suggested that I extend this format to include multiple audio tracks. I've thought about how I would do this, and my thinking right now is to create a new version header that would also indicate how many audio tracks are present. Then each audio blob would contain every audio track, one appended after the other (so if there were two audio tracks, each audio blob would be twice as big). This should be enough info to figure out where the correct audio track data is.
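Under that scheme, picking one track out of an audio blob might look like the Python sketch below. It assumes every track contributes the same number of bytes to the blob and that tracks are appended in track order; neither detail is settled yet, and the function name is made up.

```python
def extract_audio_track(audio_blob, track_count, track_index):
    """Return the bytes of one audio track from a multi-track audio blob."""
    if track_count <= 0 or not 0 <= track_index < track_count:
        raise ValueError("track_index must be in range(track_count)")
    if len(audio_blob) % track_count != 0:
        raise ValueError("blob size is not a multiple of the track count")
    track_size = len(audio_blob) // track_count  # assumed equal for every track
    start = track_index * track_size
    return audio_blob[start:start + track_size]
```

The track count would come from the new version header; with two tracks, track 1 is simply the second half of each audio blob.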

JSON extension

I plan on replacing blob 0 with a new version header as well as some JSON metadata which would theoretically allow infinite expansion. Here is a sample of what the JSON data might look like:

{
   "disc": {
       "id": 7,
       "iname": "Esh's NTSC",
       "notes": "Captured in 1983 with awesome hardware",
       "w": 640,
       "h": 480,
       "fps": 29.97,
       "interlaced": 1,
       "audio": {
           "freq": 44100,
           "track": [
               "English",
               "Japanese"
           ]
       }
   }
}

A few goals to explicitly specify:

  • Disc type should be able to be detected by software. Therefore, the disc's ID should be unique for every disc and canonical. An id of 0 would mean that the disc's ID is not specified.
  • iname means informal name; it would generally be ignored by software except to display to a human.
  • notes would be purely optional
  • w, h, fps, and interlaced would be mandatory for all discs. fps and interlaced are included for flexibility (I could've just said it's either NTSC or PAL) to expand beyond laserdisc restrictions.
  • the audio section allows for frequencies other than 44.1 kHz and also allows for multiple audio tracks with a name for each track.
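A reader for this metadata could enforce the goals above along these lines. This is a Python sketch that assumes the metadata is stored as strict JSON (quoted keys, double-quoted strings); the function name and the defaults for missing optional fields (other than id 0 meaning "not specified") are my own assumptions.

```python
import json

REQUIRED_KEYS = ("w", "h", "fps", "interlaced")  # mandatory for all discs


def load_disc_metadata(text):
    """Parse the JSON metadata and check the mandatory fields (hypothetical helper)."""
    disc = json.loads(text)["disc"]
    missing = [key for key in REQUIRED_KEYS if key not in disc]
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))
    audio = disc.get("audio", {})
    return {
        "id": disc.get("id", 0),          # 0 means the disc's ID is not specified
        "iname": disc.get("iname", ""),   # informal name, for display only
        "notes": disc.get("notes", ""),   # purely optional
        "w": disc["w"],
        "h": disc["h"],
        "fps": disc["fps"],
        "interlaced": bool(disc["interlaced"]),
        "freq": audio.get("freq", 44100),  # assumed default when absent
        "tracks": audio.get("track", []),
    }
```

Software would use the id to detect the disc type and only show iname and notes to a human.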