LDImage

From DaphneWiki

Revision as of 20:20, 6 February 2012 by Matt (Talk | contribs)

The need for a Laserdisc Image Format

Problems to solve:

  1. File size needs to be as small as possible while still providing a good experience when playing a game
  2. Audio must be synced as exactly as possible with video to cater to some game designs (the arcade game Firefox, for example)
  3. Video frames need to be stored separately as two fields to cater to some game designs
  4. Any field needs to be immediately decodable without relying on previous fields (a problem with MPEG-2)
  5. Playing backward must be supported and offer equivalent performance to playing forward (a problem with MPEG-2)

Laserdisc Image Format

The laserdisc image format will use the container format I've written for mpolib (not discussed here).

Blob Index | Blob ID | Description
0 | 0xF0 | The header, which needs to contain at minimum a version identifier (see the notes on blob 0 after this table)
1 | 0xE0 | A common JPEG header (i.e. the 'tables')
2 | 0x10 | VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 0
3 | 0x20 | AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 0 of track 0
4 | 0x10 | VIDEO FIELD: an "abbreviated" JPEG of field 1 of track 0
5 | 0x20 | AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 1 of track 0
6 | 0x10 | VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 1
... | ... | And so on until the final laserdisc track has been stored
Last blob | 0xD0 | The VBI data (stored in my VBI format)

Notes on blob 0: The most current version is the four ASCII bytes '2', 'L', 'D', 'I' in that order, followed by a UTF-8 string of JSON text, which is described below. The previous (deprecated) version was the four ASCII bytes '1', 'L', 'D', 'I' in that order, followed by three 4-byte little-endian integers: the pixel width, the pixel height, and a flag that is 1 if the frame rate is 29.97 frames per second, for a total of 16 bytes. The height refers to a frame, not a field, so for NTSC it would be 480.

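So the formula for finding the first blob of a track is:

blob index = (track index * 4) + 2

The offset of 2 accounts for the two blobs at the beginning (the version header and the JPEG header). The factor of 4 reflects the four blobs stored per track: two video fields, each followed by its audio.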
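
The lookup above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the offsets for field 1 and for the audio blobs are inferred from the blob ordering in the table rather than stated as a formula in this page.

```python
BLOBS_BEFORE_TRACKS = 2  # blob 0 (version header) and blob 1 (JPEG header)
BLOBS_PER_TRACK = 4      # video + audio for each of the two fields


def video_blob_index(track: int, field: int = 0) -> int:
    """Return the blob index of a video field of the given track."""
    if track < 0 or field not in (0, 1):
        raise ValueError("track must be >= 0 and field must be 0 or 1")
    # Each field occupies two consecutive blobs (video, then audio).
    return track * BLOBS_PER_TRACK + BLOBS_BEFORE_TRACKS + field * 2


def audio_blob_index(track: int, field: int = 0) -> int:
    """Return the blob index of the audio that follows a video field."""
    return video_blob_index(track, field) + 1
```

For example, `video_blob_index(1)` gives 6, which matches the row for field 0 of track 1 in the table.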

Future Extensions

Someone suggested that I extend this format to include multiple audio tracks. I've thought about how I would do this and my thinking right now is to create a new version header that also would indicate how many audio tracks are present. Then each audio blob would contain each audio track appended. (so if there were two audio tracks, each audio blob would be twice as big) This should be enough info to figure out where the correct audio track data is.
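
As a rough sketch of how a reader might pull one track out of such a combined audio blob, assuming the tracks are appended in order and are all the same size (the page only says the blob would be "twice as big" for two tracks). The function name is hypothetical.

```python
from typing import List


def split_audio_blob(blob: bytes, track_count: int) -> List[bytes]:
    """Split a combined audio blob into one chunk per audio track.

    Assumes equal-sized tracks stored back to back, in track order.
    """
    if track_count < 1:
        raise ValueError("track_count must be at least 1")
    if len(blob) % track_count != 0:
        raise ValueError("blob size is not a multiple of track_count")
    size = len(blob) // track_count
    return [blob[i * size:(i + 1) * size] for i in range(track_count)]
```

With two tracks, `split_audio_blob(blob, 2)[1]` would be the second track's audio for that field.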

JSON header

Blob 0 contains a version ID ("2LDI") followed by JSON text which describes the attributes such as width and height of the video.

A typical sample of what this JSON text may look like is:

{
   "id": 1,
   "name": "Dragon's Lair",
   "note": "NTSC, captured in 2001",
   "w": 640,
   "h": 480,
   "fps": 29.97,
   "interlaced": true
}

Discs can have multiple audio tracks and can use a different audio frequency. Here is how that may look:

{
   "id": 0,
   "name": "Esh's Aurunmilla",
   "w": 720,
   "h": 480,
   "fps": 29.97,
   "interlaced": true,
   "audio": {
     "freq": 44100,
     "bits": 16,
     "channels" : 2,
     "track": [
       "English",
       "Japanese"
     ]
   }
}

Here is a description of each possible element in the JSON:

Name | Description | Default value
id | A canonical ID arbitrarily assigned to known laserdiscs by myself (Matt O) for the purpose of allowing software to auto-detect a disc type without having to clumsily try to parse an English string. If your disc does not match one of my IDs, you can set id to 0 or omit it entirely. | 0
name | An arbitrary name for the disc, for the purpose of displaying something interesting to a human. | None, not required.
w | Width of a video frame. Must be set. | None
h | Height of a video frame (not field). Must be set. | None
fps | Frames per second that the video should run at. 29.97 for NTSC, 25.0 for PAL. Anything else is legal but may be unsupported. | None

A few goals to explicitly specify:

  • Disc type should be able to be detected by software. Therefore, the disc's ID should be unique for every disc and canonical. An id of 0 would mean that the disc's ID is not specified.
  • "name" is an informal name; it would generally be ignored by software except to display to a human.
  • "note" would be purely optional
  • w, h, fps, and interlaced would be mandatory for all discs. fps and interlaced are included for flexibility (I could've just said it's either NTSC or PAL) to expand beyond laserdisc restrictions.
  • the "audio" object allows for frequencies other than 44.1 kHz and also allows for multiple audio tracks with a name for each track. Everything in the "audio" object (including the object itself) would be optional. If it is not present, then 44.1 kHz, 16-bit stereo is assumed (and always little endian).
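
The header rules above could be applied roughly as follows. This is a minimal sketch, not the actual Daphne code: the function name, the returned shapes, and the `is_2997` key are hypothetical, the version 1 integers are assumed to be unsigned, and filling in each missing audio key individually is my reading of "everything would be optional".

```python
import json
import struct

# Defaults assumed when "audio" (or one of its keys) is absent.
DEFAULT_AUDIO = {"freq": 44100, "bits": 16, "channels": 2}
MANDATORY_KEYS = ("w", "h", "fps", "interlaced")


def parse_header_blob(blob: bytes):
    """Return (version, info) for the contents of blob 0."""
    magic, body = blob[:4], blob[4:]
    if magic == b"2LDI":
        info = json.loads(body.decode("utf-8"))
        for key in MANDATORY_KEYS:
            if key not in info:
                raise ValueError("missing mandatory field: " + key)
        info.setdefault("id", 0)  # 0 means the disc ID is not specified
        info["audio"] = {**DEFAULT_AUDIO, **info.get("audio", {})}
        return 2, info
    if magic == b"1LDI":
        if len(body) != 12:
            raise ValueError("version 1 header must be 16 bytes long")
        # Three little-endian 32-bit integers: width, height, 29.97 flag.
        width, height, flag = struct.unpack("<3I", body)
        return 1, {"w": width, "h": height, "is_2997": flag == 1}
    raise ValueError("unrecognized header version")
```

Note that Python's `json.loads` rejects trailing commas, so the JSON text has to be strictly valid.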

About Blob IDs

The container API (from mpolib) allows each blob in a container to have an arbitrary 32-bit ID. I did this so I could support having completely blank frames and completely empty audio to save space. Laserdiscs often have periods of blank video and audio which show up as noisy black frames and noisy analog audio, which we do not want to store! I also want to use these IDs so that I can use different compression schemes in the future; for example, I may want to try compressing the audio.

Therefore, the IDs are as follows:

ID | Description
0 | Undefined
0x10 | Regular JPEG video field
0x11 | Blank video field (blob will contain three bytes that represent the YUV color that the blank frame should be filled with. The first byte will be Y, the second byte will be U, the third byte will be V. YUV was chosen because it is optimized.)
0x20 | Regular uncompressed AUDIO (44100 Hz, 16-bit, PCM)
0x21 | Blank audio (blob will contain one little-endian, 32-bit unsigned integer which indicates how many blank bytes (44100 Hz, 16-bit) this blank audio represents)
0xD0 | VBI blob
0xE0 | JPEG header
0xF0 | Version header
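
To make the two "blank" blob types concrete, here is a small sketch of how a reader might handle them. The function and constant names are hypothetical, and filling blank audio with zero bytes assumes the PCM samples are signed, which this page does not state.

```python
import struct
from typing import Tuple

ID_VIDEO_BLANK = 0x11
ID_AUDIO_BLANK = 0x21


def parse_blank_video(blob: bytes) -> Tuple[int, int, int]:
    """Return the (Y, U, V) fill color stored in a 0x11 blob."""
    if len(blob) != 3:
        raise ValueError("blank video blob must be exactly 3 bytes")
    y, u, v = blob  # iterating over bytes yields integers 0..255
    return y, u, v


def expand_blank_audio(blob: bytes) -> bytes:
    """Return the silence that a 0x21 blob stands for."""
    if len(blob) != 4:
        raise ValueError("blank audio blob must be exactly 4 bytes")
    (byte_count,) = struct.unpack("<I", blob)  # little-endian unsigned 32-bit
    return bytes(byte_count)  # zero-filled, i.e. silence for signed PCM
```

A blank field therefore costs 3 bytes of payload and blank audio costs 4, regardless of how much data they stand in for.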

Why does the VBI blob need to come last?

This is actually kind of important. The VBI is the data that I feel is most likely to change if anything changes, and putting it at the end ensures that any required changes to the VBI will have a minimal effect on the overall image file. If the VBI were at the beginning and its size needed to change, every blob after it would be affected, which would be costly.

Why does the JPEG data come before the audio data?

This is deliberate. The plan is that the JPEG data will be read first, then handed off to another thread (ideally, another CPU) to be decompressed. This allows the first thread to continue working on audio.
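
The hand-off described above might look something like this sketch. Everything here is illustrative: `decode_jpeg_field` is a hypothetical placeholder that only returns the input size, and `process_field` is not part of any existing API.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_jpeg_field(jpeg_bytes: bytes) -> int:
    """Placeholder decoder (hypothetical); a real one would return pixels."""
    return len(jpeg_bytes)


def process_field(executor, video_blob: bytes, audio_blob: bytes):
    # The video blob is read first, so decoding can start right away
    # on a worker thread...
    pending = executor.submit(decode_jpeg_field, video_blob)
    # ...while this thread carries on with the audio that follows it.
    sample_count = len(audio_blob) // 2  # 16-bit samples, all channels
    return pending.result(), sample_count
```

In CPython, a decoder written in pure Python would not actually run in parallel because of the global interpreter lock; the benefit depends on a native decoder that releases it.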

Why 44100 Hz audio instead of 48000 Hz?

Daphne is already built on 44100 Hz audio, and changing Daphne to use 48000 Hz audio would require either changing all existing .OGG audio files that are out there (not worth it), or supporting both 48kHz and 44.1kHz (which I don't have a good enough reason to consider at this point). I feel that 48 kHz audio is a good conservative choice for preservation, but for presentation, 44.1 kHz audio should be more than adequate, and it reduces the overall file size (which is important to me).
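
To put numbers on the file size difference, here is a quick calculation. It assumes 16-bit stereo audio and treats the NTSC frame rate of 29.97 as exactly 30000/1001; the names are made up for this example.

```python
from fractions import Fraction

BYTES_PER_SAMPLE_FRAME = 2 * 2  # 16-bit samples, two channels (assumed)
NTSC_FIELDS_PER_SECOND = Fraction(60000, 1001)  # two fields per frame


def bytes_per_second(freq_hz: int) -> int:
    """Uncompressed audio data rate in bytes per second."""
    return freq_hz * BYTES_PER_SAMPLE_FRAME


def sample_frames_per_field(freq_hz: int) -> Fraction:
    """Exact number of sample frames that span one NTSC field."""
    return Fraction(freq_hz) / NTSC_FIELDS_PER_SECOND
```

Under these assumptions, 44.1 kHz audio takes 176400 bytes per second versus 192000 for 48 kHz, a saving of about 8%. One NTSC field spans 735.735 sample frames at 44.1 kHz, so audio blobs presumably cannot all be the same size.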
