LDIMage

From DaphneWiki

The need for a Laserdisc Image Format

Problems to solve:

  1. File size needs to be as small as possible while still providing a good experience when playing a game
  2. Audio must be synced up as exactly as possible with video to cater to some game designs (Firefox for example)
  3. Video frames need to be stored separately as two fields to cater to some game designs
  4. Any field needs to be able to be immediately decoded without relying on previous fields (a problem with MPEG-2)
  5. Playing backward must be supported and offer equivalent performance to playing forward (a problem with MPEG-2)

Laserdisc Image Format

The laserdisc image format will use the container format I've written for mpolib (not discussed here).

Blob Index | Blob ID | Description
0          | 0xF0    | The header, which needs to contain at minimum a version identifier (details follow this table).
1          | 0xE0    | A common JPEG header (i.e. the 'tables')
2          | 0x10    | VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 0
3          | 0x20    | AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 0 of track 0
4          | 0x10    | VIDEO FIELD: an "abbreviated" JPEG of field 1 of track 0
5          | 0x20    | AUDIO: uncompressed 44100 Hz 16-bit PCM audio spanning the time occupied by field 1 of track 0
6          | 0x10    | VIDEO FIELD: an "abbreviated" JPEG of field 0 of track 1
...        | ...     | And so on until the final laserdisc track has been stored
Last Blob  | 0xD0    | The VBI data (stored in my VBI format)

The most current header version is the four ASCII bytes '2', 'L', 'D', 'I' in that order, followed by a UTF-8 string of JSON text which is described below.

The previous (deprecated) version was the four ASCII bytes '1', 'L', 'D', 'I' in that order, followed by a 4-byte little-endian pixel width integer, a 4-byte little-endian pixel height integer, and a 4-byte little-endian integer which will be 1 if the frame rate is 29.97 frames per second, for a total of 16 bytes. The height refers to a frame, not a field, so for NTSC it would be 480.

So the algorithm to search for a track will be:

blob index = (track index * 4) + 2

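This works because each track occupies 4 blobs (2 video fields, each followed by its audio) and there are 2 blobs (the header and the JPEG header) at the beginning.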
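As a quick illustration, here is a minimal Python sketch of that lookup. The function name and the `field` and `audio` parameters are my own additions for the example; the extra offsets are simply read off the blob layout table above and are not part of the stated formula.

```python
HEADER_BLOBS = 2      # blob 0 (version header) and blob 1 (JPEG header)
BLOBS_PER_TRACK = 4   # video field 0, audio, video field 1, audio


def blob_index(track, field=0, audio=False):
    """Return the blob index for a track, following the layout table.

    With the defaults this is exactly (track index * 4) + 2. The field and
    audio offsets are an extension inferred from the table.
    """
    if track < 0:
        raise ValueError("track index must not be negative")
    if field not in (0, 1):
        raise ValueError("field must be 0 or 1")
    index = track * BLOBS_PER_TRACK + HEADER_BLOBS
    index += field * 2          # each field contributes a video and an audio blob
    if audio:
        index += 1              # the audio blob follows its video field
    return index
```

For example, `blob_index(1)` gives 6, which matches the row for field 0 of track 1 in the table.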

JSON text

Blob 0 contains a version ID ("2LDI") followed by JSON text which describes the attributes such as width and height of the video.

A typical sample of what this JSON text may look like is:

{
   "d_id": "17",
   "p_ids": [ "1","3" ],
   "name": "Dragon's Lair",
   "note": "NTSC, captured in 2001",
   "w": 640,
   "h": 480,
   "type": "NTSC"
}

Discs can have multiple audio tracks. Here is how that may look:

{
   "d_id": "23",
   "p_ids": [ "1" ],
   "name": "Esh's Aurunmilla",
   "w": 720,
   "h": 480,
   "type": "NTSC",
   "audio": {
     "track": [
       "English",
       "Japanese"
     ]
   }
}

Here is a description of each possible element in the JSON:

Name  | Description | Default value
d_id  | A canonical ID arbitrarily assigned to known laserdiscs by myself (Matt O) for the purpose of allowing software to auto-detect a disc type without having to clumsily try to parse an English string. If your disc does not match one of my ID's, you can omit the ID name/value pair. A full list of canonical disc IDs is at http://www.daphne-emu.com/ldimage/discids.php | 0
p_ids | An array of player IDs. The IDs are arbitrarily assigned to known laserdisc players by myself (Matt O) for the purpose of helping software know which laserdisc players were used with the disc in arcade games. This list is purely optional and only useful if the software does not recognize the disc ID but would benefit from knowing what player types match up with the disc. A full list of canonical player IDs is at http://www.daphne-emu.com/ldimage/playerids.php |
name  | An arbitrary name for the disc, for the purpose of displaying something interesting to a human. |
note  | Any arbitrary notes about the disc image that may be of interest to anyone. Line breaks should be \r followed by \n (0xD and 0xA respectively). |
w     | Width of a video frame. | Must be set.
h     | Height of a video frame (not field). | Must be set.
type  | The video standard, which determines the frames per second that the video should run at. "NTSC" and "PAL" are the only valid choices here. | NTSC
audio | Object describing audio characteristics. Currently only used to specify whether there are multiple audio tracks. | Single unnamed audio track.
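To make the header rules concrete, here is a rough Python sketch of parsing blob 0 for both version IDs. It is only an illustration: the function names, the `is_29_97` key, treating the old integers as unsigned, and using the string "0" for the default d_id (to mirror the string IDs in the samples) are all my assumptions and not part of the format.

```python
import json
import struct


def parse_header(payload):
    """Parse the payload of blob 0 into a dict of attributes."""
    magic = payload[:4]
    if magic == b"2LDI":
        attrs = json.loads(payload[4:].decode("utf-8"))
        if "w" not in attrs or "h" not in attrs:
            raise ValueError("'w' and 'h' must be set")
        attrs.setdefault("d_id", "0")      # assumed string form of the default 0
        attrs.setdefault("type", "NTSC")   # documented default
        if attrs["type"] not in ("NTSC", "PAL"):
            raise ValueError("type must be NTSC or PAL")
        return attrs
    if magic == b"1LDI":
        if len(payload) != 16:
            raise ValueError("a version 1 header is exactly 16 bytes")
        # Three 4-byte little-endian integers (unsigned is an assumption).
        width, height, flag = struct.unpack("<3I", payload[4:16])
        return {"w": width, "h": height, "is_29_97": flag == 1}
    raise ValueError("unknown header version")


def audio_track_count(attrs):
    """Number of audio tracks; a missing 'audio' object means one track."""
    tracks = attrs.get("audio", {}).get("track", [])
    return len(tracks) or 1
```

Calling `parse_header(b"2LDI" + b'{"w": 640, "h": 480}')` would return the two given values plus the defaults for d_id and type.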

Multiple Audio Tracks

If more than one audio track is present, then each audio blob would contain all of the tracks "smooshed" together, one after another. To find the boundary of each track, take the total size of the audio blob's payload and divide it by the number of audio tracks.
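A small Python sketch of that boundary calculation follows. The function name is made up for the example, and it assumes the tracks appear in the blob in the same order they are listed in the JSON, which is not spelled out above.

```python
def split_audio_tracks(payload, track_count):
    """Split an audio blob payload into equally sized per-track chunks."""
    if track_count < 1:
        raise ValueError("track_count must be at least 1")
    if len(payload) % track_count != 0:
        raise ValueError("payload size is not a multiple of the track count")
    size = len(payload) // track_count   # total size divided by number of tracks
    return [payload[i * size:(i + 1) * size] for i in range(track_count)]
```

With two tracks and an 8-byte payload, the first 4 bytes would belong to the first track and the last 4 bytes to the second.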

About Blob ID's

The container API (from mpolib) allows each container to have an arbitrary 32-bit ID. I did this so I could support having completely blank frames and completely empty audio to save space. Laserdiscs often have periods of blank video and audio which show up as noisy black frames and noisy analog audio, which we do not want to store! I also want to use these ID's so that I can use different compression schemes in the future; for example, I may want to try compressing the audio in the future, or adding support for lossless video.

Therefore, the ID's are as follows:

ID   | Description
0    | Undefined
0x10 | Regular JPEG video field
0x11 | Blank video field (blob will contain three bytes that represent the YUV color that the blank field should be filled with. The first byte will be Y, the second byte will be U, the third byte will be V. YUV was chosen because it is optimized.)
0x20 | Regular uncompressed AUDIO (44100 Hz, 16-bit, PCM)
0x21 | Blank audio (blob will contain one little-endian, 32-bit unsigned integer which indicates how many blank bytes (44100 Hz, 16-bit) this blank audio represents)
0xD0 | VBI blob
0xE0 | JPEG header
0xF0 | Version header

Why does the VBI blob need to come last?

This is actually kind of important. The VBI is the data that I feel is most likely to change if anything changes, and putting it at the end ensures that any required changes to the VBI will have a minimal effect on the overall image file. If the VBI were at the beginning and its size needed to be changed, every blob after it would have to be moved, which would be costly.

Why does the JPEG data come before the audio data?

This is deliberate. The plan is that the JPEG data will be read first, then handed off to another thread (ideally, another CPU) to be decompressed. This allows the first thread to continue working on audio.

Why 44100 Hz audio instead of 48000 Hz?

Daphne is already built on 44100 Hz audio, and changing Daphne to use 48000 Hz audio would require either changing all existing .OGG audio files that are out there (not worth it), or supporting both 48 kHz and 44.1 kHz (which I don't have a good enough reason to consider at this point). I feel that 48 kHz audio is a good conservative choice for preservation, but for presentation, 44.1 kHz audio should be more than adequate, and it reduces the overall file size (which is important to me).

Converting abbreviated JPEG (and headers) to a full JPEG

I spent a little bit of time figuring this out (and reading http://www.jpeg.org/public/jfif.pdf and http://www.w3.org/Graphics/JPEG/itu-t81.pdf ) so I figured I'd share the little routine I wrote to do this. This _seems_ to work though I have not tested it too extensively so I may find bugs later.

http://www.daphne-emu.com/ldimage/abbrev_jpeg_to_full.cpp.txt
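For readers who just want the general idea, here is a rough Python sketch of the splice. It is not the routine linked above. It assumes that both the JPEG header blob and each video field blob are wrapped in SOI/EOI markers, as the abbreviated formats in ITU-T T.81 are, and the function name is made up for the example.

```python
SOI = b"\xff\xd8"   # start of image marker
EOI = b"\xff\xd9"   # end of image marker


def abbreviated_to_full(tables, field):
    """Combine a tables-only JPEG stream and an abbreviated field JPEG.

    Assumes 'tables' is SOI + table segments + EOI and 'field' is a
    complete image stream that merely lacks its tables.
    """
    if not (tables.startswith(SOI) and tables.endswith(EOI)):
        raise ValueError("tables blob is not wrapped in SOI/EOI")
    if not (field.startswith(SOI) and field.endswith(EOI)):
        raise ValueError("field blob is not wrapped in SOI/EOI")
    # Keep one SOI, insert the table segments, then the rest of the field.
    return SOI + tables[2:-2] + field[2:]
```

If the blobs are stored differently (for example, without the SOI/EOI wrappers), the slicing would need to change accordingly.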
