Multiformat Media Delivery



This article is part of a set: Multimedia Encoding Workflow, Multiformat Media Delivery, Video Distribution

IMPORTANT: This page assumes that Flash 9 has come out of beta, i.e. that H.264/AAC support is in the current non-beta release of the Flash player, see 'note 3' below. See Flash H.264.

News: Basic support for Ogg in Firefox and Opera would warrant inclusion of Ogg formats below, see HTML 5 video tag and Ogg.

1 Introduction

The page on Encoding Timebased Media makes a case for H.264/AAC and mp3 as standard formats. It suggests that, if you encode to a single format, you use H.264/AAC and deliver this for online viewing (Flash 9), for download and for podcasting. It suggests that if you encode to a second format, you should add audio-only mp3 at 32kbps.

This choice will be suitable for a considerable audience, but not your entire potential global audience. Also, you're not reaching that audience with the highest possible quality, and not with the greatest accessibility. In particular, it will not work for viewers on low bandwidth connections (modems, certain geographic areas, handheld devices, ...). So how do you up the quality, and how do you get additional people watching your media?

This page outlines multiformat media delivery options (for time-based media, i.e. linear video and audio). The thinking behind this was developed over the last few years (since around 2004), and has of course changed considerably over this time. In 2004, there was a true plurality of formats, and it wasn't clear (in my view) which formats would dominate in the future. Three years on (2007), this plurality of formats still exists, though to a lesser extent, see Encoding Timebased Media. However, you'll still gain accessibility, as well as new audiences, from providing a range of formats. So what should these formats be?

2 The media vs. channel paradigm

(Coming soon.)

3 What formats?

We're assuming that you want to encode into more formats, rather than fewer formats, but that you also want to keep a lid on your CPU time to get a good throughput. I am putting forward that you can cover most bases with nine video encodes and four audio encodes:

(Note: In this document, we use 'MPEG4P2' to refer to the MPEG4 Part 2 video codec, as opposed to the H.264 video codec, MPEG 4 Part 10.) Not all of these nine plus four formats are considered essential, see below.

For audio only materials, you'd use four encodes: high bitrate mp3 or AAC, as well as the same low bandwidth mp3 and amr settings.

You may of course argue with some of these settings, particularly perhaps with the absence of very high bandwidth high-definition formats, or with high quality audio settings within those formats, or stand-alone audio formats. If you specialise in high-definition content, or in high quality music content, then of course you may need to add a few more settings in. However, the above list is geared to cover a reasonable set of formats, that would cover most cases.

We are assuming that you wish to distribute your materials as widely as possible, so we pay no attention to DRM or authentication. If you needed authentication, RealVideo would become quite attractive, as it offers comprehensive authentication and protection mechanisms.

In the diagram, each box represents a video format (i.e. a separate encode). For most formats, you'd use the same formats for streaming and download. There are 9 yellow boxes, and four boxes for audio, i.e. 9 formats for video, and four formats for audio. In the audio section, there are six boxes for audio, i.e. six audio formats. Boxes with thinner outlines represent less essential formats, i.e. for video we've got six essential formats, and three less important formats.

Image:StreamingMediaFormats4low.jpg

4 Notes

Here are some notes, corresponding to the blue circles in the diagram:

4.1 Note 1: Bandwidths and resolutions for video

Bandwidths, particularly at the higher end, are somewhat arbitrary. We have chosen 1000/500/128/64 kbps and call these high/main/low/access. The corresponding video resolutions would be full resolution (Standard Definition) for 'high', half resolution (CIF) for main/low, and quarter resolution (QCIF) for access/modem/mobile. For each video, two audio-only versions are also produced.
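To make the four tiers concrete, here is a minimal sketch of how they could be held as data. The tier names, bitrates and resolution classes come from the paragraph above; the class name, the field names and the helper best_tier_for are hypothetical, and the helper only suggests a default (the viewer should still be able to choose any version).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    kbps: int         # bitrate of the encode, assumed to be in kbps
    resolution: str   # resolution class as named in the text


# Ordered from highest to lowest bitrate.
TIERS = [
    Tier("high", 1000, "SD (full resolution)"),
    Tier("main", 500, "CIF (half resolution)"),
    Tier("low", 128, "CIF (half resolution)"),
    Tier("access", 64, "QCIF (quarter resolution)"),
]


def best_tier_for(connection_kbps: float) -> Tier:
    """Suggest the highest tier whose bitrate fits the connection.

    Falls back to 'access' when nothing fits (hypothetical policy).
    """
    for tier in TIERS:
        if tier.kbps <= connection_kbps:
            return tier
    return TIERS[-1]
```

For example, best_tier_for(600) would suggest 'main', while a 56k modem would be offered 'access'.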

'High' at full resolution is important, so that full size scientific content can be watched. 'Access' is important for accessibility on low bandwidth connections. You might think that the least important format of these is probably the 'low' setting. However, it makes sense to provide both an mp4-part2-based format and an H.264 format at low bandwidth.

There is no 'high definition' in this list. We mostly shoot in HDV on Z1 or equivalent, and the footage looks best when downgraded to SD 16:9. We rarely actually output in HDV. For web, you'd lose the interlacing in 1080i anyway, and the 1920 horizontal pixels are interpolated from 1440. So a full frame output close to standard definition seems to be the best option. However, if you want to have high definition, and your production quality is XDCAM etc., you could add a higher bitrate, in a high definition format.

A common objection is this: "My footage is so well shot and lovely, I don't want people to see this at 'access' video rate." I'd say: Let the viewer choose how to access your materials. If they find the 'access' version worthwhile, then perhaps they'll wait until a better version has downloaded.

The same goes for audio-only versions: "My video relies extensively on visual materials, these need to be seen, and so the audio only version is no good." Same response as to the last point. For a lecture, it's of course worth publishing a pdf alongside the mp3, so that viewers can look at the pdf while listening to the mp3.

We haven't been running with AMR-NB for long yet. Objectively, there's a need for this, but we don't know whether that need has translated into a demand yet, and whether AMR-NB meets that demand in terms of usability. In any case, including AMR-NB is a statement that you care about low bandwidth access, and it won't take much server space or CPU time to encode. If you generally have good audio quality, you could choose AMR-NB near the lower end of bitrates (6-8 kbps), otherwise a slightly higher bitrate (say 12 kbps). Just in case you don't think there's an access issue in terms of low bandwidth, see Web Design 4 Low Bandwidth.
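AMR-NB does not accept arbitrary bitrates but only a fixed set of modes, so a target like "about 7" or "about 12" has to be mapped to the closest mode. The following is a small sketch of that mapping; the list of modes is the standard AMR-NB set, while the function name is hypothetical and the "nearest mode" rule is just one reasonable choice.

```python
# The eight standard AMR-NB codec modes, in kbps.
AMR_NB_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)


def nearest_amr_nb_mode(target_kbps: float) -> float:
    """Return the AMR-NB mode closest to the requested bitrate."""
    return min(AMR_NB_MODES_KBPS, key=lambda mode: abs(mode - target_kbps))
```

With this rule, a target of 12 maps to the 12.2 mode, and targets in the 6-8 range map to 6.70, 7.40 or 7.95.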

4.2 Note 1 ctd: Bandwidths for audio

For audio, you'd use

You want all your audio to be normalised, and some encoders can do this as part of the encoding process. AMR-NB and mp3 at 32kbps were discussed in the previous section, and the same comments hold. mp3 at 32kbps is very listenable if your audio has been recorded well. You might want to offer a version at 64kbps, which would give you very good quality for mono recordings. If you want to vary the format, you could make this m4a/AAC. You probably want to offer a higher quality audio version as well, for music.

If you have a lot of music to encode, you might want to consider having some presets that are geared for this, perhaps including an even higher bitrate, or m4a/AAC at 128kbps as well. With video formats you might want to reserve more of the bitrate for the audio.

Of course audio will be much faster to encode than video, so audio formats are easier to do.

4.3 Note 2: The H.264/AAC backbone

The formats. The backbone of the set of formats is H.264/AAC, used for downloading, as well as progressive and/or seekable-progressive into a flash player (say FlowPlayer), and potentially streaming from QTSS.

You might not agree that H.264/AAC is the best format to go for, and you might suggest WMV or RealVideo instead. I'd disagree, but this is quite a long discussion. In brief, my view is that WMV and RealVideo are being squeezed out between Flash for online delivery on the one hand, and podcasting/downloading on the other. RealVideo has got some advantages, in terms of failover, security etc., but we're not considering these as essential features in our scenario.

We encode to m4v or mp4, with H.264/AAC at the four bitrates (high/main/low/access) given above, and to mp4 or 3gp with mp4-part2/AAC for 'access' only. We want to squeeze out the best image quality, so use high-quality two-pass settings, for all bitrates if you can, but certainly for the lower bitrates. These five formats will take care of downloading (and syndication via RSS): the files will play back in iTunes, QuickTime player, RealPlayer, as well as a range of open source players like mplayer, VLC, etc. Need to check Windows Media Player.

You might think that the least important format of these is probably the 'low' setting, or perhaps the duplicated 'access' setting. However, it makes sense to provide both an mp4-part2-based format and an H.264 format at low bandwidth. The H264 'access' format will work for web-based modem delivery, while the MPEG4P2 'access' format will work for older mobiles (but not play in flash). However, the 'low' setting still has a low-ish bandwidth, and will have much better image quality.

Delivery. As default delivery, the files are offered for download (obviously), and ideally downloads are syndicated, see section on podcasting below. However, the files (high-access) should also be delivered with http into a flash video player (say FlowPlayer).

As flash is the main format, this should be an http-seekable delivery. It's unclear whether http-based seeking for Flash 9 video in H264/AAC is already working reliably. Ideally you'd have http-seekable delivery of m4v files into the flash player. Flash/H264 is the main format, so if this turns out not to work, then one would have to rethink some of the above.

QuickTime Streaming Server. The files won't stream from QuickTime streaming server (QTSS). Would you want to stream from QTSS? QTSS is quite responsive, and works quite well for 'browsing' streamed files by dragging the playhead around (without buffering or letting go of the playhead). So for scientific content this is quite useful. So suppose you wanted to use QTSS.

4.4 Note 3: The Flash 9 Update 3 issue

The last note was written from a Flash 9 Update 3 point of view. With the release of Flash 9 Update 3, the H.264 'backbone' is available for online viewing, as well as for downloading or QuickTime viewing. We thus encode into those formats, but need some extra formats until Flash 9 Update 3 has been adopted widely.

In the meantime, we could thus

The second option is cheaper, and doesn't require additional encodes. So we'd encode all the m4v formats now, and when Flash 9 is out of beta, those become the main formats. Initially many people would still be on older flash versions, so we would check for flash 9 automatically. If the user has it, they get the higher quality files, otherwise they get the legacy format, and a note saying 'if you upgrade to flash 9, you'll get much better quality'.
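The automatic check could look roughly like the sketch below. It assumes that the player version is reported as a dotted or comma-separated string, and that H.264 playback starts with version 9.0.115.0 (to my knowledge the version number of Flash Player 9 Update 3). The function names and the returned format labels are hypothetical.

```python
# Assumed minimum player version with H.264/AAC support (Flash 9 Update 3).
H264_MIN_VERSION = (9, 0, 115, 0)

UPGRADE_NOTE = "If you upgrade to Flash 9, you'll get much better quality."


def parse_flash_version(text: str) -> tuple:
    """Turn '9.0.115.0' or '9,0,115,0' into a tuple of ints."""
    return tuple(int(part) for part in text.replace(",", ".").split("."))


def choose_delivery(flash_version: str) -> dict:
    """Pick the H.264 files if the player supports them, else the legacy format."""
    if parse_flash_version(flash_version) >= H264_MIN_VERSION:
        return {"format": "h264", "note": None}
    return {"format": "legacy", "note": UPGRADE_NOTE}
```

A viewer reporting 9.0.47.0 would get the legacy format plus the upgrade note, while 9.0.115.0 or later would get the H.264 files.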

4th Dec 2007: Now that Flash 9 update 3 has come out, we just need to wait for Flash 9 Update 3 to become widespread.

4.5 Note 4: Flash 6 anybody?

If we do Flash 9, because of the H.264 advantage, then what about older versions of flash? It's probably worth not abandoning older Flash versions completely. For one thing it will take a little while before most people have Flash 9.

So which older flash version should we support? Answer: The oldest version that supports movies properly.

Flash player 7 is the sweet spot: It supports video properly, is a comparatively early flash format, with free encoders around. The video quality is worse than Flash 8, but for added compatibility that's a good bargain, and we'll have top video quality in Flash 9 anyway (see above).

4.6 Note 5: WMV and RV

So what about WMV and RV? RealVideo of course was a pioneer of audio and video delivery over the net. However, as a proprietary format, with considerable cost of associated tools, it has lost out to other formats. Windows Media Video has got a strong base through the Windows operating systems, but as a format also seems less relevant now that H.264 covers both online viewing through flash as well as downloads.

About RealVideo: If you support RealVideo, do a single rate stream, delivered (1) off the Helix server if you have one, otherwise off a webserver, and (2) as a download. Here's why:

Windows Media: Because of the Windows operating system, windows media player is widespread. So there's a case for providing Windows Media. I would suggest that you don't want to do multirate server-based delivery, but do single rate delivery, delivered (1) off a windows media server if you have one, or the Helix server if you have one, otherwise off a webserver, and (2) as a download. By the same token, you might want to make a windows media audio file available.

For windows media, as for RV, you don't want to use multi-rate: It uses too much CPU, and multi-rate files will not be usable once you move away from the streaming server. Seekable progressive downloading doesn't work for windows media files, so if you have long files, you might need to

In summary: For both wmv/rv: Single rate files delivered from a streaming server if a server is available, otherwise progressive download from a web server. A streaming server for wmv formats (which may get some traffic) may still be desirable. However, if flash seeking works, and we steer people to flash, then few people will probably use wmv.

About bandwidths: If you are going to use WMV and/or RV, you might want to vary the bandwidths used for WMV and RV compared to the flash/m4v versions. For WMV you'll probably want a similar bandwidth to 'main', perhaps slightly higher. For RV you might want a lower bandwidth, say 300kbps, in between 'main' and 'access'.

By the way: For live streaming, you need a streaming server. QTSS into QuickTime player and into RealPlayer seems like a good option. Otherwise Windows / Real via Helix Universal Server, or Windows via Windows Media Server. Simultaneous multi-format encoding for live streaming requires specialised software, see Live Streaming.

"I am unconvinced by this, and we'll stick with WMV/RV." Two years ago (2005) various proposals were discussed, and some views were put forward that we should just do mp4, and that this would be sufficient. At that time, I argued strongly that WMV/RV needed to be included: there was little support for mp4 in mainstream players. Of course you could always have downloaded an open source player that would play the format, but experience showed that the vast majority of our potential media viewers would not go to the length of installing extra players just to watch our content, either because they could not be bothered, or because they were unable to do so. It was thus imperative to provide formats that were suitable for most people. Also, H.264 wasn't available, so the image quality of MPEG4P2 (in terms of quality for bandwidth) wasn't particularly good, especially at the low bandwidth end.

Over the last two years, the balance has shifted significantly from WMV/RV, first towards flash video, and then towards mpeg4/H.264. Firstly H.264 is a strong codec, at the very least rivalling WMV/RV. Also, player support is much more widely available. However, it is really the prospect of Flash/H.264 that swings the balance, and turns H.264 into a key format for both online viewing, as well as downloading/podcasting. Still not convinced? Time will tell.

4.7 Note 6: Ogg Vorbis and Theora

We may need to add Ogg Vorbis and Theora to the list, see HTML 5 video tag and Ogg.

4.8 Total bitrate

Finally, a note on space requirements, which depends on total bitrate. The total bitrate is 3.5Mbps = 1.5GBph. If all QT versions are duplicated to give separate hinted versions, then about 5Mbps = 2.2GBph. So your server disk space requirements will be between 1.5GB and 2.5GB per hour of material.
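The conversion from total bitrate to disk space per hour is simple arithmetic, sketched below. It assumes decimal units (1 GB = 1000 MB) and 8 bits per byte; the function name is hypothetical.

```python
def gb_per_hour(total_mbps: float) -> float:
    """Disk space in GB needed for one hour of material at a total bitrate in Mbps."""
    megabits = total_mbps * 3600   # megabits in one hour
    megabytes = megabits / 8       # 8 bits per byte
    return megabytes / 1000        # decimal gigabytes
```

For 3.5Mbps this gives 1.575GB per hour, and for 5Mbps it gives 2.25GB per hour, in line with the rounded figures above.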

Note also that you should keep your source materials in the highest possible quality.

5 Additional Notes

5.1 How many settings?

One question is how many settings you'll need. Of course you'll need one setting per format above, but the number of settings needed also depends on the type of input formats, and to some extent on the automated adjustments your encoder may make.

Generally speaking, you might have to cope with the following input formats:

Use different settings for PAL and NTSC. Only if say 99% of your footage is PAL, you might want to apply the same settings to NTSC footage to save on the extra set of settings. Otherwise (to get highest quality), you might want to have separate settings for PAL and NTSC.

4:3 vs. 16:9. You will need separate settings for 16:9 and 4:3. At best, a 4:3 setting applied to 16:9 input will result in letter-boxed or cropped video.

HDV. Downsizing HDV footage to standard definition and just using the standard def settings is a possible option. You might want to have separate settings for HDV, to leverage the full quality of the format.

Interlaced vs. progressive input. You will (in most circumstances) need separate settings for interlaced and progressive input. Progressive input should not be deinterlaced, while interlaced input needs to be deinterlaced. (A circumstance where you could use the same settings is where interlaced footage can be processed without deinterlacing, for instance where you are taking PAL footage to half the number of lines ("poor man's deinterlace"). In that case, you may not need to apply a deinterlacing filter, as the encoder just drops every other field. The same setting would then work for progressive input. However, best to keep settings for interlaced and progressive input logically separate. More info on Interlacing.)

The groups. For each type of input format, we'll need a group of settings to generate the required formats. To get the most out of your input, you might thus need the following groups of settings:

To give a practical angle on this: The vast majority of footage we encounter is 576i50 (4:3), 576i50 (16:9) and 1080i50. But this is very much from the perspective of a single HE institution in the UK in 2007. Elsewhere, or at other times, you'd encounter other input formats you'd need to cater for.
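One way to keep the groups of settings apart is to derive a group name from the properties of the input that matter (number of lines, interlaced or progressive, rate, aspect ratio). The sketch below does this for the three input formats mentioned above; the class, the field names and the naming scheme are hypothetical, and HDV is assumed to be 16:9.

```python
from typing import NamedTuple


class InputFormat(NamedTuple):
    lines: int         # e.g. 576 (PAL SD) or 1080 (HDV)
    interlaced: bool   # True for interlaced, False for progressive
    rate: int          # fields or frames per second, e.g. 50
    aspect: str        # "4:3" or "16:9"


def settings_group(fmt: InputFormat) -> str:
    """Build a group name such as '576i50-16x9' for a given input format."""
    scan = "i" if fmt.interlaced else "p"
    return f"{fmt.lines}{scan}{fmt.rate}-{fmt.aspect.replace(':', 'x')}"


# The three input formats we encounter most often.
COMMON_INPUTS = [
    InputFormat(576, True, 50, "4:3"),
    InputFormat(576, True, 50, "16:9"),
    InputFormat(1080, True, 50, "16:9"),
]

GROUPS = [settings_group(fmt) for fmt in COMMON_INPUTS]
```

Each group name would then select one complete set of encoder settings, so that progressive or NTSC input, for instance, never silently picks up interlaced PAL settings.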

5.2 Workflow scheduling

If you have more material coming in than your setup can deal with, it is advisable to encode 'formats in turn', rather than 'sources in turn'. I.e. you don't go source by source, but proceed format by format. First you generate mp3 for everything. If new sources come in, older sources wait, until everything has mp3. Then you do the 'main' format for H.264, then the others, least important formats last. This way the most important formats become available most quickly. As long as you don't run out of disk space, and you manage to process all sources within a few days, this should be fully acceptable.
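The 'formats in turn' rule can be sketched as a function that picks the next encode job. The order mp3 first, then H.264 'main', follows the paragraph above; the order of the remaining formats, the format labels and the function name are assumptions made for illustration.

```python
# Most important format first; the order after 'h264-main' is an assumption.
FORMAT_ORDER = ["mp3", "h264-main", "h264-high", "h264-low", "h264-access"]


def next_job(sources: list, done: set):
    """Return the next (source, format) pair to encode, or None if all are done.

    sources: source names in order of arrival.
    done:    set of (source, format) pairs already encoded.
    """
    for fmt in FORMAT_ORDER:       # proceed format by format
        for src in sources:        # oldest source first within a format
            if (src, fmt) not in done:
                return (src, fmt)
    return None
```

If sources "a" and "b" both have their mp3 and a new source "c" arrives, next_job returns ("c", "mp3") before ("a", "h264-main"), so older sources wait until everything has mp3.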

5.3 Corollary: Archiving of source materials

Clearly the formats described above make sense at this moment in time only (Nov 2007). The equivalent list two years ago looked different, and the equivalent list in two years' time will look different also. Future formats will have greater emphasis on HDV, and probably on Flash.

So how does one cope with future changes in formats? By archiving the source materials, ideally in source formats. Disk space is so cheap now, that there's hardly any reason to not keep your input materials.

If you keep your input materials, you will be able to re-encode these into more current formats.

You probably want to keep the source format, as this is the highest quality. You might want to consider also archiving a format that is fully open source (e.g. mpeg2 iframe), but you probably don't want to do this instead of the source format, but as well as the source format. There are also problems with this strategy around HDV, as HDV (unlike DV) is not an iframe only format, and your disk space requirements would increase disproportionately.

Further information on Multimedia Encoding Workflow

5.4 Podcasting

As well as offering everything for download, you should syndicate your materials into podcasts. Podcasting is widely used, and also helps with low bandwidth accessibility. You should generate some of the following feeds:

The most important of these are the iPod feed, the mp3 feed, and the mobile feed.


6 Related Pages

7 Implementation

Encoding a large range of formats from each piece of source media works best when it's embedded in an overall workflow. Clearly you couldn't possibly expect individuals at your institution to encode many different formats, nor should they have to. Implementation is now very feasible, due to products like Apple Podcast Producer, Episode Podcast, and open source efforts like Berkeley's OpenCast. More information on this page: Multimedia Encoding Workflow.

8 Please comment

I welcome comments on the above article - my email address is below!

Page created: 2007-11-14. Page last changed 2008-12-27 13:17:44 .