A rational way to deliver video description audio
I’ve written before about the problems of delivering the video description audio track to legacy cable set-top boxes. Now the cable industry seems to have figured out how to transition to a more rational approach. We’ll see if broadcasters are willing to go along.
Recall that video description is intended for visually impaired viewers. It’s a description of the on-screen action, delivered by a narrator during pauses in the dialog. It’s a second audio track that includes both the dialog and the narration.
In analog days, a second audio track was delivered using the SAP (second audio program) channel. That SAP channel was often the program dialog in a second language, usually Spanish. While delivery of a video description track was optional in analog days, some programs carried video description audio instead of Spanish. Cable set-top boxes and TV receivers allowed the viewer to select the second audio track, either with a button on the remote control labeled SAP or through a setup menu entry that said either SAP or Spanish.
When we moved to a digital world, the cable industry adopted the MPEG method of signaling the language, using the ISO-639 Language Descriptor. Programs that carried two audio tracks, usually English and Spanish, labeled them with the three-letter language codes from the ISO-639 standard, in that case “eng” and “spa”. Since there is no ISO-639 language code for video description, broadcasters and cable programmers in this country used a foreign language code, usually the code for Spanish, though some programmers used the code for Portuguese. In Canada, some programmers used the language code for Middle English, a dialect from the Middle Ages.
Sometime after the delivery of digital video started, broadcasters decided that the MPEG standard for digital television did not convey enough information, so ATSC developed the standard A/65, known as PSIP. The PSIP standard takes the language signaling from a different descriptor, the AC-3 Audio Descriptor. While the ISO-639 Language Descriptor can only signal languages, the AC-3 Audio Descriptor can signal not only languages but also special purpose audio tracks like video description audio.
Cable operators typically do not use the A/65 PSIP standard; instead, they carry additional information in the electronic program guide, delivered in an out-of-band channel. So cable set-top boxes continue to look at the ISO-639 Language Descriptor to determine the language of an audio track. In fact, cable industry standard SCTE 54 requires this behavior. And cable programmers, when they deliver video description audio, continue to label it as Spanish (or Portuguese) in the ISO-639 Language Descriptor.
Knowing that set-top boxes look at the ISO-639 Language Descriptor, broadcasters have continued to carry that descriptor to support cable set-top boxes, and in addition they carry the AC-3 Audio Descriptor that is mandated in ATSC standards. An ATSC standard requires that if a program carries both the ISO-639 Language Descriptor and the AC-3 Audio Descriptor, they both signal the same language. So broadcasters label the video description audio track as Spanish in both descriptors, even though the AC-3 Audio Descriptor provides a way to explicitly signal when an audio track is video description audio.
In 2011, the Consumer Electronics Association adopted CEB-21, Recommended Practice for Selection and Presentation of DTV Audio. It recommends that TV receivers use the signaling in the AC-3 Audio Descriptor, and that they have separate user menu entries for language and for video description audio.
This would make it possible, for example, to deliver a program with four audio tracks: English, Spanish, English video description and Spanish video description.
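As a rough illustration of what separate menu entries for language and for video description make possible, the Python sketch below models a four-track program and picks a track from two independent viewer settings. The `AudioTrack` class, the `select_track` function and the fallback order are all assumptions made for illustration; they are not taken from CEB-21, which may specify the behavior differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AudioTrack:
    """Hypothetical model of one audio track as a receiver might see it."""
    language: str            # three-letter ISO-639 code, e.g. "eng" or "spa"
    video_description: bool  # True if the track includes the narration


def select_track(tracks: List[AudioTrack],
                 language: str,
                 want_description: bool) -> Optional[AudioTrack]:
    """Pick a track from two separate menu settings.

    Assumed fallback order (illustrative only):
      1. exact match on language and video description setting
      2. main (non-description) audio in the preferred language
      3. the first track in the program
    """
    for track in tracks:
        if (track.language == language
                and track.video_description == want_description):
            return track
    for track in tracks:
        if track.language == language and not track.video_description:
            return track
    return tracks[0] if tracks else None


# The four-track program from the example in the text.
PROGRAM = [
    AudioTrack("eng", False),  # English
    AudioTrack("spa", False),  # Spanish
    AudioTrack("eng", True),   # English video description
    AudioTrack("spa", True),   # Spanish video description
]

if __name__ == "__main__":
    print(select_track(PROGRAM, "spa", True))
    print(select_track(PROGRAM, "eng", False))
```

With both settings independent, a Spanish-speaking viewer who needs video description is no longer forced to choose between the two, which is the limitation of the single SAP-style second track.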
That CEB-21 recommendation can be applied to cable set-top boxes as well as TV receivers. There is informal agreement in the cable industry that a transition is needed from today’s world, where a program has only two audio tracks and the second audio is labeled as a foreign language, to a world where a program can have multiple audio tracks and a viewer can select the video description audio track explicitly. And by the way, the FCC wants this too.
Some Comcast engineers have come up with a transition approach. While legacy set-top boxes continue to look at the ISO-639 Language Descriptor, new set-top boxes would look at the AC-3 Audio Descriptor.
For a program carrying a video description audio track, that track would be labeled as Spanish in the ISO-639 Language Descriptor but as English video description in the AC-3 Audio Descriptor. The language labels would disagree. That’s contrary to today’s practice, both by broadcasters and cable programmers, where the language labels in the two descriptors agree.
So new set-top boxes would ignore the ISO-639 Language Descriptor, which is contrary to SCTE 54. SCTE 54 needs to be changed.
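The transition idea can be sketched in a few lines of Python. This is only a toy model, assuming each track carries both descriptors and that each box reads exactly one of them; the `SignaledTrack` fields and the two label functions are hypothetical names, and the real descriptors carry more fields than are shown here.

```python
from dataclasses import dataclass

# Display names for the language codes mentioned in the text.
LANGUAGE_NAMES = {"eng": "English", "spa": "Spanish", "por": "Portuguese"}


@dataclass(frozen=True)
class SignaledTrack:
    """Hypothetical view of one audio track carrying both descriptors."""
    iso639_language: str      # language in the ISO-639 Language Descriptor
    ac3_language: str         # language in the AC-3 Audio Descriptor
    ac3_is_description: bool  # AC-3 descriptor marks it as video description


def legacy_label(track: SignaledTrack) -> str:
    """Legacy box: reads only the ISO-639 Language Descriptor."""
    code = track.iso639_language
    return LANGUAGE_NAMES.get(code, code)


def new_label(track: SignaledTrack) -> str:
    """New box: ignores ISO-639 and reads only the AC-3 Audio Descriptor."""
    code = track.ac3_language
    name = LANGUAGE_NAMES.get(code, code)
    return f"{name} video description" if track.ac3_is_description else name


# Transition-style signaling: the two descriptors deliberately disagree.
description_track = SignaledTrack(
    iso639_language="spa",   # keeps legacy boxes working as they do today
    ac3_language="eng",      # what the track really is
    ac3_is_description=True,
)

if __name__ == "__main__":
    print(legacy_label(description_track))  # Spanish
    print(new_label(description_track))     # English video description
```

The same stream serves both populations of boxes: a legacy box still offers the track under its familiar Spanish menu entry, while a new box can present it for what it is.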
Eventually, the legacy boxes would fall out of use, and the transition would be complete. All boxes would look at the AC-3 Audio Descriptor.
SCTE 54 can be changed, but more difficult is changing the programmers’ and broadcasters’ behavior. Cable MSOs can negotiate with cable programmers, and educate them about the benefits of this transition plan. But negotiating with the broadcasters in retransmission consent contracts? We know how that’s been going.