Transcoding: Presto Change-O!
Transcoders, those magical devices that transmute video from one format to another, are turning out to be the linchpins enabling many of the trends in video today – TV Everywhere, smartphones, tablets, content delivery networks – by ensuring that content can be quickly and inexpensively reformatted so that it can be delivered to any electronic device capable of displaying it.
Transcoders are proving to be powerful tools that can help mitigate or offset the costs of additional storage, distribution and transport. In some cases, the uses of transcoding can even obviate some of those needs; put transcode here, and that reduces the capital equipment required there.
The extent to which transcoding affords flexibility is still being explored. Its uses are many, and the quality – and expense – of transcoding products span a range that maps to the variety of uses and objectives. Where in the network to deploy transcoding resources for maximum benefit is still being worked out.
“Since no one’s rolled out TV Everywhere on a large scale, no one has the answer on how all this will be deployed. How the market develops will be interesting,” observed In-Stat analyst Michelle Abraham.
What is clear is that there is a large and growing demand for transcoders. In-Stat determined that the market for multi-format encoders was worth $117 million in revenue in 2009 and projects it will more than double to about $297 million in 2014.
For starters, today’s transcoders are powerful enough to replace racks of equipment, and at relatively low expense.
“In the old days, you had to put in racks and racks of encoders, SDI switchers, ASI switchers, and lots of different devices before and after, and put in redundancy schemes – that became your encoder farm. Now you buy one or two boxes that do everything you did before with no switchers,” said RGB vice president Ramin Farassat.
There are three general areas where transcoding is going to be deployed: at the origination point, at the media center and at the edge.
POINT OF ORIGINATION
Content originators – and that often includes not only content developers, but CDNs as well – don’t have to do their own transcode, but somebody has to do it, and not everybody in the distribution end has the resources today. So some content owners are performing transcoding on behalf of their customers.
It seems obvious that small- and medium-size service providers might need this – and they do – but even larger companies are interested in having their originator partners perform the task. Brian Matherly, systems engineering manager at Sencore, reported that the company has a customer that is considering transcoding content into multiple formats in advance for one Tier 1 customer and also for a mobile customer.
Video has to be delivered in different video formats (MPEG-2, MPEG-4/H.264, Adobe Flash, Microsoft Smooth Streaming, Apple), at many different resolutions, both interlaced and progressive (1080i, 1080p, 720i, 720p, 480i, 480p), for many differently sized screens (TVs, PCs, notebooks, netbooks, tablets, smartphones, etc.), and at different bit rates to accommodate both the differing bandwidth capabilities of different networks and the traffic management considerations within a delivery network.
The benefit of the originator doing the transcoding is that it might only have to be done once.
The downside is that almost every variable listed above is a multiplier, meaning that many more copies of any single asset have to be made (multiplying the need for storage) and transmitted (multiplying the amount of bandwidth consumed).
As a practical matter, nobody is going to transcode and store every permutation. An originator might therefore choose a very small number of the most popular formats and resolutions. But that shifts the responsibility for transcoding less popular formats elsewhere. Furthermore, what happens if a new format becomes popular? The transcode task for an originator will be costly, and especially daunting if it has an expansive content library.
Those downsides, combined with the transcoding and storage burdens on the originators, and the bandwidth burden on the originator and whomever they’re shipping to, make the point of origination an inefficient place to perform the bulk of the transcoding that must occur.
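To make the multiplier effect concrete, here is a minimal Python sketch that counts the copies an originator would need if it pre-transcoded every permutation. The format and resolution lists are drawn from the ones named above; the three bit rate tiers and the 10,000-asset library are purely illustrative assumptions. Not every combination would occur in practice, so treat the result as an upper bound.

```python
from itertools import product

# Formats and resolutions taken from the lists above.
FORMATS = ["MPEG-2", "MPEG-4/H.264", "Adobe Flash", "Smooth Streaming", "Apple"]
RESOLUTIONS = ["1080i", "1080p", "720p", "480i", "480p"]
# Assumption for illustration: three bit rate tiers per resolution.
BIT_RATE_TIERS = ["low", "medium", "high"]


def all_variants(formats, resolutions, tiers):
    """Every (format, resolution, tier) permutation of a single asset."""
    return list(product(formats, resolutions, tiers))


def copies_for_library(num_assets, formats, resolutions, tiers):
    """Copies to transcode and store if every permutation is prepared in advance."""
    return num_assets * len(formats) * len(resolutions) * len(tiers)


variants = all_variants(FORMATS, RESOLUTIONS, BIT_RATE_TIERS)
print(len(variants), "copies of one asset")

# Hypothetical library size, chosen only to show how quickly the total grows.
library_copies = copies_for_library(10_000, FORMATS, RESOLUTIONS, BIT_RATE_TIERS)
print(library_copies, "copies for a 10,000-asset library")
```

Each new format or resolution appended to one of the lists multiplies the total rather than adding to it, which is why a newly popular format is so costly for an originator with a large library.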
IN THE MEDIA CENTER
The most efficient place now and for the next few years to do transcoding is probably where the vast amounts of video are stored and served from. Ron Gutman, CTO of Imagine, said that would include organizations such as Hulu, the Comcast Media Center, Amazon or Netflix.
Gutman said: “The best cost/performance is if you centralize it. You pay by having more storage, maybe, but storage is not that expensive in CDNs, and especially in VOD farms.”
And, of course, the costs of storage continually decrease.
“The cost of transcoding at the edge is significantly higher,” Gutman continued. “You might choose to use a worse transcoder, but then you pay in bit rate, which translates into more storage. If you use any of the off-the-shelf transcoders, it gives you 1080p. At good quality, you’ll need 5 to 6 megabits per second. If it’s a good transcoder, you might even go to 3 or 4 megabits per second, and you would get better video quality.
“But content that’s live, there’s an even bigger mismatch. Maybe 95 percent of it is 1080i, and you have to convert it at the transcoding farm, into 1080p, 720p, 480p – because of multi-screen, and new TVs are all progressive. If you do de-interlacing in software, the quality will be significantly worse than if you do it with a good transcoder. You pay up to a 100 percent tax in quality and bit rate and storage if you don’t do it centralized and with good transcoders,” Gutman continued.
“That’s unfortunate because we sell transcoders, and maybe we sell a little less transcoders,” he quipped.
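Gutman's bit rate figures translate directly into storage. The short Python sketch below works through the arithmetic for one hour of 1080p video at the rates he mentions. It assumes decimal units (1 megabit = 1,000,000 bits, 1 gigabyte = 1,000,000,000 bytes) and ignores audio and container overhead, so the numbers are rough.

```python
def bytes_per_hour(megabits_per_second):
    """Bytes needed to store one hour of video at a constant bit rate."""
    bits = megabits_per_second * 1_000_000 * 3600
    return bits // 8


for rate in (3, 4, 5, 6):
    gigabytes = bytes_per_hour(rate) / 1_000_000_000
    print(f"{rate} Mbps -> {gigabytes:.2f} GB per hour")

# Extra storage when a weaker transcoder needs 6 Mbps where a better one needs 3 Mbps.
penalty = bytes_per_hour(6) / bytes_per_hour(3) - 1
print(f"Storage penalty: {penalty:.0%}")
```

Comparing the top of the off-the-shelf range with the bottom of the good-transcoder range gives a doubling of storage, which lines up with the "up to a 100 percent tax" Gutman describes.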
AT THE EDGE
“We’re seeing more and more customers asking us to transcode from standard-definition to smaller resolutions, or from high-definition to standard-definition. That’s becoming more and more common,” said Farassat. “Transcoders work great for that; you just bring in IP, and we give you SD output.”
A multichannel video programming distributor (MVPD) could do all of its own transcoding for its on-demand content, but in a multi-screen environment that might be as impractical for an MVPD as it is for content originators – for some of the same reasons, and for some different ones, too.
For on-demand content, doing it all in advance and storing it is too costly and too great a strain on storage, even if storage is relatively cheap.
Transcoding live content is a whole different ballgame, and there are technical hurdles still to clear.
On the on-demand side, the approach that seems to be gaining favor is doing some pre-processing on each asset (which might include transcoding) and then fragmenting it before it gets shipped to wherever it is going.
Fragmenting (also called publishing by some) is a fairly new concept, the basic idea of which is to remove some of the multiplying variables from the processing equation. The key variable being considered is the format, also being called “the wrapper.” You take your asset, copy it in the most popular resolutions and then leave it alone until a customer wants it. When you get a request for that asset, you determine whether the end device runs Flash, Smooth Streaming or some other format; you put the appropriate wrapper on that asset, and the appropriate digital rights management (DRM) tags; and you send it out. You reduce your storage burden by a factor of the number of formats you must support.
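The request-time flow described above can be sketched in a few lines of Python. This is an illustration only, not any vendor's publishing product: the device names, the device-to-wrapper table and the DRM label are hypothetical placeholders, and real packaging involves segmenting and encrypting the media itself.

```python
from dataclasses import dataclass

# Hypothetical lookup from requesting device to the wrapper it needs.
WRAPPER_FOR_DEVICE = {
    "iphone": "Apple",
    "windows_pc": "Smooth Streaming",
    "flash_player": "Adobe Flash",
}


@dataclass(frozen=True)
class StoredRendition:
    """One pre-transcoded copy, stored without a delivery wrapper."""
    asset_id: str
    resolution: str


@dataclass(frozen=True)
class Delivery:
    """What goes out to the customer once the wrapper and DRM are applied."""
    asset_id: str
    resolution: str
    wrapper: str
    drm: str


def publish(rendition, device):
    """Pick the wrapper for the requesting device and attach a DRM tag."""
    wrapper = WRAPPER_FOR_DEVICE[device]
    # Placeholder DRM label; a real system would apply the scheme the wrapper requires.
    return Delivery(rendition.asset_id, rendition.resolution, wrapper, drm=f"{wrapper} DRM")


def stored_copies(num_resolutions, num_formats, wrap_on_request):
    """Copies kept on disk per asset, with and without request-time wrapping."""
    return num_resolutions if wrap_on_request else num_resolutions * num_formats


print(publish(StoredRendition("movie-1", "720p"), "iphone"))
print(stored_copies(3, 3, wrap_on_request=False), "copies if every format is stored")
print(stored_copies(3, 3, wrap_on_request=True), "copies if wrapping happens on request")
```

With three resolutions and three formats, storing every wrapped version takes nine copies per asset, while wrapping on request takes three.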
Picking resolution as the common denominator for your transcoded video is one way to do it.
“A lot of these profiles are shared across devices,” Farassat explained. “The iPad has the ability to support multiple resolutions, iPhone does the same thing, almost every Android phone I know also supports multiple resolutions. If you created your profiles such that they work on all these devices, you might as well have them all transcoded and ready for when you want to support that device.”
“We’re all looking at using 264 for video and AAC for audio,” observed Tom Lattie, the general manager of Harmonic’s Rhozet business. “Now the challenge is how those video and audio assets are wrapped and packaged and fragmented and DRM’ed. So in that context, although there are occasionally some differences, truth be told, whether it’s Windows mobile phone or an iPhone or a Flash-enabled phone, you’re beginning to see convergence on screen resolutions.”
Even if you’re off by a couple of pixels, he noted, few people, if anybody, can tell the difference.
An alternative is to think of bit rates as the common denominator. One advantage is that some bit rate requirements will overlap – the high-end option(s) for a tablet might overlap with the low-end option(s) for a computer, which might save you a stream or more when transmitting.
The idea here is to create a set – Lattie called it a “bouquet” – of the most frequently used bit rates. “Say you’re doing iPhone delivery, you take the three lowest bit rates of this bouquet,” he explained. “At the same time, I’m fragmenting and packaging and wrapping it. Then I take maybe the third, fourth and fifth bit rate for the iPad. Then maybe for my Adobe – I’m targeting smartphones and PCs – I take maybe three, four, five and six, all from the same bouquet, and then in one-to-many fashion – I’m not transcoding; I’m fragmenting, wrapping, modifying metadata, adding my encryption – I’m fanning out my already prepared audio-video streams into the more specific wrapper needed for delivery.
“So we see a real trend beginning to happen in this space, particularly talking to people doing live multi-bit-rate delivery,” Lattie continued. “Even in the all-inclusive transcoders, ours and a lot of our competitors, in roadmaps, are talking about building this architecture, even within the unified box, because today, if I wanted to do a program, and I wanted to do three bit rates of Apple and three bit rates of Smooth, often that’s six transcode processes going, it’s a linked chain – so you have a scale problem. It’s high expense. You’re building the whole thing from end to end. In reality, the video processing stage can be unified even though the distribution wrappers or formats may be different among the multi-bit-rate architectures.”
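Lattie's bouquet can be modeled as one shared list of bit rates with each delivery target selecting positions from it. In the Python sketch below, the positions follow his example (the three lowest for iPhone; the third, fourth and fifth for iPad; three through six for Adobe), while the actual bit rate values are hypothetical.

```python
# Hypothetical bouquet of six bit rates in kbps, lowest to highest.
BOUQUET_KBPS = [200, 400, 800, 1200, 2000, 3500]

# 1-based positions in the bouquet for each delivery target, per Lattie's example.
TARGET_POSITIONS = {
    "iPhone": [1, 2, 3],
    "iPad": [3, 4, 5],
    "Adobe": [3, 4, 5, 6],
}


def rates_for(target):
    """Bit rates a given delivery target draws from the shared bouquet."""
    return [BOUQUET_KBPS[pos - 1] for pos in TARGET_POSITIONS[target]]


def transcodes_end_to_end():
    """One transcode per target per bit rate, as in a linked end-to-end chain."""
    return sum(len(positions) for positions in TARGET_POSITIONS.values())


def transcodes_unified():
    """One transcode per distinct bouquet entry; wrapping is then fanned out."""
    distinct = {pos for positions in TARGET_POSITIONS.values() for pos in positions}
    return len(distinct)


for target in TARGET_POSITIONS:
    print(target, rates_for(target))
print("End-to-end transcodes:", transcodes_end_to_end())
print("Unified transcodes:", transcodes_unified())
```

Under these assumptions the end-to-end approach runs 10 transcode processes, while the unified video processing stage runs six and leaves the rest to fragmenting and wrapping.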
One reason fragmenting is interesting is that the transcode and the fragmenting do not have to be colocated – not within a headend, and maybe not even within the delivery network – which creates options for network architecture and for saving resources.
A couple of years ago, an MSO might have had many local headends with one transcoder for each channel – “50 transcoders to transcode CNN? It doesn’t make any sense,” Lattie observed.
The idea would be to start with a super headend and put in only one transcoder per channel to generate three resolutions and bit rates of that channel. It would output a single, synchronized multicast of the three video streams across the backbone.
“So my first gain is fewer headend resources,” Lattie said. “I’d be making one copy per bit rate instead of two, and on my backbone I’m using half the bandwidth. At the edge of the network, I’d have a publishing platform that would receive the transport stream from the national headend, and then on-demand, or in real time, I would take those three streams and generate a uniquely wrapped Apple multi-bit-rate version of it, and I would also generate the Smooth version of that.”
An additional profile would be easier to add.
“If I wanted to then say, well, in this part of the country I also want to enable Adobe Flash fragmented file streaming, as well, the only thing I need to change is to simply add another output profile for those three bit rates on the edge publishing platform,” Lattie said.
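A rough Python model of the architecture Lattie describes shows where the savings come from. The 50 local headends echo his CNN example; the three bit rates are hypothetical values, and the model ignores transport overhead and redundancy.

```python
def transcoders_needed(channels, local_headends, centralized):
    """Transcoders required: one per channel overall, or one per channel per headend."""
    return channels if centralized else channels * local_headends


def backbone_kbps(bit_rates_kbps, wrappers, wrap_at_edge):
    """Backbone load for one channel.

    If wrapping happens at the super headend, each bit rate crosses the
    backbone once per wrapper; if it happens at the edge, only once.
    """
    copies = 1 if wrap_at_edge else len(wrappers)
    return sum(bit_rates_kbps) * copies


# Hypothetical bit rates (kbps) for the three streams of one channel.
BIT_RATES_KBPS = [800, 1500, 3000]
wrappers = ["Apple", "Smooth"]

print(transcoders_needed(1, 50, centralized=False), "transcoders, one per local headend")
print(transcoders_needed(1, 50, centralized=True), "transcoder at the super headend")
print(backbone_kbps(BIT_RATES_KBPS, wrappers, wrap_at_edge=False), "kbps if wrapped centrally")
print(backbone_kbps(BIT_RATES_KBPS, wrappers, wrap_at_edge=True), "kbps if wrapped at the edge")

# Adding Adobe Flash is one more output profile on the edge publishing platform.
wrappers_with_flash = wrappers + ["Adobe Flash"]
print(backbone_kbps(BIT_RATES_KBPS, wrappers_with_flash, wrap_at_edge=True), "kbps after adding Flash")
```

With two wrappers, edge publishing halves the backbone load, and adding a third wrapper at the edge leaves the backbone load unchanged.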
A possible fringe benefit is that the service provider can select best-of-breed products in each category, or perhaps better control its transcoder costs by buying only as much transcoding quality as each location requires.
As cable and satellite service providers look into taking all of their live content and distributing it as multi-screen, live through CDNs or over their own networks, there are other challenges to meet beyond the expense.
“That’s a lot of transcoding, and you need to do new things,” Gutman said. “A traditional transcoder implemented in software, such as those used by Amazon or Netflix, they’re just not capable of doing it. Traditional hardware encoders also don’t support it because you need multiprofile, multi-resolution and protocols that are new that aren’t covered by any hardware manufacturers. So how do you bridge those two schools of transcoding to take live content into multi-screen? One big problem is just to support all of the resolutions.
“The other problem is that you need to do conversions from interlace to progressive. Sounds like small feature, but this is what big companies have been working on for years,” Gutman continued. “Because the original video is interlaced, when you convert it in software to progressive, you get lines everywhere. That’s on premium HD content. How do you take 1080i to 720p and still keep the quality? How do you take 60 fields and turn it into 30 frames per second? It just looks horrible.”
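The "lines everywhere" problem Gutman describes is easy to reproduce with the most naive conversion, often called weaving, in which each pair of fields is simply interleaved into one frame. The Python sketch below is a toy illustration of that naive approach only, not of how any commercial transcoder de-interlaces; the tiny text "fields" and the moving bar are invented for the example.

```python
def weave(top_field, bottom_field):
    """Interleave two fields (lists of rows) into one progressive frame."""
    frame = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame.append(top_row)
        frame.append(bottom_row)
    return frame


def weave_sequence(fields):
    """Turn a sequence of fields into half as many frames (e.g. 60 fields -> 30 frames)."""
    return [weave(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def render_field(bar_x, width=8, rows=3):
    """A toy field: a vertical bar at column bar_x on an empty background."""
    return ["".join("#" if x == bar_x else "." for x in range(width)) for _ in range(rows)]


# The bar moves one column between the two fields, which are captured 1/60 second apart.
frame = weave(render_field(2), render_field(3))
for row in frame:
    print(row)

# One second of interlaced video: 60 fields become 30 frames.
one_second = [render_field(i % 8) for i in range(60)]
print(len(weave_sequence(one_second)), "frames")
```

Because the two fields were captured at different instants, the bar lands on alternating columns in adjacent rows of the woven frame. That comb pattern is the artifact that better de-interlacers work to avoid.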
IN THE FUTURE
Transcoding is helping to pave the way for network convergence.
“Every transcoder that I’m familiar with is designed to plug into an IP network, and IP makes things easier,” Farassat said. Referring to interfaces such as SDI and ASI, he said, “We’re moving away from those traditional video interfaces that quite honestly were cumbersome, because they required additional devices before and after.”
Quite important, he said, “if you look at the architecture for video-on-demand and compare that to the architecture for live video, and it doesn’t matter if you’re in the telecom environment and doing this as IPTV, or if in the cable environment and doing it over QAM, these two architectures are very, very different. … When you move into this new three-screen streaming environment, the difference between an on-demand architecture and a live architecture almost goes away – they become so similar. That leads to huge operational simplicity. You don’t need racks and racks of video servers, you don’t have to have lots and lots of VOD middlewares – everything is just HTTP transmission.”
Another interesting application for transcode is in ad insertion for on-demand content. To insert an ad, video often has to be taken out of its compressed form, after which ads can be inserted, and then the whole stream is re-encoded.
“People are now asking about ad insertion in the compressed domain, not in baseband,” Matherly said. To do that you’d have to retrofit the system to incorporate the necessary transcoding resources, he explained.
Sometime within the next year or so, transcoding may percolate into the consumer sphere. Intel was at CES showing its Quick Sync transcoding. The company showed the ability to quickly reformat MPEG-4 video resident on a PC to be synced to an iPad. Quick Sync will be available on its second-generation Core chips that will appear in PCs and possibly other products.
Also at CES, Broadcom showed a chip designed for home gateways that includes what it called a real-time transcoder capable of taking broadcast, over-the-top or user-generated HD video and reformatting it into a wide range of resolutions, formats and bit rates.
Matherly doesn’t believe transcoding in the home is going to be cost-effective. “You cross a threshold at the customer premise. It’s very price-sensitive there – that’s where you make quality tradeoffs.”
On the other hand, it doesn’t take an engineer to observe that the processing power required for quality transcoding at a consumer price point is only a few generations of silicon away. And then the market might change again.