Meeting VOD growth projections with a CDN architecture
Clearly, legacy solutions are reaching their limits.
Expanding libraries, growth in HD and emerging time-shifting TV applications are putting significant demands on MSOs’ infrastructure. Clearly, we are just getting started. SNL Kagan estimates that major MSOs will move from an average of 6,700 titles in 2009 to 20,000 titles in the next 12 to 24 months. New initiatives, such as Comcast’s Xfinity, will only increase these library offerings.
As titles increase, the shift toward HD on-demand is following in lockstep. HD content accounts for less than 10 percent of VOD viewing today, according to SNL Kagan, but it will become the primary choice within five years. The proliferation of new consumer devices and heightened viewer expectations promise to keep this market accelerating.
How do VOD content libraries maintain this expansion cost-effectively? So far, traditional VOD solutions have accommodated growth with brute force. When more streaming capacity is required, more servers are racked up. When the content library grows, more disk drives are added.
This has worked to a point. The market, however, has now reached an inflection point. Legacy approaches are no longer able to keep up with the accelerating demand for HD content and rapidly expanding video libraries.
To meet these new requirements, operators are faced with three choices:
1) Increase storage capacity at each headend: This is a straightforward solution, but not the most cost-effective. Rack space and power requirements will increase, and content management of large libraries at each and every headend becomes an operational nightmare.
2) Centralize delivery: The flipside of duplicating storage everywhere is to consolidate it at a central location. This saves on storage costs but dramatically increases network transport costs across the metro and national backbones. And, as on-demand services grow in popularity, the bandwidth costs scale proportionally.
3) Take a CDN approach: Leveraging the traditional Internet CDN model, this approach, illustrated in Figure 1, replicates only popular content at the edge of the network. Less popular content resides within the network core, eliminating unnecessary storage replication at each headend.
With real-world VOD session data, we examine the CDN approach to content delivery. A CDN-based VOD delivery solution provides deployment flexibility and allows for the scalable growth of content libraries and streaming capacity without incurring the significant storage costs often associated with service expansion. Given VOD growth projections and the competitive advantages of the CDN approach, it is not a question of whether operators will employ it, but when.
THE PROMINENT “LONG TAIL”
There is perhaps no other principle in the VOD world that has gotten as much attention as the “long-tail” phenomenon. Popularized by writer Chris Anderson and applied to video-on-demand, this theory states that most VOD requests will be for a small pool of popular titles, while a broader variety of titles will appeal to the unique tastes of each individual.
Applied to VOD service, this theory is useful for operators in planning storage (selection) and bandwidth (streaming capacity) requirements. It validates the benefits of a CDN architecture, which inherently offers the advantages of optimizing storage and streaming independently. Large, centralized storage systems can be placed within the network core to provide title selection, while high-bandwidth streaming servers can be placed at the edge to provide the streaming capacity for the most popular titles.
TRADEOFFS WITHIN THE CDN
At the service provider level, a CDN architecture may provide the greatest advantages in operational performance and capital cost. However, at the component/tier level, content delivery systems should be optimized for either bandwidth or storage capacity to reduce overall deployment costs.
For example, a low-bandwidth network-attached storage (NAS) device in the core of the network allows an operator to serve a large VOD library to their consumers. However, NAS systems are not bandwidth-optimized and will not support significant streaming bandwidth. Alternatively, a streaming server can support a significant amount of bandwidth but is cost-constrained in its ability to simultaneously serve a large content library. With this basic understanding, an operator can make optimal choices at each tier of their network and reduce overall deployment costs.
REVEALING REAL-WORLD ANALYSIS
Discussions with many operators indicate that CDN capacity planning for on-demand access is often constrained by the amount of network bandwidth available between the central store and regional cache, and between the regional cache and the edge cache. Service planning often begins by targeting cache hit ratios of 90 percent for the streamers at the network edge. This allows operators to plan around a fixed-transport bandwidth requirement when calculating the storage requirement at each network tier.
In a recent study, Verivue analyzed actual VOD usage data from leading cable providers to determine the real-world caching requirements in a CDN deployment. This analysis allows operators to more accurately plan their edge caching and network resource requirements.
For a library of 8,500 titles, this study shows that 90 percent of VOD sessions can be served from an edge streamer caching just under 30 percent of the most frequently requested titles. That is, the edge streamer can achieve a 90 percent cache hit ratio with just 2,255 titles. The remaining 10 percent of user requests are served from the regional cache or, for even longer-tail content, served from the central store. As shown in Figure 2, higher or lower cache hit ratios are achieved by increasing or decreasing the edge streamer cache size, respectively.
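A figure of this kind can be derived from raw session logs by ranking titles by request count and accumulating their share of sessions. The following is a minimal sketch, not the study's method: the function name `titles_needed` and the toy log are illustrative, and it assumes an idealized cache that holds exactly the most-requested titles over the measurement window.

```python
from collections import Counter


def titles_needed(sessions, target_hit_ratio):
    """Return how many of the most-requested titles cover the target share of sessions.

    `sessions` is an iterable of title identifiers, one entry per VOD session.
    Assumes an idealized cache holding exactly the top-ranked titles.
    """
    counts = Counter(sessions)
    total = sum(counts.values())
    covered = 0
    for rank, (_title, requests) in enumerate(counts.most_common(), start=1):
        covered += requests
        if covered >= target_hit_ratio * total:
            return rank
    return len(counts)


# Toy log (not study data): 10 sessions spread over 3 titles.
log = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(titles_needed(log, 0.9))  # 2: titles "a" and "b" cover 9 of 10 sessions
```

Dividing the result by the number of distinct titles gives the cache fraction that corresponds to one point on a curve like Figure 2.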
Based on this study, an overall VOD library of 10 TB requires a 3 TB edge-streaming cache to achieve a hit rate of 90 percent. In a headend requiring delivery of 10,000 simultaneous SD VOD streams (assuming 3.75 Mbps SD bit rates), 9,000 streams (or 33.75 Gbps) could be served from an edge-streaming cache of 3 TB, while the remaining 1,000 streams (or 3.75 Gbps) would be served from caching elements located in the core network.
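The stream split in this example is straightforward arithmetic and can be checked with a few lines of code. The helper name `split_streams` is illustrative; the inputs are the figures quoted above.

```python
def split_streams(total_streams, bitrate_mbps, hit_ratio):
    """Split peak streams between the edge cache and the core for a given hit ratio.

    Returns (edge_streams, edge_gbps, core_streams, core_gbps).
    """
    edge_streams = round(total_streams * hit_ratio)
    core_streams = total_streams - edge_streams
    edge_gbps = edge_streams * bitrate_mbps / 1000.0
    core_gbps = core_streams * bitrate_mbps / 1000.0
    return edge_streams, edge_gbps, core_streams, core_gbps


# 10,000 simultaneous SD streams at 3.75 Mbps with a 90 percent edge hit ratio
print(split_streams(10_000, 3.75, 0.90))  # (9000, 33.75, 1000, 3.75)
```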
With this data, it is clear that an optimized balance between network transport costs and cache sizes could lower overall infrastructure costs. For example, less storage on the edge could reduce the cost of the edge-streaming cache. A 1 TB cache (10 percent of the VOD library) may be less expensive but requires the network to support 12 Gbps of streams from the core.
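Working backward from the 12 Gbps figure shows what the smaller cache gives up. This sketch assumes the 1 TB example refers to the same headend as before (10,000 simultaneous SD streams at 3.75 Mbps); the function name is illustrative.

```python
def implied_hit_ratio(core_gbps, total_streams, bitrate_mbps):
    """Edge hit ratio implied by the share of peak traffic served from the core."""
    total_gbps = total_streams * bitrate_mbps / 1000.0
    return 1.0 - core_gbps / total_gbps


# Assumed headend: 10,000 SD streams at 3.75 Mbps (37.5 Gbps at peak)
core_streams = round(12 * 1000 / 3.75)
print(core_streams)                                   # 3200 streams from the core
print(round(implied_hit_ratio(12, 10_000, 3.75), 2))  # 0.68
```

Under that assumption, shrinking the edge cache from 3 TB to 1 TB lowers the edge hit ratio from 90 percent to roughly 68 percent and more than triples the traffic carried from the core (12 Gbps versus 3.75 Gbps).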
This analysis indicates that one can use a reasonable cache size at the edge of the network today with great benefit. Yet, it hardly addresses the two major trends in the on-demand market: HD and growing library sizes. Extrapolating from this data, Figure 3 provides a better illustration of how these two developments will further impact edge cache sizes.
In the left column, the number of title hours increases from 8,500 to 42,500, a factor of five. In the top row, we see the HD percentage of the VOD library, where HD titles require four times the delivery bandwidth of SD (typically 15 Mbps for HD, versus the 3.75 Mbps SD rate assumed earlier) and four times the storage. It is evident from this analysis that the cache sizes at the edge will expand considerably.
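The scaling behind this kind of extrapolation can be sketched as follows. This is an illustrative model, not a reproduction of Figure 3: it assumes an all-SD baseline library of 10 TB, that the edge cache remains a fixed 30 percent of total library storage as the library grows, and that HD needs four times the storage of SD. The function and parameter names are hypothetical.

```python
def edge_cache_tb(base_library_tb, growth_factor, hd_fraction,
                  cache_fraction=0.3, hd_storage_multiplier=4.0):
    """Estimate edge cache size in TB under the stated simplifying assumptions.

    base_library_tb: size of the baseline library if every title were SD
    growth_factor:   library growth relative to the baseline (e.g. 5 for 5x)
    hd_fraction:     share of the library stored in HD, from 0.0 to 1.0
    cache_fraction:  assumed share of library storage held at the edge
    """
    if not 0.0 <= hd_fraction <= 1.0:
        raise ValueError("hd_fraction must be between 0 and 1")
    # Average storage per title relative to an SD-only library.
    per_title_scale = (1.0 - hd_fraction) + hd_fraction * hd_storage_multiplier
    return base_library_tb * growth_factor * per_title_scale * cache_fraction


for growth in (1, 5):
    for hd in (0.0, 0.5, 1.0):
        size = edge_cache_tb(10, growth, hd)
        print(f"growth x{growth}, HD {hd:.0%}: {size:.1f} TB at the edge")
```

Under these assumptions the 3 TB baseline cache grows twentyfold for a library that is five times larger and entirely HD. Real deployments may differ, since the share of titles needed to reach a given hit ratio can change as the long tail lengthens.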
The careful selection of a caching algorithm is essential when planning a CDN deployment. Which titles should be cached at the edge of the network, and which content should remain at the core? Content popularity is fluid. “Top 10” lists may change from week to week, if not day to day. Ideally, with the right placement algorithm, an operator should be able to optimize the network transport and edge-caching costs to balance the overall infrastructure investment. The challenge is the dynamic nature of viewing habits and content churn.
Verivue studied this issue to determine how caching algorithms might better optimize cache sizes. By relying on statistical trends in viewing habits, intelligent algorithms can be employed to improve caching and network investments.
For this particular study, a comparison was made between the typical least recently used (LRU) algorithm used by most existing implementations and an algorithm based on content popularity thresholds.
To clarify, with an LRU-based algorithm, requested content resulting in a cache miss is always cached, and to make space in the cache for this content, LRU content is deleted from the cache. In contrast, a threshold-based algorithm monitors statistical trends of content usage, and based on threshold crossings (e.g., “n” views within a time period “T”) determines which content is stored and which content is deleted from the cache.
Threshold-based caching algorithms are more sophisticated and avoid the well-known issue of cache pollution associated with simple LRU-based caching algorithms. Cache pollution is caused by long-tail content that a simple LRU algorithm automatically places in the cache but that is never requested again before the LRU mechanism deletes it. Such content occupies valuable cache space without ever producing a cache hit.
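The difference between the two policies, and the pollution effect, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the algorithm used in the study: the class names, the eviction rule for the threshold cache (displace the cached title with the fewest recent requests), and the synthetic trace are all illustrative.

```python
from collections import OrderedDict, defaultdict, deque


class LRUCache:
    """Admit every miss; evict the least recently used title when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.titles = OrderedDict()

    def request(self, title, now):
        if title in self.titles:
            self.titles.move_to_end(title)   # mark as most recently used
            return True
        if len(self.titles) >= self.capacity:
            self.titles.popitem(last=False)  # drop the least recently used title
        self.titles[title] = None
        return False


class ThresholdCache:
    """Admit a title only after n requests within the last T time units."""

    def __init__(self, capacity, n, T):
        self.capacity, self.n, self.T = capacity, n, T
        self.titles = set()
        self.history = defaultdict(deque)    # title -> recent request times

    def _recent(self, title, now):
        times = self.history[title]
        while times and times[0] <= now - self.T:
            times.popleft()                  # forget requests outside the window
        return len(times)

    def request(self, title, now):
        self.history[title].append(now)
        recent = self._recent(title, now)
        if title in self.titles:
            return True
        if recent >= self.n:                 # threshold crossed: try to admit
            if len(self.titles) >= self.capacity:
                coldest = min(self.titles, key=lambda t: self._recent(t, now))
                if self._recent(coldest, now) >= recent:
                    return False             # nothing colder to displace
                self.titles.remove(coldest)
            self.titles.add(title)
        return False


def hit_ratio(cache, trace):
    """Replay a list of title requests, using the list index as the clock."""
    hits = sum(cache.request(title, now) for now, title in enumerate(trace))
    return hits / len(trace)


# Synthetic trace: two popular titles interleaved with one-off long-tail titles.
trace = []
for i in range(5):
    trace += ["A", "B", f"tail{2 * i}", f"tail{2 * i + 1}"]

print(hit_ratio(LRUCache(capacity=2), trace))                  # 0.0
print(hit_ratio(ThresholdCache(capacity=2, n=2, T=10), trace)) # 0.3
```

In this deliberately adversarial trace, each pair of one-off titles pushes the two popular titles out of the LRU cache before they are requested again, so the LRU cache never scores a hit. The threshold cache ignores the one-off titles and keeps the popular ones once they have been requested twice.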
Figure 4 takes the studied data just mentioned and examines the effect the two different caching algorithms have on cache storage requirements.
As shown, threshold-based caching offers far better efficiency than the standard LRU-based content caching algorithms. By intelligently monitoring content usage behavior, this algorithm reduces the storage requirements at the edge necessary to achieve various cache hit ratios. For an 80 percent cache hit ratio, threshold-based caching reduces the storage required by more than 30 percent. It also enables a 2,200-title cache to satisfy 10 percent more user requests. By intelligently ensuring the proper titles are cached locally, this optimization allows operators to further stretch the usefulness of their edge-caching streamers. This balancing of costs between edge caches and transport bandwidth is a significant operational advantage. With more accurate data and better algorithms, operators are better positioned to optimize performance and costs.
It is up to the network planners to build the service infrastructure for the explosion in VOD demand. Clearly, legacy solutions are reaching their limits. The Internet’s CDN model is a proven, pragmatic solution to this evolution. By balancing storage and streaming requirements, operators gain important advantages in service performance and operational costs. This distributed network architecture, together with intelligent caching algorithms, promises to solve the challenges of growth.