Finding the needle in a haystack
Video service quality issues in large-scale networks.
When it comes to video services, “good enough” isn’t good enough anymore. Consumers are comparing the quality of video when choosing their video service. Subscribers’ quality of experience (QoE) is a competitive issue.
As hundreds, and sometimes thousands, of services traverse the network (in the form of transport streams) to different locations, as digital services are added, and as channel lineups are reconfigured, operators need to know exactly when issues arise that will impact service quality.
Figure 1: Large-scale network with multiple video headends.
An effective means of getting that kind of information is the deployment of a real-time, comprehensive monitoring system that alerts the operator immediately. Such a monitoring system should provide detailed information and tools for effective troubleshooting in order to reduce the mean time to repair (MTTR) and to prevent issues from recurring.
Quality reports and data that are accurate and actionable from various strategic monitoring points help operators continuously assess and improve service quality.
CHALLENGES IN MONITORING LARGE-SCALE VIDEO NETWORKS
Point-to-point quality assessment, problem isolation and root-cause analysis are especially challenging in a large-scale video network, which involves multiple headends, the long-distance transporting of video, and numerous service acquisition and aggregation points.
Many content-level errors, video/audio service disruptions and quality degradations (e.g., frozen, black or tiling screens) can be introduced in the video headend during encoding, decoding, splicing, multiplexing, rate-shaping and re-configuration.
Video services going in or out of the headend are subject to noise/interference, connection problems and network jitter. These are the strategic places operators must continuously monitor for service quality degradations. Normalized quality reporting across all of these points allows for an objective quality comparison. This type of reporting will aid in effective and rapid troubleshooting from location to location.
STEP 1: Monitor QoE errors with accurate alerts and actionable reports
An accurate quality assessment will pinpoint the actual subscriber impact – which negatively affects QoE – in real time, and will provide actionable data and reports that expedite repair of critical errors. Operational and engineering staff and managers must be able to quickly identify the most critical areas that require immediate attention.
Video service quality can be measured by counting the critical QoE errors that affect large numbers of subscribers on a daily, weekly and/or monthly basis. Effective, real-time monitoring can alert operators to quality degradations at the customer premises, or worse, service disruptions, in turn minimizing service calls and truck rolls.
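As a rough sketch of the kind of aggregation such a system performs, the snippet below counts critical QoE error events per day. The event fields and the subscriber-impact threshold that defines "critical" are hypothetical, chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical QoE error events: (day, program, error_type, subscribers_affected)
events = [
    (date(2024, 5, 1), "national-10", "video_freeze", 1200),
    (date(2024, 5, 1), "national-10", "black_screen", 900),
    (date(2024, 5, 1), "local-3", "tiling", 40),
    (date(2024, 5, 2), "national-10", "video_freeze", 1500),
]

# Count only "critical" errors -- defined here, arbitrarily, as events
# affecting more than 100 subscribers.
CRITICAL_THRESHOLD = 100

daily_critical = Counter(
    day for day, _program, _error_type, affected in events
    if affected > CRITICAL_THRESHOLD
)

for day in sorted(daily_critical):
    print(day.isoformat(), daily_critical[day])  # e.g. 2024-05-01 2
```

The same counter keyed by ISO week or by month yields the weekly and monthly views.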
STEP 2: Assess quality by reviewing the number of QoE issues by type and location
Figure 1 is a simplified depiction of a large-scale video deployment. The national programs are acquired by the main headend and are then processed and transported (via the IP network) to local headends for downstream distribution. Video headends are usually responsible for receiving, acquiring and processing national and/or local programs, and most are very heavy on encoding, transcoding, multiplexing, rate-shaping or local program/metadata/file insertion. It is important to monitor the input and output of the headends due to the complex, real-time video processing, and also due to the IP transport for long-distance transport.
Figure 2: Overall video service quality daily report by location and program. Note: HE = headend.
Operators should be able to assess overall service quality by location. In Figure 2, the sample report identifies the number of critical QoE errors and highlights the top offending programs at each location.
The report makes it easy to identify places requiring the most attention. For example, of the 10 headends being monitored, headend site G has the most critical QoE errors (by program), and the top offending program is national program 10. From this high-level view, the operator should then be able to view a more detailed report for headend site G to identify what types of critical errors are occurring and where these errors were introduced. A sample site report for national program 10, with a breakdown by error type, is shown in Figure 3.
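A report like the one in Figure 2 is essentially a ranking over per-site, per-program error counts. The sketch below, with made-up counts and site/program names, ranks headends by total critical errors and names each site's top offender.

```python
from collections import defaultdict

# Hypothetical daily critical-error counts: (headend, program) -> count
error_counts = {
    ("HE-G", "national-10"): 57,
    ("HE-G", "national-4"): 12,
    ("HE-A", "local-2"): 3,
    ("HE-A", "national-10"): 8,
}

per_site = defaultdict(dict)
for (site, program), count in error_counts.items():
    per_site[site][program] = count

# Rank sites by total critical errors and name each site's worst program.
report = []
for site, programs in per_site.items():
    total = sum(programs.values())
    worst = max(programs, key=programs.get)
    report.append((total, site, worst))

for total, site, worst in sorted(report, reverse=True):
    print(f"{site}: {total} critical errors, top offender: {worst}")
```

Sorting by total puts the site needing the most attention (here HE-G) at the top, mirroring the drill-down described above.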
Figure 3: Video service quality daily report by location and error type.
Figure 3 quickly shows which types of QoE errors were introduced from one location to another, and how many. Monitoring locations 1, 2, 3 and 4 in Figure 3 correspond to the ones labeled in Figure 1. The report shows the number of QoE errors introduced at the source (monitoring location 1), as well as the number that occurred at each location throughout the network.
To accurately evaluate the QoE impact by location, it is highly recommended that operators compare and contrast the QoE errors between the input and output of the system. It is also worth noting that only a sampling of QoE errors are shown in Figure 3. In reality, there could also be QoE errors in the tables (PSI, SI and PSIP) and in the carousel data. These all need to be constantly monitored and reported for a complete and comprehensive QoE assessment. Operators must proactively monitor and fix critical QoE errors at the source, or the input, of any complex system (headend or transport network) to avoid the classic “garbage in, garbage out” syndrome.
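The input/output comparison recommended above can be reduced to a simple per-error-type delta between adjacent monitoring points. The counts and error names below are hypothetical, patterned loosely on the Figure 3 scenario.

```python
# Hypothetical per-location error counts for one program.
# Locations 1-4 correspond to the monitoring points in Figure 1.
counts_by_location = {
    1: {"video_freeze": 0, "black_screen": 1, "tiling": 0},
    2: {"video_freeze": 0, "black_screen": 1, "tiling": 2},
    3: {"video_freeze": 0, "black_screen": 1, "tiling": 2},
    4: {"video_freeze": 7, "black_screen": 1, "tiling": 2},
}

def introduced_between(upstream, downstream):
    """Errors seen downstream beyond what arrived from upstream."""
    return {
        error_type: max(0, downstream.get(error_type, 0) - upstream.get(error_type, 0))
        for error_type in downstream
    }

delta = introduced_between(counts_by_location[3], counts_by_location[4])
print(delta)  # the video_freeze errors were introduced between locations 3 and 4
```

Errors already present at location 1 are "garbage in" and must be chased back to the source rather than blamed on the headend.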
STEP 3: Determine the cause of the critical QoE errors
By using the video service quality reports, the operator can quickly identify the locations, as well as the type of QoE errors, that occurred. Now it is time to fix the errors and improve processes to reduce the recurrence of these errors. Eliminating chronic QoE errors – rather than temporarily fixing them time and again – dramatically reduces operational expenses, makes a major impact on uptime statistics and improves the overall subscriber experience.
LET’S EXAMINE HOW AN OPERATOR CAN SOLVE A RECURRING QOE ISSUE
In this example, the operator needs to determine the cause of one of the QoE errors (video freeze/black) that was introduced between monitoring locations 3 and 4 for headend site G. Video freeze/black can be caused by a number of issues, including improperly configured equipment, problems with individual headend components (i.e., an encoder, mux, server, ad splicer or switch/router), a temporary network failure in the headend, or even a bad (video) connection.
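Detecting freeze/black conditions in the first place typically comes down to simple per-frame luma statistics. The sketch below illustrates the idea on toy luma arrays; the thresholds are arbitrary assumptions, and a real probe would operate on decoded frames with tuned, content-aware thresholds.

```python
def is_black(frame, luma_threshold=16):
    """Flag a frame whose average luma is near black (threshold is arbitrary)."""
    return sum(frame) / len(frame) < luma_threshold

def is_frozen(prev, curr, diff_threshold=2.0):
    """Flag a frame that is (nearly) identical to the previous one."""
    mean_abs_diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
    return mean_abs_diff < diff_threshold

# Toy 8-pixel luma "frames"
black = [0, 2, 1, 0, 3, 1, 0, 2]
scene = [120, 130, 90, 200, 45, 60, 180, 75]
scene_again = [120, 130, 90, 200, 45, 61, 180, 75]

print(is_black(black), is_black(scene))  # True False
print(is_frozen(scene, scene_again))     # True
```

In practice a freeze or black alarm is raised only after the condition persists for some number of consecutive frames, to avoid flagging legitimate fades or stills.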
Figure 4: Program historical report.
To determine the root cause, the operator must run a variety of “historical reports” that use a range of criteria, including bandwidth, MPEG discontinuities and network jitter. By running a bandwidth graphing report for the time frame during which the error occurred, the operator can determine that there were no significant bit rate errors or MPEG discontinuities (see Figure 4).
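The MPEG discontinuity check mentioned above rests on the transport stream's per-PID continuity counter. The following simplified scanner, written for illustration, walks 188-byte TS packets and reports counter gaps; it deliberately ignores adaptation-field-only packets and duplicate packets, which a production checker must handle per ISO/IEC 13818-1.

```python
def find_discontinuities(stream):
    """Scan 188-byte MPEG-TS packets and report continuity-counter gaps."""
    last_cc = {}
    gaps = []
    for i in range(0, len(stream) - 187, 188):
        pkt = stream[i:i + 188]
        if pkt[0] != 0x47:                       # TS sync byte
            gaps.append((i, None, "sync loss"))
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]    # 13-bit packet ID
        cc = pkt[3] & 0x0F                       # 4-bit continuity counter
        if pid in last_cc and cc != (last_cc[pid] + 1) % 16:
            gaps.append((i, pid, f"cc jump {last_cc[pid]} -> {cc}"))
        last_cc[pid] = cc
    return gaps

# Synthetic stream: PID 0x100 with counters 0, 1, 3 (packet 2 was dropped).
def packet(pid, cc):
    return bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | cc]) + bytes(184)

stream = packet(0x100, 0) + packet(0x100, 1) + packet(0x100, 3)
print(find_discontinuities(stream))
```

A clean bandwidth/discontinuity report for the error window, as in Figure 4, tells the operator this scanner would have come back empty.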
Next, the operator can look at IP statistics (IP packet inter-arrival rate and the MDI delay factor, MDI-DF) to check the health of the network during the same time frame (see Figure 5). The report shows that the network was fine, with hardly any jitter present.
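The MDI delay factor (RFC 4445) quantifies the jitter checked here: it is the spread of a virtual buffer that fills with each arriving packet and drains at the nominal media rate, expressed as milliseconds of buffering. The sketch below is a simplified, single-interval version with synthetic arrival times.

```python
def mdi_delay_factor(arrivals, packet_bits, media_rate_bps):
    """Delay Factor per RFC 4445 (simplified): spread of a virtual buffer
    filled by packet arrivals and drained at the media rate, in ms."""
    vb = 0.0
    levels = [0.0]
    prev_t = arrivals[0]
    for t in arrivals:
        vb = max(0.0, vb - media_rate_bps * (t - prev_t))  # drain since last arrival
        levels.append(vb)
        vb += packet_bits                                  # fill with the new packet
        levels.append(vb)
        prev_t = t
    return (max(levels) - min(levels)) / media_rate_bps * 1000.0

# Perfectly paced 1 Mb/s stream of 10,000-bit packets: one packet per 10 ms.
paced = [i * 0.010 for i in range(100)]
df_paced = mdi_delay_factor(paced, 10_000, 1_000_000)

# The same packets arriving in back-to-back pairs double the buffer spread.
bursty = [(i // 2) * 0.020 for i in range(100)]
df_bursty = mdi_delay_factor(bursty, 10_000, 1_000_000)

print(round(df_paced, 3), round(df_bursty, 3))  # 10.0 20.0
```

A "network was fine" verdict, as in Figure 5, corresponds to a delay factor staying near the one-packet floor throughout the error window.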
This is a classic example of a major service disruption that passed all of the MPEG and IP network tests. In this case, the operator was able to use the right tool to generate reports, which helped to eliminate other possible causes, such as those introduced by an encoder, multiplexer, switch or router, and to identify a bad video cable connector at the encoder input. Once the connector was fixed, the intermittent video freeze and tiling errors stopped recurring.
Figure 5: IP statistics report on national program 10.
For a large-scale video service deployment, operators can continuously improve video service quality and operational efficiency by developing a strategy to monitor critical system input/output locations, and by choosing monitoring tools that deliver accurate alerts and the key reports needed to safeguard subscriber QoE.
Quality metrics have to be based on QoE impacting events, and a service quality report is only useful if the data behind it (i.e., quality metrics and historical reports) is comprehensive, objective and actionable.
Monitoring at strategic locations matters because operators can use location-based reports to quickly assess QoE metrics (by service/program) across key locations on a daily basis. These reports can be used by operations and engineering to facilitate problem isolation, therefore reducing the mean time to repair and the possibility of the error recurring.
Operators need different types of historical reports for troubleshooting, as time-based reporting and correlation capability is essential: it makes every QoE error actionable, enabling problem isolation and root-cause analysis. In addition, these reports can serve as proof of QoE errors that operators can use to collaborate effectively with (equipment) vendors and program providers.
As new services (e.g., VOD) and programs (e.g., HD) are added and competition intensifies, operators must monitor to ensure the highest quality of their video programs. This will significantly impact the quality of the services delivered today, and therefore the success of the business in the future.