Open Mic: The Value of Service Hardening
With competition for subscribers fiercer than ever, retention is paramount for every operator. As MSOs offer services ranging from VOD and cloud user interfaces to Wi-Fi and DVR, delivering reliability through infrastructure improvements has become key to keeping customers happy. Reliability has also been shown to drive uptake of other services, diversifying revenue and creating stickiness. Operators that roll the dice on service quality risk churn and heightened price sensitivity.
When IBB Consulting has worked with operators on these issues, we’ve emphasized a focus on "product hardening" to help maximize increasingly important service quality. The latest hardening strategies use comprehensive techniques to continuously measure service availability and identify root causes of unavailability. Delivering the resulting reliable end-user experiences doesn’t just increase revenue, but offers long-term cost savings as well.
The hardening process begins by drawing out the network topology. The next step is to identify the various sources of telemetry related to the service or product. Then combine and correlate the telemetry data to uncover general problem areas and home in on specific points of failure. Starting from the end points (the user's points of network access), work backwards, level by level, all the way to the headend.
For each network element capable of reporting telemetry data, identify common data elements. These can then be used to provide a precise and consistent way to measure and establish baseline availability.
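As a rough illustration of the baseline step above, the common data elements can be reduced to a per-element availability ratio. A minimal Python sketch follows; the record shape (element ID plus an up/down sample) is an assumption for illustration, not a specific operator's telemetry format:

```python
from collections import defaultdict

def baseline_availability(samples):
    """Compute baseline availability per network element.

    samples: iterable of (element_id, is_available) telemetry readings.
    Returns {element_id: fraction of samples in which the element was up}.
    """
    totals = defaultdict(lambda: [0, 0])  # element_id -> [sample_count, up_count]
    for element_id, is_up in samples:
        totals[element_id][0] += 1
        totals[element_id][1] += int(is_up)
    return {eid: up / n for eid, (n, up) in totals.items()}

readings = [
    ("stb-1", True), ("stb-1", True), ("stb-1", False),  # a set-top box
    ("cmts-1", True),                                     # an upstream element
]
availability = baseline_availability(readings)
```

Because the same ratio is computed the same way for every element, the result gives the precise, consistent baseline against which later improvements can be quantified.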
Data from the user’s network access point includes button clicks, set-top box and modem readings, and viewership patterns. With sufficient data from the end points, it is possible to pinpoint the most prevalent points of failure. Categorize the outages to determine the significance of each, such as a set of chronically poor-performing set-top boxes versus higher-level network outages.
The majority of outages are due to points of failure at end points, often linked to faulty devices, faulty telemetry collection mechanisms, or poor on-premises wiring. In contrast, cluster analysis on device logs can help quantify unavailability due to larger-scale outages on the HFC infrastructure. In some cases, multiple indicators of outages taken together can determine root cause, and encoding these indicative conditions in a rules table can help quickly pinpoint the most prevalent point of failure.
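A rules table of the kind described can be sketched as an ordered list of indicator combinations, checked from most specific to least. The indicator names and root-cause labels below are invented examples, not a documented rule set:

```python
from collections import Counter

# Ordered rules: (set of required indicators, inferred root cause).
# More specific combinations are listed first so they match before
# their subsets do. All names here are hypothetical examples.
RULES = [
    ({"no_downstream_lock", "neighbors_down"}, "node outage"),
    ({"no_downstream_lock"}, "drop/wiring fault"),
    ({"frequent_reboots"}, "chronic device"),
]

def classify(indicators):
    """Return the first root cause whose required indicators are all present."""
    present = set(indicators)
    for required, cause in RULES:
        if required <= present:
            return cause
    return "unknown"

# Tallying classifications across many devices surfaces the most
# prevalent point of failure.
device_indicators = [
    {"no_downstream_lock", "neighbors_down"},
    {"no_downstream_lock"},
    {"no_downstream_lock"},
    {"frequent_reboots"},
]
prevalence = Counter(classify(ind) for ind in device_indicators)
```

The ordering of the rules matters: a device with both `no_downstream_lock` and `neighbors_down` is attributed to the node, not to its own wiring, because the broader condition is checked first.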
In the near term, these root causes and rules tables can be applied toward fixes and solutions for improvement in availability. Our experience indicates the ability to improve availability from 74 percent up to 99 percent.
In the longer term, all other available data can be collected to paint an end-to-end portrait of the customer life cycle, and ultimately increase the value of a customer over time. Combining data logs, statistics, and reports from sources such as customer profiles, customer support calls, tech visits, usage, and service changes will help determine common failure points along the lifecycle that trigger customer disconnects. Utilizing the same analytical methods used for identifying network telemetry failures, it is possible to home in on processes in the lifecycle that have chronically poor performance, or to identify the processes where failures are most prevalent. Then the focus can turn to remedying these issues to improve customer experience and retention.
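The same failure-rate analysis applied to network elements carries over directly to lifecycle processes. A minimal sketch, assuming a simple (process, succeeded) event stream and an arbitrary failure-rate threshold, both invented for illustration:

```python
from collections import defaultdict

def chronic_processes(events, threshold=0.2):
    """Flag lifecycle processes whose failure rate exceeds a threshold.

    events: iterable of (process_name, succeeded) outcomes, e.g. from
    support calls, tech visits, or service-change records.
    Returns {process_name: failure_rate} for chronically failing processes.
    """
    stats = defaultdict(lambda: [0, 0])  # process -> [attempts, failures]
    for process, succeeded in events:
        stats[process][0] += 1
        stats[process][1] += int(not succeeded)
    return {p: fails / n for p, (n, fails) in stats.items()
            if fails / n > threshold}

outcomes = [
    ("install", True), ("install", False),
    ("billing_change", True), ("billing_change", True),
]
chronic = chronic_processes(outcomes)
```

Processes flagged this way become the candidates for remediation described above.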
Technologies we’ve seen employed with success run the gamut from data collection and manipulation tools to statistical processing and presentation software.
Some results we’ve seen in the field include:
• Wi-Fi hardening to achieve 99 percent availability and prepare service for exploration of wholesale opportunities. This was done by analyzing various points of the network to determine baseline availability and quantify improvements in availability.
• VOD hardening to ensure better customer experiences. The operator identified service issues by assigning a series of indicators (e.g., CableCARD errors, chronic devices, and signal strength) and determining root causes based on error type and the combination of indicators, using a dynamic user-driven rules table.
• Improving cloud DVR (cDVR) performance by reducing latency. This was accomplished by monitoring cDVR activity, utilizing receiver logs to calibrate latencies, and identifying transaction times from key press to video play and network calls from client to server, with roundtrip delays.
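The cDVR latency measurement in the last bullet can be sketched as a log-pairing routine that matches each key press to the subsequent video-play event in the same session. The event names and millisecond timestamps here are hypothetical, not a particular receiver's log format:

```python
def keypress_to_play_latencies(events):
    """Compute key-press-to-video-play latencies from receiver log events.

    events: time-ordered list of (session_id, event_name, timestamp_ms).
    Returns a list of latencies in milliseconds, one per completed
    key_press -> video_play transaction.
    """
    pending = {}   # session_id -> timestamp of the unmatched key press
    latencies = []
    for session_id, name, ts in events:
        if name == "key_press":
            pending[session_id] = ts
        elif name == "video_play" and session_id in pending:
            latencies.append(ts - pending.pop(session_id))
    return latencies

log = [
    ("s1", "key_press", 100), ("s1", "video_play", 350),
    ("s2", "key_press", 200), ("s2", "video_play", 900),
]
deltas = keypress_to_play_latencies(log)
```

Aggregating these deltas over time (medians, high percentiles) gives the calibration baseline against which latency reductions can be measured; the same pairing approach extends to client-to-server round-trip measurement.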
Building some of these analytical techniques into active network monitoring tools can add further diagnostic mechanisms to an MSO’s tool kit. These tools can help ensure an MSO’s services remain a valuable draw for customers.