It's every cable operator's nightmare: It's Super Bowl Sunday. Millions of viewers are camped out in front of their TVs. Parties have been organized around what is arguably the premier TV event of the year. Then the cable goes out.
A scene like that could cost a cable company dearly in its fight for respectability in the customer service war, undoing several years of efforts to improve network reliability, picture quality and service levels. It's the type of problem newspapers and TV stations love to report, and it contributes to cable's image as an outage-plagued service.
It doesn't have to be that way. Improved products and components, fiber-rich architectures, back-up power supplies and new policies and craft procedures are now putting the 99.99 percent reliability benchmark within a cable operator's grasp. But is that important? Do cable systems have to actually reach that lofty mark to be perceived as reliable? And where did this perhaps unrealistic goal come from, anyway?
To address the last question first: the 99.99 percent figure came from specifications written by Bellcore as a goal that manufacturers should engineer their equipment to achieve. But the number was chosen arbitrarily; it was not mandated by government regulation or requested by carriers.
Nevertheless, the telephone companies have done a masterful job of implying that their networks routinely reach the "four 9s," but in reality, they probably don't even come close to achieving that mark, which equates to 53 minutes of outage time per year. No one outside of the telephone companies knows for sure, because they're not required to report overall availability figures to anyone.
In fact, when measuring system availability, telco mathematicians factor out the network switching fabric, as well as the customer premise equipment. Problems with commercial power also don't count. And the clock doesn't start ticking until the first customer calls to report an outage.
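The 53-minute figure quoted for four 9s is simple arithmetic: a 365-day year holds 525,600 minutes, and 99.99 percent availability allows 0.01 percent of them to be lost. A minimal Python sketch of the conversion follows. The function names are illustrative only, and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly outage time implied by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


def availability_pct_from_downtime(downtime_minutes: float) -> float:
    """Return the availability percentage implied by yearly outage minutes."""
    if not 0.0 <= downtime_minutes <= MINUTES_PER_YEAR:
        raise ValueError("downtime must fit within one year")
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for pct in (99.9, 99.99):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% -> {minutes:.1f} minutes per year")
```

Run as a script, this prints about 525.6 minutes per year for 99.9 percent and about 52.6 minutes for 99.99 percent, the figure that is usually rounded to 53.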
The cable industry, on the other hand, is faced with a number of challenges before it will ever be perceived as a reliable network. First, people watch TV much more than they use the telephone, so they're more likely to notice a cable outage. If there's a problem with the phone, the caller is greeted with a fast busy signal and simply redials to re-establish a conversation; but with a TV outage, the viewer has forever missed a portion of the event.
Research and some early studies of cable systems have shown that significant strides can be made in network availability by shoring up four major areas: plant power, network architecture, aging coaxial cable and craft practices and procedures, which must become more mindful of their effects on the plant.
By far, the biggest bugaboo for operators is power. Commercial power quality varies greatly by location, and many areas are prone to short outages that might go unnoticed by consumers, but can interrupt bitstreams or telephony conversations.
"Most of our outages are of short duration," notes Tony Werner, vice president of engineering at TCI Communications. That's where strategically-placed standby power units come into play. "A lot of times, just by converting 10 percent to 15 percent of your plant to standby power, you can make a huge impact."
But adding standby power is just half the equation. Proper maintenance of the backup batteries is critical, too. Engineers contacted for this article routinely tell stories of standby units that kicked on properly while the batteries were either dead or so weak that they offered no backup. One person even said that he knows of cases where batteries were removed to be serviced and then never returned to the cabinet.
Other power-related issues include proper grounding and bonding, the use of surge-arrest devices and proper documentation, according to Tim Wilk, director of strategic planning and technology at Scientific-Atlanta.
In late 1995, S-A undertook a study of three large, two-way active hybrid fiber/coax (HFC) networks that were located in different parts of the country. The sample areas covered about 3,600 miles of plant (3,000 miles of coax, and 600 miles of fiber) and passed about 375,000 homes. All had been recently upgraded, and management at each system emphasized good repair and maintenance.
"We were flat surprised at the level of reliability these guys were already achieving," says Wilk. On average, the three systems were achieving 99.98 percent availability. Translated to outage time, these systems were only off-the-air for about 123 minutes per year. The main outage culprits were power-related failures (27 percent), followed closely by hard-line coax and drop failures (see Figure 1). "It's surprising how many times per year that commercial power just isn't there," summarizes Wilk.
Although those availability figures were good (and some would argue were abnormally high), Wilk was able to identify some areas, outside of the addition of standby power, where significant improvements could be made. One chief area was reduction of node sizes.
By reducing node sizes to roughly 500 homes, the downtime of all components within a node could be reduced by 39 percent, according to a model that examines component failure rates and mean time to repair statistics for network devices in an HFC network. Why? Primarily because the reduction in node size eliminates the need for a second power supply.
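The article does not spell out the model itself, so the following Python sketch only illustrates the general approach of combining failure rates with mean time to repair. It assumes the standard steady-state availability formula for each device and treats a node as a simple series chain in which any single failure causes an outage. Every component name, failure rate and repair time here is hypothetical and is not taken from the S-A study.

```python
from dataclasses import dataclass
from math import prod

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class Component:
    name: str
    failures_per_year: float  # hypothetical failure rate
    mttr_hours: float         # hypothetical mean time to repair

    @property
    def availability(self) -> float:
        # Steady-state availability: 1 / (1 + failure rate x repair time).
        rate_per_hour = self.failures_per_year / HOURS_PER_YEAR
        return 1.0 / (1.0 + rate_per_hour * self.mttr_hours)


def series_availability(components) -> float:
    """Availability of a chain in which any one failure takes service down."""
    return prod(c.availability for c in components)


def downtime_minutes_per_year(availability: float) -> float:
    return (1.0 - availability) * HOURS_PER_YEAR * 60


def build_node(power_supplies: int, amplifiers: int) -> list[Component]:
    """Assemble a hypothetical node; all rates and repair times are made up."""
    parts = [Component("optical receiver", 0.05, 2.0)]
    parts += [Component(f"power supply {i + 1}", 0.30, 1.5)
              for i in range(power_supplies)]
    parts += [Component(f"amplifier {i + 1}", 0.02, 2.0)
              for i in range(amplifiers)]
    return parts


if __name__ == "__main__":
    for label, node in (("large node", build_node(2, 6)),
                        ("small node", build_node(1, 3))):
        a = series_availability(node)
        print(f"{label}: {100 * a:.4f}% available, "
              f"{downtime_minutes_per_year(a):.1f} minutes down per year")
```

Because every device in the chain multiplies in an availability below 1, removing a power supply and shortening the amplifier cascade can only reduce the modeled downtime. How large the reduction is depends entirely on the failure and repair statistics fed into the model.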
Taken together, Wilk's research and modeling show that HFC networks as presently architected for cable-TV applications can indeed be brought up to 99.99 percent availability (see Table 2).
Similar work performed by Arthur D. Little that excludes power from the equation shows that fiber transmitters and receivers are also major contributors to downtime (see Figure 3). But with power as the biggest cause of outages, HFC operators must use backup power to achieve the Bellcore benchmark of 99.99 percent. In fact, according to a presentation made by Arthur D. Little's Stu Lipoff during the SCTE/IEEE HFC '96 reliability conference last fall, HFC network operators would need in excess of 12 hours of backup power to meet the 53 minutes per year goal.
North of the border, Rogers Cablesystems undertook a year-long outage study of its system in Newmarket, Ont., where the Rogers "Wave" data service was rolled out. As outlined by the Canadian Cable Television Association's guidelines, the goal for Wave service availability, after accounting for all service interruptions, is 99.9 percent, or a maximum of 525 minutes of downtime per year. Eventually, the goal is to reach 99.99 percent reliability.
While the results of the Newmarket availability study were mixed, Rogers officials seem to be encouraged. Network availability ranged from a low of 99.13 percent to a high of 100 percent. Annualized, the network achieved 99.79 percent availability, or roughly 1,100 minutes of downtime for the year.
Factored into the availability calculations were outages caused by network maintenance, new plant construction activity, headend and fiber-related equipment failures, trunk and distribution failures, reverse noise and power failures.
A high percentage of downtime was attributable to human intervention. In Rogers' case, maintenance activities accounted for 21 percent of all incidents resulting in network downtime. Although scheduled Wave maintenance is restricted to Sundays between 2 a.m. and 6 a.m., it still accounted for much of the downtime. Other major contributors were trunk and distribution problems (24 percent), power outages (22 percent) and excessive reverse path noise (15 percent).
Of course, one key to limiting outage time is mean time to repair (MTTR): the faster an outage is detected and service is restored, the better. Rogers' track record was generally quite good, though it was skewed by a single construction-related incident that caused a three-day outage. That was clearly an anomaly: roughly 70 percent of all equipment failures that caused outages were repaired in less than one hour.
Although the Rogers network is one of the few that is actively supervised by a sophisticated network management system, executives there have realized that a new model for network management must be implemented to reach high reliability levels. To gain improvements, Rogers intends to establish regional network operations centers (NOCs) to monitor the HFC network. These NOCs will enforce change control procedures for all network maintenance activities and make sure that service restoration is performed within the prescribed timeframes, all while continuously tracking and reporting network performance.
Generally, it's safe to say that most cable operators track outages and their root causes, but few see a need to quantify network availability on a percentage basis. In fact, while overcoming outages is a high priority for most cable systems, most don't have the software or modeling information it takes to quantify how they're doing, according to Wilk at S-A. Equipment manufacturers and some consulting companies can take the raw data and come up with a number, but the formulas are not typically available off-the-shelf.
Reaching 99.99 percent availability "is a goal that most (MSOs) are talking about, but it's not driving their day-to-day business," notes Wilk. "Most operators have a ton of data about outages, but no idea how to convert it into availability figures." Not that they need to. Outside of using it to gain a competitive advantage against telephone companies and other competitors who tout their reliability, there is currently no reason to calculate network availability. But then, even the telcos don't do a good job of it because they don't have to.
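For operators who do want a number, one plausible way to turn an outage log into an availability figure is sketched below in Python. It assumes a customer-weighted definition, in which lost home-minutes are divided by total home-minutes served. The article does not say which definition S-A or any other vendor uses, and the record fields and sample log entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass(frozen=True)
class Outage:
    cause: str
    duration_minutes: float
    homes_affected: int


def availability_percent(outages, homes_served, period_minutes=MINUTES_PER_YEAR):
    """Customer-weighted availability over the period, in percent."""
    if homes_served <= 0 or period_minutes <= 0:
        raise ValueError("homes_served and period_minutes must be positive")
    lost = sum(o.duration_minutes * o.homes_affected for o in outages)
    return 100.0 * (1.0 - lost / (homes_served * period_minutes))


def downtime_share_by_cause(outages):
    """Percentage of lost home-minutes attributable to each root cause."""
    lost_by_cause = defaultdict(float)
    for o in outages:
        lost_by_cause[o.cause] += o.duration_minutes * o.homes_affected
    total = sum(lost_by_cause.values())
    if total == 0:
        return {}
    return {cause: 100.0 * lost / total for cause, lost in lost_by_cause.items()}


# Hypothetical outage log for a system serving 20,000 homes.
SAMPLE_LOG = [
    Outage("commercial power", 45, 1200),
    Outage("coax failure", 90, 300),
    Outage("maintenance", 20, 500),
]

if __name__ == "__main__":
    print(f"availability: {availability_percent(SAMPLE_LOG, 20_000):.4f}%")
    for cause, share in downtime_share_by_cause(SAMPLE_LOG).items():
        print(f"  {cause}: {share:.1f}% of lost home-minutes")
```

Weighting by homes affected means a brief outage on a large node counts for more than a long one on a single tap. An operator who prefers to count any interruption anywhere as system downtime would get a lower figure from the same log.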
In preparing a paper he presented during the 1996 SCTE Conference on Emerging Technologies, TCI's Werner sought information about how the local telephone companies actually perform against their own objectives. He came up mostly empty.
The FCC and PUCs mostly gather data about lengthy outages and how quickly installations are performed, and the local companies, while acknowledging that they didn't actually reach four 9s, were reluctant to offer much else.
According to one Bellcore document, digital loop carrier availability in 1988 was reported to be 99.94 percent, or about 316 minutes of downtime per year, Werner says. And that number, it should be noted, did not factor in any power-related failures.
To sum up, it's clear that cable's HFC networks are capable of providing high service levels, even when compared against a telephone network, which is actually used only about one-tenth as much. But there are additional strides that can be made, in both powering and internal maintenance policies. The former can be addressed with standby powering, better components, and improved grounding, bonding and fusing. The latter will require diligence, strong policymaking, documentation, training and perhaps improved network monitoring.
"With surveillance technology, we know if a technician interrupts the communication path," notes Werner. "Without it, you don't know that a tech is out there, pulling a pad or something."
But all those people gathered to watch the Super Bowl sure will.
1. "Proving how HFC networks can offer 99.99% reliability," Tim Wilk, Scientific-Atlanta, Telephony, June 24, 1996, pp. 156–164.
2. "Failure modes and availability statistics of HFC networks," Stu Lipoff, Arthur D. Little, HFC '96 conference, Sept. 1996.
3. "High-speed data services and HFC network availability," Esteban Sandino and Corianna Murphy, Rogers Engineering, Communications Technology, Jan. 1997, pp. 30–36.
4. "Asking key questions, seeking low-cost solutions," Michael Lafferty, CED, March 1996, pp. 56–59.
5. "Network availability requirements and capabilities of HFC networks," Tony Werner and Oleh Sniezko, TCI, 1996 SCTE Conference on Emerging Technologies Proceedings Manual, pp. 123–135.