An open architecture solution delivers a comprehensive approach to pinpointing performance faults in the HFC distribution network and responding to them rapidly.

INTRODUCTION
One of the greatest challenges MSOs face in their migration to next-generation residential and business services is the need to meet unprecedented network performance requirements at reasonable costs with minimum disruption to customers when problems occur.

While delay, jitter and other quality-of-service performance complexities of advanced services can be addressed through solutions at the applications levels, the limitations of legacy status monitoring and operations response systems remain a source of major concern to operators when it comes to PHY-level performance requirements.

This is all the more the case now that operators are offering digital voice service, where the sensitivities of isochronous communications have made it more difficult than ever to consistently meet network performance requirements.

Too often, potential problems in line electronics, optical nodes, taps and other trouble spots go undetected until they generate customer complaints. Once complaints are registered, it frequently takes technicians many hours to find and fix the problems, leaving dissatisfied customers without adequate information about what is wrong and when to expect a fix.

Clearly, the high performance and high availability requirements of today’s state-of-the-art HFC networks require an all-new approach to status monitoring and operations. One approach is to employ a standards-based software solution that interoperates with existing network equipment in the DOCSIS domain.

Such a product suite provides operators a means by which they can immediately pinpoint the sources and causes of problems in network electronics while ensuring that accurate, highly specific information regarding the nature and handling of the problems is conveyed to customer service offices without delay. The open architecture platform has been shown to reduce trouble ticket-related operations expenses by as much as 30 percent, with commensurate improvements in customer satisfaction and reductions in churn.

SOLUTION OVERVIEW
The open architecture solution combines the data collection and diagnostic capabilities of a unique advanced status monitoring system with the open architecture of an administrative operations support system to provide a highly streamlined, automated means by which cable operators can locate, report and fix performance problems in network electronics and passive elements, including nodes, trunk amplifiers, bridgers, line extenders, power sources and taps.

Rather than relying on the out-of-band polling techniques of proprietary status monitoring systems associated with various vendors’ network electronics, the status monitoring solution taps into the DOCSIS platform to compile and analyze downstream SNR data reported by cable modems (CMs) and the upstream SNR measured at cable modem termination system (CMTS) ports receiving those CMs, including ports associated with DOCSIS-enabled set-top boxes (STBs).

By associating the impairments of specific modem transmissions with specific line electronics, by reference to block diagrams of the cascading network components, the platform detects and logically estimates the locations of failures or sub-par performance in the network. Technicians immediately obtain the information they need to locate the problem precisely, without resorting to the trial-and-error method of generating spectrum analyzer readouts from one transponder to the next.

Adding to the ability to precisely locate and identify the causes of problems, the product suite has an optional ingress management component that allows operators to automatically identify ingress impairments in the upstream path by leveraging control over ATT BGSs (attenuation bridger gate switches). By switching ATT BGSs to generate higher-power upstream CM signals from each branching trunk separately and sequentially, the system can locate any ingress noise source by polling all the relevant upstream CMTS ports, since only the ports serving modems in the ingress path will show increased noise when the RF signal is increased.

By identifying via the reference block diagram the farthest amplifier in cascade that is passing the affected CM signals, the ingress manager can isolate where the ingress problem originates. And if only some CMs served by that amplifier are registering increased noise, it will be obvious that the problem is to be found in the passive connections associated with those modems.
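The branch-by-branch search described in the two preceding paragraphs can be summarized in a short sketch. The following Python fragment is illustrative only: the callables set_bgs_attenuation and read_upstream_snr, along with the dB figures, are hypothetical stand-ins for the platform's actual switch-control and CMTS-polling interfaces, which are not documented here.

    # Minimal sketch of the ingress-localization idea, assuming hypothetical helpers:
    #   set_bgs_attenuation(amp, boost_db) - switch the amplifier's ATT BGS
    #   read_upstream_snr(port)            - current upstream SNR (dB) at a CMTS port
    def locate_ingress(branches, ports_for_branch, set_bgs_attenuation, read_upstream_snr,
                       boost_db=6.0, drop_db=3.0):
        """For each branching trunk in turn, raise the upstream CM signal level via its
        ATT BGS and re-poll the relevant CMTS upstream ports: only ports receiving
        modems in the ingress path should show more noise (lower SNR) when boosted."""
        suspects = []
        for amp in branches:                                    # one branch at a time
            ports = ports_for_branch(amp)
            baseline = {p: read_upstream_snr(p) for p in ports}
            set_bgs_attenuation(amp, boost_db)                  # boost this branch only
            try:
                for port in ports:
                    if baseline[port] - read_upstream_snr(port) >= drop_db:
                        suspects.append((amp, port))            # noise rose with the signal
            finally:
                set_bgs_attenuation(amp, 0.0)                   # always restore normal levels
        return suspects

In practice the platform cross-references these results against the block diagram, as described above, to identify the farthest amplifier in the cascade that passes the affected signals.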

Because the solution relies on the standardized SNMP (Simple Network Management Protocol) messaging and data collection architecture of the DOCSIS framework, it can be used in any vendor system environment without reliance on proprietary elements. At the same time, because the open architecture layer of the platform employs open interfaces, the status monitoring component can be used in conjunction with legacy status monitoring systems, allowing the solution to feed input from such systems into its Diagnostic Engine to enhance overall monitoring performance.

Moreover, through integration with higher level external operations support components the open platform supports seamless exchange of vital information across the OSS environment. Analytic results and customer alert information can be generated from the platform directly into legacy CRM systems and other OSS components.

Now operators can look for and identify potential problem spots routinely via the automated polling capabilities of the system long before the problems become sources of customer complaints. And when a problem of sufficient magnitude to draw customer attention suddenly occurs, the operator can quickly pinpoint the source and send the appropriate technicians to fix it.

As soon as the problem is identified and throughout the ongoing repair process, the open layer of the platform can generate information directly into the CRM system to ensure that affected customers are kept informed. And by knowing immediately whether a problem is of sufficient scale to affect a large number of subscribers, the operator can assign sufficient customer service staff to the affected area so as to avoid an overload of unanswered calls on the customer service lines.

THE ARCHITECTURE OF AN OPEN SYSTEM
The complete array of servers and software systems comprising an open system as described above is contained in a highly scalable network facility referred to as the OSTM Station. One OSTM Station can be associated with up to 15 CMTSs and up to 76,800 CMs in the cable system. Up to 20 OSTM Stations can be aggregated into an OSTM Domain to enable a fully integrated, system-wide operations management capability across an entire metro or even regional cable footprint.

The OSTM Station contains three main servers and is augmented with the addition of a fourth server in conjunction with implementation of the optional ingress management platform. In addition, when multiple OSTM Stations are aggregated into an OSTM Domain, the platform employs what is known as a Centralized Engine (CE) as the core server that integrates multiple OSTM Stations with the upper level OSS systems while aggregating all CM/STB address information from each OSTM Station. The CE also provides a directory service for external applications to use when searching for data on the status of CMs and STBs. The CE can serve up to 20 OSTM Stations.

SOFTWARE COMPONENTS AND SYSTEMS INTEGRATION
The platform eliminates use of the proprietary commands of CMTS, CM, DOCSIS STB and DHCP (Dynamic Host Configuration Protocol) software vendors, relying instead on SNMP-based standards and other DOCSIS protocols to communicate with network elements.

The server software components of the OSTM Station employ open-architecture-based interfaces to facilitate integration with the upper-level systems with which the platform interoperates. These upper-level systems include NMS (Network Management System)/Ingress Monitoring, CAD (Computer Aided Design)/Mapping, SMS (Subscriber Management System)/Billing, Provisioning and existing STMs (status monitoring systems). This integration allows information from the platform to be provided to these systems, ensuring, for example, that information such as HFC/RF network data can be referenced flexibly based on the access rights of users within various OSS/BSS domains.

These external systems connect to platform resources either through Java-based APIs (application programming interfaces) or via database connections that use SQL. In instances where several OSTM Stations are integrated into an OSTM Domain, the CE server becomes the point of interconnection with these external systems, affording the same flexibility with respect to open API options.
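As one hedged illustration of the SQL route, the fragment below shows how an external OSS component might pull degraded-modem records over a standard database connection. The table and column names (cm_status, downstream_snr_db and so on) are hypothetical; the platform's actual schema is not documented in this article, and sqlite3 merely stands in for whatever SQL-capable connection is exposed.

    import sqlite3  # stand-in for whatever SQL-capable connection the platform exposes

    # Hypothetical schema: table and column names are illustrative only.
    QUERY = """
    SELECT mac_address, cmts_id, downstream_snr_db, upstream_snr_db, last_polled
    FROM   cm_status
    WHERE  cmts_id = ? AND downstream_snr_db < ?
    ORDER  BY downstream_snr_db ASC
    """

    def degraded_modems(db_path, cmts_id, snr_floor_db=30.0):
        """Return modems on one CMTS whose downstream SNR has fallen below a floor."""
        with sqlite3.connect(db_path) as conn:
            return conn.execute(QUERY, (cmts_id, snr_floor_db)).fetchall()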

This means the CE can be integrated with multiple SMS/billing systems that might typically be found in a multi-franchise MSO environment. The platform also includes the option to integrate with these back-office systems through the widely deployed Cisco Network Registrar, a full-featured DNS/DHCP (Domain Name System/Dynamic Host Configuration Protocol) server providing naming and addressing services in conjunction with tie-ins to service providers’ OSS/BSS. This access to OSS/BSS APIs via the CNR has the added benefit of facilitating terminal policy configurations and the retrieval of IP addresses.

The software product series runs on OSTM servers to support status monitoring functionalities in a multi-vendor environment. The software, using DOCSIS technology to monitor RF and IP transmission states, supports DOCSIS 1.0, 1.1, 2.0 and PacketCable, with future support for DOCSIS 3.0 in development.

HARDWARE COMPONENTS AND FUNCTIONALITIES

The three primary OSTM Station servers include:

  • TIMS Net (Terminal Information Management System Network) – the core OSTM server that automatically detects online CMs and STBs by continually polling CMTSs to provide updated information on the status of these devices. Based on its polling results, the TIMS Net requests execution of high-speed polling and trouble analysis as performed by the two other types of servers in the OSTM Station and serves to communicate analytic results to the Domain CE or directly to the appropriate higher-level OSS/BSS environments, depending on whether the OSTM Station is part of a Domain or operates on a standalone basis.

    The TIMS Net controls up to 76,800 CMs/STBs across up to 15 CMTS chassis. The server compresses and stores up to 60 months’ worth of CM/STB status data, providing a reference through which the system can identify and proactively address elements with a history of problematic performance. TIMS Net provides automated failure event notifications through e-mail, SNMP and RMI (the Java Remote Method Invocation used in distributed computation), ensuring that restoration work is undertaken immediately.

    The TIMS Net employs a MIB (Management Information Base) browser, which is similar to an HTML browser, to access the DOCSIS component parameters. Because the agent data can only be accessed via SNMP, the TIMS Net employs an SNMP application server to allow the HTML client to access the SNMP agents. The system uses Poseidon, the JVM SNMP polling server, to poll the CMTS for online agent data, ignoring the offline agent data contained in the CMTS database. During every polling cycle Poseidon updates the agent data cache in the TIMS Net, deleting from the cache any agent that has been offline for a prescribed period of time. The duration of the typical polling cycle is five to ten minutes, which includes the time required for updating or writing new data to the database.

    If the system is configured as an independent OSTM Station rather than as part of an OSTM Domain, the TIMS Net server functions as the integration interface with the HFC network mapping information generated by the CAD/Mapping system and with the SMS/Billing components. Whether the TIMS Net server is used as part of a Domain or as an independent Station, its integration with the SMS via APIs as discussed above facilitates rapid retrieval of existing SMS/Billing data. And it provides a means by which the system can conduct advanced searches that combine various categories of subscriber information.

    This server, because it is equipped with a MIB browser, can be used as a simplified NMS to monitor and manage network equipment beyond the line electronics, such as routers and servers. The TIMS Net also has Web server functionality, allowing client PCs to use various TIMS Net services via an HTML browser. The TIMS Net runs on a Linux OS and uses a Java server to support SNMP polling.

  • AE (Agent Engine) – This is the SNMP-based polling server that regularly polls the CMTS ports and the CMs, EMTAs (Embedded Multimedia Terminal Adapters) and DOCSIS-enabled STBs – collectively referred to as agents in the platform lexicon – to collect downstream and upstream SNR data generated by the agents. There can be up to four AEs per TIMS Net server. The AE runs on a Linux OS and uses a Java server to support SNMP polling and evaluation functions.

    An important feature of the AE is that it employs multi-thread technology to poll multiple agents simultaneously, thereby greatly shortening the polling cycle compared to sequential polling systems. This in-parallel, high-speed polling process ensures a higher detection rate in the fluctuating SNR environment of the HFC plant. Typical polling times resulting from this synchronized, parallel polling process are within one minute for 2,400 agents, within five minutes for 9,600 agents and within 20 minutes for 19,200 agents. (A simplified sketch of this polling cycle follows this list.)

    Adding to system efficiency, the AE uses Multiple Variable Binding technology, which serves to aggregate SNMP data into fewer packets than would otherwise be required, thereby reducing the number of generated IP packets by as much as 20:1 in comparison to the packet volume of a general-purpose NMS product. This greatly reduces the routing load on the CMTS.

    The AE is also equipped with a unique software processing feature that assures measurement accuracy across multiple types of agents by correcting for measurement deviations that are due to agent feature variations and temperature fluctuations. By correcting for these deviations to an accuracy of ±1 dB, the AE improves the fluctuation detection of each indicator to levels commensurate with the practical requirements of effective monitoring.

    Accuracy is further enhanced through the assurance that there are no false error reports when transmission from an agent that has been chosen as a monitoring point in the network is intentionally terminated. This is achieved through a redundant configuration in which at least two agents are allocated to each monitored RF line transmission port. If there is no response to the polling signal from the primary transmission source for reasons other than a network malfunction, a redundant determination mechanism automatically switches the threshold readout to the normally responding agent, resulting in the immediate cancellation of the error report. This process is essential to the system’s ability to rely on individual agents at customer premises as monitoring points for assessing network performance. (A sketch of this redundancy logic also follows this list.)

  • DE (Diagnostic Engine) – This is the server that checks and analyzes status information collected by the AE and, when sub-par performance is detected, performs comparative analysis using the transmission route block diagram database to identify failure points and estimate the causes. There is one DE per TIMS Net server, and, like the TIMS Net server, it runs on a Linux OS. Each DE can monitor up to approximately 4,000 transmission components, including optical nodes, power sources and amplifiers.

    The DE performs its diagnostics on information gathered from two or more agents associated with each bridger line in the coaxial plant. This approach relies on the fact that all information in the tree-and-branch architecture of the HFC network is propagated downstream. Thus, the DE can match the type and connection information of the transmission equipment that has been pre-registered in the database against the status per distribution line as measured across all monitored agents.

    The DE converts the status information, such as signal level, signal-to-noise ratio and error rate, collected by the AE from monitored agents into failure information for each bridger line. The DE then cross-checks this information against the RF tree/branch network mapping data to determine the failure source and symptoms. This process allows the DE to pinpoint the location and estimate the causes of a failure along all transmission routes, whether the problem originates from a power source, optical node or line extender. (A sketch of this localization step follows this list.)

    In instances where CM penetration is not sufficient to provide an adequate base of agents for monitoring purposes, or where the operator wants to increase the density of monitoring points to create a more effective monitoring system, operators can deploy outdoor-hardened, line-mounted CMs to serve as system agents. The DE will incorporate readings from these agents with readings from indoor agents to create a seamlessly integrated database for use in the diagnostic processes. Similarly, because the system, as described above, supports direct integration with external systems via open APIs, the DE can also leverage data collected from third-party status monitoring systems to enhance the diagnostic results.
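The polling behavior described for the AE, together with the cache expiry handled in the TIMS Net, can be sketched roughly as follows. This is an illustrative simplification, not the platform's implementation: snmp_get_many is a hypothetical stand-in for an SNMP request that carries several variable bindings at once, the OID names are placeholders rather than real DOCSIS MIB objects, and the worker count and expiry window are arbitrary.

    import time
    from concurrent.futures import ThreadPoolExecutor

    SNR_OIDS = ("downstream_snr", "upstream_snr", "tx_power")   # placeholder names, not real OIDs

    def poll_cycle(agent_cache, snmp_get_many, offline_limit_s=1800, workers=32):
        """One polling cycle: query all cached agents in parallel, refresh their
        records, and expire agents that have stayed silent beyond the limit."""
        now = time.time()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # One request per agent, several variable bindings per request:
            # far fewer IP packets than issuing a separate get for every OID.
            results = list(pool.map(lambda ip: (ip, snmp_get_many(ip, SNR_OIDS)),
                                    list(agent_cache)))
        for ip, values in results:
            if values is not None:                      # the agent answered
                agent_cache[ip] = {"values": values, "last_seen": now}
        for ip in [a for a, rec in agent_cache.items()  # drop long-offline agents
                   if now - rec["last_seen"] > offline_limit_s]:
            del agent_cache[ip]
        return agent_cache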
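The redundant monitoring-point determination can likewise be reduced to a few lines. In this sketch poll_agent is a hypothetical callable that returns an SNR reading in dB, or None when the agent does not answer; the real platform's interfaces and data structures will differ.

    def port_status(agents_for_port, poll_agent):
        """agents_for_port: agent ids assigned to one monitored RF transmission
        port, primary first (at least two per port in the redundant layout)."""
        for agent in agents_for_port:
            reading = poll_agent(agent)
            if reading is not None:
                # A normally responding agent exists, so a silent primary is treated
                # as an intentionally terminated monitoring point: the threshold
                # readout switches to this agent and no error report is raised.
                return {"agent": agent, "snr_db": reading, "alarm": False}
        # Only when no assigned agent answers at all is a failure reported.
        return {"agent": None, "snr_db": None, "alarm": True}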
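Finally, the DE's localization step, matching degraded agents against the pre-registered cascade, can be approximated by a tree walk of the kind below. The parent, children and agents_at structures are a hypothetical simplification of the HFC block-diagram database; the platform's actual diagnostic algorithms are proprietary.

    from collections import defaultdict

    def agents_below(component, children, agents_at):
        """All monitored agents served by `component` or anything cascaded after it."""
        found = list(agents_at.get(component, []))
        for child in children.get(component, []):
            found.extend(agents_below(child, children, agents_at))
        return found

    def estimate_fault(parent, agents_at, impaired):
        """parent: {component: upstream component, or None for the optical node}
        agents_at: {component: [agent ids it serves directly]}
        impaired: set of agent ids the AE has flagged as degraded."""
        children = defaultdict(list)
        for comp, par in parent.items():
            if par is not None:
                children[par].append(comp)
        best = None
        for comp in parent:
            served = agents_below(comp, children, agents_at)
            if served and all(a in impaired for a in served):
                # Every agent downstream of this component is degraded, so it is a
                # candidate source; prefer the deepest such component in the cascade.
                depth, p = 0, comp
                while parent[p] is not None:
                    p, depth = parent[p], depth + 1
                if best is None or depth > best[1]:
                    best = (comp, depth)
        return best[0] if best else None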

The functions and operational relationships of these servers are illustrated in the diagram below (Figure 1):

Figure 1: The OSTM Server Hierarchy

ROBUSTNESS, SCALABILITY AND REDUNDANCY (Database Replication)
The platform supports scalability by allowing operators to incrementally increase the capacity of the OSTM Station server configuration in tandem with increases in the size of the monitored network. The openness of the system also supports the introduction of new options over time. And, of course, the OSTM Domain architecture supports the ability to add new OSTM Stations incrementally as the volume of CMTSs increases in conjunction with higher CM penetration and new home and industrial construction across the metro footprint.

Several factors contribute to making the platform extremely robust and efficient in its use of system resources in a 24 x 365 continuous operations environment. The platform dynamically reflects configuration changes in agent information and failure determination threshold values from one polling/failure determination cycle to the next without having to stop or restart the program. Within each polling cycle the system deletes previously collected data that is older than 24 hours, thereby preventing the number of records from exceeding a set amount. Firebird, the database engine used in the platform, automatically adjusts the past record history sequentially from the oldest records, making it unnecessary for the database administrator to periodically confirm the status and adjust the records. All backup processes can be performed online, obviating any need to restart the system.
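The rolling 24-hour retention described above amounts to a simple pruning step run once per polling cycle, sketched below. The record layout (a list of dicts carrying a collected_at timestamp) is a hypothetical stand-in for the platform's actual storage.

    import time

    def prune_poll_records(records, max_age_s=24 * 3600):
        """Drop previously collected records older than 24 hours so the record
        count cannot grow without bound between polling cycles."""
        cutoff = time.time() - max_age_s
        return [r for r in records if r["collected_at"] >= cutoff]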

All databases associated with the platform are backed up through a database replication process that utilizes server resources across the OSTM Domain to maintain multiple copies of each database. Replication provides a synchronized mechanism by which any update to a master copy of a database is replicated automatically across the slave copies. When a new database is implemented, it is not automatically replicated in the distributed server architecture, allowing operators to choose which resources will be used for replication of the new database.

The platform uses the distributed database replication architecture to enhance overall platform performance by allowing database reads to be divided among all relevant servers. This load sharing also allows a slave database server to be configured to take over the master role if the master database server becomes unavailable.

DIAGNOSTIC AND NOTIFICATION PROCESSES
The fundamental database building blocks of the diagnostic capabilities of the platform consist of the basic agent (CMTS, CM, STB and EMTA) parameters as collected by and stored in the TIMS Net server; the HFC electronics mapping data retrieved from the operator’s CAD/Mapping program and translated into a system block diagram by the Diagnostic Engine (DE); and the ongoing performance data collected from the monitored agents via polling performed by the Agent Engines (AEs).

The key to precision location and analysis of system failures is the ability of the DE to use information generated from the polling performed by the AE to link a specific agent performance result with a specific transmission component via reference to the HFC network block diagram. If the operator does not have a record of the plant components stored in a CAD/Mapping database, the system data is input manually using the HFC device registration graphic user interface of the DE.

This block diagram is described as follows:

HFCMASTER (HFC master database) = HFCEQMP (equipment details) + HFCDVRG (equipment/device registration details) + HFCPTRG (HFC port registration details)

When the AE informs the DE of an abnormality in the signal performance registered by an agent, the DE references the HFCMASTER to perform the analytic process that pinpoints the transmission source of the problem. The first amplifier on each distribution path beyond the node is a child of the appropriate node port, and that amplifier’s ports are in turn designated as parents of the succeeding amplifier ports in the distribution chain. In this way each amplifier is assigned either to another amplifier’s port or to the appropriate port of the nearest node in the DE’s reference diagram.
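A hypothetical, much-simplified rendering of this parent/child registration is shown below; the field names are illustrative only and do not reflect the platform's actual HFCEQMP/HFCDVRG/HFCPTRG schema.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class HfcPort:
        port_id: str
        device_type: str                     # "node", "bridger", "line extender", ...
        parent_port: Optional[str] = None    # None marks a node port at the head of a cascade

    def register_cascade(master, node_port, amp_ports):
        """The first amplifier's port becomes a child of the node port; each succeeding
        amplifier's port takes the previous amplifier's port as its parent, which is the
        chain the DE walks when it localizes a fault."""
        master[node_port.port_id] = node_port
        previous = node_port
        for port in amp_ports:
            port.parent_port = previous.port_id
            master[port.port_id] = port
            previous = port
        return master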

The AE polling leading to the diagnostic process is instigated when the TIMS Net finds some variation in the data it retrieves in a given polling cycle as compared to its existing records. On receipt of this information, the AE performs polling on the CMs and other cable system agents so as to gain specific information on the nature of the anomaly. This ensures that any anomaly caused simply by an agent being taken offline is not registered as an error in the system.

The AE polls each agent, compares its performance with previously stored data and delivers a report of its findings to the DE. The DE then checks the SNR values of the agents on each transmission line port in the cascade and, by process of elimination, determines where the problem lies.

The OSTM employs a sophisticated alert system to convey information to the technical and customer support teams. The alert system not only describes the service degradation details; it advises operations support as to what technical expertise is needed to quickly troubleshoot and resolve the problem. As soon as they receive the alert, technicians log on to the system and check the alarm list for details. They can immediately determine whether the problem relates to a specific modem or other agent outage or extends over a wider area. The OSTM uses a lightweight trouble ticketing system for reporting. At the same time, the OSTM reporting system can be integrated with any industry-standard trouble ticketing system to work within the established reporting environment of the operator’s system.

The report on the outage location and nature of the problem is also generated to the customer support center, giving the center advance warning of what the volume of trouble calls might be. In the case of a customer-generated trouble call, if the OSTM has been integrated with the SMS/Billing system, the OSTM can identify the affected agent in its database and trigger the polling process to immediately gain insight into the cause and extent of the problem. This information is then fed back to customer service as well as to technical support to ensure the customer is informed and that remedial action is implemented.

INGRESS MANAGEMENT
Ingress Management (IM) is an optional component of the platform that provides an extremely accurate and robust means by which operators can pinpoint and proactively address problems in the upstream transmission path. The IM system, illustrated in Figure 2 below, combines polling of upstream ports on the CMTS with the ability to control attenuation bridger gate switches (ATT BGSs) in system amplifiers. The system automatically finds sources of ingress noise in the cable system by switching the ATT BGSs and measuring SNR on the upstream ports of agents that are associated with increased RF signals from specific amplifiers.

Figure 2: The Open STM IM Architecture

The diagnostic process begins on the upstream path from the main node BGS and works downstream across all branches, monitoring the SNR of each CMTS upstream port. Where noise increases along with the boosted signal, the source of that noise can be identified through an analytic process that references the same system block diagram used elsewhere in the platform. The per-node parallel processing, employing the IM polling process as discussed above, provides a quick diagnostic read on performance across all electronics served by that node.

The IM stores and graphs the SNR values as they are periodically obtained from the CMTS ports. This provides a reference against which IM can check to determine whether there’s been an occurrence of ingress over a certain period of time that exceeds the limit set by the operator. If the ingress values are higher than the defined values, the IM displays an alarm. The IM ties in with the other elements of the OSTM, resulting in the same level of efficiency in reporting information to technical and customer support centers.
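The threshold check described in this paragraph might look roughly like the following sketch. The sample interval, history length, SNR limit and persistence count are all illustrative assumptions, as is the class itself; they are not values documented for the platform.

    from collections import deque

    class IngressAlarm:
        def __init__(self, snr_limit_db=25.0, max_bad_samples=6):
            self.snr_limit_db = snr_limit_db          # operator-defined limit (illustrative)
            self.max_bad_samples = max_bad_samples    # persistence required before alarming
            self.history = deque(maxlen=288)          # e.g. one day of 5-minute samples

        def add_sample(self, snr_db):
            """Record one periodic upstream SNR reading for a CMTS port and return
            True when ingress has persisted beyond the allowed duration."""
            self.history.append(snr_db)
            recent = list(self.history)[-self.max_bad_samples:]
            # Alarm only if every recent sample breaches the limit, i.e. the
            # ingress has persisted rather than being a momentary dip.
            return (len(recent) == self.max_bad_samples
                    and all(s < self.snr_limit_db for s in recent))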

CONCLUSION
A new approach to status monitoring has been developed that allows operators to overcome the limitations of traditional status monitoring systems. These systems, by relying on generation of out-of-band pilot signals and sequential, periodic polling of amplifiers, set-tops, cable modems and other electronics in the cable system to identify potential problems, do not support the timely generation of precise fault information.

The system overcomes these limitations by relying on the data collection and reporting capabilities that are intrinsic to the DOCSIS platform. Its patented analytic tools provide the means by which this information can be exploited to maximum effect in pinpointing and diagnosing the causes of problems in the transmission components of the network.

Operators will find not only that operations expenses are significantly reduced by virtue of being able to quickly identify and resolve problems, but also that the ability to shorten outage times and to keep customers abreast of what is happening during the repair process will greatly enhance customer satisfaction. Moreover, by using tools that can identify network locations where noise is beginning to exceed acceptable levels before the disruptions become apparent to end users, cable operators will take an important step toward the proactive maintenance practices essential to sustaining the levels of customer experience required for success in today’s competitive multi-services marketplace.
