A customer complaint is a poor substitute for sound network practices. With competition and the deployment of new services, reliability of the network has become critical.
Information from a network monitoring system is playing a key role in improving construction practices, installation procedures, plant maintenance procedures and customer service. All of these elements together have become part of a larger plan to continue to improve network reliability.
The goal of an advanced network monitoring strategy is to create a virtual presence, where the operator can view the status anywhere in the plant at any time. In addition to the traditional status monitoring role of identifying faults in active distribution devices, a virtual presence implies capturing fault information from the entire hybrid fiber/coaxial (HFC) domain, evaluating quality of signal and plant performance, correlating all this data, evaluating it, and disseminating information to the appropriate functions. And all of this is completed in real time. Virtual presence is paying off with increased network reliability, availability and customer service.
The following steps illustrate the process of building an advanced network monitoring system toward the goal of creating a virtual presence.
Five steps to implementing network monitoring
1. Status monitoring.
Gathering data on the status and health of each individual network element, such as power supplies, fiber nodes, amplifiers and headend equipment, provides immediate notification of an equipment failure. Without status monitoring, the nature and location of equipment failures must be inferred from customer complaints as the primary source of data. Technicians must then spend further time correlating those complaints and isolating the fault, which keeps them in constant fire-fighting mode.
In addition, status monitoring can often provide warning of impending service interruptions before subscriber service is interrupted. For example, when loss of commercial power causes a power supply to switch to battery standby, an alarm is issued and a generator can be dispatched to the site before backup battery power is lost.
Another common example: when an optical node loses optical power on the primary fiber and switches to the redundant fiber path, an alarm is generated and appropriate repairs can be made before subscriber service is affected.
Another benefit of status monitoring is the ability to verify redundant systems. For example, a system operator can purposely place power supplies into standby mode to verify backup battery voltage, and faulty batteries can be replaced before an emergency occurs. Redundant fiber systems can be checked in the same manner.
2. Performance monitoring.
Performance monitoring allows operators to continually monitor and verify the integrity of the services at various points throughout the network. A spectrum analyzer at the headend automatically monitors level and distortion parameters on all channels before the signal leaves the headend. Strand-mount spectrum analyzers at FCC proof points and strand-mount signal level meters throughout the network automatically and continuously monitor distortions and levels throughout the plant.
Performance alarms generated at the headend or at various points in the field allow the operator to pinpoint the source of the signal degradation and take steps to solve the problem — many times before subscribers notice distortions and degradations in picture quality.
Return path monitoring can offer the operator nearly instantaneous data on return path ingress and can immediately narrow the location of the ingress source.
3. Implementing an HFC domain manager.
The HFC domain manager is the key value-added element between the physical plant and the technicians and service personnel responsible for the maintenance of the system. The domain manager performs the following basic functions:
• Configuration: Through the domain manager, an operator can view and configure each network element. The operator can select parameters to monitor and configure alarm limits for each parameter. For instance, a strand-mount spectrum analyzer can be configured to take video and aural carrier level, carrier-to-noise (C/N), and hum measurements on all video channels. For each measurement, the operator can set minor and major alarm limits, so if at any point in the network C/N dips below a certain value, an alarm will be generated.
During routine maintenance, system operators may want to disable certain alarms. Conversely, when a problem is suspected on a specific video channel, operators may want to monitor that channel more closely with non-interfering composite second order (CSO), cross modulation and depth of modulation measurements.
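The minor and major alarm limits described above can be sketched in a few lines of Python. This is only an illustration: the names `LowerLimits`, `classify` and `check_cn`, and the limit values, are assumptions made for the example and are not taken from any particular domain manager.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class LowerLimits:
    """Alarm limits for a parameter that degrades as it falls, such as C/N."""
    minor: float  # readings below this raise a minor alarm
    major: float  # readings below this raise a major alarm


def classify(value: float, limits: LowerLimits) -> Optional[str]:
    """Return 'major', 'minor', or None for a single reading."""
    if value < limits.major:
        return "major"
    if value < limits.minor:
        return "minor"
    return None


def check_cn(channel: int, cn_db: float, limits: LowerLimits,
             disabled: FrozenSet[int] = frozenset()) -> Optional[str]:
    """Classify a C/N reading, skipping channels whose alarms are disabled."""
    if channel in disabled:
        return None
    return classify(cn_db, limits)


# Hypothetical limits in dB; real values depend on the operator's own targets.
CN_LIMITS = LowerLimits(minor=46.0, major=43.0)
```

With these assumed limits, a reading of 44.0 dB on channel 22 would be classified as a minor alarm, and the same channel would report nothing while it is listed in `disabled` during routine maintenance.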
• Data acquisition, filtering and presentation: In a true HFC domain management system, massive amounts of data will be generated, perhaps from multiple element management systems. The domain manager is responsible for filtering the data into usable information and presenting that critical information to the user.
Trend analysis is an important function. Rather than simply reporting the C/N on channel 22 or the string voltage of a three-battery supply once every 30 seconds, the system should archive the data in a file. In this way, the system user can observe long-term trends and take necessary actions. An alarm condition must be reported immediately, but data gathered over time should be managed and presented in the appropriate format.
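One simple way to turn archived readings into a trend is a least-squares slope. The sketch below assumes the archive can be read back as (time, value) pairs; the function name `linear_trend` and the sample readings are hypothetical, and a straight-line fit is only one of many possible trend measures.

```python
from typing import Sequence, Tuple


def linear_trend(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (time, value) samples, in value units per time unit."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


# Hypothetical archived string-voltage readings: (day number, volts).
readings = [(0, 40.8), (1, 40.7), (2, 40.6), (3, 40.5)]
slope = linear_trend(readings)  # roughly -0.1 volts per day for this data
```

A steadily negative slope on a value that should be stable is the kind of long-term pattern that a single 30-second reading would never reveal.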
In many scenarios, a single point of failure creates large numbers of sympathetic alarms. This is referred to as the "barking dog" scenario: Late one night, a dog is awakened by a prowler and begins barking. In a rapid chain reaction, all the dogs in the neighborhood wake up and start barking. When the police arrive, it is impossible to tell which dog is barking at the actual prowler and which dogs are barking just because the other dogs are barking.
Faced with a similar alarm storm, a domain manager has the ability to filter the sympathetic alarms and report the most likely cause.
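A minimal sketch of sympathetic-alarm filtering follows. It assumes the plant can be described as a simple tree in which each element has at most one upstream parent, and it uses one plain heuristic: an alarmed element is reported only if nothing upstream of it is also in alarm. The function `root_alarms` and the element names are hypothetical, and a real domain manager would likely weigh more evidence than topology alone.

```python
from typing import Dict, List, Optional, Set


def root_alarms(parent: Dict[str, Optional[str]], alarmed: Set[str]) -> List[str]:
    """Return alarmed elements that have no alarmed element upstream of them."""
    roots = []
    for element in sorted(alarmed):
        ancestor = parent.get(element)
        # Walk upstream until an alarmed ancestor is found or the tree ends.
        while ancestor is not None and ancestor not in alarmed:
            ancestor = parent.get(ancestor)
        if ancestor is None:
            roots.append(element)
    return roots


# Hypothetical cascade: node-1 feeds amp-1 and amp-4; amp-1 feeds amp-2, and so on.
PARENT = {
    "node-1": None,
    "amp-1": "node-1",
    "amp-2": "amp-1",
    "amp-3": "amp-2",
    "amp-4": "node-1",
}

storm = {"amp-1", "amp-2", "amp-3"}
likely_sources = root_alarms(PARENT, storm)  # ['amp-1']
```

Here the three barking dogs reduce to one: only amp-1 is reported, because amp-2 and amp-3 both sit downstream of an element that is already in alarm.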
• Cross correlation and root cause analysis: Far more complex than simple alarm filtering is the cross correlation of service problems back to symptoms in the physical plant.
Examples of these service problems might be a high rate of dropped data packets for cable modems, an increasing rate of dropped or interrupted telephone calls, or a high rate of communications alarms in the status monitoring system. These subtle problems often tie up the senior engineers and technicians for days, and even months, until the root cause is discovered and run to ground. Every engineer has a favorite "you-won't-believe-this-one" story.
An expert system operates by taking senior engineers' knowledge about the plant, past problems in the plant, troubleshooting procedures, etc. and creating rules for the system to follow when certain symptoms are encountered.
With a rules-based expert system in place, alarms can be generated when a certain combination of symptoms appears, long before an actual network failure or service interruption. When alarms are reported, they can include a short list of most probable causes and suggested maintenance procedures.
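The idea of a rule that fires on a combination of symptoms can be sketched as follows. The `Rule` structure, the `matching_rules` function, the symptom names and the single example rule are all assumptions made for illustration; real rules would come from the operator's own senior engineers and plant history.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    symptoms: FrozenSet[str]          # all must be present for the rule to fire
    probable_causes: Tuple[str, ...]  # short list reported with the alarm
    suggested_procedure: str


def matching_rules(rules: Iterable[Rule], observed: FrozenSet[str]) -> List[Rule]:
    """Return every rule whose full symptom set appears in the observations."""
    return [rule for rule in rules if rule.symptoms <= observed]


# Hypothetical rule, for illustration only.
RULES = [
    Rule(
        name="return-path-ingress",
        symptoms=frozenset({"modem_packet_loss_high", "status_comm_alarms_high"}),
        probable_causes=("ingress on the return path",),
        suggested_procedure="inspect the return spectrum at the affected node",
    ),
]

observed = frozenset({"modem_packet_loss_high", "status_comm_alarms_high"})
fired = matching_rules(RULES, observed)
```

Each rule that fires carries its own list of probable causes and a suggested procedure, so the alarm that reaches the technician already contains a starting point for troubleshooting.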
• Distribution of information: The final responsibility of a domain manager is the distribution of information.
This can include the geographic distribution of information — for example, getting an alarm from an unmanned hub back to the headend or network operations center. A beeping computer in an unmanned hub is of very little value.
Distribution of information also refers to distributing data to other operational support systems (OSS). Common open standards such as SNMP and CMIP are required.
4. Adapting business practices.
Network surveillance becomes most valuable when viewed as an integral part of building a more reliable network for the addition of new services, not when viewed as a necessary evil or as simply a way to gather data.
One of the unexpected results when a network management system is deployed for the first time is the new availability of data. Cable operators are rethinking their business and training practices to fully leverage this new information source. This new information is driving changes in the way networks are run.
To fully leverage the new information:
A) Rely on the information coming from the new system to direct field engineering resources. Have personnel with adequate technical background analyzing incoming alerts to determine the next steps. Develop procedures for prioritizing incoming alerts and distributing alert and alarm information. Decide the criteria necessary to open a trouble ticket. Determine who can close a trouble ticket. And have a contingency plan for problems that are not resolved within a specified period of time.
B) Use the data to measure the reliability of the network. Analyze alerts on a weekly and monthly basis, set metrics for improvement and institute new maintenance procedures to increase the reliability of the network. This analysis should also lead to modifying construction practices, installation procedures and routine test and maintenance procedures. Change control procedures may be implemented to ensure commonality throughout the network.
C) Review and improve training procedures. As new business practices and process flows are put in place, gaps in training will become obvious. Engineers, technicians, network systems personnel, customer service personnel — all must be trained to use the system and must be trained in new practices and procedures.
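The weekly analysis suggested in item B can start from something as simple as the sketch below, which counts alerts per calendar week and computes availability for a period. The function names and the definition of availability used here (one minus outage time over total time) are assumptions for the example; each operator will choose its own metrics.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def alerts_per_week(alert_dates: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count alerts by (ISO year, ISO week number)."""
    counts: Counter = Counter()
    for d in alert_dates:
        iso = d.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


def availability(outage_minutes: float, period_minutes: float) -> float:
    """Fraction of the period during which service was available."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    return 1.0 - outage_minutes / period_minutes
```

Comparing these figures week over week gives a concrete basis for the improvement metrics and maintenance changes described above.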
5. Creating a virtual presence.
Status and performance monitoring provide detailed information about the status of the physical plant and the performance of the forward and return spectrums.
Virtual presence implies getting information from remote locations in the plant to the right place in the right format at the right time. By quickly routing critical plant information to the appropriate operations personnel to implement repairs or to respond to customer service issues, the cable operator has achieved a virtual presence throughout the plant.
A simple example shows the difference between data and the power of information when it is presented correctly:
Power is lost to a large segment of the network, and many subscribers have completely lost service. A red light flashes on a computer screen in the headend — that is data (certainly better data than correlating phone calls from angry customers!).
The dispatch center needs to know the type of power supply, location and nature of the fault. With this information, a technician can be dispatched to fix the problem.
Customer service personnel do not care about the details of the power supply — they need to know which subscribers are affected and the expected time of repair. With this information, the customer service department can be prepared with pre-recorded messages and further information for affected subscribers.
The system manager may not be involved in the day-to-day emergencies and repairs, but he or she can use the fault summary information. With this data, the system manager can begin to make decisions about modifying maintenance procedures to minimize the occurrence of certain faults. The manager can also prioritize budget dollars and focus the purchase of new equipment toward improving overall network reliability.
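The three views in this example can be sketched as three functions over the same fault record. The `PowerFault` fields, the function names and the sample values are hypothetical and exist only to show one event being presented differently to each job function.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PowerFault:
    supply_type: str
    location: str
    nature: str
    affected_subscribers: Tuple[str, ...]  # hypothetical account identifiers
    expected_repair: str


def dispatch_view(fault: PowerFault) -> Dict[str, str]:
    """What a dispatcher needs in order to send a technician."""
    return {
        "supply_type": fault.supply_type,
        "location": fault.location,
        "nature": fault.nature,
    }


def customer_service_view(fault: PowerFault) -> Dict[str, object]:
    """What customer service needs in order to inform subscribers."""
    return {
        "affected_subscribers": fault.affected_subscribers,
        "expected_repair": fault.expected_repair,
    }


def manager_summary(faults: Iterable[PowerFault]) -> Dict[str, int]:
    """Fault counts by nature, for longer-term maintenance decisions."""
    return dict(Counter(f.nature for f in faults))


# Hypothetical fault record.
fault = PowerFault(
    supply_type="standby supply",
    location="pole 14, Elm Street",
    nature="loss of commercial power",
    affected_subscribers=("acct-001", "acct-002"),
    expected_repair="2 hours",
)
```

The dispatcher never sees the subscriber list, and customer service never sees the supply details; each function receives only the fields it can act on.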
Three different job functions, geographically separated, each requiring different types of information — that's an example of a domain management system providing virtual presence.
Summary
As cable providers deploy the infrastructure of status and performance monitoring systems, new and powerful network data is becoming available. The HFC domain manager is playing a crucial role in gathering, filtering, correlating and distributing this data, and cable operators are re-evaluating their business and network maintenance practices to fully leverage the new information. Properly deployed network monitoring systems are creating a virtual presence where the new-found information is immediately routed to the people who need it in the format best suited to their tasks.
With this virtual presence in place, HFC networks are reaching enhanced levels of reliability — just in time to support new and exciting revenue-generating services.