US20020022952A1 - Dynamic modeling of complex networks and prediction of impacts of faults therein - Google Patents
- Publication number
- US20020022952A1 (US 09/048,025)
- Authority
- US
- United States
- Prior art keywords
- model
- information
- software
- component
- components
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/02—Standardisation; Integration
- H04L41/0233—Object-oriented techniques, for representation of network management data, e.g. common object request broker architecture [CORBA]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/04—Network management architectures or arrangements
- H04L41/046—Network management architectures or arrangements comprising network management agents or mobile agents therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0631—Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/12—Discovery or management of network topologies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/22—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks comprising specially adapted graphical user interfaces [GUI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/008—Reliability or availability analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/328—Computer systems status display
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3495—Performance evaluation by tracing or monitoring for systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/86—Event-based monitoring
Definitions
- This invention relates generally to the field of the operation and management of complex systems, including the operation and management of computer networks.
- the present invention is intended to facilitate the management of a large-scale, far-flung computer network, such as the extensive distributed systems that are commonplace nowadays in large organizations.
- the person or team responsible for this job is typically in charge of everything from the organization's power supplies through its business software applications.
- The organization's business management, naturally, may not wish to concern itself with the technical details, but does demand that when problems occur, they be dealt with according to the seriousness of the effects they have on the normal operations of the business. For example, management will want the greatest attention to be paid to those problems that affect the highest revenue generators among the various parts of the business organization.
- Subcontractors may be running the system, or parts of it, on their own sites, or the business's, or both.
- Engineers include multiple redundancies in the design of the system to minimize outages, but each redundancy adds extra complexity to manage.
- a given underlying condition may affect different users in different ways, or to different degrees—one user may be affected seriously, another critically, another benignly or not at all.
- the manager of a profitable business unit may have invested in redundant circuits, and so experiences no problem.
- the manager of a mid-sized unit has co-invested in redundant circuits with another business unit; their joint load on the single remaining circuit permits continued service, but performance deteriorates.
- a capacity planner needs to know the frequency with which router cards fail, if one brand suffers more failures than another, or if it is necessary to invest in redundant circuits for a group of users whose work is time-sensitive. She does not need to know this instant that some specific router had a bad card.
- Another object of the invention is to provide the ability to model, not only the significant hardware and software resources of the system being administered, but also the service relationships connecting those resources, in a flexible, dynamic manner, so that changes to the construction or make-up of the system being managed can be reflected promptly in the model without the need to restart the model or otherwise to interrupt running the model.
- Another object of the invention is to provide a method and system that can associate related events that are of interest to the operators and users of the administered system, and present the results quickly and in a way that makes the information easy to use.
- Another object of the invention is to provide a method by which one can flexibly model a system, and in which one can represent, not only the hardware and software resources of the system being modeled, but also arbitrarily-defined groups of those resources.
- Still another object of the invention is to provide a method and system in which the operators or a user can define, as needed, a set of data to be obtained relating to the performance of the modeled system, and to provide a particularly convenient way to organize control data to fulfill those requests using agents to obtain the required information.
- the preferred embodiment provides a software model of the managed network, and includes a flexible infrastructure for the purpose of obtaining information from the managed network and reporting it as appropriate.
- the data-gathering infrastructure is used to obtain information about what components are present in the network, and about what services each is providing to which other component(s). This information is used to construct the model.
- the data-gathering infrastructure obtains from the managed resources information relating to any malfunction or performance degradation, and reports this information to the model, which modifies its state accordingly.
- the structure of the model itself is used to predict the likely impacts of the reported occurrence, and the occurrence and its predicted impacts are displayed. As all this happens, the data-gathering infrastructure also obtains information concerning the addition of new components to the managed network, the deletion of others, etc., allowing the model to update itself during runtime.
- system administrators can define elements in the model to represent arbitrary groupings of components, such as business units.
- the model predicts impacts not only on individual hardware and software components but also on larger entities that are of significance to the organization using the invention and the managed network.
- the data-gathering infrastructure is conceptually distinct from and independent of the model.
- this infrastructure has a number of significant features, including a hierarchical structure that results in the ability to provide the model with as large a stream of data as may be necessary, while limiting the number of interrupts per unit time that the model must tolerate.
- this infrastructure preferably has the ability to be given new sets of working instructions during runtime, so that new types of information can be acquired, without the need for restarting the running of the program. Customized inquiries can also be provided in this way.
- the data-gathering infrastructure uses software agents having a structure that makes possible a high degree of reusability, in the form of reusable modules that can be kept in a repository for that purpose.
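- The idea of a hierarchy that passes along a large data stream while limiting how often the model is interrupted can be illustrated with a minimal sketch. It assumes that an intermediate agent manager buffers reports from its agents and forwards them in batches; the class name `AgentManager`, the `batch_size` parameter and the `deliver` callback are hypothetical and are not taken from the patent.

```python
class AgentManager:
    """Hypothetical intermediate layer: buffers agent reports and forwards them in batches."""

    def __init__(self, deliver, batch_size=3):
        self.deliver = deliver        # callable that represents one interrupt of the model
        self.batch_size = batch_size  # assumed tuning knob, not from the source
        self.buffer = []

    def report(self, item):
        # Called by an agent for every piece of gathered data.
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Deliver everything buffered so far as a single interrupt.
        if self.buffer:
            self.deliver(list(self.buffer))
            self.buffer.clear()


interrupts = []  # each entry stands for one interrupt received by the model
manager = AgentManager(interrupts.append, batch_size=3)
for i in range(7):
    manager.report(f"sample-{i}")
manager.flush()  # deliver the final partial batch
```

- In this sketch seven reports reach the model in three deliveries, so no data is dropped while the number of interrupts is reduced.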
- FIG. 1 is a schematic illustration of a simplified example of a distributed computing ensemble to the management of which the present invention is applicable.
- FIG. 2 illustrates one example of a node, and shows the relationship between some basic logical components and the physical components.
- FIG. 3 illustrates a node that is a computer.
- FIG. 4 provides an illustration of the overall flow of information in a model constructed according to the preferred embodiment of the present invention.
- FIG. 5 shows a detail of the system fragment illustrated in FIG. 4.
- FIG. 6 provides a high-level perspective of how events can take different paths from the agent manager to further processing.
- FIG. 7 illustrates how events that report the discovery of some new managed resource are directed to the factory process.
- FIG. 8 illustrates how events representing reported and detected faults go to the Dispatcher component of the model server, which then directs the event to the corresponding managed object.
- FIG. 9 illustrates the lifecycle of an event from the model server Dispatcher process to alarm formation.
- FIG. 10 illustrates conversations among various functional components of the preferred embodiment.
- FIG. 11 illustrates classification of the functional components of an agent manager.
- FIG. 12 is a schematic illustration of making an agent manager and dynamic agents data driven.
- FIG. 13 shows the flow of control in an agent manager.
- FIG. 14 shows three phases of the model architecture.
- FIGS. 15, 16, 17, 18 and 18A illustrate the discovery process during runtime.
- FIG. 19 illustrates some of the computational services that the discovered managed objects provide and consume.
- FIG. 20 illustrates some higher-level services that discovered managed objects in FIG. 19 provide in response to the organizational use of the computational systems.
- FIG. 21 represents the world of the model in terms of a network that has emerged from realized computational services or paths.
- FIG. 22 illustrates the model including, in addition to paths, sessions.
- FIG. 23 represents groupings of needed resources as containers whose elements match users' basic level of categorization of resources.
- FIG. 24 depicts an alternative way to view these relationships.
- FIG. 25 illustrates the state of a portion of the model after dynamic agents have collected reported information from the resources.
- FIG. 26 illustrates the effects of the model's rootward graph traversal.
- FIG. 27 illustrates the leafward spread of impacts through this portion of the model.
- FIGS. 28, 29, 30, 31 and 32 schematically illustrate the interaction phase of runtime.
- FIG. 1 is a schematic illustration of a simplified example of a distributed computing ensemble (actually, a combination of interconnected networks) to the management of which the present invention is applicable.
- In FIG. 1 are shown several subnetworks, each including a number of workstations connected together in various ways. Many or all of the workstations are connected to an intranet or to the Internet, or both.
- the system shown includes workstations in several offices, located in different buildings in a number of countries and continents.
- the workstations each run on an operating system (“OS”), but not necessarily on the same one.
- each workstation may at a given time be running one or more application programs, any of which may require the accessing of various databases or files located at various places in the system.
- the workstations are used for the purposes of an enterprise, which has organized its business functions into a variety of subdivisions, departments, etc. A number of these subdivisions are indicated in FIG. 1, including a financial research division and an accounting department. Because of the potential for malfunctions to deprive workstations of the access they need to other parts of the system (access to a particular database, for example), the enterprise has constructed redundant communication paths between certain of its departments and critical resources, although such redundancies are not shown in FIG. 1.
- the preferred embodiment of the present invention can most easily be thought of as comprising two major parts: a model of the external system, and a data-gathering infrastructure that obtains data needed by the model.
- a model server and an operational database that support the model.
- the model disclosed herein could, within the scope of the invention, be used with another arrangement for gathering the required information from the external system and delivering it, and conversely, the data-gathering infrastructure can be used in many applications other than providing information to a model of a complex computer network that is being managed.
- the model simulates the evolution of faults and performance degradations through the external system.
- the model enables those who use the services of the various parts of the external system to specify the nature of their reliance thereon, in terms that will set clearly the expectations held of the operations personnel for that system. Those personnel, in turn, are enabled by the model to do what is necessary to fulfill those expectations, by showing them quickly what has gone wrong in the external system.
- the model also enables the operations personnel to know which part of the user community has suffered the brunt of any fault or performance degradation.
- The preferred embodiment of the present invention, upon installation, begins by initiating a discovery process, in which it explores the external system it is to be used in managing.
- the discovery process locates each hardware and software component of the external system, and identifies what type of component each is (e.g., hub, router, computer, operating system, application, database, etc.).
- the invention constructs a model of the external system.
- This model represents the various components, relevant subcomponents, and their service relationships to each other. It should be noted that, in the preferred embodiment, the model itself does not contain all available information about the nature of each component—that is, although that information is used in the process of constructing the model, only a subset of the information acquired during the discovery process need end up in the model. This greatly simplifies the model itself, reducing the computing resources required.
- the system administrator can manually input the information needed to include a representation of that component and of its relationships with other components, in the model.
- the represented components and subcomponents are modeled as software objects, utilizing the well-known techniques of object-oriented programming.
- the objects corresponding to components and sub-components are termed “managed objects” herein; again, a more precise statement of what that term means in relation to the present invention is given below.
- the model constitutes an example of the type of mathematical entity known as a directed graph, in which the managed objects are the nodes of the graph, and the relationships are the edges.
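- A minimal sketch of such a directed graph follows. It assumes that each edge runs from the provider of a service to its consumer and carries the service name as a label; the class `ModelGraph` and the object names are hypothetical illustrations, not structures defined in the patent.

```python
from collections import defaultdict


class ModelGraph:
    """Directed graph: nodes are managed objects, edges are labeled service relationships."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(list)  # provider -> [(service label, consumer), ...]

    def add_service(self, provider, label, consumer):
        # An edge points from the object providing the service to the object consuming it.
        self.nodes.update((provider, consumer))
        self.edges[provider].append((label, consumer))

    def consumers_of(self, provider):
        return [consumer for _, consumer in self.edges.get(provider, [])]


graph = ModelGraph()
graph.add_service("os", "runs", "db_server")
graph.add_service("os", "fileSystem", "db_server")  # two services between the same pair
graph.add_service("db_server", "database", "accounting_dept")
```

- Storing edges as labeled pairs allows more than one service relationship between the same two managed objects.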
- the system administrator has the ability to define additional managed objects in the model, which additional objects correspond to arbitrary groupings of simpler objects in the model.
- the administrator can define as such an object any particular user group that has significance to the operation of the business enterprise or the like to which the external system belongs. (The administrator can also make such other changes to the model as appear suitable, including the deletion of existing objects, the ascription of particular characteristics to existing objects, etc.)
- the occurrence of the change is provided with a tag that marks it as a root-cause event.
- any managed objects whose performance may possibly be affected by the occurrence of the reported change (if that change is a root-cause event) are identified, based entirely on their location in the model (graph) relative to the managed object whose state has changed, and on their service relationships (direct or indirect) with the latter managed object.
- The alarm includes the reported events, their root cause (if identified), and the likely consequences.
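- The identification of possibly affected managed objects from graph position alone can be sketched as a traversal over service edges. This is an illustration under stated assumptions: the `SERVICES` table, the `Alarm` dataclass and the breadth-first strategy are hypothetical choices, and the source does not specify the traversal order.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical service edges: provider -> direct consumers.
SERVICES = {
    "router": ["ip_subnet"],
    "ip_subnet": ["db_server", "mail_server"],
    "db_server": ["accounting_dept"],
    "mail_server": [],
    "accounting_dept": [],
}


@dataclass
class Alarm:
    root_cause: str
    events: list
    impacts: list = field(default_factory=list)


def predict_impacts(services, failed):
    """Collect every object that directly or indirectly consumes a service of `failed`."""
    seen, pending, impacted = {failed}, deque([failed]), []
    while pending:
        current = pending.popleft()
        for consumer in services.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                impacted.append(consumer)
                pending.append(consumer)
    return impacted


alarm = Alarm(root_cause="router", events=["router: card failure"])
alarm.impacts = predict_impacts(SERVICES, "router")
```

- Here the alarm groups the reported event, its root cause and the predicted impacts, including the user group `accounting_dept`, which is itself a managed object in the sketch.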
- a display is provided that shows which managed objects have had reported events that are part of an alarm.
- a visual indication of any root-cause event is also provided; of course, several alarms may occur during the same period of time and are all displayed.
- the display provides the administrator with a quick way to see what events are being reported, which ones are likely to be related to each other, which (if any) of them is of a type likely to be a root cause, and which portions of the external system (including user groups) are likely to be affected, either immediately or soon, by the reported events.
- the preferred embodiment includes a model server that maintains the model.
- An operational data store stores any information useful to the functioning of the model and not found within the model itself.
- a Dispatcher component controls the processing of events by the model, and a Factory component generates new managed objects, and deletes existing ones, as needed.
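- The division of labor between the two components can be sketched as follows. The sketch assumes that discovery and removal events go to the Factory while fault events go to the Dispatcher, which hands each one to the matching managed object; the event dictionary layout and the `route` helper are hypothetical.

```python
class ManagedObject:
    def __init__(self, name):
        self.name = name
        self.events = []

    def handle(self, event):
        self.events.append(event)


class Factory:
    """Creates and deletes managed objects as the external system changes."""

    def __init__(self):
        self.objects = {}

    def create(self, name):
        self.objects.setdefault(name, ManagedObject(name))

    def delete(self, name):
        self.objects.pop(name, None)


class Dispatcher:
    """Directs each fault event to the managed object it concerns."""

    def __init__(self, factory):
        self.factory = factory

    def dispatch(self, event):
        target = self.factory.objects.get(event["target"])
        if target is not None:
            target.handle(event)
        return target is not None


def route(event, factory, dispatcher):
    # Hypothetical routing step: discovery/removal to the Factory, the rest to the Dispatcher.
    if event["kind"] == "discovered":
        factory.create(event["target"])
    elif event["kind"] == "removed":
        factory.delete(event["target"])
    else:
        dispatcher.dispatch(event)


factory = Factory()
dispatcher = Dispatcher(factory)
for event in [
    {"kind": "discovered", "target": "router_1"},
    {"kind": "fault", "target": "router_1"},
    {"kind": "fault", "target": "unknown_hub"},  # no such object: silently ignored here
]:
    route(event, factory, dispatcher)
```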
- a Control Data Repository (normally termed simply the “Control Repository” hereinafter) stores necessary control data of various types. Filters are provided to determine what information in the model is to be made available to what users of the system, and a display is used to make that information available in an easy-to-use form.
- a report engine is also provided to generate reports as requested by the system administrator.
- Important aspects of the data-gathering infrastructure are the presence of one or more agent managers, which create, run and terminate individual software agents as necessary to obtain the required information regarding the make-up and organization of the external system (to create the model and keep it up-to-date) and regarding that system's activity (to monitor operation and provide the users with the required information concerning faults and performance degradations). Also, the structure of the dynamic software agents themselves facilitates their rapid creation whenever needed to gather any desired information, and makes it possible to design agents that will perform new tasks, with a minimum amount of trouble for the administrator.
- the model utilized in the preferred embodiment of the invention contains a number of basic or primitive types of elements: Nodes, Managed Objects (hereinafter, “MO's”), Services, Faults, Events, Alarms, and Impacts. A definition of each of these terms as used herein will be found in the following description.
- the model utilized in the preferred embodiment is of interrelated objects that form a network that typically exists in many dimensions (this “network” should not be confused with the physical network or networks of the external system of FIG. 1, and, for clarity, will hereinafter be termed the “model network”).
- This model distinguishes strictly between physical objects and logical objects.
- a physical object is something that it is possible to stick an adhesive label on (for example), or that could be dropped on one's foot; any logical function, in contrast, is considered as a logical object.
- For example, a modem is a physical object, while the function thereof is considered a logical object in this model, a logical object that runs on the physical object.
- FIG. 2 illustrates one example of a node, and shows the relationship between some basic logical components and the physical components.
- Rectangular boxes denote physical objects; ovals, logical objects; and arrows, services, with the direction of the arrow indicating which component is providing the service to which.
- All nodes contain boards, on which sit ports, which enable physical media to transmit information to and from the node.
- the first logical level above the port is the interface, on which a network protocol runs.
- the Internet protocol is one example of a network protocol
- Ethernet and token ring interfaces are examples of interfaces.
- All nodes have operating systems running on them, which OS's provide system services to the interface and network protocol, as well as to programs.
- One or more interfaces provide interface service to a network protocol, one or more of which provide routesVia services to a subnetwork.
- a network is a collection of one or more subnetworks.
- Examples of nodes include a computer, a hub, a printer, and an Internet router.
- a node that is a computer has some additional properties, and offers a more richly varied set of application services than do router or hub nodes (see FIG. 3).
- Examples of such application services are programs such as database servers and databases.
- Computers typically contain locally and remotely attached physical disks, which are organized by means of logical file systems.
- Managed objects are constructs of the model. That is, they are not themselves part of the external system, but exist only in the model.
- MO's are augmented finite state machines that, in some instances, mimic certain interesting behaviors of things in the external system. They are useful artifacts for explaining sets of facts in the external system in a way that is internally coherent to the model.
- a MO need not correspond to something that the external system recognizes as or considers to be an item or unit, but exists solely because it is deemed useful to the model.
- a MO might very well be created to correspond to a seemingly random grouping consisting of a particular printer, a particular database and a particular LAN hub.
- MO's can represent users of the external system, business work groups or other organization entities that use portions of the external system.
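- One way to picture a managed object as an augmented finite state machine is sketched below. The state names, the transition table and the reading of "augmented" as "carrying extra attributes and group members beside the state" are assumptions made for illustration only.

```python
class ManagedObject:
    # Hypothetical transition table: (current state, event kind) -> next state.
    TRANSITIONS = {
        ("up", "fault"): "down",
        ("up", "degraded"): "degraded",
        ("degraded", "fault"): "down",
        ("degraded", "recovery"): "up",
        ("down", "recovery"): "up",
    }

    def __init__(self, name, members=()):
        self.name = name
        self.state = "up"
        self.attributes = {}          # extra data carried beside the state
        self.members = list(members)  # lets an MO stand for an arbitrary grouping

    def on_event(self, kind, **attributes):
        self.attributes.update(attributes)
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = self.TRANSITIONS.get((self.state, kind), self.state)
        return self.state


# An MO for a seemingly arbitrary grouping: a printer, a database and a LAN hub.
group = ManagedObject("third_floor_billing", members=["printer_7", "billing_db", "lan_hub_3"])
group.on_event("degraded", reason="slow database responses")
```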
- In CIM usage, MO refers to the “actual item in the system environment that is accessed”. That is, a MO in the CIM sense is directly referential (refers directly to something in the external system that existed before being incorporated into that system), and is a proxy for that entity. A user or application interacts with the object in place of interacting with the underlying entity itself. In the present invention, on the other hand, a MO is only indirectly (if at all) referential.
- the model itself represents the interrelated system of devices and applications that make up the external system; there is not necessarily any exact analogy in element-to-element relationships, however, because not all elements recognized by and making up part of the external system need to be in the model, nor are all elements in the model things recognized as entities by the external system.
- MO's in the present invention are constructs of the model—that is, are defined in terms of their function within the model—rather than constructs of the external system that are represented in the application. In typical CIM modeling, if the state of an existing MO changes, the model knows of that change directly.
- the model learns of that change through an agent (discussed below) which has captured or deduced the change, packaged it into a message, and sent the message to the model, and thus allowed the model to respond to the message.
- the model can contain representations of resources that do not have the proper instrumentation to interact directly with a proxy in the model but information concerning the performance of which can be derived through inference (e.g., a passive hub);
- the representations in the model can be changed independently of the things they represent (e.g., the types of information present in each MO, or in some MO's, can be redefined during runtime, thereby creating a new model in place of the old one, without having to interrupt the running of the model program, or having to restart or reinstall the software; this flexibility is an important feature of the preferred embodiment); and
- the model can accept systemic events—for example, the addition of a new MO, the loss of an existing one, or an equipment upgrade in an existing component of the external system—that cannot, by definition, have a MO as a proxy.
- This approach thus provides the preferred embodiment with the ability to represent all objects and relations that could be represented using the CIM approach, and a large number of others that are not and cannot be accommodated in CIM.
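- The message-based coupling between agents and the model can be sketched with a simple queue. This is a hedged illustration: the `Model` class, the message fields and the `inferred` flag (used here for a passive hub whose state is deduced rather than reported) are hypothetical names, not part of the disclosure.

```python
import queue


class Model:
    """Hypothetical model that learns of state changes only through messages."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.states = {}

    def process_pending(self):
        # Drain the inbox and apply each reported change to the model's own state.
        handled = 0
        while True:
            try:
                message = self.inbox.get_nowait()
            except queue.Empty:
                return handled
            self.states[message["mo"]] = (message["state"], message["inferred"])
            handled += 1


def agent_report(model, mo, state, inferred=False):
    # An agent packages a captured or deduced change into a message for the model.
    model.inbox.put({"mo": mo, "state": state, "inferred": inferred})


model = Model()
agent_report(model, "router_1", "down")
agent_report(model, "passive_hub_2", "down", inferred=True)  # deduced, not reported
processed = model.process_pending()
```

- Because the model reacts only to messages, a resource without instrumentation can still be represented as long as some agent can deduce its condition.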
- a service is a labeled, directed relationship between specified MO's.
- Services may be either computational or functional.
- Computational services are those which exist to create the fabric of the computing ensemble (the external system), for example, services performed by the OS in a computer node, opening and closing files, controlling a display device or a printer, etc. (In other words, computational services are those that serve to construct an effective distributed computing ensemble regardless of any use that the owner of the system might put it to.)
- Functional services are those which exist to satisfy the needs of higher-level objectives (not necessarily computational ones)—for example, getting data from a database to learn the number of tomatoes consumed per capita in Paraguay over the last five years.
- Functional services correspond approximately to a naive notion of “session” (that is, involve a coordinated exchange of information between nodes using conversational techniques).
- There may not necessarily be only a single relation between two MO's.
- a single pair of MO's may have multiple service relations between them. They may each provide one or more services to and consume one or more services from the other.
- an operating system could provide both file system service and runs service to an application.
- a given service need not always behave in the same manner.
- a router delivers data that is destined for video application V as well as for financial market data information application M.
- V can function perfectly adequately with a loss of 5-10% of packets; M cannot.
- the preferred embodiment of the invention distinguishes these two tolerances in terms of service expectations.
- the service relation between the router MO and the MO corresponding to application V is not the same as that between the MO's corresponding to the router and application M.
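- The router example can be expressed as two service relations that differ only in their expectations. The sketch below assumes a single packet-loss threshold per relation; the `Service` dataclass, the 10% figure chosen for V and the zero tolerance chosen for M are illustrative assumptions consistent with, but not stated by, the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    provider: str
    consumer: str
    label: str
    max_packet_loss: float  # tolerated loss fraction (hypothetical expectation)

    def is_met(self, observed_loss):
        return observed_loss <= self.max_packet_loss


# Same provider and same kind of service, but different expectations.
to_video = Service("router", "app_V", "delivers", max_packet_loss=0.10)
to_market_data = Service("router", "app_M", "delivers", max_packet_loss=0.0)

observed_loss = 0.07
video_ok = to_video.is_met(observed_loss)
market_data_ok = to_market_data.is_met(observed_loss)
```

- With 7% observed loss the relation to V is still satisfied while the relation to M is not, so only M would be treated as impacted.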
- a path is the set of MO's that offer the necessary intermediary computational services to realize a functional service.
- a functional service is decomposed into lower-level computational services; computational services, on the other hand, implement functional services.
- a distributed database server needs to receive requests from its clients and return result sets to them. To receive remote requests, it needs to make use of intermediate network facilities connecting it to its clients.
- the database server MO needs “networkAccess” services from a MO on the same node. If the IP network protocol MO on the database server's node offers networkAccess services, there is a match. The IP MO, in turn, needs interface services from a network card on the same computer.
- If the network card MO on the computer offers interface services, there is a match. And so on all the way down the chain of connections that deliver network services to both database and client, and ultimately allow there to be a session between them.
- the path is the chain of functional connections that allows delivery of services.
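- The matching of needed and offered services described above can be sketched as a simple resolution loop. This is a simplified illustration: the `MOS` table and the function `resolve_path` are hypothetical, each MO is assumed to need at most one service, and matches are restricted to MO's on the same node.

```python
# Hypothetical managed objects with the services they offer and need.
MOS = {
    "db_server": {"node": "host1", "offers": {"database"}, "needs": ["networkAccess"]},
    "ip": {"node": "host1", "offers": {"networkAccess"}, "needs": ["interface"]},
    "nic": {"node": "host1", "offers": {"interface"}, "needs": []},
    "other_nic": {"node": "host2", "offers": {"interface"}, "needs": []},
}


def resolve_path(mos, start):
    """Follow needs to matching offers on the same node, returning the chain of MO's."""
    path = [start]
    current = start
    while mos[current]["needs"]:
        needed = mos[current]["needs"][0]  # simplification: only the first need is followed
        match = next(
            (
                name
                for name, mo in mos.items()
                if name not in path
                and mo["node"] == mos[current]["node"]
                and needed in mo["offers"]
            ),
            None,
        )
        if match is None:
            raise LookupError(f"no MO on {mos[current]['node']} offers {needed}")
        path.append(match)
        current = match
    return path


path = resolve_path(MOS, "db_server")
```

- The network card on `host2` offers the right service but is skipped because it is not on the same node as the IP MO.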
- the model would be a snapshot of the internetworked MO's at that moment, i.e., a map of the paths of the network at that moment.
- the model is an emergent property of MO's and services, and is not itself the main focus of what is being done.
- IPX: Internetwork Packet Exchange protocol.
- Paths are a necessary concept for computing the impact of physical and lower-level logical failures on higher-level services.
- Services categorize the kinds of relationships that make up the system at increasing levels of abstraction. Services, therefore, create a dimension of impact analysis from differing perspectives. For example, suppose it is desired to know how reliable database service is for some given set of disparate users. Categorization by service allows the model to cut simply across different database service providers and to provide an aggregation of both fault and impact data for that service.
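- Aggregation by service category can be sketched as a filter followed by a count. The record layout and the provider names below are made-up sample data used only to show the shape of the computation.

```python
from collections import Counter

# Hypothetical fault and impact records, each tagged with the service it concerns.
records = [
    {"service": "database", "provider": "db_east", "kind": "fault"},
    {"service": "database", "provider": "db_west", "kind": "impact"},
    {"service": "database", "provider": "db_west", "kind": "fault"},
    {"service": "mail", "provider": "mail_1", "kind": "fault"},
]


def summarize(records, service):
    """Count faults and impacts for one service, across all of its providers."""
    return Counter(r["kind"] for r in records if r["service"] == service)


database_summary = summarize(records, "database")
```

- The summary cuts across `db_east` and `db_west`, which is the point of categorizing by service rather than by provider.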
- Any detected change of an individual managed object's state to some undesirable value is a “fault”.
- a MO's state is simply the set of values that its various attributes have.
- An event is a representation of a fault in the model, or of the inverse, a recovery from a fault.
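- The definitions of state, fault and event can be combined in a short sketch. It assumes that a state is a dictionary of attribute values and that "undesirable" values are listed per attribute; the function `classify_change` and the sample attributes are hypothetical.

```python
def classify_change(mo_name, old_state, new_state, undesirable):
    """Return a fault or recovery event for a state change, or None if neither applies."""

    def is_bad(state):
        return any(state.get(attr) in values for attr, values in undesirable.items())

    if is_bad(new_state) and not is_bad(old_state):
        return {"mo": mo_name, "kind": "fault"}
    if is_bad(old_state) and not is_bad(new_state):
        return {"mo": mo_name, "kind": "recovery"}
    return None


# Hypothetical attribute values regarded as undesirable.
UNDESIRABLE = {"link": {"down"}, "disk": {"full"}}

before = {"link": "up", "disk": "ok"}
after = {"link": "down", "disk": "ok"}

fault_event = classify_change("router_1", before, after, UNDESIRABLE)
recovery_event = classify_change("router_1", after, before, UNDESIRABLE)
```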
- Performance metrics offer another source of information. By analyzing such data with a variety of statistical tests, it is possible both to anticipate faults and to interpret skew conditions as faults. Such analytically-inferred or -anticipated faults are termed “anomalies”.
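As an illustration of inferring an anomaly from performance metrics, the sketch below applies a simple z-score test. The text does not say which statistical tests are used, so the choice of test, the function name `is_anomaly`, the threshold of 3.0, and the sample data are all assumptions.

```python
from statistics import mean, stdev


def is_anomaly(history, sample, threshold=3.0):
    """Flag a sample more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return sample != history[0]
    return abs(sample - mean(history)) / sigma > threshold


# Hypothetical CPU-consumption readings, in percent.
cpu_history = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0]
print(is_anomaly(cpu_history, 21.5))
print(is_anomaly(cpu_history, 95.0))
```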
- Faults and anomalies are adequate for component management because they report on the health of each component of the external system taken in isolation (“component” here is to be taken broadly, including either hardware or software, but always refers herein to an element of the external system).
- Service-oriented management also demands knowledge of impacts.
- An impact is the description of a disruption in service for some portion or user A of the external system owing to a correlated disruption in service of some portion B.
- a database suffers sympathetically if a business application cannot reach it owing to router failure. More simply, the router fault has an impact on some database sessions.
- the external system itself is unaware of the impact. Rather, the known relevant information (i.e., after all the other extraneous information is stripped away) is likely to be:
- the router registers a fault through an SNMP trap.
- the application may register a fault in that it cannot receive data—or the end user will telephone either application support or database support to complain.
- Database support needs to know about impacts as well as true database faults (disk full, program crash, etc.). If there is no use of the concept of impact, then when an affected user calls database support, the support person is likely not to know of the problem and will have to begin a second effort researching current conditions. If an operation system determines impacts, on the other hand, then the database administrator will have received information that a malfunctioning router may be hampering some users' database access, and can anticipate users' calls.
- FIG. 4 provides an illustration of the overall flow of information in a model constructed according to the preferred embodiment of the present invention. That Figure includes both a partial view of the external system (the “Network Fragment” in the right-hand portion of the Figure) and a portion of the model.
- managed resources: the elements or components, including both hardware and software, of the external system
- New managed resources join the external system, others leave the system, and others change their configuration relative either to themselves (e.g., an equipment upgrade) or to other resources in the external system. All these systemic occurrences will be termed incidents.
- One main purpose of the invention is to convey information spontaneously about incidents that affect any end user's set of interests.
- a user's interests are defined by a set of managed objects, as well as by the functional perspective the user has on that set (e.g., business user, administrator, or application support).
- the model must reflect each such change to the external system or the resources comprising it and present that information to the relevant user(s).
- Event winnowing: Analyze the information received to determine whether that information indicates
- Event association: Convert the representation of interesting events into the model's own event formalism so that it can act on them.
- Impact/root cause analysis: Determine the systemic, or contextual, significance of condition-changing event(s) (that is, determine the effect on the condition of the overall system that results from the specific event).
- Persistence: Store persistently a completely associated set of events that it has categorized as an alarm.
- Information filter: Organize and categorize information about which MO's have changed state either of their own accord or sympathetically to ambient conditions, filtered according to user interests and entitlements.
- Steps (1) through (3) are performed by the data-gathering infrastructure, while steps (4) through (7) occur in the model and related components, and step (8) is performed by the display system.
- the model monitors the operation of, and faults occurring in, the set of managed resources that constitute the distributed computing ensemble (the external system).
- a fragment thereof, shown at the right in FIG. 4, is shown enlarged in FIG. 5 (Diagram 4).
- This exemplary fragment contains a network N1, to which are connected a router R1, a hub H1 and three computers C1, C2 and C3.
- Database D1 runs on computer C1.
- Applications A1 and A2 run on computers C2 and C3, respectively.
- Router R1 also connects to networks N2, N3 and N4.
- FIG. 6 illustrates a portion of the preferred embodiment of the invention.
- the two subcomponents illustrated here are an agent manager and the model server.
- An agent manager (of which any required number may in principle be provided) contains multiple dynamic agents (“mobile” programs to perform specified tasks), indicated here as d1, d2 and d3.
- An agent manager may exist without any dynamic agents in it, or may contain up to some specified maximum number (in the preferred embodiment, this limit is 64; the invention is of course not limited to this value).
- These dynamic agents use such known mechanisms as file tailing, SNMP polling and SNMP trap receiving to monitor the managed resources in the external system. Monitoring amounts to performing two activities:
- Dynamic agents formulate the results of their analysis in terms of formal message objects of the model (events), and return those events to the model (this process of event return is explained in the section relating to the data-gathering infrastructure, below).
- FIG. 6 provides a high-level perspective of how events can take three different paths from an agent manager to further processing, by either the model server or the Operational Datastore, depending on the event. More detail is given in the following paragraphs.
- FIG. 7 illustrates how events that report the discovery of some new managed resource are directed to the Factory process f1 of model server m1, which then generates a new MO for the model, in this case mo6. That is, if an event enters the system for a managed resource for which there is no corresponding MO, Dispatcher i1 notifies the Factory f1 of the need to create a new MO.
- FIG. 8 illustrates how events representing reported and detected faults go to the Dispatcher component i1 of model server m1, which then directs the event to the corresponding managed object, in this case mo6.
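The routing just described, in which the Dispatcher either delivers an event to an existing MO or first asks the Factory to create one, might look like the sketch below. The class `ModelServer`, its method names, and the returned strings are hypothetical; storing the discovery event on the newly created MO is also an assumption made for brevity.

```python
class ModelServer:
    """Hypothetical model server combining the Factory and Dispatcher roles."""

    def __init__(self):
        # resource name -> events delivered to the corresponding MO
        self.managed_objects = {}

    def factory_create(self, resource):
        # Factory role: generate a new MO for a newly seen resource.
        self.managed_objects[resource] = []

    def dispatch(self, resource, event):
        # Dispatcher role: route the event, asking the Factory for a MO if needed.
        created = resource not in self.managed_objects
        if created:
            self.factory_create(resource)
        self.managed_objects[resource].append(event)
        return "created" if created else "delivered"


m1 = ModelServer()
print(m1.dispatch("mo6", "discovered"))
print(m1.dispatch("mo6", "fault: disk full"))
```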
- all MO's adhere to a publisher-subscriber pattern.
- one dedicated component takes the role of a publisher, and all components dependent on changes in the publisher are termed its subscribers.
- the publisher maintains a registry of its current subscribers. Whenever a component wants to become a subscriber, it uses a subscribe interface, offered by the publisher. Whenever the publisher changes state, it sends a notification to this effect to all its subscribers, which in turn retrieve the changed data at their discretion.
- a MO is said to publish a state change it has undergone to its dependents (i.e., the list of MO's that subscribe to it), and is said to subscribe to the state changes of all its supporters (i.e., the list of MO's it subscribes to).
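A minimal sketch of the publisher-subscriber relation among MO's follows. The class and method names (`ManagedObject`, `subscribe`, `set_state`, `notify`) and the state values are hypothetical, chosen only to mirror the terms "dependents" and "supporters" used above.

```python
class ManagedObject:
    """Hypothetical MO that publishes its state changes to its dependents."""

    def __init__(self, name):
        self.name = name
        self.state = "ok"
        self.dependents = []     # MO's that subscribe to this MO
        self.supporters = []     # MO's this MO subscribes to
        self.notifications = []  # (publisher name, new state) pairs received

    def subscribe(self, dependent):
        # Register `dependent` as a subscriber of this MO.
        self.dependents.append(dependent)
        dependent.supporters.append(self)

    def set_state(self, state):
        if state == self.state:
            return  # no change, nothing to publish
        self.state = state
        for dependent in self.dependents:
            dependent.notify(self)

    def notify(self, publisher):
        # The subscriber retrieves the changed data from the publisher.
        self.notifications.append((publisher.name, publisher.state))


disk = ManagedObject("disk")
database = ManagedObject("database")
disk.subscribe(database)
disk.set_state("failed")
print(database.notifications)
```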
- All MO's have at least one relation with at least one other MO, which relations can be thought of as a “tree-like” graph whose nodes are the MO's and whose edges are the publisher-subscriber relations among those MO's. Each of those graph edges is thought of as a vector (i.e., has a direction), pointing from the supporter (publisher) to the subscriber.
- the messages MO's pass to one another in their publisher-subscriber relationships are state changes that result from a MO's receipt of an event.
- a change in a MO may result in a corresponding change in one or more of that MO's subscriber MO's.
- These changes may result in changes in still other MO's.
- the model traverses the directed graph referred to above, starting from the MO corresponding to the resource from which the occurrence was reported, to the edge of the graph in what may be termed the “leafward” direction, that is, in the direction from parent (publisher) MO to child (subscriber) MO.
- the model also traverses the graph in the other direction (“rootward”), to find other MO's that may have undergone state changes.
- root-cause events, that is, events that inherently are basic faults themselves, and not just sympathetic events, the consequences of other events. (Those types of events which are treated as root-cause events are listed in a table in the Control Repository.) The encountering of a sympathetic event indicates that the rootward traversal is still navigating intermediary points.
- rootward traversal moves processing toward a root cause, and leafward traversal towards systemic impact.
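The two traversal directions can be sketched on a small directed graph. The graph contents, the set `ROOT_CAUSE_HOLDERS`, and the function names are hypothetical, and breadth-first search is an assumed strategy; the text does not specify the order in which MO's are visited.

```python
from collections import deque

# Hypothetical edges pointing from supporter (publisher) to dependents.
DEPENDENTS = {
    "disk": ["database"],
    "database": ["app1", "app2"],
    "app1": [],
    "app2": [],
}
# MO's that have received an event of a root-cause type.
ROOT_CAUSE_HOLDERS = {"disk"}


def supporters_of(name):
    return [s for s, deps in DEPENDENTS.items() if name in deps]


def leafward(start):
    """Walk toward subscribers until the graph is exhausted: predicted impacts."""
    impacted, queue, seen = [], deque([start]), {start}
    while queue:
        for dep in DEPENDENTS[queue.popleft()]:
            if dep not in seen:
                seen.add(dep)
                impacted.append(dep)
                queue.append(dep)
    return impacted


def rootward(start):
    """Walk toward supporters until one holding a root-cause event is found."""
    queue, seen = deque([start]), {start}
    while queue:
        node = queue.popleft()
        if node in ROOT_CAUSE_HOLDERS:
            return node
        for sup in supporters_of(node):
            if sup not in seen:
                seen.add(sup)
                queue.append(sup)
    return None  # no root cause known yet


print(leafward("disk"))
print(rootward("app1"))
```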
- the disk drive MO receives a message indicating an event that is inherently a root-cause event (the disk drive failure), and emits a state change message to its dependents, including the MO's for the applications in question.
- the application MO's consequently change their state.
- the model is predicting that the applications will feel the impact of the disk drive failure, and the invention labels the application MO state changes as impacts.
- a dynamic agent independently captures a report emitted by one of the applications about its inability to gain access to needed data (for example).
- the MO corresponding to that application receives a message indicating an event that is inherently a sympathetic event. It searches for an associated root-cause event by searching among those MO's supporting it for any that holds a root-cause event (i.e., for any that has received a message indicating the occurrence of an event of a type that is inherently a root-cause event).
- the model is corroborating the impact it has predicted. More importantly, the model is associating apparently disparate sympathetic events by associating those events with root causes, in a rootward traversal.
- the operator cannot, of course, be given a clear identification of the basic problem, but can still be provided with the enormously useful information as to which of the various incoming events are related to each other and what their likely impacts will be.
- the identification of the MO's that are involved in such a group of related events will likely facilitate the eventual identification and cure of the fault.
- the cluster of such events is termed an alarm. (If the cluster does not contain a root-cause event, then the cluster is said to be a proxy alarm, until such time as a root-cause event is reported.)
- a viewer other than a system administrator has a different basic level of categorization of events than the administrator does.
- the general rule is that the viewer's basic level of categorization is the focus of that viewer's interest.
- For the administrator that is the root cause; for application support, that set of applications she is responsible for in a given deployment context; for a business user, that set of resources she interacts with directly (applications, printers, etc.).
- the impact is the alarm equivalent for non-operational perspectives—that is, while the operator is interested in the alarm (which is handled in such manner as to direct attention to the root cause), the impacts are handled in such manner as to present to other users the information of most interest to them.
- FIG. 9 illustrates the lifecycle of an event from the Model Server dispatcher process to alarm formation.
- a MO passes messages about its state changes to MO's that depend on it (leafward traversal), or on which it depends (rootward traversal).
- Rootward traversal terminates when it encounters a node (MO) with a flag set that indicates that the latter MO has received a message indicating a root cause.
- Leafward traversal terminates on exhaustion of the tree (i.e., when there are no more leafward nodes to go to, traveling along the relationships from the root-cause MO).
- the collection of sympathetic leaf events and their underlying root-cause event together create an interaction history known as an alarm.
- the description of the root-cause event labels the alarm.
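An alarm as a cluster of events, labeled by its root-cause event and treated as a proxy alarm until a root cause is reported, might be represented as below. The classes `Event` and `Alarm`, their fields, and the placeholder label text are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    mo: str
    description: str
    root_cause: bool = False  # True for events of an inherently root-cause type


@dataclass
class Alarm:
    events: list = field(default_factory=list)

    def add(self, event):
        self.events.append(event)

    @property
    def is_proxy(self):
        # A cluster without a root-cause event is only a proxy alarm.
        return not any(e.root_cause for e in self.events)

    @property
    def label(self):
        # The description of the root-cause event labels the alarm.
        for e in self.events:
            if e.root_cause:
                return e.description
        return "proxy alarm (root cause not yet reported)"


alarm = Alarm()
alarm.add(Event("app1", "cannot read data"))
print(alarm.is_proxy, "|", alarm.label)
alarm.add(Event("disk", "disk drive failure", root_cause=True))
print(alarm.is_proxy, "|", alarm.label)
```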
- the Dispatcher receives a reference to an event from the Control Repository, values to fit into that frame, and a reference to a MO. If the Dispatcher does not recognize both the MO and the associated event, alarm processing is finished. Either the Factory needs to create a new MO or the model server needs to log that it has received an unknown event. Otherwise, the MO constructs the event by fitting the parametrized data that it has received into the event frame it has retrieved from the Repository.
- the MO checks against its set of active supporters and adds the event to any active alarms in that set.
- the MO checks to see if it has an active alarm. If not, it creates an alarm; otherwise, it adds the event to an existing alarm, and the text of that alarm is changed to reflect that the root cause has been identified.
- the addition of a new alarm or an update to an existing one causes publication of change of states to all objects that subscribe to alarms (filters, which are discussed below).
- Processors called filters are sensitive to alarms according to configurable criteria. Each filter stands in subscriber relationship to a given set of MO's. In essence, a filter is a set of inclusion criteria for selecting MO's with which the given set of MO's should enter a subscriber relationship. In their role as subscribers, the filters receive messages when alarms associated with MO's in their inclusion-list change state.
- the filters then alert the view applications of user stations (us1 and us2 in FIG. 6) of the alarms.
- the view applications sit in a subscriber relationship to their filter, which publishes its own state change, i.e., a new alarm (or a modification of an existing one), to the viewer applications.
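The chain from alarm to filter to view application can be sketched as follows. The class `Filter`, the predicate-based inclusion criterion, and the callback-style viewers are assumptions made for illustration only.

```python
class Filter:
    """Hypothetical filter: selects MO's and republishes their alarms to viewers."""

    def __init__(self, include):
        self.include = include  # inclusion criterion: MO name -> bool
        self.viewers = []       # view applications subscribed to this filter

    def subscribe(self, viewer):
        self.viewers.append(viewer)

    def on_alarm(self, mo_name, alarm_text):
        # Ignore alarms for MO's outside the inclusion list.
        if not self.include(mo_name):
            return False
        for viewer in self.viewers:
            viewer(mo_name, alarm_text)
        return True


shown = []  # stands in for what a user station would display
db_filter = Filter(lambda name: name.startswith("db"))
db_filter.subscribe(lambda mo, text: shown.append((mo, text)))

db_filter.on_alarm("db1", "router R1 fault")
db_filter.on_alarm("printer7", "out of paper")
print(shown)
```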
- When the viewer application receives the update message, it updates its display. The user can double-click on the alarm display to see the underlying events behind the alarm. The exact arrangement of this display is not critical, as long as the user or operator seeing the display is provided with the requisite information about what is happening.
- the data-gathering infrastructure of the preferred embodiment is a flexible, reusable provider of data collection and distribution services. This infrastructure is entirely independent of the model logically, functionally and in code base. Both infrastructure and model, however, are parts of the preferred embodiment. The description that follows is of a computational mechanism that has multiple functional uses within that embodiment.
- collectors of information about things in the external system where the customers of the information are the end users of the preferred embodiment.
- the model contains relatively little knowledge about the things that make up the external system, and most preferably has as little such knowledge as possible, knowing only the kinds of relationships the components of the external system enter into. To get that information, the model must rely on the infrastructure, which does have the requisite abilities.
- the infrastructure takes, as input, aspects of the essential characteristics of the things themselves (the intensions), and produces as output aspects of the external characteristics of those things (the extensions).
- the model thus requests services from the infrastructure.
- the infrastructure does not need to know anything about the model's implementation; it only needs instructions about what to collect and what to emit.
- Enterprise management, consolidated financial data feeds and messaging services are just some examples of possible models that could take advantage of the identical data-gathering infrastructure, differentiated only by the specific instructions it follows (that is, the data-gathering infrastructure has many applications other than with a model of a distributed computing ensemble, as in the preferred embodiment, and this aspect of the invention is not limited to use of the infrastructure with such a model).
- The main housing of the data-gathering infrastructure is the Agent Manager.
- The Agent Manager is an application written in Java, and runs as a stand-alone address space.
- Agent Managers need additional components that fall into three categories:
- Agent Managers function as the gateways into the main processing area of the model of the preferred embodiment.
- the primary function of the Agent Managers is to perform a preliminary analysis of uninterpreted (raw) data.
- some intermediary agent infers the condition of the managed resource by observing its external behavior (e.g., CPU consumption, packet loss rate), which is detectable through such mechanisms as polling of SNMP Management Information Blocks (MIB's), or retrieving system management metrics from the control blocks of the relevant OS.
- the function of the Agent Manager is to analyze the uninterpreted data, determine which of these three outcome paths it should follow, format the data appropriately to the particular outcome, and route it accordingly.
- the architecture of the Agent Manager comprises non-terminating reactive programs that interact with their surroundings. These programs react to either external stimuli (incoming data from various sources) or internal stimuli (control data from other components of the external system). This control data instructs dynamic agents (described below) about what kind of data to acquire, how to analyze it, and where to pass it. The Agent Manager learns from control data what dynamic agents to stop, start or update.
- control infrastructure of the preferred embodiment follows a data-driven architecture.
- the implementation of functional components does not have a one-to-one correspondence to address spaces, or even to objects.
- In some cases the method of an object might implement the function, while in others implementation may be by means of an independent address space. What is important from the overall architectural standpoint is the set of functions themselves.
- Control Repository provides a central storehouse of control information, which must flow through the system in an appropriate way. This flow of control data constitutes a set of “control conversations” among functional components. These types of “conversations” are illustrated in FIG. 11, and include:
- Control Repository to Agent Manager: mission packages, bundles;
- Agent Manager to instruction set: attribute list of parameters;
- Agent Manager to sensory monitor: attribute list of parameters;
- Agent Manager to analyzer: attribute list of parameters;
- Model server to Agent Manager: ad hoc requests, updates of running modules;
- Model server to Control Repository: requests for bundles to implement ad hoc requests;
- User station or model to model server: requests for active data, requests for monitor services (ad hoc modules);
- Control Repository to warehouse: extract-translate-load instructions for maintaining roll-up data model based on raw information.
- FIG. 11 illustrates the division of the functional components of the Agent Manager into those with an intrinsic rationale and those with an extrinsic rationale.
- control and configuration handling include control and configuration handling, inter-thread queue framework, communications handling, and thread handling.
- control of the instruction set includes control of sensory monitors, and control of analysis.
- This function is to receive mission packages and both ad hoc and persistent bundle requests from consumers, and to instantiate those requests from the Control Repository.
- This function is to provide low-level semaphore, mutex and asynchronous queue services to allow threads to provide parallel processing and object sharing as necessary.
- This function is to provide a mechanism for passing collected information out from the Agent Manager, and collecting information from other Agent Managers, as described below.
- This function is to manage (start up, stop and monitor) child threads that are performing services for the model or other end consumers.
- Agent Managers bind to the rest of the data-gathering infrastructure through a series of conversations with neighboring components (these are software components of the infrastructure, and should not be confused with the hardware and software components that make up the external system).
- in control conversations, software components of the preferred embodiment pass to one another runtime messages or system metadata whose function is to pass control from one point of processing to the next, thus effectively binding the functions of the distributed system together.
- the configuration handler subcomponent is responsible for the conversation between the agent manager and the Control Repository. It receives assignments for the running instance of the Agent Manager (data structures called mission packages). These assignments include:
- because the configuration handler accepts external interrupts with control data, its presence allows dynamic reconfiguration of the number, type and assignment of dynamic agents in any given Agent Manager.
- Agent Manager components pass the information their dynamic agents have collected to the model. As already mentioned, this passing of information does not need to be direct.
- the preferred embodiment is organized to allow a cascade of information to enhance load balancing and ensure that dynamic agents run close to their managed resources, and at the same time to diminish the number of direct connections into the model server.
- Agent Managers are configured in a tiered tree pattern. Only the Agent Managers at the root of the tree pass their information directly to the model server; all others pass to the Agent Manager node in the next level. Each Agent Manager is, then, potentially a passthrough as well as an originator of data.
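The cascade can be sketched as a tree in which every Agent Manager forwards to its parent and only the root hands messages to the model server. The class `AgentManager`, the `emit` method, and the list used as a stand-in for the model server are hypothetical; the CORBA transport is deliberately left out of the sketch.

```python
class AgentManager:
    """Hypothetical Agent Manager node in a tiered tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # next level toward the root; None at the root

    def emit(self, message, model_server, hops=None):
        # Record this node, then pass the message along the cascade.
        hops = (hops or []) + [self.name]
        if self.parent is None:
            # Only the root passes information directly to the model server.
            model_server.append((message, hops))
        else:
            self.parent.emit(message, model_server, hops)


root = AgentManager("root")
mid = AgentManager("mid", parent=root)
leaf = AgentManager("leaf", parent=mid)

model_server = []  # stands in for the model server's inbound queue
leaf.emit("fault on R1", model_server)
mid.emit("fault on H1", model_server)
print(model_server)
```

Each node thus acts both as an originator (`mid` emitting its own message) and as a passthrough (`mid` relaying the message from `leaf`).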
- the communications handler subcomponent is responsible for transmitting information messages from an agent manager, and receiving messages coming in from another agent manager lower in the cascade. Communications handlers communicate using CORBA CosEvent channels and find each other through CORBA CosNaming services.
- Communications handlers themselves comprise two subcomponents: Inbound and Outbound Handlers. As their names suggest, Inbound is responsible for receiving incoming information from other agent managers, Outbound for passing to the next node in the cascade. Reconfiguration of the agent manager topology is dynamic. An administrator can add new assignments, delete existing assignments or change the behavior of existing assignments of any dynamic agent plant during runtime.
- the first portion of the internal housekeeping functions relates to inter-thread queue infrastructure.
- the Agent Manager is implemented as a set of parallel processing threads that divide the work of the Agent Manager functionally. Each thread runs according to an “active thread” strategy—that is, each is in a non-terminating main loop that accepts state-changing instructions from external interrupts. Joining the threads are asynchronous event queues and a synchronization mechanism based on waits and interrupts.
- the set of synchronization techniques is the inter-thread queue infrastructure.
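An "active thread" joined to its neighbors by asynchronous queues might be sketched as below, using Python's standard `threading` and `queue` modules rather than the Java facilities of the preferred embodiment. The sentinel `STOP`, the function `active_thread`, and the upper-casing used as placeholder work are all assumptions.

```python
import queue
import threading

STOP = object()  # sentinel instruction asking the thread to end


def active_thread(inbox, outbox):
    """Main loop that runs until it is instructed to stop."""
    while True:
        item = inbox.get()  # blocks (waits) until something arrives
        if item is STOP:
            break
        outbox.put(item.upper())  # placeholder for real processing


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=active_thread, args=(inbox, outbox))
worker.start()

for raw in ("link down", "disk full"):
    inbox.put(raw)
inbox.put(STOP)
worker.join()

results = [outbox.get() for _ in range(2)]
print(results)
```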
- a mission package contains the configuration information for a number of different Agent Manager functions. When it enters the Agent Manager, some Agent Manager component needs to deconstruct it and disseminate its contents appropriately to the various responsible Agent Manager components.
- the thread handler is the component responsible for starting the dynamic agents, and thus receives the dynamic agent bundle sections of the mission package. It interprets the incoming information, determines how many threads to start up, and with what parameters, and then enters a monitoring phase in which it waits for either a dynamic agent to end prematurely or a new mission package to arrive.
- Dynamic agents are the true data capture component of the data-gathering infrastructure, and so form the bridge between the set of data sources to the model and the model itself.
- the data sources are the devices and applications that the preferred embodiment manages (the hardware and software components of the external system).
- Dynamic agents are also non-terminating reactive programs that interact with their surroundings. Structural non-termination does not mean that the tasks stay running forever, only that they run until requested to do otherwise.
- the system requests, for instance, that discovery agents terminate after they have completed their task.
- the programs react either to external stimuli (incoming data from various sources) or to internal stimuli (control data from other components).
- This control data instructs the quite generic dynamic agents about what kind of data to acquire, how to analyze it, and where to pass it.
- the Agent Manager learns from control data what dynamic agents to stop, start or update. Thus, the Agent Manager plays a critical role also in the flow of control data through the system.
- a dynamic agent bundles three functional components under a single cover: SensoryMonitor, InstructionSet, and Analyzer.
- Dynamic agents are the metaphoric eyes, ears and nose of the model. Consequently, the dynamic agent component most directly responsible for capturing data is called the SensoryMonitor.
- the SensoryMonitor collects data through a variety of protocols—SNMP, file tailing, TTY, to name a few—depending entirely on the mechanism most appropriate to the device.
- the SensoryMonitor is responsible only for knowing how to handle the lowest level requests, not for knowing what requests to issue, nor how to interpret the results.
- An InstructionSet is the module function that formulates protocol- and device- or source-specific data-eliciting messages.
- An Analyzer is the module function that winnows, interprets and massages incoming information. Most of the knowledge necessary for interpreting the incoming data as impacts must reside in the model proper, since only the model has access to information about how individual events fit into the fabric of the overall managed environment. The Analyzer, however, can perform first-level acceptance testing for messages, and reformat messages into a normalized appearance that the model is able to interpret more simply (i.e., assign the device- or protocol-specific knowledge to a point near the source rather than cluttering up the model with it). From the perspective of the Agent Manager, InstructionSets and Analyzers are exogenous guests running in the dynamic-agent context.
- the dynamic agent provides a distributed, possibly remote run-time context for some behavior of the model.
- the InstructionSet and Analyzer expose only their external behavior to the Agent Manager, and do not need to make their implementation known in any way to the Agent Manager. While the InstructionSet and Analyzer work on behalf of the model, their instructions are mainly uninteresting to the runtime model. They come rather from the repository instantiation of the model.
- the particular implementation of the modules is immaterial to the functional division of labor.
- the components could all be bundled together as synchronous calls to the same class, as methods of multiple classes, or split into parallel threads.
- the architecture will support multiple implementations as suits the particular context at hand.
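One of the possible arrangements, synchronous calls among three cooperating classes, is sketched below. The method names, the request dictionary, the file path, and the rule that a line containing "ERROR" is a fault are hypothetical; the transport is a stub so that the example runs without any real device.

```python
class InstructionSet:
    """Formulates the data-eliciting requests (hypothetical format)."""

    def requests(self):
        return [{"protocol": "file-tail", "target": "/var/log/app.log"}]


class SensoryMonitor:
    """Executes low-level requests without interpreting the results."""

    def __init__(self, transports):
        self.transports = transports  # protocol name -> callable(target)

    def collect(self, request):
        return self.transports[request["protocol"]](request["target"])


class Analyzer:
    """Winnows raw data and normalizes what remains into events."""

    def analyze(self, raw_lines):
        return [
            {"severity": "fault", "text": line.strip()}
            for line in raw_lines
            if "ERROR" in line
        ]


class DynamicAgent:
    def __init__(self, instruction_set, monitor, analyzer):
        self.instruction_set = instruction_set
        self.monitor = monitor
        self.analyzer = analyzer

    def run_once(self):
        events = []
        for request in self.instruction_set.requests():
            raw = self.monitor.collect(request)
            events.extend(self.analyzer.analyze(raw))
        return events


def fake_tail(target):
    # Stub transport standing in for real file tailing.
    return ["INFO started\n", "ERROR cannot reach database\n"]


agent = DynamicAgent(InstructionSet(), SensoryMonitor({"file-tail": fake_tail}), Analyzer())
print(agent.run_once())
```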
- A schematic illustration of making an Agent Manager and dynamic agents data driven is provided in FIG. 12, and the flow of control in an Agent Manager is illustrated in FIG. 13.
- When the preferred embodiment of the invention is to be used to operate and manage an external computing system, it is sufficient to install it and begin running it: it is not necessary beforehand to customize the software either to the particular types of components in the external system, or to the number of instances or configuration of those components in the actual system, or to how the various components interrelate. This is because the preferred embodiment is constructed in such a way that, once it begins running, it itself discovers the information that it needs concerning the external system.
- the runtime of the preferred embodiment includes several phases: the discovery phase, the model building phase, and the interaction phase (see FIG. 14). These phases overlap in time, and in a sense neither the discovery phase nor the model building phase ever terminates, but rather both continue throughout runtime, as changes are made to the external system's make-up or arrangement. Each of these will be discussed in detail in turn.
- When the preferred embodiment first begins to run, it has no knowledge of the particulars of its environment (i.e., the external system it is supposed to monitor). All its “knowledge” that will be used in learning what it needs to know about the particular external system is initially to be found in the Control Repository.
- This Repository contains a lexicon of kinds of modeled MO, and hence an inventory of the hardware and software components that, initially, the runtime model of the preferred embodiment can recognize. The system administrator can manually supply the information needed to instantiate MO's for elements or parts of the external system that are not accounted for by the initial contents of the Control Repository.
- the model's first steps are to recognize and represent the world in which it operates (the external system). This will include finding the set of devices and applications and other software that populate the external system, recognizing the members of that set as instances of items in the inventory (where possible), instantiating objects of the appropriate type within the model, and labeling those objects (MO's) with names that reflect their name in the external system.
- the model establishes a directed graph in which the MO's form the nodes (it will be understood that this graph is of the model, not directly of the external system, and that the “nodes” here referred to are MO's, and are not the same “nodes” to which reference was made above in the portion of the Detailed Description titled “Nodes”).
- the relationships between the MO's characterize and define the edges.
- MOs' procedural knowledge of services they consume, services they provide, and systemic constraints on those services. For example, suppose that in the external system to be managed, IP services must be provided to consumers on the same computer node (this “node” is the concrete type of node referred to initially), and that one provider provides those services to many consumers. Given any IP service consumer application, the model can determine that that application must consume IP services from a single IP provider on the same computer node (i.e., the network protocol of that node). In general, it is possible to characterize MOs' procedural knowledge as their ability to determine their nearest connections based on the services they supply and consume.
- the discovery process also needs to pass the information it has learned to the model, so that the model can create the required instances of MO's to correspond to the external system resources the discovery process has encountered.
- the process encapsulates its information into an attribute list that describes the resource in the terms required by the model. It passes that attribute list to the Factory component of the model server, which draws on the information stored in the Control Repository to determine the possible services a MO of this type can provide and consume.
- An attribute list is a data structure comprising a variable list of labels followed by associated values. It matches this information to the services it deduces this particular MO actually is providing or consuming (inferred from the attribute list received from the discovery process), thus determining the interrelationships between this MO and others, and so, effectively, situating the MO in the model properly.
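The hand-off from the discovery process to the Factory can be pictured with a small sketch. The names used here (`CONTROL_REPOSITORY`, `factory_create`, the attribute labels and the service names) are hypothetical and chosen only for illustration; the text does not specify these structures.

```python
# Hypothetical sketch: an attribute list is modeled as a list of (label, value) pairs.
# The repository contents and label names are illustrative assumptions.

# Services an MO of each type can possibly provide or consume,
# as the Control Repository might record them.
CONTROL_REPOSITORY = {
    "application": {"consumes": {"networkAccess", "databaseAccess"}, "provides": set()},
    "networkProtocol": {"consumes": set(), "provides": {"networkAccess"}},
}


def factory_create(attribute_list):
    """Build a dict standing in for an MO from a discovered attribute list."""
    attrs = dict(attribute_list)
    possible = CONTROL_REPOSITORY[attrs["type"]]
    # Services the resource was observed to use, limited to what its type allows.
    observed = set(attrs.get("observedServices", ()))
    return {
        "name": attrs["name"],  # label reflects the name in the external system
        "type": attrs["type"],
        "consumes": observed & possible["consumes"],
        "provides": observed & possible["provides"],
    }


mo = factory_create([
    ("name", "payroll-app"),
    ("type", "application"),
    ("observedServices", ["networkAccess"]),
])
```

In this sketch the intersection of possible and observed services is what situates the new MO: it tells the model which relationships the MO should go on to form.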
- FIG. 15 contains a legend to explain the symbols used in this explanation.
- a disk with an X across is an unidentified MO.
- a disk with a horizontal bar across it represents an identified MO (such disks are labeled to distinguish them from each other, the letters used for that purpose in this example including A for application, C for computer, D for database, H for hub and R for router; it will of course be understood that the preferred embodiment is not limited to managing resources of these or any other specific types).
- a solid arrow is used to designate a path, while a hollow arrow denotes a session.
- a rectangle encloses and denotes a business unit.
- a tag denotes a service (B-Service or I-functional Service), a cloud outline a portion of the network that is not included in the illustration, and a cloud with lightning a reported fault.
- a square with rounded corners and a cross on it indicates a root cause, and a disk with a diagonal bar across it, an impact.
- the criteria for inclusion of MO's in the model might, for example, comprise a set of starting network addresses for the discovery process, together with the constraints on the range of acceptable addresses that effectively limit the possible inventory to addresses that fall within that range. These limits are preferably set by the system administrator at the time of installation, and so permit the administrator to limit the breadth of the discovery process (for instance, it may be decided to limit the initial discovery to only a particular subsystem of the overall external system). Such breadth constraints thus define which IP networks are to be discovered, and which IP subnets in those networks are to be ignored.
- the preferred embodiment itself sets the limits on the depth to which discovery is conducted (that is, the information actually needed is gathered, but information beyond that is not sought).
- depth constraints include the need to discover all subnets within the defined breadth constraints, all IP routing devices within the breadth constraints, the physical make-up of the IP subnets, all IP devices within those discovered subnets, all media access control methods (“MAC”) of those IP devices, all repeating devices (hubs) among those IP devices, the physical connections between hubs and other IP devices, all bridging devices among those IP devices, and the physical connections between the bridging devices and other IP devices.
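As an illustration of breadth constraints, the following sketch enumerates candidate addresses inside the networks chosen for discovery while skipping subnets marked to be ignored. The function name and the example address ranges are assumptions made for this sketch, not part of the described embodiment.

```python
import ipaddress


def candidate_addresses(include_networks, ignore_subnets):
    """Yield host addresses inside the included networks but outside ignored subnets."""
    ignored = [ipaddress.ip_network(s) for s in ignore_subnets]
    for net in include_networks:
        for host in ipaddress.ip_network(net).hosts():
            if not any(host in sub for sub in ignored):
                yield host


# Documentation address range (192.0.2.0/24) used purely as example data.
addresses = list(candidate_addresses(["192.0.2.0/28"], ["192.0.2.8/30"]))
```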
- After performing such an IP discovery, the model will know all IP subnets that meet the administrator's breadth criteria, and within those criteria all IP devices and their capabilities. From that information, the model will then be able to create IP network maps that show how the IP subnets are joined together, create IP subnet maps that show how IP devices are physically connected together, and know what devices to monitor for health and performance. (It is to be emphasized, however, that in the preferred embodiment, the model will not actually include all this information, although the information will be used to construct the model. If the model later needs some of this information, what is required will again be gathered from the external system by means of dynamic agents. That is, once the model is constructed, only a subset of the available data, and only a subset of the originally discovered data, is included in the model. The knowledge base for the model is the external system itself, and not a separate database, or the model itself.)
- the autodiscovery process includes three phases: protozoan discovery, generic discovery, and personality discovery
- the protozoan phase involves identification of discoverable nodes (this refers of course to “nodes” in the more concrete sense first used above, not to the nodes in the graph formed by the program, as the latter only come to be defined as the former are discovered); the generic, the association of discovered nodes with standard types (e.g., a MIB defined by a network working group's RFC); and the personality phase, which in the preferred embodiment is the association of discovered standard types with manufacturer-specific types (i.e., proprietary MIB extensions)
- These multiple phases are used because given discovery procedures are able to identify only limited amounts of information.
- the preferred embodiment incorporates preexisting procedures, but is believed to be unique in the way it combines and drives those procedures.
- the discovery process sends Internet Control Message Protocol (“ICMP”) ping signals to all possible addresses within its breadth constraints on the networks it is responsible for discovering (see FIG. 16).
- a response to an ICMP ping indicates only that an object exists on the network, however, and does not indicate its type or identity.
- FIG. 16 depicts nodes that have responded to the discovery process's ICMP ping requests. That Figure represents those nodes as disks, without any other information, to reflect that the model has insufficient information as yet to determine what kinds of system resources or components they are.
- FIG. 17 illustrates the result of this phase.
- the process has successfully discovered computers (C-nodes), hubs (H-nodes) and routers (R-nodes) in the external system in question.
- the discovery process issues SNMP ping requests to the items it has located in the protozoan phase. A positive response to an SNMP ping indicates that the device is SNMP-compliant.
- the discovery process also extracts heuristic information from the Control Repository that allows it to identify any standard MIB positively.
- the value of either the Enterprise Object Identifier (“EOID”) MIB variable or some concatenation of MIB variables provides a generic device type's necessary and sufficient recognition criteria.
- Configuration information from the Model Repository allows the process to extract configuration information once the device is positively identified. This information will include information about the resource's ability to recognize and report problems it may experience, to the extent that such ability is standard in devices of the general type in question.
- Upon completion of the generic phase, the discovery process has identified the resources in the pertinent portions of the external system down to the level of manufacturer's make and model number (where appropriate). If the MIB's of all of a manufacturer's models are identical as far as their participation in the fault and performance operations is concerned, differing only in variables concerned with device configuration or the like, there is no reason to differentiate among them.
- the personality phase uses the same procedures as the generic phase. The difference is that this phase extends the process beyond standard MIB's to the manufacturer's MIB extensions. These extensions offer additional information about the device's own internal configuration and any errors it may recognize and report beyond the standard.
- the Control Repository houses heuristic identifiers and configuration material associated with “personalities” (the individual characteristics of a particular model or the like) in the same manner as with generic material.
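The three discovery phases can be sketched as successive refinements of what is known about a responding address. The lookup tables, OID prefixes and type names below are invented for illustration and merely stand in for the heuristic identifiers held in the Control Repository; real recognition criteria may involve concatenations of several MIB variables.

```python
# Hypothetical heuristic tables; the OID prefixes and names are made up for this sketch.
GENERIC_TYPES = {
    "1.3.6.1.4.1.99999.1": "hub",
    "1.3.6.1.4.1.99999.2": "router",
}
PERSONALITIES = {
    "1.3.6.1.4.1.99999.1.42": "ExampleCo hub, model 42",
}


def longest_prefix_match(oid, table):
    """Return the value whose dotted-OID key is the longest prefix of oid, or None."""
    best_key, best_value = "", None
    for key, value in table.items():
        if (oid == key or oid.startswith(key + ".")) and len(key) > len(best_key):
            best_key, best_value = key, value
    return best_value


def identify(answers_ping, enterprise_oid=None):
    """Classify a node as far as the available information allows."""
    if not answers_ping:
        return None  # nothing discoverable at this address
    found = {"phase": "protozoan", "type": None, "personality": None}
    if enterprise_oid is None:
        return found  # exists on the network, but type is still unknown
    generic = longest_prefix_match(enterprise_oid, GENERIC_TYPES)
    if generic is None:
        return found
    found.update(phase="generic", type=generic)
    personality = longest_prefix_match(enterprise_oid, PERSONALITIES)
    if personality is not None:
        found.update(phase="personality", personality=personality)
    return found
```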
- the dynamic agent instantiates a software object that contains the information of interest.
- the class of object instantiated corresponds, in the preferred embodiment, to the generic type of component or sub-component, and the personality information is represented as an attribute of that object. That is, the preferred embodiment avoids the conventional means of extending existing categories (the generic classes), i.e., using object inheritance hierarchies.
- the inheritance mechanism, as is well known, allows objects to share attributes and methods based on their structural relationship.
- inheritance relationships are intrinsically compiled, even in an interpretive language like SmallTalk. Adding a new class of object within an existing class would require a recompilation. The approach taken in the preferred embodiment avoids the need to recompile, and thus permits the invention to continue running without interruption.
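The design choice of carrying personality as an attribute rather than as a subclass might look like the following sketch. The class name, attribute keys and error names are assumptions for illustration; Python itself needs no recompilation to add a class, so the sketch shows only the shape of the data, not the runtime benefit the text describes.

```python
class ManagedDevice:
    """One class per generic type; the personality is data, not a subclass."""

    STANDARD_ERRORS = {"linkDown", "coldStart"}  # illustrative names only

    def __init__(self, name, generic_type, personality=None):
        self.name = name
        self.generic_type = generic_type
        # Manufacturer-specific details arrive as plain data at runtime.
        self.personality = personality or {}

    def reportable_errors(self):
        """Standard errors plus any the personality adds."""
        return self.STANDARD_ERRORS | set(self.personality.get("extra_errors", ()))


plain_hub = ManagedDevice("H7", "hub")
vendor_hub = ManagedDevice(
    "H8", "hub", {"vendor": "ExampleCo", "extra_errors": ["fanFailure"]}
)
```

Both objects are instances of the same class, so a new personality can be introduced by supplying new data while the system keeps running.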
- IP discovery is illustrated in FIG. 18A. This is typically handled by a single software agent, which does “boxes and wires” discovery (“BWD”), after which other agents fill in system-level information such as applications and databases. If desired, a given site can break the BWD agent into multiple agents each responsible for a subset of the enterprise. (It should be noted that while the process is illustrated as a sequence of steps, in practice, the entire discovery process is preferably executed in parallel, which permits a dynamic agent to pass off control to another dynamic agent that can pursue alternate discovery paths.)
- the process begins with the receipt from the system administrator of a list of networks, subnets, subnet ranges, and hosts or host ranges. In this step, every subnet or host is passed in with a subnet mask. A list of subnets to discover is then built; this may include a predefined list of hosts. For each subnet, a broadcast ping is emitted, to see if anything answers. Based on the response, a list of all possible addresses on the subnet is made, and those addresses are pinged. Then, an initial SNMP query is done. (All SNMP-compliant devices are handled before any non-compliant devices, to save time.)
- the initial SNMP query requests how many interfaces the queried node has, whether it has its routing flag turned on, and what system services it supports. As it is possible that a device that does not support a service may nonetheless report that it does support that service, the query also includes the system name, confirmation that the node's apparent address is actually in the node's address table, and if so, the node's subnet mask.
- Each SNMP node has its entire address table walked, and any nodes having multiple addresses (mostly routers) are queried for their address resolution protocol (“ARP”) tables. (This table is important because it is the only means by which one can hook a computer up to its LAN access port, usually a hub port, as the hubs only know about MAC addresses.)
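The role of the ARP table in tying a computer to its LAN access port can be sketched as a join on MAC address. The function name and both table layouts are assumptions made for this sketch; the MAC and IP values come from ranges reserved for documentation.

```python
def attach_to_ports(arp_table, hub_ports):
    """Join an ARP table (IP -> MAC) with hub port data ((hub, port) -> MAC).

    Returns a mapping from IP address to the (hub, port) it is attached to.
    """
    by_mac = {mac.lower(): location for location, mac in hub_ports.items()}
    return {
        ip: by_mac[mac.lower()]
        for ip, mac in arp_table.items()
        if mac.lower() in by_mac
    }


links = attach_to_ports(
    {"192.0.2.10": "00:00:5E:00:53:01", "192.0.2.11": "00:00:5E:00:53:02"},
    {("H7", 3): "00:00:5e:00:53:01"},
)
```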
- All SNMP nodes are queried for interfaces, ifStack, etc.
- the process then builds the SNMP nodes; that is, the process instantiates software objects representing the respective nodes. These objects are not part of the model, but will eventually be used to forward the necessary information to the model for the latter to create a corresponding MO. Any repeater boards or ports are built, and MAC addresses are retrieved. Vendor specific discovery is performed, and finally, factory orders (that is, objects containing the information needed by the Factory component to create a corresponding MO), are sent to the model for that purpose.
- the threads of the agent correspond to respective ones of the boxes shown in FIG. 18A.
- the ping thread receives Subnet objects in its queue and generates IPAdress objects via its pinging. These latter objects pass through the other queues.
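The queue-connected threads can be sketched as a two-stage pipeline. This is a simplified illustration under stated assumptions: the stage functions, the sentinel convention and the `is_alive` callback (which stands in for an actual ICMP ping) are all hypothetical, and a real agent would have more stages.

```python
import ipaddress
import queue
import threading

SENTINEL = None  # marks the end of the work stream (an assumption of this sketch)


def ping_stage(subnet_queue, address_queue, is_alive):
    """Consume subnets and emit the addresses that answer a (simulated) ping."""
    while True:
        subnet = subnet_queue.get()
        if subnet is SENTINEL:
            address_queue.put(SENTINEL)
            return
        for host in ipaddress.ip_network(subnet).hosts():
            if is_alive(host):
                address_queue.put(host)


def collect_stage(address_queue, results):
    """Stand-in for the later stages: record each address that arrives."""
    while True:
        address = address_queue.get()
        if address is SENTINEL:
            return
        results.append(str(address))


def run_pipeline(subnets, is_alive):
    subnet_q, address_q, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=ping_stage, args=(subnet_q, address_q, is_alive)),
        threading.Thread(target=collect_stage, args=(address_q, results)),
    ]
    for thread in threads:
        thread.start()
    for subnet in subnets:
        subnet_q.put(subnet)
    subnet_q.put(SENTINEL)
    for thread in threads:
        thread.join()
    return results
```

For example, `run_pipeline(["192.0.2.0/30"], lambda host: int(host) % 2 == 1)` treats only odd-numbered addresses as alive.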
- the next phase of building the model is constructing the mesh of actual service interrelationships among MO's.
- the model introduces the concept of service to describe relationships between MO's. All MO's participate in the network by consuming or providing services (or both). A singleton or isolate (an MO having no relationship to any other MO), by definition, cannot be part of the network.
- the concept of services is used to constrain the relations between MO's. By restricting the number of possible services, the system forces “services” as represented in the model to be abstract because they necessarily have to ignore subtler distinctions. More importantly, perhaps, services inherently impose constraints through cardinality co-occurrence restrictions. For instance, an application needing IP services is constrained to find IP services on the same copy of the operating system that it is running on. The IP MO provides IP services to all requesters running on that OS; therefore, IP services must support a one-to-many cardinality.
- the purpose of services in the preferred embodiment of the invention is to provide MO's with the ability to determine what other MO's they connect to, in as many cases as possible. If an application needs IP service, for example, its MO will construct a relation to its network protocol MO. If an operating system needs domain name services (“DNS”) within its subnet, its MO will construct a relation to the MO corresponding to its local DNS service provider. If multiple services differentiate themselves only by tolerance, the MO representing the element that needs the service will form a relation with the MO of the provider whose service tolerances best meet the service-consumer MO's requirements.
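A minimal sketch of how an MO might resolve a service need follows. It assumes that "tolerance" can be reduced to a single latency figure and that "best" means lowest latency within the consumer's limit; both are simplifications introduced here, as are the `Provider` fields and the `resolve` function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    service: str       # e.g. "IP" or "DNS"
    scope: str         # where the service is offered: a host or a subnet
    latency_ms: float  # stand-in for a service tolerance


def resolve(service, scope, max_latency_ms, providers):
    """Pick the in-scope provider whose tolerance best meets the requirement."""
    candidates = [
        p for p in providers
        if p.service == service and p.scope == scope and p.latency_ms <= max_latency_ms
    ]
    return min(candidates, key=lambda p: p.latency_ms, default=None)


providers = [
    Provider("dns-a", "DNS", "subnet-1", 20.0),
    Provider("dns-b", "DNS", "subnet-1", 5.0),
    Provider("ip-stack", "IP", "host-x", 1.0),
]
```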
- FIG. 19 illustrates some of the computational services that the discovered MO's of FIG. 19 provide and consume.
- FIG. 20 illustrates some of the higher-level services that the discovered MO's provide in response to the organizational use of the computational systems.
- the model becomes a set of MO's that know how to resolve their connection needs, maintain dictionary memory structures of those MO's that are at a given time providing essential services for them, and consuming essential services from them. If all such service relationships were realized and displayed simultaneously, the result would be a snapshot of the internetworked MO's at that moment, i.e., a map of the paths of the network.
- the network of the model, that is, is an emergent property of MO's and services, and is not itself the ultimate focus of the model as the model is used in the preferred embodiment.
- FIG. 21 represents the world of the model in terms of a network that has emerged from realized computational services or paths. Although this view is both volatile, and lacking in the richness afforded by services in general, it is visually simpler and so has greater graphical explanatory power.
- FIG. 22 illustrates a more complete view of the world of the model than does the previous FIG.
- the organizational-use model and the computational model are not transitive, however.
- the model is able to constrain possible paths over which a session may be realized, but cannot deduce—or explain—the necessity of a given session based on the computational model. This becomes particularly apparent when viewing a session between say, an application and a database from the user's perspective.
- the organizational-use model describes relations among MO's that are motivated by people wanting to accomplish some purpose through use of the distributed computing ensemble.
- This model is a grouping mechanism that informs the viewer of the preferred embodiment of the invention that particular MO's are related, but does not explain why. For instance, if user U n is a VP of marketing, she may need a number of networked automation resources available to her at all times to perform her job effectively.
- Business unit BU m is, say, the Comptroller's department. That department's job function also calls for a number of networked automation resources to be available at all times.
- users and groups of users are containers of networked resources.
- These resources can be computers, application, telephone extensions, printers, and so forth.
- a depiction such as that in FIG. 23 represents these groupings of needed resources as containers whose elements match users' basic level of categorization of resources.
- if business users “see” their resources as computers and applications running on them, not all the behind-the-scenes devices, services and middleware that proper functioning of those resources depends on, then that is their effective grouping. Because the model contains the fully elaborated microscopic level of “behind-the-scenes” resources and their service relations, the end user's view does not preclude detail, it simply removes it from the center of focus.
- while FIG. 23 may be an effective graphical representation of business unit or user containment, it is not the only notational variant that may be useful, and is not the only variant within the scope of the invention.
- FIG. 24 depicts a user U as a service consumer of application A in path notation; this notation makes clear that U suffers the impact of changes to A—as far as the model is concerned, there is no inherent difference between devices and end users for spread of impact.
- relations between MO's are classified as either simple or complex.
- a “session” in this embodiment comprises both path and service, as explained above. In a complex relationship, these two elements vary independently, while in a simple relationship, they vary in unison.
- networkAccess is a simple relationship in UNIX.
- the network protocol is responsible for providing network access—i.e., networkAccess is a service of the network protocol.
- there is no particular path choice that the system can make; by virtue of running on the OS, the application potentially has the service.
- Physical connections between, say, a card and a backplane are also simple; in this case, however, the path is the focus. There is no particular service choice available. By virtue of being seated in the containing box, the card is connected to the backplane.
- the simple isConnected relationship here does not even specify that the system uses the given connection, let alone for what purpose. It records that there is a connection.
- paths are important for the reasoning engine to compute the impact of physical and lower-level logical failures on higher-level sessions.
- Services categorize the kinds of relationships that make up the system at increasing levels of abstraction, and therefore provide a dimension of impact analysis from differing perspectives. For example, suppose one wants to know how reliable database service was for some set of business units. Categorization by service allows the model to cut simply across different database service providers and provide a meaningful aggregation of both fault and impact data for that service.
- relationship types have the following attributes:
  - ID: internal symbol
  - Name: human readable name
  - Cardinality: one-to-one, one-to-many, many-to-one, many-to-many
  - Dependency: parent, child, both, neither
  - Complex/Simple: complex (independently varying path and service) or simple (dependent variation of path & service)
- Relationship rules specify what entity types participate in what types of relationships.
- the preferred embodiment splits rules into producer (or parent) halves and consumer (or child) halves. This split provides a very flexible means of defining constraints on relationship participants, and allows the model to perform the discovery process described above, during runtime.
- a relationship rule has the following attributes:
  - ID: foreign key back to service type
  - Parent/Child rule: <see example below>
  - Class: foreign key to the entity type
  - NextHopService Type: foreign key to a service type; used for path determination -- for instance networkAccess
  - Partner Restriction: foreign key to an entity type
- a given entity type inherits the relationship rules of its superclasses. It has the ability to add new rules that further constrain a more general rule, or even block the use of a rule defined by superclass types. For example, suppose that in a given client site, both IP and IPX provide networkAccess services. The preferred embodiment can express this capacity as a general network access rule for applications. Client application A can use only IP, however. The definition for applications of the same type as application A, then, needs to override a general (child) network access rule that offers no restriction and provide a rule that restricts instances to IP.
- the model definer can also define a default value for the name attribute (such as “NEWClient” and “NEWServer”), as illustrated in the following tables:
  NEWService definition:
  - ID: NEWservice
  - Name: New Service
  - Cardinality: one-to-many
  - Dependency: child
  - Complex/Simple: complex
  NEWService parent rule:
  - ID: NEWService
  - Parent: NEWServer
  - NextHop: networkAccess
  - Partner Restriction: none
  NEWService child rule:
  - ID: NEWService
  - Child: NEWClient
  - NextHop: networkAccess
  - Partner Restriction: none
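Rule inheritance with override, as in the IP and IPX example, can be sketched as a walk up the entity-type hierarchy that stops at the first rule found. The dictionaries, type names and function names below are hypothetical; they illustrate the lookup order only, not the repository schema.

```python
# Hypothetical entity-type hierarchy: each type names its superclass (or None).
ENTITY_PARENT = {"application": None, "applicationA": "application"}

# Child (consumer) halves of relationship rules, keyed by (entity type, service).
# A partner restriction of None means any provider type is acceptable.
CHILD_RULES = {
    ("application", "networkAccess"): {"partner_restriction": None},
    ("applicationA", "networkAccess"): {"partner_restriction": "IP"},
}


def effective_rule(entity_type, service):
    """Return the most specific rule, searching from the type up through its superclasses."""
    current = entity_type
    while current is not None:
        rule = CHILD_RULES.get((current, service))
        if rule is not None:
            return rule
        current = ENTITY_PARENT[current]
    return None


def may_connect(entity_type, service, provider_type):
    """True if the effective rule lets this entity type consume from provider_type."""
    rule = effective_rule(entity_type, service)
    if rule is None:
        return False
    return rule["partner_restriction"] in (None, provider_type)
```

With these tables, a generic application may take networkAccess from either IP or IPX, while an application of the same type as application A is limited to IP.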
- the invention has discovered what components populate the external system and constructed a model representing the service relationships that exist among those components.
- the system administrator has been able to define, in addition to the individual hardware and software components of the external system, higher-level groupings thereof, based on users' groups, business units or whatever other basis meets needs of the organization that owns the external system.
- FIG. 25 illustrates the state of a portion of the model after the dynamic agents have collected reported information from the resources represented by hub H7, computers C10 and C11 and applications A14 and A16.
- the MO's in this portion of the model report that they have received state-changing events.
- FIG. 26 illustrates the effects of the model's rootward graph traversal.
- the event at node H7 indicated that it is a root cause. Therefore, rootward traversal through the model stops at H7, and the model generates an alarm that focuses on H7 but contains all the concomitant sympathetic events from the affected leaves.
- application A14 may not appear in the FIG. to be directly leafward from the root-cause event at H7, because of the direction of the arrow representing the session between that application and database D. In fact, however, the existence of that session means that that application is included in a leafward traversal from H7.
- FIG. 27 illustrates the leafward spread of impacts through this portion of the model.
- the failure of the hub at H7 has caused loss of service to computers C10 and C11, and to sessions S19 and S20 that connect to database D15 running on computer C10.
- the loss of these two sessions interrupts the work that people in business units BU22 and BU23 are doing.
- the model has predicted the impact of a lower-level device failure on the activities of end users. These users are not resources that the system recognizes, nor resources that the system is able to collect data about (except through the intermediary of a trouble ticketing process). While the model's predictions for computers C10 and C11 may have corroborating evidence from the sympathetic events logged for those nodes, users and business units may well in a given instance have only predicted impacts at this time.
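The rootward search for a root cause and the leafward spread of impacts can be sketched as two breadth-first traversals over a dependency graph. The topology below is an assumption loosely following the labels in the text (for instance, that A14 and A16 run on C11 and that each session depends on both an application and database D15); the figures themselves may differ.

```python
from collections import deque

# Hypothetical dependencies: each consumer maps to the providers it relies on.
DEPENDS_ON = {
    "C10": ["H7"], "C11": ["H7"],
    "D15": ["C10"], "A14": ["C11"], "A16": ["C11"],
    "S19": ["A14", "D15"], "S20": ["A16", "D15"],
    "BU22": ["S19"], "BU23": ["S20"],
}


def find_root_cause(start, root_cause_events):
    """Traverse rootward from an MO until one with a root-cause event is reached."""
    seen, pending = {start}, deque([start])
    while pending:
        node = pending.popleft()
        if node in root_cause_events:
            return node
        for provider in DEPENDS_ON.get(node, ()):
            if provider not in seen:
                seen.add(provider)
                pending.append(provider)
    return None


def predicted_impacts(root):
    """Traverse leafward from the root cause, collecting every dependent MO."""
    dependents = {}
    for consumer, providers in DEPENDS_ON.items():
        for provider in providers:
            dependents.setdefault(provider, []).append(consumer)
    impacted, pending = set(), deque([root])
    while pending:
        for consumer in dependents.get(pending.popleft(), ()):
            if consumer not in impacted:
                impacted.add(consumer)
                pending.append(consumer)
    return impacted
```

Business units appear in the impact set even though no event was collected from them, which mirrors the distinction the text draws between corroborated and purely predicted impacts.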
- A schematic summary of the interaction phase is shown in FIGS. 28 through 32.
- the Analyzer function of the Agent Manager in the preferred embodiment is responsible for interpreting incoming data from whatever source, recognizing the data as an instance of some predefined event of interest to the system, and creating the association between the captured external event and the model event.
- Techniques of analyzing, interpreting and associating messages vary according to the data collection technique.
- Events in the model are instances of a pattern known as an “eventDefinition”.
- EventDefinitions function as a kind of frame to hold information culled from a captured event.
- An example of the structure of an eventDefinition is as follows:
  - resultantState: severity indicator
  - message Text: foreign key to a message dictionary based on language; each message has substitutable parameters defined that are filled in with values pulled out of actual events
  - eventId: a unique identifier for the event
  - isRootCause: true or false
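An eventDefinition with a language-keyed message dictionary might be sketched as follows. The field names are adapted to Python naming, and the dictionary contents, message key and `render` method are assumptions made for illustration.

```python
from dataclasses import dataclass
from string import Template

# Hypothetical message dictionary keyed by (language, message key).
MESSAGES = {
    ("en", "LINK_DOWN"): "Port $port on $device is down",
}


@dataclass(frozen=True)
class EventDefinition:
    event_id: str         # unique identifier for the event
    resultant_state: str  # severity indicator
    message_key: str      # foreign key into the message dictionary
    is_root_cause: bool

    def render(self, language, **values):
        """Fill the message's substitutable parameters with values from an actual event."""
        return Template(MESSAGES[(language, self.message_key)]).substitute(values)


link_down = EventDefinition("evt-001", "critical", "LINK_DOWN", True)
text = link_down.render("en", port=3, device="H7")
```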
- if the model captures information about an entity type via SNMP, it will have relationships to several SNMP objects in the repository: the SNMP enterprise OID; a table mapping trap numbers to eventDefinitions; and one or more MIB tables.
- if the model captures information about an entity type via log file tailing, it will have relationships to several logging objects in the repository: the file(s) to tail; tables of pattern matches (inclusion criteria); parsing rules (what fields to extract from the message); and the eventDefinition.
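For log file tailing, the inclusion criteria and parsing rules can be sketched with regular expressions. The log line layout, the field names and the event identifier are assumptions of this sketch; only the idea of filtering, extracting fields and associating the result with an eventDefinition comes from the text.

```python
import re

# Inclusion criterion: keep only messages that begin with "[amd]".
INCLUSION = re.compile(r"^\[amd\]")
# Parsing rule: which fields to extract from an included message (assumed layout).
PARSING = re.compile(r"^\[amd\]\s+(?P<host>\S+)\s+(?P<detail>.+)$")


def analyze(lines, event_id="MOUNT_FAILURE"):
    """Turn matching log lines into event records tied to an eventDefinition id."""
    events = []
    for line in lines:
        if not INCLUSION.search(line):
            continue  # fails the inclusion criteria
        match = PARSING.match(line)
        if match:
            events.append({"eventId": event_id, **match.groupdict()})
    return events


events = analyze([
    "[amd] host1 mount failed",
    "[cron] host1 job completed",
])
```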
- dynamic agents are semi-autonomous Java applications that are responsible for: interacting with OS-specific or protocol-specific sources; eliciting data according to a specific “recipe”; and analyzing the elicited data.
- each dynamic agent comprises three functional (although not necessarily structural) components: a Sensory Monitor, an Instruction Set, and an Analyzer.
- the sensory monitors interact with system- or high-level protocols—for instance, TCP/IP sockets, OS file systems, CMIP or SNMP.
- Instruction sets belong to the process of eliciting data. They are control parameters and methods that direct sensory monitors in how to connect to a particular host, to delay three minutes between polls, etc.
- Analyzers belong to the process of analyzing data. They interpret and winnow information coming in from the dynamic agents (for instance, “take all messages that begin with [amd];” etc.).
- the dynamic agents run according to a thread-based active object model, comprising at least one independent thread constantly running in an endless main loop. External system events interrupt the main loop; message-appropriate callbacks run on interrupt. New control information thus interrupts dynamic agents and alters their states on the fly.
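The thread-based active object model can be sketched as a thread whose main loop waits on an inbox, runs a callback when a control message arrives, and otherwise polls when its interval elapses. The class, message names and state keys are hypothetical, and the actual polling work is reduced to a counter.

```python
import queue
import threading


class DynamicAgent(threading.Thread):
    """Active object: a main loop that control messages can redirect on the fly."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        # Default of three minutes between polls, as in the instruction-set example.
        self.state = {"poll_interval": 180.0, "polls": 0}
        self._keep_running = True
        self._callbacks = {
            "set_interval": self._on_set_interval,
            "stop": self._on_stop,
        }

    def _on_set_interval(self, value):
        self.state["poll_interval"] = value

    def _on_stop(self, _value):
        self._keep_running = False

    def run(self):
        while self._keep_running:
            try:
                kind, value = self.inbox.get(timeout=self.state["poll_interval"])
            except queue.Empty:
                self.state["polls"] += 1  # interval elapsed: a real agent would poll here
                continue
            self._callbacks[kind](value)  # message-appropriate callback


agent = DynamicAgent()
agent.start()
agent.inbox.put(("set_interval", 60.0))
agent.inbox.put(("stop", None))
agent.join(timeout=5)
```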
- the Agent Manager hosts the independent threads, and provides system and housekeeping services including: ensuring that all threads that should be running, are running; knowing where to send the data collected by the dynamic agents; handling thread interruption, termination and monitoring; and offering inter-thread messaging, queuing and synchronization services. That is, the Agent Manager provides an execution context for the dynamic agents.
- the last aspect is the key to the others.
- the Agent Manager needs to handle the scheduling of all modules running within it, and to ensure that they have all necessary resources.
- the implementation of the running module is a black box from the viewpoint of the Agent Manager. Only the requester (human or automated) knows what the module is for. The requester needs to retain addressability of the module and to receive the collected information. Two independent subsystems in the environment, therefore, need to control the module simultaneously.
- the implementation of the module reflects user interpretation and business needs, and so is not predictable at design time. It does not play a role within the bounds of the closed system, so its role cannot be structural as far as the execution context is concerned.
- the system provides a fixed path for passing control and information messages for the individual request by assigning arbitrary CORBA CosNaming service names for both directions of the conversation at the time of the request. Such assignment tells a user request to address requests through service name X and receive responses through service name Y. (The CosNaming service also serves to define the cascades of Agent Managers discussed previously.)
- the ad hoc module receives the complementary information: listen for interrupts on service name X and return information on service name Y. Implementation remains up to the individual request.
- a bundle is a complex structure that contains modules, named parameter lists, and corequisite dependencies.
- This bundle encapsulates the following: the name of a Java module to run, three named sets of parameters, and a corequisite service the Agent Manager must offer. Bundling software needs to perform symbolic substitutions, and so requires a parameter list namespace.
- This bridge-polling bundle shares the Java module and corequisite service needs of the hub-polling bundle illustrated above. It achieves reuse by changing the incoming parameter lists designating the devices to poll, the variables polled for, and the frequency with which to poll them. In this instance, both bundles happen to be stored as such in the Control Repository. This is an implementation detail that reflects that the two kinds of polling are frequent occurrences for the system administrator, and therefore convenient to have persist. On-the-fly overrides can produce the same results, but demand that the administrator spend more effort at runtime.
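Reuse of a bundle through parameter-list overrides might be sketched as below. The module name, the corequisite service name, the device names and the particular polled variables are all illustrative assumptions; the point is that the bridge-polling bundle differs from the hub-polling bundle only in the parameter lists it overrides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    module: str               # name of the module to run
    corequisite_service: str  # service the Agent Manager must offer
    parameter_lists: dict     # named parameter lists

    def with_overrides(self, **overrides):
        """Return a new bundle in which the named parameter lists are replaced."""
        unknown = set(overrides) - set(self.parameter_lists)
        if unknown:
            raise KeyError(f"unknown parameter lists: {sorted(unknown)}")
        return Bundle(
            self.module,
            self.corequisite_service,
            {**self.parameter_lists, **overrides},
        )


hub_polling = Bundle(
    module="com.example.SnmpPoller",  # hypothetical module name
    corequisite_service="snmpAccess",
    parameter_lists={
        "devices": ["hub-1", "hub-2"],
        "variables": ["ifInErrors"],
        "frequency": {"seconds": 180},
    },
)
bridge_polling = hub_polling.with_overrides(
    devices=["bridge-1"], variables=["dot1dBaseNumPorts"]
)
```

Storing both bundles corresponds to the persisted case described above; calling `with_overrides` at request time corresponds to an on-the-fly override.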
- ad hoc requests are a mechanism for a user to manipulate the demand-data-flow behavior of the system.
- this behavior is configured in a more-or-less predetermined way administrators set up beforehand (different bundles that have much overlap).
- the behavior is fully ad hoc (the same bundle gets on-the-fly overrides).
- the system has the ability to add new behavior dynamically, in accordance with user needs.
- An existing, well-defined static structure provides the mechanism for adding the new behavior.
- Any instance of the system of the invention must be able to provide a site-specific inventory of these services. Unless all services are available on all distributed Agent Managers, the system needs to maintain a catalog that tells it where it has distributed those services—which components it has charged with the responsibility of executing the proper tasks. A model such as CORBA Trading Services can meet this need.
- an Agent Manager charged with offering a given service needs to know how to offer it. Bundles play a role here, too.
- the Agent Manager requests a bundle from the Control Repository.
- the Repository maps the service structurally to a specific bundle (i.e., a module and its concomitant set of parameter lists). An operator can alter the implementation of the service at will, even dynamically.
- An Agent Manager with no control information starts up running only the configuration handler, the communications handler, the thread manager and the inter-thread queue routines. At this point, the Agent Manager has no information as to what its goal is, what dynamic agents to run, or how to pass its data to the next hop.
- An Agent Manager receives its higher level objectives through two data structures: the mission package and the bundle execution.
- The mission package can contain bundle executions, but its main purpose is to provide control information that allows the Agent Manager to situate itself in the mesh of the infrastructure (that is, what service it should advertise for the cascade, and what service it should send to for the cascade).
- Bundle executions are the instantiation of data structures that reside in the Control Repository called bundles.
- A bundle, as already described, ties together named parameter lists, named parameters, and executable modules or scripts for building dynamic agents. Various portions of the bundle can be overridden to allow easy reuse.
- Once a dynamic agent has started running within the context of the Agent Manager, it is independently addressable. That is, whatever component controls the dynamic agent (the end user in the case of an ad hoc request, the model in the case of a programmatic request) is able to address the dynamic agent independently of the Agent Manager. In other words, there is no need for the higher-order entity to understand the topology or constitution of the infrastructure in order to keep tabs on, or manipulate, the dynamic agent as it runs. Agent Managers are also independently addressable, since there are occasions, especially times of administrative interaction, where the Agent Manager, rather than some dynamic agent running under it, needs to receive control information.
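- The relationship between a bundle held in the Control Repository and a bundle execution can be illustrated by a minimal sketch. The class name `Bundle`, the helper `instantiate`, the service name `pollRouter` and all parameter names are hypothetical and are not taken from the preferred embodiment; the sketch only assumes that a bundle ties a module to named parameter lists and that an execution may override portions of it.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Bundle:
    """Hypothetical stand-in for a bundle stored in the Control Repository."""
    name: str
    module: str  # executable module or script used to build a dynamic agent
    parameter_lists: dict = field(default_factory=dict)  # list name -> {parameter: value}

def instantiate(bundle, overrides=None):
    """Build a bundle execution: a copy of the bundle with on-the-fly overrides applied."""
    merged = {name: dict(params) for name, params in bundle.parameter_lists.items()}
    for list_name, params in (overrides or {}).items():
        merged.setdefault(list_name, {}).update(params)
    return {"bundle": bundle.name, "module": bundle.module, "parameters": merged}

# Assumed repository layout: a service name maps to the bundle that implements it.
repository = {
    "pollRouter": Bundle(
        "pollRouter",
        "snmp_poll",
        {"timing": {"interval_s": 60}, "target": {"host": "router-1"}},
    ),
}

# An ad hoc request reuses the stored bundle but overrides one named parameter.
execution = instantiate(repository["pollRouter"], {"timing": {"interval_s": 5}})
print(execution["parameters"])
```

In this sketch the stored bundle is left unchanged, so the same bundle can serve both a predetermined configuration and an ad hoc request.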
Description
- 1. Technical Field of the Invention
- This invention relates generally to the field of the operation and management of complex systems, including the operation and management of computer networks.
- 2. Description of Related Background Art
- The present invention is intended to facilitate the management of a large-scale, far-flung computer network, such as the extensive distributed systems that are commonplace nowadays in large organizations. The person or team responsible for this job is typically in charge of everything from the organization's power supplies through its business software applications. The organization's business management, naturally, may not wish to concern itself with the technical details, but does demand that when problems occur, they be dealt with according to the seriousness of the effects they have on the normal operations of the business. For example, management will want the greatest attention to be paid to those problems that affect the highest revenue generators among the various parts of the business organization.
- This is a difficult demand to meet. For many network operation managers, it can be very hard just managing the network, identifying, diagnosing and correcting problems as they occur. Being able to prioritize among a set of problems occurring during the same time period in such a way as to differentiate among levels of service being provided to different parts of the business organization has thus far been beyond contemplation. One important purpose of the present invention is to make this goal attainable.
- The phenomenal complexity of the world of a large distributed network of interrelated components is reflected in the distribution of costs involved in managing such a system.
- According to one study, about $2.00 of every $10.00 spent on distributed systems engineering and operations is spent on engineering, while the other $8.00 goes to operations. Moreover, about $6.00 of that $8.00 is spent on problem isolation and diagnosis, while only about $2.00 goes to problem resolution.
- If it takes on average three times as long to identify a problem as it does to solve it, then the soup of distributed system parts (hardware and software) and their interrelationships must be nearly impenetrable to the operators.
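- The proportions quoted from the study can be checked with a short calculation. The figures are those given in the text; the variable names are illustrative, and the sketch assumes, as the text does, that spending is a fair proxy for time.

```python
# Figures quoted in the text, per $10.00 spent on distributed systems.
engineering = 2.00
operations = 8.00
diagnosis = 6.00    # problem isolation and diagnosis
resolution = 2.00   # problem resolution

# The two breakdowns are consistent with each other.
assert engineering + operations == 10.00
assert diagnosis + resolution == operations

diagnosis_share = diagnosis / operations          # fraction of operations spending
diagnosis_to_resolution = diagnosis / resolution  # diagnosis relative to resolution

print(f"{diagnosis_share:.0%} of operations spending goes to diagnosis")
print(f"diagnosis costs {diagnosis_to_resolution:.0f}x as much as resolution")
```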
- This complexity has many sources:
- Hardware and software components are heterogeneous.
- System components are globally distributed.
- Subcontractors may be running the system, or parts of it, on their own sites, or the business's, or both.
- Engineers include multiple redundancies in the design of the system to minimize outages, but each redundancy adds extra complexity to manage.
- Systems themselves are not self-aware, and cannot report what is wrong with them. At best, individual components can report their states.
- Component reuse leads to the same components participating in multiple run-time relationships.
- The “health” of a given component increasingly depends on a contextual, not isolated, evaluation of its state.
- A given underlying condition may affect different users in different ways, or to different degrees—one user may be affected seriously, another critically, another benignly or not at all.
- Problems cascade; locating the eye or center of a storm of phenomena is not easy.
- It may even be deemed surprising that only 75% of operations time is spent on identifying problems.
- At present, operators are unable to tell how a given problem affects the various users in the business organization, and therefore are unable to know where they should direct enhanced or reduced service efforts, until the problem has been correctly identified. One result of this is that the operations managers have only the other 25% of operations time—the problem resolution portion—from which to carve out all service differentiation.
- What is worse, identification of the problem does not necessarily lead clearly to successful resolution of the problem. For example, suppose that the operator has correctly identified the root of a given problem as a bad card in an IP (“Internet Protocol”) router. Do any critical business systems depend on that router? Perhaps, or perhaps not.
- Continue with the same example. Suppose that the malfunctioning router lies on one leg of a redundant circuit that connects many disparate data delivery functions in a financial services organization. What effect does the fault have on various users?
- The network system administrator always needs to know immediately, so that he can go and replace the card.
- The manager of a profitable business unit may have invested in redundant circuits, and so experiences no problem.
- The manager of a mid-sized unit has co-invested in redundant circuits with another business unit; their joint load on the single remaining circuit permits continued service, but performance deteriorates.
- Network engineering has been experimenting with new router cards on their alternate circuit and has rendered that circuit inoperable; they have no service at all.
- A market analyst in Brussels receiving critical data from Hong Kong is going to be delayed when she loses all service; she need not have any idea what a router is, or that one exists, but she does need to understand quickly the impact of its disappearance on her work.
- A capacity planner needs to know the frequency with which router cards fail, if one brand suffers more failures than another, or if it is necessary to invest in redundant circuits for a group of users whose work is time-sensitive. She does not need to know this instant that some specific router had a bad card.
- This single example of a set of failures among computing system components has affected users quite differently. For operations personnel, knowing that the cause of the current set of events was a malfunctioning router card is a start, but provides inadequate understanding for addressing all these needs.
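- The router example can be restated as a minimal sketch in which one fault yields different impacts for different user groups. The circuit names, the `Impact` levels and the function `impact_of` are hypothetical; the sketch assumes that each group's impact depends only on which of its circuits survive and whether a surviving circuit is shared.

```python
from enum import Enum

class Impact(Enum):
    NONE = "no effect"
    DEGRADED = "degraded performance"
    OUTAGE = "no service"

def impact_of(circuits, failed, shared=frozenset()):
    """Classify the impact on one user group.

    circuits: circuits the group can use
    failed:   circuits currently down
    shared:   circuits that also carry another group's load
    """
    alive = set(circuits) - set(failed)
    if not alive:
        return Impact.OUTAGE
    if alive <= set(shared):
        return Impact.DEGRADED
    return Impact.NONE

# Circuit A carries the router with the bad card; circuit D was disabled by
# the network engineers' experiments with new router cards.
failed = {"A", "D"}
groups = {
    "profitable unit": (["A", "B"], set()),      # own redundant circuit B
    "mid-sized unit": (["A", "C"], {"C"}),       # co-invested, shared circuit C
    "network engineering": (["A", "D"], set()),  # alternate circuit D is down
}
report = {name: impact_of(c, failed, s) for name, (c, s) in groups.items()}
for name, impact in report.items():
    print(f"{name}: {impact.value}")
```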
- Before the operator can direct problem resolution efforts to a specific part of the business organization, therefore, he or she needs to understand the systemic impact of the problem. Impact is sensitive to a wide system context, and even to conditions of the moment (for instance, the task the Brussels analyst is working on). The operations manager can attempt to deliver differentiated levels of service only when she knows whether and how this particular fault has affected particular groups of users under the conditions of the network at the time of the failure.
- It is one object of the present invention to provide a solution to the problem described above. In particular, it is an object to provide the ability to understand the impacts of a given problem on different parts of the organization using the system, at the time the problem occurs, so as to be in a better position to direct problem resolution efforts and problem alleviation efforts intelligently.
- Another object of the invention is to provide the ability to model, not only the significant hardware and software resources of the system being administered, but also the service relationships connecting those resources, in a flexible, dynamic manner, so that changes to the construction or make-up of the system being managed can be reflected promptly in the model without the need to restart the model or otherwise to interrupt running the model.
- Another object of the invention is to provide a method and system that can associate related events that are of interest to the operators and users of the administered system, and present the results quickly and in a way that makes the information easy to use.
- Another object of the invention is to provide a method by which one can flexibly model a system, and in which one can represent, not only the hardware and software resources of the system being modeled, but also arbitrarily-defined groups of those resources.
- Still another object of the invention is to provide a method and system in which the operators or a user can define, as needed, a set of data to be obtained relating to the performance of the modeled system, and to provide a particularly convenient way to organize control data to fulfill those requests using agents to obtain the required information.
- The preferred embodiment provides a software model of the managed network, and includes a flexible infrastructure for the purpose of obtaining information from the managed network and reporting it as appropriate. In runtime, the data-gathering infrastructure is used to obtain information about what components are present in the network, and about what services each is providing to which other component(s). This information is used to construct the model. In addition, the data-gathering infrastructure obtains from the managed resources information relating to any malfunction or performance degradation, and reports this information to the model, which modifies its state accordingly. The structure of the model itself is used to predict the likely impacts of the reported occurrence, and the occurrence and its predicted impacts are displayed. As all this happens, the data-gathering infrastructure also obtains information concerning the addition of new components to the managed network, the deletion of others, etc., allowing the model to update itself during runtime.
- In addition, the system administrators can define elements in the model to represent arbitrary groupings of components, such as business units. As a result, the model predicts impacts not only on individual hardware and software components but also on larger entities that are of significance to the organization using the invention and the managed network.
- The data-gathering infrastructure is conceptually distinct from and independent of the model. In the preferred embodiment, this infrastructure has a number of significant features, including a hierarchical structure that results in the ability to provide the model with as large a stream of data as may be necessary, while limiting the number of interrupts per unit time that the model must tolerate. In addition, this infrastructure preferably has the ability to be given new sets of working instructions during runtime, so that new types of information can be acquired, without the need for restarting the running of the program. Customized inquiries can also be provided in this way. Moreover, the data-gathering infrastructure uses software agents having a structure that makes possible a high degree of reusability, in the form of reusable modules that can be kept in a repository for that purpose.
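- One way a hierarchical infrastructure could limit the number of interrupts reaching the model is for an intermediate tier to forward events in batches. The following sketch is an assumption made for illustration only; the text does not state that batching is the mechanism used, and the class `Relay` and its parameters are hypothetical.

```python
class Relay:
    """Hypothetical intermediate tier: collects events and forwards them in batches."""

    def __init__(self, upstream, batch_size):
        self.upstream = upstream      # callable that receives a list of events
        self.batch_size = batch_size
        self.pending = []

    def receive(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.upstream(list(self.pending))
            self.pending.clear()

deliveries = []  # each entry represents one interruption of the model
relay = Relay(deliveries.append, batch_size=4)
for i in range(10):
    relay.receive({"source": f"agent-{i % 3}", "seq": i})
relay.flush()  # forward whatever remains

print(len(deliveries), "deliveries for 10 events")
```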
- It is to be emphasized that it is by no means necessary to use all these features together; many can be used independently of the others, to great advantage, within the scope of the invention.
- The foregoing and other objects, features and advantages of the invention will be more fully appreciated from the following detailed description of the preferred embodiment, taken in conjunction with the accompanying drawings.
- In the accompanying drawings, like reference characters refer to like elements, throughout.
- FIG. 1 is a schematic illustration of a simplified example of a distributed computing ensemble to the management of which the present invention is applicable.
- FIG. 2 illustrates one example of a node, and shows the relationship between some basic logical components and the physical components.
- FIG. 3 illustrates a node that is a computer.
- FIG. 4 provides an illustration of the overall flow of information in a model constructed according to the preferred embodiment of the present invention.
- FIG. 5 shows a detail of the system fragment illustrated in FIG. 4.
- FIG. 6 provides a high-level perspective of how events can take different paths from the agent manager to further processing.
- FIG. 7 illustrates how events that report the discovery of some new managed resource are directed to the factory process.
- FIG. 8 illustrates how events representing reported and detected faults go to the Dispatcher component of the model server, which then directs the event to the corresponding managed object.
- FIG. 9 illustrates the lifecycle of an event from the model server Dispatcher process to alarm formation.
- FIG. 10 illustrates conversations among various functional components of the preferred embodiment.
- FIG. 11 illustrates classification of the functional components of an agent manager.
- FIG. 12 is a schematic illustration of making an agent manager and dynamic agents data driven.
- FIG. 13 shows the flow of control in an agent manager.
- FIG. 14 shows three phases of the model architecture.
- FIGS. 15, 16, 17, 18 and 18A illustrate the discovery process during runtime.
- FIG. 19 illustrates some of the computational services that the discovered managed objects provide and consume.
- FIG. 20 illustrates some higher-level services that discovered managed objects in FIG. 19 provide in response to the organizational use of the computational systems.
- FIG. 21 represents the world of the model in terms of a network that has emerged from realized computational services or paths.
- FIG. 22 illustrates the model including, in addition to paths, sessions.
- FIG. 23 represents groupings of needed resources as containers whose elements match users' basic level of categorization of resources.
- FIG. 24 depicts an alternative way to view these relationships.
- FIG. 25 illustrates the state of a portion of the model after dynamic agents have collected reported information from the resources.
- FIG. 26 illustrates the effects of the model's rootward graph traversal.
- FIG. 27 illustrates the leafward spread of impacts through this portion of the model.
- FIGS. 28, 29, 30, 31 and 32 schematically illustrate the interaction phase of runtime.
- FIG. 1 is a schematic illustration of a distributed computing ensemble (actually, a combination of networks interconnected to each other) as a simplified example to the management of which the present invention is applicable. In FIG. 1 are shown several subnetworks, each including a number of workstations connected together in various ways. Many or all of the workstations are connected to an intranet or to the Internet, or both. The system shown includes workstations in several offices, located in different buildings in a number of countries and continents. The workstations each run on an operating system (“OS”), but not necessarily on the same one. Also, each workstation may at a given time be running one or more application programs, any of which may require the accessing of various databases or files located at various places in the system.
- The workstations are used for the purposes of an enterprise, which has organized its business functions into a variety of subdivisions, departments, etc. A number of these subdivisions are indicated in FIG. 1, including a financial research division and an accounting department. Because of the potential for malfunctions to deprive workstations of the access they need to other parts of the system (access to a particular database, for example), the enterprise has constructed redundant communication paths between certain of its departments and critical resources, although such redundancies are not shown in FIG. 1.
- While only a few terminals and a few connections between them are shown, an actual system might contain hundreds or thousands of workstations, with correspondingly complex connections, redundancies and interdependencies. All aspects of the management and operation of such a system are of formidable complexity, and offer nearly boundless challenges and potential frustrations to the person or team in charge of that management.
- Hereinafter, the system that is being managed using the present invention will be termed the “external system”.
- The preferred embodiment of the present invention can most easily be thought of as comprising two major parts: a model of the external system, and a data-gathering infrastructure that obtains data needed by the model. There are also a model server and an operational database that support the model. The model disclosed herein could, within the scope of the invention, be used with another arrangement for gathering the required information from the external system and delivering it, and conversely, the data-gathering infrastructure can be used in many applications other than providing information to a model of a complex computer network that is being managed.
- The model simulates the evolution of faults and performance degradations through the external system. The model enables those who use the services of the various parts of the external system to specify the nature of their reliance thereon, in terms that will set clearly the expectations held of the operations personnel for that system. Those personnel, in turn, are enabled by the model to do what is necessary to fulfill those expectations, by showing them quickly what has gone wrong in the external system. The model also enables the operations personnel to know which part of the user community has suffered the brunt of any fault or performance degradation.
- An Overview
- Before proceeding with the details of the preferred embodiment, an overview of that embodiment will be useful. The preferred embodiment of the present invention, upon installation, begins by initiating a discovery process, in which it explores the external system it is to be used in managing. In the discovery process, it locates each hardware and software component of the external system, and identifies what type of component each is (e.g., hub, router, computer, operating system, application, database, etc.). By obtaining this information from each component, and using it to determine what services each component is receiving from or providing to other components (the concept of “services” is explored more precisely below; for the present, a naive concept of the term will suffice to convey a broad view of the preferred embodiment), the invention constructs a model of the external system. This model represents the various components, relevant subcomponents, and their service relationships to each other. It should be noted that, in the preferred embodiment, the model itself does not contain all available information about the nature of each component—that is, although that information is used in the process of constructing the model, only a subset of the information acquired during the discovery process need end up in the model. This greatly simplifies the model itself, reducing the computing resources required.
- If there is any component of the external system which cannot be identified adequately in this fashion by the program itself, the system administrator can manually input the information needed to include a representation of that component and of its relationships with other components, in the model. In the preferred embodiment, the represented components and subcomponents are modeled as software objects, utilizing the well-known techniques of object-oriented programming. The objects corresponding to components and sub-components are termed “managed objects” herein; again, a more precise statement of what that term means in relation to the present invention is given below. The model constitutes an example of the type of mathematical entity known as a directed graph, in which the managed objects are the nodes of the graph, and the relationships are the edges.
- One important feature of the preferred embodiment is that the system administrator has the ability to define additional managed objects in the model, which additional objects correspond to arbitrary groupings of simpler objects in the model. In particular, the administrator can define as such an object any particular user group that has significance to the operation of the business enterprise or the like to which the external system belongs. (The administrator can also make such other changes to the model as appear suitable, including the deletion of existing objects, the ascription of particular characteristics to existing objects, etc.)
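- The directed graph just described, including an administrator-defined object for a user group, might be represented along the following lines. The class `Model`, its methods, the object names and the service labels are hypothetical; this is a sketch of the data structure only, not of the preferred embodiment's Smalltalk implementation.

```python
from collections import defaultdict

class Model:
    """Hypothetical directed graph: managed objects are nodes, services are edges."""

    def __init__(self):
        self.kinds = {}                    # managed object name -> component type
        self.provides = defaultdict(list)  # provider -> [(service label, consumer)]

    def add_object(self, name, kind):
        self.kinds[name] = kind

    def add_service(self, provider, label, consumer):
        if provider not in self.kinds or consumer not in self.kinds:
            raise KeyError("both endpoints must be managed objects in the model")
        self.provides[provider].append((label, consumer))

    def consumers_of(self, provider):
        return [consumer for _, consumer in self.provides[provider]]

model = Model()
for name, kind in [
    ("router-1", "router"),
    ("ip-host-1", "network protocol"),
    ("ordersDb", "database"),
    ("brusselsAnalysts", "user group"),  # defined by the administrator, not discovered
]:
    model.add_object(name, kind)

model.add_service("router-1", "forwards", "ip-host-1")
model.add_service("ip-host-1", "networkAccess", "ordersDb")
model.add_service("ordersDb", "data", "brusselsAnalysts")

print(model.consumers_of("ordersDb"))
```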
- During operation of the external system, various software agents provided for the purpose acquire information relating to that operation. Each observation that represents a change of state for a modeled component or sub-component of the external system is reported to the model by the agent, and the model reflects the change in the corresponding managed object. Once such a change occurs in the model, an alarm is created, comprising that change and any others which, as described below, are determined to relate to the same basic fault that has caused the change. The preferred embodiment includes a table or the like listing certain changes which are assumed to be root causes (as opposed to being merely the effect of some more basic problem). If the change in question is of a type that is listed in the table, the occurrence of the change is provided with a tag that marks it as a root-cause event. Also, any managed objects whose performance may possibly be affected by the occurrence of the reported change (if that change is a root-cause event) are identified, based entirely on their location in the model (graph) relative to the managed object whose state has changed, and on their service relationships (direct or indirect) with the latter managed object. At this point, the alarm includes the reported events, their root cause (if identified), and the likely consequences.
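- The formation of an alarm from a reported change can be sketched as a table lookup followed by a graph traversal. The event types in `ROOT_CAUSE_TYPES`, the object names and the function names are hypothetical; the sketch assumes that impacts spread from a provider to every direct or indirect consumer of its services.

```python
ROOT_CAUSE_TYPES = {"cardFailure", "powerLoss"}  # hypothetical root-cause table

def downstream(edges, start):
    """Return every managed object reachable from start along provider -> consumer edges."""
    seen, stack = set(), [start]
    while stack:
        for consumer in edges.get(stack.pop(), []):
            if consumer not in seen:
                seen.add(consumer)
                stack.append(consumer)
    return seen

def build_alarm(edges, event):
    """Tag the event if its type is in the table, then predict impacts from graph position."""
    is_root = event["type"] in ROOT_CAUSE_TYPES
    impacts = downstream(edges, event["object"]) if is_root else set()
    return {"events": [event], "root_cause": event if is_root else None, "impacts": impacts}

# Provider -> consumers, for a small assumed portion of the model.
edges = {
    "routerCard": ["router"],
    "router": ["subnet"],
    "subnet": ["marketData", "video"],
    "marketData": ["analysts"],
}

alarm = build_alarm(edges, {"object": "routerCard", "type": "cardFailure"})
print(sorted(alarm["impacts"]))
```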
- A display is provided that shows which managed objects have had reported events that are part of an alarm. Preferably, a visual indication of any root-cause event is also provided; of course, several alarms may occur during the same period of time and are all displayed. The display provides the administrator with a quick way to see what events are being reported, which ones are likely to be related to each other, which (if any) of them is of a type likely to be a root cause, and which portions of the external system (including user groups) are likely to be affected, either immediately or soon, by the reported events.
- A more detailed discussion of the preferred embodiment will now be given.
- For this description, a knowledge of object-oriented programming is assumed. Also, the preferred embodiment is implemented using Smalltalk (a few parts are implemented using Java), and familiarity with these languages is assumed as well. Nonetheless, it should be noted that the particular choice of language(s) is not critical to the invention.
- The Major Parts of the Preferred Embodiment
- In addition to the model and the data-gathering infrastructure, the preferred embodiment includes a model server that maintains the model. An operational data store stores any information useful to the functioning of the model and not found within the model itself. A Dispatcher component controls the processing of events by the model, and a Factory component generates new managed objects, and deletes existing ones, as needed. A Control Data Repository (normally termed simply the “Control Repository” hereinafter) stores necessary control data of various types. Filters are provided to determine what information in the model is to be made available to what users of the system, and a display is used to make that information available in an easy-to-use form. A report engine is also provided to generate reports as requested by the system administrator.
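- The division of labor between the Factory and the Dispatcher can be sketched as a simple routing step. The event fields (`kind`, `object`, `state`) and the function `route` are hypothetical; the sketch assumes only that discovery and removal events go to the Factory while other events go to the Dispatcher, which passes them to the managed object concerned.

```python
class Factory:
    """Creates and deletes managed objects in response to discovery events."""

    def __init__(self, objects):
        self.objects = objects

    def handle(self, event):
        if event["kind"] == "discovered":
            self.objects[event["object"]] = {"state": "ok"}
        elif event["kind"] == "removed":
            self.objects.pop(event["object"], None)

class Dispatcher:
    """Directs fault events to the managed object they concern."""

    def __init__(self, objects):
        self.objects = objects

    def handle(self, event):
        managed_object = self.objects.get(event["object"])
        if managed_object is not None:  # ignore events for unknown objects
            managed_object["state"] = event["state"]

def route(event, factory, dispatcher):
    """Send structural events to the Factory and all others to the Dispatcher."""
    target = factory if event["kind"] in ("discovered", "removed") else dispatcher
    target.handle(event)

objects = {}
factory, dispatcher = Factory(objects), Dispatcher(objects)
route({"kind": "discovered", "object": "router-1"}, factory, dispatcher)
route({"kind": "fault", "object": "router-1", "state": "failed"}, factory, dispatcher)
print(objects)
```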
- Important aspects of the data-gathering infrastructure are the presence of one or more agent managers, which create, run and terminate individual software agents as necessary to obtain the required information regarding the make-up and organization of the external system (to create the model and keep it up-to-date) and regarding that system's activity (to monitor operation and provide the users with the required information concerning faults and performance degradations). Also, the structure of the dynamic software agents themselves facilitates their rapid creation whenever needed to gather any desired information, and makes it possible to design agents that will perform new tasks, with a minimum amount of trouble for the administrator.
- The Model
- The model utilized in the preferred embodiment of the invention contains a number of basic or primitive types of elements: Nodes, Managed Objects (hereinafter, “MO's”), Services, Faults, Events, Alarms, and Impacts. A definition of each of these terms as used herein will be found in the following description.
- Nodes
- The model utilized in the preferred embodiment is of interrelated objects that form a network that typically exists in many dimensions (this “network” should not be confused with the physical network or networks of the external system of FIG. 1, and, for clarity, will hereinafter be termed the “model network”). This model distinguishes strictly between physical objects and logical objects. A physical object is something that it is possible to stick an adhesive label on (for example), or that could be dropped on one's foot; any logical function, in contrast, is considered as a logical object. For example, while the object that in lay parlance would be called a modem is a physical object, the function thereof is considered a logical object in this model, a logical object that runs on the physical object.
- The basic level of categorization of the physical devices that form the gear of the external system is termed a “node”. FIG. 2 illustrates one example of a node, and shows the relationship between some basic logical components and the physical components. (In that Figure, rectangular boxes denote physical objects, ovals denote logical objects, and arrows denote services, with the direction of the arrow indicating which component is providing the service to which.) All nodes contain boards, on which sit ports, which enable physical media to transmit information to and from the node. The first logical level above the port is the interface, on which a network protocol runs. The Internet Protocol is one example of a network protocol, while Ethernet and token ring are examples of interfaces. All nodes have operating systems running on them, which OS's provide system services to the interface and network protocol, as well as to programs. One or more interfaces provide interface service to a network protocol, one or more of which provide routesVia services to a subnetwork. A network is a collection of one or more subnetworks.
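- The distinction between the physical containment of a node and the logical services running on it can be sketched as follows. The class names and object names are hypothetical, and the label `system` for the operating system's services is an assumption; the labels `interface` and `routesVia` follow the text.

```python
from dataclasses import dataclass, field

@dataclass
class Port:
    name: str

@dataclass
class Board:
    name: str
    ports: list = field(default_factory=list)

@dataclass
class Node:
    name: str
    boards: list = field(default_factory=list)    # physical containment
    services: list = field(default_factory=list)  # logical: (provider, label, consumer)

node = Node("host-1", boards=[Board("nic-0", [Port("eth-port-0")])])
node.services += [
    ("ethernet-if", "interface", "ip"),    # interface serves the network protocol
    ("ip", "routesVia", "subnet-10.0.0"),  # network protocol serves the subnetwork
    ("os", "system", "ethernet-if"),       # OS serves the interface ...
    ("os", "system", "ip"),                # ... and the network protocol
]

# Which logical objects provide a service to the network protocol?
providers_to_ip = sorted(p for p, _, c in node.services if c == "ip")
print(providers_to_ip)
```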
- Examples of nodes include a computer, a hub, a printer, and an Internet router.
- A node that is a computer has some additional properties, and offers a more richly varied set of application services than do router or hub nodes (see FIG. 3). Examples of such application services are programs such as database servers and databases. Computers typically contain locally and remotely attached physical disks, which are organized by means of logical file systems.
- Managed Objects
- Managed objects are constructs of the model. That is, they are not themselves part of the external system, but exist only in the model. MO's are augmented finite state machines that, in some instances, mimic certain interesting behaviors of things in the external system. They are useful artifacts for explaining sets of facts in the external system in a way that is internally coherent to the model. A MO need not correspond to something that the external system recognizes as or considers to be an item or unit, but exists solely because it is deemed useful to the model. For example, a MO might very well be created to correspond to a seemingly random grouping consisting of a particular printer, a particular database and a particular LAN hub. According to the preferred embodiment of the invention, MO's can represent users of the external system, business work groups or other organization entities that use portions of the external system.
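- A managed object viewed as an augmented finite state machine might look like the following minimal sketch. The states, the event names, the transition table and the `history` attribute are all hypothetical; the sketch assumes only that a MO holds a state, reacts to events, and may stand for an arbitrary grouping.

```python
class ManagedObject:
    """Hypothetical augmented finite state machine: a state plus extra attributes."""

    TRANSITIONS = {
        ("ok", "degrade"): "degraded",
        ("ok", "fail"): "failed",
        ("degraded", "fail"): "failed",
        ("degraded", "recover"): "ok",
        ("failed", "recover"): "ok",
    }

    def __init__(self, name, members=()):
        self.name = name
        self.members = tuple(members)  # grouping need not be a unit in the external system
        self.state = "ok"
        self.history = []              # the augmentation: data kept beyond the bare state

    def on_event(self, event):
        new_state = self.TRANSITIONS.get((self.state, event))
        if new_state is None:
            return False               # event is not meaningful in the current state
        self.history.append((self.state, event, new_state))
        self.state = new_state
        return True

# A seemingly random grouping of a printer, a database and a LAN hub.
group = ManagedObject("adHocGroup", ["printer-7", "ordersDb", "hub-3"])
group.on_event("degrade")
print(group.state, group.history)
```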
- It may be useful at this juncture to emphasize that the term MO is not used herein in exactly the same way as defined in the CIM (where “MO” refers to the “actual item in the system environment that is accessed”). That is, a MO in the CIM sense is directly referential (refers directly to something in the external system that existed before being incorporated into that system), and is a proxy for that entity. A user or application interacts with the object in place of interacting with the underlying entity itself. In the present invention, on the other hand, a MO is only indirectly (if at all) referential. The model itself represents the interrelated system of devices and applications that make up the external system; there is not necessarily any exact analogy in element-to-element relationships, however, because not all elements recognized by and making up part of the external system need to be in the model, nor are all elements in the model things recognized as entities by the external system. MO's in the present invention are constructs of the model—that is, are defined in terms of their function within the model—rather than constructs of the external system that are represented in the application. In typical CIM modeling, if the state of an existing MO changes, the model knows of that change directly. In the preferred embodiment of the present invention, in contrast, if the state of an existing MO changes, the model learns of that change through an agent (discussed below) which has captured or deduced the change, packaged it into a message, and sent the message to the model, and thus allowed the model to respond to the message.
- This approach provides three advantages:
- 1. The model can contain representations of resources that do not have the proper instrumentation to interact directly with a proxy in the model but information concerning the performance of which can be derived through inference (e.g., a passive hub);
- 2. The representations in the model can be changed independently of the things they represent (e.g., the types of information present in each MO, or in some MO's, can be redefined during runtime, thereby creating a new model in place of the old one, without having to interrupt the running of the model program, or having to restart or reinstall the software; this flexibility is an important feature of the preferred embodiment); and
- 3. The model can accept systemic events—for example, the addition of a new MO, the loss of an existing one, or an equipment upgrade in an existing component of the external system—that cannot, by definition, have a MO as a proxy.
- This approach, thus, provides the preferred embodiment with the ability to represent all objects and relations that could be represented using the CIM approach, and a large number of others that are not and cannot be accommodated in CIM.
- Services, Sessions and Paths
- A service, as the term is used herein, is a labeled, directed relationship between specified MO's. Services may be either computational or functional. Computational services are those which exist to create the fabric of the computing ensemble (the external system), for example, services performed by the OS in a computer node, opening and closing files, controlling a display device or a printer, etc. (In other words, computational services are those that serve to construct an effective distributed computing ensemble regardless of any use that the owner of the system might put it to.) Functional services are those which exist to satisfy the needs of higher-level objectives (not necessarily computational ones)—for example, getting data from a database to learn the number of tomatoes consumed per capita in Paraguay over the last five years. Functional services correspond approximately to a naive notion of “session” (that is, involve a coordinated exchange of information between nodes using conversational techniques).
- It must be realized that there may not necessarily be only a single relation between two MO's. A single pair of MO's may have multiple service relations between them. They may each provide one or more services to and consume one or more services from the other. For instance, an operating system could provide both file system service and runs service to an application. And a given service need not always behave in the same manner. For instance, assume that a router delivers data that is destined for video application V as well as for financial market data information application M. V can function perfectly adequately with a loss of 5-10% of packets; M cannot. The preferred embodiment of the invention distinguishes these two tolerances in terms of service expectations. Thus, the service relation between the router MO and the MO corresponding to application V is not the same as that between the MO's corresponding to the router and application M.
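- The distinction between the two router relations can be illustrated with a minimal sketch. It assumes that a service expectation can be reduced to a single tolerated packet-loss fraction; the class `ServiceRelation`, its fields, the label `dataDelivery` and the numeric values are hypothetical names chosen for illustration and do not come from the preferred embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRelation:
    """A labeled, directed relation from a provider MO to a consumer MO."""
    provider: str
    consumer: str
    service: str
    max_packet_loss: float  # assumed form of a "service expectation"

    def is_satisfied(self, observed_loss: float) -> bool:
        # The same observed behavior can satisfy one relation and violate another.
        return observed_loss <= self.max_packet_loss


# Same provider and same service label, but different expectations per consumer.
router_to_v = ServiceRelation("router", "appV", "dataDelivery", 0.10)
router_to_m = ServiceRelation("router", "appM", "dataDelivery", 0.0)

observed = 0.07  # 7% of packets lost at the router
status = {r.consumer: r.is_satisfied(observed) for r in (router_to_v, router_to_m)}
```

- With 7% loss, the relation to application V is still satisfied while the relation to application M is not, which is why the two relations have to be modeled separately.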
- A path is the set of MO's that offer the necessary intermediary computational services to realize a functional service. It should be noted that a functional service is decomposed into lower-level computational services; computational services, on the other hand, implement functional services. Suppose, for example, that a distributed database server needs to receive requests from its clients and return result sets to them. To receive remote requests, it needs to make use of intermediate network facilities connecting it to its clients. In terms of the model, we say that the database server MO needs “networkAccess” services from a MO on the same node. If the IP network protocol MO on the database server's node offers networkAccess services, there is a match. The IP MO, in turn, needs interface services from a network card on the same computer. If the network card MO on the computer offers interface services, there is a match. And so on all the way down the chain of connections that deliver network services to both database and client, and ultimately allow there to be a session between them. The path is the chain of functional connections that allows delivery of services.
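- The matching of needed services to offered services described above can be sketched as follows. This is an illustrative simplification, assuming that each MO is described by a node name, a set of offered services and a set of needed services, and that the first matching MO on the same node is chosen; the dictionary layout, the function `resolve_path` and the MO names are hypothetical.

```python
# Each MO lists the node it lives on, the services it offers and those it needs.
MOS = {
    "dbServer": {"node": "C1", "offers": {"database"}, "needs": {"networkAccess"}},
    "ip":       {"node": "C1", "offers": {"networkAccess"}, "needs": {"interface"}},
    "nic":      {"node": "C1", "offers": {"interface"}, "needs": set()},
    "ipOther":  {"node": "C2", "offers": {"networkAccess"}, "needs": {"interface"}},
}


def resolve_path(start, mos):
    """Follow need/offer matches on the same node, starting from one MO."""
    path = [start]
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for need in sorted(mos[current]["needs"]):
            for name, mo in mos.items():
                same_node = mo["node"] == mos[current]["node"]
                if name not in path and same_node and need in mo["offers"]:
                    path.append(name)      # a match: this MO joins the path
                    frontier.append(name)  # its own needs are resolved next
                    break
    return path


db_path = resolve_path("dbServer", MOS)
```

- Here the path for the database server consists of the server MO, the IP MO and the network card MO on node C1; the IP MO on node C2 is not selected because it is on a different node.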
- If the model were to realize and display simultaneously all such service relationships, the result would be a snapshot of the internetworked MO's at that moment, i.e., a map of the paths of the network at that moment. In the terms of the present invention, the model is an emergent property of MO's and services, and is not itself the main focus of what is being done.
- It should also be noted that functional services and paths vary independently of each other. For example, suppose that a client application clientA connects to its server “serverS” via IP, while another application clientB connects to the same serverS via the Internetwork Packet Exchange protocol (“IPX”). Both receive database service from the server. Since a path is a concatenation of computational service links, by definition one path is determined by service “IPX service”, the other, by service “IP service”, and so the two paths are different.
- Again, suppose that a database-using application runs on a mobile computer. When the computer is at its docking station, the application converses with the databases through the LAN using synchronous IP protocols. When the computer is away from the desktop, the application converses with the databases through asynchronous store-and-forward queues that are realized over a dial-up point-to-point protocol connection. As far as the application is concerned, it expects and receives the same database service, and yet the path over which the service is realized, is quite different.
- The same point can be considered informatively from the viewpoint of the lifecycle of information within the invention. Paths are a necessary concept for computing the impact of physical and lower-level logical failures on higher-level services. Services, on the other hand, categorize the kinds of relationships that make up the system at increasing levels of abstraction. Services, therefore, create a dimension of impact analysis from differing perspectives. For example, suppose it is desired to know how reliable database service is for some given set of disparate users. Categorization by service allows the model to cut simply across different database service providers and to provide an aggregation of both fault and impact data for that service.
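- Categorization by service can be illustrated with a short sketch. It assumes that fault and impact records are available as simple (service, provider, kind) tuples; that record layout, the function `aggregate_by_service` and the sample data are hypothetical and serve only to show the aggregation across providers.

```python
from collections import Counter

# Hypothetical records: (service label, providing MO, kind of record).
RECORDS = [
    ("database", "db1", "fault"),
    ("database", "db2", "impact"),
    ("database", "db2", "impact"),
    ("print", "p1", "fault"),
]


def aggregate_by_service(records, service):
    """Count faults and impacts for one service, across all of its providers."""
    return Counter(kind for svc, _provider, kind in records if svc == service)


database_summary = aggregate_by_service(RECORDS, "database")
```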
- Faults, States, Events, Anomalies and Performance Degradations
- Any detected change of an individual managed object's state to some undesirable value is a “fault”. (A MO's state is simply the set of values that its various attributes have.) An event is a representation of a fault in the model, or of the inverse, a recovery from a fault.
- Many resources (parts of the external system) report their faults, whether through spontaneously recording to a log, emission of traps of the type provided for by the Simple Network Management Protocol (“SNMP”), or in response to an active request for information on performance. Some resources, however, are not constructed with the ability to report their own state spontaneously, and are not instrumented to respond to a polling request or the like; this necessitates fault inference. What this means is simply that the model infers that a fault exists. Passive monitoring techniques such as pinging a task or device, give the barest of information, essentially only whether the task or device is still running in the system.
- Performance metrics offer another source of information. By analyzing such data with a variety of statistical tests, it is possible both to anticipate faults and to interpret skew conditions as faults. Such analytically-inferred or -anticipated faults are termed “anomalies”.
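- The text does not specify which statistical tests are applied. As one possible illustration, the sketch below uses a simple z-score test against a baseline of earlier samples; the function `is_anomalous`, the threshold of three standard deviations and the sample values are assumptions made for this example.

```python
from statistics import mean, stdev


def is_anomalous(history, sample, threshold=3.0):
    """Flag a sample that lies more than `threshold` standard deviations
    from the mean of the earlier samples (a z-score test, chosen here
    only as an example of a statistical test)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return sample != mu
    return abs(sample - mu) / sigma > threshold


baseline = [41.0, 39.5, 40.2, 40.8, 39.9, 40.1]  # e.g. CPU consumption in percent
normal_reading = is_anomalous(baseline, 40.5)
skewed_reading = is_anomalous(baseline, 55.0)
```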
- In addition to actual faults, and anomalies, performance degradations are also of interest. These are changes in the state of external system resources that hinder, but do not destroy, the ability of themselves or other parts of the external system to perform their required tasks.
- Impacts
- Faults and anomalies are adequate for component management because they report on the health of each component of the external system taken in isolation (“component” here is to be taken broadly, including either hardware or software, but always refers herein to an element of the external system). Service-oriented management also demands knowledge of impacts. An impact is the description of a disruption in service for some portion or user A of the external system owing to a correlated disruption in service of some portion B. For instance, a database suffers sympathetically if a business application cannot reach it owing to router failure. More simply, the router fault has an impact on some database sessions. The external system itself is unaware of the impact. Rather, the known relevant information (i.e., after all the other extraneous information is stripped away) is likely to be:
- The router registers a fault through an SNMP trap.
- The application may register a fault in that it cannot receive data—or the end user will telephone either application support or database support to complain.
- The database probably registers no problem.
- Database support needs to know about impacts as well as true database faults (disk full, program crash, etc.). If there is no use of the concept of impact, then when an affected user calls database support, the support person is likely not to know of the problem and will have to begin a second effort researching current conditions. If an operation system determines impacts, on the other hand, then the database administrator will have received information that a malfunctioning router may be hampering some users' database access, and can anticipate users' calls.
- From the foregoing, it will be clear that the ability to manage impacts presupposes the reporting of faults, as well as the ability to correlate disparate fault reports (events). Impacts, as used herein, are the outcomes of applying some reasoning system to a directed graph of known faults.
- FIG. 4 provides an illustration of the overall flow of information in a model constructed according to the preferred embodiment of the present invention. That Figure includes both a partial view of the external system (the “Network Fragment” in the right-hand portion of the Figure) and a portion of the model.
- Incidents
- During the course of normal operation, managed resources (the elements or components, including both hardware and software, of the external system) suffer faults and performance degradations. New managed resources join the external system, others leave the system, and others change their configuration relative either to themselves (e.g., an equipment upgrade) or to other resources in the external system. All these systemic occurrences will be termed incidents.
- One main purpose of the invention is to convey information spontaneously about incidents that affect any end user's set of interests. A user's interests are defined by a set of managed objects, as well as by the functional perspective the user has on that set (e.g., business user, administrator, or application support). To accomplish this goal, the model must reflect each such change to the external system or the resources comprising it and present that information to the relevant user(s).
- Several stages of processing must occur in the invention to allow for such reflection. The invention must incorporate the following stages in the lifecycle of information:
- 1. Raw data: Learn of all incidents.
- 2. Event winnowing: Analyze the information received to determine whether that information indicates
- a change of state to a MO that must pass to the model,
- performance data to store for later examination, or
- just “noise” to discard.
- 3. Event association: Convert the representation of interesting events into the model's own event formalism so that it can act on them.
- 4. Entry to model: Determine whether the event marks the addition of a new MO (and if so, inform the MO Factory component to construct the new MO), a change to a known managed resource (and if so, inform the Dispatcher component to notify the corresponding MO of the event), or performance metrics about some managed resource (and if so, inform the Operational Datastore component to insert this new material).
- 5. Impact/root cause analysis: Determine the systemic, or contextual, significance of condition-changing event(s) (that is, determine the effect on the condition of the overall system that results from the specific event).
- 6. Persistence: Store persistently a completely associated set of events that it has categorized as an alarm.
- 7. Information filter: Organize and categorize information about which MO's have changed state either of their own accord or sympathetically to ambient conditions, filtered according to user interests and entitlements.
- 8. User display: Make the fact of change, and the information as to whether it occurred of the MO's own accord or is a sympathetic reaction, available to its viewers according to preselected (but redefinable) groups of MO's (business units, for example).
- Steps (1) through (3) are performed by the data-gathering infrastructure, while steps (4) through (7) occur in the model and related components, and step (8) is performed by the display system.
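- Steps (2) and (3) of the lifecycle above can be sketched as follows. The raw message formats, the regular expressions and the dictionary used as the event formalism are all hypothetical; the sketch only shows the three-way decision between a state change, performance data and noise.

```python
import re

# Assumed raw formats: "<resource> DOWN|UP" and "<resource> cpu=<number>".
STATE_PATTERN = re.compile(r"^(?P<resource>\S+) (?P<status>DOWN|UP)$")
METRIC_PATTERN = re.compile(r"^(?P<resource>\S+) cpu=(?P<value>\d+(?:\.\d+)?)$")


def winnow(raw):
    """Return an event in a simple dictionary form, or None for noise."""
    match = STATE_PATTERN.match(raw)
    if match:
        kind = "fault" if match["status"] == "DOWN" else "recovery"
        return {"kind": kind, "resource": match["resource"]}
    match = METRIC_PATTERN.match(raw)
    if match:
        return {"kind": "performance", "resource": match["resource"],
                "cpu": float(match["value"])}
    return None  # noise: discard


events = [winnow(line) for line in ("R1 DOWN", "C1 cpu=87.5", "heartbeat ok")]
```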
- Network Fragment Containing Managed Resources
- As described above, the model monitors the operation of, and faults occurring in, the set of managed resources that constitute the distributed computing ensemble (the external system). A fragment thereof, shown at the right in FIG. 4, is shown enlarged in FIG. 5 (Diagram4). This exemplary fragment contains a network N1, to which are connected a router R1, a hub H1 and three computers C1, C2 and C3. Database D1 runs on computer C1. Applications A1 and A2 run on computers C2 and C3, respectively. Router R1 also connects to networks N2, N3 and N4.
- FIG. 6 illustrates a portion of the preferred embodiment of the invention. The two subcomponents illustrated here are an agent manager and the model server.
- An agent manager (of which any required number may in principle be provided) contains multiple dynamic agents (“mobile” programs to perform specified tasks), indicated here as d1, d2 and d3. An agent manager may exist without any dynamic agents in it, or may contain up to some specified maximum number (in the preferred embodiment, this limit is 64; the invention is of course not limited to this value). These dynamic agents use such known mechanisms as file tailing, SNMP polling and SNMP trap receiving to monitor the managed resources in the external system. Monitoring amounts to performing two activities:
- (1) capturing messages that devices or applications in the external system have emitted spontaneously; and
- (2) collecting performance metrics of applications or devices in the external system.
- The desired result of monitoring is:
- to recognize the onset or termination (hereinafter sometimes termed the “offset”) of faults by interpreting the messages a device or application has emitted; and
- to detect the onset or offset of anomalous behavior by performing quantitative analytic tests against performance metrics.
- Dynamic agents formulate the results of their analysis in terms of formal message objects of the model (events), and return those events to the model (this process of event return is explained in the section relating to the data-gathering infrastructure, below).
- FIG. 6 provides a high-level perspective of how events can take three different paths from an agent manager to further processing, by either the model server or the Operational Datastore, depending on the event. More detail is given in the following paragraphs.
- FIG. 7 illustrates how events that report the discovery of some new managed resource are directed to the Factory process, f1 of model server m1, which then generates a new MO for the model, in this case mo6. That is, if an event enters the system for a managed resource for which there is no corresponding MO, Dispatcher i1 notifies the Factory f1 of the need to create a new MO.
- FIG. 8 illustrates how events representing reported and detected faults go to the Dispatcher component i1 of model server m1, which then directs the event to the corresponding managed object, in this case mo6.
- All performance data, including data that shows no detected anomaly, goes directly to the Operational Datastore ods from which a reporting engine r1 is able to derive reports to fulfill users' needs.
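- The three routes described for FIGS. 6 through 8 can be sketched as follows. This is a simplified illustration, not the model server of the preferred embodiment: the classes `Event` and `ModelServer`, the event kinds and the return labels are hypothetical, and the Factory, Dispatcher and Operational Datastore are reduced to branches of a single method.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    resource: str
    kind: str  # assumed labels: "discovery", "fault", "recovery", "performance"
    payload: dict = field(default_factory=dict)


class ModelServer:
    def __init__(self):
        self.managed_objects = {}  # resource name -> events delivered to its MO
        self.datastore = []        # stand-in for the Operational Datastore

    def route(self, event):
        if event.kind == "performance":
            # All performance data goes directly to the datastore.
            self.datastore.append(event)
            return "datastore"
        if event.resource not in self.managed_objects:
            # No MO exists yet: the Factory role creates one.
            self.managed_objects[event.resource] = []
            return "factory"
        # Known MO: the Dispatcher role delivers the event to it.
        self.managed_objects[event.resource].append(event)
        return "dispatcher"


server = ModelServer()
routes = [
    server.route(Event("mo6", "discovery")),
    server.route(Event("mo6", "fault")),
    server.route(Event("mo6", "performance", {"cpu": 0.4})),
]
```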
- In the preferred embodiment, all MO's adhere to a publisher-subscriber pattern. In such a pattern, one dedicated component takes the role of a publisher, and all components dependent on changes in the publisher are termed its subscribers. The publisher maintains a registry of its current subscribers. Whenever a component wants to become a subscriber, it uses a subscribe interface, offered by the publisher. Whenever the publisher changes state, it sends a notification to this effect to all its subscribers, which in turn retrieve the changed data at their discretion.
- In the present invention, a MO is said to publish a state change it has undergone to its dependents (i.e., the list of MO's that subscribe to it), and is said to subscribe to the state changes of all its supporters (i.e., the list of MO's it subscribes to). All MO's have at least one relation with at least one other MO, which relations can be thought of as a “tree-like” graph whose nodes are the MO's and whose edges are the publisher-subscriber relations among those MO's. Each of those graph edges is thought of as a vector (i.e., has a direction), pointing from the supporter (publisher) to the subscriber.
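- The publisher-subscriber relation among MO's can be sketched as follows. The class `ManagedObject` and its method names are hypothetical; the sketch assumes that a state is a single string and that a subscriber simply records the publisher's new state when it is notified.

```python
class ManagedObject:
    def __init__(self, name):
        self.name = name
        self.state = "normal"
        self.dependents = []     # MO's that subscribe to this MO
        self.supporters = []     # MO's this MO subscribes to
        self.notifications = []  # (publisher name, publisher state) pairs seen

    def subscribe_to(self, supporter):
        """Register this MO in the supporter's registry of subscribers."""
        supporter.dependents.append(self)
        self.supporters.append(supporter)

    def set_state(self, state):
        """Change state and publish the change to all dependents."""
        if state == self.state:
            return
        self.state = state
        for dependent in self.dependents:
            dependent.notify(self)

    def notify(self, publisher):
        # The subscriber retrieves the changed data from the publisher.
        self.notifications.append((publisher.name, publisher.state))


disk = ManagedObject("disk")
app = ManagedObject("app")
app.subscribe_to(disk)   # edge points from supporter (disk) to subscriber (app)
disk.set_state("failed")
```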
- The messages MO's pass to one another in their publisher-subscriber relationships are state changes that result from a MO's receipt of an event. In the preferred embodiment, that is, a change in a MO may result in a corresponding change in one or more of that MO's subscriber MO's. These changes, in turn, may result in changes in still other MO's. In effect, the model traverses the directed graph referred to above, starting from the MO corresponding to the resource from which the occurrence was reported, to the edge of the graph in what may be termed the “leafward” direction, that is, in the direction from parent (publisher) MO to child (subscriber) MO. In addition to this, however, the model also traverses the graph in the other direction (“rootward”), to find other MO's that may have undergone state changes. Of particular interest here are root-cause events, that is, events that inherently are basic faults themselves, and not just sympathetic events, the consequences of other events. (Those types of events which are treated as root-cause events are listed in a table in the Control Repository.) The encountering of a sympathetic event indicates that the rootward traversal is still navigating intermediary points.
- Thus, rootward traversal moves processing toward a root cause, and leafward traversal towards systemic impact.
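- The two directions of traversal can be sketched over a small directed graph. The adjacency dictionary, the function names and the breadth-first order are assumptions made for illustration; the text does not prescribe a particular traversal order.

```python
from collections import deque

# Edges point from supporter (publisher) to its dependents (subscribers).
DEPENDENTS = {"disk": ["fs"], "fs": ["app1", "app2"], "app1": [], "app2": []}


def supporters_of(graph):
    """Invert the graph so that each node maps to its supporters."""
    reverse = {node: [] for node in graph}
    for supporter, dependents in graph.items():
        for dependent in dependents:
            reverse[dependent].append(supporter)
    return reverse


def leafward(graph, start):
    """Collect every MO that may feel the impact of a change at `start`."""
    seen, queue, impacted = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for dependent in graph[node]:
            if dependent not in seen:
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted


def rootward(graph, start, root_cause_holders):
    """Search the supporters of `start` for an MO holding a root-cause event."""
    reverse = supporters_of(graph)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for supporter in reverse[node]:
            if supporter in root_cause_holders:
                return supporter  # traversal terminates at the root cause
            if supporter not in seen:
                seen.add(supporter)
                queue.append(supporter)
    return None  # no root cause known yet


impacts = leafward(DEPENDENTS, "disk")
root = rootward(DEPENDENTS, "app1", {"disk"})
```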
- Suppose that a disk drive connected to a computer node fails in the external system. Applications depending on the drive will stop functioning also. Both the disk drive and (probably) the applications will emit error messages. In the model of the preferred embodiment, the disk drive MO receives a message indicating an event that is inherently a root-cause event (the disk drive failure), and emits a state change message to its dependents, including the MO's for the applications in question. The application MO's consequently change their state. In this case, the model is predicting that the applications will feel the impact of the disk drive failure, and the invention labels the application MO state changes as impacts.
- Suppose further that a dynamic agent independently captures a report emitted by one of the applications about its inability to gain access to needed data (for example). Within the model, the MO corresponding to that application receives a message indicating an event that is inherently a sympathetic event. It searches for an associated root-cause event by searching among those MO's supporting it for any that holds a root-cause event (i.e., for any that has received a message indicating the occurrence of an event of a type that is inherently a root-cause event). In this case, the model is corroborating the impact it has predicted. More importantly, the model is associating apparently disparate sympathetic events by associating those events with root causes, in a rootward traversal. If no root-cause event is found, the operator cannot, of course, be given a clear identification of the basic problem, but can still be provided with the enormously useful information as to which of the various incoming events are related to each other and what their likely impacts will be. In addition, the identification of the MO's that are involved in such a group of related events will likely facilitate the eventual identification and cure of the fault. The cluster of such events is termed an alarm. (If the cluster does not contain a root-cause event, then the cluster is said to be a proxy alarm, until such time as a root-cause event is reported.)
- Suppose, for example, that multiple nodes in a given portion of the model receive apparently independent events that are in fact due to the same underlying cause. An alarm is the abstract container of all associated root-cause events. The alarm is also the primary unit of display of what is going on to the administrator. Using alarms constrains the flow of information to the administrator using the invention, and thus allows the administrator to focus on the root cause rather than on all the varied presentations or manifestations of it.
- A viewer other than a system administrator (e.g., application support, an IT manager or a business user) has a different basic level of categorization of events than the administrator does. The general rule is that the viewer's basic level of categorization is the focus of that viewer's interest. For the administrator, that is the root cause; for application support, that set of applications she is responsible for in a given deployment context; for a business user, that set of resources she interacts with directly (applications, printers, etc.). The impact is the alarm equivalent for non-operational perspectives—that is, while the operator is interested in the alarm (which is handled in such manner as to direct attention to the root cause), the impacts are handled in such manner as to present to other users the information of most interest to them.
- Alarm Creation
- FIG. 9 illustrates the lifecycle of an event from the Model Server dispatcher process to alarm formation. As shown, a MO passes messages about its state changes to MO's that depend on it (leafward traversal), or on which it depends (rootward traversal). Rootward traversal terminates when it encounters a node (MO) with a flag set that indicates that the latter MO has received a message indicating a root cause. Leafward traversal terminates on exhaustion of the tree (i.e., when there are no more leafward nodes to go to, traveling along the relationships from the root-cause MO). The collection of sympathetic leaf events and their underlying root-cause event together create an interaction history known as an alarm. The description of the root-cause event labels the alarm.
- The Dispatcher receives a reference to an event from the Control Repository, values to fit into that frame, and a reference to a MO. If the Dispatcher does not recognize both the MO and the associated event, alarm processing is finished. Either the Factory needs to create a new MO or the model server needs to log that it has received an unknown event. Otherwise, the MO constructs the event by fitting the parametrized data that it has received into the event frame it has retrieved from the Repository.
- If the event is not of a type that is ordinarily a root-cause event, then the MO checks against its set of active supporters and adds the event to any active alarms in that set.
- On the other hand, if the event is one that is ordinarily a root cause, the MO checks to see if it has an active alarm. If not, it creates an alarm; otherwise, it adds the event to an existing alarm, and the text of that alarm is changed to reflect that the root cause has been identified. The addition of a new alarm or an update to an existing one causes publication of change of states to all objects that subscribe to alarms (filters, which are discussed below).
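- The two branches of alarm handling described above can be sketched as follows. The sketch is deliberately reduced: it checks only an MO's direct supporters, uses a plain set in place of the root-cause table in the Control Repository, and records publication by appending to a list. The names `Alarm`, `MO`, `handle_event` and the event types are hypothetical.

```python
class Alarm:
    def __init__(self, label):
        self.label = label  # description of the root-cause event
        self.events = []    # (MO name, event type) pairs in the alarm


class MO:
    def __init__(self, name):
        self.name = name
        self.supporters = []
        self.alarm = None   # active alarm, if any


ROOT_CAUSE_TYPES = {"diskFailure"}  # stand-in for the Control Repository table


def handle_event(mo, event_type, published):
    if event_type in ROOT_CAUSE_TYPES:
        if mo.alarm is None:
            mo.alarm = Alarm(event_type)  # create a new alarm
        else:
            mo.alarm.label = event_type   # root cause now identified
        mo.alarm.events.append((mo.name, event_type))
        published.append(mo.alarm)        # notify alarm subscribers (filters)
    else:
        # Sympathetic event: attach it to any active alarm among supporters.
        for supporter in mo.supporters:
            if supporter.alarm is not None:
                supporter.alarm.events.append((mo.name, event_type))
                published.append(supporter.alarm)


disk, app = MO("disk"), MO("app")
app.supporters = [disk]
published = []
handle_event(disk, "diskFailure", published)
handle_event(app, "cannotReadData", published)
```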
- Filters and Displays—Getting the Information to Users
- Processors called filters, represented in FIG. 4 as fi1 and fi2, are sensitive to alarms according to configurable criteria. Each filter stands in subscriber relationship to a given set of MO's. In essence, a filter is a set of inclusion criteria for selecting MO's with which the given set of MO's should enter a subscriber relationship. In their role as subscribers, the filters receive messages when alarms associated with MO's in their inclusion-list change state.
- The filters then alert the view applications of user stations (us1 and us2 in FIG. 6) of the alarms. The view applications sit in a subscriber relationship to their filter, which publishes its own state change, i.e., a new alarm (or a modification of an existing one), to the viewer applications.
- When the viewer application receives the update message, it updates its display. The user can double-click on the alarm display to see the underlying events behind the alarm. The exact arrangement of this display is not critical, as long as the user or operator seeing the display is provided with the requisite information about what is happening.
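- The chain from alarm to filter to viewer can be sketched as follows. The sketch assumes that an inclusion criterion can be expressed as a predicate over MO names and that a viewer application can be represented by a plain list of displayed alarms; the class `Filter` and the sample names are hypothetical.

```python
class Filter:
    def __init__(self, include):
        self.include = include  # inclusion criterion (assumed: predicate on MO name)
        self.viewers = []       # viewer applications subscribed to this filter

    def on_alarm(self, mo_name, alarm_label):
        """Receive an alarm change and publish it to viewers if it is included."""
        if self.include(mo_name):
            for viewer in self.viewers:
                viewer.append((mo_name, alarm_label))


database_filter = Filter(lambda name: name.startswith("db"))
database_view = []  # stand-in for a viewer application's display list
database_filter.viewers.append(database_view)

database_filter.on_alarm("db1", "diskFailure")
database_filter.on_alarm("router1", "linkDown")
```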
- The Data-Gathering Infrastructure
- The data-gathering infrastructure of the preferred embodiment is a flexible, reusable provider of data collection and distribution services. This infrastructure is entirely independent of the model logically, functionally and in code base. Both infrastructure and model, however, are parts of the preferred embodiment. The description that follows is of a computational mechanism that has multiple functional uses within that embodiment.
- The three main uses in that embodiment are:
- discovery of things in the external world, where the customer of the information is the MO Factory component;
- collectors of information about the conditions of things in the external system, where the customers of the information are the MOs; and
- collectors of information about things in the external system, where the customers of the information are the end users of the preferred embodiment.
- These are not the full range of possible uses of the infrastructure, but simply representative uses.
- Main Purpose of the Data-Gathering Infrastructure
- The model contains relatively little knowledge about the things that make up the external system, and most preferably has as little such knowledge as possible, knowing only the kinds of relationships the components of the external system enter into. To get that information, the model must rely on the infrastructure, which does have the requisite abilities. The infrastructure takes, as input, aspects of the essential characteristics of the things themselves (the intensions), and produces as output aspects of the external characteristics of those things (the extensions). The model thus requests services from the infrastructure. The infrastructure does not need to know anything about the model's implementation; it only needs instructions of what to collect and what to emit. Enterprise management, consolidated financial data feeds and messaging services are just some examples of possible models that could take advantage of the identical data-gathering infrastructure, differentiated only by the specific instructions it follows (that is, the data-gathering infrastructure has many applications other than with a model of a distributed computing ensemble, as in the preferred embodiment, and this aspect of the invention is not limited to use of the infrastructure with such a model).
- Goals of the Data-Gathering Infrastructure
- Some particular goals that are achievable using the architecture:
- 24 hour a day, seven day a week availability;
- extreme flexibility in dynamic configuration and reconfiguration;
- fault tolerance of data collection;
- independence of the infrastructure from the model;
- parallel processing of data collection;
- distributed location of data consumers (i.e., multiple model instances);
- ability to cascade information emitting from data collectors;
- seamless availability across multiple platforms;
- ability to execute user-provided stored procedures in both ad hoc and programmatic contexts;
- high throughput;
- authentication and authorization security; and
- easy extensibility to new environments and models (i.e., a portable infrastructure pattern that can be reused seamlessly for, say, models of factory flow).
- Structural Components of the Data-Gathering Infrastructure
- The main housing of the data-gathering infrastructure is the Agent Manager. An Agent Manager is an application written in Java, and runs as a stand-alone address space.
- There are seven first-order components of an Agent Manager in the preferred embodiment. Four of them are intrinsic components, by which is meant that the rationale for their existence derives from demands of Agent Manager processing itself. The other three are extrinsic components, that is, components whose rationale for existence derives from demands placed on the Agent Manager by the macroarchitecture of the preferred embodiment.
- An unspecified number of dynamic agents (from none to an architecturally unlimited, but practically limited number) may run in the same Agent Manager address space. To host dynamic agents, Agent Managers need additional components that fall into three categories:
- control communication with other address spaces;
- information communication with other address spaces; and
- internal housekeeping.
- The Agent Managers function as the gateways into the main processing area of the model of the preferred embodiment. The primary function of the Agent Managers is to perform a preliminary analysis of uninterpreted (raw) data.
- Such data can arise from two sources:
- 1. the managed resources report their own condition spontaneously through such mechanisms as recording messages in system logs or emitting SNMP traps; and
- 2. some intermediary agent infers the condition of the managed resource by observing its external behavior (e.g., CPU consumption, packet loss rate), which is detectable through such mechanisms as polling of SNMP Management Information Bases (MIB's), or retrieving system management metrics from the control blocks of the relevant OS.
- Preliminary analysis by an Agent Manager results in three possible outcomes, in which the data can indicate, respectively:
- 1. that some change of state has occurred to a known MO (either a shift from normal to anomalous behavior, or a return to normal);
- 2. that the model should create a new MO to represent a previously unknown managed resource; or
- 3. that a known MO is behaving appropriately, and its performance data should thus just be recorded.
- Broadly speaking, the function of the Agent Manager is to analyze the uninterpreted data, determine which of these three outcome paths it should follow, format the data appropriately to the particular outcome, and route it accordingly.
- The architecture of the Agent Manager comprises non-terminating reactive programs that interact with their surroundings. These programs react to either external stimuli (incoming data from various sources) or internal stimuli (control data from other components of the external system). This control data instructs dynamic agents (described below) about what kind of data to acquire, how to analyze it, and where to pass it. The Agent Manager learns from control data what dynamic agents to stop, start or update.
- The control infrastructure of the preferred embodiment follows a data-driven architecture. The implementation of functional components does not have a one-to-one correspondence to address spaces, or even to objects. In some cases, the method of an object might implement the function, while in others implementation may be by means of an independent address space. What is important from the overall architectural standpoint is the set of functions themselves.
- The Control Repository provides a central storehouse of control information, which must flow through the system in an appropriate way. This flow of control data constitutes a set of “control conversations” among functional components. These types of “conversations” are illustrated in FIG. 11, and include:
- 1. Control Repository to Agent Manager: mission packages, bundles;
- 2. Agent Manager to instruction set: attribute list of parameters;
- 3. Agent Manager to sensory monitor: attribute list of parameters;
- 4. Agent Manager to analyzer: attribute list of parameters;
- 5. Model server to Agent Manager: ad hoc requests, updates of running modules;
- 6. Model server to Control Repository: requests for bundles to implement ad hoc requests;
- 7. User station or model to model server: requests for active data, requests for monitor services (ad hoc modules);
- 8. User station to Control Repository: listing currently available modules (scripts) that can be run, their description and their parameters (it should be noted that in most cases this is likely an indirect conversation mediated by the server);
- 9. Administrative station to Control Repository: overall model configuration, module insertion, update, reporting, etc.;
- 10. Control Repository to warehouse extract-translate-load: instructions for maintaining roll-up data model based on raw information.
- FIG. 11 illustrates the division of the functional components of the Agent Manager into those with an intrinsic rationale and those with an extrinsic rationale. Among the former are: control and configuration handling, inter-thread queue framework, communications handling, and thread handling. Among the latter are control of the instruction set, control of sensory monitors, and control of analysis. These various functions will be discussed in what follows.
- Configuration Handler
- This function is to receive mission packages and both ad hoc and persistent bundle requests from consumers, and to instantiate those requests from the Control Repository.
- Inter-Thread Queue Framework
- This function is to provide low-level semaphore, mutex and asynchronous queue services to allow threads to provide parallel processing and object sharing as necessary.
- Communications Handler
- This function is to provide a mechanism for passing collected information out from the Agent Manager, and collecting information from other Agent Managers, as described below.
- Thread Handler
- This function is to manage (start up, stop and monitor) child threads that are performing services for the model or other end consumers.
- Control Communication
- Agent Managers bind to the rest of the data-gathering infrastructure through a series of conversations with neighboring components (these are software components of the infrastructure, and should not be confused with the hardware and software components that make up the external system). In control conversations, software components of the preferred embodiment pass to one another runtime messages or system metadata whose function is to pass control from one point of processing to the next, thus effectively binding the functions of the distributed system together.
- The configuration handler subcomponent is responsible for the conversation between the agent manager and the Control Repository. It receives assignments for the running instance of the Agent Manager (data structures called mission packages). These assignments include:
- names of the services that the communications handler will use to fit itself into the cascade; and
- bundle execution data structures, comprising the Instruction Set and Analyzer scripts that dynamic agents will run, and types of SensoryMonitor the Agent Manager needs to start up.
- Because the configuration handler accepts external interrupts with control data, its presence allows dynamic reconfiguration of the number, type and assignment of dynamic agents in any given Agent Manager.
- Information Communication
- In information conversations in the preferred embodiment, Agent Manager components pass the information their dynamic agents have collected to the model. As already mentioned, this passing of information does not need to be direct. The preferred embodiment is organized to allow a cascade of information to enhance load balancing and ensure that dynamic agents run close to their managed resources, and at the same time to diminish the number of direct connections into the model server. In a cascade, Agent Managers are configured in a tiered tree pattern. Only the Agent Managers at the root of the tree pass their information directly to the model server; all others pass to the Agent Manager node in the next level. Each Agent Manager is, then, potentially a passthrough as well as an originator of data. In this way, a heavier stream of data reaches the model than it would with point-to-point connections, but fewer concurrent interrupt points disturb the model's processing. The communications handler subcomponent is responsible for transmitting information messages from an agent manager, and receiving messages coming in from another agent manager lower in the cascade. Communications handlers communicate using CORBA CosEvent channels and find each other through CORBA CosNaming services. These commercial infrastructure components allow agent managers and servers to be in a fully dynamic configuration. Each of these components needs know only the name of its higher node, not whether that node is a fellow agent manager or a server. This can be expressed architecturally by noting that only the information-message-passing behavior needs be exposed to the data collection component, not the intrinsic behaviors of either server or agent manager. Communications handlers have two subcomponents themselves: Inbound and Outbound Handlers. 
- As their names suggest, Inbound is responsible for receiving incoming information from other agent managers, Outbound for passing to the next node in the cascade. Reconfiguration of the agent manager topology is dynamic. An administrator can add new assignments, delete existing assignments or change the behavior of existing assignments of any dynamic agent plant during runtime.
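- The cascade described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the class names `ModelServer` and `AgentManager`, the `receive` and `originate` methods, and the direct object reference to the next-higher node (standing in for CORBA CosEvent channels located through CosNaming) are all hypothetical.

```python
class ModelServer:
    """Hypothetical sink standing in for the model server."""

    def __init__(self):
        self.received = []

    def receive(self, message):
        self.received.append(message)


class AgentManager:
    """Hypothetical cascade node that knows only its next-higher node."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # another AgentManager or the server
        self.passed_through = 0

    def originate(self, payload):
        # Outbound handler: data collected by this manager's own agents.
        self.upstream.receive({"origin": self.name, "payload": payload})

    def receive(self, message):
        # Inbound handler: forward data from lower tiers unchanged.
        self.passed_through += 1
        self.upstream.receive(message)


server = ModelServer()
root = AgentManager("root", server)
leaf_a = AgentManager("leaf-a", root)
leaf_b = AgentManager("leaf-b", root)

leaf_a.originate("ifOperStatus=down")
leaf_b.originate("cpu=97%")
root.originate("heartbeat")
```

- In this sketch only `root` holds a reference to the server, so the server sees a single connection even though three managers originate data. Because a node calls only `receive` on its upstream, it does not need to know whether that upstream is a fellow manager or the server.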
- Internal Housekeeping
- The first portion of the internal housekeeping functions relates to inter-thread queue infrastructure.
- Inter-Thread Queue Infrastructure
- The Agent Manager is implemented as a set of parallel processing threads that divide the work of the Agent Manager functionally. Each thread runs according to an “active thread” strategy—that is, each is in a non-terminating main loop that accepts state-changing instructions from external interrupts. Joining the threads are asynchronous event queues and a synchronization mechanism based on waits and interrupts. The set of synchronization techniques is the inter-thread queue infrastructure.
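- The "active thread" strategy can be illustrated with Python's standard `queue` and `threading` modules. This is a sketch under assumptions: the instruction format (`("configure", n)` and `("data", n)` tuples), the `STOP` sentinel, and the function name `active_thread` are hypothetical and are not taken from the embodiment.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel meaning "leave the main loop"


def active_thread(inbox, outbox):
    """Main loop that runs until told otherwise, driven by queued input."""
    state = {"scale": 1}
    while True:
        instruction = inbox.get()  # blocks (waits) until something arrives
        if instruction is STOP:
            break
        kind, value = instruction
        if kind == "configure":  # state-changing instruction
            state["scale"] = value
        elif kind == "data":
            outbox.put(value * state["scale"])


inbox, outbox = queue.Queue(), queue.Queue()
worker = threading.Thread(target=active_thread, args=(inbox, outbox))
worker.start()

inbox.put(("data", 2))
inbox.put(("configure", 10))
inbox.put(("data", 2))
inbox.put(STOP)
worker.join()

results = [outbox.get() for _ in range(outbox.qsize())]
```

- The blocking `get` plays the role of the wait, and a `put` from another thread plays the role of the interrupt. The same data item yields a different result after the `configure` instruction, which shows the loop changing state without being restarted.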
- Thread Handler
- A mission package contains the configuration information for a number of different Agent Manager functions. When it enters the Agent Manager, some Agent Manager component needs to deconstruct it and disseminate its contents appropriately to the various responsible Agent Manager components. The thread handler is the component responsible for starting the dynamic agents, and thus receives the dynamic agent bundle sections of the mission package. It interprets the incoming information, determines how many threads to start up, and with what parameters, and then enters a monitoring phase in which it waits for either a dynamic agent to end prematurely or a new mission package to arrive.
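- As a rough sketch of the start-up half of this behavior, the following Python fragment takes the bundle section of a mission package and starts one thread per requested agent instance. The dictionary layout of `mission_package`, the `instances` key, and the function names are assumptions made for illustration. The monitoring phase that follows start-up is not shown.

```python
import threading


def run_agent(bundle, log, lock):
    # Placeholder for the work a dynamic agent would do.
    with lock:
        log.append(bundle["name"])


def handle_mission_package(package):
    """Start one thread per bundle instance; return the names that ran."""
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_agent, args=(bundle, log, lock))
        for bundle in package["bundles"]
        for _ in range(bundle.get("instances", 1))
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(log)


mission_package = {
    # Section for the communications handler (ignored by this function).
    "services": {"upstream": "agent-manager-root"},
    # Section for the thread handler.
    "bundles": [
        {"name": "snmp-poll", "instances": 2},
        {"name": "log-tail"},
    ],
}
started = handle_mission_package(mission_package)
```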
- Dynamic Agents
- Dynamic agents are the true data capture component of the data-gathering infrastructure, and so form the bridge between the set of data sources to the model and the model itself. In the preferred embodiment, the data sources are the devices and applications that the preferred embodiment manages (the hardware and software components of the external system).
- Dynamic agents are also non-terminating reactive programs that interact with their surroundings. Structural non-termination does not mean that the tasks stay running forever, only that they run until requested to do otherwise. The system requests, for instance, that discovery agents terminate after they have completed their task. The programs react either to external stimuli (incoming data from various sources) or to internal stimuli (control data from other components). This control data instructs the quite generic dynamic agents about what kind of data to acquire, how to analyze it, and where to pass it. The Agent Manager learns from control data what dynamic agents to stop, start or update. Thus, the Agent Manager plays a critical role also in the flow of control data through the system.
- Conceptual Architecture of a Dynamic Agent
- A dynamic agent bundles three functional components under a single cover: SensoryMonitor, InstructionSet, and Analyzer. Dynamic agents are the metaphoric eyes, ears and nose of the model. Consequently, the dynamic agent component most directly responsible for capturing data is called the SensoryMonitor. The SensoryMonitor collects data through a variety of protocols—SNMP, file tailing, TTY, to name a few—depending entirely on the mechanism most appropriate to the device. The SensoryMonitor is responsible only for knowing how to handle the lowest level requests, not for knowing what requests to issue, nor how to interpret the results.
- An InstructionSet is the module function that formulates protocol- and device- or source-specific data-eliciting messages.
- An Analyzer is the module function that winnows, interprets and massages incoming information. Most of the knowledge necessary for interpreting the incoming data as impacts must reside in the model proper, since only the model has access to information about how individual events fit into the fabric of the overall managed environment. The Analyzer, however, can perform first-level acceptance testing for messages, and reformat messages into a normalized appearance that the model is able to interpret more simply (i.e., assign the device- or protocol-specific knowledge to a point near the source rather than cluttering up the model with it). From the perspective of the Agent Manager, InstructionSets and Analyzers are exogenous guests running in the dynamic-agent context. Their implementations are artifacts of the control component of the model that pass transparently into the dynamic agent (in what might be thought of as a kind of friendly Trojan Horse strategy). From the perspective of the overall processing of the data-gathering infrastructure, the dynamic agent provides a distributed, possibly remote run-time context for some behavior of the model. The InstructionSet and Analyzer expose only their external behavior to the Agent Manager, and do not need to make their implementation known in any way to the Agent Manager. While the InstructionSet and Analyzer work on behalf of the model their instructions are mainly uninteresting to the runtime model. They come rather from the repository instantiation of the model. The particular implementation of the modules is immaterial to the functional division of labor. The components could all be bundled together as synchronous calls to the same class, as methods of multiple classes, or split into parallel threads. The architecture will support multiple implementations as suits the particular context at hand.
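- The division of labor among the three components can be sketched as follows. This is only one of the possible implementations the text allows (synchronous calls among small classes). The method names `requests`, `fetch`, `normalize` and `run_once`, and the dictionary standing in for a device, are hypothetical.

```python
class InstructionSet:
    """Formulates source-specific requests (hypothetical example)."""

    def requests(self):
        return ["sysName", "ifNumber"]


class SensoryMonitor:
    """Knows how to execute a low-level request, not which to issue."""

    def __init__(self, source):
        self.source = source  # dict standing in for a device

    def fetch(self, request):
        return request, self.source.get(request)


class Analyzer:
    """First-level acceptance test plus normalization."""

    def normalize(self, raw):
        name, value = raw
        if value is None:
            return None  # reject: nothing to report
        return {"attribute": name.lower(), "value": str(value)}


class DynamicAgent:
    """Bundles the three components; sees only their external behavior."""

    def __init__(self, instruction_set, monitor, analyzer):
        self.instruction_set = instruction_set
        self.monitor = monitor
        self.analyzer = analyzer

    def run_once(self):
        accepted = []
        for request in self.instruction_set.requests():
            message = self.analyzer.normalize(self.monitor.fetch(request))
            if message is not None:
                accepted.append(message)
        return accepted


device = {"sysName": "router-1"}  # ifNumber deliberately missing
agent = DynamicAgent(InstructionSet(), SensoryMonitor(device), Analyzer())
messages = agent.run_once()
```

- Swapping in a different `InstructionSet` or `Analyzer` changes what is asked and how replies are interpreted without touching `DynamicAgent`, which mirrors the "guest" relationship described above.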
- A schematic illustration of making an Agent Manager and dynamic agents data driven, is provided in FIG. 12, and the flow of control in an Agent Manager is illustrated in FIG. 13.
- Overview of Runtime
- When the preferred embodiment of the invention is to be used to operate and manage an external computing system, it is sufficient to install it and begin running it: it is not necessary beforehand to customize the software either to the particular types of components in the external system, or to the number of instances or configuration of those components in the actual system, or to how the various components interrelate. This is because the preferred embodiment is constructed in such a way that, once it begins running, it itself discovers the information that it needs concerning the external system. The runtime of the preferred embodiment includes several phases: the discovery phase, the model building phase, and the interaction phase (see FIG. 14). These phases overlap in time, and in a sense neither the discovery phase nor the model building phase ever terminates, but rather both continue throughout runtime, as changes are made to the external system's make-up or arrangement. Each of these will be discussed in detail in turn.
- I. The Discovery Phase
- When the preferred embodiment first begins to run, it has no knowledge of the particulars of its environment (i.e., the external system it is supposed to monitor). All its “knowledge” that will be used in learning what it needs to know about the particular external system, is initially to be found in the Control Repository. This Repository contains a lexicon of kinds of modeled MO, and hence an inventory of the hardware and software components that, initially, the runtime model of the preferred embodiment can recognize. The system administrator can manually supply the information needed to instantiate MO's for elements or parts of the external system that are not accounted for by the initial contents of the Control Repository.
- The model's first steps are to recognize and represent the world in which it operates (the external system). This will include finding the set of devices and applications and other software that populate the external system, recognizing the members of that set as instances of items in the inventory (where possible), instantiating objects of the appropriate type within the model, and labeling those objects (MO's) with names that reflect their name in the external system.
- Once armed with this information, the model establishes a directed graph in which the MO's form the nodes (it will be understood that this graph is of the model, not directly of the external system, and that the “nodes” here referred to are MO's, and are not the same “nodes” to which reference was made above in the portion of the Detailed Description titled “Nodes”). In the directed graph, the relationships between the MO's characterize and define the edges. Whenever possible, determination of the relationships between MO's comes about by applying the MOs' procedural knowledge of services they consume, services they provide, and systemic constraints on those services. For example, suppose that in the external system to be managed, IP services must be provided to consumers on the same computer node (this “node” is the concrete type of node referred to initially), and that one provider provides those services to many consumers. Given any IP service consumer application, the model can determine that that application must consume IP services from a single IP provider on the same computer node (i.e., the network protocol of that node). In general, it is possible to characterize MOs' procedural knowledge as their ability to determine their nearest connections based on the services they supply and consume.
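- The IP example can be made concrete with a small sketch in which edges of the directed graph are derived from declared services plus a same-host constraint. The `MO` dataclass, its field names, and the `resolve_edges` function are hypothetical. The rule that exactly one matching provider must exist on the consumer's host is taken from the example above and is not meant as a general rule for all services.

```python
from dataclasses import dataclass, field


@dataclass
class MO:
    name: str
    host: str
    provides: set = field(default_factory=set)
    consumes: set = field(default_factory=set)


def resolve_edges(mos):
    """Return (consumer, service, provider) edges.

    Assumed constraint, from the IP example: a service is supplied by
    exactly one provider located on the consumer's own host.
    """
    edges = []
    for consumer in mos:
        for service in sorted(consumer.consumes):
            providers = [
                p for p in mos
                if p is not consumer
                and p.host == consumer.host
                and service in p.provides
            ]
            if len(providers) == 1:
                edges.append((consumer.name, service, providers[0].name))
    return edges


mos = [
    MO("ip-stack-1", "host-1", provides={"IP"}),
    MO("ip-stack-2", "host-2", provides={"IP"}),
    MO("billing-app", "host-1", consumes={"IP"}),
    MO("report-app", "host-1", consumes={"IP"}),
]
edges = resolve_edges(mos)
```

- Both applications on `host-1` resolve to the same provider, which reflects the one-provider-to-many-consumers relationship, and neither is linked to the provider on `host-2`.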
- To recognize managed resources in the external system as exemplars of models that exist in the Control Repository, there is required a logical “discovery” process that has two primary functions. The first is heuristic, to identify absolutely that the resource it has discovered is an instance of a known type. The second relates to configuration, and is to identify from the configuration information for the resource in question, the set of relationships it participates in (e.g., the MIB tables that reveal the interrelations among cards, ports and interfaces within a router). This information includes specific identifications of the other system components involved in these relationships.
- The discovery process also needs to pass the information it has learned to the model, so that the model can create the required instances of MO's to correspond to the external system resources the discovery process has encountered. The process encapsulates its information into an attribute list that describes the resource in the terms required by the model. It passes that attribute list to the Factory component of the model server, which draws on the information stored in the Control Repository to determine the possible services a MO of this type can provide and consume. (An attribute list, as is well known, is a data structure comprising a variable list of labels followed by associated values.) It matches this information to the services it deduces this particular MO actually is providing or consuming (inferred from the attribute list received from the discovery process), thus determining the interrelationships between this MO and others, and so, effectively, situating the MO in the model properly.
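- A hedged sketch of this hand-off is given below. The attribute list is represented as a list of (label, value) pairs, as defined above. The repository contents, the `services_in_use` label, and the `factory` function are hypothetical stand-ins. The matching step is reduced here to a set intersection between the services a type may use and those the instance was observed to use.

```python
# Hypothetical repository content: services each MO type may provide/consume.
CONTROL_REPOSITORY = {
    "router": {"provides": {"routing", "IP"}, "consumes": {"power"}},
    "application": {"provides": set(), "consumes": {"IP", "DNS"}},
}


def factory(attribute_list):
    """Build an MO record from an attribute list of (label, value) pairs."""
    attrs = dict(attribute_list)
    possible = CONTROL_REPOSITORY[attrs["type"]]
    observed = set(attrs.get("services_in_use", ()))
    return {
        "name": attrs["name"],
        "type": attrs["type"],
        # Keep only the possible services this instance actually uses.
        "provides": possible["provides"] & observed,
        "consumes": possible["consumes"] & observed,
    }


order = [
    ("type", "application"),
    ("name", "billing-app"),
    ("services_in_use", ["IP"]),
]
mo = factory(order)
```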
- Creation of an Inventory of Instances
- From the model's perspective, there are two gross categories of resource or component in the external system: those that have adequate instrumentation to allow an external observer to discover them, and those that do not. Autodiscovery is used herein to refer to the process of gathering the inventory of system resources that have such instrumentation, such that a process can determine their existence and identify them properly. The discovery of system objects that lack such instrumentation is termed manual discovery.
- Autodiscovery
- During autodiscovery, the model interacts in an unmediated manner with the external system. For the sake of a simple example, this explanation will focus on the example of IP-based discovery because it is clear and well understood.
- A number of figures are referred to in this explanation, and represent the same graph of a small segment of an external system under management by the invention, seen at different phases of operation. FIG. 15 contains a legend to explain the symbols used in this explanation. As shown in that Figure, a disk with an X across is an unidentified MO. A disk with a horizontal bar across it represents an identified MO (such disks are labeled to distinguish them from each other, the letters used for that purpose in this example including A for application, C for computer, D for database, H for hub and R for router; it will of course be understood that the preferred embodiment is not limited to managing resources of these or any other specific types). A solid arrow is used to designate a path, while a hollow arrow denotes a session. A rectangle encloses and denotes a business unit. A tag denotes a service (B-Service or I-functional Service), a cloud outline a portion of the network that is not included in the illustration, and a cloud with lightning a reported fault. A square with rounded corners and a cross on it indicates a root cause, and a disk with a diagonal bar across it, an impact.
- In autodiscovery, the criteria for inclusion of MO's in the model might, for example, comprise a set of starting network addresses for the discovery process, together with the constraints on the range of acceptable addresses that effectively limit the possible inventory to addresses that fall within that range. These limits are preferably set by the system administrator at the time of installation, and so permit the administrator to limit the breadth of the discovery process (for instance, it may be decided to limit the initial discovery to only a particular subsystem of the overall external system). Such breadth constraints thus define which IP networks are to be discovered, and which IP subnets in those networks are to be ignored.
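- Breadth constraints of this kind can be expressed with Python's standard `ipaddress` module. In the sketch below, the function name `candidate_addresses` and the `include` and `exclude` parameters are assumptions, and the addresses come from a documentation-only range.

```python
import ipaddress


def candidate_addresses(include, exclude):
    """Yield host addresses inside `include` networks but outside `exclude`."""
    excluded = [ipaddress.ip_network(n) for n in exclude]
    for net in include:
        for host in ipaddress.ip_network(net).hosts():
            if not any(host in ex for ex in excluded):
                yield host


addresses = list(candidate_addresses(
    include=["192.0.2.0/29"],   # network to be discovered
    exclude=["192.0.2.4/30"],   # subnet to be ignored
))
```

- Here the /29 network offers host addresses .1 through .6, and the excluded /30 removes .4 through .6, leaving three candidates for the discovery process to probe.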
- The preferred embodiment itself sets the limits on the depth to which discovery is conducted (that is, the information actually needed is gathered, but information beyond that is not sought). Such depth constraints include the need to discover all subnets within the defined breadth constraints, all IP routing devices within the breadth constraints, the physical make-up of the IP subnets, all IP devices within those discovered subnets, all media access control methods (“MAC”) of those IP devices, all repeating devices (hubs) among those IP devices, the physical connections between hubs and other IP devices, all bridging devices among those IP devices, and the physical connections between the bridging devices and other IP devices.
- After performing such an IP discovery, the model will know all IP subnets that meet the administrator's breadth criteria, and within those criteria all IP devices and their capabilities. From that information, the model will then be able to create IP network maps that show how the IP subnets are joined together, create IP subnet maps that show how IP devices are physically connected together, and know what devices to monitor for health and performance. (It is to be emphasized, however, that in the preferred embodiment, the model will not actually include all this information, although the information will be used to construct the model. If the model later needs some of this information, what is required will again be gathered from the external system by means of dynamic agents. That is, once the model is constructed, only a subset of the available data, and only a subset of the originally discovered data, is included in the model. The knowledge base for the model is the external system itself, and not a separate database, or the model itself.)
- The autodiscovery process includes three phases: protozoan discovery, generic discovery, and personality discovery. The protozoan phase involves identification of discoverable nodes (this refers of course to “nodes” in the more concrete sense first used above, not to the nodes in the graph formed by the program, as the latter only come to be defined as the former are discovered); the generic, the association of discovered nodes with standard types (e.g., a MIB defined by a network working group's RFC); and the personality phase, which in the preferred embodiment is the association of discovered standard types with manufacturer-specific types (i.e., proprietary MIB extensions). These multiple phases are used because given discovery procedures are able to identify only limited amounts of information. The preferred embodiment incorporates preexisting procedures, but is believed to be unique in the way it combines and drives those procedures. Heterogeneous procedures that were evolved independently of each other, without any unifying design concept or strategy to ensure their ability to co-exist and cooperate, do not necessarily flow easily from one into another. Splitting the autodiscovery into phases recognizes this circumstance, and also provides a method for optimizing the flow of control among the various procedures that are utilized.
- In protozoan autodiscovery (see FIG. 16), the process emits Internet Control Message Protocol (“ICMP”) ping signals to all possible addresses within its breadth constraints on the networks it is responsible for discovering. A response to an ICMP ping indicates only that an object exists on the network, however, and does not indicate its type or identity. By recording all addresses that respond, the process effectively creates an inventory of all IP addresses that might label discoverable devices. FIG. 16 depicts nodes that have responded to the discovery process's ICMP ping requests. That Figure represents those nodes as disks, without any other information, to reflect that the model has insufficient information as yet to determine what kinds of system resources or components they are.
- The aim of the generic phase of autodiscovery is to learn what kinds of standard devices (routers, repeaters, bridges, hubs, computers, etc.) are present in the external system. At the end of this phase, the process will still not know manufacturer-specific extensions to the standard device types. FIG. 17 illustrates the result of this phase. The process has successfully discovered computers (C-nodes), hubs (H-nodes) and routers (R-nodes) in the external system in question. The discovery process issues SNMP ping requests to the items it has located in the protozoan phase. A positive response to an SNMP ping indicates that the device is SNMP-compliant. Once the discovery process knows that a given device responds to SNMP requests, further SNMP queries against the device's MIB tables reveal what kind of device it is.
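- The inventory-building step of the protozoan phase can be sketched as a sweep over candidate addresses. Sending real ICMP echo requests requires raw sockets and elevated privileges, so this sketch takes the probe as a caller-supplied function. The names `protozoan_discovery` and `probe`, and the simulated set of responding addresses, are assumptions.

```python
import ipaddress


def protozoan_discovery(network, probe):
    """Return the addresses in `network` for which `probe` reports a reply.

    `probe` is a caller-supplied callable (address string -> bool). A real
    deployment would send an ICMP echo request at this point.
    """
    return [
        str(host)
        for host in ipaddress.ip_network(network).hosts()
        if probe(str(host))
    ]


# Stand-in for the network: only these two addresses "answer".
alive = {"192.0.2.1", "192.0.2.5"}
inventory = protozoan_discovery("192.0.2.0/29", alive.__contains__)
```

- The result is only a list of addresses. As the text notes, nothing about type or identity is known at this stage.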
- The discovery process also extracts heuristic information from the Control Repository that allows it to identify any standard MIB positively. The value of either the Enterprise Object Identifier (“EOID”) MIB variable or some concatenation of MIB variables provides a generic device type's necessary and sufficient recognition criteria. Configuration information from the Model Repository allows the process to extract configuration information once the device is positively identified. This information will include information about the resource's ability to recognize and report problems it may experience, to the extent that such ability is standard in devices of the general type in question.
- Upon completion of the generic phase, the discovery process has identified the resources in the pertinent portions of the external system down to the level of manufacturer's make and model number (where appropriate). If the MIB's of all of a manufacturer's models are identical as far as their participation in the fault and performance operations is concerned, differing only in variables concerned with device configuration or the like, there is no reason to differentiate among them.
- The personality phase uses the same procedures as the generic phase. The difference is that this phase extends the process beyond standard MIB's to the manufacturer's MIB extensions. These extensions offer additional information about the device's own internal configuration and any errors it may recognize and report beyond the standard. The Control Repository houses heuristic identifiers and configuration material associated with “personalities” (the individual characteristics of a particular model or the like) in the same manner as with generic material.
- Once the dynamic agent has gathered this information, it instantiates a software object that contains the information of interest. The class of object instantiated corresponds, in the preferred embodiment, to the generic type of component or sub-component, and the personality information is represented as an attribute of that object. That is, the preferred embodiment avoids the conventional means of extending existing categories (the generic classes), i.e., using object inheritance hierarchies. The inheritance mechanism, as is well known, allows objects to share attributes and methods based on their structural relationship. The disadvantage of that technique for the purposes of the present invention is that inheritance relationships are intrinsically compiled, even in an interpreted language like Smalltalk. Adding a new class of object within an existing class would require a recompilation. The approach taken in the preferred embodiment avoids the need to recompile, and thus permits the invention to continue running without interruption.
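- The contrast with inheritance can be illustrated with a short sketch in which a manufacturer's "personality" is plain data attached to an instance of the generic class. The class `ManagedObject`, the `personalities` table, the vendor and model names, and the error names are invented for illustration. Python itself would not require recompilation to add a subclass, so the sketch shows only the data-driven shape of the design, not the recompilation issue.

```python
class ManagedObject:
    """One generic class per device kind; personality is plain data."""

    def __init__(self, generic_type, name, personality=None):
        self.generic_type = generic_type
        self.name = name
        self.personality = personality or {}

    def reportable_errors(self):
        base = {"linkDown"}  # assumed generic capability
        return base | set(self.personality.get("extra_errors", ()))


# Personalities loaded at runtime (e.g., from a repository); no new classes.
personalities = {
    "acme-r9000": {"vendor": "Acme", "extra_errors": ["fanFailure"]},
}

router = ManagedObject("router", "r1", personalities["acme-r9000"])
plain = ManagedObject("router", "r2")
```

- Supporting a new model then amounts to adding an entry to the table while the program runs, rather than defining a new subclass.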
- As an example of the discovery process, IP discovery is illustrated in FIG. 18A. This is typically handled by a single software agent, which does “boxes and wires” discovery (“BWD”), after which other agents fill in system-level information such as applications and databases. If desired, a given site can break the BWD agent into multiple agents each responsible for a subset of the enterprise. (It should be noted that while the process is illustrated as a sequence of steps, in practice, the entire discovery process is preferably executed in parallel, which permits a dynamic agent to pass off control to another dynamic agent that can pursue alternate discovery paths.)
- The process begins with the receipt from the system administrator of a list of networks, subnets, subnet ranges, and hosts or host ranges. In this step, every subnet or host is passed in with a subnet mask. A list of subnets to discover is then built; this may include a predefined list of hosts. For each subnet, a broadcast ping is emitted, to see if anything answers. Based on the response, a list of all possible addresses on the subnet is made, and those addresses are pinged. Then, an initial SNMP query is done. (All SNMP-compliant devices are handled before any non-compliant devices, to save time.)
- The initial SNMP query requests how many interfaces the queried node has, whether it has its routing flag turned on, and what system services it supports. As it is possible that a device that does not support a service may nonetheless report that it does support that service, the query also includes the system name, confirmation that the node's apparent address is actually in the node's address table, and if so, the node's subnet mask.
- Each SNMP node has its entire address table walked, and any nodes having multiple addresses (mostly routers) are queried for their address resolution protocol (“ARP”) tables. (This table is important because it is the only means by which one can hook a computer up to its LAN access port, usually a hub port, as the hubs only know about MAC addresses.)
- One point to be observed is that while a router (for example) may have more than one active IPAddress, it is not desired to create multiple nodes in the model, and neither is it desirable to query the router for the same information repeatedly. The ARP information is used to avoid both of these pitfalls.
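- The text does not spell out the de-duplication procedure. The following sketch shows one plausible approach, assumed here for illustration only: each responding address is mapped to the set of addresses the device reports as its own, and any address already attributed to a known device is skipped. The function name and the input layout are hypothetical.

```python
def merge_by_address_table(address_tables):
    """Group IP addresses that belong to the same device.

    `address_tables` maps each responding address to the set of addresses
    the device at that address reports as its own (hypothetical input).
    """
    nodes, seen = [], set()
    for address in sorted(address_tables):
        if address in seen:
            continue  # already covered: do not query or model it again
        own = set(address_tables[address]) | {address}
        seen |= own
        nodes.append(sorted(own))
    return nodes


tables = {
    "10.0.1.1": {"10.0.1.1", "10.0.2.1"},  # router, first interface
    "10.0.2.1": {"10.0.1.1", "10.0.2.1"},  # same router, second interface
    "10.0.1.7": {"10.0.1.7"},              # single-homed host
}
nodes = merge_by_address_table(tables)
```

- Three responding addresses collapse to two nodes, so the router is represented once and is not queried a second time through its other interface.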
- All SNMP nodes are queried for interfaces, ifStack, etc. The process then builds the SNMP nodes; that is, the process instantiates software objects representing the respective nodes. These objects are not part of the model, but will eventually be used to forward the necessary information to the model for the latter to create a corresponding MO. Any repeater boards or ports are built, and MAC addresses are retrieved. Vendor specific discovery is performed, and finally, factory orders (that is, objects containing the information needed by the Factory component to create a corresponding MO), are sent to the model for that purpose. Once all SNMP nodes have been processed in this way, non-SNMP nodes are processed, after ARP tables for their subnet have been retrieved. Once the entire subnet has been completed, connections are made between hub and computers.
- Preferably, the threads of the agent correspond to respective ones of the boxes shown in FIG. 18A. The ping thread receives Subnet objects in its queue and generates IPAddress objects via its pinging. These latter objects pass through the other queues.
- Manual Discovery
- Not all resources in an external system, in general, will have instrumentation sufficient to participate in the discovery process described above. In some cases, this may be simply because the resources of interest include ones that the system itself does not recognize, such as a subdivision of the business enterprise. For example, the “New York Accounting Department” is not meaningful to the distributed computing system itself, although it does represent the way people in the enterprise think of a particular group of applications and devices. In other cases, devices or applications simply may not observe instrumentation standards. In the preferred embodiment, manual intervention is used to add MO's corresponding to both types of resources to the model. Administrative functions at a user station allow the system administrator to add such MO's by hand. The addition of such MO's (including applications (A-nodes) and databases (D-nodes)) is illustrated in FIG. 18.
- Once the necessary information regarding the discovered component has been put into the proper format, it is passed through the Agent Manager cascade to the model server. The Dispatcher delivers it to the Factory component, which instantiates a corresponding MO.
- II. Model Building—Constructing Interrelationships
- The next phase of building the model is constructing the mesh of actual service interrelationships among MO's.
- One important aspect of the invention is premised on the idea that these interrelationships are a key way of characterizing the system, and that the understanding of the way that faults and performance degradations flow through the system must depend on these interrelationships.
- The model introduces the concept of service to describe relationships between MO's. All MO's participate in the network by consuming or providing services (or both). A singleton or isolate (an MO having no relationship to any other MO), by definition, cannot be part of the network. The concept of services is used to constrain the relations between MO's. By restricting the number of possible services, the system forces “services” as represented in the model to be abstract because they necessarily have to ignore subtler distinctions. More importantly, perhaps, services inherently impose constraints through cardinality co-occurrence restrictions. For instance, an application needing IP services is constrained to find IP services on the same copy of the operating system that it is running on. The IP MO provides IP services to all requesters running on that OS; therefore, IP services must support a one-to-many cardinality.
- The purpose of services in the preferred embodiment of the invention is to provide MO's with the ability to determine what other MO's they connect to, in as many cases as possible. If an application needs IP service, for example, its MO will construct a relation to its network protocol MO. If an operating system needs domain name services (“DNS”) within its subnet, its MO will construct a relation to the MO corresponding to its local DNS service provider. If multiple services differentiate themselves only by tolerance, the MO representing the element that needs the service will form a relation with the MO of the provider whose service tolerances best meet the service-consumer MO's requirements.
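- The tolerance-based choice in the last sentence can be sketched as a selection function. The text does not define what "best meets" means, so the rule used here is an assumption: among providers that satisfy the requirement, pick the one whose offered tolerance is closest to it. The latency metric, the function name, and the provider names are likewise hypothetical.

```python
def choose_provider(required_latency_ms, providers):
    """Pick the provider whose tolerance best meets the requirement.

    Assumption: "best" means the closest fit among the providers that
    satisfy the consumer's requirement.
    """
    acceptable = [
        p for p in providers if p["latency_ms"] <= required_latency_ms
    ]
    if not acceptable:
        return None
    return max(acceptable, key=lambda p: p["latency_ms"])["name"]


providers = [
    {"name": "dns-a", "latency_ms": 5},
    {"name": "dns-b", "latency_ms": 40},
    {"name": "dns-c", "latency_ms": 200},
]
chosen = choose_provider(50, providers)
unmet = choose_provider(1, providers)
```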
- FIG. 19 illustrates some of the computational services that the discovered MO's of FIG. 19 provide and consume. In contrast, FIG. 20 illustrates some of the higher-level services that the discovered MO's provide in response to the organizational use of the computational systems.
- As discussed above, the model becomes a set of MO's that know how to resolve their connection needs, maintain dictionary memory structures of those MO's that are at a given time providing essential services for them, and consuming essential services from them. If all such service relationships were realized and displayed simultaneously, the result would be a snapshot of the internetworked MO's at that moment, i.e., a map of the paths of the network. The network of the model, that is, is an emergent property of MO's and services, and is not itself the ultimate focus of the model as the model is used in the preferred embodiment.
- FIG. 21 represents the world of the model in terms of a network that has emerged from realized computational services or paths. Although this view is both volatile, and lacking in the richness afforded by services in general, it is visually simpler and so has greater graphical explanatory power.
- FIG. 22 illustrates a more complete view of the world of the model than does the previous FIG. Here, in addition to paths, we see sessions, the network of MO's that emerges owing to patterns of organizational use of the services those MO's provide. Ultimately, it is necessary for computational services to realize sessions. The organizational-use model and the computational model are not transitive, however. The model is able to constrain possible paths over which a session may be realized, but cannot deduce—or explain—the necessity of a given session based on the computational model. This becomes particularly apparent when viewing a session between say, an application and a database from the user's perspective. While the session may be constant (meaning only that “this” application always interacts with “this” database), from the perspective of the computational model the path may change considerably—and invisibly to the organizational-use model. In other words, it is not always possible to infer the organizational-use model from the computational model.
- The organizational-use model describes relations among MO's that are motivated by people wanting to accomplish some purpose through use of the distributed computing ensemble. This model is a grouping mechanism that informs the viewer of the preferred embodiment of the invention that particular MO's are related, but does not explain why. For instance, if user Un is a VP of marketing, she may need a number of networked automation resources available to her at all times to perform her job effectively. Business unit BUm is, say, the Comptroller's department. That department's job function also calls for a number of networked automation resources to be available at all times. In other words, for the purposes of the world that this embodiment represents, users and groups of users are containers of networked resources. These resources can be computers, applications, telephone extensions, printers, and so forth. For graphical purposes, a depiction such as that in FIG. 23 represents these groupings of needed resources as containers whose elements match users' basic level of categorization of resources. In other words, if business users “see” their resources as computers and the applications running on them, and not as all the behind-the-scenes devices, services and middleware on which the proper functioning of those resources depends, then that is their effective grouping. Because the model contains the fully elaborated microscopic level of “behind-the-scenes” resources and their service relations, the end user's view does not preclude detail; it simply removes it from the center of focus.
- It should be noted that while FIG. 23 may be an effective graphical representation of business unit or user containment, it is not the only notational variant that may be useful, and is not the only variant within the scope of the invention. FIG. 24 depicts a user U as a service consumer of application A in path notation; this notation makes clear that U suffers the impact of changes to A—as far as the model is concerned, there is no inherent difference between devices and end users for spread of impact.
- In the preferred embodiment of the present invention, relations between MO's are classified as either simple or complex. A “session” in this embodiment comprises both path and service, as explained above. In a complex relationship, these two elements vary independently, while in a simple relationship, they vary in unison.
- For example, networkAccess is a simple relationship in UNIX. The network protocol is responsible for providing network access; that is, networkAccess is a service of the network protocol. There is no particular path choice that the system can make; by virtue of running on the OS, the application potentially has the service. Physical connections between, say, a card and a backplane are also simple; in this case, however, the path is the focus. There is no particular service choice available. By virtue of being seated in the containing box, the card is connected to the backplane. The simple isConnected relationship here does not even specify that the system uses the given connection, let alone for what purpose. It records only that there is a connection.
- From the viewpoint of the preferred embodiment, paths are important for the reasoning engine to compute the impact of physical and lower-level logical failures on higher-level sessions. Services, on the other hand, categorize the kinds of relationships that make up the system at increasing levels of abstraction, and therefore provide a dimension of impact analysis from differing perspectives. For example, suppose one wants to know how reliable database service was for some set of business units. Categorization by service allows the model to cut simply across different database service providers and provide a meaningful aggregation of both fault and impact data for that service.
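- The categorization by service described above amounts to grouping fault and impact records by service type rather than by provider. A minimal sketch follows; the record layout (service, provider, business unit, downtime in minutes) is a hypothetical one chosen for the example.

```python
from collections import defaultdict

def aggregate_by_service(records):
    """Group (service, provider, business_unit, downtime_minutes) records by service type."""
    totals = defaultdict(lambda: {"faults": 0, "downtime": 0, "units": set()})
    for service, _provider, unit, downtime in records:
        entry = totals[service]
        entry["faults"] += 1          # one record per fault
        entry["downtime"] += downtime  # impact summed across all providers of the service
        entry["units"].add(unit)       # business units that felt the impact
    return dict(totals)

# Two different database providers contribute to one "database" aggregate.
records = [
    ("database", "D15", "BU22", 12),
    ("database", "D31", "BU23", 5),
    ("networkAccess", "IP", "BU22", 3),
]
summary = aggregate_by_service(records)
```

- The provider field is deliberately ignored in the grouping, which is what lets the aggregate cut across different database service providers.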
- In the preferred embodiment, relationship types have the following attributes:
Attribute | Description |
---|---|
ID | internal symbol |
Name | human readable name |
Cardinality | one-to-one, one-to-many, many-to-one, many-to-many |
Dependency | parent, child, both, neither |
Complex/Simple | complex (independently varying path and service) or simple (dependent variation of path & service) |
- Relationship rules specify what entity types participate in what types of relationships. The preferred embodiment splits rules into producer (or parent) halves and consumer (or child) halves. This split provides a very flexible means of defining constraints on relationship participants, and allows the model to perform the discovery process described above, during runtime.
- A relationship rule has the following attributes:
Attribute | Description |
---|---|
ID | foreign key back to service type |
Parent/Child | rule (see example below) |
Class | foreign key to the entity type |
NextHopService Type | foreign key to a service type; used for path determination (for instance, networkAccess) |
Partner Restriction | foreign key to an entity type |
- A given entity type inherits the relationship rules of its superclasses. It has the ability to add new rules that further constrain a more general rule, or even block the use of a rule defined by superclass types. For example, suppose that in a given client site, both IP and IPX provide networkAccess services. The preferred embodiment can express this capacity as a general network access rule for applications. Client application applicationA can use only IP, however. The definition for applications of the same type as applicationA, then, needs to override a general (child) network access rule that offers no restriction and provide a rule that restricts instances to IP.
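- Rule inheritance with override and blocking can be sketched as a lookup that walks up the superclass chain. The `EntityType` class, the `BLOCKED` sentinel, and the use of `None` for "no partner restriction" are assumptions made for this illustration; the patent does not specify a representation.

```python
BLOCKED = object()  # sentinel: the rule is blocked or was never defined

class EntityType:
    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.child_rules = {}  # service -> set of allowed provider types, None, or BLOCKED

    def effective_child_rule(self, service):
        """Return the nearest rule for the service, searching this type and then its superclasses."""
        entity = self
        while entity is not None:
            if service in entity.child_rules:
                return entity.child_rules[service]
            entity = entity.superclass
        return BLOCKED

    def may_consume(self, service, provider_type):
        rule = self.effective_child_rule(service)
        if rule is BLOCKED:
            return False
        return rule is None or provider_type in rule

# General rule: applications may take networkAccess from any provider.
application = EntityType("Application")
application.child_rules["networkAccess"] = None
# Override: applications of applicationA's type are restricted to IP.
application_a = EntityType("ApplicationA", application)
application_a.child_rules["networkAccess"] = {"IP"}
```

- Because the most specific rule wins, `application_a` accepts IP but rejects IPX, while the general `application` type still accepts both.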
- Again, assume there is a predefined class Program, and that both NEWServer and NEWClient are to be subclasses of Program. The model definer can also define a default value for the name attribute (such as “NEWClient” and “NEWServer”), as illustrated in the following tables:
(NEWService definition)
Attribute | Value |
---|---|
ID | NEWService |
Name | New Service |
Cardinality | one-to-many |
Dependency | child |
Complex/Simple | complex |

(NEWService parent rule)
Attribute | Value |
---|---|
ID | NEWService |
Parent | NEWServer |
NextHop | networkAccess |
Partner Restriction | none |

(NEWService child rule)
Attribute | Value |
---|---|
ID | NEWService |
Child | NEWClient |
NextHop | networkAccess |
Partner Restriction | none |
- At this point, therefore, the invention has discovered what components populate the external system and constructed a model representing the service relationships that exist among those components. In particular, the system administrator has been able to define, in addition to the individual hardware and software components of the external system, higher-level groupings thereof, based on users' groups, business units or whatever other basis meets the needs of the organization that owns the external system.
- III. The Interaction Phase
- As was stated above, two distinct kinds of events enter the model—systemic events (addition or deletion of MO's), and MO events (a change in the state or condition of a known MO). Both kinds of event are forms of interaction between the model and the external system. The function of the model at this point is to (1) correlate changes that affect disparate MO's back to root-cause events (in cases where the root-cause event is one that is identifiable as such); and (2) use the known patterns of interrelations among MO's to predict the impact of those changes. The event processing within the model will now be described.
- FIG. 25 illustrates the state of a portion of the model after the dynamic agents have collected reported information from the resources represented by hub H7, computers C10 and C11 and applications A14 and A16. The MO's in this portion of the model report that they have received state-changing events.
- FIG. 26 illustrates the effects of the model's rootward graph traversal. The event at node H7 indicated that it is a root cause. Therefore, rootward traversal through the model stops at H7, and the model generates an alarm that focuses on H7 but contains all the concomitant sympathetic events from the affected leaves. It should be noted that application A14 may not appear in the FIG. to be directly leafward from the root-cause event at H7, because of the direction of the arrow representing the session between that application and database D. In fact, however, the existence of that session means that that application is included in a leafward traversal from H7.
- FIG. 27 illustrates the leafward spread of impacts through this portion of the model. The failure of the hub at H7 has caused loss of service to computers C10 and C11, and to sessions S19 and S20 that connect to database D15 running on computer C10. The loss of these two sessions interrupts the work that people in business units BU22 and BU23 are doing. Thus the model has predicted the impact of a lower-level device failure on the activities of end users. These users are not resources that the system recognizes, nor resources that the system is able to collect data about (except through the intermediary of a trouble ticketing process). While the model's predictions for computers C10 and C11 may have corroborating evidence from the sympathetic events logged for those nodes, users and business units may well in a given instance have only predicted impacts at this time.
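- The rootward traversal and leafward spread of impact described for FIGS. 26 and 27 can be sketched as two breadth-first searches over the service relations. The dictionary layout and function names are illustrative assumptions, as is the pairing of sessions with business units (S19 with BU22, S20 with BU23), which the text does not state.

```python
from collections import deque

def find_root_cause(node, providers, events):
    """Walk rootward (consumer -> provider) until an MO whose event is flagged isRootCause."""
    seen, pending = {node}, deque([node])
    while pending:
        current = pending.popleft()
        if events.get(current, {}).get("isRootCause"):
            return current
        for provider in providers.get(current, ()):
            if provider not in seen:
                seen.add(provider)
                pending.append(provider)
    return None

def predict_impact(root, providers):
    """Walk leafward (provider -> consumer) and collect every MO that depends on root."""
    consumers = {}
    for consumer, provider_list in providers.items():
        for provider in provider_list:
            consumers.setdefault(provider, []).append(consumer)
    seen, pending, impacted = {root}, deque([root]), []
    while pending:
        for consumer in consumers.get(pending.popleft(), ()):
            if consumer not in seen:
                seen.add(consumer)
                impacted.append(consumer)
                pending.append(consumer)
    return impacted

# consumer -> providers, loosely following FIG. 27
providers = {
    "C10": ["H7"], "C11": ["H7"], "D15": ["C10"],
    "S19": ["D15"], "S20": ["D15"], "BU22": ["S19"], "BU23": ["S20"],
}
events = {"H7": {"isRootCause": True}, "C10": {"isRootCause": False}}
```

- Note that business units appear in the impact list even though no event was ever logged for them, which matches the point in the text that users may have only predicted impacts.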
- A schematic summary of the interaction phase is shown in FIGS. 28 through 32.
- A more-detailed explanation of the functioning and structure of important elements of the system of the preferred embodiment will now be provided.
- How to Analyze Events
- The Analyzer function of the Agent Manager in the preferred embodiment is responsible for interpreting incoming data from whatever source, recognizing the data as an instance of some predefined event of interest to the system, and creating the association between the captured external event and the model event. Techniques of analyzing, interpreting and associating messages vary according to the data collection technique.
- Structure of an Event Definition
- Events in the model are instances of a pattern known as an “eventDefinition”. EventDefinitions function as a kind of frame to hold information culled from a captured event.
- In keeping with the data-driven nature of the invention, the inventory of eventDefinitions and their interpretation reside in the Control Repository. When new eventDefinitions and their interpretation arise (because of new MIB's, new devices or applications, a greater degree of granularity in event discrimination, etc.), the system administrator can update the Repository with the appropriate information and the model updates itself dynamically.
- An example of the structure of an eventDefinition is as follows:
Field | Description |
---|---|
resultantState | severity indicator |
messageText | foreign key to a message dictionary based on language; each message has substitutable parameters defined that are filled in with values pulled out of actual events |
eventId | a unique identifier for the event |
isRootCause | true or false |
- If the model captures information about an entity type via SNMP, it will have relationships to several SNMP objects in the repository: the SNMP enterprise OID; a table mapping trap numbers to eventDefinitions; and one or more MIB tables.
- If the model captures information about an entity type via log file tailing, it will have relationships to several logging objects in the repository: the file(s) to tail; tables of pattern matches (inclusion criteria); parsing rules—what fields to extract from the message; and the eventDefinition.
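- For the log-tailing case, the chain of inclusion pattern, parsing rule and eventDefinition might look like the sketch below. The log line format, the event identifier `AMD_MOUNT_FAIL`, and the message template are invented for illustration; only the four eventDefinition fields come from the text.

```python
import re

# Hypothetical repository content: one eventDefinition and one logging rule.
EVENT_DEFINITIONS = {
    "AMD_MOUNT_FAIL": {
        "resultantState": "degraded",
        "messageText": "Automounter on {host} could not mount {path}",
        "isRootCause": False,
    }
}
LOG_RULES = [
    {
        "include": re.compile(r"^\[amd\]"),  # inclusion criterion
        "parse": re.compile(r"host=(?P<host>\S+) mount failed: (?P<path>\S+)"),  # fields to extract
        "eventId": "AMD_MOUNT_FAIL",
    }
]

def analyze_line(line):
    """Turn one tailed log line into a model event, or None if no rule applies."""
    for rule in LOG_RULES:
        if not rule["include"].search(line):
            continue
        match = rule["parse"].search(line)
        if match is None:
            continue
        definition = EVENT_DEFINITIONS[rule["eventId"]]
        return {
            "eventId": rule["eventId"],
            "resultantState": definition["resultantState"],
            "isRootCause": definition["isRootCause"],
            # Substitutable parameters are filled in from the captured event.
            "message": definition["messageText"].format(**match.groupdict()),
        }
    return None
```

- Keeping the patterns and definitions in data rather than code mirrors the data-driven design: adding a new event only requires a new repository entry.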
- Defining the State Calculation of a Model
- While it is possible for system-defined MO's to have a single state, which they calculate based on the “resultantState” field of an event, the preferred architecture demands that states be properties of the viewer of a situation rather than of any component of the situation itself. Accordingly, it is preferred to replace a state value with a pointer to a tuple that has the dimensions viewer and formula. The Control Repository then contains all aspects of the association, allowing different sites to configure different classes of viewer and site-specific state-calculation formulae. Nonetheless, it is possible to envisage a version in which, at creation time, the user can define the state calculation for a user-defined grouping of MO's (i.e., logical groupings) as a predetermined formula that will provide minimum, maximum and average value calculations.
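- Treating state as a property of the viewer can be sketched as a lookup keyed on a (viewer, grouping) pair that selects a formula. The viewer classes, the grouping name and the numeric severity scale are hypothetical; the minimum, maximum and average formulas are the ones named in the text.

```python
from statistics import mean

# The predetermined formulas mentioned in the text.
FORMULAS = {"minimum": min, "maximum": max, "average": mean}

# Hypothetical site configuration: (viewer class, grouping) -> formula name.
STATE_RULES = {
    ("operator", "db_group"): "maximum",   # operators see the worst member state
    ("manager", "db_group"): "average",    # managers see an overall picture
}

def group_state(viewer, group, member_severities):
    """Compute the state of a grouping as seen by a particular class of viewer."""
    formula = FORMULAS[STATE_RULES[(viewer, group)]]
    return formula(member_severities)
```

- The same member severities therefore yield different states for different viewers, for example `group_state("operator", "db_group", [1, 4, 1])` versus the same call for `"manager"`.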
- Dynamic Agents in Runtime
- In the preferred embodiment, dynamic agents are semi-autonomous Java applications that are responsible for: interacting with OS-specific or protocol-specific sources; eliciting data according to a specific “recipe”; and analyzing the elicited data. As already discussed, each dynamic agent comprises three functional (although not necessarily structural) components: a Sensory Monitor, an Instruction Set, and an Analyzer. The sensory monitors interact with system- or high-level protocols—for instance, TCP/IP sockets, OS file systems, CMIP or SNMP. Instruction sets belong to the process of eliciting data. They are control parameters and methods that direct sensory monitors in how to connect to a particular host, to delay three minutes between polls, etc. Analyzers belong to the process of analyzing data. They interpret and winnow information coming in from the dynamic agents (for instance, “take all messages that begin with [amd];” etc.).
- The dynamic agents run according to a thread-based active object model, comprising at least one independent thread constantly running in an endless main loop. External system events interrupt the main loop; message-appropriate callbacks run on interrupt. New control information thus interrupts dynamic agents and alters their states on the fly.
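- A thread-based active object of this kind can be sketched as a worker thread whose main loop waits on a control queue between polls. This is an illustration in Python rather than the Java of the preferred embodiment, and the class name, message kinds and polling counter are assumptions made for the example.

```python
import queue
import threading

class DynamicAgent:
    def __init__(self, poll_interval=0.01):
        self.poll_interval = poll_interval
        self.polls = 0
        self._control = queue.Queue()
        self._active = True
        # Message-appropriate callbacks, selected by message kind.
        self._callbacks = {"set_interval": self._on_set_interval, "stop": self._on_stop}
        self._thread = threading.Thread(target=self._main_loop, daemon=True)

    def start(self):
        self._thread.start()

    def send(self, kind, value=None):
        """Deliver control information to the running agent."""
        self._control.put((kind, value))

    def join(self, timeout=None):
        self._thread.join(timeout)

    def is_alive(self):
        return self._thread.is_alive()

    def _on_set_interval(self, value):
        self.poll_interval = value

    def _on_stop(self, _value):
        self._active = False

    def _main_loop(self):
        while self._active:
            try:
                kind, value = self._control.get(timeout=self.poll_interval)
            except queue.Empty:
                self.polls += 1  # no control message arrived: do one unit of regular work
                continue
            self._callbacks[kind](value)  # control message interrupts the regular work
```

- Because control messages are consumed inside the loop, new control information alters the agent's state without stopping and restarting the thread.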
- The Agent Manager hosts the independent threads, and provides system and housekeeping services including: ensuring that all threads that should be running, are running; knowing where to send the data collected by the dynamic agents; handling thread interruption, termination and monitoring; and offering inter-thread messaging, queuing and synchronization services. That is, the Agent Manager provides an execution context for the dynamic agents.
- A combination of inheritance and interfaces fits the dynamic agent components into the execution context. Integration requires:
- an understanding of Agent Manager synchronization mechanisms;
- asynchronous queues and interrupts;
- message passing behavior;
- message structure;
- mechanisms to pass messages to other services in the system of the invention and to receive messages from them;
- self-scheduling within the Agent Manager; and
- return of information to the model server.
- The foregoing, however, mainly serves to supply the model with stream data. Operational end users also need ad hoc data. For instance, a network operator might ask the system to tell her about the collision rates on a given hub for the next ten minutes, polling at 20 millisecond intervals. The Agent Manager architecture allows users to demand data in this fashion. The data-driven architecture services users' ad hoc requests through flexible, reusable Java modules. Each of these modules follows the interface definitions required for the services of the system of the invention, but needs minimal exposure to the mechanics of the execution context to get its services.
- Extending stream data collection techniques to ad hoc requests requires being able to:
- tell the system of the existence of a module;
- find an Agent Manager to run it;
- schedule the module's running;
- get an Agent Manager to find the module;
- let a user pass state-changing control information to the running module;
- return collected data to the specific requester rather than the general pool of data; and
- deal with the module's having two masters—the end user and the local Agent Manager, each of which needs to control some aspects of the module's execution.
- In the preferred embodiment, the last aspect is the key to the others. The Agent Manager needs to handle the scheduling of all modules running within it, and to ensure that they have all necessary resources. On the other hand, the implementation of the running module is a black box from the viewpoint of the Agent Manager. Only the requester (human or automated) knows what the module is for. The requester needs to retain addressability of the module and to receive the collected information. Two independent subsystems in the environment, therefore, need to control the module simultaneously.
- These competing demands are mutually exclusive, however. One set concerns governing compliant participation in the dynamic agent manager context. These methods are predetermined in the design of the embodiment, and thus programmers can build them into the system itself. Object inheritance provides a mechanism for the module to use a parameter block object and its queue manipulation routines that the local context furnishes at instantiation time.
- The implementation of the module reflects user interpretation and business needs, and so is not predictable at design time. It does not play a role within the bounds of the closed system, so its role cannot be structural as far as the execution context is concerned. The system provides a fixed path for passing control and information messages for the individual request by assigning arbitrary CORBA CosNaming service names for both directions of the conversation at the time of the request. Such assignment tells a user request to address requests through service name X and receive responses through service name Y. (The CosNaming service also serves to define the cascades of Agent Managers discussed previously.) The ad hoc module receives the complementary information: listen for interrupts on service name X and return information on service name Y. Implementation remains up to the individual request.
- The user-written Java module is now situated in the infrastructure. The execution context must find the module and pass its parameters to it.
- When an ad hoc request starts up, it instructs the Agent Manager to fetch a bundle from the Control Repository. A bundle is a complex structure that contains modules, named parameter lists, and corequisite dependencies. For instance, a bundle for SNMP polling of hubs could be structured as follows:
    bundle SNMP_hub_poller {
        module=SNMP_poller;             //name of Java module
        parms1=SNMP_hub_parms;          //variables to poll for
        parms2=davids_devices;          //list of devices to poll
        parms3=normal_frequency;        //every 15 seconds
        needed_service=SNMP_polling;    //dynamic agent manager must offer SNMP polling service
    }
- This bundle encapsulates the following: the name of a Java module to run, three named sets of parameters, and a corequisite service the Agent Manager must offer. Bundling software needs to perform symbolic substitutions, and so requires a parameter list namespace.
- An example of reuse will now be given. Suppose a bundle for polling bridging devices is set up as follows:
    bundle SNMP_bridge_poller {
        module=SNMP_poller;             //name of Java module
        parms1=SNMP_bridge_parms;       //variables to poll for
        parms2=franks_devices;          //list of devices to poll
        parms3=high_frequency;          //every 5 seconds
        needed_service=SNMP_polling;    //dynamic agent manager must offer SNMP polling service
    }
- This bridge-polling bundle shares the Java module and corequisite service needs of the hub-polling bundle illustrated above. It achieves reuse by changing the incoming parameter lists designating the devices to poll, the variables polled for, and the frequency with which to poll them. In this instance, both bundles happen to be stored as such in the Control Repository. This is an implementation detail that reflects that the two kinds of polling are frequent occurrences for the system administrator, and therefore convenient to have persist. On-the-fly overrides can produce the same results, but demand that the administrator spend more effort at runtime.
- More generally, the software must:
- allow a request to override parameters by providing a list of values on the fly;
- allow a request to override parameters by passing a different parameter list name than the default;
- refer to a specific parameter list in a case where a module has several lists; and
- associate the parameter list override to a specific module in a bundle with multiple modules.
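- The first three override requirements above can be sketched as a function that resolves a stored bundle into the parameters actually passed to an Agent Manager. The dictionary layout and the contents of the parameter lists (device names, variable names) are invented for illustration; the bundle and list names follow the hub-polling example. Bundles with multiple modules are not covered by this sketch.

```python
# Hypothetical Control Repository content.
PARAMETER_LISTS = {
    "SNMP_hub_parms": ["collisions"],
    "davids_devices": ["hub1", "hub2"],
    "normal_frequency": {"seconds": 15},
    "high_frequency": {"seconds": 5},
}
BUNDLES = {
    "SNMP_hub_poller": {
        "module": "SNMP_poller",
        "parms": {
            "parms1": "SNMP_hub_parms",
            "parms2": "davids_devices",
            "parms3": "normal_frequency",
        },
        "needed_service": "SNMP_polling",
    }
}

def build_execution(bundle_name, list_overrides=None, value_overrides=None):
    """Resolve a bundle, applying named-list overrides first and literal values last."""
    bundle = BUNDLES[bundle_name]
    list_overrides = list_overrides or {}
    resolved = {}
    for slot, default_list in bundle["parms"].items():
        # Override by passing a different parameter list name than the default.
        list_name = list_overrides.get(slot, default_list)
        resolved[slot] = PARAMETER_LISTS[list_name]  # symbolic substitution
    # Override by providing a list of values on the fly.
    resolved.update(value_overrides or {})
    return {
        "module": bundle["module"],
        "needed_service": bundle["needed_service"],
        "parms": resolved,
    }
```

- Addressing each parameter list by its slot name (`parms1`, `parms2`, `parms3`) is what lets a request refer to one specific list when a module has several.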
- Thus, speaking more broadly, ad hoc requests are a mechanism for a user to manipulate the demand-data-flow behavior of the system. In some cases, this behavior is configured in a more-or-less predetermined way administrators set up beforehand (different bundles that have much overlap). In some cases, the behavior is fully ad hoc (the same bundle gets on-the-fly overrides). In broader terms, the system has the ability to add new behavior dynamically, in accordance with user needs. An existing, well-defined static structure provides the mechanism for adding the new behavior.
- Potentially, many Agent Managers run in any particular copy of the system. An ad hoc request needs to find the right one to service its needs, and the Agent Manager needs to realize the demand for corequisite services. The preferred embodiment of the present invention deals with this issue through the concept of service. Broadly speaking, an object enlists the services of other objects when it passes the responsibility of returning some desired result set to those other objects. In the narrower context of bundles and Agent Managers, bundles need services from the execution context to run. Services are the codified sets of behavioral expectations within a configuration instance. That is, for the system to be able to handle SNMP device polling (for example), some component (or set of components) must have the responsibility of providing SNMP polling service. Each site may configure the services that are appropriate to its runtime environment and create configuration-time roles and services.
- Any instance of the system of the invention must be able to provide a site-specific inventory of these services. Unless all services are available on all distributed Agent Managers, the system needs to maintain a catalog that tells it where it has distributed those services—which components it has charged with the responsibility of executing the proper tasks. A model such as CORBA Trading Services can meet this need.
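- The catalog of distributed services can be pictured with the in-memory stand-in below. It is not the CORBA Trading Service API; the class and method names are hypothetical and the sketch only shows the advertise-and-look-up idea.

```python
class ServiceCatalog:
    """Records which Agent Managers have been charged with offering which services."""

    def __init__(self):
        self._offers = {}  # service name -> list of Agent Manager names

    def advertise(self, manager, service):
        managers = self._offers.setdefault(service, [])
        if manager not in managers:
            managers.append(manager)

    def withdraw(self, manager, service):
        managers = self._offers.get(service, [])
        if manager in managers:
            managers.remove(manager)

    def find_manager(self, service):
        """Return one Agent Manager offering the service, or None if nobody does."""
        managers = self._offers.get(service)
        return managers[0] if managers else None

catalog = ServiceCatalog()
catalog.advertise("agent_manager_east", "SNMP_polling")
catalog.advertise("agent_manager_west", "log_tailing")
```

- An ad hoc request whose bundle names `needed_service=SNMP_polling` would consult such a catalog to find an Agent Manager able to run it.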
- Conversely, an Agent Manager charged with offering a given service needs to know how to offer it. Bundles play a role here, too. The Agent Manager requests a bundle from the Control Repository. The Repository maps the service structurally to a specific bundle (i.e., a module and its concomitant set of parameter lists). An operator can alter the implementation of the service at will, even dynamically.
- The Agent Manager in Runtime
- An Agent Manager with no control information starts up running only the configuration handler, the communications handler, the thread manager and the inter-thread queue routines. At this point, the Agent Manager has no information as to what its goal is, what dynamic agents to run, or how to pass its data to the next hop. An Agent Manager receives its higher level objectives through two data structures: the mission package and the bundle execution. The mission package can contain bundle executions, but its main purpose is to provide control information that allows the Agent Manager to situate itself in the mesh of the infrastructure (that is, what service it should advertise for the cascade, what service it should send to for the cascade).
- Bundle executions are instantiations of data structures, called bundles, that reside in the Control Repository. A bundle, as already described, ties together named parameter lists, named parameters, and executable modules or scripts for building dynamic agents. Various portions of the bundle can be overridden to allow easy reuse. The actual content of a bundle that is passed to the Agent Manager, after any overrides have taken place, is the bundle execution.
- Once a dynamic agent has started running within the context of the Agent Manager, it is independently addressable. That is, whatever component controls the dynamic agent—the end user in the case of an ad hoc request, the model in the case of a programmatic request—is able to address the dynamic agent independently of the Agent Manager. In other words, there is no need for the higher-order entity to understand the topology or constitution of the infrastructure in order to keep tabs on, or manipulate, the dynamic agent as it runs. Agent Managers are also independently addressable, since there are occasions—especially times of administrative interaction—where the Agent Manager, rather than some dynamic agent running under it, needs to receive control information.
- All protocols and programming languages mentioned above herein are well known, and are incorporated herein by reference.
- The foregoing description of the invention, and of the preferred embodiment thereof, is sufficient to enable one of ordinary skill to practice the invention without undue experimentation, and apprises one of ordinary skill of the best mode of doing so. Moreover, while the invention has been disclosed with reference to the preferred embodiment, many modifications and variations thereof will now be apparent to those of ordinary skill. Accordingly, the scope of the invention is not to be limited by the details of the preferred embodiment, but only by the appended claims.
Claims (52)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/048,025 US6393386B1 (en) | 1998-03-26 | 1998-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
EP99915051A EP1073960A1 (en) | 1998-03-26 | 1999-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
AU33657/99A AU3365799A (en) | 1998-03-26 | 1999-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
JP2000538359A JP2002508555A (en) | 1998-03-26 | 1999-03-26 | Dynamic Modeling of Complex Networks and Prediction of the Impact of Failures Within |
PCT/US1999/006669 WO1999049474A1 (en) | 1998-03-26 | 1999-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/048,025 US6393386B1 (en) | 1998-03-26 | 1998-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
Publications (2)
Publication Number | Publication Date |
---|---|
US20020022952A1 (en) | 2002-02-21 |
US6393386B1 (en) | 2002-05-21 |
Family ID=21952355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/048,025 Expired - Fee Related US6393386B1 (en) | 1998-03-26 | 1998-03-26 | Dynamic modeling of complex networks and prediction of impacts of faults therein |
Country Status (5)
Country | Link |
---|---|
US (1) | US6393386B1 (en) |
EP (1) | EP1073960A1 (en) |
JP (1) | JP2002508555A (en) |
AU (1) | AU3365799A (en) |
WO (1) | WO1999049474A1 (en) |
US20080059214A1 (en) * | 2003-03-06 | 2008-03-06 | Microsoft Corporation | Model-Based Policy Application |
US20080114938A1 (en) * | 2006-11-14 | 2008-05-15 | Borgendale Kenneth W | Application Message Caching In A Feed Adapter |
US20080141275A1 (en) * | 2006-12-12 | 2008-06-12 | Borgendale Kenneth W | Filtering Application Messages In A High Speed, Low Latency Data Communications Environment |
US20080141276A1 (en) * | 2006-12-12 | 2008-06-12 | Borgendale Kenneth W | Referencing Message Elements In An Application Message In A Messaging Environment |
US20080141274A1 (en) * | 2006-12-12 | 2008-06-12 | Bhogal Kulvir S | Subscribing For Application Messages In A Multicast Messaging Environment |
US20080137830A1 (en) * | 2006-12-12 | 2008-06-12 | Bhogal Kulvir S | Dispatching A Message Request To A Service Provider In A Messaging Environment |
US20080140550A1 (en) * | 2006-12-07 | 2008-06-12 | Berezuk John F | Generating a global system configuration for a financial market data system |
US20080141272A1 (en) * | 2006-12-06 | 2008-06-12 | Borgendale Kenneth W | Application Message Conversion Using A Feed Adapter |
US20080178194A1 (en) * | 2002-09-27 | 2008-07-24 | International Business Machines Corporation | Integrating Non-Compliant Providers of Dynamic Services into a Resource Management infrastructure |
US20080244017A1 (en) * | 2007-03-27 | 2008-10-02 | Gidon Gershinsky | Filtering application messages in a high speed, low latency data communications environment |
US20080256395A1 (en) * | 2007-04-10 | 2008-10-16 | Araujo Carlos C | Determining and analyzing a root cause incident in a business solution |
US20080288622A1 (en) * | 2007-05-18 | 2008-11-20 | Microsoft Corporation | Managing Server Farms |
US20090006559A1 (en) * | 2007-06-27 | 2009-01-01 | Bhogal Kulvir S | Application Message Subscription Tracking In A High Speed, Low Latency Data Communications Environment |
US20090024498A1 (en) * | 2007-07-20 | 2009-01-22 | Berezuk John F | Establishing A Financial Market Data Component In A Financial Market Data System |
US20090172689A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Adaptive business resiliency computer system for information technology environments |
US20090171704A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Management based on computer dynamically adjusted discrete phases of event correlation |
US20090171708A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Using templates in a computing environment |
US20090172668A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Conditional computer runtime control of an information technology environment based on pairing constructs |
US20090171706A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Computer pattern system environment supporting business resiliency |
US20090171731A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of graphs in managing computing environments |
US20090172670A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Dynamic generation of processes in computing environments |
US20090172687A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Management of computer events in a computer environment |
US20090172674A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Managing the computer collection of information in an information technology environment |
US20090171705A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Defining and using templates in configuring information technology environments |
US20090172460A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Defining a computer recovery process that matches the scope of outage |
US20090171730A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Non-disruptively changing scope of computer business applications based on detected changes in topology |
US20090172688A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Managing execution within a computing environment |
US20090171703A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of multi-level state assessment in computer business environments |
US20090172682A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Serialization in computer management |
US20090172669A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of redundancy groups in runtime computer management of business applications |
US20090171733A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Dynamic selection of actions in an information technology environment |
US20090172461A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Conditional actions based on runtime conditions of a computer system environment |
US20090171732A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Non-disruptively changing a computing environment |
US20090172769A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Programmatic validation in an information technology environment |
US20090172671A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Adaptive computer sequencing of actions |
US20090171707A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Recovery segments for computer business applications |
US7669235B2 (en) | 2004-04-30 | 2010-02-23 | Microsoft Corporation | Secure domain join for computing devices |
US20100106742A1 (en) * | 2006-09-01 | 2010-04-29 | Mu Dynamics, Inc. | System and Method for Discovering Assets and Functional Relationships in a Network |
US20100169066A1 (en) * | 2008-12-30 | 2010-07-01 | International Business Machines Corporation | General Framework to Predict Parametric Costs |
US7778422B2 (en) | 2004-02-27 | 2010-08-17 | Microsoft Corporation | Security associations for devices |
US20100268565A1 (en) * | 2000-07-10 | 2010-10-21 | Bmc Software, Inc. | System and method of enterprise systems and business impact management |
CN101933003A (en) * | 2008-01-31 | 2010-12-29 | 惠普开发有限公司 | Automated application dependency mapping |
US20110093853A1 (en) * | 2007-12-28 | 2011-04-21 | International Business Machines Corporation | Real-time information technology environments |
US8122144B2 (en) | 2006-06-27 | 2012-02-21 | International Business Machines Corporation | Reliable messaging using redundant message streams in a high speed, low latency data communications environment |
US8250202B2 (en) | 2003-01-04 | 2012-08-21 | International Business Machines Corporation | Distributed notification and action mechanism for mirroring-related events |
US8285720B2 (en) | 2003-04-07 | 2012-10-09 | Belarc, Inc. | Grouping of computers in a computer information database system |
US8375244B2 (en) | 2007-12-28 | 2013-02-12 | International Business Machines Corporation | Managing processing of a computing environment during failures of the environment |
US8433811B2 (en) | 2008-09-19 | 2013-04-30 | Spirent Communications, Inc. | Test driven deployment and monitoring of heterogeneous network systems |
US20130152106A1 (en) * | 2009-07-22 | 2013-06-13 | International Business Machines Corporation | Managing events in a configuration of soa governance components |
US20130176867A1 (en) * | 2012-01-09 | 2013-07-11 | Florin Balus | Method and apparatus for object grouping and state modeling for application instances |
US20130262347A1 (en) * | 2012-03-29 | 2013-10-03 | Prelert Ltd. | System and Method for Visualisation of Behaviour within Computer Infrastructure |
US20130290520A1 (en) * | 2012-04-27 | 2013-10-31 | International Business Machines Corporation | Network configuration predictive analytics engine |
WO2014001841A1 (en) | 2012-06-25 | 2014-01-03 | Kni Műszaki Tanácsadó Kft. | Methods of implementing a dynamic service-event management system |
US20140281739A1 (en) * | 2013-03-14 | 2014-09-18 | Netflix, Inc. | Critical systems inspector |
US20140297684A1 (en) * | 2011-10-11 | 2014-10-02 | International Business Machines Corporation | Predicting the Impact of Change on Events Detected in Application Logic |
US8972543B1 (en) | 2012-04-11 | 2015-03-03 | Spirent Communications, Inc. | Managing clients utilizing reverse transactions |
US8998544B1 (en) * | 2011-05-20 | 2015-04-07 | Amazon Technologies, Inc. | Load balancer |
US20160004584A1 (en) * | 2013-08-09 | 2016-01-07 | Hitachi, Ltd. | Method and computer system to allocate actual memory area from storage pool to virtual volume |
US20160315818A1 (en) * | 2014-04-25 | 2016-10-27 | Teoco Corporation | System, Method, and Computer Program Product for Extracting a Topology of a Telecommunications Network Related to a Service |
US9537749B2 (en) | 2012-06-06 | 2017-01-03 | Tufin Software Technologies Ltd. | Method of network connectivity analyses and system thereof |
US20170102982A1 (en) * | 2015-10-13 | 2017-04-13 | Honeywell International Inc. | Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics |
US9632904B1 (en) * | 2013-02-15 | 2017-04-25 | Ca, Inc. | Alerting based on service dependencies of modeled processes |
US20170344926A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | System impact based logging with resource finding remediation |
US20170344413A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | System impact based logging with enhanced event context |
US20180367405A1 (en) * | 2017-06-19 | 2018-12-20 | Cisco Technology, Inc. | Validation of bridge domain-l3out association for communication outside a network |
US10313365B2 (en) * | 2016-08-15 | 2019-06-04 | International Business Machines Corporation | Cognitive offense analysis using enriched graphs |
US10332056B2 (en) | 2016-03-14 | 2019-06-25 | Futurewei Technologies, Inc. | Features selection and pattern mining for KQI prediction and cause analysis |
US10482158B2 (en) | 2017-03-31 | 2019-11-19 | Futurewei Technologies, Inc. | User-level KQI anomaly detection using markov chain model |
US10546241B2 (en) | 2016-01-08 | 2020-01-28 | Futurewei Technologies, Inc. | System and method for analyzing a root cause of anomalous behavior using hypothesis testing |
CN111052252A (en) * | 2017-09-01 | 2020-04-21 | X开发有限责任公司 | Heterogeneous method for modeling a biochemical environment |
CN111260176A (en) * | 2018-11-30 | 2020-06-09 | 西门子股份公司 | Method and system for eliminating fault conditions in a technical installation |
CN112130888A (en) * | 2020-08-12 | 2020-12-25 | 百度时代网络技术(北京)有限公司 | Method, device and equipment for updating application program and computer storage medium |
CN112433716A (en) * | 2020-11-12 | 2021-03-02 | 北京航空航天大学 | Runtime component dynamic interaction model construction method based on non-invasive monitoring |
US11005766B1 (en) * | 2020-04-06 | 2021-05-11 | International Business Machines Corporation | Path profiling for streaming applications |
CN113268370A (en) * | 2021-05-11 | 2021-08-17 | 西安交通大学 | Root cause alarm analysis method, system, equipment and storage medium |
US11100152B2 (en) * | 2017-08-17 | 2021-08-24 | Target Brands, Inc. | Data portal |
US11405260B2 (en) | 2019-11-18 | 2022-08-02 | Juniper Networks, Inc. | Network model aware diagnosis of a network |
US11533215B2 (en) * | 2020-01-31 | 2022-12-20 | Juniper Networks, Inc. | Programmable diagnosis model for correlation of network events |
US11568331B2 (en) * | 2011-09-26 | 2023-01-31 | Open Text Corporation | Methods and systems for providing automated predictive analysis |
US20230048212A1 (en) * | 2019-03-12 | 2023-02-16 | Ebay Inc. | Enhancement Of Machine Learning-Based Anomaly Detection Using Knowledge Graphs |
US20230168966A1 (en) * | 2021-11-29 | 2023-06-01 | Vmware, Inc. | Optimized alarm state restoration through categorization |
US11809266B2 (en) | 2020-07-14 | 2023-11-07 | Juniper Networks, Inc. | Failure impact analysis of network events |
US20240004743A1 (en) * | 2022-06-29 | 2024-01-04 | Workspot, Inc. | Method and system for real-time identification of blast radius of a fault in a globally distributed virtual desktop fabric |
US11956116B2 (en) | 2020-01-31 | 2024-04-09 | Juniper Networks, Inc. | Programmable diagnosis model for correlation of network events |
Families Citing this family (244)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7389211B2 (en) * | 1998-05-13 | 2008-06-17 | Abu El Ata Nabil A | System and method of predictive modeling for managing decisions for business enterprises |
US20020049573A1 (en) * | 1998-05-13 | 2002-04-25 | El Ata Nabil A. Abu | Automated system and method for designing model based architectures of information systems |
US6990437B1 (en) | 1999-07-02 | 2006-01-24 | Abu El Ata Nabil A | Systems and method for determining performance metrics for constructing information systems |
US7031901B2 (en) * | 1998-05-13 | 2006-04-18 | Abu El Ata Nabil A | System and method for improving predictive modeling of an information system |
US6311144B1 (en) * | 1998-05-13 | 2001-10-30 | Nabil A. Abu El Ata | Method and apparatus for designing and analyzing information systems using multi-layer mathematical models |
US7783468B2 (en) * | 1998-05-13 | 2010-08-24 | Accretive Technologies, Inc. | Automated system and method for service and cost architecture modeling of enterprise systems |
US6363421B2 (en) * | 1998-05-31 | 2002-03-26 | Lucent Technologies, Inc. | Method for computer internet remote management of a telecommunication network element |
JP2000020443A (en) * | 1998-07-02 | 2000-01-21 | Victor Co Of Japan Ltd | Software agent system |
US6789050B1 (en) * | 1998-12-23 | 2004-09-07 | At&T Corp. | Method and apparatus for modeling a web server |
US6707795B1 (en) * | 1999-04-26 | 2004-03-16 | Nortel Networks Limited | Alarm correlation method and system |
JP2000322288A (en) * | 1999-05-06 | 2000-11-24 | Fujitsu Ltd | Distributed object development system and computer readable recording medium for recording program for execution of distributed object development by computer |
US6654914B1 (en) * | 1999-05-28 | 2003-11-25 | Teradyne, Inc. | Network fault isolation |
US6952741B1 (en) * | 1999-06-30 | 2005-10-04 | Computer Sciences Corporation | System and method for synchronizing copies of data in a computer system |
US6823299B1 (en) * | 1999-07-09 | 2004-11-23 | Autodesk, Inc. | Modeling objects, systems, and simulations by establishing relationships in an event-driven graph in a computer implemented graphics system |
US6728590B1 (en) * | 1999-07-14 | 2004-04-27 | Nec Electronics, Inc. | Identifying wafer fabrication system impacts resulting from specified actions |
US6820042B1 (en) * | 1999-07-23 | 2004-11-16 | Opnet Technologies | Mixed mode network simulator |
US6305944B1 (en) * | 1999-09-30 | 2001-10-23 | Qwest Communications Int'l., Inc. | Electrical connector |
US6654782B1 (en) * | 1999-10-28 | 2003-11-25 | Networks Associates, Inc. | Modular framework for dynamically processing network events using action sets in a distributed computing environment |
US7526487B1 (en) * | 1999-10-29 | 2009-04-28 | Computer Sciences Corporation | Business transaction processing systems and methods |
US7693844B1 (en) | 1999-10-29 | 2010-04-06 | Computer Sciences Corporation | Configuring processing relationships among entities of an organization |
US7571171B1 (en) | 1999-10-29 | 2009-08-04 | Computer Sciences Corporation | Smart trigger for use in processing business transactions |
US7363264B1 (en) | 1999-10-29 | 2008-04-22 | Computer Sciences Corporation | Processing business transactions using dynamic database packageset switching |
TW503355B (en) * | 1999-11-17 | 2002-09-21 | Ibm | System and method for communication with mobile data processing devices by way of "mobile software agents" |
US6618755B1 (en) | 1999-12-07 | 2003-09-09 | Watchguard Technologies, Inc. | Automatically identifying subnetworks in a network |
EP1107108A1 (en) * | 1999-12-09 | 2001-06-13 | Hewlett-Packard Company, A Delaware Corporation | System and method for managing the configuration of hierarchically networked data processing devices |
US6694362B1 (en) * | 2000-01-03 | 2004-02-17 | Micromuse Inc. | Method and system for network event impact analysis and correlation with network administrators, management policies and procedures |
US7315801B1 (en) * | 2000-01-14 | 2008-01-01 | Secure Computing Corporation | Network security modeling system and method |
US6832184B1 (en) * | 2000-03-02 | 2004-12-14 | International Business Machines Corporation | Intelligent work station simulation—generalized LAN frame generation simulation structure |
US6845352B1 (en) * | 2000-03-22 | 2005-01-18 | Lucent Technologies Inc. | Framework for flexible and scalable real-time traffic emulation for packet switched networks |
US20020038217A1 (en) * | 2000-04-07 | 2002-03-28 | Alan Young | System and method for integrated data analysis and management |
US6789257B1 (en) * | 2000-04-13 | 2004-09-07 | International Business Machines Corporation | System and method for dynamic generation and clean-up of event correlation circuit |
US7500143B2 (en) * | 2000-05-05 | 2009-03-03 | Computer Associates Think, Inc. | Systems and methods for managing and analyzing faults in computer networks |
AU2001261258A1 (en) * | 2000-05-05 | 2001-11-20 | Aprisma Management Technologies, Inc. | Help desk systems and methods for use with communications networks |
US7752024B2 (en) * | 2000-05-05 | 2010-07-06 | Computer Associates Think, Inc. | Systems and methods for constructing multi-layer topological models of computer networks |
WO2001086875A2 (en) * | 2000-05-05 | 2001-11-15 | Sun Microsystems, Inc. | A means for incorporating software into availability models |
US7237138B2 (en) * | 2000-05-05 | 2007-06-26 | Computer Associates Think, Inc. | Systems and methods for diagnosing faults in computer networks |
JP3511620B2 (en) * | 2000-05-17 | 2004-03-29 | 日本電気株式会社 | Performance analysis method and system for large-scale network monitoring system |
US7117215B1 (en) | 2001-06-07 | 2006-10-03 | Informatica Corporation | Method and apparatus for transporting data for data warehousing applications that incorporates analytic data interface |
US7881920B2 (en) | 2000-08-29 | 2011-02-01 | Abu El Ata Nabil A | Systemic enterprise management method and apparatus |
US6907474B2 (en) * | 2000-09-15 | 2005-06-14 | Microsoft Corporation | System and method for adding hardware registers to a power management and configuration system |
US6985845B1 (en) * | 2000-09-26 | 2006-01-10 | Koninklijke Philips Electronics N.V. | Security monitor of system runs software simulator in parallel |
EP1193906A3 (en) * | 2000-09-27 | 2005-05-25 | Telefonaktiebolaget LM Ericsson (publ) | Filter deployment method and agent for event channel networks |
US6560591B1 (en) * | 2000-09-29 | 2003-05-06 | Intel Corporation | System, method, and apparatus for managing multiple data providers |
EP1327322A2 (en) * | 2000-10-18 | 2003-07-16 | Alcatel | Network management |
US7043661B2 (en) | 2000-10-19 | 2006-05-09 | Tti-Team Telecom International Ltd. | Topology-based reasoning apparatus for root-cause analysis of network faults |
US8250570B2 (en) | 2000-10-31 | 2012-08-21 | Hewlett-Packard Development Company, L.P. | Automated provisioning framework for internet site servers |
US20020082818A1 (en) * | 2000-10-31 | 2002-06-27 | Glenn Ferguson | Data model for automated server configuration |
US7213265B2 (en) * | 2000-11-15 | 2007-05-01 | Lockheed Martin Corporation | Real time active network compartmentalization |
US7225467B2 (en) * | 2000-11-15 | 2007-05-29 | Lockheed Martin Corporation | Active intrusion resistant environment of layered object and compartment keys (airelock) |
US7158486B2 (en) * | 2001-03-12 | 2007-01-02 | Opcoast Llc | Method and system for fast computation of routes under multiple network states with communication continuation |
US7089556B2 (en) * | 2001-03-26 | 2006-08-08 | International Business Machines Corporation | System and method for dynamic self-determining asynchronous event-driven computation |
US8135815B2 (en) * | 2001-03-27 | 2012-03-13 | Redseal Systems, Inc. | Method and apparatus for network wide policy-based analysis of configurations of devices |
US7003562B2 (en) * | 2001-03-27 | 2006-02-21 | Redseal Systems, Inc. | Method and apparatus for network wide policy-based analysis of configurations of devices |
US7743147B2 (en) * | 2001-04-20 | 2010-06-22 | Hewlett-Packard Development Company, L.P. | Automated provisioning of computing networks using a network database data model |
WO2002088877A2 (en) * | 2001-04-26 | 2002-11-07 | Celcorp | System and method for the automatic creation of a graphical representation of navigation paths generated by an intelligent planner |
JP2002335245A (en) * | 2001-05-10 | 2002-11-22 | Allied Telesis Kk | Method, device and program for detecting node |
EP1261168B1 (en) * | 2001-05-22 | 2011-05-18 | Alcatel Lucent | Method and agents for processing event messages |
WO2002099635A1 (en) * | 2001-06-01 | 2002-12-12 | The Johns Hopkins University | System and method for an open autonomy kernel (oak) |
US7162643B1 (en) | 2001-06-15 | 2007-01-09 | Informatica Corporation | Method and system for providing transfer of analytic application data over a network |
US20030028680A1 (en) * | 2001-06-26 | 2003-02-06 | Frank Jin | Application manager for a content delivery system |
US8204972B2 (en) * | 2001-06-29 | 2012-06-19 | International Business Machines Corporation | Management of logical networks for multiple customers within a network management framework |
US7720842B2 (en) | 2001-07-16 | 2010-05-18 | Informatica Corporation | Value-chained queries in analytic applications |
US20030167182A1 (en) * | 2001-07-23 | 2003-09-04 | International Business Machines Corporation | Method and apparatus for providing symbolic mode checking of business application requirements |
US20030028859A1 (en) * | 2001-07-31 | 2003-02-06 | Nigel Street | Method of collecting, visualizing and analyzing object interaction |
US7506046B2 (en) * | 2001-07-31 | 2009-03-17 | Hewlett-Packard Development Company, L.P. | Network usage analysis system and method for updating statistical models |
CA2460492A1 (en) * | 2001-09-28 | 2003-04-10 | British Telecommunications Public Limited Company | Agent-based intrusion detection system |
GB2381153B (en) * | 2001-10-15 | 2004-10-20 | Jacobs Rimell Ltd | Policy server |
US7523127B2 (en) * | 2002-01-14 | 2009-04-21 | Testout Corporation | System and method for a hierarchical database management system for educational training and competency testing simulations |
US6862698B1 (en) * | 2002-01-22 | 2005-03-01 | Cisco Technology, Inc. | Method of labeling alarms to facilitate correlating alarms in a telecommunications network |
WO2003065634A2 (en) * | 2002-02-01 | 2003-08-07 | John Fairweather | System and method for analyzing data |
US7000193B1 (en) * | 2002-02-07 | 2006-02-14 | Impink Jr Albert J | Display to facilitate the monitoring of a complex process |
US6820077B2 (en) | 2002-02-22 | 2004-11-16 | Informatica Corporation | Method and system for navigating a large amount of data |
US6907549B2 (en) * | 2002-03-29 | 2005-06-14 | Nortel Networks Limited | Error detection in communication systems |
US20040017395A1 (en) * | 2002-04-16 | 2004-01-29 | Cook Thomas A. | System and method for configuring and managing enterprise applications |
US7379857B2 (en) * | 2002-05-10 | 2008-05-27 | Lockheed Martin Corporation | Method and system for simulating computer networks to facilitate testing of computer network security |
US20030217125A1 (en) * | 2002-05-15 | 2003-11-20 | Lucent Technologies, Inc. | Intelligent end user gateway device |
US20030217129A1 (en) * | 2002-05-15 | 2003-11-20 | Lucent Technologies Inc. | Self-organizing intelligent network architecture and methodology |
US7490148B1 (en) | 2002-05-30 | 2009-02-10 | At&T Intellectual Property I, L.P. | Completion performance analysis for internet services |
US8447963B2 (en) | 2002-06-12 | 2013-05-21 | Bladelogic Inc. | Method and system for simplifying distributed server management |
AU2003247560A1 (en) * | 2002-06-18 | 2003-12-31 | Oryxa, Inc. | Data movement platform |
US8140635B2 (en) | 2005-03-31 | 2012-03-20 | Tripwire, Inc. | Data processing environment change management methods and apparatuses |
US7316016B2 (en) * | 2002-07-03 | 2008-01-01 | Tripwire, Inc. | Homogeneous monitoring of heterogeneous nodes |
AU2002950066A0 (en) * | 2002-07-09 | 2002-09-12 | Netmap Analytics Pty Ltd | System analysis |
US8266270B1 (en) | 2002-07-16 | 2012-09-11 | At&T Intellectual Property I, L.P. | Delivery performance analysis for internet services |
US20070061884A1 (en) * | 2002-10-29 | 2007-03-15 | Dapp Michael C | Intrusion detection accelerator |
US20040083466A1 (en) * | 2002-10-29 | 2004-04-29 | Dapp Michael C. | Hardware parser accelerator |
US7080094B2 (en) * | 2002-10-29 | 2006-07-18 | Lockheed Martin Corporation | Hardware accelerated validating parser |
US7146643B2 (en) * | 2002-10-29 | 2006-12-05 | Lockheed Martin Corporation | Intrusion detection accelerator |
US7962589B1 (en) * | 2002-11-07 | 2011-06-14 | Cisco Technology, Inc. | Method and apparatus for providing notification of network alarms using a plurality of distributed layers |
US7428300B1 (en) | 2002-12-09 | 2008-09-23 | Verizon Laboratories Inc. | Diagnosing fault patterns in telecommunication networks |
US7409721B2 (en) * | 2003-01-21 | 2008-08-05 | Symantec Corporation | Network risk analysis |
AU2003277247A1 (en) * | 2003-02-28 | 2004-09-28 | Lockheed Martin Corporation | Hardware accelerator state table compiler |
US7590715B1 (en) | 2003-03-03 | 2009-09-15 | Emc Corporation | Method and system for automatic classification of applications and services by packet inspection |
US7281270B2 (en) * | 2003-04-01 | 2007-10-09 | Lockheed Martin Corporation | Attack impact prediction system |
US7421458B1 (en) | 2003-10-16 | 2008-09-02 | Informatica Corporation | Querying, versioning, and dynamic deployment of database objects |
US6968291B1 (en) * | 2003-11-04 | 2005-11-22 | Sun Microsystems, Inc. | Using and generating finite state machines to monitor system status |
US20050108063A1 (en) * | 2003-11-05 | 2005-05-19 | Madill Robert P.Jr. | Systems and methods for assessing the potential for fraud in business transactions |
US7254590B2 (en) * | 2003-12-03 | 2007-08-07 | Informatica Corporation | Set-oriented real-time data processing based on transaction boundaries |
FR2864282A1 (en) * | 2003-12-17 | 2005-06-24 | France Telecom | Alarm management method for intrusion detection system, involves adding description of alarms to previous alarm, using values established by taxonomic structures, and storing added alarms in logical file system for analysis of alarms |
FR2864392A1 (en) * | 2003-12-17 | 2005-06-24 | France Telecom | Intrusion sensing probe alarm set classifying process for use in information security system, involves constructing lattice for each alarm originated from intrusion sensing probes, and merging lattices to form general lattice |
US20050182582A1 (en) * | 2004-02-12 | 2005-08-18 | International Business Machines Corporation | Adaptive resource monitoring and controls for a computing system |
US20050234696A1 (en) * | 2004-04-15 | 2005-10-20 | The University Of Chicago | Automated agent-based method for identifying infrastructure interdependencies |
US7552447B2 (en) * | 2004-05-26 | 2009-06-23 | International Business Machines Corporation | System and method for using root cause analysis to generate a representation of resource dependencies |
US8626894B2 (en) * | 2004-06-24 | 2014-01-07 | International Business Machines Corporation | Generating visualization output of event correlation information |
US20060010423A1 (en) * | 2004-07-08 | 2006-01-12 | Microsoft Corporation | Variable namespaces and scoping for variables in an object model |
US8214799B2 (en) * | 2004-07-08 | 2012-07-03 | Microsoft Corporation | Providing information to an isolated hosted object via system-created variable objects |
US7661135B2 (en) * | 2004-08-10 | 2010-02-09 | International Business Machines Corporation | Apparatus, system, and method for gathering trace data indicative of resource activity |
US7546601B2 (en) * | 2004-08-10 | 2009-06-09 | International Business Machines Corporation | Apparatus, system, and method for automatically discovering and grouping resources used by a business process |
US7630955B2 (en) * | 2004-08-10 | 2009-12-08 | International Business Machines Corporation | Apparatus, system, and method for analyzing the association of a resource to a business process |
US7631222B2 (en) * | 2004-08-23 | 2009-12-08 | Cisco Technology, Inc. | Method and apparatus for correlating events in a network |
US20060059021A1 (en) * | 2004-09-15 | 2006-03-16 | Jim Yulman | Independent adjuster advisor |
JP2006107080A (en) * | 2004-10-05 | 2006-04-20 | Hitachi Ltd | Storage device system |
US7490073B1 (en) | 2004-12-21 | 2009-02-10 | Zenprise, Inc. | Systems and methods for encoding knowledge for automated management of software application deployments |
US7721152B1 (en) | 2004-12-21 | 2010-05-18 | Symantec Operating Corporation | Integration of cluster information with root cause analysis tool |
US7475130B2 (en) * | 2004-12-23 | 2009-01-06 | International Business Machines Corporation | System and method for problem resolution in communications networks |
US20060242305A1 (en) * | 2005-04-25 | 2006-10-26 | Telefonaktiebolaget L M Ericsson (Publ) | VPN Proxy Management Object |
US7895308B2 (en) * | 2005-05-11 | 2011-02-22 | Tindall Steven J | Messaging system configurator |
US20070005320A1 (en) * | 2005-06-29 | 2007-01-04 | Microsoft Corporation | Model-based configuration management |
WO2007021823A2 (en) | 2005-08-09 | 2007-02-22 | Tripwire, Inc. | Information technology governance and controls methods and apparatuses |
US10318894B2 (en) * | 2005-08-16 | 2019-06-11 | Tripwire, Inc. | Conformance authority reconciliation |
US20070130219A1 (en) * | 2005-11-08 | 2007-06-07 | Microsoft Corporation | Traversing runtime spanning trees |
US8069452B2 (en) | 2005-12-01 | 2011-11-29 | Telefonaktiebolaget L M Ericsson (Publ) | Method and management agent for event notifications correlation |
US7949628B1 (en) | 2005-12-29 | 2011-05-24 | United Services Automobile Association (Usaa) | Information technology configuration management |
US20070162602A1 (en) | 2006-01-06 | 2007-07-12 | International Business Machines Corporation | Template-based approach for workload generation |
US8027271B2 (en) * | 2006-05-30 | 2011-09-27 | Panasonic Corporation | Communicating apparatus and controlling method of thereof |
US8209405B1 (en) * | 2006-06-12 | 2012-06-26 | Juniper Networks, Inc. | Failover scheme with service-based segregation |
US20080104699A1 (en) * | 2006-09-28 | 2008-05-01 | Microsoft Corporation | Secure service computation |
US20080083031A1 (en) * | 2006-12-20 | 2008-04-03 | Microsoft Corporation | Secure service computation |
US20080155336A1 (en) * | 2006-12-20 | 2008-06-26 | International Business Machines Corporation | Method, system and program product for dynamically identifying components contributing to service degradation |
US7716327B2 (en) * | 2007-05-15 | 2010-05-11 | International Business Machines Corporation | Storing dependency and status information with incidents |
JP4848392B2 (en) * | 2007-05-29 | 2011-12-28 | ヒューレット−パッカード デベロップメント カンパニー エル.ピー. | Method and system for determining the criticality of a hot plug device in a computer configuration |
JP4740979B2 (en) * | 2007-05-29 | 2011-08-03 | ヒューレット−パッカード デベロップメント カンパニー エル.ピー. | Method and system for determining device criticality during SAN reconfiguration |
US7861072B2 (en) * | 2007-06-25 | 2010-12-28 | Microsoft Corporation | Throwing one selected representative exception among aggregated multiple exceptions of same root cause received from concurrent tasks and discarding the rest |
US8146085B2 (en) * | 2007-06-25 | 2012-03-27 | Microsoft Corporation | Concurrent exception handling using an aggregated exception structure |
US20090112668A1 (en) * | 2007-10-31 | 2009-04-30 | Abu El Ata Nabil A | Dynamic service emulation of corporate performance |
US20090177692A1 (en) * | 2008-01-04 | 2009-07-09 | Byran Christopher Chagoly | Dynamic correlation of service oriented architecture resource relationship and metrics to isolate problem sources |
US8806037B1 (en) | 2008-02-29 | 2014-08-12 | Netapp, Inc. | Remote support automation for a storage server |
KR100931025B1 (en) * | 2008-03-18 | 2009-12-10 | 한국과학기술원 | Query expansion method using additional terms to improve accuracy without compromising recall |
WO2009132444A1 (en) | 2008-04-28 | 2009-11-05 | Sitemasher Corporation | Object-oriented system for creating and managing websites and their content |
US8160855B2 (en) * | 2008-06-26 | 2012-04-17 | Q1 Labs, Inc. | System and method for simulating network attacks |
EP2292064A4 (en) * | 2008-06-27 | 2011-09-28 | Ericsson Telefon Ab L M | Method and arrangement in a communication network system |
US8914341B2 (en) | 2008-07-03 | 2014-12-16 | Tripwire, Inc. | Method and apparatus for continuous compliance assessment |
WO2010034060A1 (en) * | 2008-09-24 | 2010-04-01 | Iintegrate Systems Pty Ltd | Alert generation system and method |
JP5237034B2 (en) * | 2008-09-30 | 2013-07-17 | 株式会社日立製作所 | Root cause analysis method, device, and program for IT devices that do not acquire event information. |
US8572548B2 (en) * | 2008-10-08 | 2013-10-29 | Accenture Global Services Gmbh | Integrated design application |
US8621211B1 (en) * | 2008-10-24 | 2013-12-31 | Juniper Networks, Inc. | NETCONF/DMI-based secure network device discovery |
US8086909B1 (en) * | 2008-11-05 | 2011-12-27 | Network Appliance, Inc. | Automatic core file upload |
US7992044B2 (en) * | 2008-12-05 | 2011-08-02 | Oracle America, Inc. | Method and system for platform independent fault management |
US8635585B2 (en) * | 2009-02-14 | 2014-01-21 | International Business Machines Corporation | Capturing information accessed, updated and created by processes and using the same for validation of consistency |
US8589863B2 (en) * | 2008-12-11 | 2013-11-19 | International Business Machines Corporation | Capturing information accessed, updated and created by services and using the same for validation of consistency |
US9384042B2 (en) * | 2008-12-16 | 2016-07-05 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on inter-thread communications |
US9396021B2 (en) * | 2008-12-16 | 2016-07-19 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster using local job tables |
US8122132B2 (en) * | 2008-12-16 | 2012-02-21 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on broadcast information |
US8239524B2 (en) | 2008-12-16 | 2012-08-07 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on processor workload |
US7992040B2 (en) | 2009-02-20 | 2011-08-02 | International Business Machines Corporation | Root cause analysis by correlating symptoms with asynchronous changes |
US7979747B2 (en) * | 2009-02-20 | 2011-07-12 | International Business Machines Corporation | Interactive problem resolution presented within the context of major observable application behaviors |
US8516106B2 (en) | 2009-05-18 | 2013-08-20 | International Business Machines Corporation | Use tag clouds to visualize components related to an event |
WO2011025424A1 (en) * | 2009-08-28 | 2011-03-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Handling alarms based on user session records |
US9384080B2 (en) * | 2009-09-01 | 2016-07-05 | International Business Machines Corporation | Synchronizing problem resolution task status using awareness of current state and transaction history |
US20110314331A1 (en) * | 2009-10-29 | 2011-12-22 | Cybernet Systems Corporation | Automated test and repair method and apparatus applicable to complex, distributed systems |
US8930526B2 (en) * | 2009-10-30 | 2015-01-06 | International Business Machines Corporation | Processing network events |
US8458301B1 (en) * | 2009-10-30 | 2013-06-04 | Bradford Networks, Inc. | Automated configuration of network devices administered by policy enforcement |
US9223892B2 (en) | 2010-09-30 | 2015-12-29 | Salesforce.Com, Inc. | Device abstraction for page generation |
US8533523B2 (en) * | 2010-10-27 | 2013-09-10 | International Business Machines Corporation | Data recovery in a cross domain environment |
US8935360B2 (en) | 2010-12-03 | 2015-01-13 | Salesforce.Com, Inc. | Techniques for metadata-driven dynamic content serving |
US20120158446A1 (en) * | 2010-12-20 | 2012-06-21 | Jochen Mayerle | Determining Impacts Of Business Activities |
US8209274B1 (en) | 2011-05-09 | 2012-06-26 | Google Inc. | Predictive model importation |
US9946988B2 (en) * | 2011-09-28 | 2018-04-17 | International Business Machines Corporation | Management and notification of object model changes |
US9077635B2 (en) * | 2012-02-27 | 2015-07-07 | Xerox Corporation | Method and apparatus for network subnet discovery |
US8995249B1 (en) | 2013-02-13 | 2015-03-31 | Amazon Technologies, Inc. | Predicting route utilization and non-redundant failures in network environments |
US9015643B2 (en) * | 2013-03-15 | 2015-04-21 | Nvidia Corporation | System, method, and computer program product for applying a callback function to data values |
US20140278328A1 (en) | 2013-03-15 | 2014-09-18 | Nvidia Corporation | System, method, and computer program product for constructing a data flow and identifying a construct |
US9323502B2 (en) | 2013-03-15 | 2016-04-26 | Nvidia Corporation | System, method, and computer program product for altering a line of code |
US9015646B2 (en) | 2013-04-10 | 2015-04-21 | Nvidia Corporation | System, method, and computer program product for translating a hardware language into a source database |
US9171115B2 (en) | 2013-04-10 | 2015-10-27 | Nvidia Corporation | System, method, and computer program product for translating a common hardware database into a logic code model |
US9021408B2 (en) | 2013-04-10 | 2015-04-28 | Nvidia Corporation | System, method, and computer program product for translating a source database into a common hardware database |
US20140337084A1 (en) * | 2013-05-07 | 2014-11-13 | Vnt Software Ltd. | Method and system for associating it assets with business functions |
US20140337069A1 (en) * | 2013-05-08 | 2014-11-13 | Infosys Limited | Deriving business transactions from web logs |
US9231832B2 (en) * | 2013-12-29 | 2016-01-05 | Ahikam Aharony | Automatically-reconfigurable tropospheric scatter communication link |
US9525599B1 (en) * | 2014-06-24 | 2016-12-20 | Google Inc. | Modeling distributed systems |
US20160072688A1 (en) * | 2014-09-08 | 2016-03-10 | Mayank DESAI | Fault monitoring in multi-domain networks |
US10666715B2 (en) | 2017-05-08 | 2020-05-26 | International Business Machines Corporation | Incident management for complex information technology platforms |
US11349724B2 (en) * | 2018-01-05 | 2022-05-31 | Nicira, Inc. | Predictive analysis in a software defined network |
US11449370B2 (en) | 2018-12-11 | 2022-09-20 | DotWalk, Inc. | System and method for determining a process flow of a software application and for automatically generating application testing code |
KR20200093093A (en) | 2019-01-08 | 2020-08-05 | 삼성전자주식회사 | Distributed inference system and operating method of the same |
US10805144B1 (en) * | 2019-06-18 | 2020-10-13 | Cisco Technology, Inc. | Monitoring interactions between entities in a network by an agent for particular types of interactions and indexing and establishing relationships of the components of each interaction |
CN112887119B (en) * | 2019-11-30 | 2022-09-16 | 华为技术有限公司 | Fault root cause determination method and device and computer storage medium |
US11025508B1 (en) | 2020-04-08 | 2021-06-01 | Servicenow, Inc. | Automatic determination of code customizations |
US11296922B2 (en) | 2020-04-10 | 2022-04-05 | Servicenow, Inc. | Context-aware automated root cause analysis in managed networks |
US10999152B1 (en) | 2020-04-20 | 2021-05-04 | Servicenow, Inc. | Discovery pattern visualizer |
US11301435B2 (en) | 2020-04-22 | 2022-04-12 | Servicenow, Inc. | Self-healing infrastructure for a dual-database system |
US11392768B2 (en) | 2020-05-07 | 2022-07-19 | Servicenow, Inc. | Hybrid language detection model |
US11263195B2 (en) | 2020-05-11 | 2022-03-01 | Servicenow, Inc. | Text-based search of tree-structured tables |
US11175825B1 (en) | 2020-05-13 | 2021-11-16 | International Business Machines Corporation | Configuration-based alert correlation in storage networks |
US11470107B2 (en) | 2020-06-10 | 2022-10-11 | Servicenow, Inc. | Matching configuration items with machine learning |
US11277359B2 (en) | 2020-06-11 | 2022-03-15 | Servicenow, Inc. | Integration of a messaging platform with a remote network management application |
US11451573B2 (en) | 2020-06-16 | 2022-09-20 | Servicenow, Inc. | Merging duplicate items identified by a vulnerability analysis |
US11379089B2 (en) | 2020-07-02 | 2022-07-05 | Servicenow, Inc. | Adaptable user interface layout for applications |
US11277321B2 (en) | 2020-07-06 | 2022-03-15 | Servicenow, Inc. | Escalation tracking and analytics system |
US11301503B2 (en) | 2020-07-10 | 2022-04-12 | Servicenow, Inc. | Autonomous content orchestration |
US11449535B2 (en) | 2020-07-13 | 2022-09-20 | Servicenow, Inc. | Generating conversational interfaces based on metadata |
US11632300B2 (en) | 2020-07-16 | 2023-04-18 | Servicenow, Inc. | Synchronization of a shared service configuration across computational instances |
US11748115B2 (en) | 2020-07-21 | 2023-09-05 | Servicenow, Inc. | Application and related object schematic viewer for software application change tracking and management |
US11272007B2 (en) | 2020-07-21 | 2022-03-08 | Servicenow, Inc. | Unified agent framework including push-based discovery and real-time diagnostics features |
US11343079B2 (en) | 2020-07-21 | 2022-05-24 | Servicenow, Inc. | Secure application deployment |
US11095506B1 (en) | 2020-07-22 | 2021-08-17 | Servicenow, Inc. | Discovery of resources associated with cloud operating system |
US11582106B2 (en) | 2020-07-22 | 2023-02-14 | Servicenow, Inc. | Automatic discovery of cloud-based infrastructure and resources |
US11275580B2 (en) | 2020-08-12 | 2022-03-15 | Servicenow, Inc. | Representing source code as implicit configuration items |
US11372920B2 (en) | 2020-08-31 | 2022-06-28 | Servicenow, Inc. | Generating relational charts with accessibility for visually-impaired users |
US11245591B1 (en) | 2020-09-17 | 2022-02-08 | Servicenow, Inc. | Implementation of a mock server for discovery applications |
US11625141B2 (en) | 2020-09-22 | 2023-04-11 | Servicenow, Inc. | User interface generation with machine learning |
US11150784B1 (en) | 2020-09-22 | 2021-10-19 | Servicenow, Inc. | User interface elements for controlling menu displays |
US11632303B2 (en) | 2020-10-07 | 2023-04-18 | Servicenow, Inc. | Enhanced service mapping based on natural language processing |
US11734025B2 (en) | 2020-10-14 | 2023-08-22 | Servicenow, Inc. | Configurable action generation for a remote network management platform |
US11342081B2 (en) | 2020-10-21 | 2022-05-24 | Servicenow, Inc. | Privacy-enhanced contact tracing using mobile applications and portable devices |
US11258847B1 (en) | 2020-11-02 | 2022-02-22 | Servicenow, Inc. | Assignments of incoming requests to servers in computing clusters and other environments |
US11868593B2 (en) | 2020-11-05 | 2024-01-09 | Servicenow, Inc. | Software architecture and user interface for process visualization |
US11363115B2 (en) | 2020-11-05 | 2022-06-14 | Servicenow, Inc. | Integrated operational communications between computational instances of a remote network management platform |
US11281442B1 (en) | 2020-11-18 | 2022-03-22 | Servicenow, Inc. | Discovery and distribution of software applications between multiple operational environments |
US11693831B2 (en) | 2020-11-23 | 2023-07-04 | Servicenow, Inc. | Security for data at rest in a remote network management platform |
US11216271B1 (en) | 2020-12-10 | 2022-01-04 | Servicenow, Inc. | Incremental update for offline data access |
US11269618B1 (en) | 2020-12-10 | 2022-03-08 | Servicenow, Inc. | Client device support for incremental offline updates |
US11630717B2 (en) | 2021-01-06 | 2023-04-18 | Servicenow, Inc. | Machine-learning based similarity engine |
US11301365B1 (en) | 2021-01-13 | 2022-04-12 | Servicenow, Inc. | Software test coverage through real-time tracing of user activity |
US11418586B2 (en) | 2021-01-19 | 2022-08-16 | Servicenow, Inc. | Load balancing of discovery agents across proxy servers |
US11301271B1 (en) | 2021-01-21 | 2022-04-12 | Servicenow, Inc. | Configurable replacements for empty states in user interfaces |
US11921878B2 (en) | 2021-01-21 | 2024-03-05 | Servicenow, Inc. | Database security through obfuscation |
US11513885B2 (en) | 2021-02-16 | 2022-11-29 | Servicenow, Inc. | Autonomous error correction in a multi-application platform |
US11277369B1 (en) | 2021-03-02 | 2022-03-15 | Servicenow, Inc. | Message queue architecture and interface for a multi-application platform |
US11831729B2 (en) | 2021-03-19 | 2023-11-28 | Servicenow, Inc. | Determining application security and correctness using machine learning based clustering and similarity |
US11640369B2 (en) | 2021-05-05 | 2023-05-02 | Servicenow, Inc. | Cross-platform communication for facilitation of data sharing |
US11635953B2 (en) | 2021-05-07 | 2023-04-25 | Servicenow, Inc. | Proactive notifications for robotic process automation |
US11635752B2 (en) | 2021-05-07 | 2023-04-25 | Servicenow, Inc. | Detection and correction of robotic process automation failures |
US11277475B1 (en) | 2021-06-01 | 2022-03-15 | Servicenow, Inc. | Automatic discovery of storage cluster |
US11762668B2 (en) | 2021-07-06 | 2023-09-19 | Servicenow, Inc. | Centralized configuration data management and control |
US11418571B1 (en) | 2021-07-29 | 2022-08-16 | Servicenow, Inc. | Server-side workflow improvement based on client-side data mining |
US11516307B1 (en) | 2021-08-09 | 2022-11-29 | Servicenow, Inc. | Support for multi-type users in a single-type computing system |
US11734381B2 (en) | 2021-12-07 | 2023-08-22 | Servicenow, Inc. | Efficient downloading of related documents |
US11829233B2 (en) | 2022-01-14 | 2023-11-28 | Servicenow, Inc. | Failure prediction in a computing system based on machine learning applied to alert data |
US11582317B1 (en) | 2022-02-07 | 2023-02-14 | Servicenow, Inc. | Payload recording and comparison techniques for discovery |
US11595245B1 (en) | 2022-03-27 | 2023-02-28 | Bank Of America Corporation | Computer network troubleshooting and diagnostics using metadata |
US11658889B1 (en) | 2022-03-27 | 2023-05-23 | Bank Of America Corporation | Computer network architecture mapping using metadata |
US11734150B1 (en) | 2022-06-10 | 2023-08-22 | Servicenow, Inc. | Activity tracing through event correlation across multiple software applications |
CN114996110A (en) * | 2022-06-20 | 2022-09-02 | 中电信数智科技有限公司 | Deep inspection optimization method and system based on micro-service architecture |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4985857A (en) | 1988-08-19 | 1991-01-15 | General Motors Corporation | Method and apparatus for diagnosing machines |
US5295244A (en) | 1990-09-17 | 1994-03-15 | Cabletron Systems, Inc. | Network management system using interconnected hierarchies to represent different network dimensions in multiple display views |
US5195095A (en) | 1990-12-28 | 1993-03-16 | General Electric Company | Algorithm for identifying tests to perform for fault isolation |
US5317568A (en) * | 1991-04-11 | 1994-05-31 | Galileo International Partnership | Method and apparatus for managing and facilitating communications in a distributed hetergeneous network |
US5309448A (en) | 1992-01-03 | 1994-05-03 | International Business Machines Corporation | Methods and systems for alarm correlation and fault localization in communication networks |
US5537547A (en) * | 1992-12-14 | 1996-07-16 | At&T Corp. | Automatic network element identity information distribution apparatus and method |
US5608720A (en) * | 1993-03-09 | 1997-03-04 | Hubbell Incorporated | Control system and operations system interface for a network element in an access system |
US5528516A (en) | 1994-05-25 | 1996-06-18 | System Management Arts, Inc. | Apparatus and method for event correlation and problem reporting |
SE502999C2 (en) * | 1994-06-13 | 1996-03-11 | Ericsson Telefon Ab L M | telecommunication systems |
US5918051A (en) * | 1995-07-19 | 1999-06-29 | Ricoh Company, Ltd. | Object-oriented communication system with support for multiple remote machine types |
US5748896A (en) * | 1995-12-27 | 1998-05-05 | Apple Computer, Inc. | Remote network administration methods and apparatus |
US5854750A (en) * | 1996-09-03 | 1998-12-29 | Insession, Inc. | System and method for processing transactions in an environment containing a number of object oriented applications |
US5951680A (en) * | 1997-06-24 | 1999-09-14 | International Business Machines Corporation | Configurator object |
1998
- 1998-03-26: US application US09/048,025, published as US6393386B1; status: Expired - Fee Related
1999
- 1999-03-26: JP application JP2000538359A, published as JP2002508555A; status: Withdrawn
- 1999-03-26: WO application PCT/US1999/006669, published as WO1999049474A1; status: Application Discontinuation
- 1999-03-26: AU application AU33657/99A, published as AU3365799A; status: Abandoned
- 1999-03-26: EP application EP99915051A, published as EP1073960A1; status: Withdrawn
Cited By (202)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020032769A1 (en) * | 2000-04-28 | 2002-03-14 | Sharon Barkai | Network management method and system |
US7930396B2 (en) * | 2000-07-10 | 2011-04-19 | Bmc Software, Inc. | System and method of enterprise systems and business impact management |
US20100268565A1 (en) * | 2000-07-10 | 2010-10-21 | Bmc Software, Inc. | System and method of enterprise systems and business impact management |
US7739380B2 (en) | 2000-10-24 | 2010-06-15 | Microsoft Corporation | System and method for distributed management of shared computers |
US20050125212A1 (en) * | 2000-10-24 | 2005-06-09 | Microsoft Corporation | System and method for designing a logical model of a distributed computer system and deploying physical resources according to the logical model |
US20050097097A1 (en) * | 2000-10-24 | 2005-05-05 | Microsoft Corporation | System and method for distributed management of shared computers |
US20050091078A1 (en) * | 2000-10-24 | 2005-04-28 | Microsoft Corporation | System and method for distributed management of shared computers |
US7711121B2 (en) | 2000-10-24 | 2010-05-04 | Microsoft Corporation | System and method for distributed management of shared computers |
US20020174222A1 (en) * | 2000-10-27 | 2002-11-21 | Cox Earl D. | Behavior experts in e-service management |
US20020138242A1 (en) * | 2000-12-12 | 2002-09-26 | Uri Wilensky | Distributed agent network using object based parallel modeling language to dynamically model agent activities |
US7711533B2 (en) * | 2000-12-12 | 2010-05-04 | Uri Wilensky | Distributed agent network using object based parallel modeling language to dynamically model agent activities |
US7778804B2 (en) * | 2001-11-20 | 2010-08-17 | Hewlett-Packard Development Company, L.P. | Network system analysis |
US20030097245A1 (en) * | 2001-11-20 | 2003-05-22 | Athena Christodoulou | System analysis |
US20100103937A1 (en) * | 2001-12-10 | 2010-04-29 | O'neil Joseph Thomas | System for utilizing genetic algorithm to provide constraint-based routing of packets in a communication network |
US8064432B2 (en) | 2001-12-10 | 2011-11-22 | At&T Intellectual Property Ii, L.P. | System for utilizing genetic algorithm to provide constraint-based routing of packets in a communication network |
US7664094B1 (en) * | 2001-12-10 | 2010-02-16 | At&T Corp. | System for utilizing genetic algorithm to provide constraint-based routing of packets in a communication network |
US7092378B1 (en) * | 2001-12-10 | 2006-08-15 | At & T Corp. | System for utilizing a genetic algorithm to provide constraint-based routing of packets in a communication network |
US20050010586A1 (en) * | 2002-02-22 | 2005-01-13 | Fujitsu Limited | Server machine, client machine, server/client system, server program and client program |
FR2838844A1 (en) * | 2002-04-23 | 2003-10-24 | France Telecom | Performance model generation method in which a functional model of a system is used as a starting point for model generation, the invention relating particularly to communications networks with a multiplicity of hard- and software |
WO2003092245A3 (en) * | 2002-04-23 | 2004-04-08 | France Telecom | Method of generating a performance model from a functional model |
US7536292B2 (en) | 2002-04-23 | 2009-05-19 | France Telecom | Method of generating a performance model from a functional model |
WO2003092245A2 (en) * | 2002-04-23 | 2003-11-06 | France Telecom | Method of generating a performance model from a functional model |
US8141104B2 (en) * | 2002-09-27 | 2012-03-20 | International Business Machines Corporation | Integrating non-compliant providers of dynamic services into a resource management infrastructure |
US20080178194A1 (en) * | 2002-09-27 | 2008-07-24 | International Business Machines Corporation | Integrating Non-Compliant Providers of Dynamic Services into a Resource Management infrastructure |
US7469217B2 (en) | 2002-12-10 | 2008-12-23 | Savvis Communications Corporation | Product toolkit system and method |
US20040111327A1 (en) * | 2002-12-10 | 2004-06-10 | Kidd Susan O. | Product toolkit system and method |
US8250202B2 (en) | 2003-01-04 | 2012-08-21 | International Business Machines Corporation | Distributed notification and action mechanism for mirroring-related events |
US7890951B2 (en) | 2003-03-06 | 2011-02-15 | Microsoft Corporation | Model-based provisioning of test environments |
US7890543B2 (en) * | 2003-03-06 | 2011-02-15 | Microsoft Corporation | Architecture for distributed computing system and automated design, deployment, and management of distributed applications |
US7792931B2 (en) | 2003-03-06 | 2010-09-07 | Microsoft Corporation | Model-based system provisioning |
US7886041B2 (en) | 2003-03-06 | 2011-02-08 | Microsoft Corporation | Design time validation of systems |
US20060271341A1 (en) * | 2003-03-06 | 2006-11-30 | Microsoft Corporation | Architecture for distributed computing system and automated design, deployment, and management of distributed applications |
US20060025985A1 (en) * | 2003-03-06 | 2006-02-02 | Microsoft Corporation | Model-Based system management |
US20060031248A1 (en) * | 2003-03-06 | 2006-02-09 | Microsoft Corporation | Model-based system provisioning |
US20040199572A1 (en) * | 2003-03-06 | 2004-10-07 | Hunt Galen C. | Architecture for distributed computing system and automated design, deployment, and management of distributed applications |
US20060037002A1 (en) * | 2003-03-06 | 2006-02-16 | Microsoft Corporation | Model-based provisioning of test environments |
US20060034263A1 (en) * | 2003-03-06 | 2006-02-16 | Microsoft Corporation | Model and system state synchronization |
US7684964B2 (en) | 2003-03-06 | 2010-03-23 | Microsoft Corporation | Model and system state synchronization |
US7689676B2 (en) | 2003-03-06 | 2010-03-30 | Microsoft Corporation | Model-based policy application |
US20080059214A1 (en) * | 2003-03-06 | 2008-03-06 | Microsoft Corporation | Model-Based Policy Application |
US8122106B2 (en) | 2003-03-06 | 2012-02-21 | Microsoft Corporation | Integrating design, deployment, and management phases for systems |
US20040205179A1 (en) * | 2003-03-06 | 2004-10-14 | Hunt Galen C. | Integrating design, deployment, and management phases for systems |
US7657499B2 (en) * | 2003-04-07 | 2010-02-02 | Belarc, Inc. | Grouping of computers in a computer information database system |
US8285720B2 (en) | 2003-04-07 | 2012-10-09 | Belarc, Inc. | Grouping of computers in a computer information database system |
US20040236728A1 (en) * | 2003-04-07 | 2004-11-25 | Newman Gary H. | Grouping of computers in a computer information database system |
US20040267920A1 (en) * | 2003-06-30 | 2004-12-30 | Aamer Hydrie | Flexible network load balancing |
US20050055435A1 (en) * | 2003-06-30 | 2005-03-10 | Abolade Gbadegesin | Network load balancing with connection manipulation |
US20040268358A1 (en) * | 2003-06-30 | 2004-12-30 | Microsoft Corporation | Network load balancing with host status information |
US7778422B2 (en) | 2004-02-27 | 2010-08-17 | Microsoft Corporation | Security associations for devices |
US7669235B2 (en) | 2004-04-30 | 2010-02-23 | Microsoft Corporation | Secure domain join for computing devices |
US10686675B2 (en) | 2004-07-07 | 2020-06-16 | Sciencelogic, Inc. | Self configuring network management system |
US20060092861A1 (en) * | 2004-07-07 | 2006-05-04 | Christopher Corday | Self configuring network management system |
US9077611B2 (en) * | 2004-07-07 | 2015-07-07 | Sciencelogic, Inc. | Self configuring network management system |
US20060015596A1 (en) * | 2004-07-14 | 2006-01-19 | Dell Products L.P. | Method to configure a cluster via automatic address generation |
US20060153089A1 (en) * | 2004-12-23 | 2006-07-13 | Silverman Robert M | System and method for analysis of communications networks |
US7769850B2 (en) * | 2004-12-23 | 2010-08-03 | International Business Machines Corporation | System and method for analysis of communications networks |
US20060149782A1 (en) * | 2005-01-05 | 2006-07-06 | Microsoft Corporation | Prescribed navigation using topology metadata and navigation path |
US7650349B2 (en) * | 2005-01-05 | 2010-01-19 | Microsoft Corporation | Prescribed navigation using topology metadata and navigation path |
US20060235650A1 (en) * | 2005-04-15 | 2006-10-19 | Microsoft Corporation | Model-based system monitoring |
US7802144B2 (en) | 2005-04-15 | 2010-09-21 | Microsoft Corporation | Model-based system monitoring |
US7797147B2 (en) * | 2005-04-15 | 2010-09-14 | Microsoft Corporation | Model-based system monitoring |
US20060235664A1 (en) * | 2005-04-15 | 2006-10-19 | Microsoft Corporation | Model-based capacity planning |
US20060235962A1 (en) * | 2005-04-15 | 2006-10-19 | Microsoft Corporation | Model-based system monitoring |
US20060232927A1 (en) * | 2005-04-15 | 2006-10-19 | Microsoft Corporation | Model-based system monitoring |
US8489728B2 (en) | 2005-04-15 | 2013-07-16 | Microsoft Corporation | Model-based system monitoring |
US9317270B2 (en) | 2005-06-29 | 2016-04-19 | Microsoft Technology Licensing, Llc | Model-based virtual system provisioning |
US8549513B2 (en) | 2005-06-29 | 2013-10-01 | Microsoft Corporation | Model-based virtual system provisioning |
US9811368B2 (en) | 2005-06-29 | 2017-11-07 | Microsoft Technology Licensing, Llc | Model-based virtual system provisioning |
US10540159B2 (en) | 2005-06-29 | 2020-01-21 | Microsoft Technology Licensing, Llc | Model-based virtual system provisioning |
US20070016393A1 (en) * | 2005-06-29 | 2007-01-18 | Microsoft Corporation | Model-based propagation of attributes |
US20070006218A1 (en) * | 2005-06-29 | 2007-01-04 | Microsoft Corporation | Model-based virtual system provisioning |
US7941309B2 (en) | 2005-11-02 | 2011-05-10 | Microsoft Corporation | Modeling IT operations/policies |
US20070112847A1 (en) * | 2005-11-02 | 2007-05-17 | Microsoft Corporation | Modeling IT operations/policies |
US7561801B2 (en) * | 2006-03-31 | 2009-07-14 | Applied Micro Circuits Corporation | Optical transceiver with electrical ring distribution interface |
US20070230955A1 (en) * | 2006-03-31 | 2007-10-04 | Applied Micro Circuits Corporation | Optical transceiver with electrical ring distribution interface |
US20090238567A1 (en) * | 2006-03-31 | 2009-09-24 | Glen Miller | Electrical Ring Distribution Interface for an Optical Transceiver |
US9003428B2 (en) | 2006-06-27 | 2015-04-07 | International Business Machines Corporation | Computer data communications in a high speed, low latency data communications environment |
US8676876B2 (en) | 2006-06-27 | 2014-03-18 | International Business Machines Corporation | Synchronizing an active feed adapter and a backup feed adapter in a high speed, low latency data communications environment |
US20070300233A1 (en) * | 2006-06-27 | 2007-12-27 | Kulvir S Bhogal | Computer data communications in a high speed, low latency data communications environment |
US20070300234A1 (en) * | 2006-06-27 | 2007-12-27 | Eliezer Dekel | Selecting application messages from an active feed adapter and a backup feed adapter for application-level data processing in a high speed, low latency data communications environment |
US20080010487A1 (en) * | 2006-06-27 | 2008-01-10 | Eliezer Dekel | Synchronizing an active feed adapter and a backup feed adapter in a high speed, low latency data communications environment |
US8122144B2 (en) | 2006-06-27 | 2012-02-21 | International Business Machines Corporation | Reliable messaging using redundant message streams in a high speed, low latency data communications environment |
US8296778B2 (en) | 2006-06-27 | 2012-10-23 | International Business Machines Corporation | Computer data communications in a high speed, low latency data communications environment |
US8549168B2 (en) | 2006-06-27 | 2013-10-01 | International Business Machines Corporation | Reliable messaging using redundant message streams in a high speed, low latency data communications environment |
US9172611B2 (en) * | 2006-09-01 | 2015-10-27 | Spirent Communications, Inc. | System and method for discovering assets and functional relationships in a network |
US20100106742A1 (en) * | 2006-09-01 | 2010-04-29 | Mu Dynamics, Inc. | System and Method for Discovering Assets and Functional Relationships in a Network |
US20080114938A1 (en) * | 2006-11-14 | 2008-05-15 | Borgendale Kenneth W | Application Message Caching In A Feed Adapter |
US20080141272A1 (en) * | 2006-12-06 | 2008-06-12 | Borgendale Kenneth W | Application Message Conversion Using A Feed Adapter |
US8695015B2 (en) | 2006-12-06 | 2014-04-08 | International Business Machines Corporation | Application message conversion using a feed adapter |
US20080140550A1 (en) * | 2006-12-07 | 2008-06-12 | Berezuk John F | Generating a global system configuration for a financial market data system |
US20080137830A1 (en) * | 2006-12-12 | 2008-06-12 | Bhogal Kulvir S | Dispatching A Message Request To A Service Provider In A Messaging Environment |
US20080141274A1 (en) * | 2006-12-12 | 2008-06-12 | Bhogal Kulvir S | Subscribing For Application Messages In A Multicast Messaging Environment |
US8327381B2 (en) | 2006-12-12 | 2012-12-04 | International Business Machines Corporation | Referencing message elements in an application message in a messaging environment |
US20080141276A1 (en) * | 2006-12-12 | 2008-06-12 | Borgendale Kenneth W | Referencing Message Elements In An Application Message In A Messaging Environment |
US8850451B2 (en) | 2006-12-12 | 2014-09-30 | International Business Machines Corporation | Subscribing for application messages in a multicast messaging environment |
US20080141275A1 (en) * | 2006-12-12 | 2008-06-12 | Borgendale Kenneth W | Filtering Application Messages In A High Speed, Low Latency Data Communications Environment |
US7917912B2 (en) | 2007-03-27 | 2011-03-29 | International Business Machines Corporation | Filtering application messages in a high speed, low latency data communications environment |
US20080244017A1 (en) * | 2007-03-27 | 2008-10-02 | Gidon Gershinsky | Filtering application messages in a high speed, low latency data communications environment |
US20080256395A1 (en) * | 2007-04-10 | 2008-10-16 | Araujo Carlos C | Determining and analyzing a root cause incident in a business solution |
US20080288622A1 (en) * | 2007-05-18 | 2008-11-20 | Microsoft Corporation | Managing Server Farms |
US20090006559A1 (en) * | 2007-06-27 | 2009-01-01 | Bhogal Kulvir S | Application Message Subscription Tracking In A High Speed, Low Latency Data Communications Environment |
US20090024498A1 (en) * | 2007-07-20 | 2009-01-22 | Berezuk John F | Establishing A Financial Market Data Component In A Financial Market Data System |
US8763006B2 (en) | 2007-12-28 | 2014-06-24 | International Business Machines Corporation | Dynamic generation of processes in computing environments |
US20090172669A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of redundancy groups in runtime computer management of business applications |
US20090172689A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Adaptive business resiliency computer system for information technology environments |
US20110093853A1 (en) * | 2007-12-28 | 2011-04-21 | International Business Machines Corporation | Real-time information technology environments |
US20090171704A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Management based on computer dynamically adjusted discrete phases of event correlation |
US7958393B2 (en) | 2007-12-28 | 2011-06-07 | International Business Machines Corporation | Conditional actions based on runtime conditions of a computer system environment |
US20090171708A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Using templates in a computing environment |
US8990810B2 (en) | 2007-12-28 | 2015-03-24 | International Business Machines Corporation | Projecting an effect, using a pairing construct, of execution of a proposed action on a computing environment |
US20090172668A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Conditional computer runtime control of an information technology environment based on pairing constructs |
US20090171706A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Computer pattern system environment supporting business resiliency |
US20090172671A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Adaptive computer sequencing of actions |
US20090171731A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of graphs in managing computing environments |
US20090172670A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Dynamic generation of processes in computing environments |
US20090172687A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Management of computer events in a computer environment |
US20090172674A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Managing the computer collection of information in an information technology environment |
US8326910B2 (en) | 2007-12-28 | 2012-12-04 | International Business Machines Corporation | Programmatic validation in an information technology environment |
US8341014B2 (en) * | 2007-12-28 | 2012-12-25 | International Business Machines Corporation | Recovery segments for computer business applications |
US8346931B2 (en) | 2007-12-28 | 2013-01-01 | International Business Machines Corporation | Conditional computer runtime control of an information technology environment based on pairing constructs |
US8365185B2 (en) | 2007-12-28 | 2013-01-29 | International Business Machines Corporation | Preventing execution of processes responsive to changes in the environment |
US8375244B2 (en) | 2007-12-28 | 2013-02-12 | International Business Machines Corporation | Managing processing of a computing environment during failures of the environment |
US8428983B2 (en) | 2007-12-28 | 2013-04-23 | International Business Machines Corporation | Facilitating availability of information technology resources based on pattern system environments |
US20090171703A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Use of multi-level state assessment in computer business environments |
US8447859B2 (en) | 2007-12-28 | 2013-05-21 | International Business Machines Corporation | Adaptive business resiliency computer system for information technology environments |
US20090172769A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Programmatic validation in an information technology environment |
US20090171732A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Non-disruptively changing a computing environment |
US20090171705A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Defining and using templates in configuring information technology environments |
US20090172682A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Serialization in computer management |
US20090172460A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Defining a computer recovery process that matches the scope of outage |
US9558459B2 (en) | 2007-12-28 | 2017-01-31 | International Business Machines Corporation | Dynamic selection of actions in an information technology environment |
US20090172461A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Conditional actions based on runtime conditions of a computer system environment |
US20090171733A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Dynamic selection of actions in an information technology environment |
US8868441B2 (en) | 2007-12-28 | 2014-10-21 | International Business Machines Corporation | Non-disruptively changing a computing environment |
US8677174B2 (en) | 2007-12-28 | 2014-03-18 | International Business Machines Corporation | Management of runtime events in a computer environment using a containment region |
US20090171730A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Non-disruptively changing scope of computer business applications based on detected changes in topology |
US8682705B2 (en) * | 2007-12-28 | 2014-03-25 | International Business Machines Corporation | Information technology management based on computer dynamically adjusted discrete phases of event correlation |
US20090172688A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Managing execution within a computing environment |
US8751283B2 (en) | 2007-12-28 | 2014-06-10 | International Business Machines Corporation | Defining and using templates in configuring information technology environments |
US20090171707A1 (en) * | 2007-12-28 | 2009-07-02 | International Business Machines Corporation | Recovery segments for computer business applications |
US8775591B2 (en) | 2007-12-28 | 2014-07-08 | International Business Machines Corporation | Real-time information technology environments |
US8782662B2 (en) | 2007-12-28 | 2014-07-15 | International Business Machines Corporation | Adaptive computer sequencing of actions |
US8826077B2 (en) * | 2007-12-28 | 2014-09-02 | International Business Machines Corporation | Defining a computer recovery process that matches the scope of outage including determining a root cause and performing escalated recovery operations |
US20110066719A1 (en) * | 2008-01-31 | 2011-03-17 | Vitaly Miryanov | Automated Applicatin Dependency Mapping |
CN101933003A (en) * | 2008-01-31 | 2010-12-29 | 惠普开发有限公司 | Automated application dependency mapping |
US8433811B2 (en) | 2008-09-19 | 2013-04-30 | Spirent Communications, Inc. | Test driven deployment and monitoring of heterogeneous network systems |
US20100169066A1 (en) * | 2008-12-30 | 2010-07-01 | International Business Machines Corporation | General Framework to Predict Parametric Costs |
US8234100B2 (en) * | 2008-12-30 | 2012-07-31 | International Business Machines Corporation | General framework to predict parametric costs |
US20130152106A1 (en) * | 2009-07-22 | 2013-06-13 | International Business Machines Corporation | Managing events in a configuration of soa governance components |
US9225777B2 (en) | 2011-05-20 | 2015-12-29 | Amazon Technologies, Inc. | Load balancer |
US8998544B1 (en) * | 2011-05-20 | 2015-04-07 | Amazon Technologies, Inc. | Load balancer |
US11568331B2 (en) * | 2011-09-26 | 2023-01-31 | Open Text Corporation | Methods and systems for providing automated predictive analysis |
US20160210553A1 (en) * | 2011-10-11 | 2016-07-21 | International Business Machines Corporation | Predicting the Impact of Change on Events Detected in Application Logic |
US9679245B2 (en) * | 2011-10-11 | 2017-06-13 | International Business Machines Corporation | Predicting the impact of change on events detected in application logic |
US20140297684A1 (en) * | 2011-10-11 | 2014-10-02 | International Business Machines Corporation | Predicting the Impact of Change on Events Detected in Application Logic |
US9384305B2 (en) * | 2011-10-11 | 2016-07-05 | International Business Machines Corporation | Predicting the impact of change on events detected in application logic |
US20130176867A1 (en) * | 2012-01-09 | 2013-07-11 | Florin Balus | Method and apparatus for object grouping and state modeling for application instances |
US9160626B2 (en) * | 2012-01-09 | 2015-10-13 | Alcatel Lucent | Method and apparatus for object grouping and state modeling for application instances |
US10346744B2 (en) * | 2012-03-29 | 2019-07-09 | Elasticsearch B.V. | System and method for visualisation of behaviour within computer infrastructure |
US20130262347A1 (en) * | 2012-03-29 | 2013-10-03 | Prelert Ltd. | System and Method for Visualisation of Behaviour within Computer Infrastructure |
US11657309B2 (en) | 2012-03-29 | 2023-05-23 | Elasticsearch B.V. | Behavior analysis and visualization for a computer infrastructure |
US8972543B1 (en) | 2012-04-11 | 2015-03-03 | Spirent Communications, Inc. | Managing clients utilizing reverse transactions |
US20130290520A1 (en) * | 2012-04-27 | 2013-10-31 | International Business Machines Corporation | Network configuration predictive analytics engine |
US9923787B2 (en) * | 2012-04-27 | 2018-03-20 | International Business Machines Corporation | Network configuration predictive analytics engine |
US20130290512A1 (en) * | 2012-04-27 | 2013-10-31 | International Business Machines Corporation | Network configuration predictive analytics engine |
US9537749B2 (en) | 2012-06-06 | 2017-01-03 | Tufin Software Technologies Ltd. | Method of network connectivity analyses and system thereof |
WO2014001841A1 (en) | 2012-06-25 | 2014-01-03 | Kni Műszaki Tanácsadó Kft. | Methods of implementing a dynamic service-event management system |
US9632904B1 (en) * | 2013-02-15 | 2017-04-25 | Ca, Inc. | Alerting based on service dependencies of modeled processes |
US20140281739A1 (en) * | 2013-03-14 | 2014-09-18 | Netflix, Inc. | Critical systems inspector |
US9582395B2 (en) * | 2013-03-14 | 2017-02-28 | Netflix, Inc. | Critical systems inspector |
US20160004584A1 (en) * | 2013-08-09 | 2016-01-07 | Hitachi, Ltd. | Method and computer system to allocate actual memory area from storage pool to virtual volume |
US20160315818A1 (en) * | 2014-04-25 | 2016-10-27 | Teoco Corporation | System, Method, and Computer Program Product for Extracting a Topology of a Telecommunications Network Related to a Service |
US10454770B2 (en) * | 2014-04-25 | 2019-10-22 | Teoco Ltd. | System, method, and computer program product for extracting a topology of a telecommunications network related to a service |
US20170102982A1 (en) * | 2015-10-13 | 2017-04-13 | Honeywell International Inc. | Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics |
US9959158B2 (en) * | 2015-10-13 | 2018-05-01 | Honeywell International Inc. | Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics |
US10546241B2 (en) | 2016-01-08 | 2020-01-28 | Futurewei Technologies, Inc. | System and method for analyzing a root cause of anomalous behavior using hypothesis testing |
US10332056B2 (en) | 2016-03-14 | 2019-06-25 | Futurewei Technologies, Inc. | Features selection and pattern mining for KQI prediction and cause analysis |
US10614085B2 (en) * | 2016-05-26 | 2020-04-07 | International Business Machines Corporation | System impact based logging with enhanced event context |
US10614398B2 (en) * | 2016-05-26 | 2020-04-07 | International Business Machines Corporation | System impact based logging with resource finding remediation |
US20170344926A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | System impact based logging with resource finding remediation |
US20170344413A1 (en) * | 2016-05-26 | 2017-11-30 | International Business Machines Corporation | System impact based logging with enhanced event context |
US10313365B2 (en) * | 2016-08-15 | 2019-06-04 | International Business Machines Corporation | Cognitive offense analysis using enriched graphs |
US10482158B2 (en) | 2017-03-31 | 2019-11-19 | Futurewei Technologies, Inc. | User-level KQI anomaly detection using markov chain model |
US20180367405A1 (en) * | 2017-06-19 | 2018-12-20 | Cisco Technology, Inc. | Validation of bridge domain-l3out association for communication outside a network |
US10812336B2 (en) * | 2017-06-19 | 2020-10-20 | Cisco Technology, Inc. | Validation of bridge domain-L3out association for communication outside a network |
US11283682B2 (en) | 2017-06-19 | 2022-03-22 | Cisco Technology, Inc. | Validation of bridge domain-L3out association for communication outside a network |
US11100152B2 (en) * | 2017-08-17 | 2021-08-24 | Target Brands, Inc. | Data portal |
CN111052252A (en) * | 2017-09-01 | 2020-04-21 | X开发有限责任公司 | Heterogeneous method for modeling a biochemical environment |
US11188067B2 (en) * | 2018-11-30 | 2021-11-30 | Siemens Aktiengesellschaft | Method and system for elimination of fault conditions in a technical installation |
CN111260176A (en) * | 2018-11-30 | 2020-06-09 | 西门子股份公司 | Method and system for eliminating fault conditions in a technical installation |
US20230048212A1 (en) * | 2019-03-12 | 2023-02-16 | Ebay Inc. | Enhancement Of Machine Learning-Based Anomaly Detection Using Knowledge Graphs |
US11405260B2 (en) | 2019-11-18 | 2022-08-02 | Juniper Networks, Inc. | Network model aware diagnosis of a network |
US11533215B2 (en) * | 2020-01-31 | 2022-12-20 | Juniper Networks, Inc. | Programmable diagnosis model for correlation of network events |
US11956116B2 (en) | 2020-01-31 | 2024-04-09 | Juniper Networks, Inc. | Programmable diagnosis model for correlation of network events |
US11005766B1 (en) * | 2020-04-06 | 2021-05-11 | International Business Machines Corporation | Path profiling for streaming applications |
US11809266B2 (en) | 2020-07-14 | 2023-11-07 | Juniper Networks, Inc. | Failure impact analysis of network events |
CN112130888A (en) * | 2020-08-12 | 2020-12-25 | 百度时代网络技术(北京)有限公司 | Method, device and equipment for updating application program and computer storage medium |
CN112433716A (en) * | 2020-11-12 | 2021-03-02 | 北京航空航天大学 | Runtime component dynamic interaction model construction method based on non-invasive monitoring |
CN113268370A (en) * | 2021-05-11 | 2021-08-17 | 西安交通大学 | Root cause alarm analysis method, system, equipment and storage medium |
US20230168966A1 (en) * | 2021-11-29 | 2023-06-01 | Vmware, Inc. | Optimized alarm state restoration through categorization |
US11815999B2 (en) * | 2021-11-29 | 2023-11-14 | Vmware, Inc. | Optimized alarm state restoration through categorization |
US20240004743A1 (en) * | 2022-06-29 | 2024-01-04 | Workspot, Inc. | Method and system for real-time identification of blast radius of a fault in a globally distributed virtual desktop fabric |
Also Published As
Publication number | Publication date |
---|---|
JP2002508555A (en) | 2002-03-19 |
US6393386B1 (en) | 2002-05-21 |
EP1073960A1 (en) | 2001-02-07 |
AU3365799A (en) | 1999-10-18 |
WO1999049474A1 (en) | 1999-09-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6393386B1 (en) | Dynamic modeling of complex networks and prediction of impacts of faults therein | |
US11625293B1 (en) | Intent driven root cause analysis | |
US7069480B1 (en) | Method and apparatus for identifying problems in computer networks | |
US7509540B1 (en) | Method and apparatus for maintaining the status of objects in computer networks using virtual state machines | |
US7480713B2 (en) | Method and system for network management with redundant monitoring and categorization of endpoints | |
US7296194B1 (en) | Method and apparatus for maintaining the status of objects in computer networks using virtual state machines | |
US10924329B2 (en) | Self-healing Telco network function virtualization cloud | |
US10530740B2 (en) | Systems and methods for facilitating closed loop processing using machine learning | |
US20030009552A1 (en) | Method and system for network management with topology system providing historical topological views | |
US20010027470A1 (en) | System, method and computer program product for providing a remote support service | |
US11356318B2 (en) | Self-healing telco network function virtualization cloud | |
US6990518B1 (en) | Object-driven network management system enabling dynamically definable management behavior | |
Smith | A system for monitoring and management of computational grids | |
Vlahavas et al. | ExperNet: an intelligent multiagent system for WAN management | |
EP1118952A2 (en) | System, method and computer program product for providing a remote support service | |
Lutfiyya et al. | Fault management in distributed systems: A policy-driven approach | |
Martin-Flatin et al. | A survey of distributed network and systems management paradigms | |
Danciu et al. | IT Service Management: Getting the View | |
Jailani et al. | FMS: A computer network fault management system based on the OSI standards | |
Frey et al. | Multi-level reasoning for managing distributed enterprises and their networks | |
Viswanathan et al. | ERMIS: Designing, developing, and delivering a remote managed infrastructure services solution | |
Yusuff | Network Monitoring: Using Nagios as an example tool | |
Vornanen | ScienceLogic SL1 basics and server monitoring | |
Bohdanowicz et al. | The problematic of distributed systems supervision-an example: Genesys | |
Talwar et al. | Scalable management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: TRANSAMERICA BUSINES CREDIT CORPORATION, CALIFORNI Free format text: SECURITY AGREEMENT;ASSIGNOR:AVESTA TECHNOLOGIES, INC.;REEL/FRAME:009589/0541 Effective date: 19981016 |
|
AS | Assignment |
Owner name: AVESTA TECHNOLOGIES, INC., NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ZAGER, DAVID;KOSTES, ROBERT;REEL/FRAME:010504/0640 Effective date: 20000103 |
|
AS | Assignment |
Owner name: SILICON VALLEY BANK, MASSACHUSETTS Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INVESTMENTS, INC.;AND OTHERS;REEL/FRAME:011641/0979 Effective date: 20010228 |
|
AS | Assignment |
Owner name: VISUAL NETWORKS TECHNOLOGIES, INC., MARYLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AVESTA TECHNOLOGIES, INC.;REEL/FRAME:011771/0001 Effective date: 20010502 |
|
AS | Assignment |
Owner name: VISUAL NETWORKS OPERATIONS, INC., MARYLAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VISUAL NETWORKS TECHNOLOGIES, INC.;REEL/FRAME:015008/0554 Effective date: 20040225 |
|
AS | Assignment |
Owner name: SPECIAL SITUATIONS FUND III, L.P., NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC.;AND OTHERS;REEL/FRAME:016489/0725 Effective date: 20050808
Owner name: SPECIAL SITUATIONS CAYMAN FUND, L.P., CAYMAN ISLAN Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC.;AND OTHERS;REEL/FRAME:016489/0725 Effective date: 20050808
Owner name: SPECIAL SITUATIONS PRIVATE EQUITY FUND, L.P., NEW Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC.;AND OTHERS;REEL/FRAME:016489/0725 Effective date: 20050808
Owner name: SPECIAL SITUATIONS TECHNOLOGY FUND, L.P., NEW YORK Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC.;AND OTHERS;REEL/FRAME:016489/0725 Effective date: 20050808
Owner name: SPECIAL SITUATIONS TECHNOLOGY FUND II, L.P., NEW Y Free format text: SECURITY AGREEMENT;ASSIGNORS:VISUAL NETWORKS, INC.;VISUAL NETWORKS OPERATIONS, INC.;VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC.;AND OTHERS;REEL/FRAME:016489/0725 Effective date: 20050808 |
|
REMI | Maintenance fee reminder mailed | ||
LAPS | Lapse for failure to pay maintenance fees | ||
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20060521 |
|
AS | Assignment |
Owner name: VISUAL NETWORKS INTERNATIONAL OPERATIONS, INC., DE Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:SPECIAL SITUATIONS FUND III, L.P.;SPECIAL SITUATIONS CAYMAN FUND, L.P.;SPECIAL SITUATIONS PRIVATE EQUITY FUND, L.P.;AND OTHERS;REEL/FRAME:035448/0316 Effective date: 20150211
Owner name: VISUAL NETWORKS, INC., DELAWARE Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:SPECIAL SITUATIONS FUND III, L.P.;SPECIAL SITUATIONS CAYMAN FUND, L.P.;SPECIAL SITUATIONS PRIVATE EQUITY FUND, L.P.;AND OTHERS;REEL/FRAME:035448/0316 Effective date: 20150211
Owner name: VISUAL NETWORKS TECHNOLOGIES, INC., CALIFORNIA Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:SPECIAL SITUATIONS FUND III, L.P.;SPECIAL SITUATIONS CAYMAN FUND, L.P.;SPECIAL SITUATIONS PRIVATE EQUITY FUND, L.P.;AND OTHERS;REEL/FRAME:035448/0316 Effective date: 20150211
Owner name: VISUAL NETWORKS OPERATIONS, INC., MARYLAND Free format text: RELEASE BY SECURED PARTY;ASSIGNORS:SPECIAL SITUATIONS FUND III, L.P.;SPECIAL SITUATIONS CAYMAN FUND, L.P.;SPECIAL SITUATIONS PRIVATE EQUITY FUND, L.P.;AND OTHERS;REEL/FRAME:035448/0316 Effective date: 20150211 |