SOFTWARE-BASED WATCHDOG METHOD AND APPARATUS
BACKGROUND OF THE INVENTION
 1. Field of the Invention
 The present invention relates to software monitoring processes. More particularly, it relates to watchdog processes, which monitor the operation of other processes and restart the other processes, as necessary, to maintain proper operation.
 2. Discussion of Related Art
 Computer processes occasionally experience operating problems. An error may cause a process to fail or stop executing: the process may enter a non-terminating loop, or may lose data and cease operation. To maintain proper operation, monitoring processes, called watchdog processes, have been used to track the operation of another process. When the watchdog process determines that there is an operating problem, it interrupts the watched process and restarts it. In this manner, the main process is kept running.
 Known watchdog processes have been implemented using circuits separate from those implementing the main process. Typically, these circuits include a timer, such as a counter, which the main process must periodically reset. If the main process fails, the timer is not reset. Once the timer expires, the watchdog process determines that the main process has failed and restarts it. Such watchdog processes are implemented using a hardware circuit or a separate processor running appropriate software. While these processes help prevent the total loss of the main process, they cannot adequately detect or resolve many processing problems. For example, the main process could hang in a loop which continues to reset the timer. Even though the main process has failed, it would not trigger the watchdog process. Therefore, a need exists for a watchdog process which can monitor a main process regardless of the type of error.
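The conventional counter-based watchdog described above, and the blind spot it leaves, can be modeled in a few lines. The following Python sketch is an illustrative simulation only, not a description of any particular hardware circuit; the class and function names (`WatchdogTimer`, `kick`, `simulate`) and the tick-based timing are assumptions made for the example.

```python
from typing import Optional, Set


class WatchdogTimer:
    """Tick-driven model of a conventional counter-based watchdog timer."""

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self) -> None:
        """Reset the counter; the main process must call this periodically."""
        self.remaining = self.timeout_ticks

    def tick(self) -> bool:
        """Advance one clock tick; return True on the tick the timer expires."""
        if self.expired:
            return False
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            return True
        return False


def simulate(kick_ticks: Set[int], total_ticks: int, timeout_ticks: int) -> Optional[int]:
    """Return the tick at which the watchdog fires, or None if it never fires."""
    wd = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if t in kick_ticks:
            wd.kick()  # the main process resets the timer on this tick
        if wd.tick():
            return t  # timer expired: the watchdog would restart the main process
    return None


if __name__ == "__main__":
    print(simulate({0, 3, 6, 9}, 30, 5))    # prints 13: resets stop after tick 9
    print(simulate(set(range(30)), 30, 5))  # prints None: hung loop still resets
```

With a timeout of 5 ticks, a process that resets the timer at ticks 0, 3, 6 and 9 and then stops is caught at tick 13. A process hung in a loop that still resets the timer on every tick is never caught, which is the weakness of timer-reset watchdogs noted above.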
 In addition, known watchdog processes cannot determine or correct the error which caused the failure. As a result, the main process may fail again after it is restarted. Therefore, a need exists for a watchdog process which can monitor for, and correct, errors which cause the main process to fail.
 Furthermore, with known watchdog processes, the main program must reset the timer of the watchdog process. Thus, the main program must be specifically designed to operate with the watchdog process, and the watchdog process cannot monitor other programs. Also, each watchdog timer can monitor only a single program. Therefore, a need exists for a watchdog process which can monitor any program, including multiple programs at once.
 Finally, the watchdog process itself may fail. If the watchdog process fails, the main process could also fail without being monitored. Therefore, a need exists for a watchdog process which can also be monitored.
SUMMARY OF THE INVENTION
 The deficiencies of known watchdog processes are substantially overcome with the system of the present invention through the use of a software-implemented watchdog process. According to one aspect of the invention, a distinct software process operating on the same CPU as a primary process uses calls to the operating system to monitor the operation of the primary process. If the primary process is not executing, or is over-utilizing CPU time, it is determined to be non-operating and is restarted. According to another aspect of the invention, the watchdog process can check and correct damaged configuration or data files used by the primary process before restarting it. According to another aspect of the invention, the watchdog process can be used to monitor and restart multiple primary processes operating on a single computer system. According to another aspect of the invention, a secondary software watchdog process can be included as part of a primary process for fault-tolerant operation. The secondary software watchdog process monitors the primary watchdog process to ensure its continued operation.
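The non-operating test in the first aspect (the primary process is not executing, or is over-utilizing CPU time) can be sketched as follows, together with a loop that checks several primary processes, as in the multi-process aspect. This is a minimal Python sketch under stated assumptions: the per-process samples (`ProcessSample`) stand in for whatever the operating system's process-information calls return, and the 0.9 CPU-share threshold, the names `is_operating` and `check_all`, and the `restart` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProcessSample:
    """One observation of a monitored process, as an OS query might report it."""
    running: bool        # process exists and is executing
    cpu_seconds: float   # cumulative CPU time consumed by the process
    wall_seconds: float  # wall-clock time at which the sample was taken


def is_operating(history: List[ProcessSample], max_cpu_share: float) -> bool:
    """Judge the latest sample: a process that is not executing, or that used
    more than `max_cpu_share` of one CPU over the last interval, is non-operating."""
    current = history[-1]
    if not current.running:
        return False
    if len(history) < 2:
        return True  # not enough samples yet to measure CPU usage
    previous = history[-2]
    elapsed = current.wall_seconds - previous.wall_seconds
    if elapsed <= 0:
        return True  # no measurable interval between samples
    cpu_share = (current.cpu_seconds - previous.cpu_seconds) / elapsed
    return cpu_share <= max_cpu_share


def check_all(samples: Dict[str, List[ProcessSample]],
              restart: Callable[[str], None],
              max_cpu_share: float = 0.9) -> List[str]:
    """Check every monitored primary process; restart those judged non-operating."""
    restarted = []
    for name, history in samples.items():
        if history and not is_operating(history, max_cpu_share):
            restart(name)  # hypothetical hook that relaunches the named process
            restarted.append(name)
    return restarted
```

In a fuller implementation, the `restart` callback is where damaged configuration or data files could be checked and repaired before relaunching. A practical watchdog would also likely require the CPU limit to be exceeded over several consecutive intervals before acting, rather than over a single one.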
 According to another aspect of the present invention, the primary process and watchdog process communicate information through a loopback TCP/IP address. In this manner, the primary process and watchdog process can periodically exchange messages. If the watchdog process does not receive a message from the primary process within a predetermined time, the watchdog process determines that the primary process is not operating properly and restarts it.
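The loopback heartbeat can be sketched with ordinary sockets. This minimal Python example assumes a TCP connection to `127.0.0.1`, a newline-terminated heartbeat message, and a 0.5-second deadline; the message format, timing values and function names (`watch_heartbeats`, `fake_primary`) are illustrative, not taken from the invention. The watchdog accepts one connection from the primary process and judges it non-operating when no heartbeat arrives before the deadline or the connection closes.

```python
import socket
import threading
import time
from typing import Tuple

HEARTBEAT = b"alive\n"  # hypothetical message; one newline per heartbeat


def watch_heartbeats(server: socket.socket, timeout: float) -> Tuple[int, str]:
    """Wait for the primary process to connect over loopback, then count
    heartbeats until none arrives within `timeout` seconds.

    Returns (heartbeats_received, reason), where reason is one of
    "never-connected", "silent" or "closed".
    """
    server.settimeout(timeout)
    try:
        conn, _ = server.accept()
    except socket.timeout:
        return 0, "never-connected"
    count = 0
    with conn:
        conn.settimeout(timeout)  # each recv() must complete within the deadline
        while True:
            try:
                data = conn.recv(1024)
            except socket.timeout:
                return count, "silent"
            if not data:
                return count, "closed"
            # Count newlines rather than whole messages, because TCP may
            # split or merge heartbeats across recv() calls.
            count += data.count(b"\n")


def fake_primary(port: int, beats: int, interval: float, hang: float) -> None:
    """Stand-in for a primary process: send `beats` heartbeats, then hang."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        for _ in range(beats):
            s.sendall(HEARTBEAT)
            time.sleep(interval)
        time.sleep(hang)  # stops sending but keeps the connection open


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        server.listen(1)
        port = server.getsockname()[1]
        t = threading.Thread(target=fake_primary, args=(port, 3, 0.05, 1.0))
        t.start()
        print(watch_heartbeats(server, timeout=0.5))  # a real watchdog restarts here
        t.join()
```

Here the stand-in primary sends three heartbeats and then goes silent while keeping the connection open, so the watchdog reports `(3, 'silent')` about half a second later. Restarting the primary at that point is left to the caller.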
BRIEF DESCRIPTION OF THE DRAWINGS
 FIG. 1 is a block flow diagram for operation of a watchdog process according to an embodiment of the present invention.
 FIG. 2 is a block diagram of the relationship between processes according to a second embodiment of the present invention.
 FIG. 3 is a block diagram of the relationship between processes according to a third embodiment of the present invention.
 FIG. 4 is a block flow diagram for operation of a watchdog process according to the third embodiment of the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
 FIG. 1 illustrates operation of a software watchdog process according to an embodiment of the present invention. The software watchdog process of the present invention may be implemented in any known computer system which allows concurrent operation of multiple processes. Such computer systems include personal computers, servers and mainframe computers. Furthermore, to utilize the present invention according to the first embodiment, the computer system must include an operating system which collects and provides information about process operation. The Windows NT and Windows 2000 operating systems from Microsoft Corporation have such functionality, and other operating systems provide similar functionality. Additionally, the software watchdog process may be written in any known programming language. Preferably, the software watchdog process is stored in the memory of the computer system and is automatically loaded and executed when the computer system is started or when the primary process is started.
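Where the operating system can report whether a process is still running, a watchdog can launch the primary process itself and relaunch it whenever it stops executing. The sketch below uses Python's standard `subprocess` module, whose `Popen.poll()` asks the operating system whether a child process has exited. The `supervise` function, its `run_for` limit (added only so the example terminates) and the polling interval are assumptions for illustration, and this simple loop detects only processes that have exited, not hung ones.

```python
import subprocess
import sys
import time
from typing import List


def supervise(cmd: List[str], run_for: float, poll_interval: float = 0.1) -> int:
    """Launch `cmd`, poll the OS for its status, and relaunch it each time it
    exits. Stops after `run_for` seconds and returns the number of restarts."""
    deadline = time.monotonic() + run_for
    proc = subprocess.Popen(cmd)
    restarts = 0
    try:
        while time.monotonic() < deadline:
            if proc.poll() is not None:  # child has exited: not executing
                restarts += 1
                proc = subprocess.Popen(cmd)  # restart the primary process
            time.sleep(poll_interval)
    finally:
        # Clean up the child that is still running when supervision ends.
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
    return restarts


if __name__ == "__main__":
    # A child that exits immediately is restarted repeatedly.
    print(supervise([sys.executable, "-c", "pass"], run_for=1.0))
```

A long-running child, by contrast, is left alone and is simply terminated when `run_for` elapses, so `supervise` returns 0 for it.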