WO2009074778A1 - Exception information collation

Info

Publication number: WO2009074778A1
Application number: PCT/GB2008/004041
Authority: WO
Inventors: Richard Fitzgerald
Original assignee: Symbian Software Limited
Priority date: 2007-12-12
Filing date: 2008-12-08
Publication date: 2009-06-18
Also published as: EP2220561A1; GB2455537A; GB0724256D0

Abstract

A repository for exception-relevant information where information relating to a component of a computer system is registered and updated. In the event of an exception or a crash, the contents of the repository may be copied to non-volatile memory for later analysis. The exception-relevant information may, for example, relate to a device driver or to state information.

Description

EXCEPTION INFORMATION COLLATION

TECHNICAL FIELD

This invention relates to the collation and storage of information for use in the event of a computer system error. In particular, the invention relates to the collation and storage of information for use in analysing the circumstances surrounding, and causes of, computer system exceptions such as crashes.

BACKGROUND TO THE INVENTION

It is known that computers suffer errors and that these errors can lead to crashes. Crashes occur, for example, when the computer's operating system is unable to resolve conflicts such as memory address conflicts. Crashes may also occur where device drivers are incorrectly implemented. Different operating systems have different vulnerabilities to crashes. In "open" systems (i.e. computer systems which are capable of incorporating additional hardware and software), the more common causes of crashes are device driver errors and resource mismanagement.

A crash occurs when the operating system encounters an error which it is not able to resolve. As used herein, the terms "crash" and "exception" denote errors encountered by a computer operating system.

When a computer system does crash, it would be advantageous to know what caused the crash so that this may be avoided in future. The term "post-mortem analysis" refers to the analysis of system information after a crash has occurred in an attempt to determine what caused the crash. In order to return the system to a usable state once a crash has occurred, it is necessary to restart or reboot the system. However, when the system reboots or restarts, the contents of the system memory will be erased, and it is the contents of this memory which are to be studied to determine possible causes of the crash. To avoid rendering the system inoperative while the post-mortem analysis is performed, it is desirable, prior to rebooting or restarting the system, to copy the information relevant to the post-mortem analysis to a storage medium capable of retaining the information after an interruption in power.

There are at least two known approaches to transferring information for later post-mortem analysis and both are concerned with copying the relevant information prior to the restart or reboot so that the information may be analysed later, without inhibiting further operations of the system.

In the first, and older, approach, the entire memory addressed by the operating system concerned is copied to non-volatile memory in the event of a crash. The computer is then shut down or rebooted. The copied memory contents may then be analysed in an attempt to discern the causes of the crash.

This approach suffers from the disadvantage that the non-volatile memory has to be at least as large as the memory being copied to ensure that all relevant information is retained. This is not practical for certain systems, such as portable devices, where the size of non-volatile memory is often restricted and is significantly smaller than the size of the total memory of the operating system.

Furthermore, in order to analyse the copied memory, it is necessary to know where the relevant information is located. In certain situations, for example where device drivers are loaded as and when needed, the location of all relevant information relating to that device driver will not be known after the crash has occurred.

In the second known approach to dealing with system crashes, once the system crashes, the system memory is analysed and only those portions thereof deemed relevant to the crash are copied to non-volatile memory. This has the advantage that the non-volatile memory to which the information is copied may be significantly smaller than the size of the system memory concerned. Typically in this approach system memory contents relating to, for example, current CPU registers, the currently running thread, internal kernel data objects etc. are stored for later analysis. This approach to storing information for post-mortem analysis also suffers from the disadvantage that only information which is known from an analysis of the memory after the crash can be saved for later analysis. It is necessary that the software performing the post-mortem analysis be able to identify the relevant information. In many instances, it is not known prior to the exception or crash where the relevant information will be stored, rendering the later retrieval thereof impossible. Any memory contents rendered inaccessible by the crash cannot be saved for later analysis.

Furthermore, certain information stored in memory is not directly accessible. For example, in certain situations it is necessary to enable a clock before specific hardware registers can be read. Neither of the aforementioned approaches is capable of accessing such information.

SUMMARY OF THE INVENTION

According to a first aspect, the invention provides for a method of collating exception-relevant information for use in a computer system, said method comprising the steps of: registering exception-relevant information in an exception repository; maintaining said exception repository during operation of said computer system; and in the event of an exception, preserving the contents of said exception repository for later use.

The exception-relevant information may be data, may be a pointer to data, or may be a function. For example, the exception-relevant information may be the contents of a memory location or may be an address of that memory location. Further examples are provided below.

A repository of exception-relevant information provides a single repository for exception related information which facilitates the analysis of the information in the event of an exception to determine what caused the exception. Furthermore, by maintaining the exception repository during operation of the computer system, the relevance of the information stored in the exception repository is ensured. The exception repository may be implemented in volatile or in non-volatile memory.

The exception-relevant information may relate to a device driver or a corresponding device. In this instance, the method relates to registering and maintaining information regarding the device or the device driver.

Preferably the step of registering said exception-relevant information occurs when a device corresponding to the device driver is first used.

Preferably, the step of maintaining the exception repository includes the step of deregistering the exception-relevant information when the device corresponding to the device driver is no longer in use.

The exception-relevant information may relate to said device or to said device driver and may include one or more of: a memory address; a memory address range; state information; a buffer contents; information internal to said device (such as the state of hardware registers for the device), or statistical data.

The step of registering exception-relevant information may include the step of storing access instructions in said exception repository, said access instructions including instructions for accessing data. This allows information which would not otherwise be accessible to be copied for later analysis. The access instructions may include one or more of: instructions to enable a clock; instructions to access a memory by means of an interface; and a signal to enable a memory.

The step of preserving the exception-relevant information may include the step of copying said contents of said exception repository to non-volatile memory.

The content of the exception repository is preferably stored in a linked list. A linked list exhibits the advantages of being easy and quick to traverse and does not require additional memory for storage: it is sufficient to store a pointer to each entry in the list. Furthermore, a linked list provides a structure to the contents of the exception repository so that the contents may be easily copied to a database, or other repository, if required.

The exception in response to which the content of the exception repository is preserved may be a system crash.

According to a further aspect, the invention provides for an exception repository for use in a computer system, said exception repository being adapted to register at least one entry containing exception-relevant information, said exception-relevant information corresponding to a component of said computer system.

Preferably, the exception repository is adapted to register a plurality of entries, each entry containing exception-relevant information, said plurality of entries corresponding to a plurality of components of said computer system.

At least one of the computer system components may be a device driver and, in this instance, the repository is adapted to register the entry corresponding to the device driver for as long as a device corresponding to the device driver is operational in said computer system.

The exception-relevant information may include one or more of: a memory address; a memory address range; state information; a buffer contents; internal device information; or statistical data. The exception-relevant information may comprise access instructions which include instructions for accessing data.

The access instructions may include one or more of: instructions to enable a clock; instructions to access a memory by means of an interface; and a signal to enable a memory.

The exception repository may further comprise a plurality of entries and a linked list, the linked list linking one of the plurality of entries to another of the plurality of entries.

According to a further aspect, the invention provides for a computer system comprising an exception repository as herein described.

The computer system may comprise a read-only memory and a writable memory wherein said exception-relevant information is stored in said read-only memory and wherein a linked list linking a plurality of entries of said exception repository is stored in said writable memory.

According to a further aspect, the invention provides for a computer readable medium comprising instructions for directing a computer to perform the method herein described.

According to a further aspect, the invention provides for an operating system arranged to cause a computing device to operate in accordance with the method herein described.

According to a further aspect, the invention provides for a computer program or a suite of computer programs suitable for causing a computing device to operate in accordance with the method herein described.

According to a further aspect, the invention provides for a computer readable medium comprising instructions for directing a computer to perform the method hereinbefore described.

According to a further aspect, the invention provides for a computer system comprising a plurality of components, a volatile and a non-volatile memory; said volatile memory being demarcated into a plurality of regions, at least one of the regions being arranged to provide an exception repository, said exception repository comprising at least one entry containing exception-relevant information, said exception-relevant information corresponding to a component of said computer system.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a schematic diagram of a computer system incorporating an exception repository according to an embodiment of the invention;

Figure 2 illustrates a structure for data stored in the exception repository of the computer system of Figure 1;

Figure 3a illustrates a portion of an alternate structure for data stored in the exception repository of the computer system of Figure 1;

Figure 3b illustrates a portion of an alternate structure for data stored in the exception repository of the computer system of Figure 1;

Figure 3c illustrates a linked list incorporating the data structure of Figures 3a and 3b;

Figure 4 illustrates the structure of an application programming interface provided according to an embodiment of the invention;

Figure 5 is a flow diagram illustrating the manner in which an exception repository according to an embodiment of the invention is populated and maintained; and

Figure 6 is a flow diagram of the operation of an embodiment of the invention in the event of an exception.

DESCRIPTION OF PREFERRED EMBODIMENTS

Figure 1 illustrates a schematic diagram of a computer system 10 which includes an operating system kernel 12. The kernel 12 communicates with hardware devices: keyboard 22, non-volatile memory 24 and monitor 26. Interaction between the kernel 12 and all hardware devices occurs by means of a corresponding device driver. Therefore, communication with keyboard 22 will occur by means of keyboard device driver 16; communication with non-volatile memory 24 by means of hard disk controller 18 and with monitor 26 by means of display device driver 20.

Computer system 10 further comprises a volatile system memory 14 connected to kernel 12 and a user program 28, also connected to kernel 12. The user program 28 interacts with the kernel 12 to control the operation of devices 22, 24 and 26 in a known manner. The system memory 14 is used by the kernel 12 in executing the user program 28. Volatile system memory 14 does not retain information when the power to this memory is interrupted, for example when the computer system is shut down or rebooted. On the other hand, non-volatile memory 24 stores information even in the absence of power thereto.

The volatile system memory 14 has been divided into regions (only one of which is shown), and one of these regions is demarcated as an exception repository 30. The exception repository 30 stores information which may be relevant on the occurrence of an exception, as described below. In a further embodiment, the exception repository 30 is stored in dedicated non-volatile memory and, in this case, it is not necessary to transfer a copy of the contents of the exception repository to non-volatile memory in the event of an exception.

Figure 2 illustrates a structure of data stored in the exception repository 30. Each data entry (referred to as SCrashDumpRegion in Figure 2) has a name field iName which serves to identify and describe the corresponding data entry so that the relevance of the corresponding exception-relevant information may be easily assessed when the data is analysed after the exception has occurred. This field may also be used as the entire data entry, in which case it could, for example, be used to register the presence of a device driver. In this case the mere presence of a suitably labelled iName would be sufficient to denote the presence of the driver.

The iStartFn is an optional field which may contain one or more procedures which are necessary to call in order to access the memory 14 specified at iAddr (see below). For example, iStartFn may have instructions to enable a clock; instructions to access a memory by means of an interface where interface-particular instructions are required before the data may be accessed; a signal to enable a memory; or any other procedure necessary to access the relevant memory address.

Furthermore, iStartFn can be used to dump information which cannot conveniently be described as a memory region specified by a starting address and a size (denoted by the pair {iAddr, iSize}, with reference to Figure 2). In this case, use will be made of the Printf and Dump functions (as discussed below with reference to Figure 4).

iAddr is a memory address of system memory 14 where data which may be relevant in the event of an exception is stored. iSize is the number of bytes of the memory, starting at iAddr, which contain the relevant information. Therefore, iSize defines a memory address range. iNext is a pointer to the next entry in the exception repository so that the entries of this repository form a linked list. For the last entry in the repository, iNext will be null.

The fields iStartFn, iAddr and iSize are optional. For a given entry in exception repository 30, it may be sufficient to store a label, which is stored in the field iName, as previously discussed.

Alternatively, only the iName and iStartFn, or the iName and the {iAddr, iSize} pair may be used. If, as in these cases, iStartFn is not used, it should be set to null or any other value regarded as illegal in the system where the embodiment is implemented. Likewise, if {iAddr, iSize} is not used, iSize should be set to zero.
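The entry layout described with reference to Figure 2 can be sketched in C as follows. This is only an illustrative sketch: the field names come from the description, but the field types, the callback signature TStartFn and the helper functions LabelOnly, MemoryRegion, HasStartFn and HasMemory are assumptions introduced here and do not appear in the source.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback type: the description names iStartFn but not its signature. */
typedef void (*TStartFn)(void);

/* Sketch of a Figure 2 style entry; all field types are assumptions. */
typedef struct SCrashDumpRegion {
    const char *iName;               /* label identifying and describing the entry */
    TStartFn iStartFn;               /* optional; NULL when unused */
    const void *iAddr;               /* optional start address of relevant data */
    size_t iSize;                    /* optional byte count; 0 when unused */
    struct SCrashDumpRegion *iNext;  /* next entry; NULL for the last entry */
} SCrashDumpRegion;

/* Hypothetical helper: an entry whose label alone records a component's presence. */
SCrashDumpRegion LabelOnly(const char *aName) {
    SCrashDumpRegion r = { aName, NULL, NULL, 0, NULL };
    assert(aName != NULL);
    return r;
}

/* Hypothetical helper: an entry describing a memory range {iAddr, iSize}. */
SCrashDumpRegion MemoryRegion(const char *aName, const void *aAddr, size_t aSize) {
    SCrashDumpRegion r = { aName, NULL, aAddr, aSize, NULL };
    assert(aName != NULL);
    return r;
}

/* iStartFn counts as unused when it is NULL. */
int HasStartFn(const SCrashDumpRegion *aRegion) {
    return aRegion->iStartFn != NULL;
}

/* {iAddr, iSize} counts as unused when iSize is zero. */
int HasMemory(const SCrashDumpRegion *aRegion) {
    return aRegion->iSize > 0;
}
```

Using NULL and zero as the "unused" markers follows the convention stated above; a system in which NULL is a legal function address would need a different sentinel.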

It is to be realised that the exception repository 30 can be used to store a wide range of information. In one example, the exception repository 30 is used to store information particular to the operation of a device driver. In this case the information which is stored in the exception repository 30 is relevant to the operation of the device driver. Furthermore, the type of information which is stored will depend on the particular device driver involved. Although the invention is not limited in this respect, the following information may be stored in the repository 30 for a particular device driver:

(i) The content of data buffers used by the device driver. In this instance, the corresponding entry in exception repository 30 will have a pointer (iAddr) to the contents of the buffer as well as an entry corresponding to the size of the buffer (iSize). Provided that the address at which the buffer is located remains constant, it is not necessary to update this entry while the device driver is operational.

(ii) Internal state variables of the device driver. Such entries will not include a memory address, but will instead include a label describing the internal state. In this case, iStartFn would be used to point to a function which will save the state information into non-volatile memory in an appropriate format.

(iii) Historical state information. Exception repository 30, in this instance, maintains a record of how the states of that device driver have changed over time. This information is also specified by a function pointed to by iStartFn.

(iv) Internal device information. An entry in the repository 30 of this type is used to store information which is generally not accessible to the kernel 12. This information may relate to, for example, the peripheral with which that device driver is currently communicating (which would apply where the device driver is capable of communicating with more than one device) or, where clients are able to set the configuration of the device driver, the current configuration of that device driver (which may relate to, for example, baud rate, buffer sizes, bit-width, duplex mode, channel numbers, network addresses etc.). This would use iStartFn, {iAddr, iSize} or both.

(v) Statistical data logged by the device driver. Such statistical information may be relevant in the post-mortem analysis to determine why an error occurred. Such information may include "high-water" and "low-water" marks indicating maximum and minimum performance-related indicators (relating to, for example, memory allocation, data transfer buffers etc.); error rates and types (serial communication error statistics, network quality indicators etc.); or current up-time of something relevant to that device (e.g. network uptime). This would use iStartFn, {iAddr, iSize} or both.

Each entry in the exception repository 30 is created by the kernel 12 after communication between the kernel 12 and the device driver. It is to be realised that a particular device driver may have more than one entry in the exception repository 30 associated therewith.

Figure 2 illustrates a structure where each entry includes the pointer iNext to the next entry in the exception repository so that entries of the corresponding repository form a linked list. This implementation has the advantage that the pointer is part of the data structure and that therefore creation of a new member of the linked list will not utilize additional memory. However, this does suffer from the disadvantage that the list is necessarily maintained in writable memory which means that the entries are vulnerable to being overwritten during an exception or a crash (by a stray pointer for example). A second disadvantage exists in that, at build time, the entry may be known and resident in read-only memory, but has to be copied to writable memory, resulting in two copies of the entry.

Figures 3a and 3b illustrate an alternate implementation of a data structure for storing the entries of the exception repository 30. Figure 3a illustrates the structure of SCrashDumpRegionListItem which forms a linked list of the structure SCrashDumpRegion. As illustrated, the linked list comprises pointers to the next entry in the linked list as well as pointers to the exception-relevant information, SCrashDumpRegion. The structure for the corresponding SCrashDumpRegion item is illustrated in Figure 3b. This structure is similar to that of Figure 2 other than the exclusion here of a linked list pointer.

In this instance, the pointers to the linked list are created separately and the operating system will maintain its own linked list in RAM with the structure illustrated in Figure 3a. When a new SCrashDumpRegion is added, the operating system allocates memory for a SCrashDumpRegionListItem and, if successful, the iNext of the previous SCrashDumpRegionListItem is set to point to the new SCrashDumpRegionListItem, and the iInfo of the new entry points to the SCrashDumpRegion.

The structure of Figures 3a and 3b separates the linked list (which must be in writable memory) from the exception-relevant information of the repository 30, which may now reside in read-only memory. Figure 3c illustrates a portion of a linked list 70 comprising data structures according to Figures 3a and 3b. In this embodiment, the structure illustrated in Figure 1 would be modified in that the exception repository 30 comprises a read-only section where the exception-relevant information (Figure 3b) is kept whereas the links of the linked list (Figure 3a) are stored in the system memory 14 which is writable memory.

The main disadvantage with this structure is that each entry requires memory so it is possible that the creation of the entry may fail where memory is not available.
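The separation described with reference to Figures 3a and 3b, including the possibility that adding an entry fails for lack of memory, might be sketched in C as below. The names SCrashDumpRegion, SCrashDumpRegionListItem, iNext and iInfo come from the description; the field types, the RegionList wrapper, the use of malloc as the allocator and the functions RegionListAdd, RegionListCount and RegionListFree are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Figure 3b style entry: no link pointer, so it may be const (read-only). */
typedef struct SCrashDumpRegion {
    const char *iName;
    void (*iStartFn)(void);  /* assumed signature */
    const void *iAddr;
    size_t iSize;
} SCrashDumpRegion;

/* Figure 3a style list item, held in writable memory. */
typedef struct SCrashDumpRegionListItem {
    struct SCrashDumpRegionListItem *iNext;  /* next list item or NULL */
    const SCrashDumpRegion *iInfo;           /* the exception-relevant information */
} SCrashDumpRegionListItem;

/* Hypothetical wrapper tracking both ends of the list. */
typedef struct RegionList {
    SCrashDumpRegionListItem *iHead;
    SCrashDumpRegionListItem *iTail;
} RegionList;

/* Appends a link for aRegion; returns 0 on success, -1 if allocation fails. */
int RegionListAdd(RegionList *aList, const SCrashDumpRegion *aRegion) {
    SCrashDumpRegionListItem *item = malloc(sizeof *item);
    if (item == NULL) {
        return -1;  /* the failure mode noted in the text */
    }
    item->iNext = NULL;
    item->iInfo = aRegion;
    if (aList->iTail != NULL) {
        aList->iTail->iNext = item;  /* previous item's iNext points to the new item */
    } else {
        aList->iHead = item;
    }
    aList->iTail = item;
    return 0;
}

/* Counts the links currently in the list. */
size_t RegionListCount(const RegionList *aList) {
    size_t n = 0;
    for (const SCrashDumpRegionListItem *p = aList->iHead; p != NULL; p = p->iNext) {
        n++;
    }
    return n;
}

/* Releases the links only; the entries themselves are not owned by the list. */
void RegionListFree(RegionList *aList) {
    SCrashDumpRegionListItem *p = aList->iHead;
    while (p != NULL) {
        SCrashDumpRegionListItem *next = p->iNext;
        free(p);
        p = next;
    }
    aList->iHead = NULL;
    aList->iTail = NULL;
}
```

Because iInfo is a pointer to const, entries declared as static const data can be placed in read-only memory by the toolchain while only the small link items occupy writable memory.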

Figure 4 illustrates the API (application programming interface) provided by the kernel 12 to access the exception repository 30. Printf is a function which copies arbitrarily formatted strings and data stored in the repository 30 to non-volatile memory 24. Dump copies the memory specified at aAddress of length aLength bytes to non-volatile memory 24. Printf and Dump provide the means whereby exception-relevant information is transferred to non-volatile memory 24 so that this information may be analysed after the computer system 10 has been shut down or rebooted. RegisterPlugin registers an entry by adding information to the exception repository 30 in the format illustrated in Figure 2. Any appropriate event may trigger this, for example when more memory is allocated, a hardware peripheral is enabled, or another channel to a device driver is opened. DeregisterPlugin deregisters the entry by removing it from the exception repository 30 when the information is no longer relevant, for example when memory has been deallocated, a hardware peripheral is disabled, or a channel to a device driver has been disabled.
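As a rough illustration of the two output functions, the following C sketch models non-volatile memory 24 as a fixed-size byte array. The function names Printf and Dump and the parameter names aAddress and aLength are taken from the description; the signatures, the return values, the truncation behaviour and the names gStore, gStoreUsed and KStoreSize are assumptions, not the API of Figure 4.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for non-volatile memory 24. */
enum { KStoreSize = 256 };
static unsigned char gStore[KStoreSize];
static size_t gStoreUsed = 0;

/* Copies aLength bytes starting at aAddress into the store.
   Returns the number of bytes actually written (truncated when full). */
size_t Dump(const void *aAddress, size_t aLength) {
    size_t room = KStoreSize - gStoreUsed;
    size_t n = aLength < room ? aLength : room;
    assert(aAddress != NULL);
    memcpy(gStore + gStoreUsed, aAddress, n);
    gStoreUsed += n;
    return n;
}

/* Formats a string and copies it (without the terminator) into the store.
   Returns the number of bytes written. */
size_t Printf(const char *aFormat, ...) {
    char text[128];
    va_list args;
    int len;

    va_start(args, aFormat);
    len = vsnprintf(text, sizeof text, aFormat, args);
    va_end(args);

    if (len < 0) {
        return 0;                       /* formatting error: write nothing */
    }
    if ((size_t)len >= sizeof text) {
        len = (int)(sizeof text) - 1;   /* output was truncated by vsnprintf */
    }
    return Dump(text, (size_t)len);
}
```

A real crash path would typically avoid anything that can block or allocate; the in-memory array here only makes the data flow visible.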

Where a particular repository entry relates to dynamic information which changes memory address or size with time, a function is called by means of iStartFn which outputs the current address and size of the information dynamically while dumping information to non-volatile storage.

Figure 5 illustrates the manner in which an exception repository according to an embodiment of the invention is populated and maintained. By way of illustration, the example where an external input device such as a headset is added to the computer system is used. In block 32, the hardware is installed. This initiates the process whereby the exception repository is populated. In the next block, block 34, the corresponding device driver is loaded by the kernel 12. In this instance, the device driver will be the device driver which corresponds to the installed headset. The process then moves to block 36 where the exception repository 30 is populated with the entries relevant to the headset through use of the RegisterPlugin function. Once the exception repository has been populated, the process proceeds to block 38 where the status of the device driver is monitored. If the device driver undergoes a change which influences one or more of the entries in the exception repository 30, the process will move on to block 40 where the relevant entry in the exception repository is updated by the device driver by calling an appropriate function, RegisterPlugin or DeregisterPlugin. This monitoring and updating will continue for as long as the hardware (i.e. the headset) is in use.

At block 42, the removal of the hardware is detected. If the hardware is no longer in use, the process moves to block 44 to update the exception repository by deregistering the entries relating to the headset which were previously registered in block 36.

In this manner, it is ensured that the entries in the exception repository will contain information relevant to hardware which is in use, and that entries which are no longer relevant are removed from the repository.
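The registration and deregistration steps of Figure 5 can be sketched with an intrusive linked list of Figure 2 style entries. The function names RegisterPlugin and DeregisterPlugin come from the description, but their signatures, the return value of DeregisterPlugin, the choice to insert at the head of the list, and the names gHead and RegisteredCount are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Figure 2 style entry; field types are assumptions. */
typedef struct SCrashDumpRegion {
    const char *iName;
    void (*iStartFn)(void);
    const void *iAddr;
    size_t iSize;
    struct SCrashDumpRegion *iNext;
} SCrashDumpRegion;

/* Hypothetical head of the exception repository's linked list. */
static SCrashDumpRegion *gHead = NULL;

/* Adds an entry to the repository (block 36 of Figure 5). */
void RegisterPlugin(SCrashDumpRegion *aRegion) {
    assert(aRegion != NULL);
    aRegion->iNext = gHead;  /* insert at the head; no allocation needed */
    gHead = aRegion;
}

/* Removes an entry (block 44). Returns 1 if it was found, 0 otherwise. */
int DeregisterPlugin(SCrashDumpRegion *aRegion) {
    SCrashDumpRegion **link;
    for (link = &gHead; *link != NULL; link = &(*link)->iNext) {
        if (*link == aRegion) {
            *link = aRegion->iNext;  /* unlink without disturbing other entries */
            aRegion->iNext = NULL;
            return 1;
        }
    }
    return 0;
}

/* Counts the entries currently registered. */
size_t RegisteredCount(void) {
    size_t n = 0;
    const SCrashDumpRegion *p;
    for (p = gHead; p != NULL; p = p->iNext) {
        n++;
    }
    return n;
}
```

In the headset example, the driver would call RegisterPlugin once per entry when the headset is first used and DeregisterPlugin for each of those entries when its removal is detected. A real kernel would also need to guard the list against concurrent access, which this sketch omits.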

Figure 6 illustrates the process followed when an exception occurs whereby the content of the exception repository 30 is transferred to non-volatile memory 24 for later analysis. The process initialises at start block 50. At the first procedural step, block 52, the variable p is set to the beginning of the linked list formed by the entries in the exception repository 30. At decision block 54 it is determined whether p is null. If p is null, the end of the linked list has been reached or the linked list is empty and the process is terminated at block 56. If p is not null, entries remain in the list to be processed, and the process will continue to block 58.

At block 58, the string stored in variable iName for the first entry in the linked list is copied to non-volatile memory 24 using the Printf function. At decision block 60, it is determined whether the variable iStartFn is defined for the first entry in the linked list. If it is determined that the variable iStartFn has been defined, the process proceeds to block 62 where this function is executed. Alternatively, if this function is not defined, the process will proceed to block 64 to determine whether the variable iSize is greater than zero. If iSize is greater than zero, the process proceeds to block 66 where the contents of the memory at iAddr, for the number of bytes specified by iSize, are copied to non-volatile memory 24 by means of the Dump function.

If iSize is not greater than zero, there is no memory content corresponding to this entry to be copied and the process proceeds to block 68 where variable p is set equal to the next entry in the linked list of the exception repository 30. The process then reverts back to block 54 and the process outlined above is repeated for the next entry in the linked list. In this way, all of the data of all entries of the exception repository 30 will be copied to non-volatile memory 24.
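The traversal of Figure 6 can be sketched as a simple loop over the linked list. The block numbers in the comments refer to the flow diagram as described above. The sketch assumes that, after iStartFn has been executed at block 62, the flow continues to the iSize check at block 64, which is one reading consistent with entries that use both iStartFn and {iAddr, iSize}. The byte-array store, the simplified Dump, the function DumpRepository and the callback SampleStartFn are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Figure 2 style entry; field types are assumptions. */
typedef struct SCrashDumpRegion {
    const char *iName;
    void (*iStartFn)(void);
    const void *iAddr;
    size_t iSize;
    struct SCrashDumpRegion *iNext;
} SCrashDumpRegion;

/* Hypothetical stand-in for non-volatile memory 24. */
enum { KStoreSize = 128 };
static unsigned char gStore[KStoreSize];
static size_t gStoreUsed = 0;

/* Simplified Dump: appends bytes to the store, truncating when full. */
static void Dump(const void *aAddress, size_t aLength) {
    size_t room = KStoreSize - gStoreUsed;
    size_t n = aLength < room ? aLength : room;
    memcpy(gStore + gStoreUsed, aAddress, n);
    gStoreUsed += n;
}

/* Hypothetical start function: records a marker in place of real work
   such as enabling a clock before hardware registers are read. */
void SampleStartFn(void) {
    Dump("clk-on", 6);
}

/* Walks the repository and preserves each entry. Returns entries visited. */
size_t DumpRepository(const SCrashDumpRegion *aHead) {
    size_t visited = 0;
    const SCrashDumpRegion *p = aHead;            /* block 52 */
    while (p != NULL) {                           /* block 54 */
        Dump(p->iName, strlen(p->iName));         /* block 58: save the label */
        if (p->iStartFn != NULL) {                /* block 60 */
            p->iStartFn();                        /* block 62 */
        }
        if (p->iSize > 0) {                       /* block 64 */
            Dump(p->iAddr, p->iSize);             /* block 66 */
        }
        p = p->iNext;                             /* block 68 */
        visited++;
    }
    return visited;                               /* block 56 */
}
```

The label is written with Dump here only to keep the sketch short; the description uses Printf for that step.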

Although the aforementioned embodiments of the invention have been described with reference to device drivers, it is to be realised that the invention is equally applicable to exception-relevant information which does not necessarily relate to hardware, such as dynamic system information. For example, the repository 30 may store entries relating to the currently running thread or the value of other context specific operational parameters. Furthermore, any application which is capable of executing on the computer system may register or deregister information in the repository 30. By way of example, the application responsible for maintaining a telephony stack outside the kernel may store entries in the repository 30 relating to the status of the kernel.

Claims

1. A method of collating exception-relevant information for use in a computer system, said method comprising the steps of: registering exception-relevant information in an exception repository; maintaining said exception repository during operation of said computer system; and in the event of an exception, preserving the contents of said exception repository for later use.

2. The method according to claim 1 wherein said exception-relevant information is data, a pointer to data or a function.

3. The method according to claim 1 or claim 2 wherein said exception-relevant information relates to a device driver or a corresponding device.

4. The method according to claim 3 wherein said step of registering said exception-relevant information occurs when a device corresponding to said device driver is first used.

5. The method according to claim 3 or claim 4 wherein the step of maintaining said exception repository includes the step of deregistering said exception-relevant information when said device corresponding to said device driver is no longer in use.

6. The method according to any preceding claim wherein said exception-relevant information includes one or more of: a memory address; a memory address range; state information; a buffer contents; information internal to a device, or statistical data.

7. The method according to any preceding claim wherein said step of registering exception-relevant information includes the step of storing access instructions in said exception repository, said access instructions including instructions for accessing data.

8. The method according to claim 7 wherein said access instructions include one or more of: instructions to enable a clock; instructions to access a memory by means of an interface; and a signal to enable a memory.

9. The method according to any preceding claim wherein said step of preserving said exception-relevant information includes the step of copying said contents of said exception repository to non-volatile memory.

10. The method according to claim 9 wherein said contents of said exception repository is stored in a linked list.

11. The method according to any preceding claim wherein said exception is a system crash.

12. An exception repository for use in a computer system, said exception repository being adapted to register at least one entry containing exception-relevant information, said exception-relevant information corresponding to a component of said computer system, and said exception repository further being adapted to preserve said exception-relevant information in the event of an exception.

13. The exception repository according to claim 12 adapted to register a plurality of entries, each entry containing exception-relevant information, said plurality of entries corresponding to a plurality of components of said computer system.

14. The exception repository according to claim 12 or claim 13 wherein said computer system component is a device driver.

15. The exception repository according to claim 14 wherein said repository is adapted to retain said entry corresponding to said device driver for as long as a device corresponding to said device driver is operational in said computer system.

16. The exception repository according to any one of claims 12 to 15 wherein said exception-relevant information includes one or more of: a memory address; a memory address range; state information; a buffer contents; internal device information; or statistical data.

17. The exception repository according to any one of claims 12 to 16 wherein said exception-relevant information comprises access instructions, said access instructions including instructions for accessing data.

18. The exception repository according to claim 17 wherein said access instructions include one or more of: instructions to enable a clock; instructions to access a memory by means of an interface; and a signal to enable a memory.

19. The exception repository according to any one of claims 12 to 18 comprising a plurality of entries and a linked list, said linked list linking one of said plurality of entries to another of said plurality of entries.

20. A computer system comprising an exception repository according to any one of claims 12 to 19.

21. A computer system according to claim 20, when dependent on any one of claims 12 to 18, said computer system further comprising a read-only memory and a writable memory wherein said exception-relevant information is stored in said read-only memory and wherein a linked list linking a plurality of entries of said exception repository is stored in said writable memory.

22. A computer readable medium comprising instructions for directing a computer to perform the method of any one of claims 1 to 11.

23. An operating system arranged to cause a computing device to operate in accordance with claims 1 to 11.

24. A computer program or a suite of computer programs suitable for causing a computing device to operate in accordance with any one of claims 1 to 11.

25. A computer system comprising a plurality of components, a volatile and a non-volatile memory; said volatile memory being demarcated into a plurality of regions, at least one of the regions being arranged to provide an exception repository, said exception repository comprising at least one entry containing exception-relevant information, said exception-relevant information corresponding to a component of said computer system.