WO2001016741A2 - Semaphore control of shared-memory - Google Patents

Semaphore control of shared-memory

Info

Publication number
WO2001016741A2
Authority
WO
WIPO (PCT)
Prior art keywords
shared memory
semaphore
processing nodes
pointer
processor
Prior art date
1999-08-31
Application number
PCT/US2000/024217
Other languages
French (fr)
Other versions
WO2001016741A3 (en)
Inventor
Lynn Parker West
Karlon K. West
Original Assignee
Times N Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2000-08-31
Publication date
2001-03-08
Application filed by Times N Systems, Inc.
Priority to AU69497/00A (AU6949700A)
Priority to EP00957948A (EP1214651A2)
Priority to CA002382927A (CA2382927A1)
Publication of WO2001016741A2
Publication of WO2001016741A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/526 Mutual exclusion algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/0223 User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0284 Multiple user address space allocation, e.g. using different base addresses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/45 Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
    • G06F8/457 Communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/544 Buffers; Shared memory; Pipes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0837 Cache consistency protocols with software control, e.g. non-cacheable data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/52 Indexing scheme relating to G06F9/52
    • G06F2209/523 Mode

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Hardware Redundancy (AREA)
  • Information Transfer Systems (AREA)

Abstract

Methods, systems and devices are described for semaphore control of a shared-memory cluster. A method includes writing at least one pointer to a semaphore region of a shared memory region that is coupled to a plurality of processing nodes. The at least one pointer points to at least one of the plurality of processing nodes, the at least one pointer i) indicating that a portion of the shared memory node is dedicated to reading by the at least one of the plurality of processing nodes and ii) protecting access to the portion of the shared memory node until the portion of the shared memory node has been read by the at least one of the plurality of processing nodes. The methods, systems and devices provide advantages because the speed and scalability of parallel processor systems are enhanced.

Description

SEMAPHORE CONTROL OF SHARED-MEMORY
BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates generally to the field of computer systems based on multiple processors and shared memory. More particularly, the invention relates to computer systems that utilize semaphore control of a shared-memory cluster.
2. Discussion of the Related Art
The clustering of workstations is a well-known art. In the most common cases, the clustering involves workstations that operate almost totally independently, utilizing the network only to share such services as a printer, license-limited applications, or shared files.
In more-closely-coupled environments, some software packages (such as NQS) allow a cluster of workstations to share work. In such cases the work arrives, typically as batch jobs, at an entry point to the cluster where it is queued and dispatched to the workstations on the basis of load.
In both of these cases, and all other known cases of clustering, the operating system and cluster subsystem are built around the concept of message-passing. The term message-passing means that a given workstation operates on some portion of a job until communication (typically to send or receive data) with another workstation is necessary. Then the first workstation prepares a message and communicates with the other workstation.
Another well-known art is that of clustering processors within a machine, usually called a Massively Parallel Processor or MPP, in which the techniques are essentially identical to those of clustered workstations. Usually, the bandwidth and latency of the interconnect network of an MPP are more highly optimized, but the system operation is the same.
In the general case, the passing of a message is an extremely expensive operation; expensive in the sense that many CPU cycles in the sender and receiver are consumed by the process of sending, receiving, bracketing, verifying, and routing the message, CPU cycles that are therefore not available for other operations. Even a highly streamlined message-passing subsystem typically requires 10,000 to 20,000 CPU cycles or more.
There are specific cases wherein the passing of a message requires significantly less overhead. However, none of these specific cases is adaptable to a general-purpose computer system.
Message-passing parallel processor systems have been offered commercially for years but have failed to capture significant market share because of poor performance and difficulty of programming for typical parallel applications. Message-passing parallel processor systems do have some advantages. In particular, because they share no resources, message-passing parallel processor systems are easier to provide with high-availability features. What is needed is a better approach to parallel processor systems.
There are alternatives to the passing of messages for closely-coupled cluster work. One such alternative is the use of shared memory for inter-processor communication.
Shared-memory systems have been much more successful at capturing market share than message-passing systems because of the dramatically superior performance of shared-memory systems, up to about four-processor systems. In Search of Clusters, by Gregory F. Pfister, 2nd ed. (January 1998), Prentice Hall Computer Books, ISBN: 0138997098, describes a computing system with multiple processing nodes in which each processing node is provided with private, local memory and also has access to a range of memory which is shared with other processing nodes. The disclosure of this publication in its entirety is hereby expressly incorporated herein by reference for the purpose of indicating the background of the invention and illustrating the state of the art.
However, providing high availability for traditional shared-memory systems has proved to be an elusive goal. The nature of these systems, which share all code and all data, including that data which controls the shared operating systems, is incompatible with the separation normally required for high availability. What is needed is an approach to shared-memory systems that improves availability.

Although the use of shared memory for inter-processor communication is a well-known art, prior to the teachings of U.S. Ser. No. 09/273,430, filed March 19, 1999, entitled Shared Memory Apparatus and Method for Multiprocessing Systems, the processors shared a single copy of the operating system. The problem with such systems is that they cannot be efficiently scaled beyond four- to eight-way systems except in unusual circumstances. All known cases of said unusual circumstances are such that the systems are not good price-performance systems for general-purpose computing.
The entire contents of U.S. Patent Applications 09/273,430, filed March 19, 1999, and PCT/US00/01262, filed January 18, 2000, are hereby expressly incorporated by reference herein for all purposes. U.S. Ser. No. 09/273,430 improved upon the concept of shared memory by teaching the concept which will herein be referred to as a tight cluster. The concept of a tight cluster is that of individual computers, each with its own CPU(s), memory, I/O, and operating system, but for which collection of computers there is a portion of memory which is shared by all the computers and via which they can exchange information. U.S. Ser. No. 09/273,430 describes a system in which each processing node is provided with its own private copy of an operating system and in which the connection to shared memory is via a standard bus. The advantage of a tight cluster in comparison to an SMP is "scalability," meaning that a much larger number of computers can be attached together in a tight cluster than in an SMP, with little loss of processing efficiency.
What is needed are improvements to the concept of the tight cluster. What is also needed is an expansion of the concept of the tight cluster.

Another well-known art is that of a "heartbeat" function. In a system involving multiple independent computers, a function can be provided such that each of said computers occasionally signals to at least a subset of the other processors an indication that its status is operational. Failure to signal this heartbeat is a primary indication that the computer has failed, in either hardware or software. Should a companion processor fail to receive the heartbeat within a specified period of time following the previously received heartbeat signal, said companion processor will execute a verification routine. Should the results of the verification routine indicate computer failure, the system will enter checkpoint-restart mode and will restart. The failed computer will be removed from the group upon restart, and an operator message will be issued as part of the restart.

In symmetric multiprocessors (SMPs) such a heartbeat function is normally not applicable, as all the processors are using a single copy of the software, and software failure is the most common failure. Also, in the event of hardware failure, the cache of the failed processor may contain dirty, required operating-system status, so that recovery is often impossible. Since there is no way to determine from other processors whether recovery is possible, there are no known examples of SMP systems which attempt to recover from processor or memory failures.
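By way of nonlimiting illustration, the related-art heartbeat function might be sketched in C as follows. The slot table, the timeout value, and the verify_node() routine are assumptions introduced here for illustration; they are not taken from the disclosure.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <time.h>

    #define NUM_NODES 6                /* illustrative node count */
    #define HEARTBEAT_TIMEOUT_SEC 5    /* hypothetical timeout */

    extern _Atomic time_t heartbeat[NUM_NODES];  /* one slot per computer */
    extern bool verify_node(int peer);           /* hypothetical verification routine */

    /* Each computer occasionally signals that its status is operational. */
    void send_heartbeat(int self) {
        atomic_store(&heartbeat[self], time(NULL));
    }

    /* A companion computer treats a missing heartbeat as a primary failure
       indication, then runs the verification routine before any restart. */
    bool peer_has_failed(int peer) {
        time_t last = atomic_load(&heartbeat[peer]);
        if (time(NULL) - last <= HEARTBEAT_TIMEOUT_SEC)
            return false;              /* heartbeat received in time */
        return !verify_node(peer);     /* failed verification -> failure */
    }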
While the heartbeat functionality can be provided in the context of a message-passing system, such systems have performance deficiencies, as discussed above. Therefore, what is also needed is an approach to providing the heartbeat function in the context of a symmetric multiprocessor system.
SUMMARY OF THE INVENTION
A goal of the invention is to simultaneously satisfy the above-discussed requirements of improving and expanding the tight cluster concept which, in the case of the prior art, are not satisfied. The invention can include a system in which accesses to one particular range of addresses are used for inter-processor signaling, semaphores, pointers, and/or to aid high-availability design.
One embodiment of the invention is based on a method, comprising: writing at least one pointer to a semaphore region of a shared memory region that is coupled to a plurality of processing nodes, wherein the at least one pointer points to at least one of said plurality of processing nodes, the at least one pointer i) indicating that a portion of said shared memory node is dedicated to reading by the at least one of said plurality of processing nodes and ii) protecting access to said portion of said shared memory node until said portion of said shared memory node has been read by the at least one of said plurality of processing nodes.

Another embodiment of the invention is based on a system, comprising a multiplicity of processors, each with some private memory and the multiplicity with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and arranged such that at least a portion of one special range of memory addresses or I/O addresses is provided for the purpose of signaling from a first processor to a second, said signaling to occur when said first processor reads or writes a location dedicated to the signaling of said second processor, and said location protected by a semaphore or other locking mechanism so that said first processor can determine whether said second processor has already been signaled by a third process or processor and can wait using defined procedures for the signaling event to complete.

Another embodiment of the invention is based on a system, comprising a multiplicity of processors, each with some private memory and the multiplicity with some shared memory, interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and arranged such that at least a portion of one special range of memory addresses or I/O addresses is provided for the purpose of semaphore control of the remainder, or a portion of the remainder, of shared memory so that when a first process enters a critical section it may obtain a semaphore to continue into that section.
These and other goals and embodiments of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following description, while indicating preferred embodiments of the invention and numerous specific details thereof, is given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the invention without departing from the spirit thereof, and the invention includes all such modifications.
BRIEF DESCRIPTION OF THE DRAWINGS
A clear conception of the advantages and features constituting the invention, and of the components and operation of model systems provided with the invention, will become more readily apparent by referring to the exemplary, and therefore nonlimiting, embodiments illustrated in the drawings accompanying and forming a part of this specification, wherein like reference characters designate the same parts. It should be noted that the features illustrated in the drawings are not necessarily drawn to scale.
FIG. 1 illustrates a block diagram of a system including a shared memory node and a plurality of processing nodes, representing an embodiment of the invention.
FIG. 2 illustrates a block diagram of a shared memory node having a shared memory region and a semaphore region, representing an embodiment of the invention.
FIG. 3 illustrates a block diagram of a system including a shared memory region and a semaphore region, showing pointers from the semaphore region to each of six processing nodes, representing an embodiment of the invention.
FIG. 4 illustrates a block diagram of a system showing a node P5 signaling to a node P2 via a semaphore region, representing an embodiment of the invention.
FIG. 5 illustrates a semaphore format, representing an embodiment of the invention.
DESCRIPTION OF PREFERRED EMBODIMENTS
The invention and the various features and advantageous details thereof are explained more fully with reference to the nonlimiting embodiments that are illustrated in the accompanying drawings and detailed in the following description of preferred embodiments. Descriptions of well-known components and processing techniques are omitted so as not to obscure the invention in unnecessary detail. The teachings of U.S. Ser. No. 09/273,430 include a system which is a single entity; one large supercomputer. The invention is also applicable to a cluster of workstations, or even a network.
The invention is applicable to systems of the type of Pfister or the type of U.S. Ser. No. 09/273,430 in which each processing node has its own copy of an operating system. The invention is also applicable to other types of multiple processing node systems.
The context of the invention can include a tight cluster as described in U.S. Ser. No. 09/273,430. A tight cluster is defined as a cluster of workstations, or an arrangement within a single, multiple-processor machine, in which the processors are connected by a high-speed, low-latency interconnection and in which some but not all memory is shared among the processors. Within the scope of a given processor, accesses to a first set of memory address ranges will be to local, private memory but accesses to a second set of memory address ranges will be to shared memory. The significant advantage of a tight cluster in comparison to a message-passing cluster is that, assuming the environment has been appropriately established, the exchange of information involves a single STORE instruction by the sending processor and a subsequent single LOAD instruction by the receiving processor.

The establishment of the environment, taught by U.S. Ser. No. 09/273,430 and more fully by companion disclosures (U.S. Provisional Application Ser. No. 60/220,794, filed July 26, 2000; U.S. Provisional Application Ser. No. 60/220,748, filed July 26, 2000; WSGR 15245-711; WSGR 15245-712; WSGR 15245-713; WSGR 15245-715; WSGR 15245-716; WSGR 15245-718; WSGR 15245-719; and WSGR 15245-720, the entire contents of all of which are hereby expressly incorporated herein by reference for all purposes), can be performed in such a way as to require relatively little system overhead and to be done once for many information exchanges. Therefore, a comparison of 10,000 instructions for message-passing to a pair of instructions for tight-clustering is valid.
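By way of nonlimiting illustration, the two-instruction exchange might look as follows in C, assuming the shared range has already been mapped into both processors' address spaces; the slot index and function names are hypothetical.

    /* A word-sized slot at an agreed offset in the mapped shared range. */
    enum { SLOT = 0 };                 /* hypothetical slot index */

    /* Sender side: the entire exchange is one STORE into shared memory... */
    void send_word(volatile long *shared, long value) {
        shared[SLOT] = value;          /* single STORE by the sending processor */
    }

    /* ...and the receiver side is one LOAD from the same location. */
    long receive_word(const volatile long *shared) {
        return shared[SLOT];           /* single LOAD by the receiving processor */
    }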
The present invention contemplates such an environment in which each free-standing computer is provided with a very high-speed, low-latency communication means to shared memory, and in which the shared memory includes at least one address range usable for each of the following: (1) the signaling of one processor by another; (2) semaphores; (3) pointer passing; and (4) failure recovery. The low-latency communication means can include a communication link based on traces, cables and/or optical fiber(s). The low-latency communication means can include hardware (e.g., a circuit), firmware (e.g., flash memory) and/or software (e.g., a program). The communication means can include on-chip traces and/or waveguides. Further, each of these communication means can be duplicated.

Any or several of the above functions could be provided in a separate address range, several address ranges, or a single dedicated address range. Any or several of the above functions could likewise be provided in an I/O address range, several I/O address ranges, or a single dedicated I/O address range. Within the stated address range, signaling of one processor by another can be accomplished by reading or writing a particular address, an address dedicated to the destination processor. Semaphore protection of the access to the particular address will assure that after a first processor signals the target processor, a second processor can determine whether a signaling event to the target processor is under way.

Referring to FIG. 1, a system 100 includes a shared memory node 110 and a plurality of processing nodes 120, 130, 140. Processing node 120 is coupled to the shared memory node 110 with a communications link 150. Processing node 130 is coupled to the shared memory node 110 with a communications link 160. Processing node 140 is coupled to the shared memory node 110 with a communications link 170.
Referring to FIG. 2, the shared memory node 110 includes a shared memory region 210. The shared memory region 210 includes a semaphore region 220.
Referring to FIG. 3, another embodiment is shown. This embodiment includes six processing nodes: a first processing node 310, a second processing node 320, a third processing node 330, a fourth processing node 340, a fifth processing node 350 and a sixth processing node 360.
Still referring to FIG. 3, a shared memory node 370 includes a semaphore region 380. The semaphore region includes six pointers P1, P2, P3, P4, P5, P6, each of which points to one of the six processing nodes. The pointing feature can be implemented with data contained within the pointer itself, as discussed below in more detail.
By convention the signaled processor and the signaling processor use a separate mechanism to indicate acknowledgment of the signaling event. However, acknowledgment is not required.
The signaling event is initiated by reading or writing a particular address. The value written to that address, or to a companion address, can be a simple command, or a vector address to a location providing a more complex command or sequence of commands.
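A nonlimiting C sketch of such a signaling convention follows; the mailbox layout and all names are illustrative assumptions, with C11 atomics standing in for whatever atomic operation the processor architecture provides.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* One mailbox per destination processor in the special address range. */
    typedef struct {
        atomic_flag busy;       /* semaphore guarding a signaling event */
        _Atomic long command;   /* simple command, or vector address of a
                                   more complex command sequence */
    } mailbox_t;

    /* Attempt to signal processor `dst`; returns false if a signaling event
       to that processor is already under way, so the caller can wait. */
    bool signal_node(mailbox_t *boxes, int dst, long command) {
        if (atomic_flag_test_and_set(&boxes[dst].busy))
            return false;                          /* already being signaled */
        atomic_store(&boxes[dst].command, command);
        return true;
    }

    /* The signaled processor consumes the command and releases the mailbox;
       any acknowledgment uses a separate mechanism, as noted above. */
    long consume_signal(mailbox_t *boxes, int self) {
        long cmd = atomic_load(&boxes[self].command);
        atomic_flag_clear(&boxes[self].busy);
        return cmd;
    }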
Within the range, or in a separate range, a sub-range of addresses can be used to semaphore protect all of memory. Semaphore protection should be on a word, a cache line, or other meaningful element. All of protectable memory will be hashed, on an entity-granularity basis, to the semaphore sub-range. In the preferred embodiment, the means of such hashing is to address the sub-range modulo the lowest unique memory address bits. Of course, other hashing means can be derived.
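A nonlimiting sketch of the modulo hash, assuming cache-line granularity and a power-of-two semaphore count; both values are illustrative choices, not requirements of the disclosure.

    #include <stdint.h>

    #define LINE_SIZE 64      /* protected-entity granularity (assumed) */
    #define SEM_COUNT 2048    /* semaphore sub-range size (assumed) */

    /* Map a protected memory address to its semaphore index using the
       lowest unique address bits, i.e. modulo the size of the sub-range. */
    static inline unsigned sem_index(uintptr_t addr) {
        return (unsigned)((addr / LINE_SIZE) % SEM_COUNT);
    }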
When a given process encounters a critical section that requires semaphore protection, that process uses the atomic operation provided for the particular processor architecture to write the semaphore protecting that address range. If the semaphore has already been written, the process enters a waiting routine.
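One way the acquire-or-wait step might be sketched with C11 atomics is shown below. The spin loop stands in for whatever waiting routine the platform provides, and stamping the caller's identification rather than a bare flag anticipates the failure-recovery use described later; all names are assumptions.

    #include <stdatomic.h>
    #include <stdint.h>

    #define SEM_FREE ((uintptr_t)0)   /* assumed encoding for "not held" */

    /* Write the semaphore with the atomic operation of the architecture;
       if it has already been written, enter a waiting routine (here, spin). */
    void sem_acquire(_Atomic uintptr_t *sem, uintptr_t my_id) {
        uintptr_t expected = SEM_FREE;
        while (!atomic_compare_exchange_weak(sem, &expected, my_id))
            expected = SEM_FREE;      /* already held: retry until released */
    }

    void sem_release(_Atomic uintptr_t *sem) {
        atomic_store(sem, SEM_FREE);
    }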
Given the hashing of memory for semaphores, there is a non-zero probability that a lock on one critical section can interfere with entry to a different critical section. In empirical tests, such interference did not occur when at least 2048 semaphores were provided. In the actual running of a system over time, such interference will occur, but it is only a performance issue; if it occurs very rarely, no measurable loss of performance results.
While not being limited to any particular performance indicator or diagnostic identifier, preferred embodiments of the invention can be identified one at a time by testing for the presence of measurable loss of performance. Preferred embodiments will demonstrate substantially no measurable loss of performance. The test for the presence of measurable loss of performance can be carried out without undue experimentation by the use of a simple and conventional benchmark (speed) experiment.

The semaphore range can be used as an aid in failure recovery when accompanied by "heartbeat" mechanisms. This invention teaches that, when capturing a semaphore, the owning processor writes its identification to the semaphore location. When subsequent heartbeat mechanisms indicate that a processor has failed, the processor detecting the heartbeat failure will search for and release the semaphores owned by the failed processor.
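A nonlimiting sketch of that recovery sweep, reusing the identification stamped at acquire time (sizes and names assumed):

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    #define SEM_FREE ((uintptr_t)0)

    /* Release every semaphore still owned by a processor whose heartbeat
       has failed; only entries stamped with failed_id are cleared. */
    void release_semaphores_of(_Atomic uintptr_t *sems, size_t count,
                               uintptr_t failed_id) {
        for (size_t i = 0; i < count; i++) {
            uintptr_t expected = failed_id;
            atomic_compare_exchange_strong(&sems[i], &expected, SEM_FREE);
        }
    }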
The semaphore region can be duplicated in a mirrored-memory region. This increases the reliability, and therefore, the availability of the system.
The term substantially, as used herein, is defined as at least approaching a given state (e.g., preferably within 10% of, more preferably within 1% of, and most preferably within 0.1% of). The term coupled, as used herein, is defined as connected, although not necessarily directly, and not necessarily mechanically. The term means, as used herein, is defined as hardware, firmware and/or software for achieving a result. The term program or phrase computer program, as used herein, is defined as a sequence of instructions designed for execution on a computer system. A program may include a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, and/or other sequence of instructions designed for execution on a computer system.
Example
A specific embodiment of the invention will now be further described by the following, nonlimiting example, which will serve to illustrate in some detail various features of significance. The example is intended merely to facilitate an understanding of ways in which the invention may be practiced and to further enable those of skill in the art to practice the invention. Accordingly, the example should not be construed as limiting the scope of the invention.
Referring to FIG. 4, a specific implementation of the invention is shown. A shared memory node 410 includes a semaphore region 420. A first node P5 can signal to a second node P2 with a pointer P2 that is located within the semaphore region 420. The existence of the pointer P2 alerts other nodes to the fact that a portion of the shared region is dedicated to reading by node P2.
Referring to FIG. 5, a semaphore can be more informationally rich than a simple flag. A semaphore format 500 can include a count of messages in the shared region for a node 510 (in this example, for node P2). The semaphore format 500 can include a pointer to a shared region dedicated to reading by the node 520 (in this example, node P2). The semaphore format 500 can include a pointer to the node 530 (in this example, node P2). Thus, the semaphore can be self-identifying and does not have to be written to a node-specific memory region. Finally, the semaphore format 500 can include a lock bit 540.
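Read as a data layout, the semaphore format 500 might be sketched as the following C struct; the field widths and names are illustrative assumptions only.

    #include <stdint.h>

    /* One possible layout of the semaphore format 500 of FIG. 5. */
    typedef struct {
        uint32_t msg_count;    /* 510: count of messages for the node */
        uint32_t region_ptr;   /* 520: pointer to the shared region
                                       dedicated to reading by the node */
        uint16_t node_ptr;     /* 530: pointer identifying the node */
        uint16_t lock;         /* 540: lock bit, kept in its own word */
    } semaphore_fmt_t;

Keeping the lock in its own addressable word is one way to let it be updated atomically while the rest of the format is read; the disclosure itself does not mandate any particular widths.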
Practical Applications of the Invention
A practical application of the invention that has value within the technological arts is waveform transformation. Further, the invention is useful in conjunction with data input and transformation (such as are used for the purpose of speech recognition), or in conjunction with transforming the appearance of a display (such as are used for the purpose of video games), or the like. There are virtually innumerable uses for the invention, all of which need not be detailed here.
Advantages of the Invention
A system, representing an embodiment of the invention, can be cost effective and advantageous for at least the following reasons. The invention improves the speed of parallel computing systems. The invention improves the scalability of parallel computing systems. All the disclosed embodiments of the invention described herein can be realized and practiced without undue experimentation. Although the best mode of carrying out the invention contemplated by the inventors is disclosed above, practice of the invention is not limited thereto. Accordingly, it will be appreciated by those skilled in the art that the invention may be practiced otherwise than as specifically described herein.
For example, although the shared memory node described herein can be a separate module, it will be manifest that the shared memory node may be integrated into the system with which it is associated. Furthermore, all the disclosed elements and features of each disclosed embodiment can be combined with, or substituted for, the disclosed elements and features of every other disclosed embodiment except where such elements or features are mutually exclusive.
It will be manifest that various additions, modifications and rearrangements of the features of the invention may be made without deviating from the spirit and scope of the underlying inventive concept. It is intended that the scope of the invention as defined by the appended claims and their equivalents cover all such additions, modifications, and rearrangements.
The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase "means for." Expedient embodiments of the invention are differentiated by the appended subclaims.

Claims

CLAIMS
What is claimed is:
1. A method, comprising: writing at least one pointer to a semaphore region of a shared memory region that is coupled to a plurality of processing nodes, wherein the at least one pointer points to at least one of said plurality of processing nodes, the at least one pointer i) indicating that a portion of said shared memory node is dedicated to reading by the at least one of said plurality of processing nodes and ii) protecting access to said portion of said shared memory node until said portion of said shared memory node has been read by the at least one of said plurality of processing nodes.
2. The method of claim 1, wherein writing the at least one pointer includes writing data that represents a count of messages in said portion of said shared memory node to be read by the at least one of said plurality of processing nodes.
3. The method of claim 1, wherein writing the at least one pointer includes writing data that identifies said portion of said shared memory node.
4. The method of claim 1, wherein writing the at least one pointer includes writing data that points to one of said plurality of processing nodes.
5. The method of claim 1, wherein writing the at least one pointer includes writing a lock bit.
6. The method of claim 1, further comprising determining whether a signaling event to the at least one of said plurality of processing nodes is under way.
7. The method of claim 1, further comprising acknowledging completion of a signaling event.
8. The method of claim 1, further comprising erasing the at least one pointer from said semaphore region.
9. The method of claim 1, further comprising writing another pointer to said semaphore region.
10. An apparatus, comprising: a shared memory node; and a plurality of processing nodes coupled to said shared memory node, wherein said shared memory node includes at least one semaphore region having at least one pointer that points to at least one of said plurality of processing nodes, the at least one pointer i) indicating that a portion of said shared memory node is dedicated to reading by the at least one of said plurality of processing nodes and ii) protecting access to said portion of said shared memory node until said portion of said shared memory node has been read by the at least one of said plurality of processing nodes.
11. The apparatus of claim 10, wherein the at least one pointer includes data that represents a count of messages in said portion of said shared memory node to be read by the at least one of said plurality of processing nodes.
12. The apparatus of claim 10, wherein the at least one pointer includes data that identifies said portion of said shared memory node.
13. The apparatus of claim 10, wherein the at least one pointer includes data that points to one of said plurality of processing nodes.
14. The apparatus of claim 10, wherein the at least one pointer includes a lock bit.
15. The apparatus of claim 10, further comprising a plurality of links coupled between said shared memory node and said plurality of processing nodes.
16. A computer system comprising the apparatus of claim 10.
17. An electronic media, comprising: a computer program adapted to write at least one pointer to a semaphore region of a shared memory region that is coupled to a plurality of processing nodes, wherein the at least one pointer points to at least one of said plurality of processing nodes, the at least one pointer i) indicating that a portion of said shared memory node is dedicated to reading by the at least one of said plurality of processing nodes and ii) protecting access to said portion of said shared memory node until said portion of said shared memory node has been read by the at least one of said plurality of processing nodes.
18. A computer program comprising computer program means adapted to perform the step of writing at least one pointer to a semaphore region of a shared memory region that is coupled to a plurality of processing nodes when said computer program is run on a computer, wherein the at least one pointer points to at least one of said plurality of processing nodes, the at least one pointer i) indicating that a portion of said shared memory node is dedicated to reading by the at least one of said plurality of processing nodes and ii) protecting access to said portion of said shared memory node until said portion of said shared memory node has been read by the at least one of said plurality of processing nodes.
19. A computer program as claimed in claim 18, embodied on a computer-readable medium.
20. A system, comprising: a multiplicity of processors, each with private memory; and a shared memory, wherein said multiplicity of processors and said shared memory are interconnected and arranged such that memory accesses to a first set of address ranges will be to private memory whereas memory accesses to a second set of address ranges will be to said shared memory and at least a portion of one range of memory addresses or I/O addresses are provided for the purpose of signaling from a first processor to a second processor, said signaling to occur when said first processor reads or writes a location dedicated to said signaling, said location protected by a semaphore or other locking mechanism so that said first processor can determine whether said second processor has already been signaled by a third process or processor and, if said second processor has already been signaled by a third process or processor, said first processor can wait using defined procedures for signaling by said third process or processor to be complete.
21. The system of claim 20, further comprising storing in a signal address location or a companion location a simple command or vector address to a location providing a complex command or sequence of commands.
22. The system of claim 20, wherein said at least said portion of one range of memory is a binary bit.
23. A system, comprising: a multiplicity of processors, each with private memory; and a shared memory, wherein said multiplicity of processors and said shared memory are interconnected and arranged such that memory accesses to a first set of address ranges will be to local, private memory whereas memory accesses to a second set of address ranges will be to shared memory, and at least a portion of one range of memory addresses or I/O addresses are provided for the purpose of semaphore control of at least a portion of a remainder of said shared memory so that when a first process enters a critical section it may obtain a semaphore to continue into that section.
24. The system of claim 23, wherein said semaphore is to be obtained by reading a semaphore location to which an addressable element address hashes, said obtaining of said semaphore to be protected by the hardware atomic operation provided for the system.
25. The system of claim 24, wherein if a semaphore is already locked when said first process attempts to obtain said semaphore, said semaphore is not obtained by said first process and said first process waits for the release of said semaphore.
26. The system of claim 23, wherein obtaining said semaphore is accompanied by registration of an obtaining processor's identification and determining at a first processor when a second processor has stopped responding using an implementation of a heartbeat function, wherein release of said semaphore held by said second processor in response to such determination on a repeated basis is an indication that operations are normal.
PCT/US2000/024217 1999-08-31 2000-08-31 Semaphore control of shared-memory WO2001016741A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU69497/00A AU6949700A (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
EP00957948A EP1214651A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
CA002382927A CA2382927A1 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US15215199P 1999-08-31 1999-08-31
US60/152,151 1999-08-31
US60/220,794 2000-07-25
US22074800P 2000-07-26 2000-07-26
US22097400P 2000-07-26 2000-07-26
US60/220,748 2000-07-26

Publications (2)

Publication Number Publication Date
WO2001016741A2 true WO2001016741A2 (en) 2001-03-08
WO2001016741A3 WO2001016741A3 (en) 2001-09-20

Family

ID=27387201

Family Applications (9)

Application Number Title Priority Date Filing Date
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster
PCT/US2000/024217 WO2001016741A2 (en) 1999-08-31 2000-08-31 Semaphore control of shared-memory
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster

Family Applications Before (6)

Application Number Title Priority Date Filing Date
PCT/US2000/024039 WO2001016760A1 (en) 1999-08-31 2000-08-31 Switchable shared-memory cluster
PCT/US2000/024248 WO2001016742A2 (en) 1999-08-31 2000-08-31 Network shared memory
PCT/US2000/024150 WO2001016738A2 (en) 1999-08-31 2000-08-31 Efficient page ownership control
PCT/US2000/024298 WO2001016743A2 (en) 1999-08-31 2000-08-31 Shared memory disk
PCT/US2000/024210 WO2001016740A2 (en) 1999-08-31 2000-08-31 Efficient event waiting
PCT/US2000/024147 WO2001016737A2 (en) 1999-08-31 2000-08-31 Cache-coherent shared-memory cluster

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/US2000/024216 WO2001016761A2 (en) 1999-08-31 2000-08-31 Efficient page allocation
PCT/US2000/024329 WO2001016750A2 (en) 1999-08-31 2000-08-31 High-availability, shared-memory cluster

Country Status (4)

Country Link
EP (3) EP1214653A2 (en)
AU (9) AU7108300A (en)
CA (3) CA2382927A1 (en)
WO (9) WO2001016760A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory
WO2013012524A1 (en) * 2011-07-19 2013-01-24 Qualcomm Incorporated Synchronization of shader operation
WO2014088726A1 (en) * 2012-12-07 2014-06-12 Intel Corporation Memory based semaphores

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003007134A1 (en) * 2001-07-13 2003-01-23 Koninklijke Philips Electronics N.V. Method of running a media application and a media system with job control
US6920485B2 (en) 2001-10-04 2005-07-19 Hewlett-Packard Development Company, L.P. Packet processing in shared memory multi-computer systems
US6999998B2 (en) 2001-10-04 2006-02-14 Hewlett-Packard Development Company, L.P. Shared memory coupling of network infrastructure devices
US7254745B2 (en) 2002-10-03 2007-08-07 International Business Machines Corporation Diagnostic probe management in data processing systems
US7685381B2 (en) 2007-03-01 2010-03-23 International Business Machines Corporation Employing a data structure of readily accessible units of memory to facilitate memory access
US7899663B2 (en) 2007-03-30 2011-03-01 International Business Machines Corporation Providing memory consistency in an emulated processing environment
EP2851807B1 (en) * 2013-05-28 2017-09-20 Huawei Technologies Co., Ltd. Method and system for supporting resource isolation under multi-core architecture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4725946A (en) * 1985-06-27 1988-02-16 Honeywell Information Systems Inc. P and V instructions for semaphore architecture in a multiprogramming/multiprocessing environment
US5367690A (en) * 1991-02-14 1994-11-22 Cray Research, Inc. Multiprocessing system using indirect addressing to access respective local semaphore registers bits for setting the bit or branching if the bit is set
EP0769740A1 (en) * 1995-10-17 1997-04-23 International Business Machines Corporation Inter-object communication

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3668644A (en) * 1970-02-09 1972-06-06 Burroughs Corp Failsafe memory system
US4484262A (en) * 1979-01-09 1984-11-20 Sullivan Herbert W Shared memory computer method and apparatus
US4403283A (en) * 1980-07-28 1983-09-06 Ncr Corporation Extended memory system and method
US4414624A (en) * 1980-11-19 1983-11-08 The United States Of America As Represented By The Secretary Of The Navy Multiple-microcomputer processing
JPH063589B2 (en) * 1987-10-29 1994-01-12 インターナシヨナル・ビジネス・マシーンズ・コーポレーシヨン Address replacement device
US5175839A (en) * 1987-12-24 1992-12-29 Fujitsu Limited Storage control system in a computer system for double-writing
EP0343646B1 (en) * 1988-05-26 1995-12-13 Hitachi, Ltd. Task execution control method for a multiprocessor system with enhanced post/wait procedure
US4992935A (en) * 1988-07-12 1991-02-12 International Business Machines Corporation Bit map search by competitive processors
US4965717A (en) * 1988-12-09 1990-10-23 Tandem Computers Incorporated Multiple processor system having shared memory with private-write capability
EP0457308B1 (en) * 1990-05-18 1997-01-22 Fujitsu Limited Data processing system having an input/output path disconnecting mechanism and method for controlling the data processing system
US5206952A (en) * 1990-09-12 1993-04-27 Cray Research, Inc. Fault tolerant networking architecture
JPH04271453A (en) * 1991-02-27 1992-09-28 Toshiba Corp Composite electronic computer
EP0528538B1 (en) * 1991-07-18 1998-12-23 Tandem Computers Incorporated Mirrored memory multi processor system
US5315707A (en) * 1992-01-10 1994-05-24 Digital Equipment Corporation Multiprocessor buffer system
US5398331A (en) * 1992-07-08 1995-03-14 International Business Machines Corporation Shared storage controller for dual copy shared data
US5434975A (en) * 1992-09-24 1995-07-18 At&T Corp. System for interconnecting a synchronous path having semaphores and an asynchronous path having message queuing for interprocess communications
DE4238593A1 (en) * 1992-11-16 1994-05-19 Ibm Multiprocessor computer system
JP2963298B2 (en) * 1993-03-26 1999-10-18 富士通株式会社 Recovery method of exclusive control instruction in duplicated shared memory and computer system
US5590308A (en) * 1993-09-01 1996-12-31 International Business Machines Corporation Method and apparatus for reducing false invalidations in distributed systems
US5664089A (en) * 1994-04-26 1997-09-02 Unisys Corporation Multiple power domain power loss detection and interface disable
US5636359A (en) * 1994-06-20 1997-06-03 International Business Machines Corporation Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme
US5940870A (en) * 1996-05-21 1999-08-17 Industrial Technology Research Institute Address translation for shared-memory multiprocessor clustering
US5784699A (en) * 1996-05-24 1998-07-21 Oracle Corporation Dynamic memory allocation in a computer using a bit map index
JPH10142298A (en) * 1996-11-15 1998-05-29 Advantest Corp Testing device for ic device
US5829029A (en) * 1996-12-18 1998-10-27 Bull Hn Information Systems Inc. Private cache miss and access management in a multiprocessor system with shared memory
US5918248A (en) * 1996-12-30 1999-06-29 Northern Telecom Limited Shared memory control algorithm for mutual exclusion and rollback
US6360303B1 (en) * 1997-09-30 2002-03-19 Compaq Computer Corporation Partitioning memory shared by multiple processors of a distributed processing system
DE69715203T2 (en) * 1997-10-10 2003-07-31 Bull Sa A data processing system with cc-NUMA (cache coherent, non-uniform memory access) architecture and cache memory contained in local memory for remote access

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4725946A (en) * 1985-06-27 1988-02-16 Honeywell Information Systems Inc. P and V instructions for semaphore architecture in a multiprogramming/multiprocessing environment
US5367690A (en) * 1991-02-14 1994-11-22 Cray Research, Inc. Multiprocessing system using indirect addressing to access respective local semaphore registers bits for setting the bit or branching if the bit is set
EP0769740A1 (en) * 1995-10-17 1997-04-23 International Business Machines Corporation Inter-object communication

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ramachandran, M. and Singhal, M., "Decentralized Semaphore Support in a Virtual Shared-Memory System", Journal of Supercomputing, Kluwer Academic Publishers, Dordrecht, NL, vol. 9, no. 1/2, 1995, pages 51-70, XP000521882, ISSN: 0920-8542 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1895413A3 (en) * 2006-08-18 2009-09-30 Fujitsu Limited Access monitoring method and device for shared memory
WO2013012524A1 (en) * 2011-07-19 2013-01-24 Qualcomm Incorporated Synchronization of shader operation
US9442780B2 (en) 2011-07-19 2016-09-13 Qualcomm Incorporated Synchronization of shader operation
WO2014088726A1 (en) * 2012-12-07 2014-06-12 Intel Corporation Memory based semaphores
US9064437B2 (en) 2012-12-07 2015-06-23 Intel Corporation Memory based semaphores
US10078879B2 (en) 2012-12-07 2018-09-18 Intel Corporation Process synchronization between engines using data in a memory location

Also Published As

Publication number Publication date
CA2382728A1 (en) 2001-03-08
AU6949700A (en) 2001-03-26
WO2001016738A9 (en) 2002-09-12
AU7110000A (en) 2001-03-26
WO2001016743A2 (en) 2001-03-08
WO2001016760A1 (en) 2001-03-08
WO2001016738A2 (en) 2001-03-08
EP1214651A2 (en) 2002-06-19
CA2382929A1 (en) 2001-03-08
WO2001016737A2 (en) 2001-03-08
WO2001016750A2 (en) 2001-03-08
WO2001016738A3 (en) 2001-10-04
WO2001016761A3 (en) 2001-12-27
AU7100700A (en) 2001-03-26
WO2001016743A8 (en) 2001-10-18
AU6949600A (en) 2001-03-26
AU7108300A (en) 2001-03-26
WO2001016737A3 (en) 2001-11-08
WO2001016761A2 (en) 2001-03-08
WO2001016743A3 (en) 2001-08-09
EP1214653A2 (en) 2002-06-19
WO2001016741A3 (en) 2001-09-20
AU7113600A (en) 2001-03-26
WO2001016740A3 (en) 2001-12-27
WO2001016738A8 (en) 2001-05-03
WO2001016740A2 (en) 2001-03-08
WO2001016742A3 (en) 2001-09-20
CA2382927A1 (en) 2001-03-08
EP1214652A2 (en) 2002-06-19
AU7474200A (en) 2001-03-26
AU7112100A (en) 2001-03-26
WO2001016750A3 (en) 2002-01-17
AU7108500A (en) 2001-03-26
WO2001016742A2 (en) 2001-03-08

Similar Documents

Publication Publication Date Title
US7145837B2 (en) Global recovery for time of day synchronization
KR101291016B1 (en) Registering a user-handler in hardware for transactional memory event handling
US7668923B2 (en) Master-slave adapter
JP4160925B2 (en) Method and system for communication between processing units in a multiprocessor computer system including a cross-chip communication mechanism in a distributed node topology
US5574945A (en) Multi channel inter-processor coupling facility processing received commands stored in memory absent status error of channels
US20050091383A1 (en) Efficient zero copy transfer of messages between nodes in a data processing system
JP6238898B2 (en) System and method for providing and managing message queues for multi-node applications in a middleware machine environment
US7552236B2 (en) Routing interrupts in a multi-node system
US20010052054A1 (en) Apparatus and method for partitioned memory protection in cache coherent symmetric multiprocessor systems
JP2009506403A (en) Direct update software transactional memory
US20050080920A1 (en) Interpartition control facility for processing commands that effectuate direct memory to memory information transfer
Arvind et al. Two fundamental issues in multiprocessing
KR20200014378A (en) Job management
WO2001016741A2 (en) Semaphore control of shared-memory
JPH0512126A (en) Device and method for address conversion for virtual computer
US5909574A (en) Computing system with exception handler and method of handling exceptions in a computing system
US20070130303A1 (en) Apparatus, system, and method for recovering messages from a failed node
US20050078708A1 (en) Formatting packet headers in a communications adapter
US6823498B2 (en) Masterless building block binding to partitions
US20040128351A1 (en) Mechanism to broadcast transactions to multiple agents in a multi-node system
JPH06309285A (en) Communication processing circuit for parallel computer
US7073031B1 (en) Multi-processor system having data coherency
CN113076282A (en) Deadlock processing method for processor on-chip network
Zertal et al. Communication/synchronisation mechanism for multiprocessor on Chip architectures
JPH09146903A (en) Data transfer control method for parallel computer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US US US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

WWE Wipo information: entry into national phase

Ref document number: 2382927

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2000957948

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2000957948

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 2000957948

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)