US9201685B2 - Transactional cache versioning and storage in a distributed data grid - Google Patents

Transactional cache versioning and storage in a distributed data grid

Info

Publication number
US9201685B2
Authority
US
United States
Prior art keywords
transaction
client
version
commit
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US13/359,375
Other versions
US20120197994A1 (en)
Inventor
Tom Beerbower
John P. Speidel
Jonathan Purdy
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oracle International Corp
Original Assignee
Oracle International Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oracle International Corp filed Critical Oracle International Corp
Priority to US13/359,375
Assigned to ORACLE INTERNATIONAL CORPORATION. Assignment of assignors' interest (see document for details). Assignors: BEERBOWER, TOM; PURDY, JONATHAN; SPEIDEL, JOHN P.
Publication of US20120197994A1
Application granted
Publication of US9201685B2
Legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/466 Transaction processing


Abstract

A set of techniques are described for transactional cache versioning and data storage in a distributed data grid environment. A transaction coordinator maintains a commit version for each transaction. This version is updated over the course of the transaction. In addition, each cluster member maintains a local current version that is updated as messages are received from the client. When a client serving as a transaction coordinator sends a message, the transaction coordinator includes an associated transaction's current version value with the message. On receiving a message, the receiving member process sets its current version to be the maximum of its own value and the received value. The receiving member process includes its current version in the return message to the sender. On receiving the return messages, the client sets the transaction's commit version to be greater than the maximum of its own value and the received value.

Description

CLAIM OF PRIORITY
The present application claims the benefit of the following U.S. Provisional Patent Application, which is incorporated by reference herein in its entirety:
U.S. Provisional Patent Application No. 61/437,532, entitled “TRANSACTION FRAMEWORK FOR A DISTRIBUTED IN-MEMORY DATA GRID,” by Tom Beerbower et al., filed on Jan. 28, 2011.
CROSS REFERENCE TO RELATED APPLICATIONS
This patent application is related to the following U.S. patent applications, each of which is incorporated by reference herein in its entirety:
U.S. Provisional Patent Application No. 61/437,521 entitled “IN-MEMORY DATA GRID FOR MANAGING AND CACHING DATA USED BY APPLICATIONS” by Christer Fahlgren et al., filed on Jan. 28, 2011;
U.S. Provisional Patent Application No. 61/437,536 entitled “QUERY LANGUAGE FOR ACCESSING DATA STORED IN A DISTRIBUTED IN-MEMORY DATA GRID” by David Leibs et al., filed on Jan. 28, 2011;
U.S. Provisional Patent Application No. 61/437,541 entitled “SECURITY FOR A DISTRIBUTED IN-MEMORY DATA GRID” by David Guy et al., filed on Jan. 28, 2011;
U.S. patent application Ser. No. 13/352,195 entitled “SYSTEM AND METHOD FOR USE WITH A DATA GRID CLUSTER TO SUPPORT DEATH DETECTION” by Mark Falco et al., filed on Jan. 17, 2012;
U.S. patent application Ser. No. 13/352,203 entitled “SYSTEM AND METHOD FOR USING CLUSTER LEVEL QUORUM TO PREVENT SPLIT BRAIN SCENARIO IN A DATA GRID CLUSTER” by Robert H. Lee et al., filed on Jan. 17, 2012;
U.S. patent application Ser. No. 13/360,487 entitled “PUSH REPLICATION FOR USE WITH A DISTRIBUTED DATA GRID” by Brian Oliver et al., filed on Jan. 27, 2012; and
U.S. patent application Ser. No. 13/359,391 entitled “PROCESSING PATTERN FRAMEWORK FOR DISPATCHING AND EXECUTING TASKS IN A DISTRIBUTED COMPUTING GRID” by Brian Oliver et al., filed on Jan. 26, 2012.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
The current invention relates to transactions and versioning in distributed cache environments.
BACKGROUND
In recent times, various data management systems have become the backbone of information technology and other systems. Virtually all businesses, consumers and other entities utilize some form of data management, ranging from enterprises that manage huge sets of data to individuals that desire to store personal data, music or other information.
The use of transactions is a common form of accessing and modifying a data set. In this context, a transaction can be thought of as a unit of work performed on a piece of data such as a create, read, update or delete (CRUD) operation. Transactions provide reliable units of work that can allow correct recovery from failures and keep the data consistent, as well as manage concurrency between multiple programs or threads accessing the data to provide isolation between those programs.
In order to maintain transaction isolation it is typically a requirement of a transactional system to provide some partial ordering of events. This ensures that transactions are processed in a reliable manner and removes interference (data inconsistency) between concurrent executions. One way to perform this ordering is to base events on a timestamp so that it can be determined which event has priority over another. In a distributed system, however, it can be extremely difficult to base such ordering of events on a global timestamp because of the difficulties of keeping the distributed processes perfectly synchronized. It is thus desirable to address the issue of ordering of events for transactions within a distributed cache system.
BRIEF SUMMARY
In accordance with various embodiments, a set of techniques are described for transactional cache versioning and data storage in a distributed data grid environment. A transaction coordinator maintains a commit version for each transaction. This version is updated over the course of the transaction. In addition, each cluster member maintains a local current version that is updated as messages are received from the client. When a client serving as a transaction coordinator sends a message, the transaction coordinator includes an associated transaction's current version value with the message. On receiving a message, the receiving member process sets its current version to be the maximum of its own value and the received value. The receiving member process includes its current version in the return message to the sender. On receiving the return messages, the client sets the transaction's commit version to be greater than the maximum of its own value and the received value.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an illustration of the transaction versioning, in accordance with various embodiments of the invention.
FIG. 2 is an illustration of the process for transaction versioning, in accordance with various embodiments of the invention.
FIG. 3 is an illustration of an example showing the interaction between a client and the cluster of servers using transactional caches, in accordance with various embodiments of the invention.
DETAILED DESCRIPTION
In accordance with various embodiments, a transaction framework is described for a distributed data grid that manages data used by application objects. The data grid is a data management system for application objects that are shared across multiple servers and require low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, the data grid is ideally suited for use in computationally intensive, stateful middle-tier applications. The data management is targeted to run in the application tier, and is often run in-process with the application itself, for example in the application server cluster. In accordance with an embodiment, a data grid is a system composed of multiple servers that work together to manage information and related operations—such as computations—in a distributed environment. An in-memory data grid, then, is a data grid that stores the information in memory to achieve higher performance and uses redundancy by keeping copies of that information synchronized across multiple servers to ensure resiliency of the system and the availability of the data in the event of server failure.
In accordance with an embodiment, the transaction framework provides partial ordering of events within a distributed system via an algorithm based on Lamport timestamps. Because the events are partially ordered (events are only ordered with respect to other events that they are related to), there is no requirement to maintain a global clock. Therefore, if two transactions intersect, their committed versions are relevant to each other and imply an ordering between the two transactions. If transactions do not intersect, then no ordering between them is required.
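As a concrete illustration of the Lamport-style merge rule, the following is a minimal sketch in Java; the class and method names are illustrative assumptions, not the framework's actual API.

public class LamportVersion {
    private long version;

    // Receiver rule: on any incoming message, the local version becomes
    // the maximum of its own value and the received value.
    public synchronized long merge(long received) {
        version = Math.max(version, received);
        return version;
    }

    // Sender rule on a reply: the commit version moves strictly past both
    // the local value and the returned value, so intersecting transactions
    // always end up ordered relative to one another.
    public synchronized long advancePast(long returned) {
        version = Math.max(version, returned) + 1;
        return version;
    }

    public synchronized long current() {
        return version;
    }
}

Because only the values actually exchanged are compared, two transactions that never touch the same members need no ordering at all, which is exactly the partial-order property described above.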
FIG. 1 is an illustration of the transaction versioning, in accordance with various embodiments of the invention. Although this diagram depicts components as logically separate, such depiction is merely for illustrative purposes. It will be apparent to those skilled in the art that the components portrayed in this figure and in other figures can be combined or divided into separate software, firmware and/or hardware. Furthermore, it will also be apparent to those skilled in the art that such components, regardless of how they are combined or divided, can execute on the same computing device or can be distributed among different computing devices connected by one or more networks or other suitable communication means.
As illustrated, a transaction coordinator 100 (e.g. client) maintains a commit version 105 for each transaction. This version is updated over the course of the transaction. Each cluster member 101-104 maintains a local current version 106-109 that is updated as messages are received from the client. As illustrated in the figure, when a client 100 (transaction coordinator) process sends a message, the transaction coordinator includes an associated transaction's current version value with the message. On receiving a message, the receiving member 101 process sets its current version to be the maximum of its own value and the received value. The receiving member process includes its current version in the return message to the sender. On receiving the return messages, the client sets the transaction's commit version to be greater than the maximum of its own value and the received value.
In accordance with an embodiment, the transaction framework also maintains a consistent read version, which is used to provide a consistent read isolation level. This provides transaction-scoped read consistency. This isolation level guarantees that all the data read in a transaction comes from a single point in time: the time that the transaction began.
In accordance with an embodiment, a commit version 105 is maintained for each transaction. The commit version 105 is updated by the transaction coordinator 100 in the transaction's local state over the life of the transaction. When a mutating operation occurs in the context of a transaction, the commit version of the transaction may be updated based on the versions returned from the cluster members. When the transaction is committed, the commit version becomes that transaction's version of record. The commit version is thus maintained per transaction, is updated each time a mutating operation occurs in the context of the transaction and is used as the version of record when the transaction is committed.
In accordance with an embodiment, the current version (106, 107, 108, 109) value is maintained for each storage-enabled cluster member (101, 102, 103, 104). The current version is updated in each cluster member's local state each time a message is received, so that the current version is the maximum of the member's current value and the received value. In accordance with an embodiment, the current version is monotonic and is maintained per cluster member; it is attached to the message for any mutating operation (client), is obtained from the message of any mutating operation and set on the member (server) if greater than the member's current version value, and is obtained from the member and set on the result after any mutating operation. The returned version is used to calculate (by adding one) the commit version for the transaction.
In accordance with an embodiment, a consistent read version value is maintained by the cluster. The consistent read version is the version that will be used to obtain a consistent view of the system for connections configured as read consistent. In accordance with an embodiment, the consistent read version is monotonic, is maintained per member, and is set on the message for any read operation (client); the actual version can be used for consistent read, while LONG_MAX can be used for read committed. The consistent read version is obtained from the message of any read operation (server) and is used to obtain the correct version of the values being read for consistent reads.
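A sketch of the version choice just described, under the assumption of a simple helper (the class and method names are hypothetical): a read-consistent connection attaches its actual consistent read version, while a read-committed connection attaches LONG_MAX so that the latest committed value is selected on the server.

public class ReadVersionChoice {
    // Attach the consistent read version for consistent reads; LONG_MAX
    // for read committed simply selects the newest committed value.
    public static long versionForReadMessage(boolean consistentRead,
                                             long consistentReadVersion) {
        return consistentRead ? consistentReadVersion : Long.MAX_VALUE;
    }
}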
FIG. 2 is an illustration of the process for transaction versioning, in accordance with various embodiments of the invention. Although this figure depicts functional steps in a particular sequence for purposes of illustration, the process is not necessarily limited to this particular order or steps. One skilled in the art will appreciate that the various steps portrayed in this figure can be changed, rearranged, performed in parallel or adapted in various ways. Furthermore, it is to be understood that certain steps or sequences of steps can be added to or omitted from this process, without departing from the spirit and scope of the invention.
As shown in step 200, the transaction coordinator maintains a commit version for each transaction, wherein the commit version is updated over the course of the transaction. In step 201, each cluster member maintains a current version that is updated as messages are received. In step 202, when the transaction coordinator sends a message, the transaction coordinator includes the associated transaction's current version value with the message. In step 203, upon receiving the message, the receiving member sets its current version to be the maximum of its own value and the received value. In step 204, the receiving member includes its current version in the return message to the sender. In step 205, upon receiving the return messages, the client sets the transaction's commit version to be greater than the maximum of its own value and the received value.
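The following sketch plays steps 200 through 205 end to end in plain Java; all names and the starting version values are assumptions made for illustration, not framework code.

public class VersioningExample {

    static class ClusterMember {
        long currentVersion;
        ClusterMember(long initialVersion) { currentVersion = initialVersion; }

        // Steps 203 and 204: merge the received version into the local
        // current version and return it to the sender.
        long receive(long sentVersion) {
            currentVersion = Math.max(currentVersion, sentVersion);
            return currentVersion;
        }
    }

    static class TransactionCoordinator {
        long commitVersion;

        // Steps 202 and 205: include the transaction's version with the
        // message, then set the commit version strictly greater than the
        // maximum of its own value and the returned value.
        void mutate(ClusterMember member) {
            long returned = member.receive(commitVersion);
            commitVersion = Math.max(commitVersion, returned) + 1;
        }
    }

    public static void main(String[] args) {
        TransactionCoordinator client = new TransactionCoordinator();
        ClusterMember server1 = new ClusterMember(3); // assumed starting values
        ClusterMember server2 = new ClusterMember(1);

        client.mutate(server1); // server1 stays at 3; commit version becomes 4
        client.mutate(server2); // server2 becomes max(1, 4) = 4; commit version becomes 5
        System.out.println(client.commitVersion); // 5
    }
}

The two calls in main mirror the two put operations of the FIG. 3 example that follows: the version carried by each message drives version progression on whichever server owns the key.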
FIG. 3 is an illustration of an example showing the interaction between a client and the cluster of servers using transactional caches, in accordance with various embodiments of the invention. Although this figure depicts functional steps in a particular sequence for purposes of illustration, the process is not necessarily limited to this particular order or steps. One skilled in the art will appreciate that the various steps portrayed in this figure can be changed, rearranged, performed in parallel or adapted in various ways. Furthermore, it is to be understood that certain steps or sequences of steps can be added to or omitted from this process, without departing from the spirit and scope of the invention.
As illustrated, the diagram shows the interaction between a client and servers using transactional caches. The diagram shows how the version counters are passed along with messages to drive version progression across the cluster.
In this example, the application code on the client 300 issues a put operation. The key passed to the put operation is associated with Server 1 301. Server 1 301 is responsible for storing that data. The message passed to the server contains the version associated with the transaction context from the client (transaction coordinator). On receiving the message, the transaction framework on the server will update its own current version to be the max of its version and the received version. This version is returned with the result of the put operation. The client 300 will update the commit version of the transaction context to be the max of its current value and the received value, plus one.
In accordance with the illustrated example, the application code then issues another put operation. In this instance, Server 2 302 is responsible for storing the data. The message passed to Server 2 302 contains the version associated with the transaction context from the client (transaction coordinator). On receiving the message, the server 302 will update its own current version to be the max of its version and the received version. This version is returned with the result of the put operation. The client 300 will update the commit version of the transaction context to be the max of its current value and the received value, plus one.
Storage
In accordance with an embodiment, the transactional cache uses several distributed caches as internal system tables for the storage of natural keys, versioned cache values and commit records. This system of storage allows the transactional framework to store multiple versions of a value for a single key in any transactional cache. This enables transaction isolation and read consistency.
In accordance with an embodiment, when a transactional cache is created, its underlying storage is ensured. Accordingly, the distributed caches that make up the transactional cache's storage are ensured through the associated service. This internal storage is set up on a per transactional cache basis. As such, each transactional cache will have its own set of distributed caches created for its storage.
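Since the internal storage is set up per transactional cache, the system-table names used in the examples below can be derived from the cache name. The helper here is a sketch of that naming convention only, not framework code.

public final class SystemTableNames {
    private SystemTableNames() { }

    // Per-cache system-table names as shown in the example tables below.
    public static String naturalTable(String cacheName)  { return "$NATURAL-"  + cacheName; }
    public static String valuesTable(String cacheName)   { return "$VALUES-"   + cacheName; }
    public static String versionsTable(String cacheName) { return "$VERSIONS-" + cacheName; }
}

For example, SystemTableNames.valuesTable("txcache1") yields "$VALUES-txcache1", matching the tables shown below.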
In accordance with various embodiments, the tables used by the transactional cache include but are not limited to the following tables:
Natural Table
The natural table maps the natural key (the user-provided key) to a synthetic key. When a natural key is first used in a transactional cache, a synthetic key is created for it by the transaction framework. This key is unique in the system, exists in a one-to-one relationship with the natural key and encodes certain information such as the partition number of the natural key.
For the following examples, the synthetic keys are simply shown as “sk” with some numeric identifier appended. The example table below represents the natural table for a transactional cache named “txcache1”. In this case, there are three natural keys known to the system that have been associated with system-assigned synthetic keys.
$NATURAL-txcache1
Key SyntheticKey
key1 sk001
key2 sk002
key3 sk003
Values Table
The values table maps a transaction id (xid)/synthetic key composite to a value; a minimal sketch of such a composite key follows the example table below. By using a key that includes both the transaction id and the synthetic key, the transaction framework is able to store multiple values for each natural key. Deleted values are represented by a NIL value entry in the cache. In accordance with an embodiment, it may be important to show that a key may be associated with some value in one transaction but deleted in another.
The transaction id is simply a unique identifier for a transaction. For the following examples, the transaction identifiers are simply shown as “tx” with some numeric identifier appended.
The example table below shows the values table for the transactional cache named “txcache1”. As shown, there are different values for the same synthetic key. For example, there are three different entries for the synthetic key “sk001” with the values of “A1”, “A2” and NIL, each associated with a different transaction.
$VALUES-txcache1
Xid/SyntheticKey Value
tx001/sk001 A1
tx002/sk001 A2
tx002/sk002 B1
tx003/sk002 B2
tx003/sk003 C1
tx004/sk001 <NIL>
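A minimal sketch of the xid/synthetic composite key referenced above (class and field names are assumed for illustration); value-based equality is what lets the composite serve as a cache key.

import java.util.Objects;

public final class XidSyntheticKey {
    private final String xid;          // transaction id, e.g. "tx002"
    private final String syntheticKey; // synthetic key, e.g. "sk001"

    public XidSyntheticKey(String xid, String syntheticKey) {
        this.xid = xid;
        this.syntheticKey = syntheticKey;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof XidSyntheticKey)) return false;
        XidSyntheticKey that = (XidSyntheticKey) o;
        return xid.equals(that.xid) && syntheticKey.equals(that.syntheticKey);
    }

    @Override
    public int hashCode() {
        return Objects.hash(xid, syntheticKey);
    }
}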
Versions Table
The versions table maps an xid/synthetic key to a committed version. The version is a monotonically increasing value that provides an order to the committed transactions. When a transaction is committed, an entry is added to the versions table for each key that was involved in the transaction. The value is the commit version associated with the transaction being committed.
The example table below shows the versions table for the transactional cache named “txcache1”. In this case there are four committed transactions.
$VERSIONS-txcache1
Xid/SyntheticKey Version
tx001/sk001 1
tx002/sk001 5
tx002/sk002 5
tx003/sk002 12
tx003/sk003 12
tx004/sk001 17
Consistent Read
A consistent view of the transactional cache can be obtained by querying the versions table for the maximum version for each unique synthetic key where the version is less than or equal to some given version; a code sketch of this selection follows the worked example below.
For example, given the above versions table, a consistent read at version 15 would produce the following rows from the versions table:
Xid/SyntheticKey Version
tx002/sk001 5
tx003/sk002 12
tx003/sk003 12
The xid/synthetic keys from these results can be used to obtain the values of the cache at version 15 from the values table. Using the results above would produce the following rows from the values table:
Xid/SyntheticKey Value
tx002/sk001 A2
tx003/sk002 B2
tx003/sk003 C1
The synthetic keys from the above results can be used to obtain the associated natural keys of the transactional cache. The consistent read of the transactional cache named “txcache1” at version 15 would appear as follows:
txcache1
Key Value
key1 A2
key2 B2
key3 C1
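The selection just illustrated can be expressed as a short routine over the versions table. The sketch below reproduces the worked example; the names are illustrative, and the framework performs the equivalent query against its internal caches.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsistentReadExample {

    static class VersionRow {
        final String xid;
        final String syntheticKey;
        final long version;
        VersionRow(String xid, String syntheticKey, long version) {
            this.xid = xid;
            this.syntheticKey = syntheticKey;
            this.version = version;
        }
    }

    // For each unique synthetic key, keep the row with the maximum version
    // that is less than or equal to the given read version.
    static Map<String, VersionRow> viewAt(List<VersionRow> versionsTable, long readVersion) {
        Map<String, VersionRow> view = new HashMap<>();
        for (VersionRow row : versionsTable) {
            if (row.version > readVersion) continue; // committed after the read point
            VersionRow best = view.get(row.syntheticKey);
            if (best == null || row.version > best.version) {
                view.put(row.syntheticKey, row);
            }
        }
        return view;
    }

    public static void main(String[] args) {
        List<VersionRow> versions = new ArrayList<>(Arrays.asList(
                new VersionRow("tx001", "sk001", 1),
                new VersionRow("tx002", "sk001", 5),
                new VersionRow("tx002", "sk002", 5),
                new VersionRow("tx003", "sk002", 12),
                new VersionRow("tx003", "sk003", 12),
                new VersionRow("tx004", "sk001", 17)));

        // Prints tx002/sk001 5, tx003/sk002 12 and tx003/sk003 12, matching
        // the consistent read at version 15 shown above.
        viewAt(versions, 15).values().forEach(row ->
                System.out.println(row.xid + "/" + row.syntheticKey + " " + row.version));
    }
}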
Key Partitioning Strategy
In accordance with an embodiment, a single operation to a transactional cache may involve more than one of the related system caches. For example, a put on a transactional cache is invoked against the natural table but causes an insert into the values table. The specialized processors used to perform transactional cache operations may make changes directly against the backing maps of any of the related system tables. This means that all of the entries related to a given natural key may need to be co-located in the same partition.
In accordance with an embodiment, the transaction framework can achieve this by providing a custom key partitioning strategy that makes use of the partition number encoded in each synthetic key which is set to match the partition number of its associated natural key.
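A sketch of such a strategy follows, under the assumption that the synthetic key carries its partition number; the method shape and key layout here are illustrative, and the framework's actual strategy class and key encoding may differ.

public class SyntheticKeyPartitioning {

    // Assumed layout for illustration: a synthetic key that encodes the
    // partition number of its associated natural key.
    static final class SyntheticKey {
        final int partition;
        final long sequence;
        SyntheticKey(int partition, long sequence) {
            this.partition = partition;
            this.sequence = sequence;
        }
    }

    // Route a synthetic key to the partition encoded in it so that every
    // system-table entry for a natural key is co-located with that key;
    // natural keys fall back to a hash-based assignment.
    static int getKeyPartition(Object key, int partitionCount) {
        if (key instanceof SyntheticKey) {
            return ((SyntheticKey) key).partition;
        }
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}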
The following example shows the state of the transactional cache and its underlying storage as transactions are committed. For simplicity, this example assumes that only a single process is mutating the transactional cache during the course of each transaction.
Transaction 1
The following transaction is a simple put on a transactional cache. The connection is set to auto commit true, so the put is immediately committed.
NamedCache txcache1 = connection.getCache("tx-cache1");
connection.setAutoCommit(true);
txcache1.put(“key1”, “A1”);
The above put operation results in an association of the value “A1” with the key “key1”. The user of the transactional cache could verify this with a get operation, for example. A read committed view of the transactional cache following the auto commit put operation could be illustrated as the following:
txcache1
Key Value
key1 A1
In accordance with an embodiment, this direct association between key and value does not actually exist in the transactional cache the way it does in a normal distributed cache. In a transactional cache, multiple versions of a value may be maintained, so the association between key and value must be derived for a given version as described above.
The following tables show a snapshot of how the internal transactional cache storage would look following the auto commit put operation.
After the put operation, the natural table for txcache1 will contain an association between the given key “key1” and some synthetic key value “sk001”.
$NATURAL-txcache1
Key SyntheticKey
key1 sk001
After the put operation, the values table for txcache1 will contain an entry that associates the transaction “tx001” and the synthetic key “sk001” with the given value “A1”.
$VALUES-txcache1
Xid/SyntheticKey Value
tx001/sk001 A1
After the put operation, the versions table for txcache1 will contain an entry that associates the transaction “tx001” and the synthetic key “sk001” with a commit version of 1.
$VERSIONS-txcache1
Xid/SyntheticKey Version
tx001/sk001 1
Transaction 2
The following shows a transaction on a transactional cache that includes multiple put operations. The connection is set to auto commit false, so neither put is committed until commit is called on the connection.
NamedCache txcache1 = connection.getCache("tx-cache1");
connection.setAutoCommit(false);
txcache1.put(“key1”, “A2”);
txcache1.put(“key2”, “B1”);
connection.commit( );
The above operations result in an association of the value “A2” with the key “key1” and “B1” with the key “key2”. A read committed view of the transactional cache following the commit of Transaction 2 could be illustrated as the following:
txcache1
Key Value
key1 A2
key2 B1
After the commit of Transaction 2, the natural table for txcache1 will still contain an entry for the key “key1” as well as a new entry for the given key “key2” and the synthetic key value “sk002”.
$NATURAL-txcache1
Key SyntheticKey
key1 sk001
key2 sk002
After the commit of Transaction 2, the values table for txcache1 will contain entries that associate the transaction “tx002” and the synthetic keys with the values passed in the put operations. This example shows that the previous value from Transaction 1 is still maintained in the values table.
$VALUES-txcache1
Xid/SyntheticKey Value
tx001/sk001 A1
tx002/sk001 A2
tx002/sk002 B1
After the commit of Transaction 2, the versions table for txcache1 will contain entries that associate the transaction “tx002” and each synthetic key used in the transaction with a commit version of 5. It should be noted that the commit version value of 5 is not significant other than for the fact that it is monotonically increasing. It should also be noted that there are multiple committed versions for the synthetic key “sk001”.
$VERSIONS-txcache1
Xid/SyntheticKey Version
tx001/sk001 1
tx002/sk001 5
tx002/sk002 5
Transaction 3
The following shows a transaction on a transactional cache that includes multiple put operations and a remove operation. The connection is set to auto commit false, so nothing is committed until commit is called on the connection.
NamedCache txcache1 = connection.getCache("tx-cache1");
connection.setAutoCommit(false);
txcache1.remove(“key1”);
txcache1.put(“key2”, “B2”);
txcache1.put(“key3”, “C1”);
connection.commit( );
The above operations result in an association of the value “B2” with the key “key2” and “C1” with the key “key3” as well as the removal of the value for the key “key1”. A read committed view of the transactional cache following the commit of Transaction 3 could be illustrated as the following:
txcache1
Key Value
key2 B2
key3 C1
After the commit of Transaction 3, the natural table for txcache1 will still contain entries for the keys “key1” and “key2” as well as a new entry for the key “key3” and the synthetic key value “sk003”.
$NATURAL-txcache1
Key SyntheticKey
key1 sk001
key2 sk002
key3 sk003
After the commit of Transaction 3, the values table for txcache1 will contain entries that associate the transaction “tx003” and the synthetic keys with the values passed in the cache operations. The table below shows that the previous values from Transaction 1 and Transaction 2 are still maintained in the $VALUES table.
$VALUES-txcache1
Xid/SyntheticKey Value
tx001/sk001 A1
tx002/sk001 A2
tx002/sk002 B1
tx003/sk001 <NIL>
tx003/sk002 B2
tx003/sk003 C1
After the commit of Transaction 3, the versions table for txcache1 will contain entries that associate the transaction “tx003” and each synthetic key used in the transaction with a commit version of 12.
$VERSIONS-txcache1
Xid/SyntheticKey Version
tx001/sk001 1
tx002/sk001 5
tx002/sk002 5
tx003/sk001 12
tx003/sk002 12
tx003/sk003 12
In accordance with an embodiment, the transaction framework further includes an application programming interface (API) that is a connection-based API and provides atomic transaction guarantees across partitions and caches of the data grid even in the event of client failure. The transaction framework includes a set of transactional caches (which are specialized forms of distributed caches), transactional connections, optimistic transaction support, recovery managers and resource adapters. The transaction framework allows the use of the data grid caches in the context of a transaction. This includes most of the NamedCache APIs, including queries, entry processors, and aggregators. The transaction framework provides ACID guarantees, multiple read isolation levels, and deferred operations. The transaction framework uses transactional connections that represent a logical connection to the data grid.
In accordance with an embodiment, the transaction framework includes a transactional connection to the data grid. The transactional connection represents a logical connection to the data grid. It serves as a factory for transactional caches and provides transaction demarcation, such as auto-commit mode, commit and rollback. In accordance with an embodiment, the transactional connection allows the user to set an isolation level. The isolation level includes the following levels (an illustrative sketch of the connection contract appears after the description of the eager and non-eager modes below):
Read Committed—guarantees that only committed data is visible and does not provide any consistency guarantees (this is the default isolation level). Generally, this is the weakest isolation level and provides the highest performance.
Statement Consistent Read—provides statement-scoped read consistency. It guarantees that all the data read by a single operation comes from a single point in time when the statement began execution.
Statement Monotonic Consistent Read—same as the statement consistent read, but guarantees that all statements must be monotonic, meaning that a read is guaranteed to return a version that is greater or equal to a version that was previously encountered while using the connection.
Transaction Consistent Read—provides transaction-scoped read consistency. It guarantees that all the data read in a transaction comes from a single point in time when the transaction began.
Transaction Monotonic Consistent Read—same as the transaction consistent read but guarantees that all reads are monotonic.
In accordance with an embodiment, the transactional connection can function in an eager-mode or a non-eager mode. When in eager mode, every operation is immediately flushed to the grid. In non-eager mode, the flush of the operation can be deferred which can provide certain performance advantages since some of the operations may be batched when they are flushed.
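As a sketch only, the contract described in the preceding paragraphs could be summarized as the following illustrative interface; the member names here are assumptions for exposition, not the product's actual connection API (the document's own snippets show getCache, setAutoCommit and commit in use).

import java.util.Map;

interface TransactionalConnection {
    <K, V> Map<K, V> getCache(String name); // factory for transactional caches
    void setAutoCommit(boolean autoCommit); // per-operation commit when true
    void setIsolationLevel(IsolationLevel level);
    void setEager(boolean eager);           // eager mode flushes every operation immediately
    void commit();                          // flush deferred operations and commit atomically
    void rollback();
}

enum IsolationLevel {
    READ_COMMITTED,                         // default; weakest, fastest
    STATEMENT_CONSISTENT_READ,
    STATEMENT_MONOTONIC_CONSISTENT_READ,
    TRANSACTION_CONSISTENT_READ,
    TRANSACTION_MONOTONIC_CONSISTENT_READ
}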
Throughout the various contexts described in this disclosure, the embodiments of the invention further encompass computer apparatus, computing systems and machine-readable media configured to carry out the foregoing systems and methods. In addition to an embodiment consisting of specifically designed integrated circuits or other electronics, the present invention may be conveniently implemented using a conventional general purpose or a specialized digital computer or microprocessor programmed according to the teachings of the present disclosure, as will be apparent to those skilled in the computer art.
Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art. The invention may also be implemented by the preparation of application specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as will be readily apparent to those skilled in the art.
The various embodiments include a computer program product which is a storage medium (media) having instructions stored thereon/in which can be used to program a general purpose or specialized computing processor(s)/device(s) to perform any of the features presented herein. The storage medium can include, but is not limited to, one or more of the following: any type of physical media including floppy disks, optical discs, DVDs, CD-ROMs, microdrives, magneto-optical disks, holographic storage, ROMs, RAMs, PRAMS, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs); paper or paper-based media; and any type of media or device suitable for storing instructions and/or information. The computer program product can be transmitted in whole or in parts and over one or more public and/or private networks wherein the transmission includes instructions which can be used by one or more processors to perform any of the features presented herein. The transmission may include a plurality of separate transmissions. In accordance with certain embodiments, however, the computer storage medium containing the instructions is non-transitory (i.e. not in the process of being transmitted) but rather is persisted on a physical device.
The foregoing description of the preferred embodiments of the present invention has been provided for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations can be apparent to the practitioner skilled in the art. Embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the relevant art to understand the invention. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims (18)

What is claimed is:
1. A system for providing a transaction framework in a distributed data grid, said system comprising:
a cluster of server nodes that store and manage data, said data being modified using a set of transactions; and
a client that establishes a connection to the cluster and maintains a commit version having a commit version value for each transaction of said set of transactions;
wherein the cluster of server nodes employs a set of distributed caches as internal system tables for storing natural keys, synthetic keys, and versioned cache values;
wherein said internal system tables include a natural table that maps a natural key associated with a transaction to one or more synthetic keys associated with the transaction, whereby, by using the natural key and one or more synthetic keys associated with the transaction in combination, the set of distributed caches are enabled to store a plurality of versioned cache values simultaneously for each transaction;
wherein each server node in the cluster maintains for each transaction on the server node a current version having a current version value that is updated as messages are received from the client;
wherein when the client sends a message to a receiving server node of the cluster of server nodes, the client includes with the message a commit version value associated with the transaction;
wherein upon receiving the message, the receiving server node sets the current version maintained by the receiving server node to the maximum of the current version value being stored in said receiving server node and the associated transaction's commit version value received in the message from the client, and sends a return message to the client, said return message including the current version value set by the receiving server node; and
wherein upon receiving the return message, the client sets the commit version to a commit version value greater than the maximum of the commit version value stored at the client and the current version value received from the receiving server node in the return message.
2. The system of claim 1, wherein the cluster of server nodes further maintains a consistent read version that provides transaction-scoped read consistency.
3. The system of claim 1, wherein said commit version is updated by the client over the course of the transaction.
4. The system of claim 3, wherein when a mutating operation occurs in the context of the transaction, the commit version is updated based on versions returned from the nodes in the cluster.
5. The system of claim 1, wherein each synthetic key encodes a partition number of an associated natural key.
6. The system of claim 1, wherein said internal system tables include a values table that maps a transaction identifier and synthetic key to a versioned cache value.
7. A method for providing a transaction framework in a distributed data grid, said method comprising:
storing a set of data on a cluster of server nodes, said data being modified using a set of transactions;
maintaining a commit version for each transaction on a client that establishes a connection to the cluster;
maintaining a set of distributed caches on said server nodes as internal system tables for storing natural keys, synthetic keys, and versioned cache values, wherein said internal system tables include a natural table that maps a natural key associated with a transaction to one or more synthetic keys associated with the transaction, whereby, by using the natural key and one or more synthetic keys associated with the transaction in combination, the set of distributed caches are enabled to store a plurality of versioned cache values simultaneously for each transaction;
maintaining on each server node in the cluster of server nodes and for each transaction on said each server node a current version having a current version value, said current version value being updated as messages are received from the client to the node;
sending a message including an associated transaction's current version value from the client to a receiving server node of the cluster of server nodes;
receiving the message at the receiving server node, wherein the receiving server node sets the current version value stored in the receiving server node to be the maximum of the current version value being stored in said receiving node and the current version value received from the client; and
sending a return message from the receiving server node to the client, said return message including the current version value set by the receiving server node; and
wherein upon receiving the return message, the client sets the commit version to a commit version value greater than the maximum of the commit version value stored at the client and the current version value received from the receiving server node in the return message.
8. The method of claim 7, wherein the cluster of server nodes further maintains a consistent read version that provides transaction-scoped read consistency.
9. The method of claim 7, wherein said commit version is updated by the client over the course of the transaction.
10. The method of claim 9, wherein when a mutating operation occurs in the context of the transaction, the commit version is updated based on versions returned from the nodes in the cluster.
11. The method of claim 7, wherein the current version is maintained at each node in the cluster and is updated in the node's local state each time a message is received by the node, such that the current version is the maximum of the node's current version and a value received from the client.
12. The method of claim 7, wherein said internal system tables include a values table that maps a transaction identifier and synthetic key to a value.
13. A non-transitory computer readable medium including instructions stored thereon for providing a transaction framework in a distributed data grid, which instructions, when executed, cause a system to perform steps comprising:
storing a set of data on a cluster of server nodes, said data being modified using a set of transactions;
maintaining a commit version for each transaction on a client that establishes a connection to the cluster;
maintaining a set of distributed caches on said server nodes as internal system tables for storing natural keys, synthetic keys, and versioned cache values, wherein said internal system tables include a natural table that maps a natural key associated with a transaction to one or more synthetic keys associated with the transaction, whereby, by using the natural key and one or more synthetic keys associated with the transaction in combination, the set of distributed caches are enabled to store a plurality of versioned cache values simultaneously for each transaction;
maintaining on each server node in the cluster of server nodes and for each transaction on said each server node a current version having a current version value, said current version value being updated as messages are received from the client to the node;
sending a message including an associated transaction's current version value from the client to a receiving server node of the cluster of server nodes;
receiving the message at the receiving server node, wherein the receiving server node sets the current version value stored in the receiving server node to be the maximum of the current version value being stored in said receiving node and the current version value received from the client; and
sending a return message from the receiving server node to the client, said return message including the current version value set by the receiving server node; and
wherein upon receiving the return message, the client sets the commit version to a commit version value greater than the maximum of the commit version value stored at the client and the current version value received from the receiving server node in the return message.
14. The non-transitory computer readable medium of claim 13, wherein the cluster of server nodes further maintains a consistent read version that provides transaction-scoped read consistency.
15. The non-transitory computer readable medium of claim 13, wherein said commit version is updated by the client over the course of the transaction.
16. The non-transitory computer readable medium of claim 13, wherein when a mutating operation occurs in the context of the transaction, the commit version is updated based on versions returned from the nodes in the cluster.
17. The non-transitory computer readable medium of claim 13, wherein the current version is maintained at each node in the cluster and is updated in the node's local state each time a message is received by the node, such that the current version is the maximum of the node's current version and a value received from the client.
18. The non-transitory computer readable medium of claim 13, wherein said internal system tables include a values table that maps a transaction identifier and synthetic key to a value.
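By way of illustration of the internal system tables recited in claims 1, 5, 6, 12, and 18 above, the following minimal sketch (hypothetical names and simplified types, not the claimed code) stores a natural table mapping each natural key to one or more synthetic keys, a values table mapping a transaction identifier and synthetic key to a versioned cache value, and a synthetic key that encodes the partition number of its natural key.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the natural and values tables.
    public class InternalSystemTables {

        // A synthetic key encoding the partition number of its natural key.
        record SyntheticKey(long id, int partition) {}

        // Composite key for the values table: transaction id plus synthetic key.
        record ValueKey(long txnId, SyntheticKey syntheticKey) {}

        // Natural table: natural key -> one or more synthetic keys.
        private final Map<String, List<SyntheticKey>> naturalTable = new HashMap<>();

        // Values table: (transaction id, synthetic key) -> versioned cache value.
        private final Map<ValueKey, Object> valuesTable = new HashMap<>();

        private long nextSyntheticId;

        // Record a new versioned value for a natural key under a transaction;
        // because each write gets its own synthetic key, several versioned
        // values can be stored simultaneously for the same natural key.
        public void put(String naturalKey, int partition, long txnId, Object value) {
            SyntheticKey sk = new SyntheticKey(nextSyntheticId++, partition);
            naturalTable.computeIfAbsent(naturalKey, k -> new ArrayList<>()).add(sk);
            valuesTable.put(new ValueKey(txnId, sk), value);
        }
    }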
US13/359,375 (priority date 2011-01-28; filing date 2012-01-26): Transactional cache versioning and storage in a distributed data grid. Status: Active; adjusted expiration 2034-02-08. Granted as US9201685B2 (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/359,375 US9201685B2 (en) 2011-01-28 2012-01-26 Transactional cache versioning and storage in a distributed data grid

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US201161437542P 2011-01-28 2011-01-28
US201161437546P 2011-01-28 2011-01-28
US201161437550P 2011-01-28 2011-01-28
US201161437541P 2011-01-28 2011-01-28
US201161437536P 2011-01-28 2011-01-28
US201161437532P 2011-01-28 2011-01-28
US201161437521P 2011-01-28 2011-01-28
US201161437554P 2011-01-28 2011-01-28
US13/359,375 US9201685B2 (en) 2011-01-28 2012-01-26 Transactional cache versioning and storage in a distributed data grid

Publications (2)

Publication Number Publication Date
US20120197994A1 US20120197994A1 (en) 2012-08-02
US9201685B2 true US9201685B2 (en) 2015-12-01

Family

ID=46578287

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/359,375 Active 2034-02-08 US9201685B2 (en) 2011-01-28 2012-01-26 Transactional cache versioning and storage in a distributed data grid

Country Status (1)

Country Link
US (1) US9201685B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11468045B2 (en) * 2020-04-17 2022-10-11 Microsoft Technology Licensing, Llc Transactional support for non-relational database

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9110940B2 (en) 2012-02-29 2015-08-18 Red Hat, Inc. Supporting transactions in distributed environments using a local copy of remote transaction data and optimistic locking
US10019258B2 (en) 2014-04-29 2018-07-10 Hewlett Packard Enterprise Development Lp Hardware assisted software versioning of clustered applications
US10496618B2 (en) 2017-02-07 2019-12-03 Red Hat, Inc. Managing data replication in a data grid
US11892992B2 (en) * 2022-01-31 2024-02-06 Salesforce, Inc. Unique identification management

Patent Citations (68)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182139B1 (en) 1996-08-05 2001-01-30 Resonate Inc. Client-side resource-based load-balancing with delayed-resource-binding using TCP state migration to WWW server farm
US5784569A (en) 1996-09-23 1998-07-21 Silicon Graphics, Inc. Guaranteed bandwidth allocation method in a computer system for input/output data transfers
US5940367A (en) 1996-11-06 1999-08-17 Pluris, Inc. Fault-tolerant butterfly switch
US5991894A (en) 1997-06-06 1999-11-23 The Chinese University Of Hong Kong Progressive redundancy transmission
US7114083B2 (en) 1997-09-26 2006-09-26 Mci, Inc. Secure server architecture for web based data management
US6631402B1 (en) 1997-09-26 2003-10-07 Worldcom, Inc. Integrated proxy interface for web based report requester tool set
US6377993B1 (en) 1997-09-26 2002-04-23 Mci Worldcom, Inc. Integrated proxy interface for web based data management reports
US6490620B1 (en) 1997-09-26 2002-12-03 Worldcom, Inc. Integrated proxy interface for web based broadband telecommunications management
US6714979B1 (en) 1997-09-26 2004-03-30 Worldcom, Inc. Data warehousing infrastructure for web based reporting tool
US6615258B1 (en) 1997-09-26 2003-09-02 Worldcom, Inc. Integrated customer interface for web based data management
US6968571B2 (en) 1997-09-26 2005-11-22 Mci, Inc. Secure customer interface for web based data management
US5999712A (en) 1997-10-21 1999-12-07 Sun Microsystems, Inc. Determining cluster membership in a distributed computer system
US20020073223A1 (en) 1998-09-28 2002-06-13 Raytheon Company, A Delaware Corporation Method and system for scheduling network communication
US6693874B1 (en) 1999-05-26 2004-02-17 Siemens Information & Communication Networks, Inc. System and method for enabling fault tolerant H.323 systems
US6487622B1 (en) 1999-10-28 2002-11-26 Ncr Corporation Quorum arbitrator for a high availability system
US20020035559A1 (en) 2000-06-26 2002-03-21 Crowe William L. System and method for a decision engine and architecture for providing high-performance data querying operations
US20020078312A1 (en) 2000-12-15 2002-06-20 International Business Machines Corporation Support for single-node quorum in a two-node nodeset for a shared disk parallel file system
US7792977B1 (en) 2001-02-28 2010-09-07 Oracle International Corporation Method for fencing shared resources from cluster nodes
US20040179471A1 (en) 2001-03-07 2004-09-16 Adisak Mekkittikul Bi-directional flow-switched ring
US20030023898A1 (en) * 2001-07-16 2003-01-30 Jacobs Dean Bernard Layered architecture for data replication
US20030046286A1 (en) * 2001-08-30 2003-03-06 Jacobs Dean Bernard Cluster caching with concurrency checking
US7376953B2 (en) 2001-10-29 2008-05-20 Hewlett-Packard Development Company, L.P. Apparatus and method for routing a transaction to a server
US20030120715A1 (en) 2001-12-20 2003-06-26 International Business Machines Corporation Dynamic quorum adjustment
US20030191795A1 (en) 2002-02-04 2003-10-09 James Bernardin Adaptive scheduling
US20030187927A1 (en) 2002-02-22 2003-10-02 Winchell David F. Clustering infrastructure system and method
US7139925B2 (en) 2002-04-29 2006-11-21 Sun Microsystems, Inc. System and method for dynamic cluster adjustment to node failures in a distributed data system
US7328237B1 (en) 2002-07-25 2008-02-05 Cisco Technology, Inc. Technique for improving load balancing of traffic in a data network using source-side related information
US7266822B1 (en) 2002-08-14 2007-09-04 Sun Microsystems, Inc. System and method for controlling and managing computer farms
US20050193056A1 (en) 2002-12-26 2005-09-01 Schaefer Diane E. Message transfer using multiplexed connections in an open system interconnection transaction processing environment
US20040205148A1 (en) 2003-02-13 2004-10-14 International Business Machines Corporation Method for operating a computer cluster
US20070174160A1 (en) 2003-04-29 2007-07-26 Solberg Eric L Hierarchical transaction filtering
US20050021737A1 (en) 2003-05-01 2005-01-27 Ellison Carl M. Liveness protocol
US20040267897A1 (en) 2003-06-24 2004-12-30 Sychron Inc. Distributed System Providing Scalable Methodology for Real-Time Control of Server Pools and Data Centers
US20050097185A1 (en) 2003-10-07 2005-05-05 Simon Gibson Localization link system
US20050083834A1 (en) 2003-10-17 2005-04-21 Microsoft Corporation Method for providing guaranteed distributed failure notification
US20050138460A1 (en) 2003-11-19 2005-06-23 International Business Machines Corporation Error recovery in a client/server application using two independent sockets for communication
US7464378B1 (en) 2003-12-04 2008-12-09 Symantec Operating Corporation System and method for allowing multiple sub-clusters to survive a cluster partition
US20060095573A1 (en) 2004-11-01 2006-05-04 Microsoft Corporation Delayed HTTP response
US8312439B2 (en) 2005-02-18 2012-11-13 International Business Machines Corporation Inlining native functions into compiled java code
US7698390B1 (en) 2005-03-29 2010-04-13 Oracle America, Inc. Pluggable device specific components and interfaces supported by cluster devices and systems and methods for implementing the same
US7739677B1 (en) 2005-05-27 2010-06-15 Symantec Operating Corporation System and method to prevent data corruption due to split brain in shared data clusters
US20070016822A1 (en) 2005-07-15 2007-01-18 Rao Sudhir G Policy-based, cluster-application-defined quorum with generic support interface for cluster managers in a shared storage environment
US7720971B2 (en) 2005-09-12 2010-05-18 Microsoft Corporation Arbitrating an appropriate back-end server to receive channels of a client session
US20070118693A1 (en) 2005-11-19 2007-05-24 International Business Machines Corporation Method, apparatus and computer program product for cache restoration in a storage system
US20070140110A1 (en) 2005-12-21 2007-06-21 Microsoft Corporation Peer communities
US20070260714A1 (en) 2006-03-30 2007-11-08 International Business Machines Asynchronous interconnect protocol for a clustered dbms
US20070237072A1 (en) 2006-04-07 2007-10-11 Sbc Knowledge Ventures, L.P. Resilient ip ring protocol and architecture
US20070271584A1 (en) 2006-05-16 2007-11-22 Microsoft Corporation System for submitting and processing content including content for on-line media console
US7953861B2 (en) 2006-08-10 2011-05-31 International Business Machines Corporation Managing session state for web applications
US7814248B2 (en) 2006-12-07 2010-10-12 Integrated Device Technology, Inc. Common access ring/sub-ring system
US20080183876A1 (en) 2007-01-31 2008-07-31 Sun Microsystems, Inc. Method and system for load balancing
US20100128732A1 (en) 2007-04-25 2010-05-27 Yamatake Corporation Rstp processing system
US20080276231A1 (en) 2007-05-03 2008-11-06 Yunwu Huang Method and apparatus for dependency injection by static code generation
US20080281959A1 (en) 2007-05-10 2008-11-13 Alan Robertson Managing addition and removal of nodes in a network
US20100312861A1 (en) 2007-11-30 2010-12-09 Johan Kolhi Method, network, and node for distributing electronic content in a content distribution network
US20110249552A1 (en) 2008-04-11 2011-10-13 Stokes Olen L Redundant ethernet automatic protection switching access to virtual private lan services
US20090265449A1 (en) 2008-04-22 2009-10-22 Hewlett-Packard Development Company, L.P. Method of Computer Clustering
US7543046B1 (en) 2008-05-30 2009-06-02 International Business Machines Corporation Method for managing cluster node-specific quorum roles
US20090320005A1 (en) 2008-06-04 2009-12-24 Microsoft Corporation Controlling parallelization of recursion using pluggable policies
US20100211931A1 (en) * 2009-02-13 2010-08-19 Microsoft Corporation Stm with global version overflow handling
US8209307B2 (en) 2009-03-31 2012-06-26 Commvault Systems, Inc. Systems and methods for data migration in a clustered file system
US20110041006A1 (en) 2009-08-12 2011-02-17 New Technology/Enterprise Limited Distributed transaction processing
US20110107135A1 (en) 2009-11-02 2011-05-05 International Business Machines Corporation Intelligent rolling upgrade for data storage systems
US20110161289A1 (en) 2009-12-30 2011-06-30 Verisign, Inc. Data Replication Across Enterprise Boundaries
US20110179231A1 (en) 2010-01-21 2011-07-21 Sun Microsystems, Inc. System and method for controlling access to shared storage device
US20120117157A1 (en) 2010-11-09 2012-05-10 Ristock Herbert W A System for Determining Presence of and Authorizing a Quorum to Transact Business over a Network
US20120215740A1 (en) 2010-11-16 2012-08-23 Jean-Luc Vaillant Middleware data log system
US20120158650A1 (en) 2010-12-16 2012-06-21 Sybase, Inc. Distributed data cache database architecture

Also Published As

Publication number Publication date
US20120197994A1 (en) 2012-08-02

Similar Documents

Publication Publication Date Title
Taft et al. Cockroachdb: The resilient geo-distributed sql database
CN109739935B (en) Data reading method and device, electronic equipment and storage medium
US9740582B2 (en) System and method of failover recovery
US10503699B2 (en) Metadata synchronization in a distrubuted database
Corbett et al. Spanner: Google’s globally distributed database
CN104793988B (en) The implementation method and device of integration across database distributed transaction
CN107070919B (en) Idempotency for database transactions
US10430298B2 (en) Versatile in-memory database recovery using logical log records
US6873995B2 (en) Method, system, and program product for transaction management in a distributed content management application
US8954391B2 (en) System and method for supporting transient partition consistency in a distributed data grid
CN105814544B (en) System and method for supporting persistent partition recovery in a distributed data grid
JP7389793B2 (en) Methods, devices, and systems for real-time checking of data consistency in distributed heterogeneous storage systems
US9201685B2 (en) Transactional cache versioning and storage in a distributed data grid
CN110402429B (en) Copying storage tables for managing cloud-based resources to withstand storage account interruptions
Dey et al. Scalable distributed transactions across heterogeneous stores
EP3377970B1 (en) Multi-version removal manager
EP3794458B1 (en) System and method for a distributed database
Pankowski Consistency and availability of Data in replicated NoSQL databases
WO2021022396A1 (en) Transaction processing for database distributed across regions
Padhye Transaction and data consistency models for cloud applications
Lev-Ari et al. Quick: a queuing system in cloudkit
Olmsted et al. Buddy System: Available, Consistent, Durable Web Service Transactions
Chou et al. Oracle timesten scaleout: a new scale-out in-memory database architecture for extreme oltp
Grov et al. Scalable and fully consistent transactions in the cloud through hierarchical validation
Pandey et al. Persisting the AntidoteDB Cache: Design and Implementation of a Cache for a CRDT Datastore

Legal Events

Date Code Title Description
AS Assignment

Owner name: ORACLE INTERNATIONAL CORPORATION, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BEERBOWER, TOM;SPEIDEL, JOHN P.;PURDY, JONATHAN;REEL/FRAME:027690/0076

Effective date: 20120120

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8