WO2004111849A2 - Method, system and article of manufacture for remote copying of data - Google Patents



Publication number
WO2004111849A2
Authority
WO
WIPO (PCT)
Prior art keywords
storage
storage unit
control unit
data
storage control
Prior art date
Application number
PCT/EP2004/051115
Other languages
French (fr)
Other versions
WO2004111849A3 (en)
Inventor
Warren Stanley
William Frank Micka
Gail Andrea Spear
Sam Clark Warner
Olympia Gluck
Michael Factor
Robert Francis Bartfai
Original Assignee
International Business Machines Corporation
IBM United Kingdom Limited
Application filed by International Business Machines Corporation, IBM United Kingdom Limited
Publication of WO2004111849A2
Publication of WO2004111849A3

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10Digital recording or reproducing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2058Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using more than 2 mirrored copies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1095Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/40Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass for recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols, protocol state redundancy or protocol service redirection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2064Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring while ensuring consistency
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/2053Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F11/2056Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring
    • G06F11/2071Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant by mirroring using a plurality of controllers

Definitions

  • the present invention relates to a method, system, and an article of manufacture for remote copying of data.
  • Information technology systems may need protection from site disasters or outages, where outages may be planned or unplanned. Furthermore, information technology systems may require features for data migration, data backup, or data duplication. Implementations for disaster or outage recovery, data migration, data backup, and data duplication may include mirroring or copying of data in storage systems. Such mirroring or copying of data may involve interactions among hosts, storage systems and connecting networking components of the information technology system.
  • An enterprise storage server, such as the IBM* TotalStorage Enterprise Storage Server* (ESS), may be a disk storage server that includes one or more processors coupled to storage devices, including high capacity scalable storage devices, Redundant Array of Independent Disks (RAID), etc.
  • the enterprise storage servers are connected to a network and include features for copying data in storage systems.
  • Peer-to-Peer Remote Copy (PPRC) is an ESS function that allows the shadowing of application system data from a first site to a second site.
  • the first site may be referred to as an application site, a local site, or a primary site.
  • the second site may be referred to as a recovery site, a remote site, or a secondary site.
  • the logical volumes that hold the data in the ESS at the local site are called local volumes, and the corresponding logical volumes that hold the mirrored data at the remote site are called remote volumes.
  • High speed links, such as ESCON links, may connect the local and remote ESS systems.
  • In the synchronous type of operation for PPRC, i.e., synchronous PPRC, the updates done by a host application to the local volumes at the local site are synchronously shadowed onto the remote volumes at the remote site.
  • As synchronous PPRC is a synchronous copying solution, write updates are ensured on both copies (local and remote) before the write is considered to be completed for the host application.
  • the host application does not get the "write complete" condition until the update is synchronously done in both the local and the remote volumes. Therefore, from the perspective of the host application the data at the remote volumes at the remote site is equivalent to the data at the local volumes at the local site.
  • Synchronous PPRC increases the response time as compared to an asynchronous copy operation, and this is inherent to the synchronous operation.
  • the overhead comes from the additional steps that are executed before the write operation is signaled as completed to the host application.
  • the PPRC activity between the local site and the remote site may comprise signals that travel through the links that connect the sites, and the overhead on the response time of the host application write operations will increase proportionally with the distance between the sites. Therefore, the distance affects a host application's write response time.
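  • As a rough illustration of the proportionality described above, the following sketch models the overhead added to one synchronous write as pure signal propagation delay. The propagation speed (about 200,000 km/s, a commonly quoted approximation for optical fibre), the single round trip per write, and the function name are all assumptions made for illustration; none of them comes from this disclosure, and real links add protocol and processing overhead on top.

```python
# Hypothetical model: latency added to a synchronous remote write as a
# function of the distance between sites. Constants are illustrative only.

FIBRE_SPEED_KM_PER_S = 200_000.0  # assumed approximate signal speed in fibre


def added_write_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Return the propagation delay added to one host write, in milliseconds."""
    if distance_km < 0 or round_trips < 1:
        raise ValueError("distance must be >= 0 and round_trips must be >= 1")
    one_way_seconds = distance_km / FIBRE_SPEED_KM_PER_S
    # Each round trip covers the distance twice (data out, acknowledgment back).
    return 2 * one_way_seconds * round_trips * 1000.0


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: {added_write_latency_ms(km):.2f} ms per write")
```

  • Under these assumptions the added delay grows linearly with distance, which is why the synchronous leg is kept short and the long leg is made asynchronous.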
  • In the Extended Distance PPRC (also referred to as PPRC Extended Distance) method of operation, PPRC mirrors the updates of the local volume onto the remote volumes in an asynchronous manner, while the host application is running.
  • the first storage unit is a first storage volume in a first storage control unit, wherein the second storage unit is a second storage volume in a second storage control unit, wherein the third storage unit is a third storage volume in a third storage control unit, and wherein the third storage control unit is beyond a synchronous communication distance to the first storage control unit.
  • the first storage unit is a first storage control unit, wherein the second storage unit is a second storage control unit, wherein the third storage unit is a third storage control unit, wherein the second storage unit is within a synchronous communication distance to the first storage unit, and wherein the third storage unit is beyond the synchronous communication distance to the second storage unit.
  • the copied data is received at the third storage unit.
  • the data received from the second storage unit is asynchronously copied at the third storage unit.
  • a write request is received at the first storage unit from a host application coupled to a host system.
  • Data corresponding to the write request is written at a cache and a non-volatile storage coupled to the first storage unit.
  • the data corresponding to the write request is sent synchronously to the second storage unit.
  • a response, indicative of the write request being completed, from the second storage unit is received at the first storage unit. An indication is sent to the host application that the write request is completed.
  • the data is copied to a cache and a non-volatile storage at the second storage unit.
  • the data is marked as modified at the second storage unit.
  • a response is sent from the second storage unit to the first storage unit, wherein the response indicates that the received data has been copied at the second storage unit.
  • sending the copied data further comprises sending the modified data asynchronously from the second storage unit to the third storage unit.
  • the first storage unit is coupled to a host that sends Input/Output requests to the first storage unit, wherein an update from the host at the first storage unit is synchronously reflected at the second storage unit and asynchronously reflected at the third storage unit.
  • a host application that performs I/O with the first storage unit completes a write operation faster in comparison to the time taken if an update corresponding to the write operation were copied synchronously to the third storage unit.
  • Optionally recovering from a disaster at the first storage unit is performed by substituting replicated data from the second or third storage units.
  • additional storage units are cascaded to the first, second, and third storage units, wherein the additional storage units may communicate synchronously or asynchronously.
  • functions of the first and second storage units are integrated into a single storage control unit.
  • an embodiment of the invention may be used to create a long distance disaster recovery solution by copying synchronously from a local storage control unit to an intermediate storage control unit, and in parallel copying asynchronously from the intermediate storage control unit to a remote storage control unit.
  • FIG. 1 illustrates a block diagram of a computing environment, in accordance with certain described aspects of the invention.
  • FIG. 2 illustrates a block diagram of a cascading copy application, in accordance with certain described implementations of the invention.
  • FIG. 3 illustrates logic implemented in a local storage control unit, in accordance with certain described implementations of the invention.
  • FIG. 4 illustrates logic for receiving data synchronously as implemented in an intermediate storage control unit, in accordance with certain described implementations of the invention.
  • FIG. 5 illustrates logic for copying data asynchronously as implemented in the intermediate storage control unit, in accordance with certain described implementations of the invention.
  • FIG. 6 illustrates a block diagram of a computer architecture in which certain described aspects of the invention are implemented.
  • FIG. 1 illustrates a computing environment utilizing three storage control units, such as a local storage control unit 100, an intermediate storage control unit 102, and a remote storage control unit 104 connected by data interface channels 106, 108, such as, the Enterprise System Connection (ESCON)* channel or any other data interface mechanism known in the art (e.g., fibre channel, Storage Area Network (SAN) interconnections, etc.).
  • the three storage control units 100, 102, 104 may be at three different sites with the local storage control unit 100 and the intermediate storage control unit 102 being within a synchronous communication distance of each other.
  • the synchronous communication distance between two storage control units is the distance up to which synchronous communication is feasible between the two storage control units.
  • the remote storage control unit 104 may be a long distance away from the intermediate storage control unit 102 and the local storage control unit 100, such that, synchronous copying of data from the intermediate storage control unit 102 to the remote storage control unit 104 may be time consuming or impractical.
  • the intermediate storage control unit 102 may be in a secure environment separated from the local storage control unit 100 and with separate power to reduce the possibility of an outage affecting both the local storage control unit 100 and the intermediate storage control unit 102.
  • Certain implementations of the invention create a three site (local, intermediate, remote) disaster recovery solution where there may be no data loss if the local storage control unit 100 is lost. In the three site disaster recovery solution, the local storage control unit 100 is kept at the local site, the intermediate storage control unit 102 is kept at the intermediate site, and the remote storage control unit 104 is kept at the remote site. Data copied on the intermediate storage control unit 102 or the remote storage control unit 104 may be used to recover from the loss of the local storage control unit 100. In certain alternative implementations, there may be fewer than three sites.
  • the local storage control unit 100 and the intermediate storage control unit 102 may be at the same site. In additional alternative implementations of the invention, there may be more than three storage control units distributed among three or more sites. Furthermore, functions of a plurality of storage control units may be integrated into a single storage control unit, e.g., functions of the local storage control unit 100 and the intermediate storage control unit 102 may be integrated into a single storage control unit.
  • the local storage control unit 100 is coupled to a host 110 via data interface channel 112. While only a single host 110 is shown coupled to the local storage control unit 100, in certain implementations of the invention, a plurality of hosts may be coupled to the local storage control unit 100.
  • the host 110 may be any computational device known in the art, such as a personal computer, a workstation, a server, a mainframe, a hand held computer, a palm top computer, a telephony device, network appliance, etc.
  • the host 110 may include any operating system (not shown) known in the art, such as the IBM OS/390* operating system.
  • the host 110 may include at least one host application 114 that sends Input/Output (I/O) requests to the local storage control unit 100.
  • the storage control units 100, 102, and 104 are coupled to storage volumes, such as, local site storage volumes 116, intermediate site storage volumes 118, and remote site storage volumes 120, respectively.
  • the storage volumes 116, 118, 120 may be configured as a Direct Access Storage Device (DASD), one or more RAID ranks, Just a bunch of disks (JBOD), or any other data repository system known in the art.
  • the storage control units 100, 102, and 104 may each include a cache, such as, cache 122, 124, 126 respectively.
  • the caches 122, 124, 126 comprise volatile memory to store tracks.
  • the storage control units 100, 102, and 104 may each include a nonvolatile storage (NVS), such as non-volatile storage 128, 130, 132 respectively.
  • the non-volatile storage 128, 130, 132 elements may buffer certain modified tracks in the caches 122, 124, 126 respectively.
  • the local storage control unit 100 additionally includes an application, such as, a local application 134, for synchronous copying of data stored in the cache 122, nonvolatile storage 128, and local site storage volumes 116 to another storage control unit, such as, the intermediate storage control unit 102.
  • the local application 134 includes copy services functions that execute in the local storage control unit 100.
  • the local storage control unit 100 receives I/O requests from the host application 114 to read and write to the local site storage volumes 116.
  • the intermediate storage control unit 102 additionally includes an application, such as a cascading PPRC application 136.
  • the cascading PPRC application 136 includes copy services functions that execute in the intermediate storage control unit 102.
  • the cascading PPRC application 136 can interact with the local storage control unit 100 to receive data synchronously.
  • the cascading PPRC application 136 can also send data asynchronously to the remote storage control unit 104. Therefore, the cascading PPRC application 136 cascades a first pair of storage control units formed by the local storage control unit 100 and the intermediate storage control unit 102, and a second pair of storage control units formed by the intermediate storage control unit 102 and the remote storage control unit 104.
  • additional storage control units may be cascaded.
  • the remote storage control unit 104 additionally includes an application, such as a remote application 138, that can receive data asynchronously from another storage control unit, such as, the intermediate storage control unit 102.
  • the remote application 138 includes copy services functions that execute in the remote storage control unit 104.
  • FIG. 1 illustrates a computing environment where a host application 114 sends I/O requests to the local storage control unit 100. The local storage control unit 100 synchronously copies data to the intermediate storage control unit 102, and the intermediate storage control unit 102 asynchronously copies data to the remote storage control unit 104.
  • FIG. 2 illustrates a block diagram that illustrates communications between the local application 134, the cascading PPRC application 136 and the remote application 138, in accordance with certain implementations of the invention.
  • the local application 134 performs a synchronous data transfer, such as, via synchronous PPRC 200, to a synchronous copy process 202 that may be generated by the cascading PPRC application 136.
  • the synchronous data transfer 200 takes place over the data interface channel 106.
  • a background asynchronous copy process 204 that may be generated by the cascading PPRC application 136 performs an asynchronous data transfer, such as, via Extended Distance PPRC 206, to the remote application 138.
  • the asynchronous data transfer takes place over the data interface channel 108.
  • the intermediate site storage volumes 118 may include a copy of the local site storage volumes 116.
  • the distance between the local storage control unit 100 and the intermediate storage control unit 102 is kept as small as possible to minimize the performance impact of synchronous PPRC.
  • Data is copied asynchronously from the intermediate storage control unit 102 to the remote storage control unit 104. As a result, the effect of long distance on the host response time is eliminated.
  • FIG. 2 illustrates how the cascading PPRC application 136 on the intermediate storage control unit 102 receives data synchronously from the local storage control unit 100, and transmits data asynchronously to the remote storage control unit 104.
  • FIG. 3 illustrates logic implemented in the local storage control unit 100, in accordance with certain implementations of the invention.
  • the logic of FIG. 3 may be implemented in the local application 134 resident in the local storage control unit 100.
  • Control starts at block 300, where the local application 134 receives a write request from the host application 114.
  • the local application 134 writes (at block 302) data corresponding to the write request on the cache 122 and the non-volatile storage 128 on the local storage control unit 100.
  • Additional applications (not shown), such as caching applications and non-volatile storage applications, in the local storage control unit 100 may manage the data in the cache 122 and the data in the non-volatile storage 128 and keep the data in the cache 122 and the non-volatile storage 128 consistent with the data in the local site storage volumes 116.
  • the local application 134 determines (at block 304) if the local storage control unit 100 is a primary PPRC device, i.e., the local storage control unit includes source data for a PPRC transaction. If so, the local application 134 sends (at block 306) the written data to the intermediate storage control unit 102 via a new write request. The local application 134 waits (at block 308) for a write complete acknowledgment from the intermediate storage control unit 102. The local application 134 receives (at block 310) a write complete acknowledgment from the intermediate storage control unit 102. Therefore, the local application 134 has transferred the data written by the host application 114 on the local storage control unit 100 to the intermediate storage control unit 102 via a synchronous copy.
  • the local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the local storage control unit 100.
  • the local application 134 receives (at block 300) a next write request from the host application 114.
  • If the local application 134 determines (at block 304) that the local storage control unit 100 is not a primary PPRC device, i.e., the local storage control unit is not a source device for a PPRC transaction, then the local application 134 does not have to send any data to the intermediate storage control unit 102, and the local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the local storage control unit 100.
  • FIG. 3 illustrates a logic for receiving a write request from the host application 114 to the local storage control unit 100 and synchronously copying the data corresponding to the write request from the local storage control unit 100 to the intermediate storage control unit 102.
  • the host application 114 waits for the write request to be completed while the synchronous copying of data takes place. Since the local storage control unit 100 and the intermediate storage control unit 102 are within a synchronous communication distance of each other, the synchronous copying of data from the local storage control unit 100 to the intermediate storage control unit 102 takes a smaller amount of time when compared to the situation where the local storage control unit 100 is beyond a synchronous communication distance to the intermediate storage control unit 102. Since the copy of the data on the intermediate storage control unit 102 is written synchronously, the intermediate storage control unit 102 includes an equivalent copy of the data on the local storage control unit 100.
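  • The FIG. 3 flow described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class names, method names, and the use of in-memory dictionaries for the cache and non-volatile storage are hypothetical, and only the ordering of blocks 300 to 312 follows the description.

```python
# Sketch of the FIG. 3 flow (blocks 300-312). All names are hypothetical;
# dictionaries stand in for cache 122 and non-volatile storage 128.

class IntermediateStub:
    """Stand-in for the intermediate storage control unit 102."""

    def __init__(self):
        self.received = {}

    def write(self, track, data):
        self.received[track] = data
        return True  # models the write complete acknowledgment


class LocalControlUnit:
    """Stand-in for the local storage control unit 100."""

    def __init__(self, intermediate=None):
        self.cache = {}
        self.nvs = {}
        # No peer configured means this unit is not a primary PPRC device.
        self.intermediate = intermediate

    def is_primary_pprc(self):
        return self.intermediate is not None

    def handle_host_write(self, track, data):
        # Block 302: write to cache and non-volatile storage.
        self.cache[track] = data
        self.nvs[track] = data
        # Block 304: is this unit the source of a PPRC relationship?
        if self.is_primary_pprc():
            # Blocks 306-310: send the data and wait for the acknowledgment.
            acknowledged = self.intermediate.write(track, data)
            if not acknowledged:
                raise RuntimeError("no write complete acknowledgment received")
        # Block 312: only now is completion signalled to the host.
        return "write complete"
```

  • The call to `intermediate.write` blocks until it returns, which is what makes the copy synchronous from the host's point of view in this sketch.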
  • FIG. 4 illustrates logic for receiving data synchronously as implemented in the intermediate storage control unit 102, in accordance with certain implementations of the invention.
  • the cascading PPRC application 136 may perform the logic illustrated in FIG. 4.
  • Control starts at block 400 where the cascading PPRC application 136 receives a write request from the local application 134.
  • the write request sent at block 306 of FIG. 3 to the intermediate storage control unit 102 may be received by the cascading PPRC application 136.
  • the cascading PPRC application 136 writes (at block 402) data corresponding to the write request to the cache 124 and the non-volatile storage 130.
  • the intermediate storage control unit 102 may keep the cache 124 and the non-volatile storage 130 consistent with the intermediate site storage volumes 118.
  • the cascading PPRC application 136 determines (at block 404) if data on the intermediate storage control unit 102 is to be cascaded, i.e., the data is to be sent to the remote storage control unit 104. If so, the synchronous copy process 202 of the cascading PPRC application 136 marks (at block 406) data as PPRC modified. The synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgment to the local application 134. The cascading PPRC application 136 receives (at block 400) the next write request from the local application 134.
  • If the cascading PPRC application 136 determines (at block 404) that data on the intermediate storage control unit 102 does not have to be cascaded, then the synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgment to the local application 134 and the cascading PPRC application 136 receives (at block 400) the next request from the local application 134.
  • FIG. 4 illustrates how the intermediate storage control unit 102 receives a write request from the local storage control unit 100, where the write request corresponds to a host write request.
  • the intermediate storage control unit 102 marks data corresponding to the host write request as PPRC modified.
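  • The FIG. 4 flow can be sketched in the same style. The class name, the `cascade` flag, and the use of a set to record tracks marked as PPRC modified are assumptions for illustration; the description does not say how the marking is stored.

```python
# Sketch of the FIG. 4 flow (blocks 400-408). Names and data structures are
# hypothetical; a set records which tracks are marked "PPRC modified".

class IntermediateControlUnit:
    """Stand-in for the intermediate storage control unit 102."""

    def __init__(self, cascade=True):
        self.cache = {}
        self.nvs = {}
        self.cascade = cascade       # whether data is forwarded to a remote unit
        self.pprc_modified = set()   # tracks awaiting asynchronous copy

    def handle_sync_write(self, track, data):
        # Block 402: write to cache and non-volatile storage.
        self.cache[track] = data
        self.nvs[track] = data
        # Block 404: is the data to be cascaded to the remote unit?
        if self.cascade:
            # Block 406: mark the data as PPRC modified.
            self.pprc_modified.add(track)
        # Block 408: acknowledge the write to the local application.
        return True
```

  • Note that the acknowledgment is returned without waiting for the remote copy; the marking only records work for the background process.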
  • FIG. 5 illustrates logic for copying data asynchronously as implemented in the intermediate storage control unit 102, in accordance with certain implementations of the invention.
  • the logic illustrated in FIG. 5 may be performed by the background asynchronous copy process 204 of the cascading PPRC application 136.
  • Control starts at block 500 where the background asynchronous copy process 204 of the cascading PPRC application 136 determines the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the intermediate storage control unit 102.
  • the background asynchronous copy process 204 of the cascading PPRC application 136 sends (at block 502) the PPRC modified data to the remote storage control unit 104 asynchronously, i.e., the background asynchronous copy process 204 keeps sending the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the intermediate storage control unit 102.
  • the background asynchronous copy process 204 determines (at block 504) if write complete acknowledgment has been received from the remote storage control unit 104. If not, the background asynchronous copy process 204 again determines (at block 504) if the write complete acknowledgment has been received.
  • If the background asynchronous copy process 204 determines (at block 504) that a write complete acknowledgment has been received from the remote storage control unit 104, then the background asynchronous copy process 204 determines (at block 500) the PPRC modified data once again.
  • the logic of FIG. 5 illustrates how the background asynchronous copy process 204, while executing in the background, copies data asynchronously from the intermediate storage control unit 102 to the remote storage control unit 104. Since the copying is asynchronous, the intermediate storage control unit 102 and the remote storage control unit 104 may be separated by long distances, such as the extended distances allowed by Extended Distance PPRC.
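  • One pass of the FIG. 5 loop can be sketched as below. The function and class names are hypothetical, and clearing the modified marker once the remote unit acknowledges a track is an assumption; the description only states that the process returns to block 500 after the acknowledgment.

```python
# Sketch of one pass of the FIG. 5 loop (blocks 500-504). Names are
# hypothetical; clearing the marker on acknowledgment is an assumption.

class RemoteStub:
    """Stand-in for the remote storage control unit 104."""

    def __init__(self):
        self.volumes = {}

    def write(self, track, data):
        self.volumes[track] = data
        return True  # models the write complete acknowledgment


def background_copy_pass(store, pprc_modified, remote):
    """Copy every currently marked track; return how many were acknowledged."""
    # Block 500: determine the PPRC modified data (snapshot of the markers).
    pending = sorted(pprc_modified)
    acknowledged = 0
    for track in pending:
        # Block 502: send the modified data to the remote unit.
        ok = remote.write(track, store[track])
        # Block 504: proceed only once the acknowledgment has arrived.
        if ok:
            pprc_modified.discard(track)
            acknowledged += 1
    return acknowledged
```

  • A background process would call `background_copy_pass` repeatedly; host writes are never blocked on it, so the distance to the remote unit does not add to host response time in this sketch.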
  • the background asynchronous copy process 204 may quickly complete the copy of all remaining modified data to the remote storage control unit 104.
  • the remote site storage volumes 120 will include an equivalent copy of all updates up to the time of the outage. If there are multiple failures, such that both the local storage control unit 100 and the intermediate storage control unit 102 are lost then there may be data loss at the remote site.
  • Since the remote storage control unit 104 is updated asynchronously, the data on the remote storage control unit 104 may not be equivalent to the data on the local storage control unit 100, unless all of the data from the intermediate storage control unit 102 has been copied up to some point in time.
  • certain implementations of the invention may force the data at the remote storage control unit to contain all dependent updates up to some specified time.
  • the consistent copy at the remote storage control unit may be preserved via a point-in-time copy, such as FlashCopy*.
  • One method may include quiescing the host I/O temporarily at the local site while the remote storage control unit 104 catches up with the updates. Another method may prevent writes to the intermediate storage control unit 102 while the remote storage control unit 104 catches up with the updates.
  • the implementations create a long distance disaster recovery solution by first copying synchronously from a local storage control unit to an intermediate storage control unit, and subsequently copying asynchronously from the intermediate storage control unit to a remote storage control unit.
  • the distance between the local storage control unit and the intermediate storage control unit may be small enough such that copying data synchronously does not cause a significant performance impact on applications that perform I/O operations on the local storage control unit.
  • the data can be recovered from replicated copies of the data on either the intermediate storage control unit 102 or the remote storage control unit 104.
  • "article of manufacture" refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium (e.g., magnetic storage medium, such as hard disk drives, floppy disks, tape), optical storage (e.g., CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.).
  • Code in the computer readable medium is accessed and executed by a processor.
  • the code in which implementations are made may further be accessible through a transmission media or from a file server over a network.
  • the article of manufacture in which the code is implemented may comprise a transmission media, such as a network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc.
  • the article of manufacture may comprise any information bearing medium known in the art.
  • FIG. 6 illustrates a block diagram of a computer architecture in which certain aspects of the invention are implemented.
  • FIG. 6 illustrates one implementation of the host 110, and the storage control units 100, 102, 104.
  • the host 110, and the storage control units 100, 102, 104 may implement a computer architecture 600 having a processor 602, a memory 604 (e.g., a volatile memory device), and storage 606 (e.g., a non-volatile storage, magnetic disk drives, optical disk drives, tape drives, etc.).
  • the storage 606 may comprise an internal storage device, an attached storage device or a network accessible storage device. Programs in the storage 606 may be loaded into the memory 604 and executed by the processor 602 in a manner known in the art.
  • the architecture may further include a network card 608 to enable communication with a network.
  • the architecture may also include at least one input 610, such as a keyboard, a touchscreen, a pen, voice-activated input, etc., and at least one output 612, such as a display device, a speaker, a printer, etc.
  • the data transfer between the local storage control unit 100 and the intermediate storage control unit 102 may be via Extended Distance PPRC. However, there may be data loss if there is an outage at the local storage control unit 100. Additionally, in alternative implementations of the invention the data transfer between the intermediate storage control unit 102 and the remote storage control unit 104 may be via synchronous PPRC. However, there may be performance impacts on the I/O from the host 110 to the local storage control unit 100.
  • the functions of the local storage control unit 100 and the intermediate storage control unit 102 may be implemented in a single storage control unit.
  • a fourth storage control unit may be coupled to the remote storage control unit 104 and data may be transferred from the remote storage control unit 104 to the fourth storage control unit.
  • a chain of synchronous data transfers and a chain of asynchronous data transfers may take place among a plurality of cascaded storage control units.
  • the storage control units may be any storage unit known in the art.
  • FIGs. 3, 4, and 5 describe specific operations occurring in a particular order. Further, the operations may be performed in parallel as well as sequentially. In alternative implementations, certain of the logic operations may be performed in a different order, modified or removed and still implement implementations of the present invention. Moreover, steps may be added to the above described logic and still conform to the implementations. Yet further steps may be performed by a single process or distributed processes.

Abstract

Provided are a method, system, and article of manufacture for copying storage. Data sent from a first storage unit is synchronously copied at a second storage unit. The copied data is sent asynchronously from the second storage unit to a third storage unit.

Description

METHOD, SYSTEM, AND ARTICLE OF MANUFACTURE FOR REMOTE COPYING OF DATA
Technical Field
[001] The present invention relates to a method, system, and an article of manufacture for remote copying of data.
Background Art
[002] Information technology systems, including storage systems, may need protection from site disasters or outages, where outages may be planned or unplanned. Furthermore, information technology systems may require features for data migration, data backup, or data duplication. Implementations for disaster or outage recovery, data migration, data backup, and data duplication may include mirroring or copying of data in storage systems. Such mirroring or copying of data may involve interactions among hosts, storage systems and connecting networking components of the information technology system.
[003] An enterprise storage server (ESS), such as the IBM* TotalStorage Enterprise Storage Server*, may be a disk storage server that includes one or more processors coupled to storage devices, including high capacity scalable storage devices, Redundant Array of Independent Disks (RAID), etc. The enterprise storage servers are connected to a network and include features for copying data in storage systems.
[004] Peer-to-Peer Remote Copy (PPRC) is an ESS function that allows the shadowing of application system data from a first site to a second site. The first site may be referred to as an application site, a local site, or a primary site. The second site may be referred to as a recovery site, a remote site, or a secondary site. The logical volumes that hold the data in the ESS at the local site are called local volumes, and the corresponding logical volumes that hold the mirrored data at the remote site are called remote volumes. High speed links, such as ESCON links, may connect the local and remote ESS systems.
[005] In the synchronous type of operation for PPRC, i.e., synchronous PPRC, the updates done by a host application to the local volumes at the local site are synchronously shadowed onto the remote volumes at the remote site. As synchronous PPRC is a synchronous copying solution, write updates are ensured on both copies (local and remote) before the write is considered to be completed for the host application. In synchronous PPRC the host application does not get the "write complete" condition until the update is synchronously done in both the local and the remote volumes. Therefore, from the perspective of the host application the data at the remote volumes at the remote site is equivalent to the data at the local volumes at the local site.
[006] Synchronous PPRC increases the response time as compared to an asynchronous copy operation, and this is inherent to the synchronous operation. The overhead comes from the additional steps that are executed before the write operation is signaled as completed to the host application. Also, the PPRC activity between the local site and the remote site may be comprised of signals that travel through the links that connect the sites, and the overhead on the response time of the host application write operations will increase proportionally with the distance between the sites. Therefore, the distance affects a host application's write response time. In certain implementations, there may be a maximum supported distance for synchronous PPRC operations referred to as the synchronous communication distance.
[007] In the Extended Distance PPRC (also referred to as PPRC Extended Distance) method of operation, PPRC mirrors the updates of the local volume onto the remote volumes in an asynchronous manner, while the host application is running.
[008] In Extended Distance PPRC, the host application receives a write complete response before the update is copied from the local volumes to the remote volumes. In this way, when in Extended Distance PPRC, a host application's write operations are free of the typical synchronous overheads. Therefore, Extended Distance PPRC is suitable for remote copy solutions at very long distances with minimal impact on host applications. There is no overhead penalty upon the host application's write such as in synchronous PPRC. However, Extended Distance PPRC does not continuously maintain an equivalent copy of the local data at the remote site.
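By way of illustration only, the difference in host-visible write latency between synchronous and asynchronous copying may be sketched as follows. The one-way fibre delay of about 5 microseconds per kilometre, the local service time, and the function names are assumptions chosen for the example; they are not taken from the PPRC documentation.

```python
# Illustrative model only: the constant below is an assumed figure.
FIBRE_DELAY_MS_PER_KM = 0.005  # assumed one-way delay, about 5 microseconds per km


def sync_write_latency_ms(local_service_ms: float, distance_km: float) -> float:
    """Synchronous copy: the host waits for the local write plus one
    round trip to the other site before it sees write complete."""
    round_trip_ms = 2 * distance_km * FIBRE_DELAY_MS_PER_KM
    return local_service_ms + round_trip_ms


def async_write_latency_ms(local_service_ms: float, distance_km: float) -> float:
    """Asynchronous copy: the host is acknowledged after the local write,
    so the distance does not appear in the host-visible latency."""
    return local_service_ms
```

Under these assumed figures, the synchronous overhead grows linearly with distance while the asynchronous latency stays flat, which is the trade-off the two modes of operation make.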
[009] Further details of the PPRC are described in the IBM publication "IBM TotalStorage Enterprise Storage Server: PPRC Extended Distance," IBM document no. SG24-6568-00 (Copyright IBM, 2002), which publication is incorporated herein by reference in its entirety.
Disclosure of Invention
[010] Provided are a method, system, and article of manufacture for copying storage. Data sent from a first storage unit is synchronously copied at a second storage unit. The copied data is sent asynchronously from the second storage unit to a third storage unit.
[011] Preferably the first storage unit is a first storage volume in a first storage control unit, wherein the second storage unit is a second storage volume in a second storage control unit, wherein the third storage unit is a third storage volume in a third storage control unit, and wherein the third storage control unit is beyond a synchronous communication distance to the first storage control unit.
[012] Alternatively the first storage unit is a storage control unit, wherein the second storage unit is a second storage control unit, wherein the third storage unit is a third storage control unit, wherein the second storage unit is within a synchronous communication distance to the first storage unit, and wherein the third storage unit is beyond the synchronous communication distance to the second storage unit.
[013] Preferably the copied data is received at the third storage unit. The data received from the second storage unit is asynchronously copied at the third storage unit.
[014] Optionally a write request is received at the first storage unit from a host application coupled to a host system. Data corresponding to the write request is written at a cache and a non-volatile storage coupled to the first storage unit. The data corresponding to the write request is sent synchronously to the second storage unit. Optionally a response, indicative of the write request being completed, from the second storage unit is received at the first storage unit. An indication is sent to the host application that the write request is completed.
[015] Optionally, in response to receiving the data at the second storage unit, the data is copied to a cache and a non-volatile storage at the second storage unit. The data is marked as modified at the second storage unit. A response is sent from the second storage unit to the first storage unit, wherein the response indicates that the received data has been copied at the second storage unit.
[016] Optionally a determination is made of the modified data at the second storage unit, wherein sending the copied data, further comprises sending the modified data asynchronously from the second storage unit to the third storage unit.
[017] Optionally the first storage unit is coupled to a host that sends Input/Output requests to the first storage unit, wherein an update from the host at the first storage unit is synchronously reflected at the second storage unit and asynchronously reflected at the third storage unit.
[018] Preferably a host application that performs I/O with the first storage unit completes a write operation faster in comparison to the time taken if an update corresponding to the write operation were copied synchronously to the third storage unit.
[019] Optionally recovering from a disaster at the first storage unit is performed by substituting replicated data from the second or third storage units.
[020] Optionally additional storage units are cascaded to the first, second, and third storage units, wherein the additional storage units may communicate synchronously or asynchronously. In yet further implementations, functions of the first and second storage units are integrated into a single storage control unit.
[021] Accordingly, an embodiment of the invention may be used to create a long distance disaster recovery solution by copying synchronously from a local storage control unit to an intermediate storage control unit, and in parallel copying asynchronously from the intermediate storage control unit to a remote storage control unit.
Brief Description of the Drawings
[022] The invention will now be described, by way of example only, with reference to a preferred embodiment thereof, as illustrated in the accompanying drawings, in which:
[023] FIG. 1 illustrates a block diagram of a computing environment, in accordance with certain described aspects of the invention;
[024] FIG. 2 illustrates a block diagram of a cascading copy application, in accordance with certain described implementations of the invention;
[025] FIG. 3 illustrates logic implemented in a local storage control unit, in accordance with certain described implementations of the invention;
[026] FIG. 4 illustrates logic for receiving data synchronously as implemented in an intermediate storage control unit, in accordance with certain described implementations of the invention;
[027] FIG. 5 illustrates logic for copying data asynchronously as implemented in the intermediate storage control unit, in accordance with certain described implementations of the invention; and
[028] FIG. 6 illustrates a block diagram of a computer architecture in which certain described aspects of the invention are implemented.
[029] Note that in the drawings like reference numbers represent corresponding parts throughout.
Best Mode for Carrying Out the Invention
[030] FIG. 1 illustrates a computing environment utilizing three storage control units, such as a local storage control unit 100, an intermediate storage control unit 102, and a remote storage control unit 104 connected by data interface channels 106, 108, such as, the Enterprise System Connection (ESCON)* channel or any other data interface mechanism known in the art (e.g., fibre channel, Storage Area Network (SAN) interconnections, etc.).
[031] The three storage control units 100, 102, 104 may be at three different sites with the local storage control unit 100 and the intermediate storage control unit 102 being within a synchronous communication distance of each other. The synchronous communication distance between two storage control units is the distance up to which synchronous communication is feasible between the two storage control units. The remote storage control unit 104 may be a long distance away from the intermediate storage control unit 102 and the local storage control unit 100, such that synchronous copying of data from the intermediate storage control unit 102 to the remote storage control unit 104 may be time consuming or impractical. Additionally, the intermediate storage control unit 102 may be in a secure environment separated from the local storage control unit 100 and with separate power to reduce the possibility of an outage affecting both the local storage control unit 100 and the intermediate storage control unit 102. Certain implementations of the invention create a three site (local, intermediate, remote) disaster recovery solution where there may be no data loss if the local storage control unit 100 is lost. In the three site disaster recovery solution, the local storage control unit 100 is kept at the local site, the intermediate storage control unit 102 is kept at the intermediate site, and the remote storage control unit 104 is kept at the remote site. Data copied on the intermediate storage control unit 102 or the remote storage control unit 104 may be used to recover from the loss of the local storage control unit 100. In certain alternative implementations, there may be less than three sites. For example, the local storage control unit 100 and the intermediate storage control unit 102 may be at the same site. In additional alternative implementations of the invention, there may be more than three storage control units distributed among three or more sites. Furthermore, functions of a plurality of storage control units may be integrated into a single storage control unit, e.g., functions of the local storage control unit 100 and the intermediate storage control unit 102 may be integrated into a single storage control unit.
[032] The local storage control unit 100 is coupled to a host 110 via data interface channel 112. While only a single host 110 is shown coupled to the local storage control unit 100, in certain implementations of the invention, a plurality of hosts may be coupled to the local storage control unit 100. The host 110 may be any computational device known in the art, such as a personal computer, a workstation, a server, a mainframe, a hand held computer, a palm top computer, a telephony device, network appliance, etc. The host 110 may include any operating system (not shown) known in the art, such as the IBM OS/390* operating system. The host 110 may include at least one host application 114 that sends Input/Output (I/O) requests to the local storage control unit 100.
[033] The storage control units 100, 102, and 104 are coupled to storage volumes, such as, local site storage volumes 116, intermediate site storage volumes 118, and remote site storage volumes 120, respectively. The storage volumes 116, 118, 120 may be configured as a Direct Access Storage Device (DASD), one or more RAID ranks, Just a bunch of disks (JBOD), or any other data repository system known in the art.
[034] The storage control units 100, 102, and 104 may each include a cache, such as cache 122, 124, 126 respectively. The caches 122, 124, 126 comprise volatile memory to store tracks. The storage control units 100, 102, and 104 may each include a non-volatile storage (NVS), such as non-volatile storage 128, 130, 132 respectively. The non-volatile storage 128, 130, 132 elements may buffer certain modified tracks in the caches 122, 124, 126 respectively.
[035] The local storage control unit 100 additionally includes an application, such as a local application 134, for synchronous copying of data stored in the cache 122, non-volatile storage 128, and local site storage volumes 116 to another storage control unit, such as the intermediate storage control unit 102. The local application 134 includes copy services functions that execute in the local storage control unit 100. The local storage control unit 100 receives I/O requests from the host application 114 to read and write to the local site storage volumes 116.
[036] The intermediate storage control unit 102 additionally includes an application, such as a cascading PPRC application 136. The cascading PPRC application 136 includes copy services functions that execute in the intermediate storage control unit 102. The cascading PPRC application 136 can interact with the local storage control unit 100 to receive data synchronously. The cascading PPRC application 136 can also send data asynchronously to the remote storage control unit 104. Therefore, the cascading PPRC application 136 cascades a first pair of storage control units formed by the local storage control unit 100 and the intermediate storage control unit 102, and a second pair of storage control units formed by the intermediate storage control unit 102 and the remote storage control unit 104. In alternative implementations of the invention, additional storage control units may be cascaded.
[037] The remote storage control unit 104 additionally includes an application, such as a remote application 138, that can receive data asynchronously from another storage control unit, such as, the intermediate storage control unit 102. The remote application 138 includes copy services functions that execute in the remote storage control unit 104.
[038] Therefore, FIG. 1 illustrates a computing environment where a host application 114 sends I/O requests to a local storage control unit 100. The local storage control unit 100 synchronously copies data to the intermediate storage control unit 102, and the intermediate storage control unit 102 asynchronously copies data to the remote storage control unit 104.
[039] FIG. 2 illustrates a block diagram that illustrates communications between the local application 134, the cascading PPRC application 136 and the remote application 138, in accordance with certain implementations of the invention.
[040] The local application 134 performs a synchronous data transfer, such as, via synchronous PPRC 200, to a synchronous copy process 202 that may be generated by the cascading PPRC application 136. The synchronous data transfer 200 takes place over the data interface channel 106.
[041] A background asynchronous copy process 204 that may be generated by the cascading PPRC application 136 performs an asynchronous data transfer, such as, via Extended Distance PPRC 206, to the remote application 138. The asynchronous data transfer takes place over the data interface channel 108.
[042] Since data from the local storage control unit 100 are copied synchronously to the intermediate storage control unit 102, the intermediate site storage volumes 118 may include a copy of the local site storage volumes 116. In certain implementations of the invention the distance between the local storage control unit 100 and the intermediate storage control unit 102 is kept as small as possible to minimize the performance impact of synchronous PPRC. Data is copied asynchronously from the intermediate storage control unit 102 to the remote storage control unit 104. As a result, the effect of long distance on the host response time is eliminated.
[043] Therefore, FIG. 2 illustrates how the cascading PPRC application 136 on the intermediate storage control unit 102 receives data synchronously from the local storage control unit 100, and transmits data asynchronously to the remote storage control unit 104.
[044] FIG. 3 illustrates logic implemented in the local storage control unit 100, in accordance with certain implementations of the invention. In certain implementations of the invention the logic of FIG. 3 may be implemented in the local application 134 resident in the local storage control unit 100.
[045] Control starts at block 300, where the local application 134 receives a write request from the host application 114. The local application 134 writes (at block 302) data corresponding to the write request on the cache 122 and the non-volatile storage 128 on the local storage control unit 100. Additional applications (not shown), such as caching applications and non-volatile storage applications, in the local storage control unit 100 may manage the data in the cache 122 and the data in the non-volatile storage 128 and keep the data in the cache 122 and the non-volatile storage 128 consistent with the data in the local site storage volumes 116.
[046] The local application 134 determines (at block 304) if the local storage control unit 100 is a primary PPRC device, i.e., the local storage control unit includes source data for a PPRC transaction. If so, the local application 134 sends (at block 306) the written data to the intermediate storage control unit 102 via a new write request. The local application 134 waits (at block 308) for a write complete acknowledgment from the intermediate storage control unit 102. The local application 134 receives (at block 310) a write complete acknowledgment from the intermediate storage control unit 102. Therefore, the local application 134 has transferred the data written by the host application 114 on the local storage control unit 100 to the intermediate storage control unit 102 via a synchronous copy.
[047] The local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the local storage control unit 100. The local application 134 receives (at block 300) a next write request from the host application 114.
[048] If the local application 134 determines (at block 304) that the local storage control unit 100 is not a primary PPRC device, i.e., the local storage control unit is not a source device for a PPRC transaction, then the local application 134 does not have to send any data to the intermediate storage control unit 102, and the local application 134 signals (at block 312) to the host application 114 that the write request from the host application 114 has been completed at the local storage control unit 100.
[049] Therefore, FIG. 3 illustrates a logic for receiving a write request from the host application 114 to the local storage control unit 100 and synchronously copying the data corresponding to the write request from the local storage control unit 100 to the intermediate storage control unit 102. The host application 114 waits for the write request to be completed while the synchronous copying of data takes place. Since the local storage control unit 100 and the intermediate storage control unit 102 are within a synchronous communication distance of each other, the synchronous copying of data from the local storage control unit 100 to the intermediate storage control unit 102 takes a smaller amount of time when compared to the situation where the local storage control unit 100 is beyond a synchronous communication distance to the intermediate storage control unit 102. Since the copy of the data on the intermediate storage control unit 102 is written synchronously, the intermediate storage control unit 102 includes an equivalent copy of the data on the local storage control unit 100.
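By way of illustration only, the flow of blocks 300 to 312 may be sketched as follows. The class and method names (`IntermediateStub`, `LocalUnit`, `handle_host_write`) and the in-memory dictionaries standing in for the cache 122 and the non-volatile storage 128 are hypothetical and are not part of the described implementations.

```python
from dataclasses import dataclass, field


@dataclass
class IntermediateStub:
    """Hypothetical stand-in for the intermediate storage control unit 102."""
    received: dict = field(default_factory=dict)

    def write(self, track: str, data: bytes) -> bool:
        self.received[track] = data
        return True  # models the write complete acknowledgment


@dataclass
class LocalUnit:
    """Hypothetical model of the local application 134."""
    intermediate: IntermediateStub
    is_pprc_primary: bool = True
    cache: dict = field(default_factory=dict)
    nvs: dict = field(default_factory=dict)

    def handle_host_write(self, track: str, data: bytes) -> str:
        # Block 302: write the data to the cache and the non-volatile storage.
        self.cache[track] = data
        self.nvs[track] = data
        # Block 304: is this unit the source of a PPRC transaction?
        if self.is_pprc_primary:
            # Blocks 306-310: send synchronously and wait for the acknowledgment.
            acked = self.intermediate.write(track, data)
            if not acked:
                raise RuntimeError("no write complete acknowledgment received")
        # Block 312: only now is the host told that the write is complete.
        return "write complete"
```

In this sketch the call to `write` blocks until the stub returns, which is what makes the copy synchronous from the point of view of the host application.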
[050] FIG. 4 illustrates logic for receiving data synchronously as implemented in the intermediate storage control unit 102, in accordance with certain implementations of the invention. The cascading PPRC application 136 may perform the logic illustrated in FIG. 4.
[051] Control starts at block 400 where the cascading PPRC application 136 receives a write request from the local application 134. For example, the write request sent at block 306 of FIG. 3 to the intermediate storage control unit 102 may be received by the cascading PPRC application 136. The cascading PPRC application 136 writes (at block 402) data corresponding to the write request to the cache 124 and the non-volatile storage 130. The intermediate storage control unit 102 may keep the cache 124 and the non-volatile storage 130 consistent with the intermediate site storage volumes 118.
[052] The cascading PPRC application 136 determines (at block 404) if data on the intermediate storage control unit 102 is to be cascaded, i.e., the data is to be sent to the remote storage control unit 104. If so, the synchronous copy process 202 of the cascading PPRC application 136 marks (at block 406) data as PPRC modified. The synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgment to the local application 134. The cascading PPRC application 136 receives (at block 400) the next write request from the local application 134.
[053] If the cascading PPRC application 136 determines (at block 404) that data on the intermediate storage control unit 102 does not have to be cascaded, then the synchronous copy process 202 of the cascading PPRC application 136 signals (at block 408) a write complete acknowledgment to the local application 134 and the cascading PPRC application 136 receives (at block 400) the next request from the local application 134.
[054] Therefore, FIG. 4 illustrates how the intermediate storage control unit 102 receives a write request from the local storage control unit 100, where the write request corresponds to a host write request. The intermediate storage control unit 102 marks data corresponding to the host write request as PPRC modified.
[055] FIG. 5 illustrates logic for copying data asynchronously as implemented in the intermediate storage control unit 102, in accordance with certain implementations of the invention. The logic illustrated in FIG. 5 may be performed by the background asynchronous copy process 204 of the cascading PPRC application 136.
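By way of illustration only, the receive path of blocks 400 to 408 may be sketched as follows. The class name `IntermediateUnit`, the `cascade` flag, and the use of a set to record PPRC modified tracks are hypothetical choices made for the example.

```python
class IntermediateUnit:
    """Hypothetical model of the synchronous copy process 202."""

    def __init__(self, cascade: bool = True):
        self.cascade = cascade      # whether data is to be sent on to the remote unit
        self.cache = {}             # stands in for the cache 124
        self.nvs = {}               # stands in for the non-volatile storage 130
        self.pprc_modified = set()  # tracks awaiting the asynchronous copy

    def handle_sync_write(self, track: str, data: bytes) -> bool:
        # Block 402: write the data to the cache and the non-volatile storage.
        self.cache[track] = data
        self.nvs[track] = data
        # Block 404: is the data to be cascaded?
        if self.cascade:
            # Block 406: mark the data as PPRC modified.
            self.pprc_modified.add(track)
        # Block 408: signal the write complete acknowledgment.
        return True
```

The acknowledgment is returned without contacting the remote unit, so the remote leg of the cascade does not add to the time the local unit waits.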
[056] Control starts at block 500 where the background asynchronous copy process 204 of the cascading PPRC application 136 determines the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the intermediate storage control unit 102.
[057] The background asynchronous copy process 204 of the cascading PPRC application 136 sends (at block 502) the PPRC modified data to the remote storage control unit 104 asynchronously, i.e., the background asynchronous copy process 204 keeps sending the PPRC modified data stored in the cache 124, non-volatile storage 130, and the intermediate site storage volumes 118 of the intermediate storage control unit 102.
[058] After the PPRC modified data has been sent, the background asynchronous copy process 204 determines (at block 504) if write complete acknowledgment has been received from the remote storage control unit 104. If not, the background asynchronous copy process 204 again determines (at block 504) if the write complete acknowledgment has been received.
[059] If, after all PPRC modified data has been sent, the background asynchronous copy process 204 determines (at block 504) that write complete acknowledgment has been received from the remote storage control unit 104, then the background asynchronous copy process 204 determines (at block 500) the PPRC modified data once again.
[060] The logic of FIG. 5 illustrates how the background asynchronous copy process 204, while executing in the background, copies data asynchronously from the intermediate storage control unit 102 to the remote storage control unit 104. Since the copying is asynchronous, the intermediate storage control unit 102 and the remote storage control unit 104 may be separated by long distances, such as the extended distances allowed by Extended Distance PPRC.
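By way of illustration only, one pass through blocks 500 to 504 may be sketched as follows. The function name `copy_pass` and the `send` callback are hypothetical. Clearing the PPRC modified mark once the acknowledgment arrives is an assumption of the sketch, and instead of re-checking at block 504 until the acknowledgment is received, the sketch leaves an unacknowledged track marked so that a later pass retries it.

```python
from typing import Callable, Dict, Set


def copy_pass(tracks: Dict[str, bytes],
              pprc_modified: Set[str],
              send: Callable[[str, bytes], bool]) -> int:
    """Run one background pass and return the number of tracks copied."""
    # Block 500: determine the PPRC modified data.
    pending = sorted(pprc_modified)
    copied = 0
    for track in pending:
        # Block 502: send the modified data to the remote unit.
        acked = send(track, tracks[track])
        # Block 504: has the write complete acknowledgment been received?
        if acked:
            pprc_modified.discard(track)  # assumed: the mark is cleared on acknowledgment
            copied += 1
    return copied
```

A background process might call `copy_pass` repeatedly; host writes are never blocked by it, because the acknowledgment to the local unit was already sent when the data was marked.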
[061] In certain implementations of the invention, if the local storage control unit 100 stops sending updates to the intermediate storage control unit 102 because of an outage at the local site that has the local storage control unit 100, then the background asynchronous copy process 204 may quickly complete the copy of all remaining modified data to the remote storage control unit 104. At the completion of the copy, the remote site storage volumes 120 will include an equivalent copy of all updates up to the time of the outage. If there are multiple failures, such that both the local storage control unit 100 and the intermediate storage control unit 102 are lost then there may be data loss at the remote site.
[062] Since the remote storage control unit 104 is updated asynchronously, the data on the remote storage control unit 104 may not be equivalent to the data on the local storage control unit 100, unless all of the data from the intermediate storage control unit 102 has been copied up to some point in time. To maintain an equivalent copy of data at the remote storage control unit 104 in case of failure of both the local storage control unit 100 and the intermediate storage control unit 102, certain implementations of the invention may force the data at the remote storage control unit to contain all dependent updates up to some specified time. The consistent copy at the remote storage control unit may be preserved via a point-in-time copy, such as FlashCopy*. One method may include quiescing the host I/O temporarily at the local site while the remote storage control unit 104 catches up with the updates. Another method may prevent writes to the intermediate storage control unit 102 while the remote storage control unit 104 catches up with the updates.
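As a rough illustration of the second method, the following Python sketch blocks writes to the intermediate unit, lets the remote unit catch up, and then preserves the remote state. Everything named here is hypothetical (`Cascade`, `form_consistent_copy`, the `pending` list); the point-in-time copy is modeled as a plain dictionary copy and is not meant to represent how FlashCopy works internally.

```python
class Cascade:
    """Hypothetical model of the intermediate-to-remote leg of the cascade."""

    def __init__(self):
        self.intermediate = {}       # data held at the intermediate unit
        self.pending = []            # updates not yet sent to the remote unit, in order
        self.remote = {}             # data held at the remote unit
        self.writes_blocked = False  # True while a consistent copy is being formed
        self.snapshots = []          # preserved point-in-time copies of the remote data

    def write(self, track, data):
        """Accept an update at the intermediate unit unless writes are blocked."""
        if self.writes_blocked:
            raise RuntimeError("writes to the intermediate unit are blocked")
        self.intermediate[track] = data
        self.pending.append((track, data))

    def drain_one(self):
        """Send the oldest pending update to the remote unit, if there is one."""
        if self.pending:
            track, data = self.pending.pop(0)
            self.remote[track] = data

    def form_consistent_copy(self):
        """Block writes, let the remote unit catch up, then preserve its state."""
        self.writes_blocked = True
        try:
            while self.pending:
                self.drain_one()
            snapshot = dict(self.remote)  # stand-in for a point-in-time copy
            self.snapshots.append(snapshot)
        finally:
            self.writes_blocked = False
        return snapshot
```

Because updates are drained in arrival order and no new writes are admitted during the drain, the preserved copy in this sketch contains every update accepted before the block began.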
[063] The implementations create a long distance disaster recovery solution by first copying synchronously from a local storage control unit to an intermediate storage control unit, and subsequently copying asynchronously from the intermediate storage control unit to a remote storage control unit. The distance between the local storage control unit and the intermediate storage control unit may be small enough such that copying data synchronously does not cause a significant performance impact on applications that perform I/O operations on the local storage control unit.
[064] In implementations of the invention, if either the local storage control unit 100 or data on the local storage control unit 100 is lost, then the data can be recovered from replicated copies of the data on either the intermediate storage control unit 102 or the remote storage control unit 104. In certain implementations, it may be preferable to recover the data from the intermediate storage control unit 102 as the data on the intermediate storage control unit 102 is always equivalent to the data on the local storage control unit 100 since data is copied synchronously from the local storage control unit 100 to the intermediate storage control unit 102.
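The ordering of the two copy stages, and the resulting preference for recovering from the intermediate unit, can be sketched in Python as follows. This is an assumption-laden illustration only: `host_write`, `drain_backlog` and `choose_recovery_source` are hypothetical helper names, and storage units are modeled as plain dictionaries.

```python
def host_write(local, intermediate, backlog, track, data):
    """Apply a host write: the synchronous copy completes before the host is answered."""
    local[track] = data
    intermediate[track] = data     # synchronous leg: done before returning to the host
    backlog.append((track, data))  # asynchronous leg: sent to the remote unit later
    return "write complete"


def drain_backlog(backlog, remote, limit=None):
    """Send up to `limit` queued updates to the remote unit; return how many were sent."""
    sent = 0
    while backlog and (limit is None or sent < limit):
        track, data = backlog.pop(0)
        remote[track] = data
        sent += 1
    return sent


def choose_recovery_source(intermediate_ok, remote_ok):
    """Prefer the intermediate copy, which matches the local data; else use the remote."""
    if intermediate_ok:
        return "intermediate"
    if remote_ok:
        return "remote"
    return None
```

In this model the intermediate dictionary equals the local one after every `host_write`, while the remote dictionary may lag until `drain_backlog` has emptied the queue.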
[065] Additional Implementation Details
[066] The described techniques may be implemented as a method, apparatus or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" as used herein refers to code or logic implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.) or a computer readable medium, such as magnetic storage media (e.g., hard disk drives, floppy disks, tape), optical storage (e.g., CD-ROMs, optical disks, etc.), and volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which implementations are made may further be accessible through transmission media or from a file server over a network. In such cases, the article of manufacture in which the code is implemented may comprise transmission media, such as a network transmission line, wireless transmission media, signals propagating through space, radio waves, infrared signals, etc. The article of manufacture may comprise any information bearing medium known in the art.
[067] FIG. 6 illustrates a block diagram of a computer architecture in which certain aspects of the invention are implemented. FIG. 6 illustrates one implementation of the host 110, and the storage control units 100, 102, 104. The host 110, and the storage control units 100, 102, 104 may implement a computer architecture 600 having a processor 602, a memory 604 (e.g., a volatile memory device), and storage 606 (e.g., a non-volatile storage, magnetic disk drives, optical disk drives, tape drives, etc.). The storage 606 may comprise an internal storage device, an attached storage device or a network accessible storage device. Programs in the storage 606 may be loaded into the memory 604 and executed by the processor 602 in a manner known in the art. The architecture may further include a network card 608 to enable communication with a network. The architecture may also include at least one input 610, such as a keyboard, a touchscreen, a pen, voice-activated input, etc., and at least one output 612, such as a display device, a speaker, a printer, etc.
[068]
[069] In alternative implementations of the invention the data transfer between the local storage control unit 100 and the intermediate storage control unit 102 may be via Extended Distance PPRC. However, there may be data loss if there is an outage at the local storage control unit 100. Additionally, in alternative implementations of the invention the data transfer between the intermediate storage control unit 102 and the remote storage control unit 104 may be via synchronous PPRC. However, there may be performance impacts on the I/O from the host 110 to the local storage control unit 100.
[070] In alternative implementations of the invention, the functions of the local storage control unit 100 and the intermediate storage control unit 102 may be implemented in a single storage control unit. Furthermore, in additional implementations of the invention there may be more than three storage control units cascaded to each other. For example, a fourth storage control unit may be coupled to the remote storage control unit 104 and data may be transferred from the remote storage control unit 104 to the fourth storage control unit. In certain implementations of the invention, a chain of synchronous data transfers and a chain of asynchronous data transfers may take place among a plurality of cascaded storage control units. Furthermore, while the implementations have been described with storage control units, the storage control units may be any storage unit known in the art.
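A chain of more than three cascaded units, with a mix of synchronous and asynchronous links, might be modeled along the following lines in Python. The sketch is hypothetical throughout: `Unit`, `forward`, `drain` and the `modes` list (one entry per link, either "sync" or "async") are names introduced only for this illustration and do not appear in the described implementations.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical storage unit in a cascade."""
    name: str
    data: dict = field(default_factory=dict)
    backlog: list = field(default_factory=list)  # updates awaiting asynchronous send


def forward(chain, modes, start, track, value):
    """Write at chain[start], then follow links downstream until an async link.

    modes[i] describes the link from chain[i] to chain[i + 1].
    Returns the index of the last unit that holds the update after this call.
    """
    idx = start
    chain[idx].data[track] = value
    while idx < len(modes):
        if modes[idx] == "async":
            # Defer the transfer over this link; a later drain() will carry it on.
            chain[idx].backlog.append((track, value))
            break
        idx += 1
        chain[idx].data[track] = value  # synchronous link: copy immediately
    return idx


def drain(chain, modes, idx):
    """Send every deferred update of chain[idx] over its asynchronous link."""
    updates, chain[idx].backlog = chain[idx].backlog, []
    for track, value in updates:
        forward(chain, modes, idx + 1, track, value)
```

With four units and `modes = ["sync", "async", "async"]`, a write at the first unit reaches the second immediately, the third after draining the second, and the fourth after draining the third.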
[071] The logic of FIGs. 3, 4, and 5 describes specific operations occurring in a particular order. Further, the operations may be performed in parallel as well as sequentially. In alternative implementations, certain of the logic operations may be performed in a different order, modified or removed and still remain within the scope of implementations of the present invention. Moreover, steps may be added to the above described logic and still conform to the implementations. Yet further, steps may be performed by a single process or distributed processes.
[072] Many of the software and hardware components have been described in separate modules for purposes of illustration. Such components may be integrated into a fewer number of components or divided into a larger number of components. Additionally, certain operations described as performed by a specific component may be performed by other components.
[073] *IBM, IBM TotalStorage Enterprise Storage Server, Enterprise System Connection (ESCON), OS/390, and FlashCopy are trademarks of International Business Machines Corp.

Claims

[001] A method for copying storage, comprising: synchronously copying data, sent from a first storage unit, at a second storage unit; and sending the copied data asynchronously from the second storage unit to a third storage unit.
[002] The method of claim 1, wherein the first storage unit is a first storage volume in a first storage control unit, wherein the second storage unit is a second storage volume in a second storage control unit, wherein the third storage unit is a third storage volume in a third storage control unit, and wherein the third storage control unit is beyond a synchronous communication distance to the first storage control unit.
[003] The method of claim 1, wherein the first storage unit is a storage control unit, wherein the second storage unit is a second storage control unit, wherein the third storage unit is a third storage control unit, wherein the second storage unit is within a synchronous communication distance to the first storage unit, and wherein the third storage unit is beyond the synchronous communication distance to the second storage unit.
[004] The method of claim 1, further comprising: receiving a write request at the first storage unit from a host application coupled to a host system; writing data corresponding to the write request at a cache and a non-volatile storage coupled to the first storage unit; and sending the data corresponding to the write request synchronously to the second storage unit.
[005] The method of claim 1, further comprising: in response to receiving the data at the second storage unit, copying the data to a cache and a non-volatile storage at the second storage unit; and marking the data as modified at the second storage unit; sending a response from the second storage unit to the first storage unit, wherein the response indicates that the received data has been copied at the second storage unit.
[006] The method of claim 5, further comprising: determining the modified data at the second storage unit; and wherein sending the copied data, further comprises sending the modified data asynchronously from the second storage unit to the third storage unit.
[007] The method of claim 1, further comprising: recovering from a disaster at the first storage unit by substituting replicated data from the second or third storage units.
[008] The method of claim 1, wherein additional storage units are cascaded to the first, second, and third storage units, wherein the additional storage units may communicate synchronously or asynchronously.
[009] A system for copying storage, comprising: a first storage unit; a second storage unit coupled to the first storage unit; a third storage unit coupled to the second storage unit; means for synchronously copying data, sent from the first storage unit, at the second storage unit; and means for sending the copied data asynchronously from the second storage unit to the third storage unit.
[010] The system of claim 9, wherein the first storage unit is a first storage volume in a first storage control unit, wherein the second storage unit is a second storage volume in a second storage control unit, wherein the third storage unit is a third storage volume in a third storage control unit, and wherein the third storage control unit is beyond a synchronous communication distance to the first storage control unit.
[011] The system of claim 9, wherein the first storage unit is a storage control unit, wherein the second storage unit is a second storage control unit, wherein the third storage unit is a third storage control unit, wherein the second storage unit is within a synchronous communication distance to the first storage unit, and wherein the third storage unit is beyond the synchronous communication distance to the second storage unit.
[012] The system of claim 9, further comprising: a host system coupled to the first storage unit; a host application coupled to the host system; means for receiving a write request at the first storage unit from the host application; means for writing data corresponding to the write request at a cache and a non-volatile storage coupled to the first storage unit; and means for sending the data corresponding to the write request synchronously to the second storage unit.
[013] The system of claim 9, further comprising: means for copying the data to a cache and a non-volatile storage at the second storage unit, in response to receiving the data at the second storage unit; and means for marking the data as modified at the second storage unit; means for sending a response from the second storage unit to the first storage unit, wherein the response indicates that the received data has been copied at the second storage unit.
[014] The system of claim 13, further comprising: means for determining the modified data at the second storage unit; and wherein the means for sending the copied data, further comprises means for sending the modified data asynchronously from the second storage unit to the third storage unit.
[015] The system of claim 9, further comprising: means for recovering from a disaster at the first storage unit by substituting replicated data from the second or third storage units.
[016] The system of claim 9, wherein additional storage units are cascaded to the first, second, and third storage units, wherein the additional storage units may communicate synchronously or asynchronously.
[017] An article of manufacture for copying storage, wherein the article of manufacture includes at least one program which comprises operation capable of performing a method according to any one of claims 1 to 8.
PCT/EP2004/051115 2003-06-17 2004-06-15 Method, system and article of manufacture for remote copying of data WO2004111849A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/464,024 US7266665B2 (en) 2003-06-17 2003-06-17 Method, system, and article of manufacture for remote copying of data
US10/464,024 2003-06-17

Publications (2)

Publication Number Publication Date
WO2004111849A2 (en) 2004-12-23
WO2004111849A3 (en) 2005-07-28

Family

ID=33517195

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2004/051115 WO2004111849A2 (en) 2003-06-17 2004-06-15 Method, system and article of manufacture for remote copying of data

Country Status (4)

Country Link
US (2) US7266665B2 (en)
KR (1) KR20060017789A (en)
TW (1) TW200540614A (en)
WO (1) WO2004111849A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155845A (en) * 1990-06-15 1992-10-13 Storage Technology Corporation Data storage system for providing redundant copies of data on different disk drives
WO2001035244A1 (en) * 1999-11-11 2001-05-17 Miralink Corporation Flexible remote data mirroring
US6496908B1 (en) * 2001-05-18 2002-12-17 Emc Corporation Remote mirroring
US20030014433A1 (en) * 2001-07-13 2003-01-16 Sun Microsystems, Inc. Storage network data replicator

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5544347A (en) * 1990-09-24 1996-08-06 Emc Corporation Data storage system controlled remote data mirroring with respectively maintained data indices
US6304980B1 (en) * 1996-03-13 2001-10-16 International Business Machines Corporation Peer-to-peer backup system with failure-triggered device switching honoring reservation of primary device
US6684306B1 (en) * 1999-12-16 2004-01-27 Hitachi, Ltd. Data backup in presence of pending hazard
US6640291B2 (en) * 2001-08-10 2003-10-28 Hitachi, Ltd. Apparatus and method for online data migration with remote copy
US6910150B2 (en) * 2001-10-15 2005-06-21 Dell Products L.P. System and method for state preservation in a stretch cluster
US20030217211A1 (en) * 2002-05-14 2003-11-20 Rust Robert A. Controller communications over an always-on controller interconnect
US20040034807A1 (en) * 2002-08-14 2004-02-19 Gnp Computers, Inc. Roving servers in a clustered telecommunication distributed computer system
US7149919B2 (en) * 2003-05-15 2006-12-12 Hewlett-Packard Development Company, L.P. Disaster recovery system with cascaded resynchronization

Also Published As

Publication number Publication date
US20080021974A1 (en) 2008-01-24
US7266665B2 (en) 2007-09-04
WO2004111849A3 (en) 2005-07-28
US20040260902A1 (en) 2004-12-23
US7921273B2 (en) 2011-04-05
TW200540614A (en) 2005-12-16
KR20060017789A (en) 2006-02-27

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1020057021872

Country of ref document: KR

ENP Entry into the national phase

Ref document number: PI0406376

Country of ref document: BR

WWP Wipo information: published in national office

Ref document number: 1020057021872

Country of ref document: KR

122 Ep: pct application non-entry in european phase