US20100180151A1 - Method for handling interrupted writes using multiple cores - Google Patents
- Publication number
- US20100180151A1 (U.S. application Ser. No. 12/354,126)
- Authority
- US
- United States
- Prior art keywords
- controller
- data
- state
- write
- storage array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1076—Parity data used in redundant arrays of independent storages, e.g. in RAID systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/08—Error detection or correction by redundancy in data representation, e.g. by using checking codes
- G06F11/10—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
- G06F11/1008—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices
- G06F11/1012—Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's in individual solid state devices using codes or arrangements adapted for a specific type of error
- G06F11/1032—Simple parity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2053—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where persistent mass storage functionality or persistent mass storage control functionality is redundant
- G06F11/2089—Redundant storage control functionality
- G06F11/2092—Techniques of failing over between control units
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/18—Error detection or correction; Testing, e.g. of drop-outs
- G11B20/1833—Error detection or correction; Testing, e.g. of drop-outs by adding special lists or symbols to the coded information
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B2220/00—Record carriers by type
- G11B2220/40—Combinations of multiple record carriers
- G11B2220/41—Flat as opposed to hierarchical combination, e.g. library of tapes or discs, CD changer, or groups of record carriers that together store one title
- G11B2220/415—Redundant array of inexpensive disks [RAID] systems
Definitions
- Referring to FIG. 6, the stripe set A (e.g., a data block D0A, a data block D1A, a data block D2A, and a parity block PA) may be shown written to the drive array 108. The data block D0A may represent a single data block or a plurality of data blocks. The data blocks D1A and D2A, as well as the parity block PA, may each represent one or more data blocks. A write IO may be sent for a new data block (e.g., D0A′), and the stripe set A may be written to the storage device 110. The write operation for the data block D0A′ may be completed in the STATE 3. After the write has completed, an interrupted write mode may occur without completing the write to the second disk (e.g., P2). The write operation to the second disk may fail, and the failure may result in the loss of the data block D1A. The data block D1A may then be read from the storage device 110 and the parity block PA′ may be generated. Finally, the stripe set A (stored in the STATE 3) may be erased. The STATES 1-7 generally describe different states used to implement a read modify/write implementation. In general, the symbol “+” in FIG. 6 represents an exclusive OR.
Abstract
An apparatus comprising a storage array, a primary controller, a secondary controller and a solid state device. The storage array may be configured to be accessed by a plurality of controllers. A first of the plurality of the controllers may be configured as the primary controller configured to read and write to and from the storage array during a normal condition. A second of the plurality of the controllers may be configured as the secondary controller configured to read and write to and from the storage array during a fault condition. The solid state device may be configured to (i) store data and (ii) be accessed by the storage array and the secondary controller.
Description
- The present invention relates to storage arrays generally and, more particularly, to a method and/or apparatus for handling interrupted writes using multiple cores.
- Conventional systems address potential double fault conditions in a variety of ways. In one scenario, a conventional controller enters an NVSRAM interrupted write mode condition and the owning controller is then rebooted. Due to the controller reboot, (i) the controller firmware will regenerate parity for the data stripe involved in a write, and (ii) a forced transfer to the surviving controller takes place.
- The two conditions above cause the next N writes to be implemented using old/new data to generate a new parity block. While performing the tasks associated with the next N write cycles, a data drive can fail unexpectedly in a volume group before the host retries the write. In such a condition, the controller does not know whether the write completed to the data drive and/or the parity drive. If the write completes to the data drive but does not complete to the parity drive, or vice versa, a potential data corruption will be detected due to the inconsistency between data and parity.
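The inconsistency described above can be illustrated with a short sketch. This is not the patent's implementation; the function name, the two-byte segments, and the values are hypothetical, and parity is modeled as the byte-wise XOR used by RAID-5 style arrays.

```python
def xor_parity(segments):
    """Parity block = byte-wise XOR of all data segments in the stripe."""
    parity = bytearray(len(segments[0]))
    for seg in segments:
        for i, b in enumerate(seg):
            parity[i] ^= b
    return bytes(parity)

# A consistent stripe: three data segments and their parity.
d0a, d1a, d2a = b"\x01\x01", b"\x02\x02", b"\x04\x04"
pa = xor_parity([d0a, d1a, d2a])

# Interrupted write: new data D0A' reaches the data drive, but the
# matching parity PA' is never written. Data and parity now disagree,
# which a later consistency check will flag as potential corruption.
d0a_new = b"\x08\x08"
assert xor_parity([d0a_new, d1a, d2a]) != pa
```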
- It would be desirable to implement a method and/or apparatus for handling interrupted writes using multiple cores that avoids data corruption.
- The present invention concerns an apparatus comprising a storage array, a primary controller, a secondary controller and a solid state device. The storage array may be configured to be accessed by a plurality of controllers. A first of the plurality of the controllers may be configured as the primary controller configured to read and write to and from the storage array during a normal condition. A second of the plurality of the controllers may be configured as the secondary controller configured to read and write to and from the storage array during a fault condition. The solid state device may be configured to (i) store data and (ii) be accessed by the storage array and the secondary controller.
- The objects, features and advantages of the present invention include providing a method and/or apparatus that may (i) have multiple cores to handle IO processing, (ii) have certain cores handle reconstruction and IO write process, (iii) have certain cores handle IO read process and stripe-set preservation (e.g., SSP and/or previous state preservation), (iv) have a non-volatile RAM (e.g., a solid state drive) to store the data for SSP, (v) prevent data corruption when double faults occur, and/or (vi) provide performance enhancement with multiple cores handling next N writes by reading all stripes to generate new parity.
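The division of labor in items (i)-(iii) above can be sketched as follows. This is a minimal illustration only: thread pools stand in for the controller cores (the patent describes firmware cores, not Python threads), and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

read_cores = ThreadPoolExecutor(max_workers=2)   # IO-read and SSP duties
write_cores = ThreadPoolExecutor(max_workers=2)  # reconstruction and IO-write

def preserve_stripe(stripe):
    """Read-core duty: snapshot the old stripe state before it is modified."""
    return [bytes(seg) for seg in stripe]

def write_segment(stripe, index, new_seg):
    """Write-core duty: perform the IO write into the stripe."""
    stripe[index] = new_seg

stripe = [b"\x01", b"\x02", b"\x04"]
old_state = read_cores.submit(preserve_stripe, stripe).result()  # SSP first
write_cores.submit(write_segment, stripe, 0, b"\x08").result()   # then write
assert old_state[0] == b"\x01" and stripe[0] == b"\x08"
```

The design point is that preservation is ordered before the write but runs on different cores, so the write path is not serialized behind the preservation read.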
- These and other objects, features and advantages of the present invention will be apparent from the following detailed description and the appended claims and drawings in which:
- FIG. 1 is a flow diagram illustrating a double fault condition;
- FIG. 2 is a block diagram of a double fault condition;
- FIG. 3 is a conceptual diagram of a double fault condition;
- FIG. 4 is a block diagram of an example embodiment of the present invention;
- FIG. 5 is a flow diagram illustrating an example embodiment of the present invention; and
- FIG. 6 is a conceptual diagram of an example embodiment of the present invention.
- Referring to FIG. 1, a flow diagram of a process 50 is shown illustrating a double fault condition. The process 50 generally comprises a state 52, a state 54, a state 56, a state 58, a state 60, and a state 62. In the state 52, a controller enters an NVSRAM interrupted write mode. In the state 54, the controller reboots. In the state 56, a LUN undergoing an input/output (IO) operation is forced to transfer to an alternate controller. In the state 58, subsequent write operations have old/new data in the stripe set to regenerate a new parity. In the state 60, one of the physical drives under the LUN0 fails before completing the interrupted write. In the state 62, a potential data corruption state occurs.
- Referring to
FIG. 2, a block diagram of a system 70 is shown illustrating a double fault condition. The system 70 generally comprises a block 72, a block 74, a block 76, and a block 78. The block 72 may be implemented as a controller (e.g., controller A). The block 74 may be implemented as a controller (e.g., controller B). The block 76 may be implemented as a LUN0. The block 78 may be implemented as a drive array. The system 70 may operate in a number of states as shown in the following TABLE 1:

TABLE 1
STATE  ACTION
1      Controller A and controller B
2      LUN0 created on top of physical drives
3      Stripe set A used to generate parity PA
4      State 3 located on the stripe set A location of LUN0
5      Write IO sent to LUN0
6      D0A′ written to P1, controller A reboots
7      Forced transfer of LUN0 to controller B
8      Pending write and the next “N” writes involve generation of parity PA′ by reading all stripe segments
9      Controller B regenerates parity PA′ for stripe set A having an interrupted write mode
10     Controller B reads new data and old data to generate parity PA′
11     P2 fails during state 10
12     If state 11 occurs, regeneration of data for P2 results in wrong data
13     Data corruption for next “x” writes due to state 11
- In the
state 1 of TABLE 1, the controller A is implemented as an owning controller for the LUN0 76. The controller B is implemented as an alternate controller. The controller B is in a passive state and implemented for redundancy. In the state 2 of TABLE 1, the LUN0 76 is created on top of the drive array 78. In one example, the drive array 78 is implemented as four physical drives (or disks) (e.g., P1, P2, P3, P4). However, the particular number of drives may be varied to meet the design criteria of a particular implementation. In the state 3 of TABLE 1, a number of data segments D0A, D1A, D2A are implemented in a stripe set (e.g., A) which is used to generate a parity (e.g., PA). In the state 4 of TABLE 1, the state 3 is located on the stripe set A location of the LUN0 76 created in the state 2. In the state 5 of TABLE 1, the controller A sends a write IO (e.g., the data segment D0A′) to the LUN0 76. In the state 6 of TABLE 1, the data D0A′ is written in a first disk (e.g., P1) of the drive array 78. Before regenerating and writing a parity (e.g., PA′) to a fourth disk (e.g., P4), the controller A reboots. In the state 7 of TABLE 1, a forced transfer of the LUN0 76 to the controller B (e.g., the alternate controller) happens. In the state 8 of TABLE 1, the action of the state 7 triggers the pending write and the next “N” writes to involve generation of the parity PA′ by reading all data segments (whether the LUN0 76 has old/new data).
- In one example, a scenario may be considered where the controller B tries to regenerate the parity PA′ for the stripe set A, which has an interrupted write mode (e.g., the state 9). In the state 10 of TABLE 1, the controller B reads new data (e.g., D0A′ from the first disk), old data (e.g., D1A from the second disk) and old data (e.g., D2A from the third disk) to generate the parity PA′ as new parity in place of the parity PA (from the fourth disk). If the second disk fails during the state 10, one new data segment and the old parity result (the state 11 of TABLE 1). In the state 11 scenario (e.g., the second disk fails), regenerating data for the second disk will result in wrong data, since one new data segment and the parity PA that was not generated using the new data stripes are used. In the state 13 of TABLE 1, data corruption occurs for the next “x” writes due to the second disk failing.
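The wrong-data outcome can be reproduced numerically. In the sketch below (hypothetical one-byte segments and XOR parity, not the patent's firmware), rebuilding D1A from the new data D0A′ and the stale parity PA yields a value that never existed on the failed disk.

```python
def rebuild(survivors, parity):
    """Rebuild a lost segment by XOR-ing the surviving segments with parity."""
    out = bytearray(parity)
    for seg in survivors:
        for i, b in enumerate(seg):
            out[i] ^= b
    return bytes(out)

d0a, d1a, d2a = b"\x01", b"\x02", b"\x04"
pa = b"\x07"                          # stale parity, computed from old data

d0a_new = b"\x08"                     # interrupted write: only P1 updated
wrong = rebuild([d0a_new, d2a], pa)   # P2 fails; rebuild D1A with stale PA
assert wrong != d1a                   # 0x08 ^ 0x04 ^ 0x07 = 0x0b, not 0x02
```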
- Referring to
FIG. 3, a conceptual diagram of a double fault condition of the system 70 is shown. In the STATE 1, the stripe set A (e.g., D0A, D1A, D2A, and PA) is stored in the physical drives (e.g., P1, P2, P3, and P4) of the drive array 78. In the STATE 2, a write IO is sent for a new data segment (e.g., D0A′) to be written to the first disk (e.g., P1). In the STATE 3, the write IO is completed for the data D0A′. After the write IO has completed in the STATE 3, an interrupted write mode occurs (e.g., a first fault) without completing the write to the second disk (e.g., P2). In the STATE 4, while trying to generate the new parity PA′, the second disk (e.g., P2) fails (e.g., a second fault), resulting in the wrong generation of D1A′ using the parity PA with the data D0A′ and the data D2A. The system 70 may experience both the first fault and the second fault (e.g., the double fault condition).
- Referring to
FIG. 4, a block diagram of the system 100 is shown. The system 100 generally comprises a module 102, a module 104, a module 106, a module 108, and a module 110. The module 102 may be implemented as a controller (e.g., controller A). The module 104 may be implemented as a controller (e.g., controller B). The module 106 may be implemented as a LUN0. The module 108 may be implemented as a storage array. For example, the module 108 may represent an array of disk drives or other storage devices (e.g., solid state storage, etc.). The modules 102, 104 and 106 may be implemented as hardware, software, a combination of hardware and software, or other implementations. The module 110 may be implemented as a storage device. In one example, the storage device 110 may be implemented as a solid state drive (or device). The system 100 may be implemented in a number of states as shown in the following TABLE 2:

TABLE 2
STATE  ACTION
1      Controller A and controller B
2      LUN0 created on top of physical drives
3      Stripe set A used to generate parity PA
4      State 3 located on the stripe set A location of LUN0
5      Write IO sent to LUN0
6      Stripe-set State Preservation of data
7      D0A′ written to P1, controller A reboots
8      Forced transfer of LUN0 to controller B
9      Pending write and the next “N” writes involve generation of parity PA′ by reading all stripe segments
10     Controller B regenerates parity PA′ for stripe set A having an interrupted write mode
11     Controller B reads new data and old data to generate parity PA′
12     P2 fails during state 11
13     If state 12 occurs, D1A generated/read from SSP
14     Data stored in state 6 is erased
- In the
state 1 of TABLE 2, the controller A may be implemented as an owning controller for the LUN0 106 (e.g., during a normal condition). The controller B may be implemented as an alternate controller. The controller B may be in a passive state and may be implemented for redundancy. In the state 2 of TABLE 2, the LUN0 106 may be created on top of the drive array 108. IO requests are normally sent to the LUN0 106, which translates such requests to the storage devices in the storage array 108. While one LUN0 is shown, a number of LUNs may be implemented (e.g., up to 2048 or more) in a particular design. In the example shown, the storage array 108 may be implemented as four physical drives (or disks) (e.g., P1, P2, P3, P4). However, the particular number of drives may be varied to meet the design criteria of a particular implementation. In the state 3 of TABLE 2, a number of data segments D0A, D1A, D2A may be implemented in a stripe set (e.g., A) which is normally used to generate a parity (e.g., PA). In the state 4 of TABLE 2, the state 3 may be located on the stripe set A location of the LUN0 106 created in the state 2. In the state 5 of TABLE 2, the controller A may send a write IO (e.g., the data segment D0A′) to the LUN0 106.
- In the
state 6 of TABLE 2, the system 100 may implement a Stripe-set State Preservation (SSP). In an SSP, the previous state of the stripe set A (e.g., D0A, D1A, D2A and PA) may be stored by the storage device 110. In one example, the previous state of the stripe set A may be stored before the stripe set A is written to the LUN0 106. Corresponding mappings may be maintained by the controller B. In one example, the stripe set having the data segment to be written may be read and stored in the storage device 110 for each write IO request. In the state 7 of TABLE 2, the data segment D0A′ may be written in a first disk (e.g., P1) of the drive array 108. Before regenerating and writing a parity (e.g., PA′) to a fourth disk (e.g., P4), the controller A may reboot (e.g., a fault condition). In one example, the controller B may not be sure that the state 7 has completed correctly (e.g., an incomplete write, a recovered error state, etc.). The controller B may retry the write with the new data in cache and make sure the new data is written to the first disk before generating the parity PA′. In the state 8 of TABLE 2, a forced transfer of the LUN0 106 to the controller B (e.g., the alternate controller) may happen. In the state 9 of TABLE 2, the state 8 may trigger the pending write and the next “N” writes to generate the parity PA′ by reading all stripe segments (whether the LUN0 106 has old/new data).
- In one example, a scenario may be considered where the controller B tries to regenerate the parity PA′ for the stripe set A, which may have an interrupted write mode (e.g., the state 10 of TABLE 2). In the state 11 of TABLE 2, the controller B may read new data (e.g., D0A′ from the first disk), old data (e.g., D1A from the second disk) and old data (e.g., D2A from the third disk) to generate the parity PA′ as new parity in place of the parity PA (e.g., from the fourth disk). In the state 12 of TABLE 2, the second disk may fail in the state 11.
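The preservation and recovery path of TABLE 2 (preserve in the state 6, retrieve in the state 13, erase in the state 14) can be sketched together. The dictionary-backed store standing in for the storage device 110, and every name below, are illustrative assumptions rather than the patent's firmware interfaces; parity is again modeled as byte-wise XOR.

```python
def xor_parity(segments):
    """Parity block = byte-wise XOR of all data segments in the stripe."""
    parity = bytearray(len(segments[0]))
    for seg in segments:
        for i, b in enumerate(seg):
            parity[i] ^= b
    return bytes(parity)

class SSPStore:
    """Stand-in for the solid state device preserving old stripe state."""
    def __init__(self):
        self._nvram = {}                      # stripe id -> preserved state
    def preserve(self, stripe_id, segments):  # state 6: before the write IO
        self._nvram[stripe_id] = [bytes(s) for s in segments]
    def retrieve(self, stripe_id):            # state 13: after the fault
        return self._nvram[stripe_id]
    def erase(self, stripe_id):               # state 14: cleanup
        self._nvram.pop(stripe_id, None)

ssp = SSPStore()
old = [b"\x01", b"\x02", b"\x04"]             # D0A, D1A, D2A
ssp.preserve("A", old)                        # preserved before the write

d0a_new = b"\x08"                             # write lands on P1, then P2 fails
d1a = ssp.retrieve("A")[1]                    # old D1A from the SSP copy
pa_new = xor_parity([d0a_new, d1a, old[2]])   # correct parity PA'
ssp.erase("A")
```

Because the parity PA′ is built from the preserved old D1A together with the new D0A′, the stripe is consistent again and the lost segment can be rebuilt correctly instead of producing the wrong data of the double fault scenario.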
If the second disk fails, the stripe set may be left with one new data segment and the old parity. Regenerating the data for the second disk may not be possible from the present stripe-set state. The data D1A (stored in the state 6) may instead be read from the storage device 110 (the state 13). The storage device 110 may provide the data D1A. The LUN0 106 may use the data D0A′, D1A and D2A to regenerate the parity PA′. In the state 14 of TABLE 2, the data stored in the storage device 110 during the state 6 may be erased.
- The system 100 may involve multiple cores (or controllers). Based on the design implementation, certain cores (e.g., write cores) may handle data reconstruction and IO write operations while other cores (e.g., read cores) may focus on read operations. In one example, the read cores may handle the Stripe-set State Preservation (SSP). For example, the SSP may involve reading an entire data segment in the stripe set (e.g., for a write operation) before the write operation begins. The data segment to be written may be read by the read cores in response to each write IO request. The data segment may also be stored in the storage device 110 (e.g., an NVRAM, a solid state drive, etc.). The corresponding data state with respect to the LUN0 106 may be mapped and maintained in a separate table by the controller B. The system 100 may prevent possible data corruption.
- Referring to FIG. 5, a flow diagram of the process 200 is shown illustrating an example embodiment of the present invention. The process 200 generally comprises a state 202, a state 204, a state 206, a state 208, a state 210, a state 212, a state 214, a state 216, a state 218, a state 220, a state 222, and a state 224. In the state 202, a write IO may be sent to a stripe set A of the LUN0 106. In the state 204, the controller A may trigger the read cores for a Stripe-set State Preservation (SSP) operation. In the state 206, the storage device 110 may store the SSP data. In the state 208, the controller A may trigger the write cores for an IO write. In the state 210, the controller A may enter a non-volatile SRAM (NVSRAM) interrupted write mode. In the state 212, the controller A may reboot. In the state 214, the LUN0 106 undergoing an IO may be forced to transfer to the controller B (e.g., an alternate controller). In the state 216, the next “N” writes may have old/new data in the stripe set A to regenerate the parity PA′. In the state 218, one physical drive under the LUN0 106 may fail before completion of the interrupted write. In the state 220, the LUN0 106 may retrieve the SSP data from the storage device 110. In the state 222, the parity PA′ may be generated with the correct old/new data stripes, avoiding data corruption. In the state 224, the storage device 110 may erase the SSP data.
- Referring to FIG. 6, a conceptual diagram of the Stripe-set State Preservation by the system 100 is shown. In the STATE 1, the stripe set A (e.g., a data block D0A, a data block D1A, a data block D2A, and a parity block PA) is shown written to the drive array 108. The data block D0A may represent a single data block or a plurality of data blocks. The data blocks D1A and D2A, as well as the parity block PA, may each represent one or more data blocks. In the STATE 2, a write IO may be sent for a new data block (e.g., D0A′). In the STATE 3, the stripe set A may be written to the storage device 110. The write operation for the data block D0A′ may be completed in the STATE 3. After the STATE 3, an interrupted write mode may occur without completing the write to the second disk (e.g., P2). While the LUN0 106 may try to generate the parity PA′ in the STATE 4, the write operation to the second disk (e.g., P2) may fail. The failure of the write to the second disk may result in the loss of the data block D1A. In the STATE 5, the data block D1A may be read from the storage device 110. In the STATE 6, the parity block PA′ may be generated. In the STATE 7, the stripe set A (stored in the STATE 3) may be erased. The STATES 1-7 generally describe different states used to implement a read-modify-write implementation. In general, the symbol “+” in FIG. 6 represents an exclusive OR.
- The system 100 may handle interrupted write modes using dual cores. For example, one or more cores may handle the IO thread for the write and one or more cores may handle the IO thread for reading the old data in the stripe set where the write is intended to be performed. The old data read (e.g., by the read cores) may be stored in the storage device 110 (e.g., an NVRAM) and the controller B may map the old data with respect to the stripe set location. Data corruption may be prevented during a double fault situation where an NVSRAM interrupted write mode condition happens within the controller (e.g., the owning controller A is rebooted) and one of the hard disk drives fails.
- As used herein, the term “NVSRAM interrupted write mode” is meant to describe the condition where a controller is in the middle of writing IO to a series of drives (or storage devices) and the controller hits an exception (e.g., a reboot, a failed state, etc.), thus interrupting the write sequence to a particular stripe set.
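The division of labor between read cores (SSP snapshot) and write cores (IO write) described above can be approximated with a small thread-pool sketch. The pool sizes, the in-memory stripe, and the `ssp_store` dictionary standing in for the storage device 110 are all assumptions for illustration, not details from the patent:

```python
# Illustrative split of SSP work ("read cores") from IO writes ("write cores").
# The pools, names, and in-memory stripe are invented for this sketch; the
# ssp_store dict stands in for the separate storage device 110 (e.g., NVRAM).
from concurrent.futures import ThreadPoolExecutor

read_pool = ThreadPoolExecutor(max_workers=2)   # "read cores": SSP reads
write_pool = ThreadPoolExecutor(max_workers=2)  # "write cores": IO writes

ssp_store = {}                                  # stripe_id -> preserved old state

def handle_write_io(stripe_id, stripe, index, new_segment):
    # Read cores snapshot the old stripe state before the write begins (SSP).
    snapshot = read_pool.submit(lambda: [bytes(s) for s in stripe]).result()
    ssp_store[stripe_id] = snapshot
    # Write cores then apply the new data segment to the stripe.
    write_pool.submit(stripe.__setitem__, index, new_segment).result()
    return snapshot

stripe_a = [b"\x01", b"\x10", b"\x0f"]          # D0A, D1A, D2A
old_state = handle_write_io("A", stripe_a, 0, b"\xaa")  # write D0A'
assert stripe_a[0] == b"\xaa" and ssp_store["A"][0] == b"\x01"
```

The key ordering constraint the sketch preserves is that the snapshot completes before the write lands, so the alternate controller always has a consistent pre-write image to map against the stripe-set location.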
- While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the invention.
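The double-fault recovery path of FIG. 6 (an interrupted parity write followed by a failed disk) can be traced end to end in a short sketch. All names and byte values are illustrative assumptions, not content from the patent figures:

```python
# Sketch of the FIG. 6 recovery path: an interrupted write leaves new data
# D0A' on disk with the old parity PA; the disk holding D1A then fails.
# The preserved stripe state (SSP) supplies D1A so PA' can still be built.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0a, d1a, d2a = b"\x01", b"\x10", b"\x0f"
pa = xor_blocks([d0a, d1a, d2a])                    # STATE 1: stripe on array
ssp = {"A": [d0a, d1a, d2a, pa]}                    # STATE 3: SSP copy saved

d0a_new = b"\xaa"                                   # STATE 2/3: D0A' written
d1a_recovered = ssp["A"][1]                         # STATE 5: D1A from SSP copy
pa_new = xor_blocks([d0a_new, d1a_recovered, d2a])  # STATE 6: regenerate PA'
del ssp["A"]                                        # STATE 7: erase SSP copy

# The "+" in FIG. 6 is XOR, so the incremental update agrees:
# PA' = PA xor D0A xor D0A'
assert xor_blocks([pa, d0a, d0a_new]) == pa_new
# Parity is consistent with the surviving plus recovered data.
assert xor_blocks([d0a_new, d2a, pa_new]) == d1a_recovered
```

Without the SSP copy, the step marked STATE 5 has no source for D1A: the disk is gone and the on-array parity PA still reflects the pre-write stripe, which is exactly the corruption window the patent addresses.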
Claims (18)
1. An apparatus comprising:
a storage array configured to be accessed by a plurality of controllers;
a first of said plurality of said controllers configured as a primary controller configured to read and write to and from said storage array during a normal condition;
a second of said plurality of said controllers configured as a secondary controller configured to read and write to and from said storage array during a fault condition; and
a solid state device configured to (i) store data and (ii) be accessed by said storage array and said secondary controller.
2. The apparatus according to claim 1 , wherein said storage array accesses said solid state device when (i) said primary controller experiences said fault condition and (ii) said storage array experiences a failure during a write.
3. The apparatus according to claim 1 , wherein said primary controller reboots during said fault condition.
4. The apparatus according to claim 3 , wherein said storage array transfers to said secondary controller in response to said fault condition.
5. The apparatus according to claim 4 , wherein said transfer is forced.
6. The apparatus according to claim 1 , wherein said secondary controller is in a passive state for redundancy during said normal condition.
7. The apparatus according to claim 1 , wherein said storage array comprises a plurality of physical drives.
8. The apparatus according to claim 1 , wherein said fault condition comprises a NVSRAM interrupted write mode.
9. The apparatus according to claim 1 , wherein said data comprises a stripe set.
10. The apparatus according to claim 9 , wherein said solid state device stores said data before said data is written to said storage array.
11. The apparatus according to claim 1 , wherein said data is stored by said solid state device for each write IO request.
12. An apparatus comprising:
means for storing information configured to be accessed by a plurality of controllers;
means for implementing a first of said plurality of said controllers as a primary controller for reading and writing to and from said means for storing during a normal condition;
means for implementing a second of said plurality of said controllers as a secondary controller for reading and writing to and from said means for storing during a fault condition; and
means for implementing a solid state device for (i) storing data and (ii) accessing by a storage array and said secondary controller.
13. A method for preventing data corruption during a double fault condition, comprising the steps of:
(A) creating a LUN to address a storage array comprising a plurality of storage devices;
(B) sending a write input/output request to said LUN;
(C) storing a data set of said write input/output to a storage device separate from said plurality of storage devices; and
(D) reading said data set from said separate storage device to generate a parity after a double fault condition.
14. The method according to claim 13 , wherein step (B) further comprises the step of:
generating said input/output request by a first controller.
15. The method according to claim 14 , wherein said double fault condition causes said first controller to reboot and allow a second controller to control said storage array.
16. The method according to claim 13 , wherein said double fault condition occurs in response to one of said storage devices failing.
17. The method according to claim 13 , wherein said plurality of storage devices each comprise a disk drive.
18. The method according to claim 14 , wherein said double fault condition comprises a failure of (i) said first controller and (ii) one of said storage devices.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/354,126 US20100180151A1 (en) | 2009-01-15 | 2009-01-15 | Method for handling interrupted writes using multiple cores |
US14/805,667 US9400716B2 (en) | 2009-01-15 | 2015-07-22 | Method for handling interrupted writes using multiple cores |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/354,126 US20100180151A1 (en) | 2009-01-15 | 2009-01-15 | Method for handling interrupted writes using multiple cores |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/805,667 Division US9400716B2 (en) | 2009-01-15 | 2015-07-22 | Method for handling interrupted writes using multiple cores |
Publications (1)
Publication Number | Publication Date |
---|---|
US20100180151A1 true US20100180151A1 (en) | 2010-07-15 |
Family
ID=42319871
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/354,126 Abandoned US20100180151A1 (en) | 2009-01-15 | 2009-01-15 | Method for handling interrupted writes using multiple cores |
US14/805,667 Active US9400716B2 (en) | 2009-01-15 | 2015-07-22 | Method for handling interrupted writes using multiple cores |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/805,667 Active US9400716B2 (en) | 2009-01-15 | 2015-07-22 | Method for handling interrupted writes using multiple cores |
Country Status (1)
Country | Link |
---|---|
US (2) | US20100180151A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5774643A (en) * | 1995-10-13 | 1998-06-30 | Digital Equipment Corporation | Enhanced raid write hole protection and recovery |
US6574709B1 (en) * | 1999-09-30 | 2003-06-03 | International Business Machine Corporation | System, apparatus, and method providing cache data mirroring to a data storage system |
US6820212B2 (en) * | 2001-02-20 | 2004-11-16 | Digi-Data Corporation | RAID system having channel capacity unaffected by any single component failure |
US6993610B2 (en) * | 2001-04-26 | 2006-01-31 | Richmount Computers Limited | Data storage system having two disk drive controllers each having transmit and receive path connected in common to single port of disk drive via buffer or multiplexer |
US7058848B2 (en) * | 2000-03-30 | 2006-06-06 | Hewlett-Packard Development Company, L.P. | Controller-based remote copy system with logical unit grouping |
US20080005614A1 (en) * | 2006-06-30 | 2008-01-03 | Seagate Technology Llc | Failover and failback of write cache data in dual active controllers |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6523087B2 (en) * | 2001-03-06 | 2003-02-18 | Chaparral Network Storage, Inc. | Utilizing parity caching and parity logging while closing the RAID5 write hole |
US6766491B2 (en) * | 2001-05-09 | 2004-07-20 | Dot Hill Systems Corp. | Parity mirroring between controllers in an active-active controller pair |
JP4435705B2 (en) * | 2005-03-14 | 2010-03-24 | 富士通株式会社 | Storage device, control method thereof, and program |
US8583865B1 (en) * | 2007-12-21 | 2013-11-12 | Emc Corporation | Caching with flash-based memory |
- 2009-01-15 US US12/354,126 patent/US20100180151A1/en not_active Abandoned
- 2015-07-22 US US14/805,667 patent/US9400716B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US9400716B2 (en) | 2016-07-26 |
US20150324263A1 (en) | 2015-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8289641B1 (en) | Partial data storage device failures and improved storage resiliency | |
US8156392B2 (en) | Apparatus, system, and method for bad block remapping | |
TWI450087B (en) | Data storage method for a plurality of raid systems and data storage system thereof | |
CN102023815B (en) | RAID is realized in solid-state memory | |
US10120769B2 (en) | Raid rebuild algorithm with low I/O impact | |
US20090327803A1 (en) | Storage control device and storage control method | |
US8904244B2 (en) | Heuristic approach for faster consistency check in a redundant storage system | |
US7689890B2 (en) | System and method for handling write commands to prevent corrupted parity information in a storage array | |
US11531590B2 (en) | Method and system for host-assisted data recovery assurance for data center storage device architectures | |
US20080184062A1 (en) | System and method for detecting write errors in a storage device | |
CN104813290A (en) | Raid surveyor | |
US7925918B2 (en) | Rebuilding a failed disk in a disk array | |
WO2015133982A1 (en) | Dram row sparing | |
TW202026874A (en) | Method and apparatus for performing dynamic recovery management regarding redundant array of independent disks and storage system operating according to the method | |
JP2013125513A (en) | Nonvolatile semiconductor memory device and management method therefor | |
US20060259812A1 (en) | Data protection method | |
US7730370B2 (en) | Apparatus and method for disk read checking | |
TW201329701A (en) | Automatic remapping in redundant array of independent disks and related raid | |
JP2005004753A (en) | Method and device of performing data version checking | |
US7577804B2 (en) | Detecting data integrity | |
US7174476B2 (en) | Methods and structure for improved fault tolerance during initialization of a RAID logical unit | |
US8671264B2 (en) | Storage control device and storage system | |
US9400716B2 (en) | Method for handling interrupted writes using multiple cores | |
JPH09218754A (en) | Data storage system | |
JP4218636B2 (en) | Storage device for storing portable media |
Legal Events
Date | Code | Title | Description
---|---|---|---
| AS | Assignment | Owner name: LSI CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: JIBBE, MAHMOUD K.; KANNAN, SENTHIL; RASAPPAN, SELVARAJ; SIGNING DATES FROM 20090114 TO 20090115; REEL/FRAME: 022112/0312
| AS | Assignment | Owner name: NETAPP, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: LSI CORPORATION; REEL/FRAME: 026656/0659. Effective date: 20110506
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION