US 8180744 B2
Abstract
A particular data value is represented as a group of segments stored in corresponding entries of a data structure. Additional data values represented by corresponding groups of segments are written into the data structure. The probability of overwriting segments representing the particular data value increases as the number of additional data values increases. A correct version of the particular data value is retrieved even though one or more segments representing the particular data value have been overwritten.
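The abstract says that the chance of overwriting a value's segments grows with every later write. That claim can be quantified. The description defines the expected life of a pair (K, V) as the number n of later insertions at which the probability of correctly retrieving (K, V) falls below 0.5.

The Python sketch below estimates that number. Its assumptions do not come from the patent:

- the table has m rows that each hold one segment;
- every insertion writes r segments into rows chosen independently and uniformly at random;
- segment survivals are treated as independent;
- the error correction tolerates up to t overwritten segments;
- separate copies of a value (as in claims 13 and 16) degrade independently.

All function names are hypothetical.

```python
from math import comb


def p_segment_survives(n: int, m: int, r: int) -> float:
    """Probability that one stored segment is untouched after n later insertions.

    Assumes each insertion writes r segments to rows drawn independently and
    uniformly from m rows, so a given row escapes one write with chance 1 - 1/m.
    """
    return (1.0 - 1.0 / m) ** (r * n)


def p_correct_retrieval(n: int, m: int, r: int, t: int, copies: int = 1) -> float:
    """Probability that a value is still retrievable after n later insertions.

    The error correction is assumed to tolerate up to t overwritten segments
    out of r. Segment survivals are treated as independent, so the number of
    overwritten segments is Binomial(r, 1 - q).
    """
    q = p_segment_survives(n, m, r)
    p_one = sum(comb(r, j) * (1.0 - q) ** j * q ** (r - j) for j in range(t + 1))
    # With independent copies, retrieval fails only if every copy fails.
    return 1.0 - (1.0 - p_one) ** copies


def expected_life(m: int, r: int, t: int, copies: int = 1) -> int:
    """Smallest n at which the probability of correct retrieval drops below 0.5."""
    if not 0 <= t < r:
        raise ValueError("need 0 <= t < r; otherwise the value never degrades")
    n = 0
    while p_correct_retrieval(n, m, r, t, copies) >= 0.5:
        n += 1
    return n


if __name__ == "__main__":
    print("one copy:    ", expected_life(m=1024, r=7, t=3))
    print("three copies:", expected_life(m=1024, r=7, t=3, copies=3))
```

For r = 7 and t = 3 the binomial tail is symmetric. The retrieval probability is therefore exactly 1/2 when the survival probability q = (1 - 1/m)^(7n) equals 1/2, which gives n ≈ m ln 2 / 7 ≈ 101 for m = 1024. The search stops just past that point. Storing three copies pushes the crossing out considerably, which is the diversity effect that claim 13 describes.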
Claims (23)
1. A method executed by a computer of managing storing of data in a data structure, comprising:
representing a particular data value as a group of segments stored in corresponding entries of the data structure;
writing additional data values represented by corresponding groups of segments into the data structure, wherein a probability of overwriting segments representing the particular data value increases as a number of the additional data values increase; and
retrieving, using error correction, a correct version of the particular data value even though one or more segments representing the particular data value has been overwritten,
wherein the error correction enables retrieval of the particular data value even if up to a predetermined number of the segments representing the particular data value has been overwritten, and
wherein the particular data value becomes non-retrievable if more than the predetermined number of the segments representing the particular data value has been overwritten.
2. The method of
causing gradual degradation of the particular data value with the writing of the additional data values written into the data structure, wherein the gradual degradation causes the particular data value to become irretrievable after some number of writes of the additional data items.
3. The method of
4. The method of
storing the data values in the data structure that is a hash table arranged as rows and columns;
hashing a key associated with the particular data value plural times to produce plural hash values;
storing the segments representing the particular data value in respective rows of the hash table according to the corresponding hash values.
5. The method of
6. The method of
hashing a key associated with each of the additional data values plural times to produce plural hash values for the corresponding key; and
storing the segments representing each of the additional data values in respective rows of the hash table according to the hash values of the corresponding additional data values.
7. The method of
8. The method of
9. The method of
10. The method of
in response to receiving a given key to retrieve a corresponding given data value from the data structure, retrieving the segments corresponding to the given data value from the data structure; and
decoding a code constructed from the retrieved segments to determine whether the given data value is retrievable.
11. The method of
12. The method of
13. The method of
inserting plural copies of the particular data value into the data structure to provide diversity for the particular data value, and to increase an expected lifetime of the particular data value in the data structure compared to some other data values in the data structure.
14. The method of
15. The method of
dividing a second data value to be stored in the data structure into a plurality of parts;
encoding each of the plurality of parts to produce a corresponding set of segments; and
storing the sets of segments that represent the corresponding plurality of parts in the data structure.
16. A method executed by a computer of managing storing of data in a data structure, comprising:
representing a particular data value as a group of segments stored in corresponding entries of the data structure;
writing additional data values represented by corresponding groups of segments into the data structure, wherein a probability of overwriting segments representing the particular data value increases as a number of the additional data values increase;
retrieving a correct version of the particular data value even though one or more segments representing the particular data value has been overwritten;
inserting plural copies of the particular data value into the data structure to provide diversity for the particular data value, and to increase an expected lifetime of the particular data value in the data structure compared to some other data values in the data structure, wherein inserting the plural copies of the particular data value comprises inserting plural corresponding groups of segments into the data structure, where each of the plural groups represents a respective one of the plural copies of the particular data value;
defining plural families of hash functions for the particular data value, where each of the plural families includes respective hash functions; and
for each of the plural groups of the segments, hashing a key associated with the particular data value with a respective one of the plural families of hash functions.
17. The method of
18. A method executed by a computer, comprising:
receiving a first pair of a key and data value to be stored in a hash table;
applying plural functions on the key to produce plural pointer values that point to different entries of the hash table;
encoding the data value to produce segments;
storing the segments representing the data value into entries of the hash table identified by the pointer values, wherein a first of the segments is stored in a first of the entries identified by a first of the pointer values, and a second of the segments is stored in a second of the entries identified by a second of the pointer values; and
retrieving, using error correction, a correct version of the data value even though one or more segments representing the data value has been overwritten, wherein the error correction enables retrieval of the data value even if up to a predetermined number of the segments representing the data value has been overwritten, and
wherein the data value becomes non-retrievable if more than the predetermined number of the segments representing the data value has been overwritten.
19. The method of
receiving additional pairs of keys and data values;
storing the additional pairs of keys and data values into the hash table,
wherein storing the additional pairs of keys and data values causes gradual degradation of the data value of the first pair.
20. The method of
21. An article comprising at least one computer-readable storage medium containing instructions that when executed cause a computer to:
store plural representations of a first data value in a data structure;
store additional data values in the data structure after storing the plural representations of the first data value, wherein storing the additional data values causes gradual degradation of at least one of the plural representations of the first data value;
in response to a request to retrieve the first data value, retrieve the plural representations of the first data value; and
decode the retrieved plural representations to determine whether the first data value is retrievable, wherein the decoding includes using error correction to recover the first data value, where the error correction enables recovery of the first data value even if up to a predetermined number of segments of a particular one of the plural representations has been overwritten, and
wherein the first data value becomes non-retrievable from the particular representation if more than the predetermined number of the segments of the particular representation has been overwritten.
22. The article of
23. The article of
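Claims 13 and 16 extend a value's expected lifetime by inserting plural copies of it. Each copy is its own group of segments, placed by its own family of hash functions. The Python sketch below illustrates that idea under assumptions that do not come from the patent:

- each family is SHA-256 salted with the family index and the hash index;
- every segment is a full copy of the value, a repetition code standing in for the unspecified error-correcting code;
- retrieval decodes each copy's group by strict majority and returns the first group that succeeds.

`family_pointers` and `MultiCopyTable` are hypothetical names.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def family_pointers(key: str, family: int, r: int, num_rows: int) -> list[int]:
    """Hypothetical hash family: r SHA-256 hashes of the key, salted with (family, i)."""
    pointers = []
    for i in range(r):
        digest = hashlib.sha256(f"{family}:{i}:{key}".encode("utf-8")).digest()
        pointers.append(int.from_bytes(digest[:8], "big") % num_rows)
    # Keep each row once so a single physical row never casts two votes.
    return list(dict.fromkeys(pointers))


class MultiCopyTable:
    """Fixed-size table in which each copy of a value is its own group of segments."""

    def __init__(self, num_rows: int, r: int) -> None:
        self.num_rows = num_rows
        self.r = r
        self.rows: list[bytes | None] = [None] * num_rows

    def store(self, key: str, value: bytes, copies: int = 1) -> None:
        # Copy c is placed by hash family c; later writes may overwrite any row.
        for family in range(copies):
            for row in family_pointers(key, family, self.r, self.num_rows):
                # Repetition-code stand-in: every segment holds the whole value.
                self.rows[row] = value

    def retrieve(self, key: str, copies: int = 1) -> bytes | None:
        # Decode each copy's group separately; return the first that succeeds.
        for family in range(copies):
            rows = family_pointers(key, family, self.r, self.num_rows)
            counts = Counter(self.rows[row] for row in rows if self.rows[row] is not None)
            if counts:
                value, votes = counts.most_common(1)[0]
                if votes > len(rows) // 2:  # strict majority of this group's rows
                    return value
        return None


if __name__ == "__main__":
    table = MultiCopyTable(num_rows=1024, r=7)
    table.store("important", b"keep me", copies=3)
    table.store("ordinary", b"plain")
    for n in range(200):
        table.store(f"filler-{n}", f"v{n}".encode())
    print(table.retrieve("important", copies=3), table.retrieve("ordinary"))
```

The copies land in unrelated rows. Filler insertions therefore have to damage every group before the important value is lost. The single-copy value is lost as soon as its one group is damaged past the correction threshold.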
Description
This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Application Ser. No. 61/033,811, entitled “MANAGING STORAGE OF DATA IN A DATA STRUCTURE,” filed Mar. 5, 2008.
In computers, storage devices (such as memory devices) are used to store various data involved in the execution of software or to perform other tasks, such as communications tasks, management tasks, and so forth. Data structures, such as tables, stored in storage devices often have fixed sizes. Examples of fixed-size data structures include lookup tables used in cache memory subsystems, lookup tables used for database applications, and so forth. With a fixed-size data structure, an algorithm conventionally has to be provided to explicitly select a data item in the data structure to remove (to eject the data item) so that space is freed up to enable addition of a new data item to the data structure. An example of such an algorithm is a least recently used (LRU) replacement algorithm. However, having to provide an algorithm to explicitly eject (remove) a data item from a data structure adds complexity to a system.
Some embodiments of the invention are described, by way of example, with respect to the following figures:
In accordance with some embodiments, a technique is provided to enable storage of data items into a fixed-size data structure without having to provide an explicit eject mechanism for selecting which data items in the data structure to remove so that new data can be inserted into the data structure. A technique according to some embodiments gradually degrades data items stored in the data structure by probabilistically overwriting different portions of existing data items as new data items are inserted into the data structure.
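The storage side of this technique can be sketched in Python: a key is hashed plural times, and each segment is written to the row its hash selects, as in claims 4 and 18. This minimal illustration rests on assumptions not taken from the patent:

- the r hash functions are SHA-256 salted with the hash index;
- each row holds exactly one segment;
- each row also records which key and segment index wrote it, purely so the sketch can count how many of a value's segments survive.

`row_pointers`, `FixedSizeTable`, and `surviving_segments` are hypothetical names.

```python
from __future__ import annotations

import hashlib


def row_pointers(key: str, r: int, num_rows: int) -> list[int]:
    """Hypothetical: r SHA-256 hashes of the key (salted with the index) as row pointers."""
    pointers = []
    for i in range(r):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        pointers.append(int.from_bytes(digest[:8], "big") % num_rows)
    return pointers


class FixedSizeTable:
    """A fixed number of rows and no eviction policy: every write simply overwrites."""

    def __init__(self, num_rows: int, r: int) -> None:
        self.num_rows = num_rows
        self.r = r
        # Each row holds (owner key, segment index, segment) or None. The owner
        # and index are bookkeeping for this demo only, used to count survivors.
        self.rows: list[tuple[str, int, bytes] | None] = [None] * num_rows

    def store(self, key: str, segments: list[bytes]) -> None:
        if len(segments) != self.r:
            raise ValueError(f"expected {self.r} segments, got {len(segments)}")
        # Segment i goes to the row chosen by the i-th hash of the key,
        # replacing whatever was there; nothing is ever explicitly ejected.
        for i, row in enumerate(row_pointers(key, self.r, self.num_rows)):
            self.rows[row] = (key, i, segments[i])

    def surviving_segments(self, key: str) -> int:
        """Count how many of this key's segments are still intact."""
        count = 0
        for i, row in enumerate(row_pointers(key, self.r, self.num_rows)):
            entry = self.rows[row]
            if entry is not None and entry[0] == key and entry[1] == i:
                count += 1
        return count


if __name__ == "__main__":
    table = FixedSizeTable(num_rows=1024, r=7)
    table.store("first", [bytes([i]) for i in range(7)])
    for n in range(1, 301):
        table.store(f"key-{n}", [b"\x00"] * 7)
        if n % 100 == 0:
            print(n, "insertions later:", table.surviving_segments("first"), "segments left")
```

Running it shows the surviving-segment count for `first` only ever falling as unrelated keys are stored. No code path removes an entry on purpose, yet the old value gradually degrades.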
As more data items are inserted into the data structure, the degradation of earlier data items (data items written into the data structure at an earlier time) increases until, at some point, the earlier data items are considered lost and cannot be retrieved. A data item that has been degraded to the point that it is no longer retrievable is considered to have “exited” the data structure, even though no explicit eject mechanism has been provided to remove this data item.
A “data item” (or interchangeably, “data value”) refers generally to a unit of data that can be stored into a data structure. In accordance with some embodiments, the data item or data value is first encoded, and it is the encoded version of the data item or data value that is stored in the data structure. Thus, storing a data item or data value in a data structure can refer to storing an encoded version of the data item or data value in the data structure. A “data structure” refers to some predefined arrangement that provides entries for accepting data.
More specifically, it is desired to store p key-data value pairs (K Each key K As depicted in As further depicted in In accordance with some embodiments, a “time arrow” feature is associated with the data structure To enable the retrieval of a correct version of data values from the data structure
An error correction code can be used to recover the original data value (produce a correct version of the original data value) even when some portion(s) of the data value have been overwritten.
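The patent does not name a particular error-correcting code in this passage. As a deliberately simple stand-in, the Python sketch below uses a repetition code: every segment is a full copy of the value, and decoding takes a strict majority vote. It recovers the value when at most t = (r - 1) // 2 segments have been overwritten and otherwise reports the value as non-retrievable. This mirrors the “predetermined number” of segments in claims 1 and 18. The `encode` and `decode` names are hypothetical. A practical system would more likely use a denser code such as Reed-Solomon.

```python
from collections import Counter
from typing import List, Optional, Sequence


def encode(value: bytes, r: int) -> List[bytes]:
    """Repetition-code stand-in: each of the r segments is a full copy of the value."""
    return [value] * r


def decode(segments: Sequence[Optional[bytes]]) -> Optional[bytes]:
    """Strict-majority decoding of a repetition code.

    Recovers the value whenever at most t = (r - 1) // 2 of the r segments
    were overwritten. Beyond t it usually returns None, but if the overwriting
    segments happen to agree with each other it can return a wrong value, as
    any code pushed past its correction radius can.
    """
    r = len(segments)
    counts = Counter(s for s in segments if s is not None)
    if not counts:
        return None
    candidate, votes = counts.most_common(1)[0]
    # More than r // 2 agreeing votes means at most t segments disagree.
    return candidate if votes > r // 2 else None


if __name__ == "__main__":
    segments = encode(b"hello", 7)          # t = 3 for r = 7
    segments[0], segments[3], segments[5] = b"x1", b"x2", b"x3"
    print(decode(segments))                 # three overwrites: still recoverable
    segments[6] = b"x4"
    print(decode(segments))                 # four overwrites: non-retrievable
```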
Adding error correction codes when storing data values into the data structure In some embodiments, the data structure In some examples, the data structure As further depicted in The Store software module A process performed by the Store software module An input key-data value pair (K Each data value V The r segments are stored (at For example, in The process of Next, the key K If the original data value V
In the foregoing, it was assumed that all key-data value pairs stored in the data structure The expected life of a key-data value pair (K, V) is the value of n (n insertions after (K, V) has been inserted) for which the probability of correct retrieval of (K, V) falls below 0.5. It may be desirable for more important key-data value pairs to have a longer expected life.
To counteract the degradation of a data value stored in the data structure Multiple families H For each such family, a corresponding group of r segments representing the respective data value V is stored into the data structure
The following describes the algorithm used for retrieving a key-data value pair that is associated with multiple families of hash functions. It is desired to retrieve a pair (K For each family of hash functions, a respective set of r segments is retrieved (at The multiple versions C′ Effectively each of the u parts v In storing the r segments for each part v The above process is depicted in
Instructions of software described above (including the Store module Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more computer-readable or computer-usable storage media.
The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs). Note that the instructions of the software discussed above can be provided on one computer-readable or computer-usable storage medium, or alternatively, can be provided on multiple computer-readable or computer-usable storage media distributed in a large system having possibly plural nodes. Such computer-readable or computer-usable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.