WO2011139176A1 - Multidimensional database and the method of control thereof - Google Patents


Info

Publication number
WO2011139176A1
Authority
WO
WIPO (PCT)
Application number
PCT/RU2010/000532
Other languages
French (fr)
Inventor
Andrey Evgenevich Vasilev
Original Assignee
Andrey Evgenevich Vasilev
Application filed by Andrey Evgenevich Vasilev

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

A multidimensional database whose physical data structure consists of elements logically corresponding to the entry point, hypercubes, dimensions, coordinate descriptors and coordinates of data vectors of a multidimensional space. Each element is represented by an identifier, a transaction number, a code and a set of pointers of four types. Each element of the paired binary trees of hypercubes is connected to the roots of paired binary trees of dimensions, comprising elements with odd and even identifier codes respectively. Each element of the paired binary trees of dimensions is connected to the roots of paired binary trees of descriptors, comprising elements with odd and even identifier codes respectively. Each element of the paired binary trees of descriptors is connected to the coordinate identifier with the matching code. A method of controlling the multidimensional database is disclosed, comprising writing, reading, deletion, alteration and compression operations, and restoration of the internal consistency of information after an abnormal shutdown of the database.

Description

Multidimensional database and method of control thereof
DESCRIPTION
STATE OF THE ART
The invention relates to indexed arrays of information logically organized into databases and stored on physical media, e.g., magnetic disks, and to methods of controlling databases implemented in software on computers.
A database (DB) is characterized by its logical model and consists of separate records, i.e., data with identifying characteristics. Groups of data are united into subsets.
Information includes the data themselves, which describe objects of the application environment, and metadata, which systematize those objects. The physical data structure provides storage and processing of information by software means.
A method of DB control is based on performing data writing, reading, deletion, alteration and compression operations and on restoring the internal consistency of information after an abnormal DB shutdown. Operations on information are performed as transactions, i.e., groups of sequential operations representing logical units of data processing.
A multidimensional model of DB data, logically represented as a multidimensional space, is known in the art. The multidimensional space is defined by dimensions (the coordinate axes of the space), hypercubes (regions of the space) and vectors (cells of the space). Direct access to the cells of the multidimensional space provides high speed for all data processing operations. However, as the number of dimensions and coordinate values grows, the DB volume grows many times over, although most cells of the multidimensional space remain empty. To reduce the volume of a multidimensional DB, its physical data structure is divided into several levels designed to store the cells of the multidimensional space, the data and the metadata separately. An example is the multidimensional DB described in US Patent No. 5359724 of October 25, 1994.
The first structural level holds the full set of all possible combinations of coordinates along the coordinate axes, which represent the cells of the multidimensional space. The cells contain pointers to data blocks on the second level. Metadata are located on the third structural level. Each data block is defined for a limited set of coordinates so that a data subset can be written into the block. The number of cells of the multidimensional space significantly exceeds the number of data blocks, which increases the DB size.
The method of controlling this DB records information changes by replacing previous data and metadata values, and restores information consistency by writing both the DB file and a transaction log file, which increases DB access time.
Prior art.
The invention relates generally to multiuser DBs and methods of controlling them. The invention disclosed in Russian patent No. 2389066 was chosen as the prototype.
The logical model of the prototype includes dimensions consisting of coordinate values, hypercubes consisting of coordinate descriptors, and data comprising coordinates united into hierarchic and variant groups.
The logical data model of the prototype differs from other multidimensional DBs in that it forms an incomplete set of multidimensional space cells corresponding only to meaningful data, writes data directly to those cells, and accesses data via descriptors. The prototype uses a three-level data structure: the first level comprises dimension tables, the second comprises descriptor vectors, and the third comprises data vectors. The DB includes data subsets, i.e., hypercubes.
The method of controlling the DB includes performing information (data) writing, reading, deletion, alteration and compression operations and restoring the internal consistency of information after an abnormal DB shutdown.
Information is written as tables, in the form of dimension coordinate values, and as vectors, in the form of data versions and their related coordinate descriptors. Information is read by searching for data versions and then consolidating them. Data are deleted and changed by writing additional data versions.
Information is compressed by creating second copies of hypercubes for writing new information and third copies of hypercubes for writing consolidated information previously stored in the DB. After compression is complete, the second and third copies of the hypercubes are linked to each other, and the first copies are deleted from the DB.
Internal consistency of information is recovered after an emergency DB shutdown by writing descriptor vectors ahead of the data vectors before the shutdown and comparing their identifier (ID) values during the recovery phase.
However, the prototype uses two different physical data structures: dimensions and coordinate values are stored in the DB as tables, while descriptors and the data themselves are stored as vectors. The resulting non-uniform physical data structure increases data search time and additionally causes the overall DB volume to grow. Data compression in the prototype relies on forming two additional copies of hypercubes, which also increases DB volume.
Writing descriptor vectors ahead of data vectors and then comparing their coordinate values, as the prototype does, slows down the recovery of internal data consistency after an emergency DB shutdown, because every vector whose coordinate identifiers match the coordinate identifiers of the descriptor vectors written last before the shutdown must be reviewed.
The technical result of this invention is a decrease in DB volume and access time.
SUMMARY OF THE INVENTION
The declared technical result is achieved by forming a uniform physical data structure and by reducing the number of steps performed on DB information.
The physical data structure consists of elements that logically correspond to the entry point, hypercubes, dimensions, coordinate descriptors and coordinates of data vectors of the multidimensional space.
The term "dimension" as used herein means a direction of reference, or a coordinate.
Each element is represented by its identifier, which comprises an address equal to the offset from the beginning of the multidimensional database file, the number of the transaction in which the identifier was written, a code corresponding to the identifier value, and a set of pointers of four types comprising the addresses of related identifiers and indicators.
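As a rough illustration, the identifier format can be modelled as a small record. This is a Python sketch, not the on-disk layout of the invention: the class names `Pointer` and `Identifier`, the helper `ptr`, and the convention that address 0 means "no link" are assumptions made for illustration; field widths and byte encoding are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pointer:
    """One pointer slot: address of a linked identifier plus an indicator."""
    address: int = 0    # 0 is taken to mean "no link" in this sketch
    indicator: int = 0  # meaning depends on the structural level


@dataclass
class Identifier:
    address: int      # offset from the beginning of the DB file
    transaction: int  # number of the transaction that wrote this identifier
    code: int         # identifier code, used as the search key in binary trees
    pointers: List[Pointer] = field(
        default_factory=lambda: [Pointer() for _ in range(4)])

    def ptr(self, kind: int) -> Pointer:
        """Return the pointer of type 1..4 (numbered from one, as in the text)."""
        if not 1 <= kind <= 4:
            raise ValueError("pointer type must be 1, 2, 3 or 4")
        return self.pointers[kind - 1]


# Entry point at the beginning of the file; its third type pointer is set
# to a hypothetical root address of the odd-code hypercube tree.
entry = Identifier(address=0, transaction=1, code=0)
entry.ptr(3).address = 128
```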
The entry point identifier is located at the beginning of the physical file to which data are written sequentially (hereinafter, the file). Its first type pointer comprises the address of the identifier written last to the file, its second type pointer comprises the address of the second copy of the entry point, its third type pointer comprises the address of the root of the hypercube binary tree with odd identifier codes, and its fourth type pointer comprises the address of the root of the hypercube binary tree with even identifier codes.
Identifiers of hypercubes, dimensions, descriptors and coordinates are located in the file in the order in which they are written.
Hypercube identifiers are linked by first and second type pointers into hypercube binary trees (one with even and one with odd elements). Identifier codes are the keys for searching elements within the hypercube binary trees. Third type pointers comprise the addresses of the roots of dimension binary trees with odd identifier codes, while fourth type pointers comprise the addresses of the roots of dimension binary trees with even identifier codes.
Dimension identifiers are linked by first and second type pointers into dimension binary trees (one with even and one with odd elements). Identifier codes are the keys for searching elements within the dimension binary trees. Third type pointers comprise the addresses of the roots of descriptor binary trees with odd identifier codes, while fourth type pointers comprise the addresses of the roots of descriptor binary trees with even identifier codes.
Descriptor identifiers are linked by first and second type pointers into descriptor binary trees (one with even and one with odd elements). Identifier codes are the keys for searching elements within the descriptor binary trees. Third type pointers comprise the addresses of the last-written coordinate identifiers with matching codes contained in multilinked coordinate lists. Fourth type pointers comprise the addresses of the first-written coordinates of data vectors that contain coordinate identifiers with matching codes.
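The paired trees at the hypercube, dimension and descriptor levels can be sketched in memory as two ordinary binary search trees, one for odd codes and one for even codes, where the `smaller` and `bigger` links play the role of the first and second type pointers. The names `Node`, `PairedTrees`, `insert` and `find` are hypothetical, nodes are Python objects rather than file offsets, and no balancing is performed because the text does not mention any.

```python
from typing import Dict, Optional


class Node:
    def __init__(self, code: int, address: int) -> None:
        self.code = code
        self.address = address
        self.smaller: Optional["Node"] = None  # role of the first type pointer
        self.bigger: Optional["Node"] = None   # role of the second type pointer


class PairedTrees:
    """roots[1] holds the tree of odd codes, roots[0] the tree of even codes."""

    def __init__(self) -> None:
        self.roots: Dict[int, Optional[Node]] = {0: None, 1: None}

    def insert(self, code: int, address: int) -> None:
        new = Node(code, address)
        parity = code % 2
        node = self.roots[parity]
        if node is None:
            self.roots[parity] = new
            return
        while True:  # plain (unbalanced) BST descent keyed by identifier code
            if code < node.code:
                if node.smaller is None:
                    node.smaller = new
                    return
                node = node.smaller
            else:
                if node.bigger is None:
                    node.bigger = new
                    return
                node = node.bigger

    def find(self, code: int) -> Optional[int]:
        """Return the address stored for `code`, or None if absent."""
        node = self.roots[code % 2]  # parity selects which tree to search
        while node is not None:
            if code == node.code:
                return node.address
            node = node.smaller if code < node.code else node.bigger
        return None


trees = PairedTrees()
for code, addr in [(4, 100), (7, 140), (2, 180), (9, 220)]:
    trees.insert(code, addr)
print(trees.find(9), trees.find(2), trees.find(5))  # 220 180 None
```

Splitting by parity halves the number of elements each search has to walk through, while each half remains an ordinary search tree.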
Coordinate identifiers are linked by pointers into multilinked coordinate lists. The first type pointers of the first-written coordinate identifiers comprise the addresses of descriptor identifiers with matching codes; the first type pointers of subsequently written coordinate identifiers comprise the addresses of the previous coordinate identifiers with matching codes. Second type pointers comprise the addresses of the first-written coordinate identifiers included in the previous data vector. Third type pointers comprise the addresses of coordinate identifiers included in the same hierarchic group of the data vector. Fourth type pointers comprise the addresses of coordinate identifiers included in the same variant group of the data vector.
The indicator of the third type pointer of a descriptor identifier comprises the number of linked coordinate identifiers with a matching code. The indicator of the fourth type pointer of a descriptor identifier comprises the address of the last coordinate identifier with a matching code within the data vector. The indicator of the first type pointer of a coordinate identifier comprises the number of coordinate identifiers that precede it in writing (storage) order within the coordinate list with a matching code. The indicator of the second type pointer of a coordinate identifier comprises the address of the last coordinate identifier within the previous data vector. The indicator of the third type pointer of a coordinate identifier comprises the hierarchy level of the coordinate within the hierarchic group of coordinates of the data vector. The indicator of the fourth type pointer of a coordinate identifier comprises the sequence number of the variant group of coordinates within the data vector.
The DB hypercubes include an auxiliary (service) hypercube comprising the full set of dimensions, each of which comprises the full set of its coordinates. The set of dimensions includes an auxiliary (service) dimension whose coordinates are the dimensions of the multidimensional space. Data vectors belonging to the data subset of the auxiliary hypercube comprise the names of the database, hypercubes, dimensions and coordinates that correspond to the codes of coordinate identifiers; these names are represented as fragments of unstructured information located between coordinate identifiers.
Each data vector includes auxiliary coordinate identifiers located at the first and second hierarchy levels, relating respectively to the dimension of data types within the hypercube subset and to the dimension of unique identification (ID) numbers of data, which are assigned as data are included into the hypercube subset.
The method of multidimensional DB control comprises data writing, reading, deletion, alteration and compression operations and restoring the internal consistency of information after an abnormal DB shutdown.
Data are written as a set of identifiers included in the same transaction. New values of the entry point, hypercube, dimension and descriptor identifiers that were previously written to the file are written by changing the previously written values of those identifiers. Coordinate identifiers included in a transaction are written before the descriptor identifiers included in the same transaction.
Information is read on the basis of a search key composed of the identifier codes of the coordinates included in the target data vectors. Hypercube, dimension and descriptor identifiers are read by sequentially searching for physical data structure elements whose codes match the identifier codes of the search key coordinates.
Coordinate identifiers are read as follows: the descriptor identifier with the lowest third type pointer indicator value is selected; a transition is made to the coordinate identifier with the matching code; all coordinate identifiers within the same data vector are read; the data vector is included in the search report if its coordinate identifiers match the search key; the coordinate identifier that is included in the search key and has the lowest first type pointer indicator value is selected; and the transition to the next coordinate identifiers within data vectors continues until a coordinate identifier with a zero first type pointer indicator value is reached.
Information is deleted and altered at the logical level.
Hypercube, dimension and descriptor identifiers are deleted by assigning a zero value to the addresses of their third and fourth type pointers, and are altered by assigning a zero value to the indicators of their third and fourth type pointers.
Coordinate identifiers are deleted by writing new versions of data vectors that contain the deleted coordinate identifiers with zero codes and unchanged address values. Coordinate identifiers are altered by writing new versions of data vectors that contain the changed coordinate identifiers with changed codes and unchanged address values.
Data are compressed by writing second copies of the entry point, hypercube, dimension, descriptor and coordinate identifiers, consolidating data vectors, and then deleting the first copies of the identifiers from the file.
Second copies of hypercube, dimension and descriptor identifiers are written within a single transaction, while zero address values are simultaneously assigned to the third and fourth type pointers of previously deleted identifiers.
Second copies of coordinate identifiers are written in separate transactions after the versions of data vectors have been consolidated, without including deleted and changed coordinate identifiers in the consolidated versions.
After compression, information is written and read using the pointer addresses of hypercube, dimension, descriptor and coordinate identifiers, increased or decreased as appropriate by the address value contained in the entry point identifier.
The internal consistency of information after an abnormal shutdown of the database is restored by checking that the pointer addresses of descriptor identifiers correspond to the actual offsets of coordinate identifiers from the beginning of the file.
The check starts by reading the coordinate identifiers of the last write transaction, reading the descriptor identifier whose code matches the first coordinate identifier of the transaction, and comparing the address of that descriptor identifier's third type pointer with the actual offset of the first coordinate identifier.
If the third type pointer address of the descriptor identifier matches the offset of the first coordinate identifier, the remaining coordinate identifiers of the last transaction are read and checked in the same way, by matching the third type pointer addresses of the corresponding descriptor identifiers against their actual offsets. If the third type pointer address of the descriptor identifier does not match the offset of the first coordinate identifier, all coordinate identifiers of the last write transaction are deleted from the file.
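A minimal sketch of this check follows, assuming the last transaction's coordinate identifiers are available as (code, offset) pairs in writing order and the descriptors as a mapping from code to third type pointer address. The function name is hypothetical. Because a descriptor's third type pointer holds the address of the last-written coordinate with its code, the sketch compares against the last offset seen for each code in the transaction; treating a mismatch on any later coordinate as grounds for rollback is an assumption, since the text only states the rollback rule for the first coordinate.

```python
from typing import Dict, List, Tuple


def last_transaction_is_consistent(
    coords: List[Tuple[int, int]],  # (code, actual offset) in writing order
    third_ptr: Dict[int, int],      # descriptor code -> third type pointer address
) -> bool:
    """True: keep the last transaction. False: delete its coordinate identifiers."""
    if not coords:
        return True
    # A descriptor points at the LAST-written coordinate with its code, so
    # remember the last offset seen for each code within the transaction.
    last_offset: Dict[int, int] = {}
    for code, offset in coords:
        last_offset[code] = offset
    # Step 1: descriptor of the first coordinate identifier of the transaction.
    first_code = coords[0][0]
    if third_ptr.get(first_code) != last_offset[first_code]:
        return False
    # Step 2: the remaining codes of the transaction (assumed to use the same test).
    return all(third_ptr.get(code) == off for code, off in last_offset.items())


coords = [(10, 500), (11, 540), (10, 580)]
print(last_transaction_is_consistent(coords, {10: 580, 11: 540}))  # True
print(last_transaction_is_consistent(coords, {10: 420, 11: 300}))  # False
```

Because descriptors are written after the coordinates of the same transaction, a crash between the two leaves the descriptors pointing at older offsets, which is exactly what the second call models.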
DESCRIPTION OF DRAWINGS
FIG. 1. Data physical structure diagram.
FIG. 2. Hypercubes binary tree diagram.
FIG. 3. Data vector diagram.
FIG. 4. Diagram of information search in the DB.
DETAILED DESCRIPTION
DESCRIPTION OF DEVICE (SYSTEM)
As regards the device, the technical result is achieved by organizing the DB physical data structure as a set of identifiers of a single type. At the first, second, third and fourth structural levels the identifiers are included in one-coordinate vectors; at the fifth level they are included in multi-coordinate vectors.
The entry point identifier is located at the beginning of the DB file; identifiers of other types are placed in the order in which they are written to the file.
The identifier format includes:
- the identifier address, equal to its offset from the beginning of the file (BOF);
- the number of the transaction in which the identifier was written;
- the identifier code, consisting of the dimension's sequence number within the DB and the coordinate's sequence number within that dimension;
- a set of pointers of four types, each comprising the address of a linked identifier and an indicator.
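The text gives only the two components of the identifier code, not their layout. One possible encoding, assuming a fixed 32-bit field for the coordinate number and using hypothetical helper names, packs both numbers into one integer. With this layout, the zero coordinate number used for deleted coordinates keeps the dimension part of the code intact.

```python
from typing import Tuple

COORD_BITS = 32  # assumed width of the coordinate field; not given in the text


def make_code(dimension_no: int, coordinate_no: int) -> int:
    """Pack a dimension sequence number and a coordinate sequence number."""
    if dimension_no < 0 or not 0 <= coordinate_no < (1 << COORD_BITS):
        raise ValueError("sequence number out of range")
    return (dimension_no << COORD_BITS) | coordinate_no


def split_code(code: int) -> Tuple[int, int]:
    """Recover (dimension_no, coordinate_no) from a packed code."""
    return code >> COORD_BITS, code & ((1 << COORD_BITS) - 1)


def deleted_code(code: int) -> int:
    """Same dimension, zero coordinate number: marks a deleted coordinate."""
    dimension_no, _ = split_code(code)
    return make_code(dimension_no, 0)


code = make_code(3, 17)
print(split_code(code), split_code(deleted_code(code)))  # (3, 17) (3, 0)
```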
At the second, third and fourth structural levels, identifiers of the same level are linked into dual binary search trees consisting of elements with even and odd identifier codes, respectively. At the fifth structural level, the elements are linked into multilinked coordinate lists according to the number of pointers of the coordinate identifiers.
The values of identifier pointers depend on the structural level.
At the first structural level, the first type pointer of the entry point identifier comprises the address of the hypercube, dimension, descriptor or coordinate identifier written last, at the end of the file (EOF). The second type pointer holds the address of the second copy of the entry point, which is used in the data compression operation. The third type pointer comprises the address of the root of the hypercube binary tree consisting of elements with odd codes. The fourth type pointer comprises the address of the root of the hypercube binary tree consisting of elements with even codes.
At the second structural level, the first type pointer of each identifier comprises the address of the neighboring element of the hypercube binary tree with a smaller code value. The second type pointer comprises the address of the neighboring element of the hypercube binary tree with a bigger code value. The third type pointer comprises the address of the root of the dimension binary tree consisting of elements with odd codes. The fourth type pointer comprises the address of the root of the dimension binary tree consisting of elements with even codes.
At the third structural level, the first type pointer of each identifier comprises the address of the neighboring element of the dimension binary tree with a smaller code value. The second type pointer comprises the address of the neighboring element of the dimension binary tree with a bigger code value. The third type pointer comprises the address of the root of the descriptor binary tree consisting of elements with odd codes. The fourth type pointer comprises the address of the root of the descriptor binary tree consisting of elements with even codes.
At the fourth structural level, the first type pointer of each identifier comprises the address of the neighboring element of the descriptor binary tree with a smaller code value. The second type pointer comprises the address of the neighboring element of the descriptor binary tree with a bigger code value. The third type pointer comprises the address of the last-written element of the multilinked coordinate list with a matching code, and the third type pointer indicator comprises the number of elements in that coordinate list. The fourth type pointer is used in data compression to hold the address of the last coordinate identifier with a matching code within a data vector.
At the fifth structural level, identifier pointers take the following values.
The first type pointer of a coordinate identifier comprises the address of the previous coordinate identifier with a matching code and the number of preceding elements in the list of coordinate identifiers with that code. For the first-written coordinate identifier, this pointer comprises the address of the element of the descriptor binary tree with the matching code.
The second type pointer comprises the address of the first coordinate identifier within the previous data vector that also contains a coordinate identifier with the matching code. The second type pointer indicator comprises the address of the last-written coordinate identifier within the previous data vector.
The third type pointer comprises the address of the previous identifier located at a higher hierarchy level and the sequence number of the hierarchy level of this coordinate identifier. The pointers of coordinate identifiers included in the same hierarchic group of a data vector form a finite list.
The fourth type pointer comprises the address of a coordinate identifier included in the same variant group (or, when one variant group is complete, in another variant group) and the number of its own (current) variant group. The pointers of coordinate identifiers included in the same data vector form a finite list.
The database (hereinafter, DB) comprises auxiliary dimensions that systematize application environment objects:
- the main dimension, whose coordinate values are the sequence numbers of the other dimensions;
- the dimension of data types, whose coordinate values are the sequence numbers of hypercubes;
- the dimension of unique identification numbers, whose coordinate values are data numbers assigned in the order in which data are included into the hypercube subset.
The auxiliary hypercube included in the DB is designed to store information used for the direct and reverse conversion between the names of DB component elements and identifier codes. The data subset of the auxiliary hypercube includes data vectors containing fragments of unstructured information that represent the symbolic and numeric names of the DB, hypercubes, dimensions and coordinates. These fragments of unstructured information are placed between the coordinate identifiers of the data vectors.
Each data vector comprises two auxiliary coordinate identifiers relating to the data types dimension and the dimension of unique identification numbers. The data types dimension logically links data vectors included in the subset of the same hypercube; the dimension of unique ID numbers links the primary, changed and deleted versions of each data vector in that subset. The coordinate identifiers of the data types and unique ID number dimensions are placed at the first and second levels, respectively, of the data vector coordinate hierarchy.
The physical data structure is shown in FIG. 1, with pointers indicated by arrows. For the entry point, pointers of the first, third and fourth types are shown; for the hypercube and dimension binary trees, pointers of the third and fourth types; for the descriptor binary trees, pointers of the third type; and for the multilinked coordinate lists, pointers of the first type.
A hypercube binary tree comprising elements with even codes is shown in FIG. 2.
A data vector comprising ten coordinate identifiers included in three hierarchic and two variant groups is shown in FIG. 3.
DESCRIPTION OF OPERATION.
As regards the operation of the disclosed device, the technical result is achieved by reducing the number of steps in the writing, reading, deletion, alteration and compression operations and in restoring the internal consistency of information after an emergency DB shutdown.
Information is written to the DB after it has been converted into vector format and the names of vector coordinates have been replaced by identifier codes. Identifier codes for hypercubes, dimensions and descriptors are assigned as they are written to the DB. Coordinate names are replaced by their identifier codes by comparing the names with the contents of the fragments of unstructured information included in the data vectors of the auxiliary hypercube data subset.
The information converted to vector format is written to the DB. The entry point vector is written at the beginning of the file (BOF).
Next, a set of hypercube vectors is written sequentially; their identifiers are linked into dual binary hypercube trees comprising elements with odd and even identifier codes, respectively. The address of the last written hypercube identifier and the addresses of the top nodes of the hypercube binary trees are written to the entry point vector identifier. As further information is written, the entry point holds the current address of the hypercube, dimension, descriptor or coordinate identifier written last to the file.
Then several sets of dimension vectors are written, whose identifiers are linked into dual binary dimension trees with even and odd elements. Each pair of binary dimension trees corresponds to one hypercube vector, whose identifier receives the addresses of the top nodes of that pair of dimension trees.
At the next stage, vectors are written as part of multilinked coordinate lists. The first type pointers of the coordinate identifiers of the first-written data vectors comprise precomputed addresses of descriptor identifiers, which are not yet present in the file at the moment of writing, and zero-value indicators. The first type pointers of the coordinate identifiers of subsequently written data vectors comprise the addresses and the number of previous identifiers. Pointers of the other types link together the coordinate identifiers of data vectors and their hierarchic and variant groups.
At the final stage, descriptor vectors are written, whose identifiers correspond to the coordinate identifier codes of the previously written data vectors. The third type pointers of the descriptor identifiers comprise the addresses of the last-written coordinate identifiers of data vectors, as well as the number of identifiers in the lists. Descriptor identifiers are linked into dual binary descriptor trees with even and odd elements corresponding to dimension vectors; the addresses of the top nodes of the dual binary descriptor trees are written to the identifiers of those dimension vectors. The set of descriptor vectors is kept unchanged when further data vectors are written whose coordinate identifier codes correspond to previously written descriptor identifiers.
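The write ordering described above (coordinates first, descriptors last, existing descriptors updated in place) can be sketched with an in-memory stand-in for the append-only file. This is an illustrative model only: `AppendOnlyFile`, `Coord`, `Descriptor`, `write_transaction` and the fixed `RECORD_SIZE` are hypothetical; the head of each coordinate list stores address 0 instead of the precomputed descriptor address; and the second, third and fourth type pointers of coordinates are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

RECORD_SIZE = 64  # hypothetical fixed record size, used only to compute offsets


@dataclass
class Coord:
    address: int
    code: int
    prev_address: int  # first type pointer: previous coordinate with same code (0 = list head)
    prev_count: int    # first type indicator: number of preceding coordinates with this code


@dataclass
class Descriptor:
    address: int
    code: int
    last_address: int  # third type pointer: last-written coordinate with this code
    count: int         # third type indicator: number of coordinates in the list


class AppendOnlyFile:
    def __init__(self) -> None:
        self.next_offset = RECORD_SIZE  # offset 0 is reserved for the entry point
        self.records: List[object] = []
        self.descriptors: Dict[int, Descriptor] = {}

    def _append(self) -> int:
        address = self.next_offset
        self.next_offset += RECORD_SIZE
        return address

    def write_transaction(self, vectors: List[List[int]]) -> None:
        # code -> (address of newest coordinate, list length) within this transaction
        touched: Dict[int, Tuple[int, int]] = {}
        # 1. All coordinate identifiers of the transaction are appended first.
        for vector in vectors:
            for code in vector:
                if code in touched:
                    prev_address, prev_count = touched[code]
                elif code in self.descriptors:
                    d = self.descriptors[code]
                    prev_address, prev_count = d.last_address, d.count
                else:
                    prev_address, prev_count = 0, 0
                c = Coord(self._append(), code, prev_address, prev_count)
                self.records.append(c)
                touched[code] = (c.address, prev_count + 1)
        # 2. Descriptors come last: new ones are appended, existing ones updated in place.
        for code, (last_address, count) in touched.items():
            d = self.descriptors.get(code)
            if d is None:
                d = Descriptor(self._append(), code, last_address, count)
                self.records.append(d)
                self.descriptors[code] = d
            else:
                d.last_address, d.count = last_address, count


f = AppendOnlyFile()
f.write_transaction([[10, 11], [10, 12]])
f.write_transaction([[10]])
print(f.descriptors[10])  # Descriptor(address=320, code=10, last_address=512, count=3)
```

Writing descriptors last is what makes the recovery check possible: if the shutdown happens between the two phases, the descriptors still point at the previous transaction's coordinates.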
Information is read on the basis of a search key composed of the coordinate identifiers of the desired data vectors.
Reading starts from the entry point, which has a permanent address within the file. Then, following the third and fourth type pointers, the search moves to the binary hypercube trees, where the identifiers of the required data types are selected. Next, it moves to the binary dimension trees, selects dimension identifiers, moves to the binary descriptor trees, and selects within them the descriptor identifiers that match the search key.
If at least one target hypercube, dimension or descriptor identifier is missing, the search is terminated with a negative report.
Otherwise, the indicator values of the third type pointers of the descriptor identifiers are compared and the identifier with the lowest indicator value is selected. Using the third type pointer address of the selected descriptor identifier, the coordinate identifier of a data vector is read as part of the multilinked coordinate list. The other coordinate identifiers of the data vector are read by following second type pointers, then traced through the hierarchic and variant groups of the data vector and collated with the search key. If the coordinate identifiers that match the search key are located in the same hierarchic and variant groups, the data vector is included in the search report.
Within the found data vector, the following operations are performed:
- comparison of the first type pointer indicator values of the coordinate identifiers that match the search key;
- selection of the coordinate identifier with the lowest indicator value;
- reading the identifier of the next data vector coordinate via the first type pointer, and reading the other coordinate identifiers of the next data vector via second type pointers.
These steps are repeated, moving toward the beginning of the multilinked coordinate list, until a coordinate identifier of a data vector is read that matches the search key and has a first type pointer indicator value of zero.
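The search loop can be sketched as follows, under simplifying assumptions: each data vector is a flat list of coordinate codes, a coordinate's first type pointer is modelled as a reference to the previous coordinate with the same code, and hierarchic and variant groups are ignored, so a vector "matches" when it contains every key code. All names (`CoordRec`, `build`, `search`) are hypothetical. At each step the sketch follows the key-code chain with the fewest predecessors; since every matching vector lies on every key-code chain, no matching vector is skipped.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CoordRec:
    vector_id: int
    code: int
    prev: Optional["CoordRec"]  # first type pointer: previous coordinate with this code
    count: int                  # first type indicator: number of preceding coordinates


def build(vectors: List[List[int]]) -> Tuple[List[List[CoordRec]], Dict[int, CoordRec]]:
    """Chain coordinates by code; descriptors[code] is the last-written one."""
    descriptors: Dict[int, CoordRec] = {}
    contents: List[List[CoordRec]] = []
    for vector_id, codes in enumerate(vectors):
        recs = []
        for code in codes:
            prev = descriptors.get(code)
            count = prev.count + 1 if prev is not None else 0
            rec = CoordRec(vector_id, code, prev, count)
            descriptors[code] = rec
            recs.append(rec)
        contents.append(recs)
    return contents, descriptors


def search(key: List[int], contents: List[List[CoordRec]],
           descriptors: Dict[int, CoordRec]) -> List[int]:
    wanted = set(key)
    if not wanted or any(code not in descriptors for code in wanted):
        return []  # negative report: empty key or a missing descriptor
    # Start from the descriptor with the shortest coordinate list.
    current: Optional[CoordRec] = min((descriptors[c] for c in wanted),
                                      key=lambda r: r.count)
    report: List[int] = []
    while current is not None:
        vector = contents[current.vector_id]
        if wanted <= {r.code for r in vector}:
            report.append(current.vector_id)
        # Follow the key coordinate with the fewest predecessors in its list.
        step = min((r for r in vector if r.code in wanted), key=lambda r: r.count)
        current = step.prev  # None once the indicator has reached zero
    return report


contents, descriptors = build([[1, 2], [1, 3], [2, 3, 1], [1]])
print(search([1, 2], contents, descriptors))  # [2, 0]
```

Vectors are reported newest first, mirroring the traversal toward the beginning of the multilinked list.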
Information in the DB is deleted and altered at the logical level.
Hypercubes, dimensions and descriptors are deleted and changed by changing the values of the third and fourth type pointers of the corresponding identifiers. On deletion, the addresses of the third and fourth type pointers are set to zero. On change, the indicators of the third and fourth type pointers are set to zero.
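The distinction between logical deletion and logical change of hypercube, dimension and descriptor identifiers can be shown on a bare four-pointer record. The class and function names are hypothetical and the record layout is a simplification; the text does not specify what the indicators of these pointers hold before a change.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pointer:
    address: int = 0
    indicator: int = 0


@dataclass
class TreeIdentifier:
    """Hypercube, dimension or descriptor identifier (simplified)."""
    code: int
    pointers: List[Pointer] = field(
        default_factory=lambda: [Pointer() for _ in range(4)])


def delete_logically(ident: TreeIdentifier) -> None:
    # Deletion: zero the ADDRESSES of the third and fourth type pointers.
    for p in ident.pointers[2:]:
        p.address = 0


def change_logically(ident: TreeIdentifier) -> None:
    # Change: zero the INDICATORS of the third and fourth type pointers.
    for p in ident.pointers[2:]:
        p.indicator = 0


dim = TreeIdentifier(code=6)
dim.pointers[2] = Pointer(address=900, indicator=5)
dim.pointers[3] = Pointer(address=960, indicator=2)
change_logically(dim)
print([(p.address, p.indicator) for p in dim.pointers[2:]])  # [(900, 0), (960, 0)]
delete_logically(dim)
print([(p.address, p.indicator) for p in dim.pointers[2:]])  # [(0, 0), (0, 0)]
```

Neither operation moves or erases the record itself; both only overwrite pointer fields, which is why they count as logical rather than physical operations.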
Data are deleted by writing additional versions of each data vector in the following sequence.
First, the initial (previously written) version of the data vector is read.
Then a deleted version of the data vector is formed, consisting of the coordinate identifiers that were deleted from the initial version, with zero coordinate sequence numbers in their identifier codes, and two auxiliary coordinate identifiers: data type and unique identification number. The deleted version of the data vector is then written. Subsequent reading of the deleted information is performed by searching for and consolidating the data vector versions according to the values of the auxiliary coordinate identifiers.
Data are altered by writing additional versions of each data vector in the following sequence.
First, the initial version of the data vector is read.
Then an altered version of the data vector is formed, consisting of one or more coordinate identifiers that were changed relative to the initial version, and two auxiliary coordinate identifiers: data type and unique ID number. The altered version of the data vector is then written. Subsequent reading of the changed information is performed by searching for and consolidating the data vector versions according to the auxiliary coordinate identifiers.
DB information is compressed at the physical level by replacing the copies of the binary trees of hypercubes, dimensions and descriptors and of the multilinked coordinate lists with their consolidated versions.
At the first stage, second copies of the entry point and of the binary trees of hypercubes, dimensions and descriptors are written. During this writing, logically deleted identifiers of hypercubes, dimensions and descriptors are additionally marked as compressed by zeroing the addresses of their third-type pointers. Once the second copies are written, new identifiers are added only to the second copies of the binary trees of hypercubes, dimensions and descriptors.
The address of the second copy of the entry point identifier is written into the second-type pointer of the first copy of the entry point identifier.
The address of the identifier of the hypercube, dimension, descriptor or coordinate written last is written into the first-type pointer of the second copy of the entry point identifier. The identifier of the second copy of the entry point records its own actual address. The addresses of the second copies of the paired binary trees of hypercubes are written into its third- and fourth-type pointers.
The addresses of the coordinate identifiers of the newly written data vectors that are last in write order are written into the third-type pointers of the descriptor identifiers included in the second copies of the descriptor binary trees. The newly written data vectors that are first in write order are linked to the second copies of the descriptor binary trees, or to previously newly written data vectors with matching coordinate identifier codes.
The addresses of the first and last coordinate identifiers of the newly written data vectors that are last in order are written into the fourth-type pointers of the descriptor identifiers included in the second copies of the descriptor binary trees.
The addresses of coordinate identifiers with matching codes that belong to the first copies of the multilinked coordinate lists are written into the first-type pointers of the coordinate identifiers of the newly written data vectors that are first in order.
At the second stage, the data vectors saved to the DB before the second copies of the entry point, hypercube, dimension and descriptor identifiers were written are read, consolidated and written again. Consolidation proceeds in order of increasing hypercube identifier codes and codes of the coordinate identifiers of the identification-number dimension. The newly written, consolidated data vectors form the second copies of the multilinked coordinate lists.
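The rewrite order of the second stage can be sketched as a sort on (hypercube code, UIN coordinate code). This is an illustration under assumptions: `compression_write_order` is a hypothetical name, vectors are taken as already consolidated, skipping the vectors of logically deleted hypercubes is an assumed reading of the marking done at the first stage, and the surname "Petrov" is made-up sample data.

```python
from typing import Dict, List, Set, Tuple


def compression_write_order(
    vectors: Dict[Tuple[int, int], Dict[int, str]],
    deleted_hypercubes: Set[int],
) -> List[Tuple[int, int]]:
    """Order in which consolidated vectors are rewritten into the second copy.

    Keys are (hypercube code, UIN coordinate code); values are consolidated vectors.
    """
    order: List[Tuple[int, int]] = []
    for key in sorted(vectors):  # increasing hypercube code, then UIN coordinate code
        hypercube_code, _uin_code = key
        if hypercube_code in deleted_hypercubes:
            continue  # assumption: vectors of logically deleted hypercubes are not copied
        if not vectors[key]:
            continue  # every coordinate was deleted, so nothing is left to write
        order.append(key)
    return order


vectors = {
    (2, 1): {10: "208473"},
    (1, 2): {3: "Petrov"},
    (1, 1): {3: "Kuznetsov"},
    (3, 1): {},
}
print(compression_write_order(vectors, deleted_hypercubes={2}))  # [(1, 1), (1, 2)]
```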
During compression, information is read by moving from the first to the second copy of the entry point, passing through the second copies of the hypercube, dimension and descriptor binary trees, searching the second copies of the multilinked coordinate lists, and then moving to and searching the first copies of the multilinked coordinate lists.
At the third stage, the first copies of data and metadata are deleted from the DB file. From then on, information is written and read in the second copies of data and metadata, using the addresses of the hypercube, dimension, descriptor and coordinate identifiers, increased or decreased as appropriate by the address value contained in the entry point identifier.
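One possible reading of the address adjustment after the third stage: the second copy of the entry point records its own offset E (see the first stage), and once the first copies in front of it are removed it sits at the start of the file, so stored addresses shift by E. The function names and the value 4096 are hypothetical, and the direction of each shift is an interpretation of "increased or decreased".

```python
def to_actual_offset(stored_address: int, entry_point_address: int) -> int:
    """Address stored in the second copy -> offset after the first copies are removed."""
    offset = stored_address - entry_point_address
    if offset < 0:
        raise ValueError("address refers to the removed first copy")
    return offset


def to_stored_address(actual_offset: int, entry_point_address: int) -> int:
    """Inverse mapping, so addresses written later stay consistent with stored ones."""
    return actual_offset + entry_point_address


ENTRY = 4096  # hypothetical address recorded in the second copy of the entry point
print(to_actual_offset(5000, ENTRY))  # 904
print(to_stored_address(904, ENTRY))  # 5000
```

Under this reading, the identifiers of the second copy never have to be rewritten after truncation; only the arithmetic applied when following a pointer changes.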
Internal consistency of information after a DB emergency shutdown is restored as follows.
At the first stage, the data written last and belonging to the same transaction are read. Reading starts from the coordinate identifier whose address was written into the first-type pointer of the entry point before the emergency shutdown. The other coordinate identifiers are read using the addresses in their second-type pointers and the common transaction number.
At the second stage, the structure is traversed from the entry point down through the second, third and fourth data structure levels, and the descriptor identifier whose code matches the code of the first coordinate identifier written within the current transaction is read.
At the third stage, the address recorded in the third-type pointer of that descriptor identifier is compared with the actual offset of the coordinate identifier from the beginning of the database file (BOF). If the address matches the actual offset, the other descriptor identifiers are read, missing descriptor identifiers are written, and the addresses in their third-type pointers are aligned with the actual offsets of the coordinate identifiers.
If the actual offset of the first written coordinate identifier does not match the address in the third-type pointer of the descriptor identifier, all data vectors of that last transaction are deleted from the DB file.
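The three-stage check can be sketched on a simplified model in which the last transaction is a list of (actual offset, code) pairs and each descriptor is reduced to its third-type pointer address. `restore_consistency` is a hypothetical name; modelling "deleted from the DB file" as truncation at the first coordinate of the transaction is an assumption, and the sketch assumes that the first coordinate's code occurs only once in the transaction. It only tracks pointer values, not the physical writing of missing descriptors.

```python
from typing import Dict, List, Tuple


def restore_consistency(
    last_txn: List[Tuple[int, int]],
    descriptor_ptr3: Dict[int, int],
    file_size: int,
) -> Tuple[Dict[int, int], int]:
    """Check the last transaction and return (descriptor pointers, new file size).

    last_txn: (actual offset, code) of each coordinate identifier, in write order.
    descriptor_ptr3: descriptor code -> address held in its third-type pointer.
    """
    first_offset, first_code = last_txn[0]
    if descriptor_ptr3.get(first_code) != first_offset:
        # Descriptors were never updated for this transaction: cut it off the file.
        return dict(descriptor_ptr3), first_offset
    repaired = dict(descriptor_ptr3)
    for offset, code in last_txn[1:]:
        # Write missing descriptors and align third-type pointers with actual offsets;
        # a later coordinate with the same code overwrites the earlier address.
        repaired[code] = offset
    return repaired, file_size


txn = [(1000, 7), (1010, 8)]
print(restore_consistency(txn, {7: 1000}, 1020))  # ({7: 1000, 8: 1010}, 1020)
print(restore_consistency(txn, {7: 500}, 1020))   # ({7: 500}, 1000)
```

The check is cheap because descriptors are always written after the coordinates of the same transaction. A descriptor that already points at the first coordinate proves that descriptor writing had begun, so the transaction can be completed. Otherwise it is discarded as a whole.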
REALIZATION EXAMPLE
The disclosed invention is illustrated by a database of telephone network subscribers. The DB includes hypercubes of subscribers, telephone numbers and operators (data types) and the following dimensions (their codes are shown in parentheses):
- data type (1);
- unique individual number, UIN (2);
- last name (3);
- first name (4);
- city (5);
- street (6);
- house No. (7);
- telephone network operator name (8);
- telephone network operator code (9);
- subscriber number (10).
Data vectors of the subscribers hypercube include coordinates of the following dimensions, by hierarchy level:
hierarchy level 1 - data type («Subscriber»)
hierarchy level 2 - UIN
hierarchy level 3 - last name
hierarchy level 4 - first name
hierarchy level 3 - city
hierarchy level 4 - street
hierarchy level 5 - house No.
hierarchy level 3 - data type («Telephone No.»)
hierarchy level 4 - UIN
Data vectors of the telephone numbers hypercube include coordinates of the following dimensions, by hierarchy level:
hierarchy level 1 - data type («Telephone No.»)
hierarchy level 2 - UIN
hierarchy level 3 - telephone network operator code
hierarchy level 4 - subscriber telephone number
Data vectors of the operators hypercube include coordinates of the following dimensions, by hierarchy level:
hierarchy level 1 - data type («Operator»)
hierarchy level 2 - UIN
hierarchy level 3 - operator's name
hierarchy level 4 - operator's code
For clarity, the hypothetical record in this example omits the other elements of the physical DB structure. In the hypothetical record, each paragraph beginning with 1.1 corresponds to a data vector, and each line of a paragraph corresponds to a coordinate identifier of that vector. Each line gives the groups, the codes and the values of the coordinates. The character "/" separates the groups, the codes and the values of the coordinates. The character "." separates the numbers of the hierarchic and variant groups, the codes of dimensions and coordinates, and the names of dimensions and coordinates.
Subscribers hypercube
1.1 / 1.1 / data type. Subscriber
2.1 / 2.1 / UIN. 1
3.1 / 3.1 / Last name. Kuznetsov
4.1 / 4.1 / First name. Alexander
5.1 / 5.1 / City. Moscow
6.1 / 6.1 / Street. Tsvetochnaya
7.1 / 7.1 / House number. 128
3.1 / 1.2 / data type. Telephone
4.1 / 2.1 / UIN. 1
3.2 / 1.2 / data type. Telephone
4.2 / 2.2 / UIN. 2
Telephones hypercube
1.1 / 1.2 / data type. Telephone
2.1 / 2.1 / UIN. 1
3.1 / 9.1 / Operator code. 324
4.1 / 10.1 / Subscriber telephone number. 208473
1.1 / 1.2 / data type. Telephone
2.1 / 2.2 / UIN. 2
3.1 / 9.2 / Operator code. 698
4.1 / 10.2 / Subscriber telephone number. 593451
Operators hypercube
1.1 / 1.3 / data type. Telephone network operator
2.1 / 2.1 / UIN. 1
3.1 / 8.1 / Operator name. Alpha Telecom
4.1 / 9.1 / Operator code. 324
1.1 / 1.3 / data type. Telephone network operator
2.1 / 2.2 / UIN. 2
3.1 / 8.2 / Operator name. Omega Telecom
4.1 / 9.2 / Operator code. 698
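The line notation of the hypothetical record is regular enough to parse mechanically. The sketch below is an illustration only. `RecordLine` and `parse_record_line` are hypothetical names. The field names follow the description of the notation: the first group holds the hierarchic and variant group numbers, and the second holds the dimension and coordinate codes. The sketch assumes that a dimension name never contains ".".

```python
from typing import NamedTuple


class RecordLine(NamedTuple):
    hier_group: int
    variant_group: int
    dimension_code: int
    coordinate_code: int
    dimension_name: str
    value: str


def parse_record_line(line: str) -> RecordLine:
    """Parse one line such as '3.1 / 9.1 / Operator code. 324'."""
    groups, codes, text = (part.strip() for part in line.split("/", 2))
    hier, variant = (int(x) for x in groups.split("."))
    dim_code, coord_code = (int(x) for x in codes.split("."))
    dim_name, value = (s.strip() for s in text.split(".", 1))  # split at the first "." only
    return RecordLine(hier, variant, dim_code, coord_code, dim_name, value)


print(parse_record_line("3.1 / 9.1 / Operator code. 324"))
# RecordLine(hier_group=3, variant_group=1, dimension_code=9, coordinate_code=1, dimension_name='Operator code', value='324')
```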
The example shows a search for the names of the telephone network operators that serve the telephone numbers belonging to a subscriber with a given last name. The search uses a key formed from coordinates of the following dimensions:
Dimension «Data type» - coordinate «Subscriber» (known value);
Dimension «Last name» - coordinate «Kuznetsov» (known value);
Dimension «Data type» - coordinate «Telephone No.» (known value);
Dimension «Operator code» - coordinate «*» (unknown value);
Dimension «Data type» - coordinate «Operator» (known value);
Dimension «Operator name» - coordinate «*» (unknown value).
The plan of the search for information in the DB is shown in Fig. 4.
Using the known coordinate values of the search key, the search moves to the subscribers hypercube, then to the dimension "Last name" and the descriptor "Kuznetsov". The third-type pointer of the descriptor identifier gives the address of the identifier of the coordinate "Kuznetsov" in the multilinked list of coordinates, and that identifier is read. The other coordinate identifiers of the data vector are then read by second-type pointers.
Within the found data vector, the identifiers at hierarchy levels greater than 2 that belong to the known dimension "data type" with the coordinate "Telephone No." are identified. At the lower hierarchy levels, the related coordinate identifiers of the dimension "UIN" and their values ("1" and "2" in the example) are identified.
Using the known coordinate values of the search key, the search moves to the telephone numbers hypercube, then to the dimension "UIN" and the descriptors "1" and "2". The third-type pointers of these descriptor identifiers give the addresses of the identifiers of the coordinates "1" and "2" in the multilinked list of coordinates, and those identifiers are read. The other coordinate identifiers of each data vector are then read by second-type pointers.
Within the found data vectors, the coordinate identifiers of the known dimension "Operator code" and their values («324» and «698» in the example) are identified.
Using the known coordinate values of the search key, the search moves to the operators hypercube, then to the dimension "Operator code" and the descriptors "324" and "698". The third-type pointers of these descriptor identifiers give the addresses of the identifiers of the coordinates "324" and "698" in the multilinked list of coordinates, and those identifiers are read. The other coordinate identifiers of each data vector are then read by second-type pointers.
Within the found data vectors, the coordinate identifiers of the known dimension "Operator name" and their values («Alpha Telecom» and «Omega Telecom» in the example) are identified.
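The three hops of this search (subscribers, then telephone numbers, then operators) can be reproduced on the example data. The sketch is a deliberately simplified stand-in: the data vectors are flattened into Python dicts, and linear scans replace the descriptor lookups and multilinked-list traversal of the actual method. `SUBSCRIBERS`, `TELEPHONES`, `OPERATORS`, the key "Telephone UIN" and `operator_names_for` are hypothetical names.

```python
from typing import Dict, List

# Example data vectors, flattened to dimension name -> value(s).
SUBSCRIBERS: List[Dict[str, List[str]]] = [
    {"Last name": ["Kuznetsov"], "Telephone UIN": ["1", "2"]},
]
TELEPHONES: List[Dict[str, str]] = [
    {"UIN": "1", "Operator code": "324", "Subscriber telephone number": "208473"},
    {"UIN": "2", "Operator code": "698", "Subscriber telephone number": "593451"},
]
OPERATORS: List[Dict[str, str]] = [
    {"UIN": "1", "Operator name": "Alpha Telecom", "Operator code": "324"},
    {"UIN": "2", "Operator name": "Omega Telecom", "Operator code": "698"},
]


def operator_names_for(last_name: str) -> List[str]:
    # Hop 1: subscribers hypercube -> UINs of the subscriber's telephone numbers.
    uins = [u for s in SUBSCRIBERS if last_name in s["Last name"] for u in s["Telephone UIN"]]
    # Hop 2: telephone numbers hypercube -> operator codes of those numbers.
    codes = [t["Operator code"] for t in TELEPHONES if t["UIN"] in uins]
    # Hop 3: operators hypercube -> operator names for those codes.
    return [o["Operator name"] for o in OPERATORS if o["Operator code"] in codes]


print(operator_names_for("Kuznetsov"))  # ['Alpha Telecom', 'Omega Telecom']
```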
The found coordinate values are included in the report. The declared technical solutions are industrially applicable, as they use known, commercially produced hardware and known software procedures. The declared technical solutions can be reproduced industrially using the above description.

Claims

1. A multidimensional database, the physical data structure of said database consisting of elements that logically correspond to the point of entry, hypercubes, dimensions, coordinates descriptors and data vectors' coordinates of multidimensional space,
wherein each element is represented by an identifier, each said identifier including an address equal to its offset from the physical beginning of the record file (hereinafter, the file) of said multidimensional database, the number of the transaction in which said identifier was written, a code corresponding to the identifier value, and a set of pointers of four types comprising addresses of identifiers related to said element, and indicators.
2. The multidimensional database as recited in claim 1, wherein said identifiers have the following properties:
the identifier of the entry point is located at the beginning of the file; the first-type pointer of said identifier comprises the address of the identifier written last to the file, the second-type pointer comprises the address of the second copy of the entry point, the third-type pointer comprises the address of the root of the hypercube binary tree with odd identifier codes, and the fourth-type pointer comprises the address of the root of the hypercube binary tree with even identifier codes;
the identifiers of hypercubes are arranged in the order in which they are written to the file and are connected by the first- and second-type pointers into binary hypercube trees; the codes of said identifiers act as keys for searching elements within the binary hypercube trees; the third-type pointers comprise the addresses of the roots of binary dimension trees with odd identifier codes, and the fourth-type pointers comprise the addresses of the roots of binary dimension trees with even identifier codes;
the identifiers of dimensions are arranged in the order in which they are written to the file and are connected by the first- and second-type pointers into binary dimension trees; the codes of the identifiers act as keys for searching elements within the binary dimension trees; the third-type pointers comprise the addresses of the roots of binary descriptor trees with odd identifier codes, and the fourth-type pointers comprise the addresses of the roots of binary descriptor trees with even identifier codes;
the identifiers of descriptors are arranged in the order in which they are written to the file and are connected by the first- and second-type pointers into binary descriptor trees; said identifier codes act as keys for searching elements in the binary descriptor trees; said third-type pointers comprise the addresses of the coordinate identifiers with matching codes that are part of the multilinked coordinate lists and were written last in order,
said fourth-type pointers comprise the addresses of the first coordinate identifiers of data vectors that further comprise coordinate identifiers with matching codes; the identifiers of coordinates are arranged in the order in which they are written to the file and are connected by the pointers into multilinked coordinate lists; the first-type pointers of the coordinate identifiers written first comprise the addresses of descriptor identifiers with matching codes, the first-type pointers of the coordinate identifiers written subsequently comprise the addresses of previous coordinate identifiers with matching codes, the second-type pointers comprise the addresses of the first coordinate identifiers of the previous data vector, the third-type pointers comprise the addresses of coordinate identifiers in the same hierarchic group of the data vector, and the fourth-type pointers comprise the addresses of coordinate identifiers in the same variant group of the data vector.
3. The multidimensional database as recited in claim 2, wherein said pointer indicators of the identifiers comprise the following values:
said indicator of the first-type pointer of a descriptor identifier comprises the number of coordinate identifiers with matching code linked to it; said indicator of the first-type pointer of a descriptor identifier comprises the address of the last coordinate identifier with matching code that is part of a data vector;
said indicator of the first-type pointer of a coordinate identifier comprises the number of previously written coordinate identifiers with matching code linked to it;
said indicator of the second-type pointer of a coordinate identifier comprises the address of the last coordinate identifier of the previous data vector; said indicator of the third-type pointer of a coordinate identifier comprises the hierarchy level of the coordinate within the hierarchic group of coordinates of said data vector;
said indicator of the fourth-type pointer of a coordinate identifier comprises the sequence number of the variant group of coordinates within said data vector.
4. The multidimensional database as recited in claim 3, wherein
said database further comprises an auxiliary hypercube comprising a full set of dimensions, each of said dimensions comprising a full set of coordinates, said set of dimensions of the auxiliary hypercube comprising an auxiliary dimension whose coordinates are the dimensions of the multidimensional space;
data vectors of said auxiliary hypercube comprise the titles of said database, said hypercubes, said dimensions and said coordinates that correspond to codes of coordinate identifiers, represented as unstructured information fragments located between coordinate identifiers;
each data vector comprises auxiliary coordinate identifiers located at the first hierarchy level, relating to the dimension of data types within a hypercube subset, and at the second hierarchy level, relating to the dimension of unique identification numbers of data, which are assigned as the data are placed into the hypercube subset.
5. A method of multidimensional database control comprising the operations of writing, reading, deletion, alteration, compression, and restoration of inner consistency of information after a disorderly closedown of said database, wherein:
data are written by writing a set of identifiers included in the same transaction;
new values of the entry point, hypercube, dimension and descriptor identifiers previously written to the file are written by replacing the previously written values of said identifiers;
coordinate identifiers included in the same transaction are written before the descriptor identifiers included in that transaction;
information is read by means of a search key generated on the basis of the coordinate identifiers that make up the target data vectors;
identifiers of hypercubes, dimensions and descriptors are read by sequentially searching for physical data structure elements whose codes match the codes of the identifiers of the search key coordinates;
coordinate identifiers are read starting with selection of the descriptor identifier with the lowest value of the third-type pointer, transfer to the coordinate identifier with matching code, reading of all coordinate identifiers within the same data vector, inclusion of the data vector in the search report if its coordinate identifiers match the search key, selection of the coordinate identifier included in the search key and having the lower value of the first-type pointer indicator, and subsequent transition to the next coordinate identifiers within data vectors until the coordinate identifier with a zero first-type pointer indicator is reached;
information is deleted and altered at the logical level; identifiers of hypercubes, dimensions and descriptors are deleted by setting the addresses of their third- and fourth-type pointers to zero, and are changed by setting the indicators of their third- and fourth-type pointers to zero;
coordinate identifiers are deleted by writing new versions of data vectors comprising the deleted coordinate identifiers with zero codes and unchanged address values, and are changed by writing new versions of data vectors comprising the changed coordinate identifiers with zero codes and unchanged address values;
data are compressed by writing second copies of the identifiers of the entry point, hypercubes, dimensions, descriptors and coordinates, consolidating data vectors, and then deleting the first copies of the identifiers from the file; the second copies of the hypercube, dimension and descriptor identifiers are written within the same transaction, while the addresses of the third- and fourth-type pointers of previously deleted identifiers are simultaneously set to zero;
second copies of coordinate identifiers are written within different transactions after the versions of data vectors are consolidated, with deleted and changed coordinate identifiers excluded from the consolidated versions;
after the compression process ends, data are searched for reading and writing on the basis of the pointer addresses of the hypercube, dimension, descriptor and coordinate identifiers, increased or decreased as appropriate by the address value contained in said entry point identifier;
the internal consistency of information after a disorderly closedown of the database is restored by checking that the pointer addresses of descriptor identifiers correspond to the actual offsets of coordinate identifiers from the beginning of the file;
the check starts with reading the coordinate identifiers of the transaction that is last in write order, reading the descriptor identifier matching the first coordinate identifier of that transaction, and comparing the third-type pointer address of that descriptor identifier with the actual offset of the first coordinate identifier;
if said third-type pointer address of the descriptor identifier matches the actual offset of the first coordinate identifier, the third-type pointer addresses of the descriptor identifiers are read, checked and aligned with the actual offsets of the other coordinate identifiers of the last transaction;
if the third-type pointer address of the descriptor identifier does not match the actual offset of the first coordinate identifier, all coordinate identifiers of the last written transaction are deleted from the file.
PCT/RU2010/000532 2010-05-06 2010-09-27 Multidimensional database and the method of control thereof WO2011139176A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
RU2010118074/08A RU2010118074A (en) 2010-05-06 2010-05-06 MULTIDIMENSIONAL DATABASE AND METHOD OF MANAGING THE MULTIDIMENSIONAL DATABASE
RU2010118074 2010-05-06

Publications (1)

Publication Number Publication Date
WO2011139176A1 true WO2011139176A1 (en) 2011-11-10

Family

ID=44025269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2010/000532 WO2011139176A1 (en) 2010-05-06 2010-09-27 Multidimensional database and the method of control thereof

Country Status (2)

Country Link
RU (1) RU2010118074A (en)
WO (1) WO2011139176A1 (en)

Also Published As

Publication number Publication date
RU2010118074A (en) 2011-11-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10796191

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10796191

Country of ref document: EP

Kind code of ref document: A1