CN104361113A - OLAP (On-Line Analytical Processing) query optimization method in memory and flash memory hybrid storage mode - Google Patents

OLAP (On-Line Analytical Processing) query optimization method in memory and flash memory hybrid storage mode

Info

Publication number
CN104361113A
CN104361113A CN201410717830.6A CN201410717830A CN104361113A CN 104361113 A CN104361113 A CN 104361113A CN 201410717830 A CN201410717830 A CN 201410717830A CN 104361113 A CN104361113 A CN 104361113A
Authority
CN
China
Prior art keywords
vector
flash
memory
olap
grouping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410717830.6A
Other languages
Chinese (zh)
Other versions
CN104361113B (en)
Inventor
张延松
张宇
王珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China filed Critical Renmin University of China
Priority to CN201410717830.6A priority Critical patent/CN104361113B/en
Publication of CN104361113A publication Critical patent/CN104361113A/en
Application granted granted Critical
Publication of CN104361113B publication Critical patent/CN104361113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2237Vectors, bitmaps or matrices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671In-line storage system
    • G06F3/0673Single storage device
    • G06F3/068Hybrid storage device

Abstract

The invention relates to an OLAP (On-Line Analytical Processing) query optimization method in a memory-flash hybrid storage mode, comprising the following steps: OLAP storage adopts a flash-aware storage model in which data are divided between a relatively small DRAM (Dynamic Random Access Memory) and a relatively large flash memory according to the locality of data access, so that storage is optimized across the two heterogeneous storage levels; in-memory OLAP adopts array storage, each attribute column is stored in contiguous array cells, the conventional join operation is simplified into array index reference (AIR), and OLAP queries are processed with the AIR algorithm; OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access stages; the specified measure values of the measure columns stored in flash are accessed through a selection vector; and, on the basis of a keyword-based bitmap join index, K keyword join bitmaps are stored across the two DRAM-flash levels to form a two-level bitmap join index structure. The invention improves the cost-effectiveness of storage, the utilization of memory and CPU (Central Processing Unit), and data storage efficiency, and can be widely applied to general OLAP application scenarios.

Description

An OLAP query optimization method in a memory-flash hybrid storage mode
Technical field
The present invention relates to storage optimization and OLAP (on-line analytical processing) query optimization in the database field, and in particular to an OLAP query optimization method in a memory-flash hybrid storage mode, based on two-level storage composed of DRAM (dynamic random access memory) and flash, that is suitable for in-memory database appliance platforms.
Background art
In-memory analytical processing (in-memory OLAP) is an important technology for real-time processing of big data. Supported by large memories and the parallel processing capability of multi-core processors, in-memory OLAP offers excellent real-time analytical performance. However, compared with other storage devices such as flash and disk, memory is still a very expensive storage medium, and its energy consumption is about an order of magnitude higher than that of flash (DRAM: ~100 mW/GB; NAND flash: 1-10 mW/GB). Because in-memory OLAP operates on big data, the hardware cost of in-memory analytical processing is very high. PCIe flash cards, a large-capacity (hundreds of GB to TB) high-speed storage technology, are widely used in high-performance databases. For example, the Oracle Exadata X3 in-memory database appliance is configured with large-capacity high-speed flash cards and provides Smart Flash Cache to cache hot data; the caching algorithm is optimized by database logic, and a caching policy can also be specified per table. On the one hand, high-speed flash cards give expensive memory an inexpensive second-level storage extension. On the other hand, they are mainly used as an extended database cache on flash: they enlarge the memory buffer but are not combined with the storage optimization and query processing optimization techniques of in-memory OLAP, so flash-aware OLAP optimization is not realized at the level of the OLAP algorithm.
Current analytical in-memory databases use DRAM as the main storage device and use flash as backup storage or as a disk cache in place of disk; flash has not yet been incorporated into the design of in-memory OLAP algorithms. How to apply a two-level storage model formed by memory and large-capacity flash to in-memory analytical processing, making it a mainstream platform with high performance and high cost-effectiveness, so that an in-memory database supports not only analytical processing under a fully in-memory model but also in-memory analytical processing on two-level DRAM-flash storage, is a technical problem that urgently needs to be solved.
Summary of the invention
In view of the above problems, the object of the present invention is to provide an OLAP query optimization method in a memory-flash hybrid storage mode. The method is based on two-level DRAM-flash storage and improves the cost-effectiveness of storage by using inexpensive large-capacity flash. The method also improves data storage efficiency and the utilization of memory and CPU.
To achieve the above object, the present invention adopts the following technical solution: an OLAP query optimization method in a memory-flash hybrid storage mode, comprising the following steps: 1) OLAP storage adopts a flash-aware storage model: in an OLAP star or snowflake schema, dimension tables are small and subject to many predicate operations, while the fact table consists of foreign keys and measure attributes; accordingly, data are divided between the relatively small DRAM and the relatively large flash storage according to the locality of data access, and storage is optimized across the two heterogeneous storage levels; 2) in-memory OLAP adopts array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length; a dimension table uses the array index as its primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can directly locate the corresponding array cell in the dimension table by its foreign key value; the traditional join operation is thereby simplified into array index reference (AIR), and OLAP queries are processed with the AIR algorithm; 3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access stages: dimension table access, fact table foreign key access, and fact table measure attribute access; the intermediate data structures produced by the three stages are the dimension-table filter-group vectors, the selection vector and grouping vector, and the multidimensional grouping array; the filter-group vectors, the selection vector and the grouping vector are data structures shared by all queries, and different queries only need to update their contents, whereas the multidimensional grouping array is generated dynamically for each query; 4) on the basis of a keyword-based bitmap join index, K keyword join bitmaps are stored across the two DRAM-flash levels: the bitmaps of the n most frequently accessed keywords are stored in DRAM and the remaining bitmaps are stored in flash, forming a two-level bitmap join index structure.
In step 1), a memory engine is adopted and the dimension tables reside in DRAM; the foreign key columns of the fact table are accessed frequently by star joins in multidimensional analysis and likewise reside in DRAM; the measure attributes of the fact table are stored in flash, and a positional access API on the measure columns is provided to support random access to them.
In step 3), the dimension-table filter-group vectors, the selection vector, the grouping vector and the multidimensional grouping array are stored in DRAM; the fact table measure attributes are stored in flash.
In step 2), OLAP query processing with the AIR algorithm comprises the following steps: 1. The OLAP query is decomposed into filter and group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, so that the query is decomposed into one subquery per dimension table. 2. Dimension-table filter-group vectors are generated: each dimension table filters its records according to its own WHERE clause and projects out the grouping attribute of the records that satisfy the selection condition; the qualifying grouping attribute values are dictionary-compressed, the dictionary is stored in an array, and the dictionary code is the index into the dictionary array; positions where the selection condition is false are set to -1, all other positions are set to the dictionary code of the group, and the result is recorded in a filter-group vector of the same length as the dimension table. 3. Multi-pass scanning of the fact table foreign keys creates the selection and grouping vectors: the foreign key columns of the fact table are scanned one after another in order of the selectivity of the dimension-table filter-group vectors, from low to high; during each column scan the foreign key value is mapped to the corresponding position of the respective filter-group vector, and if the vector value is non-negative the current fact table record position is inserted into the selection vector; the next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated by the corresponding filter-group vector, deleting the fact table record positions that do not satisfy the subsequent foreign key mapping conditions; after all foreign key columns have been scanned, the selection vector records the positions of the fact table records that satisfy all join conditions. When the overall selectivity is high, the grouping vector is updated together with the selection vector, and the index into the multidimensional grouping array is computed incrementally from the index of the current group in the dimension-table filter-group vector; when the overall selectivity is very low, the selection vector is generated first, and each foreign key column is finally accessed randomly at the positions in the selection vector to generate the grouping vector in one pass. 4. The specified measure values of the flash-resident measure columns are accessed through the selection vector: the measure attribute values stored in flash are accessed at the positions in the selection vector, and the random access on flash by selection vector is executed in parallel on multiple cores. 5. Measure values are aggregated into the multidimensional grouping array through the grouping vector: according to the group index recorded in the grouping vector, each measure attribute value returned from flash is "pushed" into the cell of the multidimensional grouping array indicated by that index and aggregated there, completing the OLAP query processing. 6. The index on each dimension of the multidimensional grouping array is mapped back to the grouping dictionary to obtain the grouping attribute values, and the query result is output.
In step 2., the filter-group vector is a dynamically generated additional column of the dimension table; it replaces the dimension table in the join with the fact table and provides the code of the joined record's grouping attribute on that dimension. When a dimension table has no grouping attribute, the filter-group vector reduces to a bitmap that is used to filter the fact table foreign key in the join.
In step 4), a two-level DRAM-flash bitmap join index is built on the basis of the keyword bitmap join index: among the K keyword join bitmaps, the n most frequently used are selected according to the memory space quota as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
By adopting the above technical solution, the present invention has the following advantages:
1. The invention adopts two-level DRAM-flash storage as the storage platform for in-memory OLAP. Compared with a fully in-memory storage model, two-level DRAM-flash storage greatly reduces the demand for expensive memory and lowers the overall hardware cost. Through the flash-aware storage model and the AIR OLAP algorithm, the large volume of measure attributes stored in flash is accessed by efficient random access, which narrows the performance gap of flash storage.
2. The AIR OLAP query processing algorithm divides a complete OLAP query into three independent processing stages. The dimension table and fact table foreign key stages involve only small amounts of data: each dimension table generates a filter-group vector of fixed length, and the fact table needs only one selection vector and one grouping vector regardless of how many columns the query involves. The initial length of the selection and grouping vectors is determined by the lowest selectivity among the dimensions and shrinks steadily during the star join over the fact table foreign keys, so the memory overhead is bounded. The measure attributes, which account for the largest share of storage space in the database, are stored in large-capacity flash. Traditional pipelined OLAP query processing (on both row stores and column stores) performs hash joins and hash group-by aggregation in a pipeline during a full scan of the fact table. The AIR OLAP query algorithm instead separates the join, grouping and aggregation operations: it does not use the full table scan common in traditional databases, but completes the join and grouping on a limited number of fact table foreign keys and then accesses the specified positions of the specified measure columns according to a selection vector of very low selectivity. Pushing the data access on flash to the final processing stage greatly reduces the data access load on flash and also exploits the good random access performance of flash.
3. The bitmap join index adopted by the invention is an incremental index: index updates caused by newly added fact table records append to the bitmaps sequentially, instead of modifying node contents and structure as a B+-tree index update does. This eliminates the data rewrite cost of updates on flash and is therefore better suited to flash storage. On the basis of the keyword bitmap join index, the two-level DRAM-flash bitmap join index method stores the keyword bitmaps in two levels according to query keyword access frequency and the memory space quota: the n most frequently used of the K keyword bitmaps are stored in DRAM and the remaining K-n keyword bitmaps are stored in flash, which reduces the storage overhead of the in-memory index.
4. The invention builds a heterogeneous storage model from the characteristics of the OLAP schema and workload and from the locality of data in the OLAP query processing algorithm (that is, how frequently the data are accessed). Taking as its objects the tables, the indexes, the column groups within a table that differ in locality, and the intermediate data structures on which queries depend, it distributes them optimally, subject to storage capacity and data access performance constraints, between the high-performance DRAM of relatively small capacity and the large-capacity flash of relatively low performance, which improves data storage efficiency.
5. According to the characteristics of DRAM and flash storage, data access on flash is deferred during query processing, and complete OLAP query processing is divided into two stages, memory processing and flash computation. This supports pipelined parallelism between OLAP queries that are in different storage access stages and improves memory and CPU utilization.
The invention can be widely applied to general OLAP application scenarios.
Brief description of the drawings
Fig. 1 is a schematic diagram of the storage of in-memory OLAP on two-level DRAM-flash storage according to the present invention;
Fig. 2 is a schematic diagram of in-memory OLAP query processing on two-level DRAM-flash storage in an embodiment of the present invention;
Fig. 3 is a schematic diagram of keyword bitmap join index processing on two-level DRAM-flash storage according to the present invention.
Detailed description
Existing in-memory OLAP technology usually adopts a fully in-memory computing model or uses large-capacity flash as a cache; the former increases the cost of in-memory computing, and the latter makes it difficult to extend OLAP optimization from memory to flash. The present invention therefore proposes an OLAP query optimization method in a memory-flash hybrid storage mode based on two-level DRAM-flash storage, which optimizes the placement of OLAP data in memory and flash according to the characteristics of the OLAP schema, workload and algorithm, and optimizes the OLAP query algorithm for the characteristics of two-level storage. The invention is applicable to general OLAP application scenarios. The invention is described in detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the invention provides an OLAP query optimization method in a memory-flash hybrid storage mode. The method is an in-memory OLAP query optimization method on two-level DRAM-flash storage composed of DRAM and a large-capacity PCIe flash card, and comprises the following steps:
1) OLAP storage adopts a flash-aware storage model: in an OLAP star or snowflake schema, dimension tables are small and subject to many predicate operations, while the fact table consists of foreign keys and measure attributes; accordingly, data are divided between the relatively small DRAM and the relatively large flash storage according to the locality of data access, and storage is optimized across the two heterogeneous storage levels so that frequently used data are served efficiently from memory.
Because dimension tables are small and modern OLAP must support updates to dimension tables, a memory engine is adopted and the dimension tables reside in DRAM. The foreign key columns of the fact table are accessed frequently by star joins in multidimensional analysis and likewise reside in DRAM. The fact table has many measure attributes, but an OLAP query usually touches only a few of them, and at very low selectivity; the measure attributes are therefore stored in inexpensive large-capacity flash, and a positional access API on the measure columns is provided to support positional random access to them.
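The placement described in step 1) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class `FlashMeasureColumn`, its method `read_at`, the fixed 8-byte integer layout and the use of an ordinary temporary file as a stand-in for the flash device are all assumptions made for illustration.

```python
import os
import struct
import tempfile


class FlashMeasureColumn:
    """Hypothetical fixed-width measure column; a plain file stands in for flash."""

    ITEM = struct.Struct("<q")  # assumed layout: one little-endian 64-bit integer per row

    def __init__(self, path):
        self.path = path

    @classmethod
    def create(cls, path, values):
        # Write the column once; row i starts at byte offset i * ITEM.size.
        with open(path, "wb") as f:
            for v in values:
                f.write(cls.ITEM.pack(v))
        return cls(path)

    def read_at(self, positions):
        # Positional access: fetch only the requested rows, never the whole column.
        out = []
        with open(self.path, "rb") as f:
            for pos in positions:
                f.seek(pos * self.ITEM.size)
                out.append(self.ITEM.unpack(f.read(self.ITEM.size))[0])
        return out


# DRAM-resident data: a dimension column and a fact table foreign key column.
c_nation = ["BRAZIL", "CANADA", "CHINA"]
lo_custkey = [0, 2, 1, 0, 1]

# Flash-resident data: one fact table measure column.
_tmpdir = tempfile.mkdtemp()
revenue = FlashMeasureColumn.create(
    os.path.join(_tmpdir, "revenue.col"), [10, 20, 30, 40, 50]
)

picked = revenue.read_at([1, 4])  # reads rows 1 and 4 only
```

Small, hot structures stay in ordinary in-memory lists, while the large measure column is only ever touched through `read_at`, which mirrors the positional access API mentioned above.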
2) In-memory OLAP adopts array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length. More preferably, a dimension table uses the array index as its primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can directly locate the corresponding array cell in the dimension table by its foreign key value. The traditional join operation is thereby simplified into array index reference (Array Index Reference, AIR), and OLAP queries are processed with the AIR algorithm.
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access stages: dimension table access, fact table foreign key access, and fact table measure attribute access. The intermediate data structures produced by the three stages are the dimension-table filter-group vectors, the selection vector and grouping vector, and the multidimensional grouping array. The filter-group vectors, the selection vector and the grouping vector are data structures shared by all queries, and different queries only need to update their contents; the multidimensional grouping array is generated dynamically for each query. These three classes of intermediate data structures are reused throughout query processing, form a data set with strong locality, and need to be stored in DRAM. The fact table measure attributes occupy a large amount of storage space, but a query usually accesses only a few measure attributes, and through the selection vector only a very small fraction of the records in a measure column needs to be accessed randomly; the measure columns can therefore be stored in flash, which reduces the DRAM demand of in-memory OLAP and the hardware cost of the system.
The AIR OLAP algorithm of the present invention uses the selection vector, the grouping vector, the dimension-table filter-group vectors and the multidimensional grouping array during OLAP query processing. These data structures can be reused across OLAP queries and occupy a fixed amount of memory, and therefore reside in DRAM.
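To make the array storage and array index reference of step 2) concrete, here is a minimal Python sketch. The helper name `air_lookup` and the sample rows are hypothetical; the point is only that, when the foreign key is the array index of the dimension record, the join becomes a direct array access.

```python
# Dimension table stored column-wise; the array index serves as the primary key.
customer = {
    "c_nation": ["BRAZIL", "CHINA", "CANADA"],
    "c_region": ["AMERICA", "ASIA", "AMERICA"],
}

# Fact table foreign key column: each value is an index into the customer arrays.
lo_custkey = [2, 0, 1, 2]


def air_lookup(dim_column, fk_column):
    # The "join" is one array access per fact row; no hash table is built or probed.
    return [dim_column[fk] for fk in fk_column]


joined_nation = air_lookup(customer["c_nation"], lo_custkey)
# joined_nation == ["CANADA", "BRAZIL", "CHINA", "CANADA"]
```

Because every attribute array of a table has the same length, the same index also addresses any other column of the same dimension record, for example `customer["c_region"][fk]`.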
4) Data warehouses commonly use bitmap join indexes to optimize or eliminate the cost of joins between dimension tables and the fact table. On the basis of a keyword-based bitmap join index (that is, a bitmap join index built per keyword rather than for all members of an attribute), the present invention stores K keyword join bitmaps across the two DRAM-flash levels (each keyword corresponds to one bitmap of the same length as the fact table, which records the fact table record positions that correspond to the keyword): the bitmaps of the n most frequently accessed keywords are stored in DRAM and the remaining bitmaps are stored in flash, forming a two-level bitmap join index structure.
Specifically, among the K keyword join bitmaps, the n most frequently used are selected according to the memory space quota as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps serve as the second-level index. An index entry is the fact table position bitmap that corresponds to a dimension table attribute value; the size of the bitmap index is the product of the fixed bitmap size and the number of keywords, and the bitmap index can reduce memory usage further through data compression. By analyzing the query execution log and the query keywords, the K most frequently used dimension attribute keywords can be determined and bitmap indexes built for them; the n most frequently used keyword bitmaps are kept resident in memory according to the memory space quota, and the remaining K-n keyword bitmaps are stored in flash, forming a two-level bitmap join index structure.
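The keyword selection of step 4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function `place_bitmaps`, the form of the query log (one string per keyword use) and the uncompressed size estimate of one bit per fact row are hypothetical and not taken from the patent.

```python
from collections import Counter


def bitmap_bytes(fact_rows):
    # Uncompressed join bitmap: one bit per fact table row.
    return (fact_rows + 7) // 8


def place_bitmaps(query_log, k, fact_rows, dram_quota_bytes):
    """Pick the top-K keywords by frequency, then split them between DRAM and flash."""
    freq = Counter(query_log)
    top_k = [kw for kw, _ in freq.most_common(k)]
    # n is bounded by how many fixed-length bitmaps fit into the DRAM quota.
    n = min(len(top_k), dram_quota_bytes // bitmap_bytes(fact_rows))
    return top_k[:n], top_k[n:]


# Hypothetical query log: each entry is one use of a dimension attribute keyword.
query_log = (
    ["c_region=AMERICA"] * 5
    + ["s_region=ASIA"] * 4
    + ["d_year=1997"] * 3
    + ["p_category=MFGR#12"] * 2
    + ["c_nation=CHINA"] * 1
)

dram_keys, flash_keys = place_bitmaps(
    query_log, k=4, fact_rows=1_000_000, dram_quota_bytes=300_000
)
```

With one million fact rows each bitmap takes 125,000 bytes, so a 300,000-byte quota holds n = 2 bitmaps in DRAM; the other two of the top four go to flash, and the fifth keyword gets no bitmap at all.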
In step 2) above, OLAP query processing with the AIR algorithm comprises the following steps:
1. The OLAP query is decomposed into filter and group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, so that the query is decomposed into one subquery per dimension table.
2. Dimension-table filter-group vectors are generated: each dimension table filters its records according to its own WHERE clause and projects out the grouping attribute of the records that satisfy the selection condition; the qualifying grouping attribute values are dictionary-compressed, the dictionary is stored in an array, and the dictionary code is the index into the dictionary array; positions where the selection condition is false are set to -1, all other positions are set to the dictionary code of the group, and the result is recorded in a filter-group vector of the same length as the dimension table.
The filter-group vector is a dynamically generated additional column of the dimension table; it replaces the dimension table in the join with the fact table and provides the code of the joined record's grouping attribute on that dimension. When a dimension table has no grouping attribute, the filter-group vector reduces to a bitmap that is used to filter the fact table foreign key in the join.
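The construction of a filter-group vector can be sketched in Python as below. The function name `build_filter_group_vector` and the sample customer rows are hypothetical; the encoding (-1 for filtered-out rows, otherwise the dictionary code of the grouping value) follows the description above.

```python
def build_filter_group_vector(filter_col, group_col, predicate):
    """Return (vector, dictionary) for one dimension table.

    vector[i] is -1 if row i fails the predicate, otherwise the dictionary
    code of row i's grouping value. dictionary[code] is the grouping value.
    """
    dictionary = []  # dictionary array: position == code
    code_of = {}     # helper map used only while building the dictionary
    vector = []
    for f, g in zip(filter_col, group_col):
        if not predicate(f):
            vector.append(-1)
            continue
        if g not in code_of:
            code_of[g] = len(dictionary)
            dictionary.append(g)
        vector.append(code_of[g])
    return vector, dictionary


# Hypothetical customer dimension: filter on c_region, group by c_nation.
c_region = ["AMERICA", "ASIA", "AMERICA", "EUROPE", "AMERICA"]
c_nation = ["BRAZIL", "CHINA", "CANADA", "FRANCE", "BRAZIL"]

cust_vec, cust_dict = build_filter_group_vector(
    c_region, c_nation, lambda region: region == "AMERICA"
)
# cust_vec  == [0, -1, 1, -1, 0]
# cust_dict == ["BRAZIL", "CANADA"]
```

The vector has the same length as the dimension table, so a fact table foreign key value indexes it directly, in the same way it would index any dimension column.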
3. Multi-pass scanning of the fact table foreign keys creates the selection and grouping vectors: the foreign key columns of the fact table are scanned one after another in order of the selectivity of the dimension-table filter-group vectors, from low to high; during each column scan the foreign key value is mapped to the corresponding position of the respective filter-group vector, and if the vector value is non-negative the current fact table record position is inserted into the selection vector; the next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated by the corresponding filter-group vector, deleting the fact table record positions that do not satisfy the subsequent foreign key mapping conditions; after all foreign key columns have been scanned, the selection vector records the positions of the fact table records that satisfy all join conditions. When the overall selectivity is high, the grouping vector is updated together with the selection vector, and the index into the multidimensional grouping array is computed incrementally from the index of the current group in the dimension-table filter-group vector; when the overall selectivity is very low, the selection vector is generated first, and each foreign key column is finally accessed randomly at the positions in the selection vector to generate the grouping vector in one pass.
4. The specified measure values of the flash-resident measure columns are accessed through the selection vector: the measure attribute values stored in flash are accessed at the positions in the selection vector, so only a small number of measure values need to be returned for group aggregation; flash has good parallel random access performance, so the random access on flash by selection vector is executed in parallel on multiple cores, which reduces flash data access latency.
5. Measure values are aggregated into the multidimensional grouping array through the grouping vector: according to the group index recorded in the grouping vector, each measure attribute value returned from flash is "pushed" into the cell of the multidimensional grouping array indicated by that index and aggregated there, completing the OLAP query processing.
6. The index on each dimension of the multidimensional grouping array is mapped back to the grouping dictionary to obtain the grouping attribute values, and the query result is output.
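Steps 3 to 6 above can be sketched end to end in Python. This is a simplified illustration, not the patented implementation: the function names, the row-major flattening of the multidimensional grouping array into a list, and the sample data are assumptions, and the measure column is an in-memory list standing in for the flash-resident column. The sketch follows the variant in which the grouping vector is updated together with the selection vector.

```python
def air_scan(fk_columns, filter_vectors, dict_sizes):
    """Step 3: build the selection and grouping vectors.

    fk_columns[i] pairs with filter_vectors[i] and dict_sizes[i]; the caller
    lists them in scan order (lowest-selectivity dimension first).
    """
    first_fk, first_vec = fk_columns[0], filter_vectors[0]
    # Pass 1: sequential scan of the first foreign key column.
    selection = [pos for pos, fk in enumerate(first_fk) if first_vec[fk] >= 0]
    grouping = [first_vec[first_fk[pos]] for pos in selection]
    # Later passes: random access at the surviving positions only.
    for fk_col, vec, size in zip(fk_columns[1:], filter_vectors[1:], dict_sizes[1:]):
        kept_sel, kept_grp = [], []
        for pos, grp in zip(selection, grouping):
            code = vec[fk_col[pos]]
            if code >= 0:
                kept_sel.append(pos)
                kept_grp.append(grp * size + code)  # incremental row-major index
        selection, grouping = kept_sel, kept_grp
    return selection, grouping


def aggregate(selection, grouping, measure, dict_sizes):
    """Steps 4 and 5: fetch measures by position and push them into group cells."""
    cells = 1
    for size in dict_sizes:
        cells *= size
    sums, hits = [0] * cells, [0] * cells
    for pos, grp in zip(selection, grouping):
        sums[grp] += measure[pos]  # measure[pos] stands in for a positional flash read
        hits[grp] += 1
    return sums, hits


def decode(cell, dict_sizes):
    """Step 6: turn a flat cell index back into one dictionary code per dimension."""
    coords = []
    for size in reversed(dict_sizes):
        coords.append(cell % size)
        cell //= size
    return coords[::-1]


# Hypothetical filter-group vectors and dictionaries for two dimensions.
cust_vec, cust_dict = [0, -1, 1], ["BRAZIL", "CANADA"]
supp_vec, supp_dict = [-1, 0, 1], ["CHINA", "JAPAN"]
# Hypothetical fact table columns.
lo_custkey = [0, 1, 2, 0, 2, 0]
lo_suppkey = [1, 1, 0, 2, 1, 1]
revenue = [10, 20, 30, 40, 50, 60]

sizes = [len(cust_dict), len(supp_dict)]
selection, grouping = air_scan([lo_custkey, lo_suppkey], [cust_vec, supp_vec], sizes)
sums, hits = aggregate(selection, grouping, revenue, sizes)

rows = []
for cell, total in enumerate(sums):
    if hits[cell]:  # skip groups that received no fact rows
        c, s = decode(cell, sizes)
        rows.append((cust_dict[c], supp_dict[s], total))
```

For this data the selection vector is `[0, 3, 4, 5]`, the grouping vector is `[0, 1, 2, 0]`, and `rows` holds the three non-empty groups. The measure column is touched only at the four selected positions, which is the property that lets it live on flash.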
Embodiment:
As shown in Fig. 2, the flash-aware optimization of the OLAP query algorithm consists in pushing the access to the measure attributes stored in flash to the final stage: the sequential scan of the fact table is converted into low-selectivity random access to the fact table measure attributes according to the selection vector, which reduces the latency of measure attribute access on flash.
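The deferred, parallel measure access can be sketched in Python as follows. The names `read_chunk` and `parallel_fetch`, the 8-byte integer layout, the worker count and the temporary file used in place of a flash device are assumptions for illustration; the sketch only shows how a selection vector can be partitioned so that several threads issue positional reads concurrently.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

ITEM = struct.Struct("<q")  # assumed layout: one 64-bit integer per fact row


def read_chunk(path, positions):
    # Each worker opens its own handle and reads only its share of positions.
    values = []
    with open(path, "rb") as f:
        for pos in positions:
            f.seek(pos * ITEM.size)
            values.append(ITEM.unpack(f.read(ITEM.size))[0])
    return values


def parallel_fetch(path, selection, workers=4):
    """Fetch measure values at the selected positions, preserving their order."""
    chunk = max(1, -(-len(selection) // workers))  # ceiling division
    parts = [selection[i:i + chunk] for i in range(0, len(selection), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(read_chunk, [path] * len(parts), parts)
        return [v for part in results for v in part]


# A file standing in for a flash-resident measure column of 1000 rows.
path = os.path.join(tempfile.mkdtemp(), "measure.col")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(ITEM.pack(i * i))

fetched = parallel_fetch(path, [3, 10, 500, 999])
```

Only four of the thousand rows are read here. Threads are used because CPython releases its global interpreter lock during blocking file reads; an actual system would more likely rely on native asynchronous or multi-queue I/O.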
Step 2) is illustrated with the following query:
SELECT c_nation,s_nation,sum(l_revenue),sum(l_price)
FROM customer,lineorder,supplier
WHERE lo_custkey=c_custkey
and lo_suppkey=s_suppkey
and c_region='AMERICA'
and s_region='ASIA'
group by c_nation,s_nation;
The query joins the fact table lineorder with the dimension tables customer and supplier, filters on the dimension attributes c_region and s_region, and then sums the fact table measure columns l_revenue and l_price grouped by the dimension attributes c_nation and s_nation.
Fig. 3 shows an embodiment of the keyword bitmap join index processing of step 4) on two-level DRAM-flash storage:
In traditional database, index is a kind of planar structure, supposes to use in identical memory hierarchy.The index technology that B+-sets the disk database such as index realizes the exchanges data of internal memory disappearance node at disk and internal memory by buffer zone mechanism, but still be a kind of opaque mechanism, its index accesses efficiency depends on that the efficiency of algorithm is replaced in buffer zone, can not be optimized by custom database formula.In OLAP application, index entry is present in dimension Table Properties, but the object of index is then linkage record corresponding in fact table, the data on single table can only be retrieved in common B+-tree index, OLAP is applied, the accelerating effect of raising to OLAP query overall performance of the index accesses performance on less dimension table is limited, and the attended operation that index can accelerate between fact table and dimension table is set up to fact table external key, but the index stores expense on fact table external key is very huge, and fact table needs a large amount of in real time renewal in modern OLAP application, the renewal of index is costly.The present invention adopt be bit join index method, namely by attended operation be the attribute member that dimension table is specified set up fact table connect bitmap, each member sets up a bitmap, indicates the link position of this member on fact table record.Bit join index is a kind of increment index, expands bitmap lengths with only needing increment when fact table mass data is inserted, and does not need the connection bitmap to having set up to be reconstructed.The bit join index used in traditional database is to take attribute as granularity be all members connect bitmap, be faced with that low power set attribute bit join index space expense is little but selection rate is too high, and high power set Attributions selection rate is low but to connect bitmap quantity many, the 
contradiction that storage overhead is excessive.The present invention adopts and selects bit join index key word by the TOP K visiting frequency of the dimension attribute key word in query load, and be that K overall high frequency access key sets up overall bit join index, form a key/value index structure, key is the global name of key word, comprise table and keyword message, value connects bitmap.The key word of different dimensional table, different attribute has identical connection bitmap lengths, can be stored in overall bit join index.Wherein more preferably, the present invention proposes DRAM-flash two-stage bit join index method on the basis of key word bit join index, accurately connect in bitmap at K key word the n selecting the most frequently to use according to memory storage space quota individual as the resident bit join index of DRAM, all the other K-n key word bitmaps are as secondary index.As shown in Figure 3, show the application scenarios that key word of the inquiry bitmap is present in DRAM and flash two-level memory respectively, the bitmap first in DRAM completes selection operation, generates filtered bitmap, when the selection rate of filtered bitmap is at certain threshold range (S low, S high) between time, in storing to flash according to the position of " 1 " in the filtered bitmap generated, the assigned address of corresponding bitmap carries out random access, determines the logical consequence that this position is final.Flash has good random access ability, the present invention adopts filtered bitmap sequential access, to the strategy of flash bitmap concurrent access, improves the concurrent access performance of data on flash, reduce the delay that bitmap index calculates, parallel flash accesses thread as shown in Figure 3.Suppose that the selection rate of filtered bitmap in DRAM is S 1, the selection rate of flash Bitmap is S 2(selection rate of bitmap accurately can be provided by the quantity of " 1 "), T f(S 1) be 
the bitmap access delay on flash, T p(S 2, S 1) for being S in index selection rate 2and S 1time query processing mistiming, work as T p(S 2, S 1)-T f(S 1) >0 time, on flash bitmap access there is query performance income.S lowand S highfor the minimum of satisfied inquiry income and most high selectivity difference are (as S 2-S 1).
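The two-level bitmap join index described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_join_bitmaps`, `split_two_level`, `refine_with_flash`), the use of Python lists as bitmaps, and the `read_flash_bit` callable that stands in for a positional read on a flash-resident bitmap are all assumptions made for illustration. The sketch also assumes that, when the filter bitmap's selectivity falls outside (S_low, S_high), the flash bitmap is simply not probed and the remaining predicate is left to later query processing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def build_join_bitmaps(table, attr, fact_fk, dim_attr):
    """One join bitmap per attribute member, keyed by a global name
    (table, attribute, member). Bit i is 1 when fact row i joins a
    dimension row whose attribute equals that member."""
    bitmaps = {}
    for row, fk in enumerate(fact_fk):
        key = (table, attr, dim_attr[fk])  # foreign key = dimension array index
        bitmaps.setdefault(key, [0] * len(fact_fk))[row] = 1
    return bitmaps


def split_two_level(bitmaps, access_counts, k, n):
    """Keep the K most frequently accessed keywords; the top n stay in
    DRAM, the remaining K-n form the second level (flash)."""
    top_k = [key for key, _ in Counter(access_counts).most_common(k)
             if key in bitmaps]
    dram = {key: bitmaps[key] for key in top_k[:n]}
    flash = {key: bitmaps[key] for key in top_k[n:]}
    return dram, flash


def selectivity(bitmap):
    """Fraction of '1' bits in the bitmap."""
    return sum(bitmap) / len(bitmap) if bitmap else 0.0


def refine_with_flash(filter_bitmap, read_flash_bit, s_low, s_high, workers=4):
    """Scan the DRAM filter bitmap sequentially and probe the flash bitmap
    only at its '1' positions, in parallel. Returns (bitmap, probed)."""
    s1 = selectivity(filter_bitmap)
    if not (s_low < s1 < s_high):
        return filter_bitmap, False  # assumption: skip flash outside the range
    positions = [pos for pos, bit in enumerate(filter_bitmap) if bit]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bits = list(pool.map(read_flash_bit, positions))  # order is preserved
    refined = [0] * len(filter_bitmap)
    for pos, bit in zip(positions, bits):
        refined[pos] = bit
    return refined, True
```

In this sketch the thread pool only mimics the parallel flash access threads of Figure 3; a real system would issue positional reads against the flash device rather than index a Python list.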
In summary, compared with the prior art, the present invention adopts an in-memory OLAP query optimization method based on DRAM-flash two-level storage. It supports in-memory OLAP query processing on DRAM-flash two-level storage, reduces the demand of in-memory OLAP for expensive memory, and improves the cost-effectiveness of in-memory OLAP. Through the storage optimization method, the OLAP query optimization method, and the index optimization method based on DRAM-flash two-level storage, data access and query processing performance are optimized transparently during OLAP query processing. The present invention also improves the data management capability and the query processing performance of in-memory OLAP on big data.
The above embodiments are intended only to illustrate the present invention; the structure of each component, the manner of connection, the manufacturing process, and so on may all vary. Any equivalent transformation or improvement made on the basis of the technical solution of the present invention shall not be excluded from the scope of protection of the present invention.

Claims (8)

1. An OLAP query optimization method in a memory-flash hybrid storage mode, comprising the following steps:
1) OLAP storage adopts a flash-aware storage model: based on the characteristics that, in an OLAP star or snowflake schema, the dimension tables are small and subject to many predicate operations while the fact table consists of foreign keys and measure attributes, the data are partitioned by access locality between the relatively small DRAM and the relatively large flash storage, and storage is optimized over the heterogeneous two-level memory;
2) in-memory OLAP adopts array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length; a dimension table uses the array index as its primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so that a fact table record can directly locate the corresponding array cell in the dimension table by its foreign key value; the traditional join operation is thereby reduced to array index access, and OLAP query processing is carried out with the AIR algorithm, wherein AIR denotes array index access;
3) the OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access; the intermediate data structures produced by the three phases comprise: the dimension table filter-grouping vectors, the selection vector and grouping vector, and the grouping multidimensional array; the dimension table filter-grouping vectors, the selection vector, and the grouping vector are data structures shared by all queries, so that different queries only need to update the vector contents, while the grouping multidimensional array is generated dynamically for each query;
4) on the basis of a keyword-based bitmap join index, the storage of K keyword join bitmaps is optimized over DRAM-flash two-level storage: the n most frequently accessed keywords are selected from them and their bitmaps are stored in DRAM, and the remaining bitmaps are stored in flash, forming a two-level bitmap join index structure.
2. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 1, characterized in that: in step 1), a memory storage engine is adopted and the dimension tables reside in DRAM; the foreign keys in the fact table, being the columns accessed most frequently by star joins in multidimensional analysis, likewise reside in DRAM; the measure attributes of the fact table are stored in flash, and a positional access API on the measure columns is provided to support random access to the measure columns.
3. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 1, characterized in that: in step 3), the dimension table filter-grouping vectors, the selection vector and grouping vector, and the grouping multidimensional array are stored in DRAM; the fact table measure attributes are stored in flash.
4. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 2, characterized in that: in step 3), the dimension table filter-grouping vectors, the selection vector and grouping vector, and the grouping multidimensional array are stored in DRAM; the fact table measure attributes are stored in flash.
5. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 1, 2, 3, or 4, characterized in that: in step 2), the OLAP query processing of the AIR algorithm comprises the following sub-steps:
(1) decomposing the OLAP query into filter-and-group operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, so that the query is decomposed into one subquery per dimension table;
(2) generating the dimension table filter-grouping vectors: each dimension table filters its records according to its own WHERE clause and projects out the grouping attributes of the records that satisfy the selection condition; the grouping attribute values that satisfy the selection condition are dictionary-compressed, the dictionary being stored in an array and the dictionary code being the index into the dictionary array; positions where the selection condition is false are set to -1 and all other positions are set to the grouping dictionary code, and the result is recorded in a filter-grouping vector of the same length as the dimension table;
(3) scanning the fact table foreign key columns in multiple passes to create the selection vector and the grouping vector: the corresponding fact table foreign key columns are scanned in order of increasing selectivity of the dimension table filter-grouping vectors; during each column scan, the foreign key value is mapped to the corresponding position of the respective dimension table filter-grouping vector, and when the vector value is non-negative the current fact table record position is inserted into the selection vector; the next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated through the respective dimension table filter-grouping vector by deleting the fact table record positions that do not satisfy the subsequent foreign key mapping condition; after all foreign key columns have been scanned, the selection vector records the positions of the fact table records that satisfy all join conditions; when the overall selectivity is high, the grouping vector is updated together with the selection vector, the index into the grouping multidimensional array being computed incrementally from the current group index in the dimension table filter-grouping vector; when the overall selectivity is very low, the selection vector is generated first, and each foreign key column is then accessed randomly at the selection vector positions to generate the grouping vector in a single pass;
(4) accessing the specified measure values of the measure columns in flash storage through the selection vector: the measure attribute values in flash storage are accessed at the positions recorded in the selection vector; the random accesses to flash driven by the selection vector are executed in parallel on multiple cores;
(5) aggregating the measure values into the grouping multidimensional array through the grouping vector: according to the group index recorded in the grouping vector, the measure attribute values returned from flash storage are "pushed" into the grouping multidimensional array cell indicated by that index value for aggregation, which completes the OLAP query processing;
(6) mapping the index of each dimension of the grouping multidimensional array back to the grouping dictionary tables to obtain the grouping attribute values, and outputting the query result.
6. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 5, characterized in that: in sub-step (2), the filter-grouping vector is a dynamically generated additional column of the dimension table, which takes the place of the dimension table in the join with the fact table and provides the code of the joined record's grouping attribute on that dimension; when the dimension table has no grouping attribute, the filter-grouping vector reduces to a bitmap used to filter the join on the fact table foreign key.
7. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 1, 2, 3, 4, or 6, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is adopted on the basis of the keyword bitmap join index: from the K keyword join bitmaps, the n most frequently used are selected according to the memory space quota as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
8. The OLAP query optimization method in a memory-flash hybrid storage mode as claimed in claim 5, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is adopted on the basis of the keyword bitmap join index: from the K keyword join bitmaps, the n most frequently used are selected according to the memory space quota as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
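The AIR processing steps recited in claim 5 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the claimed implementation: the names `filter_group_vector`, `air_query`, and `read_measure` are hypothetical, `read_measure` stands in for the positional access API on a flash-resident measure column, the aggregate is assumed to be a sum, every dimension is assumed to be non-empty and to carry a grouping attribute, and the grouping multidimensional array is flattened in row-major order. Only the low-selectivity variant is shown, in which the grouping vector is generated in one pass after the selection vector is complete.

```python
def filter_group_vector(dim_rows, predicate, group_attr):
    """Build a dimension filter-grouping vector: -1 where the predicate
    fails, otherwise the dictionary code of the grouping attribute.
    The dictionary is an array (list); the code is its index."""
    dictionary, codes, vector = [], {}, []
    for row in dim_rows:
        if not predicate(row):
            vector.append(-1)
            continue
        value = row[group_attr]
        if value not in codes:
            codes[value] = len(dictionary)
            dictionary.append(value)
        vector.append(codes[value])
    return vector, dictionary


def air_query(fk_columns, fg_vectors, dictionaries, read_measure):
    """fk_columns[d][i] is the foreign key of fact row i into dimension d,
    which is also the array index of the matching dimension record."""
    n_dims = len(fk_columns)
    n_rows = len(fk_columns[0])

    def passing_ratio(d):
        vec = fg_vectors[d]
        return sum(code >= 0 for code in vec) / len(vec)

    # Scan foreign key columns from the most selective dimension onward.
    selection = range(n_rows)
    for d in sorted(range(n_dims), key=passing_ratio):
        selection = [pos for pos in selection
                     if fg_vectors[d][fk_columns[d][pos]] >= 0]

    # Grouping vector: row-major index into the grouping array.
    grouping = []
    for pos in selection:
        flat = 0
        for d in range(n_dims):
            flat = flat * len(dictionaries[d]) + fg_vectors[d][fk_columns[d][pos]]
        grouping.append(flat)

    # Fetch measures by position and push them into the grouping array.
    size = 1
    for dictionary in dictionaries:
        size *= len(dictionary)
    cube = [0] * size
    for pos, flat in zip(selection, grouping):
        cube[flat] += read_measure(pos)

    # Map each array index back to grouping attribute values.
    result = {}
    for flat in sorted(set(grouping)):
        rest, key = flat, []
        for d in reversed(range(n_dims)):
            rest, code = divmod(rest, len(dictionaries[d]))
            key.append(dictionaries[d][code])
        result[tuple(reversed(key))] = cube[flat]
    return result
```

Because each foreign key is used directly as an index into a filter-grouping vector, the sketch contains no hash table or join operator, which is the point of reducing the join to array index access.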

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410717830.6A CN104361113B (en) 2014-12-01 2014-12-01 OLAP query optimization method in memory-flash hybrid storage mode

Publications (2)

Publication Number Publication Date
CN104361113A true CN104361113A (en) 2015-02-18
CN104361113B CN104361113B (en) 2017-06-06

Family

ID=52528373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410717830.6A Active CN104361113B (en) 2014-12-01 2014-12-01 OLAP query optimization method in memory-flash hybrid storage mode

Country Status (1)

Country Link
CN (1) CN104361113B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6651055B1 (en) * 2001-03-01 2003-11-18 Lawson Software, Inc. OLAP query generation engine
CN102663114A (en) * 2012-04-17 2012-09-12 中国人民大学 Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing)
US20130282650A1 (en) * 2012-04-18 2013-10-24 Renmin University Of China OLAP Query Processing Method Oriented to Database and HADOOP Hybrid Platform
CN103309958A (en) * 2013-05-28 2013-09-18 中国人民大学 OLAP star connection query optimizing method under CPU and GPU mixing framework
CN103631911A (en) * 2013-11-27 2014-03-12 中国人民大学 OLAP query processing method based on array storage and vector processing

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930388A (en) * 2016-04-14 2016-09-07 中国人民大学 OLAP grouping aggregation method based on function dependency relationship
CN108255829A (en) * 2016-12-28 2018-07-06 腾讯科技(北京)有限公司 Data search method and device
CN108255829B (en) * 2016-12-28 2021-10-19 腾讯科技(北京)有限公司 Data searching method and device
US11132346B2 (en) 2017-04-14 2021-09-28 Huawei Technologies Co., Ltd. Information processing method and apparatus
CN108733681A (en) * 2017-04-14 2018-11-02 华为技术有限公司 Information processing method and device
CN108733681B (en) * 2017-04-14 2021-10-22 华为技术有限公司 Information processing method and device
CN109783008B (en) * 2017-11-13 2022-04-26 爱思开海力士有限公司 Data storage device and operation method thereof
CN109783008A (en) * 2017-11-13 2019-05-21 爱思开海力士有限公司 Data storage device and its operating method
CN109086456A (en) * 2018-08-31 2018-12-25 中国联合网络通信集团有限公司 data index method and device
CN111399752B (en) * 2019-01-03 2023-11-28 慧荣科技股份有限公司 Control device and method for different types of storage units
CN111399752A (en) * 2019-01-03 2020-07-10 慧荣科技股份有限公司 Control device and method for different types of storage units
US11748022B2 (en) 2019-01-03 2023-09-05 Silicon Motion, Inc. Method and apparatus for controlling different types of storage units
CN111782734A (en) * 2019-04-04 2020-10-16 华为技术服务有限公司 Data compression and decompression method and device
CN110647722A (en) * 2019-09-20 2020-01-03 北京中科寒武纪科技有限公司 Data processing method and device and related product
CN110647722B (en) * 2019-09-20 2024-03-01 中科寒武纪科技股份有限公司 Data processing method and device and related products
US11386089B2 (en) 2020-01-13 2022-07-12 The Toronto-Dominion Bank Scan optimization of column oriented storage
CN112597114B (en) * 2020-12-23 2023-09-15 跬云(上海)信息科技有限公司 OLAP (on-line analytical processing) precomputation engine optimization method and application based on object storage
CN112597114A (en) * 2020-12-23 2021-04-02 跬云(上海)信息科技有限公司 OLAP pre-calculation engine optimization method based on object storage and application
CN115309947A (en) * 2022-08-15 2022-11-08 北京欧拉认知智能科技有限公司 Method and system for realizing online analysis engine based on graph
CN115309947B (en) * 2022-08-15 2023-03-21 北京欧拉认知智能科技有限公司 Method and system for realizing online analysis engine based on graph
CN116483831A (en) * 2023-04-12 2023-07-25 上海沄熹科技有限公司 Recommendation index generation method for distributed database
CN116483831B (en) * 2023-04-12 2024-01-30 上海沄熹科技有限公司 Recommendation index generation method for distributed database

Also Published As

Publication number Publication date
CN104361113B (en) 2017-06-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant