CN104361113B - OLAP query optimization method under a hybrid memory-flash storage model - Google Patents

OLAP query optimization method under a hybrid memory-flash storage model

Info

Publication number
CN104361113B
Authority
CN
China
Prior art keywords
flash
vector
memory
packet
olap
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410717830.6A
Other languages
Chinese (zh)
Other versions
CN104361113A (en)
Inventor
张延松
张宇
王珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Renmin University of China
Original Assignee
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Renmin University of China
Priority to CN201410717830.6A
Publication of CN104361113A
Application granted
Publication of CN104361113B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2237 Vectors, bitmaps or matrices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device

Abstract

The present invention relates to an OLAP query optimization method under a hybrid memory-flash storage model, comprising: OLAP storage uses a flash-aware storage model in which data are partitioned by access locality between a relatively small DRAM and a relatively large flash store, so that storage is optimized across two heterogeneous memory levels; in-memory OLAP uses array storage, with each attribute column stored in contiguous array cells, so that the traditional join operation is reduced to array index access and OLAP queries are processed by the AIR (Array Index Reference) algorithm; OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases; specified measure values in flash-resident measure columns are accessed through a selection vector; and, on the basis of a keyword-based bitmap join index, the storage of K keyword join bitmaps is optimized across the DRAM-flash two-level storage, forming a two-level join bitmap index structure. The invention can improve the cost-effectiveness of memory storage, memory and CPU utilization, and data storage efficiency, and is widely applicable to general OLAP scenarios.

Description

OLAP query optimization method under a hybrid memory-flash storage model
Technical field
The present invention relates to storage optimization and OLAP (online analytical processing) query optimization in the database field, and in particular to an OLAP query optimization method under a hybrid memory-flash storage model that is suited to in-memory database appliance platforms and is based on two-level storage composed of DRAM (dynamic random access memory) and flash memory.
Background
In-memory analytical processing (in-memory OLAP) is an important technology for real-time big data analysis. Supported by large memories and the parallel processing capability of multi-core processors, in-memory OLAP delivers excellent real-time analytical performance. Compared with other storage devices such as flash and disk, however, memory is still a very expensive storage medium, and its storage energy consumption is about an order of magnitude higher than that of flash (DRAM: ~100 mW/GB; NAND flash: 1-10 mW/GB). Because in-memory OLAP operates on big data, the hardware cost of in-memory analytical processing is very high. The PCIe flash card, a large-capacity (hundreds of GB to TB), high-speed storage technology, is already widely used in high-performance databases. For example, the Oracle Exadata X3 in-memory database appliance is configured with large-capacity high-speed flash cards and provides Smart Flash Cache to cache hot data; the cache algorithm is optimized by database logic, and cache optimization policies can be specified per table. On the one hand, high-speed flash cards give expensive memory storage a cheap second-level storage extension. On the other hand, they are mainly used as an extended database cache on flash that enlarges the memory buffer; they are not combined with the storage optimization and query processing optimization techniques of in-memory OLAP, and no flash-aware OLAP optimization is realized at the level of the OLAP algorithm.
Current analytical in-memory databases mainly use DRAM as the primary storage device, with flash serving as backup storage in place of disk or as a disk cache; no OLAP algorithm has yet been designed that incorporates flash alongside memory. Two technical problems therefore need to be solved urgently: how to apply the two-level storage model formed by memory and large-capacity flash to in-memory analytical processing so as to build a mainstream platform with high performance and high cost-effectiveness, and how to make an in-memory database support analytical processing not only under a fully in-memory model but also under DRAM-flash two-level storage.
Summary of the invention
In view of the above problems, the object of the present invention is to provide an OLAP query optimization method under a hybrid memory-flash storage model. The method is based on DRAM-flash two-level storage and improves the cost-effectiveness of memory storage by using cheap, large-capacity flash. The method also effectively improves data storage efficiency as well as memory and CPU utilization.
To achieve the above object, the present invention adopts the following technical solution: an OLAP query optimization method under a hybrid memory-flash storage model, comprising the following steps: 1) OLAP storage uses a flash-aware storage model; that is, given that dimension tables in an OLAP star or snowflake schema are small and subject to many predicate operations, and that the fact table consists of foreign keys and measure attributes, data are partitioned by access locality between a relatively small DRAM and a relatively large flash store, and storage is optimized across the two heterogeneous memory levels; 2) in-memory OLAP uses array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length; dimension tables use the array index as the primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can locate the corresponding dimension array cell directly from its foreign key value; the traditional join operation is thereby reduced to array index access, and OLAP queries are processed by the AIR algorithm, where AIR stands for Array Index Reference; 3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access; the intermediate data structures produced by the three phases are the dimension table filter-grouping vectors, the selection vector and grouping vector, and the multidimensional grouping array; the filter-grouping vectors, selection vector and grouping vector are data structures shared by all queries, and different queries only need to update the vector contents, whereas the multidimensional grouping array is generated dynamically for each query; 4) on the basis of a keyword-based bitmap join index, the storage of K keyword join bitmaps is optimized across the DRAM-flash two-level storage: the n most frequently accessed keywords are selected and their bitmaps are stored in DRAM, and the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
In step 1), an in-memory storage engine is used and the dimension tables reside in DRAM; the foreign keys of the fact table are the columns accessed most frequently during star joins in multidimensional analysis and likewise reside in DRAM; the measure attributes of the fact table are stored in flash, and a positional access API is provided on the measure columns to support random access to them.
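The positional access API on a flash-resident measure column can be illustrated with a minimal sketch. The patent does not specify the API, so the class name `MeasureColumn`, its methods, the 8-byte integer record format, and the use of an ordinary file as a stand-in for flash are all assumptions made for illustration.

```python
import os
import struct
import tempfile


class MeasureColumn:
    """Fixed-width measure column kept in a file that stands in for flash."""

    RECORD = struct.Struct("<q")  # assumption: 8-byte little-endian integers

    def __init__(self, path):
        self.path = path

    @classmethod
    def create(cls, path, values):
        """Write the column sequentially, one fixed-width record per row."""
        with open(path, "wb") as f:
            for v in values:
                f.write(cls.RECORD.pack(v))
        return cls(path)

    def get(self, position):
        """Return the measure value at a 0-based row position."""
        with open(self.path, "rb") as f:
            f.seek(position * self.RECORD.size)
            return self.RECORD.unpack(f.read(self.RECORD.size))[0]

    def get_many(self, positions):
        """Return the measure values at several row positions."""
        out = []
        with open(self.path, "rb") as f:
            for p in positions:
                f.seek(p * self.RECORD.size)
                out.append(self.RECORD.unpack(f.read(self.RECORD.size))[0])
        return out


with tempfile.TemporaryDirectory() as d:
    col = MeasureColumn.create(os.path.join(d, "revenue.col"), [10, 20, 30, 40, 50])
    single = col.get(3)                # one row by position
    picked = col.get_many([0, 2, 4])   # rows named by a selection vector
```

Because every record has the same width, a row position translates directly into a byte offset, which is what makes access by position cheap on a random-access device.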
In step 3), the dimension table filter-grouping vectors, the selection vector, the grouping vector and the multidimensional grouping array are stored in DRAM; the fact table measure attributes are stored in flash.
In step 2), OLAP query processing with the AIR algorithm comprises the following steps: 1. Decompose the OLAP query into filter and grouping operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, decomposing the query into one subquery per dimension table. 2. Generate the dimension table filter-grouping vectors: each dimension table filters its records according to its own WHERE clause and projects the grouping attributes of the records that satisfy the selection condition; these grouping attributes are dictionary-compressed, the dictionary is stored in an array, and the dictionary code is the dictionary array index; in the filter-grouping vector, which has the same length as the dimension table, a position whose selection condition is false is set to -1, and any other position is set to the compressed grouping code. 3. Create the selection and grouping vectors through multiple scans of the fact table foreign keys: the corresponding foreign key columns of the fact table are scanned one after another in ascending order of the selectivity of the dimension table filter-grouping vectors; during each column scan, the foreign key value is mapped to the position it designates in the corresponding filter-grouping vector, and when the vector value there is non-negative the current fact table record position is inserted into the selection vector; the next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated using the corresponding filter-grouping vector by deleting the fact table record positions that fail the subsequent foreign key mapping condition; after all foreign key columns have been scanned, the selection vector records the positions of the fact table records that satisfy all join conditions; when the overall selectivity is high, the grouping vector is updated together with the selection vector, and the multidimensional grouping array index is computed incrementally from the index of the current group in each filter-grouping vector; when the overall selectivity is very low, the selection vector is generated first, and the grouping vector is then generated in one pass by randomly accessing each foreign key column at the positions in the selection vector. 4. Access the specified measure values in the flash-resident measure columns through the selection vector: the measure attribute values stored in flash are accessed at the positions in the selection vector; random access on flash driven by the selection vector is executed in parallel on multiple cores. 5. Aggregate the measure values into the multidimensional grouping array through the grouping vector: using the group index recorded in the grouping vector, each measure attribute value returned from flash is "pushed" into the multidimensional grouping array cell indicated by that index for aggregation, completing the OLAP query processing. 6. Map each dimension of the multidimensional grouping array back through the grouping dictionary to obtain the grouping attribute values, and output the query result.
In sub-step 2 of the AIR query processing, the filter-grouping vector is a dynamically generated additional column of the dimension table; it takes the place of the dimension table in the join with the fact table and supplies the codes of the grouping attributes of the joined records on that dimension. When the dimension table has no grouping attribute, the filter-grouping vector reduces to a bitmap that is used for join filtering on the fact table foreign key.
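The construction of a filter-grouping vector, including its degenerate bitmap form, can be sketched as follows. The function name `build_filter_group_vector`, the row-dictionary representation of the dimension table, and the sample data are illustrative assumptions and are not taken from the patent.

```python
def build_filter_group_vector(rows, predicate, group_attr=None):
    """Return (vector, dictionary) for one dimension table.

    With a grouping attribute, vector[i] is -1 when row i fails the
    predicate and otherwise the dictionary code of its grouping value.
    Without one, the vector degenerates to a 0/1 bitmap.
    """
    dictionary = []   # code -> grouping value (array-based dictionary)
    codes = {}        # grouping value -> code (helper used while building)
    vector = []
    for row in rows:
        passed = predicate(row)
        if group_attr is None:
            vector.append(1 if passed else 0)
        elif not passed:
            vector.append(-1)
        else:
            value = row[group_attr]
            if value not in codes:
                codes[value] = len(dictionary)
                dictionary.append(value)
            vector.append(codes[value])
    return vector, dictionary


# Hypothetical customer dimension; the list index plays the role of the key.
customer = [
    {"c_nation": "CHINA", "c_region": "ASIA"},
    {"c_nation": "BRAZIL", "c_region": "AMERICA"},
    {"c_nation": "CANADA", "c_region": "AMERICA"},
    {"c_nation": "BRAZIL", "c_region": "AMERICA"},
]


def in_america(row):
    return row["c_region"] == "AMERICA"


vec, dic = build_filter_group_vector(customer, in_america, "c_nation")
bitmap, _ = build_filter_group_vector(customer, in_america)
```

Here `vec` carries both the filter result and the group code, so a fact table foreign key can be resolved with a single array lookup.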
In step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
By adopting the above technical solution, the present invention has the following advantages:
1. The invention uses DRAM-flash two-level storage as the storage platform for in-memory OLAP. Compared with a fully in-memory storage model, DRAM-flash two-level storage substantially reduces the demand for expensive memory and lowers the overall hardware cost. Through the flash-aware storage model and the optimized AIR OLAP algorithm, the large volume of measure attributes stored on flash is accessed by efficient random access, which narrows the performance gap of flash storage.
2. The AIR OLAP query processing algorithm divides a complete OLAP query into three independent processing phases. The dimension table and fact table foreign key phases touch only a small fraction of the data, and each dimension table generates one fixed-length filter-grouping vector. However many fact table columns a query involves, only one selection vector and one grouping vector are needed; their initial length is determined by the lowest selectivity among the dimensions and keeps shrinking during the star join over the fact table foreign keys, so the memory overhead is bounded. The measure attributes, which take up the largest share of database storage, are stored in large-capacity flash. Traditional row-at-a-time OLAP query processing (including row-wise access in both row stores and column stores) performs pipelined hash joins and hash group-by aggregation during a full scan of the fact table. The AIR OLAP algorithm instead separates the join, grouping and aggregation operations: it does not use the full table scan common in traditional databases but completes the join and grouping on a limited number of fact table foreign keys, and then accesses the specified positions of the specified measure columns according to a selection vector of very low selectivity. Deferring access to flash-resident data to the final processing phase greatly reduces the data access load on flash while exploiting the good random access performance of flash.
3. The bitmap join index used by the invention is an incremental index. Index updates caused by newly added fact table records are sequential appends to the bitmaps rather than updates of node content and structure as in a B+-tree index, which eliminates the cost of rewriting data on flash when data are updated and makes the index better suited to flash storage. Building on the keyword bitmap join index, the DRAM-flash two-level bitmap join index method stores keyword bitmaps at two levels according to query keyword access frequency and available memory: the n most frequently used of the K keyword bitmaps are stored in DRAM and the remaining K-n in flash, reducing the storage overhead of the in-memory index.
4. Based on the characteristics of OLAP schemas and workloads and on the locality of data in the OLAP query processing algorithm (i.e., how frequently the data are accessed), the invention builds a heterogeneous storage model. Taking as objects the tables, indexes, column groups within tables, and the intermediate data structures that queries rely on, each with a different degree of locality, it distributes them optimally between high-performance but relatively small DRAM and large-capacity but lower-performance flash, subject to memory capacity and data access performance constraints, thereby improving data storage efficiency.
5. According to the characteristics of DRAM and flash storage in query processing, the invention defers data access on flash and divides complete OLAP query processing into two phases, in-memory processing and flash computation. This supports pipeline parallelism between OLAP queries across the different storage stages and improves memory and CPU utilization. The invention can be widely applied in general OLAP scenarios.
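The incremental nature of the bitmap join index noted in advantage 3 can be illustrated with a small append-only bitmap. This is a sketch under assumptions: the class `KeywordBitmap`, its byte layout, and the sample data are hypothetical, and a real index on flash would also have to consider page-aligned writes.

```python
class KeywordBitmap:
    """Append-only bitmap with one bit per fact table row."""

    def __init__(self):
        self.bits = bytearray()
        self.length = 0

    def append(self, bit):
        """Add the bit for one new fact table row at the end."""
        if self.length % 8 == 0:
            self.bits.append(0)                      # start a new byte
        if bit:
            self.bits[-1] |= 1 << (self.length % 8)
        self.length += 1

    def test(self, position):
        """Return True when the row at this position matches the keyword."""
        return bool((self.bits[position // 8] >> (position % 8)) & 1)


c_region = ["ASIA", "AMERICA", "AMERICA"]   # dimension column, index = key
index = KeywordBitmap()                      # keyword: c_region = 'AMERICA'

for custkey in [1, 0, 2]:                    # foreign keys of initial fact rows
    index.append(c_region[custkey] == "AMERICA")
before = [index.test(i) for i in range(index.length)]

for custkey in [0, 1]:                       # newly inserted fact rows
    index.append(c_region[custkey] == "AMERICA")
after = [index.test(i) for i in range(index.length)]
```

New fact rows only add bits at the tail; the bits of earlier rows keep their values, in contrast to a tree index where an insert may restructure existing nodes.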
Brief description of the drawings
Fig. 1 is a schematic diagram of the storage of in-memory OLAP on DRAM-flash two-level storage according to the invention;
Fig. 2 is a schematic diagram of in-memory OLAP query processing on DRAM-flash two-level storage in an embodiment of the invention;
Fig. 3 is a schematic diagram of keyword bitmap join index processing on DRAM-flash two-level storage according to the invention.
Detailed description of the embodiments
Existing in-memory OLAP technology generally either computes entirely in memory or uses large-capacity flash as a cache. The former increases the cost of in-memory computing; the latter makes it difficult to optimize OLAP when it extends from memory onto flash. The present invention therefore proposes an OLAP query optimization method under a hybrid memory-flash storage model based on DRAM-flash two-level storage. It optimizes the placement of OLAP data in memory and flash according to the characteristics of OLAP schemas, workloads and algorithms, and optimizes the OLAP query algorithm for the characteristics of two-level storage. The invention is applicable to general OLAP scenarios and is described in detail below with reference to the drawings and embodiments.
As shown in Fig. 1, the present invention provides an OLAP query optimization method under a hybrid memory-flash storage model. The method is an in-memory OLAP query optimization method for DRAM-flash two-level storage composed of DRAM and a large-capacity PCIe flash card, and comprises the following steps:
1) OLAP storage uses a flash-aware storage model. That is, given that dimension tables in an OLAP star or snowflake schema are small and subject to many predicate operations, and that the fact table consists of foreign keys and measure attributes, data are partitioned by access locality between a relatively small DRAM and a relatively large flash store, and storage is optimized across the two heterogeneous memory levels so that memory is used efficiently for frequently accessed data.
Because dimension tables are small and modern OLAP supports updates on dimension tables, an in-memory storage engine is used and the dimension tables reside in DRAM. The foreign keys of the fact table are the columns accessed most frequently during star joins in multidimensional analysis and likewise reside in DRAM. The fact table has many measure attributes, but an OLAP query usually touches only a few of them and only a very small fraction of their values; the measure attributes are therefore stored in cheap, large-capacity flash, and a positional access API is provided on the measure columns to support random access by position.
2) In-memory OLAP uses array storage: each attribute column is stored in contiguous array cells, and a table consists of attribute arrays of equal length. Preferably, dimension tables use the array index as the primary key, and a fact table foreign key is the array index of the corresponding record in the dimension table, so a fact table record can locate the corresponding dimension array cell directly from its foreign key value. The traditional join operation is thereby reduced to array index access (Array Index Reference, AIR), and OLAP queries are processed by the AIR algorithm.
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension table access, fact table foreign key access, and fact table measure attribute access. The intermediate data structures produced by the three phases are the dimension table filter-grouping vectors, the selection vector and grouping vector, and the multidimensional grouping array. The filter-grouping vectors, selection vector and grouping vector are shared by all queries, and different queries only need to update the vector contents; the multidimensional grouping array is generated dynamically for each query. These three kinds of intermediate data structures are reused throughout query processing, form a data set with strong locality, and need to be stored in DRAM. The fact table measure attributes occupy a large amount of storage, but a query usually accesses only a few of them, and only a very small fraction of the records in a measure column need to be accessed randomly through the selection vector. The measure columns can therefore be stored in flash, which reduces the DRAM demand of in-memory OLAP and lowers the hardware cost of the system.
The AIR OLAP algorithm of the invention uses the selection vector, grouping vector, dimension table filter-grouping vectors, multidimensional grouping array and similar data in OLAP query processing. These data structures can be reused across OLAP queries and occupy a fixed amount of memory, so they reside in DRAM.
4) Data warehouses commonly use bitmap join indexes to reduce or eliminate the cost of joins between dimension tables and the fact table. On the basis of a keyword-based bitmap join index (i.e., a bitmap join index built per keyword rather than for all members of an attribute), the invention optimizes the storage of K keyword join bitmaps across the DRAM-flash two-level storage (each keyword corresponds to one bitmap of the same length as the fact table, recording the positions of the fact table records that match the keyword). The n most frequently accessed keywords are selected and their bitmaps are stored in DRAM, while the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
The invention uses a DRAM-flash two-level bitmap join index method on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as the DRAM-resident bitmap join index, and the remaining K-n keyword bitmaps serve as the second-level index. Specifically, an index entry is the bitmap of fact table positions corresponding to a dimension table attribute value; the size of the bitmap index is the product of the fixed bitmap size and the number of keywords, and data compression can further reduce its memory footprint. Analysis of query execution logs and query keywords identifies the K most frequently used dimension attribute keywords, for which bitmap indexes are built. According to the memory quota, the n most frequently used keyword bitmaps reside in memory and the remaining K-n keyword bitmaps are stored in flash, forming a two-level join bitmap index structure.
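The placement decision for the two-level index can be sketched as a simple ranking under a memory quota. The function name `place_keyword_bitmaps`, the keyword naming scheme, the frequency counts, and the assumption of uncompressed bitmaps of one bit per fact row are illustrative and not specified by the patent.

```python
def place_keyword_bitmaps(frequencies, k, fact_rows, dram_quota_bytes):
    """Split the K most frequently used keyword bitmaps between DRAM and flash.

    frequencies maps a global keyword name to its access count, for example
    as gathered from a query log. Returns (dram_keywords, flash_keywords).
    """
    bitmap_bytes = (fact_rows + 7) // 8        # one bit per fact table row
    ranked = sorted(frequencies, key=lambda kw: (-frequencies[kw], kw))
    top_k = ranked[:k]
    n = min(len(top_k), dram_quota_bytes // bitmap_bytes)
    return top_k[:n], top_k[n:]


# Hypothetical access counts for five dimension attribute keywords.
frequencies = {
    "customer.c_region=AMERICA": 120,
    "supplier.s_region=ASIA": 95,
    "date.d_year=1997": 60,
    "customer.c_nation=CHINA": 40,
    "part.p_brand=MFGR#12": 5,
}

# 8,000,000 fact rows give 1,000,000 bytes per bitmap; the quota fits two.
dram, flash = place_keyword_bitmaps(
    frequencies, k=4, fact_rows=8_000_000, dram_quota_bytes=2_500_000
)
```

Because every bitmap has the same length as the fact table, the quota translates directly into a bitmap count n; with compressed bitmaps the sizes would differ and the selection would need to account for them.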
In step 2) above, OLAP query processing with the AIR algorithm comprises the following steps:
1. Decompose the OLAP query into filter and grouping operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, decomposing the query into one subquery per dimension table.
2. Generate the dimension table filter-grouping vectors: each dimension table filters its records according to its own WHERE clause and projects the grouping attributes of the records that satisfy the selection condition. These grouping attributes are dictionary-compressed; the dictionary is stored in an array, and the dictionary code is the dictionary array index. In the filter-grouping vector, which has the same length as the dimension table, a position whose selection condition is false is set to -1, and any other position is set to the compressed grouping code.
The filter-grouping vector is a dynamically generated additional column of the dimension table; it takes the place of the dimension table in the join with the fact table and supplies the codes of the grouping attributes of the joined records on that dimension. When the dimension table has no grouping attribute, the filter-grouping vector reduces to a bitmap that is used for join filtering on the fact table foreign key.
3. Create the selection and grouping vectors through multiple scans of the fact table foreign keys: the corresponding foreign key columns of the fact table are scanned one after another in ascending order of the selectivity of the dimension table filter-grouping vectors. During each column scan, the foreign key value is mapped to the position it designates in the corresponding filter-grouping vector, and when the vector value there is non-negative the current fact table record position is inserted into the selection vector. The next foreign key column is accessed randomly at the record positions in the current selection vector, and the selection vector is updated using the corresponding filter-grouping vector by deleting the fact table record positions that fail the subsequent foreign key mapping condition. After all foreign key columns have been scanned, the selection vector records the positions of the fact table records that satisfy all join conditions. When the overall selectivity is high, the grouping vector is updated together with the selection vector, and the multidimensional grouping array index is computed incrementally from the index of the current group in each filter-grouping vector. When the overall selectivity is very low, the selection vector is generated first, and the grouping vector is then generated in one pass by randomly accessing each foreign key column at the positions in the selection vector.
4. Access the specified measure values in the flash-resident measure columns through the selection vector: the measure attribute values stored in flash are accessed at the positions in the selection vector, so only a very small number of measure values need to be returned for grouping and aggregation. Flash has good parallel random access performance; random access on flash driven by the selection vector is executed in parallel on multiple cores, reducing the data access latency of flash.
5. Aggregate the measure values into the multidimensional grouping array through the grouping vector: using the group index recorded in the grouping vector, each measure attribute value returned from flash is "pushed" into the multidimensional grouping array cell indicated by that index for aggregation, completing the OLAP query processing.
6. Map each dimension of the multidimensional grouping array back through the grouping dictionary to obtain the grouping attribute values, and output the query result.
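The six steps above can be traced end to end on a toy star schema. This is a minimal sketch under assumptions: the data, the helper names, and the use of in-memory Python lists for every column (including the measure column, which the method would keep on flash) are all illustrative, and only the variant that updates the grouping vector together with the selection vector is shown.

```python
# Dimension tables as column arrays; the array index is the primary key.
c_nation = ["CHINA", "BRAZIL", "CANADA", "BRAZIL"]
c_region = ["ASIA", "AMERICA", "AMERICA", "AMERICA"]
s_nation = ["JAPAN", "PERU", "CHINA"]
s_region = ["ASIA", "AMERICA", "ASIA"]

# Fact table: foreign keys are dimension array indexes; one measure column.
lo_custkey = [1, 0, 2, 3, 1, 2]
lo_suppkey = [0, 1, 2, 0, 2, 1]
l_revenue = [100, 200, 300, 400, 500, 600]


def filter_group_vector(filter_col, wanted, group_col):
    """Step 2: -1 for filtered-out rows, otherwise a dictionary code."""
    dictionary, codes, vector = [], {}, []
    for f, g in zip(filter_col, group_col):
        if f != wanted:
            vector.append(-1)
            continue
        if g not in codes:
            codes[g] = len(dictionary)
            dictionary.append(g)
        vector.append(codes[g])
    return vector, dictionary


def selectivity(vector):
    return sum(v >= 0 for v in vector) / len(vector)


def select_and_group(dims):
    """Step 3: dims holds (fk_column, filter_vector, stride) per dimension."""
    order = sorted(dims, key=lambda d: selectivity(d[1]))  # most selective first
    fk, vec, stride = order[0]
    selection, grouping = [], []
    for pos, key in enumerate(fk):            # first pass: sequential scan
        code = vec[key]                       # array index reference, no hash join
        if code >= 0:
            selection.append(pos)
            grouping.append(code * stride)
    for fk, vec, stride in order[1:]:         # later passes: access by position
        kept_pos, kept_grp = [], []
        for pos, grp in zip(selection, grouping):
            code = vec[fk[pos]]
            if code >= 0:
                kept_pos.append(pos)
                kept_grp.append(grp + code * stride)
        selection, grouping = kept_pos, kept_grp
    return selection, grouping


# Steps 1 and 2: one filter-grouping vector per dimension subquery.
c_vec, c_dict = filter_group_vector(c_region, "AMERICA", c_nation)
s_vec, s_dict = filter_group_vector(s_region, "ASIA", s_nation)

# Row-major strides of the (customer, supplier) grouping array.
dims = [(lo_custkey, c_vec, len(s_dict)), (lo_suppkey, s_vec, 1)]
selection, grouping = select_and_group(dims)

# Steps 4 and 5: fetch measures by position and push them into the array.
sums = [0] * (len(c_dict) * len(s_dict))
counts = [0] * len(sums)
for pos, cell in zip(selection, grouping):
    sums[cell] += l_revenue[pos]
    counts[cell] += 1

# Step 6: map array cells back to grouping attribute values.
result = {
    (c_dict[cell // len(s_dict)], s_dict[cell % len(s_dict)]): sums[cell]
    for cell in range(len(sums))
    if counts[cell]
}
```

Only the rows named in `selection` ever touch the measure column, which is the property that lets the measure data live on the slower storage level.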
Embodiment:
As shown in Fig. 2, flash awareness is reflected in the optimization of the OLAP query algorithm: access to the measure attributes stored on flash is deferred to the end, and the sequential scan of the fact table is converted into low-selectivity random access to the fact table measure attributes driven by the selection vector, which reduces the latency of measure attribute access on flash.
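The deferred, selection-vector-driven access to a flash-resident measure column can be sketched with a memory-mapped file standing in for flash. The function names, the 8-byte record width, the chunking policy, and the worker count are assumptions; Python threads are used here only to show how the selection vector is split, not to claim any particular speedup.

```python
import mmap
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

WIDTH = 8  # assumption: each measure value is an 8-byte little-endian integer


def fetch_chunk(path, positions):
    """Read the measure values at the given row positions from the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [struct.unpack_from("<q", m, p * WIDTH)[0] for p in positions]


def fetch_by_selection(path, selection, workers=4):
    """Split the selection vector into chunks and read them concurrently."""
    size = max(1, -(-len(selection) // workers))      # ceiling division
    chunks = [selection[i:i + size] for i in range(0, len(selection), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(fetch_chunk, [path] * len(chunks), chunks)
        return [value for part in parts for value in part]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "l_revenue.col")
    with open(path, "wb") as f:
        for v in range(0, 1000, 10):                  # 100 rows: 0, 10, ..., 990
            f.write(struct.pack("<q", v))
    values = fetch_by_selection(path, [3, 17, 42, 99])
```

Only four of the hundred rows are read in this example, mirroring how a low-selectivity selection vector avoids a sequential scan of the measure column.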
Step 2) is illustrated with the following query:
SELECT c_nation, s_nation, SUM(l_revenue), SUM(l_price)
FROM customer, lineorder, supplier
WHERE lo_custkey = c_custkey
AND lo_suppkey = s_suppkey
AND c_region = 'AMERICA'
AND s_region = 'ASIA'
GROUP BY c_nation, s_nation;
The query joins the fact table lineorder with the dimension tables customer and supplier, filters on the dimension attributes c_region and s_region, and sums the fact table measure columns l_revenue and l_price, grouped by c_nation and s_nation.
As shown in figure 3, step 4 of the present invention) in keyword bit join index treatment on DRAM-flash two-level memories Embodiment:
In traditional database, index is a kind of planar structure, it is assumed that used in identical storage hierarchy.B+- tree ropes The index technology for the disk database such as drawing realizes that internal memory lacks node in disk and the data exchange of internal memory by buffering area mechanism, But still be a kind of opaque mechanism, its index accesses efficiency depends on the efficiency that buffering area replaces algorithm, it is impossible to by data Storehouse customized type optimization.In OLAP applications, index entry is present in dimension table attribute, but the object of index is then corresponding in true table Linkage record, common B+- trees index can only retrieve the data on single table, for OLAP applications, the rope on less dimension table Draw acceleration of the raising of access performance to OLAP query overall performance limited, and setting up index to true off-balancesheet key can add Index storage overhead in attended operation between fast fact table and dimension table, but true off-balancesheet key is extremely huge, and modern True table needs largely to update in real time in OLAP applications, and the renewal of index is costly.The present invention uses bitmap connecting strand Draw method, i.e., be that the attribute member specified on dimension table sets up true table connection bitmap by attended operation, each member sets up one Individual bitmap, indicates link position of the member in fact token record.Bit join index is a kind of increment index, in true table Mass data only needs to incrementally extend bitmap lengths when inserting, it is not necessary to which the connection bitmap to having set up is reconstructed. 
The bitmap join index used in traditional databases works at attribute granularity and builds a join bitmap for every member of the attribute. It faces a trade-off: for a low-cardinality attribute the bitmap join index takes little space but the selectivity is too high, whereas for a high-cardinality attribute the selectivity is low but there are many join bitmaps and the storage overhead is excessive. The present invention selects the bitmap join index keywords by the top-K access frequency of dimension-attribute keywords in the query workload, and builds a global bitmap join index for the K globally most frequently accessed keywords. This forms a key/value index structure in which the key is the global name of the keyword, including table and keyword information, and the value is the join bitmap. Keywords from different dimension tables and different attributes have join bitmaps of the same length and can be stored in the same global bitmap join index.
More preferably, the present invention proposes a DRAM-flash two-level bitmap join index method on top of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as DRAM-resident bitmap join indexes, and the remaining K-n keyword bitmaps serve as the second-level index. Fig. 3 shows an application scenario in which the query keyword bitmaps reside in both DRAM and flash. The bitmaps in DRAM first complete the selection operation and generate a filter bitmap. When the selectivity of the filter bitmap lies within a threshold range (S_low, S_high), the positions of the "1" bits in the filter bitmap are used to randomly access the corresponding positions of the relevant bitmap in flash storage and determine the final logical result at each position. Flash has good random access capability; the present invention scans the filter bitmap sequentially while accessing the flash bitmap in parallel, which improves the parallel access performance of data on flash and reduces the latency of bitmap index computation, as shown by the parallel flash access threads in Fig. 3.
Let S_1 be the selectivity of the filter bitmap in DRAM and S_2 the selectivity of the flash bitmap (the selectivity of a bitmap is given exactly by its number of "1" bits), let T_F(S_1) be the bitmap access latency on flash, and let T_P(S_2, S_1) be the difference in query processing time between index selectivities S_2 and S_1. When T_P(S_2, S_1) - T_F(S_1) > 0, accessing the bitmap on flash yields a query performance gain. S_low and S_high are the minimum and maximum selectivity differences (e.g. S_2 - S_1) that satisfy the query gain.
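The two-level scheme described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: plain lists stand in for both the DRAM and the flash bitmaps, the key names, the thread count and the sample frequencies are hypothetical, and returning the filter unchanged outside the threshold window is an assumption, since the text does not say what happens in that case.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_frequency(bitmaps, freq, n):
    """Keep the n most frequently accessed keyword bitmaps in DRAM, the rest on flash."""
    ranked = sorted(bitmaps, key=lambda k: freq.get(k, 0), reverse=True)
    dram = {k: bitmaps[k] for k in ranked[:n]}
    flash = {k: bitmaps[k] for k in ranked[n:]}
    return dram, flash


def selectivity(bitmap):
    """Fraction of set bits, i.e. the exact selectivity of the bitmap."""
    return sum(bitmap) / len(bitmap)


def probe_flash(filter_bitmap, flash_bitmap, s_low, s_high, workers=4):
    """Refine a DRAM filter bitmap with a flash-resident bitmap.

    Only positions whose filter bit is 1 are read from the flash bitmap,
    and those reads are issued from a thread pool. Outside the
    (s_low, s_high) window the filter is returned unchanged (assumption).
    """
    s1 = selectivity(filter_bitmap)
    if not (s_low < s1 < s_high):
        return list(filter_bitmap)
    positions = [p for p, bit in enumerate(filter_bitmap) if bit]
    result = [0] * len(filter_bitmap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bits = pool.map(flash_bitmap.__getitem__, positions)
        for pos, bit in zip(positions, bits):
            result[pos] = bit
    return result


# Hypothetical keyword bitmaps over eight fact rows.
bitmaps = {
    "customer.c_region=ASIA": [1, 0, 1, 0, 1, 1, 0, 0],
    "supplier.s_region=ASIA": [1, 1, 0, 0, 1, 0, 0, 0],
}
freq = {"customer.c_region=ASIA": 10, "supplier.s_region=ASIA": 2}
dram, flash = split_by_frequency(bitmaps, freq, n=1)
filter_bitmap = dram["customer.c_region=ASIA"]
refined = probe_flash(filter_bitmap, flash["supplier.s_region=ASIA"], 0.1, 0.9)
skipped = probe_flash(filter_bitmap, flash["supplier.s_region=ASIA"], 0.6, 0.9)
```

Here the filter bitmap has selectivity 0.5, so only four of the eight flash positions are read in the first call; the second call falls outside the window and leaves the filter as it was.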
In summary, compared with the prior art, the present invention adopts a memory OLAP query optimization method based on DRAM-flash two-level storage. It supports memory OLAP query processing on DRAM-flash two-level storage, reduces memory OLAP's demand for expensive memory, and improves its cost-effectiveness. Through the storage optimization method, the OLAP query optimization method and the index optimization method based on DRAM-flash two-level storage, data access and query processing performance during OLAP query processing are optimized transparently. The present invention improves both the data management capacity and the query processing performance of memory OLAP for big data.
The above embodiments merely illustrate the present invention; the structure, connection and manufacturing process of each component may all vary, and any equivalent transformation or improvement made on the basis of the technical solution of the present invention shall not be excluded from the scope of protection of the present invention.

Claims (8)

1. An OLAP query optimization method under a hybrid memory-flash storage model, comprising the following steps:
1) OLAP storage uses a flash-aware storage model: based on the characteristics of the OLAP star or snowflake schema, in which dimension tables are small and subject to many predicate operations while the fact table consists of foreign keys and measure attributes, data is partitioned between the relatively small DRAM and the relatively large flash storage according to data access locality, and storage is optimized on the heterogeneous two-level memory;
2) memory OLAP uses array storage: each attribute column is stored in contiguous array cells, a table consists of attribute arrays of equal length, a dimension table uses the array subscript as its primary key, and a fact-table foreign key is the array subscript of the corresponding record in the dimension table, so that a fact-table record can directly locate the corresponding array cell in the dimension table by its foreign-key value; the traditional join operation is thereby reduced to array subscript access, and OLAP query processing is performed with the AIR algorithm, where the AIR algorithm denotes array index access;
3) OLAP query processing based on array storage and the AIR algorithm is decomposed into three sequential data access phases: dimension-table access, fact-table foreign-key access and fact-table measure-attribute access; the intermediate data structures produced by the three phases include the dimension-table filter-grouping vectors, the selection vector and grouping vector, and the grouping multidimensional array; the dimension-table filter-grouping vectors, the selection vector and the grouping vector are data structures shared by all queries, and different queries only need to update the contents of the vectors, while the grouping multidimensional array is generated dynamically for each query;
4) on the basis of the keyword-based bitmap join index, K keyword join bitmaps are stored in an optimized manner on DRAM-flash two-level storage: n frequently accessed keywords are selected from them and their bitmaps are stored in DRAM, and the remaining bitmaps are stored in flash, forming a two-level join bitmap index structure.
2. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 1, characterized in that: in step 1), a memory storage engine is used and the dimension tables reside in DRAM; the foreign keys in the fact table, being columns accessed with high frequency during star joins in multidimensional analysis, likewise reside in DRAM; the measure attributes of the fact table are stored in flash, and a positional access API is provided on the measure columns to support random access to them.
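The positional access API on measure columns mentioned above is not specified further in the text. One possible shape, given purely as an assumption, is a fixed-width column file in which the value of fact row p sits at byte offset p times the value width. In the sketch below an ordinary temporary file stands in for flash, and the class name, the 64-bit integer encoding and the sample values are all hypothetical.

```python
import os
import struct
import tempfile


class FlashMeasureColumn:
    """Fixed-width measure column stored in a file that stands in for flash."""

    _FMT = "<q"                      # one little-endian 64-bit integer per fact row
    _WIDTH = struct.calcsize(_FMT)

    def __init__(self, path):
        self._file = open(path, "rb")

    @classmethod
    def write(cls, path, values):
        """Serialize the column; row p ends up at byte offset p * _WIDTH."""
        with open(path, "wb") as f:
            for v in values:
                f.write(struct.pack(cls._FMT, v))

    def get(self, position):
        """Return the measure value of the fact row at `position`."""
        self._file.seek(position * self._WIDTH)
        return struct.unpack(self._FMT, self._file.read(self._WIDTH))[0]

    def close(self):
        self._file.close()


# Hypothetical l_revenue column with four fact rows.
path = os.path.join(tempfile.mkdtemp(), "l_revenue.col")
FlashMeasureColumn.write(path, [100, 250, 75, 300])
column = FlashMeasureColumn(path)
values = [column.get(p) for p in [1, 3]]   # positions taken from a selection vector
column.close()
```

Because every value has the same width, a position from the selection vector translates directly into a byte offset, so no index structure is needed on the flash side.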
3. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 1, characterized in that: in step 3), the dimension-table filter-grouping vectors, the selection vector, the grouping vector and the grouping multidimensional array are stored in DRAM; the fact-table measure attributes are stored in flash.
4. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 2, characterized in that: in step 3), the dimension-table filter-grouping vectors, the selection vector, the grouping vector and the grouping multidimensional array are stored in DRAM; the fact-table measure attributes are stored in flash.
5. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 1, 2, 3 or 4, characterized in that: in step 2), the OLAP query processing of the AIR algorithm comprises the following steps:
1. the OLAP query is decomposed into filter-grouping operations on the dimension tables: the selection and grouping operations in the query are partitioned by dimension table, decomposing the query into a sub-query on each dimension table;
2. the dimension-table filter-grouping vectors are generated: each dimension table filters its records according to its own WHERE clause and projects out the grouping attributes that satisfy the selection condition; the grouping attributes that satisfy the selection condition are dictionary-compressed, the dictionary is stored in an array, and the dictionary code is the dictionary array subscript; in the filter-grouping vector, which has the same length as the dimension table, positions where the selection condition is false are set to -1 and all other positions are set to the grouping dictionary code;
3. the fact-table foreign keys are scanned in multiple passes to create the selection and grouping vectors: the corresponding foreign-key columns of the fact table are scanned in turn, in order of increasing selectivity of the dimension-table filter-grouping vectors; during each column scan the foreign-key value is mapped to the specified position in the corresponding dimension-table filter-grouping vector, and if the vector value is non-negative the current fact-table record position is inserted into the selection vector; the next foreign-key column is accessed randomly according to the record positions in the current selection vector, and the selection vector is updated using the corresponding dimension-table filter-grouping vector, deleting the positions of fact-table records that do not satisfy the subsequent foreign-key mapping condition; after all foreign-key column scans are finished, the selection vector records the positions of the fact-table records that satisfy all join conditions; when the overall selectivity is high, the grouping vector is updated at the same time as the selection vector, and the grouping multidimensional array subscript is computed incrementally from the subscript of the current group in the dimension-table filter-grouping vector; when the overall selectivity is very low, the selection vector is generated first, and the grouping vector is finally generated in one pass by randomly accessing each foreign-key column at the positions in the selection vector;
4. the specified measure values of the flash-stored measure columns are accessed through the selection vector: the measure attribute values in flash storage are accessed at the positions in the selection vector; the random accesses on flash driven by the selection vector are executed in parallel on multiple cores;
5. the measure values are aggregated into the grouping multidimensional array through the grouping vector: according to the group subscripts recorded in the grouping vector, the measure attribute values returned from flash storage are "pushed" to the cells of the grouping multidimensional array indicated by the subscript values in the grouping vector for aggregate computation, completing the entire OLAP query processing;
6. each dimension of the grouping multidimensional array is mapped back to the grouping dictionaries to obtain the grouping attribute values, and the query result is output.
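The six steps above can be followed end to end in a small Python sketch. It is an illustration under stated assumptions rather than the claimed implementation: it follows the low-selectivity variant (selection vector first, grouping vector generated afterwards in one pass), a plain list stands in for the flash-resident measure column, the flash reads are not parallelized, SUM is the only aggregate, and all function names and sample data are hypothetical.

```python
import math


def filter_group_vector(dim_rows, predicate, group_attr):
    """Step 2: -1 where the predicate fails, else the dictionary code of the group value."""
    dictionary, codes, vector = [], {}, []
    for row in dim_rows:
        if predicate(row):
            value = row[group_attr]
            if value not in codes:
                codes[value] = len(dictionary)
                dictionary.append(value)
            vector.append(codes[value])
        else:
            vector.append(-1)
    return vector, dictionary


def air_query(dims, fk_columns, measure):
    """dims: list of (vector, dictionary); fk_columns: matching fact foreign-key arrays."""
    # Step 3: scan foreign-key columns in order of increasing selectivity.
    order = sorted(range(len(dims)),
                   key=lambda d: sum(c >= 0 for c in dims[d][0]) / len(dims[d][0]))
    selection = list(range(len(measure)))
    for d in order:
        vector, fk = dims[d][0], fk_columns[d]
        selection = [pos for pos in selection if vector[fk[pos]] >= 0]
    # Grouping vector: flatten the per-dimension codes into one array subscript.
    sizes = [len(dictionary) for _, dictionary in dims]
    grouping = []
    for pos in selection:
        cell = 0
        for d, (vector, _) in enumerate(dims):
            cell = cell * sizes[d] + vector[fk_columns[d][pos]]
        grouping.append(cell)
    # Steps 4-5: fetch measures by position and aggregate into the group array.
    cube = [0] * math.prod(sizes)
    for pos, cell in zip(selection, grouping):
        cube[cell] += measure[pos]
    # Step 6: map array subscripts back to the grouping attribute values.
    result = {}
    for cell in sorted(set(grouping)):
        key, rest = [], cell
        for d in reversed(range(len(dims))):
            rest, code = divmod(rest, sizes[d])
            key.append(dims[d][1][code])
        result[tuple(reversed(key))] = cube[cell]
    return result


customer = [{"c_region": "ASIA"}, {"c_region": "EUROPE"}, {"c_region": "AMERICA"}]
supplier = [{"s_region": "ASIA"}, {"s_region": "EUROPE"}]
dims = [
    filter_group_vector(customer, lambda r: r["c_region"] != "AMERICA", "c_region"),
    filter_group_vector(supplier, lambda r: True, "s_region"),
]
lo_custkey = [0, 1, 2, 0, 1]     # foreign keys are dimension array subscripts
lo_suppkey = [0, 0, 1, 1, 1]
lo_revenue = [10, 20, 30, 40, 50]
result = air_query(dims, [lo_custkey, lo_suppkey], lo_revenue)
```

In this sample the third fact row references the filtered-out customer, so it never reaches the measure column; the remaining four rows each land in their own cell of the 2 x 2 grouping array.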
6. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 5, characterized in that: in step 2., the filter-grouping vector is a dynamically generated additional column of the dimension table, which takes the place of the dimension table in the join with the fact table and provides the code of the grouping attribute of the joined record on the corresponding dimension; when the dimension table has no grouping attribute, the filter-grouping vector reduces to a bitmap used for join filtering on the fact-table foreign key.
7. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 1, 2, 3, 4 or 6, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as DRAM-resident bitmap join indexes, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
8. The OLAP query optimization method under a hybrid memory-flash storage model according to claim 5, characterized in that: in step 4), a DRAM-flash two-level bitmap join index method is used on the basis of the keyword bitmap join index: according to the memory storage quota, the n most frequently used of the K keyword join bitmaps are selected as DRAM-resident bitmap join indexes, and the remaining K-n keyword bitmaps are stored in flash as the second-level index.
CN201410717830.6A 2014-12-01 2014-12-01 A kind of OLAP query optimization method under internal memory flash memory mixing memory module Active CN104361113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410717830.6A CN104361113B (en) 2014-12-01 2014-12-01 A kind of OLAP query optimization method under internal memory flash memory mixing memory module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410717830.6A CN104361113B (en) 2014-12-01 2014-12-01 A kind of OLAP query optimization method under internal memory flash memory mixing memory module

Publications (2)

Publication Number Publication Date
CN104361113A CN104361113A (en) 2015-02-18
CN104361113B true CN104361113B (en) 2017-06-06

Family

ID=52528373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410717830.6A Active CN104361113B (en) 2014-12-01 2014-12-01 A kind of OLAP query optimization method under internal memory flash memory mixing memory module

Country Status (1)

Country Link
CN (1) CN104361113B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930388B (en) * 2016-04-14 2019-04-23 中国人民大学 A kind of OLAP packet aggregation method based on functional dependencies
CN108255829B (en) * 2016-12-28 2021-10-19 腾讯科技(北京)有限公司 Data searching method and device
CN108733681B (en) * 2017-04-14 2021-10-22 华为技术有限公司 Information processing method and device
KR102507140B1 (en) * 2017-11-13 2023-03-08 에스케이하이닉스 주식회사 Data storage device and operating method thereof
CN109086456B (en) * 2018-08-31 2020-11-03 中国联合网络通信集团有限公司 Data indexing method and device
US11199991B2 (en) 2019-01-03 2021-12-14 Silicon Motion, Inc. Method and apparatus for controlling different types of storage units
TWI739075B (en) * 2019-01-03 2021-09-11 慧榮科技股份有限公司 Method and computer program product for performing data writes into a flash memory
CN111782734B (en) * 2019-04-04 2024-04-12 华为技术服务有限公司 Data compression and decompression method and device
CN110647722B (en) * 2019-09-20 2024-03-01 中科寒武纪科技股份有限公司 Data processing method and device and related products
US11386089B2 (en) 2020-01-13 2022-07-12 The Toronto-Dominion Bank Scan optimization of column oriented storage
CN112597114B (en) * 2020-12-23 2023-09-15 跬云(上海)信息科技有限公司 OLAP (on-line analytical processing) precomputation engine optimization method and application based on object storage
CN115309947B (en) * 2022-08-15 2023-03-21 北京欧拉认知智能科技有限公司 Method and system for realizing online analysis engine based on graph
CN116483831B (en) * 2023-04-12 2024-01-30 上海沄熹科技有限公司 Recommendation index generation method for distributed database

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6651055B1 (en) * 2001-03-01 2003-11-18 Lawson Software, Inc. OLAP query generation engine
CN102663114A (en) * 2012-04-17 2012-09-12 中国人民大学 Database inquiry processing method facing concurrency OLAP (On Line Analytical Processing)
CN103309958A (en) * 2013-05-28 2013-09-18 中国人民大学 OLAP star connection query optimizing method under CPU and GPU mixing framework
CN103631911A (en) * 2013-11-27 2014-03-12 中国人民大学 OLAP query processing method based on array storage and vector processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9501550B2 (en) * 2012-04-18 2016-11-22 Renmin University Of China OLAP query processing method oriented to database and HADOOP hybrid platform

Also Published As

Publication number Publication date
CN104361113A (en) 2015-02-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant