CN100524357C - Data pre-fetching system in video processing - Google Patents

Publication number
CN100524357C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007100469298A
Other languages
Chinese (zh)
Other versions
CN101140658A (en)
Inventor
贺迅
杨华
郑素贞
方向忠
张重阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CNB2007100469298A
Publication of CN101140658A
Application granted
Publication of CN100524357C
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a data pre-fetching system for video processing. The system comprises a data pre-fetching module, a prefetch register group, a control bus, and a data bus. The data pre-fetching module controls the order in which data are read and written according to settings made by the processor. The prefetch registers hold the data written by the data pre-fetching module until the processor reads them, and can be written only by the data pre-fetching module. The control bus is used to configure the data pre-fetching module, and the data bus connects the data pre-fetching module to the prefetch registers. Following the processor's settings, the data pre-fetching module writes data directly into the prefetch register group in the processor register file, and the processor can use the prefetch registers directly in its instructions to obtain the required data. Because the processor obtains the required data directly from the prefetch registers, the invention reduces the number of data access instructions, shortens block data access time in video processing, and thereby improves processor efficiency.

Description

Data pre-fetching system in video processing
Technical field
The present invention relates to a system in the field of telecommunication technology, specifically a data pre-fetching system for video processing.
Background art
In recent years, processor systems have commonly used cache technology to narrow the speed gap between fast processors and main memory. A cache is a high-speed buffer memory placed between the CPU and main memory that holds the data or instructions the processor has used most recently. When the processor accesses data or an instruction, it first checks whether the item is in the cache. If it is (a cache hit), the processor reads it directly from the cache. If it is not (a cache miss), the processor sends an access request to main memory and may spend tens or even hundreds of clock cycles waiting for the data to arrive.
A processor generally accesses data through data access instructions, that is, Load/Store instructions, which move the data needed for computation from memory into registers. Load/Store instructions usually make up a large share of a program's instructions. According to published statistics, in the SPECint92 benchmark programs on the 80x86 instruction set, Load instructions alone account for more than 22% of all program instructions. In a processor with a cache, a data read instruction takes 2 to 4 cycles on a cache hit before the next instruction can use the data, and tens of cycles on a cache miss. The efficiency of data access therefore has a significant effect on overall processor performance. In video processing the data volume is very large, each datum is generally accessed no more than ten times, and the programmer usually knows which data the upcoming code will need. The temporal locality of data access is poor, while the spatial locality is strong.
A search of the prior art found Chinese patent application No. 200510093077.9, entitled "A cache prefetch module and method thereof", which adds data prefetch instructions so that some of the data about to be used are loaded into the cache ahead of time. This reduces cache misses and raises the cache hit rate, but the processor must execute additional instructions to achieve it. Some processors instead use a hardware unit to predict data accesses and load the needed data in advance, which reduces data access latency. However, prefetching without explicit instructions is speculative and inevitably produces useless prefetches, which consume memory bandwidth that is already very limited.
Summary of the invention
The object of the present invention is to address the deficiencies of the prior art by providing a data pre-fetching system for video processing that improves the efficiency of data access operations in video processing, both by reducing the time the processor waits for data and by reducing the number of data access operations.
The present invention is achieved by the following technical solution. The invention comprises a data pre-fetching module, a processor register file, a control bus, and a data bus. The data pre-fetching module controls the order in which data are read and written according to the settings made by the processor. The processor register file contains a prefetch register group consisting of two prefetch registers; each prefetch register stores data written by the data pre-fetching module for the processor to read, and can be written only by the data pre-fetching module. The control bus is used to configure the data pre-fetching module. The data bus connects the data pre-fetching module to the prefetch registers. According to the processor's settings, the data pre-fetching module writes data directly into the prefetch register group, and the processor can use the prefetch registers directly in its instructions to obtain the required data.
The data pre-fetching module comprises a writing unit, a reading unit, and a register file. The writing unit reads the required data from the register file and writes them into a prefetch register. The reading unit reads the required data from memory and writes them into the register file. The register file temporarily holds the data needed to complete a prefetch.
The register file consists of 128-bit registers and temporarily holds the data needed to complete a prefetch.
The data pre-fetching module supports three prefetch modes: the one-dimensional vector prefetch mode, the two-dimensional row vector prefetch mode, and the two-dimensional column vector prefetch mode. In the one-dimensional vector prefetch mode, the one-dimensional vectors required by the processor are prefetched in sequence from one-dimensional data, and the interval between successive vectors is a constant less than 128. In the two-dimensional row vector prefetch mode, the row vectors required by the processor are prefetched in sequence from two-dimensional data, and these row vectors lie in adjacent rows of the two-dimensional data. In the two-dimensional column vector prefetch mode, the column vectors required by the processor are prefetched in sequence from two-dimensional data, and these column vectors lie in adjacent columns of the two-dimensional data.
Once configured by the processor, the data pre-fetching module of the present invention can continuously read the data the processor needs from memory and write them directly into the processor's prefetch registers. Before using the module to access data, the processor must configure it, including the data access mode and the start address of the data. Following these settings, the module accesses memory, obtains the data the processor needs, and writes them efficiently into the processor's prefetch registers. Once the data the processor needs have been written into a prefetch register, the processor can use that prefetch register directly as a source operand of an instruction, and no data read instruction is needed to fetch the operands from memory. The processor's registers must therefore include prefetch registers to receive the data written by the prefetch module: the data pre-fetching module can write these registers, the processor can read them, and only the data pre-fetching module can write them. The preferred embodiment of the invention uses two such prefetch registers. The two registers are identical in structure and function; both can be written by the data pre-fetching module and used directly as source operands by processor instructions, so that a single arithmetic instruction can use two data items fetched from memory at the same time. If, after configuring the data pre-fetching module, the processor reads a prefetch register before the module has written the required data into it, the processor stalls until the module writes the data into that register. After the processor has read the register, the data pre-fetching module writes the next data the processor needs, according to the configured mode. When the data pre-fetching module has completed all the data accesses required by the current configuration, it enters idle mode until the processor configures it again.
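The handshake between the processor and a prefetch register described above can be sketched as a small software model. This is only an illustration under stated assumptions: the class and function names (`PrefetchRegister`, `PrefetchModule`, `processor_read`) and the step-based notion of time are hypothetical and do not come from the patent, which describes hardware.

```python
from collections import deque


class PrefetchRegister:
    """Toy model of one prefetch register (hypothetical name)."""

    def __init__(self):
        self.value = None
        self.valid = False  # True once the prefetch module has written data


class PrefetchModule:
    """Toy model of the prefetch module: the only writer of the register."""

    def __init__(self, vectors):
        self.pending = deque(vectors)  # vectors still to be delivered
        self.reg = PrefetchRegister()

    @property
    def idle(self):
        # Idle once every configured access has been delivered and consumed.
        return not self.pending and not self.reg.valid

    def tick(self):
        """One module step: refill the register if the processor has read it."""
        if not self.reg.valid and self.pending:
            self.reg.value = self.pending.popleft()
            self.reg.valid = True


def processor_read(module):
    """Use the prefetch register as a source operand; return (value, stall steps)."""
    stalls = 0
    while not module.reg.valid:
        if not module.pending:
            raise RuntimeError("prefetch module is idle and must be configured again")
        module.tick()  # the processor waits while the module writes the register
        stalls += 1
    value = module.reg.value
    module.reg.valid = False  # the read allows the module to write the next data
    return value, stalls
```

In this model a read that arrives before the data stalls for one step, while a read that arrives after the module has already refilled the register (an intervening `tick()`) completes without stalling.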
Because video processing usually operates on vectors and 16x16 matrices, its data access patterns are fairly regular, so the present invention provides three prefetch modes designed to optimize data access in video processing. These three modes can greatly improve the efficiency of data access in video processing. For high-definition video decoding, if a processor read instruction can read at most 32 bits of data, this method eliminates at least 46,656,000 data read instructions per second, and it also greatly reduces the pipeline stalls incurred when the processor accesses data in memory.
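The text does not say how the figure of 46,656,000 read instructions per second was derived. One set of assumptions that happens to reproduce it is a 1920x1080 picture with 8-bit 4:2:0 sampling, 60 frames per second, and every sample read exactly once with 32-bit loads. These stream parameters are assumptions made for this sketch, not values stated in the patent.

```python
# Assumed stream parameters (not given in the text).
width, height = 1920, 1080
frames_per_second = 60

# 8-bit 4:2:0 sampling: one luma plane plus two quarter-size chroma planes.
bytes_per_frame = width * height * 3 // 2

# A read instruction fetches at most 32 bits, as stated in the text.
bytes_per_load = 4

loads_per_second = bytes_per_frame // bytes_per_load * frames_per_second
print(loads_per_second)  # 46656000
```

Under these assumptions the count is a lower bound in the sense that each sample is read only once; data that are read several times would raise it.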
Description of drawings
Fig. 1 is a block diagram of the data pre-fetching system of the preferred embodiment of the present invention;
Fig. 2 is a schematic diagram of the one-dimensional data prefetch mode of the preferred embodiment of the present invention;
Fig. 3 is a schematic diagram of the two-dimensional data row vector prefetch mode of the preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of the two-dimensional data column vector prefetch mode of the preferred embodiment of the present invention.
Embodiment
An embodiment of the invention is described in detail below. The embodiment is implemented on the basis of the technical solution of the invention, and a detailed implementation and concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
Fig. 1 is a block diagram of a data pre-fetching system that implements the preferred embodiment of the invention. As shown, the data pre-fetching module F5 comprises a writing unit F3, a register file, and a reading unit F6. The processor register file contains two prefetch registers, R_a and R_b, together called the prefetch register group (F1 in Fig. 1). R_a and R_b are identical in structure and function: each can be written by the data pre-fetching module and used directly as a source operand by processor instructions. The benefit of implementing two prefetch registers in the processor register file is that a single arithmetic instruction can use two data items fetched from memory as source operands at the same time. The data pre-fetching module F5 writes data into the processor's prefetch registers over the data bus F2. The processor configures the data pre-fetching module over the control bus F4. The data pre-fetching module reads data from memory over the memory access bus F7.
The processor (not shown) first configures the data pre-fetching module F5 over the control bus F4, setting the prefetch mode, the information associated with that mode, the start address of the data in memory, and so on.
Following the processor's settings, the reading unit F6 first reads data from memory over the memory access bus F7 and writes them into the register file of the data pre-fetching module F5. The writing unit F3 then reads data from the register file according to the processor's settings and writes them over the data bus F2 into a prefetch register in the processor register file. The register file in the data pre-fetching module F5 consists of 128-bit registers. Because data in image and video processing are generally 8 or 16 bits wide, the register file in the preferred embodiment supports 8-bit and 16-bit reads. The invention places no specific limit on the number of registers in the register file.
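The settings the processor sends over the control bus can be pictured as a small record. The sketch below is hypothetical: the patent does not define a register layout or field names, so `PrefetchSetup`, its fields, and `check` are illustrative only. The two checks reflect constraints the description states: the register file supports 8-bit and 16-bit reads, and a vector can be no longer than a 128-bit register.

```python
from dataclasses import dataclass
from enum import Enum


class PrefetchMode(Enum):
    VECTOR_1D = "one-dimensional vector"
    ROW_2D = "two-dimensional row vector"
    COLUMN_2D = "two-dimensional column vector"


@dataclass(frozen=True)
class PrefetchSetup:
    """Hypothetical view of the settings written over the control bus."""

    mode: PrefetchMode
    start_address: int   # address of the first data unit in memory
    element_bits: int    # width of one data unit
    vector_length: int   # data units per vector
    vector_count: int    # number of vectors to deliver
    step: int            # spacing between vectors, in data units

    def check(self, register_bits: int = 128) -> None:
        # The register file in the preferred embodiment supports 8- and 16-bit reads.
        if self.element_bits not in (8, 16):
            raise ValueError("data units must be 8 or 16 bits wide")
        # A vector must fit in one register of the module's register file.
        if self.vector_length * self.element_bits > register_bits:
            raise ValueError("vector is longer than one register")
```

For example, sixteen 8-bit units fill a 128-bit register exactly and pass the check, while sixteen 16-bit units would be rejected.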
Fig. 2 is a schematic diagram of the one-dimensional data prefetch mode supported by the data pre-fetching system of Fig. 1 according to the preferred embodiment. F8 in Fig. 2 shows how the one-dimensional data are laid out in memory: the data units are stored contiguously. The one-dimensional data F8 consist of data units E(1), E(2), .... F10 in Fig. 2 shows the vector group S formed from data units of the one-dimensional data F8. The vector group S consists of n+1 vectors R0, R1, ..., Rn. Vector R0 consists of the a data units E(1), E(2), E(3), ..., E(a) of F8, and any vector Rx of the n+1 vectors (x being any integer from 0 to n) consists of E(m*x+1), E(m*x+2), ..., E(m*x+a), where m is greater than 1 and less than or equal to 64. The processor must set the values of m and n for the vector group S, the start address of the one-dimensional data F8, and the size of a data unit F9 in F8. Following these settings, the reading unit F6 sends the size and address of the required data to memory and, as the data arrive, writes them in order into the register file of the prefetch module F5; the maximum size of each read is limited by the memory bandwidth. The read address is incremented automatically, according to the settings, to point to the start address of the next data. The writing unit then reads the data from the register file in the order in which the reading unit wrote them and writes them into the prefetch register.
After writing a prefetch register, the writing unit enters a waiting state. It restarts once the processor has read that prefetch register, and then writes the next data. After completing all write operations, the writing unit enters the waiting state. The vector length in the vector group S is limited by the size of the registers in the register file of the data pre-fetching module F5: a vector can be no longer than a register. That is, if the registers in the register file of F5 are 128 bits wide, a vector in S cannot exceed 128 bits.
This mode reads the vectors R0, R1, ..., Rn in sequence from one-dimensional data in memory and writes them in sequence into the prefetch register, so that the processor obtains R0, R1, ..., Rn in turn by accessing the prefetch register.
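The indexing of the one-dimensional mode can be sketched in a few lines. This is a software illustration of the formula for Rx given above, not the hardware itself; the function names are hypothetical, and memory is modelled as a Python list in which E(k) is `memory[k - 1]`.

```python
def vector_1d(memory, x, m, a):
    """Return R_x = [E(m*x+1), ..., E(m*x+a)], with E(k) stored at memory[k-1]."""
    start = m * x  # 0-based index of E(m*x+1)
    return memory[start:start + a]


def prefetch_1d(memory, m, n, a):
    """Return the vector group [R_0, R_1, ..., R_n] in delivery order."""
    return [vector_1d(memory, x, m, a) for x in range(n + 1)]


# Example: E(k) = k, spacing m = 4, vector length a = 3, n = 2 (three vectors).
memory = list(range(1, 21))
print(prefetch_1d(memory, m=4, n=2, a=3))  # [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
```

When m is smaller than a, successive vectors overlap, which corresponds to a sliding window over the one-dimensional data.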
Fig. 3 is a schematic diagram of the two-dimensional data row vector prefetch mode supported by the data pre-fetching system of Fig. 1 according to the preferred embodiment. F11 in Fig. 3 shows how the two-dimensional data are laid out in memory: the data units are stored contiguously, and the two-dimensional data consist of i rows and j columns of data units F12. The data units F12, namely E(1,1), E(1,2), ..., E(i,j), make up the two-dimensional data F11. F13 in Fig. 3 shows the row vector group V formed from data units of F11. The row vector group V consists of n vectors V1, V2, ..., Vn. Any vector Vx of the n vectors (x being any integer from 1 to n) consists of E(x,1), E(x,2), ..., E(x,a). The processor must set the vector size a and the number of vectors n for the row vector group V, the address of E(1,1) in F11, the value of j in E(i,j), and the data type of the data units F12 in F11; the data type is, for example, character, short integer, or integer. Following the processor's settings, the reading unit F6 first reads vector V1 starting at the address of E(1,1) in memory and writes the data into the register file of F5. It then uses j to compute the start address of the next read, which is the address of E(2,1): the start address of the next read equals the start address of the current read plus j. The writing unit reads the data from the register file in the order in which the reading unit wrote them and writes them into the prefetch register. After writing the prefetch register, it enters a waiting state; it restarts once the processor has read that prefetch register, and then writes the next data. After completing all write operations, the writing unit enters the waiting state.
This mode reads the vectors V1, V2, ..., Vn in sequence from two-dimensional data in memory and writes them in sequence into the prefetch register F1, so that the processor obtains V1, V2, ..., Vn in turn by accessing the prefetch register F1.
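The address stepping of the row vector mode can be sketched as follows. This is a minimal software model, assuming row-major storage and addresses counted in data units; the function name `prefetch_rows` and the parameter `base` (the address of E(1,1)) are hypothetical.

```python
def prefetch_rows(memory, base, j, a, n):
    """Return [V_1, ..., V_n], where V_x = [E(x,1), ..., E(x,a)].

    memory is a flat row-major list, base is the index of E(1,1),
    and j is the number of data units per row.
    """
    vectors = []
    address = base
    for _ in range(n):
        vectors.append(memory[address:address + a])
        address += j  # next start address = current start address + j
    return vectors


# Example: a 3x4 block whose unit E(r,c) holds the value 10*r + c.
memory = [10 * r + c for r in range(1, 4) for c in range(1, 5)]
print(prefetch_rows(memory, base=0, j=4, a=2, n=3))  # [[11, 12], [21, 22], [31, 32]]
```

Choosing a smaller than j reads a sub-block of each row, which is how a 16x16 block could be taken from a wider picture.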
Fig. 4 is a schematic diagram of the two-dimensional data column vector prefetch mode supported by the data pre-fetching system of Fig. 1 according to the preferred embodiment. F14 in Fig. 4 shows how the two-dimensional data are laid out in memory: the data units are stored contiguously, and the two-dimensional data consist of i rows and j columns of data units F12. The data units F12, namely E(1,1), E(1,2), ..., E(i,j), make up the two-dimensional data F14. F15 in Fig. 4 shows the column vector group S formed from data units of F14. The column vector group S consists of n vectors S1, S2, ..., Sn. Any vector Sx of the n vectors (x being any integer from 1 to n) consists of E(1,x), E(2,x), ..., E(a,x). The processor must set the vector size a and the number of vectors n for the column vector group S, the address of E(1,1) in F14, the value of j in E(i,j), and the data type of the data units in F14; the data type is, for example, character, short integer, or integer. Following the processor's settings, the reading unit F6 first reads from memory, starting at the address of E(1,1), the data in the address range from the start address to the start address plus (n-1), and writes them into the register file of the prefetch module. The start address is then automatically increased by n to obtain a new start address, and from this address the reading unit reads, in the same way, n data units of one row of the two-dimensional data. In total, a reads from memory and a writes to the register file are required. After the a reads are complete, the writing unit reads the vectors S1, S2, ..., Sn in turn from the register file and writes them into the prefetch register. To read a vector of S, the writing unit reads one data unit from each of the a registers in the register file that the reading unit has written, arranges the data units in the order shown for the column vector group in F15, and finally writes the arranged data into the processor's prefetch register F1. This mode reads the column vectors S1, S2, ..., Sn in sequence from two-dimensional data in memory and writes them in sequence into the prefetch register F1, so that the processor obtains S1, S2, ..., Sn in turn by accessing the prefetch register F1.
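The two stages of the column vector mode, row-wise reads into the register file followed by a gather across registers, can be sketched as below. This is an illustrative model only. It assumes row-major storage in which E(r, c) sits at `base + (r-1)*j + (c-1)`, with j the number of data units per row as set by the processor; that row spacing is an assumption of the sketch, and the function name is hypothetical.

```python
def prefetch_columns(memory, base, j, a, n):
    """Return [S_1, ..., S_n], where S_x = [E(1,x), ..., E(a,x)].

    memory is a flat row-major list, base is the index of E(1,1),
    and j is the number of data units per row (assumed row spacing).
    """
    # Stage 1 (reading unit): a reads, each placing n consecutive units
    # of one row into one register of the module's register file.
    register_file = [memory[base + r * j: base + r * j + n] for r in range(a)]

    # Stage 2 (writing unit): for each column x, take one unit from each of
    # the a registers and arrange them in column order.
    return [[register_file[r][x] for r in range(a)] for x in range(n)]


# Example: a 3x4 block whose unit E(r,c) holds the value 10*r + c.
memory = [10 * r + c for r in range(1, 4) for c in range(1, 5)]
print(prefetch_columns(memory, base=0, j=4, a=3, n=2))  # [[11, 21, 31], [12, 22, 32]]
```

The gather in stage 2 amounts to a transpose of the a-by-n block held in the register file, so the processor receives column data without issuing any strided loads itself.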

Claims (7)

1. A data pre-fetching system in video processing, comprising a data pre-fetching module, a processor register file, a control bus, and a data bus, wherein the data pre-fetching module comprises a writing unit, a register file, and a reading unit, characterized in that: the data pre-fetching module controls the order in which data are read and written according to the settings made by the processor; the processor register file contains a prefetch register group comprising two prefetch registers, each of which stores data written by the data pre-fetching module for the processor to read and can be written only by the data pre-fetching module; the control bus is used to configure the data pre-fetching module; the data bus connects the data pre-fetching module to the prefetch registers; and wherein the data pre-fetching module writes data directly into the prefetch register group according to the processor's settings, and the processor can use the prefetch registers directly in its instructions to obtain the required data.
2. The data pre-fetching system in video processing according to claim 1, characterized in that the writing unit reads the required data from the register file and writes them into a prefetch register, the reading unit reads the required data from memory and writes them into the register file, and the register file temporarily holds the data needed to complete a prefetch.
3. The data pre-fetching system in video processing according to claim 2, characterized in that the register file consists of 128-bit registers.
4. The data pre-fetching system in video processing according to claim 1, characterized in that the data pre-fetching module supports three prefetch modes: the one-dimensional vector prefetch mode, the two-dimensional row vector prefetch mode, and the two-dimensional column vector prefetch mode.
5. The data pre-fetching system in video processing according to claim 4, characterized in that, in the one-dimensional vector prefetch mode, the one-dimensional vectors required by the processor are prefetched in sequence from one-dimensional data.
6. The data pre-fetching system in video processing according to claim 4, characterized in that, in the two-dimensional row vector prefetch mode, the row vectors required by the processor are prefetched in sequence from two-dimensional data, and these row vectors lie in adjacent rows of the two-dimensional data.
7. The data pre-fetching system in video processing according to claim 4, characterized in that, in the two-dimensional column vector prefetch mode, the column vectors required by the processor are prefetched in sequence from two-dimensional data, and these column vectors lie in adjacent columns of the two-dimensional data.
CNB2007100469298A 2007-10-11 2007-10-11 Data pre-fetching system in video processing Expired - Fee Related CN100524357C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007100469298A CN100524357C (en) 2007-10-11 2007-10-11 Data pre-fetching system in video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007100469298A CN100524357C (en) 2007-10-11 2007-10-11 Data pre-fetching system in video processing

Publications (2)

Publication Number Publication Date
CN101140658A CN101140658A (en) 2008-03-12
CN100524357C true CN100524357C (en) 2009-08-05

Family

ID=39192602

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007100469298A Expired - Fee Related CN100524357C (en) 2007-10-11 2007-10-11 Data pre-fetching system in video processing

Country Status (1)

Country Link
CN (1) CN100524357C (en)

Also Published As

Publication number Publication date
CN101140658A (en) 2008-03-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090805

Termination date: 20151011

EXPY Termination of patent right or utility model