US20040062240A1 - Mechanism for data forwarding - Google Patents
- Publication number
- US20040062240A1 (application US10/663,880)
- Authority
- US
- United States
- Prior art keywords
- directional
- input
- controller
- select signal
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30141—Implementation provisions of register files, e.g. ports
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
- G06F9/3826—Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage
- G06F9/3828—Bypassing or forwarding of data results, e.g. locally between pipeline stages or within a pipeline stage with global bypass, e.g. between pipelines, between clusters
Definitions
- MUXs multiplexers
- Access to computed results from several computational cycles not yet in a register file is provided by multiple layers, or a hierarchy, of MUXs.
- the data forwarding or bypass circuitry is repeated numerous times, increasing the complexity of the overall circuit.
- the complexity of the system is also increased as data forwarding is used for additional ALUs or as data forwarding is used for additional ALU inputs.
- Use of a MUX hierarchy can provide a capability in which every computed result from each ALU can be bypassed to every other ALU in one cycle or one machine state. When this is achieved, a complete bypass network is obtained.
- each data forwarding or bypass circuit requires one or more MUXs.
- Dynamic circuits are typically used to ensure that the right MUX output is selected at the fastest possible speed. Dynamic circuits use monotonic signalling. For each input to a dynamic circuit MUX, a corresponding separate, discrete MUX select signal is required so as to avoid select signal decoding delays. Therefore, for a dynamic MUX, the number of inputs on a MUX is equal to the number of selects on the MUX. A dynamic MUX which has N inputs would also require N selects, resulting in at least 2N connections to the MUX. Connections are also required for the output and clock, resulting in 2N+2 connections to the MUX.
- An object of the invention is to provide a methodology that allows access to computed results which are not available in a register file, without the associated drawbacks.
- A further object of the invention is to provide a data forwarding architecture which reduces the number of wires used.
- A further object of the invention is the use of less area for data forwarding.
- A further object of the invention is to reduce circuit complexity.
- A further object of the invention is to allow increased data forwarding capability.
- A bypass is established through the reuse of the register file input wires without the need for any additional wires.
- The speed of the bypass is also increased over a standard MUX hierarchy because the MUXs used in the invention are smaller, and therefore faster. Less area is required by virtue of smaller MUXs and a reduction in the number of MUXs used.
- The use of fewer wires and smaller components simplifies the circuit design and allows for less onerous debugging of the circuitry.
- MUX control circuitry is also simplified because exclusivity in the select lines is no longer required.
- FIG. 1 shows a block diagram of a preferred embodiment for a mechanism for a data forwarding circuit
- FIG. 2 shows a block diagram of a prior art implementation of data forwarding circuitry
- FIG. 3 shows a block diagram of a multiplexer which can be used in the implementation of FIG. 1 or FIG. 2;
- FIG. 4 shows a block diagram of a conventional multiplexer with 2N inputs and 2N selects
- FIG. 5 shows a block diagram of an encoded multiplexer with 2N inputs and N+2 selects
- FIG. 6 shows an internal block diagram of the encoded multiplexer shown in FIG. 5;
- FIG. 7 shows an internal block diagram of the conventional multiplexer shown in FIG. 4;
- FIG. 8A shows a block diagram of the control circuits for the multiplexer shown in FIG. 4;
- FIG. 8B shows a block diagram of the control circuits for the multiplexer shown in FIG. 5;
- FIG. 9 shows a block diagram of a two multiplexer arrangement which accepts write back, write back minus one and the ALU output as input;
- FIG. 10 shows a block diagram of an additional register file bypass and its use with a cache bypass.
- FIG. 1 shows a new implementation of data forwarding with a write back (WRB) stage bypass.
- the main CPU core pipeline comprises the REG stage (register read), the EXE stage (execute) wherein operations are performed, the DET Stage (detect) wherein exceptions are detected, and the WRB stage (write-back) wherein results are committed to architected state.
- The invention will operate with other pipelines, as this pipeline is only one example. The embodiment of FIG. 1 has an ALU 105, 110 on either side of the register file 115. Connecting the register file 115 in the middle of the ALUs of functional units optimizes the configuration. Throughout this description the term ALU is used. Those of ordinary skill in the art will understand that other functional units are equivalent to the described ALUs for the purpose of this invention.
- unit B's ALU 105 's output goes to both unit B's latch 120 and to unit B's MUX 125 .
- the output from unit B's latch 120 goes to unit B's bidirectional wired OR controller 130 .
- the output of unit B's bi-directional wired OR controller 130 goes to unit B's MUX 125 and to a path 140 in the register file 115 .
- Both unit B's ALU 105 and unit A's ALU 110 results are destined for the register file 115, and the architecture allows either unit B's ALU 105 or unit A's ALU 110 results to be written into the register file 115 in a given cycle, while preventing both from being written simultaneously. Therefore, when directed by control 142, the output of unit B's bi-directional wired OR controller 130 goes into the register file 115, goes across the register file 115 on the path 140 into unit A's bi-directional wired OR controller 135, and through that into unit A's MUX 145.
- FIG. 1 shows that this works in both directions. Computed results can travel from unit A to unit B as follows: unit A's ALU 110 to both unit A's MUX 145 and unit A's latch 150, from unit A's latch 150 to unit A's bi-directional wired OR controller 135 to unit A's MUX 145, and from unit A's bi-directional wired OR controller 135 to path 140 in register file 115 to unit B's bi-directional wired OR controller 130 to unit B's MUX 125.
- Computed results can also travel from unit B to unit A as follows: unit B's ALU 105 to both unit B's MUX 125 and unit B's latch 120, from unit B's latch 120 to unit B's bi-directional wired OR controller 130 to unit B's MUX 125, and from unit B's bi-directional wired OR controller 130 to path 140 of register file 115 to unit A's bi-directional wired OR controller 135 to unit A's MUX 145.
- Path 140, which traverses the register file 115, is bi-directional and performs two tasks. Path 140 is used to write the computed value into the register file 115, and it is also used to allow computed results to be passed between units A and B in either direction.
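- The dual role of path 140 can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class name `WiredOrPath`, the `drive` argument (standing in for control 142), and the idea that an unselected controller contributes all zeros to the wired OR are illustrative choices, not details taken from the figures.

```python
from dataclasses import dataclass, field


@dataclass
class WiredOrPath:
    """Toy model of one bi-directional path across the register file."""
    registers: dict = field(default_factory=dict)

    def cycle(self, result_a, result_b, drive, dest_reg):
        # 'drive' is a hypothetical stand-in for the control input:
        # "A", "B", or None. An unselected controller is assumed to
        # contribute zeros, so the OR equals the selected unit's result.
        from_a = result_a if drive == "A" else 0
        from_b = result_b if drive == "B" else 0
        bus = from_a | from_b
        if drive is not None:
            # Task 1: the value on the path is written to the register file.
            self.registers[dest_reg] = bus
        # Task 2: the same value is visible to the forwarding MUX on each side.
        return {"mux_a_sees": bus, "mux_b_sees": bus}


path = WiredOrPath()
seen = path.cycle(result_a=0x12, result_b=0x34, drive="B", dest_reg=5)
```

In this model only one unit drives the path per cycle, which mirrors the statement that both results are never written simultaneously.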
- FIG. 1 also shows two inputs to unit B's MUX 125 and two inputs to unit A's MUX 145 .
- For unit B's MUX 125, the two inputs are a write back minus 1 (WRB−1) input 155 and a WRB result input 160 for each ALU.
- Similarly, for unit A's MUX 145 there is a WRB−1 input 165 and a WRB result input 170 for each ALU.
- Unit B's MUX 125 and unit A's MUX 145 are symbolic of the data forwarding circuitry that allows unstored computed results to be available to the ALUs.
- EXE stage data is fresher than DET or WRB stage data; it represents data that has just been computed by the ALU during the last completed cycle.
- WRB−1 represents the computed results of the previous ALU cycle. Note that this stage corresponds to the DET pipeline stage.
- WRB represents the computed results of the ALU two cycles ago.
- WRB data will be placed in a register file one cycle before WRB−1 data will be recorded in a register file.
- WRB data will be placed in a register file two cycles before EXE data will be recorded in a register file. While this example describes a system in which the ALU result is recorded into a register file three cycles after it has been calculated, one of ordinary skill in the art will recognize that this mechanism for data forwarding and bypass can be expanded to encompass any delay in recording the data in the register file.
- FIG. 2 shows an example of a conventional arrangement used for data forwarding.
- Register file 200 is connected between unit B's ALU 205 and unit A's ALU 210 .
- a data output from unit B's ALU 205 is fed to both unit B's MUX 215 and unit B's latch 220 .
- the output from unit B's latch 220 is connected to unit B's MUX 215 , MUX 225 , and unit A's MUX 230 .
- the output from unit A's ALU 210 is connected to unit A's latch 235 and unit A's MUX 230 .
- the output from unit A's latch 235 is connected to unit A's MUX 230 , MUX 225 and unit B's MUX 215 .
- a first path 240 provides the output of unit B's latch 220 from unit B's ALU 205 to unit A.
- the second connection 245 provides the output of unit A's latch 235 from unit A's ALU 210 result to unit B.
- the third path 250 allows the output of unit B's latch 220 to be MUX'ed with the output of unit A's latch 235 in MUX 225 and is used to drive the input to the register file 200 .
- This configuration has three paths, 240 , 245 , 250 , traversing across the register file 200 .
- the buses are also directional.
- Unit B's MUX 215 and unit A's MUX 230 each have three inputs, which include two write back (WRB) results, one set 255 from unit A and one set 260 from unit B.
- The embodiment illustrated in FIG. 1 is advantageous when compared to the circuitry illustrated in FIG. 2 for several reasons.
- In FIG. 2 there are three paths 240, 245, 250 traversing the register file 200, as compared to the single path 140 traversing the register file 115 shown in FIG. 1.
- In FIG. 2 the paths are directional, while in FIG. 1 the path 140 is bidirectional.
- In FIG. 2 both unit B's MUX 215 and unit A's MUX 230 have three inputs, while FIG. 1's unit B's MUX 125 and unit A's MUX 145 each have two inputs.
- Each of these extra paths and wires (two paths within the register file and one input wire to each of the MUXs) takes up extra area.
- In FIG. 2 an extra MUX 225 is required to multiplex the results from unit A's ALU 210 and unit B's ALU 205 into the register file 200.
- This third MUX 225 also takes up valuable area.
- The ALU result is staged to the WRB stage, which is the stage where the data is written into the register file.
- The data then passes into this bi-directional wired OR controller, which determines whether this ALU's result is valid for writing into the register file. If it is, the wired OR controller drives it into the register file and across the register file, into the other wired OR controller, which forwards it on to the MUX or data forwarding hierarchy, which continues on to the source latch for the ALU on the other side. This can go in either direction.
- The bi-directional wired OR controller requires a control input, which in this architecture is easy to determine several cycles in advance, to indicate whether the data is flowing from unit A to unit B or from unit B to unit A.
- FIG. 1 can be expanded in a couple of different ways. In more complicated architectures, multiple ALUs on either side of the register file can be added. The addition of these extra ALUs means there will be many more results going across to the register file, one for each ALU on each side.
- This diagram can also be expanded with the use of multiple sides. While FIG. 1 depicts a unit A and a unit B, one skilled in the art will appreciate that the diagram could be expanded to include additional units. The inclusion of these additional units would increase the control mechanism 142, but would not require additional data forwarding or bypass circuitry to be added. The number of ALUs that can write into the register file at a time determines the number of paths required across the register file.
- FIG. 1 shows two ALUs with a single path 140 across the register file 115 . If two ALUs were included on both sides of the figure, two paths would traverse across the register file to allow two ALUs to write to the register file simultaneously. If only one ALU could write to the register at a time, only one path would be required to traverse across the register file.
- FIG. 3 shows a MUX which can be used in the implementation of either FIG. 1 or FIG. 2.
- FIG. 4 shows a conventional MUX, which has N inputs of data for each stage and, correspondingly, it has N selects for each stage.
- FIG. 4 depicts a WRB stage with N inputs and a WRB−1 stage with N inputs.
- This MUX has 2N inputs and consequently 2N selects.
- FIG. 5 shows a block diagram of an encoded MUX.
- The encoded MUX has N inputs of data for each stage.
- FIG. 5 depicts a WRB stage with N inputs and a WRB−1 stage with N inputs.
- This encoded MUX therefore has 2N data inputs.
- The number of selects is drastically reduced. While the number of selects for the conventional MUX was 2N, the number of selects for the encoded MUX is N+2. Encoding the selects reduces their number to N selects plus two additional selects which determine which stage the data is from.
- WRB−1 corresponds to input 300 into MUX A 305 and WRB corresponds to input 310 into MUX B 315. Again, these last two inputs are used to determine which stage the data is from.
- FIG. 5 is advantageous because the circuits are wire limited and allow a reduction in the number of selects from 2N to N+2, which saves area and wiring resources.
- Encoded MUX 320 in FIG. 3 depicts a representative MUX.
- A data forwarding or bypass circuit is required for each ALU input.
- Each of these data forwarding or bypass circuits requires at least one MUX.
- An ALU has two inputs, so the reduction in selects from 2N to N+2 in each MUX is felt twice for a two-input ALU. For ALUs with more than two inputs the savings in area and reduced complexity are significantly greater.
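- The select-count arithmetic above can be checked with a few lines of code. This is a minimal sketch; the function name `select_counts` and its parameters are hypothetical, and it assumes one forwarding MUX per ALU input, as the text states as a minimum.

```python
def select_counts(n, alu_inputs=2):
    """Compare select wires for a conventional and an encoded MUX.

    n          -- data inputs per stage (N in the text)
    alu_inputs -- forwarding MUXs per ALU (assumed one per ALU input)
    """
    conventional = 2 * n   # one select per data input: N for WRB, N for WRB-1
    encoded = n + 2        # N shared selects plus the two stage selects
    return {
        "conventional_per_mux": conventional,
        "encoded_per_mux": encoded,
        "saved_per_alu": alu_inputs * (conventional - encoded),
    }


counts = select_counts(n=4)
```

Each encoded MUX saves N−2 selects, so under these assumptions the benefit only appears for N greater than 2 and grows with both N and the number of ALU inputs.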
- FIG. 6 shows an internal diagram of the MUX shown in FIG. 5.
- FIG. 6 highlights a WRB cone 600, which shows N inputs of WRB data and N selects going into a WRB MUX very similar to the MUX shown in FIG. 7.
- The output of that circuit then goes into a different circuit which contains the WRB−1 cone 605 and takes the same N selects as the WRB cone 600, but also takes N inputs of WRB−1 data.
- Both the WRB cone 600 and the WRB−1 cone 605 use the same raw selects, labeled cell 0 through N in both cones.
- The WRB selects are encoded such that the select is valid if one or the other is valid but not if both are valid. In this manner the selwrb and selwrb−1 control signals are used to determine whether the system uses the output of the WRB−1 cone 605 or the WRB cone 600.
- FIGS. 8A and 8B show how the controls are generated for the MUX 400 shown in FIG. 4 and MUX 500 shown in FIG. 5 respectively.
- Raw selects in FIG. 8A for both the WRB (selwrb) 800 stage and the WRB−1 (selwrb−1) 805 stage are available.
- Before these raw selects can be used, they have to be conditioned and prioritized. This is accomplished by ensuring that selwrb−1 takes precedence: if any of the selwrb−1 selects is asserted, every selwrb select must be disabled.
- The circuit shown in FIG. 8A demonstrates one implementation of this prioritization.
- The prioritization between the sets ensures the exclusivity of all the selects for the MUXs in FIG. 4.
- The prioritization also ensures the correct priority between the stages.
- A total of 2N selects (N instances of selwrb and N instances of selwrb−1) are present, and a delay may need to be introduced in the system to allow for the extra time required for prioritization.
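- The prioritization rule for the conventional MUX can be sketched in software. This is an illustrative model only, not the circuit of FIG. 8A: the function name `condition_selects` is hypothetical, selects are modelled as lists of booleans, and each raw set is assumed to be one-hot (or all zeros) on its own.

```python
def condition_selects(selwrb, selwrb_m1):
    """Give the WRB-1 selects precedence over the WRB selects.

    selwrb, selwrb_m1 -- raw select vectors (lists of bool, length N).
    Returns the conditioned (selwrb, selwrb_m1) pair, 2N selects in total.
    """
    if len(selwrb) != len(selwrb_m1):
        raise ValueError("both select vectors must have N entries")
    any_m1 = any(selwrb_m1)
    # If any WRB-1 select is asserted, every WRB select is disabled.
    conditioned_wrb = [s and not any_m1 for s in selwrb]
    return conditioned_wrb, list(selwrb_m1)


wrb_sel, m1_sel = condition_selects([True, False], [False, True])
```

After conditioning, at most one of the 2N selects is asserted, which is the exclusivity the conventional dynamic MUX depends on.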
- FIG. 8B shows how the select lines are generated for the MUX 500 in FIG. 5.
- the raw selects for the WRB (selwrb) 810 stage and the-raw selects for the WRB ⁇ 1 (selwrb ⁇ 1) 815 stage are available and are bit-wise “OR'ed” together. This reduces the two N inputs 820 , 825 to a single N output 830 .
- This combined set of selects is sent down to the MUX 500 shown in FIG. 5 and is used for the select input sel 505 . Effectively what occurs is that one value from each set of data inputs on the MUX, is selected. In FIG. 5 one value from set WRB 520 and one value from set WRB ⁇ 1 525 is selected.
- the N inputs of selwrb ⁇ 1 815 are also “OR'ed” together and then inverted to give the two additional signals fed into the MUX 500 on FIG. 5.
- the “OR'ed” signal is used by MUX 500 shown in FIG. 5 as select input selwrb ⁇ 1 515 and the inverted “OR'ed” signal is used by MUX 500 shown in FIG. 5 as select input selwrb 510 . These inputs determine which of the two selected inputs are passed through.
- the two sets of data, WRB 520 and WRB ⁇ 1 525 , going into the MUX 500 on FIG. 5 represent data from two different pipeline stages, or data from two different stages in a single pipeline.
- the vector N is used to select one from each of those pipeline stages, and then the last two signals, shown in FIG. 8B, selects between the pair that was selected from the first set. Effectively, the MUX first selects two and then selects one from those two.
- the selwrbs of FIG. 8 are conditioned on the fact that there was no selwrb ⁇ 1. This is similar to the prioritization scheme shown in FIG. 8A.
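- The select generation and the two-step selection described above can be modelled as follows. This is a behavioural sketch, not the circuit of FIG. 8B or FIG. 6: the function names `encode_selects` and `encoded_mux` are hypothetical, and the model assumes the combined select vector ends up with exactly one asserted position, since the text does not spell out how differing indices in the two raw sets are handled.

```python
def encode_selects(selwrb, selwrb_m1):
    """Build the N+2 selects for the encoded MUX from the raw selects."""
    sel = [a or b for a, b in zip(selwrb, selwrb_m1)]  # bit-wise OR, N wide
    sel_m1 = any(selwrb_m1)   # OR of all raw WRB-1 selects
    sel_wrb = not sel_m1      # inverted OR
    return sel, sel_wrb, sel_m1


def encoded_mux(wrb, wrb_m1, sel, sel_wrb, sel_m1):
    """First pick one candidate per stage, then pick between the two."""
    if sel.count(True) != 1:
        # Assumption of this sketch, not a statement about the hardware.
        raise ValueError("sketch assumes exactly one asserted select")
    idx = sel.index(True)
    candidate_wrb, candidate_m1 = wrb[idx], wrb_m1[idx]  # step 1: select two
    if sel_m1:                                           # step 2: select one
        return candidate_m1
    if sel_wrb:
        return candidate_wrb
    raise ValueError("no stage select asserted")


sel, sel_wrb, sel_m1 = encode_selects([False, True, False], [False, False, False])
value = encoded_mux([10, 11, 12], [20, 21, 22], sel, sel_wrb, sel_m1)
```

Here the N shared selects never need to be made exclusive between the two stages; the stage choice is carried entirely by the two extra signals.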
- the MUX 500 of FIG. 5 is depicted as selecting between WRB and WRB ⁇ 1 data.
- A second MUX (not shown) would be configured to choose between the output of that MUX 500 and the ALU, to ensure the data needed is available.
- FIG. 9 shows a representative schematic.
- FIG. 10 shows a cache bypass.
- the cache 1005 usually can only exist on one side of the register file 1010 . Normally, the cache 1005 results are only bypassed to the ALUs which reside on the same side of the register file 1010 . In FIG. 10, the cache results would be accessible to ALU1 1015 , but not to ALU2 1020 . Because of the mechanism described within this invention for data forwarding, existing paths across the register file 1010 , previously used for data forwarding are now available to bypass the cache results across the register file to the ALUs on the off cache side. In FIG. 10, the available paths across the register file 1010 would allow the cache 1005 results to be sent to ALU2 1020 .
- FIG. 10 depicts one example of the additional capability available when the mechanism for data forwarding included in this invention is implemented.
- the implementation of this mechanism for data forwarding reduces the number of paths required to perform data forwarding and bypass.
- The number of MUXs required to perform data forwarding is also reduced.
Abstract
A system and method are disclosed which allow unstored computed results to be accessed without the normal overhead associated with traditional data forwarding and bypass techniques. Through the use of multiplexers and bi-directional OR controllers the unstored data is readily accessible for use before it is stored in a register file. The circuitry used also allows bi-directional travel across a register file or bank as information is passed between the bi-directional controllers used. Latches can also be used in the circuitry. Additionally, the features of the invention allow the required number of select signals fed to the multiplexers used to be reduced over conventional methods. These reductions are possible through circuitry disclosed herein.
Description
- Faster performance is achieved with current microprocessor technologies through the use of instruction pipelines for retrieving and executing program instructions. One obstacle to pipelining microprocessors is that computed results are not immediately written back into a register file, requiring multiple clock cycles for the computed result to be moved to, and stored into, the appropriate register file. Processing delays may result if the computed results are needed before they are placed into, and become available from, the register file. This delay problem may have a “domino effect” during each of the cycles in which the computed result is not stored in a register file while additional computed results become available and are needed before they, in turn, are recorded in the register file.
- In the past, the data availability delay problem has been addressed through the use of data forwarding and/or bypass techniques. Both data forwarding and bypass techniques allow arithmetic logic units (ALUs), or ALU execution units, to access and use the computed results before they are placed in the register file. By allowing these results to be used before they are placed in the register file the machine is used more efficiently and its performance is increased.
- Conventional data forwarding and bypass techniques use multiplexers (MUXs) to allow unstored computed results to be available for subsequent use. Access to computed results from several computational cycles not yet in a register file is provided by multiple layers, or a hierarchy, of MUXs. In order to provide access to all of the computed results before reading the register file, the data forwarding or bypass circuitry is repeated numerous times, increasing the complexity of the overall circuit. The complexity of the system is also increased as data forwarding is used for additional ALUs or as data forwarding is used for additional ALU inputs. Use of a MUX hierarchy can provide a capability in which every computed result from each ALU can be bypassed to every other ALU in one cycle or one machine state. When this is achieved, a complete bypass network is obtained.
- To provide these capabilities, each data forwarding or bypass circuit requires one or more MUXs. Dynamic circuits are typically used to ensure that the right MUX output is selected at the fastest possible speed. Dynamic circuits use monotonic signalling. For each input to a dynamic circuit MUX, a corresponding separate, discrete MUX select signal is required so as to avoid select signal decoding delays. Therefore, for a dynamic MUX, the number of inputs on a MUX is equal to the number of selects on the MUX. A dynamic MUX which has N inputs would also require N selects, resulting in at least 2N connections to the MUX. Connections are also required for the output and clock, resulting in 2N+2 connections to the MUX.
- While the use of data forwarding and bypass techniques provides high performance circuit operation, these techniques have several drawbacks. These drawbacks fall into three categories: circuit performance, area required by the circuitry, and circuit and wiring complexity. In particular, within a circuit, as the need for data forwarding and bypass techniques increases and is addressed with the techniques described, the overall performance of the circuit is reduced and both the area used for data forwarding/bypass and the resulting circuit and wiring complexity are increased. To alleviate these drawbacks, designers have concentrated on removing unnecessary bypasses, i.e., those that don't result in valuable performance gain.
- Conventional data MUXs, typically comprising a select control switch, involve a dynamic clock activated circuit. This means that the result output is only valid when the clock signal is asserted. When the clock signal is not asserted the result goes to a pre-charged or a predetermined state, and does not necessarily reflect the circuit's state. When a select signal is asserted the output will reflect the value of data corresponding to the high select signal. The select signals are guaranteed by design to be mutually exclusive allowing only one of the data values to be transmitted to the output. In a standard dynamic circuit style, this MUX output circuit would include an inverter circuit followed by a feedback hold circuit.
- An object of the invention is to provide a methodology that allows access to computed results which are not available in a register file, without the associated drawbacks. A further object of the invention is to provide a data forwarding architecture which reduces the number of wires used. A further object of the invention is the use of less area for data forwarding. A further object of the invention is to reduce circuit complexity. A further object of the invention is to allow increased data forwarding capability. These objectives are accomplished through the use of encoded wires and the reuse of a data path multiple times to achieve different functions.
- According to a feature of the invention, a bypass is established through the reuse of the register file input wires without the need for any additional wires. The speed of the bypass is also increased over a standard MUX hierarchy because the MUXs used in the invention are smaller, and therefore faster. Less area is required by virtue of smaller MUXs and a reduction in the number of MUXs used. The use of fewer wires and smaller components simplifies the circuit design and allows for less onerous debugging of the circuitry. MUX control circuitry is also simplified because exclusivity in the select lines is no longer required. These and similar features allow simpler control mechanisms. Additionally, through higher capacity, this invention reduces the latency of particular bypasses.
- The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.
- For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
- FIG. 1 shows a block diagram of a preferred embodiment for a mechanism for a data forwarding circuit;
- FIG. 2 shows a block diagram of a prior art implementation of data forwarding circuitry;
- FIG. 3 shows a block diagram of a multiplexer which can be used in the implementation of FIG. 1 or FIG. 2;
- FIG. 4 shows a block diagram of a conventional multiplexer with 2N inputs and 2N selects;
- FIG. 5 shows a block diagram of an encoded multiplexer with 2N inputs and N+2 selects;
- FIG. 6 shows an internal block diagram of the encoded multiplexer shown in FIG. 5;
- FIG. 7 shows an internal block diagram of the conventional multiplexer shown in FIG. 4;
- FIG. 8A shows a block diagram of the control circuits for the multiplexer shown in FIG. 4;
- FIG. 8B shows a block diagram of the control circuits for the multiplexer shown in FIG. 5;
- FIG. 9 shows a block diagram of a two multiplexer arrangement which accepts write back, write back minus one and the ALU output as input; and
- FIG. 10 shows a block diagram of an additional register file bypass and its use with a cache bypass.
- FIG. 1 shows a new implementation of data forwarding with a write back (WRB) stage bypass. The main CPU core pipeline comprises the REG stage (register read), the EXE stage (execute), wherein operations are performed, the DET stage (detect), wherein exceptions are detected, and the WRB stage (write-back), wherein results are committed to architected state. This pipeline is only one example; the invention will operate with other pipelines. It has an ALU 105 for unit B and an ALU 110 for unit A, with a register file 115 between them. Connecting the register file 115 in the middle of the ALUs of the functional units optimizes the configuration. Throughout this description the term ALU is used. Those of ordinary skill in the art will understand that other functional units are equivalent to the described ALUs for the purpose of this invention.
- Within unit B of FIG. 1, the output of unit B's ALU 105 goes to both unit B's latch 120 and unit B's MUX 125. The output from unit B's latch 120 goes to unit B's bi-directional wired OR controller 130. The output of unit B's bi-directional wired OR controller 130 goes to unit B's MUX 125 and to a path 140 in the register file 115.
- In FIG. 1, the results of both unit B's ALU 105 and unit A's ALU 110 are destined for the register file 115, and the architecture allows either unit B's ALU 105 or unit A's ALU 110 results to be written into the register file 115 in a given cycle, while preventing both from being written simultaneously. Therefore, when directed by control 142, the output of unit B's bi-directional wired OR controller 130 goes into the register file 115, across the register file 115 on the path 140 into unit A's bi-directional wired OR controller 135, and through that into unit A's MUX 145.
- FIG. 1 shows that this works in both directions. Computed results can travel from unit A to unit B as follows: from unit A's ALU 110 to both unit A's MUX 145 and unit A's latch 150, from unit A's latch 150 to unit A's bi-directional wired OR controller 135 to unit A's MUX 145, and from unit A's bi-directional wired OR controller 135 to path 140 in register file 115 to unit B's bi-directional wired OR controller 130 to unit B's MUX 125. Computed results can also travel from unit B to unit A as follows: from unit B's ALU 105 to both unit B's MUX 125 and unit B's latch 120, from unit B's latch 120 to unit B's bi-directional wired OR controller 130 to unit B's MUX 125, and from unit B's bi-directional wired OR controller 130 to path 140 of register file 115 to unit A's bi-directional wired OR controller 135 to unit A's MUX 145. Path 140, which traverses the register file 115, is bi-directional and performs two tasks. Path 140 is used to write the computed value into the register file 115, and it is also used to allow computed results to be passed between units A and B in either direction. In unit A the computed result from unit B is sent to unit A's MUX 145, and in unit B the computed result from unit A is sent to unit B's MUX 125. The MUXs 125 and 145 on either side of the register file 115 are symbolic of the data forwarding circuitry, shown in FIG. 3, that gets results back to the ALU. FIG. 1 also shows two inputs to unit B's MUX 125 and two inputs to unit A's MUX 145. For unit B's MUX 125 the two inputs are a write back minus 1 (WRB−1) input 155 and a WRB result input 160 for each ALU. Similarly, for unit A's MUX 145 there is a WRB−1 input 165 and a WRB result input 170 for each ALU. Again, unit B's MUX 125 and unit A's MUX 145 are symbolic of the data forwarding circuitry that makes unstored computed results available to the ALUs.
- It normally takes several cycles for the ALU data to be written back into the register file. Once the ALU data is written into the register file it becomes available to other ALUs from the register file. During each of the subsequent cycles the ALU may calculate additional results. Each of these additional results needs to be available to other ALUs until it, in turn, is stored in a register file. EXE stage data is fresher than DET or WRB stage data; it represents data that has just been computed by the ALU during the last completed cycle. WRB−1 represents the computed results of the previous ALU cycle; this stage corresponds to the DET pipeline stage. WRB represents the computed results of the ALU two cycles ago. Accordingly, WRB data will be placed in a register file one cycle before WRB−1 data will be recorded in a register file. Similarly, WRB data will be placed in a register file two cycles before EXE data will be recorded in a register file. While this example describes a system in which the ALU result is recorded into a register file three cycles after it has been calculated, one of ordinary skill in the art will recognize that this mechanism for data forwarding and bypass can be expanded to encompass any delay in recording the data in the register file.
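The freshness ordering among EXE, WRB−1 and WRB data can be sketched in software. The Python below is an illustration only, not circuitry taken from this description: the `StageResult` record, its destination-register tag and the `forward_operand` function are hypothetical names introduced here, and matching operands by register number is an assumption that the description does not spell out.

```python
from typing import Dict, NamedTuple, Optional


class StageResult(NamedTuple):
    """A computed result that has not yet reached the register file (hypothetical record)."""
    dest: int   # destination register number (assumed tag)
    value: int  # computed value


def forward_operand(
    src: int,
    exe: Optional[StageResult],
    wrb_minus_1: Optional[StageResult],
    wrb: Optional[StageResult],
    regfile: Dict[int, int],
) -> int:
    """Return the freshest available value for source register `src`.

    Stages are checked from freshest to oldest: EXE, then WRB-1, then WRB.
    Only if no in-flight result targets `src` is the register file read.
    """
    for stage in (exe, wrb_minus_1, wrb):
        if stage is not None and stage.dest == src:
            return stage.value
    return regfile[src]
```

In this sketch a result sitting in the EXE stage shadows an older result for the same register in WRB−1 or WRB, which mirrors the statement that EXE stage data is the freshest.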
- FIG. 2 shows an example of a conventional arrangement used for data forwarding.
Register file 200 is connected between unit B's ALU 205 and unit A's ALU 210. A data output from unit B's ALU 205 is fed to both unit B's MUX 215 and unit B's latch 220. The output from unit B's latch 220 is connected to unit B's MUX 215, MUX 225, and unit A's MUX 230. Similarly, the output from unit A's ALU 210 is connected to unit A's latch 235 and unit A's MUX 230. The output from unit A's latch 235 is connected to unit A's MUX 230, MUX 225 and unit B's MUX 215.
- Three data paths are shown traversing the register file 200. A first path 240 provides the output of unit B's latch 220, which holds the result from unit B's ALU 205, to unit A. The second path 245 provides the output of unit A's latch 235, which holds the result from unit A's ALU 210, to unit B. The third path 250 allows the output of unit B's latch 220 to be MUX'ed with the output of unit A's latch 235 in MUX 225 and is used to drive the input to the register file 200. This configuration has three paths, 240, 245, 250, traversing the register file 200. The buses are also directional. Besides the three paths 240, 245, 250, a third MUX 225 is required to multiplex the results from unit A's ALU 210 and unit B's ALU 205 into the register file 200. In this configuration, unit B's MUX 215 and unit A's MUX 230 each have three inputs, which include two write back (WRB) results, one set 255 from unit A and one set 260 from unit B.
- The embodiment illustrated in FIG. 1 is advantageous when compared to the circuitry illustrated in FIG. 2 for several reasons. In FIG. 2 there are three paths 240, 245, 250 traversing the register file 200, as compared to the single path 140 traversing the register file 115 shown in FIG. 1. In FIG. 2, the paths are directional, while in FIG. 1 the path 140 is bi-directional. In FIG. 2, unit B's MUX 215 and unit A's MUX 230 each have three inputs, while FIG. 1's unit B's MUX 125 and unit A's MUX 145 each have two inputs. Each of these extra paths and wires, two paths within the register file and one input wire to each of the MUXs, takes up extra area. Additionally, in FIG. 2, an extra MUX 225 is required to multiplex the results from unit A's ALU 210 and unit B's ALU 205 into the register file 200. This third MUX 225 also takes up valuable area.
- In FIG. 1 the ALU result is staged to the WRB stage, which is the stage where the data is written into the register file. The data then passes into the bi-directional wired OR controller, which determines whether this ALU's result is valid for writing into the register file. If it is, the wired OR controller drives it into the register file and across the register file, into the other wired OR controller, which forwards it on to the MUX or data forwarding hierarchy, which continues on to the source latch for the ALU on the other side. This can go in either direction. The bi-directional wired OR controller requires a control input, which is easy to determine in this architecture several cycles in advance, indicating whether the data is flowing from unit A to unit B or from unit B to unit A.
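The single bi-directional path of FIG. 1 can be modelled behaviourally. The Python below is a rough sketch under stated assumptions, not the circuit itself: the `SharedPath` class, its `cycle` method and the `drive_from` argument (standing in for the control input) are hypothetical names, and the wired OR is approximated by letting the non-driving side contribute zero to a bitwise OR.

```python
from typing import Dict, List, Optional


class SharedPath:
    """Behavioural model of one bi-directional path across a register file (assumed API)."""

    def __init__(self, num_regs: int) -> None:
        self.regfile: List[int] = [0] * num_regs
        # Value each side's forwarding MUX sees on its wired OR controller output.
        self.mux_input: Dict[str, Optional[int]] = {"A": None, "B": None}

    def cycle(self, drive_from: str, latch_a: int, latch_b: int, dest: int) -> int:
        """Drive the path from one side for one cycle and return the value on the path.

        `drive_from` plays the role of the control input: exactly one of the two
        sides drives the path, so both results are never written simultaneously.
        """
        if drive_from not in ("A", "B"):
            raise ValueError("drive_from must be 'A' or 'B'")
        # Wired OR: the side that is not enabled contributes all zeros.
        from_a = latch_a if drive_from == "A" else 0
        from_b = latch_b if drive_from == "B" else 0
        value = from_a | from_b
        # The same wire writes the register file and forwards to both sides.
        self.regfile[dest] = value
        self.mux_input["A"] = value
        self.mux_input["B"] = value
        return value
```

One wire thus serves both the register file write and the cross-unit forwarding, which is the area saving the comparison with FIG. 2 describes.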
- FIG. 1 can be expanded in a couple of different ways. In more complicated architectures, multiple ALUs can be added on either side of the register file. The addition of these extra ALUs means there will be many more results going across the register file, one for each ALU on each side. The diagram can also be expanded with the use of multiple sides. While FIG. 1 depicts a unit A and a unit B, one skilled in the art will appreciate that the diagram could be expanded to include additional units. The inclusion of these additional units would enlarge the control mechanism 142, but would not require additional data forwarding or bypass circuitry to be added. The number of ALUs that can write into the register file at a time determines the number of paths required across the register file. FIG. 1 shows two ALUs with a single path 140 across the register file 115. If two ALUs were included on both sides of the figure, two paths would traverse the register file to allow two ALUs to write to the register file simultaneously. If only one ALU could write to the register file at a time, only one path would be required to traverse the register file.
- FIG. 3 shows a MUX which can be used in the implementation of either FIG. 1 or FIG. 2.
- FIG. 4 shows a conventional MUX, which has N inputs of data for each stage and, correspondingly, it has N selects for each stage. FIG. 4 depicts a WRB stage with N inputs and a WRB−1 stage with N inputs. This MUX has 2N inputs and consequently 2N selects.
- FIG. 5 shows a block diagram of an encoded MUX. The encoded MUX has N inputs of data for each stage. FIG. 5 depicts a WRB stage with N inputs and a WRB−1 stage with N inputs. This encoded MUX therefore has 2N data inputs. By using the encoded MUX rather than a conventional MUX, however, the number of selects is reduced. While the number of selects for the conventional MUX was 2N, the number of selects for the encoded MUX is N+2: encoding reduces the selects to N selects plus two additional selects which determine which stage the data is from. In FIG. 3, WRB−1 corresponds to input 300 into MUX A 305 and WRB corresponds to input 310 into MUX B 315. Again, these last two inputs are used to determine which stage the data is from. FIG. 5 is advantageous because the circuits are wire limited and the encoding allows a reduction in the number of selects from 2N to N+2, which saves area and wiring resources. Encoded MUX 320 in FIG. 3 depicts a representative MUX.
- In both FIGS. 4 and 5, N is the number of ALUs on a side of the register file.
- In order to ensure access to all computed results that have not reached a register file, a data forwarding or bypass circuit is required for each ALU input. Each of these data forwarding or bypass circuits requires at least one MUX. In a typical environment, an ALU has two inputs, so the reduction in selects from 2N to N+2 in each MUX is felt twice for a two-input ALU. For ALUs with more than two inputs the savings in area and complexity are correspondingly greater.
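The select-line arithmetic above can be checked with a few lines of Python. This is only a worked calculation of the 2N versus N+2 counts stated in the text; the function name `select_line_counts` and its dictionary keys are hypothetical, and one forwarding MUX per ALU input is assumed.

```python
from typing import Dict


def select_line_counts(n_alus_per_side: int, alu_inputs: int = 2) -> Dict[str, int]:
    """Compare select-line counts for a conventional and an encoded two-stage MUX.

    n_alus_per_side is N in the text; alu_inputs is the number of ALU operand
    inputs, each assumed to have its own forwarding MUX.
    """
    if n_alus_per_side < 1 or alu_inputs < 1:
        raise ValueError("counts must be positive")
    conventional = 2 * n_alus_per_side  # N selects for WRB plus N for WRB-1
    encoded = n_alus_per_side + 2       # N shared selects plus two stage selects
    return {
        "conventional_per_mux": conventional,
        "encoded_per_mux": encoded,
        "saved_per_alu": alu_inputs * (conventional - encoded),
    }
```

For N = 4 and a two-input ALU this gives 8 selects versus 6 per MUX, a saving of 4 select lines per ALU. The two counts are equal at N = 2, so the encoding only pays off when N exceeds 2.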
- FIG. 6 shows an internal diagram of the MUX shown in FIG. 5. FIG. 6 includes a WRB cone 600 in which N inputs of WRB data and N selects go into a WRB MUX very similar to the MUX shown in FIG. 7. The output of that circuit then goes into a different circuit which contains a WRB−1 cone 605, and which takes the same N selects as the WRB cone 600 but also takes N inputs of WRB−1 data. Both the WRB cone 600 and the WRB−1 cone 605 use the same raw selects, labeled cell 0 through N in both cones. The WRB selects are encoded such that the select is valid if one or the other is valid but not if both are valid. In this manner the selwrb and selwrb−1 control signals are used to determine whether the system uses the output of the WRB−1 cone 605 or the WRB cone 600.
- FIGS. 8A and 8B show how the controls are generated for the
MUX 400 shown in FIG. 4 and MUX 500 shown in FIG. 5, respectively. In FIG. 8A, raw selects are available for both the WRB (selwrb) 800 stage and the WRB−1 (selwrb−1) 805 stage. Before selwrb and selwrb−1 can be used with a conventional prior art MUX, they have to be conditioned and prioritized. This is accomplished by ensuring that selwrb−1 takes precedence: if any of the selwrb−1 selects is asserted, every selwrb select must be disabled. The circuit shown in FIG. 8A demonstrates one implementation of this prioritization. The prioritization between the sets ensures the exclusivity of all the selects for the MUXs in FIG. 4. The prioritization also ensures the correct priority between the stages. A total of 2N selects (N instances of selwrb and N instances of selwrb−1) are present, and a delay may need to be introduced in the system to allow for the extra time required for prioritization.
- FIG. 8B shows how the select lines are generated for the MUX 500 in FIG. 5. The raw selects for the WRB (selwrb) 810 stage and the raw selects for the WRB−1 (selwrb−1) 815 stage are available and are bit-wise “OR'ed” together. This reduces the two N-wide inputs to a single N-wide output 830. This combined set of selects is sent down to the MUX 500 shown in FIG. 5 and is used for the select input sel 505. Effectively, one value is selected from each set of data inputs on the MUX. In FIG. 5 one value from set WRB 520 and one value from set WRB−1 525 are selected.
- In FIG. 8B, the N inputs of selwrb−1 815 are also “OR'ed” together and then inverted to give the two additional signals fed into the
MUX 500 in FIG. 5. The “OR'ed” signal is used by MUX 500 shown in FIG. 5 as select input selwrb−1 515 and the inverted “OR'ed” signal is used by MUX 500 shown in FIG. 5 as select input selwrb 510. These inputs determine which of the two selected inputs is passed through.
- The two sets of data, WRB 520 and WRB−1 525, going into the MUX 500 in FIG. 5 represent data from two different pipeline stages, or data from two different stages in a single pipeline. The N-bit select vector is used to select one value from each of those pipeline stages, and then the last two signals, shown in FIG. 8B, select between the pair chosen by that vector. Effectively, the MUX first selects two values and then selects one from those two. Finally, the selwrbs of FIG. 8 are conditioned on the fact that there was no selwrb−1. This is similar to the prioritization scheme shown in FIG. 8A.
- The
MUX 500 of FIG. 5 is depicted as selecting between WRB and WRB−1 data. A second MUX (not shown) would be configured to choose between the output of that MUX 500 and the ALU, to ensure the data needed is available. FIG. 9 shows a representative schematic.
- FIG. 10 shows a cache bypass. The cache 1005 usually can only exist on one side of the register file 1010. Normally, the cache 1005 results are only bypassed to the ALUs which reside on the same side of the register file 1010. In FIG. 10, the cache results would be accessible to ALU1 1015, but not to ALU2 1020. Because of the mechanism for data forwarding described within this invention, existing paths across the register file 1010 that were previously used for data forwarding are now available to bypass the cache results across the register file to the ALUs on the off-cache side. In FIG. 10, the available paths across the register file 1010 would allow the cache 1005 results to be sent to ALU2 1020.
- FIG. 10 depicts one example of the additional capability available when the mechanism for data forwarding included in this invention is implemented. The implementation of this mechanism for data forwarding reduces the number of paths required to perform data forwarding and bypass. The number of MUXs required to perform data forwarding is also reduced. These reductions in the surface area used and in system complexity allow increased functionality to be added to the circuit.
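Returning to the select generation of FIGS. 8A and 8B, the two schemes can be compared with a small behavioural model. The Python below is a sketch of one reading of the text, not the disclosed gate-level circuit: all function names are hypothetical, each raw select vector is assumed to be one-hot or all zero, and the WRB selects are assumed to be conditioned on the absence of any WRB−1 select before the bit-wise OR.

```python
from typing import List, Optional, Sequence, Tuple


def conventional_mux(wrb: Sequence, wrb_m1: Sequence,
                     selwrb: Sequence[bool], selwrb_m1: Sequence[bool]) -> Optional[object]:
    """FIG. 8A style: 2N prioritized selects, with WRB-1 taking precedence."""
    any_m1 = any(selwrb_m1)
    cond_selwrb = [s and not any_m1 for s in selwrb]  # disable WRB if any WRB-1 is set
    selects = list(selwrb_m1) + cond_selwrb           # 2N select lines in total
    data = list(wrb_m1) + list(wrb)
    for value, sel in zip(data, selects):
        if sel:
            return value
    return None


def encoded_selects(selwrb: Sequence[bool],
                    selwrb_m1: Sequence[bool]) -> Tuple[List[bool], bool, bool]:
    """FIG. 8B style: N OR'ed selects plus two stage selects (N+2 lines)."""
    any_m1 = any(selwrb_m1)
    sel = [m1 or (w and not any_m1) for w, m1 in zip(selwrb, selwrb_m1)]
    sel_stage_m1 = any_m1       # OR of all raw WRB-1 selects
    sel_stage_wrb = not any_m1  # inverted OR
    return sel, sel_stage_wrb, sel_stage_m1


def encoded_mux(wrb: Sequence, wrb_m1: Sequence,
                selwrb: Sequence[bool], selwrb_m1: Sequence[bool]) -> Optional[object]:
    """Select one value per stage with the shared vector, then pick the stage."""
    sel, _sel_stage_wrb, sel_stage_m1 = encoded_selects(selwrb, selwrb_m1)
    pick_wrb = next((v for v, s in zip(wrb, sel) if s), None)
    pick_m1 = next((v for v, s in zip(wrb_m1, sel) if s), None)
    return pick_m1 if sel_stage_m1 else pick_wrb
```

Under those assumptions both functions return the same value for every combination of raw selects, while the encoded form needs only N+2 select lines instead of 2N.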
- Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
Claims (20)
1. A circuit to access unstored data comprising:
a register file;
a first bi-directional OR controller connected to said register file;
a first multiplexer having a first input connected to an output of said first bi-directional OR controller;
a second bi-directional OR controller connected to said register file;
a second multiplexer having a first input connected to an output of said second bi-directional OR controller; and
a control circuit connected to said first bi-directional OR controller and said second bi-directional OR controller.
2. A circuit according to claim 1 further comprising:
a first latch having an output connected to a first input of said first bi-directional OR controller; and
a second latch having an output connected to a first input to said second bi-directional OR controller.
3. A circuit according to claim 2 further comprising:
a first functional unit having a first output connected to a second input to said first multiplexer and having a second output connected to an input to said first latch.
4. A circuit according to claim 3 further comprising:
a second functional unit having a first output connected to a second input of said second multiplexer and having a second output connected to an input of said second latch.
5. A circuit according to claim 4 wherein:
said first multiplexer is part of a first data forwarding circuit; and
said second multiplexer is part of a second data forwarding circuit.
6. A circuit according to claim 5 wherein said control circuit determines whether the output from said first bi-directional OR controller is sent across said register file to said second bi-directional OR controller or if the output from said second bi-directional OR controller is sent across said file register to said first bi-directional OR controller.
7. A circuit according to claim 6 wherein said first bi-directional OR controller is a first bi-directional wired OR controller; and
said second bi-directional OR controller is a second bi-directional wired OR controller.
8. A circuit according to claim 6 wherein said first bi-directional OR controller is a first bi-directional wired OR controller; and
said second bi-directional OR controller is a second bi-directional wired OR controller.
9. A circuit according to claim 6 further comprising:
one or more latches between a source of unstored data and said first bi-directional OR controller.
10. A circuit according to claim 8 further comprising:
one or more latches between a source of unstored data and said second bi-directional OR controller.
11. The circuit according to claim 6 further comprising:
one or more latches between a source of unstored data and said first bi-directional controller; and
one or more latches between a source of unstored data and said second bi-directional controller.
12. A circuit according to claim 6 further comprising:
one or more additional bi-directional OR controllers having a bi-directional connection across said register file.
13. A circuit according to claim 6 further comprising:
one or more additional bi-directional OR controllers having a bi-directional connection to said first bi-directional OR controller or to said second bi-directional OR controller wherein the connection does not go across said register file.
14. An encoded multiplexer comprising:
a first input with at least one instance;
a second input with at least one instance;
a first raw select signal with at least one instance;
a second raw select signal with at least one instance; and
a circuit which combines said first raw select signal and said second raw select signal to determine which input should be used as a first conditioned select signal and a second conditioned select signal.
15. An encoded multiplexer according to claim 14 wherein said circuit comprises:
an OR gate having said first raw select signal as an input;
an invertor having an input connected to an output of said OR gate; and
a nand gate having a first nand input connected to an output of said invertor and a second nand input connected to said second raw select signal input.
16. An encoded multiplexer according to claim 15 wherein said first conditioned select signal is connected to said first raw select signal; and
said second conditioned select signal is connected to the output of the nand gate.
17. An encoded multiplexer according to claim 16 wherein said first raw select signal is connected to the select signal from a previous cycle; and
said second raw select signal is connected to the select signal from the cycle before the previous cycle.
18. An encoded multiplexer comprising:
a first input with at least one instance;
a second input with at least one instance;
a first raw select signal with at least one instance;
a second raw select signal with at least one instance; and
a circuit which combines said first raw select signal and said second raw select signal to determine which input should be used as a first conditioned select signal, a second conditioned select signal and a third conditioned select signal.
19. An encoded multiplexer according to claim 18 wherein said circuit comprising:
a first NAND gate having an input A connected to said second raw select input;
a first OR gate having an input B connected to said first raw select input and an input C connected to an output of said first NAND gate;
a second OR gate having an input D connected to said first raw select input;
a invertor having an input connected to the output of said second OR gate; and
said first NAND gate having an input E connected to an output of said invertor.
20. An encoded multiplexer according to claim 19 wherein said first conditioned select signal is connected to a current select signal;
said second conditioned select signal is connected to an output of said second OR gate; and
said third conditioned select signal is connected to an output of said invertor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/663,880 US20040062240A1 (en) | 2000-02-21 | 2003-09-16 | Mechanism for data forwarding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/510,278 US6707831B1 (en) | 2000-02-21 | 2000-02-21 | Mechanism for data forwarding |
US10/663,880 US20040062240A1 (en) | 2000-02-21 | 2003-09-16 | Mechanism for data forwarding |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/510,278 Continuation US6707831B1 (en) | 2000-02-21 | 2000-02-21 | Mechanism for data forwarding |
Publications (1)
Publication Number | Publication Date |
---|---|
US20040062240A1 true US20040062240A1 (en) | 2004-04-01 |
Family
ID=24030084
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/510,278 Expired - Fee Related US6707831B1 (en) | 2000-02-21 | 2000-02-21 | Mechanism for data forwarding |
US10/663,880 Abandoned US20040062240A1 (en) | 2000-02-21 | 2003-09-16 | Mechanism for data forwarding |
Country Status (1)
Country | Link |
---|---|
US (2) | US6707831B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060277425A1 (en) * | 2005-06-07 | 2006-12-07 | Renno Erik K | System and method for power saving in pipelined microprocessors |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6707831B1 (en) * | 2000-02-21 | 2004-03-16 | Hewlett-Packard Development Company, L.P. | Mechanism for data forwarding |
US6823434B1 (en) * | 2000-02-21 | 2004-11-23 | Hewlett-Packard Development Company, L.P. | System and method for resetting and initializing a fully associative array to a known state at power on or through machine specific state |
ATE491183T1 (en) * | 2007-04-16 | 2010-12-15 | Tixel Gmbh | METHOD AND DEVICE FOR ACCESS CONTROL OF SEVERAL APPLICATIONS |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4041461A (en) * | 1975-07-25 | 1977-08-09 | International Business Machines Corporation | Signal analyzer system |
US4450525A (en) * | 1981-12-07 | 1984-05-22 | Ibm Corporation | Control unit for a functional processor |
US4706240A (en) * | 1985-11-29 | 1987-11-10 | American Telephone And Telegraph Co., At&T Bell Labs | Switching system having multiple parallel switching networks |
US5043606A (en) * | 1990-03-30 | 1991-08-27 | Seagate Technology, Inc. | Apparatus and method for programmably controlling the polarity of an I/O signal of a magnetic disk drive |
US5097410A (en) * | 1988-12-30 | 1992-03-17 | International Business Machines Corporation | Multimode data system for transferring control and data information in an i/o subsystem |
US5253308A (en) * | 1989-06-21 | 1993-10-12 | Amber Engineering, Inc. | Massively parallel digital image data processor using pixel-mapped input/output and relative indexed addressing |
US5481495A (en) * | 1994-04-11 | 1996-01-02 | International Business Machines Corporation | Cells and read-circuits for high-performance register files |
US5590293A (en) * | 1988-07-20 | 1996-12-31 | Digital Equipment Corporation | Dynamic microbranching with programmable hold on condition, to programmable dynamic microbranching delay minimization |
US5773995A (en) * | 1996-04-22 | 1998-06-30 | Motorola, Inc. | Digital multiplexer circuit |
US6118300A (en) * | 1998-11-24 | 2000-09-12 | Xilinx, Inc. | Method for implementing large multiplexers with FPGA lookup tables |
US6163867A (en) * | 1998-08-28 | 2000-12-19 | Hewlett-Packard Company | Input-output pad testing using bi-directional pads |
US6215325B1 (en) * | 1999-03-29 | 2001-04-10 | Synopsys, Inc. | Implementing a priority function using ripple chain logic |
US6707831B1 (en) * | 2000-02-21 | 2004-03-16 | Hewlett-Packard Development Company, L.P. | Mechanism for data forwarding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE4107172C2 (en) | 1991-03-06 | 1997-08-07 | Siemens Ag | Circuit arrangement for testing integrated digital circuits |
US5481743A (en) | 1993-09-30 | 1996-01-02 | Apple Computer, Inc. | Minimal instruction set computer architecture and multiple instruction issue method |
Also Published As
Publication number | Publication date |
---|---|
US6707831B1 (en) | 2004-03-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |