WO2002097607A1 - Floating point system that represents status flag information within a floating point operand - Google Patents

Info

Publication number
WO2002097607A1
Authority
WO
WIPO (PCT)
Prior art keywords
floating point
status
operand
result
point operand
Prior art date
Application number
PCT/US2002/016024
Other languages
French (fr)
Inventor
Guy L. Steele, Jr.
Original Assignee
Sun Microsystems, Inc.
Priority date
Filing date
Publication date
Application filed by Sun Microsystems, Inc.

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/012Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising in floating-point computations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487Multiplying; Dividing
    • G06F7/4876Multiplying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001Arithmetic instructions
    • G06F9/30014Arithmetic instructions with variable precision
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094Condition code generation, e.g. Carry, Zero flag
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/01Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising
    • G06F5/015Methods or arrangements for data conversion without changing the order or content of the data handled for shifting, e.g. justifying, scaling, normalising having at least two separately controlled shifting levels, e.g. using shifting matrices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F7/487Multiplying; Dividing
    • G06F7/4873Dividing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49905Exception handling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49942Significance control

Definitions

  • the invention generally relates to systems and methods for performing floating point arithmetic computations, and more particularly to systems and methods for performing floating point computations which conform to behavior specified in IEEE Standard (“Std.”) 754 using a modified or enhanced format for the floating point operand.
  • Std.: Standard (as in IEEE Std. 754)
  • ANSI: American National Standards Institute
  • IEEE Std. 754 defines several standard formats for expressing values in floating point format, and a number of aspects regarding behavior of computation in connection therewith.
  • a value represented in floating point format comprises a plurality of binary digits, or "bits," having the structure $s \; e_{msb} \ldots e_{lsb} \; f_{msb} \ldots f_{lsb}$
  • bit "s" is a sign bit indicating whether the entire value is positive or negative.
  • Bits "$e_{msb} \ldots e_{lsb}$" comprise an exponent field representing the exponent "e" in unsigned binary biased format.
  • bits "$f_{msb} \ldots f_{lsb}$" comprise a fraction field that represents the fractional portion "f" in unsigned binary format ("msb" represents "most significant bit" and "lsb" represents "least significant bit").
  • the Standard defines two general formats, namely, a "single" format which comprises thirty-two bits, and a "double" format which comprises sixty-four bits.
  • the exponent field of the floating point representation "$e_{msb} \ldots e_{lsb}$" represents the exponent "E" in biased format.
  • the biased format provides a mechanism by which the sign of the exponent is implicitly indicated.
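The field layout described above can be illustrated with a minimal Python sketch. It assumes the IEEE Std. 754 single format, whose field widths (1 sign bit, 8 exponent bits, 23 fraction bits) and bias of 127 come from the Standard rather than from this text. The function name `decode_single` is hypothetical.

```python
import struct

SINGLE_BIAS = 127  # exponent bias of the IEEE Std. 754 single format


def decode_single(x: float) -> tuple[int, int, int]:
    """Round x to single format and return (sign, biased exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # bit "s"
    exponent = (bits >> 23) & 0xFF    # bits e_msb ... e_lsb (biased)
    fraction = bits & 0x7FFFFF        # bits f_msb ... f_lsb
    return sign, exponent, fraction


sign, biased, fraction = decode_single(-6.5)   # -6.5 = -1.625 * 2**2
print(sign, biased - SINGLE_BIAS, hex(fraction))
```

Subtracting the bias recovers the signed exponent, which is how the biased format indicates the sign of the exponent implicitly.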
  • IEEE Std. 754 provides for several different formats with both the single and double formats which are generally based on the bit patterns of the bits "$e_{msb} \ldots e_{lsb}$" comprising the exponent field and the bits "$f_{msb} \ldots f_{lsb}$" comprising the fraction field.
  • the formats are generally depicted in FIG. 2.
  • circuits or devices that perform floating point computations or operations are designed to generate a result in three steps:
  • (a) In the first step, an approximation calculation step, an approximation to the absolutely accurate mathematical result (assuming that the input operands represent the specific mathematical values as described by IEEE Std. 754) is calculated that is sufficiently precise as to allow this accurate mathematical result to be summarized.
  • the summarized result is usually represented by a sign bit, an exponent (typically represented using more bits than are used for an exponent in the standard floating-point format), and some number "N" of bits of the presumed result fraction, plus a guard bit and a sticky bit.
  • the value of the exponent will be such that the value of the fraction generated in step (a) consists of a 1 before the binary point and a fraction after the binary point.
  • the bits are commonly calculated so as to obtain the same result as the following conceptual procedure (which is impossible under some circumstances to carry out in practice): calculate the mathematical result to an infinite number of bits of precision in binary scientific notation, and in such a way that there is no bit position in the significand such that all bits of lesser significance are 1-bits (this restriction avoids the ambiguity between, for example, 1.100000... and 1.011111..., which denote the same value).
  • the sticky bit is the logical OR of all remaining bits of the infinite fraction after the guard bit.
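As a hedged illustration of step (a), the sketch below summarizes an exact positive value into an exponent, an N-bit significand, a guard bit and a sticky bit. It uses exact rational arithmetic in place of the "infinite number of bits" of the conceptual procedure, and it assumes the guard bit is the first bit after the N significand bits. The function name `summarize` is hypothetical.

```python
from fractions import Fraction


def summarize(value: Fraction, n: int) -> tuple[int, int, int, int]:
    """Return (exponent, significand, guard, sticky) for a positive value.

    The significand is an n-bit integer whose top bit is the leading 1.
    """
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    two = Fraction(2)
    # Find e such that 1 <= value / 2**e < 2.
    e = 0
    while value >= two ** (e + 1):
        e += 1
    while value < two ** e:
        e -= 1
    # Scale so that the n significand bits form the integer part.
    scaled = value / two ** e * 2 ** (n - 1)
    significand = int(scaled)
    # Doubling the discarded part moves the guard bit into the integer part.
    rest = (scaled - significand) * 2
    guard = int(rest)
    # Sticky is the logical OR of every bit after the guard bit.
    sticky = int(rest != guard)
    return e, significand, guard, sticky


print(summarize(Fraction(1, 3), 4))
```

For one third with N = 4, the binary expansion is 1.0101 0101... times 2 to the power -2, so the call returns exponent -2, significand 0b1010, guard bit 1 and sticky bit 1.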
  • (b) In the second step, a rounding step, the guard bit, the sticky bit, perhaps the sign bit, and perhaps some of the bits of the presumed significand generated in step (a) are used to decide whether to alter the result of step (a). For conventional rounding modes defined by IEEE Std. 754, this is a decision as to whether to increase the magnitude of the number represented by the presumed exponent and fraction generated in step (a). Increasing the magnitude of the number is done by adding 1 to the significand in its least significant bit position, as if the significand were a binary integer. It will be appreciated that, if the significand is all 1-bits, the magnitude of the number is "increased" by changing it to a high-order 1-bit followed by all 0-bits and adding 1 to the exponent.
  • IEEE Std. 754 defines a round-toward-nearest mode in which the least significant bit (lsb) of the significand, the guard bit, and the sticky bit are examined to decide if an increase is made according to the following decision table:
  • IEEE Std. 754 defines a round-toward-minus-infinity mode in which the decision as to whether to increase is made according to the following decision table:
  • IEEE Std. 754 defines a round-toward-plus-infinity mode in which the decision as to whether to increase is made according to the following decision table:
  • IEEE Std. 754 defines a round-toward-zero mode in which the decision as to whether to increase is made according to the following decision table:
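The decision tables themselves are not reproduced in this text. The following Python sketch expresses the four decisions using the conventional IEEE Std. 754 semantics (ties go to the even significand in round-toward-nearest); it is an illustration under that assumption, not a transcription of the tables. The mode names and function names are hypothetical.

```python
def should_increase(mode: str, sign: int, lsb: int, guard: int, sticky: int) -> bool:
    """Decide whether step (b) increases the magnitude of the step (a) result."""
    inexact = bool(guard or sticky)
    if mode == "nearest":
        # More than half way, or exactly half way with an odd significand.
        return bool(guard and (sticky or lsb))
    if mode == "minus_infinity":
        return sign == 1 and inexact   # only negative values grow in magnitude
    if mode == "plus_infinity":
        return sign == 0 and inexact   # only positive values grow in magnitude
    if mode == "zero":
        return False                   # truncation never increases magnitude
    raise ValueError(f"unknown rounding mode: {mode}")


def round_significand(mode, sign, exponent, significand, guard, sticky, n):
    """Apply the decision to an n-bit significand, carrying into the exponent."""
    if should_increase(mode, sign, significand & 1, guard, sticky):
        significand += 1               # add 1 in the least significant position
        if significand == 1 << n:      # the significand was all 1-bits
            significand >>= 1          # high-order 1-bit followed by 0-bits
            exponent += 1
    return exponent, significand


print(round_significand("nearest", 0, 0, 0b1111, 1, 0, 4))
```

The example call rounds the all-ones significand 1111 upward, which produces 1000 with the exponent incremented by one, matching the carry case described in step (b).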
  • (c) In the third step, a packaging step, a result is packaged into a standard floating-point format. This may involve substituting a special representation, such as the representation defined for infinity or NaN if an exceptional situation (such as overflow, underflow, or an invalid operation) was detected. Alternatively, this may involve removing the leading 1-bit (if any) of the fraction, because such leading 1-bits are implicit in the standard format. As another alternative, this may involve shifting the fraction in order to construct a denormalized number. As a specific example, it is assumed that this is the step that forces the result to be a NaN if any input operand is a NaN. In this step, the decision is also made as to whether the result should be an infinity. It will be appreciated that, if the result is to be a NaN or infinity, the original result will be discarded and an appropriate representation will be provided as the result.
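The packaging step can be sketched as follows for the conventional single format. This is a simplified illustration: it assumes an already rounded 24-bit significand with its leading 1 in bit 23, always maps overflow to infinity (the Standard makes this depend on the rounding mode), and drops bits shifted out when building a denormalized number. The function name `package_single` and the particular NaN bit pattern are assumptions.

```python
EXP_ALL_ONES = 0xFF
SINGLE_BIAS = 127


def package_single(sign: int, exponent: int, significand: int,
                   is_nan: bool = False) -> int:
    """Pack (sign, unbiased exponent, 24-bit significand) into 32 bits."""
    if is_nan:
        # Discard the computed result and substitute a NaN representation.
        return (sign << 31) | (EXP_ALL_ONES << 23) | 0x400000
    biased = exponent + SINGLE_BIAS
    if biased >= EXP_ALL_ONES:
        # Overflow: substitute the representation for infinity.
        return (sign << 31) | (EXP_ALL_ONES << 23)
    if biased <= 0:
        # Denormalized: shift the fraction right; the exponent field is zero.
        return (sign << 31) | (significand >> (1 - biased))
    # Normalized: the leading 1-bit is implicit, so mask it off.
    return (sign << 31) | (biased << 23) | (significand & 0x7FFFFF)


print(hex(package_single(1, 2, 0xD00000)))   # -1.625 * 2**2 = -6.5
```

The example packs the value -6.5 and yields the bit pattern 0xc0d00000, which is the single-format encoding of that value.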
  • FIG. 1 depicts the typical organization for a floating point unit 10 for a conventional prior art microprocessor capable of performing the floating point operations described above.
  • the floating point unit includes a number of circuits commonly referred to as functional units. These functional units include an adder 11, a multiplier 12, a divider 13, a square-root unit 14, a maximum/minimum unit 15 that delivers the larger or smaller of two operands, a comparator 16 that typically delivers one bit or a few bits describing a numerical relationship between two operands, and a tester 17 that examines just one operand and delivers one bit or a few bits describing numerical properties of the operand.
  • the functional units are controlled by a control unit 20 that interprets program instructions, enables operands to be coupled from a set of floating point registers 21 onto respective operand buses 22 and 23, and generates functional unit control signals that enable respective ones of the functional units 11 through 17 to receive the operand or operands and perform their respective operations to generate a result. If the functional unit control signals enable one of the functional units 11 through 16 to operate, the result is coupled onto a result bus 24 and stored in the floating point registers 21. In addition, if the functional unit control signals enable the comparator 16 or the tester 17 to operate, the result will be coupled to the control unit 20. Some functional units may be capable of delivering a result to the result bus in the same clock cycle that operands are presented to them. However, other functional units may be pipelined, in which case results are delivered during a clock cycle that is later than the clock cycle during which the corresponding operands were presented to the functional unit.
  • Functional units 11 through 14 also generate floating-point status information, which is typically coupled to a floating point status register 25 over floating-point status bus 26 for storage therein.
  • the floating point status information is stored and/or accumulated in the floating point status register 25.
  • the floating point status information generated for a particular floating point operation includes indications, for example, as to whether:
  • the floating point status information read from the floating point status register 25 may enable the control unit 20 to initiate processing of a trap sequence, which will interrupt the normal flow of program execution.
  • status bits read from the floating point status register 25 may be used by the control unit 20 to affect certain ones of the functional unit control signals that it sends to the functional units, such as the functional unit control signals that control the rounding mode.
  • IEEE Std. 754 has brought relative harmony and stability to floating-point computation and architectural design of floating-point units. Moreover, its design was based on some important principles, and rests on a sensible mathematical semantics that eases the job of programmers and numerical analysts. It also supports the implementation of interval arithmetic, which may prove to be preferable to simple scalar arithmetic for many tasks. Nevertheless, IEEE Std. 754 has some serious drawbacks, including:
  • Modes (e.g., the rounding modes and traps enabled/disabled mode), flags (e.g., flags representing the status information stored in floating point status register 25), and traps required to implement IEEE Std. 754 introduce implicit serialization issues. Implicit serialization is essentially the need for serial control of access (read/write) to and from globally used registers, such as the floating point status register 25. Under IEEE Std. 754, implicit serialization may arise (1) between different concurrent floating-point instructions and (2) between floating point instructions and the instructions that read and write the flags and modes.
  • rounding modes may introduce implicit serialization because they are typically indicated as global state, although in some microprocessor architectures, the rounding mode is encoded as part of the instruction operation code, which will alleviate this problem to that extent.
  • implicit serialization makes the Standard difficult to implement coherently and efficiently in today's superscalar and parallel processing architectures without loss of performance.
  • the rounding mode may be eliminated as a global state by statically encoding the rounding mode as part of the instruction operation code.
  • there is a need for a system that allows for more efficient processing of floating point operands and that eliminates the need to access status information held separately from the floating point operand data structure itself.
  • a floating point unit supporting such an enhanced operand generates results in which floating point status information generated for an operation may be encoded in the result of the operation, i.e., within parts of the operand, and subsequently stored therewith in the floating point register set, instead of requiring a separate floating point status register to receive the status information. Since the floating point status information relating to a floating point operation is in the result generated for the operation, implicit serialization required by maintaining the floating point status information separate and apart therefrom can be advantageously obviated.
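The difference between a shared status register and status carried with each result can be illustrated with a toy Python model. This is a sketch of the idea only: the patent encodes status inside the operand's own bit pattern, whereas this model simply attaches a flag to each value. The names `Tagged`, `mul_global`, `mul_tagged`, the threshold `LIMIT`, and the rule that a result inherits its operands' flags are all assumptions made for illustration.

```python
from dataclasses import dataclass

LIMIT = 1e3  # hypothetical overflow threshold for this toy model

# Prior-art style: one status register shared by every operation.
status_register = {"overflow": False}


def mul_global(a: float, b: float) -> float:
    r = a * b
    if abs(r) > LIMIT:
        status_register["overflow"] = True   # write to shared global state
    return r


@dataclass(frozen=True)
class Tagged:
    """A value that carries its own status (toy stand-in for embedded flags)."""
    value: float
    overflow: bool = False


def mul_tagged(a: Tagged, b: Tagged) -> Tagged:
    r = a.value * b.value
    # Status travels with the result; no shared register is read or written.
    return Tagged(r, a.overflow or b.overflow or abs(r) > LIMIT)


mul_global(100.0, 100.0)
mul_global(2.0, 3.0)
x = mul_tagged(Tagged(100.0), Tagged(100.0))
y = mul_tagged(Tagged(2.0), Tagged(3.0))
print(status_register["overflow"], x.overflow, y.overflow)
```

After both global multiplications the shared register only records that some operation overflowed, and every operation had to access it in order. In the tagged version each result reports its own status, so the two multiplications are independent of one another.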
  • the floating point operand data structure can be used in floating point computations and processing within a processing device.
  • the floating point data structure comprises a first portion having floating point operand data and a second portion having embedded status information associated with at least one status condition of the floating point operand data.
  • the status condition may be determined from the embedded status information without regard to memory storage (such as a dedicated floating point status register) external to the data structure.
  • the status condition may also be associated with at least one floating point operation that generated the enhanced floating point operand data structure.
  • the outcome of a conditional floating point instruction may be based on the embedded status information without regard to contents of the floating point status register.
  • the second portion of the data structure may also have at least one bit that is indicative of the status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
  • the overflow status may represent one in a group of a +OV status and a -OV status and may also represent a predetermined non-infinity numerical value.
  • the underflow status may represent one in a group of a +UV status and a -UV status and may also represent a predetermined non-zero numerical value.
  • the invalid status may represent a not-a-number (NaN) status due to an invalid operation.
  • the infinity status may represent one in a group of a positive infinity status and a negative infinity status.
  • the second portion of the data structure may also include bits indicative of a predetermined type of operand condition resulting in the NaN status or another type of operand condition resulting in the infinity status.
  • a floating point system is broadly described that is consistent with an embodiment of the present invention.
  • the system is associated with a processing device, such as a microprocessor, for performing at least one floating point operation on a floating point operand.
  • the system includes an operand memory storage device, a control unit and a first functional processing unit.
  • the operand memory storage device maintains the floating point operand.
  • the control unit is in communication with the operand memory storage device and receives the floating point instruction associated with the floating point operation to generate at least one control signal related to the floating point operation.
  • the first functional processing unit is in communication with the operand memory storage device and the control unit.
  • the first functional processing unit is capable of processing the floating point operand and storing status information within the processed floating point operand.
  • the status information has at least one bit that is indicative of an operand status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
  • a method for encoding a floating point operand with status information begins by determining a status condition of the floating point operand prior to execution of a floating point operation on the floating point operand.
  • the method stores an updated status condition of the floating point operand within the floating point operand after execution of the floating point operation on the floating point operand.
  • determining the status condition may include identifying the status condition of the floating point operand from only embedded status information within the floating point operand.
  • the status condition is typically from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
  • storing the updated status condition may include embedding updated status information within the floating point operand after execution of the floating point operation.
  • Such updated status information typically represents the updated status condition of the processed floating point operand.
  • the updated status condition is typically from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
  • the updated status condition may be indicative of a previous floating point operation that resulted in the floating point operand.
  • the method may also include conditioning a subsequent floating point operation based only upon the updated status information within the floating point operand. Additionally, the method may include processing an additional floating point operand and storing updated status information related to the additional floating point operand within the additional floating point operand while the updated status information related to the other floating point operand is preserved.
  • a computer-readable medium is described that is consistent with an embodiment of the present invention.
  • the medium stores a set of instructions for encoding a floating point operand with status information. When executed, the instructions perform the encoding method as described above.
  • Figure 1 is a functional block diagram of a prior art floating point unit;
  • Figure 2 depicts prior art formats for representations of floating point values generated by the floating point unit depicted in FIG. 1;
  • Figure 3 is a functional block diagram of an exemplary floating point unit consistent with an embodiment of the present invention;
  • Figure 4 depicts exemplary floating point operand data structure formats for representations of floating point operand values generated by the exemplary floating point unit depicted in FIG. 3 and consistent with an embodiment of the present invention; and
  • Figures 5 through 11 depict exemplary tables that are useful in understanding the operations of exemplary functional units of the floating point unit depicted in FIG. 3 and consistent with an embodiment of the present invention.
  • FIG. 3 is a functional block diagram of an exemplary floating point unit 40 consistent with an embodiment of the present invention.
  • floating point unit 40 includes a plurality of exemplary functional units.
  • these functional units include an adder unit 41, a multiplier unit 42, a divider unit 43, a square root unit 44, a maximum/minimum unit 45, a comparator unit 46 and a tester unit 47, all of which operate under control of functional unit control signals provided by a control unit 50.
  • the control unit 50 can enable one or two operands to be coupled from at least one of floating point registers 51 onto respective operand buses.
  • two operand buses 52 and 53 are illustrated and provide the operands to operand inputs of the respective functional units 41 through 47.
  • Those skilled in the art will recognize that the principles of the present invention contemplate alternative embodiments using single or multiple operand communication pathways in a variety of alternative communication pathway architectures.
  • the maximum/minimum unit 45 receives two operands and delivers either the maximum MAX(x,y) or the minimum MIN(x,y) of the two operands as its result.
  • the comparator unit 46 also receives two operands and identifies which, if either, operand is greater than the other (or equivalently, which operand is less than the other).
  • the tester unit 47 receives one operand and provides a result comprising one bit or a few bits describing numerical properties of the operand, as will be described below.
  • the exemplary control unit 50 generates functional unit control signals that control exemplary functional units 41 through 47.
  • the exemplary control unit 50 operates in response to floating point instructions provided thereto by a conventional arrangement not depicted in FIG. 3, but which will be familiar to those skilled in the art.
  • In response to the floating point instructions and conventional status signals (not shown, but which will also be apparent to those skilled in the art), the control unit 50 generates register control signals that enable the operands to be coupled from the registers 51 onto one or more operand buses 52 and 53. Examples of such "status signals" may include results from the comparator and tester as well as status signals (such as zero, negative, carry, or overflow) from an integer unit or "I/O transfer complete" from an input/output unit.
  • control unit 50 also generates functional unit control signals that enable respective ones of the functional units 41 through 47 to receive the operands from the operand buses 52 and 53, process the operands accordingly and generate results based upon such processing.
  • except for results generated by functional unit 47 (the tester unit), control unit 50 generates register control signals that enable the results to be stored in registers 51.
  • the results are coupled onto a result bus 54, which transfers the results to the registers 51 for storage therein. In this manner, control unit 50 is indirectly provided with the resulting operands having updated status information encoded or embedded within the operand itself.
  • comparator unit 46 is typically directly coupled to the control unit 50 for use thereby in a conventional manner.
  • comparator unit 46 is also connected to result bus 54 so that results may also be indirectly coupled to control unit 50 via register 51.
  • another embodiment of the present invention may be implemented such that results generated by tester unit 47 may be indirectly coupled to control unit 50 via result bus 54.
  • floating point unit 40 does not include a floating point status register for storing floating point status information, which is provided in the prior art floating point unit 10 depicted in FIG. 1.
  • the functional units advantageously encode the floating point status information in results (e.g., enhanced or modified floating point operand data structures) that are generated in certain formats.
  • FIG. 4 depicts floating point formats of operands that the functional units 41-47 may receive and formats of results that they generate.
  • the formats comprise a zero format 60, an underflow format 61, a denormalized format 62, a normalized non-zero format 63, an overflow format 64, an infinity format 65 and a not-a-number (NaN) format 66.
  • the zero format 60, which has the same format as conventional zero format 30 (FIG. 2), is used to represent the value "zero." More specifically, the zero format 60 represents positive or negative zero, depending on the value of "s," the sign bit.
  • the underflow format 61 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result of a computation is an underflow.
  • the underflow format has a sign bit "s" that indicates whether the result is positive or negative, bits $e_{msb} \cdots e_{lsb}$ of the exponent field that are all binary zeros, and bits $f_{msb} \cdots f_{lsb+1}$ of the fraction field, except for the least significant bit, that are all binary zeros.
  • the least significant bit $f_{lsb}$ of the fraction field is a binary one.
  • denormalized format 62, except in the case of the values represented by the underflow format 61, and normalized non-zero format 63 have the same format as the prior art denormalized and normalized non-zero formats 31 and 32 (FIG. 2) and are used to represent substantially the same range of values.
  • the overflow format 64 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result of a computation is an overflow.
  • the overflow format 64 has a sign bit "s" that indicates whether the result is positive or negative and bits $e_{msb} \cdots e_{lsb+1}$ of the exponent field that are all binary ones, with the least significant bit $e_{lsb}$ being zero.
  • the bits $f_{msb} \cdots f_{lsb}$ of the fraction field are all binary ones.
  • the infinity format 65 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result is infinite.
  • the infinity format 65 has a sign bit "s" that indicates whether the result is positive or negative, bits $e_{msb} \cdots e_{lsb}$ of the exponent field that are all binary ones, and bits $f_{msb} \cdots f_{lsb+5}$ of the fraction field that are all binary zeros.
  • the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are flags, which will be described below.
  • the NaN (not-a-number) format 66 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result is not a number.
  • the NaN format has a sign bit "s" that can be any value
  • the bits $e_{msb} \cdots e_{lsb}$ of the exponent field are all binary ones
  • bits $f_{msb} \cdots f_{lsb+5}$ of the fraction field that are not all binary zeros.
  • the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are flags, which will be described below.
  • the five low order bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are deemed to be flags.
  • the five flags include the flags that are defined by IEEE Std. 754, including an invalid operation flag "n," an overflow flag "o," an underflow flag "u," a division-by-zero flag "z," and an inexact flag "x."
  • the control unit 50 can advantageously enable multiple instructions to be contemporaneously executed because the floating point status information generated and stored during execution of one instruction will not over-write previously-stored floating point status information generated during execution of another instruction. In this manner, the floating point status information for each floating point operation is advantageously preserved from being over-written and lost in a computationally intensive computing architecture.
  • the other information may indicate the operation and types of operands that gave rise to the result.
  • the other information is associated with binary encoded values (BEV) of those bits $f_{msb} \cdots f_{lsb+5}$ as follows in Table 5.
  • exemplary functional units 41-47 for use in connection with exemplary floating point operand data structure formats 60 through 66 will be described in detail in connection with FIGS. 5 through 11. Before proceeding to those descriptions, it will be convenient to define certain terms.
  • the term +OV generally represents a value in the overflow pattern with the sign bit "s" having the value "zero,” indicating a positive value.
  • the term -OV generally represents a value in the overflow pattern with the sign bit "s" having the value "one," indicating a negative value.
  • +UN generally represents a value in the underflow pattern with the sign bit "s” having the value "zero,” indicating a positive value.
  • -UN generally represents a value in the underflow pattern with the sign bit "s" having the value "one,” indicating a negative value.
  • +OV can be deemed to refer to "some (or any) value that is strictly between +HUGE and +∞."
  • +UN can be deemed to refer to "some (or any) value that is strictly between +0 and +TINY."
  • -OV can be deemed to refer to "some (or any) value that is strictly between -HUGE and -∞."
  • -UN can be deemed to refer to "some (or any) value that is strictly between -0 and -TINY.”
  • An exemplary adder unit 41 performs several types of operations, which may include, but are not limited to, addition of two operands, negation of one operand, subtraction of one operand from another operand, and absolute value of one operand. Generally, the negation operation is not affected by the rounding mode, and the result is always a copy of the operand with its sign reversed, even if the operand is in the NaN format 66. In the subtraction operation, the adder unit 41 generates the result as the sum of the operand that is the minuend and the negative of the operand that is the subtrahend, the negative of the subtrahend being the result of the negation operation.
  • the adder unit 41 performs a subtraction operation by performing a negation operation on the operand that is the subtrahend and an addition operation in connection with the operand that is the minuend and the result of the negation operation.
  • the absolute value operation is also not affected by the rounding mode, and the result is a copy of the operand with its sign made positive, even if the operand is a NaN.
  • results generated by the adder unit 41 in connection with an addition operation are described in FIG. 5.
  • "+P” or “+Q” means any finite positive nonzero representable value other than +UN and +OV.
  • the label "-P" or "-Q" means any finite negative nonzero representable value other than -UN and -OV.
  • "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
  • for "round toward positive infinity," the result is +∞; for "round toward negative infinity," the result is -∞.
  • the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the two operands.
  • the result is the positive NaN value 0 11111111 1000000000000000101 ouzx (to indicate "infinity minus infinity" with the invalid operation flag set).
  • the four least significant bits $f_{lsb+3} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the four least significant bits $f_{lsb+3} \cdots f_{lsb}$ of the fraction fields of the two operands.
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the two operands.
  • the result is +∞, with the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result
  • +OV were replaced by +HUGE and -UN were replaced by -
  • the sign bit "s" of the result is "one” if and only if the sign bits of the two NaN operands are both "ones.”
  • results generated by an exemplary multiplier 42 are described in the table depicted in FIG. 6.
  • the term “+P” or “+Q” means any finite positive representable value greater than “one,” other than +OV.
  • the term “-P” or “-Q” means any finite negative representable value less than negative-one, other than -OV.
  • the term “+R” or “+S” means any positive non-zero value less than "one,” other than +UN.
  • the term “-R” or “-S” means any negative non-zero representable value greater than negative-one, other than -UN.
  • “NaN” means any value o
  • the exemplary divider unit 43 can perform two types of operations.
  • results generated by the exemplary divider 43 in connection with division operations are described in the table depicted in FIG. 7 and results generated by the exemplary divider 43 in connection with remainder operations are described in the table depicted in FIG. 8.
  • the term “+P” or “+Q” means any finite positive representable value greater than “one,” other than +OV.
  • the term “-P” or “-Q” means any finite negative representable value less than negative-one, other than -OV.
  • the term “+R” or “+S” means any positive non-zero value less than "one," other than +UN.
  • the term “-R” or “-S” means any negative nonzero representable value greater than negative-one, other than -UN.
  • the term "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
  • the result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the operands.
  • the result is -UN; for “round toward minus infinity,” the result is -OV.
  • the result is the negative NaN value 1 11111111 10000000000000101010101 to indicate "UN divided by UN” with the invalid operation, underflow, and inexact flags set.
  • the result is -0.
  • the result is -∞ with five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field all having the value zero.
  • the result is the negative NaN value 1 11111111 1000000000000100010000 to indicate "zero divided by zero" with the invalid operation flag set.
  • +UN were replaced by +TINY, except that if overflow occurs, the result is +OV. For all other rounding modes, the result is +OV.
  • the result is a copy of the NaN operand that has the larger value in its fraction field, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the operands. Additionally, the sign bit of the result is 1 if and only if the sign bits of the two NaN operands differ.
  • [087] Preliminarily, remainder operations are not affected by the rounding mode. The following is a key to symbols in the table depicted in FIG. 8 (remainder operations):
  • the result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits
  • the result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are OR-ed with 01001 (to indicate overflow and inexact conditions).
  • the result is a NaN value s 11111111
  • the result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the low five bits of the result are OR-ed with 00101 (to indicate underflow and inexact conditions).
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the operands.
  • Results generated by an exemplary square root unit are described in the table depicted in FIG. 9.
  • the term "+P” means any finite positive nonzero representable value other than +UN and +OV.
  • the term “-P” means any finite negative nonzero representable value other than -UN and -OV.
  • the term "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
  • results generated by an exemplary maximum/minimum unit 45 in connection with a maximum operation, in which the unit 45 determines the maximum of two operands, are described in the table depicted in FIG. 10.
  • Results generated in connection with a minimum operation, in which the unit 45 determines the minimum of two operands, are described in the table depicted in FIG. 11.
  • “+P” or “+Q” means any finite positive nonzero representable value other than +UN and +OV.
  • "-P" or "-Q" means any finite negative nonzero representable value other than -UN and -OV.
  • "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
  • neither the maximum operation nor the minimum operation is affected by the rounding mode.
  • both the maximum operation and the minimum operation are commutative, even when one or both operands are NaN values.
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the two operands.
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the NaN operand with 00101 (to indicate underflow and inexact conditions).
  • (h) The result is a copy of whichever operand has the larger magnitude.
  • (i) The result is +∞ with the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the two operands.
  • (j) The result is a copy of whichever NaN operand has the larger fraction, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits
  • the other operand does not affect the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result.
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction fields of the two operands.
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the NaN operand with 01001 (to indicate overflow and inexact conditions).
  • the result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field of the NaN operand with 00101 (to indicate underflow and inexact conditions).
  • exemplary comparator unit 46 receives two operands and generates a result that indicates whether the operands are equal, and, if not, which operand is larger. Generally, the comparison is not affected by the rounding mode. In a comparison operation consistent with an embodiment of the present invention,
  • NaN format 66 (iii) two negative operands in the infinity format 65 are equal regardless of the values of the flags "nouzx," (iv) a negative operand in the infinity format 65 is less than an operand in any other format, except for an operand that is in the
  • an operand in the NaN format 66 is unordered (i.e., neither greater than, less than, nor equal to) another operand in any format, including another operand in the NaN format 66, and (vi) operands in the format other than the infinity format 65 and NaN format 66 compare in accordance with IEEE Std. 754.
  • +UN is greater than +0 and less than +TINY; +OV is greater than +HUGE and less than +∞; and so on.
  • a tester unit 47 receives a single floating-point operand and determines whether it has one of a selected set of status conditions. Based upon the determination, the tester unit 47 produces a signal to indicate whether or not the selected condition holds.
  • the conditions include:
  • the operand is in either the infinity format 65 or the NaN format
  • the operand is in the overflow format 64; (v) the operand is in the overflow format 64 or contains a set overflow flag "o"; (vi) the operand is in the underflow format 61; (vii) the operand is in the underflow format 61 or contains a set underflow flag "u"; (viii) the operand is in the zero format 60; (ix) the operand is in the zero format 60 and the sign bit "s" is "zero"
  • the exemplary tester unit 47 can generate one or more result signals representing each of those conditions, which signals will be provided to the control unit 50.
  • the result signal is provided directly back to control unit 50.
  • the tester unit 47 provides the result signal on a result bus to register 51 for future access by the control unit 50.
  • the control unit 50 can select one of these signals and use the value of the selected signal to control the future behavior of the program.
  • the control unit 50 may control a functional unit's operation on one or more operands and use the result of the operation to complete processing of a conditional floating point operation. Examples of conditional floating point operations include, but are not limited to, conditional trap instructions and conditional branch instructions with multiple possible outcomes that depend dynamically upon the results of the conditional operations being processed.
  • the exemplary floating point unit 40 may include additional or fewer functional units than those described herein.
  • circuits described in other patent applications as being useful in the floating point unit 40, it will be appreciated that other circuits may be used to implement the principles of the present invention.
  • particular floating point status flags "n,” “o,” “u,” “z” and “x” have been indicated as being provided, it will be appreciated that not all flags need to be provided, and that other flags, representing other conditions, may be used in addition or instead.
  • system may be operated and/or otherwise controlled by means of information provided by an operator using operator input elements (not shown) which may be connected directly to the system or which may transfer the information to the system over a network or other mechanism for transferring information in a conventional manner.
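As an illustration of formats 60 through 66 in the 32-bit single format, the following Python sketch classifies a bit pattern and reads the five embedded flags. It is a minimal sketch under stated assumptions, not the circuitry of the functional units: the names `split_fields`, `classify`, and `embedded_flags` are hypothetical, and the flag order n, o, u, z, x from most to least significant bit is inferred from the examples 01001 (overflow and inexact) and 00101 (underflow and inexact).

```python
FLAG_NAMES = "nouzx"  # invalid, overflow, underflow, division-by-zero, inexact


def split_fields(bits):
    """Split a 32-bit pattern into (sign, exponent field, fraction field)."""
    return (bits >> 31) & 1, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def classify(bits):
    """Name the format (60 through 66) that a 32-bit pattern falls in."""
    _, exp, frac = split_fields(bits)
    if exp == 0:
        if frac == 0:
            return "zero"                       # format 60
        # Underflow format 61: only the least significant fraction bit is set.
        return "underflow" if frac == 1 else "denormalized"
    if exp == 0xFF:
        # Bits above the five flag bits decide between infinity and NaN.
        return "infinity" if (frac >> 5) == 0 else "NaN"
    if exp == 0xFE and frac == 0x7FFFFF:
        return "overflow"                       # format 64
    return "normalized"                         # format 63


def embedded_flags(bits):
    """Return the letters of the set flags for infinity/NaN patterns, else ''."""
    if classify(bits) not in ("infinity", "NaN"):
        return ""
    low5 = bits & 0x1F
    return "".join(name for i, name in enumerate(FLAG_NAMES) if low5 & (0x10 >> i))
```

For example, under these assumptions the pattern 0x7F800009 is a positive infinity whose flag bits 01001 read back as "ox".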

Abstract

A floating point unit generates results in which status information generated for an operation is encoded within the resulting operand, instead of requiring a separate floating point status register for the status information. In one embodiment, a floating point operand data structure comprises a first portion having floating point operand data and a second portion having embedded status information associated with at least one status condition of the operand data. The status condition may be determined from only the embedded status information. The status condition may also be associated with at least one floating point operation that generated the operand data structure. The outcome of a conditional floating point instruction may be based on the embedded status information without regard to contents of the floating point status register. The second portion of the data structure may also have at least one bit indicative of the status condition, such as an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.

Description

FLOATING POINT SYSTEM THAT REPRESENTS STATUS FLAG INFORMATION WITHIN A FLOATING POINT OPERAND
DESCRIPTION OF THE INVENTION
Field of the Invention
[001] The invention generally relates to systems and methods for performing floating point arithmetic computations, and more particularly to systems and methods for performing floating point computations which conform to behavior specified in IEEE Standard ("Std.") 754 using a modified or enhanced format for the floating point operand.
Background of the Invention
[002] Digital electronic devices, such as digital computers, calculators, and other devices, perform arithmetic calculations on values in integer, or "fixed point," format, in fractional, or "floating point" format, or both. Those skilled in the art will appreciate that IEEE Standard 754 (hereinafter "IEEE Std. 754" or "the Standard") is a conventional standard that was published in 1985 by the Institute of Electrical and Electronics Engineers, and adopted by the American National Standards Institute (ANSI). IEEE Std. 754 defines several standard formats for expressing values in floating point format, and a number of aspects regarding behavior of computation in connection therewith.
[003] Fundamentally and in accordance with IEEE Std. 754, a value represented in floating point format comprises a plurality of binary digits, or "bits," having the structure
$s \; e_{msb} \cdots e_{lsb} \; f_{msb} \cdots f_{lsb}$
where bit "s" is a sign bit indicating whether the entire value is positive or negative. Bits "$e_{msb} \cdots e_{lsb}$" comprise an exponent field representing the exponent "e" in unsigned binary biased format. Finally, bits "$f_{msb} \cdots f_{lsb}$" comprise a fraction field that represents the fractional portion "f" in unsigned binary format ("msb" represents "most significant bit" and "lsb" represents "least significant bit"). The Standard defines two general formats, namely, a "single" format which comprises thirty-two bits, and a "double" format which comprises sixty-four bits. In the single format, there is one sign bit "s," eight bits "$e_7 \cdots e_0$" comprising the exponent field and twenty-three bits "$f_{22} \cdots f_0$" comprising the fraction field. In the double format, there is one sign bit "s," eleven bits "$e_{10} \cdots e_0$" comprising the exponent field and fifty-two bits "$f_{51} \cdots f_0$" comprising the fraction field.
[004] As indicated above, the exponent field of the floating point representation "$e_{msb} \cdots e_{lsb}$" represents the exponent "E" in biased format. The biased format provides a mechanism by which the sign of the exponent is implicitly indicated. Those skilled in the art will appreciate that the bits "$e_{msb} \cdots e_{lsb}$" represent a binary encoded value "e" such that "e=E+bias." This allows the exponent E to extend from -126 to +127, in the eight-bit "single" format, and from -1022 to +1023 in the eleven-bit "double" format, and provides for relatively easy manipulation of the exponents in multiplication and division operations, in which the exponents are added and subtracted, respectively.
[005] IEEE Std. 754 provides for several different formats with both the single and double formats which are generally based on the bit patterns of the bits "$e_{msb} \cdots e_{lsb}$" comprising the exponent field and the bits "$f_{msb} \cdots f_{lsb}$" comprising the fraction field. The formats are generally depicted in FIG. 2. If a number is represented such that all of the bits "$e_{msb} \cdots e_{lsb}$" of the exponent field are binary ones (i.e., if the bits represent a binary-encoded value of "255" in the single format or "2047" in the double format) and all of the bits "$f_{msb} \cdots f_{lsb}$" of the fraction field are binary zeros, then the value of the number is positive or negative infinity, depending on the value of the sign bit "s." In particular, the value "v" is $v = (-1)^s \infty$, where "∞" represents the value "infinity" (reference format 33). On the other hand, if all of the bits "$e_{msb} \cdots e_{lsb}$" of the exponent field are binary ones and if the bits "$f_{msb} \cdots f_{lsb}$" of the fraction field are not all zeros, then the value that is represented is conventionally deemed "not a number" and abbreviated in the Standard as "NaN" (reference format 34).
[006] If a number has an exponent field in which the bits "$e_{msb} \cdots e_{lsb}$" are neither all binary ones nor all binary zeros (i.e., if the bits represent a binary-encoded value between 1 and 254 in the single format or between 1 and 2046 in the double format), the number is said to be in a "normalized" format (reference format 32). For a number in the normalized format, the value represented by the number is $v = (-1)^s \, 2^{e-\mathrm{bias}} \, (1.|f_{msb} \cdots f_{lsb})$, where "|" represents a concatenation operation. In the normalized format, there is an implicit most significant digit having the value "one." In this manner, 23 digits in the fraction field of the single format or 52 digits in the fraction field of the double format will effectively represent a value having 24 digits or 53 digits of precision, respectively, where the value is less than 2, but not less than 1.
[007] On the other hand, if a number has an exponent field in which the bits "$e_{msb} \cdots e_{lsb}$" are all binary zeros, representing the binary-encoded value of "zero," and a fraction field in which the bits "$f_{msb} \cdots f_{lsb}$" are not all zero, the number is said to be in a "de-normalized" format (reference format 31). For a number in the de-normalized format, the value represented by the number is $v = (-1)^s \, 2^{e-\mathrm{bias}+1} \, (0.|f_{msb} \cdots f_{lsb})$. For both single and double formats, it will be appreciated that the range of values of numbers that can be expressed in the de-normalized format is disjoint from the range of values of numbers that can be expressed in the normalized format.
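The field tests and value formulas of the conventional formats can be illustrated for the single format with a short Python sketch. The names `decode_single` and `hardware_value` are hypothetical; exact rational arithmetic is used only so that the formulas can be checked without rounding, and a pattern whose exponent and fraction fields are both all zeros is reported as zero (its sign is not preserved by this sketch).

```python
import struct
from fractions import Fraction

BIAS = 127  # exponent bias of the single format


def decode_single(bits):
    """Decode a 32-bit pattern into (kind, value).

    value is an exact Fraction for finite numbers and None for
    infinity and NaN.
    """
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1 if s else 1
    if e == 0xFF:                        # exponent field all ones
        return ("infinity", None) if f == 0 else ("NaN", None)
    if e == 0:                           # exponent field all zeros
        if f == 0:
            return ("zero", Fraction(0))
        # de-normalized: (-1)^s * 2^(e - bias + 1) * (0.f) with e = 0
        return ("denormalized", sign * Fraction(f, 1 << 23) * Fraction(2) ** (1 - BIAS))
    # normalized: (-1)^s * 2^(e - bias) * (1.f)
    return ("normalized", sign * (1 + Fraction(f, 1 << 23)) * Fraction(2) ** (e - BIAS))


def hardware_value(bits):
    """Reinterpret the same 32 bits as a float, for cross-checking."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

Converting the exact value with `float(...)` and comparing it with `hardware_value(...)` for the same pattern gives a quick consistency check of the two formulas.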
[008] Finally, if a number has an exponent field in which the bits "$e_{msb} \cdots e_{lsb}$" are all binary zeros, representing the binary-encoded value of "zero," and a fraction field in which the bits "$f_{msb} \cdots f_{lsb}$" are all zero, the number has the value "zero" (reference format 30). Depending on the value of the sign bit, it will be appreciated that the value "zero" may be positive zero or negative zero.
[009] Generally, circuits or devices that perform floating point computations or operations (generally referred to as floating point units) conforming to IEEE Std. 754 are designed to generate a result in three steps:
[010] (a) In the first step, an approximation calculation step, an approximation to the absolutely accurate mathematical result (assuming that the input operands represent the specific mathematical values as described by IEEE Std. 754) is calculated that is sufficiently precise as to allow this accurate mathematical result to be summarized. The summarized result is usually represented by a sign bit, an exponent (typically represented using more bits than are used for an exponent in the standard floating-point format), and some number "N" of bits of the presumed result fraction, plus a guard bit and a sticky bit. The value of the exponent will be such that the value of the fraction generated in step (a) consists of a 1 before the binary point and a fraction after the binary point. The bits are commonly calculated so as to obtain the same result as the following conceptual procedure (which is impossible under some circumstances to carry out in practice): calculate the mathematical result to an infinite number of bits of precision in binary scientific notation, and in such a way that there is no bit position in the significand such that all bits of lesser significance are 1-bits (this restriction avoids the ambiguity between, for example, 1.100000... and 1.011111... as representations of the value "one-and-one-half"); let the N most significant bits of the infinite significand be used as the intermediate result significand; let the next bit of the infinite significand be the guard bit; and let the sticky bit be 0 if and only if ALL remaining bits of the infinite significand are 0-bits (in other words, the sticky bit is the logical OR of all remaining bits of the infinite fraction after the guard bit).
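The conceptual procedure of step (a) can be sketched in Python when the exact mathematical result happens to be a rational number, in which case the "infinite" significand can be examined exactly. This is only an illustration under that assumption; the function name `summarize` and its parameter `n_bits` (the "N" of the text) are hypothetical.

```python
from fractions import Fraction


def summarize(x, n_bits=24):
    """Summarize an exact nonzero rational x.

    Returns (sign, exponent, significand, guard, sticky), where significand
    is an n_bits-wide integer whose top bit is the 1 before the binary point.
    """
    if x == 0:
        raise ValueError("zero has no normalized significand")
    sign = 1 if x < 0 else 0
    m = abs(Fraction(x))
    # Scale m into [1, 2) and record the power of two that was removed.
    exponent = 0
    while m >= 2:
        m /= 2
        exponent += 1
    while m < 1:
        m *= 2
        exponent -= 1
    scaled = m * (1 << (n_bits - 1))      # the N kept bits become the integer part
    significand = scaled.numerator // scaled.denominator
    rest = (scaled - significand) * 2     # the next bit moves to the integer position
    guard = rest.numerator // rest.denominator
    sticky = 1 if rest - guard != 0 else 0   # logical OR of all bits after the guard bit
    return sign, exponent, significand, guard, sticky
```

For instance, one-third has the binary expansion 1.0101... times 2 to the power -2, so with 24 kept bits the significand is 0xAAAAAA and both the guard bit and the sticky bit are 1.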
[011] (b) In the second step, a rounding step, the guard bit, the sticky bit, perhaps the sign bit, and perhaps some of the bits of the presumed significand generated in step (a) are used to decide whether to alter the result of step (a). For conventional rounding modes defined by IEEE Std. 754, this is a decision as to whether to increase the magnitude of the number represented by the presumed exponent and fraction generated in step (a). Increasing the magnitude of the number is done by adding 1 to the significand in its least significant bit position, as if the significand were a binary integer. It will be appreciated that, if the significand is all 1-bits, the magnitude of the number is "increased" by changing it to a high-order 1-bit followed by all 0-bits and adding 1 to the exponent.
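The increase described above, including the case in which the significand is all 1-bits, can be sketched as follows. The function name `increase_magnitude` and the representation of the significand as an `n_bits`-wide integer are assumptions made for illustration.

```python
def increase_magnitude(exponent, significand, n_bits=24):
    """Add 1 at the least significant bit of an n_bits-wide significand.

    If the significand was all 1-bits, the sum no longer fits in n_bits, so
    it becomes a high-order 1-bit followed by 0-bits and the exponent grows by 1.
    """
    significand += 1
    if significand == 1 << n_bits:
        significand >>= 1
        exponent += 1
    return exponent, significand
```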
[012] Regarding the rounding modes, it will be further appreciated that,
[013] (i) if the result is a positive number, and
[014] (a) if the decision is made to increase, effectively the decision has been made to increase the value of the result, thereby rounding the result up (i.e., towards positive infinity), but
[015] (b) if the decision is made not to increase, effectively the decision has been made to decrease the value of the result, thereby rounding the result down (i.e., towards negative infinity); and
[016] (ii) if the result is a negative number, and
[017] (a) if the decision is made to increase, effectively the decision has been made to decrease the value of the result, thereby rounding the result down, but
[018] (b) if the decision is made not to increase, effectively the decision has been made to increase the value of the result, thereby rounding the result up.
[019] For example, IEEE Std. 754 defines a round-toward-nearest mode in which the least significant bit (lsb) of the significand, the guard bit, and the sticky bit are examined to decide if an increase is made according to the following decision table:
TABLE 1
lsb guard sticky * increase?
0 0 0 * no
0 0 1 * no
0 1 0 * no
0 1 1 * yes
1 0 0 * no
1 0 1 * no
1 1 0 * yes
1 1 1 * yes
where "lsb" refers to the least significant bit of the significand, "guard" refers to the guard bit, "sticky" refers to the sticky bit, and "increase?" refers to the decision as to whether to increase the magnitude of the number generated in step (a). This may also be described by the Boolean expression "guard AND (lsb OR sticky)."
[020] IEEE Std. 754 defines a round-toward-minus-infinity mode in which the decision as to whether to increase is made according to the following decision table:
TABLE 2
sign guard sticky * increase?
0 0 0 * no
0 0 1 * no
0 1 0 * no
0 1 1 * no
1 0 0 * no
1 0 1 * yes
1 1 0 * yes
1 1 1 * yes
This may also be described by the Boolean expression "(guard OR sticky) AND sign."
[021] IEEE Std. 754 defines a round-toward-plus-infinity mode in which the decision as to whether to increase is made according to the following decision table:
TABLE 3
sign guard sticky * increase?
0 0 0 * no
0 0 1 * yes
0 1 0 * yes
0 1 1 * yes
1 0 0 * no
1 0 1 * no
1 1 0 * no
1 1 1 * no
This may also be described by the Boolean expression "(guard OR sticky) AND NOT sign."
[022] Finally, IEEE Std. 754 defines a round-toward-zero mode in which the decision as to whether to increase is made according to the following decision table:
TABLE 4
sign guard sticky * increase?
0 0 0 * no
0 0 1 * no
0 1 0 * no
0 1 1 * no
1 0 0 * no
1 0 1 * no
1 1 0 * no
1 1 1 * no
This may also be described by the Boolean expression "FALSE." It will be appreciated that, in the "round toward zero" mode, the decision is made never to increase.
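The four Boolean expressions given for Tables 1 through 4 can be collected into a single decision function, sketched here in Python. The function name `should_increase` and the mode strings are hypothetical labels chosen for this illustration.

```python
def should_increase(mode, sign, lsb, guard, sticky):
    """Decide whether to increase the magnitude produced in step (a).

    All bit arguments are 0 or 1; mode selects one of the four rounding modes.
    """
    if mode == "nearest":            # Table 1: guard AND (lsb OR sticky)
        return bool(guard and (lsb or sticky))
    if mode == "minus_infinity":     # Table 2: (guard OR sticky) AND sign
        return bool((guard or sticky) and sign)
    if mode == "plus_infinity":      # Table 3: (guard OR sticky) AND NOT sign
        return bool((guard or sticky) and not sign)
    if mode == "zero":               # Table 4: FALSE
        return False
    raise ValueError("unknown rounding mode: " + mode)
```

Note that only the round-toward-nearest decision depends on the least significant bit, and only the two directed modes depend on the sign bit.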
[023] (c) In the third step, a packaging step, a result is packaged into a standard floating-point format. This may involve substituting a special representation, such as the representation defined for infinity or NaN if an exceptional situation (such as overflow, underflow, or an invalid operation) was detected. Alternatively, this may involve removing the leading 1-bit (if any) of the fraction, because such leading 1-bits are implicit in the standard format. As another alternative, this may involve shifting the fraction in order to construct a denormalized number. As a specific example, it is assumed that this is the step that forces the result to be a NaN if any input operand is a NaN. In this step, the decision is also made as to whether the result should be an infinity. It will be appreciated that, if the result is to be a NaN or infinity, the original result will be discarded and an appropriate representation will be provided as the result.
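A simplified view of the packaging step for the single format is sketched below. It assumes an already-rounded 24-bit significand with its leading 1-bit present and an unbiased exponent, as produced by the earlier steps; the name `package_single` is hypothetical, NaN operands are not handled, and the de-normalized branch simply discards the bits shifted out, whereas an actual unit would perform its rounding at that bit position.

```python
def package_single(sign, exponent, significand):
    """Package (sign, unbiased exponent, 24-bit significand) into 32 bits."""
    if exponent > 127:
        # Overflow detected: substitute the representation defined for infinity.
        return (sign << 31) | (0xFF << 23)
    if exponent >= -126:
        # Normalized: remove the leading 1-bit, which is implicit in the format.
        return (sign << 31) | ((exponent + 127) << 23) | (significand & 0x7FFFFF)
    # Too small for the normalized range: shift the fraction right to
    # construct a denormalized number with an all-zero exponent field.
    shift = -126 - exponent
    return (sign << 31) | (significand >> shift)
```

For example, a significand of 0xC00000 with exponent 0 (the value one-and-one-half) packages to 0x3FC00000.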
[024] FIG. 1 depicts the typical organization for a floating point unit 10 for a conventional prior art microprocessor capable of performing the floating point operations described above. With reference to FIG. 1, the floating point unit includes a number of circuits commonly referred to as functional units. These functional units include an adder 11, a multiplier 12, a divider 13, a square-root unit 14, a maximum/minimum unit 15 that delivers the larger or smaller of two operands, a comparator 16 that typically delivers one bit or a few bits describing a numerical relationship between two operands, and a tester 17 that examines just one operand and delivers one bit or a few bits describing numerical properties of the operand. The functional units are controlled by a control unit 20 that interprets program instructions, enables operands to be coupled from a set of floating point registers 21 onto respective operand buses 22 and 23, and generates functional unit control signals that enable respective ones of the functional units 11 through 17 to receive the operand or operands and perform their respective operations to generate a result. If the functional unit control signals enable one of the functional units 11 through 16 to operate, the result is coupled onto a result bus 24 and stored in the floating point registers 21. In addition, if the functional unit control signals enable the comparator 16 or the tester 17 to operate, the result will be coupled to the control unit 20. Some functional units may be capable of delivering a result to the result bus in the same clock cycle that operands are presented to them. However, other functional units may be pipelined, in which case results are delivered during a clock cycle that is later than the clock cycle during which the corresponding operands were presented to the functional unit.
[025] Functional units 11 through 14 also generate floating-point status information, which is typically coupled to a floating point status register 25 over floating-point status bus 26 for storage therein. The floating point status information is stored and/or accumulated in the floating point status register 25. The floating point status information generated for a particular floating point operation includes indications, for example, as to whether:
[026] (i) a particular operand is invalid for the operation to be performed ("invalid operation");
[027] (ii) if the operation to be performed is division, the divisor is zero ("division-by-zero");
[028] (iii) an overflow occurred during the operation ("overflow");
[029] (iv) an underflow occurred during the operation ("underflow"); and
[030] (v) the rounded result of the operation is not exact ("inexact").
[031] These conditions are typically represented by flags that are stored in the floating point status register 25. The floating point status information can be read from the floating point status register 25 by the control unit 20 in the same manner as result bits from the comparator or tester, and this information can be used to dynamically control the operations performed by the floating point unit 10 in response to certain conditional floating point instructions, such as conditional branch, conditional move, and conditional trap instructions. The outcome of such conditional instructions typically relies upon accessing the floating point status information from the floating point status register 25.
[032] Also, the floating point status information read from the floating point status register 25 may enable the control unit 20 to initiate processing of a trap sequence, which will interrupt the normal flow of program execution. In addition, status bits read from the floating point status register 25 may be used by the control unit 20 to affect certain ones of the functional unit control signals that it sends to the functional units, such as the functional unit control signals that control the rounding mode.
[033] IEEE Std. 754 has brought relative harmony and stability to floating-point computation and architectural design of floating-point units. Moreover, its design was based on some important principles, and rests on a sensible mathematical semantics that eases the job of programmers and numerical analysts. It also supports the implementation of interval arithmetic, which may prove to be preferable to simple scalar arithmetic for many tasks. Nevertheless, IEEE Std. 754 has some serious drawbacks, including:
[034] (i) Modes (e.g., the rounding modes and traps enabled/disabled mode), flags (e.g., flags representing the status information stored in floating point status register 25), and traps required to implement IEEE Std. 754 introduce implicit serialization issues. Implicit serialization is essentially the need for serial control of access (read/write) to and from globally used registers, such as the floating point status register 25. Under IEEE Std. 754, implicit serialization may arise (1) between different concurrent floating-point instructions and (2) between floating point instructions and the instructions that read and write the flags and modes. Furthermore, rounding modes may introduce implicit serialization because they are typically indicated as global state, although in some microprocessor architectures the rounding mode is encoded as part of the instruction operation code, which alleviates this problem to that extent. Thus, the potential for implicit serialization makes the Standard difficult to implement coherently and efficiently in today's superscalar and parallel processing architectures without loss of performance.
[035] (ii) The implicit side effects of a procedure that can change the flags or modes can make it very difficult for compilers to perform optimizations on floating point code. As a result, compilers for most languages usually assume that every procedure call is an optimization barrier in order to be safe. This unfortunately may lead to further loss of performance.
[036] (iii) Global flags, such as those that signal certain modes, make it more difficult to do instruction scheduling where the best performance is provided by interleaving instructions of unrelated computations. Thus, instructions from regions of code governed by different flag settings or different flag detection requirements cannot easily be interleaved when they must share a single set of global flag bits.
[037] (iv) Furthermore, traps have been difficult to integrate efficiently into computing architectures and programming language designs for fine-grained control of algorithmic behavior.
[038] As noted above, the rounding mode may be eliminated as a global state by statically encoding the rounding mode as part of the instruction operation code. However, there is no existing architecture that eliminates flags and the trap enabled/disabled mode as global state while still supporting similar exception detection capabilities. Thus, there is a need for a system that allows for more efficient processing of floating point operands by eliminating the need to access status information that is held separately from the floating point operand data structure itself.
SUMMARY OF THE INVENTION
[039] Methods, systems, data structures and articles of manufacture consistent with the present invention overcome these shortcomings with a new and improved type of floating point operand that advantageously and efficiently eliminates flags as global states while still supporting exception detection capabilities. In brief summary, a floating point unit supporting such an enhanced operand generates results in which floating point status information generated for an operation may be encoded in the result of the operation, i.e., within parts of the operand, and subsequently stored therewith in the floating point register set, instead of requiring a separate floating point status register to receive the status information. Since the floating point status information relating to a floating point operation is in the result generated for the operation, implicit serialization required by maintaining the floating point status information separate and apart therefrom can be advantageously obviated.
[040] More particularly stated, a floating point operand data structure consistent with the present invention is broadly described herein. The floating point operand data structure can be used in floating point computations and processing within a processing device. The floating point data structure comprises a first portion having floating point operand data and a second portion having embedded status information associated with at least one status condition of the floating point operand data. The status condition may be determined from the embedded status information without regard to memory storage (such as a dedicated floating point status register) external to the data structure. The status condition may also be associated with at least one floating point operation that generated the enhanced floating point operand data structure. The outcome of a conditional floating point instruction may be based on the embedded status information without regard to contents of the floating point status register.
[041] The second portion of the data structure may also have at least one bit that is indicative of the status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status. More specifically, the overflow status may represent one in a group of a +OV status and a -OV status and may also represent a predetermined non-infinity numerical value. The underflow status may represent one in a group of a +UN status and a -UN status and may also represent a predetermined non-zero numerical value. The invalid status may represent a not-a-number (NaN) status due to an invalid operation. The infinity status may represent one in a group of a positive infinity status and a negative infinity status. The second portion of the data structure may also include bits indicative of a predetermined type of operand condition resulting in the NaN status or another type of operand condition resulting in the infinity status.
[042] Using such an enhanced type of data structure for a floating point operand, it will be appreciated that addition, multiplication, maximum and minimum floating point operations on the data structure are commutative.
[043] In still another aspect of the present invention, a floating point system is broadly described that is consistent with an embodiment of the present invention. The system is associated with a processing device, such as a microprocessor, for performing at least one floating point operation on a floating point operand. The system includes an operand memory storage device, a control unit and a first functional processing unit. The operand memory storage device maintains the floating point operand. The control unit is in communication with the operand memory storage device and receives a floating point instruction associated with the floating point operation to generate at least one control signal related to the floating point operation. The first functional processing unit is in communication with the operand memory storage device and the control unit. The first functional processing unit is capable of processing the floating point operand and storing status information within the processed floating point operand. Typically, the status information has at least one bit that is indicative of an operand status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
[044] In still another aspect of the present invention, a method for encoding a floating point operand with status information is described that is consistent with an embodiment of the present invention. The method begins by determining a status condition of the floating point operand prior to execution of a floating point operation on the floating point operand. The method stores an updated status condition of the floating point operand within the floating point operand after execution of the floating point operation on the floating point operand.
[045] In more detail, determining the status condition may include identifying the status condition of the floating point operand from only embedded status information within the floating point operand. The status condition is typically from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status. Furthermore, storing the updated status condition may include embedding updated status information within the floating point operand after execution of the floating point operation. Such updated status information typically represents the updated status condition of the processed floating point operand. The updated status condition is typically from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status. The updated status condition may be indicative of a previous floating point operation that resulted in the floating point operand.
[046] The method may also include conditioning a subsequent floating point operation based only upon the updated status information within the floating point operand. Additionally, the method may include processing an additional floating point operand and storing updated status information related to the additional floating point operand within the additional floating point operand while the updated status information related to the other floating point operand is preserved.
[047] In still another aspect of the present invention, a computer-readable medium is described that is consistent with an embodiment of the present invention. The medium stores a set of instructions for encoding a floating point operand with status information. When executed, the instructions perform the encoding method as described above.
[048] Additional advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims.
[049] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
[050] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[051] The scope of the present invention is pointed out with particularity in the appended claims. The above and further advantages of this invention may be better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:
[052] Figure 1 is a functional block diagram of a prior art floating point unit;
[053] Figure 2 depicts prior art formats for representations of floating point values generated by the floating point unit depicted in FIG. 1;
[054] Figure 3 is a functional block diagram of an exemplary floating point unit consistent with an embodiment of the present invention;
[055] Figure 4 depicts exemplary floating point operand data structure formats for representations of floating point operand values generated by the exemplary floating point unit depicted in FIG. 3 and consistent with an embodiment of the present invention; and
[056] Figures 5 through 11 depict exemplary tables that are useful in understanding the operations of exemplary functional units of the floating point unit depicted in FIG. 3 and consistent with an embodiment of the present invention.
DESCRIPTION OF THE EMBODIMENTS
[057] Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
[058] FIG. 3 is a functional block diagram of an exemplary floating point unit 40 consistent with an embodiment of the present invention. With reference to FIG. 3, floating point unit 40 includes a plurality of exemplary functional units. In the exemplary embodiment, these functional units include an adder unit 41, a multiplier unit 42, a divider unit 43, a square root unit 44, a maximum/minimum unit 45, a comparator unit 46 and a tester unit 47, all of which operate under control of functional unit control signals provided by a control unit 50. As with the floating point unit 10 depicted in FIG. 1, the control unit 50 can enable one or two operands to be coupled from at least one of floating point registers 51 onto respective operand buses. In the exemplary embodiment, two operand buses 52 and 53 are illustrated and provide the operands to operand inputs of the respective functional units 41 through 47. Those skilled in the art will recognize that the principles of the present invention contemplate alternative embodiments using single or multiple operand communication pathways in a variety of alternative communication pathway architectures.
[059] The adder 41 receives two operands "x" and "y" and generates the result as the sum S(x,y)=x+y of the operands. The multiplier 42 also receives two operands and generates the result as the multiplicative product P(x,y)=x*y of the values represented by the operands. The divider 43 can perform two types of operations. In one type of operation, which will be referred to as "division," the divider 43 receives two operands and generates the result as the quotient Q(x,y)=x/y of one operand divided by the other operand. In a second operation, which will be referred to as "remainder," the divider receives two operands and generates the result as the remainder REM(x,y)=x-(y*n), where "n" is an integer nearest the value x/y. The square root unit 44 receives one operand and generates a result as the square root SR(x)=sqrt(x) of the operand. The maximum/minimum unit 45 receives two operands and delivers as the result either the maximum MAX(x,y) or the minimum MIN(x,y) of the two operands. The comparator unit 46 also receives two operands and identifies which, if either, operand is greater than the other (or equivalently, which operand is less than the other). The tester unit 47 receives one operand and provides a result comprising one bit or a few bits describing numerical properties of the operand, as will be described below.
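The "remainder" definition REM(x,y)=x-(y*n) can be illustrated with a short sketch. This is a hypothetical Python helper, not the divider 43 itself. The text says only that "n" is an integer nearest x/y; the tie-breaking rule used here (round half to even, which is what Python's built-in `round` does and what IEEE Std. 754 specifies for its remainder operation) is an assumption. The naive formula is also subject to ordinary rounding error for general operands, whereas `math.remainder` computes the exact result.

```python
import math


def rem(x: float, y: float) -> float:
    """REM(x, y) = x - (y * n), with n an integer nearest x / y.

    Assumption: ties are broken toward the even integer, which is the
    behavior of Python's built-in round().
    """
    n = round(x / y)
    return x - y * n


if __name__ == "__main__":
    # The library routine is shown next to the naive formula for comparison.
    for x, y in [(5.0, 3.0), (7.0, 2.0), (10.0, 4.0)]:
        print(x, y, rem(x, y), math.remainder(x, y))
```

Note that, unlike the truncating remainder familiar from integer arithmetic, this remainder can be negative even when both operands are positive, because "n" is the nearest integer rather than the integer part of the quotient.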
[060] As noted above, the exemplary control unit 50 generates functional unit control signals that control exemplary functional units 41 through 47. The exemplary control unit 50 operates in response to floating point instructions provided thereto by a conventional arrangement not depicted in FIG. 3, but which will be familiar to those skilled in the art. In response to the floating point instructions and conventional status signals (not shown, but which will also be apparent to those skilled in the art), the control unit 50 generates register control signals that enable the operands to be coupled from the registers 51 onto one or more operand buses 52 and 53. Examples of such "status signals" may include results from the comparator and tester as well as status signals (such as zero, negative, carry, or overflow) from an integer unit or "I/O transfer complete" from an input/output unit.
[061] The control unit also generates functional unit control signals that enable respective ones of the functional units 41 through 47 to receive the operands from the operand buses 52 and 53, process the operands accordingly and generate results based upon such processing. In addition, except for results generated by functional unit 47 (the tester unit), control unit 50 generates register control signals that enable the results to be stored in registers 51. In the embodiment with functional units 41 through 46, the results are coupled onto a result bus 54, which transfers the results to the registers 51 for storage therein. In this manner, control unit 50 is indirectly provided with the resulting operands having updated status information encoded or embedded within the operand itself.
[062] The results generated by the comparator unit 46 and tester unit 47 are typically directly coupled to the control unit 50 for use thereby in a conventional manner. In the embodiment illustrated in FIG. 3, comparator unit 46 is also connected to result bus 54 so that results may also be indirectly coupled to control unit 50 via register 51. Further, it is contemplated that another embodiment of the present invention may be implemented such that results generated by tester unit 47 may be indirectly coupled to control unit 50 via result bus 54.
[063] It should be noted that floating point unit 40 does not include a floating point status register for storing floating point status information, which is provided in the prior art floating point unit 10 depicted in FIG. 1. Instead, the functional units advantageously encode the floating point status information in results (e.g., enhanced or modified floating point operand data structures) that are generated in certain formats. These data structure formats will be illustrated in connection with FIG. 4, which depicts floating point formats of operands that the functional units 41-47 may receive and formats of results that they generate. With reference to FIG. 4, seven formats are depicted, including a zero format 60, an underflow format 61, a denormalized format 62, a normalized non-zero format 63, an overflow format 64, an infinity format 65 and a not-a-number (NaN) format 66.
[064] The zero format 60, which has the same format as conventional zero format 30 (FIG. 2), is used to represent the values "zero." More specifically, the zero format 60 represents positive or negative zero, depending on the value of "s," the sign bit.
[065] The underflow format 61 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result of a computation is an underflow. In the embodiment illustrated in FIG. 4, the underflow format has a sign bit "s" that indicates whether the result is positive or negative, bits $e_{msb} \cdots e_{lsb}$ of the exponent field that are all binary zeros, and bits $f_{msb} \cdots f_{lsb+1}$ of the fraction field, that is, all fraction bits except the least significant bit, that are all binary zeros. The least significant bit $f_{lsb}$ of the fraction field is a binary one.
[066] The denormalized format 62, except in the case of the values represented by the underflow format 61, and normalized non-zero format 63 have the same format as the prior art denormalized and normalized non-zero formats 31 and 32 (FIG. 2) and are used to represent substantially the same range of values.
[067] The overflow format 64 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result of a computation is an overflow. In the embodiment illustrated in FIG. 4, the overflow format 64 has a sign bit "s" that indicates whether the result is positive or negative and bits $e_{msb} \cdots e_{lsb+1}$ of the exponent field that are all binary ones, with the least significant bit $e_{lsb}$ being zero. The bits $f_{msb} \cdots f_{lsb}$ of the fraction field are all binary ones.
[068] The infinity format 65 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result is infinite. In the embodiment illustrated in FIG. 4, the infinity format 65 has a sign bit "s" that indicates whether the result is positive or negative, bits $e_{msb} \cdots e_{lsb}$ of the exponent field that are all binary ones, and bits $f_{msb} \cdots f_{lsb+5}$ of the fraction field that are all binary zeros. The five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are flags, which will be described below.
[069] The NaN (not-a-number) format 66 provides a mechanism by which a functional unit, such as units 41 through 45, can indicate that the result is not a number. In the embodiment illustrated in FIG. 4, the NaN format has a sign bit "s" that can be any value, bits $e_{msb} \cdots e_{lsb}$ of the exponent field that are all binary ones, and bits $f_{msb} \cdots f_{lsb+5}$ of the fraction field that are not all binary zeros. The five least significant bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are flags, which will be described below.
[070] As noted above, in both values represented in the infinity format 65 and the NaN format 66, the five low order bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field are deemed to be flags. In the illustrated embodiment, the five flags include the flags that are defined by IEEE Std. 754, including an invalid operation flag "n," an overflow flag "o," an underflow flag "u," a division-by-zero flag "z," and an inexact flag "x." For example, a value in the NaN format 66 in which both the overflow flag "o" and the division-by-zero flag "z" are set indicates that the resulting operand value represents a result of a computation that caused an overflow (this from the overflow flag "o") as well as an attempt to divide by zero (this from the division-by-zero flag "z"). It should be noted that the flags provide the same status information as provided by prior art floating point status register 25 in prior art floating point unit 10. However, the status information is provided as an embedded part of the result and stored therewith in the registers 51. As a result, the control unit 50 can advantageously enable multiple instructions to be contemporaneously executed because the floating point status information generated and stored during execution of one instruction will not over-write previously-stored floating point status information generated during execution of another instruction. In this manner, the floating point status information for each floating point operation is advantageously preserved from being over-written and lost in a computationally intensive computing architecture.
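The flag encoding just described can be sketched for thirty-two bit values as follows. This is a minimal illustration in Python rather than a description of the functional units; the names `FLAG_BITS`, `make_infinity` and `read_flags` are hypothetical, and the bit order n, o, u, z, x from most to least significant follows the order in which the flags are listed in the text.

```python
# Hypothetical helpers for the infinity format 65 in a 32-bit layout:
# 1 sign bit, 8 exponent bits (all ones), 23 fraction bits whose five
# low-order bits hold the flags n, o, u, z, x.
FLAG_BITS = {"n": 0b10000, "o": 0b01000, "u": 0b00100, "z": 0b00010, "x": 0b00001}
EXPONENT_ALL_ONES = 0xFF << 23


def make_infinity(sign: int, flags: str = "") -> int:
    """Build an infinity bit pattern carrying the named flags, e.g. "ox"."""
    bits = (sign << 31) | EXPONENT_ALL_ONES
    for name in flags:
        bits |= FLAG_BITS[name]
    return bits


def read_flags(bits: int) -> str:
    """Recover the flag letters from the five low-order fraction bits."""
    return "".join(name for name, mask in FLAG_BITS.items() if bits & mask)


if __name__ == "__main__":
    # Two results held side by side: each keeps its own status information.
    a = make_infinity(1, "ox")
    b = make_infinity(0, "z")
    print(f"{a:032b}", read_flags(a))
    print(f"{b:032b}", read_flags(b))
```

Because each value carries its own flags, producing the second result does not disturb the status recorded in the first, which is the property the paragraph above attributes to the embedded encoding.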
[071] In addition to including status information in the five low-order bits $f_{lsb+4} \cdots f_{lsb}$ of the fraction field for values in the NaN format 66, other information may also be encoded in the next five low-order bits $f_{lsb+9} \cdots f_{lsb+5}$. If the value in the NaN format 66 is the result of an operation, the other information may indicate the operation and types of operands that gave rise to the result. In one embodiment, the other information is associated with binary encoded values (BEV) of those bits $f_{lsb+9} \cdots f_{lsb+5}$ as follows in Table 5.
TABLE 5
Bit Pattern of Result               BEV of $f_{lsb+9} \cdots f_{lsb+5}$   Meaning
-                                   0 or 1    no specific meaning
s 11111111 100000000000000010nouzx  2         infinity minus infinity
s 11111111 100000000000000011nouzx  3         OV minus OV
s 11111111 100000000000000100nouzx  4         zero times infinity
s 11111111 100000000000000101nouzx  5         UN times OV
-                                   6 or 7    no specific meaning
s 11111111 100000000000001000nouzx  8         zero divided by zero
s 11111111 100000000000001001nouzx  9         infinity divided by infinity
s 11111111 100000000000001010nouzx  10        UN divided by UN
s 11111111 100000000000001011nouzx  11        OV divided by OV
s 11111111 100000000000001100nouzx  12        square root of less than zero
-                                   13-16     no specific meaning
s 11111111 100000000000010001nouzx  17        remainder by zero
s 11111111 100000000000010010nouzx  18        remainder by UN
s 11111111 100000000000010011nouzx  19        remainder by OV
s 11111111 100000000000010100nouzx  20        remainder of infinity
s 11111111 100000000000010101nouzx  21        remainder of infinity by zero
s 11111111 100000000000010110nouzx  22        remainder of infinity by UN
s 11111111 100000000000010111nouzx  23        remainder of infinity by OV
s 11111111 100000000000011000nouzx  24        remainder of OV
s 11111111 100000000000011001nouzx  25        remainder of OV by zero
s 11111111 100000000000011010nouzx  26        remainder of OV by UN
s 11111111 100000000000011011nouzx  27        remainder of OV by OV
-                                   28-31     no specific meaning
[072] In Table 5, "OV" refers to an operand in the overflow format 64, "UN" refers to an operand in the underflow format 61 and "infinity" refers to an operand in the infinity format 65. Further, it will be assumed that the above listed formats represent thirty-two bit values. Those skilled in the art will readily appreciate that such formats are easily extended to, for example, sixty- four bit values or values represented in other numbers of bits.
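The thirty-two bit NaN layout of Table 5 can be sketched as a pair of pack and unpack helpers. This is an illustrative Python sketch only; the function names and the abbreviated `NAN_CAUSE` dictionary are hypothetical, and the leading one in the fraction field is taken from the bit patterns listed in the table.

```python
# A few of the Table 5 codes (BEV of bits f_lsb+9 ... f_lsb+5); the
# dictionary is deliberately partial and exists only for illustration.
NAN_CAUSE = {
    2: "infinity minus infinity",
    4: "zero times infinity",
    8: "zero divided by zero",
    12: "square root of less than zero",
    17: "remainder by zero",
}


def make_nan(cause: int, flags: int, sign: int = 0) -> int:
    """Pack sign, all-ones exponent, leading fraction 1, cause code and flags."""
    return ((sign << 31) | (0xFF << 23) | (1 << 22)
            | ((cause & 0x1F) << 5) | (flags & 0x1F))


def nan_cause(bits: int) -> int:
    """Extract the 5-bit cause code from bits f_lsb+9 ... f_lsb+5."""
    return (bits >> 5) & 0x1F


def nan_flags(bits: int) -> int:
    """Extract the five flag bits n, o, u, z, x from bits f_lsb+4 ... f_lsb."""
    return bits & 0x1F


if __name__ == "__main__":
    # "Infinity minus infinity" with only the invalid operation flag n set.
    value = make_nan(cause=2, flags=0b10000)
    print(f"{value:032b}", NAN_CAUSE.get(nan_cause(value), "no specific meaning"))
```

Keeping the cause code and the flags in separate five-bit groups means either can be read with a shift and a mask, without consulting any state outside the operand.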
[073] It will also be appreciated that, by including the floating point status information relating to a floating point operation embedded within the result generated for the operation, the implicit serialization required by maintaining the floating point status information separate and apart therefrom can be advantageously obviated.
[074] With this background, exemplary functional units 41-47 for use in connection with exemplary floating point operand data structure formats 60 through 66 will be described in detail in connection with FIGS. 5 through 11. Before proceeding to those descriptions, it will be convenient to define certain terms. The term +OV generally represents a value in the overflow pattern with the sign bit "s" having the value "zero," indicating a positive value. The term -OV generally represents a value in the overflow pattern with the sign bit "s" having the value "one," indicating a negative value. The term +UN generally represents a value in the underflow pattern with the sign bit "s" having the value "zero," indicating a positive value. Finally, the term -UN generally represents a value in the underflow pattern with the sign bit "s" having the value "one," indicating a negative value.
[075] Additionally, the following table defines certain other finite nonzero numbers, namely +TINY, -TINY, +HUGE, and -HUGE, as follows:
TABLE 6
0 00000000 00000000000000000000010 +TINY
1 00000000 00000000000000000000010 -TINY
0 11111110 11111111111111111111110 +HUGE
1 11111110 11111111111111111111110 -HUGE
[076] The numbers +TINY, -TINY, +HUGE, and -HUGE are helpful in further defining +OV, -OV, +UN, and -UN. For example, +OV can be deemed to refer to "some (or any) value that is strictly between +HUGE and +∞" and +UN can be deemed to refer to "some (or any) value that is strictly between +0 and +TINY." Similarly, -OV can be deemed to refer to "some (or any) value that is strictly between -HUGE and -∞" and -UN can be deemed to refer to "some (or any) value that is strictly between -0 and -TINY." These names will be used in the following description to aid in the description of how the exemplary functional units 41-47 operate and process a floating point operand having status information embedded within it.
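The seven formats 60 through 66 can be told apart from the bit pattern alone. The following is a minimal sketch in Python for thirty-two bit values, assuming the field layout described above; the function name `classify` and the constant names are hypothetical and appear nowhere in the described embodiment.

```python
# Bit patterns taken from the format descriptions and Table 6 (positive sign).
PLUS_UN = 0x00000001    # underflow format 61: exponent 0, fraction 0...01
PLUS_TINY = 0x00000002  # Table 6: smallest ordinary denormalized value
PLUS_HUGE = 0x7F7FFFFE  # Table 6: largest ordinary normalized value
PLUS_OV = 0x7F7FFFFF    # overflow format 64: exponent 11111110, fraction all ones


def classify(bits: int) -> str:
    """Name the format of a 32-bit pattern, ignoring the sign bit."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        if fraction == 0:
            return "zero"
        if fraction == 1:
            return "underflow"
        return "denormalized"
    if exponent == 0xFF:
        # Bits f_msb ... f_lsb+5 all zero means infinity; the low five are flags.
        return "infinity" if (fraction >> 5) == 0 else "NaN"
    if exponent == 0xFE and fraction == 0x7FFFFF:
        return "overflow"
    return "normalized"


if __name__ == "__main__":
    for name, bits in [("+UN", PLUS_UN), ("+TINY", PLUS_TINY),
                       ("+HUGE", PLUS_HUGE), ("+OV", PLUS_OV)]:
        print(name, f"{bits:#010x}", classify(bits))
```

In this layout +UN and +OV each occupy a single bit pattern adjacent to +TINY and +HUGE respectively, which is consistent with reading them as "some value strictly between" the neighboring representable values.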
Adder Unit 41
[077] An exemplary adder unit 41 performs several types of operations, which may include, but are not limited to, addition of two operands, negation of one operand, subtraction of one operand from another operand, and absolute value of one operand. Generally, the negation operation is not affected by the rounding mode, and the result is always a copy of the operand with its sign reversed, even if the operand is in the NaN format 66. In the subtraction operation, the adder unit 41 generates the result as the sum of the operand that is the minuend and the negative of the operand that is the subtrahend, the negative of the subtrahend being the result of the negation operation. Effectively, the adder unit 41 performs a subtraction operation by performing a negation operation on the operand that is the subtrahend and an addition operation in connection with the operand that is the minuend and the result of the negation operation. The absolute value operation is also not affected by the rounding mode, and the result is a copy of the operand with its sign made positive, even if the operand is a NaN.
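The sign-only behavior of negation and absolute value, and the reduction of subtraction to negation followed by addition, can be sketched on thirty-two bit patterns. This Python sketch is illustrative only: the helper names are hypothetical, and `plain_add` is an ordinary single-precision addition used as a stand-in for the addition operation of adder unit 41, whose actual results are defined by FIG. 5 and are not reproduced here.

```python
import struct

SIGN_BIT = 0x80000000


def negate(bits: int) -> int:
    """Copy of the operand with its sign reversed (also applies to NaN patterns)."""
    return bits ^ SIGN_BIT


def absolute(bits: int) -> int:
    """Copy of the operand with its sign made positive."""
    return bits & ~SIGN_BIT & 0xFFFFFFFF


def subtract(minuend: int, subtrahend: int, add) -> int:
    """Subtraction expressed as addition of the negated subtrahend."""
    return add(minuend, negate(subtrahend))


def plain_add(x_bits: int, y_bits: int) -> int:
    """Stand-in adder for ordinary finite values only (not the FIG. 5 rules)."""
    x = struct.unpack("<f", struct.pack("<I", x_bits))[0]
    y = struct.unpack("<f", struct.pack("<I", y_bits))[0]
    return struct.unpack("<I", struct.pack("<f", x + y))[0]


if __name__ == "__main__":
    five, three = 0x40A00000, 0x40400000  # bit patterns of 5.0 and 3.0
    print(f"{subtract(five, three, plain_add):#010x}")
    print(f"{negate(0x7FC00050):#010x}")  # flags and cause code are untouched
```

Because both operations touch only the sign bit, any flags or other information embedded in the fraction field pass through unchanged.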
[078] Generally, results generated by the adder unit 41 in connection with an addition operation are described in FIG. 5. In the table depicted in FIG. 5, "+P" or "+Q" means any finite positive nonzero representable value other than +UN and +OV. The label "-P" or "-Q" means any finite negative nonzero representable value other than -UN and -OV. Additionally, "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
[079] Furthermore, the following is a key to symbols in the table depicted in FIG. 5:
(a) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands.
(b) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with bit pattern 01001 (to indicate overflow and inexact conditions).
(c) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand. If the other operand is -UN or +UN, it is intentional in the present embodiment that the low five bits of the -∞ operand not be OR-ed with 00101 to indicate underflow and inexact conditions.
(d) For "round toward positive infinity," the result is +∞; for "round toward negative infinity," the result is -∞. In either of these two cases, the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands. For all other rounding modes, the result is the positive NaN value 0 11111111 1000000000000000101 ouzx (to indicate "infinity minus infinity" with the invalid operation flag set). The four least significant bits f_lsb+3 ... f_lsb of the fraction field of the result are the bitwise OR of the four least significant bits f_lsb+3 ... f_lsb of the fraction fields of the two operands.
(e) The result is a copy of the NaN operand, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands.
(f) For "round toward plus infinity," the result is the same as if -OV were replaced by -HUGE and +UN were replaced by +TINY, i.e., the result will be 1 11111110 11111111111111111111110. For all other rounding modes, the result is -OV.
(g) For "round toward plus infinity," the result is the same as if -OV were replaced by -HUGE. For all other rounding modes, the result is -OV.
(h) For "round toward plus infinity," the result is +OV. For "round toward minus infinity," the result is -OV. For all other rounding modes, the result is the positive NaN value 0 11111111 10000000000000001111001, which indicates "OV minus OV" with the invalid operation "n," overflow "o," and inexact "x" flags set.
(i) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
(j) The result is a copy of the NaN operand, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the NaN operand with 01001 (to indicate overflow and inexact conditions).
(k) As computed in accordance with IEEE Std. 754, except that the result is -OV if overflow occurs or if the rounding mode is "round toward minus infinity" and the mathematical sum is less than -HUGE.
(l) For "round toward plus infinity," the result is the same as if -UN were replaced by -0; for "round toward minus infinity," the result is the same as if -UN were replaced by -TINY. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(m) For "round toward plus infinity," the result is the same as if +UN were replaced by +TINY. For "round toward minus infinity," the result is the same as if +UN were replaced by +0. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(n) As computed in accordance with IEEE Std. 754. If IEEE Std. 754 would compute the result as 1 00000000 00000000000000000000001, then an embodiment of the present invention would do the same. However, the embodiment of the present invention calls this -UN and considers it to be underflow. If IEEE Std. 754 would compute the result as 0 00000000 00000000000000000000001, then an embodiment of the present invention would do the same. However, the embodiment of the present invention calls this +UN and considers it to be underflow.
(o) For "round toward minus infinity," the result is the same as if +OV were replaced by +HUGE. For all other rounding modes, the result is +OV.
(p) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand. If the other operand is -UN or +UN, it is intentional in this embodiment that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the +∞ operand not be OR-ed with 00101 to indicate underflow and inexact conditions.
(q) The result is a copy of the NaN operand.
(r) For "round toward minus infinity," the result is the same as if each -UN were replaced by -TINY (i.e., the result will be -2*TINY). For all other rounding modes, the result is -UN.
(s) For "round toward minus infinity," the result is -UN. For all other rounding modes, the result is +UN.
(t) For "round toward minus infinity," the result is the same as if +OV were replaced by +HUGE and -UN were replaced by -TINY, i.e., the result will be 0 11111110 11111111111111111111110. For all other rounding modes, the result is +OV.
(u) The result is a copy of the NaN operand, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the NaN operand with 00101 (to indicate underflow and inexact conditions).
(v) For "round toward minus infinity," the result is -0. For all other rounding modes, the result is +0. This is in accordance with IEEE Std. 754.
(w) For "round toward plus infinity," the result is the same as if each +UN were replaced by +TINY, i.e., the result will be +2*TINY. For all other rounding modes, the result is +UN.
(x) As computed in accordance with IEEE Std. 754, except that if overflow occurs, or if the rounding mode is "round toward plus infinity" and the mathematical sum is greater than +HUGE, the result is +OV.
(y) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands.
(z) The result is a copy of the NaN operand that has the larger value in its fraction field, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the operands. Additionally, the sign bit "s" of the result is "one" if and only if the sign bits of the two NaN operands are both "ones."
[080] It will be appreciated by one with skill in the art that, with exemplary adder unit 41 operating according to the table depicted in FIG. 5, addition is commutative, even in those cases where one or both operands are NaN values.
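The NaN-plus-NaN rule of key item (z) and the commutativity noted above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names add_two_nans and decode_flags are hypothetical, the five low fraction bits are taken in the order n, o, u, z, x (invalid operation, overflow, underflow, division by zero, inexact) as implied by the patterns 01001, 00101, and 11001 used in this description, and the inputs are assumed to be valid NaN words.

```python
SIGN_MASK = 0x80000000
EXPONENT_ALL_ONES = 0x7F800000
FRACTION_MASK = 0x007FFFFF
FLAG_MASK = 0b11111  # the five least significant fraction bits

FLAG_NAMES = ["invalid", "overflow", "underflow", "division by zero", "inexact"]

def add_two_nans(a: int, b: int) -> int:
    """Combine two NaN words as described for key item (z) of FIG. 5."""
    # Start from the fraction field that has the larger value.
    larger = max(a & FRACTION_MASK, b & FRACTION_MASK)
    # Low five bits of the result are the OR of both operands' low five bits.
    flags = (a | b) & FLAG_MASK
    fraction = (larger & ~FLAG_MASK) | flags
    # Sign bit is one only if both operand sign bits are one.
    sign = a & b & SIGN_MASK
    return sign | EXPONENT_ALL_ONES | fraction

def decode_flags(word: int) -> list:
    """Name the status flags set in the low five bits, in n, o, u, z, x order."""
    bits = word & FLAG_MASK
    return [name for i, name in enumerate(FLAG_NAMES) if bits & (1 << (4 - i))]

if __name__ == "__main__":
    a = 0x7FC00009  # positive NaN, low bits 01001
    b = 0xFFC00105  # negative NaN, low bits 00101
    result = add_two_nans(a, b)
    print(hex(result), decode_flags(result))
```

Every step (maximum, bitwise OR, bitwise AND) is symmetric in its two inputs, which is why swapping the operands cannot change the result.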
Multiplier Unit 42
[081] Generally, results generated by an exemplary multiplier 42 are described in the table depicted in FIG. 6. In the table of FIG. 6, the term "+P" or "+Q" means any finite positive representable value greater than "one," other than +OV. The term "-P" or "-Q" means any finite negative representable value less than negative-one, other than -OV. The term "+R" or "+S" means any positive non-zero value less than "one," other than +UN. The term "-R" or "-S" means any negative non-zero representable value greater than negative-one, other than -UN. Finally, "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
[082] Furthermore, the following is a key to symbols in the table depicted in FIG. 6:
(a) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands.
(b) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
(c) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand.
(d) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 00101 (to indicate underflow and inexact conditions).
(e) For "round toward plus infinity," the result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand. For "round toward minus infinity," the result is +0. For all other rounding modes, the result is a positive NaN value 0 11111111 1000000000000001001 ouzx (to indicate "zero times infinity" with the invalid operation flag set), where "ouzx" are the four least significant bits f_lsb+3 ... f_lsb of the fraction field of the infinite operand.
(f) For "round toward plus infinity," the result is -0. For "round toward minus infinity," the result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand. For all other rounding modes, the result is a negative NaN value 1 11111111 1000000000000001001 ouzx (to indicate "zero times infinity" with the invalid operation flag set), where "ouzx" are the four least significant bits f_lsb+3 ... f_lsb of the fraction field of the infinite operand.
(g) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 00101 (to indicate underflow and inexact conditions).
(h) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand.
(i) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
(j) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands.
(k) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands.
(l) For "round toward minus infinity," the result is the same as if -OV were replaced by -HUGE. For all other rounding modes, the result is +OV.
(m) For "round toward plus infinity," the result is +OV. For "round toward minus infinity," the result is +UN; for all other rounding modes, the result is the positive NaN value 0 11111111 1000000000000010111101 (to indicate "UN times OV" with the invalid operation, overflow, underflow, and inexact flags set).
(n) For "round toward plus infinity," the result is -UN. For "round toward minus infinity," the result is -OV. For all other rounding modes, the result is the negative NaN value 1 11111111 10000000000000010111101 (to indicate "UN times OV" with the invalid operations, overflow, underflow, and inexact flags set).
(o) For "round toward plus infinity," the result is the same as if -OV were replaced by -HUGE. For all other rounding modes, the result is -OV.
(p) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are OR-ed with 01001 (to indicate overflow and inexact conditions).
(q) As computed in accordance with IEEE Std. 754, except that if overflow occurs or if the rounding mode is "round toward plus infinity" and the mathematical product is greater than +HUGE, the result is +OV. If underflow occurs and a computation in accordance with IEEE Std. 754 would result in the value +0 or if the rounding mode is "round toward minus infinity" and the mathematical product is less than +TINY, the result is +UN.
(r) For "round toward plus infinity," the result is the same as if -UN were replaced by -TINY. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(s) For "round toward minus infinity," the result is the same as if +UN were replaced by +TINY. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(t) As computed in accordance with IEEE Std. 754, except that if overflow occurs or if the rounding mode is "round toward minus infinity" and the mathematical product is less than -HUGE, the result is -OV. Additionally, if underflow occurs and a computation in accordance with IEEE Std. 754 would provide the result -0 or if the rounding mode is "round toward plus infinity" and the mathematical product is greater than -TINY, the result is -UN.
(u) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative.
(v) For "round toward plus infinity," the result is the same as if +OV were replaced by +HUGE. For all other rounding modes, the result is -OV.
(w) For "round toward minus infinity," the result is the same as if -UN were replaced by -TINY. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(x) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are OR-ed with 00101 (to indicate underflow and inexact conditions).
(y) For "round toward plus infinity," the result is the same as if +UN were replaced by +TINY. For all other rounding modes, the result is as computed in accordance with IEEE Std. 754.
(z) For "round toward minus infinity," the result is the same as if +OV were replaced by +HUGE. For all other rounding modes, the result is +OV.
(@) The result is a copy of the NaN operand that has the larger value in the fraction field, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands. Additionally, the sign bit of the result is 1, indicating a negative result, if and only if the sign bits of the two NaN operands differ.
[083] It will be appreciated by one skilled in the art that, with the exemplary multiplier unit 42 operating according to the table depicted in FIG. 6, multiplication is commutative, even in those cases where one or both operands are NaN values.
Divider Unit 43
[084] As noted above, the exemplary divider unit 43 can perform two types of operations. A division operation is one in which the result Q=x/y, where "x" and "y" are operands. A remainder operation is one in which the result REM(x,y)=x-(y*n), where "n" is the integer nearest to the value x/y.
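For ordinary finite operands, the remainder operation can be illustrated with the following sketch, which takes the remainder as x - y*n with n the integer nearest to x/y, as in IEEE Std. 754. The function name naive_rem is hypothetical. Two assumptions are worth stating: ties in choosing n are broken toward the even integer (the IEEE Std. 754 convention, which Python's built-in round also follows), and the quotient x/y is computed in floating point, so this naive version is only reliable for small, exactly representable cases. Python's math.remainder is used as a reference.

```python
import math

def naive_rem(x: float, y: float) -> float:
    """Illustrative remainder: x - y*n, with n the integer nearest to x/y."""
    n = round(x / y)  # round() breaks ties toward the even integer
    return x - y * n

if __name__ == "__main__":
    for x, y in [(5.0, 3.0), (7.0, 2.0), (-5.0, 3.0), (10.0, 4.0)]:
        print(x, y, naive_rem(x, y), math.remainder(x, y))
```

Note that the sign of the result need not match the sign of the first operand; for example, the remainder of 5 by 3 is negative because the nearest integer to 5/3 is 2.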
[085] Generally, results generated by the exemplary divider 43 in connection with division operations are described in the table depicted in FIG. 7 and results generated by the exemplary divider 43 in connection with remainder operations are described in the table depicted in FIG. 8. In one or both of those tables, the term "+P" or "+Q" means any finite positive representable value greater than "one," other than +OV. The term "-P" or "-Q" means any finite negative representable value less than negative-one, other than -OV. The term "+R" or "+S" means any positive non-zero value less than "one," other than +UN. The term "-R" or "-S" means any negative non-zero representable value greater than negative-one, other than -UN. Finally, the term "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
[086] Furthermore, the following is a key to symbols in the table depicted in FIG. 7 (division operations):
(a) For "round toward plus infinity," the result is +∞ with the five least significant bits f_lsb+4 ... f_lsb of the fraction field being equal to the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands. For "round toward minus infinity," the result is +0. For all other rounding modes, the result is a positive NaN value 0 11111111 1000000000000001001 ouzx (to indicate "infinity divided by infinity" with the invalid operation flag set), where "ouzx" is the bitwise OR of the four least significant bits f_lsb+3 ... f_lsb of the fraction fields of the operands.
(b) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
(c) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand.
(d) The result is +∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 00101 (to indicate underflow and inexact conditions).
(e) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being equal to the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand.
(f) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 00101 (to indicate underflow and inexact conditions).
(g) The result is -∞, with the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result being the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 01001 (to indicate overflow and inexact conditions).
(h) For "round toward plus infinity," the result is -0. For "round toward minus infinity," the result is -∞ with the five least significant bits f_lsb+4 ... f_lsb of the fraction field being equal to the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands. For all other rounding modes, the result is a negative NaN value 1 11111111 1000000000000001001 ouzx (to indicate "infinity divided by infinity" with the invalid operation flag set), where "ouzx" is the bitwise OR of the four least significant bits f_lsb+3 ... f_lsb of the fraction fields of the two operands.
(i) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands.
(j) For "round toward plus infinity," the result is +OV. For "round toward minus infinity," the result is +UN. For all other rounding modes, the result is the positive NaN value 0 11111111 10000000000000101111001 to indicate "OV divided by OV" with the invalid operation, overflow, and inexact flags set.
(k) For "round toward minus infinity," the result is the same as if -OV were replaced by -HUGE. For all other rounding modes, the result is +OV.
(l) The result is the +∞ value 0 11111111 00000000000000000001011 to indicate overflow, division by zero, and inexact.
(m) The result is the -∞ value 1 11111111 00000000000000000001011 to indicate overflow, division by zero, and inexact.
(n) For "round toward plus infinity," the result is the same as if -OV were replaced by -HUGE. For all other rounding modes, the result is -OV.
(o) For "round toward plus infinity," the result is -UN; for "round toward minus infinity," the result is -OV. For all other rounding modes, the result is the negative NaN value 1 11111111 10000000000000101111001 to indicate "OV divided by OV" with the invalid operation, overflow, and inexact flags set.
(p) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits fhb+4 ---fhb of the fraction field of the results are OR-ed with 01001 (to indicate overflow and inexact conditions).
(q) For "round toward plus infinity," the result is the same as if -OV were replaced by -HUGE, except that if underflow occurs and a computation in accordance with IEEE Std. 754 would have the result +0, the result is +UN. For all other rounding modes, the result is +UN.
(r) As computed in accordance with IEEE Std. 754, except that if overflow occurs or if the rounding mode is "round toward plus infinity" and the mathematical quotient is greater than +HUGE, the result is +OV. If underflow occurs and a computation in accordance with IEEE Std. 754 would provide the result +0, or if the rounding mode is "round toward minus infinity" and the mathematical quotient is less than +TINY, the result is +UN.
(s) The result is the +∞ value 0 11111111 00000000000000000000010 to indicate division by zero.
(t) The result is the -∞ value 1 11111111 00000000000000000000010 to indicate division by zero.
(u) As computed in accordance with IEEE Std. 754, except that if overflow occurs, or if the rounding mode is "round toward minus infinity" and the mathematical quotient is less than -HUGE, the result is -OV. If underflow occurs and a computation in accordance with IEEE Std. 754 would provide the result -0, or if the rounding mode is "round toward plus infinity" and the mathematical quotient is greater than -TINY, the result is -UN.
(v) For "round toward minus infinity," the result is the same as if +OV were replaced by +HUGE, except that if underflow occurs and a computation in accordance with IEEE Std. 754 would have result -0, the result is -UN. For all other rounding modes, the result is -UN.
(w) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative.
(x) For "round toward minus infinity," the result is the same as if -UN were replaced by -TINY, except that if overflow occurs, the result is +OV. For all other rounding modes, the result is +OV.
(y) For "round toward plus infinity," the result is the same as if +UN were replaced by +TINY, except that if overflow occurs, the result is -OV. For all other rounding modes, the result is -OV.
(z) For "round toward plus infinity," the result is the same as if -UN were replaced by -TINY. For all other rounding modes, the result is +UN.
(1) For "round toward plus infinity," the result is +OV; for "round toward minus infinity," the result is +UN. For all other rounding modes, the result is the positive NaN value 0 11111111 10000000000000101010101 to indicate "UN divided by UN" with the invalid operation, underflow, and inexact flags set.
(2) The result is the +∞ value 0 11111111 00000000000000000000111 to indicate underflow, division by zero, and inexact conditions.
(3) The result is the -∞ value 1 11111111 00000000000000000000111 to indicate underflow, division by zero, and inexact conditions.
(4) For "round toward plus infinity," the result is -UN; for "round toward minus infinity," the result is -OV. For all other rounding modes, the result is the negative NaN value 1 11111111 10000000000000101010101 to indicate "UN divided by UN" with the invalid operation, underflow, and inexact flags set.
(5) For "round toward minus infinity," the result is the same as if -UN were replaced by -TINY. For all other rounding modes, the result is -UN.
(6) The result is a copy of the NaN operand, except that its sign is reversed if the other operand is negative, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are OR-ed with 00101 to indicate underflow and inexact conditions.
(7) For "round toward plus infinity," the result is +∞ with the five least significant bits f_lsb+4 ... f_lsb of the fraction field all having the value zero; for "round toward minus infinity," the result is +0. For all other rounding modes, the result is the positive NaN value 0 11111111 10000000000000100010000 to indicate "zero divided by zero" with the invalid operation flag set.
(8) For "round toward plus infinity," the result is -0. For "round toward minus infinity," the result is -∞ with the five least significant bits f_lsb+4 ... f_lsb of the fraction field all having the value zero. For all other rounding modes, the result is the negative NaN value 1 11111111 1000000000000100010000 to indicate "zero divided by zero" with the invalid operation flag set.
(9) For "round toward minus infinity," the result is the same as if +UN were replaced by +TINY. For all other rounding modes, the result is -UN.
(@) For "round toward plus infinity," the result is the same as if +UN were replaced by +TINY. For all other rounding modes, the result is +UN.
(#) For "round toward minus infinity," the result is the same as if -OV were replaced by -HUGE, except that if underflow occurs and a computation in accordance with IEEE Std. 754 would have the result value -0, the result is -UN. For all other rounding modes, the result is -UN.
($) For "round toward plus infinity," the result is the same as if -UN were replaced by -TINY, except that if overflow occurs, the result is -OV. For all other rounding modes, the result is -OV.
(%) For "round toward minus infinity," the result is the same as if +UN were replaced by +TINY, except that if overflow occurs, the result is +OV. For all other rounding modes, the result is +OV.
(^) For "round toward plus infinity," the result is the same as if +OV were replaced by +HUGE, except that if underflow occurs and a computation in accordance with IEEE Std. 754 would have the result +0, the result is +UN. For all other rounding modes, the result is +UN.
(&) For "round toward plus infinity," the result is the same as if +OV were replaced by +HUGE. For all other rounding modes, the result is -OV.
(*) For "round toward minus infinity," the result is the same as if +OV were replaced by +HUGE; for all other rounding modes, the result is +OV.
(~) The result is a copy of the NaN operand that has the larger value in its fraction field, except that the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the operands. Additionally, the sign bit of the result is 1 if and only if the sign bits of the two NaN operands differ.
[087] Preliminarily, remainder operations are not affected by the rounding mode. The following is a key to symbols in the table depicted in FIG. 8 (remainder operations):
(a) The result is a NaN value s 11111111 1000000000000101001 ouzx (to indicate "remainder of infinity" with the invalid operation flag set), where "ouzx" is the bitwise OR of the four least significant bits f_lsb+3 ... f_lsb of the fraction fields of the operands. The sign of the result is the same as the sign of the first operand.
(b) The result is a NaN value s 11111111 100000000000010111 nouzx (to indicate "remainder of infinity by OV"), where "nouzx" is the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 11001 (to indicate invalid operation, overflow, and inexact conditions). The sign of the result is the same as the sign of the first operand.
(c) The result is a NaN value s 11111111 1000000000000101001 ouzx (to indicate "remainder of infinity" with the invalid operation flag set), where "ouzx" is the low four bits of the infinite operand. The sign of the result is the same as the sign of the first operand.
(d) The result is a NaN value s 11111111 1000000000000010110nouzx (to indicate "remainder of infinity by UN"), where "nouzx" is the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 10101 (to indicate invalid operation, underflow, and inexact conditions). The sign of the result is the same as the sign of the first operand.
(e) The result is a NaN value s 11111111 1000000000000101011 ouzx (to indicate "remainder of infinity by zero" with the invalid operation flag set), where "ouzx" is the four least significant bits f_lsb+3 ... f_lsb of the fraction field of the infinite operand. The sign of the result is the same as the sign of the first operand.
(f) The result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction fields of the two operands.
(g) The result is a copy of the first operand (either -OV or +OV). One might expect the result to be a NaN value s 11111111 100000000000011000nouzx (to indicate "remainder of OV"), where "nouzx" is the bitwise OR of the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the infinite operand with 11001 (to indicate invalid operation, overflow, and inexact), and where the sign of the result is the same as the sign of the first operand. However, the actual result is preferably a copy of the first operand, in order to preserve an important numerical identity, which is that REM(x, +Inf) = REM(x, -Inf) = x.
(h) The result is a NaN value s 11111111 10000000000001101111001 (to indicate "remainder of OV by OV" with the invalid operation, overflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(i) The result is a NaN value s 11111111 10000000000001100011001 (to indicate "remainder of OV" with the invalid operation, overflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(j) The result is a NaN value s 11111111 10000000000001101011101 (to indicate "remainder of OV by UN" with the invalid operation, overflow, underflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(k) The result is a NaN value s 11111111 10000000000001100111001 (to indicate "remainder of OV by zero" with the invalid operation, overflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(l) The result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the five least significant bits f_lsb+4 ... f_lsb of the fraction field of the result are OR-ed with 01001 (to indicate overflow and inexact conditions).
(m) The result is a NaN value s 11111111 10000000000001001111001 (to indicate "remainder by OV" with the invalid operation, overflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(n) The result is calculated in accordance with IEEE Std. 754. In this case, the sign of the result is not necessarily the same as the sign of the first operand.
(o) The result is a NaN value s 11111111 10000000000001001010101 (to indicate "remainder by UN" with the invalid operation, underflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(p) The result is a NaN value s 11111111 10000000000001000110000 (to indicate "remainder by zero" with the invalid operation flag set). The sign of the result is the same as the sign of the first operand.
(q) The result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand.
(r) The result is a NaN value s 11111111 10000000000001001111011 (to indicate "remainder by OV" with the invalid operation, overflow, underflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(s) The result is a NaN value s 11111111 10000000000001000110101 (to indicate "remainder by zero" with the invalid operation, underflow, and inexact flags set). The sign of the result is the same as the sign of the first operand.
(t) The result is a copy of the NaN operand, except that the sign of the result is the same as the sign of the first operand, and the low five bits of the result are OR-ed with 00101 (to indicate underflow and inexact conditions).
(u) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the operands.
(v) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are OR-ed with 01001 (to indicate overflow and inexact conditions).
(w) The result is a copy of the NaN operand.
(x) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are OR-ed with 00101 (to indicate underflow and inexact conditions).
(y) The result is a copy of the NaN operand whose fraction field represents the larger value, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the low five bits of each operand and the sign of the result is the same as the sign of the first operand.
Square Root Unit 44
[088] Results generated by an exemplary square root unit are described in the table depicted in FIG. 9. In the table of FIG. 9, the term "+P" means any finite positive nonzero representable value other than +UN and +OV. The term "-P" means any finite negative nonzero representable value other than -UN and -OV. Finally, the term "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
Maximum/Minimum Unit 45
[089] Generally, results generated by an exemplary maximum/minimum unit 45 in connection with a maximum operation, in which the unit 45 determines the maximum of two operands, are described in the table depicted in FIG. 10. Results generated in connection with a minimum operation, in which the unit 45 determines the minimum of two operands, are described in the table depicted in FIG. 11. In those tables, "+P" or "+Q" means any finite positive nonzero representable value other than +UN and +OV. Additionally, "-P" or "-Q" means any finite negative nonzero representable value other than -UN and -OV. Further, "NaN" means any value whose exponent field is 11111111, other than one of the values represented by +∞ and -∞.
[090] In an embodiment of the present invention, neither the maximum operation nor the minimum operation is affected by the rounding mode. In addition, both the maximum operation and the minimum operation are commutative, even when one or both operands are NaN values.
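As an illustration of why the maximum operation stays commutative for two NaN operands, the following Python sketch models the rule that the key to FIG. 10 gives for that case: the NaN with the larger fraction is selected, the five flag bits of both operands are OR-ed together, and the sign is forced to + when the fractions match but the signs differ. It is only a software model under stated assumptions, not the circuitry of unit 45. The function names, the 32-bit word layout (1 sign bit, 8 exponent bits, 23 fraction bits), the comparison of the full 23-bit fraction, and the test that separates NaN from infinity (upper 18 fraction bits not all zero) are assumptions made for this sketch.

```python
SIGN_MASK = 0x80000000   # bit 31
EXP_MASK = 0x7F800000    # bits 30..23, all ones for infinity and NaN
FRAC_MASK = 0x007FFFFF   # bits 22..0
FLAG_MASK = 0x0000001F   # low five fraction bits: n, o, u, z, x


def is_nan(word):
    """Assumed NaN test: exponent all ones, upper 18 fraction bits not all zero."""
    upper = word & FRAC_MASK & ~FLAG_MASK
    return (word & EXP_MASK) == EXP_MASK and upper != 0


def max_of_nans(a, b):
    """Model of the two-NaN case of the maximum operation (hypothetical helper)."""
    assert is_nan(a) and is_nan(b)
    fa, fb = a & FRAC_MASK, b & FRAC_MASK
    merged_flags = (fa | fb) & FLAG_MASK      # OR of both operands' flags
    if fa > fb:
        chosen = a                            # larger fraction wins
    elif fb > fa:
        chosen = b
    elif (a & SIGN_MASK) == (b & SIGN_MASK):
        chosen = a                            # identical operands
    else:
        chosen = a & ~SIGN_MASK               # same fraction, signs differ: sign is +
    return (chosen | merged_flags) & 0xFFFFFFFF


# Negative NaN with overflow+inexact, positive NaN with underflow+inexact.
a, b = 0xFFC00009, 0x7FC00025
print(hex(max_of_nans(a, b)), hex(max_of_nans(b, a)))
```

Because the selection depends only on the fractions and signs, and the flag merge is a bitwise OR, swapping the operands cannot change the result.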
[091] The following is a key to symbols in the table depicted in FIG. 10 (the maximum operation):
(a) The result is -∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(b) The result is +∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the +∞ operand. The other operand does not affect the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result.
(c) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(d) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the NaN operand with 01001 (to indicate overflow and inexact conditions).
(e) The result is a copy of whichever operand has the smaller magnitude.
(f) The result is a copy of the NaN operand.
(g) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the NaN operand with 00101 (to indicate underflow and inexact conditions).
(h) The result is a copy of whichever operand has the larger magnitude.
(i) The result is +∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(j) The result is a copy of whichever NaN operand has the larger fraction, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands. The sign of the result is the sign of whichever operand has the larger fraction but, if the fractions of the two NaN operands are the same and their signs differ, the sign of the result is +.
[092] The following is a key to symbols in the table depicted in FIG. 11 (the minimum operation):
(a) The result is -∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(b) The result is -∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being equal to the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the -∞ operand. The other operand does not affect the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result.
(c) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(d) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the NaN operand with 01001 (to indicate overflow and inexact conditions).
(e) The result is a copy of whichever operand has the larger magnitude.
(f) The result is a copy of the NaN operand.
(g) The result is a copy of the NaN operand, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the NaN operand with 00101 (to indicate underflow and inexact conditions).
(h) The result is a copy of whichever operand has the smaller magnitude.
(i) The result is +∞ with the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result being the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands.
(j) The result is a copy of whichever NaN operand has the larger (not smaller) fraction, except that the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction field of the result are the bitwise OR of the five least significant bits $f_{lsb+4} \ldots f_{lsb}$ of the fraction fields of the two operands. The sign of the result is the sign of whichever operand has the larger fraction. However, if the fractions of the two NaN operands are the same and their signs differ, the sign of the result is -.
Comparator Unit 46
[093] As noted above, exemplary comparator unit 46 receives two operands and generates a result that indicates whether the operands are equal, and, if not, which operand is larger. Generally, the comparison is not affected by the rounding mode. In a comparison operation consistent with an embodiment of the present invention:
(i) two positive operands in the infinity format 65 are equal regardless of the values of the flags "nouzx";
(ii) a positive operand in the infinity format 65 is greater than an operand in any other format, except for an operand that is in the NaN format 66;
(iii) two negative operands in the infinity format 65 are equal regardless of the values of the flags "nouzx";
(iv) a negative operand in the infinity format 65 is less than an operand in any other format, except for an operand that is in the NaN format 66;
(v) an operand in the NaN format 66 is unordered (i.e., neither greater than, less than, nor equal to) another operand in any format, including another operand in the NaN format 66; and
(vi) operands in formats other than the infinity format 65 and NaN format 66 compare in accordance with IEEE Std. 754.
Thus, +UN is greater than +0 and less than +TINY; +OV is greater than +HUGE and less than +∞; and so on.
Tester Unit 47
[094] A tester unit 47 receives a single floating-point operand and determines whether it has one of a selected set of status conditions. Based upon the determination, the tester unit 47 produces a signal to indicate whether or not the selected condition holds. In one embodiment of the present invention, the conditions include:
(i) the operand is in the NaN format 66;
(ii) the operand is in the infinity format 65;
(iii) the operand is in either the infinity format 65 or the NaN format 66;
(iv) the operand is in the overflow format 64;
(v) the operand is in the overflow format 64 or contains a set overflow flag "o";
(vi) the operand is in the underflow format 61;
(vii) the operand is in the underflow format 61 or contains a set underflow flag "u";
(viii) the operand is in the zero format 60;
(ix) the operand is in the zero format 60 and the sign bit "s" is "zero" (representing value +0);
(x) the operand is in the zero format 60 and the sign bit "s" is "one" (representing value -0);
(xi) the operand represents a non-zero number whose magnitude is less than $2^{-126}$;
(xii) the operand contains a set invalid-operation flag "n";
(xiii) the operand contains a set divide-by-zero flag "z";
(xiv) the operand represents a numerical value strictly between -OV and +OV;
(xv) the operand represents a numerical value strictly between -OV and +OV but is not +UN or -UN;
(xvi) the operand is in any format 60 through 66 and contains a sign bit "s" that is "one"; and
(xvii) all 11 non-trivial disjunctions ("OR") of subsets of:
a = (the operand is in overflow format 64 or contains a set overflow flag "o")
b = (the operand is in underflow format 61 or contains a set underflow flag "u")
c = (the operand contains a set invalid operation flag "n")
d = (the operand contains a set divide-by-zero flag "z")
In other words, the disjunctions are (a OR b), (a OR c), (a OR d), (b OR c), (b OR d), (c OR d), (a OR b OR c), (a OR b OR d), (a OR c OR d), (b OR c OR d) and (a OR b OR c OR d).
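The count of 11 in condition (xvii) can be checked by enumerating every subset of {a, b, c, d} that has at least two members (6 pairs, 4 triples, and 1 quadruple). The Python sketch below does this and evaluates each disjunction on a 32-bit operand word. It is a software model for illustration only. The helper names and the bit patterns assumed for the overflow format 64 (exponent 11111110, fraction all ones) and the underflow format 61 (exponent 00000000, fraction 00...01) are assumptions of this sketch, as is the reading that flag bits are present only when the exponent field is all ones.

```python
from itertools import combinations

FRAC_MASK = 0x007FFFFF
FLAG_N, FLAG_O, FLAG_U, FLAG_Z, FLAG_X = 0x10, 0x08, 0x04, 0x02, 0x01


def exponent(word):
    return (word >> 23) & 0xFF


def has_flag(word, flag):
    """Flags are assumed to exist only in the infinity and NaN formats."""
    return exponent(word) == 0xFF and (word & flag) != 0


def in_overflow_format(word):
    # Assumed encoding of +OV / -OV: exponent 11111110, fraction all ones.
    return exponent(word) == 0xFE and (word & FRAC_MASK) == FRAC_MASK


def in_underflow_format(word):
    # Assumed encoding of +UN / -UN: exponent 00000000, fraction 00...01.
    return exponent(word) == 0x00 and (word & FRAC_MASK) == 1


BASE = {
    "a": lambda w: in_overflow_format(w) or has_flag(w, FLAG_O),
    "b": lambda w: in_underflow_format(w) or has_flag(w, FLAG_U),
    "c": lambda w: has_flag(w, FLAG_N),
    "d": lambda w: has_flag(w, FLAG_Z),
}


def disjunctions():
    """Build every OR of two or more of the base conditions a, b, c, d."""
    table = {}
    for size in range(2, 5):
        for names in combinations("abcd", size):
            label = " OR ".join(names)
            table[label] = lambda w, names=names: any(BASE[n](w) for n in names)
    return table


tests = disjunctions()
print(len(tests))                    # number of non-trivial disjunctions
print(tests["a OR b"](0x7F7FFFFF))   # +OV under the assumed encoding
print(tests["c OR d"](0x7F800002))   # +infinity carrying the "z" flag
```

Binding `names` as a default argument keeps each lambda tied to its own subset rather than to the loop variable's final value.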
[095] The exemplary tester unit 47 can generate one or more result signals representing each of those conditions, which signals will be provided to the control unit 50. In one embodiment, the result signal is provided directly back to control unit 50. In an alternative embodiment, the tester unit 47 provides the result signal on a result bus to register 51 for future access by the control unit 50. In another embodiment, the control unit 50 can select one of these signals and use the value of the selected signal to control the future behavior of the program. For example, the control unit 50 may control a functional unit's operation on one or more operands and use the result of the operation to complete processing of a conditional floating point operation. Examples of conditional floating point operations include but are not limited to conditional trap instructions and conditional branch instructions, whose multiple possible outcomes depend dynamically on the results of the conditional operations being processed.
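A rough software analogy for the conditional trap and conditional branch behavior just described is sketched below in Python. The decision is taken from the flag bits embedded in the operand itself, with no separate status register consulted. The names `embedded_flags`, `trap_if`, `branch_on`, and `FloatingPointTrap` are hypothetical, and the sketch assumes a 32-bit word in which the flags "n", "o", "u", "z", and "x" occupy the five least significant fraction bits of operands whose exponent field is all ones.

```python
FLAG_BITS = {"n": 0x10, "o": 0x08, "u": 0x04, "z": 0x02, "x": 0x01}


class FloatingPointTrap(Exception):
    """Raised by the hypothetical conditional trap below."""


def embedded_flags(word):
    """Return the set of flag letters carried by an infinity or NaN operand."""
    if ((word >> 23) & 0xFF) != 0xFF:
        return set()                 # other formats are assumed to carry no flags
    return {name for name, bit in FLAG_BITS.items() if word & bit}


def trap_if(word, selected):
    """Conditional trap: raise if any selected flag is set, else pass the operand on."""
    hit = embedded_flags(word) & set(selected)
    if hit:
        raise FloatingPointTrap("flags set: " + "".join(sorted(hit)))
    return word


def branch_on(word, selected):
    """Conditional branch: report which path would be followed."""
    return "taken" if embedded_flags(word) & set(selected) else "not taken"


# NaN pattern given above for "remainder by zero" with only the invalid flag set,
# shown here with a positive sign: 0 11111111 10000000000001000110000.
rem_by_zero = 0x7FC00230
print(branch_on(rem_by_zero, "nz"))
print(branch_on(rem_by_zero, "ou"))
```

Since each operand carries its own flags, two such checks on different operands do not interfere with one another, which is the property the status-register approach lacks.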
[096] It will be appreciated that a number of modifications may be made to the exemplary floating point unit 40 described above. For example, the exemplary floating point unit 40 may include additional or fewer functional units than those described herein. In addition, although reference has been made to circuits described in other patent applications as being useful in the floating point unit 40, it will be appreciated that other circuits may be used to implement the principles of the present invention. Furthermore, although particular floating point status flags "n," "o," "u," "z" and "x" have been indicated as being provided, it will be appreciated that not all flags need to be provided, and that other flags, representing other conditions, may be used in addition or instead.
[097] The foregoing description has been limited to one or more specific embodiments of this invention. It will be appreciated that a system in accordance with the invention can be constructed in whole or in part from special purpose hardware or a general purpose computer system, or any combination thereof. Data structures used by such a system and described herein are illustrative and may be implemented in a variety of different ways in accordance with alternative embodiments of the present invention. Any portion of the system may also be controlled by a suitable program. Any program controlling such a system may, in whole or in part, comprise part of or be stored on the system in a conventional manner. Further, the program may, in whole or in part, be provided to the system over a network or other mechanism for transferring information in a conventional manner. In addition, it will be appreciated that the system may be operated and/or otherwise controlled by means of information provided by an operator using operator input elements (not shown) which may be connected directly to the system or which may transfer the information to the system over a network or other mechanism for transferring information in a conventional manner.
[098] Furthermore, those skilled in the art will appreciate that variations and modifications may be made to the invention, with the attainment of some or all of the advantages of the invention.
[099] Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention as defined solely by the following appended claims. Therefore, a true scope and spirit of the invention is indicated by the following claims.

Claims

WHAT IS CLAIMED IS:
1. A floating point operand data structure used in floating point computations and processing within a processing device, comprising: a first portion of the data structure having floating point operand data; and a second portion of the data structure having embedded status information associated with at least one status condition of the floating point operand data.
2. The floating point operand data structure of claim 1, wherein the at least one status condition is determined from the embedded status information without regard to memory storage external to the data structure.
3. The floating point operand data structure of claim 2, wherein the memory storage external to the data structure is a floating point status register.
4. The enhanced floating point operand data structure of claim 3, wherein the outcome of a conditional floating point instruction is based on the embedded status information without regard to contents of the floating point status register.
5. The floating point operand data structure of claim 1, wherein the second portion of the data structure comprises at least one bit that is indicative of the at least one status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
6. The floating point operand data structure of claim 5, wherein the overflow status represents one in a group of a +OV status and a -OV status.
7. The floating point operand data structure of claim 6, wherein the overflow status is represented as a predetermined non-infinity numerical value.
8. The floating point operand data structure of claim 5, wherein the underflow status represents one in a group of a +UV status and a -UV status.
9. The floating point operand data structure of claim 5, wherein the underflow status is represented as a predetermined non-zero numerical value.
10. The floating point operand data structure of claim 5, wherein the invalid status represents a not-a-number (NaN) status due to an invalid operation.
11. The floating point operand data structure of claim 10, wherein the second portion of the data structure comprises a plurality of bits indicative of a predetermined type of operand condition resulting in the NaN status.
12. The floating point operand data structure of claim 10, wherein addition, multiplication, maximum and minimum floating point operations on the data structure are commutative.
13. The floating point operand data structure of claim 5, wherein the infinity status represents one in a group of a positive infinity status and a negative infinity status.
14. The floating point operand data structure of claim 1, wherein the second portion of the data structure comprises a plurality of bits indicative of a predetermined type of operand condition resulting in the infinity status.
15. The floating point operand data structure of claim 1, wherein the at least one status condition is associated with at least one floating point operation that generated the enhanced floating point operand data structure.
16. A floating point operand data structure used by a processing device when performing floating point operations, the data structure comprising: a first data field having sign information associated with the floating point operand; a second data field having exponent information associated with the floating point operand; and a third data field having fractional information associated with the floating point operand, wherein at least one of the first data field, the second data field and the third data field further includes embedded status information associated with at least one operand status condition.
17. The floating point operand data structure of claim 16, wherein the at least one operand status condition is determined from the embedded status information without regard to a floating point status register that is separate from an operand memory storage device for maintaining the floating point operand data structure.
18. The floating point operand data structure of claim 17, wherein the outcome of a conditional floating point instruction is based on the embedded status information without regard to contents of the floating point status register.
19. The floating point operand data structure of claim 16, wherein the embedded status information comprises at least one bit that is indicative of the at least one operand status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
20. The floating point operand data structure of claim 16, wherein the at least one operand status condition is indicative of at least one floating point operation that generated the floating point operand data structure.
21. A floating point system associated with a processing device for performing at least one floating point operation on a floating point operand, the system comprising: an operand memory storage device for maintaining the floating point operand; a control unit in communication with the operand memory storage device, the control unit receiving at least one floating point instruction associated with the at least one floating point operation and generating at least one control signal related to the at least one floating point operation; and a first functional processing unit in communication with the operand memory storage device and the control unit, the first functional processing unit capable of processing the floating point operand and storing status information within the processed floating point operand.
22. The floating point system of claim 21, wherein the status information comprises at least one bit that is indicative of an operand status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
23. The floating point system of claim 21, wherein the first functional processing unit is capable of embedding the status information related to the processed floating point operand within predetermined fields of the processed floating point operand.
24. The floating point system of claim 21, wherein the first functional processing unit is capable of providing the processed floating point operand to the operand memory storage device without storing the status information to a separate status memory device.
25. The floating point system of claim 24, wherein the control unit is operative to condition the outcome of the floating point instruction based upon the status information within the processed floating point operand without accessing the separate status memory device.
26. The floating point system of claim 21 further comprising a second functional processing unit in communication with the memory storage device and the control unit, the second functional processing unit being capable of processing a second floating point operand and storing status information related to the second floating point operand while the status information related to the first floating point operand is preserved.
27. A floating point processing system for performing at least one floating point operation on a floating point operand, the system comprising: an operand memory register for maintaining the floating point operand; and a functional processing unit in communication with the operand memory register, the functional processing unit being operative to: receive the floating point operand from the operand memory register, process the floating point operand to determine status information related to the processed floating point operand, and embed the status information within the processed floating point operand.
28. The floating point system of claim 27, wherein the status information comprises at least one bit that is indicative of an operand status condition from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
29. The floating point system of claim 27, wherein the functional processing unit is further operable to embed the status information within at least one predetermined field of the processed floating point operand.
30. The floating point system of claim 27, wherein the functional processing unit is capable of storing the processed floating point operand in the operand memory register without storing the status information in a floating point status register separate from the operand memory register.
31. The floating point system of claim 27 further comprising a control unit in communication with the operand memory register and the functional processing unit, the control unit being operative to condition the outcome of the floating point instruction based only upon the status information within the processed floating point operand.
32. The floating point system of claim 31 further comprising an additional functional processing unit in communication with the operand memory register and the control unit, the additional functional processing unit being capable of concurrently processing an additional floating point operand and storing status information related to the additional floating point operand within the additional floating point operand while the status information related to the other floating point operand is preserved within the other floating point operand.
33. A method of encoding a floating point operand with status information without maintaining the status information in a floating point status register, comprising: determining a status condition of the floating point operand as part of processing the floating point operand in association with a floating point operation; and representing an updated status condition of the floating point operand within the floating point operand.
34. The method of claim 33, wherein the determining step further comprises identifying the status condition and the updated status condition from only embedded status information within the floating point operand.
35. The method of claim 33, wherein the status condition and the updated status condition are from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
36. The method of claim 33, wherein the representing step further comprises embedding updated status information within the floating point operand after execution of the floating point operation, the updated status information representing the updated status condition.
37. The method of claim 33, wherein the updated status condition is indicative of a previous floating point operation that resulted in the floating point operand.
38. The method of claim 33 further comprising conditioning a subsequent floating point operation based only upon the updated status information within the floating point operand.
39. The method of claim 33 further comprising processing an additional floating point operand and representing updated status information related to the additional floating point operand within the additional floating point operand while the updated status information related to the other floating point operand is preserved.
40. The method of any one of claims 33, 34, 35, 36, 37, 38, or 39, wherein a set of instructions are stored on a computer readable media, which when executed, perform the method.
41. A method of encoding a floating point operand with status information related to a status of the floating point operand without maintaining the status information in a floating point status register, comprising: receiving a floating point instruction; accessing a floating point operand to be processed as part of processing the floating point instruction; decoding an initial status condition of the floating point operand from only status information embedded within the floating point operand; and encoding a resulting status condition from execution of the floating point instruction on the floating point operand as updated status information within the floating point operand.
42. The method of claim 41 further comprising conditioning execution of a subsequent floating point instruction based only on the updated status information within the floating point operand.
43. The method of claim 41, wherein the initial status condition and resulting status condition are from the group of an invalid operation status, an overflow status, an underflow status, a division by zero status, an infinity status, and an inexact status.
44. The method of claim 41 further comprising processing an additional floating point operand and encoding an updated status information into the additional floating point operand while the updated status information related to the other floating point operand is preserved.
45. The method of any one of claims 41, 42, 43, or 44, wherein a set of instructions are stored on a computer readable media, which when executed, perform the method.
PCT/US2002/016024 2001-05-25 2002-05-22 Floating point system that represents status flag information within a floating point operand WO2002097607A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US29317301P 2001-05-25 2001-05-25
US60/293,173 2001-05-25
US10/035,747 2001-12-28
US10/035,747 US7395297B2 (en) 2001-05-25 2001-12-28 Floating point system that represents status flag information within a floating point operand

Publications (1)

Publication Number Publication Date
WO2002097607A1 true WO2002097607A1 (en) 2002-12-05

Family

ID=26712460

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2002/016024 WO2002097607A1 (en) 2001-05-25 2002-05-22 Floating point system that represents status flag information within a floating point operand

Country Status (2)

Country Link
US (1) US7395297B2 (en)
WO (1) WO2002097607A1 (en)

Family Cites Families (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US33335A (en) * 1861-09-24 Improved method of setting posts
US194232A (en) * 1877-08-14 Improvement in presses for lard, fruit
US3660189A (en) * 1969-04-28 1972-05-02 Constantine T Troy Closed cell structure and methods and apparatus for its manufacture
US3725649A (en) 1971-10-01 1973-04-03 Raytheon Co Floating point number processor for a digital computer
JPS61288226A (en) 1985-06-17 1986-12-18 Panafacom Ltd External condition control system
US4777613A (en) 1986-04-01 1988-10-11 Motorola Inc. Floating point numeric data processor
US4991131A (en) 1987-10-06 1991-02-05 Industrial Technology Research Institute Multiplication and accumulation device
US5249149A (en) 1989-01-13 1993-09-28 International Business Machines Corporation Method and apparatus for performining floating point division
JPH0792739B2 (en) 1989-05-22 1995-10-09 甲府日本電気株式会社 Floating point data normalization method
US5161117A (en) 1989-06-05 1992-11-03 Fairchild Weston Systems, Inc. Floating point conversion device and method
US5307303A (en) 1989-07-07 1994-04-26 Cyrix Corporation Method and apparatus for performing division using a rectangular aspect ratio multiplier
US5065352A (en) 1989-08-16 1991-11-12 Matsushita Electric Industrial Co., Ltd. Divide apparatus employing multiplier with overlapped partial quotients
US5365465A (en) 1991-12-26 1994-11-15 Texas Instruments Incorporated Floating point to logarithm converter
EP0572695A1 (en) 1992-06-03 1993-12-08 International Business Machines Corporation A digital circuit for calculating a logarithm of a number
WO1994006075A1 (en) 1992-08-31 1994-03-17 Fujitsu Limited Method and apparatus for non-numeric character discrimination
US5357237A (en) 1992-09-04 1994-10-18 Motorola, Inc. In a data processor a method and apparatus for performing a floating-point comparison operation
US5347482A (en) 1992-12-14 1994-09-13 Hal Computer Systems, Inc. Multiplier tree using nine-to-three adders
US5347481A (en) 1993-02-01 1994-09-13 Hal Computer Systems, Inc. Method and apparatus for multiplying denormalized binary floating point numbers without additional delay
JP3583474B2 (en) 1994-06-29 2004-11-04 株式会社ルネサステクノロジ Multiplier
US5570310A (en) 1994-12-05 1996-10-29 Motorola Inc. Method and data processor for finding a logarithm of a number
US5953241A (en) 1995-08-16 1999-09-14 Microunity Engeering Systems, Inc. Multiplier array processing system with enhanced utilization at lower precision for group multiply and sum instruction
US5812439A (en) 1995-10-10 1998-09-22 Microunity Systems Engineering, Inc. Technique of incorporating floating point information into processor instructions
US5886915A (en) 1995-11-13 1999-03-23 Intel Corporation Method and apparatus for trading performance for precision when processing denormal numbers in a computer system
US5892697A (en) 1995-12-19 1999-04-06 Brakefield; James Charles Method and apparatus for handling overflow and underflow in processing floating-point numbers
US6108772A (en) 1996-06-28 2000-08-22 Intel Corporation Method and apparatus for supporting multiple floating point processing models
US5844830A (en) 1996-08-07 1998-12-01 Sun Microsystems, Inc. Executing computer instrucrions by circuits having different latencies
US5862066A (en) 1997-05-01 1999-01-19 Hewlett-Packard Company Methods and apparatus for fast check of floating point zero or negative zero
US6009511A (en) 1997-06-11 1999-12-28 Advanced Micro Devices, Inc. Apparatus and method for tagging floating point operands and results for rapid detection of special floating point numbers
US5978901A (en) * 1997-08-21 1999-11-02 Advanced Micro Devices, Inc. Floating point and multimedia unit with data type reclassification capability
JP3479438B2 (en) 1997-09-18 2003-12-15 株式会社東芝 Multiplication circuit
US5931943A (en) 1997-10-21 1999-08-03 Advanced Micro Devices, Inc. Floating point NaN comparison
US6049865A (en) 1997-12-18 2000-04-11 Motorola, Inc. Method and apparatus for implementing floating point projection instructions
US6490607B1 (en) 1998-01-28 2002-12-03 Advanced Micro Devices, Inc. Shared FP and SIMD 3D multiplier
US6131106A (en) 1998-01-30 2000-10-10 Sun Microsystems Inc System and method for floating-point computation for numbers in delimited floating point representation
US6189094B1 (en) 1998-05-27 2001-02-13 Arm Limited Recirculating register file
US6282634B1 (en) 1998-05-27 2001-08-28 Arm Limited Apparatus and method for processing data having a mixed vector/scalar register file
US6286023B1 (en) 1998-06-19 2001-09-04 Ati International Srl Partitioned adder tree supported by a multiplexer configuration
US6081823A (en) 1998-06-19 2000-06-27 Ati International Srl Circuit and method for wrap-around sign extension for signed numbers
US6138135A (en) 1998-08-27 2000-10-24 Institute For The Development Of Emerging Architectures, L.L.C. Propagating NaNs during high precision calculations using lesser precision hardware
US6219685B1 (en) 1998-09-04 2001-04-17 Intel Corporation Method to detect IEEE overflow and underflow conditions
US6256655B1 (en) 1998-09-14 2001-07-03 Silicon Graphics, Inc. Method and system for performing floating point operations in unnormalized format using a floating point accumulator
US6151669A (en) 1998-10-10 2000-11-21 Institute For The Development Of Emerging Architectures, L.L.C. Methods and apparatus for efficient control of floating-point status register
US6594681B1 (en) 1998-11-04 2003-07-15 Sun Microsystems, Inc. Quotient digit selection logic for floating point division/square root
US6697832B1 (en) 1999-07-30 2004-02-24 Mips Technologies, Inc. Floating-point processor with improved intermediate result handling
US6393555B1 (en) * 1999-08-05 2002-05-21 Advanced Micro Devices, Inc. Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit
US6571265B1 (en) 1999-10-29 2003-05-27 Intel Corporation Mechanism to detect IEEE underflow exceptions on speculative floating-point operations
US6658443B1 (en) 1999-11-03 2003-12-02 Sun Microsystems, Inc. Method and apparatus for representing arithmetic intervals within a computer system
US6732134B1 (en) 2000-09-11 2004-05-04 Apple Computer, Inc. Handler for floating-point denormalized numbers
US6789098B1 (en) * 2000-10-23 2004-09-07 Arm Limited Method, data processing system and computer program for comparing floating point numbers
US6658444B1 (en) 2000-11-09 2003-12-02 Sun Microsystems, Inc. Method and apparatus for performing a mask-driven interval division operation
US6629120B1 (en) 2000-11-09 2003-09-30 Sun Microsystems, Inc. Method and apparatus for performing a mask-driven interval multiplication operation
US6842764B2 (en) 2001-03-26 2005-01-11 Sun Microsystems, Inc. Minimum and maximum operations to facilitate interval multiplication and/or interval division
US6751638B2 (en) 2001-05-11 2004-06-15 Sun Microsystems, Inc. Min and max operations for multiplication and/or division under the simple interval system

Also Published As

Publication number Publication date
US20030005013A1 (en) 2003-01-02
US7395297B2 (en) 2008-07-01

Similar Documents

Publication Publication Date Title
US7395297B2 (en) Floating point system that represents status flag information within a floating point operand
US8799344B2 (en) Comparator unit for comparing values of floating point operands
US8543631B2 (en) Total order comparator unit for comparing values of two floating point operands
US8793294B2 (en) Circuit for selectively providing maximum or minimum of a pair of floating point operands
US7069288B2 (en) Floating point system with improved support of interval arithmetic
US7613762B2 (en) Floating point remainder with embedded status information
US5943249A (en) Method and apparatus to perform pipelined denormalization of floating-point results
CN108139912B (en) Apparatus and method for calculating and preserving error bounds during floating point operations
US7219117B2 (en) Methods and systems for computing floating-point intervals
US7366749B2 (en) Floating point adder with embedded status information
US6970898B2 (en) System and method for forcing floating point status information to selected values
US6154760A (en) Instruction to normalize redundantly encoded floating point numbers
US7236999B2 (en) Methods and systems for computing the quotient of floating-point intervals
US7831652B2 (en) Floating point multiplier with embedded status information
US5661674A (en) Divide to integer
US7444367B2 (en) Floating point status information accumulation circuit
US7016928B2 (en) Floating point status information testing circuit
US7363337B2 (en) Floating point divider with embedded status information
US7430576B2 (en) Floating point square root provider with embedded status information
US20110119471A1 (en) Method and apparatus to extract integer and fractional components from floating-point data
US11797300B1 (en) Apparatus for calculating and retaining a bound on error during floating-point operations and methods thereof

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP