The Float Manager provides functions for working with single and double-precision floating point values. This chapter provides reference documentation for the Float Manager APIs. It is divided into the following sections:
Float Manager Constants
Float Manager Functions and Macros
The header file FloatMgr.h declares the API that this chapter describes.
For more information on the Float Manager, see Chapter 12, "Floating Point."
Float Manager Constants
Float Manager Error Codes
Purpose
Error codes returned by the various Float Manager functions.
Declared In
FloatMgr.h
Constants
-
#define flpErrOutOfRange (flpErrorClass | 1)
- Returned by FlpBase10Info() if the supplied floating point number is either not a number (NaN) or is infinite.
Miscellaneous Float Manager Constants
Purpose
The Float Manager also defines these constants.
Declared In
FloatMgr.h
Constants
Float Manager Functions and Macros
FlpAddDouble Function
Purpose
Calculate the sum of two double-precision floating point values.
Declared In
FloatMgr.h
Prototype
double FlpAddDouble (
double addend1,
double addend2
)
Parameters
-
→ addend1
- The first double-precision floating point value to be added.
-
→ addend2
- The second double-precision floating point value to be added.
Returns
Returns the sum of the two supplied values.
See Also
FlpAddFloat(), FlpCorrectedAdd(), FlpSubDouble()
FlpAddFloat Function
Purpose
Calculate the sum of two single-precision floating point values.
Declared In
FloatMgr.h
Prototype
float FlpAddFloat (
float addend1,
float addend2
)
Parameters
-
→ addend1
- The first single-precision floating point value to be added.
-
→ addend2
- The second single-precision floating point value to be added.
Returns
Returns the sum of the two supplied values.
See Also
FlpAddDouble(), FlpCorrectedAdd(), FlpSubFloat()
FlpBase10Info Function
Purpose
Extract detailed information on the base 10 form of a floating point number: the base 10 mantissa, exponent, and sign.
Declared In
FloatMgr.h
Prototype
status_t FlpBase10Info (
double a,
uint32_t *mantissaP,
int16_t *exponentP,
int16_t *signP
)
Parameters
-
→ a
- The floating point number.
-
← mantissaP
- The base 10 mantissa.
-
← exponentP
- The base 10 exponent.
-
← signP
- The sign: 1 if the number is negative, 0 otherwise.
Returns
Returns errNone if no error, or flpErrOutOfRange if the supplied floating point number is either not a number (NaN) or is infinite.
Comments
The mantissa is normalized so it contains at least 8 significant digits when printed as an integer value.
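The decomposition described above can be approximated in portable C. The following is a minimal sketch, not the Float Manager implementation: the name sketch_base10_info(), its return convention (0 on success, -1 for NaN or infinity), and the choice of exactly 8 mantissa digits are assumptions made for illustration. It formats the magnitude with 8 significant digits and reads them back as an integer, so that the original value is approximately mantissa × 10^exponent.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not the Palm OS routine.
 * Returns 0 on success, -1 if a is NaN or infinite. */
static int sketch_base10_info(double a, uint32_t *mantissaP,
                              int16_t *exponentP, int16_t *signP)
{
    char buf[32];
    char digits[16];
    char *expPart;
    size_t i, n = 0;

    if (isnan(a) || isinf(a))
        return -1;

    *signP = (a < 0.0) ? 1 : 0;

    /* Produces "d.ddddddde+XX": one digit, a point, seven more digits. */
    snprintf(buf, sizeof buf, "%.7e", fabs(a));
    expPart = strchr(buf, 'e');

    /* Collect the eight digits, skipping the decimal point. */
    for (i = 0; buf + i < expPart; i++)
        if (buf[i] >= '0' && buf[i] <= '9')
            digits[n++] = buf[i];
    digits[n] = '\0';

    *mantissaP = (uint32_t)strtoul(digits, NULL, 10);
    /* The mantissa was scaled up by 10^7, so scale the exponent down. */
    *exponentP = (int16_t)(atoi(expPart + 1) - 7);
    return 0;
}
```

Under this sketch, 123.5 yields a mantissa of 12350000, an exponent of -5, and a sign of 0.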
See Also
FlpCompareDoubleEqual Function
Purpose
Determine whether two double-precision floating point values are equal.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareDoubleEqual (
double first,
double second
)
Parameters
-
→ first
- The first double-precision floating point value to be compared.
-
→ second
- The second double-precision floating point value to be compared.
Returns
Returns true if the two double-precision values are equal, false otherwise.
See Also
FlpCompareDoubleLessThan(), FlpCompareDoubleLessThanOrEqual(), FlpCompareFloatEqual()
FlpCompareDoubleLessThan Function
Purpose
Determine whether one double-precision floating point value is less than another.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareDoubleLessThan (
double first,
double second
)
Parameters
-
→ first
- The first double-precision floating point value to be compared.
-
→ second
- The second double-precision floating point value to be compared.
Returns
Returns true if the value of first is less than the value of second. Otherwise, this function returns false.
See Also
FlpCompareDoubleEqual(), FlpCompareDoubleLessThanOrEqual(), FlpCompareFloatLessThan()
FlpCompareDoubleLessThanOrEqual Function
Purpose
Determine whether one double-precision floating point value is less than or equal to another.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareDoubleLessThanOrEqual (
double first,
double second
)
Parameters
-
→ first
- The first double-precision floating point value to be compared.
-
→ second
- The second double-precision floating point value to be compared.
Returns
Returns true if the value of first is less than or equal to the value of second. Otherwise, this function returns false.
See Also
FlpCompareDoubleEqual(), FlpCompareDoubleLessThan(), FlpCompareFloatLessThanOrEqual()
FlpCompareFloatEqual Function
Purpose
Determine whether two single-precision floating point values are equal.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareFloatEqual (
float first,
float second
)
Parameters
-
→ first
- The first single-precision floating point value to be compared.
-
→ second
- The second single-precision floating point value to be compared.
Returns
Returns true if the two single-precision values are equal, false otherwise.
See Also
FlpCompareDoubleEqual(), FlpCompareFloatLessThan(), FlpCompareFloatLessThanOrEqual()
FlpCompareFloatLessThan Function
Purpose
Determine whether one single-precision floating point value is less than another.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareFloatLessThan (
float first,
float second
)
Parameters
-
→ first
- The first single-precision floating point value to be compared.
-
→ second
- The second single-precision floating point value to be compared.
Returns
Returns true if the value of first is less than the value of second. Otherwise, this function returns false.
See Also
FlpCompareDoubleLessThan(), FlpCompareFloatEqual(), FlpCompareFloatLessThanOrEqual()
FlpCompareFloatLessThanOrEqual Function
Purpose
Determine whether one single-precision floating point value is less than or equal to another.
Declared In
FloatMgr.h
Prototype
Boolean FlpCompareFloatLessThanOrEqual (
float first,
float second
)
Parameters
-
→ first
- The first single-precision floating point value to be compared.
-
→ second
- The second single-precision floating point value to be compared.
Returns
Returns true if the value of first is less than or equal to the value of second. Otherwise, this function returns false.
See Also
FlpCompareDoubleLessThanOrEqual(), FlpCompareFloatEqual(), FlpCompareFloatLessThan()
FlpCorrectedAdd Function
Purpose
Adds two floating point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.
Declared In
FloatMgr.h
Prototype
double FlpCorrectedAdd (
double firstOperand,
double secondOperand,
int16_t howAccurate
)
Parameters
-
→ firstOperand
- The first of the two numbers to be added.
-
→ secondOperand
- The second of the two numbers to be added.
-
→ howAccurate
- The smallest difference in exponents that won't force the result to zero. The value returned from FlpCorrectedAdd() is forced to zero if, when the exponent of the result of the addition is subtracted from the exponent of the smaller of the two operands, the difference exceeds the value specified for howAccurate. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).
Returns
Returns the calculated result.
Comments
Adding or subtracting a large number and a small number produces a result similar in magnitude to the larger number. Adding or subtracting two numbers that are similar in magnitude can, depending on their signs, produce a result with a very small exponent (that is, a negative exponent that is large in magnitude). If the difference between the result's exponent and that of the operands is close to the number of significant bits expressible by the mantissa, it is quite possible that the result should in fact be zero.
There also exist cases where it may be useful to retain accuracy in the low-order bits of the mantissa, for instance 99999999 + 0.00000001 - 99999999. However, unless the fractional part is an exact (negative) power of two, the few mantissa bits that remain available are unlikely to represent the fractional value properly. In this example, the 99999999 requires 26 bits, leaving 26 bits for the .00000001; this guarantees inaccuracy after the subtraction.
The problem arises from the difficulty in representing decimal fractions such as 0.1 in binary. After about three successive additions or subtractions, errors begin to appear in the least significant bits of the mantissa. If the value represented by the most significant bits of the mantissa is then subtracted away, the least significant bit error is normalized and becomes the actual result—when in fact the result should be zero.
This problem is only an issue for addition and subtraction.
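The zero-forcing rule described for howAccurate can be illustrated in portable C. This is a minimal sketch under stated assumptions, not the Float Manager code: the name sketch_corrected_add() is hypothetical, the standard frexp() function stands in for whatever exponent extraction the real routine uses, and "the smaller of the two operands" is read as the operand with the smaller binary exponent.

```c
#include <math.h>

/* Hypothetical sketch of the zero-forcing rule; not the Palm OS code. */
static double sketch_corrected_add(double first, double second,
                                   int howAccurate)
{
    double sum = first + second;
    int expFirst, expSecond, expSum, expSmaller;

    if (howAccurate == 0)
        howAccurate = 48;   /* documented default level of accuracy */

    /* Nothing to correct when an operand or the result is exactly zero,
     * or when the result is infinite or NaN. */
    if (first == 0.0 || second == 0.0 || sum == 0.0 || !isfinite(sum))
        return sum;

    /* frexp() reports the binary exponent of each value. */
    (void)frexp(first, &expFirst);
    (void)frexp(second, &expSecond);
    (void)frexp(sum, &expSum);
    expSmaller = (expFirst < expSecond) ? expFirst : expSecond;

    /* A result far below the smaller operand is treated as accumulated
     * rounding error and forced to zero. */
    if (expSmaller - expSum > howAccurate)
        return 0.0;
    return sum;
}
```

In this sketch, adding 1 + 2^-52 and -1 produces 2^-52, whose exponent lies 52 below that of the operands. The default threshold of 48 therefore forces the result to zero, while a howAccurate value of 60 leaves it intact.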
See Also
FlpAddDouble(), FlpAddFloat(), FlpCorrectedSub()
FlpCorrectedSub Function
Purpose
Subtracts two floating point numbers and corrects for least-significant-bit errors when the result should be zero but is instead very close to zero.
Declared In
FloatMgr.h
Prototype
double FlpCorrectedSub (
double firstOperand,
double secondOperand,
int16_t howAccurate
)
Parameters
-
→ firstOperand
- The value from which secondOperand is to be subtracted.
-
→ secondOperand
- The value to subtract from firstOperand.
-
→ howAccurate
- The smallest difference in exponents that won't force the result to zero. The value returned from FlpCorrectedSub() is forced to zero if, when the exponent of the result of the subtraction is subtracted from the exponent of the smaller of the two operands, the difference exceeds the value specified for howAccurate. Supply a value of zero for this parameter to obtain the default level of accuracy (which is equivalent to a howAccurate value of 48).
Returns
Returns the calculated result.
Comments
See the comments for FlpCorrectedAdd().
See Also
FlpDivDouble Function
Purpose
Divide one double-precision floating point value by another, and return the result.
Declared In
FloatMgr.h
Prototype
double FlpDivDouble (
double numerator,
double denominator
)
Parameters
-
→ numerator
- The double-precision value to be divided by the denominator.
-
→ denominator
- The double-precision value by which the numerator is to be divided.
Returns
Returns the double-precision result of dividing numerator by denominator.
See Also
FlpDivFloat Function
Purpose
Divide one single-precision floating point value by another, and return the result.
Declared In
FloatMgr.h
Prototype
float FlpDivFloat (
float numerator,
float denominator
)
Parameters
-
→ numerator
- The single-precision value to be divided by the denominator.
-
→ denominator
- The single-precision value by which the numerator is to be divided.
Returns
Returns the single-precision result of dividing numerator by denominator.
See Also
FlpDoubleToFloat Function
Purpose
Converts a double-precision floating point value to a float.
Declared In
FloatMgr.h
Prototype
float FlpDoubleToFloat (
double value
)
Parameters
Returns
The single-precision floating point representation of the supplied value.
See Also
FlpDoubleToInt32(), FlpDoubleToLongDouble(), FlpDoubleToLongLong(), FlpDoubleToUInt32(), FlpDoubleToULongLong(), FlpFloatToDouble()
FlpDoubleToInt32 Function
Purpose
Converts a double-precision floating point value to a signed 32-bit integer.
Declared In
FloatMgr.h
Prototype
int32_t FlpDoubleToInt32 (
double value
)
Parameters
Returns
The signed 32-bit integer representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpDoubleToLongDouble(), FlpDoubleToLongLong(), FlpDoubleToUInt32(), FlpDoubleToULongLong(), FlpInt32ToDouble()
FlpDoubleToLongDouble Function
Purpose
Converts a double-precision floating point value to a "long double."
Declared In
FloatMgr.h
Prototype
long double FlpDoubleToLongDouble (
double value
)
Parameters
Returns
The "long double" floating point representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpDoubleToInt32(), FlpDoubleToLongLong(), FlpDoubleToUInt32(), FlpDoubleToULongLong(), FlpLongDoubleToDouble()
FlpDoubleToLongLong Function
Purpose
Converts a double-precision floating point value to a "long long."
Declared In
FloatMgr.h
Prototype
int64_t FlpDoubleToLongLong (
double value
)
Parameters
Returns
The signed "long long" integer representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpDoubleToInt32(), FlpDoubleToLongDouble(), FlpDoubleToUInt32(), FlpDoubleToULongLong(), FlpLongLongToDouble()
FlpDoubleToUInt32 Function
Purpose
Converts a double-precision floating point value to an unsigned 32-bit integer.
Declared In
FloatMgr.h
Prototype
uint32_t FlpDoubleToUInt32 (
double value
)
Parameters
Returns
The unsigned 32-bit integer representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpDoubleToInt32(), FlpDoubleToLongDouble(), FlpDoubleToLongLong(), FlpDoubleToULongLong(), FlpUInt32ToDouble()
FlpDoubleToULongLong Function
Purpose
Converts a double-precision floating point value to an unsigned "long long."
Declared In
FloatMgr.h
Prototype
uint64_t FlpDoubleToULongLong (
double value
)
Parameters
Returns
The unsigned "long long" integer representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpDoubleToInt32(), FlpDoubleToLongDouble(), FlpDoubleToLongLong(), FlpDoubleToUInt32(), FlpULongLongToDouble()
FlpFloatToDouble Function
Purpose
Converts a single-precision floating point value to a double.
Declared In
FloatMgr.h
Prototype
double FlpFloatToDouble (
float value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
See Also
FlpDoubleToFloat(), FlpFloatToInt32(), FlpFloatToLongDouble(), FlpFloatToLongLong(), FlpFloatToUInt32(), FlpFloatToULongLong()
FlpFloatToInt32 Function
Purpose
Converts a single-precision floating point value to a 32-bit signed integer.
Declared In
FloatMgr.h
Prototype
int32_t FlpFloatToInt32 (
float value
)
Parameters
Returns
The 32-bit signed integer representation of the supplied value.
See Also
FlpFloatToDouble(), FlpFloatToLongDouble(), FlpFloatToLongLong(), FlpFloatToUInt32(), FlpFloatToULongLong(), FlpInt32ToFloat()
FlpFloatToLongDouble Function
Purpose
Converts a single-precision floating point value to a "long double."
Declared In
FloatMgr.h
Prototype
long double FlpFloatToLongDouble (
float value
)
Parameters
Returns
The "long double" floating point representation of the supplied value.
See Also
FlpFloatToDouble(), FlpFloatToInt32(), FlpFloatToLongLong(), FlpFloatToUInt32(), FlpFloatToULongLong(), FlpLongDoubleToFloat()
FlpFloatToLongLong Function
Purpose
Converts a single-precision floating point value to a signed "long long" integer.
Declared In
FloatMgr.h
Prototype
int64_t FlpFloatToLongLong (
float value
)
Parameters
Returns
The signed long long integer representation of the supplied value.
See Also
FlpFloatToDouble(), FlpFloatToInt32(), FlpFloatToLongDouble(), FlpFloatToUInt32(), FlpFloatToULongLong(), FlpLongLongToFloat()
FlpFloatToUInt32 Function
Purpose
Converts a single-precision floating point value to an unsigned 32-bit integer.
Declared In
FloatMgr.h
Prototype
uint32_t FlpFloatToUInt32 (
float value
)
Parameters
Returns
The unsigned 32-bit integer representation of the supplied value.
See Also
FlpFloatToDouble(), FlpFloatToInt32(), FlpFloatToLongDouble(), FlpFloatToLongLong(), FlpFloatToULongLong(), FlpUInt32ToFloat()
FlpFloatToULongLong Function
Purpose
Converts a single-precision floating point value to an unsigned "long long" integer.
Declared In
FloatMgr.h
Prototype
uint64_t FlpFloatToULongLong (
float value
)
Parameters
Returns
The unsigned long long integer representation of the supplied value.
See Also
FlpFloatToDouble(), FlpFloatToInt32(), FlpFloatToLongDouble(), FlpFloatToLongLong(), FlpFloatToUInt32(), FlpULongLongToFloat()
FlpFToA Function
Purpose
Convert a floating-point number to a null-terminated ASCII string in exponential format: [-]x.yyyyyyye[-]zz
Declared In
FloatMgr.h
Prototype
status_t FlpFToA (
double value,
char *buffer
)
Parameters
Returns
Returns errNone if no error, or flpErrOutOfRange if the supplied value is either infinite or not a number. In this case, the buffer is set to the string "INF", "-INF", or "NaN" as appropriate.
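The output format can be approximated with the standard C library. The following is a rough sketch, not the Float Manager implementation: sketch_ftoa() is a hypothetical name, the extra size parameter and the 0 / -1 return values are assumptions added for safety and illustration, and dropping the plus sign from the exponent is an interpretation of the [-]zz part of the documented format.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for illustration; not the Palm OS routine.
 * Returns 0 on success, -1 if value is NaN or infinite. */
static int sketch_ftoa(double value, char *buffer, size_t size)
{
    char tmp[32];
    char *expPart;
    int exp10;

    if (isnan(value)) {
        snprintf(buffer, size, "NaN");
        return -1;
    }
    if (isinf(value)) {
        snprintf(buffer, size, "%s", value < 0 ? "-INF" : "INF");
        return -1;
    }

    /* C's %e always signs the exponent ("e+02"); split it off so that
     * only a minus sign is kept, as in [-]x.yyyyyyye[-]zz. */
    snprintf(tmp, sizeof tmp, "%.7e", value);
    expPart = strchr(tmp, 'e');
    exp10 = atoi(expPart + 1);
    *expPart = '\0';
    snprintf(buffer, size, "%se%s%02d", tmp,
             exp10 < 0 ? "-" : "", abs(exp10));
    return 0;
}
```

With this sketch, 123.5 is written as 1.2350000e02 and -0.25 as -2.5000000e-01. A caller of the real FlpFToA() supplies only the buffer, so it must be large enough for the longest string the function can produce.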
FlpGetExponent Macro
Purpose
Extracts the exponent of a 64-bit floating point value. The returned value has the bias applied, so it ranges from -1023 to +1024.
Declared In
FloatMgr.h
Prototype
#define FlpGetExponent (
x
)
Parameters
Returns
Evaluates to the exponent of the specified value.
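The stated range of -1023 to +1024 matches an 11-bit exponent field with a bias of 1023, as in the IEEE 754 64-bit format. The following minimal sketch shows one way such an extraction could be written in portable C. It assumes that double is a 64-bit IEEE 754 value, and sketch_get_exponent() is a hypothetical function, not the definition of the macro.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical equivalent of the macro for an IEEE 754 double. */
static int sketch_get_exponent(double x)
{
    uint64_t bits;

    /* Copy the bytes rather than casting pointers, which avoids
     * aliasing problems. */
    memcpy(&bits, &x, sizeof bits);

    /* Bits 52..62 hold the 11-bit exponent field; the bias is 1023. */
    return (int)((bits >> 52) & 0x7FF) - 1023;
}
```

Under this sketch, 1.0 gives 0, 8.0 gives 3, and 0.5 gives -1. Zero and subnormal values give -1023, and infinities and NaNs give +1024.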
See Also
FlpInt32ToDouble Function
Purpose
Converts a signed 32-bit integer to a double.
Declared In
FloatMgr.h
Prototype
double FlpInt32ToDouble (
int32_t value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
See Also
FlpDoubleToInt32(), FlpInt32ToFloat()
FlpInt32ToFloat Function
Purpose
Converts a signed 32-bit integer to a float.
Declared In
FloatMgr.h
Prototype
float FlpInt32ToFloat (
int32_t value
)
Parameters
Returns
The floating point representation of the supplied value.
See Also
FlpFloatToInt32(), FlpInt32ToDouble()
FlpLongDoubleToDouble Function
Purpose
Converts a long double-precision floating point value to a double.
Declared In
FloatMgr.h
Prototype
double FlpLongDoubleToDouble (
long double value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
See Also
FlpDoubleToLongDouble(), FlpLongDoubleToFloat()
FlpLongDoubleToFloat Function
Purpose
Converts a long double-precision floating point value to a float.
Declared In
FloatMgr.h
Prototype
float FlpLongDoubleToFloat (
long double value
)
Parameters
Returns
The single-precision floating point representation of the supplied value.
See Also
FlpFloatToLongDouble(), FlpLongDoubleToDouble()
FlpLongLongToDouble Function
Purpose
Converts a signed 64-bit integer to a double.
Declared In
FloatMgr.h
Prototype
double FlpLongLongToDouble (
int64_t value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
See Also
FlpDoubleToLongLong(), FlpLongLongToFloat()
FlpLongLongToFloat Function
Purpose
Converts a signed 64-bit integer to a float.
Declared In
FloatMgr.h
Prototype
float FlpLongLongToFloat (
int64_t value
)
Parameters
Returns
The floating point representation of the supplied value.
See Also
FlpFloatToLongLong(), FlpLongLongToDouble()
FlpMulDouble Function
Purpose
Multiply one double-precision floating point value by another, and return the result.
Declared In
FloatMgr.h
Prototype
double FlpMulDouble (
double multiplier,
double multiplicand
)
Parameters
-
→ multiplier
- The first double-precision floating point value to be multiplied.
-
→ multiplicand
- The second double-precision floating point value to be multiplied.
Returns
The double-precision result of multiplying multiplier by multiplicand.
See Also
FlpMulFloat Function
Purpose
Multiply one single-precision floating point value by another, and return the result.
Declared In
FloatMgr.h
Prototype
float FlpMulFloat (
float multiplier,
float multiplicand
)
Parameters
-
→ multiplier
- The first single-precision floating point value to be multiplied.
-
→ multiplicand
- The second single-precision floating point value to be multiplied.
Returns
The single-precision result of multiplying multiplier by multiplicand.
See Also
FlpNegDouble Function
Purpose
Calculate the negative of a double-precision floating point value.
Declared In
FloatMgr.h
Prototype
double FlpNegDouble (
double value
)
Parameters
Returns
Returns the negative of the supplied value.
See Also
FlpNegFloat Function
Purpose
Calculate the negative of a single-precision floating point value.
Declared In
FloatMgr.h
Prototype
float FlpNegFloat (
float value
)
Parameters
Returns
Returns the negative of the supplied value.
See Also
FlpSubDouble Function
Purpose
Subtract one double-precision floating point value from another.
Declared In
FloatMgr.h
Prototype
double FlpSubDouble (
double minuend,
double subtrahend
)
Parameters
-
→ minuend
- The double-precision floating point value from which the subtrahend is to be subtracted.
-
→ subtrahend
- The double-precision floating point value to be subtracted from the minuend.
Returns
Returns the result of subtracting subtrahend from minuend.
See Also
FlpAddDouble(), FlpCorrectedSub(), FlpSubFloat()
FlpSubFloat Function
Purpose
Subtract one single-precision floating point value from another.
Declared In
FloatMgr.h
Prototype
float FlpSubFloat (
float minuend,
float subtrahend
)
Parameters
-
→ minuend
- The single-precision floating point value from which the subtrahend is to be subtracted.
-
→ subtrahend
- The single-precision floating point value to be subtracted from the minuend.
Returns
Returns the result of subtracting subtrahend from minuend.
See Also
FlpAddFloat(), FlpCorrectedSub(), FlpSubDouble()
FlpUInt32ToDouble Function
Purpose
Converts an unsigned 32-bit integer to a double.
Declared In
FloatMgr.h
Prototype
double FlpUInt32ToDouble (
uint32_t value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
FlpUInt32ToFloat Function
Purpose
Converts an unsigned 32-bit integer to a float.
Declared In
FloatMgr.h
Prototype
float FlpUInt32ToFloat (
uint32_t value
)
Parameters
Returns
The floating point representation of the supplied value.
See Also
FlpFloatToUInt32(), FlpUInt32ToDouble()
FlpULongLongToDouble Function
Purpose
Converts an unsigned 64-bit integer to a double.
Declared In
FloatMgr.h
Prototype
double FlpULongLongToDouble (
uint64_t value
)
Parameters
Returns
The double-precision floating point representation of the supplied value.
See Also
FlpDoubleToULongLong(), FlpULongLongToFloat()
FlpULongLongToFloat Function
Purpose
Converts an unsigned 64-bit integer to a float.
Declared In
FloatMgr.h
Prototype
float FlpULongLongToFloat (
uint64_t value
)
Parameters
Returns
The floating point representation of the supplied value.
See Also
FlpFloatToULongLong(), FlpULongLongToDouble()