Floating point limits
Maximum number size, 32-bit binary IEEE representation.
0 1111 1110 1111 1111 1111 1111 1111 111
Remember that all 1s in the exponent is a special flag (infinity or NaN), so the largest usable exponent field is 1111 1110.
Exponent: 8-bit biased exponent
  bias = 2^(8-1) - 1 = 127
  1111 1110b = 254
  254 - 127 = 127, giving 2^127
Significand: 23-bit mantissa
  1111 1111 1111 1111 1111 111 = stored 23-bit mantissa
  1.1111 1111 1111 1111 1111 111 = mantissa with the implicit leading bit
In decimal, 9 = 10^1 - 10^0. In the same way, the binary significand above is
  10b - 0.0000 0000 0000 0000 0000 001b, or 2^1 - 2^-23
Maximum size is (2^1 - 2^-23) * 2^127
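Minimum normalized number size, 32-bit binary representation.
0 0000 0001 0000 0000 0000 0000 0000 000
Remember that all 0s in the exponent is a special flag (zero or a denormalized value), so the smallest normalized exponent field is 0000 0001.
Exponent: 8-bit biased exponent
  bias = 2^(8-1) - 1 = 127
  0000 0001b = 1
  1 - 127 = -126, giving 2^-126
Significand: 23-bit mantissa
  0000 0000 0000 0000 0000 000 = stored 23-bit mantissa
  1.0000 0000 0000 0000 0000 000 = mantissa with the implicit leading bit
Minimum normalized size is 2^-126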
Minimum denormalized number size, 32-bit binary representation.
0 0000 0000 0000 0000 0000 0000 0000 001
All 0s in the exponent and a non-zero significand flag a denormalized value.
Exponent: use the smallest normalized exponent.
  1 - 127 = -126, giving 2^-126
Significand: the implicit leading 1 is no longer present.
  0.0000 0000 0000 0000 0000 001 = 1.0000 0000 0000 0000 0000 000 * 2^-23
Minimum denormalized size is 1 * 2^-23 * 2^-126 = 1 * 2^-149
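
The three limits derived above can be checked by decoding the same bit patterns directly. The following is a minimal sketch in Python, assuming the standard struct module is used to reinterpret a 32-bit integer as a single-precision value; the helper name bits_to_float and the variable names are made up for this illustration and do not come from the notes.

```python
import struct

def bits_to_float(bits: str) -> float:
    """Interpret a 32-bit pattern (spaces allowed) as an IEEE 754 single-precision value."""
    raw = int(bits.replace(" ", ""), 2)          # bit string -> unsigned integer
    packed = struct.pack(">I", raw)               # 4 bytes, big-endian
    return struct.unpack(">f", packed)[0]         # reinterpret the same bytes as a float

# Bit patterns copied from the derivations above: sign, 8-bit exponent, 23-bit mantissa.
max_normal = bits_to_float("0 1111 1110 1111 1111 1111 1111 1111 111")
min_normal = bits_to_float("0 0000 0001 0000 0000 0000 0000 0000 000")
min_denormal = bits_to_float("0 0000 0000 0000 0000 0000 0000 0000 001")

# Compare each decoded value with the closed-form result from the notes.
print(max_normal == (2.0**1 - 2.0**-23) * 2.0**127)
print(min_normal == 2.0**-126)
print(min_denormal == 2.0**-149)
```

Each comparison should print True. The closed-form values are exactly representable in Python's double-precision floats, so exact equality is a fair test here and no tolerance is needed.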