Floating point limits


Maximum number size - 32-bit binary IEEE 754 representation.
  0 1111 1110 1111 1111 1111 1111 1111 111

  Remember: an exponent of all 1s is a special flag (infinity or NaN),
  so the largest usable exponent is 1111 1110.

Exponent
  8-bit biased exponent; bias = 2^(8-1) - 1 = 127

  1111 1110b = 254

  254 - 127 = 127, giving a scale factor of 2^127

Significand
  1111 1111 1111 1111 1111 111 = 23-bit mantissa

  1.1111 1111 1111 1111 1111 111 = significand with the implicit leading 1 bit restored.

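  In decimal, 9 = 10^1 - 10^0 and, likewise, 9.99 = 10^1 - 10^-2:
  a value made of all 9s is one unit in its last place below the next power of ten.
  By the same argument:
    The binary above is 10b - 0.0000 0000 0000 0000 0000 001b, or 2^1 - 2^-23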
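
The identity above can be checked with exact rational arithmetic. This is a
minimal sketch; the names significand and closed_form are illustrative only
and do not come from the notes.

```python
from fractions import Fraction

# 1.111...1b with 23 fraction bits: the leading 1 (2^0) plus 2^-1 through 2^-23.
significand = sum(Fraction(1, 2 ** k) for k in range(0, 24))

# Closed form claimed in the text: 2^1 - 2^-23.
closed_form = Fraction(2) - Fraction(1, 2 ** 23)

print(significand == closed_form)
```

Fractions are used here so that the comparison does not depend on any
floating point rounding.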

Maximum size is (2^1 - 2^-23) * 2^127 = 2^128 - 2^104, approximately 3.4028235 * 10^38
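
As a rough check of the result above, the bit pattern can be rebuilt from its
three fields and reinterpreted as a single-precision value. The helper name
bits_to_float32 and the constant names are assumptions made for this sketch.

```python
import struct

def bits_to_float32(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

SIGN = 0
EXPONENT = 0b11111110          # 254, the largest exponent that is not a special flag
MANTISSA = 2 ** 23 - 1         # 23 bits, all set to 1

MAX_BITS = (SIGN << 31) | (EXPONENT << 23) | MANTISSA

max_from_bits = bits_to_float32(MAX_BITS)
max_from_formula = (2.0 - 2.0 ** -23) * 2.0 ** 127

print(hex(MAX_BITS))
print(max_from_bits == max_from_formula)
```

Python floats are double precision, so both sides of the comparison are held
exactly and the equality test is meaningful.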

Minimum normalized number size
  32-bit binary representation.

  0 0000 0001 0000 0000 0000 0000 0000 000

  Remember: an exponent of all 0s is a special flag (zero or a denormalized value),
  so the smallest normalized exponent is 0000 0001.

  Exponent
    8-bit biased exponent; bias = 2^(8-1) - 1 = 127

    0000 0001b = 1

    1 - 127 = -126, giving a scale factor of 2^-126

  Significand
    23-bit mantissa 0000 0000 0000 0000 0000 000

    1.0000 0000 0000 0000 0000 000 = significand with the implicit leading 1 bit restored

  Minimum normalized size is 1.0 * 2^-126 = 2^-126, approximately 1.1754944 * 10^-38
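
The exponent and significand steps above can be sketched as a small decoder
for normalized values. The function decode_normalized and the constant names
are hypothetical, introduced only to mirror the steps in these notes.

```python
import struct

EXPONENT_BIAS = 2 ** (8 - 1) - 1   # 127, as derived in the text

def decode_normalized(bits):
    """Decode a normalized single-precision bit pattern by hand."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0 or exponent == 0xFF:
        raise ValueError("exponent is a special flag, not a normalized value")
    significand = 1 + mantissa / 2 ** 23      # restore the implicit leading 1
    value = significand * 2.0 ** (exponent - EXPONENT_BIAS)
    return -value if sign else value

MIN_NORMAL_BITS = 1 << 23   # sign 0, exponent 0000 0001, mantissa all 0s

min_normal = decode_normalized(MIN_NORMAL_BITS)
# Cross-check against the platform's own interpretation of the same bits.
reinterpreted = struct.unpack(">f", struct.pack(">I", MIN_NORMAL_BITS))[0]

print(min_normal == 2.0 ** -126)
print(min_normal == reinterpreted)
```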

Minimum denormalized number size
  32-bit binary representation.

  0 0000 0000 0000 0000 0000 0000 0000 001

  All 0s in exponent and a non-zero significand flag a denormalized value.

  Exponent
    Use the smallest normalized exponent: 1 - 127 = -126, giving 2^-126

  Significand - the implicit leading 1 is no longer present.
    0.0000 0000 0000 0000 0000 001
    which is 1.0000 0000 0000 0000 0000 000 shifted right 23 places = 1 * 2^-23

  Minimum denormalized size is 1 * 2^-23 * 1 * 2^-126 = 1 * 2^-149, approximately 1.4 * 10^-45
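
The denormalized case can be sketched the same way: the exponent is fixed at
-126 and no leading 1 is added to the significand. The function
decode_denormalized is a hypothetical helper written for this illustration.

```python
import struct

def decode_denormalized(bits):
    """Decode a denormalized single-precision bit pattern by hand."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent != 0 or mantissa == 0:
        raise ValueError("not a denormalized value")
    significand = mantissa / 2 ** 23          # no implicit leading 1
    value = significand * 2.0 ** -126         # smallest normalized exponent
    return -value if sign else value

MIN_DENORMAL_BITS = 1   # sign 0, exponent all 0s, only the last mantissa bit set

min_denormal = decode_denormalized(MIN_DENORMAL_BITS)
# Cross-check against the platform's own interpretation of the same bits.
reinterpreted = struct.unpack(">f", struct.pack(">I", MIN_DENORMAL_BITS))[0]

print(min_denormal == 2.0 ** -149)
print(min_denormal == reinterpreted)
```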