Floating Point Math

Made simple
Nice link

The IEEE 754 standard defines 32-bit and 64-bit binary floating-point representations. The 32-bit (single-precision) format consists, from high-order to low-order, of a sign bit, an 8-bit exponent with a bias of 127, and a 23-bit significand (mantissa) field. The 64-bit (double-precision) format consists of a sign bit, an 11-bit exponent with a bias of 1023, and a 52-bit significand field. Because normalized numbers carry an implicit leading 1 (the hidden bit), their effective precision is 24 and 53 bits, respectively.
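One way to see the 24-bit and 53-bit precision limits is to look for the first integer that no longer fits. The sketch below is only an illustration: the helper name `to_float32` is made up here, and it assumes Python's `float` is an IEEE double (true on mainstream platforms) and that `struct` rounds to nearest when packing to single precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value.

    Hypothetical helper for illustration: packs x as a 4-byte float
    and unpacks it again, so the result carries only 24 bits of precision.
    """
    return struct.unpack(">f", struct.pack(">f", x))[0]

# 2**24 needs only one significant bit, so it survives the round trip.
fits_single = to_float32(2.0**24) == 2.0**24

# 2**24 + 1 needs 25 significant bits, one more than single precision has,
# so it is rounded back to 2**24.
lost_single = to_float32(2.0**24 + 1) == 2.0**24

# The same effect appears in double precision at 2**53.
lost_double = (2.0**53 + 1) == 2.0**53

print(fits_single, lost_single, lost_double)
```

Every integer up to 2**24 (single) or 2**53 (double) is represented exactly; beyond that point, gaps between neighboring values are larger than 1.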

Single-precision format
Bits:   31 | 30-23    | 22-0
Field:  S  | Exponent | Significand

Double-precision format
Bits:   63 | 62-52    | 51-0
Field:  S  | Exponent | Significand
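The two layouts above can be checked by pulling the fields out of a real value. This is a minimal sketch, not a reference implementation: the function names `decode_single` and `decode_double` are invented for this example, and the code uses only Python's standard `struct` module to reinterpret the bytes as an unsigned integer.

```python
import struct

def decode_single(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into (sign, exponent, significand field)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # 32 raw bits
    sign = bits >> 31                 # bit 31
    exponent = (bits >> 23) & 0xFF    # bits 30-23, still biased by 127
    fraction = bits & 0x7FFFFF        # bits 22-0, hidden bit not included
    return sign, exponent, fraction

def decode_double(x: float) -> tuple[int, int, int]:
    """Split x into (sign, exponent, significand field) for double precision."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # 64 raw bits
    sign = bits >> 63                    # bit 63
    exponent = (bits >> 52) & 0x7FF      # bits 62-52, still biased by 1023
    fraction = bits & ((1 << 52) - 1)    # bits 51-0, hidden bit not included
    return sign, exponent, fraction

print(decode_single(1.0))    # (0, 127, 0)
print(decode_single(-2.0))   # (1, 128, 0)
print(decode_double(0.5))    # (0, 1022, 0)
```

For a normalized number, the value is (-1)^S x (1 + fraction / 2^23) x 2^(exponent - 127) in single precision, with 2^52 and 1023 taking the place of 2^23 and 127 in double precision. So 1.0 has a stored exponent equal to the bias and an all-zero significand field.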
