
Portal | Level: L1: Foundations | Topics: Binary & Number Representation | Domain: Linux

Binary and Floating Point - Primer

Why This Matters

Computers store everything as bit patterns. When you read a hex dump from a network capture, inspect a core dump, parse a binary protocol, or debug why a metric calculation drifts over time, you are working directly with the consequences of how integers and floats are represented in hardware. These are not academic topics — they explain real bugs you will encounter.

Understanding binary representation clarifies why counters overflow, why byte order matters in network protocols, why file permissions are expressed in octal, and why 0.1 + 0.2 != 0.3 in nearly every programming language. These are the kinds of issues that produce subtle, intermittent failures — the hardest kind to debug if you do not understand the underlying mechanics.

Core Concepts

1. Bits, Bytes, and Interpretation

A byte is 8 bits of storage. The same byte pattern means different things depending on interpretation:

Bit pattern: 1100 0011

As unsigned int (uint8):   195
As signed int (int8):      -61  (two's complement)
As part of UTF-8:          part of a multi-byte character
As part of a float:        part of sign/exponent/significand
As machine code:           a CPU instruction
As pixel data:             a color channel value

Context gives bytes meaning. There is no inherent "type" at the hardware level.
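A quick way to see this reinterpretation, sketched here with Python's standard struct module: the same byte decodes to different values depending on the format code you ask for.

```python
import struct

b = bytes([0b11000011])  # the bit pattern 1100 0011 (0xC3)

unsigned = struct.unpack("B", b)[0]  # "B" = unsigned 8-bit integer
signed = struct.unpack("b", b)[0]    # "b" = signed 8-bit (two's complement)

print(unsigned)  # 195
print(signed)    # -61
```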

2. Unsigned and Signed Integers

Unsigned integers use the full bit range for non-negative values:

Bits   Type     Range
8      uint8    0 to 255
16     uint16   0 to 65,535
32     uint32   0 to 4,294,967,295
64     uint64   0 to 18,446,744,073,709,551,615

Signed integers (two's complement) split the range:

Bits   Type    Range
8      int8    -128 to 127
16     int16   -32,768 to 32,767
32     int32   -2,147,483,648 to 2,147,483,647
64     int64   -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

3. Two's Complement

The standard representation for signed integers. Negative numbers are formed by inverting all bits and adding 1:

To represent -5 in 8 bits:
  Start with 5:     0000 0101
  Invert all bits:  1111 1010
  Add 1:            1111 1011  →  -5

Why this works:
  5 + (-5) = 0000 0101 + 1111 1011 = 1 0000 0000
  The carry bit overflows, leaving 0000 0000 = 0

The elegance of two's complement is that addition and subtraction work identically for signed and unsigned at the hardware level. The CPU does not need separate circuits.
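The invert-and-add-one recipe can be checked directly in Python. This is a sketch: Python integers are unbounded, so the 8-bit width is simulated with a 0xFF mask.

```python
x = 0b00000101               # 5
inverted = x ^ 0xFF          # invert all 8 bits: 1111 1010
neg = (inverted + 1) & 0xFF  # add 1:            1111 1011

print(format(neg, "08b"))    # 11111011
print((x + neg) & 0xFF)      # 0  (5 + (-5) wraps to zero in 8 bits)
```

Masking a negative Python int gives the same pattern: `(-5) & 0xFF` is also 251 (1111 1011).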

4. Integer Overflow

Fixed-width integers wrap when they exceed their range:

uint8:   255 + 1  →  0     (wraps around)
int8:    127 + 1  →  -128  (wraps to most negative)
uint32:  4,294,967,295 + 1  →  0

Real-world consequences:

  • Epoch timestamps stored as 32-bit signed integers overflow on 2038-01-19
  • Packet sequence counters wrap and must be handled explicitly
  • Prometheus counters are uint64 but still overflow eventually
  • The Boeing 787 had a bug in which a 32-bit counter overflowed after 248 days of continuous uptime, triggering a power shutdown
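Python integers never wrap on their own, but fixed-width behavior can be simulated with masks; a sketch of the three cases above:

```python
def wrap_u8(n):
    """Truncate to unsigned 8 bits, as a uint8 register would."""
    return n & 0xFF

def wrap_i8(n):
    """Truncate to signed 8 bits (two's complement)."""
    n &= 0xFF
    return n - 256 if n >= 128 else n

print(wrap_u8(255 + 1))                   # 0
print(wrap_i8(127 + 1))                   # -128
print((4_294_967_295 + 1) & 0xFFFFFFFF)   # 0
```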

5. Endianness (Byte Order)

Name origin: The terms "big-endian" and "little-endian" were introduced by Danny Cohen in his 1980 paper "On Holy Wars and a Plea for Peace" (Internet Engineering Note 137). He borrowed them from Jonathan Swift's Gulliver's Travels (1726), where the nations of Lilliput and Blefuscu fight a war over which end of a boiled egg to crack first. Cohen's point: the byte-order debate is just as arbitrary — both work, but you must pick one and be consistent.

Multi-byte values can be stored in two orders:

Value: 0x01020304 (16,909,060 in decimal)

Big endian (network byte order):
  Address:  0x00  0x01  0x02  0x03
  Value:    01    02    03    04

Little endian (x86, ARM default):
  Address:  0x00  0x01  0x02  0x03
  Value:    04    03    02    01

Network protocols use big endian (most significant byte first). Most commodity CPUs (x86, ARM in default mode) use little endian. When reading binary data or network captures, you must know which byte order is in use or you will misinterpret values.
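Python's struct module makes both orders visible; this sketch also reproduces the classic symptom of a missing byte-order conversion, where port 80 reads back as 20480:

```python
import struct

value = 0x01020304

# ">" = big endian (network order), "<" = little endian
assert struct.pack(">I", value) == bytes([0x01, 0x02, 0x03, 0x04])
assert struct.pack("<I", value) == bytes([0x04, 0x03, 0x02, 0x01])

# Port 80 written in network byte order, then misread as little-endian:
wire = struct.pack(">H", 80)          # b'\x00\x50'
print(struct.unpack("<H", wire)[0])   # 20480
```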

6. Hexadecimal

Hex is compact binary notation. Each hex digit represents exactly 4 bits:

Binary:  1010 1111  0000 1100
Hex:     A    F     0    C    →  0xAF0C

Useful anchors:
  0x00 =   0        0xFF =  255
  0x0A =  10        0x80 =  128
  0x10 =  16        0x7F =  127 (max int8)
  0x20 =  32 (space in ASCII)
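Python's base-conversion built-ins make these anchors easy to verify; a small sketch:

```python
# Each hex digit is exactly 4 bits
assert 0xAF0C == 0b1010_1111_0000_1100

print(hex(195))                # 0xc3
print(int("FF", 16))           # 255
print(format(0xAF0C, "016b"))  # 1010111100001100
```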

Hex dumps are the standard way to inspect binary data:

# View hex dump of a file
xxd /path/to/binary | head
hexdump -C /path/to/binary | head

# View raw bytes of a network capture
tcpdump -XX -r capture.pcap | head -30

7. Bitwise Operations

Operator     Symbol   Use
AND          &        Mask bits, check if a bit is set
OR           |        Set bits, combine flags
XOR          ^        Toggle bits, simple checksums
NOT          ~        Invert all bits
Left shift   <<       Multiply by powers of 2
Right shift  >>       Divide by powers of 2

# Linux file permissions are bitmasks
# rwxr-xr-- = 111 101 100 = 0754
chmod 0754 file.sh

# Subnet masks are bitmasks
# /24 = 255.255.255.0 = 11111111.11111111.11111111.00000000
# IP AND mask = network address
# 10.0.1.50 AND 255.255.255.0 = 10.0.1.0
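Both examples can be reproduced in a few lines of Python; the ipaddress module is standard library, and the octal literals mirror the chmod digits:

```python
import ipaddress

# Subnet math: IP AND mask = network address
ip = int(ipaddress.IPv4Address("10.0.1.50"))
mask = 0xFFFFFF00                        # 255.255.255.0, i.e. /24
print(ipaddress.IPv4Address(ip & mask))  # 10.0.1.0

# Permission bits: rwxr-xr-- = 0o754
mode = 0o754
assert mode & 0o700 == 0o700  # owner has rwx
assert mode & 0o007 == 0o004  # others have read only
```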

8. IEEE 754 Floating Point

Who made it: IEEE 754 was primarily designed by William Kahan, a UC Berkeley mathematician who earned the Turing Award in 1989 for his work on numerical analysis. Before the standard (published in 1985), every CPU vendor had its own floating-point format — programs gave different results on different hardware. Kahan's standard unified floating-point arithmetic across all conforming processors. He is known as "The Father of Floating Point."

Floats represent real numbers using three fields:

Single precision (32-bit float):
  [1 bit sign] [8 bits exponent] [23 bits significand]

Double precision (64-bit float):
  [1 bit sign] [11 bits exponent] [52 bits significand]

Value = (-1)^sign × 2^(exponent - bias) × 1.significand

The significand has an implicit leading 1 (for normal numbers),
giving effectively 24 or 53 bits of precision.
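The three fields can be extracted by reinterpreting a float's bits as an integer; a sketch for single precision, with field positions as in the layout above:

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, stored significand) of x as a float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias 127)
    significand = bits & 0x7FFFFF    # 23 stored bits (implicit leading 1)
    return sign, exponent, significand

print(float32_fields(1.0))   # (0, 127, 0): (-1)^0 x 2^(127-127) x 1.0
print(float32_fields(-2.0))  # (1, 128, 0): sign set, 2^(128-127) x 1.0
```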

9. Floating Point Precision Issues

Not every decimal fraction has an exact binary representation:

>>> 0.1 + 0.2
0.30000000000000004

>>> 0.1 + 0.2 == 0.3
False

This happens because 0.1 in binary is a repeating fraction (like 1/3 in decimal). The stored value is the nearest representable float, not the exact decimal value.
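The decimal module can display the exact value a Python float actually stores, and math.isclose is the standard tolerance-based comparison:

```python
import math
from decimal import Decimal

# The exact value of the double nearest to 0.1:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

assert 0.1 + 0.2 != 0.3               # exact equality fails
assert math.isclose(0.1 + 0.2, 0.3)   # tolerance-based comparison succeeds
```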

Floats are denser near zero and sparser as magnitude grows. For 32-bit floats:

Between 1.0 and 2.0:    ~8.4 million representable values (2^23)
Between 1e15 and 2e15:  roughly the same count, spread over much wider gaps
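math.ulp (Python 3.9+) reports the gap to the next representable value. Python floats are 64-bit doubles rather than the 32-bit floats in the counts above, so the numbers differ, but the widening gaps show the same pattern:

```python
import math

print(math.ulp(1.0))   # 2.220446049250313e-16 (2**-52)
print(math.ulp(1e15))  # 0.125: neighboring doubles are an eighth apart
```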

War story: The Patriot missile failure in Dhahran (1991) was caused by accumulated binary rounding error. The system tracked time in tenths of a second using a 24-bit fixed-point register, and 0.1 has no exact binary representation, so every tick stored a slightly wrong value. After 100 hours of continuous operation the accumulated error reached 0.34 seconds, enough to shift the tracking gate by 687 meters, and the system missed an incoming Scud missile. Twenty-eight soldiers died. This is the canonical example of why binary rounding error matters in real systems.

10. Special Float Values

+Infinity    — result of 1.0 / 0.0
-Infinity    — result of -1.0 / 0.0
NaN          — result of 0.0 / 0.0, sqrt(-1), etc.

NaN != NaN   — true, by specification (IEEE 754)
All other comparisons with NaN (==, <, <=, >, >=) return false

Use math.isnan() or equivalent to check for NaN.
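A note for Python specifically: 1.0 / 0.0 raises ZeroDivisionError rather than producing infinity, so the special values are spelled out; NaN's self-inequality still holds:

```python
import math

inf = float("inf")
nan = float("nan")

assert inf > 1e308                 # larger than any finite float
assert nan != nan                  # NaN is unequal even to itself
assert not (nan < 0) and not (nan > 0) and not (nan == 0)
assert math.isnan(nan)             # the reliable check
assert math.isinf(-inf)
```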

What Experienced People Know

  • The printed decimal representation of a float is a string approximation. The actual stored value is a binary pattern. Languages differ in how many digits they display.
  • Never compare floats for exact equality in monitoring or alerting code. Use a tolerance (epsilon) or compare using relative error.
  • Never use floating point for money. Use integer cents, fixed-point decimal types, or dedicated money libraries.
  • The 2038 problem (32-bit signed time_t overflow) is real. Systems with 32-bit timestamps will wrap to 1901 or error. Many embedded systems, firmware, and file formats still use 32-bit timestamps.
  • When debugging network issues, always check byte order. A port number displayed as 20480 instead of 80 means someone forgot ntohs().
  • xxd, hexdump -C, and od are essential tools for inspecting binary data. Learn at least one well.
  • Bitwise operations show up constantly in systems work: IP subnet calculations, file permission masks, protocol flag fields, CPU feature flags, and memory alignment checks.
  • Integer overflow in counters is the source of many production incidents. Know the width of your counters and plan for wrapping.

    Remember: The quick way to check endianness on a Linux system is lscpu | grep 'Byte Order'. The classic one-liner echo -n I | od -to2 | head -1 | cut -f2 -d' ' prints 000111 on a little-endian machine: the byte 0x49 (ASCII 'I', octal 111) lands in the low-order half of the 16-bit word. On a big-endian machine the same byte lands in the high-order half, so the octal digits come out shifted (044400).

  • Mixing signed and unsigned integers in arithmetic is a common source of subtle bugs in C/C++ code and the protocols they implement. The results of mixed operations are often surprising.
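Python's unbounded integers sidestep the problem, but the struct module can reproduce the C behavior, where the bit pattern of -1 reads back as the maximum unsigned value:

```python
import struct

bits = struct.pack("<i", -1)          # int32 -1: b'\xff\xff\xff\xff'
print(struct.unpack("<I", bits)[0])   # 4294967295 (UINT32_MAX)

# This is why, in C, (-1 < 1u) is false: the -1 converts to unsigned first.
```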

