Portal | Level: L1: Foundations | Topics: Binary & Number Representation | Domain: Linux
Binary and Floating Point - Primer¶
Why This Matters¶
Computers store everything as bit patterns. When you read a hex dump from a network capture, inspect a core dump, parse a binary protocol, or debug why a metric calculation drifts over time, you are working directly with the consequences of how integers and floats are represented in hardware. These are not academic topics — they explain real bugs you will encounter.
Understanding binary representation clarifies why counters overflow, why
byte order matters in network protocols, why file permissions are expressed
in octal, and why 0.1 + 0.2 != 0.3 in nearly every programming language.
These are the kinds of issues that produce subtle, intermittent failures —
the hardest kind to debug if you do not understand the underlying mechanics.
Core Concepts¶
1. Bits, Bytes, and Interpretation¶
A byte is 8 bits of storage. The same byte pattern means different things depending on interpretation:
Bit pattern: 1100 0011
As unsigned int (uint8): 195
As signed int (int8): -61 (two's complement)
As part of UTF-8: part of a multi-byte character
As part of a float: part of sign/exponent/significand
As machine code: a CPU instruction
As pixel data: a color channel value
Context gives bytes meaning. There is no inherent "type" at the hardware level.
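The reinterpretation idea can be seen directly; a minimal sketch in Python, using `struct` to force a signed reading of the same byte:

```python
import struct

b = 0b11000011  # the bit pattern 1100 0011

# As an unsigned 8-bit integer: the value itself
unsigned = b                                # 195

# As a signed 8-bit integer (two's complement): same byte, new meaning
signed = struct.unpack("b", bytes([b]))[0]  # -61

print(unsigned, signed)
```

The byte never changes; only the format character handed to `struct.unpack` does.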
2. Unsigned and Signed Integers¶
Unsigned integers use the full bit range for non-negative values:
| Bits | Type | Range |
|---|---|---|
| 8 | uint8 | 0 to 255 |
| 16 | uint16 | 0 to 65,535 |
| 32 | uint32 | 0 to 4,294,967,295 |
| 64 | uint64 | 0 to 18,446,744,073,709,551,615 |
Signed integers (two's complement) split the range:
| Bits | Type | Range |
|---|---|---|
| 8 | int8 | -128 to 127 |
| 16 | int16 | -32,768 to 32,767 |
| 32 | int32 | -2,147,483,648 to 2,147,483,647 |
| 64 | int64 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
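The ranges in both tables follow from two formulas, sketched here in Python:

```python
def unsigned_range(bits):
    # n bits hold 2**n distinct patterns: 0 .. 2**n - 1
    return 0, 2**bits - 1

def signed_range(bits):
    # two's complement: half the patterns represent negative values
    return -(2**(bits - 1)), 2**(bits - 1) - 1

print(unsigned_range(8))   # (0, 255)
print(signed_range(8))     # (-128, 127)
print(signed_range(64))
```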
3. Two's Complement¶
The standard representation for signed integers. Negative numbers are formed by inverting all bits and adding 1:
To represent -5 in 8 bits:
Start with 5: 0000 0101
Invert all bits: 1111 1010
Add 1: 1111 1011 → -5
Why this works:
5 + (-5) = 0000 0101 + 1111 1011 = 1 0000 0000
The carry bit overflows, leaving 0000 0000 = 0
The elegance of two's complement is that addition and subtraction work identically for signed and unsigned at the hardware level. The CPU does not need separate circuits.
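The invert-and-add-one procedure is easy to check in code. A minimal sketch in Python, masking to 8 bits since Python integers are arbitrary-width:

```python
BITS = 8
MASK = (1 << BITS) - 1  # 0xFF keeps only the low 8 bits

def twos_complement(n):
    # invert all bits, add 1, discard anything past bit 7
    return ((~n) + 1) & MASK

neg5 = twos_complement(5)
print(f"{neg5:08b}")       # 11111011

# 5 + (-5): the carry overflows out of 8 bits, leaving zero
print((5 + neg5) & MASK)   # 0
```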
4. Integer Overflow¶
Fixed-width integers wrap when they exceed their range:
uint8: 255 + 1 → 0 (wraps around)
int8: 127 + 1 → -128 (wraps to most negative)
uint32: 4,294,967,295 + 1 → 0
Real-world consequences:
- Epoch timestamps stored in a 32-bit signed int overflow on 2038-01-19
- Packet sequence counters wrap and must be handled
- Prometheus counters are uint64 but still overflow eventually
- The Boeing 787 had a bug where a 32-bit counter overflowed after 248 days of continuous uptime, triggering a power shutdown
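Wrapping can be simulated in Python by masking to the counter width; a minimal sketch:

```python
import struct

# Unsigned wrap: arithmetic is modulo 2**8
print((255 + 1) & 0xFF)   # 0

# Signed wrap: 127 + 1, reinterpreted as int8, lands on -128
wrapped = struct.unpack("b", struct.pack("B", (127 + 1) & 0xFF))[0]
print(wrapped)            # -128
```

Languages with fixed-width integers (C, Go, Rust in release mode) do this masking implicitly on every operation.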
5. Endianness (Byte Order)¶
Name origin: The terms "big-endian" and "little-endian" were introduced by Danny Cohen in his 1980 paper "On Holy Wars and a Plea for Peace" (Internet Engineering Note 137). He borrowed them from Jonathan Swift's Gulliver's Travels (1726), where the nations of Lilliput and Blefuscu fight a war over which end of a boiled egg to crack first. Cohen's point: the byte-order debate is just as arbitrary — both work, but you must pick one and be consistent.
Multi-byte values can be stored in two orders:
Value: 0x01020304 (16,909,060 in decimal)
Big endian (network byte order):
Address: 0x00 0x01 0x02 0x03
Value: 01 02 03 04
Little endian (x86, ARM default):
Address: 0x00 0x01 0x02 0x03
Value: 04 03 02 01
Network protocols use big endian (most significant byte first). Most commodity CPUs (x86, ARM in default mode) use little endian. When reading binary data or network captures, you must know which byte order is in use or you will misinterpret values.
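The two byte orders can be compared directly with Python's `struct` module, where `>` means big endian and `<` little endian:

```python
import struct

value = 0x01020304

big = struct.pack(">I", value)     # network byte order
little = struct.pack("<I", value)  # x86 / ARM-default order

print(big.hex())      # 01020304
print(little.hex())   # 04030201

# Reading with the wrong byte order silently misinterprets the value
print(hex(struct.unpack("<I", big)[0]))   # 0x4030201
```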
6. Hexadecimal¶
Hex is compact binary notation. Each hex digit represents exactly 4 bits:
Binary: 1010 1111 0000 1100
Hex: A F 0 C → 0xAF0C
Useful anchors:
0x00 = 0     0xFF = 255
0x0A = 10    0x80 = 128
0x10 = 16    0x7F = 127 (max int8)
0x20 = 32 (space in ASCII)
Hex dumps are the standard way to inspect binary data:
# View hex dump of a file
xxd /path/to/binary | head
hexdump -C /path/to/binary | head
# View raw bytes of a network capture
tcpdump -XX -r capture.pcap | head -30
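The hex-to-binary mapping from the example above can also be checked in Python:

```python
value = 0b1010111100001100   # 1010 1111 0000 1100

print(hex(value))            # 0xaf0c
print(f"{value:016b}")       # back to the 16-bit binary form

# Each hex digit corresponds to exactly 4 bits
print(int("AF0C", 16) == value)   # True
print(0x20 == ord(" "))           # True: 0x20 is the ASCII space
```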
7. Bitwise Operations¶
| Operator | Symbol | Use |
|---|---|---|
| AND | `&` | Mask bits, check if a bit is set |
| OR | `\|` | Set bits, combine flags |
| XOR | `^` | Toggle bits, simple checksums |
| NOT | `~` | Invert all bits |
| Left shift | `<<` | Multiply by powers of 2 |
| Right shift | `>>` | Divide by powers of 2 |
# Linux file permissions are bitmasks
# rwxr-xr-- = 111 101 100 = 0754
chmod 0754 file.sh
# Subnet masks are bitmasks
# /24 = 255.255.255.0 = 11111111.11111111.11111111.00000000
# IP AND mask = network address
# 10.0.1.50 AND 255.255.255.0 = 10.0.1.0
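Both examples above are plain bitwise AND; a minimal Python sketch (using the stdlib `ipaddress` module only to convert dotted-quad strings to integers):

```python
import ipaddress

def to_int(dotted):
    return int(ipaddress.IPv4Address(dotted))

ip = to_int("10.0.1.50")
mask = to_int("255.255.255.0")   # a /24 netmask

network = ip & mask              # AND clears the host bits
print(ipaddress.IPv4Address(network))   # 10.0.1.0

# File permissions: rwxr-xr-- as the octal bitmask 0o754
perm = 0o754
print(bool(perm & 0o100))   # True: owner execute bit is set
print(bool(perm & 0o002))   # False: others cannot write
```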
8. IEEE 754 Floating Point¶
Who made it: IEEE 754 was primarily designed by William Kahan, a UC Berkeley mathematician who earned the Turing Award in 1989 for his work on numerical analysis. Before the standard (published in 1985), every CPU vendor had its own floating-point format — programs gave different results on different hardware. Kahan's standard unified floating-point arithmetic across all conforming processors. He is known as "The Father of Floating Point."
Floats represent real numbers using three fields:
Single precision (32-bit float):
[1 bit sign] [8 bits exponent] [23 bits significand]
Double precision (64-bit float):
[1 bit sign] [11 bits exponent] [52 bits significand]
Value = (-1)^sign × 2^(exponent - bias) × 1.significand
The significand has an implicit leading 1 (for normal numbers),
giving effectively 24 or 53 bits of precision.
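The three fields can be extracted by reinterpreting a double's 64 bits as an integer; a sketch in Python (`float_fields` is an illustrative helper, not a stdlib function):

```python
import struct

def float_fields(x):
    # Reinterpret the 64-bit double as an unsigned integer
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    significand = bits & ((1 << 52) - 1)   # 52 explicit fraction bits
    return sign, exponent, significand

sign, exp, frac = float_fields(1.5)
# 1.5 = (+1) * 2^0 * 1.1(binary): sign 0, biased exponent 1023,
# and only the top fraction bit set (the ".5" after the implicit 1)
print(sign, exp - 1023, frac == 1 << 51)
```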
9. Floating Point Precision Issues¶
Not every decimal fraction has an exact binary representation:
This happens because 0.1 in binary is a repeating fraction (like 1/3 in decimal). The stored value is the nearest representable float, not the exact decimal value.
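The claim is easy to verify; a minimal sketch in Python, using `Decimal` to display the value a float actually stores:

```python
from decimal import Decimal

print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False

# The literal 0.1 stores the nearest representable double, not 0.1:
print(Decimal(0.1))       # 0.1000000000000000055511151231257827...
```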
Floats are denser near zero and sparser as magnitude grows:
Between 1.0 and 2.0: ~8 million representable values
Between 1e15 and 2e15: same ~8 million values (much wider gaps)
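The widening gaps can be measured with `math.ulp` (available in Python 3.9+), which returns the distance to the next representable float:

```python
import math

print(math.ulp(1.0))    # ~2.22e-16: floats are dense near 1.0
print(math.ulp(1e15))   # 0.125: gaps of an eighth at this magnitude

# Adding less than half a gap rounds straight back to the same float
print(1e15 + 0.01 == 1e15)   # True
```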
War story: The Patriot missile failure in Dhahran (1991) was caused by floating-point drift. The system used a 24-bit fixed-point register to track time in tenths of seconds. After 100 hours of continuous operation, the accumulated rounding error was 0.34 seconds — enough to shift the tracking gate by 687 meters, causing the system to miss an incoming Scud missile. Twenty-eight soldiers died. This is the canonical example of why floating-point precision matters in real systems.
10. Special Float Values¶
+Infinity — result of 1.0 / 0.0
-Infinity — result of -1.0 / 0.0
NaN — result of 0.0 / 0.0, sqrt(-1), etc.
NaN != NaN — this is by specification (IEEE 754)
NaN comparisons always return false (except !=)
Use math.isnan() or equivalent to check for NaN.
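A minimal Python sketch of these rules (note that Python itself raises `ZeroDivisionError` for `1.0 / 0.0`, unlike C, so the specials are built directly):

```python
import math

inf = math.inf
nan = math.nan

print(inf > 1e308)   # True: infinity compares greater than any finite float
print(nan == nan)    # False: NaN never equals anything, not even itself
print(nan != nan)    # True: the one comparison that returns True

# The reliable checks:
print(math.isnan(nan), math.isinf(-inf))   # True True
```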
What Experienced People Know¶
- The printed decimal representation of a float is a string approximation. The actual stored value is a binary pattern. Languages differ in how many digits they display.
- Never compare floats for exact equality in monitoring or alerting code. Use a tolerance (epsilon) or compare using relative error.
- Never use floating point for money. Use integer cents, fixed-point decimal types, or dedicated money libraries.
- The 2038 problem (32-bit signed time_t overflow) is real. Systems with 32-bit timestamps will wrap to 1901 or error. Many embedded systems, firmware, and file formats still use 32-bit timestamps.
- When debugging network issues, always check byte order. A port number displayed as 20480 instead of 80 means someone forgot `ntohs()` (80 is 0x0050; byte-swapped, it becomes 0x5000 = 20480).
- `xxd`, `hexdump -C`, and `od` are essential tools for inspecting binary data. Learn at least one well.
- Bitwise operations show up constantly in systems work: IP subnet calculations, file permission masks, protocol flag fields, CPU feature flags, and memory alignment checks.
- Integer overflow in counters is the source of many production incidents. Know the width of your counters and plan for wrapping.
- The quick way to check endianness on a Linux system is `lscpu | grep 'Byte Order'`. The classic one-liner `echo -n I | od -to2 | head -1 | cut -f2 -d' '` also works: on a little-endian machine it prints `000111`, because the single byte of `I` (0x49, octal 111) lands in the low-order half of the 16-bit word.
- Mixing signed and unsigned integers in arithmetic is a common source of subtle bugs in C/C++ code and the protocols they implement. The results of mixed operations are often surprising.
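The "never compare floats for exact equality" advice above can be sketched with Python's stdlib `math.isclose`:

```python
import math

a = 0.1 + 0.2
b = 0.3

# Exact equality fails even though the values are "the same" on paper
print(a == b)                             # False

# Compare with a relative tolerance instead
print(math.isclose(a, b, rel_tol=1e-9))   # True
```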
Wiki Navigation¶
Related Content¶
- Binary Flashcards (CLI) (flashcard_deck, L1) — Binary & Number Representation