1 Byte = 8 Bits, So How Many Bytes Is Your Name?

If you have no idea how many bytes a song lyric or even a single letter takes, this post will unlock a core computing skill you're missing.

I once spoke with a colleague who could recite “1 byte = 8 bits” in his sleep, yet froze when asked, “How many bytes is your name?”

Let’s fix that.

The Parking Lot Model

Picture a parking lot with 8 slots. Each slot is either empty (0) or has a car (1). That whole lot is a byte.

Slot # | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Price | $128 | $64 | $32 | $16 | $8 | $4 | $2 | $1
  • 1 byte = 1 parking lot with 8 slots
  • 1 bit = 1 slot (empty = 0, car parked = 1)
  • Each slot has a price tag based on position: slot 0 costs $1, slot 1 costs $2, slot 7 costs $128
  • Total revenue = sum of occupied slot prices

This is a visualization of how a byte is structured:

(Interactive demo: tap slots to park cars and watch the byte and its revenue update.)

00000001 → only slot 0 (the rightmost) has a car → revenue = 1

00000010 → only slot 1 has a car → revenue = 2

11111111 → all 8 slots are full → revenue = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255

The number (0–255) just depends on which slots have cars.
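
If you’d rather not count slots by hand, Python’s built-in int and format do the same conversion (a quick check, nothing fancy):

>>> int("11111111", 2)      # all 8 slots full
255
>>> int("00000010", 2)      # one car in slot 1
2
>>> format(255, "08b")      # decimal back to the 8-slot view
'11111111'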

ASCII

Why do computers group 8 bits (2³) into 1 byte? Why not 6 or 10?

Short answer: It’s the perfect size for text. 🐧

ASCII Character Set

To store text, computers need to map characters to numbers. With 8 bits, you get 256 possible values (0–255). That’s enough room for every English letter (a-z, A-Z), digit, and common symbol, with space to spare.

Think of it as 256 unique parking configurations. Each one maps to a specific character.

But here’s the twist: standard ASCII actually only needs 7 bits (128 values).

So, why 8? Honestly, it was a mix of lucky hardware decisions and the need for a standardized “chunk” size that could handle more than just simple text.
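
A quick sanity check from a Python prompt (standard library only):

>>> ord("A"), ord("z"), ord("~")      # printable ASCII codes all fit below 128
(65, 122, 126)
>>> 2 ** 7, 2 ** 8
(128, 256)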

This is a visualization of how characters are distributed across bytes:

ASCII = 1 Lot. UTF-8 (Emojis) = 2-4 Lots.

ASCII Encoding

So, how do we convert between characters and bytes like the visualization above?

Let’s encode “Y” (from YELL) as an example.

Step 1: Character set (character → number)

Look it up in the ASCII table: Y = 89

Step 2: Encoding (number → binary)

Here’s where most people get stuck. Use the Bit-test method. Memorize this sequence: 128, 64, 32, 16, 8, 4, 2, 1

Start with 89 and work left to right:

  • 89 ≥ 128? ❌ → 0
  • 89 ≥ 64? ✅ → 1 (89 − 64 = 25)
  • 25 ≥ 32? ❌ → 0
  • 25 ≥ 16? ✅ → 1 (25 − 16 = 9)
  • 9 ≥ 8? ✅ → 1 (9 − 8 = 1)
  • 1 ≥ 4? ❌ → 0
  • 1 ≥ 2? ❌ → 0
  • 1 ≥ 1? ✅ → 1
Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1
Bit | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1

Result: 01011001. That’s one byte.

(Interactive demo: pick an ASCII character and park cars until the revenue matches its code.)

There’s also the “Divide by 2” method for converting decimal to binary, but I find it tedious and rarely use it.
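
Python can do both steps for you, which is handy for checking a hand conversion (ord, format, and chr are built-ins):

>>> ord("Y")                     # Step 1: character → number
89
>>> format(ord("Y"), "08b")      # Step 2: number → binary, padded to 8 bits
'01011001'
>>> chr(89)                      # and back again
'Y'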

The mental shift:

  • Humans think in characters
  • Computers think in bytes
  • Encodings are the translation layer between the two

Bigger Than ASCII

ASCII works for English. But what about 🦈, 你, ộ, or ắ?

256 values won’t cut it. We need something bigger.

Unicode Character Set

Q: Is Unicode the same type of thing as ASCII but bigger?

A: Sort of, but not exactly.

ASCII has 2 parts:

  • Character set: Convert character to decimal
  • Encoding: Convert decimal to binary

Unicode is just a character set. It assigns a unique number (code point) to every character, but it doesn’t dictate how those numbers are stored as bits.

Char | Code Point | Decimal
🦈 | U+1F988 | 129,416
你 | U+4F60 | 20,320
ộ | U+1ED9 | 7,897
ắ | U+1EAF | 7,855

Unicode 17.0 defines more than 150,000 characters, out of 1,114,112 possible code points.
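
You can look any of these up straight from Python (ord returns the code point as a decimal number):

>>> ord("🦈"), hex(ord("🦈"))
(129416, '0x1f988')
>>> ord("你")
20320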

The encoding part? That’s where UTF-8 comes in.

UTF-8 Encoding

Code point → bytes. How?

ASCII was simple: one character = one byte.

But 🦈 is 129,416. That’s way beyond 255. One parking lot (one byte) can’t hold it.

So, how do we distribute the cars across multiple lots to represent bigger numbers?

We need more lots.

UTF-8: The Multi-Lot Chain System

Chaining lots creates a problem: where does one character end and the next begin?

UTF-8 solves this with byte templates. The first few bits tell you how many bytes to expect.

Bytes needed | Template
1 | 0xxxxxxx
2 | 110xxxxx 10xxxxxx
3 | 1110xxxx 10xxxxxx 10xxxxxx
4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The prefixes (110, 1110, 11110) tell the computer: “Hey, I’m a multi-byte character, read X more bytes.” And every continuation byte starts with 10, so it can never be mistaken for the start of a new character.

Example: Encoding 🦈

  1. Find code point: 🦈 = U+1F988 = 129,416
  2. Check range: 129,416 falls in 65,536 – 1,114,111 → needs 4 bytes
  3. Convert to binary: represent the code point in binary (21 bits)
  4. Stuff into template: distribute those bits into the 4-byte template
(Interactive demo: pick a character and watch its code point get split into header bits and payload bits across the UTF-8 bytes.)
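
Worked through by hand: 129,416 in binary is 0 0001 1111 1001 1000 1000 (padded to 21 bits). Split it 3 + 6 + 6 + 6 and drop the pieces into the x positions of the 4-byte template: 11110000 10011111 10100110 10001000, which is hex F0 9F A6 88. The same bit-stuffing in Python (a minimal sketch of the 4-byte case only, not a general-purpose UTF-8 encoder):

>>> cp = ord("🦈")                                # 129,416
>>> byte1 = 0b11110000 | (cp >> 18)               # 11110 header + top 3 bits
>>> byte2 = 0b10000000 | ((cp >> 12) & 0b111111)  # 10 header + next 6 bits
>>> byte3 = 0b10000000 | ((cp >> 6) & 0b111111)   # 10 header + next 6 bits
>>> byte4 = 0b10000000 | (cp & 0b111111)          # 10 header + last 6 bits
>>> [byte1, byte2, byte3, byte4]
[240, 159, 166, 136]
>>> bytes([byte1, byte2, byte3, byte4]) == "🦈".encode("utf-8")
True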

Apply It in Python

Python keeps text (str) and raw bytes (bytes) as two different types.

  • A str is Unicode text (code points)
  • encode("utf-8") turns that text into the exact bytes that will be stored or sent
>>> char = "🦈"

>>> char.encode("utf-8")
b'\xf0\x9f\xa6\x88'

>>> list(char.encode("utf-8"))
[240, 159, 166, 136]

>>> char.encode("utf-8").hex()
'f09fa688'

What you’re seeing:

  • b'...' means: this is bytes, not text
  • Each \x.. is one byte written in hex (f0, 9f, a6, 88)
  • Those same bytes in decimal are 240, 159, 166, 136
  • It’s 4 bytes because 🦈’s code point (129,416) is above 65,535, so UTF-8 uses the 4-byte template

And you can always round-trip back to text:

>>> b'\xf0\x9f\xa6\x88'.decode('utf-8')
'🦈'

UTF-8 BOM

A BOM (Byte Order Mark) is an optional marker at the start of a text file that can signal the encoding (and for UTF‑16/UTF‑32, the byte order).

UTF-8 does not need a BOM. But some tools still save “UTF‑8 with BOM”, which prepends 3 bytes at the start of the file:

  • BOM bytes (hex): EF BB BF
  • That’s Unicode code point U+FEFF when decoded as text

Most of the time it’s harmless. The annoying case is when a strict parser expects the file to start with a specific character (like { for JSON), but it actually starts with BOM bytes.

Quick checks/fixes:

# show the first 3 bytes
xxd -g 1 -l 3 file.json
# when reading a text file that might include a UTF-8 BOM
open("file.json", "r", encoding="utf-8-sig").read()

Quick Reference

Code Point Range | Size | Example | Code Point | Decimal | UTF-8 bytes (dec)
0 – 127 (ASCII) | 1 byte | Y | U+0059 | 89 | 89
128 – 2,047 | 2 bytes | é | U+00E9 | 233 | 195, 169
2,048 – 65,535 | 3 bytes | 你 | U+4F60 | 20,320 | 228, 189, 160
65,536 – 1,114,111 | 4 bytes | 🦈 | U+1F988 | 129,416 | 240, 159, 166, 136
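
You can reproduce the example rows of this table from a Python prompt (a quick check using the same four characters):

>>> for ch in "Yé你🦈":
...     print(ch, ord(ch), list(ch.encode("utf-8")))
...
Y 89 [89]
é 233 [195, 169]
你 20320 [228, 189, 160]
🦈 129416 [240, 159, 166, 136]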

The takeaway: UTF-8’s genius isn’t just that it supports emojis. It stays compatible with ASCII and scales to the entire Unicode space.

Quantum Computing (Bonus)

Let’s break the physics of our parking lot.

A classical bit is simple: The slot is either empty (0) or occupied (1).

A quantum bit (qubit) is weird. The car enters superposition.

  • Superposition: The car is in a ghostly state: both parked and empty at the same time.
  • Measurement: When you check the slot, the car is forced to “pick a side.” It instantly collapses into a normal 0 or 1.

It’s like Schrödinger’s Parking Spot. Spooky? Yes. But it allows quantum computers to calculate millions of possibilities at once.

Of course, a bit that can’t make up its mind isn’t great for storing data. Qubits are used for computation, not for holding your files the way classical bytes do.
