1 Byte = 8 Bits, So How Many Bytes Is Your Name?

If you have no idea how many bytes a song lyric or even a single letter takes, this post will unlock a core computing skill you're missing.

I once spoke with a colleague who could recite “1 byte = 8 bits” in his sleep, yet froze when asked, “How many bytes is your name?”

Let’s fix that.

The Parking Lot Model

Picture a parking lot with 8 slots. Each slot is either empty (0) or has a car (1). That whole lot is a byte.

Slot # | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
Price | $128 | $64 | $32 | $16 | $8 | $4 | $2 | $1
  • 1 byte = 1 parking lot with 8 slots
  • 1 bit = 1 slot (empty = 0, car parked = 1)
  • Each slot has a price tag based on position: slot 0 costs $1, slot 1 costs $2, slot 7 costs $128
  • Total revenue = sum of occupied slot prices

This is a visualization of how a byte is structured:

(Interactive demo: tap slots to park cars and watch the byte and its revenue update.)

00000001 → only slot 0 (the rightmost) has a car → revenue = 1

00000010 → only slot 1 has a car → revenue = 2

11111111 → all 8 slots are full → revenue = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255

The number (0–255) just depends on which slots have cars.
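
If you’d rather not count slots by hand, Python’s built-in int and format do the same conversion (a quick check, nothing fancy):

>>> int("11111111", 2)      # all 8 slots full
255
>>> int("00000010", 2)      # one car in slot 1
2
>>> format(255, "08b")      # decimal back to the 8-slot view
'11111111'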

ASCII

Why do computers group 8 bits (2³) into 1 byte? Why not 6 or 10?

Short answer: It’s the perfect size for text. 🐧

ASCII Character Set

To store text, computers need to map characters to numbers. With 8 bits, you get 256 possible values (0–255). That’s enough room for every English letter (a-z, A-Z), digit, and common symbol, with space to spare.

Think of it as 256 unique parking configurations. Each one maps to a specific character.

But here’s the twist: standard ASCII actually only needs 7 bits (128 values).

So, why 8? Honestly, it was a mix of lucky hardware decisions and the need for a standardized “chunk” size that could handle more than just simple text.
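
A quick sanity check from a Python prompt (standard library only):

>>> ord("A"), ord("z"), ord("~")      # printable ASCII codes all fit below 128
(65, 122, 126)
>>> 2 ** 7, 2 ** 8
(128, 256)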

This is a visualization of how characters are distributed across bytes:

ASCII = 1 Lot. UTF-8 (Emojis) = 2-4 Lots.

ASCII Encoding

So, how do we convert between characters and bytes like the visualization above?

Let’s encode “Y” (from YELL) as an example.

Step 1: Character set (character → number)

Look it up in the ASCII table: Y = 89

Step 2: Encoding (number → binary)

Here’s where most people get stuck. Use the Bit-test method. Memorize this sequence: 128, 64, 32, 16, 8, 4, 2, 1

Start with 89 and work left to right:

  • 89 ≥ 128? ❌ → 0
  • 89 ≥ 64? ✅ → 1 (89 − 64 = 25)
  • 25 ≥ 32? ❌ → 0
  • 25 ≥ 16? ✅ → 1 (25 − 16 = 9)
  • 9 ≥ 8? ✅ → 1 (9 − 8 = 1)
  • 1 ≥ 4? ❌ → 0
  • 1 ≥ 2? ❌ → 0
  • 1 ≥ 1? ✅ → 1
Weight | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1
Bit | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1

Result: 01011001. That’s one byte.

(Interactive demo: pick an ASCII character and park cars until the revenue matches its code.)

There’s also the “Divide by 2” method for converting decimal to binary, but I find it tedious and rarely use it.
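
Python can do both steps for you, which is handy for checking a hand conversion (ord, format, and chr are built-ins):

>>> ord("Y")                     # Step 1: character → number
89
>>> format(ord("Y"), "08b")      # Step 2: number → binary, padded to 8 bits
'01011001'
>>> chr(89)                      # and back again
'Y'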

The mental shift:

  • Humans think in characters
  • Computers think in bytes
  • Encodings are the translation layer between the two

Bigger Than ASCII

ASCII works for English. But what about 🦈, 你, ộ, or ắ?

256 values won’t cut it. We need something bigger.

Unicode Character Set

Q: Is Unicode the same type of thing as ASCII but bigger?

A: Sort of, but not exactly.

ASCII has 2 parts:

  • Character set: Convert character to decimal
  • Encoding: Convert decimal to binary

Unicode is just a character set. It assigns a unique number (code point) to every character, but it doesn’t dictate how those numbers are stored as bits.

Char | Code Point | Decimal
🦈 | U+1F988 | 129,416
你 | U+4F60 | 20,320
ộ | U+1ED9 | 7,897
ắ | U+1EAF | 7,855

Unicode 17.0 defines more than 150,000 characters, out of 1,114,112 possible code points.
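
You can look any of these up straight from Python (ord returns the code point as a decimal number):

>>> ord("🦈"), hex(ord("🦈"))
(129416, '0x1f988')
>>> ord("你")
20320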

The encoding part? That’s where UTF-8 comes in.

UTF-8 Encoding

Code point → bytes. How?

ASCII was simple: one character = one byte.

But 🦈 is 129,416. That’s way beyond 255. One parking lot (one byte) can’t hold it.

So, how do we distribute the cars across multiple lots to represent bigger numbers?

We need more lots.

UTF-8: The Multi-Lot Chain System

Chaining lots creates a problem: where does one character end and the next begin?

UTF-8 solves this with byte templates. The first few bits tell you how many bytes to expect.

Bytes needed | Template
1 | 0xxxxxxx
2 | 110xxxxx 10xxxxxx
3 | 1110xxxx 10xxxxxx 10xxxxxx
4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The prefixes (110, 1110, 11110) tell the computer: “Hey, I’m a multi-byte character, read X more bytes.” And every continuation byte starts with 10, so it can never be mistaken for the start of a new character.

Example: Encoding 🦈

  1. Find code point: 🦈 = U+1F988 = 129,416
  2. Check range: 129,416 falls in 65,536 – 1,114,111 → needs 4 bytes
  3. Convert to binary: represent the code point in binary (21 bits)
  4. Stuff into template: distribute those bits into the 4-byte template
(Interactive demo: pick a character and watch its code point get split into header bits and payload bits across the UTF-8 bytes.)
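
Worked through by hand: 129,416 in binary is 0 0001 1111 1001 1000 1000 (padded to 21 bits). Split it 3 + 6 + 6 + 6 and drop the pieces into the x positions of the 4-byte template: 11110000 10011111 10100110 10001000, which is hex F0 9F A6 88. The same bit-stuffing in Python (a minimal sketch of the 4-byte case only, not a general-purpose UTF-8 encoder):

>>> cp = ord("🦈")                                # 129,416
>>> byte1 = 0b11110000 | (cp >> 18)               # 11110 header + top 3 bits
>>> byte2 = 0b10000000 | ((cp >> 12) & 0b111111)  # 10 header + next 6 bits
>>> byte3 = 0b10000000 | ((cp >> 6) & 0b111111)   # 10 header + next 6 bits
>>> byte4 = 0b10000000 | (cp & 0b111111)          # 10 header + last 6 bits
>>> [byte1, byte2, byte3, byte4]
[240, 159, 166, 136]
>>> bytes([byte1, byte2, byte3, byte4]) == "🦈".encode("utf-8")
True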

Apply It in Python

Python keeps text (str) and raw bytes (bytes) as two different types.

  • A str is Unicode text (code points)
  • encode("utf-8") turns that text into the exact bytes that will be stored or sent
>>> char = "🦈"

>>> char.encode("utf-8")
b'\xf0\x9f\xa6\x88'

>>> list(char.encode("utf-8"))
[240, 159, 166, 136]

>>> char.encode("utf-8").hex()
'f09fa688'

What you’re seeing:

  • b'...' means: this is bytes, not text
  • Each \x.. is one byte written in hex (f0, 9f, a6, 88)
  • Those same bytes in decimal are 240, 159, 166, 136
  • It’s 4 bytes because 🦈’s code point (129,416) is above 65,535, so UTF-8 uses the 4-byte template

And you can always round-trip back to text:

>>> b'\xf0\x9f\xa6\x88'.decode('utf-8')
'🦈'

UTF-8 BOM

A BOM (Byte Order Mark) is an optional marker at the start of a text file that can signal the encoding (and for UTF‑16/UTF‑32, the byte order).

UTF-8 does not need a BOM. But some tools still save “UTF‑8 with BOM”, which prepends 3 bytes at the start of the file:

  • BOM bytes (hex): EF BB BF
  • That’s Unicode code point U+FEFF when decoded as text

Most of the time it’s harmless. The annoying case is when a strict parser expects the file to start with a specific character (like { for JSON), but it actually starts with BOM bytes.

Quick checks/fixes:

# show the first 3 bytes
xxd -g 1 -l 3 file.json
# when reading a text file that might include a UTF-8 BOM
open("file.json", "r", encoding="utf-8-sig").read()

Quick Reference

Code Point Range | Size | Example | Code Point | Decimal | UTF-8 bytes (dec)
0 – 127 (ASCII) | 1 byte | Y | U+0059 | 89 | 89
128 – 2,047 | 2 bytes | é | U+00E9 | 233 | 195, 169
2,048 – 65,535 | 3 bytes | 你 | U+4F60 | 20,320 | 228, 189, 160
65,536 – 1,114,111 | 4 bytes | 🦈 | U+1F988 | 129,416 | 240, 159, 166, 136
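
You can reproduce the example rows of this table from a Python prompt (a quick check using the same four characters):

>>> for ch in "Yé你🦈":
...     print(ch, ord(ch), list(ch.encode("utf-8")))
...
Y 89 [89]
é 233 [195, 169]
你 20320 [228, 189, 160]
🦈 129416 [240, 159, 166, 136]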

The takeaway: UTF-8’s genius isn’t just that it supports emojis. It stays compatible with ASCII and scales to the entire Unicode space.

Quantum Computing (Bonus)

Let’s break the physics of our parking lot.

A classical bit is simple: The slot is either empty (0) or occupied (1).

A quantum bit (qubit) is weird. The car enters superposition.

  • Superposition: The car is in a ghostly state: both parked and empty at the same time.
  • Measurement: When you check the slot, the car is forced to “pick a side.” It instantly collapses into a normal 0 or 1.

It’s like Schrödinger’s Parking Spot. Spooky? Yes. But it allows quantum computers to calculate millions of possibilities at once.

Of course, a bit that can’t make up its mind isn’t great for storing data. Qubits are used for computation, not for holding your files the way classical bytes do.
