Blockchain Basics

FIN 4506

Lorenzo Naranjo

Spring 2026

Digital Foundations

Before blockchain mechanics, we need four building blocks:
- bits
- bytes
- hexadecimal notation
- cryptographic hash functions

Bits and Bytes

A bit is a binary digit: 0 or 1
A byte is 8 bits
Example: \begin{aligned} 10101100_2 & = 1\cdot 2^7 + 0\cdot 2^6 + 1\cdot 2^5 + 0\cdot 2^4 + 1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 0\cdot 2^0 \\ & = 172_{10} \end{aligned}
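The positional expansion above can be checked in a few lines of Python. This is a minimal sketch; the helper name `bits_to_int` is illustrative and not part of the course materials.

```python
def bits_to_int(bits: str) -> int:
    """Evaluate a binary string by summing bit * 2**position."""
    total = 0
    # Reverse so that position 0 is the rightmost (least significant) bit.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

value = bits_to_int("10101100")
print(value)                         # 172
print(value == int("10101100", 2))   # True: agrees with Python's built-in parser
```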

Hexadecimal (Base 16)

Hex uses symbols 0-9 and A-F
One hex digit = 4 bits
Therefore:
- 1 byte (8 bits) = 2 hex digits
- 256 bits = 32 bytes = 64 hex characters
Example: 10101100_2 = \mathrm{AC}_{16} = 172_{10}
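The same byte can be converted between bases with Python built-ins. The sketch below also counts the lengths quoted above; the all-zero `digest_like` value is only a stand-in for a 256-bit quantity, not a real hash.

```python
byte_value = 0b10101100            # the byte from the example

print(format(byte_value, "02X"))   # AC
print(int("AC", 16))               # 172

# A 256-bit value occupies 32 bytes and prints as 64 hex characters.
digest_like = bytes(32)            # 32 zero bytes, used only to count lengths
print(len(digest_like) * 8)        # 256
print(len(digest_like.hex()))      # 64
```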

Text to Bytes Example

Computers hash bytes, not abstract text
UTF-8 encoding maps characters to bytes
Example string: "hello"
- bytes in hex: 68 65 6C 6C 6F
- so changing case/punctuation changes bytes before hashing
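A short check of the encoding step, as a sketch: it encodes "hello" and a capitalized variant and prints the resulting bytes in hex. The separator argument to `bytes.hex` requires Python 3.8 or later.

```python
lower = "hello".encode("utf-8")
upper = "Hello".encode("utf-8")

# bytes.hex(" ") inserts a space between bytes (Python 3.8+).
print(lower.hex(" ").upper())   # 68 65 6C 6C 6F
print(upper.hex(" ").upper())   # 48 65 6C 6C 6F

# One changed character means different bytes, hence a different hash input.
print(lower == upper)           # False
```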

What Is a Hash?

A hash function maps arbitrary input to fixed-length output
SHA-256 output length is always 256 bits (64 hex chars)
Example interpretation:
- input: a transaction or block header
- output: a 256-bit digest used as a compact fingerprint
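The fixed output length can be seen directly with Python's standard `hashlib` module. In this sketch the helper name `digest_hex` and the three test messages are illustrative choices.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes all produce 64 hex characters (256 bits).
for message in [b"", b"a", b"a" * 10_000]:
    digest = digest_hex(message)
    print(len(message), len(digest), digest[:16] + "...")
```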

Why Hashes Are Useful

Deterministic: same input, same output
Avalanche effect: tiny input change, very different output
Pre-image resistance: hard to reverse digest back to original input
Collision resistance: hard to find two inputs with same digest
These properties enable tamper-evidence in blockchain data structures
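Determinism and the avalanche effect are easy to observe empirically. The sketch below hashes two strings that differ in one character and counts how many of the 256 output bits differ; the inputs "hello" and "hellp" are arbitrary examples. Pre-image and collision resistance cannot be demonstrated this way, since they are claims about computational hardness.

```python
import hashlib

def sha256_int(text: str) -> int:
    """SHA-256 digest of a UTF-8 string, as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

a = sha256_int("hello")
b = sha256_int("hellp")          # last character changed
same_again = sha256_int("hello")

# XOR leaves a 1 wherever the two digests disagree.
differing_bits = bin(a ^ b).count("1")

print(a == same_again)                       # deterministic: same input, same digest
print(differing_bits, "of 256 bits differ")  # typically close to half
```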

Why Blockchain?

Core coordination problem: no central ledger manager
Nodes must agree on one transaction history
Agreement must hold under:
- network delays
- strategic behavior
- potentially malicious participants

Encoding and Hashes

Data are stored as bytes (8 bits each), usually shown in hex
SHA-256 maps arbitrary input to a fixed 256-bit digest
Security-relevant properties:
- deterministic
- pre-image resistance
- collision resistance
- avalanche effect

Block Composition

A block has two parts:
- Header: compact metadata used for PoW/identification
- Body: full transaction list
In this simplified setup, header includes:
- previous block hash
- transaction summary hash (Merkle root)
- timestamp
- nonce
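One way to represent this simplified block in code is sketched below. The class names, the "|"-joined header serialization, and the sample values are assumptions made for illustration; real blockchains use their own binary header formats.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class BlockHeader:
    prev_hash: str      # hash of the previous block's header
    merkle_root: str    # summary hash of the transactions in the body
    timestamp: int
    nonce: int

    def serialize(self) -> bytes:
        # Assumed encoding: fields joined by "|" and UTF-8 encoded.
        text = f"{self.prev_hash}|{self.merkle_root}|{self.timestamp}|{self.nonce}"
        return text.encode("utf-8")

    def block_hash(self) -> str:
        return hashlib.sha256(self.serialize()).hexdigest()

@dataclass
class Block:
    header: BlockHeader
    transactions: List[str] = field(default_factory=list)  # the body

# Example with made-up values; the merkle_root here is a placeholder string.
header = BlockHeader(prev_hash="0" * 64, merkle_root="ab" * 32,
                     timestamp=1_700_000_000, nonce=0)
block = Block(header=header, transactions=["alice->bob:1", "bob->carol:2"])
print(block.header.block_hash())
```

Only the header is hashed, which is why it must contain a commitment to the body (the Merkle root) rather than the transactions themselves.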

Merkle Root (Intuition)

Start by hashing each transaction (leaf hashes)
Hash leaves in pairs, then hash resulting parents in pairs
Continue until one top hash remains: the Merkle root
One transaction change alters its leaf, then propagates upward and changes the root
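The pairing procedure can be sketched as follows. Two choices here are assumptions, since the slides do not specify them: a single SHA-256 per node, and duplicating the last hash when a level has an odd number of entries.

```python
import hashlib
from typing import List

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: List[bytes]) -> bytes:
    """Hash leaves, then hash pairs level by level until one hash remains."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [h(tx) for tx in transactions]           # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])                  # assumption: duplicate the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d"]
root = merkle_root(txs)

# Changing one transaction changes its leaf and therefore the root.
tampered = [b"tx-a", b"tx-b", b"tx-X", b"tx-d"]
print(root.hex())
print(merkle_root(tampered).hex() != root.hex())     # True
```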

Why Merkle Roots Matter

Compact commitment: one fixed-size hash summarizes all block transactions
Fast verification of inclusion:
- a node can verify one transaction with a short Merkle proof
- only sibling hashes along one tree path are needed
Full transactions are still in the block body; the root is a summary, not a replacement
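A Merkle proof can be sketched as a list of sibling hashes, each tagged with whether it sits on the left or the right. The function names and the odd-level rule (duplicate the last hash) are assumptions for illustration.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(transactions: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, leaves first, root last."""
    level = [h(tx) for tx in transactions]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]      # assumption: duplicate the last hash
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) along the path from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1                  # partner within the pair
        proof.append((level[sibling], sibling < index))
        index //= 2                          # move to the parent position
    return proof

def verify(tx: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"tx-0", b"tx-1", b"tx-2", b"tx-3", b"tx-4"]
levels = build_levels(txs)
root = levels[-1][0]
proof = merkle_proof(levels, 4)
print(len(proof), verify(b"tx-4", proof, root), verify(b"tx-X", proof, root))
```

The verifier needs only the transaction, the proof, and the root from the header; the proof length grows with the tree height rather than with the number of transactions.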

Proof of Work (PoW)

Miners search for a nonce such that: \text{hash(header)} < \text{target}
Smaller target means higher difficulty
“More leading zeros” is shorthand for a lower target in hex representation
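The link between "leading zeros" and a numeric target can be made explicit: a 64-character hex digest has N leading zeros exactly when its integer value is below 16^(64-N). The sketch below checks this equivalence on some made-up header strings; the function names are illustrative.

```python
import hashlib

HEX_DIGITS = 64  # a SHA-256 digest has 64 hex characters

def target_for_leading_zeros(n: int) -> int:
    """Numeric target equivalent to requiring n leading hex zeros."""
    return 16 ** (HEX_DIGITS - n)

def header_hash(header: str) -> str:
    return hashlib.sha256(header.encode("utf-8")).hexdigest()

def meets_target(digest_hex: str, target: int) -> bool:
    return int(digest_hex, 16) < target

target = target_for_leading_zeros(2)
digests = [header_hash(f"example-header|{nonce}") for nonce in range(5000)]

# The numeric test and the string-prefix test agree on every digest.
agree = all(meets_target(d, target) == d.startswith("00") for d in digests)
print(agree)  # True

# A smaller target accepts fewer digests, i.e. higher difficulty.
easy = sum(meets_target(d, target) for d in digests)
hard = sum(meets_target(d, target // 16) for d in digests)
print(easy, hard)
```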

First-Passage View of Mining

Each nonce trial is an independent Bernoulli attempt with success probability p
Under the leading-N-hex-zero rule: p = 16^{-N}
If K is the number of tries until the first success, then K\sim\text{Geometric}(p) and \mathbb{E}[K]=\frac{1}{p}=16^N
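The geometric model also pins down the median and the spread, not only the mean. The sketch below evaluates the standard formulas for a geometric distribution counting trials up to and including the first success; the function name is illustrative. Because the distribution is right-skewed, the median lies below the mean.

```python
import math

def geometric_summary(n_hex_zeros: int) -> dict:
    """Mean, median, and standard deviation of tries under p = 16**(-N)."""
    p = 16.0 ** (-n_hex_zeros)
    mean = 1.0 / p
    # Median: smallest k with P(K <= k) = 1 - (1 - p)**k >= 0.5.
    median = math.ceil(math.log(0.5) / math.log1p(-p))
    std = math.sqrt(1.0 - p) / p
    return {"N": n_hex_zeros, "p": p, "mean": mean, "median": median, "std": std}

for n in (1, 2, 3, 4):
    print(geometric_summary(n))
```

For small p the median is roughly ln(2)/p, about 69% of the mean, and the standard deviation is close to the mean itself, so individual mining runs vary widely.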

Economic Security Logic

Producing valid blocks is expensive (hardware + energy)
Verifying blocks is cheap for all nodes
Rewards are paid only to accepted blocks
For most miners, honest participation is profit-maximizing

Confirmation Security

To reverse a confirmed payment, the attacker must catch up from a deficit of z blocks
If the honest share of hash power is p and the attacker's share is q, with p+q=1:
- if q \ge p, eventual catch-up probability is 1
- if q < p, benchmark catch-up probability: Q_z = \left(\frac{q}{p}\right)^z
Reversal risk declines exponentially with confirmation depth z when q<p
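The benchmark formula is simple enough to tabulate. The sketch below evaluates Q_z for a few attacker shares and depths; the function name and the chosen values of q and z are illustrative.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Benchmark probability that an attacker with share q erases a deficit of z blocks."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if z < 0:
        raise ValueError("z must be non-negative")
    p = 1.0 - q
    if q >= p:
        return 1.0            # a majority attacker eventually catches up
    return (q / p) ** z

for q in (0.10, 0.25, 0.40):
    row = [catch_up_probability(q, z) for z in (1, 3, 6)]
    print(q, ["%.2e" % value for value in row])
```

Each extra confirmation multiplies the benchmark risk by q/p, so the decline is fast when the attacker is small and slow when q is close to one half.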

State-Process Representation

At block t, define state: X_t=(h_{t-1}, m_t, \tau_t, T_t)

h_{t-1}: previous hash
m_t: Merkle-root transaction summary
\tau_t: timestamp
T_t: difficulty target

Recursion: current state \rightarrow nonce search \rightarrow accepted hash \rightarrow next state.
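The recursion can be sketched as a loop in which each accepted hash becomes the previous hash of the next state. The "|"-joined header string, the placeholder Merkle roots, the made-up timestamps, and the constant difficulty are all simplifying assumptions.

```python
import hashlib

def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def mine(prev_hash: str, merkle_root: str, timestamp: int, n_zeros: int):
    """Nonce search: return (nonce, accepted_hash) for the given state."""
    prefix = "0" * n_zeros
    nonce = 0
    while True:
        block_hash = sha(f"{prev_hash}|{merkle_root}|{timestamp}|{nonce}")
        if block_hash.startswith(prefix):
            return nonce, block_hash
        nonce += 1

prev_hash = "0" * 64           # genesis convention assumed here
chain = []
for t in range(3):
    merkle_root = sha(f"tx-set-{t}")        # placeholder transaction summary
    timestamp = 1_700_000_000 + 600 * t     # made-up timestamps
    nonce, block_hash = mine(prev_hash, merkle_root, timestamp, n_zeros=2)
    chain.append({"prev_hash": prev_hash, "nonce": nonce, "hash": block_hash})
    prev_hash = block_hash                  # accepted hash feeds the next state

for block in chain:
    print(block["nonce"], block["hash"][:16] + "...")
```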

Mining Pools

Solo mining has high payout variance
Pools smooth payouts by sharing reward flow
Common contracts:
- PPS: miners get smoother pay, pool bears more variance
- PPLNS: miners bear more variance, pool bears less
Pool concentration can increase coordination/security risk
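The variance-smoothing role of pools can be illustrated with a stylized per-block model. It is an assumption-heavy sketch, not a description of any actual pool contract: a miner with share a of total hash power either mines solo, or joins a fee-free pool with total share s that splits each block reward in proportion to hash power.

```python
import math

def solo_stats(a: float, reward: float = 1.0):
    """Per-block mean and std of payout: the full reward with probability a."""
    mean = a * reward
    std = reward * math.sqrt(a * (1.0 - a))
    return mean, std

def proportional_pool_stats(a: float, s: float, reward: float = 1.0):
    """Per-block mean and std when the pool (share s) wins and pays a/s of the reward."""
    payout = reward * a / s
    mean = s * payout
    std = payout * math.sqrt(s * (1.0 - s))
    return mean, std

a, s = 0.001, 0.20   # hypothetical shares
print("solo:", solo_stats(a))
print("pool:", proportional_pool_stats(a, s))
```

Expected payout is the same in both cases, but the pooled standard deviation is much smaller. In this stylized view, a PPS-style contract goes further by paying the miner a fixed amount per unit of work, so the block-finding variance falls on the pool operator.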

Simulation Setup (Notebook Link)

Fix prev_hash, merkle_root, timestamp
Vary nonce sequentially
Stop when hash satisfies leading-zero difficulty rule
Compare the empirical average number of tries with the theoretical mean 16^N

Simulation Code

import hashlib
import matplotlib.pyplot as plt
import pandas as pd
import random
import statistics
import time

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

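def mine_once_leading_zeros(N: int, max_tries: int = 2_000_000):
    """Return the number of tries until a hash has N leading hex zeros.

    Returns None if no valid nonce is found within max_tries.
    """
    # Fixed header fields for this run; only the nonce varies.
    prev_hash = "0" * 64
    merkle_root = sha256_hex(f"tx-set-{random.randint(0, 10**9)}")
    timestamp = int(time.time())
    target_prefix = "0" * N

    for nonce in range(max_tries):
        header = f"{prev_hash}|{merkle_root}|{timestamp}|{nonce}"
        h = sha256_hex(header)
        if h.startswith(target_prefix):
            # Nonces start at 0, so the number of tries is nonce + 1.
            return nonce + 1
    return None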

Simulation Results: Summary Table

settings = [
    {"N": 2, "reps": 200},
    {"N": 3, "reps": 120},
    {"N": 4, "reps": 40},
]

rows = []
for s in settings:
    N, reps = s["N"], s["reps"]
    samples = [mine_once_leading_zeros(N) for _ in range(reps)]
    samples = [x for x in samples if x is not None]

    rows.append({
        "N": N,
        "reps": len(samples),
        "theory_E_tries": 16**N,
        "empirical_mean": round(statistics.mean(samples), 2),
        "empirical_median": round(statistics.median(samples), 2),
    })

df_summary = pd.DataFrame(rows)
df_summary

	N	reps	theory_E_tries	empirical_mean	empirical_median
0	2	200	256	264.36	199.0
1	3	120	4096	3797.62	2706.5
2	4	40	65536	64913.88	43075.5

Simulation Results: Theory vs Empirical

Ns = df_summary["N"].tolist()
empirical = df_summary["empirical_mean"].tolist()
theory = df_summary["theory_E_tries"].tolist()

plt.figure(figsize=(7, 4))
plt.plot(Ns, empirical, marker="o", linewidth=2, label="Empirical mean tries")
plt.plot(Ns, theory, marker="s", linestyle="--", linewidth=2, label="Theory: $16^N$")
plt.xlabel("Difficulty (N leading hex zeros)")
plt.ylabel("Tries")
plt.title("Mining Effort: Simulation vs Theory")
plt.xticks(Ns)
plt.grid(alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

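[Figure: "Mining Effort: Simulation vs Theory". x-axis: Difficulty (N leading hex zeros); y-axis: Tries; series: empirical mean tries and theory 16^N]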

Simulation Results: Dispersion at Fixed Difficulty

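N = 3
reps = 200
samples = [mine_once_leading_zeros(N) for _ in range(reps)]
# Drop runs that hit max_tries without success (returned as None),
# so the statistics below are computed on integers only.
samples = [x for x in samples if x is not None]

df_dispersion = pd.DataFrame([{
    "N": N,
    "min": min(samples),
    "median": statistics.median(samples),
    "mean": round(statistics.mean(samples), 2),
    "max": max(samples),
}])
df_dispersion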

	N	min	median	mean	max
0	3	7	2938.5	4081.05	27347

Takeaways

Blockchain combines cryptographic commitment and incentive design
Merkle roots make transaction commitment compact and verifiable
PoW makes production costly but verification cheap
Confirmation depth converts elapsed work into security