Blockchain Basics

FIN 4506

Lorenzo Naranjo

Spring 2026

Digital Foundations

  • Before blockchain mechanics, we need four building blocks:
    • bits
    • bytes
    • hexadecimal notation
    • cryptographic hash functions

Bits and Bytes

  • A bit is a binary digit: 0 or 1
  • A byte is 8 bits
  • Example: \begin{aligned} 10101100_2 & = 1\cdot 2^7 + 0\cdot 2^6 + 1\cdot 2^5 + 0\cdot 2^4 + 1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 0\cdot 2^0 \\ & = 172_{10} \end{aligned}
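
The place-value expansion above can be checked in a few lines. This is a minimal sketch; the variable names are illustrative only.

```python
# Interpret the bit string from the example as a base-2 integer.
bits = "10101100"
value = int(bits, 2)

# Rebuild the same number from its place values, mirroring the expansion:
# the leftmost bit has weight 2^7, the rightmost 2^0.
place_value_sum = sum(
    int(b) * 2 ** (len(bits) - 1 - i) for i, b in enumerate(bits)
)

print(value)            # 172
print(place_value_sum)  # 172
```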

Hexadecimal (Base 16)

  • Hex uses symbols 0-9 and A-F
  • One hex digit = 4 bits
  • Therefore:
    • 1 byte (8 bits) = 2 hex digits
    • 256 bits = 32 bytes = 64 hex characters
  • Example: 10101100_2 = \mathrm{AC}_{16} = 172_{10}
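
As a quick check of the size relations above, the sketch below converts the same byte to hex and counts the hex characters in a 256-bit value. The names are illustrative, not part of the course code.

```python
value = int("10101100", 2)

# One byte prints as exactly two hex digits.
hex_text = format(value, "02X")
print(hex_text)  # AC

# Each group of 4 bits maps to one hex digit.
bit_text = format(value, "08b")
high, low = bit_text[:4], bit_text[4:]
print(high, "->", format(int(high, 2), "X"))  # 1010 -> A
print(low, "->", format(int(low, 2), "X"))    # 1100 -> C

# 256 bits = 32 bytes = 64 hex characters.
n_bytes = 256 // 8
n_hex_chars = len(bytes(n_bytes).hex())
print(n_bytes, n_hex_chars)  # 32 64
```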

Text to Bytes Example

  • Computers hash bytes, not abstract text
  • UTF-8 encoding maps characters to bytes
  • Example string: "hello"
    • bytes in hex: 68 65 6C 6C 6F
    • changing case or punctuation changes the bytes, and therefore the hash
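
The encoding step can be made concrete with a short sketch. The helper name `to_hex_bytes` is hypothetical, and `bytes.hex` with a separator argument assumes Python 3.8 or later.

```python
def to_hex_bytes(text: str) -> str:
    """Encode text as UTF-8 and show each byte as two uppercase hex digits."""
    return text.encode("utf-8").hex(" ").upper()

# Small edits to the text produce different byte sequences.
for text in ["hello", "Hello", "hello!"]:
    print(f"{text!r:10} {to_hex_bytes(text)}")
```

The first line of output matches the bytes listed above; the other two differ in the first byte and in length, respectively.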

What Is a Hash?

  • A hash function maps arbitrary input to fixed-length output
  • SHA-256 output length is always 256 bits (64 hex chars)
  • Example interpretation:
    • input: a transaction or block header
    • output: a 256-bit digest used as a compact fingerprint
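
To illustrate the fixed output length, the sketch below hashes inputs of very different sizes with Python's standard `hashlib`. The sample inputs are arbitrary.

```python
import hashlib

# Inputs of 0 bytes, 5 bytes, and 10,000 bytes.
inputs = [b"", b"hello", b"x" * 10_000]

digest_sizes = []
for data in inputs:
    digest = hashlib.sha256(data).digest()
    digest_sizes.append(len(digest))
    # Input length in bytes, digest length in bits, digest length in hex chars.
    print(len(data), len(digest) * 8, len(digest.hex()))
```

Every row reports 256 bits and 64 hex characters, regardless of input size.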

Why Hashes Are Useful

  • Deterministic: same input, same output
  • Avalanche effect: tiny input change, very different output
  • Pre-image resistance: hard to reverse digest back to original input
  • Collision resistance: hard to find two inputs with same digest
  • These properties enable tamper-evidence in blockchain data structures
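
Determinism and the avalanche effect can be observed directly by counting how many of the 256 output bits differ between two digests. This is a rough illustration with hypothetical helper names, not a test of cryptographic strength.

```python
import hashlib

def sha256_int(text: str) -> int:
    """SHA-256 digest of the UTF-8 text, read as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")

def differing_bits(a: str, b: str) -> int:
    """Number of bit positions at which the two digests differ."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

print(differing_bits("hello", "hello"))   # same input, so 0 differing bits
print(differing_bits("hello", "Hello"))   # one character changed
print(differing_bits("hello", "hello!"))  # one character appended
```

For a well-behaved hash, a small input change should flip roughly half of the output bits, so the last two counts are typically near 128.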

Why Blockchain?

  • Core coordination problem: no central ledger manager
  • Nodes must agree on one transaction history
  • Agreement must hold under:
    • network delays
    • strategic behavior
    • potentially malicious participants

Encoding and Hashes

  • Data are stored as bytes (8 bits each), usually displayed in hex
  • SHA-256 maps arbitrary input to a fixed 256-bit digest
  • Security-relevant properties:
    • deterministic
    • pre-image resistance
    • collision resistance
    • avalanche effect

Block Composition

  • A block has two parts:
    • Header: compact metadata used for PoW/identification
    • Body: full transaction list
  • In this simplified setup, header includes:
    • previous block hash
    • transaction summary hash (Merkle root)
    • timestamp
    • nonce
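
The header/body split can be sketched with two small data classes. This is an assumption-laden illustration: the `|`-separated text serialization follows the simplified setup used in these slides, the class and field names are hypothetical, and the `merkle_root` here is a plain hash of the joined transactions standing in for a real Merkle root. Real systems such as Bitcoin use a binary header with additional fields.

```python
import hashlib
from dataclasses import dataclass, replace

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

@dataclass(frozen=True)
class Header:
    prev_hash: str    # hash of the previous block's header
    merkle_root: str  # summary hash of the transactions
    timestamp: int
    nonce: int

    def block_hash(self) -> str:
        """Hash of the serialized header; this identifies the block."""
        text = f"{self.prev_hash}|{self.merkle_root}|{self.timestamp}|{self.nonce}"
        return sha256_hex(text)

@dataclass(frozen=True)
class Block:
    header: Header
    transactions: tuple  # the body: full transaction list

txs = ("alice->bob:5", "bob->carol:2")
header = Header(
    prev_hash="0" * 64,
    merkle_root=sha256_hex("|".join(txs)),  # stand-in, not a real Merkle tree
    timestamp=1_700_000_000,                # fixed value for reproducibility
    nonce=0,
)
block = Block(header=header, transactions=txs)

print(block.header.block_hash())
# Changing only the nonce yields a different block hash.
print(replace(header, nonce=1).block_hash())
```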

Merkle Root (Intuition)

  • Start by hashing each transaction (leaf hashes)
  • Hash leaves in pairs, then hash resulting parents in pairs
  • Continue until one top hash remains: the Merkle root
  • One transaction change alters its leaf, then propagates upward and changes the root
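
The pairing procedure above can be sketched as follows. Two choices here are assumptions rather than something the slides specify: an odd-sized level duplicates its last hash (a convention Bitcoin uses), and each node is a single SHA-256 of the two concatenated child digests.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: list) -> bytes:
    """Hash leaves, then hash pairs level by level until one hash remains."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [h(tx) for tx in transactions]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]     # assumed rule: duplicate the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
root = merkle_root(txs)
print(root.hex())

# Altering one transaction changes its leaf and therefore the root.
tampered = [b"alice->bob:50", b"bob->carol:2", b"carol->dave:1"]
print(merkle_root(tampered).hex())
```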

Why Merkle Roots Matter

  • Compact commitment: one fixed-size hash summarizes all block transactions
  • Fast verification of inclusion:
    • a node can verify one transaction with a short Merkle proof
    • only sibling hashes along one tree path are needed
  • Full transactions are still in the block body; the root is a summary, not a replacement
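
A Merkle inclusion proof can be sketched as below. This is a minimal illustration under stated assumptions: odd-sized levels duplicate their last hash, nodes are a single SHA-256 of the concatenated children, and the function names are hypothetical. A proof is the list of sibling hashes on the path from the leaf to the root, each tagged with whether the sibling sits on the left.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(transactions: list) -> list:
    """Return all tree levels, from leaf hashes up to the single root."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [h(tx) for tx in transactions]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumed rule: duplicate the last hash
            levels[-1] = level           # store the padded level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(transactions: list, index: int):
    """Return (proof, root) for the transaction at position index."""
    levels = build_levels(transactions)
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1              # the neighbor within the pair
        proof.append((level[sibling], sibling < index))
        index //= 2                      # move up to the parent
    return proof, levels[-1][0]

def verify_proof(tx: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the sibling hashes."""
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

txs = [b"tx-a", b"tx-b", b"tx-c", b"tx-d", b"tx-e"]
proof, root = merkle_proof(txs, 2)
print(len(proof), verify_proof(b"tx-c", proof, root))
print(verify_proof(b"tx-forged", proof, root))
```

With five transactions the proof holds three sibling hashes, so the verifier handles far less data than the full transaction list as blocks grow.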

Proof of Work (PoW)

  • Miners search for a nonce such that: \text{hash(header)} < \text{target}
  • Smaller target means higher difficulty
  • “More leading zeros” is shorthand for a lower target in hex representation
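
The link between "leading zeros" and a numeric target can be made explicit: a 64-digit hex hash starts with N zeros exactly when its integer value is below 16^(64-N). The sketch below assumes a text header with the nonce appended after `|`; the function names and the `demo-header` string are hypothetical.

```python
import hashlib

def hash_as_int(header: str) -> int:
    """SHA-256 of the header, read as a 256-bit big-endian integer."""
    return int.from_bytes(hashlib.sha256(header.encode("utf-8")).digest(), "big")

def target_for_leading_zeros(n: int) -> int:
    """Target equivalent to requiring n leading hex zeros in a 64-digit hash."""
    return 16 ** (64 - n)

def find_nonce(prefix: str, target: int, max_tries: int = 1_000_000) -> int:
    """Try nonces 0, 1, 2, ... until hash(header) < target."""
    for nonce in range(max_tries):
        if hash_as_int(f"{prefix}|{nonce}") < target:
            return nonce
    raise RuntimeError("no valid nonce found within max_tries")

target = target_for_leading_zeros(2)
nonce = find_nonce("demo-header", target)
digest_hex = hashlib.sha256(f"demo-header|{nonce}".encode("utf-8")).hexdigest()
print(nonce, digest_hex)
```

Lowering the target (for example, requiring three zeros instead of two) shrinks the set of acceptable hashes by a factor of 16 per extra zero.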

First-Passage View of Mining

  • Each nonce trial is one Bernoulli attempt
  • Under a rule requiring N leading hex zeros, each try succeeds with probability p = 16^{-N}
  • If K is the number of tries until the first success, then K\sim\text{Geometric}(p) and \mathbb{E}[K]=\frac{1}{p}=16^N
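
The geometric model implies more than the mean. As a sketch, the function below computes the theoretical mean, median, and standard deviation of K using the standard formulas for a geometric distribution on 1, 2, 3, ...; the function name is hypothetical.

```python
import math

def geometric_summary(n_zeros: int) -> dict:
    """Theoretical moments of the number of tries K under N leading hex zeros."""
    p = 16.0 ** (-n_zeros)
    return {
        "p": p,
        "mean": 1 / p,
        # Smallest k with P(K <= k) >= 1/2, i.e. (1 - p)^k <= 1/2.
        "median": math.ceil(math.log(0.5) / math.log(1 - p)),
        "std": math.sqrt(1 - p) / p,
    }

for n in (2, 3, 4):
    print(n, geometric_summary(n))
```

The median is roughly ln(2)/p, about 69% of the mean, and the standard deviation is close to the mean itself. Mining times are therefore right-skewed: most searches finish faster than average, while a few take much longer.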

Economic Security Logic

  • Producing valid blocks is expensive (hardware + energy)
  • Verifying blocks is cheap for all nodes
  • Rewards are paid only to accepted blocks
  • For most miners, honest participation is profit-maximizing

Confirmation Security

  • To reverse a confirmed payment, an attacker must catch up from a deficit of z blocks
  • If the honest share of hash power is p and the attacker's share is q, with p+q=1:
    • if q \ge p, eventual catch-up probability is 1
    • if q < p, benchmark catch-up probability: Q_z = \left(\frac{q}{p}\right)^z
  • Reversal risk declines exponentially with confirmation depth when q<p
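
The benchmark formula can be tabulated to see how quickly the risk falls. This sketch implements only the simple catch-up probability stated above; the function name and the sample values of q are illustrative.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Benchmark probability that an attacker with share q erases a deficit of z blocks."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    p = 1 - q
    if q >= p:
        return 1.0           # a majority attacker eventually catches up
    return (q / p) ** z

for q in (0.10, 0.25, 0.40):
    row = [f"{catch_up_probability(q, z):.6f}" for z in range(1, 7)]
    print(f"q={q:.2f}", " ".join(row))
```

Each extra confirmation multiplies the benchmark risk by q/p, so the decline is fast when the attacker is small and slow when q is close to p.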

State-Process Representation

At block t, define state: X_t=(h_{t-1}, m_t, \tau_t, T_t)

  • h_{t-1}: previous hash
  • m_t: Merkle-root transaction summary
  • \tau_t: timestamp
  • T_t: difficulty target

Recursion: current state \rightarrow nonce search \rightarrow accepted hash \rightarrow next state.
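
The recursion can be sketched as a loop that mines a short chain. Assumptions are flagged in the comments: the header is the `|`-separated text format of the simplified setup, transaction summaries are placeholder hashes rather than real Merkle roots, timestamps are fixed for reproducibility, and the target is held constant.

```python
import hashlib
from typing import NamedTuple

class State(NamedTuple):
    prev_hash: str    # h_{t-1}
    merkle_root: str  # m_t
    timestamp: int    # tau_t
    target: int       # T_t

def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def search_nonce(state: State, max_tries: int = 1_000_000):
    """Nonce search: return (nonce, accepted_hash) with hash below the target."""
    for nonce in range(max_tries):
        header = f"{state.prev_hash}|{state.merkle_root}|{state.timestamp}|{nonce}"
        digest = sha256_hex(header)
        if int(digest, 16) < state.target:
            return nonce, digest
    raise RuntimeError("no valid nonce found within max_tries")

def make_state(prev_hash: str, t: int, target: int) -> State:
    # Placeholder transaction summary and a fixed, evenly spaced timestamp.
    return State(prev_hash, sha256_hex(f"tx-set-{t}"), 1_700_000_000 + 600 * t, target)

target = 16 ** 62  # equivalent to two leading hex zeros
state = make_state("0" * 64, 0, target)
chain = []
for t in range(3):
    nonce, accepted = search_nonce(state)
    chain.append((state, nonce, accepted))
    state = make_state(accepted, t + 1, target)  # accepted hash feeds the next state

for s, nonce, accepted in chain:
    print(s.prev_hash[:8], nonce, accepted[:8])
```

Each block's `prev_hash` equals the accepted hash of the block before it, which is what links the states into a chain.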

Mining Pools

  • Solo mining has high payout variance
  • Pools smooth payouts by sharing reward flow
  • Common contracts:
    • PPS: miners get smoother pay, pool bears more variance
    • PPLNS: miners bear more variance, pool bears less
  • Pool concentration can increase coordination/security risk
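
The variance argument can be illustrated with a small simulation. This is an idealized sketch with hypothetical parameters: a miner with 1% of hash power, a reward of 1 per block, 1,000 blocks, and a PPS-style pool that pays the expected value with no fee. Real pool contracts charge fees and differ in detail.

```python
import random
import statistics

def solo_income(share: float, n_blocks: int, reward: float, rng: random.Random) -> float:
    """Solo miner: wins each block with probability equal to its hash share."""
    return sum(reward for _ in range(n_blocks) if rng.random() < share)

def pps_income(share: float, n_blocks: int, reward: float) -> float:
    """Idealized zero-fee PPS: the miner is paid its expected reward for certain."""
    return share * n_blocks * reward

rng = random.Random(0)  # fixed seed so the run is reproducible
share, n_blocks, reward = 0.01, 1_000, 1.0

solo = [solo_income(share, n_blocks, reward, rng) for _ in range(200)]
pooled = pps_income(share, n_blocks, reward)

print("solo mean:", statistics.mean(solo))
print("solo std: ", statistics.pstdev(solo))
print("PPS income (no variance for the miner):", pooled)
```

Both arrangements have the same expected income in this setup; the difference is that the solo miner's income fluctuates from run to run, while under PPS the pool absorbs that fluctuation.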

Simulation Code

import hashlib
import matplotlib.pyplot as plt
import pandas as pd
import random
import statistics
import time

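def sha256_hex(s: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()

def mine_once_leading_zeros(N: int, max_tries: int = 2_000_000):
    """Return the number of tries until a hash has N leading hex zeros, or None."""
    prev_hash = "0" * 64  # placeholder previous-block hash
    # Random placeholder standing in for a real Merkle root
    merkle_root = sha256_hex(f"tx-set-{random.randint(0, 10**9)}")
    timestamp = int(time.time())
    target_prefix = "0" * N

    for nonce in range(max_tries):
        header = f"{prev_hash}|{merkle_root}|{timestamp}|{nonce}"
        h = sha256_hex(header)
        if h.startswith(target_prefix):
            return nonce + 1  # nonce is 0-based, so tries = nonce + 1
    return None  # no valid nonce found within max_tries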

Simulation Results: Summary Table

settings = [
    {"N": 2, "reps": 200},
    {"N": 3, "reps": 120},
    {"N": 4, "reps": 40},
]

rows = []
for s in settings:
    N, reps = s["N"], s["reps"]
    samples = [mine_once_leading_zeros(N) for _ in range(reps)]
    samples = [x for x in samples if x is not None]

    rows.append({
        "N": N,
        "reps": len(samples),
        "theory_E_tries": 16**N,
        "empirical_mean": round(statistics.mean(samples), 2),
        "empirical_median": round(statistics.median(samples), 2),
    })

df_summary = pd.DataFrame(rows)
df_summary

   N  reps  theory_E_tries  empirical_mean  empirical_median
0  2   200             256          264.36             199.0
1  3   120            4096         3797.62            2706.5
2  4    40           65536        64913.88           43075.5

Simulation Results: Theory vs Empirical

Ns = df_summary["N"].tolist()
empirical = df_summary["empirical_mean"].tolist()
theory = df_summary["theory_E_tries"].tolist()

plt.figure(figsize=(7, 4))
plt.plot(Ns, empirical, marker="o", linewidth=2, label="Empirical mean tries")
plt.plot(Ns, theory, marker="s", linestyle="--", linewidth=2, label="Theory: $16^N$")
plt.xlabel("Difficulty (N leading hex zeros)")
plt.ylabel("Tries")
plt.title("Mining Effort: Simulation vs Theory")
plt.xticks(Ns)
plt.grid(alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

Simulation Results: Dispersion at Fixed Difficulty

N = 3
reps = 200
samples = [mine_once_leading_zeros(N) for _ in range(reps)]
samples = [x for x in samples if x is not None]  # drop runs that hit max_tries

df_dispersion = pd.DataFrame([{
    "N": N,
    "min": min(samples),
    "median": statistics.median(samples),
    "mean": round(statistics.mean(samples), 2),
    "max": max(samples),
}])
df_dispersion

   N  min  median     mean    max
0  3    7  2938.5  4081.05  27347

Takeaways

  • Blockchain combines cryptographic commitment and incentive design
  • Merkle roots make transaction commitment compact and verifiable
  • PoW makes production costly but verification cheap
  • Confirmation depth converts elapsed work into security