ChipWhisperer physical security labs

Side-channel and fault-injection work with SPA, DPA on AES, VerifyPin, and hardening countermeasures.

The ChipWhisperer lab is where abstract statements about cryptographic security stop being sufficient. “The algorithm is secure” becomes less useful when you can see the power trace and watch the algorithm leak. The labs ran on actual hardware — an STM32-based target board connected to a ChipWhisperer-Lite oscilloscope capture device — and the firmware we attacked was our own code, which made the vulnerabilities impossible to dismiss.

Two labs stand out. The first was SPA against a password comparison function. The second was DPA against AES-128.

SPA: the password function

The target firmware for the Simple Power Analysis lab is in simpleserial-glitch.c. The password function is:

uint8_t password(uint8_t* pw, uint8_t len) {
    char passwd[] = "cy5ec";
    char passok = 1;
    int cnt;

    trigger_high();

    for(cnt = 0; cnt < 5; cnt++){
        if (pw[cnt] != passwd[cnt]){
            passok = 0;
            break;
        }
    }

    trigger_low();

    simpleserial_put('r', 1, (uint8_t*)&passok);
    return 0x00;
}

The vulnerability is the break statement. When the function finds a mismatch at position cnt, it sets passok = 0 and exits the loop early. If the first character is wrong, the loop runs once and exits. If the first two characters are correct but the third is wrong, the loop runs three times and exits. In general, a guess with k correct leading characters runs k + 1 iterations before failing, so the iteration count reveals how many leading characters were right.

The microcontroller’s power consumption correlates with its execution. Each loop iteration consumes measurable power. The ChipWhisperer samples the power trace at millions of samples per second, and the trace for a password that matched on four characters before failing is visibly longer than the trace for one that failed at position zero.

The attack is byte-by-byte recovery. Pick a character set. For each candidate first character, send it as the password, capture the trace, and measure how many loop iterations occurred based on trace length. When a trace is longer than all the others, that candidate was correct for the first position. Move to the second character with the confirmed first character held fixed, and repeat.

The character set used in the lab was 36 characters (lowercase letters plus digits). Brute-forcing the full five-character password would require 36^5 = 60,466,176 attempts. Byte-by-byte reduces that to at most 5 × 36 = 180 attempts, and in practice it is fewer because the correct answer terminates the search for each position early.
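The recovery loop described above can be sketched without hardware. The following is a minimal simulation, not the lab's capture script: simulated_target is a hypothetical stand-in that returns the iteration count directly, where the real attack would infer it from the captured trace. It also makes one detail of this model explicit: at the final position the loop runs five iterations whether the last character is right or wrong, so the sketch falls back on the passok response there.

```python
import string

SECRET = "cy5ec"  # known only to the simulated target
CHARSET = string.ascii_lowercase + string.digits  # 36 candidates


def simulated_target(guess):
    """Mimic the early-exit loop: return (iterations, passok).

    Hypothetical stand-in for capturing a trace and measuring its active length.
    """
    iterations = 0
    passok = 1
    for cnt in range(5):
        iterations += 1
        if guess[cnt] != SECRET[cnt]:
            passok = 0
            break
    return iterations, passok


def recover_password():
    known = ""
    attempts = 0
    for pos in range(5):
        for candidate in CHARSET:
            # Pad with a byte outside the charset so later positions always mismatch.
            guess = (known + candidate).ljust(5, "\x00")
            iterations, passok = simulated_target(guess)
            attempts += 1
            # A wrong character at `pos` stops after pos + 1 iterations.
            # A correct one runs longer, or (at the last position) sets passok.
            if passok or iterations > pos + 1:
                known += candidate
                break
    return known, attempts


password, attempts = recover_password()
print(password, attempts)
```

On real traces the comparison would be relative (the longest trace among the candidates) rather than against a known iteration count, since the attacker only sees the power signal.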

Confirming the recovered password “cy5ec” required sending it and observing passok = 1 in the response.

The countermeasure

The countermeasure is in simpleserial-glitch-sol.c. The break is commented out:

for(cnt = 0; cnt < 5; cnt++){
    if (pw[cnt] != passwd[cnt]){
        passok = 0;
        // break;   <-- removed
    }
}

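Now the loop always runs exactly five iterations. Every password attempt produces the same number of iterations regardless of how many characters are correct, so the trace-length signal the attack relied on is gone. This is the core idea of constant-time comparison: the execution time should not depend on the content of the secret. Strictly, the assignment passok = 0 still executes only on a mismatch, so a small data-dependent difference remains inside each iteration, but the gross difference in loop length has been removed.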

The cost is at most four extra character comparisons, in the case where the mismatch occurs at position zero. That cost is negligible. The decision to add a break to the original code was almost certainly not a security decision. It was an efficiency habit, completely reasonable in any other context, that becomes a serious vulnerability in a security-critical path.
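A common refinement of the fixed-length loop, which is not part of the lab solution, is to drop the conditional entirely and accumulate the XOR of each byte pair. The sketch below shows only the logic; compare_fixed_work is a hypothetical name, and Python itself gives no timing guarantees, so on the target this would be written in C. The standard library's hmac.compare_digest is the maintained equivalent on the Python side.

```python
import hmac


def compare_fixed_work(candidate: bytes, secret: bytes) -> bool:
    """Compare two byte strings with no early exit and no branch inside the loop."""
    if len(candidate) != len(secret):  # the length is treated as public here
        return False
    diff = 0
    for a, b in zip(candidate, secret):
        diff |= a ^ b  # stays 0 only if every byte pair matches
    return diff == 0


ok = compare_fixed_work(b"cy5ec", b"cy5ec")
bad_first = compare_fixed_work(b"ay5ec", b"cy5ec")
bad_last = compare_fixed_work(b"cy5ea", b"cy5ec")
library_ok = hmac.compare_digest(b"cy5ec", b"cy5ec")
```

Every call performs the same five XOR and OR operations, whichever position differs.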

Leakage assessment: finding where to look

Before building an attack you need to know whether the target leaks at all and, if so, where. That is what the leakage assessment lab is for. Rather than trying to extract a key, the goal is to establish statistically whether the power consumption of the device depends on the values being processed — and at which sample points that dependence shows up.

The method is a Student's t-test on two populations of power traces partitioned by some criterion, applied independently at every sample point. The null hypothesis H₀ is that the two populations have the same mean. If H₀ is rejected at a sample point, the mean power at that point depends on the criterion, which indicates leakage.

The test statistic for two equal-size sets of n traces each is:

t = (X̄₁ - X̄₂) / (sp * sqrt(2/n))

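where sp is the pooled standard deviation, which for equal-size sets is sqrt((s₁² + s₂²) / 2) with s₁ and s₂ the sample standard deviations of the two sets. The threshold used in side-channel evaluations is |t| > 4.5, corresponding to roughly 99.999% confidence that the null hypothesis should be rejected. Below that threshold, the test has not detected a difference beyond what statistical noise could produce. That is not proof that no leakage exists, only that none was found with this many traces.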
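The statistic is easy to compute per sample point with NumPy. The sketch below applies the equal-size formula to synthetic data; the trace counts, the noise level, and the artificial offset injected at sample 20 are all assumptions chosen for illustration, not lab measurements.

```python
import numpy as np


def pooled_t(set_1, set_2):
    """Student t statistic at each sample point for two equal-size trace sets.

    Rows are traces, columns are sample points.
    """
    n = set_1.shape[0]
    assert set_2.shape[0] == n, "this form of the formula assumes equal set sizes"
    sp = np.sqrt((set_1.var(axis=0, ddof=1) + set_2.var(axis=0, ddof=1)) / 2.0)
    return (set_1.mean(axis=0) - set_2.mean(axis=0)) / (sp * np.sqrt(2.0 / n))


rng = np.random.default_rng(0)
n, n_samples, leak_at = 500, 50, 20

set_1 = rng.normal(0.0, 1.0, size=(n, n_samples))
set_2 = rng.normal(0.0, 1.0, size=(n, n_samples))
set_2[:, leak_at] += 1.0  # synthetic "leak": the means differ at one sample only

t = pooled_t(set_1, set_2)
leaky_points = np.flatnonzero(np.abs(t) > 4.5)
print(leaky_points)
```

Only the sample carrying the injected offset should cross the threshold; the other 49 points play the role of samples with no data dependence.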

Control experiment: random versus random

The first experiment tests the method itself. Two sets of traces are captured under identical random conditions — different random keys and plaintexts for every trace, with traces interleaved across the two sets to eliminate order bias. The t-test on these two sets should find no sample point above the threshold, because nothing distinguishes the two populations. Verifying this is not busywork: if the control fails, the measurement setup has a systematic bias and the fixed-versus-random result cannot be trusted.
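The effect of interleaving can be illustrated with a small simulation. The sketch assumes a slow drift in the measurement (for example from temperature), which is a hypothetical disturbance rather than something recorded in the lab. Both sets contain pure noise plus that drift, so any sample above the threshold is a false positive.

```python
import numpy as np


def pooled_t(set_1, set_2):
    """Per-sample Student t statistic for two equal-size trace sets."""
    n = set_1.shape[0]
    sp = np.sqrt((set_1.var(axis=0, ddof=1) + set_2.var(axis=0, ddof=1)) / 2.0)
    return (set_1.mean(axis=0) - set_2.mean(axis=0)) / (sp * np.sqrt(2.0 / n))


rng = np.random.default_rng(1)
n_total, n_samples = 2000, 40

# Assumed slow offset that grows over the capture session.
drift = np.linspace(0.0, 1.0, n_total)[:, None]
traces = rng.normal(0.0, 1.0, size=(n_total, n_samples)) + drift

# Blocked capture: the first half is set A, the second half is set B.
t_blocked = pooled_t(traces[: n_total // 2], traces[n_total // 2 :])

# Interleaved capture: acquisitions alternate between the two sets.
t_interleaved = pooled_t(traces[0::2], traces[1::2])

print(np.abs(t_blocked).max(), np.abs(t_interleaved).max())
```

With the blocked split the drift alone pushes every sample past 4.5, while the interleaved split spreads the drift evenly across both sets and stays below it.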

Non-specific characterization: fixed versus random

The second experiment uses one set with a fixed key and fixed plaintext (the same values repeated every acquisition) and one set with random key and random plaintext. In the fixed set, the intermediate values inside AES are the same on every trace. The device’s mean power at the clock cycles that handle those values should therefore differ from the random set, where the intermediate values vary from trace to trace.

The t-test picks this up. Sample points where |t| > 4.5 are points where the two populations are statistically distinguishable — where the fixed computation produces a characteristic power signature. The result from the lab showed significant t-values at multiple sample points, confirming that the AES implementation leaks information through power.

This is a non-specific result. It tells you that something leaks, and where in the trace it leaks, but not which intermediate value is responsible or how to exploit it.

Specific characterization: which intermediate value?

To identify the specific source of leakage, the traces are partitioned differently. Instead of fixed vs random, the lab partitioned by the least significant bit of a specific intermediate value across 1000 traces with known keys and plaintexts:

Key bytes: partition by LSB of each key byte — if the partition produces significant t-values, the hardware is processing that key byte in a way that exposes its bits.
AddRoundKey output: partition by LSB of plaintext XOR key for each byte position — this is the output of the first AES step.
SBox output: partition by LSB of SBox[plaintext XOR key] — this is the output of the SubBytes step.
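The partitioning step can be sketched on synthetic data. Everything here is an assumption made for illustration: the simulated device leaks only the Hamming weight of the AddRoundKey output for one byte, at one sample point, with unit Gaussian noise. The S-Box partition would follow the same pattern with the S-Box table applied to ark first.

```python
import numpy as np

HW = np.array([bin(x).count("1") for x in range(256)])  # Hamming weight table


def pooled_t(group_a, group_b):
    """Per-sample Student t statistic; group sizes may differ."""
    na, nb = len(group_a), len(group_b)
    sp2 = ((na - 1) * group_a.var(axis=0, ddof=1)
           + (nb - 1) * group_b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (group_a.mean(axis=0) - group_b.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))


rng = np.random.default_rng(2)
n, n_samples, leak_at = 1000, 60, 25

keys = rng.integers(0, 256, size=n, dtype=np.uint8)  # one known key byte per trace
pts = rng.integers(0, 256, size=n, dtype=np.uint8)   # one known plaintext byte per trace
ark = keys ^ pts                                     # AddRoundKey output

traces = rng.normal(0.0, 1.0, size=(n, n_samples))
traces[:, leak_at] += HW[ark]  # assumed leak: Hamming weight of the AddRoundKey output


def t_by_lsb(values):
    """Partition the traces by the least significant bit of `values`."""
    bit = values & 1
    return pooled_t(traces[bit == 0], traces[bit == 1])


t_ark = t_by_lsb(ark)   # partition that matches the simulated leak
t_key = t_by_lsb(keys)  # partition on a value this simulated device does not leak
print(int(np.argmax(np.abs(t_ark))), np.abs(t_key).max())
```

In this simulation only the AddRoundKey partition crosses the threshold, which is the point of the specific test: the partition that produces a peak identifies the intermediate value responsible.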

Overlaying the absolute t-values for all 16 key bytes on a single plot produces 16 distinct peaks, visible in the sample range roughly 150 to 370. The peaks appear consecutively because AES processes the 16 bytes one after another. The peaks for the SBox output are the sharpest — the nonlinear S-Box lookup is the most power-distinguishable step in the round, which is exactly why DPA targets it.

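import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind

# traces: (n_traces, n_samples) array; keys: (n_traces, 16) array of known key bytes
for byte_idx in range(16):
    # Split the traces by the least significant bit of this key byte
    lsb_0 = (keys[:, byte_idx] % 2) == 0
    lsb_1 = (keys[:, byte_idx] % 2) == 1
    t_stats, _ = ttest_ind(traces[lsb_0], traces[lsb_1], axis=0, equal_var=True)
    plt.plot(np.abs(t_stats), alpha=0.6)

plt.axhline(y=4.5, color='r', linestyle='--')  # leakage threshold
plt.axis([150, 370, 0, 8])                      # zoom on the samples where the peaks appear
plt.show()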

This is what connects the assessment lab to the DPA attack. Leakage assessment identifies where in the execution the power depends on secret values. DPA uses that location to build an attack. The two labs follow the same sequence of ideas: measure, find structure, exploit structure.

The approach is documented in Schneider and Moradi’s “Leakage Assessment Methodology — A Clear Roadmap for Side-Channel Evaluations” (CHES 2015), which is the standard reference for this kind of structured evaluation.

DPA: AES-128 key recovery

Differential Power Analysis against AES is more sophisticated. The target runs standard AES-128. The key is unknown. The attack recovers all 16 key bytes from power traces gathered during encryption.

The approach targets the first round. AES-128 encryption begins by XORing the 16-byte plaintext with the 16-byte key (AddRoundKey), then passing each byte through the AES S-Box (SubBytes). The S-Box is a fixed 256-entry lookup table. This is the step that leaks.

The leakage model

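The attack uses a Hamming weight leakage model: the power consumed while a value is processed is assumed to grow with the number of 1 bits in that value. This is a simplification of real power consumption, but it is a reasonable first approximation for many microcontroller targets. The single-bit selection used below follows from it: traces where a given bit of the value is 1 have, on average, a Hamming weight one higher than traces where it is 0.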
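The link between the Hamming weight model and a one-bit partition can be checked by enumeration. The sketch below is an illustration only; hamming_weight is a helper name introduced here, and the byte is assumed to be uniformly distributed, as an S-Box output is for random plaintexts.

```python
def hamming_weight(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


# Split all 256 byte values by bit 0 and compare the average Hamming weight.
bit_set = [hamming_weight(v) for v in range(256) if v & 1]
bit_clear = [hamming_weight(v) for v in range(256) if not v & 1]

mean_set = sum(bit_set) / len(bit_set)
mean_clear = sum(bit_clear) / len(bit_clear)
gap = mean_set - mean_clear
print(mean_set, mean_clear, gap)
```

The other seven bits average out identically in both groups, so the whole gap comes from the selected bit. That gap is what the difference of means picks up in the traces.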

The target function in the analysis notebook is:

SBOX = [...]   # standard 256-entry AES S-Box

def target_function(plaintext_byte, key_guess):
    return SBOX[plaintext_byte ^ key_guess]

For a given byte position b, the attacker has N traces, each corresponding to a known plaintext. For each of the 256 possible key byte values, the attacker computes target_function(plaintext[b], k) for every trace and extracts one bit of that value as the leakage prediction.

def leakage_function(x):
    return x & 1   # extract bit 0

Partitioning and differential

For each key guess k, the traces are partitioned into two groups:

Group 0: traces where leakage_function(target_function(plaintext[b], k)) == 0
Group 1: traces where leakage_function(target_function(plaintext[b], k)) == 1

The DPA statistic is the mean difference between the two groups, taken as the absolute value at each sample point:

import numpy as np

def hypothesis(traces, plaintexts, key_guess, b):
    # One selection bit per trace, predicted from the known plaintext and the key guess
    sel = np.array([leakage_function(target_function(pt[b], key_guess)) for pt in plaintexts])
    group0 = traces[sel == 0]   # boolean masks need a NumPy array, not a Python list
    group1 = traces[sel == 1]
    return np.abs(np.mean(group1, axis=0) - np.mean(group0, axis=0))

The logic is: if k is the correct key, then the partition into groups 0 and 1 will correctly separate traces by a real hardware event, namely whether bit 0 of that S-Box output was a 0 or a 1 on a particular clock cycle. The difference between the group means will show a peak at the clock cycle where the S-Box lookup actually occurred. If k is wrong, the partition is uncorrelated with anything real, and the mean difference across all sample points stays near zero.

Full key recovery

The key is recovered byte by byte. For each of the 16 byte positions, run the 256-guess sweep and find the guess that produces the largest peak:

def dpa(traces, plaintexts, byte_idx):
    peaks = []
    for k in range(256):
        # Pass the byte position through so each sweep targets its own S-Box lookup
        diff = hypothesis(traces, plaintexts, k, byte_idx)
        peaks.append(np.max(diff))
    return np.argmax(peaks)

def recover_full_key(traces, plaintexts):
    # traces must be a NumPy array of shape (n_traces, n_samples)
    return [dpa(traces, plaintexts, b) for b in range(16)]

Running this against the captured traces produced the correct key:

0x00 0x01 0x02 0x03 0x04 0xca 0xfe 0x07 0xde 0xc0 0x0d 0xed 0x0c 0x0d 0x0e 0x0f

Written out: 0x00010203 04cafe07 dec00ded 0c0d0e0f. The embedded “cafe”, “dec0”, and “0ded” segments are clearly intentional by whoever set the test key — they make correct recovery unambiguous.
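The lab result came from real captures. To check the difference-of-means logic without hardware, the sketch below runs the same sweep for a single key byte on synthetic traces. The trace model is an assumption: Gaussian noise plus the Hamming weight of the S-Box output at one sample point. The S-Box is generated from its algebraic definition (inverse in GF(2^8) followed by the affine transform) so that the block is self-contained; simulate_traces and dpa_byte are names introduced for this sketch.

```python
import numpy as np


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox():
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    table = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the multiplicative inverse of x
                inv = gf_mul(inv, x)
        table.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return np.array(table, dtype=np.uint8)


SBOX = build_sbox()
HW = np.array([bin(x).count("1") for x in range(256)])


def simulate_traces(key_byte, n_traces=5000, n_samples=50, leak_at=17, seed=0):
    """Synthetic traces: unit noise plus HW(SBOX[pt ^ key]) at one sample (assumed model)."""
    rng = np.random.default_rng(seed)
    pts = rng.integers(0, 256, size=n_traces, dtype=np.uint8)
    traces = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
    traces[:, leak_at] += HW[SBOX[pts ^ key_byte]]
    return traces, pts


def dpa_byte(traces, pts):
    """Return the key guess whose bit-0 partition gives the largest mean difference."""
    peaks = np.zeros(256)
    for k in range(256):
        sel = SBOX[pts ^ k] & 1
        diff = traces[sel == 1].mean(axis=0) - traces[sel == 0].mean(axis=0)
        peaks[k] = np.abs(diff).max()
    return int(np.argmax(peaks))


traces, pts = simulate_traces(key_byte=0xCA)
recovered = dpa_byte(traces, pts)
print(hex(recovered))
```

Wrong guesses still produce small "ghost" peaks at the leaking sample because their partitions are weakly correlated with the true S-Box output, which is one reason the attack needs enough traces for the correct peak to stand clear.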

Why this works

Each S-Box lookup is an operation on a fixed-size value with a predictable data-dependent power cost. The first AddRoundKey is a plain XOR, so the S-Box input for byte b is plaintext[b] XOR key[b]. The attacker knows the plaintext, knows the S-Box, and guesses the key. The only unknown is which guess matches the actual hardware behavior. Power analysis provides the oracle.

The attack requires enough traces to average down the noise. With clean traces and a cooperative target, a few hundred traces suffice. Real hardware in noisy environments requires more, and countermeasures like masking (XORing intermediate values with a random mask before the S-Box and removing it after) make the simple partitioning approach fail because the leakage is no longer only a function of plaintext XOR key.
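Why masking breaks the partition can be shown by enumeration. The sketch below assumes an ideal, uniformly random 8-bit mask that is fresh for every encryption, and looks only at first-order (mean) leakage; it does not model a full masked S-Box or attacks that combine several sample points.

```python
def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def mean_hw_masked(value: int) -> float:
    """Average Hamming weight of value ^ mask over all 256 equally likely masks."""
    return sum(hamming_weight(value ^ mask) for mask in range(256)) / 256


def fraction_bit0_set_masked(value: int) -> float:
    """How often bit 0 of the masked value is 1, over all masks."""
    return sum((value ^ mask) & 1 for mask in range(256)) / 256


unmasked = [hamming_weight(v) for v in range(256)]
masked_means = [mean_hw_masked(v) for v in range(256)]
masked_bit0 = [fraction_bit0_set_masked(v) for v in range(256)]

print(min(unmasked), max(unmasked))  # the unmasked weight depends on the value
print(set(masked_means))             # the masked average does not
```

Because XOR with a uniform mask maps every value onto a uniform byte, the attacker's predicted bit no longer says anything about the mean power of the masked intermediate.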

Countermeasures and what they cost

The SPA countermeasure was close to free: remove the break and accept up to four extra comparisons. The DPA countermeasure is not free. Masking adds at least one random mask generation and two additional XOR operations per S-Box lookup. Hiding adds jitter (random delays that make it hard to align traces) or shuffling (processing S-Box lookups in random order). Both masking and hiding degrade performance and increase hardware complexity.
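Shuffling is the simplest of these to sketch. In the example below, TOY_SBOX is a hypothetical stand-in permutation rather than the AES table, and Python's random module stands in for whatever random source the device provides. Both are assumptions made to keep the sketch short.

```python
import random

# Hypothetical stand-in for the S-Box: any 256-entry permutation works for this sketch.
TOY_SBOX = [(7 * x + 3) % 256 for x in range(256)]


def sub_bytes_in_order(state):
    return [TOY_SBOX[b] for b in state]


def sub_bytes_shuffled(state, rng):
    """Perform the same 16 lookups, visiting the byte positions in a random order."""
    order = list(range(len(state)))
    rng.shuffle(order)  # fresh order for every encryption
    out = [0] * len(state)
    for i in order:     # position i lands in a different time slot on each run
        out[i] = TOY_SBOX[state[i]]
    return out, order


rng = random.Random(1234)
state = list(range(16))
out_a, order_a = sub_bytes_shuffled(state, rng)
out_b, order_b = sub_bytes_shuffled(state, rng)
print(order_a)
print(order_b)
```

The output is identical to the in-order version, but a given byte's lookup no longer happens at a fixed sample point, so averaging traces at one point mixes leakage from different bytes. The cost is the permutation generation and the indirect indexing on every round.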

The labs made the performance/security trade-off concrete. The attack works because efficient hardware is predictable hardware. Making it less predictable costs cycles, area, or both. Those costs are real, and someone has to decide whether they are worth paying — which requires understanding what the threat model actually is for the device in question.

The other thing these labs taught is that the word “secure” applied to a cryptographic algorithm does not transfer automatically to an implementation. AES is secure. This AES implementation, running on this microcontroller, without masking, in a lab with a power probe attached, is not. The distinction matters whenever the device is physically accessible to an adversary.