diff --git a/README.md b/README.md
index 59e564c..d0eae2c 100644
--- a/README.md
+++ b/README.md
@@ -158,30 +158,46 @@ picks up rkvdec2 patches.
 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target
 7. **Document** — freeze the final recipe in this README
 
-**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on
-ohm; step 5 scaffold landed (`marfrit-packages/arch/ffmpeg-v4l2-request-git/`
-on `main`, commit `9994c4e`, not yet pushed — first fermi CI build + ohm
-install still to run). Immediate next: push, build, install, validate.
+**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 + 4 done for H.264 on
+ohm (GStreamer `v4l2slh264dec → waylandsink` at 7 % CPU with zero-copy
+dmabuf NV12). Step 5 scaffold landed
+(`marfrit-packages/arch/ffmpeg-v4l2-request-git/` on `main`, commit
+`9994c4e`, unpushed — fermi CI build + ohm install still to run).
+Immediate next: push, build, install, validate the ffmpeg path.
 
-### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)
+### Baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)
 
 Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler
 `/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm
-(same bytes on every fleet device).
+(same bytes on every fleet device). **Source frame rate is 24 fps** per
+H.264 caps (`framerate=24/1`) — the `1080p30` in the file name is a
+misnomer carried from Blender's encode metadata; the actual media rate is
+24 fps. That's the number the target/delivered columns below are measured
+against.
 
-| Test                                   | CPU%  | Frames / wall  | Effective fps | Dropped |
-|----------------------------------------|-------|----------------|---------------|---------|
-| `ffmpeg -hwaccel none -f null -`       | 319 % | 1440 / 18.55 s | 77.6          | n/a     |
-| `ffmpeg -re -hwaccel none -f null -`   |  90 % | 1440 / 60.27 s | 23.9 (paced)  | n/a     |
-| `mpv --hwdec=no --vo=gpu-next`, DSI-1  | 127 % | 1800 target    | ~14 delivered | **973** |
+| Test                                                         | CPU%  | Frames / wall     | Effective fps | Dropped |
+|--------------------------------------------------------------|-------|-------------------|---------------|---------|
+| SW: `ffmpeg -hwaccel none -f null -`                         | 319 % | 1440 / 18.55 s    | 77.6          | n/a     |
+| SW: `ffmpeg -re -hwaccel none -f null -`                     |  90 % | 1440 / 60.27 s    | 24.0 (paced)  | n/a     |
+| SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1                    | 127 % | 1440 source / 60 s| ~7.8 delivered| **973** |
+| HW: `gst v4l2slh264dec → fakesink sync=false`                |  89 % | 1800 / 48.9 s     | 36.8          | n/a     |
+| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1        |   7 % | 1488 source / 62 s| ~24 (paced)   | unknown |
 
-Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55
-(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through
-KWin drops **54 %** of frames while only using 127 % CPU — the
-compositor / VO path is the bottleneck, not decode. Exactly the
-"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the
-decode cost but won't by itself fix the composition drops; `--vo=drm`
-(direct KMS scanout) or a dmabuf-capable VO is the lever for that.
+Reading:
+- SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but
+  costs ~3.2 cores unpaced or 1 core at realtime pace.
+- SW mpv via `gpu-next` through KWin drops **68 %** of frames while only
+  using 127 % CPU — the compositor / VO path is the bottleneck, not
+  decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below.
+- HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU.
+- **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12)
+  drops the CPU to 7 %** — an ≈18× drop from the SW mpv number. The
+  GStreamer `v4l2codecs → waylandsink` path on KWin negotiates
+  dmabuf-direct, bypassing any GL upload.
+- Frame-drop count for the HW waylandsink run is unknown here
+  (fpsdisplaysink signal wasn't wired through `gst-launch`); next pass
+  should parse via `GST_DEBUG=fpsdisplaysink:5` or a `progressreport`
+  element.
 
 ### Acceptance criterion