Performance Analytics
>_ Module 04 — Benchmarks, Dashboards, and Scientific ValidationFour typed benchmark runners feed a three-tier logging system, a modular analysis engine rolls that data into metrics with confidence intervals, and Streamlit dashboards turn the result into something a reviewer can read. The runners are the Module 03 API in anger — the analyzers and dashboards are what makes the numbers trustworthy enough to publish.
Benchmark Runners
Four scripts under benchmarks/, each narrow in scope and each backed
by the same shared harness. Every runner is a thin Hydra entry point — parse the
composed config, build the engine via the Module 03 API, loop over iterations
with the clock-locked profiling helper, and write one CSV row per measurement
window. No bespoke logging code in any runner; all of it lives in the harness.
Timing Methodology
Benchmark numbers are only as good as the clock behind them. The runners use a hybrid timing strategy plus a set of stability knobs that together took coefficient of variation from 40% on an unturned baseline to 18.7% under locked GPU clocks.
std::chrono — at ~10 kFPS, CUDA-event
overhead dominates and warps the result. Two tools, two regimes, one
informed choice per runner.
cudaDeviceScheduleBlockingSync is set globally in the
executor. Without it, cudaStreamSynchronize spin-waits on
the CPU and picks up OS scheduler noise. With it, the thread blocks
cleanly — CV dropped ~26% the moment this flag landed.
--lock-clocks pins graphics and memory clocks through the
Module 03 profiling helper. Eliminates GPU Boost and thermal throttling
noise. Warmup drift falls from ~13 µs to ~2 µs; final CV lands at
18.7% on an RTX 3090 Ti.
Experiment Logging Architecture
Three sinks, one source. The harness emits a single measurement; a fan-out helper
lands it in the right place for each consumer. Ephemeral artifacts live under
artifacts/ and regenerate on demand; persistent datasets under
datasets/ are promoted manually and survive a clean.
artifacts/data/ using the Module 03
schema. Streamlit reads CSV directly; the analyzers use CSV when
MLflow isn't available; the format stays portable across tools.
experiment-logging-system.md.
Modular Analysis Engine
The analyzers sit above the logs. AnalyzerBase handles loading,
schema validation, and aggregation; five specialized subclasses each own one
angle of the data. AnalysisEngine orchestrates them — pass in a
run directory, get back a structured report.
Statistical Rigor
Benchmark numbers come with uncertainty. The analysis layer is explicit about it — intervals, tests, and effect sizes sit beside every aggregate. Two runs that differ by 2 µs at the mean aren't different if the confidence intervals overlap; the analyzer says so instead of letting a plot imply otherwise.
Streamlit Dashboard
app.py plus five pages. Each page is a focused view — one executor
mode, one application domain, or one cross-cutting concern. The whole app reads
CSV from artifacts/data/ on every interaction; a file-watcher
reloads when a new run lands so a long sweep populates the UI in place.
Validation
Separate from day-to-day benchmarking, the experiments/validation/
folder and a set of CUDA-level correctness choices exist to check the
benchmarks themselves. Narrow on purpose — each validation study proves one
methodological choice, not a result.
hypotf(x, y) instead of
sqrtf(x*x + y*y) — IEEE-754-safe against overflow and
underflow in intermediates. Compiler flags --fmad=false and
--ftz=false disable fused-multiply-add and preserve denormals,
so Debug and Release builds produce bit-identical results. Validation
against SciPy double-precision (Parseval energy, FFT linearity, per-tone
SNR) brought the accuracy-suite pass rate from ~75% to 98.8% and lifted
typical SNR into the 130–140 dB range.
RTF in Practice
Module 03 defines RTF. This module interprets it. An RTF value on its own is
half an answer — the same 0.01 means "comfortably real-time" for a 100 kHz
ionospheric stream and "dangerously close" for a 10 MHz one. The
ScientificMetricsAnalyzer is the bridge from raw RTF to a decision.