Researcher API
>_ Module 03 — Python API, CLI, and Workflow Tooling
One Python Engine, three ways to summon it, and a CLI that handles the
CUDA / Conda / CMake plumbing so research — not setup — is where the time goes. The
same API scales from a one-line preset to a fully composed pipeline without changing
the call site. Hydra-driven configs, Snakemake orchestration, and a small set of
helper utilities turn the benchmarking loop from script-salad into library calls.
Three Ways to Start
Every researcher lands on the same engine.process(samples) call — what
differs is how the engine was built. Presets cover the 80% case with a single string.
A validated ExecutorConfig takes over when a parameter needs tuning.
Stage-level composition via StageFactory is reserved for the cases where
a brand-new stage enters the pipeline. Three surfaces, one call downstream.
The presets default, iono,
and ionox cover general, ionospheric, and extended-SNR workflows.
Zero knobs, zero boilerplate — good for a first run and for teaching.
Whichever surface builds it, the caller gets the same engine object back.
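The three surfaces can be sketched as follows — Engine, from_preset, and the preset values here are illustrative stand-ins for the real API, not names taken from the package:

```python
# Illustrative stand-ins (not the package's real names): the point is that
# all three construction surfaces converge on the same process() call site.
class Engine:
    def __init__(self, config: dict):
        self.config = config

    def process(self, samples):
        # Stub: a real engine would run the configured pipeline on the GPU.
        return {"n_samples": len(samples), "nfft": self.config["nfft"]}

# Surface 1 — presets: a single string, zero knobs (values are placeholders).
PRESETS = {
    "default": {"nfft": 1024},
    "iono":    {"nfft": 4096},
    "ionox":   {"nfft": 8192},
}

def from_preset(name: str) -> Engine:
    return Engine(PRESETS[name])

# Surface 2 — an explicit config dict standing in for ExecutorConfig.
engine = Engine({"nfft": 2048})

# Surface 3 — stage composition via a StageFactory would also hand back an
# Engine. The downstream call site never changes:
result = engine.process([0.0] * 1024)
```

However the engine was assembled, downstream code only ever sees `engine.process(samples)`.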
Configuration as Schema
Configuration is a first-class Pydantic v2 model, not a free-form dict. Invalid FFT sizes, illegal sample rates, and mode / stage mismatches raise at construction. Derived quantities — real-time factor, frequency resolution, hop size, memory budget — are computed properties, so they stay consistent with the inputs that produced them.
ExecutorConfig rejects negative rates, non-power-of-two NFFT,
and hop values that exceed the window. A bad config fails before any CUDA
context is created — cheap failures stay cheap.
freq_resolution, hop_samples, and estimated
memory footprint are properties of the config, derived from the inputs.
One source of truth — no drift between the value in the YAML and the value
reported in benchmarks.
Each preset is simply a stored ExecutorConfig under a name. The
preset system is a view over the schema, not parallel to it — there is
no second path that can drift.
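A minimal sketch of the pattern, using a plain dataclass to stand in for the Pydantic v2 model (field names, the window-equals-NFFT assumption, and the preset values are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecutorConfig:
    sample_rate: float  # Hz
    nfft: int           # FFT size; must be a power of two
    hop: int            # hop in samples; must not exceed the window (nfft here)

    def __post_init__(self):
        # Fail at construction, before any CUDA context exists.
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        if self.nfft <= 0 or self.nfft & (self.nfft - 1):
            raise ValueError("nfft must be a power of two")
        if self.hop > self.nfft:
            raise ValueError("hop must not exceed the window")

    @property
    def freq_resolution(self) -> float:
        # Derived, not stored — it cannot drift from the inputs that made it.
        return self.sample_rate / self.nfft

# Presets are a view over the same schema, not a second config path.
PRESETS = {"default": dict(sample_rate=100_000.0, nfft=1024, hop=256)}
config = ExecutorConfig(**PRESETS["default"])
```

In the real package the model would be Pydantic, so the same checks run on YAML-loaded values as on values passed in code.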
ConfigError, BenchmarkError,
ExperimentError, ReproducibilityError — a shallow
exception hierarchy that lets callers catch the narrow failure they care
about without a bare except.
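The shape of that hierarchy can be sketched as follows — the common base class name is an assumption, not taken from the source:

```python
# A shallow hierarchy under one assumed base class lets callers catch
# exactly the failure they care about without a bare `except`.
class SigtekxError(Exception):
    """Assumed common base — the actual base class name may differ."""

class ConfigError(SigtekxError): ...
class BenchmarkError(SigtekxError): ...
class ExperimentError(SigtekxError): ...
class ReproducibilityError(SigtekxError): ...

try:
    raise ConfigError("hop exceeds window")
except ConfigError as err:  # narrow: only configuration problems land here
    message = str(err)
```

Callers who want a catch-all still have one honest level up, without ever swallowing unrelated exceptions.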
CLI-First Development
The CLI exists so the Python API can stay unopinionated. Bootstrap, build, test,
lint, and run all collapse into a handful of shell entry points. No manual
nvcc invocations, no CMake incantations, no remembering which conda
env to activate — the scripts pick the right environment, cache the build, and
fall through to sensible defaults.
- Verify CUDA toolkit and GPU visibility
- Resolve the right conda env by platform
- Install the project in editable mode
- Safe to re-run — idempotent by design
- Configure, build, and install the native extension
- Incremental by default, full rebuild on flag
- Run the Google Test suite with gcovr reports
- Launch Nsight Systems / Compute captures
- Lint, format, and type-check the package
- Run pytest with fixture-sharing across modules
- Launch benchmark runners with a single verb
- Invoke the Snakemake DAG for experiment sweeps
Python Package Layout
The package at src/sigtekx/ is organized by concern, not by feature.
Each submodule owns a narrow responsibility — configuration, pipeline composition,
execution, utility infrastructure — so a researcher reading the code can land on
the right file without tracing import chains.
Workflow Orchestration
Experiments are Hydra configs composed from benchmark/,
engine/, and experiment/ YAML groups. Snakemake consumes
that DAG and runs only the nodes whose inputs or code have changed. The same
command twice in a row does nothing the second time — idempotency comes from
marker files, not from a diff of the world.
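As a rough illustration of that marker-file idempotency, a Snakemake rule can declare a touched marker as its output — the rule name, paths, and shell command below are invented for the sketch, not taken from the repo:

```
rule run_sweep:
    input:
        "conf/experiment/sweep.yaml"           # Hydra experiment group (path assumed)
    output:
        touch("artifacts/markers/sweep.done")  # marker file = the idempotency token
    shell:
        "python -m sigtekx.run --config-name sweep"
```

Because the marker is a declared output, a second invocation sees it up to date and skips the node; editing the YAML invalidates it and the node reruns.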
Run outputs land under artifacts/ and feed straight into the Module 04 analyzers.
Environments
Four conda specs under environments/ cover the axes that actually
differ across machines — build vs. runtime, Linux vs. Windows. A new contributor
can get the matching environment for their platform in one command; CI pins the
same specs so local results and cloud results run on the same toolchain.
Helper Utilities
Three utilities at the repo root turn the most-repeated benchmark boilerplate into library calls. They own the tedious parts — clock locking, stage timing, dataset materialization — so every runner in Module 04 can stay focused on what it's actually measuring.
One of them owns the clock-locking nvidia-smi privilege
dance. Warmup and steady-state duration control, NVTX range tagging, and
a context manager that guarantees the GPU is returned to its default
state even on exception paths.
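That exception-path guarantee is a try/finally under the hood. A generic sketch — the apply/restore callables stand in for the real nvidia-smi lock and reset commands, which are not shown in the source:

```python
from contextlib import contextmanager

@contextmanager
def locked_clocks(apply, restore):
    """Sketch only: `apply`/`restore` stand in for the real
    nvidia-smi clock-lock and clock-reset invocations."""
    apply()
    try:
        yield
    finally:
        restore()  # runs even when the benchmark body raises

state = {"locked": False}
try:
    with locked_clocks(lambda: state.update(locked=True),
                       lambda: state.update(locked=False)):
        raise RuntimeError("benchmark crashed mid-run")
except RuntimeError:
    pass
# The stand-in GPU state is back to default despite the exception.
```

The same shape covers any paired setup/teardown the runners need, which is why a context manager is the right primitive here.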
utils.reproducibility handles seed propagation across
NumPy, PyTorch, and the CUDA PRNG. Every run records its seed set in the
output CSV — a benchmark that can't be reproduced doesn't get saved.
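A dependency-free sketch of that seed propagation — the real utility also seeds NumPy (np.random.seed), PyTorch (torch.manual_seed), and the CUDA PRNG, but only the stdlib PRNG is seeded here so the sketch stays self-contained; the function name is an assumption:

```python
import random

def set_seeds(seed: int) -> dict:
    """Seed every PRNG in play and return the seed set so the runner
    can record it in the output CSV (stdlib-only in this sketch)."""
    random.seed(seed)
    return {"python": seed}

seed_record = set_seeds(1234)
a = random.random()
set_seeds(1234)   # re-seeding replays the exact same stream
b = random.random()
```

Returning the seed set, rather than seeding silently, is what makes "every run records its seeds" enforceable at the CSV-writing layer.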
Data Outputs
The API produces data that Module 04 consumes. Two conventions make that handoff boring: an academic real-time-factor definition, and a stable CSV schema. Boring is the goal — analytics should never have to re-derive what a column means.
RTF = T_process / T_signal — the academic convention from ASR
and radar literature, where lower is faster. A 100 kHz stream processed in
300 µs per 100 ms window yields RTF ≈ 0.003. The opposite throughput
convention (higher is faster) is documented in
rtf-convention-mapping.md for readers arriving from that side
of the field.
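The worked example from the text, as two lines of arithmetic:

```python
# RTF = T_process / T_signal — lower is faster (ASR / radar convention).
def rtf(t_process_s: float, t_signal_s: float) -> float:
    return t_process_s / t_signal_s

# 300 µs of compute per 100 ms window of signal:
value = rtf(300e-6, 100e-3)   # ≈ 0.003
```

Converting to the opposite throughput convention is just the reciprocal, which is why a single mapping note suffices for readers from that side of the field.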
Results are written as CSV files under artifacts/data/ —
columns for config hash, executor mode, NFFT, sample rate, per-frame
latency, stage timings, seed, and environment fingerprint. The schema is
versioned; additions are append-only, so an old analyzer against new data
still runs. Full column list in csv-file-organization.md.
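A sketch of what one row looks like — the column names below are illustrative placeholders; the authoritative, versioned list lives in csv-file-organization.md:

```python
import csv
import io

# Illustrative columns only. Append-only schema: new columns go at the end,
# so an old analyzer reading by name still runs against new data.
COLUMNS = ["schema_version", "config_hash", "executor_mode", "nfft",
           "sample_rate", "frame_latency_us", "seed", "env_fingerprint"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow({
    "schema_version": 1, "config_hash": "a1b2c3", "executor_mode": "iono",
    "nfft": 4096, "sample_rate": 100_000, "frame_latency_us": 300,
    "seed": 1234, "env_fingerprint": "linux-cuda12",
})
header = buf.getvalue().splitlines()[0]
```

Writing through DictWriter with a fixed field list means a runner that forgets a column fails loudly at write time instead of producing a silently misaligned file.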