Researcher API

>_ Module 03 — Python API, CLI, and Workflow Tooling

One Python Engine, three ways to summon it, and a CLI that handles the CUDA / Conda / CMake plumbing so research — not setup — is where the time goes. The same API scales from a one-line preset to a fully composed pipeline without changing the call site. Hydra-driven configs, Snakemake orchestration, and a small set of helper utilities turn the benchmarking loop from script-salad into library calls.

Three Ways to Start

Every researcher lands on the same engine.process(samples) call — what differs is how the engine was built. Presets cover the 80% case with a single string. A validated ExecutorConfig takes over when a parameter needs tuning. Stage-level composition via StageFactory is reserved for the cases where a brand-new stage enters the pipeline. Three surfaces, one call downstream.

Preset
One-Line Default
engine = Engine.from_preset("iono")
Named bundle of validated parameters. default, iono, and ionox cover general, ionospheric, and extended-SNR workflows. Zero knobs, zero boilerplate — good for a first run and for teaching.
Config
Schema-Driven Build
cfg = ExecutorConfig(nfft=4096, mode="streaming")
engine = Engine(config=cfg)
Pydantic-validated struct. Override what you care about, inherit the rest. Computed properties (RTF, hop, memory) fall out of the schema — invalid combinations fail at construction, not mid-run.
Pipeline
Custom Stage Composition
stages = StageFactory.compose([
    WindowStage("hann"),
    FFTStage(4096),
    MagnitudeStage(),
])
engine = Engine(stages=stages)
Drop in a bandpass, log-magnitude, or PSD stage without touching the executors. Factory resolves the graph, validates it, and hands the same engine object back.
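The call-site invariance can be sketched with a toy stand-in. The class and method names below come from this page (Engine, from_preset, process), but every body, preset value, and the stage-as-callable shortcut are invented for illustration — the real engine wraps native executors.

```python
class Engine:
    """Toy stand-in for the real Engine: three construction surfaces,
    one process() call downstream. All internals here are invented."""

    _PRESETS = {"default": {"nfft": 4096}, "iono": {"nfft": 8192}}

    def __init__(self, config=None, stages=None):
        self.config = config or Engine._PRESETS["default"]
        # For this sketch a "stage" is just a callable applied per sample.
        self.stages = stages or [abs]

    @classmethod
    def from_preset(cls, name):
        # A preset is a named, pre-validated config bundle.
        return cls(config=dict(cls._PRESETS[name]))

    def process(self, samples):
        # Identical call site no matter how the engine was built.
        out = samples
        for stage in self.stages:
            out = [stage(x) for x in out]
        return out


# All three construction paths converge on one call:
for eng in (Engine.from_preset("iono"),
            Engine(config={"nfft": 4096}),
            Engine(stages=[abs, float])):
    print(eng.process([-1, 2, -3]))
```

The point of the pattern is that downstream code — benchmarks, notebooks, Snakemake rules — only ever sees `engine.process(samples)`, so swapping a preset for a custom pipeline never touches the call site.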

Configuration as Schema

Configuration is a first-class Pydantic v2 model, not a free-form dict. Invalid FFT sizes, illegal sample rates, and mode / stage mismatches raise at construction. Derived quantities — real-time factor, frequency resolution, hop size, memory budget — are computed properties, so they stay consistent with the inputs that produced them.

Strict Validation
ExecutorConfig rejects negative rates, non-power-of-two NFFT, and hop values that exceed the window. A bad config fails before any CUDA context is created — cheap failures stay cheap.
Computed Properties
RTF, freq_resolution, hop_samples, and estimated memory footprint are properties of the config, derived from the inputs. One source of truth — no drift between the value in the YAML and the value reported in benchmarks.
Presets as Bundles
A preset is a validated ExecutorConfig under a name. The preset system is a view over the schema, not parallel to it — there is no second path that can drift.
Typed Exceptions
ConfigError, BenchmarkError, ExperimentError, ReproducibilityError — a shallow exception hierarchy that lets callers catch the narrow failure they care about without a bare except.

CLI-First Development

The CLI exists so the Python API can stay unopinionated. Bootstrap, build, test, lint, and run all collapse into a handful of shell entry points. No manual nvcc invocations, no CMake incantations, no remembering which conda env to activate — the scripts pick the right environment, cache the build, and fall through to sensible defaults.

init_bash.sh
One-Shot Bootstrap
  • Verify CUDA toolkit and GPU visibility
  • Resolve the right conda env by platform
  • Install the project in editable mode
  • Safe to re-run — idempotent by design
cli-cpp.sh
C++ / CUDA Build
  • Configure, build, and install the native extension
  • Incremental by default, full rebuild on flag
  • Run the Google Test suite with gcovr reports
  • Launch Nsight Systems / Compute captures
cli.sh
Python Development Loop
  • Lint, format, and type-check the package
  • Run pytest with fixture-sharing across modules
  • Launch benchmark runners with a single verb
  • Invoke the Snakemake DAG for experiment sweeps

Python Package Layout

The package at src/sigtekx/ is organized by concern, not by feature. Each submodule owns a narrow responsibility — configuration, pipeline composition, execution, utility infrastructure — so a researcher reading the code can land on the right file without tracing import chains.

core/
Engine facade, executor wrappers, pybind11 handoff to the native module
config/
Pydantic v2 models, presets, validators, computed properties
stages/
Python-visible stage composition, StageFactory, pipeline graph validation
utils/
GPU clock lock, seed and device reproducibility, signal generators
testing/
Shared pytest fixtures — engines, configs, sample data — for downstream suites
exceptions.py
Typed exception hierarchy — catch narrow failures, skip bare excepts
benchmarks/
Shared harness used by the runners in Module 04 — lives here so analytics stays a consumer
__version__.py
Single source of truth for package version — read by pyproject.toml and bindings

Workflow Orchestration

Experiments are Hydra configs composed from benchmark/, engine/, and experiment/ YAML groups. Snakemake consumes that DAG and runs only the nodes whose inputs or code have changed. The same command twice in a row does nothing the second time — idempotency comes from marker files, not from a diff of the world.

Hydra Config Composition
Three config groups combine at runtime. Swapping one group — the benchmark spec, the engine preset, the experiment scope — produces a new run without touching the others. No YAML duplication, no hardcoded sweep loops.
Snakemake DAG
The Snakefile declares outputs, not steps. The DAG is rebuilt from the declared dependencies every invocation; only stale or missing targets re-run. Safe to interrupt, safe to re-launch, safe to parallelize.
Marker-File Idempotency
Each runner writes a lightweight marker after a successful write. The marker is the idempotency signal — its absence means the output is incomplete, its presence means the data is trustworthy.
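The marker protocol is simple enough to sketch in a few lines. The function below is a hypothetical runner skeleton, not code from the repo; the key ordering — marker written only after the output write succeeds — is the part described above.

```python
import os
import tempfile


def run_once(out_path: str, marker_path: str, produce) -> bool:
    """Hypothetical runner skeleton: the marker is the sole idempotency
    signal. Returns True if work was done, False if it was skipped."""
    if os.path.exists(marker_path):
        return False            # marker present -> output is trustworthy
    produce(out_path)           # may raise; then the marker stays absent
    with open(marker_path, "w") as m:
        m.write("ok\n")         # written last, only after success
    return True


with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "result.csv")
    marker = os.path.join(d, "result.done")
    writer = lambda p: open(p, "w").write("rtf,0.003\n")
    print(run_once(out, marker, writer))  # first call does the work
    print(run_once(out, marker, writer))  # second call is a no-op
```

Because the marker is written after the data, a crash mid-write leaves no marker, and the next invocation simply redoes the node — which is exactly the contract Snakemake needs to make re-launches safe.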
Sweep Without Rewrite
A parameter sweep is a Hydra multi-run override on the CLI — no loop in Python, no templated YAML. Results from different sweeps live side-by-side in artifacts/ and feed straight into the Module 04 analyzers.

Environments

Four conda specs under environments/ cover the axes that actually differ across machines — build vs. runtime, Linux vs. Windows. A new contributor can get the matching environment for their platform in one command; CI pins the same specs so local results and cloud results run on the same toolchain.

environment.build.yml
CUDA toolkit, nvcc, CMake, pybind11, gtest — everything needed to compile the native extension
environment.runtime.yml
Python runtime deps only — NumPy, SciPy, Pydantic, Hydra, MLflow, Snakemake
environment.linux.yml
Linux-specific overlay — driver libs, shell tooling, profiling utilities
environment.win.yml
Windows-specific overlay — MSVC build tools, PowerShell-compatible entry points

Helper Utilities

Three utilities at the repo root turn the most-repeated benchmark boilerplate into library calls. They own the tedious parts — clock locking, stage timing, dataset materialization — so every runner in Module 04 can stay focused on what it's actually measuring.

prof_helper.py
GPU clock lock / unlock without manual nvidia-smi privilege dance. Warmup and steady-state duration control, NVTX range tagging, and a context manager that guarantees the GPU is returned to its default state even on exception paths.
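The cleanup guarantee is a plain try/finally inside a context manager. The sketch below stubs out the actual clock commands (the real helper drives nvidia-smi) so the guarantee itself — unlock runs even when the benchmark body raises — is the only thing being demonstrated.

```python
from contextlib import contextmanager

calls = []  # records what the (stubbed) clock commands were


def _set_clocks(locked: bool):
    # Stand-in for the real lock/unlock calls (e.g. via nvidia-smi).
    calls.append("lock" if locked else "unlock")


@contextmanager
def gpu_clock_lock():
    """Sketch of the prof_helper guarantee: clocks are restored to their
    default state on every exit path, including exceptions."""
    _set_clocks(True)
    try:
        yield
    finally:
        _set_clocks(False)  # runs on success *and* on exception


try:
    with gpu_clock_lock():
        raise RuntimeError("benchmark crashed mid-run")
except RuntimeError:
    pass
print(calls)  # ['lock', 'unlock'] -- unlock still happened
```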
stage_timing_helper.py
Per-stage timing via CUDA events — window, FFT, magnitude costs reported in isolation. The same helper powers the bottleneck breakdown shown in the Streamlit dashboards and the latency regression tests.
dataset_helper.py
Multi-run-safe dataset materialization — synthetic signal generation, deterministic seeding, and a cache that survives parallel Snakemake workers without a race. Two runs targeting the same dataset share one on-disk artifact.
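One way to get that race-safety — a minimal sketch, assuming the standard build-into-a-temp-file-then-rename pattern rather than the helper's actual implementation — is to publish the cache entry with os.replace, which is atomic on both POSIX and Windows:

```python
import os
import tempfile


def materialize(cache_path: str, build) -> str:
    """Sketch of race-safe caching: build into a temp file in the same
    directory, then publish atomically. Two parallel workers may both
    build, but a reader never observes a half-written file."""
    if os.path.exists(cache_path):
        return cache_path                       # cache hit
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path))
    with os.fdopen(fd, "w") as f:
        f.write(build())                        # generate the dataset
    os.replace(tmp, cache_path)                 # atomic publish
    return cache_path


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "dataset.csv")
    build = lambda: "t,amplitude\n0,0.0\n"
    materialize(path, build)   # first worker builds
    materialize(path, build)   # second worker hits the cache
```

If two workers race past the existence check, both build and the last rename wins — wasted work, but never corruption, which is the property that matters under parallel Snakemake.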
Reproducibility Module
utils.reproducibility handles seed propagation across NumPy, PyTorch, and the CUDA PRNG. Every run records its seed set in the output CSV — a benchmark that can't be reproduced doesn't get saved.
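The fan-out-and-record pattern can be sketched with the standard library alone. Everything below is hypothetical — the real utils.reproducibility also seeds NumPy, PyTorch, and the CUDA PRNG — but the two ideas it illustrates are the ones above: derive a distinct child seed per library from one master seed, and return the full seed set so the runner can record it.

```python
import hashlib
import random


def seed_everything(seed: int) -> dict:
    """Hypothetical seed fan-out: one master seed, a stable per-library
    child seed, and the whole set returned for logging."""
    def child(name: str) -> int:
        digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
        return int.from_bytes(digest[:4], "big")

    seeds = {"python": child("python")}
    random.seed(seeds["python"])
    # In the real module: numpy.random.default_rng(child("numpy")),
    # torch.manual_seed(child("torch")), CUDA PRNG, etc.
    return seeds


first = seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
print(a == b)  # same seed set -> identical stream
```

Returning the seed dict (rather than just setting state) is what makes "every run records its seed set in the output CSV" a one-liner in the runner.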

Data Outputs

The API produces data that Module 04 consumes. Two conventions make that handoff boring: an academic real-time-factor definition, and a stable CSV schema. Boring is the goal — analytics should never have to re-derive what a column means.

RTF Convention
RTF = T_process / T_signal — the academic convention from ASR and radar literature, where lower is faster. A 100 kHz stream processed in 300 µs per 100 ms window yields RTF ≈ 0.003. The opposite throughput convention (higher is faster) is documented in rtf-convention-mapping.md for readers arriving from that side of the field.
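The worked example above reduces to one division; writing it out also shows that the opposite throughput convention is just the reciprocal:

```python
def rtf(t_process_s: float, t_signal_s: float) -> float:
    # RTF = T_process / T_signal -- lower is faster under this convention.
    return t_process_s / t_signal_s


# The example from the text: 300 us of compute per 100 ms signal window.
print(rtf(300e-6, 100e-3))      # ≈ 0.003
# The throughput convention (higher is faster) is the reciprocal:
print(1 / rtf(300e-6, 100e-3))  # ≈ 333x faster than real time
```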
CSV Schema
Every runner writes a canonical CSV under artifacts/data/ — columns for config hash, executor mode, NFFT, sample rate, per-frame latency, stage timings, seed, and environment fingerprint. The schema is versioned; additions are append-only, so an old analyzer against new data still runs. Full column list in csv-file-organization.md.
MLflow Mirror
The same metrics that land in CSV also stream to MLflow for tracking-server visibility. Two sinks, one source — the runner writes once and the logging helper fans out. Module 04's dashboards read CSV; Module 04's analysis scripts read either.