Researcher API

>_ Module 03 — Python API, CLI, and Workflow Tooling

One Python Engine, three ways to summon it, and a CLI that handles the CUDA / Conda / CMake plumbing so research — not setup — is where the time goes. The same API scales from a one-line preset to a fully composed pipeline without changing the call site. Hydra-driven configs, Snakemake orchestration, and a small set of helper utilities turn the benchmarking loop from script-salad into library calls.

Three Ways to Start

Every researcher lands on the same engine.process(samples) call — what differs is how the engine was built. Presets cover the 80% case with a single string. A validated ExecutorConfig takes over when a parameter needs tuning. Stage-level composition via StageFactory is reserved for the cases where a brand-new stage enters the pipeline. Three surfaces, one call downstream.

Preset
One-Line Default
engine = Engine.from_preset("iono")
Named bundle of validated parameters. default, iono, and ionox cover general, ionospheric, and extended-SNR workflows. Zero knobs, zero boilerplate — good for a first run and for teaching.
Config
Schema-Driven Build
cfg = ExecutorConfig(nfft=4096, mode="streaming")
engine = Engine(config=cfg)
Pydantic-validated struct. Override what you care about, inherit the rest. Computed properties (RTF, hop, memory) fall out of the schema — invalid combinations fail at construction, not mid-run.
Pipeline
Custom Stage Composition
stages = StageFactory.compose([
    WindowStage("hann"),
    FFTStage(4096),
    MagnitudeStage(),
])
engine = Engine(stages=stages)
Drop in a bandpass, log-magnitude, or PSD stage without touching the executors. Factory resolves the graph, validates it, and hands the same engine object back.
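The call-site invariance can be sketched with a toy stand-in. The class and method names below come from this page (Engine, from_preset, process), but every body, preset value, and the stage-as-callable shortcut are invented for illustration — the real engine wraps native executors.

```python
class Engine:
    """Toy stand-in for the real Engine: three construction surfaces,
    one process() call downstream. All internals here are invented."""

    _PRESETS = {"default": {"nfft": 4096}, "iono": {"nfft": 8192}}

    def __init__(self, config=None, stages=None):
        self.config = config or Engine._PRESETS["default"]
        # For this sketch a "stage" is just a callable applied per sample.
        self.stages = stages or [abs]

    @classmethod
    def from_preset(cls, name):
        # A preset is a named, pre-validated config bundle.
        return cls(config=dict(cls._PRESETS[name]))

    def process(self, samples):
        # Identical call site no matter how the engine was built.
        out = samples
        for stage in self.stages:
            out = [stage(x) for x in out]
        return out


# All three construction paths converge on one call:
for eng in (Engine.from_preset("iono"),
            Engine(config={"nfft": 4096}),
            Engine(stages=[abs, float])):
    print(eng.process([-1, 2, -3]))
```

The point of the pattern is that downstream code — benchmarks, notebooks, Snakemake rules — only ever sees `engine.process(samples)`, so swapping a preset for a custom pipeline never touches the call site.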

Configuration as Schema

Configuration is a first-class Pydantic v2 model, not a free-form dict. Invalid FFT sizes, illegal sample rates, and mode / stage mismatches raise at construction. Derived quantities — real-time factor, frequency resolution, hop size, memory budget — are computed properties, so they stay consistent with the inputs that produced them.

Strict Validation
ExecutorConfig rejects negative rates, non-power-of-two NFFT, and hop values that exceed the window. A bad config fails before any CUDA context is created — cheap failures stay cheap.
Computed Properties
RTF, freq_resolution, hop_samples, and estimated memory footprint are properties of the config, derived from the inputs. One source of truth — no drift between the value in the YAML and the value reported in benchmarks.
Presets as Bundles
A preset is a validated ExecutorConfig under a name. The preset system is a view over the schema, not parallel to it — there is no second path that can drift.
Typed Exceptions
ConfigError, BenchmarkError, ExperimentError, ReproducibilityError — a shallow exception hierarchy that lets callers catch the narrow failure they care about without a bare except.

CLI-First Development

The CLI exists so the Python API can stay unopinionated. Bootstrap, build, test, lint, and run all collapse into a handful of shell entry points. No manual nvcc invocations, no CMake incantations, no remembering which conda env to activate — the scripts pick the right environment, cache the build, and fall through to sensible defaults.

init_bash.sh
One-Shot Bootstrap
  • Verify CUDA toolkit and GPU visibility
  • Resolve the right conda env by platform
  • Install the project in editable mode
  • Safe to re-run — idempotent by design
cli-cpp.sh
C++ / CUDA Build
  • Configure, build, and install the native extension
  • Incremental by default, full rebuild on flag
  • Run the Google Test suite with gcovr reports
  • Launch Nsight Systems / Compute captures
cli.sh
Python Development Loop
  • Lint, format, and type-check the package
  • Run pytest with fixture-sharing across modules
  • Launch benchmark runners with a single verb
  • Invoke the Snakemake DAG for experiment sweeps

Python Package Layout

The package at src/sigtekx/ is organized by concern, not by feature. Each submodule owns a narrow responsibility — configuration, pipeline composition, execution, utility infrastructure — so a researcher reading the code can land on the right file without tracing import chains.

core/
Engine facade, executor wrappers, pybind11 handoff to the native module
config/
Pydantic v2 models, presets, validators, computed properties
stages/
Python-visible stage composition, StageFactory, pipeline graph validation
utils/
GPU clock lock, seed and device reproducibility, signal generators
testing/
Shared pytest fixtures — engines, configs, sample data — for downstream suites
exceptions.py
Typed exception hierarchy — catch narrow failures, skip bare excepts
benchmarks/
Shared harness used by the runners in Module 04 — lives here so analytics stays a consumer
__version__.py
Single source of truth for package version — read by pyproject.toml and bindings

Workflow Orchestration

Experiments are Hydra configs composed from benchmark/, engine/, and experiment/ YAML groups. Snakemake consumes that DAG and runs only the nodes whose inputs or code have changed. The same command twice in a row does nothing the second time — idempotency comes from marker files, not from a diff of the world.

Hydra Config Composition
Three config groups combine at runtime. Swapping one group — the benchmark spec, the engine preset, the experiment scope — produces a new run without touching the others. No YAML duplication, no hardcoded sweep loops.
Snakemake DAG
The Snakefile declares outputs, not steps. The DAG is rebuilt from the declared dependencies every invocation; only stale or missing targets re-run. Safe to interrupt, safe to re-launch, safe to parallelize.
Marker-File Idempotency
Each runner writes a lightweight marker after a successful write. The marker is the idempotency signal — its absence means the output is incomplete, its presence means the data is trustworthy.
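The marker protocol is simple enough to sketch in a few lines. The function below is a hypothetical runner skeleton, not code from the repo; the key ordering — marker written only after the output write succeeds — is the part described above.

```python
import os
import tempfile


def run_once(out_path: str, marker_path: str, produce) -> bool:
    """Hypothetical runner skeleton: the marker is the sole idempotency
    signal. Returns True if work was done, False if it was skipped."""
    if os.path.exists(marker_path):
        return False            # marker present -> output is trustworthy
    produce(out_path)           # may raise; then the marker stays absent
    with open(marker_path, "w") as m:
        m.write("ok\n")         # written last, only after success
    return True


with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "result.csv")
    marker = os.path.join(d, "result.done")
    writer = lambda p: open(p, "w").write("rtf,0.003\n")
    print(run_once(out, marker, writer))  # first call does the work
    print(run_once(out, marker, writer))  # second call is a no-op
```

Because the marker is written after the data, a crash mid-write leaves no marker, and the next invocation simply redoes the node — which is exactly the contract Snakemake needs to make re-launches safe.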
Sweep Without Rewrite
A parameter sweep is a Hydra multi-run override on the CLI — no loop in Python, no templated YAML. Results from different sweeps live side-by-side in artifacts/ and feed straight into the Module 04 analyzers.

Environments

Four conda specs under environments/ cover the axes that actually differ across machines — build vs. runtime, Linux vs. Windows. A new contributor can get the matching environment for their platform in one command; CI pins the same specs so local results and cloud results run on the same toolchain.

environment.build.yml
CUDA toolkit, nvcc, CMake, pybind11, gtest — everything needed to compile the native extension
environment.runtime.yml
Python runtime deps only — NumPy, SciPy, Pydantic, Hydra, MLflow, Snakemake
environment.linux.yml
Linux-specific overlay — driver libs, shell tooling, profiling utilities
environment.win.yml
Windows-specific overlay — MSVC build tools, PowerShell-compatible entry points

Helper Utilities

Three utilities at the repo root turn the most-repeated benchmark boilerplate into library calls. They own the tedious parts — clock locking, stage timing, dataset materialization — so every runner in Module 04 can stay focused on what it's actually measuring.

prof_helper.py
GPU clock lock / unlock without manual nvidia-smi privilege dance. Warmup and steady-state duration control, NVTX range tagging, and a context manager that guarantees the GPU is returned to its default state even on exception paths.
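The cleanup guarantee is a plain try/finally inside a context manager. The sketch below stubs out the actual clock commands (the real helper drives nvidia-smi) so the guarantee itself — unlock runs even when the benchmark body raises — is the only thing being demonstrated.

```python
from contextlib import contextmanager

calls = []  # records what the (stubbed) clock commands were


def _set_clocks(locked: bool):
    # Stand-in for the real lock/unlock calls (e.g. via nvidia-smi).
    calls.append("lock" if locked else "unlock")


@contextmanager
def gpu_clock_lock():
    """Sketch of the prof_helper guarantee: clocks are restored to their
    default state on every exit path, including exceptions."""
    _set_clocks(True)
    try:
        yield
    finally:
        _set_clocks(False)  # runs on success *and* on exception


try:
    with gpu_clock_lock():
        raise RuntimeError("benchmark crashed mid-run")
except RuntimeError:
    pass
print(calls)  # ['lock', 'unlock'] -- unlock still happened
```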
stage_timing_helper.py
Per-stage timing via CUDA events — window, FFT, magnitude costs reported in isolation. The same helper powers the bottleneck breakdown shown in the Streamlit dashboards and the latency regression tests.
dataset_helper.py
Multi-run-safe dataset materialization — synthetic signal generation, deterministic seeding, and a cache that survives parallel Snakemake workers without a race. Two runs targeting the same dataset share one on-disk artifact.
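One way to get that race-safety — a minimal sketch, assuming the standard build-into-a-temp-file-then-rename pattern rather than the helper's actual implementation — is to publish the cache entry with os.replace, which is atomic on both POSIX and Windows:

```python
import os
import tempfile


def materialize(cache_path: str, build) -> str:
    """Sketch of race-safe caching: build into a temp file in the same
    directory, then publish atomically. Two parallel workers may both
    build, but a reader never observes a half-written file."""
    if os.path.exists(cache_path):
        return cache_path                       # cache hit
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path))
    with os.fdopen(fd, "w") as f:
        f.write(build())                        # generate the dataset
    os.replace(tmp, cache_path)                 # atomic publish
    return cache_path


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "dataset.csv")
    build = lambda: "t,amplitude\n0,0.0\n"
    materialize(path, build)   # first worker builds
    materialize(path, build)   # second worker hits the cache
```

If two workers race past the existence check, both build and the last rename wins — wasted work, but never corruption, which is the property that matters under parallel Snakemake.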
Reproducibility Module
utils.reproducibility handles seed propagation across NumPy, PyTorch, and the CUDA PRNG. Every run records its seed set in the output CSV — a benchmark that can't be reproduced doesn't get saved.
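The fan-out-and-record pattern can be sketched with the standard library alone. Everything below is hypothetical — the real utils.reproducibility also seeds NumPy, PyTorch, and the CUDA PRNG — but the two ideas it illustrates are the ones above: derive a distinct child seed per library from one master seed, and return the full seed set so the runner can record it.

```python
import hashlib
import random


def seed_everything(seed: int) -> dict:
    """Hypothetical seed fan-out: one master seed, a stable per-library
    child seed, and the whole set returned for logging."""
    def child(name: str) -> int:
        digest = hashlib.sha256(f"{seed}:{name}".encode()).digest()
        return int.from_bytes(digest[:4], "big")

    seeds = {"python": child("python")}
    random.seed(seeds["python"])
    # In the real module: numpy.random.default_rng(child("numpy")),
    # torch.manual_seed(child("torch")), CUDA PRNG, etc.
    return seeds


first = seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
print(a == b)  # same seed set -> identical stream
```

Returning the seed dict (rather than just setting state) is what makes "every run records its seed set in the output CSV" a one-liner in the runner.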

Data Outputs

The API produces data that Module 04 consumes. Two conventions make that handoff boring: an academic real-time-factor definition, and a stable CSV schema. Boring is the goal — analytics should never have to re-derive what a column means.

RTF Convention
RTF = T_process / T_signal — the academic convention from ASR and radar literature, where lower is faster. A 100 kHz stream processed in 300 µs per 100 ms window yields RTF ≈ 0.003. The opposite throughput convention (higher is faster) is documented in rtf-convention-mapping.md for readers arriving from that side of the field.
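The worked example above reduces to one division; writing it out also shows that the opposite throughput convention is just the reciprocal:

```python
def rtf(t_process_s: float, t_signal_s: float) -> float:
    # RTF = T_process / T_signal -- lower is faster under this convention.
    return t_process_s / t_signal_s


# The example from the text: 300 us of compute per 100 ms signal window.
print(rtf(300e-6, 100e-3))      # ≈ 0.003
# The throughput convention (higher is faster) is the reciprocal:
print(1 / rtf(300e-6, 100e-3))  # ≈ 333x faster than real time
```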
CSV Schema
Every runner writes a canonical CSV under artifacts/data/ — columns for config hash, executor mode, NFFT, sample rate, per-frame latency, stage timings, seed, and environment fingerprint. The schema is versioned; additions are append-only, so an old analyzer against new data still runs. Full column list in csv-file-organization.md.
MLflow Mirror
The same metrics that land in CSV also stream to MLflow for tracking-server visibility. Two sinks, one source — the runner writes once and the logging helper fans out. Module 04's dashboards read CSV; Module 04's analysis scripts read either.