Cloud Infrastructure

>_ Module 02 — AWS GPU Benchmarking Pipeline

Taking the SigTekX engine from a local workstation to reproducible cloud benchmarks. A containerized deployment pipeline built on AWS EC2 spot GPU instances, with automated result collection to S3, real-time log streaming to CloudWatch, and Streamlit-based performance analysis — all for under $0.20 per run.

Architecture Overview

The cloud pipeline follows a simple loop: build locally, push to ECR, pull and run on a spot GPU instance, then download results. Every component is scoped, automated, and teardown-safe — no resources linger after the run.

FIG 01: Cloud benchmarking architecture — Local build to EC2 spot execution to S3 result collection

AWS Services

EC2

GPU spot instances for benchmark execution

ECR

Private Docker registry, co-located in-region

S3

Benchmark result CSV storage

CloudWatch

Container log streaming via awslogs driver

IAM

Scoped instance role, no credentials in code

SNS

Billing alert email notifications

Service Quotas

G and VT vCPU limit increase requests

CloudWatch Billing

EstimatedCharges alarm for cost safety
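As an illustration of the billing alarm listed above, the sketch below builds the parameters for a CloudWatch PutMetricAlarm request on the EstimatedCharges metric. The alarm name, the 5 USD threshold, and the SNS topic ARN are hypothetical placeholders, not values taken from this project. AWS publishes billing metrics in us-east-1, so the alarm would normally be created there even though the benchmark runs in us-west-2.

```python
import json


def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Build keyword arguments for a CloudWatch PutMetricAlarm request."""
    return {
        "AlarmName": "sigtekx-estimated-charges",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # seconds (6 hours); billing data updates only a few times a day
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }


# Hypothetical threshold and topic ARN, for illustration only.
params = billing_alarm_params(
    5.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts"
)
print(json.dumps(params, indent=2))

# With boto3 installed and credentials configured, the dict could be passed as:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```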

Instance Configuration

>_ EC2 Benchmark Instance

Spot Instance

Instance Type

g4dn.xlarge

GPU

NVIDIA T4 — 16 GB

CUDA Architecture

sm_75

AMI

Deep Learning OSS NVIDIA Driver AMI GPU PyTorch 2.10 (Ubuntu 24.04)

Spot Cost

~$0.16/hr

Region

us-west-2 (Oregon)

SigTekXEC2BenchmarkRole Instance Profile

AmazonEC2ContainerRegistryReadOnly

AWS Managed

AmazonS3FullAccess

AWS Managed

CloudWatchAgentServerPolicy

AWS Managed

Container Build

A two-stage Docker build separates compilation from runtime. The builder stage compiles the C++/CUDA engine and packages it into a Python wheel. The production stage installs only the wheel and runtime dependencies — no compilers, no source code, no build artifacts. A non-root appuser runs inside the container for security.

Stage 1

Builder

nvidia/cuda:13.0.0-devel-ubuntu22.04

Install build tools + Miniconda
Create Conda env from environment.build.yml
CMake build of C++/CUDA engine
Copy benchmarks + experiment configs
pip wheel — produces .whl artifact

Stage 2

Production

nvidia/cuda:13.0.0-runtime-ubuntu22.04

Miniconda + runtime Conda env only
Install .whl from builder stage
Copy benchmark scripts + configs
Non-root appuser for execution
Entrypoint: conda run -n sigtekx

Deployment Pipeline

Four shell scripts handle the full lifecycle — from creating AWS resources to tearing them down. Each step is idempotent and safe to re-run.

setup_iam.sh

Creates S3 bucket, IAM role with scoped inline policy, and instance profile. All resources are tagged and region-scoped to us-west-2.

push_ecr.sh

Authenticates Docker to ECR, builds the image, and pushes two tags: latest and the current git commit SHA for traceability.
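The two-tag scheme can be sketched as follows. This is not the contents of push_ecr.sh; the account ID, repository name, and commit SHA are hypothetical, and only the registry hostname format is the standard ECR one.

```python
import subprocess


def ecr_image_refs(account_id: str, region: str, repo: str, commit_sha: str) -> list:
    """Return the two image references to push: 'latest' and the commit SHA."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return [f"{registry}/{repo}:latest", f"{registry}/{repo}:{commit_sha}"]


def current_commit_sha() -> str:
    """Read the short SHA of HEAD; requires running inside a git checkout."""
    out = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"], text=True)
    return out.strip()


# Hypothetical account, repository, and SHA, for illustration only.
refs = ecr_image_refs("123456789012", "us-west-2", "sigtekx", "abc1234")
for ref in refs:
    print(ref)
```

Pushing the SHA tag alongside latest means any result CSV can be traced back to the exact commit that produced it, even after latest has moved on.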

run_ec2_benchmark.sh

SSHes into the EC2 instance, pulls the image from ECR, runs the container with --gpus all and the awslogs driver, then uploads result CSVs to S3. Supports --smoke, --full, and custom Hydra argument modes.
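A minimal sketch of the container launch described above, expressed as the argument list a script might hand to docker. The --gpus, --log-driver, and awslogs-* options are standard Docker flags; the image URI, log group, stream name, and the Hydra override are assumptions made for the example.

```python
import shlex


def docker_run_command(image, log_group, log_stream, region="us-west-2", extra_args=()):
    """Build a 'docker run' argv with GPU access and CloudWatch log streaming."""
    return [
        "docker", "run", "--rm",
        "--gpus", "all",                      # expose every GPU on the host
        "--log-driver", "awslogs",            # ship stdout/stderr to CloudWatch
        "--log-opt", f"awslogs-region={region}",
        "--log-opt", f"awslogs-group={log_group}",
        "--log-opt", f"awslogs-stream={log_stream}",
        image,
        *extra_args,                          # e.g. Hydra overrides for the benchmark
    ]


# Hypothetical image URI, log names, and Hydra override.
cmd = docker_run_command(
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/sigtekx:latest",
    log_group="/sigtekx/benchmarks",
    log_stream="run-abc1234",
    extra_args=["experiment=smoke"],
)
print(shlex.join(cmd))
```

Because the log group and stream are passed at launch, the same image can be reused across runs without rebuilding.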

teardown.sh

Deletes the S3 bucket, IAM role, instance profile, and CloudWatch log group. Safe to run multiple times — skips resources that don't exist. ECR repository is preserved to keep cached layers.
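The skip-if-missing behavior can be illustrated with a small pattern. This sketch uses an in-memory set as a stand-in for the AWS account, so the resource names and the ResourceNotFound exception are hypothetical; the actual script presumably relies on the AWS CLI's own error codes.

```python
class ResourceNotFound(Exception):
    """Stand-in for a provider's 'no such resource' error."""


def delete_if_exists(name, delete_fn):
    """Run delete_fn; treat a missing resource as already torn down."""
    try:
        delete_fn()
    except ResourceNotFound:
        return f"skip {name}: not found"
    return f"deleted {name}"


# In-memory stand-in for the cloud account (hypothetical resource names).
existing = {"s3-bucket", "iam-role", "instance-profile", "log-group"}


def make_deleter(name):
    """Return a delete function that fails if the resource is already gone."""
    def _delete():
        if name not in existing:
            raise ResourceNotFound(name)
        existing.remove(name)
    return _delete


def teardown():
    """Delete every resource in order, skipping any that no longer exist."""
    order = ["s3-bucket", "instance-profile", "iam-role", "log-group"]
    return [delete_if_exists(n, make_deleter(n)) for n in order]


first = teardown()   # deletes everything
second = teardown()  # finds nothing left and skips each resource
print("\n".join(first + second))
```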

Observability

Container stdout/stderr streams directly to CloudWatch via Docker's built-in awslogs driver, with no sidecar, agent, or extra process. Locally, the same benchmark output is visible in the WSL2 terminal, and the results are analyzed in the Streamlit dashboard after being downloaded from S3.

CloudWatch log group alongside WSL2 benchmark terminal

Streamlit RTF (Real-Time Factor) analysis loaded from aws-ec2 dataset

Cloud Performance

End-to-end benchmarks measured from inside the g4dn.xlarge container. The T4's Turing architecture (sm_75) gives up raw throughput relative to Ampere in exchange for roughly 30% of the hourly cost. It still clears 100 kHz real-time with a 77x margin and reports a higher spectral SNR than the local RTX 3090 Ti.

756.1 µs

Mean Latency

1,022 µs

P99 Latency

144.1 dB

Spectral SNR

2,802 FPS

Throughput (1 Ch)

123.3 MSPS

Peak Throughput (8 Ch)

100%

RT Compliance @ 100 kHz

>_ AWS g4dn.xlarge · NVIDIA T4 (Turing / sm_75) · 4 vCPU Xeon Cascade Lake · NFFT=4096 · Python E2E
>_ Local RTX 3090 Ti reference: 171.1 µs mean / 305.5 µs p99 — Ampere is ~4.4x faster per frame
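The per-frame comparison follows directly from the reported latencies. The short calculation below uses only the mean and p99 figures quoted in this section; the variable names are illustrative.

```python
# Latencies reported in this section, in microseconds.
t4 = {"mean_us": 756.1, "p99_us": 1022.0}       # AWS g4dn.xlarge (T4)
rtx3090ti = {"mean_us": 171.1, "p99_us": 305.5}  # local reference

# How many times longer a frame takes on the T4 than on the RTX 3090 Ti.
mean_ratio = t4["mean_us"] / rtx3090ti["mean_us"]
p99_ratio = t4["p99_us"] / rtx3090ti["p99_us"]

print(f"mean latency ratio: {mean_ratio:.2f}x")
print(f"p99 latency ratio:  {p99_ratio:.2f}x")
```

The mean ratio comes out at about 4.4x, matching the note above, while the p99 gap is narrower at roughly 3.3x.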

Design Decisions

Why EC2 Spot over SageMaker?

SageMaker adds ML-specific abstractions (job queues, managed endpoints, training channels) that are overhead for a benchmark workload. EC2 spot costs ~$0.16/hr versus SageMaker's ~$0.53/hr on-demand, is simpler to reason about, and requires no ML knowledge.

Why ECR over Docker Hub?

ECR is co-located in the same region as the EC2 instance — fast pulls, no egress charges, no rate limits. Auth flows through the instance role with zero separate credentials.

Why g4dn.xlarge?

The T4's sm_75 is already in the CMake CUDA architectures list. It is the cheapest GPU option (~$0.16/hr spot) with native architecture support, and its 16 GB of VRAM is sufficient for all benchmark configurations.

Why awslogs driver?

Built into Docker — no sidecar, no agent. Container stdout/stderr streams directly to CloudWatch. Log group and stream names are set at container launch time, not baked into the image.

Security

Scoped IAM Policy

Inline policy locked to one S3 bucket and one CloudWatch log group — no wildcard resource ARNs.

EC2-Only Trust

Trust policy restricted to ec2.amazonaws.com — role cannot be assumed by any other service.
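The two policy documents described above might look roughly like this. The structure follows the standard IAM JSON policy format, but the bucket name, log group name, account ID, and the specific list of allowed actions are assumptions for illustration. The trailing `/*` and `:*` only match objects and streams inside the one bucket and the one log group; no statement uses a bare `"*"` resource.

```python
import json

# Trust policy: only the EC2 service may assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def scoped_policy(bucket: str, log_group: str, region: str, account_id: str) -> dict:
    """Inline policy limited to one S3 bucket and one CloudWatch log group."""
    log_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read/write result objects in the single bucket (assumed actions)
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {   # list the bucket itself
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # write container logs to the single log group
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_arn}:*",
            },
        ],
    }


# Hypothetical names and account ID.
policy = scoped_policy("sigtekx-results", "/sigtekx/benchmarks", "us-west-2", "123456789012")
print(json.dumps(TRUST_POLICY, indent=2))
print(json.dumps(policy, indent=2))
```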

Non-Root Container

appuser runs inside the container. No root access to the runtime environment.

Zero Credentials in Code

No secrets in env vars, Dockerfiles, or source. All auth flows through the instance role.

Cost Model

A full benchmark run takes approximately 30 minutes on g4dn.xlarge. A CloudWatch billing alarm backed by SNS email notifications ensures a forgotten spot instance can't silently burn through the budget.

Component	Rate	Cost / Run
EC2 g4dn.xlarge spot	~$0.16/hr	~$0.08
S3 storage (1 GB, 1 month)	$0.023/GB-mo	~$0.02
CloudWatch Logs ingest	$0.50/GB	<$0.05
S3 requests + ECR	Negligible	<$0.01
Total per run		~$0.12 – $0.20
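The per-run figures can be reproduced from the rates in the table, assuming the stated 30-minute run and 1 GB of S3 storage held for one month. The log and request lines are upper bounds in the table, so the sketch reports a range; variable names are illustrative.

```python
SPOT_RATE_USD_PER_HR = 0.16      # g4dn.xlarge spot, approximate
RUN_HOURS = 0.5                  # ~30 minute full benchmark run
S3_RATE_USD_PER_GB_MONTH = 0.023
S3_GB_MONTHS = 1.0               # 1 GB stored for one month
LOGS_RATE_USD_PER_GB = 0.50
LOGS_UPPER_USD = 0.05            # table lists "<$0.05"
REQUESTS_UPPER_USD = 0.01        # table lists "<$0.01"

ec2_cost = SPOT_RATE_USD_PER_HR * RUN_HOURS
s3_cost = S3_RATE_USD_PER_GB_MONTH * S3_GB_MONTHS
fixed = ec2_cost + s3_cost                      # costs known from the rates
upper = fixed + LOGS_UPPER_USD + REQUESTS_UPPER_USD

# Log volume implied by the CloudWatch upper bound.
max_log_gb = LOGS_UPPER_USD / LOGS_RATE_USD_PER_GB

print(f"EC2:   ${ec2_cost:.3f}")
print(f"S3:    ${s3_cost:.3f}")
print(f"Range: ${fixed:.3f} to ${upper:.3f} per run")
print(f"Log budget: under {max_log_gb:.1f} GB per run")
```

Under these assumptions the total stays below the $0.20 ceiling, which leaves some headroom for runs that go longer than 30 minutes.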

Detailed Architecture

The full architectural diagram maps every data flow — from the local Docker build through ECR, EC2, S3, CloudWatch, and back to the local Streamlit dashboard.


FIG 02: Detailed architecture — full data flow from local development to cloud execution and result analysis
