Cloud Infrastructure

>_ Module 02 — AWS GPU Benchmarking Pipeline

Taking the SigTekX engine from a local workstation to reproducible cloud benchmarks. A containerized deployment pipeline built on AWS EC2 spot GPU instances, with automated result collection to S3, real-time log streaming to CloudWatch, and Streamlit-based performance analysis — all for under $0.20 per run.

Architecture Overview

The cloud pipeline follows a simple loop: build locally, push to ECR, pull and run on a spot GPU instance, then download results. Every component is scoped, automated, and teardown-safe — no resources linger after the run.

SigTekX AWS Architecture Overview
FIG 01: Cloud benchmarking architecture — Local build to EC2 spot execution to S3 result collection

AWS Services

EC2
GPU spot instances for benchmark execution
ECR
Private Docker registry, co-located in-region
S3
Benchmark result CSV storage
CloudWatch
Container log streaming via awslogs driver
IAM
Scoped instance role, no credentials in code
SNS
Billing alert email notifications
Service Quotas
G and VT vCPU limit increase requests
CloudWatch Billing
EstimatedCharges alarm for cost safety

Instance Configuration

>_ EC2 Benchmark Instance

Spot Instance
Instance Type
g4dn.xlarge
GPU
NVIDIA T4 — 16 GB
CUDA Architecture
sm_75
AMI
Deep Learning OSS NVIDIA Driver AMI GPU PyTorch 2.10 (Ubuntu 24.04)
Spot Cost
~$0.16/hr
Region
us-west-2 (Oregon)
SigTekXEC2BenchmarkRole Instance Profile
AmazonEC2ContainerRegistryReadOnly
AWS Managed
AmazonS3FullAccess
AWS Managed
CloudWatchAgentServerPolicy
AWS Managed

Container Build

A two-stage Docker build separates compilation from runtime. The builder stage compiles the C++/CUDA engine and packages it into a Python wheel. The production stage installs only the wheel and runtime dependencies — no compilers, no source code, no build artifacts. A non-root appuser runs inside the container for security.

Stage 1
Builder
nvidia/cuda:13.0.0-devel-ubuntu22.04
  • Install build tools + Miniconda
  • Create Conda env from environment.build.yml
  • CMake build of C++/CUDA engine
  • Copy benchmarks + experiment configs
  • pip wheel — produces .whl artifact
->
Stage 2
Production
nvidia/cuda:13.0.0-runtime-ubuntu22.04
  • Miniconda + runtime Conda env only
  • Install .whl from builder stage
  • Copy benchmark scripts + configs
  • Non-root appuser for execution
  • Entrypoint: conda run -n sigtekx
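The two stages above can be sketched as a single Dockerfile. This is a minimal sketch, not the project's actual file — the env file names (environment.build.yml, environment.yml), paths, and install details are illustrative assumptions:

```dockerfile
# ---- Stage 1: builder — toolchain, Conda, CMake build, wheel packaging ----
FROM nvidia/cuda:13.0.0-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake git wget ca-certificates \
    && rm -rf /var/lib/apt/lists/*
RUN wget -qO /tmp/miniconda.sh \
        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH
COPY environment.build.yml /tmp/
RUN conda env create -f /tmp/environment.build.yml
COPY . /src
WORKDIR /src
# Compile the C++/CUDA engine and package it as a wheel artifact
RUN conda run -n sigtekx pip wheel . --no-deps -w /wheels

# ---- Stage 2: production — runtime CUDA libs + wheel only, no toolchain ----
FROM nvidia/cuda:13.0.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        wget ca-certificates && rm -rf /var/lib/apt/lists/*
RUN wget -qO /tmp/miniconda.sh \
        https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
ENV PATH=/opt/conda/bin:$PATH
COPY environment.yml /tmp/
RUN conda env create -f /tmp/environment.yml
# Only the built wheel crosses the stage boundary — no source, no compilers
COPY --from=builder /wheels /wheels
RUN conda run -n sigtekx pip install /wheels/*.whl
COPY benchmarks/ /app/benchmarks/
COPY configs/ /app/configs/
# Drop root: benchmarks execute as an unprivileged user
RUN useradd --create-home appuser && chown -R appuser /app
USER appuser
WORKDIR /app
ENTRYPOINT ["conda", "run", "-n", "sigtekx"]
```

The key property is that the production image never sees the source tree or the build toolchain — only the `.whl` crosses the `COPY --from=builder` boundary.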

Deployment Pipeline

Four shell scripts handle the full lifecycle — from creating AWS resources to tearing them down. Each step is idempotent and safe to re-run.

01
setup_iam.sh
Creates S3 bucket, IAM role with scoped inline policy, and instance profile. All resources are tagged and region-scoped to us-west-2.
02
push_ecr.sh
Authenticates Docker to ECR, builds the image, and pushes two tags: latest and the current git commit SHA for traceability.
03
run_ec2_benchmark.sh
SSHes into the EC2 instance, pulls the image from ECR, runs the container with --gpus all and the awslogs driver, then uploads result CSVs to S3. Supports --smoke, --full, and custom Hydra argument modes.
04
teardown.sh
Deletes the S3 bucket, IAM role, instance profile, and CloudWatch log group. Safe to run multiple times — it skips resources that don't exist. The ECR repository is preserved so cached image layers survive between runs.

Observability

Container stdout/stderr streams directly to CloudWatch via Docker's built-in awslogs driver — no sidecar, no agent, no extra process. Locally, the same benchmark output is visible in the WSL2 terminal and analyzed in the Streamlit dashboard after downloading results from S3.
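The launch command on the instance looks roughly like this — a sketch, assuming illustrative values for the image URI, log group, stream name, and results bucket:

```shell
# Sketch of the container launch inside run_ec2_benchmark.sh.
# IMAGE_URI, the log group, and the S3 bucket are placeholder assumptions.
IMAGE_URI="123456789012.dkr.ecr.us-west-2.amazonaws.com/sigtekx:latest"
STREAM="benchmark-$(date +%Y%m%d-%H%M%S)"   # unique stream name per run

docker run --rm --gpus all \
    --log-driver=awslogs \
    --log-opt awslogs-region=us-west-2 \
    --log-opt awslogs-group=/sigtekx/benchmarks \
    --log-opt awslogs-stream="$STREAM" \
    "$IMAGE_URI" --smoke

# Afterwards the script uploads the result CSVs to S3
aws s3 cp results/ "s3://sigtekx-results/runs/$STREAM/" --recursive
```

Because the group and stream names are `--log-opt` values, logging destinations are decided at launch time and never baked into the image.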

CloudWatch log streaming alongside WSL2 terminal output
CloudWatch log group alongside WSL2 benchmark terminal
Streamlit dashboard showing Real-Time Factor analysis from AWS benchmark run
Streamlit RTF analysis loaded from aws-ec2 dataset

Cloud Performance

End-to-end benchmarks measured from inside the g4dn.xlarge container. The T4's Turing architecture (sm_75) gives up raw Ampere throughput in exchange for roughly 30% of the hourly cost — yet still clears 100 kHz real-time with a 77x margin and posts an even higher spectral SNR than the local RTX 3090 Ti.

756.1 µs
Mean Latency
1,022 µs
P99 Latency
144.1 dB
Spectral SNR
2,802 FPS
Throughput (1 Ch)
123.3 MSPS
Peak Throughput (8 Ch)
100%
RT Compliance @ 100 kHz
>_ AWS g4dn.xlarge · NVIDIA T4 (Turing / sm_75) · 4 vCPU Xeon Cascade Lake · NFFT=4096 · Python E2E
>_ Local RTX 3090 Ti reference: 171.1 µs mean / 305.5 µs p99 — Ampere is ~4.4x faster per frame

Design Decisions

Why EC2 Spot over SageMaker?
SageMaker adds ML-specific abstractions — job queues, managed endpoints, training channels — that are pure overhead for a benchmark workload. EC2 spot runs at ~$0.16/hr versus SageMaker's ~$0.53/hr on-demand, and is far simpler to reason about.
Why ECR over Docker Hub?
ECR is co-located in the same region as the EC2 instance — fast pulls, no egress charges, no rate limits. Auth flows through the instance role with zero separate credentials.
Why g4dn.xlarge?
The T4's sm_75 is already in the CMake CUDA architectures list. It is the cheapest GPU option (~$0.16/hr spot) with native arch support, and its 16 GB of VRAM covers all benchmark configurations.
Why awslogs driver?
Built into Docker — no sidecar, no agent. Container stdout/stderr streams directly to CloudWatch. Log group and stream names are set at container launch time, not baked into the image.

Security

Scoped IAM Policy

Inline policy locked to one S3 bucket and one CloudWatch log group — no wildcard resource ARNs.

EC2-Only Trust

Trust policy restricted to ec2.amazonaws.com — role cannot be assumed by any other service.

Non-Root Container

appuser runs inside the container. No root access to the runtime environment.

Zero Credentials in Code

No secrets in env vars, Dockerfiles, or source. All auth flows through the instance role.
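The scoped inline policy can be sketched as a single CLI call — a sketch only: the policy name, bucket name, log group, and account ID are placeholder assumptions, not the project's actual values:

```shell
# Sketch of the inline policy attached by setup_iam.sh.
# EXAMPLE-BUCKET, the log group path, and account 111122223333 are placeholders.
aws iam put-role-policy \
    --role-name SigTekXEC2BenchmarkRole \
    --policy-name sigtekx-benchmark-scope \
    --policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        { "Effect": "Allow",
          "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
          "Resource": ["arn:aws:s3:::EXAMPLE-BUCKET",
                       "arn:aws:s3:::EXAMPLE-BUCKET/*"] },
        { "Effect": "Allow",
          "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
          "Resource": "arn:aws:logs:us-west-2:111122223333:log-group:/sigtekx/benchmarks:*" }
      ]
    }'
```

Every `Resource` names one concrete ARN — no `*` wildcards at the resource level — so a leaked instance credential could touch at most one bucket and one log group.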

Cost Model

A full benchmark run takes approximately 30 minutes on g4dn.xlarge. A CloudWatch billing alarm backed by SNS email notifications ensures a forgotten spot instance can't silently burn through the budget.
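The billing guard can be sketched with two CLI calls — the topic name, email address, and dollar threshold are illustrative; note that AWS/Billing metrics are published only in us-east-1, regardless of where the workload runs:

```shell
# Sketch: EstimatedCharges alarm wired to an SNS email subscription.
# Topic name, email, and $5 threshold are placeholder assumptions.
TOPIC_ARN="$(aws sns create-topic --name sigtekx-billing \
    --query TopicArn --output text)"
aws sns subscribe --topic-arn "$TOPIC_ARN" \
    --protocol email --notification-endpoint you@example.com

aws cloudwatch put-metric-alarm \
    --region us-east-1 \
    --alarm-name sigtekx-billing-guard \
    --namespace AWS/Billing \
    --metric-name EstimatedCharges \
    --dimensions Name=Currency,Value=USD \
    --statistic Maximum \
    --period 21600 \
    --evaluation-periods 1 \
    --threshold 5 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$TOPIC_ARN"
```

The subscription must be confirmed once by clicking the link in the SNS confirmation email before notifications are delivered.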

Component                     Rate           Cost / Run
EC2 g4dn.xlarge spot          ~$0.16/hr      ~$0.08
S3 storage (1 GB, 1 month)    $0.023/GB-mo   ~$0.02
CloudWatch Logs ingest        $0.50/GB       <$0.05
S3 requests + ECR             negligible     <$0.01
Total per run                                ~$0.12 – $0.20
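The compute line follows from simple arithmetic — a sketch using the rounded spot rate and run time from above:

```shell
# 30 minutes of g4dn.xlarge spot at ~$0.16/hr (16 cents/hour)
RATE_CENTS_PER_HOUR=16
RUN_MINUTES=30
COST_CENTS=$(( RATE_CENTS_PER_HOUR * RUN_MINUTES / 60 ))
echo "compute cost per run: ~\$0.0${COST_CENTS}"   # prints ~$0.08
```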

Detailed Architecture

The full architectural diagram maps every data flow — from the local Docker build through ECR, EC2, S3, CloudWatch, and back to the local Streamlit dashboard.

SigTekX Detailed AWS Architecture Diagram
FIG 02: Detailed architecture — full data flow from local development to cloud execution and result analysis