Question 1

Is the compression really lossless?

Accepted Answer

Yes. A restored checkpoint is always byte-for-byte identical to the original. This is verified for bf16, fp16, and fp32 weights and for fp32 optimizer state, using adversarial bit patterns (NaN, +/-Inf, denormals, -0.0). The AMI build itself fails unless a GPU compress->decompress round-trip is bit-exact, so a codec with broken plane reassembly never reaches a customer image.
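To make the idea of a bit-exact round-trip concrete, here is a minimal CPU sketch of a byte-plane split, compress, and reassemble check. It is an illustration only, not the product's GPU codec: zlib stands in for the real compressor, and the names split_planes, join_planes, and round_trip are hypothetical. The comparison is done on raw bytes rather than float values, because NaN never compares equal to itself.

```python
import struct
import zlib

def split_planes(raw: bytes, width: int) -> list[bytes]:
    """Plane k holds byte k of every `width`-byte element."""
    return [raw[k::width] for k in range(width)]

def join_planes(planes: list[bytes]) -> bytes:
    """Interleave the planes back into the original element order."""
    width = len(planes)
    out = bytearray(sum(len(p) for p in planes))
    for k, plane in enumerate(planes):
        out[k::width] = plane
    return bytes(out)

def round_trip(raw: bytes, width: int) -> bytes:
    # zlib is a stand-in compressor (assumption), applied per plane.
    packed = [zlib.compress(p) for p in split_planes(raw, width)]
    return join_planes([zlib.decompress(p) for p in packed])

# Adversarial fp32 bit patterns: quiet NaN, NaN with payload, +Inf, -Inf,
# smallest denormal, -0.0, +0.0.
fp32_bits = [0x7FC00000, 0x7FA00001, 0x7F800000, 0xFF800000,
             0x00000001, 0x80000000, 0x00000000]
fp32_raw = struct.pack("<7I", *fp32_bits)

# The same idea for fp16: NaN, -Inf, smallest denormal, -0.0.
fp16_raw = struct.pack("<4H", 0x7E00, 0xFC00, 0x0001, 0x8000)

assert round_trip(fp32_raw, 4) == fp32_raw
assert round_trip(fp16_raw, 2) == fp16_raw
```

A check like this fails immediately if the planes are reassembled in the wrong order, which is the class of bug the build gate described above is meant to catch.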

Question 2

How much will it compress?

Accepted Answer

It depends on the data. Checkpoints that are all bf16 or use a low-precision optimizer compress well (and better at scale), while fp32-heavy checkpoints saved far apart compress little. We do not claim a fixed ratio. Compression is always lossless, and in the worst case a blob grows only by a small fixed header.
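One common way to bound worst-case growth to a fixed header is a stored fallback: if compression does not shrink the data, keep the raw bytes and flag them. The sketch below assumes a hypothetical 1-byte header and uses zlib as a stand-in compressor; the product's actual blob format is not described here.

```python
import os
import zlib

# Hypothetical 1-byte header values (assumption, not the product's format).
STORED, DEFLATED = b"\x00", b"\x01"

def pack(raw: bytes) -> bytes:
    """Compress, but fall back to storing raw bytes if that is not smaller."""
    comp = zlib.compress(raw, 6)
    if len(comp) < len(raw):
        return DEFLATED + comp
    return STORED + raw

def unpack(blob: bytes) -> bytes:
    body = blob[1:]
    return zlib.decompress(body) if blob[:1] == DEFLATED else body

def ratio(raw: bytes) -> float:
    """Original size divided by packed size; higher is better."""
    return len(raw) / len(pack(raw))

redundant = bytes(4096)        # highly compressible
noisy = os.urandom(4096)       # effectively incompressible

assert unpack(pack(redundant)) == redundant
assert unpack(pack(noisy)) == noisy
assert len(pack(noisy)) <= len(noisy) + 1   # never grows past the header
print(f"redundant: {ratio(redundant):.1f}x, noisy: {ratio(noisy):.3f}x")
```

Running a measurement like ratio on a sample of your own checkpoint bytes is a more reliable guide than any quoted figure.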

Question 3

Where are my checkpoints stored?

Accepted Answer

Compressed checkpoints are stored in the Amazon S3 registry bucket you configure and never leave your account. You launch the AMI inside your own VPC, and your PyTorch training code writes checkpoints with s4weights.save / s4weights.load (or the delta-chain save_checkpoint / load_checkpoint). These calls compress each tensor on the GPU and write the bit-exact compressed checkpoint to that bucket.

Question 4

What does it run on, and how is it billed?

Accepted Answer

It runs on g6 or g6e GPU instances, wired end-to-end by the bundled CloudFormation template (deploy/cfn-train-runner.yaml). Billing is per-instance-hour with an annual option. AWS meters the running instance-hours automatically, and the runner calls RegisterUsage once at boot as a fail-closed entitlement check (an unentitled instance refuses to start).
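A fail-closed entitlement check can be sketched as follows. This is an assumption about the general shape, not the runner's actual code: check_entitlement and start_runner are hypothetical names, and the client is expected to behave like boto3.client("meteringmarketplace"), whose register_usage call takes ProductCode, PublicKeyVersion, and an optional Nonce. Any failure is treated as "not entitled".

```python
import uuid

def check_entitlement(client, product_code: str, public_key_version: int = 1) -> bool:
    """Return True only if RegisterUsage succeeds; any error means no."""
    try:
        client.register_usage(
            ProductCode=product_code,
            PublicKeyVersion=public_key_version,
            Nonce=uuid.uuid4().hex,  # unique value per call
        )
    except Exception:
        # Fail closed: network errors, throttling, and missing
        # entitlement all block startup.
        return False
    return True

def start_runner(client, product_code: str) -> str:
    """Hypothetical boot hook: refuse to start without entitlement."""
    if not check_entitlement(client, product_code):
        raise SystemExit("not entitled: refusing to start")
    return "started"
```

Passing the client in as a parameter keeps the check easy to exercise with a stub before it runs against AWS.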

Question 5

Is it easy to integrate into my PyTorch training code?

Accepted Answer

Yes, it is drop-in. You write checkpoints with the transparent s4weights.save / s4weights.load, or use save_checkpoint / load_checkpoint for the base->delta checkpoint store. Each tensor is compressed on the GPU, and for frequent saves the byte-XOR delta between consecutive checkpoints is also compressed and stored.
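The byte-XOR delta idea can be sketched on the CPU in a few lines. This is an illustration under stated assumptions, not the product's implementation: save_delta and load_delta are hypothetical names, zlib stands in for the real compressor, and both checkpoints are assumed to have the same byte length.

```python
import os
import zlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have the same byte length")
    n = len(a)
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(n, "little")

def save_delta(prev: bytes, curr: bytes) -> bytes:
    """Compress only what changed since the previous checkpoint."""
    return zlib.compress(xor_bytes(prev, curr))

def load_delta(prev: bytes, blob: bytes) -> bytes:
    """XOR is its own inverse, so applying the delta restores curr exactly."""
    return xor_bytes(prev, zlib.decompress(blob))

base = os.urandom(4096)            # stand-in for a base checkpoint
step = bytearray(base)
step[100] ^= 0x01                  # a single changed bit
step = bytes(step)

blob = save_delta(base, step)
assert load_delta(base, blob) == step
print(len(zlib.compress(step)), "bytes full vs", len(blob), "bytes as delta")
```

When consecutive checkpoints differ in few bytes, the XOR result is mostly zeros and compresses far better than the checkpoint itself; when they differ almost everywhere, the delta offers little benefit.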

S4 Weights

The problem

How it works

Split into byte planes on the GPU

Compress the delta between checkpoints

Restore bit-exact, persist to your own S3

Highlights

What's included

Use cases

FAQ

Pricing model

Other S4 products

S4 — Squished S3

S4 Logs

S4 Metrics