Question 1

Can it stop a runaway agent 100% of the time?

Accepted Answer

It depends on the layer, and we keep the two honestly distinct. Layer 1, the hard cap, is deterministic and pre-emptive: any request whose reservation would exceed a configured cap is blocked before relay, so over-cap requests are blocked 100% of the time (same state in, same decision out, verified by chaos tests). Layer 2, loop detection, is best-effort: a runaway is only knowable after a few calls, and those calls are already billed, so it bounds the blast radius to a small number of requests or a small dollar amount rather than guaranteeing 100% prevention. It ships with a dry-run shadow mode so you can measure the false-block rate before you enforce.
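To make the "best-effort" nature of Layer 2 concrete, here is a minimal sketch of a loop detector with a shadow mode. The detection heuristic (counting identical request fingerprints in a short window), the class name `LoopDetector`, and the parameters `threshold`, `window`, and `shadow` are all assumptions made for illustration; the answer above does not describe how S4 Firewall actually detects loops.

```python
import hashlib
from collections import deque


class LoopDetector:
    """Hypothetical repeat-based loop detector (not the product's algorithm)."""

    def __init__(self, threshold: int = 3, window: int = 10, shadow: bool = True) -> None:
        self.threshold = threshold          # repeats tolerated before flagging
        self.recent = deque(maxlen=window)  # fingerprints of the last `window` requests
        self.shadow = shadow                # True: record only, never block
        self.would_block = 0                # requests flagged as part of a loop

    def check(self, agent_id: str, prompt: str) -> bool:
        """Return True if the request may be relayed upstream."""
        # Store a fingerprint only, never the prompt content itself.
        fp = hashlib.sha256(f"{agent_id}\x00{prompt}".encode()).hexdigest()
        repeats = sum(1 for seen in self.recent if seen == fp)
        self.recent.append(fp)
        if repeats >= self.threshold:
            self.would_block += 1
            # Shadow mode counts the hit but still allows the request.
            return self.shadow
        return True


enforcing = LoopDetector(threshold=3, shadow=False)
decisions = [enforcing.check("agent-1", "retry the same step") for _ in range(5)]
print(decisions)  # the first three calls pass before the loop is recognised

shadowing = LoopDetector(threshold=3, shadow=True)
shadow_decisions = [shadowing.check("agent-1", "retry the same step") for _ in range(5)]
print(shadow_decisions, shadowing.would_block)
```

In this sketch the first three identical calls are relayed (and billed) before the fourth is blocked, which is the bounded blast radius described above. Running the same traffic in shadow mode blocks nothing but records how many requests would have been blocked, which is the number you would compare against known-good traffic to estimate a false-block rate.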

Question 2

If output tokens are unknown in advance, how does it stop spend before the bill?

Accepted Answer

It uses reserve-then-reconcile rather than a flat estimate. Before relay, the reservation uses the worst case (input tokens counted now, plus output priced at max_tokens times the output rate) to make the hard-cap decision. Because output cannot exceed max_tokens, the reservation is an upper bound on the output cost. When the response returns, the provider's reported usage is taken as the source of truth and reconciled against the reservation. Token counts are normalized across providers and split into input, output, cached-read, and cache-write so accounting reflects each provider's real rate card.
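The reserve-then-reconcile flow can be sketched in a few lines. This is an illustration under stated assumptions, not the product's implementation: the `Budget` and `Rates` names, the integer micro-USD units, and the example prices are hypothetical, and cached-read and cache-write tokens are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in micro-USD per token (not a real rate card)."""
    input_per_token: int
    output_per_token: int


class Budget:
    def __init__(self, cap_micro_usd: int) -> None:
        self.cap = cap_micro_usd
        self.spent = 0     # reconciled spend, from provider-reported usage
        self.reserved = 0  # worst-case holds for requests still in flight

    def reserve(self, input_tokens: int, max_tokens: int, rates: Rates) -> Optional[int]:
        """Hold the worst-case cost, or return None if it would exceed the cap."""
        worst_case = (input_tokens * rates.input_per_token
                      + max_tokens * rates.output_per_token)
        if self.spent + self.reserved + worst_case > self.cap:
            return None  # blocked before relay
        self.reserved += worst_case
        return worst_case

    def reconcile(self, reservation: int, input_tokens: int,
                  output_tokens: int, rates: Rates) -> int:
        """Release the hold and record the cost of the usage actually reported."""
        actual = (input_tokens * rates.input_per_token
                  + output_tokens * rates.output_per_token)
        self.reserved -= reservation
        self.spent += actual
        return actual


rates = Rates(input_per_token=3, output_per_token=15)
budget = Budget(cap_micro_usd=100_000)  # a $0.10 cap

first = budget.reserve(input_tokens=1_000, max_tokens=4_000, rates=rates)
second = budget.reserve(input_tokens=1_000, max_tokens=4_000, rates=rates)
actual = budget.reconcile(first, input_tokens=1_000, output_tokens=500, rates=rates)
third = budget.reserve(input_tokens=1_000, max_tokens=4_000, rates=rates)
print(first, second, actual, third)
```

Here the second request is blocked while the first is still in flight, because two worst-case holds would exceed the cap. Once the first response is reconciled at its much smaller real cost, the headroom is released and a third request fits again.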

Question 3

Where are my prompts stored or sent?

Accepted Answer

S4 Firewall itself does not persist or transmit your prompts or responses. Its only outbound call is the provider request your application would have made anyway, so the firewall adds no egress of its own. The ledger and metrics carry token counts, not content (counts-not-content, verified by property tests). Where prompts go depends on the upstream you choose: routing to Amazon Bedrock through a VPC interface endpoint (PrivateLink, which this AMI can provision) keeps those calls inside your AWS boundary, while routing to a third-party provider over the public internet sends them outside your VPC.

Question 4

Does it need a separate control plane or database?

Accepted Answer

No. There is no separate control plane and no external database. Budget state is held in-memory per instance and re-derived from zero on restart. The data plane is a single static binary running under a hardened systemd unit with zero elevated capabilities and a least-privilege IAM role (upstream model invocation, CloudWatch PutMetricData scoped to the S4/Firewall namespace, and write-only PutObject to the ledger bucket). There is no telemetry home-call and no license-key check.
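For readers who want to picture the least-privilege role, the sketch below builds an IAM policy document with the three permissions listed above. It is an assumption-laden illustration rather than the policy shipped with the AMI: the bucket name `example-ledger-bucket`, the `Sid` values, the choice of Bedrock as the upstream, and the wildcard Bedrock resource are all hypothetical. Only the `S4/Firewall` namespace comes from the answer itself.

```python
import json

LEDGER_BUCKET = "example-ledger-bucket"  # hypothetical bucket name


def build_policy(ledger_bucket: str) -> dict:
    """Return an illustrative least-privilege policy (not the shipped one)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Upstream model invocation, assuming Amazon Bedrock is the upstream.
                "Sid": "InvokeUpstreamModel",
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": "*",  # assumption: narrow to specific model ARNs in practice
            },
            {
                # PutMetricData has no resource-level permissions, so the
                # scoping is done with the cloudwatch:namespace condition key.
                "Sid": "PutFirewallMetrics",
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"cloudwatch:namespace": "S4/Firewall"}
                },
            },
            {
                # Write-only: PutObject without GetObject or ListBucket.
                "Sid": "WriteOnlyLedger",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{ledger_bucket}/*",
            },
        ],
    }


policy = build_policy(LEDGER_BUCKET)
print(json.dumps(policy, indent=2))
```

The point of the sketch is the shape: no read access to the ledger bucket, no metrics outside one namespace, and nothing beyond invoking the upstream model.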

Question 5

How is it billed, and how do I deploy it?

Accepted Answer

Billing is AMI hourly (metered per instance-hour) with an annual contract option, running on c6g / c7g (Arm) instances. You deploy with the included CloudFormation templates — cfn-single.yaml for a single instance, cfn-ha.yaml for a redundant fleet behind an internal load balancer — which optionally create the Bedrock VPC interface endpoint. Then you simply point your application's base_url at the firewall.
