S3 Storage Cost Optimization in Practice: From Billing Breakdown to 7 Actionable Methods

Many teams look only at “how many TB are stored” when reviewing an S3 bill, but the real cost is usually a combination of four kinds of charges: storage capacity, requests, data transfer, and management features. Optimization should be analyzed along the same four dimensions, rather than by simply moving every bucket to a colder storage class.

The prices below use us-east-1 as an example; actual prices vary by AWS region. The first 50 TB of S3 Standard is approximately $0.023/GB-month; Standard-IA is about $0.0125/GB-month; Glacier Instant Retrieval is about $0.004/GB-month; and Glacier Deep Archive is about $0.00099/GB-month. Deep Archive looks far cheaper, but it comes with a minimum storage duration, retrieval delays, and retrieval fees, so a blind migration can backfire.
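
To make the gap between classes concrete, the following minimal sketch multiplies a data set size by the per-GB prices quoted above. The 10 TB size is a hypothetical example, and the calculation covers storage only: it ignores request fees, retrieval fees, minimum storage durations, and the price tiers above 50 TB.

```python
# Per-GB-month prices quoted in the text for us-east-1 (first 50 TB for Standard).
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only cost in USD per month (no requests, retrieval, or minimums)."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


if __name__ == "__main__":
    size_gb = 10_000  # hypothetical 10 TB data set
    for storage_class in PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(size_gb, storage_class)
        print(f"{storage_class:<13} ${cost:,.2f}/month")
```

The storage line alone favors the coldest class by a wide margin, which is exactly why the other charges need to be checked before migrating.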

1. First Identify the Sources of Your Bill

Start by breaking down your costs by usage type in Cost Explorer:

  • TimedStorage-ByteHrs: Object storage capacity.
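  • Requests-Tier1/Tier2: Request charges. Tier1 covers PUT, COPY, POST, and LIST; Tier2 covers GET and other read requests.
  • DataTransfer-Out-Bytes: Data transfer out to the public internet. Cross-region transfer usually appears under separate usage types that name the region pair.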
  • Monitoring-Automation, Inventory, StorageLens: Management and analytics features.
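
As a rough illustration of this breakdown, the sketch below groups exported line items into the four cost dimensions by matching the usage type names listed above. The function names and the sample figures are hypothetical; real usage types often carry a region prefix such as `USW2-`, which is why the code matches on substrings rather than exact names.

```python
from collections import defaultdict

# Substrings from the usage types listed above, mapped to the four dimensions.
CATEGORY_MARKERS = [
    ("TimedStorage", "storage"),
    ("Requests-Tier", "requests"),
    ("DataTransfer", "data_transfer"),
    ("Monitoring-Automation", "management"),
    ("Inventory", "management"),
    ("StorageLens", "management"),
]


def categorize(usage_type: str) -> str:
    """Return the cost dimension for one usage type, or 'other' if unknown."""
    for marker, category in CATEGORY_MARKERS:
        if marker in usage_type:
            return category
    return "other"


def summarize(line_items):
    """Sum costs per dimension; line_items is an iterable of (usage_type, usd)."""
    totals = defaultdict(float)
    for usage_type, cost in line_items:
        totals[categorize(usage_type)] += cost
    return dict(totals)


# Hypothetical monthly figures, for illustration only.
sample = [
    ("TimedStorage-ByteHrs", 120.0),
    ("USW2-TimedStorage-ByteHrs", 30.0),
    ("Requests-Tier1", 310.0),
    ("Requests-Tier2", 45.0),
    ("DataTransfer-Out-Bytes", 80.0),
]

if __name__ == "__main__":
    for category, usd in sorted(summarize(sample).items(), key=lambda kv: -kv[1]):
        print(f"{category:<14} ${usd:,.2f}")
```

In this made-up sample, requests outweigh storage, which is the kind of result that bucket size alone would never reveal.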

A common mistake is to assume that costs are fine because storage fees are low, while overlooking high volumes of LIST/GET requests or traffic from private subnets that reaches S3 through a NAT Gateway and inflates the network bill. S3 optimization should never stop at bucket size.

2. Intelligent-Tiering and Lifecycle Rules

If object access patterns are unpredictable, S3 Intelligent-Tiering is a low-risk starting point. It automatically moves objects between access tiers such as Frequent Access, Infrequent Access, and Archive Instant Access. There are no retrieval fees for moves between the automatic tiers, but a monitoring and automation fee of about $0.0025 per 1,000 objects per month applies. Objects smaller than 128 KB are not monitored or auto-tiered, so they gain nothing from this class, and for objects only slightly larger the monitoring fee can offset most of the tiering savings.
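
The trade-off between the monitoring fee and the tiering savings can be estimated with a short calculation. This is a sketch under stated assumptions: the Frequent and Infrequent tier prices are assumed to match the S3 Standard and Standard-IA prices quoted earlier, the deeper tiers are ignored, and the object counts are hypothetical.

```python
MONITORING_FEE_PER_OBJECT = 0.0025 / 1000  # USD per object-month, from the text
FREQUENT_PRICE = 0.023     # USD/GB-month; assumed equal to S3 Standard
INFREQUENT_PRICE = 0.0125  # USD/GB-month; assumed equal to Standard-IA


def break_even_object_size_gb() -> float:
    """Object size at which the Infrequent tier saving equals the monitoring fee."""
    return MONITORING_FEE_PER_OBJECT / (FREQUENT_PRICE - INFREQUENT_PRICE)


def net_monthly_saving(object_count, avg_size_gb, infrequent_fraction):
    """Tiering saving minus monitoring fee, in USD per month.

    infrequent_fraction is the share of objects sitting in the Infrequent tier.
    """
    saving_per_gb = FREQUENT_PRICE - INFREQUENT_PRICE
    tier_saving = object_count * infrequent_fraction * avg_size_gb * saving_per_gb
    monitoring_fee = object_count * MONITORING_FEE_PER_OBJECT
    return tier_saving - monitoring_fee


if __name__ == "__main__":
    print(f"break-even size: {break_even_object_size_gb() * 1e6:.0f} KB")
    # Hypothetical: one million objects, all in the Infrequent tier.
    print(f"1 MB objects:   {net_monthly_saving(1_000_000, 0.001, 1.0):+.2f} USD")
    print(f"200 KB objects: {net_monthly_saving(1_000_000, 0.0002, 1.0):+.2f} USD")
```

Under these assumptions the break-even point is roughly a quarter of a megabyte per object, and it moves higher when only part of the data is actually cold.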

Lifecycle rules are suitable for data with clear access patterns, such as logs, backups, and training samples:

{
  "Rules": [
    {
      "ID": "logs-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 730 }
    }
  ]
}

Pay attention to minimum storage durations: 30 days for Standard-IA, 90 days for Glacier Instant Retrieval and Glacier Flexible Retrieval, and 180 days for Deep Archive. Objects deleted or transitioned earlier are still billed for the remainder of the minimum, so short-lived objects should not be moved to colder tiers early.
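
A lifecycle rule can be checked against these minimums before it is applied. The sketch below is a hypothetical helper, not part of any AWS tool: it reads a rule in the JSON shape shown above, works out how many days an object spends in each class, and flags any stay shorter than the minimum duration.

```python
# Minimum storage durations in days; GLACIER is the API name for Flexible Retrieval.
MIN_DAYS = {"STANDARD_IA": 30, "GLACIER_IR": 90, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def days_per_class(rule: dict):
    """Return [(storage_class, days_spent)]; days_spent is None if never expired."""
    transitions = sorted(rule.get("Transitions", []), key=lambda t: t["Days"])
    expiry = rule.get("Expiration", {}).get("Days")
    spans = []
    for i, transition in enumerate(transitions):
        if i + 1 < len(transitions):
            end = transitions[i + 1]["Days"]  # next transition ends this stay
        else:
            end = expiry                      # last class lasts until expiration
        days = None if end is None else end - transition["Days"]
        spans.append((transition["StorageClass"], days))
    return spans


def short_stays(rule: dict):
    """Return the classes whose stay is shorter than the minimum duration."""
    return [
        (storage_class, days)
        for storage_class, days in days_per_class(rule)
        if days is not None and days < MIN_DAYS.get(storage_class, 0)
    ]


# The rule from the text above.
logs_rule = {
    "ID": "logs-to-glacier",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"},
        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
    ],
    "Expiration": {"Days": 730},
}

if __name__ == "__main__":
    print(days_per_class(logs_rule))
    print(short_stays(logs_rule))
```

For the rule above every stay meets its minimum. Shortening the expiration to 200 days would leave only 20 days in Deep Archive, which the check reports as a short stay.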

3. Glacier IR and Deep Archive: Only for “Truly Cold” Data

Glacier Instant Retrieval is suitable for archives that require “infrequent access but millisecond-level retrieval,” such as compliance audits and historical reports. Deep Archive is suitable for data that is almost never read and can tolerate restore times of many hours (standard retrievals complete within about 12 hours, bulk retrievals within about 48 hours), such as multi-year backups.

The decision rule can be simple: Deep Archive is worth evaluating only if an object is highly unlikely to be read within 6 months, which matches its 180-day minimum storage duration. If the business cannot accept the restore wait, do not force the migration just to make the bill look good.

4. Cleaning Up Old Versions and Incomplete Multipart Uploads

Once Versioning is enabled, deleting an object without specifying a version ID only adds a delete marker, and the older versions continue to be billed. A large portion of the “ghost costs” in many buckets comes from these historical versions.

You can check versions first:

aws s3api list-object-versions --bucket my-bucket --prefix app/
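
The raw listing is hard to read at scale, so it can help to total it up. The following sketch assumes the command's JSON output has been saved to a file or string; `version_usage` is a hypothetical helper, and the sample listing is made up and trimmed to the fields the helper reads.

```python
import json


def version_usage(listing: dict) -> dict:
    """Total current and noncurrent bytes from list-object-versions JSON output."""
    current = noncurrent = 0
    for version in listing.get("Versions", []):
        if version["IsLatest"]:
            current += version["Size"]
        else:
            noncurrent += version["Size"]
    return {
        "current_bytes": current,
        "noncurrent_bytes": noncurrent,
        "delete_markers": len(listing.get("DeleteMarkers", [])),
    }


# Made-up sample, trimmed to the fields used above.
sample_json = """
{
  "Versions": [
    {"Key": "app/a.bin", "VersionId": "v3", "IsLatest": true,  "Size": 1000},
    {"Key": "app/a.bin", "VersionId": "v2", "IsLatest": false, "Size": 900},
    {"Key": "app/a.bin", "VersionId": "v1", "IsLatest": false, "Size": 800},
    {"Key": "app/b.bin", "VersionId": "v1", "IsLatest": false, "Size": 5000}
  ],
  "DeleteMarkers": [
    {"Key": "app/b.bin", "VersionId": "v2", "IsLatest": true}
  ]
}
"""

if __name__ == "__main__":
    print(version_usage(json.loads(sample_json)))
```

In the sample, noncurrent versions hold far more bytes than the current ones, and `app/b.bin` looks deleted to the application while its old version is still billed.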

Then use lifecycle rules to clean up non-current versions:

{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}

AbortIncompleteMultipartUpload is easily overlooked. When a large upload fails partway, the parts already uploaded stay in S3 and keep incurring storage charges until the upload is completed or aborted.

5. Request Optimization: LIST is More Expensive Than You Think

In S3 Standard, PUT/COPY/POST/LIST requests cost about $0.005 per 1,000 requests, while GET costs about $0.0004 per 1,000, so a LIST call costs roughly 12.5 times as much as a GET. Each ListObjectsV2 call returns at most 1,000 keys, so a full scan of a bucket with hundreds of millions of objects takes hundreds of thousands of sequential requests. A single scan is cheap in dollars, but jobs that repeat it every few minutes add up quickly and also slow the system down.
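
The arithmetic behind full scans is easy to sketch. The example below uses the LIST price quoted above and the 1,000-key page limit of ListObjectsV2; the object count and scan frequency are hypothetical.

```python
import math

LIST_PRICE_PER_1000 = 0.005  # USD per 1,000 LIST requests, from the text
MAX_KEYS_PER_LIST = 1000     # ListObjectsV2 returns at most 1,000 keys per call


def list_requests_for_full_scan(object_count: int) -> int:
    """Minimum number of LIST calls needed to enumerate every key once."""
    return math.ceil(object_count / MAX_KEYS_PER_LIST)


def scan_cost(object_count: int, scans_per_month: int = 1) -> float:
    """USD per month spent on LIST requests for repeated full scans."""
    requests = list_requests_for_full_scan(object_count) * scans_per_month
    return requests * LIST_PRICE_PER_1000 / 1000


if __name__ == "__main__":
    objects = 300_000_000              # hypothetical bucket size
    every_five_minutes = 12 * 24 * 30  # scans in a 30-day month
    print(f"requests per scan: {list_requests_for_full_scan(objects):,}")
    print(f"one scan:          ${scan_cost(objects):,.2f}")
    print(f"scan every 5 min:  ${scan_cost(objects, every_five_minutes):,.2f}/month")
```

With these inputs one scan costs about $1.50, while a job that rescans every five minutes costs about $12,960 per month. The frequency of scanning, not the unit price, is what drives the bill.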

Practical recommendations:

  • Avoid using ListObjectsV2 as a database query.
  • Include dates, tenants, and business prefixes in your object key designs to reduce meaningless scans.
  • Use S3 Inventory instead of full LIST operations.
  • Combine or cache hot, small objects to reduce GET request storms.
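
The key-design recommendation can be illustrated with a small helper. This is one possible layout, not a prescribed one: the segment order (tenant, business domain, date) and the function names are assumptions chosen for the example.

```python
from datetime import date


def day_prefix(tenant: str, domain: str, day: date) -> str:
    """Prefix covering one tenant, one business domain, and one calendar day."""
    return f"{tenant}/{domain}/{day:%Y/%m/%d}/"


def build_key(tenant: str, domain: str, day: date, name: str) -> str:
    """Full object key placed under the day prefix."""
    return day_prefix(tenant, domain, day) + name


if __name__ == "__main__":
    day = date(2024, 5, 17)
    print(build_key("acme", "invoices", day, "batch-0001.json"))
    # A listing job can pass this as the Prefix parameter of ListObjectsV2
    # so that only one day's keys are enumerated instead of the whole bucket.
    print(day_prefix("acme", "invoices", day))
```

With keys laid out this way, a daily job lists one narrow prefix rather than scanning every object in the bucket.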

6. Use VPC Endpoints to Avoid NAT Gateway Routing

When EC2, ECS, or Lambda workloads in a private subnet access S3, routing through a NAT Gateway incurs NAT data processing fees. In us-east-1, a NAT Gateway typically costs about $0.045/hour plus $0.045 per GB processed. Processing 1 TB (about 1,000 GB) per month costs roughly $45 in data processing fees alone, before the hourly charge.
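
The NAT Gateway figures above can be turned into a quick estimate. This sketch uses the us-east-1 prices from the text and assumes a 730-hour month; the traffic volumes are hypothetical.

```python
NAT_HOURLY_USD = 0.045  # per NAT Gateway hour, from the text
NAT_PER_GB_USD = 0.045  # per GB processed, from the text


def nat_monthly_cost(gb_processed: float, hours: float = 730) -> dict:
    """Split a month of NAT Gateway cost into hourly and data processing parts."""
    hourly = NAT_HOURLY_USD * hours
    processing = NAT_PER_GB_USD * gb_processed
    return {"hourly": hourly, "processing": processing, "total": hourly + processing}


if __name__ == "__main__":
    for gb in (1_000, 10_000, 100_000):  # hypothetical monthly S3 traffic
        cost = nat_monthly_cost(gb)
        print(f"{gb:>7,} GB: processing ${cost['processing']:,.2f}, "
              f"total ${cost['total']:,.2f}")
```

The hourly part is fixed, so at higher volumes almost all of the cost is data processing, which is the part that moving S3 traffic off the NAT Gateway removes.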

When accessing S3 and DynamoDB, prioritize configuring a Gateway VPC Endpoint:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxxx \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-xxxx \
  --vpc-endpoint-type Gateway

Gateway endpoints carry no hourly or data processing charges, and they take S3 traffic off the NAT Gateway. After creating one, inspect your route tables to confirm that a route for the S3 prefix list targets the endpoint.
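
The route table check can be scripted against the JSON returned by `aws ec2 describe-route-tables`. The sketch below assumes that output has been parsed into a list of route tables; the helper name and every ID in the sample are hypothetical placeholders, and the S3 prefix list ID has to be looked up for your own region.

```python
def s3_endpoint_routes(route_tables: list, s3_prefix_list_id: str) -> dict:
    """Map route table ID to endpoint ID where the S3 prefix list targets a vpce-."""
    found = {}
    for table in route_tables:
        for route in table.get("Routes", []):
            targets_s3 = route.get("DestinationPrefixListId") == s3_prefix_list_id
            via_endpoint = route.get("GatewayId", "").startswith("vpce-")
            if targets_s3 and via_endpoint:
                found[table["RouteTableId"]] = route["GatewayId"]
    return found


# Placeholder data in the shape of the "RouteTables" list; all IDs are made up.
sample_tables = [
    {
        "RouteTableId": "rtb-aaaa",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-1111"},
            {"DestinationPrefixListId": "pl-xxxx", "GatewayId": "vpce-2222"},
        ],
    },
    {
        "RouteTableId": "rtb-bbbb",
        "Routes": [
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-1111"},
        ],
    },
]

if __name__ == "__main__":
    covered = s3_endpoint_routes(sample_tables, "pl-xxxx")
    for table in sample_tables:
        status = covered.get(table["RouteTableId"], "no S3 endpoint route")
        print(table["RouteTableId"], "->", status)
```

In the sample, `rtb-bbbb` has no endpoint route, so subnets associated with it would still send S3 traffic through the NAT Gateway.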

7. Compression: Application-Side Compression vs. Transparent Gateways

Compression is the most direct way to optimize capacity, but it depends heavily on the data type. Parquet, ORC, gzip-compressed logs, images, and videos are usually already compressed and yield limited returns. JSON, CSV, plaintext logs, uncompressed objects, and certain backup files can offer significant savings.
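
Before committing to any compression approach, it is worth measuring how compressible a sample of your data actually is. The sketch below uses gzip from the Python standard library purely as a yardstick; other codecs will give different ratios. The JSON log lines are synthetic, and random bytes stand in for data that is already compressed.

```python
import gzip
import json
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more savings)."""
    return len(gzip.compress(data)) / len(data)


# Synthetic plaintext JSON logs: repetitive structure, so highly compressible.
text_sample = "\n".join(
    json.dumps({"ts": 1_700_000_000 + i, "level": "INFO",
                "msg": "request served", "status": 200})
    for i in range(2000)
).encode()

# Random bytes behave like already-compressed data such as video or gzip files.
random_sample = os.urandom(len(text_sample))

if __name__ == "__main__":
    print(f"JSON logs:    {gzip_ratio(text_sample):.2f} of original size")
    print(f"random bytes: {gzip_ratio(random_sample):.2f} of original size")
```

Running the same function over a few real objects from each prefix gives a first estimate of where compression pays off and where it does not.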

The advantage of application-side compression is a simple architecture with no extra gateways; the downsides include the need for code changes, handling compatibility issues, and migrating historical data. Another approach is a transparent compression gateway: the application continues to use the S3 API, pointing its client at the gateway, which compresses the data before writing it to S3.

S4 (Squished S3) falls into the latter category. It is an EC2 AMI that includes the NVIDIA nvCOMP GPU codec and runs on GPU instances such as g4dn/g5/g6. Once the S3 client points to this transparent gateway, it can typically reduce stored S3 bytes by 50% to 80% for compressible data, without requiring any application code changes. It does not suit every scenario: if the data is already compressed, the objects are small, or request volume is very high while storage volume is low, the benefit may be negligible. It is most worth evaluating when you hold massive amounts of text or semi-structured data in S3, storage capacity is the main cost driver, and you do not want to modify your application.

You can first use the S4 Cost Savings Calculator to get a rough estimate; it does not require uploading your own billing CSV and allows you to test with sample data.

Finally: Measure First, Then Modify Strategies

Recommended implementation order:

  1. Use Cost Explorer to find the primary usage types for S3.
  2. Use S3 Storage Lens to analyze buckets, prefixes, versions, and object size distributions.
  3. Clean up older versions and incomplete multipart uploads first.
  4. Then implement lifecycle rules, VPC Endpoints, and request optimization.
  5. Finally, evaluate compression or alternative architectures.

S3 cost reduction is not as simple as “moving everything to cold storage.” A truly effective approach requires analyzing data access patterns, object sizes, request paths, and retention lifecycles together.

Disclosure: The author of this article is from abyo software (the developer of S4, a cost optimization product on AWS Marketplace).