AWS S3 - Lifecycle Management
By Abhinay Durishetty
15 mins read
When we walk into a library, the layout is often strategic. Newly released and popular books are displayed prominently at the front, easily accessible as they are in high demand. As time goes on, these books might be moved to general shelves when their immediate popularity wanes, but they're still frequently read.

Deep within the library, there might be a special section or a storage area for rare and old books. These books are not in regular circulation but are preserved for their value, perhaps accessed only once in a while for research or by enthusiasts.

Then, consider those books that have been in the library for decades and are no longer relevant or in a condition to be read. The library, at some point, may decide to retire these books, either donating them, selling them off in sales, or recycling them.

Now, let's relate this to Amazon S3's Lifecycle Management:

New and Popular Books: These represent your most recent and frequently accessed data. Just as new books are kept at the forefront in a library, this data stays in the standard storage class for easy and quick access.

General Shelves: As books lose their immediate demand, they move to the general section. Similarly, after a certain period, your data might be transitioned to a more cost-effective storage class like Standard-IA (Infrequent Access) since it's accessed less but still holds value.

Rare and Old Books Section: Analogous to the Glacier or Glacier Deep Archive storage classes in S3, some data, while accessed very infrequently, needs to be preserved for the long term due to its intrinsic value.

Retiring Books: Just as libraries might remove old, damaged, or irrelevant books, Lifecycle Management can be set to automatically delete data that's no longer needed after a specific time.

Through this analogy, we can understand that data, like books, has different phases in its lifecycle. And just as a library manages its collection to serve its readers best while optimizing space, Amazon S3's Lifecycle Management helps in optimizing storage costs and ensuring that data is stored most efficiently throughout its lifecycle.

What is Lifecycle Management?

Amazon S3's Lifecycle Management allows users to define rules for the automatic transitioning of data to different storage classes or the deletion of data. This ensures optimal cost management, aligns with data retrieval needs, and automates the cleanup of redundant or outdated data.
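
To make this concrete, here is a minimal sketch of what a lifecycle rule looks like when defined programmatically with boto3, the AWS SDK for Python. The bucket name and the 30- and 365-day figures are placeholders for illustration, not values prescribed by S3:

import boto3

s3 = boto3.client("s3")

# A minimal lifecycle rule covering both actions described above:
# transition objects to Standard-IA after 30 days, then delete them
# after 365 days. "my-example-bucket" is a placeholder name.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-then-expire",
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)

Note that this call replaces the bucket's entire lifecycle configuration, so any existing rules you want to keep must be included in the same request.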

Understanding Storage Classes:

Before diving into how to set up Lifecycle Management, it's crucial to understand the different storage classes available in Amazon S3, as they serve as target destinations in your transition rules:

  • STANDARD: Ideal for frequently accessed data and designed for durability. Suitable for big data analytics, mobile and gaming applications, content distribution, and backups.

  • INTELLIGENT-TIERING: Perfect for data with unpredictable access patterns. This class automatically moves objects between frequent and infrequent access tiers based on changing access patterns. Objects can be transitioned to this storage class after 30 days of creation.

  • ONEZONE-IA (Infrequent Access): For data that's infrequently accessed but needs quick access when required. Data is stored in a single availability zone. Objects can be transitioned to this storage class after 30 days of creation.

  • GLACIER: Suitable for archiving data with retrieval times ranging from minutes to hours. It's a cost-effective solution for long-term archival. Objects can be transitioned to GLACIER after 90 days of creation.

  • GLACIER DEEP ARCHIVE: The most cost-effective storage class designed for archival data that's accessed once or twice in a year and can tolerate retrieval times of 12 hours. Objects can be transitioned to GLACIER DEEP ARCHIVE after 180 days of creation.

  • STANDARD-IA (Infrequent Access): This class is for data that's less frequently accessed, but when required, needs rapid access. It's a good fit for data like backups or older data not used regularly but necessary for unforeseen retrieval needs. Objects can be transitioned to this storage class after 30 days of creation.

Note: Some of these durations are hard minimums and others are suggestions. Lifecycle rules cannot transition objects into STANDARD-IA or ONEZONE-IA until they are at least 30 days old, while the 90- and 180-day figures for the Glacier classes are common practice rather than requirements. All of these durations can otherwise be customized as per your needs when setting up lifecycle policies in AWS S3.
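
As a sketch of how these classes can chain together in a single policy, the rule below uses boto3 to step objects through the durations discussed above; the bucket name is a placeholder and the day counts can be tuned to your requirements:

import boto3

s3 = boto3.client("s3")

# An illustrative tiering ladder matching the durations above:
# Standard -> Standard-IA (day 30) -> Glacier (day 90)
# -> Glacier Deep Archive (day 180).
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tiering-ladder",
                "Filter": {"Prefix": ""},  # apply to all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)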

How to create a Lifecycle Management rule?

Let's dive into a hands-on AWS S3 lifecycle management exercise. As discussed earlier, S3 offers different storage classes for transitioning infrequently accessed data. In this exercise, we'll upload a basic image to S3 and apply a lifecycle rule to the object.

To upload an object to S3, begin by clicking the "Upload" button, then select a standard file to upload to your S3 bucket.

In this example, I'm uploading a basic image to the specified S3 bucket. You can choose the storage class for the object in the "Properties" section of the upload page, as indicated in the accompanying image. Once you click the "Upload" button, the object will be transferred and stored in your S3 bucket.
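
If you prefer to script this step instead of using the console, here is a rough equivalent with boto3; the file name, bucket name, and key are placeholders, and StorageClass is optional:

import boto3

s3 = boto3.client("s3")

# Upload a local image to the bucket; all names are placeholders.
# StorageClass is optional and defaults to STANDARD if omitted.
s3.upload_file(
    Filename="photo.jpg",
    Bucket="my-example-bucket",
    Key="images/photo.jpg",
    ExtraArgs={"StorageClass": "STANDARD"},
)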

Now that the object has been successfully uploaded to S3, let's set up lifecycle management for it.

  • To create a lifecycle rule, navigate to your bucket, open the "Management" tab, and choose "Create lifecycle rule"
  • Give a name to the lifecycle rule
  • After assigning a name to the rule, you can choose a rule scope. The rule scope lets you narrow down where the rule applies by using one or more filters: a key prefix, object tags, object size, or any combination that fits your use case. This flexibility matters because your S3 bucket may contain many folders with multiple objects, yet you may want to enforce the lifecycle rule within only a particular folder. For example, if your objects reside in a folder named "images," you can set the prefix to images/. You can also attach object tags or use the "object size" option to filter further.
  • Alternatively, you can opt for the "Apply to all objects in the bucket" setting to enforce your lifecycle rule across all objects within the bucket. In this demonstration, I'm selecting this option; once you choose it, remember to tick the acknowledgment checkbox.
For the purposes of this demonstration, as we currently have only one basic file in our S3 bucket, we are creating a rule to delete the object 3 days after its creation; a programmatic sketch of this rule follows below. Because bucket versioning is enabled, the rule can also clean up expired object delete markers, and, independently of versioning, it can abort incomplete multipart uploads. Let's explore what these two options entail.
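
Here is the promised sketch of that 3-day deletion rule as it might be set with boto3; the bucket name is a placeholder, and in this walkthrough the rule is actually created through the console:

import boto3

s3 = boto3.client("s3")

# Delete every object 3 days after creation, mirroring the console rule.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "delete-after-3-days",
                "Filter": {"Prefix": ""},  # all objects in the bucket
                "Status": "Enabled",
                "Expiration": {"Days": 3},
            }
        ]
    },
)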

Expired Object Delete Markers:
  • When you enable versioning for an S3 bucket, S3 keeps multiple versions of an object, including all writes and deletes. A delete action doesn't really remove the object; instead, it just creates a delete marker. Over time, you might end up with many object versions and delete markers.
  • A common scenario is when an object has been deleted (creating a delete marker) and its older versions have since been removed, leaving the delete marker as the object's only remaining version. In this case, the delete marker is considered "expired."
  • The lifecycle policy can be set up to clean up these expired delete markers to keep the bucket organized and prevent unnecessary costs (see the sketch after this list)
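
A minimal sketch of such a cleanup rule with boto3, assuming a versioned bucket; the bucket name and the 30-day window for purging noncurrent versions are illustrative choices:

import boto3

s3 = boto3.client("s3")

# Purge noncurrent versions 30 days after they stop being current,
# then remove delete markers left with no remaining versions.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder; versioning must be enabled
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "clean-up-expired-delete-markers",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)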

Incomplete Multipart Uploads:
  • When you upload large files to S3, you might use the multipart upload feature.
  • This splits a large object into smaller parts and uploads them in parallel to speed up the process. If, for some reason, the upload is interrupted (e.g., a network issue or an application crash), you might end up with parts of the file left in S3 but no complete file.
  • These leftover parts can accumulate over time and result in unnecessary storage costs.
  • The lifecycle policy allows you to set an expiration period for these incomplete multipart uploads. After the specified period, the incomplete uploads are automatically aborted and their parts deleted, freeing up space and reducing costs (a sketch follows this list).
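
And a matching sketch for aborting stale multipart uploads with boto3; the 7-day window and bucket name are illustrative, not values from this walkthrough:

import boto3

s3 = boto3.client("s3")

# Abort multipart uploads still incomplete 7 days after initiation,
# discarding any parts already stored.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)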
Click on “Save” to complete the setup.