A while ago I started a project helping a customer archive terabytes of data into S3. The customer's requirement was simple: be as cost-efficient as possible. So I surveyed methods for putting the customer's data directly into the DEEP_ARCHIVE storage class.
A few weeks after the implementation, while reviewing the customer's monthly bill, I found a charge that I did not recognize and that is not listed on the AWS S3 pricing page:
$0.023 per GB-Month of storage used in GlacierStagingStorage
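After some research I understood what the charge was about. When we upload a file larger than a certain size, the upload client switches to multipart upload: the file is split into parts that are uploaded in parallel threads to speed up the transfer. The exact threshold depends on the client. My guess was 100 MB, which is the object size above which AWS recommends multipart upload, while the AWS CLI defaults to 8 MB.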
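To make the threshold concrete, here is a minimal sketch of how an upload straight into DEEP_ARCHIVE could be configured with boto3. It is not the setup used in the project. The helper names (`transfer_settings`, `upload_to_deep_archive`), the 100 MB values and the concurrency of 10 are assumptions for illustration, and boto3 is imported only inside the function that actually talks to S3.

```python
MB = 1024 * 1024


def transfer_settings(threshold_mb=100, concurrency=10):
    """Build keyword arguments for boto3's TransferConfig (illustrative values)."""
    return {
        # Files larger than this are sent as multipart uploads.
        "multipart_threshold": threshold_mb * MB,
        # Size of each part once multipart upload is used.
        "multipart_chunksize": threshold_mb * MB,
        # Number of threads uploading parts in parallel.
        "max_concurrency": concurrency,
    }


def upload_to_deep_archive(path, bucket_name, key, threshold_mb=100):
    """Upload a local file directly into the DEEP_ARCHIVE storage class."""
    import boto3  # imported here so the helper above works without boto3
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(**transfer_settings(threshold_mb))
    s3 = boto3.client("s3")
    s3.upload_file(
        path,
        bucket_name,
        key,
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
        Config=config,
    )
```

If an upload like this is interrupted partway, the parts that were already sent stay in the bucket, which is where the unexpected charge comes from.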
When parts are uploaded, they are first stored in S3, where they cannot be seen from the console. Once all parts are uploaded, S3 assembles them and creates the object. If a multipart upload fails or is interrupted and is never completed or aborted, the parts that were already uploaded remain in S3 and keep generating cost.
GlacierStagingStorage represents the total storage used by the parts of unfinished multipart uploads.
Therefore, as AWS suggests, it is best practice to configure a lifecycle rule that aborts incomplete multipart uploads and deletes the parts associated with them.
In short, the cost shown on the bill as GlacierStagingStorage comes from parts uploaded by multipart uploads that failed or were never completed. AWS recommends configuring a bucket lifecycle rule that aborts such uploads and deletes the leftover parts to save cost.
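Since these parts are invisible in the console, one way to see how much is staged is to ask the S3 API. The sketch below sums the size of all parts belonging to unfinished multipart uploads in a bucket, using boto3's `list_multipart_uploads` and `list_parts` paginators. The function names are hypothetical, the default rate is simply the $0.023 from the bill above, and treating 1 GB as 1024**3 bytes is an assumption.

```python
def staged_bytes(s3_client, bucket_name):
    """Return the total size in bytes of parts in unfinished multipart uploads."""
    total = 0
    uploads = s3_client.get_paginator("list_multipart_uploads")
    parts = s3_client.get_paginator("list_parts")
    for page in uploads.paginate(Bucket=bucket_name):
        # "Uploads" is absent when the bucket has no unfinished uploads.
        for upload in page.get("Uploads", []):
            for part_page in parts.paginate(
                Bucket=bucket_name,
                Key=upload["Key"],
                UploadId=upload["UploadId"],
            ):
                total += sum(p["Size"] for p in part_page.get("Parts", []))
    return total


def estimate_monthly_cost(size_bytes, rate_per_gb_month=0.023):
    """Rough monthly cost in USD, assuming 1 GB = 1024**3 bytes."""
    return size_bytes / (1024 ** 3) * rate_per_gb_month
```

With boto3 installed, it could be called as `staged_bytes(boto3.client("s3"), "my-archive-bucket")`, where the bucket name is a placeholder.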
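As a rough illustration, such a rule could be set with boto3 as sketched below. The rule ID, the 7-day window and the function names are assumptions chosen for the example, not values from the project. Note that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so any rules already on the bucket would need to be included in the same call.

```python
def build_abort_rule(days_after_initiation=7, prefix=""):
    """Build a lifecycle configuration that aborts incomplete multipart uploads."""
    return {
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",  # hypothetical name
                "Status": "Enabled",
                # An empty prefix applies the rule to the whole bucket.
                "Filter": {"Prefix": prefix},
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": days_after_initiation
                },
            }
        ]
    }


def apply_abort_rule(bucket_name, days_after_initiation=7):
    """Apply the rule to a bucket (overwrites its current lifecycle rules)."""
    import boto3  # imported here so build_abort_rule works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_abort_rule(days_after_initiation),
    )
```

Once the rule is active, S3 aborts any multipart upload that has not completed within the given number of days after it was initiated and removes its parts.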