Storage Checklist
Growth projections
- Identify the types of data to be stored and the expected growth rate for each type.
- Estimate the average object size for each type of data and the expected number of objects.
- Calculate the total amount of data that will need to be stored by multiplying the average object size by the expected number of objects.
- Consider any data compression, deduplication, or other data reduction techniques that may be used to lower the amount of data stored.
- Account for data retention policies, backups, and disaster recovery requirements that may affect storage growth projections.
- Monitor actual storage utilization and adjust projections as needed.
Example projections of monthly usage at a 5%/month compound growth rate (all values in MB; the last column is the running total of the 10 MB/user tier):
Month | 1 MB/user | 2 MB/user | 10 MB/user | Cumulative (10 MB/user) |
---|---|---|---|---|
1 | 1.0500 | 2.1000 | 10.5000 | 10.50 |
2 | 1.1025 | 2.2050 | 11.0250 | 21.53 |
3 | 1.1576 | 2.3153 | 11.5763 | 33.10 |
4 | 1.2155 | 2.4310 | 12.1551 | 45.26 |
5 | 1.2763 | 2.5526 | 12.7628 | 58.02 |
6 | 1.3401 | 2.6802 | 13.4010 | 71.42 |
7 | 1.4071 | 2.8142 | 14.0710 | 85.49 |
8 | 1.4775 | 2.9549 | 14.7746 | 100.27 |
9 | 1.5513 | 3.1027 | 15.5133 | 115.78 |
10 | 1.6289 | 3.2578 | 16.2889 | 132.07 |
11 | 1.7103 | 3.4207 | 17.1034 | 149.17 |
12 | 1.7959 | 3.5917 | 17.9586 | 167.13 |
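The table is straightforward to reproduce and adapt. The minimal Python sketch below computes the same projection; the 5% rate, the three per-user tiers, and the twelve-month horizon are the assumptions of this example, so substitute your own measured baseline and growth rate.

```python
# Sketch: reproduce the growth-projection table above.
GROWTH_RATE = 0.05      # assumed 5% compound growth per month
TIERS_MB = [1, 2, 10]   # assumed average storage per user, by tier
MONTHS = 12

cumulative = 0.0
print("Month | " + " | ".join(f"{t} MB/user" for t in TIERS_MB) + " | Cumulative (10 MB/user)")
for month in range(1, MONTHS + 1):
    factor = (1 + GROWTH_RATE) ** month
    values = [t * factor for t in TIERS_MB]
    cumulative += values[-1]  # running total of the largest tier, as in the table
    row = " | ".join(f"{v:.4f}" for v in values)
    print(f"{month} | {row} | {cumulative:.2f}")
```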
Type of Storage
Option | S3 (object) | EBS (block) | EFS (file) |
---|---|---|---|
Typical workload | Write-once, read-many objects such as media, logs, and backups | Low-latency transactional I/O for a single EC2 instance | Shared file access for many instances |
Access patterns | Whole-object reads and writes over HTTP | Random block-level reads and writes | Concurrent file-level reads and writes over NFS |
Performance | High throughput, higher per-request latency | Lowest latency | Low latency; throughput scales with the file system |
Durability | Designed for 99.999999999%, stored across multiple Availability Zones | Replicated within a single Availability Zone | Stored across multiple Availability Zones (Standard class) |
Scalability and elasticity | Virtually unlimited and fully managed | Fixed-size volumes that must be resized explicitly | Elastic; grows and shrinks automatically with usage |
Security | Encryption at rest and in transit; IAM and bucket policies | Encryption at rest and in transit; access limited to attached instances | Encryption at rest and in transit; IAM plus POSIX permissions |
Cost | Lowest per GB | Pay for provisioned capacity whether used or not | Highest per GB, but pay only for data actually stored |
Object Storage Checklist
- Organize objects into separate buckets by application, environment, or data classification
- Use unique, consistently structured object keys
- Define a clear object lifecycle (see the lifecycle-policy sketch after this list)
- Implement security best practices
- Consider using multiple cloud providers
- Consider data transfer costs
- Optimize object storage performance
- Utilize object versioning judiciously
- Monitor and optimize costs regularly
- Ensure interoperability and portability between cloud providers
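As a concrete example of defining an object lifecycle, the sketch below uses boto3 to transition objects under a prefix to cheaper storage classes as they age and to expire them after a year. The bucket name, prefix, and transition ages are illustrative assumptions, not recommendations.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and prefix; tune the ages to your access patterns.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-app-logs",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```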
Block Storage Checklist
- Size volumes appropriately
- Stripe data across multiple volumes when a single volume's IOPS or throughput limits are insufficient
- Utilize the appropriate block storage type for the data's access patterns and performance requirements
- Implement encryption for block storage volumes (see the sketch after this list)
- Consider the impact of block size when storing data
- Monitor and optimize costs regularly
- Use block storage for persistent data only
- Implement access controls
- Consider the impact of network latency
- Plan for disaster recovery
- Test backup and recovery procedures regularly
- Consider the impact of volume type
- Avoid over-provisioning block storage resources
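On AWS, several of these items come together when a volume is created. The boto3 sketch below provisions an encrypted gp3 volume; the Availability Zone, size, IOPS, and throughput values are illustrative assumptions to be replaced with measured requirements.

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical sizing; match size, IOPS, and throughput to the workload's
# measured needs rather than over-provisioning.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # assumption: same AZ as the instance
    Size=100,                        # GiB
    VolumeType="gp3",
    Iops=3000,                       # gp3 baseline
    Throughput=125,                  # MiB/s, gp3 baseline
    Encrypted=True,                  # uses the account's default KMS key
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "app", "Value": "example"}],
    }],
)
print(volume["VolumeId"])
```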
Archive Storage Checklist
- Determine which data should be stored in archive storage based on its access patterns and performance requirements.
- Decide whether archive storage is appropriate for your organization's needs rather than defaulting to it for all data storage.
- Implement retention policies to specify how long data should be retained in archive storage.
- Implement encryption for archive storage to protect against security vulnerabilities and data breaches.
- Consider the impact of access latency when choosing the location of archive storage.
- Implement backup and recovery procedures in addition to replication for archive storage.
- Reserve archive storage for data that can tolerate long retrieval times; keep frequently accessed data in hotter storage tiers.
- Regularly review and delete expired data based on retention policies to avoid unnecessary storage costs and potential compliance violations.
- Regularly test backup and recovery procedures to ensure that data can be restored in the event of a disaster or other unexpected event.
- Plan for disaster recovery by replicating data across multiple regions or cloud providers to minimize the risk of data loss and downtime.
AWS Archive Storage Checklist
Amazon S3 offers a variety of storage classes designed for different data access patterns and performance requirements. Here's an overview of the different S3 storage classes:
- Amazon S3 Standard: This is the default storage class for S3 and provides high durability, availability, and performance for frequently accessed data.
- Amazon S3 Intelligent-Tiering: This storage class automatically moves objects between access tiers based on observed access patterns. This can help optimize costs for data with unknown or changing access patterns.
- Amazon S3 Standard-Infrequent Access (S3 Standard-IA): This storage class is designed for infrequently accessed data that needs to be readily available when accessed. It offers a lower storage cost than Amazon S3 Standard with the same low retrieval latency, but adds a per-GB retrieval fee and a 30-day minimum storage duration.
- Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA): This storage class is similar to S3 Standard-IA, but data is stored in a single Availability Zone, which makes it less expensive than S3 Standard-IA. However, it does not survive the loss of that Availability Zone, so it's best suited for infrequently accessed data that can be recreated easily.
- Amazon S3 Glacier: This is a low-cost storage option for data archiving and long-term retention. Retrieval times range from minutes to hours depending on the retrieval option chosen, so it's best suited for infrequently accessed data with long retention periods.
- Amazon S3 Glacier Deep Archive: This storage class provides the lowest cost storage option for long-term data archiving and digital preservation. Standard retrievals from Glacier Deep Archive take up to 12 hours (bulk retrievals up to 48 hours), so it's best suited for data that is rarely accessed and has long retention periods.
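The storage class is chosen per object at upload time. The boto3 sketch below writes an object directly into S3 Standard-IA; the bucket, key, file name, and class choice are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and key; STANDARD_IA suits data accessed less than
# about once a month but needed within milliseconds when it is accessed.
with open("q1-summary.pdf", "rb") as body:
    s3.put_object(
        Bucket="example-reports",
        Key="2023/q1-summary.pdf",
        Body=body,
        StorageClass="STANDARD_IA",  # or GLACIER / DEEP_ARCHIVE for archives
    )
```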
Amazon S3 Glacier Deep Archive
Amazon S3 Glacier Deep Archive is a storage class in Amazon S3 Glacier that provides the lowest cost storage option for long-term data archiving and digital preservation. It is designed for customers who need to retain large amounts of data for many years or decades, and who don't require immediate or frequent access to that data.
Data stored in Amazon S3 Glacier Deep Archive is durably stored across multiple Availability Zones within an AWS Region and is designed for 99.999999999% durability. This means that even in the unlikely event that multiple disks or facilities fail, the data remains durable.
The retrieval times for data stored in Amazon S3 Glacier Deep Archive can be as long as 12 hours, which makes it best suited for infrequently accessed data that is not needed immediately. Retrieving data from Amazon S3 Glacier Deep Archive is a multi-step process that involves initiating a retrieval job, waiting for the data to become available, and then downloading the data.
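A sketch of what that multi-step retrieval might look like with boto3 is shown below. The bucket, key, and seven-day availability window are hypothetical values.

```python
import boto3

s3 = boto3.client("s3")

# Step 1: initiate a retrieval job for an archived object.
s3.restore_object(
    Bucket="example-archive",
    Key="backups/2020-full.tar",
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Standard"},  # ~12h for Deep Archive
    },
)

# Step 2: poll until the restore completes; the Restore header flips from
# ongoing-request="true" to ongoing-request="false" when the copy is ready.
head = s3.head_object(Bucket="example-archive", Key="backups/2020-full.tar")
print(head.get("Restore"))

# Step 3: once restored, download as usual with get_object / download_file.
```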
Because Amazon S3 Glacier Deep Archive is a low-cost storage option, it's ideal for storing large amounts of data that is rarely accessed and has long retention periods, such as compliance data, backups, and archives.
- Determine which data should be stored in Amazon S3 Glacier or Amazon S3 Glacier Deep Archive based on its access patterns and performance requirements.
- Consider whether archive storage is appropriate for your organization's needs; if not, consider other Amazon S3 storage classes, such as Amazon S3 Standard or Amazon S3 Standard-IA.
- Implement lifecycle policies to specify how long data should be retained in Amazon S3 Glacier or Amazon S3 Glacier Deep Archive, and automatically transition it to the appropriate storage class based on its age or other criteria.
- Implement server-side encryption for Amazon S3 Glacier and Amazon S3 Glacier Deep Archive to protect against security vulnerabilities and data breaches.
- Consider the impact of access latency when choosing the AWS Region where Amazon S3 Glacier or Amazon S3 Glacier Deep Archive data is stored.
- Implement backup and recovery procedures in addition to replication for Amazon S3 Glacier or Amazon S3 Glacier Deep Archive, such as by using AWS Backup to create and manage backups.
- Ensure that Amazon S3 Glacier or Amazon S3 Glacier Deep Archive is used only for data that is appropriate for archiving, and use other Amazon S3 storage classes for frequently accessed or transactional data.
- Regularly review and delete expired data based on lifecycle policies to avoid unnecessary storage costs and potential compliance violations.
- Regularly test backup and recovery procedures to ensure that data can be restored in the event of a disaster or other unexpected event.
- Plan for disaster recovery by replicating data across multiple AWS Regions to minimize the risk of data loss and downtime (see the replication sketch after this list), and consider AWS Storage Gateway to create a hybrid cloud storage solution.
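One way to set up that cross-region replication is S3 Replication. The boto3 sketch below is a minimal configuration, assuming versioning is already enabled on both buckets and that the IAM role ARN is a placeholder you would create separately; note that replication applies only to objects created after the rule is added.

```python
import boto3

s3 = boto3.client("s3")

# Assumptions: both buckets exist with versioning enabled, and the IAM role
# (a placeholder ARN here) grants the necessary replication permissions.
s3.put_bucket_replication(
    Bucket="example-archive-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-replication-role",
        "Rules": [
            {
                "ID": "dr-copy-to-second-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # replicate every object in the bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::example-archive-eu-west-1",
                    # land replicas directly in the archive tier
                    "StorageClass": "DEEP_ARCHIVE",
                },
            }
        ],
    },
)
```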