Storage Basics

Storage Considerations

  • Number of active users

    • The number of active users can affect the amount of storage needed, as each user may have their own data that needs to be stored.
  • Data retention policies

    • The length of time data is retained can affect storage needs, as longer retention periods mean more data must be kept.
  • Data growth rate

    • The rate at which data is growing can affect storage needs, as more storage will be needed to accommodate the growth.
  • Type of data being stored

    • Different types of data can have different storage requirements. For example, storing images and videos can require more storage than storing text.
  • Compression and optimization

    • Data compression and optimization techniques can affect storage needs, as these techniques can reduce the amount of storage required for a given amount of data.
  • Data replication and backup

    • Data replication and backup can affect storage needs, as additional storage will be needed to accommodate the replication and backup of data.
  • Auditing and compliance

    • Auditing and compliance requirements can affect storage needs, as additional storage may be required to store audit logs and comply with regulations.
  • Scalability and performance requirements

    • The scalability and performance requirements of the application can affect storage needs, as more storage may be needed to accommodate a high volume of requests or to ensure fast access to data.
  • Security requirements

    • Security requirements, such as encryption and access controls, can affect storage needs: encryption can add a small overhead per stored object (for example, key metadata), and access controls typically produce policies and access logs that also need to be stored.
  • Cloud provider pricing

    • The pricing of storage from the cloud provider can affect storage needs, as the cost of storage can impact the total cost of the application and influence the storage capacity that is purchased.
  • Multi-tenancy

    • If the application is designed to support multiple tenants, storage needs will increase as the number of tenants increases, as storage will be needed to store data for each tenant.
  • Data governance

    • The data governance policies in place can affect storage needs, as certain data governance policies may require more data to be stored or retained.
  • Integration with other systems

    • Integration with other systems can affect storage needs, as additional storage may be needed to store data from integrated systems.
  • Third-party services

    • The use of third-party services, such as analytics or reporting tools, can affect storage needs, as additional storage may be needed to store data from these services.
  • Business requirements

    • Business requirements, such as the need for real-time data access or the ability to handle large data sets, can affect storage needs and influence the storage capacity that is purchased.
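
Several of the considerations above (active users, growth rate, compression, and replication) can be combined into a rough capacity estimate. The following is a minimal sketch rather than a sizing tool: the function name, its parameters, and the compound monthly growth model are all assumptions made for illustration.

```python
def estimate_storage_gb(
    active_users: int,
    gb_per_user: float,
    monthly_growth_rate: float,
    months: int,
    replication_factor: int = 1,
    compression_ratio: float = 1.0,
) -> float:
    """Project the raw capacity (in GB) needed after `months` of growth.

    Assumptions (hypothetical model, not from any provider):
    - data grows by a fixed percentage each month (compound growth)
    - compression_ratio = uncompressed size / compressed size
    - every replica stores a full compressed copy of the data
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")

    logical_gb = active_users * gb_per_user
    grown_gb = logical_gb * (1 + monthly_growth_rate) ** months
    compressed_gb = grown_gb / compression_ratio
    return compressed_gb * replication_factor


if __name__ == "__main__":
    # 1,000 users with 2 GB each, 5% monthly growth over a year,
    # 2:1 compression, three replicas.
    needed = estimate_storage_gb(1000, 2.0, 0.05, 12,
                                 replication_factor=3, compression_ratio=2.0)
    print(f"Estimated capacity: {needed:.0f} GB")
```

Retention policies, audit logs, and backups would add further terms to such a model; they are left out here to keep the sketch short.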

Features

  • Object storage

    • A type of storage that is optimized for unstructured data, such as images, videos, and documents.
  • Block storage

    • A type of storage that is similar to a traditional hard drive, providing low-latency access to stored data and supporting the use of file systems.
  • File storage

    • A type of storage that allows for the creation of shared file systems, enabling multiple users to access and collaborate on files.
  • Archive storage

    • A type of storage that is optimized for infrequently accessed data, providing a cost-effective option for storing large amounts of data.
  • Backup and disaster recovery

    • Services that allow for the backup of data to protect against data loss and the ability to recover data in case of a disaster.
  • Data encryption

    • Services that allow for the encryption of data at rest and in transit to protect sensitive data.
  • Access controls

    • Services that allow for the management of access to data, such as setting permissions and creating access policies.
  • Versioning

    • Services that allow for the creation of multiple versions of a file, enabling the recovery of previous versions in case of data loss or corruption.
  • Object lifecycle management

    • Services that allow for the automated management of data, such as moving data to different storage tiers based on usage patterns.
  • Data transfer and migration

    • Services that allow for the transfer and migration of data between different storage services or between on-premises and cloud-based storage.
  • Global access

    • Services that allow easy access to data from multiple locations, helping to reduce latency and improve application performance.
  • Data analytics

    • Services that allow for the analysis of stored data to extract insights and improve business operations.
  • Data warehousing

    • Services that allow for the creation of data warehouses to store large amounts of structured data and support business intelligence and analytics workloads.
  • Multi-cloud support

    • Services that allow for the integration of multiple cloud providers, enabling the use of different storage services across different cloud environments.
  • Hybrid storage

    • Services that allow for the integration of on-premises and cloud-based storage, providing a flexible and cost-effective storage solution.
  • Content delivery network (CDN)

    • Services that allow for the distribution of content, such as images, videos and files, from multiple locations to reduce latency and improve performance.
  • Automatic tiering

    • Services that automatically move data to different storage tiers based on usage patterns, helping to optimize costs and improve performance.
  • Data lake

    • Services that allow for the storage and management of large amounts of unstructured and structured data, providing a central repository for data analytics and machine learning workloads.

AWS Storage Options

  • Amazon Simple Storage Service (S3)

    • A fully managed object storage service that allows for the storage and retrieval of large amounts of unstructured data, such as images, videos, and documents.
  • Amazon Elastic Block Store (EBS)

    • A fully managed block storage service that provides low-latency access to stored data and supports the use of file systems.
  • Amazon Elastic File System (EFS)

    • A fully managed file storage service that allows for the creation of shared file systems, enabling multiple users to access and collaborate on files.
  • Amazon S3 Glacier

    • A fully managed archive storage service that is optimized for infrequently accessed data, providing a cost-effective option for storing large amounts of data.
  • AWS Backup

    • A fully managed backup service that allows for the backup of data across multiple services, including S3, EBS, RDS, and more.
  • Amazon S3 Intelligent-Tiering

    • A storage class that automatically moves data between access tiers, such as frequent access and infrequent access, to optimize costs.
  • Amazon S3 Glacier Deep Archive

    • A fully managed archive storage class that is optimized for long-term data archiving, with standard retrievals typically completing within 12 hours, providing the lowest-cost storage option.
  • AWS Storage Gateway

    • A hybrid storage service that enables on-premises applications to store data in the AWS cloud and access it as a file, volume, or tape.
  • Amazon FSx for Lustre

    • A fully managed file storage service optimized for high-performance computing (HPC) and other demanding, throughput-intensive workloads.
  • Amazon FSx for Windows File Server

    • A fully managed file storage service built on Windows Server that provides a native Microsoft Windows file system accessible over the SMB protocol.

Type of Storage

Option           | S3                         | EBS                        | EFS
-----------------|----------------------------|----------------------------|--------------------
Type of workload | Frequent, large quantities | Frequent, small quantities | Low performance
Data durability  | Highest                    | Slightly lower             | Slightly lower
Access patterns  | Frequent, large quantities | Frequent, small quantities | Low performance
Cost             | Most cost-effective        | More expensive             | More cost-effective
Security         | Highest                    | Slightly lower             | Slightly lower
Performance      | Moderate                   | Higher                     | Lower
Scalability      | High                       | More scalable              | More elastic
Elasticity       | High                       | -                          | More elastic
Cost             | Low                        | More expensive             | More cost-effective
Data durability  | High                       | Slightly lower             | Slightly lower

  • Type of workload:

    • To determine the appropriate storage type, consider the type of workload that the storage must support.
    • S3 is suitable for data that needs to be accessed frequently and in large quantities.
    • EBS and EFS are better suited for data that needs to be accessed more frequently and in smaller amounts, such as reads and writes from running instances.
  • Data durability requirements:

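    • The data durability requirements should be considered when determining the storage type.
    • S3 provides the highest data durability (it is designed for 99.999999999% durability).
    • EBS offers lower durability, as a volume is replicated only within a single Availability Zone; EFS Regional file systems store data across multiple Availability Zones.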
  • Access patterns:

    • The expected access patterns should also be considered when choosing the storage type.
    • S3 is best suited for data that needs to be accessed frequently and in large quantities.
    • EBS and EFS are better suited for data that needs to be accessed more frequently and in smaller amounts.
  • Cost:

    • The total cost of the storage type should also be taken into account.
    • S3 is typically the most cost-effective solution.
    • EBS and EFS can be more expensive than S3, as they typically have higher per-gigabyte prices, and EBS is billed for provisioned capacity rather than for actual usage.
  • Security requirements:

    • The security requirements of the data should be considered when choosing the storage type.
    • S3 offers a high level of security, with fine-grained access policies and encryption.
    • EBS and EFS can be less secure by default, as features such as encryption and access restrictions need to be configured.
  • Performance:

    • EBS is better than S3 for applications that require higher performance, such as databases.
    • EFS has higher latency than EBS and, like S3, suits applications with more modest performance requirements, such as serving shared content for web hosting.
  • Scalability:

    • An EBS volume can be enlarged on demand, up to a per-volume size limit; EFS, by contrast, grows automatically without any provisioning.
  • Data durability:

    • EFS Regional file systems store data redundantly across multiple Availability Zones, whereas an EBS volume is replicated only within a single Availability Zone, so EFS generally provides better data durability than EBS.
  • Elasticity:

    • EFS is more elastic than EBS: a file system grows and shrinks automatically as files are added and removed, and it can be mounted by multiple Amazon EC2 instances simultaneously.
  • Cost:

    • EFS can be more cost-effective than EBS for variable workloads, as you pay only for the storage you use, whereas EBS bills for provisioned capacity.
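
The comparison above can be condensed into a simple rule of thumb. The sketch below is illustrative only: the `Workload` fields and the `suggest_aws_storage` function are hypothetical names, and the rules deliberately ignore cost, durability, and security, which a real decision would also weigh.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what an application needs from storage."""
    needs_block_device: bool   # e.g. a database running on a single instance
    shared_file_system: bool   # many instances mount the same file tree


def suggest_aws_storage(workload: Workload) -> str:
    """Return a first-guess AWS storage service for the given workload.

    Assumed rules (a simplification of the comparison in the text):
    - low-latency block access for one instance -> EBS
    - a file system shared by multiple instances -> EFS
    - anything else (objects fetched via an API)  -> S3
    """
    if workload.needs_block_device:
        return "EBS"
    if workload.shared_file_system:
        return "EFS"
    return "S3"


if __name__ == "__main__":
    database = Workload(needs_block_device=True, shared_file_system=False)
    media_library = Workload(needs_block_device=False, shared_file_system=False)
    print(suggest_aws_storage(database))
    print(suggest_aws_storage(media_library))
```

Checking the block-device requirement first is a design choice of this sketch: a workload that needs both a block device and sharing would need closer analysis than a three-way rule can give.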

Amazon S3 Storage Classes

Amazon S3 offers a variety of storage classes designed for different data access patterns and performance requirements. Here's an overview of the different S3 storage classes:

  • Amazon S3 Standard: This is the default storage class for S3 and provides high durability, availability, and performance for frequently accessed data.
  • Amazon S3 Intelligent-Tiering: This storage class monitors access patterns and automatically moves objects between access tiers as those patterns change. This can help optimize costs for data with unknown or changing access patterns.
  • Amazon S3 Standard-Infrequent Access (S3 Standard-IA): This storage class is designed for infrequently accessed data that needs to be readily available when accessed. It offers a lower storage cost than Amazon S3 Standard with the same millisecond access latency, but charges a per-GB retrieval fee.
  • Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA): This storage class is similar to S3 Standard-IA, but data is stored in a single Availability Zone, which makes it less expensive than S3 Standard-IA. However, data can be lost if that Availability Zone is destroyed, so it is less resilient than S3 Standard-IA and is best suited for infrequently accessed data that can be recreated easily.
  • Amazon S3 Glacier: This is a low-cost storage option for data archiving and long-term retention. Retrieval times for data stored in Glacier range from minutes to several hours depending on the retrieval option, so it's best suited for infrequently accessed data with long retention periods.
  • Amazon S3 Glacier Deep Archive: This storage class provides the lowest cost storage option for long-term data archiving and digital preservation. Retrieval times for data stored in Glacier Deep Archive can be up to 12 hours, so it's best suited for data that is rarely accessed and has long retention periods.
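
Moving objects through these storage classes as they age is usually automated with an S3 lifecycle configuration. The sketch below builds such a configuration as a plain Python dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The function name, rule ID, day thresholds, and the bucket name in the comment are assumptions chosen for illustration, not recommendations.

```python
def build_lifecycle_configuration(
    prefix: str = "",
    ia_days: int = 30,
    glacier_days: int = 90,
    deep_archive_days: int = 365,
) -> dict:
    """Build a lifecycle configuration that tiers objects down as they age.

    The thresholds are hypothetical defaults; each transition must happen
    later than the previous one, which is validated here.
    """
    if not (0 < ia_days < glacier_days < deep_archive_days):
        raise ValueError("transition days must be positive and strictly increasing")

    return {
        "Rules": [
            {
                "ID": "tier-down-by-age",        # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},    # empty prefix = all objects
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = build_lifecycle_configuration(prefix="logs/")
    print(config)
    # Applying it requires boto3 and AWS credentials, for example:
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket",             # hypothetical bucket name
    #       LifecycleConfiguration=config,
    #   )
```

Keeping the configuration as data makes it easy to review and test before any request is sent to AWS.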

Azure Storage Options

  • Azure Blob Storage

    • A fully managed object storage service that allows for the storage and retrieval of large amounts of unstructured data, such as images, videos, and documents.
  • Azure Disk Storage

    • A fully managed block storage service that provides low-latency access to stored data and supports the use of file systems.
  • Azure File Storage

    • A fully managed file storage service that allows for the creation of shared file systems, enabling multiple users to access and collaborate on files.
  • Azure Archive Storage

    • A fully managed archive storage service that is optimized for infrequently accessed data, providing a cost-effective option for storing large amounts of data.
  • Azure Backup

    • A fully managed backup service that allows for the backup of data across multiple services, including Azure VMs, SQL, and more.
  • Azure Data Box

    • A family of offline, rugged, and portable data transfer devices for moving large amounts of data to Azure.
  • Azure StorSimple

    • A hybrid storage solution that combines on-premises and cloud storage, automatically tiering data to the cloud to optimize performance.
  • Azure Cool Blob Storage

    • A storage option that stores data that is infrequently accessed and stored for at least 30 days, providing a lower-cost storage option.
  • Azure Queue Storage

    • A fully managed message queuing service that enables reliable messaging between application components.
  • Azure Data Lake Storage

    • A fully managed, scalable, and secure data lake for storing and analyzing large amounts of data, providing a single repository for big data analytics workloads.

Google Cloud (GCP) Storage Options

  • Google Cloud Storage

    • A fully managed object storage service that allows for the storage and retrieval of large amounts of unstructured data, such as images, videos, and documents.
  • Google Persistent Disk

    • A fully managed block storage service that provides low-latency access to stored data and supports the use of file systems.
  • Google Filestore

    • A fully managed file storage service that allows for the creation of shared file systems, enabling multiple users to access and collaborate on files.
  • Google Coldline Storage

    • A Cloud Storage class that is optimized for infrequently accessed data stored for at least 90 days, providing a cost-effective option for storing large amounts of data.
  • Google Cloud Backup

    • A fully managed backup service that allows for the backup of data across multiple services, including VMs, SQL, and more.
  • Google Cloud Storage Transfer Service

    • A fully managed service for transferring large amounts of data to and from GCP, including data from on-premises systems and other cloud storage providers.
  • Google Cloud Storage Nearline

    • A storage option that stores data that is infrequently accessed and stored for at least 30 days, providing a lower-cost storage option.
  • Google Cloud Spanner

    • A fully managed, horizontally scalable, and strongly consistent relational database service for storing and accessing structured data.
  • Google Cloud Bigtable

    • A fully managed, NoSQL, wide-column database service for storing and accessing large amounts of semi-structured and unstructured data.
  • Google Cloud SQL

    • A fully managed database service for creating, configuring, and managing relational database instances, such as MySQL and PostgreSQL, that store structured data.

Other Non-Managed Storage Devices

  • Network Attached Storage (NAS) devices

    • These devices connect to a network and provide storage that can be accessed by multiple devices. They are typically used for small to medium-sized businesses and home offices.
  • Direct Attached Storage (DAS) devices

    • These devices connect directly to a single server or computer and provide storage that can be accessed by that device only. They are typically used in small business and home offices.
  • Storage Area Network (SAN) devices

    • These are specialized, high-speed networks that provide block-level access to storage. SAN devices are typically used in large enterprise environments.
  • Tape libraries

    • Tape libraries are used for long-term archival storage, and are typically used for backups and disaster recovery.
  • External hard drives

    • External hard drives connect to a computer via USB or FireWire and provide additional storage. They are typically used for backups and file transfers.
  • Flash drives

    • Flash drives are small, portable storage devices that connect to a computer via USB. They are typically used for backups, file transfers, and to store small amounts of data.
  • RAID arrays

    • A RAID array is a set of drives configured to provide data redundancy, improved performance, and fault tolerance.
  • Cloud Gateways

    • Cloud gateways are appliances that provide on-premises storage with direct connectivity to cloud storage services, allowing data to be stored on-premises, in the cloud, or both.
  • Object Storage Devices

    • These devices are optimized for unstructured data, such as images, videos, and documents, and can be used for both primary storage and archival storage.
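
For RAID arrays in particular, the redundancy mentioned above comes at the cost of usable capacity. The following sketch shows the standard capacity arithmetic for common RAID levels, assuming all drives are the same size. The function name and interface are hypothetical.

```python
def raid_usable_capacity(level: int, disks: int, disk_tb: float) -> float:
    """Return usable capacity in TB for an array of identical drives.

    Covers RAID 0, 1, 5, 6, and 10 only:
    - RAID 0:  striping, no redundancy      -> all drives hold data
    - RAID 1:  mirroring                    -> capacity of one drive
    - RAID 5:  single parity                -> one drive's worth lost
    - RAID 6:  double parity                -> two drives' worth lost
    - RAID 10: striped mirrors (pairs)      -> half the drives hold data
    """
    if level == 0:
        min_disks, data_disks = 2, disks
    elif level == 1:
        min_disks, data_disks = 2, 1
    elif level == 5:
        min_disks, data_disks = 3, disks - 1
    elif level == 6:
        min_disks, data_disks = 4, disks - 2
    elif level == 10:
        if disks % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives")
        min_disks, data_disks = 4, disks // 2
    else:
        raise ValueError(f"unsupported RAID level: {level}")

    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} drives")
    return data_disks * disk_tb


if __name__ == "__main__":
    for level in (0, 1, 5, 6, 10):
        print(f"RAID {level}: {raid_usable_capacity(level, 4, 2.0)} TB usable")
```

Real arrays lose a little more to file system and controller overhead, which this sketch ignores.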

Open Source Storage Options like S3

  • MinIO: An object storage server that provides an S3-compatible API and can be used for storing unstructured data such as photos, videos, backups, and container images.

  • OpenStack Swift: An open-source object storage system that provides a highly available, scalable, and durable storage solution for unstructured data.

  • GlusterFS: An open-source, scalable, distributed file system that can be used for storing large amounts of unstructured data across multiple servers.

  • Ceph: An open-source, distributed, scalable storage system that provides object, block, and file storage solutions and can be used for storing large amounts of data.

  • SeaweedFS: An open-source, scalable, and highly available distributed file system that can be used for storing and serving massive amounts of unstructured data.

  • Nginx-Object-Storage: An open-source solution that implements an S3-compatible object storage service using Nginx web server and Lua scripting language.

  • Rook: An open-source, cloud-native storage orchestrator for Kubernetes that automates the deployment, scaling, and management of storage services such as Ceph.

Open Source Storage Options like EFS

  • GlusterFS: An open-source, scalable, distributed file system that can be used for storing large amounts of unstructured data across multiple servers.

  • NFS (Network File System): An open-standard protocol for file-level storage over a network that allows multiple clients to access a single file system and supports file sharing across different operating systems.

  • CephFS: A distributed file system component of the Ceph storage system that provides POSIX-compliant file access to objects stored in a Ceph cluster.

  • OpenEBS: An open-source, cloud-native storage solution that provides persistent and scalable block storage for containers and Kubernetes.

  • Ganesha NFS: An open-source implementation of the Network File System (NFS) protocol that provides file-level storage over a network.

  • DRBD (Distributed Replicated Block Device): An open-source block-level storage replication solution that enables real-time data replication between multiple nodes.