Choices
Database Type Matrix
For a full comparison matrix of database types, go here:
Considerations
Volume: how much data is coming in, which informs storage capacity and tiering requirements; that is, the size of the dataset.
Velocity: the speed at which data is processed, either in batches or as continuous streams; that is, the rate of flow.
Variety: the organization and format of the data, spanning structured, semi-structured, and unstructured formats; that is, data across multiple stores or types.
Veracity: the provenance and curation of the data sets under consideration, for governance and data quality assurance; that is, the accuracy of the data.
Source: Microsoft Azure documentation
Types of Database
Relational Database (RDBMS)
- Easy to maintain and update
- Easy to query and generate reports
- Enforces data integrity
- Flexible for complex data models
- Supports transactions
✅ RDBMS databases are a better choice if you need:
- Strong consistency guarantees
- RDBMS databases provide ACID (Atomicity, Consistency, Isolation, Durability) properties which ensure that the data stored in the system is always in a consistent state, and that transactions are processed in a consistent and reliable manner.
- To support complex queries
- RDBMS databases are optimized for handling complex queries, and support powerful query languages such as SQL which make it easy to retrieve and manipulate data.
- To enforce data integrity constraints
- RDBMS databases provide primary keys, foreign keys, unique indexes, and check constraints, which can be used to enforce data integrity and prevent inconsistencies from arising.
- To support transactions
- RDBMS databases are designed to support transactions, which allows multiple operations to be executed as a single, atomic unit of work, and ensures that the data remains in a consistent state even in the event of errors or failures.
- Multiple concurrent users
- RDBMS databases are designed to support multiple concurrent users, and provide features such as locking and isolation levels that can be used to control concurrent access to the data.
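The transaction and integrity-constraint points above can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical and exist only for this illustration; a server RDBMS would behave similarly but with its own driver and error types.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, then debit, so a failing debit must undo the credit.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # debit violates the CHECK
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the second call, the credit to `alice` is rolled back together with the failed debit, so `balances` reflects only the first transfer.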
❌ RDBMS databases are less likely a choice if you need:
- Very high scalability
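- RDBMS databases have traditionally scaled vertically, and scaling out across many nodes usually requires sharding or a distributed SQL engine, so they are often not as well-suited for extremely high scalability scenarios as NoSQL databases.
- High performance for write-intensive workloads
- Because every write must maintain indexes, constraints, and transactional guarantees, RDBMS databases may not perform as well as NoSQL databases for write-intensive workloads.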
- To store large amounts of unstructured data
- RDBMS databases are designed to store structured data, and may not be the best choice for storing large amounts of unstructured data.
- Lowest latency
- RDBMS databases may not provide the low latency required for real-time or near-real-time applications.
- A flexible schema
- RDBMS databases have a fixed schema, which means that the structure of the data must be defined in advance, and changes to the schema can be difficult and time-consuming.
Types of RDBMS databases
- Amazon: Relational Database Service (RDS), Amazon Aurora
- Google: Cloud SQL, Cloud Spanner
- Azure: Azure SQL Database, Azure Database for MySQL, Azure Database for MariaDB, Azure Database for PostgreSQL
- Oracle
- A popular RDBMS used by large organizations with complex data needs.
- Microsoft SQL Server
- An enterprise-level RDBMS used by large organizations with complex data needs.
- MySQL
- An open-source RDBMS used by small to medium-sized businesses.
- PostgreSQL
- An open-source RDBMS.
- DB2
- Enterprise-level RDBMS from IBM.
- Microsoft Access
- A popular desktop RDBMS.
- Sybase
- Enterprise-level RDBMS.
- Informix
- An enterprise-level RDBMS from IBM.
- Ingres
- An open-source RDBMS.
- SQLite
- An open-source RDBMS designed for embedded applications.
Popular SQL Databases
- MySQL
- MySQL is one of the most widely used open-source relational databases in the world. It is popular due to its ease of use, reliability, and cost-effectiveness (it is free and open-source).
- PostgreSQL
- PostgreSQL is a highly advanced open-source relational database that is known for its powerful features and scalability. It is popular for complex data analysis and for applications that require advanced data processing capabilities.
- Microsoft SQL Server
- Microsoft SQL Server is a popular commercial relational database management system (RDBMS) from Microsoft. It is widely used in enterprise applications and is known for its reliability, security, and performance.
- Oracle Database
- Oracle Database is a commercial relational database management system from Oracle Corporation. It is popular for its advanced features, scalability, and performance, and is widely used in large-scale enterprise applications.
- SQLite
- SQLite is a lightweight, file-based SQL database that is widely used in mobile and desktop applications due to its small footprint and ease of use.
- MariaDB
- MariaDB is a popular open-source relational database that is a fork of MySQL. It offers many of the same features as MySQL, but with a more open development process and a larger community of contributors.
- Amazon Aurora
- Amazon Aurora is a commercial relational database management system (RDBMS) from Amazon Web Services (AWS). It is known for its high performance, reliability, and scalability, and is widely used in cloud-based applications.
- IBM Db2
- IBM Db2 is a popular commercial relational database management system from IBM. It is widely used in large-scale enterprise applications and is known for its advanced features and performance.
- SAP HANA
- SAP HANA is a high-performance in-memory relational database management system (RDBMS) from SAP. It is widely used in business intelligence and data analytics applications due to its advanced data processing capabilities.
- Firebird
- Firebird is a free, open-source relational database management system that is known for its small footprint, fast performance, and support for multiple operating systems. It is often used in small to medium-sized applications, or for testing and development.
- Informix
- Informix is a commercial relational database management system from IBM. It is widely used in large-scale enterprise applications and is known for its advanced features and scalability.
- Sybase
- Sybase is a commercial relational database management system that is widely used in financial, telecommunications, and other industries. It is known for its performance, reliability, and scalability.
- Progress OpenEdge
- Progress OpenEdge is a commercial relational database management system from Progress Software. It is widely used in business applications and is known for its scalability, reliability, and performance.
- Teradata
- Teradata is a commercial relational database management system that is widely used in data warehousing and business intelligence applications. It is known for its scalability, performance, and ability to handle large amounts of data.
- Microsoft Access
- Microsoft Access is a desktop relational database management system from Microsoft that is widely used for small applications. It is known for its ease of use and its integration with other Microsoft Office products, and it runs on Windows only.
- Derby
- Apache Derby is an open-source relational database management system that is widely used in embedded systems and testing and development. It is known for its small footprint, fast performance, and ease of use, and is a popular choice for Java-based applications.
Types of MySQL
If you're just starting out or have a small application, the Community Edition may be sufficient.
If you have a large and complex application with demanding requirements for performance and availability, the Enterprise Edition or MySQL Cluster may be a better choice. If you need high availability and automatic failover, InnoDB Cluster may be a good fit; MySQL Fabric served a similar purpose but is an older, deprecated option.
- MySQL Community Edition
- This is the open-source version of MySQL, and is ideal for small to medium-sized projects or for testing and development. It is free to use and has a large community of users and contributors.
- MariaDB
- MariaDB is a community-driven fork of the MySQL database management system. MariaDB has a more open development process, with a larger community of contributors and a commitment to open-source software.
- MySQL Enterprise Edition
- This is the commercial version of MySQL and is intended for larger, mission-critical applications. It includes features such as enhanced security, scalability, and reliability, as well as technical support from the vendor.
- MySQL Cluster
- This edition is designed for high availability and real-time performance, and is ideal for applications that require fast and reliable access to data. It uses a shared-nothing architecture and allows for horizontal scalability.
- MySQL InnoDB Cluster
- This is a high-availability solution that provides automatic failover. It combines MySQL Group Replication, MySQL Router, and MySQL Shell on top of servers that use the InnoDB storage engine, and is ideal for businesses that require high uptime and data availability.
- MySQL Fabric
- This was a framework for managing a group of MySQL servers as a single unit, aimed at applications that require sharding and high availability. It has since been deprecated, so new deployments generally look at InnoDB Cluster instead.
Document Database
- Flexible data models
- Easy to scale
- Efficient indexing
- Easy to store large files
- Query language support
- Amazon SimpleDB
- Azure Cosmos DB
- Google Cloud Firestore
- MongoDB
✅ Document databases are a better choice if you need:
- Flexible schema
- Document databases have a flexible schema, which means that the structure of the data can be changed easily without affecting the existing data.
- To store unstructured or semi-structured data
- Document databases are designed to store unstructured or semi-structured data, such as JSON or XML documents, and can handle variations in the structure of the data.
- High performance for write-intensive workloads
- Document databases are optimized for write-intensive workloads and can handle a high volume of writes.
- Horizontal scalability
- Document databases can scale horizontally by distributing data across multiple nodes, which allows them to handle a large number of concurrent users and high throughput.
- Low latency
- Document databases are designed to provide low latency, which makes them well-suited for real-time or near-real-time applications.
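The flexible-schema point above can be sketched without any database at all. In the following illustration a "collection" is modelled as a plain Python list of JSON-like documents; the `products` data and the `find` helper are hypothetical and do not correspond to any particular product's API.

```python
import json

# Documents in the same collection do not need to share a structure.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"]},
]

# A new kind of document introduces new fields without migrating the old ones.
products.append(
    {"_id": 3, "name": "E-book", "format": "epub", "tags": ["digital"]}
)

def find(collection, field):
    """Return the documents that contain `field` at the top level."""
    return [doc for doc in collection if field in doc]

with_sizes = [doc["name"] for doc in find(products, "sizes")]

# Documents serialize naturally to JSON, the usual storage and wire format.
round_trip = json.loads(json.dumps(products))
```

The trade-off is that the application, rather than the database, has to cope with fields that may or may not be present.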
❌ Document databases are less likely a choice if you need:
- Strong consistency guarantees
- Document databases may not provide the same level of consistency guarantees as RDBMS databases.
- Complex queries
- Document databases may not support complex queries as well as RDBMS databases.
- To enforce data integrity constraints
- Document databases may not have the same level of data integrity constraints as RDBMS databases.
- Transactions support
- Document databases have historically offered weaker transaction support than RDBMS databases, although some, such as MongoDB, now support multi-document ACID transactions.
- Multiple concurrent users
- Document databases may not have the same level of support for multiple concurrent users as RDBMS databases.
Types of Document Database
- Amazon SimpleDB
- Google Cloud Firestore
- Azure Cosmos DB
- A cloud-based document database from Microsoft.
- MongoDB
- An open-source document database with scalability and flexibility.
- CouchDB
- An open-source document database designed for distributed systems.
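- Apache Cassandra
- An open-source wide-column store designed for distributed systems; it is listed here for comparison but is not a document database.
- Redis
- An in-memory key-value store rather than a document database in the strict sense.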
- RavenDB
- An open-source document database.
- Amazon DynamoDB
- A cloud-based key-value and document database from Amazon.
- IBM Cloudant
- A cloud-based document database from IBM.
- Couchbase
- An open-source document database.
Graph Database
- Easy to query relationships
- Easy to traverse
- Easy to visualize data
- Highly scalable
- Easier to perform complex analytics
✅ Graph databases are a better choice if you need:
- To model and query relationships
- Graph databases are designed to efficiently store and query relationships between data. They are well-suited for use cases such as social networks, recommendation systems, and fraud detection.
- High performance for read-intensive workloads
- Graph databases are optimized for read-intensive workloads and can handle a high volume of complex queries.
- Flexible schema
- Graph databases have a flexible schema, which means that the structure of the data can be changed easily without affecting the existing data.
- To handle large amounts of data
- Graph databases are designed to handle large amounts of data, and can scale horizontally by distributing data across multiple nodes.
- Low latency
- Graph databases are designed to provide low latency, which makes them well-suited for real-time or near-real-time applications.
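The relationship queries described above boil down to graph traversal. The following is a minimal sketch of a "friends of friends" style lookup using a breadth-first search over an adjacency list; the `follows` data and the `within_hops` helper are hypothetical, and a real graph database would express this in its own query language instead.

```python
from collections import deque

# Hypothetical social graph: each key follows the people in its list.
follows = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "dev": [],
    "eli": ["ana"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop count} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not part of the result
    return seen

reachable = within_hops(follows, "ana", 2)
# People exactly two hops away are candidates for a recommendation.
suggestions = sorted(name for name, hops in reachable.items() if hops == 2)
```

In a relational schema the same question would typically need one self-join per hop, which is why relationship-heavy workloads are a common reason to choose a graph database.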
❌ Graph databases are less likely a choice if you need:
- Strong consistency guarantees
- Graph databases may not provide the same level of consistency guarantees as RDBMS databases.
- High performance for write-intensive workloads
- Graph databases may not perform as well as other databases for write-intensive workloads.
- To store unstructured data
- Graph databases are designed to store structured data, and may not be the best choice for storing large amounts of unstructured data.
- To support transactions
- Graph databases may not support transactions as well as RDBMS databases.
- To support multiple concurrent users
- Graph databases may not have the same level of support for multiple concurrent users as RDBMS databases.
Types of Graph Databases
- Neo4j
- An open-source graph database.
- Amazon Neptune
- A cloud-based graph database from Amazon.
- Oracle Spatial and Graph
- An enterprise-level graph database from Oracle.
- IBM Graph
- A cloud-based graph database from IBM.
- GraphDB
- An RDF graph database from Ontotext, available in free and commercial editions.
- ArangoDB
- An open-source graph database with scalability and flexibility.
- AllegroGraph
- An enterprise-level graph database.
- OrientDB
- An open-source graph database.
- Blazegraph
- An open-source graph database with scalability and flexibility.
- TigerGraph
- An enterprise-level graph database.
Object Database
- Can store objects directly
- Easy to query data
- Supports inheritance
- Supports polymorphism
- Supports object-oriented programming
✅ Object databases are a better choice if you need:
- To store complex, object-oriented data
- Object databases are designed to store complex, object-oriented data, and can handle the relationships and inheritance that are present in object-oriented programming languages.
- To support object-oriented programming languages
- Object databases are designed to support object-oriented programming languages and can be used seamlessly with languages such as Java, C++, and C#.
- High performance for read-intensive workloads
- Object databases are optimized for read-intensive workloads and can handle a high volume of complex queries.
- Flexible schema
- Object databases have a flexible schema, which means that the structure of the data can be changed easily without affecting the existing data.
- To handle large amounts of data
- Object databases are designed to handle large amounts of data, and can scale horizontally by distributing data across multiple nodes.
- To support transactions
- Many object databases support ACID transactions, on a par with RDBMS databases.
❌ Object databases are less likely a choice if you need:
- Strong consistency guarantees
- Object databases may not provide the same level of consistency guarantees as RDBMS databases.
- High performance for write-intensive workloads
- Object databases may not perform as well as other databases for write-intensive workloads.
- To store unstructured data
- Object databases are designed to store structured data, and may not be the best choice for storing large amounts of unstructured data.
- To support multiple concurrent users
- Object databases may not have the same level of support for multiple concurrent users as RDBMS databases.
- To support non-object-oriented languages
- Object databases may not be as easily integrated with non-object-oriented languages.
- Limited support in industry
- Object databases are not as widely used as other database types such as RDBMS, NoSQL, and graph databases, and may have limited support and resources in the industry.
Types of Object databases
- GemStone/S
- A commercial object database built around Smalltalk.
- Oracle Berkeley DB
- An open-source embedded key-value storage library from Oracle; its Java Edition includes a Direct Persistence Layer for storing objects.
- Perst
- An open-source object-oriented database.
- ZODB
- An open-source object-oriented database with scalability and flexibility.
- Objectivity/DB
- An enterprise-level object-oriented database.
- Versant Object Database
- An enterprise-level object-oriented database.
- POET
- A commercial object-oriented database from POET Software, which later merged with Versant.
- ObjectStore
- An enterprise-level object-oriented database from Progress Software.
- db4o
- An open-source object-oriented database with scalability and flexibility.
- T-Kernel Object Database
- An open-source object-oriented database from T-Engine Forum.
Column-Oriented Database
- Faster data retrieval
- Improved query performance
- Compression of data
- Improved storage utilization
- Easier to analyze large amounts of data
✅ Column databases are a better choice if you need:
- High performance for read-intensive workloads
- Column databases are optimized for read-intensive workloads and can handle a high volume of complex queries, particularly when the queries are focused on specific columns of data.
- To store and analyze large amounts of structured data
- Column databases are designed to store and analyze large amounts of structured data, such as time-series data, and can handle a high volume of concurrent users and high throughput.
- To handle high write and update loads
- Wide-column stores such as Cassandra and HBase can sustain high write and update loads; analytical column stores generally prefer bulk or batched loads over frequent single-row updates.
- To support analytics and BI
- Column databases are designed to support analytical and Business Intelligence (BI) workloads, and can be used to perform complex aggregation and data mining operations.
- To support real-time analytics
- Column databases are designed to provide low latency and high throughput, which makes them well-suited for real-time analytics.
- To support horizontal scalability
- Column databases can scale horizontally by distributing data across multiple nodes, which allows them to handle a large number of concurrent users and high throughput.
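The reason column-focused queries and compression work well in this model can be shown with a small sketch. It pivots a hypothetical `rows` table into one list per column, aggregates a single column, and applies run-length encoding, which is one of several compression schemes a column store might use. The data and the `run_length_encode` helper are illustrative only.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-oriented layout: all values of one column are stored together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one column only needs to read that column.
total_amount = sum(columns["amount"])

def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

# Sorted, low-cardinality columns compress well because equal values sit
# next to each other.
encoded_regions = run_length_encode(sorted(columns["region"]))
```

The same layout is what makes single-row updates comparatively expensive, since one logical row is spread across every column.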
❌ Column databases are less likely a choice if you need:
- Flexible schema
- Column databases have a fixed schema, which means that the structure of the data must be defined in advance, and changes to the schema can be difficult and time-consuming.
- To store unstructured data
- Column databases are designed to store structured data, and may not be the best choice for storing large amounts of unstructured data.
- To model and query relationships
- Column databases are optimized for storing and querying large amounts of structured data, but may not be well-suited for modeling and querying relationships between data.
- To support object-oriented programming
- Column databases may not be as easily integrated with object-oriented languages.
- Limited support in industry
- Column databases are not as widely used as other database types such as RDBMS, NoSQL, and graph databases, and may have limited support and resources in the industry.
- To support transactions
- Column databases may not support transactions as well as RDBMS databases.
Types of Column-Oriented databases
- Google Cloud Bigtable
- Amazon Redshift
- Apache Cassandra
- An open-source column-oriented database designed for distributed systems.
- HBase
- An open-source column-oriented database with scalability and flexibility.
- Microsoft SQL Server Columnstore
- An enterprise-level column-oriented database from Microsoft.
- Apache Accumulo
- An open-source column-oriented database.
- SAP HANA
- An enterprise-level column-oriented database.
- Druid
- An open-source column-oriented database.
- VoltDB
- An in-memory, row-oriented SQL database; it is not a column store and fits better under in-memory databases.
- Vertica
- An enterprise-level column-oriented database, formerly from HP and now part of OpenText.
- Infobright
- An open-source column-oriented database with scalability and flexibility.
- IBM Cloudant
- A cloud-based document database from IBM, built on Apache CouchDB; it is not column-oriented.
In-Memory Database
- Faster response time
- Improved scalability
- Easier to update data
- Improved data manipulation
- Improved data processing and analytics
- Amazon ElastiCache
Types of In-Memory Databases
- Redis
- An open-source in-memory database.
- Microsoft SQL Server In-Memory OLTP
- An enterprise-level in-memory database from Microsoft.
- Oracle TimesTen In-Memory Database
- An enterprise-level in-memory database from Oracle.
- SAP HANA
- An enterprise-level in-memory database.
- VoltDB
- An open-source in-memory database.
- Hazelcast IMDG
- An open-source in-memory database with scalability and flexibility.
- IMQavl
- An open-source in-memory database.
- MemSQL (now SingleStore)
- An enterprise-level in-memory database.
- IBM Informix
- An enterprise-level in-memory database from IBM.
- Aerospike
- An open-source in-memory database.
✅ In-Memory databases are a better choice if you need:
- High performance
- In-memory databases store data in RAM, which is much faster than storing data on disk. This can significantly improve the performance of read-heavy and write-heavy workloads.
- Low Latency
- In-memory databases are designed for low latency, which makes them well-suited for real-time or near-real-time applications.
- High Throughput
- In-memory databases can handle a high volume of concurrent users and high throughput, which makes them well-suited for high-traffic applications.
- To support large data sets
- In-memory databases can handle large data sets and can scale horizontally by distributing data across multiple nodes.
- To support complex queries
- In-memory databases are optimized for handling complex queries, and support powerful query languages such as SQL which make it easy to retrieve and manipulate data.
- To support high-performance analytics
- In-memory databases are well-suited for high-performance analytics, as they can perform complex calculations and aggregations quickly.
❌ In-Memory databases are less likely a choice if you need:
- To store large amounts of data
- In-memory databases store data in RAM, which is limited compared to disk storage. Therefore, they may not be able to store as much data as disk-based databases.
- Durability
- In-memory databases may not provide the same level of durability as disk-based databases, as the data is stored in RAM and can be lost in case of power failure.
- To support long-term data retention
- In-memory databases are not designed for long-term data retention, and data may need to be periodically offloaded to disk-based storage.
- To support disaster recovery
- In-memory databases may not provide the same level of disaster recovery as disk-based databases, as data is stored in RAM and can be lost in case of power failure.
- To support non-volatile memory
- In-memory databases may not support non-volatile memory, which means that the data may not persist after a power failure.
- Limited support in industry
- In-memory databases are not as widely used as other database types such as RDBMS, NoSQL, and graph databases, and may have limited support and resources in the industry.
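The durability caveat above can be demonstrated with SQLite's `:memory:` mode, used here only as a stand-in for an in-memory database. The `sessions` table is hypothetical, and closing the connection plays the role of a process restart or power loss. Production in-memory systems usually mitigate this with snapshots, append-only logs, or replication.

```python
import sqlite3

# First "process": create and fill a purely in-memory database.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, user TEXT)")
first.execute("INSERT INTO sessions VALUES ('s1', 'alice')")
first.commit()
rows_before = first.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
first.close()  # stands in for a restart: the RAM-only data is gone

# Second "process": a new in-memory database starts out empty.
second = sqlite3.connect(":memory:")
tables_after = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
second.close()
```

Even a committed write does not survive here, because commit only guarantees durability relative to the storage the database actually uses.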
Key-Value Database
- Highly scalable
- High performance
- Simple data structure
- Easier to distribute data
- Easier to manage large datasets
- Amazon DynamoDB
- Google Cloud Datastore
Types of Key-Value Databases
- Redis
- An open-source key-value database.
- Oracle NoSQL Database
- An enterprise-level key-value database from Oracle.
- Memcached
- An open-source key-value database with scalability and flexibility.
- Couchbase
- An open-source key-value database.
- Aerospike
- An open-source key-value database.
- Riak
- An open-source key-value database.
- Amazon DynamoDB
- A cloud-based key-value database from Amazon.
- Microsoft Azure Table Storage
- A cloud-based key-value database from Microsoft.
- IBM Cloudant
- A cloud-based JSON document database from IBM rather than a pure key-value store.
- Berkeley DB
- An open-source key-value database.
✅ Key-Value databases are a better choice if you need:
- High performance
- Key-value databases are optimized for high performance, and can handle a high volume of read and write operations.
- High scalability
- Key-value databases can scale horizontally by distributing data across multiple nodes, which allows them to handle a large number of concurrent users and high throughput.
- Flexible schema
- Key-value databases have a flexible schema, which means that the structure of the data can be changed easily without affecting the existing data.
- To store unstructured or semi-structured data
- Key-value databases are designed to store unstructured or semi-structured data, and can handle variations in the structure of the data.
- To support caching
- Key-value databases are commonly used as a caching layer, to speed up the retrieval of frequently-accessed data.
- To support distributed systems
- Key-value databases are well-suited for distributed systems, as they can easily distribute data across multiple nodes.
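The caching use case above usually follows the cache-aside pattern: look in the key-value store first and fall back to the slower source on a miss. The sketch below uses an in-process dictionary with a time-to-live in place of a real key-value server. The names `TTLCache`, `get_profile`, and `load_profile_from_database` are hypothetical, and the injectable clock exists only to make expiry easy to demonstrate.

```python
import time

class TTLCache:
    """A tiny key-value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)

backend_calls = []

def load_profile_from_database(user_id):
    """Stand-in for a slow query against the system of record."""
    backend_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

def get_profile(cache, user_id):
    """Cache-aside read: try the cache, then the database, then fill."""
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = load_profile_from_database(user_id)
        cache.set(key, profile)
    return profile

now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
get_profile(cache, 7)
get_profile(cache, 7)
calls_while_fresh = len(backend_calls)
now[0] = 61.0  # move past the TTL
get_profile(cache, 7)
calls_after_expiry = len(backend_calls)
```

Only the first read and the read after expiry reach the backing store; everything in between is served by key lookup alone.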
❌ Key-Value databases are less likely a choice if you need:
- Strong consistency guarantees
- Key-value databases may not provide the same level of consistency guarantees as RDBMS databases.
- Complex queries
- Key-value databases may not support complex queries as well as RDBMS databases.
- To enforce data integrity constraints
- Key-value databases may not have the same level of data integrity constraints as RDBMS databases.
- Transaction support
- Key-value databases may not support transactions as well as RDBMS databases.
- To support joins
- Key-value databases are not designed to support joins, which may make it difficult to retrieve related data.
- To support advanced analytics
- Key-value databases may not have the same level of support for advanced analytics as other databases, such as column-based databases.
Time Series Database
- Easier to store and query time-series data
- Improved query performance
- Improved storage utilization
- Supports temporal analytics
- Easier to perform complex analytics
Types of Time Series Databases
- InfluxDB
- An open-source time series database.
- Prometheus
- An open-source time series database with scalability and flexibility.
- TimescaleDB
- An open-source time series database.
- Graphite
- An open-source time series database with scalability and flexibility.
- KairosDB
- An open-source time series database.
- OpenTSDB
- An open-source time series database.
- Microsoft Azure Time Series Insights
- A cloud-based time series database from Microsoft.
- Amazon Timestream
- A cloud-based time series database from Amazon.
- MetricDB
- An open-source time series database.
- IBM Cloud Data Lake
- A cloud-based time series database from IBM.
✅ Time Series databases are a better choice if you need:
- To store and analyze time-series data
- Time series databases are optimized for storing and analyzing time-series data, such as sensor data, financial data, and system metrics.
- To handle high write loads
- Time series databases are optimized for handling high write loads, and can efficiently insert, update and query large amounts of time-series data.
- To support advanced analytics
- Time series databases are designed to support advanced analytics, such as anomaly detection, forecasting and aggregation operations on time-series data.
- To handle large data sets
- Time series databases can handle large data sets, and can scale horizontally by distributing data across multiple nodes.
- To support real-time analytics
- Time series databases are designed to provide low latency and high throughput, which makes them well-suited for real-time analytics.
- To support horizontal scalability
- Time series databases can scale horizontally by distributing data across multiple nodes, which allows them to handle a large number of concurrent users and high throughput.
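A typical aggregation on time-series data is downsampling: grouping raw points into fixed time buckets and summarizing each bucket. The following is a minimal sketch of that idea in plain Python. The `readings` values and the `downsample` helper are hypothetical, and real time series databases expose this through their own query functions.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sensor readings: (ISO 8601 timestamp, temperature).
readings = [
    ("2024-01-01T00:00:05+00:00", 20.0),
    ("2024-01-01T00:00:35+00:00", 22.0),
    ("2024-01-01T00:01:10+00:00", 21.0),
    ("2024-01-01T00:02:50+00:00", 25.0),
    ("2024-01-01T00:02:55+00:00", 27.0),
]

def downsample(points, bucket_seconds):
    """Average the values that fall into each fixed-width time bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        epoch = int(datetime.fromisoformat(timestamp).timestamp())
        bucket_start = epoch - epoch % bucket_seconds
        buckets[bucket_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc).strftime("%H:%M"):
            sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }

per_minute = downsample(readings, 60)
```

Because points arrive roughly in time order and are queried by time range, storage engines built for this workload can keep each bucket's data close together on disk.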
❌ Time Series databases are less likely a choice if you need:
- To store unstructured or semi-structured data
- Time series databases are designed to store structured data, and may not be the best choice for storing large amounts of unstructured data.
- To support complex relationships
- Time series databases may not support complex relationships as well as other databases, such as graph databases.
- To support transactional workloads
- Time series databases may not support transactional workloads as well as RDBMS databases.
- To support ad-hoc queries
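- Time series databases are optimized for storing and querying time-ordered data, rather than ad-hoc queries.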
Event Store Database
- Tracks and stores events
- Allows for event replay
- Easy to query event data
- Easy to analyze event data
- Supports event-driven architectures
✅ Event Store databases are a better choice if you need:
- To store and process events
- Event store databases are optimized for storing and processing events, such as log data, sensor data, and financial transactions.
- To support event sourcing
- Event store databases are well-suited for event sourcing, a pattern that uses an event log to store the state of an application and to allow the reconstruction of the application state at any point in time.
- To support complex event processing
- Event store databases can support complex event processing, which allows you to analyze and respond to patterns in large data sets in real-time.
- To handle large data sets
- Event store databases can handle large data sets, and can scale horizontally by distributing data across multiple nodes.
- To support real-time analytics
- Event store databases are designed to provide low latency and high throughput, which makes them well-suited for real-time analytics.
- To support horizontal scalability
- Event store databases can scale horizontally by distributing data across multiple nodes, which allows them to handle a large number of concurrent users and high throughput.
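The event sourcing pattern mentioned above can be sketched in a few lines: state is never stored directly, only derived by replaying an append-only log. The bank-account events, the `append_event` helper, and the `replay` function below are hypothetical and stand in for what an event store and its consumers would do.

```python
# Append-only log; existing events are never updated or deleted.
event_log = []

def append_event(kind, amount):
    """Record a new event with a monotonically increasing sequence number."""
    event = {"sequence": len(event_log) + 1, "kind": kind, "amount": amount}
    event_log.append(event)
    return event

def replay(events, up_to=None):
    """Rebuild the account balance by folding over events in order.

    Passing `up_to` reconstructs the state as of that sequence number.
    """
    balance = 0
    for event in events:
        if up_to is not None and event["sequence"] > up_to:
            break
        if event["kind"] == "deposited":
            balance += event["amount"]
        elif event["kind"] == "withdrawn":
            balance -= event["amount"]
    return balance

append_event("deposited", 100)
append_event("withdrawn", 30)
append_event("deposited", 5)

current_balance = replay(event_log)
balance_after_two = replay(event_log, up_to=2)
```

Replaying up to an earlier sequence number is what allows the application state to be reconstructed at any point in time, at the cost of reading the log (or a snapshot plus the tail of the log) instead of a single row.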
❌ Event Store databases are less likely a choice if you need:
- Strong consistency guarantees
- Event store databases may not provide the same level of consistency guarantees as RDBMS databases.
- Complex queries
- Event store databases may not support complex queries as well as RDBMS databases.
- To support transactional workloads
- Event store databases may not support transactional workloads as well as RDBMS databases.
- To support ad-hoc queries
- Event store databases are optimized for storing and processing events, rather than ad-hoc queries.
- To support complex relationships
- Event store databases may not support complex relationships as well as other databases, such as graph databases.
- Limited support in industry
- Event store databases are not as widely used as other database types such as RDBMS, NoSQL, and graph databases, and may have limited support and resources in the industry.
Types of Event Store Databases
- Apache Kafka
- An open-source distributed event streaming platform built around a durable, replayable log.
- Apache Pulsar
- An open-source distributed messaging and streaming platform with scalability and flexibility.
- Amazon Kinesis Data Streams
- A cloud-based event streaming service from Amazon.
- Apache Storm
- An open-source stream processing framework; it processes events rather than storing them.
- Apache Flink
- An open-source stream processing framework; it processes events rather than storing them.
- Apache Samza
- An open-source stream processing framework; it processes events rather than storing them.
- Azure Event Hubs
- A cloud-based event ingestion and streaming service from Microsoft.
- Google Cloud Pub/Sub
- A cloud-based messaging service from Google.
- IBM Event Streams
- A cloud-based event streaming service from IBM, built on Apache Kafka.
- Apache Kafka Streams
- An open-source stream processing library that ships with Apache Kafka.
Cloud Managed Database Services
AWS Managed Databases
- Amazon Aurora
- A fully managed relational database service that is compatible with MySQL and PostgreSQL.
- Amazon DocumentDB
- A fully managed document database service that is compatible with MongoDB.
- Amazon DynamoDB
- A fully managed NoSQL database service that supports key-value and document data structures.
- Amazon ElastiCache
- A fully managed in-memory data store and cache service.
- Amazon Keyspaces (for Apache Cassandra)
- A fully managed Apache Cassandra compatible database service.
- Amazon MemoryDB for Redis
- A fully managed in-memory database service that is compatible with Redis.
- Amazon Neptune
- A fully managed graph database service.
- Amazon QLDB
- A fully managed ledger database service.
- Amazon RDS
- A fully managed relational database service that supports multiple database engines, including MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
- Amazon Redshift
- A fully managed data warehouse service.
- Amazon Timestream
- A fully managed time series database service.
GCP Managed Databases
- Cloud SQL
- A fully-managed relational database service that supports MySQL, PostgreSQL, and SQL Server.
- Cloud Spanner
- A fully-managed, globally-distributed, and strongly consistent relational database service.
- Cloud Firestore
- A fully-managed, NoSQL document database service.
- Cloud Bigtable
- A fully-managed, NoSQL wide-column store database service.
- Cloud Memorystore
- A fully-managed, in-memory data store service that is compatible with Redis.
- Cloud Datastore
- A fully-managed NoSQL document database service that scales automatically with your application. It is now offered as Firestore in Datastore mode.
- Cloud Data Loss Prevention (DLP)
- A data discovery, classification, and redaction service that helps you find, classify, and protect sensitive data across your organization. It is not a database itself, but is often used alongside one.
Azure Managed Databases
- Azure SQL Database
- A fully-managed relational database service built on the Microsoft SQL Server engine. It can be administered with tools such as Azure Data Studio.
- Azure Cosmos DB
- A fully-managed, globally-distributed, and multi-model NoSQL database service.
- Azure Database for MySQL
- A fully-managed, scalable MySQL database service.
- Azure Database for PostgreSQL
- A fully-managed, scalable PostgreSQL database service.
- Azure Database for MariaDB
- A fully-managed, scalable MariaDB database service.
- Azure Cache for Redis
- A fully-managed in-memory data store service that is compatible with Redis.
- Azure Databricks
- A fully-managed, cloud-native big data and machine learning platform that allows you to easily process large amounts of data using Apache Spark.
- Azure Stream Analytics
- A fully-managed, real-time data stream processing service that allows you to process high volumes of streaming data from various sources.
Apache Cassandra
Features
Cassandra is different from other similar databases in several ways, including:
Distributed architecture
- Cassandra is designed to scale horizontally across multiple machines, providing high availability and fault-tolerance. This makes it well-suited for large-scale, distributed applications.
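Horizontal scaling of this kind relies on placing each key on a ring of tokens, so that every node can work out which nodes own a given key. The sketch below illustrates the general consistent-hashing technique under simplifying assumptions: it hashes with MD5 from the standard library purely for convenience (Cassandra's default partitioner is Murmur3-based), and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Map a string to a position on the ring (MD5 chosen only for convenience).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=8):
        # Each node owns several positions ("virtual nodes") to even out load.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int = 3):
        # Walk clockwise from the key's token, collecting distinct nodes.
        start = bisect.bisect_right(self._tokens, token(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replication_factor:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42")
print(owners)
```

Because placement depends only on the key and the ring, adding a node moves only the keys adjacent to its tokens rather than reshuffling the whole data set.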
Data model
- Cassandra uses a wide-column (column family) data model, which differs from the relational model used by traditional databases. Rows are grouped into partitions by a partition key and ordered within each partition by clustering columns, which lets it handle large amounts of structured and semi-structured data.
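One way to picture this data model is as a map from a partition key to rows that are kept sorted by a clustering key, where each row holds only the columns that were actually written. The following is a toy in-memory sketch of that picture, not Cassandra's storage engine. The `WideColumnTable` class and the sensor data are hypothetical.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy table: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, columns):
        rows = self._partitions[partition_key]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            # Same primary key: merge the new columns into the row (an upsert).
            rows[i][1].update(columns)
        else:
            rows.insert(i, (clustering_key, dict(columns)))

    def slice(self, partition_key, start, end):
        # Range scan within one partition, ordered by clustering key.
        rows = self._partitions.get(partition_key, [])
        keys = [k for k, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return rows[lo:hi]


readings = WideColumnTable()
day = ("sensor-7", "2024-01-01")  # partition key: one sensor, one day
readings.insert(day, "09:00", {"temp_c": 20.5})
readings.insert(day, "08:00", {"temp_c": 19.0, "humidity": 40})
readings.insert(day, "09:00", {"humidity": 42})
print(readings.slice(day, "08:00", "08:59"))
```

Reads that name a partition and a clustering range are cheap in this layout, which is why tables are usually designed around the queries they must answer.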
Tunable consistency
- Cassandra provides tunable consistency, meaning that you can choose the level of consistency that is appropriate for your application. This allows you to trade off consistency for availability and performance.
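The trade-off can be made concrete with a little arithmetic. With a replication factor RF, a write acknowledged by W replicas and a read that consults R replicas are guaranteed to overlap on at least one replica when W + R > RF. The sketch below encodes that rule. The function names are hypothetical, and the model ignores failure handling and repair mechanisms.

```python
def quorum(replication_factor: int) -> int:
    # A majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    # The write set and read set share at least one replica exactly when
    # their sizes add up to more than the replication factor.
    return write_acks + read_acks > replication_factor


rf = 3
# QUORUM writes + QUORUM reads: overlapping, so reads see the latest write.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
# ONE + ONE: fastest and most available, but a read may hit a stale replica.
print(reads_see_latest_write(rf, 1, 1))
```

Lower levels such as ONE favour latency and availability, while QUORUM or ALL favour consistency at the cost of tolerating fewer unavailable replicas.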
Query language
- Cassandra uses its own query language, CQL (Cassandra Query Language), whose syntax resembles SQL. Unlike SQL, CQL has no joins, and efficient queries must be designed around the partition key, so tables are usually modeled per query rather than normalized.
Multi-data center support
- Cassandra is built for multi-data center deployment, and it allows for data replication across multiple geographical locations, providing low-latency data access for geographically dispersed users.
Write-heavy workloads
- Cassandra is designed to handle write-heavy workloads, which makes it well-suited for use cases such as real-time analytics, time-series data, and online gaming.
Use-cases
Cassandra is well-suited for a wide range of enterprise applications, websites, and mobile apps that have high scalability, availability, and performance requirements. Some examples of use cases that can benefit from using Cassandra include:
Real-time analytics
- Cassandra's ability to handle large volumes of writes, combined with its distributed architecture and tunable consistency, makes it well-suited for real-time analytics applications that need to ingest data quickly and make it available to users in near real-time.
Time-series data
- Cassandra's column family-based data model makes it well-suited for storing and querying time-series data, such as sensor data, financial data, or telemetry data.
Online gaming
- Cassandra's ability to handle write-heavy workloads and provide low-latency data access makes it well-suited for online gaming applications that need to handle large numbers of concurrent players and perform real-time updates to the game state.
Social media
- Cassandra's ability to handle large amounts of semi-structured data, combined with its distributed architecture and tunable consistency, makes it well-suited for social media applications that need to store and retrieve large amounts of user-generated content.
E-commerce
- Cassandra's ability to handle write-heavy workloads, combined with its distributed architecture, tunable consistency, and multi-data center support, makes it well-suited for e-commerce applications that need to handle large numbers of concurrent users and perform real-time updates to inventory and order status.
Internet of Things (IoT)
- Cassandra's ability to handle large amounts of time-series data and support real-time analytics makes it well-suited for IoT applications that need to store and process sensor data from a large number of devices.
References
- Database rankings
- DB-Engines Ranking of Relational DBMS
- Key-value stores
- Document stores
- Graph DBMS
- Time Series DBMS
- Search engines
- Object oriented DBMS
- RDF stores
- Wide column stores
- Multivalue DBMS
- Native XML DBMS
- Spatial DBMS
- Event Stores
- Content stores