
New Trending Platforms and Tools

March 2023

DORA

The DevOps Research and Assessment (DORA) team has identified four key metrics that indicate the performance of a software development team; a minimal sketch of computing them follows the list:

  • Deployment Frequency—How often an organization successfully releases to production
  • Lead Time for Changes—The amount of time it takes a commit to get into production
  • Change Failure Rate—The percentage of deployments causing a failure in production
  • Time to Restore Service—How long it takes an organization to recover from a failure in production
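
As a concrete illustration, here is a minimal sketch of computing the four metrics from deployment and incident records. The record shapes are hypothetical, standing in for data a real pipeline would pull from CI/CD and incident-management systems.

```python
# Minimal sketch of the four DORA metrics over simple records.
# The record shapes are hypothetical.
from datetime import datetime, timedelta

# Each deployment: when it shipped, when its earliest commit landed,
# and whether it caused a production failure.
deployments = [
    {"deployed": datetime(2023, 3, 1), "committed": datetime(2023, 2, 27), "failed": False},
    {"deployed": datetime(2023, 3, 3), "committed": datetime(2023, 3, 2), "failed": True},
    {"deployed": datetime(2023, 3, 6), "committed": datetime(2023, 3, 5), "failed": False},
]
# Each incident: when service broke and when it was restored.
incidents = [
    {"start": datetime(2023, 3, 3, 10), "restored": datetime(2023, 3, 3, 14)},
]

days_in_window = 30
deployment_frequency = len(deployments) / days_in_window  # deploys per day
lead_time = sum(
    (d["deployed"] - d["committed"] for d in deployments), timedelta()
) / len(deployments)
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
time_to_restore = sum(
    (i["restored"] - i["start"] for i in incidents), timedelta()
) / len(incidents)

print(deployment_frequency, lead_time, change_failure_rate, time_to_restore)
```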

Retool

Retool is a low-code framework that allows users to build internal tools and applications quickly, without having to write code from scratch. It provides a drag-and-drop interface for building user interfaces and connecting to data sources, as well as a range of pre-built components and integrations that can be customized to meet specific requirements.

Pros:

  • Rapid prototyping: Retool lets teams prototype tools and applications much faster than traditional coding approaches, which can save time and money and enable teams to iterate and respond to changing requirements more quickly.
  • Ease of use: Retool's drag-and-drop interface makes it easy for users to create user interfaces and connect to data sources, even if they don't have a background in programming or software development.
  • Customization: Retool provides a range of pre-built components and integrations that can be customized to meet specific requirements, making it a flexible platform for different use cases.
  • Integration: Retool can integrate with a range of data sources and systems, including databases, APIs, and web services, making it a powerful tool for building internal tools that access and analyze data from different sources (a minimal example of an API it could connect to follows this list).

Cons:

  • Maturity: Retool is fairly new, so it may not have as many integrations as more established tools.
  • Scalability: Retool is not built for the highest-scale workloads.
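
Retool itself is configured through its GUI rather than code, but the kind of REST endpoint it connects to is easy to sketch. Below is a minimal Flask service (the route and fields are hypothetical) whose JSON output a Retool table component could bind to as a REST API data source.

```python
# A minimal, hypothetical REST endpoint that a Retool "REST API"
# data source could point at; the route and fields are illustrative.
from flask import Flask, jsonify

app = Flask(__name__)

# In-memory stand-in for a real database the internal tool would read.
ORDERS = [
    {"id": 1, "customer": "Acme", "status": "shipped"},
    {"id": 2, "customer": "Globex", "status": "pending"},
]

@app.get("/api/orders")
def list_orders():
    # Retool can bind this JSON array directly to a table component.
    return jsonify(ORDERS)

if __name__ == "__main__":
    app.run(port=5000)
```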

Backstage

Backstage: Restoring Order To Your Chaos - Dave Zolotusky, Spotify https://www.youtube.com/watch?v=AlQYP88N3Og

"Backstage is an open source framework for building developer portals, created at Spotify, donated to the CNCF, and adopted by hundreds of companies."

https://backstage.spotify.com/

  • Centralized hub: Backstage provides a centralized catalog where developers can easily discover and manage software assets across the organization (a minimal catalog query appears after this list).
  • TechDocs: Backstage's TechDocs feature lets developers write documentation in Markdown alongside their code, making it easier to keep documentation up-to-date and consistent.
  • Open-source: Being open-source, Backstage provides a flexible and customizable solution for organizations to manage their software assets.
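
As a sketch of the centralized-hub idea, the snippet below lists components from a Backstage catalog through its REST API. The base URL, port, and lack of authentication are assumptions for a local demo instance; real deployments typically require a token.

```python
# Minimal sketch: querying a Backstage catalog over its REST API.
# The base URL is a hypothetical local instance.
import requests

BACKSTAGE_URL = "http://localhost:7007"

resp = requests.get(
    f"{BACKSTAGE_URL}/api/catalog/entities",
    params={"filter": "kind=component"},
    timeout=10,
)
resp.raise_for_status()

# Print each component's name and description from the catalog.
for entity in resp.json():
    meta = entity["metadata"]
    print(meta["name"], "-", meta.get("description", "no description"))
```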

Delta Lake

https://github.com/delta-io/delta

Delta Lake 2.0 Overview https://www.youtube.com/watch?v=VWJT3JyPKvk

Making Apache Spark™ Better with Delta Lake https://www.youtube.com/watch?v=LJtShrQqYZY

Delta Lake is an open-source data storage layer that provides reliable, scalable, and performant data management for big data workloads. It is built on top of Apache Spark and provides support for both batch and streaming data processing. Delta Lake offers several features, such as ACID transactions, schema enforcement, and versioning, that make it easier to manage and maintain big data workloads.
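
To make those features concrete, here is a minimal PySpark sketch of an ACID append followed by time travel back to the first version. It assumes the delta-spark package and its Spark configuration are in place; the path and data are illustrative.

```python
# Minimal Delta Lake sketch: schema enforcement, ACID writes, time travel.
# Assumes the delta-spark package is installed; path and data are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

path = "/tmp/events_delta"

# Version 0: the initial write records the schema, which Delta then
# enforces; appends with mismatched columns raise an AnalysisException.
v0_df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])
v0_df.write.format("delta").mode("overwrite").save(path)

# Version 1: an ACID append; readers never see a half-written table.
v1_df = spark.createDataFrame([(3, "click")], ["id", "event"])
v1_df.write.format("delta").mode("append").save(path)

# Time travel: read the table as it was at version 0.
past = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(past.count())  # 2 rows, the pre-append state
```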

Pros:

  • ACID transactions: Delta Lake supports ACID transactions, which ensure that data is consistent and reliable, even in the face of concurrent modifications and failures.
  • Schema enforcement: Delta Lake can enforce schemas, which ensures that data is of the expected type and format, reducing the risk of data quality issues.
  • Versioning: Delta Lake provides versioning capabilities, which allows for easy rollback and recovery in the event of data issues or failures.
  • Open-source: Being open-source, Delta Lake provides a flexible and customizable solution for big data workloads.

Cons:

  • Complexity: Delta Lake can be complex and require significant setup and configuration, especially for organizations that are new to big data processing.
  • Integration: Delta Lake is built on top of Apache Spark, which may not be compatible with all big data workflows and tools.
  • Performance overhead: While Delta Lake provides several features that improve data quality and reliability, these features can come with a performance overhead that may impact the speed of data processing.

AWS Database Migration Service

https://aws.amazon.com/dms/
https://aws.amazon.com/dms/features/

Migrating Databases with AWS Database Migration Service (DMS) - Demo https://www.youtube.com/watch?v=JTSLF42bv9Y

AWS Database Migration Service (DMS) is a fully-managed service that helps migrate databases to and from the AWS Cloud, or between different AWS database services. It supports a variety of database migration scenarios, including homogeneous migrations (e.g., from one version of a database to another) and heterogeneous migrations (e.g., from one database platform to another). DMS also provides continuous data replication, allowing for real-time updates to the target database.
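
For scripted control, DMS is also exposed through the AWS SDKs. The boto3 sketch below starts an existing replication task and polls its status; the region and task ARN are placeholders, and it assumes the replication instance, endpoints, and task were already created (via the console, CLI, or infrastructure-as-code).

```python
# Minimal boto3 sketch: start and check an existing DMS replication task.
# The region and task ARN are placeholders.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

task_arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"

# Start a full load; other types include "resume-processing"
# and "reload-target".
dms.start_replication_task(
    ReplicationTaskArn=task_arn,
    StartReplicationTaskType="start-replication",
)

# Poll the task's status.
resp = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)
print(resp["ReplicationTasks"][0]["Status"])
```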

Pros:

  • Fully-managed: AWS DMS is a fully-managed service, which means that AWS takes care of the underlying infrastructure and maintenance, reducing the operational burden for organizations.
  • Wide range of supported migration scenarios: DMS supports a wide range of database migration scenarios, including homogeneous and heterogeneous migrations.
  • Continuous data replication: DMS provides continuous data replication, which ensures that data is kept up-to-date between the source and target databases in near real-time.
  • Integration with AWS services: DMS integrates well with other AWS services, such as AWS Schema Conversion Tool (SCT), Amazon CloudWatch, and AWS Identity and Access Management (IAM), making it easier to manage and monitor migrations.

Cons:

  • Cost: While DMS is a fully-managed service, it still incurs costs based on usage and data transfer, which can add up quickly for large-scale migrations or ongoing replication.
  • Complexity: Depending on the complexity of the migration scenario, DMS can require significant configuration and setup, which may require specialized skills or expertise.
  • Limited customization: While DMS supports a wide range of migration scenarios, it may not offer the same level of customization or control as self-managed migration tools.

Colima

https://github.com/abiosoft/colima

Colima | Open Source alternative for Docker Desktop https://www.youtube.com/watch?v=v3sf_Ekhmtw&t=53s

Container runtimes on macOS (and Linux) with minimal setup

Colima is a container runtime for macOS and Linux that lets users run containerized applications locally. It is designed to be lightweight and easy to use, with minimal setup required. Under the hood, Colima provisions a lightweight virtual machine (via Lima) that hosts a runtime such as Docker or containerd, and exposes a simple command-line interface for managing it.
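
Colima is driven from the command line. To stay consistent with the other examples on this page, the sketch below shells out to the colima CLI from Python; it assumes the colima binary is installed and on PATH, and the resource flags come from colima start's options.

```python
# Minimal sketch: driving the Colima CLI from Python.
# Assumes the `colima` binary is installed and on PATH.
import subprocess

def colima(*args: str) -> str:
    """Run a colima subcommand and return its stdout."""
    result = subprocess.run(
        ["colima", *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# Start a VM with explicit resources.
colima("start", "--cpu", "2", "--memory", "4")

# After this, the local `docker` CLI talks to the Colima-managed runtime.
print(colima("status"))
```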

Pros:

  • Lightweight: Colima is designed to be lightweight, which means it has minimal system requirements and doesn't use a lot of system resources.
  • Easy to use: Colima has a simple and intuitive interface, making it easy to manage containers locally without needing to learn complex command-line tools.
  • Minimal setup: Colima has minimal setup requirements, which means that users can quickly get up and running with containerized applications without needing to spend a lot of time on configuration.
  • Runs on macOS and Linux: Colima runs on both macOS and Linux systems, making it a versatile tool for developers working on either platform.

Cons:

  • Limited features: Colima is designed to be a lightweight and easy-to-use tool, which means that it may not have all of the features and functionality of more full-featured container runtimes.
  • Limited compatibility: While Colima runs on both macOS and Linux systems, it may not be compatible with all hardware and software configurations.
  • Integration: Colima wraps an existing runtime such as Docker or containerd inside a VM, which may require additional setup and configuration to integrate with tools and workflows that expect Docker Desktop.

Databricks Photon

https://www.databricks.com/product/photon

Advancing Spark - The Photon Whitepaper https://www.youtube.com/watch?v=hxvQxI4FksY

"Photon is the next generation engine on the Databricks Lakehouse Platform that provides extremely fast query performance at low cost – from data ingestion, ETL, streaming, data science and interactive queries – directly on your data lake. Photon is compatible with Apache Spark™ APIs, so getting started is as easy as turning it on – no code changes and no lock-in."

Pros:

  • Fast query performance: Photon is designed to provide extremely fast query performance on large datasets, making it ideal for data-intensive workloads.
  • Cost-effective: Photon is designed to keep costs low by optimizing query performance and reducing the need for expensive hardware or infrastructure.
  • Vectorized native engine: Photon is a native engine, written in C++, that processes data in columnar batches (vectorized execution), accelerating queries and reducing data processing times.
  • Spark compatibility: Photon is compatible with Apache Spark APIs, so existing DataFrame and SQL workloads run on it without code changes.
  • Distributed computing: Photon leverages distributed computing to scale query processing across multiple nodes, improving query performance and reducing data processing times.

Cons:

  • Requires Databricks Lakehouse Platform: Photon is only available as part of the Databricks Lakehouse Platform, which may require additional setup and configuration to integrate into existing workflows or toolchains.

DataHub

The Metadata Platform for the Modern Data Stack - DataHub's extensible metadata platform enables data discovery, data observability and federated governance that helps tame the complexity of your data ecosystem.

  • Data discovery: DataHub's metadata platform enables data discovery by providing a centralized catalog of data assets across an organization's different systems and tools. This makes it easier for users to find the data they need for their work.
  • Data observability: DataHub's metadata platform provides observability into the quality and usage of data across an organization's different systems and tools. This enables users to better understand the data they are working with and make more informed decisions.
  • Federated governance: DataHub's metadata platform enables federated governance by providing a centralized platform for managing metadata across different tools, systems, and teams. This helps organizations ensure that their data is accurate, reliable, and compliant with relevant regulations and policies.
  • Extensibility: DataHub's metadata platform is extensible, which means that it can be customized and extended to meet an organization's specific needs and workflows.
  • Open source: DataHub is open source, meaning it is freely available to use and can be modified and extended by the community (a minimal metadata-emission sketch follows this list).
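
As a sketch of that extensibility, the snippet below uses DataHub's Python SDK (the acryl-datahub package) to emit a description for a dataset so it becomes discoverable in the catalog; the server URL and dataset name are illustrative.

```python
# Minimal sketch: pushing metadata to DataHub with its Python SDK.
# Server URL and dataset name are illustrative.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Describe a dataset so it shows up in the catalog with context.
mcp = MetadataChangeProposalWrapper(
    entityUrn=make_dataset_urn(
        platform="snowflake", name="analytics.orders", env="PROD"
    ),
    aspect=DatasetPropertiesClass(description="Orders fact table, updated daily."),
)
emitter.emit(mcp)
```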

DataOps.live

"The DataOps.live platform is helping data product teams in this global pharmaceutical giant to orchestrate and benefit from next-generation analytics on self-service data & analytics infrastructure consisting of Snowflake and other tools using data mesh approach."

DataOps.live is a platform designed to help data product teams streamline their data pipelines and improve collaboration between different teams and stakeholders. It provides a range of tools and features to help teams manage their data workflows, from data ingestion and processing to modeling and analytics.

  • Streamlined data workflows: DataOps.live helps data product teams streamline their data workflows by providing a centralized platform for managing data pipelines and collaborating with other teams and stakeholders.
  • Improved collaboration: DataOps.live provides features for collaboration and communication, including project management tools, issue tracking, and chat capabilities. This helps teams work together more efficiently and effectively.
  • Data quality and governance: DataOps.live provides tools for data quality monitoring and governance, including data validation, testing, and auditing capabilities. This helps teams ensure that their data is accurate, reliable, and compliant with relevant regulations and policies.
  • Flexible deployment options: DataOps.live can be deployed on-premises or in the cloud, giving teams the flexibility to choose the deployment option that best meets their needs and requirements.
  • Customizable: DataOps.live is highly customizable and can be tailored to meet the specific needs and workflows of different teams and organizations.

Feast: Feature Store for Machine Learning

https://feast.dev/

  • Open Source Feature Store for Production ML
  • Feast is a standalone, open-source feature store that organizations use to store and serve features consistently for offline training and online inference.

Feast is an open-source platform that helps organizations manage and serve machine learning (ML) features for training and inference. It provides a centralized platform for storing, managing, and serving feature data to ML models, making it easier for organizations to develop, deploy, and monitor ML models at scale.

Pros:

  • Centralized feature store: Feast provides a centralized platform for storing and managing feature data, which helps organizations avoid silos and inconsistencies in feature data across different systems and teams.

  • Consistent training and serving: Feast serves the same feature definitions for offline training and online inference, reducing skew between the data a model is trained on and the data it sees in production (a minimal lookup sketch follows this list).

  • Open source: Being open-source, Feast can be extended and integrated with an organization's existing data infrastructure.
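
A minimal sketch of the serving side, assuming a feature repo that defines a customer entity and a feature view named order_stats (both names are hypothetical):

```python
# Minimal sketch: reading features from Feast for online inference.
# Assumes a feature repo with a customer entity and a feature view
# named "order_stats"; all names are illustrative.
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path to the feature repo

features = store.get_online_features(
    features=[
        "order_stats:total_orders",
        "order_stats:avg_order_value",
    ],
    entity_rows=[{"customer_id": 1001}],
).to_dict()

print(features)
```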

Monte Carlo

https://www.montecarlodata.com/

Monte Carlo is a data observability platform that helps organizations manage and govern their data, ensuring data reliability, quality, and availability. It provides a range of tools and features for monitoring, detecting, and resolving data issues, so that organizations can make better decisions and improve their business outcomes.

  • Data reliability: Monte Carlo helps organizations ensure data reliability by monitoring data quality and availability, detecting data anomalies, and identifying potential issues before they impact business outcomes.

  • Collaboration: Monte Carlo routes alerts about data incidents to the teams that own the affected assets and integrates with communication tools such as Slack, helping teams triage and resolve issues together.

  • Data governance: Monte Carlo provides tools for data governance, including data lineage, data cataloging, and data privacy management. This helps organizations comply with relevant regulations and policies and avoid risks associated with data misuse.

  • Integration: Monte Carlo can integrate with a range of data sources and systems, including cloud platforms, data warehouses, and data lakes, making it a flexible platform for different use cases.

  • User-friendly: Monte Carlo is designed to be user-friendly, with a simple and intuitive interface that makes it easy for users to monitor and manage their data.