AWS Database
The data storage process
Ingest
Prepare
Store
Analyze
Action
- Ingest:
- This is the first step in the process, where data is collected from various sources.
- This could include scraping data from websites, receiving data from third-party APIs, or manually inputting data into a system.
- Examples of data sources include social media, sensor networks, and e-commerce platforms.
- Prepare:
- Once data is collected, it needs to be cleaned, transformed, and structured so that it can be stored and analyzed.
- This step may include tasks such as removing duplicate data, filling in missing values, and converting data into a consistent format.
- Store:
- After the data has been prepared, it can be stored in a database or data warehouse.
- This could include storing data in a relational database, like MySQL or PostgreSQL, or in a NoSQL database, like MongoDB or Cassandra.
- Analyze:
- Once stored, the data can be analyzed to gain insights and inform decisions.
- This step could include tasks such as running SQL queries to extract specific data, using machine learning algorithms to identify patterns, or generating visualizations to make the data more understandable.
- Action:
- The insights gained from the analysis are then used to take action.
- This could include automating business processes, creating new products or services, or making decisions about how to improve the overall performance of a company.
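The five stages above can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the `RAW_CSV` sample, the `orders` table, the function names, the choice to fill a missing amount with 0.0, and the spending threshold are all hypothetical, and an in-memory SQLite database stands in for whichever relational store is actually used.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: stands in for data pulled from an API or a scrape.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
2,bob,
3,ALICE,5.00
"""


def ingest(raw):
    """Ingest: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def prepare(rows):
    """Prepare: drop duplicate orders, fill missing amounts, normalize names."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # duplicate record
        seen.add(order_id)
        # Assumption: a missing amount is filled with 0.0.
        amount = float(row["amount"]) if row["amount"] else 0.0
        cleaned.append((order_id, row["customer"].strip().lower(), amount))
    return cleaned


def store(conn, records):
    """Store: write the prepared records to a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def analyze(conn):
    """Analyze: run a SQL query that totals spending per customer."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return {customer: total for customer, total in rows}


def act(totals, threshold=20.0):
    """Action: pick the customers whose total meets a (hypothetical) threshold."""
    return sorted(c for c, total in totals.items() if total >= threshold)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, prepare(ingest(RAW_CSV)))
    print(act(analyze(conn)))
```

Keeping each stage as its own function mirrors the separation described above: the source, the cleaning rules, or the database can be swapped without touching the other stages.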
Structured data
- eCommerce transaction data
- Relational databases, such as MySQL or PostgreSQL, which store data in tables with defined columns and rows.
- CSV, XLS, and other spreadsheet files, which have a fixed number of columns and rows, with data in specific cells.
- XML and JSON data that conform to a predefined schema (without a schema, these formats are usually classed as semi-structured).
- Electronic health records, which have specific fields for patient demographics, diagnosis, and treatment information.
- Financial transactions, which have specific fields for date, amount, account number, and other details.
- Scientific data, such as measurements from experiments, which have defined fields for variables and units of measurement.
- Data from CRM systems and other line-of-business applications
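What makes the examples above structured is that every record has the same named, typed fields. A minimal sketch, assuming a hypothetical `Transaction` record with the date, amount, and account number fields mentioned for financial transactions (the field names, `parse_transactions`, and the sample rows are illustrative only):

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """One structured record: every field has a fixed name and type."""
    booked_on: date
    amount: Decimal
    account_number: str


def parse_transactions(raw):
    """Convert CSV text into typed records; a malformed value raises an error."""
    reader = csv.DictReader(io.StringIO(raw))
    return [
        Transaction(
            booked_on=date.fromisoformat(row["date"]),
            amount=Decimal(row["amount"]),
            account_number=row["account_number"],
        )
        for row in reader
    ]


# Hypothetical sample data with a fixed set of columns.
SAMPLE = """date,amount,account_number
2024-01-15,120.50,ACC-001
2024-01-16,-35.00,ACC-002
"""

transactions = parse_transactions(SAMPLE)
total = sum(t.amount for t in transactions)
print(total)
```

Because the fields are known in advance, questions such as "what is the total amount?" can be answered directly from the columns, with no interpretation of the content.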
Unstructured data
- Social feeds, including social media posts and comments
- Digital images and video
- Website clickstream data
- Natural language text, such as blog posts, articles, and customer reviews
- Audio recordings, such as interviews or podcasts
- Email messages and attachments
- PDF documents and spreadsheets with no fixed layout
- Sensor data, such as temperature readings or GPS coordinates
- Log files from servers or applications
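Unstructured data has no predefined fields, so any structure has to be imposed at analysis time. The sketch below shows two simple ways of doing that for natural language text: counting words and pulling numbers out with a regular expression. The `REVIEW` text and both function names are hypothetical, and real text analysis usually needs far more than this.

```python
import re
from collections import Counter

# Hypothetical customer review: free text with no columns or field names.
REVIEW = "Great phone! Battery lasts 2 days. Shipping took 5 days, which was slow."


def word_counts(text):
    """Count how often each lowercase word appears in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def extract_durations(text):
    """Extract every number that is directly followed by 'day' or 'days'."""
    return [int(n) for n in re.findall(r"(\d+)\s+days?", text)]


print(word_counts(REVIEW).most_common(3))
print(extract_durations(REVIEW))
```

The extraction rules encode assumptions about how people phrase things, which is why results from unstructured sources are generally less reliable than a lookup on a defined column.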