Different data structures and data pipelines

A data pipeline is the flow of data from one system to another system, with or without logical changes (transformations) applied to the data along the way.

In the modern world, huge volumes of data are being generated by machinery, tools and software systems.

The ever-growing generation and capture of data is termed big data, and the scope of big data is increasing day by day. Every organization wants to store the data generated by IoT devices, machinery, home appliances and software logs. They then want to process it to make future predictions and to analyze company statistics.

To handle all this big data processing, we need a good solution that captures the data, processes it and provides useful insights for making predictions. Architects design these data processing flows, which are referred to as data pipelines.

We need to define these data pipelines by adding tasks to them. Each task performs the work of its stage and then sends the data on to the next stage for further processing or storage.
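To make the idea of chained stages concrete, here is a minimal Python sketch. It assumes each stage is a plain generator function that receives the previous stage's records. The stage names `parse_readings` and `drop_invalid`, the helper `run_pipeline`, and the `device,reading` input format are all hypothetical. Production pipelines usually add scheduling, retries and persistent storage between stages.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict
# A stage receives a stream of records and yields a (possibly transformed) stream.
Stage = Callable[[Iterable], Iterator]


def parse_readings(lines: Iterable[str]) -> Iterator[Record]:
    """Hypothetical stage: turn 'device,reading' text lines into records."""
    for line in lines:
        device, reading = line.strip().split(",")
        yield {"device": device, "reading": float(reading)}


def drop_invalid(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stage: discard negative readings (treated as sensor faults)."""
    for record in records:
        if record["reading"] >= 0:
            yield record


def run_pipeline(source: Iterable, stages: List[Stage]) -> List[Record]:
    """Feed the output of each stage into the next one, then collect the result."""
    data = source
    for stage in stages:
        data = stage(data)
    # Final "store" step: here the records are simply materialized in memory.
    return list(data)


raw = ["sensor-1,21.5", "sensor-2,-1.0", "sensor-1,22.0"]
result = run_pipeline(raw, [parse_readings, drop_invalid])
print(result)
```

Because each stage is a generator, records stream through one at a time instead of being loaded all at once. Adding a step to the pipeline means appending another function to the list.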

As data keeps evolving, it is categorised in multiple ways:

  • Structured Data
  • Semi-Structured Data
  • Un-Structured Data

Each type of data has its own significance.

Structured Data: Data in the most understandable format, arranged in rows and columns like a table. This makes it easy to run SQL queries that produce insights such as averages (means), medians and sums.
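As an illustration of the SQL aggregates mentioned above, the sketch below loads a few hypothetical sales rows into an in-memory SQLite table using Python's built-in `sqlite3` module. The table name `sales` and its values are made up. SQLite has no built-in median aggregate, so the median is computed in Python with `statistics.median`. Other databases offer median-style functions such as `PERCENTILE_CONT`.

```python
import sqlite3
import statistics

# Structured data: every row has the same columns (region, amount).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 200.0), ("south", 50.0), ("south", 100.0)],
)

# Averages and sums per region, computed by the database with plain SQL.
rows = conn.execute(
    "SELECT region, AVG(amount), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
for region, avg_amount, total in rows:
    print(f"{region}: average={avg_amount:.2f}, sum={total:.2f}")

# SQLite has no MEDIAN aggregate, so fetch the column and compute it in Python.
amounts = [amount for (amount,) in conn.execute("SELECT amount FROM sales")]
overall_median = statistics.median(amounts)
print(f"median={overall_median}")
conn.close()
```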


Semi-Structured Data: Data that has some organization, much like structured data, but does not exist as rows and columns in a table. Examples include XML, HTML and JSON.
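A common processing step for semi-structured data is to flatten it into rows and columns so it can be queried like structured data. The following is a minimal sketch for JSON using only the standard library. The event records and the `flatten` helper are hypothetical. Libraries such as pandas provide `json_normalize` for the same job.

```python
import json

# Hypothetical JSON events: nested objects, and records do not share the same keys.
raw = """
[
  {"id": 1, "device": {"type": "thermostat", "room": "kitchen"}, "temp": 21.5},
  {"id": 2, "device": {"type": "camera"}, "motion": true}
]
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level using dotted key names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


records = [flatten(r) for r in json.loads(raw)]
# The union of all keys becomes the column set; missing values become None.
columns = sorted({key for r in records for key in r})
table = [[r.get(col) for col in columns] for r in records]

print(columns)
for row in table:
    print(row)
```

Because the records do not share one schema, the column set has to be discovered from the data itself. This is the main practical difference from a fixed SQL table.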


Un-Structured Data: Data that lacks a predefined structure or organization, which makes it harder to analyze. Examples include emails, video files, documents and images.
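Processing unstructured data usually means extracting structured fields from it first. The sketch below uses a simple rule-based approach on an email, assuming Python's standard `email` and `re` modules. The message text and the regular expressions for order numbers and phone numbers are hypothetical. Real systems often rely on NLP or machine-learning models instead of hand-written patterns.

```python
import re
from email import message_from_string

# Hypothetical raw email: the headers are easy to parse, the body is free text.
raw_mail = """From: alice@example.com
To: support@example.com
Subject: Order 12345 arrived damaged

Hi team,
my order 12345 arrived with a cracked screen. Please call me at 555-0100.
"""

msg = message_from_string(raw_mail)
body = msg.get_payload()  # plain string for a single-part message

# Turn the free-text body into a structured record with illustrative patterns.
record = {
    "from": msg["From"],
    "subject": msg["Subject"],
    "order_ids": sorted(set(re.findall(r"\border\s+(\d+)", body, re.IGNORECASE))),
    "phones": re.findall(r"\b\d{3}-\d{4}\b", body),
}
print(record)
```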

To process these different kinds of data, we need to define a set of processes that turn each dataset into a useful outcome.


Data pipelines flow:






