Friday, November 28, 2014

Ab Initio Parallelism


Ab Initio can process data in a parallel runtime environment.

Ab Initio provides three kinds of parallelism:

          Component Parallelism
          Pipeline Parallelism
          Data Parallelism

Data Parallelism
Different portions of the data are processed at the same time, for example on different servers.
Data parallelism occurs when a graph separates data into multiple divisions, allowing multiple copies of program components to operate on the data in all the divisions simultaneously.
This is the most common kind of parallelism: you partition your data so that it can be processed faster. For example, if you have 1000 records, you can divide them among 8 computers so that each one processes only its own share of the records.
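
The 1000-records-across-8-computers example can be sketched in plain Python. This is only an analogy, not Ab Initio code: the function names (partition_round_robin, transform, run_data_parallel) are made up for illustration, the round-robin split is just one possible partitioning scheme, and worker threads stand in for the separate computers.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, n_partitions):
    """Deal the records out to n_partitions lists, one record at a time."""
    partitions = [[] for _ in range(n_partitions)]
    for i, record in enumerate(records):
        partitions[i % n_partitions].append(record)
    return partitions


def transform(partition):
    """Hypothetical per-record work; the same logic runs on every partition."""
    return [record * 2 for record in partition]


def run_data_parallel(records, n_partitions=8):
    """Partition the records, then run one copy of transform per partition."""
    partitions = partition_round_robin(records, n_partitions)
    # Each worker stands in for one of the computers in the example.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(transform, partitions))
    return partitions, results


partitions, results = run_data_parallel(list(range(1000)))
print([len(p) for p in partitions])  # eight partitions of 125 records each
```

Every partition goes through the same transform, which mirrors the idea of multiple copies of one component working on different divisions of the data.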

Pipeline Parallelism
Pipeline parallelism occurs when several connected program components on the same branch of a graph execute simultaneously. A Sort component interrupts pipeline parallelism, because it has to read all of its input before it can write its first output record.
A graph with multiple components running simultaneously on the same data uses pipeline parallelism. Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel.
NOTE: To limit the number of components running simultaneously, set phases in the graph.
For example, one component may already have read 10 records from the input file while the next component has processed only 6 of them so far. This is pipeline parallelism: a component does not wait for all the data to arrive, but starts processing records in parallel as they come through the pipe.
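
The reader-and-processor example can be imitated with two Python threads joined by a small queue. Again, this is a rough analogy rather than Ab Initio code: reader, processor, run_pipeline and the queue size of 2 are all assumptions chosen for illustration.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the record stream


def reader(records, out_q, log):
    """Upstream component: passes records downstream one at a time."""
    for record in records:
        log.append(("read", record))
        out_q.put(record)  # blocks while the small queue is full
    out_q.put(SENTINEL)


def processor(in_q, results, log):
    """Downstream component: starts working as soon as records arrive."""
    while True:
        record = in_q.get()
        if record is SENTINEL:
            break
        log.append(("processed", record))
        results.append(record.upper())


def run_pipeline(records):
    """Run both components at once, connected by a bounded queue."""
    flow = queue.Queue(maxsize=2)  # the "pipe" between the two components
    log, results = [], []
    upstream = threading.Thread(target=reader, args=(records, flow, log))
    downstream = threading.Thread(target=processor, args=(flow, results, log))
    upstream.start()
    downstream.start()
    upstream.join()
    downstream.join()
    return results, log


records = [f"rec{i}" for i in range(10)]
results, log = run_pipeline(records)
print(results[:3])  # ['REC0', 'REC1', 'REC2']
```

In the returned log, "processed" entries appear before the last "read" entry, which shows that the downstream step did not wait for the whole input. A stage that had to collect every record before emitting anything, as a sort does, would lose that overlap.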

Component Parallelism
In component parallelism, two or more components process records in parallel.
A graph with multiple processes running simultaneously on separate data uses component parallelism.

This kind of parallelism is specific to your graph: it occurs when two different components are not interrelated and process their data in parallel. For example, if you have 2 input files and sort each of them in its own flow, the two Sort components run under component parallelism.
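
The two-input-files example can be sketched in Python as two independent sorts submitted at the same time. This is an analogy only: sort_flow and run_component_parallel are hypothetical names, and in-memory lists stand in for the two input files.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_flow(records):
    """One independent flow: sort a single input on its own."""
    return sorted(records)


def run_component_parallel(input_a, input_b):
    """Run two unrelated sort flows at the same time."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(sort_flow, input_a)
        future_b = pool.submit(sort_flow, input_b)
        # Neither flow reads the other's data, so neither has to wait.
        return future_a.result(), future_b.result()


sorted_a, sorted_b = run_component_parallel([3, 1, 2], [9, 7, 8])
print(sorted_a, sorted_b)  # [1, 2, 3] [7, 8, 9]
```

Unlike the data-parallel case, the two flows here work on separate data and have no connection to each other.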


Useful Links


1. Ab Initio Sandbox
2. Ab Initio Components
3. Ab Initio Introduction
4. Ab Initio Basic Graph Development
5. Ab Initio Multifile