Ab Initio Interview Questions and Answers:
1. Define Ab Initio?
Ab Initio is a Latin phrase meaning “from the beginning”. It is also the name of a GUI-based ETL tool that performs parallel processing. The platform is used for business intelligence, data analysis, batch processing, and data management.
2. What does the architecture of Ab Initio include?
Ab Initio’s architecture includes:
- GDE (Graphical Development Environment)
- EME (Enterprise Meta>Environment)
- Co>Operating System
3. Explain what data processing is and what are the fundamentals in this approach?
Before data can be processed, it must be stored and analyzed. The following factors are involved in data processing:
- Collection of Data
- Final Outcomes
These are regarded as the fundamental components of data processing.
4. Explain what dependency analysis is in Ab Initio?
EME is the repository in Ab Initio. Dependency analysis is the process in which EME examines the entire project, tracing how data flows and is transformed from one component to another, from one file to another, and within and between graphs.
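Conceptually, dependency analysis is a traversal of a lineage graph. The following is a minimal Python sketch, not how EME is implemented: it assumes lineage is stored as hypothetical (source, target) edges between files and components, then walks downstream to find everything affected by a file ("impact") and upstream to find everything a file depends on ("lineage"). All node names are made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means data flows from source to target.
EDGES = [
    ("customers.dat", "Reformat_Customers"),
    ("Reformat_Customers", "Join_Orders"),
    ("orders.dat", "Join_Orders"),
    ("Join_Orders", "Rollup_By_Region"),
    ("Rollup_By_Region", "region_summary.dat"),
]

def build_graph(edges):
    """Index the edges in both directions."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream

def reachable(start, adjacency):
    """Breadth-first search: every node reachable from start (excluding start)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

downstream, upstream = build_graph(EDGES)
impact = reachable("orders.dat", downstream)        # what changes if orders.dat changes
lineage = reachable("region_summary.dat", upstream)  # where region_summary.dat comes from
print(sorted(impact))
print(sorted(lineage))
```

Keeping both directions indexed makes impact analysis and lineage queries the same search over different adjacency maps.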
5. Explain the segregation of Ab Initio EME?
Ab Initio EME is divided into two main segments:
- Data Integration Portion
- User Interface
6. Explain the data processing cycle and its significance?
The data processing cycle is the repeating sequence of steps through which data is processed continuously, even while it is in use. Depending on the nature, type, or size of the data, results may be produced quickly or only after a delay. As data grows more complex, there is a need for methods that are more reliable than existing approaches.
The data processing cycle helps keep this complexity to a minimum.
7. Explain de-partition?
De-partitioning reads data records from multiple flows or partitions and combines them into a single flow. Several de-partition components are available, such as Merge, Gather, Interleave, and Concatenate.
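The four de-partition strategies differ mainly in the order in which records leave the combined flow. The sketch below is a plain Python analogy, not Ab Initio code: each partition is assumed to be a Python list, and the function names simply mirror the component names. Merge assumes each partition is already sorted on the key, and Gather is modeled with one producer thread per partition, so its output order is arbitrary.

```python
import heapq
import itertools
import queue
import threading

def concatenate(partitions):
    """All of partition 0, then all of partition 1, and so on."""
    return list(itertools.chain.from_iterable(partitions))

_MISSING = object()

def interleave(partitions):
    """Round-robin: one record from each partition in turn."""
    rounds = itertools.zip_longest(*partitions, fillvalue=_MISSING)
    return [rec for group in rounds for rec in group if rec is not _MISSING]

def merge(partitions, key=None):
    """Sorted merge; assumes every partition is already sorted on key."""
    return list(heapq.merge(*partitions, key=key))

def gather(partitions):
    """Arbitrary order: records are collected as each partition's producer emits them."""
    q = queue.Queue()
    done = object()

    def produce(part):
        for rec in part:
            q.put(rec)
        q.put(done)

    threads = [threading.Thread(target=produce, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    result, finished = [], 0
    while finished < len(partitions):
        item = q.get()
        if item is done:
            finished += 1
        else:
            result.append(item)
    for t in threads:
        t.join()
    return result

parts = [[1, 5, 9], [2, 3], [4, 6, 7]]
print(concatenate(parts))
print(interleave(parts))
print(merge(parts))
print(gather(parts))
```

Only Merge guarantees sorted output, which is why it requires sorted inputs; Gather is the cheapest when order does not matter.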
8. Explain Sort Component?
The Sort component is used to reorder data. It has two main parameters:
- Key: Determines the collation (sort) order of the records. As the name suggests, it is the principal parameter of the Sort component.
- Max-core: Controls how often the Sort component spills data from memory to disk. A larger value lets more data be held in memory, so spills happen less often.
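The relationship between the memory limit and disk spills is the classic external merge sort. Below is a minimal Python sketch of that technique, not Ab Initio's implementation: here max_core is a hypothetical limit counted in records rather than bytes. Each time the in-memory buffer fills, it is sorted and written to a temporary file as a sorted run, and the runs are finally combined with a k-way merge.

```python
import heapq
import pickle
import tempfile

def external_sort(records, key, max_core):
    """Sort records while holding at most max_core records in memory.

    Returns (sorted_records, number_of_spills). max_core is a record count
    here, a simplification of a memory limit.
    """
    run_files = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort(key=key)
        run = tempfile.TemporaryFile()
        for rec in buffer:
            pickle.dump(rec, run)
        run.seek(0)
        run_files.append(run)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_core:
            spill()
    if buffer:
        spill()

    def read_run(run):
        # Stream records back from one sorted run file.
        while True:
            try:
                yield pickle.load(run)
            except EOFError:
                return

    try:
        # k-way merge of the sorted runs (collected into a list only to keep the example short).
        merged = list(heapq.merge(*(read_run(r) for r in run_files), key=key))
    finally:
        for r in run_files:
            r.close()
    return merged, len(run_files)

data = [("b", 2), ("a", 1), ("d", 4), ("c", 3), ("e", 5)]
result, spills = external_sort(data, key=lambda r: r[0], max_core=2)
print(result, spills)
```

Lowering max_core in this sketch increases the number of spills, which mirrors the trade-off described for the Max-core parameter.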
9. What is a local lookup?
A regular lookup searches the entire lookup file for a matching record, whereas a local lookup searches only a specific partition: the partition in which the lookup key resides. Called from a transform function, it retrieves records faster than reading them from disk.
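A local lookup only works when the lookup data and the input records are partitioned on the same key with the same partitioning function, so a record's key is guaranteed to live in the worker's own partition. The Python sketch below illustrates that idea under this assumption; partition_of (a CRC32-based hash partitioner), build_partitioned_lookup, and the customer data are all hypothetical and not part of Ab Initio.

```python
import zlib
from typing import Dict, List, Optional

def partition_of(key: str, ways: int) -> int:
    """Hypothetical hash partitioner: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % ways

def build_partitioned_lookup(records: List[dict], key_field: str, ways: int) -> List[Dict[str, dict]]:
    """Split lookup data into `ways` in-memory partitions, keyed by key_field."""
    partitions: List[Dict[str, dict]] = [{} for _ in range(ways)]
    for rec in records:
        k = rec[key_field]
        partitions[partition_of(k, ways)][k] = rec
    return partitions

def lookup(partitions: List[Dict[str, dict]], k: str) -> Optional[dict]:
    """Regular lookup: may have to search every partition."""
    for part in partitions:
        if k in part:
            return part[k]
    return None

def lookup_local(partitions: List[Dict[str, dict]], k: str, my_partition: int) -> Optional[dict]:
    """Local lookup: searches only the partition owned by the current worker."""
    return partitions[my_partition].get(k)

customers = [
    {"id": "C001", "name": "Asha"},
    {"id": "C002", "name": "Ben"},
    {"id": "C003", "name": "Chen"},
]
parts = build_partitioned_lookup(customers, "id", ways=4)
# A worker that received key "C002" owns that key's partition, so a local lookup suffices.
print(lookup_local(parts, "C002", partition_of("C002", 4)))
```

If the input were partitioned differently from the lookup data, the local lookup in another partition would simply miss, which is why matching partitioning is the key precondition.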
10. Explain how to run a graph infinitely?
To run a graph infinitely, the graph's end script should call the graph's own deployed .ksh file. For example, if a graph is named xyz.mp, its end script should call xyz.ksh.
11. What layouts does Ab Initio support?
Ab Initio supports two kinds of layouts: serial and parallel. A graph can use both layouts at the same time. A parallel layout depends on the degree of data parallelism: the layout is defined to match the degree of parallelism of the multifile system, so with a 4-way parallel multifile system, a component in the graph can run 4-way parallel.
12. What parallelisms does Ab Initio support?
Ab Initio supports three kinds of parallelism:
- Data parallelism: The data is divided into partitions (segments), and the same processing runs on all partitions in parallel within a single application.
- Component parallelism: Different components run in parallel on different data within a single application.
- Pipeline parallelism: Multiple components run at the same time on the same data, with records passed from one component to the next as they are processed.
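Data parallelism and pipeline parallelism can be sketched with standard Python concurrency primitives. This is only an analogy of the structure, not how the Co>Operating System schedules work: transform is a hypothetical per-record function, the round-robin split stands in for a partitioned layout, and Python threads show the concurrency pattern rather than a real CPU speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def transform(rec):
    # Hypothetical per-record transform.
    return rec * 10

def data_parallel(records, ways):
    """Data parallelism: split the data into `ways` partitions and run the
    same transform on every partition concurrently."""
    partitions = [records[i::ways] for i in range(ways)]  # round-robin split
    with ThreadPoolExecutor(max_workers=ways) as pool:
        return list(pool.map(lambda part: [transform(r) for r in part], partitions))

def pipeline_parallel(records):
    """Pipeline parallelism: a reader stage and a transform stage run at the
    same time, each record moving downstream as soon as it is ready."""
    flow = queue.Queue(maxsize=2)  # small buffer between the two stages
    done = object()
    out = []

    def reader():
        for r in records:
            flow.put(r)
        flow.put(done)

    def transformer():
        while True:
            r = flow.get()
            if r is done:
                break
            out.append(transform(r))

    stages = [threading.Thread(target=reader), threading.Thread(target=transformer)]
    for s in stages:
        s.start()
    for s in stages:
        s.join()
    return out

print(data_parallel(list(range(1, 9)), ways=4))
print(pipeline_parallel([1, 2, 3]))
```

Component parallelism would correspond to running two unrelated functions on two different inputs at the same time, for example by submitting both to the same executor.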
13. Define deadlock and explain the conditions in which it occurs?
A deadlock occurs when a graph or program hangs because its components are waiting on each other, so further processing stops.
Deadlock occurs in the following conditions:
- Certain data flow patterns can cause a deadlock.
- A graph flow that diverges and converges within a single phase can lead to a deadlock.
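The diverge-and-converge case can be reproduced with bounded buffers. The Python sketch below is an analogy under stated assumptions, not Ab Initio behavior: flows are modeled as bounded queues, a hypothetical "replicate" step copies every record onto two flows, and a hypothetical converging consumer reads flow A completely before it reads flow B. When flow B's buffer fills, the producer blocks on B while the consumer blocks on A; timeouts are used only to detect that stall instead of hanging forever.

```python
import queue
import threading

def diverge_converge(records, buffer_size, timeout=0.5):
    """Return True if the hypothetical flow stalls (deadlock detected by timeout).

    buffer_size <= 0 means unbounded buffers.
    """
    flow_a = queue.Queue(maxsize=buffer_size)
    flow_b = queue.Queue(maxsize=buffer_size)
    done = object()
    stalled = threading.Event()

    def replicate():
        # Diverge: copy every record onto both flows.
        try:
            for rec in records:
                flow_a.put(rec, timeout=timeout)
                flow_b.put(rec, timeout=timeout)
            flow_a.put(done, timeout=timeout)
            flow_b.put(done, timeout=timeout)
        except queue.Full:
            stalled.set()

    def converge():
        # Converge: drain flow A completely before reading flow B.
        try:
            for flow in (flow_a, flow_b):
                while flow.get(timeout=timeout) is not done:
                    pass
        except queue.Empty:
            stalled.set()

    workers = [threading.Thread(target=replicate), threading.Thread(target=converge)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return stalled.is_set()

print(diverge_converge([1, 2, 3], buffer_size=1))  # small buffers
print(diverge_converge([1, 2, 3], buffer_size=0))  # unbounded buffers
```

With enough buffering on flow B the same pattern completes, which shows that the deadlock comes from the flow pattern combined with limited buffering, not from the data itself.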
14. List the file extensions that are used in Ab Initio.
- mp: Graph or graph component in Ab Initio
- mpc: Custom component or program
- mdc: Dataset or custom data-set component
- dml: DML (Data Manipulation Language) file or record type definition
- xfr: Transform function file
- dat: Data file (multi-file or serial file)
15. What is the difference between roll-up and scan?
Using Scan, we can create cumulative (running) summary records, one for each input record, whereas using Rollup, we cannot: Rollup produces a single summary record for each key group.
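The difference is easiest to see on the same input. This Python sketch is an analogy, not Ab Initio DML: it assumes the records are already grouped (sorted) by key, as itertools.groupby requires, and uses a hypothetical (region, amount) record layout with summation as the aggregate.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# Hypothetical (region, amount) records, already sorted by region.
records = [("east", 10), ("east", 5), ("west", 7), ("west", 3), ("west", 1)]

def rollup(records):
    """One summary record per key group."""
    return [(k, sum(v for _, v in grp)) for k, grp in groupby(records, key=itemgetter(0))]

def scan(records):
    """One cumulative summary record per input record; the total resets at each new key."""
    out = []
    for k, grp in groupby(records, key=itemgetter(0)):
        values = [v for _, v in grp]
        out.extend((k, total) for total in accumulate(values))
    return out

print(rollup(records))
print(scan(records))
```

The last Scan output record of each group always equals that group's Rollup result.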
16. What is SANDBOX in Ab Initio?
A sandbox is a collection of graphs and their related files, saved in a single directory tree. It behaves as a single group for the purposes of navigation, migration, and version control.
17. Difference between conventional loading and direct loading?
Conventional Load: Table constraints are checked against the data before the data is loaded.
Direct Load: Used for fast loading. The data is loaded first, regardless of table constraints, and checked afterward. Data that fails the checks is not indexed.
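The two strategies can be contrasted with Python's built-in sqlite3 module. This is only an analogy, not how Ab Initio or any particular database implements bulk loading: the sales table, its CHECK rule, and the sample rows are hypothetical. The conventional path enforces the constraint on every insert and rejects bad rows; the direct path bulk-loads everything into an unconstrained table and only identifies bad rows afterward.

```python
import sqlite3

# Hypothetical (id, amount) rows; negative amounts violate the business rule.
ROWS = [(1, 100), (2, -5), (3, 40), (4, -1)]

def conventional_load(rows):
    """Constraints are enforced as each row is inserted; violating rows are rejected."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER CHECK (amount >= 0))")
    rejected = []
    for row in rows:
        try:
            conn.execute("INSERT INTO sales VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    conn.close()
    return loaded, rejected

def direct_load(rows):
    """All rows are bulk-loaded with no constraint checks; the rule is checked afterward."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    bad = conn.execute("SELECT id, amount FROM sales WHERE amount < 0 ORDER BY id").fetchall()
    conn.close()
    return loaded, bad

print(conventional_load(ROWS))
print(direct_load(ROWS))
```

The direct path does less work per row during the load, which is where its speed comes from, at the cost of having bad rows present until the post-load check handles them.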
18. What are local and formal parameters?
These are the two types of graph-level parameters.
A local parameter must be assigned a value when it is declared, whereas the value of a formal parameter is prompted for at run time.
19. Explain the procedure of running the graph without GDE (Graphical Development Environment)?
In the GDE, select Run > Deploy > As Script. This creates a .bat file in your host directory.
Then run the .bat file from the command prompt.
20. What are the continuous or continuously enabled components in Ab Initio?
Continuous components are used to create continuous graphs, which keep running and produce useful output files as they run. Examples include Continuous Rollup, Batch Subscribe, and Continuous Update.