ETL Interview Questions and Answers
What is ETL process? How many steps ETL contains? Explain with example.
- ETL stands for Extraction, Transforming and Loading.
- Data is extracted from the source(database servers), and applied for generating business role on it.
The following are the steps involved :
- Define the source [ define the odbc connection to the database source ]
- Define the target [ create the odbc connection to the target database ]
- Create the mapping [ Apply business role here by adding transformations and define the data flow from source to target ]
- Create the session [ Mapping instructions ]
- Create the work flow [ Instructions that run on the sessions ]
Explain what are the ETL testing operations includes?
ETL testing includes
- Verify whether the data is transforming correctly according to business requirements
- Verify that the projected data is loaded into the data warehouse without any truncation and data loss
- Make sure that ETL application reports invalid data and replaces with default values
- Make sure that data loads at expected time frame to improve scalability and performance
What are the out put files that the informatica server creates during the session running?
- Informatica server log: A log is created for all status and error messages. It also creates error log for error messages and are created in Informatica home directory.
- Session log file: Every session is assigned to a session log file. It persists information about sessions into log files.
- Session detail file: It has the statistics of loading for every target in mapping. The session details include information about table name, no. of rows writer or rejected.
- Performance detail file: The information about session performance details that helps to improve the performance. Select ‘ Performance Details’ option in the session property sheet.
- Reject file: Reject file contains the data of rows which the writer does not write to targets.
In case you have non-OLEDB (Object Linking and Embedding Database) source for the lookup what would you do?
In case if you have non-OLEBD source for the lookup then you have to use Cache to load data and use it as source
What is Full load & Incremental or Refresh load?
- Initial Load : It is the process of populating all the data warehousing tables for the very first time
- Full Load : While loading the data for the first time, all the set records are loaded at a stretch depending on the volume. It erases all the contents of tables and reloads with fresh data
- Incremental Load : Applying the dynamic changes as and when necessary in a specific period. The schedule is predefined each period