Friday, September 12, 2008

INFORMATICA POWER CENTER - A SHORT TUTORIAL

Let's learn Informatica PowerCenter in less than 30 minutes.

We will first discuss the basic overall architecture of the product, and then get some hands-on practice.

Informatica is a tool for ETL and data integration, built on a service-oriented architecture (SOA). SOA provides the ability to scale services and share resources across multiple machines.

Informatica has the ability to communicate with all major data sources and database vendors. It can move data to/from these data sources.

The PowerCenter Domain is the fundamental administrative unit. It is a collection of nodes and services. A node is the logical representation of a machine in a domain. Services are the processes that carry out the whole ETL process, e.g. authentication, authorization, repository management, integration services, etc.

Services for the domain include:

  • Service Manager: Manages all domain operations. It runs the application services and performs domain functions (e.g. authentication, authorization, and logging) on each node in the domain. The Service Manager uses TCP/IP to communicate with the application services.

  • Application services:

    · Repository Service: Manages the repository. The repository holds the metadata, i.e. the data about all the data and the ETL performed on it. The Repository Service retrieves, inserts, and updates metadata in the repository database tables.

    · Integration Service: Runs sessions and workflows. A session is a set of instructions that tells the Integration Service how to move data from sources to targets. A workflow is a set of instructions that tells the Integration Service how to execute tasks, such as sessions, email notifications, and shell commands. The Integration Service reads workflow information from the repository; it connects to the repository through the Repository Service to fetch metadata.

    · SAP BW Service: Listens for RFC requests from SAP BW and processes them.

    · Web Services Hub: Runs within the Service Manager; it receives requests from web service clients and processes them.

There are 2 types of nodes:

  1. Gateway Nodes: A gateway node can run application services, and it can serve as a master gateway node. The master gateway node is the entry point to the domain.
  2. Worker Nodes: A worker node is any node not configured to serve as a gateway. The Service Manager performs limited domain operations on a worker node.

Some other terms:

Mapping: A mapping depicts the flow of data between sources and targets.

Transformations: We add transformations to a mapping to define how the Integration Service extracts and transforms data before it loads a target.

A transformation is a part of a mapping that generates or modifies data. Every mapping includes a Source Qualifier transformation, representing all data read from a source and temporarily stored by the Integration Service. In addition, you can add transformations that calculate a sum, look up a value, or generate a unique ID before the source data reaches the target.

Source Qualifier: The source qualifier represents the rows that the Integration Service reads from the source when it runs a session.

Session: A session is a set of instructions that tells the Integration Service how to move data from sources to targets. You create a session for each mapping that you want the Integration Service to run.

Workflow: A workflow is a set of instructions that tells the Integration Service how to execute tasks, such as sessions, email notifications, and shell commands.

Above is some insight into how Informatica works. For a more in-depth look at the Informatica architecture, you can visit:

http://informatica.techtiks.com/informatica_architecture.html

The Informatica tool has 3 main components:

  1. Client tools: These tools are used for the development process. They include the following:

    · Designer: To define mappings and transformations.

    · Workflow Manager: To define sessions and workflows.

    · Workflow Monitor: To monitor the execution of sessions and workflows.

    · Repository Manager: To manage the repository.

  2. Server tools: Can be considered the staging area, where data from the sources is fetched, transformed, and then loaded into the target.

  3. Repository: The repository is a kind of data inventory where all the metadata related to mappings, sources, targets, etc. is kept.

Now that we are familiar with the overall description of the product, we are ready for some practical hands-on work. Follow the steps below to perform your ETL. These are highly generic steps; depending on your requirements, you will need to perform your own transformations.

1. Create a folder to organize your work.

2. Log on to the Repository in the Designer window.

3. Create (or import) the Source definition in the Repository database.

4. Create the Target definition in the Repository database. If the Target table does not exist, generate and execute the target-creation commands (a sample DDL sketch follows these steps).

5. Now perform the source to target mapping, with appropriate transformations.

6. Now create a session for the mapping and thereafter a workflow for its execution. This will be done in the Workflow Manager.

7. Now run the workflow and watch it in the Workflow Monitor. It will show 'Success' once the operation completes.
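For step 4, the target-creation commands are just ordinary DDL run against the target database. Below is a minimal sketch; the table name EMP_SALARY_TGT and the column types are only my assumptions for illustration, so adjust them to your own target schema.

======================================================================

-- Hypothetical target table; adjust names and data types to your own schema
CREATE TABLE EMP_SALARY_TGT (
    EMPLOYEE_ID      NUMBER(6)    NOT NULL,
    JOB_ID           VARCHAR2(10),
    DEPARTMENT_ID    NUMBER(4),
    DATE_ID          NUMBER(8),
    AMOUNT_CREDITED  NUMBER(10,2)
);

======================================================================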


Time is running out, so I will post a pictorial version of these steps in the next post.
Thanks, and comments are welcome.

Wednesday, August 20, 2008

The Zig-Zag SQL query

Once I was asked to solve a very peculiar problem: for each employee_id, find the current month's salary and the previous month's salary in a single row. The table looks like this:

employee_id  job_id      department_id  date_id   amount_credited
206          AC_ACCOUNT  110            20080731  8500
206          AC_ACCOUNT  110            20080431  8600
100          AD_PRES     90             20081131  26000
101          AD_VP       90             20080731  19000
102          AD_VP       90             20081231  20000
102          AD_VP       90             20080631  18000
121          ST_MAN      50             20080431  8500
102          AD_VP       90             20080831  20000
118          PU_CLERK    30             20081131  2700
118          PU_CLERK    30             20080431  2800
121          ST_MAN      50             20080531  8200
206          AC_ACCOUNT  110            20080831  8600
206          AC_ACCOUNT  110            20080631  8400
100          AD_PRES     90             20080531  24000
100          AD_PRES     90             20080431  27000
100          AD_PRES     90             20080331  26000
100          AD_PRES     90             20080231  25000
…            …           …              …         …

P.S.: Don't be surprised by the date_id values. They are constructed like that to ease understanding; some of them are not valid calendar dates, but assume they are correct for simplicity.

Here, amount_credited is the current month's salary.
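If you want to try the puzzle yourself, here is a rough sketch of how table_given might be created and populated; the column types are my own assumption (Oracle-style), inferred from the sample data above.

======================================================================

-- Assumed structure of table_given, based on the sample rows above
CREATE TABLE table_given (
    employee_id      NUMBER(6),
    job_id           VARCHAR2(10),
    department_id    NUMBER(4),
    date_id          NUMBER(8),
    amount_credited  NUMBER(10,2)
);

-- two of the sample rows (consecutive months for employee 206)
INSERT INTO table_given VALUES (206, 'AC_ACCOUNT', 110, 20080731, 8500);
INSERT INTO table_given VALUES (206, 'AC_ACCOUNT', 110, 20080831, 8600);

======================================================================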

How to go about this SQL puzzle? I did it like this:

======================================================================

-- First branch: join each month's row to the same employee's previous-month row
-- (date_id values for consecutive months differ by 100 in this data set)
SELECT aa.EMPLOYEE_ID,
       aa.DATE_ID,
       aa.AMOUNT_CREDITED "THIS_MONTH",
       bb.AMOUNT_CREDITED "LAST_MONTH"
FROM   table_given aa,
       table_given bb
WHERE  aa.EMPLOYEE_ID = bb.EMPLOYEE_ID
AND    aa.DATE_ID = bb.DATE_ID + 100
UNION ALL
-- Second branch: rows with no previous-month record get a NULL "LAST_MONTH"
SELECT aa.EMPLOYEE_ID,
       aa.DATE_ID,
       aa.AMOUNT_CREDITED "THIS_MONTH",
       NULL "LAST_MONTH"
FROM   table_given aa
WHERE  aa.DATE_ID - 100 NOT IN (SELECT DISTINCT DATE_ID FROM table_given)
ORDER BY 1, 2;

======================================================================

In formulating this query, I have taken into account the relationship among the date_id values: the date_id for consecutive months differs by 100.

e.g. 20080331 breaks down as 2008 (Year), 03 (Month), 31 (Day).
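As a quick (untested) sketch of how the pieces of date_id and the "+100" relationship can be pulled apart in SQL, against the same table_given:

======================================================================

SELECT date_id,
       FLOOR(date_id / 10000)           AS yr,   -- e.g. 2008
       FLOOR(MOD(date_id, 10000) / 100) AS mth,  -- e.g. 03
       MOD(date_id, 100)                AS dy,   -- e.g. 31
       date_id - 100                    AS prev_month_date_id
FROM   table_given;
-- note: this simple "minus 100" arithmetic does not carry across a year
-- boundary (e.g. January), so such rows fall into the NULL branch of the
-- query above.

======================================================================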

This query can be run on any database that follows the SQL standard. The zig-zag output produced is as follows:

employee_id date_id This Month Last Month
100 20080131 24000
100 20080231 25000 24000
100 20080331 26000 25000
100 20080431 27000 26000
100 20080531 24000 27000
100 20080631 25000 24000
100 20080731 26000 25000
100 20080831 27000 26000
100 20080931 24000 27000
100 20081031 25000 24000
100 20081131 26000 25000
100 20081231 27000 26000
101 20080131 17000
101 20080231 18000 17000
101 20080331 19000 18000
101 20080431 20000 19000
…………………. …………………………………. ……………………. ……………….

However, this query can be written more compactly and effectively if we use an Oracle database, which has many proprietary SQL extensions that go beyond the SQL standard. I leave this up to you to try your hand at. Hint: use Oracle's analytic SQL functions (e.g. LAG with an OVER (PARTITION BY ...) clause). Good luck!
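If you want to check your attempt afterwards, here is a minimal (untested) sketch of the analytic version, assuming the same table_given layout:

======================================================================

SELECT employee_id,
       date_id,
       amount_credited AS this_month,
       -- LAG looks back one row within each employee's partition,
       -- ordered by date_id, so the previous month's salary lines up
       LAG(amount_credited) OVER (PARTITION BY employee_id
                                  ORDER BY date_id) AS last_month
FROM   table_given
ORDER  BY employee_id, date_id;

======================================================================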

Perl Vs Python Fight

I was confused about what to start with: Perl or Python? Though I had done Perl before, I wanted to give Python a chance too, if it outperforms Perl. I browsed the web and compiled the following pros and cons for both:

1) Python tends to focus more on code aesthetics than performance; as a result, Python OpenGL developers continue to struggle with overcoming Python's array handling performance issues.

2) Perl is much simpler to port to/from C/C++/Java

3) Perl has a much larger repository of open source libraries

4) Perl has much stronger/faster string handling (dynamic shader programs)

5) Perl simplifies embedding values within text blocks

6) Every function in Python can easily be performed in Perl; the converse is not true, particularly in terms of string handling.

7) Python has no defined global constants

8) Python has no forward referencing/prototypes

9) Python has no simple method of retrieving a dictionary/hash element's key

10) Python's print has no global method for enabling unbuffered I/O

11) Python's formatted print requires lists to be first converted to tuples

12) Python's print has no means to suppress both trailing new lines and spaces

13) Python has no ifdef method (short of reformatting) to block out prototype code

After going through these comparisons, I decided to stick with Perl for now, and to learn Python later only to verify the claims above for myself.