Bodhi Tree Blogs: May 2012

Thursday, May 24, 2012

Executing informatica query from command prompt

In every difficulty there is an opportunity, yes its very true. Today after manager asked me to automate something, there was a big question mark on my head that how will I do it? will it really work? and after coming at home just checked informatica help and they have give it very straight forward..

How will you execute the informatica query from command prompt?

My task was to find out all the checked out object in a repository, I am doing that by creating a query at Repository Manager. But main pain area was to automate this query. To automate this, only way is to run a shell or batch script, but how will I call that.. so below is the way for it.

1. First create a query from Repository Manager; that is a simple task.

2. Execute the query from command prompt

You can do this using pmrep command of informatica. pmrep.exe file will be located at C:\Informatica\PowerCenter8.6.0\server\bin path [if you have installed informatica at windows machine]. From command prompt just go till this directory.

Before actually executing your query, you have to first connect to the repository. If you direct try to execute the query, you will get an error message saying you first have to connect to repository.

How will you connect to Repository--? so that will be via Connect command as below

C:\Informatica\PowerCenter8.6.0\server\bin>pmrep connect -r EDWDEV -d Domain_inf
ra-34af3549f [-h infra-34af3549f -o 6001] -n Administrator -s native -x magogate

Here host and ports are optional, if you are specifying Domain then do not specify host and port. Using this command you will be able to connect to repository. You will only require to specify Repository, Domain, User, Security Domain, and Password.

Once you get connected to repository last task is to run a query

C:\Informatica\PowerCenter8.6.0\server\bin>pmrep executequery -q CheckedOutObjec
ts -u CheckedOutObjectsList.txt

Here, we have specified -q <Query Name> and -u <OutputFileName>

If you do not specify outputfilename, results will get displayed at command prompt itself..

Please find below log which I have tried at my system

You can also view the video for it at
https://www.youtube.com/watch?v=p-hzjTuqPMI
=============================================================

C:\Documents and Settings\Administrator>cd C:\Informatica\PowerCenter8.6.0\serve
r\bin
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep executequery -q CheckedInObject
-u CheckedInObject.txt
Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 00:21:32 2009
Failed to read repository connection information. Please first connect to the re
pository using the connect command.
Failed to execute executequery.
Completed at Thu Jan 08 00:21:32 2009

===================
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep connect -r EDWDEV
Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 00:24:09 2009
[01/08/2009 00:24:09-[REP_12164] Domain-related error: [[PCSF_46006] Domain nam
e is not specified. [FAULT_PCSFUSEREXCEPTION]].]
[Failed to connect to repository service [EDWDEV].]
An error occurred while accessing the repository[Failed to connect to repository
service [EDWDEV].]
[01/08/2009 00:24:09-[REP_55102] Failed to connect to repository service [EDWDE
V].]
Repository connection failed.
Failed to execute connect.
Completed at Thu Jan 08 00:24:09 2009
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep connect -r EDWDEV -d Domain_inf
ra-34af3549f -h infra-34af3549f -o 6001 -n Administrator -s native -x magogate
Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 00:26:32 2009
If -d is specified, do not specify -h or -o.
Failed to execute connect.
Completed at Thu Jan 08 00:26:32 2009
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep connect -r EDWDEV -d Domain_inf
ra-34af3549f -h infra-34af3549f -o 6001 -n Administrator -s native -x magogate
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep connect -r EDWDEV -d Domain_inf
ra-34af3549f -n Administrator -s Native -x magogate
Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 00:28:11 2009
Connected to repository EDWDEV in Domain_infra-34af3549f as user Administrator
connect completed successfully.
Completed at Thu Jan 08 00:28:16 2009
C:\Informatica\PowerCenter8.6.0\server\bin>
==================
C:\Informatica\PowerCenter8.6.0\server\bin>pmrep executequery -q CheckedInObject

Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 00:29:25 2009
mapping m_Dynamic_Target
.
The total number of records returned: 1
executequery completed successfully.
Completed at Thu Jan 08 00:29:30 2009

C:\Informatica\PowerCenter8.6.0\server\bin>pmrep executequery -q CheckedOutObjec
ts -u CheckedOutObjectsList.txt
Informatica(r) PMREP, version [8.6.0], build [178.0613], Windows 32-bit
Copyright (c) Informatica Corporation 1994 - 2008
All Rights Reserved.
This Software may be protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,01
4,670; 6,032,158; 5,794,246; 6,339,775; 6,850,947; 6,895,471; 7,254,590 and othe
r U.S. Patents Pending.
Invoked at Thu Jan 08 01:07:04 2009

The total number of records returned: 50
executequery completed successfully.
Completed at Thu Jan 08 01:07:09 2009
C:\Informatica\PowerCenter8.6.0\server\bin>

Sunday, May 20, 2012

Partitioning : Things which did not worked for me !

Partitioning - now even heraing this word gives me a shock, for last 20+ days I am quite frustrated, just because of Informatica Partitioning.

I am not sure where it works correctly and actually increases the performance (increases throughput at source and eventually at target), but I definetly know where id does not work. Being optimistic and showing positive attitude , I am describing where I failed and Partitioning this which did not worked for me.

Before describing actual problem, I just walk through the climex ...

I have a mapping, which has source as a csv file (having 5-Million records in it) and target as a partition table (partitioned based on date) and inbtween source and target there are number of tranformations like joiner, sorter, normalizer.

My Manager has forcee the performance problem because in future the source records are going to get increase. Currently source file do not have 5 million records. Earlier he asked my on-site co-ordinator to work on this and to check if adding partitioning at infra level, improves the performance or not. After onsite-corodinartor got released from the project, that task ultimately got assigned to me.

First Hurdle -- At first run, instead of getting any kind of performance gain, mapping started taking hell lot of time. I tried number of things at session paramter level, but not a single thing helped. After that I tried enabling "Collect Performance Statistics" and while going through that realized that data getting loded at target it quite more than expected.

When I looked at it closely, came to know that from one of the relational source, data was coming duplicate. That's because, when we add partition, say 3 for an example. all 3 partitions will fetch the data from relational source, whereas that's not the case with file as a source. In FileSource, if you specify input to only one partition, informatica automatically transfer data to remaining partition, so data will get divided into number of partitions. But in case relational source, if you do not specify proper filter condition at each partiion, same data will be fected by all partitions.

If you check at "Partitioning Joiner Tranformation" there u will find there are 2 types
1. 1:n partitioning
2. n:n partitioning

In case of a joiner tranformation, you can only partition Detail source if you do not want to add partition point at Joiner level that is called as 1:n partitionng

In case of n:n partitioing, you can add Partition Point at Joiner but with that you have to add a partition at master level too. Here, you have to work cauciously. If your master is relational source, add condition 1=2 except one partition to restrict duplicate data.

2. After completing this hurdle, I ran mapping again, and got success at certain level. I would not say throughput / performance was fantastic, but at least much more positive that earlier.

With my past experience, I modified below parameters at session level
a. Default buffer block size -- chaned to 1000000
b. Max Percentage of total memory allowed for auto memory attributes -- changed to 25
c. Commit Interval -- Changed ot 1000000
d. DTM Buffer Size -- Changed to 1GB, 2GB from auto

This helped me to gain performance for single or uptop 4-5 simultaneous sessions. But as I kicked off all 10 workflows together, perforamnce went down, and sometime un-partition sessions were working much better than the partitioned sessions.

3. I had a discussion with Nilesh [Seems I bugged nilesh very much, but for me he was the only guy to expect some help from informatica or oracle point of view], when we checked the session logs, we came to conclusion that buttleneck might be at target level.

At session logs, it was the Target Thread which was always busy more than 98%+ and that too for singnificat amount of time. So, what was the to overcome that.. then we tried Oracle Loader utility.

4. Created Oracle Loader connection at Workflow level, and changed the session paramters. It got failed, tried changing number of things, but all efforts went into vein. I was totally frustrated. Left that option aside, and started working on other prod issues.

Again on next day, when I looked closly at logs, it was telling me that the password is incorrect. So, instead of QA, changed the loader connection to Dev db, and again kicked off the session, but still no success. This time at least control file, input files and logs files got created. Input file was empty, and control file was showing inserted 0 rows.

I was expecting informatica to create 11 input files, and if not same number of control files but in same contorl files I was expecting entries for 11 input files. But that was not the case. There was only 1 input file and same entry was there in control file. Not sure why this is happening, but seems informatica always creates only one control files and only one input file.

Still I have not got any success in performance but lot many questions are there
1. Why performance is going down at the time of insertion? Is that because of Constraints? or because of concurrent insertions?

2. Why informatica creates only 1 contorl and input file? what we have to do if we are running multiple session/tasks in a single workflow?

Will update this blog if I get any answers for above questions, but till the time, for my manger, what every I did is just a waste. For him, whole time/work is good for nothing, end of the day my output is 0. :-(

Tuesday, May 8, 2012

How to pass Target File Name dynamically ?

Had a discussion with Santhosh Garu regarding target file creation but passing its name from mapping itself and appening the name with date. I had a discussion about this with Nilesh earlier, and even I had tried it at that time.

Again today I did it, I passed the file name from mapping with date appened to it. Below are the steps for it.

1. Import the target file and after importing it, click on the "Add FileName Columns to this table icon". Once you click on it, automatically new port will get added to target table.

2. Import source (File or Relational)
3. Creation a simple mapping.

Source --> Expression (to specify the filename) --> Target

In the expression you specify below as a filename'HelloWorld_' || TO_CHAR(SYSTIMESTAMP(),'DD-MM-YYYY')
4. Create a workflow and specify target directory and target filename at session properties level.
5. file name which you give at session level will get over-ridden by file name which you will pass from mapping.

Please watch below video for the same.http://www.youtube.com/watch?v=QCo929SrBsQ&feature=BFa&list=UUZ4jBpZoYcbYNQAOnuV_edg

Bodhi Tree Blogs

Thursday, May 24, 2012

Executing informatica query from command prompt

Sunday, May 20, 2012

Partitioning : Things which did not worked for me !

Tuesday, May 8, 2012

How to pass Target File Name dynamically ?

All about CSS

Pages

Followers