Sunday, May 20, 2012

Partitioning: Things That Did Not Work for Me!

Partitioning - now even hearing this word gives me a shock. For the last 20+ days I have been quite frustrated, just because of Informatica partitioning.

I am not sure where it works correctly and actually increases performance (increases throughput at the source and eventually at the target), but I definitely know where it does not work. Being optimistic and keeping a positive attitude, I am describing where I failed: the partitioning things that did not work for me.

Before describing the actual problem, let me walk through the setup...

I have a mapping whose source is a CSV file (with 5 million records in it) and whose target is a partitioned table (partitioned by date). In between the source and target there are a number of transformations: a joiner, a sorter, a normalizer.

My manager had foreseen the performance problem, because in the future the source record count is going to increase; currently the source file does not even have 5 million records. Earlier he asked my on-site coordinator to work on this and to check whether adding partitioning at the Informatica level improves performance or not. After the on-site coordinator was released from the project, that task ultimately got assigned to me.

First hurdle -- At the first run, instead of any kind of performance gain, the mapping started taking a hell of a lot of time. I tried a number of things at the session-parameter level, but not a single one helped. After that I enabled "Collect Performance Statistics", and while going through the statistics I realized that the data getting loaded at the target was quite a bit more than expected.

When I looked at it closely, I came to know that the data from one of the relational sources was coming in duplicated. That is because, when we add partitions (say 3, for example), all 3 partitions fetch the data from the relational source, whereas that is not the case with a file source. With a file source, if you supply input to only one partition, Informatica automatically distributes the data to the remaining partitions, so the data gets divided across the partitions. But in the case of a relational source, if you do not specify a proper filter condition at each partition, the same data will be fetched by all the partitions.
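To make the difference concrete, here is a minimal Python sketch (not Informatica code, just a simulation of the behaviour described above): a file source gets split across partitions, while a relational source with no per-partition filter is read in full by every partition.

```python
# Simulation only: shows why an unfiltered relational source is read in
# full by every partition (duplicating data), while a file source is
# divided across partitions.

ROWS = [1, 2, 3, 4, 5, 6]   # pretend result set / file contents
PARTITIONS = 3

# File source: Informatica distributes the rows across partitions.
file_partitions = [ROWS[i::PARTITIONS] for i in range(PARTITIONS)]

# Relational source with no per-partition filter: each partition
# issues the same SELECT and receives the full result set.
db_partitions = [list(ROWS) for _ in range(PARTITIONS)]

total_file_rows = sum(len(p) for p in file_partitions)   # no duplicates
total_db_rows = sum(len(p) for p in db_partitions)       # data tripled

print(total_file_rows, total_db_rows)  # 6 18
```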

If you check "Partitioning the Joiner Transformation", you will find there are 2 types:
1. 1:n partitioning
2. n:n partitioning

In 1:n partitioning, you partition only the detail source and do not add a partition point at the joiner.

In n:n partitioning, you add a partition point at the joiner, but with that you have to add partitions on the master pipeline too. Here you have to work cautiously: if your master is a relational source, add the condition 1=2 to every partition except one, to restrict duplicate data.
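The 1=2 trick can be sketched in a few lines of Python. This is purely illustrative (the table and column names are made up, and Informatica's source-filter override is a GUI setting, not an API): one partition keeps an always-true predicate and reads the real data, while the others get an always-false predicate so they return nothing.

```python
# Sketch: per-partition source-filter overrides for the master pipeline
# of an n:n partitioned joiner. Only partition 0 reads real data; the
# dummy "1 = 2" predicate silences the other partitions so the master
# rows are not duplicated. Query text is illustrative only.

PARTITIONS = 3
base_query = "SELECT id, amount FROM master_table WHERE {filter}"

queries = []
for p in range(PARTITIONS):
    # one real partition, the rest filtered out with an always-false predicate
    filt = "1 = 1" if p == 0 else "1 = 2"
    queries.append(base_query.format(filter=filt))

for q in queries:
    print(q)
```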

2. After clearing this hurdle, I ran the mapping again and got some success. I would not say the throughput / performance was fantastic, but it was at least much more positive than earlier.

Based on my past experience, I modified the below parameters at the session level:
a. Default buffer block size -- changed to 1,000,000
b. Max percentage of total memory allowed for auto memory attributes -- changed to 25
c. Commit interval -- changed to 1,000,000
d. DTM buffer size -- changed from auto to 1 GB, then 2 GB
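Rather than guessing 1 GB or 2 GB, the DTM buffer can be roughly sized from the block size. The sketch below uses an assumed rule of thumb of about 2 buffer blocks per source/target per partition; this is my own back-of-the-envelope, so verify the exact formula against the performance tuning guide for your Informatica version.

```python
# Rough DTM buffer sizing sketch. The "2 blocks per source/target per
# partition" factor is an assumption, not an official formula -- check
# the Informatica performance tuning guide for your version.

def dtm_buffer_bytes(block_size, sources, targets, partitions, blocks_each=2):
    """Estimate the DTM buffer size needed for one session."""
    blocks = (sources + targets) * partitions * blocks_each
    return block_size * blocks

# e.g. block size 1,000,000 bytes, 2 sources, 1 target, 3 partitions
size = dtm_buffer_bytes(1_000_000, 2, 1, 3)
print(size)  # 18000000 bytes, well under the 1-2 GB I was setting
```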

This helped me gain performance for a single session, or up to 4-5 simultaneous sessions. But as soon as I kicked off all 10 workflows together, performance went down, and sometimes the unpartitioned sessions worked much better than the partitioned ones.

3. I had a discussion with Nilesh [it seems I bugged Nilesh a lot, but for me he was the only guy I could expect some help from, on both the Informatica and Oracle side]. When we checked the session logs, we came to the conclusion that the bottleneck might be at the target level.

In the session logs, it was the target (writer) thread that was always more than 98% busy, and that too for a significant amount of time. So, what could be done to overcome that? We tried the Oracle loader utility.
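Spotting the bottleneck thread in a session log can be scripted. The sketch below parses busy percentages and picks the busiest thread; note the log lines here are paraphrased from memory, not the exact Informatica session-log format, so the regex would need adjusting for real logs.

```python
# Sketch: find the bottleneck thread from session-log busy percentages.
# The log lines below are paraphrased, NOT the exact Informatica format.
import re

log = """\
Thread [READER_1_1_1] ... Total Run Time = [120.0] ... Busy Percentage = [12.5]
Thread [TRANSF_1_1_1] ... Total Run Time = [120.0] ... Busy Percentage = [35.0]
Thread [WRITER_1_1_1] ... Total Run Time = [120.0] ... Busy Percentage = [98.7]
"""

busy = {}
for m in re.finditer(r"Thread \[(\w+)\].*Busy Percentage = \[([\d.]+)\]", log):
    busy[m.group(1)] = float(m.group(2))

# The thread with the highest busy percentage is the likely bottleneck.
bottleneck = max(busy, key=busy.get)
print(bottleneck, busy[bottleneck])  # WRITER_1_1_1 98.7
```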

4. I created an Oracle Loader connection at the workflow level and changed the session parameters. The session failed; I tried changing a number of things, but all my efforts were in vain. I was totally frustrated, so I left that option aside and started working on other prod issues.

The next day, when I looked closely at the logs, they told me that the password was incorrect. So, instead of QA, I changed the loader connection to the Dev DB and kicked off the session again, but still no success. This time at least the control file, input file and log files got created, but the input file was empty and the control file showed 0 rows inserted.

I was expecting Informatica to create 11 input files and, if not the same number of control files, then at least entries for all 11 input files in one control file. But that was not the case: there was only 1 input file, and only that one entry in the control file. I am not sure why this happens, but it seems Informatica always creates only one control file and only one input file.
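What I was expecting can be shown with a small sketch: split one big source into N per-partition loader input files, round-robin. This is my own illustration of the expected behaviour, not anything Informatica actually does or any real API.

```python
# Sketch of the expected (but not observed) behaviour: split one source
# into N per-partition loader input buckets, round-robin.

def split_for_partitions(rows, n_parts):
    """Round-robin rows into n_parts buckets, one per loader input file."""
    buckets = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        buckets[i % n_parts].append(row)
    return buckets

rows = [f"row{i}" for i in range(25)]
buckets = split_for_partitions(rows, 11)   # one bucket per partition
print([len(b) for b in buckets])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2]
```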

I still have not had any success with performance, and a lot of questions remain:
1. Why does performance go down at the time of insertion? Is that because of constraints, or because of the concurrent insertions?

2. Why does Informatica create only 1 control file and 1 input file? What do we have to do if we are running multiple sessions/tasks in a single workflow?

I will update this blog if I get any answers to the above questions, but until then, for my manager, whatever I did is just a waste. For him the whole time/work is good for nothing; at the end of the day my output is 0. :-(


1 comment:

  1. Plz don't say output is zero. You got to learn a lot of things because of this issue. Only such incidents force us to try something new; otherwise you wouldn't have tried all these options. Rarely do we get a chance to do these kinds of experiments. Be positive. I'm B +ve :) Hats off to your attitude for not giving up so easily!
