The "Dynamic Look-up" implementation in any project I have not seen personally. People say that it get used in SCD-2 type implementation, but even that is not mandatory; In my last project, to implement SCD-2 type, they had used static lookup itself.
But, I came across one scenario wherein it was a need to impement Dynamic Lookup. In my project they have used static lookup, when it was giving problem in particular situation.
We had layes or a data flow as below
1. Source System which was OP
2. Staging tables pulling data from OP.
3. Dimention and Facts tables which were pulling data from Staging.
In staging, most of the tables were truncate and reload, and indexes defined on them used to drop and re-create after every staging data load initiation and completion using pl-sql script.
Data was getting picked up for staging load was using below condition at source qualifier.
Select *
from ods_dw_op_per_all_people_f_stg
where ges_update_date >=
(select max(last_successful_completion_date) - 3
from dw_jobs
where map_name = <Hard Coded Value>)
So, here ods_dw_op_per_all_people_f_stg is our source system table. Its actually a synonym at target database. Using ges_update_date condition, we can pick the latest data from source system. This technique will work fine most of the times i.e. 99%. But consider a situation like below.
We have below data which is coming on 11-Jun-2012
Id - 1
Name - Mandar
SurName - Gogate
City - Mumbai
Location - Girgaon
While processing this data from source system into staging tables, some prior dependent jobs got failed and somehow this data did not got inserted into staging tables. Because of which now at next data load again from 11-Jun-2012 data will get picked up.
At source side, on 12-Jun source data got changed..
Old Records were
Id - 1
Name - Mandar
SurName - Gogate
City - Mumbai
Location - Girgaon
New created records at soruce are
Id - 1
Name - Mandar
SurName - Gogate
City - Pune
Location - Bavdhan
So now at source 2 records exists, one with ges_uddate_date as 12-Jun and other with 13-Jun. Our source qualifier for staging load will pick both the records together and will try to insert those into staging level. Since, there is a index defined on employee_id, this insertion will ultimately fail.
In the normal scenario, if job would not have failed... on 11-Jul first records would have got inserted at staging table. and on 12-Jul again even if 2 recrods got picked up by Source Qualifier, those would have got updated [as there is a update strategy defined at mapping level].
The solution for this situation is "Dynamic Looktup"; which will pass only one value ahead.. and in its buffer its buffer itself it will mark rows as insert or update.
But, I came across one scenario wherein it was a need to impement Dynamic Lookup. In my project they have used static lookup, when it was giving problem in particular situation.
We had layes or a data flow as below
1. Source System which was OP
2. Staging tables pulling data from OP.
3. Dimention and Facts tables which were pulling data from Staging.
In staging, most of the tables were truncate and reload, and indexes defined on them used to drop and re-create after every staging data load initiation and completion using pl-sql script.
Data was getting picked up for staging load was using below condition at source qualifier.
Select *
from ods_dw_op_per_all_people_f_stg
where ges_update_date >=
(select max(last_successful_completion_date) - 3
from dw_jobs
where map_name = <Hard Coded Value>)
So, here ods_dw_op_per_all_people_f_stg is our source system table. Its actually a synonym at target database. Using ges_update_date condition, we can pick the latest data from source system. This technique will work fine most of the times i.e. 99%. But consider a situation like below.
We have below data which is coming on 11-Jun-2012
Id - 1
Name - Mandar
SurName - Gogate
City - Mumbai
Location - Girgaon
While processing this data from source system into staging tables, some prior dependent jobs got failed and somehow this data did not got inserted into staging tables. Because of which now at next data load again from 11-Jun-2012 data will get picked up.
At source side, on 12-Jun source data got changed..
Old Records were
Id - 1
Name - Mandar
SurName - Gogate
City - Mumbai
Location - Girgaon
New created records at soruce are
Id - 1
Name - Mandar
SurName - Gogate
City - Pune
Location - Bavdhan
So now at source 2 records exists, one with ges_uddate_date as 12-Jun and other with 13-Jun. Our source qualifier for staging load will pick both the records together and will try to insert those into staging level. Since, there is a index defined on employee_id, this insertion will ultimately fail.
In the normal scenario, if job would not have failed... on 11-Jul first records would have got inserted at staging table. and on 12-Jul again even if 2 recrods got picked up by Source Qualifier, those would have got updated [as there is a update strategy defined at mapping level].
The solution for this situation is "Dynamic Looktup"; which will pass only one value ahead.. and in its buffer its buffer itself it will mark rows as insert or update.
No comments:
Post a Comment