Query optimization for the Link SAT with Dependent Child

saqib · 6 June 2022 21:47

Hello all,

We have a LINK SAT which has multi-field Dependent Child. These multiple fields grouped, define a single LINE ITEM.

Querying this LINK SAT to get the latest record for each line item is extremely slow as we have to partition by

  qualify row_number() over(
    partition by
      hashkey
     , prime_dep_child_no
     , sub_dep_child_no
     , base_dep_child_no
    order by
      load_date desc
  ) = 1;

What is a good way to optimize this query?

Thanks,
Saqib

patrickcuba · 6 June 2022 23:34

The more you add as Dep-Keys the slower it will be!

Judging by your code you’re looking for a single record for hk + all the dep-keys… should just the hk suffice? I mean you can make it two step, get the latest record by hk, then get the dep-records you want? no?

saqib · 6 June 2022 23:52

@patrickcuba we need the latest record for all the line items…

Nicruzer · 7 June 2022 05:06

If this is a Link Sat, then you are storing the dependent keys in the link itself, correct? From there, you are concatenating and hashing all the keys (including dependents) as the link_hk. Thus, you are storing the link_hk + ldts in the satellite, correct? This would eliminate the need for using the row_number function, since you’ll have the composite hash and load date together in the Link Sat.

Am I way off base in my understanding of the situation?

AHenning · 7 June 2022 09:35

Hello Saqib!
Without knowing on what kind of environment you are running on my suggestion might not be faster. But try to extract a subset in a inner query and then join back to the table.
ex of inner query;
select max(loaddate), hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
group by hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no

full query:
select x,y,z
from satellite s
inner join (
select max(loaddate) as ld, hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
group by hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
) lastrow on lastrow.ld = s.load_date and lastrow.hashkey = s.hashkey and lastrow.prime_dep_child_no = s.prime_dep_child_no and lastrow.sub_dep_child_no = s.sub_dep_child_no

Without the use of a window function you are able to reduce the risk table spooling (MS SQL Server).

Another note is that if the childfields would reside in a Link then the complexity of extracting data would be less complex (and faster).

Another note2: It is strange (my opinion) that you choose to have the same grain for order and orderlines or is it something that I missed here?

Hope this was usefull for you!

saqib · 7 June 2022 12:43

Thanks @Nicruzer and @AHenning . We have the dependent child(ren) as part of the LINKHASHKEY. So I should probably be using the LINKHASHKEY to partition by, rather than one of the Business HK participating in the LINK. Let me try that out.

Thanks again.

patrickcuba · 7 June 2022 22:23

Dep-key in used to create the link-hash-key???

patrickcuba · 7 June 2022 22:23

the use of QUALIFY statement means it’s not SQL Server

saqib · 7 June 2022 22:35

yup. is that not correct?

patrickcuba · 7 June 2022 22:53

dep-keys are degenerate dimensions that better qualify a relationship.
If you include them in the link-hash-key generation they no longer represent the participating business entities and the link-sat itself will be defined at a deeper grain that the relationship depicted in the link.

You will effectively create a link record for every combination of participating hub-hash-keys + dep-keys and the link ceases to be a unique list of relationships

patrickcuba · 7 June 2022 22:56

it’s part of the reason why you would choose to either have the Dep-Key in the Link vs in the Sat…

in the link — these attributes are applicable to the relationship forever and never change
in the sat – these attributes change and we track those in the sat+dep-key design

That way, in choosing one or the other with the desired purpose, your model will scale

saqib · 7 June 2022 22:59

@patrickcuba , see here:

per @cmeyersohn , the dependent child(ren) should be part of the LINK HASHKEY generation.

patrickcuba · 7 June 2022 23:20

yea i know!

up to you

Topic		Replies	Views
Non-historized or Sat Dependent Child Key Data Vault 2.0	6	290	9 February 2024
Should Business Key be included in the SAT HASHDIFF? Also what about Depedent Child in the HASHDIFF for LINK SAT Data Vault 2.0 business-key , dependent-child , link , hashdiff , link-satellite	18	1196	21 February 2023
Modeling a Row vault from an OLTP having composite primary key. Data Vault 2.0 business-key , dependent-child , link , hub	3	468	15 June 2023
Optimizing SAT Loading from Staging as a VIEW in Snowflake Data Vault 2.0	15	513	5 August 2022
Modeling a dependent child Data Vault 2.0 dependent-child , link	11	4044	13 March 2022

Query optimization for the Link SAT with Dependent Child

Related topics