Query optimization for the Link SAT with Dependent Child

Hello all,

We have a LINK SAT which has multi-field Dependent Child. These multiple fields grouped, define a single LINE ITEM.

Querying this LINK SAT to get the latest record for each line item is extremely slow as we have to partition by

  qualify row_number() over(
    partition by
      hashkey
     , prime_dep_child_no
     , sub_dep_child_no
     , base_dep_child_no
    order by
      load_date desc
  ) = 1;

What is a good way to optimize this query?

Thanks,
Saqib

The more you add as Dep-Keys the slower it will be!

Judging by your code you’re looking for a single record for hk + all the dep-keys… should just the hk suffice? I mean you can make it two step, get the latest record by hk, then get the dep-records you want? no?

@patrickcuba we need the latest record for all the line items…

If this is a Link Sat, then you are storing the dependent keys in the link itself, correct? From there, you are concatenating and hashing all the keys (including dependents) as the link_hk. Thus, you are storing the link_hk + ldts in the satellite, correct? This would eliminate the need for using the row_number function, since you’ll have the composite hash and load date together in the Link Sat.

Am I way off base in my understanding of the situation?

Hello Saqib!
Without knowing on what kind of environment you are running on my suggestion might not be faster. But try to extract a subset in a inner query and then join back to the table.
ex of inner query;
select max(loaddate), hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
group by hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no

full query:
select x,y,z
from satellite s
inner join (
select max(loaddate) as ld, hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
group by hashkey, prime_dep_child_no, sub_dep_child_no, base_dep_child_no
) lastrow on lastrow.ld = s.load_date and lastrow.hashkey = s.hashkey and lastrow.prime_dep_child_no = s.prime_dep_child_no and lastrow.sub_dep_child_no = s.sub_dep_child_no

Without the use of a window function you are able to reduce the risk table spooling (MS SQL Server).

Another note is that if the childfields would reside in a Link then the complexity of extracting data would be less complex (and faster).

Another note2: It is strange (my opinion) that you choose to have the same grain for order and orderlines or is it something that I missed here?

Hope this was usefull for you!

Thanks @Nicruzer and @AHenning . We have the dependent child(ren) as part of the LINKHASHKEY. So I should probably be using the LINKHASHKEY to partition by, rather than one of the Business HK participating in the LINK. Let me try that out.

Thanks again.

1 Like

Dep-key in used to create the link-hash-key???

the use of QUALIFY statement means it’s not SQL Server

yup. is that not correct?

dep-keys are degenerate dimensions that better qualify a relationship.
If you include them in the link-hash-key generation they no longer represent the participating business entities and the link-sat itself will be defined at a deeper grain that the relationship depicted in the link.

You will effectively create a link record for every combination of participating hub-hash-keys + dep-keys and the link ceases to be a unique list of relationships

it’s part of the reason why you would choose to either have the Dep-Key in the Link vs in the Sat…

in the link — these attributes are applicable to the relationship forever and never change
in the sat – these attributes change and we track those in the sat+dep-key design

That way, in choosing one or the other with the desired purpose, your model will scale

@patrickcuba , see here:

per @cmeyersohn , the dependent child(ren) should be part of the LINK HASHKEY generation.

yea i know!

:slight_smile:

up to you

1 Like