More than one record in SAT for a Business Key with the same LOAD_DATE in a single load

Thanks @cmeyersohn . The reason I was asking is because I thought that the BK + LOAD_DATE combination should be the PK of the SAT, with Multi-active SAT being an exception.

Remember, the HUB PK and LDTS are required as part of the standard. To that standard, you may add optional columns/fields to ensure uniqueness - for exactly the situation you have encountered. A satellite does not have to be a multi-active satellite to require additional components be added to the primary composite key set. Just like the situation you currently find yourself in, you must have another component in the PK composite key set to insert multiple records that are received in the same load. In your case, you also have the data that indicates when the record was changed. If the modification timestamp is at the correct level of atomicity so that it is unique from one record to the next for the same BK, then you should use it to ensure that you can both order the descriptive records correctly and establish the uniqueness required to insert those records on the same Load Date.
Also, keep in mind that is this were a Multi-Active satellite, you would not be computing the hash diff at all and would use a sequence number or rowid during the insert, and would insert all descriptive data with every load because there would be no way of determining changes, deletes, etc., that may have occurred in the source.
Because you have a modification date coming from the source and it is provided to you as part of your data feed, this data does not qualify as a multi-active satellite.
Respectfully,
Cindi

3 Likes

Thanks @cmeyersohn . This really helps and clarifies the confusion I had. Really appreciate you taking time to detailed responses. Thanks.

Apologies for not seeing this sooner! (Also great to see you at WWDVC 2022 by the way!)

It is supported in that you can add anything you like to your composite primary key in your hashed column configuration in the stage, but we donā€™t have a parameter for it specifically.

We will be adding the ability for people to add arbitrary columns (similar to a payload) to any table via a new parameter. This will account for things such as people wanting sequence identifiers being materialised as columns in the final table.

For now you can just add it to the payload and ensure it is part of your primary key definition in your stage. Hope this helps.

1 Like

Thanks @alex.higgs . It was nice meeting you at WWDVC as well!

Is there timeline for the ability to add arbitrary columns, specifically for auto-increment, in DBTVAULT?

Thanks,
Saqib

Hello @cmeyersohn, I am keen to understand a bit more on this. Can this structure be part of all satellites just to ensure such cases can be handled in future if it so happens.i.e. HUB BK + LDTS + MODIFIED_DT as primary key for all satellites to maintain consistency across all satellite table structures.

I am asking this because doing such change for specific satellites will introduce conditional logic to load specific satellite tables in a certain way and would introduce conditional architecture.

@nikunj Sorry for the long delay to your question. To answer your question, no adding this third piece to the two part composite primary key standard is only used where necessary, and in Saqibā€™s case it is necessary. Adding additional columns to a satellite structureā€™s primary key is not a break in the standards nor does it introduce conditional architecture. Modifications to a physical structure are permitted for specific reasons - and in Saqibā€™s scenario, he has data that ā€œsort ofā€ meets the criterion of ā€œmulti-activeā€ satellite data in that multiple changes are occurring, all of which must be captured as delta changes AND all of which are arriving in the same load. Using the Parent HK and LDTS as the only components in the composite primary key of the satellite will simply, technically, not work. Since the data set he is working has an element that repeatedly, consistently establishes uniqueness to the record when loaded, then it can be added to the composite PK of the satellite. Using a rownum or generated sequence identifier only adds unnecessary processing to the load. There is no additional computation required to simply make the PK of this specific satellite a 3-part composite key made up of the BK hash, LDTS, and MODIFIED_DT. It doesnā€™t break the standards and it doesnā€™t introduce conditional architecture.

1 Like

@cmeyersohn, to expand this discussion further, how does adding the third column to the primary key of the satellite affect any links based on the hub/satellite? It appears (to me) that you would add the third column to the hash key calculation for the link primary key. Then, for query purposes, would you add the third column to the link table, or a satellite hanging off the link?

HI @mauryhammel ,

I donā€™t think @cmeyersohn is suggesting to alter the structure of the business key / hub_hashkey. That should be based on the Business Concept. The third field is just used as part of the PK for the SAT. @cmeyersohn , please correct me, if I misstated that.

Thanks,
Saqib

@saqib, Iā€™m not suggesting changing the hub, but asking what happens with links. Letā€™s say there is a column (c2) in the source table that point to a different hub that would be the basis for a link. If you need an additional column (c1) to denote the primary key for a satellite, then the link column (c2) could have different values for different values of column (c1) with the same (original) hub hashkey. Do you use column (c1) as part of the primary key/hashkey for the link based on column (c2)?

Hopefully that is clearer?

Thanks,

Hi @mauryhammel .

The LINK should only contain HUB_HASHKEYs, and dependent children if any.

The SAT hanging from the LINK can contain other attributes.

Saqib

Nothing mateā€” links only have hashkeys and a few other DV metadata columns

The light has come on/brainfreeze thawed outā€¦ :grinning: I understand now. Thanks all for your patience.