I’m aware of standard to define 0-keys and ghost records for the DV objects.
Do you apply the same in reference objects (reference and reference satellite table)?
By default Ref objects do not have hash keys, therefore 0x0 and 0xf keys cannot be added, but only (missing) or (error) entries in code columns.
Has anyone had similar considerations?
If indeed you are following the preferred method for modelling reference tables (hub + sat), then I would definitely include the zero key and ghost records.
Remember also that hash keys are simply hashed representations of the business key; thus, if you have defined your ghost business key as “0000” or “0x0” or “0” [as an integer], and that definition doesn’t cannot exist in your source, then use that. The same would be true of the null key representations (-1 and -2 in our system for “required but missing” and “not required but missing” respectively).
The question is … why?
Will you be building PITs needing to surface reference data?
Will you be using link structures between ref-hubs and ref-sats?
Ref data should probably be loaded as Ref data, i.e. RED_
The reason I’m thinking about it is to display missing reference data in a consistent way.
E.g. when build an Infomart and a specific dimension, I’d point to a missing entry in REF table rather then left joining and coalescing
- Standard approach to handle missing entries
- satellite left join ref satellite
- CASE WHEN ref.code is null then ‘MISSING’ else coelesce(ref.CodeDescription, ‘ERROR’) END
- With ghost record in the REF SATELLITE this could be simplified because the value would already be populated in the ref satellite.
Again… PITs are for performance, do you think you will be having millions and millions of records in your REF data to justify using PITs over it?
Another one for you WHY EQUIJOINS MATTER!. The Scene | by Patrick Cuba | Snowflake | Medium