If I understand correctly, according to some articles from Scalefree, I should use the hash key of the unknown zero key as the hash key for the ghost record. This approach allows me to perform an EQUI JOIN on an hierarchical link (optional relationship / parent-child relationship) through the hub (optional / unknown zero key) with the satellite.
HK Name Function
HK(Ghost) NULL NULL
HK(X) Name_X Manager
HK(A) Name_A Technical
HK(B) Name_B Administrative
Select to get the data out
select
Hub_Employee.BK_Employee as Employee_id
, Satellite_Employee.Name
, Satellite_Employee.Function
, Satellite_Manager.Name as Manager
from
Link_Employee_Manager
inner join
Hub_Employee â Role Employee
on Hub_Employee.HK_Employee = Link_Employee_Manager.HK_Employee
inner join
Satellite_Employee â Role Employee
on Hub_Employee.HK_Employee = Satellite_Employee.HK_Employee
inner join
Hub_Employee Hub_Manager â Role Manager
on Hub_Manager.HK_Employee = Link_Employee_Manager.HK_Manager
inner join
Satellite_Employee Satellite_Manager â Role Manager
on Hub_Manager.HK_Employee = Satellite_Manager.HK_Employee
;
Output
If HK(Noref) = HK(Ghost) then output will be as desired
Employee_Id Name Function Manager
A Name_A Technical Name_X
B Name_B Administrative Name_X
X Name_X Manager
If HK(Noref) <> HK(Ghost) then output
Employee_Id Name Function Manager
A Name_A Technical Name_X
B Name_B Administrative Name_X
â Missing record for Employee X
By having zero keys, youâre able to do an equi-join from the link back to the hub. A join to the sat however youâll most likely always want to do using a left join.
Another thing is that youâd need to handle the timelines in your sat when getting the data out. Maybe you omitted it for the sake of simplicity.
But what matters in the end, is that ghost records should only get attention as soon as you start implement PIT tables. Itâs there where you, for every time slice, for every hub record, get the respective record from every satellite. If a certain hub record doesnât have a satellite record for a certain time slice, that is where you then insert the reference to the ghost record, so that when getting the data out, you can select from the PIT and inner join to the satellite (that of course must have that ghost record).
Patrick Cuba was of course way better in explaining this. This article should have the information (although it seems to be for Medium members only nowadays) .
Iâm familiar with Patrick Cubaâs article âData Vault Mysteries⌠Zero Keys & Ghost RecordsâŚâ (full version) and also with the remark that you cannot have descriptive information for a key linked to a null value. But in the scenario mentioned above, I need this information, and I donât want to use a left-join. Thatâs where these two Scalefree articles comes in play: Ghost Records and Zero Keys. What is your opinion?
Using the hash key of the unknown zero key for the ghost record in a hierarchical link scenario can indeed be a valid approach, especially if you are dealing with optional relationships or parent-child relationships in a database schema. This strategy helps in linking the ghost record to the associated hub (optional/unknown zero key) and subsequently performing an EQUI JOIN with the satellite.
Zero keys around a hub + link should also find a record loaded to the adjacent satellite; this means you do not need left joins anywhere
Zero keys can be extended to just -1, you could in theory use the concept to map other nuances around your keys, for example:
-1 is missing, use this for optional portions of the relationship
@patrickcuba: Feeling a bit lost here. How do I create a record for a zero key in the adjacent satellite? Upon table creation? (Naturally is not an option; the presence of the zero key in the link table is a consequence of the link being optional, not due to a null value in the source for that hub). And what about your quote âYou cannot have descriptive content describing nothing!â?
No, zero record attributes would have been mapped already as a part of the satellite table load from your staged content.
Also not correct in wording, a zero-key exists in one or more participants in the link table making that participant optional. Which is why a Link as a M:M construct infact supports 1:M, 1:0, 1:1 cardinalities, where you find the relationship is optional it is filled by that zero key. (hashed -1). This means you will have a zero key in the hub else how does one make the equi-join work between hub and a link?
Yes, so if you have attributes with valid values off a zero-key then has the DV been mapped correctly? Why are you getting descriptor data with no key? Should this be instead trapped in an Error table and prevented from loading into the vault?
Most of Patrickâs articles, which were on Medium, are also available on LinkedIn. If you know the title of the article then just add âLinkedInâ into the search and you can read the articles without having to sign up to Medium.