Zero key as ghost record hash key?

Hello,

If I understand correctly, according to some articles from Scalefree, I should use the hash key of the unknown zero key as the hash key for the ghost record. This approach allows me to perform an EQUI JOIN on an hierarchical link (optional relationship / parent-child relationship) through the hub (optional / unknown zero key) with the satellite.

Is this approach correct?

Thank you for providing your insights.

Can you give some example source data, and specify how you loaded it into your Data Vault and for what scenario you want to get the data out, and how?

Scenario: Employer reporting to Manager

Source

Employee_Id Manager_id Name Function
A X Name_A Technical
B X Name_B Administrative
X NULL Name_X Manager

Hub_Employee

HK_Employee BK_Employee
HK(X) X
HK(A) A
HK(B) B
HK(NoRef) Noref

Link_Employee_Manager

HK_Link HK_Employee HK_Manager
HK(A|X) HK(A) HK(X)
HK(B|X) HK(B) HK(X)
HK(X|NoRef) HK(X) HK(NoRef)

Satellite_Employee

HK Name Function
HK(Ghost) NULL NULL
HK(X) Name_X Manager
HK(A) Name_A Technical
HK(B) Name_B Administrative

Select to get the data out

select
Hub_Employee.BK_Employee as Employee_id
, Satellite_Employee.Name
, Satellite_Employee.Function
, Satellite_Manager.Name as Manager
from
Link_Employee_Manager
inner join
Hub_Employee – Role Employee
on Hub_Employee.HK_Employee = Link_Employee_Manager.HK_Employee
inner join
Satellite_Employee – Role Employee
on Hub_Employee.HK_Employee = Satellite_Employee.HK_Employee
inner join
Hub_Employee Hub_Manager – Role Manager
on Hub_Manager.HK_Employee = Link_Employee_Manager.HK_Manager
inner join
Satellite_Employee Satellite_Manager – Role Manager
on Hub_Manager.HK_Employee = Satellite_Manager.HK_Employee
;

Output

If HK(Noref) = HK(Ghost) then output will be as desired
Employee_Id Name Function Manager
A Name_A Technical Name_X
B Name_B Administrative Name_X
X Name_X Manager

If HK(Noref) <> HK(Ghost) then output
Employee_Id Name Function Manager
A Name_A Technical Name_X
B Name_B Administrative Name_X
→ Missing record for Employee X

Hence initial question!

By having zero keys, you’re able to do an equi-join from the link back to the hub. A join to the sat however you’ll most likely always want to do using a left join.

Another thing is that you’d need to handle the timelines in your sat when getting the data out. Maybe you omitted it for the sake of simplicity.

But what matters in the end, is that ghost records should only get attention as soon as you start implement PIT tables. It’s there where you, for every time slice, for every hub record, get the respective record from every satellite. If a certain hub record doesn’t have a satellite record for a certain time slice, that is where you then insert the reference to the ghost record, so that when getting the data out, you can select from the PIT and inner join to the satellite (that of course must have that ghost record).

Patrick Cuba was of course way better in explaining this. This article should have the information (although it seems to be for Medium members only nowadays) .

Also, could you post the link to those Scalefree articles? I’m intrigued :slight_smile:

I’m familiar with Patrick Cuba’s article ‘Data Vault Mysteries… Zero Keys & Ghost Records…’ (full version) and also with the remark that you cannot have descriptive information for a key linked to a null value. But in the scenario mentioned above, I need this information, and I don’t want to use a left-join. That’s where these two Scalefree articles comes in play: Ghost Records and Zero Keys. What is your opinion?

My opinion is that I have no problem with doing left joins :wink:

1 Like

Using the hash key of the unknown zero key for the ghost record in a hierarchical link scenario can indeed be a valid approach, especially if you are dealing with optional relationships or parent-child relationships in a database schema. This strategy helps in linking the ghost record to the associated hub (optional/unknown zero key) and subsequently performing an EQUI JOIN with the satellite.

Scalefree seems to obscuring the purpose of Zero Keys and Ghost Records. They must be seperate, https://www.linkedin.com/pulse/data-vault-mysteries-zero-keys-ghost-records-patrick-cuba

Zero keys around a hub + link should also find a record loaded to the adjacent satellite; this means you do not need left joins anywhere
Zero keys can be extended to just -1, you could in theory use the concept to map other nuances around your keys, for example:

Ghost records should never be mixed with Zero keys; only used for equi joins for sats surrounding a hub or link.

@patrickcuba: Feeling a bit lost here. How do I create a record for a zero key in the adjacent satellite? Upon table creation? (Naturally is not an option; the presence of the zero key in the link table is a consequence of the link being optional, not due to a null value in the source for that hub). And what about your quote ‘You cannot have descriptive content describing nothing!’?

1 Like

No, zero record attributes would have been mapped already as a part of the satellite table load from your staged content.

Also not correct in wording, a zero-key exists in one or more participants in the link table making that participant optional. Which is why a Link as a M:M construct infact supports 1:M, 1:0, 1:1 cardinalities, where you find the relationship is optional it is filled by that zero key. (hashed -1). This means you will have a zero key in the hub else how does one make the equi-join work between hub and a link?

Yes, so if you have attributes with valid values off a zero-key then has the DV been mapped correctly? Why are you getting descriptor data with no key? Should this be instead trapped in an Error table and prevented from loading into the vault?

Hi Frenk,

Most of Patrick’s articles, which were on Medium, are also available on LinkedIn. If you know the title of the article then just add “LinkedIn” into the search and you can read the articles without having to sign up to Medium.

1 Like