(Hi all, I added this question for information and discussion purposes as it is a common topic in our projects)
I have an integer column which is a business key which requires leading zeros in the key. However, these are lost when I process the integer in the Data Vault. I assume I need to change the integer to a varchar or use a data type with padded zeros before the value or something like that. Where should I do this? Is this actually allowed?
Your business key on the hub should always be a varchar.
This will allow for future flexibility.
1 Like
As Keith says, business keys are not metrics or measures and therefore hub table business keys must always be data typed as text strings.
This should help you convince your team
In this case you should change datatype to varchar. But i dont agree that varchar for each business key should be a rule of thumb. Why pay the price of a datatype that requires more storage if it is not needed? If the key needs to be migrated in the future then it is not a hard task. A rule of thumb would be to use the datatype that is used in the source. That Will enabled hard business rule validation of the business key.
I think you might be living in the past! On Cloud storage is cheap! Business keys must always be varchar to ensure it always integrate when you add more data sources to it who might represent their business key differently. Why risk the rework/refactor when you can solve that up front? Data Vault’s selling point is that you never have to refactor it, setting business keys to an int on your hub almost guarantees you will rework it at some point. Not sure what storage issues you may find when you’re simply dealing with hub table which are by design thin anyway.
Stick with me and you won’t need to refactor your data models!
Why use an integer data type if you never plan to do any calculation on that column?
In DV2.0 the rule of thumb is varchar your bkeys, it might not be a hard task but why spend the time and money on something that would never be an issue?
It’s like building a house without roof because “in this suburb it never rains”. Mate, eventually, it will!
BTW, the hub table is the only place we conform the source column to how we would like to call our business keys; I suggest you do it with your hub business key data type as well
Hello Patrick!
Thank you for your reply. I might live in the past, I don’t know!
I am aware that storage is cheap and has been for a long time. A larger datatype requires more storage and more storage requires more reads from DB and more reads from DB leads to longer execution time for queries. For me it is not a matter of storage, it is a matter of speed.
Even if one of Data Vault’s selling points is that you never have to refactor I think that this is not the case in the real world. In the real world we do mistakes all the time and there is always a need to fix issues that you learned on the way. So having a slim process where you migrate och fix issues along the way makes a better system in the long run, my opinion.
You can make it a real case if you just simply use varchar — this does not need to be more complicated then that, really!
For a customer imagine if we sell, “you never have to refactor”, “oh except business keys!”.
Makes no sense!
Hello Patrik.
Thank you for your reply.
I disagree with the selling point “you never have to refactor”. This implies that you never do mistakes, learn things on the way, etc. Data vault is fantastic when it comes to extending the model, separate data aquisition from reporting, not losing any data. This means that refactoring is possible without losing data, which is incredible. I think you should always be prepared to do some kind of refactoring. If you look into regular development processes, they need to refactor a lot along the way. Why wouldn’t this be the case in Data vault systems? A system that cannot refactor its code or data structures becomes very limited, this because the prize of altering or adding functionality will increase until the business thinks that changes are too expensive and that will be the begin of end for the system.
So I think refactoring at some point is inevitable.
You’re right, I was hoping you were learning something today.