Hi All,
We have encountered a few special column names that are reserved words(keywords), and numeric column names like 238567 from the source tables. What is the best practice for handling these columns in the raw vault? Are we allowed to rename them appropriately in the raw vault, or should they renamed only in the later stages?
Regards,
S
TIA
I reckon youâd need to see what the platform supports⌠I have not seen many that support numeric column names, you might need to set a standard that such columns be renamed with a prefix to make it work â like a double underscore or somethingâŚ
1 Like
We are currently using Redshift, which supports numeric column names. To avoid using quotes(") when reading the columns, I want to rename the column. Is column renaming an anti-pattern in Datavault?
Ah I see. The DV2.0 recommendation is to not change it, youâll have to make a decision here.
1 Like
This could be cast as a hard business rule - a necessary change to be able to load the data. Try to keep the column names as similar to the source as possible to facilitate the audit trail - e.g. add a prefix to the coulmn name x_ or something like that.
4 Likes
Reopening this topic with a follow-on queryâŚ
If the recommendation is not to rename the source columns as they land into the raw vault layer unless you absolutely have no choice (due to target platform either not supporting the characters or a conflict with reserved words) where should they be renamed? Business Vault, Information Mart (fact/dim attributes), or (shock) within the reporting tool? Iâm leaning towards the Information Mart as I donât think simply renaming a column is really a business rule.
The source systems often have pretty unfriendly/unhelpful names (âorder numberâ might be âohnmrâ, or a custom source field configured in a COTS solution to be âmarital statusâ might be âattribute1â), so they need to be translated somewhere.
No renaming in BV, BV is sparsely modelled and the outcome of idempotent business rules.
Names changes like - to _ is trivial and a DB limitation â and thatâs ok. SAP tables can be difficult to understand but I have seen customers use a SAP mapping table to rename those to something sensible, this too is ok because. the SOURCE still controls that naming standard.
If you really want to rename RV columns why not do it in a view? The view is NOT BV, itâs what can call in DDD language, an âanti-corruption layerâ. 1. Data Vault and Domain Driven Design | by Patrick Cuba | Medium
Yeah I donât know where this sits in the broader methodology but it sounds like a bit of a pedantic thing to try to look through a book for. Changing these columns names in staging or ingestion isnât going to cause you any pattern breaking consequences, Iâd squash this problem early in the process rather than trying to use DV2.0 methodology to solve these issues but others have expressed different opinions.
At the end of the day these arenât really data issues since it doesnât actually touch any of the column contents. So itâs more of a preference on how you want to organize your data.
You should never feel shy about engaging the source to fix / modify things. I think this is important because âthe data vault is not the repository of technical debtâ.
1 Like
Being realistic, are commercial software packages suddenly going to change their data models to suit some downstream endeavour? No. Good luck renaming some legacy on-prem ERP data model to friendlier names - youâll break it and it wonât be supported. For SaaS products you donât get access to the data model anyway - youâll be relying on data extracts or REST API calls etc. where hopefully the object and attribute names may be closer to how they are labelled on screen in those packages. Hence it is about what is realistic to achieve. Nowhere are we advocating changes to the semantics of the content of the data - it is about getting the naming of these domain attributes aligned to how the business actually operates so what is ultimately presented to consumers isnât nonsensical. We create hubs based on the business view of the world not how the source system is modelled and the business key will not necessarily be named as per the originating source system name, especially as it designed for passive integration from multiple places.
I think the most important thing is traceability and ensuring we know that something in our model labelled âorder_nameâ originated from something called âohnameâ etc.
A customer of mine would be keen to deal with aligning labelling of attributes to the âreal worldâ as part of the staging process and theyâre quite happy to create a view against the original landed data where they deal with this relabelling, so to preserve traceability. So I guess it is down to whether you do it there or after the load into the raw data vault layer. I guess there is also a risk the âreal worldâ view ends up being departmental and not enterprise - so relabelling up front may create technical debt later on - hence my original thought on dealing with this in the information mart layer where domain focussed data sets are getting presented to specific groupings of individuals. It just means you may end up repeating the relabelling if the attribute names end up being more enterprise-wide after allâŚ
Of course not, I do a talk where I include 3 definitions from DDD to categorize this relationship. Part of âstrategic designâ, https://youtu.be/EKuXFTkK2Yw?t=839
Thatâs easy, RV will have that name, lineage is tracked by platform for catalogue tool
It will