Just a quickie to get peoples thoughts on do you stick with source field names when populating SATs?
I regularly ingest from multiple sources, iSeries (DB2), SQL Server, Excel etc and our DV resides in Snowflake. As the bulk of our data comes from the iSeries, all field names are uppercase (yet meaning less e.g. PKHU16) and happily sit in snowflake. SQL Server and Excel though, field names come in a multitude of different case variations and odd spacings and separators which may or may not play nice in Snowflake.
In order to tick the box of being able to replicate source, any source file that I know is mixed case or contains odd chars I just slap " around them. Obviously hub keys have a common name used across all systems, so that’s not an issue (and they are all upper case). Preferably I’d like to UPPER every field name I ingest as I can replicate the source data, I just don’t like ugly field name
Be interested to know how people deal with this.
Something I have been struggling with as well. On one hand, you shouldn’t change source attribute names (except for the Hub BKs as you pointed out), on the other hand, I also want to adhere to my own naming and casing conventions.
In the end I chose to model everything in snake case (our enterprise technical data modelling convention) which means I have to change any upper case attribute that I’m sourcing (which ain’t bad) but also all Pascal/Camel cased attributes (which could be considered a bit bad).
Thanks for your thoughts Frenk, I caved in and kept the source casing (even though it hurts my eyes!) and leave column name fabrication to the information layer modellers - that way I can happily say that what is in my DV is true to source. Naming conventions go out the window when you give an empty application database to a dev team and say “fill your boots”, bites you in the bum when you need to then integrate the contents back into a DW If only I had a dedicated business architect/data architect per application project to build a sensible database for them… oh wait that’s me! We have a large IT section split into various dev areas, data team is probably the smallest naturally - I can’t govern them all when it comes to creating tables sadly!
Yup stick to source columns in raw vault satellites and adhere (as you have done) to the platform’s column naming reqs.
You’ll start defining naming standards in your business vault and information marts.