Data Vault Database Collation - Case Sensitive Or Not

Should the Datawarehouse collation be Case Sensitive or Not?

I am setting up a new database for Raw Data Vault on SQL Server.

The default setting for database server collation is SQL_Latin1_General_CP1_CI_AS

CI=Case Insensitive

I think using Case Insensitive setting has a drawback when any case change happens on the source data, it is not captured as a SCD on the satellite table. It may require some other handling as well when generating ‘hash diffs’ by changing case of the string before generating hash.

In my previous experience, it was Case sensitive and any change in CASE is also captured as a delta.

Would like to check with practitioners here if there is a recommendation of using a particular collation setting for a Datawarehouse. Any leads are welcome

Changes in case will be captured even if you use case insensitive because the calculated hashdiff will still be different. Case sensitivity (afaik) only applies to joining and filtering, not to hashing.

We want to capture those changes mate — even just in case to check the source is not misbehaving

1 Like