AutomateDV vs dataVault4dbt

Dear Data-Vaulters,
I’m about to choose which Data Vault package for dbt my company will use. Do you have experience with AutomateDV and/or datavault4dbt? Which one is better?
To make things more complex, I have hundreds of small data sources, some of which are created manually, so data quality is a great problem for us. This probably makes configurability and flexibility my top priorities in terms of features.

Any thoughts and experiences will be greatly appreciated!

datavault4dbt is a fork of AutomateDV.

We are using AutomateDV.

Are you satisfied? Any obstacles? Do you need to use any workarounds? Or do you consult the community often? :slight_smile:

Anything using dbt needs workarounds… but that’s the beauty of using dbt packages: you can build your own macros.

@kasia: I don’t have an answer for you about the tool, but I do have questions about the complexity you described.

  • Do the hundreds of small data sources all exist to support the same process and solve the same business problem?

  • Do the systematically created ones share a common schema?

  • What percentage of your source inputs are manually created data sources?

  • Are manual data sources the root cause of data quality issues?

You mentioned data quality is something that is a great problem. If that is the case, your choice of ETL tool is not as important as addressing governance and stewardship of your data sources.

I would love to hear your take on my questions.
Best regards,
JF

Hi,

All the sources support the same business process, but they don’t share a schema (each source has a different schema and there is nothing we can do about it). About 5-10% of the sources are manually created, but every source system is prone to human error. Imagine invoicing systems where you enter data manually: the output has a consistent schema and data types, but the content can still be incorrect.

Yes, I know that data governance is important and we’re working on it, but there are things we cannot really change. My research suggests that we’re trying to solve an issue that basically doesn’t exist anywhere else: currently nobody seems to deal with manually created data to this extent.

By the way, I’m not working with invoices. It was just an example.

Thanks for your questions. I’m wondering what your thoughts are now that I’ve answered them :slight_smile:

I’ve tried both. What I found is that, although datavault4dbt was forked from AutomateDV, it has since gone quite a bit further than AutomateDV did. I may not be up to date on the latest AutomateDV, though.

You can easily write some Python code to create the YAML (which is basically Python dicts) for both packages, passing in different parameters for each (they are quite similar but use slightly different naming), and automate the hell out of it. If you have hundreds of small sources you’ll want lots of automation. I did try using the best of both, which is OK until you get to PITs: the datavault4dbt PITs were better last time I looked, but they didn’t work on AutomateDV hubs for me (a different hash key (HK) data format, I think).
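
To give a rough idea of the "one source description, two parameter sets" approach, here is a minimal sketch. Everything in it is an assumption for illustration: the `SOURCES` list, the hash key naming, and the column names are made up, and the macro parameter names (`src_pk`, `src_nk`, `hashkey`, `business_keys`, and so on) are how I remember the hub macros of the two packages, so check them against the docs of the versions you install. The output is JSON, which YAML parsers accept for simple structures like these, just to avoid a third-party dependency; with PyYAML you would normally use `yaml.safe_dump` instead.

```python
import json

# Hypothetical source catalogue. In practice this could be loaded from a
# spreadsheet or a metadata table describing your hundreds of sources.
SOURCES = [
    {"name": "customer", "stage": "stg_customer", "business_keys": ["customer_id"]},
    {"name": "supplier", "stage": "stg_supplier", "business_keys": ["supplier_id"]},
]


def automatedv_hub(src):
    """Build hub metadata using AutomateDV-style parameter names (assumed)."""
    return {
        "source_model": src["stage"],
        "src_pk": f"{src['name'].upper()}_HK",  # hypothetical naming convention
        "src_nk": src["business_keys"],
        "src_ldts": "LOAD_DATETIME",            # hypothetical column name
        "src_source": "RECORD_SOURCE",          # hypothetical column name
    }


def datavault4dbt_hub(src):
    """Build hub metadata using datavault4dbt-style parameter names (assumed)."""
    return {
        "source_models": src["stage"],
        "hashkey": f"hk_{src['name']}_h",       # hypothetical naming convention
        "business_keys": src["business_keys"],
        "src_ldts": "ldts",                     # hypothetical column name
        "src_rsrc": "rsrc",                     # hypothetical column name
    }


def render(src, builder):
    """Render one source's hub metadata as JSON text (also valid YAML here)."""
    return json.dumps(builder(src), indent=2)


if __name__ == "__main__":
    # Print both variants for every source; a real script would write each
    # one into the corresponding dbt model file instead.
    for src in SOURCES:
        for builder in (automatedv_hub, datavault4dbt_hub):
            print(f"# {builder.__name__} for {src['name']}")
            print(render(src, builder))
```

The point of keeping one builder function per package is that the source catalogue stays the single place you edit; switching packages, or trying both side by side, only means swapping the builder.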

Also, I’m not sure we are allowed to pitch, but I provide some services around this, so if you’re interested send me a DM. (Putting this separately in case the mods want to delete this post :slight_smile:)