We have a HUB_ORDER that integrates ORDER records from Netsuite and Salesforce. The cadence of new order data from the two sources is different, so we have two dbt jobs that load data from these sources into their respective Satellites, SAT_ORDERS_NETSUITE and SAT_ORDERS_SALESFORCE. Sometimes these jobs overlap. HUB_ORDER runs only as part of the Salesforce dbt job and NOT the Netsuite dbt job, to prevent HUB_ORDER from picking up duplicates if the two jobs run at the same time.
Because of this, we frequently hit the situation where the data is in SAT_ORDERS_NETSUITE but the corresponding Business Keys are not yet in HUB_ORDER, which breaks the downstream tables.
How can we update HUB_ORDER as part of both dbt jobs, without the risk of duplicating Business Keys in HUB_ORDER?
This is bad, this is really bad!
The real problem is that the generated code does not take into account that a hub can be loaded by different processes at the same time. Better to fix that root cause than to apply a duct-tape solution.
You have to edit the macro so it generates code that handles concurrency properly. How you do that is not the same on every data platform, so you must study how your SQL engine deals with concurrency.
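As an illustration only, on a platform like PostgreSQL the macro could generate something along these lines. This is a sketch, not actual dbtvault output; the table and column names (staging_orders, order_hk, etc.) are made up for the example, and the locking syntax differs on other engines:

```sql
BEGIN;

-- Serialise concurrent hub loads: if another job holds this lock,
-- we block here until it commits (PostgreSQL syntax).
LOCK TABLE hub_order IN EXCLUSIVE MODE;

-- Insert only the business keys that are not already in the hub.
INSERT INTO hub_order (order_hk, order_bk, load_date, record_source)
SELECT s.order_hk, s.order_bk, s.load_date, s.record_source
FROM staging_orders s
WHERE NOT EXISTS (
    SELECT 1 FROM hub_order h WHERE h.order_hk = s.order_hk
);

COMMIT;
```

The point is that the lock and the NOT EXISTS check run inside the same transaction, so the second job cannot see a half-loaded hub.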
Hi @patrickcuba. Yes, these are all part of the same code, and dbtvault does a UNION ALL. But the dbt loading pattern is that it creates a TEMP table with the new hub hashkeys, which then get loaded into the HUB table. If two jobs run at the same time and each creates its temp table simultaneously, I believe there is a collision.
1. Start a database transaction.
2. Obtain an exclusive lock on the hub table.
3. Do the insert, selecting only the hub keys that do not already exist.
4. Commit the database transaction.
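The pattern above can be sketched with Python's sqlite3 module. This is a minimal demo, not dbtvault code: the table hub_order, the column order_bk, and the helper load_hub are all invented for the example, and SQLite's BEGIN IMMEDIATE stands in for an exclusive table lock:

```python
import sqlite3

def load_hub(conn, business_keys):
    # Steps 1-2: start a transaction that takes a write lock up front;
    # a second writer blocks here until the first one commits.
    conn.execute("BEGIN IMMEDIATE")
    # Step 3: insert only the business keys not already in the hub.
    conn.executemany(
        """
        INSERT INTO hub_order (order_bk)
        SELECT ?
        WHERE NOT EXISTS (SELECT 1 FROM hub_order WHERE order_bk = ?)
        """,
        [(bk, bk) for bk in business_keys],
    )
    # Step 4: commit, releasing the lock for the other job.
    conn.commit()

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE hub_order (order_bk TEXT)")

# Simulate the Netsuite job and the Salesforce job loading overlapping keys.
load_hub(conn, ["ORD-1", "ORD-2"])
load_hub(conn, ["ORD-2", "ORD-3"])  # ORD-2 must not be duplicated

rows = sorted(r[0] for r in conn.execute("SELECT order_bk FROM hub_order"))
print(rows)  # ['ORD-1', 'ORD-2', 'ORD-3']
```

Run the loads in either order, or repeatedly, and the hub still ends up with one row per business key.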
If this code is run by two different processes at the same time, one will block at step 2 until the other transaction has finished. That is because two different processes cannot hold an exclusive table lock simultaneously; one of them always has to wait for the other.
If you don't code your hub load pattern like this, there is a chance that one process reads stale data, similar to a dirty read. And if you have added a unique constraint on the hub primary key, you risk a unique key violation error.
Hope this helps.