Data Vault Integration for Dashboard Data Extraction

Hello everyone,

I’m working on a Data Vault implementation that currently includes Hubs, Links, and Satellites, including multi-active satellites and satellite links. The model is still in the development phase, so we’re focusing on relatively simple use cases.

I collaborate with a Java team that needs to extract data from the Raw Vault to present it on a dashboard. The data they require comes from:

  • A few Hubs
  • Two Satellites

I’m relatively new to Data Vault modeling and would like some guidance on the best practices for this scenario. Specifically:

  1. Should I create PIT and Bridge tables?
  • While I understand these structures improve query performance and simplify data consumption, I’m wondering if they are necessary at this early stage, given that the data requirements are straightforward.
  1. Would it be acceptable to create a view with joins on the Raw Vault tables?
  • My idea is to define a view to join the required Hubs, Links, and Satellites for the requested data. This would keep the logic in the database and provide a single source of truth for the Java team.
  1. Should I implement a Stored Procedure for parameterized data retrieval?
  • I’m considering creating a Stored Procedure that accepts parameters (e.g., specific keys or date ranges) and performs the necessary filtering. The Java team could call this procedure to fetch the data dynamically.

Since we’re still building the Data Vault and don’t yet have complex scenarios, I’m looking for recommendations on how to approach this:

  • Is it better to keep things simple with views and procedures for now, or should I start implementing PIT and Bridge tables early?
  • Are there any other best practices I should consider?

Thank you in advance for your suggestions!

Hi tg3

At this early stage of your project, I would recommend you stay with Views
using joins of Hubs/Sats/Links.

Then maybe move on to parameterized Stored Procs if they suit your Java DEV team needs.

PIT & Bridge tables are useful when you have already loaded a large volume of Hubs/Sats/Links into your DV and you need quick, performant accesses to their interlinked data for specific Use Cases. Sounds like you are not there as yet.

2 Likes

Hi @squash7733.

Thanks a lot for your help.

Yes I will stay only with the raw vault for now. The stored procedure I really need that because java team wants to call data with specific parameters. I don’t know if I should create a physical table to store that data (like a BRIDGE but with info from satellites) instead of querying directly the raw vault.

If I create that physical table I will be load that table when a file is ingested so the performance of the select (store procedure) will be better that querying the whole data vault.

Hi tg3,

The additional table you are referring to would be in your InfoMart layer. I would recommend creating this table. The InfoMart layer is your semantic layer and intended for the purpose of providing modelled data to your end-users.

Consider whether you need Indexes on your table to accommodate the parameter values passed by the Java Team in order to improve performance (and potentially, watch out for parameter sniffing issues).