Integration of Hydra Cloud with DBT (Data Build Tool)

part 1. explanation

Scope

The focus of this guide is to showcase the ease of using dbt (Data Build Tool) with Hydra Cloud. What is being presented is supplier-related performance metrics to an organization’s procurement officer.

For demonstration, the data and the KPIs were chosen from the generic supplier contract management process shown here.
Synthetic data is being used for this guide, and the metrics are the following.

Performance:

Volume this quarter vs previous:
# orders
$ sold
# customers

Quality:

Return rate

Loyalty:

% return customer

part 2. Setup

Setup

Hydra

‍
Configure your Hydra instance and have the connection details ready. Fetch them from your Hydra Cloud connection page as shown below.

DBT

Log into your workstation.

Open the terminal and configure and verify dbt-core installation using the following steps.

user@Ms demohydra % python3 -m venv .venv
user@Ms demohydra % source .venv/bin/activate
(.venv) user@Ms demohydra % pip install dbt
Collecting dbt
  Downloading dbt-1.0.0.tar.gz (2.9 kB)
  ………
(.venv) user@Ms demohydra % dbt --version
installed version: 0.21.1
Plugins:
  - bigquery: 0.21.1
  - snowflake: 0.21.1
  - redshift: 0.21.1
  - postgres: 0.21.1
(.venv) user@Ms demohydra %

Navigate to your project directory and initialize your dbt project using the following steps.

(.venv) user@Ms demohydra %

dbt init demohydra

Running with dbt=0.21.1

Creating dbt configuration folder at /Users/user/.dbt

With sample profiles.yml for bigquery

Your new dbt project "demohydra" was created! If this is your first time

using dbt, you'll need to set up your profiles.yml file -- this file will tell dbt how

to connect to your database. You can find this file by running:

open /Users/user/.dbt

For more information on how to configure the profiles.yml file,

please consult the dbt documentation here:

https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

https://community.getdbt.com/

Happy modeling!

Connection

Hydra

On the terminal, navigate to the project directory and review the sub-directories and files.

(.venv) user@Ms demohydra % tree demohydra
demohydra
├── README.md
├── analysis
├── data
├── dbt_project.yml
├── macros
├── models
│ └── example
│ ├── my_first_dbt_model.sql
│ ├── my_second_dbt_model.sql
│ └── schema.yml
├── snapshots
└── tests

Using a terminal, navigate to your workstation users directory and locate the hidden .dbt directory.

(.venv) user@Ms demohydra % cd Users/user
(.venv) user@Ms demohydra % ls -ali
total 208
259216 drwxr-x---+ 26 user staff 832 28 Aug 10:47 .
14533 drwxr-xr-x 5 root admin 160 30 Jun 10:32 ..
261323 -r-------- 1 user staff 8 15 Jun 17:15 .CFUserTextEncoding
520426 -rw-r--r--@ 1 user staff 8196 27 Aug 12:58 .DS_Store
274378 drwx------+ 13 user staff 416 25 Aug 16:01 .Trash
1310335 -rw-r--r-- 1 user staff 28 22 Jul 05:23 .bash_profile
3051934 drwxr-xr-x 3 user staff 96 28 Aug 10:47 .dbt
2303565 -rw-r--r-- 1 user staff 228 4 Aug 18:34 .gitconfig

Navigate inside the .dbt directory and open the profiles.yml in your favorite editor.

default:
outputs:
dev:
type: bigquery
method: oauth
project: [GCP project id]
dataset: [the name of your dbt dataset] # You can also use "schema" here

"schema"

here
threads: [1 or more]
timeout_seconds: 300
location: US # Optional, one of US or EU
priority: interactive
retries: 1

prod:
type: bigquery
method: service-account
project: [GCP project id]
dataset: [the name of your dbt dataset]
threads: [1 or more]
keyfile: [/path/to/bigquery/keyfile.json]
timeout_seconds: 300
priority: interactive
retries: 1

target: dev

Fetch your Hydra Cloud credentials and connection details: https://dashboard.hydra.so/

Under tag outputs add a tag: hydra and change the target tag value to hydra. Input your hydra connection details as values.

default:
outputs:
hydra:
type: postgres
host: hy-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east-1.aws.hydradb.io
user: dbuser
password: yourverystrongpassword
port: 5432
dbname: yourhydradbname
schema: public
#The options below are optional and user should first establish basic connection using above params only.
threads: 1
keepalives_idle: 30 # default 0, indicating the system default. See below connect_timeout: 10 *# default 10 seconds*
retries: 2 # default 1 retry on error/timeout when opening connections
search_path: public
role: yourdbrole # Is your db role the same as your db user?
sslmode: prefer
#sslmode: [ optional, set the sslmode used to connect to the database ]

target: hydra

Open the terminal and navigate inside the project directory containing the dbt_project.yml file, and execute the dbt debug command as shown below.

(.venv) user@Ms demohydra % dbt debug
Running with dbt=0.21.1
dbt version: 0.21.1
python version: 3.9.5
python path: /Users/user/Documents/hydra/demohydra/.venv/bin/python3
os info: macOS-13.4.1-arm64-arm-64bit
Using profiles.yml file at /Users/user/.dbt/profiles.yml
Using dbt_project.yml file at /Users/user/Documents/hydra/demohydra/demohydra/dbt_project.yml
Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]
Required dependencies:
  - git [OK found]
Connection:
  host: hy-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.us-east-1.aws.hydradb.io
  port: 5432
  user: dbuser
  database: yourhydradbname
  schema: public
  search_path: None
  keepalives_idle: 0
  sslmode: None
  Connection test: [OK connection ok]

The “OK connection ok” response confirms that the dbt-core is able to connect to your Hydra Cloud instance.

Congratulations! You have successfully established the initial connection between dbt and Hydra Cloud.

part 3. Tutorial

Building the Metrics

Source Tables

Open your favorite SQL management tool and connect to the Hydra Cloud. As shown in the workflow here, import the following source tables into the Hydra Cloud.

final_purchases
products
suppliers
leads

‍

Review the data types for the date columns in all the source tables.

Data Warehouse

Open your favorite code editor and navigate to the “demohydra” directory where you have your dbt files.

Create a directory under models and name it to “scm”.

‍

Now we create our first data warehouse table using the source tables.

Under models, create your schema.yml file and write the following table creation code.

We shall be using the schema.yml file to create all the tables needed in the Hydra Cloud.

In the models directory, create the file “quarters.sql” and write the sql code as shown below.

After creating the table definition under schema.yml and the models file, go ahead and execute the dbt run command in your project directory.

‍

(.venv) user@Ms demohydra % dbt run
Running with dbt=0.21.1
Found 3 models, 6 tests, 0 snapshots, 0 analyses, 162 macros, 0 operations, 0 seed files, 0 sources, 0 exposures
19:50:43 | Concurrency: 1 threads (target='hydra')
19:50:43 |
19:50:43 | 1 of 3 START table model public.my_first_dbt_model................... [RUN]
19:50:44 | 1 of 3 OK created table model public.my_first_dbt_model.............. [SELECT 2 in 0.97s]
19:50:44 | 2 of 3 START table model public.quarters............................. [RUN]
19:50:45 | 2 of 3 OK created table model public.quarters........................ [SELECT 3 in 0.73s]
19:50:45 | 3 of 3 START view model public.my_second_dbt_model................... [RUN]
19:50:46 | 3 of 3 OK created view model public.my_second_dbt_model.............. [CREATE VIEW in 1.30s]
19:50:46 |
19:50:46 | Finished running 2 table models, 1 view model in 9.28s.
Completed successfully
Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3

Open your favorite SQL management tool and connect to the Hydra Cloud.

As shown below, review that the models table has been successfully created in your Hydra Cloud.

Congratulations! You have now created and populated your first Hydra Cloud models table using dbt.

As shown in this workflow, repeat the previous steps for table column definitions inside the schema.yml file and create models under the demohydra/models/scm directory for the following tables and models.

current_quarter_customers.sql
final_table.sql
id_returners.sql
percent_customer_returners.sql
pivoted_metrics.sql
salesmetrics.sql
unpivoted.sql