dbt and BigQuery in Practice: A Use Case of Transforming Stock Data
Updated 2022/6/11: I pushed my source code to GitHub: https://github.com/koyaaarr/invest-analytics-model
Introduction
This article explains how to use dbt and BigQuery to transform actual data.
You can easily create a data lake, data warehouse, and data marts using dbt, and it also lets us test data quality. I will combine BigQuery with dbt to transform actual stock data into the data marts used by my dashboard.
This article relates to the following one, so please read it if you have some time.
Then, let's get started.
Modeling
Before getting into the transformation, we need to define the data schemas of each table.
Here is the image of the tables we need.

On the data mart side, I want to see the overall performance of my portfolio and the ratio of each stock that makes it up. Therefore, two data marts need to be created for these purposes.
On the other hand, the data for each asset (VOO, BTC-USD, BND) is stored in Google Cloud Storage. The files are in CSV format and contain dates and values such as the closing price.

Therefore, I need to aggregate those data sources into the data warehouse and transform them into each data mart.
Each data schema is described in the following section.
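To make the flow concrete, here is a rough sketch of how the tables relate, based on the model names that appear in the dbt configuration later in this article (how the CSV files get into BigQuery is covered in the related article):

source-voo / source-btcusd / source-bnd / source-portfolio   (CSV files in GCS, loaded into BigQuery)
        |
        v
warehouse-date / warehouse-stock / warehouse-portfolio       (data warehouse, views)
        |
        v
mart-portfolio-value / mart-portfolio-ratio                  (data marts, tables)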
Introducing dbt and BigQuery
Here are the prerequisites for this use case. I will use the dbt CLI, installed as Python packages (dbt-core and dbt-bigquery).
Python: 3.9.11
dbt-core: 1.1.0
dbt-bigquery: 1.1.0
First of all, you can initialize a dbt project with the following command.
dbt init
This command creates a lot of files and directories.
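For reference, a freshly initialized project looks roughly like this (the exact layout depends on the dbt version; this sketch assumes dbt 1.1):

invest_analytics/
├── dbt_project.yml
├── README.md
├── analyses/
├── macros/
├── models/
│   └── example/
├── seeds/
├── snapshots/
└── tests/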
Then create "profiles.yml" in the same directory as "dbt_project.yml". This file is generated in "~/.dbt" by default, but I recommend keeping it in your working directory so it can be version-controlled with Git.
In the beginning, you will edit "models/", "dbt_project.yml", and "profiles.yml".
Let's take a look at each file.
"dbt_project.yml" defines the configuration of the project. You will edit the bottom of this file. There are tables we create, and you can specify each table's materialization types.
name: 'invest_analytics'
version: '1.0.0'
config-version: 2
~~~
models:
  invest_analytics:
    invest_analytics_dev:
      +materialized: view
      warehouse-date:
      warehouse-stock:
      warehouse-portfolio:
      mart-portfolio-value:
        +materialized: table
      mart-portfolio-ratio:
        +materialized: table
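As an aside, the materialization does not have to live in "dbt_project.yml"; you can also set it per model with a config() block at the top of the SQL file. For example, a mart model could declare it itself, equivalent to the setting above:

-- at the top of e.g. models/mart-portfolio-value.sql
{{ config(materialized='table') }}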
"profiles.yml" defines system configuration, including connection with BigQuery. If you authenticate using a service account, you need to designate the key file.
invest_analytics:
  outputs:
    dev:
      dataset: invest_analytics_dev
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: ../service_account.json
      location: asia-northeast1
      method: service-account
      priority: interactive
      project: invest-analytics-347211
      threads: 1
      type: bigquery
  target: dev
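Before building anything, it is worth checking that dbt can actually connect to BigQuery with this profile. dbt provides a debug command for that; the "--profiles-dir ." option points it at the "profiles.yml" in the working directory:

dbt debug --profiles-dir .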
"models" directory contains SQLs and "schema.yaml".

You can write standard SQL in dbt; the only difference is how you refer to other tables.
Instead of the ordinary table name, you reference another model with dbt's ref() function, like this.
select
    Date
    , cast(close_voo as integer) as close_voo
    , cast(close_btcusd as integer) as close_btcusd
    , cast(close_bnd as integer) as close_bnd
    , cast((close_voo * pf.num_voo + close_btcusd * pf.num_btcusd + close_bnd * pf.num_bnd) as integer) as close_total
from
    {{ ref('warehouse-stock') }} as st
    left outer join {{ ref('warehouse-portfolio') }} as pf
        on 1 = 1
order by
    Date
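When dbt compiles this model, each ref() is replaced with the fully qualified table name, and the dependency is recorded for the lineage graph. The compiled SQL, which you can inspect under the target/compiled/ directory after a run, should look roughly like this (using the project and dataset from "profiles.yml"):

from
    `invest-analytics-347211`.`invest_analytics_dev`.`warehouse-stock` as st
    left outer join `invest-analytics-347211`.`invest_analytics_dev`.`warehouse-portfolio` as pf
        on 1 = 1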
If your table is built from source data such as a CSV file, you can reference the source like this.
select
    max(case ticker when 'VOO' then num_of_hold else null end) as num_voo
    , max(case ticker when 'BTC-USD' then num_of_hold else null end) as num_btcusd
    , max(case ticker when 'BND' then num_of_hold else null end) as num_bnd
from
    {{ source('invest_analytics_dev', 'source-portfolio') }}
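This query pivots the portfolio source into the num_voo / num_btcusd / num_bnd columns used by the mart model above. For completeness, the ratio mart could be built along the same lines; the following is only a hypothetical sketch of "mart-portfolio-ratio" that takes the latest row of the value mart and computes each asset's share (the column names ticker and close_percent come from the schema below, but this is not necessarily the author's exact query):

with latest as (
    select close_voo, close_btcusd, close_bnd, close_total
    from {{ ref('mart-portfolio-value') }}
    order by Date desc
    limit 1
)
select 'voo' as ticker, round(close_voo / close_total * 100, 2) as close_percent from latest
union all
select 'btcusd', round(close_btcusd / close_total * 100, 2) from latest
union all
select 'bnd', round(close_bnd / close_total * 100, 2) from latest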
Finally, you need to define the data schemas in "schema.yaml" like this.
version: 2

sources:
  - name: invest_analytics_dev
    tables:
      - name: source-voo
      - name: source-btcusd
      - name: source-bnd
      - name: source-portfolio

models:
  - name: mart-portfolio-ratio
    description: ''
    columns:
      - name: ticker
        description: ''
        tests:
          - unique
          - not_null
          - accepted_values:
              values: ['voo', 'btcusd', 'bnd']
      - name: close_percent
        description: ''
        tests:
          - unique
          - not_null
If you have data sources imported from CSV files, you declare them in the "sources" part.
Then you can add the data schema of your tables under "models". In addition, you can define tests for each column. This example contains a uniqueness test, a not-null test, and an accepted-values test.
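Besides these generic tests declared in "schema.yaml", dbt also supports singular tests: a SQL file placed in the tests/ directory that selects rows which should not exist, where any returned row counts as a failure. As a hypothetical example (assuming the close_total column from the mart above):

-- tests/assert_close_total_not_negative.sql
-- any rows returned by this query are reported as test failures
select *
from {{ ref('mart-portfolio-value') }}
where close_total < 0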
Once you have finished defining each file, let's generate the tables with this command.
dbt run --full-refresh --profiles-dir .
Then you will get output like this.
12:11:02 Running with dbt=1.1.0
12:11:02 Unable to do partial parsing because a project config has changed
12:11:03 Found 5 models, 17 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 4 sources, 0 exposures, 0 metrics
12:11:04 Concurrency: 1 threads (target='dev')
12:11:04 1 of 5 START view model invest_analytics_dev.warehouse-date .................... [RUN]
12:11:06 1 of 5 OK created view model invest_analytics_dev.warehouse-date ............... [OK in 1.61s]
~~~
12:11:11 5 of 5 START table model invest_analytics_dev.mart-portfolio-ratio ............. [RUN]
12:11:14 5 of 5 OK created table model invest_analytics_dev.mart-portfolio-ratio ........ [CREATE TABLE (3.0 rows, 62.7 KB processed) in 3.27s]
12:11:14 Finished running 3 view models, 2 table models in 11.36s.
12:11:14 Completed successfully
12:11:14 Done. PASS=5 WARN=0 ERROR=0 SKIP=0 TOTAL=5
You can see that each table has been created in the Google Cloud Console.
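If you later change only one model, you do not have to rebuild everything: dbt run accepts a --select argument, and appending a plus sign also rebuilds the downstream models. For example, this rebuilds warehouse-stock and the marts that depend on it:

dbt run --select warehouse-stock+ --profiles-dir .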

If you want to check the quality of the data, run this command.
dbt test --profiles-dir .
Then you will get output like this.
10:04:03 Running with dbt=1.1.0
10:04:03 Found 5 models, 17 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 0 seed files, 4 sources, 0 exposures, 0 metrics
10:04:04 Concurrency: 1 threads (target='dev')
10:04:04 1 of 17 START test accepted_values_mart-portfolio-ratio_ticker__voo__btcusd__bnd [RUN]
10:04:06 1 of 17 PASS accepted_values_mart-portfolio-ratio_ticker__voo__btcusd__bnd ..... [PASS in 2.16s]
~~~
10:04:31 17 of 17 START test unique_warehouse-stock_Date ................................ [RUN]
10:04:32 17 of 17 PASS unique_warehouse-stock_Date ...................................... [PASS in 1.33s]
10:04:32 Finished running 17 tests in 28.88s.
10:04:32 Completed successfully
10:04:32 Done. PASS=17 WARN=0 ERROR=0 SKIP=0 TOTAL=17
Lastly, let's generate the documentation for our tables with the following commands.
dbt docs generate --profiles-dir .
dbt docs serve --profiles-dir .
Then you can see the table definitions and lineage.
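One small note: "dbt docs serve" starts a local web server (on port 8080 by default); if that port is already in use, you can pass a different one with the --port option.

dbt docs serve --profiles-dir . --port 8001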


Conclusion
I hope you find this article helpful. I explained an actual use case of dbt and BigQuery with stock data. You can create tables according to their dependencies, test data quality, and even generate documentation of the table definitions.