Getting Started with dbt Seeds in Databricks: A Beginner-Friendly Guide

Posted on May 31, 2026 by Sumit Kumar

If you’ve started your dbt journey and successfully connected dbt Core with Databricks, congratulations! 🎉

The next feature you should learn is dbt Seeds. Seeds are one of the simplest features in dbt, and they are especially handy for reference data, lookup tables, and demo datasets.

In this post, we’ll explore what dbt Seeds are, why they are used, and how to implement them in Databricks with a practical example.

What Are dbt Seeds?

A dbt Seed is a CSV file that dbt can load directly into your data warehouse as a table.

Instead of manually creating and maintaining small tables in Databricks, you can store the data as CSV files within your dbt project and let dbt manage them.

Think of Seeds as:

“Version-controlled tables created from CSV files.”

Because the CSV files are stored inside your project, they can be tracked through Git, reviewed during code reviews, and deployed consistently across environments.

Why Use dbt Seeds?

dbt Seeds are ideal for small datasets that do not change frequently.

Some common use cases include:

Reference Data

Examples:

Country codes
Currency mappings
Department lists
Product categories

Example:

country_code	country_name
IN	India
US	United States
UK	United Kingdom

Lookup Tables

Many organizations maintain small mapping tables used across multiple transformations.

Example:

department_id	department_name
10	HR
20	Finance
30	IT

Instead of creating this table manually in Databricks, you can simply maintain it as a CSV file.

Development and Testing

Seeds are extremely useful when:

Learning dbt
Demonstrating concepts
Building proof-of-concepts
Creating test datasets

This makes Seeds a perfect feature for beginners who want to understand the dbt workflow.

Static Business Rules

Example:

status_code	description
A	Active
I	Inactive

Since such values rarely change, storing them as a Seed is often the easiest approach.

How dbt Seeds Work

The process is simple:

1. Create a CSV file.
2. Place it inside the seeds folder.
3. Run dbt seed.
4. dbt creates a table in Databricks.

The generated table can then be referenced from your dbt models with ref(), just like any other model.
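To make the steps above concrete, here is a minimal local sketch of the idea behind dbt seed: read a CSV, create a table with the same columns, and insert the rows. It is not dbt's actual implementation. It uses Python's built-in sqlite3 as a stand-in for Databricks, and the load_seed function is a hypothetical helper written only for this illustration.

```python
import csv
import io
import sqlite3

# Stand-in for seeds/employees.csv; dbt itself reads the file from the seeds folder.
EMPLOYEES_CSV = """emp_id,emp_name,department,salary
1,Rahul,IT,50000
2,Priya,HR,40000
3,Amit,Finance,60000
4,Neha,IT,70000
"""


def load_seed(conn, table_name, csv_text):
    """Hypothetical helper: create table_name from CSV text, return rows loaded."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                      # first line holds the column names
    rows = [row for row in reader if row]      # skip blank lines
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Drop and recreate so that repeated runs always reflect the current CSV.
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")             # in-memory database, nothing is written to disk
loaded = load_seed(conn, "employees", EMPLOYEES_CSV)
print(f"Loaded {loaded} rows into employees")
```

The table name comes from the file name, which mirrors how dbt names a Seed table after its CSV file.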

Project Structure

A typical dbt project may look like this:

my_dbt_project/
|
├── models/
├── seeds/
├── macros/
├── tests/
└── dbt_project.yml

Create a folder named:

seeds/

if it does not already exist.

Step 1: Create a Seed File

Create a file named:

seeds/employees.csv

Add the following content:

emp_id,emp_name,department,salary
1,Rahul,IT,50000
2,Priya,HR,40000
3,Amit,Finance,60000
4,Neha,IT,70000

This CSV file will become a table in Databricks.

Step 2: Load the Seed into Databricks

Run the following command:

dbt seed

Sample output:

Finished running 1 seed in 4.12 seconds

dbt will create a table called:

employees

inside your target schema.

Step 3: Verify the Data

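Open Databricks and execute the following query. If your target schema is not the session's current schema, qualify the table name with it: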

SELECT *
FROM employees;

Output:

emp_id	emp_name	department	salary
1	Rahul	IT	50000
2	Priya	HR	40000
3	Amit	Finance	60000
4	Neha	IT	70000

Congratulations! Your first Seed has been successfully loaded.

Step 4: Use the Seed in a dbt Model

Now let’s create a simple transformation.

Create a file:

models/high_salary_employees.sql

Add the following SQL:

SELECT *
FROM {{ ref('employees') }}
WHERE salary > 50000

Run:

dbt run

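dbt builds the model as high_salary_employees (a view by default) containing employees whose salary exceeds ₹50,000. Rahul, whose salary is exactly 50,000, is excluded because the comparison is strict.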

Output:

emp_id	emp_name	department	salary
3	Amit	Finance	60000
4	Neha	IT	70000
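If you want to sanity-check the expected result before running the model, the same filter can be reproduced locally. This is only a sketch in plain Python that mirrors the WHERE clause; the high_salary_employees function and the threshold parameter are names chosen for this illustration, not part of dbt.

```python
import csv
import io

# Same rows as seeds/employees.csv.
EMPLOYEES_CSV = """emp_id,emp_name,department,salary
1,Rahul,IT,50000
2,Priya,HR,40000
3,Amit,Finance,60000
4,Neha,IT,70000
"""


def high_salary_employees(csv_text, threshold=50000):
    """Return the rows whose salary is strictly greater than threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # CSV values arrive as strings, so convert salary before comparing.
    return [row for row in reader if int(row["salary"]) > threshold]


result = high_salary_employees(EMPLOYEES_CSV)
for row in result:
    print(row["emp_id"], row["emp_name"], row["department"], row["salary"])
```

Note the explicit int() conversion: comparing the raw CSV strings would not behave like a numeric comparison.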

Why Use `ref()` Instead of Direct Table Names?

You may wonder why we use:

{{ ref('employees') }}

instead of:

SELECT * FROM employees

The answer is simple: dbt understands dependencies through ref().

Benefits include:

Automatic dependency tracking
Better lineage visualization
Easier environment management
Improved maintainability
Accurate build ordering

Using ref() is considered a dbt best practice.
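The following toy sketch shows why ref() matters. It scans model SQL for ref() calls, derives a build order from them, and swaps each call for a schema-qualified name. This is a simplified illustration and not how dbt is implemented internally; the PROJECT dictionary, the helper functions, and the dev_schema name are all assumptions made for the example.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical project: node name -> SQL text (a seed has no SQL of its own).
PROJECT = {
    "high_salary_employees": "SELECT * FROM {{ ref('employees') }} WHERE salary > 50000",
    "employees": "",
}

# Matches {{ ref('name') }} and captures the referenced name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")


def dependencies(sql):
    """Return the set of node names that this SQL references through ref()."""
    return set(REF_PATTERN.findall(sql))


def build_order(project):
    """Order nodes so that every node comes after the nodes it depends on."""
    graph = {name: dependencies(sql) for name, sql in project.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_sql(sql, schema):
    """Replace each ref() call with a schema-qualified table name."""
    return REF_PATTERN.sub(lambda match: f"{schema}.{match.group(1)}", sql)


order = build_order(PROJECT)
print(order)
print(compile_sql(PROJECT["high_salary_employees"], "dev_schema"))
```

A hardcoded table name gives the tool nothing to parse, so it can neither order the build nor point the query at a different schema per environment.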

Loading a Specific Seed

If your project contains multiple Seed files, you can load only one:

dbt seed --select employees

This is particularly useful in large projects.

Refreshing Existing Seed Data

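When the CSV file changes, running dbt seed again reloads the table from the updated file. If the columns or data types of the file changed, add the --full-refresh flag:

dbt seed --full-refresh

This drops and recreates the table instead of reusing the existing one.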

Configuring Seed Schemas

You can control where Seed tables are created.

In dbt_project.yml:

seeds:
  my_dbt_project:
    +schema: seed_data

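With dbt's default schema naming, the custom schema is appended to your target schema, so the Seed table is created in:

<target_schema>_seed_data.employees

For example, if your target schema is dev, the table is dev_seed_data.employees.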

This helps separate Seed tables from business models.
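The naming rule can be sketched in a few lines. The function below mirrors the behavior of dbt's default generate_schema_name macro (target schema, then an underscore, then the custom schema), rewritten in Python purely for illustration. The dev target schema is an assumed example value, and projects that override the macro will behave differently.

```python
def generate_schema_name(target_schema, custom_schema=None):
    """Sketch of dbt's default rule for combining target and custom schemas."""
    if custom_schema is None:
        # No +schema configured: the target schema is used unchanged.
        return target_schema
    # +schema configured: it is appended to the target schema.
    return f"{target_schema}_{custom_schema.strip()}"


print(generate_schema_name("dev"))
print(generate_schema_name("dev", "seed_data"))
```

If you want the table to land in a schema named exactly seed_data, you would need to override generate_schema_name in your project's macros folder.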

Real-World Example

Imagine a marketing analytics project where campaign data comes from multiple platforms.

You may maintain a channel mapping file like:

channel_id,channel_name
1,Facebook
2,Instagram
3,LinkedIn
4,Twitter

Instead of hardcoding these values in SQL, you can store them as a Seed and join them with campaign performance data.

Benefits include:

Easier maintenance
Better version control
Centralized mappings
Reduced SQL complexity
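Here is a small local sketch of that join, again using sqlite3 as a stand-in for Databricks. The channel rows come from the mapping file above, while the campaign_performance table, its columns, and its values are invented purely for illustration. In a real dbt model the mapping would be referenced as {{ ref('channels') }}, assuming the Seed file is named channels.csv.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Seed data: the channel mapping from the CSV above.
conn.execute("CREATE TABLE channels (channel_id INTEGER, channel_name TEXT)")
conn.executemany(
    "INSERT INTO channels VALUES (?, ?)",
    [(1, "Facebook"), (2, "Instagram"), (3, "LinkedIn"), (4, "Twitter")],
)

# Hypothetical campaign data; table name, columns, and values are made up.
conn.execute(
    "CREATE TABLE campaign_performance (campaign_id INTEGER, channel_id INTEGER, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO campaign_performance VALUES (?, ?, ?)",
    [(101, 1, 120), (102, 3, 45), (103, 1, 80)],
)

# Join on channel_id so the report shows names instead of hardcoded IDs.
query = """
SELECT c.channel_name, SUM(p.clicks) AS total_clicks
FROM campaign_performance AS p
JOIN channels AS c ON c.channel_id = p.channel_id
GROUP BY c.channel_name
ORDER BY c.channel_name
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Adding a new channel then means adding one line to the CSV and rerunning dbt seed, with no change to the SQL.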

Best Practices for dbt Seeds

✅ Use Seeds for small datasets only.

✅ Store reference and lookup data as Seeds.

✅ Always use ref() when referencing Seeds.

✅ Keep Seed files under version control.

✅ Avoid loading large transactional datasets as Seeds.

❌ Do not use Seeds for millions of records.

❌ Do not use Seeds as a replacement for source systems.

Conclusion

dbt Seeds provide a simple and efficient way to manage small, static datasets directly within your dbt project. They are perfect for lookup tables, reference data, testing, and learning dbt concepts.

For beginners working with Databricks, learning Seeds is a great next step after creating your first dbt model. With just a CSV file and a single command, you can create reusable tables that integrate seamlessly into your dbt transformation workflow.

By mastering Seeds early, you’ll build cleaner projects, improve maintainability, and follow dbt best practices from day one.

Key Takeaways

dbt Seeds convert CSV files into database tables.
Ideal for lookup and reference data.
Loaded using the dbt seed command.
Can be referenced in models using ref().
Fully version-controlled and easy to maintain.
Commonly used in production analytics projects.

Happy Learning and Happy Data Transforming with dbt and Databricks! 🚀
