python, pandas, dbt, duckdb

Pandas in a Python dbt model (dbt-duckdb)


I'm trying to use pandas in a dbt Python model (dbt-duckdb), but I keep getting the error `Python model failed: No module named 'pandas'`. Here is my dbt model configuration:

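```python
import boto3
import pandas as pd  # the run fails here: No module named 'pandas'


def model(dbt, session):
    dbt.config(
        materialized="table",
        packages=["pandas==2.2.3"],
        python_version="3.11",
    )
    key = "my_key"
    bucket = "my_bucket"
    client = boto3.client("s3")
    # placeholder: a dbt Python model is expected to return a DataFrame
    return None
```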

I know DuckDB can read files from S3 directly, but I need to fix the files before DuckDB reads them because they are malformed.
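A minimal sketch of what the model could look like once pandas is importable. It assumes the S3 object is a CSV file; `clean_frame` is a hypothetical helper, and its cleanup steps (normalizing headers, dropping all-empty rows) are only placeholders for whatever is actually wrong with the files. The bucket and key are the placeholders from the question.

```python
import io

import pandas as pd


def clean_frame(raw: bytes) -> pd.DataFrame:
    """Hypothetical cleanup step: parse CSV bytes and fix common problems.

    The real fixes depend on what is wrong with your files; this only
    normalizes column names and drops fully empty rows as an example.
    """
    df = pd.read_csv(io.BytesIO(raw))
    # Normalize headers such as " Order ID " to "order_id"
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Drop rows where every value is missing
    return df.dropna(how="all")


def model(dbt, session):
    # dbt-duckdb runs this function inside dbt's own Python process,
    # so pandas and boto3 must be installed in that environment.
    dbt.config(materialized="table")

    import boto3  # imported here so clean_frame can be tested without AWS

    key = "my_key"        # placeholder, as in the question
    bucket = "my_bucket"  # placeholder, as in the question
    client = boto3.client("s3")
    obj = client.get_object(Bucket=bucket, Key=key)
    raw = obj["Body"].read()
    # The returned pandas DataFrame is what dbt materializes as the table
    return clean_frame(raw)
```

Keeping the cleanup in a plain function means it can be tested locally with sample bytes, without S3 access or a dbt run.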

This is my model's YAML config:

```yaml
version: 2

models:
  - name: test
    config:
      packages:
        - "pandas==2.2.3"
```

I also have a virtual environment with pandas installed.

If anyone has experience with this, thanks in advance!


Solution

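  • Found it: make sure dbt is run from the venv where you installed pandas (or whatever other package you need); in my case that venv is called dbt-env. dbt-duckdb runs Python models in the same Python process as dbt itself, so the model can only import packages installed in that environment, and as far as I can tell the `packages` config does not install anything with this adapter.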
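One way to check this, as a sketch: run the snippet below with the same interpreter that runs dbt (for example, activate the venv and confirm that `which dbt` and `which python` both point inside it). `describe_environment` is a hypothetical helper, not part of dbt, and uses only the standard library.

```python
import importlib.util
import sys


def describe_environment(packages=("pandas", "boto3")):
    """Report which interpreter is running and whether packages are importable."""
    # Inside a venv, sys.prefix points at the venv while base_prefix does not
    in_venv = sys.prefix != sys.base_prefix
    # find_spec returns None when a top-level module cannot be found
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return {
        "executable": sys.executable,
        "in_virtualenv": in_venv,
        "importable": found,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `importable` shows `pandas: False`, or `executable` is outside the venv you expected, dbt is being launched from a different environment than the one where pandas was installed.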