dlt

Use dlt (data load tool) to build pipelines that load data from any dlt-supported source into Hotdata managed databases, with automatic schema inference, incremental loading, and parquet-based delivery. The Hotdata destination is built into our dlt fork, so there is no separate package to install.

Install

Install the Hotdata-enabled dlt fork from GitHub with the parquet extra:

pip install "dlt[parquet] @ git+https://github.com/hotdata-dev/dlt.git@feat/hotdata-destination"
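The Hotdata destination lives in the fork rather than in upstream dlt, so it can help to confirm that the fork is the dlt on your path. This is a standard-library sketch; `has_destination` is a hypothetical helper name, and it only checks whether `dlt.destinations` can be imported and exposes an attribute called `hotdata`.

```python
import importlib


def has_destination(name: str, package: str = "dlt.destinations") -> bool:
    """Return True if `package` imports and exposes an attribute `name`.

    Hypothetical helper: used here to check that the Hotdata-enabled dlt
    fork (which provides dlt.destinations.hotdata) is the installed dlt.
    """
    try:
        module = importlib.import_module(package)
    except ImportError:
        # Package is not installed at all (or fails to import).
        return False
    return hasattr(module, name)


if __name__ == "__main__":
    if has_destination("hotdata"):
        print("Hotdata destination available")
    else:
        print("dlt.destinations.hotdata not found: is the fork installed?")
```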

Authentication

The destination reads credentials from dlt's config system. Set them as environment variables:

export DESTINATION__HOTDATA__CREDENTIALS__API_KEY="your_api_key"
export DESTINATION__HOTDATA__CREDENTIALS__WORKSPACE_ID="<workspace_id>"

Or add them to .dlt/secrets.toml:

[destination.hotdata.credentials]
api_key = "your_api_key"
workspace_id = "<workspace_id>"

You can also pass credentials explicitly when configuring the destination (see below).
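dlt derives environment variable names from the config path: each section is upper-cased and the parts are joined with a double underscore, which is why key `api_key` in `[destination.hotdata.credentials]` becomes `DESTINATION__HOTDATA__CREDENTIALS__API_KEY`. The sketch below reproduces that naming rule and reports which Hotdata variables are missing before you start a pipeline. `env_var_name` and `missing_hotdata_env_vars` are hypothetical helpers, and the check ignores `.dlt/secrets.toml`, so a variable reported as missing may still be supplied there.

```python
import os
from typing import Mapping, Optional

# Config path of the [destination.hotdata.credentials] section.
HOTDATA_CREDENTIAL_PATH = ("destination", "hotdata", "credentials")
REQUIRED_KEYS = ("api_key", "workspace_id")


def env_var_name(*parts: str) -> str:
    """Join config path parts as dlt names env vars: UPPER__CASE."""
    return "__".join(part.upper() for part in parts)


def missing_hotdata_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required Hotdata env var names that are unset or empty."""
    env = os.environ if env is None else env
    names = [env_var_name(*HOTDATA_CREDENTIAL_PATH, key) for key in REQUIRED_KEYS]
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hotdata_env_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Hotdata credentials found in the environment")
```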

Quickstart

import dlt
from dlt.destinations import hotdata

pipeline = dlt.pipeline(
    pipeline_name="my_pipeline",
    destination=hotdata(),
    dataset_name="public",
)

data = [
    {"id": 1, "name": "Alice", "amount": 99.99},
    {"id": 2, "name": "Bob",   "amount": 49.50},
]

info = pipeline.run(data, table_name="customers")
print(info)

dlt infers the schema from your data, creates the managed database if it does not exist yet (named `dlt` unless you set `database_name`), and loads the records as parquet.
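To give a feel for what schema inference means for the quickstart records, here is a deliberately simplified, standard-library sketch that maps Python value types to column types. It is not dlt's implementation: the type names (`bigint`, `double`, `text`, `bool`) follow dlt's data type names, but real inference also handles nested data, timestamps, nulls, and type variants. `infer_columns` is a hypothetical helper.

```python
from typing import Any, Iterable, Mapping

# Simplified mapping from Python types to dlt-style column type names.
# bool is checked before int because bool is a subclass of int in Python.
_TYPE_ORDER = (
    (bool, "bool"),
    (int, "bigint"),
    (float, "double"),
    (str, "text"),
)


def infer_columns(rows: Iterable[Mapping[str, Any]]) -> dict[str, str]:
    """Pick one column type per key from the first non-None value seen.

    Toy sketch only: values of other types are ignored, and conflicting
    types across rows are not handled.
    """
    columns: dict[str, str] = {}
    for row in rows:
        for key, value in row.items():
            if key in columns or value is None:
                continue
            for py_type, column_type in _TYPE_ORDER:
                if isinstance(value, py_type):
                    columns[key] = column_type
                    break
    return columns


data = [
    {"id": 1, "name": "Alice", "amount": 99.99},
    {"id": 2, "name": "Bob", "amount": 49.50},
]
print(infer_columns(data))  # {'id': 'bigint', 'name': 'text', 'amount': 'double'}
```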

Configure the destination

from dlt.destinations import hotdata

destination = hotdata(
    credentials={                  # defaults to DESTINATION__HOTDATA__CREDENTIALS__* config
        "api_key": "your_api_key",
        "workspace_id": "<workspace_id>",
    },
    database_name="sales",         # managed database name (default: "dlt")
    schema="public",               # schema within the database (default: "public")
    write_disposition="append",    # default disposition (default: "append")
    create_database_if_missing=True,  # auto-create the database (default: True)
)
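Hard-coding `api_key` in source code makes it easy to leak. If you want to pass credentials explicitly but keep them out of the code, one option is to read them from environment variables of your own choosing and build the `credentials` dict at runtime. The sketch below assumes hypothetical variable names `HOTDATA_API_KEY` and `HOTDATA_WORKSPACE_ID` and a hypothetical helper `hotdata_credentials`; only the dict keys `api_key` and `workspace_id` come from the configuration above.

```python
import os
from typing import Mapping, Optional


def hotdata_credentials(env: Optional[Mapping[str, str]] = None,
                        api_key_var: str = "HOTDATA_API_KEY",
                        workspace_var: str = "HOTDATA_WORKSPACE_ID") -> dict[str, str]:
    """Build the dict for hotdata(credentials=...) from environment variables.

    The variable names are hypothetical defaults. Missing values raise a
    clear error up front instead of failing later inside the pipeline.
    """
    env = os.environ if env is None else env
    missing = [name for name in (api_key_var, workspace_var) if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {"api_key": env[api_key_var], "workspace_id": env[workspace_var]}
```

You would then pass the result in as `hotdata(credentials=hotdata_credentials(), database_name="sales")`.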

Load from a source

Use any dlt source, such as the built-in `sql_database` source below, or a custom generator:

import dlt
from dlt.sources.sql_database import sql_database
from dlt.destinations import hotdata

source = sql_database(
    credentials="postgresql://user:pass@host/db",
    schema="public",
    table_names=["orders", "customers"],
)

pipeline = dlt.pipeline(
    pipeline_name="postgres_to_hotdata",
    destination=hotdata(),
    dataset_name="public",
)

info = pipeline.run(source)
print(f"Loaded {info.loads_ids} into Hotdata")
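Besides packaged sources, `pipeline.run` accepts a plain Python generator of dicts together with `table_name`, just as the quickstart passes a list. The sketch below pages through a data source one page at a time; `paged_rows` and the in-memory `fetch_page` stand-in are hypothetical, so replace `fetch_page` with your own API or database call.

```python
from typing import Any, Callable, Iterator

Row = dict[str, Any]


def paged_rows(fetch_page: Callable[[int, int], list[Row]],
               page_size: int = 100) -> Iterator[Row]:
    """Yield rows page by page until fetch_page returns a short page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += page_size


# Hypothetical in-memory stand-in for an API or database query.
_ORDERS = [{"id": i, "total": i * 10.0} for i in range(1, 6)]


def fetch_page(offset: int, limit: int) -> list[Row]:
    return _ORDERS[offset:offset + limit]


rows = list(paged_rows(fetch_page, page_size=2))
print(len(rows))  # 5
```

With the fork installed, the generator can go straight to dlt, for example `pipeline.run(paged_rows(fetch_page), table_name="orders")`.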

Incremental loading

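dlt stores an incremental cursor (here, the highest `updated_at` value seen) in the pipeline state between runs, so subsequent executions load only new or updated rows. With `write_disposition="merge"`, those rows are upserted by `primary_key`: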

import dlt
from dlt.destinations import hotdata

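@dlt.resource(primary_key="id", write_disposition="merge")
def events(
    updated_at=dlt.sources.incremental("updated_at"),
):
    # fetch_events is your own function (API call, database query, ...)
    # returning rows that include an "updated_at" field. On the first run
    # last_value is None (no initial_value is set), so return all rows then.
    yield fetch_events(since=updated_at.last_value)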

pipeline = dlt.pipeline(
    pipeline_name="events_pipeline",
    destination=hotdata(),
    dataset_name="public",
)

pipeline.run(events())
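The incremental example relies on a `fetch_events` function that the snippet leaves to you. The standard-library sketch below simulates the idea across several runs: a cursor holding the highest `updated_at` seen so far survives between runs (dlt keeps it in pipeline state; here it is a plain dict), and each run fetches only rows newer than it. `fetch_events`, `run_once`, and the sample data are hypothetical, and dlt's real incremental handling (for example, deduplicating rows that share the boundary value) is more involved.

```python
from typing import Any, Optional

# Hypothetical source table; ISO-8601 strings in one format sort correctly.
SOURCE = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00Z"},
]


def fetch_events(since: Optional[str]) -> list[dict[str, Any]]:
    """Return rows strictly newer than `since`, or every row if since is None."""
    if since is None:
        return list(SOURCE)
    return [row for row in SOURCE if row["updated_at"] > since]


def run_once(state: dict[str, Any]) -> list[dict[str, Any]]:
    """One pipeline run: fetch using the stored cursor, then advance it."""
    rows = fetch_events(since=state.get("last_value"))
    if rows:
        # Every fetched row is newer than the old cursor, so max() advances it.
        state["last_value"] = max(row["updated_at"] for row in rows)
    return rows


state: dict[str, Any] = {}  # stands in for dlt's persisted pipeline state
print(len(run_once(state)))  # 2: first run loads everything
SOURCE.append({"id": 3, "updated_at": "2024-01-03T00:00:00Z"})
print(len(run_once(state)))  # 1: only the new row
print(len(run_once(state)))  # 0: nothing new
```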

Write dispositions:

Disposition	Behaviour
`append`	Add new rows to the table
`replace`	Replace the full table on each run
`merge`	Upsert rows matched by `primary_key`
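To make the three dispositions concrete, the sketch below applies them to an in-memory table (a list of dicts). It illustrates the semantics in the table above, not how the Hotdata destination executes them; `apply_disposition` is a hypothetical helper, and `merge` here is a simple upsert keyed on a single `primary_key` column.

```python
from typing import Any

Row = dict[str, Any]


def apply_disposition(table: list[Row], batch: list[Row],
                      disposition: str, primary_key: str = "id") -> list[Row]:
    """Return the table contents after loading `batch` with `disposition`."""
    if disposition == "append":
        # Keep every existing row and add the batch, duplicates included.
        return table + batch
    if disposition == "replace":
        # Discard the old contents; the table becomes exactly the batch.
        return list(batch)
    if disposition == "merge":
        # Upsert: rows with a matching key are replaced, others are added.
        merged = {row[primary_key]: row for row in table}
        for row in batch:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown write disposition: {disposition!r}")


table = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
batch = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

print(len(apply_disposition(table, batch, "append")))   # 4
print(len(apply_disposition(table, batch, "replace")))  # 2
print(apply_disposition(table, batch, "merge"))
# [{'id': 1, 'amount': 10}, {'id': 2, 'amount': 25}, {'id': 3, 'amount': 30}]
```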

Query loaded data

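After the pipeline runs, query the table through Hotdata. These examples assume `customers` was loaded into the `sales` database, for example by configuring `hotdata(database_name="sales")`: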

hotdata query \
  "SELECT name, amount FROM default.public.customers ORDER BY amount DESC" \
  --database sales

Or with the Python SDK:

from hotdata_framework import from_env

client = from_env()
result = client.execute_sql(
    "SELECT * FROM default.public.customers LIMIT 10",
    database="sales",
)
print(result.rows)