Ibis

Use Ibis to create on-demand databases, upload data, and query with Python expressions — get pandas or Arrow results back without writing SQL.

Requirements: Python 3.10+, ibis-framework 10.x

Install

pip install hotdata-ibis

Connect

import ibis

con = ibis.hotdata.connect(
    api_url="https://api.hotdata.dev",
    token="YOUR_API_KEY",
    workspace_id="<workspace_id>",
)

URL-style also works:

con = ibis.connect("hotdata://api.hotdata.dev/?token=...&workspace_id=<workspace_id>")

Quickstart: create a database and query it

import time
import pandas as pd
import ibis

con = ibis.hotdata.connect(
    api_url="https://api.hotdata.dev",
    token="YOUR_API_KEY",
    workspace_id="<workspace_id>",
)

# 1. Create a database and declare the tables you'll load
con.create_database("sales", schema="public", tables=["orders"])

# 2. Upload a pandas DataFrame (or PyArrow table)
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount": [9.99, 49.99, 5.00],
    "region": ["west", "east", "west"],
})
con.create_table("orders", df, database=("sales", "public"), overwrite=True)

# 3. Uploads are async — wait briefly before querying
time.sleep(2)

# 4. Query with Ibis expressions
#    Managed tables are always accessed with catalog "default"
t = con.table("orders", database=("default", "public"))
result = (
    t.group_by("region")
    .agg(total=t.amount.sum())
    .order_by(ibis.desc("total"))
    .execute()  # returns a pandas DataFrame
)

# 5. Clean up
con.drop_table("orders", database=("sales", "public"))
con.drop_database("sales")

Managed databases

Managed databases are the primary way to bring data into Hotdata with Ibis. You declare a database and its tables, upload data as parquet, and query immediately.

Create and load

# Declare the database and all table names up front
con.create_database("analytics", schema="public", tables=["events", "users"])

# Upload from a pandas DataFrame
con.create_table("events", events_df, database=("analytics", "public"), overwrite=True)
con.create_table("users", users_df, database=("analytics", "public"), overwrite=True)

# PyArrow tables also work
import pyarrow as pa
table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
con.create_table("users", table, database=("analytics", "public"), overwrite=True)

Table names must be declared when the database is created — you cannot add new table names later without recreating the database.

Query

When querying, use "default" as the catalog — that is always the SQL prefix for managed tables:

t = con.table("events", database=("default", "public"))

# Ibis expression
result = (
    t.filter(t.event_type == "click")
    .group_by("user_id")
    .agg(n=t.count())
    .execute()
)

# Or raw SQL
result = con.sql(
    'SELECT user_id, COUNT(*) AS n '
    'FROM "default"."public"."events" '
    'WHERE event_type = \'click\' '
    'GROUP BY user_id'
).execute()

Delete

con.drop_table("events", database=("analytics", "public"))
con.drop_database("analytics")

Addressing summary

Operation	`database=` argument
`create_table` / `drop_table`	`("your-database-name", schema)`
`con.table(...)` / `con.sql(...)` when querying	`("default", schema)`

Query with Ibis expressions

.execute() returns a pandas DataFrame. Use .to_pyarrow() for an Arrow table or .to_pyarrow_batches() to stream batches:

t = con.table("orders", database=("default", "public"))

# Aggregate and sort
summary = (
    t.filter(t.amount > 10)
    .group_by("region")
    .agg(total=t.amount.sum(), n=t.count())
    .order_by(ibis.desc("total"))
    .execute()
)

# Arrow output
arrow_table = t.limit(1000).to_pyarrow()

# Streaming batches
with t.to_pyarrow_batches() as reader:
    for batch in reader:
        process(batch)

Raw SQL

Use con.sql(...) when you need Hotdata-specific syntax that Ibis doesn't model. You can chain Ibis expressions on the result:

base = con.sql(
    'SELECT * FROM "default"."public"."orders"',
    dialect="postgres",
)
result = base.filter(base.amount > 10).execute()

Connecting to existing sources

If you have existing databases or warehouses connected to your Hotdata workspace (Postgres, Snowflake, BigQuery, etc.), you can query them through the same Ibis connection. Set defaults to avoid qualifying every table reference:

con = ibis.hotdata.connect(
    api_url="https://api.hotdata.dev",
    token="YOUR_API_KEY",
    workspace_id="<workspace_id>",
    default_connection="my_postgres",
    default_schema="public",
)

t = con.table("orders")  # resolves to my_postgres.public.orders

Without defaults, pass database=(connection_id, schema) explicitly:

t = con.table("orders", database=("my_postgres", "public"))

Discover what's available:

con.list_catalogs()                                       # connection IDs
con.list_databases(catalog="my_postgres")                 # schemas
con.list_tables(database=("my_postgres", "public"))       # tables
con.get_schema("orders", catalog="my_postgres", database="public")

What's supported

Feature
`create_database` / `drop_database`	yes
`create_table` / `drop_table` (DataFrame or Arrow upload)	yes
`con.table(...)` with full schema metadata	yes
Filter, select, join, group_by, agg, order_by, limit	yes
`con.sql(...)` raw SQL	yes
`.execute()` → pandas, `.to_pyarrow()`, `.to_pyarrow_batches()`	yes
`list_catalogs`, `list_databases`, `list_tables`	yes
Temporary tables	no
Python UDFs	no
INSERT / UPDATE / DELETE on external connections	no

SQL compilation uses Ibis's Postgres dialect. Use con.sql(...) as a fallback for expressions that don't compile cleanly.