Ibis
Use Ibis to create on-demand databases, upload data, and query with Python expressions — get pandas or Arrow results back without writing SQL.
Requirements: Python 3.10+, ibis-framework 10.x
Install
pip install hotdata-ibis
Connect
import ibis
con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
)
URL-style also works:
con = ibis.connect("hotdata://api.hotdata.dev/?token=...&workspace_id=ws_...")
Quickstart: create a database and query it
import time
import pandas as pd
import ibis
con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
)
# 1. Create a database and declare the tables you'll load
con.create_database("sales", schema="public", tables=["orders"])
# 2. Upload a pandas DataFrame (or PyArrow table)
df = pd.DataFrame({
"order_id": [1, 2, 3],
"amount": [9.99, 49.99, 5.00],
"region": ["west", "east", "west"],
})
con.create_table("orders", df, database=("sales", "public"), overwrite=True)
# 3. Uploads are async — wait briefly before querying
time.sleep(2)
# 4. Query with Ibis expressions
# Managed tables are always accessed with catalog "default"
t = con.table("orders", database=("default", "public"))
result = (
t.group_by("region")
.agg(total=t.amount.sum())
.order_by(ibis.desc("total"))
.execute() # returns a pandas DataFrame
)
# 5. Clean up
con.drop_table("orders", database=("sales", "public"))
con.drop_database("sales")
Managed databases
Managed databases are the primary way to bring data into Hotdata with Ibis. You declare a database and its tables, upload data as parquet, and query immediately.
Create and load
# Declare the database and all table names up front
con.create_database("analytics", schema="public", tables=["events", "users"])
# Upload from a pandas DataFrame
con.create_table("events", events_df, database=("analytics", "public"), overwrite=True)
con.create_table("users", users_df, database=("analytics", "public"), overwrite=True)
# PyArrow tables also work
import pyarrow as pa
table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
con.create_table("users", table, database=("analytics", "public"), overwrite=True)
Table names must be declared when the database is created — you cannot add new table names later without recreating the database.
Query
When querying, use "default" as the catalog — that is always the SQL prefix for managed tables:
t = con.table("events", database=("default", "public"))
# Ibis expression
result = (
t.filter(t.event_type == "click")
.group_by("user_id")
.agg(n=t.count())
.execute()
)
# Or raw SQL
result = con.sql(
'SELECT user_id, COUNT(*) AS n '
'FROM "default"."public"."events" '
'WHERE event_type = \'click\' '
'GROUP BY user_id'
).execute()
Delete
con.drop_table("events", database=("analytics", "public"))
con.drop_database("analytics")
Addressing summary
| Operation | database= argument |
|---|---|
create_table / drop_table | ("your-database-name", schema) |
con.table(...) / con.sql(...) when querying | ("default", schema) |
Query with Ibis expressions
.execute() returns a pandas DataFrame. Use .to_pyarrow() for an Arrow table or .to_pyarrow_batches() to stream batches:
t = con.table("orders", database=("default", "public"))
# Aggregate and sort
summary = (
t.filter(t.amount > 10)
.group_by("region")
.agg(total=t.amount.sum(), n=t.count())
.order_by(ibis.desc("total"))
.execute()
)
# Arrow output
arrow_table = t.limit(1000).to_pyarrow()
# Streaming batches
with t.to_pyarrow_batches() as reader:
for batch in reader:
process(batch)
Raw SQL
Use con.sql(...) when you need Hotdata-specific syntax that Ibis doesn't model. You can chain Ibis expressions on the result:
base = con.sql(
'SELECT * FROM "default"."public"."orders"',
dialect="postgres",
)
result = base.filter(base.amount > 10).execute()
Connecting to existing sources
If you have existing databases or warehouses connected to your Hotdata workspace (Postgres, Snowflake, BigQuery, etc.), you can query them through the same Ibis connection. Set defaults to avoid qualifying every table reference:
con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
default_connection="my_postgres",
default_schema="public",
)
t = con.table("orders") # resolves to my_postgres.public.orders
Without defaults, pass database=(connection_id, schema) explicitly:
t = con.table("orders", database=("my_postgres", "public"))
Discover what's available:
con.list_catalogs() # connection IDs
con.list_databases(catalog="my_postgres") # schemas
con.list_tables(database=("my_postgres", "public")) # tables
con.get_schema("orders", catalog="my_postgres", database="public")
What's supported
| Feature | |
|---|---|
create_database / drop_database | yes |
create_table / drop_table (DataFrame or Arrow upload) | yes |
con.table(...) with full schema metadata | yes |
| Filter, select, join, group_by, agg, order_by, limit | yes |
con.sql(...) raw SQL | yes |
.execute() → pandas, .to_pyarrow(), .to_pyarrow_batches() | yes |
list_catalogs, list_databases, list_tables | yes |
| Temporary tables | no |
| Python UDFs | no |
| INSERT / UPDATE / DELETE on external connections | no |
SQL compilation uses Ibis's Postgres dialect. Use con.sql(...) as a fallback for expressions that don't compile cleanly.
See also
- hotdata-ibis on GitHub
- Ibis documentation
- Python SDK — lower-level
hotdataAPI client