Core Concepts

Introduction

Hotdata gives you on-demand OLAP databases that you can create, populate with parquet, query, and destroy, all through a single API. There is no infrastructure to provision and no schema to migrate.

The primary use case: an agent or application creates a database for a specific request, loads data into it, runs analytical queries (including vector search, full-text search, and geospatial queries), then discards the database when done. The whole lifecycle can happen in seconds.

You can also connect Hotdata to your existing databases and warehouses — Postgres, Snowflake, BigQuery, and others — and query them alongside your managed databases in a single SQL statement. No ETL, no replication.

Two ways to get data in

Managed databases (on demand)

The fastest path. Create a database via the API, declare tables, upload parquet files, and start querying. Everything is provisioned on demand — there is no server to manage.

# Create a database and load data in under 30 seconds
hotdata databases create \
  --catalog mydb \
  --table orders

hotdata databases load \
  --catalog mydb --table orders \
  --url https://example.com/orders.parquet

hotdata query "SELECT COUNT(*) FROM mydb.public.orders"

Managed databases persist until you delete them, or until the optional expires_at time you set at creation passes. This makes them ideal for agent workflows, per-request analytics, and exploratory work where you need real compute on temporary data.
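To make the expiry idea concrete, here is a minimal Python sketch that computes a timestamp a fixed number of minutes in the future. It assumes expires_at accepts an ISO 8601 UTC timestamp, which this page does not confirm; the helper name `expiry_in` is hypothetical. Check the API Reference for the actual format.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_in(minutes: int, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp `minutes` from `now`, formatted as ISO 8601.

    Assumption: expires_at takes an ISO 8601 timestamp (not confirmed here).
    """
    base = now if now is not None else datetime.now(timezone.utc)
    return (base + timedelta(minutes=minutes)).isoformat(timespec="seconds")


# A fixed clock keeps the example reproducible.
fixed_now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
example = expiry_in(15, now=fixed_now)
print(example)  # 2025-01-01T12:15:00+00:00
```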

See CLI Reference — Databases and API Reference — Databases.

Connections (existing sources)

Connect Hotdata to your existing databases and warehouses. Hotdata discovers the schema, caches it locally, and routes queries through the connection at execution time. Data is never moved or replicated unless you explicitly load it into a managed database.

Supported sources: Postgres, MySQL, Snowflake, BigQuery, DuckDB, and more. See Data Sources.

Once connected, tables are queryable as <connection>.<schema>.<table> in standard SQL. You can join across connections and managed databases in a single query.
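As a rough illustration of the naming scheme, the Python sketch below builds three-part table names and a cross-source join. Only `mydb.public.orders` comes from the example earlier on this page. The connection name `pg`, the `customers` table, and all column names are hypothetical.

```python
def qualified(source: str, schema: str, table: str) -> str:
    """Return a three-part name: <connection or catalog>.<schema>.<table>."""
    return f"{source}.{schema}.{table}"


# `mydb` is the managed database created in the earlier CLI example.
orders = qualified("mydb", "public", "orders")
# `pg` is a hypothetical connection name; `customers` is a hypothetical table.
customers = qualified("pg", "public", "customers")

# Column names (customer_id, id, name, amount) are assumptions for illustration.
join_sql = (
    "SELECT c.name, SUM(o.amount) AS total "
    f"FROM {orders} AS o "
    f"JOIN {customers} AS c ON o.customer_id = c.id "
    "GROUP BY c.name"
)
print(join_sql)
```

The resulting string could then be passed to `hotdata query` in the same way as the `SELECT COUNT(*)` example above.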

How it fits together

                    ╔═ hotdata ══════════════════════════════════╗
╔══════════╗        ║                                            ║░
║          ║░  API  ║  ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓  ║░
║  client  ║░──────▶║  ┃  workspace                          ┃  ║░
║          ║░       ║  ┃  ┏━━━━━━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━┓ ┃  ║░
╚══════════╝░       ║  ┃  ┃ managed db    ┃ ┃ connection   ┃ ┃  ║░
 ░░░░░░░░░░░░       ║  ┃  ┃  (on demand)  ┃ ┃  (external)  ┃ ┃  ║░
                    ║  ┃  ┃ - parquet     ┃ ┃ - postgres   ┃ ┃  ║░
                    ║  ┃  ┃ - ephemeral   ┃ ┃ - snowflake  ┃ ┃  ║░
                    ║  ┃  ┃ - any SQL     ┃ ┃ - bigquery   ┃ ┃  ║░
                    ║  ┃  ┗━━━━━━━━━━━━━━━┛ ┗━━━━━━━━━━━━━━┛ ┃  ║░
                    ║  ┃              ↘         ↙             ┃  ║░
                    ║  ┃         hybrid query engine          ┃  ║░
                    ║  ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛  ║░
                    ╚════════════════════════════════════════════╝░
                     ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Organization

A boundary for users and workspaces. All activity is scoped within an organization, including access control, resource limits, and usage tracking. Organizations isolate teams and environments while sharing a common governance layer. See API Reference — Workspaces.

Workspace

An isolated execution environment provisioned on demand. Each workspace runs independently with its own compute, storage, and security boundary. Workspaces persist until explicitly deleted, allowing agents and applications to create and use them without affecting other workloads. See API Reference — Workspaces and CLI Reference — Workspaces.

Managed Databases

Hotdata-owned OLAP databases you create via the API, populate with parquet, and query immediately. Unlike connections, managed databases have no external dependency — you define the schema, load the data, and Hotdata handles the rest.

Key properties:

Created on demand — a single API call or CLI command is all you need
Loaded from parquet — upload a file or point to a URL; no schema migration required
Any SQL — analytical queries, window functions, vector search, full-text, geospatial, all in one engine
Ephemeral or persistent — set expires_at for automatic cleanup, or delete explicitly

Tables inside a managed database are addressed as <catalog>.<schema>.<table> in SQL, where <catalog> is the alias set with --catalog at create time.

See CLI Reference — Databases and API Reference — Databases.

Connection

A configuration for an external data source (Postgres, Snowflake, BigQuery, SaaS APIs, and more). Connections are created within a workspace and used to discover and query tables from existing systems. Each connection has a unique ID and operates through controlled, read-only access. Schema discovery and metadata are cached for faster access.

Tables from a connection are addressed as <connection>.<schema>.<table> in SQL — no data movement required.

See Data Sources, API Reference — Connections, and CLI Reference — Connections.

Secrets

Credentials used by connections (passwords, tokens, API keys). Secrets are securely stored and scoped to a workspace. Values are never returned by read APIs and are injected only at execution time. This prevents leakage while allowing dynamic access to external systems. See API Reference — Secrets.
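The write-only behavior can be pictured with a toy in-memory store. This is a sketch of the idea only, not Hotdata's implementation: the class, its method names, and the `{{name}}` placeholder syntax are all hypothetical.

```python
class SecretStore:
    """Toy store: values can be written and injected, but never read back."""

    def __init__(self):
        self._values = {}

    def put(self, name, value):
        self._values[name] = value

    def describe(self, name):
        """Read path: returns metadata only, never the secret value."""
        if name not in self._values:
            raise KeyError(name)
        return {"name": name, "has_value": True}

    def inject(self, template):
        """Execution-time substitution of hypothetical {{name}} placeholders."""
        out = template
        for name, value in self._values.items():
            out = out.replace("{{" + name + "}}", value)
        return out


store = SecretStore()
store.put("pg_password", "s3cret")
metadata = store.describe("pg_password")
dsn = store.inject("postgres://app:{{pg_password}}@db/app")
```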

Saved Queries

Reusable query definitions that can be executed multiple times. They capture logic without storing results, making them useful for standard transformations, recurring analysis, and agent workflows. Saved queries can be versioned and combined with managed databases to build repeatable patterns. See API Reference — Saved Queries.

Persisted Results

Every query result is automatically stored in local storage. These results can be re-queried instantly, filtered, or joined without accessing the original source. This enables iterative workflows where each step builds on previous results. Persisted results also support time-based comparisons and replay. See API Reference — Results and CLI Reference — Results.

Vector Search

Uses usearch for approximate nearest neighbor search. Optimized for AVX-512 SIMD execution, enabling high-throughput similarity search on CPUs without GPU dependency. Designed for real-time retrieval of embeddings (text, images, etc.), supporting large-scale inference workloads with low latency. See SQL Reference — Vector search, CLI Reference — Search, and API Reference — Indexes.
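For intuition about what a nearest neighbor query returns, the Python sketch below ranks a few vectors by cosine similarity using an exact, brute-force scan. It is not how usearch works internally: an approximate index avoids scanning every vector and trades a little accuracy for speed. The document IDs and vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Exact (brute-force) nearest neighbors by cosine similarity."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Hypothetical embeddings keyed by document ID.
embeddings = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], embeddings, k=2)
print(nearest)  # ['doc_a', 'doc_b']
```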

Full-Text Search

Built-in ranked full-text retrieval. Indexed and SIMD-optimized for fast evaluation of term relevance. Supports phrase matching, token weighting, and ranking across large text corpora. Eliminates the need for external search systems while maintaining strong relevance and performance. See SQL Reference — Full-text search, CLI Reference — Search, and API Reference — Indexes.
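To show what ranked retrieval means in practice, here is a small Python sketch that scores documents with the Okapi BM25 formula. BM25 is used only because it is a common relevance function; this page does not say which ranking function Hotdata uses. The tokenizer is a plain whitespace split and the documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes long documents.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


docs = {
    "d1": "late delivery refund request",
    "d2": "refund policy for damaged items refund",
    "d3": "shipping times by region",
}
scores = bm25_scores("refund", docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['d2', 'd1', 'd3']
```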

Geospatial Queries

Native support for spatial data types and operations such as distance calculations, containment checks, intersections, and bounding boxes. Enables location-aware filtering and joins within the same execution engine. Works alongside other query types, allowing spatial constraints to be combined with analytical, vector, and text queries. See SQL Reference — Geospatial functions.
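Two of the operations named above, distance calculation and bounding-box containment, can be sketched in a few lines of Python. This uses the haversine great-circle formula on a spherical Earth as an illustration; it says nothing about the functions or algorithms Hotdata exposes, which are listed in the SQL Reference.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bbox(lat, lon, min_lat, min_lon, max_lat, max_lon):
    """True if the point lies inside the axis-aligned bounding box."""
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


paris = (48.8566, 2.3522)
london = (51.5074, -0.1278)
distance = haversine_km(*paris, *london)  # roughly 344 km
paris_in_box = in_bbox(*paris, 48.0, 2.0, 49.0, 3.0)
```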

OLAP (Analytical Queries)

Supports fast aggregations, filtering, and group-by operations over large datasets. Execution is vectorized and columnar, enabling efficient use of CPU and memory. Designed for analytical workloads where latency and throughput both matter. See SQL Reference — Aggregate functions, SQL Reference — Window functions, and API Reference — Query.
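The Python sketch below illustrates the columnar idea with a filtered group-by over one list per column. It is a teaching example with made-up data, not a description of the engine's internals, and plain Python loops are of course not vectorized.

```python
from collections import defaultdict

# Columnar layout: one list per column instead of one record per row.
columns = {
    "region": ["eu", "us", "eu", "us", "eu"],
    "amount": [10.0, 20.0, 30.0, 5.0, 2.5],
}


def group_sum(cols, key, value, min_value=0.0):
    """Roughly: SELECT key, SUM(value) ... WHERE value >= min_value GROUP BY key."""
    totals = defaultdict(float)
    # Only the two columns the query needs are read.
    for k, v in zip(cols[key], cols[value]):
        if v >= min_value:
            totals[k] += v
    return dict(totals)


totals = group_sum(columns, "region", "amount", min_value=5.0)
print(totals)  # {'eu': 40.0, 'us': 25.0}
```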

Hybrid Queries

Combines multiple query types in a single execution plan. For example: full-text search → vector similarity → relational filtering → geospatial constraints → final row retrieval. This avoids coordinating multiple systems and keeps execution within a single low-latency path. See API Reference — Query and SQL Reference — Overview.
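The example pipeline can be mimicked in Python as a chain of filters over a handful of rows. This is only a sketch of the data flow: a real engine would plan these stages together rather than run them one list at a time, and the rows, field names, and thresholds are invented.

```python
import math

rows = [
    {"id": 1, "text": "quiet cafe with wifi", "vec": [0.9, 0.1], "open": True, "lat": 48.86, "lon": 2.35},
    {"id": 2, "text": "busy cafe downtown", "vec": [0.2, 0.8], "open": True, "lat": 48.85, "lon": 2.34},
    {"id": 3, "text": "quiet cafe by the river", "vec": [0.8, 0.2], "open": False, "lat": 48.87, "lon": 2.36},
    {"id": 4, "text": "quiet cafe near station", "vec": [0.85, 0.15], "open": True, "lat": 51.50, "lon": -0.12},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid(rows, term, query_vec, min_sim, bbox):
    min_lat, min_lon, max_lat, max_lon = bbox
    out = [r for r in rows if term in r["text"].split()]              # full-text match
    out = [r for r in out if cosine(r["vec"], query_vec) >= min_sim]  # vector similarity
    out = [r for r in out if r["open"]]                               # relational filter
    out = [r for r in out                                             # geospatial constraint
           if min_lat <= r["lat"] <= max_lat and min_lon <= r["lon"] <= max_lon]
    return [r["id"] for r in out]                                     # final row retrieval


result = hybrid(rows, "quiet", [1.0, 0.0], min_sim=0.9, bbox=(48.8, 2.3, 48.9, 2.4))
print(result)  # [1]
```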

Joining Across Results

Query results can be treated as tables and queried again. This allows joining across:

previous query outputs
different data sources
time-based snapshots

This model supports iterative computation, where each step refines the result without recomputing from the original data. It enables complex workflows to be expressed as a sequence of lightweight, composable queries. See API Reference — Query, API Reference — Databases, and SQL Reference — SELECT syntax.
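The iterative pattern can be tried locally with SQLite as a stand-in. In this sketch each step is saved as a table by hand; the table names `result_totals` and `result_top` are hypothetical, and nothing here reflects how Hotdata names or exposes stored results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'ada', 40.0), (2, 'bob', 15.0), (3, 'ada', 25.0), (4, 'cy', 5.0);
""")

# Step 1: run a query and keep its output as a table.
conn.execute("""
CREATE TABLE result_totals AS
SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
""")

# Step 2: refine from the stored result, without touching `orders` again.
conn.execute("""
CREATE TABLE result_top AS
SELECT customer FROM result_totals WHERE total >= 20
""")

# Step 3: join two previous outputs.
rows = conn.execute("""
SELECT t.customer, t.total
FROM result_totals AS t
JOIN result_top AS p ON p.customer = t.customer
ORDER BY t.total DESC
""").fetchall()
print(rows)  # [('ada', 65.0)]
```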