Core Concepts
Organization
A boundary for users and workspaces. All activity is scoped within an organization, including access control, resource limits, and usage tracking. Organizations isolate teams and environments while sharing a common governance layer.
Workspace
An isolated execution environment provisioned on demand. Each workspace runs independently with its own compute, storage, and security boundary. Workspaces persist until explicitly deleted, allowing agents and applications to create and use them without affecting other workloads.
Connection
A configuration for a data source (for example, Postgres, Snowflake, BigQuery, or SaaS APIs). Connections are created within a workspace and used to discover and query tables. Each connection has a unique ID and operates through controlled, read-only access. Schema discovery and metadata are cached for faster access.
Datasets
Materialized results of queries or uploaded files. Datasets represent a snapshot of data at a point in time and are stored locally for reuse. They can be queried, joined, and transformed without re-accessing the original source. This reduces latency and avoids repeated scans of upstream systems.
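The snapshot semantics can be sketched with sqlite3 standing in for local storage; the table and column names here are illustrative, not the platform's API:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for local dataset storage
con.execute("CREATE TABLE source_orders (id INTEGER, amount REAL)")
con.executemany("INSERT INTO source_orders VALUES (?, ?)",
                [(1, 10.0), (2, 25.0), (3, 40.0)])

# Materialize a query result as a dataset: a point-in-time snapshot.
con.execute("CREATE TABLE dataset_large_orders AS "
            "SELECT * FROM source_orders WHERE amount > 20")

# Later changes to the source do not affect the snapshot.
con.execute("INSERT INTO source_orders VALUES (4, 99.0)")

rows = con.execute("SELECT id FROM dataset_large_orders ORDER BY id").fetchall()
print(rows)  # → [(2,), (3,)] — only the rows captured at creation time
```

The key property is that the second query never touches source_orders again, which is what avoids repeated scans of upstream systems.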
Secrets
Credentials used by connections (passwords, tokens, API keys). Secrets are securely stored and scoped to a workspace. Values are never returned by read APIs and are injected only at execution time. This prevents leakage while allowing dynamic access to external systems.
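The split between read APIs and execution-time injection can be sketched in Python; the store, function names, and workspace IDs below are hypothetical, not the platform's API:

```python
# Hypothetical workspace-scoped secret store (illustration only).
_store = {("ws-1", "PG_PASSWORD"): "s3cr3t"}

def read_secret(workspace, name):
    """Read API: confirms existence and metadata, never returns the value."""
    return {"name": name, "exists": (workspace, name) in _store}

def run_with_secret(workspace, name, fn):
    """Inject the secret value only for the duration of execution."""
    value = _store[(workspace, name)]
    try:
        return fn(value)
    finally:
        del value  # value is not retained after execution

meta = read_secret("ws-1", "PG_PASSWORD")   # no value in the response
result = run_with_secret("ws-1", "PG_PASSWORD", lambda pw: len(pw))
```

Because only run_with_secret ever sees the value, callers can use credentials dynamically without any API response carrying them.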
Saved Queries
Reusable query definitions that can be executed multiple times. They capture logic without storing results, making them useful for standard transformations, recurring analysis, and agent workflows. Saved queries can be versioned and combined with datasets to build repeatable patterns.
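The logic-without-results idea can be sketched as parameterized SQL text stored by name, using sqlite3; the names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (day TEXT, clicks INTEGER)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [("mon", 3), ("tue", 7), ("tue", 5)])

# A saved query captures logic, not results: SQL text plus parameters.
saved_queries = {
    "daily_clicks": "SELECT day, SUM(clicks) FROM events WHERE day = ? GROUP BY day",
}

def run_saved(con, name, params):
    return con.execute(saved_queries[name], params).fetchall()

out = run_saved(con, "daily_clicks", ("tue",))  # fresh results on each run
print(out)
```

Each execution re-evaluates against current data, which is what makes saved queries suitable for recurring analysis and agent workflows.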
Sessions
A workspace-scoped grouping for exploratory work. While a session is active, your activity is tied to it; datasets created in a session are deleted when the session ends, so keep anything you need long term outside the session. Sessions can include markdown notes for context. Use the hotdata sessions CLI commands to create, switch, and run nested sessions; see CLI Reference — Sessions.
Schedules
Rules that control how and when data is refreshed. Data sources refresh daily by default, with support for custom schedules (cron-based or usage-driven). Frequently accessed datasets can be refreshed more aggressively, while less active data can be refreshed lazily to balance cost and performance.
Persisted Results
Every query result is automatically persisted to local storage. These results can be re-queried instantly, filtered, or joined without accessing the original source. This enables iterative workflows where each step builds on previous results. Persisted results also support time-based comparisons and replay.
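The iterative pattern — each step querying a previous result rather than the source — can be sketched with sqlite3 as a stand-in for local result storage:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (user TEXT, ms INTEGER)")
con.executemany("INSERT INTO logs VALUES (?, ?)",
                [("a", 120), ("b", 300), ("a", 80), ("b", 500)])

# Step 1: persist an aggregate result.
con.execute("CREATE TABLE result_1 AS "
            "SELECT user, AVG(ms) AS avg_ms FROM logs GROUP BY user")

# Step 2: build on the persisted result, not the original logs.
slow = con.execute("SELECT user FROM result_1 WHERE avg_ms > 200").fetchall()
print(slow)  # → [('b',)]
```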
Vector Search
Uses usearch for approximate nearest neighbor search. Optimized for AVX-512 SIMD execution, enabling high-throughput similarity search on CPUs without GPU dependency. Designed for real-time retrieval of embeddings (text, images, etc.), supporting large-scale inference workloads with low latency.
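The engine itself uses usearch's approximate, SIMD-accelerated index; as a conceptual stand-in for the retrieval interface, here is an exact (brute-force) cosine-similarity search in pure Python:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy embedding store (real embeddings would be high-dimensional).
embeddings = {"doc1": (1.0, 0.0), "doc2": (0.7, 0.7), "doc3": (0.0, 1.0)}

def search(query, k=2):
    ranked = sorted(embeddings,
                    key=lambda d: cosine(query, embeddings[d]),
                    reverse=True)
    return ranked[:k]

top = search((0.9, 0.1))
print(top)  # → ['doc1', 'doc2']
```

An ANN index returns (approximately) the same top-k while avoiding the full scan this sketch performs.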
Full-Text Search
Built-in ranked full-text retrieval. Indexed and SIMD-optimized for fast evaluation of term relevance. Supports phrase matching, token weighting, and ranking across large text corpora. Eliminates the need for external search systems while maintaining strong relevance and performance.
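As an illustration of ranked retrieval — not the engine's actual SIMD-optimized scoring — a minimal term-frequency ranker over a toy corpus:

```python
from collections import Counter

docs = {
    "d1": "fast vector search on cpu",
    "d2": "full text search with ranking and phrase matching",
    "d3": "geospatial joins and filters",
}
# Inverted view: per-document term frequencies.
index = {doc_id: Counter(text.split()) for doc_id, text in docs.items()}

def rank(query):
    terms = query.split()
    scores = {doc_id: sum(tf[t] for t in terms) for doc_id, tf in index.items()}
    # Highest score first; drop documents that match no term.
    return [d for d, s in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
            if s > 0]

hits = rank("search ranking")
print(hits)  # → ['d2', 'd1']
```

Production rankers add phrase matching and token weighting on top of this basic term-relevance idea.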
Geospatial Queries
Native support for spatial data types and operations such as distance calculations, containment checks, intersections, and bounding boxes. Enables location-aware filtering and joins within the same execution engine. Works alongside other query types, allowing spatial constraints to be combined with analytical, vector, and text queries.
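Two of the named operations — distance calculation and bounding-box containment — can be sketched in plain Python (haversine great-circle distance; coordinates are illustrative):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def in_bbox(lat, lon, south, west, north, east):
    """Containment check against a bounding box."""
    return south <= lat <= north and west <= lon <= east

stores = {"downtown": (47.61, -122.33), "airport": (47.44, -122.30)}
here = (47.60, -122.33)

# Location-aware filter: stores within 5 km.
nearby = [name for name, (lat, lon) in stores.items()
          if haversine_km(here[0], here[1], lat, lon) < 5.0]
print(nearby)  # → ['downtown']
```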
OLAP (Analytical Queries)
Supports fast aggregations, filtering, and group-by operations over large datasets. Execution is vectorized and columnar, enabling efficient use of CPU and memory. Designed for analytical workloads where latency and throughput both matter.
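The columnar idea can be illustrated with parallel lists standing in for column storage; a real engine vectorizes this loop over column batches:

```python
# Columns stored as parallel arrays (a toy columnar layout).
region = ["us", "eu", "us", "eu", "us"]
sales  = [10,   20,   30,   40,   50]

# Group-by aggregation in a single pass over the columns.
totals = {}
for r, s in zip(region, sales):
    totals[r] = totals.get(r, 0) + s

print(totals)  # → {'us': 90, 'eu': 60}
```

Scanning only the columns a query touches, rather than whole rows, is what makes this layout efficient for aggregations and filters.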
Hybrid Queries
Combines multiple query types in a single execution plan. For example: full-text search → vector similarity → relational filtering → geospatial constraints → final row retrieval. This avoids coordinating multiple systems and keeps execution within a single low-latency path.
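The staged plan can be sketched over an in-memory table; the rows, vectors, and stage logic are illustrative, showing only how each stage narrows the previous stage's output:

```python
import math

rows = [
    {"id": 1, "text": "coffee shop downtown", "vec": (1.0, 0.0), "rating": 4.5},
    {"id": 2, "text": "coffee roastery",      "vec": (0.6, 0.8), "rating": 3.0},
    {"id": 3, "text": "tea house",            "vec": (0.9, 0.1), "rating": 4.8},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Stage 1: full-text filter.
candidates = [r for r in rows if "coffee" in r["text"]]
# Stage 2: vector-similarity ranking of the survivors.
candidates.sort(key=lambda r: cosine(r["vec"], (1.0, 0.0)), reverse=True)
# Stage 3: relational filter on the ranked set, then final row retrieval.
final = [r["id"] for r in candidates if r["rating"] >= 4.0]
print(final)  # → [1]
```

Running all three stages in one plan means no intermediate results ever leave the engine, which is the source of the low-latency path.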
Joining Across Results
Query results can be treated as datasets and queried again. This allows joining across:
- previous query outputs
- different data sources
- time-based snapshots
This model supports iterative computation, where each step refines the result without recomputing from the original data. It enables complex workflows to be expressed as a sequence of lightweight, composable queries.
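The results-as-datasets model can be sketched with sqlite3, treating two earlier query outputs as joinable tables for a time-based comparison; all names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two earlier query outputs, persisted as tables (time-based snapshots).
con.execute("CREATE TABLE result_today (user TEXT, clicks INTEGER)")
con.execute("CREATE TABLE result_last_week (user TEXT, clicks INTEGER)")
con.executemany("INSERT INTO result_today VALUES (?, ?)", [("a", 12), ("b", 3)])
con.executemany("INSERT INTO result_last_week VALUES (?, ?)", [("a", 8), ("b", 9)])

# Join the two snapshots directly, without rescanning any upstream source.
diff = con.execute(
    "SELECT t.user, t.clicks - w.clicks FROM result_today t "
    "JOIN result_last_week w ON t.user = w.user ORDER BY t.user"
).fetchall()
print(diff)  # → [('a', 4), ('b', -6)]
```

Each step in a workflow can produce a table like these, so later steps refine earlier ones instead of recomputing from the original data.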