query
Execute SQL queries against configured data sources
The query function executes SQL queries against configured data sources and returns the results as a pandas DataFrame. It supports all data source types (CSV, PostgreSQL, ClickHouse).
Parameters
- sql (str): SQL query to execute
- source_name (str): Name of the data source as configured in preswald.toml
Returns
- pd.DataFrame: Query results as a pandas DataFrame
Usage Examples
CSV Source
For CSV sources, you can query the data using standard SQL:
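A minimal sketch of a CSV query, written as a small helper so it can be exercised without a live source. The source name `sample_csv`, the column `value`, and the helper `query_csv_source` are hypothetical, and the sketch assumes the CSV source is addressed in SQL by its source name.

```python
def query_csv_source(query_fn, source_name="sample_csv", min_value=50):
    """Run a filtered SELECT against a CSV source.

    query_fn is expected to behave like query(sql, source_name).
    """
    # Assumption: the table name in SQL matches the source name, and the
    # file has a numeric column called `value` (both hypothetical).
    sql = f"SELECT * FROM {source_name} WHERE value > {int(min_value)}"
    return query_fn(sql, source_name)


# With Preswald installed, usage would look roughly like this
# (assuming both functions are importable from the top-level package):
#   from preswald import connect, query
#   connect()
#   df = query_csv_source(query)
```

Passing the query function in as an argument keeps the helper easy to test with a stand-in.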
PostgreSQL Source
For PostgreSQL sources, queries are executed directly against the database:
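As a sketch under stated assumptions, an aggregate query against a PostgreSQL source might look like the following. The source name `pg_main`, the table `orders`, the column `status`, and the helper `count_orders_by_status` are all hypothetical; how tables are addressed for a database source is also an assumption.

```python
def count_orders_by_status(query_fn, source_name="pg_main"):
    """Count rows per status in a hypothetical `orders` table.

    query_fn is expected to behave like query(sql, source_name).
    """
    # Aggregating in SQL keeps the returned DataFrame small.
    sql = """
        SELECT status, COUNT(*) AS n
        FROM orders
        GROUP BY status
        ORDER BY n DESC
    """
    return query_fn(sql, source_name)


# With Preswald installed (assumed import path):
#   from preswald import connect, query
#   connect()
#   df = count_orders_by_status(query)
```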
ClickHouse Source
ClickHouse queries are also supported:
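A comparable sketch for a ClickHouse source, here a top-N query with an explicit row limit. The source name `clickhouse_events`, the table `events`, the column `event_type`, and the helper `top_event_types` are hypothetical, and the sketch sticks to plain SQL rather than assuming ClickHouse-specific functions are available.

```python
def top_event_types(query_fn, source_name="clickhouse_events", limit=20):
    """Return the most frequent event types from a hypothetical `events` table.

    query_fn is expected to behave like query(sql, source_name).
    """
    limit = int(limit)  # reject non-numeric input before building SQL
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        "SELECT event_type, COUNT(*) AS events "
        "FROM events "
        "GROUP BY event_type "
        "ORDER BY events DESC "
        f"LIMIT {limit}"
    )
    return query_fn(sql, source_name)


# With Preswald installed (assumed import path):
#   from preswald import connect, query
#   connect()
#   df = top_event_types(query, limit=10)
```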
Query Engine
The query function uses DuckDB as the underlying query engine:
- For CSV files, DuckDB reads and processes the data directly
- For PostgreSQL, queries are executed through the postgres_scanner extension
- For ClickHouse, queries use the clickhouse_scanner extension
Error Handling
The function handles the following error cases:
- Validates source existence
- Validates SQL syntax
- Handles connection and query errors
- Provides detailed error messages through logging
Example with error handling:
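A minimal sketch of defensive calling code. Whether query raises on failure or only logs and returns nothing is not stated on this page, so the hypothetical wrapper `safe_query` covers both cases: it catches any exception and also treats a None result as a failure signal.

```python
import logging

logger = logging.getLogger(__name__)


def safe_query(query_fn, sql, source_name):
    """Return the query result, or None if the query fails.

    query_fn is expected to behave like query(sql, source_name).
    """
    try:
        result = query_fn(sql, source_name)
    except Exception as exc:
        # The exact exception types are an unknown here, so catch broadly.
        logger.error("Query against %r failed: %s", source_name, exc)
        return None
    if result is None:
        logger.warning("Query against %r returned no result", source_name)
    return result


# With Preswald installed (assumed import path):
#   from preswald import connect, query
#   connect()
#   df = safe_query(query, "SELECT * FROM sample_csv", "sample_csv")
#   if df is not None:
#       ...  # continue with the DataFrame
```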
Best Practices
- Always call connect() before using query
- Always check that the source exists in preswald.toml before querying
- Use parameterized queries when possible to prevent SQL injection
- Consider query performance and data volume
- Include appropriate WHERE clauses to limit result sets
- Use error handling when executing queries
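The injection and result-size points above can be sketched as follows. Only the sql and source_name parameters are documented on this page, so this example does not assume bind parameters exist; it falls back on validating identifiers, escaping string literals by doubling single quotes (standard SQL), and always applying a LIMIT. The helper `build_filtered_sql` is hypothetical, and true parameter binding is preferable wherever it is available.

```python
def build_filtered_sql(table, column, value, limit=100):
    """Build a SELECT with one equality filter and a row limit."""
    # Identifiers cannot be escaped like values, so restrict them to
    # simple names (letters, digits, underscores).
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be simple identifiers")
    # Standard SQL escapes a single quote inside a literal by doubling it.
    escaped = str(value).replace("'", "''")
    return (
        f"SELECT * FROM {table} "
        f"WHERE {column} = '{escaped}' "
        f"LIMIT {int(limit)}"
    )


# The resulting string would then be passed to query() together with
# the source name, for example:
#   sql = build_filtered_sql("users", "name", "O'Brien", limit=10)
```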
Related Functions
- connect(): Must be called before using query
- get_df(): For retrieving entire tables/datasets
- view(): For rendering query result previews