Data
get_df
Retrieve data from configured sources as pandas DataFrames
The get_df function retrieves data from a configured source and returns it as a pandas DataFrame. For database sources (PostgreSQL, ClickHouse), a table name must be specified.
Parameters
- source_name (str): Name of the data source as configured in preswald.toml
- table_name (Optional[str]): Required for database sources; specifies which table to retrieve
Returns
- pd.DataFrame: Data from the specified source as a pandas DataFrame
Usage Examples
Note: connect() must be called before get_df can be used.
CSV Source
For CSV sources, table_name is not required, since the entire CSV file is treated as a single table:
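A minimal sketch of loading a CSV source. The helper name load_csv_source and the source name sample_csv are placeholders for illustration; the source is assumed to be defined in preswald.toml. The import sits inside the function only so the snippet can be loaded without preswald installed.

```python
def load_csv_source(source_name="sample_csv"):
    """Load a CSV source by its name in preswald.toml (placeholder name)."""
    # Imported lazily so this module loads even where preswald is absent.
    from preswald import connect, get_df

    connect()  # must be called before get_df
    # No table_name: the entire CSV file is treated as a single table.
    return get_df(source_name)
```

Calling load_csv_source("sample_csv") should return the file's contents as a pandas DataFrame, provided the source is configured.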
PostgreSQL Source
For PostgreSQL sources, table_name is required:
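A minimal sketch of reading one table from a PostgreSQL source. The helper name load_postgres_table, the source name my_postgres, and the table name users are hypothetical; passing table_name as a keyword argument follows the parameter names listed above.

```python
def load_postgres_table(source_name="my_postgres", table_name="users"):
    """Load one table from a PostgreSQL source (names are placeholders)."""
    # Imported lazily so this module loads even where preswald is absent.
    from preswald import connect, get_df

    connect()  # must be called before get_df
    # Database sources need table_name to select which table to retrieve.
    return get_df(source_name, table_name=table_name)
```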
ClickHouse Source
Similarly, for ClickHouse sources, table_name is required:
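The call has the same shape as for PostgreSQL. As a small variation, the sketch below loads several tables from one ClickHouse source into a dictionary after a single connect() call. The helper name load_clickhouse_tables and any source or table names passed to it are hypothetical.

```python
def load_clickhouse_tables(source_name, table_names):
    """Return {table_name: DataFrame} for each requested table."""
    # Imported lazily so this module loads even where preswald is absent.
    from preswald import connect, get_df

    connect()  # called once, before any get_df call
    # Each get_df call names the table explicitly, as database sources require.
    return {name: get_df(source_name, table_name=name) for name in table_names}
```

For example, load_clickhouse_tables("my_clickhouse", ["events", "users"]) would return one entry per table, assuming both tables exist in the configured source.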
Error Handling
get_df handles errors as follows:
- Validates that the named source exists
- Checks that table_name is provided for database sources
- Handles connection and query errors
- Provides detailed error messages through logging
Best Practices
- Always call connect() before using get_df
- Check that the source exists in preswald.toml before calling get_df
- For database sources, always provide table_name
- Use error handling when calling the function
- Consider memory limitations when retrieving large datasets
Example with error handling:
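A hedged sketch of defensive usage. This page does not say whether get_df raises an exception or returns None on failure, so the hypothetical wrapper safe_get_df below guards against both; treat that as an assumption rather than documented behavior.

```python
import logging

logger = logging.getLogger(__name__)


def safe_get_df(source_name, table_name=None):
    """Return the DataFrame, or None if the source could not be read."""
    # Imported lazily so this module loads even where preswald is absent.
    from preswald import connect, get_df

    try:
        connect()  # must be called before get_df
        if table_name is None:
            df = get_df(source_name)  # file sources such as CSV
        else:
            df = get_df(source_name, table_name=table_name)  # database sources
    except Exception as exc:
        # Assumption: failures may surface as exceptions.
        logger.error("Could not load source %r: %s", source_name, exc)
        return None

    if df is None:
        # Assumption: failures may also surface as a None result.
        logger.error("Source %r returned no data", source_name)
    return df
```

Callers can then branch on the result, for example skipping downstream processing when safe_get_df returns None.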
Related Functions
- connect(): Must be called before using get_df
- query(): For custom SQL queries against data sources
- view(): For rendering DataFrame previews