pyflink.dataframe.sql.sql#
- sql(query: str, *, auto_bind: bool = True, **bindings: DataFrame) DataFrame[source]#
Run a SQL SELECT query against DataFrames in scope, returning a DataFrame.
There are two independent ways DataFrames get bound to SQL names:
Explicit bindings via
**bindingskwargs. These are user contracts: any problem (bad view name, non-DataFrame value, collision with an existing temporary table/view) is reported asValueError.Auto-bind (when
auto_bind=True, the default): scans the caller’s globals and locals forDataFrameinstances and registers them under their Python variable name. Best-effort: collisions warn and skip, invalid view names warn and skip, different-TableEnvironment DataFrames log and skip.
- Parameters:
query – A SQL SELECT statement. Other statement kinds (INSERT, DDL) are not supported.
auto_bind – If True (default), auto-detect
DataFrameinstances in the caller’s globals and locals. See above.**bindings – Explicit alias -> DataFrame mappings. Always registered, even when
auto_bindis False. Take precedence over auto-bind on name collision.
- Returns:
A new DataFrame backed by the query result.
Note
This function is not safe under concurrent calls against the same TableEnvironment: register / sql_query / drop is not an atomic sequence. Don’t call it from multiple threads sharing one t_env.
Example:
>>> import pyflink.dataframe as pf >>> df1 = pf.from_dict({"a": [1, 2, 3], "b": ["x", "y", "z"]}) >>> df2 = pf.from_dict({"a": [1, 2, 3], "c": ["p", "q", "r"]}) >>> pf.sql("SELECT * FROM df1 JOIN df2 ON df1.a = df2.a").show()