pyflink.dataframe.sql.sql#

sql(query: str, *, auto_bind: bool = True, **bindings: DataFrame) → DataFrame[source]#

Run a SQL SELECT query against DataFrames in scope, returning a DataFrame.

There are two independent ways DataFrames get bound to SQL names:

Explicit bindings via **bindings kwargs. These are user contracts: any problem (bad view name, non-DataFrame value, collision with an existing temporary table/view) is reported as ValueError.
Auto-bind (when auto_bind=True, the default): scans the caller’s globals and locals for DataFrame instances and registers them under their Python variable name. Best-effort: collisions warn and skip, invalid view names warn and skip, different-TableEnvironment DataFrames log and skip.

Parameters:

query – A SQL SELECT statement. Other statement kinds (INSERT, DDL) are not supported.
auto_bind – If True (default), auto-detect DataFrame instances in the caller’s globals and locals. See above.
**bindings – Explicit alias -> DataFrame mappings. Always registered, even when auto_bind is False. Take precedence over auto-bind on name collision.

Returns:

A new DataFrame backed by the query result.

Note

This function is not safe under concurrent calls against the same TableEnvironment: register / sql_query / drop is not an atomic sequence. Don’t call it from multiple threads sharing one t_env.

Example:

>>> import pyflink.dataframe as pf
>>> df1 = pf.from_dict({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> df2 = pf.from_dict({"a": [1, 2, 3], "c": ["p", "q", "r"]})
>>> pf.sql("SELECT * FROM df1 JOIN df2 ON df1.a = df2.a").show()

pyflink.dataframe.sql.sql#

This Page