pyflink.dataframe.dataframe.DataFrame.explain#
- DataFrame.explain(*, show_estimated_cost: bool = False, show_physical_execution_plan: bool = False) None[source]#
Print the AST and execution plan of this DataFrame.
- Parameters:
show_estimated_cost – If True, include the optimizer’s estimated cost (row count, cpu, io, etc.) for each physical node. Default is False.
show_physical_execution_plan – If True, include the physical execution plan in JSON format. Default is False.
Example:
>>> import pyflink.dataframe as pf >>> df = pf.from_records([(1, "a"), (2, "b")], schema=["id", "name"]) >>> df.explain() == Abstract Syntax Tree == LogicalTableScan(table=[[*anonymous_python-input-format$1*]]) == Optimized Physical Plan == TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name]) == Optimized Execution Plan == TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name]) >>> df.explain(show_estimated_cost=True) == Abstract Syntax Tree == LogicalTableScan(table=[[*anonymous_python-input-format$1*]]) == Optimized Physical Plan == TableSourceScan(...): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, ...} == Optimized Execution Plan == TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name]) >>> df.explain(show_physical_execution_plan=True) == Abstract Syntax Tree == LogicalTableScan(table=[[*anonymous_python-input-format$1*]]) == Optimized Physical Plan == TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name]) == Optimized Execution Plan == TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name]) == Physical Execution Plan == { "nodes" : [ { "id" : 9, "type" : "Source: *anonymous_filesystem$1*[9]", "pact" : "Data Source", "contents" : "[9]:TableSourceScan(table=[[*anonymous_filesystem$1*, filter=[IS...", "parallelism" : 24 } ] }