pyflink.dataframe.dataframe.DataFrame.explain

DataFrame.explain(*, show_estimated_cost: bool = False, show_physical_execution_plan: bool = False) → None

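Print the abstract syntax tree (AST) and the optimized plans of this DataFrame. The method prints its result and returns None. Both parameters are keyword-only.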

Parameters:

show_estimated_cost – If True, include the optimizer’s estimated cost (row count, CPU, I/O, etc.) for each physical node. Default is False.
show_physical_execution_plan – If True, append the physical execution plan in JSON format as an additional section. Default is False.

Example:

>>> import pyflink.dataframe as pf
>>> df = pf.from_records([(1, "a"), (2, "b")], schema=["id", "name"])
>>> df.explain()
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

>>> df.explain(show_estimated_cost=True)
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(...): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, ...}

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

>>> df.explain(show_physical_execution_plan=True)
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Physical Execution Plan ==
{
  "nodes" : [ {
    "id" : 9,
    "type" : "Source: *anonymous_filesystem$1*[9]",
    "pact" : "Data Source",
    "contents" : "[9]:TableSourceScan(table=[[*anonymous_filesystem$1*, filter=[IS...",
    "parallelism" : 24
  } ]
}
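
Because explain() returns None, any programmatic inspection has to work on the printed text. The following is a minimal sketch of one way to do that, not part of the PyFlink API: it splits text in the format shown above into sections keyed by their `== Title ==` headers and decodes the JSON section. The helper name `split_explain_sections` and the constant `SAMPLE` are hypothetical, and the sample text is copied from the example output above rather than captured from a live DataFrame. How the text would be captured in practice (for example with `contextlib.redirect_stdout`) is an assumption that depends on how explain() emits its output.

```python
import json
import re

# Sample text copied from the documented example output; in real use this
# would be whatever explain(show_physical_execution_plan=True) printed.
SAMPLE = """\
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Physical Execution Plan ==
{
  "nodes" : [ {
    "id" : 9,
    "type" : "Source: *anonymous_filesystem$1*[9]",
    "pact" : "Data Source",
    "contents" : "[9]:TableSourceScan(table=[[*anonymous_filesystem$1*, filter=[IS...",
    "parallelism" : 24
  } ]
}
"""

# A section header is a line of the form "== Title ==".
HEADER = re.compile(r"^== (.+?) ==\s*$")


def split_explain_sections(text):
    """Hypothetical helper: map each section title to its body text."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


sections = split_explain_sections(SAMPLE)

# The physical execution plan section is JSON, so it can be decoded directly.
plan = json.loads(sections["Physical Execution Plan"])
print(list(sections))
print(plan["nodes"][0]["parallelism"])
```

Keying the sections by title keeps the sketch independent of which optional sections are present: without show_physical_execution_plan=True the "Physical Execution Plan" key is simply absent.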
