PyFlink 1.20+vvr.11.7.dev0 documentation

pyflink.dataframe.dataframe.DataFrame.explain

DataFrame.explain(*, show_estimated_cost: bool = False, show_physical_execution_plan: bool = False) → None

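Print the abstract syntax tree (AST), the optimized physical plan, and the optimized execution plan of this DataFrame. The plan is printed rather than returned; the method itself returns None. Both parameters are keyword-only.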

Parameters:
  • show_estimated_cost – If True, include the optimizer’s estimated cost (row count, CPU, I/O, etc.) for each node of the optimized physical plan. Default is False.

  • show_physical_execution_plan – If True, append a "Physical Execution Plan" section that contains the physical execution plan in JSON format. Default is False.

Example:

>>> import pyflink.dataframe as pf
>>> df = pf.from_records([(1, "a"), (2, "b")], schema=["id", "name"])
>>> df.explain()
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

>>> df.explain(show_estimated_cost=True)
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(...): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, ...}

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

>>> df.explain(show_physical_execution_plan=True)
== Abstract Syntax Tree ==
LogicalTableScan(table=[[*anonymous_python-input-format$1*]])

== Optimized Physical Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Optimized Execution Plan ==
TableSourceScan(table=[[*anonymous_python-input-format$1*]], fields=[id, name])

== Physical Execution Plan ==
{
  "nodes" : [ {
    "id" : 9,
    "type" : "Source: *anonymous_filesystem$1*[9]",
    "pact" : "Data Source",
    "contents" : "[9]:TableSourceScan(table=[[*anonymous_filesystem$1*, filter=[IS...",
    "parallelism" : 24
  } ]
}
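
The printed output above is organized into sections introduced by "== Title ==" header lines. As a rough sketch, plan text that has already been captured as a string can be split on those headers, and the "Physical Execution Plan" section can then be parsed as JSON. The helper split_plan_sections and the abbreviated sample text are hypothetical; the sketch assumes the header layout shown in the examples and that the JSON section is complete and valid.

```python
import json
import re

# Matches header lines such as "== Optimized Physical Plan ==".
SECTION_HEADER = re.compile(r"^== (.+?) ==\s*$")


def split_plan_sections(plan_text):
    """Return a dict mapping each section title to its body text."""
    sections = {}
    current = None
    for line in plan_text.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(body).strip() for title, body in sections.items()}


# Hand-written, abbreviated sample that follows the layout shown above.
sample = """\
== Abstract Syntax Tree ==
LogicalTableScan(table=[[demo]])

== Physical Execution Plan ==
{
  "nodes" : [ {
    "id" : 9,
    "pact" : "Data Source",
    "parallelism" : 24
  } ]
}
"""

sections = split_plan_sections(sample)
physical = json.loads(sections["Physical Execution Plan"])
parallelism_by_node = {node["id"]: node["parallelism"] for node in physical["nodes"]}
```

Here parallelism_by_node maps each node id to its parallelism, which is one way to check the parallelism of individual operators without reading the JSON by eye.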
