DataFrame
A DataFrame is the core abstraction of the PyFlink DataFrame API.
It wraps a PyFlink Table and provides a pandas-like interface for
data transformations. Every transformation method returns a new DataFrame
instead of modifying the one it is called on, so calls can be chained
(an immutable, fluent API).
Example:
>>> import pyflink.dataframe as pf
>>> df = pf.from_dict({"a": [1, 2, 3], "b": ["x", "y", "z"]})
>>> df.select("a", "b").filter(pf.col("a") > 1).collect()
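The immutable, fluent style described above can be illustrated without a Flink runtime. The following is a minimal sketch in plain Python; `MiniFrame` is a hypothetical stand-in, not part of PyFlink, and it only mimics the idea that each transformation returns a new object and leaves the receiver untouched.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class MiniFrame:
    """Hypothetical toy frame; NOT the PyFlink DataFrame implementation."""

    def __init__(self, rows: List[Row]) -> None:
        # Copy the rows so later changes to the caller's list cannot leak in.
        self._rows = [dict(r) for r in rows]

    @classmethod
    def from_dict(cls, data: Dict[str, List[Any]]) -> "MiniFrame":
        # Turn {"a": [1, 2], "b": ["x", "y"]} into a list of row dicts.
        names = list(data)
        return cls([dict(zip(names, values)) for values in zip(*data.values())])

    def select(self, *names: str) -> "MiniFrame":
        # Return a NEW frame holding only the requested columns.
        return MiniFrame([{n: r[n] for n in names} for r in self._rows])

    def filter(self, predicate: Callable[[Row], bool]) -> "MiniFrame":
        # Return a NEW frame holding only the rows that satisfy the predicate.
        return MiniFrame([r for r in self._rows if predicate(r)])

    def collect(self) -> List[Row]:
        # Hand back copies so callers cannot mutate internal state.
        return [dict(r) for r in self._rows]


df = MiniFrame.from_dict({"a": [1, 2, 3], "b": ["x", "y", "z"]})
result = df.select("a", "b").filter(lambda r: r["a"] > 1)
```

In this sketch `df` still holds all three rows after the chain, while `result` holds only the rows where `a` is greater than 1.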
Select columns from the DataFrame.
Add a new column or replace an existing column.
Add multiple columns or replace existing columns.
Drop columns from the DataFrame.
Filter rows based on a predicate.
Rename columns.
Group the DataFrame by columns.
Apply aggregation expressions to the entire DataFrame.
Join with another DataFrame.
Apply a function to each row, producing a new DataFrame.
Apply a function to batches of rows, producing a new DataFrame.
Apply a chainable function to the DataFrame.
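Three of the entries above differ mainly in what the user function receives: a single row, a batch of rows, or the whole frame. The sketch below shows that distinction in plain Python over a list of row dicts; `map_rows`, `map_batches`, and `pipe` are hypothetical helper names chosen for illustration and are not confirmed PyFlink method names or signatures.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Rows = List[Row]


def map_rows(rows: Rows, fn: Callable[[Row], Row]) -> Rows:
    # Call fn once per row.
    return [fn(r) for r in rows]


def map_batches(rows: Rows, fn: Callable[[Rows], Rows], batch_size: int) -> Rows:
    # Call fn once per slice of up to batch_size rows, then concatenate.
    out: Rows = []
    for start in range(0, len(rows), batch_size):
        out.extend(fn(rows[start:start + batch_size]))
    return out


def pipe(rows: Rows, fn: Callable[..., Rows], *args: Any, **kwargs: Any) -> Rows:
    # Call fn once with the whole collection, which keeps call chains readable.
    return fn(rows, *args, **kwargs)


def add_batch_total(batch: Rows) -> Rows:
    # A batch function can use information shared across the rows it receives.
    total = sum(r["a"] for r in batch)
    return [{**r, "batch_total": total} for r in batch]


rows = [{"a": 1}, {"a": 2}, {"a": 4}]
doubled = map_rows(rows, lambda r: {"a": r["a"] * 2})
batched = map_batches(rows, add_batch_total, batch_size=2)
first_two = pipe(rows, lambda rs, n: rs[:n], 2)
```

Batch-wise application is typically chosen when per-call overhead matters or when the function is vectorized; row-wise application is simpler when each row is independent.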
Limit the DataFrame to the first n rows.
Skip the first n rows of the DataFrame.
Return the first n rows as a new DataFrame.
Convert to a PyFlink Table.
Convert to a pandas DataFrame.
Collect the DataFrame results to a list.
Return an iterator over rows of the DataFrame.
Return an iterator of pandas DataFrames, each containing up to batch_size rows.
Write the DataFrame to Parquet file(s) at the given path.
Write the DataFrame to a Kafka topic.
Write the DataFrame to a MaxCompute (ODPS) table.
Write the DataFrame to a Paimon table.
Write the DataFrame to an SLS (Aliyun Log Service) logstore.
Write the DataFrame to a Hologres table.
Write the DataFrame using a custom connector.
Print the AST and execution plan of this DataFrame.
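The row-limiting and batch-iteration entries above have simple semantics that can be sketched over any Python iterable. The helpers below (`limit`, `skip`, `iter_batches`) are hypothetical names for illustration, assumed to mirror the descriptions rather than the real PyFlink signatures; in particular, the batches here are plain lists, not pandas DataFrames.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def limit(rows: Iterable[T], n: int) -> Iterator[T]:
    # Keep only the first n rows.
    return islice(rows, n)


def skip(rows: Iterable[T], n: int) -> Iterator[T]:
    # Drop the first n rows and yield the rest.
    return islice(rows, n, None)


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    # Yield lists of up to batch_size rows; the last batch may be shorter.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


rows = list(range(7))
first_three = list(limit(rows, 3))
after_two = list(skip(rows, 2))
batches = list(iter_batches(rows, 3))
```

Iterating in batches keeps memory bounded by `batch_size` rather than by the full result, which is the usual reason to prefer it over collecting everything into one list.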
Get the schema of the DataFrame.
Get the column names.
Drop rows containing null values.
Drop rows containing NaN values in float columns.
Replace null values with a given value.
Replace NaN values with a given value in float columns.
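The entries above treat null and NaN as different things: null means a missing value in any column, while NaN is a floating-point value that can only occur in float columns. A minimal plain-Python sketch of that distinction follows, using `None` for null; the function names are hypothetical and are not taken from the PyFlink API.

```python
import math
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_nan(value: Any) -> bool:
    # NaN only exists for floats; None and other types are never NaN.
    return isinstance(value, float) and math.isnan(value)


def drop_null(rows: List[Row]) -> List[Row]:
    # Keep rows in which no value is None.
    return [r for r in rows if all(v is not None for v in r.values())]


def drop_nan(rows: List[Row]) -> List[Row]:
    # Keep rows in which no float value is NaN; None values are left alone.
    return [r for r in rows if not any(is_nan(v) for v in r.values())]


def fill_null(rows: List[Row], value: Any) -> List[Row]:
    # Replace None with the given value; NaN is left alone.
    return [{k: (value if v is None else v) for k, v in r.items()} for r in rows]


def fill_nan(rows: List[Row], value: float) -> List[Row]:
    # Replace NaN with the given value; None is left alone.
    return [{k: (value if is_nan(v) else v) for k, v in r.items()} for r in rows]


rows = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": None},          # null: the value is missing
    {"id": 3, "score": float("nan")},  # NaN: a float that is not a number
]
without_null = drop_null(rows)
without_nan = drop_nan(rows)
```

Cleaning a float column fully therefore takes both steps in this sketch, since dropping nulls keeps the NaN row and dropping NaNs keeps the null row.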
Access LLM / AI functions.
GroupedDataFrame
A DataFrame that has been grouped on a set of grouping keys.
Apply aggregation expressions.
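Grouping followed by aggregation can be pictured as two steps: bucket rows by their key values, then reduce each bucket to one output row. The sketch below does this in plain Python; `group_and_aggregate` and its `aggs` argument are hypothetical names chosen for illustration and do not reflect the GroupedDataFrame method signatures.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
# Maps an output column name to (input column, reducing function).
Aggs = Dict[str, Tuple[str, Callable[[List[Any]], Any]]]


def group_and_aggregate(rows: List[Row], keys: Sequence[str], aggs: Aggs) -> List[Row]:
    # Step 1: bucket rows by the tuple of their key values.
    buckets: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for r in rows:
        buckets[tuple(r[k] for k in keys)].append(r)

    # Step 2: reduce each bucket to a single output row.
    out: List[Row] = []
    for key_values, members in buckets.items():
        result: Row = dict(zip(keys, key_values))
        for out_name, (column, fn) in aggs.items():
            result[out_name] = fn([m[column] for m in members])
        out.append(result)
    return out


rows = [
    {"k": "x", "v": 1},
    {"k": "y", "v": 10},
    {"k": "x", "v": 2},
]
totals = group_and_aggregate(rows, ["k"], {"total": ("v", sum), "n": ("v", len)})
```

Groups appear in the order their keys are first seen, because Python dicts preserve insertion order; a distributed engine generally makes no such ordering promise.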
Expressions
Helper functions to create column references and literal expressions.