PyFlink 1.20+vvr.11.7.dev0 documentation
pyflink.dataframe.dataframe.DataFrame.iter_batches

DataFrame.iter_batches(*, batch_size: int = 1000) → Iterator[pd.DataFrame]

Return an iterator of pandas DataFrames, each containing up to batch_size rows.

This is a terminal operation that triggers execution. It uses Arrow-based serialization for efficient data transfer.

Parameters:

batch_size – The maximum number of rows per batch. Keyword-only. Defaults to 1000.

Returns:

A generator that yields pandas DataFrames.
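
Because the return value is a generator, batches can be folded into a running aggregate one at a time instead of being collected into a list first. The following is a minimal sketch of that consumption pattern, not part of the PyFlink API: count_rows and fake_batches are hypothetical names, and plain lists stand in for the pandas DataFrames so the snippet runs without a Flink environment. It assumes only that each yielded batch supports len(), as pandas DataFrames do.

```python
from typing import Iterable, Iterator, List, Sized


def count_rows(batches: Iterable[Sized]) -> int:
    """Sum the row counts of the batches, consuming them one at a time."""
    total = 0
    for batch in batches:
        # Only the current batch is referenced here; earlier ones can be freed.
        total += len(batch)
    return total


def fake_batches() -> Iterator[List[int]]:
    """Hypothetical stand-in for df.iter_batches(): yields list 'batches'."""
    yield [1, 2, 3]
    yield [4, 5]


row_total = count_rows(fake_batches())
```

With a real DataFrame, the same helper would presumably be called as count_rows(df.iter_batches(batch_size=1000)).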

Example:

>>> df = pf.from_records([(i,) for i in range(2500)], schema=["a"])
>>> for batch in df.iter_batches(batch_size=1000):
...     print(len(batch))
1000
1000
500
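
The batch sizes printed above follow from simple chunking: 2500 rows split with batch_size=1000 give two full batches and a remainder of 500. The sketch below reproduces that arithmetic in pure Python. It is an illustration of the semantics only, not PyFlink's implementation (which the page says uses Arrow-based transfer); chunked is a hypothetical helper, and rejecting a non-positive batch_size is an assumption made here, not documented PyFlink behavior.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], batch_size: int = 1000) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items; the last one may be shorter."""
    if batch_size <= 0:
        # Assumption for this sketch: treat non-positive sizes as an error.
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted, stop the generator
        yield batch


sizes = [len(batch) for batch in chunked(range(2500), batch_size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Since the documentation only promises "up to batch_size rows" per batch, code that consumes iter_batches should not rely on every batch except the last being full.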
