pyflink.dataframe.dataframe.DataFrame.iter_batches
- DataFrame.iter_batches(*, batch_size: int = 1000) → Iterator[pd.DataFrame]
Return an iterator of pandas DataFrames, each containing up to batch_size rows.
This is a terminal operation that triggers execution. It uses Arrow-based serialization for efficient data transfer.
- Parameters:
batch_size – The maximum number of rows per batch. Must be passed as a keyword argument. Defaults to 1000.
- Returns:
A generator that yields pandas DataFrames.
Example:
>>> df = pf.from_records([(i,) for i in range(2500)], schema=["a"])
>>> for batch in df.iter_batches(batch_size=1000):
...     print(len(batch))
1000
1000
500
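The batching contract described above (each batch holds at most batch_size rows, and the last batch may be shorter) can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: iter_batches_sketch is a hypothetical stand-in that slices an in-memory pandas DataFrame, and the ValueError for a non-positive batch_size is an assumption, since the documentation does not say how invalid values are handled.

```python
from typing import Iterator

import pandas as pd


def iter_batches_sketch(
    pdf: pd.DataFrame, *, batch_size: int = 1000
) -> Iterator[pd.DataFrame]:
    """Hypothetical stand-in: yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        # Assumption: the documented method's validation behavior is not specified.
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(pdf), batch_size):
        # iloc slicing clamps at the end, so the final batch may be shorter.
        yield pdf.iloc[start:start + batch_size]


pdf = pd.DataFrame({"a": range(2500)})

batch_lengths = []
running_total = 0
for batch in iter_batches_sketch(pdf, batch_size=1000):
    batch_lengths.append(len(batch))
    # Aggregate per batch instead of concatenating all batches first.
    running_total += int(batch["a"].sum())

print(batch_lengths)   # [1000, 1000, 500]
print(running_total)   # 3123750
```

Consuming batches in a loop like this keeps the per-iteration work bounded by batch_size, which is the usual reason to prefer a batch iterator over materializing the full result at once.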