Skip to main content
Ctrl+K
PyFlink 1.20+vvr.11.7.dev0 documentation - Home PyFlink 1.20+vvr.11.7.dev0 documentation - Home
  • API Reference
  • Examples
  • API Reference
  • Examples

Section Navigation

  • PyFlink Table
  • PyFlink DataStream
  • PyFlink DataFrame
    • DataFrame
    • DataFrame Creation
    • Input/Output
    • SQL
    • DataType
    • User Defined Functions
    • Configuration
    • GPU Support
    • AI / LLM
  • PyFlink Common
  • API Reference
  • PyFlink DataFrame
  • DataFrame
  • pyflink.dataframe.dataframe.DataFrame.map

pyflink.dataframe.dataframe.DataFrame.map#

DataFrame.map(func, *, return_dtype=None, concurrency: int | None = None, num_gpus: float | None = None, gpu_type: str | None = None) → DataFrame[source]#

Apply a function to each row, producing a new DataFrame.

The function receives a dict[str, value] and returns a dict[str, value]. This is a 1-to-1 row transformation.

return_dtype can be omitted if the function has a TypedDict return type hint.

Parameters:
  • func – A function (dict -> dict), or a DataFrameUDFWrapper.

  • return_dtype – Output schema as DataType.struct(…). Can be omitted if func has a TypedDict return hint.

  • concurrency – Optional concurrency (parallelism) for the UDF operator. If specified, the operator running this UDF will use this parallelism. UDFs with different concurrency values will be split into separate operators.

  • num_gpus – Optional number of GPUs requested for this UDF (e.g., 0.5, 1.0). Each GPU UDF will run in its own operator.

  • gpu_type – GPU type (e.g., ‘A10’, ‘V100’).

Returns:

A new DataFrame with the transformed rows.

Example::
>>> # With explicit return_dtype
>>> df.map(lambda row: {"a": row["a"] + 1, "b": row["b"].upper()},
...        return_dtype=DataType.struct({"a": DataType.int64(),
...                                      "b": DataType.string()}))
>>>
>>> # With TypedDict return hint (auto-inferred)
>>> class Output(TypedDict):
...     a: int
...     b: str
>>> def process(row) -> Output:
...     return {"a": row["a"] + 1, "b": row["b"].upper()}
>>> df.map(process)
>>>
>>> # With concurrency
>>> df.map(process, concurrency=4)
>>>
>>> # With GPU resources
>>> df.map(process, num_gpus=0.5, gpu_type='A10')

previous

pyflink.dataframe.dataframe.DataFrame.join

next

pyflink.dataframe.dataframe.DataFrame.map_batches

On this page
  • DataFrame.map()

This Page

  • Show Source

Created using Sphinx 7.4.7.

Built with the PyData Sphinx Theme 0.16.1.