pyflink.dataframe.dataframe.DataFrame.map#
- DataFrame.map(func, *, return_dtype=None, concurrency: int | None = None, num_gpus: float | None = None, gpu_type: str | None = None) DataFrame[source]#
Apply a function to each row, producing a new DataFrame.
The function receives a dict[str, value] and returns a dict[str, value]. This is a 1-to-1 row transformation.
return_dtype can be omitted if the function has a TypedDict return type hint.
- Parameters:
func – A function (dict -> dict), or a DataFrameUDFWrapper.
return_dtype – Output schema as DataType.struct(…). Can be omitted if func has a TypedDict return hint.
concurrency – Optional concurrency (parallelism) for the UDF operator. If specified, the operator running this UDF will use this parallelism. UDFs with different concurrency values will be split into separate operators.
num_gpus – Optional number of GPUs requested for this UDF (e.g., 0.5, 1.0). Each GPU UDF will run in its own operator.
gpu_type – GPU type (e.g., ‘A10’, ‘V100’).
- Returns:
A new DataFrame with the transformed rows.
- Example::
>>> # With explicit return_dtype >>> df.map(lambda row: {"a": row["a"] + 1, "b": row["b"].upper()}, ... return_dtype=DataType.struct({"a": DataType.int64(), ... "b": DataType.string()})) >>> >>> # With TypedDict return hint (auto-inferred) >>> class Output(TypedDict): ... a: int ... b: str >>> def process(row) -> Output: ... return {"a": row["a"] + 1, "b": row["b"].upper()} >>> df.map(process) >>> >>> # With concurrency >>> df.map(process, concurrency=4) >>> >>> # With GPU resources >>> df.map(process, num_gpus=0.5, gpu_type='A10')