pyflink.table.expression.Expression.fetch_content

Expression.fetch_content(concurrency=None) → Expression

Asynchronously fetches the content at the given URI (HTTP, OSS, HDFS, S3, etc.) and returns it as VARBINARY. This is the recommended variant: it uses async I/O to avoid blocking the operator thread, which benefits high-throughput scenarios such as multi-modal inference pipelines.

Retries are handled by the framework-level async retry strategy, not by this function. To tune retry behavior, set table.exec.async-scalar.retry-strategy and the related options.
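A minimal sketch of how the retry options might be collected and applied in one place. Only table.exec.async-scalar.retry-strategy is named in the text above; the other keys are assumed here to be the "related options", all values are illustrative, and apply_retry_options is a hypothetical helper, not part of PyFlink.

```python
# Illustrative values only; tune them for your workload.
# Assumption: the keys other than retry-strategy are the "related options".
ASYNC_SCALAR_RETRY_OPTIONS = {
    "table.exec.async-scalar.retry-strategy": "FIXED_DELAY",
    "table.exec.async-scalar.retry-delay": "500 ms",
    "table.exec.async-scalar.max-attempts": "3",
    "table.exec.async-scalar.timeout": "3 min",
}


def apply_retry_options(config, options=None):
    """Copy each option onto any object that exposes set(key, value)."""
    for key, value in (options or ASYNC_SCALAR_RETRY_OPTIONS).items():
        config.set(key, value)
    return config
```

With a TableEnvironment named t_env, this would be called as apply_retry_options(t_env.get_config()), assuming TableConfig.set(key, value) is available in your PyFlink version.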

Supported URI schemes:

- http:// or https:// - HTTP/HTTPS URLs
- oss:// - Alibaba Cloud OSS
- hdfs:// - HDFS
- file:// - Local file system
- s3:// or s3a:// - AWS S3
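It can be useful to check URIs against the scheme list above before they reach the pipeline. The sketch below uses only the Python standard library; has_supported_scheme is a hypothetical helper, and it only inspects the scheme, not whether the resource actually exists or is reachable.

```python
from urllib.parse import urlparse

# Schemes taken from the list documented above.
SUPPORTED_SCHEMES = frozenset({"http", "https", "oss", "hdfs", "file", "s3", "s3a"})


def has_supported_scheme(uri):
    """Return True if the URI's scheme is one of the documented schemes."""
    return urlparse(uri).scheme.lower() in SUPPORTED_SCHEMES
```

Rows whose URI fails this check could be filtered out or routed elsewhere before calling fetch_content.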

Parameters: concurrency – Optional. The number of concurrent fetch operations per operator instance. Must be a literal integer constant. Defaults to max(8, number of CPU cores).
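The default formula can be illustrated in plain Python. This is only a sketch of the arithmetic: the real default is computed by the framework, and the core count it sees may differ from what os.cpu_count() reports on the client. default_fetch_concurrency is a hypothetical name.

```python
import os


def default_fetch_concurrency(cpu_cores=None):
    """Mirror the documented default: max(8, number of CPU cores)."""
    # Fall back to the local core count (or 1 if it cannot be determined).
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(8, cores)
```

On a 4-core machine the formula yields 8, while on a 32-core machine it yields 32, so the floor of 8 only matters on small hosts.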

Example usage:

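>>> from pyflink.table import EnvironmentSettings, TableEnvironment
>>> from pyflink.table.expressions import col
>>> t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
>>> # Illustrative input table; the URIs below are placeholders
>>> table = t_env.from_elements(
...     [(1, "https://example.com/a.png"), (2, "s3://my-bucket/b.png")],
...     ["id", "uri"])
>>> # Async download is non-blocking and improves throughput for I/O-heavy workloads
>>> result = table.select(col("id"), col("uri").fetch_content())
>>> # With explicit concurrency (must be a literal integer)
>>> result = table.select(col("id"), col("uri").fetch_content(64))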

Returns: A VARBINARY (bytes) value containing the file content.

Added in version 1.12.0.
