pyflink.table.expression.Expression.fetch_content#
- Expression.fetch_content(concurrency=None) Expression[source]#
Asynchronously fetches content from the given URI (HTTP, OSS, HDFS, S3, etc.) and returns as VARBINARY. This is the recommended variant that uses async I/O to avoid blocking the operator thread, which is beneficial for high-throughput scenarios such as multi-modal inference pipelines.
Retries are handled by the framework-level async retry strategy, not by this function. Configure
table.exec.async-scalar.retry-strategyand related options.Supported URI schemes: - http:// or https:// - HTTP/HTTPS URLs - oss:// - Alibaba Cloud OSS - hdfs:// - HDFS - file:// - Local file system - s3:// or s3a:// - AWS S3
- Parameters:
concurrency – Optional. The number of concurrent fetch operations per operator instance. Must be a literal integer constant. Defaults to max(8, num of cpu cores).
Example usage:
>>> from pyflink.table import EnvironmentSettings, TableEnvironment >>> from pyflink.table.expressions import col >>> # Async download is non-blocking and improves throughput for I/O-heavy workloads >>> result = table.select(col("id"), col("uri").fetch_content()) >>> # With explicit concurrency >>> result = table.select(col("id"), col("uri").fetch_content(64))
- Returns:
A VARBINARY (bytes) representing the file content.
Added in version 1.12.0.