PyFlink 1.20+vvr.11.7.dev0 documentation

pyflink.datastream.connectors.file_system.BulkFormat

class BulkFormat(j_bulk_format)

The BulkFormat reads and decodes records in batches rather than one at a time. Examples of bulk formats are ORC and Parquet.

Internally, in the file source, the readers pass batches of records from the reading threads (which perform the typically blocking I/O operations) to the async mailbox threads that do the streaming and batch data processing. Passing records in batches (rather than one at a time) greatly reduces the thread-to-thread handover overhead.

For the BulkFormat, each batch is handed over as a single unit.

Added in version 1.16.0.
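
To make the handover argument concrete, the following is a minimal sketch that uses only the Python standard library. It is not Flink's implementation: `read_in_batches` and `run_handover` are hypothetical names introduced for illustration, and a plain `queue.Queue` stands in for the handover between a reading thread and a processing thread. It counts how many thread-to-thread handovers are needed for the same records at different batch sizes.

```python
import queue
import threading


def read_in_batches(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_handover(records, batch_size):
    """Pass records from a reader thread to the caller in batches.

    Returns (received_records, handover_count).
    """
    handover = queue.Queue()
    done = object()  # sentinel marking the end of input

    def reader():
        # Stands in for the (typically blocking) I/O thread.
        for batch in read_in_batches(records, batch_size):
            handover.put(batch)  # one queue operation per batch
        handover.put(done)

    thread = threading.Thread(target=reader)
    thread.start()

    received, handovers = [], 0
    while True:
        item = handover.get()
        if item is done:
            break
        handovers += 1
        received.extend(item)
    thread.join()
    return received, handovers


records = list(range(10))
one_at_a_time = run_handover(records, batch_size=1)
batched = run_handover(records, batch_size=4)
print(one_at_a_time[1], batched[1])  # prints "10 3"
```

Both runs deliver the same ten records in the same order, but the batched run crosses the thread boundary three times instead of ten, which is the saving the description above refers to.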
