Catalog#

Catalog#

Catalog is responsible for reading and writing metadata such as database/table/views/UDFs from a registered catalog. It connects a registered catalog and Flink’s Table API.

Catalog.get_default_database()

Get the name of the default database for this catalog.

Catalog.list_databases()

Get the names of all databases in this catalog.

Catalog.get_database(database_name)

Get a database from this catalog.

Catalog.database_exists(database_name)

Check if a database exists in this catalog.

Catalog.create_database(name, database, ...)

Create a database.

Catalog.drop_database(name, ignore_if_exists)

Drop a database.

Catalog.alter_database(name, new_database, ...)

Modify an existing database.

Catalog.list_tables(database_name)

Get names of all tables and views under this database.

Catalog.list_views(database_name)

Get names of all views under this database.

Catalog.get_table(table_path)

Get a CatalogTable or CatalogView identified by tablePath.

Catalog.table_exists(table_path)

Check if a table or view exists in this catalog.

Catalog.drop_table(table_path, ...)

Drop a table or view.

Catalog.rename_table(table_path, ...)

Rename an existing table or view.

Catalog.create_table(table_path, table, ...)

Create a new table or view.

Catalog.alter_table(table_path, new_table, ...)

Modify an existing table or view.

Catalog.list_partitions(table_path[, ...])

Get CatalogPartitionSpec of all partitions of the table.

Catalog.get_partition(table_path, partition_spec)

Get a partition of the given table.

Catalog.partition_exists(table_path, ...)

Check whether a partition exists or not.

Catalog.create_partition(table_path, ...)

Create a partition.

Catalog.drop_partition(table_path, ...)

Drop a partition.

Catalog.alter_partition(table_path, ...)

Alter a partition.

Catalog.list_functions(database_name)

List the names of all functions in the given database.

Catalog.get_function(function_path)

Get the function.

Catalog.function_exists(function_path)

Check whether a function exists or not.

Catalog.create_function(function_path, ...)

Create a function.

Catalog.alter_function(function_path, ...)

Modify an existing function.

Catalog.drop_function(function_path, ...)

Drop a function.

Catalog.get_table_statistics(table_path)

Get the statistics of a table.

Catalog.get_table_column_statistics(table_path)

Get the column statistics of a table.

Catalog.get_partition_statistics(table_path, ...)

Get the statistics of a partition.

Catalog.bulk_get_partition_statistics(...)

Get a list of statistics of given partitions.

Catalog.get_partition_column_statistics(...)

Get the column statistics of a partition.

Catalog.bulk_get_partition_column_statistics(...)

Get a list of the column statistics for the given partitions.

Catalog.alter_table_statistics(table_path, ...)

Update the statistics of a table.

Catalog.alter_table_column_statistics(...)

Update the column statistics of a table.

Catalog.alter_partition_statistics(...)

Update the statistics of a table partition.

Catalog.alter_partition_column_statistics(...)

Update the column statistics of a table partition.

CatalogDatabase#

Represents a database object in a catalog.

CatalogDatabase.create_instance(properties)

Creates an instance of CatalogDatabase.

CatalogDatabase.get_properties()

Get a map of properties associated with the database.

CatalogDatabase.get_comment()

Get comment of the database.

CatalogDatabase.copy()

Get a deep copy of the CatalogDatabase instance.

CatalogDatabase.get_description()

Get a brief description of the database.

CatalogDatabase.get_detailed_description()

Get a detailed description of the database.

CatalogBaseTable#

CatalogBaseTable is the common parent of table and view. It has a map of key-value pairs defining the properties of the table.

CatalogBaseTable.create_table(schema[, ...])

Create an instance of CatalogBaseTable for the catalog table.

CatalogBaseTable.create_view(original_query, ...)

Create an instance of CatalogBaseTable for the catalog view.

CatalogBaseTable.get_options()

Returns a map of string-based options.

CatalogBaseTable.get_schema()

Get the schema of the table.

CatalogBaseTable.get_unresolved_schema()

Returns the schema of the table or view.

CatalogBaseTable.get_comment()

Get comment of the table or view.

CatalogBaseTable.copy()

Get a deep copy of the CatalogBaseTable instance.

CatalogBaseTable.get_description()

Get a brief description of the table or view.

CatalogBaseTable.get_detailed_description()

Get a detailed description of the table or view.

CatalogPartition#

Represents a partition object in catalog.

CatalogPartition.create_instance(properties)

Creates an instance of CatalogPartition.

CatalogPartition.get_properties()

Get a map of properties associated with the partition.

CatalogPartition.copy()

Get a deep copy of the CatalogPartition instance.

CatalogPartition.get_description()

Get a brief description of the partition object.

CatalogPartition.get_detailed_description()

Get a detailed description of the partition object.

CatalogPartition.get_comment()

Get comment of the partition.

CatalogFunction#

Represents a partition object in catalog.

CatalogFunction.create_instance(class_name)

Creates an instance of CatalogDatabase.

CatalogFunction.get_class_name()

Get the full name of the class backing the function.

CatalogFunction.copy()

Create a deep copy of the function.

CatalogFunction.get_description()

Get a brief description of the function.

CatalogFunction.get_detailed_description()

Get a detailed description of the function.

CatalogFunction.is_generic()

Whether or not is the function a flink UDF.

CatalogFunction.get_function_language()

Get the language used for the function definition.

ObjectPath#

A database name and object (table/view/function) name combo in a catalog.

CatalogPartitionSpec#

Represents a partition spec object in catalog. Partition columns and values are NOT of strict order, and they need to be re-arranged to the correct order by comparing with a list of strictly ordered partition keys.

CatalogPartitionSpec.get_partition_spec()

Get the partition spec as key-value map.

CatalogTableStatistics#

Statistics for a non-partitioned table or a partition of a partitioned table.

CatalogTableStatistics.get_row_count()

The number of rows in the table or partition.

CatalogTableStatistics.get_field_count()

The number of files on disk.

CatalogTableStatistics.get_total_size()

The total size in bytes.

CatalogTableStatistics.get_raw_data_size()

The raw data size (size when loaded in memory) in bytes.

CatalogTableStatistics.get_properties()

CatalogTableStatistics.copy()

Create a deep copy of "this" instance.

CatalogColumnStatistics#

Column statistics of a table or partition.

HiveCatalog#

A catalog implementation for Hive.

HiveCatalog(catalog_name[, ...])

A catalog implementation for Hive.

JdbcCatalog#

A catalog implementation for Jdbc.

JdbcCatalog(catalog_name, default_database, ...)

A catalog implementation for Jdbc.

Column#

Representation of a column in a ResolvedSchema.

A table column describes either a pyflink.table.catalog.PhysicalColumn, pyflink.table.catalog.ComputedColumn, or pyflink.table.catalog.MetadataColumn.

Column.physical(name, data_type)

Creates a regular table column that represents physical data.

Column.computed(name, resolved_expression)

Creates a computed column that is computed from the given ResolvedExpression.

Column.metadata(name, data_type, ...)

Creates a metadata column from metadata of the given column name or from metadata of the given key (if not null).

Column.with_comment(comment)

Add the comment to the column and return the new object.

Column.is_physical()

Returns whether the given column is a physical column of a table; neither computed nor metadata.

Column.is_persisted()

Returns whether the given column is persisted in a sink operation.

Column.get_data_type()

Returns the data type of this column.

Column.get_name()

Returns the name of this column.

Column.get_comment()

Returns the comment of this column.

Column.as_summary_string()

Returns a string that summarizes this column for printing to a console.

Column.explain_extras()

Returns an explanation of specific column extras next to name and type.

Column.copy(new_type)

Returns a copy of the column with a replaced DataType.

Column.rename(new_name)

Returns a copy of the column with a replaced name.

WatermarkSpec#

Representation of a watermark specification in ResolvedSchema.

It defines the rowtime attribute and a ResolvedExpression for watermark generation.

WatermarkSpec.of(rowtime_attribute, ...)

Creates a WatermarkSpec from a given rowtime attribute and a watermark expression.

WatermarkSpec.get_rowtime_attribute()

Returns the name of a rowtime attribute.

WatermarkSpec.get_watermark_expression()

Returns the ResolvedExpression for watermark generation.

WatermarkSpec.as_summary_string()

Prints the watermark spec in a readable way.

Constraint#

Integrity constraints, generally referred to simply as constraints, define the valid states of SQL-data by constraining the values in the base tables.

Constraint.get_name()

Returns the name of the constraint.

Constraint.is_enforced()

Constraints can either be enforced or non-enforced.

Constraint.get_type()

Returns the type of the constraint, which could be PRIMARY_KEY or UNIQUE_KEY.

Constraint.as_summary_string()

Prints the constraint in a readable way.

UniqueConstraint#

A unique key constraint. It can be declared also as a PRIMARY KEY.

UniqueConstraint.get_columns()

List of column names for which the primary key was defined.

UniqueConstraint.get_type_string()

Returns a string representation of the underlying constraint type.

ResolvedSchema#

Schema of a table or view consisting of columns, constraints, and watermark specifications.

This class is the result of resolving a Schema into a final validated representation.

  • Data types and functions have been expanded to fully qualified identifiers.

  • Time attributes are represented in the column’s data type.

  • pyflink.table.Expression have been translated to pyflink.table.catalog.ResolvedExpression

This class should not be passed into a connector. It is therefore also not serializable. Instead, the to_physical_row_data_type() can be passed around where necessary.

ResolvedSchema.of(columns)

Shortcut for a resolved schema of only columns.

ResolvedSchema.physical(column_names, ...)

Shortcut for a resolved schema of only physical columns.

ResolvedSchema.get_column_count()

Returns the number of Column of this schema.

ResolvedSchema.get_columns()

Returns all Column of this schema.

ResolvedSchema.get_column_names()

Returns all column names.

ResolvedSchema.get_column_data_types()

Returns all column data types.

ResolvedSchema.get_column(column_index_or_name)

Returns the Column instance for the given column index or name.

ResolvedSchema.get_watermark_specs()

Returns a list of watermark specifications each consisting of a rowtime attribute and watermark strategy expression.

ResolvedSchema.get_primary_key()

Returns the primary key if it has been defined.

ResolvedSchema.get_primary_key_indexes()

Returns the primary key indexes, if any, otherwise returns an empty list.

ResolvedSchema.to_source_row_data_type()

Converts all columns of this schema into a (possibly nested) row data type.

ResolvedSchema.to_physical_row_data_type()

Converts all physical columns of this schema into a (possibly nested) row data type.

ResolvedSchema.to_sink_row_data_type()

Converts all persisted columns of this schema into a (possibly nested) row data type.