Raise errors on view-table collisions#3380
Conversation
- Add `_raise_if_view_exists` helper that maps `view_exists(id) == True` at the target identifier to `TableAlreadyExistsError`. Catalogs without view support (most non-REST impls) raise `NotImplementedError` from `view_exists`; treat that as "no view at this identifier". - Wire the check into `create_table`, `register_table`, and `rename_table` on SqlCatalog, HiveCatalog, GlueCatalog, DynamoDbCatalog, and BigQueryMetastoreCatalog (skipping methods that already raise `NotImplementedError`). Also call it from `MetastoreCatalog.create_table_transaction`. - For `rename_table`, the check is on the destination identifier. - RestCatalog intentionally unchanged: the server's 409 response is the authority and an extra client-side HEAD per call would be wasted. - Re-uses `TableAlreadyExistsError` so existing `create_table_if_not_exists` callers see the same exception type they already catch. - Tests: unit tests for the helper (returns True / False / raises `NotImplementedError`) in `tests/catalog/test_base.py`, plus parametrized behavior tests over InMemoryCatalog + SqlCatalog in `tests/catalog/test_catalog_behaviors.py` that monkeypatch `view_exists` to assert each call site invokes the helper.
- Drop parametrization over (memory, sql, sql_without_rowcount): all three share the same SqlCatalog code path, and the helper itself is covered by unit tests in `tests/catalog/test_base.py`. Use a single InMemoryCatalog via a new `single_catalog` fixture. - Replace `lambda` patches with a recording fake so the tests assert *which* identifier was checked — matters for `rename_table`, where the destination (not the source) must be passed. - Add post-condition assertions: after the raise, the target table must not exist (proves the check ran before any commit-side effect), and for rename, the source must still be present at its original identifier.
| view_collision = catalog.view_exists(identifier) | ||
| except NotImplementedError: | ||
| view_collision = False | ||
| if view_collision: |
There was a problem hiding this comment.
Helper mirrors the pattern from Java HiveCatalog and Java REST's RESTSessionCatalog.replaceTransaction. Re-uses TableAlreadyExistsError so existing create_table_if_not_exists callers keep working. Catalogs without view support raise NotImplementedError from view_exists — treat that as 'no view'.
| sort_order: SortOrder = UNSORTED_SORT_ORDER, | ||
| properties: Properties = EMPTY_DICT, | ||
| ) -> CreateTableTransaction: | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Deliberate divergence from Java: Java HiveCatalog.ViewAwareTableBuilder overrides create() with the view check but does not override createTransaction(), leaving the eager-create path stricter than the txn path. Checking here fixes that asymmetry — a create_table_transaction that would collide with a view fails fast, before any metadata files are written.
| table_name = Catalog.table_name_from(identifier) | ||
| if not self.namespace_exists(namespace_identifier): | ||
| raise NoSuchNamespaceError(f"Namespace does not exist: {namespace_identifier}") | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Mirrors Java HiveCatalog.ViewAwareTableBuilder.create(). Java JdbcCatalog only checks views on replaceTransaction, not on create — runtime no-op here since SqlCatalog.view_exists raises NotImplementedError, but the helper keeps the call site uniform across catalogs.
| table_name = Catalog.table_name_from(identifier) | ||
| if not self.namespace_exists(namespace): | ||
| raise NoSuchNamespaceError(f"Namespace does not exist: {namespace}") | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Mirrors Java HiveCatalog.registerTable. Java JdbcCatalog inherits registerTable from BaseMetastoreCatalog without a view check — runtime no-op here for the same reason as create_table.
| to_table_name = Catalog.table_name_from(to_identifier) | ||
| if not self.namespace_exists(to_namespace): | ||
| raise NoSuchNamespaceError(f"Namespace does not exist: {to_namespace}") | ||
| _raise_if_view_exists(self, to_identifier) |
There was a problem hiding this comment.
Mirrors Java HiveCatalog.renameTableOrView. Java JdbcCatalog.renameTable does the check too but gates it on schemaVersion == V1. PyIceberg's SqlCatalog has no equivalent V1/V2 split — view_exists raises NotImplementedError, so this is a no-op at runtime.
| ValueError: If the identifier is invalid. | ||
| """ | ||
| properties = {**DEFAULT_PROPERTIES, **properties} | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
1:1 with Java HiveCatalog.ViewAwareTableBuilder.create(). Hive is the only PyIceberg catalog that actually implements view_exists, so this is the call site where the check fires for real.
| if overwrite: | ||
| raise NotImplementedError("`overwrite` isn't supported") | ||
|
|
||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
1:1 with Java HiveCatalog.registerTable.
|
|
||
| if self.table_exists(to_identifier): | ||
| raise TableAlreadyExistsError(f"Table already exists: {to_table_name}") | ||
| _raise_if_view_exists(self, to_identifier) |
There was a problem hiding this comment.
1:1 with Java HiveCatalog.renameTableOrView. The check is on the destination, matching Java.
| """ | ||
| database_name, table_name = self.identifier_to_database_and_table(identifier) | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Java GlueCatalog has no view support and no check here; closest reference is HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op until Glue gains view support — view_exists raises NotImplementedError.
| if overwrite: | ||
| raise NotImplementedError("`overwrite` isn't supported") | ||
|
|
||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Same shape as the other Glue call sites — no Java analog (Glue Java has no view support), Hive reference HiveCatalog.registerTable. Runtime no-op.
| """ | ||
| from_database_name, from_table_name = self.identifier_to_database_and_table(from_identifier, NoSuchTableError) | ||
| to_database_name, to_table_name = self.identifier_to_database_and_table(to_identifier) | ||
| _raise_if_view_exists(self, to_identifier) |
There was a problem hiding this comment.
No Java Glue analog. Hive reference: HiveCatalog.renameTableOrView. Runtime no-op.
| ) | ||
|
|
||
| database_name, table_name = self.identifier_to_database_and_table(identifier) | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
Java DynamoDbCatalog has no view support. Closest reference is HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op.
| """ | ||
| from_database_name, from_table_name = self.identifier_to_database_and_table(from_identifier, NoSuchTableError) | ||
| to_database_name, to_table_name = self.identifier_to_database_and_table(to_identifier) | ||
| _raise_if_view_exists(self, to_identifier) |
There was a problem hiding this comment.
No Java DynamoDb analog. Hive reference: HiveCatalog.renameTableOrView. Runtime no-op.
| schema: Schema = self._convert_schema_if_needed(schema) # type: ignore | ||
|
|
||
| dataset_name, table_name = self.identifier_to_database_and_table(identifier) | ||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
No iceberg-java BigQuery analog. Hive reference for the semantics: HiveCatalog.ViewAwareTableBuilder.create(). Runtime no-op until BQ gains view support.
| if overwrite: | ||
| raise NotImplementedError("`overwrite` isn't supported") | ||
|
|
||
| _raise_if_view_exists(self, identifier) |
There was a problem hiding this comment.
No iceberg-java BigQuery analog. Hive reference: HiveCatalog.registerTable. Runtime no-op.
Rationale for this change
Are these changes tested?
Are there any user-facing changes?