Skip to content

demo: an easy to use catalog loader #1372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

demo: an easy to use catalog loader #1372

wants to merge 1 commit into from

Conversation

Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented May 23, 2025

Which issue does this PR close?

Part of #1254

What changes are included in this PR?

This PR introduces a catalog loader trait for Iceberg. I’ve also implemented it for the REST catalog to demonstrate its functionality.

I chose this approach to make integration with external tools, such as iceberg-playground by @liurenjie1024 or pyiceberg, easier.

Following the same pattern as pyiceberg, the loader itself only accepts a HashMap<String, String> and returns an Arc<dyn Catalog>. Users who prefer strongly-typed catalog builders can continue to use our existing APIs without any changes.

Are these changes tested?

In CI.

@Xuanwo
Copy link
Member Author

Xuanwo commented May 23, 2025

In pyiceberg, we have:

class RestCatalog(Catalog):
    uri: str
    _session: Session

    def __init__(self, name: str, **properties: str):
        """Rest Catalog.

        You either need to provide a client_id and client_secret, or an already valid token.

        Args:
            name: Name to identify the catalog.
            properties: Properties that are passed along to the configuration.
        """
        super().__init__(name, **properties)
        self.uri = properties[URI]
        self._fetch_config()
        self._session = self._create_session()

It worth consider to have static property keys like we do in FileIO.

#[async_trait]
pub trait CatalogLoader: Debug + Send + Sync {
/// Load a catalog from the given name and properties.
async fn load(properties: HashMap<String, String>) -> Result<Arc<dyn Catalog>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn load(properties: HashMap<String, String>) -> Result<Arc<dyn Catalog>>;
async fn load(name: String, properties: HashMap<String, String>) -> Result<Arc<dyn Catalog>>;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two problems with this approach:

  1. This is not easy to use when we know the concret type of catalog. For example, when the user just wants to create a RestCatalog. It will force user to do downcast. This is useful when the catalog has some extran functionality.
  2. This is not easy to use when the catalog builder has an advanced builder method, see https://github.com/apache/iceberg-rust/pull/1231/files#r2106848332

I think I can simplify the methods in my original proposal like your one, while keeping other things same, WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants