Support connecting to local s3 object stores in datafusion-cli #10072

Closed
@alamb

Is your feature request related to a problem or challenge?

I am trying to use Sprox locally to query Parquet files.

Sprox currently proxies requests to an actual S3 instance or local file cache.

I would like to be able to create an EXTERNAL table to read from this instance. Here is how it works in DuckDB:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    ENDPOINT 'localhost:8080',
    USE_SSL false,
    URL_STYLE path
);

select * from read_parquet('s3://sprox/sample.parquet');

Describe the solution you'd like

I would like to do something like this in datafusion-cli:

-- Create external table
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
OPTIONS(
    'aws.access_key_id' 'A',
    'aws.secret_access_key' 'B',
    'aws.endpoint' 'http://localhost:8080',
)
LOCATION 's3://sprox/sample.parquet';

When I run that today, here is the error I get:

datafusion-cli -f sprox.sql
DataFusion CLI v37.0.0
Internal error: Config value "" not found on AwsOptions.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Error during planning: table 'datafusion.public.sample' not found

I think this particular error occurs because the config provider doesn't check for aws.endpoint. However, even after I fixed that locally, I still couldn't create the external table: I got an error about the scheme not being allowed.

Describe alternatives you've considered

Note that this workflow already works if you use environment variables:

$ AWS_ALLOW_HTTP=true AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B AWS_ENDPOINT=http://localhost:8080 datafusion-cli
DataFusion CLI v37.0.0
> CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
0 row(s) fetched.
Elapsed 2.266 seconds.
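
For anyone who wants to repeat the environment-variable workaround non-interactively, here is a minimal sketch as a script. The file name sprox_env.sql and the trailing SELECT are my own assumptions for illustration. The environment variables, the table definition, and the -f flag are the ones shown above.

```shell
#!/bin/sh
# Point the AWS-style environment variables (same ones as in the session above)
# at the local proxy instead of real S3.
export AWS_ALLOW_HTTP=true                  # permit a plain http:// endpoint
export AWS_ACCESS_KEY_ID=A                  # placeholder credentials from this issue
export AWS_SECRET_ACCESS_KEY=B
export AWS_ENDPOINT=http://localhost:8080

# Hypothetical file name; override by setting SQL_FILE before running.
SQL_FILE="${SQL_FILE:-sprox_env.sql}"

# Write the statement that already works with environment variables,
# plus an assumed follow-up query to confirm the table is readable.
cat > "$SQL_FILE" <<'EOF'
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';

SELECT * FROM sample LIMIT 10;
EOF

# Only invoke the CLI when it is installed; otherwise just leave the SQL file behind.
if command -v datafusion-cli >/dev/null 2>&1; then
    datafusion-cli -f "$SQL_FILE" || echo "query failed; is the local proxy running?" >&2
else
    echo "datafusion-cli not found on PATH; wrote $SQL_FILE only" >&2
fi
```

This only automates the workaround. It does not address the request itself, which is to pass the endpoint and credentials through OPTIONS on the table.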

Additional context

No response


Labels: enhancement (New feature or request)
