Description
Is your feature request related to a problem or challenge?
I am trying to use Sprox locally to query Parquet files.
Sprox currently proxies requests to an actual S3 instance or local file cache.
I would like to be able to create an EXTERNAL table to read from this instance. Here is how it works in DuckDB:
CREATE SECRET (
TYPE S3,
PROVIDER CREDENTIAL_CHAIN,
ENDPOINT 'localhost:8080',
USE_SSL false,
URL_STYLE path
);
select * from read_parquet('s3://sprox/sample.parquet');
Describe the solution you'd like
I would like to do something like this in datafusion-cli:
-- Create external table
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
OPTIONS(
'aws.access_key_id' 'A',
'aws.secret_access_key' 'B',
'aws.endpoint' 'http://localhost:8080',
)
LOCATION 's3://sprox/sample.parquet';
When I run that today, here is the error I get:
datafusion-cli -f sprox.sql
DataFusion CLI v37.0.0
Internal error: Config value "" not found on AwsOptions.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Error during planning: table 'datafusion.public.sample' not found
I think this particular error is related to the fact that the config provider doesn't check for aws.endpoint. However, even once I fixed that locally, I still couldn't create the external table: I get an error about the scheme not being allowed.
Describe alternatives you've considered
Note that this workflow already works if the settings are passed as environment variables:
$ (venv) andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion2/datafusion-cli$ AWS_ALLOW_HTTP=true AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B AWS_ENDPOINT=http://localhost:8080 datafusion-cli
DataFusion CLI v37.0.0
> CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
0 row(s) fetched.
Elapsed 2.266 seconds.
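For anyone who wants to repeat the workaround non-interactively, here is a rough sketch that combines the environment variables above with the `-f` invocation shown earlier. The file name `sprox_env.sql` and the trailing `SELECT` are my own additions for illustration; the credentials and endpoint are the placeholder values from this issue. My assumption is that `AWS_ALLOW_HTTP=true` is the setting that lets the plain `http://` endpoint through, which would also explain the scheme error I hit with `OPTIONS`.

```shell
#!/bin/sh
# Sketch of the environment-variable workaround as a script.
# "A", "B" and the endpoint are the placeholder values used in this issue.
export AWS_ALLOW_HTTP=true
export AWS_ACCESS_KEY_ID=A
export AWS_SECRET_ACCESS_KEY=B
export AWS_ENDPOINT=http://localhost:8080

# Hypothetical file name. The CREATE statement is the one that succeeded above;
# the SELECT is added here only to confirm the table is readable.
cat > sprox_env.sql <<'EOF'
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
SELECT * FROM sample LIMIT 10;
EOF

# Only run the CLI if it is installed; otherwise just report it.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli -f sprox_env.sql
else
  echo "datafusion-cli not found on PATH" >&2
fi
```

This only moves the configuration into the environment; it does not address the request itself, which is to be able to pass the same settings through `OPTIONS` in the `CREATE EXTERNAL TABLE` statement.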
Additional context
No response