Feature: passing object references via "parameters" mechanism for LOAD FROM #5203

Open
NatanPuzis opened this issue Apr 7, 2025 · 0 comments
Labels
feature New features or missing components of existing features

NatanPuzis commented Apr 7, 2025

API

Python

Description

One can use a pandas DataFrame as the data source for COPY <table> FROM (LOAD FROM df ...).

The reference to the DataFrame is specified by writing the name of the Python variable inside the text of the query:

df = pd.DataFrame({...})  # --------------vv
conn.execute("COPY Person FROM (LOAD FROM df WHERE name='Alice' and age>=18 RETURN *)")

IMO, this is a very unusual way of passing a Python object reference to a function.

Typically, when query text is composed in code, we'd use either literal values, e.g. where name='Alice' and age>=18, or
placeholders, e.g. where name=$p1 and age>=$p2, and then supply the mapping from placeholders to
Python objects: parameters={"p1": "Alice", "p2": get_min_age()}

Accessing a local variable by a name embedded in the query text also carries a minor performance penalty, as the implementing code has to call locals() and look up the name df in a dict to get the object reference.

Also, it seems impossible to use a function call rather than the name of a variable. The code below will fail:

def build_df():
    return pd.DataFrame({...})

conn.execute("COPY Person FROM (LOAD FROM build_df() WHERE age >= 18 RETURN *)")
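
A minimal sketch of why a name works but a function call does not, assuming the lookup is a plain dictionary lookup on the caller's scope. resolve_name, build_rows and demo are hypothetical names made up for illustration, and this is not the library's actual implementation; a list of dicts stands in for the DataFrame so the snippet runs without pandas.

```python
import inspect


def resolve_name(name):
    """Hypothetical name-based lookup: search the caller's locals, then globals."""
    frame = inspect.currentframe().f_back
    for scope in (frame.f_locals, frame.f_globals):
        if name in scope:
            return scope[name]
    raise NameError(f"no variable named {name!r} in the calling scope")


def build_rows():
    # Plain list of dicts standing in for pd.DataFrame({...}).
    return [{"name": "Alice", "age": 30}]


def demo():
    rows = build_rows()
    found = resolve_name("rows")      # works: "rows" is a key in the caller's locals
    failure = None
    try:
        resolve_name("build_rows()")  # fails: an expression is not a variable name
    except NameError as exc:
        failure = str(exc)
    return found is rows, failure
```

Under this assumption, the text "build_rows()" would have to be evaluated as an expression rather than looked up as a key, which a name-based lookup does not do.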

Lastly, linters will flag df as an unused variable, since from their point of view df is defined but never used in the code.

I suggest considering an alternative approach for passing references to Python objects (e.g. dataframes) to the
Connection.execute() method. The most intuitive option seems to be the existing placeholder parameters mechanism, as it would be consistent with the rest of the Python API.
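
To make the proposal concrete, here is a rough sketch of how object references could be bound through placeholders. Everything in it is an assumption for illustration: the LOAD FROM $src syntax is the proposed form rather than something known to be supported today, bind_parameters is a hypothetical helper, and a list of dicts stands in for the DataFrame.

```python
import re

# Matches placeholders such as $src or $min_age.
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def bind_parameters(query, parameters):
    """Hypothetical helper: map each $placeholder in `query` to its Python object."""
    names = PLACEHOLDER.findall(query)
    missing = [n for n in names if n not in parameters]
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    return {n: parameters[n] for n in names}


def build_rows():
    # Plain list of dicts standing in for pd.DataFrame({...}).
    return [{"name": "Alice", "age": 30}]


def get_min_age():
    return 18


# Proposed (not existing) syntax: the data source is a placeholder like any other value.
query = "COPY Person FROM (LOAD FROM $src WHERE age >= $min_age RETURN *)"
bound = bind_parameters(query, {"src": build_rows(), "min_age": get_min_age()})
```

With this shape, the object is passed explicitly, so the result of a function call can be supplied directly, no frame lookup is needed, and linters see every variable being used.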

@NatanPuzis NatanPuzis added the feature New features or missing components of existing features label Apr 7, 2025
@acquamarin acquamarin self-assigned this Apr 7, 2025