One can use a Pandas DataFrame to scan data for `COPY <table> FROM (LOAD FROM df ...)`.
The DataFrame is referenced by writing the name of the Python variable inside the query text:
```python
import pandas as pd
df = pd.DataFrame({...})  # df is referenced only by name, inside the query string below
conn.execute("COPY Person FROM (LOAD FROM df WHERE name='Alice' and age>=18 RETURN *)")
```
IMO, this is a very unusual way to pass a Python object reference to a function.
Typically, when query text is composed in code, we'd use either literal values, e.g. `where name='Alice' and age>=18`, or placeholders, e.g. `where name=$p1 and age>=$p2`, and then supply the associations between placeholders and references to Python objects: `parameters={"p1": "Alice", "p2": get_min_age()}`.
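For comparison, here is a minimal sketch of the placeholder side. `bind_parameters` is a hypothetical helper, not part of any real API; it only shows that each `$name` in the query is paired with an object the caller has already evaluated, so a function call such as `get_min_age()` is no different from a literal.

```python
import re

# A placeholder is "$" followed by an identifier, e.g. $p1.
PLACEHOLDER = re.compile(r"\$([A-Za-z_]\w*)")


def bind_parameters(query: str, parameters: dict) -> dict:
    """Hypothetical sketch: pair every $placeholder in the query with the
    Python object supplied for it in `parameters`."""
    names = PLACEHOLDER.findall(query)
    # dict.fromkeys de-duplicates while keeping first-seen order.
    missing = [name for name in dict.fromkeys(names) if name not in parameters]
    if missing:
        raise KeyError(f"no value supplied for: {', '.join(missing)}")
    return {name: parameters[name] for name in names}


def get_min_age() -> int:
    # Hypothetical helper: any expression can produce a parameter value,
    # because Python evaluates it before bind_parameters is called.
    return 18


bindings = bind_parameters(
    "where name=$p1 and age>=$p2",
    parameters={"p1": "Alice", "p2": get_min_age()},
)
print(bindings)  # {'p1': 'Alice', 'p2': 18}
```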
Accessing a local variable by a name embedded in the query text also carries a minor performance penalty, as the implementation has to call `locals()` and look up the name `df` in the returned dict to obtain the object reference.
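To make the lookup concrete, here is a minimal sketch of how a by-name resolution could work. It is hypothetical and not the library's actual code: `resolve_by_name` and the regex are my own, and the sketch reads the caller's frame through `inspect` (equivalent to calling `locals()` in the caller). A list of dicts stands in for the DataFrame so the sketch runs without pandas.

```python
import inspect
import re

# Matches a bare identifier after LOAD FROM, e.g. "LOAD FROM df".
LOAD_FROM_NAME = re.compile(r"LOAD\s+FROM\s+([A-Za-z_]\w*)")


def resolve_by_name(query: str):
    """Hypothetical sketch, not the library's actual code: look up the
    identifier that follows LOAD FROM among the caller's local variables."""
    match = LOAD_FROM_NAME.search(query)
    if match is None:
        raise ValueError("no LOAD FROM <name> clause found")
    name = match.group(1)
    caller = inspect.currentframe().f_back  # frame of whoever called us
    try:
        # Snapshot of the caller's locals (what locals() would return there).
        local_vars = dict(caller.f_locals)
    finally:
        del caller  # drop the frame reference to avoid a reference cycle
    if name not in local_vars:
        raise NameError(f"no local variable named {name!r}")
    return local_vars[name]


def load_adults():
    df = [{"name": "Alice", "age": 30}]  # stands in for a pandas DataFrame
    # df appears only inside the string, which is why linters flag it as unused.
    return resolve_by_name("COPY Person FROM (LOAD FROM df WHERE age >= 18 RETURN *)")


print(load_adults())  # [{'name': 'Alice', 'age': 30}]
```

With a query such as `LOAD FROM build_df()`, a lookup like this only ever sees the text `build_df`; since no local variable has that name, it cannot evaluate the function call.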
Also, it seems impossible to use a function call rather than the name of a variable. The code below will fail:
```python
import pandas as pd

def build_df():
    return pd.DataFrame({...})

conn.execute("COPY Person FROM (LOAD FROM build_df() WHERE age >= 18 RETURN *)")
```
Lastly, linters will complain about an unused variable `df`, since from their point of view `df` is defined but never used in the code.
I suggest considering an alternative approach for passing references to Python objects (e.g. DataFrames) to the `Connection.execute()` method. The most intuitive option seems to be reusing the existing placeholder parameter mechanism, as it would be consistent with the rest of the Python API.
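Under this proposal, the call might look like `conn.execute("COPY Person FROM (LOAD FROM $df WHERE age >= $min_age RETURN *)", parameters={"df": build_df(), "min_age": 18})`, where the `LOAD FROM $df` syntax is hypothetical. The sketch below uses a hypothetical helper, `resolve_scan_source`, and a list of rows in place of a DataFrame, to show how the scan source could be resolved from `parameters` alone, with no `locals()` lookup.

```python
import re

# Hypothetical proposed syntax: the scan source is a placeholder, LOAD FROM $df
LOAD_FROM_PLACEHOLDER = re.compile(r"LOAD\s+FROM\s+\$([A-Za-z_]\w*)")


def resolve_scan_source(query: str, parameters: dict):
    """Hypothetical sketch of the proposal: the object to scan comes from
    `parameters`, exactly like scalar placeholders, with no locals() lookup."""
    match = LOAD_FROM_PLACEHOLDER.search(query)
    if match is None:
        raise ValueError("expected LOAD FROM $<placeholder>")
    name = match.group(1)
    if name not in parameters:
        raise KeyError(f"no value supplied for ${name}")
    return parameters[name]


def build_df():
    # Stand-in for a DataFrame factory; plain rows keep the sketch pandas-free.
    return [{"name": "Alice", "age": 30}]


# A function call works because Python evaluates build_df() before the call,
# and no intermediate variable is left for a linter to flag as unused.
source = resolve_scan_source(
    "COPY Person FROM (LOAD FROM $df WHERE age >= $min_age RETURN *)",
    parameters={"df": build_df(), "min_age": 18},
)
print(source)  # [{'name': 'Alice', 'age': 30}]
```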