Commit 7bf3460

committed
WIP docs and tests
1 parent e09523c commit 7bf3460

2 files changed

+370
-0
lines changed

docs/docs/patching_builtins.ipynb

Lines changed: 321 additions & 0 deletions
@@ -0,0 +1,321 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Patching Python builtins (third-party library compatibility)\n",
8+
"\n",
9+
"Not every Python library accepts pathlib-compatible objects like those provided by cloudpathlib. Many libraries only accept strings as filepaths. Internally, these libraries may then use `open`, functions from `os` and `os.path`, or other core library modules like `glob` to navigate and manipulate paths.\n",
10+
"\n",
11+
"This means that out of the box you can't just pass a `CloudPath` object to any method or function and have it work. For libraries implemented with `pathlib`, this will work. For anything else, the code will throw an exception at some point.\n",
12+
"\n",
13+
"The long-term solution is to ask developers to implement their library to support either (1) pathlib-compatible objects for files and directories, or (2) file-like objects passed directly (e.g., so you could call `CloudPath.open` in your code and pass the file-like object to the library).\n",
14+
"\n",
15+
"The short-term workaround that will be compatible with some libraries is to patch the builtins to make `open`, `os`, `os.path`, and `glob` work with `CloudPath` objects. Because this overrides default Python functionality, it is not on by default. When patched, these functions will use the `CloudPath` version if they are passed a `CloudPath` and will fall back to their normal implementations otherwise.\n",
16+
"\n",
17+
"These patches can be enabled by setting the following environment variables:\n",
18+
" - `CLOUDPATHLIB_PATCH_ALL=1` - patch all the builtins we implement: `open`, `os` functions, and `glob`\n",
19+
" - `CLOUDPATHLIB_PATCH_OPEN=1` - patch the builtin `open` function\n",
20+
" - `CLOUDPATHLIB_PATCH_OS_FUNCTIONS=1` - patch the `os` functions\n",
21+
" - `CLOUDPATHLIB_PATCH_GLOB=1` - patch the `glob` module\n",
22+
"\n",
23+
"You can set environment variables in many ways, but it is common to either pass them at the command line with something like `CLOUDPATHLIB_PATCH_ALL=1 python my_script.py` or to set them in your Python script with `os.environ['CLOUDPATHLIB_PATCH_ALL'] = '1'`. Note that these _must_ be set before `cloudpathlib` is imported.\n",
24+
"\n",
25+
"Alternatively, you can call methods to patch the functions.\n",
26+
"\n",
27+
"```python\n",
28+
"from cloudpathlib import patch_open, patch_os_functions, patch_glob\n",
29+
"\n",
30+
"# patch builtins\n",
31+
"patch_open()\n",
32+
"patch_os_functions()\n",
33+
"patch_glob()\n",
34+
"```"
35+
]
36+
},
37+
{
38+
"cell_type": "markdown",
39+
"metadata": {},
40+
"source": [
41+
"These patch methods are all context managers, so if you want to control where the patch is active, you can use them in a `with` statement. For example:"
42+
]
43+
},
44+
{
45+
"cell_type": "code",
46+
"execution_count": 1,
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"%load_ext autoreload\n",
51+
"%autoreload 2"
52+
]
53+
},
54+
{
55+
"cell_type": "code",
56+
"execution_count": 1,
57+
"metadata": {},
58+
"outputs": [
59+
{
60+
"name": "stdout",
61+
"output_type": "stream",
62+
"text": [
63+
"Unpatched version fails:\n",
64+
"'S3Path' object is not subscriptable\n",
65+
"Patched succeeds:\n",
66+
"[S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/nested-dir/test.file'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD')]\n",
67+
"`glob` module now is equivalent to `CloudPath.glob`\n",
68+
"[S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/nested-dir/test.file'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirB/fileB'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/fileC'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD'), S3Path('s3://cloudpathlib-test-bucket/manual-tests/glob_test/dirC/dirD/fileD')]\n"
69+
]
70+
}
71+
],
72+
"source": [
73+
"from glob import glob\n",
74+
"\n",
75+
"from cloudpathlib import patch_glob, CloudPath\n",
76+
"\n",
77+
"try:\n",
78+
"    glob(CloudPath(\"s3://cloudpathlib-test-bucket/manual-tests/**/*dir*/**\"))\n",
79+
"except Exception as e:\n",
80+
"    print(\"Unpatched version fails:\")\n",
81+
"    print(e)\n",
82+
"\n",
83+
"\n",
84+
"with patch_glob():\n",
85+
"    print(\"Patched succeeds:\")\n",
86+
"    print(glob(CloudPath(\"s3://cloudpathlib-test-bucket/manual-tests/**/*dir*/**/*\")))\n",
87+
"\n",
88+
"    # or equivalently\n",
89+
"    print(\"`glob` module now is equivalent to `CloudPath.glob`\")\n",
90+
"    print(glob(\"**/*dir*/**/*\", root_dir=CloudPath(\"s3://cloudpathlib-test-bucket/manual-tests/\")))"
91+
]
92+
},
93+
{
94+
"cell_type": "markdown",
95+
"metadata": {},
96+
"source": [
97+
"We can see a similar result for patching the functions in the `os` module."
98+
]
99+
},
100+
{
101+
"cell_type": "code",
102+
"execution_count": 13,
103+
"metadata": {},
104+
"outputs": [
105+
{
106+
"name": "stdout",
107+
"output_type": "stream",
108+
"text": [
109+
"False\n",
110+
"Patched version of `os.path.isdir` returns: None\n"
111+
]
112+
}
113+
],
114+
"source": [
115+
"import os\n",
116+
"\n",
117+
"from cloudpathlib import patch_os_functions, CloudPath\n",
118+
"\n",
119+
"print(os.path.isdir(CloudPath(\"s3://cloudpathlib-test-bucket/manual-tests/\")))\n",
120+
"\n",
121+
"\n",
122+
"# try:\n",
123+
"#     os.path.isdir(\"s3://cloudpathlib-test-bucket/manual-tests/\")\n",
124+
"# except Exception as e:\n",
125+
"#     print(\"Unpatched version fails:\")\n",
126+
"#     print(e)\n",
127+
"\n",
128+
"\n",
129+
"with patch_os_functions():\n",
130+
"    result = os.path.isdir(CloudPath(\"s3://cloudpathlib-test-bucket/manual-tests/\"))\n",
131+
"    print(\"Patched version of `os.path.isdir` returns: \", result)"
132+
]
133+
},
134+
{
135+
"cell_type": "markdown",
136+
"metadata": {},
137+
"source": [
138+
"## Patching `open`\n",
139+
"\n",
140+
"Sometimes code uses the Python built-in `open` to open files and operate on them. The built-in only knows how to open paths on the local filesystem, so passing it a `CloudPath` does not work as expected. Unfortunately, that breaks usage with cloudpathlib.\n",
141+
"\n",
142+
"Instead, we can patch the built-in `open` to handle all the normal circumstances, and—if the argument is a `CloudPath`—use cloudpathlib to do the opening.\n",
143+
"\n",
144+
"### Patching `open` in Jupyter notebooks\n",
145+
"\n",
146+
"Jupyter notebooks require one extra step because they have their own version of `open` that is injected into the global namespace of the notebook. This means that you must _additionally_ replace that version of `open` with the patched version if you want to use `open` in a notebook. This can be done with the `patch_open` method by adding the following to the top of the notebook.\n",
147+
"\n",
148+
"```python\n",
149+
"from cloudpathlib import patch_open\n",
150+
"\n",
151+
"# replace jupyter's `open` with one that works with CloudPath\n",
152+
"open = patch_open()\n",
153+
"```\n",
154+
"\n",
155+
"Here's an example that doesn't work right now (for example, if you depend on a third-party library that calls `open`)."
156+
]
157+
},
158+
{
159+
"cell_type": "code",
160+
"execution_count": 16,
161+
"metadata": {},
162+
"outputs": [
163+
{
164+
"name": "stdout",
165+
"output_type": "stream",
166+
"text": [
167+
"[Errno 2] No such file or directory: '/var/folders/sz/c8j64tx91mj0jb0vd1s4wj700000gn/T/tmpvnzs5qnd/cloudpathlib-test-bucket/patching_builtins/file.txt'\n"
168+
]
169+
}
170+
],
171+
"source": [
172+
"from cloudpathlib import CloudPath, patch_open\n",
173+
"\n",
174+
"\n",
175+
"# example of a function within a third-party library\n",
176+
"def library_function(filepath: str):\n",
177+
"    with open(filepath, \"r\") as f:\n",
178+
"        print(f.read())\n",
179+
"\n",
180+
"\n",
181+
"# create file to read\n",
182+
"cp = CloudPath(\"s3://cloudpathlib-test-bucket/patching_builtins/file.txt\")\n",
183+
"\n",
184+
"# fails with a FileNotFoundError if passed a CloudPath\n",
185+
"try:\n",
186+
"    library_function(cp)\n",
187+
"except Exception as e:\n",
188+
"    print(e)"
189+
]
190+
},
191+
{
192+
"cell_type": "code",
193+
"execution_count": null,
194+
"metadata": {},
195+
"outputs": [],
196+
"source": []
197+
},
198+
{
199+
"cell_type": "code",
200+
"execution_count": 4,
201+
"metadata": {},
202+
"outputs": [
203+
{
204+
"ename": "TypeError",
205+
"evalue": "ContextDecorator.__call__() takes 2 positional arguments but 3 were given",
206+
"output_type": "error",
207+
"traceback": [
208+
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
209+
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
210+
"Cell \u001b[0;32mIn[4], line 16\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[38;5;66;03m# create file to read\u001b[39;00m\n\u001b[1;32m 14\u001b[0m cp \u001b[38;5;241m=\u001b[39m CloudPath(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124ms3://cloudpathlib-test-bucket/patching_builtins/file.txt\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m---> 16\u001b[0m \u001b[43mlibrary_function\u001b[49m\u001b[43m(\u001b[49m\u001b[43mcp\u001b[49m\u001b[43m)\u001b[49m\n",
211+
"Cell \u001b[0;32mIn[4], line 9\u001b[0m, in \u001b[0;36mlibrary_function\u001b[0;34m(filepath)\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mlibrary_function\u001b[39m(filepath: \u001b[38;5;28mstr\u001b[39m):\n\u001b[0;32m----> 9\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28;43mopen\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mfilepath\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mr\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m \u001b[38;5;28;01mas\u001b[39;00m f:\n\u001b[1;32m 10\u001b[0m \u001b[38;5;28mprint\u001b[39m(f\u001b[38;5;241m.\u001b[39mread())\n",
212+
"\u001b[0;31mTypeError\u001b[0m: ContextDecorator.__call__() takes 2 positional arguments but 3 were given"
213+
]
214+
}
215+
],
216+
"source": [
217+
"from cloudpathlib import CloudPath, patch_open\n",
218+
"\n",
219+
"# jupyter patch\n",
220+
"# open = patch_open()\n",
221+
"\n",
222+
"with patch_open():\n",
223+
"    # example of a function within a third-party library\n",
224+
"    def library_function(filepath: str):\n",
225+
"        with open(filepath, \"r\") as f:\n",
226+
"            print(f.read())\n",
227+
"\n",
228+
"\n",
229+
"    # create file to read\n",
230+
"    cp = CloudPath(\"s3://cloudpathlib-test-bucket/patching_builtins/file.txt\")\n",
231+
"\n",
232+
"    library_function(cp)"
233+
]
234+
},
235+
{
236+
"cell_type": "code",
237+
"execution_count": 3,
238+
"metadata": {},
239+
"outputs": [
240+
{
241+
"name": "stdout",
242+
"output_type": "stream",
243+
"text": [
244+
"> \u001b[0;32m/var/folders/sz/c8j64tx91mj0jb0vd1s4wj700000gn/T/ipykernel_34335/3906426398.py\u001b[0m(9)\u001b[0;36mlibrary_function\u001b[0;34m()\u001b[0m\n",
245+
"\u001b[0;32m 7 \u001b[0;31m\u001b[0;31m# example of a function within a third-party library\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
246+
"\u001b[0m\u001b[0;32m 8 \u001b[0;31m\u001b[0;32mdef\u001b[0m \u001b[0mlibrary_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
247+
"\u001b[0m\u001b[0;32m----> 9 \u001b[0;31m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"r\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
248+
"\u001b[0m\u001b[0;32m 10 \u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
249+
"\u001b[0m\u001b[0;32m 11 \u001b[0;31m\u001b[0;34m\u001b[0m\u001b[0m\n",
250+
"\u001b[0m\n",
251+
"<contextlib._GeneratorContextManager object at 0x1113b3ce0>\n",
252+
"*** TypeError: ContextDecorator.__call__() missing 1 required positional argument: 'func'\n"
253+
]
254+
}
255+
],
256+
"source": [
257+
"%debug"
258+
]
259+
},
260+
{
261+
"cell_type": "markdown",
262+
"metadata": {},
263+
"source": [
264+
"# `open`"
265+
]
266+
},
267+
{
268+
"cell_type": "markdown",
269+
"metadata": {},
270+
"source": [
271+
"# `os`"
272+
]
273+
},
274+
{
275+
"cell_type": "code",
276+
"execution_count": 2,
277+
"metadata": {},
278+
"outputs": [
279+
{
280+
"data": {
281+
"text/plain": [
282+
"True"
283+
]
284+
},
285+
"execution_count": 2,
286+
"metadata": {},
287+
"output_type": "execute_result"
288+
}
289+
],
290+
"source": []
291+
},
292+
{
293+
"cell_type": "code",
294+
"execution_count": null,
295+
"metadata": {},
296+
"outputs": [],
297+
"source": []
298+
}
299+
],
300+
"metadata": {
301+
"kernelspec": {
302+
"display_name": "cloudpathlib",
303+
"language": "python",
304+
"name": "python3"
305+
},
306+
"language_info": {
307+
"codemirror_mode": {
308+
"name": "ipython",
309+
"version": 3
310+
},
311+
"file_extension": ".py",
312+
"mimetype": "text/x-python",
313+
"name": "python",
314+
"nbconvert_exporter": "python",
315+
"pygments_lexer": "ipython3",
316+
"version": "3.12.1"
317+
}
318+
},
319+
"nbformat": 4,
320+
"nbformat_minor": 2
321+
}

tests/test_patching.py

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
import importlib
2+
import os
3+
4+
import pytest
5+
6+
import cloudpathlib
7+
from cloudpathlib import patch_open
8+
9+
10+
def test_patch_open(rig):
11+
    cp = rig.create_cloud_path("dir_0/new_file.txt")
12+
13+
    with pytest.raises(FileNotFoundError):
14+
        with open(cp, "w") as f:
15+
            f.write("Hello!")
16+
17+
    # set via method call
18+
    with patch_open():
19+
        with open(cp, "w") as f:
20+
            f.write("Hello!")
21+
22+
        assert cp.read_text() == "Hello!"
23+
24+
    # set via env var
25+
    cp2 = rig.create_cloud_path("dir_0/new_file_two.txt")
26+
    original_env_setting = os.environ.get("CLOUDPATHLIB_PATCH_OPEN", "")
27+
28+
    try:
29+
        os.environ["CLOUDPATHLIB_PATCH_OPEN"] = "1"
30+
31+
        importlib.reload(cloudpathlib)
32+
33+
        with open(cp2, "w") as f:
34+
            f.write("Hello!")
35+
36+
        assert cp2.read_text() == "Hello!"
37+
38+
    finally:
39+
        os.environ["CLOUDPATHLIB_PATCH_OPEN"] = original_env_setting
40+
        importlib.reload(cloudpathlib)
41+
42+
    # cp.write_text("Hello!")
43+
44+
    # # remove cache
45+
    # cp._local.unlink()
46+
47+
48+
def test_patches(rig):
49+
    pass
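
Note on the environment handling in `test_patch_open`: the `try`/`finally` block restores the variable to an empty string when it was originally unset, which leaves the key present in `os.environ`. A minimal sketch of an alternative, using the standard library's `unittest.mock.patch.dict`, is shown below; it restores the mapping to its prior state, including removing keys that were absent. `read_flag` is a hypothetical helper, and treating any non-empty value as "enabled" is an assumption about how the setting might be read, not a statement about cloudpathlib.

```python
import os
from unittest import mock

ENV_VAR = "CLOUDPATHLIB_PATCH_OPEN"


def read_flag(name):
    # Hypothetical helper: treat any non-empty value as "enabled".
    return bool(os.environ.get(name, ""))


# Start from a known state for the demonstration.
os.environ.pop(ENV_VAR, None)

# Environment values must be strings; assigning an int raises TypeError.
try:
    os.environ[ENV_VAR] = 1
    int_rejected = False
except TypeError:
    int_rejected = True

# Set the variable only for the duration of the block.
with mock.patch.dict(os.environ, {ENV_VAR: "1"}):
    enabled_inside = read_flag(ENV_VAR)

# On exit the key is removed again because it was absent before.
present_after = ENV_VAR in os.environ
```

Inside a pytest test, the built-in `monkeypatch` fixture (`monkeypatch.setenv`) gives the same restore-on-exit behavior.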
