-
Notifications
You must be signed in to change notification settings - Fork 68
Implement HTTP(s) support #468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
We shouldn't call this "anchor". Anchor has a different meaning in a close enough context that I think this will be confusing. In particular, "anchor" colloquially refers to "anchor tags" in HTML documents, which you can reference within a URL with fragment identifiers
I'm not sure I like this. There are often genuine files that don't have file extensions that are usually plain text files, e.g., My impression is that, (while it's not absolute) the most common convention and default for many web servers and frameworks is to use a trailing slash explicitly to serve directories. |
Hm, yeah, I can see this is potentially confusing, but we're not calling it "anchor" arbitrarily. We're looking for the right analogy to populate the existing
Yeah, this was the other default rule I considered—I think you're right it's probably a more reliable default. That is how the python server works (except for the root, which does not redirect to the slash). Also planning for the method to be overridable so people can configure for their particular servers if it is different. |
So one thing to be aware of is your code assumes no one would ever do
|
@MattOates I guess I was thinking that this just sticks around on that path and any paths derived from it (since we don't mess with netloc) and should "just work." Does it not? Maybe there are just some bugs to clean up (sorry, haven't had a chance to poke at it and confirm/deny).
I think that we could (1) support env vars for basic auth like most of the cloud providers have for auth, (2) point people towards how to set the default client, or (3) recommend an explicit client. |
Does that mean we're going to print out the username and password in plaintext in string representations? |
Yeah, FWIW urllib also is happy to just print that out too. We could be more conservative, but I'm not inclined to add a special case if the standard lib doesn't. In [1]: from urllib.parse import urlparse
In [2]: urlparse("http://user:[email protected]")
Out[3]: ParseResult(scheme='http', netloc='user:[email protected]', path='', params='', query='', fragment='') |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #468 +/- ##
========================================
+ Coverage 93.4% 93.6% +0.2%
========================================
Files 23 26 +3
Lines 1800 1996 +196
========================================
+ Hits 1682 1870 +188
- Misses 118 126 +8
🚀 New features to boost your workflow:
|
OK @jayqi, this is fully working and ready for review! |
(root / "fileA").write_text("fileA") | ||
(root / "dirC" / "dirD" / "fileD.txt").write_text("fileD") | ||
(root / "dirC" / "fileC.txt").write_text("fileC") | ||
(root / "fileA.txt").write_text("fileA") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we still covering cases elsewhere where a file has no extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make sure we have some
cloud_root.mkdir() | ||
|
||
_make_glob_directory(cloud_root) | ||
|
||
local_root = tmp_path / "glob-tests" | ||
local_root = tmp_path / "glob-tests/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm generally nervous about all of these tests that add trailing slashes to all of our paths. Are we at risk of changing what our tests are covering for the cloud providers? Maybe it's fine, Codecov doesn't say our coverage is going down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make sure we have explicit tests for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making sure we still recognize dirs properly is tested here
cloudpathlib/tests/test_cloudpath_file_io.py
Lines 326 to 347 in 6a6d3c5
def test_is_dir_is_file(rig, tmp_path): | |
# test on directories | |
dir_slash = rig.create_cloud_path("dir_0/") | |
dir_no_slash = rig.create_cloud_path("dir_0") | |
dir_nested_slash = rig.create_cloud_path("dir_1/dir_1_0/") | |
dir_nested_no_slash = rig.create_cloud_path("dir_1/dir_1_0") | |
for test_case in [dir_slash, dir_no_slash, dir_nested_slash, dir_nested_no_slash]: | |
assert test_case.is_dir() | |
assert not test_case.is_file() | |
file = rig.create_cloud_path("dir_0/file0_0.txt") | |
file_nested = rig.create_cloud_path("dir_1/dir_1_0/file_1_0_0.txt") | |
for test_case in [file, file_nested]: | |
assert test_case.is_file() | |
assert not test_case.is_dir() | |
# does not exist (same behavior as pathlib.Path that does not exist) | |
non_existent = rig.create_cloud_path("dir_0/not_a_file") | |
assert not non_existent.is_file() | |
assert not non_existent.is_dir() |
def _move_file(self, src: HttpPath, dst: HttpPath, remove_src: bool = True) -> HttpPath: | ||
# .fspath will download the file so the local version can be uploaded | ||
self._upload_file(src.fspath, dst) | ||
if remove_src: | ||
self._remove(src) | ||
return dst |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes me nervous that this is not done as a transaction. Is there a way to check PUT or DELETE support up front?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_upload_file
will throw if it doesn't get a 200 response on finishing, so it should never be the case that we remove a file that was not uploaded. The only risk is that the subsequent _remove
fails (and raises) when a server does not support it. In that case raising on _remove
versus pre-emptively failing on _move_file
does not seem worth it for an extra server call.
We could catch and log a warning it was not removed before reraising?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a clear error message that indicates the file was copied seems worth it to me. Having a partial move feels like a surprising thing that could be confusing and could cause inconsistency or other correctness issues in certain cases.
Initial pass at http.
http.server
http.server
and also looks like it should work for apache and nginx file servers from my googling. (allows user to override this method for their own server).anchor
instead of.cloud_prefix
where appropriate. For cloud providers,.anchor
is the same as.cloud_prefix
(e.g.,s3://
). For http, the anchor should be the serverhttp://example.com
and the.cloud_prefix
should behttp://
. Constructing new paths should use.anchor
Limitations/caveats:
Assumes that url must have suffix (e.g.,http://example.com/file.txt
) to be a file; if no suffix, assumes dir. This is definitely not true of real-world URLs, but maybe is an ok assumption for anything serving files?/
to be directories (user can pass custom test for urls if they have a different scenario)Left to do:
https
Closes #455