Implement HTTP(s) support #468
Open
wants to merge 39 commits into
base: master
39 commits
eb3bb5f
intial implementation of http
pjbull Sep 1, 2024
38e9206
remove unused func
pjbull Sep 1, 2024
36d61d9
add https support
pjbull Sep 16, 2024
0d80817
Change dir detection to /
pjbull Sep 17, 2024
ff7963b
lint
pjbull Sep 17, 2024
d62e361
Update https tests
pjbull Feb 14, 2025
13577e9
make sigs match
pjbull Feb 14, 2025
8b1d9c9
Add parsed_url
pjbull Feb 14, 2025
3668b82
Add tests for verb methods
pjbull Feb 14, 2025
719aeea
lint
pjbull Feb 14, 2025
1fba69b
test parsed_url
pjbull Feb 14, 2025
7aeef1e
test preserved properties
pjbull Feb 16, 2025
8b6fa5b
spread out ports; fix warnings
pjbull Feb 16, 2025
66c9c82
lint
pjbull Feb 16, 2025
3061f89
fix full_match
pjbull Feb 17, 2025
44d55ff
sleepy upload test
pjbull Feb 17, 2025
53a854e
docs wip
pjbull Feb 17, 2025
6ebf0ea
Update docs
pjbull Feb 18, 2025
b0d05e2
lint
pjbull Feb 18, 2025
698ab4a
improve http docs
pjbull Feb 20, 2025
7064f34
add table
pjbull Feb 20, 2025
402f4fe
lint
pjbull Feb 20, 2025
14fd932
try skipping http rigs on windows in CI
pjbull Feb 20, 2025
01305b3
more stable tests
pjbull Feb 27, 2025
e44c495
test flakiness
pjbull Feb 27, 2025
16d0137
refresh cert
pjbull Mar 6, 2025
4fafb7e
flaky test fix
pjbull Apr 19, 2025
d0e819a
simplify test servers
pjbull Apr 19, 2025
86f5847
possibly?
pjbull Apr 20, 2025
3126f61
redo certs for 127.0.0.1
pjbull Apr 21, 2025
8cbdff3
update command
pjbull Apr 21, 2025
1443ff5
Remove pytz and adjust sleep
pjbull Apr 21, 2025
da280b6
update rigs
pjbull Apr 21, 2025
b71e40c
update missing timestap
pjbull Apr 21, 2025
c259f6e
more resilient
pjbull Apr 21, 2025
ed53b45
sleepier
pjbull Apr 21, 2025
09a07d3
Tweaks
pjbull Apr 22, 2025
7208f76
changelog
pjbull Apr 22, 2025
0554ecd
Add explicit filename tests
pjbull May 3, 2025
1 change: 1 addition & 0 deletions HISTORY.md
@@ -3,6 +3,7 @@
 ## Unreleased
 
 - Fixed `rmtree` fail on Azure with no `hns` and more than 256 blobs to drop (Issue [#509](https://github.com/drivendataorg/cloudpathlib/issues/509), PR [#508](https://github.com/drivendataorg/cloudpathlib/pull/508), thanks @alikefia)
+- Added support for http(s) urls with `HttpClient`, `HttpPath`, `HttpsClient`, and `HttpsPath`. (Issue [#455](https://github.com/drivendataorg/cloudpathlib/issues/455 ), PR [#468](https://github.com/drivendataorg/cloudpathlib/pull/468))
 
 ## v0.21.0 (2025-03-03)
 
173 changes: 91 additions & 82 deletions README.md
@@ -124,88 +124,97 @@ list(root_dir.glob('**/*.txt'))

Most methods and properties from `pathlib.Path` are supported except for the ones that don't make sense in a cloud context. There are a few additional methods or properties that relate to specific cloud services or specifically for cloud paths.

| Methods + properties | `AzureBlobPath` | `S3Path` | `GSPath` |
|:-----------------------|:------------------|:-----------|:-----------|
| `absolute` | ✅ | ✅ | ✅ |
| `anchor` | ✅ | ✅ | ✅ |
| `as_uri` | ✅ | ✅ | ✅ |
| `drive` | ✅ | ✅ | ✅ |
| `exists` | ✅ | ✅ | ✅ |
| `glob` | ✅ | ✅ | ✅ |
| `is_absolute` | ✅ | ✅ | ✅ |
| `is_dir` | ✅ | ✅ | ✅ |
| `is_file` | ✅ | ✅ | ✅ |
| `is_relative_to` | ✅ | ✅ | ✅ |
| `iterdir` | ✅ | ✅ | ✅ |
| `joinpath` | ✅ | ✅ | ✅ |
| `match` | ✅ | ✅ | ✅ |
| `mkdir` | ✅ | ✅ | ✅ |
| `name` | ✅ | ✅ | ✅ |
| `open` | ✅ | ✅ | ✅ |
| `parent` | ✅ | ✅ | ✅ |
| `parents` | ✅ | ✅ | ✅ |
| `parts` | ✅ | ✅ | ✅ |
| `read_bytes` | ✅ | ✅ | ✅ |
| `read_text` | ✅ | ✅ | ✅ |
| `relative_to` | ✅ | ✅ | ✅ |
| `rename` | ✅ | ✅ | ✅ |
| `replace` | ✅ | ✅ | ✅ |
| `resolve` | ✅ | ✅ | ✅ |
| `rglob` | ✅ | ✅ | ✅ |
| `rmdir` | ✅ | ✅ | ✅ |
| `samefile` | ✅ | ✅ | ✅ |
| `stat` | ✅ | ✅ | ✅ |
| `stem` | ✅ | ✅ | ✅ |
| `suffix` | ✅ | ✅ | ✅ |
| `suffixes` | ✅ | ✅ | ✅ |
| `touch` | ✅ | ✅ | ✅ |
| `unlink` | ✅ | ✅ | ✅ |
| `with_name` | ✅ | ✅ | ✅ |
| `with_stem` | ✅ | ✅ | ✅ |
| `with_suffix` | ✅ | ✅ | ✅ |
| `write_bytes` | ✅ | ✅ | ✅ |
| `write_text` | ✅ | ✅ | ✅ |
| `as_posix` | ❌ | ❌ | ❌ |
| `chmod` | ❌ | ❌ | ❌ |
| `cwd` | ❌ | ❌ | ❌ |
| `expanduser` | ❌ | ❌ | ❌ |
| `group` | ❌ | ❌ | ❌ |
| `hardlink_to` | ❌ | ❌ | ❌ |
| `home` | ❌ | ❌ | ❌ |
| `is_block_device` | ❌ | ❌ | ❌ |
| `is_char_device` | ❌ | ❌ | ❌ |
| `is_fifo` | ❌ | ❌ | ❌ |
| `is_mount` | ❌ | ❌ | ❌ |
| `is_reserved` | ❌ | ❌ | ❌ |
| `is_socket` | ❌ | ❌ | ❌ |
| `is_symlink` | ❌ | ❌ | ❌ |
| `lchmod` | ❌ | ❌ | ❌ |
| `link_to` | ❌ | ❌ | ❌ |
| `lstat` | ❌ | ❌ | ❌ |
| `owner` | ❌ | ❌ | ❌ |
| `readlink` | ❌ | ❌ | ❌ |
| `root` | ❌ | ❌ | ❌ |
| `symlink_to` | ❌ | ❌ | ❌ |
| `as_url` | ✅ | ✅ | ✅ |
| `clear_cache` | ✅ | ✅ | ✅ |
| `cloud_prefix` | ✅ | ✅ | ✅ |
| `copy` | ✅ | ✅ | ✅ |
| `copytree` | ✅ | ✅ | ✅ |
| `download_to` | ✅ | ✅ | ✅ |
| `etag` | ✅ | ✅ | ✅ |
| `fspath` | ✅ | ✅ | ✅ |
| `is_junction` | ✅ | ✅ | ✅ |
| `is_valid_cloudpath` | ✅ | ✅ | ✅ |
| `rmtree` | ✅ | ✅ | ✅ |
| `upload_from` | ✅ | ✅ | ✅ |
| `validate` | ✅ | ✅ | ✅ |
| `walk` | ✅ | ✅ | ✅ |
| `with_segments` | ✅ | ✅ | ✅ |
| `blob` | ✅ | ❌ | ✅ |
| `bucket` | ❌ | ✅ | ✅ |
| `container` | ✅ | ❌ | ❌ |
| `key` | ❌ | ✅ | ❌ |
| `md5` | ✅ | ❌ | ✅ |
(Per the hunk header, the 82 table rows above are the removed version; the 91 rows below are the added version, which introduces the `HttpsPath` column.)
| Methods + properties | `AzureBlobPath` | `GSPath` | `HttpsPath` | `S3Path` |
|:-----------------------|:------------------|:-----------|:--------------|:-----------|
| `absolute` | ✅ | ✅ | ✅ | ✅ |
| `anchor` | ✅ | ✅ | ✅ | ✅ |
| `as_uri` | ✅ | ✅ | ✅ | ✅ |
| `drive` | ✅ | ✅ | ✅ | ✅ |
| `exists` | ✅ | ✅ | ✅ | ✅ |
| `glob` | ✅ | ✅ | ✅ | ✅ |
| `is_absolute` | ✅ | ✅ | ✅ | ✅ |
| `is_dir` | ✅ | ✅ | ✅ | ✅ |
| `is_file` | ✅ | ✅ | ✅ | ✅ |
| `is_junction` | ✅ | ✅ | ✅ | ✅ |
| `is_relative_to` | ✅ | ✅ | ✅ | ✅ |
| `iterdir` | ✅ | ✅ | ✅ | ✅ |
| `joinpath` | ✅ | ✅ | ✅ | ✅ |
| `match` | ✅ | ✅ | ✅ | ✅ |
| `mkdir` | ✅ | ✅ | ✅ | ✅ |
| `name` | ✅ | ✅ | ✅ | ✅ |
| `open` | ✅ | ✅ | ✅ | ✅ |
| `parent` | ✅ | ✅ | ✅ | ✅ |
| `parents` | ✅ | ✅ | ✅ | ✅ |
| `parts` | ✅ | ✅ | ✅ | ✅ |
| `read_bytes` | ✅ | ✅ | ✅ | ✅ |
| `read_text` | ✅ | ✅ | ✅ | ✅ |
| `relative_to` | ✅ | ✅ | ✅ | ✅ |
| `rename` | ✅ | ✅ | ✅ | ✅ |
| `replace` | ✅ | ✅ | ✅ | ✅ |
| `resolve` | ✅ | ✅ | ✅ | ✅ |
| `rglob` | ✅ | ✅ | ✅ | ✅ |
| `rmdir` | ✅ | ✅ | ✅ | ✅ |
| `samefile` | ✅ | ✅ | ✅ | ✅ |
| `stat` | ✅ | ✅ | ✅ | ✅ |
| `stem` | ✅ | ✅ | ✅ | ✅ |
| `suffix` | ✅ | ✅ | ✅ | ✅ |
| `suffixes` | ✅ | ✅ | ✅ | ✅ |
| `touch` | ✅ | ✅ | ✅ | ✅ |
| `unlink` | ✅ | ✅ | ✅ | ✅ |
| `walk` | ✅ | ✅ | ✅ | ✅ |
| `with_name` | ✅ | ✅ | ✅ | ✅ |
| `with_segments` | ✅ | ✅ | ✅ | ✅ |
| `with_stem` | ✅ | ✅ | ✅ | ✅ |
| `with_suffix` | ✅ | ✅ | ✅ | ✅ |
| `write_bytes` | ✅ | ✅ | ✅ | ✅ |
| `write_text` | ✅ | ✅ | ✅ | ✅ |
| `as_posix` | ❌ | ❌ | ❌ | ❌ |
| `chmod` | ❌ | ❌ | ❌ | ❌ |
| `cwd` | ❌ | ❌ | ❌ | ❌ |
| `expanduser` | ❌ | ❌ | ❌ | ❌ |
| `group` | ❌ | ❌ | ❌ | ❌ |
| `hardlink_to` | ❌ | ❌ | ❌ | ❌ |
| `home` | ❌ | ❌ | ❌ | ❌ |
| `is_block_device` | ❌ | ❌ | ❌ | ❌ |
| `is_char_device` | ❌ | ❌ | ❌ | ❌ |
| `is_fifo` | ❌ | ❌ | ❌ | ❌ |
| `is_mount` | ❌ | ❌ | ❌ | ❌ |
| `is_reserved` | ❌ | ❌ | ❌ | ❌ |
| `is_socket` | ❌ | ❌ | ❌ | ❌ |
| `is_symlink` | ❌ | ❌ | ❌ | ❌ |
| `lchmod` | ❌ | ❌ | ❌ | ❌ |
| `lstat` | ❌ | ❌ | ❌ | ❌ |
| `owner` | ❌ | ❌ | ❌ | ❌ |
| `readlink` | ❌ | ❌ | ❌ | ❌ |
| `root` | ❌ | ❌ | ❌ | ❌ |
| `symlink_to` | ❌ | ❌ | ❌ | ❌ |
| `as_url` | ✅ | ✅ | ✅ | ✅ |
| `clear_cache` | ✅ | ✅ | ✅ | ✅ |
| `client` | ✅ | ✅ | ✅ | ✅ |
| `cloud_prefix` | ✅ | ✅ | ✅ | ✅ |
| `copy` | ✅ | ✅ | ✅ | ✅ |
| `copytree` | ✅ | ✅ | ✅ | ✅ |
| `download_to` | ✅ | ✅ | ✅ | ✅ |
| `from_uri` | ✅ | ✅ | ✅ | ✅ |
| `fspath` | ✅ | ✅ | ✅ | ✅ |
| `full_match` | ✅ | ✅ | ✅ | ✅ |
| `is_valid_cloudpath` | ✅ | ✅ | ✅ | ✅ |
| `parser` | ✅ | ✅ | ✅ | ✅ |
| `rmtree` | ✅ | ✅ | ✅ | ✅ |
| `upload_from` | ✅ | ✅ | ✅ | ✅ |
| `validate` | ✅ | ✅ | ✅ | ✅ |
| `etag` | ✅ | ✅ | ❌ | ✅ |
| `blob` | ✅ | ✅ | ❌ | ❌ |
| `bucket` | ❌ | ✅ | ❌ | ✅ |
| `md5` | ✅ | ✅ | ❌ | ❌ |
| `container` | ✅ | ❌ | ❌ | ❌ |
| `delete` | ❌ | ❌ | ✅ | ❌ |
| `get` | ❌ | ❌ | ✅ | ❌ |
| `head` | ❌ | ❌ | ✅ | ❌ |
| `key` | ❌ | ❌ | ❌ | ✅ |
| `parsed_url` | ❌ | ❌ | ✅ | ❌ |
| `post` | ❌ | ❌ | ✅ | ❌ |
| `put` | ❌ | ❌ | ✅ | ❌ |

----
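
The added table lists `parsed_url` and the verb methods (`get`, `head`, `post`, `put`, `delete`) only under `HttpsPath`. The diff shown here does not include their implementation, so the following is only a standard-library sketch of the kind of information a URL-backed path has to work with. The helper name `describe_url` and the example URL are hypothetical, and treating a trailing `/` as the directory marker is an assumption based on the commit message "Change dir detection to /".

```python
from urllib.parse import urlparse


def describe_url(url: str) -> dict:
    """Split an http(s) URL into the pieces a path-like wrapper would need.

    Hypothetical helper for illustration; not part of cloudpathlib.
    """
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "netloc": parsed.netloc,
        "path": parsed.path,
        # Assumption: a trailing slash marks a "directory" on an HTTP server.
        "looks_like_dir": parsed.path.endswith("/"),
    }


info = describe_url("https://example.com/data/reports/")
print(info)
```

Plain HTTP has no native notion of a directory, which is presumably why the path classes need a convention such as the trailing slash.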

10 changes: 8 additions & 2 deletions cloudpathlib/__init__.py
@@ -4,9 +4,11 @@
 from .azure.azblobclient import AzureBlobClient
 from .azure.azblobpath import AzureBlobPath
 from .cloudpath import CloudPath, implementation_registry
-from .s3.s3client import S3Client
-from .gs.gspath import GSPath
 from .gs.gsclient import GSClient
+from .gs.gspath import GSPath
+from .http.httpclient import HttpClient, HttpsClient
+from .http.httppath import HttpPath, HttpsPath
+from .s3.s3client import S3Client
 from .s3.s3path import S3Path
 
 
@@ -27,6 +29,10 @@
     "implementation_registry",
     "GSClient",
     "GSPath",
+    "HttpClient",
+    "HttpsClient",
+    "HttpPath",
+    "HttpsPath",
     "S3Client",
     "S3Path",
 ]
20 changes: 11 additions & 9 deletions cloudpathlib/cloudpath.py
@@ -27,7 +27,6 @@
     Generator,
     List,
     Optional,
-    Sequence,
     Tuple,
     Type,
     TYPE_CHECKING,
@@ -299,11 +298,11 @@ def __setstate__(self, state: Dict[str, Any]) -> None:
 
     @property
     def _no_prefix(self) -> str:
-        return self._str[len(self.cloud_prefix) :]
+        return self._str[len(self.anchor) :]
 
     @property
     def _no_prefix_no_drive(self) -> str:
-        return self._str[len(self.cloud_prefix) + len(self.drive) :]
+        return self._str[len(self.anchor) + len(self.drive) :]
 
     @overload
     @classmethod
@@ -909,9 +908,9 @@ def relative_to(self, other: Self, walk_up: bool = False) -> PurePosixPath:
         # absolute)
         if not isinstance(other, CloudPath):
             raise ValueError(f"{self} is a cloud path, but {other} is not")
-        if self.cloud_prefix != other.cloud_prefix:
+        if self.anchor != other.anchor:
             raise ValueError(
-                f"{self} is a {self.cloud_prefix} path, but {other} is a {other.cloud_prefix} path"
+                f"{self} is a {self.anchor} path, but {other} is a {other.anchor} path"
             )
 
         kwargs = dict(walk_up=walk_up)
@@ -939,6 +938,9 @@ def full_match(self, pattern: str, case_sensitive: Optional[bool] = None) -> boo
         # strip scheme from start of pattern before testing
         if pattern.startswith(self.anchor + self.drive):
             pattern = pattern[len(self.anchor + self.drive) :]
+        elif pattern.startswith(self.anchor):
+            # for http paths, keep leading slash
+            pattern = pattern[len(self.anchor) - 1 :]
 
         # remove drive, which is kept on normal dispatch to pathlib
         return PurePosixPath(self._no_prefix_no_drive).full_match(  # type: ignore[attr-defined]
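
The new `elif` branch in this hunk is easier to follow with concrete strings. The sketch below reproduces only the two prefix-stripping branches as a free function. The name `strip_anchor` is hypothetical, and the anchor and drive values are assumptions: an S3-style path is taken to have anchor `s3://` with the bucket as drive, and an HTTP path is taken to have an anchor that already includes the host and a trailing slash, which is what the "keep leading slash" comment suggests.

```python
def strip_anchor(pattern: str, anchor: str, drive: str) -> str:
    """Mirror the two branches from the hunk (hypothetical helper)."""
    if pattern.startswith(anchor + drive):
        # Bucket-style paths: drop scheme and drive, the leading "/" remains.
        return pattern[len(anchor + drive):]
    elif pattern.startswith(anchor):
        # Assumed HTTP case: the anchor ends with "/", so step back one
        # character to keep that slash at the start of the pattern.
        return pattern[len(anchor) - 1:]
    return pattern


s3_pattern = strip_anchor("s3://bucket/dir/*.txt", anchor="s3://", drive="bucket")
http_pattern = strip_anchor(
    "https://example.com/dir/*.txt",
    anchor="https://example.com/",
    drive="example.com",
)
print(s3_pattern, http_pattern)
```

Under these assumptions both calls yield the same rooted pattern, so the subsequent `PurePosixPath(...).full_match(...)` call sees a consistent shape regardless of scheme.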
@@ -969,7 +971,7 @@ def parent(self) -> Self:
         return self._dispatch_to_path("parent")
 
     @property
-    def parents(self) -> Sequence[Self]:
+    def parents(self) -> Tuple[Self, ...]:
         return self._dispatch_to_path("parents")
 
     @property
@@ -1224,7 +1226,7 @@ def copytree(self, destination, force_overwrite_to_cloud=None, ignore=None):
                 )
             elif subpath.is_dir():
                 subpath.copytree(
-                    destination / subpath.name,
+                    destination / (subpath.name + ("" if subpath.name.endswith("/") else "/")),
                     force_overwrite_to_cloud=force_overwrite_to_cloud,
                     ignore=ignore,
                 )
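
The changed `copytree` line makes sure the name joined onto `destination` ends with exactly one `/`. A minimal sketch of that expression in isolation, with a hypothetical helper name `as_dir_name`. The reading that the slash is what marks the destination as a directory for HTTP paths is an assumption drawn from the commit message "Change dir detection to /".

```python
def as_dir_name(name: str) -> str:
    """Append a trailing slash unless one is already present (hypothetical helper)."""
    return name + ("" if name.endswith("/") else "/")


without_slash = as_dir_name("reports")
with_slash = as_dir_name("reports/")
print(without_slash, with_slash)
```

Because the expression is idempotent, it is safe whether or not the backend already reports directory names with a trailing slash.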
@@ -1258,8 +1260,8 @@ def _new_cloudpath(self, path: Union[str, os.PathLike]) -> Self:
             path = path[1:]
 
         # add prefix/anchor if it is not already
-        if not path.startswith(self.cloud_prefix):
-            path = f"{self.cloud_prefix}{path}"
+        if not path.startswith(self.anchor):
+            path = f"{self.anchor}{path}"
 
         return self.client.CloudPath(path)
 
9 changes: 9 additions & 0 deletions cloudpathlib/http/__init__.py
@@ -0,0 +1,9 @@
+from .httpclient import HttpClient, HttpsClient
+from .httppath import HttpPath, HttpsPath
+
+__all__ = [
+    "HttpClient",
+    "HttpPath",
+    "HttpsClient",
+    "HttpsPath",
+]