Commit fedde5f

Do not cleanup archive download tempdir immediately (#7479)
* Do not cleanup download tempdir immediately

The previous logic forced us to handle populating the download directory in this function, right next to the download and hash checking. By extending the lifetime of the directory, we can more easily separate the code.

This also allows for additional optimizations later: by using metadata from wheels directly instead of unpacking them, we can avoid extracting wheels unnecessarily. Unpacked files can easily be 3x larger than the archives themselves, so this should reduce disk utilization and general IO significantly.
1 parent b9bdad2 commit fedde5f
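The commit message mentions a follow-up optimization: reading metadata directly from wheels instead of unpacking them. A minimal sketch of that idea using only the standard library (the function name `read_wheel_metadata` is illustrative, not pip's actual API):

```python
import zipfile


def read_wheel_metadata(wheel_path):
    # A wheel is a zip archive; its core metadata lives at
    # <name>-<version>.dist-info/METADATA inside the archive.
    # Reading it through zipfile avoids extracting the (often much
    # larger) unpacked tree to disk at all.
    with zipfile.ZipFile(wheel_path) as zf:
        for name in zf.namelist():
            if name.endswith(".dist-info/METADATA"):
                return zf.read(name).decode("utf-8")
    raise ValueError("no METADATA found in {}".format(wheel_path))
```

Since only the single metadata member is decompressed, the disk and IO cost is proportional to the metadata file, not to the whole unpacked wheel.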

File tree

1 file changed: +23 -23 lines changed
src/pip/_internal/operations/prepare.py

```diff
@@ -144,32 +144,32 @@ def unpack_http_url(
     hashes=None,  # type: Optional[Hashes]
 ):
     # type: (...) -> None
-    with TempDirectory(kind="unpack") as temp_dir:
-        # If a download dir is specified, is the file already downloaded there?
-        already_downloaded_path = None
-        if download_dir:
-            already_downloaded_path = _check_download_dir(
-                link, download_dir, hashes
-            )
+    temp_dir = TempDirectory(kind="unpack", globally_managed=True)
+    # If a download dir is specified, is the file already downloaded there?
+    already_downloaded_path = None
+    if download_dir:
+        already_downloaded_path = _check_download_dir(
+            link, download_dir, hashes
+        )
 
-        if already_downloaded_path:
-            from_path = already_downloaded_path
-            content_type = mimetypes.guess_type(from_path)[0]
-        else:
-            # let's download to a tmp dir
-            from_path, content_type = _download_http_url(
-                link, downloader, temp_dir.path, hashes
-            )
+    if already_downloaded_path:
+        from_path = already_downloaded_path
+        content_type = mimetypes.guess_type(from_path)[0]
+    else:
+        # let's download to a tmp dir
+        from_path, content_type = _download_http_url(
+            link, downloader, temp_dir.path, hashes
+        )
 
-        # unpack the archive to the build dir location. even when only
-        # downloading archives, they have to be unpacked to parse dependencies
-        unpack_file(from_path, location, content_type)
+    # unpack the archive to the build dir location. even when only
+    # downloading archives, they have to be unpacked to parse dependencies
+    unpack_file(from_path, location, content_type)
 
-        # a download dir is specified; let's copy the archive there
-        if download_dir and not os.path.exists(
-            os.path.join(download_dir, link.filename)
-        ):
-            _copy_file(from_path, download_dir, link)
+    # a download dir is specified; let's copy the archive there
+    if download_dir and not os.path.exists(
+        os.path.join(download_dir, link.filename)
+    ):
+        _copy_file(from_path, download_dir, link)
 
 
 def _copy2_ignoring_special_files(src, dest):
```
