-
Notifications
You must be signed in to change notification settings - Fork 0
(local) "File or directory not found", when input_data points to a single S3 file #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Adds test case for ModelTrainer with single S3 file input - Ensures proper directory creation and cleanup in local mode - Improves S3 file handling in ModelTrainer
Resolves #9 |
To provide feedback, navigate to the Files changed tab and leave comments on the proposed code changes. Choose Start review for each comment, and then choose Request changes, and I'll propose revised changes. |
⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done |
|
||
# Upload the file to S3 | ||
s3_key = "data/single_file.csv" | ||
session.upload_data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description: No error handling for potential S3 upload failures or file operations. Add try-except blocks for S3 operations and file handling to catch and handle potential errors.
Severity: Medium
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix addresses the lack of error handling for S3 upload and file operations. It adds a try-except block around the S3 upload operation to catch and handle potential errors, and wraps the temporary file removal in a try-except block within a finally clause. This ensures that errors during S3 upload or file removal are caught and logged, improving the robustness of the code.
session.upload_data( | |
# Upload the file to S3 | |
s3_key = "data/single_file.csv" | |
try: | |
session.upload_data( | |
path=tmp_file_path, | |
bucket=bucket, | |
key_prefix=s3_key, | |
) | |
except Exception as e: | |
print(f"Error uploading file to S3: {str(e)}") | |
raise | |
finally: | |
try: | |
os.unlink(tmp_file_path) # Remove the temporary file | |
except OSError as e: | |
print(f"Error removing temporary file: {str(e)}") | |
source_code = SourceCode( | |
source_dir=SOURCE_DIR, |
return session | ||
|
||
|
||
def test_get_data_source_local_path_s3_single_file(local_container, mock_s3_data_source, mock_session, monkeypatch): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description: The test function is long and could be split into smaller, more focused test cases. Consider breaking down the test into smaller, more specific test cases for better readability and maintainability.
Severity: Low
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix addresses the comment by splitting the original test function into two smaller, more focused test cases. The first test case, test_get_data_source_local_path_s3_single_file_creation
, verifies that a local directory is created. The second test case, test_get_data_source_local_path_s3_single_file_download
, checks that the download_folder
function is called. This improves readability and maintainability by separating concerns and making each test more specific.
def test_get_data_source_local_path_s3_single_file(local_container, mock_s3_data_source, mock_session, monkeypatch): | |
return session | |
def test_get_data_source_local_path_s3_single_file_creation(local_container, mock_s3_data_source, mock_session): | |
"""Test that _get_data_source_local_path creates a local directory for S3 single file.""" | |
local_container.sagemaker_session = mock_session | |
local_dir = local_container._get_data_source_local_path(mock_s3_data_source) | |
assert os.path.exists(local_dir) | |
shutil.rmtree(local_dir, ignore_errors=True) | |
def test_get_data_source_local_path_s3_single_file_download(local_container, mock_s3_data_source, mock_session): | |
"""Test that _get_data_source_local_path calls download_folder for S3 single file.""" | |
from unittest.mock import patch | |
local_container.sagemaker_session = mock_session | |
with patch('sagemaker.modules.local_core.local_container.download_folder') as mock_download: | |
local_container._get_data_source_local_path(mock_s3_data_source) | |
mock_download.assert_called_once() |
✅ I finished the code review, and left comments with the issues I found. I will now generate code fix suggestions. |
This pull request adds a fix for handling single files as input data in the ModelTrainer's local mode. Previously, when input_data pointed to a single S3 file, the local mode execution was not properly handling the directory creation and data downloading. The changes ensure that:
This improves the robustness of local mode training, particularly when working with single file inputs rather than directories of data.
The bulk of the meaningful changes are in the test file tests/integ/sagemaker/modules/train/test_local_model_trainer.py which adds test coverage for this scenario. The implementation change itself is relatively small but important for proper local mode functionality.