Skip to content

(local) "File or directory not found", when input_data points to a single S3 file #10

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

amazon-q-developer[bot]
Copy link

This pull request adds a fix for handling single files as input data in the ModelTrainer's local mode. Previously, when input_data pointed to a single S3 file, the local mode execution was not properly handling the directory creation and data downloading. The changes ensure that:

  1. The local directory is properly created with os.makedirs() before attempting to download files
  2. Single file inputs from S3 are handled correctly in local mode
  3. Adds test coverage for the single file input scenario

This improves the robustness of local mode training, particularly when working with single file inputs rather than directories of data.

The bulk of the meaningful changes are in the test file tests/integ/sagemaker/modules/train/test_local_model_trainer.py which adds test coverage for this scenario. The implementation change itself is relatively small but important for proper local mode functionality.

- Adds test case for ModelTrainer with single S3 file input
- Ensures proper directory creation and cleanup in local mode
- Improves S3 file handling in ModelTrainer
Copy link
Author

Resolves #9

Copy link
Author

To provide feedback, navigate to the Files changed tab and leave comments on the proposed code changes. Choose Start review for each comment, and then choose Request changes, and I'll propose revised changes.

Copy link
Author

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done


# Upload the file to S3
s3_key = "data/single_file.csv"
session.upload_data(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description: No error handling for potential S3 upload failures or file operations. Add try-except blocks for S3 operations and file handling to catch and handle potential errors.

Severity: Medium

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the lack of error handling for S3 upload and file operations. It adds a try-except block around the S3 upload operation to catch and handle potential errors, and wraps the temporary file removal in a try-except block within a finally clause. This ensures that errors during S3 upload or file removal are caught and logged, improving the robustness of the code.

Suggested change
session.upload_data(
# Upload the file to S3
s3_key = "data/single_file.csv"
try:
session.upload_data(
path=tmp_file_path,
bucket=bucket,
key_prefix=s3_key,
)
except Exception as e:
print(f"Error uploading file to S3: {str(e)}")
raise
finally:
try:
os.unlink(tmp_file_path) # Remove the temporary file
except OSError as e:
print(f"Error removing temporary file: {str(e)}")
source_code = SourceCode(
source_dir=SOURCE_DIR,

return session


def test_get_data_source_local_path_s3_single_file(local_container, mock_s3_data_source, mock_session, monkeypatch):
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description: The test function is long and could be split into smaller, more focused test cases. Consider breaking down the test into smaller, more specific test cases for better readability and maintainability.

Severity: Low

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix addresses the comment by splitting the original test function into two smaller, more focused test cases. The first test case, test_get_data_source_local_path_s3_single_file_creation, verifies that a local directory is created. The second test case, test_get_data_source_local_path_s3_single_file_download, checks that the download_folder function is called. This improves readability and maintainability by separating concerns and making each test more specific.

Suggested change
def test_get_data_source_local_path_s3_single_file(local_container, mock_s3_data_source, mock_session, monkeypatch):
return session
def test_get_data_source_local_path_s3_single_file_creation(local_container, mock_s3_data_source, mock_session):
"""Test that _get_data_source_local_path creates a local directory for S3 single file."""
local_container.sagemaker_session = mock_session
local_dir = local_container._get_data_source_local_path(mock_s3_data_source)
assert os.path.exists(local_dir)
shutil.rmtree(local_dir, ignore_errors=True)
def test_get_data_source_local_path_s3_single_file_download(local_container, mock_s3_data_source, mock_session):
"""Test that _get_data_source_local_path calls download_folder for S3 single file."""
from unittest.mock import patch
local_container.sagemaker_session = mock_session
with patch('sagemaker.modules.local_core.local_container.download_folder') as mock_download:
local_container._get_data_source_local_path(mock_s3_data_source)
mock_download.assert_called_once()

Copy link
Author

✅ I finished the code review, and left comments with the issues I found. I will now generate code fix suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants