Rotated bboxes transforms #9104

Open
wants to merge 10 commits into base: main

Member

@AntoineSimoulin AntoineSimoulin commented Jun 10, 2025

Add Transforms support for Rotated Boxes

This PR implements the remaining transforms for rotated boxes, following the work in #9095 and #9084. In particular, it makes the following changes:

  • Add support for perspective for rotated boxes;
  • Add the missing tests for the affine transformation with rotated boxes;
  • Fix the _affine_bounding_boxes_with_expand function for rotated boxes when expand=True;
  • Fix the clamp_bounding_boxes function, with the behavior detailed below;
  • Add support for elastic for rotated boxes;
  • Add support for crop for rotated boxes;
  • Add missing tests for TestConvertBoundingBoxFormat;
  • Remove the SUPPORTED_BOX_FORMATS and NEW_BOX_FORMATS variables from the tests, as the transform tests now fully cover rotated boxes;
  • Add support for sanitize for rotated boxes.

Details on the clamping function

image

For the clamping, we reorder the points of the box so that the point with the lowest value on the x-axis is point 1 (cf. _order_bounding_boxes_points). Depending on the position of the four vertices with respect to the y-axis (cf. the cases above), we adjust the points (x1, y1), (x2, y2), and (x4, y4) so that the point (x1, y1) ends up on the right side of the y-axis. We loop through the four vertices of the rotated box and apply the same operation to each. In the end, the bounding box is guaranteed to lie within the canvas and to be completely included within the area of the original box.
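The reordering step described above can be sketched in plain Python. This is only an illustration under stated assumptions: `order_points_lowest_x_first` is a hypothetical helper, not the PR's `_order_bounding_boxes_points`, and it works on a single box given as a list of (x, y) tuples rather than on batched tensors.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def order_points_lowest_x_first(points: List[Point]) -> List[Point]:
    """Cyclically rotate the vertex list so the vertex with the lowest x comes first.

    The traversal order around the box is preserved; only the starting vertex
    changes. Ties are broken by taking the first such vertex, which is an
    arbitrary choice made for this sketch.
    """
    start = min(range(len(points)), key=lambda i: points[i][0])
    return points[start:] + points[:start]


# A 5 x 5 rotated square whose last vertex lies left of the y-axis (x < 0).
box = [(1.0, 0.0), (5.0, 3.0), (2.0, 7.0), (-2.0, 4.0)]
ordered = order_points_lowest_x_first(box)
print(ordered)
```

After this step, the first point is the only candidate that has to be checked first against the left edge of the canvas, which is what makes the case analysis on the remaining vertices tractable.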

We provide some illustrative examples below (original boxes in grey and corresponding clamped boxes in blue).
image

Please note that, depending on the order in which we loop through the vertices, the output box is not guaranteed to be the largest-area box that meets the conditions above: the clamping may be too aggressive. This can occur if the box is largely out of bounds along multiple axes.
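To make the "largest area" criterion concrete, here is a minimal sketch of comparing two candidate boxes by area using the shoelace formula. The helper `polygon_area` and the two candidate boxes are hypothetical and chosen only for illustration; the PR's own area computation operates on tensors and may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple polygon from its vertices in traversal order (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x_a, y_a = points[i]
        x_b, y_b = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x_a * y_b - x_b * y_a
    return abs(twice_area) / 2.0


# Two hypothetical candidate boxes; the one with the larger area is kept.
candidate_a = [(1.0, 0.0), (5.0, 3.0), (2.0, 7.0), (-2.0, 4.0)]
candidate_b = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
best = max((candidate_a, candidate_b), key=polygon_area)
print(polygon_area(candidate_a), polygon_area(candidate_b))
```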

Test plan

Please run the following tests:

pytest test/test_transforms_v2.py -vvv -k "TestPerspective and test_kernel_bounding_boxes"
pytest test/test_transforms_v2.py -vvv -k "TestPerspective and test_correctness_perspective_bounding_boxes"

pytest test/test_transforms_v2.py -vvv -k "TestAffine and test_transform_bounding_boxes_correctness"

pytest test/test_transforms_v2.py -vvv -k "TestRotate and test_kernel_bounding_boxes"
pytest test/test_transforms_v2.py -vvv -k "TestRotate and test_functional_bounding_boxes_correctness"
pytest test/test_transforms_v2.py -vvv -k "TestRotate and test_transform_bounding_boxes_correctness"

pytest test/test_transforms_v2.py -vvv -k "TestClampBoundingBoxes and test_kernel"
pytest test/test_transforms_v2.py -vvv -k "TestClampBoundingBoxes and test_functional"

pytest test/test_transforms_v2.py -vvv -k "TestElastic and test_kernel_bounding_boxes"

pytest test/test_transforms_v2.py -vvv -k "TestConvertBoundingBoxFormat and test_kernel"
pytest test/test_transforms_v2.py -vvv -k "TestConvertBoundingBoxFormat and test_kernel_noop"


pytorch-bot bot commented Jun 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9104

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures

As of commit 4a02ba0 with merge base fcca6ff:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Comment on lines 425 to 432
cond_a = x1.lt(0).logical_and(x2.ge(0)).logical_and(x3.ge(0)).logical_and(x4.ge(0))
cond_a = cond_a.logical_and(area(case_a) > area(case_b))
cond_a = cond_a.logical_or(x1.lt(0).logical_and(x2.ge(0)).logical_and(x3.ge(0)).logical_and(x4.le(0)))
cond_b = x1.lt(0).logical_and(x2.ge(0)).logical_and(x3.ge(0)).logical_and(x4.ge(0))
cond_b = cond_b.logical_and(area(case_a) <= area(case_b))
cond_b = cond_b.logical_or(x1.lt(0).logical_and(x2.le(0)).logical_and(x3.ge(0)).logical_and(x4.ge(0)))
cond_c = x1.lt(0).logical_and(x2.le(0)).logical_and(x3.ge(0)).logical_and(x4.le(0))
cond_d = x1.lt(0).logical_and(x2.le(0)).logical_and(x3.le(0)).logical_and(x4.le(0))
Member

For all of these, is there a particular reason to use the methods? If not, maybe we can rely on plain operators like < etc.?

Member Author

I am not sure how to do this, as the operation needs to be applied along the axis. Operators such as AND and OR typically reduce the result to a single boolean value, so we will need to use logical_and. I can refactor with < and > if this makes the code more readable.
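For reference, a small sketch of the two styles discussed in this thread, using toy tensors rather than the PR's data. It suggests that the comparison operators and the bitwise operators `&` and `|` act elementwise on PyTorch tensors and match the method-chaining form, whereas the Python keywords `and` and `or` need a single truth value and do not work on multi-element tensors.

```python
import torch

# Toy coordinates for three boxes (illustrative values only).
x1 = torch.tensor([-1.0, 2.0, -3.0])
x2 = torch.tensor([0.5, 1.0, -2.0])

# Method-chaining form, as in the snippet under review.
cond_methods = x1.lt(0).logical_and(x2.ge(0))

# Operator form: comparisons and `&` are applied elementwise.
# Parentheses are required because `&` binds tighter than `<` and `>=`.
cond_operators = (x1 < 0) & (x2 >= 0)

# Same idea for a disjunction.
either_methods = x1.lt(0).logical_or(x2.ge(0))
either_operators = (x1 < 0) | (x2 >= 0)

print(torch.equal(cond_methods, cond_operators))
print(torch.equal(either_methods, either_operators))
```

The operator form is shorter, but the parentheses around each comparison are easy to forget, which is one argument for keeping the explicit method names.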

Member Author

AntoineSimoulin commented Jun 13, 2025

Hey @NicolasHug, I published a fix that should fix the tests and address your comments. Here is the list of modifications:

  • Modify the make_bounding_boxes function to add clamping and padding, thus ensuring that rotated boxes are built within the range of the canvas size;
  • Place the reference_perspective_bounding_boxes function back within the TestPerspective class, to reduce the number of lines modified in this PR and because this function is only used within the class;
  • Loosen the tolerance for the test in TestAffine to atol=1e-5, rtol=2e-5, as the rotation angle had slightly higher variation when computed with the test function. Also allow some tolerance for TestConvertBoundingBoxFormat;
  • Do not apply the _parallelogram_to_bounding_boxes function to int rotated boxes, as truncating the points from float to int does not preserve the rectangular shape of the box;
  • Apply clamping after resizing rotated bounding boxes;
  • Improve the docstring for the _clamp_rotated_bounding_boxes function.
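The point about integer boxes can be illustrated with a small pure-Python sketch. The helpers `rotated_rectangle` and `corner_dot` and the chosen box are hypothetical, not part of the PR; the sketch only shows that truncating the corners of a rotated rectangle to integers can leave a shape whose adjacent edges are no longer perpendicular.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_rectangle(cx: float, cy: float, w: float, h: float, angle_deg: float) -> List[Point]:
    """Corners of a w x h rectangle centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


def corner_dot(points: List[Point]) -> float:
    """Dot product of the two edges meeting at the second corner (zero for a right angle)."""
    (x0, y0), (x1, y1), (x2, y2) = points[:3]
    return (x0 - x1) * (x2 - x1) + (y0 - y1) * (y2 - y1)


float_box = rotated_rectangle(10.0, 10.0, 7.0, 3.0, 30.0)
# int() truncates toward zero, mimicking a float-to-int cast of the corners.
int_box = [(int(x), int(y)) for x, y in float_box]

print(corner_dot(float_box))  # numerically close to zero: right angle preserved
print(corner_dot(int_box))    # non-zero: the truncated corners no longer form a rectangle
```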

Please run the tests with:

pytest test/test_transforms_v2.py -k box -v
...
2372 passed, 1432 skipped, 5025 deselected in 67.68s (0:01:07)

if int_dtype:
    # Does not apply the transformation to `int` boxes as the rounding error
    # will typically not ensure the resulting box has a rectangular shape.
    return parallelogram.clone()
Member

@NicolasHug NicolasHug Jun 13, 2025

Is it better to return the parallelogram as-is, i.e. really not a rectangle, or still try to do the conversion and return something that is closer to a rectangle?

Separately it makes me wonder, maybe we should completely prevent rotated bounding boxes of integer dtype? That would probably make our life a lot easier, and users probably should be running the whole transform pipeline in float anyway, so as to avoid rounding errors compounding?
