8360023: Add an insertion sort implementation to Hotspot #25895

Open
wants to merge 7 commits into
base: master
from

Conversation

merykitty
Member

@merykitty merykitty commented Jun 19, 2025

Hi,

This PR adds an implementation of insertion sort to Hotspot. It is an in-place, stable algorithm, and it is well suited to arrays with a small number of elements. The motivation is JDK-8357186, in which a stable sort is desired and the number of elements is small. Additionally, since insertion sort is typically the most efficient sorting algorithm for small arrays, it can be used for non-stable sorting as well.
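
For readers unfamiliar with the algorithm, a minimal sketch of an in-place, stable insertion sort follows. It is not the code from this PR: the name `insertion_sort_sketch` and the three-way comparator convention (negative, zero, positive) are assumptions made for illustration.

```cpp
#include <utility>

// Illustrative sketch only, not the implementation proposed in this PR.
// comp(a, b) is assumed to return <0, 0 or >0, like a C-style three-way comparator.
template <class T, class Compare>
void insertion_sort_sketch(T* data, int size, Compare comp) {
  for (int i = 1; i < size; i++) {
    T current = std::move(data[i]);
    int pos = i;
    // Walk backward, shifting only elements strictly greater than current.
    // Equal elements are never moved past each other, which keeps the sort stable.
    while (pos > 0 && comp(data[pos - 1], current) > 0) {
      data[pos] = std::move(data[pos - 1]);
      pos--;
    }
    data[pos] = std::move(current);
  }
}
```

The sort needs no extra storage beyond one temporary element, which is what "in-place" refers to above.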

In addition, I make some improvements to GrowableArrayIterator:

  • Make a non-const variant (our current iterator is const only).
  • Add various utility operators to align with a typical iterator.

JDK-8360032 is a follow-up work that will build a stable merge-insertion sort on top of this PR.

Please take a look and share your thoughts. Thanks very much.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8360023: Add an insertion sort implementation to Hotspot (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25895/head:pull/25895
$ git checkout pull/25895

Update a local copy of the PR:
$ git checkout pull/25895
$ git pull https://git.openjdk.org/jdk.git pull/25895/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25895

View PR using the GUI difftool:
$ git pr show -t 25895

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25895.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Jun 19, 2025

👋 Welcome back qamai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jun 19, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 19, 2025
@openjdk

openjdk bot commented Jun 19, 2025

@merykitty The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge

mlbridge bot commented Jun 19, 2025

Comment on lines 194 to 198
GrowableArrayIterator<E, false> ncbegin() {
return GrowableArrayIterator<E, false>(this, 0);
}

GrowableArrayIterator<E, false> ncend() {
Member

I don't think ncbegin and ncend are good names. Why not use begin and end?
Also, GrowableArrayIterator<E, false> looks confusing because of the second parameter.
Why not use something like GrowableArrayConstIterator?

Member Author

Because it would be an incompatible change: there are places where we do things like

GrowableArrayIterator<E> it = array.begin();

I think your latter concern can be addressed by using GrowableArrayNonConstIterator.

@eastig
Member

eastig commented Jun 19, 2025

The motivation for this is JDK-8357186 in which a stable sort is desired and the number of elements is small.

Maybe instead of introducing a generalized version of the insertion sort, you can have a function implementing the insertion sort in the context of JDK-8357186? This specialized function will be much smaller than the PR changes.

@merykitty
Member Author

@eastig Thanks a lot for your reviews. Yes, a specialized insertion function could work, but a generalized function would be more useful and easier to test. A large part of this change is to modernize GrowableArrayIterator, and the actual insertion sort is pretty small (only about 40 LOC).

@eastig
Member

eastig commented Jun 19, 2025

... the actual insertion sort is pretty small (only about 40 LOC).

This is my point. If the amount of change is so small and there is only one use case, why do we need a type-independent implementation?

@eastig
Member

eastig commented Jun 19, 2025

a generalized function would be more useful and easier to test.

For whom would it be more useful? Usually generalized versions grow out of a set of specialized versions, not vice versa.

There is no problem testing a specialized version.

@merykitty
Member Author

Firstly, since the generalized function is no more complex than the specialized function, why not go for the generalized function and save us the trouble of generalizing the specialized function if the need arises?

Secondly, it is much easier to test the generalized function. I can easily verify that sorting an array of int is correct, but I cannot easily verify that sorting an array of SigEntry is correct, especially when I am also modifying the compare function. Additionally, a stable sort is needed because it is non-trivial to obtain the desired effect with a non-stable sort. With the generalized function, I can easily make a TwoInt type that is compared by val, while idx stores the index in the original array, which makes stability easy to check.
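
A sketch of how such a stability check might look. The field names val and idx come from the comment above; the struct layout, `compare_by_val`, `is_sorted_and_stable`, and the compact sort used to keep the example self-contained are all assumptions, not the test code from this PR.

```cpp
// Hypothetical test helper: val is the sort key, idx records the original position.
struct TwoInt {
  int val;
  int idx;
};

// Compare by val only, so elements with equal val look identical to the sort.
inline int compare_by_val(const TwoInt& a, const TwoInt& b) {
  return (a.val > b.val) - (a.val < b.val);
}

// Compact insertion sort, included only to make this sketch self-contained.
template <class T, class Compare>
void insertion_sort_sketch(T* data, int size, Compare comp) {
  for (int i = 1; i < size; i++) {
    T current = data[i];
    int pos = i;
    while (pos > 0 && comp(data[pos - 1], current) > 0) {
      data[pos] = data[pos - 1];
      pos--;
    }
    data[pos] = current;
  }
}

// The result is stable if it is ordered by val and, within equal val, by original idx.
inline bool is_sorted_and_stable(const TwoInt* data, int size) {
  for (int i = 1; i < size; i++) {
    if (data[i - 1].val > data[i].val) return false;
    if (data[i - 1].val == data[i].val && data[i - 1].idx > data[i].idx) return false;
  }
  return true;
}
```

Because idx is ignored by the comparator, an unstable sort is free to reorder equal keys, and the check above would catch that.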

@eastig
Member

eastig commented Jun 19, 2025

...save us the trouble of generalizing the specialized function if the need arises.

Unfortunately, experience teaches us that code written for future uses is never used as is.
In the many years of this project, there has not been any need for a stable sort. Yours is the first case. You cannot predict other cases.

If you want the insertion sort, I'd recommend to have it in GrowableArrayView: GrowableArrayView<>::insertion_sort().
In this case you will not have the issue with iterators names which is a big issue from my point of view.

@eastig
Member

eastig commented Jun 19, 2025

Here is a simple implementation which does not require a lot of code:

template <typename E>
void GrowableArrayView<E>::insertion_sort(int f(E*, E*)) {
    if (_data == nullptr) return;
    for (int i = 1; i < length(); i++) {
        E key = _data[i];
        int j = i - 1;
        // f takes pointers; shift only strictly greater elements so equal ones keep their order.
        while (j >= 0 && f(&_data[j], &key) > 0) {
            _data[j + 1] = _data[j];
            j--;
        }
        _data[j + 1] = key;
    }
}

@merykitty
Member Author

merykitty commented Jun 19, 2025

In the many years of this project, there has not been any need for a stable sort.

Just run grep -rn ./src/hotspot -e "stable sort" and you will find:

// The extension is to ensure stable sort.

If you want the insertion sort, I'd recommend to have it in GrowableArrayView: GrowableArrayView<>::insertion_sort().

We are programming in C++, I think it would be better to follow the C++ convention. The practical reason is that it prevents users not wanting to sort from having to include the sort functionality.

In this case you will not have the issue with iterators names which is a big issue from my point of view.

Now we have GrowableArrayIterator and GrowableArrayNonConstIterator, what is the issue with them?

@merykitty
Member Author

Note that insertion sort is the most efficient sorting algorithm for small arrays, so we can use it for non-stable sort as well.

@eastig
Member

eastig commented Jun 19, 2025

In the many years of this project, there has not been any need for a stable sort.

Just run grep -rn ./src/hotspot -e "stable sort" and you will find:

// The extension is to ensure stable sort.

Ok. Do we need to rewrite this code to use a stable sort?

If you want the insertion sort, I'd recommend to have it in GrowableArrayView: GrowableArrayView<>::insertion_sort().

We are programming in C++, I think it would be better to follow the C++ convention. The practical reason is that it prevents users not wanting to sort from having to include the sort functionality.

Yes, we use C++, but we use a subset of it: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
IMO 10 LoC is always better than 50+ LoC.
We don't write library code that uses most C++ features. If we can solve a problem with less code that looks like C code, let's use such a solution. Our goal is not to use as many C++ features as possible, especially approaches used in the STL.

In this case you will not have the issue with iterators names which is a big issue from my point of view.

Now we have GrowableArrayIterator and GrowableArrayNonConstIterator, what is the issue with them?

One issue is the pollution of the global namespace by a name that will rarely be used. Another is that this is the opposite of what C++ programmers are familiar with: iterator and const_iterator. It's already confusing: users of GrowableArrayIterator might expect it to be non-constant.
So instead of fixing this confusion by getting closer to the C++ standard, we are diverging further from it.
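
For reference, the iterator and const_iterator convention mentioned here can be sketched as one implementation parameterized on constness and exposed under the two familiar names. `TinyArray` and `Iter` are hypothetical stand-ins; this is not a proposal for how GrowableArray should be changed.

```cpp
#include <type_traits>

// Hypothetical minimal container, standing in for GrowableArray in this sketch.
template <class E>
class TinyArray {
 public:
  // One implementation parameterized on constness.
  template <bool IsConst>
  class Iter {
   public:
    using pointer = std::conditional_t<IsConst, const E*, E*>;
    using reference = std::conditional_t<IsConst, const E&, E&>;
    explicit Iter(pointer p) : _p(p) {}
    reference operator*() const { return *_p; }
    Iter& operator++() { ++_p; return *this; }
    bool operator!=(const Iter& other) const { return _p != other._p; }
   private:
    pointer _p;
  };

  // Nested names follow the standard library convention and stay out of the global namespace.
  using iterator = Iter<false>;
  using const_iterator = Iter<true>;

  void append(const E& e) { _data[_len++] = e; }  // no bounds check: sketch only
  iterator begin() { return iterator(_data); }
  iterator end() { return iterator(_data + _len); }
  const_iterator begin() const { return const_iterator(_data); }
  const_iterator end() const { return const_iterator(_data + _len); }

 private:
  E _data[8] = {};
  int _len = 0;
};
```

With this shape, overload resolution on the constness of the container picks the iterator type, so callers never spell out the boolean parameter.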

@jdksjolen
Contributor

jdksjolen commented Jun 19, 2025

Hi @merykitty ,

Thank you for taking the effort to produce tooling for everyone when you found a need for it yourself. Often, we have useful datatypes hidden away into internals that we'd like to use, or we simply do other solutions because our preferred solution is missing.

Unfortunately, I think that the ceremony required to get your insertion sort working for someone else's type will put other devs off from using it. None, AFAIK, of our datatypes are compatible with the STL's interfaces.

If we take this definition:

template <typename E>
void GrowableArrayView<E>::insertion_sort(int f(E*, E*)) {
    if (_data == nullptr) return;
    for (int i = 1; i < length(); i++) {
        E key = _data[i];
        int j = i - 1;
        // f takes pointers; shift only strictly greater elements so equal ones keep their order.
        while (j >= 0 && f(&_data[j], &key) > 0) {
            _data[j + 1] = _data[j];
            j--;
        }
        _data[j + 1] = key;
    }
}

And change it around a bit:

template<typename T, typename C>
void insertion_sort(T* array, size_t length, C comparator) {
    for (size_t i = 1; i < length; i++) {
        T key = array[i]; // Should it really copy???
        size_t j = i; // slot to fill; unsigned, so test with j > 0 rather than j >= 0
        // Shift strictly greater elements right; equal ones stay put (stable).
        while (j > 0 && comparator(array[j - 1], key) > 0) {
            array[j] = array[j - 1];
            j--;
        }
        array[j] = key;
    }
}

Then I think we have something general-ish.
For stable_sort we then do:

GrowableArray::stable_sort(C comparator) {
  insertion_sort(_data, length(), comparator);
}

This is going to be general enough: in most of our cases we have a contiguous array of some fixed element type, with a known size, which we want to sort in place. This is sufficient for expressing that.

This also fits well with the QuickSort class that we have; it's the same type of interface. Maybe quickSort.hpp should be renamed to sort.hpp, with InsertionSort as a class in there? I'm not sure what the style guide thinks about that, but I think it's a good idea :-).

@eastig
Member

eastig commented Jun 19, 2025

@jdksjolen,

GrowableArray::stable_sort(C comparator) {
  insertion_sort(_data, length(), comparator);
}

It has a potential problem: insertion sort is O(n^2) in the worst case, which matters if stable_sort is ever called on a large array.
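
To make the quadratic worst case concrete, the following sketch counts element moves. For reverse-sorted input of length n, insertion sort performs n(n-1)/2 shifts. `count_shifts` is a hypothetical helper written for this illustration.

```cpp
// Illustrative only: sorts data in place and returns how many element shifts were needed.
inline long count_shifts(int* data, int size) {
  long shifts = 0;
  for (int i = 1; i < size; i++) {
    int key = data[i];
    int j = i - 1;
    while (j >= 0 && data[j] > key) {
      data[j + 1] = data[j];
      j--;
      shifts++;
    }
    data[j + 1] = key;
  }
  return shifts;
}
```

On already sorted input the inner loop never runs, so the cost is linear; the quadratic behaviour only appears as the input gets further from sorted order.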

@merykitty
Member Author

@jdksjolen Thanks for your suggestion. Actually, I can make it so that a T* will satisfy RandomIt but there is no need for RandomIt right now. It is unfortunate because an iterator will give us more safety net, though.

I have reverted the GrowableArrayIterator changes. Insertion sort is good for small arrays so we can use it for QuickSort::sort, too. For a stable sort algorithm, implementing a merge - insertion sort should be the way, there is no need to rush for a stable sort method.
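
One common way to use insertion sort inside a quicksort is as a cutoff for small ranges. The sketch below shows the idea only; it is not the QuickSort::sort code, and the names `hybrid_sort_sketch`, `insertion_sort_sketch`, and the cutoff value of 16 are assumptions. The result is not stable, since the partitioning step is not.

```cpp
#include <utility>

// Assumed cutoff; a real value would need to be chosen by measurement.
constexpr int kSmallArrayCutoff = 16;

// comp(a, b) is assumed to return <0, 0 or >0.
template <class T, class Compare>
void insertion_sort_sketch(T* data, int size, Compare comp) {
  for (int i = 1; i < size; i++) {
    T current = data[i];
    int pos = i;
    while (pos > 0 && comp(data[pos - 1], current) > 0) {
      data[pos] = data[pos - 1];
      pos--;
    }
    data[pos] = current;
  }
}

// Hypothetical hybrid: quicksort partitions large ranges, insertion sort finishes small ones.
template <class T, class Compare>
void hybrid_sort_sketch(T* data, int size, Compare comp) {
  while (size > kSmallArrayCutoff) {
    T pivot = data[size / 2];
    int lo = 0;
    int hi = size - 1;
    // Hoare-style partition around a copy of the middle element.
    while (lo <= hi) {
      while (comp(data[lo], pivot) < 0) lo++;
      while (comp(data[hi], pivot) > 0) hi--;
      if (lo <= hi) {
        std::swap(data[lo], data[hi]);
        lo++;
        hi--;
      }
    }
    // Recurse into the left part, then loop on the right part.
    hybrid_sort_sketch(data, hi + 1, comp);
    data += lo;
    size -= lo;
  }
  insertion_sort_sketch(data, size, comp);
}
```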

@jdksjolen
Contributor

@jdksjolen Thanks for your suggestion. Actually, I can make it so that a T* will satisfy RandomIt but there is no need for RandomIt right now. It is unfortunate because an iterator will give us more safety net, though.

I have reverted the GrowableArrayIterator changes. Insertion sort is good for small arrays so we can use it for QuickSort::sort, too. For a stable sort algorithm, implementing a merge - insertion sort should be the way, there is no need to rush for a stable sort method.

Cheers! This looks good to me. Let's see what the rest of the community thinks.

Contributor

@JohnTortugo JohnTortugo left a comment

Just some drive-by comments.

class InsertionSort : AllStatic {
public:
template <class T, class Compare>
static void sort(T* data, int size, Compare comp) {
Contributor

Change int size to size_t size? Or at least unsigned int.

Member Author

Hotspot containers usually use signed int for sizes, so I think int is a sensible choice here.
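
As a side note on the trade-off: with an unsigned index, a backward scan cannot test `j >= 0` (it is always true), so the loop has to be phrased in terms of the slot being filled. A sketch of that form, using `std::size_t` purely to illustrate the alternative raised above; `insertion_sort_unsigned` is a hypothetical name.

```cpp
#include <cstddef>

// Sketch only: insertion sort over int with an unsigned length.
// The index j names the slot being filled, so it never needs to go below zero.
inline void insertion_sort_unsigned(int* data, std::size_t size) {
  for (std::size_t i = 1; i < size; i++) {
    int key = data[i];
    std::size_t j = i;
    while (j > 0 && data[j - 1] > key) {
      data[j] = data[j - 1];
      j--;
    }
    data[j] = key;
  }
}
```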

// which the element is not greater than the current element (note that we are traversing
// backward)
T* prev = pos - 1;
if (comp(*prev, current_elem) <= 0) {
Contributor

NIT: would it be better to pass pointers here?

Member Author

A comp usually receives references. Practically, it is almost the same as receiving pointers.
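
To illustrate why the two conventions are nearly interchangeable, a pointer-taking comparator in the style of `int f(E*, E*)` can be adapted to a reference-taking one with a small wrapper. `PointerCompareAdapter` and `compare_ints` are hypothetical names, not part of Hotspot.

```cpp
// Hypothetical adapter: wraps a C-style comparator taking pointers so that it can be
// called with references, as a templated sort would call it.
template <class E>
struct PointerCompareAdapter {
  int (*_f)(E*, E*);
  int operator()(E& a, E& b) const { return _f(&a, &b); }
};

// Example pointer-style comparator, assumed for illustration.
inline int compare_ints(int* a, int* b) {
  return (*a > *b) - (*a < *b);
}
```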

Contributor

Apologies, I didn't notice it was a reference.

*
*/

#include "runtime/os.hpp"
Contributor

NIT: sort the imports?

Member Author

I have cleaned up the unused import here. What do you mean by sorting the imports?

Contributor

Sort the "#include" lines alphabetically.

Member Author

I assume you want to have unittest.hpp above the utilities files. I have done that. I was confused because the convention in this area is pretty blurry, as many files have unittest.hpp as their last include.

@eastig
Member

eastig commented Jun 19, 2025

@merykitty,
Out of curiosity, why not use the classical implementation provided above? It's so simple and compact. It's also self-documenting, and it has the minimum number of branches.

@merykitty
Member Author

@eastig It is almost the same, isn't it?

@theRealAph
Contributor

Cheers! This looks good to me. Let's see what the rest of the community thinks.

I'm happy to approve it with a few minor changes.

@merykitty
Member Author

Thanks for the reviews, @theRealAph. I have addressed them.

Member

@eastig eastig left a comment

LGTM

Labels: hotspot, rfr (Pull request is ready for review)

5 participants