Skip to content

[API Proposal]: Stable Sort for Spans #60982

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
grbell-ms opened this issue Oct 28, 2021 · 10 comments
Open

[API Proposal]: Stable Sort for Spans #60982

grbell-ms opened this issue Oct 28, 2021 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@grbell-ms
Copy link
Member

grbell-ms commented Oct 28, 2021

Background and motivation

@hickford explains the motivation in #15803:

As documented, List<T>.Sort does an unstable sort:

This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

It would be useful to offer a stable sort method for when the stronger behaviour is required. The same applies to Array.Sort

  1. C++ offers both sort and stable_sort
  2. Golang offers both Sort and Stable https://golang.org/pkg/sort/#Stable
  3. Python's sort is stable https://docs.python.org/3/library/stdtypes.html#list.sort
  4. Rust's sort is stable https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort
  5. Javascript's sort is stable since EcmaScript 2019
  6. Java's Arrays.sort is stable

There are workarounds, but they all come at the cost of higher memory and CPU consumption:

  • Enumerable.OrderBy() is stable, but is not in-place.
  • A wrapper type can keep track of the insertion order of items, but that is awkward and increases memory use.
  • The two-collection sort methods can be used to emulate stable sorting, but it is awkward, adds a layer of indirection, and requires allocating a new key collection every time.

API Proposal

namespace System
{
    // This is a copy/paste of the span Sort methods, with "stable" prepended to the names.
    public static partial class MemoryExtensions
    {
        public static void StableSort<T>(this Span<T> span);
        public static void StableSort<T, TComparer>(this Span<T> span, TComparer comparer) where TComparer : IComparer<T>?;
        public static void StableSort<T>(this Span<T> span, Comparison<T> comparison);
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items);
        public static void StableSort<TKey, TValue, TComparer>(this Span<TKey> keys, Span<TValue> items, TComparer comparer) where TComparer : IComparer<TKey>?;
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items, Comparison<TKey> comparison);
    }
}

API Usage

This will be used the same way as the existing sort methods are used on spans.

Span<string> strings = new string[] {"B", "A", "b" , "a"};
strings.StableSort(StringComparer.OrdinalIgnoreCase);
// strings is now guaranteed to be in the order {"A", "a", "B", "b"}

EDIT: @Joe4evr points out that implicit conversions are not considered for extension method resolution. So stably sorting an array requires an AsSpan() call:

var strings = new string[] {"B", "A", "b" , "a"};
strings.AsSpan().StableSort(StringComparer.OrdinalIgnoreCase);

Sorting a list will also require calling AsSpan():

var strings = new List<string>(new[] {"B", "A", "b" , "a"});
strings.AsSpan().StableSort(StringComparer.OrdinalIgnoreCase);

Alternative Designs

  • We could make the existing Sort methods stable. This would be a slight performance regression for many uses, but would avoid the need to add new APIs.
  • We could add a StableSort method group to Array and/or List so users don't have to call AsSpan(), at the cost of code/metadata bloat on a core type. An extension method group could improve discoverability without adding weight to trimmed builds, but it's not necessary to unlock this functionality.
  • Someone creates a NuGet package that does this so well that it's not needed in the standard library. 😋

Risks

  • Sorting code is complex and finely tuned for performance. This addition would ~double the quantity of that code that must be maintained.
  • It's not always obvious when a stable sort is needed or not. Developers might always choose StableSort, because "stable" is "better", and lose performance for no good reason.
@grbell-ms grbell-ms added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Oct 28, 2021
@ghost ghost added area-System.Memory untriaged New issue has not been triaged by the area owner labels Oct 28, 2021
@ghost
Copy link

ghost commented Oct 28, 2021

Tagging subscribers to this area: @GrabYourPitchforks, @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

@hickford explains the motivation in #15803:

As documented, List<T>.Sort does an unstable sort:

This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

It would be useful to offer a stable sort method for when the stronger behaviour is required. The same applies to Array.Sort

  1. C++ offers both sort and stable_sort
  2. Golang offers both Sort and Stable https://golang.org/pkg/sort/#Stable
  3. Python's sort is stable https://docs.python.org/3/library/stdtypes.html#list.sort
  4. Rust's sort is stable https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort
  5. Javascript's sort is stable since EcmaScript 2019
  6. Java's Arrays.sort is stable

There are workarounds, but they all come at the cost of higher memory and CPU consumption:

  • Enumerable.OrderBy() is stable, but is not in-place.
  • A wrapper type can keep track of the insertion order of items, but that is awkward and increases memory use.
  • The two-collection sort methods can be used to emulate stable sorting, but it is awkward, adds a layer of indirection, and requires allocating a new key collection every time.

API Proposal

namespace System
{
    // This is a copy/paste of the span Sort methods, with "stable" prepended to the names.
    public static partial class MemoryExtensions
    {
        public static void StableSort<T>(this Span<T> span);
        public static void StableSort<T, TComparer>(this Span<T> span, TComparer comparer) where TComparer : IComparer<T>?;
        public static void StableSort<T>(this Span<T> span, Comparison<T> comparison);
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items);
        public static void StableSort<TKey, TValue, TComparer>(this Span<TKey> keys, Span<TValue> items, TComparer comparer) where TComparer : IComparer<TKey>?;
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items, Comparison<TKey> comparison);
    }
}

API Usage

This will be used the same way as the existing sort methods are used on arrays and spans.

var strings = new string {"B", "A", "b" , "a"};
strings.StableSort(StringComparer.OrdinalIgnoreCase);
// strings is now guaranteed to be in the order {"A", "a", "B", "b"}

But how to stably sort a list? Fortunately, even without changing List<T> it is still possible, if clunky.

var strings = new List<string>(new[] {"B", "A", "b" , "a"});
strings.AsSpan().StableSort(StringComparer.OrdinalIgnoreCase);

Alternative Designs

  • We could make the existing Sort methods stable. This would be a slight performance regression for many uses, but would avoid the need to add new APIs.
  • We could add a StableSort method group to Array, but this seems redundant with the Span methods.
  • We could add a StableSort method group to List so users don't have to know about CollectionsMarshal.AsSpan(), at the cost of code/metadata bloat on a core type. An extension method group could improve discoverability without adding weight to trimmed builds, but it's not necessary to unlock this functionality.
  • Someone creates a NuGet package that does this so well that it's not needed in the standard library. 😋

Risks

  • Sorting code is complex and finely tuned for performance. This addition would ~double the quantity of that code that must be maintained.
  • It's not always obvious when a stable sort is needed or not. Developers might always choose StableSort, because "stable" is "better", and lose performance for no good reason.
Author: grbell-ms
Assignees: -
Labels:

api-suggestion, area-System.Memory, untriaged

Milestone: -

@GrabYourPitchforks
Copy link
Member

Repathing to runtime because the algorithm folks tend to monitor that label.

@ghost
Copy link

ghost commented Oct 28, 2021

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

@hickford explains the motivation in #15803:

As documented, List<T>.Sort does an unstable sort:

This implementation performs an unstable sort; that is, if two elements are equal, their order might not be preserved. In contrast, a stable sort preserves the order of elements that are equal.

It would be useful to offer a stable sort method for when the stronger behaviour is required. The same applies to Array.Sort

  1. C++ offers both sort and stable_sort
  2. Golang offers both Sort and Stable https://golang.org/pkg/sort/#Stable
  3. Python's sort is stable https://docs.python.org/3/library/stdtypes.html#list.sort
  4. Rust's sort is stable https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort
  5. Javascript's sort is stable since EcmaScript 2019
  6. Java's Arrays.sort is stable

There are workarounds, but they all come at the cost of higher memory and CPU consumption:

  • Enumerable.OrderBy() is stable, but is not in-place.
  • A wrapper type can keep track of the insertion order of items, but that is awkward and increases memory use.
  • The two-collection sort methods can be used to emulate stable sorting, but it is awkward, adds a layer of indirection, and requires allocating a new key collection every time.

API Proposal

namespace System
{
    // This is a copy/paste of the span Sort methods, with "stable" prepended to the names.
    public static partial class MemoryExtensions
    {
        public static void StableSort<T>(this Span<T> span);
        public static void StableSort<T, TComparer>(this Span<T> span, TComparer comparer) where TComparer : IComparer<T>?;
        public static void StableSort<T>(this Span<T> span, Comparison<T> comparison);
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items);
        public static void StableSort<TKey, TValue, TComparer>(this Span<TKey> keys, Span<TValue> items, TComparer comparer) where TComparer : IComparer<TKey>?;
        public static void StableSort<TKey, TValue>(this Span<TKey> keys, Span<TValue> items, Comparison<TKey> comparison);
    }
}

API Usage

This will be used the same way as the existing sort methods are used on arrays and spans.

var strings = new string {"B", "A", "b" , "a"};
strings.StableSort(StringComparer.OrdinalIgnoreCase);
// strings is now guaranteed to be in the order {"A", "a", "B", "b"}

But how to stably sort a list? Fortunately, even without changing List<T> it is still possible, if clunky.

var strings = new List<string>(new[] {"B", "A", "b" , "a"});
strings.AsSpan().StableSort(StringComparer.OrdinalIgnoreCase);

Alternative Designs

  • We could make the existing Sort methods stable. This would be a slight performance regression for many uses, but would avoid the need to add new APIs.
  • We could add a StableSort method group to Array, but this seems redundant with the Span methods.
  • We could add a StableSort method group to List so users don't have to know about CollectionsMarshal.AsSpan(), at the cost of code/metadata bloat on a core type. An extension method group could improve discoverability without adding weight to trimmed builds, but it's not necessary to unlock this functionality.
  • Someone creates a NuGet package that does this so well that it's not needed in the standard library. 😋

Risks

  • Sorting code is complex and finely tuned for performance. This addition would ~double the quantity of that code that must be maintained.
  • It's not always obvious when a stable sort is needed or not. Developers might always choose StableSort, because "stable" is "better", and lose performance for no good reason.
Author: grbell-ms
Assignees: -
Labels:

api-suggestion, area-System.Runtime, untriaged

Milestone: -

@Joe4evr
Copy link
Contributor

Joe4evr commented Oct 29, 2021

Minor thing: Implicit conversions are not considered for extension method resolution. This isn't insurmountable, but it does reduce discoverability more since this new extension group also won't show up in IntelliSense for T[].

@grbell-ms
Copy link
Member Author

Thanks for pointing that our @Joe4evr, I've updated the proposal to reflect that.

@grbell-ms
Copy link
Member Author

What do I need to do to get this proposal reviewed?

@tannergooding
Copy link
Member

This needs some additional consideration/weigh in from folks like @stephentoub.

This seems like a fairly scenario specific thing that could be achieved via a NuGet package if desired/necessary.

It would likewise be good to explicitly list scenarios where stable sort is necessary. For unmanaged types, it will make no difference (5 is 5 is 5, it is impossible to distinguish them). For some kinds of reference types, it "might" make a difference but that's easily handleable by a custom equality comparer being passed into sort ("string" is "string", but may be different instances; this is normally not important but can be made important with a custom comparer).

@stephentoub
Copy link
Member

My stance is largely the same as @tannergooding's. There are already multiple available nuget packages that proffer stable sorting routines; are they insufficient? If not, why not? A great sorting implementation is a non-trivial amount of code to tune and maintain, especially one that covers all the variations suggested in the proposal. And I'd expect the cited workarounds along with nuget implementations to cover the majority of the needs such that this wouldn't need to be built-in. I'm not wholly against it, but we'd need really good motivation to undertake it (and the cited motivation of "others do it so you should to" generally doesn't meet the bar ;-).

@coderb
Copy link

coderb commented Apr 4, 2022

@stephentoub i have a use case where winforms needs a stable sort. i don't think relying on a nuget package is a good option.

dotnet/winforms#5202
In general, user interactive multi-column or multi-field sorts may be easily implemented using mutliple passes with a stable sort on a varying criteria (... think speadsheet columns or grid views).

Regarding maintenance, one option would be to port an existing well-tested and well-vetted routine from another runtime, eg Java's timsort or Rust's timsort-esque routine.

A java port may make the most sense because the code maps fairly straightforwardly over to c#. And it has been used in production for many years.

(for reference, under 1000 lines of code, authored by josh bloch so you know it's good ;-) )
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/TimSort.java

@jkotas
Copy link
Member

jkotas commented Apr 9, 2022

i have a use case where winforms needs a stable sort.

I believe that winforms can achieve what it needs just fine even without stable sort provided by core libraries: dotnet/winforms#5202 (comment) . We can continue discussion on the winforms issue.

A java port may make the most sense because the code maps fairly straightforwardly over to c#.

The code you have linked has GPL license. GPL is not compatible with this project. We would not be able to accept an implementation that was based on it.

@tannergooding tannergooding added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jul 15, 2022
@tannergooding tannergooding added this to the Future milestone Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

7 participants