Implement a unique function returning only the unique values in a vector. #940 #965

Open: wants to merge 4 commits into base `master`.

176 changes: 176 additions & 0 deletions doc/specs/stdlib_sorting_unique.md
@@ -0,0 +1,176 @@
---
title: unique function
# Review comment (Member): If `unique` is included in `stdlib_sorting`, the specs of `unique` should be added to the `stdlib_sorting` specs.

---

# The `unique` function

[TOC]

## Introduction

This function returns an array containing only the unique values extracted from an input array. This is useful for removing duplicates from datasets and finding the distinct elements in a collection.

## Status

The `unique` function is currently in **experimental** status.

## Version History

Review comment (Member): The same format as the other stdlib specs should be used.


|Version|Change|
|---|---|
|v0.1.0|Initial functionality in experimental status|

Review comment (Member), on lines +19 to +21. Suggested change: replace the version table above with the single line `Experimental`.


## Requirements

This function has been designed to handle arrays of different types, including intrinsic numeric types, character arrays, and `string_type` arrays. The function should be efficient while maintaining an easy-to-use interface.

Review comment (Member): These are not requirements. The content of these sentences belongs in the description of the `unique` API.


## Usage

```fortran
! Get unique values from an integer array
integer :: x(5) = [1, 2, 3, 3, 4]
integer, allocatable :: y(:)
y = unique(x) ! y will be [1, 2, 3, 4]

! Get sorted unique values from a real array
real :: a(8) = [3.1, 2.5, 7.2, 3.1, 2.5, 8.0, 7.2, 9.5]
real, allocatable :: b(:)
b = unique(a, sorted=.true.) ! b will be [2.5, 3.1, 7.2, 8.0, 9.5]

```

Review comment: To me, the syntax `b = unique(a, sorted=.true.)` would imply that `a` is already sorted on entry, so that some internal logic can be skipped. Clearly, `a` is not sorted in this example, so I find the syntax a bit counter-intuitive. Maybe you need to change this internally.

## API

### `unique` - Returns unique values from an array

#### Interface

Review comment (Member). Suggested change: rename the heading `#### Interface` to `#### Syntax`.


```fortran
pure function unique(array, sorted) result(unique_values)
<type>, intent(in) :: array(:)
logical, intent(in), optional :: sorted
<type>, allocatable :: unique_values(:)
end function unique
```

Review comment (Member), on lines +47 to +53: `unique_values = stdlib_sorting(module):unique(procedure) ( array[, sorted])`


where `<type>` can be any of:
* `integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`
* `real(sp)`, `real(dp)`, `real(xdp)`, `real(qp)`
* `complex(sp)`, `complex(dp)`, `complex(xdp)`, `complex(qp)`
* `character(len=*)`
* `type(string_type)`

#### Arguments

`array`: Array whose unique values need to be extracted.

`sorted` (optional): Whether the output vector needs to be sorted or not. Default is `.false.`.

Review comment (Member), on lines +55 to +66. Suggested change: replace the `<type>` list and the argument descriptions above with an Arguments section that reads:

`array`: shall be a rank one array of any of the types: `integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`, `real(sp)`, `real(dp)`, `real(qp)`, `character(*)`, or `type(string_type)`. It is an `intent(inout)` argument.

`sorted` (optional): `logical` scalar stating whether the output vector needs to be sorted or not. Default is `.false.`. It is an `intent(in)` argument.


#### Result

The function returns an allocatable array containing only the unique values from the input array.

If `sorted` is `.true.`, the returned array will be sorted in order of non-decreasing values.

Review comment (Member): I suggest using `sort` instead of `sorted`. The name `sorted` makes me think that it concerns the input array `array`, not the output.


If `sorted` is `.false.` (the default), the order of elements is unspecified but generally reflects the order of first appearance of each unique value in the input array.

Review comment (Member). Suggested change: shorten `(the default)` to `(default)` in the sentence above.


## Examples

### Example 1: Basic usage with integers

Review comment (Member): Examples should be in separate files. Keeping them in separate files allows them to be compiled and tested.


```fortran
program example_unique_integers
use stdlib_sorting, only: unique
implicit none

integer :: data(10) = [1, 2, 3, 3, 4, 5, 5, 6, 6, 6]
! Review comment (Member): could it be a parameter?

integer, allocatable :: unique_values(:)

! Get unique values
unique_values = unique(data)

! Print the results
print *, "Original array: ", data
print *, "Unique values: ", unique_values

end program example_unique_integers
```

Expected output:
```
Original array: 1 2 3 3 4 5 5 6 6 6
Unique values: 1 2 3 4 5 6
```

Review comment (Member), on lines +98 to +102: Include this in the code itself.


### Example 2: Using the sorted option with real values

```fortran
program example_unique_reals
use stdlib_kinds, only: sp
use stdlib_sorting, only: unique
implicit none

real(sp) :: data(8) = [3.1, 2.5, 7.2, 3.1, 2.5, 8.0, 7.2, 9.5]
real(sp), allocatable :: unique_values(:)

! Get unique values in sorted order
unique_values = unique(data, sorted=.true.)

! Print the results
print *, "Original array: ", data
print *, "Sorted unique values: ", unique_values

end program example_unique_reals
```

Expected output:
```
Original array: 3.1 2.5 7.2 3.1 2.5 8.0 7.2 9.5
Sorted unique values: 2.5 3.1 7.2 8.0 9.5
```

### Example 3: Working with character arrays

```fortran
program example_unique_strings
use stdlib_sorting, only: unique
implicit none

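! The typed constructor pads every element to length 6, so the literals need not match in length
character(len=6) :: data(7) = [character(len=6) :: "apple", "banana", "cherry", "apple", "date", "banana", "cherry"]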
character(len=6), allocatable :: unique_values(:)
integer :: i

! Get unique values
unique_values = unique(data)

! Print the results
print *, "Original array:"
do i = 1, size(data)
print *, data(i)
end do

print *, "Unique values:"
do i = 1, size(unique_values)
print *, unique_values(i)
end do

end program example_unique_strings
```

## Implementation Notes

The implementation uses a sorting-based approach to identify unique elements efficiently. When `sorted=.true.`, the algorithm sorts the input array and then identifies adjacent duplicate elements. When `sorted=.false.`, the function still uses sorting internally but ensures that the order of first appearance is preserved.
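
As a rough illustration of the sorting-based approach described above, the following sketch handles default integers only. It is not the PR's implementation: the names `unique_sketch`, `unique_sorted`, and `unique_stable` are hypothetical, and a simple insertion sort stands in for whatever sort routine the library would actually call.

```fortran
module unique_sketch
  implicit none
  private
  public :: unique_sorted, unique_stable

contains

  !> Sort a copy of `array`, then keep one element per run of equal values.
  pure function unique_sorted(array) result(res)
    integer, intent(in) :: array(:)
    integer, allocatable :: res(:)
    integer :: work(size(array))
    integer :: i, j, n, key

    ! Insertion sort on a local copy (assumption: stands in for a library sort).
    work = array
    do i = 2, size(work)
      key = work(i)
      j = i - 1
      do while (j >= 1)
        if (work(j) <= key) exit
        work(j + 1) = work(j)
        j = j - 1
      end do
      work(j + 1) = key
    end do

    ! Compact in place: keep an element only if it differs from the last kept one.
    n = 0
    do i = 1, size(work)
      if (n > 0) then
        if (work(i) == work(n)) cycle
      end if
      n = n + 1
      work(n) = work(i)
    end do
    res = work(1:n)
  end function unique_sorted

  !> Use a stable index sort to find duplicates, but return the values
  !> in order of first appearance.
  pure function unique_stable(array) result(res)
    integer, intent(in) :: array(:)
    integer, allocatable :: res(:)
    integer :: idx(size(array))
    logical :: keep(size(array))
    integer :: i, j, key

    ! Stable insertion sort of the index permutation by value.
    idx = [(i, i = 1, size(array))]
    do i = 2, size(idx)
      key = idx(i)
      j = i - 1
      do while (j >= 1)
        if (array(idx(j)) <= array(key)) exit
        idx(j + 1) = idx(j)
        j = j - 1
      end do
      idx(j + 1) = key
    end do

    ! Because the sort is stable, the first index of each run of equal
    ! values is the first appearance of that value in `array`.
    keep = .false.
    do i = 1, size(idx)
      if (i > 1) then
        if (array(idx(i)) == array(idx(i - 1))) cycle
      end if
      keep(idx(i)) = .true.
    end do
    res = pack(array, keep)
  end function unique_stable

end module unique_sketch

program demo_unique_sketch
  use unique_sketch, only: unique_sorted, unique_stable
  implicit none
  integer, parameter :: x(7) = [3, 1, 3, 2, 1, 5, 2]

  print *, unique_sorted(x)   ! values: 1 2 3 5
  print *, unique_stable(x)   ! values: 3 1 2 5
end program demo_unique_sketch
```

Both routines cost one sort plus a linear pass. The order-preserving variant needs an extra index array and a mask, which is the price of keeping first-appearance order.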

## Future Enhancements

Future versions might include additional features:

1. Return the indices of the first occurrence of each unique element
2. Return indices that can reconstruct the original array from the unique elements
3. Support for multi-dimensional arrays
4. Tolerance parameter for floating-point comparisons
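
To make items 1 and 2 of the list above concrete, here is one possible shape for such an interface, sketched for default integers. Everything in it is an assumption for illustration: `unique_with_indices` and its argument names are hypothetical, not part of stdlib or of this PR, and the linear search is chosen for brevity rather than speed.

```fortran
module unique_indices_sketch
  implicit none
  private
  public :: unique_with_indices

contains

  !> Hypothetical helper: unique values plus first-occurrence and inverse indices.
  pure subroutine unique_with_indices(array, values, first, inverse)
    integer, intent(in) :: array(:)
    integer, allocatable, intent(out) :: values(:)   ! unique values, first-appearance order
    integer, allocatable, intent(out) :: first(:)    ! array(first(k)) == values(k)
    integer, allocatable, intent(out) :: inverse(:)  ! values(inverse(i)) == array(i)
    integer :: pos(size(array))
    integer :: i, k, n

    allocate(inverse(size(array)))
    n = 0
    do i = 1, size(array)
      ! Linear search among the unique values found so far.
      do k = 1, n
        if (array(pos(k)) == array(i)) exit
      end do
      ! If the loop ran to completion, k == n + 1 and the value is new.
      if (k > n) then
        n = n + 1
        pos(n) = i
      end if
      inverse(i) = k
    end do
    first = pos(1:n)
    values = array(first)
  end subroutine unique_with_indices

end module unique_indices_sketch

program demo_unique_indices
  use unique_indices_sketch, only: unique_with_indices
  implicit none
  integer, parameter :: x(6) = [4, 7, 4, 9, 7, 4]
  integer, allocatable :: values(:), first(:), inverse(:)

  call unique_with_indices(x, values, first, inverse)
  print *, values                      ! values: 4 7 9
  print *, first                       ! values: 1 2 4
  print *, inverse                     ! values: 1 2 1 3 2 1
  print *, all(values(inverse) == x)   ! T: the original array is reconstructed
end program demo_unique_indices
```

A subroutine with `intent(out)` arguments is used here because a Fortran function returns a single result. Optional output arguments on a subroutine variant of `unique` would be one way to expose the indices without changing the function form.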

## Related Functions

* `sort` - Sorts an array in ascending or descending order
* `sort_index` - Creates index array that would sort an array
* `ord_sort` - Performs a stable sort on an array

Review comment (Member), on lines +172 to +176. Suggested change: remove the `## Related Functions` section.

1 change: 1 addition & 0 deletions example/sorting/CMakeLists.txt
@@ -3,3 +3,4 @@ ADD_EXAMPLE(sort)
ADD_EXAMPLE(sort_index)
ADD_EXAMPLE(radix_sort)
ADD_EXAMPLE(sort_bitset)
ADD_EXAMPLE(unique)
64 changes: 64 additions & 0 deletions example/sorting/example_unique.f90
@@ -0,0 +1,64 @@
program example_unique
! Review comment (Member): Could you split this program into smaller programs and include them in the specs, please?

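use stdlib_kinds, only: sp
use stdlib_sorting, only: unique
! assignment(=) and write(formatted) are needed to assign character
! literals to string_type values and to print them
use stdlib_string_type, only: string_type, assignment(=), write(formatted)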
implicit none

! Example with integer array
integer :: int_array(10) = [1, 2, 3, 3, 4, 5, 5, 6, 6, 6]
integer, allocatable :: int_unique(:)

! Example with real array
real(sp) :: real_array(8) = [3.1, 2.5, 7.2, 3.1, 2.5, 8.0, 7.2, 9.5]
real(sp), allocatable :: real_unique(:)

! Example with character array
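! The typed constructor pads every element to length 6, so the literals need not match in length
character(len=6) :: char_array(7) = [character(len=6) :: "apple", "banana", "cherry", "apple", "date", "banana", "cherry"]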
character(len=6), allocatable :: char_unique(:)

! Example with string_type array
type(string_type) :: string_array(8)
type(string_type), allocatable :: string_unique(:)

integer :: i

! Setup string array
string_array(1) = "apple"
string_array(2) = "banana"
string_array(3) = "cherry"
string_array(4) = "apple"
string_array(5) = "date"
string_array(6) = "banana"
string_array(7) = "cherry"
string_array(8) = "apple"

! Get unique integer values
int_unique = unique(int_array)
print *, "Unique integers:", int_unique

! Get sorted unique integer values
int_unique = unique(int_array, sorted=.true.)
print *, "Sorted unique integers:", int_unique

! Get unique real values
real_unique = unique(real_array)
print *, "Unique reals:", real_unique

! Get sorted unique real values
real_unique = unique(real_array, sorted=.true.)
print *, "Sorted unique reals:", real_unique

! Get unique character values
char_unique = unique(char_array)
print *, "Unique strings:"
do i = 1, size(char_unique)
print *, char_unique(i)
end do

! Get unique string_type values (sorted)
string_unique = unique(string_array, sorted=.true.)
print *, "Sorted unique string_type values:"
do i = 1, size(string_unique)
print *, string_unique(i)
end do
end program example_unique
1 change: 1 addition & 0 deletions src/CMakeLists.txt
@@ -48,6 +48,7 @@ set(fppFiles
stdlib_sorting_ord_sort.fypp
stdlib_sorting_sort.fypp
stdlib_sorting_sort_index.fypp
stdlib_sorting_unique.fypp
stdlib_sparse_constants.fypp
stdlib_sparse_conversion.fypp
stdlib_sparse_kinds.fypp
54 changes: 51 additions & 3 deletions src/stdlib_sorting.fypp
@@ -5,6 +5,7 @@
#:set STRING_TYPES_ALT_NAME = list(zip(STRING_TYPES, STRING_TYPES, STRING_KINDS))
#:set CHAR_TYPES_ALT_NAME = list(zip(["character(len=*)"], ["character(len=len(array))"], ["char"]))
#:set BITSET_TYPES_ALT_NAME = list(zip(BITSET_TYPES, BITSET_TYPES, BITSET_KINDS))
#:set COMPLEX_TYPES_ALT_NAME = list(zip(CMPLX_TYPES, CMPLX_TYPES, CMPLX_KINDS))

#:set INT_INDEX_TYPES_ALT_NAME = list(zip(["int_index", "int_index_low"], ["integer(int_index)", "integer(int_index_low)"], ["default", "low"]))

@@ -13,6 +14,7 @@
#! This approach allows us to have the same code for all input types.
#:set IRSCB_TYPES_ALT_NAME = INT_TYPES_ALT_NAME + REAL_TYPES_ALT_NAME + STRING_TYPES_ALT_NAME + CHAR_TYPES_ALT_NAME &
& + BITSET_TYPES_ALT_NAME
#:set IRSC_TYPES_ALT_NAME = INT_TYPES_ALT_NAME + REAL_TYPES_ALT_NAME + STRING_TYPES_ALT_NAME + CHAR_TYPES_ALT_NAME + COMPLEX_TYPES_ALT_NAME

!! Licensing:
!!
@@ -70,7 +72,7 @@
!! in the Fortran Standard Library under the MIT license provided
!! we cite:
!!
!! Musser, D.R., Introspective Sorting and Selection Algorithms,
!! Musser, D.R., "Introspective Sorting and Selection Algorithms,"
!! Software—Practice and Experience, Vol. 27(8), 983–993 (August 1997).
!!
!! as the official source of the algorithm.
@@ -135,13 +137,13 @@ module stdlib_sorting

use stdlib_bitsets, only: bitset_64, bitset_large, &
assignment(=), operator(>), operator(>=), operator(<), operator(<=)

implicit none
private

integer, parameter, public :: int_index = int64 !! Integer kind for indexing
integer, parameter, public :: int_index_low = int32 !! Integer kind for indexing using less than `huge(1_int32)` values


! Constants for use by tim_sort
integer, parameter :: &
@@ -160,6 +162,52 @@ module stdlib_sorting
integer(int_index) :: len = 0
end type run_type

interface unique
!! Version: experimental
!!
!! The generic function implementing the `unique` algorithm to return
!! a new array containing only the unique values from the input array.
!! Its use has the syntax:
!!
!! result = unique(array[, sorted])
!!
!! with the arguments:
!!
!! * array: the rank 1 array from which to extract unique values. It is an `intent(in)`
!! argument of any of the types `integer(int8)`, `integer(int16)`,
!! `integer(int32)`, `integer(int64)`, `real(real32)`, `real(real64)`,
!! `real(real128)`, `complex(real32)`, `complex(real64)`, `complex(real128)`,
!! `character(*)`, or `type(string_type)`.
!!
!! * sorted (optional): shall be a scalar of type default logical. It
!! is an `intent(in)` argument that indicates whether the input array
!! is already sorted. If present with value `.true.`, the function will
!! skip sorting the input, which can save computational time. Default is `.false.`.
!! The output will always have duplicate elements removed and will be in
!! the same order as the input (if sorted) or in sorted order (if not already sorted).
!!
!!#### Example
!!
!!```fortran
!! ...
!! ! Extract unique values from an array
!! integer :: x(5) = [1, 2, 3, 3, 4]
!! integer, allocatable :: y(:)
!!
!! y = unique(x) ! y will be [1, 2, 3, 4]
!!
!! ! Use with optional sorted argument when input is already sorted
!! integer :: z(8) = [1, 2, 2, 3, 5, 5, 7, 8]
!! integer, allocatable :: u(:)
!!
!! u = unique(z, sorted=.true.) ! Skip sorting, u will be [1, 2, 3, 5, 7, 8]
!! ...
!!```
#:for t1, t2, name1 in IRSC_TYPES_ALT_NAME
module procedure ${name1}$_unique
#:endfor
end interface unique

public ord_sort
!! Version: experimental
!!