This repository was archived by the owner on May 5, 2019. It is now read-only.
rewrite groupby #3
Closed
14 commits:
- 436d568 (nalimilan) Rename DataFrame to DataTable
- 9856e0c (cjprybol) rewrote groupby
- 15e5da6 (cjprybol) removed function used by old groupby
- 58fe729 (cjprybol) Revert "removed function used by old groupby"
- cbc38db (cjprybol) added type clarifications to methods
- f15717a (cjprybol) groupby uses SortedDict to retain order and checks for Categoricals
- 933869e (cjprybol) comment out tests for empty datatables and order-by-levels
- 20568cb (cjprybol) turn off sorting by default and corral all sorting under a single if
- 6e5fe01 (cjprybol) added ngroup variable
- b0b742d (cjprybol) removed sorting functionality
- 4e64f43 (cjprybol) remove commented tests, move groupsort_indexer to join.jl, remove
- 2d622cc (cjprybol) Merge branch 'master' into cjp/groupby
- 1e807de (cjprybol) Merge branch 'master' into cjp/groupby
- 2df2bdb (cjprybol) Merge branch 'master' into cjp/groupby
```diff
@@ -7,3 +7,4 @@ SortingAlgorithms
 Reexport
 Compat 0.19.0
 FileIO 0.1.2
+DataStructures
```
```diff
@@ -15,6 +15,40 @@ similar_nullable{T,R}(dv::CategoricalArray{T,R}, dims::@compat(Union{Int, Tuple{
 similar_nullable(dt::AbstractDataTable, dims::Int) =
     DataTable(Any[similar_nullable(x, dims) for x in columns(dt)], copy(index(dt)))
 
+function groupsort_indexer(x::AbstractVector, ngroups::Integer, null_last::Bool=false)
+    # translated from Wes McKinney's groupsort_indexer in pandas (file: src/groupby.pyx).
+
+    # count group sizes, location 0 for NULL
+    n = length(x)
+    # counts = x.pool
+    counts = fill(0, ngroups + 1)
+    for i = 1:n
+        counts[x[i] + 1] += 1
+    end
+
+    # mark the start of each contiguous group of like-indexed data
+    where = fill(1, ngroups + 1)
+
+    if null_last
+        for i = 3:ngroups+1
+            where[i] = where[i - 1] + counts[i - 1]
+        end
+        where[1] = where[end] + counts[end]
+    else
+        for i = 2:ngroups+1
+            where[i] = where[i - 1] + counts[i - 1]
+        end
+    end
+
+    # this is our indexer
+    result = fill(0, n)
+    for i = 1:n
+        label = x[i] + 1
+        result[where[label]] = i
+        where[label] += 1
+    end
+    result, where, counts
+end
+
 function join_idx(left, right, max_groups)
     ## adapted from Wes McKinney's full_outer_join in pandas (file: src/join.pyx).
 
```

Review comment on the line `where = fill(1, ngroups + 1)`: I don't know how much it really matters, but it might be best to avoid using
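`groupsort_indexer` is a counting sort over integer group codes: one pass counts each group's size, a prefix sum turns the counts into each group's starting slot, and a second pass drops every row index into its group's next free slot. The result is a permutation that lists rows group by group, with nulls (code 0) either first or last. Below is a minimal sketch of the same algorithm in current Julia (1.x) syntax, assuming the input holds integer codes in `0:ngroups`. The name `groupsort_indexer_sketch` and the renamed start vector `starts` are hypothetical (`where` is also part of Julia's parametric-method syntax). One deliberate change: in the `null_last` branch, the nulls' starting slot is computed directly as `n - counts[1] + 1` (all non-null rows come first), which equals the original's `where[end] + counts[end]` whenever `ngroups >= 1`.

```julia
# Sketch of a counting-sort group indexer (hypothetical name).
# x holds integer codes in 0:ngroups; code 0 marks a null row.
function groupsort_indexer_sketch(x::AbstractVector{<:Integer}, ngroups::Integer,
                                  null_last::Bool=false)
    n = length(x)

    # counts[k + 1] is the size of group k; counts[1] counts nulls
    counts = zeros(Int, ngroups + 1)
    for i in 1:n
        counts[x[i] + 1] += 1
    end

    # starts[k + 1] is the next free slot in `result` for group k
    starts = ones(Int, ngroups + 1)
    if null_last
        # non-null groups fill slots first, so group 1 keeps starts[2] == 1
        for i in 3:ngroups+1
            starts[i] = starts[i - 1] + counts[i - 1]
        end
        # nulls go after every non-null row
        starts[1] = n - counts[1] + 1
    else
        # nulls first, then groups 1..ngroups in order
        for i in 2:ngroups+1
            starts[i] = starts[i - 1] + counts[i - 1]
        end
    end

    # place each row index in its group's next free slot
    result = zeros(Int, n)
    for i in 1:n
        label = x[i] + 1
        result[starts[label]] = i
        starts[label] += 1
    end

    # as in the original, `starts` is returned after being advanced,
    # so each entry now points one past the end of its group
    return result, starts, counts
end
```

For example, `groupsort_indexer_sketch([2, 0, 1, 2, 1], 2)` returns `result == [2, 3, 5, 1, 4]`: the null row 2 first, then rows 3 and 5 (group 1), then rows 1 and 4 (group 2). With `null_last = true` the same input gives `[3, 5, 1, 4, 2]`. Because each row is touched a constant number of times, the whole pass is O(n + ngroups), with no comparison sort.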
Review comment: You should be able to `@inbounds` this loop (and the others in this function).
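One caveat with the `@inbounds` suggestion: every index in these loops comes from the data (`x[i] + 1`), so an out-of-range code would silently corrupt memory instead of raising a `BoundsError`. A sketch of one way to keep the speedup safe, assuming integer codes in `0:ngroups` as above, is to validate the codes once up front and then drop per-access checks. `count_groups_inbounds` is a hypothetical helper covering only the counting loop; the same pattern would apply to the other loops.

```julia
# Hypothetical helper: the counting pass with bounds checks elided.
function count_groups_inbounds(x::AbstractVector{<:Integer}, ngroups::Integer)
    # A single validation pass replaces the per-iteration bounds checks;
    # without it, a bad code under @inbounds is undefined behavior.
    all(c -> 0 <= c <= ngroups, x) ||
        throw(ArgumentError("group codes must lie in 0:$ngroups"))

    counts = zeros(Int, ngroups + 1)
    # eachindex(x) is always valid for x, and every x[i] + 1 was checked above
    @inbounds for i in eachindex(x)
        counts[x[i] + 1] += 1
    end
    return counts
end
```

The validation costs one extra linear scan, so whether it pays off depends on how often the function runs on codes that are already known to be valid, for example codes taken straight from a `CategoricalArray`'s reference pool.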