[WIP] Indexing along axes #2
Yes, I think some sort of smart indexing behaviors are essential to making these arrays powerful. That said, I don't really want to pun too much on the indexing operation. There are a few reasons for this:
I was thinking about having only two different smart indexing behaviors, with two different traits:
indexing(::Union(Number, AbstractDate)) = Dimensional()
indexing(::Union(Symbol, AbstractString)) = Categorical()
Dimensional axes must be sorted, unique, and their only special indexing behavior is with an explicit
Are there other behaviors you'd want here? Or other kinds of types that you'd like to have as axes?
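The two-trait idea above can be sketched in current Julia syntax roughly as follows. This is only an illustration: the thread's code is 2015-era Julia, `TimeType` from `Dates` stands in for the `AbstractDate` mentioned above, the trait is computed from the element type rather than from a value, and `isvalidaxis` is a hypothetical helper that checks the "sorted, unique" invariant.

```julia
using Dates  # TimeType stands in for the thread's AbstractDate (assumption)

# Trait types named after the comment above.
struct Dimensional end
struct Categorical end

# Map an axis element type to its trait.
indexing(::Type{<:Union{Number, TimeType}}) = Dimensional()
indexing(::Type{<:Union{Symbol, AbstractString}}) = Categorical()
indexing(ax::AbstractVector) = indexing(eltype(ax))

# Hypothetical invariant check, dispatched on the trait.
isvalidaxis(ax) = isvalidaxis(indexing(ax), ax)
isvalidaxis(::Dimensional, ax) = issorted(ax) && allunique(ax)
isvalidaxis(::Categorical, ax) = true  # no invariant is stated in the thread

isvalidaxis([0.1, 0.2, 0.3])   # true
isvalidaxis([0.3, 0.1, 0.2])   # false: not sorted
isvalidaxis([:b, :a])          # true
```

Because behavior hangs off the trait rather than the concrete element type, another package could opt a new axis type in by adding one `indexing` method.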
I like the two traits you've defined. That seems like it'd cover the normal cases I can think of. I think it'd still be nice to allow other packages to further customize indexing behavior. An example might be a type that wants to use indexing for interpolation along an axis.
Regarding indexing by floating point integers, I'm not sure it contradicts what I implemented. It seems like the sentiment in JuliaLang/julia#10154 is to allow for floating point indexing of the sort we might see use of here.
I agree that mbauman/Signals.jl#10 is an important question, and Julia allows a lot of ways to define syntax for that. The notation
As far as other kinds of types as axes, I think we should try to accommodate anything that's vector-like, including iterators. For example, I could see having an axis that's a DataFrame. That'd be a way to add metadata to rows of an AxisArray object. A DataFrame wouldn't normally fit because it's not a vector-like object, but
Yes, the more I think about it, the more I like your external dispatch-driven approach. I had initially included the axis element types as parameters of AxisArray as I was imagining dispatching on them, kind of like this:
getindex{EltA,EltB,EltC}(AxisArray{…, (EltA, EltB, EltC)}, ::Interval{EltA}, ::EltB, ::UnitRange{EltC})
But that gets complicated really fast. And you'd probably want to use Unions, which are buggy in dispatch with static parameters.
I think the simplest thing to do is have a fallback
getindex(A::AxisArray, I...) = getindex(A, map(axisindexes, A.axes, I)...)
I like the simplicity. It'll have to be a little fancier to deal with different lengths (and we can do this without map or splat with stagedfunctions). The
I think that I only want to allow ranges when the default StepRange
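For illustration, the fallback described above might look like this in current Julia. `ToyAxisArray` is a hypothetical stand-in for the real type, and the by-value lookup here is a bare exact match; neither is taken from the branch.

```julia
# Hypothetical minimal container: raw data plus one vector of axis values per dimension.
struct ToyAxisArray{D<:AbstractArray, Ax<:Tuple}
    data::D
    axes::Ax
end

# Ordinary indices (integers, integer ranges, colon) pass through untouched.
axisindexes(ax, idx::Union{Integer, AbstractUnitRange{<:Integer}, Colon}) = idx

# Anything else is treated as an axis value and looked up by exact match.
function axisindexes(ax, idx)
    i = findfirst(isequal(idx), ax)
    i === nothing && error("index $idx not found")
    return i
end

# The fallback: lower every index against its own axis, then index the raw data.
Base.getindex(A::ToyAxisArray, I...) = A.data[map(axisindexes, A.axes, I)...]

A = ToyAxisArray([1 2 3; 4 5 6], ([:a, :b], [10.0, 20.0, 30.0]))
A[:b, 2:3]    # [5, 6]
A[:a, 30.0]   # 3
```

This sketch assumes exactly one index per dimension; handling a different number of indices is the "little fancier" part mentioned above.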
How about the following for use as the default way to index intervals on ordered axes?
Interval(0.3, 2.5)
Interval(from = 0.3) # open-ended `to`
Interval(to = 2.5) # open-ended `from`
It's not the most concise approach, but it's pretty straightforward.
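One way the three constructor forms above could be supported in current Julia is sketched below. Representing an open end as an infinite bound is an assumption that only suits real-valued axes, and `covers` is a hypothetical helper added just to show the effect.

```julia
struct Interval{T}
    lo::T
    hi::T
end

# Keyword form: a missing bound becomes -Inf or Inf (assumption, see above).
Interval(; from = -Inf, to = Inf) = Interval(promote(from, to)...)

# Hypothetical helper: does the closed interval contain x?
covers(I::Interval, x) = I.lo <= x <= I.hi

covers(Interval(0.3, 2.5), 1.0)      # true
covers(Interval(from = 0.3), 1e6)    # true: no upper bound
covers(Interval(to = 2.5), 3.0)      # false
```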
9f03bca to 035f408
I've rebased your work on top of my recent getindex fixes, and then took a stab at implementing what we've been talking about here. Take a look, I think it should be pretty functional. The Interval type is about as minimal as it gets, but it does the trick (pretty wild that you don't need
Awesome. That code is quite concise and easy enough to follow!
    i = findfirst(ax, idx)
    i == 0 && error("index $idx not found")
    i
end
It'd be nice to have another method that indexes on an array of elements. Here's a start at that (untested):
function axisindexes{T}(::Type{Categorical}, ax::AbstractVector{T}, idx::AbstractVector{T})
res = findin(ax, idx)
length(res) == 0 && error("index $idx not found")
res
end
Edit: fix typo. I actually tried it. It works, but note that with findin, columns are selected, but they are given in the original order, not the order specified in idx.
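To make the ordering caveat concrete, here is a small sketch in current Julia, where `findin(ax, idx)` is spelled `findall(in(idx), ax)`. The helper `lookup_in_request_order` is hypothetical and shows one possible alternative, not what the branch does.

```julia
ax  = [:a, :b, :c, :d]
idx = [:c, :a]

# Equivalent of findin(ax, idx): positions come back in axis order.
in_axis_order = findall(in(idx), ax)        # [1, 3]

# One lookup per requested label keeps the caller's order instead.
function lookup_in_request_order(ax, idx)
    map(idx) do label
        i = findfirst(isequal(label), ax)
        i === nothing && error("index $label not found")
        i
    end
end

in_request_order = lookup_in_request_order(ax, idx)   # [3, 1]
```

The per-label version also raises an error for any single missing label, whereas the `findin`-style check above only fails when nothing matches at all.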
Seems reasonable. You can go ahead and add it to this branch. I'll work tonight on fixing the getindex ambiguities. It's a bit of a mess.
Okay. (Probably not til tonight.)
On Tue, Feb 17, 2015 at 9:12 AM, Matt Bauman wrote:
In src/core.jl #2 (comment):
+Base.convert{T}(::Type{Interval{T}}, x) = Interval(x,x)
+Base.isless(a::Interval, b::Interval) = isless(a.hi, b.lo)
+Base.isless(a::Interval, b) = isless(promote(a,b)...)
+Base.isless(a, b::Interval) = isless(promote(a,b)...)
+
+# Default axes indexing throws an error
+axisindexes(ax, idx) = axisindexes(axistype(ax), ax, idx)
+axisindexes(::Type{Unsupported}, ax, idx) = error("elementwise indexing is not supported for axes of type $(typeof(ax))")
+# Dimensional axes may be indexed by intervals of their elements
+axisindexes{T}(::Type{Dimensional}, ax::AbstractVector{T}, idx::Interval{T}) = searchsorted(ax, idx)
+# Categorical axes may be indexed by their elements
+function axisindexes{T}(::Type{Categorical}, ax::AbstractVector{T}, idx::T)
+    i = findfirst(ax, idx)
+    i == 0 && error("index $idx not found")
+    i
+end
Seems reasonable. You can go ahead and add it to this branch. I'll work tonight on fixing the getindex ambiguities. It's a bit of a mess.
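The trick in the diff above is that once an interval can be compared with axis elements through `isless`, `searchsorted` treats every element inside the interval as "equal" to it and returns their index range. Here is a minimal sketch in current Julia; it defines the mixed comparisons directly instead of going through `convert` and `promote` as the diff does, so it is an approximation of the idea, not the branch's code.

```julia
struct Interval{T}
    lo::T
    hi::T
end

# An interval sorts before x when it ends before x, and after x when it starts after x.
Base.isless(a::Interval, b::Interval) = isless(a.hi, b.lo)
Base.isless(a::Interval, b::Real) = isless(a.hi, b)
Base.isless(a::Real, b::Interval) = isless(a, b.lo)

ax = [1.0, 2.0, 3.0, 4.0, 5.0]

# Elements that are neither less than nor greater than the interval lie inside it.
r = searchsorted(ax, Interval(1.5, 4.0))   # 2:4
ax[r]                                      # [2.0, 3.0, 4.0]
```

An interval that falls between two axis values yields an empty range rather than an error, which is the usual `searchsorted` behavior for a value that is not present.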
Symbols, FloatRanges, and an Ordered type for ordering vectors.
This is missing tests, but it's a start and works in my quick interactive tests. This creates three AxisTypes: Categorical, Dimensional, and Unsupported, as determined by the axistype function. I created the checkaxis function (but don't use it yet) to enforce type-specific invariants. The axisindexes function is used to 'lower' the fancy indexing behavior to a supported basic indexing type (Int, Range, etc). A fallback `getindex(A, ::Any...)` calls the axisindexes functions for each fancy indexing dimension. It just works with the other behaviors (like `A[Axis{:col}(Interval(.1,.5))]`).
Eliminate splatting for N<=4 for the fallback getindex function. It's a little verbose, but the meta-meta-programming alternative would probably be too confusing.
Since we're not depending upon dispatch for the fancier axis indexing behaviors, this type parameter is not needed.
0b5f8ef to fbbf674
522a59b to 77013eb
This is looking great.
Thanks for all the feedback and help here!
Agreed: looking great! Handling ambiguity warnings looked painful.
This is less of a "work-in-progress" and more of a "something to start discussion" on indexing along axes. Here are feature ideas and other discussion items:
A[[from, to],:], A[(from,to),:], A[1s:9s,:], or something else?
What's implemented is an axesindexes method that tries to generalize indexing along an axis. It should return a UnitRange or other simple indexing type. It generates a bunch of warnings, and there are unchecked indexing cases. Here's what works now: