Skip to content

Refactor traits based on methods #35095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bramtayl opened this issue Mar 12, 2020 · 96 comments
Closed

Refactor traits based on methods #35095

bramtayl opened this issue Mar 12, 2020 · 96 comments

Comments

@bramtayl
Copy link
Contributor

All traits which semantically define whether or not a method is expected, e.g. HasShape, HasLength, HasEltype, could be refactored into a single function:

expect_method(::typeof(size), ...)
expect_method(::typeof(length), ...)
expect_method(::typeof(eltype), ...)

which need only take two typed values: True and False. This could unify many existing trait interfaces, and allow them to be more easily combined. In addition, I propose

expect_method(::typeof(getindex), ::Type{Container}, ::Type{Vararg{Int, Dimensions}})
expect_method(::typeof(setindex!), ::Type{Container}, ::Type{Element}, ::Type{Vararg{Int, Dimensions}})

etc. for any method that is required by an Abstract interface.

@yuyichao
Copy link
Contributor

which need only take two typed values: True and False

It is still better to use value, not types. Working around inference limitation is strictly a problem for the caller of the function.

This could unify many existing trait interfaces, and allow them to be more easily combined.

How does this allow things to be more easily combined? It seems that this is similar to say there should be only one function and all other functions should be implemented as method for this single function,

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 12, 2020

It is still better to use value, not types

Happy to change the proposal to true and false if that makes things easier.

similar to say there should be only one function

I think its more like saying there should be only one method table...but fair point. I can imagine a macro which would make writing these definitions easier. Something like:

@add_trait size(something::MyType) = ...

@bramtayl
Copy link
Contributor Author

Also, worth noting that I'm not proposing about adding it for every function, just every function that is required as part of an abstract interface.

@yuyichao
Copy link
Contributor

Also, worth noting that I'm not proposing about adding it for every function, just every function that is required as part of an abstract interface.

In that case, I assume you mean automatic, since an opt in behavior won't be any different from what's currently available, just with a diifferent syntax.

However, if it's automatic, it'll basically be asking for a inferable version of hasmethod. And in some sense this isn't something that can be done automatic anyway or you are limited to what the type system can express and won't be able to do anything more flexiable than that.

@bramtayl
Copy link
Contributor Author

an opt in behavior won't be any different from what's currently available, just with a diifferent syntax

yes, I am envisioning opt in behavior, and yes its just a syntax change, but syntax matters

@yuyichao
Copy link
Contributor

yes its just a syntax change, but syntax matters

Sure, I agree syntax does matter. However, then the question is the claim you made initially.

and allow them to be more easily combined

which is what I asked in the first reply above.

@bramtayl
Copy link
Contributor Author

How does this allow things to be more easily combined?

Because you can use logical operators on Bools (and typed bools provided the proper infrastructure). So you could write for example

arraylike(T) = expect_method(size, T) & expect_method(eltype, T) & expect_method(getindex, T, Vararg{Int}) & expect_method(setindex!, T, eltype(T), Vararg{Int})

@yuyichao
Copy link
Contributor

Because you can use logical operators on Bools (and typed bools provided the proper infrastructure).

You can already do that now.

julia> Base.IteratorEltype([]) isa Base.HasEltype && Base.IteratorSize([]) isa Base.HasShape
true

Again, if this is just syntax change that's fine and most of the time I'll have little opinion on what the best syntax should be. However, if this is just a syntax change, then it'll be hard to say this is more flexible. It could be shorter, it could be more convinient for certain usage, but it won't allow you to do what's impossible before or in cases that it couldn't be used before.

FWIW, this is also very closed to having HasEltype(...) = true/false Looking through the usage in base, it seems that such definition make a lot of existing usage harder to implement (they are probably still possible, just with longer and different syntax) so it seems that it's more likely to be a wash on the syntax complexity for simple usage.

@bramtayl
Copy link
Contributor Author

such definition make a lot of existing usage harder to implement

Why?

@yuyichao
Copy link
Contributor

The return value carries more than two states, and some functions dispatches on particular combinations. This methods is requiring all functions to have only two states (i.e. one bit of info) so you'll need to pass on more traits in order to pass the same amount of information.

In another word, what is possible now by passing IteratorSize, you'll need to pass in sth like both expect_method(length) and expect_method(size).

@bramtayl
Copy link
Contributor Author

You could dispatch on True and False. I thought you said that dispatch was slow anyway?

@bramtayl
Copy link
Contributor Author

You'll need to pass in sth like both expect_method(length) and expect_method(size).

You could have a fallback method like:

expect_method(length, T) = expect_method(size, T)

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

You could dispatch on True and False. I thought you said that dispatch was slow anyway?

Yes, and I never say it's slow. I say it's no faster than a branch and in this case the comparison is to other dispatch and there won't be much difference.

Dispatchig on true/false does have additional problem though since you can quickly run into boolean hell with more and more traits involved all having the same value set. And it seems that you are indeed most interested in complex cases.

You could have a fallback method like:

The producer of the info isn't the problem. The consumer is. If you are implementing something that dispatches on IteratorSize in order to pick the best method,

Now you have f(::<IteratorSize>, ...), with this you need f(::<expect_method(length)>, ::<expect_method(size)>, ...). Of course each method probably only care about one value, but you have to pass both to make the dispatch work in general case.

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

Now you have f(::<IteratorSize>, ...), with this you need f(::<expect_method(length)>, ::<expect_method(size)>, ...).

This seems like extra flexibility, not extra complexity, to me. Also, I think something more like:

if expect_method(size, T) isa True
    ...
elseif expect_method(length, T) isa True
    ...
else
    ...
end

would be better.

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

This seems like extra flexibility,

It's equivalent functionality (i.e. not more flexible) and longer syntax (I'm only talking about syntax complexity in this case). I believe the latter objectively requires more argument. I'm making no argument on which one is better for this simple case, just that the complexity is higher.

if expect_method(size, T) isa True
    ...
elseif expect_method(length, T) isa True
    ...
else
    ...
end

Now this is't about syntax anymore, and no that's how you lost flexibility. It is of course a good way to implement pure logic on traits but you can't do that in general. This is exactly why we have dispatch at all and not just branching on type checks in user code. With that branch written, it is now impossible to extend the dispatch logic it creates which make it less flexible.

Also note that since you are dealing with only one type and one kind of info in this case you can easily exhaust all possible values never require this to be extended, which is exactly why this is a good way to implement pure/simple trait computation. However, as long as you have any other dispatch/logic in the branch, those logic are now bound to the branch logic and cannot be extended freely anymore. In another word, the branch is a single dispatch on one argument, which is strictly less flexible than multiple dispatch.

(edit: I have no idea why I keep typing "now" as "not"...)

@bramtayl
Copy link
Contributor Author

With that branch written, it is now impossible to extend the dispatch logic`

What do you mean? Can't you define new methods of expect_method at any time?

the branch is a single dispatch on one argument, which is strictly less flexible than multiple dispatch

To repeat, if you want dispatch, you can use True and False. It was you, not me, who insisted on using true and false. There's no reason we can't add Sometimes or Slow or whatever values we want.

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

What do you mean? Can't you define new methods of expect_method at any time?

Yes, but that's not part of the dispatch logic (on trait values).

You can have f(x, y) = if expect...(length, x) g1(y) else g2(y) end and now you've lost the ability to dispatch on both x and y in g*.

To repeat, if you want dispatch, you can use True and False.

Yes, which is why I was using dispatch in the previous response.

It was you, not me, who insisted on using true and false.

true and false does not disable dispatch. You could even do Val{true/false}.
edit: and that comment of using Bool instead of two singleton types pretty much only apply when you would like only two singleton types and support bool like logic. It doesn't apply if you actually want to add more values, see below.

There's no reason we can't add Sometimes or Slow or whatever values we want.

Well, if that's the case, then sure, the lack of information won't be a concern anymore. Though,

  1. You mentioned that

    which need only take two typed values: True and False.

    and I assumed it was a selling point. (and to some degree I agree, but I also see that it has drawbacks that I brought up)

  2. I don't really see how you can return info for both length and size in this case. It's not like Sometimes size should equivalent to length. I mean, sure it can, and you can probably even return HasLength when you query for size. However, I don't really see how this fits in naturally while maintaining your desired property of easy computation/logic on them.


I also see why you might be confused about the relation between dispatch and branch. So to summarise and also mention why I brought up branch vs dispatch in different context.

  1. Dispatch is strictly more flexible than branches. Branches (at least in call cases brought up in this discussion) is just a non-extensible dispatch on a subset of arguments. In most cases it's even single dispatch.

  2. Exactly due to this inflexibility, branches, when they can be used (less flexible cases) are basically always no slower than dispatch.

  3. For basically all the use cases you've explicitly brought up before (not including the one I mentioned in Refactor traits based on methods #35095 (comment)). They belongs to usecases that can be covered with branches. That's why

    1. I mentioned you should be using branches with simple values in response to your usecases (i.e. they are simple, conceptually, enough that you can use branch)
    2. You can create syntax that fits the subset of usage very well, better than what we currently have.
  4. That's also why I said that the syntax does not fit so well with other usage, when you need to use the full power of multiple dispatch. For the example I use tot show this, the point is to take advantage of multiple dispatch, which is why replacing it with branches will be less flexible. To be clear, using the branch still won't be slower, but it'll just be less flexible, for a case that actually matter now.

  5. Now again, both syntax should be equivalent at some level. The one you propose is probably simpler in some cases but won' be simpler in general. This is not a direct reason to reject anything (though it doesn't make a good supporting argument either) of course since multiple syntax doing the same thing while for different cases is allowed. It'll be better if you can make it directly compatible with the current way (i.e. whatever syntax used now still works) or it'll just be an argument between two equivalent syntaxes neither of which is more flexible than the other. As I mentioned above, both syntax has advantage over the other in some cases and I'm not very interested in deciding which one is "better".

(edit: and you can also generalize branch vs dispatch to normal value computation to type domain computation to cover the discussion after the original proposal on discourse was disgarded.)

@bramtayl
Copy link
Contributor Author

if you can make it directly compatible with the current way

This seems pretty easy to do. What about defining a fallback method like

expect_method(size, T) = IteratorSize(T) isa HasShape

@bramtayl
Copy link
Contributor Author

the syntax does not fit so well with other usage, when you need to use the full power of multiple dispatch

Sure, I agree. There's definitely traits that don't make quite as straightforward sense using this framework, like IsInfinite (though expect_method(last, x) seems like it would work). I don't see why we can't keep those as is.

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

expect_method(size, T) = IteratorSize(T) isa HasShape

The general issue with this kind of approach is to keep the two in sync. Basically,

  1. What should the user use? expect_method or InteratorSize? The whole point about usecases above is so that the user can use both.
  2. What should be the user define. For @add_trait to work they have to define expect_method. However, in this particular case, I believe HasShape carries additional information that can't be derived from expect_method so one still need to define HasShape. If this was no an issue for some other cases, it'll mainly be about backward compatibility. In either case, you can't make all existing code to work and have to port all the defining code over.

There's definitely traits that don't make quite as straightforward sense using this framework, like IsInfinite (though expect_method(last, x) seems like it would work). I don't see why we can't keep those as is.

I saw you use "unify many" instead of "unify all" and I wasn't commenting on that. (Since it was a fair statement assuming everything else is accepted). I'm talking about how you use a particular trait. Are you using it for something as inflexiable/simple as a branch or are you taking advantage of multiple dispatch. This is not about syntnax and the particular trait in question doesn't matter... (in another word, I agree with what you've just said but that's not what I meant....)

@bramtayl
Copy link
Contributor Author

HasShape carries additional information that can't be derived from expect_method

You mean expect_method(::typeof(size), T)? Like what?

What should the user use?

expect_method, and HasShape would remain only for backwards compatibility

@yuyichao
Copy link
Contributor

You mean expect_method(::typeof(size), T)? Like what?

Dimension,

julia> Base.IteratorSize(Vector{Int})
Base.HasShape{1}()

julia> Base.IteratorSize(Matrix{Int})
Base.HasShape{2}()

HasShape would remain only for backwards compatibility

Well, the whole point of the direct compatibilty I mentioned above is to use IteratorSize since it carries more bits of information along with it's own types to avoid boolean hell. If it is not to be used then it's the other option I listed of comparing equivalent syntax that favors different usecases. (And again, I'm fine with either as of now. I'm just clarifying the claims and pointing out all the limitations that I can think of for consideration.)

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

Ah got it. I think the number of dimensions is just length(size(x)), no?

boolean hell

What is boolean hell? Is there a three headed dog there?

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

I think the number of dimensions is just length(size(x)), no?

I believe so.

What is boolean hell? Is there a three headed dog there?

I was actually looking for a wiki article about it but couldn't find one. There's a few google hit so you can read about it there. I actually also have no idea where did I first about this term...

Anyway, it's about using boolean to pass way too much information and you end up with functions that looks like f(false, false, true, true, false, true, false, false, false, true) and lost track of what each parameter is actually doing, either on the caller side or on the callee side. That's why having more states and unique types helps. There'll be fewer (type) parameters and they are going to be more easily distinguishable from each other.

Of course you can always write bad code like this without using Bool, f(0, 0, 1, 1, 0, 1, 0, 0, 0, 1) isn't much better than the Bool version. I personally believe Bool is singled out since it's about as little information one parameter can carry and you'll maximize the number of parameters you have this way, leading to the hell more easily.


And TBH, I might be using the wrong term. I searched "boolean hell" and found a few hit that is what I'm talking about and assumed that I'm not too off...

@bramtayl
Copy link
Contributor Author

you can always write bad code

Boolean hell just sounds like bad code hell to me. Which means there's a treehaddg there.

lost track of what each parameter is actually doing

Sounds a lot like the existing collect machinery in array.jl

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

Boolean hell just sounds like bad code hell to me. Which means there's a treehaddg there.

Yes, but since reducing the amount of information each parameter carries strongly encourages and even necessitate such usage the design of the argument is part of the bad code here...

In fact, I think most of what I've read/seen/done about avoiding such a trap are exactly about replacing boolean flags with enums, in some sense the design of the argument type is almost all of it.

Sounds a lot like the existing collect machinery in array.jl

I would't call it easy to understand but it's far from what I'm talking about. The parameters are mostly following a defined order (so when you don't care about it you can assume you cann jut pass them in the same order you get them). The parameters are mostly of different types so when you do care about them there's no question what a function that has ...(..., ::HasEltype, ...) is implemented for. The compiler can even check if the argument make sense. (You are not gonna get a match if you pass in HasLength here where as a ::True/::False will happily accept a return value from a wrong trait...)

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

::True/::False will happily accept a return value from a wrong trait...)

That's fair, and why I think inline use like

if expect_method(size, T) isa True
    ...
elseif expect_method(length, T) isa True
    ...
else
    ...
end

would be best

@yuyichao
Copy link
Contributor

That's fair, and why I think inline use like

The two are not comparible. There are cases, i.e. externally extendable multitple dispatch, that just can't be expressed this way. I'm only talking about the case you do need to pass trait value around here.

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

You can have f(x, y) = if expect...(length, x) g1(y) else g2(y) end and now you've lost the ability to dispatch on both x and y in g*.

I don't buy this argument. You can do

if expect_method(size, T1) isa True
    if expect_method(eltype, T2) isa True
        ...
    else
        ...
    end
elseif expect_method(length, T1) isa True
    ...
else
    ...
end

which is just as flexible as multiple dispatch

@yuyichao
Copy link
Contributor

which is just as flexible as multiple dispatch

There's very little to be argued here. If it was as flexible as dispatch we won't have the current dispatch system at all.

As I already mentioned above, if you have control over everything, of course you'll know all the dispatch path when you write the code. However, that's simply not the case. Your package will most likely implement some fallback methods that uses the information you know, the function will look like f(x::T1, y::T2). However, if you only have this, it'll be impossible for someone to implement a specialization of this function. The specialization may want to catch x::MySpecialType and expect_method(length, y). If all what you can use is the branch, you simply can't add such a dispatch rule after your original definition is already defined.

And to repeat I've said before, you need to be clear if what you want to define using this is a dispatch or not. In another word, are you simply defining functions that takes Any parameters or are you going to use the trait you defined to define methods that only operate on some type combinations while allowing everyone else to define method operating on other type combinations at the same time and in the same way. The branch you are talking about complement this but cannot be used to do this.

@yuyichao
Copy link
Contributor

Point taken. I mentioned having the option of having a Slow return value above for this reason

Again, this means that you cannot do logic on them easily anymore.

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

you cannot do logic on them easily anymore

Yes you can. You could define

&(::Slow, ::Slow) = Slow()
&(::Fast, ::Slow) = Slow()
&(::Slow, ::Fast) = Slow()
&(::Fast, ::Fast) = Slow()

for example

@bramtayl
Copy link
Contributor Author

I have never written an API with hash or isequal, so I don't know how you're supposed to set them up, but however you do it, its important to have a way to figure out whether an abstract interface has the methods or not.

@yuyichao
Copy link
Contributor

That’s more like min or max. What should !Slow be?

whether an abstract interface has the methods or not.

The point is that there isn’t a single method for this. It’s a multiple method interface.

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

It's a currently non existent interface. How do I figure out if a AbstractBigInt has hash defined or not?

@yuyichao
Copy link
Contributor

If this is useful enough one can be add, say calling it HasIsequalHash. Similar to indexing above, I’m using existing interface as examples as how traits for them need to be defined.

@bramtayl
Copy link
Contributor Author

I can't imagine a scenario when it would not be useful to know about the existence of API required methods. Can you?

Regarding fast/slow, seems to me a good solution would be having a separate fast_method_exists function with the following defaults. Both functions could just return True or False

fast_method_exists(...) = False()
method_exists(...) = fast_method_exists(...)

@yuyichao
Copy link
Contributor

I can't imagine a scenario when it would not be useful to know about the existence of API required methods. Can you?

Yes I can. It's when the API has property that isn't about existance of the method (i.e. when that's not what you are interesed in). You can obviously always still provide more information that's not useful, but having to use a completely different mechanism to query for different information isn't nice.

Also, it's not even about figuring out the existence of "methods". It's about it being "methods" rather than a "method".

seems to me a good solution would be having a separate fast_method_exists function with the following defaults. Both functions could just return True or False

See above, I agree you can always encode more information by just using more bits. However, that's not a very good solution as soon as you want to implement dispatch.

@bramtayl
Copy link
Contributor Author

when the API has property that isn't about existance of the method

For example?

you can always encode more information by just using more bits

Two bits seems pretty good to me

@yuyichao
Copy link
Contributor

For example?

The type parameter for HasShape.

Two bits seems pretty good to me

Not sure what you are talking about because you don't have two bits of information. You only have one bit per trait. If you meant that you think using two bits is fine then that's not the problem either. The issue is you need twice (or more) as many argument.

@bramtayl
Copy link
Contributor Author

The type parameter for HasShape.

We've been through this. HasShape gives redundant information if you know something has a size; you can just use length(size(x))

you need twice (or more) as many argument.

You don't need any arguments. You just need if

@yuyichao
Copy link
Contributor

if you know something has a size; you can just use length(size(x))

Well, I thought you weren't suggesting to actually use it that way. For one, this force you to actualy use the value so you can't do things in the type. Also, this is exactly what I meant that,

having to use a completely different mechanism to query for different information isn't nice.

You don't need any arguments. You just need if

Again, you can't use branch for dispatch. And again, if you are just talking about something that makes writing complex branch easier then it's fine. If you are talking about replacing trait on all what it can be used for then no, you can't use branches to replace dispatch and the necessity of putting things through arguments.

@bramtayl
Copy link
Contributor Author

you are just talking about something that makes writing complex branch easier then it's fine

Yes, that's what I'm talking about

@yuyichao
Copy link
Contributor

yuyichao commented Mar 13, 2020

Yes, that's what I'm talking about

In that case please update the issue since things like

Refactor traits

and

unify many existing trait interfaces

are very misleading when what you are offering aren't to replace all what those are used for.

@bramtayl
Copy link
Contributor Author

I'm totally lost. I'm talking about refactoring of all traits that only check for the existence of a method as logical. This makes complex branching easier.

@yuyichao
Copy link
Contributor

I'm saying that as of now, a lot of the traits are used for dispatch (they are type domain computations and if you don't have to do dispatch you would not put these in type domain to start with). What you propose makes these usage much harder and you are not targetting this usage. Because of that, you should not advertise this as replacing traits because you can't do that.

@bramtayl
Copy link
Contributor Author

You can use True and False for dispatch

@bramtayl
Copy link
Contributor Author

Listen, we're going around in circles, and even if you did happen to convince me I'm wrong, I wouldn't stop arguing because your elitist attitude towards programming and the way you treat people makes me angry.

@yuyichao
Copy link
Contributor

Well, you've already accepted that you are just interesed in operation on these and not dispatch. I also thought you already realized all the issues if you use those generic values for dispatch.

For review, see,

#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)
#35095 (comment)

To summarize, yes, you can use generic value for dispatch and that's what leads to boolean hell. What you are proposing is a very bad way for the current usage of trait (I'm specifically tallking about the property of returning generic value from traits, not any aspect of it)

@yuyichao
Copy link
Contributor

I wouldn't stop arguing

OK, I take it as you are intentially going in circle here and make me repeating my argument for the sole purpose of keeping the argument going.

I think that's a clear enough sign that this issue can be closed.

@bramtayl
Copy link
Contributor Author

you are just interesed in operation on these and not dispatch

You can use True and False for branching or for dispatch. Branching works just as well as dispatch because of constant propagation. You only need most two bits for any of the examples we've discussed, which is maybe your definition of boolean hell but not mine.

@yuyichao
Copy link
Contributor

You only need most two bits for any of the examples we've discussed, which is maybe your definition of boolean hell but not mine.

I did not say any of them are boolean hell, I'm saying that it's how you get there, because

The issue is you need twice (or more) as many argument.

And again, I'm saying you are at least doubling the number of argument for dispatch here. It's not going to be too much a problem when you have only one argument, which is most of the cases you brought up. It's only an issue when you are doing multiple dispatch.

FWIW,

You only need most two bits for any of the examples we've discussed

This is not true. In both #35095 (comment) and #35095 (comment), I was explicitly talking about cases where more than one arguments and two bits of information is needed.

@bramtayl
Copy link
Contributor Author

bramtayl commented Mar 13, 2020

Yes, there's a slight increase in number of arguments compared to the existing trait system (I'm saying slight because I think having to check the speed of more than one method would be very rare), but at the expense of a consistent interface which can be extended arbitrarily without any new types

@yuyichao
Copy link
Contributor

(I'm saying slight because I think having to check the speed of more than one method would be very rare),

I've talked about this above as well. The point is that you are dispatching on the properties available. It's likely that most method will only use few bit of info, but the fact that some may need it means that you have to pass it around.

@bramtayl
Copy link
Contributor Author

Yes

@bramtayl
Copy link
Contributor Author

Closed in favor of #35940

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants