---
layout: sip
permalink: /sips/:title.html
stage: pre-sip
status: submitted
presip-thread: https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321/13
title: SIP-60 - Bind variables within alternative patterns
---

**By: Yilin Wei**

## History

| Date          | Version            |
|---------------|--------------------|
| Sep 17th 2023 | Initial Draft      |
| Jan 16th 2024 | Amendments         |

## Summary

Pattern matching is one of the most commonly used features in Scala by beginners and experts alike. Most of
the features of pattern matching compose beautifully; for example, a user who learns about bind variables
and pattern guards can mix the two features intuitively.

One of the few outstanding cases where this does not hold is mixing bind variables with alternative patterns. The relevant part of the
current [specification](https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html) is section **8.1.12**, copied below with the relevant clause
highlighted.

> … All alternative patterns are type checked with the expected type of the pattern. **They may not bind variables other than wildcards**. The alternative …

We propose that this restriction be lifted and this corner case be eliminated.

Removing the corner case would make the language easier to teach, reduce friction and allow users to express intent in a more natural manner.

## Motivation

### Scenario

The following scenario is shamelessly stolen from [PEP 636](https://peps.python.org/pep-0636), which introduces pattern matching to the
Python language.

Suppose a user is writing a classic text adventure game such as [Zork](https://en.wikipedia.org/wiki/Zork). For readers unfamiliar with
text adventure games, the player typically enters freeform text into the terminal in the form of commands to interact with the game
world. Examples of commands might be `"pick up rabbit"` or `"open door"`.

Typically, the commands are tokenized and parsed. After a parsing stage we may end up with an encoding similar to the following:

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])
```

In this encoding, the string `pick up jar` would be parsed as `Command(List(Pick, Up, Item("jar")))`.

Once the command is parsed, we want to actually *do* something with it. With this particular encoding,
we would naturally reach for a pattern match; in the simplest case, we could get away with a single recursive function for
our whole program.

Suppose we take the simplest example, where we want to match on a command like `"north"`. The pattern match consists of
matching on a single stable identifier, `North`, and the code would look like this:

~~~ scala
import Word.*

def loop(cmd: Command): Unit =
  cmd match
    case Command(North :: Nil) => // Code for going north
~~~

However, as we begin play-testing the actual text adventure, we observe that users type `"go north"`. We decide
our program should treat the two distinct commands as synonyms. At this point we would reach for an alternative pattern `|` and
refactor the code like so:

~~~ scala
    case Command(North :: Nil | Go :: North :: Nil) => // Code for going north
~~~

This clearly expresses our intent that the two commands map to the same underlying logic.

Later we decide that we want more complex logic in our game, perhaps allowing the user to pick up
items with a command like `pick up jar`. We would then extend our function with another case, binding the variable `name`:

~~~ scala
    case Command(Pick :: Up :: Item(name) :: Nil) => // Code for picking up items
~~~

Again, we might realise through our play-testing that users type `get` as a synonym for `pick up`. After playing around
with alternative patterns, we may reasonably write something like:

~~~ scala
    case Command(Pick :: Up :: Item(name) :: Nil | Get :: Item(name) :: Nil) => // Code for picking up items
~~~

Unfortunately, at this point we are stopped in our tracks by the compiler: the bind variable `name` cannot be used in conjunction with alternative patterns.
We carefully consult the specification and confirm that this is not possible, so we must choose a different encoding.

We can, of course, work around it by hoisting the logic into a helper function in the nearest scope which allows function definitions:

~~~ scala
def loop(cmd: Command): Unit =
  def pickUp(item: String): Unit = ??? // Code for picking up items
  cmd match
    case Command(Pick :: Up :: Item(name) :: Nil) => pickUp(name)
    case Command(Get :: Item(name) :: Nil)        => pickUp(name)
~~~

Or we could use any number of different encodings. However, all of them are less intuitive and less obvious than the code we tried to write.

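One such alternative encoding is a custom extractor object that normalises both phrasings into a single pattern. The sketch below is our own illustration, not part of the proposal: `PickUp` and `describe` are hypothetical names, and `describe` returns a description instead of performing game logic so that its behaviour is easy to check. `Word` and `Command` follow the encoding above.

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])

import Word.*

// Hypothetical extractor: accepts both "pick up <item>" and "get <item>"
// and yields the item name, so call sites need only one case.
object PickUp:
  def unapply(cmd: Command): Option[String] = cmd.words match
    case Pick :: Up :: Item(name) :: Nil => Some(name)
    case Get :: Item(name) :: Nil        => Some(name)
    case _                               => None

// Hypothetical dispatcher returning a description instead of running effects.
def describe(cmd: Command): String = cmd match
  case Command(North :: Nil | Go :: North :: Nil) => "going north"
  case PickUp(name)                               => s"picking up $name"
  case _                                          => "unknown command"

@main def demo(): Unit =
  println(describe(Command(List(Pick, Up, Item("jar")))))
  println(describe(Command(List(Get, Item("jar")))))
  println(describe(Command(List(Go, North))))
```

Running `demo` prints `picking up jar` twice and then `going north`. The call site stays tidy, but the grouping of synonyms now lives in a separate object, away from the case that uses it; this is the kind of indirection the proposal aims to make unnecessary.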
### Commentary

Removing the restriction leads to more obvious encodings in the case of alternative patterns. Arguably, the language
would also be simpler and easier to teach: we would no longer need to remember that bind patterns and alternatives
do not mix, nor teach newcomers the workarounds.

Many languages with pattern matching also support this feature; [Rust](https://github.com/rust-lang/reference/pull/957) and [Python](https://peps.python.org/pep-0636/#or-patterns), for example, have
supported it for some time. While this is not in itself a strong reason for Scala to do the same, the feature's existence in other languages means that users
are more likely to expect it.

A smaller benefit for existing users is that removing the corner case leads to code which is
easier to review; the diff between adding a bind variable within an alternative and switching to a different
encoding entirely is smaller, and it conveys the intent of such changesets better.

It is acknowledged, however, that cases where alternative branches share the same logic are relatively rare compared to
the usage of pattern matching in general. The current restrictions are not too arduous to work around for experienced practitioners, as
can be inferred from the relatively low number of comments on the original [issue](https://github.com/scala/bug/issues/182), first raised in 2007.

To summarize, the main arguments for the proposal are that it makes the language more consistent, simpler and easier to teach. The main argument
against a change is that it will have low impact for the majority of existing users.

## Proposed solution

Removing the alternative restriction means that we need to specify some additional constraints. Intuitively, we
need to consider the restrictions on variable bindings within each alternative branch, as well as the types inferred
for each binding within the scope of the pattern.

### Bindings

The simplest case of mixing an alternative pattern and bind variables is where we have two `UnApply` methods with
a single alternative pattern. For now, we only consider the case where each bind variable has the same
type, like so:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: Int)

  def fun = this match
    case Bar(z) | Baz(z) => ... // z: Int
~~~

For the expression to make sense with the current semantics of pattern matching, `z` must be defined in both branches; otherwise the
case body would be nonsensical if `z` were referenced within it (see [missing variables](#missing-variables) for a proposed alternative).

Removing the restriction would also allow nested alternative patterns:

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(x: Int)

enum Qux:
  case Quux(y: Int)
  case Corge(x: Foo)

  def fun = this match
    case Quux(z) | Corge(Foo.Bar(z) | Foo.Baz(z)) => ... // z: Int
~~~

Using an `Ident` within an `UnApply` is not the only way to introduce a binding within the pattern scope.
We also expect to be able to use an explicit binding with `@`, like this:

~~~ scala
enum Foo:
  case Bar()
  case Baz(bar: Bar)

  def fun = this match
    case Baz(x) | x @ Bar() => ... // x: Foo.Bar
~~~

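For comparison, the nested alternative above has to be flattened today into one case per path through the alternatives. The sketch below is our own illustration of that workaround, not part of the proposal; the method name `value` is hypothetical.

```scala
enum Foo:
  case Bar(x: Int)
  case Baz(x: Int)

enum Qux:
  case Quux(y: Int)
  case Corge(x: Foo)

  // Workaround in current Scala: one case per alternative path, each
  // binding its own `z`, where the proposal would need a single case.
  def value: Int = this match
    case Quux(z)           => z
    case Corge(Foo.Bar(z)) => z
    case Corge(Foo.Baz(z)) => z

@main def demo(): Unit =
  println(Qux.Corge(Foo.Baz(3)).value)
```

The three case bodies are identical, which is the duplication that the single case `Quux(z) | Corge(Foo.Bar(z) | Foo.Baz(z))` would remove.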
### Types

We propose that the type of each variable introduced in the scope of the pattern be the least upper bound of the types
inferred within each branch.

~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  def fun = this match
    case Bar(x) | Baz(x) => // x: Int | String
~~~

We do not expect any inference to happen between branches. For example, in the case of a GADT we would expect the second branch of
the following case to match all instances of `Bar`, regardless of the type of `A`.

~~~ scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(i: Int) extends Foo[Int]

  def fun = this match
    case Baz(x) | Bar(x) => // x: Int | A
~~~

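The typing rule can be approximated in today's Scala by binding `x` separately in each branch and ascribing the result the union of the branch types, which in Scala 3 is their least upper bound. The sketch below is our own illustration, not the proposed implementation; `shared` and `describeX` are hypothetical names. The union is written out explicitly because Scala 3 replaces an inferred union result type of a definition by its join, whereas the proposal would give `x` the union type directly.

```scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  // Emulates `case Bar(x) | Baz(x) => ...`: each branch binds its own `x`
  // and the result is ascribed the least upper bound `Int | String`.
  def shared: Int | String = this match
    case Bar(x) => x
    case Baz(x) => x

  // Hypothetical shared body, written once against the union type.
  def describeX: String = shared match
    case i: Int    => s"Int $i"
    case s: String => s"String $s"

@main def demo(): Unit =
  println(Foo.Bar(1).describeX)
  println(Foo.Baz("a").describeX)
```

Running `demo` prints `Int 1` and then `String a`.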
### Given bind variables

It is possible to introduce bindings into the contextual scope within a pattern match branch.

Since most contextual bindings are anonymous but are still used (implicitly) within the branches, we expect the _types_ present in the contextual scope of each branch to be the same, rather than the _names_.

~~~ scala
case class Context()

def run(using ctx: Context): Unit = ???

enum Foo:
  case Bar(ctx: Context)
  case Baz(i: Int, ctx: Context)

  def fun = this match
    case Bar(given Context) | Baz(_, given Context) => run // `Context` appears in both branches
~~~

This raises the question of what to do in the case of an explicit `@` binding where the user binds a variable to the same _name_ but to different types. We can either expose a `String | Int` within the contextual scope, or simply reject the code as invalid.

~~~ scala
enum Foo:
  case Bar(s: String)
  case Baz(i: Int)

  def fun = this match
    case Bar(x @ given String) | Baz(x @ given Int) => ???
~~~

To be consistent with the named bindings, we argue that the code should compile and that a contextual variable of type `String | Int` should be added to the scope.

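For comparison, the first example in this section can already be written in current Scala 3 by duplicating the branch, since pattern-bound givens are allowed within each case individually. The sketch below is ours, not part of the proposal; the `name` field on `Context` and the `String` result of `run` are assumptions added so that the code produces something observable.

```scala
// Assumed variant of `Context` carrying a name, for observability only.
case class Context(name: String)

// Assumed variant of `run` that returns a value instead of `Unit`.
def run(using ctx: Context): String = s"running in ${ctx.name}"

enum Foo:
  case Bar(ctx: Context)
  case Baz(i: Int, ctx: Context)

  // Today: one case per alternative, each introducing an anonymous
  // given `Context`. The proposal would merge these into
  // `case Bar(given Context) | Baz(_, given Context) => run`.
  def fun: String = this match
    case Bar(given Context)    => run
    case Baz(_, given Context) => run

@main def demo(): Unit =
  println(Foo.Bar(Context("bar")).fun)
  println(Foo.Baz(1, Context("baz")).fun)
```

Running `demo` prints `running in bar` and then `running in baz`; both branches summon the anonymous given by type, which is why the proposal matches contextual bindings by type rather than by name.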
### Quoted patterns

[Quoted patterns](https://docs.scala-lang.org/scala3/guides/macros/quotes.html#quoted-patterns) will not be supported in this SIP, and the behaviour of quoted patterns will remain unchanged, i.e. any quoted pattern appearing in an alternative pattern and binding a variable or type variable will be rejected as illegal.

### Alternatives

#### Enforcing a single type for a bound variable

We could constrain each bound variable to have the same type within every alternative branch. Notably, this is what languages without Scala-style subtyping, such as Rust, do.

However, since Scala 3 has untagged union types, and both union types and alternative patterns are written with `|`, it felt more natural to discard this restriction.

#### Type ascriptions in alternative branches

Another suggestion is that an _explicit_ type ascription by a user ought to apply to all branches. For example, under the currently proposed rules, `id` in the following code would be given the type `Int | A`, even though the user has written the ascription `id: Int`.

~~~scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(a: A)

  def test = this match
    case Bar(id: Int) | Baz(id) => id
~~~

In the author's subjective opinion, it is more natural to view the alternative arms as separate branches, which would be equivalent to the following method in `Foo`.

~~~scala
  def test = this match
    case Bar(id: Int) => id
    case Baz(id)      => id
~~~

On the other hand, if it is decided that each bound variable ought to have the same type, then arguably "sharing" explicit type ascriptions across branches would reduce boilerplate.

#### Missing variables

Unlike in other languages, we could assign a type, `A | Null`, to a bind variable which is not present in all of the alternative branches. Rust, for example, is constrained by the fact that the size of a variable must be known and that untagged unions do not exist.

Arguably, missing a variable entirely is more likely to be an error; in Python, for instance, the absence of any requirement to declare variables before assigning them means that beginners can easily assign a value to the wrong variable.

It may be that enforcing the same bind variables within each branch ought to be left to a linter rather than be a hard restriction within the language itself.

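To make the `A | Null` option concrete, here is how a hypothetical `case Bar(z) | Baz() => ...`, where `z` is missing from the second alternative, could be emulated in today's Scala. This is our own sketch of one possible semantics, not part of the proposal; `bound` and `describeZ` are hypothetical names.

```scala
enum Foo:
  case Bar(x: Int)
  case Baz()

  // `z` is bound only in the `Bar` branch; the other branch supplies null,
  // so the combined binding has type `Int | Null`.
  def bound: Int | Null = this match
    case Bar(z) => z
    case Baz()  => null

  // Hypothetical shared body: it must now handle the null case explicitly.
  def describeZ: String =
    val v: Int | Null = bound
    if v == null then "no value" else s"value $v"

@main def demo(): Unit =
  println(Foo.Bar(3).describeZ)
  println(Foo.Baz().describeZ)
```

Running `demo` prints `value 3` and then `no value`. The explicit null check in the shared body is the cost this alternative would push onto users.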
## Specification

We do not believe any syntax changes are needed, since the current specification already allows the proposed syntax.

We propose that the following clauses be added to the specification:

Let $`p_1 | \ldots | p_n`$ be an alternative pattern at an arbitrary depth within a case pattern, and let $`\Gamma_j`$ be the named scope associated with the alternative $`p_j`$, for $`1 \le j \le n`$.

If any $`p_j`$ is a quoted pattern binding a variable or type variable, the alternative pattern is considered invalid. Otherwise, let the named variables introduced within each alternative $`p_j`$ be $`x_i \in \Gamma_j`$, and let the unnamed contextual variables within $`p_j`$ have the types $`T_{j,k} \in \Gamma_j`$.

Each $`p_j`$ must introduce the same set of bindings, i.e. for all $`j, l`$, $`\Gamma_j`$ and $`\Gamma_l`$ must have the same **named** members, and the sets of contextual types $`\{T_{j,k}\}_k`$ and $`\{T_{l,k}\}_k`$ must be the same.

If $`X_{j,i}`$ is the type of the binding $`x_i`$ within the alternative $`p_j`$, then the consequent type, $`X_i`$, of the
variable $`x_i`$ within the pattern scope $`\Gamma`$ is the least upper bound of all the types $`X_{j,i}`$ associated with
the variable $`x_i`$ within each branch, i.e. $`X_i = X_{1,i} | \ldots | X_{n,i}`$.

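The clauses above can be read as a small check over the scopes $`\Gamma_1, \ldots, \Gamma_n`$. The sketch below models that check in Scala under simplifying assumptions of our own: a type is represented as a set of type names read as their union, so the least upper bound becomes set union and subtyping between members is ignored, and quoted patterns are not modelled. `Ty`, `Alt` and `combine` are hypothetical names, not compiler APIs.

```scala
// A type in this toy model: a set of type names read as their union,
// so Set("Int", "String") stands for `Int | String`.
type Ty = Set[String]

// One alternative p_j: its named bindings (from Γ_j) and the types of
// its anonymous contextual bindings.
final case class Alt(named: Map[String, Ty], contextual: Set[Ty])

// Returns the bindings of the whole alternative pattern, or None if the
// alternatives do not introduce the same bindings.
def combine(alts: List[Alt]): Option[Map[String, Ty]] = alts match
  case Nil => None
  case first :: rest =>
    val sameNames   = rest.forall(_.named.keySet == first.named.keySet)
    val sameContext = rest.forall(_.contextual == first.contextual)
    if sameNames && sameContext then
      // X_i is the least upper bound (here: the union) of every X_{j,i}.
      Some(first.named.keySet.map(x => x -> alts.map(_.named(x)).reduce(_ ++ _)).toMap)
    else None

@main def demo(): Unit =
  // case Bar(x) | Baz(x), with x: Int in one branch and x: String in the other
  println(combine(List(Alt(Map("x" -> Set("Int")), Set()), Alt(Map("x" -> Set("String")), Set()))))
  // case Bar(x) | Baz(y): different names, so the pattern is rejected
  println(combine(List(Alt(Map("x" -> Set("Int")), Set()), Alt(Map("y" -> Set("Int")), Set()))))
```

The first call binds `x` to the union of `Int` and `String`; the second returns `None` because the alternatives bind different names.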
## Compatibility

We believe the changes would be backwards compatible.

## Related Work

The language feature exists in multiple languages. Of the more popular languages, Rust added the feature in [2021](https://github.com/rust-lang/reference/pull/957) and
Python in 2020, within [PEP 636](https://peps.python.org/pep-0636/#or-patterns), the structural pattern matching tutorial PEP. Of course, Python is dynamically typed and Rust does not have subtyping,
but the semantics are similar to this proposal.

Within Scala, the [issue](https://github.com/scala/bug/issues/182) was first raised in 2007. The author is also aware of attempts to fix this issue by [Lionel Parreaux](https://github.com/dotty-staging/dotty/compare/main...LPTK:dotty:vars-in-pat-alts) and of the associated [feature request](https://github.com/lampepfl/dotty-feature-requests/issues/12), which
was not submitted to the main dotty repository.

The associated [thread](https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321) has some extra discussion around semantics. Historically, there have been multiple similar suggestions: in [2023](https://contributors.scala-lang.org/t/qol-sound-binding-in-pattern-alternatives/6226) by Quentin Bernet and in [2021](https://contributors.scala-lang.org/t/could-it-be-possible-to-allow-variable-binging-in-patmat-alternatives-for-scala-3-x/5235) by Alexey Shuksto.

## Implementation

The author has an in-progress implementation, focused on the typer, which compiles the examples with the expected types. Interested
parties are welcome to see the WIP [here](https://github.com/lampepfl/dotty/compare/main...yilinwei:dotty:main).

### Further work

#### Quoted patterns

More investigation is needed to see how quoted patterns with bind variables in alternative patterns could be supported.

## Acknowledgements

Many thanks to **Zainab Ali** for proof-reading the draft, and to **Nicolas Stucki** and **Guillaume Martres** for their pointers on the dotty
compiler codebase.