Commit e240bcc

Update SIPs state
1 parent 831439c commit e240bcc

12 files changed

+3445
-45
lines changed

_sips/sips/alternative-bind-patterns.md

-7
This file was deleted.
+331
@@ -0,0 +1,331 @@
1+
---
2+
layout: sip
3+
permalink: /sips/:title.html
4+
stage: pre-sip
5+
status: submitted
6+
presip-thread: https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321/13
7+
title: SIP-60 - Bind variables within alternative patterns
8+
---
9+
10+
**By: Yilin Wei**
11+
12+
## History
13+
14+
| Date | Version |
15+
|---------------|--------------------|
16+
| Sep 17th 2023 | Initial Draft |
17+
| Jan 16th 2024 | Amendments |
18+
19+
## Summary
20+
21+
Pattern matching is one of the most commonly used features in Scala by beginners and experts alike. Most of
22+
the features of pattern matching compose beautifully — for example, a user who learns about bind variables
23+
and guard patterns can mix the two features intuitively.
24+
25+
One of the few outstanding cases where this is untrue is when mixing bind variables and alternative patterns. The part of
26+
the current [specification](https://scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html) with which we are concerned is section **8.1.12**, copied below with the relevant clause
27+
highlighted.
28+
29+
> … All alternative patterns are type checked with the expected type of the pattern. **They may not bind variables other than wildcards**. The alternative …
30+
31+
We propose that this restriction be lifted and this corner case be eliminated.
32+
33+
Removing the corner case would make the language easier to teach, reduce friction and allow users to express intent in a more natural manner.
34+
35+
## Motivation
36+
37+
## Scenario
38+
39+
The following scenario is shamelessly stolen from [PEP 636](https://peps.python.org/pep-0636), which introduces pattern matching to the
40+
Python language.
41+
42+
Suppose a user is writing a classic text adventure game such as [Zork](https://en.wikipedia.org/wiki/Zork). For readers unfamiliar with
43+
text adventure games, the player typically enters freeform text into the terminal in the form of commands to interact with the game
44+
world. Examples of commands might be `"pick up rabbit"` or `"open door"`.
45+
46+
Typically, the commands are tokenized and parsed. After a parsing stage we may end up with an encoding similar to the following:
47+
48+
```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

case class Command(words: List[Word])
```
55+
56+
In this encoding, the string `pick up jar` would be parsed as `Command(List(Pick, Up, Item("jar")))`.
57+
58+
Once the command is parsed, we want to actually *do* something with the command. With this particular encoding,
59+
we would naturally reach for a pattern match — in the simplest case, we could get away with a single recursive function for
60+
our whole program.
61+
62+
Suppose we take the simplest example where we want to match on a command like `"north"`. The pattern match consists of
63+
matching on a single stable identifier, `North`, and the code would look like this:
64+
65+
~~~ scala
import Word.* // the enum cases live in `Word`, not `Command`

def loop(cmd: Command): Unit =
  cmd match
    case Command(North :: Nil) => // Code for going north
~~~
72+
73+
However, as we begin play-testing the actual text adventure, we observe that users type `"go north"`. We decide
74+
our program should treat the two distinct commands as synonyms. At this point we would reach for an alternative pattern `|` and
75+
refactor the code like so:
76+
77+
~~~ scala
case Command(North :: Nil | Go :: North :: Nil) => // Code for going north
~~~
80+
81+
This clearly expresses our intent that the two commands map to the same underlying logic.
82+
83+
Later we decide that we want more complex logic in our game; perhaps allowing the user to pick up
84+
items with a command like `pick up jar`. We would then extend our function with another case, binding the variable `name`:
85+
86+
~~~ scala
case Command(Pick :: Up :: Item(name) :: Nil) => // Code for picking up items
~~~
89+
90+
Again, we might realise through our play-testing that users type `get` as a synonym for `pick up`. After playing around
91+
with alternative patterns, we may reasonably write something like:
92+
93+
~~~ scala
case Command(Pick :: Up :: Item(name) :: Nil | Get :: Item(name) :: Nil) => // Code for picking up items
~~~
96+
97+
Unfortunately, at this point we are stopped in our tracks by the compiler. The bind variable `name` cannot be used in conjunction with alternative patterns.
98+
We carefully consult the specification and find that this is not possible, so we must choose a different encoding.
99+
100+
We can, of course, work around it by hoisting the logic into a helper function in the nearest scope which permits function definitions:
101+
102+
~~~ scala
def loop(cmd: Command): Unit =
  def pickUp(item: String): Unit = ??? // Code for picking up item
  cmd match
    case Command(Pick :: Up :: Item(name) :: Nil) => pickUp(name)
    case Command(Get :: Item(name) :: Nil) => pickUp(name)
~~~
109+
110+
We could also use any number of other encodings. However, all of them are less intuitive and less obvious than the code we tried to write.
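
As a rough sketch of one such alternative encoding that compiles in current Scala 3, the two synonyms can be folded into a custom extractor. The names `PickUpItem` and `describe` are hypothetical and are not part of this proposal; the function returns a `String` only so that the behaviour is easy to observe.

```scala
enum Word:
  case Get, North, Go, Pick, Up
  case Item(name: String)

import Word.*

final case class Command(words: List[Word])

// Hypothetical extractor: succeeds for both "pick up <item>" and "get <item>",
// yielding the item name in either case.
object PickUpItem:
  def unapply(cmd: Command): Option[String] = cmd.words match
    case Pick :: Up :: Item(name) :: Nil => Some(name)
    case Get :: Item(name) :: Nil        => Some(name)
    case _                               => None

// Hypothetical stand-in for the game loop.
def describe(cmd: Command): String = cmd match
  case Command(North :: Nil | Go :: North :: Nil) => "going north"
  case PickUpItem(name)                           => s"picking up $name"
  case _                                          => "unknown command"
```

The shared logic now lives in one branch, but the synonyms are defined away from the match itself, which is the indirection the proposal aims to make unnecessary.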
111+
112+
## Commentary
113+
114+
Removing the restriction leads to more obvious encodings in the case of alternative patterns. Arguably, the language
115+
would be simpler and easier to teach — we do not have to remember that bind patterns and alternatives
116+
do not mix, and we no longer need to teach newcomers the workarounds.
117+
118+
Of the languages which have pattern matching, a significant number also support this feature. Languages such as [Rust](https://github.com/rust-lang/reference/pull/957) and [Python](https://peps.python.org/pep-0636/#or-patterns) have
119+
supported it for some time. While
120+
this is not a great reason for Scala to do the same, having the feature exist in other languages means that users
121+
are more likely to expect the feature.
122+
123+
A smaller benefit for existing users is that removing the corner case leads to code which is
124+
easier to review; the absolute code difference between adding a bind variable within an alternative versus switching to a different
125+
encoding entirely is smaller and conveys the intent of such changesets better.
126+
127+
It is acknowledged, however, that cases where we share the same logic across alternative branches are relatively rare compared to
128+
the usage of pattern matching in general. The current restrictions are not too arduous to work around for experienced practitioners, which
129+
can be inferred from the relatively low number of comments on the original [issue](https://github.com/scala/bug/issues/182), first raised in 2007.
130+
131+
To summarize, the main arguments for the proposal are to make the language more consistent, simpler and easier to teach. The arguments
132+
against a change are that it will be low impact for the majority of existing users.
133+
134+
## Proposed solution
135+
136+
Removing the alternative restriction means that we need to specify some additional constraints. Intuitively, we
137+
need to consider the restrictions on variable bindings within each alternative branch, as well as the types inferred
138+
for each binding within the scope of the pattern.
139+
140+
## Bindings
141+
142+
The simplest case of mixing an alternative pattern and bind variables is where we have two `UnApply` methods, with
143+
a single alternative pattern. For now, we specifically only consider the case where each bind variable is of the same
144+
type, like so:
145+
146+
~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: Int)

  def fun = this match
    case Bar(z) | Baz(z) => ... // z: Int
~~~
154+
155+
For the expression to make sense with the current semantics around pattern matches, `z` must be defined in both branches; otherwise the
156+
case body would be nonsensical if `z` was referenced within it (see [missing variables](#missing-variables) for a proposed alternative).
157+
158+
Removing the restriction would also allow recursive alternative patterns:
159+
160+
~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(x: Int)

import Foo.* // brings `Bar` and `Baz` into scope for use inside `Qux`

enum Qux:
  case Quux(y: Int)
  case Corge(x: Foo)

  def fun = this match
    case Quux(z) | Corge(Bar(z) | Baz(z)) => ... // z: Int
~~~
172+
173+
Using an `Ident` within an `UnApply` is not the only way to introduce a binding within the pattern scope.
174+
We also expect to be able to use an explicit binding using an `@` like this:
175+
176+
~~~ scala
enum Foo:
  case Bar()
  case Baz(bar: Bar)

  def fun = this match
    case Baz(x) | x @ Bar() => ... // x: Foo.Bar
~~~
184+
185+
## Types
186+
187+
We propose that the type of each variable introduced in the scope of the pattern be the least upper-bound of the types
188+
inferred within each branch.
189+
190+
~~~ scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

  def fun = this match
    case Bar(x) | Baz(x) => // x: Int | String
~~~
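
For comparison, the following sketch shows how the same union type can be obtained in current Scala 3 by writing the alternatives as separate branches. The helper names `payload` and `render` are hypothetical and exist only to make the resulting `Int | String` type observable.

```scala
enum Foo:
  case Bar(x: Int)
  case Baz(y: String)

import Foo.*

// Each branch binds `x` at its own type; the declared result type is the
// union that the proposal would assign to `x` directly.
def payload(foo: Foo): Int | String = foo match
  case Bar(x) => x
  case Baz(x) => x

// Consumers of the union recover the precise type with a type test.
def render(foo: Foo): String = payload(foo) match
  case i: Int    => s"int $i"
  case s: String => s"string $s"
```

Under the proposed rules, the single case `Bar(x) | Baz(x)` would give `x` this type without the duplicated branch bodies.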
198+
199+
We do not expect any inference to happen between branches. For example, in the case of a GADT we would expect the second branch of
200+
the following case to match all instances of `Bar`, regardless of the type of `A`.
201+
202+
~~~ scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(i: Int) extends Foo[Int]

  def fun = this match
    case Baz(x) | Bar(x) => // x: Int | A
~~~
210+
211+
### Given bind variables
212+
213+
It is possible to introduce bindings to the contextual scope within a pattern match branch.
214+
215+
Since most bindings will be anonymous but be referred to within the branches, we expect the _types_ present in the contextual scope for each branch to be the same rather than the _names_.
216+
217+
~~~ scala
case class Context()

def run(using ctx: Context): Unit = ???

enum Foo:
  case Bar(ctx: Context)
  case Baz(i: Int, ctx: Context)

  def fun = this match
    case Bar(given Context) | Baz(_, given Context) => run // `Context` appears in both branches
~~~
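
As a point of reference, here is a minimal sketch of what must be written in current Scala 3, where each alternative needs its own branch. The `name` field on `Context` and the `String` result of `run` are assumptions added only so that the example produces an observable value.

```scala
// Assumption: `Context` carries a name purely for illustration.
case class Context(name: String)

def run(using ctx: Context): String = s"running with ${ctx.name}"

enum Foo:
  case Bar(ctx: Context)
  case Baz(i: Int, ctx: Context)

import Foo.*

// Each branch introduces an anonymous given `Context`, which `run` picks up.
def fun(foo: Foo): String = foo match
  case Bar(given Context)    => run
  case Baz(_, given Context) => run
```

Both branches bring a given of the same type into scope, which is the condition the proposal requires of the alternatives when they are merged into one case.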
229+
230+
This raises the question of what to do in the case of an explicit `@` binding where the user binds a variable to the same _name_ but to different types. We can either expose a `String | Int` within the contextual scope, or simply reject the code as invalid.
231+
232+
~~~ scala
enum Foo:
  case Bar(s: String)
  case Baz(i: Int)

  def fun = this match
    case Bar(x @ given String) | Baz(x @ given Int) => ???
~~~
240+
241+
To be consistent with the named bindings, we argue that the code should compile and that a contextual variable should be added to the scope with the type `String | Int`.
242+
243+
### Quoted patterns
244+
245+
[Quoted patterns](https://docs.scala-lang.org/scala3/guides/macros/quotes.html#quoted-patterns) will not be supported in this SIP and the behaviour of quoted patterns will remain as it is currently, i.e. any quoted pattern appearing in an alternative pattern binding a variable or type variable will be rejected as illegal.
246+
247+
### Alternatives
248+
249+
#### Enforcing a single type for a bound variable
250+
251+
We could constrain the type for each bound variable within each alternative branch to be the same type. Notably, this is what languages such as Rust, which do not have sub-typing, do.
252+
253+
However, since untagged unions are part of Scala 3, and both union types and alternative patterns are written with `|`, it felt more natural to discard this restriction.
254+
255+
#### Type ascriptions in alternative branches
256+
257+
Another suggestion is that an _explicit_ type ascription by a user ought to be defined for all branches. For example, in the currently proposed rules, the following code would infer the return type to be `Int | A` even though the user has written the statement `id: Int`.
258+
259+
~~~scala
enum Foo[A]:
  case Bar(a: A)
  case Baz(a: A)

  def test = this match
    case Bar(id: Int) | Baz(id) => id
~~~
267+
268+
In the author's subjective opinion, it is more natural to view the alternative arms as separate branches — which would be equivalent to the function below.
269+
270+
~~~scala
def test = this match
  case Bar(id: Int) => id
  case Baz(id) => id
~~~
275+
276+
On the other hand, if it is decided that each bound variable ought to be the same type, then arguably "sharing" explicit type ascriptions across branches would reduce boilerplate.
277+
278+
#### Missing variables
279+
280+
Unlike in other languages, we could assign a type, `A | Null`, to a bind variable which is not present in all of the alternative branches. Rust, for example, is constrained by the fact that the size of a variable must be known and untagged unions do not exist.
281+
282+
Arguably, missing a variable entirely is more likely to be an error: the absence of a requirement for `var` declarations before assigning variables in Python means that beginners can easily assign a value to the wrong variable.
283+
284+
It may be that the enforcement of having the same bind variables within each branch ought to be left to a linter rather than a hard restriction within the language itself.
285+
286+
## Specification
287+
288+
We do not believe there are any syntax changes since the current specification already allows the proposed syntax.
289+
290+
We propose that the following clauses be added to the specification:
291+
292+
Let $`p_1 | \ldots | p_n`$ be an alternative pattern at an arbitrary depth within a case pattern, and let $`\Gamma_n`$ be the named scope associated with each alternative.
293+
294+
If any alternative $`p_n`$ is a quoted pattern binding a variable or type variable, the alternative pattern is considered invalid. Otherwise, let the named variables introduced within each alternative $`p_n`$ be $`x_i \in \Gamma_n`$ and the unnamed contextual variables within each alternative have the type $`T_i \in \Gamma_n`$.
295+
296+
Each $`p_n`$ must introduce the same set of bindings, i.e. for each $`n`$, $`\Gamma_n`$ must have the same **named** members as $`\Gamma_{n+1}`$ and the set of $`\{T_0, \ldots, T_n\}`$ must be the same.
297+
298+
If $`X_{n,i}`$ is the type of the binding $`x_i`$ within an alternative $`p_n`$, then the consequent type, $`X_i`$, of the
299+
variable $`x_i`$ within the pattern scope, $`\Gamma`$, is the least upper-bound of all the types $`X_{n, i}`$ associated with
300+
the variable $`x_i`$ within each branch.
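
As an illustrative instance of these clauses (not part of the proposed specification text), take the pattern `Bar(x) | Baz(x)` from the Types section, labelling the alternatives $`p_1`$ = `Bar(x)` and $`p_2`$ = `Baz(x)`. The use of `lub` as a function name is an assumption made for this sketch. Both scopes contain the single named member $`x`$, so the pattern is valid, and the type of $`x`$ in the pattern scope works out as:

```math
\Gamma_1 = \{x : \mathtt{Int}\}, \quad \Gamma_2 = \{x : \mathtt{String}\}, \quad X = \mathrm{lub}(\mathtt{Int}, \mathtt{String}) = \mathtt{Int} \mid \mathtt{String}
```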
301+
302+
## Compatibility
303+
304+
We believe the changes would be backwards compatible.
305+
306+
## Related Work
307+
308+
The language feature exists in multiple languages. Of the more popular languages, Rust added the feature in [2021](https://github.com/rust-lang/reference/pull/957) and
309+
Python within [PEP 636](https://peps.python.org/pep-0636/#or-patterns), the pattern matching PEP in 2020. Of course, Python is untyped and Rust does not have sub-typing
310+
but the semantics proposed are similar to this proposal.
311+
312+
Within Scala, the [issue](https://github.com/scala/bug/issues/182) was first raised in 2007. The author is also aware of attempts to fix this issue by [Lionel Parreaux](https://github.com/dotty-staging/dotty/compare/main...LPTK:dotty:vars-in-pat-alts) and the associated [feature request](https://github.com/lampepfl/dotty-feature-requests/issues/12) which
313+
was not submitted to the main dotty repository.
314+
315+
The associated [thread](https://contributors.scala-lang.org/t/pre-sip-bind-variables-for-alternative-patterns/6321) has some extra discussion around semantics. Historically, there have been multiple similar suggestions — in [2023](https://contributors.scala-lang.org/t/qol-sound-binding-in-pattern-alternatives/6226) by Quentin Bernet and in [2021](https://contributors.scala-lang.org/t/could-it-be-possible-to-allow-variable-binging-in-patmat-alternatives-for-scala-3-x/5235) by Alexey Shuksto.
316+
317+
## Implementation
318+
319+
The author has a current in-progress implementation focused on the typer which compiles the examples with the expected types. Interested
320+
parties are welcome to see the WIP [here](https://github.com/lampepfl/dotty/compare/main...yilinwei:dotty:main).
321+
322+
### Further work
323+
324+
#### Quoted patterns
325+
326+
More investigation is needed to see how quoted patterns with bind variables in alternative patterns could be supported.
327+
328+
## Acknowledgements
329+
330+
Many thanks to **Zainab Ali** for proof-reading the draft, and to **Nicolas Stucki** and **Guillaume Martres** for their pointers on the dotty
331+
compiler codebase.