Evolution strategies #423

Closed
wants to merge 3 commits into from

zygoloid
Contributor

@zygoloid zygoloid commented Mar 30, 2021

Proposal links (add links as proposal evolves):

  • Evolution links:
    • Proposal PR
    • [RFC topic](TODO)
    • [Decision topic](TODO)
    • [Decision PR](TODO)
    • [Announcement](TODO)
  • Related links (optional):
    • [Idea topic](TODO)
    • [TODO](TODO)

@zygoloid zygoloid requested a review from a team March 30, 2021 06:13
@zygoloid zygoloid added WIP proposal A proposal labels Mar 30, 2021
@google-cla google-cla bot added the cla: yes PR meets CLA requirements according to bot. label Mar 30, 2021
@zygoloid zygoloid changed the title from Lexical extensibility to Evolution strategies Apr 1, 2021

We should make the facilities of this approach available to user code, by
allowing a package to expose automigration tools that will be transparently
applied to its dependents.

Allowing users to write these tools runs a significant risk that the tools will produce versions that are not sufficiently correct.

Contributor Author

@zygoloid zygoloid Apr 1, 2021

Yes. How much should we be worried about that? In some sense, this is "your dependencies can release a new version that breaks you", which I think will be the case regardless, but it does seem like the problem has a different character given that they can break anything in the dependent projects by performing completely arbitrary rewrites.

There's also the aspect that people will be building, executing, and deploying code that literally no-one has ever code-reviewed. I don't think that's abnormal, either -- there are lots of systems that generate code that (most of the time) no-one and nothing looks at the output of other than a compiler -- but again this would be happening at a larger scale than is common.
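A minimal Python sketch of the mechanism discussed in this thread, assuming (hypothetically) that a package ships one migration function per release, keyed by an integer version, and that a dependent upgrading across several releases has those functions applied in order. `migrate_dependent` and the version scheme are illustrative only, not part of the proposal. Each step is arbitrary code that can rewrite anything in the dependent, which is exactly the correctness risk raised above.

```python
from typing import Callable, Dict

# A migration rewrites a dependent's source text. It is arbitrary code.
Migration = Callable[[str], str]


def migrate_dependent(source: str, migrations: Dict[int, Migration],
                      from_version: int, to_version: int) -> str:
    """Apply a dependency's per-release migrations to a dependent, in order.

    Hypothetical model: migrations[v] moves dependents from version v - 1
    to version v. Releases without an entry need no migration.
    """
    for version in range(from_version + 1, to_version + 1):
        step = migrations.get(version)
        if step is not None:
            source = step(source)
    return source


# Illustrative renames in two successive releases of a dependency.
migrations = {
    2: lambda s: s.replace("Draw(", "Render("),
    3: lambda s: s.replace("Render(", "Paint("),
}
print(migrate_dependent("w.Draw();", migrations, 1, 3))  # w.Paint();
```

The steps here are purely textual; a realistic migration would likely need semantic analysis of the dependent.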

migration tool, which may be surprising when relating diagnostics or
behavior back to the original source of an un-migrated package. For example,
source snippets in diagnostics may refer to code that doesn't match the
original source, and debug information may refer to generatd files instead

typo: s/generatd/generated/

Contributor Author

Fixed.

without changing the meaning of any program already in the set. For example,
this might include recognizing a new token that was previously invalid.
- A _removal_, that strictly decreases the set of valid input programs,
without changing the meaening of any program in the set.

s/meaening/meaning/

Contributor Author

Fixed.
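The addition/removal definitions quoted above can be made concrete with a small Python model, assuming a "language" is represented as a function from program text to a meaning, or `None` when the program is invalid. `classify_change` is hypothetical, and because it only checks a finite corpus of programs it can refute a claimed addition or removal but cannot prove one.

```python
from typing import Callable, Iterable, Optional

# A language maps a program to its meaning, or None if the program is invalid.
Language = Callable[[str], Optional[str]]


def classify_change(old: Language, new: Language, corpus: Iterable[str]) -> str:
    """Classify old -> new as an addition, a removal, or neither, on a corpus."""
    grows = shrinks = False
    for program in corpus:
        before, after = old(program), new(program)
        if before is not None and after is not None and before != after:
            return "neither"  # meaning changed for a program valid in both
        if before is None and after is not None:
            grows = True      # a previously invalid program became valid
        if before is not None and after is None:
            shrinks = True    # a previously valid program became invalid
    if grows and shrinks:
        return "neither"
    if grows:
        return "addition"
    if shrinks:
        return "removal"
    return "unchanged"


# Toy languages: programs made only of "a", versus of "a" and "b".
only_a = lambda p: p.upper() if set(p) <= {"a"} else None
a_or_b = lambda p: p.upper() if set(p) <= {"a", "b"} else None
print(classify_change(only_a, a_or_b, ["a", "aa", "b", "ab"]))  # addition
```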

between options that otherwise provide similar value, we should prefer using
more expensive migration strategies over selecting an inferior end state.

When a choice of strategies is available, purely additive changes should be

I don't know that I believe that purely additive changes should be preferred. There is value in having a small core.

Contributor Author

My intent is that this only applies subject to the "primary driver is the intended end state" above: only when the choices are largely equal on other merits should we consider this factor. I think I can express that more clearly.


### Non-strategy: simultaneous migration

A number of strategies that require making simultaneous chanegs to multiple

s/chanegs/changes/

Contributor Author

Fixed.

strategy. Therefore there is no requirement to reserve any lexical space to
prepare for future changes.

Therefore, we will no longer require whitespace after the `//` introducing a

What about `//*` or `//-`? Don't we need to leave those open as potential new operators?

Contributor Author

If we were to add such an operator, we could migrate all existing uses of `//*` or `//-` that introduce a comment to add whitespace after the `//`, as a point change. (I think we'd probably want to do something smarter, like looking for the enclosing sequence of consecutive comment lines and adding whitespace after the comment introducer across all of them. But in any case I think this can be handled as a point change.)

I think, broadly, if we can model an anticipated direction of evolution as point changes, we shouldn't try to guess what changes we'll want to make, because the cost of making those changes is sufficiently small. (For example, let's not proactively reserve a bunch of words that we think might be keywords, if we think the cost of reclaiming an identifier as a keyword is small.) If, on the other hand, an anticipated direction of evolution would require an incremental migration in response to changes, then we should be thinking about how to make such future changes easier.
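A minimal Python sketch of the point change described in this reply, assuming a toy lexical model in which a comment starts at the first `//` outside a double-quoted string literal and runs to the end of the line. It performs only the simple per-line rewrite, not the smarter whole-comment-block variant mentioned in the parenthetical; `NEW_TOKENS` and the function names are hypothetical.

```python
# Hypothetical comment introducers that a future operator would collide with.
NEW_TOKENS = ("//*", "//-")


def add_space_after_introducer(line: str) -> str:
    """Rewrite a `//*` or `//-` comment start to `// *` or `// -`.

    Toy scanner: skips double-quoted string literals (with backslash
    escapes), so a `//*` inside a string is left untouched.
    """
    i = 0
    in_string = False
    while i < len(line):
        c = line[i]
        if in_string:
            if c == "\\":
                i += 2  # skip the escaped character
                continue
            if c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif line.startswith("//", i):
            # Comment introducer found; the rest of the line is comment text.
            if line.startswith(NEW_TOKENS, i):
                return line[: i + 2] + " " + line[i + 2:]
            return line
        i += 1
    return line


def migrate_comments(source: str) -> str:
    """Apply the point change to every line of a source file."""
    return "\n".join(add_space_after_introducer(l) for l in source.split("\n"))


print(migrate_comments('x = 1; //*note'))  # x = 1; // *note
```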

Contributor

FWIW, I think I may believe a little more than you that we should encourage reserving lexical space so that more future changes can be purely additive changes, even if we choose not to pursue them.

Right now it may not be worth guessing what changes we'll want to make -- Carbon is small, and if we added a `//*` token probably nothing would be broken, so we wouldn't really make a tool. However, as Carbon grows, those costs shift -- I think reserving lexical space is going to be cheaper than writing and running migrations (note this is also a burden to users who need to update their code). Thus point changes actually have a more significant cost long-term, pushing more for reserving lexical space.

So yeah, right now, don't reserve `//*`. If Carbon goes public and we still haven't really decided, reserve `//*`.

Contributor Author

Hmm. I think I'd got too anchored to point changes being substantially cheaper than incremental changes and I'd lost sight of additive changes being substantially cheaper than point changes (a point change still churns the entire Carbon ecosystem as the migration tool is applied, in addition to the disadvantages listed in this proposal, whereas an additive change does not). Reserving lexical space to turn point changes into additive changes makes a lot of sense to me, but I agree that we don't need to do so now.

Contributor

@jonmeow jonmeow left a comment

LGTM, I essentially agree with the point you're making here, for now at least.

addition, that in this instance occurs concurrently with the completion of
the removal phase and the removal of the `upcoming` marker.

### Guidance
Contributor

Structural comment, not necessary to address in this proposal, but from a BLUF writing perspective the guidance feels like the bottom-line of this proposal, and thus how it should begin, rather than at the tail end.

Comment on lines +302 to +305
In order to support changes to an interface, we allow newly-added methods to be
marked as `upcoming`. This indicates that the method is not required, and indeed
cannot be called (except by other `upcoming` functionality), but can be
implemented. Then the addition of an interface method can be staged as follows:
Contributor

Is this about a change to the language, or a feature to allow evolution of user-defined interfaces? It feels like the document is mostly talking about the former, but this seems to be about the latter.

Contributor Author

My primary focus when writing this document was about evolving the language and its standard library, but my intent was to cover both that and the needs of people evolving non-leaf packages implemented in Carbon. That said, I'd expect that things that people evolving Carbon software need are also things that we need to evolve the standard library.

Comment on lines +307 to +317
- A method is introduced, declared `upcoming`. This is an addition, as
strictly more programs become valid.
- The intent to remove the `upcoming` marker is announced -- in this case,
implicitly, as all `upcoming` markers indicate an intent to remove the
marker. The removal period for this `upcoming` marker begins.
- Over time, the method is implemented by all implementers of the interface.
- The `upcoming` marker is removed. This is a removal, as it results in
strictly fewer programs being valid.
- Once the removal is complete, the new method can be used. This is an
addition, that in this instance occurs concurrently with the completion of
the removal phase and the removal of the `upcoming` marker.
Contributor

This example doesn't use a default implementation of the upcoming method. With a default, the new function can be used with much less latency. This may be painting incremental changes in an unfair light.

Contributor Author

If there is a correct and generally-applicable default, the transition can be done as a point change, or perhaps even as a pure addition. I'm happy to switch to a different example; this one might be unhelpful by being similar to something we've been considering but with somewhat different details.
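The staging sequence for an `upcoming` interface method (the case with no usable default implementation) can be modeled in Python as a sketch. The `Interface` class, its method names, and the representation of implementations as sets of method names are all hypothetical; the model tracks only which programs are valid at each step, not Carbon semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Interface:
    """Toy model of an interface whose methods may be marked `upcoming`."""
    # Maps method name -> True while the method is still marked `upcoming`.
    methods: Dict[str, bool] = field(default_factory=dict)

    def add_upcoming(self, name: str) -> None:
        # Addition: existing impls stay valid because `upcoming` methods
        # are optional to implement.
        self.methods[name] = True

    def impl_is_valid(self, implemented: Set[str]) -> bool:
        # An impl must provide every method that is no longer `upcoming`.
        required = {m for m, upcoming in self.methods.items() if not upcoming}
        return required <= implemented

    def can_call(self, name: str, from_upcoming_code: bool = False) -> bool:
        # `upcoming` methods may only be called by other `upcoming` code.
        return not self.methods[name] or from_upcoming_code

    def remove_upcoming(self, name: str, impls: Dict[str, Set[str]]) -> None:
        # Removal: only safe once every known implementer provides the
        # method; otherwise those impls would become invalid.
        missing = sorted(t for t, ms in impls.items() if name not in ms)
        if missing:
            raise ValueError(f"{name} not yet implemented by: {missing}")
        self.methods[name] = False
```

In this model, `add_upcoming` never invalidates an existing impl, while `remove_upcoming` refuses to run until every implementer has caught up, mirroring the addition and removal phases in the list above.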


All subsequent builds using the new toolchain first migrate the source code to
the new syntax, and then pass it to the new toolchain, which only understands
the new syntax.
Contributor

@gribozavr gribozavr Apr 6, 2021

How do you see this migration tool being implemented and released in practice? It would need to be built on the T-1 grammar and semantics. IIUC, the migration tool is separated from the compiler so that the compiler (and hence the grammar at time T) only needs to handle the new syntax. However, the compiler at T needs to ship with a migration tool that understands T-1 syntax and semantics, and such handling is no longer available in the current compiler libraries. So the migration tool can't use the T compiler as a library; it needs T-1.

It seems to me that either the migration tool will need to be built from a different branch than the compiler, or they can be built from the same source code but with different feature flags enabled. I think branch-based development of the migrator will be a non-starter for the Carbon toolchain development process. If we use feature flags, we might as well, for many migrations, allow the new compiler to understand both kinds of syntax (package migration flags will determine whether the old or new syntax is actually accepted).
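A toy Python sketch of the flag-based model suggested here, in which the compiler and the migrator share one frontend and a per-package edition selects the rules. The edition names, the keyword sets, and the trailing-underscore renaming scheme are hypothetical illustrations; a real migrator would also need to avoid colliding with names that already exist.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical editions and their keyword sets, for illustration only.
KEYWORDS: Dict[str, FrozenSet[str]] = {
    "2021.4": frozenset({"fn", "var"}),
    "2021.5": frozenset({"fn", "var", "upcoming"}),
}


def lex(source: str, edition: str) -> List[Tuple[str, str]]:
    """Toy lexer: whitespace-separated words, keyword set chosen by edition."""
    keywords = KEYWORDS[edition]
    return [("keyword" if w in keywords else "identifier", w)
            for w in source.split()]


def migrate_edition(source: str, old: str, new: str) -> str:
    """Lex under the old edition, renaming identifiers that become keywords."""
    words = []
    for kind, word in lex(source, old):
        if kind == "identifier" and word in KEYWORDS[new]:
            word += "_"  # hypothetical renaming scheme
        words.append(word)
    return " ".join(words)


print(migrate_edition("var upcoming", "2021.4", "2021.5"))  # var upcoming_
```

Because both editions live in the same `KEYWORDS` table, the same frontend can compile old-edition packages directly and also drive the migration, without a separate branch.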

While this distinction might look like an implementation detail, I think it is user-visible, as it mitigates a number of the disadvantages described above. Rust's editions are very similar to this flag-based model. For example, the RFC for Rust 2021 says:

  • Editions are used to introduce changes into the language that would otherwise have the potential to break existing code, such as the introduction of a new keyword.
  • Editions are never allowed to split the ecosystem. We only permit changes that still allow crates in different editions to interoperate.
  • Editions are named after the year in which they occur (e.g., Rust 2015, Rust 2018, Rust 2021).
  • When we release a new edition, we also release tooling to automate the migration of crates. Some manual work may be required but that should be uncommon.
  • The nightly toolchain offers "preview" access to upcoming editions, so that we can land work that targets future editions at any time.
  • We maintain an Edition Migration Guide that offers guidance on how to migrate to the next edition.
  • Whenever possible, new features should be made to work across all editions.

Note that editions allow for removals following a deprecation cycle (see RFC 2052):

When opting in to a new edition, existing deprecations may turn into hard errors, and the compiler may take advantage of that fact to repurpose existing usage, e.g. by introducing a new keyword. This is the only kind of breaking change a edition opt-in can make.

In a different place Nico clarifies:

The language of the RFC was very clear that you should get warnings in the latest compiler release. This basically means that so long as we have the migration lints, we're ok. It's not required that the warnings are there for the entire edition or anything.

I think it is very similar to our goals and our migration strategy. The only real differences I'd propose:

  • Carbon should name editions after the month of the release date (e.g., 2021.4) to allow for faster evolution.
  • The Carbon toolchain will support the latest edition, and non-latest editions for at least a certain amount of time (e.g., 6 months). Support for older editions will be dropped depending on the maintenance cost and user demand.
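The naming and support-window suggestions above can be sketched in Python. This assumes, hypothetically, that the minimum window is measured from the release of an edition's successor, and it computes only the editions that must still be supported; dropping editions beyond that remains a judgment call based on maintenance cost and user demand. Parsing the names numerically matters, because as plain strings "2021.10" sorts before "2021.4".

```python
from typing import List, Tuple


def parse(name: str) -> Tuple[int, int]:
    """Parse a hypothetical "YEAR.MONTH" edition name such as "2021.4"."""
    year, month = name.split(".")
    if not 1 <= int(month) <= 12:
        raise ValueError(f"bad month in edition {name!r}")
    return int(year), int(month)


def months_between(earlier: str, later: str) -> int:
    (y1, m1), (y2, m2) = parse(earlier), parse(later)
    return (y2 - y1) * 12 + (m2 - m1)


def guaranteed_editions(released: List[str], today: str,
                        window_months: int = 6) -> List[str]:
    """Editions that the minimum support window still requires today."""
    if not released:
        return []
    ordered = sorted(released, key=parse)  # numeric, not string, order
    supported = [ordered[-1]]  # the latest edition is always supported
    for edition, successor in zip(ordered, ordered[1:]):
        # A superseded edition stays supported for `window_months` after
        # its successor is released.
        if months_between(successor, today) <= window_months:
            supported.append(edition)
    return sorted(supported, key=parse)


print(guaranteed_editions(["2021.4", "2021.10", "2022.3"], "2022.5"))
```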

For migrating users of libraries over API changes we have the same issue. If libUiFramework releases v2 that requires a migration from v1, then to migrate a libCustomWidget we need libUiFramework v1 just to do semantic analysis of libCustomWidget before the migration, and v2 immediately after the migration to actually compile it. I think this is going to be difficult without a widely adopted package manager and build system; it might be more practical to see if all the necessary migration information can be included into just the libUiFramework v2.

In order to support changes to an interface, we allow newly-added methods to be
marked as `upcoming`. This indicates that the method is not required, and indeed
cannot be called (except by other `upcoming` functionality), but can be
implemented. Then the addition of an interface method can be staged as follows:
Contributor

Suggested change
implemented. Then the addition of an interface method can be staged as follows:
implemented. Then the addition of an interface method for which no default implementation is possible can be staged as follows:

@jonmeow jonmeow marked this pull request as draft April 20, 2021 16:17
@jonmeow jonmeow removed the WIP label Apr 20, 2021
@github-actions

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active, please comment or remove the inactive label.
This PR is labeled inactive because the last activity was over 90 days ago. This PR will be closed and archived after 14 additional days without activity.

@github-actions github-actions bot added the inactive Issues and PRs which have been inactive for at least 90 days. label Jul 28, 2021
@github-actions

We triage inactive PRs and issues in order to make it easier to find active work. If this PR should remain active or becomes active again, please reopen it.
This PR was closed and archived because there has been no new activity in the 14 days since the inactive label was added.

@github-actions github-actions bot closed this Aug 12, 2021
@github-actions github-actions bot added the proposal deferred Decision made, proposal deferred label Jul 25, 2022
Labels
cla: yes PR meets CLA requirements according to bot. inactive Issues and PRs which have been inactive for at least 90 days. proposal deferred Decision made, proposal deferred proposal A proposal

5 participants