npm packaging requirements #34

Closed
ashleygwilliams opened this issue Jan 30, 2018 · 15 comments

@ashleygwilliams
Member

ashleygwilliams commented Jan 30, 2018

moving this out of #5 because that's got a lot of discussion that's not quite on this topic.

to package up wasm for npm, we'll need these things for a package.json:

{
    "name": String,        // name of the package
    "version": String,     // version of the package
    "main": String,        // the primary file
    "files": [             // list of files to include in the package
        "path/to/js",
        "path/to/otherstuff?"
    ],
    "dependencies": {      // list of npm pkgs that the pkg depends on
        "pkgname": "version"
    }
}

there's other metadata such as repo, author, contributors, that we may also want to consider, but the above is what is needed for bare minimum packaging. for more info: https://docs.npmjs.com/files/package.json

@chicoxyzzy
Contributor

Is "files" field really necessary here?

@Pauan

Pauan commented Jan 30, 2018

@chicoxyzzy No, the files field isn't mandatory, and it is rarely used.

Instead of using files, most projects use .gitignore and/or .npmignore to exclude files from the npm package.

@lukewagner
Contributor

Below is a fairly idiomatic (relative to webassembly/design/BinaryEncoding.md) binary encoding of the above JSON schema (with my arbitrary choice of package.json as the custom section name open for discussion).

That being said, I wonder whether it's better to simply stick the (UTF-8-encoded bytes of the) JSON blob literally as the payload of the custom section. Then extracting this from a given WebAssembly.Module m is as simple as:

  1. calling WebAssembly.Module.customSections(m, "package.json"), which returns an array of ArrayBuffers, and taking the ArrayBuffer of UTF-8-encoded bytes b
  2. calling JSON.parse(new TextDecoder().decode(b)) to get the JSON.

Thoughts? I'm actually inclined toward the latter... @alexcrichton ?


npm package custom section

As a custom section,

  • the id is 0,
  • the name field is the UTF-8 byte sequence package.json,
  • the payload_data field contains the following record:
Field             Type       Description
flags             varuint32  Bitmask initially required to be 0 that later allows adding fields
name              string     name of the package
version           string     version of the package
main              string     the primary file
num_files         varuint32  number of file strings that follow
files             string*    list of files to include in the package
num_dependencies  varuint32  number of dependency strings that follow
dependencies      string*    list of npm pkgs that the pkg depends on

string

Field     Type       Description
name_len  varuint32  length of name_str in bytes
name_str  bytes      UTF-8 encoding of string
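
As a rough illustration of the record layout above, here is a minimal Rust sketch that encodes and decodes the payload_data bytes. It is only a sketch under assumptions: the names PackageRecord, encode and decode are hypothetical, varuint32 is taken to be unsigned LEB128 as in the WebAssembly binary format, and each dependency is assumed to be flattened into a single string such as "left-pad@^1.0.0", which the table does not specify.

```rust
/// Package metadata mirroring the fields of the proposed record (hypothetical type).
#[derive(Debug, PartialEq, Clone)]
pub struct PackageRecord {
    pub name: String,
    pub version: String,
    pub main: String,
    pub files: Vec<String>,
    pub dependencies: Vec<String>,
}

/// Append `value` as unsigned LEB128 (the wasm `varuint32` encoding).
pub fn write_varuint32(out: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

/// Read a `varuint32` at `*pos`, advancing `*pos` past it.
pub fn read_varuint32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        if shift > 28 {
            return None; // more than five bytes cannot be a varuint32
        }
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Write the `string` record: byte length, then the UTF-8 bytes.
pub fn write_string(out: &mut Vec<u8>, s: &str) {
    write_varuint32(out, s.len() as u32);
    out.extend_from_slice(s.as_bytes());
}

/// Read a `string` record at `*pos`, advancing `*pos` past it.
pub fn read_string(bytes: &[u8], pos: &mut usize) -> Option<String> {
    let len = read_varuint32(bytes, pos)? as usize;
    let end = (*pos).checked_add(len)?;
    let slice = bytes.get(*pos..end)?;
    *pos = end;
    String::from_utf8(slice.to_vec()).ok()
}

/// Encode the record in the field order given in the table.
pub fn encode(record: &PackageRecord) -> Vec<u8> {
    let mut out = Vec::new();
    write_varuint32(&mut out, 0); // flags: required to be 0 for now
    write_string(&mut out, &record.name);
    write_string(&mut out, &record.version);
    write_string(&mut out, &record.main);
    write_varuint32(&mut out, record.files.len() as u32);
    for file in &record.files {
        write_string(&mut out, file);
    }
    write_varuint32(&mut out, record.dependencies.len() as u32);
    for dep in &record.dependencies {
        write_string(&mut out, dep);
    }
    out
}

/// Decode a record; returns None on truncated input, bad UTF-8, or non-zero flags.
pub fn decode(bytes: &[u8]) -> Option<PackageRecord> {
    let mut pos = 0;
    if read_varuint32(bytes, &mut pos)? != 0 {
        return None;
    }
    let name = read_string(bytes, &mut pos)?;
    let version = read_string(bytes, &mut pos)?;
    let main = read_string(bytes, &mut pos)?;
    let mut files = Vec::new();
    for _ in 0..read_varuint32(bytes, &mut pos)? {
        files.push(read_string(bytes, &mut pos)?);
    }
    let mut dependencies = Vec::new();
    for _ in 0..read_varuint32(bytes, &mut pos)? {
        dependencies.push(read_string(bytes, &mut pos)?);
    }
    Some(PackageRecord { name, version, main, files, dependencies })
}
```

Calling decode(&encode(&record)) should give back the original record, which is a quick way to check that a writer and a reader agree on the layout.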

@alexcrichton
Contributor

Heh yeah I'd be totally ok going with just raw JSON here in the custom sections.

One point I'd want to clarify though, where do we think this'll happen? Depending on what happens we may not necessarily want the whole schema in a section of the wasm executable, but I'd wanna gut check my thinking.

Our eventual end state would be something along the lines of:

  • You're publishing a library to npm, and this library can consist of a bunch of Rust crates compiled to wasm.
  • Any crate in the crate graph could depend on an npm package.
  • Any crate in the crate graph may also have "custom js" it needs available to it
  • The final wasm blob is probably created via LLD (or some wasm linker thing)
  • This tool @ashleygwilliams is thinking of is run at the final point before actually publishing to npm.

Does that sound right? If so I think the only parts we'd need from the wasm blob which may affect package.json are the custom JS files to include and npm dependencies. The name/version/main fields may be generated at the final step (perhaps main through the bindgen business).

In that sense I was thinking that the wasm custom sections would be engineered to be concatenable where each crate would have an optional custom section listing its "custom js" or npm dependencies, and the linker would naturally binary-concatenate all these sections into one when producing the final output.

I may be confused though!

@Pauan

Pauan commented Jan 31, 2018

Am I missing something? Why are we talking about inserting package.json into the WebAssembly?

package.json is a separate file that describes the package metadata; it's the npm equivalent of Cargo.toml.

And linking all the npm dependencies together should probably be the responsibility of a bundler like Webpack or Parcel, not Rust.

@aturon
Contributor

aturon commented Jan 31, 2018

@alexcrichton

Just to check: in the case where we have multiple crates with individual package.json files, it's not enough to simply concatenate their contents; we need to apply a "semver constraint intersection" to "flatten" into a coherent final set of dependencies. Is your idea that this flattening would take place within the publication tool, which would read out a bunch of separate package.json custom sections and flatten them?
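
To make the "semver constraint intersection" above concrete, here is a deliberately simplified Rust sketch. It is an assumption-laden illustration, not how any real tool does it: it understands only caret constraints of the form ^MAJOR.MINOR.PATCH, whereas real npm ranges are much richer (tilde, x-ranges, ||, prereleases), and the names Range, parse_caret, intersect and flatten are hypothetical.

```rust
use std::cmp::{max, min};
use std::collections::BTreeMap;

/// A version as (major, minor, patch); tuples compare lexicographically.
pub type Version = (u64, u64, u64);

/// Half-open version range: `lo` is inclusive, `hi` is exclusive.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Range {
    pub lo: Version,
    pub hi: Version,
}

/// Parse "MAJOR.MINOR.PATCH" with exactly three numeric components.
pub fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Parse a caret constraint such as "^1.2.3" into the range it allows.
pub fn parse_caret(s: &str) -> Option<Range> {
    let lo = parse_version(s.strip_prefix('^')?)?;
    // Caret allows changes that do not modify the left-most non-zero component.
    let hi = match lo {
        (0, 0, patch) => (0, 0, patch + 1),
        (0, minor, _) => (0, minor + 1, 0),
        (major, _, _) => (major + 1, 0, 0),
    };
    Some(Range { lo, hi })
}

/// Intersect two ranges; None means no version satisfies both.
pub fn intersect(a: Range, b: Range) -> Option<Range> {
    let lo = max(a.lo, b.lo);
    let hi = min(a.hi, b.hi);
    if lo < hi {
        Some(Range { lo, hi })
    } else {
        None
    }
}

/// Flatten the dependency lists of several crates into one constraint per package.
pub fn flatten(crates: &[Vec<(&str, &str)>]) -> Result<BTreeMap<String, Range>, String> {
    let mut merged: BTreeMap<String, Range> = BTreeMap::new();
    for deps in crates {
        for &(name, constraint) in deps {
            let range = parse_caret(constraint)
                .ok_or_else(|| format!("unsupported constraint for {}: {}", name, constraint))?;
            let combined = match merged.get(name) {
                Some(&existing) => intersect(existing, range)
                    .ok_or_else(|| format!("conflicting constraints for {}", name))?,
                None => range,
            };
            merged.insert(name.to_string(), combined);
        }
    }
    Ok(merged)
}
```

With two crates asking for ^1.2.0 and ^1.3.1 of the same package, flatten narrows the result to versions from 1.3.1 up to, but not including, 2.0.0; asking for ^1.0.0 and ^2.0.0 together yields an error instead.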

@Pauan I believe the main rationale for using custom sections to store this data is that we will then be able to avoid a lot of special-case tooling, e.g. when linking crates. Even the npm publication tool is expected to be language-agnostic; each compiler will produce a custom section for dependencies in the form the tool expects.

It's not the bundler's job, because this all needs to happen prior to publishing to npm. That is, we need to determine a package.json for publication, well before a bundler is involved.

@est31

est31 commented Jan 31, 2018

I don't think the compiler (or cargo) should get involved in choosing javascript package managers. There isn't just npm, there are also other package managers for javascript. This should definitely be a separate thing, and it should also be possible to turn it off completely for the case where you don't want to publish on npm but use the wasm file directly, especially if you want to use it without using any bundler or similar.

@aturon
Contributor

aturon commented Jan 31, 2018

@est31 That's the plan. The idea is just to have a mechanism for recording data into custom sections for consumption by various tools.

@alexcrichton
Contributor

@aturon

Just to check: in the case where we have multiple crates with individual package.json files, it's not enough to simply concatenate their contents

Oh sure, of course! I was mostly referring to the literal binary representation where the linker (I'd assume at least) would just bytewise concatenate similarly named sections from each module into one at the end. In that sense raw JSON may not work as they're not byte-wise concatenatable but a binary form with lengths and such should work.

In other words the tool which is taking a wasm file and generating a package.json needs to basically iterate over the requests each of the inputs to the final wasm module had, but after this iteration it'd for sure do the resolution like you're mentioning.
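
The concatenation point can be sketched in Rust as follows. Two JSON blobs placed back to back, such as {"a":1}{"b":2}, are not valid JSON, but length-prefixed entries remain parseable after a bytewise join. This is only an illustration under assumptions: the function names are hypothetical, each entry is assumed to be a string like "left-pad@^1.0.0", and for brevity the length is a fixed 4-byte little-endian integer rather than a varuint32.

```rust
/// Encode one entry as a 4-byte little-endian length followed by UTF-8 bytes.
pub fn encode_entry(entry: &str) -> Vec<u8> {
    let mut out = (entry.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(entry.as_bytes());
    out
}

/// Walk a payload made of back-to-back entries, as bytewise concatenation
/// of per-crate sections would produce. Returns None on malformed input.
pub fn decode_entries(payload: &[u8]) -> Option<Vec<String>> {
    let mut entries = Vec::new();
    let mut rest = payload;
    while !rest.is_empty() {
        if rest.len() < 4 {
            return None; // truncated length prefix
        }
        let (len_bytes, tail) = rest.split_at(4);
        let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]])
            as usize;
        if tail.len() < len {
            return None; // truncated entry body
        }
        let (body, remaining) = tail.split_at(len);
        entries.push(String::from_utf8(body.to_vec()).ok()?);
        rest = remaining;
    }
    Some(entries)
}
```

A tool reading the final module would call decode_entries on the joined payload and only then run version resolution over the collected requests.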

@aturon
Contributor

aturon commented Jan 31, 2018

@alexcrichton Great, that's exactly what I hoped! Sounds very clean.

@Pauan

Pauan commented Feb 1, 2018

@aturon I believe the main rationale for using custom sections to store this data is that we will then be able to avoid a lot of special-case tooling, e.g. when linking crates. Even the npm publication tool is expected to be language-agnostic; each compiler will produce a custom section for dependencies in the form the tool expects.

It's not the bundler's job, because this all needs to happen prior to publishing to npm. That is, we need to determine a package.json for publication, well before a bundler is involved.

I'm sorry, I'm still not understanding, could you elaborate some more?

My understanding is that this is how the process should work:

Let's say somebody wants to write some Rust code and then create a foo package and publish the foo package to npm. They would follow these steps:

  1. Write some Rust code:

    // This means that we're exporting a function to WebAssembly
    #[wasm_export]
    pub extern fn foo() -> i32 {
        0
    }
  2. Compile that Rust code to a foo.wasm file.

  3. Create a package.json file which contains the usual npm metadata:

    {
      "name": "foo",
      "version": "0.1.0",
      "main": "./foo.wasm"
    }
  4. Run the npm publish command, which is the standard way of publishing npm packages.

Okay, great, they're done!

Now, somebody else wants to consume that foo package. There might be all sorts of different consumers: WebAssembly, JavaScript, TypeScript, etc.

For the sake of this example, let's assume that they want to consume that foo package in Rust. They would follow these steps:

  1. Write some Rust code:

    // This means that we're importing the `foo` npm package
    #[wasm_module = "foo"]
    extern {
        fn foo() -> i32;
    }
    
    // Use foo in some way
  2. Compile that Rust code to a bar.wasm file.

  3. Create a package.json file which uses the foo package as a dependency:

    {
      "name": "bar",
      "version": "0.1.0",
      "main": "./bar.wasm",
      "devDependencies": {
        "foo": "^0.1.0"
      }
    }
  4. Run the npm install command.

  5. Bundle the bar.wasm file using Parcel, Webpack, etc.

And that's it, everything should Just Work(tm). All of the packaging, bundling, and linking is done by external tools (npm and Webpack/Parcel/etc.). I think this is the only way that Rust can seamlessly work with npm.

The only thing that rustc needs to do is that when you use #[wasm_export] it creates a wasm export, and when you use #[wasm_module = "foo"] it creates a wasm import. Not some special Rust-specific import/export, just a regular wasm import/export. No custom sections or metadata needed.

What if you don't want to use wasm_export and wasm_module and write all the extern stuff? Well, in that case my recommendation would be to publish a Rust package on Cargo, and then statically link it with other Rust packages (which are also obtained from Cargo).

This linking is handled entirely by rustc / llvm, it's just the normal workflow that Rust programmers are currently using. You can then publish a single .wasm file (which contains all of the statically linked Rust packages) to npm (by following the above steps).

Or perhaps rustc could support a sort of "dynamic linking" where it creates multiple .wasm files (one for each crate), and those .wasm files use wasm imports to import each other. Then you can publish those multiple .wasm files as a single npm package.

But in any case, if you want to use (or publish) a WebAssembly module, you'll have to use the wasm_export and wasm_module stuff, and rely upon an external WebAssembly linker (like Webpack or Parcel).

Is my above understanding correct, or do you have something different in mind?

@Pauan

Pauan commented Feb 1, 2018

By the way, what I said above is assuming Rust has no built-in npm integration. If we wanted to integrate npm (and I think we should), then I imagine this is how it would work:

If somebody wants to write some Rust code and publish it as the npm package foo, they would do this:

  1. Write some Rust code:

    // This means that we're exporting a function to WebAssembly
    #[wasm_export]
    pub extern fn foo() -> i32 {
        0
    }
  2. Add the following to their Cargo.toml file:

    [npm]
    name = "foo"
    version = "0.1.0"
  3. Run cargo npm publish --release --target wasm32-unknown-unknown (or whatever other target they want)

And they're done!

When they run cargo npm publish, Cargo will:

  1. Create a folder which is used for the publishing (this would probably be something like target/npm)

  2. Build the project and move the compiled WebAssembly file(s) into target/npm

  3. Create a package.json file in target/npm (which includes the fields specified in [npm])

  4. Run the npm publish command.
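
Step 3 of that list could look roughly like the Rust sketch below, which renders a package.json from the [npm] fields. Everything here is hypothetical: cargo npm publish is only a proposal in this thread, the function names are made up, and the main field is an assumption since the [npm] example above lists only name and version.

```rust
/// Quote `s` as a JSON string literal, escaping the characters JSON requires.
pub fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render a minimal package.json from the given fields.
pub fn render_package_json(name: &str, version: &str, main: &str) -> String {
    format!(
        "{{\n  \"name\": {},\n  \"version\": {},\n  \"main\": {}\n}}\n",
        json_string(name),
        json_string(version),
        json_string(main)
    )
}
```

The resulting string would be written to target/npm/package.json before npm publish runs.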

All of this is an implementation detail of Cargo, so the user doesn't (and shouldn't!) need to worry about it. The user simply needs to use cargo npm publish and everything Just Works(tm).


When consuming an npm package in Rust, they would follow these steps:

  1. Write some Rust code:

    // This means that we're importing the `foo` npm package
    #[wasm_module = "foo"]
    extern {
        fn foo() -> i32;
    }
    
    // Use foo in some way
  2. Add the following to their Cargo.toml file:

    [npm.devDependencies]
    foo = "^0.1.0"
  3. Run cargo build as usual.

And they're done!

When they specify [npm.devDependencies] (or [npm.dependencies], [npm.peerDependencies], [npm.bundledDependencies], [npm.optionalDependencies], etc.) Cargo will:

  1. Create a folder which is used for npm (this would probably be something like target/npm)

  2. Create a package.json file in target/npm which contains the [npm.devDependencies] fields.

  3. Run the npm install command.

  4. Add the package-lock.json file into Cargo.lock (this ensures that npm dependencies are deterministic, just like Cargo dependencies).

    If package-lock.json already exists in Cargo.lock then it should copy it into the target/npm folder instead of generating a new package-lock.json file (this copying needs to be done before running npm install).

  5. Build the Rust project and move the WebAssembly file(s) into target/npm

  6. Run Parcel (or Webpack or whatever) to create the final fully-linked output. Parcel/Webpack will automatically resolve the npm packages to the target/npm/node_modules folder (e.g. target/npm/node_modules/foo), so that doesn't need to be done by Rust or Cargo.

Once again, this is all internal implementation details that the user shouldn't need to know about.

Fundamentally, it's doing the same steps that I described in my previous post, except that those steps have been integrated into Cargo so that it's easier for the programmer to use.

Things get trickier when a Rust package qux specifies [npm.devDependencies], and it uses a Rust package corge, and corge also specifies [npm.devDependencies]. There are basically two options:

  1. Concatenate the [npm.devDependencies] together and resolve them as if it were a single npm package. This is the more Rust-y way of doing things, but it means you will get build failures if there is a version conflict (and version conflicts will be very common if you use this approach).

  2. Treat each Rust package as if it were a separate npm package, so the versions get resolved separately. This is how npm does things. It avoids version conflicts, but it can create extra code bloat. Although I don't like this way of doing things, npm generally relies upon this behavior, so this is probably the correct option.

If you don't want to integrate this functionality directly into Cargo, we could make a separate third-party cargo-npm command which does the above steps (similar to how we have cargo-web right now).


Why am I suggesting to use npm install and Parcel/Webpack, rather than handling it entirely in rustc/Cargo? Because a Rust package that wants to integrate with the npm ecosystem must do that.

There are a wide variety of packages on npm: JavaScript code, TypeScript code, WebAssembly created with C++, WebAssembly created with Rust, hand-written WebAssembly, JavaScript code importing WebAssembly code, WebAssembly code importing JavaScript code, code intended for the browser only, code intended for Node only, code that works with the browser or Node, code that relies upon quirky behavior of npm, code that relies upon quirky behavior of Webpack, code that injects JSON/CSS/HTML into the final bundle, code which is dynamically imported at runtime (code-splitting), code which uses Webpack plugins, code which relies upon multiple different versions of npm packages existing simultaneously, etc.

Bundlers (such as Webpack or Parcel) have been designed to deal with this complexity; they are the de facto standard in the npm/JavaScript communities. They handle all of the above situations (and more!). Trying to replicate that behavior in rustc/Cargo is simply not practically viable. Even just trying to replicate npm's package version resolution mechanism is very tricky (people have tried).

@aturon
Contributor

aturon commented Feb 1, 2018

@Pauan Thanks for elaborating your thoughts! FWIW, I actually think we are all largely in agreement here, and there's just a bit of context that's missing re: what's being spelled out in this issue.

First, and most important: did you read and understand the pipeline diagram? That's the clearest documentation we currently have for the intended pipeline here, and I think it addresses many of your concerns.

But let me also try to explain things in text form.

First off, we very much want to let npm and the bundlers do their work. The goal of this issue and the related one on expressing imports is just to figure out how to tell npm, and ultimately the bundlers, the information they need to know to do this work.

We envision that process happening in two steps:

  • As you suggest, we need some way for crates to specify imports from npm. You're suggesting putting that in Cargo.toml. We had been thinking of having a separate package.json. But that's a question for the other thread. The point is, in the end, for each crate we have npm dependency information.

  • At the root crate, when we want to publish up to npm, we need to produce a final package.json and .wasm file (and, with bindgen, some js) and publish that to npm for consumption. That's where the tool described in this issue comes in. We imagined this process itself comprising two steps:

    • The npm dependency information for each crate in the graph would be injected as a custom section, which means linking will produce a single .wasm file with all of the dependency information from the crate graph, without Rust tooling needing any special knowledge.

    • The tool described in this issue will then remove the custom section, and use it to compute a final package.json. That involves dealing with overlapping and/or conflicting version constraints from the crate graph. The tool will then publish the whole shebang on npm.
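
The "remove the custom section" step can be sketched in Rust as below. This is a sketch under assumptions, not the actual tool: the function name is hypothetical, the section name "package.json" is only the placeholder floated earlier in the thread, and the parser relies on the standard WebAssembly binary layout (an 8-byte header, then sections made of an id byte, a varuint32 size, and a body, where a custom section has id 0 and a body that begins with a length-prefixed name).

```rust
/// Read an unsigned LEB128 `varuint32` at `*pos`, advancing `*pos` past it.
pub fn read_varuint32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    loop {
        if shift > 28 {
            return None; // more than five bytes cannot be a varuint32
        }
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Split a wasm binary into the payloads of all custom sections called `name`
/// and a copy of the binary with those sections removed.
pub fn extract_custom_sections(wasm: &[u8], name: &str) -> Option<(Vec<Vec<u8>>, Vec<u8>)> {
    // Header: magic "\0asm" followed by a 4-byte version.
    if wasm.len() < 8 || !wasm.starts_with(b"\0asm") {
        return None;
    }
    let mut stripped = wasm[..8].to_vec();
    let mut payloads = Vec::new();
    let mut pos = 8;
    while pos < wasm.len() {
        let section_start = pos;
        let id = wasm[pos];
        pos += 1;
        let size = read_varuint32(wasm, &mut pos)? as usize;
        let body_end = pos.checked_add(size)?;
        let body = wasm.get(pos..body_end)?;
        pos = body_end;
        if id == 0 {
            // Custom section: the body starts with the section name.
            let mut p = 0;
            let name_len = read_varuint32(body, &mut p)? as usize;
            let name_end = p.checked_add(name_len)?;
            if body.get(p..name_end)? == name.as_bytes() {
                payloads.push(body[name_end..].to_vec());
                continue; // drop this section from the output
            }
        }
        // Any other section is copied through unchanged.
        stripped.extend_from_slice(&wasm[section_start..body_end]);
    }
    Some((payloads, stripped))
}
```

The returned payloads would feed the package.json computation, and the stripped bytes are what would be published alongside it.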

The use of custom sections is motivated by decoupling, in two ways:

  • Keeping the Rust toolchain as oblivious as possible, so instead this information is all handled by wasm-specific tooling.
  • Allowing the tool described in this issue to be language-agnostic: it works on an arbitrary .wasm file that has the right custom sections, which could've been generated by other languages.

Finally, a bit of broader context: @ashleygwilliams is coming from the npm perspective -- she works at npm -- and we've been discussing the above with the bundlers (we met with Parcel recently).

Hopefully that helps clear some of this up!

@Pauan

Pauan commented Feb 2, 2018

@aturon First, and most important: did you read and understand the pipeline diagram?

I did see that image, but it's quite large and requires both horizontal and vertical scrolling, so it was hard to read. I have read it more thoroughly now, and it does help some.

The npm dependency information for each crate in the graph would be injected as a custom section, which means linking will produce a single .wasm file with all of the dependency information from the crate graph, without Rust tooling needing any special knowledge.

So let me just verify that I'm understanding you correctly. Let's say you create a Rust project which uses multiple Rust crates.

Each Rust crate might have a package.json. When you build your Rust project, rustc will read the package.json file for each crate, concatenating them together.

In addition, when rustc statically links all the Rust crates together into a single .wasm file, it also injects the concatenated package.json files into a custom section in the .wasm file.

Then you run a separate tool which will read the .wasm, generate a package.json from the custom section, and then delete the custom section.

Is my understanding correct? If so, then I completely agree with the proposal, it makes perfect sense now.

Finally, a bit of broader context: @ashleygwilliams is coming from the npm perspective -- she works at npm -- and we've been discussing the above with the bundlers (we met with Parcel recently).

Indeed, I am also coming at it from the npm perspective, because I am a long-time JavaScript and npm user. That's why the suggestion of using WebAssembly custom sections seemed odd to me, since package.json is the normal way of doing things.

I thought that the custom sections would be included in the npm package. But now I understand that the custom sections are simply an implementation detail, not something that is exposed to the programmer: the programmer uses package.json as usual.

Hopefully that helps clear some of this up!

Yes it does clear it up, thanks a lot!

@ashleygwilliams
Member Author

we are tracking this in the wasm-pack repo and it has already shipped ;)
