-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Remove thread local, replacing with DeserializeSeed #5259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is on the right track!
src/cargo/sources/registry/mod.rs
Outdated
if links.is_some() { | ||
return Err(de::Error::duplicate_field("links")); | ||
} | ||
links = Some(map.next_value()?); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write links = map.next_value()?
and then use RegistryPackage { ..., links }
below without an unwrap_or(None)
, but this is fine if you prefer the symmetry with the other fields.
src/cargo/sources/registry/mod.rs
Outdated
where | ||
D: de::Deserializer<'de>, | ||
{ | ||
return deserializer.deserialize_map(Visitor(self.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When implementing DeserializeSeed I typically reuse the same type for the DeserializeSeed impl and the Visitor impl, so this would be deserializer.deserialize_map(self)
.
src/cargo/sources/registry/mod.rs
Outdated
where | ||
D: de::Deserializer<'de>, | ||
{ | ||
return deserializer.deserialize_seq(Visitor); | ||
return deserializer.deserialize_seq(Visitor(self.0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto here, I would use the same struct as both the DeserializeSeed and the Visitor.
src/cargo/sources/registry/index.rs
Outdated
drop(slot.borrow_mut().take()); | ||
res | ||
})?; | ||
} = RegistryPackageDefault(self.source_id.clone()).deserialize(&mut json_de)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what SourceId is but this can be RegistryPackageDefault(&self.source_id)
if clone is slow.
Is there a way for us to evaluate how much of a performance improvement this change would be? |
I seem to remember that someone had a derive for |
The |
That crate seems to re-export a different interface over Serde's, and I personally found it rather difficult to use in my short attempt. It seemed far less polished than Serde's normal derives, but that could be because of my own flawed expectations. I'd sort of want something like this: #[derive(Deserialize)]
pub struct RegistryPackage<'a> {
name: Cow<'a, str>,
vers: Version,
#[serde(seed = "SourceId", seed_borrowed)]
deps: DependencyList,
features: BTreeMap<String, Vec<String>>,
cksum: String,
yanked: Option<bool>,
links: Option<String>,
}
#[derive(Deserialize)]
#[serde(seed = "SourceId", seed_borrowed)]
struct DependencyList {
#[serde(deserialize_with = "de_deps")]
inner: Vec<Dependency>,
}
fn de_deps<'de, D>(deserializer: D, seed: &SourceId) -> Result<Vec<Dependency>, _>; As far as I can tell, this is difficult/impossible to do with However, it's worth noting that I don't think there's a good or easy way to do what I want automatically. What Serde really lacks today is a way to do what I want, but without duplicating all of the fields' deserializers except for the one I want to call |
Thanks Mark! ^ @Marwes in case this feedback is helpful to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good to me. Would it be possible to get an idea of the performance improvement to determine whether all the new code is worth it?
I'd love to get a benchmark, but I only did this because @alexcrichton told me that this is a relatively perf-critical part of Cargo and I have a vague suspicion that thread locals might be slower than this. If @alexcrichton proposes a benchmark though I'd be happy to run it. |
Thanks for pinging, I jotted down some explanations about why the crate looks like it does below which should hopefully clarify it a bit.
Sorry to hear that. I implemented the whole thing for https://github.com/gluon-lang/gluon where it serializes a fairly huge object graph (across multiple crates that need different states). After I got it working I were a bit exhausted and largely left the crate as-is. It has been working without problems since so I never revisited it.
Yeah, it is a bit rough for the above mentioned reasons. The actual derive implementation itself is actually just a fork of Writing the example you wrote about would be something like this with #[derive(DeserializeState)]
#[serde(de_parameters = "'b")]
#[serde(state = "&'b SourceId")]
pub struct RegistryPackage<'a> {
name: Cow<'a, str>,
vers: Version,
#[serde(state)]
deps: DependencyList,
features: BTreeMap<String, Vec<String>>,
cksum: String,
yanked: Option<bool>,
links: Option<String>,
}
#[derive(Deserialize)]
#[serde(de_parameters = "'b")]
#[serde(state = "&'b SourceId")]
struct DependencyList {
#[serde(deserialize_state_with = "de_deps")]
inner: Vec<Dependency>,
}
fn de_deps<'de, D>(deserializer: D, seed: &SourceId) -> Result<Vec<Dependency>, _>; I think that is fairly similar to what you had in mind, just a bit more explicit in places? It just moves the seed type specification to the struct (having it on the field causes problems when we try to have multiple different seeds, by having it on the struct it can only be specified once).
So I originally built the crate on top of the regular This seed type can of course be generated but if that is done, then there is no way for the trait DeserializeStateAssoc {
type Seed: DeserializeSeed<Value = Self>;
}
impl DeserializeStateAssoc for Value {
type Seed = ValueSeed;
} This sort of works (I tried it) but falls apart as there is no way for the seed type to have type parameters that do not exist on the impl<T> DeserializeStateAssoc for Value {
type Seed = ValueSeed<T>;
} A parameter on the trait could maybe solve that issue, but when I tried it I eventually ran into lifetime issues or coherence issues. (I don't quite remember the exact reason this failed, it kinda seems like it would work looking at it now.) impl<T> DeserializeStateAssoc<T> for Value {
type Seed = ValueSeed<T>;
} So eventually I resigned myself to adding a new set of traits which had the exact properties that were needed. FWIW, it is possible to use |
☔ The latest upstream changes (presumably #5261) made this pull request unmergeable. Please resolve the merge conflicts. |
Thanks @Mark-Simulacrum! This is indeed perf-sensitive in the sense that we parse JSON and registries on My "benchmark" has typically been to take Servo's checkout and do something like |
23e4fa3
to
bdc7886
Compare
Local benchmarks seem to show no performance difference. |
bdc7886
to
0f362cf
Compare
Ok no worries! I'm fine removing the thread local with this regardless, so let's... @bors: r+ |
📌 Commit 0f362cf has been approved by |
Remove thread local, replacing with DeserializeSeed @dtolnay -- is this the right way to implement this? It seems *very* verbose, but I suppose that's to be expected... r? @alexcrichton
☀️ Test successful - status-appveyor, status-travis |
@dtolnay -- is this the right way to implement this? It seems very verbose, but I suppose that's to be expected...
r? @alexcrichton