
Combine Query and QueryLens using a type parameter for state #18162

Open
wants to merge 23 commits into
base: main
from

Conversation

chescock
Contributor

@chescock chescock commented Mar 5, 2025

Objective

Make QueryLens easier to use by allowing query methods to be called on it directly instead of needing to call the query() method first.

Introduce owned query iterators, so that it's possible for methods to return iterators constructed from Query::transmute or Query::join.

Enable #3887. The World::query method returns a QueryState, so the simplest way to create a Query from a World today is:

let mut state = world.query::<D>();
let query = state.query(&world);

That requires passing world twice, and it's not possible to do that on a single line, since world.query::<D>().query(&world) "creates a temporary value which is freed while still in use". (Although it does work if you also use the query in the same expression, like for c in world.query::<D>().query(&world).)

We could improve that by having World::query return QueryLens (and renaming the method that returns QueryState to World::query_state). That would eliminate the need to pass world twice, but would still require a call to QueryLens::query, and would usually require an extra binding to keep the QueryLens and its internal QueryState alive. But with this PR in place, the QueryLens would be usable as a Query without any extra calls!
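The borrow problem described in this section can be reproduced without Bevy. The following is a minimal sketch using hypothetical stand-in types (`World`, `QueryState`, and `Query` here are simplified and are not Bevy's definitions); it only assumes that the state is returned by value and that the query borrows it.

```rust
// Stand-in types for illustration only; these are not Bevy's definitions.
pub struct World {
    pub values: Vec<u32>,
}

pub struct QueryState {
    matched: Vec<usize>,
}

pub struct Query<'w, 's> {
    world: &'w World,
    state: &'s QueryState,
}

impl World {
    /// Builds the cached state. The result does not borrow `self`.
    pub fn query(&mut self) -> QueryState {
        QueryState { matched: (0..self.values.len()).collect() }
    }
}

impl QueryState {
    /// Borrows both the state and the world for as long as the `Query` lives.
    pub fn query<'w, 's>(&'s mut self, world: &'w World) -> Query<'w, 's> {
        Query { world, state: self }
    }
}

impl Query<'_, '_> {
    pub fn sum(&self) -> u32 {
        self.state.matched.iter().map(|&i| self.world.values[i]).sum()
    }
}

/// The two-statement form: the state gets its own binding.
pub fn two_statements(world: &mut World) -> u32 {
    let mut state = world.query();
    let query = state.query(world);
    query.sum()
}

/// The single-expression form: the temporary state lives until the end of
/// the statement, which is long enough because the query is used right away.
pub fn one_expression(world: &mut World) -> u32 {
    let total = world.query().query(world).sum();
    total
    // By contrast, binding the query first is rejected with E0716, because
    // the temporary `QueryState` is dropped at the end of the `let`:
    //     let query = world.query().query(world);
    //     query.sum()
}
```

Both functions return the same total; only the lifetime of the intermediate state differs.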

Solution

Make Query and the various iterator types generic, where normal queries use borrowed &QueryState<D, F> and QueryLens uses owned QueryState<D, F>. Introduce a trait QueryStateBorrow: Borrow<QueryState> to abstract over the two types.

Have Query use a default type for the state so that it continues to work without specifying it explicitly.

Note that the 'state lifetime on Query was only used in &'state QueryState, so it is now only used in the default type parameter. It still needs to be part of the Query type in order to be referenced in the default type, so we need a PhantomData so that it's actually used. Another option here would be to make Query a type alias for some new named type, but I think the user experience will be better with a default type parameter than with a type alias.
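As a rough illustration of the default type parameter and the `PhantomData` it requires, here is a sketch with a simplified, hypothetical `QueryState<D>` and a `Query` that drops the world lifetime and the filter parameter. The names mirror the PR, but the definitions are assumptions made for this example.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Stand-in for Bevy's `QueryState`; hypothetical and heavily simplified.
pub struct QueryState<D> {
    pub matched: Vec<D>,
}

// `S` defaults to a borrowed state, so `Query<'s, D>` keeps its old meaning.
// `'s` appears only in that default, so a `PhantomData` field is needed to
// make the struct actually use the lifetime.
pub struct Query<'s, D, S = &'s QueryState<D>> {
    state: S,
    marker: PhantomData<&'s QueryState<D>>,
}

impl<'s, D, S> Query<'s, D, S>
where
    S: Deref<Target = QueryState<D>>,
{
    pub fn new(state: S) -> Self {
        Query { state, marker: PhantomData }
    }

    pub fn count(&self) -> usize {
        self.state.matched.len()
    }
}

// Signatures that only name `Query<'s, D>` pick up the borrowed default.
pub fn count_borrowed(query: Query<'_, u32>) -> usize {
    query.count()
}

// A lens-like query that owns its state selects a different `S`.
pub fn count_owned(query: Query<'static, u32, Box<QueryState<u32>>>) -> usize {
    query.count()
}
```

The same `count` method serves both the borrowed and the owned form, which is the effect the PR aims for with `Query` and `QueryLens`.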

Testing

I used cargo-show-asm to verify that the assembly of Query::iter did not change.

This Rust code:

use crate::prelude::*;
#[derive(Component)]
struct C(usize);

#[unsafe(no_mangle)]
#[inline(never)]
pub fn call_query_iter(query: Query<&C>) -> usize {
    let mut total = 0;
    for c in query.iter() {
        total += c.0;
    }
    total
}

Run with

cargo asm -p bevy_ecs --lib --simplify call_query_iter

Results in the same asm before and after the change

call_query_iter:
	push rsi
	push rdi
	push rbx
	mov rdx, qword ptr [rcx]
	mov rcx, qword ptr [rcx + 8]
	mov r8, qword ptr [rdx + 248]
	mov rax, qword ptr [rdx + 256]
	lea r9, [r8 + 4*rax]
	cmp byte ptr [rdx + 264], 0
	je .LBB1258_2
	xor r11d, r11d
	xor r10d, r10d
	xor esi, esi
	xor eax, eax
	jmp .LBB1258_6
.LBB1258_2:
	mov esi, 8
	xor r11d, r11d
	xor r10d, r10d
	xor edi, edi
	xor eax, eax
	jmp .LBB1258_15
.LBB1258_3:
	xor r11d, r11d
.LBB1258_4:
	xor esi, esi
.LBB1258_5:
	mov edi, esi
	inc rsi
	add rax, qword ptr [r11 + 8*rdi]
.LBB1258_6:
	cmp rsi, r10
	jne .LBB1258_5
.LBB1258_7:
	cmp r8, r9
	je .LBB1258_22
	mov r10d, dword ptr [r8]
	add r8, 4
	mov r11, qword ptr [rcx + 328]
	lea rsi, [r10 + 8*r10]
	mov r10, qword ptr [r11 + 8*rsi + 16]
	test r10, r10
	je .LBB1258_7
	lea r11, [r11 + 8*rsi]
	mov rsi, qword ptr [rdx + 272]
	cmp qword ptr [r11 + 64], rsi
	jbe .LBB1258_3
	mov rdi, qword ptr [r11 + 56]
	mov rsi, qword ptr [rdi + 8*rsi]
	test rsi, rsi
	je .LBB1258_3
	mov r11, qword ptr [r11 + 24]
	not rsi
	lea rsi, [rsi + 2*rsi]
	shl rsi, 4
	mov r11, qword ptr [r11 + rsi + 16]
	jmp .LBB1258_4
.LBB1258_12:
	xor r11d, r11d
.LBB1258_13:
	mov rsi, qword ptr [rsi + 80]
	xor edi, edi
.LBB1258_14:
	mov rbx, rdi
	shl rbx, 4
	mov ebx, dword ptr [rsi + rbx + 8]
	inc rdi
	add rax, qword ptr [r11 + 8*rbx]
.LBB1258_15:
	cmp rdi, r10
	jne .LBB1258_14
.LBB1258_16:
	cmp r8, r9
	je .LBB1258_22
	mov r10d, dword ptr [r8]
	add r8, 4
	mov rsi, qword ptr [rcx + 160]
	lea r11, [r10 + 4*r10]
	shl r11, 5
	mov r10, qword ptr [rsi + r11 + 88]
	test r10, r10
	je .LBB1258_16
	add rsi, r11
	mov r11d, dword ptr [rsi + 148]
	mov rdi, qword ptr [rcx + 328]
	lea rbx, [r11 + 8*r11]
	mov r11, qword ptr [rdx + 272]
	cmp qword ptr [rdi + 8*rbx + 64], r11
	jbe .LBB1258_12
	lea rdi, [rdi + 8*rbx]
	mov rbx, qword ptr [rdi + 56]
	mov r11, qword ptr [rbx + 8*r11]
	test r11, r11
	je .LBB1258_12
	mov rdi, qword ptr [rdi + 24]
	not r11
	lea r11, [r11 + 2*r11]
	shl r11, 4
	mov r11, qword ptr [rdi + r11 + 16]
	jmp .LBB1258_13
.LBB1258_22:
	pop rbx
	pop rdi
	pop rsi
	ret

@chescock added labels on Mar 5, 2025: A-ECS (Entities, components, systems, and events); C-Usability (A targeted quality-of-life change that makes Bevy easier to use); S-Needs-Review (Needs reviewer attention (from anyone!) to move forward)
@Victoronz
Contributor

Victoronz commented Mar 5, 2025

First of all, thank you for this PR! I've been wanting to do something like this for some time!

Let's get into the details:

I've mentioned this in the discord discussion before, but I'll properly write it out here again:
If we make QueryLens a proper type, then we should not store QueryState on the stack.
QueryState is about a quarter kilobyte big, with a variable size, depending on D and F.
Today, Query is used and stored in various places, and if QueryLens becomes a type that can be used like Query, it is only natural to expect it to be passed around, stored and moved a bunch as well.
Some types even store multiple Query types within them!
Especially since the QueryStateBorrow generic propagates to the iterators, they now also have the same problem!

I think we should only store pointers to QueryState, not QueryState itself. For QueryLens this means storing a Box<QueryState>.
The API of the trait could then equal the API of &QueryState, and can live in just one place (whether that should be the trait or &QueryState I am not yet sure).

The meaning of the QueryStateBorrow trait then becomes close to Deref, instead of Borrow.
With Deref as the base of this trait, we'd have more future possibilities, because more types implement Deref over Borrow.
Examples are: Pin, LazyCell, LazyLock, various mutex guards, RefCell borrow types.
It also spares us from needing to pepper .borrow() everywhere, we can use either *, .deref(), or a direct call when applicable.

Separately, QueryStateBorrow needs to be an unsafe trait, because neither Borrow nor Deref are. An implementor could just conjure up some invalid QueryState when handing out &QueryState, which would lead to UB!
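To make the size and safety argument concrete, here is a small sketch. `QueryState` is a hypothetical stand-in padded to roughly a quarter kilobyte, and `QueryStateDeref` is an assumed name for an unsafe marker trait layered on `Deref`; neither is Bevy's actual definition.

```rust
use std::ops::Deref;

// Hypothetical stand-in: a state that is large when stored inline.
pub struct QueryState {
    pub ids: Vec<u32>,
    pub padding: [u8; 256],
}

/// # Safety
///
/// Implementors must always deref to the same, valid `QueryState`.
pub unsafe trait QueryStateDeref: Deref<Target = QueryState> {}

// SAFETY: a shared reference always derefs to the one state it points at.
unsafe impl QueryStateDeref for &QueryState {}
// SAFETY: a `Box` always derefs to the one state it owns.
unsafe impl QueryStateDeref for Box<QueryState> {}

// Generic code can call through the pointer without `.borrow()`.
pub fn id_count<S: QueryStateDeref>(state: S) -> usize {
    state.ids.len()
}

// Moving a boxed state moves one pointer, not the whole struct.
pub fn box_is_pointer_sized() -> bool {
    std::mem::size_of::<Box<QueryState>>() == std::mem::size_of::<usize>()
}
```

Because the trait is `unsafe`, the promise that the target never changes is made by the implementor rather than re-checked at every call site.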

where
    D: ReadOnlyQueryData,
    S: Copy,
Contributor

We should be using Clone bounds, not Copy, it allows for stuff like Arc<QueryState> in the future.

Contributor Author

I don't want to use Clone here because it makes it too easy to accidentally clone the entire QueryState. This is a fairly niche method, so I'm comfortable declaring that it can't be used with unusual iterators.

Note that iter() and iter_mut() will always return a QueryIter with &QueryState, since they borrow from the Query, so this only prevents calls to remaining() on the result of into_iter() with an exotic state.
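The trade-off between the two bounds can be sketched with a hypothetical, simplified `QueryState`. The function names are invented for this example and do not correspond to anything in the PR.

```rust
use std::ops::Deref;

// Hypothetical stand-in for the real state type.
#[derive(Clone)]
pub struct QueryState {
    pub ids: Vec<u32>,
}

// With `S: Copy`, duplicating the handle can never allocate. `&QueryState`
// satisfies the bound; `Box<QueryState>` and `Arc<QueryState>` do not.
pub fn duplicate_cheaply<S>(state: S) -> (S, S)
where
    S: Copy + Deref<Target = QueryState>,
{
    (state, state)
}

// With `S: Clone`, an owned state is accepted too, but cloning a `Box`
// deep-copies the whole `QueryState`.
pub fn duplicate<S>(state: &S) -> S
where
    S: Clone + Deref<Target = QueryState>,
{
    state.clone()
}
```

With a borrowed state both copies point at the same allocation, while cloning a boxed state yields a second, independent one.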

fetch: D::Fetch<'w>,
filter: F::Fetch<'w>,
query_state: &'s QueryState<D, F>,
fetch: <S::Data as WorldQuery>::Fetch<'w>,
Contributor

I wonder whether we could simplify some of these/recover the old names with type aliases

@chescock
Contributor Author

chescock commented Mar 5, 2025

> I think we should only store pointers to QueryState, not QueryState itself. For QueryLens this means storing a Box<QueryState>. The API of the trait could then equal the API of &QueryState, and can live in just one place (whether that should be the trait or &QueryState I am not yet sure).
>
> The meaning of the QueryStateBorrow trait then becomes close to Deref, instead of Borrow. With Deref as the base of this trait, we'd have more future possibilities, because more types implement Deref over Borrow. Examples are: Pin, LazyCell, LazyLock, various mutex guards, RefCell borrow types. It also spares us from needing to pepper .borrow() everywhere, we can use either *, .deref(), or a direct call when applicable.

Yup, that's a good idea! I had written this before those discussions, and was trying for the smallest possible change to QueryLens, but I agree that boxing it makes sense. It costs an extra allocation, but QueryState is already full of allocations, so one more shouldn't hurt. And the diff will get smaller without the borrow() calls!

> Separately, QueryStateBorrow needs to be an unsafe trait, because neither Borrow nor Deref are. An implementor could just conjure up some invalid QueryState when handing out &QueryState, which would lead to UB!

Yup, that makes sense, too. I had been thinking that we could just roll that into the safety requirements of the existing unsafe fn new(). But they'll be easier to express more clearly on the trait, and there aren't going to be so many implementations of this that we need to worry about a little more unsafe.

Comment on lines 91 to 125
/// Abstracts over an owned or borrowed [`QueryState`].
///
/// # Safety
///
/// This must `deref` to a `QueryState` that does not change.
pub unsafe trait QueryStateDeref:
    Deref<Target = QueryState<Self::Data, Self::Filter>>
{
    /// The [`QueryData`] for this `QueryState`.
    type Data: QueryData;

    /// The [`QueryFilter`] for this `QueryState`.
    type Filter: QueryFilter;

    /// The type returned by [`Self::storage_ids`].
    type StorageIter: Iterator<Item = StorageId> + Clone + Default;

    /// A read-only version of the state.
    type ReadOnly: QueryStateDeref<
        Data = <Self::Data as QueryData>::ReadOnly,
        Filter = Self::Filter,
    >;

    /// Iterates the storage ids that this [`QueryState`] matches.
    fn storage_ids(&self) -> Self::StorageIter;

    /// Borrows the remainder of the [`Self::StorageIter`] as an iterator
    /// usable with `&QueryState<D, F>`.
    fn reborrow_storage_ids(
        storage_iter: &Self::StorageIter,
    ) -> iter::Copied<slice::Iter<StorageId>>;

    /// Converts this state to a read-only version.
    fn into_readonly(self) -> Self::ReadOnly;
}
Contributor

Hmm, I said earlier I wasn't sure where the API should live, but I am now certain that it should be on &QueryState.
By putting it on the trait, we would need to describe the correct implementation for each method in the safety contract, or we cannot use these methods ourselves.

This trait should instead have no API for now, and would then only have to describe its supertrait and QueryData/QueryFilter.

Contributor Author

I delegated everything I could, but these methods are here because the types vary between owned and borrowed versions. The S::StorageIter is stored at the same time as the S, so it can't be slice::Iter for owned QueryData. And into_readonly needs to return an owned QueryState, since there's nothing left to borrow from.
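A sketch of why the storage iterator type has to vary: when the cursor stores the state next to the iterator, a boxed state cannot hand out a slice iterator that borrows from itself. The types below are hypothetical stand-ins (`StorageId` is a plain `u32` here, and `StateWithStorageIter` and `Cursor` are invented names), not the PR's actual code.

```rust
use std::ops::Deref;

// Hypothetical stand-ins; Bevy's real `StorageId` and `QueryState` differ.
pub type StorageId = u32;

pub struct QueryState {
    pub storage_ids: Vec<StorageId>,
}

pub trait StateWithStorageIter: Deref<Target = QueryState> {
    type StorageIter: Iterator<Item = StorageId> + Clone;
    fn storage_iter(&self) -> Self::StorageIter;
}

impl<'s> StateWithStorageIter for &'s QueryState {
    // The borrowed state outlives the cursor, so a slice iterator is fine.
    type StorageIter = std::iter::Copied<std::slice::Iter<'s, StorageId>>;
    fn storage_iter(&self) -> Self::StorageIter {
        // Copy the reference out so the iterator borrows for the full `'s`.
        let state: &'s QueryState = *self;
        state.storage_ids.iter().copied()
    }
}

impl StateWithStorageIter for Box<QueryState> {
    // The owned state moves into the cursor, so the ids are cloned into an
    // owning iterator instead of being borrowed.
    type StorageIter = std::vec::IntoIter<StorageId>;
    fn storage_iter(&self) -> Self::StorageIter {
        self.storage_ids.clone().into_iter()
    }
}

// Stores the state and its storage iterator side by side.
pub struct Cursor<S: StateWithStorageIter> {
    state: S,
    ids: S::StorageIter,
}

impl<S: StateWithStorageIter> Cursor<S> {
    pub fn new(state: S) -> Self {
        let ids = state.storage_iter();
        Cursor { state, ids }
    }

    pub fn total(&self) -> usize {
        self.state.storage_ids.len()
    }
}

impl<S: StateWithStorageIter> Iterator for Cursor<S> {
    type Item = StorageId;
    fn next(&mut self) -> Option<StorageId> {
        self.ids.next()
    }
}
```

The clone in the `Box` impl is the extra cost that only the owned path pays; the borrowed path still uses a plain slice iterator.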

Contributor

Right, it seems QueryIterationCursor code doesn't compile when attempting to borrow the storage iterator from a deref.
However, this means that we are introducing some storage iterator clones into hot iteration code.
We should try to avoid that if possible, these clones shouldn't be necessary!
That does mean changing up the code to allow for this, which I hope isn't too difficult...
The iteration code of QueryIterationCursor is important enough that changing it involves benchmarks/proving it isn't being regressed.

With into_readonly, we run into the problem of needing to cast the original S instead of just the deref.
If we need to put the cast after the deref, then we can just wrap S with a struct that retains the deref, but appends the cast in its own impl!

For that, we don't need a dedicated method on QueryStateDeref, it can be done in Query::into_readonly directly.
If we later want it on QueryStateDeref, then we can put it there as a provided method! Though the more private solution is preferable at first I think.
This would change the returned S type of Query::into_readonly into ReadOnly<S>.

This solution would also transfer to owned transmutes, which into_readonly is essentially a simple case of.

Contributor

@Victoronz Victoronz Mar 11, 2025

The technique of wrapping an S in a ReadOnly<S>/Transmuted<S> might actually be a nice speedup for QueryState::transmute/Query::transmute_lens in general, because we can then avoid creating a new QueryState like we do now!
We can just keep the "access is a subset" check, then cast S this way!

Contributor Author

> However, this means that we are introducing some storage iterator clones into hot iteration code.
> We should try to avoid that if possible, these clones shouldn't be necessary!
> That does mean changing up the code to allow for this, which I hope isn't too difficult...
> The iteration code of QueryIterationCursor is important enough that changing it involves benchmarks/proving it isn't being regressed.

Yup, the clones are unfortunate! They only occur for QueryLens::into_iter(), though. Calling into_iter() on an ordinary Query, or calling iter() or iter_mut() on anything, will use &QueryState. That still uses a slice iterator, so this won't change the behavior of any existing code. And if you're consuming a QueryLens then you're already spending allocations on the QueryState, so one more shouldn't be too bad.

I really don't want to make any big changes to QueryIterationCursor in this PR, exactly because it's important for performance! If performance of QueryLens::into_iter() becomes a problem, we can do a more targeted follow-up to rework QueryIterationCursor.

> For that, we don't need a dedicated method on QueryStateDeref, it can be done in Query::into_readonly directly. ... This would change the returned S type of Query::into_readonly into ReadOnly<S>.

Oh, wrapping the type is a clever idea! ... Hmm, it would mean that Query<&mut T>::as_readonly() is no longer the same type as Query<&T>, though, since the state type would differ. And I really don't want to make a breaking change like that as part of this PR!

> The technique of wrapping an S in a ReadOnly<S>/Transmuted<S> might actually be a nice speedup for QueryState::transmute/Query::transmute_lens in general, because we can then avoid creating a new QueryState like we do now!

Yeah, I think there are some good opportunities in that space! It's a little harder than that because you also need to store a new D::State and F::State for the new D and F. Maybe Transmuted<S> could hold new versions of those and delegate the rest to a wrapped state? Although then you're definitely not a Deref<Target=QueryState>.

Contributor

@Victoronz Victoronz Mar 11, 2025

> > However, this means that we are introducing some storage iterator clones into hot iteration code.
> > We should try to avoid that if possible, these clones shouldn't be necessary!
> > That does mean changing up the code to allow for this, which I hope isn't too difficult...
> > The iteration code of QueryIterationCursor is important enough that changing it involves benchmarks/proving it isn't being regressed.
>
> Yup, the clones are unfortunate! They only occur for QueryLens::into_iter(), though. Calling into_iter() on an ordinary Query, or calling iter() or iter_mut() on anything, will use &QueryState. That still uses a slice iterator, so this won't change the behavior of any existing code. And if you're consuming a QueryLens then you're already spending allocations on the QueryState, so one more shouldn't be too bad.
>
> I really don't want to make any big changes to QueryIterationCursor in this PR, exactly because it's important for performance! If performance of QueryLens::into_iter() becomes a problem, we can do a more targeted follow-up to rework QueryIterationCursor.

True, let us leave it for a follow-up then.

> > For that, we don't need a dedicated method on QueryStateDeref, it can be done in Query::into_readonly directly. ... This would change the returned S type of Query::into_readonly into ReadOnly<S>.
>
> Oh, wrapping the type is a clever idea! ... Hmm, it would mean that Query<&mut T>::as_readonly() is no longer the same type as Query<&T>, though, since the state type would differ. And I really don't want to make a breaking change like that as part of this PR!

Hmm, I don't quite follow, how would those differ?
But for now, what could be done in this PR is add an unsafe cast method to QueryStateDeref (mirroring ptr::cast), which would serve the purpose we'd need it for. The impl for Box<QueryState> should then also be changed into a into_raw -> cast -> from_raw roundtrip.

> > The technique of wrapping an S in a ReadOnly<S>/Transmuted<S> might actually be a nice speedup for QueryState::transmute/Query::transmute_lens in general, because we can then avoid creating a new QueryState like we do now!
>
> Yeah, I think there are some good opportunities in that space! It's a little harder than that because you also need to store a new D::State and F::State for the new D and F. Maybe Transmuted<S> could hold new versions of those and delegate the rest to a wrapped state? Although then you're definitely not a Deref<Target=QueryState>.

Right! I forgot that ReadOnly has the equal state restriction. Maybe something can be done to add a similar restriction here.

Contributor Author

> > Hmm, it would mean that Query<&mut T>::as_readonly() is no longer the same type as Query<&T>, though, since the state type would differ.
>
> Hmm, I don't quite follow, how would those differ?

If I understood your suggestion, then Query<&mut T, (), &QS>::as_readonly() would return Query<&T, (), ReadOnly<&QS>>. But Query<&T> is Query<&T, (), &QS>, and &QS != ReadOnly<&QS>. So a user couldn't pass the result of as_readonly() to a method that took Query<&T>, which I believe is the main use case for that method today.

> But for now, what could be done in this PR is add an unsafe cast method to QueryStateDeref (mirroring ptr::cast), which would serve the purpose we'd need it for. The impl for Box<QueryState> should then also be changed into a into_raw -> cast -> from_raw roundtrip.

I don't think I see how to implement that. Do you mean exposing an equivalent of QueryState::as_transmuted_state instead of QueryState::as_readonly? That would still wind up needing an associated type, but it would need to be generic over NewD, NewF. Since the only uses are as_readonly and as_nop, and as_nop never needs an owned state, it seems simpler to only expose as_readonly.

I can change the Box<QueryState> implementation to do a pointer cast. I like that the current impl doesn't need unsafe, but the safety there isn't any worse than the &QueryState cast, so we might as well avoid reallocating!
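The `into_raw -> cast -> from_raw` round trip can be sketched as follows. This is an illustration under a substituted assumption: it casts to a hypothetical `#[repr(transparent)]` wrapper, whose layout compatibility is guaranteed by the language, whereas the PR casts between `QueryState` instantiations and relies on Bevy's own invariant that the read-only data has equal state.

```rust
// Hypothetical stand-in for the real state type.
pub struct QueryState {
    pub ids: Vec<u32>,
}

// `repr(transparent)` guarantees the same layout as the wrapped field,
// which is what makes the pointer cast sound in this sketch.
#[repr(transparent)]
pub struct ReadOnlyState(QueryState);

impl ReadOnlyState {
    pub fn ids(&self) -> &[u32] {
        &self.0.ids
    }
}

/// Converts the box in place, without allocating a new state.
pub fn into_readonly(state: Box<QueryState>) -> Box<ReadOnlyState> {
    let raw: *mut QueryState = Box::into_raw(state);
    // SAFETY: `ReadOnlyState` is `repr(transparent)` over `QueryState`, so
    // the allocation has the right size and alignment for the new type, and
    // ownership of the pointer is handed straight back to a `Box`.
    unsafe { Box::from_raw(raw.cast::<ReadOnlyState>()) }
}
```

The heap address is the same before and after the conversion, which is the reallocation the comment wants to avoid.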

Contributor

> > But for now, what could be done in this PR is add an unsafe cast method to QueryStateDeref (mirroring ptr::cast), which would serve the purpose we'd need it for. The impl for Box<QueryState> should then also be changed into a into_raw -> cast -> from_raw roundtrip.
>
> I don't think I see how to implement that. Do you mean exposing an equivalent of QueryState::as_transmuted_state instead of QueryState::as_readonly? That would still wind up needing an associated type, but it would need to be generic over NewD, NewF. Since the only uses are as_readonly and as_nop, and as_nop never needs an owned state, it seems simpler to only expose as_readonly.

The intent of a cast method here was to get the benefit of those very NewD/NewF generics!
By ensuring via a where bound that the Data/Filter types must have equal state, I thought that as an unsafe method, this could allow then removing the Self::ReadOnly associated type. It would result in some more complex type paths though.

into_readonly could then be a safe, provided method that internally uses that cast.

However, this doesn't really need to happen in this PR, it is probably better left for a follow-up.

@Victoronz
Contributor

Victoronz commented Mar 10, 2025

Quick note:
GitHub pings in commit messages should be avoided, because it means GitHub will notify that person each time the repo is cloned!
(Which can be a lot...)
Last time I heard about this happening, GitHub didn't have a fine-grained way of turning this kind of ping off either.
This might only apply to commit titles and not the full message, I'm not sure.

@chescock
Contributor Author

> GitHub pings in commit messages should be avoided, because it means GitHub will notify that person each time the repo is cloned!

Oh, sorry! I was just trying to give you proper credit :). I didn't actually know that would turn into a ping!

Do you want me to force-push a new message, or is it okay here because nobody is cloning my personal fork and the commit will be squashed if it gets merged?

@Victoronz
Contributor

Victoronz commented Mar 10, 2025

> > GitHub pings in commit messages should be avoided, because it means GitHub will notify that person each time the repo is cloned!
>
> Oh, sorry! I was just trying to give you proper credit :). I didn't actually know that would turn into a ping!
>
> Do you want me to force-push a new message, or is it okay here because nobody is cloning my personal fork and the commit will be squashed if it gets merged?

I think it's fine here because of the squash, just noting for the future; some repos don't do that!
Thanks for the credit :)

…tate

# Conflicts:
#	crates/bevy_ecs/src/query/iter.rs
#	crates/bevy_ecs/src/query/par_iter.rs
#	crates/bevy_ecs/src/system/query.rs
…tate

# Conflicts:
#	crates/bevy_ecs/src/query/par_iter.rs
@alice-i-cecile added labels on May 5, 2025: M-Needs-Migration-Guide (A breaking change to Bevy's public API that needs to be noted in a migration guide); X-Contentious (There are nontrivial implications that should be thought through)
@alice-i-cecile
Member

> More specifically, I want to deprecate the querying methods on QueryState in favor of the query() methods from #15858 to reduce the API surface

Strongly in favor of this goal. This is really hard to maintain.

@chescock added S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) and removed S-Waiting-on-Author (The author needs to make changes or address concerns before this can be merged) on May 9, 2025
@maniwani
Contributor

maniwani commented May 9, 2025

> So I'm hoping I can shorten the whole thing to world.query::<D>().iter() by making query return QueryLens and having QueryLens be usable as a Query.

I'm broadly in favor of this PR, but I'm not very familiar with this section of the code.

(As a side note, the API duplication between Query and QueryState and both APIs being public has always seemed odd to me. I think diminishing the presence of QueryState and steering to Query would remove a lot of ambiguity. I think it'd be ideal if QueryState didn't have methods. You'd only have to grab a reference to it to hand over to the world via some world.query_from(query_state) to get the Query.)

@alice-i-cecile
Member

> I think it'd be ideal if QueryState didn't have methods. You'd only have to grab a reference to it to hand over to the world via some world.query_from(query_state) to get the Query.)

The main challenge with this is worsening ergonomics for queries when you have World. I agree otherwise though: the duplication really bothers me.

/// [`<Self as QueryData>::fetch`](QueryData::fetch) must always return an item whose `Entity`
/// equals the one the function was called with.
/// I.e.: `Self::fetch(fetch, entity, table_row).entity() == entity` always holds.
pub unsafe trait EntityEquivalentQueryData: QueryData
Contributor

What do you think about leaving out the Equivalent from the trait name?
It seems redundant, the bound on Item defines what this trait sees as an entity, and plain EntityQueryData would be easier to parse/understand imo.

Contributor Author

> What do you think about leaving out the Equivalent from the trait name? It seems redundant, the bound on Item defines what this trait sees as an entity, and plain EntityQueryData would be easier to parse/understand imo.

I don't have any especially strong opinions here. Most people won't need to look at this trait at all; they'll expect QueryIter<Entity> to be usable as an EntitySet, and it will be, so they won't worry about how that was implemented.

The trickiest part here is that MainEntity and RenderEntity are EntityEquivalent + QueryData, but can't be EntityEquivalentQueryData because they return a different entity. So maybe EntityEquivalentQueryData is misleading on that count? But EntityQueryData seems like it has the same confusion. Maybe something math-y? No, ReflexiveEntityQueryData just sounds awkward.

Since the stakes are low and none of the options seem perfect, I'm inclined to leave it alone for now, but if you argue more then I'll probably change it rather than argue back :).

@chescock
Contributor Author

> (As a side note, the API duplication between Query and QueryState and both APIs being public has always seemed odd to me. I think diminishing the presence of QueryState and steering to Query would remove a lot of ambiguity. I think it'd be ideal if QueryState didn't have methods. You'd only have to grab a reference to it to hand over to the world via some world.query_from(query_state) to get the Query.)
>
> The main challenge with this is worsening ergonomics for queries when you have World. I agree otherwise though: the duplication really bothers me.

Yup, I think this is roughly the path I've been trying to take! #15858 introduced query_state.query(world) methods, and #17822 moved the implementations to Query so that the methods on QueryState have thin implementations like self.query(world).into_iter(). I had been intending to follow that immediately with a PR to deprecate the methods on QueryState, but then saw the ergonomics loss of changing world.query().iter(world) to world.query().query(world).into_iter(). I'm hoping we can get it down to just world.query().iter(), but that requires a Query that owns its state, so I tried this first.

@Victoronz
Contributor

Victoronz commented Jun 15, 2025

This is a good step in the right direction!
I am a bit wary about some details, and the split of D/F generics between both types and the trait, but most of that can either be addressed in follow-ups or lived with!

Ultimately, QueryStateDeref should ideally not have any methods, because having them will restrict or overly complicate the implementation for some pointers which are normal Deref.

That being said, there is some other stuff that interacts or even hinges on this, so I'd like to approve this.
I think doing some sanity benchmarks first to make sure nothing fundamental is being regressed would be a good idea though!

@alice-i-cecile
Member

@chescock, can you rebase this now that #15396 is in?

@chescock
Contributor Author

> @chescock, can you rebase this now that #15396 is in?

Hmm, it's a bit trickier than I expected to combine them. #15396 started using the 's lifetime in the Fetch<'w, 's> values embedded in the iterators, but this PR wants to remove 's from the iterators. I may need to do an intermediate PR to remove 's from type Fetch and pass &'s Self::State directly to fn fetch(). I'll poke at it more tomorrow or Wednesday.

chescock added 4 commits June 16, 2025 21:17
# Conflicts:
#	crates/bevy_ecs/src/query/fetch.rs
#	crates/bevy_ecs/src/query/iter.rs
#	crates/bevy_ecs/src/relationship/relationship_query.rs
#	crates/bevy_ecs/src/system/query.rs
Restore `'state` lifetime to iterators so they can provide it to items.
Replace `QueryStateDeref` with a `ConsumableQueryState` trait that is only used in consuming query methods,
and use ordinary `Deref<Target = QueryState>` for borrowing methods.
@chescock
Contributor Author

Okay, I think I got this working again. The main changes are:

Remove 's lifetime from WorldQuery::Fetch

The iterator types for consuming owned state need to store a Fetch alongside the state, so they can't borrow from it. I added that lifetime in #15396 so that we could pipe data from the State to the Item through the Fetch. Instead, we can add a &'s State parameter to QueryData::fetch and borrow the data directly.

This touches enough code that I'll make a separate PR with just those changes: #19720

Split QueryStateDeref into Deref<Target = QueryState> and ConsumableQueryState

For QueryData that need to borrow from the state, we can't use consuming (_inner) methods with an owned state (QueryLens), since there would be nowhere to borrow from. So, we need a new trait for the consuming methods, ConsumableQueryState. It's implemented for all QueryData for &QueryState, but only for ReleaseStateQueryData for Box<QueryState>.

We still want to allow querying arbitrary QueryData with ordinary borrowing methods from an owned state, but it turns out that only requires Deref.

(Note that it's okay that Deref is a safe trait, because creating a Query is unsafe, and a malicious Deref impl wouldn't be able to satisfy the requirements for Query::new.)
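A rough sketch of how such a split could look, using hypothetical stand-ins for the traits and state type. The trait names follow the comment, but the bodies, the helper functions, and the `Plain` and `Borrowing` example types are assumptions made for illustration.

```rust
use std::marker::PhantomData;
use std::ops::Deref;

// Hypothetical stand-ins for the real traits.
pub trait QueryData {}
/// Marker for data whose items never borrow from the state.
pub trait ReleaseStateQueryData: QueryData {}

pub struct QueryState<D> {
    pub rows: Vec<u32>,
    marker: PhantomData<D>,
}

impl<D> QueryState<D> {
    pub fn new(rows: Vec<u32>) -> Self {
        QueryState { rows, marker: PhantomData }
    }
}

/// States that consuming methods may take by value.
pub trait ConsumableQueryState: Deref {}

// A borrowed state outlives the call, so any data may borrow from it.
impl<'s, D: QueryData> ConsumableQueryState for &'s QueryState<D> {}

// An owned state is dropped by the consuming call, so items must not
// borrow from it.
impl<D: ReleaseStateQueryData> ConsumableQueryState for Box<QueryState<D>> {}

// Borrowing methods only need `Deref`.
pub fn row_count<D, S>(state: &S) -> usize
where
    S: Deref<Target = QueryState<D>>,
{
    state.rows.len()
}

// Consuming methods need the extra trait.
pub fn into_rows<D, S>(state: S) -> Vec<u32>
where
    S: ConsumableQueryState + Deref<Target = QueryState<D>>,
{
    state.rows.clone()
}

// Example data types: one releases the state, one does not.
pub struct Plain;
impl QueryData for Plain {}
impl ReleaseStateQueryData for Plain {}

pub struct Borrowing;
impl QueryData for Borrowing {}
```

With these impls, `into_rows(Box::new(QueryState::<Borrowing>::new(..)))` would be rejected at compile time, while borrowing access through `row_count` stays available for every data type.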

This split also made me realize I could split this PR up further, if that would help. If we unify Query and QueryLens but don't try to support consuming methods on QueryLens, then we don't need the ConsumableQueryState trait yet, just Deref. I do think we'll want to support that eventually, but if it's easier to review in small steps then I can try to separate it out.

Add missing safety comment.
Remove unnecessary `mut`.
@Victoronz
Contributor

Great that the two PRs were not in too heavy a conflict!

I do think removing the safety contract from the Deref trait bound is rather dangerous:
I am not yet convinced this is safe. As long as QueryState is publicly accessible in any form, encouraging and using a Deref<Target = QueryState> bound can be harmful, or become unsound. One need not create an invalid QueryState or go via Query at all. As long as QueryState can be stateful, a Deref impl can mess with that state, or swap the type out for another instance, maybe not even of the same World.
Even if we ensure that these cannot be done, it is a lot easier to maintain/understand if we can keep the safety requirement on the operation itself.

That also keeps it clearer for future extensions that want to make use of pointers to achieve certain Query/QueryState logic, like caching, or some forms of shared state.

Similarly, if methods in ConsumableQueryState give out StorageIds that are outside of what the current QueryState should match, or mess with the QueryState returned from into_readonly, we run into similar issues, leading me to think that it, too, should be unsafe.

On the note of unsafe traits:
Thinking of it, I believe ReleaseStateQueryData shouldn't be safe either. QueryData can be implemented manually, and as long as an implementor incorrectly implements ReleaseStateQueryData while borrowing from State, they can achieve invalid lifetime extension.

github-merge-queue bot pushed a commit that referenced this pull request Jun 19, 2025
# Objective

Unblock #18162.

#15396 added the `'s` lifetime to `QueryData::Item` to make it possible
for query items to borrow from the state. The state isn't passed
directly to `QueryData::fetch()`, so it also added the `'s` lifetime to
`WorldQuery::Fetch` so that we can pass the borrows through there.

Unfortunately, having `WorldQuery::Fetch` borrow from the state makes it
impossible to have owned state, because we store the state and the
`Fetch` in the same `struct` during iteration.

## Solution

Undo the change to add the `'s` lifetime to `WorldQuery::Fetch`.

Instead, add a `&'s Self::State` parameter to `QueryData::fetch()` and
`QueryFilter::filter_fetch()` so that borrows from the state can be
passed directly to query items.

---------

Co-authored-by: Alice Cecile <[email protected]>
Co-authored-by: Emerson Coskey <[email protected]>
…tate

# Conflicts:
#	crates/bevy_ecs/src/query/iter.rs
#	crates/bevy_ecs/src/query/state.rs
#	crates/bevy_ecs/src/system/query.rs
@chescock
Contributor Author

I do think removing the safety contract from the Deref trait bound is rather dangerous: I am not yet convinced this is safe. As long as QueryState is publicly accessible in any form, encouraging and using a Deref<Target = QueryState> bound can be harmful, or become unsound. One need not create an invalid QueryState or go via Query at all. As long as QueryState can be stateful, a Deref impl can mess with that state, or swap the type out for another instance, maybe not even of the same World. Even if we ensure that these cannot be done, it is a lot easier to maintain/understand if we can keep the safety requirement on the operation itself.

That also keeps it clearer for future extensions that want to make use of pointers to achieve certain Query/QueryState logic, like caching, or some forms of shared state.

It's sound today: The Deref bound is only used on Query, and the only way to create a Query is with a pub(crate) unsafe fn new. We can both audit the crate to ensure no bad implementations are used, and rely on the "Make sure that this is only called in ways that ensure the queries have unique mutable access" safety requirement.

There may be future extensions where we need to replace it with an unsafe trait! But this PR is already pretty big and I don't want to add more things speculatively.

Similarly, if methods in ConsumableQueryState give out StorageIds that are outside of what the current QueryState should match, or mess with the QueryState returned from into_readonly, we run into similar issues, leading me to think that it, too, should be unsafe.

You can already cause trouble like this without a bad Deref impl just by passing a bad QueryState value. Either way, the bad line of code is the call to Query::new where you passed a value that gives out bad StorageIds, violating the (admittedly vague) safety requirements.

On the note of unsafe traits: Thinking of it, I believe ReleaseStateQueryData shouldn't be safe either. QueryData can be implemented manually, and as long as an implementor incorrectly implements ReleaseStateQueryData while borrowing from State, they can achieve invalid lifetime extension.

We don't rely on the implementation of ReleaseStateQueryData anywhere for safety, so I don't think a bad impl can cause UB. All we do is call the method to get a value to return. It's not really any different from extending lifetimes with Clone.

I'm happy to be proven wrong on any of these if you can come up with examples that cause UB, of course! But I'm fairly confident that anything that causes UB will already violate some existing safety contract.

@chescock
Contributor Author

This split also made me realize I could split this PR up further, if that would help. If we unify Query and QueryLens but don't try to support consuming methods on QueryLens, then we don't need the ConsumableQueryState trait yet, just Deref. I do think we'll want to support that eventually, but if it's easier to review in small steps then I can try to separate it out.

Well, nobody asked me to, but I split it out anyway: #19787 :). I'm still using safe Deref in that PR.

@Victoronz
Contributor

Victoronz commented Jun 23, 2025

[...]

It's sound today: The Deref bound is only used on Query, and the only way to create a Query is with a pub(crate) unsafe fn new. We can both audit the crate to ensure no bad implementations are used, and rely on the "Make sure that this is only called in ways that ensure the queries have unique mutable access" safety requirement.

There may be future extensions where we need to replace it with an unsafe trait! But this PR is already pretty big and I don't want to add more things speculatively.

[...]

You can already cause trouble like this without a bad Deref impl just by passing a bad QueryState value. Either way, the bad line of code is the call to Query::new where you passed a value that gives out bad StorageIds, violating the (admittedly vague) safety requirements.

[...]

We don't rely on the implementation of ReleaseStateQueryData anywhere for safety, so I don't think a bad impl can cause UB. All we do is call the method to get a value to return. It's not really any different from extending lifetimes with Clone.

I'm happy to be proven wrong on any of these if you can come up with examples that cause UB, of course! But I'm fairly confident that anything that causes UB will already violate some existing safety contract.

Hmm, I think I need to adjust my language here: I was mixing several considerations, which makes it quite unclear what kinds of "safety" I was referring to:

This design and PR is likely safe from the viewpoint of a bevy user! Piping all Query creation through Query::new does make its safety contract touch whatever Query is created by our current API.

However, I am more so trying to consider the maintainability angle:
In my experience, broad non-local safety requirements are a burden to maintain, extend, and audit, and somewhat go against usual design philosophy in Rust.
I find that thinking about/working with a type (and by extension, functions utilizing that type) becomes a lot easier when every instance of that type is correct, instead of "we accept instances that can have unwanted behavior, as long as we do not call/use it in any way that exhibits the aforementioned behavior".

This is in line with the usual wisdom of constraining API to only take types that work correctly with it, and having types only contain instances that are correct.
To me, having the traits themselves carry the safety requirements constrains them to only what we want, and makes them easier to work with!

Now, if this is bevy-internal machinery, does it matter that much?

This is my own judgement, so it might be flawed, but from having seen and done some work in bevy and elsewhere, I feel that these broad non-local safety contracts are easy to accidentally break, because they are usually not documented over their entire coverage.
If I define "broad non-local safety contract" more accurately:
An unsafe function that imposes a restriction on a type instance that cannot be verified without knowing the entire "life" of that type.
+ This aforementioned type touching a broad amount of code, or being "viral".

A relevant example of this is our current usage of UnsafeWorldCell: some ways of constructing it impose this same requirement of "some of this type's API must not be called". At the same time, UWCs propagate through a lot of code, where the work to trace back to type construction is not always done, which is what I believe led to #13343.
Sometimes UWCs are being passed as parameters to public trait methods, which increases this risk!

This isn't an issue as long as we audit it, but whatever needs to be manually audited adds to the mental load of both reviewers and contributors (especially new ones)! I reckon we could better address this with more documentation and well-isolated safety requirements. That makes the code a lot more "composable": manipulating some code sections becomes less likely to invalidate others.

As another example, in the previous batch of EntitySet functionality, we added a few wrapper types that maintain a uniqueness invariant on their internal type. They were only intended to be constructed/handled by explicit construction/accessor methods, yet those were circumvented by some later PRs that began meshing the functionality with other parts of the ECS! (Even when the API surface did cover their use case)
That is of no fault of their own, I didn't explicitly document (for contributors) that the wrapper fields were not to be manually accessed.
However, direct field access does occur more throughout the engine: For instance, we quite commonly touch world data fields, without accessors, directly after obtaining it from an UWC.

Ultimately, I've concluded that safety requirements should not be viral whenever possible, and if they need to be, there should be documentation with the same degree of virality!

I might be too stingy on this, but I do feel that it would help in lessening the cognitive load of Query/QueryState related work :)
How do you feel about this, do you view this design problem/space differently?

As for ReleaseStateQueryData:
Why does this trait exist? To extend the item lifetime of implementors that do not borrow from QueryState.
Why can we not extend all item lifetimes? Because extending lifetimes past the lifetime of the data being referenced is unsound.
Why is this trait safe? Because we are not relying on it for safety.

For this to work, the compiler would need to reject an incorrect implementation of ReleaseStateQueryData here. Does it actually do that? As I understand it, it doesn't, though this is the part of my reasoning most likely to be wrong.
If it doesn't, is there not a contradiction here? If it does not matter which item lifetimes we extend, why does this trait exist at all?

(Note that Clone does not extend lifetimes, it borrows from self.)

@chescock
Contributor Author

That is a lot of words and I want to take the time to read them more carefully before I respond, but on first look I think we agree on the principles here and just disagree on some of the details.

@Victoronz
Contributor

That is a lot of words and I want to take the time to read them more carefully before I respond, but on first look I think we agree on the principles here and just disagree on some of the details.

I think we mostly agree too! (I do tend to express myself rather wordily, I'll try to work on that)

@chescock
Contributor Author

(I do tend to express myself rather wordily, I'll try to work on that)

No worries! I tend to alternate between writing too many words, and then deleting so many of them that my original point gets lost :). You just started reading one of the ones with too many words :).

In my experience, broad non-local safety requirements are a burden to maintain, extend, and audit, and somewhat go against usual design philosophy in Rust. I find that thinking about/working with a type (and by extension, functions utilizing that type) becomes a lot easier when every instance of that type is correct, instead of "we accept instances that can have unwanted behavior, as long as we do not call/use it in any way that exhibits the aforementioned behavior".

I definitely agree with this! One of the really nice things about rust is the way it lets us encapsulate unsafety into types that carry additional proofs!

Ultimately, I've concluded that safety requirements should not be viral whenever possible, and if they need to be, there should be documentation with the same degree of virality!

This, too!

If I define "broad non-local safety contract" more accurately: An unsafe function that imposes a restriction on a type instance that cannot be verified without knowing the entire "life" of that type. + This aforementioned type touching a broad amount of code, or being "viral".

I agree that safety contracts should be verifiable locally. Possibly I'm just conditioned to mistrust documentation, so I assume there is a valid safety contract but it isn't the one written :).

Certainly, the current safety constraint on Query::new, "This will create a query that could violate memory safety rules. Make sure that this is only called in ways that ensure the queries have unique mutable access.", is a little vague. I think it means the UnsafeWorldCell needs to have a superset of the access from QueryState::component_access. And then we rely on the QueryState type to ensure that the access is consistent with the storage ids and state and so forth.

A relevant example of this is our current usage of UnsafeWorldCell: some ways of constructing it impose this same requirement of "some of this type's API must not be called". At the same time, UWCs propagate through a lot of code, where the work to trace back to type construction is not always done, which is what I believe led to #13343. Sometimes UWCs are being passed as parameters to public trait methods, which increases this risk!

Yeah, the rules around data vs metadata for UnsafeWorldCell have some unfortunate complexity.

What was #13343 again? ... Oh, I see, the fundamental problem was a call to UnsafeWorldCell::world(). The safety comment there didn't even attempt to address the safety requirements of the call.

Maybe the issue here is that UnsafeWorldCell has viral safety requirements, because every time you pass one you have to hand-write the proof that the access is valid. And that leads to fatigue, where devs and reviewers assume that this is just another case where we're passing along obvious requirements, and therefore they miss cases where the requirements aren't actually met. Is that what you were referring to?

I agree that's a danger! (I'm not sure how to do better on UnsafeWorldCell, and I think there is just a lot of essential complexity there, but we're not trying to solve that one here.)

As another example, in the previous batch of EntitySet functionality, we added a few wrapper types that maintain a uniqueness invariant on their internal type. They were only intended to be constructed/handled by explicit construction/accessor methods, yet those were circumvented by some later PRs that began meshing the functionality with other parts of the ECS! (Even when the API surface did cover their use case) That is of no fault of their own, I didn't explicitly document (for contributors) that the wrapper fields were not to be manually accessed. However, direct field access does occur more throughout the engine: For instance, we quite commonly touch world data fields, without accessors, directly after obtaining it from an UWC.

Yeah, I really wish rust let you declare that fields or types were unsafe to modify. I'm not sure how to do better in the language we have than making non-pub fields with unsafe constructors and accessors, though.

Like, we can't prevent safe code in the query module from swapping out the world or state with one from a different world, and having all the ComponentIds be invalid!
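
For reference, the pattern I mean looks roughly like this. `UniqueVec` is a made-up example type, not one of the actual EntitySet wrappers, and the invariant (no duplicates) is just a stand-in.

```rust
mod unique {
    /// Invariant: `items` contains no duplicate values.
    pub struct UniqueVec {
        // Private, so code outside this module cannot break the invariant.
        // Safe code inside this module still can, which is the limitation
        // described above.
        items: Vec<u32>,
    }

    impl UniqueVec {
        /// Checked constructor: returns `None` if any value is repeated.
        pub fn new(items: Vec<u32>) -> Option<Self> {
            let mut sorted = items.clone();
            sorted.sort_unstable();
            sorted.dedup();
            (sorted.len() == items.len()).then(|| Self { items })
        }

        /// # Safety
        /// `items` must not contain duplicate values.
        pub unsafe fn new_unchecked(items: Vec<u32>) -> Self {
            Self { items }
        }

        /// Read-only access cannot invalidate the invariant, so it is safe.
        pub fn as_slice(&self) -> &[u32] {
            &self.items
        }
    }
}

use unique::UniqueVec;
```

The proof obligation lives at the `unsafe` constructor, but nothing stops a later edit inside `mod unique` from writing to `items` directly.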

I might be too stingy on this, but I do feel that it would help in lessening the cognitive load of Query/QueryState related work :) How do you feel about this, do you view this design problem/space differently?

Adding a new trait increases the cognitive load, so the question is whether it saves more in reasoning about unsafe. That's why I was happy to make a trait unsafe when we needed a new trait anyway, but want to avoid it when we can use an off-the-shelf trait like Deref.

And I don't think it helps all that much, given that we have other safety requirements to satisfy. It mostly serves to split the proof into multiple places. We're using concrete types, so the code calling new can already rely on &T and Box<T> having deterministic Deref impls without needing to indirect through another trait. (I might feel differently if we wanted to support an unbounded set of types, but right now it's just the two.)

Also, I think a non-deterministic Deref might ... work fine? Each QueryState needs to be individually valid, but I don't think it matters if they're all the same. Like, you could probably construct a query that iterates a different set of tables each time, but that should still be sound as long as each set of tables was valid for the original query.
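
As a toy illustration of what "non-deterministic Deref" means here (`TableSet`, `Alternating`, and `visit` are hypothetical and have nothing to do with the real `QueryState`), a safe `Deref` impl can hand out a different target on each call:

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical stand-in for a state that lists matching tables.
struct TableSet {
    tables: Vec<u32>,
}

// Holds two individually valid states and alternates between them.
struct Alternating {
    first: TableSet,
    second: TableSet,
    use_second: Cell<bool>,
}

impl Deref for Alternating {
    type Target = TableSet;

    fn deref(&self) -> &TableSet {
        // Flip the toggle on every call, so consecutive derefs disagree.
        let use_second = self.use_second.get();
        self.use_second.set(!use_second);
        if use_second {
            &self.second
        } else {
            &self.first
        }
    }
}

// Generic code that only assumes `Deref<Target = TableSet>`.
fn visit<S: Deref<Target = TableSet>>(state: &S) -> Vec<u32> {
    state.tables.clone()
}
```

Each call to `visit` sees a consistent `TableSet`, just not the same one as the previous call.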

This does make me realize I'm missing safety requirements on ConsumableQueryState, though. Like, storage_ids() needs to return distinct values that are each valid for Data and Filter. Right now we rely on those being correct because they are a QueryState, but I don't think that proof is being carried through the trait right now.

(And, of course, many fields like QueryState::fetch_state are pub(crate), so we technically have to ensure that nobody does nonsense like replacing the ComponentId on a QueryState<&T>. Unsafe fields would be nice...)

As for ReleaseStateQueryData: Why does this trait exist? To extend the item lifetime of implementors that do not borrow from QueryState. Why can we not extend all item lifetimes? Because extending lifetimes past the lifetime of the data being referenced is unsound. Why is this trait safe? Because we are not relying on it for safety.

For this to work, the compiler would need to reject an incorrect implementation of ReleaseStateQueryData here. Does it actually do that? As I understand it, it doesn't, though this is the part of my reasoning most likely to be wrong. If it doesn't, is there not a contradiction here? If it does not matter which item lifetimes we extend, why does this trait exist at all?

(Note that Clone does not extend lifetimes, it borrows from self.)

So, most types don't use the 's lifetime, and can implement ReleaseStateQueryData as

impl ReleaseStateQueryData for Thing {
    fn release_state<'w>(item: Self::Item<'w, '_>) -> Self::Item<'w, 'static> {
        item
    }
}

An impl like that will compile for type Item<'w, 's> = &'w T because the types are equal, but would not compile for type Item<'w, 's> = FilteredEntityRef<'w, 's> because they are not. So I think I would say that the compiler will reject incorrect implementations.

There may be cases where the types are different but you do something like Cow::Owned(item.into_owned()) to make the lifetimes work, but that's also perfectly sound.

And you could, of course, do unsafe { mem::transmute(item) }, but the bad unsafe code there is the transmute and not the impl.
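
Here is a self-contained version of those cases, with the traits cut down to the minimum. `Data`, `ReleaseState`, `WorldOnly`, `Named`, and `release_named` are simplified stand-ins assumed for illustration, not the real definitions.

```rust
use std::borrow::Cow;

// Hypothetical reduction of `QueryData` to just the item type.
trait Data {
    type Item<'w, 's>;
}

// Hypothetical reduction of `ReleaseStateQueryData`.
trait ReleaseState: Data {
    fn release_state<'w>(item: Self::Item<'w, '_>) -> Self::Item<'w, 'static>;
}

// Case 1: the item ignores `'s`, so input and output types are equal and
// the identity impl type-checks.
struct WorldOnly;

impl Data for WorldOnly {
    type Item<'w, 's> = &'w u32;
}

impl ReleaseState for WorldOnly {
    fn release_state<'w>(item: Self::Item<'w, '_>) -> Self::Item<'w, 'static> {
        item
    }
}

// Case 2: the item may borrow from the state, but can be made owned.
struct Named;

impl Data for Named {
    type Item<'w, 's> = (Cow<'s, str>, &'w u32);
}

impl ReleaseState for Named {
    fn release_state<'w>(item: Self::Item<'w, '_>) -> Self::Item<'w, 'static> {
        let (name, value) = item;
        // Copying the string is what makes the `'static` lifetime legitimate.
        (Cow::Owned(name.into_owned()), value)
    }
}

// Case 3 (does not compile, so it is left as a comment): with
// `type Item<'w, 's> = &'s str`, returning `item` unchanged would need a
// `&'_ str` to become `&'static str`, which the borrow checker rejects.

// Helper showing that the released item outlives the borrowed name.
fn release_named<'w>(state_name: &str, value: &'w u32) -> (Cow<'static, str>, &'w u32) {
    Named::release_state((Cow::Borrowed(state_name), value))
}
```

In this reduced form, the only way to get case 3 past the compiler is an `unsafe` block inside the method body.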

Labels
A-ECS Entities, components, systems, and events C-Code-Quality A section of code that is hard to understand or change C-Usability A targeted quality-of-life change that makes Bevy easier to use D-Complex Quite challenging from either a design or technical perspective. Ask for help! M-Needs-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide S-Needs-Review Needs reviewer attention (from anyone!) to move forward X-Contentious There are nontrivial implications that should be thought through