Skip to content

[WIP] Nested Virtualization: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE #322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

phip1611
Copy link

@phip1611 phip1611 commented May 20, 2025

Summary of the PR

For live-migration and state save/resume in Cloud Hypervisor [PR] and other VMMs using nested virtualization, the relevant IOCTLs were missing so far to get the state from/into KVM.

Handling nested state is a little more complex than anticipated due to the "dynamic buffer" (structure with header reporting the length) that KVM uses in its interface. Creating a nice, safe, and easy interface took multiple iterations.. To mitigate this, my solution is a helper type acting as stack buffer that is good enough for most use-cases. To provide users more flexibility, the new functions consumes AsRef/AsMut<[u8]>, in case a user wants to use a type that is serde compatible over the new KvmNestedStateBuffer.

Hints for Reviewers

  • Please review this commit-by-commit.
  • This is the result of multiple internal iterations; I'm however open to design discussions!

Steps to Undraft

Requirements

Before submitting your PR, please make sure you addressed the following
requirements:

  • All commits in this PR have Signed-Off-By trailers (with
    git commit -s), and the commit message has max 60 characters for the
    summary and max 75 characters for each description line.
  • All added/changed functionality has a corresponding unit/integration
    test.
  • All added/changed public-facing functionality has entries in the "Upcoming
    Release" section of CHANGELOG.md (if no such section exists, please create one).
  • Any newly added unsafe code is properly documented.

@phip1611 phip1611 force-pushed the nested-state branch 4 times, most recently from 6cce9ed to faa8422 Compare May 20, 2025 08:59
phip1611 added 2 commits May 20, 2025 11:00
This type is a helper making the use of get_nested_state() and
set_nested_state(), which are added in the next commit, much more
convenient.

Effectively, KVM expects a dynamic buffer with a header reporting
the size to either store the nested state in or load it from. As such
data structures with a certain alignment are challenging to work with
(note that Vec<u8> always have an alignment of 1 but we need 4),
this type sacrifices a little memory in some cases overhead for better
UX.

As the type implements AsRef<[u8]> and AsMut<[u8]>, which will also
be used as the parameter for get_nested_state() and set_nested_state(),
users will be able to use custom buffers tailored to their use case,
e.g., the SVM state is smaller than the VMX state.

They then will also be able to pass in types that are compatible
with `serde`.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
These calls are relevant for live-migration and state save/resume
when nested virtualization is used.

I tested everything with a nested guest in Cloud Hypervisor, but these
patches are not yet upstream.

Signed-off-by: Philipp Schuster <[email protected]>
On-behalf-of: SAP [email protected]
@roypat
Copy link
Member

roypat commented May 20, 2025

Heya!

Handling nested state is a little more complex than anticipated due to the "dynamic buffer" (structure with header reporting the length) that KVM uses in its interface.

If I'm reading the kernel code correctly, then since commit 6ca00dfafda7 ("KVM: x86: Modify struct kvm_nested_state to have explicit fields for data") this isn't actually true anymore: The u8 flexible array member that kvm_nested_state used to have got removed and replaced with a strongly typed union field with a statically known size. The only reason it was declared as a zero-sized member was backwards compatibility with Linux <5.2. But here we don't really need to worry about that, so we can just have the ioctl wrappers take &[mut] kvm_nested_state as generated by bindgen (potentially with a check that kvm_nested_state.size == sizeof::<kvm_nested_state>() for sanity and protected against memory unsafety in case the structure does ever get extended again).

cc @bonzini who probably knows more about this than me though

@phip1611
Copy link
Author

phip1611 commented May 20, 2025

Please note that kvm_nested_state from the current bindgen is too small and only contains the "header". Various checks in my code verify this. The actual size of the nested state also differs between VMX and SVM, I also verified this on my developer machine.

The effective size can be verified during runtime using kvm.check_extension_int(Cap::NestedState)

My introduced helper type solves this using stack allocated memory. But for SVM, we waste a page every time as the SVM state is smaller.

@roypat
Copy link
Member

roypat commented May 20, 2025

Please note that kvm_nested_state from the current bindgen is too small and only contains the "header".

argh, you're right, the bindgen version is indeed rubbish, hadnt actually looked at bindings.rs.

Various checks in my code verify this. The actual size of the nested state also differs between VMX and SVM, I also verified this on my developer machine.

The effective size can be verified during runtime using kvm.check_extension_int(Cap::NestedState)

My introduced helper type solves this using stack allocated memory. But for SVM, we waste a page every time as the SVM state is smaller.

yes this is all fine - I misinterpreted your PR description, sorry. I thought you were treating the kvm_nested_state struct as having a flexible array member (I got confused by the &[u8] conversions).

As for the serde thingy, I think the diff below is more in line with how serde support for other structures is handled in this crate, and allows the ioctl wrappers to be implemented in terms struct KvmNestedStateBuffer instead of [u8] slices:

diff --git a/kvm-bindings/src/x86_64/bindings.rs b/kvm-bindings/src/x86_64/bindings.rs
index d49ee9a4f4f3..248b4a8d76ac 100644
--- a/kvm-bindings/src/x86_64/bindings.rs
+++ b/kvm-bindings/src/x86_64/bindings.rs
@@ -1999,6 +1999,10 @@ impl Default for kvm_vmx_nested_state_data {
 }
 #[repr(C)]
 #[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
 pub struct kvm_vmx_nested_state_hdr {
     pub vmxon_pa: __u64,
     pub vmcs12_pa: __u64,
@@ -2009,6 +2013,10 @@ pub struct kvm_vmx_nested_state_hdr {
 }
 #[repr(C)]
 #[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
 pub struct kvm_vmx_nested_state_hdr__bindgen_ty_1 {
     pub flags: __u16,
 }
@@ -2065,6 +2073,10 @@ impl Default for kvm_svm_nested_state_data {
 }
 #[repr(C)]
 #[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
 pub struct kvm_svm_nested_state_hdr {
     pub vmcb_pa: __u64,
 }
@@ -2087,6 +2099,10 @@ pub struct kvm_nested_state {
 }
 #[repr(C)]
 #[derive(Copy, Clone)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::Immutable, zerocopy::FromBytes)
+)]
 pub union kvm_nested_state__bindgen_ty_1 {
     pub vmx: kvm_vmx_nested_state_hdr,
     pub svm: kvm_svm_nested_state_hdr,
diff --git a/kvm-bindings/src/x86_64/mod.rs b/kvm-bindings/src/x86_64/mod.rs
index 08e1bab89f26..9d9b69150fdc 100644
--- a/kvm-bindings/src/x86_64/mod.rs
+++ b/kvm-bindings/src/x86_64/mod.rs
@@ -9,6 +9,8 @@ pub mod fam_wrappers;
 #[cfg(feature = "serde")]
 mod serialize;
 
+pub mod nested;
+
 pub use self::bindings::*;
 #[cfg(feature = "fam-wrappers")]
 pub use self::fam_wrappers::*;
diff --git a/kvm-bindings/src/x86_64/nested.rs b/kvm-bindings/src/x86_64/nested.rs
new file mode 100644
index 000000000000..1298984f7538
--- /dev/null
+++ b/kvm-bindings/src/x86_64/nested.rs
@@ -0,0 +1,59 @@
+use ::{kvm_nested_state__bindgen_ty_1, KVM_STATE_NESTED_VMX_VMCS_SIZE};
+use KVM_STATE_NESTED_SVM_VMCB_SIZE;
+
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub union kvm_nested_state__data {
+    pub vmx: kvm_vmx_nested_state_data,
+    pub svm: kvm_svm_nested_state_data,
+}
+
+impl Default for kvm_nested_state__data {
+    fn default() -> Self {
+        let mut s = ::std::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::std::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_vmx_nested_state_data {
+    pub vmcs12: [u8; KVM_STATE_NESTED_VMX_VMCS_SIZE as usize],
+    pub shadow_vmcs12: [u8; KVM_STATE_NESTED_VMX_VMCS_SIZE as usize],
+}
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_svm_nested_state_data {
+    pub vmcb12: [u8; KVM_STATE_NESTED_SVM_VMCB_SIZE as usize],
+}
+
+#[derive(Clone, Copy, Default)]
+#[cfg_attr(
+    feature = "serde",
+    derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_nested_state {
+    pub flags: u16,
+    pub format: u16,
+    pub size: u32,
+    pub hdr: kvm_nested_state__bindgen_ty_1,
+    pub data: kvm_nested_state__data,
+}
diff --git a/kvm-bindings/src/x86_64/serialize.rs b/kvm-bindings/src/x86_64/serialize.rs
index 11d90d533936..e0e109e15cd7 100644
--- a/kvm-bindings/src/x86_64/serialize.rs
+++ b/kvm-bindings/src/x86_64/serialize.rs
@@ -10,6 +10,8 @@ use bindings::{
 use fam_wrappers::kvm_xsave2;
 use serde::{Deserialize, Deserializer, Serialize, Serializer};
 use zerocopy::{transmute, FromBytes, FromZeros, Immutable, IntoBytes};
+use kvm_nested_state__bindgen_ty_1;
+use nested::{kvm_nested_state, kvm_nested_state__data};
 
 serde_impls!(
     kvm_regs,
@@ -31,7 +33,8 @@ serde_impls!(
     kvm_cpuid2,
     kvm_xsave,
     kvm_xsave2,
-    kvm_irqchip
+    kvm_irqchip,
+    kvm_nested_state
 );
 
 // SAFETY: zerocopy's derives explicitly disallow deriving for unions where
@@ -94,10 +97,35 @@ unsafe impl Immutable for kvm_irqchip__bindgen_ty_1 {
     }
 }
 
+// SAFETY: zerocopy's derives explicitly disallow deriving for unions where
+// the fields have different sizes, due to the smaller fields having padding.
+// Miri however does not complain about these implementations (e.g. about
+// reading the "padding" for one union field as valid data for a bigger one)
+unsafe impl IntoBytes for kvm_nested_state__bindgen_ty_1 {
+    fn only_derive_is_allowed_to_implement_this_trait()
+    where
+        Self: Sized
+    {
+    }
+}
+
+// SAFETY: zerocopy's derives explicitly disallow deriving for unions where
+// the fields have different sizes, due to the smaller fields having padding.
+// Miri however does not complain about these implementations (e.g. about
+// reading the "padding" for one union field as valid data for a bigger one)
+unsafe impl IntoBytes for kvm_nested_state__data {
+    fn only_derive_is_allowed_to_implement_this_trait()
+    where
+        Self: Sized
+    {
+    }
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
     use bindings::*;
+    use nested;
 
     fn is_serde<T: Serialize + for<'de> Deserialize<'de> + Default>() {
         let config = bincode::config::standard();
@@ -152,6 +180,7 @@ mod tests {
         is_serde::<kvm_xcrs>();
         is_serde::<kvm_irqchip>();
         is_serde::<kvm_mp_state>();
+        is_serde::<nested::kvm_nested_state>();
     }
 
     fn is_serde_json<T: Serialize + for<'de> Deserialize<'de> + Default>() {
@@ -184,5 +213,6 @@ mod tests {
         is_serde_json::<kvm_xcrs>();
         is_serde_json::<kvm_irqchip>();
         is_serde_json::<kvm_mp_state>();
+        is_serde_json::<nested::kvm_nested_state>();
     }
 }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants