[WIP] Nested Virtualization: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE #322
More info: https://docs.kernel.org/virt/kvm/api.html#capabilities-that-can-be-enabled-on-vcpus Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
6cce9ed to faa8422
This type is a helper that makes get_nested_state() and set_nested_state(), which are added in the next commit, much more convenient to use. Effectively, KVM expects a dynamic buffer with a header reporting the size to either store the nested state in or load it from. As such data structures with a certain alignment are challenging to work with (note that a Vec<u8> always has an alignment of 1, but we need 4), this type sacrifices a little memory overhead in some cases for better UX. As the type implements AsRef<[u8]> and AsMut<[u8]>, which will also be used as the parameter for get_nested_state() and set_nested_state(), users will be able to use custom buffers tailored to their use case, e.g., the SVM state is smaller than the VMX state. They will then also be able to pass in types that are compatible with `serde`. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
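To illustrate the idea behind such a helper, here is a minimal sketch of a 4-byte-aligned byte buffer that exposes itself via AsRef<[u8]> and AsMut<[u8]>. The name `AlignedStateBuffer`, the constant `BUF_SIZE`, and the `reported_size()` method are hypothetical and are not the type added by this PR; the assumed layout (a 128-byte header with the `size` field at bytes 4..8, followed by up to two 4 KiB pages) is taken from the kernel's `struct kvm_nested_state` as I understand it.

```rust
/// Assumed worst case (VMX): 128-byte header plus two 4 KiB pages.
const BUF_SIZE: usize = 128 + 2 * 4096;

/// Hypothetical stand-in for the helper type: a fixed-size byte buffer
/// whose storage is guaranteed to be aligned to 4 bytes.
#[repr(C, align(4))]
pub struct AlignedStateBuffer {
    bytes: [u8; BUF_SIZE],
}

impl AlignedStateBuffer {
    /// Creates a zeroed buffer and stores the capacity in the `size`
    /// field of the header (assumed to live at bytes 4..8).
    pub fn new() -> Self {
        let mut buf = Self { bytes: [0; BUF_SIZE] };
        buf.bytes[4..8].copy_from_slice(&(BUF_SIZE as u32).to_ne_bytes());
        buf
    }

    /// Reads the `size` field back from the header.
    pub fn reported_size(&self) -> u32 {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(&self.bytes[4..8]);
        u32::from_ne_bytes(raw)
    }
}

impl AsRef<[u8]> for AlignedStateBuffer {
    fn as_ref(&self) -> &[u8] {
        &self.bytes
    }
}

impl AsMut<[u8]> for AlignedStateBuffer {
    fn as_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}
```

The buffer is always sized for the larger VMX case, which is the memory overhead the commit message refers to when the state is actually SVM.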
These calls are relevant for live-migration and state save/resume when nested virtualization is used. I tested everything with a nested guest in Cloud Hypervisor, but these patches are not yet upstream. Signed-off-by: Philipp Schuster <[email protected]> On-behalf-of: SAP [email protected]
Heya!
If I'm reading the kernel code correctly, then since commit 6ca00dfafda7 ("KVM: x86: Modify struct kvm_nested_state to have explicit fields for data") this isn't actually true anymore: The u8 flexible array member that kvm_nested_state used to have got removed and replaced with a strongly typed union field with a statically known size. The only reason it was declared as a zero-sized member was backwards compatibility with Linux <5.2. But here we don't really need to worry about that, so we can just have the ioctl wrappers take cc @bonzini who probably knows more about this than me though
Please note that kvm_nested_state from the current bindgen output is too small and only contains the "header". Various checks in my code verify this. The actual size of the nested state also differs between VMX and SVM; I verified this on my development machine as well. The effective size can be verified at runtime using My introduced helper type solves this using stack-allocated memory. But for SVM, we waste a page every time, as the SVM state is smaller.
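The size difference mentioned above can be made concrete with a little arithmetic. This is only a sketch under stated assumptions: the constants are defined locally and are meant to mirror the x86 kernel UAPI values (a 128-byte header, and 0x1000 for both KVM_STATE_NESTED_VMX_VMCS_SIZE and KVM_STATE_NESTED_SVM_VMCB_SIZE); the `Vendor` enum and both functions are hypothetical names used for illustration.

```rust
/// Assumed header size: flags (2) + format (2) + size (4) + hdr union (120).
const HEADER_SIZE: usize = 128;
/// Assumed to match the kernel UAPI constant of the same name.
const KVM_STATE_NESTED_VMX_VMCS_SIZE: usize = 0x1000;
/// Assumed to match the kernel UAPI constant of the same name.
const KVM_STATE_NESTED_SVM_VMCB_SIZE: usize = 0x1000;

/// Hypothetical selector for the two nested state formats.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Vendor {
    Vmx,
    Svm,
}

/// Maximum nested state size per vendor: VMX carries vmcs12 and
/// shadow_vmcs12, SVM carries a single vmcb12.
pub const fn max_nested_state_size(vendor: Vendor) -> usize {
    match vendor {
        Vendor::Vmx => HEADER_SIZE + 2 * KVM_STATE_NESTED_VMX_VMCS_SIZE,
        Vendor::Svm => HEADER_SIZE + KVM_STATE_NESTED_SVM_VMCB_SIZE,
    }
}

/// Bytes left unused when a VMX-sized buffer holds SVM state.
pub const fn wasted_bytes_for_svm() -> usize {
    max_nested_state_size(Vendor::Vmx) - max_nested_state_size(Vendor::Svm)
}
```

Under these assumptions, a buffer sized for VMX leaves exactly one 4 KiB page unused on SVM hosts.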
argh, you're right, the bindgen version is indeed rubbish, hadn't actually looked at bindings.rs.
yes this is all fine - I misinterpreted your PR description, sorry. I thought you were treating the kvm_nested_state struct as having a flexible array member (I got confused by the &[u8] conversions). As for the serde thingy, I think the diff below is more in line with how serde support for other structures is handled in this crate, and allows the ioctl wrappers to be implemented in terms
diff --git a/kvm-bindings/src/x86_64/bindings.rs b/kvm-bindings/src/x86_64/bindings.rs
index d49ee9a4f4f3..248b4a8d76ac 100644
--- a/kvm-bindings/src/x86_64/bindings.rs
+++ b/kvm-bindings/src/x86_64/bindings.rs
@@ -1999,6 +1999,10 @@ impl Default for kvm_vmx_nested_state_data {
}
#[repr(C)]
#[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
pub struct kvm_vmx_nested_state_hdr {
pub vmxon_pa: __u64,
pub vmcs12_pa: __u64,
@@ -2009,6 +2013,10 @@ pub struct kvm_vmx_nested_state_hdr {
}
#[repr(C)]
#[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
pub struct kvm_vmx_nested_state_hdr__bindgen_ty_1 {
pub flags: __u16,
}
@@ -2065,6 +2073,10 @@ impl Default for kvm_svm_nested_state_data {
}
#[repr(C)]
#[derive(Debug, Default, Copy, Clone, PartialEq)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
pub struct kvm_svm_nested_state_hdr {
pub vmcb_pa: __u64,
}
@@ -2087,6 +2099,10 @@ pub struct kvm_nested_state {
}
#[repr(C)]
#[derive(Copy, Clone)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::Immutable, zerocopy::FromBytes)
+)]
pub union kvm_nested_state__bindgen_ty_1 {
pub vmx: kvm_vmx_nested_state_hdr,
pub svm: kvm_svm_nested_state_hdr,
diff --git a/kvm-bindings/src/x86_64/mod.rs b/kvm-bindings/src/x86_64/mod.rs
index 08e1bab89f26..9d9b69150fdc 100644
--- a/kvm-bindings/src/x86_64/mod.rs
+++ b/kvm-bindings/src/x86_64/mod.rs
@@ -9,6 +9,8 @@ pub mod fam_wrappers;
#[cfg(feature = "serde")]
mod serialize;
+pub mod nested;
+
pub use self::bindings::*;
#[cfg(feature = "fam-wrappers")]
pub use self::fam_wrappers::*;
diff --git a/kvm-bindings/src/x86_64/nested.rs b/kvm-bindings/src/x86_64/nested.rs
new file mode 100644
index 000000000000..1298984f7538
--- /dev/null
+++ b/kvm-bindings/src/x86_64/nested.rs
@@ -0,0 +1,59 @@
+use ::{kvm_nested_state__bindgen_ty_1, KVM_STATE_NESTED_VMX_VMCS_SIZE};
+use KVM_STATE_NESTED_SVM_VMCB_SIZE;
+
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub union kvm_nested_state__data {
+ pub vmx: kvm_vmx_nested_state_data,
+ pub svm: kvm_svm_nested_state_data,
+}
+
+impl Default for kvm_nested_state__data {
+ fn default() -> Self {
+ let mut s = ::std::mem::MaybeUninit::<Self>::uninit();
+ unsafe {
+ ::std::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+ s.assume_init()
+ }
+ }
+}
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_vmx_nested_state_data {
+ pub vmcs12: [u8; KVM_STATE_NESTED_VMX_VMCS_SIZE as usize],
+ pub shadow_vmcs12: [u8; KVM_STATE_NESTED_VMX_VMCS_SIZE as usize],
+}
+
+#[derive(Clone, Copy)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_svm_nested_state_data {
+ pub vmcb12: [u8; KVM_STATE_NESTED_SVM_VMCB_SIZE as usize],
+}
+
+#[derive(Clone, Copy, Default)]
+#[cfg_attr(
+ feature = "serde",
+ derive(zerocopy::IntoBytes, zerocopy::Immutable, zerocopy::FromBytes)
+)]
+#[repr(C)]
+pub struct kvm_nested_state {
+ pub flags: u16,
+ pub format: u16,
+ pub size: u32,
+ pub hdr: kvm_nested_state__bindgen_ty_1,
+ pub data: kvm_nested_state__data,
+}
diff --git a/kvm-bindings/src/x86_64/serialize.rs b/kvm-bindings/src/x86_64/serialize.rs
index 11d90d533936..e0e109e15cd7 100644
--- a/kvm-bindings/src/x86_64/serialize.rs
+++ b/kvm-bindings/src/x86_64/serialize.rs
@@ -10,6 +10,8 @@ use bindings::{
use fam_wrappers::kvm_xsave2;
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use zerocopy::{transmute, FromBytes, FromZeros, Immutable, IntoBytes};
+use kvm_nested_state__bindgen_ty_1;
+use nested::{kvm_nested_state, kvm_nested_state__data};
serde_impls!(
kvm_regs,
@@ -31,7 +33,8 @@ serde_impls!(
kvm_cpuid2,
kvm_xsave,
kvm_xsave2,
- kvm_irqchip
+ kvm_irqchip,
+ kvm_nested_state
);
// SAFETY: zerocopy's derives explicitly disallow deriving for unions where
@@ -94,10 +97,35 @@ unsafe impl Immutable for kvm_irqchip__bindgen_ty_1 {
}
}
+// SAFETY: zerocopy's derives explicitly disallow deriving for unions where
+// the fields have different sizes, due to the smaller fields having padding.
+// Miri however does not complain about these implementations (e.g. about
+// reading the "padding" for one union field as valid data for a bigger one)
+unsafe impl IntoBytes for kvm_nested_state__bindgen_ty_1 {
+ fn only_derive_is_allowed_to_implement_this_trait()
+ where
+ Self: Sized
+ {
+ }
+}
+
+// SAFETY: zerocopy's derives explicitly disallow deriving for unions where
+// the fields have different sizes, due to the smaller fields having padding.
+// Miri however does not complain about these implementations (e.g. about
+// reading the "padding" for one union field as valid data for a bigger one)
+unsafe impl IntoBytes for kvm_nested_state__data {
+ fn only_derive_is_allowed_to_implement_this_trait()
+ where
+ Self: Sized
+ {
+ }
+}
+
#[cfg(test)]
mod tests {
use super::*;
use bindings::*;
+ use nested;
fn is_serde<T: Serialize + for<'de> Deserialize<'de> + Default>() {
let config = bincode::config::standard();
@@ -152,6 +180,7 @@ mod tests {
is_serde::<kvm_xcrs>();
is_serde::<kvm_irqchip>();
is_serde::<kvm_mp_state>();
+ is_serde::<nested::kvm_nested_state>();
}
fn is_serde_json<T: Serialize + for<'de> Deserialize<'de> + Default>() {
@@ -184,5 +213,6 @@ mod tests {
is_serde_json::<kvm_xcrs>();
is_serde_json::<kvm_irqchip>();
is_serde_json::<kvm_mp_state>();
+ is_serde_json::<nested::kvm_nested_state>();
}
}
Summary of the PR
For live-migration and state save/resume in Cloud Hypervisor [PR] and other VMMs using nested virtualization, the ioctls needed to read the nested state from KVM and load it back into KVM were missing so far.
Handling nested state is a little more complex than anticipated because of the "dynamic buffer" (a structure with a header reporting its length) that KVM uses in its interface. Creating a nice, safe, and easy-to-use interface took multiple iterations. My solution is a helper type acting as a stack buffer that is good enough for most use cases. To give users more flexibility, the new functions consume AsRef/AsMut<[u8]>, in case a user wants to use a serde-compatible type instead of the new KvmNestedStateBuffer.
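As a rough illustration of why accepting AsRef/AsMut<[u8]> is flexible, the sketch below shows two helpers that work with any byte container, for example a Vec<u8> sized for SVM or a fixed array. The function names `prepare` and `reported_size` are hypothetical and are not the API added by this PR; the position of the `size` field at bytes 4..8 is an assumption based on the kernel's struct layout. A real ioctl wrapper would additionally have to check the buffer's alignment, which this sketch does not do.

```rust
/// Hypothetical helper: writes the buffer's capacity into the `size`
/// field (assumed at bytes 4..8) and returns the capacity.
/// Returns `None` if the buffer cannot hold the field or the length
/// does not fit into a `u32`.
pub fn prepare<B: AsMut<[u8]>>(buf: &mut B) -> Option<usize> {
    let bytes = buf.as_mut();
    let len = bytes.len();
    if len > u32::MAX as usize {
        return None;
    }
    bytes
        .get_mut(4..8)?
        .copy_from_slice(&(len as u32).to_ne_bytes());
    Some(len)
}

/// Hypothetical helper: reads the `size` field back from any buffer.
/// Returns `None` if the buffer is too short to contain the field.
pub fn reported_size<B: AsRef<[u8]>>(buf: &B) -> Option<u32> {
    let field = buf.as_ref().get(4..8)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(field);
    Some(u32::from_ne_bytes(raw))
}
```

Because both helpers are generic over the container, a caller can pass a smaller heap buffer on SVM hosts or a serde-friendly wrapper type, as long as it exposes its bytes through these two traits.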
Hints for Reviewers
Steps to Undraft
Requirements
Before submitting your PR, please make sure you addressed the following
requirements:
git commit -s), and the commit message has max 60 characters for the summary and max 75 characters for each description line.
test.
Release" section of CHANGELOG.md (if no such section exists, please create one).
unsafe code is properly documented.