[Block] Add option to configure block device flush #2447
Changes from all commits
920ad88
cb1eb6e
4710a4a
f31435e
1e5c40c
8dcfda5
140cf31
a09b9f6
32ca43d
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
# Block device caching strategies | ||
|
||
Firecracker offers the possibility of choosing the block device caching | ||
strategy. The caching strategy affects the path that data written from inside | ||
the microVM takes to the host's persistent storage. | ||
|
||
## How it works | ||
|
||
When installing a block device through a PUT /drives API call, users can choose | ||
the caching strategy by inserting a `cache_type` field in the JSON body of the | ||
request. The available cache types are: | ||
|
||
- `Unsafe` | ||
- `Writeback` | ||
|
||
### Unsafe mode (default) | ||
|
||
When the block caching strategy is set to `Unsafe`, the device advertises the | ||
VirtIO `flush` feature to the guest driver. If the feature is negotiated when | ||
the device is activated, the guest driver can send flush requests to the | ||
device, but the device only acknowledges each request without performing any | ||
flush on the host side. Data that has not yet been committed to disk continues | ||
to reside in the host page cache. | ||
|
||
### Writeback mode | ||
|
||
When the block caching strategy is set to `Writeback`, the device advertises | ||
the VirtIO `flush` feature to the guest driver. If the feature is negotiated | ||
when the device is activated, the guest driver can send flush requests to the | ||
device. When the device executes a flush request, it performs an `fsync` | ||
syscall on the backing block file, committing that file's data held in the | ||
host page cache to disk. | ||
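
The difference between the two modes can be sketched in a few lines of Rust. This is a minimal illustration, not Firecracker's implementation: `handle_flush` and its boolean return value are hypothetical names introduced here, and the sketch assumes the backing block file is an ordinary `std::fs::File`, whose `sync_all` method issues `fsync` on Linux.

```rust
use std::fs::File;
use std::io;

/// Mirrors the two cache types described in this document.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CacheType {
    Unsafe,
    Writeback,
}

/// Hypothetical helper showing what a guest flush request does per mode.
/// Returns `true` only when the data was actually synced to the backing file.
pub fn handle_flush(backing_file: &File, cache_type: CacheType) -> io::Result<bool> {
    match cache_type {
        // Acknowledge the request only; dirty data stays in the host page cache.
        CacheType::Unsafe => Ok(false),
        CacheType::Writeback => {
            // Ask the host kernel to commit the file's data and metadata.
            backing_file.sync_all()?;
            Ok(true)
        }
    }
}
```

In both modes the guest receives a successful acknowledgement; only in the `Writeback` case does that acknowledgement imply that the host was asked to persist the data.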
|
||
## Supported use cases | ||
|
||
The caching strategy is a trade-off between performance and data integrity: | ||
|
||
- `Unsafe` | ||
- enhances performance as fewer syscalls and IO operations are performed when | ||
running workloads | ||
- sacrifices data integrity in situations where the host simply loses the | ||
contents of the page cache without committing them to the backing storage | ||
(such as a power outage) | ||
- recommended for use cases with ephemeral storage, such as serverless | ||
environments | ||
- `Writeback` | ||
- ensures that once a flush request has been acknowledged by the host, the data | ||
is committed to the backing storage | ||
- sacrifices performance, from boot time increases to greater | ||
emulation-related latencies when running workloads | ||
- recommended for use cases with low power environments, such as embedded | ||
environments | ||
|
||
## How to configure it | ||
|
||
Example sequence that configures a block device with a caching strategy: | ||
|
||
```bash | ||
curl --unix-socket ${socket} -i \ | ||
-X PUT "http://localhost/drives/dummy" \ | ||
-H "accept: application/json" \ | ||
-H "Content-Type: application/json" \ | ||
-d "{ | ||
\"drive_id\": \"dummy\", | ||
\"path_on_host\": \"${drive_path}\", | ||
\"is_root_device\": false, | ||
\"is_read_only\": false, | ||
\"cache_type\": \"Writeback\" | ||
}" | ||
``` |
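
As a rough sketch of how the `cache_type` value in the request body could map onto the two modes, the snippet below resolves an optional string to a cache type. `parse_cache_type` is a hypothetical helper, not part of Firecracker's API. The sketch assumes that an omitted field falls back to the documented default (`Unsafe`) and that matching is case-sensitive.

```rust
/// Mirrors the two cache types described in this document.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CacheType {
    Unsafe,
    Writeback,
}

impl Default for CacheType {
    fn default() -> Self {
        CacheType::Unsafe
    }
}

/// Hypothetical helper: resolve the optional `cache_type` string of a request.
/// Assumption: a missing field selects the default strategy.
pub fn parse_cache_type(value: Option<&str>) -> Result<CacheType, String> {
    match value {
        None => Ok(CacheType::default()),
        Some("Unsafe") => Ok(CacheType::Unsafe),
        Some("Writeback") => Ok(CacheType::Writeback),
        Some(other) => Err(format!("unknown cache_type: {}", other)),
    }
}
```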
|
@@ -30,16 +30,41 @@ use super::{ | |
use crate::virtio::VIRTIO_MMIO_INT_CONFIG; | ||
use crate::Error as DeviceError; | ||
|
||
use serde::{Deserialize, Serialize}; | ||
|
||
/// Configuration options for disk caching. | ||
#[derive(Clone, Copy, Debug, Deserialize, PartialEq, Serialize)] | ||
pub enum CacheType { | ||
/// Flushing mechanic will be advertised to the guest driver, but | ||
/// the operation will be a noop. | ||
Unsafe, | ||
/// Flushing mechanic will be advertised to the guest driver and | ||
/// flush requests coming from the guest will be performed using | ||
/// `fsync`. | ||
Writeback, | ||
} | ||
|
||
impl Default for CacheType { | ||
fn default() -> CacheType { | ||
CacheType::Unsafe | ||
} | ||
} | ||
|
||
/// Helper object for setting up all `Block` fields derived from its backing file. | ||
pub(crate) struct DiskProperties { | ||
cache_type: CacheType, | ||
file_path: String, | ||
file: File, | ||
nsectors: u64, | ||
image_id: Vec<u8>, | ||
} | ||
|
||
impl DiskProperties { | ||
- pub fn new(disk_image_path: String, is_disk_read_only: bool) -> io::Result<Self> { | ||
+ pub fn new( | ||
+ disk_image_path: String, | ||
+ is_disk_read_only: bool, | ||
+ cache_type: CacheType, | ||
+ ) -> io::Result<Self> { | ||
let mut disk_image = OpenOptions::new() | ||
.read(true) | ||
.write(!is_disk_read_only) | ||
|
@@ -57,6 +82,7 @@ impl DiskProperties { | |
} | ||
|
||
Ok(Self { | ||
cache_type, | ||
nsectors: disk_size >> SECTOR_SHIFT, | ||
image_id: Self::build_disk_image_id(&disk_image), | ||
file_path: disk_image_path, | ||
|
@@ -121,6 +147,31 @@ impl DiskProperties { | |
} | ||
config | ||
} | ||
|
||
pub fn cache_type(&self) -> CacheType { | ||
self.cache_type | ||
} | ||
} | ||
|
||
impl Drop for DiskProperties { | ||
fn drop(&mut self) { | ||
match self.cache_type { | ||
CacheType::Writeback => { | ||
// flush() first to force any cached data out. | ||
if self.file.flush().is_err() { | ||
error!("Failed to flush block data on drop."); | ||
} | ||
// Sync data out to physical media on host. | ||
if self.file.sync_all().is_err() { | ||
error!("Failed to sync block data on drop.") | ||
} | ||
METRICS.block.flush_count.inc(); | ||
} | ||
CacheType::Unsafe => { | ||
// This is a noop. | ||
I think we should also do a Otherwise, if the rust
There is no need to include For buffered IO in C, one would use This means that all data written with a Moreover, I don't think the argument that the
My arguments were also rooted in the fact that for Even if this is a noop, this just follows the semantic of the
It is just future proofing. I thought about the semantics of I would agree to removing it, I have no strong preference here. Let me know what you think. |
||
} | ||
}; | ||
} | ||
} | ||
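
As background for the review thread above on calling `flush()` before `sync_all()`: `std::fs::File` keeps no userspace buffer, so its `Write::flush` has nothing to write out on Linux. `flush` only matters when a buffering wrapper such as `std::io::BufWriter` sits in front of the file. The sketch below shows the two distinct steps under that assumption; `write_durably` is a hypothetical name and is not code from this PR.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: write `data` through a userspace buffer, then make it durable.
pub fn write_durably(file: &File, data: &[u8]) -> io::Result<()> {
    // `&File` implements `Write`, so the buffer can borrow the file.
    let mut writer = BufWriter::new(file);
    writer.write_all(data)?;
    // Step 1: move bytes from the process-local buffer into the kernel (page cache).
    writer.flush()?;
    // Step 2: ask the kernel to commit the file's data and metadata to storage.
    file.sync_all()
}
```

With a bare `File` and no `BufWriter`, step 1 has no buffered bytes to move, and only step 2 affects durability.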
|
||
/// Virtio device for exposing block level read/write operations on a host file. | ||
|
@@ -155,12 +206,13 @@ impl Block { | |
pub fn new( | ||
id: String, | ||
partuuid: Option<String>, | ||
cache_type: CacheType, | ||
disk_image_path: String, | ||
is_disk_read_only: bool, | ||
is_disk_root: bool, | ||
rate_limiter: RateLimiter, | ||
) -> io::Result<Block> { | ||
- let disk_properties = DiskProperties::new(disk_image_path, is_disk_read_only)?; | ||
+ let disk_properties = DiskProperties::new(disk_image_path, is_disk_read_only, cache_type)?; | ||
|
||
let mut avail_features = (1u64 << VIRTIO_F_VERSION_1) | (1u64 << VIRTIO_BLK_F_FLUSH); | ||
|
||
|
@@ -343,7 +395,8 @@ impl Block { | |
|
||
/// Update the backing file and the config space of the block device. | ||
pub fn update_disk_image(&mut self, disk_image_path: String) -> io::Result<()> { | ||
- let disk_properties = DiskProperties::new(disk_image_path, self.is_read_only())?; | ||
+ let disk_properties = | ||
+ DiskProperties::new(disk_image_path, self.is_read_only(), self.cache_type())?; | ||
self.disk = disk_properties; | ||
self.config_space = self.disk.virtio_block_config_space(); | ||
|
||
|
@@ -380,6 +433,10 @@ impl Block { | |
pub fn is_root_device(&self) -> bool { | ||
self.root_device | ||
} | ||
|
||
pub fn cache_type(&self) -> CacheType { | ||
self.disk.cache_type() | ||
} | ||
} | ||
|
||
impl VirtioDevice for Block { | ||
|
@@ -491,8 +548,12 @@ pub(crate) mod tests { | |
let size = SECTOR_SIZE * num_sectors; | ||
f.as_file().set_len(size).unwrap(); | ||
|
||
- let disk_properties = | ||
- DiskProperties::new(String::from(f.as_path().to_str().unwrap()), true).unwrap(); | ||
+ let disk_properties = DiskProperties::new( | ||
+ String::from(f.as_path().to_str().unwrap()), | ||
+ true, | ||
+ CacheType::Unsafe, | ||
+ ) | ||
+ .unwrap(); | ||
|
||
assert_eq!(size, SECTOR_SIZE * num_sectors); | ||
assert_eq!(disk_properties.nsectors, num_sectors); | ||
|
@@ -504,7 +565,9 @@ pub(crate) mod tests { | |
// Testing `backing_file.virtio_block_disk_image_id()` implies | ||
// duplicating that logic in tests, so skipping it. | ||
|
||
- assert!(DiskProperties::new("invalid-disk-path".to_string(), true).is_err()); | ||
+ assert!( | ||
+ DiskProperties::new("invalid-disk-path".to_string(), true, CacheType::Unsafe).is_err() | ||
+ ); | ||
} | ||
|
||
#[test] | ||
|
Maybe note here that the data is only committed to the physical disk when the host driver decides to
I think an explanation of how caching works on Linux systems is not appropriate here, especially since whatever we write in this section highly depends on how the host OS and FS are set up. The fact that IO is cached is the default behavior in Linux systems and I don't think we need to explain this here.
We should, however, touch on this in the general block device documentation and provide our recommendations regarding host OS and FS setup. In that section we should also mention what host configuration we are running our tests on, and that should be enough.
Note: the block device doesn't have its own documentation yet; I will add it and move the cache documentation to that file in a subsequent PR.