Skip to content

Commit 71f8095

Browse files
committed
Touch up comments in fingerprint file
1 parent 8251a28 commit 71f8095

File tree

1 file changed

+165
-37
lines changed

1 file changed

+165
-37
lines changed

src/cargo/core/compiler/fingerprint.rs

Lines changed: 165 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -14,13 +14,14 @@
1414
//! Unit. If any of the inputs changes from the last compilation, then the
1515
//! Unit is considered dirty. A missing fingerprint (such as during the
1616
//! first build) is also considered dirty.
17-
//! - Dirty propagation is done in the `JobQueue`. When a Unit is dirty, the
18-
//! `JobQueue` automatically treats anything that depends on it as dirty.
19-
//! Anything that relies on this is probably a bug. The fingerprint should
20-
//! always be complete (but there are some known limitations). This is a
21-
//! problem because not all Units are built all at once. If two separate
22-
//! `cargo` commands are run that build different Units, this dirty
23-
//! propagation will not work across commands.
17+
//! - Whether or not input files are actually present. For example a build
18+
//! script which says it depends on a nonexistent file `foo` is always rerun.
19+
//! - Propagation throughout the dependency graph of file modification time
20+
//! information, used to detect changes on the filesystem. Each `Fingerprint`
21+
//! keeps track of what files it'll be processing, and when necessary it will
22+
//! check the `mtime` of each file (last modification time) and compare it to
23+
//! dependencies and output to see if files have been changed or if a change
24+
//! needs to force recompiles of downstream dependencies.
2425
//!
2526
//! Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking
2627
//! is notoriously imprecise and problematic. Only a small part of the
@@ -97,24 +98,52 @@
9798
//!
9899
//! After the list of Units has been calculated, the Units are added to the
99100
//! `JobQueue`. As each one is added, the fingerprint is calculated, and the
100-
//! dirty/fresh status is recorded in the `JobQueue`. A closure is used to
101-
//! update the fingerprint on-disk when the Unit successfully finishes. The
102-
//! closure will recompute the Fingerprint based on the updated information.
103-
//! If the Unit fails to compile, the fingerprint is not updated.
101+
//! dirty/fresh status is recorded. A closure is used to update the fingerprint
102+
//! on-disk when the Unit successfully finishes. The closure will recompute the
103+
//! Fingerprint based on the updated information. If the Unit fails to compile,
104+
//! the fingerprint is not updated.
104105
//!
105106
//! Fingerprints are cached in the `Context`. This makes computing
106107
//! Fingerprints faster, but also is necessary for properly updating
107108
//! dependency information. Since a Fingerprint includes the Fingerprints of
108109
//! all dependencies, when it is updated, by using `Arc` clones, it
109110
//! automatically picks up the updates to its dependencies.
110111
//!
112+
//! ## Considerations for inclusion in a fingerprint
113+
//!
114+
//! Over time we've realized a few items which historically were included in
115+
//! fingerprint hashings should not actually be included. Examples are:
116+
//!
117+
//! * Modification time values. We strive to never include a modification time
118+
//! inside a `Fingerprint` to get hashed into an actual value. While
119+
//! theoretically fine to do, in practice this causes issues with common
120+
//! applications like Docker. Docker, after a layer is built, will zero out
121+
//! the nanosecond part of all filesystem modification times. This means that
122+
//! the actual modification time is different for all build artifacts, which
123+
//! if we tracked the actual values of modification times would cause
124+
//! unnecessary recompiles. To fix this we instead only track paths which are
125+
//! relevant. These paths are checked dynamically to see if they're up to
126+
//! date, and the modifiation time doesn't make its way into the fingerprint
127+
//! hash.
128+
//!
129+
//! * Absolute path names. We strive to maintain a property where if you rename
130+
//! a project directory Cargo will continue to preserve all build artifacts
131+
//! and reuse the cache. This means that we can't ever hash an absolute path
132+
//! name. Instead we always hash relative path names and the "root" is passed
133+
//! in at runtime dynamically. Some of this is best effort, but the general
134+
//! idea is that we assume all acceses within a crate stay within that
135+
//! crate.
136+
//!
137+
//! These are pretty tricky to test for unfortunately, but we should have a good
138+
//! test suite nowadays and lord knows Cargo gets enough testing in the wild!
139+
//!
111140
//! ## Build scripts
112141
//!
113142
//! The *running* of a build script (`CompileMode::RunCustomBuild`) is treated
114143
//! significantly different than all other Unit kinds. It has its own function
115-
//! for calculating the Fingerprint (`prepare_build_cmd`) and has some unique
116-
//! considerations. It does not track the same information as a normal Unit.
117-
//! The information tracked depends on the `rerun-if-changed` and
144+
//! for calculating the Fingerprint (`calculate_run_custom_build`) and has some
145+
//! unique considerations. It does not track the same information as a normal
146+
//! Unit. The information tracked depends on the `rerun-if-changed` and
118147
//! `rerun-if-env-changed` statements produced by the build script. If the
119148
//! script does not emit either of these statements, the Fingerprint runs in
120149
//! "old style" mode where an mtime change of *any* file in the package will
@@ -357,28 +386,37 @@ pub struct Fingerprint {
357386
/// "description", which are exposed as environment variables during
358387
/// compilation.
359388
metadata: u64,
360-
/// Whether any of the `local` inputs, on the filesystem, are considered out
361-
/// of date by looking at mtimes.
389+
/// Description of whether the filesystem status for this unit is up to date
390+
/// or should be considered stale.
362391
#[serde(skip_serializing, skip_deserializing)]
363392
fs_status: FsStatus,
364-
/// A file, relative to `target_root`, that is produced by the step that
393+
/// Files, relative to `target_root`, that are produced by the step that
365394
/// this `Fingerprint` represents. This is used to detect when the whole
366395
/// fingerprint is out of date if this is missing, or if previous
367396
/// fingerprints output files are regenerated and look newer than this one.
368397
#[serde(skip_serializing, skip_deserializing)]
369398
outputs: Vec<PathBuf>,
370399
}
371400

401+
/// Indication of the status on the filesystem for a particular unit.
372402
enum FsStatus {
403+
/// This unit is to be considered stale, even if hash information all
404+
/// matches. The filesystem inputs have changed (or are missing) and the
405+
/// unit needs to subsequently be recompiled.
373406
Stale,
407+
408+
/// This unit is up-to-date, it does not need to be recompiled. If there are
409+
/// any outputs then the `FileTime` listed here is the minimum of all their
410+
/// mtimes. This is then later used to see if a unit is newer than one of
411+
/// its dependants, causing the dependant to be recompiled.
374412
UpToDate(Option<FileTime>),
375413
}
376414

377415
impl FsStatus {
378416
fn up_to_date(&self) -> bool {
379417
match self {
380418
FsStatus::UpToDate(_) => true,
381-
_ => false,
419+
FsStatus::Stale => false,
382420
}
383421
}
384422
}
@@ -443,19 +481,40 @@ enum StaleFile {
443481
}
444482

445483
impl LocalFingerprint {
484+
/// Checks dynamically at runtime if this `LocalFingerprint` has a stale
485+
/// file.
486+
///
487+
/// This will use the absolute root paths passed in if necessary to guide
488+
/// file accesses.
446489
fn find_stale_file(
447490
&self,
448491
pkg_root: &Path,
449492
target_root: &Path,
450493
) -> CargoResult<Option<StaleFile>> {
451494
match self {
495+
// We need to parse `dep_info`, learn about all the files the crate
496+
// depends on, and then see if any of them are newer than the
497+
// dep_info file itself. If the `dep_info` file is missing then this
498+
// unit has never been compiled!
452499
LocalFingerprint::CheckDepInfo { dep_info } => {
453-
find_stale_file_from_dep_info(pkg_root, &target_root.join(dep_info))
500+
let dep_info = target_root.join(dep_info);
501+
if let Some(paths) = parse_dep_info(pkg_root, &dep_info)? {
502+
Ok(find_stale_file(&dep_info, paths.iter()))
503+
} else {
504+
Ok(Some(StaleFile::Missing(dep_info)))
505+
}
454506
}
507+
508+
// We need to verify that no paths listed in `paths` are newer than
509+
// the `output` path itself, or the last time the build script ran.
455510
LocalFingerprint::RerunIfChanged { output, paths } => Ok(find_stale_file(
456511
&target_root.join(output),
457512
paths.iter().map(|p| pkg_root.join(p)),
458513
)),
514+
515+
// These have no dependencies on the filesystem, and their values
516+
// are included natively in the `Fingerprint` hash so nothing
517+
// tocheck for here.
459518
LocalFingerprint::RerunIfEnvChanged { .. } => Ok(None),
460519
LocalFingerprint::Precalculated(..) => Ok(None),
461520
}
@@ -518,6 +577,12 @@ impl Fingerprint {
518577
ret
519578
}
520579

580+
/// Compares this fingerprint with an old version which was previously
581+
/// serialized to filesystem.
582+
///
583+
/// The purpose of this is exclusively to produce a diagnostic message
584+
/// indicating why we're recompiling something. This function always returns
585+
/// an error, it will never return success.
521586
fn compare(&self, old: &Fingerprint) -> CargoResult<()> {
522587
if self.rustc != old.rustc {
523588
bail!("rust compiler has changed")
@@ -640,15 +705,33 @@ impl Fingerprint {
640705
bail!("current filesystem status shows we're outdated");
641706
}
642707

708+
// This typically means some filesystem modifications happened or
709+
// something transitive was odd. In general we should strive to provide
710+
// a better error message than this, so if you see this message a lot it
711+
// likely means this method needs to be updated!
643712
bail!("two fingerprint comparison turned up nothing obvious");
644713
}
645714

715+
/// Dynamically inspect the local filesystem to update the `fs_status` field
716+
/// of this `Fingerprint`.
717+
///
718+
/// This function is used just after a `Fingerprint` is constructed to check
719+
/// the local state of the filesystem and propagate any dirtiness from
720+
/// dependencies up to this unit as well. This function assumes that the
721+
/// unit starts out as `FsStatus::Stale` and then it will optionally switch
722+
/// it to `UpToDate` if it can.
646723
fn check_filesystem(
647724
&mut self,
648725
pkg_root: &Path,
649726
target_root: &Path,
650727
mtime_on_use: bool,
651728
) -> CargoResult<()> {
729+
assert!(!self.fs_status.up_to_date());
730+
731+
// Get the `mtime` of all outputs. Optionally update their mtime
732+
// afterwards based on the `mtime_on_use` flag. Afterwards we want the
733+
// minimum mtime as it's the one we'll be comparing to inputs and
734+
// dependencies.
652735
let status = self
653736
.outputs
654737
.iter()
@@ -661,32 +744,59 @@ impl Fingerprint {
661744
return mtime;
662745
})
663746
.min();
747+
664748
let mtime = match status {
665-
Some(Some(mtime)) => mtime,
666-
Some(None) => return Ok(()),
749+
// We had no output files. This means we're an overridden build
750+
// script and we're just always up to date because we aren't
751+
// watching the filesystem.
667752
None => {
668753
self.fs_status = FsStatus::UpToDate(None);
669754
return Ok(());
670755
}
756+
757+
// At least one path failed to report its `mtime`. It probably
758+
// doesn't exists, so leave ourselves as stale and bail out.
759+
Some(None) => return Ok(()),
760+
761+
// All files successfully reported an `mtime`, and we've got the
762+
// minimum one, so let's keep going with that.
763+
Some(Some(mtime)) => mtime,
671764
};
672765

673766
for dep in self.deps.iter() {
674767
let dep_mtime = match dep.fingerprint.fs_status {
675-
FsStatus::UpToDate(Some(mtime)) => mtime,
676-
FsStatus::UpToDate(None) => continue,
768+
// If our dependency is stale, so are we, so bail out.
677769
FsStatus::Stale => return Ok(()),
770+
771+
// If our dependencies is up to date and has no filesystem
772+
// interactions, then we can move on to the next dependency.
773+
FsStatus::UpToDate(None) => continue,
774+
775+
FsStatus::UpToDate(Some(mtime)) => mtime,
678776
};
777+
778+
// If the dependency is newer than our own output then it was
779+
// recompiled previously. We transitively become stale ourselves in
780+
// that case, so bail out.
679781
if dep_mtime > mtime {
680782
return Ok(());
681783
}
682784
}
785+
786+
// If we reached this far then all dependencies are up to date. Check
787+
// all our `LocalFingerprint` information to see if we have any stale
788+
// files for this package itself. If we do find something log a helpful
789+
// message and bail out so we stay stale.
683790
for local in self.local.iter() {
684791
if let Some(file) = local.find_stale_file(pkg_root, target_root)? {
685792
file.log();
686793
return Ok(());
687794
}
688795
}
796+
797+
// Everything was up to date! Record such.
689798
self.fs_status = FsStatus::UpToDate(Some(mtime));
799+
690800
Ok(())
691801
}
692802
}
@@ -790,6 +900,11 @@ impl DepFingerprint {
790900
}
791901

792902
impl StaleFile {
903+
/// Use the `log` crate to log a hopefully helpful message in diagnosing
904+
/// what file is considered stale and why. This is intended to be used in
905+
/// conjunction with `RUST_LOG` to determine why Cargo is recompiling
906+
/// something. Currently there's no user-facing usage of this other than
907+
/// that.
793908
fn log(&self) {
794909
match self {
795910
StaleFile::Missing(path) => {
@@ -835,9 +950,13 @@ fn calculate<'a, 'cfg>(
835950
} else {
836951
calculate_normal(cx, unit)?
837952
};
953+
954+
// After we built the initial `Fingerprint` be sure to update the
955+
// `fs_status` field of it.
838956
let target_root = target_root(cx, unit);
839957
let mtime_on_use = cx.bcx.config.cli_unstable().mtime_on_use;
840958
fingerprint.check_filesystem(unit.pkg.root(), &target_root, mtime_on_use)?;
959+
841960
let fingerprint = Arc::new(fingerprint);
842961
cx.fingerprints.insert(*unit, Arc::clone(&fingerprint));
843962
Ok(fingerprint)
@@ -1047,6 +1166,16 @@ fn build_script_local_fingerprints<'a, 'cfg>(
10471166
move |deps: &BuildDeps, pkg_fingerprint: Option<&dyn Fn() -> CargoResult<String>>| {
10481167
if deps.rerun_if_changed.is_empty() && deps.rerun_if_env_changed.is_empty() {
10491168
match pkg_fingerprint {
1169+
// FIXME: this is somewhat buggy with respect to docker and
1170+
// weird filesystems. The `Precalculated` variant
1171+
// constructed below will, for `path` dependencies, contain
1172+
// a stringified version of the mtime for the local crate.
1173+
// This violates one of the things we describe in this
1174+
// module's doc comment, never hashing mtimes. We should
1175+
// figure out a better scheme where a package fingerprint
1176+
// may be a string (like for a registry) or a list of files
1177+
// (like for a path dependency). Those list of files would
1178+
// be stored here rather than the the mtime of them.
10501179
Some(f) => {
10511180
debug!("old local fingerprints deps");
10521181
let s = f()?;
@@ -1093,6 +1222,9 @@ fn local_fingerprints_deps(
10931222
let mut local = Vec::new();
10941223

10951224
if !deps.rerun_if_changed.is_empty() {
1225+
// Note that like the module comment above says we are careful to never
1226+
// store an absolute path in `LocalFingerprint`, so ensure that we strip
1227+
// absolute prefixes from them.
10961228
let output = deps
10971229
.build_script_output
10981230
.strip_prefix(target_root)
@@ -1146,13 +1278,18 @@ pub fn prepare_init<'a, 'cfg>(cx: &mut Context<'a, 'cfg>, unit: &Unit<'a>) -> Ca
11461278
Ok(())
11471279
}
11481280

1281+
/// Returns the location that the dep-info file will show up at for the `unit`
1282+
/// specified.
11491283
pub fn dep_info_loc<'a, 'cfg>(cx: &mut Context<'a, 'cfg>, unit: &Unit<'a>) -> PathBuf {
11501284
cx.files()
11511285
.fingerprint_dir(unit)
11521286
.join(&format!("dep-{}", filename(cx, unit)))
11531287
}
11541288

1155-
pub fn target_root<'a, 'cfg>(cx: &mut Context<'a, 'cfg>, unit: &Unit<'a>) -> PathBuf {
1289+
/// Returns an absolute path that the `unit`'s outputs should always be relative
1290+
/// to. This `target_root` variable is used to store relative path names in
1291+
/// `Fingerprint` instead of absolute pathnames (see module comment).
1292+
fn target_root<'a, 'cfg>(cx: &mut Context<'a, 'cfg>, unit: &Unit<'a>) -> PathBuf {
11561293
if unit.mode.is_run_custom_build() {
11571294
cx.files().build_script_run_dir(unit)
11581295
} else if unit.kind == Kind::Host {
@@ -1184,8 +1321,10 @@ fn compare_old_fingerprint(
11841321
let old_fingerprint_json = paths::read(&loc.with_extension("json"))?;
11851322
let old_fingerprint: Fingerprint = serde_json::from_str(&old_fingerprint_json)
11861323
.chain_err(|| internal("failed to deserialize json"))?;
1187-
assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
1188-
new_fingerprint.compare(&old_fingerprint)
1324+
debug_assert_eq!(util::to_hex(old_fingerprint.hash()), old_fingerprint_short);
1325+
let result = new_fingerprint.compare(&old_fingerprint);
1326+
assert!(result.is_err());
1327+
return result;
11891328
}
11901329

11911330
fn log_compare(unit: &Unit<'_>, compare: &CargoResult<()>) {
@@ -1222,17 +1361,6 @@ pub fn parse_dep_info(pkg_root: &Path, dep_info: &Path) -> CargoResult<Option<Ve
12221361
}
12231362
}
12241363

1225-
fn find_stale_file_from_dep_info(
1226-
pkg_root: &Path,
1227-
dep_info: &Path,
1228-
) -> CargoResult<Option<StaleFile>> {
1229-
if let Some(paths) = parse_dep_info(pkg_root, dep_info)? {
1230-
Ok(find_stale_file(dep_info, paths.iter()))
1231-
} else {
1232-
Ok(Some(StaleFile::Missing(dep_info.to_path_buf())))
1233-
}
1234-
}
1235-
12361364
fn pkg_fingerprint(bcx: &BuildContext<'_, '_>, pkg: &Package) -> CargoResult<String> {
12371365
let source_id = pkg.package_id().source_id();
12381366
let sources = bcx.packages.sources();

0 commit comments

Comments
 (0)