Cut a few seconds off of every test using Cockroach (#6934)
This one has some history!
While working on some database-related APIs, I noticed that my tests
using CockroachDB were a **lot** slower than other tests. To some
degree, I expect this, but these tests were taking roughly 5-10 seconds
each while doing very little other than CockroachDB setup and teardown.
After doing a little profiling, I noticed that tests took several
seconds to perform teardown, and that this time increased significantly
if any schema changes had occurred.
Why does teardown take so long? Well, it turns out we are sending
`SIGTERM` to the CockroachDB process to gracefully terminate it, instead
of `SIGKILL`, which would terminate it much more abruptly.
This is where the history comes in: Gracefully terminating CockroachDB
was a choice we made a few years ago to avoid a test flake:
#540. Basically, when
creating the "seed database" -- a database where we apply schema changes
that are shared between many tests -- we want to gracefully terminate to
avoid leaving the database in a "dirty state", where it might need to
flush work and clean up intermediate state. In the case of #540, that
"dirty intermediate state" was an absolute path, which meant copies of
that seed database trampled on each other if graceful shutdown did not
complete.
Our approach was to apply graceful termination to all CockroachDB
teardown invocations, but this was overkill.
Only the seed database expects its storage to remain in use after the call
to `cleanup` -- all other test-only invocations expect to immediately
remove their storage. They don't need to terminate gracefully, and
arguably, should just exit as quickly as they can.
This PR changes the disposition:
- `cleanup_gracefully` uses `SIGTERM`, and waits for graceful cleanup.
This is still used when constructing the seed db.
- `cleanup` uses `SIGKILL`, and kills the database immediately. This is
now used for all other use-cases.
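
The split between the two methods can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DbProcess` wrapper and the `killed_by` helper are hypothetical names, and the sketch assumes a Unix host and shells out to `kill(1)` to deliver `SIGTERM`, since the Rust standard library only exposes `Child::kill` (which sends `SIGKILL` on Unix).

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical wrapper around a running database child process.
struct DbProcess {
    child: Child,
}

impl DbProcess {
    /// Graceful teardown: send SIGTERM, then wait for the process to exit
    /// on its own. Appropriate when the storage will be reused afterwards.
    fn cleanup_gracefully(mut self) -> io::Result<ExitStatus> {
        // Assumption: `kill` is on PATH. std has no API for sending SIGTERM.
        let sent = Command::new("kill")
            .arg("-TERM")
            .arg(self.child.id().to_string())
            .status()?;
        if !sent.success() {
            return Err(io::Error::new(io::ErrorKind::Other, "kill -TERM failed"));
        }
        self.child.wait()
    }

    /// Fast teardown: send SIGKILL and reap the process. Appropriate when
    /// the storage is about to be deleted anyway.
    fn cleanup(mut self) -> io::Result<ExitStatus> {
        self.child.kill()?; // SIGKILL on Unix
        self.child.wait()
    }
}

/// Returns the number of the signal that terminated the process, if any.
fn killed_by(status: &ExitStatus) -> Option<i32> {
    status.signal()
}
```

Both methods take `self` by value, so a handle cannot be cleaned up twice. A real graceful path would likely also want a timeout before falling back to `SIGKILL`, which this sketch omits.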
As an example of the performance difference, here's a comparison for
some datastore tests:
## Before
```
SETUP PASS [ 1/1] crdb-seed: cargo run -p crdb-seed --profile test
PASS [ 6.996s] nexus-db-queries db::datastore::db_metadata::test::ensure_schema_is_current_version
PASS [ 7.344s] nexus-db-queries db::datastore::db_metadata::test::schema_version_subcomponents_save_progress
PASS [ 8.609s] nexus-db-queries db::datastore::db_metadata::test::concurrent_nexus_instances_only_move_forward
------------
Summary [ 11.386s] 3 tests run: 3 passed, 228 skipped
```
## After
```
SETUP PASS [ 1/1] crdb-seed: cargo run -p crdb-seed --profile test
PASS [ 2.087s] nexus-db-queries db::datastore::db_metadata::test::ensure_schema_is_current_version
PASS [ 3.054s] nexus-db-queries db::datastore::db_metadata::test::schema_version_subcomponents_save_progress
PASS [ 4.442s] nexus-db-queries db::datastore::db_metadata::test::concurrent_nexus_instances_only_move_forward
------------
Summary [ 7.550s] 3 tests run: 3 passed, 228 skipped
```