understand (and possibly improve) instance creation times #487

Open
jordanhendricks opened this issue Aug 11, 2023 · 9 comments
@jordanhendricks
Contributor

jordanhendricks commented Aug 11, 2023

In recent experience with rack2, we have had a few discussions around the time it takes to create instances with a lot of memory. Some examples: the not-yet-understood oxidecomputer/omicron#3417, data from @askfongjojo indicating that large instances reliably take around 40 seconds to create and start, and observations that creation of large instances often times out.

Recently we landed support in omicron for using the VMM reservoir, which alleviated some of the pain around creating large instances, but it still takes on the order of 30+ seconds to create instances with > 64 GiB of memory, so I wanted to understand where that time was going.

I looked at a couple of larger instances this week on the dogfood cluster and saw a gap of about 20-25 seconds for a 64 GiB/96 GiB memory instance between the first propolis-server log line and the log line indicating a VNIC was being created for the instance. (I intended to look at more, smaller instances, but was hamstrung by unrelated issues.) In between those two events, by code inspection, we make an OS call to allocate guest memory from the reservoir. @pfmooney did some testing of large VMs and found that the actual reservoir allocation was very fast (on the order of microseconds), but it took around 15 seconds to map ~60 GiB of memory into the guest address space. It thus seems plausible that that's where our time was spent, but we have little in the way of logging to show it.

It does not seem that improving instance creation times for large VMs is a big priority at the moment (though of course, no one is going to complain if instance creation is faster!). That said, from looking at this issue so far, it's clear that we could have better data here. At a minimum, I think we should add timing and logging around the major steps of instance creation (reservoir allocation and guest memory mapping in particular) so we can see where the time goes.
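As a rough illustration of the kind of data that would help, coarse per-step timing like the sketch below would make the reservoir-allocation vs. memory-mapping split visible directly in the logs. This is not actual propolis-server code: the step bodies are placeholders, and the real server would emit the elapsed times through its structured logger rather than eprintln!.

    use std::time::Instant;

    // Run a step and report how long it took; a stand-in for emitting the
    // same elapsed time through the server's structured logger.
    fn timed<T>(label: &str, step: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = step();
        eprintln!("{label} took {} ms", start.elapsed().as_millis());
        out
    }

    fn main() {
        // Placeholder steps standing in for the real work (VMM ioctls).
        timed("allocate guest memory from reservoir", || {
            // observed to take on the order of microseconds
        });
        timed("map guest memory into guest address space", || {
            // observed to take ~15 seconds for ~60 GiB
        });
    }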

@jordanhendricks jordanhendricks added this to the Unscheduled milestone Aug 11, 2023
@jordanhendricks jordanhendricks added the control plane Related to the control plane. label Aug 11, 2023
@jordanhendricks
Contributor Author

related: #471

@askfongjojo

My most recent attempts to create instances with 128 GB of memory began to fail after the sleds on rack2 had had a lot of instances created and then deleted.

root@[fd00:1122:3344:102::3]:32221/omicron> select id, active_sled_id, state, name, time_created from instance where name like 'provision%128m' order by 5;
                   id                  |            active_sled_id            |   state   |          name           |         time_created
---------------------------------------+--------------------------------------+-----------+-------------------------+--------------------------------
  884d2f39-f5fd-4c31-82c4-4e4628c4a6c3 | a2303a7b-fe1f-4010-99da-ed90bba042b0 | destroyed | provision-time-16c-128m | 2023-08-10 23:12:33.204546+00
  b7523d15-a861-4320-9972-c5a7f7d63754 | ae9eccdf-e662-43d2-9493-445cfa934ee8 | destroyed | provision-time-16c-128m | 2023-08-11 00:55:14.505487+00
  bdb77ad9-6963-4ad6-a1a6-829af63cf575 | 2d7b6828-ba9f-44ff-862a-63852d79a410 | destroyed | provision-time-8c-128m  | 2023-08-14 04:59:26.472164+00
  5918a06f-aab8-4752-913b-310f717a3b2b | 6b4ff253-ba5b-4d0c-94c9-7751bdc0bf80 | destroyed | provision-time-16c-128m | 2023-08-14 05:03:32.510433+00
  1c3f0077-2b7b-4c56-a159-9858b9789ec5 | 6b93c9c3-8056-44f4-b2b5-2f461be09819 | destroyed | provision-time-32c-128m | 2023-08-14 05:07:03.469654+00
  4c5fe451-7b95-4f2d-a448-ce20bbb5fea6 | 94f583be-8d15-4b15-92cd-bf22f33179b7 | destroyed | provision-time-32c-128m | 2023-08-14 05:11:08.420303+00
  8c505c88-2432-419f-85ed-8c194a5310d0 | 6b4ff253-ba5b-4d0c-94c9-7751bdc0bf80 | destroyed | provision-time-64c-128m | 2023-08-14 05:13:54.8145+00
  3b90cccf-b798-40ad-8373-7caa62686ed5 | a2303a7b-fe1f-4010-99da-ed90bba042b0 | destroyed | provision-time-32c-128m | 2023-08-14 06:23:44.721355+00
  3fc8f742-5e24-4133-ac09-5e30a5ff6b3c | ae9eccdf-e662-43d2-9493-445cfa934ee8 | destroyed | provision-time-64c-128m | 2023-08-14 06:26:28.147305+00
(9 rows)

Of the above 9 instances, only the first two were created successfully. The subsequent ones all failed after about 1m 45s (the durations are simply the wall-clock time of the CLI calls).

@pfmooney
Collaborator

Filed illumos#15844, which should cover at least some of the provisioning cost we're seeing. I have a patch in the works which should improve things there.

@pfmooney
Collaborator

15844 has landed in illumos-gate. Once that makes its way into stlouis, and onto test hardware, it'd be good to revisit the large instance provisioning tests.

@askfongjojo

Will certainly do so. Currently, provisioning large VMs (with 64/96/128 GB of memory) is still subject to racing against the 60-second client timeout.
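For reference, the 60-second limit is on the HTTP client side. If the client in question is reqwest-based (as progenitor-generated clients are), the per-request timeout is set when the underlying client is built; a minimal sketch, with the 120-second value chosen purely for illustration:

    use std::time::Duration;

    // Hypothetical helper: build an HTTP client with a longer per-request
    // timeout so large-instance provisioning calls aren't cut off at 60s.
    fn build_client() -> Result<reqwest::Client, reqwest::Error> {
        reqwest::Client::builder()
            .timeout(Duration::from_secs(120))
            .build()
    }

Whether raising the timeout is the right fix, versus making provisioning itself faster, is of course a separate question.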

@gjcolombo
Contributor

FWIW, I'm not (yet) convinced that the problem in oxidecomputer/omicron#3927 is specific to large instances: in that issue, Nexus timed out while waiting for sled agent to create the Propolis zone, and I don't have an intuition for what mechanism would make that take longer for larger instances than for smaller ones. (That said, I have no idea what made zone setup take so long, full stop, so pretty much any theory is on the table at this point.)

@askfongjojo

askfongjojo commented Aug 23, 2023

Well, as a "controlled experiment", I have this one instance on rack2 that was successfully provisioned and started with 64 vCPUs and 128 GB of memory: https://oxide.sys.rack2.eng.oxide.computer/projects/try/instances/provision-time-64c-128m

After yesterday's software update, it failed to start up after multiple tries, all due to the same client timeout error:

23:15:06.504Z WARN SledAgent (dropshot (SledAgent)): client disconnected before response returned
    file = /home/build/.cargo/git/checkouts/dropshot-a4a923d29dccc492/8ef2bd2/dropshot/src/server.rs:927
    local_addr = [fd00:1122:3344:106::1]:12345
    method = PUT
    remote_addr = [fd00:1122:3344:107::3]:64621
    req_id = e349db23-b0d1-482c-847c-64e7a11421c4
    uri = /instances/fe88bdb4-8bff-41a8-9ae5-9a21acf53ce3/state

After multiple failed attempts, I stopped all VMs on the sled in question, so this was the only instance being started on sled BRM44220010 (gc25). Next, I updated the memory setting of this instance to 96 GB in the instance table in CRDB. After that, I was able to stop/start the instance (tried that twice) and verify that the guest was also functional (with a certain test workload). So it would appear that this still has something to do with the large memory size?

@pfmooney
Collaborator

Just for reference, 15844 was not merged into stlouis until this morning, so any impact it may have would be missing from what was installed yesterday.

@askfongjojo

askfongjojo commented Aug 28, 2023

Prior to 15844, the end-to-end provisioning times were anywhere between 30-40 seconds for a VM with 32 or 64 GB of memory. VMs with larger memory sizes were closer to 50-60 seconds (or failed with the sled-agent client timeout error if the request went beyond 60s). And I hadn't been able to provision an instance with more than 128 GB.

After 15844, the provisioning times consistently fall in the 21-29s range for VMs of different sizes (sample size = 50; much of the time was spent on VNIC and disk setup). I am also able to spin up 256 GB memory instances and run simple applications on them.
