System Update

Self-service update lets operators manage updates to Oxide system software and vendor firmware (e.g., AMD microcode) on their own, without Oxide involvement. The control plane remains operational throughout the update process, except for a few minutes of downtime at the end.

Workflow

The Oxide support team distributes update bundles through a customer-specified file transfer method. The operator should validate the SHA checksum of the bundle against the value provided by support, and may inspect its contents as desired (the bundle is a ZIP file organized as a TUF repository).
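
For example, assuming the checksum provided is a SHA-256 digest (support will state the algorithm alongside the bundle), it can be verified with standard tools:

# Compare the output against the value provided by Oxide support;
# the bundle path is a placeholder.
sha256sum /path/to/tuf-repo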

An operator then uploads the update bundle using the CLI (oxide system update repo upload). (Note that upload via the web console will be supported in a future release.) To initiate the update, the operator sets the target release version for the system to the value corresponding to the update bundle provided by support. This can be done either with the CLI or via the web console. This starts the asynchronous process of updating all components of the system (as necessary). For the duration of the update, the system reports progress in terms of the number of components updated out of the total.

Operators can perform these actions themselves, but the Oxide support team is available to assist and, as always, to help if you encounter any issues.

Note
Access to the update commands is restricted to users with the fleet admin role.

What to expect during an update

The unattended update process can take about 6 hours for a 16-sled rack, with the total time increasing proportionally to the number of sleds (e.g., 9 hours for a 24-sled rack).

Instance downtime

A running instance will experience downtime when its sled is rebooted into the new SP and host OS versions. While a given instance will likely be up for most of the duration of the update, the timing and duration of instance reboots and downtime are unpredictable. We recommend shutting down instances before starting a system update to allow for controlled instance outages.

During an update, instances transition to the failed state when the sled on which they’re running is rebooted. If an instance has an auto-restart policy configured (the default), it will be restarted on another sled when it enters the failed state. Instances may be restarted multiple times during an update (e.g., if an instance is restarted on a sled that is subsequently rebooted). Instances that are already stopped at the time of a sled reboot will not be automatically restarted, as the auto-restart policy only applies to instances in the failed state.

If instances with continuous disk I/O must remain online during an update, consider increasing the NVMe I/O timeout in the guest operating system. A longer timeout helps reduce filesystem errors caused by transient unavailability of disk replicas during the update.
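
As a sketch for Linux guests (other operating systems have their own equivalents), the nvme_core driver's I/O timeout can be raised; the 300-second value below is only an example:

# Raise the NVMe I/O timeout (in seconds) on a running Linux guest
echo 300 | sudo tee /sys/module/nvme_core/parameters/io_timeout
# To make the setting persist across guest reboots, set it on the
# kernel command line instead, e.g.: nvme_core.io_timeout=300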

API and console downtime

The API and web console will remain available through most of the update, except for a short period of downtime at the end when the control plane transitions from the old version to the new one. The transition also changes the external IP addresses. The total impact is typically a few minutes but can be longer depending on upstream DNS TTL settings. Imports of large images and other long-running API operations may have a higher failure rate during system updates.
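
To gauge how long clients may keep resolving the old addresses, the record TTL can be checked with dig; the hostname below is a placeholder for your rack's external DNS name:

# The second field of each answer line is the remaining TTL in seconds
dig +noall +answer oxide.example.com A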

Other known limitations

  • There may be periods during which metrics are generated but not collected, primarily when the sleds hosting the metrics database and collector are offline for an SP or host OS update.

  • SSD firmware is not included in the update bundle. Oxide support can assist with any necessary SSD firmware updates.

Future improvements

The limitations described above will be addressed in future releases to make system updates as undisruptive as possible.

Prerequisites

  1. Before the first update, a one-time setup is required to record the software version currently running on the rack. This setup may be performed by an operator (with assistance from Oxide support as desired).

  2. The workstation used to perform the update must have access to the TUF repo and an Oxide CLI version compatible with the software currently running on the rack.

  3. The service IP pool must contain at least 13 IP addresses. The available IP ranges can be listed and managed using the operator CLI subcommands under oxide ip-pool service range (see the example after this list).

  4. Any firewall rules governing rack API/console and upstream NTP access must cover the entire service IP range(s), because the IP addresses for the external API and boundary NTP may change after each system update. (Note: looking up the API/console addresses via Oxide’s DNS servers, or resolvers downstream of them, will reflect the new IP addresses in use.)
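
For example, the configured service IP ranges can be reviewed before starting an update (this assumes the list subcommand under oxide ip-pool service range mentioned above):

oxide ip-pool service range list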

Detailed Procedures

Upload TUF repo and set target release

From a workstation with access to both the TUF repo and the Oxide CLI, the fleet admin user executes the following commands:

  1. Set the TUF_REPO_FILE environment variable

    export TUF_REPO_FILE=/path/to/tuf-repo
  2. Extract trust root inside TUF repo

    unzip "$TUF_REPO_FILE" repo/metadata/1.root.json
  3. Upload trust root

    oxide system update trust-root create --json-body repo/metadata/1.root.json
  4. Upload the TUF repo binaries

    oxide system update repo upload --path "$TUF_REPO_FILE"

    After a successful upload, the response JSON will include a system version in the repo.system_version property. It should look something like x.y.z-0.ci+gitXXXXXXXXXXX (e.g., 17.0.0-0.ci+git83f7f06a0a3). You can also run oxide system update repo list to list the releases you’ve uploaded.

  5. Set the target system version

    oxide system update target-release update --system-version "$SYSTEM_VERSION"
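
    Here $SYSTEM_VERSION should hold the repo.system_version value reported in the previous step, for example:

    export SYSTEM_VERSION=17.0.0-0.ci+git83f7f06a0a3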

Monitor update progress

Operators can monitor update progress on the System Update page in the web console, or by running

oxide system update status

The command returns component counts summarized by version string. Here is an example of the output:

{
  "components_by_release_version": {
    "16.1.0-0.ci+git8076220c1b7": 223,
    "17.0.0-0.ci+git529ba64ee28": 6
  },
  "suspended": false,
  "target_release": {
    "time_requested": "2025-10-10T21:20:54.455837Z",
    "version": "17.0.0-0.ci+git529ba64ee28"
  },
  "time_last_step_planned": "2025-10-10T21:23:22.700163Z"
}

Update status reporting will continue to be enhanced in upcoming releases to include information about any issues encountered, their impact, and the expected operator action.