Self-service update lets operators manage updates to Oxide system software and vendor firmware (e.g., AMD microcode) independently without Oxide involvement. The control plane remains operational throughout the update process, except for a few minutes of downtime at the end.
Workflow
The Oxide support team distributes update bundles through a customer-specified file transfer. The operator should validate the SHA checksum of the bundle against the value provided by support. They may inspect its contents as desired (it is a ZIP file organized as a TUF repository).
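The checksum comparison can be scripted. The following is a minimal sketch that assumes the checksum provided by support is a SHA-256 hex digest (the text above says only "SHA checksum", so confirm the algorithm with support). The function names are illustrative and not part of any Oxide tooling.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large bundle is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare the file digest with the expected value.

    Assumption: expected_hex is a SHA-256 hex string; surrounding
    whitespace and letter case are ignored.
    """
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(file_sha256(path), expected)
```

For example, `checksum_matches("/path/to/tuf-repo", expected)` returns `True` only when the downloaded bundle matches the value from support; a `False` result suggests the file should be transferred again before uploading.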
An operator then uploads the update bundle using the CLI (oxide system update
repo upload). (Note that upload via the web console will be supported in
a future release.) To initiate the update, the operator sets the target
release version for the system to the value corresponding to the update
bundle provided by support. This can be done either with the CLI or via the
web console. This starts the asynchronous process of updating all components of
the system (as necessary). For the duration of the update, the system reports
progress in terms of the number of components updated out of the total.
An operator can perform these actions without assistance, but the Oxide support team is available to help with the update and, as always, with any issues you encounter.
What to expect during an update
The unattended update process can take about 6 hours for a 16-sled rack, with the total time increasing proportionally to the number of sleds (e.g., 9 hours for a 24-sled rack).
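As a rough planning aid, the proportional relationship above can be written as a small function. This is a sketch based only on the two figures quoted (about 6 hours for 16 sleds, 9 hours for 24 sleds); actual durations may differ, and the names are illustrative.

```python
# Reference point taken from the text: about 6 hours for a 16-sled rack.
REFERENCE_SLEDS = 16
REFERENCE_HOURS = 6.0


def estimated_update_hours(sled_count):
    """Estimate update duration, assuming time scales linearly with sleds."""
    if sled_count <= 0:
        raise ValueError("sled_count must be positive")
    return REFERENCE_HOURS * sled_count / REFERENCE_SLEDS


if __name__ == "__main__":
    for sleds in (16, 24, 32):
        print(f"{sleds} sleds: about {estimated_update_hours(sleds):.1f} hours")
```

The 32-sled value is an extrapolation under the same linear assumption, not a figure from Oxide.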
Instance downtime
A running instance will experience downtime when its sled is rebooted into the new SP and host OS versions. While a given instance will likely be up for most of the duration of the update, the timing and duration of instance reboots and downtime are unpredictable. We recommend shutting down instances before starting a system update to allow for controlled instance outages.
During an update, instances will transition to the failed state when
the sled on which they’re running is rebooted. If configured with an
auto-restart policy (which
is enabled by default), an instance in the failed state will be restarted on
another sled. Instances may be restarted multiple times during an update (for
example, if the instance is restarted on a sled that is subsequently rebooted).
Instances that are already stopped at the time of sled reboot will not be
automatically restarted (the auto-restart policy only applies to instances in
the failed state).
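The behavior described above can be summarized as a small decision function, which may help when reviewing which instances need attention after an update. This is an illustrative model only, not Oxide's implementation. The state names and return strings are hypothetical, and the outcome for a running instance without auto-restart is an inference from the text rather than something it states directly.

```python
def outcome_after_sled_reboot(state, auto_restart=True):
    """Describe what happens to an instance when its sled reboots.

    state: "running" or "stopped" at the time of the sled reboot.
    auto_restart: whether the auto-restart policy is enabled
    (the text says it is enabled by default).
    """
    if state == "stopped":
        # Stopped instances never enter the failed state, so the
        # auto-restart policy does not apply to them.
        return "stays stopped"
    if state == "running":
        if auto_restart:
            return "fails, then restarts on another sled"
        # Inference: without auto-restart the instance remains failed.
        return "fails and stays failed"
    # Other states are not covered by the text.
    return "unknown"


if __name__ == "__main__":
    for state, policy in (("running", True), ("running", False), ("stopped", True)):
        print(state, policy, "->", outcome_after_sled_reboot(state, policy))
```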
If instances with continuous disk I/O must remain online during an update, you may want to increase the NVMe I/O timeout in the guest operating system. A longer timeout helps reduce filesystem errors caused by transient unavailability of disk replicas during a software update. More information about this can be found here.
API and console downtime
The API and web console will remain available through most of the update except for a short period of downtime at the end when the control plane transitions from the old version to the new version. The transition also involves a change in external IP addresses. The total time impact is typically a few minutes but can be longer depending on the upstream DNS TTL settings. Imports of large images and other long-running API operations may have a higher failure rate during system updates.
Other known limitations
There may be periods during which metrics are generated but not being collected, primarily when the sleds hosting the metrics database and collector are offline for an SP or host OS update.
SSD firmware is not included in the update bundle. Oxide support can assist with any necessary SSD firmware updates.
Future improvements
The limitations described above will be addressed in future releases to make system updates as undisruptive as possible.
Prerequisites
Using update for the first time requires a one-time setup to configure the software version currently running on the rack. This setup may be performed by an operator (with assistance from Oxide Support as desired).
The workstation used for performing the update has access to the TUF repo and an Oxide CLI version compatible with the software currently running on the rack.
The service IP address pool contains at least 13 IP addresses. The available IP ranges can be listed and managed using the operator CLI subcommands under oxide ip-pool service range.
Any firewall rules governing rack API/console and upstream NTP access must cover the entire service IP range(s) because the IP addresses for the external API and boundary NTP may change after each system update. (Note: IP lookup of API/console using Oxide’s DNS servers or something downstream of them will reflect the new IP addresses in use.)
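The pool-size prerequisite can be checked with a short script once the ranges are known. The sketch below assumes the ranges have been copied by hand into (first, last) address pairs; that input format and the function names are hypothetical and do not reflect the CLI's output format. It also assumes the ranges do not overlap.

```python
import ipaddress

# Minimum number of service IP addresses stated in the prerequisites.
MIN_SERVICE_IPS = 13


def count_addresses(ranges):
    """Count addresses across inclusive (first, last) ranges.

    Assumption: ranges do not overlap; overlapping ranges would be
    counted twice.
    """
    total = 0
    for first, last in ranges:
        low = ipaddress.ip_address(first)
        high = ipaddress.ip_address(last)
        if high < low:
            raise ValueError(f"range {first}-{last} is reversed")
        total += int(high) - int(low) + 1
    return total


def pool_is_large_enough(ranges, minimum=MIN_SERVICE_IPS):
    """Return True if the ranges hold at least the required addresses."""
    return count_addresses(ranges) >= minimum


if __name__ == "__main__":
    # Example ranges from the documentation address block (RFC 5737).
    example = [("192.0.2.10", "192.0.2.19"), ("192.0.2.40", "192.0.2.42")]
    print(count_addresses(example), pool_is_large_enough(example))
```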
Detailed Procedures
Upload TUF repo and set target release
From a workstation that has access to both the TUF repo and Oxide CLI, the fleet admin user will execute the following commands:
Set the TUF_REPO_FILE environment variable
export TUF_REPO_FILE=/path/to/tuf-repo
Extract the trust root inside the TUF repo
unzip "$TUF_REPO_FILE" repo/metadata/1.root.json
Upload the trust root
oxide system update trust-root create --json-body repo/metadata/1.root.json
Upload the TUF repo binaries
oxide system update repo upload --path "$TUF_REPO_FILE"
After a successful upload, the response JSON will include a system version in the repo.system_version property. It should look something like x.y.z-0.ci+gitXXXXXXXXXXX (e.g., 17.0.0-0.ci+git83f7f06a0a3). You can also run oxide system update repo list to list the releases you’ve uploaded.
Set the target system version, with SYSTEM_VERSION set to the repo.system_version value from the upload response
oxide system update target-release update --system-version "$SYSTEM_VERSION"
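If the upload response is saved to a file or captured in a script, the system version can be extracted programmatically. The sketch below assumes only what is stated above: the response is JSON and the version is in the repo.system_version property. The function names and the loose format check are illustrative, and any other fields in the real response are ignored.

```python
import json
import re

# Loose pattern for versions such as 17.0.0-0.ci+git83f7f06a0a3.
# Assumption: the suffix after "git" is a lowercase hex string.
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+-0\.ci\+git[0-9a-f]+$")


def system_version_from_upload(response_text):
    """Return repo.system_version from the upload response JSON text."""
    response = json.loads(response_text)
    return response["repo"]["system_version"]


def looks_like_system_version(version):
    """Return True if the string resembles x.y.z-0.ci+gitXXXXXXXXXXX."""
    return VERSION_PATTERN.match(version) is not None


if __name__ == "__main__":
    # Minimal stand-in for a response; a real response has more fields.
    sample = '{"repo": {"system_version": "17.0.0-0.ci+git83f7f06a0a3"}}'
    version = system_version_from_upload(sample)
    print(version, looks_like_system_version(version))
```

The returned string is the value to place in SYSTEM_VERSION before setting the target release.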
Monitor update progress
Operators can monitor update progress on the System Update page in the web console, or by running
oxide system update status
The command returns component counts summarized by version string. Here is an example of the output:
{
  "components_by_release_version": {
    "16.1.0-0.ci+git8076220c1b7": 223,
    "17.0.0-0.ci+git529ba64ee28": 6
  },
  "suspended": false,
  "target_release": {
    "time_requested": "2025-10-10T21:20:54.455837Z",
    "version": "17.0.0-0.ci+git529ba64ee28"
  },
  "time_last_step_planned": "2025-10-10T21:23:22.700163Z"
}
The update process will continue to be enhanced in upcoming releases to include information about any issues encountered, their impact, and the expected operator action.
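The status output can be reduced to a single progress figure by comparing the count for the target version with the total. The following sketch relies only on the fields visible in the example output above; the function name is illustrative, and the real output may contain fields not handled here.

```python
import json


def update_progress(status):
    """Return (updated, total) component counts from a status document.

    A component counts as updated when it is reported under the
    target release version.
    """
    counts = status["components_by_release_version"]
    target = status["target_release"]["version"]
    total = sum(counts.values())
    updated = counts.get(target, 0)
    return updated, total


if __name__ == "__main__":
    # Status text as returned by the status command (abbreviated).
    status_text = """
    {
      "components_by_release_version": {
        "16.1.0-0.ci+git8076220c1b7": 223,
        "17.0.0-0.ci+git529ba64ee28": 6
      },
      "suspended": false,
      "target_release": {"version": "17.0.0-0.ci+git529ba64ee28"}
    }
    """
    updated, total = update_progress(json.loads(status_text))
    print(f"{updated} of {total} components updated ({100 * updated / total:.1f}%)")
```

For the example output, this reports 6 of 229 components on the target version. The update is complete when every component is reported under the target version.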