The ability to boot a PC from the network has been around since the 1990s, standardized by the PXE specification published by Intel. It leverages DHCP and TFTP to enable devices to request and receive boot images from a server, which we've used for years with Windows Deployment Services (successor to Remote Installation Services), Microsoft Configuration Manager, and many other OS deployment tools.
Even in these "modern" days, PXE continues to be heavily used, for a variety of scenarios:
- "Clean OS" installation. Even with "cloud native" deployment processes like those implemented with Windows Autopilot, organizations still want to start with a clean, up-to-date OS on their devices.
- Break/fix. Drives fail, Windows installations run into issues -- whatever the cause, it's often faster to blast a new OS to the device than it is to try to fix it.
- Device recovery. Whether it's malware, ransomware, software flaws, or any other "mass PC casualty," it can be faster and safer to just rebuild the device.
Alternatives like USB keys are often slower, and certainly less convenient and more challenging to manage, than PXE booting. So even 20 years later, people prefer PXE booting whenever possible -- and it continues to work well. But that doesn't mean there aren't improvements that can be made, especially when it comes to performance. Let's walk through some scenarios.
The basics: WDS or ConfigMgr "out of the box"
Let's assume you are setting up WDS or ConfigMgr PXE for the first time. If you don't tweak anything, PXE booting will work fine. But "fine" doesn't mean "fast." A simple test pulling a fairly typical Windows PE boot WIM (around 569MB, including .NET Framework and PowerShell) to a device via a PXE-initiated TFTP transfer takes over three minutes. Trying to do the same thing over a simulated WAN link increases that time substantially -- to the point where I don't even want to wait long enough to find out how long it would take -- it's in the hours.
But why is it so slow? Three minutes doesn't seem slow, but hours certainly does. If you go back to the original TFTP spec, the default block size was 512 bytes, and the client had to acknowledge every packet before the server would send the next. On a LAN, that's not awful, but as you add longer round-trip times (RTT) for each packet sent, it quickly becomes unusable.
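To put rough numbers on that, here is a minimal sketch of the lock-step arithmetic: one 512-byte block per round trip, ignoring bandwidth, processing time, and retransmissions. The 0.2ms LAN round-trip time is an assumption chosen for illustration, not a measured value.

```python
import math

def lockstep_tftp_seconds(file_bytes: int, rtt_seconds: float, block_size: int = 512) -> float:
    """Estimate transfer time when every block must be acknowledged before the next is sent."""
    blocks = math.ceil(file_bytes / block_size)
    # Each block costs one full round trip (data out, ACK back).
    return blocks * rtt_seconds

WIM_BYTES = 569 * 1024 * 1024  # the ~569MB boot WIM from the test above

# RTT values are illustrative assumptions (the LAN figure in particular).
for label, rtt_ms in [("LAN, assumed 0.2ms", 0.2), ("30ms link", 30), ("400ms link", 400)]:
    seconds = lockstep_tftp_seconds(WIM_BYTES, rtt_ms / 1000)
    print(f"{label}: about {seconds / 60:.1f} minutes")
```

Under these assumptions the LAN case comes out a little under four minutes, while the 30ms link lands at more than nine hours, which is in line with the "it's in the hours" observation.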
Some tuning is required
There have been improvements to TFTP over the years to help with the poor performance of the initial protocol. RFC 2348 added a new "blksize" option that enables the specification of larger block sizes; in practice you want each packet to fit within the network medium's maximum transmission unit (MTU) so it isn't fragmented. Many years later, in 2015, another new option was added by RFC 7440: a "windowsize" option that specifies the number of packets that the server can send before the client needs to send back an acknowledgement.
If you combine those two, you can greatly improve TFTP throughput. A good reference on this is a blog post by Jörgen Nilsson, which tests various values on different types of hardware. His recommendation at the time was for a block size of 1456 (so it fits within the 1500 byte Ethernet MTU) and a window size of 8 (to avoid issues on VMware). With the move to UEFI-based machines, which all implement consistent PXE/TFTP functionality, larger sizes may now be possible (especially for the window size), but I'll leave that testing to the reader and stick to 1456 and 8.
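As a rough illustration of why these two options matter, the sketch below counts how many times the server has to stop and wait for an acknowledgement. It is an idealized count only: real transfer times also depend on bandwidth, the firmware, and the server, so don't expect it to reproduce measured results.

```python
import math

def ack_round_trips(file_bytes: int, block_size: int, window_size: int) -> int:
    """Count acknowledgement waits: one per window of blocks sent."""
    blocks = math.ceil(file_bytes / block_size)
    return math.ceil(blocks / window_size)

WIM_BYTES = 569 * 1024 * 1024  # the ~569MB boot WIM used in this post

default_waits = ack_round_trips(WIM_BYTES, block_size=512, window_size=1)
tuned_waits = ack_round_trips(WIM_BYTES, block_size=1456, window_size=8)

print(f"Default (512 x 1): {default_waits:,} waits")
print(f"Tuned (1456 x 8):  {tuned_waits:,} waits")
print(f"Reduction: about {default_waits / tuned_waits:.1f}x fewer waits")
```

Every one of those waits costs a full round trip, so the fewer of them there are, the less the link latency hurts.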
If you read my previous post about simulating "real-world" networks, you'll recall that I defined a variety of different network types, each with different speeds and round-trip times (RTT). With these tweaks, scenarios that would have taken impossibly long before are now actually measurable:
Type | Bandwidth | RTT | TFTP |
LAN | -- | -- | 0:00:33 |
WAN | 256Mbit/s | 30ms | 0:08:33 |
Local cloud | 1Gbit/s | 60ms | 0:25:39 |
WAN inter-continent | 256Mbit/s | 100ms | 0:42:44 |
Remote cloud | 1Gbit/s | 190ms | 1:16:56 |
Satellite | 100Mbit/s | 400ms | 2:50:58 |
From three minutes to 33 seconds just from tweaking a couple of simple parameters.
Measurable, sure, but still only the LAN time could be considered good -- the other times would really test your patience. And throw in some packet loss (especially on the satellite link) and it may never finish.
TFTP has to go
You could continue trying to tweak the block size and window size to squeeze out some additional improvements, but at the end of the day the limitation is really the protocol itself: TFTP was built for LANs with very low latency, and it is quite poor for everything else.
So what alternatives do we have? What about HTTP/HTTPS? Those are based on the TCP protocol, which was designed to handle higher-latency (high RTT) network links through the use of advanced sliding window techniques. The device firmware didn't support this until more recently (more on that later), so you needed to use something that did. That's where iPXE comes in. It implements the ability to download that same Windows PE WIM file using HTTP or HTTPS. The small iPXE binary is transferred first using TFTP and loaded by the firmware, then it takes over to transfer the WIM file using HTTP or HTTPS.
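To see why a sliding window helps on high-latency links, here is a small sketch of the usual bandwidth-delay reasoning: a sender can have at most one window of data in flight per round trip, so throughput is capped by the smaller of the link speed and window size divided by RTT. The window sizes used below are hypothetical values for illustration, not what iPXE or any particular server actually negotiates.

```python
def tcp_throughput_bound_mbit(bandwidth_mbit: float, rtt_ms: float, window_bytes: int) -> float:
    """Upper bound on throughput: limited by the link or by one window per round trip."""
    window_limited_mbit = window_bytes * 8 / (rtt_ms / 1000) / 1_000_000
    return min(bandwidth_mbit, window_limited_mbit)

# Hypothetical window sizes on a 256Mbit/s link with a 30ms RTT.
for window_bytes in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
    bound = tcp_throughput_bound_mbit(256, 30, window_bytes)
    print(f"{window_bytes // 1024:>5} KiB window: at most {bound:.1f} Mbit/s")
```

With a 64 KiB window the 30ms link tops out at roughly 17 Mbit/s, while a window of a few megabytes is enough to fill the whole link. TCP can grow its window as conditions allow, whereas the tuned TFTP settings above stay fixed at a few blocks.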
So let's see how long it takes to complete the WIM file transfer over those same network connections using HTTP:
Type | Bandwidth | RTT | TFTP | HTTP |
LAN | -- | -- | 0:00:33 | 0:00:01 |
WAN | 256Mbit/s | 30ms | 0:08:33 | 0:00:32 |
Local cloud | 1Gbit/s | 60ms | 0:25:39 | 0:00:25 |
WAN inter-continent | 256Mbit/s | 100ms | 0:42:44 | 0:03:13 |
Remote cloud | 1Gbit/s | 190ms | 1:16:56 | 0:01:24 |
Satellite | 100Mbit/s | 400ms | 2:50:58 | 0:07:57 |
Now we're talking: not only are LAN speeds good, but so are WAN and local cloud speeds. All the scenarios are doable in a pinch, but certainly having a single PXE server for a number of WAN-connected locations, or even hosting a PXE server in the cloud, becomes a real possibility.
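For a quick sense of scale, this small sketch converts the times from the table above into seconds and works out how many times faster HTTP was in each scenario. The numbers are copied from the table; only the helper name is made up for illustration.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

# (TFTP time, HTTP time) for each network type, taken from the table above.
RESULTS = {
    "LAN": ("0:00:33", "0:00:01"),
    "WAN": ("0:08:33", "0:00:32"),
    "Local cloud": ("0:25:39", "0:00:25"),
    "WAN inter-continent": ("0:42:44", "0:03:13"),
    "Remote cloud": ("1:16:56", "0:01:24"),
    "Satellite": ("2:50:58", "0:07:57"),
}

for network, (tftp, http) in RESULTS.items():
    speedup = to_seconds(tftp) / to_seconds(http)
    print(f"{network}: HTTP is roughly {speedup:.0f}x faster")
```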
So what does it take to use iPXE? Since we're typically talking about UEFI-based machines these days, that also means Secure Boot. Fortunately, our iPXE Anywhere solution is signed by Microsoft to enable it to be used with Secure Boot enabled. Using the 2Pint-provided 2PXE server to replace your WDS or ConfigMgr-provided PXE service, we can deploy the same boot images that you use today -- just faster.
But wait, there's more
So far, we've messed around with the protocols involved, TFTP and HTTP/HTTPS. What else could be done to reduce the time required? Well, we need to reduce the amount of data transferred. Compression won't do any good (we're talking about WIM files that are already nicely compressed), so what other mechanism is available? Simple: peer-to-peer sharing.
Since we are talking about downloading a WIM file from a Windows web server using HTTP/HTTPS, there's an existing Microsoft technology that can help out: BranchCache. Instead of downloading the full content from the remote server, the client can download metadata that describes chunks of the file; the client can then reach out to other PCs on the same network segment to see if they already have those chunks.
OK, but normally that logic is built into Windows too: the BITS client understands the BranchCache protocol and uses it to do peer transfers when possible. How does iPXE accomplish the same thing? It includes its own BranchCache (a.k.a. PeerDist) client-side implementation, as Microsoft has published the specification for how it works.
So as long as there are other available devices that have already downloaded and cached the needed chunks (something that we can actively manage with another product called CacheR), the PXE-booting device can get those chunks without needing to go over the slow network links.
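The general idea can be sketched in a few lines. This is a simplified illustration of hash-described chunks with a peer-first lookup, not the actual BranchCache/PeerDist protocol (which also handles peer discovery, security, and more). The function names, the in-memory "peer cache," and the 64 KiB block size are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

BLOCK_SIZE = 64 * 1024  # illustrative block size

def describe_content(content: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Server side: produce the small metadata list (one hash per block)."""
    return [
        hashlib.sha256(content[i:i + block_size]).hexdigest()
        for i in range(0, len(content), block_size)
    ]

def fetch_blocks(
    block_hashes: List[str],
    peer_cache: Dict[str, bytes],
    fetch_from_server: Callable[[int], bytes],
) -> Tuple[bytes, int]:
    """Client side: take each block from a peer if possible, else from the server."""
    parts = []
    from_peers = 0
    for index, digest in enumerate(block_hashes):
        block = peer_cache.get(digest)
        # Only trust a peer's block if it matches the hash the server gave us.
        if block is not None and hashlib.sha256(block).hexdigest() == digest:
            from_peers += 1
        else:
            block = fetch_from_server(index)
        parts.append(block)
    return b"".join(parts), from_peers

# Demo: four distinct blocks, two of which a nearby peer already holds.
content = b"".join(bytes([value]) * BLOCK_SIZE for value in range(4))
hashes = describe_content(content)
peer_cache = {hashes[0]: content[:BLOCK_SIZE], hashes[2]: content[2 * BLOCK_SIZE:3 * BLOCK_SIZE]}

data, from_peers = fetch_blocks(
    hashes, peer_cache, lambda i: content[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
)
print(f"{from_peers} of {len(hashes)} blocks came from peers")
```

The more blocks that peers already hold, the less has to cross the slow link; only the hash list and any missing blocks come from the remote server.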
There is still some TFTP (iPXE binaries) and remote HTTP/HTTPS (chunk details) traffic required, but this is typically a small percentage of the total transfer size. Cross-continent and remote cloud scenarios are easily doable, and even satellite links could be considered. There are even benefits for LAN scenarios: reduced network load on the server results in greater scalability, e.g. more simultaneous clients.
And after we've finally booted into Windows PE, we can also use peering for all the other needed content: operating system images, drivers, etc.
Can we eliminate TFTP altogether?
The UEFI firmware in modern PCs can directly boot using HTTP/HTTPS, so it is certainly possible to configure the device to boot directly to iPXE to begin a bare metal deployment. That's not to say that there aren't some complications with that setup, since every PC maker does things differently. But it's definitely something we are working to streamline, so stay tuned.
For more information
Want to know more? Click the "Contact us" button on the https://2pintsoftware.com home page and provide your details. Want to know pricing? Per device, per year prices can be found on the website as well (free for government, education, and non-profits too).