Preemptive I/O: Maximising HDD Throughput
Computer performance increases at a tremendous rate. As each new performance barrier is passed, the weakest link becomes more and more apparent: the hard disk. We might be able to change that...
Although we buy faster and faster hard disk drives, our apparent disk speed does not seem to increase significantly, if at all. Why?
Well, the incremental improvements in hard-disk technology do not compare with the exponential improvements evident in the rest of the computer, principally the CPU and RAM. Additionally, the applications, databases and files that we work with today are often quite large (in terms of the volume of data that must be read from disk) and they tend to increase in size with each new revision. Furthermore, the average computer will have a fragmented file-system which, in turn, causes seek delays as the disk re-positions the drive head for the next read/write. If compression and/or encryption is used on the disk in question, then performance takes a further hit as decompression and decryption operations take place. Put these together and the result is a system with an under-utilised CPU that periodically has to wait for the hard disk(s) to catch up.
Performance users can choose high-speed SCSI disks and RAID 0 arrays to maximise throughput. SATA drives (in a RAID configuration) have the potential to yield a considerable increase in data transfer rates too. Even so, their users will still suffer a delay between launching an application or opening a file and actually being able to work with it and, for these power-users at least, that delay is more than mildly annoying.
If your pockets are infinitely deep, you can just about completely eradicate the disk bottleneck by choosing a computer equipped with a Solid State Disk (SSD). A high-performance SSD can handle around a quarter of a million random I/Os per second and can yield sustainable throughput of >3 GB/second! Furthermore, an SSD has no moving parts (discounting, for a moment, backup storage) and so is not subject to mechanical wear-and-tear. Hence, mean time between failures (MTBF) figures of >1,000,000 hours are often quoted. However, aside from astronomical costs, SSDs also have one other significant drawback - limited capacity. The smaller disks are measured in MBs and the largest SSD I found topped-out at 64GB (and God only knows how much it costs). In today's storage-hungry environment, this is a severe restriction to their deployment. Especially when you consider that, at the time of writing, 1TB of disk storage can be acquired for a paltry $1200 (USD)!
What can we do to increase disk performance? We can increase rotational speed - which involves machining to extremely fine tolerances and thus also increases the potential for a catastrophic failure. We can increase track density and reduce the number of platters, thereby reducing the amount of head movement required. We can increase the speed at which the disk controller can read/write, up to a point. The interface between disk and CPU could be further refined and we can increase cache size and improve the caching algorithms. All of these areas already receive millions of R&D dollars.
Introducing Preemptive I/O

I began to wonder if there was anything we could do in software to alleviate this performance handicap and I have developed an embryonic idea that I have christened "Preemptive I/O".
Suppose the computer could anticipate our disk I/O requirements. It could, theoretically, read and write ahead of our actual request to do so, performing the bulk of its I/O in the background.
I tend to use the same collection of applications during my day-to-day computer usage, opening and closing them as required. Assuming this holds true for the average user, Preemptive I/O could use pattern analysis and perhaps a Bayesian Model Averaging algorithm to take a guess at which application we're going to want to launch next and pre-fetch some or all (RAM permitting) of the relevant files from disk. When we do launch the application, its components (or a significant part of them) are already in main memory - so startup appears to be instantaneous.
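To make the guessing step a little more concrete, here is a minimal sketch in Python. It does not implement Bayesian Model Averaging; it swaps in a much simpler first-order frequency model that just counts which application tends to be launched after which. The class name LaunchPredictor and its methods are hypothetical, invented purely for illustration.

```python
from collections import Counter, defaultdict


class LaunchPredictor:
    """Hypothetical helper: learns which application tends to follow which.

    This is a simple first-order frequency model, used here as a stand-in
    for the more sophisticated pattern analysis suggested in the text.
    """

    def __init__(self):
        # transitions[previous_app][next_app] -> number of times observed
        self.transitions = defaultdict(Counter)
        self.last_app = None

    def record_launch(self, app):
        """Note that `app` was launched, updating the transition counts."""
        if self.last_app is not None:
            self.transitions[self.last_app][app] += 1
        self.last_app = app

    def predict_next(self, top_n=1):
        """Return up to `top_n` likely next applications, most likely first."""
        if self.last_app is None:
            return []
        followers = self.transitions.get(self.last_app, Counter())
        return [app for app, _count in followers.most_common(top_n)]


predictor = LaunchPredictor()
for launched in ["mail", "browser", "editor", "mail", "browser", "mail"]:
    predictor.record_launch(launched)

# "mail" has been followed by "browser" twice, so that is the guess.
candidates = predictor.predict_next()
```

A real system would feed the predicted application names to whatever component does the actual pre-fetching, and would probably weigh time of day and other context as well as launch order.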
The same could be true for data files. It is common for systems to maintain a cache of references to recently used files. Thus the computer effectively "knows" what we're currently working on. This same cache could be used by the Preemptive I/O system to pre-fetch the most recently used data files. If there is sufficient RAM, it could do this for each application it pre-fetches. If not, then it could pre-fetch as the application starts.
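As a rough illustration of the data-file case, the sketch below walks a most-recently-used list and reads each file into memory until a RAM budget is exhausted. The function name prefetch_recent_files, its parameters and the in-memory dictionary are all assumptions made for this example; a real implementation would more likely warm the operating system's own file cache than hold the bytes itself.

```python
import os


def prefetch_recent_files(recent_paths, budget_bytes):
    """Hypothetical helper: load recently used files into RAM.

    `recent_paths` is assumed to be ordered most recent first.
    Files that are missing, unreadable, or too big for the remaining
    budget are skipped. Returns a dict mapping path -> file contents.
    """
    cache = {}
    remaining = budget_bytes
    for path in recent_paths:
        try:
            size = os.path.getsize(path)
            if size > remaining:
                continue  # would blow the RAM budget; try the next file
            with open(path, "rb") as handle:
                cache[path] = handle.read()
        except OSError:
            continue  # file vanished or cannot be read; ignore it
        remaining -= size
    return cache
```

Skipping an oversized file rather than stopping means several small documents can still be pre-fetched even when the most recent one is too large to fit.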
Disk writes could also be preempted. Indeed, a lot of desktop software already does this to a certain degree, with the application's "auto save" facility. I would suggest that the Preemptive I/O system takes this one step further - by assuming that anything that is created or modified will, ultimately, need to be written to disk, the Preemptive I/O system could stream changes to the disk as they occur. Then, when the user elects to save, the stream is "finalised" - a final write operation that would appear to be almost instantaneous.
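The "stream, then finalise" idea might look something like the following sketch. It assumes, for simplicity, that changes are only ever appended; real documents are edited in the middle too, which would need a proper change log. The class name StreamedSave and the ".pending" file suffix are hypothetical. Changes are streamed to a side file as they occur, and the user's save becomes a flush plus a rename.

```python
import os


class StreamedSave:
    """Hypothetical helper: stream changes to disk, finalise on save."""

    def __init__(self, target_path):
        self.target_path = target_path
        # Changes go to a side file so the real file is never half-written.
        self.pending_path = target_path + ".pending"
        self._handle = open(self.pending_path, "wb")

    def write_change(self, data):
        """Stream a change to disk as soon as it happens (append-only)."""
        self._handle.write(data)
        self._handle.flush()

    def finalise(self):
        """Called when the user saves: the bulk of the I/O is already done."""
        self._handle.flush()
        os.fsync(self._handle.fileno())  # make sure the data reaches the disk
        self._handle.close()
        # Swap the streamed file into place in a single step.
        os.replace(self.pending_path, self.target_path)
```

Because the data has been trickling out to disk all along, the work left at save time is small, which is what would make the final write appear almost instantaneous.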
Preemptive I/O should not be particularly demanding of the CPU. It would require a substantial amount of RAM to be truly effective, but I don't see this as a barrier: both CPU power and RAM are in abundance in modern desktop computers. Furthermore, as 64-bit CPUs begin their rise into the mainstream, the amount of addressable RAM increases dramatically. Thus, it would be entirely possible to have a separate bank of DIMM sockets dedicated to Preemptive I/O.
So there you have it (I did warn that it was embryonic). I'm not sure where I'm going with this at this time - I may expand on the idea, I may do a little more research. If I do, I'll document everything here.
Questions
- Would a Preemptive I/O system be worthwhile?
- Is such a system workable/possible?
- Has it already been done?
- Any volunteers?