Completed GSOC with QEMU project on "Moving I/O throttling and write notifiers into block filter drivers"

(Note: this is a copy of my completion report on the qemu mailing list: https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg05371.html)

This is a GSOC project summary required for the project’s final submission. As part of GSOC 2017, I took on the project of moving two hard-coded block layer features into filter drivers. I/O throttling is implemented in block/throttle.c, and before-write notifiers are split into a driver for each user of the before-write notifier API: block/backup.c and block/write-threshold.c. Furthermore, work began on block-insert-node and block-remove-node commands for the QMP interface to allow runtime insertion and removal of filter drivers. [0]

Many thanks to my mentors for their help: Alberto Garcia, Stefan Hajnoczi and Kevin Wolf.


The BlockBackend struct (block/block-backend.c) represents the backend part of a storage device that a VM sees in the QEMU environment. The BlockBackend is responsible for forwarding I/O requests from the VM down to the actual underlying storage: a network block device, a qcow2 image, etc.

To allow for polymorphic storage, a BlockBackend forwards requests to an acyclic graph in which the leaves are the terminal I/O destinations, such as a file or a network connection. The BlockDriverState struct represents a node in this graph, and each node is governed by a specific driver. Above the leaf nodes we have format drivers that translate requests for each format (e.g. a qcow2 driver or a raw driver). Backing files are implemented as chains of nodes that forward read requests to their children but keep write requests to themselves. This setup allows node drivers to intercept requests before they reach their destination by being inserted at points of interest in the graph. We call these block filter drivers.
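The idea of splicing a filter into a chain of nodes can be sketched in a few lines of Python. This is only a toy model, not QEMU code: the Node and LoggingFilter classes and the insert_filter helper are hypothetical names invented for illustration, and each node has a single child for simplicity.

```python
class Node:
    """Toy stand-in for a BlockDriverState with at most one child."""

    def __init__(self, name, child=None):
        self.name = name
        self.child = child

    def write(self, offset, data, trace):
        # Record that the request passed through this node, then forward it.
        trace.append(self.name)
        if self.child is not None:
            self.child.write(offset, data, trace)


class LoggingFilter(Node):
    """Hypothetical filter: observes each write, then forwards it unchanged."""

    def __init__(self, name, child):
        super().__init__(name, child)
        self.seen = []

    def write(self, offset, data, trace):
        self.seen.append((offset, len(data)))
        super().write(offset, data, trace)


def insert_filter(parent, filter_cls, name):
    """Splice a new filter node between parent and its current child."""
    parent.child = filter_cls(name, parent.child)
    return parent.child


# Build "qcow2 -> file", then insert a filter between the two nodes.
protocol = Node("file")
fmt = Node("qcow2", protocol)
flt = insert_filter(fmt, LoggingFilter, "filter")

trace = []
fmt.write(0, b"abcd", trace)
```

After the insertion, a request entering at the format node passes through the filter before reaching the leaf, while neither the parent nor the leaf needed to know what kind of node sits between them.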

One existing filter driver, for example, is block/blkverify.c, which compares two children for content parity and reports any mismatches.

I/O Throttling

I/O throttling is done by intercepting I/O requests and throttling them based on the configured limits (docs/throttle.txt). The interface was refactored into the throttle driver [1] while the throttling primitives were left unchanged. The existing interface for setting limits on a BlockBackend device is emulated [2] by inserting a throttle filter node, hidden from the user, at the root of the BlockBackend with the user’s limits. Implicitly created filter nodes are not a good solution, since some of the QEMU internals were written without filter nodes in mind. Some patches in the throttle-remove-legacy branch are dedicated to changing existing behaviour to match the new concept of implicit filters. In the future, management tools should be expected to add and remove filter nodes like throttle explicitly (except for transient block job filters, which may remain implicit), and there should be no surprises for the user about the state of the block layer graph.
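How a legacy "set limits on the device" call can be mapped onto an implicit filter node is sketched below. This is a simplified model under stated assumptions, not the code from the throttle-remove-legacy branch: the Backend and Node classes, the set_limits method and the "bps-total" key are hypothetical names used only for illustration.

```python
class Node:
    """Toy graph node; 'implicit' marks nodes the user did not create."""

    def __init__(self, driver, child=None, implicit=False):
        self.driver = driver
        self.child = child
        self.implicit = implicit
        self.limits = None


class Backend:
    """Toy stand-in for a BlockBackend that owns the root of a node chain."""

    def __init__(self, root):
        self.root = root

    def set_limits(self, limits):
        # Legacy-style call: reuse the implicit throttle node if one is
        # already at the root, otherwise create one above the current root.
        if not (self.root.driver == "throttle" and self.root.implicit):
            self.root = Node("throttle", self.root, implicit=True)
        self.root.limits = dict(limits)

    def chain(self, include_implicit):
        """List driver names from the root down, optionally hiding implicit nodes."""
        names, node = [], self.root
        while node is not None:
            if include_implicit or not node.implicit:
                names.append(node.driver)
            node = node.child
        return names


backend = Backend(Node("qcow2", Node("file")))
backend.set_limits({"bps-total": 1024})   # hypothetical limit name
backend.set_limits({"bps-total": 2048})   # second call reuses the same node
```

In this model the user-visible chain stays "qcow2, file" while the real chain has a throttle node on top, which is exactly the kind of mismatch that code unaware of filter nodes can trip over.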

Throttle groups are sets of drives that share the same limits, with requests from the members served by a round-robin algorithm. Additional effort was spent on making throttle groups easier to configure by turning them into a separately creatable object (with -object syntax on command line invocation or object-add in QMP). Their properties can be set with ‘qom-set’ commands and retrieved with ‘qom-get’.
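To make the round-robin sharing concrete, here is a toy model of a group whose members share one budget. It is not QEMU's scheduling code: the ToyThrottleGroup class, its per-tick budget and its method names are all assumptions made for this sketch.

```python
from collections import deque


class ToyThrottleGroup:
    """Toy group: members share one per-tick request budget, served round-robin."""

    def __init__(self, budget_per_tick):
        self.budget_per_tick = budget_per_tick
        self.queues = {}       # member name -> queue of pending requests
        self.order = deque()   # round-robin order of member names

    def register(self, member):
        self.queues[member] = deque()
        self.order.append(member)

    def submit(self, member, request):
        self.queues[member].append(request)

    def tick(self):
        """Dispatch at most budget_per_tick requests, one member at a time."""
        dispatched = []
        idle = 0  # consecutive members found with an empty queue
        while len(dispatched) < self.budget_per_tick and idle < len(self.order):
            member = self.order[0]
            self.order.rotate(-1)  # the next member goes first next time
            if self.queues[member]:
                dispatched.append((member, self.queues[member].popleft()))
                idle = 0
            else:
                idle += 1
        return dispatched


group = ToyThrottleGroup(budget_per_tick=3)
group.register("a")
group.register("b")
for request in ("a1", "a2", "a3"):
    group.submit("a", request)
group.submit("b", "b1")

first_tick = group.tick()
second_tick = group.tick()
```

Even though member "a" queued three requests before "b" queued one, "b" is served within the first tick, because the shared budget is handed out in turns rather than in arrival order.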

Write Notifiers

While a backup block job is running, it needs to know about writes to the relevant image. Before-write notifiers pass the write requests to the backup job so that it can perform copy-on-write to the target image before the new data is written. Currently this is done at the BlockBackend level. Other block jobs (commit/mirror) already create implicit nodes in the BDS chain, so this approach was copied and a backup filter driver was created [3], internal to block/backup.c.
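The copy-on-write step that a backup filter performs can be sketched as follows. This is a minimal model assuming the usual copy-before-write semantics (the old contents of a region are saved to the target before the guest write lands); the ToyBackupFilter class and the 4-byte CLUSTER size are hypothetical and chosen only to keep the example small.

```python
CLUSTER = 4  # toy cluster size in bytes (assumption for the example)


class ToyBackupFilter:
    """Toy filter that saves old data to the target before each write."""

    def __init__(self, source, target):
        self.source = source   # bytearray the guest writes to
        self.target = target   # bytearray receiving the point-in-time copy
        self.copied = set()    # indexes of clusters already saved

    def _copy_cluster(self, index):
        if index not in self.copied:
            start = index * CLUSTER
            self.target[start:start + CLUSTER] = self.source[start:start + CLUSTER]
            self.copied.add(index)

    def write(self, offset, data):
        # Before the write: save every cluster the request touches.
        first = offset // CLUSTER
        last = (offset + len(data) - 1) // CLUSTER
        for index in range(first, last + 1):
            self._copy_cluster(index)
        # Then let the write through to the source.
        self.source[offset:offset + len(data)] = data

    def finish(self):
        """Background part of the job: copy whatever was never written to."""
        for index in range(len(self.source) // CLUSTER):
            self._copy_cluster(index)


source = bytearray(b"AAAABBBBCCCC")
target = bytearray(len(source))
backup = ToyBackupFilter(source, target)

backup.write(5, b"xy")   # guest write during the backup
backup.finish()
```

The target ends up with the contents the source had when the backup started, even though the source was modified while the job was running.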

The write-threshold feature, once enabled via QMP, watches passing write requests and compares them to a user-given threshold offset. When that threshold is exceeded, an event is delivered to the user and the feature disables itself. This is used by management tools that need to know when they have to resize their images. Like backup, this was done at the BlockBackend level. However, it wasn’t easy to replace the existing interface with an implicit filter node as in throttling, so only a separate driver was created [4] in block/write-threshold.c. Like other filter drivers, it can be inserted at runtime; once it has delivered the event it is spent and should be removed or replaced.
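The one-shot behaviour described above can be sketched like this. It is a toy model rather than the driver in block/write-threshold.c: the ToyWriteThreshold class, the callback and the event field names are hypothetical.

```python
class ToyWriteThreshold:
    """Toy filter that reports once when a write passes a threshold offset."""

    def __init__(self, threshold, on_event):
        self.threshold = threshold   # byte offset; None once the filter is spent
        self.on_event = on_event     # callback standing in for event delivery

    @property
    def spent(self):
        return self.threshold is None

    def write(self, offset, data):
        end = offset + len(data)
        if self.threshold is not None and end > self.threshold:
            # Hypothetical event payload; field names are illustrative only.
            self.on_event({"threshold": self.threshold,
                           "exceeded_by": end - self.threshold})
            self.threshold = None    # disarm: the filter fires only once
        return end                   # a real filter would forward the write here


events = []
watcher = ToyWriteThreshold(threshold=100, on_event=events.append)

watcher.write(0, b"x" * 50)    # ends at 50, below the threshold
watcher.write(90, b"x" * 20)   # ends at 110, crosses the threshold
watcher.write(200, b"x")       # filter is already spent, no second event
```

Once the event has fired the filter only passes requests through, which is why a management tool would want to remove it or replace it with a freshly armed one.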

Branches / Patches

The ‘throttle’ and ‘throttle-remove-legacy’ patches should be merged soon after master unfreezes from the 2.10 release. The rest of the patch series are in the final stages of review on qemu-devel, except for block-insert-node, which is an RFC [5].

Already merged patches in 2.10: https://github.com/qemu/qemu/commits/v2.10.0-rc4?author=epilys

Already merged patches for 2.11: https://www.mail-archive.com/address@hidden/msg470461.html

[0] [insert-node] block-insert-node and block-remove-node commands https://github.com/epilys/qemu/commits/insert-node?author=epilys

[1] [throttle] add throttle filter driver https://github.com/epilys/qemu/commits/throttle?author=epilys Message-ID: address@hidden https://www.mail-archive.com/address@hidden/msg476047.html

[2] [throttle-remove-legacy] remove legacy throttling interface https://github.com/epilys/qemu/commits/throttle-remove-legacy?author=epilys

[3] [4] [notify] https://github.com/epilys/qemu/commits/notify?author=epilys

[5] block-insert-node RFC Message-ID: address@hidden https://www.mail-archive.com/address@hidden/msg473619.html