As an update to the earlier blog post, take a look below.
As a reminder: this is about resynchronization (i.e. recovery after a node or network problem), not about replication.
If you’ve got a demanding application, it may completely fill your I/O bandwidth (disk and/or network), leaving no room for the synchronization to complete. To make the synchronization slow down and let the application proceed, DRBD has the dynamically adaptive resync rate controller.
It is enabled by default with 8.4, and disabled by default with 8.3.
To explicitly enable or disable it, set c-plan-ahead to 20 (enable) or 0 (disable).
Note that, while the controller is enabled, the setting for the old fixed sync rate is used only as an initial guess for the controller. After that, only the c-* settings are used, so changing the fixed sync rate while the controller is enabled won’t have much effect.
What it does
The resync controller tries to use up as much network and disk bandwidth as it can get, but no more than c-max-rate, and throttles if either
- more resync requests are in flight than what amounts to c-fill-target (or, if c-fill-target is set to 0, if the current estimated response delay from the peer is more than c-delay-target); or
- it detects application IO (read or write), and the current estimated resync rate is above c-min-rate (unless c-min-rate is 0).
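The two throttle conditions above can be sketched as a small decision function. This is only an illustration of the rule as described in this post, not DRBD's actual kernel code; the ResyncState class, its field names, and should_throttle are hypothetical, and units are left abstract (they only need to be consistent within each comparison).

```python
from dataclasses import dataclass


@dataclass
class ResyncState:
    """Hypothetical snapshot of the values named in the text (not DRBD's real structures)."""
    in_flight: int          # amount of resync data currently in flight
    fill_target: int        # c-fill-target; 0 means "use c-delay-target instead"
    peer_delay: float       # current estimated response delay from the peer
    delay_target: float     # c-delay-target
    app_io_detected: bool   # application reads or writes were seen
    resync_rate: int        # current estimated resync rate
    min_rate: int           # c-min-rate; 0 disables the application-IO throttle


def should_throttle(s: ResyncState) -> bool:
    # Condition 1: too much in flight, or (if c-fill-target is 0)
    # the peer takes longer to respond than c-delay-target.
    if s.fill_target > 0:
        pipeline_full = s.in_flight > s.fill_target
    else:
        pipeline_full = s.peer_delay > s.delay_target

    # Condition 2: application IO is present and the resync already
    # runs faster than c-min-rate (never true when c-min-rate is 0).
    yields_to_app = (
        s.min_rate > 0 and s.app_io_detected and s.resync_rate > s.min_rate
    )
    return pipeline_full or yields_to_app
```

In this sketch, setting min_rate to 0 makes the second condition permanently false, which mirrors the "unless c-min-rate is 0" exception.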
The default c-min-rate with 8.4.x is 250 kiB/sec (the old default of the fixed sync-rate); with 8.3.x it was 4 MiB/sec.
This “throttle if application IO is detected” is active even if the fixed sync rate is used. You can (but should not, see below) disable this specific throttling by setting c-min-rate to 0.
Tuning the resync controller
It’s hard, or next to impossible, for DRBD to detect how much activity your backend can handle. But it is very easy for DRBD to know how much resync-activity it causes itself.
So, you tune how much resync-activity you allow during periods of application activity.
To do that you should
- set c-plan-ahead to 20 (default with 8.4), or more if there’s a lot of latency on the connection (WAN link with protocol A);
- leave the fixed resync rate (the initial guess for the controller) at about 30% or less of what your hardware can handle;
- set c-max-rate to 100% (or slightly more) of what your hardware can handle;
- set c-fill-target to the minimum (just as high as necessary) that gets your hardware saturated, if the system is otherwise idle. In other words, figure out the maximum possible resync rate in your setup while the system is idle, then set c-fill-target to the minimum setting that still reaches that rate;
- and finally, while checking application request latency/responsiveness, tune c-min-rate to the maximum that still allows for acceptable responsiveness.
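As a rough aid for the first three steps, the percentages above can be turned into starting values once you have measured what your hardware can handle. This is a minimal sketch under stated assumptions: starting_values and the "fixed-rate" key are hypothetical names, and the input is assumed to be a measured maximum in KiB/s. c-fill-target and c-min-rate are deliberately left out, since the steps above say they have to be found by experiment.

```python
def starting_values(hw_max_kib_s: int) -> dict:
    """Derive starting points (rates in KiB/s) from the measured hardware maximum.

    Hypothetical helper: it only encodes the rules of thumb from the list
    above and does not talk to DRBD in any way.
    """
    return {
        # Default with 8.4; raise it by hand on high-latency links.
        "c-plan-ahead": 20,
        # Fixed resync rate: about 30% (or less) of what the hardware can do.
        # Integer arithmetic avoids floating-point rounding surprises.
        "fixed-rate": hw_max_kib_s * 30 // 100,
        # 100% (or slightly more) of what the hardware can do.
        "c-max-rate": hw_max_kib_s,
    }


# Example: hardware measured at 100 MiB/s (102400 KiB/s) while idle.
for name, value in starting_values(102400).items():
    print(name, value)
```

Treat the result as a first guess only; the remaining two settings still need the idle-system and under-load measurements described above.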
Most parts of this post were originally published as an ML post by Lars.