**Congestion Avoidance on a lossy link**
Mathis et al. gave us in their paper //The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm// a simple formula for the throughput in a specific scenario.
The scenario is a single TCP sender that sends full-sized segments to a TCP receiver over a link with high bandwidth, a constant delay, and a fixed packet loss rate.
The condition for the formula is that the bandwidth is high enough that the fixed packet loss limits the throughput.
====== Theory ======
The packet loss rate, denoted by p, means that 1/p consecutive packets are delivered, followed by one drop.
The drop halves the cwnd from W = cwnd_max to W/2.
After the reduction, cwnd increases in congestion avoidance from W/2 to W by one packet per RTT.
Thus, after W/2 RTT cwnd reaches W again, which gives us a perfect sawtooth.
{{:public:perfect_cwnd.png?900 | Perfect cwnd}}
The total number of packets per cycle gives us the area under the sawtooth, which is (W/2)^2 + 1/2 (W/2)^2 = 3/8 W^2.
Since we know that this must be equal to 1/p, we can solve for W = sqrt(8/(3p)).
With this, the common formula for throughput Tp = data per cycle / time per cycle = (MSS * 3/8 W^2) / (RTT * W/2) solves to
Tp = MSS/RTT * 1/sqrt(2p/3) = MSS/RTT * C/sqrt(p)
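The algebra can be sanity-checked numerically, for instance in Python (the values are the ones also used in the simulation below):

```python
import math

# values as in the simulation below
MSS = 1252      # bytes per full-sized packet
RTT = 0.002     # seconds
p   = 0.02      # packet loss rate

# W = cwnd_max in packets, from 3/8 * W^2 = 1/p
W = math.sqrt(8 / (3 * p))
assert abs(3 / 8 * W**2 - 1 / p) < 1e-6

# throughput as data per cycle / time per cycle ...
tp_cycle  = (MSS * 3 / 8 * W**2) / (RTT * W / 2)
# ... equals the closed form MSS/RTT * C/sqrt(p) with C = sqrt(3/2)
tp_closed = MSS / RTT * math.sqrt(3 / 2) / math.sqrt(p)
assert abs(tp_cycle - tp_closed) < 1e-3
```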
Besides the constant C = sqrt(3/2), the paper gives other values for C when acks are delayed or when the packet loss is random rather than periodic.
The C value for delayed acks was derived for a sender that counts acks instead of considering the number of packets that were acked.
Modern TCP implementations and QUIC consider the number of packets that an ACK acks. Therefore, this value for C does not apply here.
The C value for random packet loss was derived in another paper by an approximate mathematical model.
====== Simulation ======
To simulate the scenario, the example lossy_link was created with the mathis_simple.ini config file.
The config file contains two configs: one for ack every packet and one for ack every other packet (delayed ack).
The following parameters are relevant.
<code ini>
# use lossy_link_shared as network
network = lossy_link_shared

# avoid flow control limits
**.quic.initialMaxData = 4294967295
**.quic.initialMaxStreamDataBidiRemote = 4294967295
**.quic.initialMaxStreamDataUni = 4294967295

# set link properties
**.linkDelay = 1ms
**.linkDatarate = 100000Mbps
**.ppp[*].ppp.mtu = ${mtu=1280}B

# always have enough bytes to send
**.client.quic.sendQueueLimit = 50000000
**.client.quic.sendQueueLowWaterRatio = .8

# use the accurate increase to come closer to the theory
**.client.quic.accurateIncreaseInNewRenoCongestionAvoidance = true

# let the sender send nothing but stream frames to avoid sending small (not full-sized) packets
**.quic.bundleAckForNonAckElicitingPackets = false

# set the loss rate p = 2 %, meaning 1/0.02 = 50 packets are delivered followed by one drop
**.client.quic.periodicSendLossRate = .02

# configs for ack every packet and ack every other packet
[Config ack_every_packet]
**.server.quic.numReceivedAckElicitingsBeforeAck = ${pktBeforeAck=1}

[Config ack_every_other_packet]
**.server.quic.numReceivedAckElicitingsBeforeAck = ${pktBeforeAck=2}
</code>
===== Ack every packet =====
Simulating the config for ack every packet and drawing cwnd, bytesInFlight and partialBytesAcked gives us the following diagram.
{{ :public:mathis_simple.svg?900 | Mathis Simple}}
Here, cwnd_max = 12520 bytes, which corresponds to W = 10 (the MTU is 1280 bytes; reduced by the IP and UDP headers, we get 1252 bytes per packet).
The theory says one cycle takes W/2 RTT = 5 RTT. As shown in the diagram, one cycle in the simulation takes (W/2 + 2) RTT = 7 RTT.
We have two additional RTTs because packet loss detection and the following recovery period take one additional RTT each.
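The packet size arithmetic can be checked quickly (assuming an IPv4 header of 20 bytes and a UDP header of 8 bytes):

```python
MTU        = 1280   # bytes, as configured
IP_HEADER  = 20     # bytes (IPv4 without options, an assumption)
UDP_HEADER = 8      # bytes

mss = MTU - IP_HEADER - UDP_HEADER   # QUIC packet size
print(mss)                           # -> 1252

cwnd_max = 12520                     # bytes, read from the diagram
print(cwnd_max // mss)               # -> 10, i.e. W in full-sized packets
```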
**One RTT to detect packet loss.**
QUIC detects a packet loss only by receiving ACKs for packets sent after the packet loss.
These ACKs arrive naturally one RTT after the packet loss occurred.
**One RTT for the recovery period.**
When a packet loss is detected, QUIC halves the cwnd. This leaves bytesInFlight far above cwnd.
QUIC has to wait for more incoming ACKs that reduce bytesInFlight below cwnd before it is allowed to continue sending packets.
As shown in the diagram, the acks arriving after the packet loss do not increase partialBytesAcked
(with the specified accurate increase, we increase cwnd when partialBytesAcked reaches cwnd).
This is because QUIC remains in recovery mode until it receives the first ack for a packet sent after the packet loss was detected.
These are the packets sent one RTT after the packet loss, when bytesInFlight falls below cwnd.
Since the ACKs for these packets arrive naturally one RTT later, QUIC has to wait two RTTs before it is able to raise cwnd for the first time after a packet loss.
Thus, one cycle takes (W/2 + b) RTT with b = 2. Re-deriving the formula with this new cycle time gives us
[[https://chart.apis.google.com/chart?cht=tx&chl=Tp=\frac{3}{4}\cdot\frac{MSS}{RTT}\cdot(\sqrt{\frac{8}{3p}%2Bb^2}-b)&.png]]
where MSS is the QUIC Packet Size, which is the MTU reduced by the IP and UDP header (= 1252 bytes in the simulation).
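As a quick numerical check (not simulation code), evaluating this formula with the simulation values MSS = 1252 bytes, RTT = 2 ms, p = 2 %, and b = 2 gives the theoretical throughput reported in the results section:

```python
import math

MSS = 1252    # bytes (QUIC packet size in the simulation)
RTT = 0.002   # seconds
p   = 0.02    # packet loss rate
b   = 2       # extra RTTs per cycle (loss detection + recovery)

# Tp = 3/4 * MSS/RTT * (sqrt(8/(3p) + b^2) - b), converted to bit/s
tp = 3 / 4 * (MSS * 8 / RTT) * (math.sqrt(8 / (3 * p) + b**2) - b)
print(round(tp / 1e6, 2))   # -> 36.5 (Mb/s)
```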
===== Ack every other packet =====
Simulating the config for ack every other packet and again drawing cwnd, bytesInFlight and partialBytesAcked gives us the following diagram.
{{ :public:mathis_simple_delayed.svg?900 | Mathis Simple Delayed}}
The important difference here is that the time from the detection of a packet loss until cwnd is increased takes one RTT longer than before.
That is due to the delayed ack strategy of the receiver.
Every time the sender sends an odd number of packets, the receiver delays the ack of the last packet.
Let's first look at the good case, where the sender sends an even number of packets.
Three RTTs after the packet loss was detected, cwnd is increased to 6 full-sized packets.
Thus, the sender sends 6 packets.
One RTT later, the sender receives 3 acks for all 6 sent packets.
Each acknowledged packet increases partialBytesAcked.
partialBytesAcked, which was at 2 full-sized packets, gets increased by the first 4 acknowledged packets to the value of cwnd.
This event increases cwnd from 6 to 7 full-sized packets and resets partialBytesAcked to 0.
The 2 remaining acknowledged packets increase partialBytesAcked to 2 full-sized packets, the same value as before the acks were received.
Now, let's look at the bad case, where the sender sends an odd number of packets.
Right after the sender increased cwnd to 7 full-sized packets (4 RTTs after the packet loss was detected), it sends 7 packets.
One RTT later, the sender receives 3 acks for the first 6 sent packets (the receiver delays the ack for the 7th packet by the ack delay timer).
partialBytesAcked, which was at 2 full-sized packets, gets increased by the first 5 acknowledged packets to the value of cwnd.
cwnd increases from 7 to 8 full-sized packets and partialBytesAcked resets to 0.
The 1 remaining acknowledged packet increases partialBytesAcked to 1 full-sized packet, one packet less than before the acks were received.
In fact, partialBytesAcked decreases by one full-sized packet every second RTT because of the delayed acks from the receiver.
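The bookkeeping of the two cases above can be sketched in a few lines of Python. This is a simplification of the accurate-increase logic described here (counting in full-sized packets; the function name is made up, not taken from the simulation):

```python
MSS = 1  # count in units of full-sized packets for simplicity

def on_acked(cwnd, pba, num_acked):
    """Accurate-increase sketch: every acked full-sized packet raises
    partialBytesAcked (pba); when pba reaches cwnd, cwnd grows by one
    packet and pba resets to 0."""
    for _ in range(num_acked):
        pba += MSS
        if pba >= cwnd:
            cwnd += MSS
            pba = 0
    return cwnd, pba

# good case: cwnd = 6, partialBytesAcked = 2, all 6 sent packets acked
print(on_acked(6, 2, 6))   # -> (7, 2): pba ends where it started
# bad case: cwnd = 7, partialBytesAcked = 2, only 6 of 7 packets acked
print(on_acked(7, 2, 6))   # -> (8, 1): pba ends one packet lower
```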
Once partialBytesAcked hits 0, a delayed ack delays the cwnd increase.
A receiver delays an ack, and therefore, in this particular case, the cwnd increase at the sender, at least until a second packet arrives and at most until the ack delay timer fires.
Since a second packet arrives after one RTT, the actual delay is min(RTT, ackDelayTimer).
In the simulation RTT < ackDelayTimer, which gives us a delay of min(RTT, ackDelayTimer) = RTT.
Whether this delay in the cwnd increase happens in a cycle depends on whether partialBytesAcked hits 0 in that cycle.
If partialBytesAcked increases to more than W/4 full-sized packets one RTT after the packet loss was detected, partialBytesAcked will not hit 0 in the cycle.
How high partialBytesAcked rises depends on the number of packets sent right after the packet loss was detected, which in turn depends on the number of packets acknowledged before the packet loss was detected.
For example, if the packet loss was detected after W/4 packets were acked, bytesInFlight decreases by 3W/4 full-sized packets (3W/4 - 1 acknowledged packets plus the 1 lost packet).
Since cwnd was set to W/2 full-sized packets, the sender is allowed to send 3W/4 - W/2 = W/4 packets.
The acks for these W/4 packets increase partialBytesAcked to W/4 full-sized packets.
Thus, a delay in the cwnd increase does not happen in a cycle only if the packet loss before the start of the cycle was detected after at most W/4 acked packets.
With random packet loss, this happens in 1/4 of all cases; in the other 3/4 of cases we will see a delay in the cwnd increase of min(RTT, ackDelayTimer) seconds.
Following this theory, the value of b in the equation shown at the end of the last section increases from 2 to 2 + 3/4 * min(1, ackDelayTimer/RTT) when the receiver acks only every other packet.
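Applying this to the simulation, where RTT < ackDelayTimer and thus min(1, ackDelayTimer/RTT) = 1, gives b = 2.75. Evaluating the formula from the previous section with this b (again only a numerical check, not simulation code):

```python
import math

MSS = 1252    # bytes
RTT = 0.002   # seconds
p   = 0.02    # packet loss rate
b   = 2 + 3 / 4 * min(1, 1)   # = 2.75, since ackDelayTimer > RTT here

# Tp = 3/4 * MSS/RTT * (sqrt(8/(3p) + b^2) - b), converted to bit/s
tp = 3 / 4 * (MSS * 8 / RTT) * (math.sqrt(8 / (3 * p) + b**2) - b)
print(round(tp / 1e6, 2))   # -> 34.25 (Mb/s)
```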
===== Results =====
With the values chosen for the simulation, which are MSS = 1252 bytes, RTT = 2 ms (omitting the transmission delay, which is negligible on the 100 Gb/s link), and p = 2 %, we get the following throughput results.
^ ^ ack every packet ^ ack every other packet ^
^ theory | 36.50 Mb/s | 34.25 Mb/s |
^ simulation with periodic loss | 35.77 Mb/s | 31.30 Mb/s |
^ simulation with random loss | 41.13 Mb/s | 36.40 Mb/s |
Remarkable is the difference in throughput of more than 2 Mb/s between ack every packet and ack every other packet.