Throughout the history of the Internet, deployed network devices have played an important role in the design of network protocols. Multipath TCP is a concrete example, as it has to cope with middleboxes. Today, I ran into a situation that (slightly) affects the Multipath extension to QUIC.
While developing my MultipathTester application, I decided to observe the performance of multipath protocols using a dual-stacked WiFi access point providing both IPv4 and IPv6 addresses. Since I know this network has partly separate infrastructure for each IP version, it makes sense to create a path over each address. I thus launched my multipath tests, which first ran the Multipath TCP bulk transfer and then the Multipath QUIC one. For Multipath TCP, the experiment ran smoothly. The bulk download of a 10 MB file with Multipath QUIC, however, took surprisingly long. So long that, after waiting for nearly two minutes, I decided to stop the test. “Ok, this might be a random bug, that’s not a big deal”, I thought. I thus ran the same test again over the same WiFi network. And got the same result.
That was surprising, as my previous tests with similar setups led to comparable results for both protocols. Here, Multipath QUIC either takes minutes to complete a 10 MB file transfer or never completes it. What is more, single-path QUIC gives acceptable results, with a completion time close to that of Multipath TCP. Observing worse results with multiple paths than with a single one is not unusual, as shown by previous research on Multipath TCP. Naive packet scheduling decisions might slightly degrade multipath performance. But going from a completion time of ~5 seconds in single-path to more than 10 minutes in multipath is definitely not a scheduling issue.
I thus dug into the case and looked at what the server saw of this connection. The initial path is created over the IPv4 address. Once the handshake completes, the client creates paths over its IPv6 addresses. In the initial Multipath QUIC design, a path was created as soon as a packet was sent or received with the corresponding Path ID. Those packets are correctly received by the server, which acknowledges the creation of the additional paths and starts using them. Strangely, the client never acknowledges the data sent over the additional IPv6 paths; it only sends flow control frames on them. This situation confuses the server: the packets it sends are never acknowledged, yet it keeps receiving packets on that path, which prevents it from considering the path as failed. With the OLIA congestion control scheme, the sending congestion window becomes very small and the server keeps trying to retransmit packets on the IPv6 paths. This leads to a livelock that can only be escaped if all packets are retransmitted on IPv4 and no more packets are seen on IPv6.
The culprit is actually the network to which the WiFi access point used by the client is attached. A client located inside the network can send UDP packets from its IPv6 address to a server outside the network on port 443, so the server receives the packets sent by the client. However, the packets sent back by that server are never seen by the client, as the network drops incoming UDP packets. This issue only arises on the IPv6 side of the network. This means that so far, in that network, it is not possible to establish QUIC connections between IPv6 hosts.
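The one-way behaviour described above can be checked with a simple round-trip probe. The following is a minimal sketch, not the code used in MultipathTester: the function name `udp_round_trip` and its parameters are hypothetical, and it assumes the remote end echoes back any datagram it receives. It sends a random payload and reports whether the same bytes come back before a timeout.

```python
import os
import socket


def udp_round_trip(server_addr, timeout=1.0, family=socket.AF_INET):
    """Return True if a random probe sent to server_addr is echoed back.

    Hypothetical helper: it assumes the server echoes UDP datagrams.
    Pass family=socket.AF_INET6 to probe over IPv6.
    """
    probe = os.urandom(8)  # random payload, so a stray datagram cannot match
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(probe, server_addr)
            data, _ = sock.recvfrom(2048)
        except OSError:
            # Covers the receive timeout as well as send errors
            # (for example, no route for this address family).
            return False
        return data == probe
```

In a network like the one described here, such a probe would be expected to succeed over IPv4 and to time out over IPv6, even though the server does receive the IPv6 datagram.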
Therefore, before considering a path as usable, hosts must perform symmetrical address validation, as now described in the latest IETF QUIC draft. This is done by sending specific data over the path to be tested. If the correct packet/frame acknowledging that data is received, the path is considered usable and can carry useful frames. The current Multipath QUIC draft includes this requirement. Note that the case described in this post is just one more argument in favor of address validation, as there are numerous other good reasons to perform it (security, mitigating amplification attacks,…).
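The validation logic can be sketched as a small state machine. This is an illustration under assumptions, not the Multipath QUIC implementation: the class `PathValidator`, its method names and the `max_probes` limit are hypothetical, and the 8-byte random challenge is loosely modelled on the challenge/response exchange of IETF QUIC. For simplicity the same challenge is resent on each retry. A path only becomes usable once the peer has echoed the challenge back on it.

```python
import hmac
import os

PENDING, VALIDATED, FAILED = "pending", "validated", "failed"


class PathValidator:
    """Tracks which paths have echoed back our challenge (hypothetical helper)."""

    def __init__(self, max_probes=3):
        self.max_probes = max_probes
        self._paths = {}  # path_id -> {"state", "challenge", "probes"}

    def start(self, path_id):
        """Begin validating a path; returns the challenge bytes to send on it."""
        challenge = os.urandom(8)
        self._paths[path_id] = {
            "state": PENDING,
            "challenge": challenge,
            "probes": 1,
        }
        return challenge

    def on_response(self, path_id, payload):
        """Handle a response received on path_id; True if the path is usable."""
        path = self._paths.get(path_id)
        if path is not None and path["state"] == PENDING:
            if hmac.compare_digest(payload, path["challenge"]):
                path["state"] = VALIDATED
        return self.usable(path_id)

    def on_probe_timeout(self, path_id):
        """Handle a lost probe; returns the challenge to resend, or None."""
        path = self._paths.get(path_id)
        if path is None or path["state"] != PENDING:
            return None
        if path["probes"] >= self.max_probes:
            path["state"] = FAILED  # give up: never schedule data here
            return None
        path["probes"] += 1
        return path["challenge"]

    def usable(self, path_id):
        """Only validated paths may carry useful frames."""
        path = self._paths.get(path_id)
        return path is not None and path["state"] == VALIDATED
```

With such a check in place, the IPv6 paths of the network described above would end up in the failed state after a few unanswered probes, and the scheduler would simply never place data on them.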
Now that validation is performed on the server side, I get similar results with both Multipath TCP and Multipath QUIC in the same network. Hooray! :-)