Delayed ACK and TCP Performance
Warning
Migrated from: https://cwiki.apache.org/confluence/display/NUTTX/Delayed+ACK+and+TCP+Performance
uIP and NuttX
The heart of the NuttX IP stack derived from Adam Dunkel’s tiny uIP stack back at version 1.0. The NuttX TCP/IP stack contains the uIP TCP state machine and some uIP “ways of doing things,” but otherwise, there is now little in common between these two designs.
NOTE: uIP is also built into Adam Dunkel’s Contiki operating system.
uIP, Delayed ACKs, and Split Packets
In uIP, TCP packets are sent and ACK’ed one at a time. That is, after one TCP packet is sent, the next packet cannot be sent until the previous packet has been ACKed by the receiving side. The TCP protocol, of course, supports sending multiple packets which can be ACKed be the receiving time asynchronously. This one-packet-at-a-time logic is a simplification in the uIP design; because of this, uIP needs only a single packet buffer any you can use uIP in even the tiniest environments. This is a good thing for the objectives of uIP.
Improvements in packet buffering is the essential improvement that you get if upgrade from Adam Dunkel’s uIP to his lwIP stack. The price that you pay is in memory usage.
This one-at-a-time packet transfer does create a performance problem for uIP: RFC 1122 states that a host may delay ACKing a packet for up to 500ms but must respond with an ACK to every second segment. In the baseline uIP, the effectively adds a one half second delay between the transfer of every packet to a recipient that employs this delayed ACK policy!
uIP has an option to work around this: It has logic that can be enable to split each packet into half, sending half as much data in each packet. Sending more, smaller packets does not sound like a performance improvement. This tricks the recipient that follows RFC 1122 into receiving the two, smaller back-to-back packets and ACKing the second immediately. References: uip-split.c and uip-split.h.
The NuttX TCP/IP Stack and Delay ACKs
The NuttX, low-level TCP/IP stack does not have the limitations of the uIP TCP/IP stack. It can send numerous TCP/IP packets regardless of whether they have been ACKed or not. That is because in NuttX, the accounting for which packets have been ACKed and which have not has been moved to a higher level in the architecture.
NuttX includes a standard, BSD socket interface on top
of the low-level TCP/IP stack.
It is in this higer-level, socket layer where the ACK
accounting is done, specifically in the function
send().
If you send a large, multi-packet buffer via send()
,
it will be broken up into individual packets and each
packet will be sent as quickly as possible, with no
concern for whether the previous packet has been ACKed or not.
However, the NuttX send()
function will not return to
the caller until the final packet has been ACKed.
It does this to assure that the callers data was sent
successfully (or not).
This behavior means that if an odd number of packets
were sent, there could still be a delay after the final
packet before send()
receives the ACK and returns.
So the NuttX approach is similar to the uIP way of doing
things, but does add one more buffer, the user provided
buffer to send()
, that can be used to improve TCP/IP
performance (of course, this user provided buffer is
also required by in order to be compliant with send()
””
specification.
The NuttX Split Packet Configuration
But what happens if all of the user buffer is smaller than
the MSS of one TCP packet?
Suppose the MTU is 1500 and the user I/O buffer is only
512 bytes?
In this case, send()
performance degenerates to the same
behavior as uIP:
An ACK is required for each packet before send()
can
return and before send()
can be called again to send
the next packet.
And the fix? A fix has recently been contributed by
Yan T that works in a similar way to uIP split
packet logic:
In send()
, the logic normally tries to send a full packet
of data each time it has the opportunity to do so.
However, if the configuration option CONFIG_NET_TCP_SPLIT=y
is defined, the behavior of send()
will change in the
following way:
send()
will keep track of even and odd packets; even packets being those that we do not expect to be ACKed and odd packets being the those that we do expect to be ACKed.send()
will then reduce the size of even packets as necessary to assure that an even number of packets is always sent. Every call to send will result in an even number of packets being sent.
This clever solution tricks the RFC 1122 recipient in the
same way that uIP split logic does.
So if you are working with hosts the following the RFC 1122
ACKing behavior and you have MSS sizes that are larger that
the average size of the user buffers, then your throughput
can probably be greatly improved by enabling CONFIG_NET_TCP_SPLIT=y
NOTE: NuttX is not an RFC 1122 recipient; NuttX will ACK every TCP/IP packet that it receives.
Write Buffering
The best technical solution to the delayed ACK problem
would be to support write buffering.
Write buffering is enabled with CONFIG_NET_TCP_WRITE_BUFFERS
.
If this option is selected, the NuttX networking layer will
pre-allocate several write buffers at system initialization
time. The sending a buffer of data then works like this:
send()
(1) obtains a pre-allocated write buffer from a free list, and then (2) simply copies the buffer of data that the user wishes to send into the allocated write buffer. If no write buffer is available,send()
would have to block waiting for free write buffer space.send()
then (3) adds the write buffer to a queue of outgoing data for a TCP socket. Each open TCP socket has to support such a queue.send()
could then (4) return success to the caller (even thought the transfer could still fail later).Logic outside of the
send()
implementation manages the actual transfer of data from the write buffer. When the Ethernet driver is able to send a packet on the TCP connection, this external logic (5) copies a packet of data from the write buffer so that the Ethernet driver can perform the transmission (a zero-copy implementation would be preferable). Note that the data has to remain in the write buffer for now; it may need to be re-transmitted.This external logic would also manage the receipt TCP ACKs. When TCP peer acknowledges the receipt of data, the acknowledged portion of the data can the (6) finally be deleted from the write buffer.
The following options configure TCP write buffering:
CONFIG_NET_TCP_WRITE_BUFSIZE
: The size of one TCP write buffer.CONFIG_NET_NTCP_WRITE_BUFFERS
: The number of TCP write buffers (may be zero to disable TCP/IP write buffering)
NuttX also supports TCP read-ahead buffering. This option
is enabled with CONFIG_NET_TCP_READAHEAD
. TCP read-ahead
buffer is necessary on TCP connections; otherwise data
received while there is no recv()
in place would be lost.
For consistency, it would be best if such a TCP write
buffer implementation worked in a manner similar to the
existing TCP read-ahead buffering.
The following lists the NuttX configuration options available to configure the TCP read-ahead buffering feature:
CONFIG_NET_TCP_READAHEAD_BUFSIZE
: The size of one TCP read-ahead buffer.CONFIG_NET_NTCP_READAHEAD_BUFFERS
: The number of TCP read-ahead buffers (may be zero to disable TCP/IP read-ahead buffering)
A future enhancement is to combine the TCP write buffer management logic and the TCP read-ahead buffer management so that one common pool of buffers can be used for both functions (this would probably also require additional logic to throttle read-buffering so that received messages do not consume all of the buffers).