TCP-study

TCP

TCP is a transport-layer protocol that provides a connection-oriented, reliable service to applications. It supports unicast only. Each endpoint of a TCP connection is a socket: the combination of an IP address and a port number. A connection is uniquely identified by its two end sockets. TCP connection establishment: the two end TCP modules allocate the resources required for the connection and exchange the parameters it will use, namely the maximum segment size (MSS) and the receive buffer size (advertised window, WIN), both given by the receiver side, and the initial sequence number (ISN), specified by the sender side.

Three-way Handshake

1. An end host initiates a TCP connection by sending a segment with the SYN flag set. The segment carries the host's ISN, n, in the sequence number field, an empty payload, the host's MSS, and its TCP receive window size.

2. The other end replies with a SYN segment carrying ACK=n+1, its own ISN, m, its MSS, and its own TCP receive window size.

3. The initiating host sends an acknowledgement: ACK=m+1, where m is the ISN chosen by the other end.
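The three steps above can be traced with a small model. This is only a sketch: the `Segment` type and `three_way_handshake` function are hypothetical names introduced for illustration, and MSS and window options are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str            # "SYN", "SYN+ACK" or "ACK"
    seq: int              # sequence number field
    ack: Optional[int]    # acknowledgement number, None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments exchanged when a connection is opened."""
    syn = Segment("SYN", seq=client_isn, ack=None)
    # The SYN consumes one sequence number, so the reply acknowledges n + 1.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=(syn.seq + 1) % SEQ_SPACE)
    # The initiator acknowledges the other end's SYN the same way: m + 1.
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=1000, server_isn=5000):
        print(segment)
```

With n = 1000 and m = 5000, the second segment carries ACK=1001 and the third carries ACK=5001.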

Four-way Handshake

A TCP connection is full duplex, so each end of the connection has to shut down its own one-way data flow. After termination, the end that performed the active close must stay in the TIME_WAIT state for twice the Maximum Segment Lifetime (2MSL) to wait for delayed segments. If an unrecoverable error is detected, either end can close the TCP connection by sending a RST segment.

TCP Half-Close: one end sends a segment with the FIN flag set, and the other end acknowledges the FIN segment. The data flow in the opposite direction still works until that end sends its own FIN.
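A half-close can be observed with the standard `socket` module, where `shutdown(SHUT_WR)` makes the stack send a FIN while the socket keeps receiving. The sketch below runs both ends over the loopback interface. The helper names `serve_once` and `half_close_demo` are hypothetical, and it assumes that loopback TCP is available.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Hypothetical helper: read until the peer's FIN arrives, then reply."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:          # empty read: the client's FIN has arrived
                break
            received += chunk
        # The server-to-client direction is still open, so the reply gets through.
        conn.sendall(b"got " + received)


def half_close_demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    client.sendall(b"hello")
    client.shutdown(socket.SHUT_WR)    # half-close: send FIN, keep receiving
    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:                  # the server has closed its direction too
            break
        reply += chunk
    client.close()
    server.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(half_close_demo())
```

The client still receives the server's reply after it has sent its FIN, which is the half-close behaviour described above.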

TCP Data Flow

TCP provides a byte-stream connection to the application layer. Both the sender module and the receiver module have a sending buffer and a receiving buffer.

TCP Error Control

TCP segments may get lost in the network because TCP uses the IP service, and IP is connectionless and unreliable. TCP provides error control for application data by retransmitting lost or corrupted segments.

Each data byte is assigned a unique sequence number. TCP uses cumulative acknowledgements (by default) to inform the sender of the last byte received correctly and in order. Error detection is performed in each layer of the TCP/IP stack by means of checksums, and corrupted packets are dropped.

If a segment is dropped, the receiver sends an acknowledgement for the first byte of that segment (the byte it is still expecting). A gap in the received sequence numbers indicates a transmission loss or out-of-order delivery, and an acknowledgement for the first byte in the gap may be sent to the sender.
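The acknowledgement number a receiver would send can be computed from the segments it holds. This is a minimal sketch: `cumulative_ack` is a hypothetical helper, segments are modelled as `(first_byte, length)` pairs, and sequence number wraparound is ignored.

```python
from typing import Iterable, Tuple


def cumulative_ack(next_expected: int, received: Iterable[Tuple[int, int]]) -> int:
    """Return the ACK number (next byte expected) after these segments arrived.

    Each segment is a (first_byte, length) pair. Arrival order does not matter.
    """
    for first_byte, length in sorted(received):
        if first_byte > next_expected:
            break  # gap: the ACK stays at the first missing byte
        next_expected = max(next_expected, first_byte + length)
    return next_expected


if __name__ == "__main__":
    # Bytes 1-200 and 301-400 arrived; the segment starting at byte 201 was lost.
    print(cumulative_ack(1, [(1, 100), (101, 100), (301, 100)]))
```

In the example the receiver keeps acknowledging byte 201 until the missing segment arrives, after which the ACK jumps to 401.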

Selective acknowledgement (SACK) lets the receiver report the non-contiguous blocks of data it has received, so that the sender can identify multiple lost segments.
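As an illustration, the blocks a receiver could report can be derived by merging the segments that lie beyond the cumulative ACK. The sketch assumes each block is a `(left_edge, right_edge)` pair where the right edge is the byte after the last one received. `sack_blocks` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, List, Tuple


def sack_blocks(ack: int, received: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge (first_byte, length) segments above `ack` into contiguous blocks.

    `ack` is assumed to be the current cumulative ACK (the first missing byte).
    """
    blocks: List[List[int]] = []
    for first_byte, length in sorted(received):
        left, right = first_byte, first_byte + length
        if right <= ack:
            continue  # already covered by the cumulative ACK
        left = max(left, ack)
        if blocks and left <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], right)  # extend the previous block
        else:
            blocks.append([left, right])               # start a new block
    return [(left, right) for left, right in blocks]


if __name__ == "__main__":
    segments = [(1, 100), (101, 100), (301, 100), (401, 100), (601, 100)]
    print(sack_blocks(201, segments))
```

For this input the blocks are (301, 501) and (601, 701), which tells the sender that bytes 201-300 and 501-600 are missing.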

TCP Retransmission

A retransmission timer is started by the sender when it sends out a TCP segment. If no ACK has been received when the timer expires, the segment is retransmitted.

A value that is too small causes frequent timeouts and unnecessary retransmissions. A value that is too large causes a long delay when a segment gets lost. The value of the retransmission timer should be larger than, but of the same magnitude as, the measured Round Trip Time (RTT). TCP continuously measures the RTT and dynamically updates the retransmission timer value, defined as the Retransmission TimeOut (RTO).

RTT: the time difference between sending a target segment and receiving the ACK for that segment. The minimum RTO is 1 second.

RTT measurement is performed at both ends of a TCP connection, and each end may run the measurement on only one target segment at any time. According to Karn’s algorithm, the smoothed RTT (RTTs) and the RTT deviation (RTTd) are not updated from a retransmitted segment, because its ACK cannot be matched to a particular transmission. 500 ms is the smallest unit of both RTT measurements and RTO timeout counts.
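The text does not give the update formulas for RTTs and RTTd. The sketch below assumes the smoothing rules of RFC 6298 (gains of 1/8 and 1/4, RTO = RTTs + 4 RTTd), applies the 1 second minimum, and skips samples from retransmitted segments as Karn's algorithm requires. The class name `RtoEstimator` is hypothetical and the 500 ms clock granularity is not modelled.

```python
class RtoEstimator:
    ALPHA = 1 / 8    # gain for the smoothed RTT (assumed, from RFC 6298)
    BETA = 1 / 4     # gain for the RTT deviation (assumed, from RFC 6298)
    MIN_RTO = 1.0    # seconds, the minimum RTO stated in the text

    def __init__(self) -> None:
        self.rtts = None          # smoothed RTT, unset until the first sample
        self.rttd = None          # smoothed RTT deviation
        self.rto = self.MIN_RTO   # assumed starting value before any sample

    def on_measurement(self, rtt: float, retransmitted: bool = False) -> float:
        """Feed one RTT sample in seconds and return the current RTO."""
        if retransmitted:
            return self.rto       # Karn's algorithm: ignore ambiguous samples
        if self.rtts is None:
            self.rtts = rtt
            self.rttd = rtt / 2
        else:
            # The deviation is updated first, using the old smoothed RTT.
            self.rttd = (1 - self.BETA) * self.rttd + self.BETA * abs(self.rtts - rtt)
            self.rtts = (1 - self.ALPHA) * self.rtts + self.ALPHA * rtt
        self.rto = max(self.MIN_RTO, self.rtts + 4 * self.rttd)
        return self.rto


if __name__ == "__main__":
    estimator = RtoEstimator()
    for sample in (0.1, 2.0):
        print(estimator.on_measurement(sample))
```

A first sample of 0.1 s gives an RTO of 1 s because of the minimum. A second sample of 2.0 s raises the deviation sharply, so the RTO grows to about 2.39 s.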

RTO Exponential Backoff: this algorithm is used to update the RTO when the retransmission timer expires for a retransmitted segment (no RTT measurement is available). The RTO is doubled each time until it reaches 64 seconds.
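The doubling rule can be written out as a short sketch. `backoff_schedule` is a hypothetical helper that lists the timer value used for each successive attempt.

```python
from typing import List


def backoff_schedule(initial_rto: float, attempts: int, max_rto: float = 64.0) -> List[float]:
    """Return the RTO used for each attempt, doubling up to max_rto seconds."""
    schedule = []
    rto = initial_rto
    for _ in range(attempts):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)  # double, but never exceed the cap
    return schedule


if __name__ == "__main__":
    print(backoff_schedule(1.0, 9))
```

Starting from 1 second, the timer values are 1, 2, 4, 8, 16, 32 and 64 seconds, and every further attempt stays at 64 seconds.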

TCP Interactive Data Flow

Examples: Telnet, ssh. The server echoes each key back to the user and piggybacks the ACK for the keystroke on the echo. Two mechanisms reduce the number of small segments and improve efficiency: Delayed Acknowledgement and the Nagle algorithm.

Delayed Acknowledgement: a timer goes off every K ms. If there is new data to send during this period, the ACK is piggybacked on the data segment. Otherwise, an ACK segment without data is sent when the timer goes off.

Nagle algorithm: each TCP connection can have only one small segment outstanding, and all subsequent bytes are buffered until an ACK for the first segment is received. These buffered bytes are then sent in a single segment. This is more efficient than sending multiple segments, at the cost of increased delay for the user.
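The buffering rule can be illustrated with a simplified model. The `NagleSender` class is hypothetical. It assumes that every ACK covers the outstanding small segment and it ignores window limits, so it is a sketch of the idea rather than of a real TCP stack.

```python
from typing import List


class NagleSender:
    def __init__(self, mss: int = 1460) -> None:
        self.mss = mss
        self.buffer = b""                 # bytes written but not yet sent
        self.small_outstanding = False    # is a small segment still unacknowledged?
        self.sent: List[bytes] = []       # segments handed to the network

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        # Assumption: the ACK acknowledges the outstanding small segment.
        self.small_outstanding = False
        self._try_send()

    def _try_send(self) -> None:
        # Full-sized segments are not delayed.
        while len(self.buffer) >= self.mss:
            self.sent.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A small segment goes out only if none is outstanding.
        if self.buffer and not self.small_outstanding:
            self.sent.append(self.buffer)
            self.buffer = b""
            self.small_outstanding = True


if __name__ == "__main__":
    sender = NagleSender()
    for key in b"hello":
        sender.write(bytes([key]))
    print(sender.sent)    # only the first keystroke has been sent
    sender.on_ack()
    print(sender.sent)    # the remaining keystrokes went out as one segment
```

Typing "hello" before the first ACK arrives produces two segments, `b"h"` and `b"ello"`, instead of five one-byte segments.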

TCP Congestion Control and Flow Control

TCP supports bulk data flows, where a large number of bytes are sent through the TCP connection. Applications: email, FTP, HTTP. The source always wants to increase its sending rate to achieve high throughput, but the rate should be bounded by the maximum rate that keeps the packet loss rate low, that is, a rate that causes neither network congestion nor receiver buffer overflow.

Congestion control and flow control are used to cope with these two problems. TCP uses slow start and congestion avoidance to react to congestion in routers, and sliding window flow control to avoid receiver buffer overflow.

TCP Sliding Window Flow Control

The receiver advertises the maximum amount of data it can receive (the Advertised Window, or awnd), and the sender is not allowed to have more unacknowledged data outstanding than the awnd. However, congestion can still occur in the network, because the awnd only reflects the receiver's buffer.
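A sender that respects the advertised window can be sketched as follows. `SlidingWindowSender` is a hypothetical class that counts bytes from 0, ignores sequence number wraparound, and models only the awnd limit.

```python
class SlidingWindowSender:
    def __init__(self, awnd: int) -> None:
        self.awnd = awnd        # latest window advertised by the receiver
        self.last_acked = 0     # every byte below this has been acknowledged
        self.next_seq = 0       # next byte to be sent

    def sendable(self) -> int:
        """Bytes that may still be sent without exceeding the window."""
        return max(0, self.last_acked + self.awnd - self.next_seq)

    def send(self, nbytes: int) -> int:
        """Send up to nbytes and return how many bytes the window allowed."""
        allowed = min(nbytes, self.sendable())
        self.next_seq += allowed
        return allowed

    def on_ack(self, ack: int, awnd: int) -> None:
        """An ACK slides the left edge forward and may change the window size."""
        self.last_acked = max(self.last_acked, ack)
        self.awnd = awnd


if __name__ == "__main__":
    sender = SlidingWindowSender(awnd=4096)
    print(sender.send(6000))    # only 4096 bytes fit in the window
    sender.on_ack(ack=2048, awnd=4096)
    print(sender.sendable())    # the window slid forward by 2048 bytes
```

The sender stalls once the window is full and resumes only when an ACK moves the left edge of the window.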

TCP Congestion Control (Avoidance)

TCP needs to adjust its sending rate in reaction to the rate fluctuations of other flows sharing the same buffer. A new TCP connection should increase its rate as quickly as possible to take all the available bandwidth. At the same time, TCP should slow down its rate increase when the sending rate is higher than some threshold. The sender can infer congestion when a retransmission timer expires. The receiver reports congestion implicitly by sending duplicate ACKs.

The receiver provides two parameters that influence the sender's transmission rate: awnd and MSS. The sender maintains two variables for congestion control: the congestion window size (cwnd), which upper-bounds the sending rate, and the slow start threshold (ssthresh). The sender's sliding window size is min(cwnd, awnd).

# on each new ACK
if cwnd <= ssthresh:
    # slow start: grow by one segment (MSS) per ACK
    cwnd = cwnd + segsize
else:
    # congestion avoidance: grow by roughly one segment per round trip
    cwnd = cwnd + segsize * segsize / cwnd + segsize / 8

# when congestion occurs (indicated by a retransmission timeout), reset
ssthresh = max(2 * segsize, min(cwnd, awnd) / 2)
cwnd = segsize  # back into the slow start phase

# Set cwnd = 1 segsize (1 MSS bytes) whenever starting traffic on a new connection, or whenever increasing traffic after congestion was experienced
# ssthresh takes an initial value of 65535 bytes, and changes only when congestion occurs
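The update rules above can be wrapped in two small functions to trace how cwnd evolves. This is a sketch in which `on_ack` and `on_timeout` are hypothetical names and all sizes are in bytes.

```python
from typing import Tuple


def on_ack(cwnd: float, ssthresh: float, segsize: int) -> float:
    """Return the new cwnd after one new ACK."""
    if cwnd <= ssthresh:
        return cwnd + segsize  # slow start
    # congestion avoidance, including the segsize / 8 term used in the text
    return cwnd + segsize * segsize / cwnd + segsize / 8


def on_timeout(cwnd: float, awnd: float, segsize: int) -> Tuple[float, float]:
    """Return (new cwnd, new ssthresh) after a retransmission timeout."""
    ssthresh = max(2 * segsize, min(cwnd, awnd) / 2)
    return segsize, ssthresh


if __name__ == "__main__":
    segsize, cwnd, ssthresh = 1000, 1000, 4000
    for _ in range(5):
        cwnd = on_ack(cwnd, ssthresh, segsize)
        print(cwnd)
    print(on_timeout(cwnd=8000, awnd=65535, segsize=segsize))
```

With a 1000-byte segment and ssthresh = 4000, cwnd grows by a full segment per ACK up to 5000 and then by only 325 bytes on the next ACK. A timeout at cwnd = 8000 resets cwnd to 1000 and sets ssthresh to 4000.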

Fast Retransmit

After receiving three duplicate ACKs, the sender retransmits the missing segment without waiting for the retransmission timer to expire. After the retransmission, congestion avoidance is performed.
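The trigger can be sketched as a counter of repeated acknowledgement numbers. `DupAckDetector` is a hypothetical class. It reports the sequence number to retransmit on the third duplicate and stays quiet on further duplicates.

```python
from typing import Optional


class DupAckDetector:
    THRESHOLD = 3  # duplicate ACKs needed to trigger fast retransmit

    def __init__(self) -> None:
        self.last_ack: Optional[int] = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> Optional[int]:
        """Return the sequence number to retransmit, or None if nothing to do."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.THRESHOLD:
                return ack          # the receiver is still waiting for this byte
        else:
            self.last_ack = ack     # a new ACK resets the counter
            self.dup_count = 0
        return None


if __name__ == "__main__":
    detector = DupAckDetector()
    for ack in (101, 201, 201, 201, 201):
        print(ack, detector.on_ack(ack))
```

The first ACK for 201 is the original. The three that follow are duplicates, and the third one triggers the retransmission of the segment starting at byte 201.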

Fast Recovery

Used when three or more duplicate ACKs are received.

1) After the third duplicate ACK is received: ssthresh = max(2 segsize, min(cwnd, awnd)/2), retransmit the missing segment, and then cwnd = ssthresh + 3 segsize.

2) For each additional duplicate ACK received: cwnd = cwnd + segsize. Transmit a segment if allowed by the window size.

3) When the ACK for the retransmitted segment arrives (new ACK): cwnd = ssthresh + segsize.
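The three steps can be written as one function each. This is a sketch with hypothetical function names that follows the steps exactly as listed here. Note that RFC 5681 sets cwnd to ssthresh in step 3 instead.

```python
from typing import Tuple


def on_third_dup_ack(cwnd: float, awnd: float, segsize: int) -> Tuple[float, float]:
    """Step 1: return (new cwnd, new ssthresh). The caller retransmits the segment."""
    ssthresh = max(2 * segsize, min(cwnd, awnd) / 2)
    return ssthresh + 3 * segsize, ssthresh


def on_additional_dup_ack(cwnd: float, segsize: int) -> float:
    """Step 2: inflate cwnd by one segment for each further duplicate ACK."""
    return cwnd + segsize


def on_new_ack(ssthresh: float, segsize: int) -> float:
    """Step 3: leave fast recovery when the retransmitted segment is acknowledged."""
    return ssthresh + segsize


if __name__ == "__main__":
    segsize = 1000
    cwnd, ssthresh = on_third_dup_ack(cwnd=10000, awnd=65535, segsize=segsize)
    print(cwnd, ssthresh)
    cwnd = on_additional_dup_ack(cwnd, segsize)
    print(cwnd)
    cwnd = on_new_ack(ssthresh, segsize)
    print(cwnd)
```

Starting from cwnd = 10000 bytes, the third duplicate ACK gives ssthresh = 5000 and cwnd = 8000, one more duplicate raises cwnd to 9000, and the new ACK brings it down to 6000.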