libtcpcsm - A TCP Congestion State Machine

Author: Shane Alcock

Email: contact@wand.net.nz

--------------------------------------------------------------------------
Copyright (c) 2010-2012 The University of Waikato, Hamilton, New Zealand.
All rights reserved.

This code has been developed by the University of Waikato WAND research
group. For further information, please see http://www.wand.net.nz/.
--------------------------------------------------------------------------

The libtcpcsm software is designed to facilitate analysis of TCP congestion
behaviour, i.e. detecting loss or reordering events, tracking congestion
window size and understanding why TCP sends less data than expected, using
packet header traces as input. It is based on libtrace, a comprehensive and 
fast packet trace processing library which supports many common trace formats.

There are two main components to this software: libtcpcsm, a userspace library
that passively detects TCP congestion events in network traffic traces, and
the tools built using libtcpcsm.

The main difficulty in categorising congestion events is the variation in
TCP implementations across different operating systems, e.g. Windows XP 
triggers a fast retransmit after only two duplicate ACKs instead of three like
all other OSes. Previous solutions have either implemented a separate state 
machine for each OS, which is inefficient and difficult to maintain, or based
their decisions on IETF standards only, which leads to inaccurate results.
libtcpcsm accounts for OS variability within a single state machine,
providing the best of both worlds.

libtcpcsm also recognises and detects a variety of TCP features that were 
uncommon when previous tools were developed, such as DSACK and F-RTO.

libtcpcsm has been extensively validated using a series of controlled 
experiments against a range of TCP senders running different OS configurations. 
More details on the validation process can be found at 
http://www.wand.net.nz/~salcock/tcpcsm/valid.php

libtcpcsm and the associated tools are licensed under the GNU General Public
License (GPL) version 2. Please see the included file COPYING for details of 
this license.


Installation
============

Required libraries:
	libtrace - http://research.wand.net.nz/software/libtrace.php
	libflowmanager - http://research.wand.net.nz/software/libflowmanager.php
	libtcptools - http://research.wand.net.nz/software/libtcptools.php

Building the tcpcsm software can be done using the following commands:
	./configure
	make
	make install

By default, libtcpcsm and the tools will install to /usr/local - this can be
changed using the --prefix option with ./configure.


Tools
=====

There are currently three tools shipped with libtcpcsm. They are:

	tcpcsm - a simple congestion event reporter
	psh_analyser - reports PSH flight size, where a PSH flight is the
			amount of data sent between PSH flags
	flight_cwnd - reports flight sizes, which can be used as an
			approximation of the congestion window size

Both psh_analyser and flight_cwnd report the congestion events that are
identified by the standard tcpcsm tool in addition to flight sizes. The event
reporting can be disabled by running the tools with the -q option.

More details on the command-line arguments accepted by each tool can be obtained
by running the tool with the -h option.

Output
======

All events reported by libtcpcsm or an associated tool follow the same basic
format. The format consists of the following fields, separated by spaces, in
the order shown below:

	server ip
	client ip
	server port 
	client port
	flow id number
	direction
	event type
	unix timestamp

There may be more fields following the timestamp, depending on the event type.
Refer to the individual tool or event documentation for more information.

The server and client are determined purely by using the smallest port number
for the flow. This is very simplistic, but consistent. If the port numbers are
the same, the IP address is used as a tie-breaker.
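
The rule above can be sketched in Python as follows. This is only an
illustration, not the code used inside libtcpcsm. It assumes that the endpoint
with the smaller port number is the one labelled as the server, and that the
tie-break picks the numerically smaller IP address; the direction of the
tie-break is not stated in this README. The function name order_endpoints is
hypothetical.

```python
import ipaddress
from typing import Tuple

Endpoint = Tuple[str, int]  # (ip address, port)


def order_endpoints(a: Endpoint, b: Endpoint) -> Tuple[Endpoint, Endpoint]:
    """Return (server, client) for the two endpoints of a flow."""
    def key(ep: Endpoint):
        ip, port = ep
        # Port is compared first; the IP address only matters when the
        # ports are equal. ASSUMPTION: the smaller IP wins the tie-break.
        return (port, int(ipaddress.ip_address(ip)))

    server, client = sorted([a, b], key=key)
    return server, client


# Made-up endpoints using documentation address ranges.
server, client = order_endpoints(("198.51.100.7", 51234), ("192.0.2.1", 80))
print("server:", server, "client:", client)
```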

The flow id number is a unique number assigned to each flow when it first 
appears, allowing you to identify events for the same flow easily.

The direction describes which of the two halves of the connection the event
applies to. 0 is the server->client half, 1 is the client->server half.
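
As a starting point for your own scripts, the common fields described above
could be parsed along these lines. This is a minimal sketch: the Event type
and parse_event function are hypothetical names, and the sample line is made
up for illustration. It assumes that ports, flow id and direction are printed
as decimal integers and that the timestamp can be read as a floating point
number.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    server_ip: str
    client_ip: str
    server_port: int
    client_port: int
    flow_id: int
    direction: int      # 0 = server->client, 1 = client->server
    event_type: str
    timestamp: float
    extra: List[str]    # any event-specific fields after the timestamp


def parse_event(line: str) -> Event:
    """Split one space-separated event line into the eight common fields."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 fields, got %d" % len(fields))
    return Event(
        server_ip=fields[0],
        client_ip=fields[1],
        server_port=int(fields[2]),
        client_port=int(fields[3]),
        flow_id=int(fields[4]),
        direction=int(fields[5]),
        event_type=fields[6],
        timestamp=float(fields[7]),
        extra=fields[8:],
    )


# Hypothetical sample line, not taken from real tool output.
sample = "192.0.2.1 198.51.100.7 80 51234 12 0 RTO 1300000000.250000"
event = parse_event(sample)
print(event.event_type, event.flow_id, event.direction)
```

Keeping the trailing fields in a separate list means the same parser can be
used for every event type, with the event-specific fields interpreted later.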

Event types in libtcpcsm can be split into two categories: congestion-related
and non-congestion-related. Congestion-related events are events that may have
some effect on the congestion window, e.g. retransmits, end of loss recovery,
DSACK. Other events include uncommon, unexpected or possibly invalid TCP/IP 
behaviour that may be of interest, such as reordered segments, duplicate IP 
Ids and anomalous sequence numbering.

Congestion Event Types:

RTO		Retransmission due to the expiry of the retransmit timer.
FRETX		Fast retransmit, i.e. after 3 duplicate ACKs.
MS_FRETX	Fast retransmit after only 2 duplicate ACKs.
SACK_FRETX	Fast retransmit due to SACK information.
BAD_FRETX	Fast retransmit due to Reno behaviour (invalid in NewReno).
LOSS_REC	Retransmit during recovery from an RTO event.
FREC		Retransmit during recovery from a fast retransmit event.
UNEXP_FREC	Retransmit during fast recovery for an unexpected sequence 
		number.
FREC_OVER	Fast recovery has ended.
LOSS_END	Recovery from an RTO has ended.
RETX_OVERLAP	Retransmit overlaps with a previous retransmit.	
RETX_NEW	Retransmit also includes some new unseen data.
UNNEEDED	Retransmit was observed after recovery had ended, i.e. the
		retransmission was unnecessary.
FRTO_PROBE	Segment is an FRTO probe.
FRTO_INVALID	Segment is an invalid FRTO probe.
FRTO_NOTSPUR	FRTO has determined that the previous RTO was not spurious.
FRTO_SPUR	FRTO has determined that the previous RTO was spurious and the
		congestion window should be reverted.
DSACK		Duplicate SACK observed.
NOT_DSACK	SACK resembling DSACK was observed, but was not actually a 
		DSACK.
DSACK_REVERT	DSACK concluded that all retransmissions were spurious and the
		congestion window should be reverted.
	

Other Event Types:

REORDER		Segment arrived out of order, but was not a retransmit.
SEQ_CHANGE 	Sequence number was anomalous compared with the rest of the 
		TCP stream.
LINK_DUP	Packet is a link-layer duplicate of a previous packet, i.e. it
		has the same seqno and IP Id.
IPID_DUP	Packet has the same IP Id as a previous packet, but is not a
		link-layer duplicate.
BAD_SHAKE	Segment has been sent prior to successful completion of the TCP
		handshake.
UNEXP_SYN	A SYN has been unexpectedly observed in the middle of a
		functional TCP connection.
SYN_PAYLOAD	A payload-bearing SYN has been observed.
BAD_DSACK	DSACK information observed in a SACK block other than the first
		one, which violates the DSACK spec.
BAD_ZERO_WIN	An invalid zero window probe was observed - they must be one 
		byte only.
MISSING_DATA	Data has been acknowledged that libtcpcsm has never seen.
INCOMP_SACK	A SACK block had an option length less than 8 bytes.
RWIN_LIMITED	Connection became limited by the receive window.
RWIN_LIM_END	Connection is no longer limited by the receive window.
FLOW_END	Flow has concluded - also reports the total number of kilobytes
		and packets sent in that direction. 
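
When post-processing the output, it can be handy to have the two categories
above available as lookup tables. The sketch below simply transcribes the
event names from the two lists. It assumes that the event type field in the
output contains exactly these strings; the names CONGESTION_EVENTS,
OTHER_EVENTS and categorise are hypothetical and not part of libtcpcsm.

```python
# Event names transcribed from the "Congestion Event Types" list.
CONGESTION_EVENTS = frozenset({
    "RTO", "FRETX", "MS_FRETX", "SACK_FRETX", "BAD_FRETX", "LOSS_REC",
    "FREC", "UNEXP_FREC", "FREC_OVER", "LOSS_END", "RETX_OVERLAP",
    "RETX_NEW", "UNNEEDED", "FRTO_PROBE", "FRTO_INVALID", "FRTO_NOTSPUR",
    "FRTO_SPUR", "DSACK", "NOT_DSACK", "DSACK_REVERT",
})

# Event names transcribed from the "Other Event Types" list.
OTHER_EVENTS = frozenset({
    "REORDER", "SEQ_CHANGE", "LINK_DUP", "IPID_DUP", "BAD_SHAKE",
    "UNEXP_SYN", "SYN_PAYLOAD", "BAD_DSACK", "BAD_ZERO_WIN", "MISSING_DATA",
    "INCOMP_SACK", "RWIN_LIMITED", "RWIN_LIM_END", "FLOW_END",
})


def categorise(event_type: str) -> str:
    """Return 'congestion', 'other' or 'unknown' for an event type name."""
    if event_type in CONGESTION_EVENTS:
        return "congestion"
    if event_type in OTHER_EVENTS:
        return "other"
    return "unknown"


print(categorise("MS_FRETX"), categorise("REORDER"))
```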

The unix timestamp indicates when the reported event occurred in the packet
trace. Be careful with timing - packet traces are often captured in the middle
of the path between the endpoints. The timestamp for a retransmit is not the
same as the time that the RTO timer expired; it is the time that the
retransmitted packet was observed at the passive monitor.

Events are reported in the order that they are detected within libtcpcsm. 
Subsequent analysis of these events may require them to be sorted or split
based on timestamp, flow id or event type. Much of this can be done using
standard Linux utilities such as 'sort' and 'grep'. More detailed analysis
will require the development of your own scripts to parse the output file
appropriately. We typically use Python for this, but any scripting language
that can easily split and process lines of text will do the trick.
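
As an illustration of the kind of script meant here, the following sketch
groups events by flow id and sorts each flow's events by timestamp. It is not
shipped with libtcpcsm. The function name group_by_flow is hypothetical, the
sample lines are made up, and the field positions follow the order given
earlier in this section, assuming the flow id is an integer and the timestamp
can be read as a floating point number.

```python
from collections import defaultdict


def group_by_flow(lines):
    """Return {flow id: [(timestamp, event type), ...]} sorted by time."""
    flows = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        flow_id = int(fields[4])
        event_type = fields[6]
        timestamp = float(fields[7])
        flows[flow_id].append((timestamp, event_type))
    for events in flows.values():
        events.sort()  # tuples sort by timestamp first
    return dict(flows)


# Hypothetical sample lines, not taken from real tool output.
sample = [
    "192.0.2.1 198.51.100.7 80 51234 12 0 FRETX 1300000001.500000",
    "192.0.2.1 198.51.100.9 80 40000 13 0 RTO 1300000000.900000",
    "192.0.2.1 198.51.100.7 80 51234 12 0 RTO 1300000000.250000",
]
for flow_id, events in sorted(group_by_flow(sample).items()):
    print(flow_id, [etype for _, etype in events])
```

To process a real output file, the list of sample lines can be replaced with
an open file object, since iterating over a file yields one line at a time.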


