
Step 1 - design our application

Our application is going to be very simple. It will read a pcap capture file that contains some fragmented Ip4 packets, reassemble those fragments, and create a new packet made up of just the reassembled data. We will drop the datalink headers and simply insert a new Ip4 header in front of all the fragments, so that our packet's DLT will be Ip4 (the DLT is the first header in the packet).

Here is what our new packet will look like in memory:

+------------+--------+--------+
| Ip4 header | frag 1 | frag 2 |
+------------+--------+--------+

We need to handle the incoming stream of packets, so the first thing we need to set up is a packet handler that will receive packets from libpcap. We are not going to be concerned with multi-threading issues in this tutorial, so to receive packets, our main application class will simply implement the PcapPacketHandler interface. Once we have a packet, we need to check whether it is an Ip4 packet and whether it is fragmented.
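Whether a packet is a fragment can be read from the Ip4 header's 16-bit flags/fragment-offset field (bytes 6 and 7 of the header, per RFC 791). A packet is a fragment if its MORE_FRAGMENTS bit (0x2000) is set or its 13-bit fragment offset is nonzero. The offset is counted in 8-byte units. Below is a minimal stdlib-only sketch that works on the raw field value. The class and method names are hypothetical and are not part of jNetPcap; in the tutorial's code, the values would come from the decoded Ip4 header.

```java
/** Hypothetical helper operating on the raw 16-bit flags/fragment-offset field. */
final class FragmentCheck {
    static final int FLAG_MORE_FRAGMENTS = 0x2000; // MF bit
    static final int OFFSET_MASK = 0x1FFF;         // 13-bit offset, in 8-byte units

    private FragmentCheck() {}

    /** True if this packet is part of a fragmented datagram: MF set or nonzero offset. */
    static boolean isFragment(int flagsAndOffset) {
        return (flagsAndOffset & FLAG_MORE_FRAGMENTS) != 0
                || (flagsAndOffset & OFFSET_MASK) != 0;
    }

    /** Fragment offset converted from 8-byte units to bytes. */
    static int offsetBytes(int flagsAndOffset) {
        return (flagsAndOffset & OFFSET_MASK) * 8;
    }
}
```

A packet with only the DF bit (0x4000) set, or with the field all zero, is not a fragment. The last fragment has MF clear but a nonzero offset, which is why both conditions are checked.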

We are going to put every Ip4 packet, fragmented or not, into a reassembly buffer. Each IP datagram gets its own buffer.

The Ip4 flags give us a clue about when the datagram is complete, because the MORE_FRAGMENTS flag is cleared on the last fragment. However, we can't always rely on that flag. Fragments can arrive out of sequence, or be dropped along the way and never arrive. So we are also going to keep track of how many bytes we have reassembled. When that total matches the length of the entire unfragmented datagram, we know we have received all fragments and we are done.

For cases where a fragment is dropped and never arrives, we will also implement a simple timeout mechanism that times out each reassembly buffer after a certain amount of time.

Here is some pseudo code that our application is going to implement:

loop {
  Receive packet from libpcap;
  if packet is Ip4 packet then
    get or create reassembly buffer and store in a map;

    calculate offset into the buffer and add fragment
  
    if the buffer is complete then
      remove buffer from map;
      dispatch buffer to user's handler;
    endif

  endif

  timeout buffer entries;
}

User handler {
  receive reassembly buffers;
  create a new IP only packet;
  scan the packet;
  call packet.toString() to get pretty output;
}
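The loop above can be sketched in plain Java. This sketch keeps only the bookkeeping and leaves out the packet data. Several things here are assumptions, not jNetPcap API: the classes `Frag`, `Pending` and `ReassemblyLoop`, the millisecond clock passed in as `now`, and the `Consumer` standing in for the user's handler. A `HashMap` holds the buffers by key, and a `PriorityQueue` ordered by deadline handles timeouts. Timed-out buffers are also handed to the user handler, which reports them as incomplete.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for one decoded Ip4 packet (fragment or not). */
final class Frag {
    final int key;        // datagram key (hash of id/source/destination/protocol)
    final int offset;     // fragment offset in bytes
    final int length;     // payload length in bytes
    final boolean more;   // MORE_FRAGMENTS flag
    Frag(int key, int offset, int length, boolean more) {
        this.key = key; this.offset = offset; this.length = length; this.more = more;
    }
}

/** Bookkeeping only; a real reassembly buffer would also hold the bytes. */
final class Pending {
    final int key;
    final long deadline;
    int assembled;        // payload bytes received so far
    int total = -1;       // unknown until the last fragment arrives
    Pending(int key, long deadline) { this.key = key; this.deadline = deadline; }
    boolean isComplete() { return total >= 0 && assembled == total; }
}

final class ReassemblyLoop {
    private final Map<Integer, Pending> buffers = new HashMap<>();
    private final PriorityQueue<Pending> timeouts =
            new PriorityQueue<>(Comparator.comparingLong((Pending p) -> p.deadline));
    private final long timeoutMillis;
    private final Consumer<Pending> userHandler;

    ReassemblyLoop(long timeoutMillis, Consumer<Pending> userHandler) {
        this.timeoutMillis = timeoutMillis;
        this.userHandler = userHandler;
    }

    /** One loop iteration for an Ip4 packet received at time 'now'. */
    void onIp4Packet(Frag f, long now) {
        // Get or create the reassembly buffer and store it in the map.
        Pending p = buffers.get(f.key);
        if (p == null) {
            p = new Pending(f.key, now + timeoutMillis);
            buffers.put(f.key, p);
            timeouts.add(p);
        }
        // "Add" the fragment: count its bytes; the last one fixes the total.
        p.assembled += f.length;
        if (!f.more) {
            p.total = f.offset + f.length;
        }
        // If complete: remove from map (and queue), dispatch to the user.
        if (p.isComplete()) {
            buffers.remove(f.key);
            timeouts.remove(p);
            userHandler.accept(p);
        }
        expire(now);
    }

    /** Time out buffer entries: dispatch every buffer whose deadline has passed. */
    void expire(long now) {
        while (!timeouts.isEmpty() && timeouts.peek().deadline <= now) {
            Pending p = timeouts.poll();
            buffers.remove(p.key, p);
            userHandler.accept(p); // incomplete; the handler reports the error
        }
    }
}
```

An unfragmented packet (MF clear, offset 0) completes on arrival, so it flows through the same path as fragmented ones. `PriorityQueue.remove(Object)` is a linear scan. That cost is acceptable for a small number of pending datagrams.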

Reassembly buffer

This is a very important piece of our application, so we need to plan it out in detail. We call this buffer IpReassemblyBuffer, and it extends JBuffer.

We are going to allocate a large JBuffer that will hold our Ip4 header and all the fragments combined, like so:

+------------+--------+--------+
| Ip4 header | frag 1 | frag 2 |
+------------+--------+--------+

The buffer is also going to keep track of timeouts. We set a timestamp at which the buffer officially times out, and we implement a simple isTimedout():boolean method to check for that condition. The method simply compares the timeout timestamp with the current time and returns true if the buffer is past its due date.

The buffer also needs to keep track of the number of bytes already assembled and the total length of the IP datagram. When the two are equal, the buffer is complete and we can dispatch it to the user. We will implement a boolean method, isComplete(), that checks for this condition.
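The timeout and completeness bookkeeping described in the two paragraphs above can be sketched on its own, separate from the JBuffer memory. The class name `ReassemblyBookkeeping` and its `fragmentAdded` method are hypothetical. Lengths here count payload bytes only, excluding the inserted Ip4 header. A second `isTimedout(long)` overload takes the clock value as a parameter so the check can be tested.

```java
/**
 * Hypothetical bookkeeping half of an IpReassemblyBuffer-like class.
 * The real class extends jNetPcap's JBuffer; that part is omitted here.
 */
final class ReassemblyBookkeeping {
    private final long timeoutAt;     // absolute deadline, epoch milliseconds
    private int bytesAssembled;       // payload bytes received so far
    private int datagramLength = -1;  // full payload length; -1 until known

    ReassemblyBookkeeping(long now, long timeoutMillis) {
        this.timeoutAt = now + timeoutMillis;
    }

    /** Compares the deadline with the current time. */
    boolean isTimedout() {
        return isTimedout(System.currentTimeMillis());
    }

    /** Same check against an explicit clock value. */
    boolean isTimedout(long now) {
        return now >= timeoutAt;
    }

    /** Records a fragment; only the last fragment reveals the total length. */
    void fragmentAdded(int offsetBytes, int length, boolean lastFragment) {
        bytesAssembled += length;
        if (lastFragment) {
            datagramLength = offsetBytes + length;
        }
    }

    /** Complete once the assembled byte count equals the datagram length. */
    boolean isComplete() {
        return datagramLength >= 0 && bytesAssembled == datagramLength;
    }
}
```

Counting bytes this way assumes that no fragment is received twice and that fragments do not overlap. A duplicate would inflate the count. A stricter buffer would track which byte ranges have been filled.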

To keep track of all the buffers, we are going to use a JRE Map keyed by a 32-bit int hash. We generate the hash from the fragment's Ip4 header fields Ip4.id(), Ip4.source(), Ip4.destination() and the protocol field (Ip4.type()), like so:
int hash = (id() << 16) ^ source() ^ destination() ^ type();
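Here is the same expression as a standalone method, assuming each field has already been read as an int. The addresses are 32-bit ints, the id is 16 bits and the protocol is 8 bits. The class name `FragmentKey` is hypothetical. The expression squeezes 72 bits of header fields into 32, so two different datagrams can in principle produce the same key. A more defensive map would compare the four fields again on a hit.

```java
/** Hypothetical helper mirroring the tutorial's hash expression. */
final class FragmentKey {
    private FragmentKey() {}

    /**
     * @param id          16-bit IP identification field (0..65535)
     * @param source      source address as a 32-bit int
     * @param destination destination address as a 32-bit int
     * @param protocol    8-bit protocol field
     */
    static int of(int id, int source, int destination, int protocol) {
        // id occupies the high 16 bits; the rest are mixed in with XOR
        return (id << 16) ^ source ^ destination ^ protocol;
    }
}
```

For example, id 0x1234 from 192.168.0.1 (0xC0A80001) to 192.168.0.2 (0xC0A80002) carrying UDP (17) gives the key 0x12340012.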

We are also going to use a PriorityQueue that orders buffers by their timeout timestamp. The buffers at the head of the queue are either timed out or closer to timing out than any other buffer in the queue. This lets us efficiently check buffers from the head of the queue until we reach one that has not timed out, at which point we can stop.
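The following is a minimal sketch of that early-stopping scan using `java.util.PriorityQueue`. Each entry is reduced to a name and a deadline. `TimeoutQueue`, `Entry` and `pollExpired` are hypothetical names. Because the head is always the entry with the earliest deadline, the loop can stop at the first head that has not expired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class TimeoutQueue {
    /** Minimal stand-in for a reassembly buffer: a name and a deadline. */
    static final class Entry implements Comparable<Entry> {
        final String name;
        final long timeoutAt;
        Entry(String name, long timeoutAt) { this.name = name; this.timeoutAt = timeoutAt; }
        @Override public int compareTo(Entry o) { return Long.compare(timeoutAt, o.timeoutAt); }
    }

    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    void add(Entry e) { queue.add(e); }

    int size() { return queue.size(); }

    /** Removes and returns every entry whose deadline is <= now, earliest first. */
    List<Entry> pollExpired(long now) {
        List<Entry> expired = new ArrayList<>();
        // Stop at the first head that has not expired: nothing behind it has either.
        while (!queue.isEmpty() && queue.peek().timeoutAt <= now) {
            expired.add(queue.poll());
        }
        return expired;
    }
}
```

In the tutorial's design, each expired buffer would also be removed from the Map and passed to the user handler, which reports it as incomplete.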

The first fragment we see creates the buffer for that IP datagram. When we construct the buffer, we use the Ip4 header of that fragment as a template for the Ip4 header we insert in front of all the fragments in the buffer. We also need to reset a few fields in that header to match the new packet we are creating out of the fragments. We need to recalculate (or reset to 0) the header checksum and clear the MORE_FRAGMENTS flag. Because the first fragment we see is not necessarily the one at offset 0, we also need to clear the fragment offset. Finally, we drop any options by resetting the hlen field to 5 and set the total length field to the new length of our IP datagram.
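The following sketch applies these header fixes to a plain byte array instead of a JBuffer. It uses the RFC 791 byte layout: version/hlen in byte 0, total length in bytes 2-3, flags and fragment offset in bytes 6-7, and the checksum in bytes 10-11. The checksum is recalculated with the RFC 1071 ones' complement sum rather than set to 0. `Ip4HeaderTemplate` and its methods are hypothetical names. The DF bit is kept as-is, and MF and the offset are cleared.

```java
import java.util.Arrays;

/** Hypothetical helper: builds the reassembled datagram's Ip4 header. */
final class Ip4HeaderTemplate {
    static final int MIN_HEADER_LEN = 20; // hlen = 5 32-bit words

    private Ip4HeaderTemplate() {}

    /**
     * @param fragmentHeader Ip4 header copied from the first fragment seen
     *                       (may include options, i.e. be longer than 20 bytes)
     * @param totalLength    new total length: 20 + reassembled payload bytes
     */
    static byte[] build(byte[] fragmentHeader, int totalLength) {
        byte[] h = Arrays.copyOf(fragmentHeader, MIN_HEADER_LEN); // drop options
        h[0] = (byte) ((h[0] & 0xF0) | 5);   // keep version, set hlen to 5
        putU16(h, 2, totalLength);           // total length field
        putU16(h, 6, getU16(h, 6) & 0x4000); // keep DF; clear MF and fragment offset
        putU16(h, 10, 0);                    // zero the checksum before summing
        putU16(h, 10, checksum(h));          // recalculate the header checksum
        return h;
    }

    /** RFC 1071 Internet checksum over an even-length header. */
    static int checksum(byte[] h) {
        int sum = 0;
        for (int i = 0; i < h.length; i += 2) {
            sum += getU16(h, i);
        }
        while ((sum >>> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >>> 16); // fold carries back in
        }
        return ~sum & 0xFFFF;
    }

    static int getU16(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    static void putU16(byte[] b, int off, int v) {
        b[off] = (byte) (v >>> 8);
        b[off + 1] = (byte) v;
    }
}
```

A useful sanity check: running `checksum` over a header whose checksum field is already filled in returns 0.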

The buffer will never be complete unless we receive the last fragment. The last fragment is crucial because it tells us the length of the original IP datagram: its fragment offset plus its own payload length. If all the fragments arrive in sequence, the last fragment also means that reassembly is complete and we can dispatch to the user. However, fragments can arrive out of sequence, so we may still receive fragments after the last one has arrived. When the last fragment arrives, we also need to change the size of the buffer to match that of the entire datagram. The buffer's physical size is 8K, and our datagrams will probably be smaller than that. There will be some unused space at the end, but the buffer will be strictly bounded to the datagram data.
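The following sketch shows placement and bounding using `java.nio.ByteBuffer` in place of JBuffer. Its limit plays the role of the buffer's logical size, and its capacity plays the role of the 8K physical size. Each fragment's payload is copied to 20 + offset, just past the Ip4 header. When the last fragment arrives, the limit shrinks to the end of the datagram. `DatagramSpace` and `addFragment` are hypothetical names.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: fragment placement plus bounding by the last fragment. */
final class DatagramSpace {
    static final int PHYSICAL_SIZE = 8 * 1024; // 8K, as in the tutorial
    static final int IP4_HEADER_LEN = 20;      // reassembled header has hlen 5

    final ByteBuffer buffer = ByteBuffer.allocate(PHYSICAL_SIZE);
    int datagramLength = -1;                   // payload length; -1 until known

    /**
     * Copies one fragment's payload to its place after the Ip4 header.
     * Throws BufferOverflowException if the fragment lies past the current limit.
     */
    void addFragment(int offsetBytes, byte[] payload, boolean moreFragments) {
        ByteBuffer view = buffer.duplicate();  // own position, same bytes
        view.position(IP4_HEADER_LEN + offsetBytes);
        view.put(payload);
        if (!moreFragments) {
            // Only the last fragment reveals the original datagram's length.
            datagramLength = offsetBytes + payload.length;
            buffer.limit(IP4_HEADER_LEN + datagramLength); // bound to datagram data
        }
    }
}
```

The last fragment may arrive first, as in the usage below. Later fragments still land at their own offsets inside the already-bounded region.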

In summary, we have a buffer Map and a timeout Queue. The Map keeps track of reassembly buffers for us based on a special hash code. The timeout queue uses the priority queue mechanism to sort our buffers and keep buffers that have timed out at the head.

The user handler

The user handler is going to receive Ip4 reassembly buffers. These buffers may or may not be complete, but they will always have at least an Ip4 header and one fragment.

We will check whether the buffer is complete and report an error message if it is not. Otherwise, we will simply create a packet out of it.

There is no need to copy the data out of the buffer, because it already contains everything we need. It is freshly allocated, so it is ours to do with as we please. It has an Ip4 header at the start, followed by all of the reassembled fragments already copied into it.

We are simply going to peer a JMemoryPacket with our buffer. Then we run a scan on the packet to decode it, telling the scanner that the first header is Ip4.
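The handler's control flow can be illustrated without jNetPcap. This is a stand-in, not the tutorial's code. It does not reproduce peering a JMemoryPacket or calling the scanner. Instead, it reads a few Ip4 fields straight from offset 0, which is exactly why the reassembled buffer needs no datalink header. `UserHandlerSketch` and its `handle` method are hypothetical names. In the real handler, the scanner would do the decoding and `packet.toString()` would produce the output.

```java
/** Hypothetical user handler working on the reassembled bytes directly. */
final class UserHandlerSketch {
    private UserHandlerSketch() {}

    /**
     * @param complete whether the reassembly buffer reported itself complete
     * @param data     reassembled datagram, Ip4 header first (DLT is Ip4)
     * @return a short description, or an error message for incomplete buffers
     */
    static String handle(boolean complete, byte[] data) {
        if (!complete) {
            return "error: incomplete datagram, " + data.length + " bytes";
        }
        // Decode starting at offset 0 as Ip4: there is no datalink header.
        int version = (data[0] & 0xF0) >>> 4;
        int hlen = (data[0] & 0x0F) * 4; // header length in bytes
        int totalLength = ((data[2] & 0xFF) << 8) | (data[3] & 0xFF);
        int protocol = data[9] & 0xFF;
        return "Ip4 v" + version + " hlen=" + hlen
                + " length=" + totalLength + " protocol=" + protocol;
    }
}
```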

And that's it.