Mark Bednarczyk's blog

1.3 beta 3 update planned

Update: We can't confirm that there is a memory leak in beta 2 at this time; it may have been a false alarm. Nonetheless, beta 3 will be released today, due to a System.out debug message appearing in the Tcp protocol when it parses TCP options, a slight oversight on our part. I added a JUnit test case that specifically checks for anything rogue on System.out and System.err after stressing the decoder and protocols. This should keep such an embarrassing bug from sneaking into a release in the future.

JBuffer performance

I've been investigating JBuffer accessor method performance in the next-generation jNetPcap 2.X implementation (still unreleased). On my WinXP development system, a single dynamic JBuffer.getInt call took 105 ns. After a little optimization I got that down to 95 ns, then 65 ns, and that seemed to be a limit I couldn't break through. The Java native call itself, when measured against an empty method, takes 20 ns, so that is the hard floor imposed by the JVM and its calling convention.

Finally, I managed to get that number down to 25 ns per getter call. The trick was to pre-fetch the memory address in Java and pass that address as a parameter to the native function. This saves a JNI call back from the native function, which is less efficient than doing the same thing inside the JVM. This little trick shaved off about 22% of the overhead, as empirically measured.
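The "measured against an empty method" baseline above can be sketched with a simple harness like the one below. This is a hypothetical illustration, not jNetPcap's actual benchmark code; the two stand-in methods are plain Java where the real measurements would use JNI-backed natives.

```java
// Hypothetical micro-benchmark sketch: estimate average cost per call
// by comparing a getter-style call against an empty-method baseline.
public class CallOverheadBench {

    // Stand-ins for native calls; these keep the sketch self-contained.
    private static int emptyMethod() { return 0; }
    private static int getterMethod(long address) { return (int) address; }

    // Time many calls and return the average nanoseconds per call.
    static long nanosPerCall(java.util.function.LongToIntFunction f, long calls) {
        long start = System.nanoTime();
        int sink = 0;
        for (long i = 0; i < calls; i++) {
            sink += f.applyAsInt(i);   // accumulate so the JIT can't drop the loop
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.print("");  // keep sink observably alive
        return elapsed / calls;
    }

    public static void main(String[] args) {
        long calls = 10_000_000L;
        long baseline = nanosPerCall(a -> emptyMethod(), calls);
        long getter   = nanosPerCall(CallOverheadBench::getterMethod, calls);
        System.out.println("baseline ns/call: " + baseline);
        System.out.println("getter   ns/call: " + getter);
        // The interesting figure is (getter - baseline): the cost above
        // the irreducible per-call overhead.
    }
}
```

A warm-up pass before timing would make the numbers steadier on a JIT-compiled JVM, but the comparison-against-baseline idea is the same.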

This low-level performance is critical in the new API. I am going after significant performance improvements with the new implementation.

Here is a preview of the class hierarchy in the 2.x nio package:

 JNative
 +-> JMemory
 |   +-> JBuffer
 |       +-> MappedBuffer
 |           +-> SlidingWindowBuffer
 |
 +-> JCallback

The new API deprecates JStruct and a few other things that unnecessarily complicated the API.

The new MappedBuffer and SlidingWindowBuffer classes are very interesting and also significantly improve performance and resource utilization in the 2.x API.

The MappedBuffer class allows one or more JBuffers to be logically mapped into a single contiguous logical buffer. It is intended for packet reassembly, although there will be a multitude of other uses for this type of buffer. For example, it will allow several IP fragments to be mapped into a single complete datagram while referencing the physical in-memory buffers of the individual fragments without copies. The individual IP fragments are mapped into the reassembled buffer by reference, not by copy.
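Since the 2.x MappedBuffer class is unreleased, here is a hypothetical sketch of the idea: several physical buffers are exposed through one contiguous logical view, and reads are translated to the right segment by reference, with no copying of fragment data. The class name and methods below are illustrative, not the real API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the MappedBuffer concept: map several physical
// buffers (e.g. IP fragments) into one logical buffer by reference.
public class LogicalMappedBuffer {

    private final List<ByteBuffer> segments = new ArrayList<>();
    private int totalLength = 0;

    // Map another physical buffer into the logical view; no data is copied.
    public void map(ByteBuffer segment) {
        segments.add(segment);
        totalLength += segment.remaining();
    }

    public int length() { return totalLength; }

    // Translate a logical offset into (segment, local offset) and read in place.
    public byte getByte(int logicalOffset) {
        for (ByteBuffer seg : segments) {
            int len = seg.remaining();
            if (logicalOffset < len) {
                return seg.get(seg.position() + logicalOffset);
            }
            logicalOffset -= len;
        }
        throw new IndexOutOfBoundsException("offset beyond mapped length");
    }
}
```

Reassembly then amounts to calling map() once per fragment in order; the reassembled payload is readable immediately, while the fragment buffers stay wherever they already live in memory.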

Status 9/15/10

Only three more bugs left to fix, which I should be able to finish tomorrow. During my testing, and especially while handling a multitude of different packet types, I have found a few additional bugs.

I also hit the Tcp and Udp protocol headers hard: updated every possible javadoc, added a bunch of RFC extracts and quotes as documentation, and finished off TcpOptions, setters/getters, etc. These two protocols look really good and complete now.

I also expanded TestUtils under the tests/java directory a bit to make it easier to write more complex test cases. I added a JPacketBuffer-type class which can load a bunch of pcap headers and packets into memory and lets test cases work with those packets directly out of memory. Not everything in the API needs to work with open pcap captures all the time. This greatly sped up some test cases.
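The idea of working with packets straight out of memory can be sketched as follows. This is not the actual JPacketBuffer code, just an illustration built on the standard libpcap file layout: a 24-byte global header followed by 16-byte record headers (ts_sec, ts_usec, incl_len, orig_len) and packet data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: walk pcap records directly out of an in-memory
// buffer, with no open capture handle involved.
public class InMemoryPcap {

    public static final int GLOBAL_HEADER_LEN = 24; // standard libpcap file header
    public static final int RECORD_HEADER_LEN = 16; // ts_sec, ts_usec, incl_len, orig_len

    // Count packet records in a little-endian pcap image held in memory.
    public static int countPackets(ByteBuffer pcap) {
        ByteBuffer buf = pcap.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        buf.position(GLOBAL_HEADER_LEN);            // skip the file header
        int count = 0;
        while (buf.remaining() >= RECORD_HEADER_LEN) {
            buf.getInt();                            // ts_sec
            buf.getInt();                            // ts_usec
            int inclLen = buf.getInt();              // captured length
            buf.getInt();                            // original wire length
            buf.position(buf.position() + inclLen);  // skip packet data
            count++;
        }
        return count;
    }
}
```

A test case can load a capture file into such a buffer once and then hand each record region to the decoder repeatedly, which is what makes this kind of setup so much faster than reopening captures.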

I have also been a little disappointed with the performance of these fast in-memory buffers. Not much can be done within the 1.X API, but there will be significant performance gains with the 2.X architecture, especially when it comes to peering and referencing data in native land. Peering in the 2.X API is handled purely natively, with no JNI calls; currently there are too many JNI calls involved in peering a Java object to native memory. The 2.X architecture fixes all of these deficiencies, with the benefit being super-fast peers: my rough measurements take a peering call down from 100 ns to between 5 and 10 ns. Counting the scanner improvements as well, I am confident I can at minimum double the performance of jNetPcap, if not triple it.
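For readers unfamiliar with the term, peering can be sketched as follows: one reusable packet object is repeatedly re-pointed at regions of a shared backing buffer, referencing the bytes in place instead of copying or allocating per packet. This is a hypothetical illustration (a ByteBuffer stands in for native memory); the real 2.x peering happens natively.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of peering: a single reusable object is
// repositioned over a shared buffer, no per-packet allocation or copy.
public class PeeredPacket {

    private ByteBuffer data;   // backing storage (stands in for native memory)
    private int offset;        // where this packet starts in the buffer
    private int length;

    // Re-point this object at a new region; no allocation, no copy.
    public void peer(ByteBuffer data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    public int length() { return length; }

    public byte getByte(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        return data.get(offset + index);
    }
}
```

The benchmark setup below works this way: a single static packet object is peered with each raw packet region in turn, so the per-packet cost is just the re-pointing, which is why shaving the peering call from 100 ns toward 5-10 ns matters so much.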

Performance

Here are some unofficial performance numbers:

jNetPcap version 1.3b1, WinXP x86_32:

  73,410,000   total packets processed
  59,051       packets per second
  16.9 us      processing time per packet, in microseconds (10e-6 s)
  260,111      headers per second
  3,845 ns     processing time per header, in nanoseconds (10e-9 s)

Spec: Intel Core Duo (2 cores), WinXP Pro, 4 GB memory, JRE 1.5.0_8
Setup: 7,341 different packets are loaded into a memory buffer from 15 different files found in the tests directory, for a total of 22,030,069 bytes of data, and each is processed 10,000 times. The raw packets in memory are dispatched to a JBufferHandler, where their buffers are peered with a static packet object and scanned for headers.

My goal for the next version of the decoder is to greatly increase these performance numbers. I am confident I can at minimum double them with the new implementation. For one, peering is a lot more efficient. Second, the decoder will implement what I call a "cut-through" scanner, where the first few common headers are processed without any function calls in the scanner; they are pre-processed as efficiently as possible in native code. That, plus a number of other efficiency improvements, should greatly increase the numbers above.
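The cut-through idea can be illustrated with a sketch like this: the common Ethernet/IPv4/TCP chain is decoded inline in a single method, with no per-header function calls, and anything else falls back to the general scanner. This is a hypothetical Java illustration of the control flow; the real 2.x scanner would do this in native code.

```java
// Hypothetical sketch of a "cut-through" scan: decode the common
// Ethernet -> IPv4 -> TCP chain inline, with no per-header calls.
public class CutThroughScanner {

    // Returns the byte offset of the TCP header, or -1 if the packet is
    // not plain Ethernet/IPv4/TCP (those would take the slow path).
    public static int scanTcpOffset(byte[] pkt) {
        if (pkt.length < 34) return -1;               // Ethernet + minimal IPv4
        // Ethernet: EtherType at bytes 12-13 must be 0x0800 (IPv4).
        int etherType = ((pkt[12] & 0xFF) << 8) | (pkt[13] & 0xFF);
        if (etherType != 0x0800) return -1;
        // IPv4: version and IHL share the first header byte.
        int version = (pkt[14] & 0xF0) >>> 4;
        int ihl = (pkt[14] & 0x0F) * 4;               // header length in bytes
        if (version != 4 || ihl < 20) return -1;
        // IPv4 protocol field at offset 9 must be 6 (TCP).
        if ((pkt[14 + 9] & 0xFF) != 6) return -1;
        return 14 + ihl;                              // TCP header starts here
    }
}
```

Because the hot path is one straight-line method, the JIT (or native compiler) can keep everything in registers, which is exactly the kind of saving that function-call-per-header scanning gives up.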

Bugs in 1.3b1

Several new bugs have been discovered in the jnetpcap 1.3b1 beta release. These are all fixed now, and the same fixes have been applied to the 1.4 branch. Because not just one but several bugs were found, we will be going through the release-candidate process. This means the next release after 1.3b1 will be 1.3.0.rc1, and so on, until no more bugs are found and the final, production-quality 1.3.0 is released.

For a complete list of bugs so far, please visit the jnetpcap.1.3 release overview page:

http://jnetpcap.com/jnetpcap-1.3
