Big Data Analysis

Real-Time Per-Flow Measurement

Internet traffic has substantially increased. While the growth in volume necessarily accelerates development of high-speed routers, there has been a consistent need to measure the size of the traffic, not just roughly estimate the total traffic. Every individual flow size must be measured quickly and accurately, not only for accounting but also for security.

Because of the speed of routers, the data structure that holds counting values for flows is stored in an expensive, but fast memory, like SRAM, so the available amount of memory is not large. Therefore, the goal of a per-flow measurement algorithm is to measure individual flow sizes as accurately as possible with a small amount of memory. Also, updating a counter should be fast enough to be run on a high-speed router. For real-time detection of heavy flows, online decoding is also required. To be able to decode online, the number of memory accesses and hash computations for the estimation should be small enough to be performed on a high-speed router. A lot of research on per-flow measurement has been proposed such as spectral bloom filters (SBF), the multi-resolution space-code Bloom filter (MRSCBF), counter braid (CB), probabilistic multiplicity counting (PMC), compact spread estimator (CSE), counter sharing method (CSM), randomized CSM, etc.

Recyclable Counter with Confinement (RCC)

Our approach for better estimation is (1) to recycle memory blocks by repeated resetting to prevent saturation, and (2) to use a very small memory block for each flow for better accuracy.

Obviously, recycling memory blocks distorts all the information that a memory block holds, and a small memory block has a small estimation upper limit, although those may provide slow saturation and better accuracy. To resolve these problems, we use a hash table to accumulate flow sizes measured by a memory block. Whenever a memory block for a flow reaches its maximum capacity, its estimation result is accumulated in the hash table (therefore, the information contributed by the flow is preserved in the table), and then the memory block is recycled after resetting the block to the average noise level (therefore, the information contributed by the other flows is preserved in the small memory block). Because the majority of flows are small enough to fit in a memory block, the number of flows to be managed by the hash table is not large, compared to the total number of flows. Recycling with resetting sharply contrasts with previously proposed schemes in that the number of 1’s in RCC fluctuates, whereas the number in other schemes is ever-increasing. Repeated resetting prevents saturation, and thus, estimation accuracy is enhanced.

You can learn more from our journal papers shown as below.

Related Papers

  • DaeHun Nyang and DongOh Shin, “Recyclable Counter with Confinement for Realtime Per-flow Measurement,” IEEE/ACM Transactions on Networking, Vol. 24, No. 5, pp. 3191-3203, Dec 2016.
  • Rhongho Jang, DongGyu Cho, Youngtae Noh, and DaeHun Nyang, “RFlow+: An SDN-based WLAN Monitoring And Management Framework,” Proceedings of IEEE INFOCOM 2017, Best-in-session Presentation Award, Atlanta, pp. 460-468, May 2017.