NETWORK MEASUREMENT

NETWORK TOPOLOGY DISCOVERY
For the layer 3 topology, Traceroute is a good tool ( if the machine responses well for the ICMP message). But if you want to find layer 2 topology, traceroute will not work, since layer 2 is transparent to it.

If you want to find the layer 2 topology you have to use the SNMP, and in the following paper some ideas were proposed to solve the problem. They all need to get info from MIB.

Also how to find the hidden layer 2 switch between the layer 3 device is a good problem, based on some BW measurement tools, the difference result from 2 different BW measure methods will provide some useful info about the existence of layer 2 devices.

But if you want to identify the internal logical topology, only from the end-to-end measurement, you may need network tomography. This is a new field which benefit greatly from the wealth of statistical theory and algorithms. And the topology identification is one promising application.

  • References:
  1. Network Tomography: recent developments
  2. Internet Tomography
  3. IEEE INFOCOM 2003 "Traffic Monitoring and Topology Inference"
  4. Maximum Likelihood Network Topology Identification from Edge-based Unicast Measurements.
  5. Maximum Likelihood Network Topology Identification from End-to-end Unicast Measurements. (By using the sandwidth probes, we can get sample about the queue delay along each link. Then we build the maixmum likelihood function, and use it to infer the tree topology which will most likely fit the real topology. And a novel Markov Chain Monte Carlo procedure was developed for the rapid determination of the most likely topologies.)
  6. Network Tomography and the identification of share infrastructure.
  7. Discovering General Logical Network Topologies ( Based on the result from 5 )
CLUSTER BY NETWORK DISTANCE

Given N end hosts, which belong to different administrative domains, how could we divide them into different clusers or groups by the network distance between them without knowing the info about the underlying topology ?

There are several projects related to this problem:

  • Global Network Position (GNP)
    • The key idea is to represent the complex structure of the Internet by a simple geometric space (e.g. an N-dimensional Euclidean space). In this representation, each host in the Internet is characterized by its position in the geometric space with a set of geometric coordinates. If the representation is accurate, then the easily computable geometric distances between hosts in this geometric space can accurately approximate the Internet network distances, and no actual network measurements are required.

    • If the number of landmark is D, then the vector will be D-dimension. And the distances of all paths between N end hosts can efficiently communicated by K sets of coordinates of D numbers each (i.e. O(KD) of data), as opposed to K(K-1)/2 individual distances (i.e., O(K^2) of data).
    • As to the network distance, the metric is round-trip propagation and transmission delay, a relatively stable characteristic between hosts.
  • Internet Distance Map Service (IDMaps)
    • The objective of this research is to explore scalable design alternatives for an architecture to provide Internet Distance Maps Service (IDMaps). In IDMaps, special HOPS servers maintain a virtual topology map of the Internet consisting of end hosts and special hosts called Tracers. This virtual topology map is used to predict Internet distance. For example, the distance between hosts A and B is estimated as the distance between A and its nearest Tracer T1, plus the distance between B and its nearest Tracer T2, plus the shortest path distance from T1 to T2 over the Tracer virtual topology. As the number of Tracers grow, the prediction accuracy of IDMaps tends to improve.
    • Where to put place the Tracer is an interesting problem.IDMaps address this in two different ways, namely the K-HST and the minimum K-center algorithms. These algorithms have been used to determine placement of fire stations, ambulance placement.

Based on all these related works, I am going to use the algorithm used in ECO (Effective Collective Operation) and the network distance to cluster the hosts. Here, you can find more about ECO.

  • Select the landmark.

    I will use the Domain Name Sever(DNS) of each hosts as the landmarks. But each time a new DNS is added the dimension of the space will increased by one, then we may need to recaculate all the network distance again. We should find an optimal solution, that is mean only add new DNS as the landmark when it is necessary.

  • Get the coordinate of each hosts.

    Suppose that there are n hosts ( H(i), where i=1, 2, ...,n) , and m landmarks ( L(j), wherej=1,2,3, ... , m).

    Cor[H(i)] = { Xi1, Xi2, ..., Xij, ... , Xim}, where Xij=Latency[H(i), L(j)]

  • Caculate the network distance (and the vector similarity) based on the coordinates

    D(H(i),H(j)) = sum of (Xik-Xjk)*(Xik-Xjk), where k=1,2, ... , m

  • Use the following algorithm to cluster the hosts.
Put all the hosts in a graph, where the vertexes are the hosts, and the edge between them is the network distance that we get from the measurement.

initialize clusters to empty

for all hosts

host.min_edge = minimum cost edge incident on node

sort edges by nondecreasing cost

for all edges(a,b)

if a and b are in the same cluster

continue

if (edge.weight > Q1 * node(a).min_edge) or (edge.weight > Q1 * node(a).min_edge)

continue

if node(a) in a cluster

if ( edge.weight > Q2 * node(a).cluster.min_edge )

continue

if node(b) in a cluster

if ( edge.weight > Q2 * node(b).cluster.min_edge )

continue

merge node(a).cluster and node(b).cluster

set new_cluster.min_edge=min( node(a).cluster.min_edge, node(b).cluster.min_edge )