CS 765 Complex Networks

Fall 2016

Network Lab 5

Due on Tuesday Dec 13, 2016 at 2:30 pm

Power-law network

Using any programming language, generate 100,000 random integers from a power-law distribution with exponent alpha = 2.1. Note that slides #94-108 discuss power-law generation and fitting.
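
A minimal sketch of one common approach in Python/NumPy, assuming inverse-transform sampling with a lower cutoff x_min = 1 and rounding down to integers; the seed and variable names are illustrative, not prescribed by the assignment:

    import numpy as np

    rng = np.random.default_rng(seed=42)   # fixed seed for reproducibility (illustrative)

    alpha = 2.1        # target power-law exponent
    x_min = 1.0        # lower cutoff of the distribution
    n = 100_000        # sample size required by the assignment

    # Inverse-transform sampling for a continuous power law p(x) ~ x^(-alpha), x >= x_min:
    #   x = x_min * (1 - u)^(-1 / (alpha - 1)),  u ~ Uniform(0, 1)
    u = rng.random(n)
    samples = np.floor(x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))).astype(int)

    print("largest value in the sample:", samples.max())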

  1. What is the largest value in your sample? Is it possible for a node in a network to have a degree this high (assuming you don't allow multiple edges between two nodes)?
  2. Construct a histogram of the frequency of occurrence of each integer in your sample. (Pajek will let you calculate the degree of each individual node via Net > Partitions > Degree > All. Then, export the partition as a '.clu' file by clicking on the save icon to the left of the partitions drop-down select menu. Now you can import it into Excel or another program and histogram it.) Try both a linear-scale plot and a log-log plot (a plotting sketch follows this list). What happens to the bins with zero count in the log-log plot?
  3. Try a simple linear regression on the log transformation of both variables, i.e., fit a line to log(frequency) vs. log(value); see the regression sketch after this list. (In Matlab, you can plot two data sets together as follows: plot(x1,y1,'r-',x2,y2,'b:'). This plots y1 vs. x1 as a red solid line and y2 vs. x2 as a blue dotted line. If you are using the fitlineonloglog.m Matlab script, you will feed it the binned data, and it will take the log of x and y for you before doing a linear fit.) What is your value of the power-law exponent alpha? Include a plot of the data with the fit superimposed.
  4. Now exponentially bin the data and fit a line to the binned values (see the exponential-binning sketch after this list). What is your value of alpha?
  5. Do a cumulative frequency plot of the original data sample (see the cumulative-frequency sketch after this list). Fit, plot, and report the fitted exponent and the corresponding value of alpha.
  6. Finally, do a maximum likelihood fit of the data (see the MLE sketch after this list). Plot the results and report alpha.
  7. Which method was the most accurate? Which one, in your opinion, gave the best view of the data and the fit?
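
For item 2, a minimal Python/NumPy/Matplotlib sketch of the two histogram plots. It regenerates the sample from the generation sketch above so it runs on its own:

    import numpy as np
    import matplotlib.pyplot as plt

    # Regenerate the sample from the generation sketch above (alpha = 2.1, x_min = 1).
    rng = np.random.default_rng(42)
    samples = np.floor((1.0 - rng.random(100_000)) ** (-1.0 / 1.1)).astype(int)

    # Frequency of occurrence of each integer value (values with zero count are absent).
    values, counts = np.unique(samples, return_counts=True)

    fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
    ax_lin.plot(values, counts, "k.")
    ax_lin.set(xlabel="value", ylabel="frequency", title="linear scale")
    ax_log.loglog(values, counts, "k.")   # zero-count bins cannot be drawn on a log axis
    ax_log.set(xlabel="value", ylabel="frequency", title="log-log scale")
    plt.tight_layout()
    plt.show()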
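
For item 3, a sketch of the simple linear regression on the log-transformed histogram. It uses np.polyfit in place of the fitlineonloglog.m script, which is not reproduced here:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    samples = np.floor((1.0 - rng.random(100_000)) ** (-1.0 / 1.1)).astype(int)
    values, counts = np.unique(samples, return_counts=True)

    # Least-squares line: log10(frequency) = slope * log10(value) + intercept.
    log_x, log_y = np.log10(values), np.log10(counts)
    slope, intercept = np.polyfit(log_x, log_y, 1)
    print("fitted slope:", slope, "-> alpha estimate:", -slope)

    plt.loglog(values, counts, "b.", label="data")
    plt.loglog(values, 10 ** (intercept + slope * log_x), "r-", label="linear fit")
    plt.xlabel("value"); plt.ylabel("frequency"); plt.legend(); plt.show()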
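
For item 4, one common form of exponential binning uses bin edges that double in width and normalizes each count by its bin width before fitting; this sketch assumes that convention:

    import numpy as np

    rng = np.random.default_rng(42)
    samples = np.floor((1.0 - rng.random(100_000)) ** (-1.0 / 1.1)).astype(int)

    # Bin edges 1, 2, 4, 8, ... up to (at least) the largest sampled value.
    edges = 2.0 ** np.arange(np.ceil(np.log2(samples.max())) + 1)
    counts, _ = np.histogram(samples, bins=edges)

    centers = np.sqrt(edges[:-1] * edges[1:])   # geometric bin centers
    density = counts / np.diff(edges)           # normalize counts by bin width

    keep = counts > 0                           # drop empty bins before taking logs
    slope, _ = np.polyfit(np.log10(centers[keep]), np.log10(density[keep]), 1)
    print("alpha estimate from exponential binning:", -slope)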
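
For item 5, the cumulative frequency plot here is the complementary CDF P(X >= x); since the CCDF of a power law with exponent alpha falls off as x^(-(alpha - 1)), alpha is one more than the magnitude of the fitted slope:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    samples = np.floor((1.0 - rng.random(100_000)) ** (-1.0 / 1.1)).astype(int)

    # P(X >= x) for each sample, with x sorted in ascending order.
    sorted_vals = np.sort(samples)
    ccdf = 1.0 - np.arange(len(sorted_vals)) / len(sorted_vals)

    slope, intercept = np.polyfit(np.log10(sorted_vals), np.log10(ccdf), 1)
    print("CCDF slope:", slope, "-> alpha estimate:", 1.0 - slope)

    plt.loglog(sorted_vals, ccdf, "b.", label="cumulative data")
    plt.loglog(sorted_vals, 10 ** (intercept + slope * np.log10(sorted_vals)), "r-", label="fit")
    plt.xlabel("value"); plt.ylabel("P(X >= x)"); plt.legend(); plt.show()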
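
For item 6, a sketch of the maximum-likelihood estimate using the standard formula alpha_hat = 1 + n / sum(ln(x_i / x_min)); subtracting 0.5 from x_min is the usual adjustment when the data are integers, and the overlaid curve is simply the fitted power law rescaled to the histogram:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(42)
    samples = np.floor((1.0 - rng.random(100_000)) ** (-1.0 / 1.1)).astype(int)

    x_min, n = 1.0, len(samples)
    # Maximum-likelihood estimate; x_min - 0.5 approximates the discrete (integer) case.
    alpha_hat = 1.0 + n / np.sum(np.log(samples / (x_min - 0.5)))
    print("MLE alpha:", alpha_hat)

    # Superimpose the fitted power law, rescaled to match the total count.
    values, counts = np.unique(samples, return_counts=True)
    fitted = counts.sum() * values ** -alpha_hat / np.sum(values ** -alpha_hat)
    plt.loglog(values, counts, "b.", label="data")
    plt.loglog(values, fitted, "r-", label="MLE fit")
    plt.xlabel("value"); plt.ylabel("frequency"); plt.legend(); plt.show()
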
Network resilience (Bonus: 5 points)

For this task, you will use a sample Gnutella network, gnutella2.gdf. The Guess toolbars, downloadable as resiliencedegree.py and resiliencebetweenness.py from cTools, will work on modestly sized undirected networks (~1000 nodes). The resilience toolbars let you specify the percentage of nodes to be removed and whether removal is a random failure (nodes are selected at random) or a targeted attack (the highest-degree nodes or the nodes with highest betweenness are removed). They will also compute the size of the largest component and display the network after the nodes are removed. You may also do this assignment in igraph or any other software.
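
If you go the igraph route, here is a minimal Python sketch of the removal experiment. It assumes gnutella2.gdf has been converted to a plain edge list (igraph does not read .gdf directly; the filename gnutella2.txt is illustrative), and it removes the targeted nodes in one shot based on the original degrees or betweenness:

    import random
    import igraph as ig

    # Assumed edge-list conversion of gnutella2.gdf; the filename is illustrative.
    g = ig.Graph.Read_Edgelist("gnutella2.txt", directed=False)

    def giant_size(graph):
        """Number of nodes in the largest connected component."""
        return max(graph.connected_components().sizes())

    def remove_fraction(graph, fraction, strategy="random"):
        """Giant-component size after deleting `fraction` of the nodes."""
        g2 = graph.copy()
        k = int(fraction * g2.vcount())
        if strategy == "degree":
            targets = sorted(range(g2.vcount()), key=g2.degree, reverse=True)[:k]
        elif strategy == "betweenness":
            bet = g2.betweenness()
            targets = sorted(range(g2.vcount()), key=lambda v: bet[v], reverse=True)[:k]
        else:  # random failure
            targets = random.sample(range(g2.vcount()), k)
        g2.delete_vertices(targets)
        return giant_size(g2)

    print("original giant component:", giant_size(g))
    for f in (0.05, 0.10, 0.20, 0.30):
        print(f, "degree:", remove_fraction(g, f, "degree"),
              "betweenness:", remove_fraction(g, f, "betweenness"),
              "random:", remove_fraction(g, f, "random"))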

Please answer the following about the network (turn in one image of the original network and one image of the network after it has been reduced to less than 1/2 of its original size by one of the attack strategies).

  1. What network are you using? What do the nodes and edges represent?
  2. What percentage of the nodes needs to be removed to shrink the giant component to 1/2 of its original size under a degree-targeted attack vs. a betweenness-targeted attack vs. random failure? Comment on this result with respect to the degree distribution and community structure (or lack thereof) in your network.
  3. Construct a random network with the same number of nodes and edges. (In Guess, you can do this by selecting 'Empty' at startup and then typing makeSimpleRandom(numberofnodes,numberofedges); an igraph sketch follows this list.)
  4. What percentages of nodes must be removed under the intentional attack and under random failure to reduce the size of the largest component of this random network by 1/2?
  5. [bonus] How does the resilience of your network compare to that of this equivalent random graph?
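
Continuing the igraph sketch above, an Erdos-Renyi G(n, m) graph with matching node and edge counts can stand in for makeSimpleRandom; this snippet reuses g and remove_fraction() from the earlier sketch:

    import igraph as ig

    # Random graph with the same number of nodes and edges as the Gnutella network
    # (the igraph analogue of makeSimpleRandom(numberofnodes, numberofedges) in Guess).
    random_g = ig.Graph.Erdos_Renyi(n=g.vcount(), m=g.ecount())

    # Reuse remove_fraction() from the resilience sketch above to compare strategies.
    for f in (0.05, 0.10, 0.20, 0.30, 0.40):
        print(f, "degree:", remove_fraction(random_g, f, "degree"),
              "random:", remove_fraction(random_g, f, "random"))
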
Submitting your files

Submission of your homework is via WebCampus. Submit a single PDF document containing all of your answers.

Acknowledgement: This assignment is adapted from material by Lada Adamic.