# CS 765 Complex Networks

## Due on Tuesday Dec 2, 2014 at 9:30 am

Power-law network (7 points)

You may wish to refer to the materials "Power laws: 'Scale-free' networks" and "Generating and Fitting Power Law Distributions in Matlab" to figure out how to complete the tasks.

Generate 100,000 random integers from a power-law distribution with exponent alpha = 2.1.
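
A minimal sketch of one way to produce the sample, in Python with only the standard library (the course's Matlab generator is not reproduced here). It uses inverse-transform sampling, mapping a uniform random number r to floor((x_min - 1/2)(1 - r)^(-1/(alpha - 1)) + 1/2). This approximates a discrete power law and is least accurate for the smallest integers; the function name `sample_power_law_ints`, the default `xmin = 1`, and the seed are illustrative assumptions.

```python
import math
import random


def sample_power_law_ints(n, alpha, xmin=1, seed=None):
    """Draw n integers >= xmin whose distribution approximately follows
    p(k) proportional to k**(-alpha).

    Inverse-transform sampling of a continuous power law starting at
    xmin - 1/2, shifted by 1/2 and rounded down. This approximates a
    discrete power law; it is least accurate for the smallest integers.
    """
    rng = random.Random(seed)
    exponent = -1.0 / (alpha - 1.0)
    samples = []
    for _ in range(n):
        r = rng.random()  # uniform on [0, 1), so 1 - r is never 0
        x = (xmin - 0.5) * (1.0 - r) ** exponent + 0.5
        samples.append(math.floor(x))
    return samples


if __name__ == "__main__":
    degrees = sample_power_law_ints(100_000, alpha=2.1, seed=42)
    print("largest value:", max(degrees))
    print("largest possible degree in a simple 100,000-node graph:", 100_000 - 1)
```

Comparing the printed maximum with n - 1 = 99,999 is one way to approach question 1, since no node in a simple graph of n nodes can have more than n - 1 neighbors.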

1. What is the largest value in your sample? Is it possible for a node in a network of 100,000 nodes to have a degree this high (assuming you don't allow multiple edges between two nodes)?
2. Construct a histogram of the frequency of occurrence of each integer in your sample. Pajek can calculate the degree of each node (Net > Partitions > Degree > All); export the resulting partition as a `.clu` file by clicking the save icon to the left of the Partitions drop-down menu. You can then import the file into Excel or another program and build the histogram there. Try both a linear-scale plot and a log-log plot.
3. What happens to the bins with zero count in the log-log plot?
4. Fit a simple linear regression to the log-transformed values of both variables. In Matlab, you can plot two data sets together with `plot(x1,y1,'r-',x2,y2,'b:')`, which draws y1 vs. x1 as a red solid line and y2 vs. x2 as a blue dotted line. (If you are using the `fitlineonloglog.m` Matlab script, feed it the binned data; it takes the log of x and y for you before doing the linear fit.) What is your value of the power-law exponent alpha? Include a plot of the data with the fit superimposed.
5. Now bin the data exponentially (bin widths that grow by a constant factor) and fit a line. What is your value of alpha?
6. Finally, do a cumulative frequency plot of the original data sample. Fit, plot, and report on the fitted exponent and the corresponding value of alpha.
7. Which method was the most accurate? Which one, in your opinion, gave the best view of the data and the fit?
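
The three fitting strategies in questions 4 to 6 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the `fitlineonloglog.m` script: it assumes the sample is a list of positive Python `int`s, fits ordinary least squares to natural logs, silently drops zero-count bins, and uses base-2 bins for the exponential binning. The function names are hypothetical, and plotting is left to Matlab, Excel, or any other tool.

```python
import math
from collections import Counter


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def alpha_from_histogram(data):
    """Question 4: one bin per integer, fit log(count) against log(k).
    Zero-count bins never appear in the Counter, so they are dropped
    (log 0 is undefined)."""
    counts = Counter(data)
    ks = sorted(counts)
    slope = ols_slope([math.log(k) for k in ks], [math.log(counts[k]) for k in ks])
    return -slope


def alpha_from_log_bins(data):
    """Question 5: exponential bins [2^i, 2^(i+1)). Each count is divided by
    the number of integers in its bin (2^i) so the y-values estimate p(k)."""
    counts = Counter(k.bit_length() - 1 for k in data)  # floor(log2 k), exact for ints >= 1
    xs, ys = [], []
    for i, c in counts.items():
        xs.append((i + 0.5) * math.log(2))  # log of the geometric bin centre
        ys.append(math.log(c / 2 ** i))
    return -ols_slope(xs, ys)


def alpha_from_ccdf(data):
    """Question 6: P(K >= k) ~ k^-(alpha - 1), so alpha = 1 - fitted slope."""
    n = len(data)
    counts = Counter(data)
    xs, ys = [], []
    remaining = n  # number of samples >= the current k
    for k in sorted(counts):
        xs.append(math.log(k))
        ys.append(math.log(remaining / n))
        remaining -= counts[k]
    return 1.0 - ols_slope(xs, ys)
```

Calling all three functions on the same sample gives one estimate of alpha per method, which can feed into question 7. On a heavy-tailed sample, the per-integer histogram fit tends to underestimate alpha, because the sparse tail contributes many bins with a count of 1.
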

The Watts-Strogatz small-world model (3 points)

Go to http://ccl.northwestern.edu/netlogo/models/SmallWorlds. This is a NetLogo model that will allow you to vary the rewiring probability.

1. Vary this probability from 0 to 1; at each setting, click "rewire" and let the model calculate the clustering coefficient and average path length. Does your plot agree with what you saw in lecture?
2. Try using a spring layout. In what ways do the random links make the world smaller?
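
A standalone sketch of the Watts-Strogatz procedure (independent Python, not the NetLogo model's code) may help connect the NetLogo plot to the definitions: start from a ring lattice, rewire each edge with probability p, then measure the average clustering coefficient and average shortest-path length relative to the unrewired lattice. The lattice size `n`, neighborhood size `k`, the convention that nodes of degree below 2 have clustering 0, and averaging path lengths only over connected pairs are all assumptions of this sketch.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Ring of n nodes, each joined to its k nearest neighbors (k even, n > k)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def watts_strogatz(n, k, p, seed=None):
    """Rewire each lattice edge (v, v + j) with probability p by moving its
    far end to a uniformly random node, avoiding self-loops and duplicates."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for j in range(1, k // 2 + 1):  # one lap around the ring per j
        for v in range(n):
            u = (v + j) % n
            if rng.random() < p and len(adj[v]) < n - 1:
                w = rng.randrange(n)
                while w == v or w in adj[v]:
                    w = rng.randrange(n)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean BFS distance over all ordered pairs that are connected."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    n, k = 300, 10  # illustrative sizes (assumption)
    lattice = ring_lattice(n, k)
    c0, l0 = average_clustering(lattice), average_path_length(lattice)
    for p in (0.0, 0.001, 0.01, 0.1, 1.0):
        g = watts_strogatz(n, k, p, seed=1)
        print(f"p={p:<6} C/C(0)={average_clustering(g) / c0:.3f} "
              f"L/L(0)={average_path_length(g) / l0:.3f}")
```

Because each rewired edge keeps one endpoint and gains a random far endpoint, a handful of rewirings can act as shortcuts across the ring, which is the effect question 2 asks you to look for in the spring layout.
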

LexRank (bonus 4 points)

Select a piece of text (10-20 sentences) that you would like to summarize and paste it in the appropriate box of the LexRank demo.

If you are unable to paste text into the text box, try a different browser.

1. There are two parameters you can vary:
• the cosine similarity threshold determines how similar two sentences have to be in order to share an edge.
• the salience threshold determines how high a sentence's PageRank has to be in order for that sentence to be included in the summary.
Vary the cosine similarity threshold and record the most salient sentence. Does the most salient sentence change as you vary the threshold?
2. Based on these observations, report the cosine similarity threshold that gave you the best result (if applicable).
3. Compare the 1-sentence summary to the 2- or 3-sentence summary. In your opinion, how much new information do the 2nd and 3rd sentences add?
4. Would you have chosen them, or a different sentence? Relate your answer to the structure of the lexical similarity graph.
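
To relate the two thresholds to the lexical similarity graph, here is a simplified LexRank-style sketch in Python. It assumes a lowercase bag-of-words tokenizer, idf computed over the input sentences themselves, an unweighted graph with an edge wherever the idf-weighted cosine similarity exceeds the threshold, no self-loops, and PageRank with damping 0.85; the demo's actual preprocessing and weighting may differ. A salience threshold would then be applied to the returned scores to decide which sentences enter the summary.

```python
import math
import re
from collections import Counter


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Return one salience score per sentence (scores sum to 1)."""
    n = len(sentences)
    bags = [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    df = Counter(word for bag in bags for word in bag)  # sentence frequency
    idf = {word: math.log(n / count) for word, count in df.items()}
    vecs = [{w: tf * idf[w] for w, tf in bag.items()} for bag in bags]
    norms = [math.sqrt(sum(x * x for x in v.values())) for v in vecs]

    def cosine(i, j):
        if norms[i] == 0 or norms[j] == 0:
            return 0.0
        dot = sum(x * vecs[j].get(w, 0.0) for w, x in vecs[i].items())
        return dot / (norms[i] * norms[j])

    # Unweighted similarity graph: edge i-j when cosine exceeds the threshold.
    nbrs = [[j for j in range(n) if j != i and cosine(i, j) > threshold]
            for i in range(n)]

    # PageRank by power iteration; a sentence with no edges spreads its
    # score uniformly over all sentences so that the total stays 1.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            targets = nbrs[i] if nbrs[i] else range(n)
            share = damping * scores[i] / len(targets)
            for j in targets:
                new[j] += share
        scores = new
    return scores


if __name__ == "__main__":
    # Toy sentences for illustration only.
    text = [
        "The cat sat on the mat.",
        "A cat sat on a mat today.",
        "The cat and the mat were happy.",
        "Stock prices fell sharply.",
    ]
    for t in (0.05, 0.1, 0.2, 0.3):
        s = lexrank(text, threshold=t)
        best = max(range(len(text)), key=s.__getitem__)
        print(f"threshold={t}: {text[best]}")
```

Raising the threshold removes edges, so sentences that were hubs can become isolated and the most salient sentence can change; a sentence that shares little vocabulary with the rest stays near the baseline score regardless of the threshold.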