**Social Network** (5 points)

Import your data from Facebook. (You may use http://apps.facebook.com/namegenweb/) This is the network of your acquaintances and the connections between them, but you yourself are excluded from the network.

You may use http://netwiki.amath.unc.edu/DataFormats/GraphMLToPajek to convert GraphML format into Pajek format (note: it relies on Python 2 libraries). You may also use http://mrbrain.cs.jhu.edu/graph-services/convert/ for conversions.

- Do an energy layout of the network using the Draw>Draw-Partition-Vector command, using the degree partition and either closeness or betweenness as the vector. Include an image.
- Who is the most central node in the network by degree, closeness and betweenness?
- Point out 3 vertices whose centrality scores differ (e.g. high betweenness but medium closeness) and explain from their position in the network why it happens.
- Identify a node with high betweenness that you could afford to remove without disconnecting other vertices from that component. Create a second network that excludes that person. Use Net>Transform>Remove>Selected Vertices. Recompute betweenness for everyone remaining in the network. Include an image.
- Point out 2 particular vertices and their position in the network. Discuss why their betweenness centrality score did or did not change.
- Point out 1 vertex (if it exists) whose closeness centrality suffers as a result.
- Briefly discuss the ambiguities (& missing data) in this kind of data collection.
- Imagine you are a newcomer who wants not only to be friends with you, but also to occupy a central position in your network (I know, multiple personalities are a bit hard to keep track of). You only have time to make 2 new acquaintances out of your network of friends. Which 2 would you choose to maximize your closeness centrality?
- Add yourself to the network by using the command Net>Transform>Add>Vertices and adding edges in the Draw window, then compute your closeness. Which 2 vertices would you connect to in order to maximize your betweenness score, and what is your betweenness?

For this assignment you may use existing data if you do not have a Facebook account.

We will be using the ACL (Association for Computational Linguistics) anthology, composed by Mark Joseph & Drago Radev: http://clair.eecs.umich.edu/aan/index.php.

We will use just two networks derived from this data set: the weighted co-authorship network CoAuthorshipNetwork.net (how many papers two people co-authored) and the weighted citation network AuthorCitationNetwork.net (how many papers of author A cite papers of author B). In each of these networks, an author is only included if they have at least 10 papers in the ACL dataset.

Your tasks are the following:

- Load the two networks. They should both have the same number of vertices: 1559.
- Compute the density of both: `Info > Network > General`. Which one has the higher *density*? Why could this be the case?
- Compute the clustering coefficient of both: `Net > Vector > Clustering Coefficients > CC1`, and then `Info > Vector`. Interpret the difference. Do this on an undirected version of the citation network (`Net > Transform > Arcs->Edges`), but for the rest of the assignment use the directed version.
- In the co-authorship network, compute the *degree*, *closeness*, and *betweenness* of each author. Following is a (rather complicated) way to sort the vertices by their centralities:
  - Apply the centrality measure so that you have a vector of values for each vertex.
  - With that vector selected in the drop-down menu, select `Vector > Make Permutation`.
  - With that permutation selected in the permutation drop-down menu, select `Operations > Reorder > Network`. This will create a new network.
  - Re-calculate the centrality for the ordered network. Click on the `edit` button next to the new centrality vector. Now the vertices are ordered from least to most central, so scroll to the bottom to get the top 5 (include the list).
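
As a cross-check on the clustering and ranking steps above, the same quantities can be computed on a small example by hand or in a few lines of code. This is a sketch only: the edge list is made up, and `clustering_cc1` assumes CC1 means the usual local clustering coefficient over a vertex's direct neighbours, which should be verified against Pajek's own output. The ranking keeps the original 1-based vertex numbers, so no reordering of the network is needed.

```python
def build_adjacency(edges, n):
    """Undirected adjacency sets for vertices numbered 1..n."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def clustering_cc1(adj):
    """Local clustering: links among a vertex's neighbours / possible links."""
    cc = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            cc[v] = 0.0  # undefined for degree < 2; reported as 0 here
            continue
        # Each link between two neighbours is seen from both ends.
        links = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
        cc[v] = 2 * links / (k * (k - 1))
    return cc

def top_k(scores, k=5):
    """(vertex number, score) pairs for the k highest scores, original numbering."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Hypothetical network: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(edges, n=4)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
cc = clustering_cc1(adj)
print(top_k(degree, k=2))
print(cc)
```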

- In the citation network, compute the *indegree* and *proximity prestige* of each author.
  - For proximity prestige, you take the size of the input domain of the vertex (everyone who cites that person directly or indirectly) and divide it by the average distance to those vertices. You will use `Net > Partitions > Domain > Input`.
  - This will produce two things: a partition with the size of the input domain of each vertex, and a vector of average distances to vertices in the input domain.
  - Create a vector from the input domain size partition: `Partition > Make Vector`.
  - Then select the second drop-down menu for the vector to be the average distance.
  - Select `Vectors > Divide First by Second`. This will be the proximity prestige of each vertex.
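
The proximity prestige steps above amount to a breadth-first search along incoming arcs. The sketch below follows the recipe as written (input domain size divided by average distance); the `normalize` flag, which divides the domain size by n - 1 as some textbook definitions do, is an assumption added here, and the arc list is invented for illustration.

```python
from collections import deque

def proximity_prestige(arcs, n, normalize=False):
    """arcs: (citing, cited) pairs over vertices numbered 1..n."""
    preds = {v: [] for v in range(1, n + 1)}
    for src, dst in arcs:
        preds[dst].append(src)

    prestige = {}
    for v in range(1, n + 1):
        # BFS backwards along arcs: who reaches v, and in how many steps?
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for p in preds[u]:
                if p not in dist:
                    dist[p] = dist[u] + 1
                    queue.append(p)
        size = len(dist) - 1  # input domain excludes v itself
        if size == 0:
            prestige[v] = 0.0  # nobody cites v, directly or indirectly
            continue
        avg_distance = sum(dist.values()) / size
        domain = size / (n - 1) if normalize else size
        prestige[v] = domain / avg_distance
    return prestige

# Hypothetical citations: 1 cites 2, 2 cites 3, 4 cites 3.
arcs = [(1, 2), (2, 3), (4, 3)]
raw = proximity_prestige(arcs, n=4)
scaled = proximity_prestige(arcs, n=4, normalize=True)
print(raw)
print(scaled)
```

Whichever variant is used, the ranking of authors is the same, because normalization only divides every domain size by the same constant.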

- Look for the highest correlation between a centrality measure for the co-authorship network and a prestige measure (indegree or proximity prestige) for the citation network. Please give all pairwise correlations. Which two measures are the most correlated? Interpret. (Caution! Make sure that you are using the centrality/prestige measures with the original vertex ordering, and then find the correlations.)
  - Select a centrality measure as the first vector in the vector drop-down menu.
  - There is a second drop-down menu right below it; select a prestige measure.
  - Select `Vectors > Info`. This will give you the *Pearson correlation coefficient*.
  - Make sure that the measures were applied to the original ordering of the vertices, so that you are correlating values for the same vertex.
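
To see why the vertex ordering matters for the correlation step, here is a small sketch of the Pearson coefficient computed by hand. The two vectors are made-up values, not results from the ACL networks; entry i of each list is assumed to belong to the same vertex.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long vectors in the same vertex order."""
    if len(x) != len(y):
        raise ValueError("vectors must have one value per vertex")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical values for four vertices, in original vertex order.
centrality = [1, 2, 3, 4]
prestige = [2, 4, 6, 8]

aligned = pearson(centrality, prestige)
# The same prestige values after a reordering that reverses the vertices.
misaligned = pearson(centrality, list(reversed(prestige)))
print(aligned, misaligned)
```

The values in the second call are identical to those in the first; only their assignment to vertices changed, which is exactly what happens when one vector comes from a reordered network.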

- [1 bonus] Finally, load the file CitationNetWoCoauthors.net. This is the citation network with citations between co-authors removed (the reason being that an author may be citing their own paper and in the process citing their co-authors). We're trying to get a more "unbiased" prestige measure where we don't take direct citations by co-authors into account. Compare the density of this network with the complete author citation network. What percentage of the citation edges was from co-authors?
- [2 bonus] Perform the same analysis on the whole graph obtained from http://clair.eecs.umich.edu/aan/download.php.
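
For the density comparison in the first bonus item, note that both citation networks share the same vertex set, so the ratio of their densities equals the ratio of their arc counts. A minimal sketch of that arithmetic follows; the arc counts are placeholders rather than the real values, and the `loops` flag reflects an assumption that the density may be reported either with or without self-loops counted as possible arcs.

```python
def density(num_arcs, n, loops=False):
    """Directed density: arcs present / arcs possible among n vertices."""
    possible = n * n if loops else n * (n - 1)
    return num_arcs / possible

def percent_removed(density_full, density_reduced):
    """Share of arcs that disappeared, given densities over the same vertices."""
    return 100 * (1 - density_reduced / density_full)

n = 1559                 # number of vertices stated in the assignment
arcs_full = 50_000       # placeholder, not the real arc count
arcs_reduced = 40_000    # placeholder, not the real arc count

d_full = density(arcs_full, n)
d_reduced = density(arcs_reduced, n)
print(round(percent_removed(d_full, d_reduced), 2))
```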

Submission of your homework is via WebCampus. You must submit all the required files in a single document containing all the answers.

Acknowledgement: The assignment is modified from Lada Adamic.