02. Importing data for social network analysis

** This page is now updated for igraph version 0.6.3 in R 2.15.2 (Feb 2, 2013)

In both statnet and igraph, you start by importing a dataset, which you convert into either a "network object" (statnet) or an "igraph object" (igraph). Basically, this just tells R to treat a given set of numbers as something to be manipulated in the network analysis framework.

There are several formats in which you can save data for use in social network analysis: (1) adjacency matrices, (2) edge lists, (3) adjacency lists (or node lists), and (4) affiliation matrices (incidence matrices). I will talk about affiliation matrices in a separate section on bipartite networks.

(1) An adjacency matrix is a matrix in which the rows and columns represent different nodes or vertices (i.e., the dots on the sociogram). In an unweighted adjacency matrix, the edges (i.e., lines) are represented by 0 or 1, with 1 indicating that the two nodes are connected. If two nodes are connected, they are said to be adjacent (hence the name, adjacency matrix). In a weighted matrix, however, the cells can take other values, indicating different edge qualities (or tie strengths). Here is an example of a hypothetical adjacency matrix of birds with 5-digit ID numbers.

      23732 23778 23824 23871 58009 58098 58256
23732     0     1     0     1     0     1     0
23778     1     0     1     1     0     1     0
23824     0     1     0     0     0     0     0
23871     1     1     0     0     1     1     0
58009     0     0     0     1     0     1     0
58098     1     1     0     1     1     0     1
58256     0     0     0     0     0     1     0
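
As a quick sanity check, the matrix above can also be typed straight into R and inspected before handing it to either package. This is only a sketch in base R; the object names (ids, m, is_symmetric, n_edges, degree) are my own choices for illustration.

```r
# Node IDs from the example above, kept as character strings
ids = c("23732","23778","23824","23871","58009","58098","58256")

# The 7 x 7 adjacency matrix from the example, entered row by row
m = matrix(c(0,1,0,1,0,1,0,
             1,0,1,1,0,1,0,
             0,1,0,0,0,0,0,
             1,1,0,0,1,1,0,
             0,0,0,1,0,1,0,
             1,1,0,1,1,0,1,
             0,0,0,0,0,1,0),
           nrow=7,byrow=TRUE,dimnames=list(ids,ids))

is_symmetric = isSymmetric(m) # an undirected network should give a symmetric matrix
n_edges = sum(m)/2 # each tie appears twice (once per direction), so halve the total
degree = rowSums(m) # number of birds each bird is connected to
```

If is_symmetric comes back FALSE for data you believe to be undirected, it is worth checking the file for typos before building the network.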

You can import an adjacency matrix into a usable format in either statnet or igraph using this sample code (note that this is for an unweighted matrix):

in igraph:

library(igraph) # This loads the igraph package
dat=read.csv(file.choose(),header=TRUE,row.names=1,check.names=FALSE) # choose an adjacency matrix from a .csv file
m=as.matrix(dat) # coerces the data set as a matrix
g=graph.adjacency(m,mode="undirected",weighted=NULL) # this will create an 'igraph object'

In igraph version 0.6, the way the igraph object is displayed has changed. After running the above code, typing g at the prompt will return:

IGRAPH UN-- 7 10 --
+ attr: name (v/c)

The letters on the first line (there can be up to 4) indicate some basic information about the graph. The first letter indicates whether this is a directed ('D') or undirected ('U') graph. The 2nd letter tells you if this is a named ('N') graph, i.e., whether or not the vertex set has a 'name' attribute. The 3rd letter tells you if this graph is weighted ('W'). The fourth letter is 'B' for bipartite graphs. A dash in any of these positions means that the property is absent. These letter codes are followed by two numbers: the first is the number of vertices and the second is the number of edges.
The second line gives you information about the 'attributes' associated with the graph. In this case, there is only one attribute, called 'name', which is associated with the vertex set (v) and is of character type (c).
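
The same information can be pulled out of the graph object with functions instead of being read off the printed summary. The sketch below rebuilds the example graph from a matrix typed in by hand (so it runs without a file) and uses the igraph function names from this page; more recent igraph releases also offer underscore-style names such as graph_from_adjacency_matrix(), so adjust to your installed version.

```r
library(igraph)

# The example adjacency matrix, typed in by hand
ids = c("23732","23778","23824","23871","58009","58098","58256")
m = matrix(c(0,1,0,1,0,1,0,
             1,0,1,1,0,1,0,
             0,1,0,0,0,0,0,
             1,1,0,0,1,1,0,
             0,0,0,1,0,1,0,
             1,1,0,1,1,0,1,
             0,0,0,0,0,1,0),
           nrow=7,byrow=TRUE,dimnames=list(ids,ids))

g = graph.adjacency(m,mode="undirected",weighted=NULL)

n_vertices = vcount(g) # the first number on the summary line
n_edges = ecount(g) # the second number on the summary line
vertex_names = V(g)$name # the 'name' vertex attribute listed on the second line
```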

equivalent in statnet:

library(statnet) # This loads the statnet package

el=read.csv(file.choose(),header=TRUE,row.names=1,check.names=FALSE) # choose an adjacency matrix from a .csv file
m=as.matrix(el) # This coerces the object into a matrix, just in case
net=network(m,matrix.type="adjacency",directed=FALSE) # This converts the matrix into an undirected "network object"


(2) An edge list is a two-column list of the two nodes that are connected in a network. Here's the same data as above, but in an edge list form.

V1 V2
23732 23778
23732 23871
23732 58098
23778 23824
23778 23871
23778 58098
23871 58009
23871 58098
58009 58098
58098 58256
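
If you already have the adjacency matrix in R, you can produce this kind of edge list yourself. The following is a rough sketch in base R, assuming an unweighted, undirected matrix; the object names (idx, el) are just for illustration. It looks only at the cells above the diagonal so that each tie is listed once.

```r
# The example adjacency matrix, typed in by hand
ids = c("23732","23778","23824","23871","58009","58098","58256")
m = matrix(c(0,1,0,1,0,1,0,
             1,0,1,1,0,1,0,
             0,1,0,0,0,0,0,
             1,1,0,0,1,1,0,
             0,0,0,1,0,1,0,
             1,1,0,1,1,0,1,
             0,0,0,0,0,1,0),
           nrow=7,byrow=TRUE,dimnames=list(ids,ids))

idx = which(m==1 & upper.tri(m),arr.ind=TRUE) # row/column positions of the 1s above the diagonal
el = cbind(V1=ids[idx[,"row"]],V2=ids[idx[,"col"]]) # look up the IDs at those positions
```

The rows of el come out in a different order from the list above, but they describe the same set of ties.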

**You can also easily input edgelists in igraph or statnet. HOWEVER, there is one major annoyance with statnet (and with graph.edgelist() in igraph when it is given a numeric matrix): numeric IDs are read as vertex index numbers rather than as names. Inputting this list directly will therefore produce a network with 58256 nodes (the highest ID number), most of which will not be connected. The code provided below gets around the problem, but it was very annoying at first. You will not have this problem if your node IDs are not numerical.
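
To see where the 58256 comes from, here is a small sketch in base R that contrasts the largest ID with the number of distinct IDs in the example edge list. The object names are mine, and the edge list is typed in by hand rather than read from a file.

```r
# The example edge list with the IDs stored as numbers
el_num = matrix(c(23732,23778,
                  23732,23871,
                  23732,58098,
                  23778,23824,
                  23778,23871,
                  23778,58098,
                  23871,58009,
                  23871,58098,
                  58009,58098,
                  58098,58256),
                ncol=2,byrow=TRUE)

n_if_indices = max(el_num) # network size if the IDs are taken as vertex index numbers
ids = sort(unique(as.vector(el_num))) # the distinct bird IDs
n_actual = length(ids) # number of birds actually present

el_chr = matrix(as.character(el_num),ncol=2) # the same edge list with the IDs as character strings
el_idx = matrix(match(el_num,ids),ncol=2) # alternative: recode the IDs as 1 to 7
```

Here n_if_indices is 58256 while n_actual is 7, which is exactly the mismatch described above.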

The simplest workaround for this is to make sure that the IDs are read as characters:

in igraph:

dat=read.csv(file.choose(),header=TRUE,colClasses="character") # choose an edgelist in .csv file format, reading the IDs as character strings
el=as.matrix(dat) # coerces the data into a two-column matrix format that igraph likes
g=graph.edgelist(el,directed=FALSE) # turns the edgelist into a 'graph object'

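**A much easier way to import an edge list in igraph is to use the graph.data.frame() function, which treats the first two columns as vertex names:

g=graph.data.frame(dat,directed=FALSE) # builds the 'graph object' directly from the data frame read in above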

in statnet:

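el=as.matrix(read.csv(file.choose(),header=TRUE,colClasses="character")) # read a .csv file, keeping the IDs as character strings, and coerce it into a two-column matrix
net=network(el,matrix.type="edgelist",directed=FALSE) # converts the edgelist into an undirected "network object"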

(3) An adjacency list, also known as a node list, presents the 'focal' node in the first column, and then all the other nodes that are connected to it (i.e., adjacent to it) in the columns to the right of it.

Focal  Interactor 1  Interactor 2  Interactor 3  Interactor 4  Interactor 5
23732  23778         23871         58098
23778  23732         23824         23871         58098
23824  23778
23871  23732         23778         58009         58098
58009  23871         58098
58098  23732         23778         23871         58009         58256
58256  58098

This is sometimes a convenient way to organize data from the field. However, the major downside is that it is not very easy to import data in this format into igraph or statnet. igraph DOES support importing an adjacency list into a graph object, but you need the data to be in a specific format (a 'list object', with each row as a different vector), and I can't get this to work. The best way to deal with it seems to be to use the code below to convert the adjacency list into an edgelist. I'm going to assume the data is in the format shown above, with the first row being column names, and saved as a .csv file:

lines=scan(file.choose(),what="character",sep="\n",skip=1) # read the csv file (skipping the header), line-by-line as character string.
lines=gsub(","," ",lines) # replace commas with spaces
lines=gsub("[ ]+$","",gsub("[ ]+"," ",lines)) # remove trailing and multiple spaces.
adjlist=strsplit(lines," ") # splits the character strings into list with different vector for each line
col1=unlist(lapply(adjlist,function(x) rep(x[1],length(x)-1))) # establish first column of edgelist by replicating the 1st element (=ID number) by the length of the line minus 1 (itself)
col2=unlist(lapply(adjlist,"[",-1)) # establish second column of edgelist: "[" is R's subsetting function, so this drops the 1st element (=focal ID) from each line and stacks the remaining ID numbers into one long vector
el=cbind(col1,col2) # creates the edgelist by combining column 1 and 2.

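Now you have an edgelist, saved as "el", that you can import into igraph or statnet using the code provided above in the "edgelist" section. Note that every tie appears twice in "el" (once under each of its two nodes), so in igraph you may want to run simplify() on the resulting graph to collapse the duplicate edges.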
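
To try the conversion without a file, here is a sketch that runs the same steps on the example adjacency list typed in as text (written the way the lines would look in a .csv file, with empty cells as trailing commas). The last two lines are my own addition, not part of the code above: they drop the second copy of each tie, since an undirected adjacency list records every tie under both of its nodes.

```r
# The example adjacency list as it would appear in a .csv file, header already skipped
lines = c("23732,23778,23871,58098,,",
          "23778,23732,23824,23871,58098,",
          "23824,23778,,,,",
          "23871,23732,23778,58009,58098,",
          "58009,23871,58098,,,",
          "58098,23732,23778,23871,58009,58256",
          "58256,58098,,,,")

lines = gsub(","," ",lines) # replace commas with spaces
lines = gsub("[ ]+$","",gsub("[ ]+"," ",lines)) # remove trailing and multiple spaces
adjlist = strsplit(lines," ") # one character vector per line
col1 = unlist(lapply(adjlist,function(x) rep(x[1],length(x)-1))) # focal ID, repeated once per neighbor
col2 = unlist(lapply(adjlist,"[",-1)) # the neighbors themselves
el = cbind(col1,col2) # every tie is listed once from each end

key = apply(el,1,function(p) paste(sort(p),collapse="-")) # order-independent label for each tie
el_unique = el[!duplicated(key),,drop=FALSE] # keep the first copy of each tie only
```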


... and here's some code to convert an adjacency list into an adjacency matrix, written by Dave McDonald.
**Note: I have received some feedback indicating that the following code produces errors when copied & pasted. I'm not sure if it's something wonky with how the text appears on the website... In any case, I now provide a text file you can download at the bottom of the page, just in case.

dat=scan(file.choose(), what="character",sep="\n")
# scan in a file with IDs as character strings (even if numeric), and with fields separated by commas; sample input file is "sample_adjlist.txt". Every line is read as a node, so the file should not have a header row
dym=length(dat)
# stores the number of lines in the file
dat=gsub(","," ",dat)
# convert commas to spaces (in case the input file lacks any spaces after a comma)
dat=gsub("[ ]+$","",gsub("[ ]+"," ",dat))
# remove trailing and multiple spaces
m=matrix(0,dym,dym)
# create a dymXdym matrix of zeros
adjlist=strsplit(dat," ")
# split the file into a list of length "dym" with distinct elements. N.B. adjlist[[1]][1] calls the first element in the vector of characters in adjlist

ids=adjlist[[1]][1]
# creates ids of type character to initiate header list of node IDs
for (i in 2:dym) {ids=c(ids,adjlist[[i]][1])}
# creates a character object with the list names
for (i in 1:dym) {adjlist[[i]]=adjlist[[i]][-1]}
# strips the first column (which is now in "ids") from adjlist; using [-1] also works for a node with no neighbors

tfmat=adjlist
# copies adjlist into tfmat
for(i in 1:dym) {tfmat[[i]]<-ids %in% tfmat[[i]]}
# converts tfmat into lists of T-F of length dym of which elements of adjlist[[i]] occur in the ids header

for(j in 1:dym){for(i in 1:dym){if(tfmat[[j]][i]){m[j,i]<-1}}}
# turns cells of matrix m, corresponding to cells in tfmat having value T, into 1s

adjmat=as.data.frame(m,row.names=ids)
# creates a dataframe "adjmat" with 0/1 values from m and row names from "ids"
names(adjmat)=ids
# changes header names to the "ids" vector of ID values
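
For comparison, here is a shorter sketch of the same idea that leans on R's indexing by row and column names. This is not Dave McDonald's code, just an alternative under the assumption that the adjacency list has already been turned into a named list (called adj here, a name I made up), with one vector of neighbors per focal node.

```r
# The example adjacency list as a named list: focal ID = vector of neighbor IDs
adj = list("23732"=c("23778","23871","58098"),
           "23778"=c("23732","23824","23871","58098"),
           "23824"=c("23778"),
           "23871"=c("23732","23778","58009","58098"),
           "58009"=c("23871","58098"),
           "58098"=c("23732","23778","23871","58009","58256"),
           "58256"=c("58098"))

ids = names(adj) # the focal IDs become the row and column names
m = matrix(0,length(ids),length(ids),dimnames=list(ids,ids)) # start from a matrix of zeros
for (id in ids) {m[id,adj[[id]]] = 1} # put a 1 in the focal node's row under each of its neighbors
```

The resulting m can be passed to the adjacency matrix code in section (1).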

(4) An affiliation network is one in which edges are defined by co-membership in groups. Many social networks are defined this way. Conceptually, this ends up being a form of bipartite graph. Thus, I will discuss these at length in a separate page called "affiliation/bipartite networks".
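
Just to make "edges defined by co-membership" concrete, here is a tiny sketch in base R. The affiliation matrix is entirely made up for illustration (four of the bird IDs and three hypothetical flocks), and multiplying the matrix by its transpose is only one common way to get from group membership to ties.

```r
# Hypothetical affiliation (incidence) matrix: rows are birds, columns are flocks
birds = c("23732","23778","23824","23871")
flocks = c("flockA","flockB","flockC")
aff = matrix(c(1,1,0,
               1,0,0,
               0,1,1,
               0,0,1),
             nrow=4,byrow=TRUE,dimnames=list(birds,flocks))

co = aff %*% t(aff) # number of flocks shared by each pair of birds
diag(co) = 0 # a bird sharing flocks with itself is not an edge
adj = (co>0)*1 # unweighted adjacency matrix: 1 if two birds share at least one flock
```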
Dai Shizuka,
Aug 2, 2011, 2:13 PM