
Health Psychol Behav Med
v.6(1); 2018

Network analysis: a brief overview and tutorial

David Hevey

School of Psychology, Trinity College Dublin, Dublin, Ireland

Objective: The present paper presents a brief overview of network analysis as a statistical approach for health psychology researchers. Networks comprise graphical representations of the relationships (edges) between variables (nodes). Network analysis provides the capacity to estimate complex patterns of relationships, and the network structure can be analysed to reveal core features of the network. This paper provides an overview of networks, how they can be visualised and analysed, and presents a simple example of how to conduct network analysis in R using data on the Theory of Planned Behaviour (TPB).

Method: Participants (n = 200) completed a TPB survey on regular exercise. The survey comprised items on attitudes, normative beliefs, perceived behavioural control, and intentions. Data were analysed to examine the network structure of the variables. The EBICglasso was applied to the partial correlation matrix.

Results: The network structure reveals the variation in relationships between the items. The network split into three distinct communities of items. The affective attitude item was the central node in the network. However, replication of the network in larger samples to produce more stable and robust estimates of network indices is required.

Conclusions: The reported network reveals that the affective attitudinal variable was the most important node in the network; therefore, interventions could prioritise changing emotional responses to exercise. Network analysis offers the potential for insight into structural relations among core psychological processes to inform health psychology science and practice.

Introduction

Health psychology research examines how complex interactions between biological, psychological, and social factors influence health and well-being. For example, the UK Foresight map of obesity (see https://www.gov.uk/government/collections/tackling-obesities-future-choices) provides a comprehensive representation of the complex system of over 300 relationships between over 100 variables and obesity (Finegood, Merth, & Rutter, 2010). The developers of the map assumed that obesity results from the interplay between a wide variety of factors, including a person’s physical make-up, eating behaviour, and physical activity pattern. The system reflects the relevant factors, and their interdependencies, that produce obesity as a behavioural outcome. The variables were classified into various categories of causal factors: for example, social psychological factors (e.g. peer pressure), individual psychological factors (e.g. stress), environmental factors (e.g. the extent to which one’s environment makes it easy to engage in regular walking), and individual physical activity factors (e.g. functional fitness). On the basis of expert academic opinion, the Foresight report authors proposed that the variables in the system not only influence obesity but also affect each other: these effects can be positive (e.g. high levels of stress cause high levels of alcohol consumption) or negative (e.g. high levels of stress cause low levels of physical activity); some are distal whereas others are proximal; and they can be unidirectional (e.g. social attitudes towards fatness cause conceptualisations of obesity as an illness) or reciprocal (e.g. physical activity causes functional fitness, which causes physical activity). Networks are a fundamental characteristic of such complex systems; consequently, health psychological science can benefit from considering the network structure of the phenomena that it seeks to understand.

It has been argued that networks pervade all aspects of human psychology (Borgatti, Mehra, Brass, & Labianca, 2009), and in the past decade network analysis has become an important conceptual and analytical approach in psychological research. Although network analysis has a long history of being applied in causal attribution research (e.g. Kelly, 1983) and social network analysis (Clifton & Webster, 2017), its broader potential for psychological science was highlighted over a decade ago by van der Maas et al. (2006). The frequently reported pattern of positive correlations between various cognitive tasks (e.g. verbal comprehension and working memory) is typically explained in terms of a dominant latent factor, i.e. the correlations reflect a hypothesised common factor of general intelligence (g). However, van der Maas and colleagues argued that this empirical pattern can also be accounted for by a network approach, wherein the positive relationships are explained using a mutualism model, i.e. the variables have mutual, reinforcing relationships. From a network analysis perspective, the network of relationships between the variables constitutes the psychological phenomenon (De Schryver, Vindevogel, Rasmussen, & Cramer, 2015): it is a system wherein the constituent variables mutually influence each other without the need to hypothesise the existence of causal latent variables (Schmittmann et al., 2013). In addition to addressing psychometric issues (Epskamp, Maris, Waldorp, & Borsboom, In Press), network perspectives can inform other areas of psychological science.

A key impetus for current research on networks in psychology derives from Borsboom and colleagues’ influential application of networks to psychopathology symptoms in clinical psychology (e.g. Borsboom, 2017; Borsboom & Cramer, 2013; Cramer et al., 2016; Cramer, Waldorp, van der Maas, & Borsboom, 2010). Network models are also increasingly applied in other areas such as health-related quality of life (HRQOL) assessment in health psychology (e.g. Kossakowski et al., 2016), personality (e.g. Costantini et al., 2015; Mõttus & Allerhand, 2017), and attitudes (e.g. Dalege et al., 2015). The psychosystems research team (i.e. Denny Borsboom, Angélique Cramer, Sacha Epskamp, Eiko Fried, Don Robinaugh, Claudia van Borkulo, Lourens Waldorp, Han van der Maas) are critical innovators of network analysis in psychology, and this paper draws extensively on key papers from the team and their collaborators; the psychosystems.org webpage is an essential resource for anyone interested in network analysis theory, process and applications.

To date, network analysis has not been widely applied in health psychology; however, network models are particularly salient for health psychology because many of the psychological phenomena we seek to understand are theorised to depend upon a large number of variables and the interactions between them. The biopsychosocial model (e.g. Engel, 1980) has underpinned health psychology research and theory for the past four decades, and it reflects a complex system of mutually interacting and dynamic biological, psychological, interpersonal, and contextual effects on health (Lehman, David, & Gruber, 2017; Suls & Rothman, 2004). From a network perspective, health behaviours and outcomes can be conceptualised as emergent phenomena arising from a system of reciprocal interactions, and network analysis offers a powerful methodological approach to investigate the complex patterns of such relationships. The overall global structural organisation, or topology, of the phenomenon and the roles played by specific variables in the network can be analysed in a manner that other statistical approaches cannot provide. In general, health psychology research, like many areas of psychology, has studied aspects of systems in isolation: for example, using regression models to examine the relationship between focal beliefs and moods and a specific outcome such as health behaviours or adaptation to illness. Although such research provides important insights, this approach is not suited to examining complex systems of interconnected variables, and it does not readily allow separate findings on discrete components or sub-pathways to be pieced back together into the more complex and complete system. As noted above, the complex interplay of physiological, psychological, social and environmental factors has been highlighted in the context of obesity; comparable exercises for other chronic illnesses would likely produce similarly complex networks of variables. Network analysis provides a means to understand system-level relationships in a manner that can enhance psychological science and practice.

Health psychology research often focuses on HRQOL as a key outcome variable, and HRQOL is frequently understood as being the common effect of the observed items in scales (e.g. increased daily pain causes lower mental health). Network analysis has been applied to the SF-36 (Ware & Sherbourne, 1992), a widely used HRQOL scale, to examine the patterns of relationships between the items: Kossakowski et al. (2016) found that the observed covariances between the items may result largely from direct interactions between items. From this perspective, HRQOL emerges from a network of mutually interacting characteristics; the specific nature of the interacting relationships (e.g. causal effect, bidirectional effect, or effects of unmodelled latent variables) requires further clarification. In addition to offering novel insights into psychometrics, a network approach can be applied to other important health psychology variables (e.g. illness representations, coping strategies) to better understand the nature of the relationships between items used in measurement.

Borsboom’s research on the patterns of interconnected relationships between symptoms of various psychiatric disorders has resulted in the development of a novel network theory of mental disorders (Borsboom, 2017). This theory provides new insights into how trigger events can activate pathways in strongly connected networks to produce symptoms that can become self-sustaining: because the symptoms are strongly connected, feedback relations between them mean that they can continue to activate each other after the triggering event has been removed. The absence of the trigger may not be sufficient to de-activate the symptom network and return the person to a state of health. Such insights from a network theory of psychopathology can inform not only our understanding of how and why symptoms are maintained, but also how such networks can be targeted to help transition the network back into a healthy state. Of note, such an approach may benefit health psychology research on clusters of symptom presentations over time in conditions such as chronic pain and chronic fatigue syndrome.

The network structures of individuals can be visualised and analysed; consequently, we may be able to see how a person’s beliefs, emotional states, behaviours and symptoms influence each other over time. Systems might comprise sets of variables that are diverse and only marginally connected, or could consist of variables that are highly interconnected. Understanding an individual’s personalised network may provide insight into when that individual’s specific patterns of beliefs and behaviours reach a tipping point that then negatively impacts on mood and symptoms. Such system transitions (e.g. moving from a state of wellness to being functionally impaired) may occur gradually in response to changing conditions, or they may be triggered by an external perturbation, e.g. a life stressor. One individual may have a very robust network that remains stable despite perturbations (e.g. a symptom flare-up), so that the person can maintain function, whereas another may have a less resilient network in which disturbed equilibrium is difficult to restore. How such networks evolve over time and respond to changes in key and peripheral variables cannot be understood using traditional analytical methods; network analysis offers rich potential to further our understanding of complex systems of relationships among variables.

The Causal Attitude Network (CAN) model, which conceptualises attitudes as networks of causally interacting evaluative reactions (i.e. beliefs, feelings, and behaviours towards an attitude object; Dalege et al., 2015), is also of particular interest to health psychologists given the centrality of attitudinal variables in many core psychological models (e.g. Theory of Planned Behaviour, Health Belief Model). The capacity to visualise complex patterns of relationships graphically further offers the potential for insight into salient psychological processes and can highlight theoretical gaps. For example, Langley, Wijn, Epskamp, and Van Bork (2015) used network analysis to examine the Health Belief Model (HBM) variables in relation to girls’ intentions to obtain HPV vaccination. They reported that although some aspects of the HBM (e.g. perceived efficacy) were related to intentions, other core constructs such as cues to action were less relevant. In addition, social factors, currently not included in the HBM, were important in the network; such research can inform conceptual developments linking individual beliefs with social context to better understand health behaviours. Consequently, the network approach offers the potential to gain novel insights, as the network structure can be analysed to reveal both core structural and relational features.

The aim of this paper is to provide an overview of networks, how they can be visualised and analysed, and to present a simple example of how to conduct network analysis on empirical data in R (R Core Team, 2017).

What is a network?

At an abstract level, a network is a structure comprising variables, which are represented by nodes, and the relationships (formally called edges) between these nodes. For example, in the Foresight map, variables such as stress, peer pressure, functional fitness, and the nutritional quality of food and drink are nodes in the network, and the positive and negative relationships between those nodes are edges. Nomenclature varies in the network literature: nodes are sometimes referred to as vertices, edges as links, and networks as graphs. Networks can be estimated from cross-sectional or longitudinal time-series data; in addition, networks can be analysed at the group or individual level. Cross-sectional data from a group can reveal group-level conditional independence relationships (e.g. Rhemtulla et al., 2016). Individualised networks based on time-series data can provide insights into a specific individual over time (e.g. Kroeze et al., 2017). Furthermore, the networks produced by different populations can be compared. In general, network analysis encompasses a wide range of analytical techniques for examining different network models.

In psychological networks, nodes represent various psychological variables (e.g. attitudes, cognitions, moods, symptoms, behaviours), while edges represent unknown statistical relationships (e.g. correlations, predictive relationships) that can be estimated from the data. A node can represent a single item from a scale, a sub-scale, or a composite scale: the choice of node depends upon the type of data that provide the most appropriate and useful understanding of the questions to be addressed. Edges can represent different types of relationships, e.g. co-morbidity of psychological symptoms, correlations between attitudes.

Two types of edges can be present in a network: (1) a directed edge, where the nodes are connected and one end of the edge has an arrowhead indicating a one-way effect; or (2) an undirected edge, where the nodes have a connecting line indicating some mutual relationship but no arrowheads to indicate the direction of effect. Networks can be described as directed (i.e. all edges are directed) or undirected (i.e. no edges are directed). In psychology networks, edge direction has been used particularly to represent cross-lagged relationships among variables (Bringmann et al., 2016). A directed network can be cyclic (i.e. we can follow the directed edges from a given node and end up back at that node) or acyclic (i.e. we cannot start at a node and return to it by following the directed edges).
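Whether a directed network is cyclic can be checked mechanically. The following is a minimal Python sketch, not code from this paper (whose analyses use R): a hypothetical directed network is stored as a dictionary mapping each node to the nodes its arrows point to, and a depth-first search reports whether following the arrows can ever lead back to a node already on the current path.

```python
from typing import Dict, List


def is_cyclic(edges: Dict[str, List[str]]) -> bool:
    """Return True if the directed network contains at least one directed cycle.

    `edges` maps each node to the nodes its outgoing (directed) edges point to.
    Depth-first search marks nodes as unvisited, on the current path, or done;
    reaching a node that is still on the current path means the arrows lead
    back to where we started, i.e. a cycle.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    # Collect every node, including nodes that only appear as edge targets.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    state = {node: UNVISITED for node in nodes}

    def visit(node: str) -> bool:
        state[node] = ON_PATH
        for nxt in edges.get(node, []):
            if state[nxt] == ON_PATH:
                return True  # back edge: a directed path returns to this node
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[n] == UNVISITED and visit(n) for n in sorted(nodes))


# Hypothetical examples built from variables mentioned in the text.
print(is_cyclic({"attitude": ["behaviour"], "behaviour": ["attitude"]}))  # True
print(is_cyclic({"stress": ["alcohol use", "physical inactivity"]}))      # False
```

A reciprocal attitude-behaviour relationship makes the network cyclic, whereas the stress example has no directed path back to its starting node.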

Directed networks can represent causal structures (Pearl, 2000); however, such directed networks rest on very strict assumptions, i.e. all the variables that have a causal effect are measured in the network, and the chain of cause and effect is not cyclic (i.e. a variable cannot cause itself via any path) (Epskamp, Borsboom, & Fried, 2018a). Although Directed Acyclic Graphs (DAGs) have been frequently reported in the epidemiological research literature over the past two decades (Greenland, Pearl, & Robins, 1999), the acyclic assumption may be untenable in many psychological contexts. For example, in many psychological phenomena reciprocal effects may exist between variables: having a positive attitude towards a behaviour results in that behaviour, which then results in a more positive attitude. In addition, directed networks suffer from a problem similar to that arising in Structural Equation Modelling: many equivalent models can account for the pattern of relationships found in the data (Bentler & Satorra, 2010; MacCallum, Wegener, Uchino, & Fabrigar, 1993). In their recent review of the challenges for network theory and methodology in psychopathology, Fried and Cramer (2017) note that despite the plausibility of many causal psychopathological symptom pathways in networks, there is a need to build stronger cases for the causal nature of these relationships. They highlight that many network papers have estimated undirected networks in cross-sectional data, and that even those that use directed networks based on time-series data at best show that a variable measured at one moment in time can predict another variable at a later measurement occasion (Granger causality; Granger, 1969), which satisfies the requirement that putative causes precede their effects (Epskamp et al., 2018b). Although such a temporal relationship may indicate a causal relationship, the link may occur for other reasons (e.g. a unidimensional autocorrelated factor model would lead to every variable predicting every other variable over time; Epskamp et al., 2018b). Spirtes, Glymour, and Scheines (2000) developed the PC algorithm, which can be used to search for candidate causal structures that may have generated the observed patterns of relations. However, such approaches have not been widely used to date in psychological networks. In general, network analysis can be considered hypothesis-generating with respect to putative causal structures, which require empirical validation.

Edges convey information about the sign (direction) and strength of the relationship between the nodes. An edge may be positive (e.g. a positive correlation/covariance between variables) or negative (e.g. a negative correlation/covariance between variables); this polarity is represented graphically using different coloured lines: positive relationships are typically coloured blue or green, and negative relationships red. Edges can be either weighted or unweighted. A weighted edge reflects the strength of the relationship between nodes by varying the thickness and colour density of the edge connecting them: thicker, more densely coloured lines indicate stronger relationships. Alternatively, an edge may be unweighted and simply represent the presence vs. absence of a relationship; in such a network, the absence of a relationship results in the nodes not having a connecting edge.
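The drawing conventions just described amount to a simple mapping from an edge weight to line attributes. The Python sketch below is purely illustrative: the function name `edge_style`, the width range, and the use of an opacity value ("alpha") for colour density are assumptions, not the settings of any particular plotting package.

```python
def edge_style(weight: float, max_abs_weight: float, weighted: bool = True) -> dict:
    """Map one edge weight (e.g. a partial correlation) to illustrative drawing attributes.

    Positive edges are green, negative edges red. In a weighted network, line
    width and colour density (opacity, 'alpha') grow with |weight| relative to
    the strongest edge; in an unweighted network every present edge looks the
    same. max_abs_weight must be > 0.
    """
    if weight == 0:
        return {"draw": False}  # no relationship: no connecting edge
    colour = "green" if weight > 0 else "red"
    if not weighted:
        return {"draw": True, "colour": colour, "width": 1.0, "alpha": 1.0}
    relative = abs(weight) / max_abs_weight  # 1.0 for the strongest edge
    return {
        "draw": True,
        "colour": colour,
        "width": 1.0 + 4.0 * relative,  # thicker line for a stronger edge
        "alpha": relative,              # denser colour for a stronger edge
    }


# Hypothetical weights; 0.6 is taken as the strongest edge in the network.
print(edge_style(0.6, 0.6))
print(edge_style(-0.15, 0.6))
print(edge_style(-0.15, 0.6, weighted=False))
print(edge_style(0.0, 0.6))
```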

Figure 1 presents a simple network model representing the partial correlation matrix between five variables (A-E) shown in Table 1. The size and colour density of the lines (edges) vary to reflect the varying strength of relationships between the variables; the edges are undirected because the data are represented as partial correlations between pairs of variables. The network comprises both positive (green lines) and negative (red lines) partial correlations between the variables. Some variables are more central and have more connections than others: C relates to all the other variables in the network, whereas D relates to only two other variables.

Figure 1. Sample network with 5 nodes and 8 edges. Positive edges are green and negative edges are red. The numbers represent the partial correlations between the variables.
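The counting behind statements such as "C relates to all the variables" can be made explicit. In this minimal Python sketch the edge list is hypothetical: it is constructed only to match the verbal description of Figure 1 (five nodes, eight edges, C linked to every other node, D linked to two, no A-D edge), and its weights are invented rather than taken from Table 1. Degree counts a node's edges; strength sums the absolute weights of those edges.

```python
from collections import defaultdict

# Hypothetical undirected, weighted edges matching the verbal description of
# Figure 1. The weights are invented for illustration, NOT the Table 1 values.
EDGES = [
    ("A", "B", 0.30), ("A", "C", 0.45), ("A", "E", -0.20), ("B", "C", 0.25),
    ("B", "E", 0.15), ("C", "D", 0.40), ("C", "E", -0.35), ("D", "E", 0.20),
]


def degree_and_strength(edges):
    """Degree = number of edges touching a node; strength = sum of |weight|."""
    degree = defaultdict(int)
    strength = defaultdict(float)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1
            strength[node] += abs(w)  # negative edges count by magnitude
    return dict(degree), dict(strength)


degree, strength = degree_and_strength(EDGES)
for node in sorted(degree):
    print(node, degree[node], round(strength[node], 2))
```

In this invented configuration E also happens to reach every other node; the point carried over from the text is only that C does while D does not.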

Having briefly outlined the basic features of a network, the next sections outline the three core analytical steps in network analysis:

1. Estimate the network structure based on a statistical model that reflects the empirical patterns of relationships between the variables.
2. Analyse the network structure.
3. Assess the accuracy of the network parameters and measures.

1. Estimating the Network

Historically, network science has developed using graphical approaches to represent relationships between nodes. For example, Leonhard Euler’s application of the ‘geometry of position’, Gustav Kirchhoff’s work on the algebra of graphs in relation to electrical networks, and Arthur Cayley’s contributions to molecular chemistry all applied graphical approaches to network data (Estrada & Knight, 2015). A network visually represents the pattern of relationships between variables and can be estimated using common statistical parameters that quantify relationships, e.g. correlations, covariances, partial correlations, regression coefficients, odds ratios, or factor loadings. However, because correlation networks can contain spurious edges, for example due to an (unmeasured) confounding variable, the most common approach in psychology uses partial correlations to estimate the relationships between variables. For example, in a network examining the relationship between a risk behaviour (e.g. caffeine consumption) and a health outcome (e.g. cancer), a correlation would show a relationship between the variables; however, such a relationship may simply reflect the fact that an unmeasured confound (e.g. smoking) is associated with both caffeine consumption and cancer. Partial correlations, similar to multiple regression coefficients, estimate the strength of the relationship between two variables while controlling for the effects of the other measured variables in the network model. It is therefore critically important to measure such potential confounding variables so that their effects are controlled for. Two nodes are connected if there is covariance between them that cannot be explained by any other variable in the network. The resulting partial correlations not only estimate the strength of direct relationships but can also indicate mediation pathways: in Figure 1, A and D are not directly connected (i.e. there is no edge between them), but A is connected to C, which in turn is connected to D; thus C may mediate the relationship between A and D. Partial correlation networks can provide valuable hypothesis-generating structures, which may reflect potential causal effects to be examined further in terms of conditional independence (Pearl, 2000).
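Partial correlations can be obtained from the inverse of the correlation matrix (the precision matrix P): the partial correlation between variables i and j, controlling for all other variables, is -P_ij / sqrt(P_ii P_jj). The minimal Python sketch below (standard library only, so it includes a small matrix inversion) applies this to the caffeine-smoking-cancer example, using hypothetical correlations chosen so that smoking fully accounts for the caffeine-cancer association.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [M | I] and row-reduce the left block to the identity.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """rho_ij given all other variables = -P_ij / sqrt(P_ii * P_jj), P = inverse(corr)."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical correlations, order: caffeine, smoking, cancer.
# 0.20 = 0.50 * 0.40, so smoking fully explains the caffeine-cancer correlation.
R = [[1.00, 0.50, 0.20],
     [0.50, 1.00, 0.40],
     [0.20, 0.40, 1.00]]
P = partial_correlations(R)
print("caffeine-cancer correlation:", R[0][2])
print("caffeine-cancer partial correlation given smoking:", round(P[0][2], 6))
```

The raw caffeine-cancer correlation is 0.20, but once smoking is in the model the partial correlation is zero up to floating-point error, so no caffeine-cancer edge would appear in the network.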

As noted previously, psychology has typically examined undirected network models, and a frequently used model for estimating such networks is the pairwise Markov Random Field (PMRF), a broad class of statistical models. A PMRF is characterised by undirected edges between nodes that indicate conditional dependence relations: an absent edge means that two nodes are conditionally independent given all other nodes in the network, and an edge indicates conditional dependence given all other nodes. Different PMRF models can be used, depending upon the type of data (continuous, ordinal, binary, or mixtures of these) to be modelled. When continuous data are multivariate normally distributed, analysing the partial correlations using the Gaussian graphical model (GGM; Costantini et al., 2015; Lauritzen, 1996) is appropriate. If the continuous data are not normally distributed, a transformation (e.g. the nonparanormal transformation; Liu, Lafferty, & Wasserman, 2009) can be applied before fitting the GGM. The GGM can also be used for ordinal data, in which case the network is estimated from polychoric correlations instead of Pearson correlations (Epskamp, 2018). If all the research variables are binary, the Ising model can be used (van Borkulo et al., 2014). When the data comprise a mixture of categorical and continuous variables, the Mixed Graphical Model can be used to estimate the PMRF (Haslbeck & Waldorp, 2016). Thus, networks can be estimated flexibly from various types of data.
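A transformation toward normality can be sketched with rank-based normal scores: each observation is replaced by the standard-normal quantile of its rank divided by n + 1. This Python sketch is a simplified stand-in for the nonparanormal transformation, not the exact estimator of Liu, Lafferty and Wasserman (2009); the function name and the average-rank treatment of ties are our assumptions.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based normal-scores transform of one variable (simplified sketch).

    Each value is replaced by the standard-normal quantile of rank / (n + 1),
    which removes monotone skew while preserving the ordering of observations.
    Tied values receive the average of their (1-based) ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# A right-skewed variable (hypothetical values).
skewed = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]
print([round(z, 2) for z in normal_scores(skewed)])
```

Applied column by column, this keeps each variable's ordering intact while removing monotone skew; the GGM would then be estimated on the transformed data.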

Network complexity also requires consideration. The more nodes being examined, the more edges have to be estimated: with p nodes there are p(p - 1)/2 unique edges, so a network with five nodes has 10 unique edges, a network with 10 nodes has 45, and a network with 20 nodes has 190. In addition, an Ising model estimates not only edge weights but also thresholds: with 20 nodes, that means an additional 20 parameters. However, as mentioned above, many of these edges (e.g. correlations) may be spurious, and an increase in the number of nodes can lead to over-fitting and very unstable estimates (Babyak, 2004). As with all statistical techniques that use sample data to estimate parameters, correlation and partial correlation values are influenced by sampling variation, so exact zeros will rarely be observed in the matrices. Consequently, correlation networks will nearly always be fully connected, possibly with small weights on many edges that reflect weak and potentially spurious partial correlations. Such spurious relationships are problematic for network interpretation and compromise the potential for network replication. To limit the number of such spurious relationships, a statistical regularisation technique that takes model complexity into account is frequently used.
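The parameter counts above follow from simple arithmetic: p nodes give p(p - 1)/2 possible undirected edges, and an Ising model adds one threshold per node. A short Python check:

```python
def n_edges(p: int) -> int:
    """Number of unique undirected edges among p nodes (p choose 2)."""
    return p * (p - 1) // 2


def n_ising_parameters(p: int) -> int:
    """Edge weights plus one threshold per node in an Ising model."""
    return n_edges(p) + p


for p in (5, 10, 20):
    print(p, n_edges(p), n_ising_parameters(p))
# 5 10 15
# 10 45 55
# 20 190 210
```

The edge count grows roughly with the square of the number of nodes, which is why larger networks quickly outstrip typical sample sizes.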

A ‘least absolute shrinkage and selection operator’ (LASSO; Friedman, Hastie, & Tibshirani, 2008), with a tuning parameter set by the researcher, is applied to the estimation of partial correlation networks. The LASSO performs well in the estimation of partial correlation networks (Fan, Feng, & Wu, 2009), and it shrinks small, weak edge estimates to exactly zero, resulting in a sparse network (Tibshirani, 1996). The LASSO yields a more parsimonious graph (fewer connections between nodes) that reflects only the most important empirical relationships in the data. Of note, the absence of an edge is not evidence that the edge is in fact exactly zero (Epskamp, Kruis, Marsman, & Marinazzo, 2017). The goal of the LASSO is to exclude spurious relationships, but in doing so it may omit actual relationships. Although many variants of the LASSO have been developed, the graphical LASSO (glasso; Friedman et al., 2008) is recommended both for its ease of implementation in specific analysis programmes and for its flexibility with non-continuous data (Epskamp & Fried, In Press). An edge may be absent from the network if the data are too noisy to detect the true relationship, and quantifying evidence for edge weights being zero is an ongoing research issue (Wetzels & Wagenmakers, 2012). Simulation studies show that the LASSO has a low likelihood of false positives, which provides some confidence that an observed edge is indeed present in the network (Krämer, Schäfer, & Boulesteix, 2009). However, the specific nature of the relationship reflected in the edge is still uncertain: the edge could represent a direct causal pathway between nodes, or it could reflect the common effect of a (latent) variable not included in the network model.
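The way the LASSO sets small estimates to exactly zero can be seen in its soft-thresholding operator, sign(x) max(|x| - λ, 0), which is the shrinkage step used inside coordinate-descent LASSO solvers. The Python sketch below shows only that operator applied to a few hypothetical edge estimates; it is not the full graphical LASSO, which solves a penalised likelihood for the whole precision matrix iteratively.

```python
def soft_threshold(estimate: float, lam: float) -> float:
    """LASSO soft-thresholding: sign(x) * max(|x| - lam, 0).

    Estimates with |x| <= lam become exactly zero; larger ones are pulled
    towards zero by lam.
    """
    if estimate > lam:
        return estimate - lam
    if estimate < -lam:
        return estimate + lam
    return 0.0


# Hypothetical unregularised edge estimates: the small ones drop out entirely.
raw = {"A-B": 0.42, "A-C": -0.05, "B-C": 0.08, "C-D": -0.31}
shrunk = {edge: round(soft_threshold(w, 0.10), 2) for edge, w in raw.items()}
print(shrunk)  # {'A-B': 0.32, 'A-C': 0.0, 'B-C': 0.0, 'C-D': -0.21}
```

The surviving edges are also shrunk, which is why LASSO edge weights tend to be somewhat smaller in magnitude than their unregularised counterparts.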

As mentioned previously, the LASSO requires setting a tuning parameter (λ). The sparseness of the network produced by the LASSO depends upon the value the researcher sets for λ: the higher the value, the more edges are removed, so λ directly influences the structure of the resulting network. λ therefore needs to be carefully selected to create a network structure that minimises the number of spurious edges while maximising the number of true edges (Foygel & Drton, 2010). To select an optimal value, a common method involves estimating a number of networks under different λ values. These networks range from a fully connected network, in which every node is connected to every other node, to an empty network, in which no nodes are connected. The LASSO thus produces a collection of networks rather than a single network; the researcher needs to select the optimal network model, and typically this is achieved by minimising the Extended Bayesian Information Criterion (EBIC; Chen & Chen, 2008), which has been shown to work particularly well in identifying the true network structure (Foygel & Drton, 2010; van Borkulo et al., 2014), especially when the true network is sparse. Model selection using the EBIC works well for both the Ising model (Foygel Barber & Drton, 2015) and the GGM (Foygel & Drton, 2010). The EBIC has been widely used in psychology networks (e.g. Beard et al., 2016; Isvoranu et al., 2017), and it enhances both the accuracy and interpretability of the networks produced (Tibshirani, 1996).
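A grid of candidate λ values is commonly spaced evenly on a log scale, starting from the smallest λ that empties the network. For the graphical LASSO with an unpenalised diagonal, that starting value is the largest absolute off-diagonal entry of the correlation matrix. In the Python sketch below, the grid size of 100 and the ratio of 0.01 between the smallest and largest λ are illustrative defaults we have assumed, and the correlation matrix is hypothetical.

```python
import math


def lambda_max_from_corr(corr):
    """Largest absolute off-diagonal correlation: at or above this lambda the
    graphical LASSO (diagonal unpenalised) returns a network with no edges."""
    n = len(corr)
    return max(abs(corr[i][j]) for i in range(n) for j in range(n) if i != j)


def lambda_path(lambda_max: float, n_lambda: int = 100, min_ratio: float = 0.01):
    """n_lambda (>= 2) log-spaced values from lambda_max down to lambda_max * min_ratio.

    The first value gives an empty network; each smaller value lets more edges in.
    """
    hi, lo = math.log(lambda_max), math.log(lambda_max * min_ratio)
    step = (hi - lo) / (n_lambda - 1)
    return [math.exp(hi - k * step) for k in range(n_lambda)]


# Hypothetical correlation matrix for three variables.
R = [[1.00, 0.50, 0.20],
     [0.50, 1.00, 0.40],
     [0.20, 0.40, 1.00]]
lams = lambda_path(lambda_max_from_corr(R), n_lambda=5)
print([round(x, 4) for x in lams])  # [0.5, 0.1581, 0.05, 0.0158, 0.005]
```

Each value in the grid would then be passed to a graphical LASSO fit, producing the collection of candidate networks from which one is selected.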

The EBIC uses a hyperparameter (γ) that dictates how strongly the EBIC prefers sparser models (Chen & Chen, 2008; Foygel & Drton, 2010). The γ value is set by the researcher, typically between 0 and 0.5 (Foygel & Drton, 2010), with higher values indicating that simpler models (more parsimonious models with fewer edges) are preferred. In many ways the choice of γ depends upon whether the researcher is taking a liberal or conservative approach to the network model. A value of 0 results in more edges being estimated, including possibly spurious ones, which can be useful in early exploratory and hypothesis-generating research. Of note, setting γ to zero will still produce a network that is sparser than a partial correlation network that has not been regularised using the LASSO. Although γ can be set as high as 1, the default in many situations is 0.5. Foygel and Drton (2010) suggest that setting γ to 0.5 will result in fewer edges being retained, which will remove spurious edges but may also remove some true edges. A compromise value of γ = 0.25 may also be useful for examining the impact of this choice on the network model produced.
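For the GGM, Foygel and Drton (2010) define the EBIC of a network with edge set E as -2 log L + |E| log n + 4γ|E| log p, where L is the likelihood, n the sample size and p the number of nodes, so γ scales an extra penalty per edge. The Python sketch below picks the lowest-EBIC network from a hypothetical λ path (the edge counts and log-likelihoods are invented, n = 200 echoes the sample in this paper's example, and p = 12 is an assumption) to show how raising γ shifts the choice toward sparser networks.

```python
import math


def ebic(log_likelihood: float, n_edges: int, n_obs: int, n_nodes: int, gamma: float) -> float:
    """EBIC for a Gaussian graphical model: -2 logL + |E| log n + 4 gamma |E| log p."""
    return (-2.0 * log_likelihood
            + n_edges * math.log(n_obs)
            + 4.0 * gamma * n_edges * math.log(n_nodes))


def select_network(candidates, n_obs, n_nodes, gamma):
    """Return the candidate network (one per lambda) with the lowest EBIC."""
    return min(candidates,
               key=lambda c: ebic(c["loglik"], c["edges"], n_obs, n_nodes, gamma))


# Hypothetical fits along a lambda path; the log-likelihoods are invented.
candidates = [
    {"lambda": 0.40, "edges": 3, "loglik": -2650.0},
    {"lambda": 0.20, "edges": 12, "loglik": -2560.0},
    {"lambda": 0.10, "edges": 25, "loglik": -2515.0},
    {"lambda": 0.05, "edges": 40, "loglik": -2500.0},
]
for gamma in (0.0, 0.25, 0.5):
    best = select_network(candidates, n_obs=200, n_nodes=12, gamma=gamma)
    print(gamma, best["edges"])
# 0.0 25
# 0.25 12
# 0.5 12
```

With γ = 0 the 25-edge network wins; with γ = 0.25 or 0.5 the extra per-edge penalty makes the 12-edge network preferable.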

Figure 2 presents networks estimated from the same data (questionnaire items on the Big Five model of personality, with 5 items for each dimension: Openness, Conscientiousness, Agreeableness, Extraversion, and Neuroticism) using γ values of 0, 0.5, and 0.99. With γ set to 0, the network contains a dense array of connections as more edges are estimated; as γ increases, the number of edges estimated decreases and the model becomes sparser. This illustrates that the researcher’s choice of γ affects the nature of the network produced. Of note, Epskamp and Fried (In Press) report that a comparison of networks based on simulated data using γ values of 0.00, 0.25 and 0.50 revealed that the higher values of γ were able to reveal the true network structure, whereas a value of 0 included a number of spurious relationships. They caution that a γ of 0.5 may still be conservative and not reflect the true model, and they note that the choice of γ is somewhat arbitrary and up to the researcher. Epskamp (2018) recently reported that increasing γ to 0.75 or 1.00 did not outperform a γ of 0.5 in a well-established personality dataset.

Figure 2. Partial correlation networks estimated on the same dataset, with increasing levels of the EBIC hyperparameter γ (from left to right: Panel (a) γ = 0, Panel (b) γ = 0.5, Panel (c) γ = 0.99).

To plot the network, the nodes and edges need to be positioned in a manner that reflects the patterns of relationships present in the data. The most frequently used approach in psychological networks is the Fruchterman-Reingold algorithm (Fruchterman & Reingold, 1991), which computes a layout in which nodes with fewer and weaker connections are placed further apart, and nodes with more and/or stronger connections are placed closer to each other. The development of qgraph as a package to visualise patterns of relationships between nodes in networks was an invaluable contribution to advancing network analysis (Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012).
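The mechanics of a force-directed layout can be sketched briefly: every pair of nodes repels, every edge pulls its two endpoints together in proportion to its absolute weight, and a "temperature" that cools over the iterations caps how far a node may move. The Python function below is a simplified, unoptimised sketch of this idea, not qgraph's implementation; the starting square, initial temperature and cooling factor are arbitrary assumptions, and the weights in the usage example are hypothetical.

```python
import math
import random


def fruchterman_reingold(W, iterations=200, seed=1):
    """Simplified weighted Fruchterman-Reingold layout.

    W is a symmetric matrix of edge weights; |W[i][j]| scales the attraction.
    Nodes start at random in a 2 x 2 square; unlike the original algorithm,
    positions are not clipped to a frame. Returns a list of [x, y] positions.
    """
    n = len(W)
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    k = math.sqrt(4.0 / n)  # ideal edge length: starting area (2 x 2) shared among nodes
    temp = 0.2              # maximum displacement per iteration (assumed start value)
    for _ in range(iterations):
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                force = k * k / d                  # every pair repels ...
                force -= abs(W[i][j]) * d * d / k  # ... connected pairs also attract
                ux, uy = dx / d, dy / d            # unit vector pointing from j towards i
                disp[i][0] += ux * force
                disp[i][1] += uy * force
                disp[j][0] -= ux * force
                disp[j][1] -= uy * force
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            step = min(length, temp)               # the temperature caps the move
            pos[i][0] += disp[i][0] / length * step
            pos[i][1] += disp[i][1] / length * step
        temp *= 0.97                               # cool down (assumed schedule)
    return pos


if __name__ == "__main__":
    W = [[0.0, 0.5, 0.2], [0.5, 0.0, 0.3], [0.2, 0.3, 0.0]]  # hypothetical weights
    for node, (x, y) in enumerate(fruchterman_reingold(W)):
        print(f"node {node}: x={x:.2f}, y={y:.2f}")
```

Because attraction grows with the absolute edge weight, strongly connected nodes settle closer together. For two nodes joined by an edge of weight 1, for example, repulsion and attraction balance at a distance equal to the ideal length k.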

2. Network Properties

After a network structure is estimated, the graphical representation of the network reveals the structural relationships between the nodes, and we can then further analyse the network structure in terms of its properties. This analysis provides insight into critically important features of the network. For example, are certain nodes more important (central) than others in the network? Is the global structure dense or sparse? Does it contain strong clusters of nodes (communities) or are the nodes isolated?

Not all nodes in a network are equally important in determining the network's structure: centrality indices provide insight into the relative importance of a node in the context of the other nodes in the network (Borgatti, 2005; Freeman, 1978). For example, a central symptom is one that has a large number of connections in a network, and its activation can spread throughout a symptom network; in contrast, a peripheral symptom sits on the outskirts of the network, has few connections and consequently has less impact on the network. Different centrality indices capture different dimensions of centrality. The indices can be presented as standardised z scores to show the relative importance of the nodes, and judging centrality requires careful consideration of the different dimensions in combination. These indices are based on the pattern of connections in which the node of interest plays a role and can be used to model or predict several network processes, such as the amount of flow that traverses a node or the tolerance of the network to the removal of selected nodes (Borgatti, 2005). The most commonly examined aspects of centrality are as follows.

Degree: degree centrality is defined as the number of connections incident to the node of interest (Freeman, 1978).

Node strength: node strength quantifies how strongly a node is directly connected to other nodes and is calculated as the sum of the absolute weights of all the edges connected to that node. Whilst degree provides information on the number of connections, strength can provide additional information on the importance of a node; for example, a node with many weak connections (high degree) might not be as central to the network as one that has fewer but stronger connections. However, as noted by Opsahl, Agneessens, and Skvoretz (2010), focusing on node strength alone as an index of importance is potentially misleading, as it does not take account of the number of other nodes to which the node is connected. Consequently, when examining the centrality of a node it is important to incorporate both degree and strength as indicators of the node's level of involvement in the surrounding network. Opsahl et al. (2010) proposed a generalised degree centrality measure, which is the product of the number of nodes that a specific node is connected to and the average weight of the edges to these nodes, adjusted by a tuning parameter alpha (α) that determines the relative importance of the number of edges compared to edge weights. The researcher sets α: if it is between 0 and 1, having a high degree is regarded as favourable, whereas if it is set above 1, a low degree is favourable.

Closeness: the closeness index quantifies the node's relationship to all other nodes in the network by taking into account the indirect connections from that node. A high closeness index indicates a short average distance from a specific node to all other nodes; a central node with high closeness will be affected quickly by changes in any part of the network and can quickly affect other parts of the network (Borgatti, 2005).

Betweenness: the betweenness index indicates how often a node lies on the shortest path between other pairs of nodes. A node that frequently lies on the shortest path between two other nodes can play a key role in the network, because it is important for the connection between those nodes (Saramäki, Kivelä, Onnela, Kaski, & Kertész, 2007; Watts & Strogatz, 1998).

Clustering: the extent to which a node is part of a cluster of nodes can also be estimated (Saramäki et al., 2007). The local clustering coefficient C is the proportion of edges that exist between the neighbours of a particular node relative to the total number of possible edges between those neighbours (Bullmore & Sporns, 2009). It provides insight into the local redundancy of a node: if the node were removed, could its neighbouring nodes still influence each other? An overall global clustering coefficient (also referred to as transitivity) can be estimated for the entire network, in both undirected and directed networks. Furthermore, the overall network may comprise communities, i.e. clusters of nodes that are highly interconnected among themselves and poorly connected with nodes outside the cluster.
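The indices above can be computed directly from a weighted adjacency matrix. The Python sketch below is for illustration only (in R, packages such as qgraph compute these indices for you). It assumes an undirected network, treats the length of an edge as 1/|weight| when finding shortest paths for closeness and betweenness (a common convention for weighted networks), computes betweenness with Brandes' algorithm, uses the unweighted version of the local clustering coefficient, and implements Opsahl et al.'s (2010) combination of degree and strength as k^(1−α) × s^α. The 4-node network is hypothetical.

```python
import heapq
import math


def neighbours(W, i):
    return [j for j, w in enumerate(W[i]) if j != i and w != 0]


def degree(W, i):
    return len(neighbours(W, i))


def strength(W, i):
    return sum(abs(W[i][j]) for j in neighbours(W, i))


def opsahl_degree(W, i, alpha):
    """k**(1 - alpha) * s**alpha: alpha = 0 gives degree, alpha = 1 gives strength."""
    k, s = degree(W, i), strength(W, i)
    return k ** (1 - alpha) * s ** alpha if k else 0.0


def shortest_paths(W, source):
    """Dijkstra with edge length 1/|w|; also counts shortest paths for betweenness."""
    n = len(W)
    dist, sigma = [math.inf] * n, [0] * n
    preds = [[] for _ in range(n)]
    dist[source], sigma[source] = 0.0, 1
    heap, order, done = [(0.0, source)], [], [False] * n
    while heap:
        d, v = heapq.heappop(heap)
        if done[v]:
            continue
        done[v] = True
        order.append(v)  # nodes in order of increasing distance
        for u in neighbours(W, v):
            nd = d + 1.0 / abs(W[v][u])
            if nd < dist[u] - 1e-12:          # strictly shorter path found
                dist[u], sigma[u], preds[u] = nd, sigma[v], [v]
                heapq.heappush(heap, (nd, u))
            elif abs(nd - dist[u]) <= 1e-12:  # equally short alternative path
                sigma[u] += sigma[v]
                preds[u].append(v)
    return dist, sigma, preds, order


def closeness(W, i):
    """Inverse of the summed distances to all reachable nodes (unreachable ones ignored)."""
    dist = shortest_paths(W, i)[0]
    total = sum(d for j, d in enumerate(dist) if j != i and d < math.inf)
    return 1.0 / total if total else 0.0


def betweenness(W):
    """Brandes' algorithm for an undirected weighted network."""
    n = len(W)
    bc = [0.0] * n
    for s in range(n):
        _, sigma, preds, order = shortest_paths(W, s)
        delta = [0.0] * n
        for v in reversed(order):  # accumulate dependencies from the farthest nodes back
            for p in preds[v]:
                delta[p] += sigma[p] / sigma[v] * (1.0 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return [b / 2 for b in bc]  # each unordered pair was counted from both ends


def clustering(W, i):
    """Unweighted local clustering: share of neighbour pairs that are themselves linked."""
    nb = neighbours(W, i)
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if W[nb[a]][nb[b]] != 0)
    return links / (k * (k - 1) / 2)


# Hypothetical 4-node network; the weights could be partial correlations.
W = [
    [0.0, 0.4, 0.3, 0.2],
    [0.4, 0.0, 0.1, 0.0],
    [0.3, 0.1, 0.0, 0.0],
    [0.2, 0.0, 0.0, 0.0],
]

if __name__ == "__main__":
    bc = betweenness(W)
    for i in range(len(W)):
        print(i, degree(W, i), round(strength(W, i), 2), round(opsahl_degree(W, i, 0.5), 3),
              round(closeness(W, i), 4), bc[i], round(clustering(W, i), 3))
```

In this toy network node 0 lies on every shortest path between the other nodes, so it is the only node with non-zero betweenness, whereas node 1 has a clustering coefficient of 1 because its two neighbours (nodes 0 and 2) are themselves connected.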

Detecting communities requires researchers not simply to interpret the placement of nodes in the visual representation of the data, but to examine the patterns present using a formal statistical approach. Fried (2016) highlights a number of approaches to help identify communities. As latent variable models and network models are mathematically equivalent, examining the eigenvalues of the components present in the data using exploratory factor analysis is one way to identify how many communities might be present, and the factor loadings indicate which nodes belong to which community. More sophisticated approaches include the spinglass algorithm (although this is limited by the fact that it often produces different results each time it is run, and it only allows nodes to be part of one community, whereas nodes may be better described as belonging to several communities at the same time), the walktrap algorithm (which gives more consistent results across repeated runs, but also only allows nodes to be part of one community), and the Clique Percolation Method (CPM), which allows nodes to belong to more than one community (see Blanken et al., 2018).

Overall network topology

Networks can take on many different shapes; however, some common network shapes have been described in detail in the literature. Random networks comprise nodes with random connections, with each node having approximately the same number of connections to others; the distribution of the nodes' connections follows a bell curve. ‘Small world’ networks are characterised by relatively high levels of transitivity and by nodes being connected to each other through short average path lengths (Watts & Strogatz, 1998). A classic example of the ‘small-world effect’ is the so-called ‘six degrees of separation’ principle suggested by Milgram (1967): letters passed from person to person reached a designated target individual in only a small number of steps (approximately six), because the nodes (individuals) were connected by a short path through the network.

‘Scale free’ networks are characterised by a relatively small number of nodes that are connected to many other nodes (Barabási, 2012). These ‘hub’ nodes have an exceptionally high number of connections to other nodes, whereas the majority of non-hub nodes have very few connections; the distribution of the nodes' connections follows a power law. Research has found that HIV transmission among men who have sex with men can be modelled as a scale free network (Leigh Brown et al., 2011); identifying individuals who have very high levels of connections and represent ‘superspreaders’ of infection provides an efficient means of targeting vaccinations (Pastor-Satorras & Vespignani, 2001). Within scale free networks, nodes whose centrality is extremely high relative to that of the other nodes may be ‘hubs’. However, it is critically important to check the pattern of directed relationships between such a node and its neighbours: in a directed network, a node could have high centrality because it has many directed edges to other nodes (high OutDegree centrality) whilst having no edges from those nodes pointing at it (zero InDegree centrality); in this case the node would not be a hub.¹
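The hub check described above amounts to looking at in-degree and out-degree separately rather than at total degree alone. The short Python sketch below uses a hypothetical directed network, encoded so that A[i][j] != 0 means an edge from node i to node j; node 0 has the highest out-degree but an in-degree of zero, so on the reasoning above it would not be regarded as a hub, even though its total degree is among the highest.

```python
def in_out_degree(A):
    """A[i][j] != 0 encodes a directed edge i -> j; self-loops are ignored."""
    n = len(A)
    out_degree = [sum(1 for j in range(n) if j != i and A[i][j] != 0) for i in range(n)]
    in_degree = [sum(1 for i in range(n) if i != j and A[i][j] != 0) for j in range(n)]
    return in_degree, out_degree


# Hypothetical directed network: node 0 sends edges to 1, 2 and 3 but receives none.
A = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    in_deg, out_deg = in_out_degree(A)
    for node, (i, o) in enumerate(zip(in_deg, out_deg)):
        print(f"node {node}: InDegree={i}, OutDegree={o}, total={i + o}")
```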

In addition to group-level analysis, networks can be developed at a person-specific level: a time-series network of an individual may be useful for understanding the relationships between nodes (e.g. symptoms) at an individualised level, and could be used for personalised treatment planning (David, Marshall, Evanovich, & Mumma, 2018). If network structures are replicated and nodes emerge as hubs, then changing these hub nodes might have downstream effects on other nodes, which might provide an efficient means of changing outcomes (Isvoranu et al., 2017). For example, network analysis may reveal that a certain belief is a hub and therefore critical in terms of its impact on behaviour change; we could then focus our efforts on changing that belief rather than attempting to change multiple beliefs. Developing a better understanding of the structural relationships between the nodes in a network can provide important theoretical and practical insights for health psychology.

3. Network accuracy

As the network is based on sample data, the accuracy of the sample-based estimates of the population parameters reflecting the direction, strength and patterns of relationships between nodes should be considered. To date, much of the research on networks has used edge strength and node centrality to make inferences about the phenomenon being modelled. However, as Epskamp et al. (2018a) note, relatively little attention has been paid to examining the accuracy of the edge and centrality estimates. Given the relatively small sample sizes that typically characterise psychological research, edge strengths and node centrality may not be estimated accurately; it is therefore recommended that researchers determine the accuracy of both. The accuracy of edge weights is estimated by calculating confidence intervals (e.g. 95% CIs) for their estimates. As a CI requires knowledge of the sampling distribution of the estimate, which may be difficult to obtain for an edge weight, Epskamp et al. (2018a) developed a method that uses bootstrapping (Efron, 1979) to repeatedly estimate a model under either resampled or simulated data and then compute the required statistic. The more bootstrap samples that are run, the more consistent the results. Either a parametric or a non-parametric bootstrap can be applied to edge weights (Bollen & Stine, 1992). In non-parametric bootstrapping, observations in the data are resampled with replacement to create new plausible datasets. Parametric bootstrapping samples new observations from the parametric model that has been estimated from the original data; this creates a series of values that can be used to estimate the sampling distribution. Consequently, the parametric bootstrap requires a parametric model of the data, whereas the non-parametric bootstrap can be applied to continuous, categorical and ordinal data.
As the non-parametric bootstrap is data-driven and less likely to produce biased estimates with LASSO-regularised edges (which tend to dominate in the literature), Epskamp et al. (2018a) emphasise the usefulness and general applicability of the non-parametric bootstrap. If the bootstrapped CIs are wide, it becomes hard to interpret the strength of an edge.
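The logic of the non-parametric bootstrap can be illustrated on a single "edge". The Python sketch below is a deliberate simplification: it bootstraps a plain Pearson correlation between two simulated variables and reports a percentile interval, whereas bootnet re-estimates the whole regularised network on every resample. The simulated data, the seed values and the choice of the percentile method are all assumptions made for this example.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(rows, statistic, n_boot=2000, level=0.95, seed=2018):
    """Non-parametric percentile bootstrap: resample cases with replacement,
    recompute the statistic each time and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(rows)
    boot = sorted(statistic([rows[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = boot[round((1 - level) / 2 * (n_boot - 1))]
    hi = boot[round((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


def edge(rows):
    """Stand-in 'edge weight': the correlation between the two columns."""
    return pearson([r[0] for r in rows], [r[1] for r in rows])


if __name__ == "__main__":
    gen = random.Random(1)
    data = []
    for _ in range(200):
        x = gen.gauss(0, 1)
        data.append((x, 0.5 * x + gen.gauss(0, 1)))  # simulated, not the TPB data
    print(round(edge(data), 3), bootstrap_ci(data, edge))
```

A narrow interval that excludes zero corresponds to the more interpretable edges; a wide interval signals that the edge strength is estimated imprecisely.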

The accuracy of the centrality indices can be examined using a different type of bootstrapping: subsets of the data are used to investigate the stability of the order of centrality indices across the varying sub-samples (m out of n bootstrap; Chernick, 2011). The focus is on whether the order of centrality indices remains the same after re-estimating the network with fewer cases or nodes. A case-dropping subset bootstrap can be applied, and the correlation stability (CS) coefficient can quantify the stability of centrality indices using subset bootstraps. The centrality indices estimated from the full data are correlated with those estimated from subsets of the data representing different percentages of the overall sample. For example, what is the correlation between the estimates from the entire dataset and the estimates based on a subset of 70% of the original sample? A series of such correlations can be presented to illustrate how the correlations change as the subset sample gets smaller (95%, 80%, 70%, …, 25% of the sample). If the correlation changes considerably, then the centrality estimate may be problematic. A correlation of .7 or higher between the original full-sample estimates and the subset estimates has been suggested as a useful threshold (Epskamp et al., 2018a). The CS-coefficient, CS(cor = .7), represents the maximum proportion of cases that can be dropped such that, with 95% probability, the correlation between the original centrality indices and the centrality of networks based on subsets is 0.7 or higher (Epskamp et al., 2018a). It is suggested that the CS-coefficient should not be below 0.25, and preferably should be above 0.5.
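The CS-coefficient decision rule can be written in a few lines. In the Python sketch below, `case_drop_correlations` drops a proportion of cases at random, re-computes a user-supplied centrality function on each subset and correlates the result with the full-sample values; `cs_coefficient` then returns the largest dropped proportion for which at least 95% of those correlations reach .7. The function names, drop proportions and number of subsets are illustrative assumptions rather than bootnet's defaults; in practice the centrality function would re-estimate the network and return, for example, node strengths. The `example` correlations are invented purely to show the rule.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def case_drop_correlations(rows, centrality, drops=(0.05, 0.25, 0.5, 0.75),
                           n_boot=200, seed=1):
    """For each proportion of dropped cases, correlate full-sample centrality with
    centrality re-computed on random subsets drawn without replacement."""
    rng = random.Random(seed)
    original = centrality(rows)
    results = {}
    for drop in drops:
        keep = round(len(rows) * (1 - drop))
        results[drop] = [pearson(original, centrality(rng.sample(rows, keep)))
                         for _ in range(n_boot)]
    return results


def cs_coefficient(results, threshold=0.7, certainty=0.95):
    """Largest dropped proportion for which at least `certainty` of the subset
    correlations reach `threshold` (0.0 if no proportion qualifies)."""
    cs = 0.0
    for drop, cors in sorted(results.items()):
        share = sum(c >= threshold for c in cors) / len(cors)
        if share >= certainty:
            cs = drop
    return cs


# Hypothetical subset correlations, invented to show the decision rule.
example = {
    0.25: [0.90] * 100,
    0.50: [0.80] * 96 + [0.50] * 4,  # 96% of subsets reach .7
    0.75: [0.60] * 100,              # none reach .7
}

if __name__ == "__main__":
    print(cs_coefficient(example))  # prints 0.5
```

Here 96% of the correlations still reach .7 when half of the cases are dropped, but none do when three-quarters are dropped, so the CS-coefficient is 0.5.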

Other applications of network analysis

The majority of research has examined networks based on cross-sectional data from a single group of participants. However, networks can also be estimated for individuals over time, as well as for comparing different groups. A network can be created for an individual based on time-series data to provide insights into that specific individual; nodes identified as hubs in such networks could be important targets for interventions (Valente, 2012). Networks can also be developed that model temporal effects between consecutive measurements. The graphical VAR model (Wild et al., 2010) uses LASSO regularisation and selects the optimal tuning parameter using the BIC (Abegaz & Wit, 2013). When multiple individuals are measured over time, multilevel VAR can be used to estimate variation due to both time and individual differences (Bringmann et al., 2013).

Networks can also be estimated for different groups. Although the lack of methods for comparing networks from different groups has been noted (Fried & Cramer, 2017), joint estimation of different graphical models (Danaher, Wang, & Witten, 2014; Guo, Levina, Michailidis, & Zhu, 2011) may prove useful in this context. For example, the Fused Graphical Lasso (FGL) was recently used to compare the networks of borderline personality disorder patients with those from a community sample (Richetin, Preti, Costantini, De Panfilis, & Mazza, 2017). In addition, van Borkulo and colleagues have developed the Network Comparison Test (NCT), which allows researchers to directly compare two networks estimated in different subpopulations (Van Borkulo, 2018). The test uses permutation testing to compare network structures whose relationships between variables are estimated from the data. It focuses on the extent to which groups may differ in relation to (1) the structure of the network as a whole, (2) a given edge strength, and (3) the overall level of connectivity in the network. For example, research has reported that the network of MDD symptoms in people with persistent depression was more strongly connected than the network of those with remitting depression (van Borkulo et al., 2015).
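The permutation idea behind the NCT can be sketched for one of its three questions, the overall level of connectivity (global strength, the sum of absolute edge weights). The Python sketch below is a simplification made for illustration: it uses unregularised correlations as the network, whereas the NCT re-estimates regularised networks, and it adds one to the numerator and denominator of the p value, a common convention. Group labels are shuffled repeatedly, and the observed difference in global strength is compared with the differences obtained under random relabelling. The simulated groups in the usage example are hypothetical.

```python
import random


def corr_matrix(rows):
    """Pearson correlation matrix of a list of equal-length numeric tuples."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    ss = [sum(c[j] ** 2 for c in centred) ** 0.5 for j in range(p)]
    return [[sum(c[a] * c[b] for c in centred) / (ss[a] * ss[b]) for b in range(p)]
            for a in range(p)]


def global_strength(rows):
    """Sum of absolute off-diagonal correlations (stand-in for a regularised network)."""
    R = corr_matrix(rows)
    p = len(R)
    return sum(abs(R[a][b]) for a in range(p) for b in range(a + 1, p))


def permutation_test(group_a, group_b, statistic=global_strength, n_perm=500, seed=7):
    """Permutation test of |statistic(A) - statistic(B)| under shuffled group labels."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign cases to the two groups
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        at_least_as_extreme += diff >= observed
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    gen = random.Random(11)
    tight = []
    for _ in range(100):
        x = gen.gauss(0, 1)
        tight.append((x, x + 0.1 * gen.gauss(0, 1), x + 0.1 * gen.gauss(0, 1)))
    loose = [(gen.gauss(0, 1), gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)]
    print(permutation_test(tight, loose, n_perm=99))
```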

Network analysis issues

Like all statistical models, the network model represents an idealised version of a real-world phenomenon that we wish to understand. In selecting the variables to be modelled, we must decide which variables to include and how they are to be measured: each of these decisions introduces error into the modelling process. A general concern about networks is their replicability (e.g. see Forbes, Wright, Markon, & Krueger, 2017; and responses by Borsboom et al., 2017; Steinley, Hoffman, Brusco, & Sher, 2017), and research needs to address this issue by estimating the stability of networks and examining the generalizability of network models. As noted by Fried and Cramer (2017), the literature in general requires more conceptual and methodological developments for estimating both the accuracy and the stability of networks. The identification of useful thresholds for these parameters will also prove critical in the interpretation of network models. As with other methods of analysis (e.g. regression, SEM), network analysis is sensitive to the variables in the model and to the specific estimation methods used; hence, the challenges regarding replication and generalizability are not unique to network modelling.

The larger the sample size, the more stably and accurately networks are estimated. Because network analytic approaches have only recently grown in use in psychology, it is not easy to hypothesise the expected network structure and edge weights, which means there is little evidence to guide a priori power analyses. Epskamp et al. (2018a) note that as more network research is conducted in psychology, more knowledge will accumulate regarding the network structures and edge weights that can be expected.

To date, the dominant methods used to discover network structures in psychology have been based on correlations, partial correlations, and patterns of conditional independence. Further development and application of causal modelling techniques will advance understanding of the relationships present in networks (Borsboom & Cramer, 2013). As noted previously, much of the research on psychological networks has used exploratory data analyses to generate networks; there is a need to progress towards confirmatory network modelling, in which hypotheses about network structure are formally tested.

How to run network analysis: an example using R

Many network structure analysis methods can be implemented in the general-purpose software MATLAB and Stata, or in specialised network software packages, including UCINET (Borgatti, Everett, & Freeman, 2002) and Gephi (https://gephi.org). The Stanford Network Analysis Platform (SNAP) provides a network analysis library. R is an open-source statistical programming language that facilitates statistical analysis and data visualisation (R Core Team, 2017); to date, much of the research on psychological networks has used the R packages igraph (Csárdi & Nepusz, 2006) or qgraph (Epskamp et al., 2012). Of note, the psychosystems research group has created specific R packages that make network analysis easier to implement (see psychosystems.org). As mentioned at the start of this paper, their website is an essential resource for conducting network analysis in psychology. In this example, we will use the bootnet package, as it provides a comprehensive suite of analytical options for network analysis. Data can be entered directly into R or imported in various common formats (e.g. .csv or .txt files) or from other data analysis programmes, e.g. Excel, SPSS, SAS and Stata.

R can be obtained via the https://www.r-project.org/ webpage. To download R, you need to select your preferred CRAN (Comprehensive R Archive Network) mirror (https://cran.r-project.org/mirrors.html). The Mirrors webpage lists countries that host identical copies of R; you should select a location geographically close to your own. R can be downloaded for Linux, Windows, and Mac OS. The pages are regularly updated, and you need to check which releases are supported for your platform. Base R can perform many statistical analyses but, most importantly, R's functionality can be expanded by downloading specific packages.

After installing R (https://www.r-project.org/), it is quite useful to also install R Studio (https://www.rstudio.com/), which provides a convenient interface to R. Once both are installed, opening R Studio will give a window that is split into four panes:

Console/Terminal: this pane is the main interface for the user, and this is where commands are typed in.

Editor: this pane shows the active datasets that you are working on.

Environment/History/Connections: this pane shows the R datasets and allows you to import data from text (e.g. .csv files), Excel, SPSS, SAS and Stata. The History tab allows you to see the list of your previous commands.

Files/Plots/Packages/Help: this pane and its tabs can open files, view the most recent plot (and previous plots), install and load packages, or access the general R help function.

Under the Tools drop-down menu at the top of the R Studio screen, you can select which packages to install for the analyses required. Alternatively, the packages can be installed using the Packages tab, or directly using a typed command. R is a command-line-driven programme: you enter commands at the prompt (> by default), and each command is executed one at a time. For the current example, you will need to install two packages (‘ggplot2’ and ‘bootnet’), and the relevant command lines are:

>install.packages("ggplot2")

>install.packages("bootnet")

Once installed, the packages need to be loaded into R using the library("name of package") command.

>library("ggplot2")

>library("bootnet")

Next, we need to tell R to import the data, in this case a .csv file called TPB2018.

The data are taken from a study conducted using the Theory of Planned Behaviour (TPB; Ajzen, 1985, 2011). The TPB assumes that volitional human behaviour is a function of (1) one's intention to perform a given behaviour and (2) one's perception of behavioural control (PBC) regarding that behaviour (Figure 3). Furthermore, intentions are influenced by one's attitudes towards the behaviour (e.g. cognitive attitudes: is the behaviour good or bad?; affective attitudes: is the behaviour pleasant or unpleasant?), one's subjective norm beliefs (e.g. descriptive norms: do others perform the behaviour?; injunctive norms: do others who are important to me want me to perform the behaviour?), and one's perceptions of control regarding the behaviour (e.g. self-efficacy: level of confidence to perform the behaviour; perceived control: barriers that stop the behaviour being performed). The extent to which PBC influences behaviour directly, rather than indirectly through intention, depends on the degree of actual control over performing the behaviour (Sniehotta, Presseau, & Araújo-Soares, 2014). The TPB has been a dominant theoretical approach in health behaviour research for a number of decades and has been examined extensively. The vast majority of studies have used correlational designs to investigate cross-sectional and prospective associations between TPB variables and behaviour (Noar & Zimmerman, 2005); systematic reviews indicate that the TPB accounts for approximately 20% of the variance in health behaviour, and that intention is the strongest predictor of behaviour (McEachan, Conner, Taylor, & Lawton, 2011).

Figure 3. Theory of planned behaviour.

Following receipt of ethical approval from the local university REC (2014/6/15), students completed a questionnaire regarding regular exercise (Datafile in supplementary material). This cross-sectional dataset is used here to illustrate how to conduct a network analysis and comprises the responses of 200 students to a TPB questionnaire, which included the following items relating to regular exercise (i.e. exercising for at least 20 min, three times per week) for the next two months:

Att1: belief that engaging in regular exercise is healthy

Att2: belief that engaging in regular exercise is useful

Att3: belief that engaging in regular exercise is enjoyable

Dnorm1: descriptive norms for friends regarding engaging in regular exercise

Dnorm2: descriptive norms for other students regarding engaging in regular exercise

Injnorm1: injunctive norms for friends regarding engaging in regular exercise

Injnorm2: injunctive norms for students regarding engaging in regular exercise

Pbc1: perceived control regarding engaging in regular exercise

Pbc2: self-efficacy towards engaging in regular exercise

Intention: intention to engage in regular exercise

In the Environment/History/Connections pane, we can select Import Dataset to import the data file. Alternatively, you can use the command:

>TPB2018 <- read.csv("filename.extension", header = TRUE)

Here, "filename.extension" is a placeholder for the location (file path) of the relevant .csv file on your computer.

Once it is imported, the data will appear in the Editor pane, and the console window will show a line of code indicating that the data are active:

>View(TPB2018)

The next step is to tell R to estimate the network model using the EBICglasso to produce an interpretable network. The command line below tells R to label the results as ‘Network.’

>Network <- estimateNetwork(TPB2018, default = "EBICglasso")
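With default = "EBICglasso", estimateNetwork() returns a regularised partial correlation network. As a conceptual aside (not a step in the R workflow), the Python sketch below computes the unregularised version: it inverts a correlation matrix to obtain the precision matrix Θ and rescales it as ρ_ij = −θ_ij / √(θ_ii θ_jj). The graphical LASSO additionally shrinks these values and sets small ones to exactly zero, which this sketch does not do. The correlation matrix in the example is hypothetical: A and C are correlated only through B.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Unregularised partial correlations: -theta_ij / sqrt(theta_ii * theta_jj).
    The diagonal is set to 0 because a network has no self-loops."""
    theta = invert(corr)  # precision matrix
    n = len(corr)
    return [[0.0 if i == j else -theta[i][j] / (theta[i][i] * theta[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical correlations for variables A, B, C (A and C linked only via B).
    corr = [[1.0, 0.5, 0.25],
            [0.5, 1.0, 0.5],
            [0.25, 0.5, 1.0]]
    for row in partial_correlations(corr):
        print([round(x, 3) for x in row])
```

Although A and C correlate at .25, their partial correlation is (numerically) zero once B is controlled for, so no A-C edge would appear in the network; this conditional-independence interpretation underlies the edges of a partial correlation network.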

Once we have estimated the network, we can ask R to plot it.

>plot(Network, layout = "spring", labels = colnames(TPB2018))

These commands will produce the network plot with the variable names in the plot ( Figure 4 ).

Figure 4. Network analysis of TPB items. The size and density of the edges between the nodes represent the strength of connectedness.

The network shows the strength of the relationships between the TPB variables. Some variables have quite strong connections (e.g. att2 and att3; injnorm1 and dnorm1), whereas others have weak relationships (e.g. att1 and pbc1). Visual inspection suggests that the network splits into three communities: (1) the normative beliefs cluster together; (2) the three attitudinal variables and the pbc1 item cluster together; and (3) the pbc2 and intention items cluster together. However, visual inspection of graphical displays of complex relationships requires careful interpretation, especially when a network has a large number of nodes. To check for the presence of the three potential communities, a spinglass algorithm was applied to the network using the igraph R package; this analysis supported the three-community interpretation (interested readers are referred to Eiko Fried's tutorial on this topic: http://psych-networks.com/r-tutorial-identify-communities-items-networks/).

Next we can examine the centrality indices in terms of Betweenness, Closeness and Strength ( Figure 5 ).

Figure 5. Centrality indices.

>centralityPlot(Network, include = c("Betweenness", "Closeness", "Strength"))

Att3 had the highest strength value and a high closeness value: it has strong connections to the nodes nearby. It plays an important role in the network, and its activation has the strongest influence on the other nodes in the network. However, pbc1 and injnorm1 had the highest betweenness values: they act as bridges connecting the communities of nodes.

Stability of the centrality indices

As noted previously, the stability of centrality indices can be examined by estimating network models based on subsets of the data. The case-dropping bootstrap (type = "case") is used; in this case, 1000 bootstrapped samples were estimated.

>CentralStability <- bootnet(Network, nBoots = 1000, type = "case", statistics = c("strength", "closeness", "betweenness"))

The CS coefficients for each index can be produced:

>corStability(CentralStability)

A table presenting summary data (e.g. M, SD, CIs) on the bootstrapped indices can be created.

>summary(CentralStability)

However, it may be more useful to plot the stability of centrality indices:

>plot(CentralStability, statistics = c("strength", "closeness", "betweenness"))

Figure 6 shows the resulting plot of the centrality indices. As the percentage of the sample included in the estimates decreases (as shown on the X-axis, the subset samples decrease from 95% to 25% of the original sample), the correlation between the subsample estimate and the estimate from the original full sample drops. Once the correlation falls below .7, the estimates are regarded as unstable. For example, using 90% of the original sample, there is a steep decrease in the accuracy of the betweenness estimate, whilst the stability of the strength and closeness estimates declines at a slower rate. However, with a subset of 70% of the original participants, the closeness estimate correlates less than .7 with the full-sample estimate. When the subset comprises 50% of the original sample, the strength estimate falls below .7. Overall, the pattern suggests that the closeness and betweenness indices are not very stable: of note, strength tends to be the most precisely estimated centrality index in psychological networks, and betweenness and closeness only reach the threshold for reliable estimation in large samples (Santos, Kossakowski, Schwartz, Beeber, & Fried, 2018).

Figure 6. Stability of centrality indices.

Edge weight accuracy

The robustness of the edge weights can be examined using bootstrapped confidence intervals.

>EdgeWgt <- bootnet(Network, nBoots = 2500)

Similar to the centrality indices, a summary table of the results of the edge accuracy analysis can be produced (e.g. M, SD, CIs for the estimates):

>summary(EdgeWgt)

The plot of the bootstrapped CIs for estimated edge parameters provides a visually informative representation of the estimates.

>plot(EdgeWgt, labels = TRUE, order = "sample")

Figure 7 has been modified to remove most of the edge names on the Y-axis to de-clutter the figure and enhance readability. The red line in Figure 7 shows the edge values estimated in the sample, and the grey bars surrounding the red line indicate the width of the bootstrapped CIs. Of note, many of the edges are estimated as zero (e.g. dnorm2 - att3). Some edges are larger than zero but their bootstrapped CIs contain zero (e.g. att3 - intention), and for a smaller number of edges the estimates are larger than zero and the CIs do not include zero (e.g. dnorm1 - injnorm1). Given this pattern of CIs for the edge weights, the network should be interpreted with caution.

Figure 7. Accuracy of the edge-weight estimates (red line) and the 95% confidence intervals (grey bars) for the estimates.

The data were used to illustrate how to run a network analysis. Typically, such data are analysed by combining the items into their higher-order constructs (e.g. Attitudes, Norms, PBC, and Intentions); multiple regression then examines the extent to which variation in Attitudes, Norms and PBC accounts for variation in Intentions, and which variables have significant relationships with intentions (Noar & Zimmerman, 2005). Network analysis allows us to examine how the items relate to each other and can reveal important structural relationships that regression cannot reveal. If the present network were replicated using larger samples, we could interpret it in terms of its structural implications for the TPB.

Contrary to the theory, not all variables were directly related to intentions; for example, att2 (the belief that exercise is useful) was related to intention only indirectly, via its connections to att1, att3 and pbc1. Indeed, all of the subjective norm items were related to intentions only through a pathway via pbc1. In line with the TPB, the normative beliefs are related to each other and form a community (i.e. the normative variables correlate with each other); contrary to the theory, however, in the current network these normative beliefs have no direct relationship with intentions and only a weak relationship with PBC. This finding would indicate that your intentions to exercise are not strongly influenced either by the exercise behaviours of others or by what you believe others would like you to do in terms of regular exercise. Rather, the network suggests that your beliefs about others’ exercise influence only your perceptions of control over exercise: e.g. if others are exercising and want you to exercise, you may feel that you have more control over whether you exercise (‘if others can do it, then so can I’), and by feeling in control, you may then have higher intentions to exercise. A previous meta-analysis similarly reported lower correlations between subjective norms and intentions for physical activity behaviour than between attitudes and intentions, and between PBC and intentions (Hagger, Chatzisarantis, & Biddle, 2002).

Among the attitudinal variables, the affective attitude is the central node: it connects not only to all the other attitude variables but also to both PBC items (in line with theory) and to the intention item. Research has highlighted the role of affective attitudes in behaviour (e.g. Lawton, Conner, & McEachan, 2009), and the present data highlight the value of conceptualising attitudes as comprising affective/experiential and cognitive/instrumental components (Conner, 2015).
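Claims about a node being "central" rest on indices such as strength and closeness. The base-R sketch below shows one way these can be computed for a weighted network, assuming (following Opsahl, Agneessens, & Skvoretz, 2010) that the length of an edge is the inverse of its absolute weight, so strong edges are short paths. The 4-node weight matrix, node names and the closeness() helper are hypothetical, and qgraph's own centrality functions may scale or standardise the indices differently, so the values are not meant to match any output in this paper.

```r
# Hypothetical 4-node weighted network (not the TPB network)
nodes <- c("A", "B", "C", "D")
W <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
W["A", "B"] <- 0.5
W["A", "C"] <- 0.4
W["A", "D"] <- 0.2
W["B", "C"] <- 0.1
W <- W + t(W)  # undirected network: symmetric weight matrix

# Strength: sum of absolute edge weights attached to each node
strength <- rowSums(abs(W))

# Closeness: inverse of the summed shortest-path lengths to all other nodes,
# with edge length = 1 / |weight| and absent edges treated as infinitely long
closeness <- function(W) {
  D <- ifelse(W != 0, 1 / abs(W), Inf)
  diag(D) <- 0
  for (k in seq_len(nrow(D))) {  # Floyd-Warshall all-pairs shortest paths
    D <- pmin(D, outer(D[, k], D[k, ], `+`))
  }
  1 / rowSums(D)
}

print(round(strength, 3))
print(round(closeness(W), 4))
```

Here node A has both the highest strength (1.1) and the highest closeness, because every other node reaches the rest of the network most cheaply through it; B and C are linked directly by a weak edge, but the shortest B-C path runs through A.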

The model also found that pbc1, the self-efficacy item of PBC, was most closely connected to intentions; a strong relationship between self-efficacy and activity intentions is consistent with previous meta-analyses (Hagger et al., 2002). The fact that the two PBC items had different patterns of relationships with the other TPB variables further supports the proposed distinction between the self-efficacy and perceived control components of PBC (Conner, 2015). If replicated using within-person networks, the findings would suggest that changes in self-efficacy might directly affect intentions and that changes in affective attitude might affect the other attitudinal variables; given the network model, a change in att1 also provides a route to influence pbc2, which should further strengthen intentions. In essence, the network suggests that for regular exercise behaviour among the student population, the affective attitudinal variable is the most central node, and therefore interventions could prioritise changing emotional responses to exercise in order to increase intentions to exercise. The network gives little support to intervening to change normative beliefs. This example indicates how network analysis can, in principle, inform not only how we appraise the pathways proposed in our theories but also how we design interventions.

The present example aimed to highlight some of the key aspects of conducting network analysis in R and of making sense of its outputs. Many real-world networks estimated in psychology are likely to be messy, so interpretations need to be tempered in light of the stability and accuracy of the estimates. As network analysis becomes more prevalent, replication of network structures and properties will give greater confidence in the interpretation of network patterns.

Of note, the psychosystems group has also developed an online web app (https://jolandakos.shinyapps.io/NetworkApp/) that allows researchers to visualise and analyse networks from data uploaded into the app. The app, based on the R packages described above, can analyse data in several common formats (e.g. ‘.csv’, ‘.xls’ and ‘.sav’), and the data can be raw data, a correlation matrix between the variables, an adjacency matrix, or an edge list. The user can specify how missing data were coded and can apply the nonparanormal transformation to data that are not normally distributed. The app provides the various options outlined in this paper for estimating the network structure from raw data; these include the GLASSO, the graphical VAR and the multilevel VAR. By default, the network is laid out with the Fruchterman-Reingold algorithm, and the user can adjust various visual settings (e.g. node size). The app also calculates centrality indices (strength, closeness and betweenness) to determine each node’s importance in the network. A clustering analysis can be run on the data, and networks from two groups can be compared. This resource offers a very user-friendly way to begin examining network structures in data.
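The nonparanormal option refers to the transformation of Liu, Lafferty and Wasserman (2009). A minimal base-R sketch of a simple rank-based version is shown below: each variable is replaced by the normal scores of its ranks, qnorm(rank / (n + 1)), which preserves the ordering of observations while giving approximately normal marginals. This is a simplification (Liu et al. also describe a truncated estimator of the marginal distribution), the function name npn_transform() is hypothetical, and the data are simulated.

```r
# Rank-based normal-scores transform (a simple nonparanormal-style sketch)
npn_transform <- function(X) {
  n <- nrow(X)
  # For each column: ranks -> probabilities in (0, 1) -> standard normal quantiles
  apply(X, 2, function(x) qnorm(rank(x) / (n + 1)))
}

set.seed(42)
X <- cbind(skewed = rexp(200), heavy = rt(200, df = 3))  # simulated, non-normal
Z <- npn_transform(X)
summary(Z)
```

Because only ranks are used, any monotone distortion of an item (e.g. a skewed response scale) leaves the transformed values unchanged, which is the motivation for applying such a step before estimating a Gaussian network.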

Barabási (2012) argued that theories cannot ignore the network effects caused by interconnectedness among variables. Health psychological processes reflect complex systems, and to understand such systems we need to understand the networks that define the interactions between their constituent variables. Many of our core health psychology models comprise networks of interacting constructs, and considering psychological processes and outcomes from this perspective offers alternative ways of conceptualising and answering important psychological questions. Networks evolve over time through dynamical processes that add or remove nodes (variables) or change edges (relationships between variables). The power of network science derives from its ability to model systems in which the nature of the nodes (e.g. symptoms, behaviours, beliefs, physiological arousal) and of the edges (e.g. correlational relationships, causal relationships, social connections) can vary. This paper has briefly outlined network analysis as a technique and shown how to conduct a simple analysis in R. Hopefully it will encourage health psychologists to think about their data in terms of networks and to start applying network analysis methods to their research questions. The work of Borsboom and colleagues provides a key foundation for network analyses and, as mentioned at the start of this paper, their invaluable contributions to the application of network theory to psychology cannot be overstated. Understanding the dynamic patterns of networks may offer unique insights into core psychological processes that affect health and well-being.

1 We wish to thank an anonymous reviewer for highlighting this possibility.

Disclosure statement

No potential conflict of interest was reported by the author.

David Hevey http://orcid.org/0000-0003-2844-0449

Abegaz, F., & Wit, E. (2013). Sparse time series chain graphical models for reconstructing genetic networks. Biostatistics (Oxford, England), 14(3), 586–599. doi: 10.1093/biostatistics/kxt005
Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl & J. Beckman (Eds.), Action-control: From cognition to behavior (pp. 11–39). Heidelberg: Springer.
Ajzen, I. (2011). The theory of planned behaviour: Reactions and reflections. Psychology & Health, 26, 1113–1127. doi: 10.1080/08870446.2011.613995
Babyak, M. A. (2004). What you see may not be what you get: A brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic Medicine, 66(3), 411–421. doi: 10.1097/00006842-200405000-00021
Barabási, A. L. (2012). The network takeover. Nature Physics, 8, 14–16. doi: 10.1038/nphys2188
Beard, C., Millner, A. J., Forgeard, M. J. C., Fried, E. I., Hsu, K. J., Treadway, M. T., … Björgvinsson, T. (2016). Network analysis of depression and anxiety symptom relationships in a psychiatric sample. Psychological Medicine, 46(16), 3359–3369. doi: 10.1017/S0033291716002300
Bentler, P. M., & Satorra, A. (2010). Testing model nesting and equivalence. Psychological Methods, 15(2), 111–123. doi: 10.1037/a0019625
Blanken, T. F., Deserno, M. K., Dalege, J., Borsboom, D., Blanken, P., Kerkhof, G. A., & Cramer, A. O. J. (2018). The role of stabilizing and communicating symptoms given overlapping communities in psychopathology networks. Scientific Reports, 8, 59. doi: 10.1038/s41598-018-24224-2
Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research, 21(2), 205–229. doi: 10.1177/0049124192021002004
Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27, 55–71. doi: 10.1016/j.socnet.2004.11.008
Borgatti, S. P., Everett, M. G., & Freeman, L. C. (2002). Ucinet for windows: Software for social network analysis. Harvard, MA: Analytic Technologies.
Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323, 892–895. doi: 10.1126/science.1165821
Borsboom, D. (2017). A network theory of mental disorders. World Psychiatry, 16, 5–13. doi: 10.1002/wps.20375
Borsboom, D., & Cramer, A. O. J. (2013). Network analysis: An integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91–121. doi: 10.1146/annurev-clinpsy-050212-185608
Borsboom, D., Fried, E. I., Epskamp, S., Waldorp, L. J., van Borkulo, C. D., van der Maas, H. L. J., & Cramer, A. O. J. (2017). False alarm? A comprehensive reanalysis of “evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger (2017). Journal of Abnormal Psychology, 126(7), 989–999. doi: 10.1037/abn0000306
Bringmann, L. F., Pe, M. L., Vissers, N., Ceulemans, E., Borsboom, D., Vanpaemel, W., … Kuppens, P. (2016). Assessing temporal emotion dynamics using networks. Assessment, 23(4), 425–435. doi: 10.1177/1073191116645909
Bringmann, L. F., Vissers, N., Wichers, M., Geschwind, N., Kuppens, P., Peeters, F., … de Erausquin, G. A. (2013). A network approach to psychopathology: New insights into clinical longitudinal data. PLoS ONE, 8(4), e60188. doi: 10.1371/journal.pone.0060188
Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3), 186–198. doi: 10.1038/nrn2575
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771. doi: 10.1093/biomet/asn034
Chernick, M. R. (2011). Bootstrap methods: A guide for practitioners and researchers. New York: Wiley.
Clifton, A., & Webster, G. D. (2017). An introduction to social network analysis for personality and social psychologists. Social Psychological and Personality Science, 8(4), 442–453. doi: 10.1177/1948550617709114
Conner, M. (2015). Extending not retiring the theory of planned behaviour: A commentary on Sniehotta, Presseau and Araújo-Soares. Health Psychology Review, 9(2), 141–145. doi: 10.1080/17437199.2014.899060
Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., & Cramer, A. O. J. (2015). State of the aRt personality research: A tutorial on network analysis of personality data in R. Journal of Research in Personality, 54, 13–29. doi: 10.1016/j.jrp.2014.07.003
Cramer, A. O. J., van Borkulo, C. D., Giltay, E. J., van der Maas, H. L. J., Kendler, K. S., Scheffer, M., … Branchi, I. (2016). Major depression as a complex dynamic system. PLoS ONE, 11(12), e0167490. doi: 10.1371/journal.pone.0167490
Cramer, A. O. J., Waldorp, L., van der Maas, H., & Borsboom, D. (2010). Comorbidity: A network perspective. Behavioral and Brain Sciences, 33(2–3), 137–150. doi: 10.1017/S0140525X09991567
Csárdi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695. Retrieved from http://igraph.org
Dalege, J., Borsboom, D., van Harreveld, F., van den Berg, H., Conner, M., & van der Maas, H. L. J. (2015). Toward a formalized account of attitudes: The Causal Attitude Network (CAN) model. Psychological Review, 123(1), 2–22. doi: 10.1037/a0039802
Danaher, P., Wang, P., & Witten, D. M. (2014). The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(2), 373–397. doi: 10.1111/rssb.12033
David, S. J., Marshall, A. J., Evanovich, E. K., & Mumma, H. (2018). Intraindividual dynamic network analysis – implications for clinical assessment. Journal of Psychopathology and Behavioral Assessment, 40, 235–248. doi: 10.1007/s10862-017-9632-8
De Schryver, M., Vindevogel, S., Rasmussen, A. E., & Cramer, A. O. J. (2015). Unpacking constructs: A network approach for studying war exposure, daily stressors and post-traumatic stress disorder. Frontiers in Psychology, 6, 4. doi: 10.3389/fpsyg.2015.01896
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26.
Engel, G. L. (1980). The clinical application of the biopsychosocial model. American Journal of Psychiatry, 137, 535–544. doi: 10.1176/ajp.137.5.535
Epskamp, S. (2018). Regularized Gaussian psychological networks: Brief report on the performance of extended BIC model selection. Retrieved from https://arxiv.org/abs/1606.05771
Epskamp, S., Borsboom, D., & Fried, E. I. (2018a). Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods, 50(1), 195–212. doi: 10.3758/s13428-017-0862-1
Epskamp, S., Cramer, A., Waldorp, L., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(1), 1–18. doi: 10.18637/jss.v048.i04
Epskamp, S., & Fried, E. I. (In Press). A tutorial on estimating regularized psychological networks. Psychological Methods. doi: 10.1037/met0000167
Epskamp, S., Kruis, J., Marsman, M., & Marinazzo, D. (2017). Estimating psychopathological networks: Be careful what you wish for. PLOS ONE, 12(6), e0179891.
Epskamp, S., Maris, G., Waldorp, L., & Borsboom, D. (In Press). Network psychometrics. In P. Irwing, D. Hughes, & T. Booth (Eds.), Handbook of psychometrics. New York, NY, USA: Wiley.
Epskamp, S., van Borkulo, C. D., van der Veen, M. N., Servaas, M. N., Isvoranu, A.-M., Riese, H., & Cramer, A. O. J. (2018b). Personalized network modeling in psychopathology: The importance of contemporaneous and temporal connections. Clinical Psychological Science, 6(3), 416–427. doi: 10.1177/2167702617744325
Estrada, E., & Knight, P. A. (2015). A first course in network theory. Oxford: Oxford University Press.
Fan, J., Feng, Y., & Wu, Y. (2009). Network exploration via the adaptive LASSO and SCAD penalties. The Annals of Applied Statistics, 3(2), 521–541. doi: 10.1214/08-AOAS215
Finegood, D. T., Merth, T. D. N., & Rutter, H. (2010). Implications of the foresight obesity system map for solutions to childhood obesity. Obesity, 18(Suppl. 1), S13–S16.
Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017). Evidence that psychopathology symptom networks have limited replicability. Journal of Abnormal Psychology, 126(7), 969–988. doi: 10.1037/abn0000276
Foygel, R., & Drton, M. (2010). Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 23, 24th Annual Conference on Neural Information Processing Systems 2010, NIPS 2010.
Foygel Barber, R., & Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electronic Journal of Statistics, 9(1), 567–607. doi: 10.1214/154957804100000000
Freeman, L. C. (1978). Centrality in social networks conceptual clarification. Social Networks, 1(3), 215–239. doi: 10.1016/0378-8733(78)90021-7
Fried, E. I. (2016). R tutorial: How to identify communities of items in networks. Retrieved from http://psych-networks.com/r-tutorial-identify-communities-items-networks/
Fried, E. I., & Cramer, A. O. J. (2017). Moving forward: Challenges and directions for psychopathological network theory and methodology. Perspectives on Psychological Science, 12(6), 999–1020. doi: 10.17605/OSF.IO/BNEK
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics (Oxford, England), 9(3), 432–441. doi: 10.1093/biostatistics/kxm045
Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21(11), 1129–1164. doi: 10.1002/spe.4380211102
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424–438. doi: 10.2307/1912791
Greenland, S., Pearl, J., & Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.
Guo, J., Levina, E., Michailidis, G., & Zhu, J. (2011). Joint estimation of multiple graphical models. Biometrika, 98(1), 1–15. doi: 10.1093/biomet/asq060
Hagger, M. S., Chatzisarantis, N. L. D., & Biddle, S. J. H. (2002). A meta-analytic review of the theories of reasoned action and planned behavior in physical activity: Predictive validity and the contribution of additional variables. Journal of Sport & Exercise Psychology, 24(1), 3–32.
Haslbeck, J. M. B., & Waldorp, L. J. (2016). Structure estimation for mixed graphical models in high dimensional data. Retrieved from https://arxiv.org/abs/1510.05677
Isvoranu, A. M., van Borkulo, C. D., Boyette, L., Wigman, J. T. W., Vinkers, C. H., Borsboom, D., … GROUP Investigators. (2017). A network approach to psychosis: Pathways between childhood trauma and psychotic symptoms. Schizophrenia Bulletin, 43(1), 187–196. doi: 10.1093/schbul/sbw055
Kelly, H. H. (1983). Perceived causal structures. In J. Jaspars, F. D. Fincham, & M. Hewstone (Eds.), Attribution theory and research: Conceptual, developmental and social dimensions (pp. 343–369). London: Academic Press.
Kossakowski, J. J., Epskamp, S., Kieffer, J. M., van Borkulo, C. D., Rhemtulla, M., & Borsboom, D. (2016). The application of a network approach to health-related quality of life (HRQoL): Introducing a new method for assessing HRQoL in healthy adults and cancer patients. Quality of Life Research, 25, 781–792. doi: 10.1007/s11136-015-1127-z
Krämer, N., Schäfer, J., & Boulesteix, A. L. (2009). Regularized estimation of large-scale gene association networks using graphical Gaussian models. BMC Bioinformatics, 10, 384. doi: 10.1186/1471-2105-10-384
Kroeze, R., van der Veen, D. C., Servaas, M. N., Bastiaansen, J. A., Oude Voshaar, R. C., Borsboom, D., … Riese, H. (2017). Personalized feedback on symptom dynamics of psychopathology: A proof-of-principle study. Journal for Person-Oriented Research, 3, 1–10.
Langley, D. J., Wijn, R., Epskamp, S., & Van Bork, R. (2015). Should I get that jab? Exploring influence to encourage vaccination via online social media. ECIS 2015 Research-in-Progress Papers, Paper 64.
Lauritzen, S. L. (1996). Graphical models. Oxford, UK: Clarendon Press.
Lawton, R., Conner, M., & McEachan, R. (2009). Desire or reason: Predicting health behaviors from affective and cognitive attitudes. Health Psychology, 28, 56–65. doi: 10.1037/a0013424
Lehman, B. J., David, D. M., & Gruber, J. A. (2017). Rethinking the biopsychosocial model of health: Understanding health as a dynamic system. Social and Personality Psychology Compass, 11(8), e12282. doi: 10.1111/spc3.12328
Leigh Brown, A. J., Lycett, S. J., Weinert, L., Hughes, G. H., Fearnhill, E., & Dunn, D. T. (2011). Transmission network parameters estimated from HIV sequences for a nationwide epidemic. The Journal of Infectious Diseases, 204, 1463–1469. doi: 10.1093/infdis/jir550
Liu, H., Lafferty, J. D., & Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10, 2295–2328.
MacCallum, R. C., Wegener, D. T., Uchino, B. N., & Fabrigar, L. R. (1993). The problem of equivalent models in applications of covariance structure analysis. Psychological Bulletin, 114(1), 185–199. doi: 10.1037/0033-2909.114.1.185
McEachan, R. R. C., Conner, M., Taylor, N., & Lawton, R. J. (2011). Prospective prediction of health-related behaviors with the theory of planned behavior: A meta-analysis. Health Psychology Review, 5, 97–144. doi: 10.1080/17437199.2010.521684
Milgram, S. (1967). The small-world problem. Psychology Today, 2, 60–67.
Mõttus, R., & Allerhand, M. (2017). Why do traits come together? The underlying trait and network approaches. In V. Zeigler-Hill & T. Shackelford (Eds.), SAGE handbook of personality and individual differences: Volume 1. The science of personality and individual differences (pp. 1–22). London: SAGE.
Noar, S. M., & Zimmerman, R. S. (2005). Health behavior theory and cumulative knowledge regarding health behaviors: Are we moving in the right direction? Health Education Research, 20, 275–290. doi: 10.1093/her/cyg113
Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks, 32(3), 245–251. doi: 10.1016/j.socnet.2010.03.006
Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic spreading in scale-free networks. Physical Review Letters, 86, 3200–3203.
Pearl, J. (2000). Causality: Models, reasoning, and inference. New York, NY: Cambridge University Press.
R Core Team. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Rhemtulla, M., Fried, E. I., Aggen, S. H., Tuerlinckx, F., Kendler, K. S., & Borsboom, D. (2016). Network analysis of substance abuse and dependence symptoms. Drug and Alcohol Dependence, 161, 230–237. doi: 10.1016/j.drugalcdep.2016.02.005
Richetin, J., Preti, E., Costantini, G., De Panfilis, C., & Mazza, M. (2017). The centrality of affective instability and identity in borderline personality disorder: Evidence from network analysis. PLoS ONE, 12(10), e0186695. doi: 10.1371/journal.pone.0186695
Santos, H. P., Jr., Kossakowski, J. J., Schwartz, T. A., Beeber, L., & Fried, E. I. (2018). Longitudinal network structure of depression symptoms and self-efficacy in low-income mothers. PLoS ONE, 13(1), e0191675. doi: 10.1371/journal.pone.0191675
Saramäki, J., Kivelä, M., Onnela, J., Kaski, K., & Kertész, J. (2007). Generalizations of the clustering coefficient to weighted complex networks. Physical Review E, 75(2), 027105. doi: 10.1103/PhysRevE.75.027105
Schmittmann, V. D., Cramer, A. O. J., Waldorp, L. J., Epskamp, S., Kievit, R. A., & Borsboom, D. (2013). Deconstructing the construct: A network perspective on psychological phenomena. New Ideas in Psychology, 31, 43–53. doi: 10.1016/j.newideapsych.2011.02.007
Sniehotta, F. F., Presseau, J., & Araújo-Soares, V. (2014). Time to retire the theory of planned behaviour. Health Psychology Review, 8(1), 1–7. doi: 10.1080/17437199.2013.869710
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction, and search (2nd ed.). Cambridge, MA: MIT Press.
Steinley, D., Hoffman, M., Brusco, M. J., & Sher, K. J. (2017). A method for making inferences in network analysis: Comment on Forbes, Wright, Markon, and Krueger (2017). Journal of Abnormal Psychology, 126(7), 1000–1010. doi: 10.1037/abn0000308
Suls, J., & Rothman, A. (2004). Evolution of the biopsychosocial model: Prospects and challenges for health psychology. Health Psychology, 23, 119–125. doi: 10.1037/0278-6133.23.2.119
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58, 267–288.
Valente, T. W. (2012). Network interventions. Science, 337(6090), 49–53. doi: 10.1126/science.1217330
Van Borkulo, C. D. (2018). Network comparison test: Permutation-based test of differences in strength of networks. Retrieved from github.com/cvborkulo/NetworkComparisonTest
van Borkulo, C. D., Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., & Waldorp, L. J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4(5918), 1–10. doi: 10.1038/srep05918
van Borkulo, C. D., Boschloo, L., Borsboom, D., Penninx, B. W. J. H., Waldorp, L. J., & Schoevers, R. A. (2015). Association of symptom network structure with the course of depression. JAMA Psychiatry, 72(12), 1219–1226. doi: 10.1001/jamapsychiatry.2015.2079
van der Maas, H. L., Dolan, C. V., Grasman, R. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113(4), 842–861. doi: 10.1037/0033-295X.113.4.842
Ware, J. E., Jr., & Sherbourne, C. D. (1992). The MOS 36-item short-form health survey (SF-36): I. Conceptual framework and item selection. Medical Care, 30, 473–483.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393(6684), 440–442. doi: 10.1038/30918
Wetzels, R., & Wagenmakers, E.-J. (2012). A default Bayesian hypothesis test for correlations and partial correlations. Psychonomic Bulletin & Review, 19, 1057–1064. doi: 10.3758/s13423-012-0295-x
Wild, B., Eichler, M., Friederich, H.-C., Hartmann, M., Zipfel, S., & Herzog, W. (2010). A graphical vector autoregressive modeling approach to the analysis of electronic diary data. BMC Medical Research Methodology, 10(28), 1–13. doi: 10.1186/1471-2288-10-28

Open access
Published: 14 July 2022

Exploring the raison d’etre behind metric selection in network analysis: a systematic review

D. Morrison (ORCID: orcid.org/0000-0003-4612-081X),
M. Bedinger (ORCID: orcid.org/0000-0001-6097-1595),
L. Beevers (ORCID: orcid.org/0000-0002-1597-273X) &
K. McClymont (ORCID: orcid.org/0000-0003-0235-7786)

Applied Network Science, volume 7, Article number: 50 (2022)


Network analysis is a useful tool for analysing the interactions and structure of graphs that represent the relationships among entities, such as sectors within an urban system. Connecting entities in this way is vital to understanding the complexity of the modern world and how to navigate these complexities during an event. However, the field of network analysis has grown rapidly since the 1970s, producing a vast array of metrics that describe different graph properties. This diversity allows network analysis to be applied across myriad research domains and contexts; however, such widespread application has produced polysemic metrics. Challenges arise in identifying which method of network analysis to adopt, which metrics to choose, and how many are suitable. This paper undertakes a structured review of the literature to provide clarity on the raison d’etre behind metric selection and suggests a way forward for applied network analysis. It is essential that future studies explicitly report the rationale behind metric choice and describe how the mathematics relates to target concepts and themes. An exploratory metric analysis is an important step in identifying the most important metrics and understanding redundant ones. Finally, where applicable, one should select an optimal number of metrics that describe the network both locally and globally, so as to understand its interactions and structure as holistically as possible.

Introduction

As urbanisation increases, cities have become hubs of the modern world. They are abundant with people, resources and services that drive the growth of technology, the economy and society, all of which are intertwined with one another (Cristiano et al. 2020). Cities operate as urban ‘systems’ and represent networks of different interacting and co-evolving constituent parts (van Meeteren 2019). It is estimated that 55% of the world’s population lives in urban areas, with projected increases of up to 70% by 2050 (United Nations 2018). Maintaining reliable functioning of the urban system will require associated increases in assets to accommodate this growing population. Expansion of the urban system in tandem with a changing climate leaves constituent parts (e.g. the local economy; technical infrastructure) susceptible to potentially huge losses in the event of a perturbation (Bouwer et al. 2007; Alfieri et al. 2017).

Recent shocks have highlighted how complex and interconnected urban systems really are. For example, the global outbreak of COVID-19 has affected healthcare, education, employment, travel and wellbeing, all of which are supported by individual systems that rely on one another. Understanding this ‘domino’ effect within urban systems is therefore paramount: risk management must progress from treating risks in isolation to considering how risk permeates through the elements of the urban system as a whole. Past approaches to the management of urban systems have been critiqued as reductionist, i.e. feedback effects and interactions between parts have not been sufficiently acknowledged (Cavallo and Ireland 2014). In such approaches, any risk posed to a constituent part was treated as self-contained, implying that it posed no risk to the wider system (Clark-Ginsberg et al. 2018). By contrast, methods such as network analysis enable the elements within a system to be mapped and linked together, offering a more comprehensive approach to assessing urban systems.

Network analysis is rooted in graph theory, in which entities of a specified nature (e.g. services, people, houses, infrastructure, cities) are represented as ‘vertices’ or ‘nodes’ connected through a series of ‘edges’ or ‘links’. The use of graph theory to represent systems such as cities is relatively new, and is achieved through the following steps: (1) identify network typologies (e.g. emergency facilities, households); (2) define connections (e.g. emergency facilities provide relief to households and businesses); (3) define rules (e.g. households and businesses are associated with their closest emergency facilities, such as fire stations and hospitals); and (4) build the graph in the form of an ‘adjacency matrix’ or ‘edge list’ (Fig. 1) (Arosio et al. 2020). Once the graph is established, the structure and characteristics of the network can be analysed through an array of metrics.

Adjacency matrix (left) to graph/network (right)
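The four steps above can be sketched in a few lines of Python. This is a minimal illustration only: the facility and household names and coordinates are hypothetical, straight-line (Euclidean) distance stands in for whatever proximity rule a real study would define, and only the standard library is used.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Step 1: hypothetical node typologies with (x, y) coordinates in km
facilities = {"fire_station": (0.0, 0.0), "hospital": (5.0, 5.0)}
households = {"h1": (1.0, 0.5), "h2": (4.0, 4.5), "h3": (0.5, 2.0), "h4": (6.0, 4.0)}

# Steps 2-3: facilities serve households; assumed rule = link each household
# to its closest facility
edge_list = []
for name, xy in households.items():
    nearest = min(facilities, key=lambda f: dist(xy, facilities[f]))
    edge_list.append((nearest, name))

# Step 4: the same undirected graph as a symmetric adjacency matrix
nodes = list(facilities) + list(households)
index = {node: i for i, node in enumerate(nodes)}
adjacency = [[0] * len(nodes) for _ in nodes]
for u, v in edge_list:
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1

print(edge_list)
for node, row in zip(nodes, adjacency):
    print(f"{node:>12}", row)
```

The edge list and the adjacency matrix carry the same information; the edge list is more compact for sparse networks, whereas the matrix is convenient for metrics defined in terms of linear algebra.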

Network metrics describe a network at either a local level, which looks at individual elements (i.e. particular nodes or edges), or a global level, which looks at the network as a whole (Miele et al. 2019). The most established and commonly used metrics are those developed by Freeman (1979): Betweenness Centrality, Degree Centrality and Closeness Centrality. Centrality pertains to the local level of a network and seeks to quantify the position, importance or influence of an element within it. Because centrality originated in Social Network Analysis (SNA) (Newman 2010), many of the characteristics that metrics such as Betweenness, Degree and Closeness describe are sociological in origin. However, centrality metrics have also been used to describe physical networks such as the neural topology of the brain (Saberi et al. 2021); technical networks such as water distribution systems (Giustolisi et al. 2019); and multi-mode networks such as those modelling tangible and intangible elements in an urban system (Beevers et al. 2022; McClymont et al. 2021).
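To make Freeman's three measures concrete, the sketch below computes them for a small, hypothetical undirected network of two triangles joined through a single intermediary node `X`. Betweenness uses Brandes' algorithm for unweighted graphs and is left unnormalised; closeness is Freeman's (n - 1) divided by the sum of shortest-path distances, which assumes the graph is connected. Only the standard library is used, so this is an illustrative sketch rather than a replacement for a network analysis package.

```python
from collections import deque

# Hypothetical network: triangle A-B-C and triangle D-E-F joined via X
graph = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "X"],
    "X": ["C", "D"],
    "D": ["X", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}

def degree_centrality(g):
    """Number of direct links per node (unnormalised)."""
    return {v: len(nbrs) for v, nbrs in g.items()}

def bfs_distances(g, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(g):
    """Freeman closeness: (n - 1) / sum of distances; assumes g is connected."""
    n = len(g)
    return {v: (n - 1) / sum(bfs_distances(g, v).values()) for v in g}

def betweenness_centrality(g):
    """Brandes' betweenness for unweighted, undirected graphs (unnormalised)."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack, preds = [], {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(g, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair counted twice

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for v in graph:
    print(f"{v}: degree={degree[v]} closeness={closeness[v]:.3f} betweenness={betweenness[v]:.1f}")
```

In this toy network `X` has the same degree (2) as the peripheral nodes, yet the highest betweenness (9) and closeness (0.6), because every shortest path between the two triangles passes through it. This is one concrete way in which mathematically different centrality measures can disagree about which node is “important”.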

The transferability of centrality metrics across domains and contexts highlights a particular strength of network analysis: many different systems, from the brain to entire cities, can be represented. However, Miele et al. (2019) argue that there is a risk of blind use of metrics, among other issues surrounding metric selection. The ease and availability of network analysis software allow end users to calculate a range of metrics without fully understanding the mathematics. It is therefore essential that the adopted metric can genuinely describe the characteristics of the system in question, and that the researcher applies and interprets it appropriately. Global metrics (e.g. Density, Diameter) are used to compare different networks; however, simple characteristics that vary from network to network (e.g. the number of nodes) can influence their values. In addition, multiple metrics are often chosen without clarifying how each contributes to the analysis, which can produce redundant results if the chosen metrics are correlated.
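The sensitivity of global metrics to network size can be seen with a simple thought experiment: a ring in which every node has exactly two neighbours looks identical at the local level regardless of its size, yet its Density and Diameter change as nodes are added. The sketch below (standard-library Python, undirected simple graphs, hypothetical ring networks) computes both. Density is taken as 2m / (n(n - 1)) for m edges and n nodes, and Diameter as the longest shortest-path distance, which assumes a connected graph.

```python
from collections import deque

def ring(n):
    """Hypothetical ring network: node i links to i-1 and i+1 (mod n); needs n >= 3."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def density(g):
    """Share of possible undirected links present: 2m / (n(n - 1))."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) / 2  # each edge is listed twice
    return 2 * m / (n * (n - 1))

def eccentricity(g, source):
    """Longest hop distance from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in g[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return max(dist.values())

def diameter(g):
    """Longest shortest path in a connected graph."""
    return max(eccentricity(g, v) for v in g)

for n in (5, 11, 51):
    g = ring(n)
    print(f"n={n}: density={density(g):.3f}, diameter={diameter(g)}")
```

Density falls from 0.5 to 0.04 and Diameter rises from 2 to 25 even though no node's local structure changes, which is why comparing global metrics across networks of different size calls for caution.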

Given the wide range of possible applications of network analysis, one might ask: when should I use a certain metric and not another? What are the reasons for choosing network metrics, and is it fair to assume that common metrics (e.g. centrality metrics) consistently describe the same system characteristic across a wide range of scenarios and domains? How many metrics are appropriate to use? Are some metrics more versatile across contexts than others?

To explore these questions, this study reviews the wider urban systems and disaster management literature to find the raison d’être behind metric selection in network analysis. We identify the most common types of network analysis and the metrics used to fulfil study objectives, thereby answering “who uses what, where, and why?”. This study aims to reflect on past selection and application of network metrics across the fields of urban systems and disaster management, in order to avoid misinterpretation of results due to inappropriate metric selection and to maximise the usefulness of network analysis in future.

Methodology

The materials used to conduct this review were retrieved from Scopus. Our analysis sought to identify research papers from the fields of disaster management and urban systems that feature network analysis. Papers were limited to journal or conference articles written in English, with no restriction on publication date. To meet the PRISMA criteria for systematic reviews (Liberati et al. 2009), we outline the search terms used to systematically identify relevant literature and the predefined inclusion and exclusion criteria, to ensure reproducibility. The initial literature search was conducted on 16th March 2021, and the full research workflow is summarised in Fig. 2. The Scopus search query was as follows:

TITLE-ABS-KEY ("FLOOD MANAGEMENT" OR "DROUGHT MANAGEMENT" OR "DISASTER RISK REDUCTION" OR "DISASTER MANAGEMENT" OR "HAZARD MANAGEMENT" OR "URBAN SYSTEMS" OR "COMPLEX ADAPTIVE SYSTEMS") AND TITLE-ABS-KEY ("NETWORK ANALYSIS" OR "NETWORK THEORY" OR "NETWORK SCIENCE" OR "GRAPH THEORY" OR "GRAPH ANALYSIS" OR "GRAPH METRIC").

Research workflow

The initial search in Scopus returned a total of 476 papers, which was reduced to 465 after duplicates were removed. Twenty-one papers were unavailable on Scopus for review; therefore, 443 papers were carried forward to Stage 2.

Strict inclusion and exclusion criteria were established to keep within the scope of this review (Table 1). These criteria acted as a filter in Stage 2 of the research workflow, in which the abstracts of the 443 papers were screened for eligibility. During abstract screening, papers were manually examined to determine whether they met the predefined eligibility criteria for the qualitative analysis. The primary purpose of this stage was to ensure that each paper included a technical application of network analysis adopting one or more metrics. Papers that used network analysis only as a form of visualisation, with no quantitative analysis, were excluded. Bayesian and neural-network-based papers were also excluded, as these tend to be events-based: they focus on probabilities between events and on extracting new insights from granular data through deep/machine learning. Our focus here is on structure-based network insights, e.g. whether one element of the network has more influence than another. The abstract screening process yielded 346 papers for full-text screening in Stage 3.

The full-text screening in Stage 3 extends the abstract screening stage. It involved screening the full text of 346 papers and documenting the following information in Excel: network method used, metrics adopted, research domain and context. Because metrics that quantify centrality are now applied beyond SNA, it was necessary to distinguish between the identified metrics and the branch of network analysis to which they were applied. Distinguishing between research domain and context was also necessary since, for example, the disaster management domain comprises a wide range of contexts (i.e. different hazards and disasters). Of the 346 papers that qualified for full-text screening, 155 were identified for a richer, manual qualitative analysis. These papers served as the database that forms the basis of the results and discussion of this review.

The final stage of the research workflow was to manually review the final 155 papers as comprehensively as possible, using the information documented in Excel as a guide; this also served to validate the information extracted during full-text screening and so minimise potential error. For example, the adopted metrics were recorded for each study, so one aim of the analysis was to identify the rationale (if any) behind metric selection. Identifying this rationale for each of the 155 papers and comparing it with the quantitative Excel database (i.e. which metrics were used in which domain and context) made it possible to answer the research questions. For instance, studies that share a research domain and context (e.g. examining the emergency response network after a flood) may adopt different metrics, or one study may adopt just one metric whereas another adopts several; is there an explicit reason for this? In addition, the characteristics that each metric aimed to describe in a particular study were mapped to that metric for use in a frequency analysis. This aims to show how versatile metrics are in describing certain properties of a network. The Scopus search in this review covers multiple (albeit related) domains: hazard management, disaster management, flood management, drought management, urban systems and Complex Adaptive Systems (CAS).
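The frequency analysis described above amounts to counting, across the extraction database, how many papers use each metric and how often each characteristic is attached to a metric. The sketch below shows one way to do this in Python with `collections.Counter`. The four records, their method labels and the metric-to-characteristic mappings are hypothetical placeholders rather than data from the reviewed papers (the review's own database was kept in Excel).

```python
from collections import Counter

# Hypothetical extraction records: one per paper, mapping each adopted metric
# to the characteristic the paper says it describes
papers = [
    {"id": "P1", "method": "SNA", "metrics": {"Degree Centrality": "connectivity",
                                              "Betweenness Centrality": "influence",
                                              "Density": "connectivity"}},
    {"id": "P2", "method": "SNA", "metrics": {"Betweenness Centrality": "information flow"}},
    {"id": "P3", "method": "Routing Problem",
     "metrics": {"Characteristic Path Length": "accessibility"}},
    {"id": "P4", "method": "ENA", "metrics": {"Throughflow": "resource flow",
                                              "Network Utility": "mutualism"}},
]

n_papers = len(papers)
# Dict keys are unique, so each metric is counted at most once per paper
metric_counts = Counter(m for p in papers for m in p["metrics"])
characteristic_counts = Counter(c for p in papers for c in p["metrics"].values())

def describe(metric):
    """Format a count in the review's style, e.g. 'Metric (51%, 79/155 papers)'."""
    k = metric_counts[metric]
    return f"{metric} ({round(100 * k / n_papers)}%, {k}/{n_papers} papers)"

for metric, _ in metric_counts.most_common():
    print(describe(metric))
print(characteristic_counts.most_common(3))
```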

Publication trend

Figure 3 outlines the timeline of network analysis applications in the 155 papers identified for analysis, as described in the “Methodology” section. In this review, the earliest identified application of network analysis was in 1975. No further applications appeared until 2005, after which a sharp increase can be observed, with the largest annual peak in 2020 (32 applications). This sharp rise suggests that network analysis is of increasing interest to researchers focused on extreme events. Conceptually, research in this area has increasingly acknowledged the systemic interconnectedness of events and impacts (Pederson et al. 2006). Practically, the influx of ‘Big Data’ from 2005 onwards (van Rijmenam 2013) has created greater opportunities for insights into system behaviour and has facilitated analysis of more sophisticated networks, e.g. with geocoded social media data (Songchon et al. 2021).

Network analysis publication trend

Regarding research domains, 87 of the 155 papers analysed (56%) were within disaster management. The disaster management studies covered hazards such as floods, tsunamis, earthquakes, hurricanes, terrorism, cyber-security and disease outbreaks. The remaining 44% of papers related to “urban systems” and “CAS” and were diverse in their research domains, with applications in sustainable development, supply-chain management, healthcare, transport and critical infrastructure, ecology, and wider climate-related research such as water resources and emissions. This balance between urban systems and disaster management publications shows that network analysis is applied almost evenly to short-term problems concerned with direct shocks or disturbances (disaster management) and to longer-term, more complex problems such as urban planning (systems research).

Network methods

In total, we identified 31 unique network analysis methods, which can be grouped into 8 categories (Table 2). Figure 4 provides a breakdown of each individual method.

Map of identified network methods according to categories

The most popular method was SNA, accounting for just under half of all studies (42%, 69/155 papers). SNA was applied predominantly in the disaster management domain (65%, 45/69 papers), mainly in the context of Emergency Management Networks (EMN), such as how governments and local authorities responded to previous disasters. For more detailed information on EMNs, we refer the reader to the review by Du et al. (2020).

The next most popular network methods are GIS-based networks (13%, 22/155 papers) and routing-problem-based networks (12%, 20/155 papers). Although routing-problem-based networks are also spatial in nature, we categorise them separately based on the method objective: GIS-based networks in this review are networks developed using georeferenced data, and they are diverse in research domain, context and aim. For example, Ceron et al. (2020) examine the meteorological topology of the system using radar-derived rainfall data, and Sun et al. (2019) investigate the regional economic development of urban agglomerations in China. Routing-problem-based networks, on the other hand, are almost entirely confined to the disaster management domain (95%, 19/20 papers), where they focus on evacuation, emergency service response or access to relief facilities. The one exception is a wider transport-related application of a routing problem by Guettiche and Kheddouci (2019), who examine critical transport nodes and links to mitigate congestion. Because two of the most popular network analysis methods (SNA and routing problems) are applied predominantly in disaster management, there appears to be a lack of methodological diversity in this domain with respect to network analysis techniques.

In contrast, 6 of the 8 method categories (approximately 51% of papers) fall within the wider urban systems domains outlined previously. The modelling/simulation category consists of 6 different methods (Fig. 4) that incorporate network analysis, with applications in critical infrastructure (Pumpuni-Lenss et al. 2017) and urban development (Sun et al. 2019). Moreover, simulation methods such as Agent-based Modelling (ABM) are integrated with SNA, routing problems and Ecological Network Analysis (ENA) (7 papers).

ENA is typically applied in urban and sustainable development (66%, 12/18 papers), with exceptions in ecology (Borrett et al. 2006), climate (Chen et al. 2015) and aviation (Burns et al. 2008; DeLaurentis and Ayyalasomayajula 2009).

Celik and Corbacioglu (2018), Comfort et al. (2013) and Jin et al. (2014) are examples of typical ‘complex networks’, mainly small-world and scale-free networks. Both types of complex network featured in this review are applied exclusively in disaster management.

Network metrics

In total, 38 global metrics and 41 local metrics were identified from the qualitative analysis of the 155 papers, giving a pool of 79 metrics overall. Additional file 1: Table S1 summarises the definitions of each metric and classifies each as either “Global” or “Local”. Global metrics typically describe characteristics, topology and structure at the network/system level. Conversely, local metrics are typically centrality-based measures that describe the characteristics of a network at the node/edge level. The split between global and local metrics is roughly even: of the 79 metrics, global metrics account for 48% and local metrics for 52%.

Local metrics

Overall, the three most frequently occurring local metrics were Betweenness Centrality (51%, 79/155 papers), Degree Centrality (32%, 50/155 papers) and Characteristic Path Length (i.e. mean path length or shortest/optimal path; 28%, 44/155 papers). Betweenness and Degree Centrality, formalised by Freeman (1979) alongside Closeness Centrality (the fourth most frequently occurring metric; 15%, 23/155 papers), were developed in the context of social networks. Given that SNA is also the most popular form of network analysis (every application of SNA uses at least one centrality metric), it is no surprise that these are the most popular metrics. Similarly, the prevalence of GIS-based networks and routing problems (53%, 42/79 papers) highlighted in the “Network methods” section can explain the popularity of path length as a metric: routing problems are typically associated with evacuation behaviour, so shortest and optimal path lengths are the objective of these studies (see Xuefen and Lim 2016).
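For routing problems, the shortest/optimal path is commonly found with Dijkstra's algorithm on a weighted network, and Characteristic Path Length is the mean of shortest-path distances over all ordered node pairs. The sketch below illustrates both on a hypothetical four-node road network whose edge weights are assumed travel times in minutes. It uses only `heapq` from the standard library and assumes non-negative weights and a connected graph.

```python
import heapq

# Hypothetical road network; weights are assumed travel times in minutes
roads = {
    "district":   {"junction_a": 4, "junction_b": 2},
    "junction_a": {"district": 4, "junction_b": 1, "shelter": 5},
    "junction_b": {"district": 2, "junction_a": 1, "shelter": 8},
    "shelter":    {"junction_a": 5, "junction_b": 8},
}

def dijkstra(g, source):
    """Shortest travel time from source to every node, plus predecessor links."""
    dist, prev = {source: 0.0}, {}
    heap, done = [(0.0, source)], set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:  # stale heap entry
            continue
        done.add(v)
        for w, weight in g[v].items():
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, v
                heapq.heappush(heap, (nd, w))
    return dist, prev

def path_to(prev, source, target):
    """Rebuild the route by following predecessors back from target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def characteristic_path_length(g):
    """Mean shortest-path length over all ordered pairs (assumes g is connected)."""
    total, pairs = 0.0, 0
    for s in g:
        dist, _ = dijkstra(g, s)
        for t in g:
            if t != s:
                total += dist[t]
                pairs += 1
    return total / pairs

dist_from_district, prev_from_district = dijkstra(roads, "district")
print(path_to(prev_from_district, "district", "shelter"), dist_from_district["shelter"])
print(round(characteristic_path_length(roads), 2))
```

Here the quickest route from `district` to `shelter` (8 minutes) passes through both junctions rather than using either single junction (9 or 10 minutes), and the characteristic path length of the whole network is 50/12, about 4.17 minutes.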

Whilst centrality metrics are most popular where SNA is concerned, of particular interest is how these metrics were applied across the other categories of network analysis described in Table 2, and in which domains. Of the papers that did not use SNA, 40% (34/86 papers) adopted centrality measures, highlighting that these metrics are versatile beyond social networks. Beyond the established centrality metrics that dominate local analysis, there are 8 additional centrality metrics that cumulatively account for < 1% (7 papers). These are: Cascading Centrality and Random Centrality (Der Sarkissian et al. 2020), Directed Alternative Centrality and Directed Alternative Power (Zheng et al. 2020), Egobetweenness Centrality (Hossain and Kuti 2010), Percolation Centrality (Dong et al. 2020), Power Centrality (Radulescu et al. 2020), and Status Centrality (Ongkowijoyo and Doloi 2017; Tang and Lai 2019). All but three of these centrality measures were applied in the context of disaster management using either spatial networks or SNA; the other three (Directed Alternative Centrality, Directed Alternative Power and Power Centrality) were applied in urban systems, in the context of smart cities (SNA) or of spatial interaction, migration and inter-city flows (spatial network). Although all of these studies deviated from the norm by including lesser-known centrality metrics, only Zheng et al. (2020) did not supplement their analysis with Betweenness, Closeness or Degree Centrality. Figure 5 summarises the distribution of local metrics.

Frequency of local network metrics

Global metrics

Figure 6 illustrates the most popular global metrics. The three most popular were Density (26%, 41/155 papers), Centralisation (13%, 20/155 papers), and Throughflow (9.7%, 15/155 papers), joint with Clustering Coefficient (9.7%, 15/155 papers). Density and Centralisation are global counterparts to Betweenness and Degree Centrality: they provide an overview of the network as a whole, as opposed to the centrality of individual nodes. These metrics are largely applied in SNA, with only a few instances in which Density (7 papers) and Centralisation (2 papers) were adopted outside SNA. For example, Wang et al. (2020) adopt Density as a topological measure of a Human-Spatial system based on geocoded social media data before, during and after Hurricane Harvey, and He et al. (2019) use Density to assess the hierarchical structure of China’s ‘megaregions’ to explore the urbanisation process.
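The global metrics discussed above can be sketched for an undirected, unweighted graph as follows. Density is the share of possible links that exist. Freeman's degree centralisation sums how far each node's degree falls below the maximum and divides by the largest value that sum can take, (n - 1)(n - 2), which a star attains. The clustering coefficient here is the average of local clustering coefficients; some studies use global transitivity instead, so this choice is an assumption. The star and complete graphs are hypothetical test cases chosen because their values are known in advance.

```python
from itertools import combinations

def density(g):
    """Share of possible undirected links present."""
    n = len(g)
    m = sum(len(nbrs) for nbrs in g.values()) / 2  # each edge appears twice
    return 2 * m / (n * (n - 1))

def degree_centralisation(g):
    """Freeman degree centralisation (needs n >= 3): 1.0 for a star, 0.0 if all degrees are equal."""
    n = len(g)
    degrees = [len(nbrs) for nbrs in g.values()]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 1) * (n - 2))

def local_clustering(g, v):
    """Share of pairs of v's neighbours that are themselves linked."""
    k = len(g[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(g[v], 2) if b in g[a])
    return 2 * links / (k * (k - 1))

def average_clustering(g):
    """Mean of local clustering coefficients over all nodes."""
    return sum(local_clustering(g, v) for v in g) / len(g)

star = {"hub": {"a", "b", "c", "d"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
complete = {v: set("wxyz") - {v} for v in "wxyz"}

for name, g in (("star", star), ("complete", complete)):
    print(name, density(g), degree_centralisation(g), average_clustering(g))
```

The star is perfectly centralised with no clustering, whereas the complete graph is the opposite, which shows how Centralisation and Clustering Coefficient capture different structural properties.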

Frequency of global network metrics

In addition to established metrics such as Degree, Betweenness, Density and Centralisation, there are several bespoke metrics unique to particular studies. In an exploratory analysis of the ‘World City Network’, Derudder and Taylor (2005) present ‘Global Network Connectivity’ (GNC), an aggregated measure of a city’s connectivity in relation to other cities, and ‘City Cliquishness’, which is derived from ‘clique analysis’, an already popular means of assessing network structure (6%, 10/155 papers).

Just as centrality metrics are largely associated with applications of SNA (see the “Local metrics” section), particular metrics are associated with ENA. ENA is built around network ‘flows’ (Throughflow), in which flows (e.g. energy, resources, passengers) are used to analyse the interaction between multiple systems (Fath et al. 2007). Throughflow is both a global (system-level) and a local (node-level) metric (Borrett et al. 2006; Finn 1980). Whilst ENA is always associated with flows, there are instances where it has been supplemented with additional metrics that are also unique to ENA: Average Mutual Information (AMI), Development Capacity and Ascendency (Bodini 2012; Bodini et al. 2012), Network Utility (e.g. Gao et al. 2021), Control (Tan et al. 2018; Yang et al. 2020), Network Flux (Liu et al. 2011), Stability (Fan and Fang 2019) and Mixed Trophic Impact (Gao et al. 2021).
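Throughflow can be computed directly from a flow matrix. In a common ENA formulation, a node's throughflow is the sum of its inflows from other nodes plus imports from outside the system, which at steady state equals its outflows plus exports; summing over nodes gives Total System Throughflow (TST), the system-level counterpart. The three sectors, units and every flow value below are hypothetical, chosen so that the system is at steady state.

```python
sectors = ["energy", "industry", "households"]
# flows[i][j]: hypothetical flow from sector i to sector j (e.g. GWh per year)
flows = {
    "energy": {"industry": 30.0, "households": 20.0},
    "industry": {"households": 10.0, "energy": 5.0},
    "households": {},
}
imports = {"energy": 50.0, "industry": 0.0, "households": 0.0}   # from outside the system
exports = {"energy": 5.0, "industry": 15.0, "households": 30.0}  # leaving the system, incl. losses

def throughflows(nodes, f, z, y):
    """Node throughflow T_j = z_j + sum_i f_ij, checked against outflows y_j + sum_k f_jk."""
    inflow = {j: z[j] + sum(f[i].get(j, 0.0) for i in nodes) for j in nodes}
    outflow = {i: y[i] + sum(f[i].values()) for i in nodes}
    for s in nodes:
        if abs(inflow[s] - outflow[s]) > 1e-9:
            raise ValueError(f"{s} is not at steady state: {inflow[s]} in vs {outflow[s]} out")
    return inflow

T = throughflows(sectors, flows, imports, exports)
tst = sum(T.values())  # Total System Throughflow
print(T, tst)
```

The node-level values (55, 30 and 30 here) correspond to the local reading of Throughflow, and their sum (115) to the global one.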

Number of metrics used

Figure 7 summarises the distribution of metric counts across the 155 papers. The distribution is right-skewed: the majority of papers adopted 1–2 metrics, with an average of approximately 3 metrics per paper.

Distribution of metrics based on how many metrics are adopted across the 155 papers.

At the extremes of the distribution, there are 4 papers in which no metrics were explicitly stated. For example, Pastor-Escuredo et al. (2020) adopt SNA in the context of Flood Risk Management (FRM) to propose a rapid multi-dimensional impact assessment framework using social media data. In contrast, another SNA-based approach by Ongkowijoyo and Doloi (2017) adopts 8 metrics: Density, Centralisation, Degree Centrality, Closeness Centrality, Betweenness Centrality, Eigenvector Centrality, Status Centrality and Risk Criticality. This emphasis on local metrics suggests that the authors considered it necessary to describe various characteristics of the nodes within the network. Romascanu et al. (2020) highlight that, historically, centrality measures have not been adopted consistently when identifying central nodes within networks. In the context of this review, 17% of SNA studies adopt only one metric. This raises the question of why some applications of network analysis do not aim to describe the system in question as holistically as possible, through a suite of metrics that describe different aspects of the network.

Whilst these examples are based on SNA, the number of metrics adopted is context dependent with respect to the chosen network method. For example, ENA is unique in that it does not require the same indicators as SNA, because it focuses on flows rather than characteristics such as centrality. The metrics typically adopted in this approach are therefore Throughflow (at both system and node level) or Network Utility (with the additional metrics highlighted in the “Global metrics” section). Similarly, routing problems are less concerned with characterising important nodes; the shortest/optimal route is the priority, so Characteristic Path Length is often the only metric adopted (70%, 14/20 papers).

Network characteristics

The previous sections, “Network methods” and “Network metrics”, aimed to give an overview of the state of the art regarding network methods and metrics. This section explores the extent to which the rationale behind the use of particular metrics is reported and made transparent. In other words: what do we hope to learn about systems through network analysis? Why are particular metrics used, and what key concepts or system properties is each thought to target?

Only a small proportion of papers (6%, 10/155) lacked transparency and explicit reporting of the rationale behind metric selection. This suggests that there is no widespread ‘blind use’ of metrics: the vast majority of papers describe how the metric relates to the system in question, or at least give a generic definition. In the papers where no explicit rationale or definition is provided, however, the metrics adopted are typically those that featured most frequently in the “Local metrics” section (Degree Centrality, Betweenness Centrality) and the “Global metrics” section (Density, Centralisation, Clustering Coefficient). This suggests that these metrics are so well established that authors assume they require no detailed explanation.

Figure 8 illustrates the most frequently occurring characteristics described by the identified metrics, and Table 3 summarises the top 8 characteristics with their frequencies. Unsurprisingly, the most frequently occurring characteristic is connectivity: in total, 12 metrics were described as measures of connectivity. In some cases, the distinction between connectivity metrics relates to the scale of the network being measured. For example, Degree Centrality can measure node (local) connectivity, whilst Cohesion is a measure of system (global) connectivity. It is therefore less that connectivity itself is the priority in network analysis, and more that the nature of the connectivity is important (Fig. 8).

Word cloud revealing the most common characteristics described by metrics

The interchangeability of the characteristics that metrics describe is also highlighted in Table 3. Betweenness, Degree, Closeness and Eigenvector Centrality are all described as measures of a node’s importance, influence, position and power. Because these metrics are mathematically different, this suggests that there are different ways in which a node can be “important” or “influential”. Furthermore, the main centrality metrics feature in most of the characteristics described, for example as measures of “information flows”; network metrics are therefore highly diverse and context dependent.

Who uses what? And where?

With respect to “who” is using network analysis among the targeted areas in a broad sense, the applications reviewed here were spread fairly evenly across the disaster management literature (covering all kinds of hazards) and the urban systems and CAS literature (covering a breadth of contexts). Disaggregating this further by specific network analysis technique shows that a wide range of methods is available. This highlights that, in the areas of disaster management and urban systems, network analysis has diversified substantially since the original conception of SNA in the 1970s (see Zhang 2010).

Advances in computational power and data availability have not only facilitated more advanced applications of SNA but have also allowed integration with other computational methods such as modelling and simulation. For example, Rodrigueza and Estuar (2018) use SNA as a basis for understanding disaster behaviour in an ABM. Network analysis has also evolved beyond the study of sociology, as transport and critical infrastructure have become modern concerns. For example, georeferenced data have enabled path analysis of transport and mobility, and the “actors” (nodes) in a network no longer need to be people or organisations, but can instead be businesses, homes and emergency facilities in “hybrid social-physical networks” (Bozza et al. 2017). Furthermore, graph theory has facilitated systems-oriented methods such as ENA, which pertain not only to ecosystems but also to the interactions between physical systems such as cities across the world (Bodini et al. 2012).

With respect to “who uses what”, despite the availability of a wide range of specific network analysis methods, there is siloing within the reviewed research domains. A majority (65%) of disaster management studies used either SNA or routing problems. The application of network methods is broader in the urban systems domain, where 6 of the 8 method categories (Table 2) were used in 51% of studies overall. The focus on SNA and routing-problem-based networks within disaster management suggests that there are established priorities, but it also highlights gaps that network analysis may be able to fill. For instance, the primary application of SNA centres on response networks and organisational collaboration during previous disasters, where two extreme events (e.g. floods) are compared to examine how networks evolved between two past points in time. This is intended to examine the preparedness of a nation or region. However, the prevalence of such applications suggests that there may be an overemphasis on understanding past events rather than more directly preparing for future events. Applications that are more present- or future-oriented (e.g. evacuation and emergency service response modelling across spatial transport networks) do involve some element of future preparation; however, because they are based on shortest/optimal path problems, they mostly represent preparedness in the context of spatial movement rather than abstract social collaboration. These findings are understandable: looking to future preparedness requires dealing with a great deal of uncertainty about how the context may change from the present, and it is crucial to understand the more fundamental (but highly complex) dynamics behind preparedness in the present before introducing those future uncertainties.
Nevertheless, the high prevalence of SNA and routing problems suggests that more could be done to expand the conceptualisation of research problems, and to diversify the application of network analysis techniques, within disaster management (Bedinger et al. 2019). This review suggests that disaster management should turn to the wider urban systems literature for inspiration regarding alternative network analysis methods that consider the interdependency of multiple systems rather than mobility alone. For instance, ENA applications in urban systems studies often model the interactions between different sectors (Liu et al. 2011).

With respect to “where”, we have further disaggregated this review by the focus of specific network analysis techniques, in other words whether the metrics used take a local or a global perspective. A wide range of metrics was identified in the chosen research domains, with almost equal representation of global and local metrics.

It should be acknowledged that this discussion emphasises centrality metrics because of their overwhelming popularity, and that a comprehensive discussion of the reviewed applications of each of the 79 metrics would be cumbersome and beyond the scope of this study.

Unsurprisingly, the most frequently used of all metrics (global and local) were the local centrality metrics of Freeman (1979): Betweenness, Degree and Closeness. These are sociological in origin and have co-evolved with SNA, having typically been used to describe social entities. However, the diversification of network analysis has diversified metric use in two ways. First, SNA is no longer only “social”: it is the go-to network analysis method regardless of the phenomenon being studied. Although centrality metrics are grounded in SNA, and would thus be assumed to pertain to sociological entities, they have been used to describe a host of other entities, or economic proxies, while the main method remains SNA. This is important because it raises the question of why disaster management still focuses on applying SNA mainly to social networks, rather than extending it to other interconnected sectors that are at risk (e.g. health care, economy, transport). Second, centrality metrics are frequently applied in methods other than SNA (e.g. ABM), and the concept of centrality has developed beyond the metrics proposed by Freeman (1979) to include other centrality metrics.

The versatility of centrality metrics, and the availability of many other global and local metrics, highlight the value of network analysis, although this also presents issues.

How many metrics should I use? When should I use this metric and not the other?

It is challenging to answer definitively how many metrics one should use in network analysis, as this depends on the context, the adopted method, time, resources, and the knowledge of the end user. However, based on the results in the “Number of metrics used” section, three things are clear: the average number of metrics observed in this review is three; the majority of studies adopt fewer than three; and there is wide variation across the studies (i.e. some adopt as many as eight and some adopt none at all). Centrality metrics are typically used to capture specific characteristics of a network, such as how a single node is connected to the rest of the network (degree centrality), which provides a static overview of network structure. From a more dynamic perspective, betweenness centrality evaluates how ‘information’ propagates through the network. Other centrality metrics, such as eigenvector centrality, aim to fill the gaps left by basic nodal metrics such as degree centrality, as they include ‘information’ (such as a node's influence) whilst also describing connectivity, as degree centrality does. These three perspectives may explain why the studies returned in this review adopt three metrics on average. This would imply that there is a minimum number of characteristics required to evaluate a network.

If the purpose of having an array of metrics is to capture different characteristics of a network, then it could be argued that more metrics are better, as each metric would contribute to a holistic description of the system. However, this is where context becomes important. For example, Cui and Li (2020) aimed to measure two concepts: social capital and how it is used in community resilience. Both of these concepts are multi-faceted and represent complex sociological interactions, such as sense of belonging, collective efficacy, trust, and reciprocity. To capture them, Cui and Li (2020) adopt one global metric (Density) and seven local metrics (Betweenness, Degree, Closeness, Path Length, Efficiency, Constraint, Structural Holes) appropriate to these concepts. In a different context, Balsiger and Ingold (2016) aimed to investigate how actors within flood governance collaborate and share information based on perceptions of sustainability, using just one local metric (Degree Centrality). Degree Centrality draws on the concept of “structural embeddedness” (see Granovetter 1992), which describes how embedded an actor is in the network based on how central they are (i.e. how many actors they are connected to). Both studies clearly define their objectives and achieve them through appropriate metrics; the former study's scope is wider or more complex, so a wider range of metrics is perhaps necessary.

However, one could also use the latter study to highlight inconsistent metric applications between studies measuring similar concepts. Balsiger and Ingold (2016) and another study (Comfort et al. 2016) both aim to examine collaboration. Comfort et al. (2016) use another sociological concept: “bridging” actors. These are actors that link two other actors who are otherwise only indirectly connected, and they can be identified using Betweenness Centrality. If the objectives of the two studies are similar, why has one adopted Betweenness Centrality and the other not? Further contradicting these observations, Faas et al. (2017) also aim to analyse bridging actors, yet Degree Centrality is the only metric presented in their paper. These examples show that it is difficult to justify which and how many metrics should be used based only on past applications in disaster management and urban systems research, because the existing body of work is inconsistent.
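The practical difference between the two choices shows up even in a tiny network. The sketch below implements Brandes' algorithm for Betweenness Centrality on an unweighted, undirected network and applies it to a hypothetical network of two triangles of actors joined only through an intermediary, "D". D has just two ties, no more than most actors, yet it has the highest Betweenness, which is why Degree alone may miss bridging actors. The network and actor names are illustrative only.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph: Graph) -> Dict[str, float]:
    """Brandes' algorithm (unweighted, undirected, unnormalised scores)."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate each node's dependency in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair of nodes was counted once from each end.
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical actors: two triangles joined only through intermediary "D".
network = from_edges([
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
    ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G"),
])

if __name__ == "__main__":
    result = betweenness(network)
    for actor in sorted(network):
        print(actor, "degree:", len(network[actor]), "betweenness:", result[actor])
```

Here D lies on every shortest path between the two groups (nine pairs), so it scores 9, while C and E score 8 and the remaining actors score 0.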

So how should we justify which and how many metrics should be used? The fundamental aims of network analysis are arguably to represent complex concepts (e.g. multi-faceted social interactions) from a systems perspective (i.e. what is happening both locally and globally). Selecting only one metric is insufficient to achieve both aims: one metric can only cover one concept, at either the local or the global scale. Therefore, at least one global and one local metric is desirable. At the same time, more is not necessarily better, as adding metrics runs the risk of redundancy if the results of the chosen metrics are correlated (Miele et al. 2019).

Therefore, we would argue that an important step in selecting network metrics is a correlation analysis to minimise this risk. For example, the R package Central Informative Nodes in Network Analysis (CINNA; see Ashtiani et al. 2019) enables comparisons across numerous measures of centrality to identify the most important metrics using Principal Component Analysis (PCA) and pairwise associations.
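CINNA itself is an R package. As a language-neutral illustration of the pairwise-association idea only (not CINNA's interface), the sketch below computes Pearson correlations between per-node metric scores and flags highly correlated pairs as candidates for redundancy. The Degree and Closeness scores are those of a four-node chain A-B-C-D; the third metric and the 0.9 threshold are hypothetical choices. A rank correlation such as Spearman's may be preferable when metric distributions are skewed.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / (sd_x * sd_y)


def redundant_pairs(
    metrics: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return metric pairs whose absolute correlation meets the threshold."""
    flagged = []
    for a, b in combinations(metrics, 2):
        r = pearson(metrics[a], metrics[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Per-node scores for nodes A, B, C, D of the chain A-B-C-D.
node_scores = {
    "degree": [1, 2, 2, 1],
    "closeness": [0.5, 0.75, 0.75, 0.5],
    "hypothetical_metric": [3.0, 1.0, 4.0, 2.0],  # invented values for illustration
}

if __name__ == "__main__":
    for a, b, r in redundant_pairs(node_scores):
        print(f"{a} ~ {b}: r = {r:.2f}")  # degree ~ closeness: r = 1.00
```

On this chain, Degree and Closeness rank the nodes identically, so reporting both adds no information; on larger, less regular networks the correlation will usually be weaker, which is why the check is worth running on the actual data.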

Are some metrics more versatile than others? Can common metrics consistently describe the same characteristic across contexts?

The results in the “Network metrics” and “Network characteristics” sections highlight the diversity of system and entity characteristics that metrics can describe. Moreover, the above discussion has alluded to versatility amongst metrics, in that two metrics can describe the same thing and terminology is interchangeable. This raises the question: does interchangeable terminology represent versatility, or inconsistency in reporting? Furthermore, beyond the most common metrics, what about the less popular ones?

In favour of the inconsistency argument is the fact that some studies (albeit a small percentage) provided no rationale or explanation for their choice of metrics. Katerndahl (2012) uses SNA to understand how research collaboration within academic faculties affects productivity at the individual and departmental levels, but provides no definition of, or rationale for, the use of Degree, Betweenness and Eigenvector Centrality. Similarly, Comfort et al. (2013) and Oh (2017) provide no rationale for their selection of metrics. Kim and Hastak (2018) provide no rationale for Density, yet describe Degree, Betweenness and Eigenvector centralities as metrics that explain “prominence or importance”, without distinguishing how these three centrality metrics differ or why all three are required to measure the same concept. In contrast, Liu and Lim (2016) provide no definitions for the centrality metrics Betweenness and Degree, yet provide definitions and interpretations of Centralisation and Density. Moreover, Comfort and Zhang (2020) explain the rationale behind Betweenness Centrality and the External/Internal Index, but omit any explanation of Density or Diameter. Tozer and Klenk (2019) use only Degree in a bibliometric analysis but provide no rationale as to what it represents. Ma et al. (2020) simply state that degree measures structure. Finally, Pheungpha et al. (2019) do not specify which metrics are used, and Zelenkauskaite et al. (2012) do not specify which measure of centrality is used. In these instances, the analysis appears to have been qualitative, with the relationships (i.e. who was connected to whom) of primary interest. Rather than specific failures to outline methodological choices adequately, we believe these instances speak to a larger issue of “letting the researcher decide” how to communicate network analysis in non-mathematical fields. This is a barrier to a more transparent, higher standard of interdisciplinary network science.

In terms of versatility, it appears that the question is not always which characteristic is being measured, but how. The most frequently occurring characteristic is connectivity. In Routing Problem-based methods, this typically refers to how connections between nodes are disrupted by a hazard, such that the optimal path length is affected by a loss of connectivity (Espada et al. 2015). Accessibility, also a frequently appearing characteristic, is interchangeable with connectivity in this context. Connectivity is also used as a generic term when adopting centrality metrics, with Degree, Betweenness and Closeness describing different aspects of it. For example, Čerba et al. (2017) describe the connectivity of semantic resources in terms of quantity (Degree), distance and relation (Closeness), and whether nodes act as bridges between otherwise unconnected or only indirectly connected nodes (Betweenness). Optimal connectivity is therefore attributed to a node that is connected to many others, acts as a bridge, and is close to all other nodes. However, whilst connectivity in this instance is a characteristic described by three measures of centrality, there are several examples of connectivity being measured without the well-known and oft-used centrality metrics. Derudder and Taylor (2005) use the GNC metric to measure a city's connectivity in relation to other cities; this metric does not consider centrality. Furthermore, “Connectivity” is also a metric in its own right, representing the minimum number of nodes or edges that would need to be removed to fragment the network into two or more isolated subgroups (Diestel 2005), and Samarasinghe and Strickert (2013) claim that Density is a global indicator of connectivity. It is no surprise that connectivity is the most frequently occurring characteristic, given that networks are fundamentally about connection.
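The graph-theoretic sense of “Connectivity” can be illustrated directly. The sketch below finds the minimum number of nodes whose removal fragments an undirected network, by brute force over node subsets; this is only feasible for very small networks, and dedicated algorithms exist in graph libraries. The three example networks are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def is_connected(graph: Graph, removed: Iterable[str] = ()) -> bool:
    """True if the nodes left after removing `removed` form one component."""
    removed = set(removed)
    remaining = [v for v in graph if v not in removed]
    if len(remaining) <= 1:
        return True
    seen = {remaining[0]}
    stack = [remaining[0]]
    while stack:
        v = stack.pop()
        for w in graph[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(remaining)


def node_connectivity(graph: Graph) -> int:
    """Minimum number of nodes whose removal disconnects the network.

    Brute force over node subsets, so only suitable for very small networks.
    A complete network cannot be disconnected; by convention it scores n - 1.
    """
    n = len(graph)
    for k in range(n - 1):
        for subset in combinations(graph, k):
            if not is_connected(graph, subset):
                return k
    return n - 1


# Hypothetical networks.
two_clusters = from_edges([
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
    ("D", "E"), ("E", "F"), ("E", "G"), ("F", "G"),
])
ring = from_edges([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
complete = from_edges(combinations("ABCD", 2))

if __name__ == "__main__":
    for name, g in [("two_clusters", two_clusters), ("ring", ring), ("complete", complete)]:
        print(name, node_connectivity(g))
```

A single intermediary makes the first network fragile (connectivity 1), a ring tolerates any single loss (2), and a fully connected group of four scores 3.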

The same applies to importance and influence. Kim and Hastak (2018) state that Degree, Betweenness and Eigenvector centrality are used to explain the importance of actors in an SNA of post-disaster social media data. Taking a selection of examples from the application of SNA to response networks in disaster management, Calliari et al. (2019) use Degree to assess the influence of the most central actors in the network, Celik and Corbacioglu (2018) use Degree to highlight the most important and well-connected actors, and Celik and Corbacioglu (2016) and Cui and Li (2020) both measure the power of actors using Degree. Moreover, Celik and Corbacioglu (2016) use Betweenness to measure an actor's position in the network, yet Htein et al. (2018) measure such actor-level positioning using Degree (with respect to Centralisation). Mathematically, Degree Centrality is simply the number of other nodes to which a given node is connected (Freeman 1979). In the context of social networks, it is therefore reasonable to expect that a well-connected actor plays a prominent role in the network and possesses influence and importance. However, the nature of this influence and importance is not solely a function of the number of connections. For instance, Meilani and Hardjosoekarto (2020) and Chen et al. (2020) distinguish between Degree and Eigenvector Centrality by stating that the latter measures power not by how many connections a node has, but by who those connections are. In both examples, nodes represent actors in disaster risk reduction efforts after an event, and power is identified by examining the nodes that score highly on Eigenvector Centrality, thereby identifying the most powerful actors differently from Degree Centrality.
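The distinction between “how many” and “who” can be reproduced on a toy network. The sketch below computes Eigenvector Centrality by power iteration on the adjacency matrix, assuming a connected, undirected, non-bipartite network so that the iteration converges. In the hypothetical network, actor "X" has four ties, mostly to peripheral actors, while actor "Y" has only two ties, both into a tightly knit core; Y nevertheless scores higher. All names are illustrative.

```python
from math import sqrt
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def eigenvector_centrality(
    graph: Graph, max_iter: int = 1000, tol: float = 1e-12
) -> Dict[str, float]:
    """Power iteration on the adjacency matrix, normalised to unit length.

    Assumes a connected, undirected, non-bipartite network.
    """
    scores = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        # A node's new score is the sum of its neighbours' current scores.
        updated = {v: sum(scores[w] for w in graph[v]) for v in graph}
        norm = sqrt(sum(s * s for s in updated.values()))
        updated = {v: s / norm for v, s in updated.items()}
        if max(abs(updated[v] - scores[v]) for v in graph) < tol:
            return updated
        scores = updated
    return scores


network = from_edges([
    ("P", "Q"), ("P", "R"), ("P", "S"), ("Q", "R"), ("Q", "S"), ("R", "S"),  # core
    ("Y", "P"), ("Y", "Q"),                                                  # Y: two core ties
    ("X", "S"), ("X", "L1"), ("X", "L2"), ("X", "L3"),                       # X: mostly peripheral ties
])

if __name__ == "__main__":
    ev = eigenvector_centrality(network)
    for actor in ("X", "Y"):
        print(actor, "degree:", len(network[actor]), "eigenvector:", round(ev[actor], 3))
```

Degree ranks X above Y, whereas Eigenvector Centrality reverses the order, because Y's few ties lead to highly connected actors.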

Whilst centrality metrics are most popular where SNA is concerned, of particular interest is how else these metrics have been applied. A number of studies adopted centrality measures outwith SNA and described entities other than people or organisations. For example, Lao et al. (2016) use degree centrality to weight the edges of their network to represent air passengers, thus providing a measure of a city's centrality. Arora and Ventresca (2018) use Betweenness and Closeness centrality for preferential linking in the synthesis of resilient Supply Chain Networks (SCN), where the centrality measures act as proxies for price, performance and quality. Mu et al. (2020) examine the spatial distribution of green space and physical factors to explore alternative green space planning strategies using Degree, Closeness and Betweenness. Garrett et al. (2017) adopt Degree, Closeness, Betweenness and Eigenvector as measures of centrality to explore food security and agricultural networks (alongside Cliques, Diameter and Path Length). In disaster management, centrality measures are typically associated with road networks and critical infrastructure; Fan and Mostafavi (2019) use degree centrality with social media data in a graph-based event detection model to identify disruption of critical infrastructure. Papilloud et al. (2020) characterise the flood exposure of a road network using Edge Betweenness Centrality (EBC), and Sasabe et al. (2020) also apply EBC in road network risk analysis. Together with Lao et al. (2016), these were the only three instances in which Betweenness Centrality was measured at edges instead of nodes.
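Edge Betweenness Centrality follows the same logic as the node version but credits each link, which is why it suits road segments. The sketch below is a Brandes-style edge variant for unweighted, undirected networks, applied to a hypothetical road network of two small loops joined by a single link C-D. It is an illustrative sketch, not the procedure of the cited studies, which may use travel-time weights.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]
Edge = FrozenSet[str]


def from_edges(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency dictionary from an edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def edge_betweenness(graph: Graph) -> Dict[Edge, float]:
    """Brandes-style edge betweenness (unweighted, undirected, unnormalised)."""
    scores: Dict[Edge, float] = {frozenset((v, w)): 0.0 for v in graph for w in graph[v]}
    for source in graph:
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                # Share of w's dependency carried by the edge (v, w).
                share = sigma[v] / sigma[w] * (1 + delta[w])
                scores[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair of nodes was counted once from each end.
    return {edge: s / 2 for edge, s in scores.items()}


# Hypothetical road network: two loops joined by the single link C-D.
roads = from_edges([
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
])

if __name__ == "__main__":
    for edge, score in sorted(edge_betweenness(roads).items(), key=lambda item: -item[1]):
        print("-".join(sorted(edge)), score)
```

The connecting link C-D carries all nine cross-loop trips and scores highest, flagging it as the segment whose flooding would most disrupt movement.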

Whilst interchangeable terminology for some metrics is a prevalent theme emerging from this review, there are instances in which the metric being described is more definitive. A metric that is terminologically consistent across studies is Throughflow, used in ENA. Whilst the nature of the flows may vary between studies, the purpose of applying the metric remains the same. Throughflow is classed as both a global and a local metric. Locally, the flows can measure the importance of a node, whereas at the system level, the Total System Throughflow (TST) can indicate whether a system is at steady state, i.e. whether the sum of all inflows equals the sum of all outflows. Measuring TST indicates the level of activity in the system in question, which can be useful for characterising the system's level of growth (e.g. economic growth in a city) (Bodini et al. 2012). This offers a useful insight into network analysis methods: SNA has clearly evolved beyond sociology in terms of method and metrics, and its sociological background has perhaps fostered the versatility, interchangeability and, at times, ambiguity as to what metrics actually mean for those interpreting the results. ENA, on the other hand, is far more clear-cut, as it depends on measuring flows of materials and resources, not the roles of individuals, which are far more difficult to quantify.
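The arithmetic behind throughflow is simple enough to show in full. The sketch below assumes one common convention: a node's throughflow is the sum of everything entering it (internal inflows plus boundary inputs), TST is the sum of these node throughflows, and the system is at steady state when every node's inflow equals its outflow (internal outflows plus boundary outputs). The three-sector flows are hypothetical and not taken from any reviewed study.

```python
from typing import Dict, Tuple

# Internal flows between compartments: (from, to) -> amount per unit time.
Flows = Dict[Tuple[str, str], float]


def throughflows(
    flows: Flows, inputs: Dict[str, float], outputs: Dict[str, float]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-node inflow-based and outflow-based throughflow."""
    nodes = set(inputs) | set(outputs) | {n for edge in flows for n in edge}
    inflow = {
        n: inputs.get(n, 0.0) + sum(f for (_, dst), f in flows.items() if dst == n)
        for n in nodes
    }
    outflow = {
        n: outputs.get(n, 0.0) + sum(f for (src, _), f in flows.items() if src == n)
        for n in nodes
    }
    return inflow, outflow


def total_system_throughflow(
    flows: Flows, inputs: Dict[str, float], outputs: Dict[str, float], tol: float = 1e-9
) -> Tuple[float, bool]:
    """Return (TST, steady_state), with TST as the sum of inflow-based throughflows."""
    inflow, outflow = throughflows(flows, inputs, outputs)
    steady = all(abs(inflow[n] - outflow[n]) <= tol for n in inflow)
    return sum(inflow.values()), steady


# Hypothetical three-sector system: imports enter A and leave via B and C.
flows: Flows = {("A", "B"): 6.0, ("A", "C"): 4.0, ("B", "C"): 3.0}
inputs = {"A": 10.0}
outputs = {"B": 3.0, "C": 7.0}

if __name__ == "__main__":
    print(total_system_throughflow(flows, inputs, outputs))  # (23.0, True)
```

Node throughflows of 10, 6 and 7 give a TST of 23; if sector C exported less than it received, the steady-state check would fail even though TST itself would not change.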

Moreover, there is also more consistency and less ambiguity in studies that adopt bespoke, composite, less popular or less generalisable metrics. Nakatani et al. (2018) demonstrate the adaptability of network analysis by using a well-established economic indicator, the Herfindahl–Hirschman Index (see Matsumoto et al. 2012), to measure the vulnerability of supply chains. In contrast to vulnerability (an assessment of weak network links) is criticality, which measures importance (Knoop et al. 2012). Mitsakis et al. (2016) adopt the Unified Network Performance Measure (UNPM) to assess the performance of a transportation network against technological and natural disasters. Building on the approach of Nagurney and Qiang (2008), the UNPM is an example of a metric developed to measure the performance of a network in a specific context; the meaning of “importance” is therefore less ambiguous than when it is described by centrality measures. Further examples of bespoke metrics are the Travel Alternative Diversity and Network Spare Capacity Dimension of Xu et al. (2018). These measure redundancy in a transport network and aim to quantify alternative travel routes and how much spare capacity the network has under normal and disruptive conditions.
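The Herfindahl–Hirschman Index is commonly computed as the sum of squared shares (shares in percent give a 0 to 10,000 scale instead of 0 to 1). How Nakatani et al. (2018) define the shares is not described here, so the sketch below is a generic, hypothetical illustration: higher values indicate supply concentrated in fewer sources, which is the sense in which the index can signal vulnerability.

```python
from typing import Dict


def herfindahl_hirschman(volumes: Dict[str, float]) -> float:
    """HHI as the sum of squared shares: 1/n for n equal sources, 1 for a single source."""
    total = sum(volumes.values())
    if total <= 0:
        raise ValueError("volumes must sum to a positive number")
    return sum((v / total) ** 2 for v in volumes.values())


# Hypothetical supply volumes of one input from three suppliers.
concentrated = {"supplier_1": 50.0, "supplier_2": 30.0, "supplier_3": 20.0}
# Hypothetical alternative: the same input spread evenly over four suppliers.
diversified = {f"supplier_{i}": 25.0 for i in range(1, 5)}

if __name__ == "__main__":
    print(round(herfindahl_hirschman(concentrated), 2))  # 0.38
    print(herfindahl_hirschman(diversified))             # 0.25
```

Because its definition is fixed, the index means the same thing wherever it is applied, which illustrates the point about bespoke metrics being less ambiguous.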

Conclusions

Summary of findings

The modern world is highly complex and comprises numerous interconnected entities. Understanding the interactions between these parts is of great importance if the fragility of the urban system is to be understood and mitigated. This requires advanced analytical methods that can quantify the importance of a particular component and examine its role in the system with respect to other parts on which it may rely. Network methods have increased significantly in popularity as the availability of data and specialised software enables such analyses. Network analysis is not a restrictive technique: it can be applied across various contexts and domains, from understanding the key actors and processes in social networks to supply-chain management. It is highly diverse, even beyond the scope of this review, which covers disaster management and urban systems exclusively. Within the confines of these research domains, this review highlights key issues in network analysis related to: the range of network methods in disaster management and urban systems; the selection of appropriate metrics to describe characteristics, and whether metrics are versatile; and how many metrics are necessary for a holistic analysis.

First, the concepts of graph theory and network analysis have developed well beyond their original application (SNA). Network analysis has developed into a research domain of its own and has become highly diverse in terms of the problems it seeks to answer, spanning the analysis of social, ecological and economic systems and infrastructure, to name a few. In addition, it serves as a standalone method, but also as a supplementary method for simulation and modelling problems, such as human behaviour. SNA remains the most popular method, but it now extends beyond social entities. However, despite this diversity, siloing exists between the research domains examined in this review. Answering “who uses what, and where?”, we find that most of this diversity is found in the wider urban systems literature. Disaster management could take inspiration from this domain, as it may benefit from applying SNA more broadly, outwith the context of EMN. Furthermore, methods such as ENA may be appropriate for systems-oriented approaches, and network analysis coupled with ABM is gaining traction. It should also be emphasised that whilst we discuss methodological diversity, this is constrained to the two research domains examined in this review. As a result, there are likely to be additional methods in network analysis applications more broadly that this review has not covered.

Second, answering “who uses what, and why?”, we find that centrality metrics have evolved in tandem with the methodologies. Although sociological in origin, centrality metrics are not constrained to quantifying the properties and characteristics of social entities; they can be applied to networks describing cities and air passenger flows. However, the versatility of these metrics has led to diverging meanings and interchangeable terminology. This makes it difficult to identify an appropriate metric with which to analyse a network, as the same metric may describe one characteristic in one instance and something different in another. Furthermore, outwith centrality, there is a vast range of metrics that have been developed for a specific purpose, making them ‘bespoke’. Other metrics combine established indices with network principles (such as centrality) to form composite metrics. This review identified 79 metrics in total, many of which are not discussed here; however, it is clear that many available metrics are not being used.

Finally, addressing the raison d’être of this review further, knowing when to select one metric rather than another, and how many metrics to adopt in an analysis, is a point of importance. This review finds that the average number of metrics adopted is three; however, nearly 20% of papers adopt only one. Given that metrics can be categorised as either global or local, selecting only one metric potentially fails to capture the necessary characteristics of the network. In contrast, some studies adopt up to eight metrics in an attempt to capture all characteristics. However, this approach falls short because of the fuzzy lexicon of certain metrics, whereby multiple metrics are reported to describe the same thing. This potentially results in redundant analysis and multicollinearity between metrics.

A way forward

In light of the above findings, we outline a way forward for researchers embarking on network analysis. The following steps act as points to consider when approaching problems that require the analysis of a network.

Define target concepts and/or frameworks that support the study objectives. It is then possible to map these concepts to the entities in the network and establish which characteristics are being measured. Characteristics can then be mapped to appropriate metrics. We refer the reader to the supplementary material for a full list of the metrics identified in this review and their definitions. One can also cast the net wider and use the CINNA package in R (see Ashtiani et al. 2019), which provides metrics beyond those identified in this review.

Make a metrics shortlist. Review applications of similar metrics in the research domain of interest and draw up a shortlist of metrics for testing, ensuring that they are in line with your target concepts.

Perform an exploratory metric analysis. Using packages such as CINNA, it is possible to perform PCA and correlation analysis on a multitude of metrics. PCA describes which of the shortlisted metrics contribute most to the analysis and how much of the variance in the results they explain. Correlation analysis is a quick way of identifying redundant metrics. Revise the shortlist if necessary.

Adopt more than one metric and understand the maths. Mapping concepts such as social characteristics to mathematical formulas is difficult, and one may interpret the results differently from what the maths describes. It is therefore important that more than one metric is adopted in the final analysis, to get an idea of how the interpretation of results may change depending on the metric outputs in question. Furthermore, where possible, describe the network at both local and global scales.

Be explicit. Outline explicitly the rationale behind metric selection, what the metrics aim to measure and describe, how the maths translates to the target concepts, and the rules of interpretation.
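The exploratory metric analysis step can also be approximated outside R. The sketch below is not CINNA's interface: it builds a correlation matrix from per-node metric scores and uses power iteration to extract only the first principal component of the standardised metrics, reporting the share of variance it explains and each metric's loading. The scores are Degree and Closeness for a four-node chain plus a hypothetical third metric; a full PCA would also examine the remaining components.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / (sd_x * sd_y)


def correlation_matrix(metrics: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[float]]]:
    names = list(metrics)
    return names, [[pearson(metrics[a], metrics[b]) for b in names] for a in names]


def leading_component(matrix: List[List[float]], iterations: int = 500) -> Tuple[float, List[float]]:
    """Power iteration for the largest eigenpair of a symmetric positive semi-definite matrix."""
    size = len(matrix)
    vector = [1.0] * size  # assumes the start is not orthogonal to the leading eigenvector
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
        norm = sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(vector[i] * product[i] for i in range(size))  # Rayleigh quotient
    return eigenvalue, vector


def first_component_summary(metrics: Dict[str, Sequence[float]]) -> Tuple[float, Dict[str, float]]:
    """Share of standardised variance explained by the first component, and its loadings."""
    names, corr = correlation_matrix(metrics)
    eigenvalue, loadings = leading_component(corr)
    # The trace of a correlation matrix equals the number of metrics.
    return eigenvalue / len(names), dict(zip(names, loadings))


node_scores = {
    "degree": [1, 2, 2, 1],
    "closeness": [0.5, 0.75, 0.75, 0.5],
    "hypothetical_metric": [3.0, 1.0, 4.0, 2.0],  # invented values for illustration
}

if __name__ == "__main__":
    share, loadings = first_component_summary(node_scores)
    print(f"first component explains {share:.1%} of the variance")
    print({name: round(value, 3) for name, value in loadings.items()})
```

Here the first component explains two thirds of the variance and loads equally on Degree and Closeness, with nothing on the third metric, suggesting that one of the first two could be dropped from the shortlist.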

Availability of data and materials

Not applicable.

Abbreviations

ABM: Agent-based modelling

FRM: Flood risk management

CAS: Complex adaptive systems

CINNA: Central informative nodes in network analysis

ENA: Ecological network analysis

EBC: Edge betweenness centrality

EMN: Emergency management networks

GNC: Global network connectivity

PCA: Principal component analysis

SNA: Social network analysis

TST: Total system throughflow

UNPM: Unified network performance measure

Alfieri L, Bisselink B, Dottori F, Naumann G, de Roo A, Salamon P, Wyser K, Feyen L (2017) Global projections of river flood risk in a warmer world. Earth's Future 5(2):171–182

Arora V, Ventresca M (2018) Modeling topologically resilient supply chain networks. Appl Netw Sci 3(1):3–19. https://doi.org/10.1007/s41109-018-0070-7

Arosio M, Martina MLV, Figueiredo R (2020) The whole is greater than the sum of its parts: a holistic graph-based assessment approach for natural hazard risk of complex systems. Nat Hazards Earth Syst Sci 20(2):521–547. https://doi.org/10.5194/nhess-20-521-2020

Ashtiani M, Mirzaie M, Jafari M (2019) CINNA: an R/CRAN package to decipher central informative nodes in network analysis. Bioinformatics 35(8):1436–1437. https://doi.org/10.1093/bioinformatics/bty819

Balsiger J, Ingold K (2016) In the eye of the beholder: network location and sustainability perception in flood prevention. Environ Policy Gov 26(4):242–256. https://doi.org/10.1002/eet.1715

Bedinger M, Beevers L, Collet L, Visser A (2019) Are we doing ‘systems’ research? A review of methods for climate change adaptation to hydro-hazards in a complex world. Sustainability 11(4):1163. https://doi.org/10.3390/su11041163

Beevers L, Bedinger M, McClymont K, Morrison D, Aitken G, Quinn AV (2022) Modelling Systematic COVID-19 impacts in cities. Springer, Nature (Urban Sustainability)

Bodini A (2012) Building a systemic environmental monitoring and indicators for sustainability: what has the ecological network approach to offer? Ecol Ind 15(1):140–148. https://doi.org/10.1016/j.ecolind.2011.09.032

Bodini A, Bondavalli C, Allesina S (2012) Cities as ecosystems: growth, development and implications for sustainability. Ecol Model 245:185–198. https://doi.org/10.1016/j.ecolmodel.2012.02.022

Borrett SR, Whipple SJ, Patten BC, Christian RR (2006) Indirect effects and distributed control in ecosystems: temporal variation of indirect effects in a seven-compartment model of nitrogen flow in the Neuse River Estuary, USA—time series analysis. Ecol Model 194(1–3):178–188. https://doi.org/10.1016/j.ecolmodel.2005.10.011

Bouwer LM, Crompton RP, Faust E, Höppe P, Pielke RA Jr (2007) Confronting disaster losses. Science 318(5851):753–753

Bozza A, Asprone D, Parisi F, Manfredi G (2017) Alternative resilience indices for city ecosystems subjected to natural hazards. Comput-Aided Civ Infrastruct Eng 32(7):527–545. https://doi.org/10.1111/mice.12275

Burns MC, Roca CJ, Moix BM (2008) The spatial implications of the functional proximity deriving from air passenger flows between European metropolitan urban regions. GeoJournal 71(1):37–52. https://doi.org/10.1007/s10708-008-9144-x

Calliari E, Michetti M, Farnia L, Ramieri E (2019) A network approach for moving from planning to implementation in climate change adaptation: evidence from southern Mexico. Environ Sci Policy 93:146–157. https://doi.org/10.1016/j.envsci.2018.11.025

Cavallo A, Ireland V (2014) Preparing for complex interdependent risks: a systems of systems approach to building disaster resilience. Int J Disaster Risk Reduct 9:181–193. https://doi.org/10.1016/j.ijdrr.2014.05.001

Celik S, Corbacioglu S (2016) From linearity to complexity: emergent characteristics of the 2006 Avian Influenza Response System in Turkey. Saf Sci 90:5–13. https://doi.org/10.1016/j.ssci.2016.01.006

Celik S, Corbacioglu S (2018) Organizational learning in adapting to dynamic disaster environments in Southern Turkey. J Asian Afr Stud 53(2):217–232. https://doi.org/10.1177/0021909616677368

Čerba O, Jedlička K, Čada V, Charvát K (2017) Centrality as a method for the evaluation of semantic resources for disaster risk reduction. ISPRS Int J Geo-Inf 6(8):237. https://doi.org/10.3390/ijgi6080237

Ceron W, Santos LB, Neto GD, Quiles MG, Candido OA (2020) Community detection in very high-resolution meteorological networks. IEEE Geosci Remote Sens Lett 17(11):2007–2010. https://doi.org/10.1109/LGRS.2019.2955508

Chen S, Chen B, Su M (2015) Nonzero-sum relationships in mitigating urban carbon emissions: a dynamic network simulation. Environ Sci Technol 49(19):11594–11603. https://doi.org/10.1021/acs.est.5b02654

Chen W, Zhang H, Comfort LK, Tao Z (2020) Exploring complex adaptive networks in the aftermath of the 2008 Wenchuan earthquake in China. Saf Sci 125:104607. https://doi.org/10.1016/j.ssci.2020.104607

Clark-Ginsberg A, Abolhassani L, Rahmati EA (2018) Comparing networked and linear risk assessments: from theory to evidence. Int J Disaster Risk Reduct 30:216–224. https://doi.org/10.1016/j.ijdrr.2018.04.031

Comfort LK, Zhang H (2020) Operational networks: adaptation to extreme events in China. Risk Anal 40(5):981–1000. https://doi.org/10.1111/risa.13442

Comfort LK, Ertan G, Oh N, Haase T (2013) Network evolution in disaster management: a comparison of response systems evolving after the 2005 and 2008 gulf coast hurricanes. In: Proceedings of the 2013 IEEE 2nd international network science workshop, NSW 2013, pp 42–49. https://doi.org/10.1109/NSW.2013.6609193

Comfort LK, Bert J, Song JE (2016) Wicked problems in real time: uncertainty, information, and the escalation of Ebola. Inf Polity 21(3):273–289. https://doi.org/10.3233/IP-160394

Cristiano S, Zucaro A, Liu G, Ulgiati S, Gonella F (2020) On the systemic features of urban systems. A look at material flows and cultural dimensions to address post-growth resilience and sustainability. Front Sustain Cities 2:12. https://doi.org/10.3389/frsc.2020.00012

Cui P, Li D (2020) A SNA-based methodology for measuring the community resilience from the perspective of social capitals: take Nanjing, China as an example. Sustain Cities Soc 53:101880. https://doi.org/10.1016/j.scs.2019.101880

DeLaurentis DA, Ayyalasomayajula S (2009) Exploring the synergy between industrial ecology and system of systems to understand complexity a case study in air transportation. J Ind Ecol 13(2):247–263. https://doi.org/10.1111/j.1530-9290.2009.00121.x

Der Sarkissian R, Abdallah C, Zaninetti JM, Najem S (2020) Modelling intra-dependencies to assess road network resilience to natural hazards. Nat Hazards 103(1):121–137. https://doi.org/10.1007/s11069-020-03962-5

Derudder B, Taylor P (2005) The cliquishness of world cities. Glob Netw 5(1):71–91. https://doi.org/10.1111/j.1471-0374.2005.00108.x

Diestel R (2005) Graph theory, 3rd edn. Graduate texts in mathematics, vol 173. Springer, p 33

Dong S, Mostafizi A, Wang H, Gao J, Li X (2020) Measuring the topological robustness of transportation networks to disaster-induced failures: a percolation approach. J Infrastruct Syst 26(2):04020009. https://doi.org/10.1061/(asce)is.1943-555x.0000533

Du L, Feng Y, Tang LY, Kang W, Lu W (2020) Networks in disaster emergency management: a systematic review. Nat Hazards. https://doi.org/10.1007/s11069-020-04009-5

Espada RJ, Apan A, McDougall K (2015) Vulnerability assessment and interdependency analysis of critical infrastructures for climate adaptation and flood mitigation. Int J Disaster Resil Built Environ 6(3):313–346. https://doi.org/10.1108/IJDRBE-02-2014-0019

Faas AJ, Velez ALK, FitzGerald C, Nowell BL, Steelman TA (2017) Patterns of preference and practice: bridging actors in wildfire response networks in the American Northwest. Disasters 41(3):527–548. https://doi.org/10.1111/disa.12211

Fan Y, Fang C (2019) Research on the synergy of urban system operation—based on the perspective of urban metabolism. Sci Total Environ 662:446–454. https://doi.org/10.1016/j.scitotenv.2019.01.252

Fan C, Mostafavi A (2019) A graph-based method for social sensing of infrastructure disruptions in disasters. Comput-Aided Civ Infrastruct Eng 34(12):1055–1070. https://doi.org/10.1111/mice.12457

Fath BD, Scharler UM, Ulanowicz RE, Hannon B (2007) Ecological network analysis: network construction. Ecol Model 208(1):49–55. https://doi.org/10.1016/j.ecolmodel.2007.04.029

Finn JT (1980) Flow analysis of models of the Hubbard Brook ecosystem. Ecology 61(3):562–571

Freeman LC (1979) Centrality in social networks. Soc Netw 1(3):215–239. https://doi.org/10.1016/0378-8733(78)90021-7

Gao H, Tian X, Zhang Y, Shi L, Shi F (2021) Evaluating circular economy performance based on ecological network analysis: a framework and application at city level. Resour Conserv Recycl 168:105257. https://doi.org/10.1016/j.resconrec.2020.105257

Garrett KA, Andersen KF, Asche F, Bowden RL, Forbes GA, Kulakow PA, Zhou B (2017) Resistance genes in global crop breeding networks. Phytopathology 107(10):1268–1278. https://doi.org/10.1094/PHYTO-03-17-0082-FI

Giustolisi O, Ridolfi L, Simone A (2019) Tailoring centrality metrics for water distribution networks. Water Resour Res 55(3):2348–2369. https://doi.org/10.1029/2018WR023966

Granovetter M (1992) Problems of explanation in economic sociology. In: Eccles R, Nohria N (eds) Networks and organizations: structure, form, and action. Harvard Business School Press, Boston, pp 25–56

Guettiche M, Kheddouci H (2019) Critical links detection in stochastic networks: application to the transport networks. Int J Intell Comput Cybern 12(1):42–69. https://doi.org/10.1108/IJICC-04-2018-0055

He D, Sun Z, Gao P (2019) Development of economic integration in the central Yangtze River Megaregion from the perspective of urban network evolution. Sustainability (Switzerland). https://doi.org/10.3390/su11195401

Hossain L, Kuti M (2010) Disaster response preparedness coordination through social networks. Disasters 34(3):755–786. https://doi.org/10.1111/j.1467-7717.2010.01168.x

Htein MK, Lim S, Zaw TN (2018) The evolution of collaborative networks towards more polycentric disaster responses between the 2015 and 2016 Myanmar floods. Int J Disaster Risk Reduct 31:964–982. https://doi.org/10.1016/j.ijdrr.2018.08.003

Jin L, Jiong W, Yang D, Huaping W, Wei D (2014) A simulation study for emergency/disaster management by applying complex networks theory. J Appl Res Technol 12(2):223–229

Katerndahl D (2012) Co-evolution of departmental research collaboration and scholarly outcomes. J Eval Clin Pract 18(6):1241–1247. https://doi.org/10.1111/j.1365-2753.2012.01881.x

Kim J, Hastak M (2018) Social network analysis: characteristics of online social networks after a disaster. Int J Inf Manag 38(1):86–96. https://doi.org/10.1016/j.ijinfomgt.2017.08.003

Knoop VL, Snelder M, van Zuylen HJ, Hoogendoorn SP (2012) Link-level vulnerability indicators for real-world networks. Transp Res Part A Policy Pract 46(5):843–854

Lao X, Zhang X, Shen T, Skitmore M (2016) Comparing China’s city transportation and economic networks. Cities 53:43–50. https://doi.org/10.1016/j.cities.2016.01.006

Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA, Clarke M, Devereaux PJ, Kleijnen J, Moher D (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. PLoS Med 6(7):e1000100–e1000128. https://doi.org/10.1371/journal.pmed.1000100

Liu X, Lim S (2016) Integration of spatial analysis and an agent-based model into evacuation management for shelter assignment and routing. J Spat Sci 61(2):283–298

Liu GY, Yang ZF, Chen B, Zhang Y (2011) Ecological network determination of sectoral linkages, utility relations and structural characteristics on urban ecological economic system. Ecol Model 222(15):2825–2834. https://doi.org/10.1016/j.ecolmodel.2011.04.034

Ma X, Liu W, Zhou X, Qin C, Chen Y, Xiang Y, Zhang X, Zhao M (2020) Evolution of online public opinion during meteorological disasters. Environ Hazards 19(4):375–397. https://doi.org/10.1080/17477891.2019.1685932

Matsumoto A, Merlone U, Szidarovszky F (2012) Some notes on applying the Herfindahl–Hirschman Index. Appl Econ Lett 19(2):181–184. https://doi.org/10.1080/13504851.2011.570705

McClymont K, Bedinger M, Beevers L, Walker G, Morrison D (2021) Analysing city-scale resilience using a novel systems approach. In: Santos PP, Chmutina K, Von Meding J, Raju E (eds) Understanding disaster risk: a multidimensional approach, 1st edn. Elsevier, Amsterdam, pp 179–201. https://doi.org/10.1016/B978-0-12-819047-0.00011-1

Meilani NL, Hardjosoekarto S (2020) Digital weberianism bureaucracy: alertness and disaster risk reduction (DRR) related to the Sunda Strait volcanic tsunami. Int J Disaster Risk Reduct 51:101898. https://doi.org/10.1016/j.ijdrr.2020.101898

Miele V, Matias C, Robin S, Dray S (2019) Nine quick tips for analyzing network data. PLoS Comput Biol 15(12):1–10. https://doi.org/10.1371/journal.pcbi.1007434

Mitsakis E, Salanova JM, Stamos I, Chaniotakis E (2016) Network criticality and network complexity indicators for the assessment of critical infrastructures during disasters. Springer, Cham, pp 191–205

Mu B, Liu C, Tian G, Xu Y, Zhang Y, Mayer AL, Lv R, He R, Kim G (2020) Conceptual planning of urban-rural green space from a multidimensional perspective: a case study of Zhengzhou, China. Sustainability 12(7):1–20. https://doi.org/10.3390/su12072863

Nagurney A, Qiang Q (2008) A network efficiency measure with application to critical infrastructure networks. J Glob Optim 40(1):261–275

Nakatani J, Tahara K, Nakajima K, Daigo I, Kurishima H, Kudoh Y, Matsubae K, Fukushima Y, Ihara T, Kikuchi Y, Nishijima A, Moriguchi Y (2018) A graph theory-based methodology for vulnerability assessment of supply chains using the life cycle inventory database. Omega 75:1339–1351. https://doi.org/10.1016/j.omega.2017.03.003

Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford

Oh N (2017) Dimensions of strategic intervention for risk reduction and mitigation: a case study of the MV Sewol incident. J Risk Res 20(12):1516–1533. https://doi.org/10.1080/13669877.2016.1179210

Ongkowijoyo C, Doloi H (2017) Determining critical infrastructure risks using social network analysis. Int J Disaster Resil Built Environ 8(1):5–26. https://doi.org/10.1108/IJDRBE-05-2016-0016

Papilloud T, Röthlisberger V, Loreti S, Keiler M (2020) Flood exposure analysis of road infrastructure: comparison of different methods at national level. Int J Disaster Risk Reduct. https://doi.org/10.1016/j.ijdrr.2020.101548

Pastor-Escuredo D, Torres Y, Martínez-Torres M, Zufiria PJ (2020) Rapid multi-dimensional impact assessment of floods. Sustainability (Switzerland). https://doi.org/10.3390/su12104246

Pederson P, Dudenhoeffer D, Hartley S, Permann M (2006) Critical infrastructure interdependency modelling: a survey of critical infrastructure interdependency modelling. Technical report. Idaho National Laboratory. https://doi.org/10.2172/911792

Pheungpha N, Supriyono B, Wijaya AF, Sujarwoto S (2019) Modes of network governance in disaster relief: the case of the Bangkok flood relief, 2011. Public Administr Issues 2019(6):77–93. https://doi.org/10.17323/1999-5431-2019-0-6-77-93

Pumpuni-Lenss G, Blackburn T, Garstenauer A (2017) Resilience in complex systems: an agent-based approach. Syst Eng 20(2):158–172

Radulescu CM, Slava S, Radulescu AT, Toader R, Toader DC, Boca GD (2020) A pattern of collaborative networking for enhancing sustainability of smart cities. Sustainability (Switzerland). https://doi.org/10.3390/su12031042

Rodrigueza RC, Estuar MRJE (2018) Social network analysis of a disaster behavior network: an agent-based modeling approach. In: Proceedings of the 2018 IEEE/ACM international conference on advances in social networks analysis and mining, ASONAM 2018, pp 1100–1107. https://doi.org/10.1109/ASONAM.2018.8508651

Romascanu A, Ker H, Sieber R, Greenidge S, Lumley S, Bush D, Morgan S, Zhao R, Brunila M (2020) Using deep learning and social network analysis to understand and manage extreme flooding. J Contingencies Crisis Manag 28(3):251–261. https://doi.org/10.1111/1468-5973.12311

Saberi M, Khosrowabadi R, Khatibi A, Misic B, Jafari G (2021) Topological impact of negative links on the stability of resting-state brain network. Sci Rep. https://doi.org/10.1038/s41598-021-81767-7 (PMCID: PMC7838299)

Samarasinghe S, Strickert G (2013) Mixed-method integration and advances in fuzzy cognitive maps for computational policy simulations for natural hazard mitigation. Environ Model Softw 39:188–200. https://doi.org/10.1016/j.envsoft.2012.06.008

Sasabe M, Fujii K, Kasahara S (2020) Road network risk analysis considering people flow under ordinary and evacuation situations. Environ Plan B Urban Anal City Sci 47(5):759–774. https://doi.org/10.1177/2399808318802940

Songchon C, Wright G, Beevers LC (2021) Quality assessment of crowdsourced social media data for urban flood management. Comput Environ Urban Syst. https://doi.org/10.1016/j.compenvurbsys.2021.101690

Sun Q, Wang S, Zhang K, Ma F, Guo X, Li T (2019) Spatial pattern of urban system based on gravity model and whole network analysis in eight urban agglomerations of China. Math Probl Eng. https://doi.org/10.1155/2019/6509726

Tan LM, Arbabi H, Li Q, Sheng Y, Densley TD, Mayfield M, Coca D (2018) Ecological network analysis on intra-city metabolism of functional urban areas in England and Wales. Resour Conserv Recycl 138:172–182. https://doi.org/10.1016/j.resconrec.2018.06.010

Tang P, Lai S (2019) A framework for managing public security risks with complex interactions in cities and its application evidenced from Shenzhen City in China. Cities 95:102390. https://doi.org/10.1016/j.cities.2019.102390

Tozer L, Klenk N (2019) Urban configurations of carbon neutrality: insights from the Carbon Neutral Cities Alliance. Environ Plan C Politics Space 37(3):539–557. https://doi.org/10.1177/2399654418784949

United Nations (2018) World Urbanization Prospects: The 2018 Revision. Online Edition

Van Meeteren M (2019) Urban system. The Wiley Blackwell Encyclopedia of Urban and Regional Studies, pp 1–11

van Rijmenam M (2013) A short history of big data. https://datafloq.com/read/big-data-history/239 . Accessed 22 Sept 2021

Wang Y, Taylor JE, Garvin MJ (2020) Measuring resilience of human-spatial systems to disasters: framework combining spatial-network analysis and fisher information. J Manag Eng 36(4):04020019. https://doi.org/10.1061/(asce)me.1943-5479.0000782

Xu X, Chen A, Jansuwan S, Yang C, Ryu S (2018) Transportation network redundancy: complementary measures and computational methods. Transp Res Part B Methodol 114:68–85. https://doi.org/10.1016/j.trb.2018.05.014

Xuefen L, Lim S (2016) Integration of spatial analysis and an agent-based model into evacuation management for shelter assignment and routing. J Spat Sci 61(2):283–298. https://doi.org/10.1080/14498596.2016.1147393

Yang Z, Gao W, Zhao X, Hao C, Xie X (2020) Spatiotemporal patterns of population mobility and its determinants in Chinese cities based on travel big data. Sustainability (Switzerland). https://doi.org/10.3390/SU12104012

Zelenkauskaite A, Bessis N, Sotiriadis S, Asimakopoulou E (2012) Interconnectedness of complex systems of internet of things through social network analysis for disaster management. In: Proceedings of the 2012 4th international conference on intelligent networking and collaborative systems, INCoS 2012, pp 503–508. https://doi.org/10.1109/iNCoS.2012.25

Zhang M (2010) Social network analysis: history, concepts, and research. In: Furht B (ed) Handbook of social network technologies and applications. Springer, Berlin, pp 3–21. https://doi.org/10.1007/978-1-4419-7142-5_1

Zheng W, Kuang A, Wang X, Chen J (2020) Measuring network configuration of the Yangtze River middle reaches urban agglomeration: based on modified radiation model. Chin Geogr Sci 30(4):677–694. https://doi.org/10.1007/s11769-020-1131-2

Acknowledgements

The work presented would not have been possible were it not for the funding provided by EPSRC.

This work was undertaken as part of the ‘Water Resilient Cities: Climate Uncertainty and Urban Vulnerability to Hydro-hazards’ project, funded by the Engineering and Physical Sciences Research Council (EPSRC). Grant number EP/NE30419/1.

Author information

Authors and Affiliations

School of Energy, Geosciences, Infrastructure and Society, Heriot-Watt University, William Arrol Building, Room W.A. 3.36/3.37, 2 Third Gait, Currie, Edinburgh, EH14 4AS, UK

D. Morrison, M. Bedinger, L. Beevers & K. McClymont

Contributions

DM, MB, LB and KM contributed to the design and implementation of the research. DM contributed to the main analysis. All authors contributed to writing the manuscript. There are no other persons who satisfied the criteria for authorship but are not listed. All authors read and approved the final manuscript.

Corresponding author

Correspondence to D. Morrison.

Ethics declarations

Consent for publication

Not required.

Competing interests

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Table S1. Network metric definitions.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Morrison, D., Bedinger, M., Beevers, L. et al. Exploring the raison d’etre behind metric selection in network analysis: a systematic review. Appl Netw Sci 7, 50 (2022). https://doi.org/10.1007/s41109-022-00476-w

Received: 01 October 2021

Accepted: 27 May 2022

Published: 14 July 2022

DOI: https://doi.org/10.1007/s41109-022-00476-w

Network analysis
Graph theory
Urban systems
Disaster management
Natural hazards

The Oxford Handbook of Quantitative Methods in Psychology, Vol. 1

23 Network Analysis: A Definitional Guide to Important Concepts

Harold D. Green Jr., RAND Corporation, Santa Monica, CA

Stanley Wasserman is the James H. Rudy Professor of Statistics, Psychology, and Sociology at Indiana University. He is also Research Fellow of the International Laboratory for Applied Network Research at the National Research University Higher School of Economics in Moscow, and has had faculty appointments in Minnesota and Illinois. Professor Wasserman was Founding Chair of the Department of Statistics at Indiana and Founding Editor and Coordinating Editor of the journal Network Science. He is coauthor of Social Network Analysis: Methods and Applications, and is an Honorary Fellow of the American Statistical Association, the International Statistical Institute, and the American Association for the Advancement of Science. He received his PhD in 1977 in Statistics from Harvard University. He is a member of the Department of Psychological and Brain Sciences and the Department of Statistics.

Published: 16 December 2013

Written for the social/behavioral scientist seeking to learn the fundamentals of network analysis, this chapter focuses on the core concepts of network methodology. There is a new wave of statistical models linking social structure to individual behaviors that promises to keep the social network paradigm at the forefront of social and behavioral science for many years to come. In the course of our discussion, we define network analysis, explain its development, and suggest its future evolution. We introduce the key concepts, methods, theory, and applications of network analysis, hopefully whetting the appetites of those unfamiliar with this area, encouraging them to learn more, particularly about statistical approaches to network analysis. Those with an existing knowledge of social network analysis or with more sophisticated quantitative backgrounds may find the chapter useful for basic references.

Introduction

This chapter is written for the social/behavioral scientist seeking to learn the fundamentals of network analysis before diving into the more complicated literature or to understand the basics of network analysis underlying twenty-first century network science. The core concepts of network analysis are tacitly included in a new wave of statistical models that link social structure to individual behaviors. Given the increase in network-based research, novices in social network analysis comprise a rapidly growing and increasingly important audience.

In the course of our discussion, we define network analysis, explain its development, and suggest its future evolution. We introduce the key concepts, methods, theory, and applications of network analysis, hopefully whetting the appetites of those unfamiliar with this area, encouraging them to learn more, particularly about statistical approaches to network analysis. Those with an existing knowledge of social network analysis or with more sophisticated quantitative backgrounds may find the chapter useful for basic references.

What Is a Social Network?

It’s a foregone conclusion that networks are everywhere and that the networks paradigm has become pervasive. Transportation networks ( Guimera, Mossa, Turtschi, & Amaral, 2005 ) move people and materials around the world. Trade networks ( Smith & White, 1992 ) support our growing global economy. Communication networks ( Barnett, 2001 ) make it easy for us to watch happenings in the Middle East from our living rooms in Minnesota. The power grid ( Albert, Albert, & Nakarado, 2004 ) balances our need for electricity through a network of sources and sinks. Complicated networks of protein interactions ( Li et al., 2004 ) ensure that we are metabolizing the food we eat.

Networks and systems have risen to special prominence in the past decades ( Barabasi, 2002 ; Watts, 2004 ) as computational skills have enabled us to visualize and analyze more and more complicated data. However, network thinking is not new, particularly with respect to how individuals interact with each other ( Moreno, 1934 ). Kinship ( D. R. White & Johansen, 2005 ), team work ( Cummings & Cross, 2003 ), collaboration ( Cross, Parker, & Borgatti, 2002 ), and nation-building ( Hafner-Burton, Kahler, & Montgomery, 2009 ) all focus on how individuals are related to each other and why they may choose to create or break social ties.

Our discussion focuses on the analysis of networks from a behavioral or social sciences perspective. We are interested in the analysis of networks that approximate the size, structure, and composition of actual social groups. In this context, a social network is (1) a set of actors who (2) have the possibility of being connected to each other by at least one type of relationship—for example, friendship.

Network Actors

Network actors are usually individuals ( Espelage, Green, & Wasserman, 2007b ), although other entities such as clubs ( Galaskiewicz, Wasserman, Rauschenbach, Bielefeld, & Mullaney, 1985 ), families ( Padgett & Ansell, 1993 ), organizations ( Borgatti & Foster, 2003 ), or nations ( Nemeth & Smith, 1985 ) have also been and continue to be actors in network studies. The only requirement is that actors must generally be at the same level of analysis; analyses and interpretations become complicated when network actors comprise a mix of people, places, and things. Network actors may also have the attributes—nominal, ordinal, or ratio—one would expect in standard behavioral science research. Nominal variables such as gender and race/ethnicity are common. Ordinal variables such as educational attainment or frequency of smoking are also common. Ratio variables such as grade point average or blood pressure have been used in network studies, but less frequently ( Scott, 2000c ; Wasserman & Faust, 1994 ; Wellman & Berkowitz, 1988 ).

Potential Connections Among Actors

In a social network, all actors have the possibility of being connected by at least one type of relationship ( Wasserman & Faust, 1994 ). From the social perspective, a relationship is simply an interaction between any two actors in the population of interest. Thus, friendship, exchange (of resources), advice-seeking, shared activities, romantic involvement, and kinship can all be considered relationships, or relations, from the network perspective. However, within any one relation, the ties between pairs of actors must all represent the same type of interaction. Relational data can be dichotomous, either present or absent; valued, with higher values representing a greater frequency or intensity; or signed, with valence indicating positive or negative affect. For example, whether I know Laura represents one sort of information, how well I know her is a different sort of information, and whether I like or dislike her is a third.

Representing Network Data

Normally network data are represented by a square sociomatrix, with g rows and columns, where g is the number of actors in the network. The value in the (i, j) cell represents the information we have about the relationship between i and j. Relationships can be directed, such as “knowing,” where actor i reports knowing actor j, but actor j does not necessarily report knowing actor i. One only has to be reminded that often school children know those children in the grades above them but don’t know those children in the grades below them to get the sense of a directed relationship. Directed relationships can lead to asymmetric sociomatrices, in which corresponding cells (i, j) and (j, i) are not necessarily the same. Other relationships, such as marriage (in its most frequent operationalization), are nondirected and lead to symmetric sociomatrices. That is, the definition of the relationship, marriage, implies that both actors in the dyad have agreed to the relationship. Actor attribute data are represented by g × m matrices, where g represents the number of actors in the network and m represents the number of attribute variables ( Wasserman & Faust, 1994 ). This rectangular matrix is the standard matrix used in most social science research.
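To make the two matrix types concrete, here is a minimal Python sketch using plain nested lists. The actors, the “knows” and “married” relations, and the attribute values are all hypothetical. A 1 in cell (i, j) means actor i reports the tie to actor j; the helper simply checks whether every (i, j) cell matches its (j, i) counterpart, which is what separates a nondirected relation from a directed one.

```python
# Hypothetical actors for a g = 4 network.
actors = ["Ann", "Bo", "Cy", "Di"]

# Directed "knows" relation: knows[i][j] == 1 means actor i reports knowing actor j.
knows = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]

# Nondirected "married to" relation: symmetric by definition.
married = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

def is_symmetric(matrix):
    """Return True if cell (i, j) equals cell (j, i) for every pair of actors."""
    g = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(g) for j in range(g))

# A g x m actor-by-attribute matrix (m = 2 hypothetical columns: age, smoker 0/1).
attributes = [
    [19, 0],
    [20, 1],
    [19, 1],
    [21, 0],
]

print(is_symmetric(knows))                  # False: "knowing" is directed
print(is_symmetric(married))                # True: marriage is nondirected
print(len(attributes), len(attributes[0]))  # 4 2
```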

This simple framework for social network data can be extended or modified fairly easily. Collecting more than one type of relationship among the same set of actors allows for the investigation of multivariate network data. One might, for example, be interested not only in the formal lines of communication within an organization but also in the informal lines of communication ( Borgatti & Cross, 2003 ). Further, rather than collecting data from all members of a well-defined population such as a classroom or a sports team ( sociocentric or complete network data), one might want to collect information about the network surrounding a particular individual or set of individuals ( egocentric [ Marsden, 1990 ], personal [ McCarty, 2002 ], or local networks). Finally, network analysis can accommodate what has been termed affiliation data ( Borgatti & Everett, 1997 ; Borgatti & Halgin, 2010; Breiger, 1974 ), in which members of the population are connected to events or clubs (things with which they are affiliated), rather than directly to each other. These data are often easily gathered from documentary and public sources and thus are often used when primary social network data cannot be collected.
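As an illustration of how affiliation data relate to ordinary actor-to-actor ties, the sketch below stores hypothetical club memberships as a rectangular actor-by-club matrix A and projects it both ways: A times its transpose counts the clubs each pair of actors shares, and the transpose times A counts the members each pair of clubs shares. This duality is the idea usually associated with Breiger (1974); the function names are our own, and plain lists keep the example dependency-free.

```python
# Hypothetical affiliation data: rows are actors, columns are clubs A, B, C.
# membership[i][k] == 1 means actor i belongs to club k.
membership = [
    [1, 1, 0],  # actor 0: clubs A, B
    [1, 0, 0],  # actor 1: club A
    [0, 1, 1],  # actor 2: clubs B, C
    [0, 0, 1],  # actor 3: club C
]

def project_actors(aff):
    """Actor-by-actor co-membership counts, i.e. A A^T with the diagonal set to 0."""
    g = len(aff)
    out = [[0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                out[i][j] = sum(a * b for a, b in zip(aff[i], aff[j]))
    return out

def project_events(aff):
    """Club-by-club shared-member counts, i.e. A^T A with the diagonal set to 0."""
    transposed = [list(col) for col in zip(*aff)]
    return project_actors(transposed)

co_members = project_actors(membership)
print(co_members)                  # [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
print(project_events(membership))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```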

We focus on complete network data collected via primary data collection. However, we will return to these other types of network data when appropriate throughout this chapter.

Social Networks Can Be Both Predictors and Outcomes in Social and Behavioral Research

Early in the history of social network analysis, social and behavioral scientists were interested primarily in describing the structural features of social networks. However, this quickly gave way to more interesting studies that aimed to connect social structure with the behaviors and attitudes of individuals and groups ( Borgatti & Foster, 2003 ; Luke & Harris, 2007 ; Shumate & Palazzolo, 2010 ). The key intuition here is that something about the structure of a network affects how people think or behave. For example, a denser, more well-connected network structure might be related to a high degree of consensus about who should take a particular political office ( Bienenstock, Bonacich, & Oliver, 1990 ). Or an individual who is better connected to others in the network might have access to more accurate information about the political race ( Freeman, 1977 ; Ibarra, 1993 ). In this case, network structure is considered a predictor .

However, researchers also began to realize that in some cases, an individual’s behaviors, attitudes, or attributes might themselves affect network structures. For example, White family networks might differ from those of Hispanic and African-American families ( Angel & Tienda, 1982 ). Or the fact that an individual smokes cigarettes may lead to greater numbers of friendships with other smokers and fewer friendships with nonsmokers ( Ennett & Bauman, 1994 ). Here, network structure is an outcome or a response variable.

Networks As Predictors

Generally, the focus of network studies is the individual. When studies investigate how an individual’s position in the network affects his/her behaviors and attitudes, we say that the focus is on social influence ( Marsden, 1981 ; Marsden & Friedkin, 1993 ; Steglich, Snijders, & Pearson, 2010 ; Valente, 1995 ), an evolutionary mechanism that assumes networks are predictors. That is, how does one’s social position or one’s connection to others with particular attributes influence one’s behaviors? Generally such studies focus on the “influencers” to whom an individual is directly connected. The best known of such studies are concerned with peers and risk behaviors such as smoking, drinking, and drug use ( Bauman & Ennett, 1996 ; Christakis & Fowler, 2008 ; Mercken, Snijders, Steglich, Vartiainen, & de Vries, 2010 ; Neaigus et al., 2006 ; Urberg, Degirmencioglu, & Pilgrim, 1997 ). However, in some cases the influence of those to whom an individual is indirectly connected or the normative behavior within a local network may be important ( Go, Green, Kennedy, Pollard, & Tucker, 2010 ). These contextual, group-level, or neighborhood effects have also been studied in a network analysis context ( Koehly et al., 2008 ; Poteat, Espelage, & Green, 2007 ; Preciado, Snijders, Burk, Stattin, & Kerr, 2011 ; Sampson, Morenoff, & Gannon-Rowley, 2002 ).

Networks As Outcomes

When studies investigate how the attributes, behaviors, and attitudes of individuals affect their network positions, we say that the focus is on social selection ( Robins, Pattison, & Elliott, 2001 ), an evolutionary mechanism that assumes networks are outcomes. That is, how do the characteristics of individuals affect their position in the network? Many recent studies of the distribution of smoking, drinking, and risk behaviors have shown that, contrary to popularly held opinions, peer influence plays a smaller role than selection in creating similarities among friends with respect to these behaviors ( Bullers, Cooper, & Russell, 2001 ; Hall & Valente, 2007 ; Mercken, Candel, Willems, & De Vries, 2007 ; Mercken, Snijders, Steglich, Vartiainen, & de Vries, 2010 ). It is, in fact, individuals choosing others whose behaviors are already like their own that leads to the similarities in behavior that we see. For example, adolescents who smoke may belong to a network of other adolescent smokers, not because network members influence each other to smoke but because they choose other smokers to associate with.

This fact raises an important point. Network data collected cross-sectionally can only be used to detect the outcomes of social influence and social selection: homophily ( McPherson, Smith-Lovin, & Cook, 2001 ). Homophily here refers to similarities among individuals who are directly connected to each other: “Birds of a feather flock together.” That is, are bullies more or less likely to be friends with other bullies? The influence and selection mechanisms that lead to homophily can only be explored via longitudinal studies, in which we are able to observe changes in network structure and in individual attributes ( Aral, Muchnik, & Sundararajan, 2009 ; Snijders, 2009 ; Snijders, van de Bunt, & Steglich, 2010 ; Steglich et al., 2010 ). Co-evolution of such features over time forms the basis for most current social network models, and we will return to this topic in a subsequent section. First, however, we provide some background information about networks and network features necessary for understanding more complex network concepts.
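Because cross-sectional data reveal homophily but not the mechanism behind it, a natural first step is simply to measure it. The sketch below uses one simple descriptive comparison, offered as an illustration rather than as the chapter's method: the share of observed ties that join actors with the same attribute value versus the share of all possible pairs that are same-attribute (the baseline if ties were placed at random). The friendships and smoking statuses are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected friendship ties and a binary attribute (1 = smoker).
smoker = {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0, "f": 0}
ties = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("e", "f"), ("c", "d")]

def same_attribute_share(pairs, attr):
    """Fraction of pairs whose two members share the same attribute value."""
    pairs = list(pairs)  # accept generators such as combinations(...)
    same = sum(1 for u, v in pairs if attr[u] == attr[v])
    return same / len(pairs)

observed = same_attribute_share(ties, smoker)
# Baseline: every possible dyad, as if ties were placed at random.
baseline = same_attribute_share(combinations(smoker, 2), smoker)

print(f"observed {observed:.2f} vs. baseline {baseline:.2f}")  # observed 0.83 vs. baseline 0.40
```

An observed share well above the baseline is consistent with homophily, but, as the text stresses, a single snapshot cannot say whether selection or influence produced it.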

The Set of Network Descriptive Concepts Is Comprehensive but Has Limited Applicability

Social and behavioral scientists and graph theorists have developed a range of concepts for describing network structure and composition. However, these concepts are fairly limited in their applicability to analytic paradigms beyond the descriptive for a number of reasons. The most important limitation stems from interdependencies among network actors and difficulties with comparability across networks of differing sizes. The set of concepts has increasingly broad focus—(1) network actors, (2) connections shared with other actors, (3) subgroups defined by structural features of the nodes, and (4) network-wide concepts. We briefly describe each of these below. Generally one might think of these features as endogenous, based solely on network ties, or exogenous, independent of network structure ( Friedkin & Johnsen, 1997 ). For greater detail, readers are directed to the many introductory network analysis texts such as those by Wasserman and Faust (1994) , Scott (2000c) , Hanneman and Riddle (2005) , or Wellman and Berkowitz (1988) .

Network actors (or alters) are often called nodes, a term taken from graph theory, because they are the connection points for the ties formed by the particular type of relation under investigation. Generally, these nodes are described by a set of exogenous attributes (these fit into the g × m matrix we discussed earlier), independent of an alter’s position in the network. The simplest endogenous node-level attribute, an alter’s position in the network, is most frequently described by a group of measures described as centrality or prestige measures ( Bonacich, 1987 ; Borgatti, 2005 ; Freeman, 1978 , 1979 ; Friedkin, 1991 ; Gould, 1989 ; Scott, 2000a ; Wasserman & Faust, 1994 ). Researchers have developed a fair number of these measures, but they all have one common idea: certain network members are, by virtue of their structural position, more or less prominent than others. The simplest of these measures are based on the number of ties a node reports having with other nodes (a node’s out-degree) or the number of ties a node receives (a node’s in-degree), shown in Figure 23.1.

Network representations of in-degree and out-degree.
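Given a directed sociomatrix, out-degree is simply a row sum and in-degree a column sum. A minimal sketch with a hypothetical nomination matrix:

```python
# Hypothetical directed nominations: noms[i][j] == 1 means node i nominates node j.
noms = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

def out_degrees(matrix):
    """Ties each node sends: the row sums of the sociomatrix."""
    return [sum(row) for row in matrix]

def in_degrees(matrix):
    """Ties each node receives: the column sums of the sociomatrix."""
    return [sum(col) for col in zip(*matrix)]

print(out_degrees(noms))  # [2, 1, 2, 1]
print(in_degrees(noms))   # [1, 1, 3, 1]
```

Here node 2 receives the most nominations and so would rank highest on in-degree prestige. Both lists always sum to the same total, the number of ties.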

Beyond these simple definitions of prestige are quite a few others. Some of them are based on the number of shortest paths a node lies on between any two other nodes. One such measure is called betweenness ( Freeman, 1977 ). Others are based on the average number of steps between a node and all others in the network. One such distance-based measure is called closeness ( Freeman, 1977 ). Still others are based on whether a node is connected to well-connected others. Further measures of centrality are based on flow, information, dependence, and control, variously defined by graph theorists and social scientists ( Borgatti, 2005 ).
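For an unweighted, undirected graph, both distance-based and path-based measures can be built from breadth-first search. The sketch below assumes a connected graph and uses one common closeness definition, (g - 1) divided by the sum of a node's geodesic distances, and the accumulation algorithm published by Brandes (2001) for betweenness, left unnormalized so that each unordered pair of other nodes contributes at most 1. The small graph is hypothetical, and other normalizations are in common use.

```python
from collections import deque

# Hypothetical undirected graph: a path a-b-c-d plus an extra tie b-e.
graph = {
    "a": ["b"],
    "b": ["a", "c", "e"],
    "c": ["b", "d"],
    "d": ["c"],
    "e": ["b"],
}

def bfs_distances(adj, source):
    """Geodesic (shortest-path) distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(adj):
    """(g - 1) / sum of distances to all others; assumes the graph is connected."""
    g = len(adj)
    return {v: (g - 1) / sum(bfs_distances(adj, v).values()) for v in adj}

def betweenness(adj):
    """Brandes-style betweenness for unweighted, undirected graphs (unnormalized)."""
    scores = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s to v
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # In an undirected graph each pair was counted once from each endpoint.
    return {v: b / 2 for v, b in scores.items()}

print(closeness(graph)["b"])    # 0.8
print(betweenness(graph)["b"])  # 5.0
```

Node b lies on the only shortest path for five of the six pairs of other nodes, which is why it dominates on betweenness even though its degree is just three.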

Statistical models of influence and selection are, at their most basic level, models that determine whether an individual’s position is related to his/her behavior. Thus, a basic understanding of prestige is necessary for applying these models appropriately; a more sophisticated understanding is required to ensure the most appropriate choice of parameters for network statistical models.

Connections Shared With Other Actors

Network actors gain their structural positions by virtue of the connections, or edges, they share with other nodes. Edges, too, can have attributes. Most importantly, the type of relationship represented is an edge attribute. When networks are composed of multiple relationships, it becomes important to note which edge type or types are being discussed. The presence or absence of a tie can be represented by a dichotomous variable. However, sometimes these relationships vary in strength, which can be represented by a value that indicates frequency, intensity, duration, or level of formality or reliability ( McPherson, Smith-Lovin, & Brashears, 2006 ). Figure 23.2 displays these different edge types graphically.

Network representations of an undirected, dichotomous network and a directed, valued network.

As noted earlier, edges may be directional. That is, a tie may exist from i to j but not from j to i. Consider loaning money to a friend. You may loan money to him or her but never ask for a loan from him or her. This difference would be noted in the sociomatrix by differing values in the corresponding (i, j) and (j, i) cells. When directional ties exist in both directions, at the same level for both individuals, they are called mutual or reciprocal. As we will discuss later, reciprocation has a special relevance for statistical models. The newest models of structure and behaviors also include frequency or strength measures ( Krivitsky, 2011 ). Thus, understanding edge characteristics is also necessary for understanding the new statistical models.
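One simple way to see reciprocation in a dichotomous directed network is to sort every unordered pair of actors into one of three dyad types: mutual (ties in both directions), asymmetric (a tie in one direction only), or null (no tie). This is often called the MAN dyad census. The sketch below computes it for a hypothetical “lends money to” relation and assumes the matrix contains only 0s and 1s.

```python
from itertools import combinations

# Hypothetical directed relation: loans[i][j] == 1 means i lends money to j.
loans = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

def dyad_census(matrix):
    """Count mutual, asymmetric, and null dyads among all unordered pairs (0/1 data)."""
    counts = {"mutual": 0, "asymmetric": 0, "null": 0}
    labels = {2: "mutual", 1: "asymmetric", 0: "null"}
    for i, j in combinations(range(len(matrix)), 2):
        counts[labels[matrix[i][j] + matrix[j][i]]] += 1
    return counts

print(dyad_census(loans))  # {'mutual': 1, 'asymmetric': 2, 'null': 3}
```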

Subgroups Defined by Network Structure

Beyond nodes and edges, certain sets of nodes are often grouped together by virtue of their patterns of connections to other nodes. The intuition behind these “subgroup” measures is that certain groups of network members may be more alike in certain attributes, leading them to form cohesive subgroups ( Erickson, 1988 ; K. A. Frank, 1995 ; Freeman, 1992 ; Moody & White, 2003 ; Scott, 2000b ; Wasserman & Faust, 1994 ). The simplest of these subgroups is a clique (shown in Figure 23.3), a group in which everyone is connected to everyone else. Although cliques are frequent in small groups, they tend to be increasingly rare in larger networks; a strict definition that requires all clique members to be directly connected to all others within the group is difficult to meet in larger groups.

Network representations of a simple undirected and a simple directed clique.
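Listing cliques by hand quickly becomes impractical. A standard way to enumerate all maximal cliques (cliques not contained in any larger clique) is the Bron-Kerbosch algorithm; the version below uses pivoting and is a sketch for small undirected graphs stored as adjacency sets. The friendship graph is hypothetical.

```python
# Hypothetical undirected friendship graph: {a, b, c, d} are all tied to each other,
# and d also belongs to the triangle {d, e, f}.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

def maximal_cliques(adj):
    """List all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(clique)      # nothing can be added: clique is maximal
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(candidates | excluded, key=lambda u: len(adj[u] & candidates))
        for v in list(candidates - adj[pivot]):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return found

cliques = sorted(sorted(c) for c in maximal_cliques(graph))
print(cliques)  # [['a', 'b', 'c', 'd'], ['d', 'e', 'f']]
```

Note that node d belongs to both cliques: strict cliques may overlap, which is one reason they are awkward as a partition of a larger network.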

There are, however, less strict definitions of cohesive subgroups. Clans, k-plexes, and lambda sets (among others) all relax the clique requirement, for example by allowing members to be a certain distance away from each other rather than directly tied. The new statistical models of network evolution have as a basic assumption that dyadic interactions build into subgroup interactions and ultimately lead to the global network structures we see in empirically collected network data ( Robins, Pattison, & Wang, 2009 ; Robins, Pattison, & Woolcock, 2005 ). Thus, understanding subgroups and the underlying definitional concepts is critical to accurate model specification.
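Relaxed subgroup definitions can be tested directly against a candidate set of nodes. The sketch below takes two textbook definitions as its assumptions: a set of s members is a k-plex if every member is tied to at least s - k of the others (so a 1-plex is a clique), and it is an n-clique if every pair of members is at most n steps apart, with paths allowed to pass through non-members. Clans and lambda sets impose further conditions and are not shown. The graph is hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: a 4-cycle a-b-c-d-a plus a pendant node e tied to d.
graph = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def is_k_plex(adj, members, k):
    """Each of the s members is tied to at least s - k other members (1-plex == clique)."""
    members = set(members)
    need = len(members) - k
    return all(len(adj[v] & members) >= need for v in members)

def distance(adj, source, target):
    """Geodesic distance in the whole graph, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return dist[v]
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return float("inf")

def is_n_clique(adj, members, n):
    """Every pair of members is at most n steps apart (paths may leave the set)."""
    return all(distance(adj, u, v) <= n for u, v in combinations(members, 2))

square = {"a", "b", "c", "d"}
print(is_k_plex(graph, square, 1))    # False: the 4-cycle is not a clique
print(is_k_plex(graph, square, 2))    # True: each member lacks a tie to only one other
print(is_n_clique(graph, square, 2))  # True: no two members are more than 2 steps apart
```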

Many “subgroup” definitions are developed directly from formal graph theoretic concepts and are often difficult to apply in the context of social science research. Social scientists have developed other, more relevant subgroup concepts, most of them based on a group of network nodes being more connected to each other than they are to the remaining nodes in a network. Most of these are based on some operationalization of “modularity.” Generally, the modularity of a network indicates how easy it would be to divide the network into subgroups or modules. It is normally measured as the proportion of ties within the group relative to the proportion of ties to outside members. The initial operationalization of modularity gives rise to another type of subgroup, a faction similar to a community or a subculture ( Clauset, Newman, & Moore, 2004 ; Danon, Diaz-Guilera, Duch, & Arenas, 2005 ; Donetti & Munoz, 2004 ; Duch & Arenas, 2005 ; Fortunato, 2010 ; Fortunato, Latora, & Marchiori, 2004 ; Girvan & Newman, 2002 ; Guimera, Sales-Pardo, & Amaral, 2004 ; Gustafsson, Hornquist, & Lombardi, 2006 ; Hastings, 2006 ; Lancichinetti & Fortunato, 2009 ; Newman, 2001b , 2004a , 2004b , 2006 ; Newman & Girvan, 2004 ; Pollner, Palla, & Vicsek, 2006 ; Reichardt & Bornholdt, 2004 ).
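The most widely used operationalization, the modularity Q of Newman and Girvan (2004), compares the share of ties that fall within groups with the share expected if ties were rewired at random while keeping every node's degree. In group form, Q = sum over groups c of [e_c / m - (d_c / 2m)^2], where e_c is the number of ties inside group c, d_c is the sum of the degrees of its members, and m is the total number of ties. The sketch below only scores a given partition of a hypothetical undirected graph; community-detection methods such as those cited above search for partitions with high Q.

```python
# Hypothetical undirected graph: two triangles {0, 1, 2} and {3, 4, 5} joined by tie 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

def modularity(edge_list, membership):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edge_list)
    degree = {v: 0 for v in membership}
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    # Observed fraction of ties that fall inside groups.
    inside = sum(1 for u, v in edge_list if membership[u] == membership[v]) / m
    # Expected fraction under degree-preserving random rewiring: sum of (d_c / 2m)^2.
    totals = {}
    for v, c in membership.items():
        totals[c] = totals.get(c, 0) + degree[v]
    expected = sum((t / (2 * m)) ** 2 for t in totals.values())
    return inside - expected

print(round(modularity(edges, groups), 3))  # 0.357
```

Putting every node in a single group always gives Q = 0, so positive values indicate more within-group ties than chance would produce.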

The concept of modularity in a network has analytic consequences. For example, an intuition stemming from the concept of modularity is that an entire grade may not be the best level of analysis for linking behavior and structure. In some grades, classroom interactions may be more relevant. Modularity-based measures identify subgroups where structure and behavior may be more strongly linked. Thus, understanding modularity is essential for choosing the appropriate level of analysis. Further, new network models are inherently multilevel, and new research will soon enable the inclusion of modularity effects, allowing us to model larger networks while specifying that the normative behaviors within local network neighborhoods or communities have stronger effects on an individual’s behaviors ( Pattison & Robins, 2002 ; Snijders & Baerveldt, 2003 ) than network-wide behaviors.

Network-Wide Concepts

Sometimes certain network members are structurally similar to each other not because they are directly connected or because they are members of the same subgroup, but because they have the same pattern of connections to other network members. Network nodes are structurally equivalent ( Borgatti & Everett, 1992 ; Burt, 1980 , 1987c ; Faust, 1988 ; Knoke & Kuklinski, 1982 ; Leicht, Holme, & Newman, 2006 ; Lorrain & White, 1971 ; Reichardt & White, 2007 ; Scott, 1991 ; Wasserman & Faust, 1994 ) to each other if they share an identical set of connections to identical nodes in a network. Two unmarried uncles in the same family would be structurally equivalent; it would be impossible to tell them apart. Generally, however, this level of equivalence is hard to attain, and this kind of equivalence is network-dependent. Two unmarried uncles from different families would not be structurally equivalent because they are not connected to the exact same parents or to the same siblings or descendants. However, social scientists would still argue that these uncles occupy similar structural positions.
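Strict structural equivalence can be checked mechanically. In the sense of Lorrain and White (1971), actors i and j are structurally equivalent when, for every third actor k, the tie from i to k matches the tie from j to k and the tie from k to i matches the tie from k to j; the sketch below also requires the i-to-j and j-to-i ties to match. It applies that test to a hypothetical directed “is parent of” relation in which the two unmarried uncles have no children.

```python
names = ["grandma", "mother", "uncle1", "uncle2", "child"]
# Hypothetical directed relation: parent_of[i][j] == 1 means i is j's parent.
parent_of = [
    # gm mo u1 u2 ch
    [0, 1, 1, 1, 0],  # grandma
    [0, 0, 0, 0, 1],  # mother
    [0, 0, 0, 0, 0],  # uncle1 (unmarried, no children)
    [0, 0, 0, 0, 0],  # uncle2 (unmarried, no children)
    [0, 0, 0, 0, 0],  # child
]

def structurally_equivalent(matrix, i, j):
    """True if i and j send and receive identical ties to and from every third actor."""
    if matrix[i][j] != matrix[j][i]:
        return False
    others = [k for k in range(len(matrix)) if k not in (i, j)]
    return all(matrix[i][k] == matrix[j][k] and matrix[k][i] == matrix[k][j]
               for k in others)

pairs = [(names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if structurally_equivalent(parent_of, i, j)]
print(pairs)  # [('uncle1', 'uncle2')]
```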

Automorphic and regular equivalence concepts loosen the restriction of identical patterns of ties to identical nodes and allow for more generalized definitions of equivalence that are more comparable across networks. In general, the creation of automorphic and regular equivalence classes simplifies network diagrams and summarizes general patterns of connections to and from a set of jointly occupied positions of network actors. The multilevel nature of current statistical network models allows an individual’s “role” to be incorporated. Although a role is a specialized structural feature, it operates as a nodal attribute and thus can be used to model selection and influence processes.

Network representations of a centralized and a decentralized network.

Generalized measures at the level of the network itself allow us to summarize what is happening structurally in a network. Network density is the proportion of ties that are present in a network relative to the number of ties that could exist, if every node were connected to every other node. Network centralization ( Freeman, 1978 ) is a measure that indicates how evenly spread the ties in the network are. That is, does a certain network member have a particularly high number of ties while others have very few? If so, the network is considered centralized. If not, the network is considered decentralized ( see Fig. 23.4 ).
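Both network-level summaries are short computations. For an undirected graph with g nodes and L ties, density is L divided by the g(g - 1)/2 possible ties. For degree centralization, one common form of Freeman's index sums how far each node's degree falls below the maximum degree and divides by the largest possible value of that sum, (g - 1)(g - 2), which a star attains. The sketch assumes g is at least 3 and compares a hypothetical star (centralized) with a hypothetical ring (decentralized).

```python
def undirected_degrees(g, edges):
    """Degree of each of the g nodes (labelled 0..g-1) in an undirected edge list."""
    deg = [0] * g
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def density(g, edges):
    """Observed ties divided by the g(g - 1)/2 ties that could exist."""
    return len(edges) / (g * (g - 1) / 2)

def degree_centralization(g, edges):
    """Freeman degree centralization, undirected: 1.0 for a star, 0.0 if all degrees are equal."""
    deg = undirected_degrees(g, edges)
    top = max(deg)
    return sum(top - d for d in deg) / ((g - 1) * (g - 2))

star = [(0, k) for k in range(1, 5)]         # node 0 tied to nodes 1-4
ring = [(k, (k + 1) % 5) for k in range(5)]  # 0-1-2-3-4-0

print(density(5, star), degree_centralization(5, star))  # 0.4 1.0
print(density(5, ring), degree_centralization(5, ring))  # 0.5 0.0
```

The two examples show why density and centralization are reported separately: the ring is denser than the star, yet its ties are spread perfectly evenly.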

Centralization is based on the centrality and prestige measures described above, and thus, each prestige (or centrality) measure has an equivalent network-wide centralization score. Of course, exogenous characteristics of the nodes can also be presented as summary measures. The proportion of males, the average level of education, the average annual income, and other measures of network composition are also network-level descriptive measures. Network-level measures are often seen as structural “control” variables in new statistical models, because, for example, the likelihood of reciprocity, of star-shaped structures centered on popular individuals, and of other higher-order features all depend on the density of ties within a network.

Network Theoretical Concepts Have Become Important for Understanding Social Systems

Years of social and behavioral research involving network descriptive properties have brought certain concepts to the fore as important for theoretical understanding of social systems ( Borgatti & Halgin, 2011 ; Borgatti, Mehra, Brass, & Labianca, 2009 ). As we did in describing network measures, we will describe important network theoretical concepts with an incrementally broader focus. Some of these concepts concern the distribution of ties across nodes within a network. Preferential attachment (Newman, 2001a) and small-world concepts ( Amaral, Scala, Barthelemy, & Stanley, 2000 ; Killworth, McCarty, Bernard, & House, 2006 ; Milgram, 1967 ; Travers & Milgram, 1969 ; Watts, 1999 ; Watts & Strogatz, 1998 ) are two important examples. Some concepts have to do with the arrangement of attributes or ties across dyads of nodes in a network. Homophily ( Cohen, 1977 ; Festinger, Schachter, & Back, 1967 ; Hallinan, 1980 ; Kandel, 1978 ; Monge & Contractor, 2003 ; Newcomb, 1961 ) and mutuality ( Bearman, 1997 ; Breiger & Ennis, 1997 ; Molm, 2003 ; Molm, Collett, & Schaefer, 2006 ; Uehara, 1990 ; Walker et al., 2000 ) fall into this category. Some of the concepts, indeed most of them, have to do with the arrangement of ties across triads or tetrads of nodes in a network. Weak ties ( Friedkin, 1980 ; Granovetter, 1973 , 1983 ; Lin, Ensel, & Vaughn, 1981 ), structural holes ( G. Ahuja, 2000 ; Burt, 1992 , 2004 ; Podolny & Baron, 1997 ), structural balance ( Cartwright & Harary, 1956 ; Davis, 1967 ; Heider, 1946 ; Holland & Leinhardt, 1971 ; Taylor, 1970 ; Wasserman & Faust, 1994 ), and transitivity ( Holland & Leinhardt, 1971 ) concepts fall into this category. In what follows we summarize the key features of each of these concepts, beginning with nodal concepts, extending to dyadic concepts, and concluding with triadic and higher-order concepts.

Nodal Concepts

Consider a network of college freshmen, perhaps all members of a sorority. Within this group, it is likely that some individuals, such as the sorority president, would receive a large number of nominations if we were to ask the sorority members to identify who was the most popular. Other sorority members would probably receive few nominations. This variation in the number of nominations (or in-degree, as we discussed previously) leads some individuals to become hubs in a social network and others to become isolates. There is a distribution of degree scores ( Newman, Strogatz, & Watts, 2001 ) such that a large number of individuals have only one or a few nominations, and very few individuals have a large number of nominations. Network researchers have found that, in general, this frequency distribution of degree scores within a network follows a single-parameter power-law distribution ( Barabasi & Albert, 1999 ; Merton, 1968 ). Because new statistical models of networks and behavior strive to determine the effects of behavior on structure (and vice versa) net of the effects of structure on itself, the models require at a very basic level an adequate modeling of degree distributions and have incorporated a number of basic structural parameters that serve to facilitate this.

Extending the observation that ties are not randomly distributed across nodes in a network but follow a particular pattern, network researchers have shown that, over time, individuals who appear as hubs in a network tend to attract more ties, a concept known as preferential attachment ( Newman, 2001a ; Pollner et al., 2006 ). Much like the “rich get richer” concept, the popular tend to increase in popularity over time. Researchers have also found that this uneven distribution of ties shortens distances: in large networks, most people are not directly connected to each other, but, because of “hubs,” the average number of steps required to reach any individual is fairly low.
In general, this leads to a network with a low average shortest path length (shortest paths are known as geodesics in social network research) and more clustering than would be expected by chance. These types of networks have been termed small world ( Milgram, 1967 ; Watts, 1999 , 2004 ; Watts & Strogatz, 1998 ) networks, and many researchers find that moderately sized networks have these properties.
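The two small-world ingredients can be measured directly: the average geodesic length over all pairs of nodes, and a clustering coefficient. The sketch below uses the average local clustering of Watts and Strogatz (1998), the share of a node's pairs of neighbours who are themselves tied, and follows one common convention of scoring nodes with fewer than two neighbours as 0. It assumes a connected, undirected graph stored as adjacency sets; the graph is hypothetical. A small-world network would show a path length close to that of a comparable random graph but much higher clustering.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected graph: triangle a-b-c with node d hanging off c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def geodesics_from(adj, source):
    """Breadth-first search distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def average_path_length(adj):
    """Mean geodesic length over all unordered pairs (graph assumed connected)."""
    pairs = list(combinations(adj, 2))
    return sum(geodesics_from(adj, u)[v] for u, v in pairs) / len(pairs)

def average_clustering(adj):
    """Mean local clustering; nodes with fewer than two neighbours contribute 0."""
    total = 0.0
    for v, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        links = sum(1 for x, y in combinations(nbrs, 2) if y in adj[x])
        total += links / (len(nbrs) * (len(nbrs) - 1) / 2)
    return total / len(adj)

print(round(average_path_length(graph), 3))  # 1.333
print(round(average_clustering(graph), 3))   # 0.583
```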

As networks grow larger, perhaps to the size of the internet, we see even greater variation in the degree distribution, and an even closer fit to a power-law distribution emerges. These networks are often termed scale-free ( Callaway, Newman, Strogatz, & Watts, 2000 ; Krapivsky & Redner, 2001 ; Krapivsky, Redner, & Leyvraz, 2000 ; Price, 1965 , 1980 ) networks because their features become essentially independent of the number of actors.

Thus, new statistical models have also incorporated parameters that allow researchers to specify that structural position itself may predict increased structural position over time. That is, I become more popular in a network because I was popular to begin with. Parameters of this type help researchers control for purely structural evolution of networks so that the impact of behaviors on structure (and vice versa) can be isolated.
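The “popular become more popular” mechanism can be illustrated with a toy growth model in the spirit of Barabasi and Albert (1999): each new node forms one tie to an existing node chosen with probability proportional to that node's current degree. This is a deliberately simplified sketch (one tie per newcomer, a two-node seed, a fixed random seed), not one of the statistical models discussed in this chapter.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=0):
    """Grow a network by linear preferential attachment, one tie per new node."""
    rng = random.Random(seed)
    edges = [(0, 1)]      # seed network: two connected nodes
    endpoints = [0, 1]    # each node appears once per tie it has
    for new in range(2, n_nodes):
        # Sampling a tie endpoint uniformly picks nodes in proportion to their degree.
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges

edges = grow_network(2000)
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

counts = Counter(degree.values())
print("ties:", len(edges))
print("nodes with degree 1:", counts[1], "| max degree:", max(degree.values()))
```

Running it typically shows most nodes keeping a single tie while a few early nodes become hubs; the exact numbers depend on the random seed.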

Dyadic Concepts

Beyond features associated with individual nodes and the distribution of ties among them (the nodal concepts described in the previous section), certain features of dyads have also become fairly well accepted. We have already mentioned that, in general, network members connected to each other will be more alike with respect to their exogenous characteristics than would be expected by chance. This concept, homophily ( Cohen, 1977 ; Festinger et al., 1967 ; Hallinan, 1980 ; Kandel, 1978 ; McPherson et al., 2001 ; Monge & Contractor, 2003 ), is a feature of most social systems.

However, the type of homophily evinced in each network is context dependent, and it is often a challenge for the researcher to determine exactly what types of homophily exist within a particular population and for a specific relation. For example, networks of romantic ties among individuals often display ethnic homophily and even more often display homophily on smoking status ( Bearman, Moody, & Stovel, 2004 ; Kobus, 2003 ). Although the mechanism is open to investigation, the fact that romantic attachments so frequently display homophily underscores the relevance of this dyadic feature in social research. Homophily-related parameters are at the very heart of the new statistical models for network analysis, as homophilous dyads evolving over time (targeted to specific behaviors of interest) are the basis for any understanding of influence or selection. Care must be taken, however, to control for other nodal attributes that might lead to homophily, incorporating variables such as gender, ethnicity, grade, and so forth, to further isolate the impact of the key behavioral variable in the influence and selection models developed.
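One simple way to ask whether ties are homophilous is to compare the observed share of same-attribute ties with the share expected if ties were placed at random among all pairs of nodes. The Python sketch below assumes undirected ties and a single categorical attribute; the smoking labels and ties are hypothetical. As the paragraph notes, a real analysis would also control for other attributes, which this simple random-mixing baseline does not do.

```python
def homophily_shares(edges, attribute):
    """Return (observed, expected) shares of ties joining nodes with the same
    attribute value; 'expected' assumes ties fall at random among all node pairs."""
    observed = sum(1 for u, v in edges if attribute[u] == attribute[v]) / len(edges)
    nodes = list(attribute)
    n = len(nodes)
    same_pairs = sum(1 for i in range(n) for j in range(i + 1, n)
                     if attribute[nodes[i]] == attribute[nodes[j]])
    expected = same_pairs / (n * (n - 1) / 2)
    return observed, expected


# Hypothetical romantic-tie network with smoking status as the attribute.
smokes = {"a": "smoker", "b": "smoker", "c": "smoker",
          "d": "non", "e": "non", "f": "non"}
ties = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("a", "d")]
print(homophily_shares(ties, smokes))  # (0.8, 0.4)
```

Here 80% of ties join people with the same smoking status, against 40% expected under random mixing.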

Other dyadic concepts are based on purely structural patterns. Most small- to medium-sized social networks display higher levels of mutuality or reciprocity than would be expected by chance ( Bearman, 1997 ; Breiger & Ennis, 1997 ; Molm, 2003 ; Molm et al., 2006 ; Uehara, 1990 ; Walker et al., 2000 ). That is, close friends tend to provide reciprocal nominations of each other more often than not. Increased mutuality has become expected across network studies, particularly when the relations investigated have to do with positive affect among individuals. Relationships that focus on hierarchies, formal communication, or exchange are less likely to display this type of mutuality, but it remains a relevant structural feature to explore ( M. K. Ahuja & Carley, 1998 ). Mutual ties are some of the first selection-based parameters included in new statistical models, as they indicate one of the most important endogenous network selection mechanisms. Beyond network density, the level of mutuality is the most important network feature to control for. It is, in effect, a microlevel process that impacts the global network structure.
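Mutuality can be summarized as the share of directed ties that are returned. In the Python sketch below, the chance baseline assumes ties are placed independently with probability equal to the observed density (a Bernoulli random graph), under which any given tie is returned with that same probability. The nominations are hypothetical, and the statistical models discussed in this chapter are designed to account for far more structure than this simple baseline does.

```python
def reciprocity(edges):
    """Share of directed ties i -> j for which j -> i is also present."""
    arcs = {(u, v) for u, v in edges if u != v}
    returned = sum(1 for u, v in arcs if (v, u) in arcs)
    return returned / len(arcs)


def density(edges, n_nodes):
    """Share of the n(n - 1) possible directed ties that are present."""
    arcs = {(u, v) for u, v in edges if u != v}
    return len(arcs) / (n_nodes * (n_nodes - 1))


# Hypothetical friendship nominations among five people (1-5).
nominations = [(1, 2), (2, 1), (1, 3), (3, 1), (4, 1), (2, 5)]
print(reciprocity(nominations))  # 0.666... (4 of 6 ties are returned)
print(density(nominations, 5))   # 0.3: chance of reciprocation under the Bernoulli baseline
```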

Triadic Concepts

The concepts associated with groups of three nodes, and their connections to sociological ideas, are very rich and diverse. However, a few key concepts have taken precedence. In the early 1970s, Granovetter argued that although strong ties are important for community cohesion, weak ties are important for access to unique resources ( Granovetter, 1973 , 1983 ). Consider a very tightly connected group of friends who have no connections to others. They have no access to sources of new information. Even a single connection to an outside member could be enough to provide access to innovative or novel information. Thus, weak ties between individuals who do not share the same connections to others in a social network become important. The weak vs. strong ties idea has become very important when investigating innovation, exchange, and organizational behavior, particularly job search ( Friedkin, 1980 ; Granovetter, 1973 , 1983 ; Lin et al., 1981 ). Weak ties have been operationalized in the network statistical models in a number of ways, beginning with simple measures of “actors two steps away,” “multiple actors two steps away,” or “multiple open triangles.” Increasingly complex parameters for this important triangular effect continue to be developed ( Goodreau & Golden, 2007 ; Snijders, Pattison, Robins, & Handcock, 2006 ).
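The simplest operationalization mentioned above, “actors two steps away,” can be sketched as follows. The Python assumes an undirected network stored as a dictionary of neighbor sets; the function name and the example network (a closed trio with one outward tie) are hypothetical.

```python
def two_steps_away(adj, ego):
    """Actors reachable through one of ego's contacts but not tied to ego directly."""
    direct = adj[ego]
    reachable = set()
    for contact in direct:
        reachable |= adj[contact]
    return reachable - direct - {ego}


# A tightly knit trio (A, B, C); only C has a weak tie outward, to D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D"}, "F": {"D"},
}
print(two_steps_away(adj, "A"))  # {'D'}: reached only through C's outside tie
print(two_steps_away(adj, "C"))  # E and F: new contacts reached through D
```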

Closely related to the strong ties/weak ties concept is the concept of structural holes ( G. Ahuja, 2000 ; Burt, 1992 , 2004 , 2005 ; Fernandez & Gould, 1994 ; Podolny & Baron, 1997 ). Structural holes are, at their simplest, places in a network where a group of three nodes has two but not three connections. The triangles are “open.” These open triangles allow the central node to become a broker ( Alderson & Beckfield, 2004 ; Borgatti, Jones, & Everett, 1998 ; Burt, 2005 ; Fernandez & Gould, 1994 ), controlling access to, and transmission of, resources between the two other nodes. Although this position can be risky, it offers a great deal of opportunity for improving one’s status. Again, networks where exchange is important, such as in business or trade, are where structural holes have been most explored. These “brokerage” effects are often modeled with the combination of an attribute and an “open triangle” parameter. More complicated parameters remain in development.
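Counting the open triangles in which a node sits in the middle gives a rough measure of its brokerage opportunities. The Python sketch below assumes undirected ties stored as neighbor sets and simply counts the pairs of a node's contacts who are not tied to each other. It is an illustration only; published brokerage measures, such as those that combine tie patterns with group attributes, are more elaborate. The trade network is hypothetical.

```python
from itertools import combinations


def brokerage_count(adj, node):
    """Number of pairs of node's contacts with no tie between them, i.e. open
    triangles in which node occupies the brokering (middle) position."""
    return sum(1 for u, v in combinations(sorted(adj[node]), 2) if v not in adj[u])


# Hypothetical trade network: H deals with A, B, and C; only A and B deal directly.
adj = {"H": {"A", "B", "C"}, "A": {"H", "B"}, "B": {"H", "A"}, "C": {"H"}}
print({n: brokerage_count(adj, n) for n in sorted(adj)})
# {'A': 0, 'B': 0, 'C': 0, 'H': 2}
```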

Triads also allow us to consider the directions of ties. Given a network of directed ties, there are 16 different types of triads that can occur ( Davis, 1967 ; Holland & Leinhardt, 1971 ). Indeed, one can use analytic methods to count the number of triads of each type (a triad census) to gain an understanding of how many opportunities for structural holes exist and of how well connected the network is.
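The figure of 16 directed triad types can be checked, and a triad census (a count of how many triads of each type occur) taken, with a brute-force sketch. The Python below treats two triads as the same type when relabeling their three nodes maps one onto the other. It enumerates all 2^6 = 64 possible arrangements of directed ties among three nodes and confirms that they fall into 16 types. The census keys here are canonical tie lists rather than the conventional Holland and Leinhardt labels (such as 030T), and the example network is hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations, product


def canonical(arcs):
    """Label-independent key for a triad: the smallest sorted arc list over
    all relabelings of nodes 0, 1, 2."""
    return min(tuple(sorted((p[i], p[j]) for i, j in arcs))
               for p in permutations(range(3)))


def all_triad_types():
    """Classify every subset of the 6 possible directed ties among 3 nodes."""
    possible = [(i, j) for i in range(3) for j in range(3) if i != j]
    return {canonical([arc for arc, keep in zip(possible, mask) if keep])
            for mask in product((0, 1), repeat=len(possible))}


def triad_census(nodes, edges):
    """Count how many node triples in a directed network fall into each type."""
    edge_set = set(edges)
    census = Counter()
    for trio in combinations(nodes, 3):
        index = {node: k for k, node in enumerate(trio)}
        arcs = [(index[u], index[v]) for u in trio for v in trio
                if u != v and (u, v) in edge_set]
        census[canonical(arcs)] += 1
    return census


print(len(all_triad_types()))  # 16
census = triad_census("ABCD", [("A", "B"), ("B", "C"), ("A", "C")])
print(census)  # one transitive triad, three triads containing a single tie
```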

More interestingly, researchers found that some of these triads occur far more frequently or far less frequently than expected by chance. One triad that appears more often than expected is the transitive triad, where if A directs a tie to B and B directs a tie to C, then A directs a tie to C. High levels of transitivity ( Holland & Leinhardt, 1971 ) suggest a network that has achieved structural balance ( Cartwright & Harary, 1956 , 1979 ; Davis, 1967 ; Heider, 1946 ; Taylor, 1970 ). A transitive triad is the structural signature of the classic saying, “a friend of my friend is also my friend.”

A triad that occurs less often than expected is the forbidden triad, in which A directs a tie to B and B directs a tie to C but A does not direct a tie to C. This triad is likely to lead to a high level of stress among the three individuals because there is social pressure for A to “be friends with” C because of the other social ties. One can rely on one’s own experiences of being “caught in the middle” between two friends who disagree or dislike each other to understand the pressures associated with this type of triad.
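The transitive and “forbidden” patterns described above can be tallied by examining every directed two-path i -> j -> k. The Python sketch below counts how many two-paths are closed by a tie i -> k (transitive) and how many are not (the intransitive, forbidden pattern). The ratio of closed to all two-paths is one common summary of network transitivity. The ties are hypothetical.

```python
def two_path_closure(edges):
    """Classify each directed two-path i -> j -> k (with i != k) as transitive
    (tie i -> k present) or intransitive (tie i -> k absent)."""
    arcs = {(u, v) for u, v in edges if u != v}
    out = {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)
    transitive = intransitive = 0
    for i, targets in out.items():
        for j in targets:
            for k in out.get(j, ()):
                if k == i:
                    continue  # i -> j -> i is a mutual dyad, not a path to a third node
                if (i, k) in arcs:
                    transitive += 1
                else:
                    intransitive += 1
    return transitive, intransitive


ties = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
closed, open_ = two_path_closure(ties)
print(closed, open_)              # 1 2
print(closed / (closed + open_))  # transitivity ratio: 0.333...
```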

The fact that these triads are, indeed, found more or less often than chance would predict in social systems has become widely accepted among social network researchers. Parameters for transitivity and other closed-triangle structures are key to ensuring goodness of fit in new network models and, therefore, should be well understood before they are incorporated during model development ( Snijders, van de Bunt et al., 2010 ).

Based on the intuition that connected subsets of four nodes are also important, though perhaps less so than triangles, some researchers are exploring the types of connections that can occur among tetrads of nodes. This active area of research will likely expand as more models incorporate these parameters.

Techniques and Approaches for Collecting Network Data Are Diverse

Collecting social network data has never been easy ( Freeman, Romney, & Freeman, 1987 ; Granovetter, 1976 ; Killworth & Bernard, 1976 , 1979 ; Knoke & Yang, 2008 ; Krackhardt, 1987 ; Laumann, Marsden, & Prensky, 1983 ; Marsden, 1990 , 2005 ; McPherson, 1982 ; Wasserman & Faust, 1994 ). Characteristics of each individual study often lead to very different data collection approaches, although network behaviors and outcomes may be similar. For example, two researchers may be interested in how an individual’s friends influence their drinking. One researcher may be able to gain access to a school and thus may be able to interview every student; another researcher may only be able to contact students by e-mail and may have to interview only a sample of students. Differing data collection approaches will lead to differing analytic approaches and, perhaps, differing outcomes. In this section we review the key features of most social network data collection approaches and conclude with a brief discussion of a few data collection issues whose impact on network studies has not been fully explored. Whether or not investigators are collecting their own data, they need to understand how their data were collected so that the important design-related features can be included in network statistical models.

Sociocentric Data Collection Designs

The most frequently implemented network data collection approach is based on the collection of data from all individuals in a particular well-bounded social setting such as a school, a social club, or an organization ( Laumann et al., 1983 ). In this case, a researcher collects attribute data from each respondent and then asks each respondent to identify the people to whom they are connected across one or more relations. The most comprehensive approach is the free choice roster design, in which each respondent is given a list of all respondents and asked to choose as many or as few as they desire. However, if the population of interest is of any appreciable size, this task can soon become burdensome. One way to ease respondent burden is to use a fixed choice roster design, in which the number of choices is limited (or fixed) so that respondents identify, say, only five individuals for a particular relation. Alternatively, rather than use a roster, a researcher might use a free choice nomination design, in which respondents name whichever individuals come to mind, so that nominations focus on those who are most salient to each respondent. The decision between free and fixed choice affects the number of nominations a respondent can make, effectively providing an upper and/or lower bound on a respondent’s outdegree that must be controlled for in network statistical models ( Kossinets, 2006 ), as it affects network density, degree distributions, mutuality, and other structural features.
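The effect of a fixed choice limit on outdegree and density can be illustrated by truncating free choice nominations. The Python sketch below assumes nominations are recorded in the order respondents named them and keeps only the first `limit` of them. This is a post hoc approximation for illustration: a real fixed choice instrument may change which people respondents report. The respondents and nominations are hypothetical.

```python
def apply_fixed_choice(nominations, limit):
    """Keep each respondent's first `limit` nominations (an outdegree cap)."""
    return {ego: named[:limit] for ego, named in nominations.items()}


def outdegrees(nominations):
    """Number of people each respondent nominated."""
    return {ego: len(named) for ego, named in nominations.items()}


def density(nominations):
    """Observed directed ties divided by the n(n - 1) possible ties."""
    n = len(nominations)
    return sum(len(named) for named in nominations.values()) / (n * (n - 1))


# Hypothetical free choice nominations, listed in the order they were named.
free = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A", "B", "D"], "D": []}
fixed = apply_fixed_choice(free, limit=2)
print(outdegrees(free), density(free))    # outdegrees up to 3; density 7/12
print(outdegrees(fixed), density(fixed))  # outdegrees capped at 2; density 5/12
```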

Complete network studies, although the most frequent, suffer from a number of shortcomings. For instance, the school context is usually of paramount importance to young people, but this may not always be the case ( Feld, 1981 ). Children may be more influenced to smoke by friends in their neighborhood who are older or who attend different schools, and a school-centered complete network study would not capture that important finding. Adults may spend substantial time at work, but their behaviors may be more affected by the time they spend outside of work. Thus a complete network study of a workplace may not be appropriate for understanding how friendships might influence exercise behaviors. Further, because roster studies involve providing respondents with the names of all individuals, certain ethical concerns arise ( Borgatti & Molina, 2005 ; Borgatti & Molina, 2003 ; Kadushin, 2005 ). If a parent does not agree that their child can participate in a school-based network study, does this mean that the child’s name should also be left off the roster? And if such nonparticipants are nominated by other respondents, how should those nominations be handled?

Complete network studies cannot account for the fact that individuals have a variety of social foci and can sometimes be hindered by ethical concerns. In addition, the results of sociocentered studies are difficult to generalize to other populations. Because the findings are focused on a single population, they may not have relevance in populations that differ even slightly—a fact that is important to keep in mind when interpreting network statistical models.

Egocentric Data Collection Designs

As interest in network studies has grown, the sizes of the populations under investigation have grown. Other network data collection approaches have been developed that account for some of the concerns associated with complete network studies and that can be accommodated in other types of sampling designs. Egocentric designs ( Campbell & Lee, 1991 ; O. Frank, 2005 ; Gile & Handcock, 2006 , 2010 ; Marin & Hampton, 2007 ; Marsden, 1987 ; Moore, 1990 ) focus on the ties surrounding a particular individual, or ego. In this type of study an ego generally reports on the attributes of the network actors he or she identifies (e.g., how many are male, how many have graduated college) and on the attributes of the relationships they have with them (e.g., which actors provide support and how much support they might provide).
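Egocentric data are usually summarized with compositional measures computed over each ego's alters. The Python sketch below assumes each alter is recorded as a dictionary of ego-reported attributes; the field names `sex`, `college`, and `support`, like the values, are hypothetical. It computes the kinds of summaries mentioned above: the share of alters with a given attribute and the average support they provide.

```python
def share_with(alters, field, value):
    """Share of ego's alters whose reported `field` equals `value`."""
    if not alters:
        return 0.0
    return sum(1 for alter in alters if alter.get(field) == value) / len(alters)


def mean_of(alters, field):
    """Average of a numeric ego-reported field across alters."""
    return sum(alter[field] for alter in alters) / len(alters)


# One hypothetical ego's report on four alters.
alters = [
    {"sex": "male", "college": True, "support": 3},
    {"sex": "female", "college": True, "support": 5},
    {"sex": "female", "college": False, "support": 1},
    {"sex": "male", "college": True, "support": 2},
]
print(share_with(alters, "sex", "male"))    # 0.5
print(share_with(alters, "college", True))  # 0.75
print(mean_of(alters, "support"))           # 2.75
```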

One benefit of this type of data collection is that it is not hindered by a possibly irrelevant boundary and thus can be administered to large, randomly sampled populations. Indeed this type of network data collection has been included in a number of national level surveys, most notably the General Social Survey ( Burt, 1984 , 1986 , 1987a , 1987b ; Burt & Guilarte, 1986 ; Marsden, 1987 ). However, egocentered study designs provide little opportunity for respondents to indicate how well connected their network members might be to each other and thus eliminate the possibility of exploring how structural features beyond those associated with ego/alter dyads might impact behaviors and attitudes of interest ( Kenny, 1995 , 1996a , 1996b ; Kenny & Cook, 1999 ; Kenny & Judd, 1986 ; Kenny, Kashy, & Bolger, 1998 ; Kenny, Kashy, & Cook, 2006 ; Kenny & La Voie, 1984 ; Snijders & Kenny, 1999 ; van Duijn, van Busschbach, & Snijders, 1999 ). Newer dyadic data analysis models and “social relations” models account for clustering within dyads and allow for the multilevel modeling of such data.

To expand an egocentered network study to include how local structural features beyond the dyad might impact behaviors, we have to add an additional data collection task: we must also ask the respondent to provide information about interactions among his or her network alters. Studies that elicit alters, collect attribute information about the ego and their alters, and complete the study by asking the ego to report on his or her perceptions of interaction among actors are called personal network studies ( Bernard, Johnsen, Killworth, & McCarty, 1990 ; Lubbers et al., 2010 ; McCarty, 2002 ; McCarty, Bernard, Killworth, & Shelley, 1997 ; McCarty, Killworth, & Rennell, 2007 ). These studies retain all the features of standard egocentric network studies but also incorporate a respondent’s local “cognitive” social structure, hence an alternative name: cognitive social networks ( Freeman et al., 1987 ; Krackhardt, 1987 ).
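What personal network studies add is the ego's report of ties among alters. As a sketch, the Python below computes two simple structural descriptors of that reported alter-alter network: its density and its separate clusters of connected alters. Ego is left out, since ego is tied to every alter by construction. The code assumes undirected ties, each reported once; the alters and ties are hypothetical.

```python
def alter_density(alters, alter_ties):
    """Share of possible alter-alter pairs that ego reports as tied."""
    n = len(alters)
    possible = n * (n - 1) / 2
    return len(alter_ties) / possible if possible else 0.0


def alter_clusters(alters, alter_ties):
    """Groups of alters linked by reported ties (connected components)."""
    adj = {a: set() for a in alters}
    for u, v in alter_ties:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in alters:
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:  # depth-first search from start
            node = stack.pop()
            if node not in cluster:
                cluster.add(node)
                stack.extend(adj[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


alters = ["mother", "sister", "neighbor", "coworker1", "coworker2"]
ties = [("mother", "sister"), ("sister", "neighbor"), ("coworker1", "coworker2")]
print(alter_density(alters, ties))        # 0.3
print(len(alter_clusters(alters, ties)))  # 2: family/neighborhood and work
```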

The underlying assumption of a personal network study is that individuals’ perceptions of the social structures surrounding them are what have the greatest impact on their behaviors and are thus relevant and important to measure. This type of data collection can be introduced into larger studies because there is no requirement that respondents be socially connected. The structural information collected is based on each respondent’s reported network members (who are not members of the study sample), and thus measures of network structure and composition become independent descriptors of the respondent’s social context. There is some possibility that different respondents will name the same actors and thus violate assumptions of independence. To date, however, there is little information on how frequently this occurs, and no direct approaches exist (given confidentiality concerns) for addressing this issue.

Personal network data can also be modeled using the newer dyadic and social relations models, given their multilevel features ( Kenny, 1995 , 1996a , 1996b ; Kenny & Cook, 1999 ; Kenny & Judd, 1986 ; Kenny et al., 1998 ; Kenny et al., 2006 ; Kenny & La Voie, 1984 ). Given a series of simplifying assumptions, these data can also be modeled using more standard exponential random graph approaches, although a discussion of those complexities is beyond the scope of this chapter ( Lubbers et al., 2010 ; van Tilburg, 1998 ).

Each of these data collection approaches can be used at one or more points in time. The researcher must decide which approach best matches the study’s objectives. The limitations inherent in determining causality in cross-sectional studies are well understood. Longitudinal studies address these limitations but bring the additional complexities of loss to follow-up, changes in population membership (new members of a bounded community, loss of old members), or drastic changes in network structures caused by exogenous events (Hurricane Katrina likely completely demolished well-established networks in New Orleans; being diagnosed with cancer or another chronic illness can lead to drastic changes in an individual’s egocentric or personal network) ( Koskinen & Edling, 2010 ; Snijders, 2005 , 2006 , 2009 ; Snijders, van de Bunt et al., 2010 ; Steglich et al., 2010 ). These features can (and indeed must) also be included in network statistical models to ensure goodness of fit and appropriate parameter estimates.

A researcher must also consider whether affiliation data ( Borgatti & Everett, 1997 ; Borgatti & Halgin, 2010 ; Breiger, 1974 ; Schweinberger & Snijders, 2007 ) (as we have discussed previously) or direct data collection is best. This decision is sometimes based on the population under investigation and sometimes based on the focus of the study. Elites and hard-to-reach populations are sometimes best studied with affiliation data, particularly data that can be collected via archival or publicly available sources ( Moore, 1979 ). For example, studies of interlocking directorates ( Mizruchi, 1996 ) can often be completed based on organizations’ annual reports. Studies of terrorists ( Krebs, 2002 ) are often completed using public record archival data, as interviewing terrorists may not be allowed. Although collecting affiliation data might not be the most desirable choice, it may be the best choice in specific circumstances. Finally, given the time associated with collecting even a single relation in a population, the number and type of relations a researcher collects in a population becomes important, as does the selection of appropriate respondent and alter attributes. Rudimentary exponential random graph models for affiliation data exist and are being actively developed ( Koskinen & Edling, 2010 ).
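Affiliation (two-mode) data are commonly converted into ties between actors by counting shared affiliations. This is equivalent to multiplying the actor-by-organization incidence matrix by its transpose and reading the off-diagonal cells. The Python sketch below does this for hypothetical board memberships, the kind of data behind studies of interlocking directorates; the names and boards are illustrative only.

```python
def co_affiliation_ties(affiliations):
    """Project actor -> organizations data onto actor pairs; the weight of a
    pair is the number of organizations the two actors share."""
    members = {}
    for actor, orgs in affiliations.items():
        for org in orgs:
            members.setdefault(org, set()).add(actor)
    weights = {}
    for actors in members.values():
        ordered = sorted(actors)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


# Hypothetical directors and the company boards they sit on.
boards = {"Kim": {"Acme", "Bolt", "Core"}, "Lee": {"Acme", "Bolt"}, "Ray": {"Core"}}
print(sorted(co_affiliation_ties(boards).items()))
# [(('Kim', 'Lee'), 2), (('Kim', 'Ray'), 1)]
```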

There are other concerns associated with network studies. First, complete network studies suffer from issues of boundary specification ( Laumann et al., 1983 ). Although this is less of a problem for egocentric or personal network studies, determining the appropriate population can still be problematic. Second, in certain circumstances network sampling ( O. Frank, 2005 ; Golinelli et al., 2010 ; Goodman, 1961 ; Holland & Leinhardt, 1973 ; McCarty, 2002 ; McCarty, Killworth et al., 2007 ; Robins, Pattison, & Woolcock, 2004 ) becomes an issue. For complete network studies, sampling individuals is likely to lead to respondents who do not know each other and thus to disconnected networks. For all types of network studies, sampling is likely to introduce measurement error ( Holland & Leinhardt, 1973 ) into network descriptive indices and bias into analyses that incorporate network structural features. To date, the type and magnitude of these errors and biases have not been fully investigated. However, these concepts are still important for model interpretation and for understanding and discussing the limitations of network statistical models ( Schweinberger, 2012 ; Snijders, 1996 , 2001 , 2008 ; Snijders, Koskinen, & Schweinberger, 2010 ; Snijders & van Duijn, 1997 ).
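The disconnection problem with node sampling in complete network studies is easy to see in simulation. If nodes are sampled at random, a tie is observed only when both of its endpoints are sampled, so only roughly the square of the sampling fraction of ties survives. The Python sketch below assumes simple random sampling of nodes from a hypothetical ring-shaped network in which each person is tied to the next two people; the function name and seed are illustrative.

```python
import random


def sample_observed_ties(nodes, edges, fraction, seed=None):
    """Randomly sample a fraction of nodes; only ties between two sampled
    nodes can be observed."""
    rng = random.Random(seed)
    sampled = set(rng.sample(list(nodes), round(fraction * len(nodes))))
    observed = [(u, v) for u, v in edges if u in sampled and v in sampled]
    return sampled, observed


# Hypothetical network: 100 people, each tied to the next two around a ring.
people = range(100)
ties = [(i, (i + step) % 100) for i in people for step in (1, 2)]
sampled, observed = sample_observed_ties(people, ties, fraction=0.3, seed=1)
print(len(sampled) / 100, len(observed) / len(ties))  # 30% of people, a much smaller share of ties
```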

There Are Well-Established Approaches for Analyzing Social Network Data

There are a great many deterministic approaches for analyzing social network data, discussed in detail in every network analysis text ( Hanneman & Riddle, 2005 ). These approaches tend to focus only on the data at hand and rely on empirical rather than probabilistic reasoning. Most of these deterministic approaches are based on the manipulation of data matrices using matrix algebra. At a minimum, these data must include a square sociomatrix in which the cells of the matrix represent relational ties. To this might be added other relational sociomatrices or rectangular attribute matrices that provide exogenous information about the members of the network. The data in these sociomatrices can be valued or dichotomous, although most of these deterministic approaches were developed for dichotomous data. Valued data are generally analyzed by dichotomizing at some chosen level such that values at or above the threshold represent a relationship and those below do not. Basic network analytic approaches can include visualizations of the network information, can focus on how individual nodes relate to each other, can seek to group sets of nodes based on structural characteristics, or can explore the structure of the entire network.
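Dichotomizing a valued sociomatrix at a chosen threshold, as described above, takes only a few lines. The Python sketch below assumes a square list-of-lists matrix and also sets the diagonal to zero, on the assumption that self-ties are not meaningful; the contact-frequency values are hypothetical.

```python
def dichotomize(matrix, threshold):
    """1 where a tie value is at or above `threshold`, else 0; the diagonal
    (self-ties) is always 0 under this sketch's assumption."""
    return [[1 if i != j and value >= threshold else 0
             for j, value in enumerate(row)]
            for i, row in enumerate(matrix)]


# Hypothetical weekly contact counts among three people.
contacts = [[0, 5, 1],
            [4, 0, 2],
            [0, 3, 0]]
print(dichotomize(contacts, threshold=3))  # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Lowering the threshold adds ties (at a threshold of 1, every nonzero off-diagonal count becomes a tie), so the chosen cutoff directly shapes density and every index computed afterward.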

Common to all of these approaches, however, is the recognition that these analyses are rarely generalizable to other populations or other relations. It is important to understand the predecessors of the new statistical models, beginning with these deterministic approaches. More often than not, innovations (such as approaches to modeling valued data) stem from solutions applied to earlier analytic strategies ( Robins, Pattison, & Wasserman, 1999 ). Further, all the applications we discuss in this section are actively driving the development of new modeling approaches, parameters, and paradigms as the need for statistical models to support social networks research grows.

Visual Analytic Approaches

Some network researchers base their search for structure on network visualizations; others begin with visualizations, using them to guide their quantitative analyses ( Hogan, Carrasco, & Wellman, 2007 ; Johnson, Christian, Brunt, Hickman, & Waide, 2010 ; Johnson & Krempel, 2004 ; Johnson, Luczkovich, & Borgatti, 2009 ; McCarty, Molina, Aguilar, & Rota, 2007 ). Some researchers ( Borner, Chen, & Boyack, 2003 ; Brandes & Pich, 2009 ; Pich, 2008 ) work to develop ever better algorithms and approaches for graphing network information ( Brandes et al., 2008 ; Brandes, Fleischer, & Puppe, 2007 ; Brandes & Lerner, 2008 , 2010 ; Brandes & Pich, 2011 ). Indeed, visual analytic approaches have expanded beyond the network analytic community and have become fairly common, even in the popular press. There exist programs for visualizing static networks and attributes in two and three dimensions ( Corum, 2011 ). NetDraw ( Borgatti, 2002 ) is the easiest to use and most interoperable; Pajek ( Batagelj & Mrvar, 1998 ), CI-KNOW ( Green, Contractor, & Yao, 2006 ), JUNG ( O’Madadhain, Fisher, White, & Boey, 2003 ), Mage ( Richardson & Richardson, 1992 ), and Visone ( Brandes & Wagner, 2003 ) also create network diagrams and facilitate visual analytic approaches. Each has its own outstanding features. Multidimensional scaling ( Pich, 2008 ) can also be used to investigate networks in ways that help to identify similarities among nodes based on their patterns of connections. Programs that visualize network dynamics also exist. The Mage program has a dynamic instance, called kinemage ( Richardson & Richardson, 1992 ). Visone also allows for the visualization of dynamic networks.

Regardless of the program chosen, the visual analytic ( Heer & Agrawala, 2008 ; Shen, Ma, & Eliassi-Rad, 2006 ) approach seeks to facilitate the detection of structure and similarity in networks, whether that be based on structural or attribute data. Visualizations of network structure and composition often lead to intuitions regarding the data and contribute to hypothesis and model development. Thus, understanding visual analytics becomes important in the early stages of developing network statistical models.

As we mentioned previously, a number of key questions in social network analysis hinge on the intuition that social position is in some way associated with differential success or access to resources. For example, is popularity in a social network associated with better job performance ( Sparrowe, Liden, Wayne, & Kraimer, 2001 )? Is social isolation associated with aggressive behaviors ( Cairns, Cairns, Neckerman, Gest, & Gariepy, 1988 )? These questions are usually answered using a class of indicators often called centrality measures ( Freeman, 1978 ) and standard statistical approaches that measure association (like correlations and regressions).

There are numerous other measures of network centrality or prestige based on varying operationalizations of social embeddedness. Each of them has relevance in certain social contexts. Researchers who study adolescents explore whether centrality is associated with pro-social behaviors ( Farmer & Rodkin, 1996 ), substance use ( Valente, Gallaher, & Mouttapa, 2004 ), aggression ( Cairns et al., 1988 ), or school performance ( Friedkin & Slater, 1994 ). Organizational researchers investigate whether centrality in a formal communication structure correlates with centrality in informal communication networks ( Kraut, Fish, Root, & Chalfonte, 1993 ). Business researchers may investigate whether centrality in an exchange network is associated with long-term success ( Benson, 1975 ). The common theme is that a particular node’s position relative to his or her peers in the network is related to exogenous node attributes of interest. The interested reader is directed particularly to the work of Freeman, Bonacich, and Friedkin for conceptual clarifications and discussion of the features of this broad range of indicators ( Bonacich, 1987 ; Borgatti, 2005 ; Freeman, 1977 , 1978 ; Friedkin, 1991 ; Gould, 1989 ; Scott, 2000a ; Wasserman & Faust, 1994 ). Analytically, these indicators are generally linked to nodal attributes using standard statistical approaches, with network dependencies overlooked and limited generalizability acknowledged.
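As a sketch of this typical workflow (compute centrality scores, then correlate them with a node attribute), the Python below computes degree centrality and closeness centrality, defined here as n - 1 divided by the sum of a node's geodesic distances, for a connected, undirected network. It then computes a Pearson correlation with a hypothetical attribute. As the text notes, such correlations overlook the dependence among network observations, so ordinary significance tests on them should be treated with caution. The network and grades are hypothetical.

```python
import math
from collections import deque


def closeness(adj, node):
    """(n - 1) / sum of geodesic distances; assumes a connected network."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical star-shaped classroom network and school grades.
adj = {"hub": {"p1", "p2", "p3", "p4"},
       "p1": {"hub"}, "p2": {"hub"}, "p3": {"hub"}, "p4": {"hub"}}
grades = {"hub": 90, "p1": 70, "p2": 75, "p3": 65, "p4": 80}
nodes = sorted(adj)
degree = [len(adj[n]) for n in nodes]  # degree centrality
print({n: round(closeness(adj, n), 3) for n in nodes})
print(round(pearson(degree, [grades[n] for n in nodes]), 3))
```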

Dyads and Subgroups

Beyond the position of a single node in a network, often researchers are interested in small groups of two or three nodes. In particular, a researcher may be interested in whether reciprocated ties in a network appear more or less frequently than would be expected by chance ( Bonacich, 1987 ; Borgatti, 2005 ; Freeman, 1978 , 1979 ; Friedkin, 1980 , 1991 ; Gould, 1989 ; Granovetter, 1973 , 1983 ; Lin et al., 1981 ; Scott, 2000a ; Wasserman & Faust, 1994 ). Further, researchers may be interested in whether reciprocation is related to any sort of shared nodal attribute. For example, are women more likely to reciprocate ties with each other than men are ( Hagan, 1998 )? Do smokers reciprocate ties with other smokers more or less often than we might expect ( Hall & Valente, 2007 )? Analyses that explore homophily in dyadic relationships are becoming more and more frequent as computational approaches that enable rapid calculation of measures associated with homophilous dyads evolve.

Groups of three are the smallest social systems in which cliques may form. As such, they represent the simplest “exemplar” social network. A great deal of attention has been devoted to exploring the number and type of triangles that exist in a network ( Davis, 1967 ; Holland & Leinhardt, 1971 ). The triangles could be completely disconnected, partially connected (in a variety of interesting and socially relevant patterns), or completely mutually connected. The idea of “closing the gap” has led to the development of quite a bit of network-based theory, from Granovetter’s strength of weak ties arguments ( Friedkin, 1980 ; Granovetter, 1973 , 1983 ; Lin et al., 1981 ) to Burt’s structural holes and brokerage concepts ( G. Ahuja, 2000 ; Burt, 1992 , 2004 , 2005 ; Fernandez & Gould, 1994 ; Podolny & Baron, 1997 ). In general, these ideas focus on the costs and benefits associated with open or closed triangles (often of the transitive sort).

Of course, subgroups larger than three people are often the focus of our research interests. In particular, cliques (groups of nodes that are completely connected) indicate areas of very strong connection whose origins are often worth investigating. An extended discussion of the fine distinctions between cliques and related structurally defined subgroups is beyond the scope of this introductory chapter, but the interested reader is directed to the substantial literature on these concepts. Identification of structural subgroups can be computationally intensive, which may have limited their application in social research. However, approaches do exist for identifying community structure within networks and they are being applied, particularly in the analysis of adolescent friendship data ( Clauset et al., 2004 ; Danon et al., 2005 ; Donetti & Munoz, 2004 ; Fortunato, 2010 ; Fortunato et al., 2004 ; Girvan & Newman, 2002 ; Guimera et al., 2004 ; Gustafsson et al., 2006 ; Hastings, 2006 ; Lancichinetti & Fortunato, 2009 ; Newman, 2001b , 2004a , 2004b , 2006 ; Newman & Girvan, 2004 ; Pollner et al., 2006 ; Reichardt & Bornholdt, 2004 ).
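Finding cliques illustrates why subgroup identification can be computationally intensive: the number of maximal cliques can grow exponentially with network size. The Python sketch below uses the basic Bron-Kerbosch algorithm, without the pivoting refinements used in practice. This is one standard way to list every maximal clique, that is, every completely connected set of nodes that is not part of a larger one. The example network is hypothetical and undirected.

```python
def maximal_cliques(adj):
    """Basic Bron-Kerbosch: yield each maximal clique of an undirected graph
    given as a dict of neighbor sets."""
    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            yield clique  # nothing can extend it, so the clique is maximal
            return
        for v in list(candidates):
            yield from expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}  # cliques containing v are now fully explored
            excluded = excluded | {v}
    yield from expand(set(), set(adj), set())


# Hypothetical friendship network: a trio A-B-C, plus a chain C-D-E.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E"}, "E": {"D"}}
print(sorted(sorted(c) for c in maximal_cliques(adj)))
# [['A', 'B', 'C'], ['C', 'D'], ['D', 'E']]
```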

Blockmodels

Individuals in networks, besides being connected to each other or grouped into community structures, can also be similar because they occupy similar, structurally equivalent positions. Studies of equivalence generally take perfect structural equivalence as the ideal and calculate indices of how structurally equivalent any two nodes are to each other. This could be a Euclidean distance or correlation (among many choices of measures of association) calculated using each node’s connections. The indices are used to determine which nodes are equivalent.
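A Euclidean distance index of structural equivalence compares two nodes' patterns of sending and receiving ties; a distance of zero means the two nodes are perfectly structurally equivalent. The Python sketch below implements one version of that index for a binary directed sociomatrix. It skips the cells involving the two compared nodes themselves, a simplification assumed here (some formulations handle those cells differently). The matrix is hypothetical.

```python
import math


def equivalence_distance(matrix, i, j):
    """Euclidean distance between nodes i and j over their outgoing (row)
    and incoming (column) ties to every third node k."""
    total = 0
    for k in range(len(matrix)):
        if k in (i, j):
            continue  # simplification: ignore cells involving i and j themselves
        total += (matrix[i][k] - matrix[j][k]) ** 2  # ties sent to k
        total += (matrix[k][i] - matrix[k][j]) ** 2  # ties received from k
    return math.sqrt(total)


# Nodes 0 and 1 both send ties to 2 and 3 and receive none.
m = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(equivalence_distance(m, 0, 1))  # 0.0: perfectly equivalent
print(equivalence_distance(m, 0, 2))  # 1.414...: different positions
```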

Adopting a theoretical assumption of equivalence leads to blockmodels, either direct/exploratory or generalized/confirmatory (where one assumes not only the number of positions/subgroups but how the positions relate to each other). A blockmodel is a set of statements mapping actors to positions and explaining the relationships among and between positions ( Doreian, Batagelj, & Ferligoj, 2004 , 2005a ; Panning, 1982 ; H. C. White, Boorman, & Breiger, 1976 ). Some blockmodels use the fact that members of the same position are exchangeable to motivate explorations of what other characteristics actors might share ( DiMaggio, 1986 ). Others use the fact that exchangeability increases the network’s resilience to shocks to explore other structural features of the network, such as how the network’s overall structure would change if nodes were removed ( Xu & Chen, 2005 ).

Permutation tests often form the basis for determining how well the observed data conform to an expected blockmodel. This extends analysis of structural equivalence and focuses on common patterns of relationships among groups of nodes that are members of the same “equivalence class.” For example, one might develop a set of hypotheses about how physicians, nurses, and patients interact. Those hypotheses could be represented and tested via blockmodeling. The goal here is to present a set of structural hypotheses about the connections between groups of nodes and then test them based on the empirical data contained in the sociomatrix.
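Once nodes are assigned to positions, a blockmodel is usually summarized by the density of ties within and between positions, which can then be compared with a hypothesized image of which positions should be tied. The Python sketch below computes those block densities for a directed binary sociomatrix and checks them against a hypothetical image using a simple density cutoff. The physician/nurse assignment, the 0.5 cutoff, and the data are all illustrative assumptions, and the permutation-based assessment of fit mentioned above is not implemented here.

```python
def block_densities(matrix, positions):
    """Density of ties sent from each position to each position (self-ties excluded)."""
    n = len(matrix)
    labels = sorted(set(positions))
    densities = {}
    for a in labels:
        for b in labels:
            cells = [(i, j) for i in range(n) for j in range(n)
                     if i != j and positions[i] == a and positions[j] == b]
            ties = sum(matrix[i][j] for i, j in cells)
            densities[(a, b)] = ties / len(cells) if cells else 0.0
    return densities


# Hypothetical ties: physicians (P) direct each other and both nurses (N).
m = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
positions = ["P", "P", "N", "N"]
hypothesis = {("P", "P"): 1, ("P", "N"): 1, ("N", "P"): 0, ("N", "N"): 0}
observed = block_densities(m, positions)
image = {block: int(d >= 0.5) for block, d in observed.items()}  # illustrative cutoff
print(observed)
print(image == hypothesis)  # True: the data match the hypothesized image
```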

Doreian, Batagelj, and Ferligoj (2005a) describe some recent generalizations of blockmodels, including some statistical ideas for fit and assessment. The computer program Pajek ( Batagelj & Mrvar, 1998 ) was designed to fit generalized blockmodels and has implemented this strong body of work quite well. Analyses of structural equivalence have also been targeted to localized areas of networks (e.g., within one or two steps from a focal node) to facilitate analyses, which can become computationally intensive ( Breiger, Boorman, & Arabie, 1975 ; Faust & Romney, 1985 ; Panning, 1982 ; SAS Institute, 1990 ). Further algebraic/positional analyses become possible because of the simplifying equivalence assumption, and useful (but technical) techniques for the study of multiple relational networks have been developed ( Pattison, 1993 ; Wasserman & Faust, 1994 ).

Quadratic Assignment

There are also analytic approaches that investigate networks as a whole. Most of them are based on ideas that compare the observed sociomatrix to a hypothesized or expected sociomatrix. Underlying all permutation tests is a statistical assumption that the data are random and follow a uniform distribution, conditional on all node features that do not depend on the node labels. This approach, referred to as the quadratic assignment procedure, allows one to test conformity of two relations nonparametrically ( Fienberg & Wasserman, 1981 ; Wasserman & Anderson, 1987 ). In essence, it calculates a measure of association between two matrices and evaluates it with a permutation test. In general, these approaches take the network as a whole and compare it to some hypothesized “target” network, providing information on structural patterns and on how well a researcher’s structural intuitions are borne out.
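The quadratic assignment procedure can be sketched in a few lines. First, compute a correlation between the off-diagonal cells of two sociomatrices. Then recompute it many times after relabeling the nodes of one matrix (permuting its rows and columns together), and report how often the permuted correlation is at least as large as the observed one. The Python below is a minimal illustration under those assumptions, not a replacement for tested implementations; the friendship and advice matrices are hypothetical.

```python
import math
import random


def offdiag_correlation(a, b):
    """Pearson correlation between the off-diagonal cells of two square matrices."""
    n = len(a)
    xs = [a[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [b[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def qap(a, b, n_perm=2000, seed=None):
    """Observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = offdiag_correlation(a, b)
    n = len(b)
    at_least = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Relabel nodes: rows and columns move together, so b's structure is kept.
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if offdiag_correlation(a, permuted) >= observed:
            at_least += 1
    return observed, at_least / n_perm


friendship = [[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]]
advice = [[0, 1, 1, 0, 0],
          [1, 0, 0, 0, 0],
          [1, 1, 0, 0, 1],
          [0, 0, 0, 0, 1],
          [0, 0, 0, 1, 0]]
r, p = qap(friendship, advice, seed=3)
print(round(r, 3), p)
```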

Researchers Are Developing New Statistical Approaches for Analyzing Social Network Data

Recent advances in statistical modeling of network structures and behaviors now make it possible to approach network-type questions from a much more fundamental perspective (Robins et al., 2001; Robins, Pattison, Kalish, & Lusher, 2007; Robins et al., 2009; Schweinberger, 2012; Schweinberger & Snijders, 2007; Snijders, 1996, 2001, 2005, 2006, 2008, 2009; Snijders, Koskinen et al., 2010; Snijders, Steglich, & Schweinberger, 2007; Snijders, van de Bunt et al., 2010; Snijders & van Duijn, 1997; Steglich et al., 2010; Steglich, Snijders, & West, 2006; Wasserman & Robins, 2005). A more comprehensive approach to analyzing dynamic network data is provided by a new exponential family of random graph distributions known as p*, or exponential random graph models (ERGMs; Steglich, Snijders, & West, 2006). Generalized estimating equations (GEEs) are designed to model dependent categorical data, perhaps measured over time. Christakis and Fowler have used them effectively in their Framingham studies (Christakis & Fowler, 2008; Dawber, Kannel, & Lyell, 1963). These two approaches are based on the idea that networks and individual attributes co-evolve over time and are interdependent. Statistical and stochastic blockmodeling approaches continue to be developed, especially generalized/confirmatory blockmodels for analyzing large-scale data sets. Dyadic data analytic approaches, extended from multilevel models, are also gaining prominence for analyzing social network data, particularly egocentric and personal network data (Kenny, 1995, 1996a, 1996b; Kenny & Cook, 1999; Kenny & Judd, 1986; Kenny et al., 1998; Kenny et al., 2006; Kenny & La Voie, 1984; Snijders & Kenny, 1999). In this section we briefly present each of these analytic approaches. We conclude with a discussion of issues associated with these approaches, particularly the impact of missing data on all statistical approaches for social network analysis.

Statistical Models of Networks and Behavior

Some new statistical approaches model the relationships between network structure and individual behaviors and attitudes. They start from the assumption that all structures and behaviors are dependent, perhaps in complicated ways. Analytic and empirical evidence has shown that network ties are not independently distributed. Pressure against the forbidden triad, in which one actor has strong ties to two others who are not tied to each other, is one example of this type of dependence. Homophily among friends is another. Longitudinal studies have shown that network structure evolves over time as a function of earlier network structure, and that it can also change because of the attributes of network members. In turn, the attributes of network members can change over time, based on the members’ structural characteristics.

Indeed, network structure and individual attributes often co-evolve. Recall that social influence is the mechanism by which a network member’s attributes change based on his or her social position or on the attributes of the network members with whom he or she is directly connected. Social selection is the mechanism by which network structure changes because network members create and/or dissolve ties based on others’ attributes. Most new statistical approaches attempt to disentangle these evolutionary mechanisms, separating social influence from social selection (Aral et al., 2009; Go et al., 2010; Hall & Valente, 2007; Mercken, Snijders, Steglich, Vartiainen et al., 2010; Steglich et al., 2010).

Random Graph Distributions: p* or ERGMs

Network statistical approaches for cross-sectional studies, with measurements at one point in time, began with the statistical models p1 (dyadic independence) and p2 (dyadic independence with heterogeneous parameters depending on actor covariates). A more sophisticated approach has since evolved, using the exponential family of random graph distributions known as p* (Wasserman & Robins, 2005) or exponential random graph models (ERGMs). Software programs for the analysis of cross-sectional models are in continual development. The R packages statnet and ergm and the stand-alone program PNET are most frequently used in this context (Handcock, Hunter, Butts, Goodreau, & Morris, 2003; Hunter, Handcock, Butts, Goodreau, & Morris, 2008; PNET). For longitudinal studies of networks, new statistical models of structural evolution that depend only on structural and compositional factors have been very successful. The underlying principles of actor-oriented models for longitudinal research were presented in Snijders (1996) and developed substantially by Snijders and his research team over the past decade. The statistical package SIENA fits these actor-oriented models (Ripley & Snijders, 2010).
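Estimating a full ERGM is the job of dedicated software such as the packages named above. One identity behind these models can still be shown directly. For an undirected ERGM with an edges term and a triangles term, the conditional log-odds of a tie between i and j, given the rest of the graph, is the parameter vector θ times the “change statistics.” These are the increase in each statistic when the tie is switched from absent to present. Adding tie i-j adds one edge and as many triangles as i and j have common neighbors. The sketch below computes this quantity. The model terms, parameter values, and function names are hypothetical choices for illustration. Setting the triangle parameter to zero recovers a simple Bernoulli graph, the most basic dyadic-independence case.

```python
import math

def change_stats(adj, i, j):
    """Change in (edges, triangles) when tie i-j is switched from absent to present.

    adj: symmetric 0/1 nested list (undirected, no self-loops). The result does
    not depend on whether i-j is currently present, only on the rest of the graph.
    """
    n = len(adj)
    shared = sum(1 for k in range(n) if k not in (i, j) and adj[i][k] and adj[j][k])
    return (1, shared)

def tie_probability(adj, i, j, theta):
    """P(Y_ij = 1 | rest of graph) under a hypothetical edges + triangles ERGM."""
    delta = change_stats(adj, i, j)
    logit = sum(t * d for t, d in zip(theta, delta))
    return 1.0 / (1.0 + math.exp(-logit))

# Nodes 0 and 1 share two neighbors (2 and 3), so tie 0-1 would close two triangles.
g = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
print(change_stats(g, 0, 1))                   # (1, 2)
print(tie_probability(g, 0, 1, (-1.0, 0.5)))   # 0.5: the triangle effect offsets the edge cost
```

This conditional form is also what Gibbs-type MCMC samplers use when simulating networks from an ERGM, which is why change statistics appear throughout ERGM estimation.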

Underlying most approaches to p* modeling is the forward specification of models that contain parameters designed to assess the role that network structure, network content, and interaction effects play in network evolution. These modeling approaches depend on conditional Markov chain Monte Carlo maximum likelihood estimation techniques (Schweinberger & Snijders, 2007; Snijders, Koskinen et al., 2010). Models can include several kinds of parameters. Some describe directed and undirected combinations of dyads and triads, and others capture important categorical and continuous attribute effects. Still others capture network influence and selection, including the degree to which individuals’ attributes come to resemble those of the network as a whole and those of their friends. Forward specification, together with the diversity of available parameters, allows for flexible modeling of complex network dynamics. However, it requires a strong theoretical justification for each specified parameter. Longitudinal analyses of this kind explicitly model influence and selection through two decision equations, one behavioral and one structural, which are essentially random utility models that predict behavioral or structural decisions (Robins et al., 2007; Snijders, 1996, 2005; Snijders, van de Bunt et al., 2010; Steglich et al., 2010; Steglich et al., 2006; Wasserman & Robins, 2005). The SIENA program uses an actor-oriented approach, which is only one of several possible approaches to modeling longitudinal change in social networks. Its basic mechanism is as follows. The time period between observations at time t − 1 and time t is treated as continuous and decomposed into a sequence of small, unobserved microsteps. At each microstep, a single actor makes either a behavioral decision (influence) or a structural change (selection), given the state of the network at the previous microstep. Together, these microsteps produce the overall changes observed in the network between time t − 1 and time t. These decisions are expressed as transition probabilities from one choice state to another, with behavioral and structural decisions defined as mutually dependent (Handcock, Hunter, Butts, Goodreau, & Morris, 2003; Robins et al., 2007; Snijders et al., 2006). As a variant of p* that incorporates a time element, the model is related to spatial autocorrelation models as described by Strauss (1992) and Wasserman and Pattison (1996).
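The microstep mechanism can be illustrated with a deliberately simplified toy simulation that covers only the network (selection) side. At each microstep a randomly chosen actor considers toggling one of its outgoing ties, or making no change. It picks an option with multinomial-logit probabilities based on a hypothetical objective function with an outdegree effect and a reciprocity effect. The simplifications are assumptions made for the sketch:

* a fixed number of microsteps stands in for the exponentially distributed waiting times of the real model;
* the behavior equation is omitted;
* the parameter values and function names are hypothetical, not RSiena's API.

```python
import math
import random

def evaluation(adj, i, beta_out, beta_rec):
    """Hypothetical objective function for actor i: outdegree + reciprocity effects."""
    n = len(adj)
    out = sum(adj[i][j] for j in range(n) if j != i)
    rec = sum(adj[i][j] * adj[j][i] for j in range(n) if j != i)
    return beta_out * out + beta_rec * rec

def microstep(adj, rng, beta_out=-1.0, beta_rec=1.5):
    """One network microstep: a random actor toggles at most one outgoing tie."""
    n = len(adj)
    i = rng.randrange(n)
    options = [None] + [j for j in range(n) if j != i]  # None = keep current ties
    scores = []
    for j in options:
        if j is not None:
            adj[i][j] = 1 - adj[i][j]      # tentatively toggle the tie i -> j
        scores.append(evaluation(adj, i, beta_out, beta_rec))
        if j is not None:
            adj[i][j] = 1 - adj[i][j]      # undo the tentative toggle
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # multinomial logit choice
    choice = rng.choices(options, weights=weights)[0]
    if choice is not None:
        adj[i][choice] = 1 - adj[i][choice]
    return i, choice

def simulate(adj, steps, seed=0, **betas):
    """Apply a fixed number of microsteps to a copy of the time t - 1 network."""
    rng = random.Random(seed)
    state = [row[:] for row in adj]
    for _ in range(steps):
        microstep(state, rng, **betas)
    return state

empty = [[0] * 5 for _ in range(5)]
print(simulate(empty, 50, seed=3))
```

Because every decision is made by one actor given the current state, the mutual dependence between successive changes arises naturally from the sequence of microsteps.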

Generalized Estimating Equations

More complicated longitudinal network data sets have recently been analyzed using generalized estimating equations (see, for example, the analytic methods used by Christakis & Fowler, 2008). GEEs extend generalized linear models to categorical or continuous outcomes measured repeatedly over time, such as obesity (measured by body mass index) or happiness scores (Christakis & Fowler, 2007; Fowler & Christakis, 2008). The associations among the repeated measurements introduce a multivariate aspect to the estimation, hence the need for approximations and special software. The main advantage of GEEs lies in the consistent estimation of population-averaged regression coefficients even when the working correlation structure is misspecified. Such models and their assumptions can be used to model longitudinal network data and actor attribute measurements.
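The robustness property can be seen in the simplest case: a linear model (identity link) with an independence working correlation. There, the GEE point estimates equal ordinary least squares. Valid population-averaged standard errors come from the robust “sandwich” covariance, which uses the empirical residual covariation within each cluster, for example each person’s repeated measurements. The sketch below implements that case for one predictor. The data layout and function name are hypothetical. Real analyses would use a dedicated GEE package, other link functions, and richer working correlations.

```python
def gee_linear_independence(clusters):
    """GEE for y = b0 + b1 * x with identity link and independence working correlation.

    clusters: list of clusters (e.g., one per person), each a list of (x, y) pairs.
    Returns ((b0, b1), robust_cov), where robust_cov is the 2 x 2 sandwich estimate.
    """
    # "Bread": A = X'X and X'y summed over all observations, with X = [1, x]
    a00 = a01 = a11 = 0.0
    xy0 = xy1 = 0.0
    for cluster in clusters:
        for x, y in cluster:
            a00 += 1.0
            a01 += x
            a11 += x * x
            xy0 += y
            xy1 += x * y
    det = a00 * a11 - a01 * a01
    inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]
    b0 = inv[0][0] * xy0 + inv[0][1] * xy1
    b1 = inv[1][0] * xy0 + inv[1][1] * xy1
    # "Meat": B = sum over clusters of u_i u_i', where u_i = X_i' e_i
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for cluster in clusters:
        u = [0.0, 0.0]
        for x, y in cluster:
            e = y - (b0 + b1 * x)
            u[0] += e
            u[1] += x * e
        for r in range(2):
            for c in range(2):
                meat[r][c] += u[r] * u[c]

    def mul(p, q):
        return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
                for r in range(2)]

    robust_cov = mul(mul(inv, meat), inv)  # sandwich: A^-1 B A^-1
    return (b0, b1), robust_cov

data = [[(0, 1), (1, 2)], [(2, 2), (3, 5)], [(1, 0), (4, 6)]]
print(gee_linear_independence(data))
```

Because the meat term is built from whole-cluster residual sums, correlation within a person is accounted for in the standard errors even though the working model treats observations as independent.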

Dyadic and other multilevel approaches also provide an opportunity for exploring network relationships clustered within respondents (Kenny et al., 2006). Generally, the outcome in these models is some kind of dyadic characteristic. This could be homophily with respect to attributes, or a dyadic behavior such as using drugs with network members or providing support to the respondent. In these models, network actors are clustered within respondents, allowing us to control for respondent-level characteristics and local network structural characteristics. The primary predictors are characteristics of network actors and other relationship attributes. These studies can follow a one-to-many design, in which actors are clustered within respondents (typical of egocentric or personal network studies). They can also follow a round-robin design, in which every actor in turn plays the role of the focal individual and reports on every other actor. Round-robin data are typically analyzed with the social relations model, which is common in dyadic analyses based on sociocentric or complete network data.
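One way to see why clustering matters in one-to-many designs is through the intraclass correlation. This is the share of variance in an alter-level outcome (for example, a rating of support received from each named alter) that lies between respondents rather than within them. The sketch below uses the standard one-way ANOVA estimator of ICC(1), with the usual adjustment for unequal numbers of alters per respondent. The data layout and function name are hypothetical. Multilevel software would normally estimate this variance split within a fuller model that includes predictors.

```python
def icc_oneway(groups):
    """ANOVA estimator of ICC(1) for an outcome measured on alters nested in egos.

    groups: list of lists; groups[g] holds the outcome for each alter named by ego g.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Average cluster size, adjusted for unequal numbers of alters per ego
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)

print(icc_oneway([[1, 1, 1], [5, 5, 5]]))  # all variation lies between egos
print(icc_oneway([[1, 2, 3], [1, 2, 3]]))  # no variation between egos
```

A sizable ICC indicates that alters named by the same respondent are not independent observations. This is the reason respondent-level clustering has to be modeled.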

Some Outstanding Issues Regarding Data Gathering and Network Sampling

Theoretical perspectives and promising findings from social network analysis have begun to influence the social and behavioral sciences. Sophisticated techniques now exist to analyze many kinds of network data (Carrington, Scott, & Wasserman, 2005; Doreian, Batagelj, & Ferligoj, 2005b; Marsden, 2005; Snijders et al., 2007). However, it is unknown how network-based study designs affect analyses that apply social network analysis to understand selection and influence processes in the context of behaviors like drug use or exercise. There is also confusion about how to assess the appropriateness, measurement properties, and reliability of network data collected cross-sectionally or longitudinally. Empirical research, together with social network theory and methods, tells us quite a bit about the statistical properties of networks measured completely, where data are gathered from every social actor about every other social actor on all relations. We lack such knowledge for social network studies where data sets are incomplete. This includes cases where information on network relationships and individual attributes is missing (by chance or by design), and cases where networks are studied at the local or personal level.

Data gathering and analysis for studies involving networks have never been easy. Random sampling in any of its forms is a standard approach in cross-sectional and longitudinal social and behavioral research, but it is simply not appropriate in a network-based study: sampled individuals are unlikely to have relational ties to each other (O. Frank, 2005; Golinelli et al., 2010; Goodman, 1961; McCarty, 2002; McCarty, Killworth et al., 2007; Thompson, 2002). Consequently, network researchers have developed a variety of sampling techniques specific to studies with network aspects. The “complete” network approach involves collecting data from every member of a target population. Although very common in studies of small, closed populations, it is not feasible for large-scale studies.
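The arithmetic behind this point is simple. Suppose s of N actors are drawn at random without replacement. Each of the m ties in an undirected network survives in the sampled (induced) network only when both of its endpoints are drawn, which happens with probability s(s − 1)/[N(N − 1)]. The sketch below uses hypothetical population figures to show how sharply the mean degree collapses. The function names are illustrative.

```python
def expected_sampled_ties(n_nodes, n_ties, sample_size):
    """Expected number of ties with both endpoints in a simple random sample of nodes."""
    p_both = sample_size * (sample_size - 1) / (n_nodes * (n_nodes - 1))
    return n_ties * p_both

def mean_degree(n_nodes, n_ties):
    """Average number of ties per actor in an undirected network."""
    return 2 * n_ties / n_nodes

# Hypothetical population: 1,000 actors with 5,000 undirected ties (mean degree 10).
full = mean_degree(1000, 5000)
sampled_ties = expected_sampled_ties(1000, 5000, 100)
print(full, round(sampled_ties, 2), round(mean_degree(100, sampled_ties), 2))
# prints: 10.0 49.55 0.99
```

A 10% random sample of actors thus retains only about 1% of the ties, and the typical sampled actor appears almost isolated.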

Snowball sampling, along with its more general forms (respondent-driven sampling and link tracing), is more applicable to network studies but less frequently used. Its effects on the measurement of network properties and on model fits are largely unknown (O. Frank, 2005; Goodman, 1961). Personal and egocentric network approaches (McCarty, 2002; McCarty, Killworth et al., 2007) ask respondents to report on their relational ties and on their perceptions of social structure among network actors. These approaches are most often implemented in large-scale behavioral studies. However, their relationship to actual social structures is largely unknown, and they generate a great deal of what is, in essence, missing data (Golinelli et al., 2010).

Modeling sampled network data (including personal network data) generates statistical problems (Holland & Leinhardt, 1973; Robins et al., 2004). Much of network analysis is based on the assumption that data are from complete networks. Any other data collection approach may introduce errors into parameter estimates or yield incorrect models, but it is currently unknown what kind and how much error may be introduced or how it may affect a study’s outcomes. Until now, no systematic investigations have examined how sampling approaches and the associated data loss affect models that link behaviors to the social context, expressed in terms of social networks. Deeper knowledge of how missing data, whether they arise by chance or by design, might affect studies that incorporate features of the social world through network indices has important ramifications for network statistical models.

Given the analytic limits of currently available statistical techniques, complete network data assume (and may even require) participation rates close to 100%. Analytic techniques for complete networks are more developed statistically (Carrington et al., 2005; Doreian et al., 2005b; O. Frank, 2005; Goodman, 1961; Marsden, 2005; Snijders et al., 2007; Thompson, 2002). However, they require data not available in traditional designs and do not naturally lend themselves to standard statistical approaches. The complete network tradition argues that the structural properties of networks map the environment. As a result, its standards of data collection are different from, and more demanding than, those of traditional studies (O. Frank, 2005; Goodman, 1961; Thompson, 2002). Little, if anything, is known about how data missing by chance in complete network studies affect estimates of individual and network-wide indices. Even the simplest indicators may be strongly biased and lead to erroneous findings regarding the relationship between network position and actual behavior (Holland & Leinhardt, 1973).

Each type of data collection brings its own type of bias. Complete network data collection approaches can either allow respondents to nominate as many friends as they wish (a “free” choice design) or fix the number of nominations (a “fixed” choice, sampled design). Fixed-choice network studies, pioneered by Frank (O. Frank, 2005; Thompson, 2002), require fewer resources to collect and thus are more likely to be implemented. In a snowball network sample (Goodman, 1961), the members of an initial set of sampled respondents report on the people to whom they have ties of a specific kind. These newly nominated individuals constitute the “first-order” zone of the network. The researcher then samples from this new list and gathers the nominations of these first-order respondents. The people they name who are not already among the original respondents or in the first-order zone constitute the “second-order” zone. Snowballing proceeds in this way through several zones.
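The zone structure of a snowball sample can be expressed as a breadth-first expansion over nominations. For simplicity, the sketch assumes that every nominee in a zone is interviewed. The text notes that in practice the researcher may sample only some of them, and a subsampling step could be added at each zone. The data layout, function name, and toy nominations are hypothetical.

```python
def snowball_zones(nominations, seeds, n_zones):
    """Return the zones of a snowball sample.

    nominations: dict mapping each respondent to the people he or she names.
    seeds: the initial respondents (zone 0).
    Returns a list of sets: zones[0] = seeds, zones[1] = first-order zone, etc.
    """
    zones = [set(seeds)]
    reached = set(seeds)
    for _ in range(n_zones):
        frontier = set()
        for person in zones[-1]:
            for named in nominations.get(person, ()):
                if named not in reached:  # only people not already in an earlier zone
                    frontier.add(named)
        reached |= frontier
        zones.append(frontier)
    return zones

noms = {"a": ["b", "c"], "b": ["a", "d"], "c": ["e"], "d": ["f"], "e": [], "f": ["a"]}
print(snowball_zones(noms, ["a"], 2))
```

Each zone contains only people first reached at that step. This mirrors the rule that second-order respondents are those not already among the original respondents or in the first-order zone.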

More general “chain methods” (methods designed to trace ties through a network from a source to an end) can be used in practice. Chain methods include snowball sampling, respondent-driven sampling (where the respondents aid the investigator in finding and interviewing actors in more distant zones), link-tracing designs (where the focus is on specific types of relationships rather than specific types of people), and the “small world” technique (where the goal is to study how many steps it takes for a random sample of individuals to reach the same target node).

In each type of data collection, selection bias is associated with each respondent’s nominees. However, this bias will vary from individual to individual. It should therefore not introduce systematic changes into the data set that cannot be accounted for in an individual error term. More important is the bias that a fixed number of nominations will introduce into a study. Unfortunately, the impact of fixed-choice designs on network indices has not been fully investigated. Consider the Framingham Heart Study (Dawber et al., 1963), which Christakis and Fowler (2008) have analyzed so thoroughly in recent years. Respondents were asked to name only a small number (no more than 3) of friends, and only some of these ties were actually studied. The network analyzed consisted only of ties among Framingham residents. If a respondent said his best friend lived in New Hampshire, then that tie was not included in the analysis and no data were gathered on such friends. Hence, friendship ties were sampled. We know little about how sampled data affect analyses, even data sampled in such systematic ways as fixed-choice and snowball sampling. The research on network sampling mentioned previously focuses primarily on estimating network properties. Examples are the average number of ties per actor, the degree of reciprocity, the level of transitivity, the density of the relation under study, and the frequencies of ties between subgroups of actors based on the sampled units. It does not address how these study designs affect other analyses that link network variables to behavioral outcomes. Little is known about how data missing by design in complete network studies affect estimates of individual and network-wide indices. In the worst case, statistical results are simply inaccurate.
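The consequences of a fixed-choice design for even simple indices can be illustrated by truncating a set of free-choice nominations. The sketch assumes that each respondent lists nominees in order of importance, so that a fixed-choice instrument would capture the first k names. It then compares density and reciprocity before and after truncation. The assumption, the function names, and the toy data are all hypothetical.

```python
def truncate_nominations(nominations, k):
    """Mimic a fixed-choice design: keep only each respondent's first k nominees."""
    return {person: named[:k] for person, named in nominations.items()}

def density(nominations, n_actors):
    """Observed directed ties divided by the n(n - 1) possible ties."""
    ties = sum(len(set(named)) for named in nominations.values())
    return ties / (n_actors * (n_actors - 1))

def reciprocity(nominations):
    """Share of directed ties that are reciprocated."""
    ties = {(i, j) for i, named in nominations.items() for j in named}
    if not ties:
        return 0.0
    return sum((j, i) in ties for i, j in ties) / len(ties)

# Hypothetical free-choice data for four actors.
free = {1: [2, 3, 4], 2: [1, 3, 4], 3: [1], 4: [1]}
fixed = truncate_nominations(free, 1)
print(density(free, 4), reciprocity(free))    # about 0.667 and 0.75
print(density(fixed, 4), reciprocity(fixed))  # about 0.333 and 0.5
```

In this toy case the cap halves the density and lowers reciprocity. The direction and size of such distortions in real studies is exactly the open question raised above.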

The “personal” approach represents the most typical way in which network data are included in behavioral and epidemiological studies. The focus is on the ego. Data about the ego’s social ties, or about relationships among his or her actors, are gathered with a “name generator” (e.g., whom the ego is friends with or depends on for support; McCarty, 2002; McCarty, Killworth et al., 2007). Attribute information and information on interactions among actors can be collected in a standardized battery on each name (e.g., gender, age, ethnicity, attitudes). These data can be used to examine, for example, the influence of network homogeneity on structural issues (e.g., network density) and contextual issues (e.g., the likelihood of smoking or drinking).
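Two common summaries of one personal network can be computed directly from name-generator data. One is the density of ties among the alters, as reported by the ego. The other is compositional homogeneity, meaning the share of alters who match the ego on an attribute such as gender. The sketch below follows the common convention of excluding ego-alter ties from the density. The data layout and function name are hypothetical.

```python
from itertools import combinations

def ego_network_summary(alters, alter_ties, attribute, ego_value):
    """Summaries of one personal network collected with a name generator.

    alters: list of alter ids named by the ego.
    alter_ties: set of frozensets {a, b} for alter pairs the ego says are connected.
    attribute: dict mapping alter id -> attribute value (e.g., gender).
    ego_value: the ego's own value of that attribute.
    Returns (density among alters, share of alters matching the ego).
    """
    if not alters:
        return 0.0, 0.0
    pairs = list(combinations(alters, 2))
    linked = sum(frozenset(pair) in alter_ties for pair in pairs)
    density = linked / len(pairs) if pairs else 0.0
    same = sum(attribute[a] == ego_value for a in alters) / len(alters)
    return density, same

alters = ["A", "B", "C", "D"]
ties = {frozenset({"A", "B"}), frozenset({"C", "D"})}
gender = {"A": "f", "B": "f", "C": "m", "D": "f"}
print(ego_network_summary(alters, ties, gender, "f"))  # about (0.333, 0.75)
```

Indices like these, computed separately for each respondent, are what typically enter behavioral models as ego-level predictors.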

The personal network data collection technique produces data very different from “complete” social network data because network information is collected from one individual in the network (the respondent, or ego) rather than from each member of the network. The result is a relational matrix in which most individuals are structurally separated from each other. One advantage of the personal network approach is that each ego’s social network is not bounded by an externally imposed framework, such as membership in one organization or one church. Personal network (egocentric) data collection tasks are more often included in standard behavioral research designs than are complete network data collection tasks; however, these data tend to be underutilized.

In sum, personal network data collection does fit with standard survey approaches and offers the potential to test network-based hypotheses in behavioral research. However, it is unknown how network-based statistical approaches are affected by the missing data inherent to such designs.

Deeper knowledge of how missing data, whether absent by chance or by design, might affect studies that incorporate features of the social world through network indices has ramifications for a vast number of current and planned behavioral studies. Thus, research on the effects of missing data has the potential to influence entire research disciplines and could possibly lead to revisions in the way we understand the interactions between social context and behaviors. The extent to which study design impacts estimates of network structure and composition and subsequent models that include those estimates is as yet unknown.

Conclusion: Social Network Analysis Is an Important and Evolving Approach for Social Systems Analysis

In this chapter we have attempted to provide a comprehensive introduction to social network analysis and related descriptive, analytic, and statistical approaches. We have also pointed out why an understanding of these ideas is important for the appropriate application of new statistical models that link networks and behavior. Despite open questions associated with data collection approaches and missing data, social network approaches remain integral to understanding social systems where relationships are considered. Social psychologists use these methods for understanding peer effects on attitudes (de la Haye, Robins, Mohr, & Wilson, 2010, 2011a, 2011b). Network methods are used in public health to model the transmission of disease and the spread of health behaviors (Koehly et al., 2003). Indeed, some of the earliest network models were developed to understand the transmission of HIV and the best mechanisms for disseminating safer-sex practices (Goodreau et al., 2010; Goodreau & Golden, 2007; Kennedy et al., 2010). Communications researchers use these approaches to understand the diffusion of innovation or the evolution of communication networks (Contractor, Wasserman, & Faust, 2006; Monge & Contractor, 2003; Monge, Heiss, & Margolin, 2008; Monge et al., 2011; Shumate & Palazzolo, 2010; Su, Huang, & Contractor, 2010). Organizational researchers use the methods to explore how formal and informal networks shape organizational behavior (Shumate & Dewitt, 2008; Shumate & Lipp, 2008). Developmental psychologists use social network analysis to understand how social processes and peers affect children’s transition to adolescence and adolescents’ transition to adulthood (Espelage, Green, & Polanin, 2012; Espelage, Green, & Wasserman, 2007a; Gest, Davidson, Rulison, Moody, & Welsh, 2007; Gest, Graham-Bermann, & Hartup, 2001; Gest, Rulison, Davidson, & Welsh, 2008; Goodreau, Kitts, & Morris, 2009; Pollard, 2009; Poteat et al., 2007; Valente, Fujimoto, Chou, & Spruijt-Metz, 2009; Valente et al., 2004; Valente, Unger, & Johnson, 2005; Veenstra, Lindenberg, Munniksma, & Dijkstra, 2010; Veenstra et al., 2007). Social workers use these approaches to understand the provision of social support among friends and relatives, particularly in the context of at-risk populations or populations with chronic health issues (Green, Atuyambe, Ssali, Ryan, & Wagner, 2011; Tucker et al., 2009). Sociologists use social network analysis to understand how social position translates into social capital.

We believe that the diversity of social network approaches, coupled with the diversity of active research areas, highlights the relevance of social networks research in the social and behavioral sciences. The networks paradigm accommodates many research questions and remains an analog to much of our anecdotal understanding of social systems (“a friend of a friend is my friend,” “birds of a feather flock together”). For these reasons, it will continue to gain prominence among social researchers and may well emerge as one of the key paradigms for understanding the social world (Barabasi, 2003; Christakis & Fowler, 2009; Watts, 2004).

Future Directions

Throughout this chapter we have pointed out future directions, difficult problems, and important topics that will drive the development of social network analysis and its application in behavioral and social sciences. Here, we summarize some of the key questions we believe to be important in the coming years. In no particular order, they are:

What are the ethical considerations associated with collecting network data, particularly when study participants are likely to provide information about their relationships with others who may have opted not to participate in the study? When does describing the strength of your relationship with another person and providing a description of that person become a “secondary subjects” research issue? What are the best solutions to these problems, given that network studies are becoming more frequent and much larger?

How do key social network study designs and network sampling approaches (and, more broadly, missing network data) impact network indices at the nodal, dyadic, subgroup, and summary levels?

How do key social network study designs and network sampling approaches (and, more broadly, missing network data) impact models (particularly those we describe here) that incorporate those network indices, linking network features to individual behaviors and attitudes?

What is the best mechanism for comparing network studies (whose findings are generally considered relevant only to the population studied) that vary with respect to the size of networks analyzed, the kind of relations investigated, the behaviors/attitudes of interest, and so forth? How might we, for example, compare findings from a study of networks and adolescent drug use in a rural high school with a study of networks and gang violence in a maximum security prison?

What are the limits of generalizability for new network statistical models? That is, if we find evidence that the influence of best friends is strong with respect to initiation of smoking among mid-western adolescents in a school-based study, can we assume that this will hold, in general, for all adolescents? For adults?

Author Note

Of course, we assume perfect data and, for now, avoid discussions of measurement error and recall bias (which are discussed in Chapter 2 of Wasserman & Faust [1994] and Chapter 2 of Carrington, Scott, & Wasserman [2005]).

Ahuja, G. (2000). Collaboration networks, structural holes, and innovation: A longitudinal study. Administrative Science Quarterly, 45(3), 425–55.

Ahuja, M. K., & Carley, K. M. (1998). Network structure in virtual organizations. Journal of Computer-Mediated Communication, 3(4). doi: 10.1111/j.1083-6101.1998.tb00079.x

Albert, R., Albert, I., & Nakarado, G. L. (2004). Structural vulnerability of the North American power grid. Physical Review E, 69(2 Pt 2), 025103.

Alderson, A. S., & Beckfield, J. (2004). Power and position in the world city system. American Journal of Sociology, 109(4), 811–51.

Amaral, L. A. N., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks. Proceedings of the National Academy of Sciences of the United States of America, 97(21), 11149–52.

Angel, R., & Tienda, M. (1982). Determinants of extended household structure: Cultural pattern or economic need? American Journal of Sociology, 87(6), 1360–83.

Aral, S., Muchnik, L., & Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51), 21544–9. doi: 10.1073/pnas.0908800106

Barabasi, A. L. (2002). Linked: The new science of networks. Cambridge, MA: Perseus Publishing.

Barabasi, A. L. (2003). Linked: How everything is connected to everything else and what it means. New York: Plume.

Barabasi, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–12.

Barnett, G. A. (2001). A longitudinal analysis of the international telecommunication network, 1978–1996. American Behavioral Scientist, 44(10), 1638–55.

Batagelj, V., & Mrvar, A. (1998). Pajek: A program for large network analysis. Connections, 21(2), 47–57.

Bauman, K. E., & Ennett, S. T. (1996). On the importance of peer influence for adolescent drug use: Commonly neglected considerations. Addiction, 91(2), 185–98.

Bearman, P. S. (1997). Generalized exchange. American Journal of Sociology, 102(5), 1383–415.

Bearman, P. S., Moody, J., & Stovel, K. (2004). Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology, 110(1), 44–91.

Benson, J. K. (1975). The interorganizational network as a political economy. Administrative Science Quarterly, 20(2), 229–49.

Bernard, H. R., Johnsen, E. C., Killworth, P. D., & McCarty, C. (1990). Comparing four different methods for measuring personal social networks. Social Networks, 12(3), 179–215. doi: 10.1016/0378-8733(90)90005-t

Bienenstock, E. J., Bonacich, P., & Oliver, M. (1990). The effect of network density and homogeneity on attitude polarization. Social Networks, 12(2), 153–72. doi: 10.1016/0378-8733(90)90003-R

Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5), 1170–82.

Borgatti, S. P. (2002). Netdraw (social network analysis software) (Version 1.0.0.21) [computer program]. Lexington, KY: Analytic Technologies.

Borgatti, S. P. (2005). Centrality and network flow. Social Networks, 27, 55–71.

Borgatti, S. P., & Cross, R. (2003). A relational view of information seeking and learning in social networks. Management Science, 49(4), 432–45.

Borgatti, S. P., & Everett, M. (1992). Notions of position in social network analysis. Sociological Methodology, 22, 1–35.

Borgatti, S. P., & Everett, M. G. (1997). Network analysis of 2-mode data. Social Networks, 19(3), 243–69.

Borgatti, S. P., & Foster, P. C. (2003). The network paradigm in organizational research: A review and typology. Journal of Management, 29(6), 991–1013.

Borgatti, S. P., & Halgin, D. (2010). Analyzing affiliation networks. In P. Carrington & J. Scott (Eds.), The SAGE handbook of social network analysis (pp. 417–433). London: Sage Publications.

Borgatti, S. P., & Halgin, D. S. (2011). On network theory. Organization Science, [ePub ahead of print]. doi: 10.1287/orsc.1100.0641

Borgatti, S. P., Jones, C., & Everett, M. G. (1998). Network measures of social capital. Connections, 21(2), 1–36.

Borgatti, S. P., Mehra, A., Brass, D. J., & Labianca, G. (2009). Network analysis in the social sciences. Science, 323(5916), 892–5. doi: 10.1126/science.1165821

Borgatti, S. P., & Molina, J. (2005). Toward ethical guidelines for network research in organizations. Social Networks, 27(2), 107–17. doi: 10.1016/j.socnet.2005.01.004

Borgatti, S. P., & Molina, J. L. (2003). Ethical and strategic issues in organizational social network analysis. Journal of Applied Behavioral Science, 39(3), 337–49. doi: 10.1177/0021886303258111

Borner, K., Chen, C. M., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179–255.

Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2008). On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 172–88. doi: 10.1109/Tkde.2007.190689

Brandes, U., Fleischer, D., & Puppe, T. (2007). Dynamic spectral layout with an application to small worlds. Journal of Graph Algorithms and Applications, 11(2), 325–43.

Brandes, U., & Lerner, J. (2008). Visual analysis of controversy in user-generated encyclopedias. Information Visualization, 7(1), 34–48. doi: 10.1057/palgrave.ivs.9500171

Brandes, U., & Lerner, J. (2010). Structural similarity: Spectral methods for relaxed blockmodeling. Journal of Classification, 27(3), 279–306.

Brandes, U., & Pich, C. (2009). An experimental study on distance-based graph drawing (extended abstract). Graph Drawing, 5417, 218–29.

Brandes, U., & Pich, C. (2011). More flexible radial layout. Journal of Graph Algorithms and Applications, 15(1), 157–73.

Brandes, U., & Wagner, D. (2003). visone – analysis and visualization of social networks. In M. Junger & P. Mutzel (Eds.), Graph Drawing Software (pp. 321–340). Springer-Verlag.

Breiger, R. L. (1974). The duality of persons and groups. Social Forces, 53(2), 181–90.

Breiger, R. L., Boorman, S. A., & Arabie, P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology, 12(3), 328–83. doi: 10.1016/0022-2496(75)90028-0

Breiger, R. L., & Ennis, J. G. (1997). Generalized exchange in social networks: Statistics and structure. L’Annee sociologique, 47, 73–88.

Bullers, S., Cooper, M. L., & Russell, M. (2001). Social network drinking and adult alcohol involvement: A longitudinal exploration of the direction of influence. Addictive Behaviors, 26(2), 181–99. doi: 10.1016/S0306-4603(00)00099-X

Burt, R. S. (1980). Models of network structure. Annual Review of Sociology, 6, 79–141.

Burt, R. S. (1984). Network items and the general social survey. Social Networks, 6, 293–339.

Burt, R. S. (1986). A note on sociometric order in the general social survey network data. Social Networks, 8, 149–74.

Burt, R. S. (1987a). A note on missing network data in the general social survey. Social Networks, 9, 63–73.

Burt, R. S. (1987b). A note on the general social survey’s ersatz network item. Social Networks, 9, 73–85.

Burt, R. S. (1987c). Social contagion and innovation: Cohesion versus structural equivalence. American Journal of Sociology, 92(6), 1287–335.

Burt, R. S. ( 1992 ). Structural holes: The social structure of competition . Cambridge, MA: Harvard University Press.

Burt, R. S. ( 2004 ). Structural holes and good ideas. American Journal of Sociology, 110 (2), 349–99.

Burt, R. S. ( 2005 ). Brokerage & closure: An introduction to social capital . Oxford, UK: Oxford University Press.

Burt, R. S. , & Guilarte, M. G. ( 1986 ). A note on scaling the general social survey network data. Social Networks, 8 , 387–96.

Cairns, R. B. , Cairns, B. D. , Neckerman, H. J. , Gest, S. D. , & Gariépy, J. L. ( 1988 ). Social networks and aggressive-behavior—peer support or peer rejection. Developmental Psychology, 24 (6), 815–23.

Callaway, D. S. , Newman, M. E. J. , Strogatz, S. H. , & Watts, D. J. ( 2000 ). Exact solution of percolation on random graphs with arbitrary degree distributions. Physical Review Letters, 85 , 5468–71.

Campbell, K. E. , & Lee, B. A. ( 1991 ). Name generators in surveys of personal networks. Social Networks, 13 (3), 203–21.

Carrington, P. J. , Scott, J. , & Wasserman, S. ( 2005 ). Models and methods in social network analysis . New York: Cambridge University Press.

Cartwright, D. , & Harary, F. ( 1956 ). Structural balance: A generalization of Heider’s theory. Psychological Review, 63 (5), 277–93. doi: 10.1037/h0046049

Cartwright, D. , & Harary, F. ( 1979 ). Balance and clusterability: An overview. In P. W. Holland & S. Leinhardt (Eds.), Perspectives on social network research (pp. 25–50). New York: Academic Press.

Christakis, N. A. , & Fowler, J. H. ( 2007 ). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357 (4), 370–9.

Christakis, N. A. , & Fowler, J. H. ( 2008 ). The collective dynamics of smoking in a large social network. New England Journal of Medicine, 358 (21), 2249–58. doi: 10.1056/NEJMsa0706154

Christakis, N. A. , & Fowler, J. H. ( 2009 ). Connected: The surprising power of our social networks and how they shape our lives . New York: Little Brown.

Clauset, A. , Newman, M. E. J. , & Moore, C. ( 2004 ). Finding community structure in very large networks. Physical Review E, 70 (6). doi: 10.1103/PhysRevE.70.066111

Cohen, J. M. ( 1977 ). Sources of peer group homogeneity. Sociology of Education, 50 (4), 227–41.

Contractor, N. S. , Wasserman, S. , & Faust, K. ( 2006 ). Testing multitheoretical, multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review, 31 (3), 681–703. doi: 10.2307/20159236

Corum, J. (June 20, 2011). Only connect, New York Times . Retrieved from http://www.nytimes.com/interactive/2011/06/20/science/brain.html . Accessed April 27, 2013.

Cross, R. , Parker, A. , & Borgatti, S. ( 2002 ). Making invisible work visible: Using social network analysis to support strategic collaboration. California Management Review, 44 (2), 25–46.

Cummings, J. N. , & Cross, R. ( 2003 ). Structural properties of work groups and their consequences for performance. Social Networks, 25 (3), 197–210. doi: 10.1016/S0378-8733(02)00049-7

Danon, L. , Diaz-Guilera, A. , Duch, J. , & Arenas, A. ( 2005 ). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment . doi: 10.1088/1742-5468/2005/09/P09008

Davis, J. A. ( 1967 ). Clustering and structural balance in graphs. Human Relations, 20 (2), 181–7. doi: 10.1177/001872676702000206

Dawber, T. R. , Kannel, W. B. , & Lyell, L. P. ( 1963 ). An approach to longitudinal studies in a community: The Framingham study. Annals of the New York Academy of Sciences, 107 , 539–56.

de la Haye, K. , Robins, G. , Mohr, P. , & Wilson, C. ( 2010 ). Obesity-related behaviors in adolescent friendship networks. Social Networks, 32 (3), 161–7. doi: 10.1016/j.socnet.2009.09.001

de la Haye, K. , Robins, G. , Mohr, P. , & Wilson, C. ( 2011 a). Homophily and contagion as explanations for weight similarities amongst adolescent friends. Journal of Adolescent Health, (Epub ahead of print) . doi: 10.1016/j.jadohealth.2011.02.008

de la Haye, K. , Robins, G. , Mohr, P. , & Wilson, C. ( 2011 b). How physical activity shapes, and is shaped by, adolescent friendships. Social Science & Medicine .

DiMaggio, P. ( 1986 ). Structural analysis of organizational fields: A blockmodel approach. Research in Organizational Behavior, 8 , 335–70.

Donetti, L. , & Muñoz, M. A. ( 2004 ). Detecting network communities: A new systematic and efficient algorithm. Journal of Statistical Mechanics: Theory and Experiment . doi: 10.1088/1742-5468/2004/10/P10012

Doreian, P. , Batagelj, V. , & Ferligoj, A. ( 2004 ). Generalized blockmodeling of two-mode network data. Social Networks, 26 (1), 29–53. doi: 10.1016/j.socnet.2004.01.002

Doreian, P. , Batagelj, V. , & Ferligoj, A. ( 2005 a). Generalized blockmodeling . New York: Cambridge University Press.

Doreian, P. , Batagelj, V. , & Ferligoj, A. ( 2005 b). Positional analyses of sociometric data. In P. J. Carrington , J. Scott & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 77–97). New York: Cambridge University Press.

Duch, J. , & Arenas, A. ( 2005 ). Community detection in complex networks using extremal optimization. Physical Review E, 72 (2). doi: 10.1103/PhysRevE.72.027104

Ennett, S. T. , & Bauman, K. E. ( 1994 ). The contribution of influence and selection to adolescent peer group homogeneity: The case of adolescent cigarette smoking. Journal of Personality and Social Psychology, 67 (4), 653–63.

Erickson, B. ( 1988 ). The relational basis of attitudes. In B. Wellman & S. Berkowitz (Eds.), Social structures: A network approach (pp. 99–121). Cambridge, UK: Cambridge University Press.

Espelage, D. L. , Green, H. D., Jr. , & Polanin, J. ( 2012 ). Willingness to intervene in bullying episodes among middle school students: Individual and peer group influences. Journal of Early Adolescence . doi: 10.1177/0272431611423017

Espelage, D. L. , Green, H. D. , & Wasserman, S. ( 2007 a). Friendship patterns and bullying perpetration among youth: Application of p* analysis. In P. C. Rodkin & L. D. Hanish (Eds.), Social network analysis and children’s peer relations: New directions for child and adolescent development (Vol. 118, pp. 61–75). San Francisco, CA: Jossey-Bass.

Espelage, D. L. , Green, H. D. , & Wasserman, S. ( 2007 b). Statistical analysis of friendship patterns and bullying behaviors among youth. New Directions for Child and Adolescent Development, 2007 (118), 61–75. doi: 10.1002/cd.201

Farmer, T. W. , & Rodkin, P. C. ( 1996 ). Antisocial and prosocial correlates of classroom social positions: The social network centrality perspective. Social Development, 5 (2), 174–88.

Faust, K. ( 1988 ). Comparison of methods for positional analysis—structural and general equivalences. Social Networks, 10 (4), 313–41.

Faust, K. , & Romney, A. K. ( 1985 ). Does structure find structure? A critique of Burt’s use of distance as a measure of structural equivalence. Social Networks, 7 (1), 77–103. doi: 10.1016/0378-8733(85)90009-7

Feld, S. L. ( 1981 ). The focused organization of social ties. American Journal of Sociology, 86 (5), 1015–35.

Fernandez, R. M. , & Gould, R. V. ( 1994 ). A dilemma of state power—brokerage and influence in the national-health policy domain. American Journal of Sociology, 99 (6), 1455–91.

Festinger, L. , Schachter, S. , & Back, K. ( 1967 ). Social pressures in informal groups: A study of human factors in housing . Stanford: Stanford University Press.

Fienberg, S. E. , & Wasserman, S. S. ( 1981 ). Categorical data analysis of single sociometric relations. Sociological Methodology, 11 , 156–92.

Fortunato, S. ( 2010 ). Community detection in graphs. Physics Reports, 486 (3–5), 75–174. doi: 10.1016/j.physrep.2009.11.002

Fortunato, S. , Latora, V. , & Marchiori, M. ( 2004 ). Method to find community structures based on information centrality. Physical Review E, 70 (5). doi: 10.1103/PhysRevE.70.056104

Fowler, J. H. , & Christakis, N. A. ( 2008 ). Dynamic spread of happiness in a large social network: Longitudinal analysis over 20 years in the Framingham heart study. British Medical Journal, 337 . doi: 10.1136/bmj.a2338

Frank, K. A. ( 1995 ). Identifying cohesive subgroups. Social Networks, 17 , 27–56.

Frank, O. ( 2005 ). Network sampling and model fitting. In P. Carrington , J. Scott & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 31–56). New York: Cambridge University Press.

Freeman, L. ( 1977 ). A set of measures of centrality based upon betweenness. Sociometry, 40 , 35–41.

Freeman, L. ( 1978 ). Centrality in social networks—conceptual clarification. Social Networks, 1 , 215–39.

Freeman, L. ( 1979 ). Centrality in social networks—conceptual clarification. Social Networks, 1 , 125–39.

Freeman, L. ( 1992 ). The sociological concept of group: An empirical test of two models. American Journal of Sociology, 98 (1), 152–66.

Freeman, L. , Romney, A. K. , & Freeman, S. C. ( 1987 ). Cognitive structure and informant accuracy. American Anthropologist, 89 (2), 310–25.

Friedkin, N. E. ( 1980 ). A test of structural features of Granovetter’s strength of weak ties theory. Social Networks, 2 (4), 411–22. doi: 10.1016/0378-8733(80)90006-4

Friedkin, N. E. ( 1991 ). Theoretical foundations for centrality measures. American Journal of Sociology, 96 , 1478–504.

Friedkin, N. E. , & Johnsen, E. C. ( 1997 ). Social positions in influence networks. Social Networks, 19 (3), 209–22. doi: 10.1016/S0378-8733(96)00298-5

Friedkin, N. E. , & Slater, M. R. ( 1994 ). School leadership and performance—a social network approach. Sociology of Education, 67 (2), 139–57.

Galaskiewicz, J. , Wasserman, S. , Rauschenbach, B. , Bielefeld, W. , & Mullaney, P. ( 1985 ). The influence of corporate power, social status, and market position on corporate interlocks in a regional network. Social Forces, 64 (2), 403–31.

Gest, S. D. , Davidson, A. J. , Rulison, K. L. , Moody, J. , & Welsh, J. A. ( 2007 ). Features of groups and status hierarchies in girls’ and boys’ early adolescent peer networks. New Directions for Child and Adolescent Development, 2007 (118), 43–60.

Gest, S. D. , Graham-Bermann, S. A. , & Hartup, W. W. ( 2001 ). Peer experience: Common and unique features of number of friendships, social network centrality, and sociometric status. Social Development, 10 (1), 23–40.

Gest, S. D. , Rulison, K. L. , Davidson, A. J. , & Welsh, J. A. ( 2008 ). Children’s academic reputations among peers: Longitudinal associations with academic self-concept, effort and performance. Developmental Psychology, 44 , 625–36.

Gile, K. J. , & Handcock, M. S. ( 2006 ). Model based assessment of the impact of missing data on inference for networks . Center for Statistics and the Social Sciences. Seattle, WA.

Gile, K. J. , & Handcock, M. S. ( 2010 ). Respondent driven sampling: An assessment of current methodology. Sociological Methodology, 40 (1), 285–327. doi: 10.1111/j.1467-9531.2010.01223.x

Girvan, M. , & Newman, M. E. J. ( 2002 ). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99 (12), 7821–6. doi: 10.1073/pnas.122653799

Go, M.-H. , Green, H. D. , Kennedy, D. P. , Pollard, M. S. , & Tucker, J. S. ( 2010 ). Peer influence and selection effects on adolescent smoking. Drug and Alcohol Dependence, 109 (1–3), 239–42. doi: 10.1016/j.drugalcdep.2009.12.017

Golinelli, D. , Ryan, G. , Green, H. D. , Kennedy, D. P. , Tucker, J. S. , & Wenzel, S. L. ( 2010 ). Sampling to reduce respondent burden in personal network studies and its effect on estimates of structural measures. Field Methods, 22 (3), 217–30. doi: 10.1177/1525822x10370796

Goodman, L. A. ( 1961 ). Snowball sampling. The Annals of Mathematical Statistics, 32 (1), 148–70.

Goodreau, S. M. , Cassels, S. , Kasprzyk, D. , Montano, D. E. , Greek, A. , & Morris, M. ( 2010 ). Concurrent partnerships, acute infection and HIV epidemic dynamics among young adults in Zimbabwe. AIDS and Behavior . doi: 10.1007/s10461-010-9858-x

Goodreau, S. M. , & Golden, M. R. ( 2007 ). Biological and demographic causes of high HIV and STD prevalence in men who have sex with men. Sexually Transmitted Infections, 83 (6), 458–62. doi: 10.1136/sti.2007.025627

Goodreau, S. M. , Kitts, J. A. , & Morris, M. ( 2009 ). Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks. Demography, 46 (1), 103–25. doi: 10.1353/dem.0.0045

Gould, R. ( 1989 ). Power and social structure in community elites. Social Forces, 68 , 531–52.

Granovetter, M. ( 1973 ). The strength of weak ties. American Journal of Sociology, 78 (6), 1360–80.

Granovetter, M. ( 1976 ). Network sampling: Some first steps. American Journal of Sociology, 81 , 1287–303.

Granovetter, M. ( 1983 ). The strength of weak ties: A network theory revisited. Sociological Theory, 1 , 201–33.

Green, H. D. , Atuyambe, L. , Ssali, S. , Ryan, G. , & Wagner, G. ( 2011 ). Social networks of PLHAs in Uganda: Implications for mobilizing PLHA as agents for prevention. AIDS and Behavior, 15 (5), 992–1002.

Green, H. D. , Contractor, N. S. , & Yao, Y. ( 2006 ). CI-KNOW: Cyberinfrastructure knowledge networks on the web. A social network enabled recommender system for locating resources in cyberinfrastructures. Eos, Transactions, American Geophysical Union, 87 (52).

Guimerà, R. , Mossa, S. , Turtschi, A. , & Amaral, L. A. N. ( 2005 ). The worldwide air transportation network: Anomalous centrality, community structure, and cities’ global roles. Proceedings of the National Academy of Sciences of the United States of America, 102 (22), 7794–9. doi: 10.1073/pnas.0407994102

Guimerà, R. , Sales-Pardo, M. , & Amaral, L. A. N. ( 2004 ). Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70 (2). doi: 10.1103/PhysRevE.70.025101

Gustafsson, M. , Hörnquist, M. , & Lombardi, A. ( 2006 ). Comparison and validation of community structures in complex networks. Physica A: Statistical Mechanics and Its Applications, 367 , 559–76. doi: 10.1016/j.physa.2005.12.017

Hafner-Burton, E. M. , Kahler, M. , & Montgomery, A. H. ( 2009 ). Network analysis for international relations. International Organization, 63 , 559–92. doi: 10.1017/S0020818309090195

Hagan, J. M. ( 1998 ). Social networks, gender, and immigrant incorporation: Resources and constraints. American Sociological Review, 63 (1), 55–67.

Hall, J. A. , & Valente, T. W. ( 2007 ). Adolescent smoking networks: The effects of influence and selection on future smoking. Addictive Behaviors, 32 (12), 3054–9. doi: 10.1016/j.addbeh.2007.04.008

Hallinan, M. ( 1980 ). Patterns of cliquing among youth. In H. C. Foot , A. J. Chapman & J. R. Smith (Eds.), Friendship and social relations in children (pp. 321–342). New York: John Wiley and Sons.

Handcock, M. , Hunter, D. R. , Butts, C. , Goodreau, S. M. , & Morris, M. ( 2003 ). Statnet: Software tools for the analysis, simulation, and visualization of network data (Version 3). Seattle, WA: Statnet Project. Retrieved from http://www.statnetproject.org . Accessed April 27, 2013.

Hanneman, R. , & Riddle, M. ( 2005 ). Introduction to social network methods . Riverside, CA: University of California, Riverside.

Hastings, M. B. ( 2006 ). Community detection as an inference problem. Physical Review E, 74 (3). doi: 10.1103/PhysRevE.74.035102

Heer, J. , & Agrawala, M. ( 2008 ). Design considerations for collaborative visual analytics. Information Visualization, 7 (1), 49–62. doi: 10.1057/palgrave.ivs.9500167

Heider, F. ( 1946 ). Attitudes and cognitive organization. Journal of Psychology, 21 , 107–12.

Hogan, B. , Carrasco, J. A. , & Wellman, B. ( 2007 ). Visualizing personal networks: Working with participant-aided sociograms. Field Methods, 19 (2), 116–44. doi: 10.1177/1525822x06298589

Holland, P. W. , & Leinhardt, S. ( 1971 ). Transitivity in structural models of small groups. Comparative Group Studies, 2 (2), 107–24.

Holland, P. W. , & Leinhardt, S. ( 1973 ). The structural implications of measurement error in sociometry. Journal of Mathematical Sociology, 3 , 85–111.

Hunter, D. R. , Handcock, M. S. , Butts, C. T. , Goodreau, S. M. , & Morris, M. ( 2008 ). Ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software, 24 (3), 1–29.

Ibarra, H. ( 1993 ). Network centrality, power, and innovation involvement: Determinants of technical and administrative roles. Academy of Management Journal, 36 , 471–501.

Johnson, J. C. , Christian, R. R. , Brunt, J. W. , Hickman, C. R. , & Waide, R. B. ( 2010 ). Evolution of collaboration within the US Long Term Ecological Research Network. Bioscience, 60 (11), 931–40. doi: 10.1525/bio.2010.60.11.9

Johnson, J. C. , & Krempel, L. ( 2004 ). Network visualization: The “Bush Team” in Reuters news ticker 9/11-11/15/01. Journal of Social Structure, 5 (2). http://www.cmu.edu/joss/content/articles/volume5/JohnsonKrempel/ . Accessed April 27, 2013.

Johnson, J. C. , Luczkovich, J. , & Borgatti, S. ( 2009 ). A continuous-time Markov chain model of the seasonal trophic network dynamics of the Chesapeake Bay. Ecological Modelling, 220 , 3133–3140.

Kadushin, C. ( 2005 ). Who benefits from network analysis: Ethics of social network research. Social Networks, 27 (2), 139–53. doi: 10.1016/j.socnet.2005.01.005

Kandel, D. B. ( 1978 ). Homophily, selection, and socialization in adolescent friendships. The American Journal of Sociology, 84 (2), 427–36.

Kennedy, D. , Wenzel, S. , Tucker, J. , et al. ( 2010 ). Unprotected sex of homeless women living in Los Angeles County: An investigation of the multiple levels of risk. AIDS and Behavior, 14 (4), 960–73. doi: 10.1007/s10461-009-9621-3

Kenny, D. A. ( 1995 ). The effect of nonindependence on significance testing in dyadic research. Personal Relationships, 2 (1), 67–75. doi: 10.1111/j.1475-6811.1995.tb00078.x

Kenny, D. A. ( 1996 a). The design and analysis of social-interaction research. Annual Review of Psychology, 47 , 59–86. doi: 10.1146/annurev.psych.47.1.59

Kenny, D. A. ( 1996 b). Models of non-independence in dyadic research. Journal of Social and Personal Relationships, 13 (2), 279–94. doi: 10.1177/0265407596132007

Kenny, D. A. , & Cook, W. ( 1999 ). Partner effects in relationship research: Conceptual issues, analytic difficulties, and illustrations. Personal Relationships, 6 (4), 433–48. doi: 10.1111/j.1475-6811.1999.tb00202.x

Kenny, D. A. , & Judd, C. M. ( 1986 ). Consequences of violating the independence assumption in analysis of variance. Psychological Bulletin, 99 (3), 422–31. doi: 10.1037/0033-2909.99.3.422

Kenny, D. A. , Kashy, D. A. , & Bolger, N. ( 1998 ). Data analysis in social psychology. In D. T. Gilbert , S. T. Fiske & G. Lindzey (Eds.), The handbook of social psychology, vols. 1 and 2 (4th ed., pp. 233–65). New York: McGraw-Hill.

Kenny, D. A. , Kashy, D. A. , & Cook, W. L. ( 2006 ). Dyadic data analysis . New York: Guilford Press.

Kenny, D. A. , & La Voie, L. ( 1984 ). The social relations model. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 18, pp. 142–82). Orlando, FL: Academic Press.

Killworth, P. D. , & Bernard, H. R. ( 1976 ). Informant accuracy in social network data. Human Organization, 35 , 269–86.

Killworth, P. D. , & Bernard, H. R. ( 1979 ). Informant accuracy in social network data III: A comparison of triadic structure in behavioral and cognitive data. Social Networks, 2 , 10–46.

Killworth, P. D. , McCarty, C. , Bernard, H. R. , & House, M. ( 2006 ). The accuracy of small world chains in social networks. Social Networks, 28 (1), 85–96. doi: 10.1016/j.socnet.2005.06.001

Knoke, D. , & Kuklinski, J. ( 1982 ). Network analysis . Beverly Hills, CA: Sage Publications.

Knoke, D. , & Yang, S. ( 2008 ). Data collection. Social network analysis (2nd ed.). Los Angeles: Sage Publications.

Kobus, K. ( 2003 ). Peers and adolescent smoking. Addiction, 98 (Suppl 1), 37–55.

Koehly, L. M. , Peters, J. A. , Kuhn, N. , et al. ( 2008 ). Sisters in hereditary breast and ovarian cancer families: Communal coping, social integration, and psychological well-being. Psycho-Oncology, 17 (8), 812–21. doi: 10.1002/pon.1373

Koehly, L. M. , Peterson, S. K. , Watts, B. G. , Kempf, K. K. G. , Vernon, S. W. , & Gritz, E. R. ( 2003 ). A social network analysis of communication about hereditary nonpolyposis colorectal cancer genetic testing and family functioning. Cancer Epidemiology Biomarkers & Prevention, 12 (4), 304–13.

Koskinen, J. H. , & Edling, C. ( 2010 ). Modelling the evolution of a bipartite network—peer referral in interlocking directorates. Social Networks, 34 (3), 309–322. doi: 10.1016/j.socnet.2010.03.001

Kossinets, G. ( 2006 ). Effects of missing data in social networks. Social Networks, 28 (3), 247–68. doi: 10.1016/j.socnet.2005.07.002

Krackhardt, D. ( 1987 ). Cognitive social-structures. Social Networks, 9 (2), 109–34.

Krapivsky, P. L. , & Redner, S. ( 2001 ). Organization of growing random networks. Physical Review E, 63 (6). doi: 10.1103/PhysRevE.63.066123

Krapivsky, P. L. , Redner, S. , & Leyvraz, F. ( 2000 ). Connectivity of growing random networks. Physical Review Letters, 85 (21), 4629–32.

Kraut, R. E. , Fish, R. S. , Root, R. W. , & Chalfonte, B. L. ( 1993 ). Informal communication in organizations—form, function, and technology. In R. Baecker (Ed.), Groupware and computer supported co-operative work (pp. 287–314). San Francisco: Morgan Kaufmann Publishers.

Krebs, V. E. ( 2002 ). Mapping networks of terrorist cells. Connections, 24 (3), 43–52.

Krivitsky, P. N. ( 2011 ). Exponential-family random graph models for valued networks . Statistics and iLab. Carnegie Mellon University. Pittsburgh.

Lancichinetti, A. , & Fortunato, S. ( 2009 ). Community detection algorithms: A comparative analysis. Physical Review E, 80 (5). doi: 10.1103/Physreve.80.056117

Laumann, E. O. , Marsden, P. , & Prensky, D. ( 1983 ). The boundary specification problem in network analysis. In R. S. Burt & M. J. Minor (Eds.), Applied network analysis: A methodological introduction (pp. 8–34). London: Sage Publications.

Leicht, E. A. , Holme, P. , & Newman, M. E. J. ( 2006 ). Vertex similarity in networks. Physical Review E, 73 (2). doi: 10.1103/PhysRevE.73.026120

Li, S. , Armstrong, C. M. , Bertin, N. , et al. ( 2004 ). A map of the interactome network of the metazoan C. elegans. Science, 303 (5657), 540–3. doi: 10.1126/science.1091403

Lin, N. , Ensel, W. M. , & Vaughn, J. C. ( 1981 ). Social resources and strength of ties—structural factors in occupational-status attainment. American Sociological Review, 46 (4), 393–403.

Lorrain, F. , & White, H. C. ( 1971 ). Structural equivalence of individuals in social networks. Journal of Mathematical Sociology, 1 , 49–80.

Lubbers, M. J. , Molina, J. L. , Lerner, J. , Brandes, U. , Avila, J. , & McCarty, C. ( 2010 ). Longitudinal analysis of personal networks. The case of Argentinean migrants in Spain. Social Networks, 32 (1), 91–104. doi: 10.1016/j.socnet.2009.05.001

Luke, D. A. , & Harris, J. K. ( 2007 ). Network analysis in public health: History, methods, and applications. Annual Review of Public Health, 28 , 69–93. doi: 10.1146/annurev.publhealth.28.021406.144132

Marin, A. , & Hampton, K. N. ( 2007 ). Simplifying the personal network name generator: Alternatives to traditional multiple and single name generators. Field Methods, 19 (2), 163–93. doi: 10.1177/1525822x06298588

Marsden, P. V. ( 1981 ). Introducing influence processes into a system of collective decisions. American Journal of Sociology, 86 , 1203–35.

Marsden, P. V. ( 1987 ). Core discussion networks of Americans. American Sociological Review, 52 (1), 122–31.

Marsden, P. V. ( 1990 ). Network data and measurement. Annual Review of Sociology, 16 , 435–63.

Marsden, P. V. ( 2005 ). Recent developments in network measurement. In P. Carrington , J. Scott & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 8–30). New York: Cambridge University Press.

Marsden, P. V. , & Friedkin, N. E. ( 1993 ). Network studies of social influence. Sociological Methods and Research, 22 , 127–51.

McCarty, C. ( 2002 ). Structure in personal networks. Journal of Social Structure, 3 (1). http://www.cmu.edu/joss/content/articles/volume3/McCarty.html . Accessed April 27, 2013.

McCarty, C. , Bernard, H. R. , Killworth, P. D. , & Shelley, G. A. ( 1997 ). Eliciting representative samples of personal networks. Social Networks, 19 (4), 303–23. doi: 10.1016/s0378-8733(96)00302-4

McCarty, C. , Killworth, P. D. , & Rennell, J. ( 2007 ). Impact of methods for reducing respondent burden on personal network structural measures. Social Networks, 29 (2), 300–15. doi: 10.1016/j.socnet.2006.12.005

McCarty, C. , Molina, J. L. , Aguilar, C. , & Rota, L. ( 2007 ). A comparison of social network mapping and personal network visualization. Field Methods, 19 (2), 145–62. doi: 10.1177/1525822x06298592

McPherson, M. ( 1982 ). Hypernetwork sampling—duality and differentiation among voluntary organizations. Social Networks, 3 (4), 225–49.

McPherson, M. , Smith-Lovin, L. , & Brashears, M. E. ( 2006 ). Social isolation in America: Changes in core discussion networks over two decades. American Sociological Review, 71 (3), 353–75. doi: 10.1177/000312240607100301

McPherson, M. , Smith-Lovin, L. , & Cook, J. M. ( 2001 ). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27 (1), 415–44. doi: 10.1146/annurev.soc.27.1.415

Mercken, L. , Candel, M. , Willems, P. , & De Vries, H. ( 2007 ). Disentangling social selection and social influence effects on adolescent smoking: The importance of reciprocity in friendships. Addiction, 102 , 1483–92.

Mercken, L. , Snijders, T. A. B. , Steglich, C. , Vartiainen, E. , & de Vries, H. ( 2010 ). Dynamics of adolescent friendship networks and smoking behavior. Social Networks, 32 (1), 72–81. doi: 10.1016/j.socnet.2009.02.005

Mercken, L. , Snijders, T. A. B. , Steglich, C. , Vartiainen, E. , & de Vries, H. ( 2010 ). Smoking-based selection and influence in gender-segregated friendship networks: A social network analysis of adolescent smoking. Addiction, 105 (7), 1280–9. doi: 10.1111/j.1360-0443.2010.02930.x

Merton, R. K. ( 1968 ). The Matthew Effect in science: The reward and communication systems of science are considered. Science, 159 (3810), 56–63. doi: 10.1126/science.159.3810.56

Milgram, S. ( 1967 ). The small world problem. Psychology Today, 2 , 60–7.

Mizruchi, M. S. ( 1996 ). What do interlocks do? An analysis, critique, and assessment of research on interlocking directorates. Annual Review of Sociology, 22 , 271–98.

Molm, L. D. ( 2003 ). Theoretical comparisons of forms of exchange. Sociological Theory, 21 (1), 1–17.

Molm, L. D. , Collett, J. L. , & Schaefer, D. R. ( 2006 ). Conflict and fairness in social exchange. Social Forces, 84 (4), 2331–52.

Monge, P. R. , & Contractor, N. S. ( 2003 ). Homophily, proximity and social support. Theories of communication networks (pp. 223–39). New York: Oxford University Press.

Monge, P. R. , Heiss, B. M. , & Margolin, D. B. ( 2008 ). Communication network evolution in organizational communities. Communication Theory, 18 (4), 449–77. doi: 10.1111/j.1468-2885.2008.00330.x

Monge, P. R. , Lee, S. , Fulk, J. , Weber, M. , Shen, C. H. , Schultz, C. , … Frank, L. B. ( 2011 ). Research methods for studying evolutionary and ecological processes in organizational communication. Management Communication Quarterly, 25 (2), 211–51. doi: 10.1177/0893318911399447

Moody, J. , & White, D. R. ( 2003 ). Structural cohesion and embeddedness: A hierarchical conception of social groups. American Sociological Review, 68 , 103–27.

Moore, G. ( 1979 ). The structure of a national elite network. American Sociological Review, 44 (5), 673–92.

Moore, G. ( 1990 ). Structural determinants of men’s and women’s personal networks. American Sociological Review, 55 (5), 726–35. doi: 10.2307/2095868

Moreno, J. ( 1934 ). Who shall survive? A new approach to the problem of human interrelations. Washington, DC: Nervous and Mental Disease Publishing Co.

Neaigus, A. , Gyarmathy, V. A. , Miller, M. , Frajzyngier, V. M. , Friedman, S. R. , & Des Jarlais, D. C. ( 2006 ). Transitions to injecting drug use among noninjecting heroin users—social network influence and individual susceptibility. Journal of Acquired Immune Deficiency Syndromes, 41 (4), 493–503.

Nemeth, R. J. , & Smith, D. A. ( 1985 ). International trade and world-system structure: A multiple network analysis. Quantitative Studies of the World-System, 8 (4), 517–60.

Newcomb, T. M. ( 1961 ). The acquaintance process . New York: Holt, Rinehart, Winston.

Newman, M. E. J. ( 2001 a). Clustering and preferential attachment in growing networks. Physical Review E, 64 (2), 025102.

Newman, M. E. J. ( 2001 b). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64 (1), 016131.

Newman, M. E. J. ( 2004 a). Detecting community structure in networks. European Physical Journal B, 38 (2), 321–30. doi: 10.1140/epjb/e2004-00124-y

Newman, M. E. J. ( 2004 b). Fast algorithm for detecting community structure in networks. Physical Review E, 69 (6), 066133. doi: 10.1103/Physreve.69.066133

Newman, M. E. J. ( 2006 ). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America, 103 (23), 8577–82. doi: 10.1073/pnas.0601602103

Newman, M. E. J. , & Girvan, M. ( 2004 ). Finding and evaluating community structure in networks. Physical Review E, 69 (2), 026113. doi: 10.1103/Physreve.69.026113

Newman, M. E. J. , Strogatz, S. H. , & Watts, D. J. ( 2001 ). Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64 (2), 026118.

O’Madadhain, J. , Fisher, D. , White, S. , & Boey, Y. ( 2003 ). The JUNG (Java Universal Network/Graph) framework. Irvine, CA: School of Information and Computer Science, University of California.

Padgett, J. F. , & Ansell, C. K. ( 1993 ). Robust action and the rise of the Medici, 1400–34. American Journal of Sociology, 98 (6), 1259–319.

Panning, W. H. ( 1982 ). Fitting blockmodels to data. Social Networks, 4 (1), 81–101. doi: 10.1016/0378-8733(82)90014-4

Pattison, P. E. ( 1993 ). Algebraic models for social networks . Cambridge, UK: Cambridge University Press.

Pattison, P. E. , & Robins, G. ( 2002 ). Neighborhood-based models for social networks. Sociological Methodology, 32 , 301–37.

Pich, C. ( 2008 ). Applications of multidimensional scaling to graph drawing . Explorative Analysis and Visualization of Large Information Spaces. DFG Colloquium Konstanz. Universität Konstanz. Konstanz.

PNET . Retrieved from http://sna.unimelb.edu.au/PNet . Accessed April 27, 2013.

Podolny, J. M. , & Baron, J. N. ( 1997 ). Resources and relationships: Social networks and mobility in the workplace. American Sociological Review, 62 (5), 673–93.

Pollard, M. S. ( 2009 ). Friendship networks and alcohol use in adolescence and young adulthood. Bethesda, MD: National Institute on Alcohol Abuse and Alcoholism.

Pollner, P. , Palla, G. , & Vicsek, T. ( 2006 ). Preferential attachment of communities: The same principle, but a higher level. Europhysics Letters, 73 (3), 478–84. doi: 10.1209/epl/i2005-10414-6

Poteat, V. P. , Espelage, D. L. , & Green, H. D., Jr. ( 2007 ). The socialization of dominance: Peer group contextual effects on homophobic and dominance attitudes. Journal of Personality and Social Psychology, 92 (6), 1040–50. doi: 10.1037/0022-3514.92.6.1040

Preciado, P. , Snijders, T. A. B. , Burk, W. J. , Stattin, H. , & Kerr, M. ( 2011 ). Does proximity matter? Distance dependence of adolescent friendships. Social Networks, In Press, Corrected Proof . doi: 10.1016/j.socnet.2011.01.002

Price, D. J. ( 1965 ). Networks of scientific papers. Science, 149 , 510–15.

Price, D. J. ( 1980 ). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27 , 292–306.

Reichardt, J. , & Bornholdt, S. ( 2004 ). Detecting fuzzy community structures in complex networks with a potts model. Physical Review Letters, 93 (21). doi: 10.1103/Physrevlett.93.218701

Reichardt, J. , & White, D. R. ( 2007 ). Role models for complex networks. European Physical Journal B, 60 (2), 217–24. doi: 10.1140/epjb/e2007-00340-y

Richardson, D. C. , & Richardson, J. S. ( 1992 ). The Kinemage: A tool for scientific communication. Protein Science, 1 (1), 3–9.

Ripley, R. , & Snijders, T. A. B. ( 2010 ). Manual for Siena version 4.0 . Oxford: University of Oxford, Department of Statistics.

Robins, G. , Pattison, P. , & Elliott, P. ( 2001 ). Network models for social influence processes. Psychometrika, 66 (2), 161–90.

Robins, G. , Pattison, P. , Kalish, Y. , & Lusher, D. ( 2007 ). An introduction to exponential random graph (p’) models for social networks. Social Networks, 29 , 173–91.

Robins, G. , Pattison, P. , & Wang, P. ( 2009 ). Closure, connectivity and degree distributions: Exponential random graph (p *) models for directed social networks. Social Networks, 31 (2), 105–17. doi: 10.1016/j.socnet.2008.10.006

Robins, G. , Pattison, P. , & Wasserman, S. ( 1999 ). Logit models and logistic regressions for social networks, III. Valued relations. Psychometrika, 64 , 371–94.

Robins, G. , Pattison, P. , & Woolcock, J. ( 2004 ). Missing data in networks: Exponential random graph (p *) models for networks with non-respondents. Social Networks, 26 (3), 257–83. doi: 10.1016/j.socnet.2004.05.001

Robins, G. , Pattison, P. , & Woolcock, J. ( 2005 ). Small and other worlds: Global network structures from local processes. American Journal of Sociology, 110 , 894–936.

Sampson, R. J. , Morenoff, J. D. , & Gannon-Rowley, T. ( 2002 ). Assessing “neighborhood effects”: Social processes and new directions in research. Annual Review of Sociology, 28 , 443–78.

SAS Institute . ( 1990 ). Introduction to clustering procedures. SAS/STAT user’s guide (Vol. 1, pp. 53–101). Cary, NC: SAS Institute.

Schweinberger, M. ( 2012 ). Statistical modelling of network panel data: Goodness of fit. British Journal of Mathematical and Statistical Psychology, 65 (2), 263–281.

Schweinberger, M. , & Snijders, T. A. B. ( 2007 ). Markov models for digraph panel data: Monte Carlo-based derivative estimation. Computational Statistics & Data Analysis, 51 (9), 4465–83. doi: 10.1016/j.csda.2006.07.014

Scott, J. ( 1991 ). Positions, roles and clusters. Social network analysis: A handbook (pp. 123–74). London: Sage Publications.

Scott, J. ( 2000 a). Centrality and centralization. Social network analysis: A handbook . London: Sage Publications.

Scott, J. ( 2000 b). Components, cores and cliques. Social network analysis: A handbook . London: Sage Publications.

Scott, J. ( 2000 c). Social network analysis: A handbook . London: Sage Publications.

Shen, Z. Q. , Ma, K. L. , & Eliassi-Rad, T. ( 2006 ). Visual analysis of large heterogeneous social networks by semantic and structural abstraction. IEEE Transactions on Visualization and Computer Graphics, 12 (6), 1427–39.

Shumate, M. , & Dewitt, L. ( 2008 ). The north/south divide in ngo hyperlink networks. Journal of Computer-Mediated Communication, 13 (2), 405–28. doi: 10.1111/j.1083-6101.2008.00402.x

Shumate, M. , & Lipp, J. ( 2008 ). Connective collective action online: An examination of the hyperlink network structure of an ngo issue network. Journal of Computer-Mediated Communication, 14 (1), 178–201. doi: 10.1111/j.1083-6101.2008.01436.x

Shumate, M. , & Palazzolo, E. ( 2010 ). Exponential random graph (p *) models as a method for social network analysis in communication research. Communication Methods and Measures, 4 (4), 341–71. doi: 10.1080/19312458.2010.527869

SIENA . Retrieved from http://www.stats.ox.ac.uk/~snijders/siena/ . Accessed April 27, 2013.

Smith, D. A. , & White, D. R. ( 1992 ). Structure and dynamics of the global economy: Network analysis of international trade 1965–1980. Social Forces, 70 , 857.

Snijders, T. A. B. ( 1996 ). Stochastic actor-oriented models for network change. Journal of Mathematical Sociology, 21 (1), 149–72.

Snijders, T. A. B. ( 2001 ). The statistical evaluation of social network dynamics. Sociological Methodology, 31 (1), 361–95. doi: 10.1111/0081-1750.00099

Snijders, T. A. B. ( 2005 ). Models for longitudinal network data. Models and Methods in Social Network Analysis, 1 , 215–247.

Snijders, T. A. B. ( 2006 ). Statistical methods for network dynamics. In S. R. Luchini & et al (Eds.), Proceedings of the xliii scientific meeting, Italian statistical society (pp. 281–96). Padova: CLEUP.

Snijders, T. A. B. (2008). Statistical modeling of dynamics of non-directed networks . Paper presented at the XXV International Sunbelt Social Networks Conference, Redondo Beach, CA.

Snijders, T. A. B. ( 2009 ). Longitudinal methods of network analysis. In B. Meyers (Ed.), Encyclopedia of Complexity and System Science (pp. 5998–6013). Berlin: Springer Verlag.

Snijders, T. A. B. , & Baerveldt, C. ( 2003 ). A multilevel network study of the effects of delinquent behavior on friendship evolution. Journal of Mathematical Sociology, 27 , 123–51.

Snijders, T. A. B. , & Kenny, D. A. ( 1999 ). The social relations model for family data: A multilevel approach. Personal Relationships, 6 (4), 471–86.

Snijders, T. A. B. , Koskinen, J. , & Schweinberger, M. ( 2010 ). Maximum likelihood estimation for social network dynamics. Annals of Applied Statistics, 4 (2), 567–88. doi: 10.1214/09-Aoas313

Snijders, T. A. B. , Pattison, P. , Robins, G. L. , & Handcock, M. ( 2006 ). New specifications for exponential random graph models. Sociol Methodol, 36 , 99–153.

Snijders, T. A. B. , Steglich, C. E. G. , & Schweinberger, M. ( 2007 ). Modeling the co-evolution of networks and behavior. In K. van Montfort , J. Oud & A. Satorra (Eds.), Longitudinal models in the behavioral and related sciences (pp. 41–71). Mahwah, NJ: Lawrence Erlbaum Associates.

Snijders, T. A. B. , van de Bunt, G. G. , & Steglich, C. E. G. ( 2010 ). Introduction to stochastic actor-based models for network dynamics. Social Networks, 32 (1), 44–60. doi: 10.1016/j.socnet.2009.02.004

Snijders, T. A. B. , & van Duijn, M. A. J. ( 1997 ). Simulation for statistical inference in dynamic network models. In R. Conte , R. Hegselmann & P. Terna (Eds.), Simulating social phenomena (pp. 493–512). Berlin: Springer.

Sparrowe, R. T. , Liden, R. C. , Wayne, S. J. , & Kraimer, M. L. ( 2001 ). Social networks and the performance of individuals and groups. Academy of Management Journal, 44 (2), 316–25.

Steglich, C. , Snijders, T. A. B. , & Pearson, M. ( 2010 ). Dynamic networks and behavior: Separating selection from influence. Sociological Methodology, 40 (1), 329–93. doi: 10.1111/j.1467-9531.2010.01225.x

Steglich, C. , Snijders, T. A. B. , & West, P. ( 2006 ). Applying SIENA: An illustrative analysis of the co-evolution of adolescents’ friendship networks, taste in music, and alcohol consumption. Methodology, 2 , 48–66.

Strauss, D. ( 1992 ). The many faces of logistic regression. American Statistician, 46 (4), 321–7.

Su, C. , Huang, M. , & Contractor, N. ( 2010 ). Understanding the structures, antecedents, and outcomes of organizational learning and knowledge transfer: A multi-theoretical and multilevel network analysis. European Journal of International Management, 4 (6), 576–601. doi: 10.1504/EJIM.2010.035590

Taylor, H. ( 1970 ). Balance in small groups . New York; Cincinnati; Toronto; London; Melbourne: Von Nostrand Reinhold Company.

Thompson, S. K. ( 2002 ). Sampling, second edition . New York: Wiley.

Travers, J. , & Milgram, S. ( 1969 ). An experimental study of the small world problem. Sociometry, 32 , 425–43.

Tucker, J. S. , Kennedy, D. , Ryan, G. , Wenzel, S. L. , Golinelli, D. , & Zazzali, J. ( 2009 ). Homeless women’s personal networks: Implications for understanding risk behavior. Human Organization, 68 (2), 129–40.

Uehara, E. ( 1990 ). Dual exchange theory, social networks, and informal social support. American Journal of Sociology, 96 (3), 521–57.

Urberg, K. A. , Degirmencioglu, S. M. , & Pilgrim, C. ( 1997 ). Close friend and group influence on adolescent cigarette smoking and alcohol use. Developmental Psychology, 33 (5), 834–44. doi: 10.1037/0012-1649.33.5.834

Valente, T. W. ( 1995 ). Network models of the diffusion of innovation . New York: Hampton Press.

Valente, T. W. , Fujimoto, K. , Chou, C.-P. , & Spruijt-Metz, D. ( 2009 ). Adolescent affiliations and adiposity: A social network analysis of friendships and obesity. Journal of Adolescent Health, 45 (2), 202–4. doi: 10.1016/j.jadohealth.2009.01.007

Valente, T. W. , Gallaher, P. , & Mouttapa, M. ( 2004 ). Using social networks to understand and prevent substance use: A transdisciplinary perspective. Substance Use & Misuse, 39 (10–12), 1685–712. doi: 10.1081/ja-200033210

Valente, T. W. , Unger, J. B. , & Johnson, C. A. ( 2005 ). Do popular students smoke? The association between popularity and smoking among middle school students. Journal of Adolescent Health, 37 (4), 323–9.

van Duijn, M. A. J. , van Busschbach, J. T. , & Snijders, T. A. B. ( 1999 ). Multilevel analysis of personal networks as dependent variables. Social Networks, 21 (2), 187–209. doi: 10.1016/s0378-8733(99)00009–x

van Tilburg, T. ( 1998 ). Losing and gaining in old age: Changes in personal network size and social support in a four-year longitudinal study. The Journals of Gerontology: Series B: Psychological Sciences and Social Sciences , 53B (6), S313–23.

Veenstra, R. , Lindenberg, S. , Munniksma, A. , & Dijkstra, J. K. ( 2010 ). The complex relation between bullying, victimization, acceptance, and rejection: Giving special attention to status, affection, and sex differences. Child Development, 81 (2), 480–6. doi: 10.1111/j.1467-8624.2009.01411.x

Veenstra, R. , Lindenberg, S. , Zijlstra, B. J. H. , De Winter, A. F. , Verhulst, F. C. , & Ormel, J. ( 2007 ). The dyadic nature of bullying and victimization: Testing a dual-perspective theory. Child Development, 78 (6), 1843–54. doi: 10.1111/j.1467-8624.2007.01102.x

Walker, H. A. , Thye, S. R. , Simpson, B. , Lovaglia, M. J. , Willer, D. , & Markovsky, B. ( 2000 ). Network exchange theory: Recent developments and new directions. Social Psychology Quarterly, 63 (4), 324–37.

Wasserman, S. , & Anderson, C. ( 1987 ). Stochastic a posteriori blockmodels: Construction and assessment. Social Networks, 9 (1), 1–36.

Wasserman, S. , & Faust, K. ( 1994 ). Social network analysis: Methods and applications . Cambridge, UK: Cambridge University Press.

Wasserman, S. , & Pattison, P. ( 1996 ). Logit models and logistic regressions for social networks: I. An introduction to Markov random graphs and p *. Psychometrika, 61 (3), 401–26.

Wasserman, S. , & Robins, G. L. ( 2005 ). An introduction to random graphs, dependence graphs, and p *. In P. Carrington , J. Scott & S. Wasserman (Eds.), Models and methods in social network analysis (pp. 148–61). New York: Cambridge University Press.

Watts, D. J. ( 1999 ). Small worlds: The dynamics of networks between order and randomness . Princeton, NJ: Princeton University Press.

Watts, D. J. ( 2004 ). The new science of networks. Six degrees: The science of a connected age . New York: W.W. Norton and Company.

Watts, D. J. , & Strogatz, S. H. ( 1998 ). Collective dynamics of “small-world” networks. Nature , 393 (6684), 440–2.

Wellman, B. , & Berkowitz, S. D. ( 1988 ). Introduction: Studying social structures. In B. Wellman & S. D. Berkowitz (Eds.), Social structure: A network approach (pp. 1–14). Cambridge, UK: Cambridge University Press.

White, D. R. , & Johansen, U. C. ( 2005 ). Network analysis and ethnographic problems: Process models of a turkish nomad clan . Lexington Press.

White, H. C. , Boorman, S. , & Breiger, R. ( 1976 ). Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology, 81 , 730–80.

Xu, J. , & Chen, H. C. ( 2005 ). Criminal network analysis and visualization. Communications of the ACM, 48 (6), 100–7.



Open Access

Peer-reviewed

Research Article

Network analysis to evaluate the impact of research funding on research community consolidation

Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing


Affiliation University of California at Davis, Davis, California, United States of America

Roles Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing

Roles Conceptualization, Methodology, Writing – original draft, Writing – review & editing

Daniel J. Hicks,
David A. Coil,
Carl G. Stahmer,
Jonathan A. Eisen

Published: June 18, 2019
https://doi.org/10.1371/journal.pone.0218273

In 2004, the Alfred P. Sloan Foundation launched a new program focused on incubating a new field, “Microbiology of the Built Environment” (MoBE). By the end of 2017, the program had supported the publication of hundreds of scholarly works, but it was unclear to what extent it had stimulated the development of a new research community. We identified 307 works funded by the MoBE program, as well as a comparison set of 698 authors who published in the same journals during the same period of time but were not part of the Sloan Foundation-funded collaboration. Our analysis of collaboration networks for both groups of authors suggests that the Sloan Foundation’s program resulted in a more consolidated community of researchers, specifically in terms of number of components, diameter, density, and transitivity of the coauthor networks. In addition to highlighting the success of this particular program, our method could be applied to other fields to examine the impact of funding programs and other large-scale initiatives on the formation of research communities.

Citation: Hicks DJ, Coil DA, Stahmer CG, Eisen JA (2019) Network analysis to evaluate the impact of research funding on research community consolidation. PLoS ONE 14(6): e0218273. https://doi.org/10.1371/journal.pone.0218273

Editor: Wolfgang Glanzel, KU Leuven, BELGIUM

Received: February 1, 2019; Accepted: May 29, 2019; Published: June 18, 2019

Copyright: © 2019 Hicks et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The MoBE article list is included with the data collection and analysis scripts at https://doi.org/10.5281/zenodo.2548840 . Data from Crossref can be retrieved using the available scripts. Data from Scopus cannot be shared publicly for intellectual property reasons, but can be retrieved using the available scripts at a subscribing institution.

Funding: Funding for DAC and JAE came from the Alfred P. Sloan Foundation. DJH’s postdoctoral fellowship was funded by a gift to UC Davis from Elsevier. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: DJH’s postdoctoral fellowship was funded by a gift to UC Davis from Elsevier. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

In 2004, the Alfred P. Sloan Foundation launched a program focusing on the “Microbiology of the Built Environment”, sometimes known as “MoBE”. The aims of this program were to catalyze research on microbes and microbial communities in human built environments, such as homes, vehicles, and water systems, and to develop the topic into a whole field of inquiry. Before 2004, many new developments (e.g., major advances in DNA sequencing technology) had catalyzed innovation in studies of microbes found in other environments (e.g., in and on humans and other animals, in the soil, and in the oceans), but these innovations had not spread rapidly enough to studies of the microbes in the built environment. Similarly, many developments had occurred in studies of the built environment (e.g., the spread of low-cost sensor systems), but attention had not yet turned to the living, microbial components of built environments. This is not to say there had been no studies on the MoBE topic before 2004. Rather, the pace of advances in the area was modest at best compared with advances in other areas of microbiology and built environment studies. The MoBE area was founded on the belief that institutionally supported, integrated, transdisciplinary scientific inquiry could address these shortfalls and lead to major benefits in areas such as indoor health, disease transmission, biodefense, forensics, and energy efficiency.

The Sloan Foundation’s program ultimately lasted 15 years and invested more than $50 million in the MoBE field. A key goal of this program was to bring together two highly disparate fields, each with its own approaches, cultures, incentives, and rewards. The first was microbiology, especially the area focused on studies of entire ecosystems of microbes. The second was building science (e.g., with a focus on building, maintaining, regulating, and studying built environments). Grants went to many projects and to a diverse group of people from many fields, including microbiology, architecture, building science, software development, and meeting organization (a list of all grants from the program can be found at https://sloan.org/grants-database?setsubprogram=2 ). The products of these grants included a diverse collection of programs and projects, dozens of new collaborations, many novel and sometimes large data sets on various MoBE topics, new software and tools for MoBE studies, and hundreds of scholarly publications.

Recent reviews of the state of the field (e.g., [1, 2]) have qualitatively highlighted the success of this program. In this paper we report a quantitative assessment of the Sloan MoBE program and the MoBE field using a network analysis of the scholarly literature. Specifically, the aim of this study was to compare the community of researchers funded by the Sloan Foundation’s MoBE program to their scientific peers. If the Sloan Foundation’s program was successful at cultivating a new research community around MoBE topics, we hypothesized that we would see the evolution of an increasingly dense and more tightly connected network over the duration of the funding program.

Programs explicitly dedicated to funding interdisciplinary research may have an important role to play in the development of new research communities. [3] finds that interdisciplinary research proposals are less likely to be funded by the Australian Research Council’s Discovery Programme. That programme is designed to fund basic research across the disciplines but is not explicitly interdisciplinary. This gives researchers an incentive to propose, and then conduct, disciplinary research, which is more likely to build on established research communities. By contrast, [4] finds evidence of both novel collaborations and cross-disciplinary citations and publications for researchers funded by the US National Robotics Initiative program, which is explicitly interdisciplinary.

[5] proposes that coauthor networks can be used to examine the emergence of Kuhnian “normal science” [6]. Specifically, its authors relate the formation of a giant component to the formation of the kind of research community Kuhn described. A giant component is a single connected component of the network that contains a supermajority of authors. [5] focuses on three topological statistics for coauthor networks:

(1) the diameter of the largest component, used here to mean the average shortest path length between pairs of nodes;
(2) the fraction of edges in the largest component; and
(3) “densification,” the exponent of a power law model relating edge and node counts across time for a given dynamic network.

Diameter and edge fraction are dynamic statistics, recalculated at each time step (e.g., annually) as the coauthor network changes. Densification, by contrast, is a summary across time. [7] uses topic modeling to divide papers from the arXiv physics preprint repository into subfields. It then applies the approach of [5] to examine the dynamics of coauthor networks in each subfield. Like [5], [7] uses the diameter of the largest component as a key statistic, but it examines the fraction of nodes, rather than edges, in the largest component.
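To make these statistics concrete, the following is a minimal Python sketch. It is not the code used in [5], [7], or this study. It assumes that each annual coauthor network is supplied as a non-empty list of author names plus a list of distinct, undirected coauthor pairs, and all function names are hypothetical. The word "diameter" is sometimes used for the average shortest path length (as above) and sometimes for the longest shortest path, so the sketch reports both for the largest component. As a further assumption, densification is estimated by an ordinary least-squares fit of log edge count on log node count across snapshots, so that E(t) ∝ N(t)^α.

```python
from collections import deque
import math


def build_adj(nodes, edges):
    """Adjacency sets for an undirected graph; edges are distinct (u, v) pairs."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def snapshot_stats(nodes, edges):
    """Statistics for the largest connected component of one snapshot."""
    adj = build_adj(nodes, edges)
    unvisited, largest = set(adj), set()
    while unvisited:
        comp = set(bfs_distances(adj, next(iter(unvisited))))
        unvisited -= comp
        if len(comp) > len(largest):
            largest = comp
    total, pairs, longest = 0, 0, 0
    for s in largest:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        pairs += len(dist) - 1
        longest = max(longest, max(dist.values()))
    # An edge lies in the largest component iff one (hence both) endpoint does.
    in_largest = sum(1 for u, _ in edges if u in largest)
    return {
        "node_fraction": len(largest) / len(adj),
        "edge_fraction": in_largest / len(edges) if edges else 0.0,
        "avg_shortest_path": total / pairs if pairs else 0.0,  # "diameter" as used above
        "max_shortest_path": longest,  # conventional graph-theoretic diameter
    }


def densification_exponent(node_counts, edge_counts):
    """OLS slope of log(edges) on log(nodes) across snapshots (assumed fitting method)."""
    xs = [math.log(n) for n in node_counts]
    ys = [math.log(e) for e in edge_counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

For example, `densification_exponent([4, 9, 16], [8, 27, 64])` returns 1.5, up to floating-point rounding, because the toy edge counts grow as the 1.5 power of the node counts. An exponent above 1 means edges are accumulating faster than nodes, that is, the network is densifying.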

As [5] acknowledges, Kuhn’s notion of a paradigm and normal scientific research is controversial. In addition, network topology alone cannot provide insight into the normative aspects of a Kuhnian paradigm. That is, in Kuhn’s view, a paradigm provides rules and standards for good scientific research. The term “paradigm” comes from linguistics, where a paradigm characterizes the rules and standards for a specific construction. For example, “amo, amas, amat, amamus, amatis, amant” is a paradigm for the first conjugation of Latin verbs. Similarly, the paradigms of a normal science (e.g., protocols for experimental design and statistical analysis) provide shared rules and standards for good research, at least for the research community operating under the paradigm. The fact that researchers in a network are working with each other does not tell us whether they share this kind of normative framework.

However, whether researchers in a network are working with each other (or not) does provide insight into the structural possibilities for the circulation of ideas and information among them. Information flow within and across the boundaries of scientific communities has long been a major topic in science and technology studies (STS) and philosophy of science [8–10]. Increased information flow is also often a key goal of research funding programs, especially information flow across disciplinary boundaries [11]. Insofar as a scientific community is defined in terms of information flow, a transition from a disconnected or loosely connected collaboration network to a highly connected one does provide evidence for the formation of a scientific community.

[12] moves from coauthor networks to institutional collaboration networks (if X and Y are coauthors, then their respective institutions are collaborators) to examine the development of the field of strategic management. [12] calculates several dynamic network statistics for institutional networks, including average clustering, diameter, “connectedness” and “fragmentation”, and the number and fraction of nodes in the largest component. [12] unfortunately does not define “connectedness” or “fragmentation”, and both terms have various incompatible definitions in the network analysis literature.
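The projection from coauthor ties to institutional ties described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of [12]. It assumes that each author maps to a set of institutions (a hypothetical input format) and links every pair of institutions across a coauthor pair. Same-institution pairs are dropped rather than recorded as self-loops. The function name is hypothetical.

```python
def institution_edges(coauthor_edges, affiliations):
    """Project author-level coauthor edges onto institutions.

    coauthor_edges: iterable of (author_x, author_y) pairs.
    affiliations: dict mapping each author to a set of institution names.
    Returns a set of undirected institution pairs, stored as frozensets.
    """
    inst_edges = set()
    for x, y in coauthor_edges:
        for i in affiliations[x]:
            for j in affiliations[y]:
                if i != j:  # same-institution coauthorship is not an inter-institution tie
                    inst_edges.add(frozenset((i, j)))
    return inst_edges
```

Whether to keep self-loops, or to weight institutional ties by the number of coauthored papers, is a modeling choice that the description above does not settle.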

[13] examines the role of funded researchers (“PIs”) in the collaboration network in Slovenia from 1970 to 2016. Part of their analysis focuses on the relationships among several statistics computed over overlapping time periods:

- the fraction of nodes in the giant component;
- the mean fraction of each node’s neighbors who are PIs;
- the number of connected components when PIs are removed from the giant component; and
- the relative size of the largest component when PIs are removed.
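Several of the statistics in [13] can be computed directly from an adjacency structure. The Python sketch below is an illustration under assumptions, not the code used in [13]. The graph is a dict of neighbor sets and `pis` is a set of PI node labels. Nodes without neighbors are skipped when averaging the PI share. As a further assumption, "relative size" is taken to mean the size of the largest remaining component divided by the size of the original giant component. The function names are hypothetical.

```python
from collections import deque


def components_within(adj, allowed):
    """Connected components of the subgraph induced by the node set `allowed`."""
    seen, comps = set(), []
    for start in allowed:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in allowed and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def pi_statistics(adj, pis):
    """Statistics relating PIs to the giant component (illustrative sketch)."""
    nodes = set(adj)
    giant = max(components_within(adj, nodes), key=len)
    # Share of each node's neighbours who are PIs; isolated nodes are skipped.
    shares = [len(adj[u] & pis) / len(adj[u]) for u in nodes if adj[u]]
    # Remove PIs from the giant component and recompute its components.
    remaining = giant - pis
    after = components_within(adj, remaining)
    largest_after = max((len(c) for c in after), default=0)
    return {
        "giant_fraction": len(giant) / len(nodes),
        "mean_pi_neighbour_share": sum(shares) / len(shares) if shares else 0.0,
        "components_without_pis": len(after),
        "relative_largest_without_pis": largest_after / len(giant),
    }
```

In a network where a PI is the only bridge between otherwise separate groups, removing PIs splits the giant component. The component count then rises and the relative size of the largest remaining component falls.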

All of these studies use dynamic analysis of coauthor networks to examine development and change in research communities over time. However, none of these studies is designed to examine the effect of a particular funding program on the research community, and only [13] situates the group of researchers of interest (“PIs” or funded researchers) in the context of their peers (i.e., authors who were not funded).

In contrast, [14] uses coauthor and institutional collaboration networks, among other bibliometric methods, to examine the impact of a US National Aeronautics and Space Administration (NASA) program focused on astrobiology. Similarly, [15] uses a coauthor network, again among other methods, to study the early impacts of the US National Science Foundation (NSF) Science of Science and Innovation Policy (SciSIP) program. Because these are early assessments of their respective funding programs, both studies use static rather than dynamic collaboration networks.

[16] and [17] use dynamic network methods to analyze individual-level funding program impacts. [16] compares participants in two fellowship programs to their peers in a large literature database, focusing on individual betweenness centrality over time. The two programs are funded by the Japan Science and Technology Agency and the Japan Society for the Promotion of Science. [17] tests several hypotheses concerning the relationship between local topological features of the network (e.g., the size of a researcher’s neighborhood) and patent applications under a Chinese program to fund photovoltaic research.
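Betweenness centrality measures, for each node, the share of shortest paths between other pairs of nodes that pass through it. Below is a minimal Python sketch of Brandes' algorithm for an unweighted, undirected network, stored as a dict of neighbor sets. It returns unnormalized scores. It is not the implementation used in [16], and the function name is hypothetical. Tracking betweenness over time, as [16] does, would mean recomputing it on each period's network.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm), unweighted undirected graph."""
    scores = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma) and predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {v: c / 2 for v, c in scores.items()}
```

On a three-node path a–b–c, only the middle node lies between another pair, so b scores 1.0 and the endpoints score 0.0.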

Of these four program assessment studies, only [ 16 ] incorporates a comparison group of researchers.

In the present study, we use the theoretically informed approach developed in [5] and [7] to examine the community-level impact of a specific funding program, namely, the MoBE program. We compare MoBE-funded researchers to their peers and include robustness checks for the way peers are identified. Both steps give us more confidence in interpreting our results as identifying causal effects of the MoBE program. In addition, by deploying a wider variety of network statistics, we identify changes in the coauthor networks that would be missed by the smaller set of statistics used in [5] and [7].
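The abstract names the number of components, density, and transitivity alongside diameter. For reference, the usual definitions can be sketched in Python as follows. The sketch assumes a simple undirected graph stored as a dict of neighbor sets. These are standard textbook definitions and not necessarily the exact implementation used in this study. Density is the share of possible edges that are present, 2m / (n(n - 1)) for n nodes and m edges. Transitivity is the fraction of connected triples that are closed into triangles.

```python
from itertools import combinations


def n_components(adj):
    """Number of connected components, via union-find."""
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, nbrs in adj.items():
        for v in nbrs:
            parent[find(u)] = find(v)
    return sum(1 for u in adj if find(u) == u)


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def transitivity(adj):
    """Global clustering coefficient: closed triples / connected triples."""
    closed, triples = 0, 0
    for u, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2  # triples centred on u
        closed += sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return closed / triples if triples else 0.0
```

Each triangle is counted once from each of its three vertices, so `closed` equals three times the number of triangles. This matches the common definition, transitivity = 3 × triangles / connected triples.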

Compared to the literature reviewed above, our study is distinctive for using network analysis methods and a comparison group of researchers to analyze the community-level impacts of a particular research funding program. To be clear, we make no claims here about the impacts of research funding programs more generally, but we do think that the MoBE program is an interesting case of an explicit attempt to create an interdisciplinary, multi-institution research community. Insofar as we find that the MoBE program was successful in this attempt, future research might identify specific features of the program that contributed to this success and could be generalized to other such programs.

Methods and materials

Corpus selection.

Publications funded by the Sloan Foundation’s MoBE program provided the starting point for our data collection and analysis. We evaluate the effect of this program by analyzing these publications in the context of previous work by the same authors, as well as a “control” or comparison set of authors working in the same general areas. We identify the comparison set as authors publishing frequently in the same journals as MoBE-funded publications.

Identifying Sloan Foundation-funded publications

A list of awards made within the Sloan-funded MoBE program is available at https://sloan.org/grants-database?setsubprogram=2 . The MoBE program awarded USD 51,000,000 in grants ranging from USD 3,500 to USD 2,500,000 (mean USD 335,000, median USD 125,000). Table 1 lists organizations that received 3 or more awards from this program. Fig 1 shows the number of new and active awards and publications within the MoBE program over time. The earliest research awards were made in 2004, but the number of new research awards expanded rapidly starting in 2011. Peak activity, measured as the largest number of active research awards, came in 2014. The first MoBE-funded publications did not appear until 2008, and publication peaked in 2016, indicating a lag of 2–3 years between research activity and the publication record.
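Counting "new" and "active" awards per year, as in Fig 1, is a simple tally. The Python sketch below makes a hypothetical assumption about the input: each award is recorded as a (start_year, end_year, amount) tuple and is treated as active in every year from start to end inclusive. The actual grants database may record award periods differently. The sketch also shows how the mean and median award sizes quoted above would be computed. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def awards_by_year(awards):
    """Count new and active awards per year.

    awards: list of (start_year, end_year, amount_usd) tuples (hypothetical format).
    An award is assumed active in every year from start_year to end_year inclusive.
    """
    new = Counter(start for start, _, _ in awards)
    active = Counter()
    for start, end, _ in awards:
        for year in range(start, end + 1):
            active[year] += 1
    return new, active


def award_size_summary(awards):
    """Total, mean, and median award amount."""
    amounts = [amount for _, _, amount in awards]
    return sum(amounts), mean(amounts), median(amounts)
```

The median is less sensitive than the mean to a few very large awards. That is why, with grants ranging up to USD 2,500,000, the reported mean (USD 335,000) sits well above the median (USD 125,000).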


Fig 1. A: New awards made each year. B: Active awards in each year. C: Publications in each year. Dark gray vertical lines indicate the end of 2017, when MoBE-funded publications were identified. Colors indicate award types in A and B; color is not meaningful in C.

https://doi.org/10.1371/journal.pone.0218273.g001

Table 1. Organizations that received 3 or more MoBE awards. Awards include research funding as well as funds for meeting organization, data infrastructure development, outreach, and other categories. n: Number of awards received.

https://doi.org/10.1371/journal.pone.0218273.t001

A list of publications associated with the MoBE program was compiled through a combination of strategies. An initial set of papers was identified by manually searching publications authored by the grantees during the program period for acknowledgements of Sloan Foundation funding. Additional publications were identified by searching Google Scholar for relevant MoBE papers and identifying those authored by grantees during the program period. Finally, each grantee, and in some cases members of their lab (n ≈ 50), was contacted directly and asked whether the publication list we had for them was both accurate and complete. This feedback led to some publications being removed from the list (as not having derived from the Sloan Foundation’s program) and others being added. In addition, we posted requests for feedback on the list in various social media settings, e.g., blogs and Twitter ( https://www.microbe.net/2017/09/07/sloan-funded-mobe-reference-collection/ ; https://www.microbe.net/2018/03/15/one-last-call-for-help-with-sloan-funded-mobe-paper-collection/ ). The final list contained 327 publications. Twenty of these publications did not have digital object identifiers (DOIs) on record and were excluded from further analysis, leaving 307 publications for analysis.

Identifying peer authors

We sought to compare MoBE researchers to peers who were not funded by the MoBE program, in order to control for ordinary developments in both individual careers (e.g., more senior researchers are likely to have more collaborators) and research communities (e.g., more researchers are trained and join the community). In what follows, researchers funded by the Sloan Foundation’s program are referred to as the “collaboration” authors; their peers are the “comparison” authors.

Several methods were considered for developing this comparison set. Keyword searches were judged to be too noisy, producing significant numbers of false positive and false negative matches, and to be highly sensitive to the particular keywords used. Forward-and-backward citation searches using the 307 MoBE articles (compare [ 18 ]) produced lists on the order of 1,000,000 publications, which was judged to be impractically large. As an alternative, peer authors were identified as authors who are highly prolific in the same journals as the 307 MoBE articles.

Specifically, using the rcrossref package [ 19 ] to access the Crossref API (application programming interface; https://github.com/CrossRef/rest-api-doc), metadata were retrieved for 572,362 articles published in 111 journals between 2008 and 2018 inclusive. (PLOS One was dropped prior to retrieving these metadata, due to its general nature and extremely high publication volume.) Fourteen journals published at least 10,000 articles during this time period; these appeared to be high-volume, general or broad-scope journals, such as Science or Environmental Science & Technology. The 345,546 articles from these 14 journals were removed, leaving 226,816 articles from 97 journals. Because Crossref does not provide any standardized author identifiers, simple name matching was used to estimate the number of articles published by each author. (This method means “Maria Rodriguez” and “M. Rodriguez” would be counted as different authors at this stage.) The same method was used to roughly identify authors of MoBE-funded papers. After filtering out authors of MoBE-funded papers, the 1,000 most prolific authors were selected as candidates for the comparison set. See Fig 2.
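The candidate-selection steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' R code. Each article is assumed to be a dict with hypothetical `journal` and `authors` fields (plain name strings), and `select_candidates` is an illustrative name. The function applies the 10,000-article journal cutoff, counts articles per exact name string, drops the names of MoBE authors, and keeps the most prolific remaining names.

```python
from collections import Counter

# Hypothetical record layout: {"journal": str, "authors": [name, ...]}.
# These field names are illustrative, not Crossref's own schema.

def select_candidates(articles, mobe_names, volume_cutoff=10_000, k=1_000):
    """Return up to k of the most prolific author names not linked to MoBE.

    Journals with at least `volume_cutoff` articles are treated as
    high-volume, broad-scope venues and dropped before counting.
    """
    per_journal = Counter(a["journal"] for a in articles)
    kept = [a for a in articles if per_journal[a["journal"]] < volume_cutoff]

    # Exact-string name matching: "M. Rodriguez" and "Maria Rodriguez"
    # are counted as different authors, as in the text.
    counts = Counter(name for a in kept for name in a["authors"])
    for name in mobe_names:
        counts.pop(name, None)  # filter out authors of MoBE-funded papers
    return [name for name, _ in counts.most_common(k)]
```

Counting raw name strings keeps this step cheap; disambiguation is deferred to the later author-ID step.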

https://doi.org/10.1371/journal.pone.0218273.g002

Next, to retrieve standardized author identifiers, a covering set of papers was identified such that each candidate name appeared as an author of at least one paper in the covering set. This covering set included all candidates by name, and no filtering was applied in identifying it. Metadata for these papers were retrieved from the Scopus API (https://dev.elsevier.com), which incorporates an automated author matching system and standardized identifiers, referred to as author IDs. These author IDs were then used to classify researchers as members of the MoBE collaboration or the comparison set. Collaboration authors were defined as any author who either (a) was an author of at least two MoBE-funded papers or (b) was an author of at least one MoBE-funded paper and appeared in the candidates list (total n = 393 distinct names for the collaboration; 438 distinct author IDs). Candidates were removed from the comparison set if they were classified as part of the collaboration (total n = 770 distinct author IDs for the comparison set). (In what follows, we do not distinguish between authors and author IDs.)
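The collaboration rule can be expressed compactly. In this sketch, the function and argument names are hypothetical. MoBE-funded papers are given as lists of author IDs, and the candidate list is assumed to have already been mapped from names to author IDs. Because the original rule matched candidates by name, this is a simplification.

```python
from collections import Counter

def classify_authors(mobe_papers, candidate_ids):
    """Split author IDs into (collaboration, comparison) sets.

    mobe_papers: iterable of author-ID lists, one per MoBE-funded paper.
    candidate_ids: prolific-candidate IDs (assumed already mapped from names).
    """
    # Number of MoBE-funded papers per author; set() guards against an ID
    # being listed twice on the same paper.
    mobe_counts = Counter(aid for paper in mobe_papers for aid in set(paper))
    candidates = set(candidate_ids)
    collaboration = {
        aid
        for aid, n in mobe_counts.items()
        # (a) two or more MoBE papers, or (b) at least one (always true for
        # a counted ID) and also on the candidates list
        if n >= 2 or aid in candidates
    }
    # Candidates classified as collaboration authors leave the comparison set.
    comparison = candidates - collaboration
    return collaboration, comparison
```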

Author histories

Author histories (up to 200 publications per author, published in or after 1999) for all 1,208 authors were retrieved using the Scopus API. These histories include both MoBE-funded and non-MoBE-funded papers, published in all journals indexed by Scopus. This resulted in an analysis dataset of 85,306 papers. Besides standard metadata, each paper was identified as MoBE-funded or not. Table 2 shows the distribution of papers in the analysis dataset across four author groups: only comparison authors; only collaboration authors, with separate counts for MoBE-funded and non-MoBE-funded papers; and “mixed” papers, with authors from both sets.

Author groups are based only on authors included in either the collaboration or comparison set. For example, a non-MoBE paper by two collaboration authors and a third author (not included in either the collaboration set or the comparison set) would be counted as “collaboration authors only”.
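The grouping rule for Table 2 can be sketched as follows, including the treatment of authors who belong to neither set. The group labels and the `author_group` name are our own and are not taken from the authors' code.

```python
def author_group(paper_authors, is_mobe, collaboration, comparison):
    """Assign one paper to a Table 2 author group (labels are illustrative).

    Authors in neither set are ignored. Returns None if no listed author
    belongs to either set.
    """
    authors = set(paper_authors)
    has_collab = bool(authors & collaboration)
    has_comp = bool(authors & comparison)
    if has_collab and has_comp:
        return "mixed"
    if has_collab:
        # Collaboration-only papers are split by funding source.
        return "collaboration only, MoBE" if is_mobe else "collaboration only, non-MoBE"
    if has_comp:
        return "comparison only"
    return None
```

For example, a non-MoBE paper by collaboration authors "a" and "b" plus an outside author "z" falls in "collaboration only, non-MoBE", matching the example in the text.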

https://doi.org/10.1371/journal.pone.0218273.t002

Disciplinary identification

As discussed in the introduction, one of the primary aims of the MoBE program was to promote interdisciplinary collaboration between microbiologists, on the one hand, and researchers in fields such as civil engineering and indoor air quality, on the other. To assess the success of the program in this respect, we attempted to collect data on researchers’ disciplinary self-identification. We contacted 80 MoBE-funded researchers via email, asking them what percentage of their research/work they would consider related to microbiology, building science, or “other.” Thirty researchers responded. We conducted an exploratory analysis, looking for associations between disciplinary self-identification and researchers’ publications in the analysis dataset, based on (a) the All Science Journal Classification (ASJC) subject areas identified by Scopus, (b) all words used in paper abstracts, and (c) the 1,000 most informative words used in paper abstracts (where “informative” was calculated in terms of entropy over the self-identified disciplines). In each case, principal component analysis indicated no useful associations that could be used to classify all authors within this disciplinary space (e.g., using a machine learning model). In light of these unpromising exploratory results and limited resources, efforts to assess interdisciplinary collaboration were not pursued further.
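The text does not spell out how entropy-based informativeness was computed. One plausible reading is sketched below purely as an assumption. Each abstract is labeled with a single discipline (for example, the author's largest self-reported share). We then compute the Shannon entropy of each word's occurrence counts across those labels and treat the lowest-entropy words as the most informative. All names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def word_entropies(docs):
    """docs: list of (discipline_label, words) pairs, one per abstract.

    Returns {word: Shannon entropy in bits of the word's occurrence counts
    across discipline labels}. Single-label assignment is an assumption.
    """
    by_word = defaultdict(Counter)
    for label, words in docs:
        for w in words:
            by_word[w][label] += 1
    entropies = {}
    for w, counts in by_word.items():
        total = sum(counts.values())
        entropies[w] = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropies

def most_informative(docs, k):
    """Lowest-entropy words first (ties broken alphabetically)."""
    ent = word_entropies(docs)
    return sorted(ent, key=lambda w: (ent[w], w))[:k]
```

A word used only in abstracts from one discipline has entropy 0. A word split evenly between two disciplines has entropy 1 bit.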

Network analysis

The analysis dataset of 85,306 papers was used as the basis for constructing time-indexed collaboration networks. Each author forms a node (distinguished by author ID); edges correspond to papers published in a given year, so that two authors are connected by an edge for a given year if they coauthored at least one paper published in that year. All collaboration authors had at least one edge; 72 comparison authors had no edges (i.e., no paper coauthored with another author in the dataset) and were dropped from the network analysis (remaining comparison n = 698). Authors who collaborated on multiple papers in a given year were connected with multiple edges, except when calculating density (see below).
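Below is a minimal sketch of this edge construction. It assumes papers are given as hypothetical `(year, author_ids)` pairs. The sketch produces one edge per coauthor pair per paper, so repeated collaborations in a year become parallel edges. Authors who end up with no edge are the ones dropped.

```python
from itertools import combinations

def yearly_edges(papers):
    """papers: iterable of (year, author_id_list) pairs (hypothetical format).

    Returns (year, a, b) tuples, one per coauthor pair per paper, so two
    authors sharing three papers in a year get three parallel edges.
    """
    edges = []
    for year, authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges.append((year, a, b))
    return edges

def connected_authors(edges):
    """Authors with at least one edge; all others are dropped."""
    return {x for _, a, b in edges for x in (a, b)}
```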

After constructing the combined (collaboration + comparison) network, separate cumulative-annual networks were constructed for each set of authors. For example, two authors would be connected in the 2011 network if and only if (1) they were in the same author set and (2) they had coauthored at least one publication between 1999 and 2011 inclusive. Cumulative networks were used to reduce noise in the most recent years, given incomplete data for 2018 and the winding down of the Sloan Foundation’s funding program. Analyzing separate cumulative networks allows us to examine the development of research communities over time and between the author sets.
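The cumulative, within-set filtering can be sketched as below, with a hypothetical function name. Edges are assumed to be `(year, a, b)` tuples. Parallel edges are collapsed here, which does not change components, distances, or transitivity. Nodes are assumed to be the authors with at least one within-set edge by the given year; the text does not say whether edgeless authors are retained.

```python
def cumulative_network(edges, author_set, year):
    """Deduplicated cumulative network for one author set up to `year`.

    Keeps an edge iff both endpoints are in `author_set` and the paper
    appeared in or before `year`. Returns (nodes, pairs), with each
    undirected pair stored as a sorted tuple. Treating only authors with
    a within-set edge as nodes is an assumption.
    """
    pairs = {
        tuple(sorted((a, b)))
        for y, a, b in edges
        if y <= year and a in author_set and b in author_set
    }
    nodes = {x for pair in pairs for x in pair}
    return nodes, pairs
```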

For network analysis, we extended the approach developed by [ 5 ] and [ 7 ]. Specifically, both of these studies proposed that community formation can be measured in terms of giant component coverage and mean distance or shortest path length: increasing coverage combined with decreasing distance indicates community consolidation. Neither [ 5 ] nor [ 7 ] used a control or comparison group (neither study aimed to examine the impact of a specific funding program or other intervention). In this study, we calculate a total of eight network topological statistics and directly compare the two author sets. Specifically, for each year we calculate the number of authors, number of components, coverage of the giant component (the fraction of authors included in the largest component), entropy (H) of the component size distribution, diameter, density (the fraction of all possible edges actually realized), mean distance, and transitivity.

The number of authors simply measures the total size of each network; because these are cumulative networks, it necessarily increases. The number of components, coverage of the giant component, and entropy of the component distribution are measures of the large-scale structure of the network. More components indicate that the network is divided into subcommunities that do not interact (at least in terms of coauthoring papers); fewer components indicate consolidation of the research community. Giant component coverage and entropy measure the relative sizes of these different components; higher giant component coverage and lower entropy indicate that more authors can be found in a single component, which in turn indicates research community consolidation.
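The three large-scale statistics can be computed with one union-find pass over the edges. This sketch assumes that H is the Shannon entropy (in bits; the base is our assumption) of the fraction of authors falling in each component. Under that reading, higher coverage and lower H both mean that more authors sit in a single component. The function names are hypothetical.

```python
import math
from collections import Counter

def components(nodes, edges):
    """Connected components via union-find; returns a list of component sizes."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return list(Counter(find(v) for v in nodes).values())

def component_stats(nodes, edges):
    """(number of components, giant component coverage, entropy H in bits)."""
    sizes = components(nodes, edges)
    n = sum(sizes)
    coverage = max(sizes) / n  # fraction of authors in the largest component
    entropy = -sum((s / n) * math.log2(s / n) for s in sizes)
    return len(sizes), coverage, entropy
```

A network with a single component gives coverage 1 and H = 0.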

Diameter (the longest shortest path between two authors), density, and mean distance (the average shortest path length between pairs of authors) can be interpreted as measures of the ability of information to flow through the network. Lower diameter, higher density, and lower mean distance indicate that it is easier for information to move between any two given researchers, as there are fewer intermediary coauthors and a higher probability of a direct connection. These therefore indicate research community consolidation.
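For a simple undirected graph, the three flow-related statistics follow from breadth-first search and an edge count. In this sketch, with hypothetical names, distances are computed only between mutually reachable authors. That treatment of disconnected pairs is an assumption, though igraph's `diameter()` and `mean_distance()` also ignore unreachable pairs by default.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def flow_stats(nodes, edges):
    """(diameter, density, mean distance) of a simple graph with >= 1 edge."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    n = len(nodes)
    m = sum(len(s) for s in adj.values()) // 2  # distinct undirected edges
    density = 2 * m / (n * (n - 1))
    # Each reachable ordered pair contributes once; this does not bias the mean.
    lengths = []
    for v in nodes:
        lengths.extend(d for w, d in bfs_distances(adj, v).items() if w != v)
    return max(lengths), density, sum(lengths) / len(lengths)
```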

Transitivity (the fraction of connected triples of authors that are closed into triangles) is an aggregate measure of the local-scale structure of the network. Low transitivity indicates that the network is composed of loosely connected clusters: there is collaboration across groups of researchers, but it is relatively rare. High transitivity, by contrast, indicates that the network cannot be divided into distinguishable clusters. High transitivity therefore indicates research community consolidation.
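Here transitivity is taken to be the global clustering coefficient: the fraction of connected triples (paths x - v - y) whose endpoints are also directly linked, which equals three times the number of triangles divided by the number of connected triples. Below is a minimal sketch with a hypothetical function name. Returning 0.0 for a graph with no triples is our own convention.

```python
from itertools import combinations

def transitivity(nodes, edges):
    """Global transitivity = closed triples / connected triples."""
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    triples = closed = 0
    for v in nodes:
        # Every pair of v's neighbours forms a connected triple centred on v.
        for x, y in combinations(adj[v], 2):
            triples += 1
            if y in adj[x]:
                closed += 1  # each triangle is counted once per centre (3 times)
    return closed / triples if triples else 0.0
```

A triangle with one pendant author attached has 5 connected triples, 3 of them closed, so its transitivity is 0.6.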

Two robustness checks were incorporated into our analysis. First, to account for the possibility of data errors or missingness, perturbed networks were generated for each year by randomly switching the endpoints of 5% of edges while preserving each node’s degree. Second, the construction of the comparison set is likely to exclude students, postdoctoral researchers, and other early-career researchers. Insofar as these types of authors are included in the collaboration set, the collaboration network may appear to be better connected than the comparison set. To account for this possibility, we also construct and analyze filtered versions of the annual cumulative networks, in which authors are included only if they have 50 or more papers in total in the analysis dataset.
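Both robustness checks can be sketched in a few lines. The rewiring uses double-edge swaps, which keep every node's degree. How "5% of edges" maps to a number of swaps is our assumption; here we aim for about one swap per two edges to be replaced. `rewire` and `filter_prolific` are hypothetical names, and `paper_counts` is assumed to map author IDs to their total papers in the analysis dataset.

```python
import random

def rewire(edges, fraction=0.05, seed=None, max_tries=10_000):
    """Degree-preserving rewiring of a simple undirected edge list (>= 2 edges).

    Each accepted double-edge swap replaces (a, b), (c, d) with (a, d), (c, b),
    so every author keeps the same number of coauthors. Targeting about
    `fraction` of edges replaced (two per swap) is an assumption.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    target = max(1, round(fraction * len(edge_list) / 2))
    done = tries = 0
    while done < target and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both ways of reconnecting the four endpoints
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Reject swaps that would create self-loops or duplicate edges.
        if len(new1) < 2 or len(new2) < 2 or new1 in present or new2 in present:
            continue
        present -= {frozenset(edge_list[i]), frozenset(edge_list[j])}
        present |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


def filter_prolific(nodes, edges, paper_counts, min_papers=50):
    """Keep authors with at least `min_papers` papers and the edges among them."""
    keep = {v for v in nodes if paper_counts.get(v, 0) >= min_papers}
    return keep, [(a, b) for a, b in edges if a in keep and b in keep]
```

Because each swap preserves the degree sequence, any change in a statistic under rewiring reflects where the edges point, not how many coauthors each author has.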

Acknowledgment sections and other sources of funding information are not included in the metadata retrieved for this analysis. We are therefore unable to identify funding sources except for MoBE-funded papers, for which we have our own metadata. The comparison method is thus designed to test only whether the removal of MoBE-funded research produces a response effect in the shape of the overall discursive space. It does not consider independent relationships between MoBE and other funding sources, nor relationships among non-MoBE sources. An underlying assumption of the analysis is, therefore, that the rates of impact from other sources of research funding are constant, and that there is no underlying relationship between MoBE funding and other funding sources such that the removal of MoBE funding results in the uneven removal of other sources of funding. Examining these relationships is a potential direction for future study.

All data collection and analysis was carried out in R [ 20 ]. Complete data collection and analysis code, as well as the list of MoBE-funded publications, is available at https://doi.org/10.5281/zenodo.2548839 .

Results/Discussion

Qualitative analysis

The development of the combined network is shown in Fig 3. MoBE-funded authors and papers are shown in blue; non-MoBE-funded authors and papers are shown in red. Altogether, we believe that Fig 3 shows the consolidation of the MoBE collaboration within a consolidating larger research community.

Panels show time slices (non-cumulative) of the giant component of the combined coauthor network. Blue nodes and edges are MoBE authors and papers; red nodes and edges are non-MoBE authors and papers. Network layouts are calculated separately for each slice using the Fruchterman-Reingold algorithm with default values in the igraph package.

https://doi.org/10.1371/journal.pone.0218273.g003

Prior to the beginning of MoBE funding in 2004, a subset of MoBE researchers is already working together, but many MoBE researchers are isolated in this network, and the largest component is only loosely connected. Qualitatively, the combined network has a sparse “lace” structure, with many long loops, as well as an “archipelago” of numerous small disconnected components.

During the early years of the funding period (2005-2008 and 2009-2013), a tighter cluster of MoBE researchers appears on the margins of the overall research community, but many MoBE researchers can still be found scattered among the comparison authors and in disconnected components. The combined network has a “hairy ball” appearance, with a dense central “ball,” many peripheral “hairs,” and again an extensive “archipelago.” Part of the MoBE collaboration appears as a somewhat coherent “sub-ball,” which we take to indicate that this part of the MoBE collaboration is highly integrated within the larger community.

During the peak period of MoBE funding (2015-2018), the vast majority of MoBE researchers appear to form one or two large, coherent communities at the center of the giant component: well-defined “blobs” of blue within a larger blob of red. Very few MoBE researchers appear outside of this coherent community. We suggest that this indicates tight integration involving almost all members of the MoBE collaboration.

However, because qualitative features of a visualized network are heavily dependent on the visualization method, this qualitative analysis should not be overinterpreted. Below we provide a quantitative analysis, less susceptible to overinterpretation.

Note that a few comparison set authors remain in small disconnected components even in the final time slice. These likely reflect “false positives” in the construction of the comparison set: authors who appear relatively frequently in the same journals as the MoBE publications, but do not actually conduct research in relevant research areas. We manually identified some such false positives, including authors of news stories in journals such as Current Biology or Nature Biotechnology as well as a few neuroscientists.

Quantitative analysis

Fig 4 shows statistics over time for the cumulative collaboration networks in each author set. Overall, both the MoBE research community and the comparison research community consolidated over time; but the MoBE research community consolidated faster and more thoroughly than the comparison set.

See text for an explanation of the statistics calculated here. Solid lines correspond to observed values; shaded ribbons correspond to 90% confidence intervals on rewired networks, in which 5% of the observed edges are randomly rewired while preserving each node’s degree. One hundred rewired networks are generated for each author set-year combination. Dashed lines correspond to observed values for authors with 50 or more total papers in the data. Blue corresponds to the MoBE collaboration; red corresponds to the peer comparison set of authors. Vertical lines indicate 2004, the first year of research funding by the MoBE program. Due to publication lags, we would not expect to see effects from 2004 funding until 2006-07.

https://doi.org/10.1371/journal.pone.0218273.g004
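The shaded ribbons summarize a statistic across the 100 rewired networks per author set and year. Assuming they are central percentile intervals (the caption gives the 90% level but not the method), the bounds are the 5th and 95th percentiles. The hypothetical helper below uses linear interpolation between order statistics, the same rule as R's default `quantile()` (type 7).

```python
import math

def quantile(values, p):
    """Linear-interpolation quantile (R's default type 7) for 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def percentile_interval(values, level=0.90):
    """Central interval expected to cover `level` of the replicate values."""
    tail = (1 - level) / 2
    return quantile(values, tail), quantile(values, 1 - tail)
```

For instance, with `values` holding one statistic computed on each of the 100 rewired networks for a given author set and year, `percentile_interval(values)` returns the lower and upper edges of the ribbon for that year.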

The most notable differences between the two author sets appear in the number of components, diameter, density, and transitivity. The comparison set stabilizes at 15-20 distinct components, while the MoBE collaboration approaches fewer than 5 components. However, for both author sets giant component coverage approaches 1 and H approaches 0, indicating that both networks are dominated by a single giant component; the comparison set simply has several additional small components containing isolated researchers. As noted in the qualitative analysis, we believe this is plausibly due to “false positives” in constructing the comparison set. The remaining statistics are generally robust to the inclusion of such “false positives.”

Prior to 2010, the MoBE and comparison sets have a similar diameter, which increases during 1999-2005 as new researchers are added and then remains roughly stable until about 2010. Diameter remains above 10 for the comparison set, with a notable increase in 2008 followed by a decrease after 2013. By contrast, starting around 2010, the MoBE collaboration diameter is consistently smaller and decreasing.

However, diameter might be criticized as sensitive to network size. The relatively low diameter of the MoBE collaboration might be explained by the fact that this network has about half as many researchers as the comparison set.

Density and transitivity are automatically normalized against network size, and so avoid this potential confounder. For the collaboration set, transitivity peaks near 90% in 2012, indicating that at this time the connected components of the MoBE collaboration have almost no internal structure: everyone involved in the collaboration in 2012 is working directly with almost everyone else. Density plateaus at about 10% at this same time, and remains roughly stable over the remaining years of the study period. Transitivity and density then drop somewhat, but still remain remarkably high, indicating a highly interconnected research community even as the number of authors approaches its peak of just over 400. Transitivity is greater than 60% for both author sets in 2008-2009, but the two then diverge, with transitivity dropping to around 50% in the comparison set by 2018. Density is consistently below about 2.5% for the comparison set throughout the study period.
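The size normalization can be seen from the standard definitions for a simple undirected graph with n authors and m distinct coauthor pairs, together with an illustrative calculation. The values n = 400 and D = 0.10 are rounded figures chosen for illustration, not reported results.

```latex
D = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)}, \qquad
T = \frac{3 \times (\text{number of triangles})}{\text{number of connected triples}}

n = 400,\; D = 0.10 \;\Rightarrow\; m = 0.10 \times \frac{400 \times 399}{2} = 0.10 \times 79{,}800 = 7{,}980
```

Because the denominator of D grows roughly with n squared, a network twice as large needs roughly four times as many distinct coauthor pairs to keep the same density.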

Because of the delay between research and journal article publication, these network statistics are a lagging indicator of community formation, trailing it by roughly 2-3 years. Taking this lag into account, our network analysis indicates that the MoBE research community consolidated around 2008-2010.

Shaded regions in Fig 4 indicate that most comparisons between the MoBE and comparison sets are robust to data errors. Diameter and number of components are somewhat more sensitive to possible data errors than the other statistics; but even here the comparison set statistics are consistently greater than the MoBE set statistics, indicating less consolidation in the comparison set.

The dashed lines in Fig 4 indicate that the comparisons are also robust to excluding early-career researchers. Other than the number of authors (which necessarily decreases when authors are filtered), the only noteworthy effect of filtering is to increase the density of the collaboration network. There is no practical difference in the other statistics, especially for comparing the two networks of authors. Intuitively, filtering out less productive authors is likely to remove less-connected authors from the margins of the network; such authors are less likely to provide important ties connecting otherwise separated communities.

Conclusions

Overall, we believe our results support the hypothesis that the Sloan Foundation-funded researchers consolidated as a community over the course of the program, around 2008-2010. Whereas at the start of the program there were relatively few connections between researchers, especially across domains, by the end of our study period the network was dense and highly interconnected. In particular, while the Sloan Foundation-funded community was initially less connected than the control community, it reached a similar level of consolidation by the end of the study period. This suggests to us that the program was successful in its stated goal of increasing collaboration between researchers.

We note that the most dramatic differences between the MoBE collaboration and the comparison set could not have been detected using the two statistics calculated by [ 7 ], namely, giant component coverage and mean distance. Giant component coverage approached unity for both networks, and the difference in mean distance was relatively small. Mean distance could also be criticized as too sensitive to network size. By contrast, the most striking differences in this case appeared in density and transitivity, which are automatically normalized for network size.

Acknowledgments

The authors would like to thank the many program grantees who assisted us in refining the list of publications. Also thanks to Julia Maritz for compiling the initial list of publications.

2. National Academies. Microbiomes of the Built Environment [Internet]. National Academies Press; 2017.
4. Hicks DJ, Simmons R. The National Robotics Initiative: A Five-Year Retrospective. IEEE Robotics and Automation Magazine. forthcoming.
6. Kuhn T. The Structure of Scientific Revolutions. Second edition. University of Chicago Press; 1970.
9. Gorman M, editor. Trading Zones and Interactional Expertise. MIT Press; 2010.
16. Fujita M, Inoue H, Terano T. Evaluating funding programs through network centrality measures of co-author networks of technical papers. 2017 IEEE International Conference on Big Data (Big Data). IEEE; 2017.
20. R Core Team. R: A Language and Environment for Statistical Computing [Internet]. Vienna, Austria: R Foundation for Statistical Computing; 2018. Available from: https://www.R-project.org

Open access
Published: 22 July 2019

Thematic series on Social Network Analysis and Mining

Rodrigo Pereira dos Santos and Giseli Rabello Lopes

Journal of Internet Services and Applications, volume 10, Article number 14 (2019)


Social networks were first investigated in the social, educational and business areas. Academic interest in this field has been growing since the mid-twentieth century, given the increasing interaction among people, data dissemination and exchange of information. As such, the development and evaluation of new techniques for social network analysis and mining (SNAM) is a current key research area for Internet services and applications. Key topics include contextualized analysis of social and information networks, crowdsourcing and crowdfunding, economics in networks, extraction and treatment of social data, mining techniques, modeling of user behavior and social networks, and software ecosystems. These topics have important areas of application in a wide range of fields, such as academia, politics, security, business, marketing, and science.

1 Introduction

This Thematic Series of the Journal of Internet Services and Applications (JISA) presents a collection of articles on Social Network Analysis and Mining (SNAM). Driven by advances in Computer Science research and practice, SNAM has become an important subject due to (i) the large amount and diversity of data that can be analyzed, (ii) the capacity to process and solve complex analyses efficiently, (iii) the development of new solutions for visualizing complex networks, and (iv) the application of SNAM concepts in different domains.

The study of social networks was first taken up by the social, educational and business communities. Academic interest in this field has been growing since the mid-twentieth century [ 1 ], given the increasing interaction among people, data dissemination and exchange of information. In this scenario, big data sets require more accurate analyses. As such, the development and evaluation of new techniques for social network analysis and mining (SNAM) is a current key research area for Internet services and applications, with important applications in a wide range of fields.

A social network is composed of actors who have relationships with each other. Networks can have a few to many actors (nodes) and one or many types of relationships (arrows) between pairs of actors [ 2 ]. In daily life we encounter many practical examples of social networks: our family, our friends, and our colleagues from university, the gym, work, or casual meetings. Individuals and organizations, seen as nodes in social networks, can be connected for many reasons, such as friendship and genealogy, but also values, visions, ideas, finances, disagreements, conflicts, services, computer networks, air routes, etc. The structure created by such a large number of relationships is complex. Researchers therefore study either the network as a whole, from a sociocentric view (all the links referring to specific relations in a given population), or its social structure from an egocentric view (with links selected from specific people) [ 3 ].
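The difference between the two views can be illustrated with a small adjacency-set graph. The sociocentric view is the whole structure. The sketch below uses a hypothetical function name and toy data to extract an egocentric view: the ego, its direct contacts (alters), and the ties among them.

```python
def ego_network(adj, ego):
    """Egocentric view: the ego, its alters, and the ties among them.

    adj maps each actor to the set of actors they are tied to
    (an undirected toy representation; all names are hypothetical).
    """
    members = {ego} | set(adj.get(ego, ()))
    ties = {
        frozenset((a, b))
        for a in members
        for b in adj.get(a, ())
        if b in members
    }
    return members, ties

# Sociocentric view: every tie in the (toy) population.
whole_network = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "cy"},
    "cy": {"ana", "bo", "dee"},
    "dee": {"cy"},
}
members, ties = ego_network(whole_network, "ana")
```

Here the ego network of "ana" contains "bo" and "cy" and the three ties among the three of them. The tie between "cy" and "dee" is visible only in the sociocentric view.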

In addition, people join and create groups in any society [ 4 ], but the web platform has fostered critical changes in the way people can interact and think about reality. Interactions (i) become easier, (ii) allow a frequent exchange of information, and (iii) turn communication tools and social media (e.g., microblogs, blogs, wikis, Facebook) into means of mass communication that are more agile and far-reaching. As such, the use of social media contributes to the sharing of different types of information, especially in real time, such as personal data, location, opinions and preferences. In this context, SNAM can support the understanding of preferences and associations, the identification of interactions, the recognition of influences, and the comprehension of information flow (context and concepts) among network actors.

Finally, understanding the interactions in a specific scenario can produce concrete results. In an organization, such understanding can help employees avoid problems with knowledge sharing [ 5 ]. In natural science, social networks can aid in the study of the propagation of endemic and epidemic diseases [ 6 ]. In marketing, SNAM can be used as a tool for brand diffusion, or for the study of a market segment to understand how information propagates [ 7 ]. A final example is the use of SNAM to identify criminal networks [ 8 ].

This JISA Thematic Series originates from the 6th Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2017), held in São Paulo, Brazil, on July 4–5, 2017. BraSNAM 2017 was affiliated with the 37th Brazilian Computer Society Congress (CSBC 2017), which is the official event of the Brazilian Computer Society (SBC). BraSNAM is focused on bringing together researchers and professionals interested in social networks and related fields. The workshop aims at providing innovative contributions to the research, development and evaluation of novel techniques for SNAM and applications. Its main goal is to provide a valuable opportunity for multidisciplinary groups to meet and engage in discussions on SNAM.

Continuing in this direction, this JISA Thematic Series targets new techniques for the field of SNAM, mainly fostered by the context of Internet services and applications. We received contributions at various levels: from theoretical foundations to experiments and case studies based on real cases and applications; from modeling to mining and analysis of big data sets; and from different subjects and domains, such as entertainment, public transportation, elections, and personal social circles.

This Thematic Series presents high-quality research and technical contributions. We received six submissions as extended versions of the best papers of BraSNAM 2017. Topics included: analysis of online discussion and comments, complex networks, graph mining, government open data, power metrics, community detection, link assessment, homophily, and sentiment analysis. The five submissions selected for publication, out of the six received, appear in this issue and are summarized in the following section.

2 The papers

Loures et al. [ 9 ] investigate the potential of online comments to describe television series. The authors implement and evaluate several different summarization methods. Their results reveal that a small set of comments can help to describe the corresponding episodes and, when taken together, the series as a whole.

Caminha et al. [ 10 ] use graph mining techniques for the detection of overcrowding and waste of resources in public transport. The authors propose a new data processing methodology for the evaluation of collective transportation systems. The results show that their approach is capable of identifying global imbalances in the system based on an evaluation of the weight distributions of the edges of the supply and demand networks.

Verona et al. [ 11 ] propose metrics for power analysis in political and economic networks based on sociological theory and network topology. The authors present a case study using a network built from data on electoral donations in Brazilian elections, explaining how the metrics can help analyze the power and influence of the different actors (corporations and persons) in this network.

Leão et al. [ 12 ] propose a method to handle social network data that exploits temporal features to improve the detection of communities by existing algorithms. By removing random relationships, the authors observe that social networks converge to a topology with purer social relationships and better-quality community structures.

Finally, Caetano et al. [ 13 ] propose an analysis of political homophily among Twitter users during the 2016 US presidential election. Their results showed that the homophily level increases when there are reciprocal connections, similar speeches or multiplex connections.

3 Paper selection process

The paper selection process was run during 2018, and the papers were published as soon as they were accepted and their online-first versions became ready. Each submission went through two to four revisions before the final decision. We invited leading international experts in the field of SNAM and related topics to form this Thematic Series’ editorial committee. All manuscripts were reviewed by at least three members of this editorial committee. After each review cycle, the guest editors checked the new version to decide whether the authors had carefully addressed the reviewers’ comments; otherwise, the guest editors requested a further review cycle. The papers were reviewed by a total of 19 reviewers. The names of the editorial committee members are listed in the acknowledgements of this editorial.

4 Conclusion

The future of research and practice in the field of SNAM is challenging. Opportunities are many: because the field is multidisciplinary, theoretical and applied research has been published in specialized conferences and journals and also in traditional venues. Based on the papers accepted to this Thematic Series, we can highlight some research gaps. For example, Loures et al. [ 9 ] point out the challenge of abstractive summarization for online comments: it is usually a much more complex task than extractive summarization, since it requires a natural language generation module and a domain-dependent component to process and rank the extracted knowledge. In turn, Caminha et al. [ 10 ] point out the need for a simulator that reproduces the dynamics of human mobility through the bus system of a large metropolis. In this context, data mining could be used to estimate probabilities that represent the current demand on a bus system.

Regarding applications of SNAM in the context of presidential elections, Verona et al. [ 11 ] point out the challenge of redesigning the power metric to show relative values within the network instead of large absolute values. Moreover, information about company owners should be integrated in order to reveal hidden connections behind donations and politicians. In turn, Caetano et al. [ 13 ] point out the need for further temporal analysis of political homophily, correlating it with external events that may have influenced the users’ sentiments. This effort could enable user classification through data mining techniques to identify candidates’ advocates, political bots, and other actors. Finally, regarding community detection, Leão et al. [ 12 ] point out the challenges of adopting different approaches to community detection, considering additional algorithms that explore temporal aspects or identify overlapping communities, and evaluating filtered networks. Moreover, different alternatives for measuring the strength of ties should be investigated.

Overall, this Thematic Series yields several meaningful overarching results:

SNAM researchers and practitioners recognize the importance of sentiment analysis for identifying conflicts and agreements, as well as social trends and movements, in different domains. As such, new methods and techniques should be developed that build on the large body of existing empirical studies on this topic;

The dynamic nature of social networks makes community detection a hard problem. Many algorithms exist; however, the treatment of randomness and noise in social relations requires further investigation. In addition, the assessment of those relations over time is also a topic of interest in SNAM;

Another challenge in the area is the understanding of social power and the way it manifests in social networks. In this context, power is tightly related to the notion of influence and authority. Research can vary from the development and use of SNAM algorithms and tools to the theorization based on qualitative studies (e.g., case studies, ethnography, sociotechnical approaches);

SNAM opens opportunities to investigate different types of systems, such as (i) systems-of-systems: a set of constituent software-intensive systems that are managerially and operationally independent, and present some emergent behavior and evolutionary development (e.g., smart cities, transportation, air space, flood monitoring), and (ii) software ecosystems: a set of actors and artifacts as well as their relations over a common technological platform (e.g., iOS, Android, Eclipse, SAP);

Finally, SNAM can support research on new trends in collaborative systems, such as crowdsourcing, free and open source software development, accountability, transparency, and community engagement. A common interest lies in how to improve information visualization and recommendation based on actors’ characteristics and behaviors, as well as on changes in their relations over time.

Berkowitz SD. An introduction to structural analysis: the network approach to social research. Toronto: Butterworth; 1982.

Wasserman S, Faust K. Social network analysis: methods and applications. Cambridge: Cambridge University Press; 1994.

Hanneman RA, Riddle M. Introduction to social network methods. Riverside, CA: University of California, Riverside; 2005.

Castells M. The rise of the network society. The information age: economy, society and culture, vol. 1. 2nd ed. Wiley-Blackwell; 2000.

Studart RM, Oliveira J, Faria FF, Ventura LVF, Souza JM, Campos MLM. Using social networks analysis for collaboration and team formation identification. In: Proceedings of the 15th international conference on computer supported cooperative work in design, Lausanne; 2011. p. 562–9.

Mikolajczyk RT, Kretzschmar M. Collecting social contact data in the context of disease transmission: prospective and retrospective study designs. Soc Networks. 2008;30(2):127–35.

Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining. Washington, D.C., USA; 2003.

Svenson P, Svensson P, Tullberg H. Social network analysis and information fusion for anti-terrorism. In: Proceedings of the conference on civil and military readiness. Enköping, Sweden; 2006.

Loures TC, de Melo PO, Veloso AA. Is it possible to describe television series from online comments? J Internet Serv Appl. 2018;9:25.

Caminha C, Furtado V, Pinheiro V, Ponte C. Graph mining for the detection of overcrowding and waste of resources in public transport. J Internet Serv Appl. 2018;9:22.

Verona L, Oliveira J, Hisse JVC, Campos MLM. Metrics for network power based on Castells’ network theory of power: a case study on Brazilian elections. J Internet Serv Appl. 2018;9:23.

Leão JC, Brandão MA, Vaz de Melo POS, Laender AHF. Who is really in my social circle? J Internet Serv Appl. 2018;9:23.

Caetano JA, Lima HS, Santos MF, Marques-Neto HT. Using sentiment analysis to define twitter political users’ classes and their homophily during the 2016 American presidential election. J Internet Serv Appl. 2018;9:18.

Acknowledgments

We thank all the authors, reviewers, editors-in-chief, and staff for the great work, which supported this Thematic Series on a very important topic both for the research community and for the industry. In particular, we thank all the editorial committee members: Alessandro Rozza ( lastminute.com Group, ITALY), Altigran Soares da Silva (Federal University of Minas Gerais, BRAZIL), Antonio Loureiro (Federal University of Minas Gerais, BRAZIL), Ari-Veikko Anttiroiko (Tampere University), Artur Ziviani (National Laboratory for Scientific Computing, BRAZIL), Bernardo Pereira Nunes (Pontifical Catholic University of Rio de Janeiro, BRAZIL), Claudio Miceli de Farias (Federal University of Rio de Janeiro, BRAZIL), Daniel Batista (University of São Paulo, BRAZIL), Flavia Bernardini (Federal Fluminense University, BRAZIL), Giacomo Livan (University College London, UK), Isabela Gasparini (Santa Catarina State University, BRAZIL), Jesús Mena-Chalco (Federal University of ABC, BRAZIL), Jonice Oliveira (Federal University of Rio de Janeiro, BRAZIL), Leandro Augusto Silva (Mackenzie Presbyterian University, BRAZIL), Luciano Antonio Digiampietri (University of São Paulo, BRAZIL), Luiz André Portes Paes Leme (Federal Fluminense University, BRAZIL), Mirella Moro (Federal University of Minas Gerais, BRAZIL), Raimundo Moura (Federal University of Piauí, BRAZIL), Yosh Halberstam (University of Toronto, CANADA). The voluntary work of these researchers was crucial for this Thematic Series.

Author information

Authors and affiliations

Department of Applied Informatics, Federal University of the State of Rio de Janeiro – UNIRIO, Rio de Janeiro, Brazil

Rodrigo Pereira dos Santos

Department of Computer Science, Federal University of Rio de Janeiro – UFRJ, Rio de Janeiro, Brazil

Giseli Rabello Lopes

Contributions

Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Rodrigo Pereira dos Santos.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

dos Santos, R.P., Lopes, G.R. Thematic series on Social Network Analysis and Mining. J Internet Serv Appl 10, 14 (2019). https://doi.org/10.1186/s13174-019-0113-z

Published : 22 July 2019

DOI : https://doi.org/10.1186/s13174-019-0113-z

Road network analysis of Guwahati city using GIS

Research Article
Published: 23 July 2019
Volume 1, article number 906 (2019)

Debashis Das (ORCID: orcid.org/0000-0003-2116-9844), Anil Kr. Ojha, Harlin Kramsapi, Partha P. Baruah & Mrinal Kr. Dutta

From medieval times to the present day, history clearly indicates that the development of a region has been, and still is, a function of a good transportation network. Today, after experiencing major problems such as traffic congestion, delays, pollution, increased vehicle operating costs and road accidents, society demands an efficient and unobstructed road network. With these needs and constraints on traffic movement in mind, analyzing the digitized road network of a city or town can be one of the best remedies. Such analysis is well suited to ArcGIS, a Geographic Information System (GIS) software for creating, analyzing and compiling maps to obtain information. In the present study, a road network map of Guwahati city was prepared by digitizing and analyzing its existing road network, and the shortest route between two places was determined, with the aim of substantially reducing traffic-related problems. Network Analyst is a special analysis tool in ArcGIS that not only finds the closest facility in a network of digitized interconnected lines but also helps optimize routes during floods and emergency responses. One of the most useful models that network analysis can generate is the shortest route between a required origin and destination. The analysis is based on network attributes such as distance travelled, travel time and cost, barriers, and vehicle restrictions. All the important interconnected roads within Guwahati city were digitized in the GIS environment and used for this purpose.

1 Introduction

The Indian highway network (IHN) is one of the busiest road networks in the world: as of 2010 it constituted 2% of all roads in India but handled 40% of total road traffic [ 1 ]. Traffic congestion occurs on certain parts and stretches of roads in Guwahati. Inadequate parking space; exorbitant bus, auto rickshaw, rickshaw and taxi fares; time-consuming and uncomfortable journeys; and frequent accidents are among the causes and consequences of the city's traffic problems. Several wards of the Guwahati Municipal Corporation area also face flood inundation and water-logging during the rainy season. Rapid urbanization has increased housing and, with it, the extent of rooftops, driveways, streets and other impervious or hard surfaces [ 2 ]. Hence, to overcome these problems and maintain sustainability, a proper analysis of the present road network is crucial to help people reach their destinations with ease and at reduced cost and time.

A GIS is an organized collection of computer hardware, software, geographic data, and personnel designed to efficiently capture, store, update, manipulate, analyze, and display all forms of geographically referenced information. ArcGIS Network Analyst enables users to dynamically model realistic network conditions, including turn restrictions, speed limits, height restrictions, and traffic conditions at different times of the day [ 3 ]. ArcGIS has proved to be one of the most user-friendly, effective and time-saving tools in both traffic engineering and transportation planning.

2 Methodology

For the network analysis of Guwahati city, a street basemap derived from Google Earth satellite imagery was added in ArcGIS and geo-referenced to obtain coordinates. Geo-referencing involves aligning an image to a coordinate system [ 4 ]. The study area was enlarged to an easily workable scale (Fig. 1 ).

Fig. 1 Image showing the enlarged study area (Guwahati city), marked in red (Google Maps 2018)
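The text does not say which transformation was used during geo-referencing. As a minimal sketch, assuming a first-order (affine) transformation, one of the standard options in GIS georeferencing tools, three control points that pair image (column, row) positions with known map coordinates are enough to solve for the six transform coefficients. The function names and the control-point values in the usage note are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Coeffs = Tuple[float, float, float, float, float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; choose three non-collinear points")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]          # copy, then replace column `col` with b
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def fit_affine(pixels: List[Point], world: List[Point]) -> Coeffs:
    """Fit X = a*col + b*row + c and Y = d*col + e*row + f from three control points."""
    if len(pixels) != 3 or len(world) != 3:
        raise ValueError("this sketch expects exactly three control points")
    rows = [[px, py, 1.0] for px, py in pixels]
    a, b, c = solve3(rows, [wx for wx, _ in world])
    d, e, f = solve3(rows, [wy for _, wy in world])
    return a, b, c, d, e, f


def pixel_to_world(coeffs: Coeffs, col: float, row: float) -> Point:
    """Map an image (column, row) position to map coordinates."""
    a, b, c, d, e, f = coeffs
    return a * col + b * row + c, d * col + e * row + f
```

With three hypothetical control points, `fit_affine([(0, 0), (1000, 0), (0, 1000)], [(91.60, 26.20), (91.80, 26.20), (91.60, 26.00)])` maps the image position (500, 500) to approximately (91.70, 26.10). In practice, more than three control points are usually collected and fitted by least squares, which this sketch does not attempt.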

Layers are the mechanism used to display geographic datasets in ArcGIS. For the analysis, the layers were categorized as major roads (mainly the highways), minor roads (city roads connecting the highways and other urban localities), and lanes and by-lanes, in addition to wards. The table of contents lists all the layers on the map and shows what the features in each layer represent [ 5 ]. After the layers were created, the base map was digitized layer by layer to form a skeleton of the existing road network. Digitization is the process of making an electronic version of a real-world object or event, enabling it to be stored, displayed and manipulated on a computer and disseminated over networks [ 6 ]. The entire Guwahati Municipal Corporation (GMC) area is divided into 31 municipal wards, and each municipal ward is further divided into 2, 3 or 4 Area Sabhas (subwards). A ward map was drawn as a layer over Guwahati city based on the official GMC map (Figs. 2 , 3 ).

Fig. 2 Road network map of Guwahati city drawn in ArcGIS

Fig. 3 Ward map of Guwahati city

Network analysis in GIS rests firmly on the theoretical foundation of two mathematical subdisciplines, graph theory and topology. The most common and familiar implementations of network models represent the networks with which much of the population interacts every day: transportation and communications networks [ 7 ]. Routing, the act of selecting a course of travel, is arguably the most fundamental logistical operation in network analysis. Although network analysis in GIS has largely been limited to the simplest routing functions, the recent past has seen the development of object-oriented data structures, the introduction of dynamic networks [ 8 ], the ability to generate multimodal networks, and the use of simulation methods to generate solutions to network problems. There are, of course, many important network design problems that are very difficult to solve optimally because of their combinatorial complexity [ 9 ]. To allocate and provide urban facilities in an area with a complex road network, it is necessary to determine both the shortest route and the travel demand (trips generated and attracted) between the facility and the area concerned. Finding the shortest path between two locations in a road network is a problem solved by various map services and commercial navigation products [ 10 ]. Several highly efficient algorithms exist for determining the optimal route; the most widely cited was developed by Edsger Dijkstra in 1959.

The algorithm [ 11 ] is summarized below for a graph G = (V, E), where V is the set of vertices and E is the set of edges, each edge (u, v) having a non-negative weight w(u, v).

Dijkstra’s algorithm keeps two sets of vertices:

S, the set of vertices whose shortest paths from the source have already been determined;

V − S, the remaining vertices.

The other data structures needed are:

d, an array of the best current estimates of the shortest-path distance from the source to each vertex;

pi, an array of predecessors for each vertex.

The basic mode of operation is:

1. Initialize d and pi.

2. Set S to empty.

3. While there are still vertices in V − S:

(a) Sort the vertices in V − S according to the current best estimate of their distance from the source.

(b) Add u, the closest vertex in V − S, to S.

(c) Relax all the vertices v still in V − S that are connected to u: if d[u] + w(u, v) < d[v], set d[v] ← d[u] + w(u, v) and pi[v] ← u.

Pseudocode for Dijkstra’s algorithm:

d[s] ← 0 (the distance from the source vertex s to itself is zero)
for all v ∈ V − {s}: d[v] ← ∞ and pi[v] ← nil (all other distances start at infinity)
S ← ∅ (S, the set of visited vertices, is initially empty)
Q ← V (Q, the priority queue, initially contains all vertices)
while Q ≠ ∅ (while the queue is not empty):
    u ← the vertex in Q with minimum d[u]; remove u from Q
    S ← S ∪ {u} (add u to the set of visited vertices)
    for all v ∈ neighbors[u]:
        if d[v] > d[u] + w(u, v) (a shorter path to v has been found):
            d[v] ← d[u] + w(u, v) and pi[v] ← u (record the new shortest distance and predecessor)
return d and pi (each shortest path can be traced back through pi)
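The pseudocode above can be turned into a short, runnable sketch. This version is an illustration rather than the study's implementation: it replaces the repeated sort of V − S with a binary heap (Python's standard `heapq` module), a common optimization, and the graph and vertex names in the usage note are hypothetical. It returns both the distance array d and the predecessor array pi, and the helper `trace_back` performs the optional trace-back step.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Adjacency list: vertex -> list of (neighbour, non-negative edge weight w(u, v)).
# Every vertex, including those with no outgoing edges, must appear as a key.
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, Optional[str]]]:
    """Return (d, pi): shortest distances from `source` and each vertex's predecessor."""
    d = {v: math.inf for v in graph}                          # best estimate per vertex
    pi: Dict[str, Optional[str]] = {v: None for v in graph}   # predecessor on best path
    d[source] = 0.0
    visited = set()                                           # the set S
    queue = [(0.0, source)]                                   # Q as a binary heap keyed on d
    while queue:
        _, u = heapq.heappop(queue)                           # vertex with minimum d
        if u in visited:
            continue                                          # outdated heap entry; skip it
        visited.add(u)
        for v, w in graph[u]:
            if w < 0:
                raise ValueError("Dijkstra's algorithm requires non-negative weights")
            if d[u] + w < d[v]:                               # relax edge (u, v)
                d[v] = d[u] + w
                pi[v] = u
                heapq.heappush(queue, (d[v], v))
    return d, pi


def trace_back(pi: Dict[str, Optional[str]], target: str) -> List[str]:
    """Follow predecessors from `target` back to the source and reverse the result.

    Only meaningful when d[target] is finite (target reachable from the source).
    """
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = pi[node]
    return path[::-1]
```

For example, on the toy graph `{"A": [("B", 4.0), ("C", 1.0)], "B": [("D", 1.0)], "C": [("B", 2.0), ("D", 5.0)], "D": []}`, `dijkstra(g, "A")` gives a distance of 4.0 to D, and `trace_back(pi, "D")` returns `["A", "C", "B", "D"]`. Pushing a new heap entry on every improvement and skipping outdated entries when they are popped keeps the algorithm correct without a decrease-key operation.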

To run the analysis over the digitized road network, a network dataset was created, resulting in a layer of junctions and edges topologically connected to each other. Ward analysis was done with the Intersection tool to obtain statistics for each category of road created as a layer. Origin and destination points were then selected and the network was solved to determine the shortest route, as required for this study (Fig. 4 ).

Fig. 4 Junctions and edges of the digitized road network generated by the Network Analyst toolbar in ArcGIS

An attribute table of the digitized road network was generated, listing the latitudes and longitudes (in degrees), road ID and name, and the distances (in meters) between junctions (or from origin to destination) (Fig. 5 , Table 1 ).

Fig. 5 Image showing the attribute table of the digitized road network of Guwahati city
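In graph terms, each digitized road segment in the attribute table becomes an edge between two junctions, weighted by its length. The sketch below assumes a simplified table with hypothetical columns (`road_id`, `from_junction`, `to_junction`, `length_m`) rather than the actual layout of Table 1, which also holds coordinates and road names. It treats every road as two-way, an assumption that a real network dataset can override with one-way restrictions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (road_id, from_junction, to_junction, length_m)
Row = Tuple[str, str, str, float]
Network = Dict[str, List[Tuple[str, float]]]


def build_network(rows: Iterable[Row]) -> Network:
    """Turn segment rows into an undirected, length-weighted adjacency list."""
    adjacency: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for road_id, a, b, length_m in rows:
        if length_m <= 0:
            raise ValueError(f"segment {road_id} has a non-positive length")
        # Two-way road (assumption): register the segment in both directions.
        adjacency[a].append((b, length_m))
        adjacency[b].append((a, length_m))
    return dict(adjacency)


def junction_degrees(network: Network) -> Dict[str, int]:
    """Number of segments meeting at each junction; degree 1 marks a dead end."""
    return {junction: len(edges) for junction, edges in network.items()}
```

The resulting adjacency list is the kind of structure a shortest-path routine consumes, and the junction degrees give a quick topological sanity check: an unexpected degree-1 junction in the middle of the city often points to a digitizing gap where two lines fail to meet.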

4 Results and discussion

The results can be interpreted as follows:

Guwahati city is divided into 31 main municipal wards, which are further subdivided to form a total of 90 wards. The ward-road network analysis shows that:

Ward 22C has the largest area (9.954 km²) and Ward 12B the smallest (0.103 km²). The 3 major roads pass through 44 of the 90 wards of Guwahati city; Ward 1B contains the largest number of major roads (2), with a total length of 7.873 km. Thus 48.88% of the wards have major roads (Figs. 6 , 7 ).

Fig. 6 Map showing ward-wise density of major roads in ArcGIS

Fig. 7 Map showing ward-wise density of minor roads in ArcGIS

Minor roads pass through 85 wards of the city, of which Ward 10A contains the largest number of minor roads (9), with a total length of 4.554 km. Lanes and by-lanes cover all the wards; Ward 6C has the largest number of lanes (148), with a total length of 37.305 km. Thus 94.44% of the wards have minor roads.

Lanes and by-lanes cover all the wards.
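The coverage percentages are simple shares of the 90 wards. The check below recomputes them from the ward counts reported above (44 wards with major roads, 85 with minor roads, all 90 with lanes and by-lanes), rounding to two decimals. Note that 44/90 = 48.888..., which rounds to 48.89%, so the 48.88% quoted above appears to be truncated rather than rounded.

```python
TOTAL_WARDS = 90  # 31 municipal wards, subdivided into 90 wards

# Number of wards containing at least one road of each class, as reported above
wards_with = {"major roads": 44, "minor roads": 85, "lanes and by-lanes": 90}


def coverage_percent(count: int, total: int = TOTAL_WARDS) -> float:
    """Share of wards containing at least one road of a class, in percent."""
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and the total number of wards")
    return 100.0 * count / total


for road_class, count in wards_with.items():
    print(f"{road_class}: {count}/{TOTAL_WARDS} wards = {coverage_percent(count):.2f}%")
```

The same helper can be reused for any other ward-level count, for example the number of wards judged to have insufficient roads.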

However, in terms of existing road density, that is, the number of roads of each class in each ward, more than half of the 90 wards do not have sufficient roads. This deficiency is mostly seen in the outer, rural-urban fringe areas of the city, including rural villages. Limited roads and a poor network are among the causes of traffic congestion, which leads to vehicle delays, increased vehicle operating costs, more air and noise pollution, emissions of poisonous gases such as carbon monoxide (CO), road accidents, warming of the surrounding urban area, and human frustration leading to road rage. The field survey showed that, for feeder trips, both transit and paratransit vehicles use routes other than the GIS shortest route, for numerous reasons such as transport movement policy, poor pavement conditions, and unawareness among non-commuters (Fig. 8 ).

Fig. 8 Map showing ward-wise density of lanes and by-lanes

A network analysis layer stores the inputs, properties and results of a network analysis and is always built on a network dataset. This paper uses the route analysis layer, and the analysis addresses real-world network problems independently of the road hierarchy (major roads, minor roads and lanes). GIS networks consist of interconnected lines (edges) and intersections (junctions) that represent routes along which people, goods, etc. can travel. Network analysis helps in modeling, planning and managing moderate to heavy traffic routes; one common type is finding the shortest path between two points. Junctions (nodes) and edges carry attributes that support this modeling. Edges and junctions are topologically connected: edges must connect to other edges at junctions, and flow from an edge is transferred to other edges through junctions.

Determining the shortest route between an origin and a destination with GIS Network Analyst not only helps tourists and business people reach tourist sites and trade centers with ease but can also reduce cost and traffic congestion, resulting in lower emissions of pollutants. Dijkstra (1959) proposed a graph search algorithm that solves the single-source shortest path problem for any graph with non-negative edge costs [ 12 ]. Network attributes are properties of the network elements that control movement over the network; they allow the shortest route to be found based on attributes such as distance, vehicle restrictions and turn restrictions. The shortest routes presented here assume that traffic congestion is not considered and are calculated on road distances (Figs. 9 , 10 ; Table 2 ).

Fig. 9 Image showing the shortest route (in blue) between the Khanapara old ASTC point and the Jalukbari flyover point in ArcGIS

Fig. 10 Image showing the shortest route (in blue) between Paltan Bazar and Lokhra Bamunpara Chowk in ArcGIS
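Network attributes play two roles in such an analysis: a cost (impedance) attribute, here road length, is what the route minimizes, while restriction attributes remove edges from consideration for a given vehicle type. The sketch below illustrates that split on a small hypothetical network. The `Edge` class, its `restricted_for` field and the vehicle class names are illustrative assumptions, not ArcGIS Network Analyst attribute names or APIs.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    to: str
    length_m: float                                # cost (impedance) attribute
    restricted_for: FrozenSet[str] = frozenset()   # hypothetical restriction attribute


def shortest_route(graph: Dict[str, List[Edge]], origin: str, destination: str,
                   vehicle: str) -> Tuple[float, List[str]]:
    """Dijkstra on total length, skipping edges restricted for `vehicle`."""
    dist: Dict[str, float] = {origin: 0.0}
    prev: Dict[str, Optional[str]] = {origin: None}
    queue = [(0.0, origin)]
    done = set()
    while queue:
        d_u, u = heapq.heappop(queue)
        if u in done:
            continue                               # outdated queue entry
        done.add(u)
        if u == destination:
            break                                  # destination settled: its distance is final
        for e in graph.get(u, []):
            if vehicle in e.restricted_for:
                continue                           # restriction: edge not traversable
            new_d = d_u + e.length_m
            if new_d < dist.get(e.to, math.inf):
                dist[e.to] = new_d
                prev[e.to] = u
                heapq.heappush(queue, (new_d, e.to))
    if destination not in done:
        return math.inf, []                        # unreachable for this vehicle
    path: List[str] = []
    node: Optional[str] = destination
    while node is not None:
        path.append(node)
        node = prev[node]
    return dist[destination], path[::-1]
```

On a toy network where O connects to D either via X (800 m + 500 m) or via Y (1000 m + 700 m), and the X to D link is restricted for trucks, a car gets the 1300 m route O, X, D, while a truck is sent along the 1700 m route O, Y, D. The cost attribute is unchanged; only the set of usable edges differs.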

To travel from Nepali Mandir, Paltan Bazar to the ISBT, Guwahati, there are three relatively short routes; their distances and intermediate points are given below:

via Lal Ganesh, Distance = 12 km

via Fayez Ahmed Road, Distance = 9.3 km

via Dhirenpara Road, Distance = 8.9 km
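The three alternatives above can be compared with simple arithmetic: the route via Dhirenpara Road is the shortest, and the others can be expressed as extra distance relative to it. The sketch uses only the three distances reported in the text.

```python
routes_km = {
    "via Lal Ganesh": 12.0,
    "via Fayez Ahmed Road": 9.3,
    "via Dhirenpara Road": 8.9,
}

best = min(routes_km, key=routes_km.get)   # route with the smallest distance
best_km = routes_km[best]

for name, km in sorted(routes_km.items(), key=lambda item: item[1]):
    extra = km - best_km
    print(f"{name}: {km:.1f} km, +{extra:.1f} km "
          f"({100 * extra / best_km:.1f}% longer than {best})")
```

On these figures, the Fayez Ahmed Road route is about 0.4 km (roughly 4.5%) longer, and the Lal Ganesh route about 3.1 km (roughly 34.8%) longer, than the route via Dhirenpara Road.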

Assigning different types of vehicular traffic to alternative shortest routes, based on vehicles’ operating characteristics and mileage performance, together with a proper revision of the relevant transportation policy, could help tackle traffic congestion.

5 Conclusion

Thus, network analysis is one of the most powerful tools for dealing with real-world transportation problems. It is reliable, user-friendly and solves network problems efficiently. It has replaced conventional methods of analysis and saves a lot of time and work, and it has become a valuable aid in transportation planning and origin-destination studies. It also has high potential for analyzing closest facilities and service areas in a network. Ward analysis is another simple way to determine the present density and number of roads in service, which in turn can support transportation planning. Information about the shortest route would help drivers and road users save transportation cost and time and avoid congestion on commuter routes by diverting excess flow to less travelled, shorter routes. Improving the road structure of the shortest routes by widening them and imposing parking restrictions would be beneficial. Signal timing at intersections of shortest routes should be adequate to serve maximum flow and avoid delay. Laws should be strictly enforced on such routes to maintain discipline and stimulate urban development. Depending on transportation demand, the number of minor roads (provided they form shortest routes) should be increased in the wards concerned in order to:

Avoid congestion on the existing single routes.

Increase connectivity between the rural-urban fringe and the major roads, so that traffic need not enter the Central Business District (CBD).

Overall, GIS has many direct and indirect applications in transportation, since studying existing road network problems and their solutions can greatly benefit the economy of a locality. Implementing advanced technologies can help maintain sustainability, and transportation-related GIS tasks should be preferred for better outcomes in the future.

6 Research and Development (R&D)

This work used distance as the impedance. In many cases, however, the shortest route in terms of distance is not the shortest in terms of time; a prime reason is the heavy flow of vehicles concentrated on the single shortest route. Further work with improved accuracy and efficiency should therefore use time as the impedance factor. Real-time traffic data for different routes could be obtained from Google Maps, whose traffic feature relies on GPS signal data from smartphones [ 13 ]; it calculates users’ speeds, which helps identify affected routes. The shortest route algorithm can reduce travel time by 20–22%, depending on travel distance [ 14 ]. Although the general public already finds shortest routes between two points with Google Maps, this kind of analysis also provides long-range transportation planning solutions, such as determining the zones or areas where urban activities (shopping malls, schools and colleges, offices, industries, health centers, recreational centers, etc.) should be located, depending on travel demand and on the availability of the required class of roads to make them accessible to people both within and outside the locality. Since internet and mobile connectivity is limited or absent in many places around the world, a shortest route map can serve as a replacement for Google Maps there. In addition, research can be carried out to determine paths for higher modes of public transport such as Light Rail Transit (LRT), metro trains and Bus Rapid Transit (BRT). The choice of shortest route under variations in traffic across times of day, seasons, climate and festivals, and during emergencies such as accidents and fires, can also be studied.
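The difference between distance and time as impedance can be shown on a toy network: the same shortest-path search returns different routes depending on which attribute is used as the edge cost. Everything in the sketch is hypothetical (junction names, lengths and congested speeds are invented for illustration), and each edge's travel time is taken as length divided by an assumed average speed.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Edge: (neighbour, length_km, assumed average speed in km/h); values are hypothetical.
Edge = Tuple[str, float, float]
Graph = Dict[str, List[Edge]]
CostFn = Callable[[float, float], float]


def by_distance(length_km: float, speed_kmh: float) -> float:
    """Impedance = road length in km (the assumption used in the study)."""
    return length_km


def by_time(length_km: float, speed_kmh: float) -> float:
    """Impedance = travel time in minutes; speeds must be positive."""
    return 60.0 * length_km / speed_kmh


def best_route(graph: Graph, origin: str, destination: str,
               cost: CostFn) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm with a pluggable impedance function."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == destination:
            return total, path              # first pop of the destination is optimal
        if node in settled:
            continue
        settled.add(node)
        for nxt, length_km, speed_kmh in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (total + cost(length_km, speed_kmh), nxt, path + [nxt]))
    raise ValueError(f"{destination} is not reachable from {origin}")
```

With `g = {"A": [("B", 4.0, 10.0), ("C", 3.0, 40.0)], "C": [("B", 3.0, 40.0)]}`, distance as impedance picks the direct 4 km link A to B, but at an assumed congested speed of 10 km/h that link takes 24 minutes, so time as impedance picks the 6 km detour via C, which takes 9 minutes at 40 km/h.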

Mukherjee S (2012) Statistical analysis of the road network of India. Pramana J Phys 1:2. https://doi.org/10.1007/s12043-012-0336-z

Barman P, Goswami DC (2009) Floodzone of Guwahati municipal corporation area using GIS technology. In: 10th ESRI India user conference 2009

Kumar P, Kumar D (2016) Network analysis using GIS techniques: a case of Chandigarh city. Int J Sci Res (IJSR) 5(2):409–411

Herbei MH, Ciolac V, Smuleac A, Nistor E, Ciolac L (2010) Georeferencing of topographical maps using the software ArcGIS. Res J Agric Sci 42(3):595–606

Laixing L, Deren L, Zhenfeng S (2008) Research on geospatial information sharing platform based on ArcGIS server. In: The International archives of the photogrammetry, remote sensing and spatial information sciences, vol 37, Part B4, pp 791–795

Manjula KR, Jyothi S, Varma AK (2010) Digitizing the forest resource map using ArcGIS. IJCSI Int J Comput Sci Issues 7:300

Curtin KM (2007) Network analysis in geographic information science: review, assessment and projections. Cartogr Geogr Inf Sci 34(2):103–111

Sutton JC, Wyman MM (2000) Dynamic location: an iconic model to synchronize temporal and spatial transportation data. Transp Res Part C Emerg Technol 8(1–6):37–52

Magnanti TL, Wong RT (1984) Network design and transportation planning: models and algorithms. Transp Sci 18(1):1–55

Wu L, Xiao X, Deng D, Cong G, Zhu AD, Zhou S (2012) Shortest path and distance queries on road networks: an experimental evaluation. Proc VLDB Endow 5(5):406–417

Khaing O, Wai H, Myat E (2018) Using Dijkstra’s algorithm for public transportation system in Yangon based on GIS. Int J Sci Eng Appl 7(11):442–447

Akpofure ON, Paul AN (2017) An application of Dijkstra’s algorithm to shortest route problem. IOSR J Math (IOSR-JM). e-ISSN: 2278-5728

Hajnalka KA (2018) Road network analysis using GIS techniques in the interest of finding the optimal routes for emergency situations. Case study: Cluj-Napoca (Romania). Geographia Napocensis, year XII, no. 1

Ahmed S, Ibrahim RF, Hefny HA (2017) GIS based Network Analysis for the Roads Network of the Greater Cairo Area. In: Proceedings of the international conference on applied research in computer science and engineering ICAR’17, vol 2144, no 1 at http://ceur-ws.org

Author information

Authors and affiliations

Department of Civil Engineering, Jorhat Engineering College, Jorhat, Assam, 785007, India

Debashis Das, Anil Kr. Ojha, Harlin Kramsapi, Partha P. Baruah & Mrinal Kr. Dutta

Corresponding author

Correspondence to Debashis Das.

Ethics declarations

Conflict of interest

On behalf of all the authors, the corresponding author states that there is no conflict of interest.

About this article

Das, D., Ojha, A.K., Kramsapi, H. et al. Road network analysis of Guwahati city using GIS. SN Appl. Sci. 1, 906 (2019). https://doi.org/10.1007/s42452-019-0907-4

Received : 17 February 2019

Accepted : 10 July 2019

Published : 23 July 2019

DOI : https://doi.org/10.1007/s42452-019-0907-4

Digitization of maps
Road network
Network analysis
Shortest route

Hazard ratio for obesity was modeled according to mean daily step counts and 25th, 50th, and 75th percentile PRS for body mass index. Shaded regions represent 95% CIs. Model is adjusted for age, sex, mean baseline step counts, cancer status, coronary artery disease status, systolic blood pressure, alcohol use, educational level, and a PRS × mean steps interaction term.

Mean daily steps and polygenic risk score (PRS) for higher body mass index are independently associated with the hazard of obesity. Hazard ratios model the difference between the 75th and 25th percentiles for continuous variables. CAD indicates coronary artery disease; SBP, systolic blood pressure.

Each point estimate is indexed to a hazard ratio for obesity of 1.00 (BMI [calculated as weight in kilograms divided by height in meters squared] ≥30). Error bars represent 95% CIs.

eTable. Cumulative Incidence Estimates of Obesity Based on Polygenic Risk Score for Body Mass Index and Mean Daily Steps at 1, 3, and 5 Years

eFigure 1. CONSORT Diagram

eFigure 2. Risk of Incident Obesity Modeled by Mean Daily Step Count and Polygenic Risk Scores Adjusted for Baseline Body Mass Index

Data Sharing Statement

Brittain EL, Han L, Annis J, et al. Physical Activity and Incident Obesity Across the Spectrum of Genetic Risk for Obesity. JAMA Netw Open. 2024;7(3):e243821. doi:10.1001/jamanetworkopen.2024.3821


Physical Activity and Incident Obesity Across the Spectrum of Genetic Risk for Obesity

1 Division of Cardiovascular Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
2 Center for Digital Genomic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
3 Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
4 Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee
5 Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee
6 Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee
7 Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
8 Department of Biomedical Engineering, Vanderbilt University Medical Center, Nashville, Tennessee
9 Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
10 Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee

Question: Does the degree of physical activity associated with incident obesity vary by genetic risk?

Findings: In this cohort study of 3124 adults, individuals at high genetic risk of obesity needed higher daily step counts to reduce the risk of obesity than those at moderate or low genetic risk.

Meaning: These findings suggest that individualized physical activity recommendations that incorporate genetic background may reduce obesity risk.

Importance: Despite consistent public health recommendations, obesity rates in the US continue to increase. Physical activity recommendations do not account for individual genetic variability, increasing risk of obesity.

Objective: To use activity, clinical, and genetic data from the All of Us Research Program (AoURP) to explore the association of genetic risk of higher body mass index (BMI) with the level of physical activity needed to reduce incident obesity.

Design, Setting, and Participants: In this US population–based retrospective cohort study, participants were enrolled in the AoURP between May 1, 2018, and July 1, 2022. Enrollees in the AoURP who were of European ancestry, owned a personal activity tracking device, and did not have obesity up to 6 months into activity tracking were included in the analysis.

Exposure: Physical activity expressed as daily step counts and a polygenic risk score (PRS) for BMI, calculated as weight in kilograms divided by height in meters squared.

Main Outcome and Measures: Incident obesity (BMI ≥30).

Results: A total of 3124 participants met inclusion criteria. Among 3051 participants with available data, 2216 (73%) were women, and the median age was 52.7 (IQR, 36.4-62.8) years. The total cohort of 3124 participants walked a median of 8326 (IQR, 6499-10 389) steps/d over a median of 5.4 (IQR, 3.4-7.0) years of personal activity tracking. The incidence of obesity over the study period increased from 13% (101 of 781) in the lowest PRS quartile to 43% (335 of 781) in the highest PRS quartile (P = 1.0 × 10⁻²⁰). Comparing the 75th with the 25th percentile, the BMI PRS was associated with an 81% increase in obesity risk (P = 3.57 × 10⁻²⁰), whereas mean step count was associated with a 43% reduction (P = 5.30 × 10⁻¹²). Individuals with a PRS at the 75th percentile would need to walk a mean of 2280 (95% CI, 1680-3310) more steps per day (11 020 total) than those at the 50th percentile to have a comparable risk of obesity. To have a comparable risk of obesity to individuals at the 25th percentile of PRS, those at the 75th percentile with a baseline BMI of 22 would need to walk an additional 3460 steps/d; with a baseline BMI of 24, an additional 4430 steps/d; with a baseline BMI of 26, an additional 5380 steps/d; and with a baseline BMI of 28, an additional 6350 steps/d.

Conclusions and Relevance: In this cohort study, the association between daily step count and obesity risk across genetic background and baseline BMI was quantified. Population-based recommendations may underestimate the physical activity needed to prevent obesity among those at high genetic risk.

In 2000, the World Health Organization declared obesity the greatest threat to the health of Westernized nations. 1 In the US, obesity accounts for over 400 000 deaths per year and affects nearly 40% of the adult population. Despite the modifiable nature of obesity through diet, exercise, and pharmacotherapy, rates have continued to increase.

Physical activity recommendations are a crucial component of public health guidelines for maintaining a healthy weight, with increased physical activity being associated with a reduced risk of obesity. 2 - 4 Fitness trackers and wearable devices have provided an objective means to capture physical activity, and their use may be associated with weight loss. 5 Prior work leveraging these devices has suggested that taking around 8000 steps/d substantially mitigates risk of obesity. 3 , 4 However, current recommendations around physical activity do not take into account other contributors such as caloric intake, energy expenditure, or genetic background, likely leading to less effective prevention of obesity for many people. 6

Obesity has a substantial genetic contribution, with heritability estimates ranging from 40% to 70%. 7 , 8 Prior studies 9 - 11 have shown that genetic risk and physical activity are associated with obesity in opposite directions, such that increasing physical activity can help mitigate higher genetic risk for obesity. These results have implications for individual-level physical activity recommendations. Most of the prior work 9 - 11 focused on a narrow set of obesity-associated variants or genes and relied on self-reported physical activity, and more recent work using wearable devices has been limited to 7 days of physical activity measurements. 12 Longer-term capture in large populations will be required to accurately estimate differences in the physical activity needed to prevent incident obesity.

We used longitudinal activity monitoring and genome sequencing data from the All of Us Research Program (AoURP) to quantify the combined association of genetic risk for body mass index (BMI; calculated as weight in kilograms divided by height in meters squared) and physical activity with the risk of incident obesity. Activity monitoring was quantified as daily step counts obtained from fitness tracking devices. Genetic risk was quantified by using a polygenic risk score (PRS) from a large-scale genomewide association study (GWAS) of BMI. 13 We quantified the mean daily step count needed to overcome genetic risk for increased BMI. These findings represent an initial step toward personalized exercise recommendations that integrate genetic information.

Details on the design and execution of the AoURP have been published previously. 14 The present study used the AoURP Controlled Tier dataset, version 7 (C2022Q4R9), with data from participants enrolled between May 1, 2018, and July 1, 2022. Participants who provided informed consent could share data from their own activity tracking devices from the time their accounts were first created, which may precede their enrollment date in the AoURP. We followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline. Only authorized authors who had completed All of Us Responsible Conduct of Research training accessed the deidentified data, through the Researcher Workbench (a secured cloud-based platform). Because the authors had no direct involvement with the participants, the study was exempt from institutional review board review in accordance with AoURP policy.

Activity tracking data for this study came from the Bring Your Own Device program, which allowed individuals who already owned a tracking device (Fitbit, Inc) to consent to link their activity data with other data in the AoURP. By registering their personal device on the AoURP patient portal, participants could share all activity data collected since the creation of their personal device account. For many participants, this allowed us to examine activity data collected prior to enrollment in the AoURP. Activity data in the AoURP are reported as daily step counts. The initial personal activity device cohort consisted of 12 766 individuals. Consistent with our prior data curation approach, we removed days with less than 10 hours of wear time (to enrich the cohort for individuals with consistently high wear time), fewer than 100 steps, or more than 45 000 steps, as well as days on which the participant was younger than 18 years. For time-varying analyses, mean daily steps were calculated on a monthly basis for each participant, and months with fewer than 15 valid days of monitoring were removed.
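The day-level and month-level exclusion rules above amount to a simple filtering pass. The following Python sketch illustrates them under stated assumptions; it is not the study's code. The `DayRecord` fields and the function names are hypothetical, and months are assumed to be calendar months; only the thresholds (10 hours of wear time, 100 and 45 000 steps, age 18 years, 15 valid days per month) come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class DayRecord:
    # Hypothetical per-day record; field names are illustrative only.
    participant_id: str
    day: date
    steps: int
    wear_hours: float
    age_years: float


def is_valid_day(r: DayRecord) -> bool:
    """Apply the day-level exclusions stated in the text."""
    return (
        r.wear_hours >= 10              # drop days with < 10 h of wear time
        and 100 <= r.steps <= 45_000    # drop days with < 100 or > 45 000 steps
        and r.age_years >= 18           # drop days before age 18
    )


def monthly_mean_steps(records, min_valid_days=15):
    """Return {(participant_id, year, month): mean daily steps}.

    Only valid days are averaged, and months with fewer than
    `min_valid_days` valid days are dropped entirely.
    """
    by_month = defaultdict(list)
    for r in records:
        if is_valid_day(r):
            by_month[(r.participant_id, r.day.year, r.day.month)].append(r.steps)
    return {
        key: mean(steps)
        for key, steps in by_month.items()
        if len(steps) >= min_valid_days
    }
```

In this sketch a month with 15 valid days is kept even if other days in that month were discarded, because the 15-day threshold is applied after the day-level filter.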

The analytic cohort included only individuals with a BMI of less than 30 at the time activity monitoring began. The primary outcome was incident obesity, defined as a BMI of 30 or greater documented in the medical record at least 6 months after initiation of activity monitoring. This 6-month requirement reduced the likelihood of counting obesity that predated the beginning of monitoring but had not yet been clinically documented. We extracted BMI values and clinical characteristics from longitudinal electronic health records (EHRs) for consenting participants who were associated with a health care provider organization funded by the AoURP. The EHR data have been standardized using the Observational Medical Outcomes Partnership Common Data Model. 15 Upon consent, AoURP participants are asked to complete the Basics survey, in which they may self-report demographic characteristics such as race, ethnicity, and sex at birth.
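The baseline and outcome definitions can likewise be written as a small classification rule. This is a minimal sketch with hypothetical names; it assumes that baseline BMI is the most recent EHR value on or before the start of monitoring and approximates 6 months as 183 days, neither of which is specified in the text.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

OBESITY_BMI = 30.0
# Assumption: "6 months" is approximated as 183 days.
BLACKOUT = timedelta(days=183)


def classify_participant(
    start: date, bmi_records: List[Tuple[date, float]]
) -> Tuple[bool, Optional[date]]:
    """Return (eligible, incident_obesity_date or None).

    start: date activity monitoring began.
    bmi_records: (measurement date, BMI) pairs from the EHR.
    """
    records = sorted(bmi_records)
    cutoff = start + BLACKOUT

    # Baseline (assumption): most recent BMI on or before the start of monitoring.
    baseline = [bmi for d, bmi in records if d <= start]
    if baseline and baseline[-1] >= OBESITY_BMI:
        return False, None  # prevalent obesity at baseline

    # Obesity documented within the first 6 months may predate monitoring: exclude.
    if any(start < d < cutoff and bmi >= OBESITY_BMI for d, bmi in records):
        return False, None

    # Incident obesity: first BMI >= 30 at least 6 months after monitoring began.
    for d, bmi in records:
        if d >= cutoff and bmi >= OBESITY_BMI:
            return True, d
    return True, None  # eligible, censored without an event
```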

We filtered the data to include only biallelic, autosomal single-nucleotide variants (SNVs) that had passed AoURP initial quality control. 16 We then removed duplicate-position SNVs and kept only individual genotypes with a genotype quality greater than 20. We further filtered the SNVs based on their Hardy-Weinberg equilibrium P value (>1.0 × 10⁻¹⁵) and missing rate (<5%) across all samples. Next, we divided the samples into 6 groups (Admixed American, African, East Asian, European, Middle Eastern, and South Asian) based on their estimated ancestral populations 16 , 17 and further filtered the SNVs within each population based on minor allele frequency (MAF) (>0.01), missing rate (<0.02), and Hardy-Weinberg equilibrium P value (>1.0 × 10⁻⁶). SNV coordinates were then mapped from Genome Reference Consortium Human Build 38 to Build 37. Because existing PRS models have limited transferability across ancestry groups, and to ensure adequate power for the subsequent PRS analysis, we limited our analysis to populations with a sample size greater than 500, resulting in 5964 participants of European ancestry with 5 515 802 common SNVs for analysis.
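The variant-level thresholds above can be collected into two predicates, one applied across all samples and a stricter one applied within each ancestry group. In this hypothetical sketch the allele frequencies, missing rates, and Hardy-Weinberg P values are taken as precomputed inputs (in practice they would come from quality-control software such as PLINK); only the cutoffs are from the text.

```python
from dataclasses import dataclass

AUTOSOMES = {str(i) for i in range(1, 23)}
BASES = {"A", "C", "G", "T"}


@dataclass
class Variant:
    # Hypothetical record; multiallelic sites carry comma-separated ALT alleles.
    chrom: str
    ref: str
    alt: str


def is_biallelic_autosomal_snv(v: Variant) -> bool:
    """Keep single-base REF/ALT pairs on chromosomes 1-22."""
    chrom = v.chrom[3:] if v.chrom.startswith("chr") else v.chrom
    return chrom in AUTOSOMES and v.ref in BASES and v.alt in BASES and v.ref != v.alt


def passes_all_sample_qc(hwe_p: float, missing_rate: float) -> bool:
    """Thresholds applied across all samples."""
    return hwe_p > 1.0e-15 and missing_rate < 0.05


def passes_population_qc(maf: float, missing_rate: float, hwe_p: float) -> bool:
    """Stricter thresholds applied within each ancestry group."""
    return maf > 0.01 and missing_rate < 0.02 and hwe_p > 1.0e-6
```

Keeping the two stages separate mirrors the text: a variant must pass the all-sample filter and then, using statistics recomputed within a population, the population filter.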

To generate principal components, we excluded regions with high linkage disequilibrium, including chr5:44-51.5 megabases (Mb), chr6:25-33.5 Mb, chr8:8-12 Mb, and chr11:45-57 Mb. We then pruned the remaining SNVs using the PLINK, version 1.9 (Harvard University), pairwise independence function with a 1-kilobase window shifted by 50 base pairs, requiring r² < 0.05 between any pair, which left 100 983 SNPs for further analysis. 18 Principal component analysis was run using PLINK, version 1.9. We downloaded the European ancestry linkage disequilibrium reference panel from the 1000 Genomes Project phase 3 and kept the nonambiguous SNPs with MAF greater than 0.01 in the largest European ancestry GWAS summary statistics of BMI. 13 We manually harmonized strand-flipped SNPs across the SNP information file, the GWAS summary statistics files, and the European ancestry PLINK extended map files (.bim).
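Manual strand harmonization follows a standard logic: compare the GWAS alleles with the reference (.bim) alleles directly, swapped, and after complementing, and drop A/T and C/G SNPs whose strand cannot be resolved. The sketch below is an illustrative assumption of how such checks can be coded, with hypothetical function names; it is not the authors' procedure and assumes uppercase single-base alleles.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands, so orientation is unresolvable."""
    return COMPLEMENT[a1] == a2


def harmonize(gwas_a1: str, gwas_a2: str, ref_a1: str, ref_a2: str) -> Optional[str]:
    """Compare GWAS alleles with reference alleles.

    Returns 'match', 'swap' (effect allele reversed), 'flip' (opposite strand),
    'flip_swap' (both), or None (drop the SNP).
    """
    if is_strand_ambiguous(gwas_a1, gwas_a2):
        return None
    if (gwas_a1, gwas_a2) == (ref_a1, ref_a2):
        return "match"
    if (gwas_a1, gwas_a2) == (ref_a2, ref_a1):
        return "swap"
    f1, f2 = COMPLEMENT[gwas_a1], COMPLEMENT[gwas_a2]
    if (f1, f2) == (ref_a1, ref_a2):
        return "flip"
    if (f1, f2) == (ref_a2, ref_a1):
        return "flip_swap"
    return None  # alleles are incompatible with the reference


def effect_sign(status: str) -> int:
    """Multiplier for the GWAS effect size once alleles are aligned to the reference."""
    return -1 if status in ("swap", "flip_swap") else 1
```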

We used PRS–continuous shrinkage to infer posterior SNP effect sizes under continuous shrinkage priors, with a scaling parameter set to 0.01 to reflect the polygenic architecture of BMI. GWAS summary statistics of BMI measured in 681 275 individuals of European ancestry were used to estimate the SNP weights. 19 The scoring command in PLINK, version 1.9, was used to produce genomewide scores for the AoURP European ancestry individuals from their quality-controlled SNP genotype data and these derived SNP weights. 20 Finally, we performed a linear regression with the genomewide scores as the dependent variable and the 10 principal components as the independent variables, and kept the residuals for subsequent analysis. To check the performance of the PRS estimate, we first fit a logistic regression model with obesity status as the dependent variable and the PRS as the independent variable, with age, sex, and the top 10 principal components of genetic ancestry as covariates. We then built a subset logistic regression model that used only the same set of covariates. By comparing the full model with the subset model, we measured the incremental Nagelkerke R² to quantify how much variance in obesity status was explained by the PRS.
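Two pieces of this paragraph are simple formulas. A polygenic score is a weighted sum of effect-allele dosages, and Nagelkerke R² rescales the Cox-Snell R² (computed from the log-likelihoods of a fitted model and an intercept-only model) so that its maximum is 1. The sketch below assumes the log-likelihoods have already been obtained from fitted logistic regressions; the names are hypothetical, scoring software may normalize scores differently from this plain sum, and the residualization on principal components (an ordinary least-squares step) is not shown.

```python
import math
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Sum over SNPs of effect-allele dosage (0-2) times its posterior weight."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must align SNP by SNP")
    return sum(d * w for d, w in zip(dosages, weights))


def nagelkerke_r2(ll_model: float, ll_null: float, n: int) -> float:
    """Nagelkerke R² from the log-likelihoods of a fitted and an intercept-only model."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def incremental_nagelkerke(ll_full: float, ll_subset: float, ll_null: float, n: int) -> float:
    """Variance attributed to the PRS: R²(covariates + PRS) minus R²(covariates only)."""
    return nagelkerke_r2(ll_full, ll_null, n) - nagelkerke_r2(ll_subset, ll_null, n)
```

Both R² values are computed against the same intercept-only model, so their difference isolates what the PRS adds beyond age, sex, and the principal components.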

Differences in clinical characteristics across PRS quartiles were assessed using the Wilcoxon rank sum or Kruskal-Wallis test for continuous variables and the Pearson χ² test for categorical variables. Cox proportional hazards regression models were used to examine the association of daily step count (modeled as a time-varying variable) and PRS with time to incident obesity, adjusting for age, sex, mean baseline step count, cancer status, coronary artery disease status, systolic blood pressure, alcohol use, and educational level, and including a PRS × mean steps interaction term. We presented these results stratified by baseline BMI and, because of collinearity between BMI and PRS, provided a model including baseline BMI as a secondary analysis in eFigure 2 in Supplement 1.
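A time-varying covariate such as monthly mean step count is usually supplied to a Cox model in counting-process form, with one (start, stop] row per interval. The following sketch expands one participant's monthly means into such rows. It is an illustration only: the `Interval` type and function name are hypothetical, carrying the last observed value forward over dropped months is an assumption, and when no event occurs follow-up is assumed to end at the last month in the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    start: int          # interval is (start, stop], in months since monitoring began
    stop: int
    mean_steps: float   # time-varying covariate value for this interval
    event: int          # 1 if incident obesity was documented in this interval


def to_counting_process(
    monthly_steps: List[Optional[float]], event_month: Optional[int] = None
) -> List[Interval]:
    """Expand one participant's monthly step means into (start, stop] rows.

    monthly_steps[m] is the mean daily steps in month m, or None if that month
    was dropped; the last observed value is carried forward (assumption).
    event_month is the month index in which obesity was documented, if any.
    """
    n_months = len(monthly_steps) if event_month is None else event_month + 1
    rows: List[Interval] = []
    last: Optional[float] = None
    for m in range(n_months):
        value = monthly_steps[m] if m < len(monthly_steps) else None
        if value is not None:
            last = value
        if last is None:
            continue  # no step data observed yet for this participant
        event = int(event_month is not None and m == event_month)
        rows.append(Interval(m, m + 1, last, event))
    return rows
```

Rows in this form can be passed to Cox software that accepts start and stop times, such as `Surv(start, stop, event)` in R's survival package. Some analyses lag the covariate by one interval so that the value used in a given interval predates the event; this sketch uses the same month's value.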

Cox proportional hazards regression models were fit on a multiply imputed dataset. Multiple imputation was performed for baseline BMI, alcohol use, educational status, systolic blood pressure, and smoking status using bootstrap and predictive mean matching with the aregImpute function in the Hmisc package of R, version 4.2.2 (R Project for Statistical Computing). Continuous variables were modeled as restricted cubic splines with 3 knots unless the nonlinear term was not significant, in which case the variable was modeled as a linear term. Fits and predictions of the Cox proportional hazards regression models were obtained using the rms package in R, version 4.2.2. The Cox proportional hazards regression assumptions were checked using the cox.zph function from the survival package in R, version 4.2.2.
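With 3 knots, a restricted cubic spline adds a single nonlinear term to the linear term, and the fitted curve is constrained to be linear beyond the outer knots. The sketch below computes that basis using Harrell's truncated-power formulation; the function names are hypothetical, and knot placement (commonly the 10th, 50th, and 90th percentiles of the variable) is left to the caller. A test of the nonlinear coefficient then indicates whether the linear term alone suffices.

```python
from typing import Sequence, Tuple


def _pos_cube(u: float) -> float:
    """Truncated power function (u)_+^3."""
    return u ** 3 if u > 0 else 0.0


def rcs3_basis(x: float, knots: Sequence[float]) -> Tuple[float, float]:
    """Return (linear term, nonlinear term) of a 3-knot restricted cubic spline.

    The nonlinear term is 0 at or below the first knot, and the combined curve
    is linear beyond the last knot. Dividing by (t3 - t1)**2 only rescales
    the term and does not change the fitted curve.
    """
    t1, t2, t3 = knots
    if not t1 < t2 < t3:
        raise ValueError("knots must be strictly increasing")
    nonlinear = (
        _pos_cube(x - t1)
        - _pos_cube(x - t2) * (t3 - t1) / (t3 - t2)
        + _pos_cube(x - t3) * (t2 - t1) / (t3 - t2)
    )
    return float(x), nonlinear / (t3 - t1) ** 2
```

The coefficients on the truncated cubes are chosen so that the cubic and quadratic parts cancel beyond the last knot, which is what makes the spline "restricted".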

To identify the combinations of PRS and mean daily step count associated with a hazard ratio (HR) of 1.00, we used a 100-knot spline function to fit the HRs estimated by the Cox proportional hazards regression model across a range of mean daily step counts for each PRS percentile. We then inverted the fitted spline function to determine the mean daily step count at which the HR equals 1.00 for each PRS percentile. Repeating this process across multiple PRS percentiles produced a plot of the mean daily step count at which the HR was 1.00 as a function of PRS percentile. To estimate the uncertainty around these estimates, we applied the same spline approach to the upper and lower 95% confidence limits from the Cox proportional hazards regression model to obtain 95% CIs for the estimated mean daily step count at each PRS percentile. Two-sided P < .05 indicated statistical significance.
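The inversion step can be sketched on a grid of fitted HRs. As a simpler stand-in for the 100-knot spline described above, this hypothetical helper uses piecewise linear interpolation between grid points to find where the HR curve crosses 1.00.

```python
from typing import Optional, Sequence


def steps_at_hazard_ratio(
    steps: Sequence[float], hazard_ratios: Sequence[float], target: float = 1.0
) -> Optional[float]:
    """Find the step count at which the fitted HR curve crosses `target`.

    Uses linear interpolation between adjacent grid points; returns None if
    the curve never reaches the target on the supplied grid.
    """
    if len(steps) != len(hazard_ratios):
        raise ValueError("steps and hazard_ratios must have equal length")
    for i in range(len(steps) - 1):
        x0, x1 = steps[i], steps[i + 1]
        y0, y1 = hazard_ratios[i], hazard_ratios[i + 1]
        if y0 == y1:
            if y0 == target:
                return x0  # flat segment lying exactly on the target
            continue
        if (y0 - target) * (y1 - target) <= 0:  # target lies within this segment
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None
```

Applying the same function to the curves of the lower and upper 95% confidence limits gives the corresponding interval for the step count. Assuming all three curves decrease as steps increase, the upper-limit curve crosses 1.00 at a higher step count than the point estimate.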

We identified 3124 participants of European ancestry without obesity at baseline who agreed to link their personal activity data and EHR data and had available genome sequencing. Among those with available data, 2216 of 3051 (73%) were women and 835 of 3051 (27%) were men, and the median age was 52.7 (IQR, 36.4-62.8) years. In terms of race and ethnicity, 2958 participants (95%) were White compared with 141 participants (5%) who were of other race or ethnicity (which may include Asian, Black or African American, Middle Eastern or North African, Native Hawaiian or Other Pacific Islander, multiple races or ethnicities, and unknown race or ethnicity) (Table). The analytic sample was restricted to individuals assigned European ancestry based on the All of Us Genomic Research Data Quality Report. 16 A study flowchart detailing the creation of the analytic dataset is provided in eFigure 1 in Supplement 1. The BMI-based PRS explained 8.3% of the phenotypic variation in obesity (β = 1.76; P = 2 × 10⁻¹⁶). The median follow-up time was 5.4 (IQR, 3.4-7.0) years, and participants walked a median of 8326 (IQR, 6499-10 389) steps/d. The incidence of obesity over the study period was 13% (101 of 781 participants) in the lowest PRS quartile and 43% (335 of 781 participants) in the highest PRS quartile (P = 1.0 × 10⁻²⁰). We observed a decrease in median daily steps from the lowest (8599 [IQR, 6751-10 768]) to the highest (8115 [IQR, 6340-10 187]) PRS quartile (P = .01).

We next modeled obesity risk stratified by PRS percentile, with the 50th percentile indexed to an HR for obesity of 1.00 (Figure 1). The association between PRS and incident obesity was direct (P = .001) and linear (chunk test for nonlinearity was nonsignificant [P = .07]). The PRS and mean daily step count were both independently associated with obesity risk (Figure 2). The 75th percentile BMI PRS demonstrated an 81% increase in obesity risk (HR, 1.81 [95% CI, 1.59-2.05]; P = 3.57 × 10⁻²⁰) when compared with the 25th percentile BMI PRS, whereas the 75th percentile median step count demonstrated a 43% reduction in obesity risk (HR, 0.57 [95% CI, 0.49-0.67]; P = 5.30 × 10⁻¹²) when compared with the 25th percentile step count. The PRS × mean steps interaction term was not significant (χ² = 1.98; P = .37).
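An HR comparing the 75th with the 25th percentile of a continuous covariate is a contrast on the log-hazard scale. For a covariate entered linearly, as the PRS was here, the contrast is exp(β × (x75 − x25)), with a Wald CI obtained from the standard error of β. The sketch below uses hypothetical inputs and names; covariates modeled with spline terms need the contrast computed over all basis terms, which is, to our understanding, what the rms package's summary method reports for interquartile-range effects.

```python
import math
from typing import Tuple


def contrast_hazard_ratio(
    beta: float, se: float, x_low: float, x_high: float, z: float = 1.959964
) -> Tuple[float, float, float]:
    """HR and 95% CI comparing x_high with x_low for a linearly entered covariate.

    beta, se: log-hazard coefficient per unit of the covariate and its standard error.
    """
    delta = x_high - x_low
    log_hr = beta * delta
    half_width = z * se * abs(delta)
    return math.exp(log_hr), math.exp(log_hr - half_width), math.exp(log_hr + half_width)
```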

Individuals with a PRS at the 75th percentile would need to walk a mean of 2280 (95% CI, 1680-3310) more steps per day (11 020 total) than those at the 50th percentile to reduce the HR for obesity to 1.00 (Figure 1). Conversely, those at the 25th percentile PRS could reach an HR of 1.00 by walking a mean of 3660 (95% CI, 2180-8740) fewer steps than those at the 50th percentile PRS. Assuming a median daily step count of 8740 (cohort median), those at the 75th percentile PRS had an HR for obesity of 1.33 (95% CI, 1.25-1.41), whereas those at the 25th percentile PRS had an HR of 0.74 (95% CI, 0.69-0.79).

The mean daily step count required to achieve an HR for obesity of 1.00 across the full PRS spectrum, stratified by baseline BMI, is shown in Figure 3. To reach an HR of 1.00 for obesity, individuals at the 50th percentile PRS with a baseline BMI of 22 would need to achieve a mean daily step count of 3290 (additional 3460 steps/d); with a baseline BMI of 24, a mean daily step count of 7590 (additional 4430 steps/d); with a baseline BMI of 26, a mean daily step count of 11 890 (additional 5380 steps/d); and with a baseline BMI of 28, a mean daily step count of 16 190 (additional 6350 steps/d).

When baseline BMI was added to the full Cox proportional hazards regression model, daily step count and BMI PRS both remained associated with obesity risk. When comparing individuals at the 75th percentile with those at the 25th percentile, the BMI PRS was associated with a 61% increased risk of obesity (HR, 1.61 [95% CI, 1.45-1.78]). Similarly, when comparing the 75th with the 25th percentiles, daily step count was associated with a 38% lower risk of obesity (HR, 0.62 [95% CI, 0.53-0.72]) (eFigure 2 in Supplement 1).

The cumulative incidence of obesity increases over time and with fewer daily steps and higher PRS. The cumulative incidence of obesity would be 2.9% at the 25th percentile, 3.9% at the 50th percentile, and 5.2% at the 75th percentile for PRS in year 1; 10.5% at the 25th percentile, 14.0% at the 50th percentile, and 18.2% at the 75th percentile for PRS in year 3; and 18.5% at the 25th percentile, 24.3% at the 50th percentile, and 30.9% at the 75th percentile for PRS in year 5 (Figure 4). The eTable in Supplement 1 models the expected cumulative incidence of obesity at 1, 3, and 5 years based on PRS and assumed mean daily steps of 7500, 10 000, and 12 500.
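Under proportional hazards, survival for a profile with hazard ratio HR relative to a reference profile is S(t) = S_ref(t)^HR, so cumulative incidence is 1 − (1 − CI_ref(t))^HR. The sketch below applies this identity as an informal consistency check, assuming (not confirmed by the text) that the HRs of 0.74 and 1.33 reported at 8740 steps/d apply relative to the 50th percentile values in Figure 4. Under those assumptions the computed 25th and 75th percentile values fall within roughly 0.2 percentage points of those reported; this does not imply that Figure 4 was produced this way. The function name is hypothetical.

```python
def cumulative_incidence(ci_reference: float, hazard_ratio: float) -> float:
    """Under proportional hazards, S(t) = S_ref(t) ** HR, so CI = 1 - (1 - CI_ref) ** HR."""
    return 1.0 - (1.0 - ci_reference) ** hazard_ratio


# Reported cumulative incidence at the 50th PRS percentile (years 1, 3, 5)
reference = {1: 0.039, 3: 0.140, 5: 0.243}
for year, ci_ref in reference.items():
    low = cumulative_incidence(ci_ref, 0.74)   # HR reported for the 25th percentile at 8740 steps/d
    high = cumulative_incidence(ci_ref, 1.33)  # HR reported for the 75th percentile at 8740 steps/d
    print(year, round(100 * low, 1), round(100 * high, 1))
```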

We examined the combined association of daily step counts and genetic risk for increased BMI with the incidence of obesity in a large national sample with genome sequencing and long-term activity monitoring data. Lower daily step counts and higher BMI PRS were both independently associated with increased risk of obesity. As the PRS increased, the number of daily steps associated with lower risk of obesity also increased. By combining these data sources, we derived an estimate of the daily step count needed to reduce the risk of obesity based on an individual’s genetic background. Importantly, our findings suggest that genetic risk for obesity is not deterministic but can be overcome by increasing physical activity.

Our findings align with those of prior literature 9 indicating that engaging in physical activity can mitigate genetic obesity risk and highlight the importance of genetic background for individual health and wellness. Using the data from a large population-based sample, Li et al 9 characterized obesity risk by genotyping 12 susceptibility loci and found that higher self-reported physical activity was associated with a 40% reduction in genetic predisposition to obesity. Our study extends these results in 2 important ways. First, we leveraged objectively measured longitudinal activity data from commercial devices to focus on physical activity prior to and leading up to a diagnosis of obesity. Second, we used a more comprehensive genomewide risk assessment in the form of a PRS. Our results indicate that daily step count recommendations to reduce obesity risk may be personalized based on an individual’s genetic background. For instance, individuals with higher genetic risk (ie, 75th percentile PRS) would need to walk a mean of 2280 more steps per day than those at the 50th percentile of genetic risk to have a comparable risk of obesity.

These results suggest that population-based recommendations that do not account for genetic background may not accurately represent the amount of physical activity needed to reduce the risk of obesity. Population-based exercise recommendations may overestimate or underestimate physical activity needs, depending on one’s genetic background. Underestimation of physical activity required to reduce obesity risk has the potential to be particularly detrimental to public health efforts to reduce weight-related morbidity. As such, integration of activity and genetic data could facilitate personalized activity recommendations that account for an individual’s genetic profile. The widespread use of wearable devices and the increasing demand for genetic information from both clinical and direct-to-consumer sources may soon permit testing the value of personalized activity recommendations. Efforts to integrate wearable devices and genomic data into the EHR further support the potential future clinical utility of merging these data sources to personalize lifestyle recommendations. Thus, our findings support the need for a prospective trial investigating the impact of tailoring step counts by genetic risk on chronic disease outcomes.

The most important limitation of this work is its lack of diversity: only individuals of European ancestry were included, and these findings will need validation in a more diverse population. Our cohort included only individuals who already owned a fitness tracking device and agreed to link their activity data to the AoURP dataset, so the findings may not generalize to other populations. We cannot account for unmeasured confounding, and the potential for reverse causation remains. We attempted to diminish the latter concern by excluding prevalent obesity and incident cases within the first 6 months of monitoring. Genetic risk was limited to that for increased BMI; however, genetic risk for other cardiometabolic conditions could also inform obesity risk. Nongenetic factors that contribute to obesity risk, such as dietary patterns, were not available, reducing the explanatory power of the model. The widespread use of drug classes targeting weight loss is unlikely to affect the generalizability of our results, because such drugs are rarely prescribed for obesity prevention and our study focused on individuals without obesity at baseline. Indeed, less than 0.5% of our cohort was exposed to a medication class targeting weight loss (phentermine, orlistat, or glucagonlike peptide-1 receptor agonists) prior to incident obesity or censoring. Finally, some fitness activity tracking devices may not capture nonambulatory activity as well as triaxial accelerometers.

This cohort study used longitudinal activity data from commercial wearable devices, genome sequencing, and clinical data to support the notion that higher daily step counts can mitigate genetic risk for obesity. These results have important clinical and public health implications and may offer a novel strategy for addressing the obesity epidemic by informing activity recommendations that incorporate genetic information.

Accepted for Publication: January 30, 2024.

Published: March 27, 2024. doi:10.1001/jamanetworkopen.2024.3821

Corresponding Author: Evan L. Brittain, MD, MSc, and Douglas M. Ruderfer, PhD, Vanderbilt University Medical Center, 2525 West End Ave, Suite 300A, Nashville, TN 37203.

Author Contributions: Drs Brittain and Ruderfer had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Brittain, Annis, Master, Roden, Ruderfer.

Acquisition, analysis, or interpretation of data: Brittain, Han, Annis, Master, Hughes, Harris, Ruderfer.

Drafting of the manuscript: Brittain, Han, Annis, Master, Ruderfer.

Critical review of the manuscript for important intellectual content: All authors.

Statistical analysis: Brittain, Han, Annis, Master.

Obtained funding: Brittain, Harris.

Administrative, technical, or material support: Brittain, Annis, Master, Roden.

Supervision: Brittain, Ruderfer.

Conflict of Interest Disclosures: Dr Brittain reported receiving a gift from Google LLC during the conduct of the study. Dr Ruderfer reported serving on the advisory board of Illumina Inc and Alkermes PLC and receiving grant funding from PTC Therapeutics outside the submitted work. No other disclosures were reported.

Funding/Support: The All of Us Research Program is supported by grants 1 OT2 OD026549, 1 OT2 OD026554, 1 OT2 OD026557, 1 OT2 OD026556, 1 OT2 OD026550, 1 OT2 OD 026552, 1 OT2 OD026553, 1 OT2 OD026548, 1 OT2 OD026551, 1 OT2 OD026555, IAA AOD21037, AOD22003, AOD16037, and AOD21041 (regional medical centers); grant HHSN 263201600085U (federally qualified health centers); grant U2C OD023196 (data and research center); 1 U24 OD023121 (Biobank); U24 OD023176 (participant center); U24 OD023163 (participant technology systems center); grants 3 OT2 OD023205 and 3 OT2 OD023206 (communications and engagement); and grants 1 OT2 OD025277, 3 OT2 OD025315, 1 OT2 OD025337, and 1 OT2 OD025276 (community partners) from the National Institutes of Health (NIH). This study is also supported by grants R01 HL146588 (Dr Brittain), R61 HL158941 (Dr Brittain), and R21 HL172038 (Drs Brittain and Ruderfer) from the NIH.

Role of the Funder/Sponsor: The NIH had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data Sharing Statement: See Supplement 2 .

Additional Contributions: The All of Us Research Program would not be possible without the partnership of its participants.

Register for email alerts with links to free full-text articles
Access PDFs of free articles
Manage your interests
Save searches and receive search alerts

IMAGES

(PDF) Network Analysis as a Research Methodology in Science Education
Frontiers
(PDF) Computer Networking: A Survey
😂 Research paper on social networking. Research Paper Example On The
(PDF) Introduction to Social Network Analysis
Thematic Network Analysis

VIDEO

Tried foil paper network jam hack
QUANTITATIVE TECHNIQUES: DRAWING OF A NETWORK DIAGRAM
MODEL QUESTION PAPER ECA & NETWORK ANALYSIS #modelquestionpaper2024 #importantquestions #eca #netwok
Circuit Analysis and Network Analysis Model QP #modelquestionpaper2023 #circuitanalysis #network
Network Analysis
How To Start A Research Paper? #research #journal #article #thesis #phd

COMMENTS

Network analysis of multivariate data in psychological science
The schematic workflow of psychometric network analysis as discussed in this paper is represented in Fig. 2.Typically, one starts with a research question that dictates a data collection scheme ...
Network analysis: a brief overview and tutorial
Objective: The present paper presents a brief overview on network analysis as a statistical approach for health psychology researchers. Networks comprise graphical representations of the relationships (edges) between variables (nodes). Network analysis provides the capacity to estimate complex patterns of relationships and the network structure can be analysed to reveal core features of the ...
Full article: The past, present, and future of network monitoring: A
He is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the International Statistical Institute. His research interests include high-dimensional statistics, machine learning, network analysis, stochastic control, optimization and queueing theory. Cecile Paris is a Science Leader at Data61, CSIRO.
Full article: Network analytics: an introduction and illustrative
On the other hand, Figure 1 presents a network of diseases developed by the authors of this paper. In this undirected comorbidity network, the nodes are diseases. Two diseases are connected if these co-occur in the patients. ... Network analysis is a popular research area for prescriptive analytics. The majority of the prescriptive analytics ...
Deep Learning for Network Traffic Monitoring and Analysis (NTMA): A
1. Introduction. During the last years, NTMA have received much attention as a significant research topic in supporting the performance of networking [1]. As common solutions in network management, NTMA techniques have been introduced both by industry and academia [2], [3]. Although different NTMA techniques have been introduced, emerging networking technologies and paradigms have made ...
Exploring the raison d'etre behind metric selection in network analysis
Network analysis is a useful tool to analyse the interactions and structure of graphs that represent the relationships among entities, such as sectors within an urban system. Connecting entities in this way is vital in understanding the complexity of the modern world, and how to navigate these complexities during an event. However, the field of network analysis has grown rapidly since the ...
Network Analysis: A Definitional Guide to Important Concepts
Given the increase in network-based research, novices in social network analysis comprise a rapidly growing and increasingly important audience. In the course of our discussion, we define network analysis, explain its development, and suggest its future evolution. We introduce the key concepts, methods, theory, and applications of network ...
Network Analysis in the Social Sciences
The representation and analysis of community network structure remains at the forefront of network research in the social sciences today, with growing interest in unraveling the structure of computer-supported virtual communities that have proliferated in recent years ( 12 ). By the 1960s, the network perspective was thriving in anthropology.
Complex Networks: a Mini-review
Network analysis is a powerful tool that provides us a fruitful framework to describe phenomena related to social, technological, and many other real-world complex systems. In this paper, we present a brief review about complex networks including fundamental quantities, examples of network models, and the essential role of network topology in the investigation of dynamical processes as ...
Estimating psychological networks and their accuracy: A tutorial paper
Footnote 11: Network accuracy has been a blind spot in psychological network analysis, and the authors are aware of only one prior paper that has examined network accuracy (Fried et al. 2016), which used an earlier version of bootnet than the version described here. Further remediating the blind spot of network accuracy is of utmost importance ...
A review of network traffic analysis and prediction techniques
This paper presents a review of several techniques proposed, used and practiced for network traffic analysis and prediction. The distinctiveness and restrictions of previous researches are ...
Graph Theory and Algorithms for Network Analysis
As a result, network analysis is made possible by the graph theory and algorithms, which offer strong tools for studying and comprehending the complicated linkages and structures of complex ...
Network analysis to evaluate the impact of research funding on research
In 2004, the Alfred P. Sloan Foundation launched a new program focused on incubating a new field, "Microbiology of the Built Environment" (MoBE). By the end of 2017, the program had supported the publication of hundreds of scholarly works, but it was unclear to what extent it had stimulated the development of a new research community. We identified 307 works funded by the MoBE program, as ...
Introduction (Chapter 1)
Butts, Carter T. 2009. "Revisiting the Foundations of Network Analysis." Science 325: 414. (A critical summary of the idea that all connected systems are "a network," highlighting the need to tailor approaches to the complexities of empirical settings.) Easley, David, and Kleinberg, Jon. 2010.
A network pharmacology-based approach to explore potential ...
The network-target-based network pharmacology is a promising approach for the next-generation mode of drug research and development for TCM herbs or herbal formulae. ... network analysis ...
Social network analysis using deep learning: applications ...
1.4 Organization. The remaining part of the paper is organized as follows. Section 2 contains a brief description of deep learning techniques and online social networks. In Sect. 3, we describe an overview of the research related to opinion and sentiment analysis.Section 4 contains the research pertaining to text classification and recommender systems.
Thematic series on Social Network Analysis and Mining
Social networks were first investigated in social, educational and business areas. Academic interest in this field though has been growing since the mid twentieth century, given the increasing interaction among people, data dissemination and exchange of information. As such, the development and evaluation of new techniques for social network analysis and mining (SNAM) is a current key research ...
Connected Papers
Get a visual overview of a new academic field. Enter a typical paper and we'll build you a graph of similar papers in the field. Explore and build more graphs for interesting papers that you find - soon you'll have a real, visual understanding of the trends, popular works and dynamics of the field you're interested in.
A global timekeeping problem postponed by global warming
Spectral analysis shows that nonseasonal variations in these terms (also dominated by changes in ω_a) are the largest contributor to changes in Δω for periods from days to about 5 years.
Modelling
To tackle this gap in the literature, the authors conducted a study using Machine Learning (ML) algorithms and Social Network Analysis (SNA) to predict CEM-related citation metrics. ... provides an outlook for future research directions and describes possible research applications. Feature papers are submitted upon individual invitation or ...
Network Analysis as a Research Methodology in Science Education Research
rules about how the emergent patterns visible in the network arose. (From Bruun, 2016.) ...tual blend consists of at least ...
Road network analysis of Guwahati city using GIS
Network analysis in GIS rests firmly on the theoretical foundation of the mathematical sub-disciplines of graph theory and topology. The most common and familiar implementations of network models are those used to represent the networks with which much of the population interacts every day: transportation and communications networks. Routing is the act of selecting a course of travel, and it ...
Comment Sentiment Analysis Using Bidirectional Encoder ...
This research paper unravels the power of Bidirectional Encoder Representations from Transformers (BERT), a cutting-edge language representation model, in the realm of comment sentiment analysis. By focusing on two main aspects - the working principles of BERT and the methodology of comment sentiment analysis - the paper aims to captivate ...
Remote Sensing
This paper unveils a novel Memory-Augmented Deep Unfolding Network (MADUN) for SAR imaging in marine environments. ... Feature papers represent the most advanced research with significant potential for high impact in the field. A Feature Paper should be a substantial original Article that involves several techniques or approaches, provides an ...
Predicting and improving complex beer flavor through machine ...
For each beer, we measure over 200 chemical properties, perform quantitative descriptive sensory analysis with a trained tasting panel and map data from over 180,000 consumer reviews to train 10 ...
Physical Activity and Incident Obesity Across the ...
Importance: Despite consistent public health recommendations, obesity rates in the US continue to increase. Physical activity recommendations do not account for individual genetic variability, increasing risk of obesity. Objective: To use activity, clinical, and genetic data from the All of Us Research Program (AoURP) to explore the association of genetic risk of higher body mass index (BMI ...
(PDF) Network forensics analysis using Wireshark
to join the slower channel, that the connection process is below the message of 'JOIN #sl0w3r l 03dx', and that it is working on downloading ...
Analysis of spatiotemporal differentiation characteristics of rural
This article takes Shaoguan City, a resource-exhausted city, as the research object. Using multi-temporal remote sensing images, statistical yearbooks, and other data, based on the analysis of indicators such as biological abundance index, vegetation coverage index, water network density index, land stress index, and pollution load index, the ecological environment status index is used for ...
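The "Network analytics" snippet above describes an undirected comorbidity network in which the nodes are diseases and two diseases are connected if they co-occur in the same patient. The following is a minimal Python sketch of that construction, assuming each patient record is simply a list of diagnosis labels. The function names `comorbidity_network` and `degree` and the toy records are hypothetical. Weighting each edge by the number of patients in whom both diseases occur is an illustrative choice and not necessarily the cited authors' method.

```python
from collections import Counter
from itertools import combinations


def comorbidity_network(patients):
    """Build an undirected comorbidity network (hypothetical helper).

    patients: iterable of collections of diagnosis labels, one per patient.
    Returns (nodes, edges): nodes is the set of diseases seen; edges maps an
    unordered disease pair, stored as a sorted tuple, to the number of
    patients in whom both diseases occur.
    """
    nodes = set()
    edges = Counter()
    for diagnoses in patients:
        unique = sorted(set(diagnoses))  # ignore repeated codes within one patient
        nodes.update(unique)
        # Each pair of distinct diseases in the same patient co-occurs once.
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return nodes, edges


def degree(nodes, edges):
    """Unweighted degree: number of distinct diseases each disease co-occurs with."""
    deg = {n: 0 for n in nodes}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


if __name__ == "__main__":
    # Hypothetical toy records, not data from any study.
    records = [
        ["diabetes", "hypertension", "obesity"],
        ["hypertension", "obesity"],
        ["asthma"],
        ["diabetes", "hypertension"],
    ]
    nodes, edges = comorbidity_network(records)
    for (a, b), weight in sorted(edges.items()):
        print(f"{a} -- {b}: {weight}")
    print(degree(nodes, edges))
```

With these toy records, diabetes–hypertension and hypertension–obesity each receive weight 2, while asthma remains an isolated node with degree 0. Counting each pair at most once per patient keeps a single record with duplicated codes from inflating an edge. In practice, a minimum co-occurrence threshold could be applied to drop rare pairs before analysing the network.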