% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/network.R
\name{network}
\alias{network}
\title{Define a Network Generator}
\usage{
network(name, netfun, ..., params = list())
}
\arguments{
\item{name}{Character string specifying the name of the current network. Reusing the name of an existing network replaces that network, which can be used to resample a previously defined network.}

\item{netfun}{Character name of the user-defined network-generating function. This can be any R function that returns a matrix of friend IDs of dimension \code{c(n, Kmax)}.
The function must accept a named argument \code{n} that specifies the total sample size of the network.
The matrix of network IDs should have \code{n} rows and \code{Kmax} columns, where each row \code{i} contains a vector of unique IDs in \code{1:n} that are \code{i}'s friends
(observations that can influence \code{i}'s node distribution), except for \code{i} itself.
Arguments to \code{netfun} can be passed either as named arguments to the \code{network} function itself or as a named list of parameters \code{params}.
These network arguments can themselves be functions of the previously defined node names,
allowing network sampling itself to depend on the previously simulated node values, as shown in Example 2.}

\item{...}{Named arguments specifying distribution parameters that are accepted by the network sampling function in \code{netfun}.
These parameters can be R expressions that are themselves formulas of the past node names.}

\item{params}{A list of additional named parameters to be passed on to the \code{netfun} function.
The parameters have to be either constants or character strings of R expressions of the past node names.}
}
\value{
A list containing the network object(s) of type \code{DAG.net}. This list is used when data are simulated with the \code{sim} function.
}
\description{
Define a network generator by providing a function (using the argument \code{netfun}) which will simulate a network of connected friends for observations \code{i} in \code{1:n}.
This network then serves as a backbone for defining and simulating from the structural equation models for dependent data.
In particular, the network allows new nodes to be defined as functions of the previously simulated node values of \code{i}'s friends, across all observations \code{i}.
Let \code{F_i} denote the set of friends of one observation \code{i} (observations in \code{F_i} are assumed to be "connected" to \code{i}) and
refer to the union of these sets \code{F_i} as a "network" on \code{n} observations, denoted by \code{F}.
A user-supplied network-generating function \code{netfun} should be able to simulate such a network \code{F} by returning a matrix of \code{n} rows,
where each row \code{i} defines a friend set \code{F_i}, i.e., row \code{i} should be a vector of observations in \code{1:n} that are connected to \code{i} (friends of \code{i}),
with the remainder filled by \code{NA}s.
Each friend set \code{F_i} can contain up to \code{Kmax} unique indices \code{j} from \code{1:n}, except for \code{i} itself.
\code{F_i} is also allowed to be empty (row \code{i} has only \code{NA}s), implying that \code{i} has no friends.
The functionality is illustrated in the examples below. For additional information see Details.
To learn how to use the \code{node} function for defining a node as a function of the friend node values, see Syntax and Network Summary Measures.
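
As a minimal sketch of the matrix format described above, the following hypothetical function \code{gen.ring} (an illustration only, not part of the package) connects each observation to the next \code{k} observations on a ring and pads the unused columns with \code{NA}. The name \code{gen.ring} and its arguments \code{k} and \code{Kmax} are assumptions made for this example.
\preformatted{
# Hypothetical network function: friends of i are the next k observations on a ring
gen.ring <- function(n, k = 2, Kmax = 3, ...) {
  n <- as.integer(n)
  stopifnot(k <= Kmax, k < n)
  # n rows, Kmax columns; NA marks an empty friend slot
  NetInd <- matrix(NA_integer_, nrow = n, ncol = Kmax)
  for (i in seq_len(n)) {
    ids <- i + seq_len(k)
    # wrap around past n; since k < n, an ID can never equal i itself
    ids[ids > n] <- ids[ids > n] - n
    NetInd[i, seq_len(k)] <- ids
  }
  NetInd
}
gen.ring(n = 5)
}
Each row holds unique IDs from \code{1:n} other than the row's own index, which is the form that \code{netfun} is expected to return.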
}
\details{
Without the network of friends, the \code{DAG} objects constructed by calling the \code{node} function can only specify structural equation models for independent and identically distributed data.
That is, if no network is specified, for each observation \code{i} a node can be defined conditionally only on \code{i}'s own previously simulated node values.
As a result, any two observations simulated under such a data-generating model are always independent and identically distributed.
Defining a network \code{F} allows one to define a new structural equation model where a node for each observation \code{i} can depend
on its own simulated past, but also on the previously simulated node values of \code{i}'s friends (\code{F_i}).
This is accomplished by allowing the data generating distribution for each observation \code{i}'s node to be defined conditionally
on the past node values of \code{i}'s friends (observations in \code{F_i}).
The network of friends can be used in subsequent calls to the \code{node} function, where new nodes (random variables) defined by the \code{node} function can depend on the node values of \code{i}'s friends
(observations in the set \code{F_i}). During simulation it is assumed that the observations in \code{F_i} can simultaneously influence \code{i}.

Note that the current version of the package does not allow combining time-varying node indexing \code{Var[t]} and network node indexing \code{Var[[net_indx]]}
for the same data generating distribution.

Each argument for the input network can be an evaluable R expression. All formulas are captured by delayed evaluation and are evaluated during the simulation.
Formulas can refer to standard or user-specified R functions that must only apply to the values of previously defined nodes
(i.e. node(s) that were called prior to \code{network()} function call).
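
The two ways of supplying arguments to \code{netfun}, through \code{...} or through \code{params}, can be sketched as follows. The function \code{gen.star} and its argument \code{hub} are hypothetical and exist only for this illustration. The \code{requireNamespace} guard merely keeps the sketch runnable when the package is not installed.
\preformatted{
# Hypothetical network function: the single friend of every observation is `hub`
gen.star <- function(n, hub = 1, ...) {
  NetInd <- matrix(as.integer(hub), nrow = n, ncol = 1)
  NetInd[hub, 1] <- NA_integer_   # an observation cannot be its own friend
  NetInd
}
if (requireNamespace("simcausal", quietly = TRUE)) {
  library(simcausal)
  # Argument passed directly through `...`
  D1 <- DAG.empty() + network("star.net", netfun = "gen.star", hub = 2)
  # The same argument passed through `params`
  D2 <- DAG.empty() + network("star.net", netfun = "gen.star", params = list(hub = 2))
}
}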
}
\section{Syntax}{

The \code{network} function call that defines the network of friends can be added to a growing \code{DAG} object by using \code{'+'} syntax, much like a new \code{node} is added to a \code{DAG}.
Subsequently defined nodes (\code{node} function calls) can employ the double square bracket subsetting syntax to reference previously simulated node values
for specific friends in \code{F_i} simultaneously across all observations \code{i}.
For example, \code{VarName[[net_indx]]} can be used inside the \code{node} formula to reference the node \code{VarName} values of \code{i}'s friends in \code{F_i[net_indx]},
simultaneously across all \code{i} in \code{1:n}.

The friend subsetting index \code{net_indx} can be any non-negative integer vector that takes values from 0 to \code{Kmax},
where 0 refers to the \code{VarName} node values of observation \code{i} itself (this is equivalent to just using \code{VarName} in the \code{node} formula),
a \code{net_indx} value of 1 refers to node \code{VarName} values for observations in \code{F_i[1]}, across all \code{i} in \code{1:n}
(that is, the value of \code{VarName} of \code{i}'s first friend \code{F_i[1]}, if the friend exists and \code{NA} otherwise),
and so on, up to a \code{net_indx} value of \code{Kmax}, which references the \code{VarName} values of the last friend, as defined by observations in \code{F_i[Kmax]} across all \code{i}.
Note that \code{net_indx} can be a vector (e.g., \code{net_indx=c(1:Kmax)}),
in which case the result of the query \code{VarName[[c(1:Kmax)]]} is a matrix of \code{Kmax} columns and \code{n} rows.
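
The lookup performed by \code{VarName[[net_indx]]} can be pictured with a small base R emulation. This is a conceptual sketch only, not the package's implementation. The objects \code{NetInd} and \code{W} and the helper \code{friend_values} are hypothetical.
\preformatted{
# Hypothetical network matrix for n = 4 observations with Kmax = 2
NetInd <- rbind(c(2L, 3L),
                c(1L, NA),
                c(NA, NA),
                c(1L, 2L))
# Hypothetical previously simulated node values, one per observation
W <- c(10, 20, 30, 40)
# Emulates W[[net_indx]]: column j holds the value of W for the j-th friend of each i
friend_values <- function(x, NetInd, net_indx) {
  out <- sapply(net_indx, function(j) if (j == 0) x else x[NetInd[, j]])
  matrix(out, nrow = nrow(NetInd), ncol = length(net_indx))
}
friend_values(W, NetInd, 1)    # first friend only: an n x 1 matrix
friend_values(W, NetInd, 1:2)  # all friends: an n x Kmax matrix, NA for missing friends
friend_values(W, NetInd, 0)    # each observation's own value
}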

By default, \code{VarName[[j]]} evaluates to missing (\code{NA}) when observation \code{i} does not have a friend under \code{F_i[j]} (i.e., in the \code{j}th spot of \code{i}'s friend set).
This default behavior however can be changed to return 0 instead of \code{NA}, by passing an additional argument \code{replaceNAw0 = TRUE} to the corresponding \code{node} function.
}

\section{Network Summary Measures}{

One can also define summary measures of the network covariates by specifying a node formula that applies an R function to the result of \code{VarName[[net_indx]]}.
The rules for defining and applying such summary measures are identical to the rules for defining summary measures for time-varying nodes \code{VarName[t_indx]}.
For example, use \code{sum(VarName[[net_indx]])} to define a summary measure as a sum of \code{VarName} values of friends in \code{F_i[net_indx]}, across all observations \code{i} in \code{1:n}.
Similarly, use \code{mean(VarName[[net_indx]])} to define a summary measure as a mean of \code{VarName} values of friends in \code{F_i[net_indx]}, across all \code{i}.
For more details on defining such summary functions see the \code{simcausal} vignette.
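
As a conceptual sketch in base R (not the package's internal code), the summary \code{sum(VarName[[1:Kmax]])} combined with \code{replaceNAw0 = TRUE} can be pictured as a row-wise sum over the matrix of friend values after missing friends are set to 0. The matrix \code{netW} below is hypothetical.
\preformatted{
# Hypothetical n x Kmax matrix of friend values, of the kind W[[1:Kmax]] refers to
netW <- rbind(c(20, 30),
              c(10, NA),
              c(NA, NA),
              c(10, 20))
rowSums(netW)            # NA for every observation with a missing friend slot
netW0 <- netW
netW0[is.na(netW0)] <- 0 # conceptually what replaceNAw0 = TRUE does
rowSums(netW0)           # one well-defined sum per observation i
}
Without the replacement, any observation with fewer than \code{Kmax} friends would receive a missing summary value, which is why the examples below pass \code{replaceNAw0 = TRUE}.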
}
\examples{
#--------------------------------------------------------------------------------------------------
# EXAMPLE 1. USING igraph R PACKAGE TO SIMULATE NETWORKS
#--------------------------------------------------------------------------------------------------

#--------------------------------------------------------------------------------------------------
# Example of a network sampler, will be provided as "netfun" argument to network(, netfun=);
# Generates a random graph according to the G(n,m) Erdos-Renyi model using the igraph package;
# Returns (n,Kmax) matrix of net IDs (friends) by row;
# Row i contains the IDs (row numbers) of i's friends;
# i's friends are assumed connected to i and can influence i in equations defined by node();
# When i has fewer than Kmax friends, the remaining i row entries are filled with NAs;
# Argument m_pn: > 0
# a total number of edges in the network as a fraction (or multiplier) of n (sample size)
#--------------------------------------------------------------------------------------------------
gen.ER <- function(n, m_pn, ...) {
  m <- as.integer(m_pn*n)
  if (n<=10) m <- 20
  igraph.ER <- igraph::sample_gnm(n = n, m = m, directed = TRUE)
  sparse_AdjMat <- igraph.to.sparseAdjMat(igraph.ER)
  NetInd_out <- sparseAdjMat.to.NetInd(sparse_AdjMat)
  return(NetInd_out$NetInd_k)
}

D <- DAG.empty()
# Sample ER model network using igraph::sample_gnm with m_pn argument:
D <- D + network("ER.net", netfun = "gen.ER", m_pn = 50)
# W1 - categorical (6 categories, 1-6):
D <- D +
  node("W1", distr = "rcat.b1",
        probs = c(0.0494, 0.1823, 0.2806, 0.2680, 0.1651, 0.0546)) +
# W2 - binary infection status, positively correlated with W1:
  node("W2", distr = "rbern", prob = plogis(-0.2 + W1/3)) +
# W3 - binary confounder:
  node("W3", distr = "rbern", prob = 0.6)
# A[i] is a function of W1[i] and the totals of i's friends' values of W1, W2 and W3:
D <- D + node("A", distr = "rbern",
              prob = plogis(2 + -0.5 * W1 +
                            -0.1 * sum(W1[[1:Kmax]]) +
                            -0.4 * sum(W2[[1:Kmax]]) +
                            -0.7 * sum(W3[[1:Kmax]])),
              replaceNAw0 = TRUE)
# Y[i] is a function of netW3 (friends of i W3 values) and the total N of i's friends
# who are infected AND untreated:
D <- D + node("Y", distr = "rbern",
              prob = plogis(-1 + 2 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
                            -2 * sum(W3[[1:Kmax]])
                            ),
              replaceNAw0 = TRUE)
# Can add N untreated friends to the above outcome Y equation: sum(1 - A[[1:Kmax]]):
D <- D + node("Y", distr = "rbern",
              prob = plogis(-1 + 1.5 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
                            -2 * sum(W3[[1:Kmax]]) +
                            0.25 * sum(1 - A[[1:Kmax]])
                            ),
              replaceNAw0 = TRUE)
# Can add N infected friends at baseline to the above outcome Y equation: sum(W2[[1:Kmax]]):
D <- D + node("Y", distr = "rbern",
              prob = plogis(-1 + 1 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
                            -2 * sum(W3[[1:Kmax]]) +
                            0.25 * sum(1 - A[[1:Kmax]]) +
                            0.25 * sum(W2[[1:Kmax]])
                            ),
              replaceNAw0 = TRUE)
Dset <- set.DAG(D, n.test = 100)
# Simulating data from the above SEM:
datnet <- sim(Dset, n = 1000, rndseed = 543)
head(datnet)
# Obtaining the network object from simulated data:
net_object <- attributes(datnet)$netind_cl
# Max number of friends:
net_object$Kmax
# Network matrix
head(attributes(datnet)$netind_cl$NetInd)

#--------------------------------------------------------------------------------------------------
# EXAMPLE 2. USING CUSTOM NETWORK GENERATING FUNCTION
#--------------------------------------------------------------------------------------------------

#--------------------------------------------------------------------------------------------------
# Example of a user-defined network sampler(s) function
# Arguments K, bslVar[i] (W1) & nF are evaluated in the environment of the simulated data then
# passed to genNET() function
  # - K: maximum number of friends for any unit
  # - bslVar[i]: used for constructing weights for the probability of selecting i as
  # someone else's friend (weighted sampling); when missing, sampling is uniform
  # - nF[i]: total number of friends that need to be sampled for observation i
#--------------------------------------------------------------------------------------------------
genNET <- function(n, K, bslVar, nF, ...) {
  prob_F <- plogis(-4.5 + 2.5*c(1:K)/2) / sum(plogis(-4.5 + 2.5*c(1:K)/2))
  NetInd_k <- matrix(NA_integer_, nrow = n, ncol = K)
  nFriendTot <- rep(0L, n)
  for (index in (1:n)) {
    FriendSampSet <- setdiff(c(1:n), index)
    nFriendSamp <- max(nF[index] - nFriendTot[index], 0L)
    if (nFriendSamp > 0) {
      if (length(FriendSampSet) == 1)  {
        friends_i <- FriendSampSet
      } else {
        friends_i <- sort(sample(FriendSampSet, size = nFriendSamp,
                          prob = prob_F[bslVar[FriendSampSet] + 1]))
      }
      NetInd_k[index, ] <- c(as.integer(friends_i),
                            rep_len(NA_integer_, K - length(friends_i)))
      nFriendTot[index] <- nFriendTot[index] + nFriendSamp
    }
  }
  return(NetInd_k)
}

D <- DAG.empty()
D <- D +
# W1 - categorical confounder (6 categories, 0-5):
  node("W1", distr = "rcat.b0",
        probs = c(0.0494, 0.1823, 0.2806, 0.2680, 0.1651, 0.0546)) +
# W2 - binary infection status at t=0, positively correlated with W1:
  node("W2", distr = "rbern", prob = plogis(-0.2 + W1/3)) +
# W3 - binary confounder:
  node("W3", distr = "rbern", prob = 0.6)

# def.nF: total number of friends for each i (0-K), each def.nF[i] is influenced by categorical W1
K <- 10
set.seed(12345)
normprob <- function(x) x / sum(x)
p_nF_W1_mat <- apply(matrix(runif((K+1)*6), ncol = 6, nrow = (K+1)), 2, normprob)
colnames(p_nF_W1_mat) <- paste0("p_nF_W1_", c(0:5))
create_probs_nF <- function(W1) t(p_nF_W1_mat[,W1+1])
vecfun.add("create_probs_nF")
D <- D + node("def.nF", distr = "rcat.b0", probs = create_probs_nF(W1))

# Adding the network generator that depends on nF and categorical W1:
D <- D + network(name="net.custom", netfun = "genNET", K = K, bslVar = W1, nF = def.nF)
# Define A[i] as a function of W1[i] and the totals of i's friends' values of W1, W2 and W3:
D <- D + node("A", distr = "rbern",
              prob = plogis(2 + -0.5 * W1 +
                            -0.1 * sum(W1[[1:Kmax]]) +
                            -0.4 * sum(W2[[1:Kmax]]) +
                            -0.7 * sum(W3[[1:Kmax]])),
              replaceNAw0 = TRUE)
# Y[i] depends on the total N of i's friends who are infected AND untreated
# and on a function of the friends' W3 values:
D <- D + node("pYRisk", distr = "rconst",
              const = plogis(-1 + 2 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
                              -1.5 * sum(W3[[1:Kmax]])),
              replaceNAw0 = TRUE)

D <- D + node("Y", distr = "rbern", prob = pYRisk)
Dset <- set.DAG(D, n.test = 100)

# Simulating data from the above SEM:
datnet <- sim(Dset, n = 1000, rndseed = 543)
head(datnet, 10)
# Obtaining the network object from simulated data:
net_object <- attributes(datnet)$netind_cl
# Max number of friends:
net_object$Kmax
# Network matrix
head(attributes(datnet)$netind_cl$NetInd)
plotDAG(Dset)
}
\seealso{
\code{\link{igraph.to.sparseAdjMat}}; \code{\link{sparseAdjMat.to.NetInd}}; \code{\link{NetInd.to.sparseAdjMat}}; \code{\link{sparseAdjMat.to.igraph}}
}

