Compute the adjacency matrix of a network from a database of movement records.

This function computes the adjacency matrix of a network of facilities across which subjects can be transferred. The matrix is computed from a database that contains the records of the subjects' stays in the facilities. This function is a simple wrapper around the two functions edgelist_from_base, which computes the edgelist of the network from the database, and matrix_from_edgelist, which converts the edgelist into the adjacency matrix.

Usage

matrix_from_base(
  base,
  window_threshold = 365,
  count_option = "successive",
  prob_params = c(0.0036, 1/365, 0.128),
  condition = "dates",
  noloops = TRUE,
  nmoves_threshold = NULL,
  flag_vars = NULL,
  flag_values = NULL,
  verbose = FALSE
)

Arguments

base

(data.table) A database of records of stays of subjects in facilities. The table should have at least the following columns:

subjectID (character) unique subject identifier
facilityID (character) unique facility identifier
admDate (POSIXct) date of admission to the facility
disDate (POSIXct) date of discharge from the facility

window_threshold

(integer) A number of days. If two stays of a subject at two facilities occur within this window, they constitute a connection between the two facilities (provided that any other conditions are met).

count_option

(character) How to count connections. Options are "successive", "probability" or "all". See details.

prob_params

(vector of numeric) Three numerical values used to calculate the probability that a movement causes an introduction from hospital A to hospital B. For use with count_option = "probability". See Donker T, Wallinga J, Grundmann H. (2010) <doi:10.1371/journal.pcbi.1000715> for more details.

prob_params[1] is the rate of acquisition in hospital A (related to the length of stay, LOS, in hospital A). Default: 0.0036
prob_params[2] is the rate of loss of colonisation (related to the time between admissions). Default: 1/365
prob_params[3] is the rate of transmission to other patients in hospital B (related to LOS in hospital B). Default: 0.128

condition

(character) Condition(s) used to decide what constitutes a connection. Can be "dates", "flags", or "both". See details.

noloops

(boolean) Whether to remove transfers within the same node (loops). Defaults to TRUE, which removes loops by setting the matrix diagonal to 0. If FALSE, loops are kept.

nmoves_threshold

(numeric) A threshold for the minimum number of subject transfers between two facilities. Set to NULL to deactivate. Defaults to NULL.

flag_vars

(list) Additional variables that can help flag a transfer, besides the dates of admission and discharge. Must be a named list of two character vectors giving the names of the columns that can flag a transfer: the column that can flag a potential origin, and the column that can flag a potential target. The list must be named with "origin" and "target". E.g.: list("origin" = "var1", "target" = "var2"). See details.

flag_values

(list) A named list of two character vectors containing the values of the variables in flag_vars that are matched to flag a potential transfer. The list must be named with "origin" and "target". The character vectors may be of length greater than one. E.g.: list("origin" = c("value1", "value2"), "target" = c("value2", "value2")). The values in "origin" and "target" are the values that flag a potential origin of a transfer, or a potential target, respectively. See details.

verbose

(boolean) If TRUE, print the computation steps. Defaults to FALSE.
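To make the role of window_threshold more concrete, here is a minimal base R sketch of how successive stays of one subject could be turned into connections. It is not the package's implementation: the toy data, the helper names (stays, pairs, connections), and in particular the choice to measure the gap from the discharge of one stay to the admission of the next are assumptions made for illustration only.

```r
# Toy records for a single subject; column names follow the `base` argument.
stays <- data.frame(
  subjectID  = c("s1", "s1", "s1"),
  facilityID = c("f01", "f02", "f03"),
  admDate    = as.POSIXct(c("2020-01-01", "2020-02-01", "2022-01-01"), tz = "UTC"),
  disDate    = as.POSIXct(c("2020-01-10", "2020-02-05", "2022-01-03"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order stays chronologically within each subject.
stays <- stays[order(stays$subjectID, stays$admDate), ]
n <- nrow(stays)

# Pair each stay with the one that follows it (successive stays).
# Assumption: the gap runs from discharge to the next admission.
pairs <- data.frame(
  subjectID    = stays$subjectID[-n],
  next_subject = stays$subjectID[-1],
  origin       = stays$facilityID[-n],
  target       = stays$facilityID[-1],
  gap_days     = as.numeric(difftime(stays$admDate[-1], stays$disDate[-n],
                                     units = "days")),
  stringsAsFactors = FALSE
)

# Keep pairs that belong to the same subject and fall within the window.
window_threshold <- 365
keep <- pairs$subjectID == pairs$next_subject &
  pairs$gap_days <= window_threshold
connections <- pairs[keep, c("origin", "target")]
```

In this toy example only the move from f01 to f02 falls within the 365-day window; the later stay at f03 begins almost two years after the discharge from f02 and so does not count as a connection under these assumptions.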

Value

A square matrix, the adjacency matrix of the network.

Details

The edgelist contains the information on the connections between nodes of the network, that is the movements of subjects between facilities. The edgelist can be in two different formats: long or aggregated. In long format, each row corresponds to a single movement between two facilities, therefore only two columns are needed, one containing the origin facilities of a movement, the other containing the target facilities. In aggregated format, the edgelist is aggregated by unique pairs of origin-target facilities. Thus, each row corresponds to a unique connection between two facilities, and the table contains an additional variable which is the count of the number of movements recorded for the pair. If the edgelist is provided in long format, it will be aggregated to compute the matrix.
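The two edgelist formats, and the conversion to an adjacency matrix, can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by edgelist_from_base or matrix_from_edgelist: the column names origin, target and N, and the toy data, are hypothetical.

```r
# Hypothetical long-format edgelist: one row per movement.
el_long <- data.frame(
  origin = c("f01", "f01", "f02", "f03", "f03"),
  target = c("f02", "f02", "f03", "f01", "f03"),
  stringsAsFactors = FALSE
)

# Aggregated format: one row per unique origin-target pair, plus a count N.
el_aggr <- aggregate(N ~ origin + target,
                     data = cbind(el_long, N = 1),
                     FUN = sum)
el_aggr$origin <- as.character(el_aggr$origin)
el_aggr$target <- as.character(el_aggr$target)

# Square adjacency matrix over every facility seen as origin or target.
facilities <- sort(unique(c(el_aggr$origin, el_aggr$target)))
adj <- matrix(0,
              nrow = length(facilities), ncol = length(facilities),
              dimnames = list(facilities, facilities))

# Fill cells by (row name, column name) pairs: rows are origins, columns targets.
adj[cbind(el_aggr$origin, el_aggr$target)] <- el_aggr$N

# Drop loops, mirroring the effect described for noloops = TRUE.
diag(adj) <- 0
```

Here rows are origins and columns are targets, which is a convention chosen for the sketch. The two f01 to f02 movements collapse into a single aggregated row with N = 2, and the f03 to f03 loop is zeroed on the diagonal.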

Examples

mydb <- create_fake_subjectDB(n_subjects = 100, n_facilities = 10)
myBase <- checkBase(mydb)
#> Checking for missing values...
#> Checking for duplicated records...
#> Removed 0 duplicates
#> Done.
matrix_from_base(myBase)
#>     f01 f02 f03 f04 f05 f06 f07 f08 f09 f10
#> f01   0   1   1   3   1   1   2   0   0   1
#> f02   1   0   1   0   2   1   0   3   2   1
#> f03   2   1   0   0   2   4   7   2   1   0
#> f04   0   3   1   0   2   0   0   1   0   1
#> f05   0   2   2   1   0   2   0   0   2   2
#> f06   0   1   4   5   2   0   2   2   2   0
#> f07   0   2   2   1   5   2   0   2   1   2
#> f08   1   1   1   0   0   2   2   0   1   0
#> f09   0   1   3   1   1   1   2   2   0   0
#> f10   3   0   1   0   3   3   1   1   2   0