knitr::opts_chunk$set(echo = TRUE)

A Personalized Disease Network (PDN) is a connected graph that represents the disease evolution of an individual patient. We use diagnoses, medical interventions, and their dates to build a PDN. Each node in the PDN represents an event (e.g. Heart failure) and each node is connected to another by a directed edge if the dates fulfill a certain criterion. PDN were introduced by Javier Cabrera and proved to be effective for improving goodness of fit of survival models, dominating individual comorbidity variables. (Cabrera et. al. 2016)

Like many other R packages, the simplest way to obtain glmnet is to install it directly from CRAN. Type the following command in R console:

install.packages("PDN", repos = "https://cran.us.r-project.org")

Users may change the repos options depending on their locations and preferences. Other options such as the directories where to install the packages can be altered in the command. For more details, see help(install.packages). Here the R package has been downloaded and installed to the default directories. Alternatively, users can download the package source at "https://cran.r-project.org/package=PDN" and type Unix commands to install it to the desired location.

The purpose of this section is to give users a general sense of the package, including the components, what they do and some basic usage. We will briefly go over the main functions, see the basic operations and have a look at the outputs. Users may have a better idea after this section what functions are available, which one to choose, or at least where to seek help. More details are given in later sections.

First, we load the glmnet package:

```
library(PDN)
```

The package contains five functions: **buildnetworks**, **datecut**, **draw.PDN**, **draw.PDN.circle** and **draw.PDN.cluster**, and three sample data sets: **comorbidity_data**, **demographic_data** and **survival_data**.

You can visit the help page of the package using the following command:

help(package = "PDN")

For **comorbidity_data**, its columns represent different diagnoses, its rows represent different patients, and the entries represent the time difference between the respective event and January 1st 1970. NA means the patient does not have the respective diagnosis.

```
summary(comorbidity_data)
```

For **demographic_data**, it contains the demographic information of each patients It has five columns which are sex, race, hispan, dshyr and prime.

```
summary(demographic_data)
```

For **survival_data**, it contains information of survival days from admission to death, as well as censor information regarding whether the patient is dead or not.

```
summary(survival_data)
```

In order to create a PDN based on **comorbidity_data**, we need to find the criterion for each pair of diagnoses that define the connection of the network.

Generally, it is important to choose an appropriate cutoff point such that it captures information of the interaction between the two diagnoses.

For CMTHY and HTN, the distribution of the time difference is shown below:

abc = comorbidity_data$CMTHY - comorbidity_data$HTN abc = abc[!is.na(abc)] bins=seq(floor(min(abc)),ceiling(max(abc)+1000),500) hist(abc,breaks=bins, yaxt="n", xaxt="n", main = "", ylab="Number of Patients", xlab="Time difference between CMTHY and HTN in days", col="grey" ) axis(side=2, at=axTicks(2), labels=formatC(axTicks(2), format="d", big.mark=',')) axis(side=1, at=axTicks(1), labels=formatC(axTicks(1), format="d", big.mark=',')) box()

We used Cox PH regression to find the best criterion for cutoff point following the rule proposed by (Cabrera et. al. 2016).

k1 = datecut(comorbidity_data,survival_data[,1],survival_data[,2]) k1[1:20]

By using function buildnetworks, we past **comorbidity_data** and criterion information we get from the datecut into the function, then we will get the adjacency matrix for the network shown below:

a = buildnetworks(comorbidity_data,k1) a[1:10,1:10]

In our case, the nodes have chronological order, and we opt to incorporate this order in the network visualization for a more informative and clean result. Below left is the example PDN from function **draw.PDN.circle**.

PDN for individual patient:

datark = t(apply(comorbidity_data,1,rank)) dak = sort(datark[1,]) draw.PDN.circle(a[1,],dak) title("PDN for Patient 1")

Here is the example using the functioin **draw.PDN**, which utilize network and ggplot2 for better visualization:

nn = names(comorbidity_data) draw.PDN(a[7,],nn)

By looping each patient through draw.PDN.circle, we obtain the visualizations for the set of patients:

par(mfrow=c(4,5)) for(i in 1 : 20){ dak = apply(datark,2,sort) draw.PDN.circle(a[i,],dak[i,]) title(main=paste("Patient",i)) }

Here is an example of using **draw.PDN.cluster** to visulaize network of cluster, we use different colors to represent weighted connection of each nodes.

##Clustering Example k1 = datecut(comorbidity_data,survival_data[,1],survival_data[,2]) a = buildnetworks(comorbidity_data,k1) datark = t(apply(comorbidity_data,1,rank)) require(survival) zsq = NULL for(i in 1:ncol(a)){ a1 = (summary(coxph(Surv(as.numeric(survival_data[,1]),survival_data[,2]) ~ a[,i], data=as.data.frame(a)))$coefficient[,4])^2 zsq = cbind(zsq,a1) } zsq = as.numeric(zsq) wi=zsq/sum(zsq,na.rm=TRUE) wi[wi<10^ -3]=10^ -3 wi[is.na(wi)]=10^ -3 #weighted matrix wa = NULL for(i in 1:ncol(a)){ wa = cbind(wa,a[,i]*wi[i]) } #PCA pr.out=prcomp(wa) x.svd=svd(wa) x.score1 <- wa %*% x.svd$v x.score2 <- x.svd$u %*% diag(x.svd$d) ##HC cluster using the first 8 PCA scores dp<-dist(x.score2[,1:8]) hcp<-hclust(dp, method="ward.D") ##Plot Network s1=rev(sort(apply(a[cutree(hcp,3)==3,],2,mean)))[1:50] dak = sort(apply(datark[cutree(hcp,3)==3,],2,mean,na.rm=TRUE)) names(dak) = unlist(strsplit(names(dak),"DAT")) draw.PDN.cluster(s1,dak)

In the below example using PDN network information to perform regression, we combine comorbidity information and the adjacency matrix from buildnetworks as our design matrix which is shown below:

x = cbind(!is.na(comorbidity_data),a) x[1:10,1:10]

We then use the **glmnet** package to perform k-fold cross-validation for the Cox model. We extract two optimal lambdas, that is lambda.min: the ?? at which the minimal MSE is achieved, and lambda.1se: the largest ?? at which the MSE is within one standard error of the minimal MSE.
We can check the active covariates in our model and see their coefficients based on the lambda we have chosen.

require(survival) require(glmnet) y=Surv(survival_data[,1],survival_data[,2]) glm1 = cv.glmnet(x,y,family="cox",alpha=0.8) b = glmnet(x,y,family="cox",alpha=0.8,lambda=c(glm1$lambda.min,glm1$lambda.1se))$beta b[1:20,]

Through stepwise model selection based on the AIC criterion, we can see that the network variables dominate individual comorbidity variables. We conclude that the network information is significant for improving the model compared to just comorbidity information.

isel = b[,2]!=0 bcox = coxph(y~.,data=data.frame(x[,isel])) step1 = step(bcox) step1$anova

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.