Survival Analysis of MIMIC-II ICU patients using Time-Sclicing Cox Regression Method

From DIR
Jump to: navigation, search
Author Jingyi Shi, Amirreza Niakanlahiji, Riyi Qiu
For Dataset MIMIC
All Blogs of MIMIC Survival Analysis of MIMIC-II ICU patients using Time-Sclicing Cox Regression Method

In this blog, we will show a real course project with a set of experiments to help users get started with the MIMIC dataset. This project was based on imitating a recent publication titled Mortality Prediction in ICUs Using a Novel Time-Slicing Cox Regression Method (Wang et al.) [2], in which extracted MIMIC data were applied to a new method of survival analysis. In order to compare our results with Wang's paper, we used the same version of MIMIC, which was MIMIC-II v2.6.


Survival analysis is generally using a set of statistical procedures to analyze event occurrence by given time-to-event data. Time, in this context, means the number of years, months, weeks, days or any time units from the beginning of following up an individual until the event that researchers are interested in occurs. An event can be death, disease incidence, relapse, recovery or other tractable change that happens to the individual. [1] Without doubts, survival analysis can play a vital role to improve health care by correctly predicting events, so that hospitals are able to make a difference in advance. However, every analysis in the clinical context using either statistical methods or machine learning methods is facing a considerable problem, that is there are usually no large enough data sets, especially for rare events, to make trends or thresholds clear enough when modeling. In order to cursorily solve this problem, the Time-Slicing Cox Regression (TS-Cox) method can be an alternative being referred to. [2] The TS-Cox is principally based on Cox regression method but using Time-Slicing before modeling. Defined by the TS-Cox, one individual is sliced into multiple windows (new individuals) based on a window size and a step size. Therefore, the sample size of individuals for training is exponentially growing.

In this project, we imitated the TS-Cox method in the publication of Wang et al. [2] in order to learn and improve. Here, we used the ICU patient data in the same database--the MIMIC-II (Medical Information Mart for Intensive Care II) v2.6 database [3]. After extracting the same data and waveforms, we defined an individual to be one sliced window of an ICU patient, the event to be the death of the patient, and the time to be failure time [2] (in minutes) of the window. Three different window sizes (512, 1024, 2048), three step sizes (50, 100, 200), two sampling methods (stratified sampling and unstratified sampling), and two sets of features (a set of 14 features and a set of 20 features) were performed during the survival analysis, hence 36 different results including models and evaluations finally.

In the rest of the blog, we elaborate the procedures of using the TS-Cox method as well as 10-fold cross-validation for ICU-patient survival analysis in the Method section. Models after training and evaluation comparisons after testing are clarified in the Results section. Finally, we draw a conclusion, and discuss the risks of using TS-Cox and directions of future improvement in the Discussion and Conclusion section.


In order to imitate the TS-Cox work of Wang et al., we implemented a workflow as follows: (1) data extraction; (2) time slicing and missing data handling; (3) feature computation, including 14 core features and 6 entropy features; (4) modeling and evaluations. Although there was only one window size (1024 minutes), one step size (100 minutes), and one sampling method (unstratified method) implemented in Wang's paper, we performed more--three window sizes, three step sizes, and two sampling methods in total--in consideration of unintended findings.

Data Source

The data used for analysis were collected, converted and extracted from the MIMIC-II v2.6 database [3], which is accessible via the MIMIC-II Explorer ( [4]. The explorer is an SQL interface to the MIMIC-II database, on which we directly queried for relevant data. Two tables--icustay_detail and waveform_trends --were used and two types of information were especially collected: a patient's death status--icustay_expire_flg attribute in the icustay_detail table, and the waveform file name of a patient's last ICU stay--filename attribute in the waveform_trends table. The SQL statement for the data is shown below. There were 2781 unique waveform file names of last ICU stays as a result.

SELECT icustay_detail.subject_id, icustay_detail.icustay_expire_flg, waveform_trends.filename 
FROM icustay_detail INNER JOIN waveform_trends ON icustay_detail.subject_id=waveform_trends.subject_id AND waveform_trends.filename<>'[none]' AND icustay_detail.subject_icustay_total_num=icustay_detail.subject_icustay_seq

After achieving the file names, waveforms were automatically downloaded from by running our shell script attached below.

# Shell script
# Download waveforms by given waveform file names
while IFS='' read -r line ---- [[ -n "$line" ]]; do
	echo "downloading: $line"
done < "$1"

# Step 1: Save the shell script to a file named
# Step 2: Extract and save all the file names from the previous SQL query. A file name a line. Name this file filenames.txt
# Step 3: Run "./ filenames.txt" in terminal window
# Downloading takes time and storage. Be patient ...

The downloaded waveforms were signal files which can not be directly used for analysis, so a conversion was then executed by running another shell script below using the rdsamp command provided by WFDB software package [5].

# Shell script
# Convert waveforms to numerical data
while IFS='' read -r line ---- [[ -n "$line" ]]; do
	echo "converting: $line"
	rdsamp -r original_dat/$line -H -v -pd >$line.txt
done < "$1"

# Step 1: Install the WFDB software package follow the tutorial on
# Step 2: Save the shell script to a file named
# Step 3: Run "./ filenames.txt" in terminal window
# For more information about conversion, see the tutorial on

By conversion, a set of attributes were listed in a numerical way, including time, HR (heart rate), ABPSys, PULSE, RESP, SpO2 (peripheral capillary oxygen saturation), NBPSys, etc.. Because we used time, death status, HR and SpO2 data as features in the project, we extracted (by two Python scripts below) only the numerical waveforms that included full HR and SpO2 records without missing columns or "-" values. Finally, there were 1245 numerical waveforms left for analysis.

# Python script 1

data=open('<your_path>\mimic2\signals_txt\\filenames.txt','r') #==change==
log=open('<your_path>\mimic2\signals_txt_HR SpO2\log.txt','w') #==change==
namefile=open('<your_path>\mimic2\signals_txt_HR SpO2\\filenames_in_this_folder.txt','w') #==change==
for line in data:
    if len(line)>0:
        print('converting: '+str(progress))
        data1=open('<your_path>\mimic2\signals_txt\\'+fileN+'.txt', 'r') #==change==
        if len(header)>1:
            HR_i=column.index("     HR") if "     HR" in column else None
            SpO2_i=column.index("   SpO2") if "   SpO2" in column else None
            if HR_i>0 and SpO2_i>0:
                data1.readline() #remove header
                output=open('<your_path>\mimic2\signals_txt_HR SpO2\\'+fileN+'.txt', 'w') #==change==
                for line1 in data1:
                    value[0]=" ".join(value[0].split())
                    value[HR_i]=" ".join(value[HR_i].split())
                    value[SpO2_i]=" ".join(value[SpO2_i].split())
log.write('number of original files: '+str(progress)+'\n'+'number of files after removing missing-column files: '+str(file_n)+'\n')

# Python script 2

data_m=open('<your_path>\mimic2\signals_txt_HR SpO2\\filenames_in_this_folder.txt','r') #==change==
log=open('<your_path>\mimic2\signals_txt_HR SpO2 0\log.txt','w') #==change==
filenames_m=open('<your_path>\mimic2\signals_txt_HR SpO2 0\\filenames_in_this_folder.txt','w') #==change==
log_m=open('<your_path>\mimic2\signals_txt_HR SpO2 0\log_null.txt','w') #==change==

log_m.write('A list of bad files with null values: \n')


for line_m in data_m:
    if len(line_m)>0:
        print progress
        data=open('<your_path>\mimic2\signals_txt_HR SpO2\\'+fileN+'.txt', 'r') #==change==
        for line in data:
            if len(line)>0:
        data=open('<your_path>\mimic2\signals_txt_HR SpO2\\'+fileN+'.txt', 'r') #==change==
        for line in data:
            if len(line)>0:
                if HR=='-' or SpO2=='-' or HR=='' or SpO2=='':
        if countline==filelength:
            data1=open('<your_path>\mimic2\signals_txt_HR SpO2\\'+fileN+'.txt', 'r') #==change==
            data1out=open('<your_path>\mimic2\signals_txt_HR SpO2 0\\'+fileN+'.txt', 'w') #==change==
            for line1 in data1:
log.write('number of original files: '+str(progress)+'\n'+'number of files without null values: '+str(file_n)+'\n')

Time Slicing and Missing Data Handling

We developed MIMICParser, which was written in C#, to slice the time windows and to compute the 14 core features for each of the windows. Different window sizes and step sizes can be defined beforehand. For more information and the source code of MIMICParser, one can refer to the link below.
MIMICParser Download

Some of the time windows in the waveforms contained zero elements (0.000), which indicated missing values. In order to cope with this issue, MIMICParser used the following heuristic. The parser removed all records in a window (with n records) containing zero elements. After removal phase, if a window had less than n * threshold (threshold=0.3 in our project) records, the window was silently dropped.

Feature Computation

We extracted 20 features in total besides of failure time (see S_l in Wang's paper [2]) and death status (1 means dead and 0 means alive).

MIMICParser implemented the following formulas to calculate 14 core features (7 measurements for each of HR and SpO2) for each sliced window:

  • Mean
  • Median
If n%2==1, Median(X)=value of ((n+1)/2)th item term.
If n%2==0, Median(X)=value of [(n/2)th item term + (n/2+1)th item term]/2.
  • Variance
Var(X)=\frac{1}{n}\sum_{i=1}^{n} ({x_{i}-\bar{x}})^{2}
  • Skewness
Skewness(X)=\frac{\frac{1}{n}\sum_{i=1}^{n} ({x_{i}-\bar{x}})^{3}}{[\frac{1}{n-1}\sum_{i=1}^{n} ({x_{i}-\bar{x}})^{2}]^{\frac{3}{2}}}
  • Kurtosis
Kurtosis(X)=\frac{\frac{1}{n}\sum_{i=1}^{n} ({x_{i}-\bar{x}})^{4}}{[\frac{1}{n}\sum_{i=1}^{n} ({x_{i}-\bar{x}})^{2}]^{2}}
  • Standard Deviation

R packages pracma [6] and pdc [7] were applied to calculate 6 entropy features (3 entropies for each of HR and SpO2) for each sliced window. The 3 entropies were:

  • Approximate entropy
  • Sample entropy
  • Permutation entropy

The calculations of these entropies were time-consuming because the speed that R imports large time series data is very low. Each dataset (e.g., 1024 window size and 100 step size) was estimated to take more than 60 hours to finish calculating 6 entropy features. Given all window-slicing datasets (9 datasets in total), the computation time would be more than three weeks. So we leveraged the help of an HPC (High-Performance Computing) cluster--VIPER, which was provided by our university. As ran in VIPER, we finally finished running all entropy computation scripts within one week. For the R script computing entropies, one can refer to the code below.

#R script

#configure the folder name "windowsSize_x-step_x" before use

#install.packages("pdc") #uncommend this line if package not installed
#install.packages("pracma") #uncommend this line if package not installed

###input file name##add directory path if needed
con = file("<path_of_MIMICParser_result>/windowsSize_1024-step_100/windowsDump.txt","r") #==change==
##read 1 line
line = readLines(con,1)
### each two lines in input is one window
### use x to identify the even lines
x = 1
while( length(line) != 0 ) {
  ### take all coma off
  t1 = unlist(strsplit(line, "\\,"))
  len = length(t1)
  ## take IDs off
  ts = as.numeric(t1[3:len])
  ## entropy calculations
  ae = approx_entropy(ts, edim = 2)
  se = sample_entropy(ts, edim = 2)
  pe = min(entropyHeuristic(ts)$entropy[,3])
  ## odd lines
  if (x%%2==1) {
    pid = t1[1]
    pidex = t1[2]
    out = c(pid, pidex, ae, se, pe)
  ## even lines
  else {
    out = append(out, ae)
    out = append(out, se)
    out = append(out, pe)
    write(out, file = "<path_of_MIMICParser_result>/windowsSize_1024-step_100/entropy_result.txt", append = TRUE, ncolumns = 8, sep = "\t") #==change==
  line = readLines(con,1)
  x = x + 1

Modeling and Evaluations

After computing all 20 features of windows, 9 datasets were delivered for 10-fold cross-validation using Cox regression for training and a set of evaluation measurements after testing.

The 9 datasets are listed below:

  • Dataset 1. window size 512 and step size 50
  • Dataset 2. window size 512 and step size 100
  • Dataset 3. window size 512 and step size 200
  • Dataset 4. window size 1024 and step size 50
  • Dataset 5. window size 1024 and step size 100
  • Dataset 6. window size 1024 and step size 200
  • Dataset 7. window size 2048 and step size 50
  • Dataset 8. window size 2048 and step size 100
  • Dataset 9. window size 2048 and step size 200

In this stage, three main steps were executed: (1) 9 datasets were split into 720 separate datasets for further 10-fold cross-validation training and testing; (2) to each split dataset for training, Cox Regression was applied; (3) thresholds were computed and a set of evaluation measurements were used including the AUC_ROC (area under Receiver Operating Characteristic curve), AUC_PR (area under Precision-Recall curve), specificity, sensitivity and PPV (positive predictive value).

Splitting Datasets for 10-Fold Cross-Validation

Python scripts were used to create unstratified and stratified datasets for 10-fold cross-validation (Source Code Download). We randomly re-sorted each dataset, equally cut it into ten parts, and numbered these parts 1-10. Part 1 was saved as a testing set and the other nine parts were combined to be a training set. Same sampling process was iterated for nine times on part 2 to part 10.

To implement the stratified method, each dataset was divided into two datasets: one contained all dead records and the other one contained all alive records. Same re-sorting, cutting, numbering, and sampling procedures were executed on each stratified dataset. The only difference here was that before sampling, parts with the same number from both divided datasets (dead records and alive records) were combined into one.

Finally, 9 datasets were split into 720 separate datasets for further 10-fold cross-validation training and testing, of which 180 datasets (9 * (10 training sets + 10 testing sets)) were the products of using unstratified sampling method and 14 features, 180 were with unstratified method and 20 features, 180 were stratified with 14 features, and the other 180 were stratified with 20 features.

Cox Regression for Training

Cox regression, also known as proportional hazards regression, is regression method analyzing time series data targeting to an event. In this project, we used the coxph function in an R package named survival [8] for training based on Cox regression. The coxph function is able to fit a Cox regression model by inputting the formula about time, event, and other features. An example of using coxph is shown below:

model = coxph( Surv(Survival_Time, ICUExpireFlag) ~ HR_Mean + HR_Median + ... + SpO2_Sample_Entropy + SpO2_Permutation_Entropy )

where the ICUExpireFlag referred to the death status (1 means dead and 0 means alive) in last ICU stays.

As a result, we extracted the coefficients in models and computed the values of proportional hazards for the testing sets generated by the dataset splitting step. The values of proportional hazards were computed by the following formula:

ph = exp(b_HR_Mean * HR_Mean + b_HR_Median * HR_Median + ... + b_SpO2_Sample_Entropy * SpO2_Sample_Entropy + b_SpO2_Permutation_Entropy * SpO2_Permutation_Entropy)

where b_FeatureName is the coefficient of the corresponding feature in the modeling result. For the full R script of training and testing, one can refer to the code below.

#R script

#* Save split file in "in_and_out" folder
#* Configure this script
#	* Set "in_and_out" folder path
#	* Set path of packages
#	* Change parameters if needed. See "==change==" marks in this script.
#* Run

#remove result--"models.txt" if re-run the script

#models_and_evaluations.txt (final results)

#==================================First Part===================================================
#install.packages("survival") #uncommend this line and set package path if package not installed


path = "<your_path>/in_and_out/" #set your path here, end by "folder/" #==change==

beta_N = 20 #number of beta/coefficients #==change==

dataset_Start=1 #==change==
dataset_End=18 #==change==
part_Start=1 #==change==
part_End=10 #==change==

for(dataset in dataset_Start:dataset_End) {
  for(part in part_Start:part_End) {
    trainFile = paste(path, "dataset", toString(dataset), "_except_part", toString(part), ".txt", sep="")
    cols = c('Window_Start_Index', 'Survival_Time', 'ID', 'ICUInTime', 'ICUOutTime', 'ICUExpireFlag', 'HR_Mean', 'HR_Median', 'HR_Variance', 'HR_Skewness', 'HR_Kurtosis', 'HR_Standard_Deviation', 'HR_RMSSD', 'HR_Approx_Entropy', 'HR_Sample_Entropy', 'HR_Permutation_Entropy', 'SpO2_Mean', 'SpO2_Median', 'SpO2_Variance', 'SpO2_Skewness', 'SpO2_Kurtosis', 'SpO2_Standard_Deviation', 'SpO2_RMSSD', 'SpO2_Approx_Entropy', 'SpO2_Sample_Entropy', 'SpO2_Permutation_Entropy') #==change==
    trainData = read.table(trainFile, col.names=cols, sep='\t')
    model = coxph(Surv(trainData$Survival_Time,trainData$ICUExpireFlag)~trainData$HR_Mean+trainData$HR_Median+trainData$HR_Variance+trainData$HR_Skewness+trainData$HR_Kurtosis+trainData$HR_Standard_Deviation+trainData$HR_RMSSD+trainData$HR_Approx_Entropy+trainData$HR_Sample_Entropy+trainData$HR_Permutation_Entropy+trainData$SpO2_Mean+trainData$SpO2_Median+trainData$SpO2_Variance+trainData$SpO2_Skewness+trainData$SpO2_Kurtosis+trainData$SpO2_Standard_Deviation+trainData$SpO2_RMSSD+trainData$SpO2_Approx_Entropy+trainData$SpO2_Sample_Entropy+trainData$SpO2_Permutation_Entropy) #==change==
    b_HR_Mean = summary(model)$coefficients["trainData$HR_Mean", "coef"] #get coefficient #==change==
    b_HR_Median = summary(model)$coefficients["trainData$HR_Median", "coef"] #get coefficient #==change==
    b_HR_Variance = summary(model)$coefficients["trainData$HR_Variance", "coef"] #get coefficient #==change==
    b_HR_Skewness = summary(model)$coefficients["trainData$HR_Skewness", "coef"] #get coefficient #==change==
    b_HR_Kurtosis = summary(model)$coefficients["trainData$HR_Kurtosis", "coef"] #get coefficient #==change==
    b_HR_Standard_Deviation = summary(model)$coefficients["trainData$HR_Standard_Deviation", "coef"] #get coefficient #==change==
    b_HR_RMSSD = summary(model)$coefficients["trainData$HR_RMSSD", "coef"] #get coefficient #==change==
    b_HR_Approx_Entropy = summary(model)$coefficients["trainData$HR_Approx_Entropy", "coef"] #get coefficient #==change==
    b_HR_Sample_Entropy = summary(model)$coefficients["trainData$HR_Sample_Entropy", "coef"] #get coefficient #==change==
    b_HR_Permutation_Entropy = summary(model)$coefficients["trainData$HR_Permutation_Entropy", "coef"] #get coefficient #==change==
    b_SpO2_Mean = summary(model)$coefficients["trainData$SpO2_Mean", "coef"] #get coefficient #==change==
    b_SpO2_Median = summary(model)$coefficients["trainData$SpO2_Median", "coef"] #get coefficient #==change==
    b_SpO2_Variance = summary(model)$coefficients["trainData$SpO2_Variance", "coef"] #get coefficient #==change==
    b_SpO2_Skewness = summary(model)$coefficients["trainData$SpO2_Skewness", "coef"] #get coefficient #==change==
    b_SpO2_Kurtosis = summary(model)$coefficients["trainData$SpO2_Kurtosis", "coef"] #get coefficient #==change==
    b_SpO2_Standard_Deviation = summary(model)$coefficients["trainData$SpO2_Standard_Deviation", "coef"] #get coefficient #==change==
    b_SpO2_RMSSD = summary(model)$coefficients["trainData$SpO2_RMSSD", "coef"] #get coefficient #==change==
    b_SpO2_Approx_Entropy = summary(model)$coefficients["trainData$SpO2_Approx_Entropy", "coef"] #get coefficient #==change==
    b_SpO2_Sample_Entropy = summary(model)$coefficients["trainData$SpO2_Sample_Entropy", "coef"] #get coefficient #==change==
    b_SpO2_Permutation_Entropy = summary(model)$coefficients["trainData$SpO2_Permutation_Entropy", "coef"] #get coefficient #==change==
    model_output = matrix(c("dataset", dataset, "except part", part, b_HR_Mean, b_HR_Median, b_HR_Variance, b_HR_Skewness, b_HR_Kurtosis, b_HR_Standard_Deviation, b_HR_RMSSD, b_HR_Approx_Entropy, b_HR_Sample_Entropy, b_HR_Permutation_Entropy, b_SpO2_Mean, b_SpO2_Median, b_SpO2_Variance, b_SpO2_Skewness, b_SpO2_Kurtosis, b_SpO2_Standard_Deviation, b_SpO2_RMSSD, b_SpO2_Approx_Entropy, b_SpO2_Sample_Entropy, b_SpO2_Permutation_Entropy), ncol=beta_N+4, byrow=TRUE) #==change==
    model_output = as.table(model_output)
    model_filename = paste(path, "models.txt", sep="")
    write.table(model_output, model_filename, sep="\t", row.names=FALSE, col.names=FALSE, append=TRUE)
    testFile = paste(path, "dataset", toString(dataset), "_part", toString(part), ".txt", sep="")
    cols = c('Window_Start_Index', 'Survival_Time', 'ID', 'ICUInTime', 'ICUOutTime', 'ICUExpireFlag', 'HR_Mean', 'HR_Median', 'HR_Variance', 'HR_Skewness', 'HR_Kurtosis', 'HR_Standard_Deviation', 'HR_RMSSD', 'HR_Approx_Entropy', 'HR_Sample_Entropy', 'HR_Permutation_Entropy', 'SpO2_Mean', 'SpO2_Median', 'SpO2_Variance', 'SpO2_Skewness', 'SpO2_Kurtosis', 'SpO2_Standard_Deviation', 'SpO2_RMSSD', 'SpO2_Approx_Entropy', 'SpO2_Sample_Entropy', 'SpO2_Permutation_Entropy') #==change==
    testData = read.table(testFile, col.names=cols, sep='\t')
    testData["ph"] = NA
    testData$ph = exp(b_HR_Mean*testData$HR_Mean + b_HR_Median*testData$HR_Median + b_HR_Variance*testData$HR_Variance + b_HR_Skewness*testData$HR_Skewness + b_HR_Kurtosis*testData$HR_Kurtosis + b_HR_Standard_Deviation*testData$HR_Standard_Deviation + b_HR_RMSSD*testData$HR_RMSSD + b_HR_Approx_Entropy*testData$HR_Approx_Entropy + b_HR_Sample_Entropy*testData$HR_Sample_Entropy + b_HR_Permutation_Entropy*testData$HR_Permutation_Entropy + b_SpO2_Mean*testData$SpO2_Mean + b_SpO2_Median*testData$SpO2_Median + b_SpO2_Variance*testData$SpO2_Variance + b_SpO2_Skewness*testData$SpO2_Skewness + b_SpO2_Kurtosis*testData$SpO2_Kurtosis + b_SpO2_Standard_Deviation*testData$SpO2_Standard_Deviation + b_SpO2_RMSSD*testData$SpO2_RMSSD + b_SpO2_Approx_Entropy*testData$SpO2_Approx_Entropy + b_SpO2_Sample_Entropy*testData$SpO2_Sample_Entropy + b_SpO2_Permutation_Entropy*testData$SpO2_Permutation_Entropy) #proportional hazards #==change==
    output_filename = paste(path, "test_results_dataset", toString(dataset), "_part", toString(part), ".txt", sep="")
    write.table(testData, output_filename, sep="\t", row.names=FALSE, col.names=FALSE)
#==================================Second Part===================================================
#install.packages("pROC") #uncommend this line and set package path if package not installed
#install.packages("PRROC") #uncommend this line and set package path if package not installed

library("pROC") #==change==
library("PRROC") #==change==

#path = "your_path" #set your path here if use this part separately #==change==

model_filename = paste(path, "models.txt", sep="")
cols = c('dataset_name1','name2','name3','name4','b_HR_Mean', 'b_HR_Median', 'b_HR_Variance', 'b_HR_Skewness', 'b_HR_Kurtosis', 'b_HR_Standard_Deviation', 'b_HR_RMSSD', 'b_HR_Approx_Entropy', 'b_HR_Sample_Entropy', 'b_HR_Permutation_Entropy', 'b_SpO2_Mean', 'b_SpO2_Median', 'b_SpO2_Variance', 'b_SpO2_Skewness', 'b_SpO2_Kurtosis', 'b_SpO2_Standard_Deviation', 'b_SpO2_RMSSD', 'b_SpO2_Approx_Entropy', 'b_SpO2_Sample_Entropy', 'b_SpO2_Permutation_Entropy') #==change==
modelData = read.table(model_filename, col.names=cols, sep='\t')
modelData["AUC_ROC"] = NA
modelData["Specificity"] = NA
modelData["Sensitivity"] = NA
modelData["PPV"] = NA
modelData["AUC_PR"] = NA
modelData["AUC_ROC_by_PRROC"] = NA
modelData["Threshold"] = NA

i = 0
for(dataset in dataset_Start:dataset_End) {
  for(part in part_Start:part_End) {
    i = i+1
    filename = paste(path, "test_results_dataset", toString(dataset), "_part", toString(part), ".txt", sep="")
    cols = c('Window_Start_Index', 'Survival_Time', 'ID', 'ICUInTime', 'ICUOutTime', 'ICUExpireFlag', 'HR_Mean', 'HR_Median', 'HR_Variance', 'HR_Skewness', 'HR_Kurtosis', 'HR_Standard_Deviation', 'HR_RMSSD', 'HR_Approx_Entropy', 'HR_Sample_Entropy', 'HR_Permutation_Entropy', 'SpO2_Mean', 'SpO2_Median', 'SpO2_Variance', 'SpO2_Skewness', 'SpO2_Kurtosis', 'SpO2_Standard_Deviation', 'SpO2_RMSSD', 'SpO2_Approx_Entropy', 'SpO2_Sample_Entropy', 'SpO2_Permutation_Entropy', 'ph') #==change==
    data = read.table(filename, col.names=cols, sep='\t')
    rocobj = roc(data$ICUExpireFlag, data$ph, percent=FALSE) #get AUC_ROC #==change==
    my.coords = coords(rocobj, x="all", ret=c("specificity", "sensitivity", "ppv", "threshold"))
    evaluations = my.coords[,my.coords["specificity",]>=0.9485] #around 0.95, in order to get threshold
    modelData$AUC_ROC[i] = rocobj$auc[1]
    modelData$Specificity[i] = evaluations[1]
    modelData$Sensitivity[i] = evaluations[2]
    modelData$PPV[i] = evaluations[3]
    modelData$Threshold[i] = evaluations[4]
    dead = vector(mode="numeric", length=0)
    alive = vector(mode="numeric", length=0)
    for(n in 1:length(data$ICUExpireFlag)) { #==change==
      if(data$ICUExpireFlag[n]==1) { dead = append(dead, data$ph[n], after=length(dead)) }
      else { alive = append(alive, data$ph[n], after=length(alive)) }
    pr = pr.curve(scores.class0 = dead, scores.class1 = alive, curve = FALSE)
    modelData$AUC_PR[i] = pr[2]
    roc = roc.curve(scores.class0 = dead, scores.class1 = alive, curve = FALSE) #check AUC_ROC
    modelData$AUC_ROC_by_PRROC[i] = roc[2]

modelData <- data.frame(lapply(modelData, as.character), stringsAsFactors=FALSE) #data frame type

output_filename = paste(path, "models_and_evaluations.txt", sep="")
write.table(modelData, output_filename, sep="\t", row.names=FALSE, col.names=TRUE)

For each dataset with a specific window size, step size, and sampling method, the best model after 10-fold cross-validation was chosen to be the one with the highest PPV in 10 iterations.


As imitating and comparing to Wang's evaluation work, we fixed the specificity around 0.95 as well to get the corresponding threshold, AUC_ROC, AUC_PR, sensitivity, and PPV. The evaluation results were computed in R by the coords function in the pROC package and the pr.curve function in the PRROC package. If a value of proportional hazards was larger than the trained threshold, the individual was predicted as dead, otherwise alive. For each dataset with a specific window size, step size, and sampling method, 10 sets of evaluation results after 10 iterations in 10-fold cross-validation were finally averaged.


Overall, we achieved 36 sets of models and evaluation results after 10-fold cross-validation using Cox regression for training and AUC_ROC, AUC_PR, sensitivity and PPV as evaluation measurements after testing. Some basic information and the evaluation results are listed in Table 1. For full modeling, evaluation and comparison results of the 36 datasets, one can refer to the "Full Model and Evaluation Results Download" link at the end of the blog.

Since the dataset Wang's paper presented was with 1024 window size, 100 step size, unstratified sampling method and 32 features, we specially elaborate the modeling and evaluation results of the most similar one in the first subsection below, which is with 1024 window size, 100 step size, unstratified sampling method and 20 features (dataset index 23 in Table 1). In the second subsection, we state three interesting findings by comparing the results.

Table 1. Metadata and evaluation results of 36 different datasets.
Index Feature # Stratified or not Window size Step size Total window # AUC_ROC AUC_PR Specificity Sensitivity PPV
Wang's 32 Unstratified 1024 100 - 0.7514 0.2522 0.9485 0.3424 0.2956
1 14 Unstratified 512 50 58504 0.586500042 0.277848811 0.948572478 0.137206136 0.378782933
2 14 Unstratified 512 100 29450 0.586413952 0.276648337 0.948671924 0.13635995 0.376518387
3 14 Unstratified 512 200 14908 0.58722473 0.280304693 0.948858048 0.140769857 0.387008439
4 14 Unstratified 1024 50 51020 0.584167878 0.298085113 0.948664484 0.144295496 0.402352925
5 14 Unstratified 1024 100 25720 0.582762634 0.293374447 0.948758889 0.140607618 0.395006605
6 14 Unstratified 1024 200 13070 0.5841767 0.286874029 0.949084544 0.137004079 0.383662578
7 14 Unstratified 2048 50 37246 0.575232616 0.327882379 0.948686767 0.14158569 0.432927003
8 14 Unstratified 2048 100 18763 0.57456312 0.326239055 0.948846651 0.146065493 0.438408806
9 14 Unstratified 2048 200 9536 0.568165132 0.328881985 0.949191651 0.135272647 0.410567358
10 14 Stratified 512 50 58504 0.586601039 0.277588947 0.948696384 0.1359375 0.376971678
11 14 Stratified 512 100 29450 0.58695016 0.276712377 0.948621554 0.135466179 0.375568219
12 14 Stratified 512 200 14908 0.584087566 0.272473116 0.948844884 0.134782609 0.373608065
13 14 Stratified 1024 50 51020 0.584331679 0.295280598 0.94868677 0.143319838 0.400232819
14 14 Stratified 1024 100 25720 0.583851147 0.293358063 0.948915663 0.137096774 0.389156281
15 14 Stratified 1024 200 13070 0.582728409 0.292436566 0.948863636 0.1404 0.392908722
16 14 Stratified 2048 50 37246 0.575050991 0.32866632 0.948542024 0.144059406 0.435420119
17 14 Stratified 2048 100 18763 0.575070967 0.327160888 0.948979592 0.145925926 0.438565518
18 14 Stratified 2048 200 9536 0.576602974 0.328366385 0.949197861 0.150731707 0.442959689
19 20 Unstratified 512 50 58504 0.615341012 0.306876704 0.948594652 0.173305206 0.435242182
20 20 Unstratified 512 100 29450 0.614593082 0.305517908 0.948716282 0.173136421 0.43508101
21 20 Unstratified 512 200 14908 0.614136636 0.301697781 0.948943171 0.16969311 0.430115321
22 20 Unstratified 1024 50 51020 0.615671708 0.325064692 0.948596974 0.168940821 0.440397991
23 20 Unstratified 1024 100 25720 0.613800732 0.322697107 0.948764342 0.167639726 0.437126466
24 20 Unstratified 1024 200 13070 0.61373578 0.322750368 0.949015333 0.172699069 0.443111724
25 20 Unstratified 2048 50 37246 0.585695653 0.346131416 0.948653943 0.168870838 0.47662275
26 20 Unstratified 2048 100 18763 0.585355758 0.345879343 0.948846312 0.170644006 0.477289228
27 20 Unstratified 2048 200 9536 0.588160461 0.348665431 0.949193704 0.167610088 0.472697926
28 20 Stratified 512 50 58504 0.615668096 0.306948165 0.948696384 0.173529412 0.436050155
29 20 Stratified 512 100 29450 0.614710985 0.306013442 0.948621554 0.171297989 0.430545488
30 20 Stratified 512 200 14908 0.613903238 0.301670279 0.948844884 0.167753623 0.425530551
31 20 Stratified 1024 50 51020 0.615572325 0.324678086 0.94868677 0.168522267 0.440160722
32 20 Stratified 1024 100 25720 0.614273416 0.323369965 0.948915663 0.167137097 0.436264961
33 20 Stratified 1024 200 13070 0.613080303 0.320934352 0.948863636 0.1724 0.443007261
34 20 Stratified 2048 50 37246 0.585266715 0.346000387 0.948542024 0.169306931 0.476174807
35 20 Stratified 2048 100 18763 0.585824305 0.34501189 0.948979592 0.16962963 0.476712947
36 20 Stratified 2048 200 9536 0.587587714 0.346721829 0.949197861 0.167804878 0.472962399

Modeling and Evaluation Results of One Dataset

The dataset we choose to elaborate in this section is with 1024 window size, 100 step size, unstratified sampling method and 20 features (dataset index 23 in Table 1). After time slicing and handling missing data, we had 945 unique ICU patients left, of which 116 (12.3%) were dead and 829 (87.7%) were alive. The model of it after Cox regression training and selected by the best PPV is shown below:

ph = exp(0.017219434 * HR_Mean + 0.000459047 * HR_Median + 0.003585291 * HR_Variance + 0.004120923 * HR_Skewness + 0.000750431 * HR_Kurtosis - 0.130933905 * HR_Standard_Deviation + 0.066657892 * HR_RMSSD + 0.481958967 * HR_Approx_Entropy - 0.746065619 * HR_Sample_Entropy - 0.147992451 * HR_Permutation_Entropy - 0.031672657 * SpO2_Mean + 0.007294408 * SpO2_Median - 0.003289708 * SpO2_Variance - 0.001994458 * SpO2_Skewness + 0.000682034 * SpO2_Kurtosis + 0.219984497 * SpO2_Standard_Deviation - 0.144379499 * SpO2_RMSSD - 0.254112193 * SpO2_Approx_Entropy + 0.248200714 * SpO2_Sample_Entropy - 0.914535063 * SpO2_Permutation_Entropy)

where ph refers to the value of proportional hazards.

In order to predict a new patient's death status after ICU stays, values of 20 features (HR_Mean, HR_Median, ..., SpO2_Permutation_Entropy) need to be computed and inputted into the model above. After that, a ph value will be achieved. Since we set the specificity to be around 0.95, the threshold of ph in this model was 0.228446568. That is, if the new ph value is larger than 0.228446568, the patient is predicted as dead, otherwise alive.

Given specificity to be around 0.95, we got a set of evaluation results as follows:

AUC_ROC AUC_PR Specificity Sensitivity PPV
Our results 0.6138 0.3227 0.9488 0.1676 0.4371
Wang's results 0.7514 0.2522 0.9485 0.3424 0.2956

By comparing to Wang's results, it is obvious that the AUC_ROC and sensitivity of our results were much lower while the AUC_PR and PPV were much higher. The differences in results might be caused by ways of missing data handling, different feature numbers (20 features in our project vs. 32 features in Wang's paper), and differences between using 10-fold cross-validation (in our project) and 5-fold cross-validation (in Wang's paper).

Three Findings based on Comparisons

In consideration of unintended findings, we horizontally compared 36 sets of evaluation results from different perspectives. The comparisons we made were listed below:

  • Fixing window sizes, compare the results when step sizes increase.
  • Fixing step sizes, compare the results when window sizes increase.
  • Compare the results when total window numbers increase.
  • Compare the results of using unstratified sampling method with those of using the stratified method.
  • Compare the results incorporating 14 features with those incorporating 20 features.

After comparisons, we discovered three interesting findings elaborated in the following subsections. The full comparisons and plots can be found in the "Full Model and Evaluation Results Download" link at the end of the blog.

Entropy Features in Deed Improved the Evaluation Results

By comparing the results of using 20 features with those of using only 14 core features, we found that all the AUC_ROC, AUC_PR, sensitivity, and PPV of using 20 features were very much higher.

Table 2 shows a comparison between 14-feature dataset and 20-feature dataset, of which both were with 1024 window size, 100 step size and unstratified sampling. From the comparison, we can intuitively make an assumption that the approximate entropy, sample entropy and the permutation entropy of HR and SpO2 play an important role in the ICU-patient survival analysis.

Table 2. A comparison of evaluation results between 14-feature dataset and 20-feature dataset.
Index Feature # Stratified or not Window size Step size Total window # AUC_ROC AUC_PR Specificity Sensitivity PPV
5 14 Unstratified 1024 100 25720 0.582762634 0.293374447 0.948758889 0.140607618 0.395006605
23 20 Unstratified 1024 100 25720 0.613800732 0.322697107 0.948764342 0.167639726 0.437126466

AUC_ROC Decreased When Window Sizes Increased

By comparing performances given a fixed step size, one clear conclusion we can draw is that the AUC_ROC values are overall decreasing when window sizes are increasing. The plots of AUC_ROC, of which step sizes are fixed to be 50, 100, and 200 individually, are shown in Figure 1 and 2. According to the comparisons, we had two guesses of the reasons: (1) A larger window size leads to more overlapping on windows and less independency. Less independent individuals can cause worse performances. (2) A larger window size leads to less window numbers in total, which means a smaller sample size when training. A smaller sample size may cause less accurate results.

Meanwhile, in another comparison--by increasing total window numbers (see Figure 3), there is no extremely clear association between window numbers and AUC_ROC values. Therefore, it is more likely that the first guess of reasons is true.

MIMIC TSCox Fig1and2.PNG


Using Unstratified or Stratified Sampling Methods Made No Difference

Another interesting finding after comparisons is that there is no difference whatever sampling method was used. For example, in Table 3, all the AUC_ROC, AUC_PR, sensitivity, and PPV values of using different sampling methods are fairly close, which means the sample was balanced enough (dead vs. alive individuals) after time-slicing.

Table 3. A comparison of evaluation results between unstratified and stratified sampling methods.
Index Feature # Stratified or not Window size Step size Total window # AUC_ROC AUC_PR Specificity Sensitivity PPV
23 20 Unstratified 1024 100 25720 0.613800732 0.322697107 0.948764342 0.167639726 0.437126466
32 20 Stratified 1024 100 25720 0.614273416 0.323369965 0.948915663 0.167137097 0.436264961

Discussion and Conclusion

In this project, we not only imitated the 1024 window size and 100 step size experiment but also performed more to discover extra findings besides of Wang's work. After TS-Cox procedures, we finally achieved 36 sets of models and evaluations, which can be found on Table 1 and the "Full Model and Evaluation Results Download" link at the end of the blog. By a series of comparisons, three interesting findings are elaborated in the Results section. They are: (1) entropy features improved the evaluation results a lot; (2) AUC_ROC overall decreased when window sizes increased; (3) using unstratified or stratified sampling methods made no difference to performances. Even though our AUC_ROC value of the same window size and step size was lower than Wang's result, there are meaningful reasons that all the different ways of handling missing data, different cross-validation methods and the fewer numbers of features were possible to cause a worse result.

In order to improve the performances in the future, there are a variety of approaches we can attempt, such as trying more window sizes and step sizes to see clearer trends and extracting more features as well as incorporating feature selection methods. Moreover, for future analysis associated with Cox regression model, the step of checking assumptions [9], such as I.I.D. (independent and identically distributed) and identical survival times [10], need to be paid much attention to at the very beginning, so that risks of prediction failure when moving to different datasets can be reduced.

Full Model and Evaluation Results Download (.xlsx file)


[1] D. G. Kleinbaum and M. Klein, Survival Analysis: A Self-Learning Text. Springer Science & Business Media, Jan. 2006.
[2] Y. Wang, W. Chen, K. Heard, M. H. Kollef, T. C. Bailey, Z. Cui, Y. He, C. Lu, and Y. Chen, “Mortality Prediction in ICUs Using A Novel Time-Slicing Cox Regression Method,” AMIA Annual Symposium Proceedings, vol. 2015, pp. 1289–1295, Nov. 2015. [Online]. Available:
[3] M. Saeed, M. Villarroel, A. T. Reisner, G. Clifford, L.-W. Lehman, G. Moody, T. Heldt, T. H. Kyaw, B. Moody, and R. G. Mark, “Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): a publicaccess intensive care unit database,” Critical care medicine, vol. 39, no. 5, p. 952, 2011.
[4] “MIMIC-II Explorer.” [Online]. Available:
[5] “The WFDB Software Package.” [Online]. Available:
[6] H. W. Borchers, “Pracma: practical numerical math functions. r package version 1.8. 3,” 2015.
[7] A. Brandmaier, “pdc: Permutation distribution clustering,” R package version, vol. 1, no. 1, 2014.
[8] T. M. Therneau, “Survival Analysis [R package survival version 2.39-3].” [Online]. Available:
[9] “Checking assumptions in the cox proportional hazards regression model.” [Online]. Available:
[10] N. E. Breslow, “Analysis of Survival Data under the Proportional Hazards Model,” International Statistical Review / Revue Internationale de Statistique, vol. 43, no. 1, pp. 45–57, 1975. [Online]. Available: