Documentation

This website provides four different approaches to documentation for the GIM Dataset.

  • Data Dictionary: Start here if you want the direct facts on how the data tables are structured.
  • Explanation: Start here if you want a more detailed exploration of the dataset: where does it come from, and who does it include?
  • Tutorials: Start here if you want to learn how to work with and analyze the data.
  • How-To Guides: Start here if you want to take the first steps to access and load in the data.

1 - Data Dictionary

Schema and contents of the data table

Data Table Overview

Tables in the dataset are as follows:

Table                      Raw   Pre-processed
Encounters                 link  link
Baseline Values            link  link
Demographics               link  link
Numeric Values             link  link
Clinical Orders            link  link
Medication Administration  link  link
Outcomes                   N/A   link
Alternate Outcomes         N/A   link

1.1 - Encounters (Static)

Overview

  • Training set: train_encounters.csv
    • 16 976 rows
  • Validation set: valid_encounters.csv
    • 2 405 rows
  • Testing set: test_encounters.csv
    • 985 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
SERVICE_FROM_DK Identifier used for the service before transfer to GIM
FROM_SERVICE The service before transfer to GIM
FROM_SERVICE_CD The FROM_SERVICE code
FROM_UNIT The unit where the patient was transferred from
FROM_LOCATION_SERVICE The service associated with FROM_UNIT
PATIENT_DK Unique patient identifier
ADT_DISCHARGE The discharge disposition in the EHR
ICD_10 ICD code of the Most Responsible Diagnosis
MRP_DIAGNOSIS Most responsible diagnosis
age Patient age at time of admission
gender Gender of patient. 1 for Female, 0 for Male. (Note: Data may instead refer to sex, and does not contain any additional categories.)
pre_gim_icu Flag for patient having been in the ICU prior to GIM entry
post_gim_icu Flag for patient entering ICU after GIM
death Flag for mortality
palliative_transfer Flag for palliative transfer
OUTCOME_TYPE 1 if ICU Transfer, 2 if Death, 3 if Palliative entry, 4 if Palliative discharge, 5 if Discharged
OUTCOME_ALL 1 if OUTCOME_TYPE is 1, 2, 3, or 4; 0 if OUTCOME_TYPE is 5
gim_to_outcome Time from GIM entry to outcome in days.
ccc Flag indicating if comfort care order was placed
outcome_no_ccc OUTCOME_TYPE except where (death == 1 and ccc == 1 and OUTCOME_TYPE == 2), then outcome_no_ccc = 3. Deaths that are “expected” (based on comfort care order) are grouped with palliative entry.
time_to_event Time from GIM entry to outcome in hours.
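
The rules for OUTCOME_ALL and outcome_no_ccc in the schema above can be written out as a small function. This is a minimal sketch that restates the table's definitions; the function name `derive_outcomes` is hypothetical and is not part of the dataset or its tooling.

```python
def derive_outcomes(outcome_type: int, death: int, ccc: int) -> dict:
    """Derive OUTCOME_ALL and outcome_no_ccc for one Encounters row.

    Hypothetical helper that only restates the schema descriptions.
    """
    # OUTCOME_ALL is 1 for ICU transfer, death, palliative entry, or
    # palliative discharge (types 1-4) and 0 for discharge (type 5).
    outcome_all = 1 if outcome_type in (1, 2, 3, 4) else 0

    # Deaths with a comfort care order are regrouped with palliative entry (3);
    # every other row keeps its OUTCOME_TYPE.
    if death == 1 and ccc == 1 and outcome_type == 2:
        outcome_no_ccc = 3
    else:
        outcome_no_ccc = outcome_type

    return {"OUTCOME_ALL": outcome_all, "outcome_no_ccc": outcome_no_ccc}
```

For example, `derive_outcomes(2, 1, 1)` regroups an expected death under palliative entry, while `derive_outcomes(2, 1, 0)` leaves it as a death.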

1.2 - Demographic Variables (Static)

Overview

  • Training set: train_demographics.csv
    • 16 976 rows
  • Validation set: valid_demographics.csv
    • 2 405 rows
  • Testing set: test_demographics.csv
    • 985 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
marital Marital status for patient encounter. Possible values are marital_divorced_widowed, marital_partnered, marital_single, and marital_unknown_other
province Patient province. Options are Ontario or Other.
language Primary language spoken by patient. (Note: Data may be unreliable)
religion Religion of patient. See religion_codes.csv for a list of all codes and associated meanings.
no_housing Patient housing status. 1 if the patient is unhoused (has no found address or address corresponds to a homeless shelter), 0 if the patient is housed.
age Age in years at time of admission. Rounded to nearest 5 years and bounded at 20 (lower) and 95 (upper).
sex Sex of patient. 1 for Female, 0 for Male.
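
To show which values the age column can take, the sketch below applies the rounding described in the table: nearest 5 years, bounded at 20 and 95. The function name `bin_age` and the tie-breaking rule (exact halves round up) are assumptions; the documentation does not say how the original rounding was implemented.

```python
import math


def bin_age(age_years: float, lower: int = 20, upper: int = 95, step: int = 5) -> int:
    """Round an age to the nearest `step` years, then clamp to [lower, upper].

    Hypothetical helper; halves round up, which is an assumption.
    """
    rounded = math.floor(age_years / step + 0.5) * step
    return max(lower, min(upper, rounded))
```

Under these assumptions an age of 43 maps to 45, an age of 17 maps to the lower bound of 20, and an age of 99 maps to the upper bound of 95.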

1.3 - Baseline Values (Static)

Overview

  • Training set: train_baseline.csv
    • 16 976 rows
  • Validation set: valid_baseline.csv
    • 2 405 rows
  • Test set: test_baseline.csv
    • 985 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
lab_co2 Total CO2
lab_cl Chloride
lab_na Sodium
lab_agap Anion Gap
lab_hct HcT (Haematocrit)
lab_mch MCH (Mean Corpuscular Hemoglobin)
lab_mchc MCHC
lab_rbc RBC
lab_iwbcr WBC
lab_mcv MCV
lab_rdw RDW
lab_hgb HGB
lab_cr Creatinine
lab_plt PLT
lab_glur Glucose Random
lab_k Potassium
lab_mpv MPV
lab_alymp Absolute Lymphocytes
lab_aeos Absolute Eosinophils
lab_abaso Absolute Basophils
lab_amono Absolute Monocytes
lab_aneut Absolute Neutrophils
lab_alb Albumin
lab_rpt PT
lab_rinr INR
lab_rptt aPTT
lab_tbil Total Bilirubin
lab_ast AST
lab_alt ALT
lab_alp ALP
lab_urea Urea
lab_ca Calcium
lab_mg Magnesium
lab_po4 Phosphorus
lab_tni Troponin I Serum
lab_lip Lipase
lab_ck CK
lab_uuro Urobilinogen
lab_spg Specific Gravity
lab_ph pH
lab_amy Amylase
lab_vlact Lactate Venous
lab_vhion H ion Venous
lab_mvsa Measured Oxygen Saturation Venous
lab_vpo2 pO2 Venous
lab_vbe Base Excess Venous
lab_vtco2 Bicarbonate Venous
lab_vpco2 pCO2 Venous
lab_vph pH Venous
vital_spulse Pulse
vital_sbpsystolic Systolic Blood Pressure
vital_sbpdiastolic Diastolic Blood Pressure
vital_srespirations Respirations
vital_so2saturation O2 Saturation (%)
vital_stemperature Temperature (C)

1.4 - Numeric Variables (Time-Varying)

Raw Data

Overview

  • Training set: train_numeric_variables.csv
    • 5 030 004 rows
  • Validation set: valid_numeric_variables.csv
    • 685 621 rows
  • Testing set: test_numeric_variables.csv
    • 270 537 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
variable Measure (vital, lab, etc) corresponding to the columns in pre-processed data table.
time Time measure was recorded (in hours since admission)
numeric_value Value of measurement

Pre-Processed Data

Overview

  • Training set: train_numeric_timeseries_8hr.csv
    • 362 179 rows
  • Validation set: valid_numeric_timeseries_8hr.csv
    • 52 181 rows
  • Testing set: test_numeric_timeseries_8hr.csv
    • 20 538 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
time_window 8-hour window since time of admission

Each of the following variables (corresponding to a measurement) includes the following:

  • The value of the measurement (base name)
  • An indicator variable, 1 if the variable was measured in the time window and 0 if a value was imputed (base name suffixed with _measured)
  • A time-elapsed variable giving the time (in hours) elapsed since the previous available measure (base name suffixed with _measured_last)

Column Name Description
alcohol_sciwascore Total Score
diabetic_spocglucresult POC Glucose Result (mmol/L)
in_out_acatheter Catheter
in_out_aivpb1 IV Piggy Back #1
in_out_aothoutput Other Output
in_out_atmsincontinent Times Incontinent
in_out_siv23and13 IV Dextrose 3.3%/NaCl 0.3%
in_out_sivnormalsaline IV Sodium Chloride 0.9% (NaCl)
in_out_sotherintake Other Intake
lab_abaso Absolute Basophils
lab_abe Base Excess Arterial
lab_acet Acetaminophen Level
lab_aeos Absolute Eosinophils
lab_agap Anion Gap
lab_ahion H Ion Arterial
lab_alact Lactate Arterial
lab_alb Albumin
lab_alp ALP
lab_alt ALT
lab_alymp Absolute Lymphocytes
lab_amono Absolute Monocytes
lab_amy Amylase
lab_aneut Absolute Neutrophils
lab_ao Aortic Root
lab_apco2 pCO2 Arterial
lab_aph pH Arterial
lab_apo2 pO2 Arterial
lab_asa Salicylate Level
lab_ast AST
lab_atco2 Bicarbonate Arterial
lab_b12 Vitamin B12 Level
lab_bc Bilirubin Direct
lab_bnps NT-proBNP
lab_ca Calcium
lab_cacra Calcium, Ionized (pH 7.4) corr.
lab_cai Ionized Calcium
lab_caicr Calcium, Ionized (pH 7.4) corr.
lab_ck CK
lab_cl Chloride
lab_co2 Total CO2
lab_cr Creatinine
lab_crp C-Reactive Protein
lab_esr1 ESR
lab_etoh Ethanol
lab_fe Iron Total
lab_fer Ferritin
lab_glob Globulin, calc.
lab_glpoc Glucose POC (Lifescan)
lab_glur Glucose Random
lab_hba1 A1c.
lab_hct HCT
lab_hgb HGB
lab_ical Calcium Ionized Arterial
lab_igab Absolute IG
lab_ivsd IV Septum
lab_iwbcr WBC
lab_k Potassium
lab_la Left Atrium
lab_ld LD
lab_lip Lipase
lab_lvedd LV Diastole
lab_lvesd LV Systole
lab_masa Measured O2 Saturation Arterial
lab_mch MCH
lab_mchc MCHC
lab_mcv MCV
lab_metaa Absolute Metamyelocytes M. Diff
lab_mg Magnesium
lab_mpv MPV
lab_mvsa Measured O2 Saturation Venous
lab_myela Absolute Myelocytes M. Diff
lab_na Sodium
lab_orcai Calcium Ionized O.R. Arterial
lab_orglu Glucose O.R. Arterial
lab_orhc Hematocrit O.R. Arterial Calculation
lab_ork Potassium O.R. Arterial
lab_orna Sodium O.R. Arterial
lab_osm Osmolality Serum
lab_ph pH
lab_plt PLT
lab_po4 Phosphorus
lab_pwd Posterior Wall
lab_rbc RBC
lab_rdw RDW
lab_ret Retics
lab_rinr INR
lab_rpt PT
lab_rptt APTT
lab_sat Saturation
lab_spg Specific Gravity
lab_tbil Bilirubin Total
lab_tibc TIBC
lab_tni Troponin I Serum
lab_tpr Total Protein Serum
lab_tsh TSH
lab_ucl Chloride Random
lab_uk Potassium Random
lab_una Sodium
lab_uosm Osmolality
lab_ure Urea
lab_uuro Urobilinogen
lab_vbe Base Excess Venous
lab_vhion H Ion Venous
lab_vlact Lactate Venous
lab_vpco2 pCO2 Venous
lab_vph pH Venous
lab_vpo2 pO2 Venous
lab_vtco2 Bicarbonate Venous
shift_assess_scvhrtrate Heart Rate
shift_assess_spninstymov1 Pain Intensity With Movement (0-10) #1
shift_assess_spninstymov2 Pain Intensity With Movement (0-10) #2
shift_assess_spnintstyrest1 Pain Intensity at Rest (0-10) #1
shift_assess_spnintstyrest2 Pain Intensity at Rest (0-10) #2
shift_assess_srpfio2b Resp FiO2 Percent
shift_assess_srpo2lmin O2 L/Min
skin_abradenscore Braden Score Total
vital_sbpdiastolic Diastolic Blood Pressure
vital_sbpsystolic Systolic Blood Pressure
vital_sfio2 FiO2%
vital_so2saturation O2 Saturation (%)
vital_spainintmove Pain Intensity with Movement
vital_spulse Pulse
vital_srespirations Respirations
vital_stemperature Temperature (C)

1.5 - Clinical Orders (Time-Varying)

Raw Data

Overview

  • Training set: train_clinical_orders.csv
    • 107 983 rows
  • Validation set: valid_clinical_orders.csv
    • 15 394 rows
  • Testing set: test_clinical_orders.csv
    • 5 869 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
order_name Clinical order corresponding to the columns in pre-processed data table.
type Type of clinical order (given by prefix)
start_time Start time of order (in hours since admission to GIM ward)
end_time End time of order (in hours since admission to GIM ward)
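
A raw order's start_time and end_time can be mapped onto the 8-hour windows used by the pre-processed table. The sketch below is one plausible mapping, not the documented procedure: it assumes windows are indexed from 0 at admission, that window `w` covers hours `[8w, 8w + 8)`, and that an order counts toward every window it overlaps. The name `active_windows` is hypothetical.

```python
import math

WINDOW_HOURS = 8  # window length used by the pre-processed tables


def active_windows(start_time: float, end_time: float) -> list[int]:
    """Return the indices of the 8-hour windows that an order overlaps.

    Assumes window w covers hours [8w, 8w + 8); negative indices are
    windows before admission to the GIM ward.
    """
    first = math.floor(start_time / WINDOW_HOURS)
    # An order ending exactly on a boundary does not count toward the next window.
    last = math.ceil(end_time / WINDOW_HOURS) - 1
    # Zero-length orders still count toward the window in which they start.
    return list(range(first, max(first, last) + 1))
```

With these assumptions, an order running from hour 3 to hour 20 is active in windows 0, 1, and 2, and an order that starts at hour -4 also touches window -1, before GIM admission.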

Pre-Processed Data

Overview

  • Training set: train_clinical_orders_on_off_timeseries_8hr.csv
    • 362 179 rows
  • Validation set: valid_clinical_orders_on_off_timeseries_8hr.csv
    • 52 181 rows
  • Testing set: test_clinical_orders_on_off_timeseries_8hr.csv
    • 20 538 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
time_window 8-hour window since time of admission
img_porchest1v Imaging order - Portable Chest
img_ctthrabdplc Imaging order - CT Thorax Abdomen Pelvis with Contrast
img_radchest2v Imaging order - XRAY Chest PA and LAT
img_usabdlimited Imaging order - US Abdomen Limited
img_ctthoraxpe Imaging order - CT Thorax Rule Out Pulmonary Embolus
img_mriheado Imaging order - MRI Head
img_radabo2v Imaging order - XRAY Abdomen AP and LAT
img_ctthoraxnoc Imaging order - CT Thorax
img_usabdcomplete Imaging order - US Abdomen Complete
img_ctperfusionstrok Imaging order - CT Perfusion Stroke
img_ctabdoplcon Imaging order - CT Abdomen and Pelvis
img_ctheadnocont Imaging order - CT Head no Contrast
img_usdopvenextb Imaging order - US Doppler Venous Lower Extremity Bilateral - R/O DVT
img_ctabdonocont Imaging order - CT Abdomen
img_radpelvisap Imaging order - XRAY Pelvis AP
img_radabdo1v Imaging order - XRAY Abdomen
img_ctabdoplwcon Imaging order - CT Abdomen Pelvis with Contrast
img_usabdpelvis Imaging order - US Abdomen and Pelvis
img_cardioliteprrs Imaging order - Cardiolite and Persantine Scan with Rest
img_radlumpsin3v Imaging order - XRAY Lumbar Spine 3 Views
img_porabdomen1v Imaging order - Portable Abdomen
img_int0180 Imaging order - Interventional Radiological Procedure Request
img_ctabdopelphas Imaging order - CT Abdomen Pelvis Triphasic Study
img_mrispineo Imaging order - MRI Spine
img_usdopportal Imaging order - US Doppler Portal Vein
img_radchest1v Imaging order - XRAY Chest
img_mriabdomeno Imaging order - MRI Abdomen
img_ctheadwithcon Imaging order - CT Head with Contrast
img_radhipunilt2v Imaging order - XRAY Hip Unilateral Left AP and LAT
img_usdopabdoart Imaging order - US Doppler Abdomen Renal Arterial or Venous
img_ctheadangio Imaging order - CT Head Angio (Circle of Willis)
img_ctthoraxc Imaging order - CT Thorax with Contrast
img_porabdomen2v Imaging order - Portable XRAY Abdomen AP and LAT
img_gastricdeglut Imaging order - XRAY Deglutition Study
img_bonewholebody Imaging order - Bone Scan Whole Body
img_radkneert2v Imaging order - XRAY Knee Right AP and LAT
img_ctneckc Imaging order - CT Neck
img_ctextremity Imaging order - CT Extremity
img_spllumbarpunct Imaging order - XRAY Lumbar Puncture
img_usfaceneck Imaging order - US Face and Neck
img_ctneckcont Imaging order - CT Neck with Contrast
img_ctheadwwocon Imaging order - CT Head with and without contrast
img_ctheadcarotids Imaging order - CT Head and Carotids
img_ctrenalcolic Imaging order - CT Abdomen Rule Out Renal Colic
img_radkneelt2v Imaging order - XRAY Knee Left AP and LAT
img_radcervspin3v Imaging order - XRAY Cervical Spine AP, Lateral and Open Mouth Odontoid View
img_usguidedbiopsy Imaging order - US Guided Biopsy
img_radhipunirt2v Imaging order - XRAY Hip Unilateral Right AP and LAT
img_ctpelvisc Imaging order - CT Pelvis
img_radfootleft3v Imaging order - XRAY Foot Left 3 Views
img_splercp Imaging order - XRAY ERCP
img_usabdpelltd Imaging order - US Abdomen and Pelvis Limited
img_uspeltrvagftri Imaging order - US Pelvis Transvaginal (First Trimester)
img_radskulorbits Imaging order - XRAY Skull Orbits
img_ctspinelcont Imaging order - CT Spine Lumbar without contrast
img_ussofttissueuni Imaging order - US Soft Tissue Unilateral
img_usextremityl Imaging order - US Extremity Left (Soft Tissue)
img_radchest3v Imaging order - XRAY Chest 3 Views
img_usextremityr Imaging order - US Extremity Right (Soft Tissue)
img_hemodiacathins Imaging order - Hemodialysis Catheter Insertion
img_ang0090 Imaging order - Angiography Procedure Request
img_nva3 Imaging order - NVA Abscess Drain Abdomen
img_radfootrt3v Imaging order - XRAY Foot Right 3 Views
img_radshoulderrt2v Imaging order - XRAY Shoulder Right 3 Views
img_ctheadcspnc Imaging order - CT Scan CT Head and Cervical Spine
img_survmetastatic Imaging order - XRAY Metastatic Survey
img_radhipbilat4v Imaging order - XRAY Hip Bilateral 4 Views
img_cardiolitepr Imaging order - Cardiolite Persantine
img_lungvq Imaging order - V/Q Scan
img_radshoulderlt3v Imaging order - XRAY Shoulder Left 3 Views
img_radanklert3v Imaging order - XRAY Ankle Right 3 Views
img_radkneelt4v Imaging order - XRAY Knee Left 4 Views
img_splngtubeins Imaging order - XRAY NG Tube Insertion
img_usdopliver Imaging order - US Doppler Liver Disease Screening
img_radkneert4v Imaging order - XRAY Knee Right 4 Views
img_radtibfibrt2v Imaging order - XRAY Tib Fib Right 2 Views
img_radanklet3v Imaging order - XRAY Ankle Left 3 Views
img_radthorspin2v Imaging order - XRAY Thoracic Spine 2 Views
img_mrabrain Imaging order - MRA Brain
img_ctenterography Imaging order - CT Enterography
img_ctspinecervical Imaging order - CT Spine Cervical without contrast
img_radkneebil4v Imaging order - XRAY Knee Bilateral 4 Views
img_usdopnvexuni Imaging order - US Doppler Vein and Extremity Unilateral
img_usdopvnexbi Imaging order - US Doppler Vein and Extremity Bilateral
img_mriextremityo Imaging order - MRI Extremity Unilateral
img_mripelviso Imaging order - MRI Pelvis
img_nva9 Imaging order - NVA Abscess Drain Thorax
img_mracarotids Imaging order - MRA Carotids
img_ctthld Imaging order - CT Thorax Low Dose
img_ctaaa Imaging order - CT Abdominal Aneurysm
img_intpcin Imaging order - PICC Insertion Double Lumen
img_intpcsl Imaging order - PICC Insertion Single Lumen
img_usabdltdpelvltd Imaging order - US Abdomen Limited and Pelvis Limited
img_mri0733 Imaging order - MRI Stroke clinic Time of Flight Circle of Willis
img_mri0759 Imaging order - MRI Lumbar Spine
img_mri0701 Imaging order - MRI Fast Brain Gadolinium
img_mri0700 Imaging order - MRI Fast Brain
img_mri0824 Imaging order - MRI Osteomyelitis Unilateral
img_mri0832 Imaging order - MRI Liver MRCP
diet_tube_feed Diet order - Tube Feed
diet_regular_other Diet order - Regular Other
diet_oral Diet order - Oral
diet_renal Diet order - Renal
diet_diabetic Diet order - Diabetic
diet_cardiac Diet order - Cardiac
diet_npo Diet order - NPO
diet_regular Diet order - Regular
diet_clear_fluids Diet order - Clear Fluids
diet_nutrition_supplement Diet order - Nutrition Supplement
telemetry Telemetry order
consult_physio Consult order
consult_general Consult order
consult_stroke Consult order
consult_social Consult order
consult_speech Consult order
consult_dietitian Consult order
consult_chaplain Consult order
consult_physiotherapist Consult order
consult_acute Consult order
consult_gastroenterology Consult order
consult_respiratory Consult order
consult_occupational Consult order
consult_psychiatry Consult order
consult_wound Consult order
consult_physiotherapy Consult order
consult_geriatric Consult order
consult_pharmacist Consult order
consult_chiropodist Consult order
consult_addiction Consult order
consult_research Consult order
cardio_ecg Cardiovascular diagnostics
cardio_vascularlab Cardiovascular diagnostics
cardio_echo Cardiovascular diagnostics
cardio_holter Cardiovascular diagnostics
cardio_peripheralvascular Cardiovascular diagnostics
resp_oxygen Respiratory order
resp_pulmonaryfunctiontest Respiratory order
resp_bipapcpap Respiratory order
resp_respiratoryintervention Respiratory order
resp_chesttube Respiratory order
resp_ventilator Respiratory order
act_sitter Activity & limitations
act_constantcare Activity & limitations
act_opcophysrestr Activity & limitations
act_restrictions Activity & limitations
code_gmc General Medical Care
code_ccc Code status - No CPR: Comprehensive Comfort Care
code_full Code status - Full Code
code_als Code status - No CPR: Advanced Life Support
opcociwacare Protocol order
opcohephiaptt Protocol order
opcoivinslow Protocol order
opcoivinshi Protocol order
opcochfdiuret Protocol order
trans_infusefrozenplasma Transfusion
trans_transfusepackedredbloodcells Transfusion
trans_transfuseplatelets Transfusion
trans_infusealbumin25 Transfusion
transf_transfusionother Transfusion
trans_infuesivimmuneglobulin Transfusion
trans_prothrombincomplexconcentratepcc Transfusion
wound_dressingswoundcare Dressings & wound/skin care
wound_skincare Dressings & wound/skin care
neuro_eeg Neurophysiology
neuro_emg Neurophysiology

1.6 - Medication Administration (Time-Varying)

Raw Data

Overview

  • Training set: train_medication_admin.csv
    • 1 666 594 rows
  • Validation set: valid_medication_admin.csv
    • 267 388 rows
  • Testing set: test_medication_admin.csv
    • 102 762 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
AHFS_CODE AHFS Code of administered medication corresponding to the columns in the pre-processed data table.
admin_time Time medication was administered in hours since admission.

Pre-Processed Data

Overview

  • Training set: train_medication_admin_8hr.csv
    • 362 179 rows
  • Validation set: valid_medication_admin_8hr.csv
    • 52 181 rows
  • Testing set: test_medication_admin_8hr.csv
    • 20 538 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
time_window 8-hour window since time of admission
med_[] 1 if medication with corresponding AHFS code was administered; 0 otherwise
med_NA 1 if medication order is written as missing

1.7 - Outcomes & Alternate Outcomes (Time-Varying)

Outcomes

Overview

  • Training set: train_outcome_timeseries_8hr.csv
    • 362 179 rows
  • Validation set: valid_outcome_timeseries_8hr.csv
    • 52 181 rows
  • Testing set: test_outcome_timeseries_8hr.csv
    • 20 538 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
window 8-hour window since time of admission
OUTCOME_ALL 1 if the outcome is anything other than a patient discharge (outcome_no_ccc is 1, 2, 3, or 4); 0 if the patient is discharged (outcome_no_ccc is 5)
outcome_no_ccc 1 if ICU Transfer, 2 if Death (excepting comfort care order placed), 3 if Palliative Entry (or Death where comfort care order placed), 4 if Palliative Discharge, 5 if Discharged
outcome_type_24 If an outcome happens in the next 24 hours, takes the value of outcome_no_ccc; otherwise, 0
outcome_type_48 If an outcome happens in the next 48 hours, takes the value of outcome_no_ccc; otherwise, 0
outcome_type_72 If an outcome happens in the next 72 hours, takes the value of outcome_no_ccc; otherwise, 0
outcome_all_24 1 if patient experiences any outcome (OUTCOME_ALL is 1) in the next 24 hours; 0 otherwise
outcome_all_48 1 if patient experiences any outcome (OUTCOME_ALL is 1) in the next 48 hours; 0 otherwise
outcome_all_72 1 if patient experiences any outcome (OUTCOME_ALL is 1) in the next 72 hours; 0 otherwise
outcome_icu_24 1 if patient transfers to the ICU in the next 24 hours; 0 otherwise
outcome_icu_48 1 if patient transfers to the ICU in the next 48 hours; 0 otherwise
outcome_icu_72 1 if patient transfers to the ICU in the next 72 hours; 0 otherwise
outcome_udeath_24 1 if patient has an unexpected death (outcome_no_ccc is 2) in the next 24 hours; 0 otherwise
outcome_udeath_48 1 if patient has an unexpected death (outcome_no_ccc is 2) in the next 48 hours; 0 otherwise
outcome_udeath_72 1 if patient has an unexpected death (outcome_no_ccc is 2) in the next 72 hours; 0 otherwise
outcome_pal_24 1 if patient transfers to palliative care or has an expected death (outcome_no_ccc is 3 or 4) in the next 24 hours; 0 otherwise
outcome_pal_48 1 if patient transfers to palliative care or has an expected death (outcome_no_ccc is 3 or 4) in the next 48 hours; 0 otherwise
outcome_pal_72 1 if patient transfers to palliative care or has an expected death (outcome_no_ccc is 3 or 4) in the next 72 hours; 0 otherwise
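
The per-horizon indicator columns follow from the matching outcome_type column. The sketch below restates that relationship for a single horizon, as read from the descriptions above; the function name `horizon_flags` and the unsuffixed keys are hypothetical, and the CSV files already contain these columns, so the code is only useful as a consistency check.

```python
def horizon_flags(outcome_type_h: int) -> dict:
    """Map an outcome_type_{24,48,72} value to the matching indicator values.

    outcome_type_h is 0 when no outcome occurs within the horizon, and
    otherwise carries the outcome_no_ccc code.
    """
    return {
        "outcome_all": int(outcome_type_h in (1, 2, 3, 4)),  # any non-discharge outcome
        "outcome_icu": int(outcome_type_h == 1),             # ICU transfer
        "outcome_udeath": int(outcome_type_h == 2),          # unexpected death
        "outcome_pal": int(outcome_type_h in (3, 4)),        # palliative or expected death
    }
```

For a row where outcome_type_24 is 3, this gives outcome_all = 1 and outcome_pal = 1, with the ICU and unexpected-death flags at 0.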

Alternate Outcomes

Overview

  • Training set: train_alternate_outcome_timeseries_8hr.csv
    • 362 179 rows
  • Validation set: valid_alternate_outcome_timeseries_8hr.csv
    • 52 181 rows
  • Testing set: test_alternate_outcome_timeseries_8hr.csv
    • 20 538 rows

Links on: ENCOUNTER_NUM

Schema

Column Name Description
ENCOUNTER_NUM Identifier used for each encounter
window 8-hour window since time of admission
sepsis_24 1 if patient experiences sepsis in the next 24 hours; 0 otherwise
sepsis_48 1 if patient experiences sepsis in the next 48 hours; 0 otherwise
sepsis_72 1 if patient experiences sepsis in the next 72 hours; 0 otherwise
resp_24 1 if patient experiences respiratory failure in the next 24 hours; 0 otherwise
resp_48 1 if patient experiences respiratory failure in the next 48 hours; 0 otherwise
resp_72 1 if patient experiences respiratory failure in the next 72 hours; 0 otherwise

2 - Explanation and Background

Exploration of the contents of the dataset

Background

The Data Science and Advanced Analytics (DSAA) team at Unity Health Toronto has developed and evaluated advanced patient monitoring and decision support systems to improve the efficiency, accuracy, and timeliness of clinical decision-making on the General Internal Medicine (GIM) inpatient ward at St. Michael’s Hospital. The GIM dataset was created through this work and comprises de-identified health-related data associated with patients who were admitted under the GIM service at St. Michael’s Hospital.

Funding for the creation and de-identification of the dataset was provided by Unity Health Toronto. The dataset was originally created internally, and was provided to T-CAIREM under a data transfer agreement to make the dataset available. It is currently the largest Canadian healthcare dataset made openly available to researchers.

Data

The General Internal Medicine (GIM) dataset comprises de-identified health-related data associated with over 22,000 patient encounters for 14,000 unique patients who were admitted under the GIM service at St. Michael’s Hospital between 2011 and 2019. All patients admitted under a GIM service with an admission of at least 30 hours were included. The dataset is provided in both a pre-processed format and as raw data tables, all available as CSV files.

Data tables are all divided into three sets, for training (data collected prior to December 1, 2017), validation (data collected between December 1, 2017 and December 1, 2018), and testing (data collected after December 1, 2018). The dataset includes both static and time-varying tables. Please note that division into sets occurs at the level of encounters, not individual patients. As such, the same patient may appear in more than one of the training, validation, and testing sets.
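
Because the split is made per encounter, it can be worth checking how many patients are shared between sets before evaluating a model. The sketch below uses the PATIENT_DK column of the Encounters table; the helper name `patient_ids` and the in-memory sample rows are made up for illustration and are not real data.

```python
import csv
import io


def patient_ids(csv_file) -> set:
    """Collect the distinct PATIENT_DK values from an Encounters CSV file object."""
    return {row["PATIENT_DK"] for row in csv.DictReader(csv_file)}


# Tiny in-memory stand-ins for train_encounters.csv and test_encounters.csv.
train = io.StringIO("ENCOUNTER_NUM,PATIENT_DK\n100001,p1\n100002,p2\n")
test = io.StringIO("ENCOUNTER_NUM,PATIENT_DK\n100003,p2\n100004,p3\n")

# Patients who appear in both sets.
overlap = patient_ids(train) & patient_ids(test)
print(sorted(overlap))  # prints ['p2']
```

With the real files, the `io.StringIO` objects would be replaced by `open("train_encounters.csv", newline="")` and the equivalent for the other sets.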

Variables

The following variables were selected for inclusion in consultation with a staff physician:

  • 136 Numeric Values including:
    • 9 vital signs
    • 100 labs
    • 7 shift assessment variables
    • 7 intake-output variables
    • 1 ulcer variable, 1 alcohol scale, 1 diabetes variable
  • 165 Clinical Orders including:
    • Imaging
    • Telemetry
    • Consults
    • Cardio
    • Diet
    • Respiration
    • Activities
    • Codes
    • Protocols
    • Transfusions
    • Wound Care
    • Neuro
  • Medication Administrations (grouped by AHFS Class)

Collection and Pre-Processing

Data was extracted directly from the following source systems:

  • Admit-Discharge-Transfer (ADT) System: Identify patient encounters under the GIM service.
  • Electronic Medical Records (EMR): Demographics, laboratory results, clinical orders, vitals and ICD-10 codes.
  • Medication Administration Check (MAK): Documentation for all inpatient medication administrations, including the type of medication, dose, timing, administration route, and administration timestamp.

The dataset is provided in its original, raw form as well as in a pre-processed form which aggregates data into fixed time windows. Pre-processing is done as follows:

  • Time-varying data is binned into 8 hour windows
  • Numeric data is averaged within each window, trimmed, and normalized. Two variables are added: an indicator for measurement, and a time since last measurement
    • Missing numeric data is carried forward with mean imputation
  • Orders are given as indicator variables
    • Missing orders are imputed as zero
  • Medications are grouped into classes and then classes are given as indicator variables

For more details, please review the explanations for each individual data table. Please note that the use of mean imputation may pose challenges with using the binned data.
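
The windowing steps above can be sketched for a single variable of a single encounter. This is an illustration under stated assumptions, not the pipeline that produced the published files: window `w` is taken to cover hours `[8w, 8w + 8)`, the last window average is carried forward, and trimming, normalization, and mean imputation for windows with no earlier value are left out. The function name `bin_measurements` and the output keys are hypothetical.

```python
import math
from collections import defaultdict

WINDOW_HOURS = 8


def bin_measurements(times_values, n_windows):
    """Average raw (time, value) pairs into 8-hour windows for one variable.

    Returns one dict per window with the value, a measured flag, and the
    hours since the last measured window (None before any measurement).
    """
    by_window = defaultdict(list)
    for t, v in times_values:
        by_window[math.floor(t / WINDOW_HOURS)].append(v)

    rows, last_value, last_window = [], None, None
    for w in range(n_windows):  # measurements before admission are ignored here
        if w in by_window:
            last_value = sum(by_window[w]) / len(by_window[w])
            last_window = w
            rows.append({"window": w, "value": last_value,
                         "measured": 1, "measured_last": 0})
        else:
            # Carry the previous window average forward and record the gap.
            elapsed = None if last_window is None else (w - last_window) * WINDOW_HOURS
            rows.append({"window": w, "value": last_value,
                         "measured": 0, "measured_last": elapsed})
    return rows
```

For instance, `bin_measurements([(1.0, 10.0), (3.0, 20.0), (17.0, 30.0)], 4)` averages the first two readings in window 0, carries that average into window 1 with the measured flag at 0, and starts again from the new reading in window 2.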

De-identification

The following steps were taken by individuals at Unity Health to de-identify the data:

  1. Patient IDs and encounter numbers were removed from the data. Encounter numbers were replaced with a unique random 6-digit number.
  2. Addresses, postal codes, and names were stripped from the data.
  3. Any variables containing the year or month were removed from the data. Pre-processed data includes a time window indicating the number of 8-hour blocks since admission, while raw data includes a “time since admission” variable for each measurement.

In addition, T-CAIREM staff further de-identified the data by grouping individuals' ages into five-year categories, capping these categories at 20 on the lower end and 100 on the upper end.

Use

The data included in this dataset has been used internally at St. Michael’s Hospital to build systems for improving patient monitoring and decision making. Some of this work has been referenced in publications[1][2].

Research Ethics Board (REB) approval has been obtained for both the creation of the dataset and the de-identified version of the dataset. Individual consent has been provided for the collection and analysis of data. Consent has not been given for the secondary use of the de-identified dataset, and is not required under PHIPA or TCPS2.

This dataset is governed by Unity Health REB, protocol #21-206. Transfer of the data is governed by a Data Transfer Agreement between the University of Toronto and Unity Health. Access by authorized users is governed by a Data Sharing Agreement and Code of Conduct, as well as the Health Data Nexus Contributor Review Health Data License 1.0.

Data is owned by Unity Health Toronto, with access provided by T-CAIREM at the University of Toronto. The dataset will be updated and maintained jointly by Unity Health and T-CAIREM, until either party chooses to remove support for the dataset. For more information or any questions about the information in this dataset,

Version History

  • 1.0.0: Original hosting of the dataset.
  • 1.0.1: Update to the sharing policy to conform to T-CAIREM guidelines.

Citation

DOI: https://doi.org/10.57764/1w7f-kb56

Version: 1.0.1

BibTeX Citation:

References

  • [1] Verma AA, Murray J, Greiner R, Cohen JP, Shojania KG, Ghassemi M, Straus SE, Pou-Prom C, Mamdani M. Implementing machine learning in medicine. CMAJ. 2021 Aug 30;193(34):E1351-7.
  • [2] Nestor B, McCoy LG, Verma A, Pou-Prom C, Murray J, Kuzulugil S, Dai D, Mamdani M, Goldenberg A, Ghassemi M. Preparing a clinical support model for silent mode in general internal medicine. In Machine Learning for Healthcare Conference 2020 Sep 18 (pp. 950-972). PMLR.

2.1 - Encounters (Static)

Description

This is the master table, containing the essential information associated with each patient encounter (visit). Each row represents a distinct encounter in the GIM ward. The Encounters table links to all other tables on the ENCOUNTER_NUM column.
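
Linking on ENCOUNTER_NUM amounts to a key lookup between tables. The sketch below joins two static tables, which have one row per encounter; the sample rows are invented for illustration and the column subset is abbreviated.

```python
import csv
import io

# In-memory stand-ins for train_encounters.csv and train_demographics.csv.
encounters_csv = io.StringIO(
    "ENCOUNTER_NUM,OUTCOME_TYPE\n100001,5\n100002,1\n"
)
demographics_csv = io.StringIO(
    "ENCOUNTER_NUM,marital,province\n"
    "100001,marital_single,Ontario\n"
    "100002,marital_partnered,Other\n"
)

# Index the demographics rows by encounter so each lookup is a dict access.
demographics = {row["ENCOUNTER_NUM"]: row for row in csv.DictReader(demographics_csv)}

# Attach the matching demographics columns to every encounter row.
joined = []
for enc in csv.DictReader(encounters_csv):
    match = demographics.get(enc["ENCOUNTER_NUM"], {})
    joined.append({**enc, **match})
```

The time-varying tables hold many rows per encounter, so a join in that direction would group rows by ENCOUNTER_NUM rather than index a single row per key.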

2.2 - Demographic Variables (Static)

Description

This table includes patient demographics for the encounters listed in the Encounters table. Each row represents a distinct encounter in the GIM ward. Demographics are consistent across encounters for each patient, with the exception of age (which naturally changes over time).

The lack of demographic information related to race and ethnicity presents a potential difficulty with data fairness and equality when drawing conclusions based on the data in this dataset. Please be conscious of how the lack of this data affects any analysis. Please also note that the term “Sex” is provided as a binary field consisting of “Male” and “Female” entries, based on how patients are entered in electronic medical records. The term “Sex” should be approached with caution in this data, as it may not reflect the lived experiences of individuals included in the dataset and may provide misleading information on individuals' medical information.

Figures

Figure 1: Patients by Sex

Figure 2: Patients by Age

Figure 3: Patients by Province

Figure 4: Patients by Language

Figure 5: Patients by Marital Status

Figure 6: Patients by Housing Status

Figure 7: Patients by Religion

2.3 - Baseline Values (Static)

Description

This table includes the mean values for several of the variables listed in the Numeric time-varying tables collected prior to admission in the GIM ward. Each row represents a distinct encounter in the GIM ward.

2.4 - Numeric Variables (Time-Varying)

Raw Data

Description

This table includes numeric results for laboratory measurements and vitals and the time the measurement was taken. Each row represents a distinct encounter in the GIM ward and a distinct test at a distinct time. The tests listed in this table are equivalent to the columns in the pre-processed data tables. Result times less than 0 are measurements that would have taken place before the patient was in the GIM ward (e.g. while in the emergency department).

Pre-Processed Data

Description

This table contains average values of numeric variables measured over 8-hour windows. Each row represents a distinct encounter in the GIM ward and an 8-hour time window from admission. The table also includes an indicator variable indicating whether the measurement was taken and a counter describing the number of windows since the last measurement.

Figures

Figure 1: Numeric Values

Figure 2: Number of Tested Encounters for Top Eight Tests (as a proportion of total encounters in the data table)

Figure 3: Average Number of Tests (on encounters including the tested variable)

2.5 - Clinical Orders (Time-Varying)

Raw Data

Description

This table includes the start and end times for all clinical orders in the dataset. Each row represents a distinct encounter in the GIM ward and a distinct time. The clinical orders listed in this table are equivalent to the columns in the pre-processed data table. Start and end times less than 0 refer to times before the patient was in the GIM ward (e.g. while in the emergency department).

Pre-Processed Data

Description

This table contains an indicator for each clinical order, showing whether the order was active within each 8-hour time window. Each row represents a distinct encounter in the GIM ward and an 8-hour time window from admission.

Figures

Figure 1: Clinical Orders

2.6 - Medication Administration (Time-Varying)

Raw Data

Description

This table includes administered medications and the time of administration. Each row represents a distinct encounter in the GIM ward and a distinct time. The medications listed in this table (by AHFS code) are equivalent to the columns in the pre-processed data tables. Administration times less than 0 refer to medications administered before the patient was in the GIM ward (e.g. while in the emergency department).

Pre-Processed Data

Description

This table contains indicators for all medication classes administered (by AHFS code) indicating whether or not the medication was administered within the 8 hour time window. Each row represents a distinct encounter in the GIM ward and an 8-hour time window from admission.

2.7 - Outcomes & Alternate Outcomes (Time-Varying)

Outcomes

Description

This table contains the type of outcome for the encounter (over 8 hour windows) as well as a variety of indicator variables indicating whether individual outcomes happen over the following 24, 48, or 72 hours. Each row represents a distinct encounter in the GIM ward and an 8-hour time window from admission.

Take care when using this data to predict patient outcomes. The outcome classes are highly imbalanced between discharge and the other outcomes, which must be taken into account in any analysis.
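
One common way to account for imbalance between outcome classes is to weight each class inversely to its frequency when training a classifier. The sketch below computes such weights from a list of labels; it is a general technique rather than anything prescribed by the dataset authors, and the function name `inverse_frequency_weights` and the sample labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class equal to total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Made-up labels: nine windows without an outcome and one with an outcome.
weights = inverse_frequency_weights([0] * 9 + [1])
```

With these sample labels the rare class receives a weight of 5.0 and the common class a weight of about 0.56, so errors on the rare class count for more during training.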

Figures

Figure 1: Patient Outcomes

Alternate Outcomes

Description

This table contains indicator variables showing, for each 8-hour window, whether an alternate outcome (sepsis or respiratory failure) happens over the following 24, 48, or 72 hours.

Figures

Figure 2: Patient Alternate Outcomes

3 - Tutorials

Working with and analyzing the data

These tutorials are designed to help you to start working with and using the data. Check back in as more tutorials are posted over time.

3.1 - Loading Demographic Data

Exploring the demographics of the dataset

In this tutorial, you’ll learn how to load in your first dataset and use it to create a descriptive plot of a demographic variable (in this case, patient sex.) Note that this tutorial is meant to be used in a Health Data Nexus Jupyter workspace.
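
The tutorial's own code is not reproduced on this page. As a rough idea of the kind of steps involved, the sketch below counts encounters by the sex column of a demographics table and prints a text bar chart. It uses only the Python standard library, the sample rows are invented, and the helper name `count_by_sex` is hypothetical; the actual tutorial may use different tools.

```python
import csv
import io
from collections import Counter

# Coding from the data dictionary: 1 for Female, 0 for Male.
SEX_LABELS = {"1": "Female", "0": "Male"}


def count_by_sex(csv_file) -> Counter:
    """Count rows of a demographics CSV file object by the sex column."""
    return Counter(
        SEX_LABELS.get(row["sex"], "Unknown") for row in csv.DictReader(csv_file)
    )


# In-memory stand-in for train_demographics.csv.
sample = io.StringIO("ENCOUNTER_NUM,sex\n100001,1\n100002,0\n100003,1\n")
counts = count_by_sex(sample)

# Print one bar per category.
for label, n in sorted(counts.items()):
    print(f"{label:<8}{'#' * n} ({n})")
```

In a Jupyter workspace the same counts could be passed to a plotting library in place of the text bars.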

4 - Getting Started

Accessing the platform and loading in the data

To start using the dataset, access the Health Data Nexus. Check out this page for information on how to use the data.