Technical Notes

Life tables

The life tables necessary for calculating net survival were created using death certificates of Pennsylvania residents. A period life table for each calendar year was created for different combinations of age, race, and sex. The probability of dying in an age group was calculated using a Poisson model with flexible splines, allowing the creation of unabridged and smooth life tables.

While net survival assumes the population survival function excludes those with the condition, life tables excluding cancer patients are extremely hard to create. The statistics in this report instead use life tables for the entire population of Pennsylvania.

The life tables use death certificates for Pennsylvania residents whose ages at death are no more than 125. The racial categories covered by the life tables are white and black. Death certificates with another race are aggregated into the “other” race category.

Net survival

Let \(\lambda(t)\), called the hazard function, represent the instantaneous rate of mortality at time \(t\), \(t > 0\). Then the probability of surviving to time \(T\) is given by the survival function

\[ S(T) = \exp\left( -\int_{0}^{T} \lambda(t) \, dt \right) \]

If the observed hazard function is \(\lambda_O\) for a group with cancer and \(\lambda_P\) for a matching population without cancer, then the excess hazard is

\[ \lambda_{E} = \lambda_{O} - \lambda_{P} \]

If a survival function uses this excess hazard, it is called net survival.
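As a rough numerical illustration of these definitions, the sketch below integrates a hazard function with the trapezoidal rule and evaluates net survival from an excess hazard taken as the observed hazard minus the population hazard. The constant hazard values and the function name `survival` are assumptions chosen only for illustration; they are not estimates from this report.

```python
import math


def survival(hazard, T, steps=10_000):
    """Approximate S(T) = exp(-integral from 0 to T of hazard(t) dt)
    using the trapezoidal rule with `steps` equal subintervals."""
    if T <= 0:
        return 1.0
    h = T / steps
    total = 0.5 * (hazard(0.0) + hazard(T))
    for k in range(1, steps):
        total += hazard(k * h)
    return math.exp(-total * h)


# Hypothetical constant hazards (deaths per person-year), for illustration only.
def observed(t):      # lambda_O: hazard in the group with cancer
    return 0.10


def population(t):    # lambda_P: hazard in the matching population
    return 0.02


def excess(t):        # lambda_E: observed hazard minus population hazard
    return observed(t) - population(t)


net_survival_5yr = survival(excess, 5.0)
observed_survival_5yr = survival(observed, 5.0)
```

With these made-up inputs, net survival at 5 years is higher than observed survival, because the deaths expected from the population hazard have been removed.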

Net survival is never directly observable and cannot be used to estimate a patient’s expected time of death or the number of deaths among a population. Instead, it can highlight disparities in cancer diagnosis, treatment, and comorbidities between populations.

Pohar Perme method

For this report, net survival was estimated using the Pohar Perme method (Perme, Stare, and Estève 2012), which is an unbiased estimator of net survival in a relative survival setting.

Relative survival, which does not consider the cause of death, was used for 2 reasons:

  1. Pennsylvania does not have a standard method for deciding the primary cause of death.
  2. A person whose death was caused by something other than cancer might have lived longer had they not developed cancer.

In the Pohar Perme method, the predicted survival for each subpopulation was found by weighting its measured net survival (defined above) using the survival time distribution of the respective population. The application of weights attempted to reduce the bias of death due to population hazard.
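For readers who want the weighting written out, the estimator is usually stated in the following form. The notation is an assumption introduced here and is not taken from this report: \(N_i(u)\) counts the death of patient \(i\), \(Y_i(u)\) indicates whether patient \(i\) is still at risk at time \(u\), and \(S_{P_i}(u)\) and \(\lambda_{P_i}(u)\) are the population survival and hazard matched to patient \(i\) from the life tables.

\[ \hat{\Lambda}_{E}(t) = \int_{0}^{t} \frac{\sum_{i} dN_i(u) / S_{P_i}(u)}{\sum_{i} Y_i(u) / S_{P_i}(u)} - \int_{0}^{t} \frac{\sum_{i} Y_i(u) \, \lambda_{P_i}(u) / S_{P_i}(u)}{\sum_{i} Y_i(u) / S_{P_i}(u)} \, du \]

\[ \hat{S}_{E}(t) = \exp\left( -\hat{\Lambda}_{E}(t) \right) \]

Each patient is weighted by the inverse of their expected population survival, so patients who are more likely to have died of other causes by time \(u\) count for more among those still at risk.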

Population hazards come from life tables produced by the Department of Health. Records were matched to these life tables by age, sex, race, and calendar year. Due to unreliable population hazard estimates for races other than white and black, records with other races were matched to population hazards by age, sex, and calendar year only.
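The matching rule can be sketched as a lookup key. This is a minimal illustration that assumes the life tables are stored in a dictionary keyed by calendar year, age, sex, and race, with a hypothetical "all" entry holding the table for all races combined; the actual storage format is not described in this report.

```python
def life_table_key(year, age, sex, race):
    """Build the lookup key for a record's population hazard.

    Records with a race of white or black are matched on race; all other
    records fall back to the all-races table for the same year, age, and sex.
    The "all" label is an assumption made for this sketch.
    """
    if race in ("white", "black"):
        return (year, age, sex, race)
    return (year, age, sex, "all")


# Hypothetical life table entries (annual hazard), for illustration only.
life_table = {
    (2010, 60, "female", "white"): 0.006,
    (2010, 60, "female", "all"): 0.007,
}

hazard_matched = life_table[life_table_key(2010, 60, "female", "white")]
hazard_fallback = life_table[life_table_key(2010, 60, "female", "other")]
```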

Cohort approach

The cohort approach groups patients by year of diagnosis, from 2001 to 2017, with a survival function being created up to the final day of follow-up, December 31, 2017. A net survival estimate for a cohort can be considered the “experienced” net survival for an average patient diagnosed in a specific year.

Patients diagnosed in 2017 were not included in the analysis because death linkage for those cases was still in progress.

Complete approach

Recent cases do not have 5 years of follow-up, so the complete approach was used to create an alternative estimate of recent net survival. It included cases diagnosed during the 7-year period 2010 to 2016.

Cases with less than 5 years of follow-up still contributed to the hazard estimates during the first few years after diagnosis. The interval in which a case contributed depended on its year of diagnosis.

Table 1: Years of Diagnosis Contributing to Intervals of Time After Diagnosis
0 to 1 years 1 to 2 years 2 to 3 years 3 to 4 years 4 to 5 years

Details of this report

Data sources

Net survival estimates

All net survival estimates in this report used cancer incidence records provided by the Pennsylvania Cancer Registry (PCR) meeting certain criteria.

The patient:

  • was a Pennsylvania resident;
  • was at least 15 and at most 99 years old at diagnosis;
  • was not transgender, intersex, or of unknown sex;
  • was not of unknown race;
  • had known years of birth and last contact; and
  • survived at least 1 day past diagnosis.

The cancer:

  • was diagnosed during the period of 2001 to 2017;
  • was reported by a source other than a death certificate;
  • was invasive (including urinary bladder in situ);
  • was the first malignant diagnosis for the patient (including urinary bladder in situ);
  • had a known year of diagnosis; and
  • was not listed as being diagnosed after the patient’s date of last contact.
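The criteria above can be read as a single filter applied to each incidence record. The sketch below is only an illustration: every field name is hypothetical and does not reflect the PCR's actual record layout, and the report's own case selection was done in SAS.

```python
def is_eligible(case):
    """Return True if a record (a dict with hypothetical field names)
    satisfies every patient and cancer criterion listed above."""
    return bool(
        # Patient criteria
        case["pa_resident"]
        and 15 <= case["age_at_diagnosis"] <= 99
        and case["sex"] in ("male", "female")
        and case["race"] != "unknown"
        and case["birth_year"] is not None
        and case["last_contact_year"] is not None
        and case["survival_days"] >= 1
        # Cancer criteria
        and case["diagnosis_year"] is not None
        and 2001 <= case["diagnosis_year"] <= 2017
        and case["reporting_source"] != "death certificate"
        and (case["invasive"] or case["bladder_in_situ"])
        and case["first_malignancy"]
        and not case["diagnosed_after_last_contact"]
    )


# A made-up record that passes every check.
example_case = {
    "pa_resident": True, "age_at_diagnosis": 62, "sex": "female",
    "race": "white", "birth_year": 1948, "last_contact_year": 2015,
    "survival_days": 1800, "diagnosis_year": 2010,
    "reporting_source": "hospital", "invasive": True,
    "bladder_in_situ": False, "first_malignancy": True,
    "diagnosed_after_last_contact": False,
}
```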


Race

Race data in this report comes from what health care providers report to the PCR. The categories follow the definitions used in the 2000 U.S. Census (U.S. Census Bureau 2007, appendix G-7).

Stage at diagnosis

All statistics for the percentage of cancer cases diagnosed at a particular stage were retrieved from the Pennsylvania Department of Health’s EDDIE data query system. Further information on these statistics can be found on the system’s home page.

Early stage includes in situ, localized, stage I, and stage II cancers. Late stage includes regional, distant, stage III, and stage IV cancers. In situ cases were not used to calculate net survival rates because they are not invasive, so net survival rates for early-stage cases include only localized cases. The exception is in situ urinary bladder cases, which were used for net survival rates because of unclear standards for differentiating between in situ and localized cases.

Age-adjusted death rates

All age-adjusted death rates were retrieved from the Pennsylvania Department of Health’s EDDIE data query system. Further information on these statistics can be found on the system’s home page.

Imputed dates

Net survival estimates relied on the dates of 3 events: birth, diagnosis, and last contact. To avoid any bias from excluding records with an unknown month or day for any of these dates, these “incomplete” dates were imputed as being halfway between their earliest and latest possible dates.

For example, if a patient’s date of diagnosis is recorded as May 2005, but no day of the month is given, the date would be imputed as May 16, 2005. This is halfway between May 1, 2005, and May 31, 2005. But if the patient’s date of last contact were May 20, 2005, then the diagnosis date would be imputed as May 10, 2005, which is halfway between the start of the month and the last contact date.
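The midpoint rule can be sketched in a few lines. This is an illustration, not the report's SAS code: the function name and the `latest` argument are assumptions, and the midpoint is rounded down to a whole day, which is the rounding that reproduces both dates in the example above.

```python
import calendar
from datetime import date


def impute_date(year, month=None, day=None, latest=None):
    """Impute an incomplete date as the midpoint of its possible range.

    `latest` is an optional upper bound (for example, the date of last
    contact when imputing a diagnosis date). The midpoint is rounded down
    to a whole day, an assumption made for this sketch.
    """
    if month is None:
        first, last = date(year, 1, 1), date(year, 12, 31)
    elif day is None:
        first = date(year, month, 1)
        last = date(year, month, calendar.monthrange(year, month)[1])
    else:
        return date(year, month, day)  # complete date, nothing to impute
    if latest is not None and first <= latest < last:
        last = latest  # the event cannot occur after the upper bound
    return first + (last - first) // 2


diagnosis_unbounded = impute_date(2005, 5)
diagnosis_bounded = impute_date(2005, 5, latest=date(2005, 5, 20))
```

Here `diagnosis_unbounded` is May 16, 2005, and `diagnosis_bounded` is May 10, 2005, matching the example.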

Records that were missing the year of birth, diagnosis, or last contact were not included in the analysis.

Date of death

The PCR practices passive follow-up, in which it relies on incoming cancer abstracts and death certificate matches to determine a patient’s vital status (alive or dead). This means vital status was reliable only up to the most recent year of death that had been matched.

For this report, the maximum follow-up date was December 31, 2017. Patients living on this date are right-censored: days survived or dates of death beyond this date were not used in the analysis.
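Right-censoring at a cutoff date can be sketched as follows. The function name, its arguments, and the cutoff used in the example are assumptions for illustration; the example cutoff is not the report's follow-up date.

```python
from datetime import date


def follow_up(diagnosis, last_contact, died, cutoff):
    """Return (days_at_risk, death_observed) for one patient.

    `last_contact` is the date of death when `died` is True. Any time or
    death after `cutoff` is ignored, so such patients are treated as alive
    and censored at the cutoff.
    """
    if last_contact > cutoff:
        return (cutoff - diagnosis).days, False
    return (last_contact - diagnosis).days, died


example_cutoff = date(2015, 12, 31)  # illustrative cutoff only

# Death recorded after the cutoff: censored at the cutoff, no event counted.
censored = follow_up(date(2014, 1, 1), date(2016, 3, 1), True, example_cutoff)

# Death recorded before the cutoff: counted as an event.
observed_death = follow_up(date(2014, 1, 1), date(2014, 1, 31), True, example_cutoff)
```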


Insurance

A case’s insurance grouping is based on the primary payer of health care costs at diagnosis. Payers are grouped into 1 of 3 categories:

  • Insured
    • Private insurance
    • Medicare (except with Medicaid eligibility)
    • Military
  • Medicaid
    • Any Medicaid plan
    • Medicare with Medicaid eligibility
    • Indian/public health service
  • Uninsured
    • No insurance or payer other than self
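The grouping above amounts to a lookup from primary payer to category. In the sketch below, the payer labels are hypothetical stand-ins for whatever codes the registry records; only the three category names come from the list above.

```python
# Hypothetical payer labels mapped to the three categories listed above.
INSURANCE_GROUP = {
    "private insurance": "Insured",
    "medicare": "Insured",
    "military": "Insured",
    "medicaid": "Medicaid",
    "medicare with medicaid eligibility": "Medicaid",
    "indian/public health service": "Medicaid",
    "no insurance": "Uninsured",
    "self-pay": "Uninsured",
}


def insurance_group(primary_payer):
    """Return the insurance category for a payer label, or None if the
    label is not recognized (for example, an unknown payer)."""
    return INSURANCE_GROUP.get(primary_payer.strip().lower())
```

Note that Medicare with Medicaid eligibility is assigned to the Medicaid category rather than Insured, as in the list.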

Neighborhood poverty

A patient’s neighborhood poverty level reflects the percentage of households below the poverty line in the patient’s census tract at time of diagnosis. This report categorizes neighborhood poverty levels as:

Category     Range
Low          0% to < 5%
Moderate     5% to < 10%
High         10% to < 20%
Very high    20% to 100%

Poverty levels were assigned using the Poverty and Census Tract Linkage SAS Program provided by the North American Association of Central Cancer Registries.
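The category boundaries can be expressed as a small function. This sketch only illustrates the ranges given above; it is not the NAACCR linkage program, and the function name is an assumption.

```python
def poverty_category(percent_below_poverty):
    """Map a census tract poverty percentage (0 to 100) to the report's
    neighborhood poverty category. Lower bounds are inclusive and upper
    bounds are exclusive, except that 100% falls in "Very high"."""
    if not 0 <= percent_below_poverty <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_below_poverty < 5:
        return "Low"
    if percent_below_poverty < 10:
        return "Moderate"
    if percent_below_poverty < 20:
        return "High"
    return "Very high"
```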


Age adjustment

Net survival estimates were age-adjusted using the International Cancer Survival Standards (ICSS) populations (Corazziari, Quinn, and Capocaccia 2004; Surveillance, Epidemiology, and End Results Program 2012). The age groups used were 15-44, 45-54, 55-64, 65-74, and 75+. The population size for each age group depended on the cancer site and was meant to reflect the age distribution of incidence for the site.
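Age adjustment of this kind is a weighted average of the age-specific estimates, with weights proportional to the standard population. The sketch below assumes direct standardization; the weights and the age-specific net survival values are placeholders for illustration, and the actual ICSS weights for a given cancer site should be taken from the cited sources.

```python
def age_adjusted(net_survival_by_age, standard_weights):
    """Return the weighted average of age-specific net survival, using
    standard population weights (normalized here, so they need not sum to 1)."""
    total_weight = sum(standard_weights.values())
    weighted_sum = sum(
        net_survival_by_age[group] * weight
        for group, weight in standard_weights.items()
    )
    return weighted_sum / total_weight


# Placeholder weights and estimates, for illustration only.
example_weights = {"15-44": 7, "45-54": 12, "55-64": 23, "65-74": 29, "75+": 29}
example_net_survival = {
    "15-44": 0.90, "45-54": 0.85, "55-64": 0.80, "65-74": 0.70, "75+": 0.50,
}

adjusted = age_adjusted(example_net_survival, example_weights)
```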

Software used


SAS version 9.4 was used for record cleaning and preparation, date handling, case selection, and age-adjustment.


Stata version 14 was used for calculating net survival rates using version 1.4 of the stns command (Clerc-Urmès, Grzebyk, and Hédelin 2014).


R version 3.6.2 (2019-12-12) was used for analyzing the net survival results, producing graphics, and creating this report with the bookdown and knitr packages. The third-party packages used and their versions are listed below.

  • bookdown 0.15
  • data.table 1.12.8
  • DBI 1.1.0
  • gdtools 0.2.1
  • ggiraph 0.7.0
  • ggplot2 3.2.1
  • htmltools 0.4.0
  • knitr 1.27
  • rmarkdown 2.1
  • RSQLite 2.2.0
  • stringi 1.4.4
  • xml2 1.2.2
  • yaml 2.2.0