The life tables necessary for calculating net survival were created using death certificates of Pennsylvania residents. A period life table for each calendar year was created for different combinations of age, race, and sex. The probability of dying in an age group was calculated using a Poisson model with flexible splines, allowing the creation of unabridged and smooth life tables.
While net survival assumes the population survival function excludes those with the condition, life tables excluding cancer patients are extremely hard to create. The statistics in this report instead use life tables for the entire population of Pennsylvania.
The life tables use death certificates for Pennsylvania residents whose ages at death are no more than 125. The racial categories covered by the life tables are white and black. Death certificates with another race are aggregated into the “other” race category.
Let \(\lambda(t)\), called the hazard function, represent the instantaneous rate of mortality at time \(t\), \(t > 0\). Then the probability of surviving to time \(T\) is given by the survival function
\[ S(T) = \exp\left( -\int_{0}^{T} \lambda(t) \, dt \right) \]
If the observed hazard function is \(\lambda_O\) for a group with cancer and \(\lambda_P\) for a matching population without cancer, then the excess hazard is
\[ \lambda_{E} = \lambda_{O} - \lambda_{P} \]
If a survival function uses this excess hazard, it is called net survival.
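As a minimal numerical sketch of these definitions, the Python below integrates a hazard function to obtain a survival probability, taking the excess hazard as the observed hazard minus the population hazard. The constant hazard values and the function names are hypothetical and chosen only for illustration; they are not estimates from this report.

```python
import math

def survival(hazard, T, steps=10_000):
    """Approximate S(T) = exp(-integral from 0 to T of hazard(t) dt)
    using the trapezoidal rule."""
    dt = T / steps
    total = 0.0
    for i in range(steps):
        t0, t1 = i * dt, (i + 1) * dt
        total += 0.5 * (hazard(t0) + hazard(t1)) * dt
    return math.exp(-total)

# Hypothetical constant hazards (per year), for illustration only.
def observed_hazard(t):
    return 0.10  # lambda_O: group with cancer

def population_hazard(t):
    return 0.02  # lambda_P: matching population

def excess_hazard(t):
    # lambda_E = lambda_O - lambda_P
    return observed_hazard(t) - population_hazard(t)

observed_survival = survival(observed_hazard, 5.0)
expected_survival = survival(population_hazard, 5.0)
net_survival = survival(excess_hazard, 5.0)
```

Because the hazards subtract inside the exponent, the net survival in this toy setting equals the observed survival divided by the expected survival. That identity holds for a single homogeneous group; it is not how the estimates in this report were computed.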
Net survival is never directly observable and cannot be used to estimate a patient’s expected time of death or the number of deaths among a population. Instead, it can highlight disparities in cancer diagnosis, treatment, and comorbidities between populations.
For this report, net survival was estimated using the Pohar Perme method (Perme, Stare, and Estève 2014), which is an unbiased estimator of net survival in the relative survival setting.
Relative survival, which does not consider the cause of death, was used for 2 reasons:
In the Pohar Perme method, the predicted survival for each subpopulation was found by weighting its measured net survival (defined above) using the survival time distribution of the respective population. The application of weights attempted to reduce the bias of death due to population hazard.
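For reference, the Pohar Perme estimator is usually written in the form below. The notation is an assumption introduced here and is not taken from this report: \(N_i(u)\) counts the observed deaths of patient \(i\), \(Y_i(u)\) indicates whether patient \(i\) is still at risk at time \(u\), and \(S_{P,i}\) and \(\lambda_{P,i}\) are the population survival and hazard matched to patient \(i\).

\[ dN_i^{w}(u) = \frac{dN_i(u)}{S_{P,i}(u)}, \qquad Y_i^{w}(u) = \frac{Y_i(u)}{S_{P,i}(u)} \]

\[ \hat{\Lambda}_{E}(t) = \int_{0}^{t} \frac{\sum_i dN_i^{w}(u)}{\sum_i Y_i^{w}(u)} - \int_{0}^{t} \frac{\sum_i Y_i^{w}(u) \, \lambda_{P,i}(u)}{\sum_i Y_i^{w}(u)} \, du \]

\[ \hat{S}_{N}(t) = \exp\left( -\hat{\Lambda}_{E}(t) \right) \]

Dividing by the population survival gives more weight to patients who would be expected to have died from other causes by time \(u\), which is how the weighting compensates for deaths due to the population hazard.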
Population hazards come from life tables produced by the Department of Health. Records were matched to these life tables by age, sex, race, and calendar year. Due to unreliable population hazard estimates for races other than white and black, records with other races were matched to population hazards by age, sex, and calendar year only.
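The matching rule described above can be sketched as a keyed lookup with a fallback for other races. The table names, the dictionary layout, and every value below are hypothetical; the actual life tables and matching code are not shown in this report.

```python
# Hypothetical life-table excerpts keyed by matching variables.
# All values are made up for illustration.
RACE_SPECIFIC = {
    # (age, sex, race, calendar year) -> population hazard
    (70, "F", "white", 2010): 0.015,
    (70, "F", "black", 2010): 0.019,
}
ALL_RACES = {
    # (age, sex, calendar year) -> population hazard
    (70, "F", 2010): 0.016,
}

def matched_population_hazard(age, sex, race, year):
    """Return the population hazard matched to one record."""
    if race in ("white", "black"):
        # White and black records are matched on race as well.
        return RACE_SPECIFIC[(age, sex, race, year)]
    # Records with other races are matched on age, sex, and year only.
    return ALL_RACES[(age, sex, year)]
```

For example, `matched_population_hazard(70, "F", "other", 2010)` falls through to the table that ignores race.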
The cohort approach groups patients by year of diagnosis, from 2001 to 2017, with a survival function being created up to the final day of follow-up, December 31, 2017. A net survival estimate for a cohort can be considered the “experienced” net survival for an average patient diagnosed in a specific year.
Patients diagnosed in 2017 were not included in the analysis because death linkage for those cases was still in progress.
Recent cases do not have 5 years of follow-up, so the complete approach was used to create an alternative estimate of recent net survival. It included cases diagnosed during the 7-year period 2010 to 2016.
Cases with less than 5 years of follow-up still contributed to the hazard estimates during the first few years after diagnosis. The interval in which a case contributed depended on its year of diagnosis.
Year of diagnosis | 0 to 1 years | 1 to 2 years | 2 to 3 years | 3 to 4 years | 4 to 5 years |
---|---|---|---|---|---|
2010 | | | | | |
2011 | | | | | |
2012 | | | | | |
2013 | | | | | |
2014 | | | | | |
2015 | | | | | |
2016 | | | | | |
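The idea that a case contributes only to the intervals it can reach before follow-up ends can be sketched as follows. This is an illustration under stated assumptions, not the code used for this report: the function name is hypothetical, a year is approximated as 365.25 days, the cutoff is passed in as a parameter, and death or earlier censoring is ignored.

```python
from datetime import date

def contributing_intervals(diagnosis, followup_end, max_years=5):
    """Return the yearly intervals since diagnosis, as (start, end) pairs,
    that a case can enter before the follow-up cutoff."""
    days_available = (followup_end - diagnosis).days
    intervals = []
    for k in range(max_years):
        # The case enters interval k only if it is still under
        # observation k years (approximated as 365.25 days) after diagnosis.
        if days_available > k * 365.25:
            intervals.append((k, k + 1))
    return intervals

# Hypothetical case diagnosed in mid-2015, with an example cutoff.
example = contributing_intervals(date(2015, 7, 1), date(2018, 12, 31))
```

In this example the case can contribute to the first four yearly intervals but not to the 4 to 5 years interval.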
All net survival estimates in this report used cancer incidence records provided by the Pennsylvania Cancer Registry (PCR) meeting certain criteria.
The patient:
The cancer:
Race data in this report comes from what health care providers report to the PCR. The categories follow the definitions used in the 2000 U.S. Census (U.S. Census Bureau 2007, appendix G-7).
All statistics for the percentage of cancer cases diagnosed at a particular stage were retrieved from the Pennsylvania Department of Health’s EDDIE data query system. Further information on these statistics can be found on the system’s home page.
Early stage includes in situ, localized, stage I, and stage II cancers. Late stage includes regional, distant, stage III, and stage IV cancers. In situ cases were not used to calculate net survival rates because they are not invasive. Net survival rates for early-stage cases therefore include only localized cases. The exception is in situ urinary bladder cases, which were used for net survival rates because of unclear standards for differentiating between in situ and localized cases.
All age-adjusted death rates were retrieved from the Pennsylvania Department of Health’s EDDIE data query system. Further information on these statistics can be found on the system’s home page.
Net survival estimates relied on the dates of 3 events: birth, diagnosis, and last contact. To avoid any bias from excluding records with an unknown month or day for any of these dates, these “incomplete” dates were imputed as being halfway between their earliest and latest possible dates.
For example, if a patient’s date of diagnosis is recorded as May 2005, but no day of the month is given, the date would be imputed as May 16, 2005. This is halfway between May 1, 2005, and May 31, 2005. But if the patient’s date of last contact were May 20, 2005, then the diagnosis date would be imputed as May 10, 2005, which is halfway between the start of the month and the last contact date.
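The midpoint rule can be sketched in Python as below. The function name and parameters are hypothetical, and rounding a half-day midpoint down to the earlier date is an assumption chosen because it reproduces the two examples above; the report's actual date handling was done in SAS and is not shown.

```python
import calendar
from datetime import date, timedelta

def impute_date(year, month=None, day=None, latest=None):
    """Impute an incomplete date as the midpoint of its possible range.
    `latest` optionally caps the range, e.g. at the date of last contact."""
    if month is not None and day is not None:
        return date(year, month, day)  # complete date, nothing to impute
    if month is None:
        start, end = date(year, 1, 1), date(year, 12, 31)
    else:
        last_day = calendar.monthrange(year, month)[1]
        start, end = date(year, month, 1), date(year, month, last_day)
    if latest is not None and start <= latest < end:
        end = latest  # the event cannot fall after this date
    # Assumption: integer division rounds a half-day midpoint down.
    return start + timedelta(days=(end - start).days // 2)

diagnosis_a = impute_date(2005, 5)
diagnosis_b = impute_date(2005, 5, latest=date(2005, 5, 20))
```

Here `diagnosis_a` is May 16, 2005, and `diagnosis_b` is May 10, 2005, matching the worked example.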
Records that were missing the year of birth, diagnosis, or last contact were not included in the analysis.
The PCR practices inactive follow-up, in which it relies on incoming cancer abstracts and death certificate matches to determine a patient’s vital status (alive/dead). This means the vital status was only reliable up to the most recent year of death that had been matched.
For this report, the maximum follow-up date was December 31, 2018. Patients living on this date are right-censored: days survived or dates of death beyond this date were not used in the analysis.
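A minimal sketch of this right-censoring step is shown below. The function name and the record layout are hypothetical. How records last seen alive before the cutoff were handled is not specified here, so censoring them at their last contact date is an assumption of this sketch.

```python
from datetime import date

FOLLOWUP_END = date(2018, 12, 31)  # maximum follow-up date stated above

def censor(diagnosis, last_contact, died):
    """Return (days_survived, event) truncated at the follow-up cutoff.
    event is 1 for an observed death and 0 for a censored record."""
    if last_contact > FOLLOWUP_END:
        # Survival time and deaths after the cutoff are not used,
        # so the record is censored at the cutoff.
        return (FOLLOWUP_END - diagnosis).days, 0
    # Assumption: a record last seen alive before the cutoff is
    # censored at its last contact date.
    return (last_contact - diagnosis).days, int(died)
```

For example, a patient diagnosed on January 1, 2015, who died in March 2019 would be treated as alive and censored on December 31, 2018.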
A case’s insurance grouping is based on the primary payer of health care costs at diagnosis. Payers are grouped into 1 of 3 categories:
A patient’s neighborhood poverty level reflects the percentage of households below the poverty line in the patient’s census tract at time of diagnosis. This report categorizes neighborhood poverty levels as:
Category | Range |
---|---|
Low | 0% to < 5% |
Moderate | 5% to < 10% |
High | 10% to < 20% |
Very high | 20% to 100% |
Poverty levels were assigned using the Poverty and Census Tract Linkage SAS Program provided by the North American Association of Central Cancer Registries.
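The category boundaries in the table above can be expressed as a small function. This is only an illustration of the cut points; the function name is hypothetical and it is not part of the NAACCR SAS program.

```python
def poverty_category(percent_below_poverty):
    """Map a census tract's poverty percentage to the report's category."""
    p = percent_below_poverty
    if not 0 <= p <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if p < 5:
        return "Low"
    if p < 10:
        return "Moderate"
    if p < 20:
        return "High"
    return "Very high"
```

Each lower bound is inclusive, so a tract at exactly 10% falls in the High category.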
Net survival estimates were age-adjusted using the International Cancer Survival Standards (ICSS) populations (Corazziari, Quinn, and Capocaccia 2004; Surveillance, Epidemiology, and End Results Program 2012). The age groups used were 15-44, 45-54, 55-64, 65-74, and 75+. The population size for each age group depended on the cancer site and was meant to reflect the age distribution of incidence for the site.
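Age adjustment of this kind is commonly done by direct standardization, that is, a weighted average of the age-specific estimates using the standard population as weights. The sketch below assumes that approach. The weights and survival values are hypothetical placeholders, not actual ICSS weights or results from this report, and the report's own age adjustment was carried out in SAS.

```python
AGE_GROUPS = ["15-44", "45-54", "55-64", "65-74", "75+"]

def age_adjusted_survival(survival_by_age, weights):
    """Weighted average of age-specific net survival estimates."""
    total_weight = sum(weights[g] for g in AGE_GROUPS)
    weighted_sum = sum(survival_by_age[g] * weights[g] for g in AGE_GROUPS)
    return weighted_sum / total_weight

# Hypothetical standard population weights (placeholders, not ICSS values).
weights = {"15-44": 10, "45-54": 15, "55-64": 25, "65-74": 30, "75+": 20}

# Hypothetical age-specific 5-year net survival estimates.
survival_by_age = {"15-44": 0.90, "45-54": 0.85, "55-64": 0.80,
                   "65-74": 0.70, "75+": 0.50}

adjusted = age_adjusted_survival(survival_by_age, weights)
```

Using the same weights for every population being compared removes differences that are due only to their age distributions.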
SAS version 9.4 was used for record cleaning and preparation, date handling, case selection, and age-adjustment.
Stata version 14 was used for calculating net survival rates using version 1.4 of the stns command (Clerc-Urmès, Grzebyk, and Hédelin 2014).
R version 3.6.2 (2019-12-12) was used for analyzing the net survival results, producing graphics, and creating this report with the bookdown and knitr packages. The following is the list of used third-party packages and their versions.