Dr. Steven Skates completed his undergraduate education at the University of Western Australia and his graduate education in Statistics at the University of Chicago. He is currently an Associate Professor at Massachusetts General Hospital and Harvard Medical School, where he is an early detection investigator. His research focuses on developing longitudinal algorithms for early detection of ovarian cancer – often referred to as a “silent killer” because in its early stages its symptoms are vague and non-specific – conducting and analyzing screening trials that implement these algorithms, and discovering and validating serum biomarkers for early detection of ovarian cancer.
On Tuesday Oct. 3, Dr. Skates will be at McGill to give a lecture titled Mortality analyses in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS) that will be held in the Meakins Amphitheatre (room 521) at the McIntyre Building from 3:30 – 4:30 p.m.
Ahead of his visit, which was organized by the Faculty of Medicine’s Department of Epidemiology, Biostatistics and Occupational Health, Dr. Skates took the time to answer some questions about his work.
Why did you and your colleagues mount this huge trial of screening for ovarian cancer (UKCTOCS)? Why did it require such a large and long study?
Progress in reducing ovarian cancer mortality through therapeutic improvements has been modest largely because ovarian cancer is detected mostly in late stage due to lack of specific symptoms, and in late stage prognosis is very poor. However, if detected in early stage, prognosis is excellent with surgical removal of the ovaries (oophorectomy) the primary treatment method. Thus ovarian cancer is a natural target for exploring early detection as an approach to reducing ovarian cancer mortality.
Most ovarian cancers occur in postmenopausal women. An immediate challenge for ovarian cancer screening is the low annual incidence rate of about 1 in 2,300 postmenopausal women. There are no strong risk factors other than age and family history, with the latter usually indicating the presence of a BRCA1 or BRCA2 gene mutation. These inherited mutations account for about 10 per cent of ovarian cancers, while 90 per cent of these cancers are sporadic.
Since definitive diagnosis of ovarian cancer requires major surgery (laparotomy), only a few false positive results – at most 10 – at this stage of the screening exam would be allowed for each screen-detected case. This requires a very low false positive rate while still having a reasonable true positive rate (sensitivity). Between 1986 and 1993, multiple pilot screening trials in the UK, led by Ian Jacobs, and in Sweden showed that CA125 testing, which measures the amount of the protein CA125 (cancer antigen 125) in the blood, followed by ultrasound for a positive blood test, had a sufficiently low false positive rate that only five operations were required to detect one ovarian cancer. From these trials we learned that each woman has her own CA125 level and variation about that level, and we developed a method for detecting significant increases above this baseline, which increased sensitivity while maintaining the same low false positive rate. Another UK screening trial, from 1996 to 2001, implementing this method showed evidence of improved sensitivity for early stage disease, sufficient to convince medical research agencies to fund a definitive trial of 200,000 women, half not screened (control arm) and half screened annually. Half the screened group were tested with the longitudinal CA125 algorithm followed by ultrasound for a positive blood test, and half had an ultrasound as the primary screen. The primary endpoint was ovarian cancer mortality.
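To see why the false positive rate has to be so low, the constraint described above can be worked through numerically. The sketch below is illustrative only: it treats the annual incidence of about 1 in 2,300 as the prevalence at a single screen and assumes a sensitivity of 80 per cent, a hypothetical value that does not come from the trials. The function name is also made up for this example.

```python
def max_false_positive_rate(prevalence, sensitivity, max_fp_per_case):
    """Largest false positive rate that keeps the number of false positives
    per screen-detected case at or below max_fp_per_case."""
    # Screen-detected cases per woman screened
    true_positives = sensitivity * prevalence
    # False positives we can tolerate per woman screened
    allowed_false_positives = max_fp_per_case * true_positives
    # Those false positives arise only among women without cancer
    return allowed_false_positives / (1.0 - prevalence)


prevalence = 1 / 2300   # from the text: about 1 in 2,300 per year
sensitivity = 0.80      # ASSUMPTION, chosen for illustration only
fpr = max_false_positive_rate(prevalence, sensitivity, max_fp_per_case=10)

print(f"maximum false positive rate: {fpr:.4%}")
print(f"minimum specificity:         {1 - fpr:.4%}")
```

Under these assumptions the false positive rate has to stay below roughly 0.35 per cent, which corresponds to a specificity above about 99.65 per cent.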
In all medical trials, there is variation in the number of events due to biological variation between people, even when an intervention has no effect. To rule out variation as the cause of the difference in the number of events between the intervention arm and the control arm, we need a sufficient number of events in both arms to judge that variation is unlikely to explain the observed difference when there is no effect and, similarly, when there is an effect.
Since ovarian cancer is relatively infrequent, the trial had to be large and long to observe a sufficient number of deaths due to ovarian cancer, so that we have a high chance (95 per cent) of saying an effect is not present when there is no effect, and a high chance (80 per cent) of saying an effect is present when there is an effect. We hypothesized that the size of the effect was a 30 per cent reduction in ovarian cancer mortality. To ensure any difference in mortality is unlikely (95 per cent) to be due to chance variation when there is no effect, the trial required 228 ovarian cancer deaths in the control arm. To have a high chance (80 per cent) of detecting a 30 per cent mortality reduction if screening does have an effect requires 80 deaths in one of the screened arms.
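The link between effect size, error rates and the number of deaths needed can be illustrated with a textbook approximation. The sketch below uses Schoenfeld's formula for a two-sided log-rank test, which is an assumption made here for illustration and not necessarily the calculation the trial team used, so it is not expected to reproduce the figures of 228 and 80 quoted above. It compares one screening arm with the control arm, taking the screened fraction as one third (50,000 versus 100,000 women, as implied by the design described earlier), and treats a 30 per cent mortality reduction as a hazard ratio of 0.70.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80, frac_screened=1 / 3):
    """Schoenfeld's approximation: total deaths (both arms combined) needed
    for a two-sided log-rank test to detect the given hazard ratio."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # guards against false alarms
    z_power = std_normal.inv_cdf(power)          # guards against missed effects
    p = frac_screened
    return (z_alpha + z_power) ** 2 / (p * (1 - p) * log(hazard_ratio) ** 2)


# 30 per cent mortality reduction expressed as a hazard ratio of 0.70
events = required_events(hazard_ratio=0.70)
print(ceil(events))
```

With these inputs the approximation calls for a few hundred deaths in total, which, given an incidence of about 1 in 2,300 per year, is why so many women had to be followed for so long.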
These numbers of ovarian cancer deaths require 200,000 women to be enrolled in this randomized trial and screened annually for 7–11 years. The consenting and randomization of 200,000 women required invitations sent to over 1,200,000 postmenopausal women of whom 205,000 accepted and 202,000 were eligible and randomized. This process required four years – hence the range of 7–11 years of annual screening. Screening started in June 2001 and ended on December 31, 2011, and follow-up continued through December 31, 2014, with data cleaning, analysis and results conducted, written, and published by December 2015.
What were the results, and how were they received? Why has there been push-back from some quarters as to the results so far?
The primary test of the difference compared all the ovarian cancer deaths in the control arm to all the ovarian cancer deaths in each of the two screening arms. There was a 15 per cent reduction, which was not statistically significant (p-value of 0.10). However, this analysis used a model which assumes the effect is immediate and is a constant 15 per cent reduction throughout the 14 years of the trial. Professor Jim Hanley (of McGill’s Department of Epidemiology, Biostatistics and Occupational Health) has long been making the case that the assumption of a constant reduction in screening (and prevention) trials is not appropriate. When this assumption is not true it lowers the power of the analysis, meaning that we could miss detecting an effect if it is truly there. A p-value below 0.05 is conventionally considered statistically significant, and because we had two screening arms compared to one control arm, the significance threshold for each comparison was 0.0256, which adjusts for the chance factor in making two comparisons.
However, we also pre-specified a sub-group for which the longitudinal test would work best, namely the cases for which a baseline CA125 had been measured before a rise was detected. Such cases are the incident cases and formed 80 per cent of all the cases. The other cases were present at time of first test, and are referred to as prevalent cases. For the incident cases, there was sufficient evidence of non-proportionality, and with a different statistical model that allowed for non-proportionality, there was a significant reduction in ovarian cancer mortality with a p-value of 0.021.
As with all screening trials where an effect has been observed, there was a delay in the effect since time is measured from time of randomization which is close to the first screening test. In contrast, therapeutic trials measure time from randomization which is after diagnosis of the disease and close to the time of first treatment. When the treatment works the effect is often immediately observed, hence the proportionality assumption is often reasonable for therapeutic trials, but not for early detection trials where statistical models allowing for non-proportionality are more appropriate. The delay in the screening effect was seven years. With the model allowing for non-proportionality, the ovarian cancer mortality reduction in the incident cases was eight per cent in the first seven years (0–7), and 28 per cent in the second seven years (7–14).
The execution of the trial was well received. The non-significant main analysis was disappointing, and there was pushback on the sub-group analysis. The interpretation was controversial since the usual procedure is to determine whether an effect is present through the main analysis, and then determine if there are differences of the effect in sub-groups. We stated there was an effect in the sub-group of incident cases (80 per cent of cases) even though the main analysis did not find a significant effect, hence there was pushback from some sectors of the community. However, because the main analysis relied on an assumption of proportionality throughout the trial, which is not appropriate when there is a delayed effect as has been the case in all other screening trials and in this trial, and the sub-group analysis was pre-specified, we expressed cautious optimism in the results without claiming a definitive result. The net outcome is that the team has been funded to conduct four additional years of follow-up in which further ovarian cancer deaths will be observed, potentially increasing the power of the analysis.
Other screening tests are also in development, including other serum protein tests, cell-free DNA tests in the blood, DNA tests in uterine lavage, and new imaging tests.
Screening for breast cancer with mammography has about a 20 per cent mortality reduction, as does PSA screening for prostate cancer and low-dose CT screening for lung cancer, while colorectal cancer screening with sigmoidoscopy has about a 26–31 per cent reduction. The average over the 14 years of the ovarian cancer screening study (UKCTOCS) is 15 per cent, somewhat lower than the results above from screening trials for other cancers. However, if the delayed effect is taken into account, and the effect during the second half of the study (7–14 years) is extrapolated to a 30-year annual screening program from ages 50–80, then the estimated effect over the 30 years is a 23 per cent mortality reduction.
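One simple way to see how a figure of this size can arise is a time-weighted average of the period-specific reductions reported earlier in the interview for incident cases (eight per cent in years 0–7 and 28 per cent in years 7–14). The sketch below assumes that the 28 per cent reduction persists for the remaining 23 years of a 30-year program and that deaths are spread evenly across years. Both are simplifying assumptions made for illustration, and this is not necessarily how the investigators derived their estimate.

```python
def time_weighted_reduction(periods):
    """Average mortality reduction over consecutive periods.

    periods: list of (years, reduction) pairs, with reduction as a fraction.
    Assumes deaths are spread evenly over time (illustrative simplification).
    """
    total_years = sum(years for years, _ in periods)
    weighted_sum = sum(years * reduction for years, reduction in periods)
    return weighted_sum / total_years


# 7 years at 8 per cent, then 23 years at 28 per cent (30-year program)
program = [(7, 0.08), (23, 0.28)]
estimate = time_weighted_reduction(program)
print(f"{estimate:.1%}")
```

Under these assumptions the average comes to about 23 per cent, in line with the figure quoted above.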
You are a statistician. What has been your role in ovarian cancer screening? What are the biggest statistical challenges for you?
I trained as a statistician, but my career has evolved and it is more accurate to say I am a researcher in the early detection of cancer. I have been involved in the analysis of the early screening trials and, with Ian Jacobs, developed the risk of ovarian cancer algorithm (ROCA) from these analyses. ROCA is designed to detect significant rises in CA125 above a woman’s own baseline. I was involved in the design of UKCTOCS, in its execution, and in its analysis and reporting. In total, ROCA has been implemented in six ovarian cancer early detection trials in the US and UK, in postmenopausal women at normal risk and in high-risk women with a strong family history of breast and/or ovarian cancer. For one of the US trials I was the principal investigator. Additional biomarkers beyond CA125 are clearly needed to increase sensitivity even further, and I have led a project on biomarker discovery and validation through the NCI’s Early Detection Research Network. Designing an optimal biomarker pipeline, which has multiple phases for deciding which candidates should move forward to the next phase, presents multiple statistical challenges, many still unsolved. At present, the biggest statistical challenge for me in UKCTOCS is addressing the assumption of proportionality in the main analysis in an unbiased manner.
Some people say that any way to find cancers earlier and thus treat them earlier has to be good. So why are some people so reluctant to advocate (mass) cancer screening? Is cancer screening being under-used or over-used?
Any cancer screening program will have true positive and true negative tests, which are benefits to the people being screened, and these benefits need to outweigh the harms due to false positive and false negative tests. There will be people sent to surgery for suspected ovarian cancer who have diseases other than ovarian cancer and in some cases do not have any disease. The net harms from such false positive tests need to be outweighed by the net benefits of true positive tests which detect ovarian cancer before it would have been detected clinically. There is debate about what constitutes a reasonable trade-off; hence some researchers assert that some cancer screening programs do not achieve one. PSA testing for prostate cancer screening is controversial since many prostate cancers detected would not have caused the patient to die, and recently the value of mammography relative to the number of exams and follow-up needed to detect one cancer has been questioned.
There is clearly underutilization in that most cancer screening programs supported by such organizations as the American Cancer Society do not reach all eligible people. For the past 20 years the death rate due to breast cancer has been going down. Half of this decrease has been attributed to early detection with mammography, and half to advances in therapy. While advances in therapy for ovarian cancer have increased the median survival from two years to four years over the past 50 years, the death rate remains constant. We hope early detection for ovarian cancer with our approach will ultimately show an unequivocal reduction in the death rate due to ovarian cancer in the trial. If so, then we hope it will become a supported program and contribute to lowering the ovarian cancer mortality rate in the general population.