An international Ki67 reproducibility study.
BACKGROUND: In breast cancer, immunohistochemical assessment of proliferation using the marker Ki67 has potential use in both research and clinical management. However, lack of consistency across laboratories has limited Ki67's value. A working group was assembled to devise a strategy to harmonize Ki67 analysis and increase scoring concordance. Toward that goal, we conducted a Ki67 reproducibility study.
METHODS: Eight laboratories received 100 breast cancer cases arranged into 1-mm core tissue microarrays-one set stained by the participating laboratory and one set stained by the central laboratory, both using antibody MIB-1. Each laboratory scored Ki67 as percentage of positively stained invasive tumor cells using its own method. Six laboratories repeated scoring of 50 locally stained cases on 3 different days. Sources of variation were analyzed using random effects models with log2-transformed measurements. Reproducibility was quantified by intraclass correlation coefficient (ICC), and the approximate two-sided 95% confidence intervals (CIs) for the true intraclass correlation coefficients in these experiments were provided.
RESULTS: Intralaboratory reproducibility was high (ICC = 0.94; 95% CI = 0.93 to 0.97). Interlaboratory reproducibility was only moderate (central staining: ICC = 0.71, 95% CI = 0.47 to 0.78; local staining: ICC = 0.59, 95% CI = 0.37 to 0.68). Geometric mean of Ki67 values for each laboratory across the 100 cases ranged 7.1% to 23.9% with central staining and 6.1% to 30.1% with local staining. Factors contributing to interlaboratory discordance included tumor region selection, counting method, and subjective assessment of staining positivity. Formal counting methods gave more consistent results than visual estimation.
CONCLUSIONS: Substantial variability in Ki67 scoring was observed among some of the world's most experienced laboratories. Ki67 values and cutoffs for clinical decision-making cannot be transferred between laboratories without standardizing scoring methodology because analytical validity is limited.