Accuracy of chest computed tomography and reverse transcription polymerase chain reaction in diagnosis of 2019 novel coronavirus disease; a systematic review and meta-analysis

1 Minimally Invasive Surgery Research Center, Iran University of Medical Sciences, Tehran, Iran
2 Shahid Beheshti University of Medical Sciences, Tehran, Iran
3 Student Research Committee, Faculty of Medicine, Iran University of Medical Sciences, Tehran, Iran
4 School of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran
5 Faculty of Medicine, Iran University of Medical Sciences, Tehran, Iran
6 Department of Pediatric Nephrology, Firoozabadi Hospital, Iran University of Medical Sciences, Tehran, Iran
7 Emergency Medicine Management Research Center, Iran University of Medical Sciences, Tehran, Iran
8 Metabolic Disorders Research Center, Endocrinology and Metabolism Molecular-Cellular Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
9 Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
# Three authors have contributed equally and all of them are first authors.
Immunopathologia Persa, Volume 7, Issue 2, 2021 (www.immunopathol.com)


Rationale
Coronaviruses are important human pathogens, causing a broad range of conditions from encephalitis to enteritis and, more prominently nowadays, pneumonia. The latter seems to be the most frequent and critically severe manifestation of the current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), previously known as 2019-nCoV (1,2). This novel coronavirus has infected more people than its two epidemic predecessors, SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV), which have had cumulatively 10 000 cases so far (1,3,4). The disease occurs mostly in patients 30-79 years old (86.6% of confirmed cases), and most cases (81%) had at most mild manifestations (non-pneumonia or mild pneumonia) (1,5,6). Different tests with various accuracies, results similar to those of other infections, and no robust collective diagnostic accuracy all confuse clinicians in

DOI: 10.34172/ipp.2021.36

Objectives
This study aims to use the data sets regarding chest CT and RT-PCR extracted from various studies (conducted in China and other affected countries) to achieve a bird's-eye view and to establish more reliable and robust clinical diagnostic criteria for this emergent matter. This is achieved through a systematic review and diagnostic test accuracy (DTA) meta-analysis for CT and RT-PCR. Moreover, the publications that report categorized CT findings are described and analyzed.

Protocol and registration
This study was conducted according to the PRISMA statement (13), its extension for DTA meta-analyses (14), and the MOOSE group's proposal for reporting meta-analyses of observational studies in epidemiology (15).

Eligibility criteria
The inclusion criteria comprised (a) all population-based studies after December 1, 2019, (b) addressing chest CT and/or RT-PCR tests for SARS-CoV-2, either as their main subject or as a part of their work, (c) with directly reported or extractable values of sensitivity, specificity, or any statistical/quantitative measurement of the diagnostic quality of the test. The exclusion criteria consisted of (a) all publications not meeting the above, (b) non-English literature, (c) studies before December 1, 2019, (d) case reports, reviews, or descriptive/qualitative studies, and studies that only introduced a novel diagnostic test, and (e) studies that made no mention of any diagnostic test or quantitative measurement.

Search strategies
eTable 1 in online Supplementary file 1 shows the search strategies used, designed by AK, FS, HM, KS, and PP so as not to limit the entries to any condition, but only to the alternative names by which SARS-CoV-2/COVID-19 has been called. AK, HA, and PP started and completed the search on March 26, 2020, and only articles published after December 1, 2019, were included. Google Scholar, as a cumulative database, was limited to the first 500 related results.

Data collection process
EndNote® X9 (Clarivate Analytics, Philadelphia, USA) was used for study screening and data extraction. AK, FS, HM, KS, and PP assigned each study to the inclusion or exclusion group. In the first step, each of the five authors read the titles and abstracts and, when in doubt, evaluated the full text. In the second step, the five authors read the full texts and carried out the final inclusion. Disagreements regarding inclusion were resolved through discussion, and no third-party involvement was necessary.

Data extraction
AA, AJ, DM, FS, HA, HD, PP, RT, SM, and SV extracted data, filling a pre-designed spreadsheet containing study characteristics and variables regarding chest CT and RT-PCR, subgroups and definitions, categories of chest CT findings, sensitivity, specificity, positive and negative predictive values (PPV and NPV), positive and negative likelihood ratios (PLR and NLR), odds ratio (OR), accuracy, and demographics. Where any of these values was not directly reported in a study, the authors calculated it if possible.

Key point
In a systematic review and meta-analysis to measure the diagnostic accuracy of chest computed tomography and reverse transcription polymerase chain reaction assay, we recommend initially performing chest CT to rule out the uninfected people. In suspicious cases, we suggest RT-PCR to confirm the disease. Performing serial reverse transcription polymerase chain reaction instead of a one-time test is highly recommended, to let the viral load reach diagnostic levels, especially in cases of high clinical suspicion.

Quality assessment
AA, AJ, AK, DM, FS, HA, HD, PP, RT, SM, and SV completed the quality assessment based on the QUADAS-2 revised tool for diagnostic accuracy studies (16). The team reviewed each study and filled the pre-designed table for the risk of bias appraisal and its related concerns in 4 domains: patient selection, index tests (RT-PCR and chest CT), reference standard, and flow and timing. The studies of acceptable quality that measured the application of RT-PCR and chest CT for the diagnosis were enrolled in the meta-analysis of test accuracy.

Diagnostic accuracy measures
The accuracy for the diagnosis of COVID-19 per patient has been measured in the studies through true positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts, from which the sensitivity, specificity, PLR, and NLR for RT-PCR and chest CT follow. The studies have reported the number of participants with positive/negative test results for SARS-CoV-2. Most then compared the test with the gold standard (usually RT-PCR), directly or indirectly. Some of the entries had subgroups defined, of which the severity subgroups were of interest and were extracted for further analysis.
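As an illustration of how the four counts lead to the measures named above, the following is a minimal sketch in Python. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study; the sketch assumes that no margin of the table is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing an index test with a reference standard."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    @property
    def sensitivity(self) -> float:
        # Proportion of reference-positive patients detected by the test
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of reference-negative patients correctly ruled out
        return self.tn / (self.tn + self.fp)

    @property
    def plr(self) -> float:
        # Positive likelihood ratio: sensitivity / (1 - specificity)
        return self.sensitivity / (1 - self.specificity)

    @property
    def nlr(self) -> float:
        # Negative likelihood ratio: (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity


# Hypothetical counts, for illustration only
example = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
print(example.sensitivity, example.specificity, example.plr, example.nlr)
```

A low NLR means that a negative result makes the disease much less likely, which is the property relevant to ruling out.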

Synthesis of results
The definition of COVID-19 in the included studies mainly comprised symptoms, and the confirmation was mostly by RT-PCR and then chest CT. To build a set of analyzable data, the variables expressing the same concepts in different studies needed to be harmonized and converted to unified values. The described values regarding chest CT and RT-PCR have been used to fill in or, if need be, calculate TP, FP, FN, TN, sensitivity, specificity, PLR, NLR, PPV, NPV, odds ratio, and their standard errors and 95% confidence intervals (CIs).

Meta-analysis
The software used for analyses was Stata/MP version 16.0 (StataCorp LLC, College Station, Texas, USA). AK designed and AK, HM, KS, and PP performed the meta-analyses. To be more informative, the data were summarized and pooled in several ways. We used the metandi command for pooling the classic studies, which have all the necessary information about TP, FP, FN, and TN values for the index tests (17). The second approach for summarizing the existing literature was collecting any index from any study and pooling it using the metaprop command for numerical variables (18). The third analysis was based on the metan command for the indices expressed as means and their standard errors (19).
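To make the idea of pooling estimates with their standard errors concrete, here is a minimal Python sketch of inverse-variance pooling with a DerSimonian-Laird random-effects adjustment. This is an assumption chosen for illustration: the text does not state which estimator or transformation the Stata commands applied, and the function name `pool_random_effects` is hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool study estimates (DerSimonian-Laird random-effects sketch)."""
    # Fixed-effect (inverse-variance) weights and pooled estimate
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q and the between-study variance tau^2
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum_re
    se_pooled = math.sqrt(1.0 / sum_re)

    # I-square: share of total variability attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "pooled": pooled,
        "ci": (pooled - 1.959964 * se_pooled, pooled + 1.959964 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical study estimates and standard errors, for illustration only
print(pool_random_effects([0.90, 0.95, 0.85], [0.03, 0.02, 0.05]))
```

When tau² is estimated as zero, the random-effects result coincides with the fixed-effect result.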

Other analysis
The command metabias was used for the appraisal of publication bias and metatrim to trim and fill the studies if needed. For the meta-regression assessment, metareg was executed; metainf was used to investigate the influence of each study on the meta-analysis by omitting the studies one-by-one and repeating the computations. Besides, to appraise the differences of chest CT characteristics based on the severity in two subgroups of non-severe and severe patients, metaprop was performed.
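The leave-one-out idea behind the influence analysis can be sketched as follows in Python. This is an illustrative simplification using fixed-effect inverse-variance pooling, not a reproduction of the Stata command; the function names and the input values are hypothetical.

```python
def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate using inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def leave_one_out(estimates, std_errors):
    """Re-pool the data once per study, each time omitting that study."""
    pooled_without = []
    for i in range(len(estimates)):
        rest_y = estimates[:i] + estimates[i + 1:]
        rest_se = std_errors[:i] + std_errors[i + 1:]
        pooled_without.append(inverse_variance_pool(rest_y, rest_se))
    return pooled_without


# Hypothetical sensitivities with equal standard errors
print(leave_one_out([0.9, 0.8, 0.7], [0.1, 0.1, 0.1]))
```

A study is considered influential when omitting it moves the pooled estimate outside the confidence interval of the full analysis.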

Study selection
The process of study screening is summarized in the flow diagram (Figure 1). The search found 2074 studies: (1) 428 from PubMed, (2) 359 from Scopus, (3) 757 from Embase, (4) the first 500 related results from Google Scholar, and (5) 28 entries from other sources with data on the novel coronavirus (WHO, SSRN, MedRxiv, and CDC). After removing the duplicates, 1395 studies entered the title/abstract screening for the eligibility criteria. Ninety-one were accepted for full-text screening, and 30 studies were then included in the qualitative analysis. The excluded entries were: two basic science and molecular assessments, two case reports, one epidemiologic study, two letters to the editor, one non-accessible full text, eight non-English, eight reviews, 33 not mentioning any quantitative measures of the index tests, and four for other reasons. Of the 30 selected studies, all addressed the diagnostic values of chest CT and two addressed RT-PCR.

Study characteristics
eTable 2 in the online supplement summarizes the study characteristics. Of the 30 included studies, all were conducted in Asia: 29 (96.7%) in China and 1 (3.3%) in South Korea. The set contained the following study designs: 13 (43.3%) retrospective cross-sectional (20-32), 7 (23.3%) retrospective cohort (10,11,33-37), 4 (13.3%) prospective cross-sectional (1,38-40), 3 (10.0%) prospective cohort (41-43), and 3 (10.0%) retrospective case series (9,44,45). No randomized controlled trial was eligible for inclusion. The reference standards in the studies were the following: metagenomic next-generation sequencing (mNGS) in 2 studies (6.7%), RT-PCR in 22 (73.3%), chest CT in 1 (3.3%), and clinical features in 3 (10.0%). Participants were mostly recruited from hospitals.

Figure 2 shows the QUADAS-2 risk of bias and concerns summary, and eFigure 1 in the online supplement presents the graph. The studies mostly had a high/unclear risk of bias in all four domains. In the course of patient selection, the studies mostly included the infected population and did not randomize or blind the inclusion process. Besides, non-optimal reference standards (most often RT-PCR, which is itself not an ideal standard) have been a source of bias. Consequently, FP and TP values are not completely reliable, and thus the accuracy values are of limited validity (most probably overestimated). A major source of bias in the flow and timing domain arises from the time interval between the reference standard and the index test, which has jeopardized the validity of the comparison between the reference and index tests because of disease progression and therapies during that time. Also, the studies were mostly implemented in clinical settings; therefore, the interpretation of the index test would have been exposed to bias, since the reference test results had probably been revealed to the clinician.
In conclusion, the overall risk of bias for most of the studies is high or unclear, which affects the diagnostic values.

Results of individual studies
Four studies reported adequate values to be included in the DTA analysis for chest CT. Ai et al (41)

Synthesis of results
Chest CT
Studies addressing sensitivity, specificity, NLR, PLR, NPV, PPV, accuracy, and OR of chest CT directly or indirectly were extracted and used for fixed- or random-effects meta-analysis, whichever fitted best. The results are shown in eTable 5 in the online supplement, and the forest plot for the sensitivity can be found in Figure 3.
The four studies (10,26,37,41) of interest for the DTA meta-analysis, which contained extractable values of TP, FP, FN, and TN, were pooled with metandi. The results are shown in the online supplement's eTable 6. The sensitivity was 96.6% (95% CI: 95.1 to 97.6), specificity 22.5% (95% CI: 1.4 to 34.5), and NLR 0.15 (95% CI: 0.1 to 0.3). Figure 4 presents the hierarchical summary receiver operating characteristic (HSROC) curve for the DTA analysis, which demonstrates that the study estimates of the aggregated sensitivity and specificity (alternatively, the accuracy) are heterogeneous.
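As a back-of-envelope check, the reported NLR is consistent with the pooled point estimates of sensitivity and specificity. The PLR shown alongside is derived here from the same two point estimates purely for illustration; it is not a value quoted in this passage, and the ratio of point estimates is only an approximation of what a bivariate model reports.

```latex
\[
\mathrm{NLR} = \frac{1 - \mathrm{sensitivity}}{\mathrm{specificity}}
             = \frac{1 - 0.966}{0.225} \approx 0.15,
\qquad
\mathrm{PLR} = \frac{\mathrm{sensitivity}}{1 - \mathrm{specificity}}
             = \frac{0.966}{1 - 0.225} \approx 1.25
\]
```

An NLR near 0.15 means a negative CT lowers the odds of disease to roughly one seventh of the pre-test odds, whereas a PLR near 1.25 means a positive CT changes the odds only slightly.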
The proportions meta-analysis of chest CT characteristics indicated that 97% (95% CI: 95 to 99) of COVID-19 patients had abnormalities on their chest CT. These abnormalities were bilateral in 82% (95% CI: 76 to 87) and involved more than two lobes in 74% (95% CI: 53 to 96). The most affected lobe of the lungs was the right lower lobe in 54% (95% CI: 2 to 100), and the most frequent pattern was mixed GGO, which was seen in 71% (95% CI: 36 to 92) of patients. Other CT characteristics are summarized in the online supplement's eTable 7. In additional analyses, after merging the patterns that are equivalent to GGO or consolidation, results demonstrated that GGO or consolidation was seen in 65% (95% CI: 49 to 80). Additionally, random-effects meta-analysis after merging similar characteristics in chest CT showed that peripheral distribution, linear opacities, and pleural effusion were seen in 35% (95% CI: 16 to 54), 49% (95% CI: 8 to 89), and 8% (95% CI: 5 to 11) of patients, respectively (Table 1). The CT scan characteristics did not differ between severe and non-severe patients except for the number of pleural effusion findings (eTable 8): 0.86 for the severe patients (95% CI: 0.46 to 0.97) and 0.02 for the non-severe group (95% CI: 0.00 to 0.12).

RT-PCR
Pooling the diagnostic values of two studies (23,41) for RT-PCR by metan, regardless of whether the test was serial or only an initial test, showed a sensitivity and specificity of 88.2% (95% CI: 78.1 to 98.2) and 100.0% (95% CI: 97.6 to 102.4), respectively (eTable 9). I-square indicated no heterogeneity between these two studies.
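The upper bound of 102.4% for specificity appears to reflect a symmetric normal-approximation interval, which is not constrained to the 0 to 100% range. As a hedged illustration of an interval that stays within bounds, the following Python sketch computes a Wilson score interval for a single proportion. The function name and the counts are hypothetical; this is not the method used in the analysis above and it does not pool several studies.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for one proportion (default z gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    # Clamp to guard against floating-point overshoot at the boundaries
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical example: 50 true negatives out of 50 reference-negative patients
low, high = wilson_interval(50, 50)
print(low, high)
```

With an observed proportion of 100%, the Wilson upper bound stays at 100% while the lower bound still reflects the sample size.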

Additional analyses
Chest CT
The assessment of publication bias for the studies addressing chest CT showed bias for the sensitivity and PPV (P = 0.002 and 0.044, respectively). No trimming was performed in either case because it produced no significant changes in the data. The metainf analysis showed no deviations from the 95% confidence ranges for the diagnostic values of chest CT. Meta-regression for the diagnostic values of CT showed that, of all independent variables, the mean age of the study samples (P = 0.098; tau² = 57.53) and the percentage of males (P = 0.14; tau² = 56.99) had significant effects on the heterogeneity of sensitivity between the studies. This effect vanished when mean age and percentage of males were combined (P = 0.33 and 0.40, respectively; tau² = 61.82). Thus, the mean age of the samples and the sex composition of the population have partially affected the heterogeneity. The results showed no significant effect of the sample size, reference standard, study design, or ethnicity, as independent variables, alone or in combination, on the diagnostic values.

RT-PCR
The assessment of publication bias, trim and fill, and meta-regression for the pooled data of RT-PCR were not applicable since the number of studies was insufficient. For initial RT-PCR, studies underwent metabias and showed significant publication bias for sensitivity (P < 0.001) and accuracy (P = 0.002). After metatrim, no trimming was performed, and the data were unchanged. The metainf command showed no deviations from the 95% confidence range.

Chest CT
The results of data analysis in this study indicate that the CT scan has high sensitivity but relatively low specificity, because of a low false-negative rate and a high false-positive rate. Therefore, many non-infected people are misdiagnosed, while on the other hand, not many of the patients with COVID-19 are missed. Hence, CT is an acceptable test for ruling out COVID-19, but not a robust tool for confirming the disease. As mentioned before, the false-negative rate was low in CT because the virus involves the lung in the early stages of the disease (20,35,36). Furthermore, the high false-positive rate in CT may be due to the similar behavior of SARS-CoV-2 to the other respiratory tract viruses (influenza, parainfluenza, and respiratory syncytial viruses) (10,26,37). GGO, pleural effusion, consolidations, tree-in-bud patterns, and nodules are common findings in all of them (47). The high DOR and sensitivity, and the relatively high NPV and accuracy, in combination with the low NLR for chest CT show that it has a high diagnostic value. However, we should consider that all diagnostic indices are to some extent affected by bias in the domains of reference standard, index test, and flow and timing. The analysis demonstrates that GGO is one of the most frequent patterns in COVID-19 patients. Still, it is very unspecific, with differential diagnoses such as cancer, inflammatory conditions, injuries, edema, hemorrhage, and pulmonary fibrosis (48). More importantly, GGO can be a widespread finding in H1N1 influenza patients (49). It has also been found in MERS-CoV and SARS-CoV patients and in all types of viral lower respiratory infections, more frequently in the infections caused by parainfluenza and respiratory syncytial viruses (47,50-52).
Mixed GGO was also evident in 71% of CT findings, and it can also be found frequently in SARS and malignancies (51,53,54). Regarding the symmetry and loci of the involvement, chest tomography findings were more frequently bilateral. The lower lobes of both the right and left sides were the most involved lobes, a trait similar to that of the parainfluenza virus but one that separates COVID-19 from influenza and respiratory syncytial infections (47). The distribution is mostly peripheral. Results of the subgroup analysis have shown a higher prevalence of pleural effusion in the severe than in the non-severe COVID-19 patients. However, pleural effusion in COVID-19 largely overlaps with its frequent differential diagnoses: congestive heart failure, parapneumonic effusion, malignancy, pulmonary embolization, and other viral diseases (55). Besides, bilateral pleural effusion is a common finding in patients with viral lower respiratory tract infection caused by MERS-CoV, parainfluenza virus, respiratory syncytial virus, and influenza virus (47,50). Thus, this finding may be of high value for severity monitoring and follow-up. To date, not many studies have addressed CT characteristics in COVID-19, and more subgroup analyses are needed to bring more findings similar to the pleural effusion to light, making a valuable asset of diagnostic evidence.

Diagnostic role of RT-PCR and chest CT
According to the RT-PCR test indices, RT-PCR seems a better tool than CT for confirming the disease, but because of the large number of patients it leaves undiagnosed, it is not suitable for ruling out the disease. Thus, RT-PCR is not a suitable tool for the primary screening of patients. The higher false-negative ratio in RT-PCR is due to sampling errors, sampling location, and low viral load of the sample (10). However, these results are not reliable because few studies entered the analysis and their results were completely different. Yang et al (37) showed that PCR sensitivity increased in serial RT-PCR due to an increase in viral load in the samples from the patients. Still, Ai et al (41) demonstrated that the sensitivity of second-time PCR is lower than that of the initial PCR. It seems the first study used a better method, and its results are more reliable. Therefore, second-time RT-PCR may be a suitable tool to follow up the patients whose test results are negative but who are clinically suspicious. Given all the aspects, further studies are needed to reach a conclusion on the application of second-time RT-PCR. On the other hand, there were no available data about the time interval between the symptoms' onset and initial RT-PCR, or between initial and second RT-PCR. The mean age of the samples has affected the heterogeneity among the studies, and thus it is essential to interpret and design diagnostic studies based on age groups, especially the elderly. Besides, as a source of heterogeneity of the studies, the quality assessment in all four domains is not promising; it has a massive effect on the diagnostic values of CT and RT-PCR and causes overestimation of the CT sensitivity. The time interval between RT-PCR and CT and the patient selection methods are the most significant concerns.
Hence, this study could not suggest a definite time for performing RT-PCR or an optimal interval between RT-PCR and chest CT, and further evaluation is needed to suggest the best time to perform the tests.

Limitations and strengths
The novel nature and the pandemic situation of COVID-19 urge us to rapidly conduct a diagnostic review to facilitate clinical approaches, although not enough time seems to have passed since the first cases to allow more in-depth evaluations and a larger pool of closed cases as a resource for the studies. This has limited us not only in the number and quality of the studies reviewed but also in fully comprehending the real behavior of the virus.
It is essential to mention that along with the quantitative data in the studies, we also used the descriptive data regarding RT-PCR and CT, compiling them quantitatively, to extract the data needed to calculate our index values. Another strength of our work is having different studies from different locations. This enabled us to compute PPV and NPV. Usually, these indices are not calculated in diagnostic test accuracy meta-analyses because their values vary between studies. However, we calculated these indices by using metan in addition to metandi, even though these two indices should be interpreted very cautiously because of the high impact of the prevalence in each region on them.
Notably, the prevalence of the disease has a high impact on PPV and NPV, and the aggregate data for RT-PCR did not yield PLR and NLR. Moreover, the prevalence could result in different PPV and NPV in different locations, because of the wide variety in the numbers of closed cases and the different stages of involvement of every location. Thus, time is needed to attain more closed cases and more detailed studies of RT-PCR and chest CT.
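The dependence of PPV and NPV on prevalence follows from Bayes' theorem and can be sketched in a few lines of Python. The function name and the two prevalence values are hypothetical and chosen only for illustration; the sensitivity and specificity are the pooled chest CT point estimates reported in the results.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the tested population in each cell of the 2x2 table
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Pooled chest CT estimates from the results; prevalences are hypothetical
for prevalence in (0.10, 0.50):
    ppv, npv = predictive_values(0.966, 0.225, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test characteristics, PPV rises and NPV falls as prevalence increases, which is why predictive values pooled across regions need cautious interpretation.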

Conclusion
The pandemic situation and high rates of disease transmission urge us to rapidly diagnose the patients and isolate them to stop further spreading. The results show that with its very good NLR, sensitivity, availability, and speed, the chest CT scan is an excellent test to rule out COVID-19 in the uninfected. It is also a very good follow-up and severity assessment tool. Our analysis confirms this through the high NPV of chest CT. Besides, noting that RT-PCR is a specific but not adequately rapid and available test, it could be used to confirm the suspicious cases after performing the initial chest CT scan. However, considering the high rates of false-negative results in the initial RT-PCR tests, in case of strong suspicion it is recommended to perform repeated RT-PCR tests, to obtain an absolute confirmation by letting the viral load reach higher levels. Performing repeated RT-PCR is crucial in these situations to avoid missing the infected people and to stop the further spread of the disease.