Decoding and Systematization of Medical Imaging Features of Multiple Human Malignancies

Purpose To summarize the data of previously reported medical imaging features on human malignancies to provide a scientific basis for more credible imaging feature selection for future studies. Materials and Methods A search was performed in PubMed from database inception through March 23, 2018, for studies clearly stating the decoding of medical imaging features for malignancy-related objectives and/or hypotheses. The Newcastle-Ottawa scale was used for quality assessment of the included studies. Unsupervised hierarchical clustering was performed on the manually extracted features from each included study to identify the application rules of medical imaging features across human malignancies. CT images of 1000 retrospective patients with non–small cell lung cancer were used to reveal a pattern for the value distribution of complex texture features. Results A total of 5026 imaging features of malignancies affecting 20 parts of the human body from 930 original articles were collated and assessed in this study. A meta-feature construct was proposed to facilitate the investigation of details of any high-dimensional complex imaging features of malignancy. A correlation atlas was constructed to clarify the general rules of applying medical imaging features to the analysis of human malignancy. Assessment of these data revealed a pattern of value distributions of the most commonly reported texture features across human malignancies. Furthermore, the significant expression of the gene mutational signature 1B across human cancer was highly consistent with the presence of the run length imaging feature across different human malignancy types. Conclusion The results of this study may facilitate more credible imaging feature selection in all oncology tasks across a wide spectrum of human malignancies and help to reduce bias and redundancies in future medical imaging studies.
Keywords: Computer Aided Diagnosis (CAD), Computer Applications-General (Informatics), Evidence Based Medicine, Informatics, Research Design, Statistics, Technology Assessment
Supplemental material is available for this article.
Published under a CC BY 4.0 license.

For our review, articles were eligible for inclusion if they explicitly proposed solutions for the prespecified clinical task(s) by using imaging feature-based techniques and reported the enrolled subject and image characteristics according to prespecified criteria. Studies were also included in which the medical imaging features were extracted in a quantitative or qualitative manner from regions of interest (ROIs) in radiologic images of malignancies, or from cancer cell images or high-resolution histopathology images. Studies applying medical imaging features to particular malignancy task(s) usually require enrollment of cases with particular corresponding image scan sequences. Thus, the radiology subspecialties or techniques used in the included studies also had to be declared.
Both print and online-only original articles registered in online archives were included in this study. Review articles were also included if they reported medical imaging features applied to specific malignancy tasks. Articles written in languages other than English were also included. Forms of publication that were excluded from the analysis were case reports, clinical perspectives, state-of-the-art articles, editorials, letters, quizzes, video-audio media, educational material, book reviews, commentaries, and news pieces.

Appendix E2. Examples of Naming Inconsistencies
The first challenge of medical imaging feature systematization was identifying the names used for features. Although definitions of imaging features have been provided by the IBSI, the naming of features in actual studies is inconsistent across researchers. For example, the pixel-based features "percentage" and "probability," the features "maximum value of histogram" and "most frequent voxel," and the descriptions "maximum probability" and "maximum co-occurrence matrix element" each carried the same meaning, although their wording varied across articles. Conversely, features with similar descriptions, such as "information measures of correlation 1" and "information measures of correlation 2," or "contrast" and "local contrast," referred to different concepts across articles. Furthermore, complicated imaging features, such as "long run low gray level emphasis" (LRLGLE), can easily be confused with other features, such as "long run high gray level emphasis" (LRHGLE) and "short run low gray level emphasis" (SRLGLE).

Appendix E3. Meta-Feature Construction Details
For instance, for simple features, such as shape, size, diameter, count, and color, the original feature description was retained as a meta-feature. Complicated descriptions, such as "mean and standard deviation of the normalized radial length," were decomposed into the following meta-features: "mean," "standard," "deviation," "norm," "radial," and "length." Fixed feature names, such as "angular second moment (ASM)," "angle co-occurrence matrices (ACM)," "apparent diffusion coefficient (ADC)," and "second diagonal moment (SDM)," were treated as independent meta-features. Based on these meta-features, it was then possible to clarify the details of utilization of each imaging feature in human cancer-related tasks.
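The decomposition described above can be illustrated with a minimal sketch. It is not the procedure used in this study, where features were extracted manually; the lookup tables `FIXED_NAMES`, `STOP_WORDS`, and `STEMS` and the function `to_meta_features` are hypothetical and only cover the examples quoted in this appendix.

```python
import re

# Hypothetical lookup tables, for illustration only.
# Fixed names are kept whole as a single meta-feature.
FIXED_NAMES = {
    "angular second moment",
    "angle co-occurrence matrices",
    "apparent diffusion coefficient",
    "second diagonal moment",
}
# Connecting words that carry no feature meaning.
STOP_WORDS = {"and", "of", "the"}
# Assumed reduction of inflected forms to the meta-feature used in the text.
STEMS = {"normalized": "norm"}


def to_meta_features(description):
    """Split a feature description into a list of meta-features."""
    text = description.lower().strip()
    if text in FIXED_NAMES:
        return [text]
    tokens = re.findall(r"[a-z0-9]+", text)
    return [STEMS.get(t, t) for t in tokens if t not in STOP_WORDS]


example = to_meta_features(
    "mean and standard deviation of the normalized radial length"
)
```

Simple descriptions such as "shape" pass through unchanged as a one-element list, whereas fixed names bypass tokenization entirely so that they are never split.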

Appendix E4. Visualization of the Network Topology Diagram
An online computing platform for visualizing the network topology diagram of these correlations is available at http://www.ciitool.com/#/mifa. In the visualized network, both features and cancers are shown as nodes (see Figure 3 in the manuscript). When a feature or cancer node is clicked, the nodes with which it is correlated are highlighted in the display to facilitate understanding. For a specific node of interest, scrolling with the mouse enables the viewer to read more details. In addition, we separately present the details describing the applications of meta-features across all cancer types, sorted by radiologic image (CT, mammography, MRI, PET, and ultrasound) and histopathology image, in Tables E11-E14. An open access database describing the correlations among human cancer types, oncology tasks, and corresponding medical imaging features is available at http://dx.doi.org/10.17632/gv5j2gk467.10.

Appendix E5. Recombining Meta-Features to Obtain Statistics of Original Imaging Features
The names of the original features in this study were manually extracted from the existing literature and, to maintain accuracy, were not modified. Because the meta-features were extracted from complex feature representations, most of them are single terms. It may therefore be difficult for readers to determine the relationship between individual meta-features and human cancers. However, if the reader is interested in a certain imaging feature that has a complicated description, the details of utilization of that feature in human cancers can be restored from the statistics and the meta-features. By using the statistical results provided in this study, the reader can review the specific details of any imaging feature of interest in cancer. Then, by using the records of the original feature in Table E2, the reader can obtain the PubMed IDs of the studies that reported the original imaging feature, the names of other features reported in the same article, and the corresponding clinical tasks and imaging modalities. For instance, for the complicated feature "short run emphasis," which is a high-dimensional texture feature from the run-length matrix, recombining the meta-features "short," "run," and "emphasis" showed that this feature was mainly used for the clinical task of characterization of lung cancer by using CT images (Table E11). The corresponding PubMed IDs, the clinical tasks, the other imaging features reported in the same article, and the imaging modalities for "short run emphasis" could then be traced by searching for the meta-features or original features in Table E2.
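The recombination step can be sketched as a simple lookup. The records below are invented toy data and do not reproduce Table E2; the field names, the placeholder identifiers "PMID-A" to "PMID-C," and the three helper functions are assumptions made for illustration.

```python
from collections import Counter

# Toy records, invented for illustration. "PMID-A" etc. are placeholders,
# not real PubMed IDs, and the rows are not taken from Table E2.
RECORDS = [
    {"pmid": "PMID-A", "feature": "short run emphasis", "cancer": "lung",
     "task": "characterization", "modality": "CT"},
    {"pmid": "PMID-A", "feature": "entropy", "cancer": "lung",
     "task": "characterization", "modality": "CT"},
    {"pmid": "PMID-B", "feature": "short run emphasis", "cancer": "lung",
     "task": "characterization", "modality": "CT"},
    {"pmid": "PMID-B", "feature": "long run emphasis", "cancer": "lung",
     "task": "characterization", "modality": "CT"},
    {"pmid": "PMID-C", "feature": "long run emphasis", "cancer": "breast",
     "task": "characterization", "modality": "MRI"},
]


def find_by_meta_features(records, meta_features):
    """Return records whose feature name contains every queried meta-feature."""
    wanted = {m.lower() for m in meta_features}
    return [r for r in records if wanted <= set(r["feature"].lower().split())]


def summarize_usage(matches):
    """Count how often each (cancer, task, modality) combination occurs."""
    return Counter((r["cancer"], r["task"], r["modality"]) for r in matches)


def co_reported_features(records, matches):
    """List other features reported in the same articles as the matches."""
    pmids = {r["pmid"] for r in matches}
    matched = {r["feature"] for r in matches}
    return sorted({r["feature"] for r in records if r["pmid"] in pmids} - matched)


matches = find_by_meta_features(RECORDS, ["short", "run", "emphasis"])
usage = summarize_usage(matches)
others = co_reported_features(RECORDS, matches)
```

Requiring all queried meta-features to be present keeps look-alike names apart: "long run emphasis" shares two of the three terms but is not returned for the "short run emphasis" query.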

Appendix E6. The Significant Meta-Features in Imaging Modality
Mean, entropy, intensity, gray level, and shape were the most frequently reported meta-features in studies using CT scans, which concentrated significantly on characterization and monitoring tasks in lung cancer. In MRI, enhancement, contrast, intensity, entropy, variance, and correlation were the top features, which were mostly used for the characterization task in breast and prostate cancer. Nucleus, area, density, mean, variance, and shape were the most commonly used meta-features of histopathology images, which were mainly used for characterization and detection tasks in breast and vaginal cancer.

Appendix E7. Supplementary Materials of CT Image
CT Imaging for Visualization of the Significant Texture Features
Pretherapy contrast-enhanced CT images of 1000 patients with early-stage NSCLC were acquired in the Department of Radiology at our hospital. Contrast-enhanced CT was performed in every patient by using one of two multidetector row CT (MDCT) systems (GE Lightspeed Ultra 8, GE Healthcare, Hino, Japan, or 64-slice LightSpeed VCT, GE Medical Systems, Milwaukee, Wis), with the following acquisition parameters: 120 kV; 160 mAs; 0.5- or 0.4-second rotation time; detector collimation, 8 × 2.5 mm or 64 × 0.625 mm; field of view, 350 × 350 mm; matrix, 512 × 512. After routine nonenhanced CT, contrast-enhanced CT was performed after a 25-second delay following intravenous administration of 85 mL of iodinated contrast material (Ultravist 370, Bayer Schering Pharma, Berlin, Germany) at a rate of 2.5-3.0 mL/s with a pump injector (Ulrich CT Plus 150, Ulrich Medical, Ulm, Germany). CT images were reconstructed with a standard kernel at an interval of 1-2.5 mm. Retrieval of CT images: All CT images were retrieved from the picture archiving and communication system (PACS) (Carestream, Canada).

Tumor Boundary Extraction and Feature Analysis
Tumor delineation.-The complete region of interest (ROI) of each lung tumor was manually delineated for texture feature analysis. Two radiologists, each with more than 10 years of experience in chest CT interpretation, were chiefly responsible for the manual tumor segmentation. We randomly selected 20 cases; for the features extracted from the two radiologists' segmentations, the intraclass correlation coefficient (ICC) ranged from 0.810 to 0.936. The average stability of the extracted features was 92.50%.
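An agreement check of this kind can be sketched as follows. The ICC form used in this study is not stated, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the function name `icc_2_1` and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per case, one column per reader, holding the value
    of a single feature computed from each reader's segmentation.
    The choice of ICC(2,1) is an assumption, not taken from the study.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: cases, readers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical values of one texture feature for five cases,
# as obtained from the segmentations of reader 1 and reader 2.
feature_values = [
    [4.10, 4.05],
    [3.20, 3.35],
    [5.00, 4.90],
    [2.75, 2.80],
    [3.90, 4.00],
]
icc = icc_2_1(feature_values)
```

Repeating the calculation for each texture feature would give one ICC per feature, from which a range such as the one reported above could be read off.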

Feature analysis.-The texture features of the middle slice of each patient's images were first calculated in Matlab (version 2015b), and the texture feature values of the different patients were then displayed by using R (version 3.4.3).
The details of the feature analysis are given below. All source code was published at the following DOI: http://dx.doi.org/10.17632/gv5j2gk467.9.
A. Matlab (version 2015b) was used to calculate the six texture features for all images. First, 500 random images with dimensions of 32 × 32 were generated by a computer program. Then, the entropy of the random images and of the NSCLC images was calculated by using the "entropyrandom" and "entropy" functions, respectively. The core algorithm for the entropy calculation was the "entropy" function in Matlab. Next, the "graycomatrix" function was used to obtain the texture features of standard deviation, correlation, contrast, energy, and homogeneity in four directions. In this study, the gray-level values of the random images and of the lung cancer ROIs were normalized uniformly. After all the texture feature values were obtained, the "ggplot2" package in R (version 3.4.3) was used to display the feature values of the different images. The corresponding algorithms are part of the source code published at the DOI above.
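For readers without access to Matlab, the calculation can be approximated with the following minimal Python sketch. It is not the published source code. It assumes min-max normalization to eight gray levels, a pixel distance of 1, and nonsymmetric co-occurrence counts, and it uses the standard definitions of histogram entropy and of the contrast, correlation, energy, and homogeneity properties of a gray-level co-occurrence matrix. Standard deviation is omitted because the text does not state how it was derived. All function names are hypothetical.

```python
import math
import random

# (row, column) offsets for 0, 45, 90 and 135 degrees at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def quantize(image, levels=8):
    """Min-max normalize, then map to integer gray levels 0..levels-1."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0] * len(row) for row in image]
    return [[min(int((v - lo) / (hi - lo) * levels), levels - 1) for v in row]
            for row in image]


def shannon_entropy(q, levels=8):
    """Entropy (base 2) of the gray-level histogram."""
    counts = [0] * levels
    for row in q:
        for v in row:
            counts[v] += 1
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def glcm(q, offset, levels=8):
    """Normalized, nonsymmetric gray-level co-occurrence matrix."""
    dr, dc = offset
    rows, cols = len(q), len(q[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[q[r][c]][q[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_props(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row) if v]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": cov / (sd_i * sd_j) if sd_i and sd_j else float("nan"),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


def texture_features(image, levels=8):
    """Entropy plus GLCM properties for each of the four directions."""
    q = quantize(image, levels)
    features = {"entropy": shannon_entropy(q, levels)}
    for angle, offset in OFFSETS.items():
        features[angle] = glcm_props(glcm(q, offset, levels))
    return features


# One random 32 x 32 image with a fixed seed (the study generated 500).
rng = random.Random(0)
random_image = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
features = texture_features(random_image)
```

Applying `texture_features` to both the random images and the tumor ROIs, with the same number of gray levels, would correspond to the uniform normalization mentioned above. The resulting values could then be exported for plotting.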