# Users Guide for ROC-kit


**ROC-kit**

GUI version 1.0.1

ROC libraries 1.0.3

for

Microsoft Windows XP, Vista and 7 (32- and 64-bit)

Apple Macintosh OS X 10.5+

Linux operating systems (32- and 64-bit)

(February 2011)

Lorenzo L. Pesce, Ph.D.

John Papaioannu, M.Sc.

Charles E. Metz, Ph.D.

Department of Radiology, MC2026

The University of Chicago

5841 South Maryland Avenue

Chicago, IL 60637

USA

e-mail:

## System Requirements & Notes

All versions of ROC-kit (“roc.jar”) after version 1.0.3 have been tested on the 32-bit versions of Microsoft Windows 7™ and Vista™, 64-bit versions of Apple Macintosh OS X 10.5+, and a number of 32-bit Linux distributions (because Linux is a more open system, it tends also to be less predictable; in general we expect ROC-kit to work on systems with gcc 4.1+ installed that meet the conditions detailed under “System Requirements”). The software has not been tested on Windows 95, 98 and NT, which are not supported. Some versions might have problems with Windows 2000™ and XP™; we will do our best to overcome them, but support for these systems will also be dropped in the medium term. Linux versions will move to 64-bit (discontinuing 32-bit) starting around version 1.0.4. Versions for other operating systems may be released in the future (users are welcome to log such requests).

In this document we use the expression “ROC-kit” to refer to the version that includes this user’s guide. Occasionally we specify which version of the software this guide actually refers to. Since it is difficult to keep user’s guides up to date with software development, we encourage users to report bugs, typos and inconsistencies to us in the interest of the users’ community.

## Disclaimer & licensing agreement

The use of ROC-kit is constrained by the licensing agreement that users are required to acknowledge before downloading the software. Using the software assumes acknowledgement of that agreement. We ask that you do not distribute this software to others. However, any copying or distribution of this software implies that the person distributing it assumes full responsibility for informing the recipients of the licensing agreement with The University of Chicago. Modification of the software should be indicated in any modified version of this software. Failure to acknowledge the source of this software is a breach of copyright law.

By using this software to analyze data, you agree that any publications based on your analysis will cite the appropriate ROC-kit references. Please also cite Metz’s ROC Software website (http://radiology.uchicago.edu/?q=MetzROCsoftware or http://xray.bsd.uchicago.edu/krl/roc_soft.htm; the latter will be discontinued in early 2011), **but do not cite this user’s guide**, which has not been peer-reviewed.

Although this software has been carefully tested, neither The University of Chicago nor any of the individuals (present or past) who participated in the development or testing of this software and user’s guide is responsible for any errors or for any damages that may result from use of the software.

Some of the images used in this guide might be from versions older than the one you are using. To the best of our knowledge these differences should not affect the use of the specific features being described. If you discover one or more situations where this might not be the case, please let us know and we will do our best to correct the guide as soon as possible.

*Inquiries or comments concerning this program should be directed to the addresses on the cover page.*

## System Requirements

**General requirement**:

- JRE (Java Runtime Environment) version 5.0 or later

(http://java.sun.com/javase/downloads)

**For Apple Macintosh OS X operating systems**:

- Intel processor
- OS X 10.5+ (Tiger and previous versions are not supported)

**For Linux operating systems**:

- A PC equipped with a Pentium or later processor.
- Requirements on Linux systems are not completely clear; users are welcome to report in detail any problems they encounter on their operating system, and we will do our best to solve any installation and operation problems. The software is compiled to work on both 32-bit and 64-bit versions of Linux.

**For Windows operating systems**:

- A PC equipped with a Pentium or later processor.
- Windows Vista or 7, 32-bit.
- Windows 2000 or XP, 32-bit (might have problems; testing and support will be discontinued).

## Where to find this software

ROC-kit is available from the following web site (directly or through requests):

http://radiology.uchicago.edu/?q=MetzROCsoftware.

Follow the link at the “software” tab on the right of the page or contact us if the download page cannot be found.

**Note 1**: The web site is likely to change considerably in the near future, but we plan to keep this URL active.

**Note 2**: In case of downloading problems, use the contact information provided on the cover page of this user's guide.

## How to install (and uninstall) this software

- Connect to the web site mentioned in the previous section.
- Follow the online instructions to download *roc.jar*, the executable ROC Java Archive (JAR) file, to your computer. When downloading the file, it is usually best to select the “Save” option rather than “Open”.
- Verify that your system satisfies the requirements described in “System Requirements,” especially concerning the Java Runtime Environment.

**TO START:** When using Windows or OS X, double click on *roc.jar*. When using Linux, type “*java -jar roc.jar*” on the command line.

*Note: more details on how to start and run the program are provided later in this guide.*

**Note**: We can also provide dynamically linked libraries, loadable objects and shared libraries (DLLs and .so’s) that can be called by external programs such as R, IDL and SAS and used independently of this GUI. Additional information on these can be obtained by contacting us directly or from the web site.

**Disclaimer**: We cannot assume any responsibility for problems created during the installation or un-installation process (including all the operations necessary to locate, download, copy or open the software or this guide).

## Request for feedback

ROC-kit is currently a Beta version. Although we have tested this software extensively and fixed all of the bugs discovered, there is no guarantee that we have found every bug. If you discover any possible errors or anomalies in the software, please contact us immediately so that we can fix them. Our address appears on the cover page of this guide.

## An Overview of ROC-kit

ROC-kit is designed for use when case variation is relevant to calculate the statistical significance of a classification-performance difference between different treatments (diagnostic tests, or modalities). It will also perform an analysis with a single modality, if the user requires it. One should note that the user must determine which ROC model and statistical variation model is appropriate for her or his experimental design—not all options are appropriate for all experimental designs. ROC-kit performs conventional ROC analysis ^{1-3} on both continuously-distributed and ordinal-category data (e.g., “confidence ratings”) that represent diagnostic test results. There are no predetermined limits on the number of treatments / modalities / conditions (fixed factor: effect of the different modality on the expected value of the performance index), or cases (random factor: effect of cases on the variability of the performance index). More models and analysis options are added regularly.

## Purposes of ROC-kit

ROC-kit is designed to fit ROC curves, estimate ROC index values and determine the statistical significance of differences between ROC index values when the performance of a diagnostic device is affected by the cases analyzed (e.g., patient, specimen or sample).

The required data are **unpaired, partially-paired or fully-paired test-results** from actually-negative (e.g., “normal,” “healthy” or “noise”) cases and actually-positive (e.g., “abnormal,” “diseased” or “signal”) cases. This means that it is not necessary for every case to be observed using every modality — i.e., a fully-crossed, balanced design is not required.

The analysis can be done using non-parametric or semi-parametric ROC models, and statistical variation can be modeled using maximum-likelihood (ML), resampling or moment based approaches. (Not all combinations of data structures and ROC models are possible.)

**Note**: ROC-kit is in a continuous state of development, so this user’s guide may provide an incomplete picture of the software's potential and problems.

### Accepted types of input data for testing differences


**Three types of input data** are allowed when testing the statistical significance of differences between ROC curve estimates:

(1) Unpaired (uncorrelated) test results for two or more test conditions. The “conditions” are applied to independent case samples — for example, different diagnostic tests are performed on different patients, or two different radiologists (considered here as fixed factors and not random factors) make probability judgments concerning the presence of a specified disease in different images, etc. *These datasets can be input as LABROC, ROCKIT or LABMRMC files*.

(2) Partially-paired or fully-paired (correlated) test results from a single test-reader in each of two or more conditions, in which data from all of the different conditions are available for some of the cases in a single case sample, but some additional patients may have received only one of the diagnostic tests. *These datasets can be input as ROCKIT or (only if fully-paired) LABMRMC files*.

(3) Fully-paired multi-reader-multi-case (MRMC) test results — for example, every diagnostic test is performed on each patient in the sample and the outcome of every diagnostic test is interpreted by every test reader. *These datasets can be input only as LABMRMC files*.

### Limitations and known unresolved bugs of the current release

- MRMC data cannot be analyzed as such, but only as multi-modality subsets; therefore, only parts of LABMRMC datasets can be interpreted, because the MRMC analysis modules have not been released.
- Statistical comparisons for partially-paired or fully-paired multi-modality data can be done only using non-parametric methods and therefore are based only on the AUC as computed using the Wilcoxon (Mann-Whitney) U-statistic, which is equivalent to the trapezoidal area.^{2}
- Single ROC curve data can be analyzed using both non-parametric and semi-parametric models, but parametric models have not been implemented yet (and might never be, because they are rarely useful).
- The CvBM model is released as an alpha version. Feedback is particularly welcome for this model.
- Error bars (95% confidence intervals) on ROC curves are occasionally displayed in confusing ways. We have not managed to solve this problem completely yet.
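To make the Wilcoxon/trapezoidal equivalence mentioned above concrete, here is a minimal sketch (illustrative Python, not part of ROC-kit): the non-parametric AUC is the fraction of (actually-negative, actually-positive) case pairs that are ranked correctly, with ties counting one half, which equals the trapezoidal area under the empirical ROC curve.

```python
def wilcoxon_auc(neg, pos):
    """AUC as the Wilcoxon (Mann-Whitney U) statistic: the fraction of
    (negative, positive) test-result pairs ranked correctly, with ties
    counted as 1/2. Equals the trapezoidal area under the empirical ROC."""
    score = sum(1.0 if p > n else 0.5 if p == n else 0.0
                for n in neg for p in pos)
    return score / (len(neg) * len(pos))

# Hypothetical test-result values, for illustration only.
neg = [1, 2, 2, 3, 5]   # actually-negative cases
pos = [2, 4, 5, 6, 7]   # actually-positive cases
print(wilcoxon_auc(neg, pos))  # 0.82
```

Here 20.5 of the 25 case pairs are ranked correctly (two ties contribute 0.5 each), giving an AUC of 0.82.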


**Note**: A more detailed description of the meaning of these terms is provided in the following pages, in the “Glossary of ROC Terms” section and in the references therein.

## Warnings and notes about ROC-kit

**Note**: The input data must **not** include more than one test result from each case for each condition, unless there is strong evidence that these test results can be considered independent. If multiple test results from a single case-condition combination are pooled in the input, the program is very likely to overestimate the statistical significance of any apparent difference between the ROC curves, thereby invalidating the statistical test. When multiple test-result pairs are available from each case (for example, from replication in each condition), the datasets should be run separately (for example, for each replication) or our LABMRMC/DBM-MRMC (in collaboration with the University of Iowa) software or similar approaches should be used instead.


### Statistical models

The kinds of data that ROC-kit can analyze were discussed in the previous section. In statistical terminology, the software allows analyses that include^{1, 3}:

- One random factor (case variability, single-curve ROC fitting, either semi-parametric or non-parametric);
- One random factor (case variability, single-curve ROC fitting, either semi-parametric or non-parametric) and one fixed factor (multi-modality comparisons); or
- One random factor (case variability, single-curve ROC fitting, either semi-parametric or non-parametric), one fixed factor (multi-modality comparisons) and another random factor (reader or rater variability) **[not released yet]**.

In the past these analyses were provided by University of Chicago software such as PROPROC/LABROC4/RSCORE, ROCKIT/CORROC/INDROC and LABMRMC/DBM-MRMC, respectively.

Different procedures estimate variability using one or more of the following methods:

- The inverse of the Hessian matrix, utilizing the Cramer-Rao bound, as usually done in maximum-likelihood estimation (MLE) and as implemented in CORROC^{4}, LABROC4^{5} and PROPROC^{6}. MLE is based on the likelihood function, i.e., the probability of the data given a specific model for the population, considered as a function of the model parameters. It has been shown that, asymptotically, the estimates are normal and their variance is related to the inverse of the second derivatives of the log-likelihood function.^{7}
- A moment-based estimate, using the properties of U-statistics, as is done by the Gallas^{8} or Zhou and Gatsonis^{9} methods. For a description of these methods, see Gallas^{8} and the references therein.
- A resampling method, such as the jackknife (as in DBM-MRMC^{10}) or the bootstrap (as in Gruszauskas^{11}). The bootstrap uses the sample as an estimate of the population distribution, from which samples are drawn with replacement and then employed to estimate properties of the population itself.^{12} The jackknife is based on similar principles, but it is usually implemented by creating a set of sub-samples, each leaving out part of the data (the most commonly used form, also called round-robin, leaves just one case out). These leave-one-out estimates are very useful in ROC analysis, in part because they allow the “creation” of "pseudovalues" of any summary index that can be associated with individual cases, even for global indices such as the AUC^{10}.
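The round-robin jackknife just described can be sketched as follows (a hypothetical Python example, not ROC-kit code): each case in turn is left out, and a pseudovalue of the AUC is associated with it; the variance of the pseudovalues, divided by the number of cases, then estimates the variance of the AUC.

```python
def auc(neg, pos):
    """Non-parametric (trapezoidal/Wilcoxon) AUC."""
    return sum(1.0 if p > n else 0.5 if p == n else 0.0
               for n in neg for p in pos) / (len(neg) * len(pos))

def jackknife_pseudovalues(neg, pos):
    """Round-robin pseudovalues: p_i = N*AUC(all) - (N-1)*AUC(all but case i)."""
    cases = [("neg", i) for i in range(len(neg))] + \
            [("pos", i) for i in range(len(pos))]
    n = len(cases)
    full = auc(neg, pos)
    pseudo = []
    for kind, i in cases:
        # leave case i out of whichever truth class it belongs to
        neg_i = neg[:i] + neg[i + 1:] if kind == "neg" else neg
        pos_i = pos[:i] + pos[i + 1:] if kind == "pos" else pos
        pseudo.append(n * full - (n - 1) * auc(neg_i, pos_i))
    return pseudo

# Hypothetical data for illustration.
pv = jackknife_pseudovalues([1.2, 0.8, 2.1, 1.5], [1.9, 2.4, 3.0, 2.2])
mean = sum(pv) / len(pv)
auc_var = sum((x - mean) ** 2 for x in pv) / (len(pv) - 1) / len(pv)
```

This per-case association is what makes pseudovalues useful even for global indices such as the AUC.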

### ROC curve models and indices

ROC-kit computes ROC indices under the following models for the ROC data:

“Proper” models (to be defined below):

- Proper binormal model (referred to hereafter as PROPROC and based on our previous PROPROC executable).

“Conventional” models:

- Conventional binormal model (referred to hereafter as CvBM and based loosely on our previous LABROC4 algorithm).

Non-parametric (empirical) method:

- Trapezoidal/Wilcoxon, non-parametric estimation of ROC indices.

The trapezoidal/Wilcoxon model does not make any distributional assumption with regard to the positive and negative cases — this is why it is non-parametric. Details about this model can be found in DeLong et al. ^{13}. The other two models are semi-parametric, in the sense that monotonic transformations of the test result values (the quantity measured in the experiment) are assumed to be related to normal distributions, one for the negative and one for the positive cases, but the decisions about the state of cases are not made in the same way. PROPROC assumes that decisions are made using likelihood ratios, i.e., the decisions about an unknown case are based upon the relative likelihood of being positive for that specific value (as an ideal observer would do) ^{14}. CvBM assumes that the true ROC curve for each reader-treatment combination plots as a straight line on "normal-deviate" axes, or equivalently, that the input data follow normal distributions for both the actually-negative and actually-positive fractions of the population after some unknown monotonic transformation. ROC curves measured in a broad variety of fields have shapes compatible with these assumptions. The assumptions are often acceptable even when the raw data have multimodal and/or skewed distributions, because they are equivalent to the assumption that some variable functionally related to the one measured in the experiment (the test result value) is normal for both conditional distributions (actually negative and actually positive). The proper models have the great advantage of not producing curves that have changes in convexity, which is considered incompatible with medical evidence ^{15}.
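To make the binormal assumption concrete, the following sketch (illustrative Python, not ROC-kit code) traces a conventional binormal ROC curve. On normal-deviate axes the curve is a straight line with intercept a and slope b, where a measures the separation of the two underlying normal distributions and b the ratio of their standard deviations; the corresponding AUC has the closed form Phi(a / sqrt(1 + b^2)).

```python
from statistics import NormalDist

def binormal_tpf(a, b, fpf):
    """Conventional binormal ROC: TPF = Phi(a + b * Phi^-1(FPF)),
    a straight line with intercept a and slope b on normal-deviate axes."""
    nd = NormalDist()
    return nd.cdf(a + b * nd.inv_cdf(fpf))

# Illustrative parameter values (not fitted to any real dataset).
a, b = 1.5, 0.8
curve = [(fpf, binormal_tpf(a, b, fpf)) for fpf in (0.05, 0.10, 0.25, 0.50)]
auc_binormal = NormalDist().cdf(a / (1.0 + b * b) ** 0.5)  # closed-form AUC
```

With a = 0 and b = 1 the curve reduces to the chance diagonal (TPF = FPF), and the AUC to 0.5.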

A number of indices are available for the analysis:

- Area under the curve (AUC)
- Partial area under the curve between two values of FPF or to the right of the curve between two values of TPF (to be input by the user)
- Sensitivity for a user-input value of specificity
- Specificity for a user-input value of sensitivity

For a description of the meaning of these different indices, users are referred to Wagner et al.^{1}, Pepe^{2} or Zhou et al.^{3}.
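As one illustration of these indices, the sketch below (hypothetical Python, not ROC-kit code) builds the empirical ROC operating points from raw test results and computes the partial area under the curve between two user-chosen FPF limits with the trapezoidal rule; with limits 0 and 1 it reduces to the full AUC.

```python
def empirical_roc(neg, pos):
    """Empirical (FPF, TPF) operating points, sweeping the threshold downward."""
    points = [(0.0, 0.0)]
    for t in sorted(set(neg + pos), reverse=True):
        fpf = sum(n >= t for n in neg) / len(neg)
        tpf = sum(p >= t for p in pos) / len(pos)
        points.append((fpf, tpf))
    if points[-1] != (1.0, 1.0):
        points.append((1.0, 1.0))
    return points

def partial_auc(points, fpf_lo, fpf_hi):
    """Trapezoidal area under the empirical curve between two FPF limits."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lo, hi = max(x0, fpf_lo), min(x1, fpf_hi)
        if hi <= lo or x1 == x0:
            continue  # segment is vertical or lies outside the FPF window
        y_lo = y0 + (y1 - y0) * (lo - x0) / (x1 - x0)
        y_hi = y0 + (y1 - y0) * (hi - x0) / (x1 - x0)
        area += 0.5 * (y_lo + y_hi) * (hi - lo)
    return area

# Illustrative data: partial area over FPF in [0.0, 0.5].
pts = empirical_roc([0, 1, 2, 3], [2, 3, 4, 5])
pauc = partial_auc(pts, 0.0, 0.5)
```

Sensitivity at a chosen specificity can be read off the same operating points by linear interpolation at FPF = 1 - specificity.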

Not all indices are available for all models in different versions of the software or in different types of analysis. Please refer to the software you have or to the web site for the most recent version. In general, recent versions have more features.

**Note 1**: Any of the available indices can be employed for the analysis. If another index is desired, the analysis might need to be repeated. One should be aware that repeating the analysis with multiple indices in order to obtain the greatest (or least) statistical significance violates the assumptions of the statistical analysis and invalidates results of the statistical tests unless corrections are made, as described in the note immediately below.

**Note 2**: If more than a single comparison is attempted, a correction for multiple comparisons such as Bonferroni or Holm-Sidak^{16} should be performed whenever a conclusion is to be drawn concerning the best comparison and/or the comparisons overall.
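A minimal sketch of the Holm step-down adjustment mentioned in Note 2 (illustrative Python, not part of ROC-kit); a plain Bonferroni correction would instead multiply every p-value by the total number of comparisons.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order.
    The smallest p-value is multiplied by m, the next smallest by m-1,
    and so on, enforcing monotonicity and capping at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Three hypothetical pairwise comparisons:
print(holm_adjust([0.01, 0.04, 0.03]))  # [0.03, 0.06, 0.06] (up to rounding)
```

A comparison is declared significant only if its adjusted p-value falls below the chosen overall significance level.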

## Acknowledgments

This guide was compiled by Lorenzo Pesce and is based on previous work done at The University of Chicago and The University of Iowa.

The algorithms employed in this software were described by the authors of the papers referenced for each method. The program itself was written and tested by Lorenzo L. Pesce and John Papaioannou. The code is an almost completely rewritten version of parts originally written by Benjamin A. Herman and Hatem Abu-Dagga, with some code taken from earlier programs written by Helen Kronman, Pu-Lan Wang, Jong-Her Shen and Xiaochuan Pan.

Development of this software was supported primarily by the National Institute on Alcohol Abuse and Alcoholism under Contract HHSN267200700039C (Charles E. Metz, P.I.) and in part by the National Institute of Biomedical Imaging and Bioengineering under Grant R01 EB000863 (Kevin S. Berbaum, P.I.).

By using this software to analyze data, you agree that any publications based on your analysis will cite the appropriate references (listed in the user’s guide where the various features are described) and the Metz ROC software website (http://radiology.uchicago.edu/?q=MetzROCsoftware).

# To Run ROC-kit


## To start the program

- Windows or OS X: double click on **roc.jar**.

- Linux: type “**java -jar roc.jar**” on the command line.

The program main window (see Figure 1) consists of a vertical menu bar and a program page.

The main program page displays, immediately under the “ROC” title, the version of the Java engine for the GUI and the version of the ROC library for the estimation routines. Make sure to take note of these for replication, publication and bug-reporting purposes.

**Note 1**: Throughout program execution, only the tabs or options shown in full color with black text are available; the others cannot be selected. The available options can change during different phases of execution, serving as a guide to which options are appropriate during the current phase. The available options can also differ between versions of the program as more options become available.

**Note 2**: ROC-kit’s interface will look slightly different on different operating systems and versions thereof. The figures in this guide are screenshots from the Apple Macintosh Aqua interface running under OS X 10.4.11.

**Figure 1.** Program main window (as shown on OS X 10.4.11)

### Options available at startup

Immediately after startup, only the “input,” “about” and “quit” icons are active in the menu bar on the left side of ROC-kit's window.

The “about” icon returns the user to the Welcome page, so it does nothing at startup.

The “input” icon leads to a “dataset selection” window that will look like the standard file-selection window of the operating system being used (Figure 2).

The “quit” icon closes ROC-kit.

## Execution of an analysis

The following steps describe ROC-kit's features. Various alternatives are possible, and not all of their combinations are described explicitly here.

**Note:** Older implementations of some of ROC-kit's algorithms allowed manual input of data. This is no longer possible, so an input file must be created first. The manual-input option was dropped because it was rarely used and highly vulnerable to typographical errors.

### Opening and reading an input file.

Click the “input” icon and then use the resulting window to select the desired input file. If a test run of the program is desired, an input file can be requested using the contact information on the first page of this user’s guide.

**Figure 2**. Dataset selection window (OS X 10.4.11)

After a file has been selected, a pop up window will ask what kind of input file it is. As stated above, three types of file input are acceptable: LABROC, ROCKIT and LABMRMC.


**Figure 3**. File Format Specifier window instructing the program to read the input file as if it were a LABROC type of file (OS X 10.4.11)

A description of the different input file types is provided later in this guide. If the format of the input file contains one or more errors, a descriptive error message is generated to help determine the nature of the problem, as in the example shown in Figure 4.

**Figure 4**. Error message generated by a problem with a LABMRMC input file (OS X 10.4.11)


**A note to long-time users of our University of Chicago software:** ROC-kit's "ROCKIT" input format differs from that used by previous versions of ROCKIT in that CODEWORDS have been abandoned, both for the “KIT” line and for the dataset description line. A more complete description is provided later in this guide.

### Reading the input file and displaying the data

The image in the main window will have a common appearance regardless of whether the input file is in LABROC (Figure 5), ROCKIT (Figure 6) or LABMRMC (Figure 7) format.

**Figure 5**. Program window, displaying LABROC input file (OS X 10.4.11)

**Note:** The height of the whiskers in each of ROC-kit's "box-and-whiskers" plots indicates the range of test-result values from actually-negative and actually-positive cases in the input data file, whereas the height of each box indicates the 25^{th} through 75^{th} percentiles of those input values. Moreover, each box's width shows the relative numbers of actually-positive and actually-negative cases in the sample (e.g., in the example shown in Fig. 5, actually-positive cases [red] are much more numerous than actually-negative cases [green], but their test-result values occupy a somewhat narrower range).


**Figure 6**. Program window, displaying ROCKIT input file (OS X 10.4.11)

**Figure 7**. Program window, displaying LABMRMC input file (OS X 10.4.11)

The main window is divided into two sections: the top one contains a quantitative description of the data, whereas the bottom one displays the data as a box-plot. The checkboxes in the latter allow the user to select subsets of the data for specific types of analysis.

**Note**: By placing the cursor over a particular reader’s internal frame or over a specific modality’s sub-internal frame, a pop-up window will appear with additional information about that frame. This behavior occurs with many other objects in the program’s windows as well.

**Figure 8**. The upper internal frame of ROC-kit's input-data description window, for a LABMRMC input file (OS X 10.4.11)

**Quantitative description**

- The first line within the input-data description window contains the name of the file that has been selected and its path. In the case of the file from Figure 8, the file name (with its path) is /Users/lorenzopesce/ROC/java/data/MRMC_18_2.txt


**Note:** The button at the end of the file name can be used to change the input file. Using this button will bring up the pop-up window described in Figure 2.

- The rest of the upper internal frame of the window indicates the numbers of Cases (subdivided within parentheses into actually-negative and actually-positive cases), Treatments and Readers that were found in the input file.

**Note:** The button at the bottom of the upper internal frame of the window can be used to reselect the entire dataset as described in the file. For details about how to select subsets of the data, see the following section.

**Note:** For a LABROC file the numbers of treatments and readers are set to 1 by default, as only one sequence of data values is analyzed. For a ROCKIT file, the number of readers will be set to 1, but the number of treatments can be any value. For LABMRMC files both numbers can take any positive integer value.

**Display of data and selection of the data to be used in the analysis**

Not all versions of the software are capable of analyzing complete datasets (e.g., the versions released first are unable to analyze MRMC data). Therefore, after a dataset has been uploaded, it is possible that the “fitting tab” on the left side of the main menu is still “grayed,” thereby indicating that the full dataset cannot be analyzed by any of the methods available and that a subset of the data must be selected.

Alternatively, one may wish to select only a part of the data for analysis (e.g., fit a single ROC curve to the data produced by a particular reader using a particular modality).

Two examples of the content of the main window when displaying datasets are shown in Figures 9 (ROCKIT input) and 10 (LABMRMC input).

**Figure 9**. Program window displaying a ROCKIT input file: bottom internal frame (OS X 10.4.11)

The ROCKIT file is displayed with case-pairing matrices because different numbers of cases can be available for different modalities. LABMRMC datasets are fully paired, whereas LABROC files are unpaired.


LABMRMC datafiles are organized by reader: in the example in Figure 10, for each reader (R1 to R18), the data for 4 modalities are provided (T1, T2, T3 and T4). As already stated, each modality has been applied to the same actually-positive and actually-negative cases (paired design), and all readers read all cases for all modalities (fully-paired design).


**Figure 10.** Program window displaying a LABMRMC input file: bottom internal frame. Reader “R10” has been selected to view (OS X 10.4.11)

- The plot for specific readers (if available) can be selected by clicking anywhere on the frame that displays their data (in Figure 10, reader “R10” has been selected). Clicking on the bottom right check box allows users to select/deselect a specific reader from the following analysis.
- Modalities can be selected/deselected by checking/unchecking the check box on top of them.
- The button at the bottom of the upper internal frame (described above) can be used to reselect all Cases, Treatments and Readers.

Figure 11 shows the dataset from Figure 10 after all readers except “R10” have been deselected. Note how the top internal frame shows only the selected data. All data can be restored by checking the “Select All” check box.

**Figure 11.** Program window displaying a LABMRMC input file where a subset of the data has been selected. Only reader “R10” has been selected (OS X 10.4.11)

### Selecting curve-fitting options

Click the “Fitting” icon in the vertical menu bar, which should be activated if a suitable dataset has been selected from the data. ROC-kit will then automatically proceed through a sequence of steps depending upon what kind of data was selected (i.e., whether it is a single curve, paired data, MRMC data and so on).

The following screen will appear (Figure 12; differences among dataset types might exist, as explained below).

**Figure 12.** The “Fitting Options” program window. Grayed options are not available for use, either because they have not been implemented in the release being run, because they do not apply to the specific analysis chosen, or because other choices must be made first (OS X 10.4.11)

**Note:** Some of the options within this window usually will be grayed. For example:

- In early releases of ROC-kit the only available analysis for paired data is non-parametric, so the semi-parametric and parametric options are not available;
- If a non-parametric estimation approach has been chosen, the lower internal frame will be grayed because no specific ROC curve model should be selected (non-parametric implies that no models are used).

First the user needs to select the “Estimation Approach”, i.e., specify whether a non-parametric, a semi-parametric or a parametric analysis is to be used (see Pepe’s book for details^{2}).

- If a non-parametric model is chosen, no “ROC curve model” option will be available.
- If semi-parametric is chosen, the CvBM^{5} and PBM^{6} approaches will be made available. These models were described briefly in the “Warnings and notes” section above.
- No parametric models are currently available, though in principle the same models as for the semi-parametric approach could be used. Parametric models are mentioned here mostly for completeness; they are very rarely used in actual analyses, because the strong assumptions that they require are rarely valid.

After fitting options have been selected, the “analysis” icon will become active. Clicking on it will yield a new window, shown in Figure 13. This window allows the user to select a method to estimate uncertainties.

**Note:** It is possible to go back to any of the previous windows if any change needs to be made, e.g., to change the input file, select a different subset of the data or choose a different set of fitting options. In general, any icon or check button that is active can be used to select new choices or undo previously made ones.


**Figure 13.** Program window: Uncertainty estimation options. Grayed options are not available for use, either because they have not been implemented in the release being run, because they do not apply to the specific analysis chosen, or because other choices must be made first (OS X 10.4.11)

### Selecting an uncertainty estimation method

This window is divided into two internal frames: one allows the user to select an uncertainty estimation method, and the other presents a summary of the data and options that were selected for this analysis (this summary can be cut and pasted for reporting or logging purposes).

Four methods, briefly described above, are available when a semi-parametric or parametric method is chosen:

- Inverse of Information Matrix: This is the standard procedure used with Maximum-likelihood estimation. As not all methods are based on MLE, this option might not be available. In this specific instance, we are referring only to single curve fitting.
- Jackknife, a resampling method.
- Bootstrap, a resampling method.
- ROCKIT. This method is actually based on the inverse of the information matrix, but here we report it separately to emphasize that this is provided for paired and partially-paired data as well as for a single curve.

Three methods, also briefly described in the overview section, are available when a non-parametric method has been chosen:

- Bootstrap based method.
- U-statistic based method, based on the moments approach proposed by Zhou and Gatsonis.
- U-statistic based method, based on the moments approach proposed by Gallas.
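To make the bootstrap option concrete, the following sketch estimates the standard error of the empirical AUC by case resampling. This is an illustration only, not ROC-kit's implementation; the helper names and the number of bootstrap replicates are our own choices.

```python
import random

def empirical_auc(neg, pos):
    """Empirical AUC (Mann-Whitney form): P(pos > neg) + 0.5 * P(tie)."""
    wins = 0.0
    for n in neg:
        for p in pos:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(neg) * len(pos))

def bootstrap_se(neg, pos, n_boot=2000, seed=0):
    """Standard error of AUC by resampling negatives and positives independently."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_boot):
        nb = [rng.choice(neg) for _ in neg]
        pb = [rng.choice(pos) for _ in pos]
        aucs.append(empirical_auc(nb, pb))
    mean = sum(aucs) / n_boot
    return (sum((a - mean) ** 2 for a in aucs) / (n_boot - 1)) ** 0.5
```

Applied to the LABROC example data of Figure 17 (negatives 1, 2, 6, 8, 4, 2, 3; positives 4, 5, 6, 9, 10, 12), `empirical_auc` gives 36/42 ≈ 0.857.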

After the selection process has been completed, the “Compute Estimates” button in the top right corner of the upper internal frame of the main window becomes active. This button can be seen in Figure 13 but is inactive there because no uncertainty estimation method has been selected yet.

** Note**: After the computation has been completed, the “results” icon becomes active.

### Displaying results

The results section begins with an upper internal frame that describes what options are available.

Figure 14 shows the options that are available after a single ROC curve fit has been performed with PROPROC, for example; a similar window would appear if CvBM had been used. Figure 15 shows the window that will appear if a non-parametric approach was chosen, in which case the “Plot Curve” option will yield only the empirical ROC curve that corresponds to the input data.

** **

**Figure 14.** Program window, upper internal frame: Results Options. This example shows the options that would be displayed after a single ROC curve fit was done using PROPROC (OS X 10.4.11)

**Figure 15.** Program window, upper internal frame: Results Options. This example shows the options that would be displayed after a single ROC curve fit was done using a non-parametric approach (OS X 10.4.11)

The display options are described in the following bulleted points. All of the indices mentioned there are explained in some detail in the “Glossary of ROC Terms” section. Default values are simply those that we have used from time to time in the past and should not be considered appropriate for particular research purposes. All values that depend upon a user-selected threshold should be chosen thoughtfully **before** the analysis is performed. In general, “trying” different thresholds violates the assumptions of the statistical tests performed and is considered “data-dredging”.

- “Select All” allows users to select all of the output available for the specific analysis chosen
- will yield a display of the estimated AUC and its standard error.
- “Plot Curve” will produce a plot of the ROC curve as well as numerical values of the empirical operating points and a list of coordinates of points on the fitted ROC curve. These values can be “cut” and “pasted” into any data-display program or document processor on any operating system that can run this software.
- will produce an estimate of the horizontal partial area to the right of the ROC curve between TPF_{low} (in this example 0.85, the default value) and TPF_{high} (in this example 1.0, the default value).
- will produce the FPF value associated with a user-specified TPF value (in this example 0.85, the default value). The radio button at the end of this value can be selected to choose either “FPF at TPF” or “TPF at FPF”, thereby affecting the direction in which an error bar is displayed on the ROC curve when a point is selected on it.
- will produce an estimate of the vertical partial area under the ROC curve between FPF_{low} (in this example 0.0, the default value) and FPF_{high} (in this example 0.15, the default value).
- will produce the TPF value associated with a user-specified FPF value (in this example 0.15, the default value). The radio button at the end of this value can be selected to choose either “FPF at TPF” or “TPF at FPF”, thereby affecting the direction in which an error bar is displayed on the ROC curve when a point is selected on it.

When the “Display results” button in the top right corner of the internal frame is clicked, the program will display the selected estimates. The “Results Options” internal frame will remain on the main screen, while two additional internal frames will appear to show the ROC plot and the data.

**Figure 16.** Program window: Results Options with “Select All” selected. Shown for a single Curve fit done using a non-parametric approach (OS X 10.4.11.)

** **

**Plot internal frame**

One or more ROC curves will be displayed in the “Plot” internal frame and identified in a legend.

- Single modalities :
- If a non-parametric approach was selected, only the empirical ROC curve associated with the input data will be shown.
- If a semi-parametric or parametric curve-fitting method was selected, the empirical ROC curve will be shown together with the fitted curve.
- Multiple modalities:
- For each modality: one empirical ROC curve and a corresponding fitted curve (if a semi-parametric or parametric method was selected).

Several actions can be performed on the plot internal frame:

- A point can be selected on the plot. The program will display the numerical FPF and TPF values associated with that point. Moreover, depending upon whether the radio button associated with “FPF at TPF” or “TPF at FPF” was selected, a vertical or horizontal error bar will be shown.
- By selecting the “fitted” tab in the “Data” internal frame, users can select both how to estimate FPF and TPF sequences numerically (these will be displayed in the same tab and can be copied and pasted into other applications) and what type of error bands to put around the ROC curve, i.e., using the uncertainty in TPF when FPF is assumed known, or the uncertainty in FPF when TPF is assumed known. (Vertical and horizontal error bars are not provided simultaneously because this would require selecting a cutoff value in the test-result space and mapping the corresponding ROC points into the unit square, which the current version of ROC-kit is not able to do.)
- By right clicking on the plot (e.g., by using CTRL + mouse click on OS X), it is possible to edit the plot (e.g., change its title or colors), save it (e.g., as a JPG or PNG file), or print it.
- Either the entire plot or a specific region of it can be selected.

**Data internal frame**

The data internal frame contains several tabs. We describe the tabs here only for a single modality, because multi-modality tabs are organized equivalently.

- The “estimates” tab displays the values (and uncertainties — as standard errors or confidence intervals) for the indices that were selected in the “Results Options” internal frame. These values can be copied and pasted to any application.
- The “empirical” tab shows the empirical operating points that correspond to the input data. These can also be copied and pasted.
- The “fitted” tab presents the user with a number of options:

- indicates the number of points on the fitted ROC curve for which coordinates are to be computed. In general, larger numbers of points provide more information, but users may request a smaller number.
- gives the user three options: points that are (approximately) uniformly distributed on the curve (selected by intersecting the ROC curve with uniformly distributed rays centered at (1,0)), which yields smooth representations of ROC curves even when a considerable portion of the curve is vertical or horizontal; points computed at uniformly-spaced FPF values; and points computed at uniformly-spaced TPF values. The last two options will cause the program to generate confidence bands on the plot (vertical if points are spaced uniformly in FPF, horizontal if points are spaced uniformly in TPF).

# Input file description

** **

Here we describe the different input file formats accepted by ROC-kit and then briefly describe how to prepare input files using conventional text-editing or spreadsheet programs. The latter instructions are not intended by any means to be exhaustive, or even appropriate for the specific kinds of data and data collection that every user will encounter, but rather to provide some guidance in determining the source of potential I/O problems.

## Input data

Data for ROC analysis usually represent either continuously-distributed numerical values or ordinal-category confidence-rating data. Continuous data and all numerical category data are entered into the input file as they are according to the instructions of the following sections. Ordinal-category data, if expressed verbally (e.g., “signal possibly present”; see “Glossary of ROC terms” section), must first be transformed into numerical values, usually integers. Of course care must be taken to give larger values to categories that express higher probability of containing the signal if that ordering is specified in the input file.

*It is crucially important to note that this program is not appropriate for analysis of test-result values that have been pooled across multiple human test-readers – e.g., multiple radiologists.*

* *

** Note**: the program is very flexible and accepts many different kinds of number formats as input data. Should users find any issues with their data, they are invited to contact us at the address on the first page of this user’s guide.

## Input file types

The different types of input files that can be employed with the installed version of ROC-kit can be viewed in the file-format specifier that pops up after an input file has been selected. Descriptions of the following input-file types are provided here:

- LABROC files
- ROCKIT files
- LABMRMC files

** Note1:** the input file extension has no effect on how the program reads the data; instead, each file format is selected by the user via the file format specifier.

** Note2:** Files must have been saved previously as text files to be readable by ROC-kit.

### LABROC input file description

In the following, numbers represent lines or groups of lines in an input file, whereas bullets indicate descriptions of those input lines.

- A free-text description of the input file

- This description allows you to identify easily the type of data stored in the file — e.g., the text string “LABROC example file” is used in Figure 17 below.

- On a new line, an alphanumeric codeword on a single line:

- "Large" (or large, L or l) if stronger positivity corresponds to larger input values (i.e., actually-positive cases are expected to yield larger test-result values)
- "Small" (or small, S or s) if stronger positivity corresponds to smaller input values (i.e., actually-positive cases are expected to yield smaller test-result values)
- ROC-kit reads only the first character of each of these code words and is not case-sensitive, so spelling errors and capitalization are not problematic.

- A sequence of test-result values for actually-negative (e.g., ”actually disease-free” or “actually normal”) cases. Any number of actually-negative cases can be entered.
- Input of actually-negative cases must be terminated by a final line containing an asterisk (*) as its first character.
- A sequence of test-result values for actually-positive (e.g., “actually diseased” or “abnormal”) cases. Any number of actually-positive cases can be entered.
- Input of actually-positive cases must be terminated by a final line containing an asterisk (*) as its first character.

*Example – LABROC input file*

LABROC example file

Large

1

2

6

8

4

2

3

*

4

5

6

9

10

12

*

* *

**Figure 17. **Example of a LABROC input file.* *

*Note:* when using categorical data, you need to convert it to this format. For example, if you have 10 negative and 3 positive cases in category one, you need to type “1” 10 times in the section that lists the values for negative cases and 3 times in the section that lists the values for positive cases. Thus, if you have the following categorical data:

| Category | I | II | III | IV | V |
| --- | --- | --- | --- | --- | --- |
| Actually-negative | 4 | 3 | 2 | 0 | 1 |
| Actually-positive | 0 | 1 | 3 | 2 | 4 |

… then the corresponding input file would be:

*Example – conversion of categorical – LABROC input file*

LABROC example file conversion categorical to list format

Large

1

1

1

1

2

2

2

3

3

5

*

2

3

3

3

4

4

5

5

5

5

*

* *

**Figure 18. **Example of a LABROC input file obtained by converting categorical data written in matrix form into list data.
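The conversion illustrated in Figure 18 is mechanical and easy to script. The following sketch (the helper name is our own) expands per-category case counts into the LABROC list format:

```python
def counts_to_labroc(description, order, neg_counts, pos_counts):
    """Expand per-category case counts into LABROC list format.
    neg_counts[i] / pos_counts[i] = number of cases rated in category i + 1."""
    lines = [description, order]
    for category, count in enumerate(neg_counts, start=1):
        lines.extend([str(category)] * count)   # repeat the category index per case
    lines.append("*")                            # asterisk ends the negative block
    for category, count in enumerate(pos_counts, start=1):
        lines.extend([str(category)] * count)
    lines.append("*")                            # asterisk ends the positive block
    return "\n".join(lines)
```

Calling `counts_to_labroc` with the counts from the table above (negatives 4, 3, 2, 0, 1; positives 0, 1, 3, 2, 4) reproduces the body of the file in Figure 18.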

## ROCKIT input file description

In the following description, numbers represent input lines or groups of input lines, whereas bullets (•) identify comments concerning those input lines.

**Note:** The old "ROCKIT" format required an additional codeword “KIT” after the first line. However, this codeword is not used — and should not be included — in ROC-kit's "ROCKIT" input-file format.

- A free-text description of the input file

- This description allows you to identify easily the type of data stored in the file — e.g., “Protocols A, B, C” in the example shown in Figure 19 below.

- On a new line, enter the names of all of the conditions, beginning with the first condition. Each condition name must be enclosed in quotation marks (“”), and spaces or tabs must separate the individual condition names.
- On a new line, enter an alphanumeric codeword for each condition:

- "Large" (or large, L or l) if stronger positivity corresponds to larger input values (i.e., actually-positive cases are expected to yield larger test-result values)
- "Small" (or small, S or s) if stronger positivity corresponds to smaller input values (i.e., actually-positive cases are expected to yield smaller test-result values)

- ROC-kit reads only the first character of each of these code words and is not case-sensitive, so spelling errors and capitalization are not problematic.

- A sequence of test-result values for actually-negative (e.g., ”actually disease-free” or “actually normal”) cases.

- On a line for each actually-negative case, enter the test result for condition 1, one or more blank spaces, the test result for condition 2, etc. If no test-result is available for a given condition, then enter a “#” instead.
*Optionally*, these test results can be followed by one or more blank spaces and then a brief free-text description of the case. Any number of cases can be entered

- Input of actually-negative cases must be terminated by a final line containing an asterisk (*) as its first character.
- A sequence of test-result values for actually-positive (e.g., “actually diseased” or “abnormal”) cases.

- On a line for each actually-positive case, enter the test result for condition 1, one or more blank spaces, the test result for condition 2, etc. If no test-result is available for a given condition, then enter a “#” instead.
*Optionally*, these test results can be followed by one or more blank spaces and then a brief free-text description of the case. Any number of cases can be entered

- Input of actually-positive cases must be terminated by a final line containing an asterisk (*) as its first character.

*Example ROCKIT input file*

Protocols A, B, C

"AllA" "RandomB" "FirstC"

Large Large Large

0.506890 0.504623 0.237661

0.059184 0.045537 0.017611

0.368607 0.363302 0.363302

0.091150 0.063475 0.063475

0.199107 0.123600 0.123600

0.586394 0.696826 0.696826

0.241429 0.305163 0.305163

0.130843 0.137590 0.124501

0.360271 0.428434 0.528913

0.040779 0.043534 0.043534

*

0.447501 0.796846 0.796846

0.681469 0.591049 #

# 0.915652 0.915652

# 0.883633 0.306815

0.622279 0.458858 0.458858

0.936076 0.940385 0.934565

0.109487 0.128081 0.068720

0.381017 # 0.234412

0.115708 0.147069 0.280961

0.065876 # 0.080754

0.898820 0.899298 0.849371

0.471889 0.359150 0.461840

*

**Figure 19. **Example of an ROCKIT input file.* *
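For users who generate ROCKIT files programmatically, a minimal reader for this format can help clarify the layout. The sketch below is a hypothetical helper, not ROC-kit's own parser; it assumes condition names contain no embedded spaces and it ignores any free-text case descriptions after the test results.

```python
def parse_rockit(text):
    """Parse a ROCKIT-format file into (description, condition names, orderings,
    negative cases, positive cases). Missing results ('#') become None."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    description = lines[0]
    names = [tok.strip('"') for tok in lines[1].split()]
    orderings = lines[2].split()                 # "Large"/"Small" per condition
    k = len(names)
    blocks, current = [], []
    for ln in lines[3:]:
        if ln.lstrip().startswith("*"):          # asterisk line ends a truth-state block
            blocks.append(current)
            current = []
            continue
        tokens = ln.split()[:k]                  # anything after the k results is a case label
        current.append([None if t == "#" else float(t) for t in tokens])
    return description, names, orderings, blocks[0], blocks[1]
```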

## LABMRMC input file format description

Numbers represent input lines or groups of input lines, whereas bullets represent the description of those input lines.

- A free-text description for the file (up to 60 characters, including any leading blanks).

- This description allows you to identify easily the type of data stored in the file. For example, if the current file contains information on a 10 observer study of 4 different mammographic CAD techniques, then this line might be:

“2006 CAD comparison in mammo, 10 readers, 4 treatments”

- The name of the particular reader whose data you are about to enter (starting with the first one).
- On a single line, enter the names of all the treatments.

- Each treatment name must be enclosed in quotation marks (“”) and must be no more than 12 characters long.
- ROC-kit imposes no limitation on the number of treatments used.

- On one line enter an alphabetic code word (“small” or “large” or simply “s” or “l”) for each treatment, separated by one or more blank spaces, to indicate that smaller or larger test results, respectively, are associated with stronger evidence of positivity (e.g., “signal”, “disease” or “abnormality”).

- ROC-kit reads only the first character of each of these code words and is not case-sensitive, so spelling errors and capitalization are not problematic.

- A sequence of test-result values for actually-negative (e.g., “noise” or “normal”) cases.

- On a line for each actually-negative case, enter the test result for treatment 1, one or more blank spaces, the test result for treatment 2, one or more blank spaces, the test result for treatment 3, etc.
*Optionally*, these test results can be followed by one or more blank spaces and then a brief free-text description of the case. Any description must follow the values of all modalities (i.e., must be at the end of the line). - It is absolutely essential that the cases appear in the input data in exactly the same order for each reader.
- The program currently requires all test-result values for every condition.
- There are no limits on the number of actually-negative cases that can be entered.
- Input of actually-negative cases must be terminated by a final line containing an asterisk (*) as its first character.

- A sequence of test-result values for actually-positive (e.g., “signal”, “diseased” or “abnormal”) cases.

- The input format is the same as that for actually-negative cases.
- There are no limits on the number of actually-positive cases that can be entered.
- Again, the input of actually-positive cases must be terminated by a final line of free text containing an asterisk (*) as its first character.

- Repeat steps 2, 5 and 6 for the remaining readers. ROC-kit does not have limits on the number of readers in an MRMC dataset. End the dataset with a pound sign (#).

**Figure 10.** Example of a LABMRMC input file.

** **

## Using other programs to create input files

Many programs can be used to create input files as described above, e.g., Microsoft Word or Excel, TextEdit, Emacs or Vi, to name a few. Input files can also be produced using computational environments such as R, SAS, Mathematica, Matlab or IDL. However, it is essential that these programs save the input file in a text-only (word processor) or formatted-text (spreadsheet) format. On some systems, or when using specific editing software, it may be necessary to try a few different alternatives.

# Glossary of ROC Terms

** **

## Actually-positive and actually-negative

*Actually-negative* cases are those cases that have been verified to be negative with respect to an explicitly-specified disease of interest. This means that the researcher is very confident, or as confident as possible, that these cases in fact do not contain the signal that is being sought (e.g., in cancer screening it means that the patient either is healthy or has such a minor version of the disease in question that the medical community does not consider the disease to be present). Verification of actually-negative cases usually is accomplished by using an independent, highly accurate diagnostic tool, often known as the "gold standard." (In cancer screening, this is usually a pathology report and/or some period of patient follow-up to ensure that the "gold standard" diagnosis was in fact correct.)

*Actually-positive* cases are those cases that have been verified to be positive with respect to an explicitly-specified disease of interest. This means that the researcher is very confident, or as confident as possible, that these cases in fact contain the signal that is being sought (e.g., in cancer screening it means that the patient has the disease in a form that the medical community considers to be cancer). Verification of actually-positive cases usually is accomplished by using an independent, highly accurate diagnostic tool, often known as the "gold standard." (In cancer screening, this is usually a pathology report.)

## TPF, FPF, TPR, FPR, TP, FP, TN, FN

*False Positive Fraction* (FPF) — or equivalently, *False Positive Rate* (FPR) — corresponds to the number of actually-negative cases incorrectly diagnosed as positive by the modality under investigation divided by the total number of actually-negative cases — i.e., the fraction of actually-negative cases that has been classified falsely as positive. Note that unless the test is inherently binary (Yes/No), this value corresponds to a specific setting of the threshold for the decision variable — e.g., see Pepe’s book for details ^{2}. The same applies to all the following definitions. A *False Positive* (FP) is an actually-negative case that has been classified incorrectly as positive. Similarly, a *False Negative* (FN) is an actually-positive case that has been classified incorrectly as negative. FPF is equivalent to 1.0 − "Specificity".

* *

*True positive fraction* (TPF) — or equivalently, *True Positive Rate* (TPR) — corresponds to the number of actually-positive cases correctly diagnosed as positive by the modality under investigation divided by the total number of actually-positive cases— i.e., the fraction of actually-positive cases that has been classified correctly ("truly") as positive. A *True Positive *(TP) is an actually-positive case that has been classified correctly as positive. Similarly, a *True Negative *(TN) is an actually-negative case that has been classified correctly as negative. TPF is equivalent to "Sensitivity".

Estimates of these index values can be computed from empirical ROC data. They can be estimated either as a function of a threshold setting against which the test result (decision variable) is compared or as a function of the complementary decision fraction — i.e., TPF can be estimated as a function of FPF, or FPF as a function of TPF. Because particular values of these decision fractions depend upon the more or less arbitrary choice of a threshold setting or complementary decision-fraction value, they are prone to data-dredging (looking for the “best” threshold) and/or arbitrary decision-making. Therefore, users must choose these threshold settings or complementary decision-fraction values very carefully, justify them in reasonable detail, and fix them before analyzing the data. (One should note that even “looking at the plot” constitutes a form of data-dredging.)
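As a small worked illustration, the empirical decision fractions at a fixed, pre-specified threshold can be computed directly from the two samples. This sketch assumes the "Large" ordering, i.e., larger values indicate stronger positivity:

```python
def decision_fractions(neg, pos, threshold):
    """Empirical FPF and TPF when test results above `threshold` are called positive.
    Assumes larger values mean stronger evidence of positivity ("Large" ordering)."""
    fpf = sum(1 for x in neg if x > threshold) / len(neg)
    tpf = sum(1 for x in pos if x > threshold) / len(pos)
    return fpf, tpf
```

Applied to the LABROC data of Figure 17 with a threshold of 3.5 fixed before the analysis, this gives FPF = 3/7 and TPF = 1.0.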

## Ordinal confidence ratings, continuous ratings and categorical data

By “ordinal confidence ratings” here, we mean inherently ordinal data that are labeled by verbal expressions that represent a rater's ranges of confidence in the presence or absence of a signal; for example, the set of rating categories for interpretations of radiographic images could be “signal definitely absent”, “signal probably absent”, “signal possibly present, or equivocal”, “signal probably present”, and “signal definitely present". At least in principle, another example of ordinal categories is provided by the BI-RADS reporting scale used in mammography. Data of this kind are often called categorical or discrete because there are many fewer categories than data points and usually the distinction between categories is rather sharp (though this does not preclude the existence of borderline cases). For analysis, the category labels usually are replaced by integers to replicate a rank-ordering of the data, e.g.:

| Rating category | Integer |
| --- | --- |
| Signal definitely absent | 1 |
| Signal probably absent | 2 |
| Signal possibly present, equivocal | 3 |
| Signal probably present | 4 |
| Signal definitely present | 5 |

“Continuous ratings” are quantitative probability estimates or lab-test results — e.g., blood-serum concentrations — and are expressed by decimal numbers (e.g., 3.456). Usually there are many more possible values than data points.

In general, empirical datasets range from those that represent data in a very few categories (only two for an analysis in terms of sensitivity and specificity) to those that involve subjective probability scales with perhaps several hundred categories, or to those that use the quasi-continuous scales of clinical laboratory tests or neural-network outputs. One should note that apart from numerical and statistical issues (e.g., standard error and bias of estimates), all of these data are equivalent from the perspective of ROC analysis as long as they can be ranked unequivocally.

## Area, Area under the curve, AUC_{ }

By far the most commonly used summary index in ROC analysis is the area under the ROC curve ^{2}, which can be interpreted as the average TPF value (i.e., "sensitivity") of the diagnostic test. Usually it is called the "area under the curve" (AUC), because this terminology applies to any ROC curve, defined using any model, whether parametric, semi-parametric or non-parametric. Sometimes this term is replaced colloquially by “area” when context makes the interpretation unequivocal. One should note that "A_{z}" technically refers only to areas under ROC curves that arise from the conventional binormal model (CvBM) and so should not be used to represent AUC in general.
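To illustrate, the empirical AUC can be computed by applying the trapezoidal rule to the empirical operating points; the result equals the well-known Mann-Whitney two-sample statistic. This is a sketch of the standard non-parametric estimate, not ROC-kit's internal code:

```python
def empirical_roc_points(neg, pos):
    """Empirical operating points from (0,0) to (1,1), one per distinct test value,
    calling a case positive when its value is >= the threshold ("Large" ordering)."""
    points = [(0.0, 0.0)]
    for t in sorted(set(neg) | set(pos), reverse=True):
        fpf = sum(1 for x in neg if x >= t) / len(neg)
        tpf = sum(1 for x in pos if x >= t) / len(pos)
        points.append((fpf, tpf))
    return points

def trapezoid_auc(points):
    """Trapezoidal area under the empirical ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

On the Figure 17 data (negatives 1, 2, 6, 8, 4, 2, 3; positives 4, 5, 6, 9, 10, 12) this yields AUC = 36/42 ≈ 0.857.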

## Partial Area under the curve, pAUC_{ }

A less commonly employed — but potentially less “vague” — summary index in ROC analysis is provided by the partial area under the ROC curve ^{2}, which represents either the average TPF value provided by the test over a limited range of FPF values (vertical partial AUC), or the average value of specificity = 1 − FPF over a limited range of TPF values (horizontal partial AUC). Often this partial area under the curve is denoted by "pAUC", which applies to any ROC curve, defined using any model, whether parametric, semi-parametric or non-parametric.

Because the ranges used to compute partial areas are chosen more or less arbitrarily by users, they are prone to data-dredging (looking for the “best” range) and/or arbitrary decision-making. Therefore, users must choose these ranges very carefully, justify them in reasonable detail, and fix them before analyzing the data. (Again, one should note that even “looking at the plot” constitutes a form of data-dredging.)
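As an illustration, the vertical partial area over a pre-specified FPF range can be computed for any piecewise-linear ROC curve by clipping each segment to the range. This is a generic sketch, not ROC-kit's fitted-curve computation:

```python
def vertical_pauc(points, fpf_low, fpf_high):
    """Partial area under a piecewise-linear ROC curve for FPF in [fpf_low, fpf_high].
    `points` is a list of (FPF, TPF) pairs ordered along the curve from (0,0) to (1,1)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lo, hi = max(x0, fpf_low), min(x1, fpf_high)
        if hi <= lo:
            continue  # segment outside the FPF range (or vertical: x0 == x1)
        # linear interpolation of TPF at the clipped endpoints
        y_lo = y0 + (y1 - y0) * (lo - x0) / (x1 - x0)
        y_hi = y0 + (y1 - y0) * (hi - x0) / (x1 - x0)
        area += (hi - lo) * (y_lo + y_hi) / 2.0
    return area
```

For the chance diagonal, the vertical pAUC over [0, 0.15] is 0.15²/2 = 0.01125, which is a quick sanity check for the routine.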

## Jackknife: Leave-None-Out, Leave-One-Out and pseudovalues

Statistical inferences made by this program regarding multi-reader, multi-case (MRMC) datasets are based on Analysis of Variance (ANOVA), with the data for the ANOVA computation produced by use of the Tukey-Quenouille jackknife ^{17}. MRMC datasets analyzed by this version of ROC-kit are required to be fully crossed, i.e., every reader must read every case in every modality. If the user selects a subset of an MRMC dataset that involves only a single reader and a single modality, then the resulting sequence of actually-negative and actually-positive cases will be referred to hereafter in this guide as a reader-modality dataset.

The first step in an MRMC analysis is to compute the ROC index estimate of interest (e.g., AUC) for each reader-modality dataset. These are called "Leave-None-Out" estimates, because no cases are left out of the sample. For a particular reader-modality dataset, let us denote this estimate by q. Because ROC index values such as AUC cannot be computed for a single case, one way to compute the contribution of each case to the "Leave-None-Out" index estimate is to recompute the index after taking out the k^{th} case, thereby yielding the "Leave-One-Out" estimate, q_{(k)}. The contribution of the k^{th} case can then be estimated by its "pseudovalue", q_{k} = Nq - (N - 1)q_{(k)}, where N is the total number of cases in the dataset.
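The pseudovalue computation can be sketched generically for any scalar statistic (an illustration only; ROC-kit's MRMC ANOVA machinery involves considerably more than this step):

```python
def pseudovalues(cases, statistic):
    """Tukey-Quenouille jackknife pseudovalues: q_k = N*q - (N - 1)*q_(k),
    where q_(k) is the statistic recomputed with case k left out."""
    n = len(cases)
    q = statistic(cases)
    return [n * q - (n - 1) * statistic(cases[:k] + cases[k + 1:])
            for k in range(n)]
```

For the sample mean, the pseudovalues reproduce the original cases themselves, which is a convenient sanity check; for an index such as AUC they capture each case's contribution to the "Leave-None-Out" estimate.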

## Conventional binormal model (CvBM)

The term “conventional binormal model” has been used recently in comparisons with so-called “proper” ROC curve-fitting models; however, for many years it was known simply as “the binormal model”. Each possible case is assumed to be associated with a value x. Usually, this variable x does not represent the quantity that is measured in the experiment being analyzed, but instead is some unknown monotonic transformation of that value, and therefore is called a *latent* (i.e., not observed) decision variable. In the conventional binormal model, x is assumed to be normally distributed both for the actually-negative cases and for the actually-positive cases. In general, the two truth-conditional normal distributions are assumed to have different means and standard deviations: m_{n}, s_{n} for the actually-negative cases and m_{s}, s_{s} for the actually-positive cases. The usual convention is that actually-positive cases are more likely to have larger values, which implies that m_{n} < m_{s}.

If we let f(m, s, x) denote the cumulative normal distribution function with mean m and standard deviation s, then the value 1 - f(m_{n}, s_{n}, x_{t}) indicates the fraction of actually-negative cases with a value above the threshold x_{t} (i.e., the false positive fraction, FPF), and 1 - f(m_{s}, s_{s}, x_{t}) indicates the fraction of actually-positive cases with a value above x_{t} (the true positive fraction, TPF, at the same threshold setting).

The ROC curve is produced by sweeping the threshold x_{t} from large to small values and plotting the resulting TPF against FPF in the ROC plot ^{2}. The latent variable x can be assumed to have produced the measurements (or test-result values) used in the study after being subjected to a monotonic transformation and possibly some discretization (to produce discrete values from the continuous values of the normal distribution). This model is called "binormal" because some monotonic transformation of the decision variable is assumed to arise from one of two normal distributions that correspond to actually-positive and actually-negative cases, respectively.

Unless the two normal probability density functions have precisely the same widths (i.e., the same standard deviations), they must intersect twice. As a result, either there are large values for which an actually-negative case is more likely than an actually-positive case (because the actually-negative conditional density is larger beyond the second intersection), or there are small values for which an actually-positive case is more likely than an actually-negative case (because the actually-positive conditional density is larger below the first intersection). This creates a so-called hook in the fitted ROC curve. (Note that this behavior is sometimes not negligible.) A “proper model” is a correction to this. The conventional binormal model is usually defined using two parameters: “a”, which represents the vertical intercept, and “b”, which represents the slope of the fitted ROC curve when it is plotted as a straight line on “normal-deviate” axes. The first parameter is related to the difference between the means of the two normal distributions, while the second parameter is the ratio of their standard deviations.
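Under the parameterization just described, points on the CvBM curve and the corresponding A_{z} index follow from the standard normal distribution function, via the well-known relations z(TPF) = a + b z(FPF) and A_{z} = f(a / sqrt(1 + b^{2})). A minimal sketch using Python's standard library:

```python
from math import sqrt
from statistics import NormalDist

_phi = NormalDist()  # standard normal distribution

def cvbm_tpf(fpf, a, b):
    """Conventional binormal ROC curve: z(TPF) = a + b * z(FPF) on normal-deviate axes."""
    return _phi.cdf(a + b * _phi.inv_cdf(fpf))

def cvbm_az(a, b):
    """Area under the conventional binormal ROC curve (the A_z index)."""
    return _phi.cdf(a / sqrt(1.0 + b * b))
```

With a = 0 and b = 1 the curve is the chance diagonal and A_{z} = 0.5, a quick sanity check.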

## Proper binormal model (PROPROC)

The term “proper binormal model" often is used to distinguish a second model based on normal distributions from the one that we call the “conventional binormal model". Although the proper binormal model also assumes that a pair of normal latent-variable distributions underlies the classification process (see the preceding section for a definition of latent variables), it assumes that the ROC curve is produced not by sweeping a threshold through the normally-distributed quantity, but instead by transforming each normally-distributed value to its corresponding likelihood ratio and then sweeping a threshold through the values of likelihood ratio ^{14}. Thus, the proper binormal model employs likelihood ratio (or equivalently, any monotonically increasing function of it) as the latent decision variable. Accordingly, the point on the ROC curve corresponding to each threshold value has an abscissa (FPF) equal to the number of actually-negative cases with likelihood ratio above the threshold divided by the total number of actually-negative cases, and an ordinate (TPF) equal to the number of actually-positive cases with likelihood ratio above the threshold divided by the total number of actually-positive cases. This use of likelihood ratio as the decision variable for the curve-fitting model prevents the formation of the so-called hooks. Other solutions to the “hooks” problem have been suggested and some are or will be included in this software package (e.g., the so-called "contaminated" models). The exact meaning of the parameters of the proper binormal model, d_{a} and c, is not intuitive. In general, d_{a} is related (non-linearly) to the value of the area under the curve, whereas c is related (also non-linearly) to the skewness of the curve. ROC-kit's users are encouraged to refer to a paper by Metz and Pan ^{14} and to a subsequent paper by Pesce and Metz ^{6} for additional details concerning this model and its implementation.

## LABROC4/LABROC5 categorization

*LABROC4 categorization*. Before maximum-likelihood estimation of the ROC parameters is attempted, continuously-distributed input data are rank-ordered and then collapsed into truth-state runs as per the LABROC4 algorithm ^{5}. This categorization scheme was developed initially for fitting ROC curves on the basis of the conventional binormal model, but it applies in principle to all ROC curve-fitting models. Because ROC analysis is concerned with the ability of an algorithm to separate actually-negative from actually-positive cases, after a dataset's test-result values have been rank-ordered, the actual values of the test results can be replaced by their ranks with no loss of information (i.e., the resulting ROC curve will be identical). To clarify the concepts, let us consider a simple example with a dataset that involves only 12 cases: 6 actually-negative and 6 actually-positive. The following steps are involved:

- Let us suppose that on input we have the actually-negative cases first, followed by the actually-positive cases (the 1^{st} row of the table immediately below).
- The cases are sorted by test-result value (2^{nd} row).
- The values are replaced with their ranks (3^{rd} row).
- Contiguous ranks of the same truth state (either all actually-positive or all actually-negative) are collapsed (4^{th} row).
- The cases with the same collapsed rank are counted, and those counts are the new categorical data (the subsequent table). That second table produces exactly the same empirical ROC curve, and hence the same fitted ROC curve, as the original data, apart from possible numerical issues.

| 1^{st}: input values | 5.0 | 7.0 | 3.0 | 1.0 | 1.0 | 3.0 | 6.0 | 7.0 | 8.0 | 10.0 | 1.0 | 7.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2^{nd}: sorted values | 1.0 | 1.0 | 1.0 | 3.0 | 3.0 | 5.0 | 6.0 | 7.0 | 7.0 | 7.0 | 8.0 | 10.0 |
| 3^{rd}: ranks | 1 | 1 | 1 | 2 | 2 | 3 | 4 | 5 | 6 | 6 | 7 | 8 |
| 4^{th}: collapsed ranks | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 4 | 5 | 5 | 5 | 5 |

**Categorical data obtained after the original test-result data in the example above are categorized by the LABROC4 scheme:**

| Category index | I | II | III | IV | V |
|---|---|---|---|---|---|
| # actually-negative cases | 2 | 3 | 0 | 1 | 0 |
| # actually-positive cases | 1 | 0 | 1 | 0 | 4 |
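The ranking-and-collapsing steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the ROC-kit implementation: degenerate datasets and other edge cases are handled by the full LABROC4 algorithm ^{5}, and this sketch always merges cases whose scores are tied across truth states into a single category, since no decision threshold could separate them.

```python
from itertools import groupby

def labroc4_categories(neg, pos):
    """Collapse continuously distributed scores into truth-state runs.
    Returns (n_neg, n_pos) counts per category, ordered from the
    smallest to the largest test-result values."""
    cases = sorted([(v, 0) for v in neg] + [(v, 1) for v in pos])
    # Each distinct score becomes one run carrying its truth-state counts,
    # so cases tied across truth states stay together.
    runs = []
    for _, grp in groupby(cases, key=lambda c: c[0]):
        labels = [t for _, t in grp]
        runs.append([len(labels) - sum(labels), sum(labels)])
    # Merge adjacent runs that contain only the same single truth state.
    merged = [runs[0]]
    for n_neg, n_pos in runs[1:]:
        prev = merged[-1]
        if (prev[1] == 0 and n_pos == 0) or (prev[0] == 0 and n_neg == 0):
            prev[0] += n_neg
            prev[1] += n_pos
        else:
            merged.append([n_neg, n_pos])
    return [tuple(r) for r in merged]
```

For instance, a tie-free dataset with negatives {1.0, 2.0} and positives {3.0, 4.0} collapses to the two categories (2, 0) and (0, 2).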

*LABROC5 categorization*. The runs obtained from the LABROC4 algorithm (suppose it generated L categories; 5 in the previous example) can then be reduced to K categories in an *ad hoc* but empirically useful way, in the sense that it has been shown empirically to produce good fits ^{5}. Here we describe a simplified version of the algorithm to aid understanding; details of the full algorithm can be found in Metz et al. ^{5}.

The steps in the LABROC5 algorithm (starting from the previous LABROC4 points):

- Consider the boundaries between adjacent categories (in the example above, numbering the five categories 0 through 4, the boundaries lie between 0 and 1, 1 and 2, 2 and 3, and 3 and 4). Since we sweep from the more positive to the less positive test-result values, the boundaries are ordered from the largest value to the smallest (1^{st} row of the table below).
- Each boundary is transformed to its corresponding operating point, i.e., to its corresponding values of FPF and TPF (2^{nd} row).
- The two decision fractions of each point (each of which is already a cumulative count) are summed to obtain the so-called "city-block distance" of the point from (0,0) (3^{rd} row).
- The city-block distances are centered on 1, the value attained on the -45 degree diagonal (4^{th} row).
- These values are transformed through the standard normal cumulative distribution function (5^{th} row).
- The "most uniform" subset of these values is selected, and the categories between the discarded boundaries are collapsed (6^{th} row).

| Boundary (between categories) | 4,3 | 3,2 | 2,1 | 1,0 |
|---|---|---|---|---|
| Operating point (FPF, TPF) | (0, 4/6) | (1/6, 4/6) | (1/6, 5/6) | (4/6, 5/6) |
| City-block distance | 4/6 | 5/6 | 1 | 9/6 |
| Centered on 1 | -1/3 | -1/6 | 0 | 1/2 |
| Normal CDF value | .37 | .43 | .50 | .69 |
| Decision | OK | Collapse | OK | OK |
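Steps 2 through 5 above can be reproduced with a few lines of plain Python. This is a sketch of the simplified description given here, not of ROC-kit's internal code:

```python
from math import erf, sqrt

def labroc5_normal_deviates(operating_points):
    """For each boundary's (FPF, TPF) operating point: sum the two
    coordinates to get the city-block distance from (0, 0), center it
    on 1 (the value reached on the -45 degree diagonal), and map it
    through the standard normal cumulative distribution function."""
    phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return [phi(fpf + tpf - 1.0) for fpf, tpf in operating_points]
```

Applied to the four operating points of the worked example, this returns values that round to .37, .43, .50 and .69, matching the 5^{th} row of the table.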


**Categorical data obtained after the original test-result data in the example above are categorized by the LABROC5 scheme:**

| Category index | I | II | III | IV |
|---|---|---|---|---|
| # actually-negative cases | 2 | 3 | 1 | 0 |
| # actually-positive cases | 1 | 0 | 1 | 4 |


**NOTE:** The reduced set of LABROC5 points does not preserve all information in the original dataset.
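For completeness, the step that turns categorical counts into empirical operating points (used in step 2 of the LABROC5 description above) can be sketched as follows; again, this is an illustration rather than ROC-kit's internal code:

```python
def operating_points(neg_counts, pos_counts):
    """Empirical (FPF, TPF) at each category boundary, sweeping the
    decision threshold from the most positive category downward.
    Category counts are given from least to most positive test result."""
    n_neg, n_pos = sum(neg_counts), sum(pos_counts)
    points, fp, tp = [], 0, 0
    for n, p in zip(reversed(neg_counts), reversed(pos_counts)):
        fp += n
        tp += p
        points.append((fp / n_neg, tp / n_pos))
    return points
```

With the LABROC4 counts of the worked example (negative: 2, 3, 0, 1, 0; positive: 1, 0, 1, 0, 4), the first four points reproduce the operating points listed earlier, and the last point is always (1, 1).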

# APPENDIX


(reserved for future use)


# References

[1] Wagner RF, Metz CE, Campbell G. Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 2007;14(6):723-48.

[2] Pepe MS. The statistical evaluation of medical tests for classification and prediction. Oxford; New York: Oxford University Press, 2004.

[3] Zhou X-h, Obuchowski NA, McClish DK. Statistical methods in diagnostic medicine. New York: Wiley-Interscience, 2002.

[4] Metz CE, Herman BA, Roe CA. Statistical comparison of two ROC-curve estimates obtained from partially-paired datasets. Med Decis Making 1998;18(1):110-21.

[5] Metz CE, Herman BA, Shen JH. Maximum likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat Med 1998;17(9):1033-53.

[6] Pesce LL, Metz CE. Reliable and computationally efficient maximum-likelihood estimation of "proper" binormal ROC curves. Acad Radiol 2007;14(7):814-29.

[7] Kendall MG, Stuart A, Ord JK, Arnold SF, O'Hagan A. Kendall's advanced theory of statistics. London, New York: Edward Arnold; Halsted Press, 1994.

[8] Gallas BD. One-shot estimate of MRMC variance: AUC. Acad Radiol 2006;13(3):353-62.

[9] Zhou XH, Gatsonis CA. A simple method for comparing correlated ROC curves using incomplete data. Stat Med 1996;15(15):1687-93.

[10] Dorfman DD, Berbaum KS, Metz CE. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27(9):723-31.

[11] Gruszauskas NP, Drukker K, Giger ML, Sennett CA, Pesce LL. Performance of breast ultrasound computer-aided diagnosis: dependence on image selection. Acad Radiol 2008;15(10):1234-45.

[12] Efron B, Tibshirani R. An introduction to the bootstrap. New York: Chapman & Hall, 1993.

[13] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837-45.

[14] Metz CE, Pan X. "Proper" Binormal ROC Curves: Theory and Maximum-Likelihood Estimation. J Math Psychol 1999;43(1):1-33.

[15] Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989;24(3):234-45.

[16] Glantz SA. Primer of biostatistics. New York: McGraw-Hill, Medical Pub. Div., 2002.

[17] Tukey JW. Bias and confidence in not-quite large samples. Annals of Mathematical Statistics 1958;29(2):614.