About this document

This document provides the RMarkdown behind the figures and interactive visualisations in our paper for anyone who wants to see how they were created or who wishes to extend them.

Generating confusion matrices

We begin by generating all possible confusion matrices of a given size. A binary confusion matrix comprises four non-negative integers, and an ordered sequence of non-negative integers summing to a total n is known as a weak composition of n. Here is a function to create all k-element weak compositions with total n, based on code by Michel Billaud.

makeAllWeakCompositions <- function(n,k){
  # Initialise the matrix that will hold all compositions
  composition <- matrix(data=0, nrow=choose(n+k-1,k-1), ncol=k)
  
  composition[1,k]  <- n # Set the first composition (0,...,0,n)
  current.row       <- 1 # Set the current row to the first row
  last.nonzero      <- k # The last non-zero element of the current row is in position k
  
  # While the first element of the current row is less than n... 
  while(composition[current.row,1] < n){
    # generate the next row
    next.row <- current.row + 1
    # copy the current row into the next row
    composition[next.row,] <- composition[current.row,]
    # turn    a b ...   y   z 0 0 ...   0
    #                       ^ last
    # into    a b ... (y+1) 0 0 0 ... (z-1)
    
    last.nonzero                            <- max(which(composition[next.row,] > 0))
    z                                       <- composition[current.row, last.nonzero]
    composition[next.row, last.nonzero - 1] <- composition[current.row, last.nonzero - 1] + 1
    composition[next.row, last.nonzero    ] <- 0
    composition[next.row, k               ] <- z - 1
    current.row                             <- next.row
  }
  return(composition)
}

To demonstrate, here are the first six and the last six weak compositions of four elements that sum to 5:

makeAllWeakCompositions(5,4) %>% head()
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    5
[2,]    0    0    1    4
[3,]    0    0    2    3
[4,]    0    0    3    2
[5,]    0    0    4    1
[6,]    0    0    5    0
makeAllWeakCompositions(5,4) %>% tail()
      [,1] [,2] [,3] [,4]
[51,]    3    1    1    0
[52,]    3    2    0    0
[53,]    4    0    0    1
[54,]    4    0    1    0
[55,]    4    1    0    0
[56,]    5    0    0    0

The number \(C_k^{'}(n)\) of weak compositions of a number \(n\) of length \(k\) (i.e., compositions in which 0 is allowed) is given by (Weisstein, n.d.):

\[ \begin{align} C_k^{'}(n) &= \binom{n+k-1}{k-1}\\ &=\frac{(n+k-1)!}{n!(k-1)!} \end{align} \]

so \[ \begin{align} C_4^{'}(0) &= 1\\ C_4^{'}(1) &= 4\\ C_4^{'}(2) &= 10\\ C_4^{'}(3) &= 20\\ C_4^{'}(4) &= 35\\ C_4^{'}(5) &= 56\\ &\vdots\\ C_4^{'}(100) &= 1.76851\times 10^{5} \end{align} \]
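
As a quick sanity check (a small snippet that is not part of the original analysis), the row counts produced by makeAllWeakCompositions() match this formula:

sapply(0:5, function(n) nrow(makeAllWeakCompositions(n, 4)))
[1]  1  4 10 20 35 56
choose(0:5 + 4 - 1, 4 - 1)
[1]  1  4 10 20 35 56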

Now that we can generate all possible confusion matrices of a given total, we can augment them with various performance metrics.

Adding performance metrics to confusion matrices

The following function returns a data frame representing all possible confusion matrices of size n, projected into three dimensions, with a range of performance metrics added as columns.
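
make.confmat() assumes the tidyverse (dplyr, tibble, magrittr) is loaded, and uses two small helper functions, zdiv() and MCC(), that are defined elsewhere in the RMarkdown. Judging from how they are called below, plausible definitions might look like the following sketch; the handling of zero denominators here is an assumption, not necessarily the authors' exact code.

# Sketch of the helpers assumed by make.confmat(); the treatment of
# zero denominators is an assumption
zdiv <- function(a, b) {            # probability a/(a+b), guarding against 0/0
  ifelse(a + b == 0, NA_real_, a / (a + b))
}
MCC <- function(TP, FP, FN, TN) {   # Matthews correlation coefficient
  num <- TP * TN - FP * FN
  den <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  ifelse(den == 0, NA_real_, num / den)
}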

make.confmat <- function(n) {
  # This matrix is used to project the four dimensional confusion matrix into three dimensions
  project3d <- matrix(
    c(
        0 ,   0 ,   1, # TP
        0 ,   1 ,   0, # FP
        1 ,   0 ,   0, # FN
      -1/3, -1/3, -1/3 # TN
    ), byrow = TRUE, nrow=4
  )
  
  makeAllWeakCompositions(n,4) -> abcd              # All confusion matrices of size n
  colnames(abcd) <- c("TP", "FP", "FN", "TN")       # with columns named after confusion matrix elements
  abcd %*% project3d           -> xyz               # ...projected into 3D
  colnames(xyz)  <- c("x", "y", "z")                # with columns named after the three dimensions
  bind_cols(as_tibble(abcd), as_tibble(xyz)) %>%    # ...bound side by side
    mutate(                                         # and augmented with...
      text       =sprintf("%2d %2d\n%2d %2d", TP,FP,FN,TN), # Label for plotly
      Pos        =TP+FN,                            # Number of actual positives
      Neg        =FP+TN,                            # Number of actual negatives
      TPR        =TP/Pos,                           # True Positive Rate
      FPR        =FP/Neg,                           # False Positive Rate
      PLR        =TPR/FPR,                          # Positive Likelihood Ratio (LR+)
      TNR        =TN/Neg,                           # True Negative Rate
      FNR        =FN/Pos,                           # False Negative Rate
      NLR        =FNR/TNR,                          # Negative Likelihood Ratio (LR-)
      DOR        =PLR/NLR,                          # Diagnostic Odds Ratio
      prior.O    =Pos/Neg,                          # Prior odds of actual class being X
      prior.P    =zdiv(Pos,Neg),                    # Prior prob of actual class being X
      post.O     =TP/FP,                            # Posterior odds that actual class is X
      post.P     =zdiv(TP,FP),                      # Posterior probability that actual class is X 
      prior.O.n  =Neg/Pos,                          # Prior odds of actual class NOT being X
      prior.P.n  =zdiv(Neg,Pos),                    # Prior prob of actual class NOT being X
      post.O.n   =TN/FN,                            # Posterior odds that actual class is NOT X 
      post.P.n   =zdiv(TN,FN),                      # Posterior probability that actual class is NOT X
      MCC        =MCC(TP,FP,FN,TN),                 # Matthews correlation coefficient
      logDOR     =log(DOR),                         # log of DOR
      #slogDOR    =logDOR/log((Pos-1)*(Neg-1)),     # scaled log of DOR
      J          =TPR + TNR - 1,                    # Youden's J, Balanced Accuracy
      Acc        =(TP+TN)/(Pos + Neg),              # Accuracy
      F1         =2*TP / (2*TP + FP + FN),          # F1
      Markedness =post.P + post.P.n - 1,            # Markedness
      g.mean     =sqrt(TPR * TNR),                  # Geometric mean
      Prev.Thresh=sqrt(FPR)/(sqrt(TPR)+sqrt(FPR)),  # Prevalence threshold
      Threat.Scr =TP / (TP + FN + FP),              # Threat score
      Fowlkes.M  =sqrt(post.P * TPR)                # Fowlkes-Mallows index
    )
}

All possible ROC and Precision-Recall reference points

http://bit.ly/see-ROC-reference-points shows all possible \((p+1)\times(n+1)\) points in ROC and Precision-Recall spaces corresponding to confusion matrices of size \(N = p+n\), coloured by Balanced Accuracy from red (low) to blue (high). Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window.
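
For readers who prefer to generate these reference points locally rather than in Desmos, the lattice is straightforward to construct in R (a sketch using illustrative values p = 20, n = 40; the Desmos sheet itself is self-contained):

# All (p+1) x (n+1) reference points in ROC space for N = p + n
p <- 20; n <- 40
roc.points <- expand.grid(FPR = (0:n)/n, TPR = (0:p)/p)

# The corresponding Precision-Recall points (Precision is undefined when TP = FP = 0)
pr.points <- transform(expand.grid(TP = 0:p, FP = 0:n),
                       Recall    = TP/p,
                       Precision = ifelse(TP + FP == 0, NA, TP/(TP + FP)))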

Confusion matrix performance metric contours

http://bit.ly/see-confusion-metrics enables us to interactively visualise a range of confusion matrix performance metrics by plotting their contours, coloured from red (low) to white (middle) to blue (high).

Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window, and can set the position of a test point by adjusting the \(a_1\) and \(d_1\) sliders. There are many things that users can turn on and off by clicking on the small circles at the left edge of the screen:

  • Contours of prevalence-dependent and prevalence-independent metrics. These switches are titled Show Accuracy, Show MCC, through to Show Geometric Mean and, when activated, display the contours of the chosen performance metrics.
  • Additional information and decoration switches allow users to show all possible ROC points; a movable test point whose corresponding confusion matrix and performance metric values can be displayed; and various titles. Importantly, users can toggle the limits of what is displayed, so that performance metric contours beyond ROC space can be visualised.

Uncertainty in confusion matrices and their performance metrics

http://bit.ly/see-confusion-uncertainty enables interactive exploration of the posterior predictive pmfs of confusion matrices and three performance metrics (MCC, BA, \(F_1\)) under binomial and beta-binomial models of uncertainty.

Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window, and can set the position of a test point by adjusting the \(a\) and \(d\) sliders. There are many things that users can turn on and off by clicking on the small circles at the left edge of the screen:

  • Marginal and joint pmfs of True and False Positive rates. Users can show these posterior predictive probability mass functions for confusion matrices of size \(N=p+n\) under binomial and beta-binomial models of uncertainty, given that \(a\) True Positives and \(d\) True Negatives have been observed.
  • Posterior predictive pmfs of MCC, BA and \(F_1\) can be shown using the Show PMF... switches for each performance metric. There are also switches to show the unique performance metric values (Show rug...), the number of times these unique values are observed (Show count...) and a histogram summary of the probability mass functions (Show histogram...).
  • Additional information and decoration switches allow users to show all possible ROC points; a movable test point whose corresponding confusion matrix and performance metric values can be displayed; and various labels.
  • Axis and point size scales are sliders that allow users to adjust the size of the points used in the joint pmf display, the maximum of the performance metric pmfs y-axis (\(P_{max}\)), and the maximum of the performance metric counts y-axis (\(C_{max}\)).

As noted on the visualisation, Desmos supports lists of up to \(10\,000\) elements, so \(np\) must be at most \(10\,000\) for the joint probability mass distributions to plot. The visualisation runs in your web browser, and interaction becomes slow at larger values of \(np\); we recommend starting with \(N=60\), \(p=20\).

Display an interactive confusion simplex

The helper function plot.simplex() generates an interactive 3D visualisation of a set of confusion matrices, coloured by a chosen metric. Here it is used to provide 3D projections of binary confusion matrices of size 100. Each point corresponds to a unique confusion matrix and is coloured by the value of that matrix’s Matthews Correlation Coefficient (MCC). For reference, we label the four extreme points corresponding to all True Positives (TP=100), all False Negatives (FN=100), etc., and connect those vertices to give an impression of the regular tetrahedral lattice (i.e., the 3-simplex) of the projected points. In total, there are \(\binom{100+4-1}{4-1}=176\,851\) different binary confusion matrices of size 100. Rather than show all these, we have taken three slices through the lattice: from back to front, the rectangular lattices of points correspond to confusion matrices where \(p = 20, 50, 90\), respectively.
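
The call below assumes confmat.100 holds the output of make.confmat() for \(N=100\), restricted to the three slices described above; plot.simplex() itself is defined elsewhere in the RMarkdown, so what follows is only a guess at the setup, not the authors' exact code:

# Assumed setup: all size-100 confusion matrices, keeping the three slices shown
confmat.100 <- make.confmat(100) %>% filter(Pos %in% c(20, 50, 90))

# A minimal sketch of what plot.simplex() might look like, using plotly
# (the original also labels the tetrahedron vertices and draws its edges)
plot.simplex <- function(confmat, metric = "MCC") {
  plotly::plot_ly(
    confmat,
    x = ~x, y = ~y, z = ~z,
    color  = stats::as.formula(paste0("~", metric)),
    text   = ~text,
    type   = "scatter3d", mode = "markers",
    marker = list(size = 2)
  )
}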

plot.simplex(confmat.100, metric="MCC")

Mouse over the tetrahedron, then click and drag to change its orientation. Click on the text Pos==20 to toggle the display of that slice.

Here is the same confusion simplex, this time coloured by Accuracy.

plot.simplex(confmat.100, metric="Acc")

Plot the contours of MCC in ROC space

Here is a confusion matrix representing \(N=a+b+c+d\) examples \[ \begin{bmatrix} \mathrm{TP} & \mathrm{FP}\\ \mathrm{FN} & \mathrm{TN} \end{bmatrix}= \begin{bmatrix} a & b\\ c & d \end{bmatrix} \] in which there are \(p=a+c\) actual positives and \(n=b+d\) actual negatives.

The Matthews Correlation Coefficient is defined to be \[ \begin{align} \mathrm{MCC}(a, b, c, d) &=\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\\ &=\frac{ad-(n-d)(p-a)}{\sqrt{(a+n-d)pn(p-a+d)}}. \end{align} \]

For given numbers of positives (\(p\)) and negatives (\(n\)), this performance metric achieves a value of \(-1 \leq k \leq 1\) along the contour lines with \[ a(k, p, n, d) = \left\{ \begin{array}{ c l } \frac{1}{2 (k^2 p + n)} \left( +\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k \geq 0\\ \frac{1}{2 (k^2 p + n)} \left( -\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k < 0 \end{array} \right. \]
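
For reference, here is a direct R transcription of this contour function (a small helper that is not part of the original code):

# True Positive count a at which MCC = k, for p actual positives,
# n actual negatives and d True Negatives; transcribed from the formula above
mcc.contour.a <- function(k, p, n, d) {
  s    <- sqrt(k^2 * p * (n + p)^2 * (4 * d * (n - d) + k^2 * n * p) / n)
  rest <- 2 * d * p * (k^2 - 1) + k^2 * p * (p - n) + 2 * n * p
  (ifelse(k >= 0, s, -s) + rest) / (2 * (k^2 * p + n))
}

# Sanity check: on the k = 0 contour, a = p(n-d)/n, i.e. TPR + TNR = 1
mcc.contour.a(0, p = 20, n = 80, d = 40)
[1] 10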

To illustrate these contours in ROC space, here is an orthographic projection of the slice of points from the confusion simplex shown above where \(p=20\) and \(n=80\), coloured by the value of the Matthews Correlation Coefficient (MCC). The continuous lines indicate the contours of MCC at \(-0.9, -0.8, \dots, 0.9\). Note that while MCC can be calculated for continuous arguments, empirical confusion matrices give rise to a finite set of \((p+1)\times(n+1)\) arguments, corresponding to the points in this 2D lattice.

…and here are the same points and contours scaled to fit within ROC space. ROC curves plot a classifier’s true positive rate against its false positive rate in the unit square \([0,1]\times[0,1]\). This is equivalent to re-scaling the \(x\)-axis of the previous plot by a factor of \(\tfrac1n\) and the \(y\)-axis by \(\tfrac1p\). Again, the contours of the MCC performance metric are defined continuously, but empirical confusion matrices can only take on values at the discrete points in this plot, which have \(n+1=81\) possible \(x\)-values \((0, \tfrac1n, \tfrac2n, \dots, 1)\) and \(p+1=21\) possible \(y\)-values \((0, \tfrac1p, \tfrac2p, \dots, 1)\).
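
A plot along these lines can be reproduced from the data frame built earlier (a sketch, not the paper's actual figure code; it assumes ggplot2 is loaded and uses the confmat.100 object defined above):

# ROC-space view of the p = 20, n = 80 slice, coloured by MCC
confmat.100 %>%
  filter(Pos == 20) %>%
  ggplot(aes(x = FPR, y = TPR, colour = MCC)) +
  geom_point(size = 0.8) +
  scale_colour_gradient2(low = "red", mid = "white", high = "blue", midpoint = 0) +
  coord_fixed()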

Where do different ROC points have the same performance metric value?

Each panel shows all the possible points in the ROC space of confusion matrices of 20 positive and 40 negative examples (top row) and 20 positive and 41 negative examples (bottom row). Points are coloured by the number of times the performance metric value at that point is observed in the confusion matrices of those totals. Three different performance metrics are presented: MCC (left), BA (middle), \(F_1\) (right). Performance metric contours are shown in the background, coloured by their value. Note that one additional negative example changes the configuration of possible points in ROC space so that each possible MCC and BA value is unique (bottom left and middle); the multiplicity of different \(F_1\) values remains much the same (bottom right).

Visualising the joint probability mass function of TP and TN

Here are two ways to show confusion matrix pmfs and performance metric contours in ROC space. Both plots show the posterior predictive pmf of confusion matrices under a beta-binomial model of uncertainty for a classifier observed to produce the confusion matrix \[\begin{bmatrix}16&8\\4&32\end{bmatrix}\] The left plot uses circle areas to represent probability mass; the right plot uses ridge lines. In the background are the contours of the \(F_1\) performance metric and in black are the contours \(F_1=\tfrac{4}{10}\) and \(F_1=\tfrac{2}{3}\), along each of which lie 11 points in ROC space.
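
The underlying pmf is straightforward to compute. Here is a sketch, assuming uniform Beta(1,1) priors and the dbbinom() function from the extraDistr package; the paper's own priors and plotting code may differ:

library(extraDistr)

# Observed confusion matrix [16 8; 4 32]: p = 20 positives, n = 40 negatives
a <- 16; p <- 20   # observed True Positives out of p actual positives
d <- 32; n <- 40   # observed True Negatives out of n actual negatives

# Posterior predictive pmfs of TP and TN in a replicate of the same size,
# under independent beta-binomial models with uniform Beta(1,1) priors
pmf.TP <- dbbinom(0:p, size = p, alpha = a + 1, beta = p - a + 1)
pmf.TN <- dbbinom(0:n, size = n, alpha = d + 1, beta = n - d + 1)

# Joint pmf over the (p+1) x (n+1) points of the ROC lattice
joint <- outer(pmf.TP, pmf.TN)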

Visualising how uncertainty changes with more data

Reducing uncertainty in performance metrics requires more data: larger samples increase the precision of the predictive distribution of confusion matrices. These four contour plots show the posterior predictive pmfs (under a beta-binomial model of uncertainty) after observing confusion matrices of increasing size but with the same false and true positive rates (FPR = 0.2, TPR = 0.8). From left to right, the sizes of the confusion matrices increase by a factor of 4, and the heights and widths of the contours decrease by a factor of \(\tfrac{1}{2}\).
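
We can check this scaling numerically (again a sketch under a uniform-prior beta-binomial model, using dbbinom() as above, rather than the paper's code):

# Sd of the posterior predictive TPR after observing TPR = 0.8 out of p positives:
# each 4-fold increase in p roughly halves the spread
tpr.sd <- function(p, tpr = 0.8) {
  a   <- round(tpr * p)
  pmf <- extraDistr::dbbinom(0:p, size = p, alpha = a + 1, beta = p - a + 1)
  m   <- sum(pmf * (0:p) / p)
  sqrt(sum(pmf * ((0:p) / p)^2) - m^2)
}
sapply(c(10, 40, 160, 640), tpr.sd)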

We can add the contours of a performance metric (in this case MCC) in the background: