This document provides the RMarkdown behind the figures and interactive visualisations in our paper for anyone who wants to see how they were created or who wishes to extend them.
We begin by generating all possible confusion matrices of a given size. A confusion matrix comprises four non-negative integers; an ordered sequence of non-negative integers with a given sum is known as a weak composition of that sum. Here is a function to create all \(k\)-element weak compositions of total \(n\), based on code by Michel Billaud:
makeAllWeakCompositions <- function(n, k) {
  # Initialise the matrix that will hold all choose(n+k-1, k-1) compositions
  composition <- matrix(data = 0, nrow = choose(n + k - 1, k - 1), ncol = k)
  composition[1, k] <- n # Set the first composition (0, ..., 0, n)
  current.row <- 1       # Set the current row to the first row
  last.nonzero <- k      # The last non-zero element of the first row is in position k
  # While the first element of the current row is less than n...
  while (composition[current.row, 1] < n) {
    # generate the next row
    next.row <- current.row + 1
    # copy the current row into the next row
    composition[next.row, ] <- composition[current.row, ]
    # turn a b ... y z 0 0 ... 0
    #              ^ last
    # into a b ... (y+1) 0 0 0 ... (z-1)
    last.nonzero <- max(which(composition[next.row, ] > 0))
    z <- composition[current.row, last.nonzero]
    composition[next.row, last.nonzero - 1] <- composition[current.row, last.nonzero - 1] + 1
    composition[next.row, last.nonzero] <- 0
    composition[next.row, k] <- z - 1
    current.row <- next.row
  }
  return(composition)
}
To demonstrate, here are the first six and the last six weak compositions of four elements that sum to 5:
makeAllWeakCompositions(5,4) %>% head()
[,1] [,2] [,3] [,4]
[1,] 0 0 0 5
[2,] 0 0 1 4
[3,] 0 0 2 3
[4,] 0 0 3 2
[5,] 0 0 4 1
[6,] 0 0 5 0
makeAllWeakCompositions(5,4) %>% tail()
[,1] [,2] [,3] [,4]
[51,] 3 1 1 0
[52,] 3 2 0 0
[53,] 4 0 0 1
[54,] 4 0 1 0
[55,] 4 1 0 0
[56,] 5 0 0 0
The number \(C_k^{'}(n)\) of compositions of a number \(n\) of length \(k\) (where 0 is allowed) is given by (Weisstein, n.d.):
\[ \begin{align} C_k^{'}(n) &= \binom{n+k-1}{k-1}\\ &=\frac{(n+k-1)!}{n!(k-1)!} \end{align} \]
so \[ \begin{align} C_4^{'}(0) &= 1\\ C_4^{'}(1) &= 4\\ C_4^{'}(2) &= 10\\ C_4^{'}(3) &= 20\\ C_4^{'}(4) &= 35\\ C_4^{'}(5) &= 56\\ &\vdots\\ C_4^{'}(100) &= 1.76851\times 10^{5} \end{align} \]
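As a quick check, the number of rows returned by makeAllWeakCompositions() agrees with this closed-form count:

# Check that the enumeration matches the closed-form count
nrow(makeAllWeakCompositions(5, 4)) # 56
choose(5 + 4 - 1, 4 - 1)            # 56
sapply(0:5, function(n) nrow(makeAllWeakCompositions(n, 4))) # 1 4 10 20 35 56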
Now that we can generate all possible confusion matrices of a given total, we can augment them with various performance metrics.
The following function returns a dataframe representing all possible confusion matrices of size \(n\), projected into three dimensions, with a range of performance metrics added as columns:
make.confmat <- function(n) {
  # Requires dplyr and tibble (for %>%, mutate(), bind_cols() and as_tibble()),
  # and the helpers zdiv() and MCC() (see the sketch below)
  # This matrix is used to project the four-dimensional confusion matrix into three dimensions
  project3d <- matrix(
    c(
       0  ,  0  ,  1  , # TP
       0  ,  1  ,  0  , # FP
       1  ,  0  ,  0  , # FN
      -1/3, -1/3, -1/3  # TN
    ), byrow = TRUE, nrow = 4
  )
  makeAllWeakCompositions(n, 4) -> abcd       # All confusion matrices of size n
  colnames(abcd) <- c("TP", "FP", "FN", "TN") # with columns named after confusion matrix elements
  abcd %*% project3d -> xyz                   # ...projected into 3D
  colnames(xyz) <- c("x", "y", "z")           # with columns named after the three dimensions
  bind_cols(as_tibble(abcd), as_tibble(xyz)) %>% # ...bound side by side
    mutate(                                   # and augmented with...
      text        = sprintf("%2d %2d\n%2d %2d", TP, FP, FN, TN), # Label for plotly
      Pos         = TP + FN,             # Number of actual positives
      Neg         = FP + TN,             # Number of actual negatives
      TPR         = TP / Pos,            # True Positive Rate
      FPR         = FP / Neg,            # False Positive Rate
      PLR         = TPR / FPR,           # Positive Likelihood Ratio (LR+)
      TNR         = TN / Neg,            # True Negative Rate
      FNR         = FN / Pos,            # False Negative Rate
      NLR         = FNR / TNR,           # Negative Likelihood Ratio (LR-)
      DOR         = PLR / NLR,           # Diagnostic Odds Ratio
      prior.O     = Pos / Neg,           # Prior odds of actual class being X
      prior.P     = zdiv(Pos, Neg),      # Prior prob of actual class being X
      post.O      = TP / FP,             # Posterior odds that actual class is X
      post.P      = zdiv(TP, FP),        # Posterior probability that actual class is X
      prior.O.n   = Neg / Pos,           # Prior odds of actual class NOT being X
      prior.P.n   = zdiv(Neg, Pos),      # Prior prob of actual class NOT being X
      post.O.n    = TN / FN,             # Posterior odds that actual class is NOT X
      post.P.n    = zdiv(TN, FN),        # Posterior probability that actual class is NOT X
      MCC         = MCC(TP, FP, FN, TN), # Matthews correlation coefficient
      logDOR      = log(DOR),            # log of DOR
      #slogDOR    = logDOR / log((Pos - 1) * (Neg - 1)), # scaled log of DOR
      J           = TPR + TNR - 1,       # Youden's J (= 2 x Balanced Accuracy - 1)
      Acc         = (TP + TN) / (Pos + Neg),     # Accuracy
      F1          = 2 * TP / (2 * TP + FP + FN), # F1
      Markedness  = post.P + post.P.n - 1,       # Markedness
      g.mean      = sqrt(TPR * TNR),             # Geometric mean
      Prev.Thresh = sqrt(FPR) / (sqrt(TPR) + sqrt(FPR)), # Prevalence threshold
      Threat.Scr  = TP / (TP + FN + FP),         # Threat score
      Fowlkes.M   = sqrt(post.P * TPR)           # Fowlkes-Mallows index
    )
}
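make.confmat() relies on two helper functions, zdiv() and MCC(), that are not defined in this section. Here is a minimal sketch consistent with how they are used above, assuming zdiv(a, b) is a zero-safe a/(a+b) (so that zdiv(TP, FP) is the precision) and MCC() is the standard Matthews correlation coefficient; the actual definitions in the source may differ.

zdiv <- function(a, b) {
  # Zero-safe a/(a+b): returns NA rather than NaN when a + b == 0
  ifelse(a + b == 0, NA, a / (a + b))
}
MCC <- function(TP, FP, FN, TN) {
  # Standard Matthews correlation coefficient, NA when any margin is empty
  denom <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  ifelse(denom == 0, NA, (TP * TN - FP * FN) / denom)
}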
http://bit.ly/see-ROC-reference-points shows all possible \((p+1)\times(n+1)\) points in ROC and Precision-Recall spaces corresponding to confusion matrices of size \(N = p+n\), coloured from red (low) to blue (high) Balanced Accuracy. Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window.
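The lattice of points underlying that visualisation is easy to reproduce in R. Here is a minimal sketch (roc.points is our name, not part of the paper's code), enumerating all \((p+1)\times(n+1)\) ROC points together with the Balanced Accuracy used for colouring:

# All (p+1) x (n+1) ROC points for p actual positives and n actual negatives
roc.points <- function(p, n) {
  expand.grid(TP = 0:p, FP = 0:n) %>%
    mutate(
      TPR = TP / p,             # true positive rate (y-axis)
      FPR = FP / n,             # false positive rate (x-axis)
      BA  = (TPR + 1 - FPR) / 2 # Balanced Accuracy, used for colouring
    )
}
head(roc.points(20, 80))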
http://bit.ly/see-confusion-metrics enables us to interactively visualise a range of confusion matrix performance metrics by plotting their contours, coloured from red (low) to white (middle) to blue (high).
Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window, and can set the position of a test point by adjusting the \(a_1\) and \(d_1\) sliders. There are many things that users can turn on and off by clicking on the small circles at the left edge of the screen: Show Accuracy, Show MCC, through to Show Geometric Mean, which, when activated, display the contours of the chosen performance metrics.
http://bit.ly/see-confusion-uncertainty enables interactive exploration of the posterior predictive pmfs of confusion matrices and three performance metrics (MCC, BA, \(F_1\)) under binomial and beta-binomial models of uncertainty.
Users can change \(N\) and \(p\) by adjusting the sliders on the left-hand side of the Desmos window, and can set the position of a test point by adjusting the \(a\) and \(d\) sliders. There are many things that users can turn on and off by clicking on the small circles at the left edge of the screen, including a Show PMF... switch for each performance metric. There are also switches to show the unique performance metric values (Show rug...), the number of times these unique values are observed (Show count...), and a histogram summary of the probability mass functions (Show histogram...).
As noted on the visualisation, Desmos supports lists of up to \(10\,000\) elements, so \(np\) must not exceed \(10\,000\) for the joint probability mass distributions to plot. This visualisation runs in your web browser, and interaction becomes slow for larger values of \(np\); we recommend starting with \(N=60\), \(p=20\).
The helper function plot.simplex() generates an interactive 3D visualisation of a confusion matrix, coloured by a chosen metric. Here it is used to provide 3D projections of binary confusion matrices of size 100. Each point corresponds to a unique confusion matrix and is coloured by the value of that matrix's Matthews Correlation Coefficient (MCC). For reference, we label the four extreme points corresponding to all True Positives (TP=100), all False Negatives (FN=100), etc., and connect those vertices to give an impression of the regular tetrahedral lattice (i.e., the 3-simplex) of the projected points. In total, there are \(\binom{100+4-1}{4-1}=176\,851\) different binary confusion matrices of size 100. Rather than show all of these, we have taken three slices through the lattice: from back to front, the rectangular lattices of points correspond to confusion matrices where \(p = 20, 50, 90\), respectively.
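The object confmat.100 used below is not constructed in this section; presumably it is the output of make.confmat() at \(n=100\), along the lines of:

confmat.100 <- make.confmat(100) # all 176,851 confusion matrices of size 100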
plot.simplex(confmat.100, metric="MCC")
Mouse over the tetrahedron, then click and drag to change its orientation. Click on the text Pos==20 to toggle that slice of the confusion matrix.
Here is the same confusion simplex, this time coloured by Accuracy.
plot.simplex(confmat.100, metric="Acc")
Here is a confusion matrix representing \(N=a+b+c+d\) examples \[ \begin{bmatrix} \mathrm{TP} & \mathrm{FP}\\ \mathrm{FN} & \mathrm{TN} \end{bmatrix}= \begin{bmatrix} a & b\\ c & d \end{bmatrix} \] in which there are \(p=a+c\) actual positives and \(n=b+d\) actual negatives.
The Matthews Correlation Coefficient is defined to be \[ \begin{align} \mathrm{MCC}(a, b, c, d) &=\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\\ &=\frac{ad-(n-d)(p-a)}{\sqrt{(a+n-d)pn(p-a+d)}}. \end{align} \]
For given numbers of positives (\(p\)) and negatives (\(n\)), this performance metric achieves a value of \(-1 \leq k \leq 1\) along the contour lines with \[ a(k, p, n, d) = \left\{ \begin{array}{ c l } \frac{1}{2 (k^2 p + n)} \left( +\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k \geq 0\\ \frac{1}{2 (k^2 p + n)} \left( -\sqrt{ \frac {k^2 p (n + p)^2 (4d(n-d) + k^2 n p)} {n} } + 2dp(k^2 - 1) + k^2p(p - n) + 2np \right), & k < 0 \end{array} \right. \]
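To make the piecewise expression concrete, here is a direct R transcription (a.contour is a hypothetical name, not from the paper's source). As a sanity check, the confusion matrix \(\begin{bmatrix}16&8\\4&32\end{bmatrix}\) used later has \(p=20\), \(n=40\) and MCC \(=1/\sqrt{3}\), so the \(k=1/\sqrt{3}\) contour evaluated at \(d=32\) should recover \(a=16\):

# Solve the MCC contour equation for a, given k, p, n and d
a.contour <- function(k, p, n, d) {
  root <- sqrt(k^2 * p * (n + p)^2 * (4 * d * (n - d) + k^2 * n * p) / n)
  sgn  <- ifelse(k >= 0, 1, -1) # choose the branch by the sign of k
  (sgn * root + 2 * d * p * (k^2 - 1) + k^2 * p * (p - n) + 2 * n * p) /
    (2 * (k^2 * p + n))
}
a.contour(1 / sqrt(3), p = 20, n = 40, d = 32) # 16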
To illustrate these contours in ROC space, here is an orthographic projection of the slice of points from the confusion simplex shown above where \(p=20\) and \(n=80\), coloured by the value of the Matthews Correlation Coefficient (MCC). The continuous lines indicate the contours of MCC at \(-0.9, -0.8, \dots, 0.9\). Note that while MCC can be calculated for continuous arguments, empirical confusion matrices give rise to only a finite set of \((p+1)\times(n+1)\) arguments, corresponding to the points in this 2D lattice.
…and here are the same points and contours scaled to fit within the ROC space. ROC curves plot a classifier's true positive rate against its false positive rate within the unit square \([0,1]\times[0,1]\). This is equivalent to re-scaling the \(x\)-axis of (a) by a factor of \(\tfrac1n\) and the \(y\)-axis by \(\tfrac1p\). Again, the contours of the MCC performance metric are defined continuously, but empirical confusion matrices can only take on values at the discrete points in this plot, which have \(n+1=81\) possible \(x\)-values \((0, \tfrac1n, \tfrac2n, \dots, 1)\) and \(p+1=21\) possible \(y\)-values \((0, \tfrac1p, \tfrac2p, \dots, 1)\).
Each panel shows all the possible points in the ROC space of confusion matrices of 20 positive and 40 negative examples (top row) and 20 positive and 41 negative examples (bottom row). Points are coloured by the number of times the performance metric value at that point is observed in the confusion matrices of those totals. Three different performance metrics are presented: MCC (left), BA (middle), \(F_1\) (right). Performance metric contours are shown in the background, coloured by their value. Note that one additional negative example changes the configuration of possible points in ROC space so that each possible MCC and BA value is unique (bottom left and middle); the multiplicity of different \(F_1\) values remains much the same (bottom right).
Here are two ways to show confusion matrix pmfs and performance metric contours in ROC space. Both plots show the posterior predictive pmf of confusion matrices under a beta-binomial model of uncertainty for a classifier observed to produce the confusion matrix \[\begin{bmatrix}16&8\\4&32\end{bmatrix}\] The left plot uses circle areas to represent probability mass; the right plot uses ridge lines. In the background are the contours of the \(F_1\) performance metric and in black are the contours \(F_1=\tfrac{4}{10}\) and \(F_1=\tfrac{2}{3}\), along each of which lie 11 points in ROC space.
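One way to compute such a posterior predictive pmf in R is sketched below. It assumes independent beta-binomial models for the positive and negative columns with uniform Beta(1, 1) priors, which may differ from the paper's exact choices, and uses dbbinom() from the extraDistr package:

# Posterior predictive pmf over future confusion matrices for the observed
# matrix [16 8; 4 32], under independent beta-binomials with Beta(1, 1) priors
library(extraDistr) # provides dbbinom(x, size, alpha, beta)

TP <- 16; FP <- 8; FN <- 4; TN <- 32
p <- TP + FN # 20 actual positives
n <- FP + TN # 40 actual negatives

pmf.TP <- dbbinom(0:p, size = p, alpha = TP + 1, beta = FN + 1) # future TP counts
pmf.FP <- dbbinom(0:n, size = n, alpha = FP + 1, beta = TN + 1) # future FP counts

joint <- outer(pmf.TP, pmf.FP) # joint pmf over (TP, FP), assuming independence
dimnames(joint) <- list(TP = 0:p, FP = 0:n)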
Reducing uncertainty in performance metrics requires more data to increase the precision of the predictive distribution of confusion matrices. These four contour plots show the posterior predictive pmfs (under a beta-binomial model of uncertainty) after observing confusion matrices of increasing size but with the same false and true positive rates (0.2, 0.8). From left to right, the sizes of the confusion matrices increase by a factor of 4 and the heights and widths of the contours decrease by a factor of \(\tfrac{1}{2}\), consistent with the \(\tfrac{1}{\sqrt{N}}\) scaling of the standard error.
We can add the contours of a performance metric (in this case MCC) in the background: