API Reference

dazed.confusion_matrix

Confusion matrix module.

class dazed.confusion_matrix.ConfusionMatrix(y1, y2, labels=None, info=None)

Construct a confusion matrix.

Creates a confusion matrix from several input formats (sparse labels, one-hot arrays, or pandas DataFrames) and provides methods for exploring the results.

__init__(y1, y2, labels=None, info=None)

Construct a confusion matrix from sparse values.

In most cases, prefer the from_* class methods (from_sparse, from_onehot, from_df) instead, as they also support multilabel data.

Parameters
  • y1 (List[Union[str, int]]) – A list of true labels.

  • y2 (List[Union[str, int]]) – A list of predicted labels.

  • labels (Optional[List[Union[str, int]]]) – A list of all possible labels, including any that do not appear in y1 or y2.

  • info (Optional[List[Any]]) – A list containing any additional info about each sample.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> ConfusionMatrix(truth, pred)
  | 0 1 2     index | label
---------     -------------
0 | 1 1 0         0 |   cat
1 | 1 1 0         1 |   dog
2 | 0 0 1         2 |  fish
---------     -------------
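The constructor takes parallel lists of true (y1) and predicted (y2) labels. As a rough mental model, the matrix can be built by counting (true, predicted) pairs, with rows for true labels and columns for predicted labels, which is consistent with the example output above. The sketch below is plain Python, not dazed's implementation; the helper name confusion_counts and the sorted label order are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple, Union

Label = Union[str, int]


def confusion_counts(
    y1: Sequence[Label],
    y2: Sequence[Label],
    labels: Optional[Sequence[Label]] = None,
) -> Tuple[List[List[int]], List[Label]]:
    """Hypothetical helper: count (true, predicted) pairs into a square matrix.

    Rows are true labels (y1), columns are predicted labels (y2).
    Assumption: labels are sorted, so they must be mutually comparable.
    """
    if len(y1) != len(y2):
        raise ValueError("y1 and y2 must have the same length")
    # Observed labels plus any extra labels supplied by the caller.
    all_labels = sorted(set(y1) | set(y2) | set(labels or []))
    index = {label: i for i, label in enumerate(all_labels)}
    matrix = [[0] * len(all_labels) for _ in all_labels]
    for (true, pred), count in Counter(zip(y1, y2)).items():
        matrix[index[true]][index[pred]] += count
    return matrix, all_labels


truth = ["cat", "dog", "cat", "dog", "fish"]
pred = ["cat", "dog", "dog", "cat", "fish"]
matrix, names = confusion_counts(truth, pred)
print(names)   # ['cat', 'dog', 'fish']
print(matrix)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

Counting pairs with Counter keeps the work proportional to the number of samples, and diagonal cells hold the correct predictions.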
as_array(present_only=True)

Get confusion matrix as an array.

Parameters

present_only (bool) – Whether to return a matrix that includes only labels present in y1 and/or y2.

Return type

Tuple[ndarray, List[Union[str, int]]]

Returns

A tuple of the confusion matrix as a numpy array and the list of its labels, in row/column order.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> ConfusionMatrix.from_sparse(truth, pred).as_array()
(array([[1, 1, 0],
        [1, 1, 0],
        [0, 0, 1]]), ['cat', 'dog', 'fish'])
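The present_only flag presumably matters when extra labels are passed through labels: those labels get rows and columns of zeros, and present_only=True drops them again. The sketch below illustrates that filtering step in plain Python under that assumption; restrict_to_present is a hypothetical helper, not part of dazed.

```python
from typing import List, Sequence, Tuple, Union

Label = Union[str, int]


def restrict_to_present(
    matrix: List[List[int]],
    labels: Sequence[Label],
    y1: Sequence[Label],
    y2: Sequence[Label],
) -> Tuple[List[List[int]], List[Label]]:
    """Hypothetical helper: keep rows/columns for labels seen in y1 and/or y2."""
    present = set(y1) | set(y2)
    keep = [i for i, label in enumerate(labels) if label in present]
    # Slice the same indices on both axes so the matrix stays square.
    sub = [[matrix[r][c] for c in keep] for r in keep]
    return sub, [labels[i] for i in keep]


# Assumed full matrix where "bird" was supplied via labels but never observed.
full_labels = ["bird", "cat", "dog", "fish"]
full = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
truth = ["cat", "dog", "cat", "dog", "fish"]
pred = ["cat", "dog", "dog", "cat", "fish"]
print(restrict_to_present(full, full_labels, truth, pred))
# ([[1, 1, 0], [1, 1, 0], [0, 0, 1]], ['cat', 'dog', 'fish'])
```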
as_df(present_only=True)

Get confusion matrix as a pandas DataFrame.

Parameters

present_only (bool) – Whether to return a matrix that includes only labels present in y1 and/or y2.

Return type

DataFrame

Returns

A confusion matrix as a pandas DataFrame, indexed by true labels with predicted labels as columns.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> ConfusionMatrix.from_sparse(truth, pred).as_df()
      cat  dog  fish
cat     1    1     0
dog     1    1     0
fish    0    0     1
as_str(present_only=True)

Get confusion matrix as a string.

Parameters

present_only (bool) – Whether to return a matrix that includes only labels present in y1 and/or y2.

Return type

str

Returns

A confusion matrix string.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> print(ConfusionMatrix.from_sparse(truth, pred).as_str())
  | 0 1 2     index | label
---------     -------------
0 | 1 1 0         0 |   cat
1 | 1 1 0         1 |   dog
2 | 0 0 1         2 |  fish
---------     -------------
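The string form places an index-based matrix (rows are true labels, columns are predicted labels) beside an index-to-label legend, so long label names do not stretch the matrix. The sketch below reconstructs that layout in plain Python for the example above; format_matrix is a hypothetical helper, and the column-width rules for multi-digit counts or long labels are assumptions, not dazed's code.

```python
from typing import List, Sequence, Union

Label = Union[str, int]


def format_matrix(matrix: List[List[int]], labels: Sequence[Label]) -> str:
    """Hypothetical helper: render a matrix next to an index -> label legend."""
    n = len(labels)
    # Assumption: one cell is as wide as the largest index or count.
    cell = max(len(str(v)) for v in [n - 1, *(x for row in matrix for x in row)])
    idx_w = len(str(n - 1))
    header = " " * idx_w + " | " + " ".join(str(i).rjust(cell) for i in range(n))
    rows = [
        str(i).rjust(idx_w) + " | " + " ".join(str(v).rjust(cell) for v in row)
        for i, row in enumerate(matrix)
    ]
    # Legend column is at least as wide as the word "label".
    label_w = max(len("label"), *(len(str(lab)) for lab in labels))
    legend_header = "index | " + "label".rjust(label_w)
    legend = [
        str(i).rjust(len("index")) + " | " + str(lab).rjust(label_w)
        for i, lab in enumerate(labels)
    ]
    gap = " " * 5
    rule = "-" * len(header) + gap + "-" * len(legend_header)
    lines = [header + gap + legend_header, rule]
    lines += [r + gap + g for r, g in zip(rows, legend)]
    lines.append(rule)
    return "\n".join(lines)


print(format_matrix([[1, 1, 0], [1, 1, 0], [0, 0, 1]], ["cat", "dog", "fish"]))
```

For this input the printed text matches the as_str example above line for line.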
classmethod from_df(df, y1_names, y2_names, labels=None, info_names=None)

Construct a confusion matrix from a pandas DataFrame.

Parameters
  • df (DataFrame) – A pandas dataframe containing either a column of sparse labels or multiple columns of onehot encoded values.

  • y1_names (Union[List[str], str]) – The name of the true-label column, or a list of onehot true-label column names.

  • y2_names (Union[List[str], str]) – The name of the predicted-label column, or a list of onehot predicted-label column names.

  • labels (Optional[List[Union[str, int]]]) – A list of all possible labels, including any that do not appear in y1 or y2. For onehot columns, the label names in the same order as the columns.

  • info_names (Optional[List[str]]) – A list of column names to use for additional sample info.

Returns

A confusion matrix.

Raises

ValueError – If y1_names or y2_names is not the correct type.

Example

>>> sparse_df = pd.DataFrame()
>>> sparse_df["truth"] = ["cat", "dog", "cat", "dog", "fish"]
>>> sparse_df["pred"] = ["cat", "dog", "dog", "cat", "fish"]
>>> ConfusionMatrix.from_df(sparse_df, "truth", "pred")
  | 0 1 2     index | label
---------     -------------
0 | 1 1 0         0 |   cat
1 | 1 1 0         1 |   dog
2 | 0 0 1         2 |  fish
---------     -------------
>>> onehot_df = pd.DataFrame()
>>> onehot_df["cat_truth"] = [0, 1, 0, 1]
>>> onehot_df["dog_truth"] = [1, 0, 1, 0]
>>> onehot_df["cat_pred"] = [0, 1, 1, 0]
>>> onehot_df["dog_pred"] = [1, 0, 0, 1]
>>> ConfusionMatrix.from_df(
...    onehot_df,
...    ["cat_truth", "dog_truth"],
...    ["cat_pred", "dog_pred"],
...    ["cat", "dog"],
... )
  | 0 1     index | label
-------     -------------
0 | 1 1         0 |   cat
1 | 1 1         1 |   dog
-------     -------------
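The type of y1_names and y2_names selects the input layout: a single column name means sparse labels, a list means onehot columns, and anything else raises ValueError. The sketch below shows that dispatch in plain Python, using a dict of column lists as a stand-in for a pandas DataFrame; column_labels is a hypothetical helper, it handles single-label onehot rows only, and falling back to column names when labels is omitted is an assumption.

```python
from typing import Any, Dict, List, Optional, Sequence, Union

Label = Union[str, int]


def column_labels(
    columns: Dict[str, List[Any]],
    names: Union[str, List[str]],
    labels: Optional[Sequence[Label]] = None,
) -> List[Label]:
    """Hypothetical helper: turn a sparse column or onehot columns into labels."""
    if isinstance(names, str):
        # A single column name: the column already holds sparse labels.
        return list(columns[names])
    if isinstance(names, list):
        # Assumption: without labels, the column names double as label names.
        label_names = list(labels) if labels is not None else names
        rows = zip(*(columns[n] for n in names))
        # Single-label onehot: each row holds exactly one 1.
        return [label_names[row.index(1)] for row in rows]
    raise ValueError("names must be a str or a list of str")


onehot = {"cat_truth": [0, 1, 0, 1], "dog_truth": [1, 0, 1, 0]}
print(column_labels(onehot, ["cat_truth", "dog_truth"], ["cat", "dog"]))
# ['dog', 'cat', 'dog', 'cat']
```

Once both sides are decoded to sparse labels, the rest of the work is the same as from_sparse.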
classmethod from_onehot(y1, y2, labels=None, info=None, multilabel=False)

Construct a confusion matrix from onehot encoded values.

Parameters
  • y1 (ndarray) – An array of onehot encoded true labels of shape [num_samples, num_labels].

  • y2 (ndarray) – An array of onehot encoded predicted labels of shape [num_samples, num_labels].

  • labels (Optional[List[Union[str, int]]]) – A list of label names, in the same order as the columns of y1 and y2.

  • info (Optional[List[Any]]) – A list containing any additional info about each sample.

  • multilabel (bool) – Indicates whether each sample can have multiple labels.

Returns

A confusion matrix.

Example

>>> truth = np.array([[0, 1], [1, 0], [0, 1], [1, 0]])
>>> pred = np.array([[0, 1], [1, 0], [1, 0], [0, 1]])
>>> ConfusionMatrix.from_onehot(truth, pred, ["cat", "dog"])
  | 0 1     index | label
-------     -------------
0 | 1 1         0 |   cat
1 | 1 1         1 |   dog
-------     -------------
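Decoding each onehot row back to label names reduces this case to sparse input: with multilabel=False each row names one label, and with multilabel=True every column set to 1 contributes a label. The sketch below does that decoding with plain lists instead of numpy arrays; onehot_to_sparse is a hypothetical helper, and using an argmax for the single-label case is an assumption about how dazed reads each row.

```python
from typing import List, Sequence, Union

Label = Union[str, int]


def onehot_to_sparse(
    rows: Sequence[Sequence[int]],
    labels: Sequence[Label],
    multilabel: bool = False,
) -> Union[List[Label], List[List[Label]]]:
    """Hypothetical helper: decode [num_samples, num_labels] rows into labels."""
    if multilabel:
        # Every nonzero column contributes a label for that sample.
        return [[labels[j] for j, v in enumerate(row) if v] for row in rows]
    # Assumption: single-label rows are decoded with an argmax.
    return [labels[max(range(len(row)), key=row.__getitem__)] for row in rows]


truth = [[0, 1], [1, 0], [0, 1], [1, 0]]
pred = [[0, 1], [1, 0], [1, 0], [0, 1]]
print(onehot_to_sparse(truth, ["cat", "dog"]))  # ['dog', 'cat', 'dog', 'cat']
print(onehot_to_sparse(pred, ["cat", "dog"]))   # ['dog', 'cat', 'cat', 'dog']
```

The decoded pairs (dog, dog), (cat, cat), (dog, cat), (cat, dog) give the all-ones 2x2 matrix shown in the example above.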
classmethod from_sparse(y1, y2, labels=None, info=None, multilabel=False)

Construct a confusion matrix from sparse values.

Parameters
  • y1 (Union[List[Union[str, int]], List[List[Union[str, int]]]]) – A list of true labels (a list of lists if multilabel).

  • y2 (Union[List[Union[str, int]], List[List[Union[str, int]]]]) – A list of predicted labels (a list of lists if multilabel).

  • labels (Optional[List[Union[str, int]]]) – A list of all possible labels, including any that do not appear in y1 or y2.

  • info (Optional[List[Any]]) – A list containing any additional info about each sample.

  • multilabel (bool) – Indicates whether each sample can have multiple labels.

Returns

A confusion matrix.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> ConfusionMatrix.from_sparse(truth, pred)
  | 0 1 2     index | label
---------     -------------
0 | 1 1 0         0 |   cat
1 | 1 1 0         1 |   dog
2 | 0 0 1         2 |  fish
---------     -------------
label_pair_info(label_1, label_2)

Get sample info by label pair.

Parameters
  • label_1 (Union[int, str]) – A true label.

  • label_2 (Union[int, str]) – A predicted label.

Return type

List[Any]

Returns

A list of info for samples that had a true label of label_1 and predicted label of label_2.

Raises

ValueError – If label_1 or label_2 is not present.

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "dog", "dog", "cat", "fish"]
>>> filenames = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
>>> cm = ConfusionMatrix.from_sparse(truth, pred, info=filenames)
>>> cm.label_pair_info("cat", "dog")
['img2.jpg']
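Looking up info by label pair amounts to grouping each sample's info under its (true, predicted) key. The sketch below builds that index once and answers lookups from it; PairIndex is a hypothetical class, not dazed's implementation, and returning an empty list for a valid pair that never occurred is an assumption.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List, Sequence, Tuple, Union

Label = Union[str, int]


class PairIndex:
    """Hypothetical helper: group per-sample info by (true, predicted) label."""

    def __init__(self, y1: Sequence[Label], y2: Sequence[Label], info: Sequence[Any]):
        self.labels = set(y1) | set(y2)
        self._groups: DefaultDict[Tuple[Label, Label], List[Any]] = defaultdict(list)
        for true, pred, item in zip(y1, y2, info):
            self._groups[(true, pred)].append(item)

    def label_pair_info(self, label_1: Label, label_2: Label) -> List[Any]:
        for label in (label_1, label_2):
            if label not in self.labels:
                raise ValueError(f"label not present: {label!r}")
        # Assumption: a valid pair with no samples returns an empty list.
        return list(self._groups.get((label_1, label_2), []))


truth = ["cat", "dog", "cat", "dog", "fish"]
pred = ["cat", "dog", "dog", "cat", "fish"]
filenames = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg"]
print(PairIndex(truth, pred, filenames).label_pair_info("cat", "dog"))  # ['img2.jpg']
```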
most_confused()

Get a list of label confusions (true label, predicted label) and their counts.

Return type

List[Tuple[Union[int, str], Union[int, str], int]]

Returns

A list of tuples of format (label1, label2, number of confusions).

Example

>>> truth = ["cat", "dog", "cat", "dog", "fish"]
>>> pred = ["cat", "cat", "dog", "cat", "fish"]
>>> cm = ConfusionMatrix.from_sparse(truth, pred)
>>> cm.most_confused()
[('dog', 'cat', 2), ('cat', 'dog', 1)]
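The confusions are the nonzero off-diagonal cells of the matrix, and the example lists them from largest count to smallest. The sketch below reads them from a plain nested-list matrix; most_confused here is a standalone hypothetical function, and both the descending ordering and the tie-breaking (row-major order among equal counts) are assumptions inferred from the example.

```python
from typing import List, Sequence, Tuple, Union

Label = Union[str, int]


def most_confused(
    matrix: Sequence[Sequence[int]], labels: Sequence[Label]
) -> List[Tuple[Label, Label, int]]:
    """Hypothetical helper: list nonzero off-diagonal (true, predicted, count)."""
    pairs = [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count > 0
    ]
    # Assumption: largest count first; sorted() stays stable with reverse=True,
    # so equal counts keep row-major order.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Matrix for the example above: rows are true labels, columns are predictions.
matrix = [[1, 1, 0], [2, 0, 0], [0, 0, 1]]
print(most_confused(matrix, ["cat", "dog", "fish"]))
# [('dog', 'cat', 2), ('cat', 'dog', 1)]
```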