Performance Metrics
calculate_performance_measures
- cherimoya.performance.calculate_performance_measures(logps, true_counts, pred_log_counts, labels=None, kernel_sigma=7, kernel_width=81, smooth_true=False, smooth_predictions=False, measures=None)
Calculates a set of performance measures given observed and predicted data.
This function will take in observed readouts, predicted profiles, and predicted counts, and calculate a series of specified performance measures on them. Each performance measure could be calculated individually using its function, but this function provides a wrapper around running any number of them. The measures one can choose are:
Profile performance measures:
profile_mnll: the multinomial log-likelihood of the observed profile given the predicted logits.
profile_jsd: the Jensen-Shannon divergence between the observed profile and the predicted probabilities.
profile_pearson: the Pearson correlation between the observed profile and the predicted probabilities.
profile_spearman: the Spearman correlation between the observed profile and the predicted probabilities.
Count performance measures:
count_pearson: the Pearson correlation between the observed log counts and the predicted log counts.
count_spearman: the Spearman correlation between the observed log counts and the predicted log counts.
count_mse: the mean-squared error between the observed log counts and the predicted log counts.
Optionally, one can smooth the observed data before calculating the profile correlations and JSD by setting smooth_true. This smoothing is applied to the observed bp-resolution counts rather than to the predictions, the reasoning being that these counts are sparse because of their bp resolution. The predictions are smoothed only if smooth_predictions is set. Smoothing uses a Gaussian kernel with the specified sigma and kernel width.
- Parameters:
logps (torch.Tensor, shape=(n, n_strands, length)) – The predicted logits or log probabilities for each basepair for each strand. If the predictions are unstranded, the n_strands dimension must be 1.
true_counts (torch.Tensor, shape=(n, n_strands, length)) – The integer counts of the number of reads per basepair in the observed data.
pred_log_counts (torch.Tensor, shape=(n, n_outputs)) – The predicted log counts for each example.
kernel_sigma (int, optional) – If smoothing the observed profile, the sigma to use in the Gaussian smoothing. Default is 7.
kernel_width (int, optional) – If smoothing the observed profile, the kernel width to use in the Gaussian smoothing. Default is 81.
smooth_true (bool, optional) – Whether to smooth the observed data using a Gaussian kernel. Default is False.
smooth_predictions (bool, optional) – Whether to smooth the predicted values using a Gaussian kernel. Default is False.
measures (None or list, optional) – If a list of strings, each string should correspond to a performance measure to calculate. If None, calculate all performance measures.
- Returns:
measures_ – A dictionary where the keys are the names of performance measures and the values are tensors containing the values. Each profile performance measure will have the shape (n, 1) and each count performance measure will have the shape (1,).
- Return type:
dict of torch.Tensors
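As an illustration of the first profile measure listed above, the multinomial log-likelihood of observed counts given predicted logits can be sketched in NumPy. This is a reference sketch only, not the library's code: the function name `multinomial_log_likelihood` is hypothetical, and whether the value stored under `profile_mnll` is negated or normalized differently is not stated here.

```python
import math
import numpy as np

def multinomial_log_likelihood(logits, true_counts):
    """Log-likelihood of counts under softmax(logits), per last-axis profile.

    Hypothetical helper for illustration; the sign convention is an assumption.
    """
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(true_counts, dtype=float)

    # Stable log-softmax along the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    logps = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # log(N!) - sum_i log(k_i!) + sum_i k_i * log(p_i), using lgamma(k + 1) = log(k!).
    lgamma = np.vectorize(math.lgamma)
    log_fact_total = lgamma(counts.sum(axis=-1) + 1)
    log_fact_each = lgamma(counts + 1).sum(axis=-1)
    return log_fact_total - log_fact_each + (counts * logps).sum(axis=-1)

logits = np.zeros((1, 1, 2))        # uniform prediction over two positions
counts = np.array([[[1.0, 1.0]]])   # one read observed at each position
ll = multinomial_log_likelihood(logits, counts)
```

Here the probability of seeing one read at each of two equally likely positions is 2 × 0.5² = 0.5, so `ll` holds log(0.5) with shape (1, 1).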
Profile Metrics
- cherimoya.performance.pearson_corr(arr1, arr2)
The Pearson correlation between two tensors across the last axis.
Computes the Pearson correlation in the last dimension of arr1 and arr2. arr1 and arr2 must be the same shape. For example, if they are both A x B x L arrays, then the correlation of corresponding L-arrays will be computed and returned in an A x B array.
- Parameters:
arr1 (torch.Tensor) – One of the tensors to correlate.
arr2 (torch.Tensor) – The other tensor to correlate.
- Returns:
correlation – The correlation for each pair of corresponding subarrays along the last axis, so the last dimension (of length arr1.shape[-1]) is reduced away.
- Return type:
torch.Tensor
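The last-axis reduction described for pearson_corr can be sketched as follows. This is a NumPy illustration of the computation, not the library's implementation, and `pearson_last_axis` is a hypothetical name.

```python
import numpy as np

def pearson_last_axis(arr1, arr2):
    """Pearson correlation of corresponding subarrays along the last axis."""
    a = np.asarray(arr1, dtype=float)
    b = np.asarray(arr2, dtype=float)

    # Center each subarray, then divide the covariance by the product of norms.
    a = a - a.mean(axis=-1, keepdims=True)
    b = b - b.mean(axis=-1, keepdims=True)
    num = (a * b).sum(axis=-1)
    den = np.sqrt((a ** 2).sum(axis=-1) * (b ** 2).sum(axis=-1))
    return num / den

x = np.arange(24, dtype=float).reshape(2, 3, 4)   # A x B x L with A=2, B=3, L=4
r_pos = pearson_last_axis(x, 2 * x + 1)           # perfectly linear, increasing
r_neg = pearson_last_axis(x, -x)                  # perfectly linear, decreasing
```

Both results have shape (2, 3). In this sketch a constant subarray has zero variance, so its correlation comes out as NaN.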
- cherimoya.performance.spearman_corr(arr1, arr2)
The Spearman correlation between two tensors across the last axis.
Computes the Spearman correlation in the last dimension of arr1 and arr2. arr1 and arr2 must be the same shape. For example, if they are both A x B x L arrays, then the correlation of corresponding L-arrays will be computed and returned in an A x B array.
A dense ordering is used and ties are broken based on position in the tensor.
- Parameters:
arr1 (torch.Tensor) – One of the tensors to correlate. This can have any number of dimensions, but the correlation is calculated across the last dimension.
arr2 (torch.Tensor) – The other tensor to correlate.
- Returns:
correlation – The correlation for each element.
- Return type:
torch.Tensor
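A Spearman correlation is a Pearson correlation computed on ranks. The sketch below assigns ordinal ranks with ties broken by position, which is one reading of the tie-handling note above and may differ from the library's exact ranking. The names `ordinal_ranks` and `spearman_last_axis` are hypothetical.

```python
import numpy as np

def ordinal_ranks(arr):
    """Rank values along the last axis; a stable sort breaks ties by position."""
    order = np.argsort(arr, axis=-1, kind="stable")
    # The argsort of a permutation is its inverse, i.e. the rank of each element.
    return np.argsort(order, axis=-1, kind="stable").astype(float)

def spearman_last_axis(arr1, arr2):
    """Pearson correlation of the ranks of corresponding subarrays."""
    r1 = ordinal_ranks(np.asarray(arr1))
    r2 = ordinal_ranks(np.asarray(arr2))
    r1 = r1 - r1.mean(axis=-1, keepdims=True)
    r2 = r2 - r2.mean(axis=-1, keepdims=True)
    den = np.sqrt((r1 ** 2).sum(axis=-1) * (r2 ** 2).sum(axis=-1))
    return (r1 * r2).sum(axis=-1) / den

a = np.array([[0.1, 0.5, 0.2, 0.9],
              [3.0, 1.0, 2.0, 0.0]])
rho_same = spearman_last_axis(a, a ** 3)   # monotone transform keeps the ranks
rho_flip = spearman_last_axis(a, -a)       # reversing the order flips the sign
```

Because only the ordering matters, any strictly increasing transform of one input leaves the result unchanged.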
- cherimoya.performance.jensen_shannon_distance(logps, true_counts)
The Jensen-Shannon distance between two tensors across the last axis.
Computes the Jensen-Shannon distance in the last dimension of logps and true_counts. These two tensors must be the same shape. For example, if they are both A x B x L arrays, then the Jensen-Shannon distance of corresponding L-arrays will be computed and returned in an A x B array. Both inputs are renormalized so that each subarray sums to 1. If the sum of a subarray is 0, then the resulting JSD will be NaN.
- Parameters:
logps (torch.Tensor) – A tensor of log probabilities or logits. This must be in log space.
true_counts (torch.Tensor) – A tensor of true integer counts at each position.
- Returns:
jsd – The Jensen-Shannon distance for each pair of corresponding subarrays.
- Return type:
torch.Tensor
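The renormalize-then-compare computation described above can be sketched in NumPy. Since this page uses both "distance" and "divergence", the hypothetical helper `jensen_shannon` below returns the divergence (in nats) and its square root, the distance; which of the two the library returns is not confirmed here.

```python
import math
import numpy as np

def _kl(p, q):
    """Sum of p * log(p / q) over the last axis, treating 0 * log(0) as 0."""
    mask = p > 0
    ratio = np.where(mask, p / np.where(mask, q, 1.0), 1.0)
    return (p * np.log(ratio)).sum(axis=-1)

def jensen_shannon(logps, true_counts):
    """Return (divergence, distance) between exp(logps) and true_counts."""
    # Renormalize both inputs so that each subarray sums to 1.
    p = np.exp(np.asarray(logps, dtype=float))
    p = p / p.sum(axis=-1, keepdims=True)
    q = np.asarray(true_counts, dtype=float)
    q = q / q.sum(axis=-1, keepdims=True)

    mid = 0.5 * (p + q)
    divergence = 0.5 * (_kl(p, mid) + _kl(q, mid))
    # Clip tiny negative rounding errors before taking the square root.
    return divergence, np.sqrt(np.maximum(divergence, 0.0))

logps = np.log(np.array([[0.5, 0.5]]))   # predicted: uniform over two positions
counts = np.array([[4.0, 0.0]])          # observed: all reads at position 0
div, dist = jensen_shannon(logps, counts)

# Hand-computed value for this case, with mid = (0.75, 0.25).
expected = 0.5 * (0.5 * math.log(2 / 3) + 0.5 * math.log(2) + math.log(4 / 3))
```

With natural logarithms the divergence is bounded above by log(2), which it reaches only when the two distributions do not overlap.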
- cherimoya.performance.mean_squared_error(arr1, arr2)
The mean squared error between two tensors averaged along the last axis.
Computes the element-wise squared error between two tensors and averages these across the last dimension. arr1 and arr2 must be the same shape. For example, if they are both A x B x L arrays, then the MSE of corresponding L-arrays will be computed and returned in an A x B array.
- Parameters:
arr1 (torch.Tensor) – A tensor of values.
arr2 (torch.Tensor) – Another tensor of values with the same shape as arr1.
- Returns:
mse – The mean squared error for each pair of corresponding subarrays, averaged over the last axis.
- Return type:
torch.Tensor
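A minimal NumPy sketch of the last-axis averaging described above, using the hypothetical name `mse_last_axis`:

```python
import numpy as np

def mse_last_axis(arr1, arr2):
    """Mean of the element-wise squared error along the last axis."""
    diff = np.asarray(arr1, dtype=float) - np.asarray(arr2, dtype=float)
    return (diff ** 2).mean(axis=-1)

a = np.zeros((2, 3, 4))
b = np.full((2, 3, 4), 2.0)
mse = mse_last_axis(a, b)   # every element differs by 2, so each entry is 4.0
```

The result has shape (2, 3): one value per L-array.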
Smoothing
- cherimoya.performance.smooth_gaussian1d(x, kernel_sigma, kernel_width)
Smooth a signal along the sequence length axis.
This function is a replacement for the scipy.ndimage.gaussian_filter1d function that works on PyTorch tensors. It applies a Gaussian kernel at each position, which is equivalent to convolving the sequence with weights drawn from a Gaussian distribution. Each sequence, and each channel within the sequence, is smoothed independently.
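- Parameters:
x (torch.Tensor, shape=(n_sequences, n_channels, seq_len)) – The tensor to smooth.
kernel_sigma (float) – The standard deviation of the Gaussian kernel.
kernel_width (int) – The width of the kernel.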
- Returns:
x_smooth – The smoothed tensor.
- Return type:
torch.Tensor, shape=(n_sequences, n_channels, seq_len)
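The convolution view of Gaussian smoothing can be sketched in NumPy as follows. Several details are assumptions rather than documented behavior: the kernel is normalized to sum to 1, the width is odd, and the edges are zero-padded (scipy.ndimage.gaussian_filter1d reflects at the boundary by default). The names `gaussian_kernel` and `smooth_last_axis` are hypothetical.

```python
import numpy as np

def gaussian_kernel(kernel_sigma, kernel_width):
    """Normalized Gaussian weights centered in a window of kernel_width."""
    offsets = np.arange(kernel_width) - (kernel_width - 1) / 2.0
    weights = np.exp(-0.5 * (offsets / kernel_sigma) ** 2)
    return weights / weights.sum()

def smooth_last_axis(x, kernel_sigma, kernel_width):
    """Smooth every sequence and channel independently along the last axis."""
    x = np.asarray(x, dtype=float)
    kernel = gaussian_kernel(kernel_sigma, kernel_width)
    pad = kernel_width // 2   # assumes an odd kernel_width
    out = np.empty_like(x)
    for idx in np.ndindex(x.shape[:-1]):
        # Zero-pad so the "valid" convolution returns the original length.
        padded = np.pad(x[idx], pad, mode="constant")
        out[idx] = np.convolve(padded, kernel, mode="valid")
    return out

x = np.zeros((1, 1, 201))
x[0, 0, 100] = 10.0                      # ten reads at a single position
x_smooth = smooth_last_axis(x, kernel_sigma=7, kernel_width=81)
```

A single spike far from the edges is spread into a bell shape centered on the same position, and the total signal is preserved because the kernel sums to 1.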
- cherimoya.performance.batched_smoothed_function(logps, true_counts, f, smooth_predictions=False, smooth_true=False, kernel_sigma=7, kernel_width=81, exponentiate_logps=False, batch_size=200)
Batch a calculation with optional smoothing.
Given a set of predicted and true values, apply some function to them in a batched manner and store the results. Optionally, either the true values or the predicted ones can be smoothed.
- Parameters:
logps (torch.Tensor) – A tensor of the predicted log probability values.
true_counts (torch.Tensor) – A tensor of the true values, usually integer counts.
f (function) – A function to be applied to the predicted and true values.
smooth_predictions (bool, optional) – Whether to apply a Gaussian filter to the predictions. Default is False.
smooth_true (bool, optional) – Whether to apply a Gaussian filter to the true values. Default is False.
kernel_sigma (float, optional) – The standard deviation of the Gaussian to be applied. Default is 7.
kernel_width (int, optional) – The width of the kernel to be applied. Default is 81.
exponentiate_logps (bool, optional) – Whether to exponentiate each batch of log probabilities. Default is False.
batch_size (int, optional) – The number of examples in each batch to evaluate at a time. Default is 200.
- Returns:
results – The results of applying the function to the tensor.
- Return type:
torch.Tensor
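The batching pattern described above can be sketched in NumPy: slice both inputs along the first axis, apply `f` to each slice, and concatenate the results. The smoothing step is omitted for brevity, and where the library applies exponentiation relative to smoothing is not stated here, so treat this as an outline under those assumptions. `batched_apply` is a hypothetical name.

```python
import numpy as np

def batched_apply(logps, true_counts, f, exponentiate_logps=False, batch_size=200):
    """Apply f(pred, true) to consecutive batches and concatenate the results."""
    results = []
    for start in range(0, len(logps), batch_size):
        pred = logps[start:start + batch_size]
        true = true_counts[start:start + batch_size]
        if exponentiate_logps:
            # Move predictions from log space to probability space.
            pred = np.exp(pred)
        results.append(f(pred, true))
    return np.concatenate(results, axis=0)

def mse(a, b):
    """Example per-example function: mean squared error over the last axis."""
    return ((a - b) ** 2).mean(axis=-1)

rng = np.random.default_rng(0)
logps = rng.normal(size=(5, 1, 8))
true = rng.normal(size=(5, 1, 8))

# Five examples with batch_size=2 are processed as batches of 2, 2, and 1.
batched = batched_apply(logps, true, mse, batch_size=2)
unbatched = mse(logps, true)
```

For any `f` that treats examples independently, the batched and unbatched results agree; batching only bounds how much data is processed at once.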