Skip to content

Add IBL's "noise cutoff" quality metric #3984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

huizizhang949
Copy link

Add a new metric "noise cutoff" based on IBL https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080 in a new function compute_noise_cutoffs.

The original implementation is here https://github.com/int-brain-lab/ibllib/blob/2e1f91c622ba8dbd04fc53946c185c99451ce5d6/brainbox/metrics/single_units.py.

The "noise cutoff" metric assesses whether a unit’s spike‐amplitude distribution is truncated
at the low-end, which may be due to the high amplitude detection threhold in the deconvolution step,
i.e., if low‐amplitude spikes were missed. It does not assume a Gaussian shape;
instead, it directly compares counts in the low‐amplitude bins to counts in high‐amplitude bins (cutoff) and the peak bin (ratio).

Here are examples with no truncation and truncation, which give low and high values (cutoff and ratio), respectively.
example_cutoff

I have added documentation and tests.

@alejoe91 alejoe91 added the qualitymetrics Related to qualitymetrics module label Jun 12, 2025
@alejoe91 alejoe91 requested a review from chrishalcrow June 12, 2025 13:55
Copy link
Member

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments, suggestions, questions :)

sorting_analyzer=sorting_analyzer
high_quantile=0.25,
low_quantile=0.10,
n_bins=100,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no discussion of what number of bins means for this metric. Might be worth adding. Why not do 10, or 1000 bins?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a comment to the docs. This is also the default value used in IBL's implementation.

Comment on lines 93 to 94
if peak_sign == "both":
raise TypeError('peak_sign should either be "pos" or "neg"!')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this error occur right after peak sign so we don't waste time doing the _get_amplitudes_by_units function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also how does a user know what the peak sign is? Where did it come from. I would consider making this note refer to the peak sign of the spike_amplitudes extension.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment about where to change the peak_sign in the error message. I moved the error before _get_amplitudes_by_units.

amps : array-like
Spike amplitudes.
high_quantile : float, default: 0.25
Quantile of the amplitude range above which values are treated as "high" (e.g. 0.25 = top 25%), the reference region.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optionally thinking about this should we also put a note about the top 25% is really the 75 percentile? Not sure about this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 'eg. 0.25 = top 25%' makes this clear.

@zm711
Copy link
Member

zm711 commented Jun 13, 2025

From a profiling perspective is there any benefit to thinking about doing a numba/numpy dual strategy? It seems like _noise_cutoff is all numpy based so could be sped up with a numba version. That would complicate the PR strategy, but would be curious about the profiling. It would also be an excellent exercise in profiling code performance. Definitely not required unless this is really slow. That's the reason we had to shut off the nn metrics as a default for qm stuff.

'`peak_sign` should either be "pos" or "neg". You can set `peak_sign` as an argument when you compute spike_amplitudes.'
)

amplitudes_by_units = _get_amplitudes_by_units(sorting_analyzer, unit_ids, peak_sign)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets have in mind that this _get_amplitudes_by_units is not efficient because it internal use mask on all units.

amplitude_extension.get_data(outputs="by_units") is really faster for this. We should use it here and also inside
_get_amplitudes_by_units the only pain is that amplitude will be splited over segments but concatenate is fast.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just tried to implement amplitude_extension.get_data(outputs="by_units") and then concatenated, but it became very complex. We propose that there should be another PR to make _get_amplitudes_by_units to use the approach suggested.

@samuelgarcia
Copy link
Member

Thank you for this, this is really nice.
This is a bit similar for the global question that "amplitude_cutoff" but with a differents approach and implementation.
The actual "amplitude_cutoff" is not super reliable.
This looks easier.
I am wondering how much this ratio is able to discriminate a real cut in the distribution like you have on the right plots and a distribution with a positive skewness coeeficient.

@zm711
Copy link
Member

zm711 commented Jun 16, 2025

The actual "amplitude_cutoff" is not super reliable.

related to this, I don't think I've ever got amplitude_cutoff to work on my data. I always get nans for it. So this could be an interesting if it works more often.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
qualitymetrics Related to qualitymetrics module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants