Skip to content

How should we handle missing base qualities in the consensus caller? #1031

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
nh13 opened this issue Feb 20, 2025 · 6 comments · May be fixed by #1032
Open

How should we handle missing base qualities in the consensus caller? #1031

nh13 opened this issue Feb 20, 2025 · 6 comments · May be fixed by #1032
Assignees
Labels

Comments

@nh13
Copy link
Member

nh13 commented Feb 20, 2025

Consider a read where the base qualities are not stored, we could:

  1. Except, with a better message than we currently have (I believe we'll get an index out of bounds exception).
  2. Add an option to the consensus callers to use a default base quality when base qualities are missing (e.g. Q30).
  3. Add an option to the consensus callers where we can provide a default base quality when base qualities are missing (e.g. Q30). By default it is None, so base qualities are required, but used only when explicitly specified.
  4. Others?

The issue with (1) is that would force folks to filter out such reads if they want to get the consensus caller to work. The issue with (2 )is that someone produces a BAM file accidentally with missing qualities and fgbio handles it gracefully, and they would have preferred to have known about it (no one reads the logs).

@clintval
Copy link
Member

clintval commented Feb 20, 2025

@nh13 passing by comment here. Would there be a way to let the missing qualities truly be missing qualities and make the consensus probability math tolerant of the missing term? So when there are no base qualities, then a simpler consensus calling algorithm could be used that simply is unaware of the base quality? Then we wouldn't need to make a dummy value that could potentially affect downstream probabilities and their meaning. Curious what you think.

@nh13 nh13 self-assigned this Mar 5, 2025
@nh13
Copy link
Member Author

nh13 commented Mar 5, 2025

I think providing a better exception is the first PR, then we can look at what the best option is.

@nh13
Copy link
Member Author

nh13 commented Mar 5, 2025

#1040

@tfenne
Copy link
Member

tfenne commented Mar 7, 2025

@nh13 you don't answer @clintval saying a better exception is the next step ... then implement a broader solution in the PR 😛

I think there are more options we might consider first:

  • Discard any input reads that have no qualities and log the number of such reads discarded
  • For any molecule that has > 0 input reads with no qualities, have the output consensus read have no quality score(!)

in addition to the two you have:

  • continue to fail, but with a better exception message
  • add a default base quality to be used when qualities are absent

@nh13
Copy link
Member Author

nh13 commented Mar 7, 2025

The more I think about it, the less I like an opt-in default base quality. I mean the user could instead modify the BAM upstream to add in default base qualities and the tool would work. Are we endorsing such a solution if we add the option in to have a default base quality: #1032

@tfenne
Copy link
Member

tfenne commented Mar 7, 2025

Yeah - I think your comment further upstream is a good one: make the error much clearer that the callers don't support reads without base qualities, get that merged, then we can think about if we want to do anything in addition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants