@bentyeh bentyeh commented Nov 8, 2025

Added validation for values in Hex format tags, using the regular expression defined in the SAM specification.


Currently, set_tag() does not validate the tag value for Hex format tags (where value_type (typecode) is 'H'). As a result, pysam can write reads to a SAM/BAM file that may then raise errors when the same reads are read back via pysam.AlignmentFile.fetch().

Specifically, pysam.AlignmentFile.fetch() (via ... via pysam.AlignmentFile.cnext() via sam.c:sam_read1() via ... via sam.c:aux_parse() in htslib) checks that Hex format tag values contain an even number of characters.

pysam/htslib/sam.c

Lines 2835 to 2839 in 2f9d50d

    } else if (type == 'Z' || type == 'H') {
        char *end = strchr(q, '\t');
        if (!end) end = q + strlen(q);
        _parse_err(type == 'H' && ((end-q)&1) != 0,
                   "hex field does not have an even number of digits");

The SAM specification additionally restricts Hex values to pairs of the characters 0-9 and A-F, so the proposed patch (which follows the specification) makes the write side of the pysam interface a bit stricter than what the read side checks for. Nonetheless, this helps ensure round-trip compatibility of reads written by pysam.
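
To illustrate the difference between the two checks, here is a minimal standalone sketch. It is not the code in this patch. The function names `passes_read_side_check` and `is_valid_hex_tag_value` are hypothetical, the pattern `([0-9A-F][0-9A-F])*` is assumed to be the one the SAM specification gives for type 'H', and the read-side function mirrors only the even-length test visible in the htslib excerpt above.

```python
import re

# Pattern for 'H' (hex byte array) tag values, assumed from the SAM
# specification: zero or more pairs of uppercase hex digits.
HEX_TAG_PATTERN = re.compile(r"([0-9A-F][0-9A-F])*")


def passes_read_side_check(value: str) -> bool:
    """Hypothetical mirror of the htslib excerpt: only the length is tested."""
    return len(value) % 2 == 0


def is_valid_hex_tag_value(value: str) -> bool:
    """Hypothetical write-side check: the whole value must match the pattern."""
    return HEX_TAG_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("1AE4", "1AE", "ZZ"):
        print(value, passes_read_side_check(value), is_valid_hex_tag_value(value))
```

In this sketch, "1AE" fails both checks, while "ZZ" has an even length and so passes the length test but is rejected by the pattern. Lowercase digits such as "1a" are also rejected by the pattern as assumed here.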
