Update handling of INFO/END when writing records in VCF #1201
This PR is meant to resolve issue #1200. In particular, it changes how the `write` method for `VariantFile` decides whether to write `INFO/END`. Previously, the code tried to write it only when there were symbolic alleles, but ended up writing it only for insertions when explicitly requested.

The code change now excludes `INFO/END` if it is not present in the header, and writes it in all cases when it is included in the header. This should allow users to update `END` values using `record.stop`, as in the following examples.

Start with `example.vcf.gz` as:

Here are two blocks of Python code run on it with their respective outputs:
In this case, the output matches `example.vcf.gz` despite editing the `record.stop` positions, because the `END` field is not defined in the header. However, this code block:

produces the following output:
So the user can control whether `END` appears in the INFO fields by toggling whether it is included in the header, and can then access it via `record.stop` as usual. I think it makes more conceptual sense to decide whether to print the field based on the header values. Since the `sync` method uses the same formula for determining the `END` coordinate, this should be consistent with the existing paradigm in the other field setters.
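The header-driven rule described above can be sketched in plain Python. This is only an illustration of the decision, not the code in this PR or the pysam implementation: `render_info`, its parameters, and the dict-based record are hypothetical names made up for the example.

```python
def render_info(info, header_info_ids, stop):
    """Build the INFO column text for one record (hypothetical helper).

    info            -- dict of INFO key -> value already set on the record
    header_info_ids -- set of INFO IDs declared in the header
    stop            -- the record's stop coordinate, used as the END value
    """
    fields = dict(info)
    # Any stored END is dropped; it is always recomputed from `stop`.
    fields.pop("END", None)
    # The rule: emit END if and only if the header declares it.
    if "END" in header_info_ids:
        fields["END"] = stop
    if not fields:
        return "."  # VCF placeholder for an empty INFO column
    # Flag-type entries (value True) are written as the bare key.
    return ";".join(k if v is True else f"{k}={v}" for k, v in fields.items())


# Header without END: changing stop has no visible effect on INFO.
print(render_info({"DP": 10}, {"DP"}, 120))
# Header with END: the stop coordinate is written as END.
print(render_info({"DP": 10}, {"DP", "END"}, 120))
```

In this sketch the caller never sets `END` directly; the header alone determines whether it is emitted, which mirrors the toggle described in the PR.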