Commit bd98456

benjeffery authored and mergify[bot] committed
Move masks to later in docs
1 parent 4098b56 commit bd98456

1 file changed: +34 -30 lines changed

docs/usage.md

Lines changed: 34 additions & 30 deletions
@@ -129,36 +129,6 @@ Sites which are not used for inference will
 still be included in the final tree sequence, with mutations at those sites being placed
 onto branches by {meth}`parsimony<tskit.Tree.map_mutations>`.
 
-### Masks
-
-It is also possible to *completely* exclude sites and samples, by specifying a boolean
-`site_mask` and/or a `sample_mask` when creating the `VariantData` object. Sites or samples with
-a mask value of `True` will be completely omitted both from inference and the final tree sequence.
-This can be useful, for example, if you wish to select only a subset of the chromosome for
-inference, e.g. to reduce computational load. You can also use it to subset inference to a
-particular contig, if your dataset contains multiple contigs. Note that if a `site_mask` is provided,
-the ancestral states array should only specify alleles for the unmasked sites.
-
-Below, for instance, is an example of including only sites up to position six in the contig
-labelled "chr1" in the `example_data.vcz` file:
-
-```{code-cell}
-import numpy as np
-
-# mask out any sites not associated with the contig named "chr1"
-# (for demonstration: all sites in this .vcz file are from "chr1" anyway)
-chr1_index = np.where(vcf_zarr.contig_id[:] == "chr1")[0]
-site_mask = vcf_zarr.variant_contig[:] != chr1_index
-# also mask out any sites with a position >= 80
-site_mask[vcf_zarr.variant_position[:] >= 80] = True
-
-smaller_vdata = tsinfer.VariantData(
-    "_static/example_data.vcz",
-    ancestral_state="ancestral_state",
-    site_mask=site_mask,
-)
-print(f"The `smaller_vdata` object returns data for only {smaller_vdata.num_sites} sites")
-```
 
 ### Topology inference
 
@@ -257,6 +227,40 @@ software such as [tsdate](https://tskit.dev/software/tsdate.html): the _tsinfer_
 algorithm is only intended to infer the genetic relationships between the samples
 (i.e. the *topology* of the tree sequence).
 
+### Masks
+
+It is possible to *completely* exclude sites and samples, by specifying a boolean
+`site_mask` and/or a `sample_mask` when creating the `VariantData` object. Sites or samples with
+a mask value of `True` will be completely omitted both from inference and the final tree sequence.
+This can be useful, for example, if you wish to select only a subset of the chromosome for
+inference, e.g. to reduce computational load. You can also use it to subset inference to a
+particular contig, if your dataset contains multiple contigs. Note that if a `site_mask` is provided,
+the ancestral states array should only specify alleles for the unmasked sites.
+
+Below, for instance, is an example of including only sites up to position six in the contig
+labelled "chr1" in the `example_data.vcz` file:
+
+```{code-cell}
+import numpy as np
+import zarr
+
+vcf_zarr = zarr.open("_static/example_data.vcz")
+
+# mask out any sites not associated with the contig named "chr1"
+# (for demonstration: all sites in this .vcz file are from "chr1" anyway)
+chr1_index = np.where(vcf_zarr.contig_id[:] == "chr1")[0]
+site_mask = vcf_zarr.variant_contig[:] != chr1_index
+# also mask out any sites with a position >= 80
+site_mask[vcf_zarr.variant_position[:] >= 80] = True
+
+smaller_vdata = tsinfer.VariantData(
+    "_static/example_data.vcz",
+    ancestral_state="ancestral_state",
+    site_mask=site_mask,
+)
+print(f"The `smaller_vdata` object returns data for only {smaller_vdata.num_sites} sites")
+```
+
 
 (sec_usage_simulation_example)=
 
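The mask semantics described in the relocated section (`True` means "exclude") can be checked in isolation with plain NumPy. The sketch below is only illustrative: the arrays are small synthetic stand-ins for the `contig_id`, `variant_contig` and `variant_position` arrays in the diff, the `sample_id` and `ancestral_state` values are hypothetical, and nothing here calls tsinfer itself.

```python
import numpy as np

# Synthetic stand-ins for arrays that would normally be read from a .vcz store
# (all values here are hypothetical, chosen only for illustration)
contig_id = np.array(["chr1", "chr2"])
variant_contig = np.array([0, 0, 0, 1, 1])        # contig index of each site
variant_position = np.array([10, 50, 90, 20, 70])
ancestral_state = np.array(["A", "C", "G", "T", "A"])
sample_id = np.array(["s0", "s1", "s2", "s3"])

# Index of the contig to keep, taken as a scalar
chr1_index = np.flatnonzero(contig_id == "chr1")[0]

# True means "exclude": mask every site that is not on chr1 ...
site_mask = variant_contig != chr1_index
# ... and also mask any site with a position >= 80
site_mask |= variant_position >= 80

# With a site mask, ancestral states are given for the unmasked sites only
kept_ancestral_state = ancestral_state[~site_mask]

# A sample mask follows the same convention: True excludes the sample
sample_mask = np.isin(sample_id, ["s1", "s3"])

print(site_mask)
print(kept_ancestral_state)
print(sample_mask)
```

Arrays built this way have the same shape and meaning as the `site_mask` passed to `tsinfer.VariantData` in the diff; a `sample_mask` would presumably be passed the same way, as the section's text describes.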