@@ -129,36 +129,6 @@ Sites which are not used for inference will
still be included in the final tree sequence, with mutations at those sites being placed
onto branches by {meth}`parsimony<tskit.Tree.map_mutations>`.

- ### Masks
-
- It is also possible to *completely* exclude sites and samples, by specifing a boolean
- `site_mask` and/or a `sample_mask` when creating the `VariantData` object. Sites or samples with
- a mask value of `True` will be completely omitted both from inference and the final tree sequence.
- This can be useful, for example, if you wish to select only a subset of the chromosome for
- inference, e.g. to reduce computational load. You can also use it to subset inference to a
- particular contig, if your dataset contains multiple contigs. Note that if a `site_mask` is provided,
- the ancestral states array should only specify alleles for the unmasked sites.
-
- Below, for instance, is an example of including only sites up to position six in the contig
- labelled "chr1" in the `example_data.vcz` file:
-
- ```{code-cell}
- import numpy as np
-
- # mask out any sites not associated with the contig named "chr1"
- # (for demonstration: all sites in this .vcz file are from "chr1" anyway)
- chr1_index = np.where(vcf_zarr.contig_id[:] == "chr1")[0]
- site_mask = vcf_zarr.variant_contig[:] != chr1_index
- # also mask out any sites with a position >= 80
- site_mask[vcf_zarr.variant_position[:] >= 80] = True
-
- smaller_vdata = tsinfer.VariantData(
-     "_static/example_data.vcz",
-     ancestral_state="ancestral_state",
-     site_mask=site_mask,
- )
- print(f"The `smaller_vdata` object returns data for only {smaller_vdata.num_sites} sites")
- ```

### Topology inference

@@ -257,6 +227,40 @@ software such as [tsdate](https://tskit.dev/software/tsdate.html): the _tsinfer_
algorithm is only intended to infer the genetic relationships between the samples
(i.e. the *topology* of the tree sequence).

+ ### Masks
+
+ It is possible to *completely* exclude sites and samples by specifying a boolean
+ `site_mask` and/or a `sample_mask` when creating the `VariantData` object. Sites or samples with
+ a mask value of `True` will be completely omitted from both inference and the final tree sequence.
+ This can be useful, for example, if you wish to select only a subset of the chromosome for
+ inference, e.g. to reduce computational load. You can also use it to subset inference to a
+ particular contig, if your dataset contains multiple contigs. Note that if a `site_mask` is provided,
+ the ancestral states array should only specify alleles for the unmasked sites.
+
+ Below, for instance, is an example of including only sites below position 80 in the contig
+ labelled "chr1" in the `example_data.vcz` file:
+
+ ```{code-cell}
+ import numpy as np
+ import zarr
+
+ vcf_zarr = zarr.open("_static/example_data.vcz")
+
+ # mask out any sites not associated with the contig named "chr1"
+ # (for demonstration: all sites in this .vcz file are from "chr1" anyway)
+ chr1_index = np.where(vcf_zarr.contig_id[:] == "chr1")[0]
+ site_mask = vcf_zarr.variant_contig[:] != chr1_index
+ # also mask out any sites with a position >= 80
+ site_mask[vcf_zarr.variant_position[:] >= 80] = True
+
+ smaller_vdata = tsinfer.VariantData(
+     "_static/example_data.vcz",
+     ancestral_state="ancestral_state",
+     site_mask=site_mask,
+ )
+ print(f"The `smaller_vdata` object returns data for only {smaller_vdata.num_sites} sites")
+ ```
+

(sec_usage_simulation_example)=
