Filtering is performed with PLINK or TASSEL in the following steps:
1) Filtering taxa | Remove all varieties that do not appear in your phenotype file. ** The missing rate and MAF are calculated from this subset, not from the whole set. |
2) Filtering missing | Remove SNPs whose proportion of missing values exceeds the specified threshold (default: > 10%). For TASSEL, the minimum count must be an integer, so the missing threshold is converted to a count and rounded. For example, with 426 varieties in the phenotype file and a missing threshold of 0.1, the minimum count is round(426 * (1 - 0.1)) = round(383.4) = 383. With 352 varieties and a missing threshold of 0.2, the minimum count is round(352 * (1 - 0.2)) = round(281.6) = 282. |
3) Filtering MAF | Remove SNPs whose minor allele frequency (MAF) is lower than the specified threshold (default: < 5%). |
4) Pruning by LD | Remove SNPs that are in linkage disequilibrium with one another, as determined by a pairwise correlation (r2) above the specified threshold (default: > 0.1). |
5) Thinning by distance | Remove SNPs that lie closer to a neighboring SNP than the specified distance (default: < 1,000 bases). |
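The rounding in step 2 and the order of steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration, not how PLINK or TASSEL implement filtering. The function names, the toy taxa IDs, and the genotype coding (minor-allele count 0/1/2, with None for a missing call) are all assumptions made for the example. LD pruning and distance thinning are left out.

```python
import math


def tassel_min_count(n_taxa, max_missing):
    """Minimum number of non-missing taxa per SNP, rounded half up (hypothetical helper)."""
    return math.floor(n_taxa * (1.0 - max_missing) + 0.5)


def filter_snps(genotypes, keep_taxa, max_missing=0.1, min_maf=0.05):
    """Return the IDs of SNPs that pass the taxa, missing, and MAF filters.

    genotypes: {snp_id: {taxon: minor-allele count (0, 1, 2) or None if missing}}
    This coding is an assumption for the sketch, not a PLINK or TASSEL format.
    """
    kept = []
    n_taxa = len(keep_taxa)
    for snp_id, calls in genotypes.items():
        # Step 1: keep only taxa that are present in the phenotype file.
        subset = [calls.get(taxon) for taxon in keep_taxa]
        observed = [g for g in subset if g is not None]
        n_missing = n_taxa - len(observed)

        # Step 2: drop SNPs whose missing rate exceeds the threshold.
        if not observed or n_missing / n_taxa > max_missing:
            continue

        # Step 3: drop SNPs whose minor allele frequency is below the threshold.
        freq = sum(observed) / (2 * len(observed))
        maf = min(freq, 1.0 - freq)
        if maf < min_maf:
            continue

        kept.append(snp_id)
    return kept


# The two worked examples from step 2.
print(tassel_min_count(426, 0.1))  # 383
print(tassel_min_count(352, 0.2))  # 282

# Toy data: taxon "E" is not in the phenotype file, so it is ignored.
toy = {
    "snp1": {"A": 0, "B": 1, "C": 2, "D": 0, "E": 2},
    "snp2": {"A": 0, "B": None, "C": None, "D": 0},  # 50% missing
    "snp3": {"A": 0, "B": 0, "C": 0, "D": 0},        # monomorphic, MAF = 0
}
print(filter_snps(toy, ["A", "B", "C", "D"], max_missing=0.25))  # ['snp1']
```

Because the taxa subset is applied first, the missing rate and MAF in this sketch are computed only over varieties that have phenotype data, as the note in step 1 requires.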
The phenotype file to be uploaded must be in the following format:
<Trait> | Trait_1 | Trait_2 | ... |
W00XXX | value | value | ... |
W00XXX | value | value | ... |
... |
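As a rough check before uploading, a file in this layout could be read as follows. The sketch assumes a tab-delimited file, which the format above does not specify. The function name `read_phenotypes` and the variety IDs `VAR_A` and `VAR_B` are hypothetical. Values that cannot be parsed as numbers are stored as None.

```python
import csv
import io


def to_number(text):
    """Convert a cell to float, or None if it is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def read_phenotypes(handle, delimiter="\t"):
    """Read a phenotype table: first column is the variety ID, the rest are traits.

    The delimiter is an assumption; change it if the file uses another one.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)      # e.g. ["<Trait>", "Trait_1", "Trait_2"]
    traits = header[1:]
    table = {}
    for row in reader:
        if not row or not row[0].strip():
            continue           # skip blank lines
        values = [to_number(cell) for cell in row[1:]]
        table[row[0].strip()] = dict(zip(traits, values))
    return traits, table


# Hypothetical two-variety file with one missing value.
example = "<Trait>\tTrait_1\tTrait_2\nVAR_A\t1.5\t10\nVAR_B\t2.0\tNA\n"
traits, table = read_phenotypes(io.StringIO(example))
print(traits)
print(table["VAR_B"])
```

The set of variety IDs returned here, `table.keys()`, is what the taxa filter in step 1 would keep.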