Title: | Analysing 'SNP' Data to Support Captive Breeding |
---|---|
Description: | Functions are provided that facilitate the analysis of SNP (single nucleotide polymorphism) data to answer questions regarding captive breeding and relatedness between individuals. 'dartR.captive' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>. |
Authors: | Bernd Gruber [aut, cre], Arthur Georges [aut], Jose L. Mijangos [aut], Carlo Pacioni [aut], Peter J. Unmack [ctb], Oliver Berry [ctb], Lindsay V. Clark [ctb], Floriaan Devloo-Delva [ctb], Eric Archer [ctb] |
Maintainer: | Bernd Gruber <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.75 |
Built: | 2025-01-21 04:42:10 UTC |
Source: | https://github.com/cran/dartR.captive |
This function takes one individual and estimates their probability of coming from individual populations from multilocus genotype frequencies.
gl.assign.grm(x, unknown, verbose = NULL)
x |
Name of the genlight object containing the SNP data [required]. |
unknown |
Name of the individual to be assigned to a population [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
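As a rough illustration of assignment from multilocus genotype frequencies, the base R sketch below scores an unknown genotype against each population's allele frequencies and normalises the likelihoods into probabilities. It assumes Hardy-Weinberg proportions, independent loci and genotypes coded as 0/1/2 counts of the alternate allele; the helpers `genotype_loglik()` and `assign_probs()` and the simulated data are hypothetical and are not the internals of gl.assign.grm.

```r
# Hypothetical helper: log-likelihood of one genotype (0/1/2 alternate-allele
# counts) given a population's alternate-allele frequencies, assuming
# Hardy-Weinberg proportions and independent loci.
genotype_loglik <- function(genotype, alt_freq, eps = 0.01) {
  p <- pmin(pmax(alt_freq, eps), 1 - eps)  # keep frequencies away from 0 and 1
  sum(dbinom(genotype, size = 2, prob = p, log = TRUE), na.rm = TRUE)
}

# Hypothetical helper: normalised assignment probabilities per population.
assign_probs <- function(unknown, geno, pop) {
  freqs <- lapply(split(as.data.frame(geno), pop),
                  function(g) colMeans(g, na.rm = TRUE) / 2)
  ll <- vapply(freqs, function(p) genotype_loglik(unknown, p), numeric(1))
  w <- exp(ll - max(ll))                   # subtract the maximum for stability
  data.frame(population = names(ll), probability = as.numeric(w / sum(w)))
}

# Simulated data (an assumption for illustration only)
set.seed(7)
n_loci <- 100
freq_a <- runif(n_loci, 0.05, 0.95)
freq_b <- runif(n_loci, 0.05, 0.95)
sim_pop <- function(n, f) {
  matrix(rbinom(n * length(f), 2, rep(f, each = n)), nrow = n)
}
geno <- rbind(sim_pop(15, freq_a), sim_pop(15, freq_b))
pop <- rep(c("A", "B"), each = 15)
unknown <- rbinom(n_loci, 2, freq_a)       # drawn from population A

res <- assign_probs(unknown, geno, pop)
print(res)
```

The result has one row per population, mirroring the shape of the data.frame that the function is documented to return.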
A data.frame consisting of assignment probabilities for each population.
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
require("dartR.data")
if ((requireNamespace("rrBLUP", quietly = TRUE)) &
    (requireNamespace("gplots", quietly = TRUE))) {
  res <- gl.assign.grm(platypus.gl, unknown = "T27")
}
This script assigns an individual of unknown provenance to one or more target populations based on the unknown individual's proximity to population centroids; proximity is estimated using Mahalanobis Distance.
The following process is followed:
An ordination is undertaken on the populations to yield a series of orthogonal (independent) axes.
A workable subset of dimensions is chosen: either the number specified, or the number of dimensions with substantive eigenvalues, whichever is the smaller.
The Mahalanobis Distance is calculated for the unknown against each population, and the probability of membership of each population is calculated. The assignment probabilities are listed in support of a decision.
gl.assign.mahalanobis( x, dim.limit = 2, plevel = 0.999, plot.out = TRUE, unknown, verbose = NULL )
x |
Name of the input genlight object [required]. |
dim.limit |
Maximum number of dimensions to consider for the confidence ellipses [default 2] |
plevel |
Probability level for bounding ellipses [default 0.999]. |
plot.out |
If TRUE, produces a plot showing the position of the unknown in relation to putative source populations [default TRUE] |
unknown |
Identity label of the focal individual whose provenance is unknown [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().
A next step is to consider the PCoA plot for populations where no private alleles have been detected. The position of the unknown in relation to the confidence ellipses is plotted by this script as a basis for narrowing down the list of putative source populations. This can be evaluated with gl.assign.pca().
The third step (delivered by this script) is to consider the assignment probabilities based on the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, then to consider the probability associated with its quantile using the chi-square approximation. In effect, this index takes into account the position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination. The larger the assignment probability, the greater the confidence in the assignment.
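The third step can be sketched in base R with `mahalanobis()` and `pchisq()`. This is a minimal illustration on simulated ordination scores, not the package's internal code; `assignment_prob()` and the simulated `pop_scores` are hypothetical.

```r
set.seed(1)

# Simulated ordination scores for one population: 30 individuals x 3 axes
pop_scores <- matrix(rnorm(30 * 3), ncol = 3)

# Hypothetical helper: upper-tail chi-square probability of the squared
# Mahalanobis distance of the unknown from the population centroid.
assignment_prob <- function(unknown, pop_scores) {
  d2 <- mahalanobis(unknown,
                    center = colMeans(pop_scores),
                    cov = cov(pop_scores))
  pchisq(d2, df = ncol(pop_scores), lower.tail = FALSE)
}

# An unknown sitting exactly on the centroid has distance 0, so probability 1
p_centre <- assignment_prob(colMeans(pop_scores), pop_scores)

# An unknown displaced by 10 units on every axis is an extreme outlier
p_far <- assignment_prob(colMeans(pop_scores) + 10, pop_scores)

print(c(centre = p_centre, far = p_far))
```

The degrees of freedom equal the number of retained dimensions, which is why the choice of dim.limit changes the probabilities.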
If dim.limit is set to 2, to correspond with the dimensions used in gl.assign.pca(), then the output provides a ranking of the final set of putative source populations.
If dim.limit is set to be > 2, then this script provides a basis for further narrowing the set of putative populations. If the unknown individual is an extreme outlier, say at less than 0.001 probability of population membership (0.999 confidence envelope), then the associated population can be eliminated from further consideration.
Warning: gl.assign.mahalanobis() treats each specified dimension equally, without regard to the percentage variation explained after ordination. If the unknown is an outlier in a lower dimension with an explanatory variance of, say, 0.1%, that dimension carries the same weight as any other, so the number of dimensions retained from the ordination should be chosen with care.
Each of the above approaches provides evidence; none is 100% definitive. They need to be interpreted cautiously.
In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 as default.
A data frame with the results of the assignment analysis.
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown = "UC_01044", nmin = 10, threshold = 1)
test_2 <- gl.assign.pca(test, unknown = "UC_01044", plevel = 0.95)
df <- gl.assign.mahalanobis(test_2, unknown = "UC_01044")
This script eliminates from consideration as putative source populations, those populations for which the individual has too many private alleles. The populations that remain are putative source populations, subject to further consideration.
The algorithm identifies those target populations for which the individual has no private alleles or for which the number of private alleles does not exceed a user specified threshold.
An excessive count of private alleles is an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10).
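The idea of counting loci with private alleles can be sketched in base R as follows. The sketch assumes genotypes coded as 0/1/2 counts of the alternate allele; `count_private_loci()` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: number of loci at which the unknown carries an allele
# that is absent from the target population.
# unknown: vector of 0/1/2 alternate-allele counts
# pop:     matrix of 0/1/2 counts, individuals in rows, loci in columns
count_private_loci <- function(unknown, pop) {
  pop_has_alt <- colSums(pop, na.rm = TRUE) > 0      # alternate allele seen
  pop_has_ref <- colSums(2 - pop, na.rm = TRUE) > 0  # reference allele seen
  unk_has_alt <- unknown > 0
  unk_has_ref <- unknown < 2
  private <- (unk_has_alt & !pop_has_alt) | (unk_has_ref & !pop_has_ref)
  sum(private, na.rm = TRUE)
}

# Toy population of two individuals scored at three loci
pop <- matrix(c(0, 0,   # locus 1: only the reference allele is present
                2, 2,   # locus 2: only the alternate allele is present
                1, 0),  # locus 3: both alleles are present
              nrow = 2)
unknown <- c(1, 2, 0)   # carries the alternate allele at locus 1

n_private <- count_private_loci(unknown, pop)
print(n_private)

# A population would be retained when the count does not exceed the threshold
threshold <- 0
retain <- n_private <= threshold
```

With small samples an allele may be missing from a population by chance, which is the reason for a minimum sample size such as nmin.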
gl.assign.pa( x, unknown, nmin = 10, threshold = 0, n.best = NULL, verbose = NULL )
x |
Name of the input genlight object [required]. |
unknown |
SpecimenID label (indName) of the focal individual whose provenance is unknown [required]. |
nmin |
Minimum sample size for a target population to be included in the analysis [default 10]. |
threshold |
Populations to retain for consideration; those for which the focal individual has less than or equal to threshold loci with private alleles [default 0]. |
n.best |
If given a value, dictates the best n=n.best populations to retain for consideration (or more if there are ties) based on private alleles [default NULL]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
A genlight object containing the focal individual (assigned to population 'unknown') and populations for which the focal individual is not distinctive (number of loci with private alleles less than or equal to the threshold). If no such populations, the genlight object contains only data for the unknown individual.
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown = "UC_00146", nmin = 10, threshold = 1)
This script assigns an individual of unknown provenance to one or more target populations based on its proximity to each population defined by a confidence ellipse in ordinated space of two dimensions.
The following process is followed:
The space defined by the loci is ordinated to yield a series of orthogonal axes (independent), and the top two dimensions are considered. Populations for which the unknown lies outside the specified confidence limits are no longer removed from the dataset.
gl.assign.pca(x, unknown, plevel = 0.999, plot.out = TRUE, verbose = NULL)
x |
Name of the input genlight object [required]. |
unknown |
Identity label of the focal individual whose provenance is unknown [required]. |
plevel |
Probability level for bounding ellipses in the PCoA plot [default 0.999]. |
plot.out |
If TRUE, plot the 2D PCA showing the position of the unknown [default TRUE] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
There are three considerations to assignment. First, consider only those populations for which the unknown has no private alleles. Private alleles are an indication that the unknown does not belong to a target population (provided that the sample size is adequate, say >=10). This can be evaluated with gl.assign.pa().
A next step is to consider the PCoA plot for populations where no private alleles have been detected and the position of the unknown in relation to the confidence ellipses as is plotted by this script. Note, this plot is considering only the top two dimensions of the ordination, and so an unknown lying outside the confidence ellipse can be unambiguously interpreted as it lying outside the confidence envelope. However, if the unknown lies inside the confidence ellipse in two dimensions, then it may still lie outside the confidence envelope in deeper dimensions. This second step is good for eliminating populations from consideration, but does not provide confidence in assignment.
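The second step can be illustrated with a base R sketch that ordinates simulated genotypes with `prcomp()` and tests whether a point falls inside a population's two-dimensional confidence ellipse. It assumes a normal-theory ellipse (squared Mahalanobis distance compared with a chi-square quantile on 2 degrees of freedom); the ellipses drawn by the package may be constructed differently, and `inside_ellipse()` and `sim_pop()` are hypothetical helpers.

```r
set.seed(42)

# Simulate two populations of 20 individuals scored at 50 loci (0/1/2 coding)
n_loci <- 50
freq_1 <- runif(n_loci, 0.1, 0.9)
freq_2 <- runif(n_loci, 0.1, 0.9)
sim_pop <- function(n, f) {
  matrix(rbinom(n * length(f), 2, rep(f, each = n)), nrow = n)
}
geno <- rbind(sim_pop(20, freq_1), sim_pop(20, freq_2))
pop <- rep(c("pop1", "pop2"), each = 20)

# Ordinate and keep the top two dimensions
scores <- prcomp(geno)$x[, 1:2]

# Hypothetical helper: is the point inside the plevel ellipse of a population?
inside_ellipse <- function(point, pop_scores, plevel = 0.999) {
  d2 <- mahalanobis(point, colMeans(pop_scores), cov(pop_scores))
  d2 <= qchisq(plevel, df = 2)
}

pop1_scores <- scores[pop == "pop1", ]
at_centroid <- inside_ellipse(colMeans(pop1_scores), pop1_scores)
far_away <- inside_ellipse(colMeans(pop1_scores) + 1000, pop1_scores)
print(c(at_centroid = at_centroid, far_away = far_away))
```

A point outside the ellipse in these two dimensions is also outside the envelope in the full space, whereas a point inside may still be an outlier on deeper axes, which is why this step only eliminates populations.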
The third step is to consider the assignment probabilities, using the script gl.assign.mahalanobis(). This approach calculates the squared Generalised Linear Distance (Mahalanobis distance) of the unknown from the centroid for each population, and calculates the probability associated with its quantile under the zero truncated normal distribution. This index takes into account position of the unknown in relation to the confidence envelope in all selected dimensions of the ordination.
Each of these approaches provides evidence; none is 100% definitive, and they need to be interpreted cautiously. They are best applied sequentially.
In deciding the assignment, the script considers an individual to be an outlier with respect to a particular population at alpha = 0.001 as default.
A genlight object containing only those populations that are putative source populations for the unknown individual.
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
# Test run with a focal individual from the Macleay River (EmmacMaclGeor)
test <- gl.assign.pa(testset.gl, unknown = "UC_00146", nmin = 10, threshold = 1,
                     verbose = 3)
test_2 <- gl.assign.pca(test, unknown = "UC_00146", plevel = 0.95, verbose = 3)
This script removes individuals suspected of being related as
parent-offspring, using the output of the function
gl.report.parent.offspring, which examines the frequency of pedigree
inconsistent loci, that is, those loci that are homozygotes in the parent for
the reference allele, and homozygous in the offspring for the alternate
allele. This condition is not consistent with any pedigree, regardless of the
(unknown) genotype of the other parent. The pedigree inconsistent loci are
counted as an indication of whether or not it is reasonable to propose the
two individuals are in a parent-offspring relationship.
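Counting pedigree inconsistent loci for a pair of individuals can be sketched in a few lines of base R. The sketch assumes genotypes coded as 0 (homozygous reference), 1 (heterozygous) and 2 (homozygous alternate); `count_inconsistent()` and the toy genotypes are hypothetical.

```r
# Hypothetical helper: loci where one individual is homozygous for the
# reference allele (0) and the other homozygous for the alternate allele (2).
# Loci with a missing call in either individual are ignored.
count_inconsistent <- function(ind1, ind2) {
  sum(abs(ind1 - ind2) == 2, na.rm = TRUE)
}

# Toy genotypes for three individuals at six loci
geno <- rbind(
  ind1 = c(0, 2, 1, 0, NA, 2),
  ind2 = c(2, 0, 1, 0, 2, 2),
  ind3 = c(0, 1, 1, 0, 0, 2)
)

# Count inconsistent loci for every pair of individuals
pairs <- t(combn(rownames(geno), 2))
counts <- apply(pairs, 1, function(p) {
  count_inconsistent(geno[p[1], ], geno[p[2], ])
})
print(data.frame(ind_a = pairs[, 1], ind_b = pairs[, 2], inconsistent = counts))
```

In this toy set only the pair ind1 and ind3 has no inconsistent loci, so it is the only pair compatible with a parent-offspring relationship.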
gl.filter.parent.offspring( x, min.rdepth = 12, min.reproducibility = 1, range = 1.5, method = "best", rm.monomorphs = FALSE, plot_theme = theme_dartR(), plot_colors = gl.colors(2), plot.file = NULL, plot.dir = NULL, verbose = NULL )
x |
Name of the genlight object containing the SNP genotypes [required]. |
min.rdepth |
Minimum read depth to include in analysis [default 12]. |
min.reproducibility |
Minimum reproducibility to include in analysis [default 1]. |
range |
Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges]. |
method |
Method of selecting the individual to retain from each parent-offspring pair, 'best' (based on CallRate) or 'random' [default 'best']. |
rm.monomorphs |
If TRUE, remove monomorphic loci after filtering individuals [default FALSE]. |
plot_theme |
Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors |
List of two color names for the borders and fill of the plots [default gl.colors(2)]. |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL] |
plot.dir |
Directory to save the plot RDS files [default as specified by the global working directory or tempdir()] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log ; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
If two individuals are in a parent offspring relationship, the true number of
pedigree inconsistent loci should be zero, but SNP calling is not infallible.
Some loci will be miscalled. The problem thus becomes one of determining if
the two focal individuals have a count of pedigree inconsistent loci less
than would be expected of typical unrelated individuals. There are some quite
sophisticated software packages available to formally apply likelihoods to
the decision, but we use a simple outlier comparison.
To reduce the frequency of miscalls, and so emphasize the difference
between true parent-offspring pairs and unrelated pairs, the data can be
filtered on read depth. Typically minimum read depth is set to 5x, but you
can examine the distribution of read depths with the function
gl.report.rdepth and push this up with an acceptable loss of loci. 12x might
be a good minimum for this particular analysis. It is sensible also to push
the minimum reproducibility up to 1, if that does not result in an
unacceptable loss of loci. Reproducibility is stored in the slot
@other$loc.metrics$RepAvg and is defined as the proportion of technical
replicate assay pairs for which the marker score is consistent. You can
examine the distribution of reproducibility with the function
gl.report.reproducibility.
Note that the null expectation is not well defined, and the power reduced, if
the population from which the putative parent-offspring pairs are drawn
contains many sibs. Note also that if an individual has been genotyped twice
in the dataset, the replicate pair will be assessed by this script as being
in a parent-offspring relationship.
You should run gl.report.parent.offspring before filtering. Use this report
to decide min.rdepth and min.reproducibility and assess impact on your
dataset.
Note that if your dataset does not contain RepAvg or rdepth among the locus
metrics, the filters for reproducibility and read depth are not used.
Examples of other themes that can be used can be consulted in
The filtered genlight object, without the set of individuals in parent-offspring relationships. NULL if no parent-offspring relationships were found.
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
gl.report.rdepth, gl.report.reproducibility, gl.report.parent.offspring
out <- gl.filter.parent.offspring(testset.gl[1:10, 1:50])
This function calculates the mean probability of identity by state (IBS) across loci that would result from all the possible crosses of the individuals analyzed. IBD is calculated by an additive relationship matrix approach developed by Endelman and Jannink (2012) as implemented in the function A.mat (package rrBLUP).
gl.grm( x, plotheatmap = TRUE, palette_discrete = NULL, palette_convergent = NULL, legendx = 0, legendy = 0.5, plot.file = NULL, plot.dir = NULL, verbose = NULL, ... )
x |
Name of the genlight object containing the SNP data [required]. |
plotheatmap |
A switch if a heatmap should be shown [default TRUE]. |
palette_discrete |
Colors for the populations, for example as generated by gl.select.colors [default NULL]. |
palette_convergent |
A convergent palette for the IBD values [default convergent_palette]. |
legendx |
x coordinates for the legend [default 0]. |
legendy |
y coordinates for the legend [default 0.5]. |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL] |
plot.dir |
Directory in which to save files [default = working directory] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log ; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
... |
Parameters passed to function A.mat from package rrBLUP. |
Two or more alleles are identical by descent (IBD) if they are identical copies of the same ancestral allele in a base population. The additive relationship matrix is a theoretical framework for estimating a relationship matrix that is consistent with an approach to estimate the probability that the alleles at a random locus are identical in state (IBS).
This function also plots a heatmap, and a dendrogram, of IBD values where each diagonal element has a mean that equals 1+f, where f is the inbreeding coefficient (i.e. the probability that the two alleles at a randomly chosen locus are IBD from the base population). As this probability lies between 0 and 1, the diagonal elements range from 1 to 2. Because the inbreeding coefficients are expressed relative to the current population, the mean of the off-diagonal elements is -(1+f)/n, where n is the number of individuals. Individual names are shown in the margins of the heatmap and colors represent different populations.
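The structure of an additive relationship matrix can be illustrated in base R. The sketch below assumes genotypes coded as 0/1/2 counts of the alternate allele and uses the plain (unshrunk) estimator A = WW'/c with W the genotypes centred on twice the allele frequency and c = 2 * sum(p * (1 - p)); it is not a call to A.mat and omits the options that function offers.

```r
set.seed(11)

# Simulate 10 individuals at 200 loci (0/1/2 coding)
n_ind <- 10
n_loci <- 200
alt_freq <- runif(n_loci, 0.2, 0.8)
X <- matrix(rbinom(n_ind * n_loci, 2, rep(alt_freq, each = n_ind)),
            nrow = n_ind)

# Sample allele frequencies; drop monomorphic loci, which carry no information
p_hat <- colMeans(X) / 2
keep <- p_hat > 0 & p_hat < 1
X <- X[, keep, drop = FALSE]
p_hat <- p_hat[keep]

# Centre each locus on twice its allele frequency and scale the cross-product
W <- sweep(X, 2, 2 * p_hat)
A <- tcrossprod(W) / (2 * sum(p_hat * (1 - p_hat)))

mean_diag <- mean(diag(A))          # estimates 1 + f
mean_f <- mean_diag - 1             # average inbreeding coefficient
mean_off <- mean(A[upper.tri(A)])   # negative, because rows of A sum to zero

# For this unshrunk sketch the identity is exact with n_ind - 1 in the
# denominator, which is close to -(1 + f) / n_ind when n_ind is large.
print(c(mean_f = mean_f, mean_off = mean_off,
        expected_off = -mean_diag / (n_ind - 1)))
```

Because the genotypes are centred on the sample frequencies, all entries of the matrix sum to zero, which is why relatedness is expressed relative to the current population rather than to an ancestral base population.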
An identity by descent matrix
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Endelman, J. B. (2011). Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4, 250.
Endelman, J. B. , Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomics, Genetics 2, 1405.
Other inbreeding functions:
gl.grm.network()
gl.grm(platypus.gl[1:10, 1:100])
This script takes a G matrix generated by gl.grm and represents the
relationship among the specimens as a network diagram. In order to use this
script, a decision is required on a threshold for relatedness to be
represented as a link in the network, and on the layout used to create the
diagram.
gl.grm.network( G, x, method = "fr", node.size = 8, node.label = TRUE, node.label.size = 2, node.label.color = "black", link.color = NULL, link.size = 2, relatedness_factor = 0.125, title = "Network based on a genomic relationship matrix", palette_discrete = gl.select.colors(x, library = "brewer", palette = "PuOr", ncolors = nPop(x), verbose = 0), plot.dir = NULL, plot.file = NULL, verbose = NULL )
G |
A genomic relationship matrix (GRM) generated by gl.grm [required]. |
x |
A genlight object from which the G matrix was generated [required]. |
method |
One of 'fr', 'kk', 'gh' or 'mds' [default 'fr']. |
node.size |
Size of the symbols for the network nodes [default 8]. |
node.label |
TRUE to display node labels [default TRUE]. |
node.label.size |
Size of the node labels [default 2]. |
node.label.color |
Color of the text of the node labels [default 'black']. |
link.color |
Colors for links [default gl.select.colors]. |
link.size |
Size of the links [default 2]. |
relatedness_factor |
Factor of relatedness [default 0.125]. |
title |
Title for the plot [default 'Network based on a genomic relationship matrix']. |
palette_discrete |
A discrete set of colors with as many colors as there are populations in the dataset [default a 'PuOr' brewer palette generated by gl.select.colors]. |
plot.dir |
Directory to save the plot RDS files [default as specified by the global working directory or tempdir()] |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log ; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity]. |
The gl.grm.network function takes a genomic relationship matrix (GRM) generated by the gl.grm function to represent the relationship among individuals in the dataset as a network diagram. To generate the GRM, the function gl.grm uses the function A.mat from package rrBLUP, which implements the approach developed by Endelman and Jannink (2012).
The GRM is an estimate of the proportion of alleles that two individuals have in common. It is generated by estimating the covariance of the genotypes between two individuals, i.e. how much genotypes in the two individuals correspond with each other. This covariance depends on the probability that alleles at a random locus are identical by state (IBS). Two alleles are IBS if they represent the same allele. Two alleles are identical by descent (IBD) if one is a physical copy of the other or if they are both physical copies of the same ancestral allele. Note that IBD is complicated to determine. IBD implies IBS, but not conversely. However, as the number of SNPs in a dataset increases, the mean probability of IBS approaches the mean probability of IBD.
It follows that the off-diagonal elements of the GRM are two times the kinship coefficient, i.e. the probability that two alleles at a random locus drawn from two individuals are IBD. Additionally, the diagonal elements of the GRM are 1+f, where f is the inbreeding coefficient of each individual, i.e. the probability that the two alleles at a random locus are IBD.
Choosing a meaningful threshold to represent the relationship between individuals is tricky because IBD is not an absolute state but is relative to a reference population for which there is generally little information so that we can estimate the kinship of a pair of individuals only relative to some other quantity. To deal with this, we can use the average inbreeding coefficient of the diagonal elements as the reference value. For this, the function subtracts 1 from the mean of the diagonal elements of the GRM. In a second step, the off-diagonal elements are divided by 2, and finally, the mean of the diagonal elements is subtracted from each off-diagonal element after dividing them by 2. This approach is similar to the one used by Goudet et al. (2018).
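One reading of the rescaling described above is sketched below in base R: take the mean of the diagonal minus 1 as the reference inbreeding value, halve the GRM to obtain kinship, subtract the reference value, and keep pairs whose adjusted kinship reaches the relatedness threshold. This is an interpretation of the text, not the function's source code, and the toy matrix `G` and all variable names are hypothetical.

```r
# Toy genomic relationship matrix for three individuals
G <- matrix(c(1.1, 0.5, 0.0,
              0.5, 1.0, 0.1,
              0.0, 0.1, 0.9),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Step 1: reference value = mean inbreeding coefficient of the diagonal
mean_f <- mean(diag(G)) - 1

# Steps 2 and 3: halve the GRM to get kinship, then subtract the reference
kinship <- G / 2 - mean_f

# Pairs whose adjusted kinship reaches the threshold become network links
relatedness_factor <- 0.125   # half-sibling level in the table that follows
idx <- which(kinship >= relatedness_factor & upper.tri(kinship),
             arr.ind = TRUE)
links <- data.frame(from = rownames(G)[idx[, "row"]],
                    to = colnames(G)[idx[, "col"]],
                    kinship = kinship[idx])
print(links)
```

In this toy matrix only the pair a and b, with an adjusted kinship of about 0.25, would be drawn as a link.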
Below is a table modified from Speed & Balding (2015) showing kinship values, and their confidence intervals (CI), for different relationships that could be used to guide the choosing of the relatedness threshold in the function.
|Relationship|Kinship|95% CI|
|Identical twins/clones/same individual | 0.5 | - |
|Sibling/Parent-Offspring | 0.25 | (0.204, 0.296)|
|Half-sibling | 0.125 | (0.092, 0.158)|
|First cousin | 0.062 | (0.038, 0.089)|
|Half-cousin | 0.031 | (0.012, 0.055)|
|Second cousin | 0.016 | (0.004, 0.031)|
|Half-second cousin | 0.008 | (0.001, 0.020)|
|Third cousin | 0.004 | (0.000, 0.012)|
|Unrelated | 0 | - |
Four layout options are implemented in this function:
'fr' Fruchterman-Reingold layout layout_with_fr (package igraph)
'kk' Kamada-Kawai layout layout_with_kk (package igraph)
'gh' Graphopt layout layout_with_graphopt (package igraph)
'mds' Multidimensional scaling layout layout_with_mds (package igraph)
A network plot showing relatedness between individuals
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
Endelman, J. B. , Jannink, J.-L. (2012). Shrinkage estimation of the realized relationship matrix. G3: Genes, Genomics, Genetics 2, 1405.
Goudet, J., Kay, T., & Weir, B. S. (2018). How to estimate kinship. Molecular Ecology, 27(20), 4121-4135.
Speed, D., & Balding, D. J. (2015). Relatedness in the post-genomic era: is it still useful?. Nature Reviews Genetics, 16(1), 33-44.
Other inbreeding functions:
gl.grm()
if (requireNamespace("igraph", quietly = TRUE) &
    requireNamespace("rrBLUP", quietly = TRUE) &
    requireNamespace("fields", quietly = TRUE)) {
  t1 <- possums.gl
  # filtering on call rate
  t1 <- gl.filter.callrate(t1)
  t1 <- gl.subsample.loc(t1, n = 100)
  # relatedness matrix
  res <- gl.grm(t1, plotheatmap = FALSE)
  # relatedness network
  res2 <- gl.grm.network(res, t1, relatedness_factor = 0.125)
}
This script takes a distance matrix generated by dist() and represents the relationship among the specimens as a network diagram. In order to use this script, a decision is required on a threshold for relatedness to be represented as link in the network, and on the layout used to create the diagram.
gl.plot.network( D, x = NULL, method = "fr", node.size = 3, node.label = FALSE, node.label.size = 0.7, node.label.color = "black", alpha = 0.005, title = "Network based on genetic distance", verbose = NULL )
D |
A distance or dissimilarity matrix generated by dist() or gl.dist() [required]. |
x |
A genlight object from which the D matrix was generated [default NULL]. |
method |
One of "fr", "kk" or "drl" [default "fr"]. |
node.size |
Size of the symbols for the network nodes [default 3]. |
node.label |
TRUE to display node labels [default FALSE]. |
node.label.size |
Size of the node labels [default 0.7]. |
node.label.color |
Color of the text of the node labels [default 'black']. |
alpha |
Upper threshold to determine which links between nodes to display [default 0.005]. |
title |
Title for the plot [default "Network based on genetic distance"]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
The threshold for relatedness to be represented as a link in the network is specified as a quantile. Those relatedness measures above the quantile are plotted as links, those below the quantile are not. Often you are looking for relatedness outliers in comparison with the overall relatedness among individuals, so a very conservative quantile is used (e.g. 0.004), but ultimately, this decision is made as a matter of trial and error. One way to approach this trial and error is to try to achieve a sparse set of links between unrelated 'background' individuals so that the stronger links are preferentially shown.
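Selecting links by a quantile of the pairwise values can be sketched in base R. The sketch assumes that, for a distance matrix, the closest pairs are the ones of interest, so links are kept when the distance is at or below the alpha quantile; whether the function applies the cut-off in exactly this direction is an assumption, and the toy coordinates are hypothetical.

```r
# Toy positions of five specimens on a single axis
coords <- c(a = 0, b = 0.1, c = 5, d = 10, e = 20)
D <- as.matrix(dist(coords))

# Quantile of all pairwise distances used as the link threshold
alpha <- 0.1
d_vals <- D[upper.tri(D)]
cutoff <- quantile(d_vals, probs = alpha)

# Keep only the pairs that are at least as close as the cut-off
idx <- which(D <= cutoff & upper.tri(D), arr.ind = TRUE)
links <- data.frame(from = rownames(D)[idx[, "row"]],
                    to = colnames(D)[idx[, "col"]],
                    distance = D[idx])
print(links)
```

Raising alpha admits more links, so trying a few values and inspecting how many background links appear is one way to carry out the trial and error described above.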
There are several layouts from which to choose. The most popular are given as options in this script.
fr – Fruchterman, T.M.J. and Reingold, E.M. (1991). Graph Drawing by Force-directed Placement. Software – Practice and Experience 21:1129-1164.
kk – Kamada, T. and Kawai, S.: An Algorithm for Drawing General Undirected Graphs. Information Processing Letters 31:7-15, 1989.
drl – Martin, S., Brown, W.M., Klavans, R., Boyack, K.W., DrL: Distributed Recursive (Graph) Layout. SAND Reports 2936:1-10, 2008.
Colors of node symbols are those of the rainbow.
returns no value (i.e. NULL)
Custodian: Arthur Georges – Post to https://groups.google.com/d/forum/dartr
if ((requireNamespace("rrBLUP", quietly = TRUE)) &
    (requireNamespace("gplots", quietly = TRUE))) {
  test <- gl.subsample.loc(platypus.gl, n = 100)
  test <- gl.keep.ind(test, ind.list = indNames(test)[1:10])
  D <- gl.grm(test, legendx = 0.04)
  gl.plot.network(D, test)
}
This script examines the frequency of pedigree inconsistent loci, that is, those loci that are homozygotes in the parent for the reference allele, and homozygous in the offspring for the alternate allele. This condition is not consistent with any pedigree, regardless of the (unknown) genotype of the other parent. The pedigree inconsistent loci are counted as an indication of whether or not it is reasonable to propose the two individuals are in a parent-offspring relationship.
gl.report.parent.offspring( x, min.rdepth = 12, min.reproducibility = 1, range = 1.5, plot_theme = theme_dartR(), plot_colors = gl.colors(2), plot.dir = NULL, plot.file = NULL, verbose = NULL )
x |
Name of the genlight object containing the SNP genotypes [required]. |
min.rdepth |
Minimum read depth to include in analysis [default 12]. |
min.reproducibility |
Minimum reproducibility to include in analysis [default 1]. |
range |
Specifies the range to extend beyond the interquartile range for delimiting outliers [default 1.5 interquartile ranges]. |
plot_theme |
Theme for the plot. See Details for options [default theme_dartR()]. |
plot_colors |
List of two color names for the borders and fill of the plots [default gl.colors(2)]. |
plot.dir |
Directory to save the plot RDS files [default as specified by the global working directory or tempdir()] |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL] |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
If two individuals are in a parent-offspring relationship, the true number of
pedigree-inconsistent loci should be zero, but SNP calling is not infallible.
Some loci will be miscalled. The problem thus becomes one of determining
whether the two focal individuals have a count of pedigree-inconsistent loci
lower than would be expected of typical unrelated individuals. There are some
quite sophisticated software packages available to formally apply likelihoods
to the decision, but we use a simple outlier comparison.
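The counting and outlier comparison described above can be sketched in base R. This is a minimal illustration on a toy genotype matrix, not the package's internal code: the names `geno`, `count_inconsistent` and `flagged` are hypothetical, and the lower fence is computed here from quantiles (first quartile minus `range` interquartile ranges), which is an assumption about how the outlier rule might be applied.

```r
# Toy genotypes: rows are individuals, columns are loci.
# Coding as in a genlight object: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate, NA = missing.
geno <- rbind(
  A = c(0, 0, 2, 2, 1, 0),
  B = c(0, 1, 2, 1, 1, 0),
  C = c(2, 2, 0, 0, 1, 2),
  D = c(2, 0, 0, 2, NA, 2)
)

# Count, for every pair of individuals, the loci at which the two are
# opposite homozygotes (0 vs 2); loci with missing data are ignored.
count_inconsistent <- function(g) {
  n <- nrow(g)
  out <- matrix(0L, n, n, dimnames = list(rownames(g), rownames(g)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      out[i, j] <- out[j, i] <- sum(abs(g[i, ] - g[j, ]) == 2, na.rm = TRUE)
    }
  }
  out
}

counts <- count_inconsistent(geno)

# Simple outlier comparison: pairs whose count falls below
# Q1 - range * IQR are candidate parent-offspring pairs.
range_iqr <- 1.5
pair_counts <- counts[upper.tri(counts)]
q <- quantile(pair_counts, c(0.25, 0.75), names = FALSE)
lower_fence <- q[1] - range_iqr * (q[2] - q[1])

# Row and column indices of the flagged pairs
# (in this toy data only the A-B pair falls below the fence)
flagged <- which(upper.tri(counts) & counts < lower_fence, arr.ind = TRUE)
print(counts)
print(flagged)
```

With only four individuals the quartiles are crude; in practice the comparison relies on a large number of pairs, most of them unrelated.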
To reduce the frequency of miscalls, and so emphasize the difference
between true parent-offspring pairs and unrelated pairs, the data can be
filtered on read depth.
Typically the minimum read depth is set to 5x, but you can examine the
distribution of read depths with the function gl.report.rdepth
and raise the threshold if the loss of loci is acceptable. 12x might be a good
minimum for this particular analysis. It is also sensible to raise the minimum
reproducibility to 1, if that does not result in an unacceptable loss of
loci. Reproducibility is stored in the slot @other$loc.metrics$RepAvg
and is defined as the proportion of technical replicate assay pairs for which
the marker score is consistent. You can examine the distribution of
reproducibility with the function gl.report.reproducibility.
Note that the null expectation is not well defined, and the power reduced, if
the population from which the putative parent-offspring pairs are drawn
contains many sibs. Note also that if an individual has been genotyped twice
in the dataset, the replicate pair will be assessed by this script as being
in a parent-offspring relationship.
The function gl.filter.parent.offspring will filter out those
individuals in a parent-offspring relationship.
Note that if your dataset does not contain RepAvg or rdepth among the locus
metrics, the filters for reproducibility and read depth are not used.
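As a rough base R sketch of this filtering logic, the example below applies the two thresholds to a made-up locus metrics table and skips a filter when its column is absent. The table `loc_metrics` and the helper `filter_loci` are hypothetical names used only for illustration, and the sketch assumes that loci at or above each threshold are kept.

```r
# Hypothetical locus metrics; in a genlight object the equivalent columns
# would be found in x@other$loc.metrics (the values here are made up).
loc_metrics <- data.frame(
  locus  = paste0("L", 1:6),
  rdepth = c(4.2, 12.0, 15.5, 30.1, 9.8, 18.3),
  RepAvg = c(1, 1, 0.98, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Keep loci that pass both thresholds; a filter is skipped when the
# corresponding metric is not available.
filter_loci <- function(metrics, min_rdepth = 12, min_reproducibility = 1) {
  keep <- rep(TRUE, nrow(metrics))
  if ("rdepth" %in% names(metrics)) {
    keep <- keep & metrics$rdepth >= min_rdepth
  }
  if ("RepAvg" %in% names(metrics)) {
    keep <- keep & metrics$RepAvg >= min_reproducibility
  }
  metrics$locus[keep]
}

kept_both <- filter_loci(loc_metrics)
# Same data without RepAvg: only the read depth filter is applied
kept_depth_only <- filter_loci(loc_metrics[, c("locus", "rdepth")])
print(kept_both)
print(kept_depth_only)
```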
Examples of other themes that can be used can be consulted in
A set of individuals in parent-offspring relationship. NULL if no parent-offspring relationships were found.
Custodian: Arthur Georges (Post to https://groups.google.com/d/forum/dartr)
gl.report.rdepth, gl.report.reproducibility, gl.filter.parent.offspring
out <- gl.report.parent.offspring(testset.gl[1:10, 1:100])
Run program EMIBD9
gl.run.EMIBD9(
  x,
  outfile = "EMIBD9_Res.ibd9",
  outpath = tempdir(),
  emibd9.path = getwd(),
  Inbreed = TRUE,
  ISeed = 42,
  plot.out = TRUE,
  plot.dir = NULL,
  plot.file = NULL,
  verbose = NULL
)
x |
Name of the genlight object containing the SNP data [required]. |
outfile |
A string, giving the path and name of the output file [default "EMIBD9_Res.ibd9"]. |
outpath |
Path where to save the output file. Use outpath=getwd() or outpath='.' when calling this function to direct output files to your working or current directory [default tempdir(), mandated by CRAN]. |
emibd9.path |
Path to the folder containing the EMIBD9 files. Please note there are two different executables depending on your OS: EM_IBD_P.exe (Windows) and EM_IBD_P (Mac, Linux). You only need to point to the folder (the function will recognise which OS you are running) [default getwd()]. |
Inbreed |
A Boolean (FALSE/0 or TRUE/1) indicating whether inbreeding is disallowed or allowed when estimating IBD coefficients [default TRUE]. |
ISeed |
An integer used to seed the random number generator [default 42]. |
plot.out |
A boolean that indicates whether to plot the results [default TRUE]. |
plot.dir |
Directory to save the plot RDS files [default as specified by the global working directory or tempdir()]. |
plot.file |
Name for the RDS binary file to save (base name only, exclude extension) [default NULL]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default NULL, unless specified using gl.set.verbosity] |
Download the program from here:
https://www.zsl.org/about-zsl/resources/software/emibd9
For Windows, Mac and Linux, install the program and then point to the folder containing the executable: EM_IBD_P.exe (Windows) or EM_IBD_P (Mac, Linux). If it runs very slowly, you may want to create the input files using this function and then run EMIBD9 in parallel, following the documentation provided by its authors [this requires mpiexec to be installed].
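Before calling the function, it can help to confirm that the folder passed as emibd9.path actually contains the executable for your OS. The following is a minimal base R check, not the package's own detection code; the folder `emibd9_dir` is a hypothetical location that you would replace with your own install folder.

```r
# Hypothetical install folder; replace with the folder where EMIBD9 lives
emibd9_dir <- file.path(path.expand("~"), "EMIBD9")

# Executable name by operating system, as listed in the text above
exe_name <- if (.Platform$OS.type == "windows") "EM_IBD_P.exe" else "EM_IBD_P"
exe_path <- file.path(emibd9_dir, exe_name)

if (file.exists(exe_path)) {
  message("Found EMIBD9 executable: ", exe_path)
} else {
  message("No executable at ", exe_path, "; check the emibd9.path argument")
}
```

If the file is found, the same folder can be supplied as `emibd9.path = emibd9_dir`.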
A matrix with pairwise relatedness
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
Wang, J. (2022). A joint likelihood estimator of relatedness and allele frequencies from a small sample of individuals. Methods in Ecology and Evolution, 13(11), 2443-2462.
## Not run:
# Running this function requires EMIBD9 to be installed on your computer
t1 <- gl.filter.allna(platypus.gl)
res_rel <- gl.run.EMIBD9(t1)
## End(Not run)
This function takes one individual and estimates the probability that it originates from each population, based on multilocus genotype frequencies.
utils.assignment(x, unknown, verbose = NULL)
x |
Name of the genlight object containing the SNP data [required]. |
unknown |
Name of the individual to be assigned to a population [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
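The general idea of multilocus assignment can be sketched in base R as follows. This is an illustration under stated assumptions (biallelic SNPs coded 0/1/2, Hardy-Weinberg proportions within populations, independent loci), not the code used by this function or by gstudio; the objects `geno`, `pop`, `unknown` and `result` are hypothetical toy data and names.

```r
# Toy reference genotypes: counts of the alternate allele (0, 1, 2);
# rows are individuals of known origin, columns are loci.
geno <- rbind(
  p1_a = c(0, 0, 1, 0),
  p1_b = c(0, 1, 0, 0),
  p1_c = c(0, 0, 0, 1),
  p2_a = c(2, 2, 1, 2),
  p2_b = c(2, 1, 2, 2),
  p2_c = c(1, 2, 2, 1)
)
pop <- rep(c("pop1", "pop2"), each = 3)

# Genotype of the individual to assign (not part of the reference set)
unknown <- c(0, 0, 1, 0)

# Alternate allele frequency per population (rows) and locus (columns)
pops <- unique(pop)
freq <- t(sapply(pops, function(p) {
  colMeans(geno[pop == p, , drop = FALSE], na.rm = TRUE) / 2
}))

# Probability of the unknown genotype in each population: under
# Hardy-Weinberg proportions a genotype is Binomial(2, p) at each locus,
# and the per-locus probabilities are multiplied across loci.
likelihood <- apply(freq, 1, function(p) {
  prod(dbinom(unknown, size = 2, prob = p))
})

# Rescale so that the probabilities sum to one across populations
posterior <- likelihood / sum(likelihood)

result <- data.frame(
  Population = names(likelihood),
  Probability = likelihood,
  Posterior = posterior,
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(result)
```

Note that an allele absent from a population's sample gives a frequency of zero, so any individual carrying it receives probability zero for that population; a practical implementation needs to decide how to treat such cases.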
A data.frame consisting of assignment probabilities for each population.
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
require("dartR.data")
res <- utils.assignment(platypus.gl, unknown = "T27")
This function takes one individual and estimates the probability that it originates from each population, based on multilocus genotype frequencies.
utils.assignment_2(x, unknown, verbose = NULL)
x |
Name of the genlight object containing the SNP data [required]. |
unknown |
Name of the individual to be assigned to a population [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
A data.frame consisting of assignment probabilities for each population.
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
require("dartR.data")
res <- utils.assignment_2(platypus.gl, unknown = "T27")
This function takes one individual and estimates the probability that it originates from each population, based on multilocus genotype frequencies.
utils.assignment_3(x, unknown, verbose = 2)
x |
Name of the genlight object containing the SNP data [required]. |
unknown |
Name of the individual to be assigned to a population [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
A data.frame consisting of assignment probabilities for each population.
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
require("dartR.data")
res <- utils.assignment_3(platypus.gl, unknown = "T27")
This function takes one individual and estimates the probability that it originates from each population, based on multilocus genotype frequencies.
utils.assignment_4(x, unknown, verbose = 2)
x |
Name of the genlight object containing the SNP data [required]. |
unknown |
Name of the individual to be assigned to a population [required]. |
verbose |
Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2, unless specified using gl.set.verbosity]. |
This function is a re-implementation of the function multilocus_assignment from package gstudio. Description of the method used in this function can be found at: https://dyerlab.github.io/applied_population_genetics/population-assignment.html
A data.frame consisting of assignment probabilities for each population.
Custodian: Luis Mijangos – Post to https://groups.google.com/d/forum/dartr
require("dartR.data")
res <- utils.assignment_4(platypus.gl, unknown = "T27")