RClone: An R package to analyse population genetics data on partial asexuals

RClone is the R version of GenClone, extended with support for several populations, the custom definition and use of MultiLocus Lineages, and p-value calculation for psex (the probability of originating from distinct sexual events) and psex_Fis (which accounts for departure from Hardy-Weinberg equilibrium), as in MLGsim.

From left to right and top to bottom: Gracilaria gracilis © Station biologique de Roscoff - Christophe DESTOMBE, Acyrthosiphon pisum © INRA - Bernard CHAUBET, cold water coral reef (Lophelia pertusa, Madrepora oculata) © IFREMER - BobEco Cruise 2011, Posidonia oceanica © IFREMER, Alexandrium minutum © CNRS - Aliou DIA, Melampsora larici-populina © INRA, Laminaria digitata © Station biologique de Roscoff - Marie-Laure GUILLEMIN, cold water coral reef (Lophelia pertusa, Madrepora oculata) © IFREMER - BobEco Cruise 2011, Zostera marina © IFREMER - Olivier DUGORNAY, Trypanosoma brucei gambiense on a mouse blood smear © IRD - Sophie RAVEL.

Clonality is a widespread trait across the Tree of Life (Halkett, Simon & Balloux 2005) that allows organisms to produce offspring without sexual reproduction. These offspring, or clones, are genetically identical to their parent, with the exception of somatic mutations. A key step in the genetic analysis of a potentially clonal dataset is therefore a genotypic analysis to discriminate Multi Locus Genotypes (MLG). The study of clonality includes its detection as well as the estimation of its quantitative and qualitative consequences on the demographic and evolutionary trajectories of populations. It thus requires the ability to distinguish between two central components: the demographic individual (ramet, i.e. the demographic unit, which could be a module in a clonal plant, a colony in corals, or an individual aphid in insects) and the genetic individual (genet, i.e. the cluster of ramets that all derive from a single event of sexual reproduction followed by clonal multiplication; “in a clonally propagating organism, the entity that persists and evolves”, Ayala 1998). For this purpose, molecular markers can be used to identify ramets from the same genet, as these are expected to share the same MLG.

However, slightly distinct MLGs may belong to the same clonal lineage because of somatic mutations (Klekowski 2003) and scoring errors (Douhovnikoff & Dodd 2003; Meirmans & Van Tienderen 2004). Working with large datasets, now made possible by Next Generation Sequencing (NGS), increases the probability of encountering uncommon somatic mutations and reduces the time that can be devoted to resolving scoring ambiguities. To make ecological and evolutionary analyses possible and easier despite these issues, the concept of Multi Locus Lineages (MLL) was introduced (Arnaud-Haond et al. 2007b) to define clusters of MLGs that belong to the same genet, and therefore share the same original event of sexual reproduction, but appear slightly distinct because of somatic mutations or scoring errors.
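
The distinction between MLGs and MLLs can be illustrated with a minimal sketch. It is written in Python for self-containment and does not use or reproduce RClone's functions. The function names, the toy genotypes, the positional allele comparison (which assumes alleles are sorted within each locus) and the threshold of one allele difference are all assumptions made for illustration. Identical genotypes are grouped into MLGs, and MLGs separated by at most `threshold` allele differences are then merged into MLLs by single linkage.

```python
from itertools import combinations


def assign_mlg(genotypes):
    """Give each sampling unit (ramet) an MLG id; identical genotypes share an id."""
    ids = {}
    return [ids.setdefault(tuple(g), len(ids)) for g in genotypes]


def allele_differences(a, b):
    """Count positions at which two genotypes carry different alleles.

    Assumes both genotypes list the same loci in the same order,
    with alleles sorted within each locus.
    """
    return sum(x != y for x, y in zip(a, b))


def assign_mll(genotypes, threshold):
    """Merge MLGs that differ by at most `threshold` alleles (single linkage)."""
    distinct = list(dict.fromkeys(tuple(g) for g in genotypes))
    parent = list(range(len(distinct)))  # union-find forest over distinct MLGs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(distinct)), 2):
        if allele_differences(distinct[i], distinct[j]) <= threshold:
            parent[find(j)] = find(i)

    index = {g: k for k, g in enumerate(distinct)}
    roots = {}
    return [roots.setdefault(find(index[tuple(g)]), len(roots)) for g in genotypes]


# Toy diploid data: three loci, two alleles per locus, one tuple per ramet.
samples = [
    (120, 124, 88, 90, 201, 201),
    (120, 124, 88, 90, 201, 201),  # exact copy of the first ramet
    (120, 124, 88, 90, 201, 203),  # one allele differs (mutation or scoring error)
    (118, 122, 86, 92, 199, 205),  # clearly distinct genotype
]

print(assign_mlg(samples))               # [0, 0, 1, 2]
print(assign_mll(samples, threshold=1))  # [0, 0, 0, 1]
```

With a threshold of zero, the MLL assignment reduces to the MLG assignment. In practice the threshold would be chosen from the distribution of pairwise genetic distances rather than fixed in advance.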

Briefly, the most commonly used clonality indices in the literature can be computed with clonal_index: R, the richness index; S, the Simpson index applied to clonality; Hill, the reciprocal of the Simpson index; H’’, the Shannon-Wiener index; and the corresponding evenness indices, V, the Simpson evenness index, and J’, the Shannon-Wiener evenness index. The distribution of size (in number of sampling units) among MLGs or MLLs can also be described through the Pareto distribution (Pareto 1887 in Vidondo et al. 1997) with Pareto_index. When sampling coordinates are available, functions for studying the spatial components of clonality are also included. edge_effect computes the edge effect by comparing the average distance between single-unit lineages and the center of the sampling area with the average distance between all units and the center. agg_index computes the aggregation of clones by comparing the probability of clonal identity between pairs of spatially closest units with that between randomly chosen pairs of units. Spatial autocorrelation analyses, run with autocorrelation, are used to determine the scale of spatial dependence of clonal and genetic diversity. clonal_sub computes the clonal subrange, which corresponds to the spatial scale beyond which clonality no longer affects the genetic structure. Finally, a summary function, genclone, gathers 17 statistics describing clonal data and, when relevant, their significance (p-values).
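
To make the richness and evenness indices concrete, here is a minimal sketch in Python. It is not RClone's clonal_index, and the estimators RClone applies (for example, bias corrections) may differ. The function name and the example data are hypothetical. The formulas are the textbook definitions: clonal richness R = (G - 1)/(N - 1) for G lineages among N sampling units, the unbiased Simpson complement, the Shannon-Wiener index with natural logarithms, and Pielou's evenness, i.e. the Shannon-Wiener index divided by ln(G).

```python
import math
from collections import Counter


def clonal_indices(lineage_ids):
    """Compute textbook clonal diversity indices from MLG or MLL ids.

    `lineage_ids` holds one id per sampling unit; units sharing an id
    belong to the same MLG (or MLL).
    """
    counts = list(Counter(lineage_ids).values())
    n = sum(counts)   # number of sampling units, N
    g = len(counts)   # number of distinct lineages, G
    if n < 2:
        raise ValueError("at least two sampling units are required")

    richness = (g - 1) / (n - 1)
    # Unbiased Simpson complement: probability that two units drawn
    # without replacement belong to different lineages.
    simpson = 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
    shannon = -sum((c / n) * math.log(c / n) for c in counts)
    # Evenness is undefined when only one lineage is present.
    pielou = shannon / math.log(g) if g > 1 else float("nan")

    return {"N": n, "G": g, "R": richness, "simpson": simpson,
            "shannon": shannon, "pielou": pielou}


# Four sampling units: three belong to lineage 0, one to lineage 1.
result = clonal_indices([0, 0, 0, 1])
print(result["R"])        # (2 - 1) / (4 - 1)
print(result["simpson"])  # 1 - 6 / 12
```

The same function accepts either MLG or MLL identifiers, which mirrors the idea that the indices can be computed on either definition of a lineage.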

RClone provides all the functions implemented in its parent software, GenClone, including the study of the spatial components of clonality. It also implements a semi-automatic procedure to define MLLs, and MLLs can be used with the other functions of the package. RClone relaxes the previous limitations of GenClone on the number of samples and loci, handles multiple populations, and enables the analysis of large datasets derived from broader access to NGS. It also assesses the significance of Psex through critical probabilities (p-values) based on population simulations, a method derived from MLGsim (Stenberg, Lundmark & Saura 2003) and MLGsim 2.0 (Ivens, van de Sanden & Bakker 2012) and implemented with the authors’ permission. Moreover, RClone on CRAN offers the benefits of an active and collaborative open-source platform: code availability, reproducible research and data transfer among packages.

The RClone functions require genotypes for codominant markers, an indication of the haploid or diploid nature of the organism, and x/y sampling coordinates for spatial analyses. Missing data are not yet supported as such; if included, they are treated as new alleles. The available functions fall into four main themes: (i) tests of the reliability of the data set for discriminating MLGs, (ii) determination of clones among MLGs and of clonal lineages using genetic distances, (iii) calculation of genotypic richness and evenness indices with MLGs or custom MLLs, and (iv) description of the spatial aspects of clonality. Several functions allow data import, conversion and export with adegenet, Genetix (Belkhir et al. 1996-2004) or Arlequin (Excoffier, Laval & Schneider 2005).
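
Because missing data are treated as new alleles, a unit with one unscored allele is counted as a distinct MLG. The short Python sketch below illustrates that effect and one possible way to set incomplete units aside before analysis. It is independent of RClone: the `"NA"` missing code, the function names and the toy genotypes are hypothetical.

```python
MISSING = "NA"  # hypothetical missing-allele code; use the one in your data


def count_mlg(genotypes):
    """Count distinct genotypes, comparing every allele code literally."""
    return len(set(map(tuple, genotypes)))


def split_by_completeness(genotypes, missing=MISSING):
    """Separate fully scored units from units with at least one missing allele."""
    complete, incomplete = [], []
    for g in genotypes:
        (incomplete if missing in g else complete).append(tuple(g))
    return complete, incomplete


# Two loci, diploid: the third unit matches the others except for one unscored allele.
units = [
    ("120", "124", "088", "090"),
    ("120", "124", "088", "090"),
    ("120", "124", "088", "NA"),
]

complete, incomplete = split_by_completeness(units)
print(count_mlg(units))     # 2: the missing code behaves like a new allele
print(count_mlg(complete))  # 1: only fully scored units are compared
```

Whether to drop incomplete units, drop the affected loci, or re-score the samples depends on the data set; the sketch only shows why the choice matters.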

The RClone package can be downloaded from GitHub (latest version) or CRAN.

Two vignettes are available:

one-population analyses:

multi-population analyses: