genoPlotR2: Automatable generation of visually appealing gene maps
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
In comparative genomics, gene maps are a commonly used tool to visualize gene neighborhoods in multiple genomes at the same time. While existing tools are tailored for making visually appealing plots of well-studied regions, they often do not facilitate the exploration of data. Usually, they require fully annotated genomes or precise coordinates of the gene neighborhoods in question. These limitations inhibit researchers in their ability to explore their datasets visually.
To address this, we developed genoPlotR2, an R package designed to make it easier to create gene maps on the fly. The accompanying wrapper script, run_genoPlotR2.R, uses genoPlotR2 to generate gene maps from the command line. genoPlotR2 retains all the customization options that were already present in genoPlotR, while adding features that reduce the workload and improve automatability. There are new options for reading in the necessary genomic information, and the existing methods have been enhanced. Once parsed, this data is now easier to manipulate.
BLAST and DIAMOND can now be used from within R to create comparisons between the DNA segments. This way, only the pairwise sequence alignments that are necessary for the gene map are made, taking the order of the genomes into account. Alternatively, these comparisons can be created from the output of several different gene clustering algorithms. Results from these algorithms need to be generated only once, even if genomes are added, removed, or reordered. Gene clustering has further advantages: users can define custom groups (such as the pathways genes are involved in), and clustering adds an identifying attribute to the genomic features and comparisons, which can then be used to assign colors to these groupings.
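The idea of comparing only neighbouring genomes, and of reusing one clustering result when the genome order changes, can be illustrated with a minimal Python sketch. This is not genoPlotR2 code and does not reflect its API: the function names, the data layout (a table mapping each genome to gene and cluster labels), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def adjacent_pairs(genome_order):
    """Return the genome pairs that sit next to each other in the plot."""
    return list(zip(genome_order, genome_order[1:]))


def comparisons_from_clusters(genome_order, gene_table):
    """Link genes of adjacent genomes that share a cluster label.

    gene_table maps genome -> list of (gene_id, cluster_id) tuples
    (a hypothetical layout, not the package's own data structure).
    Returns {(upper, lower): [(gene_upper, gene_lower, cluster_id), ...]}.
    """
    comparisons = {}
    for upper, lower in adjacent_pairs(genome_order):
        # Index the lower genome's genes by cluster for quick lookup.
        by_cluster = defaultdict(list)
        for gene_id, cluster_id in gene_table[lower]:
            by_cluster[cluster_id].append(gene_id)
        links = []
        for gene_id, cluster_id in gene_table[upper]:
            for partner in by_cluster.get(cluster_id, []):
                links.append((gene_id, partner, cluster_id))
        comparisons[(upper, lower)] = links
    return comparisons


# Hypothetical toy data: one clustering result covering three genomes.
gene_table = {
    "genomeA": [("a1", "OG1"), ("a2", "OG2")],
    "genomeB": [("b1", "OG1"), ("b2", "OG3")],
    "genomeC": [("c1", "OG2"), ("c2", "OG1")],
}

order_1 = ["genomeA", "genomeB", "genomeC"]
order_2 = ["genomeA", "genomeC", "genomeB"]

# The same table serves both orders; only the adjacent pairs change.
comps_1 = comparisons_from_clusters(order_1, gene_table)
comps_2 = comparisons_from_clusters(order_2, gene_table)
```

For three genomes, each order yields two comparisons rather than all three possible pairs. An alignment-based approach would have to compute new pairwise alignments for `order_2`, whereas the cluster table is simply read again. The cluster label carried on each link is what a color scheme could later key on.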
Users can mark genes to have genoPlotR2 show their gene neighborhoods, either by selecting them individually, or by using identifying attributes such as orthogroups or gene names. genoPlotR2 can then transfer this mark to other connected genomic features, by leveraging the comparisons between them. In this manner, only a single gene on one genome needs to be marked for genoPlotR2 to determine which regions to show in the other genomes. We have also added two new ways to add a color scheme to the plot, both working in tandem with the other new functions to make it convenient to add colors to both the genomes and the comparisons between them. Among other visual options added, users can also add a legend to show the meaning behind these colors.
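Transferring a mark from one gene to connected features can be thought of as a graph traversal over the comparison links. The Python sketch below shows that idea under stated assumptions: it is not how genoPlotR2 is implemented, and the function names, the `flank` padding, and the toy coordinates are hypothetical.

```python
from collections import deque


def propagate_marks(seed_genes, links):
    """Spread a mark from seed genes to every gene reachable via links.

    links is an iterable of (gene_a, gene_b) pairs taken from the
    comparisons between genomes (hypothetical layout).
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    marked = set(seed_genes)
    queue = deque(seed_genes)
    # Breadth-first traversal: follow links until no new gene is reached.
    while queue:
        gene = queue.popleft()
        for other in neighbours.get(gene, ()):
            if other not in marked:
                marked.add(other)
                queue.append(other)
    return marked


def region_for_genome(genes, marked, flank=1000):
    """Return (start, end) spanning the marked genes of one genome, or None.

    genes is a list of (gene_id, start, end); flank is an assumed
    padding, in base pairs, added on both sides of the marked genes.
    """
    hits = [(start, end) for gene_id, start, end in genes if gene_id in marked]
    if not hits:
        return None
    region_start = max(0, min(start for start, _ in hits) - flank)
    region_end = max(end for _, end in hits) + flank
    return (region_start, region_end)


# Hypothetical toy data: a single marked gene on the first genome.
links = [("a1", "b1"), ("b1", "c2")]
marked = propagate_marks(["a1"], links)

genome_a = [("a1", 500, 1500), ("a2", 3000, 4000)]
genome_b = [("b1", 5000, 6000), ("b2", 20000, 21000)]
region_a = region_for_genome(genome_a, marked)
region_b = region_for_genome(genome_b, marked)
```

Here marking only `a1` reaches `b1` and then `c2` through the links, so each genome gets a region to display without the user supplying coordinates. A genome with no linked gene returns `None` and would need separate handling.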
genoPlotR2 facilitates free exploration of genomic datasets from within R scripts, workflows, and interactive sessions, by automating processes and adding functions to manipulate the data. In addition, the wrapper script run_genoPlotR2.R can generate gene maps with a single command. This removes the need for any scripting, requiring only that researchers provide the genomic data and select their genes of interest. Together, these tools make exploration feasible in comparative genomics analyses where it would previously have taken too much time. This can accelerate the discovery of new insights into evolutionary relationships, functional genomics, and potential targets for medical and biotechnological applications.
Keywords
Gene maps; Comparative genomics; orthogroup; clusters of homologous genes; genoPlotR; R package; evolution; genes