phygeo

Version: v0.2.0
Published: Feb 7, 2024 License: BSD-2-Clause

README

PhyGeo

PhyGeo is a tool for phylogenetic biogeography analysis.

Installing

There are two ways to install PhyGeo. If you are only interested in the command-line tool, go to the Releases tab, select the latest release, choose an executable for your system architecture, rename it as you wish (here, it is assumed that you use the name phygeo, or phygeo.exe on Windows), and put it in your default bin directory. If you want an up-to-date tool, you need the go tool; install the PhyGeo package by running:

go install github.com/js-arias/phygeo@latest

If you want to use the package in your own code, just import the package, for example:

import "github.com/js-arias/phygeo/infer/diffusion"

Usage

PhyGeo is a command-line tool formed by a set of commands. Many commands have their own sub-commands. To see the list of commands, just type the name of the application:

phygeo

The best way to learn about a command is to read the built-in help: run the command help, giving the command of interest as a parameter. For example:

phygeo help diff map

This simple example dataset, which includes the instructions to run it, will be helpful to start using the program.

Setting a PhyGeo project

A diffusion analysis with PhyGeo requires three data sources: a phylogenetic tree, the distribution range of the terminals, and a paleogeography model. These data sources are stored in a project file, so you don't have to define them every time. They also give you the possibility of having multiple projects based on slightly different data (for example, the same phylogenetic tree and distribution ranges, but a different paleogeographic model).

Maybe the best way to start a project is by setting the paleogeography model:

phygeo geo add --type geomotion project.tab muller-2022-motion.tab
phygeo geo add --type landscape project.tab cao-2017-landscape.tab

In this example, a plate motion model called muller-2022-motion.tab and a landscape model called cao-2017-landscape.tab are added to project.tab. As it is possible that project.tab does not exist, it will be created automatically in the first call.

As paleogeography models are quite specialized datasets, here is a repository with several models ready to be used with PhyGeo.

To define the priors of the pixels, you must define a pixel prior file (here is an example of this kind of file) and then add it to the project:

phygeo geo prior --add model-pix-prior.tab project.tab

Phylogenetic trees in PhyGeo must be time-calibrated. The trees must be fully dichotomous. The preferred tree file is a tab-delimited file (here is an example file). To add a tree file, you must define the name of the destination file (if it is the first tree to be added), the project file, and one or more files with the trees (usually a single file):

phygeo tree add -f data-tree.tab project.tab vireya-tree.tab

Most users have their trees in Newick format. To import a Newick tree, use the flag --newick with the name of the added tree:

phygeo tree add -f data-tree.tab --newick vireya project.tab vireya.tre

See the documentation of the command tree add for more information about adding trees.
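
Because the trees must be fully dichotomous, it can be useful to check a Newick file before importing it. The following is a minimal sketch, not part of PhyGeo: the function isDichotomous is a hypothetical helper that only scans parentheses and commas, so it assumes unquoted labels and no bracketed comments containing those characters.

```go
package main

import (
	"errors"
	"fmt"
)

// isDichotomous reports whether every internal node of a Newick string
// has exactly two children. It only scans parentheses and commas, so it
// assumes labels are unquoted and contain no '(', ')' or ',' characters.
func isDichotomous(newick string) (bool, error) {
	var stack []int // children seen so far at each currently open node
	for _, r := range newick {
		switch r {
		case '(':
			stack = append(stack, 1)
		case ',':
			if len(stack) == 0 {
				return false, errors.New("comma outside parentheses")
			}
			stack[len(stack)-1]++
		case ')':
			if len(stack) == 0 {
				return false, errors.New("unbalanced parentheses")
			}
			n := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if n != 2 {
				return false, nil
			}
		}
	}
	if len(stack) != 0 {
		return false, errors.New("unbalanced parentheses")
	}
	return true, nil
}

func main() {
	trees := []string{
		"((a:1,b:1):1,c:2);", // fully dichotomous
		"(a:1,b:1,c:1);",     // polytomy at the root
	}
	for _, t := range trees {
		ok, err := isDichotomous(t)
		fmt.Println(t, ok, err)
	}
}
```

A tree that fails this check would need its polytomies resolved before it is added to the project.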

Specimen records are stored as a set of pixels (presence-absence pixels) or range maps. Both files have the same format (here is an example file). To import a set of records to a project, you must define a destination file (if it is the first set of records to be added), the kind of file (points or ranges), the project file, and one or more files with the records:

phygeo range add -f data-points.tab project.tab vireya-points.tab pseudovireya-points.tab

It is possible that your file with specimen records is a table with latitudes and longitudes or a table downloaded from GBIF. In such cases, you can import them using the flag --format:

phygeo range add -f data-points.tab --format darwin project.tab gbif-download.csv

See the documentation of the command range add for more information about adding specimen records.

Note that the pixelation used for the specimen records must be of the same resolution as the paleogeography models.

To be sure that all terminals in all the trees in the project have at least a single valid record, use:

phygeo range taxa --val project.tab

If there is no problem, the command will finish silently; otherwise, it will report the names of the terminals without geographic data.

Analyzing the data

With a valid project, it is possible to make inferences from the data. There are several possibilities. Perhaps the simplest is to attempt a likelihood estimation with an a priori lambda value, either just to see what happens or because a previous analysis showed that the given lambda is the maximum likelihood value:

phygeo diff like --lambda 100 -o like project.tab

This analysis will create a new file with the prefix like (for example, like-project.tab-vireya-100.000000-down.tab), which contains the down-pass likelihoods (i.e., the conditional likelihoods) of each pixel in each node, so it is a large file. As the calculation of the conditional likelihoods is a time-consuming operation, this file is helpful for skipping that operation in further analyses. For example, it can be used to perform stochastic mapping:

phygeo diff particles -i like-project.tab-vireya-100.000000-down.tab -o p project.tab

This analysis will create a new file with the prefix p (for example, p-vireya-100.000000x1000.tab; here is an example file). This file will contain all the simulated dispersal paths, so it is usually a large file.

As you probably want to know the maximum likelihood estimate of lambda, you can use the command diff ml:

phygeo diff ml project.tab

The maximum likelihood estimate will be printed on the screen. The search uses a simple hill-climbing algorithm that stops by default when the step size is smaller than 1.0; you can set a smaller bound, at the cost of a longer execution time.

Maybe you prefer a Bayesian analysis. As the only free parameter is the lambda value, you can make a simple integration:

phygeo diff integrate --min 100 --max 300 --parts 500 project.tab > log-like.tab

and then, using any program that reads tab-delimited data (in this case log-like.tab; here is an example file), you can apply the prior for lambda (or just use the integration output as is, which assumes a flat prior).

To sample from the posterior (or from any distribution), you can use the same diff integrate command, but define a sampling function (at the moment, it just implements the gamma distribution):

phygeo diff integrate --distribution "gamma=75,0.5" -p 100 --parts 1000 project.tab

In this execution, for each sample (it will make 1000, defined with the flag --parts), it will make 100 stochastic mappings (defined with the flag -p). The output will have the prefix sample (for example, sample.tab-project.tab-vireya-sampling-1000x100.tab). These files are usually large and are of the same format as the output files produced by the diff particles command.

Working with the output

The results of the diff particles command, or diff integrate --distribution command, form the most important output of the program. These files contain one or more stochastic mappings (usually more than 100), i.e., the pixel locations of the nodes and internodes (branches that cross a time stage defined by the paleogeography model).

We can transform the stochastic maps into pixel frequencies, which are the approximation of the pixel posterior at each node. These frequencies can be raw (i.e., just counts of sampled pixels) or smoothed using a spherical KDE:

phygeo diff freq --kde 1000 -i p-vireya-100.000000x1000.tab -o kde project.tab

In this example, the pixel frequencies will be stored in a file with the kde prefix (for example, kde-project.tab-p-vireya-100.000000x1000.tab.tab), which is usually large.
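
The raw frequencies are just normalized counts. As a minimal sketch of that calculation (not the code used by diff freq; the function pixelFreq and the pixel identifiers are made up for the example), for a single node:

```go
package main

import (
	"fmt"
	"sort"
)

// pixelFreq converts the pixel sampled for one node in each stochastic map
// into relative frequencies (counts divided by the number of maps).
func pixelFreq(sampled []int) map[int]float64 {
	freq := make(map[int]float64)
	if len(sampled) == 0 {
		return freq
	}
	for _, px := range sampled {
		freq[px]++
	}
	for px := range freq {
		freq[px] /= float64(len(sampled))
	}
	return freq
}

func main() {
	// Hypothetical pixel IDs sampled for one node in eight stochastic maps.
	sampled := []int{4021, 4021, 4022, 4021, 4390, 4022, 4021, 4021}
	freq := pixelFreq(sampled)

	// Print in pixel order so the output is stable between runs.
	pixels := make([]int, 0, len(freq))
	for px := range freq {
		pixels = append(pixels, px)
	}
	sort.Ints(pixels)
	for _, px := range pixels {
		fmt.Printf("pixel=%d\tfreq=%.3f\n", px, freq[px])
	}
}
```

With few stochastic maps, raw frequencies are patchy (many plausible pixels are never sampled), which is the motivation for smoothing them with a KDE.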

Then we can create an image map of the frequencies:

phygeo diff map -c 1440 --key landscape-key.tab --gray -i kde-project.tab-p-vireya-100.000000x1000.tab.tab -o "ml-95/ml-95" project.tab

The command diff map will create a reconstruction from the frequency file using a rainbow color scheme (from blue for pixels with low posterior probability to red for pixels with high posterior probability); see this directory for an example output. The command diff map has many output options; see the command help for more information. Some of them: maps can be rotated (the default) or unrotated (maps with current geographic locations, --unrot flag), and the output can be produced for each node (the default) or by time stage (--richness flag). A key file (here is an example) can be used to define the colors of the background geography (with the flag --gray, it will use a gray scale).

As stochastic maps include the starting and ending pixel at each node, it is possible to measure the distance traveled by a particle and its speed. Use the command diff speed to retrieve general speed results:

phygeo diff speed --tree speed --step 5 --box 5 -i ml-project.tab project.tab > speed.txt

This example produces a tree in SVG format, with the speed values colored in a rainbow color scheme (faster lineages in red, slower in blue; see this example file), and a log file (here is an example file) with the average speed and distance traveled for each node. With this command, it is also possible to measure the speed at different time stages (using the flag --time). Run phygeo help diff speed to learn more about this command.

Additional resources

Contribution and bug reports

The best way to contribute to the package is by running the program, detecting bugs, or asking for features. Use the Issues tab to file a bug or ask for a feature.

If you like programming, you can create tools and packages to import, export, or analyze data and results to or from PhyGeo. If you send me the link, I will post a link to your tool or package.

Of course, this package is open source, so you can modify it at your will!

Authorship and license

Copyright © 2023 J. Salvador Arias. All rights reserved. Distributed under a BSD2 license that can be found in the LICENSE file.

Directories

Path Synopsis
cmd
phygeo
PhyGeo is a tool for phylogenetic biogeography analysis.
phygeo/diff
Package diff is a metapackage for commands that deal with biogeographic inference using a diffusion model.
phygeo/diff/freq
Package freq implements a command to calculate pixel frequencies from the stochastic mapping output.
phygeo/diff/integrate
Package integrate implements a numerical integration of the likelihood curve for a diffusion model.
phygeo/diff/like
Package like implements a command to perform a biogeographic reconstruction using likelihood.
phygeo/diff/mapcmd
Package mapcmd implements a command to draw range reconstructions from pixel probability files.
phygeo/diff/ml
Package ml implements a command to search for the maximum likelihood estimation of a biogeographic reconstruction.
phygeo/diff/particles
Package particles implements a command to run a stochastic mapping from a down-pass reconstruction.
phygeo/diff/speed
Package speed implements a command to measure the speed and distance traveled in a reconstruction.
phygeo/geo
Package geo is a metapackage for commands that deal with paleogeographic reconstruction models.
phygeo/geo/add
Package add implements a command to add a paleogeographic reconstruction model to a PhyGeo project.
phygeo/geo/prior
Package prior implements a command to manage pixel priors defined for a project.
phygeo/prj
Package prj implements a command to print the basic information of a project.
phygeo/rangecmd
Package rangecmd is a metapackage for commands that deal with taxon distribution ranges.
phygeo/rangecmd/add
Package add implements a command to add taxon ranges to a PhyGeo project.
phygeo/rangecmd/kde
Package kde implements a command to estimate the range distributions using a kernel density estimator.
phygeo/rangecmd/mapcmd
Package mapcmd implements a command to draw the geographic range of the taxa in a PhyGeo project with defined distribution ranges.
phygeo/rangecmd/remove
Package remove implements a command to remove range distribution records not present on a tree.
phygeo/rangecmd/rotate
Package rotate implements a command to rotate the point records of a PhyGeo project.
phygeo/rangecmd/taxa
Package terms implements a command to print the list of taxa in a PhyGeo project with defined distribution ranges.
phygeo/tree
Package tree is a metapackage for commands that deal with phylogenetic trees.
phygeo/tree/add
Package add implements a command to add trees to a PhyGeo project.
phygeo/tree/draw
Package draw implements a command to draw trees in a PhyGeo project as SVG files.
phygeo/tree/list
Package list implements a command to print the list of trees in a PhyGeo project.
phygeo/tree/remove
Package remove implements a command to remove tree terminals from a PhyGeo project without defined distribution ranges.
phygeo/tree/terms
Package terms implements a command to print the list of the terminals in the trees of a PhyGeo project.
infer
diffusion
Package diffusion implements a spherical diffusion approximated using a discrete isolatitude pixelation for a phylogenetic biogeography analysis.
Package pixkey implements a simple color key for landscape pixelations.
Package probmap implements a map image for a probability density, in a plate carrée (equirectangular) projection.
Package project implements reading and writing of PhyGeo project files.
