FAQ
phastCons: Conservation scoring and identification of conserved elements
phastOdds: Log-odds scoring for phylogenetic models or phylo-HMMs
phyloFit: Fitting of phylogenetic models to aligned DNA sequences
phyloP: Computation of p-values for conservation or acceleration, either lineage-specific or across all branches
exoniphy: Phylogenetic exon prediction
dless: Prediction of elements under lineage-specific selection
prequel: Probabilistic reconstruction of ancestral sequences
Alignments: msa_view, msa_split, msa_diff
Phylogenetic models: tree_doctor, all_dists, draw_tree, consEntropy, indelFit, indelHistory
Sampling/bootstrapping: base_evolve, phyloBoot
Annotations: refeature, clean_genes, eval_predictions
...and others
Send feedback to: phast-help-l@cornell.edu
Q. Where does the name "PHAST" come from?
A. The name "PHAST" arose because several programs in the package (including phastCons, exoniphy, and dless) make use of phylogenetic hidden Markov models (phylo-HMMs). Phylo-HMMs have been called "space/time models" because they describe DNA sequences in terms of two Markov processes: one that operates in the dimension of time (along the branches of an evolutionary tree) and one that operates in space (along the sequences themselves) (see Yang, 1995).
Q. How do I do X with program Y?
A. Most of the programs in the package have fairly detailed help pages, with examples. The help pages are available from these web pages (follow the links in the left panel) or by running each program with the --help (-h) option. If they do not contain the information you need, please send email to the phast-help mailing list.
Q. Is PHAST freely available? Am I allowed to reuse and/or redistribute the source code?
A. Yes. PHAST has always been freely available to academics, but it is now officially open source and available under the terms of a BSD-style license.
Q. What's the difference between PHAST and PAML / GERP / N-SCAN / etc.?
A. We have attempted to summarize similarities and differences of PHAST with respect to several related programs on the Comparison page.
Q. Which program in PHAST should I use for conservation scoring?
A. Both phastCons and phyloP (with --wig-scores or --base-by-base) can be used to produce conservation scores, and which one is best depends on the application. The most important difference between the two programs is that phyloP scores reflect individual alignment columns and do not take into account conservation at neighboring sites. This is why the phyloP conservation plot in the UCSC Browser looks less smooth, with more "texture" at individual bases, than the phastCons plot. This property also makes phyloP more appropriate than phastCons for evaluating signatures of selection at particular bases or classes of bases in the genome (e.g., all third codon positions). In addition, phyloP requires fewer assumptions than phastCons, because it depends only on a model of neutral evolution rather than on models of both neutral evolution and negative selection (conservation). On the other hand, because it directly models multibase elements, phastCons may be preferred as a conserved-element detector. Its ability to pool information across sites can also be valuable when there are few species or the branch lengths are short, since there may then be insufficient data to detect selection separately at each site.
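To make the "pooling information across sites" point concrete, here is a minimal sketch of the forward-backward posterior in a two-state HMM (conserved vs. neutral), the kind of calculation behind phastCons scores, which are posterior probabilities of the conserved state. This is not phastCons itself: the per-column likelihoods are made-up numbers standing in for likelihoods under phylogenetic models, the symmetric transition matrix (`p_stay`) is a simplifying assumption, and `posterior_conserved` is a hypothetical helper. A per-column log-likelihood ratio stands in for a basewise score that ignores neighboring sites.

```python
import math


def posterior_conserved(lik_cons, lik_neut, p_stay=0.9, prior_cons=0.5):
    """Posterior P(conserved | all columns) at each column of a two-state HMM.

    lik_cons[i], lik_neut[i]: likelihood of column i under a "conserved" and a
    "neutral" emission model (hypothetical numbers here; phastCons derives
    them from phylogenetic models).
    p_stay: probability of staying in the same state at the next column
    (simplified symmetric transitions, assumed for illustration).
    """
    n = len(lik_cons)
    trans = [[p_stay, 1.0 - p_stay], [1.0 - p_stay, p_stay]]
    emit = [lik_cons, lik_neut]  # state 0 = conserved, state 1 = neutral
    init = [prior_cons, 1.0 - prior_cons]

    # Forward pass, rescaled at every column to avoid underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            row = [init[s] * emit[s][0] for s in range(2)]
        else:
            row = [sum(fwd[i - 1][r] * trans[r][s] for r in range(2)) * emit[s][i]
                   for s in range(2)]
        total = sum(row)
        fwd.append([x / total for x in row])

    # Backward pass, rescaled the same way (scale factors cancel below).
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        row = [sum(trans[s][t] * emit[t][i + 1] * bwd[i + 1][t] for t in range(2))
               for s in range(2)]
        total = sum(row)
        bwd[i] = [x / total for x in row]

    # Posterior in each column is proportional to forward * backward.
    post = []
    for f, b in zip(fwd, bwd):
        cons, neut = f[0] * b[0], f[1] * b[1]
        post.append(cons / (cons + neut))
    return post


# Hypothetical per-column likelihoods: column 2 is an isolated
# "conserved-looking" column; columns 5-8 form a run of them.
lik_cons = [0.2, 0.2, 0.9, 0.2, 0.2, 0.9, 0.9, 0.9, 0.9, 0.2]
lik_neut = [0.5, 0.5, 0.3, 0.5, 0.5, 0.3, 0.3, 0.3, 0.3, 0.5]

# Basewise score: looks at each column alone.
per_column = [math.log(cl / nl) for cl, nl in zip(lik_cons, lik_neut)]
# HMM posterior: pools evidence from neighboring columns.
post = posterior_conserved(lik_cons, lik_neut)

for i, (s, p) in enumerate(zip(per_column, post)):
    print(f"col {i}: per-column log-ratio {s:+.2f}   posterior {p:.2f}")
```

Columns 2 and 6 get identical per-column scores, but column 2 is surrounded by neutral-looking columns, so its posterior stays below 0.5, while column 6 sits inside a run of similar columns and its posterior ends up well above 0.5. This is the smoothing visible in the phastCons track, and it also shows why a single constrained base is better assessed with a basewise score.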
Q. Why are the basewise phyloP scores so "clumpy"; i.e., why are there so few distinct scores?
A. You are probably using scores generated with --method SPH (as for the 28-way vertebrate alignment in the UCSC Genome Browser). These scores are based on a test statistic equal to the (estimated) number of substitutions along the branches of the phylogeny. The algorithm for computing the test statistic is nontrivial (see Siepel, Pollard, and Haussler, 2006), but at the end of the day the test statistic can only take values of 0, 1, 2, 3, ... Thus, with single-column scores, relatively few distinct p-values (hence relatively few distinct scores) are possible. The LRT and SCORE methods behave better in this regard, and newer versions of the phyloP tracks will use these methods.
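A toy illustration of why a discrete test statistic produces "clumpy" scores: if the statistic can only be 0, 1, 2, ... substitutions, then no matter how many columns are scored, there are only as many distinct p-values as there are distinct counts. Purely for illustration, this sketch assumes the per-column substitution count is Poisson-distributed under neutrality with a hypothetical mean `expected`; phyloP's SPH method instead derives the null distribution from the phylogenetic model (see Siepel, Pollard, and Haussler, 2006). The counts are made up, and `conservation_score` is a hypothetical helper.

```python
import math


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(N = 0)
    total = term
    for j in range(1, k + 1):
        term *= lam / j  # P(N = j) computed from P(N = j - 1)
        total += term
    return total


def conservation_score(n_subst, expected):
    """-log10 of the one-sided p-value P(N <= n_subst) under the toy null.

    Fewer substitutions than expected -> small p-value -> large score.
    """
    return -math.log10(poisson_cdf(n_subst, expected))


expected = 3.0  # hypothetical expected substitutions per column under neutrality
# Hypothetical estimated substitution counts for 16 alignment columns.
observed = [0, 1, 1, 2, 3, 3, 3, 4, 2, 0, 5, 1, 2, 3, 4, 2]

scores = [conservation_score(n, expected) for n in observed]
print(len(observed), "columns ->", len(set(scores)), "distinct scores")
# prints: 16 columns -> 6 distinct scores
for n in sorted(set(observed)):
    print(n, "substitutions -> score", round(conservation_score(n, expected), 3))
```

Scoring more columns would not add new score values unless new substitution counts appeared, which is the clumping seen in the SPH-based tracks.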
Q. What exactly is {SS, *.cm, *.mod} format?
A. SS is a format used by PHAST to describe a multiple alignment in terms of its "sufficient statistics" for phylogenetic analysis: its distinct alignment columns and their counts, and, optionally, the order in which the columns appear (which is typically needed for functional element identification but not for phylogenetic analysis). A *.cm file defines a "category map," i.e., a mapping from feature types (e.g., "exon", "ancestral repeat", "conserved element") to label numbers; category maps turn out to be useful in several HMM-based "parsing" applications. A *.mod file defines the parameters of a probabilistic phylogenetic model, including a tree with branch lengths, a substitution rate matrix, and a background distribution over bases. Detailed specifications of these file formats will be made available as the PHAST documentation is improved. In the meantime, examples of SS and *.mod files can be obtained by converting or generating files with msa_view or phyloFit, respectively. Example *.cm files are included in the phast/data/exoniphy directory.
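The idea behind SS can be sketched as follows: collapse an alignment into its distinct columns with their counts, and optionally keep the order in which they occur. This is a minimal illustration only, assuming the alignment is held in memory as a list of equal-length strings; `sufficient_stats` is a hypothetical helper, and its output is not the SS file layout that msa_view writes.

```python
def sufficient_stats(alignment):
    """Summarize an alignment by its distinct columns, their counts, and order.

    alignment: list of equal-length gapped sequences, one per species.
    Returns (columns, counts, order):
      columns[k] -- the k-th distinct column, as a tuple of characters
      counts[k]  -- how many times columns[k] occurs in the alignment
      order[pos] -- index k of the distinct column found at position pos
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    index = {}  # column tuple -> its position in `columns`
    columns, counts, order = [], [], []
    for pos in range(length):
        col = tuple(seq[pos] for seq in alignment)
        if col not in index:
            index[col] = len(columns)
            columns.append(col)
            counts.append(0)
        counts[index[col]] += 1
        order.append(index[col])
    return columns, counts, order


# Hypothetical three-species alignment.
aln = ["ACGTAC",
       "ACGTTC",
       "ACGAAC"]
columns, counts, order = sufficient_stats(aln)
for col, c in zip(columns, counts):
    print("".join(col), c)
print("order:", order)
```

Dropping `order` is harmless for fitting a phylogenetic model because, under the usual assumption that columns evolve independently, the log-likelihood is a sum over distinct columns of each column's count times its log-probability, so each distinct column needs to be evaluated only once. An HMM that identifies functional elements, by contrast, needs `order`, because neighboring columns interact through the hidden state path.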
Q. What is {genepred, bed, MAF} format?
A. These are file formats developed for the UCSC Genome Browser and associated applications. Specifications can be found on the UCSC Genome Browser's data file formats page, which also describes the GFF and GTF formats; PHAST recognizes these as well.