# Cornell Orthologs and Positive Selection

*If you use the data provided on this web page, please cite:
Rhesus Macaque Genome Sequencing and Analysis Consortium.
Evolutionary and biomedical insights from the rhesus macaque genome.
Science. 2007 Apr 13;316(5822):222-34*

The following ortholog sets were constructed from the union of RefSeq, knownGene, and VEGA gene annotations (downloaded in June 2006) using MULTIZ whole-genome multiple alignments of six mammalian species: human (hg18), chimpanzee (panTro2), macaque (rheMac2), mouse (mm8), rat (rn4), and dog (canFam2). More information can be found in the paper and its supporting online material.

**1:1 orthologous gene sets:**

- Main set of 10376 trios (hg18/panTro2/rheMac2) used for positive selection analysis: [dataset] [list]
- hg18/panTro2/rheMac2/mm8: [dataset] [list]
- hg18/rheMac2/mm8/rn4: [dataset] [list]
- hg18/panTro2/rheMac2/mm8/rn4: [dataset] [list]
- hg18/panTro2/rheMac2/mm8/rn4/canFam2: [dataset] [list]

**Analysis of dN/dS:**

Maximum likelihood estimates of omega = dN/dS for each gene were obtained using the codeml program of PAML with F3x4 codon frequencies, a separate estimate of kappa per gene, and a single omega assumed across all sites and all branches. The file below summarizes the PAML output (N: number of nonsynonymous sites; S: number of synonymous sites; omega: nonsynonymous/synonymous rate ratio, equal to dN/dS; dN: number of nonsynonymous substitutions per nonsynonymous site; dS: number of synonymous substitutions per synonymous site; N*dN: number of nonsynonymous substitutions; S*dS: number of synonymous substitutions; t: time or branch length, measured as the expected number of nucleotide substitutions per codon).

- Omega estimates per gene: pergeneomega
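
The relationships among these columns can be illustrated with a minimal Python sketch. The helper name `derive_columns` and the numeric values are hypothetical and are not taken from the pergeneomega file. Note also that omega recomputed as dN/dS from rounded columns may differ slightly from the maximum likelihood estimate reported by codeml.

```python
import math

def derive_columns(N, S, dN, dS):
    """Recompute derived quantities from N, S, dN and dS (hypothetical helper)."""
    return {
        "NdN": N * dN,  # expected number of nonsynonymous substitutions
        "SdS": S * dS,  # expected number of synonymous substitutions
        # dN/dS is undefined when no synonymous change is inferred
        "omega": dN / dS if dS > 0 else math.nan,
    }

# Made-up values, for illustration only
row = derive_columns(N=1500.0, S=500.0, dN=0.002, dS=0.01)
print(row)
```

Genes with dS = 0 are returned with an undefined ratio here rather than an arbitrary large value; how such cases appear in the actual file is not specified on this page.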

**Raw results of positive selection scans:**

For the positive selection scans, we used likelihood ratio tests based on site models (M2a vs. M1a) on the trios (lrtall), and on branch/site models for each branch separately (lrthuman, lrtchimp, lrtmacaque).
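
A likelihood ratio test of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the paper: the function names and log-likelihood values are hypothetical, and the reference distribution is supplied by the caller. For M1a vs. M2a the conventional choice is a chi-square distribution with 2 degrees of freedom; the appropriate distribution for the branch/site tests is not stated on this page, so the paper's supporting material should be consulted.

```python
import math

def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    Small negative values can arise from optimizer noise in nested models.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

# Hypothetical log-likelihoods for one gene under the null and alternative models
stat = lrt_statistic(lnl_null=-1000.0, lnl_alt=-995.0)
p_value = chi2_sf(stat, df=2)  # assumed chi-square(2) reference for M1a vs. M2a
print(stat, p_value)
```

Closed-form tail probabilities are used so that the sketch needs only the standard library; a statistics package would be the more general choice for other degrees of freedom.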

Note that site models split the sites into categories of purifying (category 0) and neutral (category 1) selection under the null (nearly neutral) model, and into categories of purifying (category 0), neutral (category 1), and positive (category 2) selection under models that allow for positive selection. Consequently, the tables below DO NOT show overall dN/dS ratios, but only the dN/dS (omega) ratio for each category of sites under both the null and the alternative model.

For M1a vs. M2a (lrtall), p1Null=1-p0Null and p2Alt=1-p0Alt-p1Alt (PAML manual, p. 35). For the one-branch cases (lrthuman, lrtmacaque, lrtchimp), p0 and p1 are conventionally both defined as free parameters (PAML manual, p. 39), but they are actually constrained under the null model such that there is only one degree of freedom (the model is mathematically equivalent to M1a, but with a different parameterization). To ease interpretation, here we treat p0Null and p1Null in the one-branch case exactly as under M1a, i.e., with p1Null=1-p0Null. In the alternative model, however, p0Alt and p1Alt are both free parameters, for negative selection and neutral evolution across branches, respectively, and the frequencies for the other two classes of sites (called 2a and 2b; PAML manual, p. 39) are functions of the same two parameters. The p2Alt column should simply be ignored in the one-branch case; the frequencies for classes 2a and 2b are not given explicitly and must be computed from p0Alt and p1Alt.
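
The frequency bookkeeping described above can be sketched in Python as follows. The function names and example values are hypothetical. The formulas for classes 2a and 2b follow the standard definition of branch-site model A in the PAML documentation, in which the remaining mass 1-p0-p1 is split in proportion to p0 and p1; that these formulas apply unchanged to the tables here is an assumption.

```python
def site_model_freqs(p0_null, p0_alt, p1_alt):
    """Derived frequencies for M1a vs. M2a (lrtall)."""
    return {
        "p1Null": 1.0 - p0_null,
        "p2Alt": 1.0 - p0_alt - p1_alt,
    }

def branch_site_alt_freqs(p0_alt, p1_alt):
    """Class frequencies under the branch/site alternative model.

    Assumes the standard model A parameterization: classes 2a and 2b share
    the remaining mass 1 - p0 - p1 in the proportions p0 : p1.
    """
    total = p0_alt + p1_alt
    if total <= 0.0:
        raise ValueError("p0Alt + p1Alt must be positive")
    rest = 1.0 - total
    return {
        "class0": p0_alt,                   # purifying on all branches
        "class1": p1_alt,                   # neutral on all branches
        "class2a": rest * p0_alt / total,   # purifying background, positive foreground
        "class2b": rest * p1_alt / total,   # neutral background, positive foreground
    }

# Hypothetical parameter values, for illustration only
print(site_model_freqs(p0_null=0.8, p0_alt=0.7, p1_alt=0.25))
print(branch_site_alt_freqs(p0_alt=0.6, p1_alt=0.3))
```

Under this parameterization the four class frequencies always sum to one, which is a convenient sanity check when reading p0Alt and p1Alt from the one-branch tables.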

Also note that the test on the macaque branch in fact includes changes that occurred on BOTH the branch leading to macaque AND the branch leading to hominids.

- All branches (test TA): lrtall
- Human branch (test TH): lrthuman
- Chimp branch (test TC): lrtchimp
- Macaque/hominids branch (test TM): lrtmacaque

Contact: Adam Siepel, BSCB, Cornell University,
Ithaca, NY 14853, acs4 at cornell.edu

Last update: 05/21/2007