This directory contains data pertaining to orthology assignments among
various Aspergillus species/strains, and between Aspergillus species
and other organisms.

Ortholog mappings among 16 Aspergillus species are given in the
tab-delimited file, "All_Species_Orthologs_by_Jaccard_clustering.txt".
Each column represents a different species, and each row a different
orthologous group of genes. In cases with more than one gene per
species in a group, genes (paralogs) for that species are separated
by a "|" (vertical bar, or "pipe" character).
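The layout described above can be read with a few lines of Python. The
following is a minimal sketch, not a tool shipped with these files. The
function name parse_ortholog_groups is hypothetical, and the sketch
assumes that a species with no gene in a group is represented by an empty
cell; it does not treat any row as a header, so check the first line of
the real file before relying on it.

```python
def parse_ortholog_groups(lines):
    """Parse rows of a tab-delimited ortholog table.

    Returns one entry per row (orthologous group). Each entry holds one
    list per column (species), containing the genes found in that cell.
    ASSUMPTION: an empty cell means the species has no gene in the group.
    """
    groups = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        cells = line.split("\t")
        # Paralogs within one species are separated by "|".
        groups.append(
            [[g.strip() for g in cell.split("|") if g.strip()] for cell in cells]
        )
    return groups


# Made-up rows for illustration; these are not real gene identifiers.
example_rows = [
    "geneA1\tgeneB1|geneB2\tgeneC1\n",
    "geneA2\t\tgeneC2\n",
]
groups = parse_ortholog_groups(example_rows)
```

For the real file, the same function could be called on an open file
handle, for example parse_ortholog_groups(open(path)), since it only
needs an iterable of lines.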

Pairwise orthologs among species in AspGD, and between AspGD species
and S. cerevisiae and S. pombe, are provided within individual
subdirectories, the names of which list the species compared.

Ortholog assignments among the Aspergillus species are made
using a Jaccard clustering approach. The clustering is a two-step
process, with the first step clustering proteins within each
genome/strain (using the results of an all-vs-all BLASTP search with
an 80% BLASTP percent identity threshold and an e-value threshold of
1e-5, and a minimum Jaccard similarity coefficient of 0.6). The
clusters that are generated by the first step are referred to as
simply "Jaccard clusters" and these clusters are themselves clustered
in the second step, using a reciprocal best BLAST match linkage, to
create the Jaccard Orthologous Clusters. The sets of clusters were
built from an initial cluster analysis with a minimum percent coverage
score of 80% (step 1) and a reciprocal best hit analysis with a
minimum percent coverage score of 80% (step 2). Note that the
ortholog assignments were generated automatically, with no curator
intervention. Thus, there will occasionally be pairings that would not
be made with a different scoring matrix. In the interests of automating
the process, we do not intend to hand-curate the ortholog pairs at
this time.
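
The two ideas used in the clustering, a Jaccard similarity coefficient
with a 0.6 cutoff and a reciprocal best match, can be illustrated with a
small Python sketch. This is not the pipeline that produced these files.
The function names are hypothetical, and the sketch assumes that each
protein is compared through the set of proteins it matched after the
BLASTP filters (with the protein itself counted in its own set); the
exact sets used by the real pipeline are not described here.

```python
def jaccard(a, b):
    """Jaccard similarity coefficient: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def linked_pairs(hits, min_jaccard=0.6):
    """Return protein pairs whose hit sets meet the Jaccard threshold.

    ASSUMPTION: `hits` maps each protein to the set of proteins it matched
    after BLASTP filtering, and each protein is counted in its own set.
    """
    names = sorted(hits)
    pairs = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if jaccard(hits[p] | {p}, hits[q] | {q}) >= min_jaccard:
                pairs.append((p, q))
    return pairs


def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs where a's best match is b and b's best match is a."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


# Made-up identifiers for illustration only.
hits = {
    "p1": {"p2", "p3"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2"},
    "p4": {"p3"},
}
pairs = linked_pairs(hits)
rbh = reciprocal_best_hits({"a1": "b1", "a2": "b2"}, {"b1": "a1", "b2": "a3"})
```

In this toy input, p1, p2 and p3 share identical hit sets and are linked,
while p4 shares only one of four proteins with each of them and falls
below the 0.6 cutoff.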

Pairwise ortholog mappings to S. cerevisiae and S. pombe are generated
using the InParanoid software developed at the Karolinska Institutet.
InParanoid comparisons use the latest
set of S. cerevisiae proteins from SGD; the set of S. pombe proteins
from the Sanger Institute was used as an outgroup. Stringent cutoffs
were set: BLOSUM80 (instead of the default BLOSUM62), and an
InParanoid score of 100%. Files are provided containing the input
sequences that were used and the raw output files that were generated
by InParanoid.  In addition, files containing the processed output,
listing the orthology assignments, are provided.

The pairwise ortholog and Best Hit files (format updated
September 2012) contain three tab-delimited columns for each
organism. For each organism, the columns display:
- the systematic name of the gene
- the standard/genetic name of the gene (if one exists)
- the database identifier for the gene
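
A row in this format can be split into one record per organism. The
sketch below is illustrative only: the names GeneRef and
parse_pairwise_line are hypothetical, and it assumes that the columns
appear in the order listed above, that a gene without a standard name has
an empty field, and that the line passed in is a data row rather than a
header.

```python
from typing import NamedTuple


class GeneRef(NamedTuple):
    systematic_name: str
    standard_name: str  # empty string if the gene has no standard name
    db_id: str


def parse_pairwise_line(line):
    """Split one tab-delimited row into a GeneRef per organism.

    ASSUMPTION: every organism contributes exactly three columns, in the
    order systematic name, standard name, database identifier.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) % 3 != 0:
        raise ValueError(f"expected a multiple of 3 columns, got {len(fields)}")
    return [GeneRef(*fields[i:i + 3]) for i in range(0, len(fields), 3)]


# Made-up row for illustration; these are not real identifiers.
refs = parse_pairwise_line("sysA\tnameA\tidA\tsysB\t\tidB\n")
```

Grouping the fields in threes keeps the sketch usable whether a file
lists two organisms per row or more.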