Large-Scale Datasets

AspGD collects high-throughput datasets for the Aspergillus research community. The data may be downloaded from the links provided below.



Aspergillus fumigatus Datasets

JCVI 2011

Source: Data supplied to AspGD by Natalie Fedorova, JCVI. This project has been funded in whole or part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract numbers N01-AI30071 and / or HHSN272200900007C.

Notes: A. fumigatus isolates: 4 "reference" and 19 "target" Aspergillus fumigatus strains were sequenced using the Illumina platform. The 19 target strains comprise (i) 7 drug-resistant isolates, selected to facilitate the identification of new mechanisms of antifungal drug resistance (F series); and (ii) 12 genotypically linked isolates from 2 hospital outbreaks, included to validate the discriminatory power of existing genotyping markers (B series).

SRA submission and metadata can be found here:

It was observed that the sequences of some B-series strains are virtually identical.

For SNP analysis, the following commands were used:
samtools mpileup -E -ugf AF293.fa F18454.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | varFilter -D 500 > var.flt.vcf

The reference file used in the alignments (AF293_REF.fasta, also available for download from the link below) is the same sequence that was submitted to GenBank:
WGS AAHF01000001-AAHF01000019
WGS_SCAFLD CM000169-CM000176
plus the mitochondrial and ribosomal scaffolds, which will be submitted by JCVI later in 2011.

To remove false positives, the VCFs were further filtered using the following parameters:

  • Per-site SNP quality (QUAL) >= 70 (default > 50)
  • Raw read depth: DP >= 20 (default > 15)
  • High quality read depth (DP4): high-quality ref-forward = 0, ref-reverse = 0, alt-forward > 4 and alt-reverse > 4
  • Max-likelihood estimate of the first ALT allele count (AC1) = 2
  • Max-likelihood estimate of the first ALT allele frequency (AF1) = 1
These parameters were optimized to minimize the number of false positives by analyzing the VCF file produced when AF293 Illumina reads were mapped to the current (Sanger AF293) assembly.
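
The five thresholds listed above can be expressed as a single pass over a VCF file. The following is a minimal Python sketch and not the script that was actually used to produce the released files. The function names (parse_info, passes_filters, filter_vcf) are hypothetical. It assumes the standard tab-separated VCF column order and the samtools/bcftools INFO tags DP, DP4, AC1 and AF1, and it assumes that a record missing any of these tags should be rejected.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=35;AF1=1;DP4=0,0,9,11' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # flag-style keys (no '=') map to an empty string
    return info


def passes_filters(vcf_line):
    """Return True if one VCF data line meets all five thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    try:
        qual = float(fields[5])          # column 6: per-site SNP quality
        info = parse_info(fields[7])     # column 8: INFO
        depth = int(info["DP"])          # raw read depth
        ref_fwd, ref_rev, alt_fwd, alt_rev = (
            int(x) for x in info["DP4"].split(",")
        )
        ac1 = float(info["AC1"])
        af1 = float(info["AF1"])
    except (IndexError, KeyError, ValueError):
        # Assumption: malformed records or records lacking a tag are rejected.
        return False
    return (
        qual >= 70
        and depth >= 20
        and ref_fwd == 0 and ref_rev == 0
        and alt_fwd > 4 and alt_rev > 4
        and ac1 == 2
        and af1 == 1
    )


def filter_vcf(lines):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line):
            yield line
```

Applied to an open file, for example `for line in filter_vcf(open("var.flt.vcf")): ...`, the generator keeps the header and drops every site that fails at least one threshold.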

The GFF files corresponding to each VCF file were generated at AspGD by converting formats and mapping scaffolds to chromosomes.
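
As an illustration of what such a conversion involves, the sketch below turns one VCF record into one GFF3-style line while renaming the scaffold to a chromosome. It is not the AspGD conversion procedure, which is not described here. The lookup table, the scaffold and chromosome names, the "example" source label, and the choice of the SNV feature type with Reference_seq and Variant_seq attributes are all assumptions made for illustration.

```python
# Hypothetical scaffold-to-chromosome lookup; the real mapping is not given here.
SCAFFOLD_TO_CHROMOSOME = {"scaffold_1": "Chr1_example"}


def vcf_line_to_gff3(vcf_line, scaffold_map, source="example"):
    """Convert one tab-separated VCF data line into one GFF3-style line."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual = fields[:6]
    # Fall back to the original sequence name if the scaffold is not in the map.
    seqid = scaffold_map.get(chrom, chrom)
    start = int(pos)                 # VCF and GFF3 are both 1-based
    end = start + len(ref) - 1       # span of the reference allele
    attributes = f"Reference_seq={ref};Variant_seq={alt}"
    return "\t".join(
        [seqid, source, "SNV", str(start), str(end), qual, ".", ".", attributes]
    )
```

Because both formats use 1-based coordinates, only the sequence name changes; the position is carried over unchanged.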

DOWNLOAD these data from AspGD:
VCF format
BAM format
GFF format
Reference sequence used in the alignments

Fedorova, N., et al.

Aspergillus oryzae Datasets

Wang et al., 2010

Source: A. oryzae RNA-Seq was performed by Wang B, Guo G, Wang C, Lin Y, Wang X, Zhao M, Guo Y, He M, Zhang Y, and Pan L, at the School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, Guangdong, China. Data were downloaded from NCBI by AspGD.

Notes: This A. oryzae RNA-Seq dataset is published in:
Wang B, et al. 2010.

Data were downloaded from NCBI:

This publicly available A. oryzae RNA-Seq dataset comprises eight experiments. Paired-end RNA sequencing (RNA-Seq) was performed on the Illumina platform using poly(A)-enriched mRNA isolated from four different culture conditions.

DOWNLOAD these data from AspGD:
FASTQ format
BAM format

Wang B, Guo G, Wang C, Lin Y, Wang X, Zhao M, Guo Y, He M, Zhang Y, Pan L. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing. Nucleic Acids Res. 2010 Aug;38(15):5075-87. Epub 2010 Apr 14.

If you would like to suggest datasets or citations to add to this list, please send a message to AspGD curators with details.
