#######
# Input genomes
#######

assembled_genomes/: fasta file of each of the genomes included in the FESNov gene family analysis
assembled_genomes.tar: fasta file of each of the genomes included in the FESNov gene family analysis

#####
# Predictions
#####

# Genome taxonomic annotations
genome2taxonomy.r202.tab: GTDB taxonomy of the genomes included in the analysis

# Gene predictions
microbial_genomes-v1.proteins.dups_reformatted.faa: concatenated proteins of the genes predicted on the genomes in assembled_genomes/
microbial_genomes-v1.proteins.dups_reformatted.fna: concatenated CDSs of the genes predicted on the genomes in assembled_genomes/

# Gene functional annotation
annotations.tar.gz: eggnog-mapper predictions for all the genes predicted on the concatenated proteome

# Gene families
microbial_genomes-v1.clustering.folded.tsv.gz: gene families predicted on the concatenated proteome. Fields are:
    1) Gene family name
    2) Number of gene family members
    3) Number of species (calculated on previous GTDB versions; we later updated this number with GTDB r202 taxonomic labels)
    4) Whether the gene family contains any gene coming from RefSeq genomes (isolate) or not (metagenome)
    5) Gene family members
all_clusters.fasta.tar.gz: individual protein fasta file for gene families passing initial filters (more than 2 members, more than 2 species, no reference genome present)
all_clusters.trees.tar.gz: individual protein family trees for gene families passing initial filters (more than 2 members, more than 2 species, no reference genome present)

# Novel gene families before quality filters
microbial_genomes-v1.clustering.folded.no_emapper.no_pfamB.no_pfamA.no_refSeq.txt: list of gene families with no significant homologs in eggNOG, Pfam, or RefSeq

# Contig classification
plasflow.json.gz: PlasFlow predictions for the contigs in which each gene was detected
seeker.json.gz: Seeker predictions for the contigs in which each gene was detected

# Synapomorphism analysis
cov_sp_per_fam.raw.tab: coverage / specificity computed for every gene family

######
# FESNov gene families data
######

# Fasta / hmm / tree files per FESNov gene family
FESNov_families.fasta.tar.gz: individual protein fasta file for each FESNov gene family
FESNov_families.trees.tar.gz: gene family trees for each FESNov gene family
FESNov_families.hmm.tar.gz: hmm profile for each FESNov gene family
FESNov_fams.rep_seq.faa: representative sequence per FESNov family

# PDB files
all_pdbs/: pdb files computed on the FESNov gene families
pLDDT_values.tab: pLDDT values associated with each structure prediction

# Stats per FESNov gene family
FESNov_fams_stats.tab: characteristics of each FESNov gene family (dN/dS, identity, number of species, ...)
FESNov_fams_funct_hab_info.json: functional predictions and ecological information for all the FESNov gene families

# Lists of FESNov gene families in different functional and evolutionary groups
sublists/
    amp.tsv: short FESNov proteins predicted to be antimicrobial peptides by Macrel
    BGC.tsv: FESNov proteins in biosynthetic gene clusters
    euk.tsv: FESNov proteins with homologs in the EukProt database
    fitness.tsv: FESNov gene families with homologs in the FitnessBrowser database
    foldseek.tsv: FESNov gene families showing structural similarity to PDB or UniProt proteins
    hq_functions.tsv: FESNov gene families associated with KEGG pathways with confidence >= 0.9 in genomic context analysis
    resistance.tsv: FESNov gene families surrounded by 2 or more AMR genes in the CARD database
    sberro.tsv: FESNov gene families with homologs in the small protein collection presented in Sberro et al. (2019)
    synapos.tsv: synapomorphic FESNov gene families
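
The five fields listed for microbial_genomes-v1.clustering.folded.tsv.gz, together with the initial filters described for all_clusters.fasta.tar.gz (more than 2 members, more than 2 species, no reference genome present), can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the files. It assumes that the table has no header line, that field 4 holds the literal strings "isolate" or "metagenome", and that the members in field 5 are comma-separated; none of these details are confirmed by this README, so check a few rows first. The names GeneFamily, read_families and passes_initial_filters are hypothetical. Note also that field 3 was computed on earlier GTDB versions, so counts may differ from the r202-based numbers.

```python
import gzip
from typing import Iterator, List, NamedTuple, TextIO


class GeneFamily(NamedTuple):
    """One row of the clustering table (hypothetical container)."""
    name: str
    n_members: int
    n_species: int
    origin: str          # assumed to be "isolate" or "metagenome"
    members: List[str]


def read_families(handle: TextIO, member_sep: str = ",") -> Iterator[GeneFamily]:
    """Yield one GeneFamily per non-empty, tab-separated line.

    member_sep is an assumption: the README does not state how the
    members in field 5 are delimited.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, n_members, n_species, origin, members = line.split("\t")[:5]
        yield GeneFamily(name, int(n_members), int(n_species),
                         origin, members.split(member_sep))


def passes_initial_filters(fam: GeneFamily) -> bool:
    """More than 2 members, more than 2 species, no reference genome."""
    return fam.n_members > 2 and fam.n_species > 2 and fam.origin != "isolate"


if __name__ == "__main__":
    path = "microbial_genomes-v1.clustering.folded.tsv.gz"
    with gzip.open(path, "rt") as fh:
        kept = sum(1 for fam in read_families(fh) if passes_initial_filters(fam))
    print(f"{kept} gene families pass the initial filters")
```

Reading the file as a stream keeps memory use low, which matters if the clustering table is large; adjust member_sep if the members turn out to use a different delimiter.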