Welcome! This resource lets you interactively explore the genomic context, functional associations, phylogenetic information, and ecological distribution of 413,335 novel, highly curated protein families identified from thousands of uncultivated microbial organisms.
All data is derived from the systematic analysis of the so-called microbial dark matter of a multi-habitat metagenomics dataset.
You can search novel protein families by genomic context (e.g., association with specific KEGG functions or eggNOG orthologs), by taxonomic conservation (i.e., by finding rank-specific protein families), or by association with fitness experiments.
Functional and evolutionary significance of unknown genes from uncultivated taxa
Álvaro Rodríguez del Río, Joaquín Giner-Lamia, Carlos P. Cantalapiedra, Jorge Botas, Ziqi Deng, Ana Hernández-Plaza, Lucas Paoli, Thomas S.B. Schmidt, Shinichi Sunagawa, Peer Bork, Luis Pedro Coelho, Jaime Huerta-Cepas
members | length (aa) | signal peptide (gram+) | signal peptide (gram-) | topology | sequence | neighbor sequences |
---|---|---|---|---|---|---|
Taxonomic level | Number | Most common | Coverage | Specificity | Score | Total genomes | RefSeq genomes |
---|---|---|---|---|---|---|---|
Condition | Condition type | Fitness variation | T score | Identity | Novel family coverage | Original gene coverage | e-value |
---|---|---|---|---|---|---|---|