Adjustable parameters
Project description

Welcome! This resource allows you to interactively explore the genomic context, functional associations, phylogenetic information and ecological distribution of 413,335 novel and highly curated protein families, identified from thousands of uncultivated microbial organisms.

All data is derived from the systematic analysis of the so-called microbial dark matter of a multi-habitat metagenomics dataset.

You can search novel protein families by their genomic context (e.g., association with specific KEGG functions or eggNOG orthologs), by taxonomic conservation (i.e., by finding rank-specific protein families), or by association with fitness experiments.

Citation

Functional and evolutionary significance of unknown genes from uncultivated taxa

Álvaro Rodríguez del Río, Joaquín Giner-Lamia, Carlos P. Cantalapiedra, Jorge Botas, Ziqi Deng, Ana Hernández-Plaza, Lucas Paoli, Thomas S.B. Schmidt, Shinichi Sunagawa, Peer Bork, Luis Pedro Coelho, Jaime Huerta-Cepas


members | species | genes with signal peptide | mean number of TM domains
Sequence information
members | length (aa) | signal peptide (gram+) | signal peptide (gram-) | topology | sequence | neighbor sequences
Taxonomic assignment
Taxonomic level | Number | Most common | Coverage | Specificity | Score | Total genomes | RefSeq genomes
Experimentally tested
Condition | Condition type | Fitness variation | T score | Identity | Novel family coverage | Original gene coverage | e-value
Biome mapping distribution