Wednesday, 10 August 2011

Descriptors for Protein Sequences?


Does anyone know of a website/web service that calculates a series of descriptors for a protein sequence, analogous to the descriptors that are regularly calculated for small molecules?

Specifically, what I'm looking for is something that gives a large set of descriptors for either a sequence or a given stable identifier (e.g. a UniProt ID). The descriptors I'd like back would be things like molecular weight, number of each amino acid, fraction of each amino acid, hydrophobicity values, complexity/sequence entropy values, number of transmembrane helices, and presence of certain features (e.g. signal sequence, nuclear localisation sequence, etc.). Domain counts would be good as well, building up a 'fingerprint' for the sequence. I guess with a little bit of thought it would be possible to come up with a fuller list of descriptors, and the above certainly isn't exhaustive, but you get the idea, I'm sure. To be clear, I don't want an annotation service; I just want some numerical/logical feature descriptors.

Does something like this exist and, if not, should it be built?
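
To make the idea concrete, here is a minimal Python sketch of the simpler sequence-only descriptors from the list above: molecular weight, per-residue counts and fractions, a mean hydrophobicity value and sequence entropy. It is an illustration under stated assumptions, not an existing service: the function name `sequence_descriptors` and the output keys are made up for this example, the masses are approximate average residue masses, hydrophobicity is taken as the mean Kyte-Doolittle hydropathy, and only the 20 standard amino acids are accepted. Features such as transmembrane helices, signal sequences or domain counts need dedicated predictors and are not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Approximate average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "Q": 128.1307, "E": 129.1155, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini

# Kyte-Doolittle hydropathy scale (one of several possible scales).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_descriptors(sequence):
    """Return a flat dict of numerical descriptors for a protein sequence.

    Hypothetical helper for illustration; accepts only the 20 standard
    one-letter amino acid codes.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    n = len(seq)
    counts = Counter(seq)

    descriptors = {
        "length": n,
        # Sum of residue masses plus one water molecule.
        "molecular_weight": sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS,
        # Mean hydropathy over all residues.
        "mean_hydropathy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Shannon entropy (bits) of the amino acid composition.
        "entropy": -sum((c / n) * math.log2(c / n) for c in counts.values()),
    }
    for aa in AMINO_ACIDS:
        descriptors[f"count_{aa}"] = counts[aa]  # Counter gives 0 if absent
        descriptors[f"frac_{aa}"] = counts[aa] / n
    return descriptors
```

Calling `sequence_descriptors("MKTAYIAKQR")` returns one fixed-length dictionary per sequence, so the values can be stacked row by row into a descriptor table regardless of sequence length.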

5 comments:

Vladimir Chupakhin said...

Di- and tripeptide compositions generated from the sequence are the simplest descriptors, and I don't think you need a web service for that. It's quite easy to implement.
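
As a rough illustration of how little code this takes, the sketch below computes di- or tripeptide composition as the frequency of each overlapping k-mer. The function name `kmer_composition` is hypothetical, and the choice to use overlapping windows and to reject non-standard residues is an assumption made for this example.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Return the frequency of every possible k-mer (k=2: 400, k=3: 8000).

    Hypothetical helper for illustration; counts overlapping windows and
    accepts only the 20 standard one-letter amino acid codes.
    """
    seq = sequence.strip().upper()
    if k < 1 or len(seq) < k:
        raise ValueError("sequence is shorter than k")
    if set(seq) - set(AMINO_ACIDS):
        raise ValueError("sequence contains non-standard residues")

    total = len(seq) - k + 1  # number of overlapping windows
    counts = Counter(seq[i:i + k] for i in range(total))

    # Report every possible k-mer so the vector has a fixed length.
    return {
        "".join(p): counts["".join(p)] / total
        for p in product(AMINO_ACIDS, repeat=k)
    }
```

For example, `kmer_composition(seq, 2)` gives a 400-element dipeptide vector and `kmer_composition(seq, 3)` an 8000-element tripeptide vector, both independent of sequence length.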

jpo said...

You're right, for small peptides it is quite straightforward to enumerate them and then calculate classical small-molecule-type features, but what I am after is a set of descriptors that work for arbitrary-length sequences and that have more 'biological context', a sort of 'functional group' analogue.

Egon Willighagen said...

John, the CDK has some protein descriptors, and MW can be done too; AA count is not there but is trivial. What is the input: the sequence, a PDB ID or a PDB file?

Matteo Floris said...

Pepstats, a command-line tool from the EMBOSS suite, is one source of such descriptors: http://emboss.sourceforge.net/apps/cvs/emboss/apps/pepstats.html

Nan Xiao said...

You could try our R package protr: http://cran.r-project.org/web/packages/protr/ or the web app ProtrWeb: http://cbdd.csu.edu.cn:8080/protrweb/. They provide several state-of-the-art protein sequence-derived structural and physicochemical descriptors. :)