Proteomics

A Guide to Proteomics

Enrichment of the phosphoproteome can also be combined with protein profiling by 1- or 2-DE. In this way, changes in protein amount observed on electrophoresis will reflect the level of protein phosphorylation (Fig. 13). Recently, the principle of protein quantitation by ICAT has been combined with phosphoprotein enrichment (60). This was accomplished by the introduction of isotopic label into ethanedithiol, the reagent used to convert the alkene created by β-elimination of phosphoserine into a free sulfhydryl group. In this way, the differences in the amount of phosphoproteins in extracts can be analyzed quantitatively in the mass spectrometer (60). It should be noted that because of the chemistry used in both of these methods, these techniques are relatively insensitive and require tens of picomoles of phosphoprotein. As a result, we have found that these methods as currently designed are impractical for the isolation and enrichment of low-abundance phosphoproteins.

Phosphorylation site determination by Edman degradation. Edman sequencing is still a widely used method for determining phosphorylation sites in proteins labeled with 32P, either in vitro or in vivo (5, 22, 164). This is because sites can be determined at the subfemtomolar level if enough radioactivity can be incorporated into the phosphoprotein of interest. In our hands, this can be as little as 1,000 cpm, although this is not ideal. Briefly, 32P-labeled protein is digested with a protease and the resulting phosphopeptides are separated and purified by reverse-phase HPLC or thin-layer chromatography (TLC) (Fig. 14). The isolated peptides are then cross-linked via their C termini to an inert membrane (e.g., Immobilon P; PerSeptive Biosystems). The radioactive membrane is subjected to several Edman cycles, and radioactivity is collected after each cleavage step. The released 32P is counted in a scintillation counter. This method places the phosphoamino acid at a defined position within the sequenced phosphopeptide. Of course, this is meaningful only if the sequence of the phosphopeptide is already known. In addition, the analysis ceases to be quantitative beyond 30 Edman cycles (even with efficient, modern Edman machines) because of well-understood limitations in the repetitive yield of Edman chemistry.

 

Recently, our laboratory has extended the usefulness of phosphorylation site characterization by Edman chemistry through the development of the cleaved radioactive peptide (CRP) program (J. A. MacDonald, A. J. Mackey, W. R. Pearson, and T. A. J. Haystead, submitted for publication). In CRP analysis, one requires only that the sequence of the protein be known. Purification and sequencing of individual peptides are not required. Radiolabeled proteins (isolated following immunoprecipitation from 32P-labeled cells, for example) are cleaved at predetermined residues by the action of a protease. The phosphopeptides are then separated by HPLC or TLC (if only one site is present, no peptide separation is required), cross-linked to the inert membrane, and carried through 25 to 30 Edman cycles. The sequence of the target protein is entered into the CRP program. This program predicts how many Edman cycles are required to cover all of the serines, threonines, and tyrosines, counting from each site of cleavage. Generally, one round of CRP analysis narrows the number of possible sites to 5 to 10 for most proteins. Phosphoamino acid analysis can be used to reduce the number of possibilities still further. The CRP analysis is then repeated following cleavage with a second protease (usually one cutting at R, but M and F are alternatives). The second round of CRP usually unambiguously localizes the phosphoamino acid to one possible site. The technique does not work if sites are more than 30 amino acids away from all possible cleavage sites. The finding that CRP analysis is not applicable may in itself confine a phosphorylation site to a segment of the protein that is likely to produce very large proteolytic fragments. The Cleavage of Radioactive Proteins (CRP) program is accessible at http://fasta.bioch.virginia.edu/crp/ and was written in collaboration with Aaron Mackey and Bill Pearson of the University of Virginia (MacDonald et al., submitted).
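The logic of a CRP-style analysis can be illustrated with a minimal sketch. This is not the CRP program itself: the function names, the toy sequence, and the choice of K as the first cleavage residue are assumptions made only for illustration. Each round pairs a cleavage specificity with the Edman cycle in which radioactivity was released, and the candidate sites from successive rounds are intersected.

```python
# Hypothetical sketch of the reasoning behind CRP analysis; NOT the CRP program.

def cleave(sequence, residues):
    """Split a protein C-terminal to every residue in `residues`.

    Returns (start_index, peptide) pairs; start_index is 0-based.
    """
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def sites_by_cycle(sequence, residues, max_cycles=30):
    """Map each Edman cycle to the S/T/Y residues that cycle would release."""
    cycles = {}
    for start, peptide in cleave(sequence, residues):
        for offset, aa in enumerate(peptide):
            cycle = offset + 1  # cycle 1 removes the peptide's N-terminal residue
            if aa in "STY" and cycle <= max_cycles:
                cycles.setdefault(cycle, []).append(f"{aa}{start + offset + 1}")
    return cycles


def candidate_sites(sequence, observations, max_cycles=30):
    """Intersect candidates over (cleavage residues, radioactive cycle) rounds."""
    candidates = None
    for residues, hot_cycle in observations:
        found = set(sites_by_cycle(sequence, residues, max_cycles).get(hot_cycle, []))
        candidates = found if candidates is None else candidates & found
    return sorted(candidates or [])


toy = "MASKTLSRGSYKAEESR"  # made-up sequence, for illustration only
print(sites_by_cycle(toy, "K")[3])                   # ['S3', 'S7']
print(candidate_sites(toy, [("K", 3), ("R", 7)]))    # ['S7']
```

In this toy case the first round leaves two candidates and the second round, with a different cleavage specificity, leaves one, which mirrors the two-round narrowing described above. Sites lying more than `max_cycles` residues from every cleavage point never appear in the output.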

Phosphorylation site determination by mass spectrometry. Because of its sensitivity, MS can allow the direct sequencing of phosphopeptides, resulting in unambiguous phosphorylation site identification. Below, a brief overview of some common methods for phosphorylation site determination by MS is given. A more complete discussion of this topic is provided by Mitchelhill and Kemp (110). Identification of phosphorylation sites in proteins presents several unique challenges for the mass spectrometrist. For example, unlike in protein identification, where analysis of any peptide within the protein can be informative, phosphorylation site analysis requires that the phosphorylated peptide be analyzed. This means that considerably more protein is required for analysis. In addition, phosphorylation can alter the cleavage pattern of a protein and the resulting phosphopeptides may require different purification methods. To isolate and purify the phosphopeptides of interest, it may be necessary to alter the way in which the phosphoprotein is digested and to alter the pH or the chromatographic material used for peptide purification (27, 110, 116).

(i) Phosphopeptide sequencing by MS/MS. In our laboratory, we have found that a combination of HPLC, Edman degradation, and phosphopeptide sequencing by MS/MS provides the best results for phosphorylation site determination (Fig. 14). Following excision and digestion of a 32P-labeled protein, the peptides are resolved by HPLC. By monitoring HPLC fractions for radioactivity, the phosphopeptides can be selected for analysis. This reduces the complexity of the peptide mixture before MS is performed and facilitates phosphopeptide identification (Fig. 14).

Phosphopeptides can be identified from a mixture of peptides by a method known as precursor ion scanning (116). In this method, the second mass analyzer in the mass spectrometer is set at the mass of the reporter ion for the phospho group (PO3−), m/z 79. Peptides are sprayed under neutral or basic conditions, and phosphopeptides are identified in the precursor ion scan only if their fragmentation yields an ion of m/z 79. Once a phosphopeptide is identified, the peptide mixture is sprayed under acidic conditions and the phosphopeptide is sequenced by conventional tandem MS/MS. On fragmentation of the phosphopeptide, phosphoserine can be identified by the formation of dehydroalanine (69 Da), the β-elimination product of phosphoserine. Similarly, phosphothreonine can be identified by the formation of its β-elimination product, dehydroamino-2-butyric acid, at 83 Da (116).

(ii) Analysis of phosphopeptides by MALDI-TOF. MALDI-TOF mass spectrometry can also be used to identify phosphopeptides (81, 130, 177, 178). When phosphorylated peptides are subjected to ionization by MALDI, phosphate groups are frequently liberated from the peptides. This is the case for phosphoserine- and phosphothreonine-containing peptides, which can liberate HPO3 or H3PO4, resulting in a neutral loss of 80 and 98 Da, respectively. Careful examination of the TOF spectrum for differences in peptide masses of 80 Da that are not found in the unphosphorylated peptide control can identify phosphopeptides. Phosphopeptides can also be identified by treating one of two identical samples with protein phosphatase to liberate phosphate groups (Fig. 14). Once a phosphopeptide is identified, it can be sequenced by MS/MS for identification of the phosphorylation site (178).
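The phosphatase comparison lends itself to a simple peak-list check. The sketch below is only an illustration under stated assumptions: the peak lists are invented, the tolerance and the limit of three phosphates per peptide are arbitrary, and `find_phosphopeptides` is a hypothetical helper rather than part of any instrument software. It flags peaks that disappear after phosphatase treatment and reappear lower by a multiple of the HPO3 mass.

```python
HPO3 = 79.96633  # monoisotopic mass added per phosphate group, in Da


def find_phosphopeptides(untreated, treated, tol=0.05, max_sites=3):
    """Return (phospho_mass, dephospho_mass, n_phosphates) for shifted peaks."""

    def present(mass, peaks):
        return any(abs(mass - p) <= tol for p in peaks)

    hits = []
    for mass in untreated:
        if present(mass, treated):
            continue  # unchanged by phosphatase, so not a phosphopeptide
        for n in range(1, max_sites + 1):
            shifted = mass - n * HPO3
            match = [p for p in treated if abs(shifted - p) <= tol]
            if match:
                hits.append((mass, match[0], n))
                break
    return hits


# Invented peak lists (Da), for illustration only.
untreated = [1045.53, 1320.62, 1400.59, 1802.88]
treated = [1045.53, 1320.62, 1802.88]
print(find_phosphopeptides(untreated, treated))  # [(1400.59, 1320.62, 1)]
```

A peak flagged in this way is only a candidate; as the text notes, the site itself still has to be assigned by MS/MS.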

 

One of the most exciting applications of proteomics involves combining this technology with the power of yeast genetics to delineate signaling events in vivo. Our laboratory has published two papers using this strategy to identify in vivo targets for protein phosphatases (9, 40). In one study (9), we identified physiological substrates for the Glc7p-Reg1p complex by examining the effects of deletion of the REG1 gene on the yeast phosphoproteome. In S. cerevisiae, PP-1 (Glc7p) and its binding protein, Reg1p, are essential for the regulation of glucose repression pathways. The target for this phosphatase complex was not known. Analysis by 2-D phosphoprotein mapping identified two distinct proteins that were greatly increased in phosphate content in reg1Δ mutants. Mixed-peptide sequencing identified these proteins as hexokinase II (Hxk2p) and the E1 subunit of pyruvate dehydrogenase. We then went on to validate these findings in a comprehensive biochemical study. Consistent with increased phosphorylation of Hxk2p in response to REG1 deletion, fractionation of yeast extracts by anion-exchange chromatography identified a Hxk2p phosphatase activity in wild-type strains that was selectively lost in the reg1Δ mutant. Having carried out these studies, we attempted to rescue the reg1Δ phosphoprotein phenotype by overexpressing both wild-type and mutant Reg1p in the deletion strains. Here, both the phosphorylation state of Hxk2p and Hxk2p phosphatase activity were restored to wild-type levels in the reg1Δ mutant by expression of a LexA-Reg1p fusion protein. In contrast, expression of a LexA-Reg1p protein containing mutations at phenylalanine in a putative PP-1C (the catalytic subunit) binding site motif, (K/R)(X)(I/V)XF, was unable to rescue Hxk2p dephosphorylation in intact yeast or restore Hxk2p phosphatase activity. These results demonstrate that Reg1p targets PP-1C to dephosphorylate Hxk2p in vivo and that the peptide motif (K/R)(X)(I/V)XF is necessary for its PP-1 targeting function. These studies therefore demonstrate how a proteomics approach can be used first to identify enzyme targets in cells and then to direct all further analysis to verify the findings. It should be pointed out that often 6 to 12 months of work ensues following the initial sequencing of the targeted proteins. Nevertheless, a combined proteomics and genetics approach clearly enhances one's ability to directly answer key biological questions. We believe that a similar strategy could be adopted with transgenic or knockout mouse work, particularly in cases where there is no obvious phenotype.


Proteome mining is a functional proteomics approach used to extract protein information from the analysis of specific subproteomes. The strategy of proteome mining is shown in Fig. 15. The principles of proteome mining are based on the assumption that all drug-like molecules selectively compete with a natural cellular ligand for a binding site on a protein target. In a proteome mine, natural ligands are immobilized on beads at high density and in an orientation that sterically favors interaction with their protein targets. The immobilized ligand is then exposed to whole-animal or tissue extract, and bound proteins are evaluated for specificity by protein sequencing. In the prototypic example from our laboratory, ATP is immobilized in the "protein kinase orientation" (via its gamma phosphate). Microsequencing of the proteins that were eluted with free ATP demonstrated that the nucleotide selectively recovered purine binding proteins including protein kinases, dehydrogenases, various purine-dependent metabolic enzymes, DNA ligases, heat shock proteins, and a variety of miscellaneous ATP-utilizing enzymes (P. R. Graves, J. Kwiek, P. Fadden, R. Ray, K. Hardeman, and T. A. J. Haystead, submitted for publication). This immobilized proteome represents approximately 4% of the expressed eukaryotic genome.

 

We have utilized this captured proteome (the purine binding cassette proteome) to test the selectivity of purine analogs that inhibit protein kinases and stress-induced ATPases in vitro. We applied sufficient biomass to a proteome-mining ATP affinity array apparatus constructed in our laboratory to ensure the recovery, per column, of 1 fmol of any protein expressed at 100 copies/cell (10^7 cells). After washing, each column in the array is eluted in parallel with molecules from a purine-based iterative library and fractions are collected. Eluates are screened for protein, and positive fractions generally contain a single protein, a small number of structurally related proteins, or a complex mixture. Only the first two categories are sequenced, since the third results from elution with a nonselective inhibitor. Once one has identified an eluted protein, one has all the necessary information on how to proceed. The first decision is biological relevance. Does the eluted protein(s) in any given fraction have relevance to any human disease? If the protein has no obvious use as a drug target, it is ignored. If the protein is deemed relevant, one immediately has a lead molecule and a defined target. In cases where a single protein is eluted, the lead is likely to be selective because it had an equal opportunity to interact with the rest of the captured proteome (approximately 4% of the genome). Selectivity can be tested by increasing the concentration of the lead compound during elution from nanomolar to micromolar. Information concerning potential toxicity can be gained by sequencing other proteins that are simultaneously eluted or eluted at higher concentrations. If some of these are undesirable targets, iterative substitutions can be made around the lead scaffold to improve selectivity. Proof of principle of this technology was obtained by using an iterative library derived from the heat shock protein 90 inhibitor geldanamycin, and a new physiological target, ADE2, was identified (P. Fadden, V. J. Davisson, L. Neckers, and T. A. J. Haystead, unpublished data). Screening Combichem libraries through a proteome-mining approach exploits the serendipitous nature of drug discovery to its maximum, simply because it accelerates the hit rate over a conventional screen by a factor that scales with the size of the bound proteome. In the case of purine binding proteins, this may be several hundredfold. Protein microsequencing, the data contained within the various genome projects, and the ability to instantly search the literature for relevance enable one to interpret the outcomes in a rational way.
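The recovery figure quoted for the affinity array (1 fmol of a protein expressed at 100 copies per cell) can be checked with a short calculation. The sketch below assumes that the cell number is 10^7 per column and that recovery is complete; both are readings of the text rather than stated experimental parameters, and `recovered_fmol` is a hypothetical helper.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def recovered_fmol(copies_per_cell, n_cells, recovery=1.0):
    """Femtomoles of a protein obtained from n_cells at a given copy number."""
    molecules = copies_per_cell * n_cells * recovery
    return molecules / AVOGADRO * 1e15


print(round(recovered_fmol(100, 1e7), 2))  # 1.66
```

At 100% recovery, 10^7 cells supply about 1.7 fmol, so the 1 fmol target leaves room for a recovery of roughly 60%.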

We are currently using proteome mining to discover new antimalarial drugs that target purine binding proteins in the blood stage of infection. Because of the essential roles of purine-utilizing enzymes in cellular function, it is our hypothesis that these proteins are attractive candidates for a new generation of antimalarial drugs. In our malaria project, the P. falciparum (blood stage) and human red blood cell purine binding proteome are captured on ATP affinity arrays and simultaneously screened against purine-based combinatorial libraries. Combining both proteomes enables the selectivity and potential toxicity of a lead molecule to be measured early in the discovery process. Microsequencing enables human proteins to be readily discriminated from malarial ones. An additional benefit of mining the entire malarial purine binding cassette proteome is that multiple leads and their targets will be identified. Combined therapies that target multiple genes simultaneously are likely to exert such tremendous selective pressure on the targeted pathogen that it cannot develop resistance. We are currently expanding our immobilized natural-ligand library in order to apply proteome mining to other areas of biology.

 

The study of proteins, in contrast to that of DNA, presents a number of unique challenges. For example, there is no equivalent of PCR for proteins, so the analysis of low-abundance proteins remains a major challenge. In addition, in protein interaction studies, native conformations of proteins must be maintained to obtain meaningful results. Can proteins be studied on a large scale with speed, sensitivity, and reliability? In the last several years, recognition of the limitations of proteomics has begun to point the field in new directions.

Although the technology for the analysis of proteins is rapidly progressing, it is still not feasible to study proteins on a scale equivalent to that of the nucleic acids. Most of proteomics relies on methods, such as protein purification or PAGE, that are not high-throughput methods. Even performing MS can require considerable time in either data acquisition or analysis. Although hundreds of proteins can be analyzed quickly and in an automated fashion by a MALDI-TOF mass spectrometer, the quality of data is sacrificed and many proteins cannot be identified. Much higher-quality data can be obtained for protein identification by MS/MS, but this method requires considerable time in data interpretation. In our opinion, new computer algorithms are needed to allow more accurate interpretation of mass spectra without operator intervention. In addition, to access unannotated DNA databases across species, these algorithms should be error tolerant to allow for sequencing errors, polymorphisms, and conservative substitutions. New technologies will have to emerge before protein analysis on a large scale (such as mapping the human proteome) becomes a reality.

Another major challenge for proteomics is the study of low-abundance proteins. In some eukaryotic cells, the amounts of the most abundant proteins can be 10^6-fold greater than those of the low-abundance proteins. Many important classes of proteins (that may be important drug targets) such as transcription factors, protein kinases, and regulatory proteins are low-copy proteins. These low-copy proteins will not be observed in the analysis of crude cell lysates without some purification. Therefore, new methods must be devised for subproteome isolation. Despite these limitations, proteomics, when combined with other complementary technologies such as molecular biology, has enormous potential to provide new insight into biology. The ability to study complex biological systems in their entirety will ultimately provide answers that cannot be obtained from the study of individual proteins or groups of proteins.


* Corresponding author. Mailing address: Department of Pharmacology and Cancer Biology, Duke University, Research Dr., C118 LSRC, Durham, NC 27710. Phone: (919) 613-8606. Fax: (919) 668-0977. E-mail: hayst001@mc.duke.edu .



(a) Triple quadrupole. Triple-quadrupole mass spectrometers are most commonly used to obtain amino acid sequences. In the first stage of analysis, the machine is operated in MS scan mode and all ions above a certain m/z ratio are transmitted to the third quadrupole for mass analysis (Fig. 6) (82, 173). In the second stage, the mass spectrometer is operated in MS/MS mode and a particular peptide ion is selectively passed into the collision chamber. Inside the collision chamber, peptide ions are fragmented by interactions with an inert gas by a process known as collision-induced dissociation or collisionally activated dissociation. The peptide ion fragments are then resolved on the basis of their m/z ratio by the third quadrupole (Fig. 6). Since two different mass spectra are obtained in this analysis, it is referred to as tandem mass spectrometry (MS/MS). MS/MS is used to obtain the amino acid sequence of peptides by generating a series of peptides that differ in mass by a single amino acid (71, 73).

 

(b) Quadrupole-TOF. In recent years, several "hybrid" mass spectrometers have emerged from the combination of different ionization sources with mass analyzers. One example is the quadrupole-TOF mass spectrometer (111, 112, 162). In this machine, the first quadrupole (Q1) and the quadrupole collision cell (q) of a triple-quadrupole machine have been combined with a time-of-flight analyzer (TOF) (145). The main applications of a QqTOF mass spectrometer are protein identification by amino acid sequencing and characterization of protein modifications. However, because it is coupled to electrospray, it is not typically utilized for large-scale proteomics.

(c) MALDI-TOF. The principal application of a MALDI-TOF mass spectrometer is peptide mass fingerprinting because it can be completely automated, making it the method of choice for large-scale proteomics work (48). Because of its speed, MALDI-TOF is frequently used as a first-pass instrument for protein identification. If proteins cannot be identified by fingerprinting, they can then be analyzed by electrospray and MS/MS. A MALDI-TOF machine can also be used to obtain the amino acid sequence of peptides by a method known as post-source decay (152). However, peptide sequencing by post-source decay is not as reliable as sequencing with competing electrospray methods because the peptide fragmentation patterns are much less predictable (85, 111).

(d) MALDI-QqTOF. The MALDI-QqTOF mass spectrometer was developed to permit both peptide mass fingerprinting and amino acid sequencing (97, 147). It was formed by the combination of a MALDI ion source with a QqTOF mass analyzer (63, 91, 97, 147, 162). Thus, if a sample is not identified by peptide mass fingerprinting in the first step, the amino acid sequence can then be obtained without having to use a different mass spectrometer. However, in our experience, the amino acid sequence information obtained using this instrument was more difficult to interpret than that obtained from a nanospray-QqTOF mass spectrometer.

(e) FT-ICR. A Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer is an ion-trapping instrument that can achieve higher mass resolution and mass accuracy than any other type of mass spectrometer (10). Recently, FT-ICR has been employed in the analysis of biomolecules ionized by both ESI and MALDI. The unique abilities of FT-ICR provide certain advantages compared to other mass spectrometers. For example, because of its high resolution, FT-ICR can be used for the analysis of complex mixtures. FT-ICR, coupled to ESI, is also being employed in the study of protein interactions and protein conformations. A high-throughput, large-scale proteomics approach involving FT-ICR has recently been developed by Smith et al. (150). For a review of the operating principles of FT-ICR and its applications, the reader is directed to reference 104.

(v) Peptide fragmentation. As peptide ions are introduced into the collision chamber, they interact with the collision gas (usually nitrogen or argon) and undergo fragmentation primarily along the peptide backbone (71, 73, 172). Since peptides can undergo multiple types of fragmentation, nomenclature has been created to indicate what type of ions have been generated (Fig. 7). If, after peptide bond cleavage, the charge is maintained on the N terminus of the ion, it is designated a b-ion, whereas if the charge is maintained on the C terminus, it is a y-ion (Fig. 7) (18, 135, 173). The difference in mass between adjacent y- or b-ions corresponds to that of an amino acid. This can be used to identify the amino acid and hence the peptide sequence, with the exception of isoleucine and leucine, which are identical in mass and therefore indistinguishable (103). Both y- and b-type ions can also eliminate NH3 (-17 Da), H2O (-18 Da), and CO (-28 Da), resulting in pairs of signals observed in the mass spectrum (Fig. 7). In addition to fragmentation along the peptide backbone, cleavage can occur along amino acid side chains, and this information can be used to distinguish isoleucine and leucine (172).
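The relationship between the ion series and the sequence can be made concrete with a minimal sketch. It assumes an unmodified peptide, singly protonated fragments, and standard monoisotopic residue masses; the function names and the 0.01-Da tolerance are illustrative choices, not part of any search engine.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O retained by the C-terminal (y) fragment
PROTON = 1.00728   # charge carrier for a singly protonated ion


def fragment_ions(peptide):
    """Return singly charged (b, y) m/z lists: b1..b(n-1) and y1..y(n-1)."""
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(MASS[aa] for aa in peptide[:i]) + PROTON)
        y_ions.append(sum(MASS[aa] for aa in peptide[-i:]) + WATER + PROTON)
    return b_ions, y_ions


def read_ladder(ions, tol=0.01):
    """Turn gaps between adjacent ions into residues ('L' stands for I or L)."""
    residues = []
    for low, high in zip(ions, ions[1:]):
        gap = high - low
        match = [aa for aa, m in MASS.items() if aa != "I" and abs(m - gap) <= tol]
        residues.append(match[0] if match else "?")
    return "".join(residues)


b, y = fragment_ions("VAGSE")
print(read_ladder(b))  # AGS
print(read_ladder(y))  # SGA
print(fragment_ions("AIK") == fragment_ions("ALK"))  # True
```

The b ladder reads the sequence from the N terminus and the y ladder reads it in reverse from the C terminus. The last line shows why backbone fragments alone cannot separate isoleucine from leucine.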

 

(vi) Our approach to mass spectrometry. The sensitivity of a mass spectrometer is probably the single most important feature of the instrument. What is the sensitivity of a modern mass spectrometer? How much protein is needed to make an unambiguous identification? Many factors can affect sensitivity, such as sample preparation, sample ionization, the type of mass spectrometer used, the sample itself, and the type of database search employed. In our laboratory, we rely on 1- or 2-DE for the isolation and visualization of protein targets. We typically stain our gels with either Coomassie blue or silver stain. For most proteins, staining with Coomassie blue will give a dark band for approximately 1 µg of protein and a discernible one for approximately 200 ng. With silver staining, we can detect a dark band at approximately 50 ng and faint yet discernible bands at approximately 5 to 10 ng. However, a significant number of proteins do not stain well by these methods and larger proteins tend to bind more stain (mole/mole) than small proteins. In addition, MS is not a quantitative technique because peptide ionization is not quantitative. Therefore, some proteins that are barely visible on gels can give stronger signals by MS than do some darkly staining proteins. For example, one of the most frequently sequenced proteins in MS is human keratin, a component of dust. It is a contaminant that will often appear on polyacrylamide gels as faint silver-stained bands with a variety of molecular weights. It can be introduced simply from the glass plates or gel combs used for protein gels; therefore, it is a good idea to wash these items in concentrated acid before use.

We have found in our laboratory that most proteins applied to the gel at 5 to 10 ng (100 to 200 fmol for a 50-kDa protein) can be identified by MS. However, the ability to identify a protein depends on the protein itself and its presence in the database. Below 5 to 10 ng, the success rate decreases because fewer peptides are obtained for sequencing. Several prominent MS laboratories routinely report record-breaking sequencing sensitivity to the attomolar level. However, this sensitivity is usually toward a purified peptide sample that is directly introduced into the mass spectrometer. Since most proteins are isolated from gels for identification, this is not an accurate measure of sensitivity. In another case, it was reported that an amino acid sequence was obtained after the in-gel digestion of 25 fmol (1.7 ng) of pure bovine serum albumin (90). Again, since the protein was known before the analysis began, this is not a fair assessment of sensitivity. For unknown proteins, more protein is required because several peptides have to be sequenced before a confident assignment can be made.
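The conversions between gel loading (nanograms) and molar amount (femtomoles) quoted above follow directly from the molecular mass. A minimal sketch, in which the helper names are hypothetical and the mass of bovine serum albumin is taken as roughly 66.4 kDa (an assumption not stated in the text):

```python
def ng_to_fmol(nanograms, mass_kda):
    """Convert a protein amount in ng to fmol for a given mass in kDa."""
    return nanograms / mass_kda * 1000.0


def fmol_to_ng(femtomoles, mass_kda):
    """Convert a protein amount in fmol to ng for a given mass in kDa."""
    return femtomoles * mass_kda / 1000.0


print(round(ng_to_fmol(5, 50)), round(ng_to_fmol(10, 50)))  # 100 200
print(round(fmol_to_ng(25, 66.4), 1))                        # 1.7
```

Both results agree with the figures in the text: 5 to 10 ng of a 50-kDa protein is 100 to 200 fmol, and 25 fmol of albumin is about 1.7 ng.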


A typical approach to protein identification in our laboratory is outlined in Fig. 8. Protein from a polyacrylamide gel is excised and then in-gel digested with trypsin by the method of Wilm et al. (170). Following peptide extraction from the gel, we purify the peptides on Poros R2 (149, 169) in microcapillary tubes by using the method described on the website http://www.protana.com/products/applicationnotes/purification/default.asp. We use the API QSTAR Pulsar mass spectrometer (AB/MDS-SCIEX) with nanospray ionization to obtain an MS scan of the peptide mixture. From the MS scan, a peptide ion is selected for MS/MS based on its signal strength and charge state, which allow it to be distinguished from the background ions. In nanospray ionization, most peptide ions are either doubly or triply charged whereas the background ions are singly charged. This peptide ion is also known as the parent ion. MS/MS of a parent ion is performed, and amino acid sequence information for the peptide is obtained. As shown in Fig. 8, a single peptide was sequenced and found to match rhoptry-associated protein 2 (RAP-2) from Plasmodium falciparum. Since matching multiple peptides to a protein increases the confidence of identification (106), we typically sequence several peptides for each sample. For RAP-2, a total of four peptides were found to match the protein. Because the staining intensity on gels is not always a good indicator of the signal obtained by MS and because gel bands often contain protein mixtures, additional criteria can aid in protein identification. For example, if the major protein excised from the gel was 50 kDa, does the protein identified match in molecular mass? Is the protein from the expected species? If a protein is isolated from a 2-D gel, does it match the expected isoelectric point as exhibited on the gel?

 

 

Databases allow protein structural information harvested from Edman sequencing or MS to be used for protein identification. The goal of database searching is to be able to quickly and accurately identify large numbers of proteins (132). The success of database searching depends on the quality of the data obtained in the mass spectrometer, the quality of the database searched, and the method used to search the database. What is the best way to identify an unknown protein? What type of database search engine should be used?

Peptide mass fingerprinting database searching. One method of protein identification is peptide mass fingerprinting (77, 79, 102, 125, 175). In this method, the masses of peptides obtained from the proteolytic digestion of an unknown protein are compared to the predicted masses of peptides from the theoretical digestion of proteins in a database (Fig. 9). If enough peptides from the real mass spectrum and the theoretical one overlap, a protein identification can be made. The principal advantage of peptide mass fingerprinting is speed. The analysis and database search can be fully automated.
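The comparison at the heart of peptide mass fingerprinting can be sketched in a few lines. This is a deliberately simplified illustration and not the scoring used by any of the published search programs: the two-protein database is invented, digestion is idealized (cleavage after every K or R, no missed cleavages, no modifications), and the score is a plain count of observed masses that fall within an assumed tolerance of a theoretical mass.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Idealized tryptic digest: cleave after every K or R."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MASS[aa] for aa in peptide) + WATER


def fingerprint_score(observed, protein, tol=0.2):
    """Count observed masses that match a theoretical peptide within tol Da."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(protein)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def best_match(observed, database, tol=0.2):
    """Return the database entry with the highest match count."""
    return max(database, key=lambda name: fingerprint_score(observed, database[name], tol))


# Invented two-entry database and a simulated measurement with a small mass error.
database = {"protA": "MASKTLSRGSYK", "protB": "GAVLKDEFRWHK"}
observed = [peptide_mass(p) + 0.05 for p in tryptic_peptides(database["protA"])]
print(best_match(observed, database))  # protA
```

Because the score rests only on mass coincidences, it inherits the weaknesses discussed next: any factor that shifts peptide masses or adds unrelated peptides erodes the match count.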

 

The single biggest disadvantage of peptide mass fingerprinting is ambiguity in protein identification. This is because of peptide mass redundancy. For example, a peptide of 5 amino acids can have the same mass by simple rearrangement of its constitutive amino acids; e.g., peptide VAGSE has the same mass as AVGSE or AEVGS and so on. For this technique to be successful, the masses of a large number of peptides must be obtained to provide enough specificity in the search, and this is not always possible. Mass redundancy occurs with greater frequency in large genomes. Moreover, peptide mass fingerprinting is effective only in the analysis of proteins from organisms whose genome is small, completely sequenced, and well annotated (131). It has limited use against unannotated or untranslated DNA databases such as the human genome. Because mass fingerprinting is not error tolerant, several factors in addition to mass redundancy contribute to its limited use, including sequencing errors, conservative substitutions, polymorphisms, and six possible translations at the DNA level.
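The VAGSE example can be verified directly. The short sketch below enumerates every ordering of the five residues and confirms that all of them share one mass; the residue masses are standard monoisotopic values, and the variable names are illustrative.

```python
from itertools import permutations

# Monoisotopic residue masses (Da) for the five residues in the example.
MASS = {"V": 99.06841, "A": 71.03711, "G": 57.02146, "S": 87.03203, "E": 129.04259}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MASS[aa] for aa in peptide) + WATER


orderings = {"".join(p) for p in permutations("VAGSE")}
masses = {round(peptide_mass(p), 4) for p in orderings}
print(len(orderings), len(masses))  # 120 1
```

A single measured mass is therefore compatible with 120 different sequences built from this composition alone, before other compositions of similar mass are even considered.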

Another factor affecting the success of peptide mass fingerprinting is mass accuracy (32, 62). Because it is critical to obtain an accurate measurement of the masses of multiple peptides, factors that alter the masses of those peptides can reduce the success of the method. One such example is the posttranslational modification of proteins. If the unknown protein is extensively modified, the peptides produced from that protein will not match the unmodified protein in the database. Recent improvements in the mass accuracy of mass spectrometers have increased the success rate of protein identification by this method (32, 54).

Finally, peptide mass fingerprinting does not work well with protein mixtures. As a protein mixture is converted to a mixture of peptides, it increases the complexity of the peptide mass fingerprint. The process of protein identification can be hindered if even two or three proteins are present in the sample (107). Several search methods have emerged to accommodate peptide mixtures in the mass spectrum. One example is a program called ProFound, which enables protein identification in simple protein mixtures (176). However, the lack of ability to analyze protein mixtures remains a major limitation of this method. A variety of tools for database searching now exist on the World Wide Web (Table 1). The ExPASy server provides a variety of tools for proteomics and programs for protein identification (reviewed in reference 165). Search programs used for peptide mass fingerprinting include PepSea (102), PeptIdent/MultiIdent (165), MS-Fit (32), MOWSE (125), and ProFound (176).

Amino acid sequence database searching. The most specific type of database searching for protein identification uses peptide amino acid sequence. If the amino acid sequence of a peptide can be identified, it can be used to search databases to find the protein from which it was derived. One method which utilizes this information is peptide mass tag searching. In this method, a partial amino acid sequence is obtained by interpretation of the MS/MS spectrum (the sequence tag) and this information is combined with the mass of the peptide and the masses of the peptide on either side of the sequence tag where the sequence is not known (Fig. 10). Also included in the search is the type of protease used to produce the peptides. Peptide mass tag searching is a more specific tool for protein identification than peptide mass fingerprinting (49, 103, 115, 170). In addition, one of the biggest advantages of utilizing MS/MS to obtain peptide amino acid sequence is that, unlike peptide mass fingerprinting, it is compatible with protein mixtures. The ability to identify proteins in mixtures is one of the great advantages of using MS as a protein identification tool. For example, in our laboratory we frequently identify multiple proteins from what appears to be a single band on an SDS-gel. In fact, in the majority of proteomics experiments, proteins are present in mixtures at the time of analysis.
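The logic of a peptide mass tag query can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the flanking masses are taken as sums of residue masses rather than derived from fragment ion m/z values, the three-peptide "database" is a toy list, and `matches_tag` is a hypothetical helper, not the algorithm of any program cited above.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def matches_tag(peptide, n_mass, tag, c_mass, tol=0.5):
    """True if `tag` occurs in `peptide` with the given flanking residue masses."""
    start = peptide.find(tag)
    while start != -1:
        prefix = sum(RESIDUE_MASS[aa] for aa in peptide[:start])
        suffix = sum(RESIDUE_MASS[aa] for aa in peptide[start + len(tag):])
        if abs(prefix - n_mass) <= tol and abs(suffix - c_mass) <= tol:
            return True
        start = peptide.find(tag, start + 1)  # try a later occurrence
    return False

# Toy database of tryptic peptides (hypothetical search space).
database = ["LGEYGFQNALIVR", "AEFVEVTK", "YLYEIAR"]

# Query: 299.15 Da of unknown sequence, then the tag YGF, then 794.48 Da.
hits = [p for p in database if matches_tag(p, 299.15, "YGF", 794.48)]
print(hits)
```

Because the tag must appear at the position fixed by both flanking masses, a three-residue tag is far more restrictive than the peptide mass alone.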

 

The major disadvantage of performing MS/MS is that the process is not easily automated. As a result, considerable time is expended in performing the analysis and interpreting the mass spectrum. Although computer programs can assist in the interpretation of the spectrum, they currently are not able to make accurate assignments without some guidance. In addition, when searching a database with peptide mass tags, there is a lack of flexibility in the search programs. If a single mistake is made in the assignment of a y- or b-ion (which can happen quite frequently), the amino acid sequence will be incorrect and the database search will bring up irrelevant proteins. Often it is necessary to confirm that the peptide sequence obtained from the database matches the sequence obtained in the mass spectrometer. This can be done by performing a theoretical fragmentation of the peptide from the database and comparing the two mass spectra. Additional clues can also be used, such as verifying if the peptide obtained from the database ends in amino acids consistent with the type of protease used.
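A theoretical fragmentation of a candidate peptide can be sketched in a few lines of Python. This is a minimal illustration, assuming singly charged b- and y-ions, unmodified residues, and a hypothetical observed peak list; the function names are illustrative.

```python
# Standard monoisotopic residue masses (Da); only a few residues are listed.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203,
                "V": 99.06841, "E": 129.04259, "K": 128.09496}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(seq):
    """Singly charged b- and y-ion m/z values for an unmodified peptide."""
    residues = [RESIDUE_MASS[aa] for aa in seq]
    b_ions = [sum(residues[:i]) + PROTON for i in range(1, len(seq))]
    y_ions = [sum(residues[-i:]) + WATER + PROTON for i in range(1, len(seq))]
    return b_ions, y_ions

def count_matches(observed, theoretical, tol=0.5):
    """Number of observed peaks lying within `tol` of any theoretical ion."""
    return sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)

b_ions, y_ions = fragment_ions("VAGSEK")
observed = [100.08, 171.11, 147.11, 276.16, 500.00]  # hypothetical peak list
n_matched = count_matches(observed, b_ions + y_ions)
print(n_matched, "of", len(observed), "observed peaks match")
```

A high proportion of matched peaks supports the database assignment, whereas a candidate sequence that explains few peaks should be treated with suspicion.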

De novo peptide sequence information. Another approach to protein identification is to obtain de novo sequence data from peptides by MS/MS and then use all the peptide sequences to search appropriate databases. Multiple peptide sequences can be used for protein identification by searching databases with the FASTS program (Mackey et al., submitted) (Fig. 5). The single biggest advantage of this method is the capability of searching peptide sequence information across both DNA and protein databases. This is because the search engine utilized exhibits a certain amount of flexibility in the assignment of protein scores. This search method is useful for organisms that do not have well-annotated databases such as Xenopus laevis or human. However, because this method requires several peptide amino acid sequences of 3 or 4 amino acids, it is not the first choice for peptide identification. Rather, the much faster methods of peptide mass fingerprinting or peptide mass tag searching can be used first. If these search methods fail, de novo sequence information can be obtained and used to identify the protein.

Uninterpreted MS/MS data searching. A large number of programs are now available for the identification of proteins by using uninterpreted MS/MS data. Examples include programs such as Mascot (129), SONAR (53), and SEQUEST (49) (Table 1). However, searches against unannotated or untranslated DNA databases with uninterpreted MS/MS data are likely to suffer from the same pitfalls associated with mass fingerprinting. In particular, polymorphisms, sequencing errors, and conservative substitutions will probably contribute to failure to accurately identify a protein. The development of uninterpreted MS/MS search algorithms that are error tolerant may overcome some of these shortcomings, provided that they assign some form of statistical scoring to the identified proteins.

 


 

 

 

The single most common application of proteomics is protein identification. Most investigators use proteomics approaches to isolate and display proteins based on their own specific criteria and then identify the proteins. Protein identification provides immediate information that will direct subsequent experimentation. For example, the identity of a protein can reveal an expected result, validate a proteomics approach, provide completely unexpected information, or reveal that your biochemical method is not working at all. We feel that the most critical stage of any proteomics approach is the strategic design for the isolation of protein targets. In recent years, as the technology of MS has improved, there has been a de-emphasis on the "front-end" of proteomics experiments compared to data analysis. This can result in the isolation of hundreds of irrelevant proteins for identification, consuming both time and effort. Our general strategy is to devise techniques that enrich for low-abundance proteins and then analyze only the proteins that appear on differential display or are isolated by affinity chromatography. To accomplish this, we use affinity columns and other strategies to select for protein targets. In each case, protein samples are subjected to a series of precolumns and high-stringency washes to remove nonspecific proteins. This reduces the number of irrelevant proteins for analysis.

 

Many laboratories are now engaged in an effort to characterize protein complexes by MS. Examples include Link et al. utilizing multidimensional LC and MS/MS to identify proteins (95) or Mann and colleagues identifying proteins present after immunoprecipitation of protein complexes (124). Recently, Macara, Haystead, and coworkers used MS to identify interacting proteins with the Cdc42 effector, Borg3 (80). In this case, the "bait" protein, Borg3, was produced as a glutathione S-transferase (GST) fusion in E. coli and then mixed with NIH 3T3 cell lysate. Four interacting proteins were identified by mixed-peptide sequencing: heat shock protein Hsp70 and three septins including Septin6, Cdc10, and Nedd5 (Fig. 11). None of these proteins were present in the GST-only control sample. Although the interaction with Hsp70 was not pursued, it was shown from coimmunoprecipitation studies that endogenous Borg3 interacts with endogenous Cdc10 and Nedd5 (80). Additional proof from expression and structure-function studies confirmed a role for the Borg proteins as regulators of septin organization. It should be noted that although several proteins were quickly identified as Borg3 interactors by the pull-down experiment, it took several more months of work to confirm this interaction.

 

 

The largest application of proteomics continues to be protein expression profiling. Through the use of two-dimensional gels or novel techniques such as ICAT, the expression levels of proteins or changes in their level of modification between two different samples can be compared and the proteins can be identified. This approach can facilitate the dissection of signaling mechanisms or identify disease-specific proteins.

Expression profiling by two-dimensional electrophoresis. Currently, the majority of protein expression profiling studies are performed by 2-DE. Several diseases have been studied, including heart disease (44) and cancer (30). Cancer cells are good candidates for proteomics studies because they can be compared to their nontransformed counterparts. Analysis of differentially expressed proteins in normal versus cancer cells can (i) identify novel tumor cell biomarkers that can be used for diagnosis, (ii) provide clues to mechanisms of cancer development, and (iii) identify novel targets for therapeutic intervention. Protein expression profiling has been used in the study of breast (121), esophageal (121), bladder (30) and prostate (114) cancer. From these studies, tumor-specific proteins were identified and 2-D protein expression databases were generated. Many of these 2-D protein databases are now available on the World Wide Web (15).

Isotope-coded affinity tags. Recently, a novel method for protein expression profiling was introduced that does not depend on the separation of proteins by 2-DE. This method is known as isotope-coded affinity tags (ICAT) and relies on the labeling of protein samples from two different sources with two chemically identical reagents that differ only in mass as a result of isotope composition (66). Differential labeling of samples by mass allows the relative amount of protein between two samples to be quantitated in the mass spectrometer. An example of the methodology of ICAT is shown in Fig. 12. Cell extract from two different samples is reacted with one of two forms of the ICAT reagent, an isotopically light form in which the linker contains eight hydrogens or a heavy form in which the linker contains eight deuterium atoms. The ICAT reagent reacts with cysteine residues in proteins via a thiol-reactive group and contains a biotin moiety to facilitate purification (Fig. 12). Peptides are recovered on the basis of the biotin tag by avidin affinity chromatography and are then analyzed by MS. The difference in peak heights between heavy and light peptide ions directly correlates with the difference in protein abundance in the cells. Thus, if a protein is present at a threefold higher level in one sample, this will be reflected in a threefold difference in peak heights. Following quantitation of the peptides, they can be fragmented by MS/MS and the amino acid sequence can be obtained. Thus, using this approach, proteins can be identified and their expression levels can be compared in the same analysis.
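The arithmetic behind ICAT quantitation can be sketched as follows. This minimal Python example assumes the eight-hydrogen versus eight-deuterium linker described above and uses standard isotope masses; the peak heights are hypothetical and the function names are illustrative.

```python
# Mass difference between one deuterium and one hydrogen atom (Da).
D_MINUS_H = 2.014102 - 1.007825

def icat_pair_spacing(n_cysteines, charge):
    """Expected m/z spacing between heavy and light forms of one peptide."""
    return 8 * D_MINUS_H * n_cysteines / charge

def abundance_ratio(light_height, heavy_height):
    """Relative abundance (heavy sample over light sample) from peak heights."""
    return heavy_height / light_height

# A doubly charged peptide containing one labeled cysteine.
spacing = icat_pair_spacing(n_cysteines=1, charge=2)

# Hypothetical peak heights for the light and heavy members of the pair.
ratio = abundance_ratio(light_height=1000.0, heavy_height=3000.0)

print(round(spacing, 3), ratio)
```

The spacing identifies which two peaks form a light and heavy pair, and the ratio of their heights gives the relative abundance of the parent protein in the two samples.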

 

The single biggest advantage of this method is the elimination of the 2-D gel for protein quantitation. As a result, an increased amount of sample can be used to enrich for low-abundance proteins. Alternatively, the cell lysate can be fractionated prior to reaction with the ICAT reagent. This can allow the enrichment of low-abundance proteins before the analysis begins. The main disadvantage is that currently this method works only for proteins containing cysteine, even though this includes the majority of proteins (68). In addition, peptides must contain appropriately spaced protease cleavage sites flanking the cysteine residues. Finally, the ICAT label is large (~500 Da) and remains with each peptide throughout the analysis. This can make database searching more difficult, especially for small peptides with limited sequence (4, 65). Sensitivity may also be of concern since tagged peptides derived from low-copy proteins are likely to be poorly recovered during the affinity step as a result of nonspecific interactions with avidin-Sepharose. Studies have been performed to optimize the labeling of proteins with the ICAT reagent (151).

Protein arrays. Protein arrays are undergoing rapid development for the detection of protein-protein interactions and protein expression profiling (17, 98, 180, 181). Recently, protein microarrays were created using ordinary laboratory equipment (98). Proteins were immobilized by being covalently attached to glass microscope slides, and the protein microarrays were shown to be capable of interacting with other proteins, small molecules, and enzyme substrates (98). In another report, 5,800 yeast proteins were expressed and printed onto microscope slides. These protein microarrays were used to identify novel calmodulin- and phospholipid-interacting proteins (180). These reports indicate that protein arrays hold great promise for the global analysis of protein-protein and protein-ligand interactions. Undoubtedly, these arrays will improve as the technology for their creation is developed and refined.


 

 

Posttranslational modification of proteins is a fundamental regulatory mechanism, and characterization of protein modifications is paramount for understanding protein function. MS is one of the most powerful tools for the analysis of protein modifications because virtually any type of protein modification can be identified. Although we focus here on protein phosphorylation, the analysis of other types of protein modification by MS has been described (25). Protein phosphorylation is one of the most common of all protein modifications and has been found in nearly all cellular processes (74, 88, 153). MS can be used to identify novel phosphoproteins, measure changes in the phosphorylation state of proteins in response to an effector, and determine phosphorylation sites in proteins. Identification of phosphorylation sites can provide information about the mechanism of enzyme regulation and the protein kinases and phosphatases involved. A proteomics approach to protein phosphorylation has the advantage that instead of studying changes in the phosphorylation of a single protein in response to some perturbation, one can study all the phosphoproteins in a cell (the phosphoproteome) at the same time. A common approach to studying protein phosphorylation events is the use of in vivo labeling of phosphoproteins with inorganic 32P. The phosphoproteomes of cells that differ in some way (e.g., normal versus diseased) can be analyzed by growing cells in inorganic 32P and creating cell lysates. Changes in the phosphorylation state of proteins can then be examined by 2-DE and autoradiography. Proteins of interest are excised from the gel and microsequenced by MS. A major limitation of this approach is that while many phosphorylated proteins can be visualized by autoradiography, they cannot be identified because of their low abundance. One solution to this problem is enrichment of the phosphoproteome.

Phosphoprotein enrichment. Enrichment of the phosphoproteome of a cell can allow the identification of low-copy phosphoproteins that would otherwise go undetected. In one approach, phosphoproteins were enriched by conversion of phosphoserine residues to biotinylated residues (118). This method is an extension of techniques originally developed by Heilmeyer and colleagues (108) and more recently by our laboratory (51) for the identification of phosphorylation sites using Edman sequencing. Following derivatization, proteins that were formerly phosphorylated can be isolated by avidin affinity chromatography (118). Proteins immobilized on avidin beads can then be eluted with biotin, theoretically resulting in the isolation of the entire phosphoserine proteome (Fig. 13). By increasing the amount of cell lysate used for avidin affinity chromatography, low-abundance phosphoproteins can be enriched. However, this technique does not work for phosphotyrosine and the reactivity of phosphothreonine by this method is very poor (118). Tyrosine-phosphorylated proteins can be isolated by the use of antiphosphotyrosine antibodies (124). As an alternative, another method for phosphopeptide enrichment was devised to allow the recovery of proteins phosphorylated on serine, threonine, and tyrosine (179). In this method, a protein or mixture of proteins is digested to peptides with a protease and then subjected to a multistep procedure for the conversion of phosphoamino acids into free sulfhydryl groups. To capture the derivatized peptides, the free sulfhydryl groups in the peptides are then reacted with iodoacetyl groups immobilized on glass beads. Using this method, several phosphopeptides were recovered from β-casein and from a yeast cell extract, although it was unclear whether all the proteins isolated from the yeast extract were phosphoproteins (179).

 

Enrichment of the phosphoproteome can also be combined with protein profiling by 1- or 2-DE. In this way, changes in protein amount observed on electrophoresis will reflect the level of protein phosphorylation (Fig. 13). Recently, the principle of protein quantitation by ICAT has been combined with phosphoprotein enrichment (60). This was accomplished by the introduction of isotopic label into ethanedithiol, the reagent used to convert the alkene created by β-elimination of phosphoserine into a free sulfhydryl group. In this way, the differences in the amount of phosphoproteins in extracts can be analyzed quantitatively in the mass spectrometer (60). It should be noted that because of the chemistry used in both of these methods, these techniques are relatively insensitive and require tens of picomoles of phosphoprotein. As a result, we have found that these methods as currently designed are impractical for the isolation and enrichment of low-abundance phosphoproteins.

Phosphorylation site determination by Edman degradation. Edman sequencing is still a widely used method for determining phosphorylation sites in proteins labeled with 32P, either in vitro or in vivo (5, 22, 164). This is because sites can be determined at the sub-femtomolar level if enough radioactivity can be incorporated into the phosphoprotein of interest. In our hands, this can be as little as 1,000 cpm (not ideal). Briefly, 32P-labeled protein is digested with a protease and the resulting phosphopeptides are separated and purified by reverse-phase HPLC or thin-layer chromatography (TLC) (Fig. 14). The isolated peptides are then cross-linked via their C termini to an inert membrane (e.g., Immobilon P; PerSeptive Biosystems). The radioactive membrane is subjected to several rounds of Edman cycles, and radioactivity is collected after the cleavage step. The released 32P is counted in a scintillation counter. This method positionally places the phosphoamino acid within the sequenced phosphopeptide. Of course, this is meaningful only if the sequence of the phosphopeptide is already known. In addition, the analysis ceases to be quantitative beyond 30 Edman cycles (even with efficient, modern Edman machines) due to well-understood issues with repetitive yield associated with Edman chemistry.
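The effect of repetitive yield on long Edman runs follows from simple compounding. The Python sketch below assumes a constant per-cycle yield; the value of 95% is purely illustrative and is not a figure reported here.

```python
def remaining_fraction(repetitive_yield, cycle):
    """Fraction of the starting peptide still in phase after `cycle` cycles,
    assuming the same yield at every cycle."""
    return repetitive_yield ** cycle

ASSUMED_YIELD = 0.95  # hypothetical per-cycle repetitive yield

for cycle in (1, 10, 20, 30):
    fraction = remaining_fraction(ASSUMED_YIELD, cycle)
    print(f"cycle {cycle:2d}: {fraction:.3f} of the starting signal")
```

Under this assumption, only about one-fifth of the starting material is still being sequenced in phase by cycle 30, which is why radioactivity released at late cycles is difficult to interpret quantitatively.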

 

Recently, our laboratory has extended the usefulness of phosphorylation site characterization by Edman chemistry through the development of the cleaved radioactive peptide (CRP) program (J. A. MacDonald, A. J. Mackey, W. R. Pearson, and T. A. J. Haystead, submitted for publication). In CRP analysis, one requires only that the sequence of the protein be known. Purification and sequencing of individual peptides is not required. Radiolabeled proteins (isolated following immunoprecipitation from 32P-labeled cells, for example) are cleaved at predetermined residues by the action of a protease. The phosphopeptides are then separated by HPLC or TLC (if only one site is present, no peptide separation is required), cross-linked to the inert membrane, and carried through 25 to 30 Edman cycles. The sequence of the target protein is entered into the CRP program. This program predicts how many Edman cycles are required to cover 100% of all the serines, threonines, and tyrosines from the site of cleavage. Generally, one round of CRP analysis narrows the number of possible sites to 5 to 10 for most proteins. Phosphoamino acid analysis can be used to reduce the number of possibilities still further. The CRP analysis is then repeated following cleavage with a second protease (usually one cutting at R, but M and F are alternatives). The second round of CRP usually unambiguously localizes the phosphoamino acid to one possible site. The technique does not work if sites are more than 30 amino acids away from all possible cleavage sites. The finding that CRP analysis is not applicable may in itself confine a phosphorylation site to a segment of the protein that is likely to produce very large proteolytic fragments. The Cleavage of Radioactive Proteins (CRP) program is accessible at http://fasta.bioch.virginia.edu/crp/ and was written in collaboration with Aaron Mackey and Bill Pearson of the University of Virginia (MacDonald et al., submitted).
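The reasoning behind two rounds of CRP analysis can be illustrated with a short Python sketch. This is not the CRP program itself: the protein sequence is a toy example, cleavage is assumed to occur after every listed residue with no exceptions, and `cleave` and `candidate_sites` are hypothetical helpers.

```python
def cleave(protein, residues):
    """Split after each residue in `residues`; return (start_index, peptide) pairs."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in residues:
            peptides.append((start, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, protein[start:]))
    return peptides

def candidate_sites(protein, cycle, residues, max_cycles=30):
    """S/T/Y residues that would be released at Edman cycle `cycle`.
    Returns (residue, 1-based position in the protein) pairs."""
    if not 1 <= cycle <= max_cycles:
        return []
    hits = []
    for start, peptide in cleave(protein, residues):
        if cycle <= len(peptide) and peptide[cycle - 1] in "STY":
            hits.append((peptide[cycle - 1], start + cycle))
    return hits

protein = "MASKTGSRAYLSKEETR"  # toy sequence, not a real protein

# Round 1: cleavage after K and R, radioactivity released at cycle 3.
round1 = candidate_sites(protein, cycle=3, residues="KR")
# Round 2: cleavage after M, radioactivity released at cycle 6.
round2 = candidate_sites(protein, cycle=6, residues="M")

print(round1, round2, set(round1) & set(round2))
```

In this toy case the first round leaves three candidate residues and the second round is consistent with only one of them, mirroring the way a second cleavage specificity narrows the possibilities.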

Phosphorylation site determination by mass spectrometry. Because of its sensitivity, MS can allow the direct sequencing of phosphopeptides, resulting in unambiguous phosphorylation site identification. Below, a brief overview of some common methods for phosphorylation site determination by MS is given. A more complete discussion of this topic is provided by Mitchelhill and Kemp (110). Identification of phosphorylation sites in proteins provides several unique challenges for the mass spectrometrist. For example, unlike in protein identification, where analysis of any peptide within the protein can be informative, phosphorylation site analysis requires that the phosphorylated peptide be analyzed. This means that considerably more protein is required for analysis. In addition, phosphorylation can alter the cleavage pattern of a protein and the resulting phosphopeptides may require different purification methods. To isolate and purify the phosphopeptides of interest, it may be necessary to alter the way in which the phosphoprotein is digested and to alter the pH or the chromatographic material used for peptide purification (27, 110, 116).

(i) Phosphopeptide sequencing by MS/MS. In our laboratory, we have found that a combination of HPLC, Edman degradation, and phosphopeptide sequencing by MS/MS provides the best results for phosphorylation site determination (Fig. 14). Following excision and digestion of a 32P-labeled protein, the peptides are resolved by HPLC. By monitoring HPLC fractions for radioactivity, the phosphopeptides can be selected for analysis. This reduces the complexity of the peptide mixture before MS is performed and facilitates phosphopeptide identification (Fig. 14).

Phosphopeptides can be identified from a mixture of peptides by a method known as precursor ion scanning (116). In this method, the second mass analyzer in the mass spectrometer is set at the mass of the reporter ion for the phospho group (PO3−) of m/z = 79. Peptides are sprayed under neutral or basic conditions, and phosphopeptides are identified in the precursor ion scan only if their fragmentation yields an ion of m/z = 79. Once a phosphopeptide is identified, the peptide mixture is sprayed under acidic conditions and the phosphopeptide is sequenced by conventional tandem MS/MS. On fragmentation of the phosphopeptide, phosphoserine can be identified by the formation of dehydroalanine (69 Da), the β-elimination product of phosphoserine. Similarly, phosphothreonine can be identified by the formation of its β-elimination product, dehydroamino-2-butyric acid at 83 Da (116).

(ii) Analysis of phosphopeptides by MALDI-TOF. MALDI-TOF mass spectrometry can also be used to identify phosphopeptides (81, 130, 177, 178). When phosphorylated peptides are subjected to ionization by MALDI, phosphate groups are frequently liberated from the peptides. This is the case for phosphoserine- and phosphothreonine-containing peptides, which can liberate HPO3 or H3PO4, resulting in a neutral loss of 80 and 98 Da, respectively. Careful examination of the TOF spectrum for differences in peptide masses of 80 Da that are not found in the unphosphorylated peptide control can identify phosphopeptides. Phosphopeptides can also be identified by treating one of two identical samples with protein phosphatase to liberate phosphate groups (Fig. 14). Once a phosphopeptide is identified, it can be sequenced by MS/MS for identification of the phosphorylation site (178).
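Screening a peak list for these characteristic mass differences is straightforward to automate. The following Python sketch assumes singly charged ions, as is typical for MALDI, and uses monoisotopic masses for HPO3 and H3PO4; the peak list and tolerance are hypothetical.

```python
HPO3 = 79.96633   # monoisotopic mass of HPO3 (Da)
H3PO4 = 97.97690  # monoisotopic mass of H3PO4 (Da)

def neutral_loss_pairs(peaks, tol=0.05):
    """Find peak pairs whose spacing matches loss of HPO3 or H3PO4.
    Returns (higher m/z, lower m/z, loss name) tuples."""
    pairs = []
    for high in peaks:
        for low in peaks:
            for name, loss in (("HPO3", HPO3), ("H3PO4", H3PO4)):
                if abs((high - low) - loss) <= tol:
                    pairs.append((high, low, name))
    return pairs

# Hypothetical singly charged peak list (m/z).
peaks = [1000.50, 1080.47, 1200.60, 1102.62]
pairs = neutral_loss_pairs(peaks)
print(pairs)
```

Pairs flagged in this way are only candidates; comparison with an unphosphorylated control or a phosphatase-treated sample, as described above, is still needed to confirm that the heavier peak is a phosphopeptide.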

 

One of the most exciting applications of proteomics involves combining this technology with the power of yeast genetics to delineate signaling events in vivo. Our laboratory has published two papers using this strategy to identify in vivo targets for protein phosphatases (9, 40). In one study (9), we identified physiological substrates for the Glc7p-Reg1p complex by examining the effects of deletion of the REG1 gene on the yeast phosphoproteome. In S. cerevisiae, PP-1 (Glc7p) and its binding protein, Reg1p, are essential for the regulation of glucose repression pathways. The target for this phosphatase complex was not known. Analysis by 2-D phosphoprotein mapping identified two distinct proteins that were greatly increased in phosphate content in reg1 mutants. Mixed-peptide sequencing identified these proteins as hexokinase II (Hxk2p) and the E1 subunit of pyruvate dehydrogenase. We then went on to validate these findings in a comprehensive biochemical study. Consistent with increased phosphorylation of Hxk2p in response to REG1 deletion, fractionation of yeast extracts by anion-exchange chromatography identified a Hxk2p phosphatase activity in wild-type strains that was selectively lost in the reg1 mutant. Having carried out these studies, we attempted to rescue the reg1 phosphoprotein phenotype by overexpressing both wild-type and mutant Reg1p in the deletion strains. Here, both the phosphorylation state of Hxk2p and Hxk2p phosphatase activity were restored to wild-type levels in the reg1 mutant by expression of a LexA-Reg1p fusion protein. In contrast, expression of a LexA-Reg1p protein containing mutations at phenylalanine in a putative PP-1C (the catalytic subunit) binding site motif (K/R)(X)(I/V)XF was unable to rescue Hxk2p dephosphorylation in intact yeast or restore Hxk2p phosphatase activity.
These results demonstrate that Reg1p targets PP-1C to dephosphorylate Hxk2p in vivo and that the peptide motif (K/R)(X)(I/V)XF is necessary for its PP-1 targeting function. These studies therefore demonstrate how a proteomics approach can be used to first identify enzyme targets in cells and then direct all further analysis to verify the findings. It should be pointed out that often 6 to 12 months of work ensues following the initial sequencing of the targeted proteins. Nevertheless, clearly a combined proteomics and genetics approach greatly enhances one's ability to directly answer key biological questions. We believe that a similar strategy could be adopted with transgenic or knockout mouse work, particularly in cases where there is no obvious phenotype.


 

 

Proteome mining is a functional proteomics approach used to extract protein information from the analysis of specific subproteomes. The strategy of proteome mining is shown in Fig. 15. The principles of proteome mining are based on the assumption that all drug-like molecules selectively compete with a natural cellular ligand for a binding site on a protein target. In a proteome mine, natural ligands are immobilized on beads at high density and in an orientation that sterically favors interaction with their protein targets. The immobilized ligand is then exposed to whole-animal or tissue extract, and bound proteins are evaluated for specificity by protein sequencing. In the prototypic example from our laboratory, ATP is immobilized in the "protein kinase orientation" (via its gamma phosphate). Microsequencing of the proteins that were eluted with free ATP demonstrated that the nucleotide selectively recovered purine binding proteins including protein kinases, dehydrogenases, various purine-dependent metabolic enzymes, DNA ligases, heat shock proteins, and a variety of miscellaneous ATP-utilizing enzymes (P. R. Graves, J. Kwiek, P. Fadden, R. Ray, K. Hardeman, and T. A. J. Haystead, submitted for publication). This immobilized proteome represents ~4% of the expressed eukaryotic genome.

 

We have utilized this captured proteome (the purine binding cassette proteome) to test the selectivity of purine analogs that inhibit protein kinases and stress-induced ATPases in vitro. Using a proteome-mining ATP affinity array apparatus constructed in our laboratory, sufficient biomass was applied to ensure the recovery, per column, of 1 fmol of any protein expressed at 100 copies/cell (10^7 cells). After washing, each column in the array is eluted in parallel with molecules from a purine-based iterative library and fractions are collected. Eluates are screened for protein, and positive fractions generally contain a single protein, a small number of structurally related proteins, or a complex mixture. Only the first two categories are sequenced, since the third resulted from elution with a nonselective inhibitor. Once one has identified an eluted protein, one has all the necessary information on how to proceed. The first decision is biological relevance. Does the eluted protein(s) in any given fraction have relevance to any human disease? If the protein has no obvious use as a drug target, it is ignored. If the protein is deemed relevant, one immediately has a lead molecule and a defined target. In cases where a single protein is eluted, the lead is likely to be selective because it had an equal opportunity to interact with the rest of the captured proteome (~4% of the genome). Selectivity can be tested by increasing the concentration of the lead compound during elution from nanomolar to micromolar. Information concerning potential toxicity can be gained by sequencing other proteins that are simultaneously eluted or eluted at higher concentrations. If some of these are undesirable targets, iterative substitutions can be made around the lead scaffold to improve selectivity.
Proof of principle of this technology was obtained by using an iterative library derived from the heat shock protein 90 inhibitor geldanamycin, and a new physiological target, ADE2, was identified (P. Fadden, V. J. Davisson, L. Neckers, and T. A. J. Haystead, unpublished data). Screening Combichem libraries through a proteome-mining approach exploits the serendipitous nature of drug discovery to its maximum, simply because it increases the hit rate over a conventional screen by a factor that reflects the size of the proteome that is bound. In the case of purine binding proteins, this may be several hundredfold. Protein microsequencing, the data contained within the various genome projects, and the ability to instantly search the literature for relevance enable one to interpret the outcomes in a rational way.
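The biomass requirement quoted above (1 fmol of a protein expressed at 100 copies/cell from 10^7 cells) can be checked with a short calculation. The Python sketch below assumes complete recovery unless a lower `recovery` value is supplied; that parameter and the function name are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def recoverable_fmol(copies_per_cell, n_cells, recovery=1.0):
    """Femtomoles of a protein available from `n_cells` cells."""
    molecules = copies_per_cell * n_cells * recovery
    return molecules / AVOGADRO * 1e15  # mol -> fmol

amount = recoverable_fmol(copies_per_cell=100, n_cells=1e7)
print(f"{amount:.2f} fmol")
```

With these inputs the theoretical maximum is somewhat above 1 fmol, so the stated target leaves only modest room for losses during binding, washing, and elution.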

We are currently using proteome mining to discover new antimalarial drugs that target purine binding proteins in the blood stage of infection. Because of the essential roles of purine-utilizing enzymes in cellular function, it is our hypothesis that these proteins are attractive candidates for a new generation of antimalarial drugs. In our malaria project, the P. falciparum (blood stage) and human red blood cell purine binding proteome are captured on ATP affinity arrays and simultaneously screened against purine-based combinatorial libraries. Combining both proteomes enables the selectivity and potential toxicity of a lead molecule to be measured early in the discovery process. Microsequencing enables human proteins to be readily discriminated from malarial ones. An additional benefit of mining the entire malarial purine binding cassette proteome is that multiple leads and their targets will be identified. Combined therapies that target multiple genes simultaneously are likely to exert such tremendous selective pressure on the targeted pathogen that it cannot develop resistance. We are currently expanding our immobilized natural-ligand library in order to apply proteome mining to other areas of biology.

 

The study of proteins, in contrast to that of DNA, presents a number of unique challenges. For example, there is no equivalent of PCR for proteins, so the analysis of low-abundance proteins remains a major challenge. In addition, in protein interaction studies, native conformations of proteins must be maintained to obtain meaningful results. Can proteins be studied on a large scale with speed, sensitivity, and reliability? In the last several years, recognition of the limitations of proteomics has begun to point the field in new directions.

Although the technology for the analysis of proteins is rapidly progressing, it is still not feasible to study proteins on a scale equivalent to that of the nucleic acids. Most of proteomics relies on methods, such as protein purification or PAGE, that are not high throughput. Even performing MS can require considerable time in either data acquisition or analysis. Although hundreds of proteins can be analyzed quickly and in an automated fashion by a MALDI-TOF mass spectrometer, the quality of data is sacrificed and many proteins cannot be identified. Much higher-quality data can be obtained for protein identification by MS/MS, but this method requires considerable time in data interpretation. In our opinion, new computer algorithms are needed to allow more accurate interpretation of mass spectra without operator intervention. In addition, to access unannotated DNA databases across species, these algorithms should be error tolerant to allow for sequencing errors, polymorphisms, and conservative substitutions. New technologies will have to emerge before protein analysis on a large scale (such as mapping the human proteome) becomes a reality.
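
To make the idea of error-tolerant matching concrete, the following is a minimal sketch in Python, not a description of any existing search engine. It assumes the peptide sequence has already been deduced from a mass spectrum and simply slides it along a database sequence, tolerating a limited number of mismatches and ignoring differences within groups of similar residues. The function names, the residue groups, and the example sequences are all hypothetical and chosen only for illustration; real tools work on the spectra themselves and use far more sophisticated scoring.

```python
from typing import List, Tuple

# Illustrative (assumed) groups of residues treated as conservative substitutions.
# These groupings are a simplification chosen for this sketch only.
CONSERVATIVE_GROUPS = ["ILV", "DE", "KR", "ST", "FYW", "NQ"]
GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def is_conservative(a: str, b: str) -> bool:
    """Return True if residues a and b differ but belong to the same assumed group."""
    return a != b and a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)


def tolerant_matches(peptide: str, protein: str,
                     max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Slide the peptide along the protein sequence.

    Returns (start, mismatches) for every window in which the number of
    non-conservative differences does not exceed max_mismatches.
    Conservative substitutions are not counted as mismatches.
    """
    hits = []
    n = len(peptide)
    for start in range(len(protein) - n + 1):
        mismatches = 0
        for p, q in zip(peptide, protein[start:start + n]):
            if p != q and not is_conservative(p, q):
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, abandon this window
        else:
            # Inner loop finished without exceeding the mismatch limit.
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical database entry and query peptides.
    database_entry = "MKTAYIAKQRQISFVKSHFSRQ"
    print(tolerant_matches("AYLAKQ", database_entry, max_mismatches=0))
    print(tolerant_matches("AYGAKQ", database_entry, max_mismatches=1))
```

In this sketch a leucine-for-isoleucine difference is tolerated even with a mismatch limit of zero, whereas a glycine at the same position is reported only when one mismatch is allowed. Raising the limit increases sensitivity to polymorphisms and sequencing errors at the cost of more spurious hits, which is the trade-off any error-tolerant algorithm has to manage.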

Another major challenge for proteomics is the study of low-abundance proteins. In some eukaryotic cells, the amounts of the most abundant proteins can be 10^6-fold greater than those of the low-abundance proteins. Many important classes of proteins (that may be important drug targets) such as transcription factors, protein kinases, and regulatory proteins are low-copy proteins. These low-copy proteins will not be observed in the analysis of crude cell lysates without some purification. Therefore, new methods must be devised for subproteome isolation. Despite these limitations, proteomics, when combined with other complementary technologies such as molecular biology, has enormous potential to provide new insight into biology. The ability to study complex biological systems in their entirety will ultimately provide answers that cannot be obtained from the study of individual proteins or groups of proteins.


* Corresponding author. Mailing address: Department of Pharmacology and Cancer Biology, Duke University, Research Dr., C118 LSRC, Durham, NC 27710. Phone: (919) 613-8606. Fax: (919) 668-0977. E-mail: hayst001@mc.duke.edu.

 
