Delineating the Minimal Functional Binding Domain – How to Probe Protein Structures with DoMY-Seq


Protein Analysis

Protein analysis tools and resources provide useful information for research in molecular, structural, and computational biology. These tools help us understand the distinctive features of proteins, such as their 3D shapes, sequences, functions, and domains (including domain mapping), and many of them also provide large, comprehensive databases.

PDB Protein Data Bank
The Protein Data Bank (PDB) archives information about the 3D shapes of proteins, nucleic acids, and complex assemblies.

UniProt is a freely accessible resource for protein sequence and functional information. It offers tools to BLAST, align, map, and search peptides; core data such as the protein knowledgebase, sequence clusters, and the sequence archive; and supporting data such as literature citations, taxonomy, and keywords.

The ExPASy Bioinformatics Resource Portal is an expandable, integrative portal that gives access to many scientific resources, databases, and software tools across different areas of the life sciences. ExPASy was launched by the SIB Swiss Institute of Bioinformatics. Its visual interface guides users by letting them select a category such as DNA, RNA, protein, cell, organism, or population.

SMART Database
SMART (Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than five hundred domain families can be detected, and they are annotated with respect to their phylogenetic distributions, functional class, tertiary structures, and functionally important residues. Each domain found in a non-redundant protein database, together with search parameters and taxonomic information, is stored in a relational database system. User interfaces allow searches for proteins that contain specific combinations of domains.

InterPro Database
InterPro classifies proteins into families and predicts domains and important sites, enabling their functional analysis. To classify proteins, InterPro uses predictive models, known as signatures, provided by the several member databases that make up the InterPro consortium.

NGS Data Analysis

Next Interactions provides a tailored analysis platform, with custom scripts written in Perl and R, to characterize interaction sites from NGS-Y2H readout data. Other tools can also be applied for this purpose, and for analyses that go beyond what we provide.

Galaxy Main
The main Galaxy site is an installation of the Galaxy software, combined with many common tools, for analyzing data free of charge. Researchers can either use the public servers or download a copy of the server software for their own use. Because the public site offers substantial CPU and disk resources, large datasets can be analyzed.

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses that you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
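As a rough illustration of what a per-base quality check involves, the Python sketch below decodes FASTQ quality strings and averages the Phred score at each read position. It assumes the Phred+33 encoding used by current Illumina pipelines, and the Q20 cut-off and function names are hypothetical choices for this example; it is not FastQC's own implementation, which runs many more modules.

```python
from statistics import mean


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode one FASTQ quality string (Phred+33 assumed by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def per_base_mean_quality(qualities: list[str]) -> list[float]:
    """Mean Phred score at each read position; shorter reads simply stop contributing."""
    decoded = [phred_scores(q) for q in qualities]
    max_len = max(len(d) for d in decoded)
    return [mean(d[pos] for d in decoded if len(d) > pos) for pos in range(max_len)]


def low_quality_positions(means: list[float], threshold: float = 20) -> list[int]:
    """1-based read positions whose mean quality falls below the (hypothetical) Q20 cut-off."""
    return [pos + 1 for pos, q in enumerate(means) if q < threshold]


# Three toy quality strings: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
qualities = ["IIII", "II5#", "II##"]
means = per_base_mean_quality(qualities)
flagged = low_quality_positions(means)
```

In this toy input, quality drops toward the 3' end, and only the last position averages below Q20.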

Integrative Genomics Viewer (IGV)
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequencing data, as well as genomic annotations. In our pipeline, IGV is used to visualize NGS reads that map to the protein binding sites uncovered in the screen, and an automated procedure quantifies the enrichment of paired-end NGS reads.
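The idea behind reading a binding site out of paired-end data can be sketched as follows. Each read pair defines the start and end of one interacting prey fragment, and stacking these fragments gives a coverage profile whose shared core approximates the minimal binding region. This simplified Python sketch assumes read pairs have already been converted into fragment coordinates (1-based, inclusive) along the prey sequence. The function names and fragment values are hypothetical, and this is not the pipeline's actual procedure.

```python
from typing import Optional


def coverage_profile(fragments: list[tuple[int, int]], length: int) -> list[int]:
    """Number of fragments covering each position 1..length (fragment ends inclusive)."""
    diff = [0] * (length + 2)
    for start, end in fragments:
        diff[start] += 1    # fragment starts covering at this position
        diff[end + 1] -= 1  # and stops after its last position
    profile, running = [], 0
    for pos in range(1, length + 1):
        running += diff[pos]
        profile.append(running)
    return profile


def minimal_common_region(fragments: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Stretch shared by every fragment, or None if the fragments do not all overlap."""
    start = max(s for s, _ in fragments)
    end = min(e for _, e in fragments)
    return (start, end) if start <= end else None


# Hypothetical prey fragments recovered from paired-end reads.
fragments = [(10, 60), (25, 80), (30, 55)]
profile = coverage_profile(fragments, length=100)
core = minimal_common_region(fragments)
```

Taking the strict intersection assumes every fragment is a true interactor. With real, noisy data, applying a coverage threshold to the profile is more robust.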

Network and Pathway Analysis

PPI networks have been used to further the study of molecular evolution, to gain insight into the robustness of cells to perturbation, and for assignment of new protein functions.

QiSampler performs a systematic statistical evaluation of scoring systems in a dataset. It is an R script that evaluates several scoring schemes for high-throughput experiments against given gold-standard sets, using a repeated sampling strategy. Because its input format is modular, it can be used with various dataset types, such as protein-protein interactions (PPIs), gene-expression microarrays, or deep sequencing datasets.
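QiSampler itself is an R script. Purely to illustrate the underlying idea, here is a Python sketch that scores a ranking scheme against a gold standard. It repeatedly draws equal-sized random samples of gold-standard and non-gold items and averages the area under the ROC curve (AUC). Treating every non-gold item as a negative and using AUC as the metric are assumptions made for this example, not a description of QiSampler's exact statistics. The scheme names are hypothetical.

```python
import random
from statistics import mean


def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sampled_auc(scores: dict[str, float], gold: set[str],
                n_samples: int = 100, sample_size: int = 10, seed: int = 0) -> float:
    """Mean AUC over repeated equal-sized draws of gold-standard and non-gold items."""
    rng = random.Random(seed)
    positives = [s for item, s in scores.items() if item in gold]
    negatives = [s for item, s in scores.items() if item not in gold]
    size = min(sample_size, len(positives), len(negatives))
    if size == 0:
        raise ValueError("need at least one gold and one non-gold item")
    return mean(
        auc(rng.sample(positives, size), rng.sample(negatives, size))
        for _ in range(n_samples)
    )


# Two hypothetical scoring schemes for the same ten items.
gold = {f"p{i}" for i in range(5)}
scheme_a = {**{f"p{i}": 10.0 + i for i in range(5)}, **{f"n{i}": float(i) for i in range(5)}}
scheme_b = {item: -score for item, score in scheme_a.items()}  # ranks everything backwards
```

Scheme A ranks every gold-standard item first and scores 1.0. Scheme B inverts the ranking and scores 0.0. Real schemes fall somewhere in between, and repeated sampling shows how stable that estimate is.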

STRING DB
STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases. The STRING database aims to provide a critical assessment and integration of protein-protein interactions.

Gene Ontology
The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes and constitutes a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research.

KEGG Pathway Database
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances.

BioCyc is a collection of 14,728 Pathway/Genome Databases (PGDBs), plus software tools for exploring them. BioCyc data are curated from tens of thousands of publications and include curated databases for E. coli, B. subtilis, H. sapiens, and S. cerevisiae. The collection also contains computationally predicted metabolic pathways and operons.

IntAct (EBI)
IntAct provides a freely available, open-source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

Visualisation of Network Data

Network visualization displays the information in a protein-protein interaction network: its hubs and spokes, and how protein associations connect to each other. It highlights key elements (hubs) that are relevant in disease, as well as modifiers that affect the expression of a given trait or the disease outcome. Because proteins interact with different partners over time, tools for time-based graph analysis can help spot trends or compare activity. Network visualization, also called graph visualization or link analysis, presents networks of connected entities as nodes and links, where nodes represent data points and links represent the connections between them. Powerful network visualization tools are needed to understand the connections in flat data and to find interesting and valuable relationships.
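Hubs are often identified simply as the proteins with the most interaction partners (the highest degree). Below is a minimal Python sketch, assuming the network is given as an undirected list of protein pairs. The protein names are placeholders, and tools such as Cytoscape offer far richer centrality measures.

```python
def degree(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Number of distinct interaction partners per protein in an undirected edge list."""
    partners: dict[str, set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # self-interactions (e.g. homodimers) do not add a partner here
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(p) for protein, p in partners.items()}


def top_hubs(edges: list[tuple[str, str]], k: int = 3) -> list[str]:
    """The k best-connected proteins, ties broken alphabetically."""
    deg = degree(edges)
    return sorted(deg, key=lambda p: (-deg[p], p))[:k]


# Placeholder network; the repeated ("B", "A") pair is counted only once.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "A"), ("B", "A")]
hubs = top_hubs(edges, k=2)
```

Counting distinct partners rather than raw edges keeps duplicate reports of the same interaction from inflating a protein's hub status.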

Network biology considers biological networks to be similar to social networks. Cytoscape is an open-source bioinformatics software platform for visualizing molecular interaction networks and integrating them with gene expression profiles and other data. It is used in molecular and systems biology, genomics, and proteomics, and many add-on modules have been developed that enable sophisticated analyses.

Consult our Protein Interaction Screening Experts.

Contact Next Interactions for advice from our scientists about the best approach for elucidating the protein interactions you need to understand for your research. We can advise you on methods such as the NGS-yeast two-hybrid system and the yeast two-hybrid system, on how to complement interactome coverage with mass spectrometry, and more.


What are protein-protein interactions (PPIs)?

Protein-protein interactions are the specific physical contacts established between two or more proteins. Protein-protein interactions fundamentally underlie the function of all biological systems.

Why does measuring protein-protein interactions (PPIs) matter?

Measurements of PPIs are central to understanding the molecular basis of disease phenotypes and to enabling biotechnology to produce novel products, ranging from effective therapeutics to next-generation foods, materials, and fuels.

How do researchers measure protein interactions?

It depends on whether they are measuring a binary interaction (only two proteins) or a protein complex (three or more proteins). Binary interactions are readily determined with genetic selection systems such as the yeast two-hybrid system, and with display technologies used in vivo. Protein complexes are most often isolated by affinity purification and then analyzed by mass spectrometry.

Why is mapping of protein interaction sites important?

Determining PPI motifs is important for generating loss-of-function mutations and for assisting structural and drug discovery studies. With our precision mapping of interaction sites, we can define the exact binding domains of two interacting proteins. Scientists can then use this information to generate and test drugs that disrupt or alter binding. The same methodology could also be used to define novel, drug-induced interaction sites on target proteins.

What are interactome networks?

These are complex systems of tens of thousands of protein-protein interactions in the cell; cells can be envisioned as complex webs of macromolecular interactions. There are three types of interactome networks: metabolic, protein-protein interaction, and gene regulatory networks. All are composed of physical or biochemical interactions between macromolecules. Empirical binary interactome frameworks based on the yeast two-hybrid system offer a way to estimate the global size of interactomes. Next Interactions offers high-throughput screening technology for more than 100 Y2H screens, allowing a comprehensive overview of interactome networks (e.g., disease pathways and host-pathogen interactions).

Can the effect of compounds or drugs on protein interactions be measured?

Yes, Next Interactions has methods available that allow compound testing. To learn more, contact us.

Next Interactions Services

What is the Yeast Two-Hybrid (Y2H) method?

Yeast two-hybrid screening (often called the two-hybrid system or Y2H) is a classic molecular biology technique for discovering protein-protein interactions and/or protein-DNA interactions. Y2H tests for physical interactions, or binding, between two proteins (binary interactions). Many of the classic pathway discoveries were made with Y2H. A yeast transcription factor is split into two separate fragments, called the DNA-binding domain (DBD or BD) and the activation domain (AD). The DBD is fused to the known “bait” protein, and the AD is fused to a “prey” protein or to a library of many prey proteins (possible interactors). When the bait and prey interact, the two domains are brought into physical proximity and transcription occurs. A reporter gene is placed under the control of this reconstituted transcription factor, so its transcription signals that a successful interaction between bait and prey has taken place.

What is a library? How is a library prepared?

The prey protein is fused to the activation domain (AD) of the yeast transcription factor. We can either test a single known protein as prey or screen many known or unknown proteins, which together form a library of different preys. A cDNA (complementary DNA) library is a collection of protein-encoding sequences that represent all the proteins expressed in a tissue or an organism. Screening libraries can also be generated from assembled ORF libraries and from genomic DNA (gDNA). These protein-encoding sequences are inserted into the prey plasmid, which is then transformed into yeast cells. Each cell ultimately expresses no more than a single member of the protein library, so each yeast clone tests one prey. At Next Interactions we use a modified protocol for preparing our libraries, which enables efficient next-generation sequencing readouts followed by bioinformatics analysis. To learn more, please contact us.

What is the difference between conventional Y2H and NGIS (NGS-Y2H)?

Conventional Y2H is the original, simpler method; its readout uses Sanger sequencing (also known as first-generation or capillary sequencing). NGS-Y2H is an innovation that relies on next-generation sequencing, which enables several levels of measurement combined with complex bioinformatics analysis. Visit our services page to learn more.

What material do I need to send?

Please send us RNA extracted from the tissue from which you would like us to prepare the library; we do not specialize in extraction from the many different tissue sources. Please also measure the RIN scores and send them to us, so that we know the quality of the RNA we are working with.

What quality of material should I provide?

Ideally, the RNA should have an RNA Integrity Number (RIN) of 8-10. If that is not possible, as with RNA from certain sources (e.g., plant samples), a lower RIN may also work, but we will need a larger amount of RNA.

What is the RIN score?

RIN stands for RNA Integrity Number, a metric of RNA quality on a scale from 1 to 10. It is based on the relative size and ratio of the 18S and 28S ribosomal RNA peaks in a given RNA sample. If the RNA is more degraded, one or both of those peaks will be smaller than expected and the background signal will be higher, resulting in a lower RIN.
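The RIN itself is computed by the instrument vendor's algorithm from the whole electropherogram, so it is not reproduced here. A simpler, older integrity indicator is the ratio of the 28S to 18S peak areas. The Python sketch below computes that ratio by trapezoidal integration. It assumes the trace is a list of (position, signal) points sorted by position and that the user supplies the peak windows. The window values and the toy trace are hypothetical.

```python
def peak_area(trace: list[tuple[float, float]], lo: float, hi: float) -> float:
    """Trapezoidal area under the (position, signal) points that fall inside [lo, hi]."""
    pts = [(x, y) for x, y in trace if lo <= x <= hi]
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def rrna_ratio(trace: list[tuple[float, float]],
               window_18s: tuple[float, float],
               window_28s: tuple[float, float]) -> float:
    """28S:18S peak-area ratio, a simpler integrity indicator than the RIN."""
    area_18s = peak_area(trace, *window_18s)
    area_28s = peak_area(trace, *window_28s)
    if area_18s == 0:
        raise ValueError("no 18S signal in the chosen window")
    return area_28s / area_18s


# Toy trace in arbitrary units: a small 18S peak followed by a larger 28S peak.
trace = [(0, 0), (1, 2), (2, 0), (3, 0), (4, 4), (5, 0)]
ratio = rrna_ratio(trace, window_18s=(0, 2), window_28s=(3, 5))
```

Degradation tends to shrink the 28S peak first, so the ratio falls as RNA quality drops. Unlike the RIN, however, it ignores the background signal between and below the peaks.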


What is the list of libraries Next Interactions has ready?

With our technology, we can rapidly generate cDNA libraries from any source, and we have readily available libraries from different sources: Arabidopsis thaliana, Zea mays, Glycine max, human heart, lung, kidney, yeast genomic libraries and more. Contact us to ask about your project needs.

What is a low complexity and high complexity library?

Library complexity reflects the diversity of clones: the greater the diversity, the more likely the library is to contain sequences that bind a given target with sufficient affinity. A low-complexity library covers the template DNA or RNA poorly with library fragments, so a significant proportion of reads share identical start sites and some sequences even go undetected in a screen. In contrast, a high-complexity library contains many unique fragments and very few identical ones. With our platform, we can readily generate highly complex libraries of 5-10 million unique cDNA or gDNA fragments, which represent primary clones in the original transformation. Contact us to discuss what your project requires.
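Library complexity can be estimated from sequencing data. The Python sketch below computes the duplicate fraction from read start keys. It then estimates the number of distinct molecules X in the library by solving the Lander-Waterman relation C/X = 1 - exp(-N/X), where N is the number of reads (or read pairs) and C is the number of unique ones. Picard's EstimateLibraryComplexity describes its estimate with the same relation. Representing each read by a hashable start key (for example a sequence ID, position, and strand) is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def duplicate_fraction(read_keys: list) -> float:
    """Fraction of reads whose start key was already seen in another read."""
    return 1 - len(set(read_keys)) / len(read_keys)


def estimate_library_size(total_reads: float, unique_reads: float) -> float:
    """Solve C/X = 1 - exp(-N/X) for X (distinct molecules) by bisection."""
    n, c = total_reads, unique_reads
    if not 0 < c < n:
        raise ValueError("need 0 < unique_reads < total_reads")

    def f(x: float) -> float:
        # Positive while x is below the solution, negative above it.
        return c / x - 1 + math.exp(-n / x)

    lo, hi = float(c), 2.0 * c
    while f(hi) > 0:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# If a library holds 1,000 distinct molecules and 1,000 reads are sampled,
# about 1000 * (1 - e^-1), or roughly 632, of them are expected to be unique.
expected_unique = 1000 * (1 - math.exp(-1))
estimated = estimate_library_size(1000, expected_unique)
```

A high duplicate fraction at modest sequencing depth is the typical signature of the low-complexity libraries described above.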

My library has low complexity. What are the consequences and causes of it?

Low complexity can arise when the amount of starting material is low, so that multiple amplification cycles are required to generate the library before it is transformed into yeast. When too few copies of the sample sequence are used to construct the library, random sampling bias results. For NGS, input requirements are usually dictated by the nature of the sample and the amount available. Please contact us with your specific project parameters so that we can answer your question.

I want a normalized library. Can you provide this service? Why or why not?

With our NGS-Y2H approach, we can detect all potential hits in a Y2H screen, so there is little or no need to normalize the library separately.


What baits work well for NGS-Y2H? What baits are problematic?

In general, soluble proteins and protein fragments that form soluble domains are the easiest to screen. Transmembrane and hydrophobic domains may cause improper localization and folding in the Y2H context. A frequent problem is posed by self-activating bait proteins, which activate the Y2H reporters in the absence of any interaction with an AD-linked prey protein. Transcription factors used as baits tend to self-activate Y2H reporters, because activating transcription is their natural function. However, self-activation is also often caused by exposed acidic patches in proteins that are not transcription factors.

To evaluate bait proteins for problematic sequences that may affect proper expression and function in yeast, we developed Baitshop (manuscript in preparation), a specialized bioinformatics application for this purpose.

What causes failures of bait cloning and expression?

Besides self-activation, other problems are also observed with bait fusion expression. Difficult sequences (such as repeats) and bait toxicity are the main sources of failure for yeast bait fusion proteins: fusion proteins may not be properly expressed, may be rapidly degraded, or may be toxic to yeast at high expression levels, which diminishes yeast cell growth. Our Baitshop application has functions that determine whether a protein contains difficult sequences, such as repeats, that may hamper its expression. We also look for homology to yeast proteins and to known apoptotic factors that could interfere with yeast cell growth and viability.
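Baitshop is unpublished, so its repeat detection is not shown here. As a hypothetical illustration of the kind of check involved, this Python sketch reports perfect tandem repeats of short units (1-6 nt by default) that occur at least four times in a row in a coding sequence. The unit length and copy-number thresholds are arbitrary assumptions, and dedicated repeat finders also handle imperfect repeats.

```python
def find_tandem_repeats(seq: str, max_unit: int = 6,
                        min_copies: int = 4) -> list[tuple[int, str, int]]:
    """(0-based start, unit, copies) for perfect tandem repeats of short units."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            # Skip units that are themselves repeats of a shorter unit (e.g. "AA", "CAGCAG").
            if any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k)):
                i += 1
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * k  # jump past the repeat so it is reported once per unit size
            else:
                i += 1
    return hits


# A hypothetical bait fragment carrying six CAG codons in a row.
repeats = find_tandem_repeats("ATT" + "CAG" * 6 + "TTA")
```

Skipping units that are multiples of a shorter unit avoids reporting the same CAG stretch again as a CAGCAG repeat.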

What causes bait dependency tests to fail? What is auto-activation? What causes it?

Self-activating baits activate the transcriptional reporters without any interaction with a prey protein. Such baits are either not screenable or require more careful adjustment. We apply Baitshop procedures to identify self-activating sequences. These procedures scan sequences to predict activation motifs and evaluate modeled protein structures to determine whether such motifs are exposed on the protein surface.
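Exposed acidic patches are one sequence feature linked to self-activation. As a rough, hypothetical sketch of sequence scanning (not the Baitshop procedure), the Python code below slides a window along a protein sequence. It reports windows in which aspartate (D) and glutamate (E) make up at least a given fraction of residues, then merges overlapping hits into patches. The window length and threshold are illustrative assumptions. Judging whether a patch is surface-exposed would still require structural modeling.

```python
def acidic_windows(protein: str, window: int = 12, min_fraction: float = 0.5) -> list[int]:
    """1-based start positions of windows in which D + E reach at least min_fraction."""
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        acidic = sum(residue in "DE" for residue in segment)
        if acidic / window >= min_fraction:
            hits.append(i + 1)
    return hits


def merge_windows(starts: list[int], window: int) -> list[tuple[int, int]]:
    """Merge overlapping hit windows into (start, end) patches, 1-based inclusive."""
    patches: list[tuple[int, int]] = []
    for s in starts:
        e = s + window - 1
        if patches and s <= patches[-1][1] + 1:
            patches[-1] = (patches[-1][0], e)
        else:
            patches.append((s, e))
    return patches


# Hypothetical bait: a six-residue aspartate stretch inside an alanine background.
bait = "A" * 10 + "D" * 6 + "A" * 10
patches = merge_windows(acidic_windows(bait, window=6, min_fraction=0.5), window=6)
```

A flagged patch only suggests that a region could act as an activation motif. It would then be checked against a structural model, as described above.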

My bait is a known self-activator in the Y2H assay. What are my options?

We take great care, using Baitshop bioinformatics, to generate functional bait proteins and to find good conditions for screening even the most difficult sequences. With this approach, we minimize the number of bait proteins that cause significant problems during screening.


What sequencing capacity should be used as a readout?

We perform the NGS for you and we will determine what is best for your project.


What approaches to bioinformatic analyses have higher false positives / lower false positives?

Conventional Y2H screening can produce up to 50-80% false positives, so a lot of time must be spent validating hits by other methods and ruling them out one by one. With NGS-Y2H, false positives are ruled out during the bioinformatics analysis itself when data from background controls are also available (level 3 NGS-Y2H). As a result, you spend less time (and money) on validation experiments.
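To illustrate how background controls help, here is a Python sketch that compares library-size-normalized prey read counts in a selected (interaction) sample against a background control and reports log2 enrichment. The pseudocount, the 4-fold (log2 = 2) threshold, and the prey names are assumptions for this example. It is a simplified sketch, not a description of the scoring used in the NGS-Y2H analysis.

```python
import math


def log2_enrichment(selected: dict[str, int], background: dict[str, int],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """log2 ratio of library-size-normalized prey counts, selected vs background control."""
    sel_total = sum(selected.values())
    bg_total = sum(background.values())
    scores = {}
    for prey in set(selected) | set(background):
        sel = (selected.get(prey, 0) + pseudocount) / sel_total
        bg = (background.get(prey, 0) + pseudocount) / bg_total
        scores[prey] = math.log2(sel / bg)
    return scores


def enriched_preys(scores: dict[str, float], min_log2: float = 2.0) -> list[str]:
    """Preys at least 2**min_log2-fold enriched over background, best first."""
    return sorted((p for p, s in scores.items() if s >= min_log2), key=lambda p: -scores[p])


# Hypothetical read counts: preyA is enriched by selection, preyB mostly reflects background.
selected = {"preyA": 15, "preyB": 1}
background = {"preyA": 1, "preyB": 15}
scores = log2_enrichment(selected, background)
```

A prey that is abundant in both samples scores near zero. That is how a background control filters out sticky or highly expressed preys that a conventional screen would report as hits.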



Our Publications

DoMY-Seq: A yeast two-hybrid–based technique for precision mapping of protein–protein interaction motifs. Castel P, Holtz-Morris A, Kwon Y, Suter BP, McCormick F. 2020, Nov 3. doi: 10.1074/jbc.RA120.014284.

Non-full-length Water-Soluble CXCR4(QTY) and CCR5(QTY) Chemokine Receptors: Implication for Overlooked Truncated but Functional Membrane Receptors. Qing R, Tao F, Chatterjee P, Yang G, Han Q, Chung H, Ni J, Suter BP, Kubicek J, Maertens B, Schubert T, Blackburn C, Zhang S. iScience. 2020 Oct 28;23(12):101670. doi: 10.1016/j.isci.2020.101670. eCollection 2020 Dec 18. PMID: 33376963

QTY code enables design of detergent-free chemokine receptors that retain ligand-binding activities. Zhang S, Tao F, Qing R, Tang H, Skuhersky M, Corin K, Tegler L, Wassie A, Wassie B, Kwon Y, Suter B, Entzian C, Schubert T, Yang G, Labahn J, Kubicek J, Maertens B. Proc Natl Acad Sci U S A. 2018 Sep 11;115(37):E8652-E8659. doi: 10.1073/pnas.1811031115. Epub 2018 Aug 28.

Next-Generation Sequencing for Binary Protein-Protein Interactions. Suter B, Zhang X, Pesce CG, Mendelsohn AR, Dinesh-Kumar SP, Mao JH. Front Genet. 2015 Dec 17;6:346. doi: 10.3389/fgene.2015.00346. PMID: 26734059

Development and application of a DNA microarray-based yeast two-hybrid system. Suter B, Fontaine JF, Yildirimman R, Raskó T, Schaefer MH, Rasche A, Porras P, Vázquez-Álvarez BM, Russ J, Rau K, Foulle R, Zenkner M, Saar K, Herwig R, Andrade-Navarro MA, Wanker EE. Nucleic Acids Res. 2013 Feb 1;41(3):1496-507. PMID: 23275563

QiSampler: evaluation of scoring schemes for high-throughput datasets using a repetitive sampling strategy on gold standards. Fontaine JF, Suter B, Andrade-Navarro MA. BMC Res Notes. 2011 Mar 9;4:57. PMID: 21388526

Two-hybrid technologies in proteomics research. Suter B, Kittanakom S, Stagljar I. Curr Opin Biotechnol. 2008 Aug;19(4):316-23. Review. PMID: 18619540

Client Publications

A caspase–RhoGEF axis contributes to the cell size threshold for apoptotic death in developing Caenorhabditis elegans. Sethi A, Wei H, Mishra N, Segos I, Lambie EJ, Zanin E, Conradt B. PLoS Biol. 2022 Oct 6;20(10):e3001786.

A cascade of bHLH-regulated pathways programs maize anther development. Nan GL, Teng C, Fernandes J, O’Connor L, Meyers CB, Walbot V. The Plant Cell, Volume 34, Issue 4, April 2022, Pages 1207–1225.

The Osteogenesis Imperfecta Type V Mutant BRIL/IFITM5 Promotes Transcriptional Activation of MEF2, NFATc, and NR4A in Osteoblasts. Maranda V, Gaumond M H, Moffatt P. 2022 Feb; 23(4): 2148. Published online 2022 Feb 15. doi: 10.3390/ijms23042148.

C2orf69 mutations disrupt mitochondrial function and cause a multisystem human disorder with recurring autoinflammation. Lausberg E, Gießelmann S, Dewulf JP, Wiame E, Holz A, Salvarinova R, van Karnebeek CD, Klemm P, Ohl K, Mull M, Braunschweig T, Weis J, Sommer CJ, Demuth S, Haase C, Stollbrink-Peschgens C, Debray FG, Libioulle C, Choukair D, Oommen PT, Borkhardt A, Surowy H, Wieczorek D, Wagner N, Meyer R, Eggermann T, Begemann M, Van Schaftingen E, Häusler M, Tenbrock K, van den Heuvel L, Elbracht M, Kurth I, Kraft F. J Clin Invest. 2021 Jun 15;131(12):e143078. doi: 10.1172/JCI143078.

RAS interaction with Sin1 is dispensable for mTORC2 assembly and activity. Castel P, Dharmaiah S, Sale MJ, Messing S, Rizzuto G, Cuevas-Navarro A, Cheng A, Trnka MJ, Urisman A, Esposito D, Simanshu DK, McCormick F. 2021, Aug 17. doi: 10.1073/pnas.2103261118.

Effects of Acetylation and Phosphorylation on Subunit Interactions in Three Large Eukaryotic Complexes. Šoštarić N, O’Reilly FJ, Giansanti P, Heck AJR, Gavin AC, van Noort V. Mol Cell Proteomics. 2018 Dec;17(12):2387-2401. doi: 10.1074/mcp.RA118.000892. Epub 2018 Sep 4.

An effector from the Huanglongbing-associated pathogen targets citrus proteases. Clark K, Franco JY, Schwizer S, Pang Z, Hawara E, Liebrand TWH, Pagliaccia D, Zeng L, Gurung FB, Wang P, Shi J, Wang Y, Ancona V, van der Hoorn RAL, Wang N, Coaker G, Ma W. Nat Commun. 2018 Apr 30;9(1):1718. doi: 10.1038/s41467-018-04140-9.

Reference Publications

High-resolution protein fragment interactions using AVA-Seq on a human reference set. Schaefer-Ramadan S, Aleksic J, Al-Thani NM, Mohamoud YA, Hill DE, Malek JA. 2021, July 29. doi: 10.1101/2021.07.28.454266.

High-resolution protein–protein interaction mapping using all-versus-all sequencing (AVA-Seq). Andrews SS, Schaefer-Ramadan S, Al-Thani NM, Ahmed I, Mohamoud YA, Malek JA. 2019, Jun 10. doi: 10.1074/jbc.RA119.008792.

A user-friendly platform for yeast two-hybrid library screening using next generation sequencing. Erffelinck ML, Ribeiro B, Perassolo M, Pauwels L, Pollier J, Storme V, Goossens A. PLoS One. 2018 Dec 21;13(12):e0201270. doi: 10.1371/journal.pone.0201270. eCollection 2018.

Development and application of a recombination-based library versus library high-throughput yeast two-hybrid (RLL-Y2H) screening system. Yang F, Lei Y, Zhou M, Yao Q, Han Y, Wu X, Zhong W, Zhu C, Xu W, Tao R, Chen X, Lin D, Rahman K, Tyagi R, Habib Z, Xiao S, Wang D, Yu Y, Chen H, Fu Z, Cao G. Nucleic Acids Res. 2018 Feb 16;46(3):e17. doi: 10.1093/nar/gkx1173.

CrY2H-seq: a massively multiplexed assay for deep-coverage interactome mapping. Trigg SA, Garza RM, MacWilliams A, Nery JR, Bartlett A, Castanon R, Goubil A, Feeney J, O’Malley R, Huang SC, Zhang ZZ, Galli M, Ecker JR. Nat Methods. 2017 Aug;14(8):819-825. doi: 10.1038/nmeth.4343. Epub 2017 Jun 26.

Protein interaction perturbation profiling at amino-acid resolution. Woodsmith J, Apelt L, Casado-Medrano V, Özkan Z, Timmermann B, Stelzl U. Nat Methods. 2017 Dec;14(12):1213-1221. doi: 10.1038/nmeth.4464. Epub 2017 Oct 16.

Pooled-matrix protein interaction screens using Barcode Fusion Genetics. Yachie N, Petsalaki E, Mellor JC, Weile J, Jacob Y, Verby M, Ozturk SB, Li S, Cote AG, Mosca R, Knapp JJ, Ko M, Yu A, Gebbia M, Sahni N, Yi S, Tyagi T, Sheykhkarimli D, Roth JF, Wong C, Musa L, Snider J, Liu YC, Yu H, Braun P, Stagljar I, Hao T, Calderwood MA, Pelletier L, Aloy P, Hill DE, Vidal M, Roth FP. Mol Syst Biol. 2016 Apr 22;12(4):863. doi: 10.15252/msb.20156660.

A Y2H-seq approach defines the human protein methyltransferase interactome. Weimann M, Grossmann A, Woodsmith J, Özkan Z, Birth P, Meierhofer D, Benlasfer N, Valovka T, Timmermann B, Wanker EE, Sauer S, Stelzl U. Nat Methods. 2013 Apr;10(4):339-42. doi: 10.1038/nmeth.2397. Epub 2013 Mar 3.

Quantitative Interactor Screening with next-generation Sequencing (QIS-Seq) identifies Arabidopsis thaliana MLO2 as a target of the Pseudomonas syringae type III effector HopZ2. Lewis JD, Wan J, Ford R, Gong Y, Fung P, Nahal H, Wang PW, Desveaux D, Guttman DS. BMC Genomics. 2012 Jan 9;13:8. doi: 10.1186/1471-2164-13-8.

Next-generation sequencing to generate interactome datasets. Yu H, Tardivo L, Tam S, Weiner E, Gebreab F, Fan C, Svrzikapa N, Hirozane-Kishikawa T, Rietman E, Yang X, Sahalie J, Salehi-Ashtiani K, Hao T, Cusick ME, Hill DE, Roth FP, Braun P, Vidal M. Nat Methods. 2011 Jun;8(6):478-80. doi: 10.1038/nmeth.1597. Epub 2011 Apr 24.