CC+ Documentation

Coiled coils comprise two or more α-helices wound around each other to give rope-like structures¹⁻³. Most coiled coils are characterised by a sequence pattern of hydrophobic (h) and polar (p) residues called the heptad repeat, (hpphppp)ₙ. The positions within each heptad are typically labelled a to g².
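As a minimal illustration of heptad labelling, the sketch below assigns positions a to g along a sequence, starting from a chosen position, and picks out the a and d residues. In the (hpphppp)ₙ pattern these are the hydrophobic positions. The function names are hypothetical and are not part of CC⁺ or Socket2.

```python
HEPTAD = "abcdefg"


def assign_register(sequence: str, first: str = "a") -> str:
    """Label each residue with a heptad position, starting from position `first`."""
    start = HEPTAD.index(first)
    return "".join(HEPTAD[(start + i) % 7] for i in range(len(sequence)))


def core_positions(sequence: str, register: str) -> list:
    """Return (residue, position) pairs at a and d, the usually hydrophobic positions."""
    return [(res, pos) for res, pos in zip(sequence, register) if pos in "ad"]


register = assign_register("RIKKLE", first="g")
print(register)                            # gabcde
print(core_positions("RIKKLE", register))  # [('I', 'a'), ('L', 'd')]
```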

Characteristic Packing

In addition to the heptad repeat, coiled-coil structures have a characteristic packing arrangement between the α-helices, termed “knobs-into-holes” (KIH) packing, first described by Crick¹. In KIH packing, the first (a) and fourth (d) residues of each heptad repeat act as “knobs”, which fit into diamond-shaped “holes” formed by four residues of a neighbouring α-helix.

SOCKET is an assignment method based on these knobs-into-holes interactions⁴; that is, its coiled-coil assignments are based on structure rather than sequence. SOCKET identifies KIH interactions as follows:

all residues within α-helices are represented by their centres of mass;
“knob” residues are typically defined as side chains within 7 Å of 4 other residues on a neighbouring helix. This distance is called the packing cutoff, and has been chosen empirically to reduce false-positive assignments;
“hole” residues for a given knob are typically defined as the 4 nearest side chains.
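The knob and hole criteria above can be sketched in Python as follows. This is a simplified illustration, not SOCKET's implementation. Each helix is assumed to be given as a hypothetical dictionary that maps residue numbers to side-chain centres of mass. Any additional checks that SOCKET applies to the hole residues are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def centre_of_mass(atoms: Sequence[Tuple[Point, float]]) -> Point:
    """Mass-weighted centre of a residue's side-chain atoms, given as [(xyz, mass), ...]."""
    total = sum(mass for _, mass in atoms)
    return tuple(sum(xyz[i] * mass for xyz, mass in atoms) / total for i in range(3))


def find_knobs(
    helix_a: Dict[int, Point],
    helix_b: Dict[int, Point],
    cutoff: float = 7.0,
) -> Dict[int, List[int]]:
    """Map each candidate knob on helix A to the four nearest residues on helix B.

    A residue on A is a candidate knob if at least four centres on B lie within
    `cutoff` angstroms (the packing cutoff); the four nearest form its hole.
    Simplified sketch: no further geometric checks on the hole are made.
    """
    knobs = {}
    for res_a, pos_a in helix_a.items():
        # (distance, residue number) pairs on helix B, nearest first
        ranked = sorted((math.dist(pos_a, pos_b), res_b) for res_b, pos_b in helix_b.items())
        close = [res_b for dist, res_b in ranked if dist <= cutoff]
        if len(close) >= 4:
            knobs[res_a] = sorted(close[:4])
    return knobs
```

Raising `cutoff` can only add residues to `close`, so the criterion becomes less stringent as the cutoff grows.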

Two-stranded coiled coils have pairwise-complementary KIH interactions: when a knob from helix A packs into a hole of four side chains on helix B, one or more of these hole residues reciprocate by packing into a hole of side chains on helix A. Higher-order coiled coils have cyclically complementary KIH interactions: a knob from helix A fits into a hole on helix B, the corresponding knob on B packs into a hole on helix C, and so on. Detailed information on the SOCKET algorithm can be found in “SOCKET: a program for identifying and analysing coiled-coil motifs within protein structures”⁴. We have recently updated SOCKET to Socket2⁵ and implemented a GUI that lets users visualise, directly and interactively, coiled coils identified by Socket2 and those from CC⁺.
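Given the knob-to-hole relationships between helices, pairwise and cyclic complementarity can be told apart with a small graph walk. This sketch is a hypothetical simplification and is not how SOCKET or Socket2 classifies assemblies. Helices are labelled by strings, and each edge (X, Y) means "knobs on helix X pack into holes on helix Y".

```python
from typing import Dict, Set, Tuple


def complementarity(edges: Set[Tuple[str, str]]) -> str:
    """Classify KIH packing as 'pairwise', 'cyclic', or 'neither'.

    `edges` holds (knob_helix, hole_helix) pairs, e.g. ("A", "B").
    """
    helices = sorted({h for edge in edges for h in edge})
    # Pairwise: two helices, each packing knobs into the other.
    if len(helices) == 2 and all((b, a) in edges for a, b in edges):
        return "pairwise"
    # Cyclic: every helix donates knobs to exactly one partner, and following
    # the partners visits every helix before returning to the start.
    partners: Dict[str, Set[str]] = {}
    for knob_helix, hole_helix in edges:
        partners.setdefault(knob_helix, set()).add(hole_helix)
    if helices and all(len(partners.get(h, ())) == 1 for h in helices):
        start, current, visited = helices[0], helices[0], []
        while current not in visited:
            visited.append(current)
            current = next(iter(partners[current]))
        if current == start and len(visited) == len(helices):
            return "cyclic"
    return "neither"
```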

References

1. Crick, F. H. C., Acta Crystallogr., 1953, 6, 689–697.
2. Lupas, A., Bassler, J., Trends Biochem. Sci., 2017, 42, 130–140.
3. Woolfson, D. N., J. Biol. Chem., 2023, 299.
4. Walshaw, J., Woolfson, D. N., J. Mol. Biol., 2001, 307, 1427–1450.
5. Kumar, P., Woolfson, D. N., Bioinformatics, 2021, 37, 4575–4577.

The CC⁺ database is built upon a relational MySQL database of organized Socket2¹ assignments. The Dynamic Interface has been developed to allow quick and easy interaction with this relational database. There are two form-based Dynamic Interfaces, CCPlus-PDB and CCPlus-AlphaFold, for customizable searches of coiled coils (CCs) in PDB structures and AlphaFold2 models, respectively (Figure 1).

Figure 1: Splash page of the CC⁺ website. The two Dynamic Interfaces, (i) CCPlus-PDB and (ii) CCPlus-AlphaFold, are accessed by clicking the image buttons. Links to the 'Statistics' and 'Documentation' pages are also provided.

Some of the tabs (listed below) are common to both parts of the CC⁺ database, while others are unique to one of them. In all cases, searches are initiated by clicking the Search CC⁺ button, and the default values of all parameters can be restored with the Reset button.

A Tabbed Interface

To avoid an overwhelming number of form inputs on-screen at any one time, the CCPlus-PDB and CCPlus-AlphaFold Dynamic Interfaces are divided into four and three tabs, respectively. Each tab holds the inputs for one aspect of Socket2's coiled-coil assignments:

Specify PDB ID facilitates searching for coiled coils by PDB ID. This tab is available only in the CCPlus-PDB Dynamic Interface.
Specify structures facilitates searching for specific α-helical configurations.
Specify sequences facilitates searching for given amino-acid sequences.
Specify interactions facilitates searching for potentially interacting pairs of residues.

A search of the CC+ database does not need the inputs of every tab to be configured. Inputs can be selected as required, refining or relaxing filters on the coiled-coil data set as necessary. Whenever a new tab is chosen, or a new search is run, the configured inputs are reported.
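The idea that only configured inputs constrain a search can be sketched as a filter in which unset parameters are skipped. The record fields, the `search` function and the toy records are all hypothetical. The real CC⁺ database is a MySQL schema whose structure is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Assignment:
    # Hypothetical, simplified record for one coiled-coil assignment.
    pdb_id: str
    n_helices: int
    orientation: str  # e.g. "parallel" or "antiparallel"
    sequence: str


def search(
    assignments: Iterable[Assignment],
    pdb_id: Optional[str] = None,
    n_helices: Optional[int] = None,
    orientation: Optional[str] = None,
    sequence: Optional[str] = None,
) -> List[Assignment]:
    """Apply only the filters that were configured; None means 'not set'."""
    hits = []
    for a in assignments:
        if pdb_id is not None and a.pdb_id.upper() != pdb_id.upper():
            continue
        if n_helices is not None and a.n_helices != n_helices:
            continue
        if orientation is not None and a.orientation != orientation:
            continue
        if sequence is not None and sequence not in a.sequence:
            continue
        hits.append(a)
    return hits


# Toy records for illustration only.
data = [
    Assignment("TOY1", 2, "parallel", "RIKKLEAAA"),
    Assignment("TOY2", 3, "antiparallel", "EIAALKQE"),
]
print([a.pdb_id for a in search(data, n_helices=2)])  # ['TOY1']
```

Calling `search(data)` with no filters returns every record. Each configured input can only narrow the result.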

Specifying PDB ID

The “Specify PDB ID” tab accepts a PDB ID as input. It can be used in conjunction with the search parameters on the other tabs.

Specifying Structures

The “Specify structures” tab comprises dropdown inputs specifying different α-helical configurations observed in coiled-coil assignments.

For both the CCPlus-PDB (Figure 2A) and CCPlus-AlphaFold (Figure 2B) tabs, a slider allows users to choose the Socket2 packing cutoff used to identify coiled coils. The default value is 7 Å, which we recommend, as the other values (7.5, 8, 8.5, and 9 Å) are progressively less stringent.

Figure 2: 'Specify Structures' tab of both 'Dynamic Interface' forms. Default parameters are selected.

The CCPlus-PDB tab (Figure 2A) offers several search parameters for coiled coils, including:

Redundancy (sequences sharing less than a specified sequence identity)
the number of α Helices
relative Orientation
whether the Partnering helices have the same (homomers) or different (heteromers) sequences
whether the helices are from the same or different polypeptide Chains
whether the coiled coil has heptad or non-heptad repeats
the minimum Length of the coiled-coil helices

In addition to these options, adopted from the original CC⁺ database², users can now specify the following:

type of Protein (membrane or globular)
the Experiment Type used to solve the parent structure
specific Modified Residues, i.e. non-proteinogenic residues

When an Experiment Type is selected, a Resolution range can also be specified.

Unlike CCPlus-PDB, the CCPlus-AlphaFold tab (Figure 2B) has no options for the number of Chains, Protein type, Experiment Type, or Modified Residues, because AlphaFold2 models are predicted for protomers and contain only the 20 standard proteinogenic residues. Its other search parameters are similar to those above.

These searches can also be filtered by a minimum and maximum pLDDT Score³ for the predicted CC regions.
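The text does not say how per-residue pLDDT values are combined into a score for a region. This sketch assumes the mean over the region and assumes 0-based, end-exclusive region indices. Both assumptions and the function name are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def filter_by_plddt(
    plddt: Sequence[float],
    regions: List[Tuple[int, int]],
    min_score: float = 0.0,
    max_score: float = 100.0,
) -> List[Tuple[int, int]]:
    """Keep coiled-coil regions whose mean pLDDT lies in [min_score, max_score].

    `plddt` holds per-residue scores; regions are (start, end), end-exclusive.
    """
    kept = []
    for start, end in regions:
        score = mean(plddt[start:end])  # assumed aggregation: arithmetic mean
        if min_score <= score <= max_score:
            kept.append((start, end))
    return kept


print(filter_by_plddt([90.0, 92.0, 88.0, 40.0, 35.0], [(0, 3), (3, 5)], min_score=70.0))
# [(0, 3)]
```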

Figure 3: 'Specify Sequences' and 'Specify Interactions' tabs. These tabs are common to both 'Dynamic Interface' forms.

Specifying Sequences

The “Specify sequences” tab (Figure 3A) comprises two text inputs facilitating searches of the CC+ database for specific amino-acid sequences. Any amino-acid sequences entered will be sought in their entirety within Socket2-assigned coiled-coil structures.

The “sequence” input accepts either plain-text sequences, e.g. RIKKLE, or PROSITE⁴ patterns, e.g. R-[ILVM]-X-X-[ILV]-E. If a PROSITE pattern produces an error, check that it follows PROSITE syntax. The “register” input is optional, but links a given sequence to its structure within a coiled-coil assignment. Registers must be supplied in plain text using the letters a to g, for example gabcde.
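To show how a PROSITE pattern relates to an ordinary regular expression, the sketch below converts the basic PROSITE elements into Python `re` syntax. It is not the CC⁺ parser. It handles only x, [..], {..}, repeat counts such as x(2,4), and the < and > terminus anchors. The hypothetical `is_valid_register` helper also assumes that a register must run in heptad order (g is followed by a, and so on). The text above does not state this rule explicitly.

```python
import re

HEPTAD = "abcdefg"


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        element = element.strip("<>")
        # Split off an optional repeat count, e.g. "x(2,4)" -> "x", "2", "4".
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core in ("x", "X"):
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any residue except those listed
        # Single letters and [..] groups are already valid regex syntax.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def is_valid_register(register: str) -> bool:
    """Letters a-g only, each followed by the next heptad position (assumed rule)."""
    if not re.fullmatch(r"[a-g]+", register):
        return False
    return all(HEPTAD[(HEPTAD.index(p) + 1) % 7] == q for p, q in zip(register, register[1:]))


regex = prosite_to_regex("R-[ILVM]-X-X-[ILV]-E")
print(regex)                                   # R[ILVM]..[ILV]E
print(re.search(regex, "AARIKKLEAA").group())  # RIKKLE
print(is_valid_register("gabcde"))             # True
```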

Specifying Interactions

The “Specify Interactions” tab (Figure 3B) comprises five drop-down inputs: for each of two residues, one input selects the amino acid and another its position in the heptad register, and a fifth input sets the maximum distance (in Å) accepted between their centres of mass.
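The five inputs map naturally onto a pairwise distance query. The sketch below represents residues as a hypothetical `Residue` record holding an amino acid, a heptad position and a centre of mass. It returns every pair that matches both (residue, position) choices within the distance limit. It is an illustration, not the CC⁺ query.

```python
import math
from typing import List, NamedTuple, Tuple


class Residue(NamedTuple):
    aa: str                             # one-letter amino-acid code
    position: str                       # heptad position, a-g
    centre: Tuple[float, float, float]  # centre of mass, in angstroms


def find_pairs(
    residues: List[Residue], aa1: str, pos1: str, aa2: str, pos2: str, max_distance: float
) -> List[Tuple[Residue, Residue]]:
    """Return pairs matching (aa1, pos1) and (aa2, pos2) within max_distance."""
    first = [r for r in residues if (r.aa, r.position) == (aa1, pos1)]
    second = [r for r in residues if (r.aa, r.position) == (aa2, pos2)]
    return [
        (r1, r2)
        for r1 in first
        for r2 in second
        if r1 is not r2 and math.dist(r1.centre, r2.centre) <= max_distance
    ]
```

As the note below points out, a pair returned by such a distance query is only a candidate interaction.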

Note: searching for a given residue at a given position, and nothing more, effectively replicates a one-residue search on the “Specify sequences” tab. Also note that two amino acids within a given distance of each other do not necessarily interact.

Returned Coiled-coil Data

Results from each successful search are presented in Summary, Gallery, and Compact views. In each view, clicking the figure or the PDB ID runs Socket2 on that coiled coil, and the associated data can then be downloaded for further use.

References

1. Kumar, P., Woolfson, D. N., Bioinformatics, 2021, 37, 4575–4577.
2. Testa, O. D., Moutevelis, E., Woolfson, D. N., Nucleic Acids Res., 2009, 37, D315–D322.
3. Jumper, J. et al., Nature, 2021, 596, 583–589.
4. Sigrist, C. J. A. et al., Nucleic Acids Res., 2010, 38, D161–D166.

Occasionally, when searching for coiled-coil assignments using the Dynamic Interface, the coiled coils you're looking for just don't turn up: expected hits aren't returned, and there may be no obvious reason why. Hopefully, this section will resolve some of the more frequently received queries about curious search results.

Check your current filters

The Dynamic Interface comprises various tabs related to different aspects of coiled-coil structures. Because only one tab is displayed at a time, it's easy to forget how the others are configured, and therefore which filters are being applied to the current data set. To help with this, the Results page includes a section that reports the “Current filters”.

The “Current filters” report shows all the constraints being applied to your data set. If you suspect a coiled-coil structure is missing from your search results, it's prudent to check that the applied filters are configured as intended.

The default structural filters return only coiled-coil assignments that:

share ≤50% sequence identity (i.e. are non-redundant);
comprise any number of α-helices;
have any relative orientation of α-helices;
are either homomers or heteromers;
comprise any number of polypeptide chains;
have only canonical heptad repeats;
comprise α-helices longer than 11 amino acids.

Check the Redundancy setting

If, for example, you use the keyword facility of the Dynamic Interface to search for 2ZTA with the default settings (i.e. having started a new search or pressed the “Reset Everything” button), you won't find anything. This is normal.

Coiled coils often go missing because they are considered redundant. This means that a longer coiled coil with an identical (or highly similar) sequence has been retained in the data set, and the shorter, redundant assignments have been rejected.

If you can't find a particular PDB identifier, try relaxing the redundancy setting on the “Specify structures” tab. The default value of 50% knocks out many of the shorter, well-known coiled-coil structures. If Socket2 found a coiled coil in the structure of a protein, it's in CC⁺; you just have to select the right parameters.
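The redundancy behaviour described above can be sketched as a greedy, longest-first filter. Longer sequences are kept first, and any shorter sequence that is too similar to a kept one is rejected. The longest-first order follows the text. The identity measure used here is a hypothetical stand-in (the best ungapped match of the shorter sequence slid along the longer one) and is not how CC⁺ computes sequence identity.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Best ungapped identity (%) of the shorter sequence slid along the longer one."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(x == y for x, y in zip(short, long_[offset:]))
        best = max(best, matches)
    return 100.0 * best / len(short)


def remove_redundant(sequences: List[str], max_identity: float = 50.0) -> List[str]:
    """Keep longer sequences first; drop shorter ones above max_identity to any kept one."""
    kept: List[str] = []
    for seq in sorted(sequences, key=len, reverse=True):
        if all(percent_identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept


seqs = ["ELLKKLEE", "ELLKKLEELLKK", "AQWTRGHP"]
print(remove_redundant(seqs))                      # ['ELLKKLEELLKK', 'AQWTRGHP']
print(remove_redundant(seqs, max_identity=100.0))  # all three, longest first
```

Relaxing `max_identity` brings the shorter, similar sequence back into the results. This mirrors the advice to relax the redundancy setting.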

If in doubt, relax everything

If you're still not finding a coiled-coil assignment that Socket2 definitely identifies, the last-ditch attempt is to relax every available filter, which returns every assignment in the CC⁺ database. Reset every tab to its defaults by selecting the “Reset Everything” button, then go to the “Specify structures” tab and configure a search for coiled coils which:

include redundant sequences (no sequence-identity filter);
comprise any number of α-helices;
have any relative orientation of α-helices;
are either homomers or heteromers;
comprise any number of polypeptide chains;
have heptad and non-heptad repeats;
comprise α-helices of any length.

If you still can't find a coiled-coil assignment that you believe should be in CC⁺ using a keyword search for a PDB code, let me know. As always, the feedback I gratefully receive goes into making the tool better for all concerned, so please keep me informed!

A successful search allows users to analyse and visualise the coiled coils. Three options are provided for downloading the CC⁺ results, either as a flat CSV file or as a compressed file:

Summary and CC Sequences: a flat CSV file listing the PDB IDs or UniProt IDs with coiled coils, together with the corresponding metadata (Figure 1).

Figure 1: Overview of a typical CC⁺ Results page. A dropdown menu appears for every successful run, from which an option can be selected to download either a summary file or 3D coordinates. Clicking the 'Profiles' button calculates the propensity of each of the 20 proteinogenic amino acids at every heptad position.

3D Coordinates (PDB file format): PDB files containing only the coiled-coil regions are generated using Biopython¹ and can be downloaded as part of a compressed file (Figure 1). For long result lists, the session can time out before the compressed file is generated; in that case, users may contact Prof. Woolfson.
Profiles: clicking the ‘Profiles’ button (Figure 1) generates the amino-acid count at each heptad position and the corresponding Swiss-Prot or internal propensity. The same data can be downloaded in CSV format from the generated profile page (Figure 2).

Figure 2: Button to download the profiles for each successful search. It appears at the end of the page containing the profile information.

Profiles presents two 20 × 7 tables of position-specific residue data. The first table contains the raw counts of each residue at each register position in the gathered data set. The second presents the same data normalized against the amino-acid frequencies found in Swiss-Prot² or within the data set itself (internal normalization). This normalization procedure is described in more detail below.
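The raw-count table can be pictured as a 20 × 7 grid indexed by residue and heptad position. The minimal sketch below builds it from (sequence, register) pairs of equal length. The input format and the choice to skip non-standard residues or unassigned positions are assumptions, and `count_table` is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HEPTAD = "abcdefg"


def count_table(assignments):
    """Build raw counts[residue][heptad position] from (sequence, register) pairs.

    Non-standard residues and positions outside a-g are skipped (assumption).
    """
    counts = {aa: {pos: 0 for pos in HEPTAD} for aa in AMINO_ACIDS}
    for sequence, register in assignments:
        for residue, position in zip(sequence, register):
            if residue in counts and position in counts[residue]:
                counts[residue][position] += 1
    return counts


counts = count_table([("RIKKLE", "gabcde"), ("IEKLKEE", "abcdefg")])
print(counts["I"]["a"], counts["L"]["d"], counts["K"]["c"])  # 2 2 2
```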

Notes on Normalization

Swiss-Prot Normalized Data: the propensity of each amino acid at each position in an α-helical heptad (abcdefg) is calculated as a position-specific scoring matrix (PSSM), in which the raw amino-acid frequencies are normalized against the amino-acid frequencies in Swiss-Prot (tabulated below, in %).
A: 8.25, Q: 3.93, L: 9.65, S: 6.64
R: 5.53, E: 6.72, K: 5.80, T: 5.35
N: 4.06, G: 7.07, M: 2.41, W: 1.10
D: 5.46, H: 2.27, F: 3.86, Y: 2.92
C: 1.38, I: 5.91, P: 4.74, V: 6.86

Internally Normalized Data: the propensity of each amino acid at each position in an α-helical heptad (abcdefg) is calculated as a PSSM in which the raw amino-acid frequencies are normalized internally.

In both cases, this gives values <1 for residues that occur at a position less frequently than expected, values of approximately 1 for residues occurring as often as expected, and values >1 for residues occurring more frequently than expected.
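A minimal sketch of the normalization computes each propensity as the observed fraction of a residue at a heptad position divided by its background fraction. Values are therefore below, near or above 1 as described above. The Swiss-Prot background uses the percentages tabulated above. For the internal case, the text does not define the background. This sketch assumes it is each residue's share of all counted residues in the data set. The function name and the exact formula are assumptions, not the CC⁺ code.

```python
HEPTAD = "abcdefg"

# Swiss-Prot background amino-acid frequencies (%), as tabulated above.
SWISSPROT_PERCENT = {
    "A": 8.25, "Q": 3.93, "L": 9.65, "S": 6.64,
    "R": 5.53, "E": 6.72, "K": 5.80, "T": 5.35,
    "N": 4.06, "G": 7.07, "M": 2.41, "W": 1.10,
    "D": 5.46, "H": 2.27, "F": 3.86, "Y": 2.92,
    "C": 1.38, "I": 5.91, "P": 4.74, "V": 6.86,
}


def propensities(counts, background_percent=None):
    """Return table[aa][pos] = (observed fraction at pos) / (background fraction of aa).

    With background_percent=None, the background is each residue's share of all
    counted residues in the data set (an assumed reading of 'internal').
    """
    if background_percent is None:
        totals = {aa: sum(row.values()) for aa, row in counts.items()}
        grand_total = sum(totals.values())
        background = {aa: total / grand_total for aa, total in totals.items()}
    else:
        background = {aa: pct / 100.0 for aa, pct in background_percent.items()}

    table = {aa: {} for aa in counts}
    for pos in HEPTAD:
        column_total = sum(counts[aa][pos] for aa in counts)
        for aa in counts:
            observed = counts[aa][pos] / column_total if column_total else 0.0
            expected = background[aa]
            table[aa][pos] = observed / expected if expected else 0.0
    return table
```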

References

1. Cock, P. J. et al., Bioinformatics, 2009, 25, 1422–1423.
2. The UniProt Consortium, Nucleic Acids Res., 2023, 51(D1), D523–D531.
