Common issues in PDB structures

Most structure files downloaded from the PDB require some pre-processing before GROMACS can handle them. Here is an overview of the most common issues and their solutions:

Protein

Multiple transmembrane proteins

When working with protein complexes, one must define which proteins are transmembrane. If not all transmembrane proteins are identified, or if too many proteins are marked as transmembrane, PPM can place the membrane incorrectly in the system.

Non-natural residues

Non-natural residues are not supported by the PyMemDyn module. Consider changing the residue to the most closely related (in size and/or charge) natural amino acid. Sometimes, alternative names are used for natural amino acids, which are then detected as non-natural residues. In that case, renaming the residues to their canonical names fixes the issue.
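As a minimal sketch of such a renaming step, the function below rewrites the residue-name field (columns 18-20 of ATOM/HETATM records in the PDB format) using an alias table. The function name and the alias table are illustrative assumptions, not part of PyMemDyn; the table lists common force-field protonation-state names and should be adapted to the names actually present in the structure.

```python
# Illustrative alias table (an assumption, not PyMemDyn's own list):
# common force-field names for protonation states of natural residues.
ALIASES = {
    "HID": "HIS", "HIE": "HIS", "HIP": "HIS",
    "HSD": "HIS", "HSE": "HIS", "HSP": "HIS",
    "CYX": "CYS",
}


def canonicalize_residue_names(pdb_lines, aliases=ALIASES):
    """Return PDB lines with aliased residue names replaced by canonical ones."""
    fixed = []
    for line in pdb_lines:
        # Residue name occupies columns 18-20 (0-based slice 17:20).
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            resname = line[17:20]
            if resname in aliases:
                line = line[:17] + aliases[resname] + line[20:]
        fixed.append(line)
    return fixed
```

Only ATOM and HETATM records are touched; other records that mention residue names (for example SEQRES) would need separate handling if they are present.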

Missing side chains

Many experimental PDB structures contain missing side chains. These are automatically detected and added by PyMemDyn.

Chain breaks

Many experimental PDB structures contain chain breaks. These are automatically detected and connected through poly-alanine chains. This maintains the flexibility within the system while also keeping the system connected.

Discontinuous sequence

Many GPCR structures have discontinuous sequence numbering. Typically, the GPCR residues are numbered 1-1000, while the G protein is interjected in the same chain with residue numbers above 1000. To fix this, extract the G protein and give it a unique chain identifier.
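A minimal sketch of this fix, assuming the G protein can be recognized purely by residue numbers above 1000 within one chain: the function reassigns the chain identifier (column 22) for those residues. The function name, the default threshold, and the choice of new chain letter are assumptions for illustration; the new chain letter should not already be in use in the file.

```python
def split_chain_by_residue_number(pdb_lines, source_chain, new_chain, threshold=1000):
    """Move residues numbered above `threshold` in `source_chain` to `new_chain`."""
    fixed = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            chain = line[21]                 # column 22: chain identifier
            try:
                resseq = int(line[22:26])    # columns 23-26: residue number
            except ValueError:
                resseq = None                # leave malformed records untouched
            if chain == source_chain and resseq is not None and resseq > threshold:
                line = line[:21] + new_chain + line[22:]
        fixed.append(line)
    return fixed
```

For example, `split_chain_by_residue_number(lines, "A", "G")` would move residues 1001 and above of chain A to a hypothetical chain G. TER records are not adjusted by this sketch.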

N- and C-terminus caps

Caps are not supported by the PyMemDyn module. GROMACS requires a non-capped structure and will handle capping. Often, it is enough to not specify the caps as residues when setting up the system on the webserver; PyMemDyn will then not consider them.

-

Ligand

Abnormal ligands

Ligands must be parsable by LigParGen to generate the appropriate force field parameters. If any issues arise, one can check directly in LigParGen whether the ligand is parsable.

Multiple identical ligands

When handling a system with multiple identical ligands, these should be given unique ligand identifiers.
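One possible way to do this is sketched below, under the assumption that "ligand identifier" refers to the three-letter residue name of the HETATM records. Each copy of the ligand, distinguished by chain and residue number, receives its own name (L01, L02, ...). The function name and the naming scheme are hypothetical, and the sketch does not check whether the new names collide with residues already in the file.

```python
def rename_ligand_copies(pdb_lines, ligand_name, prefix="L"):
    """Give every copy of `ligand_name` a unique three-letter residue name."""
    names = {}  # (chain, residue number + insertion code) -> new residue name
    fixed = []
    for line in pdb_lines:
        if line.startswith("HETATM") and len(line) >= 26 and line[17:20] == ligand_name:
            key = (line[21], line[22:27])
            if key not in names:
                if len(names) >= 99:
                    raise ValueError("more than 99 copies do not fit a 3-character name")
                names[key] = f"{prefix}{len(names) + 1:02d}"
            # Columns 18-20 hold the residue name.
            line = line[:17] + names[key] + line[20:]
        fixed.append(line)
    return fixed
```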

Covalent inhibitors

Covalent inhibitors are not supported by PyMemDyn. One can, however, simulate the ligand as a non-covalent entity and reintroduce the covalent bonds after the simulation.

Sugar (chains)

Sugars are often represented as chains in a PDB file. One can include sugars in the simulations, but they have to be formatted as ligands (i.e. with a single residue identifier). Note that sugars have to be parsable by LigParGen, so a size limit of 200 atoms may apply.

-

Waters

Explicit hydrogens

To include crystal waters, the hydrogens must be explicitly defined.
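A quick pre-check along these lines can be sketched as follows. It groups water atoms by chain and residue number and reports every water that does not carry exactly two hydrogens. The function name, the set of water residue names (HOH, WAT), and the rule that hydrogens are recognized by an atom name starting with "H" are assumptions and may need adjusting for a given file.

```python
from collections import defaultdict

WATER_NAMES = {"HOH", "WAT"}  # assumed water residue names


def waters_missing_hydrogens(pdb_lines, water_names=WATER_NAMES):
    """Return (chain, residue number) keys of waters without two explicit hydrogens."""
    hydrogens = defaultdict(int)
    for line in pdb_lines:
        if (line.startswith(("ATOM", "HETATM")) and len(line) >= 26
                and line[17:20] in water_names):
            key = (line[21], line[22:27].strip())
            # Atom name is in columns 13-16; drop leading digits (e.g. "1H").
            atom_name = line[12:16].strip().lstrip("0123456789")
            hydrogens[key] += atom_name.startswith("H")
    return [key for key, count in hydrogens.items() if count != 2]
```

An empty list means every water found has both hydrogens defined; any reported water would need hydrogens added, or would have to be removed, before submission.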

-

Ions

Standard naming scheme

To include ions, the standard naming scheme must be followed as documented here:

Table 1. Mapping of ions to their corresponding residue IDs (ResID).
Ion    Br  Ca  Cl            Cs  F  I  K  Li  Mg  Na            Rb
ResID  BR  CA  CL, CL-, CHL  CS  F  I  K  LI  MG  NA, NA+, SOD  RB
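To check ion names before submission, the mapping in Table 1 can be encoded as a lookup. This is a minimal sketch; the dictionary and function names are illustrative and not part of PyMemDyn, and only the names listed in Table 1 are included.

```python
# Residue IDs accepted for each ion, transcribed from Table 1.
ION_RESIDS = {
    "Br": ("BR",),
    "Ca": ("CA",),
    "Cl": ("CL", "CL-", "CHL"),
    "Cs": ("CS",),
    "F": ("F",),
    "I": ("I",),
    "K": ("K",),
    "Li": ("LI",),
    "Mg": ("MG",),
    "Na": ("NA", "NA+", "SOD"),
    "Rb": ("RB",),
}

# Reverse lookup: residue ID -> ion.
RESID_TO_ION = {resid: ion for ion, resids in ION_RESIDS.items() for resid in resids}


def ion_for_resid(resid):
    """Return the ion for a residue ID from Table 1, or None if it is not listed."""
    return RESID_TO_ION.get(resid.strip().upper())
```

Here `ion_for_resid("SOD")` returns `"Na"`, while a name that is not in the table, such as `"ZN"`, returns `None` and would need to be renamed or removed.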

-

System size

Long unstructured protein sections

Proteins with large unstructured regions may significantly increase the size of the simulation box. This increases the simulation time drastically while providing limited additional information. In the worst-case scenario, the box becomes too big and the simulation is terminated when it reaches the time-out limit (3 days). One can consider removing unstructured regions. When loops are removed, the resulting gaps are automatically connected by a flexible poly-alanine chain so that the remaining system behaves similarly.

Complex size

PyMemDyn allows for complexes of multiple proteins and has been tested on systems consisting of up to 10 proteins. While larger complexes are in theory possible, time-out limitations may occur.