Common issues in PDB structures
Most structure files downloaded from the Protein Data Bank (PDB) require some pre-processing before GROMACS can handle them. The most common issues and their solutions are listed below:
Protein
Multiple transmembrane proteins
When working with protein complexes, one must define which proteins are transmembrane. If a transmembrane protein is left out, or a non-transmembrane protein is marked as transmembrane, PPM can place the membrane incorrectly in the system.
Non-natural residues
Non-natural residues are not supported by the PyMemDyn module. Consider changing the residue to the most closely related (in size and/or charge) natural amino acid. Sometimes natural amino acids appear under alternative names and are detected as non-natural residues. In that case, renaming them to their canonical names fixes the issue.
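The renaming step can be scripted. The following is a minimal sketch, not part of PyMemDyn: the function name and the alias table are hypothetical, and the aliases listed (protonation-state and disulfide variants used by some force fields) are only examples that should be adapted to the structure at hand. It assumes the fixed-column PDB format, where the residue name occupies columns 18-20.

```python
# Hypothetical alias table (an assumption, extend as needed):
# force-field-specific residue names mapped to canonical ones.
ALIASES = {
    "HID": "HIS", "HIE": "HIS", "HIP": "HIS",
    "HSD": "HIS", "HSE": "HIS", "HSP": "HIS",
    "CYX": "CYS",
}


def canonicalize_residue_names(pdb_lines, aliases=ALIASES):
    """Return PDB lines with aliased residue names replaced by canonical ones."""
    fixed = []
    for line in pdb_lines:
        # Only coordinate records carry a residue name in columns 18-20.
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            canonical = aliases.get(line[17:20].strip())
            if canonical is not None:
                # Keep the fixed-width layout intact (3-character field).
                line = line[:17] + canonical.rjust(3) + line[20:]
        fixed.append(line)
    return fixed
```

A typical use would be to read the file with `open(path).read().splitlines()`, pass the lines through the function, and write the result to a new file so the original structure is preserved.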
Missing side chains
Many experimental PDB structures have missing side chains. These are automatically detected and added by PyMemDyn.
Chain breaks
Many experimental PDB structures contain chain breaks. These are automatically detected and connected through poly-alanine chains. This maintains the flexibility within the system while also keeping the system connected.
Discontinuous sequence
Many GPCR structures have a discontinuous sequence: the GPCR chain is numbered 1-1000, while G protein residues numbered above 1000 are interjected in the same chain. To fix this, extract the G protein and give it a unique chain identifier.
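As an illustration of this fix, the sketch below moves every residue numbered above a threshold from one chain to a new chain identifier. The function name and parameters are hypothetical; the default threshold of 1000 follows the numbering described above. It assumes the fixed-column PDB format (chain identifier in column 22, residue number in columns 23-26) and does not touch `TER` records, which may need manual adjustment.

```python
def _is_coordinate_record(line):
    """True for ATOM/HETATM lines long enough to hold chain and residue number."""
    return line.startswith(("ATOM", "HETATM")) and len(line) >= 26


def split_chain_by_numbering(pdb_lines, source_chain, new_chain, threshold=1000):
    """Reassign residues of `source_chain` numbered above `threshold` to `new_chain`."""
    pdb_lines = list(pdb_lines)
    if len(new_chain) != 1:
        raise ValueError("a PDB chain identifier is a single character")
    # Refuse to merge into a chain that already exists in the file.
    if any(_is_coordinate_record(l) and l[21] == new_chain for l in pdb_lines):
        raise ValueError(f"chain {new_chain!r} is already in use")

    out = []
    for line in pdb_lines:
        if _is_coordinate_record(line) and line[21] == source_chain:
            try:
                resseq = int(line[22:26])
            except ValueError:
                resseq = None  # malformed residue number: leave the line alone
            if resseq is not None and resseq > threshold:
                line = line[:21] + new_chain + line[22:]
        out.append(line)
    return out
```

For example, `split_chain_by_numbering(lines, "A", "G")` would relabel residues 1001 and above of chain A as chain G, provided no chain G exists yet.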
N- and C-terminus caps
Caps are not supported by the PyMemDyn module. GROMACS requires a non-capped structure and will handle capping. Often, not specifying the caps as residues when setting up the system on the webserver is enough to have PyMemDyn ignore them.
-
Ligand
Abnormal ligands
Ligands must be parsable by LigParGen to generate the appropriate force field parameters. If issues arise, one can submit the ligand to LigParGen directly to check whether it can be parsed.
Multiple identical ligands
When handling a system with multiple identical ligands, these should be given unique ligand identifiers.
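One possible way to do this is sketched below. It assumes that "unique ligand identifiers" means distinct residue names for each copy, which is an interpretation rather than something stated here. The function name is hypothetical, and the replacement names must be supplied by the user (at most three characters each, to fit the fixed-column PDB format).

```python
def rename_ligand_copies(pdb_lines, ligand_name, new_names):
    """Give each copy of `ligand_name` its own residue name, in order of appearance."""
    new_names = list(new_names)
    if any(not 1 <= len(name) <= 3 for name in new_names):
        raise ValueError("PDB residue names hold 1 to 3 characters")

    names = iter(new_names)
    assigned = {}  # (chain, residue number + insertion code) -> new residue name
    out = []
    for line in pdb_lines:
        if (line.startswith("HETATM") and len(line) >= 26
                and line[17:20].strip() == ligand_name):
            key = (line[21], line[22:27])
            if key not in assigned:
                try:
                    assigned[key] = next(names)
                except StopIteration:
                    raise ValueError("more ligand copies than names supplied") from None
            line = line[:17] + assigned[key].rjust(3) + line[20:]
        out.append(line)
    return out
```

For two copies of a ligand named `LIG`, a call such as `rename_ligand_copies(lines, "LIG", ["LGA", "LGB"])` would label the first copy `LGA` and the second `LGB`.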
Covalent inhibitors
Covalent inhibitors are not supported by PyMemDyn. One can, however, simulate the ligand as a non-covalent entity and reintroduce the covalent bonds after the simulation.
Sugar (chains)
Sugars are often represented as chains in a PDB file. One can include sugars in the simulations, but they have to be formatted as ligands (i.e. with a single residue identifier). Note that sugars have to be parsable by LigParGen, so a size limit of 200 atoms may apply.
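A quick pre-check of the atom count can be sketched as follows. The function names are hypothetical, and the sketch simply counts the `HETATM` records per residue as they appear in the file; whether the 200-atom limit includes hydrogens that are added later is not specified here, so the result is only a rough indication.

```python
from collections import Counter


def ligand_atom_counts(pdb_lines, skip=("HOH",)):
    """Count HETATM atoms per (residue name, chain, residue number)."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("HETATM") and len(line) >= 26:
            resname = line[17:20].strip()
            if resname not in skip:  # waters are not ligands
                counts[(resname, line[21], line[22:26].strip())] += 1
    return counts


def oversized_ligands(pdb_lines, max_atoms=200):
    """Return the residues whose atom count exceeds `max_atoms`."""
    counts = ligand_atom_counts(pdb_lines)
    return {key: n for key, n in counts.items() if n > max_atoms}
```

An empty dictionary from `oversized_ligands(lines)` means that no single ligand residue in the file exceeds the limit as counted.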
-
Waters
Explicit hydrogens
To include crystal waters, the hydrogens must be explicitly defined.
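Whether a structure meets this requirement can be checked with a short script. The sketch below is an illustration only: the function names are hypothetical, and the set of water residue names is an assumption (`HOH` is the usual PDB name; `WAT` and `SOL` are common alternatives). Hydrogens are recognised from the element column when it is present and from the atom name otherwise.

```python
from collections import defaultdict

WATER_NAMES = {"HOH", "WAT", "SOL"}  # assumption: adjust to the file at hand


def is_hydrogen(line):
    """Guess whether an ATOM/HETATM line describes a hydrogen atom."""
    element = line[76:78].strip().upper() if len(line) >= 78 else ""
    if element:
        return element == "H"
    # No element column: fall back to the atom name (e.g. "H1", "1H").
    name = line[12:16].strip().lstrip("0123456789").upper()
    return name.startswith("H")


def waters_missing_hydrogens(pdb_lines):
    """Return (chain, residue number) of waters with fewer than two hydrogens."""
    hydrogens = defaultdict(int)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            if line[17:20].strip() in WATER_NAMES:
                key = (line[21], line[22:26].strip())
                hydrogens[key] += int(is_hydrogen(line))
    return [key for key, count in hydrogens.items() if count < 2]
```

Any water returned by `waters_missing_hydrogens(lines)` would need its hydrogens added, or would have to be removed, before it can be included.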
-
Ions
Standard naming scheme
To include ions, the standard naming scheme must be followed as documented here:
Ion | Br | Ca | Cl | Cs | F | I | K | Li | Mg | Na | Rb |
---|---|---|---|---|---|---|---|---|---|---|---|
ResId | BR | CA | CL, CL-, CHL | CS | F | I | K | LI | MG | NA, NA+, SOD | RB |
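The table can be turned into a simple lookup to check ion names before submission. This is a sketch with hypothetical names; the accepted residue identifiers are copied from the table above, and matching is assumed to be exact (upper case, as shown).

```python
# Accepted residue identifiers per ion, transcribed from the table above.
ACCEPTED_ION_NAMES = {
    "Br": {"BR"},
    "Ca": {"CA"},
    "Cl": {"CL", "CL-", "CHL"},
    "Cs": {"CS"},
    "F": {"F"},
    "I": {"I"},
    "K": {"K"},
    "Li": {"LI"},
    "Mg": {"MG"},
    "Na": {"NA", "NA+", "SOD"},
    "Rb": {"RB"},
}


def ion_for_resname(resname):
    """Return the ion a residue identifier stands for, or None if it is not listed."""
    resname = resname.strip()
    for ion, names in ACCEPTED_ION_NAMES.items():
        if resname in names:
            return ion
    return None
```

For instance, `ion_for_resname("SOD")` identifies sodium, while a name that is not in the table, such as `ZN`, yields `None` and would need to be renamed or removed.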
-
System size
Long unstructured protein sections
Proteins with large unstructured regions may significantly increase the size of the simulation box. This drastically increases the simulation time while providing limited additional information. In the worst case, the box becomes so large that the simulation is terminated when it reaches the time-out (3 days). One can consider removing unstructured regions. If a removed region is a loop, the resulting gap is automatically bridged by a flexible poly-alanine chain so that the remaining system behaves similarly.
Complex size
PyMemDyn allows for complexes of multiple proteins and has been tested on systems consisting of up to 10 proteins. Larger complexes are possible in theory, but they may run into the time-out limit.