Case 1: Get started with ProteinHelper

In this example, you will learn to parse PDB files with non-standard residues using ProteinHelper, and then calculate the hydrogen bond interactions using implicit hydrogens.

First of all, import the required packages.

import MDAnalysis as mda
import prolif as plf
from prolif.io.protein_helper import ProteinHelper
from prolif.molecule import Molecule

Next, let’s initialize our ProteinHelper with templates for the residues it needs help with. In this example, MSE (selenomethionine) is a non-standard residue, while MG and AGS are ligands of the complex. Providing a SMILES string for each of them lets the same ProteinHelper correct their bond orders.

protein_helper = ProteinHelper(
    [
        {
            "MSE": {"SMILES": "C[Se]CC[CH](N)C=O"},
            "MG": {"SMILES": "[Mg++]"},
            "AGS": {
                "SMILES": (
                    "c1nc(c2c(n1)n(cn2)[C@H]3[C@@H]"
                    "([C@@H]([C@H](O3)COP(=O)(O)OP(=O)(O)OP(=S)(O)O)O)O)N"
                )
            },
        }
    ]
)
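To decide which residue names need a template, it can help to scan the PDB file first. The following is a minimal sketch using only the standard library, not part of ProLIF: the helper names `pdb_atom_line` and `count_nonstandard_resnames` are hypothetical, the sample records are made up, and the "standard" set is simplified to the 20 canonical amino acids plus water. It relies on the PDB fixed-column layout, in which the residue name occupies columns 18-20.

```python
from collections import Counter

# Simplifying assumption: "standard" means the 20 canonical amino acids plus water.
STANDARD_RESNAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HOH",
}


def pdb_atom_line(record, serial, atom, resname, chain, resseq):
    """Build the first 26 columns of a PDB ATOM/HETATM record (illustration only)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def count_nonstandard_resnames(pdb_lines):
    """Count atoms per residue name that is not in STANDARD_RESNAMES."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()  # PDB columns 18-20 hold the residue name
            if resname not in STANDARD_RESNAMES:
                counts[resname] += 1
    return counts


# Made-up sample records; atom serials and residue numbers are hypothetical.
sample = [
    pdb_atom_line("ATOM", 1, "N", "ALA", "A", 1),
    pdb_atom_line("HETATM", 2, "SE", "MSE", "A", 2),
    pdb_atom_line("HETATM", 3, "MG", "MG", "A", 401),
    pdb_atom_line("HETATM", 4, "O", "HOH", "A", 501),
]
found = count_nonstandard_resnames(sample)
print(dict(found))  # {'MSE': 1, 'MG': 1}
```

With a real structure, the same function can be given an open file handle instead of the sample list. Every residue name it reports is a candidate for a SMILES template.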

Next, we need to separate the receptor (here, the protein) from the ligands. The current version of ProLIF cannot split the molecule directly, so MDAnalysis is used to write the protein and the ligands to separate files, which we then read back.

# read and write with mdanalysis
u = mda.Universe("./test_data/5da9.pdb")
u.select_atoms("protein or water").write("./test_data/5da9_protein.pdb")
u.select_atoms("not protein and not water").write("./test_data/5da9_ligand.pdb")

Now, simply use the initialized ProteinHelper to read the file into a plf.Molecule object. A warning message highlights that cysteines are assigned to their neutral protonated state.

protein_mol = protein_helper.standardize_protein("./test_data/5da9_protein.pdb")
/home/yuyang/Project_local/GSoC2025_Hbond_PM/.venv/lib/python3.11/site-packages/prolif/io/protein_helper.py:161: UserWarning: Could not guess the forcefield based on the residue names. CYS is assigned to neutral CYS (charge = 0).
  standardized_resname = self.convert_to_standard_resname(

The bond-order-corrected residues will be saved in a list in Molecule.residues, which we use for calculating fingerprints.

len(protein_mol.residues)  # number of residues in the molecule
992

To check the bond order of the residues, use plf.display_residues.

plf.display_residues(protein_mol, slice(0, 20), sanitize=False)
[Figure: 2D depictions of the first 20 protein residues]

We also read the ligands from their file.

ligands = protein_helper.standardize_protein("./test_data/5da9_ligand.pdb")

And check the bond orders:

plf.display_residues(ligands, slice(0, 20), sanitize=False)
[Figure: 2D depictions of the ligand residues]

In this case, we are interested in chain A’s AGS, so we select the specific ligand.

ligand = Molecule(ligands[1])

Then, we can calculate the hydrogen bond interactions with implicit hydrogens.

fp = plf.Fingerprint(["ImplicitHBDonor", "ImplicitHBAcceptor"], count=True)
fp.run_from_iterable([ligand], protein_mol)
df = fp.to_dataframe().T

Here is the result:

df
ligand     protein    interaction         Frame 0
AGS1402.A  ARG13.A    ImplicitHBAcceptor  1
           ASN36.A    ImplicitHBAcceptor  1
           GLY37.A    ImplicitHBDonor     1
                      ImplicitHBAcceptor  1
           SER38.A    ImplicitHBDonor     1
                      ImplicitHBAcceptor  2
           GLY39.A    ImplicitHBAcceptor  1
           LYS40.A    ImplicitHBDonor     1
                      ImplicitHBAcceptor  2
           THR41.A    ImplicitHBAcceptor  3
           THR42.A    ImplicitHBAcceptor  2
           ALA64.A    ImplicitHBDonor     1
           ASP68.A    ImplicitHBAcceptor  1
           SER1208.B  ImplicitHBDonor     1
                      ImplicitHBAcceptor  1
           GLY1210.B  ImplicitHBAcceptor  1
           GLN1211.B  ImplicitHBAcceptor  1
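Because the transposed table is indexed by ligand, protein residue, and interaction type, it can be summarized with ordinary pandas operations. The sketch below does not use ProLIF: it builds a small stand-in DataFrame (`demo_df`, a hypothetical name) from a few rows copied from the table above, then totals the counts per protein residue and picks out the chain B residues. It assumes the real `df` has the same index level names.

```python
import pandas as pd

# Stand-in for the transposed fingerprint table: a few rows copied from the
# result above, as (ligand, protein, interaction, count in frame 0).
rows = [
    ("AGS1402.A", "SER38.A", "ImplicitHBDonor", 1),
    ("AGS1402.A", "SER38.A", "ImplicitHBAcceptor", 2),
    ("AGS1402.A", "LYS40.A", "ImplicitHBDonor", 1),
    ("AGS1402.A", "LYS40.A", "ImplicitHBAcceptor", 2),
    ("AGS1402.A", "THR41.A", "ImplicitHBAcceptor", 3),
    ("AGS1402.A", "SER1208.B", "ImplicitHBDonor", 1),
    ("AGS1402.A", "SER1208.B", "ImplicitHBAcceptor", 1),
]
index = pd.MultiIndex.from_tuples(
    [row[:3] for row in rows], names=["ligand", "protein", "interaction"]
)
demo_df = pd.DataFrame({0: [row[3] for row in rows]}, index=index)
demo_df.columns.name = "Frame"

# Total interaction count per protein residue in frame 0.
per_residue = demo_df[0].groupby(level="protein").sum()

# Residues from chain B (residue identifiers end with ".B").
chain_b = [res for res in per_residue.index if res.endswith(".B")]

print(per_residue)
print(chain_b)
```

Grouping on the `interaction` level instead gives the donor and acceptor totals for the ligand.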

We can visualize the results with a 2D plot.

view = fp.plot_lignetwork(ligand, kind="frame", frame=0, display_all=False)
view

We can also visualize the interactions in 3D, drawing the water molecules as red spheres.

view = fp.plot_3d(ligand, protein_mol, frame=0, display_all=True)
view.setStyle(
    {
        "resn": "HOH",
    },
    {"sphere": {"radius": 0.5, "color": "red"}},
)
view
