Case 3: Benchmark and Method validation#

In this example, we show how to compare the results from implicit and explicit H-bond methods.

Import the required packages#

import os

import prolif as plf
from prolif import sdf_supplier
from prolif.io.protein_helper import ProteinHelper
from prolif.molecule import Molecule
from prolif.plotting.complex3d import Complex3D
from rdkit import Chem
/home/yuyang/Project_local/GSoC2025_Hbond_PM/.venv/lib/python3.11/site-packages/MDAnalysis/topology/tables.py:52: DeprecationWarning: Deprecated in version 2.8.0
MDAnalysis.topology.tables has been moved to MDAnalysis.guesser.tables. This import point will be removed in MDAnalysis version 3.0.0
  warnings.warn(wmsg, category=DeprecationWarning)
# custom functions for comparison and plotting
from validation.utils.metrics import (
    confusion_matrix,
    get_interactions,
    plot_confusion_matrix,
    tanimoto_coefficient_by_confusion_matrix,
)

We also use the protein helper function to read the files.

protein_helper = ProteinHelper(
    templates=[
        {
            "MSE": {"SMILES": "C[Se]CC[CH](N)C=O"},
        }
    ]
)

The test case is from PLINDER: 5da9__1__1.A_1.B__1.E_1.F. We have prepared the protonated file in the repository.

test_case_dir = "./test_data/5da9__1__1.A_1.B__1.E_1.F"

Explicit H-bond method#

First, we read the molecule from the path.

protein_mol = Molecule.from_rdkit(
    Chem.MolFromPDBFile(
        f"{test_case_dir}/receptor_protonated.pdb",
        sanitize=False,
        removeHs=False,
        proximityBonding=True,
    )
)
protein_mol = protein_helper.standardize_protein(protein_mol)

Then, we check the length of the residues.

protein_mol.residues.__len__()
834

We also check with the residues to know whether they are fixed with our protein helper as expected.

plf.display_residues(protein_mol, slice(0, 10), sanitize=False)
../_images/3be890db4f0ef97ad1d638cebd8bda8fc910f2b41b8208493e2347fa78dafa2f.svg

We next load all ligands in the directory.

ligands = []
for ligand_sdf in os.listdir(test_case_dir):  # noqa: PTH208
    if ligand_sdf.endswith("_protonated.sdf"):
        ligands.extend(sdf_supplier(f"{test_case_dir}/{ligand_sdf}"))

There are two ligands in our directory.

ligands
[<prolif.molecule.Molecule with 1 residues and 47 atoms at 0x7fb4128bb3d0>,
 <prolif.molecule.Molecule with 1 residues and 1 atoms at 0x7fb4129c3510>]
plf.display_residues(ligands[0])
../_images/3c68e866734c86a7c444c8ee7bb6060e1b482ef2a07d30925fb6e052911d6a80.svg

We currently focus on the cation/anion one.

ligand = ligands[0]

Then, we calculate the H-bond interactions (using explicit hydrogens).

fp = plf.Fingerprint(["HBDonor", "HBAcceptor"], count=True)
fp.run_from_iterable([ligand], protein_mol, progress=False)
df = fp.to_dataframe().T

Here is the result.

df
Frame 0
ligand protein interaction
UNL1 ARG13.A HBAcceptor 1
ASN36.A HBAcceptor 1
GLY37.A HBAcceptor 1
GLY39.A HBAcceptor 1
LYS40.A HBAcceptor 1
THR41.A HBAcceptor 1
THR42.A HBAcceptor 1
ALA64.A HBDonor 1
ASP68.A HBAcceptor 1
GLN159.A HBAcceptor 1
ARG431.A HBAcceptor 1
GLY344.B HBAcceptor 1
GLN345.B HBAcceptor 1

2D visualization#

We visualize the result into an 2D plot.

view = fp.plot_lignetwork(ligand, kind="frame", frame=0, display_all=False)
view

3D visualization#

The result can also be visualized in 3D plot.

view = fp.plot_3d(ligand, protein_mol, frame=0, display_all=True)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7fb4128211d0>

Implicit H-bond method#

To have a better comparison, we simply remove the hydrogens from the object instead of reading a new file without explicit hydrogens.

protein_mol_i = protein_helper.standardize_protein(
    Molecule.from_rdkit(Chem.RemoveAllHs(protein_mol))
)

Again, we check the residues after our protein helper.

plf.display_residues(protein_mol_i, slice(0, 10), sanitize=False)
../_images/1ce65be7e07213bee5abf4c217ef10bbbec282c94f29d3c852c6579660256f51.svg

Same for the ligand, we remove the hydrogens.

ligand_i = Molecule.from_rdkit(Chem.RemoveAllHs(ligand))

We then can recalculate the H-bond interations (but with implicit hydrogens).

fp_i = plf.Fingerprint(
    ["ImplicitHBDonor", "ImplicitHBAcceptor"],
    count=True,
)
fp_i.run_from_iterable([ligand_i], protein_mol_i, progress=False)
df_i = fp_i.to_dataframe().T

Here is the result.

df_i
Frame 0
ligand protein interaction
UNL1 ARG13.A ImplicitHBAcceptor 1
ASN36.A ImplicitHBAcceptor 1
GLY37.A ImplicitHBDonor 1
ImplicitHBAcceptor 1
SER38.A ImplicitHBDonor 1
ImplicitHBAcceptor 2
GLY39.A ImplicitHBAcceptor 1
LYS40.A ImplicitHBDonor 1
ImplicitHBAcceptor 2
THR41.A ImplicitHBAcceptor 3
THR42.A ImplicitHBAcceptor 2
ALA64.A ImplicitHBDonor 1
ASP68.A ImplicitHBAcceptor 1
SER342.B ImplicitHBDonor 1
ImplicitHBAcceptor 1
GLY344.B ImplicitHBAcceptor 1
GLN345.B ImplicitHBAcceptor 1

2D visualization#

view = fp_i.plot_lignetwork(ligand_i, kind="frame", display_all=False)
view

3D visualization#

view = fp_i.plot_3d(ligand_i, protein_mol_i, frame=0, display_all=True)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7fb412a73150>

Comparison between explicit and implicit H-bond methods#

For the 3D visualization, we can plot the results side by side using compare function. The left panel shows the H-bond interations from explicit hydrogens and the right panel is those from implicit hydrogens.

# create Complex3D objects (left: explicit)
comp3D = Complex3D.from_fingerprint(fp, ligand, protein_mol, frame=0)
# (right: implicit)
other_comp3D = Complex3D.from_fingerprint(fp_i, ligand_i, protein_mol_i, frame=0)

# compare the two Complex3D objects
view = comp3D.compare(other_comp3D, display_all=True)
view

3Dmol.js failed to load for some reason. Please check your browser console for error messages.

<prolif.plotting.complex3d.Complex3D at 0x7fb4128b0a90>

For the interation pairs, we can covert the result into two sets explicit_set and implicit_set. Then, we can know how many interactions detected by the explicit H-bond method overlap with those detected by the implicit H-bond method.

explicit_set = get_interactions(df)
implicit_set = get_interactions(df_i)

matrix = confusion_matrix(explicit_set, implicit_set)
tm_coef = tanimoto_coefficient_by_confusion_matrix(matrix)

The Tanimoto coefficient (also called intersection over union) is a measure used to evaluate performance. A higher score indicates better performance.

print("Tanimoto coefficient:", tm_coef)  # noqa: T201
fig, ax = plot_confusion_matrix(matrix)
fig.show()
Tanimoto coefficient: 0.5789473684210527
../_images/824bbdd132b96b3f30dc26c706e6c88edef67e38952d9625219e4b64a00c7ec6.png