Trajectory handler

class utils.datasets.traj_handler.TrajectoryHandler(top_path: str | pathlib.Path, trajectory_path: Optional[Union[str, Path]] = None, ligand_name: Optional[str] = None, radius_of_interest: float = 16.0, spacing: float = 0.5, distance_cutoff: float = 5.0, warning_check: bool = True)

Bases: object

Trajectory handler for protein-ligand complex / protein-only trajectory.

This class is used to handle the trajectory of a protein-ligand complex or protein-only trajectory. It provides methods to read the trajectory, get the residues at the pocket, get the pocket center, write the structure, features, labels, interest region, and voxelised data. It also provides methods to preprocess the data and write the auxiliary files.

Parameters
  • top_path (required) – the path to the topology file (.pdb, .gro, … [MDAnalysis compatible])

  • trajectory_path (optional) – the path to the trajectory file (.trr, .xtc, … [MDAnalysis compatible]) (default: None)

  • ligand_name (optional, recommended to provide) – the name of the ligand

  • radius_of_interest (optional) – the radius (Å) of the interest region (default: 16.0)

  • spacing (optional) – the spacing (Å) between the grid points (default: 0.5, half of the 1 Å mesh spacing, following the sampling theorem)

  • distance_cutoff (optional) – a surface point is labelled only if its distance to the ligand’s heavy atoms is within this cutoff (default: 5.0 Å)

  • warning_check (optional) – if True, the warnings will be shown (default = True)

Returns

  • self.top_path – set to top_path

  • self.trajectory_path – set to trajectory_path

  • self.ligand_name – set to ligand_name

  • self.universe – set to the MDAnalysis Universe object

  • self.warning_check – set to warning_check
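
A minimal sketch of how the default spacing relates to the mesh spacing, reading "sampling theorem" as the Nyquist criterion (sample at least twice per mesh interval). The helper names nyquist_spacing and voxels_per_axis, and the cubic grid of side 2 × radius_of_interest, are assumptions for illustration and are not part of TrajectoryHandler.

```python
import math


def nyquist_spacing(mesh_spacing: float) -> float:
    """Largest grid spacing that samples each mesh interval twice."""
    return mesh_spacing / 2.0


def voxels_per_axis(radius_of_interest: float, spacing: float) -> int:
    """Voxels along one axis of a cube of side 2 * radius (assumed grid shape)."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    return math.ceil(2.0 * radius_of_interest / spacing)


spacing = nyquist_spacing(1.0)       # 0.5 Å for a 1 Å mesh
n = voxels_per_axis(16.0, spacing)   # class defaults
print(spacing, n, n ** 3)            # 0.5 64 262144
```

Under these assumptions, halving the spacing multiplies the total number of voxels by eight, which is worth keeping in mind when changing the defaults.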

Note

Functions:

  • high-level functions (can use self.variables and self.functions)

  • low-level functions (can only use self.functions)

add_interest_region_to_ply(ply_path: str | pathlib.Path, ply_path_output: Optional[Union[str, Path]] = None)

Add the interest region to a PLY file.

Parameters
  • ply_path – str, the path to the input PLY file.

  • ply_path_output – str, the path to the output PLY file. If not provided, the input PLY file will be overwritten.

Returns

Saves the PLY file with the interest region to ply_path_output if it is provided, otherwise to ply_path.
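
As a rough illustration of what an interest region means here, the sketch below flags vertices that lie within radius_of_interest of a pocket center. The function name in_interest_region and the spherical-region test are assumptions; the way the flag is actually stored in the PLY file is not shown in this documentation.

```python
import math


def in_interest_region(point, pocket_center, radius_of_interest=16.0):
    """True if `point` lies within `radius_of_interest` (Å) of the pocket center."""
    return math.dist(point, pocket_center) <= radius_of_interest


# Hypothetical surface vertices and pocket center (Å)
vertices = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (3.0, 4.0, 0.0)]
center = (0.0, 0.0, 0.0)

flags = [int(in_interest_region(v, center)) for v in vertices]
print(flags)  # [1, 0, 1]
```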

add_labels_to_ply(ply_path: str | pathlib.Path, ref_ligand_frame: int, ply_path_output: Optional[Union[str, Path]] = None)

Add the labels to a PLY file.

Parameters
  • ply_path – str, the path to the PLY file

  • ref_ligand_frame – int, the frame number to get the reference ligand for the surface.

  • ply_path_output – str, the path to the output PLY file. If not provided, the input PLY file will be overwritten.

Returns

Saves the PLY file with the labels to ply_path_output if it is provided, otherwise to ply_path.
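
The role of distance_cutoff in labelling can be sketched as follows. This is an illustrative assumption, not the library's implementation: each surface point takes the label of its nearest ligand heavy atom, and points farther than the cutoff from every heavy atom get label 0. The function name label_surface_points and the per-atom label list are hypothetical.

```python
import math


def label_surface_points(points, heavy_atoms, atom_labels, distance_cutoff=5.0):
    """Label each point with the label of its nearest heavy atom.

    Points farther than `distance_cutoff` (Å) from every heavy atom get 0.
    Assumption: nearest-atom assignment; the library may use another rule.
    """
    labels = []
    for p in points:
        dists = [math.dist(p, a) for a in heavy_atoms]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        labels.append(atom_labels[nearest] if dists[nearest] <= distance_cutoff else 0)
    return labels


heavy_atoms = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
atom_labels = [1, 2]  # hypothetical fragment label per heavy atom
surface = [(0.0, 3.0, 0.0), (12.0, 0.0, 0.0), (6.0, 0.0, 0.0)]

print(label_surface_points(surface, heavy_atoms, atom_labels))  # [1, 0, 2]
```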

align_traj_to_pocket(reference: Optional[Union[Universe, AtomGroup, int]] = None, select_Hs: bool = False, update_pocket_center: bool = True)

Use the pocket resids to align the trajectory to the pocket.

Requirement: residues_at_pocket.

Parameters
  • reference – MDAnalysis Universe object, AtomGroup object, or int, the reference to align the trajectory

  • select_Hs – bool, if True, the H atoms will be selected

  • update_pocket_center – bool, if True, the pocket center will be updated after the alignment (default: True)

Returns

Aligns the trajectory to the pocket; self.universe is updated accordingly.

get_complex()

[Require ligand name] Get the complex from the trajectory (including protein, ligand, and protein + ligand) by MDAnalysis selection.

Returns

  • self.ligand – set to the ligand

  • self.protein – set to the protein

  • self.complex – set to the complex (protein + ligand)

get_frame(frame_number: int)

Get the frame of the trajectory.

Parameters

frame_number – int, the frame number to get

Returns

self.universe.trajectory was set to the frame_number

get_ligand()

[Require ligand name] Get the ligand from the trajectory by MDAnalysis selection.

Returns

self.ligand was set to the ligand

get_pocket_center(frame: int = 0)

[Require ligand name] Get the pocket center at a specific frame (default = 0).

Parameters

frame – int, the frame number to get the pocket center

Returns

  • self.pocket_center – set to the pocket center

  • self.pocket_center_str – set to the pocket center in string format

Note

The deepdrug3d version of the pocket center calculation is deprecated. Instead, MDAnalysis is used to calculate the center of geometry.
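
For reference, a center of geometry is the unweighted mean of the atom positions (in MDAnalysis, AtomGroup.center_of_geometry() computes it). The plain-Python sketch below shows the arithmetic only; which atoms the handler passes in is not stated here, so the ligand coordinates are a hypothetical input.

```python
def center_of_geometry(coords):
    """Unweighted mean position of a sequence of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates given")
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


# Hypothetical ligand atom positions (Å)
ligand_coords = [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0), (2.0, 5.0, 2.0)]
print(center_of_geometry(ligand_coords))  # (2.0, 3.0, 2.0)
```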

get_pocket_residues()

[Require self.residues_at_pocket] Get the residues at the pocket from the trajectory by MDAnalysis selection.

Returns

self.pocket_residues (MDAnalysis AtomGroup) was set to the residues at the pocket

get_protein()

Get the protein from the trajectory by MDAnalysis selection.

Returns

self.protein was set to the protein

get_residues_at_pocket(ligand_aa_dist: int = 5, aa_existence_time: float = 0.5)

[Require ligand name] Get the resnames of the anchored residues at the pocket over a trajectory.

Parameters
  • ligand_aa_dist – int, the distance (Å) from the ligand to consider the residues.

  • aa_existence_time – float, the fraction of the trajectory that the residue should be present to be considered.

Returns

  • self.residues_at_pocket – set to the residues at the pocket

  • self.residues_at_pocket_str – set to the residues at the pocket in string format
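
The aa_existence_time filter can be pictured with the sketch below. It assumes the per-frame contacts (resids within ligand_aa_dist of the ligand) have already been computed, and it treats the threshold as inclusive; both the helper name persistent_residues and these details are assumptions rather than the library's code.

```python
from collections import Counter


def persistent_residues(contacts_per_frame, aa_existence_time=0.5):
    """Resids present in at least `aa_existence_time` of the frames.

    `contacts_per_frame` holds, per frame, the resids found within the
    ligand distance cutoff (hypothetical precomputed input).
    Assumption: the threshold is inclusive (>=).
    """
    n_frames = len(contacts_per_frame)
    if n_frames == 0:
        return []
    counts = Counter(r for frame in contacts_per_frame for r in set(frame))
    return sorted(r for r, c in counts.items() if c / n_frames >= aa_existence_time)


frames = [{10, 11, 45}, {10, 45}, {10, 12}, {10, 11}]
print(persistent_residues(frames))  # [10, 11, 45]
```

Residue 12 appears in only one of four frames (25%) and is dropped, while residues 11 and 45 sit exactly at the 50% threshold and are kept.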

get_residues_at_pocket_by_center(pocket_center: Optional[list] = None)

Get the residues at the pocket by the pocket center and the radius (self.radius_of_interest).

Parameters
  • pocket_center – list, the pocket center. The default is None, which will attempt to use the pocket center stored in the trajectory handler.

Returns

  • self.residues_at_pocket – set to the residues at the pocket

  • self.residues_at_pocket_str – set to the residues at the pocket in string format

preprocess_workflow(pdb_path: str | pathlib.Path, ply_path: str | pathlib.Path, h5_path: str | pathlib.Path, frame: int = 0, with_label: bool = True)

Preprocessing workflow for a frame, including writing the structure, features, labels, interest region, and voxelised data.

Parameters
  • pdb_path – str, the path to the PDB file

  • ply_path – str, the path to the PLY file

  • h5_path – str, the path to the H5 file

  • frame – int, the frame number to get the features

  • with_label – bool, if True, the labels will be included in the H5 file

Returns

  • Saves the PDB file to pdb_path

  • Saves the PLY file with the MASIF features to ply_path

  • Saves the H5 file to h5_path with [‘raw’], or with [‘raw’ and ‘label’] if with_label is True

Note

  • raw: the voxelised data of the features

  • label: the voxelised data of the labels (if with_label is True)

read_fragment_aux_file(aux_file_path: Optional[Union[str, Path]] = None)

Read the fragments from an auxiliary file.

Parameters

aux_file_path – str, the path to the auxiliary file (format: json), if not provided, use the default example file

Returns

self.labels_info was set to the fragments

Example:

from ProBiSEnSe.utils.datasets.traj_handler import TrajectoryHandler
traj_handler = TrajectoryHandler(
    top_path="example.pdb",
    trajectory_path="example.xtc",
    ligand_name="LIG",
)
aux_file_path = "example.json"
traj_handler.read_fragment_aux_file(aux_file_path)
print(traj_handler.label_fragment_info)
# Output:
# {
#     "0": {
#         "name": "out of the threshold"
#     },
#     "1": {
#         "name": "fragment 1",
#         "fragments_idx": [0, 1, 2, 3, 30, 45, 52]
#     },
#     "2": {
#         "name": "fragment 2",
#         "fragments_idx": [4, 5, 6, 7, 26, 27, 29, 31, 32, 43, 44, 46, 47, 50]
#     }
# }
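
A small sketch of how a fragment file with the structure shown above could be parsed and inverted into an index-to-label lookup, using only the standard json module. Interpreting fragments_idx as atom indices, and the helper name atom_to_label, are assumptions made for illustration.

```python
import json

# Same structure as the example output above
aux_text = """
{
    "0": {"name": "out of the threshold"},
    "1": {"name": "fragment 1", "fragments_idx": [0, 1, 2, 3, 30, 45, 52]},
    "2": {"name": "fragment 2",
          "fragments_idx": [4, 5, 6, 7, 26, 27, 29, 31, 32, 43, 44, 46, 47, 50]}
}
"""


def atom_to_label(fragments: dict) -> dict:
    """Invert {label: {"fragments_idx": [...]}} into {index: label}."""
    mapping = {}
    for label, info in fragments.items():
        for idx in info.get("fragments_idx", []):  # label "0" has no indices
            mapping[idx] = int(label)
    return mapping


fragments = json.loads(aux_text)
lookup = atom_to_label(fragments)
print(lookup[30], lookup[44], lookup.get(99, 0))  # 1 2 0
```
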
read_pocket_aux_file(aux_file_path: str | pathlib.Path)

Read the residues at the pocket and the pocket center from an auxiliary file.

Parameters

aux_file_path – str, the path to the auxiliary file

Returns

  • self.residues_at_pocket – set to the residues at the pocket

  • self.residues_at_pocket_str – set to the residues at the pocket in string format

  • self.pocket_center – set to the pocket center

  • self.pocket_center_str – set to the pocket center in string format

read_pocket_from_string(residues_at_pocket_str: Optional[str] = None, pocket_center_str: Optional[str] = None)

Read the residues at the pocket and the pocket center from strings.

Parameters
  • residues_at_pocket_str – str, the residues at the pocket in string format (default: None)

  • pocket_center_str – str, the pocket center in string format (default: None)

Returns

  • self.residues_at_pocket – set to the residues at the pocket

  • self.residues_at_pocket_str – set to the residues at the pocket in string format

  • self.pocket_center – set to the pocket center

  • self.pocket_center_str – set to the pocket center in string format

set_config(radius_of_interest: Optional[float] = None, spacing: Optional[float] = None, distance_cutoff: Optional[float] = None)

Set the configuration (radius of interest, spacing, distance_cutoff), and detect whether the trajectory has multiple segids.

Parameters
  • radius_of_interest – float (recommended: 16), the radius (Å) of the interest region

  • spacing – float (recommended: 0.5), the spacing (Å) between the grid points

  • distance_cutoff – float (recommended: 5), the distance cutoff (Å); a surface point is labelled only if its distance to the ligand’s heavy atoms is within this cutoff.

Returns

  • self.radius_of_interest – set to radius_of_interest

  • self.spacing – set to spacing

  • self.distance_cutoff – set to distance_cutoff

write_features_to_ply(pdb_path: str | pathlib.Path, ply_path: str | pathlib.Path, frame: Optional[int] = None)

Write the MASIF features to a PLY file. If the PDB file does not exist, it will be created from the trajectory.

Parameters
  • pdb_path – str, the path to the PDB file

  • ply_path – str, the path to the PLY file

  • frame – int, the frame number to get the features

Returns

Save the PLY file with the MASIF features in ply_path

write_pocket_aux_file(aux_file_path: str | pathlib.Path)

Write the residues at the pocket and the pocket center to an auxiliary file.

Parameters

aux_file_path – str, the path to the auxiliary file

Returns

Save the auxiliary file in aux_file_path

write_structure(pdb_path: str | pathlib.Path, frame: int, structure_type: Literal['complex', 'protein', 'ligand'] = 'protein', fragmentation: bool = False)

Write the structure as a PDB file for a specific frame.

Parameters
  • pdb_path – str, the path to the PDB file

  • frame – int, the frame number to get the structure

  • structure_type – str, the type of structure to write (complex, protein, ligand)

  • fragmentation – bool, if True, the structure will be fragmented

Returns

Save the PDB file in pdb_path

write_trajectory(traj_path: str | pathlib.Path, start_frame: int = 0, end_frame: Optional[int] = None, structure_type: Optional[Literal['complex', 'protein', 'ligand']] = None, step: int = 1)

Write the trajectory to a trajectory file.

Parameters
  • traj_path – str, the path to the trajectory file

  • start_frame – int, the frame number to start

  • end_frame – int, the frame number to end. If None, it will be the total number of frames.

  • structure_type – str, the type of structure to write (complex, protein, ligand). If None, it will be all atoms.

  • step – int, the step to write the frames

Returns

Save the trajectory file in traj_path.
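
How start_frame, end_frame and step might combine can be sketched with a plain range. Whether end_frame is inclusive or exclusive is not stated above; this sketch assumes it is exclusive, as in Python slicing, and the helper name frames_to_write is hypothetical.

```python
def frames_to_write(n_frames, start_frame=0, end_frame=None, step=1):
    """Frame indices selected by start/end/step.

    Assumption: `end_frame` is exclusive, as in Python slicing; the library
    may treat it as inclusive.
    """
    if end_frame is None:
        end_frame = n_frames  # default: run to the end of the trajectory
    return list(range(start_frame, min(end_frame, n_frames), step))


print(frames_to_write(10, start_frame=2, step=3))  # [2, 5, 8]
print(frames_to_write(10, end_frame=4))            # [0, 1, 2, 3]
```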

write_voxelised_data_to_h5(ply_path: str | pathlib.Path, h5_path: str | pathlib.Path, with_label: bool = True)

Write the surface vertices into voxelised data and save in an H5 file.

Parameters
  • ply_path – str, the path to the input PLY file

  • h5_path – str, the path to the output H5 file

  • with_label – bool, if True, the labels will be included in the H5 file

Returns

Save the H5 file in h5_path with [‘raw’] or [‘raw’ and ‘label’] (if with_label is True)

Note

  • raw: the voxelised data of the features

  • label: the voxelised data of the labels (if with_label is True)
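
To illustrate what voxelising surface vertices involves, the sketch below maps a 3D point to integer voxel indices on a regular grid. The cubic grid of side 2 × radius_of_interest centered on the pocket, and the function name voxel_index, are assumptions; the actual grid layout used for the ‘raw’ and ‘label’ datasets is not described here.

```python
import math


def voxel_index(point, pocket_center, radius_of_interest=16.0, spacing=0.5):
    """Map a 3D point to integer voxel indices, or None if outside the grid.

    Assumption: the grid is a cube of side 2 * radius centered on the pocket.
    """
    n = math.ceil(2.0 * radius_of_interest / spacing)  # voxels per axis
    idx = []
    for x, c in zip(point, pocket_center):
        i = math.floor((x - (c - radius_of_interest)) / spacing)
        if not 0 <= i < n:
            return None
        idx.append(i)
    return tuple(idx)


center = (10.0, 10.0, 10.0)
print(voxel_index((10.0, 10.0, 10.0), center))  # (32, 32, 32)
print(voxel_index((-6.0, 10.2, 25.9), center))  # (0, 32, 63)
print(voxel_index((30.0, 10.0, 10.0), center))  # None
```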

utils.datasets.traj_handler.check_standard_names(u: Universe)

Check the resnames and atom names for the topology.

Parameters

u – The input MDAnalysis Universe.

Returns

None

utils.datasets.traj_handler.convert_to_standard_names(PDB_path: str | pathlib.Path)

Convert the resnames and atom names in the PDB file to standard ones.

Parameters

PDB_path – The path to the input PDB file.

Returns

None. The converted PDB file will be saved to the same directory with the suffix “_converted.pdb”.
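
The output location described above can be derived with pathlib. This sketch assumes that “_converted.pdb” replaces the original “.pdb” extension (so example.pdb becomes example_converted.pdb); the helper name converted_path is hypothetical.

```python
from pathlib import Path


def converted_path(pdb_path):
    """Path of the converted file, next to the input, ending in "_converted.pdb".

    Assumption: the suffix replaces the original ".pdb" extension.
    """
    p = Path(pdb_path)
    return p.with_name(p.stem + "_converted.pdb")


print(converted_path("data/example.pdb").as_posix())  # data/example_converted.pdb
```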

utils.datasets.traj_handler.get_ligand_around_resids(u: Universe, ligand_name: str, ligand_aa_dist: int, aa_existence_time: float = 0.5, with_segid: bool = False) -> list

Get the residues around the ligand over a trajectory.

Parameters
  • u (required) – The input MDAnalysis Universe, representing the molecular system.

  • ligand_name (required) – The name of the ligand to find residues around.

  • ligand_aa_dist – The distance (Å) from the ligand to consider residues as nearby.

  • aa_existence_time – The fraction of the trajectory during which a residue must be present near the ligand to be included (default: 0.5, i.e. 50%).

  • with_segid – If True, includes the segment ID (segid) in the returned residues.

Returns

A list of the residues around the ligand: resids only if with_segid is False; segids and resids if with_segid is True.

Return type

list

utils.datasets.traj_handler.get_resname_with_resid(u: Universe, resids: list) -> list

Get the residue name with the residue ID.

Parameters
  • u – The input MDAnalysis Universe.

  • resids – The residue IDs to get the residue names.

Returns

A list of the residue names with the residue IDs.

Return type

list

Example:

from MDAnalysis import Universe
from ProBiSEnSe.utils.datasets.traj_handler import get_resname_with_resid

u = Universe("example.pdb")
resids = [1, 2, 3]
resname_with_resid = get_resname_with_resid(u, resids)
print(resname_with_resid)
# Output:
# ['ALA1', 'ARG2', 'GLU3']