Trajectory handler
- class utils.datasets.traj_handler.TrajectoryHandler(top_path: str | pathlib.Path, trajectory_path: Optional[Union[str, Path]] = None, ligand_name: Optional[str] = None, radius_of_interest: float = 16.0, spacing: float = 0.5, distance_cutoff: float = 5.0, warning_check: bool = True)
Bases: object
Trajectory handler for protein-ligand complex or protein-only trajectories.
This class handles the trajectory of a protein-ligand complex or of a protein alone. It provides methods to read the trajectory, get the residues at the pocket and the pocket center, and write the structure, features, labels, interest region, and voxelised data. It also provides methods to preprocess the data and to write auxiliary files.
- Parameters
top_path (required) – the path to the topology file (.pdb, .gro, … [MDAnalysis compatible])
trajectory_path (required) – the path to the trajectory file (.trr, .xtc, … [MDAnalysis compatible])
ligand_name (optional, recommended to provide) – the name of the ligand
radius_of_interest (optional) – the radius (Å) of the interest region (default: 16.0 Å)
spacing (optional) – the spacing (Å) between grid points (default: 0.5 Å, chosen as half of the 1 Å mesh spacing, following the sampling theorem)
distance_cutoff (optional) – a surface point is labelled only if its distance to at least one of the ligand's heavy atoms is within this cutoff (default: 5.0 Å)
warning_check (optional) – if True, warnings are shown (default: True)
- Returns
self.top_path was set to the top_path
self.trajectory_path was set to the trajectory_path
self.ligand_name was set to the ligand_name
self.universe was set to the MDAnalysis Universe object
self.warning_check was set to the warning_check
Note
- Functions:
high-level functions (can use self.variables and self.functions):
low-level functions (can only use self.functions):
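As a rough illustration of how radius_of_interest and spacing interact, the sketch below counts the grid points along each axis of a cubic grid spanning the interest region. It assumes the grid covers [-radius_of_interest, +radius_of_interest] around the pocket center with both endpoints included; the helper name grid_points_per_axis is hypothetical and the handler's actual grid construction may differ.

```python
def grid_points_per_axis(radius_of_interest: float = 16.0, spacing: float = 0.5) -> int:
    """Hypothetical helper: number of grid points along one axis of a cube
    spanning [-radius_of_interest, +radius_of_interest], endpoints included."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    # round() guards against floating-point error before truncating to int
    return int(round(2 * radius_of_interest / spacing)) + 1


n = grid_points_per_axis()  # defaults: 16.0 Å radius, 0.5 Å spacing
print(n, n ** 3)  # 65 274625
```

With spacing=1.0 the same radius gives 33 points per axis (35,937 grid points), so halving the spacing increases the number of grid points roughly eightfold.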
- add_interest_region_to_ply(ply_path: str | pathlib.Path, ply_path_output: Optional[Union[str, Path]] = None)
Add the interest region to a PLY file.
- Parameters
ply_path – str, the path to the input PLY file.
ply_path_output – str, the path to the output PLY file. If not provided, the input PLY file will be overwritten.
- Returns
Save the PLY file with the interest region in ply_path_output, or in ply_path if ply_path_output is not provided.
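A minimal sketch of what marking the interest region could look like for a list of surface vertices: each vertex gets a flag that is True if it lies within radius_of_interest of the pocket center. The function name interest_region_mask and the per-vertex boolean flag are assumptions for illustration; the PLY property name and the exact criterion used by the handler are not documented here.

```python
import math
from typing import Sequence

Point = Sequence[float]


def interest_region_mask(vertices: Sequence[Point], pocket_center: Point,
                         radius_of_interest: float = 16.0) -> list[bool]:
    """Hypothetical: flag vertices within radius_of_interest (Å) of pocket_center."""
    return [math.dist(v, pocket_center) <= radius_of_interest for v in vertices]


verts = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0), (0.0, 0.0, 16.0)]
print(interest_region_mask(verts, (0.0, 0.0, 0.0)))  # [True, False, True]
```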
- add_labels_to_ply(ply_path: str | pathlib.Path, ref_ligand_frame: int, ply_path_output: Optional[Union[str, Path]] = None)
Add the labels to a PLY file.
- Parameters
ply_path – str, the path to the PLY file
ref_ligand_frame – int, the frame number to get the reference ligand for the surface.
ply_path_output – str, the path to the output PLY file. If not provided, the input PLY file will be overwritten.
- Returns
Save the PLY file with the labels in ply_path_output, or in ply_path if ply_path_output is not provided.
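The labelling rule behind distance_cutoff can be sketched as follows: a surface vertex takes the fragment label of its nearest ligand heavy atom if that atom is within distance_cutoff, and label 0 ("out of the threshold") otherwise. Assigning the nearest atom's fragment is an assumption for illustration, as are the function name label_vertices and the atom_to_fragment mapping (modelled on the fragments_idx lists read by read_fragment_aux_file).

```python
import math
from typing import Mapping, Sequence

Point = Sequence[float]


def label_vertices(vertices: Sequence[Point],
                   heavy_atoms: Sequence[Point],
                   atom_to_fragment: Mapping[int, int],
                   distance_cutoff: float = 5.0) -> list[int]:
    """Hypothetical labelling: fragment label of the nearest heavy atom within
    distance_cutoff (Å), or 0 if no heavy atom is close enough."""
    labels = []
    for v in vertices:
        best_idx, best_dist = None, math.inf
        for idx, atom in enumerate(heavy_atoms):
            d = math.dist(v, atom)
            if d < best_dist:
                best_idx, best_dist = idx, d
        if best_idx is not None and best_dist <= distance_cutoff:
            # Atoms missing from the mapping fall back to label 0
            labels.append(atom_to_fragment.get(best_idx, 0))
        else:
            labels.append(0)
    return labels


atoms = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
frag = {0: 1, 1: 2}
verts = [(1.0, 0.0, 0.0), (7.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(label_vertices(verts, atoms, frag))  # [1, 2, 0]
```

The brute-force double loop keeps the sketch dependency-free; for real surfaces with many vertices a spatial index (for example a k-d tree) would be the usual choice.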
- align_traj_to_pocket(reference: Optional[Union[Universe, AtomGroup, int]] = None, select_Hs: bool = False, update_pocket_center: bool = True)
Align the trajectory to the pocket, using the pocket resids as the alignment selection.
Requires: self.residues_at_pocket.
- Parameters
reference – MDAnalysis Universe object, AtomGroup object, or int, the reference to align the trajectory
select_Hs – bool, if True, the H atoms will be selected (default: False)
update_pocket_center – bool, if True, the pocket center will be updated after the alignment (default: True)
- Returns
Align the trajectory to the pocket; self.universe is updated in place.
- get_complex()
[Require ligand name] Get the protein, the ligand, and the complex (protein + ligand) from the trajectory by MDAnalysis selection.
- Returns
self.ligand was set to the ligand
self.protein was set to the protein
self.complex was set to the complex (protein + ligand)
- get_frame(frame_number: int)
Get the frame of the trajectory.
- Parameters
frame_number – int, the frame number to get
- Returns
self.universe.trajectory was moved to frame frame_number
- get_ligand()
[Require ligand name] Get the ligand from the trajectory by MDAnalysis selection.
- Returns
self.ligand was set to the ligand
- get_pocket_center(frame: int = 0)
[Require ligand name] Get the pocket center at a specific frame (default = 0).
- Parameters
frame – int, the frame number to get the pocket center
- Returns
self.pocket_center was set to the pocket center
self.pocket_center_str was set to the pocket center in string format
Note
The DeepDrug3D method for calculating the pocket center is deprecated; the pocket center is now calculated as the center of geometry with MDAnalysis.
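The center of geometry is the unweighted mean of the atomic coordinates (unlike the center of mass, which weights each atom by its mass); in MDAnalysis it is returned by AtomGroup.center_of_geometry(). The stdlib sketch below computes it for a list of coordinates and formats it as a comma-separated string. The string layout is a hypothetical stand-in for pocket_center_str, whose actual format is not documented here.

```python
from typing import Sequence


def center_of_geometry(coords: Sequence[Sequence[float]]) -> list[float]:
    """Unweighted mean of the x, y, z coordinates."""
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    return [sum(c[axis] for c in coords) / n for axis in range(3)]


def format_center(center: Sequence[float]) -> str:
    """Hypothetical string format for a pocket center, e.g. '1.000,2.000,3.000'."""
    return ",".join(f"{x:.3f}" for x in center)


ligand_coords = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
center = center_of_geometry(ligand_coords)
print(center)                 # [1.0, 2.0, 3.0]
print(format_center(center))  # 1.000,2.000,3.000
```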
- get_pocket_residues()
[Require self.residues_at_pocket] Get the residues at the pocket from the trajectory by MDAnalysis selection.
- Returns
self.pocket_residues (MDAnalysis AtomGroup) was set to the residues at the pocket
- get_protein()
Get the protein from the trajectory by MDAnalysis selection.
- Returns
self.protein was set to the protein
- get_residues_at_pocket(ligand_aa_dist: int = 5, aa_existence_time: float = 0.5)
[Require ligand name] Get the resnames of the anchored residues at the pocket over a trajectory.
- Parameters
ligand_aa_dist – int, the distance (Å) from the ligand within which residues are considered (default: 5)
aa_existence_time – float, the minimum fraction of the trajectory during which a residue must be near the ligand to be considered (default: 0.5)
- Returns
self.residues_at_pocket was set to the residues at the pocket
self.residues_at_pocket_str was set to the residues at the pocket in string format
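The aa_existence_time filter can be pictured as a frequency count over frames: a residue is kept if it is near the ligand in at least that fraction of frames. The sketch below takes per-frame sets of nearby resids as input (computing those sets is a separate step); the function name and the use of >= for the threshold are assumptions.

```python
from collections import Counter


def residues_above_existence_time(frames: list[set[int]],
                                  aa_existence_time: float = 0.5) -> list[int]:
    """Hypothetical: resids present in at least aa_existence_time of the frames."""
    if not frames:
        return []
    # Count in how many frames each resid appears near the ligand
    counts = Counter(resid for frame in frames for resid in frame)
    n_frames = len(frames)
    return sorted(r for r, c in counts.items() if c / n_frames >= aa_existence_time)


per_frame = [{10, 11, 42}, {10, 42}, {10, 12}, {10, 11}]
print(residues_above_existence_time(per_frame))  # [10, 11, 42]
```

Residue 12 appears in only one of four frames (25%), so the default threshold of 0.5 drops it as a transient contact.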
- get_residues_at_pocket_by_center(pocket_center: Optional[list] = None)
Get the residues at the pocket from the pocket center and the radius (self.radius_of_interest).
- Parameters
pocket_center – list, the pocket center. The default is None, in which case the method attempts to use the pocket center stored in the trajectory handler.
- Returns
self.residues_at_pocket was set to the residues at the pocket
self.residues_at_pocket_str was set to the residues at the pocket in string format
- preprocess_workflow(pdb_path: str | pathlib.Path, ply_path: str | pathlib.Path, h5_path: str | pathlib.Path, frame: int = 0, with_label: bool = True)
Preprocessing workflow for a frame, including writing the structure, features, labels, interest region, and voxelised data.
- Parameters
pdb_path – str, the path to the PDB file
ply_path – str, the path to the PLY file
h5_path – str, the path to the H5 file
frame – int, the frame number to get the features
with_label – bool, if True, the labels will be included in the H5 file
- Returns
Save the PDB file in pdb_path
Save the PLY file with the MASIF features in ply_path
Save the H5 file in h5_path with ['raw'] or ['raw', 'label'] (if with_label is True)
Note
raw: the voxelised data of the features
label: the voxelised data of the labels (if with_label is True)
- read_fragment_aux_file(aux_file_path: Optional[Union[str, Path]] = None)
Read the fragments from an auxiliary file.
- Parameters
aux_file_path – str, the path to the auxiliary file (JSON format). If not provided, the default example file is used.
- Returns
self.labels_info was set to the fragments
Example:
```python
from ProBiSEnSe.utils.datasets.traj_handler import TrajectoryHandler

traj_handler = TrajectoryHandler(
    top_path="example.pdb",
    trajectory_path="example.xtc",
    ligand_name="LIG",
)
aux_file_path = "example.json"
traj_handler.read_fragment_aux_file(aux_file_path)
print(traj_handler.label_fragment_info)
# Output:
# {
#     "0": {"name": "out of the threshold"},
#     "1": {"name": "fragment 1", "fragments_idx": [0, 1, 2, 3, 30, 45, 52]},
#     "2": {"name": "fragment 2", "fragments_idx": [4, 5, 6, 7, 26, 27, 29, 31, 32, 43, 44, 46, 47, 50]}
# }
```
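Given an auxiliary file with the layout shown in the example output, the fragment labels can be turned into a per-atom lookup. This sketch parses such JSON with the standard json module and maps each atom index in fragments_idx to its integer label. The function name is hypothetical, and treating the indices as ligand-atom indices is an assumption; entries without fragments_idx (such as "0") contribute no atoms and act as the background label.

```python
import json


def atom_to_fragment_label(aux_json: str) -> dict[int, int]:
    """Hypothetical: map each atom index listed in 'fragments_idx' to its label."""
    info = json.loads(aux_json)
    mapping: dict[int, int] = {}
    for label, entry in info.items():
        # Entries such as "0" ("out of the threshold") carry no atom indices
        for atom_idx in entry.get("fragments_idx", []):
            mapping[atom_idx] = int(label)
    return mapping


example = """{
  "0": {"name": "out of the threshold"},
  "1": {"name": "fragment 1", "fragments_idx": [0, 1, 2, 3]},
  "2": {"name": "fragment 2", "fragments_idx": [4, 5]}
}"""
mapping = atom_to_fragment_label(example)
print(mapping[3], mapping[4])  # 1 2
```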
- read_pocket_aux_file(aux_file_path: str | pathlib.Path)
Read the residues at the pocket and the pocket center from an auxiliary file.
- Parameters
aux_file_path – str, the path to the auxiliary file
- Returns
self.residues_at_pocket was set to the residues at the pocket
self.residues_at_pocket_str was set to the residues at the pocket in string format
self.pocket_center was set to the pocket center
self.pocket_center_str was set to the pocket center in string format
- read_pocket_from_string(residues_at_pocket_str: Optional[str] = None, pocket_center_str: Optional[str] = None)
Read the residues at the pocket and the pocket center from strings.
- Parameters
residues_at_pocket_str – str, the residues at the pocket in string format (default: None)
pocket_center_str – str, the pocket center in string format (default: None)
- Returns
self.residues_at_pocket was set to the residues at the pocket
self.residues_at_pocket_str was set to the residues at the pocket in string format
self.pocket_center was set to the pocket center
self.pocket_center_str was set to the pocket center in string format
- set_config(radius_of_interest: Optional[float] = None, spacing: Optional[float] = None, distance_cutoff: Optional[float] = None)
Set the configuration (radius of interest, spacing, distance_cutoff), and detect whether the trajectory has multiple segids.
- Parameters
radius_of_interest – float (recommended: 16), the radius (Å) of the interest region
spacing – float (recommended: 0.5), the spacing (Å) between grid points in the interest region
distance_cutoff – float (Å) (recommended: 5), a surface point is labelled only if its distance to at least one of the ligand's heavy atoms is within this cutoff
- Returns
self.radius_of_interest was set to the radius_of_interest
self.spacing was set to the spacing
self.distance_cutoff was set to the distance_cutoff
- write_features_to_ply(pdb_path: str | pathlib.Path, ply_path: str | pathlib.Path, frame: Optional[int] = None)
Write the MASIF features to a PLY file. If the PDB file does not exist, it will be created from the trajectory.
- Parameters
pdb_path – str, the path to the PDB file
ply_path – str, the path to the PLY file
frame – int, the frame number to get the features
- Returns
Save the PLY file with the MASIF features in ply_path
- write_pocket_aux_file(aux_file_path: str | pathlib.Path)
Write the residues at the pocket and the pocket center to an auxiliary file.
- Parameters
aux_file_path – str, the path to the auxiliary file
- Returns
Save the auxiliary file in aux_file_path
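Storing the pocket residues and pocket center in an auxiliary file lets later runs reuse them instead of recomputing them from the trajectory. The round trip below is a sketch only: the JSON layout and the keys "residues_at_pocket" and "pocket_center", like the helper names write_pocket_aux and read_pocket_aux, are hypothetical, since the actual file format is not documented here.

```python
import json
import tempfile
from pathlib import Path


def write_pocket_aux(path: Path, residues_at_pocket: list[int],
                     pocket_center: list[float]) -> None:
    """Hypothetical JSON layout for a pocket auxiliary file."""
    payload = {"residues_at_pocket": residues_at_pocket,
               "pocket_center": pocket_center}
    path.write_text(json.dumps(payload, indent=2))


def read_pocket_aux(path: Path) -> tuple[list[int], list[float]]:
    """Read back the hypothetical layout written by write_pocket_aux."""
    payload = json.loads(path.read_text())
    return payload["residues_at_pocket"], payload["pocket_center"]


with tempfile.TemporaryDirectory() as tmp:
    aux = Path(tmp) / "pocket.json"
    write_pocket_aux(aux, [10, 11, 42], [1.0, 2.0, 3.0])
    print(read_pocket_aux(aux))  # ([10, 11, 42], [1.0, 2.0, 3.0])
```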
- write_structure(pdb_path: str | pathlib.Path, frame: int, structure_type: Literal['complex', 'protein', 'ligand'] = 'protein', fragmentation: bool = False)
Write the structure as a PDB file for a specific frame.
- Parameters
pdb_path – str, the path to the PDB file
frame – int, the frame number to get the structure
structure_type – str, the type of structure to write (complex, protein, ligand)
fragmentation – bool, if True, the structure will be fragmented
- Returns
Save the PDB file in pdb_path
- write_trajectory(traj_path: str | pathlib.Path, start_frame: int = 0, end_frame: Optional[int] = None, structure_type: Optional[Literal['complex', 'protein', 'ligand']] = None, step: int = 1)
Write the trajectory, optionally restricted to a range of frames and a subset of atoms, to a trajectory file.
- Parameters
traj_path – str, the path to the trajectory file
start_frame – int, the frame number to start
end_frame – int, the frame number to end. If None, it will be the total number of frames.
structure_type – str, the type of structure to write (complex, protein, ligand). If None, all atoms are written.
step – int, the step between written frames (default: 1)
- Returns
Save the trajectory file in traj_path.
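If end_frame is exclusive (an assumption, consistent with None meaning "the total number of frames"), start_frame, end_frame, and step behave like a Python slice over frame indices; in MDAnalysis the same selection can be written as u.trajectory[start:end:step]. The hypothetical helper below only computes which frame indices would be written.

```python
from typing import Optional


def frames_to_write(n_frames: int, start_frame: int = 0,
                    end_frame: Optional[int] = None, step: int = 1) -> list[int]:
    """Hypothetical: frame indices selected by start/end/step (end exclusive)."""
    if end_frame is None:
        end_frame = n_frames  # None means "up to the total number of frames"
    # Slicing a list clips out-of-range bounds, as trajectory slicing would
    return list(range(n_frames))[start_frame:end_frame:step]


print(len(frames_to_write(10)))                    # 10
print(frames_to_write(10, start_frame=2, step=3))  # [2, 5, 8]
```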
- write_voxelised_data_to_h5(ply_path: str | pathlib.Path, h5_path: str | pathlib.Path, with_label: bool = True)
Voxelise the surface vertices and save the voxelised data in an H5 file.
- Parameters
ply_path – str, the path to the input PLY file
h5_path – str, the path to the output H5 file
with_label – bool, if True, the labels will be included in the H5 file
- Returns
Save the H5 file in h5_path with ['raw'] or ['raw', 'label'] (if with_label is True)
Note
raw: the voxelised data of the features
label: the voxelised data of the labels (if with_label is True)
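Voxelising surface vertices can be thought of as snapping each vertex to its nearest grid node around the pocket center and accumulating its feature value there. The sketch below assumes a cubic grid spanning ±radius_of_interest with the given spacing, nearest-node assignment, and averaging of features that fall into the same voxel; the function name voxelise is hypothetical, and the handler's actual scheme (and the array layout of the 'raw' and 'label' datasets) may differ. A dense array for the H5 file could be filled from the returned indices.

```python
from collections import defaultdict
from typing import Sequence


def voxelise(vertices: Sequence[Sequence[float]], features: Sequence[float],
             pocket_center: Sequence[float], radius_of_interest: float = 16.0,
             spacing: float = 0.5) -> dict[tuple[int, ...], float]:
    """Hypothetical: map vertex features to grid indices, averaging per voxel.

    Vertices outside the cube [-radius, +radius] around the center are skipped.
    """
    n = int(round(2 * radius_of_interest / spacing)) + 1  # nodes per axis
    sums: dict[tuple[int, ...], float] = defaultdict(float)
    counts: dict[tuple[int, ...], int] = defaultdict(int)
    for v, f in zip(vertices, features):
        # Shift so the cube corner is index 0, then snap to the nearest node
        idx = tuple(
            int(round((v[a] - pocket_center[a] + radius_of_interest) / spacing))
            for a in range(3)
        )
        if all(0 <= i < n for i in idx):
            sums[idx] += f
            counts[idx] += 1
    return {k: sums[k] / counts[k] for k in sums}


grid = voxelise([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (-1.0, -1.0, -1.0)],
                [1.0, 3.0, 9.0, 4.0], (0.0, 0.0, 0.0),
                radius_of_interest=1.0, spacing=0.5)
print(grid)  # {(2, 2, 2): 2.0, (0, 0, 0): 4.0}
```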
- utils.datasets.traj_handler.check_standard_names(u: Universe)
Check the resnames and atom names for the topology.
- Parameters
u – The input MDAnalysis Universe.
- Returns
None
- utils.datasets.traj_handler.convert_to_standard_names(PDB_path: str | pathlib.Path)
Convert the resnames and atom names in the PDB file to standard ones.
- Parameters
PDB_path – The path to the input PDB file.
- Returns
None. The converted PDB file will be saved to the same directory with the suffix “_converted.pdb”.
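As an illustration only, the sketch below renames residues in the ATOM/HETATM records of a PDB file and writes the result next to the input with the "_converted.pdb" suffix. The residue-name mapping (common AMBER protonation-state names such as HID/HIE/HIP to HIS and CYX to CYS) and the function name convert_resnames are assumptions, not the module's actual table, and atom names are left untouched. It relies on the fixed-column PDB format, where columns 18-20 of an ATOM/HETATM record hold the residue name.

```python
import tempfile
from pathlib import Path
from typing import Union

# Hypothetical mapping; the module's real list of non-standard names may differ.
RESNAME_MAP = {"HID": "HIS", "HIE": "HIS", "HIP": "HIS", "CYX": "CYS"}


def convert_resnames(pdb_path: Union[str, Path]) -> Path:
    """Rename non-standard residues and save to '<stem>_converted.pdb'."""
    pdb_path = Path(pdb_path)
    out_path = pdb_path.with_name(pdb_path.stem + "_converted.pdb")
    lines = []
    for line in pdb_path.read_text().splitlines(keepends=True):
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 20:
            resname = line[17:20]  # PDB columns 18-20 hold the residue name
            new = RESNAME_MAP.get(resname.strip())
            if new is not None:
                line = line[:17] + new.rjust(3) + line[20:]
        lines.append(line)
    out_path.write_text("".join(lines))
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "frame0.pdb"
    src.write_text("ATOM      1  N   HIE A   1      11.104   6.134  -6.504  1.00  0.00           N\n")
    out = convert_resnames(src)
    print(out.name)                # frame0_converted.pdb
    print(out.read_text()[17:20])  # HIS
```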
- utils.datasets.traj_handler.get_ligand_around_resids(u: Universe, ligand_name: str, ligand_aa_dist: int, aa_existence_time: float = 0.5, with_segid: bool = False) -> list
Get the residues around the ligand over a trajectory.
- Parameters
u (required) – The input MDAnalysis Universe, representing the molecular system.
ligand_name (required) – The name of the ligand to find residues around.
ligand_aa_dist – The distance (Å) from the ligand to consider residues as nearby.
aa_existence_time – The fraction of the trajectory during which a residue must be present near the ligand to be included (default = 50%).
with_segid – If True, includes the segment ID (segid) in the returned residues.
- Returns
A list of the residues around the ligand: only resids if with_segid is False; segids and resids if with_segid is True.
- Return type
list
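For a single frame, "residues around the ligand" can be computed as the residues with at least one atom within ligand_aa_dist of any ligand atom; MDAnalysis expresses this kind of selection with the around keyword (for example "around 5 resname LIG"). The sketch below does the same with plain coordinate lists and optionally returns (segid, resid) pairs, mirroring with_segid. The Atom tuple layout and the function name residues_near_ligand are assumptions for illustration.

```python
import math
from typing import NamedTuple, Sequence


class Atom(NamedTuple):
    segid: str
    resid: int
    xyz: tuple[float, float, float]


def residues_near_ligand(protein_atoms: Sequence[Atom],
                         ligand_xyz: Sequence[tuple[float, float, float]],
                         ligand_aa_dist: float, with_segid: bool = False) -> list:
    """Hypothetical: residues with any atom within ligand_aa_dist (Å) of any ligand atom."""
    hits = set()
    for atom in protein_atoms:
        if any(math.dist(atom.xyz, lig) <= ligand_aa_dist for lig in ligand_xyz):
            hits.add((atom.segid, atom.resid) if with_segid else atom.resid)
    return sorted(hits)


protein = [Atom("A", 10, (1.0, 0.0, 0.0)), Atom("A", 11, (9.0, 0.0, 0.0)),
           Atom("B", 10, (0.0, 3.0, 0.0))]
ligand = [(0.0, 0.0, 0.0)]
print(residues_near_ligand(protein, ligand, 5))                   # [10]
print(residues_near_ligand(protein, ligand, 5, with_segid=True))  # [('A', 10), ('B', 10)]
```

The example shows why with_segid matters: resid 10 exists in both segments A and B, and without the segid the two residues collapse into one entry.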
- utils.datasets.traj_handler.get_resname_with_resid(u: Universe, resids: list) -> list
Get the residue name with the residue ID.
- Parameters
u – The input MDAnalysis Universe.
resids – The residue IDs to get the residue names.
- Returns
A list of the residue names with the residue IDs.
- Return type
list
Example:
```python
from MDAnalysis import Universe
from ProBiSEnSe.utils.datasets.traj_handler import get_resname_with_resid

u = Universe("example.pdb")
resids = [1, 2, 3]
resname_with_resid = get_resname_with_resid(u, resids)
print(resname_with_resid)
# Output:
# ['ALA1', 'ARG2', 'GLU3']
```