Advanced usage for FragBEST-Myo#

Trajectory Handler#

Modify initial optional settings#

These optional settings can be adjusted as needed in TrajectoryHandler:

warning_check: Displays detailed information when reading and transforming data. Set it to False to suppress these messages.
radius_of_interest (unit: Å): Defines the size of the region of interest as a spherical area centered on the pocket_center. The default value is 16.0, based on our previously estimated approximate size of the OM pocket in myosin.
spacing (unit: Å): Determines the resolution when transforming data from surface vertices to voxels (used by the deep learning model). Since the resolution of surface vertices generated by MSMS is 1.0, the default value is set to 0.5 to balance computational efficiency and accuracy.
distance_cutoff (unit: Å): Specifies the distance cutoff (default: 5.0) for determining whether an atom is close enough to the ligand to be assigned a label (only applicable when a ligand exists in the system).

Here is an example:

# Load the trajectory using TrajectoryHandler
traj_handler = TrajectoryHandler(
    top_path,
    traj_path,
    ligand_name = ligand_name,
    radius_of_interest = radius_of_interest,
    warning_check = False,

)

Adjust threshold values when getting residues at the pocket#

You can modify threshold values ligand_aa_dist and/or aa_existence_time.

ligand_aa_dist: the cutoff distance between ligand’s and amino acid’s atoms.
aa_existence_time: the fraction of the existernce time for the amino acid’s atoms over MD simulation time.

Example:

# first step for calculating pocket center
traj_handler.get_residues_at_pocket(
    # Distance Between Ligand and Amino Acid Atoms (Unit: Å)
    ligand_aa_dist=5,
    # Fraction of Existence Time of Amino Acid Atoms
    aa_existence_time=0.5,  # 0.5 means 50%
)

# second step for calculating pocket center
traj_handler.get_pocket_residues()

Align trajectory using an MDAnalysis object as the reference#

During the alignment, the reference can also be a MDAnalysis’s Universe or AtomGroup.

reference = mda.Universe(top_path)  # Universe
traj_handler.align_traj_to_pocket(reference=reference)

reference = mda.Universe(top_path).select_atoms("resid 11")  # AtomGroup
traj_handler.align_traj_to_pocket(reference=reference)

Save aligned trajectory#

To save the aligned trajectory, use the self.write_trajectory() function.

# -------------------- Start of user-defined variables --------------------
traj_path_aligned = f"{output_dir}/PPS_OMB_300ns_skip10ns_aligned.xtc"
# -------------------- End of user-defined variables ----------------------

traj_handler.write_trajectory(traj_path=traj_path_aligned)

There are some optional settings available when saving the trajectory.

# -------------------- Start of user-defined variables --------------------
traj_path_aligned = f"{output_dir}/PPS_OMB_300ns_skip10ns_aligned.xtc"
start_frame = 0  # Start frame number
end_frame = None  # End frame number (if None, it will be the last frame)
structure_type = None  # Structure type ("protein", "ligand", "complex", None)
# If None, it will be saved all atoms
step = 10  # Step to save the frames
# (set larger to skip frames. e.g., 10 to save every 10 frames)
# -------------------- End of user-defined variables ----------------------

traj_handler.write_trajectory(
    traj_path=traj_path_aligned,
    start_frame=start_frame,
    end_frame=end_frame,
    structure_type=structure_type,
    step=step,
)

Advanced usage for saving PDB files#

Here are some other examples to save your PDB with different molecules.

Ligand (only available in the ligand-existing trajectory)

# -------------------- Start of user-defined variables --------------------
ligand_path = f"{output_dir}/OMB_{frame}.pdb"
structure_type = "ligand"
# -------------------- End of user-defined variables ----------------------

# ligand
traj_handler.write_structure(
    pdb_path=ligand_path, frame=frame, structure_type=structure_type
)

Complex (only available in the ligand-existing trajectory)

# -------------------- Start of user-defined variables --------------------
complex_path = f"{output_dir}/PPS_OMB_{frame}.pdb"
structure_type = "complex"
# -------------------- End of user-defined variables ----------------------


# complex (protein + ligand)
traj_handler.write_structure(
    pdb_path=complex_path, frame=frame, structure_type=structure_type
)

Ligand/Complex with chemical fragment annotation

If the fragment auxiliary file is available (see how to generate the fragment auxiliary file here), the fragment information for a ligand can be included in the chain column of the PDB file. For example, fragment 1 will appear as 1 in the chain column. To enable this feature, simply set fragmentation=True.

The fragments of OM are predefined in FragBEST-Myo, as shown in the figure below. The prepared fragment information file is available at ./utils/ppseg/myo/ligand_fragments_example.json.

# -------------------- Start of user-defined variables --------------------
fragment_info_path = f"{project_root}/utils/ppseg/myo/ligand_fragments_example.json"
ligand_f_path = f"{output_dir}/OMB_{frame}_fragment.pdb"
complex_f_path = f"{output_dir}/PPS_OMB_{frame}_fragment.pdb"
# -------------------- End of user-defined variables ----------------------

# read the fragment auxiliary file
traj_handler.read_fragment_aux_file(aux_file_path=fragment_info_path)

# ligand with fragments
traj_handler.write_structure(
    pdb_path=ligand_f_path, frame=frame, structure_type="ligand", fragmentation=True
)

# complex with fragments
traj_handler.write_structure(
    pdb_path=complex_f_path, frame=frame, structure_type="complex", fragmentation=True
)

Holo Decriptor Anaylser#

Load custom fragment information file to analyser#

To calculate a volume-based descriptor, you need to specify the path to the file containing ligand fragment information, which includes RDKit-calculated volumes. By default, this file is preloaded automatically, so you don’t need to specify the path manually.

If you load your custom fragment information file, see below example:

hd_analyser = HoloDescriptorAnalyser(
    source_path=output_dir, 
    frag_info_path="./frag_info.json",
)

For guidance on preparing ligand fragment information files, refer to this tutorial.

Specify certain columns and weights to calculate the overall score#

You can specify the columns to include in the calculation of the overall_score. This allows for greater flexibility in tailoring the ranking process to specific needs or priorities. Simply select the desired columns (e.g., z-score normalized columns) and define their corresponding weights for the linear combination. Note that the weights must match the columns you selected.

Additionally, if you would like to ignore the warning messages and rank all conformations solely based on the score, set filter_warning=False.

# ----------------- Start of user-defined variables -----------------
zscore_columns = [
    "nonbck_ratio_zscore",
    "nonbck_class_pt_ratio_zscore",
    "overall_predprobs_zscore",
    "holospace_frag_score_zscore",
]
weights = [1, 2, 3, 1]
# ----------------- End of user-defined variables -------------------

# set the rank
hd_analyser.set_rank(
    zscore_columns=zscore_columns, 
    weights=weights, 
    filter_warning=False
)

# display the result
hd_analyser.descriptors_df.sort_values("rank", ascending=True)

Run dataset preparation and prediction in parallel#

Align the trajectory to the first frame#

When you work with skeletal myosin or another isoform, the provided reference structure may not be suitable for your analysis. In such cases, you can determine the pocket using the first frame (frame 0) of your trajectory and align the trajectory to this frame. Below is an example implementation:

from utils.datasets.traj_handler import TrajectoryHandler

traj_handler = TrajectoryHandler(
    top_path, traj_path, ligand_name=None, warning_check=False
)

# Load the pocket aux file
traj_handler.read_pocket_aux_file(ref_aux_path)
traj_handler.get_pocket_center(frame=0)

# Print the pocket residues and the pocket center
print(f"pocket residues: {traj_handler.residues_at_pocket_str}")
print(f"pocket center: {traj_handler.pocket_center}")

# Align the trajectory to the reference structure
traj_handler.align_traj_to_pocket(reference=0)  # aligned to the first frame

Optional parameters for input preparation#

Here is an example demonstrating the use of optional parameters for input preparation.

frames_list: If you want to process specific frames only, provide a list of frame numbers (e.g., frames_list=[1,2,3]).
index_path: This optional parameter specifies the path where the index file will be saved. The index file contains a list of filenames for each processed PDB file, allowing you to cross-check your data.

# ----------- Start of user-defined variables ------------
logger_path = f"{output_dir}/core-dataprep-info.log"
max_workers = 24
filename_p = "Apo_SMD_protein"  # filename base for each conformation
frames_list = [1, 2, 3]
index_path = f"{output_dir}/index_filename.txt"
# ----------- End of user-defined variables --------------

# Prepare to run the dataset preprocessing
p_jobs = TrajHandlerPreprocess(max_workers=max_workers, logger_path=logger_path)
p_jobs.prepare(
    traj_handler,
    root_path=output_dir,
    filename=filename_p,
    frames_list=frames_list,
    index_path=index_path,
)

# set the function
p_jobs.set_function(preprocess_workflow)

# run the preprocessing workflow
p_jobs.run()  # this takes around 3.5 minutes

Optional parameters for prediction#

Here is an example demonstrating the use of optional parameters for prediction.

frames_list: Specify the frames to process (if necessary).
device: Specify whether to use cpu or cuda for predictions.
descriptor_only:
- Set to False if you haven’t added predictions to your PLY files yet. This will perform a full prediction, including adding pred and predprob to the PLY file and generating the holo-descriptor JSON file.
- Set to True if you have run the predictions and add them to the PLY files. And, you only want to generate the holo-descriptor JSON file based on the existing PLY files.

# ----------- Start of user-defined variables ------------
logger_path = f"{output_dir}/core-predict-info.log"
max_workers = 16
filename_p = "Apo_SMD_protein"  # filename base for each conformation
model_path = f"{project_root}/utils/ppseg/myo/Kfold2_best_model_169_miou=0.7525.pt"
device = "cpu"
frames_list = [1, 2, 3]
decriptor_only = False
# ----------- End of user-defined variables --------------

# Prepare to run the dataset preprocessing
p_jobs = TrajHandlerPrediction(max_workers=max_workers, logger_path=logger_path)
p_jobs.prepare(
    traj_handler, root_path=output_dir, filename=filename_p, frames_list=frames_list
)

# set up the function
p_jobs.set_function(
    func=add_prediction_to_ply,
    model_path=model_path,
    device=device,
    descriptor_only=descriptor_only
)

# run the preprocessing workflow
p_jobs.run()  # this takes around 0.5 minutes"

Configuration-based method#

The configuration-based method is an alternative way to generate those files with a config file (.yaml).

You can use a config file to run the process (dataset preparation, prediction, and visualization with pymol). To do so, you need to prepare your own configuration file (.yaml). Note that the example YAML file provided is set to convert frames 0 to 4 only. If you want to process all conformations in the trajectory, set the frame range to null.

Additionally, we strongly recommend using absolute paths in the configuration file to prevent any potential errors. To make prediction on each frame of your trajectory, please prepare a configuration file (.yaml).

Dataset preparation#

Example yaml config file:

mode: core-dataprep-by-config # name to identify your task

# logger path
logger_path: /[absolue]/[path]/[to]/[FragBEST-Myo]/dataset/examples/outputs_holo_like/core_dataprep.log # path to the logger file

# dataset related parameters
frames_list: [0,1,2,3] # list of frames to be processed (default: null, which means all frames)
p_filename: Apo_SMD_protein # name of the processed data file
output_p_data_folderpath: /[absolue]/[path]/[to]/[FragBEST-Myo]/dataset/examples/outputs_holo_like # path to the folder where the processed data will be saved
output_index_path: null # path to the index file

# workers
max_workers: 4 # number of workers to use for parallel processing

Script:

# --------------------- Start of user-defined variables -------------------
project_root = "/[absolute]/[path]/FragBEST-Myo"
config_path = f"{project_root}/config/core_dataprep_template.yaml"
ref_path = f"{project_root}/dataset/ref/PPS_OMB_min_cg_pl.pdb"
ref_aux_path = f"{project_root}/dataset/ref/pocket_aux.txt"
top_path = f"{project_root}/dataset/examples/Apo_SMD_T300_p_fit.pdb"
traj_path = f"{project_root}/dataset/examples/Apo_SMD_200ns_skip10ns.xtc"
# --------------------- End of user-defined variables ---------------------


from omegaconf import OmegaConf
import sys

sys.path.append(project_root)

from utils.datasets.general import preprocess_workflow
from utils.datasets.traj_handler import TrajectoryHandler
from utils.parallel.framework import TrajHandlerPreprocess

# Load reference structure and trajectory into handler and align to the pocket
ref = TrajectoryHandler(ref_path, warning_check=False)
traj_handler = traj_handler = TrajectoryHandler(
    top_path, traj_path, ligand_name=None, warning_check=False
)
traj_handler.read_pocket_aux_file(ref_aux_path)
traj_handler.align_traj_to_pocket(reference=ref.universe)


# Load the configuration file
config = OmegaConf.load(config_path); print(config)

# Prepare to run the dataset preprocessing
p_jobs = TrajHandlerPreprocess(
    max_workers = config.max_workers, 
    logger_path = config.logger_path, 
    task_name = config.mode,
    )
p_jobs.prepare(traj_handler, config=config)

# Set the function
p_jobs.set_function(preprocess_workflow)

# Run the preprocessing workflow
p_jobs.run() # this takes a couple minutes"

Prediction#

Example yaml config file:

mode: core-predict-by-config # name to identify your task

# logger path
logger_path: /[absolute]/[path]/[to]/[FragBEST-Myo]/dataset/examples/outputs_holo_like/core_prediction.log # path to the logger file

# dataset
frames_list: [0,1,2,3] # list of frames to be processed. If null, all frames will be processed
input_p_data_folderpath: /[absolute]/[path]/[FragBEST-Myo]/dataset/examples/outputs_holo_like # path to the folder where the processed data is saved
input_p_filename: Apo_SMD_protein # name of the processed data file

# parameters
model_path: /[absolute]/[path]/[FragBEST-Myo]/utils/ppseg/myo/Kfold2_best_model_169_miou=0.7525.pt # path to the model
descriptor_only: False  # if True, only the descriptor is returned (which means you might have run the prediction in advance)
device: "cuda" # device to run the model on (cpu or cuda)

# workers
max_workers: 4 # number of workers to use for parallel processing

Script:

# --------------------- Start of user-defined variables -------------------
project_root = "/[absolute]/[path]/FragBEST-Myo"
config_path = f"{project_root}/config/core_prediction_template.yaml"
ref_path = f"{project_root}/dataset/ref/PPS_OMB_min_cg_pl.pdb"
ref_aux_path = f"{project_root}/dataset/ref/pocket_aux.txt"
top_path = f"{project_root}/dataset/examples/Apo_SMD_T300_p_fit.pdb"
traj_path = f"{project_root}/dataset/examples/Apo_SMD_200ns_skip10ns.xtc"
# --------------------- End of user-defined variables ---------------------


from omegaconf import OmegaConf
import sys

sys.path.append(project_root)

from utils.datasets.general import add_prediction_to_ply
from utils.datasets.traj_handler import TrajectoryHandler
from utils.parallel.framework import TrajHandlerPrediction

# Load reference structure and trajectory into handler and align to the pocket
ref = TrajectoryHandler(ref_path, warning_check=False)
traj_handler = traj_handler = TrajectoryHandler(
    top_path, traj_path, ligand_name=None, warning_check=False
)
traj_handler.read_pocket_aux_file(ref_aux_path)
traj_handler.align_traj_to_pocket(reference=ref.universe)


# Load the configuration file for prediction
config = OmegaConf.load(config_path); print(config)

# Prepare to run the dataset preprocessing
p_jobs = TrajHandlerPrediction(
    max_workers=config.max_workers, 
    logger_path=config.logger_path,
    task_name=config.mode,
)
p_jobs.prepare(traj_handler, config)

# Set up the function
p_jobs.set_function(
    func=add_prediction_to_ply,
    model_path=config.model_path,
    descriptor_only=config.descriptor_only,
    device=config.device,
)

# Run the prediction workflow
p_jobs.run() # this takes around 0.5 minutes

Generate PSE visualization files (for pymol)#

Example yaml config file:

mode: core-genpse-by-config # name to identify your task

# logger path
logger_path: /[absolute]/[path]/[to]/[FragBEST-Myo]/dataset/examples/outputs_holo_like/core_genpse.log # path to the logger file

# dataset
frames_list: [0,1,2,3] # list of frames to be processed. If null, all frames will be processed
input_p_data_folderpath: /[absolute]/[path]/[FragBEST-Myo]/dataset/examples/outputs_holo_like # path to the folder where the processed data is saved
input_p_filename: Apo_SMD_protein # name of the processed data file

# workers
max_workers: 4 # number of workers to use for parallel processing

Script:

# --------------------- Start of user-defined variables -------------------
project_root = "/[absolute]/[path]/FragBEST-Myo"
config_path = f"{project_root}/config/core_prediction_template.yaml"
ref_path = f"{project_root}/dataset/ref/PPS_OMB_min_cg_pl.pdb"
ref_aux_path = f"{project_root}/dataset/ref/pocket_aux.txt"
top_path = f"{project_root}/dataset/examples/Apo_SMD_T300_p_fit.pdb"
traj_path = f"{project_root}/dataset/examples/Apo_SMD_200ns_skip10ns.xtc"
# --------------------- End of user-defined variables ---------------------


from omegaconf import OmegaConf
import os
import sys

sys.path.append(project_root)

from utils.datasets.traj_handler import TrajectoryHandler
from utils.parallel.framework import TrajHandlerVisualization
from utils.pymol_scripts.vis_pdb_ply import generate_pse

# Load reference structure and trajectory into handler and align to the pocket
ref = TrajectoryHandler(ref_path, warning_check=False)
traj_handler = TrajectoryHandler(
    top_path, traj_path, ligand_name=None, warning_check=False
)
traj_handler.read_pocket_aux_file(ref_aux_path)
traj_handler.align_traj_to_pocket(reference=ref.universe)


# Set up pymol path
pymol_path = os.popen("pixi run -e pymol which pymol").read().strip()

# Load the configuration file for generating PSE visualization files
config = OmegaConf.load(config_path); print(config)

# Prepare to generate visualization state at the same time
p_jobs = TrajHandlerVisualization(
    max_workers=config.max_workers, 
    logger_path=config.logger_path,
    task_name=config.mode,
)
p_jobs.prepare(traj_handler, config=config)

# set the function
p_jobs.set_function(generate_pse, pymol_path=pymol_path)

# run the preprocessing workflow
p_jobs.run()  # this takes around 3 minutes