Parallel Processor

class utils.parallel.framework.Parallelization(max_workers=8)

Bases: object

Parallelization framework for running functions in parallel. You need to prepare the inputs and the function to run.

Parameters

max_workers (int) – maximum number of parallel workers (default: 8)

It is a simple framework that uses multiprocessing.Pool to run functions in parallel. Prepare the inputs as a list of lists, where each inner list contains the arguments for a single function call.
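As a rough illustration of the list-of-lists input convention, the sketch below calls the standard-library multiprocessing.Pool directly. The helper run_in_parallel is hypothetical and is not part of utils.parallel.framework; the class's actual internals may differ.

```python
from multiprocessing import Pool


def run_in_parallel(func, inputs, max_workers=8):
    """Call func(*args) for each inner list in inputs, using a worker pool.

    Hypothetical helper for illustration only; not the framework's code.
    """
    with Pool(processes=max_workers) as pool:
        # starmap unpacks each inner list into positional arguments
        # and returns the results in input order.
        return pool.starmap(func, inputs)


if __name__ == "__main__":
    # Each inner list holds the arguments for one call: pow(2, 3), pow(3, 2), ...
    inputs = [[2, 3], [3, 2], [10, 2]]
    print(run_in_parallel(pow, inputs, max_workers=2))  # [8, 9, 100]
```

The built-in pow is used here because worker processes must be able to import the function; functions defined at module level work the same way.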

Example

# Import the Parallelization class.
from utils.parallel.framework import Parallelization

# Create a Parallelization object and prepare the inputs.
p_job = Parallelization(max_workers=4)
p_job.prepare(test=True)

# Print out the inputs in parallel.
p_job.run(func=p_job.test_func)

class utils.parallel.framework.TrajHandlerPreprocess(max_workers=8, logger_path: Optional[str] = None, task_name: str = 'core-dataprep')

Bases: Parallelization

Parallelization framework for preprocessing trajectory data. You will need to prepare the inputs and the function to run.

Parameters
  • max_workers (int) – maximum number of parallel workers (default: 8)

  • logger_path (str, optional) – path of the log file (default: None). If None, no logger is created.

  • task_name (str) – name of the task for the logger (default: "core-dataprep")
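The logger_path and task_name parameters could interact roughly as follows. This is a minimal sketch built on the standard logging module; make_logger is a hypothetical helper, and the log format is an assumption rather than the framework's actual output.

```python
import logging
from typing import Optional


def make_logger(logger_path: Optional[str], task_name: str) -> Optional[logging.Logger]:
    """Return a file logger named task_name, or None when no path is given.

    Hypothetical helper; the framework's own logger setup may differ.
    """
    if logger_path is None:
        # Mirrors the documented behavior: no path, no logger.
        return None
    logger = logging.getLogger(task_name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(logger_path)
    # Assumed format: timestamp, task name, message.
    handler.setFormatter(logging.Formatter("%(asctime)s [%(name)s] %(message)s"))
    logger.addHandler(handler)
    return logger
```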

Example

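# Import the TrajHandlerPreprocess class and the workflow function.
from utils.datasets.general import preprocess_workflow
from utils.parallel.framework import TrajHandlerPreprocess

# Create a TrajHandlerPreprocess object and prepare the inputs.
# traj_handler is an MDAnalysis Universe created beforehand.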
p_job = TrajHandlerPreprocess(max_workers=4, logger_path="preprocess.log")
p_job.prepare(
    traj_handler=traj_handler,
    root_path="output/p_data",
    filename="traj_data",
    frames_list=[0, 1, 2, 3, 4],
)

# Run the preprocessing in parallel.
p_job.run(func=preprocess_workflow)

Note

The source code of the preprocess_workflow function can be found in utils.datasets.general.preprocess_workflow().

prepare(traj_handler, config=None, **kwargs)

Prepare the inputs for the preprocess workflow.

Parameters
  • traj_handler – MDAnalysis Universe object

  • config – configuration object. If None, root_path and filename must be passed as keyword arguments.

  • root_path (str, optional keyword argument) – root path for the output files. If omitted, it must be set in the config file (output_p_data_folderpath).

  • filename (str, optional keyword argument) – filename for the output files. If omitted, it must be set in the config file (p_filename).

  • frames_list (list, optional keyword argument) – list of frames to process (e.g. [1, 2, 3]). If None, all frames are processed.

  • index_path (str, optional keyword argument) – path to the index file. If not provided, the value from the config file (output_index_path) is used; if that is also unset, the path defaults to [root_path]/[filename]_index.txt.
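The fallback rules in the parameter list can be sketched as a small resolver. This is an illustration under stated assumptions: resolve_paths is a hypothetical helper, the config is assumed to expose its fields as attributes, and keyword arguments are assumed to take precedence over the config.

```python
import os
from types import SimpleNamespace


def resolve_paths(config=None, **kwargs):
    """Resolve root_path, filename, and index_path from kwargs, then config.

    Hypothetical helper; assumes config exposes its fields as attributes.
    """
    root_path = kwargs.get("root_path") or getattr(config, "output_p_data_folderpath", None)
    filename = kwargs.get("filename") or getattr(config, "p_filename", None)
    if root_path is None or filename is None:
        raise ValueError("root_path and filename must come from kwargs or config")

    # Explicit argument first, then the config value, then the generated default.
    index_path = (
        kwargs.get("index_path")
        or getattr(config, "output_index_path", None)
        or os.path.join(root_path, f"{filename}_index.txt")
    )
    return root_path, filename, index_path


# Keyword arguments only: the index path falls back to the generated default.
print(resolve_paths(root_path="output/p_data", filename="traj_data"))

# Config only: every value is read from the (hypothetical) config object.
config = SimpleNamespace(
    output_p_data_folderpath="output/p_data",
    p_filename="traj_data",
    output_index_path="output/index.txt",
)
print(resolve_paths(config=config))
```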

class utils.parallel.framework.TrajHandlerPrediction(max_workers=8, logger_path: Optional[str] = None, task_name: str = 'core-predict')

Bases: TrajHandlerPreprocess

Parallelization framework for the prediction workflow. You will need to prepare the inputs and the function to run.

Parameters
  • max_workers (int) – maximum number of parallel workers (default: 8)

  • logger_path (str, optional) – path of the log file (default: None). If None, no logger is created.

  • task_name (str) – name of the task for the logger (default: "core-predict")

Example

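# Import the TrajHandlerPrediction class and the prediction function.
from utils.datasets.general import add_prediction_to_ply
from utils.parallel.framework import TrajHandlerPrediction

# Create a TrajHandlerPrediction object and prepare the inputs.
# traj_handler is an MDAnalysis Universe created beforehand.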
p_job = TrajHandlerPrediction(max_workers=4, logger_path="predict.log")
p_job.prepare(
    traj_handler=traj_handler,
    root_path="output/p_data",
    filename="traj_data",
    frames_list=[0, 1, 2, 3, 4],
)

# Set up the function.
p_job.set_function(
    func=add_prediction_to_ply,
    model_path="model/best_model.pt",
)

# Run the prediction in parallel.
p_job.run()

Note

The source code of the add_prediction_to_ply function can be found in utils.datasets.general.add_prediction_to_ply().
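One plausible reading of the set_function and run pairing in the example is that extra keyword arguments such as model_path are bound to the function once and then reused for every prepared input. The sketch below shows that pattern with functools.partial. FunctionHolder and describe are hypothetical stand-ins, and the real class may store or dispatch the function differently.

```python
from functools import partial


class FunctionHolder:
    """Hypothetical stand-in showing one way set_function and run could interact."""

    def __init__(self):
        self.inputs = []
        self.func = None

    def set_function(self, func, **kwargs):
        # Bind the shared keyword arguments once, before the job runs.
        self.func = partial(func, **kwargs)

    def run(self):
        # Serial loop for clarity; a real implementation would hand the
        # calls to a worker pool instead.
        return [self.func(*args) for args in self.inputs]


def describe(frame, model_path):
    """Toy function standing in for a per-frame prediction step."""
    return f"frame {frame} scored with {model_path}"


job = FunctionHolder()
job.inputs = [[0], [1]]
job.set_function(func=describe, model_path="model/best_model.pt")
print(job.run())
```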

prepare(traj_handler, config=None, **kwargs)

Prepare the inputs for the preprocess workflow.

Parameters
  • traj_handler – MDAnalysis Universe object

  • config – configuration object. If None, root_path and filename must be passed as keyword arguments.

  • root_path (str, optional keyword argument) – root path for the output files. If omitted, it must be set in the config file (output_p_data_folderpath).

  • filename (str, optional keyword argument) – filename for the output files. If omitted, it must be set in the config file (p_filename).

  • frames_list (list, optional keyword argument) – list of frames to process (e.g. [1, 2, 3]). If None, all frames are processed.

  • index_path (str, optional keyword argument) – path to the index file. If not provided, the value from the config file (output_index_path) is used; if that is also unset, the path defaults to [root_path]/[filename]_index.txt.

class utils.parallel.framework.TrajHandlerVisualization(max_workers=8, logger_path: Optional[str] = None, task_name: str = 'core-vis')

Bases: TrajHandlerPreprocess

Parallelization framework for the visualization workflow. You will need to prepare the inputs and the function to run.

Parameters
  • max_workers (int) – maximum number of parallel workers (default: 8)

  • logger_path (str, optional) – path of the log file (default: None). If None, no logger is created.

  • task_name (str) – name of the task for the logger (default: "core-vis")

Example

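# Import the TrajHandlerVisualization class and the visualization function.
from utils.parallel.framework import TrajHandlerVisualization
from utils.pymol_scripts.vis_pdb_ply import generate_pse

# Create a TrajHandlerVisualization object and prepare the inputs.
# traj_handler is an MDAnalysis Universe created beforehand.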
p_job = TrajHandlerVisualization(max_workers=4)
p_job.prepare(
    traj_handler=traj_handler,
    root_path="output/p_data",
    filename="traj_data",
    frames_list=[0, 1, 2, 3, 4],
)

# Set up the function.
p_job.set_function(func=generate_pse, pymol_path="path/to/pymol")

# Run the visualization in parallel.
p_job.run()

Note

The source code of the generate_pse function can be found in utils.pymol_scripts.vis_pdb_ply.generate_pse().

prepare(traj_handler, config=None, **kwargs)

Prepare the inputs for the preprocess workflow.

Parameters
  • traj_handler – MDAnalysis Universe object

  • config – configuration object. If None, root_path and filename must be passed as keyword arguments.

  • root_path (str, optional keyword argument) – root path for the output files. If omitted, it must be set in the config file (output_p_data_folderpath).

  • filename (str, optional keyword argument) – filename for the output files. If omitted, it must be set in the config file (p_filename).

  • frames_list (list, optional keyword argument) – list of frames to process (e.g. [1, 2, 3]). If None, all frames are processed.

  • index_path (str, optional keyword argument) – path to the index file. If not provided, the value from the config file (output_index_path) is used; if that is also unset, the path defaults to [root_path]/[filename]_index.txt.