Search code examples
pythonpython-3.xmdanalysis

MDAnalysis: Trajectories saved from Readers of the same Universe are incorrect


I have three trajectory replicas (xtc) of a membrane protein in a simulated physiological environment (water, ions, membrane...) in a MDAnalysis' (2.2.0) Universe. I want to save other three additional xtcs that contain only the trajectory of the protein (of the atoms of the protein), one per each of the original xtc trajectories. When I try to iterate through each of the three MDAnalysis' Readers contained in the Universe, the first saved trajectory seems to be correct, but the other two have the same coordinates in all the frames. The starting, complete trajectories are correct. If my starting point is necessarily a Universe with the three Readers, how do I do this correctly and efficiently?

Code:

import MDAnalysis as mda
u = mda.Universe("11159_dyn_117.pdb", "11156_trj_117.xtc", "11157_trj_117.xtc", "11158_trj_117.xtc")
protein = u.select_atoms("protein")
protein.write("protein.pdb")

for num, reader in enumerate(u.trajectory.readers, 1):
    with mda.Writer(f"{num}.xtc", protein.n_atoms) as w:
        for ts in reader.trajectory:
            w.write(protein.atoms)

# Then check the generated individual trajectories by loading them in 
# Universes and checking the positions array. I checked them in PyMOL.

Files downloadable at: https://submission.gpcrmd.org/dynadb/dynamics/id/117/ (model file and trajectory files)


Solution

  • You can write a trajectory directly from an AtomGroup with the AtomGroup.write(name, frames=trajectory_iterator) method. Access the start/stop frames in the chained trajectory with the private ChainReader._start_frames attribute (not documented).

    import MDAnalysis as mda
    
    # example data
    from MDAnalysisTests import datafiles as data
    
    # create a chained trajectory and select some atoms
    u = mda.Universe(data.PSF, [data.DCD, data.DCD])
    protein = u.select_atoms("protein")
    
    # get start/stop frames: 
    # array([  0,  98, 196]) for this example
    sf = u.trajectory._start_frames
    
    # write each subtrajectory of the chained trajectory
    # to a new file in a different format (only containing
    # the atoms of the selected AtomGroup)
    for i, (start, stop) in enumerate(zip(sf[:-1], sf[1:])):
        protein.atoms.write(f"protein_{i}.xtc", frames=u.trajectory[start:stop])
    

    This will produce trajectories protein_0.xtc and protein_1.xtc. If you want to load them, don't forget to create a file that contains a minimal topology for the selection

    protein.write("protein.gro")
    

    so that you can load the new trajectories with

    p1 = mda.Universe("protein.gro", "protein_1.xtc")
    p2 = mda.Universe("protein.gro", "protein_2.xtc")
    

    Notes

    • I am using the start/stop frames from ChainReader._start_frames instead of the raw lengths of trajectories because they are updated appropriately if the ChainReader detects overlapping frames when using continuous=True.
    • Instead of using frames=u.trajectory[start:stop] one can also use frames=slice(start, stop).
    • Instead of writing protein.atoms.write() one normally writes protein.write(), which is equivalent and shorter, but I wanted to make clear that it's always correct to go to the atoms, in particular if one wanted to write a whole universe, in which case one would use u.atoms.write().
    • Instead of using the convenience method AtomGroup.write() one could also use explicit trajectory writing where you open a trajectory for writing with
      with mda.Writer("protein_1.xtc", protein.n_atoms) as W:
          for ts in u.trajectory[start:stop]:
             W.write(protein)
      
      which provides more control over every step but is more verbose.