Pymatgen: How to convert query result to structure


We have existing code to get some material properties for many materials (>60,000).

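# Older pymatgen releases also allowed "from pymatgen import MPRester"
from pymatgen.ext.matproj import MPRester

mpr = MPRester(api_key="")  # fill in your Materials Project API key
# Select all materials with fewer than 4 elements
criteria = {"nelements": {"$lt": 4}}
properties = ["pretty_formula", "cif", "material_id", "formation_energy_per_atom", "band_gap"]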

c = mpr.query(criteria=criteria,properties=properties)

For this project, however, we need the information in a specific form, namely as Structure objects. I can get these structures easily by requesting them for every material ID individually:

structures = []
for mid in mid_list:
    structures.append(mpr.get_structure_by_material_id(mid))

This calls the following method in matproj.py:

    def get_structure_by_material_id(self, material_id, final=True,
                                     conventional_unit_cell=False):
        """
        Get a Structure corresponding to a material_id.

        Args:
            material_id (str): Materials Project material_id (a string,
                e.g., mp-1234).
            final (bool): Whether to get the final structure, or the initial
                (pre-relaxation) structure. Defaults to True.
            conventional_unit_cell (bool): Whether to get the standard
                conventional unit cell

        Returns:
            Structure object.
        """

The problem is that this takes a very long time (>4 hours) and sometimes hangs during the call to the API.

Is there a way to avoid calling the API 60,000 times and convert the initial query results instead?


Solution

  • You don't need to query each material ID individually. Your first code block already retrieves the "cif" string for every material!

    All you need to do is convert those CIF strings to structures using pymatgen:

    from pymatgen.io.cif import CifParser
    structures = []
    for material in c:
        structures.append(CifParser.from_string(material["cif"]).get_structures()[0])
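
With more than 60,000 entries, it may help to keep each structure tied to its material ID and to skip entries whose CIF string fails to parse instead of aborting the whole loop. The following is a minimal sketch under those assumptions, not part of the original answer. The names `results_to_structures` and `parse_cif` are hypothetical helpers, not pymatgen API. The default parser relies on `Structure.from_str(..., fmt="cif")`, which is an alternative to calling `CifParser` directly.

```python
def results_to_structures(results, parse_cif=None):
    """Convert query results into a dict of material_id -> structure.

    `results` is assumed to be a list of dicts with "material_id" and "cif"
    keys, as returned by the query in the question.
    Returns (structures, failed_ids).
    """
    if parse_cif is None:
        # Default parser: imported lazily so the helper itself does not
        # require pymatgen when a custom parser is supplied.
        from pymatgen.core.structure import Structure

        def parse_cif(text):
            return Structure.from_str(text, fmt="cif")

    structures = {}
    failed_ids = []
    for entry in results:
        material_id = entry["material_id"]
        try:
            structures[material_id] = parse_cif(entry["cif"])
        except Exception:
            # Deliberately broad: any parsing error is recorded and the
            # loop moves on to the next entry.
            failed_ids.append(material_id)
    return structures, failed_ids
```

With the query result `c` from the question, this would be called as `structures, failed_ids = results_to_structures(c)`. No further API calls are made, and the entries listed in `failed_ids` can be inspected or re-fetched individually afterwards.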