python-xarray grib gfs cfgrib

Xarray open_mfdataset: combining files with different variables using the cfgrib engine


I have a folder with several .grib2 files. Some of them contain the tcc variable (cloud cover) and others don't. I would like to open all of them as a single dataset with this variable, but I get an error; I can only open one file with the tcc variable at a time. How can I change the code below so that it opens only the files that have the tcc variable and concatenates them?

#!/usr/bin/env python
# coding: utf-8

import os, sys
import xarray as xr
import pygrib
import pandas as pd
import windpowerlib
import numpy as np

from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

import metpy
import metpy.calc as mpcalc
from metpy.units import units

from eccodes import *

# Put the path to the GFS files here

path_list = '/media/william/PhD/DownloadRadiation/GFS/20200716/gfs*.grib2'


low_cloud  = xr.open_mfdataset(path_list, concat_dim='valid_time', decode_times=False, combine='nested', engine='cfgrib', backend_kwargs={ 'filter_by_keys':{ 'cfVarName': 'tcc', 'typeOfLevel': 'lowCloudLayer'},'indexpath':''})

But this returns an empty dataset. How can I read all the files correctly?

If I change the filter to select only the level type (typeOfLevel), I get the following error:

low_cloud  = xr.open_mfdataset(path_list, concat_dim='valid_time', decode_times=False, combine='nested', engine='cfgrib', backend_kwargs={ 'filter_by_keys':{'typeOfLevel': 'lowCloudLayer'},'indexpath':''})

Unexpected exception formatting exception. Falling back to standard exception

Traceback (most recent call last):
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code
    else 0x0
  File "/tmp/ipykernel_3037918/493588173.py", line 1, in <module>
    low_cloud  = xr.open_mfdataset(path_list, concat_dim='valid_time', decode_times=False, combine='nested', engine='cfgrib',
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/backends/api.py", line 1003, in open_mfdataset
    if parallel:
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/combine.py", line 365, in _nested_combine
    combined = _combine_nd(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/combine.py", line 239, in _combine_nd
    combined_ids = _combine_all_along_first_dim(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/combine.py", line 275, in _combine_all_along_first_dim
    new_combined_ids[new_id] = _combine_1d(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/combine.py", line 298, in _combine_1d
    combined = concat(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/concat.py", line 243, in concat
    fill_value=fill_value,
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/concat.py", line 504, in _dataset_concat
    grouped = {
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/merge.py", line 302, in merge_collected
    merged_vars[name] = unique_variable(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/xarray/core/merge.py", line 156, in unique_variable
    raise MergeError(
xarray.core.merge.MergeError: conflicting values for variable 'time' on objects to be combined. You can skip this check by specifying compat='override'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 1997, in showtraceback
    sys.last_traceback = tb
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1112, in structured_traceback
    etype, value, tb = sys.exc_info()
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/ultratb.py", line 1006, in structured_traceback
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/ultratb.py", line 859, in structured_traceback
    evalue: Optional[BaseException],
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/ultratb.py", line 793, in format_exception_as_a_whole
    pass
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/IPython/core/ultratb.py", line 848, in get_records
    formatter = None
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/stack_data/core.py", line 597, in stack_data
    yield from collapse_repeated(
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/stack_data/utils.py", line 84, in collapse_repeated
    else:
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/stack_data/core.py", line 587, in mapper
    return cls(f, options)
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/stack_data/core.py", line 551, in __init__
    self.executing = Source.executing(frame_or_tb)
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/executing/executing.py", line 328, in executing
    # type: (Union[types.TracebackType, types.FrameType]) -> "Executing"
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/executing/executing.py", line 250, in for_frame
    self._nodes_by_line = defaultdict(list)
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/executing/executing.py", line 278, in for_filename
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/executing/executing.py", line 288, in _for_filename_and_lines
    filename = str(filename)
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/stack_data/core.py", line 97, in __init__
    return [
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/executing/executing.py", line 392, in asttokens
    # classes have a mappingproxy preventing us from using setdefault
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/asttokens/asttokens.py", line 73, in __init__
    self._line_numbers.line_to_offset(*start),
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/asttokens/asttokens.py", line 86, in mark_tokens
    return self._text[start: end]
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/asttokens/mark_tokens.py", line 61, in visit_tree
    util.visit_tree(node, self._visit_before_children, self._visit_after_children)
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/asttokens/util.py", line 246, in visit_tree
    ``par_value`` is as returned from ``previsit()`` of the parent, and ``value`` is as
  File "/home/william/anaconda3/envs/WRF/lib/python3.9/site-packages/asttokens/mark_tokens.py", line 87, in _visit_after_children
    if util.is_empty_astroid_slice(child):
AttributeError: module 'asttokens.util' has no attribute 'is_empty_astroid_slice'
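The `MergeError` in the first traceback is the actual problem (the second traceback is IPython failing to format it, as the "Falling back to standard exception" line says). In cfgrib's output `time`, the forecast reference time, is typically a scalar coordinate, so when two inputs carry different values for it, a merge has no single value to keep. A minimal sketch with two hypothetical in-memory datasets (made-up grid and values standing in for two opened GRIB files, so no GRIB data is needed) shows the conflict, and shows that `xr.concat(..., dim="time")` instead promotes the scalar coordinate to a new dimension. The `compat='override'` suggested by the message would avoid the error only by silently keeping the first file's values.

```python
import numpy as np
import xarray as xr


def make_ds(ref_time: str, value: float) -> xr.Dataset:
    """Hypothetical stand-in for one GFS file opened with cfgrib:
    a 2x2 'tcc' field plus a scalar 'time' (reference time) coordinate."""
    return xr.Dataset(
        {"tcc": (("latitude", "longitude"), np.full((2, 2), value))},
        coords={
            "latitude": [10.0, 20.0],
            "longitude": [30.0, 40.0],
            "time": np.datetime64(ref_time, "ns"),
        },
    )


ds0 = make_ds("2020-07-16T00:00:00", 10.0)
ds1 = make_ds("2020-07-16T06:00:00", 20.0)

# merge expects 'time' (and 'tcc') to agree across inputs; here they differ.
try:
    xr.merge([ds0, ds1], compat="no_conflicts")
    merge_failed = False
except xr.MergeError:
    merge_failed = True
print(merge_failed)  # True

# concat along 'time' turns the scalar coordinate into a length-2 dimension.
combined = xr.concat([ds0, ds1], dim="time")
print(combined.tcc.dims)  # ('time', 'latitude', 'longitude')
```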

Solution

  • open_mfdataset performs two steps for you: it first opens every file as a separate dataset, then merges/concatenates them as needed. Because this hides what is going on behind the scenes, here is some general advice for building complex multi-file datasets in Xarray:

    1. Start small and gradually add complexity
    2. Open the datasets individually and try to concatenate/merge them manually before passing them to open_mfdataset.
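    import glob
    import xarray as xr
    
    # sort the paths: glob returns them in arbitrary order
    files = sorted(glob.glob('path/to/files/*.grib2'))
    
    # cfgrib options from the question: a single level type, no .idx files
    kw = dict(engine='cfgrib',
              backend_kwargs={'filter_by_keys': {'typeOfLevel': 'lowCloudLayer'},
                              'indexpath': ''})
    
    # try merging or concatenating the first two datasets
    ds0 = xr.open_dataset(files[0], **kw)
    ds1 = xr.open_dataset(files[1], **kw)
    
    # merge combines *different* variables; see options here: https://docs.xarray.dev/en/stable/generated/xarray.merge.html
    ds_merged = xr.merge([ds0, ds1])
    
    # or concat stacks the same variables along a dimension; see options here: https://docs.xarray.dev/en/stable/generated/xarray.concat.html
    ds_concat = xr.concat([ds0, ds1], dim='time')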
    

    Then, once you have figured out a pattern for combining your data, try open_mfdataset with those parameters. For example:

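    # concat_dim is only accepted together with combine='nested'
    xr.open_mfdataset(files, combine='nested', concat_dim='time', engine='cfgrib',
                      backend_kwargs={'filter_by_keys': {'typeOfLevel': 'lowCloudLayer'},
                                      'indexpath': ''})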
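To address the original question directly, one option is to open each file on its own, keep only the ones that actually contain `tcc`, and concatenate the survivors. This is a minimal sketch under these assumptions: the cfgrib options are the ones from the question; a file whose filter matches no `tcc` message opens as a dataset without that variable (consistent with the empty result described in the question); and `open_files_with_var`, `open_grib` and `CFGRIB_KWARGS` are hypothetical names. The `opener` argument exists only so the filtering logic can be tried without GRIB files.

```python
from typing import Callable, List, Optional

import xarray as xr

# cfgrib options taken from the question: only low-cloud-layer messages,
# and indexpath='' so no .idx files are written next to the data.
CFGRIB_KWARGS = {
    "filter_by_keys": {"typeOfLevel": "lowCloudLayer"},
    "indexpath": "",
}


def open_grib(path: str) -> xr.Dataset:
    """Open one GRIB2 file with the cfgrib engine (requires cfgrib + ecCodes)."""
    return xr.open_dataset(path, engine="cfgrib", backend_kwargs=CFGRIB_KWARGS)


def open_files_with_var(
    paths: List[str],
    var: str = "tcc",
    dim: str = "valid_time",
    opener: Callable[[str], xr.Dataset] = open_grib,
) -> Optional[xr.Dataset]:
    """Open each file, keep only those containing `var`, concatenate along `dim`.

    Returns None if no file contains `var`.
    """
    keep = []
    for path in sorted(paths):  # sorted so the result is in file-name order
        ds = opener(path)
        if var in ds.data_vars:
            keep.append(ds[[var]])  # keep `var` and its coordinates only
        else:
            ds.close()  # this file has no `var`: skip it
    if not keep:
        return None
    # A scalar coordinate named `dim` is promoted to a new dimension here.
    return xr.concat(keep, dim=dim)
```

For the question's data this would be `low_cloud = open_files_with_var(glob.glob('/media/william/PhD/DownloadRadiation/GFS/20200716/gfs*.grib2'))` (with `import glob`). The concatenation dimension is a parameter: `'valid_time'` as in the question, or `'time'` as in the answer above. Opening files one at a time is slower than `open_mfdataset(..., parallel=True)`, but it makes the skipped files explicit.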