
Python function for determining whether a selected file contains time series data does not work


I am trying to create an application for classifying time series data. I created a function that is supposed to check whether a file contains time series data or not.

To test it, I downloaded a time series dataset from the website below and ran my function on it, but unfortunately it does not work as intended.

Here is the dataset I used for testing: https://www.timeseriesclassification.com/description.php?Dataset=ACSF1

Here is my code. The function called 'is_time_series' is the one having problems, and it is triggered when I press "Train Test". Please note that this function was generated by ChatGPT.

The function below is part of the main.py file.


def is_time_series(self, file_path):
    try:
        # Assuming the file is comma-separated
        df = pd.read_csv(file_path, header=None)
        
        # Check if the data is numerical
        if not all(df.dtypes.apply(lambda x: np.issubdtype(x, np.number))):
            return False
        
        # Optionally, you can add more logic here to verify time series characteristics
        # For example, check if the first column is monotonic or if there are multiple columns of data.
        # Since your data doesn't have explicit datetime columns, we assume it's a valid time series.

        return True

    except Exception as e:
        print(f"Error reading file: {e}")
        return False
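
A likely reason the check fails, assuming the file tested was the `.ts` version of the dataset: that format is not a plain numeric CSV. It starts with `#` comment lines and `@` metadata lines, and each series ends with a `:label` suffix, so `pd.read_csv(..., header=None)` most likely either fails to tokenize it or produces non-numeric columns. The sketch below reads a made-up miniature of that layout by hand, using only the standard library. `SAMPLE_TS` and `parse_ts_text` are hypothetical names for illustration, and only the univariate case is handled.

```python
import io

# Hypothetical miniature of a .ts file (not real ACSF1 data):
# '#' comments, '@' metadata, then one series per line with the
# class label after the last ':'.
SAMPLE_TS = """\
# demo file, made up for illustration
@problemName Demo
@univariate true
@classLabel true a b
@data
1.0,2.0,3.0:a
4.0,5.0,6.0:b
"""


def parse_ts_text(stream):
    """Return (series, labels) from .ts-style text, or raise ValueError."""
    series, labels = [], []
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not in_data:
            if line.lower() == "@data":
                in_data = True
            elif not line.startswith("@"):
                raise ValueError(f"unexpected line before @data: {line!r}")
            continue
        # Univariate case only: "v1,v2,...:label"
        values, sep, label = line.rpartition(":")
        if not sep:
            raise ValueError(f"missing ':label' suffix: {line!r}")
        # float() raises ValueError on anything non-numeric
        series.append([float(v) for v in values.split(",")])
        labels.append(label)
    if not series:
        raise ValueError("no series found after @data")
    return series, labels


series, labels = parse_ts_text(io.StringIO(SAMPLE_TS))
print(len(series), "series, labels:", labels)
```

A dedicated loader is still the safer choice for real files, since the actual format also allows multivariate and unequal-length series that this sketch ignores.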
        

This is where the function is being called


def validate_inputs(self, classifierSelection):
        # Getting `train_data_entry` and `test_data_entry` from singleDataset.py
        train_file = classifierSelection.train_data_entry.get()
        test_file = classifierSelection.test_data_entry.get()
        numRuns = classifierSelection.runEntry.get()
        custom_classifier_file = classifierSelection.custom_classifier_entry.get()

        if not train_file:
            self.show_error("Error: Please select a training data file.")
            return False
        if not self.is_time_series(train_file):
            self.show_error("Error: Training data file does not seem to contain valid time series data.")
            return False
        if not test_file:
            self.show_error("Error: Please select a testing data file.")
            return False
        if not self.is_time_series(test_file):
            self.show_error("Error: Testing data file does not seem to contain valid time series data.")
            return False
        
        if custom_classifier_file and not custom_classifier_file.endswith(".py"):
            self.show_error("Error: Custom classifier file must end with '.py'.")
            return False
        
        if not self.checkRuns(numRuns):
            self.show_error("Error: The number of runs cannot be less than 1 or empty")
            return False

        return True
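
The `checkRuns` method is not shown in the question. Going only by the error message ("cannot be less than 1 or empty"), a minimal stand-in might look like the sketch below. The name `check_runs` and its exact behaviour are assumptions, not the original implementation.

```python
def check_runs(num_runs_text):
    """Return True if the entry text holds a whole number >= 1.

    Hypothetical stand-in for the question's checkRuns method.
    """
    try:
        return int(num_runs_text.strip()) >= 1
    except (ValueError, AttributeError):
        # ValueError: empty or non-integer text; AttributeError: None
        return False


for text in ["3", "0", "", "abc"]:
    print(repr(text), check_runs(text))
```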

Solution

  • The following code worked for me. I was already planning to use a library that provides different classifiers for my application, so I used the load_from_tsfile function from the aeon library, and now my code works fine.

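    # Imports needed at the top of main.py
    import numpy as np
    import pandas as pd
    from aeon.datasets import load_from_tsfile

    def is_time_series(self, file_path):
        try:
            # Attempt to load the file using aeon's load_from_tsfile function
            X, y = load_from_tsfile(file_path)
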
            if isinstance(X, pd.DataFrame):
                # Check if there's at least one data point
                if X.shape[0] < 1 or X.shape[1] < 1:
                    return False
                
                # Check if the data is numerical
                if not all(X.dtypes.apply(lambda x: np.issubdtype(x, np.number))):
                    return False
            
            # If all checks pass, return True
            return True
    
        except Exception as e:
            print(f"Error reading file: {e}")
            return False
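
One caveat, worth checking against the installed aeon version: as far as I know, recent releases of `load_from_tsfile` return `X` as a NumPy array (or a list of arrays for unequal-length series) rather than a DataFrame. In that case the `isinstance(X, pd.DataFrame)` branch is skipped and any file that loads at all passes. The sketch below shows one way the same emptiness and numeric checks could be applied to those containers. `looks_numeric` is a hypothetical helper name, and it does not cover the DataFrame case.

```python
import numpy as np


def looks_numeric(X):
    """True if X is a non-empty, all-numeric array or list of arrays."""
    if isinstance(X, np.ndarray):
        return X.size > 0 and np.issubdtype(X.dtype, np.number)
    # Otherwise assume a list-like of per-case arrays (unequal lengths)
    cases = [np.asarray(case) for case in X]
    return bool(cases) and all(
        c.size > 0 and np.issubdtype(c.dtype, np.number) for c in cases
    )


equal_length = np.zeros((2, 1, 3))  # (cases, channels, timepoints)
unequal = [np.array([[1.0, 2.0]]), np.array([[1.0, 2.0, 3.0]])]
print(looks_numeric(equal_length), looks_numeric(unequal))
```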