Python graphic interface for data visualisation that "leaks data"


Sorry for the vague title. My problem is a strange one, so here it is:

I am a PhD student writing a Tkinter interface to manipulate my data, using pandas to build dataframes from CSV files and matplotlib to plot them, so that I can visualize the data and create figures easily. Each open CSV file is loaded as an element (which I call a widget) in a list on the right of the window. Here is how it looks:

[image: interface with two csv files loaded]

It already works well for basic plots, but I need to be able to select specific zones of the plots (using matplotlib's RectangleSelector). The goal is to add as many rectangle selections as I want by clicking "add selection" each time I draw a rectangle, then click "plot selection" to open a new instance of the interface containing only the selected data. I can also click "cancel selection" to discard the current selection. Here is the new instance that appears after clicking "plot selection":

[image: second instance called by the first one]


Problems:

  • all "widgets" stay loaded even though I only selected data from the orange one before plotting
  • most important: the "cancel selection" function doesn't work. If I select a part of the data, click "cancel selection", select a different part of the data, and then click "plot selection", both parts are loaded in the new interface window. Sometimes it even creates multiple instances of the same "widget" (the same dataframe with the same filename appears twice)

Here is the initialization of the two classes, "PlotWindow" (a bad name for the main window interface) and "Data_Widget" (each widget represents one dataframe and appears on the right of the main window). It is long because I left the comments in; the whole code is about 600 lines:

class PlotWindow(tk.Frame):
    """ This Class creates the main window with the curve plot and general buttons like open files,
Clear all, Plot all.

    Arguments:
    -master: a parent window, not necessary
    -existing_data_widgets_list: used by the button "plot multiple selection" to create a new instance of PlotWindow with already selected data
    -existing_filepath_list: the corresponding list of filepath for the existing_data_widgets_list

    Methods:
    -create_general_controls : creates general controls like Open file, clear all, plot all...
    -matplotlib_spec : defines default matplotlib specifications
    -display_error : displays a message of error, used by other methods  (example: "no data loaded" or "no selection") can also be used for informative messages (example: "plot cleared")
    -load_folder : allows to select multiple folders and detects all CSV file in them
    -load_CSVfile : allows to select multiple CSV files
    -plot_widget : called by each widget method master_plot to plot widget curve with selected widget variables
    -clear_widget_plots : called by each widget method master_clear_plot to clear the widget plot(s)
    -clear_all_plot : clear all plots
    -add_selection : allows to select a range of data from multiple curves with rectangle selector can be called again to add selection to already selected data
    -cancel_selection : empty the selection of the add_selection method
    -plot_add_selection : plot the selection of the add_selection method in another PlotWindow instance creates a widget in this new instance for each widget that have been selected in previous PlotWindow
    -create_createvar_popup : create popup that allows to the create a new variable as function of existing variables
    -linear_regression_selection : make a linear regression of rectangle selection. plot the line on the rectangle selection
    """
    def __init__(self, master=None, existing_data_widgets_list=None, existing_filepath_list=None):
        super().__init__(master)
        root.title("its plotin time")
        self.master = master
        self.pack()

        self.widgets_list = []                                      #stores all instances of Data_Widget class created
        self.lines = []                                             #stores the plt.plot objects (the curves) to be able to hide/show them
        self.lines_ids = []                                         #stores a number for each plot (each plot_id = widget_id it is plotted from)
        self.current_widget_id = 0                                  #number for each widget created so each widget is identifiable
        self.widget_ids_list=[]                                     #stores each widget ids (1 data widget=1id)
        self.selected_widget_data_list= []                          #used for multiple selection
        self.selected_widget_filepath_list = []                     #used for multiple selection
        #None defaults instead of [] so that PlotWindow instances never share the same default list object
        #both lists are only used if PlotWindow is created with existing widgets
        self.existing_data_widgets_list = existing_data_widgets_list if existing_data_widgets_list is not None else []
        self.existing_filepath_list = existing_filepath_list if existing_filepath_list is not None else []
    
        self.create_general_controls()                              #create all buttons and graphic elements
        self.matplotlib_spec()
    ...

class Data_Widget():
    """ This Class creates one data widget: the element shown on the right of the main window for a loaded dataframe, with its own controls.

    Arguments:
    -master: the parent window, automatically the PlotWindow class
    -data: data loaded into the widget self.data used to plot the curve
    -filepath: the filepath of the loaded data
    -color: color of the curve

    methods:
    -create_widget_controls : create the controls (buttons...)
    -set_widget_color : set the color of the associated curve
    -set_x_var and set_y_var : set the x and y var of the plotted curve
    -master_plot : gives the order of plotting the curve to the master class (PlotWindow) (method plot_widget)
    -master_clear_plot : gives the order of clearing the plot to the master class (PlotWindow) (method clear_widget_plots)
    -delete_widget : delete the widget
    -create_exportload_popup : export the data contained in the widget as CSV file.
                Used where widget has been created by selecting data from another widget with PlotWindow multiple selection method
    """
    def __init__(self, master, data = pd.DataFrame(), filepath='Nothing loaded', color="blue"):
        super().__init__()
        self.master = master
        # Create variables for data and plot
        self.widget_id=0                            #id of the widget   
        self.filepath=filepath                      #filepath of the loaded data
        self.data = data                            #dataframe loaded into the widget
        self.selected_data = pd.DataFrame()         #data selected by the user
        self.x_var = None                           #x variable of the plotted curve
        self.y_var = None                           #y variable of the plotted curve
        self.color = color                          #color of the plotted curve
        self.create_widget_controls()               #create the controls (buttons...)
    ...

Here is the code for the functions "add selection", "plot selection" and "cancel selection":

def add_selection(self):
    if not self.widgets_list:
        self.display_error('no data selected')
        return
    onselect_x1, onselect_y1, onselect_x2, onselect_y2 = self.selection_coords
    selected_data_index=[]
    mask=[]
    present_selected_data=[]
    idx=[]
    for i in self.widgets_list:
        # this variable will be the list of index where the row corresponds to the selection
        mask = (i.data[i.x_var] >= onselect_x1) & (i.data[i.x_var] <= onselect_x2) & \
            (i.data[i.y_var] >= onselect_y1) & (i.data[i.y_var] <= onselect_y2)
        selected_data_index = np.where(mask)[0]
        present_selected_data=i.data.iloc[selected_data_index]
        # tests if we already have selected data and, if yes, (.empty is false) we put the
        # selection in growing order with a row of NaN values inbetween
        if i.selected_data.empty:
            i.selected_data = i.data.iloc[selected_data_index]
        else:
            # present_selected_data = i.data.iloc[selected_data_index]
            # I don't quite understand this but chatgpt is too strong
            idx = np.searchsorted(i.selected_data.index, present_selected_data.index[0])
            i.selected_data = pd.concat([
                pd.DataFrame(np.nan, index=[0], columns=i.data.columns),
                i.selected_data.iloc[:idx],
                pd.DataFrame(np.nan, index=[0], columns=i.data.columns),
                present_selected_data,
                i.selected_data.iloc[idx:],
                pd.DataFrame(np.nan, index=[0], columns=i.data.columns)
            ])
        i.selected_data.reset_index(drop=True, inplace= True)
        self.selected_widget_data_list.append(i.selected_data)
        self.selected_widget_filepath_list.append(i.filepath)
    if self.selected_widget_data_list == []:
        self.display_error('no data selected')
        return
    if self.cancel_selection_button.winfo_viewable() == 0:
        self.cancel_selection_button.pack()
    if self.plot_add_selection_button.winfo_viewable() == 0:
        self.plot_add_selection_button.pack()
    self.display_error('added to selection')

def cancel_selection(self):
    self.selected_widget_data_list = []
    self.selected_widget_filepath_list = []
    self.cancel_selection_button.pack_forget()
    self.plot_add_selection_button.pack_forget()
    self.display_error('selection canceled')

def plot_add_selection(self):
    self.display_error('opening another window')
    root = tk.Tk()                                  #creates a new window to put the new PlotWindow into
    PlotWindow(root, self.selected_widget_data_list, self.selected_widget_filepath_list)
    root.mainloop()
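
For the np.searchsorted call in add_selection: it returns the position at which a value would have to be inserted into an already sorted array to keep it sorted. Here is a minimal sketch with made-up row labels (illustrative values, not data from the interface), assuming the stored labels are in growing order:

```python
import numpy as np

# Hypothetical row labels of an earlier selection, already in growing order.
stored_labels = np.array([10, 11, 12, 40, 41])
# Hypothetical row labels of the selection being added.
new_labels = np.array([20, 21, 22])

# Position in stored_labels where the first new label belongs.
idx = np.searchsorted(stored_labels, new_labels[0])

# Splice the new block in at that position, the way the concat call
# splices DataFrame slices around the new selection.
merged = np.concatenate([stored_labels[:idx], new_labels, stored_labels[idx:]])
print(idx, merged.tolist())
```

The splice only lands in the right place if the array being searched is sorted and uses the same labels as the new selection.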

I tried resetting all the temporary variables used by the "add_selection" function by placing

selected_data_index=[]
mask=[]
present_selected_data=[]
idx=[]

at the end of the function but it changed nothing.
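
A minimal sketch of why clearing those names has no effect: they are local to the function and are discarded when it returns anyway, while anything stored on an object persists between calls. The `Widget` class and `add` function below are hypothetical stand-ins, not the real Data_Widget or add_selection, and a plain list replaces the DataFrame.

```python
class Widget:
    """Hypothetical stand-in for Data_Widget; a list replaces the DataFrame."""
    def __init__(self):
        self.selected_data = []


def add(widget, rows):
    present = list(rows)                   # local name, rebuilt on every call
    widget.selected_data.extend(present)   # state stored on the object, kept between calls
    present = []                           # rebinding the local changes nothing outside


w = Widget()
add(w, [1, 2])
add(w, [7, 8])
print(w.selected_data)  # [1, 2, 7, 8]
```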

I am still learning Python and consider myself a beginner. There may be obvious issues I did not see, but I feel a bit overwhelmed by all this code. This is my first post here, so I hope I was clear enough.


Solution

  • Thank you @TheLizzard for your answer, but I just found the solution! The problem was that I needed to reset each widget's selected_data. I added the line i.selected_data=pd.DataFrame() at the end of the for loop in the add_selection function and it worked perfectly. I then moved this reset into the cancel_selection function for clarity.

    My intuition was also that instances of Data_Widget or PlotWindow were not being destroyed. Maybe the problem could have been solved that way too, but that's too deep for me.

    After correcting this I noticed another bug: after clicking "plot selection", making another selection would create a second widget for the same dataframe, the first widget containing the first selection and the second containing both the first and second selections. I fixed it by moving the population of self.selected_widget_data_list and self.selected_widget_filepath_list into the plot function.

    I also added an intermediary function inside the add function for readability.

    Here is the new version of the 3 functions:

    def add_selection(self):
        # print('before add, selected widget data list: \n', self.selected_widget_data_list)
        if not self.widgets_list:
            self.display_error('no data selected')
            return
        
        def process_widget_dataselection(widget):
            onselect_x1, onselect_y1, onselect_x2, onselect_y2 = self.selection_coords
            
            mask = (widget.data[widget.x_var] >= onselect_x1) & \
                (widget.data[widget.x_var] <= onselect_x2) & \
                (widget.data[widget.y_var] >= onselect_y1) & \
                (widget.data[widget.y_var] <= onselect_y2)
            selected_data_index = np.where(mask)[0]
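            present_selected_data = widget.data.iloc[selected_data_index]
            if present_selected_data.empty:
                # no point of this widget is inside the rectangle: keep any earlier
                # selection untouched (index[0] below would raise IndexError otherwise)
                return
            if widget.selected_data.empty: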
                widget.selected_data = present_selected_data
            else:
                idx = np.searchsorted(widget.selected_data.index, present_selected_data.index[0])
                widget.selected_data = pd.concat([
                    pd.DataFrame(np.nan, index=[0], columns=widget.data.columns),
                    widget.selected_data.iloc[:idx],
                    pd.DataFrame(np.nan, index=[0], columns=widget.data.columns),
                    present_selected_data,
                    widget.selected_data.iloc[idx:],
                    pd.DataFrame(np.nan, index=[0], columns=widget.data.columns)
                ])
            widget.selected_data.reset_index(drop=True, inplace=True)
        
        nb_widget_selected = 0
        for widget in self.widgets_list:
            process_widget_dataselection(widget)
            if not widget.selected_data.empty:
                nb_widget_selected += 1
        
        if nb_widget_selected == 0:
            self.display_error('no data selected')
            return
        if self.cancel_selection_button.winfo_viewable() == 0:
            self.cancel_selection_button.pack()
        if self.plot_add_selection_button.winfo_viewable() == 0:
            self.plot_add_selection_button.pack()
        self.display_error('added to selection')
    
    def cancel_selection(self):
        self.selected_widget_data_list = []
        self.selected_widget_filepath_list = []
        for widget in self.widgets_list:
            widget.selected_data = pd.DataFrame()
        self.cancel_selection_button.pack_forget()
        self.plot_add_selection_button.pack_forget()
        self.display_error('selection canceled')
        
    def plot_add_selection(self):
        self.display_error('opening another window')
        self.selected_widget_data_list=[]
        self.selected_widget_filepath_list=[]
        widgets_x_var=[]
        widgets_y_var=[]
        for widget in self.widgets_list:
            if widget.selected_data.empty:
                self.display_error('no data selected in ' + str(widget.filepath))
            else:
                self.selected_widget_data_list.append(widget.selected_data)
                self.selected_widget_filepath_list.append(widget.filepath)
                widgets_x_var.append(widget.x_var)
                widgets_y_var.append(widget.y_var)
        print('before plot cancel, number of data selected_widget_data_list: \n', len(self.selected_widget_data_list))
        root = tk.Tk()                                  #creates a new window to put the new PlotWindow into
        PlotWindow(root, self.selected_widget_data_list, self.selected_widget_filepath_list, widgets_x_var, widgets_y_var)
        root.mainloop()
    

    Sorry for the useless post, somebody tell me if I should delete it or not.
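
The two fixes described above can be condensed into a small sketch, with plain lists standing in for the DataFrames and no Tkinter. `Widget` and `Window` are hypothetical names, and the methods only mimic the bookkeeping of add_selection, cancel_selection and plot_add_selection, not the real interface.

```python
class Widget:
    """Hypothetical stand-in for Data_Widget."""
    def __init__(self, filepath, data):
        self.filepath = filepath
        self.data = data
        self.selected_data = []          # per-widget selection state


class Window:
    """Hypothetical stand-in for PlotWindow, bookkeeping only."""
    def __init__(self, widgets):
        self.widgets_list = widgets

    def add_selection(self, low, high):
        # Each widget accumulates its own selection.
        for widget in self.widgets_list:
            widget.selected_data += [v for v in widget.data if low <= v <= high]

    def cancel_selection(self):
        # Reset the state held by every widget, not only the window's lists.
        for widget in self.widgets_list:
            widget.selected_data = []

    def plot_add_selection(self):
        # Build the list at plot time: one entry per widget that has a selection.
        return [(w.filepath, w.selected_data)
                for w in self.widgets_list if w.selected_data]


window = Window([Widget("a.csv", [1, 2, 3, 4, 5, 6]), Widget("b.csv", [10, 20, 30])])
window.add_selection(1, 2)
window.cancel_selection()
window.add_selection(5, 6)
print(window.plot_add_selection())  # [('a.csv', [5, 6])]
```

Because the list is rebuilt from the widgets on every call, adding a further selection and plotting again still yields one entry per file instead of a duplicate.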