Search code examples
pythonarraysloopsopenpyxlxlsx

Issues converting xlsx to 2d array openpyxl


I have an excel spreadsheet with 2 columns that I am attempting to convert to a 2d array with the following code:

#!/usr/bin/python3

import openpyxl
import sys
import os

book = openpyxl.load_workbook('contact2019.xlsx')

sheet = book.active


first_sheet = book.get_sheet_names()[0]
worksheet = book.get_sheet_by_name(first_sheet)
excel_data = [[0 for x in range(2)] for y in range(1)]

print(len(excel_data))
first = 0
cell_num = 0
for row in range(2,worksheet.max_row+1):
    for column in "AB":  #Here you can add or reduce the columns
        cell_name = "{}{}".format(column, row)
        excel_data.append(worksheet[cell_name].value)



print(excel_data)

My issue is that the data is simply being added to a 1d array sequentially, so my output of this is as follows:

['Sam Adams', '*******@gmail.com']

This is not a 2d array which I have initialized.

Input data is as follows:

Sam Adams   **********@gmail.com
Sammy Adams **********@gmail.com
Samuel Adams    **********@gmail.com
Samantha Adams  **********@gmail.com
Sam Adams   **********@gmail.com

Why is this assigning data sequentially instead of two per row?


Solution

  • Welcome to SO!

    Your code below iterates through and adds each item as a separate item, hence you get everything in sequential order.

    for row in range(2,worksheet.max_row+1):
        for column in "AB":  #Here you can add or reduce the columns
            cell_name = "{}{}".format(column, row)
            excel_data.append(worksheet[cell_name].value)
    

    Rather than looping through first the rows and then all the columns, you could replace the code to only loop through the rows.

    for row in range(2,worksheet.max_row+1):        
        excel_data.append([worksheet["A{}".format(row)].value,worksheet["B{}".format(row)].value])
    

    By doing this you are now creating a list of 2 element lists (or a 2d array).

    If you aren't set on doing it this way, you can also consider the python library pandas which abstracts out a lot of the work and lets you work with spreadsheet like objects called DataFrames.