I am running into an issue when trying to remove duplicates from a list.
    def my_list_bucket(self, bucketName, limit=sys.maxsize): #delimiter='/'):
        a_bucket = self.storage_client.lookup_bucket(bucketName)
        bucket_iterator = a_bucket.list_blobs()
        for resource in bucket_iterator:
            path_parts = resource.name.split('/')
            date_folder = path_parts[0]
            publisher_folder = path_parts[1]
            desired_path = date_folder + '/' + publisher_folder + '/'
            new_list = []
            for path in desired_path:
                if desired_path not in new_list:
                    new_list.append(desired_path)
            print(new_list)
            limit = limit - 1
            if limit <= 0:
                break
These are the results I get:
20230130/adelphic/
20230130/adelphic/
20230130/adelphic/
20230130/adelphic/
20230130/instacart/
20230130/instacart/
20230130/instacart/
20230130/instacart/
It's not removing the duplicates from the list; they are still there.
The results I want are:
20230130/adelphic/
20230130/instacart/
I have tried new_list = list(set(publisher_folder))
and it returns:
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'
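When you do:
    for path in desired_path:
it is essentially:
    for character in desired_path:
because desired_path is a string that looks like "20230130/adelphic/".
As written, your code walks over each string one character at a time, appends the whole string on the first character, and skips it for the rest, so you end up printing the original string again. And since new_list is reset to [] for every blob, it never remembers paths from earlier iterations, which is why the duplicates survive.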
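To make the string-iteration point concrete, here is a small standalone sketch. It uses the sample values from the question as plain strings (no bucket involved) and shows why both the inner loop and list(set(publisher_folder)) end up working with single characters:

```python
desired_path = "20230130/adelphic/"

# Iterating over a string visits one character at a time.
chars = [path for path in desired_path]
print(chars[:4])  # ['2', '0', '2', '3']

# set() on a string likewise collects its distinct characters
# (sorted here only to get a stable order for printing).
publisher_folder = "adelphic"
distinct_chars = sorted(set(publisher_folder))
print(distinct_chars)  # ['a', 'c', 'd', 'e', 'h', 'i', 'l', 'p']

# Wrapping the string in a list makes the whole string the element.
whole = list(set([publisher_folder]))
print(whole)  # ['adelphic']
```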
I assume what you want is a list of the distinct strings, which might be done by:
    import sys
    def my_list_bucket(self, bucketName, limit=sys.maxsize): #delimiter='/'):
        a_bucket = self.storage_client.lookup_bucket(bucketName)
        # Collect paths in a set so each one is stored only once.
        new_list = set()
        for resource in a_bucket.list_blobs():
            # Keep the first two path components, e.g. "20230130/adelphic/".
            new_list.add(f"{ '/'.join(resource.name.split('/')[:2]) }/")
            limit -= 1
            if not limit:
                break
        new_list = list(new_list)
        print(new_list)
or potentially:
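    import sys
    from itertools import islice
    def my_list_bucket(self, bucketName, limit=sys.maxsize): #delimiter='/'):
        a_bucket = self.storage_client.lookup_bucket(bucketName)
        # list_blobs() returns an iterator, which cannot be sliced with
        # [:limit]; islice() takes at most `limit` items from it instead.
        new_list = list(set(
            f"{ '/'.join(resource.name.split('/')[:2]) }/"
            for resource in islice(a_bucket.list_blobs(), limit)
        ))
        print(new_list)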
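Note that a set does not keep the order in which paths were first seen. If that order matters, one option is dict.fromkeys, sketched below without any bucket access. The function name distinct_prefixes, its depth parameter, and the sample file names are made up for illustration; in the real method the names would come from resource.name.

```python
from itertools import islice

def distinct_prefixes(names, depth=2, limit=None):
    """Return the distinct first `depth` path components, in first-seen order."""
    prefixes = (
        "/".join(name.split("/")[:depth]) + "/"
        for name in islice(names, limit)  # limit=None means no cap
    )
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(prefixes))

# Hypothetical blob names standing in for the bucket contents.
sample_names = [
    "20230130/adelphic/a.csv",
    "20230130/adelphic/b.csv",
    "20230130/instacart/a.csv",
    "20230130/instacart/b.csv",
]
print(distinct_prefixes(sample_names))
# ['20230130/adelphic/', '20230130/instacart/']
```

Passing limit=2 would stop after the first two names and return only ['20230130/adelphic/'].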