How to split on many different delimiters when assigning to dictionary


To practice working with dictionaries, I am trying to write a program that reads the chemical composition of the lunar atmosphere and stores each element and its estimated composition as a key-value pair, like this: "NEON 20": 40000

The data file looks like this:

Estimated Composition (night, particles per cubic cm):
Helium 4 - 40,000 ; Neon 20 - 40,000 ; Hydrogen - 35,000
Argon 40 - 30,000 ; Neon 22 - 5,000 ; Argon 36 - 2,000
Methane - 1000 ; Ammonia - 1000 ; Carbon Dioxide - 1000

And my code so far looks like this:

def read_data(filename):
    dicti = {}

    with open(filename,"r") as infile:
        infile.readline()

        for line in infile:
            words = line.split(";")
            dicti[words[0]] = f"{words[1]}"

    for key in dicti:
        print(key, dicti[key])

read_data("atm_moon.txt")

My questions are:

  • How do I split on both "-" and ";"?
  • How do I assign the elements and their estimated atmospheric composition as a key-value pair in a simple and elegant way from this data file?
  • How do I make the element names all upper case?

Is there anyone who is kind enough to help a rookie out? All help is welcomed.


Solution

  • What you have here is a list of lines. Each line can contain multiple items, separated by semicolons. Each item (or record) consists of an element name, a hyphen, and the particle count.

    You don't need to split on different delimiters at the same time here; instead, you can split out the individual items using the semicolons, and then split each item into the key/value pair you need for your dictionary based on the hyphen.

    for line in infile:
        # strip() drops the trailing newline so it doesn't end up in the last value
        for item in line.strip().split(" ; "):
            key, value = item.split(" - ", 1)
            dicti[key.upper()] = value
    

    Note that I'm including the spaces around your delimiters, so they are removed when you split. Otherwise the spaces would end up in your dictionary's keys and values. An alternative is to split on the bare delimiter and call strip() on each piece; that way it works properly even if there are more (or no) spaces there.

    for line in infile:
        for item in line.split(";"):
            key, value = item.split("-", 1)
            dicti[key.strip().upper()] = value.strip()
    

    However, if there's any chance that one of your records might have a semicolon or a hyphen in it that's not meant to be a separator, I'd leave the spaces in the .split() call.
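
    To make that trade-off concrete, here is a small sketch using a made-up record, "Carbon-14 - 1,000". That record is not in your data file; it is only assumed here for illustration. Splitting on the bare hyphen cuts the name in the wrong place, while splitting on " - " keeps it intact.

```python
# Hypothetical record: "Carbon-14" is NOT in the original data file.
item = "Carbon-14 - 1,000"

# Bare hyphen: the first "-" found is the one inside the name.
loose_key, loose_value = item.split("-", 1)
# loose_key is "Carbon", loose_value is "14 - 1,000"

# Hyphen with surrounding spaces: only the real separator matches.
strict_key, strict_value = item.split(" - ", 1)
# strict_key is "Carbon-14", strict_value is "1,000"
```

    The price of the stricter form is that it depends on the spacing in the file being consistent.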

    Now I'm going to go a step further and assume that you will want those values as actual numbers, not just strings. To do this we'll remove the commas and convert them to integers.

    for line in infile:
        for item in line.split(";"):
            key, value = item.split("-", 1)
            dicti[key.strip().upper()] = int(value.strip().replace(",", ""))
    

    If there were any values with fractional parts (decimal points), you could use float() in place of int() to convert those to floating-point numbers.
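
    Putting the pieces together, a complete version might look roughly like the sketch below. The helper name parse_composition and the check that skips fragments without a hyphen (blank lines, for example) are my own additions, not something your original code had. It also assumes the first line of the file is the header, as in your sample.

```python
def parse_composition(lines):
    """Build {"ELEMENT NAME": count} from lines such as
    'Helium 4 - 40,000 ; Neon 20 - 40,000'."""
    composition = {}
    for line in lines:
        for item in line.split(";"):
            if "-" not in item:
                continue  # skip blank lines or fragments without a separator
            key, value = item.split("-", 1)
            # Upper-case the name; drop thousands separators before converting.
            composition[key.strip().upper()] = int(value.strip().replace(",", ""))
    return composition


def read_data(filename):
    with open(filename, "r") as infile:
        infile.readline()  # skip the header line
        return parse_composition(infile)


# Quick check on in-memory lines instead of a file.
sample = [
    "Helium 4 - 40,000 ; Neon 20 - 40,000 ; Hydrogen - 35,000\n",
    "Methane - 1000 ; Ammonia - 1000 ; Carbon Dioxide - 1000\n",
]
result = parse_composition(sample)
```

    Returning the dictionary instead of printing it inside the function lets the caller decide what to do with it, for example print(read_data("atm_moon.txt")["NEON 20"]).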