Search code examples
pythonlistsparql

How to link a "lists database" with Python?


This is a part of a larger program made to help translating roman dates in our current calendar.

I need to link a kind of "lists database" (all the Roman consulates and their date) as the following

def date_():
    if name in list1[Name A, Name B, Name C]:
        date = 509
        return date
    if name in list2[Name D, Name E]:
        date = 508
        return date
    if name in list3[Name F]:
        etc...

Aim is that the user enters a "Name" and then the date will be assigned to a variable. Then the name will be searched in all the lists and assigned to the corresponding date.

name = input("Consulate name: ")
date = date_()
print(name, " was consulate in ", date)

But doing that manually is pure suicide (we would have around 800 "if name in list[]:").

My questions here are

1° Is it ok to do like that ? Or am I just doing something very easy to understand but somewhat stupid ?

2° could I extract in some way the Wikipedia list for having the 800 lists already done with the names and a date assigned ? If no, should I do that in Excel and then find a way to make the lists ? Doing the 800 in Excel will be painful but still less than in Python. I could also try in a SPARQL in RDF here for selecting only the consuls with their name and the date of consulate but I don't know how it works...

Thanks for any thoughts !


Solution

  • Because values are in tables so you could use pandas to read page and get tables as DataFrames

    But it still may need some works to join tables and fill empty cells. And when you will have all data then you could keep them in dictionary.

    d = {"Name A": 509, "Name B": 509, "Name C": 509, "Name D": 508, "Name E": 508, ...}
    

    and use simple

    date = d.get(name, None)
    

    Simple code which gets all tables from Wikipedia page

    import pandas as pd
    
    all_tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_Roman_consuls')
    
    print('number of tables:', len(all_tables))
    
    for number, df in enumerate(all_tables):
        print('\n---', number, '---\n')
        print(df.head())
    

    Result:

    number of tables: 18
    
    --- 0 ---
    
                                                                                                                                      Ancient Rome
                                                                                                                                           Periods
      Roman Kingdom753–509 BC Roman Republic509–27 BC Roman Empire27 BC – AD 395 Principate Dominate WesternAD 395–476 EasternAD 395–1453 Timeline
    0                                 Roman Constitution                                                                                          
    1  Constitution of the Kingdom Constitution of th...                                                                                          
    2                                  Precedent and law                                                                                          
    3  Roman law Ius Imperium Mos maiorum Collegialit...                                                                                          
    4                                         Assemblies                                                                                          
    
    --- 1 ---
    
                                                       0  ...                                                  3
    0  A. = Aulus Ap. = Appius C. = Gaius Cn. = Gnaeu...  ...  Sex. = Sextus Sp. = Spurius T. = Titus Ti. = T...
    
    [1 rows x 4 columns]
    
    --- 2 ---
    
        Year                Consul prior              Consul posterior
    0    509            L. Junius Brutus      L. Tarquinius Collatinus
    1  suff.  Sp. Lucretius Tricipitinus         P. Valerius Poplicola
    2  suff.       M. Horatius Pulvillus                           NaN
    3    508    P. Valerius Poplicola II     T. Lucretius Tricipitinus
    4    507   P. Valerius Poplicola III  M. Horatius Pulvillus II[18]
    
    --- 3 ---
    
      Year  ...                       Consul posterior
    0  500  ...                     M'. Tullius Longus
    1  499  ...          C. Veturius Geminus Cicurinus
    2  498  ...        T. Lartius (Flavus or Rufus) II
    3  497  ...                  M. Minucius Augurinus
    4  496  ...  T. Verginius Tricostus Caeliomontanus
    
    [5 rows x 3 columns]
    
    --- 4 ---
    
      Year                   Consul prior            Consul posterior
    0  400              Consular Tribunes                         NaN
    1  NaN  P. Licinius Calvus Esquilinus      P. Maelius Capitolinus
    2  NaN               P. Manlius Vulso       Sp. Furius Medullinus
    3  NaN       L. Titinius Pansa Saccus  L. Publilius Philo Vulscus
    4  399              Consular Tribunes                         NaN
    
    --- 5 ---
    
        Year                    Consul prior                Consul posterior
    0    300    M. Valerius Maximus Corvus V              Q. Appuleius Pansa
    1    299             M. Fulvius Paetinus            T. Manlius Torquatus
    2  suff.                             NaN   M. Valerius Maximus Corvus VI
    3    298    L. Cornelius Scipio Barbatus  Cn. Fulvius Maximus Centumalus
    4    297  Q. Fabius Maximus Rullianus IV               P. Decius Mus III
    
    --- 6 ---
    
      Year                   Consul prior          Consul posterior
    0  200  P. Sulpicius Galba Maximus II         C. Aurelius Cotta
    1  199          L. Cornelius Lentulus       P. Villius Tappulus
    2  198        T. Quinctius Flamininus  Sex. Aelius Paetus Catus
    3  197          C. Cornelius Cethegus         Q. Minucius Rufus
    4  196             L. Furius Purpureo     M. Claudius Marcellus
    
    --- 7 ---
    
          Year                 Consul prior      Consul posterior
    0  100[63]                 C. Marius VI   L. Valerius Flaccus
    1       99                  M. Antonius  A. Postumius Albinus
    2       98  Q. Caecilius Metellus Nepos             T. Didius
    3       97       Cn. Cornelius Lentulus   P. Licinius Crassus
    4       96     Cn. Domitius Ahenobarbus   C. Cassius Longinus
    
    --- 8 ---
    
        Year  ...                     Consul posterior
    0  1[77]  ...   L. Aemilius Paullus (January–June)
    1  suff.  ...  M. Herennius Picens (July–December)
    2      2  ...                     P. Alfenus Varus
    3  suff.  ...    T. Quinctius Crispinus Valerianus
    4      3  ...                         M. Servilius
    
    [5 rows x 3 columns]
    
    --- 9 ---
    
             Year  ...                          Consul posterior
    0    101[134]  ...  Q. Articuleius Paetus II (January–March)
    1       suff.  ...                                       NaN
    2       suff.  ...                          M. Maecius Celer
    3       suff.  ...                                   ignotus
    4  suff.[135]  ...       L. Julius Marinus Caecilius Simplex
    
    [5 rows x 3 columns]
    
    --- 10 ---
    
             Year  ...                            Consul posterior
    0         201  ...                   M. Nonius Arrius Mucianus
    1         202  ...  Imp. Caesar M. Aurelius Antoninus Augustus
    2  suff.[182]  ...                      C. Cassius Regallianus
    3         203  ...                        P. Septimius Geta II
    4         204  ...                      M. Annius Flavius Libo
    
    [5 rows x 3 columns]
    
    --- 11 ---
    
      Year  ...                                   Consul posterior
    0  301  ...                                  Virius Nepotianus
    1  302  ...          C. Galerius Valerius Maximianus Caesar IV
    2  303  ...  Imp. Caesar M. Aurelius Valerius Maximianus Au...
    3  304  ...  Imp. Caesar M. Aurelius Valerius Maximianus Au...
    4  305  ...           C. Galerius Valerius Maximianus Caesar V
    
    [5 rows x 3 columns]
    
    --- 12 ---
    
        Year                  Consul prior               Consul posterior
    0  396.0  Flavius Arcadius Augustus IV  Flavius Honorius Augustus III
    1  397.0             Flavius Caesarius                 Nonius Atticus
    2  398.0           Flavius Eutychianus   Flavius Honorius Augustus IV
    3  399.0                     Eutropius      Flavius Mallius Theodorus
    4  400.0                    Aurelianus               Flavius Stilicho
    
    --- 13 ---
    
           Year                               Eastern Roman Consul
    0  535[205]                                 Flavius Belisarius
    1       536                          Post consulatum Belisarii
    2       537                       II post consulatum Belisarii
    3       538  Flavius Marianus Michaelius Gabrielius Archang...
    4       539          Flavius Strategius Apion Strategius Apion
    
    --- 14 ---
    
                      vteAncient Rome topics                           vteAncient Rome topics.1
    0                       Outline Timeline                                   Outline Timeline
    1                                 Epochs  Foundation Kingdom overthrow Republic Empire P...
    2  Foundation Kingdom overthrow Republic              Foundation Kingdom overthrow Republic
    3                                 Empire  Pax Romana Principate Dominate Western Empire ...
    4                           Constitution  History Kingdom Republic Empire Late Empire Se...
    
    --- 15 ---
    
                                           0                                                  1
    0  Foundation Kingdom overthrow Republic              Foundation Kingdom overthrow Republic
    1                                 Empire  Pax Romana Principate Dominate Western Empire ...
    
    --- 16 ---
    
                   0                                                  1
    0       Ordinary  Consul Censor Praetor Tribune Tribune of the P...
    1  Extraordinary  Rex Interrex Dictator Magister Equitum Decemvi...
    

    EDIT: This code get all tables, join them and save in 'roman_consuls.csv'

    import pandas as pd
    
    all_tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_Roman_consuls')
    
    df = all_tables[2]
    for number in range(3, 13):
        df = df.append(all_tables[number])
    
    # remove addnotations - ie. `[205]`
    df['Year'] = df['Year'].str.replace('\\[\d+\\]', '')
    df['Consul prior'] = df['Consul prior'].str.replace('\\[\d+\\]', '')
    df['Consul posterior'] = df['Consul posterior'].str.replace('\\[\d+\\]', '')
    
    # use value from previous rows (mostly from row in text 'Consular Tribunes') - `ffill` means `forward fill`
    df['Year'] = df['Year'].fillna(method='ffill')
    
    # TODO: remove rows with text `'Consular Tribunes'`
    #df.drop(df['Consul prior'] == 'Consular Tribunes', inplace=True)
    
    # save to file
    df.to_csv('roman_consuls.csv')
    

    and later you can read from file and search

    BTW: some names are many times in table so I always return list with years or None (but instead of None it could return empty list and then it can be always use with for-loop - even if list is empty)

    import pandas as pd
    
    # read from file
    df = pd.read_csv('roman_consuls.csv')
    
    def date_(name):
        mask1 = (df['Consul prior'] == name)
        mask2 = (df['Consul posterior'] == name)
        #mask1 = (df['Consul prior'].str.contains(name)) 
        #mask2 = (df['Consul posterior'].str.contains(name)) 
    
        rows = df[ mask1 | mask2 ]
    
        if len(rows) > 0:
            return rows['Year'].values #[0]
    
        # it will return `None` if there is no rows
    
    for name in ('M. Valerius Volusus', 'Unknow', 'M. Aemilius Lepidus', 'P. Cornelius Cossus', 'Ap. Claudius Crassus Sabinus Regillensis'):
        date = date_(name)
        print('name:', name)
        print('year:', date)
        print('---')
    

    Result

    name: M. Valerius Volusus
    year: ['505']
    ---
    name: Unknow
    year: None
    ---
    name: M. Aemilius Lepidus
    year: ['285' '232' '187' '158' '126' '78' '46']
    ---
    name: P. Cornelius Cossus
    year: ['415' '408' '395']
    ---
    name: Ap. Claudius Crassus Sabinus Regillensis
    year: ['451' '451']
    

    But columns may still need to clear data before use.

    • code doesn't add table Eastern Roman Consul which has different columns but still you can save all_tables[13] in separated file and use as separated data

    • code check in both columns Consul prior and Consul posterior but you can us e only one of them

    • for some consuls there was no year so I use .fillna() to get value from row above. Mostly it is row with text 'Consular Tribunes' but sometimes it may be other counsul and this method may gives wrong results.

    • some years/names have addnotation which adds number like [205] to date/name and I remove it.