Setting a unique abbreviation for every column in Python


I have data like this in a CSV file:

Ad Group
Annuity Calculator
Tax Deferred Annuity
Annuity Tables
annuities calculator
annuity formula
Annuities Explained
Deferred Annuies Calculator
Current Annuity Rates
Forbes.com
Annuity Definition
fixed income
Immediate fixed Annuities
Deferred Variable Annuities
401k Rollover
Deferred Annuity Rates
Deferred Annuities
Immediate Annuities Definition
Immediate Variable Annuities
Variable Annuity
Aig Annuities
Retirement Income
retirment system
Online Financial Planner
Certified Financial Planner

I want to set a unique abbreviation for each column. For example:

  • Annuity Calculator = annca
  • annuities calculator = annsca

Can you please help me figure out the best way to do this in Python?

Thanks


Solution

  • Your problem isn't completely specified, but it seems fun, so I took a stab at it. I wrote a function that takes a list of phrases and returns a dictionary whose keys are the abbreviations and whose values are the lowercased phrases. It starts by taking the first two letters of each word and joining them into a candidate abbreviation. If that abbreviation has already been used, it brings in more letters from the beginning of each word, one word at a time, until the abbreviation is unique. I then tested it on your sample data. You will almost certainly want to modify it, but it should give you some ideas:

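    def makeAbbreviations(headers):
        """Return a dict mapping each abbreviation to its lowercased header."""
        abbreviations = {}
        for header in headers:
            header = header.lower()
            words = header.split()
            n = max(len(w) for w in words)  # length of the longest word
            i = 2  # number of leading letters taken from each word
            starts = [w[:i] for w in words]
            abbrev = ''.join(starts)

            # On a collision, lengthen the prefixes one word at a time.
            while abbrev in abbreviations and i <= n:
                i += 1
                for j, w in enumerate(words):
                    starts[j] = w[:i]
                    abbrev = ''.join(starts)
                    if abbrev not in abbreviations:
                        break
            # Note: if all letters are used up and the abbreviation is still
            # taken, this assignment overwrites the earlier entry.
            abbreviations[abbrev] = header
        return abbreviations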
    
    myHeaders = ['Ad Group', 'Annuity Calculator', 'Tax Deferred Annuity',
                 'Annuity Tables', 'annuities calculator', 'annuity formula',
                 'Annuities Explained', 'Deferred Annuies Calculator',
                 'Current Annuity Rates', 'Forbes.com', 'Annuity Definition',
                 'fixed income', 'Immediate fixed Annuities',
                 'Deferred Variable Annuities', '401k Rollover',
                 'Deferred Annuity Rates', 'Deferred Annuities',
                 'Immediate Annuities Definition', 'Immediate Variable Annuities',
                 'Variable Annuity', 'Aig Annuities', 'Retirement Income', 'retirment system',
                 'Online Financial Planner', 'Certified Financial Planner']
    
    d = makeAbbreviations(myHeaders)
    for k, v in d.items():
        print(k, v, sep=" = ")
    

    Output:

    imande = immediate annuities definition
    adgr = ad group
    fiin = fixed income
    40ro = 401k rollover
    resy = retirment system
    vaan = variable annuity
    devaan = deferred variable annuities
    rein = retirement income
    imvaan = immediate variable annuities
    fo = forbes.com
    imfian = immediate fixed annuities
    dean = deferred annuities
    anca = annuity calculator
    cuanra = current annuity rates
    annca = annuities calculator
    onfipl = online financial planner
    aian = aig annuities
    ande = annuity definition
    anfo = annuity formula
    cefipl = certified financial planner
    tadean = tax deferred annuity
    deanca = deferred annuies calculator
    anex = annuities explained
    anta = annuity tables
    deanra = deferred annuity rates
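
Two edge cases are worth knowing about in the function above. If two phrases still collide after every letter has been used (for example 'ab c' and 'a bc'), the later phrase replaces the earlier one in the dictionary, and a blank row makes `max()` fail on an empty sequence. The following is only a rough sketch of one way to handle both, not part of the original answer. It keeps the same widening order but maps each phrase to its abbreviation, and falls back to a numeric suffix when the letters run out. The names `read_first_column`, `candidates`, and `unique_abbreviations`, and the suffix fallback itself, are assumptions made for illustration. It also assumes that each phrase is the first field of its CSV row and that "Ad Group" is treated as an ordinary row, as in the sample run above.

```python
import csv
import io
from itertools import count


def read_first_column(csv_file):
    """Return the stripped first field of every non-blank row."""
    return [row[0].strip() for row in csv.reader(csv_file)
            if row and row[0].strip()]


def candidates(words, start=2):
    """Yield candidate abbreviations, shortest first, then numbered fallbacks."""
    longest = max(len(w) for w in words)
    parts = [w[:start] for w in words]
    yield "".join(parts)
    # Widen the prefixes one word at a time, as the original function does.
    for width in range(start + 1, longest + 1):
        for j, w in enumerate(words):
            parts[j] = w[:width]
            yield "".join(parts)
    # Assumed fallback: full text plus a counter, so a free name always exists.
    base = "".join(words)
    for n in count(2):
        yield base + str(n)


def unique_abbreviations(phrases, start=2):
    """Map each distinct lowercased phrase to an abbreviation used only once."""
    by_phrase = {}
    used = set()
    for phrase in phrases:
        words = phrase.lower().split()
        key = " ".join(words)
        if not words or key in by_phrase:
            continue  # skip blank entries and repeated phrases
        abbrev = next(c for c in candidates(words, start) if c not in used)
        used.add(abbrev)
        by_phrase[key] = abbrev
    return by_phrase


if __name__ == "__main__":
    # Stand-in for an open file, e.g. open("data.csv", newline="").
    sample = io.StringIO("Ad Group\nAnnuity Calculator\nannuities calculator\n")
    for phrase, abbrev in unique_abbreviations(read_first_column(sample)).items():
        print(phrase, abbrev, sep=" = ")
```

Keying the result by phrase rather than by abbreviation makes it easy to look up the abbreviation for a given row, and the separate `used` set is what guarantees that no abbreviation is handed out twice.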