I have data like this in a CSV file:
Ad Group
Annuity Calculator
Tax Deferred Annuity
Annuity Tables
annuities calculator
annuity formula
Annuities Explained
Deferred Annuies Calculator
Current Annuity Rates
Forbes.com
Annuity Definition
fixed income
Immediate fixed Annuities
Deferred Variable Annuities
401k Rollover
Deferred Annuity Rates
Deferred Annuities
Immediate Annuities Definition
Immediate Variable Annuities
Variable Annuity
Aig Annuities
Retirement Income
retirment system
Online Financial Planner
Certified Financial Planner
I want to set a unique abbreviation for each column. For example:
Can you please help me figure out the best way to do this in Python?
Thanks
Your problem isn't completely specified, but it seems fun, so I took a stab at it. I wrote a function that takes a list of phrases and returns a dictionary whose keys are the abbreviations and whose values are the lowercased phrases. It starts by taking the first two letters of each word and joining them into a candidate abbreviation. If that abbreviation has already been used, it lengthens the word prefixes one letter at a time, one word after another from left to right, until the abbreviation is unique. I then tested it on your sample data. You will almost certainly want to modify it, but it should give you some ideas:
def makeAbbreviations(headers):
    abbreviations = {}  # maps abbreviation -> lowercased header
    for header in headers:
        header = header.lower()
        words = header.split()
        n = max(len(w) for w in words)  # length of the longest word
        i = 2  # current prefix length
        starts = [w[:i] for w in words]
        abbrev = ''.join(starts)
        # On a collision, lengthen the prefixes one word at a time.
        while abbrev in abbreviations and i <= n:
            i += 1
            for j, w in enumerate(words):
                starts[j] = w[:i]
                abbrev = ''.join(starts)
                if abbrev not in abbreviations:
                    break
        abbreviations[abbrev] = header
    return abbreviations
myHeaders = ['Ad Group', 'Annuity Calculator', 'Tax Deferred Annuity',
             'Annuity Tables', 'annuities calculator', 'annuity formula',
             'Annuities Explained', 'Deferred Annuies Calculator',
             'Current Annuity Rates', 'Forbes.com', 'Annuity Definition',
             'fixed income', 'Immediate fixed Annuities',
             'Deferred Variable Annuities', '401k Rollover',
             'Deferred Annuity Rates', 'Deferred Annuities',
             'Immediate Annuities Definition', 'Immediate Variable Annuities',
             'Variable Annuity', 'Aig Annuities', 'Retirement Income',
             'retirment system', 'Online Financial Planner',
             'Certified Financial Planner']
d = makeAbbreviations(myHeaders)
for k, v in d.items():
    print(k, v, sep=" = ")
Output (the order depends on the Python version; from Python 3.7 on, dictionaries keep insertion order):
imande = immediate annuities definition
adgr = ad group
fiin = fixed income
40ro = 401k rollover
resy = retirment system
vaan = variable annuity
devaan = deferred variable annuities
rein = retirement income
imvaan = immediate variable annuities
fo = forbes.com
imfian = immediate fixed annuities
dean = deferred annuities
anca = annuity calculator
cuanra = current annuity rates
annca = annuities calculator
onfipl = online financial planner
aian = aig annuities
ande = annuity definition
anfo = annuity formula
cefipl = certified financial planner
tadean = tax deferred annuity
deanca = deferred annuies calculator
anex = annuities explained
anta = annuity tables
deanra = deferred annuity rates
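One edge case worth guarding against: if two headers are identical once lowercased, the function above runs out of letters and the second entry overwrites the first in the dictionary. Here is a minimal sketch of one way to handle that, reading the header row with the standard `csv` module. The name `unique_abbreviations`, the `start` parameter, the numeric-suffix fallback and the inline sample CSV are my own assumptions for illustration, and this variant lengthens all word prefixes at once rather than one word at a time, so its abbreviations can differ from the ones listed above.

```python
import csv
import io


def unique_abbreviations(headers, start=2):
    """Return (header, abbreviation) pairs with no repeated abbreviation."""
    used = set()
    result = []
    for header in headers:
        words = header.lower().split()
        longest = max((len(w) for w in words), default=0)
        i = start
        candidate = "".join(w[:i] for w in words)
        # Lengthen every word prefix by one letter until the candidate is new.
        while candidate in used and i < longest:
            i += 1
            candidate = "".join(w[:i] for w in words)
        # Assumed fallback: out of letters, so append a counter instead.
        base, counter = candidate, 2
        while candidate in used:
            candidate = f"{base}{counter}"
            counter += 1
        used.add(candidate)
        result.append((header, candidate))
    return result


# Hypothetical sample standing in for the real file; with a file on disk,
# open it with open(path, newline="") and pass the handle to csv.reader.
sample = io.StringIO("Ad Group,Annuity Calculator,annuities calculator\n1,2,3\n")
header_row = next(csv.reader(sample))  # first row holds the column names
for header, abbrev in unique_abbreviations(header_row):
    print(abbrev, header, sep=" = ")
```

Returning a list of pairs rather than a dictionary keeps duplicate headers from collapsing into one entry and preserves the column order of the file.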