Search code examples
pythonpostgresqlparsingcurrency

How to robustly parse PostgreSQL money (in Python)


How can the money type that PostgreSQL offers be robustly parsed, to extract both the value and the currency symbol? (In Python, but something a bit non-language specific is also welcome)

The problems I think are that various components can change. e.g currency symbol can vary, as its position, as well as the symbol for what I would call decimal point, and maybe even the negative symbol...

Context: I'm writing a PostgreSQL adapter for Python, and wondering whether to parse money output, or leave it as a string.

Here is a list of all(?) 148 possible ways 12345.67 and -12345.67 can be format on my system based on the lc_monetary setting. (There are more lc_monetary possibilities but I've removed values that would duplicate output in this list)

lc_monetary 12345.67 -12345.67
aa_DJ.iso88591 $12 345.67 -$12 345.67
aa_ER.utf8 $ 12,346 -$ 12,346
aa_ET.utf8 $12,345.67 -$12,345.67
af_ZA.iso88591 R12,345.67 -R12,345.67
an_ES.iso885915 12.345,67 € -12.345,67 €
ar_AE.iso88596 د.إ. 12,345.670 د.إ. 12,345.670-
ar_BH.iso88596 د.ب. 12,345.670 د.ب. 12,345.670-
ar_DZ.iso88596 د.ج. 12,345.670 د.ج. 12,345.670-
ar_EG.iso88596 ج.م. 12,345.670 ج.م. 12,345.670-
ar_IN.utf8 ₹ 12,345.67 -₹ 12,345.67
ar_IQ.iso88596 د.ع. 12,345.670 د.ع. 12,345.670-
ar_JO.iso88596 د.أ. 12,345.670 د.أ. 12,345.670-
ar_KW.iso88596 د.ك. 12,345.670 د.ك. 12,345.670-
ar_LB.iso88596 ل.ل. 12,345.670 ل.ل. 12,345.670-
ar_LY.iso88596 د.ل. 12,345.670 د.ل. 12,345.670-
ar_MA.iso88596 د.م. 12,345.670 د.م. 12,345.670-
ar_OM.iso88596 ر.ع. 12,345.670 ر.ع. 12,345.670-
ar_QA.iso88596 ر.ق. 12,345.670 ر.ق. 12,345.670-
ar_SA.iso88596 12,345.67 ريال -12,345.67 ريال
ar_SD.iso88596 ج.س. 12,345.670 ج.س. 12,345.670-
ar_SY.iso88596 ل.س. 12,345.670 ل.س. 12,345.670-
ar_TN.iso88596 د.ت. 12,345.670 د.ت. 12,345.670-
ar_YE.iso88596 ر.ي. 12,345.670 ر.ي. 12,345.670-
ayc_PE.utf8 S/ 12,345.67 -S/ 12,345.67
az_AZ.utf8 12 345.67 man. -12 345.67 man.
be_BY.cp1251 12 345.67 руб -12 345.67 руб
be_BY.utf8@latin 12 345.67 Rub -12 345.67 Rub
bem_ZM.utf8 K12,345.67 -K12,345.67
ber_MA.utf8 ⴷ.ⵎ. 12,345.670 ⴷ.ⵎ. 12,345.670-
bg_BG.cp1251 12 345,67 лв -12 345,67 лв
bn_BD.utf8 ৳ 12,345.67 -৳ 12,345.67
bn_IN.utf8 ₹ 1,23,45.67 -₹ 1,23,45.67
bo_CN.utf8 ¥12,345.67 ¥-12,345.67
br_FR.iso88591 12 345,67 EUR -12 345,67 EUR
br_FR.iso885915@euro 12 345,67 € -12 345,67 €
bs_BA.iso88592 KM 12 345,67 -KM 12 345,67
ca_AD.iso885915 € 12.345,67 -€ 12.345,67
ca_ES.iso88591 EUR 12.345,67 -EUR 12.345,67
crh_UA.utf8 12 345.67 gr -12 345.67 gr
cs_CZ.iso88592 12 345,67 Kč -12 345,67 Kč
csb_PL.utf8 12.345,67 zł -12.345,67 zł
cv_RU.utf8 12 345.67 t -12 345.67 t
cy_GB.iso885914 £12,345.67 -£12,345.67
da_DK.iso88591 kr 12.345,67 kr -12.345,67
de_AT.iso88591 EUR 12 345,67 -EUR 12 345,67
de_AT.iso885915@euro € 12 345,67 -€ 12 345,67
de_BE.iso88591 EUR 12.345,67 EUR- 12.345,67
de_BE.iso885915@euro € 12.345,67 €- 12.345,67
de_CH.iso88591 Fr. 12'345.67 Fr.- 12'345.67
de_DE.iso88591 12.345,67 EUR -12.345,67 EUR
dv_MV.utf8 ރ. 12,345.67 -ރ.12,345.67
dz_BT.utf8 དངུལ་ཀྲམ་ 12,345.670 དངུལ་ཀྲམ་- 12,345.670
el_CY.iso88597 12.345,67€ -€12.345,67
en_BW.iso88591 Pu12,345.67 -Pu12,345.67
en_DK.iso88591 ¤12.345,67 -¤12.345,67
en_HK.iso88591 HK$12,345.67 (HK$12,345.67)
en_IE.iso88591 EUR12,345.67 -EUR12,345.67
en_IE.iso885915@euro €12,345.67 -€12,345.67
en_NG.utf8 ₦12,345.67 -₦12,345.67
en_PH.iso88591 Php12,345.67 (Php12,345.67)
en_SG.iso88591 $12,345.67 ($12,345.67)
en_ZW.iso88591 Z$12,345.67 -Z$12,345.67
es_AR.iso88591 $ 12.345,67 -$ 12.345,67
es_BO.iso88591 $b 12.345,67 -$b 12.345,67
es_CR.iso88591 C= 12 345,67 -C= 12 345,67
es_CR.utf8 ₡ 12 345,67 -₡ 12 345,67
es_CU.utf8 12 345,67 $ -12 345,67 $
es_DO.iso88591 $ 12,345.67 -$ 12,345.67
es_GT.iso88591 Q 12,345.67 -Q 12,345.67
es_HN.iso88591 L. 12,345.67 -L. 12,345.67
es_NI.iso88591 C$ 12,345.67 -C$ 12,345.67
es_PA.iso88591 B/ 12,345.67 -B/ 12,345.67
es_PY.iso88591 Gs. 12.345,67 -Gs. 12.345,67
es_SV.iso88591 C= 12,345.67 -C= 12,345.67
es_SV.utf8 ₡ 12,345.67 -₡ 12,345.67
es_VE.iso88591 Bs. 12.345,67 -Bs. 12.345,67
et_EE.iso88591 EUR 12 345,67 -EUR 12 345,67
et_EE.iso885915 € 12 345,67 -€ 12 345,67
eu_ES.iso885915@euro € 12.346 -€ 12.346
fa_IR.utf8 12٬346 ریال -12٬346 ریال
ff_SN.utf8 12,345.67 CFA -12,345.67 CFA
fi_FI.iso88591 12 345,67 EUR -12 345,67 EUR
fi_FI.iso885915@euro 12 345,67 € -12 345,67 €
fil_PH.utf8 PhP12,345.67 -PhP 12,345.67
fr_CA.iso88591 12 345,67 $ (12 345,67 $)
fy_NL.utf8 € 12 345,67 € 12 345,67-
gu_IN.utf8 +₹ 12,345.67 -₹ 12,345.67
he_IL.iso88598 שח 12,345.67 שח 12,345.67-
hr_HR.iso88592 Kn 12 345,67 -Kn 12 345,67
ht_HT.utf8 12 345,67 g -12 345,67 g
hu_HU.iso88592 12.345,67 Ft -12.345,67 Ft
hy_AM.utf8 Դ12,345.67 -Դ12,345.67
id_ID.iso88591 Rp12.345,67 -Rp12.345,67
is_IS.iso88591 12.346 kr -12.346 kr
ja_JP.eucjp ¥12,346 ¥-12,346
ka_GE.utf8 ლ12.345,67 -ლ12.345,67
kk_KZ.utf8 12 345.67 тг -12 345.67 тг
km_KH.utf8 12,345.67៛ -12,345.67៛
ko_KR.euckr ₩12,346 ₩-12,346
ku_TR.iso88599 12.345,67 TL -12.345,67 TL
ky_KG.utf8 12 345.67 сом -12 345.67 сом
lg_UG.iso885910 12,345.67/- -12,345.67/-
lo_LA.utf8 ₭ 12,345.67 ₭ -12,345.67
lt_LT.iso885913 12.345,67 Lt -12.345,67 Lt
lv_LV.iso885913 Ls 12 345,67 -Ls 12 345,67
mg_MG.iso885915 12 345,67 AR -12 345,67 AR
mhr_RU.utf8 12 345.67 ТЕҤ -12 345.67 ТЕҤ
mk_MK.iso88595 12 345,67 ден -12 345,67 ден
mn_MN.utf8 12 345.67 ₮ -12 345.67 ₮
ms_MY.iso88591 RM12,345.67 (RM12,345.67)
mt_MT.iso88593 12,345.67EUR (12,345.67EUR)
mt_MT.utf8 12,345.67€ (12,345.67€)
my_MM.utf8 12,345.67Ks -12,345.67Ks
nan_TW.utf8@latin NT$12,345.67 -NT$12,345.67
nb_NO.iso88591 kr12 345,67 kr-12 345,67
ne_NP.utf8 रू 12,345.67 -रू 12,345.67
nl_AW.utf8 Afl. 12 345,67 Afl. 12 345,67-
nl_BE.iso88591 EUR 12 345,67 EUR 12 345,67-
nn_NO.iso88591 kr 12 345,67 -kr12 345,67
om_KE.iso88591 Ksh12,345.67 -Ksh12,345.67
os_RU.utf8 12 345.67 сом -12 345.67 сом
pa_PK.utf8 Rs 12,345.67 -Rs12,345.67
pap_AN.utf8 f 12 345,67 f 12 345,67-
ps_AF.utf8 12٬346 افغانۍ -12٬346 افغانۍ
pt_BR.iso88591 R$ 12.345,67 -R$ 12.345,67
ro_RO.iso88592 Lei 12.345,67 -Lei 12.345,67
ru_RU.iso88595 12 345.67 руб -12 345.67 руб
ru_UA.koi8u 12 345.67 гр -12 345.67 гр
rw_RW.utf8 12.345,67 Frw -12.345,67 Frw
sd_IN.utf8@devanagari रु 12,345.67 -रु 12,345.67
se_NO.utf8 ru12.345,67 ru-12.345,67
si_LK.utf8 ₨ 12,345.67 -₨ 12,345.67
sq_AL.iso88591 Lek12.345,670 -Lek12.345,670
sq_MK.utf8 12 345,67 den -12 345,67 den
sr_RS.utf8 12.345,67 дин -12.345,67 дин
sr_RS.utf8@latin din 12.346 -din 12.346
sv_SE.iso88591 12 345,67 kr -12 345,67 kr
sw_TZ.utf8 TSh12,345.67 -TSh12,345.67
te_IN.utf8 ₹12,345.67 -₹12,345.67
th_TH.utf8 ฿ 12,345.67 ฿ -12,345.67
tk_TM.utf8 12,345.67 MANAT -12,345.67 MANAT
tt_RU.utf8@iqtelif 12 345.67 sum -12 345.67 sum
uk_UA.koi8u 12 345,67грн. -12 345,67 грн.
uz_UZ.iso88591 so'm12,345.67 -so'm12,345.67
uz_UZ.utf8@cyrillic сўм12,345.67 -сўм12,345.67
vi_VN.utf8 12.346₫ -₫12.346
wo_SN.utf8 12 345,67 CFA -12 345,67 CFA
yi_US.cp1255 $ 12,345.67 $ 12,345.67-

Solution

  • Based on people's comments, my answer is:

    Don't

    Either don't use the type, or at most use it as output-only, to not be parsed.