I am trying to format integers into IPv6 addresses.
IPv6 address here means a string that represents an integer between 0 and 2^128 - 1 (340282366920938463463374607431768211455), formatted into 32 hexadecimal digits, separated by colons (':'
) into 8 fields of 4 digits each.
Now I know of str(ipaddress.IPv6Address(n))
and int(ipaddress.IPv6Address(s))
, but I want to write my own functions in the name of learning, and I have already written them and I am trying to improve them.
I am looking for a way to format integers into IPv6 format using either f-strings or str.format
, I currently use this one-liner:
ipv6 = ':'.join(hex(n).removeprefix('0x').zfill(32)[i:i+4] for i in range(0, 32, 4))
And it is slow, because it uses string slicing.
I have already written codes to shorten IPv6 addresses using regexes, and I am looking to replace the above mentioned one-liner using a string format one-liner.
I have already implemented the first part (format an integer into 32 bit hexadecimal with leading zeros and without '0x'
prefix) using this:
"{0:0>32x}".format(n)
But I cannot implement the second part, I have Google searched for a way to insert a separator every N characters into strings using Python but most are irrelevant no matter what keywords I use, and I have only seen two relevant methods, one being the method I am using that I came up with by myself, the other is this:
re.sub('([\da-fA-F]{4})', r'\1:', s, 7)
But regexes are slow:
In [221]: re.sub('([\da-fA-F]{4})', r'\1:', 'c42d7a7d155b93f7658c20c9fea598ff', 7)
Out[221]: 'c42d:7a7d:155b:93f7:658c:20c9:fea5:98ff'
In [222]: s = 'c42d7a7d155b93f7658c20c9fea598ff'
In [223]: %timeit re.sub('([\da-fA-F]{4})', r'\1:', s, 7)
11.6 µs ± 993 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [224]: %timeit ':'.join(s[i:i+4] for i in range(0, 32, 4))
2.11 µs ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
But I have found this keyword: Python f string thousand separator in Google search suggestions and found this syntax: "{:,d}"
In [226]: "{:,d}".format(1234567890)
Out[226]: '1,234,567,890'
It is pretty close to what I seek, but unfortunately it isn't the method, first it deals with integers (d
), secondly it inserts a separator every 3 characters instead of 4, and finally it uses a comma instead of a colon, and I can't change that without invalidating the syntax...
So what is the correct f-string syntax to insert a colon every 4 digits while formatting integer to 32 digit hexadecimal string? Preferably without f-string nesting. And I am looking for a one-liner.
By f-string nesting I meaning something like this:
f'{f"{n:0>32x}"}'
I don't know if that is valid, and I don't like it.
Currently I have done this:
"{0:0>32_x}".format(n)
In [248]: "{0:0>32_x}".format(260764824896579434326633182196140447999)
Out[248]: 'c42d_7a7d_155b_93f7_658c_20c9_fea5_98ff'
But I can't change that underscore to a colon, I know I can use str.replace
but still I want it done in one command.
Performance comparison:
In [253]: %timeit "{0:0>32_x}".format(260764824896579434326633182196140447999).replace('_', ':')
988 ns ± 7.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [254]: %timeit ':'.join([hex(260764824896579434326633182196140447999).replace('0x',"").zfill(32)[i:i+4] for i in range(0, 32, 4)])
4.59 µs ± 493 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [255]: %timeit "{0:0>32_x}".format(260764824896579434326633182196140447999)
810 ns ± 48.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
String interpolation is much faster than string slicing.
But regexes are slow:
Note that you might harness re.compile
if you want to lessen required time, consider following example
import timeit
timeit.timeit(stmt='re.sub("(....)","\\1:",s)',setup='import re;s = "c42d7a7d155b93f7658c20c9fea598ff"') # 2.398639300000468
timeit.timeit(stmt='pat.sub("\\1:",s)',setup='import re;s = "c42d7a7d155b93f7658c20c9fea598ff";pat=re.compile("(....)")') # 1.6385453000002599
I use different pattern, as I assume input is always 32 hex digits, replace every 4 characters with them followed by :
.