python regex regex-group

Extract a float in exponential notation from a list of long paths


I have many strings, each one the path of a file. I would like to extract the number in exponential notation from each string.

For example, I have:

../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_7.27168772219203e-07/wm_up

and I would like to extract the float: 7.27168772219203e-07

I would like to avoid using the split method (with _ as the separator).

So I tried Python regular expressions, but I can't work out which method to use (findall, search, or sub).

How can I achieve this in a simple or short way, independently of the wm_up substring, since it may be a different substring (wm_dw, for example)?

Clarifications

I want to extract the number because I need to sort all these long strings in ascending order. I would like to use natsorted.

For example, I initially have:

../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.301510038746646e-06/wm_up
../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.301510038746646e-06/wm_dw
../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.437191487625705e-05/wm_up
../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.437191487625705e-05/wm_dw

This is the result of natsorted on the array of paths: as you can see, the ascending order is based on the leading digits and not on the actual value of the exponential-notation float that I want to extract. I would like to sort in ascending order of that value.
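To make the difference concrete, here is a minimal sketch with made-up step values (they are not taken from the paths above). Plain string comparison is used as a stand-in for a sort that looks only at the leading digits; it is an illustration of the problem, not of how natsorted works internally.

```python
# Hypothetical step values written in exponential notation.
step_labels = ["9.0e-06", "1.5e-03", "4.2e-05"]

# Comparing the text character by character looks at the leading digit first,
# so the exponent plays no role in the order.
by_text = sorted(step_labels)

# Converting each label to float compares the actual numeric values.
by_value = sorted(step_labels, key=float)

print(by_text)   # ['1.5e-03', '4.2e-05', '9.0e-06']
print(by_value)  # ['9.0e-06', '4.2e-05', '1.5e-03']
```

The second ordering is the one I am after.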


Solution

  • Here is the code:

    import re

    l = [
    '../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.301510038746646e-06/wm_up',
    '../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.301510038746646e-06/wm_dw',
    '../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.437191487625705e-05/wm_up',
    '../../Analysis_Pk_vs_Step_BEFORE_NEW_LAUNCH_13_DECEMBRE_22h57/Archive_WP_Pk_der_3_pts_step_9.437191487625705e-05/wm_dw'
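    ]  # the input that we have
    # regex from https://stackoverflow.com/a/4703508/7434857
    # raw string, so that \d and \. reach the regex engine unchanged
    numeric_const_pattern = r'[-+]? (?: (?: \d* \. \d+ ) | (?: \d+ \.? ) )(?: [Ee] [+-]? \d+ ) ?'
    # re.VERBOSE makes the regex ignore the spaces inside the pattern
    rx = re.compile(numeric_const_pattern, re.VERBOSE)
    # findall returns every number in the path; the last one is the step value.
    # Sort in place by that value, then by the path itself to break ties.
    l.sort(key=lambda x: (float(rx.findall(x)[-1]), x))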
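Note that findall(x)[-1] relies on the step value being the last number in the path, so a trailing directory name that contains digits would break it. As a possible alternative, here is a minimal sketch that uses re.search with a pattern that only matches numbers carrying an exponent. The pattern SCI_FLOAT, the helper step_value and the shortened paths are hypothetical names and data chosen for illustration, and the sketch assumes the step value is always written with an exponent.

```python
import re

# Assumption: the step value always has an exponent part (e.g. 9.3e-06),
# and no other number in the path does.
SCI_FLOAT = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)[eE][-+]?\d+')

def step_value(path):
    """Return the first exponential-notation number found in path, as a float."""
    match = SCI_FLOAT.search(path)
    if match is None:
        raise ValueError(f"no exponential-notation number in {path!r}")
    return float(match.group())

# Shortened, made-up paths with the same shape as the ones in the question.
paths = [
    'run_13_22h57/step_9.437191487625705e-05/wm_up',
    'run_13_22h57/step_9.301510038746646e-06/wm_dw',
    'run_13_22h57/step_7.27168772219203e-07/wm_up',
]

# Sort by numeric step value; sorted() leaves the original list untouched.
ordered = sorted(paths, key=step_value)
for p in ordered:
    print(p)
```

Plain integers such as 13, 22 and 57 are skipped because the pattern requires an e or E followed by digits, so the result does not depend on what comes after the step value (wm_up, wm_dw, or anything else).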