Tags: r, filter, decimal

How do I filter a dataset on a "VERSION" column whose decimal part is .000?


I have a dataset that I am trying to filter on three different columns.

I have the two character columns figured out:

filter(TRANSACTION_TYPE != "ABC", CUSTOMER_CODE == "123")

However, I also have a "VERSION" column. Each customer can have multiple versions, which duplicates my $ amount. I want to keep only the rows whose VERSION has ".000" as its decimal part, since .000 marks the final and most accurate version. For example, VERSION can be 20220901.000, 20220901.002, 20220901.003, etc.

However, the digits before the decimal point change every day, so I can't simply filter for VERSION equal to 20220901.


Solution

  • Sample data:

    quux <- data.frame(VERS_chr = c("20220901.000","20220901.002","20220901.000","20220901.002"),
                       VERS_num = c(20220901.000,20220901.002,20220901.000,20220901.002))
    

    If is.character(VERSION) is true in your data (the VERS_chr column in the sample), then

    dplyr::filter(quux, grepl("\\.000$", VERS_chr))
    #       VERS_chr VERS_num
    # 1 20220901.000 20220901
    # 2 20220901.000 20220901
    

    Explanation:

    • "\\.000$" matches the literal period . (it needs to be escaped since it's a regex reserved symbol) followed by three literal zeroes 000, at the end of string ($). See https://stackoverflow.com/a/22944075/3358272 for more info on regex.
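
    To see why the escape matters, here is a minimal base-R sketch. The test strings are made up for illustration and are not from the original data; endsWith() is shown only as a possible regex-free alternative.

    ```r
    # Hypothetical test strings (not from the original data)
    versions <- c("20220901.000", "20220901.002", "20220901.0001",
                  "20220901000", "20220901x000")

    # Escaped dot: a literal "." followed by "000" at the end of the string
    escaped <- grepl("\\.000$", versions)

    # Unescaped dot: "." matches any single character, so strings
    # without a decimal point can slip through
    unescaped <- grepl(".000$", versions)

    # Fixed-string alternative that avoids regex altogether
    ends_fixed <- endsWith(versions, ".000")

    print(data.frame(versions, escaped, unescaped, ends_fixed))
    ```

    The same logical vector can be used inside dplyr::filter() in place of the grepl() call.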

    If it is false, i.e. the column is numeric (and not a factor), then

    dplyr::filter(quux, abs(VERS_num %% 1) < 1e-3)
    #       VERS_chr VERS_num
    # 1 20220901.000 20220901
    # 2 20220901.000 20220901
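
    A small base-R sketch of what `%% 1` returns may help here. It assumes versions step by 0.001, which the question suggests but does not state; the .001 value and the 5e-4 tolerance are illustrative choices, not part of the original answer. A version ending in .001 would sit right at a 1e-3 cutoff, so half a step leaves more margin.

    ```r
    # Hypothetical values; the .001 entry is added to probe the tolerance
    vers_num <- c(20220901.000, 20220901.001, 20220901.002, 20220902.000)

    # Fractional part of each version number (subject to floating-point rounding)
    frac <- vers_num %% 1
    print(format(frac, digits = 15))

    # Assumed tolerance: half of the smallest step between versions (0.001)
    tol <- 5e-4
    is_final <- abs(frac) < tol

    # Keep only the versions whose fractional part is effectively zero
    print(vers_num[is_final])
    ```

    The is_final condition can be dropped into dplyr::filter() the same way as the expression above.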
    

    Explanation: