With Swift Regex Builder, how do I transform a capture with multiple values into an integer?


I am trying to learn the Swift Regex Builder. The string I am working with contains values in the form x,xxx,xxx. I can capture the value as a string with Capture and a transform closure, but I would like to output an integer instead. Is there a way to access the values matched by the Repeat statements in my capture block and convert them to a single integer value? Below is my test function.

func Test()  {
    
    let stringData = "<tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 30, 2023</span></td><td class=Py(10px) Pstart(10px)><span>405.40</span></td><td class=Py(10px) Pstart(10px)><span>408.22</span></td><td class=Py(10px) Pstart(10px)><span>405.29</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>407.28</span></td><td class=Py(10px) Pstart(10px)><span>5,160,100</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 29, 2023</span></td><td class=Py(10px) Pstart(10px)><span>400.60</span></td><td class=Py(10px) Pstart(10px)><span>402.67</span></td><td class=Py(10px) Pstart(10px)><span>400.19</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>402.51</span></td><td class=Py(10px) Pstart(10px)><span>3,914,800</span></td></tr><tr class=BdT Bdc($seperatorColor) Ta(end) Fz(s) Whs(nw)><td class=Py(10px) Ta(start) Pend(10px)><span>Jun 28, 2023</span></td><td class=Py(10px) Pstart(10px)><span>401.35</span></td><td class=Py(10px) Pstart(10px)><span>403.49</span></td><td class=Py(10px) Pstart(10px)><span>400.71</span></td><td class=Py(10px) Pstart(10px)><span>402.55</span></td><td class=Py(10px) Pstart(10px)><span>400.97</span></td><td class=Py(10px) Pstart(10px)><span>4,320,700</span></td></tr>"
    
    let StartingSeperator = Regex {"<td class=Py(10px) Ta(start) Pend(10px)><span>"}
    let InterValueSeperator = Regex {"</span></td><td class=Py(10px) Pstart(10px)><span>"}
    let EndingSeperator = Regex {"</span>"}
    
    let CaptureDollarValue = Regex {
        TryCapture {
            OneOrMore(.digit)
            "."
            Repeat(.digit, count: 2)
        } transform: {
            Double($0)
        }
    }
    
    let tradingDayPattern = Regex {
        
        StartingSeperator
        
        Capture(.date(format: "\(month: .abbreviated) \(day: .twoDigits), \(year: .extended(minimumLength: 4))", locale: Locale(identifier: "en_US_POSIX") , timeZone: .gmt))
        InterValueSeperator
        
        CaptureDollarValue
        InterValueSeperator
        
        CaptureDollarValue
        InterValueSeperator
        
        CaptureDollarValue
        InterValueSeperator
        
        CaptureDollarValue
        InterValueSeperator
        
        CaptureDollarValue
        InterValueSeperator
        
//#########################################################################################
        Capture {
            Repeat(1...3) {
                One(.digit)
            }
            ","
            Repeat(count: 3) {
                One(.digit)
            }
            ","
            Repeat(count: 3) {
                One(.digit)
            }
        } transform: {
            String($0)    // this is where I would like to output an integer
        }
//#########################################################################################
        
        EndingSeperator
        
    }
    
    for match in stringData.matches(of: tradingDayPattern) {
        
        let (_, date, _, _, _, close, _, v1) = match.output
        print("\(date) - \(close) - \(v1)")
    }
} // end testing function


Solution

  • If the value is always a well-formed number (that is, a comma separator every three digits, and nothing like 1,2,3), you can capture it with an IntegerFormatStyle, much as you did when capturing the date.

    Capture(IntegerFormatStyle<Int>(locale: Locale(identifier: "en_US_POSIX")))
    

    You should specify a locale whose number conventions match your data (here, a comma as the grouping separator).

    If the number may be irregularly grouped, such as 1,2,3, you can instead strip the commas in the transform closure and convert the result to an Int:

    transform: {
        Int($0.replacing(",", with: ""))
    }
    

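    Note that with Capture this produces an optional, Int?. With TryCapture, as in your CaptureDollarValue, the captured output is a non-optional Int, and a failed conversion makes the match fail.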
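
As a minimal sketch of the comma-stripping approach, here is the volume capture from the question rewritten with TryCapture so that the output is a plain Int. The helper name parseVolume(in:) and the sample string are hypothetical, chosen only for illustration, and the sketch uses Foundation's replacingOccurrences(of:with:) where the snippet above uses replacing(_:with:); either should work here.

```swift
import Foundation
import RegexBuilder

// Hypothetical helper: returns the first grouped number of the form
// x,xxx,xxx found in `text` as an Int, or nil if there is none.
func parseVolume(in text: String) -> Int? {
    let volumePattern = Regex {
        TryCapture {
            Repeat(1...3) { One(.digit) }
            ","
            Repeat(count: 3) { One(.digit) }
            ","
            Repeat(count: 3) { One(.digit) }
        } transform: {
            // Strip the grouping commas, then parse.
            // Returning nil here would make the whole match fail.
            Int(String($0).replacingOccurrences(of: ",", with: ""))
        }
    }
    // Output is (Substring, Int): the whole match, then the transformed capture.
    return text.firstMatch(of: volumePattern)?.output.1
}

if let volume = parseVolume(in: "<span>5,160,100</span>") {
    print(volume)   // prints 5160100
}
```

Inside tradingDayPattern, the same TryCapture block could replace the Capture between the comment markers, in which case v1 in the destructured tuple would be an Int rather than a String.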