
How to Parse NSString from index to index?


I would like to parse a string like this:
NSString *str = @"firstcolumn second column text Third Column Text";

I have three columns of text; each column may contain text with spaces.
I know how wide the columns are: col1 is 10 characters long, col2 is 20, and col3 is 30.
I know I could use NSRange(0,len1),(10,len2),(20,len3).

I get crashes with "out of range" errors because the length varies: the text in a column doesn't have to reach its maximum width.

Any ideas how to do this?

NSString *str = @"A000 B11 This is text description This column is a longer Text description";
//A000 column can be 10 chars long
//B11 can be 20 chars
//This is text description can be 30 characters long
NSString *code1 = [str substringWithRange:NSMakeRange(0,10)];
NSString *code2 = [str substringWithRange:NSMakeRange(10,20)];
NSString *shorttext = [str substringWithRange:NSMakeRange(20,20)];
NSString *longtext = [str substringWithRange:NSMakeRange(30,30)];

I would like to get code1 = A000 in the above example. It can be up to 10 characters long, but doesn't have to be, as you can see. The same goes for the other columns, code2 and the text. How can I do this?


Solution

  • If I understand correctly, you have an input NSString str that consists of three concatenated strings: col1, col2, and col3. Additionally, you know the following constraints:

    • col1 is between 0 and 10 characters
    • col2 is between 0 and 20 characters
    • col3 is between 0 and 30 characters

    and want to recover these strings from str. Put differently, you want to uniquely determine col1, col2, and col3 so that str is equal to

    [NSString stringWithFormat:@"%@%@%@", col1, col2, col3];
    

    Unfortunately, as others have commented, this is not possible without modifying the problem. To see why not, consider the case where

    str = @"a";
    

    In this case, you know that one of the component strings (col1, col2, or col3) is equal to @"a" and the other two are equal to @"". However, it's not possible to determine which. If, for example col1 = @"a" and col2 and col3 are both equal to @""; then

    [NSString stringWithFormat:@"%@%@%@", col1, col2, col3]
    

    evaluates to

    @"a"
    

    as desired. However this is also true if col1 and col2 are equal to @"" and col3 = @"a" since

    [NSString stringWithFormat:@"%@%@%@", col1, col2, col3]
    

    still evaluates to

    @"a"
    

    The problem here is not that the component strings can be empty, but rather that their lengths can vary over a range, so the boundaries between them are not fixed.

    If we constrained the problem so that the lengths were exact

    • col1, which is 10 characters long
    • col2, which is 20 characters long
    • col3, which is 30 characters long

    it would then be possible to recover the columns from str (which would be exactly 60 characters long) with the following function:

    void GetColumnsFromString(NSString *str, NSString * __autoreleasing *col1, NSString * __autoreleasing *col2, NSString * __autoreleasing *col3)
    {
        if (col1) {
            *col1 = [str substringWithRange:NSMakeRange(0, 10)];
        }
        if (col2) {
            *col2 = [str substringWithRange:NSMakeRange(10, 20)];
        }
        if (col3) {
            *col3 = [str substringWithRange:NSMakeRange(30, 30)];
        }
    }
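
If the input is instead padded with spaces and the last column may be cut short, the out-of-range exception can be avoided by clamping each range to the length of the string. The following is only a sketch under that assumption; FieldAt is a hypothetical helper name, not something from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the field that starts at `start` and is at most
// `width` characters wide. The range is clamped to the end of `str`, so
// substringWithRange: never receives an out-of-bounds range.
static NSString *FieldAt(NSString *str, NSUInteger start, NSUInteger width)
{
    if (start >= str.length) {
        return @"";                                  // the field lies past the end
    }
    NSUInteger available = str.length - start;
    NSUInteger length = MIN(width, available);       // shorten the field if needed
    NSString *field = [str substringWithRange:NSMakeRange(start, length)];
    // Assumption: short values were padded with spaces to fill the column.
    return [field stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
}
```

With str = @"A000      B11" (A000 padded to 10 characters), FieldAt(str, 0, 10) gives @"A000", FieldAt(str, 10, 20) gives @"B11", and FieldAt(str, 30, 30) gives @"" instead of throwing.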
    

    Another, better solution, as has been mentioned in the comments, is to use a "special" character in str to mark the boundaries between the component strings. Suppose that character is | (a stand-in for whichever separator you choose). If we constructed str like this

    str = [NSString stringWithFormat:@"%@|%@|%@", col1, col2, col3];
    

    and we constrained col1, col2, and col3 not to contain the character |, then we could parse col1, col2, and col3 as follows:

    NSArray *cols = [str componentsSeparatedByString:@"|"];
    col1 = cols[0];
    col2 = cols[1];
    col3 = cols[2];
    

    The situation is no different if, instead of the character |, you use the space character.
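
A separator-based round trip might look like the sketch below. The choice of | as the separator and the helper names JoinColumns and SplitColumns are assumptions made for illustration; any character that cannot occur inside a column would do.

```objc
#import <Foundation/Foundation.h>

// Assumption: "|" never occurs inside a column, so it can act as the separator.
static NSString *const kColumnSeparator = @"|";

// Hypothetical helper: builds the combined string from its columns.
static NSString *JoinColumns(NSArray<NSString *> *columns)
{
    return [columns componentsJoinedByString:kColumnSeparator];
}

// Hypothetical helper: returns the columns, or nil when the string does not
// contain exactly `expectedCount` of them, so callers never index past the end.
static NSArray<NSString *> *SplitColumns(NSString *str, NSUInteger expectedCount)
{
    NSArray<NSString *> *columns = [str componentsSeparatedByString:kColumnSeparator];
    return columns.count == expectedCount ? columns : nil;
}
```

Empty columns survive the round trip, because two adjacent separators produce an empty string in the resulting array. Checking the count before indexing avoids a different out-of-range crash on malformed input.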

    Edit: You added more information about the input string and the desired output:

    Rather than three, there are four component strings: col1, col2, col3, and col4. We have some information about them:

    • col1 is between 0 and 10 characters long
    • col1 does not contain the space character
    • col2 is between 0 and 20 characters long
    • col2 does not contain the space character
    • col3 is between 0 and 30 characters long
    • col3 MAY contain the space character
    • col4 isn't constrained in length
    • col4 MAY contain the space character

    Additionally, the four strings are separated by spaces in their concatenation. So your goal is to uniquely determine col1, col2, col3, and col4 so str is equal to

    [NSString stringWithFormat:@"%@ %@ %@ %@", col1, col2, col3, col4];
    

    You can use an NSScanner to extract col1 and col2 in this case:

    NSScanner *scanner = [NSScanner scannerWithString:str];
    NSCharacterSet *spaceCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" "];
    NSString *col1 = nil, *col2 = nil;
    [scanner scanUpToCharactersFromSet:spaceCharacterSet intoString:&col1];
    [scanner scanUpToCharactersFromSet:spaceCharacterSet intoString:&col2];
    

    At this point, it's possible to extract the string remainder which contains the two final strings col3 and col4 separated by a space:

    NSCharacterSet *emptyCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@""];
    NSString *remainder = nil;
    [scanner scanUpToCharactersFromSet:emptyCharacterSet intoString:&remainder];
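
Put together, the two scanning steps could be wrapped as in the sketch below. ScanLeadingColumns is a hypothetical name, and replacing missing parts with @"" is an assumption about how short input should be handled.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[col1, col2, remainder]. Parts that are
// missing from the input come back as @"" rather than nil.
static NSArray<NSString *> *ScanLeadingColumns(NSString *str)
{
    NSScanner *scanner = [NSScanner scannerWithString:str];
    NSCharacterSet *space = [NSCharacterSet characterSetWithCharactersInString:@" "];
    NSCharacterSet *empty = [NSCharacterSet characterSetWithCharactersInString:@""];
    NSString *col1 = nil, *col2 = nil, *remainder = nil;

    // By default the scanner skips leading whitespace before each scan,
    // which is what steps over the space between the columns.
    [scanner scanUpToCharactersFromSet:space intoString:&col1];
    [scanner scanUpToCharactersFromSet:space intoString:&col2];

    // No character belongs to the empty set, so this scans to the end.
    [scanner scanUpToCharactersFromSet:empty intoString:&remainder];

    return @[col1 ?: @"", col2 ?: @"", remainder ?: @""];
}
```

For @"A000 B11 This is text" this yields @"A000", @"B11", and @"This is text". When a scan finds nothing, the scanner leaves its output variable untouched, which is why the variables start as nil and are checked at the end.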
    

    At this point, you are back in the same sort of situation I described at the beginning. You have a string (remainder) which consists of two component strings (col3 and col4) which are separated by a space. The only way to detect the border between these two strings is that space.

    However, col3 may contain spaces. If it could not, then you could simply scan along until the next space was reached and extract the contents between the beginning and that space into col3 and the rest into col4.

    In addition, col4 may also contain spaces. If it could not, then you could scan from the end of remainder until the first space from the end was reached, extract that range into col4 and the rest into col3.
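
Both of those hypothetical cases can be sketched with one helper that searches for a space from the front or from the back. SplitAtSpace is an assumed name, and the sketch only applies if one of the two columns is known to contain no spaces.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `remainder` at a single space and returns
// @[col3, col4]. Pass 0 as `options` to split at the first space (col3 has
// no spaces) or NSBackwardsSearch to split at the last one (col4 has none).
// If there is no space at all, everything goes into col3.
static NSArray<NSString *> *SplitAtSpace(NSString *remainder, NSStringCompareOptions options)
{
    NSRange space = [remainder rangeOfString:@" " options:options];
    if (space.location == NSNotFound) {
        return @[remainder, @""];
    }
    NSString *col3 = [remainder substringToIndex:space.location];
    NSString *col4 = [remainder substringFromIndex:NSMaxRange(space)];
    return @[col3, col4];
}
```

For @"short text here", splitting at the first space gives @"short" and @"text here", while NSBackwardsSearch gives @"short text" and @"here". When both columns may contain spaces, neither split is reliable, which is the ambiguity described above.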