
Manipulate .txt WhatsApp backup


I was looking for a way to manipulate a .txt WhatsApp backup conversation, but I'm stuck now.

I want to save the DateTime, Date, Time, User and Message of each entry in the conversation into a list.

This is the "normal" format of the txt:

5/31/18, 11:18 AM - User1: Hey
5/31/18, 11:18 AM - User2: what's up!
5/31/18, 3:19 PM - User1: Not much. 

So I thought of this solution:

while ((line = file.ReadLine()) != null)
            {
                if(line.Length > 0)
                {
                    list.Add(new Whatsapp()
                    {
                        DateTime= line.Substring(0, line.IndexOf("-")).Replace(",", "").Trim(),
                        Date= line.Substring(0, line.IndexOf(",")).Trim(),
                        Time= line.Substring(0, line.IndexOf("-")).Trim().Substring(line.Substring(0, line.IndexOf("-")).Trim().IndexOf(",") + 2),
                        User = line.Substring(line.IndexOf("-") + 2).Substring(0, line.Substring(line.IndexOf("-") + 2).IndexOf(":")).Trim(),
                        Message= line.Substring(line.IndexOf("-") + 2).Trim().Substring(line.Substring(line.IndexOf("-") + 2).Trim().IndexOf(":") + 2).Trim()

                    });
                }
            } 

It worked, until I found that the format breaks when a user puts line breaks inside a message, as in:

5/31/18, 11:18 AM - User1: Hey
5/31/18, 11:18 AM - User2: what's up! 
5/31/18, 3:19 PM - User1: Not much. 
5/31/18, 3:20 PM - User2: Oh well..
Am I being annoying
doing
this
?
5/31/18, 3:19 PM - User1: Yep :(

So the file.ReadLine() approach doesn't work anymore, and I don't know how to get around this. Any suggestions?


Solution

  • First and foremost I want to say that parsing a file based on assumed character positions is a horrible idea, especially if you don't have full control over the format of the data. A minor variation in the format is enough for the parser to stop working, or to crash outright. That being said...

    while ((line = file.ReadLine()) != null)
    {
        if (line.Length <= 0)
        {
            continue;
        }
    
        var firstComma = line.IndexOf(",");
    
        if (firstComma >= 0)
        {
            var possibleDate = line.Substring(0, firstComma);
            if (DateTime.TryParse(possibleDate, out _))
            {
                list.Add(new Whatsapp
                {
                    DateTime = line.Substring(0, line.IndexOf("-")).Replace(",", "").Trim(),
                    Date = line.Substring(0, line.IndexOf(",")).Trim(),
                    Time = line.Substring(0, line.IndexOf("-")).Trim().Substring(line.Substring(0, line.IndexOf("-")).Trim().IndexOf(",") + 2),
                    User = line.Substring(line.IndexOf("-") + 2).Substring(0, line.Substring(line.IndexOf("-") + 2).IndexOf(":")).Trim(),
                    Message = line.Substring(line.IndexOf("-") + 2).Trim().Substring(line.Substring(line.IndexOf("-") + 2).Trim().IndexOf(":") + 2).Trim()
                });
            }
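            else if (list.Count > 0)
            {
                // Not a date: treat the line as a continuation of the previous message.
                list.Last().Message += $"\r\n{line.Trim()}";
            }
        }
        else if (list.Count > 0)
        {
            // No comma at all: also a continuation of the previous message.
            list.Last().Message += $"\r\n{line.Trim()}";
        }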
    }
    

    It's big, it's ugly and frankly I don't agree with half of what you're doing in there already, but it will do what you want.

    To clarify exactly what it does above and beyond what you're already doing: as it iterates through each line, it checks whether there is a comma. If there isn't, it assumes the line is part of the previous entry's message (dangerous action number 1). If there is a comma, it tries to parse the text up to that comma as a DateTime; if that fails, it again assumes the line is part of the previous entry's message (dangerous action number 2). Otherwise it behaves as you had written. Keep in mind that DateTime.TryParse uses the current culture, so a string like "5/31/18" may not be recognized as a date on a machine whose culture expects day/month/year.
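
    The same "is this line a new entry or a continuation?" decision can also be made with a single pattern match instead of the comma check plus DateTime.TryParse. The following is only a sketch, assuming every entry header looks exactly like the sample lines in the question; ChatEntry and ChatParser are hypothetical names, not part of the original code, and the stamp is kept as a string here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical container; the question's Whatsapp class is not shown in full.
public sealed class ChatEntry
{
    public string Stamp { get; set; } = "";
    public string User { get; set; } = "";
    public string Message { get; set; } = "";
}

public static class ChatParser
{
    // Assumed header shape, taken from the sample: "5/31/18, 3:20 PM - User2: Oh well.."
    private static readonly Regex Header = new Regex(
        @"^(?<stamp>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} [AP]M) - (?<user>[^:]+): (?<text>.*)$");

    public static List<ChatEntry> Parse(TextReader reader)
    {
        var entries = new List<ChatEntry>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var match = Header.Match(line);
            if (match.Success)
            {
                // The line starts a new entry.
                entries.Add(new ChatEntry
                {
                    Stamp = match.Groups["stamp"].Value,
                    User = match.Groups["user"].Value.Trim(),
                    Message = match.Groups["text"].Value.Trim()
                });
            }
            else if (entries.Count > 0)
            {
                // No header: the line continues the previous message.
                entries[entries.Count - 1].Message += "\n" + line;
            }
            // Lines that appear before the first header are ignored.
        }
        return entries;
    }
}
```

    It could be called as ChatParser.Parse(reader) with any TextReader, for example the one returned by File.OpenText inside a using block. Unlike the loop above, this sketch keeps blank lines inside a message, and a line that has a timestamp but no "User:" part would be treated as a continuation.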

    Unrelated comments: why are you storing things that are DateTime values as strings? The Substring lines where you assign to the object are all but unreadable, so you probably want to revisit those. Since both are outside the scope of the question, I'll just leave them here as food for thought.
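
    On the DateTime point, one possible direction is to parse the stamp once into a real DateTime and derive the date and time from it. This is a sketch under the assumption that stamps always look like "5/31/18, 3:19 PM" as in the sample; exports from other locales may use a different layout. StampParser and TryParseStamp are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class StampParser
{
    // Format inferred from the sample lines in the question; an assumption, not a WhatsApp guarantee.
    private const string StampFormat = "M/d/yy, h:mm tt";

    // Returns false instead of throwing when the text does not match the expected format.
    public static bool TryParseStamp(string text, out DateTime value)
    {
        return DateTime.TryParseExact(
            text,
            StampFormat,
            CultureInfo.InvariantCulture, // fixed culture, so the result does not depend on the machine
            DateTimeStyles.None,
            out value);
    }
}
```

    With a DateTime stored on the entry, the separate Date and Time strings become unnecessary: value.Date and value.TimeOfDay give both parts when needed.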

    Again, I know my additions aren't pretty, but then again parsing strings into things never is.