Tags: c#, string, replace, diacritics, non-ascii-characters

C# - How to add a space after every dash character for every line in a subtitle file


I'm making a very simple Windows application using Visual Studio and C# that edits subtitles files for movies. I want a program that adds a space to dialog sentences when there isn't one. For example:

-Hey, what's up?

-Nothing much.

to

- Hey, what's up?

- Nothing much.

I used the toolbox to create an interface with just one button for selecting the correct file. This is the code I have for this button:

private void button1_Click(object sender, EventArgs e)
{
    if (openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        string text = File.ReadAllText(openFileDialog1.FileName, Encoding.GetEncoding("iso-8859-1"));
        text = text.Replace("-A", "- A");
        File.WriteAllText(openFileDialog1.FileName, text, Encoding.GetEncoding("iso-8859-1"));
    }
}

This replaces "-A" with "- A", which inserts the space. It is the solution I came up with, and I was planning to repeat it for every letter, including accented letters such as À, Á, È and É.

This does not work. If I put text = text.Replace("-É", "- É"); the program does nothing.

What I want to know is how do I fix this.

Thank you for reading and if you have a better alternative for my application then please feel free to let me know.


Solution

  • As suggested in the comments, use Regex.

            var rx = new System.Text.RegularExpressions.Regex("^-([^ ])");
            // ... then, inside your loop (text is already declared, so do not redeclare it with var):
            text = rx.Replace(text, "- $1");
    

    Basically, the pattern matches a dash at the beginning of the string, but only when it is NOT followed by a space. The () means that the character following the dash is captured ("saved"). Replace then searches the provided string and replaces the matched text with a dash, a space, and the captured character ($1), whatever it is.

    Source: https://xkcd.com/208/

    Edit: you do not have a loop; you have a single string containing the full content of the file, in which every line should be a subtitle line (right?). If that is the case, you can configure the regular expression so that ^ matches at the start of every line instead of only at the start of the string, like this:

            var rx = new Regex("^-([^ ])", RegexOptions.Multiline);
    

    See this fiddle for an example: https://dotnetfiddle.net/ciFlAu
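
    Putting the pieces together, here is a minimal sketch of how the Multiline pattern could be applied to the whole file content at once. The class name SubtitleFixer and the method name AddSpaceAfterDash are hypothetical, chosen only for illustration; the pattern and the replacement string are the ones from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is an assumption for this example.
public static class SubtitleFixer
{
    // RegexOptions.Multiline makes ^ match at the start of every line,
    // not only at the start of the whole string.
    private static readonly Regex DashWithoutSpace =
        new Regex("^-([^ ])", RegexOptions.Multiline);

    // Inserts a space after a leading dash on each line that lacks one.
    // $1 re-emits the captured character, so any letter works,
    // accented or not, without listing the letters one by one.
    public static string AddSpaceAfterDash(string text)
    {
        return DashWithoutSpace.Replace(text, "- $1");
    }

    public static void Main()
    {
        string input = "-Hey, what's up?\r\n-Étienne!\r\n- Already fine.\r\n";
        Console.Write(AddSpaceAfterDash(input));
    }
}
```

    In the button handler from the question, the text.Replace("-A", "- A") line would then become text = SubtitleFixer.AddSpaceAfterDash(text);, placed between the File.ReadAllText and File.WriteAllText calls. This assumes the file really is encoded as iso-8859-1; if it is saved in another encoding such as UTF-8, the accented characters will not be decoded correctly by ReadAllText.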