
How to Move a Text Pattern Horizontally Using CMD or Cygwin CLI Tool?


I don't know if this is even possible on the command line, but here is what I want to do:

I have a text file that looks like this:

- FileName1.txt
http://example.com/AnyName-For-File-1.txt
- FileName2.txt
- FileName3.txt
- FileName4.txt
http://example.com/AnyName-For-File-4.txt
- FileName5.txt
http://example.com/AnyName-For-File-5.txt

As you can see, the entries are irregular: some files have an address and some don't. That means I can't apply any line-based operation such as sorting, or I would lose the pairing between file names and addresses.

So first I had to move each address line up onto the line before it. That was the easy part in a GUI editor; I did it with a regex in Notepad++/TextPad (Find: "\nhttp", Replace: " http"). The result was:

Step.1:-

- FileName1.txt http://example.com/AnyName-For-File-1.txt
- FileName2.txt
- FileName3.txt
- FileName4.txt http://example.com/AnyName-For-File-4.txt
- FileName5.txt http://example.com/AnyName-For-File-5.txt

Now the hard part (at least for me) is moving the matched pattern to the beginning of its line, exactly like this:

Step.2:-

http://example.com/AnyName-For-File-1.txt- FileName1.txt
- FileName2.txt
- FileName3.txt
http://example.com/AnyName-For-File-4.txt- FileName4.txt 
http://example.com/AnyName-For-File-5.txt- FileName5.txt 

Now I can safely sort the lines or do whatever else I need. So, my question is:

On the command line (CMD or Cygwin):

1- How do I find "\nhttp" and replace it with " http"?

2- How do I move the matching pattern (the file address, from http to .txt) to the beginning of its line?

If there is any other technique, it would be great to know it.

Thanks a lot guys for the help you're offering, in such a great community. I really appreciate your help :)


Solution

  • Here is an awk command which, I think, does what you want:

    $ awk '/^http/{print $0 last;last="";next} last {print last} {last=$0} END{if (last) print last;}' file2
    http://example.com/AnyName-For-File-1.txt- FileName1.txt
    - FileName2.txt
    - FileName3.txt
    http://example.com/AnyName-For-File-4.txt- FileName4.txt
    http://example.com/AnyName-For-File-5.txt- FileName5.txt
    

    How it works

    The script has one variable, last, which holds the contents of the previous line. awk implicitly loops over every line in the input file and applies each of the following rules in order:

    • /^http/{print $0 last;last="";next}

      If the current line starts with http, print it immediately followed by the previous line (print $0 last concatenates the two with no separator). Then set last to empty, skip the remaining commands, and jump to the next line.

    • last {print last}

      If the last variable is not empty, print it. This only happens if there was no URL to go with the previous line.

    • {last=$0}

      Update the last variable with the current line. In awk, $0 denotes the whole of the current line.

    • END{if (last) print last;}

      At the end of the input, if there is still a line in last, print it. This only happens if the last line was a file name which lacked a URL.
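    The four rules above can be laid out as a multi-line script, which may be easier to read and adapt. This is only a sketch: the sample data is copied from the question, the temporary file is an arbitrary choice, and the tests are written as last != "" instead of a bare last so that a held line consisting of just 0 would still be printed (a small deviation from the one-liner).

```shell
# Sample input copied from the question; the temporary file path is arbitrary.
input=$(mktemp)
cat > "$input" <<'EOF'
- FileName1.txt
http://example.com/AnyName-For-File-1.txt
- FileName2.txt
- FileName3.txt
- FileName4.txt
http://example.com/AnyName-For-File-4.txt
- FileName5.txt
http://example.com/AnyName-For-File-5.txt
EOF

# Same logic as the one-liner, one rule per line.
result=$(awk '
    /^http/    { print $0 last; last = ""; next }  # URL line: emit URL + held name
    last != "" { print last }                      # held name had no URL: emit it alone
               { last = $0 }                       # hold the current line
    END        { if (last != "") print last }      # flush a trailing name without URL
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

    The output should match the five lines shown for the one-liner: URL-first lines for files 1, 4, and 5, and the bare names for files 2 and 3.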

    Doing just the first step in sed

    As long as the file is not too big, this will work:

    $ sed  ':a;N;$!b a;s/\nhttp/ http/g' file
    - FileName1.txt http://example.com/AnyName-For-File-1.txt
    - FileName2.txt
    - FileName3.txt
    - FileName4.txt http://example.com/AnyName-For-File-4.txt
    - FileName5.txt http://example.com/AnyName-For-File-5.txt
    

    This works by reading the entire file into sed's pattern space and then replacing every newline that is followed by http with a space.

    In more detail:

    • :a;N;$!b a

      This is a loop. :a is a label. N reads the next line into the pattern space. b a jumps to label :a. We want to continue this loop until the end of the file. The last line in the file is called $ and ! means not. So, $!b a means jump to label :a unless we have reached the last line of the file.

    • s/\nhttp/ http/g

      Now that we have the whole of the file in the pattern space, we do a global substitution replacing "\nhttp" with " http" (a space followed by http).
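    If GNU sed is available (an assumption; Cygwin normally ships it, but other sed implementations may not accept this option), the same whole-file substitution can probably be written without the explicit loop by using the -z option. A minimal sketch, with sample data copied from the question and an arbitrary temporary file:

```shell
# Sample input copied from the question; the temporary file path is arbitrary.
input=$(mktemp)
cat > "$input" <<'EOF'
- FileName1.txt
http://example.com/AnyName-For-File-1.txt
- FileName2.txt
- FileName3.txt
- FileName4.txt
http://example.com/AnyName-For-File-4.txt
- FileName5.txt
http://example.com/AnyName-For-File-5.txt
EOF

# -z (GNU sed only) splits input on NUL bytes instead of newlines, so a plain
# text file arrives as a single record and \n can be matched directly.
joined=$(sed -z 's/\nhttp/ http/g' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

    Like the loop version, this holds the whole file in memory, so the same caveat about very large files applies.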

    Here is a variation on the :a;N;$!b a loop. It reads lines into the pattern space until the pattern space contains http. Then it replaces the newline in front of http with a space:

    $ sed ':a;N;/http/!b a; s/\nhttp/ http/' file
    - FileName1.txt http://example.com/AnyName-For-File-1.txt
    - FileName2.txt
    - FileName3.txt
    - FileName4.txt http://example.com/AnyName-For-File-4.txt
    - FileName5.txt http://example.com/AnyName-For-File-5.txt
    

    Since this approach doesn't read the whole file in at once, it is easier on memory if the file is large.

    In more detail:

    • :a;N;/http/!b a

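      Just as above, this is a loop. It keeps branching back to label :a to read another line until the pattern space includes http. Note that /http/ is not anchored, so it matches http anywhere in the accumulated text, not only at the start of a line.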

    • s/\nhttp/ http/

      This replaces the newline in front of http with a space.
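    The sed commands above cover only the first step. For the second step from the question (moving the URL to the front of its line), a substitution with two capture groups is one possible approach. This is a sketch, not part of the original answer: it assumes the Step 1 output shown in the question, that the name and URL are separated by a single space, and that the URL runs to the end of the line. The temporary file is an arbitrary choice.

```shell
# Step 1 output copied from the question; the temporary file path is arbitrary.
step1=$(mktemp)
cat > "$step1" <<'EOF'
- FileName1.txt http://example.com/AnyName-For-File-1.txt
- FileName2.txt
- FileName3.txt
- FileName4.txt http://example.com/AnyName-For-File-4.txt
- FileName5.txt http://example.com/AnyName-For-File-5.txt
EOF

# \(.*\) captures the name and \(http.*\) captures the URL; the replacement
# \2\1 writes the URL first and drops the separating space.
# Lines without a URL do not match and pass through unchanged.
moved=$(sed 's/^\(.*\) \(http.*\)$/\2\1/' "$step1")

printf '%s\n' "$moved"
rm -f "$step1"
```

    The two steps can also be chained by piping the output of either Step 1 command into this substitution, and the result can then be piped to sort.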