regex grep whitespace

grep: match all characters up to (not including) first blank space


I have a text file that has the following format:

characters(that I want to keep) (space) characters(that I want to remove)

So for example:

foo garbagetext
hello moregarbage
keepthis removethis
(etc.)

I was trying to use the grep command in Linux to keep only the characters on each line up to, but not including, the first blank space. I have made numerous attempts, such as:

grep '*[[:space:]]' text1.txt > text2.txt
grep '*[^\s]' text1.txt > text2.txt
grep '/^[^[[:space:]]]+/' text1.txt > text2.txt

I pieced these together from different examples, but I have had no luck. They all produce an empty text2.txt file. I am new to this. What am I doing wrong?

*EDIT:

The parts I want to keep include capital letters. So I want to keep any/all characters up to and not including the blank space (removing everything from the blank space onward) in each line.

**EDIT:

The garbage text (that I want to remove) can contain anything, including spaces, special characters, etc. So for example:

AA rough, cindery lava [n -S]

After running grep -o '[^ ]*' text1.txt > text2.txt, the line above becomes:

AA
rough,
cindery
lava
[n
-S]

in text2.txt. (All I want to keep is AA)


SOLUTION (provided by Rohit Jain with further input by beny23):

grep -o '^[^ ]*' text1.txt > text2.txt
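
To see why the ^ anchor matters, here is a minimal sketch, assuming GNU grep and using the sample line from the second edit. The variable names (line, unanchored, anchored) are only for illustration.

```shell
# Sample line copied from the question.
line='AA rough, cindery lava [n -S]'

# Unanchored: -o prints every run of non-space characters, one per output line.
unanchored=$(printf '%s\n' "$line" | grep -o '[^ ]*')

# Anchored: ^ limits the match to the run that starts the line.
anchored=$(printf '%s\n' "$line" | grep -o '^[^ ]*')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored pattern gives the six separate tokens shown above, while the anchored one should leave only AA.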

Solution

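  • You are putting the quantifier * in the wrong place. It has to follow the expression it repeats; at the start of a basic regular expression, * is just a literal asterisk.

    Try this instead:

    grep -o '^[^[:space:]]*' text1.txt > text2.txt
    

    or, with GNU grep, more briefly:

    grep -o '^\S*' text1.txt > text2.txt
    

    [^[:space:]] and \S each match one non-whitespace character. \S is a GNU extension, and \s is not treated as whitespace inside a bracket expression, so [^\s] only excludes a backslash and the letter s. The anchor ^ matches at the beginning of the line, and -o makes grep print only the matched part instead of the whole line.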
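
A quick way to check how -o changes the output is the sketch below. It assumes GNU grep, and the temp file and variable names are made up for the demonstration. The last command uses cut, which is a different tool from grep, as a possible alternative when the separator is always a single space.

```shell
# Throwaway temp file filled with sample lines from the question.
sample=$(mktemp)
printf '%s\n' 'foo garbagetext' 'hello moregarbage' 'AA rough, cindery lava [n -S]' > "$sample"

# Without -o, grep prints every matching line in full.
whole=$(grep '^[^[:space:]]*' "$sample")

# With -o, only the matched prefix of each line is printed.
prefix=$(grep -o '^[^[:space:]]*' "$sample")

# Swapped-in alternative (not grep): take the first space-delimited field.
fields=$(cut -d ' ' -f 1 "$sample")

printf '%s\n' "$prefix"
rm -f "$sample"
```

With this input, prefix and fields should both hold foo, hello and AA, while whole is the unchanged input.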