java inputstream bufferedreader bufferedwriter inputstreamreader

How can I remove all characters before the first instance of a tab?


I have a large text file of around 200,000 lines of word translations. I want to keep only the translated text, which appears after the tab.

abaxial van  osovine
abbacy  opatstvo
abbaino     kora
abbatial    opatski
abbe    opat
abbé    opat
abbé    sveæenik
hematological parameters    hematološki pokazatelji

How can I strip all characters up to and including the first instance of a tab?


Solution

  • You can use this regex to match everything before the translation, up to and including the run of two or more spaces that separates the columns:

     .+? {2,}
    

    Try this regex online: https://regex101.com/r/P0TY1k/1

    Pass this regex to replaceAll on your string. Strings in Java are immutable, so assign the result back:

    yourString = yourString.replaceAll(".+? {2,}", "");
    

    EDIT: If the delimiter is a tab rather than two or more spaces, you can try this regex instead, which accepts either:

    .+?(?: {2,}|\t)
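
For a file of this size, it may be easier to apply the regex line by line instead of loading everything into one string. The following is a minimal sketch, not a definitive implementation. The class name TranslationExtractor, the method names, and the UTF-8 charset are assumptions made for illustration. The pattern is the one from the answer with a leading ^ added, so that it can only match at the start of a line and never touches a translation that itself contains two spaces.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is not from the original answer.
class TranslationExtractor {

    // Lazily match from the start of the line up to and including
    // the first run of 2+ spaces or the first tab.
    private static final Pattern PREFIX = Pattern.compile("^.+?(?: {2,}|\\t)");

    // Returns the text after the first delimiter, or the line unchanged
    // if it contains no delimiter.
    static String stripPrefix(String line) {
        return PREFIX.matcher(line).replaceFirst("");
    }

    // Streams the input file line by line and writes the stripped lines.
    // UTF-8 is an assumption; use the charset your file is actually in.
    static void stripFile(Path input, Path output) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripPrefix(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2) {
            // Usage: java TranslationExtractor <input file> <output file>
            stripFile(Paths.get(args[0]), Paths.get(args[1]));
        } else {
            System.out.println(stripPrefix("abbe    opat")); // prints "opat"
        }
    }
}
```

Lines without a delimiter are written through unchanged, so it is worth checking the output for entries that still contain the source word.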