java file search bufferedreader

Searching through a text file for a string?


I have a text file with thousands and thousands of lines of gibberish. Hidden somewhere inside is a string of words in English.

What would be the most efficient way to search through the text without having to read it line by line?

Is there a script I could write to read through the file?

I can post the file if anyone's interested.

Edit: If someone would be willing to show me how to check for words with a BufferedReader in Java, that would be really cool!


Solution

  • If you know nothing more than that there is one streak of valid English words somewhere in the file, you will have to read the whole file and check each word against a set of valid words (a dictionary). On the first hit, you keep reading until the first invalid word occurs; the words in between are your streak.

    This assumes there are no accidentally valid words within the gibberish. If there are, you would have to find all streaks of valid words and then probably have a human (you) decide which is the right one.

    Edit: another thing you can do is define a minimum streak length n, if you know that the string of words you are looking for consists of at least n valid words. This would at least spare you from dealing with all the false-positive one-word streaks of accidentally valid words within the gibberish.
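Since the question asks about BufferedReader, here is a minimal sketch of the streak approach described above. The class name `StreakFinder`, the method `findStreaks`, the tiny in-memory dictionary, and the sample text are all made up for illustration; a real run would load a proper word list and wrap a `FileReader` instead of a `StringReader`. It also assumes words are separated by whitespace and that a streak may continue across line breaks.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StreakFinder {

    // Collects every run of at least minLength consecutive dictionary words.
    // Assumption: the dictionary holds lower-case words without punctuation.
    public static List<String> findStreaks(BufferedReader reader,
                                           Set<String> dictionary,
                                           int minLength) throws IOException {
        List<String> streaks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : line.split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank line or leading whitespace
                }
                // Normalize for the lookup: lower-case, letters only.
                String word = token.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
                if (!word.isEmpty() && dictionary.contains(word)) {
                    current.add(token);                 // streak continues
                } else {
                    flush(current, streaks, minLength); // streak ends here
                }
            }
        }
        flush(current, streaks, minLength); // streak may run to end of file
        return streaks;
    }

    // Keeps the current streak if it is long enough, then resets it.
    private static void flush(List<String> current, List<String> streaks, int minLength) {
        if (current.size() >= minLength) {
            streaks.add(String.join(" ", current));
        }
        current.clear();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; replace with a real word list and file.
        Set<String> dictionary = Set.of("the", "quick", "brown", "fox", "a");
        String text = "xqz a kkjw\nzzt the quick\nbrown fox qpl";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            System.out.println(findStreaks(reader, dictionary, 3));
        }
    }
}
```

With a minimum length of 3, the lone "a" in the sample is dropped as a false positive and only the four-word streak is reported. Setting the minimum to 1 returns every streak, which is the "let a human decide" variant.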