Here's an example input:
<div><a class="document-subtitle category" href="/store/apps/category/GAME_ADVENTURE"> <span itemprop="genre">Adventure</span> </a> </div> <div> </div>
The string I'm trying to locate is this:
document-subtitle category" href="/store/apps/category/
and I want to extract the characters that follow that string, up to the end of the href attribute (">).
In this case, my output should be:
GAME_ADVENTURE
My input file is guaranteed to contain exactly one occurrence of this string:
document-subtitle category" href="/store/apps/category/
What's the easiest way of achieving this?
This worked for me:
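import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExtractData {
    public static String matcher = "document-subtitle category\" href=\"/store/apps/category/";

    public static void main(String[] args) throws IOException {
        String filePath = args[0];
        // Read the whole file into memory; UTF-8 is assumed here.
        String content = new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);

        int startIndex = content.indexOf(matcher);
        if (startIndex < 0) {
            throw new IllegalStateException("Marker not found in " + filePath);
        }
        // The category starts right after the marker and ends at the closing "> of the href.
        int valueStart = startIndex + matcher.length();
        int endIndex = content.indexOf("\">", valueStart);
        if (endIndex < 0) {
            throw new IllegalStateException("End of href attribute not found in " + filePath);
        }
        String category = content.substring(valueStart, endIndex);
        System.out.println("category is " + category);
    }
}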
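
If you'd rather not do the index arithmetic by hand, a regular expression is another option. Here's a minimal sketch, assuming the category value itself never contains a double quote. The class name CategoryExtractor and the extract method are just names I made up for illustration, and the sample HTML is the input from the question, trimmed slightly:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CategoryExtractor {
    // Pattern.quote treats the marker as literal text; the capture group
    // then takes everything up to (not including) the next double quote.
    private static final Pattern CATEGORY = Pattern.compile(
            Pattern.quote("document-subtitle category\" href=\"/store/apps/category/")
                    + "([^\"]*)\"");

    // Returns the category if the marker is present, otherwise an empty Optional.
    public static Optional<String> extract(String html) {
        Matcher m = CATEGORY.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<div><a class=\"document-subtitle category\" "
                + "href=\"/store/apps/category/GAME_ADVENTURE\"> "
                + "<span itemprop=\"genre\">Adventure</span> </a> </div>";
        System.out.println(extract(html).orElse("(no match)")); // prints GAME_ADVENTURE
    }
}
```

Returning an Optional means a file without the marker gives you an empty result instead of an exception from substring. For anything beyond a single known marker like this one, a real HTML parser is probably the safer choice.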