I have been stuck on a problem for the last two weeks and would like some help. I want to extract some data from a website over HTTP. The site lists accidents and incidents along with details about each one, and I want to use that information in my Android app. I have asked about this before but still could not solve it. Someone told me I would have to get the data as JSON, which I have not done before. If that is the only solution, how do I do it? If there is a simpler way, please point me to it. So far I can download the entire page content using the following code:
private String DownloadText(String URL) {
    final int BUFFER_SIZE = 2000;
    InputStream in;
    try {
        in = OpenHttpConnection(URL);
    } catch (IOException e1) {
        e1.printStackTrace();
        return "exception in downloadText";
    }
    // OpenHttpConnection returns null when the response code is not HTTP_OK
    if (in == null) {
        return "";
    }
    InputStreamReader isr = new InputStreamReader(in);
    int charRead;
    // StringBuilder avoids creating a new String on every loop iteration
    StringBuilder str = new StringBuilder();
    char[] inputBuffer = new char[BUFFER_SIZE];
    try {
        while ((charRead = isr.read(inputBuffer)) > 0) {
            //---append only the chars that were actually read---
            str.append(inputBuffer, 0, charRead);
        }
    } catch (IOException e) {
        e.printStackTrace();
        return "";
    } finally {
        try {
            in.close();
        } catch (IOException ignored) {
            // nothing useful to do if closing fails
        }
    }
    return str.toString();
}
private InputStream OpenHttpConnection(String urlString) throws IOException {
    InputStream in = null;
    int response = -1;
    URL url = new URL(urlString);
    URLConnection conn = url.openConnection();
    if (!(conn instanceof HttpURLConnection))
        throw new IOException("Not an HTTP connection");
    try {
        HttpURLConnection httpConn = (HttpURLConnection) conn;
        httpConn.setAllowUserInteraction(false);
        httpConn.setInstanceFollowRedirects(true);
        httpConn.setRequestMethod("GET");
        httpConn.connect();
        response = httpConn.getResponseCode();
        // in stays null for any response other than 200 OK
        if (response == HttpURLConnection.HTTP_OK) {
            in = httpConn.getInputStream();
        }
    } catch (Exception ex) {
        // keep the original exception as the cause so the failure can be diagnosed
        throw new IOException("Error connecting", ex);
    }
    return in;
}
But this returns everything on the page, i.e. the data together with all the HTML and XML markup, and I want only the information I need.
Another thing: do I need the website administrator's permission before using that data?
What you're looking for is something called web scraping or html scraping. Have a look at this SO question to get you started: Options for HTML scraping?
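To give a rough idea of what scraping looks like, here is a minimal sketch that takes the string returned by your DownloadText method and pulls out only the text inside table cells. I don't know how the site you are targeting is laid out, so the assumption that the data sits in <td> elements is mine, and the class and method names (IncidentScraper, extractCells) are made up for illustration. It uses only java.util.regex so it runs without extra libraries, but regular expressions are fragile on real-world HTML; a proper HTML parser such as jsoup is usually the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and the <td>-based layout are assumptions,
// not something taken from the actual website.
public class IncidentScraper {

    // Matches the contents of each <td>...</td> pair, across line breaks
    private static final Pattern CELL = Pattern.compile(
            "<td\\b[^>]*>(.*?)</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the visible text of every table cell found in the HTML
    public static List<String> extractCells(String html) {
        List<String> cells = new ArrayList<String>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // Drop any tags nested inside the cell, then trim whitespace
            String text = m.group(1).replaceAll("<[^>]+>", "").trim();
            cells.add(text);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by DownloadText(url)
        String html = "<table><tr><td>Collision</td>"
                + "<td> 2 <b>injured</b> </td></tr></table>";
        for (String cell : extractCells(html)) {
            System.out.println(cell);
        }
    }
}
```

In your app you would call extractCells(DownloadText(url)) and then pick out the cells you care about. Once you know which tags or attributes actually wrap the data on that site, adjust the pattern (or the parser selector, if you switch to a library) accordingly.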