Tags: java, apache-httpclient-4.x, webservice-client

API's HTTP response yields the entire HTML page instead of the response's body


I am writing a Java program that uses the FreeCite API (a citation extraction service). The API guide is defined here, and it includes an example in Ruby. I have been trying to call the API from Java (Apache HttpClient) for days, but it doesn't work as expected.

Here is the example in Ruby

Code:

require 'net/http'

Net::HTTP.start('localhost', 3000) do |http|
  response = http.post('/citations/create',
    'citation=A. Bookstein and S. T. Klein,  \
    Detecting content-bearing words by serial clustering,  \
Proceedings of the Nineteenth Annual International ACM SIGIR Conference \
on Research and Development in Information Retrieval,   \
pp. 319327,   1995.',
'Accept' => 'text/xml')

  puts "Code: #{response.code}"
  puts "Message: #{response.message}"
  puts "Body:\n #{response.body}"
end

n.b.: localhost refers to FreeCite. The expected response code is 201, and the response is XML.

Result:

<citations>
  <citation valid=true>
  <authors>
    <author>I S Udvarhelyi</author>
    <author>C A Gatsonis</author>
    <author>A M Epstein</author>
    <author>C L Pashos</author>
    <author>J P Newhouse</author>
    <author>B J McNeil</author>
  </authors>
  <title>Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes</title>
  <journal>Journal of the American Medical Association</ journal>
  <pages>18--2530</pages>
  <year>1992</year>
  <raw_string>Udvarhelyi, I.S., Gatsonis, C.A., Epstein, A.M., Pashos, C.L., Newhouse, J.P. and McNeil, B.J. Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes. Journal of the American Medical Association, 1992; 18:2530-2536.</raw_string>
  <ctx:context-objects xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx' xmlns:ctx='info:ofi/fmt:xml:xsd:ctx'>
    <ctx:context-object timestamp='2008-07-11T00:57:33-04:00'
    encoding='info:ofi/enc:UTF-8' version='Z39.88-2004' identifier=''>
      <ctx:referent>
        <ctx:metadata-by-val>
          <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
          <ctx:metadata>
            <journal xmlns:rft='info:ofi/fmt:xml:xsd:journal' xsi:schemaLocation='info:ofi/fmt:xml:xsd:journal http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:journal'>
              <rft:atitle>Acute Myocardial Infarction in the Medicare population: process of care and clinical outcomes</rft:atitle>
              <rft:spage>18</rft:spage>
              <rft:date>1992</rft:date>
              <rft:stitle>Journal of the American Medical Association</rft:stitle>
              <rft:genre>article</rft:genre>
              <rft:epage>2530</rft:epage>
              <rft:au>I S Udvarhelyi</rft:au>
              <rft:au>C A Gatsonis</rft:au>
              <rft:au>A M Epstein</rft:au>
              <rft:au>C L Pashos</rft:au>
              <rft:au>J P Newhouse</rft:au>
              <rft:au>B J McNeil</rft:au>
            </journal>
          </ctx:metadata>
        </ctx:metadata-by-val>
      </ctx:referent>
    </ctx:context-object>
  </ctx:context-objects>
  </citation>
</citations>

Here's my project:

Code:

import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;

public class HttpClientTest {

    public static void main(String[] args) throws UnsupportedEncodingException {
        HttpClient httpclient = HttpClients.createDefault();
        HttpPost httppost = new HttpPost("http://freecite.library.brown.edu/citations/create");

        // Request parameters and other properties.
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("citation", "A. Bookstein and S. T. Klein, Detecting content-bearing words by serial clustering, "
                + "Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 319327, 1995."));
        httppost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));

        //Execute and get the response.
        HttpResponse response = null;
        try {
            response = httpclient.execute(httppost);
            response.setHeader("Content-Type", "text/xml");

        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        HttpEntity entity = response.getEntity();
        System.out.println(response.getStatusLine());

        if (entity != null) {
            InputStream instream = null;
            try {
                instream = entity.getContent();
                // NB: does not close inputStream, you can use IOUtils.closeQuietly for that
                String theString = IOUtils.toString(instream, "UTF-8"); 
                IOUtils.closeQuietly(instream);
                System.out.println(theString);
            } catch (UnsupportedOperationException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            try {
                // do something useful
            } finally {
                try {
                    instream.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

Result:

I got an entire HTML page instead of the XML, and response code 200 instead of 201.

HTTP/1.1 200 

<script src="/javascripts/prototype.js?1218559878" type="text/javascript"></script>
<link href="/stylesheets/citation.css?1218559878" media="screen" rel="stylesheet" type="text/css" />
<table>

  <tr>
  <td>
  <span class="citation"> <span class="authors"> <span class="author"> A Bookstein</span> <span class="author"> S T Klein</span> </span> <span class="title"> Detecting content-bearing words by serial clustering</span> <span class="booktitle"> Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</span> <span class="pages"> 319327</span> <span class="year"> 1995</span> <br> <span class="raw_string"> A. Bookstein and S. T. Klein, Detecting content-bearing words by serial clustering, Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 319327, 1995.</span> </span> 
  <br>
  <code> &lt;ctx:context-objects xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='info:ofi/fmt:xml:xsd:ctx http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:ctx' xmlns:ctx='info:ofi/fmt:xml:xsd:ctx'&gt;&lt;ctx:context-object timestamp='2016-10-29T02:43:38-04:00' encoding='info:ofi/enc:UTF-8' version='Z39.88-2004' identifier=''&gt;&lt;ctx:referent&gt;&lt;ctx:metadata-by-val&gt;&lt;ctx:format&gt;info:ofi/fmt:xml:xsd:book&lt;/ctx:format&gt;&lt;ctx:metadata&gt;&lt;book xmlns:rft='info:ofi/fmt:xml:xsd:book' xsi:schemaLocation='info:ofi/fmt:xml:xsd:book http://www.openurl.info/registry/docs/info:ofi/fmt:xml:xsd:book'&gt;&lt;rft:atitle&gt;Detecting content-bearing words by serial clustering&lt;/rft:atitle&gt;&lt;rft:date&gt;1995&lt;/rft:date&gt;&lt;rft:btitle&gt;Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval&lt;/rft:btitle&gt;&lt;rft:genre&gt;proceeding&lt;/rft:genre&gt;&lt;rft:pages&gt;319327&lt;/rft:pages&gt;&lt;rft:au&gt;A Bookstein&lt;/rft:au&gt;&lt;rft:au&gt;S T Klein&lt;/rft:au&gt;&lt;/book&gt;&lt;/ctx:metadata&gt;&lt;/ctx:metadata-by-val&gt;&lt;/ctx:referent&gt;&lt;/ctx:context-object&gt;&lt;/ctx:context-objects&gt; </code>
  </td>

  <td bgcolor="FF9999" class='choose_option'>
    <input id="unusable"
      name="citation_rating_13655375"
      type="radio"
      value="unusable"
      onclick="new Ajax.Request('/citations/set_rating/13655375', {parameters:{rating: this.value} }); return true;"
       />
    <label for='unusable'>unusable</label>
  </td>

  <td bgcolor="FFFFCC" class='choose_option'>
    <input id="usable"
      name="citation_rating_13655375"
      type="radio"
      value="usable"
      onclick="new Ajax.Request('/citations/set_rating/13655375', {parameters:{rating: this.value} }); return true;"
       />
    <label for='usable'>good enough</label>
  </td>

  <td bgcolor="CCFFCC" class='choose_option'>
    <input id="perfect"
      name="citation_rating_13655375"
      type="radio"
      value="perfect"
    onclick="new Ajax.Request('/citations/set_rating/13655375', {parameters:{rating: this.value} }); return true;"
       />
    <label for='perfect'>perfect</label>
  </td>

  </tr>

</table>

<br>
Key:
<span title="author" class="author">Authors</span>
<span title="title" class="title">Title</span>
<span title="journal" class="journal">Journal</span>
<span title="booktitle" class="booktitle">Booktitle</span>
<span title="editor" class="editor">Editor</span>
<span title="volume" class="volume">Volume</span>
<span title="publisher" class="publisher">Publisher</span>
<span title="institution" class="institution">Institution</span>
<span title="location" class="location">Location</span>
<span title="number" class="number">Number</span>
<span title="pages" class="pages">Pages</span>
<span title="year" class="year">Year</span>
<span title="tech" class="tech">Tech</span>
<span title="note" class="note">Note</span>
<br>
<span class="raw_string">Original citation string</span>
<br>
<code>ContextObject</code>
<br>
<a href="/welcome">Home</a>

n.b.: Inside the <code> tag above, there is this XML data:

<rft:atitle>Detecting content-bearing words by serial clustering</rft:atitle>
<rft:date>1995</rft:date>
<rft:btitle>Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</rft:btitle>
<rft:genre>proceeding</rft:genre>
<rft:pages>319327</rft:pages>
<rft:au>A Bookstein</rft:au>
<rft:au>S T Klein</rft:au>

Question: Where is the error, and how can I fix it to get an XML response (with response code 201)?


Solution

  • Here is what you are doing in Ruby ...

    response = http.post('/citations/create',
       'citation=A. Bookstein and S. T. Klein,  \
       Detecting content-bearing words by serial clustering,  \
       Proceedings of the Nineteenth Annual International ACM SIGIR Conference \
       on Research and Development in Information Retrieval,   \
       pp. 319327,   1995.',
     'Accept' => 'text/xml')
    

    Here is what you are doing in Java

    HttpClient httpclient = HttpClients.createDefault();
    HttpPost httppost = new HttpPost(
        "http://freecite.library.brown.edu/citations/create");
    
    // Request parameters and other properties.
    List<NameValuePair> params = new ArrayList<NameValuePair>();
    params.add(new BasicNameValuePair("citation", 
        "A. Bookstein and S. T. Klein, Detecting content-bearing " +
        "words by serial clustering, " +
        "Proceedings of the Nineteenth Annual International ACM SIGIR " +
        "Conference on Research and Development in Information " +
        "Retrieval, pp. 319327, 1995."));
    httppost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));
    
    ...
    response = httpclient.execute(httppost);
    response.setHeader("Content-Type", "text/xml");
    

    See the difference?

    In the Java case:

    • you are setting Content-Type instead of Accept
    • you are setting it on the HttpResponse object rather than on the HttpPost object
    • you are setting it AFTER executing the request.

    Now, Accept and Content-Type mean different things. Accept says "I want you to send me something of this type". Content-Type says "I am sending you something of this type".

    And, of course, setting a content type on a Response that you have just received is worse than useless. It is actually clobbering the real content type in the response ... which was probably "text/html", because your request didn't specify anything.
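
To make the Accept versus Content-Type distinction concrete, here is a minimal sketch that runs without FreeCite. It uses only the JDK (Java 11+): a hypothetical stand-in server on localhost, and the JDK's built-in java.net.http client rather than Apache HttpClient. The stub's behavior (XML with 201 when Accept contains text/xml, HTML with 200 otherwise) is an assumption modeled on what the question observed, not FreeCite's actual implementation; the class name, method names, and placeholder bodies are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class AcceptHeaderDemo {

    /** Starts a hypothetical stand-in server on a free local port (NOT the real FreeCite). */
    static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/citations/create", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the posted form data
            // The server picks the representation from the REQUEST's Accept header.
            String accept = exchange.getRequestHeaders().getFirst("Accept");
            boolean wantsXml = accept != null && accept.contains("text/xml");
            String body = wantsXml ? "<citations/>" : "<table></table>"; // placeholder bodies
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            // Content-Type on the response describes what the server is sending back.
            exchange.getResponseHeaders().set("Content-Type", wantsXml ? "text/xml" : "text/html");
            exchange.sendResponseHeaders(wantsXml ? 201 : 200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
        return server;
    }

    /** Posts a form; pass null as acceptType to send no Accept header, as in the question. */
    static HttpResponse<String> post(int port, String acceptType)
            throws IOException, InterruptedException {
        HttpRequest.Builder builder = HttpRequest
                .newBuilder(URI.create("http://localhost:" + port + "/citations/create"))
                // Content-Type on the request describes the body the client is sending.
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString("citation=example"));
        if (acceptType != null) {
            builder.header("Accept", acceptType); // set BEFORE the request is sent
        }
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        return client.send(builder.build(), HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStubServer();
        try {
            int port = server.getAddress().getPort();
            HttpResponse<String> withoutAccept = post(port, null);
            HttpResponse<String> withAccept = post(port, "text/xml");
            System.out.println(withoutAccept.statusCode() + " " + withoutAccept.body());
            System.out.println(withAccept.statusCode() + " " + withAccept.body());
        } finally {
            server.stop(0);
        }
    }
}
```

In this sketch the request without an Accept header lands in the stub's HTML branch (status 200), and the request with Accept: text/xml lands in the XML branch (status 201). Nothing the client does to the response object after the fact can change which branch the server took.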

    You should actually be calling

    httppost.setHeader("Accept", "text/xml");
    

    before the execute call.
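
Putting it together, a corrected version of the question's program might look like the sketch below. It assumes Apache HttpClient 4.3 or later on the classpath and that the FreeCite endpoint from the question is still reachable and answers as its guide describes; the class name FreeCiteXmlRequest and the buildRequest helper are hypothetical names introduced here for illustration. It also swaps the Commons IO stream handling for EntityUtils and try-with-resources, which is a simplification and not part of the fix itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

class FreeCiteXmlRequest {

    /** Builds the POST request, including the Accept header, without sending it. */
    static HttpPost buildRequest(String url, String citation) {
        HttpPost httppost = new HttpPost(url);

        // Form-encoded body; the entity supplies the request's own Content-Type.
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("citation", citation));
        httppost.setEntity(new UrlEncodedFormEntity(params, StandardCharsets.UTF_8));

        // The actual fix: ask for XML on the REQUEST, before it is executed.
        httppost.setHeader("Accept", "text/xml");
        return httppost;
    }

    public static void main(String[] args) throws IOException {
        HttpPost httppost = buildRequest(
                "http://freecite.library.brown.edu/citations/create",
                "A. Bookstein and S. T. Klein, Detecting content-bearing words by serial clustering, "
                + "Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research "
                + "and Development in Information Retrieval, pp. 319327, 1995.");

        // try-with-resources closes both the client and the response.
        try (CloseableHttpClient httpclient = HttpClients.createDefault();
             CloseableHttpResponse response = httpclient.execute(httppost)) {
            System.out.println(response.getStatusLine());
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Reads the whole body and releases the underlying stream.
                System.out.println(EntityUtils.toString(entity, StandardCharsets.UTF_8));
            }
        }
    }
}
```

Keeping the request construction in its own method makes it easy to check that the Accept header is really on the outgoing request, for example with httppost.getFirstHeader("Accept"), before any network call is made.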