Manipulate the response body of .html and .jsp pages


I used the code below to capture the response body (the source of the requested page) for a .jsp page.
Can someone please explain how to capture the response body for an .html page as well?

public class DetailFilter implements Filter {
    private FilterConfig config;
    public DetailFilter() {
        super();
    }

    public void init(final FilterConfig filterConfig) throws ServletException {
        this.config = filterConfig;
    }


    public void destroy() {
        config = null;
    }

    public void doFilter(final ServletRequest request, final ServletResponse response,
                         final FilterChain chain) throws IOException, ServletException {

        ServletResponse newResponse = response;

        if (request instanceof HttpServletRequest) {
            newResponse = new CharResponseWrapper((HttpServletResponse) response);
        }

        chain.doFilter(request, newResponse);

        if (newResponse instanceof CharResponseWrapper) {
            String text = newResponse.toString();

            if (text != null) {
                response.getWriter().write(text);
                System.out.println("text is: "+text);
            }
        }
    }
}


public class CharResponseWrapper extends HttpServletResponseWrapper{
    protected CharArrayWriter charWriter;

    protected PrintWriter writer;

    protected boolean getOutputStreamCalled;

    protected boolean getWriterCalled;

    public CharResponseWrapper(HttpServletResponse response) {
        super(response);

        charWriter = new CharArrayWriter();
    }

    public ServletOutputStream getOutputStream() throws IOException {
        if (getWriterCalled) {
            throw new IllegalStateException("getWriter already called");
        }

        getOutputStreamCalled = true;
        return super.getOutputStream();
    }

    public PrintWriter getWriter() throws IOException {
        if (writer != null) {
            return writer;
        }
        if (getOutputStreamCalled) {
            throw new IllegalStateException("getOutputStream already called");
        }
        getWriterCalled = true;
        writer = new PrintWriter(charWriter);

        return writer;
    }

    public String toString() {
        String s = null;
        if (writer != null) {
            s = charWriter.toString();
        }
        System.out.println("toString is: " + s);
        return s;
    }
}

The problem is that for a .jsp page the getWriter() method of CharResponseWrapper is called, so the output is captured in writer. For an .html page, getOutputStream() is called instead, so nothing is captured and toString() returns null.

I also tried URLConnection with an InputStreamReader. The code I used is below:

HttpServletRequest hReq = (HttpServletRequest) request;
StringBuffer ss=hReq.getRequestURL();
String u=ss.toString();
URL url = new URL(u);
URLConnection con = url.openConnection();
System.out.println("Connection successful");
InputStream is =con.getInputStream();
System.out.println("InputStream Successful");
BufferedReader br = new BufferedReader(new InputStreamReader(is));

String line = null;
String[] arr={};
while ((line = br.readLine()) != null) {
    System.out.println(line);
}

The code prints "Connection successful" on the console, but then it loops indefinitely and never reaches "InputStream Successful". As far as I understand, calling getInputStream() sends a new request to the same URL, which passes through the filter again, so the whole process repeats over and over. Maybe this approach only works for a different, fixed URL, e.g. url="www.abcd.com".

I want to extract the response body of the .html page so that I can manipulate the data. Any help would be appreciated.

EDIT

To continue this question: once I have the response body, I insert JS before the tag. When I print that string with System.out.println, I see the response body with the JS inserted, so everything is fine up to this step. I then convert it to a byte array and write the byte array to a ServletOutputStream instance:

ServletOutputStream newResponse1= response.getOutputStream();
newResponse1.write(bArray);
newResponse1.close();

Here bArray is the response body with the JS inserted, as a byte array, and response is the ServletResponse. The output I get is strange. The JSP page is rendered twice: if the page has a button, that button appears two times, and the JS is executed. The HTML page is rendered once, but the final response in the browser is not the one I wrote, i.e. it does not contain the injected data from bArray.

I feel I need to override the getOutputStream method again, but so far I have not managed to. Please help, and let me know if the question is not clear.

I also referred to "How to read and copy the HTTP servlet response output stream content for logging".


Solution

  • You need to intercept the output delivered to the underlying ServletOutputStream, in the same way the wrapper already does with its CharArrayWriter. I recommend modifying the getOutputStream method to wrap the returned object in your own ServletOutputStream subclass, and storing that wrapper as an instance variable of CharResponseWrapper (just like the CharArrayWriter). Something like this would be enough:

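    import java.io.IOException;
    import javax.servlet.ServletOutputStream;

    // Forwards every byte to the real stream and keeps a copy for later inspection.
    // On Servlet 3.1+ you must also override isReady() and setWriteListener(WriteListener).
    public class MyServletOutputStream extends ServletOutputStream
    {
        private final ServletOutputStream src;

        private final StringBuilder stb=new StringBuilder(4096);

        public MyServletOutputStream(ServletOutputStream src)
        {
            super();
            this.src=src;
        }

        @Override
        public void write(int b)
            throws IOException
        {
            this.src.write(b);
            // Mask to 0..255 so bytes above 0x7F are not sign-extended; this maps
            // bytes to chars one-to-one, which is only correct for ISO-8859-1 output.
            this.stb.append((char)(b & 0xFF));
        }

        public StringBuilder getStb()
        {
            return this.stb;
        }
    }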
    

    Finally, modify the toString method to decide which object to read the data from: the CharArrayWriter or the MyServletOutputStream.
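
The toString change described above could look roughly like the following sketch. It is written against java.io only so that it runs without a servlet container: BufferingResponse is a hypothetical stand-in for CharResponseWrapper, and in the real wrapper getOutputStream() would return a ServletOutputStream subclass that delegates to the byte buffer. Unlike MyServletOutputStream, this sketch only buffers and does not forward bytes to the client. That is an assumption on my part, made so the filter can send the modified body exactly once. The charset is also an assumption; in a filter it would normally come from the response's character encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.CharArrayWriter;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical stand-in for CharResponseWrapper (no servlet API needed to run it).
class BufferingResponse {
    private final CharArrayWriter charWriter = new CharArrayWriter();
    private final ByteArrayOutputStream byteBuffer = new ByteArrayOutputStream();
    private final Charset charset;
    private PrintWriter writer;
    private OutputStream stream;

    BufferingResponse(Charset charset) {
        this.charset = charset;
    }

    // Path taken when the resource writes characters (the .jsp case in the question).
    PrintWriter getWriter() {
        if (stream != null) {
            throw new IllegalStateException("getOutputStream already called");
        }
        if (writer == null) {
            writer = new PrintWriter(charWriter);
        }
        return writer;
    }

    // Path taken when the resource writes raw bytes (the .html case in the question).
    OutputStream getOutputStream() {
        if (writer != null) {
            throw new IllegalStateException("getWriter already called");
        }
        if (stream == null) {
            stream = byteBuffer;
        }
        return stream;
    }

    // Reads from whichever buffer was actually used; null if nothing was written,
    // mirroring the behaviour of the original toString().
    @Override
    public String toString() {
        if (writer != null) {
            writer.flush();
            return charWriter.toString();
        }
        if (stream != null) {
            // Decode the whole byte array at once so multi-byte characters survive.
            return new String(byteBuffer.toByteArray(), charset);
        }
        return null;
    }
}
```

With a wrapper along these lines, the filter would call chain.doFilter(request, wrapper), take wrapper.toString(), insert the script, and write the result to the original response a single time. If the wrapper forwards the output to the client and the filter then writes the modified copy as well, the body is sent twice, which may explain the duplicated JSP output described in the EDIT.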