I have an InputStream which takes the html file as input parameter. I have to get the bytes from the input stream .
I have a string: "XYZ"
. I'd like to convert this string to byte format and check if there is a match for the string in the byte sequence which I obtained from the InputStream. If there is then, I have to replace the match with the bye sequence for some other string.
Is there anyone who could help me with this? I have used regex to find and replace. however finding and replacing byte stream, I am unaware of.
Previously, I use jsoup to parse html and replace the string, however due to some utf encoding problems, the file seems to appear corrupted when I do that.
TL;DR: My question is:
Is a way to find and replace a string in byte format in a raw InputStream in Java?
Not sure you have chosen the best approach to solve your problem.
That said, I don't like to (and have as policy not to) answer questions with "don't" so here goes...
Have a look at FilterInputStream
.
From the documentation:
A FilterInputStream contains some other input stream, which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality.
It was a fun exercise to write it up. Here's a complete example for you:
import java.io.*;
import java.util.*;
class ReplacingInputStream extends FilterInputStream {
LinkedList<Integer> inQueue = new LinkedList<Integer>();
LinkedList<Integer> outQueue = new LinkedList<Integer>();
final byte[] search, replacement;
protected ReplacingInputStream(InputStream in,
byte[] search,
byte[] replacement) {
super(in);
this.search = search;
this.replacement = replacement;
}
private boolean isMatchFound() {
Iterator<Integer> inIter = inQueue.iterator();
for (int i = 0; i < search.length; i++)
if (!inIter.hasNext() || search[i] != inIter.next())
return false;
return true;
}
private void readAhead() throws IOException {
// Work up some look-ahead.
while (inQueue.size() < search.length) {
int next = super.read();
inQueue.offer(next);
if (next == -1)
break;
}
}
@Override
public int read() throws IOException {
// Next byte already determined.
if (outQueue.isEmpty()) {
readAhead();
if (isMatchFound()) {
for (int i = 0; i < search.length; i++)
inQueue.remove();
for (byte b : replacement)
outQueue.offer((int) b);
} else
outQueue.add(inQueue.remove());
}
return outQueue.remove();
}
// TODO: Override the other read methods.
}
class Test {
public static void main(String[] args) throws Exception {
byte[] bytes = "hello xyz world.".getBytes("UTF-8");
ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
byte[] search = "xyz".getBytes("UTF-8");
byte[] replacement = "abc".getBytes("UTF-8");
InputStream ris = new ReplacingInputStream(bis, search, replacement);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int b;
while (-1 != (b = ris.read()))
bos.write(b);
System.out.println(new String(bos.toByteArray()));
}
}
Given the bytes for the string "Hello xyz world"
it prints:
Hello abc world