I'm reading a file line by line, and some entries have values that span multiple lines, as shown below. Because of this, my loop breaks and returns unexpected results.
TSNK/Metadata/tk.filename=PZSIIF-anefnsadual-rasdfepdasdort.pdf
TSNK/Metadata/tk_ISIN=LU0291600822,LU0871812862,LU0327774492,LU0291601986,LU0291605201
,LU0291595725,LU0291599800,LU0726995649,LU0726996290,LU0726995995,LU0726995136,LU0726995482,LU0726995219,LU0855227368
TSNK/Metadata/tk_GroupCode=PZSIIF
TSNK/Metadata/tk_GroupCode/PZSIIF=y
TSNK/Metadata/tk_oneTISNumber=16244,17007,16243,11520,19298,18247,20755
TSNK/Metadata/tk_oneTISNumber_TEXT=Neo Emerging Market Corporate Debt
Neo Emerging Market Debt Opportunities II
Neo Emerging Market Investment Grade Debt
Neo Floating Rate II
Neo Upper Tier Floating Rate
Global Balanced Regulation 28
Neo Multi-Sector Credit Income
Here TSNK/Metadata/tk_ISIN and TSNK/Metadata/tk_oneTISNumber_TEXT have multiline values. While reading the file line by line, how do I read each of these fields as a single line?
I have tried the logic below, but it did not produce the expected result:
try (BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
    String sCurrentLine;
    int i = 1;
    CharSequence OneTIS = "TSNK/Metadata/tk_oneTISNumber_TEXT";
    StringBuilder builder = new StringBuilder();
    while ((sCurrentLine = br.readLine()) != null) {
        // Only lines that contain the key itself are collected here
        if (sCurrentLine.contains(OneTIS)) {
            System.out.println("Line number here -> " + i);
            builder.append(sCurrentLine);
            builder.append(",");
        } else {
            System.out.println("else --->");
        }
        //System.out.println("Line number"+i+" Value is---->>>> "+sCurrentLine);
        i++;
    }
    System.out.println("Line number" + i + " Value is---->>>> " + builder);
} catch (IOException e) {
    e.printStackTrace();
}
The solution involves Scanner and multiline regular expressions.
The assumption here is that all of your entries start with TSNK/Metadata/.
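import java.io.File;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Scanner scanner = new Scanner(new File("file.txt"));
// Each token is everything between two occurrences of the key prefix,
// so a value that spans several lines stays in one token.
scanner.useDelimiter("TSNK/Metadata/");
// DOTALL lets '.' match line breaks inside a multiline value.
Pattern p = Pattern.compile("(.*)=(.*)", Pattern.DOTALL | Pattern.MULTILINE);
// Loop on hasNext(); the original do/while tested s != null and never ended
// once the input was exhausted.
while (scanner.hasNext()) {
    String s = scanner.next();
    Matcher matcher = p.matcher(s);
    if (matcher.find()) {
        System.out.println("key = '" + matcher.group(1) + "'");
        String[] values = matcher.group(2).split("[,\n]");
        int i = 0; // start at 0 to match the output shown below
        for (String value : values) {
            System.out.println(String.format(" val(%d)='%s',", (i++), value));
        }
    }
}
scanner.close();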
The above produces the following output:
key = 'tk.filename'
val(0)='PZSIIF-anefnsadual-rasdfepdasdort.pdf',
key = 'tk_ISIN'
val(0)='LU0291600822',
val(1)='LU0871812862',
val(2)='LU0327774492',
val(3)='LU0291601986',
val(4)='LU0291605201',
val(5)='',
val(6)='LU0291595725',
val(7)='LU0291599800',
val(8)='LU0726995649',
val(9)='LU0726996290',
val(10)='LU0726995995',
val(11)='LU0726995136',
val(12)='LU0726995482',
val(13)='LU0726995219',
val(14)='LU0855227368',
key = 'tk_GroupCode'
val(0)='PZSIIF',
key = 'tk_GroupCode/PZSIIF'
val(0)='y',
key = 'tk_oneTISNumber'
val(0)='16244',
val(1)='17007',
val(2)='16243',
val(3)='11520',
val(4)='19298',
val(5)='18247',
val(6)='20755',
key = 'tk_oneTISNumber_TEXT'
val(0)='Neo Emerging Market Corporate Debt ',
val(1)='Neo Emerging Market Debt Opportunities II ',
val(2)='Neo Emerging Market Investment Grade Debt ',
val(3)='Neo Floating Rate II ',
val(4)='Neo Upper Tier Floating Rate ',
val(5)='Global Balanced Regulation 28 ',
val(6)='Neo Multi-Sector Credit Income',
Please note the empty entry (val(5) for key tk_ISIN), which is caused by a newline followed by a comma in that value. It can be fixed quite easily, either by rejecting empty strings or by adjusting the splitting pattern.
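As a rough sketch of both fixes together, the helper below splits on runs of commas and line breaks and skips anything that is empty after trimming. The class name ValueSplitter and the method splitValues are hypothetical names used only for illustration, and the sample string is a shortened stand-in for the tk_ISIN value. Trimming also removes the trailing spaces visible in the tk_oneTISNumber_TEXT output.

```java
import java.util.ArrayList;
import java.util.List;

public class ValueSplitter {
    // Hypothetical helper: splits a raw value on runs of commas and line
    // breaks, trims each piece, and drops pieces that are empty after trimming.
    static List<String> splitValues(String raw) {
        List<String> values = new ArrayList<>();
        // "[,\r\n]+" treats "\n," (newline followed by a comma) as one separator
        for (String piece : raw.split("[,\\r\\n]+")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened sample with the same "newline then comma" shape as tk_ISIN
        String raw = "LU0291600822,LU0291605201\n,LU0291595725,LU0291599800\n";
        System.out.println(splitValues(raw));
        // prints [LU0291600822, LU0291605201, LU0291595725, LU0291599800]
    }
}
```

In the loop above, this would stand in for the matcher.group(2).split("[,\n]") call, assuming you are happy to work with a List<String> instead of an array.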
Hope this helps!