How can I extract the TypeScript code from this HTML page? It has a p element and spans with the classes "synStatement", "synIdentifier", "synConstant", and "synType". I am learning jsoup. The output of my Java jsoup program is incomplete and not formatted properly.
<p>Currency.ts</p>
<pre class="code lang-typescript" data-lang="typescript" data-unlink><span class="synStatement">export</span> <span class="synStatement">type</span> Currency <span class="synStatement">=</span> <span class="synIdentifier">{</span>
unit: <span class="synConstant">'EUR'</span> | <span class="synConstant">'GBP'</span> | <span class="synConstant">'JPY'</span> | <span class="synConstant">'USD'</span>
value: <span class="synType">number</span>
<span class="synIdentifier">}</span>
<span class="synStatement">export</span> <span class="synStatement">const</span> Currency <span class="synStatement">=</span> <span class="synIdentifier">{</span>
<span class="synStatement">from(</span>value: <span class="synType">number</span><span class="synStatement">,</span> unit: Currency<span class="synIdentifier">[</span><span class="synConstant">'unit'</span><span class="synIdentifier">]</span> <span class="synStatement">=</span> <span class="synConstant">'USD'</span><span class="synStatement">)</span>: Currency <span class="synIdentifier">{</span>
<span class="synStatement">return</span> <span class="synIdentifier">{</span> unit<span class="synStatement">,</span> value <span class="synIdentifier">}</span>
<span class="synIdentifier">}</span>
<span class="synIdentifier">}</span>
</pre>
Desired output:
Currency.ts
export type Currency = {
unit: 'EUR' | 'GBP' | 'JPY' | 'USD'
value: number
}
export const Currency = {
from(value: number, unit: Currency['unit'] = 'USD'): Currency {
return { unit, value }
}
}
I tried:
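import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;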
public class Currency
{
public static void main( String[] args )
{
try {
File input = new File("Currency.html");
Document doc = Jsoup.parse(input, "UTF-8", "");
List<String> typescriptCode = new ArrayList<String>();
String strs[] = {
"synStatement",
"synIdentifier",
"synConstant",
"synType",
};
for (String str : strs) {
Elements spansWithsynStatementElements = doc.select("span." + str);
if (spansWithsynStatementElements != null) {
for (Element e : spansWithsynStatementElements) {
String text = "";
text += e.ownText();
typescriptCode.add(text);
}
}
}
int size = typescriptCode.size();
for (int i = 0; i < size; i++) {
System.out.println(typescriptCode.get(i));
System.out.println("");
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
If formatting is not an issue for you, you can simply extract and print the text:
String script = doc.text();
System.out.println(script);
The output is:
Currency.ts export type Currency = { unit: 'EUR' | 'GBP' | 'JPY' | 'USD' value: number}export const Currency = { from(value: number, unit: Currency['unit'] = 'USD'): Currency { return { unit, value } }}
If you want the output formatted, you will have to run the extracted text through a pretty-printing library.
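Alternatively, because the code sits inside a pre element, the line breaks and indentation are already present in the HTML and it may be enough to avoid normalizing them. The following is a minimal sketch, assuming a jsoup version that provides Element.wholeText() (which returns the text of an element with its original whitespace) and assuming the file name is in the first p element and the code in the first pre with class "code". The class name PreExtractor and the method extractCode are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PreExtractor {

    // Hypothetical helper: returns the text of the first <p> (the file name),
    // a newline, and then the text of the first <pre class="code"> with its
    // original whitespace preserved.
    static String extractCode(String html) {
        Document doc = Jsoup.parse(html);
        Element name = doc.selectFirst("p");
        Element pre = doc.selectFirst("pre.code");

        StringBuilder out = new StringBuilder();
        if (name != null) {
            out.append(name.text()).append('\n');
        }
        if (pre != null) {
            // wholeText() keeps newlines and spaces; text() would collapse them.
            out.append(pre.wholeText());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the page in the question.
        String html = "<p>Currency.ts</p>"
                + "<pre class=\"code lang-typescript\">"
                + "<span class=\"synStatement\">export</span> "
                + "<span class=\"synStatement\">type</span> Currency "
                + "<span class=\"synStatement\">=</span> "
                + "<span class=\"synIdentifier\">{</span>\n"
                + "  value: <span class=\"synType\">number</span>\n"
                + "<span class=\"synIdentifier\">}</span>\n"
                + "</pre>";
        System.out.print(extractCode(html));
    }
}
```

To run this against the real page, parse the file as in the question, with Jsoup.parse(new File("Currency.html"), "UTF-8", ""), and apply the same two selectFirst calls to that document. Selecting the pre element as a whole, rather than each span class separately, also keeps the plain text between the spans (such as "Currency" and "unit:"), which ownText() on the individual spans cannot see.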