I'm trying to write a xslt that takes a html page and transforms it so that it contains only the contents of a div tag with id "content". I'm using Apache ServiceMix to develop a service unit that performs this action but am completely lost!
So far I have created a unit that (well at least I think it does this) takes a file, applies the transformation and saves it to an output folder:
<?xml version="1.0" encoding="UTF-8"?>
<blueprint
xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://www.osgi.org/xmlns/blueprint/v1.0.0
http://www.osgi.org/xmlns/blueprint/v1.0.0/blueprint.xsd">
<camelContext xmlns="http://camel.apache.org/schema/blueprint">
<route>
<from uri="file:camel/input"/>
<log message="Moving ${file:name} to the output directory"/>
<to uri="xslt:file:///transform.xsl"/>
<to uri="file:camel/output"/>
</route>
</camelContext>
</blueprint>
and a transformation .xsl file:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xhtml">
<xsl:template match="/">
<html>
<head><title>HTML Transformation</title></head>
<body>
<xsl:copy-of select="//xhtml:DIV[@id='content']"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
but it keeps throwing this error:
21:23:50,395 | INFO | le://camel/input | route3 | 91 - org.apache.camel.camel-core - 2.8.5 | Moving INPUTFILE.html to the output directory
21:23:50,850 | ERROR | le://camel/input | DefaultErrorHandler | 91 - org.apache.camel.camel-core - 2.8.5 | Failed delivery for exchangeId: ID-servicemix-48257-1358413760241-2-2137. Exhausted after delivery attempt: 1 caught: javax.xml.transform.TransformerException: java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd
javax.xml.transform.TransformerException: java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd
at org.apache.xalan.transformer.TransformerImpl.fatalError(TransformerImpl.java:782)[:]
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:756)[:]
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1273)[:]
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:1251)[:]
at org.apache.camel.builder.xml.XsltBuilder.process(XsltBuilder.java:123)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.ProcessorEndpoint.onExchange(ProcessorEndpoint.java:102)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.ProcessorEndpoint$1.process(ProcessorEndpoint.java:72)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.converter.AsyncProcessorTypeConverter$ProcessorToAsyncProcessorBridge.process(AsyncProcessorTypeConverter.java:48)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.SendProcessor$2.doInAsyncProducer(SendProcessor.java:114)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.ProducerCache.doInAsyncProducer(ProducerCache.java:284)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:109)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.processNext(DelegateAsyncProcessor.java:98)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:89)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:69)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.processNext(DelegateAsyncProcessor.java:98)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:89)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.interceptor.TraceInterceptor.process(TraceInterceptor.java:90)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.RedeliveryErrorHandler.processErrorHandler(RedeliveryErrorHandler.java:318)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:209)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DefaultChannel.process(DefaultChannel.java:306)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.Pipeline.process(Pipeline.java:116)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.Pipeline.process(Pipeline.java:79)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.UnitOfWorkProcessor.processAsync(UnitOfWorkProcessor.java:139)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.UnitOfWorkProcessor.process(UnitOfWorkProcessor.java:106)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:78)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.processNext(DelegateAsyncProcessor.java:98)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:89)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:69)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.component.file.GenericFileConsumer.processExchange(GenericFileConsumer.java:353)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.component.file.GenericFileConsumer.processBatch(GenericFileConsumer.java:176)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.component.file.GenericFileConsumer.poll(GenericFileConsumer.java:137)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.ScheduledPollConsumer.doRun(ScheduledPollConsumer.java:139)[91:org.apache.camel.camel-core:2.8.5]
at org.apache.camel.impl.ScheduledPollConsumer.run(ScheduledPollConsumer.java:91)[91:org.apache.camel.camel-core:2.8.5]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)[:1.6.0_38]
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)[:1.6.0_38]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)[:1.6.0_38]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)[:1.6.0_38]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)[:1.6.0_38]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)[:1.6.0_38]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)[:1.6.0_38]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)[:1.6.0_38]
at java.lang.Thread.run(Thread.java:662)[:1.6.0_38]
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1459)[:1.6.0_38]
at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)[:]
at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)[:]
at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)[:]
at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)[:]
at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)[:]
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)[:]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)[:]
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)[:]
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)[:]
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)[:]
at org.apache.xml.dtm.ref.DTMManagerDefault.getDTM(DTMManagerDefault.java:439)[:]
at org.apache.xalan.transformer.TransformerImpl.transform(TransformerImpl.java:699)[:]
... 45 more
and I have no idea what it means :( Can anyone help?
Thanks heaps
As discussed in the comments, the issue here is that the DTD in the HTML page you are accessing is referencing an inaccessible file and this is causing the parser to fail while trying to access the file.
If the HTML is something you can modify, a possible solution would be to modify the DTD (it looks like the HTML4 loose DTD is accessible at this URL: http://www.w3.org/TR/html4/loose.dtd). The parser you're using may have an option to ignore the DTD, though that may not be the best option because the documents could be using HTML-only entities like
.