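import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;

public class Test {
    public static void main(String[] args) throws Exception {
        String data;
        TikaConfig tikaConfig = TikaConfig.getDefaultConfig();
        Metadata metadata = new Metadata();
        ContentHandler handler;
        try (InputStream stream = new BufferedInputStream(new FileInputStream(new File("E:\\AllTypes\\PPT\\Presentation1.pptx")))) {
            Detector detector = tikaConfig.getDetector();
            Parser parser = tikaConfig.getParser();
            // Detect the media type from the stream and record it in the metadata
            MediaType type = detector.detect(stream, metadata);
            metadata.set(Metadata.CONTENT_TYPE, type.toString());
            // -1 disables the handler's write limit
            handler = new BodyContentHandler(-1);
            parser.parse(stream, handler, metadata, new ParseContext());
            data = handler.toString();
            System.out.println(data);
        }
    }
}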
The input PPTX contains only the text "Hello world!", so I want only "Hello world!" in the output.
Output:
[Content_Types].xml
_rels/.rels
ppt/slides/_rels/slide1.xml.rels
ppt/_rels/presentation.xml.rels
ppt/presentation.xml
ppt/slides/slide1.xml Hello world!
ppt/slideLayouts/_rels/slideLayout6.xml.rels
ppt/slideLayouts/_rels/slideLayout7.xml.rels
ppt/slideLayouts/_rels/slideLayout9.xml.rels
ppt/slideLayouts/_rels/slideLayout10.xml.rels
ppt/slideLayouts/_rels/slideLayout8.xml.rels
ppt/slideLayouts/_rels/slideLayout11.xml.rels
ppt/slideLayouts/_rels/slideLayout1.xml.rels
ppt/slideLayouts/_rels/slideLayout2.xml.rels
ppt/slideLayouts/_rels/slideLayout3.xml.rels
ppt/slideLayouts/_rels/slideLayout4.xml.rels
ppt/slideMasters/_rels/slideMaster1.xml.rels
ppt/slideLayouts/slideLayout11.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout10.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout3.xml Click to edit Master title style Click to edit Master text styles 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout2.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout1.xml Click to edit Master title style Click to edit Master subtitle style 1/30/2018 ‹#›
ppt/slideMasters/slideMaster1.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout4.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout5.xml Click to edit Master title style Click to edit Master text styles Click to edit Master text styles Second level Third level Fourth level Fifth level Click to edit Master text styles Click to edit Master text styles Second level Third level Fourth level Fifth level 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout6.xml Click to edit Master title style 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout7.xml 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout8.xml Click to edit Master title style Click to edit Master text styles Second level Third level Fourth level Fifth level Click to edit Master text styles 1/30/2018 ‹#›
ppt/slideLayouts/slideLayout9.xml Click to edit Master title style Click to edit Master text styles 1/30/2018 ‹#›
ppt/slideLayouts/_rels/slideLayout5.xml.rels
ppt/theme/theme1.xml
docProps/thumbnail.jpeg
ppt/presProps.xml
ppt/tableStyles.xml
ppt/viewProps.xml
docProps/core.xml PowerPoint Presentation srinuk srinuk 1 2018-01-30T10:19:34Z 2018-01-30T10:22:05Z
docProps/app.xml 2 3 Microsoft Office PowerPoint Widescreen 1 1 0 0 0 false Fonts Used 3 Theme 1 Slide Titles 1 Arial Calibri Calibri Light Office Theme PowerPoint Presentation false false false 15.0000
You can try tika-app.jar, or simply use the text-extraction method of the Tika facade class:
import java.io.File;
import org.apache.tika.Tika;

Tika tika = new Tika();
File file = new File("path"); // replace "path" with the location of your file
String str = tika.parseToString(file);
This extracts just the text content of the file.
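The output in the question also shows why the extra lines appear: a .pptx file is a ZIP package, and the slide text sits in ppt/slides/slideN.xml next to layouts, masters and document properties. If Tika keeps treating the file as a plain archive, one possible fallback is to read only the slide parts with the JDK alone. The sketch below is not how Tika does it. The names SlideTextSketch and extractSlideText are made up for illustration, joining text runs with a single space is a simplification, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: reads slide text straight from the PPTX package.
class SlideTextSketch {
    // Only real slide parts match, e.g. ppt/slides/slide1.xml (no layouts, masters or .rels files).
    private static final Pattern SLIDE_PART = Pattern.compile("ppt/slides/slide(\\d+)\\.xml");
    // Namespace of DrawingML, whose <a:t> elements hold the visible text runs.
    private static final String DRAWINGML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns the text of each slide, keyed by slide number. */
    static Map<Integer, String> extractSlideText(InputStream pptx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Map<Integer, String> slides = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(pptx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Matcher part = SLIDE_PART.matcher(entry.getName());
                if (!part.matches()) {
                    continue; // skip everything that is not a slide
                }
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                byte[] xml = zip.readAllBytes();
                Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
                NodeList runs = doc.getElementsByTagNameNS(DRAWINGML_NS, "t");
                StringBuilder text = new StringBuilder();
                for (int i = 0; i < runs.getLength(); i++) {
                    if (i > 0) {
                        text.append(' '); // simplification: one space between text runs
                    }
                    text.append(runs.item(i).getTextContent());
                }
                slides.put(Integer.parseInt(part.group(1)), text.toString());
            }
        }
        return slides;
    }

    public static void main(String[] args) throws Exception {
        // Pass the path of the .pptx file as the first argument.
        try (InputStream in = new FileInputStream(args[0])) {
            extractSlideText(in).forEach((number, text) -> System.out.println(text));
        }
    }
}
```

Run it with the path of the presentation as the only argument. Speaker notes, charts and embedded objects live in other parts of the package and are ignored here.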