JSearch: a Java library for extracting text from documents (Office, PDF, HWP)
Download & Installation
JSearch.jar
Just add JSearch.jar to your project's classpath.
Requirements
- It should work with various document types, e.g. HWP, PDF, and Office.
- It should support extracting text and quickly finding keywords in documents.
- It is distributed as a JAR library.
- All functions are synchronous.
- The result of an extraction contains the full text.
- The result of a search contains the word count.
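The word-count requirement can be illustrated with a minimal sketch. It assumes the text has already been extracted into a String and that occurrences are counted without overlap; `KeywordCountSketch` and `countOccurrences` are hypothetical names for illustration and are not part of JSearch.

```java
// Hypothetical helper, not part of JSearch: counts how often a keyword
// appears in text that has already been extracted.
final class KeywordCountSketch {

    // Returns the number of non-overlapping occurrences of keyword in text.
    // A null or empty argument yields 0 (an assumption made for this sketch).
    static int countOccurrences(String text, String keyword) {
        if (text == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        // indexOf returns -1 once no further match exists.
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length(); // skip past this match
        }
        return count;
    }
}
```

Whether JSearch itself counts overlapping matches or ignores case is not stated in the source, so both choices here are assumptions.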
Class
public class JSearch
JSearch supports various document types by using open source engines.
The library provides three kinds of functions: extract...(), isContainsKeyword...(), and getFileList...().
The supported types are HWP, DOC, PPT, EXCEL, TEXT, PDF, and UNKNOWN.
Modifier and Type | Method and Description |
---|---|
static java.lang.String | extractContentsFromFile(java.io.File target) Extracts the text content of the given file. |
static java.lang.String | extractContentsFromFile(java.lang.String filePath) Extracts the text content of the file at the given path. |
static java.util.List | getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword) Returns a list of the files in the directory that contain the keyword. |
static java.util.List | getFileListContainsKeywordFromDirectory(java.lang.String dirPath, java.lang.String keyword, boolean recursive) Returns a list of the files in the directory that contain the keyword, optionally searching recursively. |
static boolean | isContainsKeywordFromFile(java.io.File file, java.lang.String keyword) Returns true if the given file contains the keyword, false otherwise. |
static boolean | isContainsKeywordFromFile(java.lang.String filePath, java.lang.String keyword) Returns true if the file at the given path contains the keyword, false otherwise. |
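The three method families in the table build on each other: extract text, test one file for a keyword, then filter a directory. The sketch below mirrors that layering using plain UTF-8 text files only, so it runs without JSearch.jar. `KeywordSearchSketch`, `extractText`, `containsKeyword`, and `listFilesContaining` are hypothetical stand-ins, not the library's implementation. The `List<File>` element type and the meaning of `recursive` (descend into subdirectories) are assumptions, since the source does not spell them out.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the shape of the JSearch API
// for plain-text files; it does not call JSearch.
final class KeywordSearchSketch {

    // Stand-in for JSearch.extractContentsFromFile(File):
    // reads the whole file as UTF-8 text.
    static String extractText(File file) {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for JSearch.isContainsKeywordFromFile(File, String).
    static boolean containsKeyword(File file, String keyword) {
        return extractText(file).contains(keyword);
    }

    // Stand-in for JSearch.getFileListContainsKeywordFromDirectory(...).
    // Assumption: recursive == true means subdirectories are searched too.
    static List<File> listFilesContaining(File dir, String keyword, boolean recursive) {
        List<File> matches = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return matches; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recursive) {
                    matches.addAll(listFilesContaining(entry, keyword, true));
                }
            } else if (containsKeyword(entry, keyword)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

In a project that has JSearch.jar on the classpath, the body of `extractText` would presumably be replaced by a call to `JSearch.extractContentsFromFile(file)`, which is what makes the HWP, PDF, and Office formats searchable.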