ICTCLAS2010 Interface Documentation

wccy100

Contributed on 2015-09-28


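kevinzhang@bit.edu.cn
ICTCLAS2010 Interface Documentation
Http://hi.baidu.com/drkevinzhang/
2010-1

Online testing is available at http://http://hi.baidu.com/drkevinzhang//test.html
For the latest information about ICTCLAS, please visit Http://hi.baidu.com/drkevinzhang/

ICTCLAS Copyright © 2010 Kevin Zhang. All rights reserved.

Contents

ICTCLAS2010 Interface Documentation
Contents
Introduction to ICTCLAS
1. C++ Interface
    1.1 ICTCLAS_Init
    1.2 ICTCLAS_Exit
    1.3 ICTCLAS_ImportUserDict
    1.4 ICTCLAS_ParagraphProcess
    1.5 ICTCLAS_ParagraphProcessA
    1.6 ICTCLAS_FileProcess
    1.7 ICTCLAS_GetParagraphProcessAWordCount
    1.8 ICTCLAS_ParagraphProcessAW
    1.9 ICTCLAS_AddUserWord
    1.10 ICTCLAS_SaveTheUsrDic
    1.11 ICTCLAS_DelUsrWord
    1.12 ICTCLAS_KeyWord
    1.13 ICTCLAS_FingerPrint
    1.14 ICTCLAS_SetPOSmap
2. JNI Interface
    2.1 ICTCLAS_Init
    2.2 ICTCLAS_Exit
    2.3 ICTCLAS_ImportUserDict
    2.4 ICTCLAS_ParagraphProcess
    2.5 ICTCLAS_FileProcess
    2.6 ICTCLAS_IsWord
    2.7 ICTCLAS_GetUniProb
    2.8 nativeProcAPara
    2.9 ICTCLAS_SaveTheUsrDic
    2.10 ICTCLAS_DelUsrWord
    2.11 ICTCLAS_KeyWord
    2.12 ICTCLAS_FingerPrint
    2.13 ICTCLAS_SetPOSmap
About the Author

Introduction to ICTCLAS

Building on many years of research, we developed the Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System). Its main functions are Chinese word segmentation, part-of-speech (POS) tagging, named entity recognition, and new word recognition; it also supports user dictionaries. We have refined it over five years and upgraded the kernel seven times; the current release is ICTCLAS2010.

Five reasons to choose ICTCLAS2008:

1. Best overall performance
Whether a segmentation system is practical depends mainly on two factors: segmentation accuracy and analysis speed. The two constrain each other and are hard to balance, so most systems end up either fast but inaccurate or accurate but slow. We developed the perfect PDAT large-scale knowledge base management technology (200510130690.3), a major breakthrough in combining high speed with high accuracy. The technology can manage dictionary knowledge bases with millions of entries, looks up 1 million entries per second on a single machine, and uses less than 1.5 times the size of the knowledge base in memory. Built on this technology, ICTCLAS2010 segments at 996KB/s on a single machine with a segmentation accuracy of 98.45%; the API is no larger than 200KB, and all dictionary data together is under 3M after compression. It is the best Chinese lexical analyzer in the world today.

2. A unified theoretical framework for language computing
Chinese segmentation involves word segmentation, unknown word recognition, POS tagging, and language-specific exceptions. Most systems lack a unified approach and combine loosely coupled modules, so the final model cannot accurately and effectively express the wide variety of language phenomena. ICTCLAS uses a Hierarchical Hidden Markov Model, which unifies every stage of Chinese lexical analysis in one complete theoretical framework and achieves the best overall result. The related theoretical research has been published at top international conferences and in journals, and the model's strengths have been confirmed both in theory and in practice.

3. Full support for application development in a range of environments
ICTCLAS is written entirely in C/C++. It supports Linux, FreeBSD, and the Windows family of operating systems, and mainstream development languages such as C, C++, C#, Delphi, and Java.

4. Adaptable and made to measure
All functional modules can be removed or combined. ICTCLAS is available in GB2312 and BIG5 versions, which handle Simplified and Traditional Chinese respectively. It supports the widely recognized segmentation and POS standards, including the ICT POS tag set ICTPOS3.0, the Peking University standard, the University of Pennsylvania standard, the State Language Commission standard, Taiwan's Academia Sinica, and the City University of Hong Kong. Users can define the output POS standard and the output format directly, and can tailor a segmentation system to their own needs.

5. Authoritative public evaluations in China and abroad, and recognition by 50,000 customers
Some companies, for commercial purposes, test behind closed doors and claim an accuracy of 99.50% without describing the test environment or method; in a closed test or a small-scale open test, even 100% accuracy is unremarkable. ICTCLAS1.0 took first place in the evaluation organized by the national 973 expert group, and ICTCLAS2.0 took several first places in the first evaluation organized by SigHan, the international Chinese language processing research body; see the system evaluation section for details. These results come from large-scale, on-site open tests run by authoritative organizations and are real and credible.

To date, ICTCLAS has issued more than 30,000 licenses to companies and academic institutions in China and abroad, including companies such as Tencent, NEC, 中华商务网, 硅谷动力, and Yunnan Daily, and universities such as Peking University, Tsinghua University, South China University of Technology, and the University of Massachusetts. ICTCLAS has also been widely covered by media including 《科学时报》, the overseas edition of 《人民日报》, and 《科技日报》. You can search Google to learn more about how ICTCLAS is used.

1. C++ Interface

1.1 ICTCLAS_Init

Initialize the analyzer and prepare the data ICTCLAS needs according to the configuration file.

    bool ICTCLAS_Init(const char * sInitDirPath=0);

Routine: ICTCLAS_Init
Required Header: ICTCLAS30.h

Return Value
Returns true if initialization succeeds; otherwise returns false.

Parameters
sInitDirPath: the initial directory path, where the file Configure.xml and the Data directory are stored. The default value is 0, which means the initial directory is the current working directory.

Remarks
ICTCLAS_Init must be invoked before any other ICTCLAS operation. The whole system needs to call it only once, before it starts using ICTCLAS.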
When the system is stopping and no further operations will be made, call ICTCLAS_Exit to destroy all working buffers. Every operation fails if initialization did not succeed.
ICTCLAS_Init fails mainly for two reasons:
1) Required data is incompatible or missing.
2) The configuration file is missing or contains invalid parameters.
The log file ictclas.log in the default directory gives more detail.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);   // bounded read: sSentence holds 2000 bytes
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,0);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.2 ICTCLAS_Exit

Exit the program, free all resources, and destroy all working buffers used by ICTCLAS.

    bool ICTCLAS_Exit();

Routine: ICTCLAS_Exit
Required Header: ICTCLAS30.h

Return Value
Returns true on success; otherwise returns false.

Parameters
None.

Remarks
ICTCLAS_Exit must be invoked when the system is stopping and no further operations will be made. Call ICTCLAS_Init again to restart ICTCLAS.
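The pairing described in these remarks (one ICTCLAS_Init before any work, one ICTCLAS_Exit when finished) can be tied to a C++ scope so that the exit call is not forgotten on an early return. The following is only a sketch under stated assumptions: IctclasSession and g_liveSessions are hypothetical names that are not part of ICTCLAS, and the two library functions are replaced by local stand-ins so the block compiles without the library.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-ins for the real library calls so this sketch compiles on its own.
// With the real library, delete these stubs and include "ICTCLAS30.h" instead.
static int g_liveSessions = 0;  // hypothetical counter, for illustration only
static bool ICTCLAS_Init(const char* sInitDirPath = 0) {
    (void)sInitDirPath;
    ++g_liveSessions;
    return true;
}
static bool ICTCLAS_Exit() {
    --g_liveSessions;
    return true;
}

// Hypothetical helper (not part of ICTCLAS): calls ICTCLAS_Init when it is
// constructed and ICTCLAS_Exit when it goes out of scope.
class IctclasSession {
public:
    explicit IctclasSession(const char* initDir = 0) {
        if (!ICTCLAS_Init(initDir)) {
            throw std::runtime_error("ICTCLAS_Init failed; check ictclas.log");
        }
    }
    ~IctclasSession() { ICTCLAS_Exit(); }

    // One initialization per process: forbid copies.
    IctclasSession(const IctclasSession&) = delete;
    IctclasSession& operator=(const IctclasSession&) = delete;
};

// Returns the stub counter as seen while a session is alive.
int liveSessionsInsideScope() {
    IctclasSession session;   // Init happens here
    return g_liveSessions;    // Exit happens when session is destroyed
}
```

With the real header in place of the stubs, a program would create a single IctclasSession at the top of main and make all ICTCLAS_ParagraphProcess calls inside that scope.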
Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,1);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.3 ICTCLAS_ImportUserDict

Import a user-defined dictionary from a text file.

    unsigned int ICTCLAS_ImportUserDict(const char *sFilename);

Routine: ICTCLAS_ImportUserDict
Required Header: ICTCLAS30.h

Return Value
The number of lexical entries imported successfully.

Parameters
sFilename: the text file name of the user dictionary.

Remarks
ICTCLAS_ImportUserDict works properly only if ICTCLAS_Init succeeds.
For the format of the text dictionary file, see User-defined Lexicon.
You need to invoke the function only when you use the lexicon for the first time or when you change your customized lexicon. After you import once and make no further changes, ICTCLAS loads the lexicon automatically if UserDict is set to "on" in the configuration file. When UserDict is set to "off", the user-defined lexicon is not applied.
Example

    #include "ICTCLAS30.h"
    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000]="张华平于1978年3月9日出生于江西省波阳县。";
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        // Sample4: User-defined dictionary
        sResult=ICTCLAS_ParagraphProcess("1989年春夏之交的政治风波1989年政治风波24小时降雪量24小时降雨量863计划ABC防护训练APEC会议BB机BP机C2系统C3I系统C3系统C4ISR系统C4I系统CCITT建议",1);
        printf("Before Adding User-defined lexicon, the result is:\n%s\n",sResult);
        unsigned int nItems=ICTCLAS_ImportUserDict("userdict.txt"); // Import user dictionary
        printf("%u user-defined lexical entries added!\n",nItems);
        sResult=ICTCLAS_ParagraphProcess("1989年春夏之交的政治风波1989年政治风波24小时降雪量24小时降雨量863计划ABC防护训练APEC会议BB机BP机C2系统C3I系统C3系统C4ISR系统C4I系统CCITT建议",1);
        printf("After Adding User-defined lexicon, the result is:\n%s\n",sResult);
        ICTCLAS_Exit();
        return 0;
    }

Output

    Before Adding User-defined lexicon, the result is:
    1989年/t 春/tg 夏/tg 之/uzhi 交/ng 的/ude1 政治/n 风波/n 1989年/t 政治/n 风波/n 24/m 小时/n 降雪/vn 量/n 24/m 小时/q 降雨量/n 863/m 计划ABC防护训练APEC会议BB机B P机C2系统C3I系统C3系统C4ISR系统C4I/nt 系统/n CCITT/x 建议/n
    14321 user-defined lexical entries added!
    After Adding User-defined lexicon, the result is:
    1989年春夏之交的政治风波/n 1989年政治风波/n 24小时降雪量/n 24小时降雨量/n 863计划/n ABC防护训练/vn APE C会议/nz BB机/n BP机/n C2系统/n C3I系统/n C3系统/n C4ISR系统/n C4I系统/n CCITT建议/t

1.4 ICTCLAS_ParagraphProcess

Process a paragraph and return a pointer to the result buffer.

    const char * ICTCLAS_ParagraphProcess(const char *sParagraph,int bPOStagged=1);

Routine: ICTCLAS_ParagraphProcess
Required Header: ICTCLAS30.h

Return Value
Returns a pointer to the result buffer.

Parameters
sParagraph: the source paragraph.
bPOStagged: whether POS tagging is needed; 0 for no tags, 1 for tagging. The default is 1.

Remarks
ICTCLAS_ParagraphProcess works properly only if ICTCLAS_Init succeeds.

Example
    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char *sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult=ICTCLAS_ParagraphProcess(sSentence,1);
            printf("%s\nInput string now('q' to quit)!\n",sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.5 ICTCLAS_ParagraphProcessA

    result_t * ICTCLAS_ParagraphProcessA(const char *sParagraph,int *pResultCount);

Routine: ICTCLAS_ParagraphProcessA
Required Header: ICTCLAS30.h

Return Value
A pointer to the result vector. The vector is managed by the system; the caller must not allocate or free it.

    struct result_t{
        int start;            // start position of the word in the input sentence
        int length;           // length of the word
        char sPOS[POS_SIZE];  // word type: POS ID value, allows fast lookup in the POS table
        int iPOS;             // part of speech
        int word_ID;          // set to 0 or -1 if the word is an unknown (out-of-vocabulary) word
        int word_type;        // distinguishes the user dictionary: 1, the word is in the user dictionary; 0, it is not
        int weight;           // word weight
    };

Parameters
sParagraph: the source paragraph.
pResultCount: pointer that receives the size of the result vector.

Remarks
ICTCLAS_ParagraphProcessA works properly only if ICTCLAS_Init succeeds.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const result_t *pVecResult;
        int nCount;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        printf("Input sentence now!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            pVecResult=ICTCLAS_ParagraphProcessA(sSentence,&nCount);
            for (int i=0;i

[The source text is truncated here: the rest of this example and the description and prototype of ICTCLAS_FileProcess are missing.]

1.6 ICTCLAS_FileProcess

Return Value
Returns true if processing succeeds; otherwise returns false.

Parameters
sSourceFilename: the name of the source file to be analyzed.
sResultFilename: the name of the file that stores the results.
bPOStagged: whether POS tagging is needed; 0 for no tags, 1 for tagging. The default is 1.

Remarks
ICTCLAS_FileProcess works properly only if ICTCLAS_Init succeeds.
The output format is customized in the ICTCLAS configuration.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        // Sample2: File text lexical analysis
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        printf("Input sentence now('q' to quit)!\n");
        ICTCLAS_FileProcess("Test.txt","Test_result.txt",1);
        ICTCLAS_Exit();
        return 0;
    }

1.7 ICTCLAS_GetParagraphProcessAWordCount

Get the number of words in a paragraph; API for C#.

    int ICTCLAS_GetParagraphProcessAWordCount(const char *sParagraph);

Routine: ICTCLAS_GetParagraphProcessAWordCount
Required Header: ICTCLAS30.h

Return Value
Returns the word count of the paragraph.

Parameters
sParagraph: the source paragraph.

Remarks
ICTCLAS_GetParagraphProcessAWordCount works properly only if ICTCLAS_Init succeeds.
The output format is customized in the ICTCLAS configuration.

Example

    using System;
    using System.IO;
    using System.Runtime.InteropServices;

    namespace win_csharp
    {
        [StructLayout(LayoutKind.Explicit)]
        public struct result_t
        {
            [FieldOffset(0)] public int start;
            [FieldOffset(4)] public int length;
            [FieldOffset(8)] public int sPos;
            [FieldOffset(12)] public int sPosLow;
            [FieldOffset(16)] public int POS_id;
            [FieldOffset(20)] public int word_ID;
            [FieldOffset(24)] public int word_type;
            [FieldOffset(28)] public int weight;
        }

        /// <summary>
        /// Summary description of Class1.
        /// </summary>
        class Class1
        {
            const string path = @"ICTCLAS30.dll";

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_Init")]
            public static extern bool ICTCLAS_Init(String sInitDirPath);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_ParagraphProcess")]
            public static extern String ICTCLAS_ParagraphProcess(String sParagraph,int bPOStagged);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_Exit")]
            public static extern bool ICTCLAS_Exit();

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_ImportUserDict")]
            public static extern int ICTCLAS_ImportUserDict(String sFilename);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_FileProcess")]
            public static extern bool ICTCLAS_FileProcess(String sSrcFilename,String sDestFilename,int bPOStagged);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_FileProcessEx")]
            public static extern bool ICTCLAS_FileProcessEx(String sSrcFilename,String sDestFilename);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_GetParagraphProcessAWordCount")]
            static extern int ICTCLAS_GetParagraphProcessAWordCount(String sParagraph);

            [DllImport(path,CharSet=CharSet.Ansi,EntryPoint="ICTCLAS_ParagraphProcessAW")]
            static extern void ICTCLAS_ParagraphProcessAW(int nCount, [Out,MarshalAs(UnmanagedType.LPArray)] result_t[] result);

            [DllImport(path, CharSet = CharSet.Ansi, EntryPoint = "ICTCLAS_AddUserWord")]
            static extern int ICTCLAS_AddUserWord(String sWord);

            [DllImport(path, CharSet = CharSet.Ansi, EntryPoint = "ICTCLAS_SaveTheUsrDic")]
            static extern int ICTCLAS_SaveTheUsrDic();

            [DllImport(path, CharSet = CharSet.Ansi, EntryPoint = "ICTCLAS_DelUsrWord")]
            static extern int ICTCLAS_DelUsrWord(String sWord);

            /// <summary>
            /// Main entry point of the application.
            /// </summary>
            [STAThread]
            static void Main(string[] args)
            {
                //
                // TODO: add code here to start the application
                //
                if(!ICTCLAS_Init(null))
                {
                    System.Console.WriteLine("Init ICTCLAS failed!");
                    return;
                }
                String s ="点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
                int count = ICTCLAS_GetParagraphProcessAWordCount(s); // first get the number of words in the result
                result_t[] result = new result_t[count];              // allocate the result array on the client side
                ICTCLAS_ParagraphProcessAW(count,result);             // copy the results into the client's memory
                int i=1;
                foreach(result_t r in result)
                {
                    String sWhichDic="";
                    switch (r.word_type)
                    {
                        case 0:
                            sWhichDic = "核心词典";   // core dictionary
                            break;
                        case 1:
                            sWhichDic = "用户词典";   // user dictionary
                            break;
                        case 2:
                            sWhichDic = "专业词典";   // domain dictionary
                            break;
                        default:
                            break;
                    }
                    Console.WriteLine("No.{0}:start:{1}, length:{2},POS_ID:{3},Word_ID:{4}, UserDefine:{5}, Word:{6}\n", i++, r.start, r.length, r.POS_id, r.word_ID, sWhichDic, s.Substring(r.start / 2, r.length / 2));
                }
                ICTCLAS_Exit();
            }
        }
    }

1.8 ICTCLAS_ParagraphProcessAW

Process a paragraph; API for C#.

    void ICTCLAS_ParagraphProcessAW(int nCount,result_t * result);
Routine: ICTCLAS_ParagraphProcessAW
Required Header: ICTCLAS30.h

Return Value
None.

Parameters
nCount: the word count of the paragraph.
result: pointer to the structure array that receives the results.

Remarks
ICTCLAS_ParagraphProcessAW works properly only if ICTCLAS_Init succeeds.
The output format is customized in the ICTCLAS configuration.

Example
See the example in 1.7.

1.9 ICTCLAS_AddUserWord

Add a word to the user dictionary.

    int ICTCLAS_AddUserWord(const char *sWord);

Routine: ICTCLAS_AddUserWord
Required Header: ICTCLAS30.h

Return Value
Returns 1 if the word is added; otherwise returns 0.

Parameters
sWord: the word to add.

Remarks
ICTCLAS_AddUserWord works properly only if ICTCLAS_Init succeeds.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        // Add a word in the form word\tPOS: "爱思客" is the word to add,
        // "n" is its part of speech, and "\t" is the separator.
        ICTCLAS_AddUserWord("爱思客\tn");
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,0);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.10 ICTCLAS_SaveTheUsrDic

Save the user dictionary to disk.

    int ICTCLAS_SaveTheUsrDic();

Routine: ICTCLAS_SaveTheUsrDic
Required Header: ICTCLAS30.h

Return Value
Returns 1 if the dictionary is saved; otherwise returns 0.

Parameters
None.

Remarks
ICTCLAS_SaveTheUsrDic works properly only if ICTCLAS_Init succeeds.
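The comment in the 1.9 example describes the argument of ICTCLAS_AddUserWord as the word, a tab, and the POS tag. A small helper can assemble that string and reject inputs that would break the separator. This is a sketch only: makeUserWordEntry is a hypothetical name, not an ICTCLAS function, and the validation rules are an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of ICTCLAS): builds the "word<TAB>POS" string
// described in the 1.9 example comment.
std::string makeUserWordEntry(const std::string& word, const std::string& pos) {
    if (word.empty() || pos.empty()) {
        throw std::invalid_argument("word and POS tag must be non-empty");
    }
    // Assumption: a tab inside either field would be read as the separator.
    if (word.find('\t') != std::string::npos ||
        pos.find('\t') != std::string::npos) {
        throw std::invalid_argument("word and POS tag must not contain a tab");
    }
    return word + '\t' + pos;
}
```

With the library available, the result would be passed on as ICTCLAS_AddUserWord(makeUserWordEntry("iThinker", "n").c_str()), followed by ICTCLAS_SaveTheUsrDic() to persist the change.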
Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        ICTCLAS_AddUserWord("爱思客\tn");   // format: word\tPOS
        ICTCLAS_SaveTheUsrDic();            // save the user dictionary
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,0);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.11 ICTCLAS_DelUsrWord

Delete a word from the user dictionary.

    int ICTCLAS_DelUsrWord(const char *sWord);

Routine: ICTCLAS_DelUsrWord
Required Header: ICTCLAS30.h

Return Value
Returns -1 if the word does not exist in the user dictionary; otherwise returns the handle of the deleted word.

Parameters
sWord: the word to delete.

Remarks
ICTCLAS_DelUsrWord works properly only if ICTCLAS_Init succeeds.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        ICTCLAS_AddUserWord("iThinker\tn");   // format: word\tPOS
        ICTCLAS_AddUserWord("爱思客\tn");
        ICTCLAS_DelUsrWord("iThinker");       // delete iThinker
        ICTCLAS_SaveTheUsrDic();              // save the user dictionary
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,0);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.12 ICTCLAS_KeyWord

Extract keywords from a paragraph.
    int ICTCLAS_KeyWord(result_t * resultKey, int &nCountKey);

Routine: ICTCLAS_KeyWord
Required Header: ICTCLAS30.h

Return Value
Returns 1 if execution succeeds; otherwise returns 0.

Parameters
resultKey: receives the returned keywords.
nCountKey: receives the number of keywords returned.

Remarks
ICTCLAS_ParagraphProcessAW or ICTCLAS_ParagraphProcessA must be executed before ICTCLAS_KeyWord.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        ICTCLAS_AddUserWord("你好\tn");   // format: word\tPOS
        ICTCLAS_SaveTheUsrDic();
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        int nCount = 0;
        while(_stricmp(sSentence,"q")!=0)
        {
            ICTCLAS_ParagraphProcessA(sSentence,&nCount);
            // Keyword extraction; must run after ICTCLAS_ParagraphProcessA
            // (or ICTCLAS_ParagraphProcessAW) has finished
            result_t *resultKey = (result_t*)malloc(sizeof(result_t)*nCount);
            int nCountKey;
            ICTCLAS_KeyWord(resultKey, nCountKey);
            for (int i=0; i

[The source text is truncated here: the rest of this example and the description and prototype of ICTCLAS_FingerPrint are missing.]

1.13 ICTCLAS_FingerPrint

Return Value
Returns 0 on failure; otherwise returns the fingerprint of the content.

Parameters
None.

Remarks
ICTCLAS_ParagraphProcessAW or ICTCLAS_ParagraphProcessA must be executed before ICTCLAS_FingerPrint.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        ICTCLAS_AddUserWord("你好\tn");   // format: word\tPOS
        ICTCLAS_SaveTheUsrDic();
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        int nCount = 0;
        while(_stricmp(sSentence,"q")!=0)
        {
            ICTCLAS_ParagraphProcessA(sSentence,&nCount);
            // Fingerprint extraction; must run after ICTCLAS_ParagraphProcessA
            // (or ICTCLAS_ParagraphProcessAW) has finished
            unsigned long lFinger = ICTCLAS_FingerPrint();
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

1.14 ICTCLAS_SetPOSmap

Select which POS map to use.

    int ICTCLAS_SetPOSmap(int nPOSmap);

Routine: ICTCLAS_SetPOSmap
Required Header: ICTCLAS30.h

Return Value
Returns 1 if execution succeeds; otherwise returns 0.

Parameters
nPOSmap: one of the following values.
    ICT_POS_MAP_FIRST     ICT first-level tag set
    ICT_POS_MAP_SECOND    ICT second-level tag set
    PKU_POS_MAP_SECOND    Peking University second-level tag set
    PKU_POS_MAP_FIRST     Peking University first-level tag set

Remarks
ICTCLAS_SetPOSmap works properly only if ICTCLAS_Init succeeds.

Example

    #include "ICTCLAS30.h"
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        // Sample1: Sentence or paragraph lexical analysis with only one result
        char sSentence[2000];
        const char * sResult;
        if(!ICTCLAS_Init())
        {
            printf("Init fails\n");
            return -1;
        }
        ICTCLAS_SetPOSmap(ICT_POS_MAP_FIRST);
        printf("Input sentence now('q' to quit)!\n");
        scanf("%1999s",sSentence);
        while(_stricmp(sSentence,"q")!=0)
        {
            sResult = ICTCLAS_ParagraphProcess(sSentence,0);
            printf("%s\nInput string now('q' to quit)!\n", sResult);
            scanf("%1999s",sSentence);
        }
        ICTCLAS_Exit();
        return 0;
    }

2. JNI Interface

2.1 ICTCLAS_Init

Initialize the analyzer and prepare the data ICTCLAS needs according to the configuration file.

    boolean ICTCLAS_Init(byte[] sPath);

Routine: ICTCLAS_Init
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns true if initialization succeeds; otherwise returns false.

Parameters
sPath: the initial directory path, where the file Configure.xml and the Data directory are stored.

Remarks
ICTCLAS_Init must be invoked before any other ICTCLAS operation. The whole system needs to call it only once, before it starts using ICTCLAS. When the system is stopping and no further operations will be made, call ICTCLAS_Exit to destroy all working buffers. Every operation fails if initialization did not succeed.
ICTCLAS_Init fails mainly for two reasons:
1) Required data is incompatible or missing.
2) The configuration file is missing or contains invalid parameters.
The log file ictclas.log in the default directory gives more detail.

Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init (one-argument form, matching the prototype above)
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj

2.2 ICTCLAS_Exit

Exit the program, free all resources, and destroy all working buffers used by ICTCLAS.

    boolean ICTCLAS_Exit();

Routine: ICTCLAS_Exit
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns true on success; otherwise returns false.

Parameters
None.

Remarks
ICTCLAS_Exit must be invoked when the system is stopping and no further operations will be made. Call ICTCLAS_Init again to restart ICTCLAS.
Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj

2.3 ICTCLAS_ImportUserDict

Import a user-defined dictionary from a text file.

    int ICTCLAS_ImportUserDict(byte[] sPath);

Routine: ICTCLAS_ImportUserDict
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
The number of lexical entries imported successfully.

Parameters
sPath: the text file name of the user dictionary.

Remarks
ICTCLAS_ImportUserDict works properly only if ICTCLAS_Init succeeds.
For the format of the text dictionary file, see User-defined Lexicon.
You need to invoke the function only when you use the lexicon for the first time or when you change your customized lexicon. After you import once and make no further changes, ICTCLAS loads the lexicon automatically if UserDict is set to "on" in the configuration file. When UserDict is set to "off", the user-defined lexicon is not applied.

Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println("Before Import User Dictionary: "+ nativeStr);
            str = "userdict.txt";
            int nCount = testICTCLAS30.ICTCLAS_ImportUserDict(str.getBytes("GB2312"));
            System.out.println("Import User Dictionary entries: "+ nCount);
            str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println("After Import User Dictionary: "+ nativeStr);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    Before Import User Dictionary: 点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj
    Import User Dictionary entries: 3
    After Import User Dictionary: 点击/v 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj

2.4 ICTCLAS_ParagraphProcess

Process a paragraph and return the result buffer.

    byte[] ICTCLAS_ParagraphProcess(byte[] sSrc,int bPOSTagged);

Routine: ICTCLAS_ParagraphProcess
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns the result buffer as a byte array.

Parameters
sSrc: the source paragraph.
bPOSTagged: whether POS tagging is needed; 0 for no tags, 1 for tagging.

Remarks
ICTCLAS_ParagraphProcess works properly only if ICTCLAS_Init succeeds.
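The sample outputs in this document show the tagged result as space-separated word/POS tokens (for example 下载/v 超女/nz). Once the returned buffer has been decoded into a string, it can be split into (word, POS) pairs. The sketch below is in C++ like the section 1 examples; the same logic carries over to Java. parseTagged is a hypothetical helper, not an ICTCLAS function, and it assumes the default output format seen in the samples; a customized output format would need different parsing.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ICTCLAS): splits "word/pos word/pos ..."
// into (word, pos) pairs. The tag is taken after the LAST '/', so a word that
// itself contains '/' keeps its text. Tokens without '/' get an empty tag.
std::vector<std::pair<std::string, std::string> >
parseTagged(const std::string& tagged) {
    std::vector<std::pair<std::string, std::string> > out;
    std::istringstream in(tagged);
    std::string token;
    while (in >> token) {  // tokens are separated by whitespace
        std::string::size_type slash = token.rfind('/');
        if (slash == std::string::npos) {
            out.push_back(std::make_pair(token, std::string()));
        } else {
            out.push_back(std::make_pair(token.substr(0, slash),
                                         token.substr(slash + 1)));
        }
    }
    return out;
}
```

Taking the tag after the last slash is a design choice: it keeps a word such as a URL or a fraction intact at the cost of assuming that POS tags never contain a slash.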
Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj

2.5 ICTCLAS_FileProcess

Process a text file.

    boolean ICTCLAS_FileProcess(byte[] sSrcFilename,byte[] sDestFilename,int bPOSTagged);

Routine: ICTCLAS_FileProcess
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns true if processing succeeds; otherwise returns false.

Parameters
sSrcFilename: the name of the source file to be analyzed.
sDestFilename: the name of the file that stores the results.
bPOSTagged: whether POS tagging is needed; 0 for no tags, 1 for tagging.

Remarks
ICTCLAS_FileProcess works properly only if ICTCLAS_Init succeeds.
The output format is customized in the ICTCLAS configuration.

Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            // file analysis
            String argu1 = "Test.txt";
            String argu2 = "Test_result.txt";
            testICTCLAS30.ICTCLAS_FileProcess(argu1.getBytes("GB2312"),argu2.getBytes("GB2312"),0);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj

2.6 ICTCLAS_IsWord

Search the lexicon and determine whether the word is listed in the core lexicon.

    boolean ICTCLAS_IsWord(byte[] sWord);

Routine: ICTCLAS_IsWord
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns true if the word exists; otherwise returns false.

Parameters
sWord: the word to search for.

Remarks
ICTCLAS_IsWord works properly only if ICTCLAS_Init succeeds.

Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            str = "中国";
            boolean bExist = testICTCLAS30.ICTCLAS_IsWord(str.getBytes("GB2312"));
            System.out.println(str+" exists?"+bExist);
            float dProb = testICTCLAS30.ICTCLAS_GetUniProb(str.getBytes("GB2312"));
            System.out.println(str+" Probability= "+dProb);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
    点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj
    中国 exists?true
    中国 Probability= 0.0022491838

2.7 ICTCLAS_GetUniProb

Get the unigram probability of a given word.

    float ICTCLAS_GetUniProb(byte[] sWord);

Routine: ICTCLAS_GetUniProb
Required Header: ICTCLAS.I3S.AC.ICTCLAS30

Return Value
Returns the unigram probability after a simple smoothing technique has been applied.

Parameters
sWord: the word to query.

Remarks
ICTCLAS_GetUniProb works properly only if ICTCLAS_Init succeeds.

Example

    import ICTCLAS.I3S.AC.ICTCLAS30;
    import java.util.*;
    import java.io.*;

    public class TestICTCLAS30 {
        public static void main(String[] args) throws Exception {
            ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
            // init
            String argu = "";
            if (testICTCLAS30.ICTCLAS_Init(argu.getBytes("GB2312")) == false) {
                System.out.println("Init Fail!");
                return;
            }
            System.out.println("Init Success!");
            // analysis
            String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
            byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
            String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
            System.out.println(nativeStr);
            str = "中国";
            boolean bExist = testICTCLAS30.ICTCLAS_IsWord(str.getBytes("GB2312"));
            System.out.println(str+" exists?"+bExist);
            float dProb = testICTCLAS30.ICTCLAS_GetUniProb(str.getBytes("GB2312"));
            System.out.println(str+" Probability= "+dProb);
            testICTCLAS30.ICTCLAS_Exit();
        }
    }

Output

    Init Success!
点/n 击/vg 下载/v 超女/nz 纪敏佳/nr 深受/v 观众/n 喜爱/vn 。/wj 禽流感/n 爆发/v 在/p 非典/nz 之后/f 。/wj
中国 exists?true
中国 Probability= 0.0022491838

2.8 nativeProcAPara

public native byte[] nativeProcAPara(byte[] src);

Routine             Required Header
nativeProcAPara

Return Value
Returns the result vector. It is managed by the system; the caller must not allocate or free it. Each element has the following native layout:

struct result_t{
    int start;            //start position of the word in the input sentence
    int length;           //length of the word
    char sPOS[POS_SIZE];  //word type; POS ID value, allows quick lookup in the POS table
    int iPOS;             //part of speech
    int word_ID;          //word ID; -1 (or a second value lost from the source text) for an out-of-vocabulary word
    int word_type;        //1: word from the user dictionary; otherwise: not from the user dictionary
    int weight;           //word weight
};

Parameters
src: the source paragraph.

Remarks
The nativeProcAPara function works properly only if ICTCLAS_Init succeeds.

Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //analysis
        String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        //structured analysis: nativeBytes now holds the packed result_t records
        nativeBytes = testICTCLAS30.nativeProcAPara(str.getBytes("GB2312"));
        int nativeElementSize = 4 * 6 + 8; //size of result_t in native code
        int nElement = nativeBytes.length / nativeElementSize;
        //Result is a user-defined holder class with int fields start, length, posId, wordId, word_type, weight
        Result[] resultArr = new Result[nElement];
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(nativeBytes));
        for (int i = 0; i < nElement; i++) {
            resultArr[i] = new Result();
            //the native ints are little-endian, so each value is byte-swapped
            resultArr[i].start = Integer.reverseBytes(dis.readInt());
            resultArr[i].length = Integer.reverseBytes(dis.readInt());
            dis.skipBytes(8); //skip the sPOS field
            resultArr[i].posId = Integer.reverseBytes(dis.readInt());
            resultArr[i].wordId = Integer.reverseBytes(dis.readInt());
            resultArr[i].word_type = Integer.reverseBytes(dis.readInt());
            resultArr[i].weight = Integer.reverseBytes(dis.readInt());
        }
        dis.close();
        for (int i = 0; i < resultArr.length; i++) {
            System.out.println("start=" + resultArr[i].start + ",length=" + resultArr[i].length
                + ",pos=" + resultArr[i].posId + ",word=" + resultArr[i].wordId
                + ",weight=" + resultArr[i].weight);
        }
        testICTCLAS30.ICTCLAS_Exit();
    }
}

Output

2.9 ICTCLAS_SaveTheUsrDic

Save the user dictionary to disk.

public native int ICTCLAS_SaveTheUsrDic();

Routine                  Required Header
ICTCLAS_SaveTheUsrDic

Return Value
Returns 1 if the save succeeds. Otherwise returns 0.

Parameters
None.

Remarks
The ICTCLAS_SaveTheUsrDic function works properly only if ICTCLAS_Init succeeds.

Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //add a user word dynamically
        String str = "爱思客";
        testICTCLAS30.ICTCLAS_AddUserWord(str.getBytes("GB2312"));
        testICTCLAS30.ICTCLAS_SaveTheUsrDic(); //save the user dictionary
        //analysis
        str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        testICTCLAS30.ICTCLAS_Exit();
    }
}

Output

2.10 ICTCLAS_DelUsrWord

Delete a word from the user dictionary.
public native int ICTCLAS_DelUsrWord(byte[] sWord);

Routine               Required Header
ICTCLAS_DelUsrWord

Return Value
Returns -1 if the word does not exist in the user dictionary; otherwise returns the handle of the deleted word.

Parameters
sWord: the word to be deleted.

Remarks
The ICTCLAS_DelUsrWord function works properly only if ICTCLAS_Init succeeds.

Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //add user words dynamically
        String str = "爱思客";
        testICTCLAS30.ICTCLAS_AddUserWord(str.getBytes("GB2312"));
        str = "iThinker";
        testICTCLAS30.ICTCLAS_AddUserWord(str.getBytes("GB2312"));
        //delete the word that was just added
        testICTCLAS30.ICTCLAS_DelUsrWord(str.getBytes("GB2312"));
        testICTCLAS30.ICTCLAS_SaveTheUsrDic(); //save the user dictionary
        //analysis
        str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        testICTCLAS30.ICTCLAS_Exit();
    }
}

Output

2.11 ICTCLAS_KeyWord

Extract keywords from a paragraph.

public int ICTCLAS_KeyWord(byte[] resultKey, int nCount);

Routine             Required Header
ICTCLAS_KeyWord

Return Value
Returns the number of keywords.

Parameters
resultKey: the buffer that receives the returned keywords.
nCount: the number of words. It can be obtained as follows:

    nativeBytes = testICTCLAS30.nativeProcAPara(str.getBytes("GB2312"));
    int nativeElementSize = 4 * 6 + 8; //size of result_t in native code; depends on the compiler
    int nElement = nativeBytes.length / nativeElementSize; //pass this value as nCount

Remarks
nativeProcAPara must be executed before ICTCLAS_KeyWord.
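The buffer returned by nativeProcAPara, and the one filled by ICTCLAS_KeyWord, is described above as a sequence of result_t records of 4 * 6 + 8 bytes each. The following is a minimal sketch of decoding such a buffer with java.nio.ByteBuffer rather than DataInputStream plus Integer.reverseBytes. It assumes little-endian byte order and no struct padding, both of which depend on the native compiler and platform. ResultDecoder, Result and the field names are hypothetical helpers, not part of the ICTCLAS API, and the demo decodes a synthetic record rather than real ICTCLAS output.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICTCLAS: decodes packed result_t records.
class ResultDecoder {
    // Assumed layout: int start, int length, char sPOS[8], then four ints.
    static final int POS_SIZE = 8;
    static final int RECORD_SIZE = 4 * 6 + POS_SIZE; // 32 bytes, as in the manual

    // Hypothetical holder mirroring the fields of result_t.
    static final class Result {
        int start, length, posId, wordId, wordType, weight;
        String pos; // sPOS as text, cut at the first NUL byte
    }

    static List<Result> decode(byte[] nativeBytes) {
        // Assumption: the native side writes little-endian ints.
        ByteBuffer buf = ByteBuffer.wrap(nativeBytes).order(ByteOrder.LITTLE_ENDIAN);
        List<Result> out = new ArrayList<>();
        while (buf.remaining() >= RECORD_SIZE) {
            Result r = new Result();
            r.start = buf.getInt();
            r.length = buf.getInt();
            byte[] rawPos = new byte[POS_SIZE];
            buf.get(rawPos);
            r.pos = cString(rawPos);
            r.posId = buf.getInt();
            r.wordId = buf.getInt();
            r.wordType = buf.getInt();
            r.weight = buf.getInt();
            out.add(r);
        }
        return out;
    }

    // Converts a NUL-terminated byte field into a Java string.
    private static String cString(byte[] raw) {
        int n = 0;
        while (n < raw.length && raw[n] != 0) {
            n++;
        }
        return new String(raw, 0, n, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Synthetic record standing in for real nativeProcAPara output.
        ByteBuffer b = ByteBuffer.allocate(RECORD_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(0).putInt(4).put(new byte[] {'n', 'r', 0, 0, 0, 0, 0, 0})
         .putInt(21).putInt(1234).putInt(1).putInt(7);
        for (Result r : decode(b.array())) {
            System.out.println("start=" + r.start + ",length=" + r.length + ",pos=" + r.pos
                + ",posId=" + r.posId + ",wordId=" + r.wordId + ",weight=" + r.weight);
        }
    }
}
```

Setting the byte order once on the buffer avoids a reverseBytes call per field, and reading sPOS instead of skipping it keeps the tag text available. If the native library is built with a different POS_SIZE or with padding, RECORD_SIZE has to be adjusted to match.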
Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //analysis
        String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        //keyword extraction
        int nCountKey = 0;
        nativeBytes = testICTCLAS30.nativeProcAPara(str.getBytes("GB2312"));
        int nativeElementSize = 4 * 6 + 8; //size of result_t in native code
        int nElement = nativeBytes.length / nativeElementSize;
        nativeBytes = new byte[nativeBytes.length]; //buffer that receives the keyword records
        nCountKey = testICTCLAS30.ICTCLAS_KeyWord(nativeBytes, nElement);
        //Result is a user-defined holder class with int fields start, length, posId, wordId, word_type, weight
        Result[] resultArr = new Result[nCountKey];
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(nativeBytes));
        for (int i = 0; i < nCountKey; i++) {
            resultArr[i] = new Result();
            resultArr[i].start = Integer.reverseBytes(dis.readInt());
            resultArr[i].length = Integer.reverseBytes(dis.readInt());
            dis.skipBytes(8); //skip the sPOS field
            resultArr[i].posId = Integer.reverseBytes(dis.readInt());
            resultArr[i].wordId = Integer.reverseBytes(dis.readInt());
            resultArr[i].word_type = Integer.reverseBytes(dis.readInt());
            resultArr[i].weight = Integer.reverseBytes(dis.readInt());
        }
        dis.close();
        for (int i = 0; i < resultArr.length; i++) {
            System.out.println("start=" + resultArr[i].start + ",length=" + resultArr[i].length
                + ",pos=" + resultArr[i].posId + ",word=" + resultArr[i].wordId
                + ",weight=" + resultArr[i].weight);
        }
        testICTCLAS30.ICTCLAS_Exit();
    }
}

Output

2.12 ICTCLAS_FingerPrint

Extract a fingerprint from the paragraph.

public native long ICTCLAS_FingerPrint();

Routine                Required Header
ICTCLAS_FingerPrint

Return Value
Returns 0 on failure; otherwise returns the fingerprint of the content.

Parameters
None.

Remarks
nativeProcAPara must be executed before ICTCLAS_FingerPrint.

Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //analysis
        String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        nativeBytes = testICTCLAS30.nativeProcAPara(str.getBytes("GB2312"));
        long finger = testICTCLAS30.ICTCLAS_FingerPrint(); //extract the fingerprint
        testICTCLAS30.ICTCLAS_Exit();
    }
}

//keyword extraction; must run after ICTCLAS_ParagraphProcessAW has completed
result_t *resultKey = (result_t*)malloc(sizeof(result_t)*nCount);
int nCountKey;
ICTCLAS_KeyWord(resultKey, nCountKey);
for (int i=0; i

2.13 ICTCLAS_SetPOSmap

Return Value
Returns 1 if execution succeeds. Otherwise returns 0.

Parameters
nPOSmap: the part-of-speech tag set to use, one of:
ICT_POS_MAP_FIRST     ICT (Institute of Computing Technology) first-level tag set
ICT_POS_MAP_SECOND    ICT second-level tag set
PKU_POS_MAP_SECOND    PKU (Peking University) second-level tag set
PKU_POS_MAP_FIRST     PKU first-level tag set

Remarks
The ICTCLAS_SetPOSmap function works properly only if ICTCLAS_Init succeeds.
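Since nPOSmap selects a tag set, its effect presumably shows up in the tags attached to each word in the ICTCLAS_ParagraphProcess output, which the examples in this manual print as space-separated word/tag tokens. The following is a minimal sketch of splitting such a string into word and tag pairs so that results under different tag sets can be compared. TaggedOutputParser and TaggedWord are hypothetical helpers, not part of the ICTCLAS API, and the token format is assumed from the sample output shown in this manual.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICTCLAS: splits "word/tag word/tag ..." output.
class TaggedOutputParser {
    static final class TaggedWord {
        final String word;
        final String tag;

        TaggedWord(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    static List<TaggedWord> parse(String tagged) {
        List<TaggedWord> out = new ArrayList<>();
        for (String token : tagged.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            // Split at the last slash so a word that itself contains '/' stays intact.
            int slash = token.lastIndexOf('/');
            if (slash <= 0) {
                out.add(new TaggedWord(token, "")); // no tag found, keep the token as the word
            } else {
                out.add(new TaggedWord(token.substring(0, slash), token.substring(slash + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample taken from the output shown earlier in this manual.
        String sample = "点/n 击/vg 下载/v 超女/nz 纪敏佳/nr";
        for (TaggedWord w : parse(sample)) {
            System.out.println(w.word + " -> " + w.tag);
        }
    }
}
```

Running the same sentence through ICTCLAS_ParagraphProcess after different ICTCLAS_SetPOSmap calls and parsing each result this way makes it easy to compare the tags word by word.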
Example

import ICTCLAS.I3S.AC.ICTCLAS30;
import java.util.*;
import java.io.*;

public class TestICTCLAS30 {
    public static void main(String[] args) throws Exception {
        ICTCLAS30 testICTCLAS30 = new ICTCLAS30();
        //init
        String argu = "";
        String encoding = "UTF-8";
        if (testICTCLAS30.ICTCLAS_Init(argu.getBytes(),encoding.getBytes()) == false) {
            System.out.println("Init Fail!");
            return;
        }
        System.out.println("Init Success!");
        //select the ICT first-level tag set
        testICTCLAS30.ICTCLAS_SetPOSmap(ICT_POS_MAP_FIRST);
        //analysis
        String str = "点击下载超女纪敏佳深受观众喜爱。禽流感爆发在非典之后。";
        byte[] nativeBytes = testICTCLAS30.ICTCLAS_ParagraphProcess(str.getBytes("GB2312"),1);
        String nativeStr = new String(nativeBytes,0,nativeBytes.length,"GB2312");
        System.out.println(nativeStr);
        testICTCLAS30.ICTCLAS_Exit();
    }
}

Output

About the Author

Dr. Kevin Zhang (张华平, Zhang Hua-Ping)
Associate Professor, Graduate Supervisor
Vice Director, Institute of Computer Language and Information Processing
Beijing Institute of Technology
Address: 5 Zhongguancun South Street, Haidian District, Beijing
Email: kevinzhang@bit.edu.cn
MSN: pipy_zhang@msn.com
Space: http://hi.baidu.com/drkevinzhang
