Research on the Analysis of Ambiguity in Automatic Word Segmentation of Modern Chinese and Its Disambiguation
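Master's Thesis

Abstract

One goal of natural language processing is to find a way to add rich annotations to sentences made up of simple word sequences, so that the meaning a sentence expresses can be extracted more easily from a structurally annotated sentence than from one without structural annotation. As is well known, a Chinese sentence is a string of characters, with no spaces separating the words. Automatic word segmentation is therefore the first step of Chinese information processing and the foundation for the subsequent part-of-speech tagging, syntactic analysis, and semantic analysis. Resolving segmentation ambiguity and recognizing out-of-vocabulary words are the two major obstacles to automatic Chinese word segmentation; this thesis concentrates on the characteristics of segmentation ambiguity and its disambiguation. It first gives a formal description of Chinese word segmentation and its ambiguity types, then studies in detail the two main types of segmentation ambiguity and their disambiguation, and finally presents experimental results.

For combinational ambiguity, a list of disambiguation rules for ambiguous strings is learned from a corpus and optimized, and then used to correct the segmentation errors in those strings. Compared with rules summarized by linguistic experts, automatically learned rules are more objective, more comprehensive, and less labor-intensive, and they represent the future direction of computational linguistics research.

For overlapping ambiguity, disambiguation rules for each class of overlapping ambiguity are first learned from a corpus and then used to correct overlapping ambiguous strings. A maximum probability algorithm and a table lookup method are also applied to disambiguate overlapping ambiguous strings, and the experiments achieved good results.

Keywords: natural language processing; automatic word segmentation; overlapping ambiguity; combinational ambiguity

Research on the Analysis of Ambiguity in Automatic Word Segmentation of Modern Chinese and Its Disambiguation

Abstract

One goal of natural language processing is to discover a method for assigning rich structural annotation to sentences that are presented as simple linear strings of words, since meaning can be extracted more readily from a structurally annotated sentence than from a sentence with no structural information. Because a Chinese sentence consists of a sequence of Chinese characters, Chinese word segmentation became the first step of Chinese information processing. Moreover, it is the foundation of part-of-speech tagging, syntactic analysis, and semantic analysis. Segmentation ambiguity and the recognition of unknown words are the two obstacles in Chinese word segmentation technology. This dissertation focuses on the characteristics and disambiguation of ambiguous segmentation. First, it presents a formal description of Chinese word segmentation and its main ambiguity types; second, each of the two ambiguity types and its disambiguation methods are studied thoroughly; finally, experimental results are given.

For combinational ambiguity, we acquire and optimize a list of disambiguation rules from a corpus, then apply the rules to correct ambiguous segmentations. Compared with rules created manually by language experts, the automatically acquired rules are more objective, more comprehensive, and less labor-intensive; this is the future direction of computational linguistics research. For overlapping ambiguity, the disambiguation rules of each ambiguity class are likewise acquired from a corpus and used to correct the ambiguous segmentations. At the same time, this dissertation also used methods based on the maximum probability algorithm and based on se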