Source file: 《語料庫語料庫研究方法概述.ppt》 (Overview of Corpus Research Methods)
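Topic Selection, Design, and Methods: Put It All Together
Li Wenzhong (李文中), 中國外語教育研究中心, 2012

Corpora are not for humans to learn, and regular expressions are not for women to learn.

Corpus-driven is basically corpus-based. Any corpus-based research is necessarily driven by corpus data.

Goals: through corpus analysis and research, to
- verify hypotheses and intuitions
- obtain new findings
- build new hypotheses
- construct new theories
- verify existing findings
- solve puzzles

Innovation: data, methods, techniques, interpretation/theory/perspective
[Table of check marks (√) showing which of these dimensions is new; cell layout not recoverable]

The corpus-based approach is a verification procedure.
The corpus-driven approach is a discovery procedure.

Rationale: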
All perception is inference. (Any perception is but inferencing.)

[Diagram: the world of reality and the world of text, separated by the unbridgeable "Einstein Gulf"]
- Eye, ear, nose, tongue, body, mind (眼耳鼻舌身意)
- Form, sound, smell, taste, touch, mental objects (色聲香味觸法)
- Study, inquire, reflect, discern, act (學問思辨行)
- Text

Basic steps:
- Decide on a topic
- Pose the questions
- Determine the population and the sample
- Choose the tools
- Process the data
- Describe the results: classify and summarize features (description)
- Explain the results: observe, describe, explain (explanation)
- Interpret the results: meaning, value, application (interpretation)

Identifying a problem
Something or some phenomenon that is:
- out of expectation
- incongruent
- in need of a solution
- puzzling

Reading to be better informed
- What has been done as a contribution
- What has been left undone
- What has been done wrong
Never count someone else's money.

Formulating research questions
- Naming: what is…
- Classificatory: how are they interrelated (patterned)?
- Explanatory: to what extent do they co-occur?
- Predictive: what will happen if…?
Never ask a question to which you already know the answer; never ask a "how to" question.

Finding a method
Population, Sample, Sampling
P (Population), S (Sample), R (Result), I (Interpretation)
Sampling validity, reliability, validity, generalizability
IF P ? S, S ? R, R ? I, THEN I ? P

Descriptive research
- single text
- text vs. text
- people vs. text

Research questions
- How many different word forms are used in the text? How many running words are used? What is their distribution?
- To what extent can the level of difficulty of the text be computed on the basis of the graded wordlists?
- How many different word classes are used? What is the number of each word class?

Method
To answer RQ1, generate a wordlist of the given text and observe:
- the number of types
- the number of tokens
- the type/token ratio (TTR); if the text is very large, standardize the TTR
- the types and their frequency
- cumulative percentage

To answer RQ2, compare the wordlist against a batch of graded wordlists, and observe:
- How many types on the Level 1, 2, and 3 lists are used in the text, and what is their percentage? What about their tokens?
- How many types that are not on any list are used in the text? Summarize their features.

To answer RQ3, retrieve each word class from the POS-tagged text and sort the results by frequency in decreasing order:
- Retrieve all the nouns, verbs, and adjectives
- Sort the list

Instruments
Use AntConc 3.0 to generat
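
The observations listed under Method for RQ1 and RQ2 can also be sketched in a few lines of Python. This is a minimal illustration, not what AntConc does internally. All function names are hypothetical, the tokenizer is a deliberately crude assumption (lowercased alphabetic strings), the standardized TTR is computed as the mean TTR over consecutive fixed-size chunks (one common convention; the slides do not say which one is intended), and the graded lists are assumed to be disjoint sets of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Crude tokenizer (assumption): lowercase alphabetic strings, optional apostrophe part."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_list(tokens):
    """Return (type, frequency, cumulative % of tokens), most frequent first."""
    freq = Counter(tokens)
    total = len(tokens)
    rows, running = [], 0
    # Sort by descending frequency, then alphabetically for a stable order.
    for word, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        running += count
        rows.append((word, count, 100.0 * running / total))
    return rows


def ttr(tokens):
    """Type/token ratio: distinct word forms divided by running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks; falls back to plain TTR for short texts."""
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)


def coverage_by_level(tokens, graded_lists):
    """Count types and tokens per graded list, plus those on no list ('off-list')."""
    freq = Counter(tokens)
    report = {}
    listed = set()
    for level, words in graded_lists.items():
        hits = [w for w in freq if w in words]
        report[level] = {"types": len(hits), "tokens": sum(freq[w] for w in hits)}
        listed.update(hits)
    off = [w for w in freq if w not in listed]
    report["off-list"] = {"types": len(off), "tokens": sum(freq[w] for w in off)}
    return report


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat on the log."
    toks = tokenize(sample)
    print("types:", len(set(toks)), "tokens:", len(toks), "TTR:", round(ttr(toks), 3))
    for row in word_list(toks):
        print(row)
    # Toy graded lists, invented purely for this demonstration.
    demo_lists = {"level1": {"the", "on", "sat"}, "level2": {"cat", "dog"}}
    print(coverage_by_level(toks, demo_lists))
```

Percentages per level follow directly by dividing each count by the total number of types or tokens. For RQ3 the same `Counter` approach would apply to the tags of a POS-tagged text, but the tag format depends on the tagger used, so it is left out here.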