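Beijing Shangxuetang Big Data Class Materials
Building a Recommender System: A Complete Illustrated Guide
I. Recommender System Architecture Overview
[Figure: overall recommendation architecture]
1. The recommendation pipeline begins with data processing. By default, data is imported incrementally every day from the relational database into Hive. Inside Hive, a series of operations (intermediate tables, calls to Python scripts, and so on) turns it into the input data for the algorithm's mathematical modeling. This guide only simulates that stage: a single Scala file generates all of the prepared data, which is then loaded directly into Hive for processing.
2. Once the data has been processed, modeling begins. recommend.scala calls the logistic regression algorithm and produces the model files. Copy the three model files into the corresponding directory of the dubbox project, start the project, and test it by accessing it.
The whole process assumes that a Hive environment and IntelliJ IDEA are already set up and that Scala files can be run.
The workflow is:
Scala file generates the data → load it into Hive and process it → recommend.scala calls logistic regression to compute the model and writes the model files → copy the model files to the designated project directory and run the project → test from a browser
II. Data Preprocessing
1. Create the test data
Generate the data with the DataGenerator class (see the attached DataGenerator.scala file). It takes two arguments, the number of records and the output directory, for example:
100000 E:\推薦系統資料\hitop
This produces three output files.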
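The DataGenerator.scala attachment is not reproduced here. To illustrate what such a generator has to do, below is a minimal, hypothetical Scala sketch. The object name DataGeneratorSketch, the id formats ("C<n>" for apps, "D<n>" for devices), the comma-separated devid_applist format, the random 0/1 labels and all generated values are assumptions, not the attachment's code.

The sketch takes the same two arguments, a record count and an output directory. It writes three tab-separated files whose column counts match the three Hive tables created in step 2. The file names applist.txt, userdownload.txt and sample.txt are borrowed from the load statements in step 3.

```scala
import java.io.{File, PrintWriter}
import scala.util.Random

// Hypothetical stand-in for the DataGenerator.scala attachment (not its actual code).
// Assumed formats: app ids "C<n>", device ids "D<n>", devid_applist as comma-separated
// hitop_ids, random 0/1 labels, and placeholder values for the mnc/time fields.
object DataGeneratorSketch {

  // One row for dim_rcm_hitop_id_list_ds (14 columns, in DDL order).
  def appRow(i: Int, rnd: Random): Seq[String] = Seq(
    s"C$i",                            // hitop_id
    s"app_$i",                         // name
    s"author_${rnd.nextInt(100)}",     // author
    s"1.${rnd.nextInt(10)}",           // sversion
    rnd.nextInt(2).toString,           // ischarge (random 0/1)
    s"designer_${rnd.nextInt(50)}",    // designer
    "default",                         // font
    rnd.nextInt(5).toString,           // icon_count
    (rnd.nextInt(51) / 10.0).toString, // stars, 0.0 to 5.0
    rnd.nextInt(100).toString,         // price
    rnd.nextInt(50000).toString,       // file_size
    rnd.nextInt(1000).toString,        // comment_num
    "1080x1920",                       // screen
    rnd.nextInt(100000).toString       // dlnum
  )

  // One row for dw_rcm_hitop_userapps_dm (4 columns).
  def userRow(i: Int, appCount: Int, rnd: Random): Seq[String] = {
    val installed = Seq.fill(1 + rnd.nextInt(5))(s"C${rnd.nextInt(appCount)}")
    Seq(s"D$i", installed.mkString(","), s"device_$i", rnd.nextInt(3).toString)
  }

  // One row for dw_rcm_hitop_sample2learn_dm (22 columns), reusing an existing app's attributes.
  def sampleRow(apps: IndexedSeq[Seq[String]], deviceCount: Int, rnd: Random): Seq[String] = {
    val a = apps(rnd.nextInt(apps.length))
    Seq(
      rnd.nextInt(2).toString,         // label (random 0/1)
      s"D${rnd.nextInt(deviceCount)}", // device_id
      a(0), a(12), a(1), a(1),         // hitop_id, screen, en_name, ch_name
      a(2), a(3), "00",                // author, sversion, mnc (placeholder)
      "1970-01-01 00:00:00",           // event_local_time (placeholder)
      "android", a(5), "1", a(7),      // interface, designer, is_safe, icon_count
      "1970-01-01",                    // update_time (placeholder)
      a(8), a(11), a(6), a(9),         // stars, comment_num, font, price
      a(10), a(4), a(13)               // file_size, ischarge, dlnum
    )
  }

  // Writes tab-separated rows with "\n" endings to match FIELDS TERMINATED BY '\t'.
  private def write(file: File, rows: Seq[Seq[String]]): Unit = {
    val out = new PrintWriter(file, "UTF-8")
    try rows.foreach(r => out.print(r.mkString("\t") + "\n"))
    finally out.close()
  }

  def generate(n: Int, outDir: File, seed: Long = 42L): Unit = {
    require(n > 0, "record count must be positive")
    val rnd = new Random(seed)
    outDir.mkdirs()
    val apps = (0 until n).map(appRow(_, rnd))
    write(new File(outDir, "applist.txt"), apps)
    write(new File(outDir, "userdownload.txt"), (0 until n).map(userRow(_, n, rnd)))
    write(new File(outDir, "sample.txt"), Seq.fill(n)(sampleRow(apps, n, rnd)))
  }

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: DataGeneratorSketch <recordCount> <outputDir>")
    generate(args(0).toInt, new File(args(1)))
  }
}
```

Lines are terminated with "\n" rather than the platform line separator. This matters because the example output directory is on Windows: a trailing "\r" would otherwise end up in the last Hive column (dlnum) and make it parse as NULL. Sample rows copy the attributes of an already generated app so that the three files stay mutually consistent.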
2. Create the Hive tables
A real production setup involves fields from roughly fifty tables. Here the whole flow is simplified, and only the three final tables are given: the app vocabulary table, the user download history table, and the positive/negative sample table.
Table creation statements:
App vocabulary table:
CREATE EXTERNAL TABLE IF NOT EXISTS dim_rcm_hitop_id_list_ds (
  hitop_id STRING,
  name STRING,
  author STRING,
  sversion STRING,
  ischarge SMALLINT,
  designer STRING,
  font STRING,
  icon_count INT,
  stars DOUBLE,
  price INT,
  file_size INT,
  comment_num INT,
  screen STRING,
  dlnum INT
)
row format delimited fields terminated by '\t';
User download history table:
CREATE EXTERNAL TABLE IF NOT EXISTS dw_rcm_hitop_userapps_dm (
  device_id STRING,
  devid_applist STRING,
  device_name STRING,
  pay_ability STRING
)
row format delimited fields terminated by '\t';
Positive/negative sample table:
CREATE EXTERNAL TABLE IF NOT EXISTS dw_rcm_hitop_sample2learn_dm (
  label STRING,
  device_id STRING,
  hitop_id STRING,
  screen STRING,
  en_name STRING,
  ch_name STRING,
  author STRING,
  sversion STRING,
  mnc STRING,
  event_local_time STRING,
  interface STRING,
  designer STRING,
  is_safe INT,
  icon_count INT,
  update_time STRING,
  stars DOUBLE,
  comment_num INT,
  font STRING,
  price INT,
  file_size INT,
  ischarge SMALLINT,
  dlnum INT
)
row format delimited fields terminated by '\t';
3. Load the data
Load data into each of the three tables:
App vocabulary table:
load data local inpath '/opt/sxt/recommender/script/applist.txt' into table dim_rcm_hitop_id_list_ds;
User download history table:
load data local inpath '/opt/sxt/recommender/script/userdownload.txt' into table dw_rcm_hitop_userapps_dm;
Positive/negative sample table:
load data local inpath '/opt/sxt/recommender/script/sample.txt' into table dw_rcm_hitop_sample2learn_dm;
4. Build the training data
1. Create a temporary table
CREATE TABLE IF NOT EXISTS tmp_dw_rcm_hitop_prepare2train_dm (
  device_id STRING,
  label STRING,
  hitop_id STRING,
  screen STRING,
  ch_name STRING,
  author STRING,
  sversion STRING,
  mnc STRING,
  interfa