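Preface
I have finished my master's degree, where my research area was machine translation, and I am taking some time to organize the related material in the hope that it helps others. This post covers installing the statistical machine translation toolkit Moses on Ubuntu, along with the related Ubuntu configuration. Moses is a statistical machine translation system that can be trained to translate between any pair of languages. Usage is covered in the companion post, "Installing and Using the Statistical Machine Translation (SMT) Tool Moses on Ubuntu (Usage)".
Ubuntu configuration
When I ran my experiments I used the university's server, which ran Ubuntu 16. Graduates' login accounts have since been deleted, so I reinstalled Ubuntu 16.04 LTS in a VMware virtual machine to reproduce the process. Installer download link: Ubuntu 16.04.1 LTS (Xenial Xerus). I had previously installed and trained on Ubuntu 14 (virtual machine) and Deepin 20.1 (physical machine) without any problems, so other Linux distributions should also be able to follow this tutorial.
1. Disable automatic system sleep (optional)
Check whether automatic sleep is currently enabled: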
sudo systemctl status sleep.target
Output:
● sleep.target - Sleep
# The Loaded state is "loaded", which means automatic sleep is enabled
Loaded: loaded (/lib/systemd/system/sleep.target; static; vendor preset: enabled)
Active: inactive (dead)
Docs: man:systemd.special(7)
Disable automatic sleep:
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
Check the status again with the same status command as before:
● sleep.target
# The Loaded state is now "masked", which means automatic sleep is disabled
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
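The check above can also be scripted. The following is a minimal sketch that assumes the `systemctl status` output format shown above; `loaded_state` is a helper name made up for this example, not a systemd command.

```shell
# loaded_state (hypothetical helper): read `systemctl status` text on stdin
# and print the first word after "Loaded:" (for example "loaded" or "masked").
loaded_state() {
    awk '$1 == "Loaded:" { print $2; exit }'
}

# Usage on a real system (assumes systemd is running):
#   systemctl status sleep.target | loaded_state
# To undo the change later, unmask the same four targets:
#   sudo systemctl unmask sleep.target suspend.target hibernate.target hybrid-sleep.target
```

The helper only parses text, so it can be tried on saved output before being pointed at a live system.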
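2. Change the software sources
Installing software from Ubuntu's default sources can be unreliable in speed and is sometimes quite slow. Whether to switch is up to you; if you have already changed your sources, skip this step. (Note that mirrors in China do not sync in real time. If you want updates as soon as they are published, switch back to Ubuntu's default sources.)
1. First back up the current sources file, saving the copy in the current directory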
sudo cp /etc/apt/sources.list sources.list.old
2. Confirm the Ubuntu version (the sources must match the Ubuntu release). In a terminal, run
lsb_release -a
Output:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04 # Ubuntu version number
Codename: xenial
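Here are several mirrors:
Tsinghua University mirror
University of Science and Technology of China (USTC) mirror
Alibaba Cloud mirror
3. Edit the sources configuration file
This post switches the Ubuntu sources to the Alibaba Cloud mirror.
In a terminal, run: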
sudo vi /etc/apt/sources.list
This shows the current sources configuration:
#deb cdrom:[Ubuntu 16.04 LTS _Xenial Xerus_ - Release amd64 (20160420.1)]/ xenial main restricted
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial main restricted
## Major bug fix updates produced after the final release of the
## distribution.
deb http://us.archive.ubuntu.com/ubuntu/ xenial-updates main restricted
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-updates main restricted
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## universe WILL NOT receive any review or updates from the Ubuntu security
## team.
deb http://us.archive.ubuntu.com/ubuntu/ xenial universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial universe
deb http://us.archive.ubuntu.com/ubuntu/ xenial-updates universe
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://us.archive.ubuntu.com/ubuntu/ xenial multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial multiverse
deb http://us.archive.ubuntu.com/ubuntu/ xenial-updates multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-updates multiverse
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse
# deb-src http://us.archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse
## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu xenial partner
# deb-src http://archive.canonical.com/ubuntu xenial partner
deb http://security.ubuntu.com/ubuntu xenial-security main restricted
# deb-src http://security.ubuntu.com/ubuntu xenial-security main restricted
deb http://security.ubuntu.com/ubuntu xenial-security universe
# deb-src http://security.ubuntu.com/ubuntu xenial-security universe
deb http://security.ubuntu.com/ubuntu xenial-security multiverse
# deb-src http://security.ubuntu.com/ubuntu xenial-security multiverse
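Switch the keyboard to English input mode and hold down d to delete all of the default entries (in vi's normal mode, each pair of d keystrokes deletes one line).
Open the Alibaba Cloud mirror page, choose the entries for your release, and copy them.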
deb https://mirrors.aliyun.com/ubuntu/ xenial main
deb-src https://mirrors.aliyun.com/ubuntu/ xenial main
deb https://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb-src https://mirrors.aliyun.com/ubuntu/ xenial-updates main
deb https://mirrors.aliyun.com/ubuntu/ xenial universe
deb-src https://mirrors.aliyun.com/ubuntu/ xenial universe
deb https://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb-src https://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb https://mirrors.aliyun.com/ubuntu/ xenial-security main
deb-src https://mirrors.aliyun.com/ubuntu/ xenial-security main
deb https://mirrors.aliyun.com/ubuntu/ xenial-security universe
deb-src https://mirrors.aliyun.com/ubuntu/ xenial-security universe
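Then switch back to the terminal window, press i to enter insert mode, and right-click to paste the clipboard contents into the terminal. Press Esc to leave insert mode and type :wq to save the file. If you make a mistake and are not sure how to fix it, type :q! to quit without saving, then paste again.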
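As an alternative to pasting into vi, the same entries can be generated by a short script. This is only a sketch: `gen_sources` and the file name `new-sources.list` are hypothetical, and the lines come out in a different order from the list above, which should not matter to apt.

```shell
# gen_sources (hypothetical helper): print deb/deb-src entries for one mirror
# and one release codename, covering the same suites and components as the
# list above (main and universe for the release, -updates and -security).
gen_sources() {
    local mirror=$1 codename=$2 suite component
    for suite in "$codename" "$codename-updates" "$codename-security"; do
        for component in main universe; do
            echo "deb $mirror $suite $component"
            echo "deb-src $mirror $suite $component"
        done
    done
}

# Usage (assumption: you review new-sources.list before installing it):
#   gen_sources https://mirrors.aliyun.com/ubuntu/ "$(lsb_release -cs)" > new-sources.list
#   sudo cp new-sources.list /etc/apt/sources.list
```

Taking the codename from `lsb_release -cs` keeps the generated entries matched to the installed release, which is the requirement stated in step 2.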
Update the package index:
sudo apt-get update
After the update finishes, the following error is reported:
E: Problem executing scripts APT::Update::Post-Invoke-Success 'if /usr/bin/test -w /var/cache/app-info -a -e /usr/bin/appstreamcli; then appstreamcli refresh > /dev/null; fi'
E: Sub-process returned an error code
Run the following commands in order:
cd /tmp && mkdir asfix
cd asfix
wget https://launchpad.net/ubuntu/+archive/primary/+files/appstream_0.9.4-1ubuntu1_amd64.deb --no-check-certificate
wget https://launchpad.net/ubuntu/+archive/primary/+files/libappstream3_0.9.4-1ubuntu1_amd64.deb --no-check-certificate
sudo dpkg -i *.deb
Running the update again now completes without errors:
Hit:1 https://mirrors.aliyun.com/ubuntu xenial InRelease
Hit:2 https://mirrors.aliyun.com/ubuntu xenial-updates InRelease
Hit:3 https://mirrors.aliyun.com/ubuntu xenial-security InRelease
Reading package lists... Done
Upgrade the installed packages:
sudo apt-get upgrade
If you want to be sure, run both once more:
sudo apt-get update && sudo apt-get upgrade -y
Installing Moses
This installation guide mainly follows:
The Moses website
The official Moses manual (installation is covered in chapter 2)
How to install Moses (Statistical Machine Translation) on Ubuntu?
1. Install the dependencies:
sudo apt-get install build-essential git-core pkg-config automake libtool wget zlib1g-dev libicu-dev python-dev libbz2-dev libsoap-lite-perl subversion libboost-all-dev liblzma-dev graphviz imagemagick make cmake libgoogle-perftools-dev autoconf doxygen
If you run into package dependency problems, try reinstalling with the aptitude package manager:
sudo apt-get install aptitude
sudo aptitude install build-essential git-core pkg-config automake libtool wget zlib1g-dev libicu-dev python-dev libbz2-dev libsoap-lite-perl subversion libboost-all-dev liblzma-dev graphviz imagemagick make cmake libgoogle-perftools-dev autoconf doxygen
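To see which dependencies are still missing after either command, something like the following could be used. It is a sketch that assumes a Debian-style system where `dpkg -s` prints a `Status:` line; `missing_pkgs` is a hypothetical helper name.

```shell
# missing_pkgs (hypothetical helper): print each named package whose dpkg
# status is not "install ok installed".
missing_pkgs() {
    local pkg
    for pkg in "$@"; do
        dpkg -s "$pkg" 2>/dev/null | grep -q '^Status: install ok installed' \
            || echo "$pkg"
    done
}

# Usage with a few of the packages from the list above:
#   missing_pkgs build-essential libboost-all-dev libicu-dev cmake
```

An empty result means every listed package is installed.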
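2. Check the gcc and g++ versions
Newer versions of gcc may produce errors when installing IRSTLM below. I have tested gcc 4.8 and gcc 4.9, and both installed without problems.
First open sources.list: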
sudo vi /etc/apt/sources.list
Append the following at the end of the file:
#gcc-4.9 g++-4.9 g++-4.9-multilib
deb http://dk.archive.ubuntu.com/ubuntu xenial main
deb http://dk.archive.ubuntu.com/ubuntu xenial universe
Update the package index:
sudo apt-get update
Install gcc 4.9 and g++ 4.9:
sudo apt-get install gcc-4.9 g++-4.9 g++-4.9-multilib
Set gcc 4.9 and g++ 4.9 as the default compilers:
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 50
If you have several gcc and g++ versions installed, you can also choose the default compiler with the following commands:
sudo update-alternatives --config gcc
sudo update-alternatives --config g++
Confirm the current compiler versions:
gcc -v
g++ -v
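The version check can be automated along these lines. This is a sketch: `version_le` is a hypothetical helper, it relies on GNU `sort -V`, and the bound `4.9.99` is an arbitrary stand-in for "anything in the 4.9 series or older".

```shell
# version_le (hypothetical helper): succeed when version $1 <= version $2,
# using GNU sort's version ordering (-V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage: warn when the default gcc is newer than the 4.9 series.
#   version_le "$(gcc -dumpversion)" 4.9.99 || echo "default gcc is newer than 4.9"
```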
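3. Create the Moses working directory and the package download directory
The steps below compile Moses with a custom set of components. Moses also provides a simpler way to build, which is described at the end of this article.
The custom approach uses bjam to compile Moses, so you can add whichever features you want. For other options, see the official Moses manual: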
./bjam --with-irstlm=/path/to/irstlm # integrate the irstlm language model
--with-randlm=/path/to/randlm # integrate the randlm language model
--with-nplm=/path/to/nplm # integrate the nplm language model
--with-srilm=/path/to/srilm # integrate the srilm language model
--with-boost=/path/to/boost # location of the boost installation
--with-xmlrpc-c=/path/to/xmlrpc-c # location of the xmlrpc-c installation
--with-cmph=/path/to/cmph # location of the cmph installation
--without-tcmalloc # build without tcmalloc
--with-regtest=/path/to/moses-regression-tests # location of the regression tests
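The download directory holds the packages needed to compile Moses: boost 1.72.0, giza++, irstlm 5.80.08, cmph 2.0, and xmlrpc-c 1.33.17. They will be installed into the Moses working directory:
sudo mkdir /home/moses # Moses working directory
sudo mkdir /home/downloads # package download directory
Switch to the download directory and download the packages: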
cd /home/downloads
sudo wget https://boostorg.jfrog.io/artifactory/main/release/1.72.0/source/boost_1_72_0.tar.gz
sudo wget https://jaist.dl.sourceforge.net/project/irstlm/irstlm/irstlm-5.80/irstlm-5.80.08.tgz
sudo wget http://downloads.sourceforge.net/project/cmph/cmph/cmph-2.0.tar.gz
sudo wget http://downloads.sourceforge.net/project/xmlrpc-c/Xmlrpc-c%20Super%20Stable/1.33.17/xmlrpc-c-1.33.17.tgz
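Because these downloads can be slow and may be interrupted, it can help to make the step repeatable. The sketch below skips files that are already present and only moves a file into place once `wget` has finished. `fetch_all` is a hypothetical helper, and it assumes the target directory is writable by the current user.

```shell
# fetch_all (hypothetical helper): download each URL into a directory,
# skipping files that already exist. Each file is written to "<name>.part"
# first and renamed only if wget succeeds, so an interrupted download is
# not mistaken for a complete one.
fetch_all() {
    local dir=$1 url file
    shift
    for url in "$@"; do
        file="$dir/$(basename "$url")"
        if [ -s "$file" ]; then
            echo "skip $file"
        else
            wget -O "$file.part" "$url" && mv "$file.part" "$file"
        fi
    done
}

# Usage:
#   fetch_all /home/downloads \
#       http://downloads.sourceforge.net/project/cmph/cmph/cmph-2.0.tar.gz
```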
4. Install boost 1.72.0
cd /home/downloads
sudo tar zxvf boost_1_72_0.tar.gz
cd boost_1_72_0/
sudo ./bootstrap.sh --prefix=/home/moses/boost
sudo ./b2 --prefix=/home/moses/boost --libdir=/home/moses/boost/lib64 --layout=system link=static install || echo FAILURE
If no error message is shown, boost has been installed.
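For a more explicit check than the absence of errors, the installed headers can be inspected. The sketch below reads the `BOOST_LIB_VERSION` macro from `version.hpp` under the install prefix; `boost_lib_version` is a hypothetical helper name.

```shell
# boost_lib_version (hypothetical helper): print the BOOST_LIB_VERSION string
# defined in include/boost/version.hpp under the given install prefix.
boost_lib_version() {
    local header="$1/include/boost/version.hpp"
    [ -f "$header" ] || { echo "no Boost headers under $1" >&2; return 1; }
    sed -n 's/^#define[[:space:]][[:space:]]*BOOST_LIB_VERSION[[:space:]][[:space:]]*"\([^"]*\)".*/\1/p' "$header"
}

# Usage with the prefix from this step:
#   boost_lib_version /home/moses/boost
```

The function returns a non-zero status when the header is missing, which would indicate that the install step did not complete.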
5. Install irstlm 5.80.08:
cd /home/downloads
sudo tar zxvf irstlm-5.80.08.tgz
cd irstlm-5.80.08/trunk
sudo ./regenerate-makefiles.sh
sudo ./configure --prefix=/home/moses/irstlm
sudo make
sudo make install
6. Install cmph 2.0:
cd /home/downloads
sudo tar zxvf cmph-2.0.tar.gz
cd cmph-2.0/
sudo ./configure --prefix=/home/moses/cmph
sudo make
sudo make install
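7. Install xmlrpc-c 1.33.17:
For some reason, running cd xmlrpc-c-1.33.17 at this step reported a permission error, so I switched to the root account with sudo su to do the installation. Commands run as root do not need sudo. Under normal circumstances the following commands are enough: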
cd /home/downloads
sudo tar zxvf xmlrpc-c-1.33.17.tgz
cd xmlrpc-c-1.33.17
sudo ./configure --prefix=/home/moses/xmlrpc
sudo make
sudo make install
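One plausible cause of the permission error is an assumption on my part, not something confirmed in the setup above: when GNU tar runs as root it restores the ownership and permission bits stored in the archive by default, so the extracted directory may not be accessible to a normal user. The sketch below extracts without sudo and asks tar not to restore either; `extract_here` is a hypothetical helper, and it assumes the destination directory is writable by the current user.

```shell
# extract_here (hypothetical helper): unpack a .tgz/.tar.gz archive into a
# destination directory as the current user, without restoring the owner or
# the permission bits recorded in the archive.
extract_here() {
    local archive=$1 dest=$2
    mkdir -p "$dest" &&
        tar --no-same-owner --no-same-permissions -xzf "$archive" -C "$dest"
}

# Usage (assumes /home/downloads is writable by your user):
#   extract_here /home/downloads/xmlrpc-c-1.33.17.tgz /home/downloads
```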
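8. Install giza++
Install giza++ in the Moses working directory. The source is cloned here through a GitHub caching mirror, which can speed things up a little; whether to use it depends on your network. Other recommended word alignment tools are mgiza++ and Berkeley Aligner; mgiza++ is the multithreaded version of giza++.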
cd /home/moses
sudo git clone https://gitclone.com/github.com/moses-smt/giza-pp.git
# sudo git clone https://github.com/moses-smt/giza-pp.git
cd giza-pp
sudo make
9. Download the Moses source code
cd /home/moses
sudo git clone https://gitclone.com/github.com/moses-smt/mosesdecoder.git
# sudo git clone https://github.com/moses-smt/mosesdecoder.git
At this point, /home/moses contains the following folders: boost, cmph, irstlm, xmlrpc, and giza-pp are the packages we just installed, and mosesdecoder is the downloaded Moses source code.
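Before compiling, it may be worth confirming that each dependency really landed in its prefix. The sketch below checks for the same marker files that install-dependencies.gmake (quoted near the end of this post) uses as make targets, under the directory names chosen in the steps above. `check_prefixes` is a hypothetical helper name.

```shell
# check_prefixes (hypothetical helper): for a Moses root directory, report
# whether each dependency's marker file or directory exists. Returns non-zero
# if anything is missing.
check_prefixes() {
    local root=$1 status=0 marker
    for marker in boost/include/boost cmph/bin/cmph \
                  irstlm/bin/build-lm.sh xmlrpc/bin/xmlrpc-c-config; do
        if [ -e "$root/$marker" ]; then
            echo "ok      $marker"
        else
            echo "missing $marker"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_prefixes /home/moses
```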
Next, create a tools folder inside mosesdecoder and copy the following three executables from the giza-pp folder into it:
cd /home/moses/
sudo mkdir /home/moses/mosesdecoder/tools
sudo cp ./giza-pp/GIZA++-v2/GIZA++ ./giza-pp/GIZA++-v2/snt2cooc.out ./giza-pp/mkcls-v2/mkcls ./mosesdecoder/tools
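10. Compile Moses
Change into mosesdecoder. When compiling, it is best to use absolute paths, and the paths must not contain spaces; relative paths may cause errors. The build is fairly slow. When it finishes by printing success, the compilation succeeded.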
cd /home/moses/mosesdecoder
sudo ./bjam --with-boost=/home/moses/boost --with-cmph=/home/moses/cmph --with-irstlm=/home/moses/irstlm --with-xmlrpc-c=/home/moses/xmlrpc --with-giza=/home/moses/giza-pp
My laptop has an i5-6300HQ CPU (4 cores, 4 threads) and 16 GB of RAM. Compiling Moses inside the virtual machine took 45 minutes.
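The long command line above can be assembled from whatever is actually installed, and the build can be run in parallel to cut the time. This is a sketch under stated assumptions: `bjam_flags` is a hypothetical helper, the directory-to-flag mapping simply mirrors the command above, and `-j` is passed as the number of parallel jobs.

```shell
# bjam_flags (hypothetical helper): print one --with-<name>=<dir> flag per
# line for each dependency directory that exists under the given root.
bjam_flags() {
    local root=$1 pair dir flag
    for pair in boost:boost cmph:cmph irstlm:irstlm xmlrpc:xmlrpc-c; do
        dir=${pair%%:*}     # directory name under the root
        flag=${pair#*:}     # name used in the --with-... flag
        [ -d "$root/$dir" ] && echo "--with-$flag=$root/$dir"
    done
    return 0
}

# Usage (left unquoted on purpose so each flag becomes its own argument,
# which is safe only because the paths must not contain spaces):
#   cd /home/moses/mosesdecoder
#   sudo ./bjam -j"$(nproc)" $(bjam_flags /home/moses)
```

Any other options from the original command can be appended unchanged after the generated flags.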
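Some installation tutorials also run the command below. It is not needed in this post. It is meant to be used together with ./compile.sh and offers a simpler way to build Moses, but it is less customizable, and some of the downloads can take a very long time because of network conditions, so you may need to edit the download URLs in the file by hand.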
cd /home/moses/mosesdecoder
sudo make -f contrib/Makefiles/install-dependencies.gmake
install-dependencies.gmake pins the versions of the third-party packages: boost 1.68.0, irstlm-5.80.08, cmph 2.0, and xmlrpc-c 1.33.17.
# -*- mode: makefile; tab-width: 4; -*-
# Makefile for installing 3rd-party software required to build Moses.
# author: Ulrich Germann
#
# run as
# make -f /path/to/this/file
#
# By default, everything will be installed in ./opt.
# If you want an alternative destination specify PREFIX=... with the make call
#
# make -f /path/to/this/file PREFIX=/where/to/install/things
#
# The name of the current directory must not contain spaces! The build scripts for
# at least some of the external software can't handle them.
space :=
space +=
# $(CWD) may contain space, safepath escapes them
# Update: doesn't work, because the build scripts for some of the external packages
# can't handle spaces in path names.
safepath=$(subst $(space),\$(space),$1)
# current working directory: bit of a hack to get the nfs-accessible
# path instead of the local real path
CWD := $(shell cd . && pwd)
# by default, we install in ./opt and build in ./build
PREFIX ?= $(CWD)/opt
BUILD_DIR = $(CWD)/opt/build/${URL}
# you can also specify specific prefixes for different packages:
XMLRPC_PREFIX ?= ${PREFIX}
CMPH_PREFIX ?= ${PREFIX}
IRSTLM_PREFIX ?= ${PREFIX}/irstlm-5.80.08
BOOST_PREFIX ?= ${PREFIX}
# currently, the full enchilada means xmlrpc-c, cmph, irstlm, boost
all: xmlrpc cmph boost
# we use bash and fail when pipelines fail
SHELL = /bin/bash -e -o pipefail
# evaluate prefixes now to avoid recursive evaluation problems later ...
XMLRPC_PREFIX := ${XMLRPC_PREFIX}
CMPH_PREFIX := ${CMPH_PREFIX}
IRSTLM_PREFIX := ${IRSTLM_PREFIX}
BOOST_PREFIX := ${BOOST_PREFIX}
# Code repositories:
github = https://github.com/
sourceforge = http://downloads.sourceforge.net/project
# functions for building software from sourceforge
nproc := $(shell getconf _NPROCESSORS_ONLN)
sfget = mkdir -p '${TMP}' && cd '${TMP}' && wget -qO- ${URL} | tar xz
configure-make-install = cd '$1' && ./configure --prefix='${PREFIX}'
configure-make-install += && make -j${nproc} && make install
# XMLRPC-C for moses server
xmlrpc: URL=$(sourceforge)/xmlrpc-c/Xmlrpc-c%20Super%20Stable/1.33.17/xmlrpc-c-1.33.17.tgz
xmlrpc: TMP=$(CWD)/build/xmlrpc
xmlrpc: override PREFIX=${XMLRPC_PREFIX}
xmlrpc: | $(call safepath,${XMLRPC_PREFIX}/bin/xmlrpc-c-config)
$(call safepath,${XMLRPC_PREFIX}/bin/xmlrpc-c-config):
	$(sfget)
	$(call configure-make-install,${TMP}/xmlrpc-c-1.33.17)
	rm -rf ${TMP}
# CMPH for CompactPT
cmph: URL=$(sourceforge)/cmph/cmph/cmph-2.0.tar.gz
cmph: TMP=$(CWD)/build/cmph
cmph: override PREFIX=${CMPH_PREFIX}
cmph: | $(call safepath,${CMPH_PREFIX}/bin/cmph)
$(call safepath,${CMPH_PREFIX}/bin/cmph):
	$(sfget)
	$(call configure-make-install,${TMP}/cmph-2.0)
	rm -rf ${TMP}
# irstlm for irstlm
irstlm: URL=$(sourceforge)/irstlm/irstlm/irstlm-5.80/irstlm-5.80.08.tgz
irstlm: TMP=$(CWD)/build/irstlm
irstlm: VERSION=$(basename $(notdir $(irstlm_url)))
irstlm: override PREFIX=${IRSTLM_PREFIX}
irstlm: | $(call safepath,$(IRSTLM_PREFIX)/bin/build-lm.sh)
$(call safepath,$(IRSTLM_PREFIX)/bin/build-lm.sh):
	$(sfget)
	cd $$(find '${TMP}' -name trunk) && ./regenerate-makefiles.sh \
	&& ./configure --prefix='${PREFIX}' && make -j${nproc} && make install -j${nproc}
	rm -rf ${TMP}
# boost
boost: VERSION=1.68.0
boost: UNDERSCORED=$(subst .,_,$(VERSION))
boost: URL=http://sourceforge.net/projects/boost/files/boost/${VERSION}/boost_${UNDERSCORED}.tar.gz/download
boost: TMP=$(CWD)/build/boost
boost: override PREFIX=${BOOST_PREFIX}
boost: | $(call safepath,${BOOST_PREFIX}/include/boost)
$(call safepath,${BOOST_PREFIX}/include/boost):
	$(sfget)
	cd '${TMP}/boost_${UNDERSCORED}' && ./bootstrap.sh && ./b2 --prefix=${PREFIX} -j${nproc} --layout=system link=static install
	rm -rf ${TMP}