Install Maven
1) Upload apache-maven-3.6.3-bin.tar.gz to the /opt/software directory, then extract it and rename the extracted directory
tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module/
mv /opt/module/apache-maven-3.6.3 /opt/module/maven
2) Add the environment variables to /etc/profile
sudo vim /etc/profile
#MAVEN_HOME
export MAVEN_HOME=/opt/module/maven
export PATH=$PATH:$MAVEN_HOME/bin
3) Verify the installation
source /etc/profile
mvn -v
- Edit settings.xml and point Maven at the Aliyun mirror repository
vim /opt/module/maven/conf/settings.xml
<!-- Add the Aliyun mirror (inside the <mirrors> element) -->
<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
Compile the Hudi source code
Source download: https://github.com/apache/hudi/releases/tag/release-0.12.0
1) Extract the Hudi source package into /opt/software
cd /opt/software
tar -zxvf hudi-release-0.12.0.tar.gz
2) Modify the pom file
vim /opt/software/hudi-0.12.0/pom.xml
Add a repository (inside the <repositories> section) to speed up dependency downloads:
<repository>
    <id>nexus-aliyun</id>
    <name>nexus-aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
3) Update the versions of the dependent components (in the <properties> block):
<hadoop.version>3.1.3</hadoop.version>
<hive.version>3.1.2</hive.version>
Modify the source for Hadoop 3 compatibility
Hudi depends on Hadoop 2 by default. To make it compatible with Hadoop 3, besides changing the version above, the following code also needs to be modified:
vim /opt/software/hudi-0.12.0/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java
At line 110, the constructor call originally takes only one argument; add null as a second argument:
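A sketch of the change (the surrounding code is abridged and the exact line may shift slightly between releases; only the constructor call changes):
// Before: the single-argument FSDataOutputStream constructor, which no longer exists in Hadoop 3
// FSDataOutputStream outputStream = new FSDataOutputStream(baos);
// After: pass null as the second (statistics) argument so the code compiles against Hadoop 3
FSDataOutputStream outputStream = new FSDataOutputStream(baos, null);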
4) Manually install the Kafka dependencies
A few Kafka-related dependencies must be installed manually, otherwise the build fails with an error like the following:
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.12.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Failure to find io.confluent:kafka-avro-serializer:jar:5.3.4 in https://maven.aliyun.com/repository/public was cached in the local repository, resolution will not be reattempted until the update interval of aliyunmaven has elapsed or updates are forced -> [Help 1]
Download the jars
Download from: http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip
After extracting, locate the following jars and upload them to any location on the hadoop102 server.
(The jars are also included in this course's materials.)
- common-config-5.3.4.jar
- common-utils-5.3.4.jar
- kafka-avro-serializer-5.3.4.jar
- kafka-schema-registry-client-5.3.4.jar
Install them into the local Maven repository (run from the directory containing the jars):
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar
mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar
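With the Kafka dependencies installed, build Hudi from the source root. The invocation below is a sketch based on the Hudi 0.12.0 build profiles (flink1.14, scala-2.12, spark3.2, flink-bundle-shade-hive3); adjust the flags to match your environment:
cd /opt/software/hudi-0.12.0
mvn clean package -DskipTests -Dspark3.2 -Dflink1.14 -Dscala-2.12 -Dhadoop.version=3.1.3 -Pflink-bundle-shade-hive3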
5) Integrate Hudi with Flink
Copy the compiled hudi-flink1.14-bundle-0.12.0.jar into Flink's lib directory:
cp /opt/software/hudi-0.12.0/packaging/hudi-flink-bundle/target/hudi-flink1.14-bundle-0.12.0.jar /opt/module/flink/lib/
Resolve the guava dependency conflict:
cp /opt/module/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /opt/module/flink/lib/
Put the connector jars used by the project into Flink's lib directory.
# Jars to download and place in Flink's lib
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.12/1.13.6/flink-sql-connector-kafka_2.12-1.13.6.jar
https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/flink-sql-connector-mysql-cdc-2.2.1.jar
https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-hive-3.1.2_2.12/1.13.6/flink-sql-connector-hive-3.1.2_2.12-1.13.6.jar
Notes:
(1) The guava conflict must also be resolved for the hive connector: open the jar with an archive tool and delete the google folder under the com directory (see the command-line sketch after this list).
(2) Resolve the missing Hadoop dependency:
cp /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.1.3.jar /opt/module/flink/lib/
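One way to strip the bundled guava classes from the hive connector jar on the command line (a sketch, assuming the zip utility is available and the jar is in the current directory):
# delete the shaded com/google classes from the hive connector jar
zip -d flink-sql-connector-hive-3.1.2_2.12-1.13.6.jar 'com/google/*'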
Integrate Hudi with Hive
Extract Hive
Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host.
Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/:
tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/
Rename the apache-hive-3.1.2-bin directory to hive:
mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive
Add the Hudi bundles to Hive
Copy hudi-hadoop-mr-bundle-0.12.0.jar and hudi-hive-sync-bundle-0.12.0.jar into the lib directory on the Hive node:
cp /opt/software/hudi-0.12.0/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.12.0.jar /opt/module/hive/lib/
cp /opt/software/hudi-0.12.0/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.12.0.jar /opt/module/hive/lib/
Configure Hive and the environment variables
Edit /etc/profile.d/my_env.sh and add the environment variables:
sudo vim /etc/profile.d/my_env.sh
Add the following:
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
Source the file:
source /etc/profile.d/my_env.sh
Copy the MySQL JDBC driver into Hive's lib directory:
cp /opt/software/mysql-connector-java-5.1.37.jar $HIVE_HOME/lib
Create a new hive-site.xml file in the $HIVE_HOME/conf directory:
[atguigu@hadoop102 software]$ vim $HIVE_HOME/conf/hive-site.xml
Add the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC connection URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <!-- JDBC connection driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- JDBC connection username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <!-- JDBC connection password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <!-- Default Hive working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <!-- Hive metastore schema verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- Metastore event notification API authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>
    <!-- Port that HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <!-- HiveServer2 HA parameter; enabling it speeds up HiveServer2 startup -->
    <property>
        <name>hive.server2.active.passive.ha.enable</name>
        <value>true</value>
    </property>
    <!-- Metastore service address -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>
    <!-- Print column headers -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <!-- Print the current database name -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
Initialize the Hive metastore database
Log in to MySQL:
mysql -uroot -p123456
Create the Hive metastore database:
mysql> create database metastore;
mysql> quit;
Initialize the Hive metastore schema (so that metadata is stored in MySQL):
bin/schematool -dbType mysql -initSchema -verbose
Start the Hive Metastore and HiveServer2 services (a helper script sketch is included below)
The commands to start HiveServer2 and the Metastore are:
bin/hive --service hiveserver2
bin/hive --service metastore
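A minimal helper script to start both services in the background, as a sketch (the script name and log directory are assumptions; adjust the paths for your environment):
#!/bin/bash
# hiveservices-start.sh -- start the metastore and hiveserver2 in the background
HIVE_LOG_DIR=$HIVE_HOME/logs
mkdir -p $HIVE_LOG_DIR
# start the metastore service first
nohup $HIVE_HOME/bin/hive --service metastore > $HIVE_LOG_DIR/metastore.log 2>&1 &
# hiveserver2 depends on the metastore
nohup $HIVE_HOME/bin/hive --service hiveserver2 > $HIVE_LOG_DIR/hiveserver2.log 2>&1 &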
Flink Hive sync
1) Usage
Flink hive sync currently supports two hive sync modes: hms and jdbc. The hms mode only needs the metastore uris configured, while jdbc mode needs both the jdbc properties and the metastore uris. A configuration template follows:
## hms mode configuration
CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
with(
  'connector'='hudi',
  'path' = 'hdfs://xxx.xxx.xxx.xxx:9000/t1',
  'table.type'='COPY_ON_WRITE',        -- with MERGE_ON_READ, Hive has no output until parquet files are generated
  'hive_sync.enable'='true',           -- required: enable hive sync
  'hive_sync.table'='${hive_table}',   -- required: name of the new hive table
  'hive_sync.db'='${hive_db}',         -- required: name of the new hive database
  'hive_sync.mode' = 'hms',            -- required: set the hive sync mode to hms (default is jdbc)
  'hive_sync.metastore.uris' = 'thrift://ip:9083'  -- required: metastore uri
);
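For jdbc mode, a HiveServer2 JDBC endpoint is needed in addition to the metastore uris. A minimal sketch follows; the option names hive_sync.jdbc_url, hive_sync.username and hive_sync.password are taken from the Hudi Flink options, and all endpoint values are placeholders for your environment:
## jdbc mode configuration (sketch)
CREATE TABLE t2(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
with(
  'connector'='hudi',
  'path' = 'hdfs://xxx.xxx.xxx.xxx:9000/t2',
  'table.type'='COPY_ON_WRITE',
  'hive_sync.enable'='true',
  'hive_sync.table'='${hive_table}',
  'hive_sync.db'='${hive_db}',
  'hive_sync.mode' = 'jdbc',                        -- jdbc sync mode
  'hive_sync.jdbc_url' = 'jdbc:hive2://ip:10000',   -- HiveServer2 JDBC url (placeholder host)
  'hive_sync.username' = 'hive',
  'hive_sync.password' = 'hive',
  'hive_sync.metastore.uris' = 'thrift://ip:9083'
);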
2) Hands-on example
CREATE TABLE t11(
  id int,
  num int,
  ts int,
  primary key (id) not enforced
)
PARTITIONED BY (num)
with(
  'connector'='hudi',
  'path' = 'hdfs://ns1:8020/hudi/hudi_dwd/t11',
  'table.type'='COPY_ON_WRITE',
  'hive_sync.enable'='true',
  'hive_sync.table'='h10',
  'hive_sync.db'='smart_village',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://szxc-13:9083'
);
insert into t11 values(1,1,1);
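After the insert commits, the synced table should be queryable from Hive; for example (a sketch, using the database and table names configured above):
-- in the Hive CLI or beeline
select * from smart_village.h10;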