Integrating Hudi with Flink


Installing Maven

1) Upload apache-maven-3.6.3-bin.tar.gz to the /opt/software directory, then extract and rename it

tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /opt/module/

mv /opt/module/apache-maven-3.6.3 /opt/module/maven

2) Add the environment variables to /etc/profile

sudo vim /etc/profile

#MAVEN_HOME

export MAVEN_HOME=/opt/module/maven

export PATH=$PATH:$MAVEN_HOME/bin
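The profile edit in step 2 can also be scripted instead of done in vim. The following is a minimal sketch, not part of the original steps: `append_once` is a hypothetical helper that appends a line to a file only if an identical line is not already there, so running it twice does not duplicate the exports.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
# (Hypothetical helper for illustration; not part of Maven or the original steps.)
append_once() {
    file=$1
    line=$2
    # -x matches the whole line, -F treats the pattern as a fixed string, -q is quiet.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, run as root: `append_once /etc/profile 'export MAVEN_HOME=/opt/module/maven'` followed by `append_once /etc/profile 'export PATH=$PATH:$MAVEN_HOME/bin'`.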

3) Verify the installation

source /etc/profile

mvn -v

Edit settings.xml to use the Aliyun repository mirror

vim /opt/module/maven/conf/settings.xml

Add the following inside the <mirrors> element:

<mirror>
    <id>nexus-aliyun</id>
    <mirrorOf>central</mirrorOf>
    <name>Nexus aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>

Building Hudi from Source

Source download: https://github.com/apache/hudi/releases/tag/release-0.12.0

Extract the Hudi source archive into /opt/software

cd /opt/software

tar -zxvf hudi-release-0.12.0.tar.gz

Edit the pom file

vim /opt/software/hudi-0.12.0/pom.xml

Add a repository inside the <repositories> element to speed up dependency downloads

<repository>
    <id>nexus-aliyun</id>
    <name>nexus-aliyun</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>

3) Change the versions of the dependent components

<hadoop.version>3.1.3</hadoop.version>

<hive.version>3.1.2</hive.version>
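The version change can be applied without opening an editor. This is a sketch under stated assumptions: `set_pom_property` is a hypothetical helper name, and it assumes each property sits on a single line in the form `<name>value</name>`, as in the two lines above.

```shell
#!/bin/sh
# set_pom_property POM NAME VALUE: rewrite <NAME>...</NAME> to <NAME>VALUE</NAME>.
# (Hypothetical helper; assumes the element opens and closes on one line.)
set_pom_property() {
    pom=$1
    name=$2
    value=$3
    # Write to a temporary file first so a failed sed does not truncate the pom.
    sed "s|<$name>[^<]*</$name>|<$name>$value</$name>|" "$pom" > "$pom.tmp" \
        && mv "$pom.tmp" "$pom"
}
```

Usage would look like `set_pom_property /opt/software/hudi-0.12.0/pom.xml hadoop.version 3.1.3`. Note that the substitution touches every matching element in the file, including any inside Maven profiles, so it is worth reviewing the result before building.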

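Patch the source for Hadoop 3 compatibility

Hudi depends on Hadoop 2 by default. To make it compatible with Hadoop 3, changing the version is not enough; the following code must also be modified: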

vim /opt/software/hudi-0.12.0/hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieParquetDataBlock.java

Modify line 110: the original call takes a single argument; add null as a second argument.


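4) Manually install the Kafka dependencies

Several Kafka dependencies have to be installed by hand; otherwise the build fails with the following error: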

[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.12.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Failure to find io.confluent:kafka-avro-serializer:jar:5.3.4 in https://maven.aliyun.com/repository/public was cached in the local repository, resolution will not be reattempted until the update interval of aliyunmaven has elapsed or updates are forced -> [Help 1]

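Download the jars

Download from: http://packages.confluent.io/archive/5.3/confluent-5.3.4-2.12.zip

After extracting the archive, locate the following jars and upload them to any directory on the hadoop102 server

The jars are also included in the materials package for this course.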

common-config-5.3.4.jar
common-utils-5.3.4.jar
kafka-avro-serializer-5.3.4.jar
kafka-schema-registry-client-5.3.4.jar

Install them into the local Maven repository (run from the directory that holds the jars)

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-config-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.4 -Dpackaging=jar -Dfile=./common-utils-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-avro-serializer-5.3.4.jar

mvn install:install-file -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.4 -Dpackaging=jar -Dfile=./kafka-schema-registry-client-5.3.4.jar
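The four commands above differ only in the artifact name, so they can be generated in a loop. The sketch below prints the commands rather than running them; `install_cmd` and `print_install_cmds` are hypothetical helper names, and the optional directory argument is an assumption about where the jars were uploaded.

```shell
#!/bin/sh
VERSION=5.3.4
ARTIFACTS="common-config common-utils kafka-avro-serializer kafka-schema-registry-client"

# install_cmd ARTIFACT [JAR_DIR]: print the mvn command for one Confluent jar.
install_cmd() {
    artifact=$1
    jar_dir=${2:-.}
    printf 'mvn install:install-file -DgroupId=io.confluent -DartifactId=%s -Dversion=%s -Dpackaging=jar -Dfile=%s/%s-%s.jar\n' \
        "$artifact" "$VERSION" "$jar_dir" "$artifact" "$VERSION"
}

# print_install_cmds [JAR_DIR]: print the commands for all four artifacts.
print_install_cmds() {
    for a in $ARTIFACTS; do
        install_cmd "$a" "${1:-.}"
    done
}
```

Running `print_install_cmds` shows the commands for inspection; piping the output to `sh` executes them.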

5) Integrating Hudi with Flink

Copy the compiled hudi-flink1.14-bundle-0.12.0.jar into Flink's lib directory

cp /opt/software/hudi-0.12.0/packaging/hudi-flink-bundle/target/hudi-flink1.14-bundle-0.12.0.jar /opt/module/flink/lib/

Resolve the guava dependency conflict

cp /opt/module/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar /opt/module/flink/lib/

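Put the connector jars used by the project into Flink's lib directory

# Jars to download and place in Flink's lib (these URLs are the Flink 1.13.6 builds; use the builds that match your Flink version)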

https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.12/1.13.6/flink-sql-connector-kafka_2.12-1.13.6.jar

https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.2.1/flink-sql-connector-mysql-cdc-2.2.1.jar

https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-hive-3.1.2_2.12/1.13.6/flink-sql-connector-hive-3.1.2_2.12-1.13.6.jar

Notes:

(1) The guava conflict in the hive connector must be resolved. Open the jar with an archive tool and delete the google folder under the com directory


(2) Fix the missing Hadoop dependency

cp /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.1.3.jar /opt/module/flink/lib/
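Note (1) can also be handled on the command line rather than in a graphical archive tool. The following is a minimal sketch, assuming the Info-ZIP `zip` utility is installed on the server; `strip_guava` is a hypothetical helper name, and it keeps a `.bak` copy of the jar before modifying it.

```shell
#!/bin/sh
# strip_guava JAR: delete every entry under com/google/ from JAR.
# (Hypothetical helper; requires the Info-ZIP `zip` tool.)
strip_guava() {
    jar=$1
    # Keep a backup so the original connector jar can be restored.
    cp "$jar" "$jar.bak" || return 1
    # -d deletes the entries matching the pattern from the archive in place.
    zip -q -d "$jar" 'com/google/*'
}
```

For the jar listed above this would be `strip_guava /opt/module/flink/lib/flink-sql-connector-hive-3.1.2_2.12-1.13.6.jar`. Move the `.bak` file out of the lib directory afterwards.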

Integrating Hudi with Hive

Extract Hive

Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host

Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/

tar -zxvf /opt/software/apache-hive-3.1.2-bin.tar.gz -C /opt/module/

Rename the extracted apache-hive-3.1.2-bin directory to hive

mv /opt/module/apache-hive-3.1.2-bin/ /opt/module/hive

Integrate Hudi into Hive

Put hudi-hadoop-mr-bundle-0.12.0.jar and hudi-hive-sync-bundle-0.12.0.jar into the lib directory on the Hive node:

cp /opt/software/hudi-0.12.0/packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.12.0.jar /opt/module/hive/lib/

cp /opt/software/hudi-0.12.0/packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.12.0.jar /opt/module/hive/lib/

Configure Hive and the environment variables

Edit /etc/profile.d/my_env.sh and add the environment variables

sudo vim /etc/profile.d/my_env.sh

Add the following:

#HIVE_HOME

export HIVE_HOME=/opt/module/hive

export PATH=$PATH:$HIVE_HOME/bin

Source the file

source /etc/profile.d/my_env.sh

Copy the MySQL JDBC driver into Hive's lib directory

cp /opt/software/mysql-connector-java-5.1.37.jar $HIVE_HOME/lib

Create a hive-site.xml file under $HIVE_HOME/conf

[atguigu@hadoop102 software]$ vim $HIVE_HOME/conf/hive-site.xml

Add the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC user name -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <!-- Default warehouse directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <!-- Metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <!-- Authorization for the metastore event notification API -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>

    <!-- Host that HiveServer2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Port that HiveServer2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- HiveServer2 active/passive high availability -->
    <property>
        <name>hive.server2.active.passive.ha.enable</name>
        <value>true</value>
    </property>

    <!-- Address of the metastore service -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>

    <!-- Print column headers in the CLI -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <!-- Show the current database in the CLI prompt -->
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>

Initialize the Hive metastore database

Log in to MySQL

mysql -uroot -p123456

Create the Hive metastore database

mysql> create database metastore;

mysql> quit;

Initialize the Hive metastore schema so that metadata is stored in MySQL (run from $HIVE_HOME)

bin/schematool -dbType mysql -initSchema -verbose

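Start the Hive Metastore and HiveServer2 services

Run the commands below from $HIVE_HOME. Each one stays in the foreground, so use a separate terminal for each. Because hive-site.xml sets hive.metastore.uris, start the metastore first so that HiveServer2 can reach it:

bin/hive --service metastore

bin/hive --service hiveserver2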
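Since both services block the terminal when started this way, a small wrapper that runs them in the background can be convenient. This is only a sketch: `start_service` is a hypothetical helper, and the `LOG_DIR` location is an assumption, not a path from the original setup.

```shell
#!/bin/sh
HIVE_HOME=${HIVE_HOME:-/opt/module/hive}
# Assumed log location for this sketch; change it to suit your environment.
LOG_DIR=${LOG_DIR:-$HIVE_HOME/logs}

# start_service NAME: run `hive --service NAME` in the background,
# sending stdout and stderr to LOG_DIR/NAME.log.
start_service() {
    name=$1
    mkdir -p "$LOG_DIR"
    nohup "$HIVE_HOME/bin/hive" --service "$name" > "$LOG_DIR/$name.log" 2>&1 &
    echo "$name started with pid $!"
}
```

Call `start_service metastore` and then `start_service hiveserver2`; the printed pid can be used later with `kill` to stop a service.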

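Syncing Flink to Hive

1) Usage

Flink hive sync currently supports two hive sync modes: hms and jdbc. The hms mode only requires the metastore uris to be configured, while the jdbc mode requires both the jdbc properties and the metastore uris. The configuration template is as follows: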

-- hms mode configuration
CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://xxx.xxx.xxx.xxx:9000/t1',
  'table.type' = 'COPY_ON_WRITE',        -- with MERGE_ON_READ, Hive shows no output until parquet files have been generated
  'hive_sync.enable' = 'true',           -- required, enables hive sync
  'hive_sync.table' = '${hive_table}',   -- required, name of the table created in Hive
  'hive_sync.db' = '${hive_db}',         -- required, name of the database created in Hive
  'hive_sync.mode' = 'hms',              -- required, sets the hive sync mode to hms (the default is jdbc)
  'hive_sync.metastore.uris' = 'thrift://ip:9083' -- required, address and port of the metastore
);
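The `${hive_table}` and `${hive_db}` placeholders in the template have to be filled in for each table. One way to do that is to generate the sync options from a script; the sketch below uses a hypothetical helper, `hive_sync_options`, that prints the five hms-mode options for a given database, table, and metastore URI.

```shell
#!/bin/sh
# hive_sync_options DB TABLE METASTORE_URI: print the hms-mode hive sync
# options, ready to paste into the WITH (...) clause of a Hudi table.
# (Hypothetical helper; the option names come from the template above.)
hive_sync_options() {
    db=$1
    table=$2
    uri=$3
    cat <<EOF
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.db' = '$db',
  'hive_sync.table' = '$table',
  'hive_sync.metastore.uris' = '$uri'
EOF
}
```

For instance, `hive_sync_options my_db my_table thrift://hadoop102:9083` prints the options with those three values substituted (the database and table names here are placeholders).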

2) Worked example

CREATE TABLE t11(
  id int,
  num int,
  ts int,
  primary key (id) not enforced
)
PARTITIONED BY (num)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://ns1:8020/hudi/hudi_dwd/t11',
  'table.type' = 'COPY_ON_WRITE',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = 'h10',
  'hive_sync.db' = 'smart_village',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://szxc-13:9083'
);

insert into t11 values(1,1,1);
