

Big Data Technology: HBase (In-Depth Guide)

Author: 星川皆無恙 · Category: Toy Blog · Posted 2 years ago



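Chapter 1 Introduction to HBase

1.1 What Is HBase

HBase is modeled on Google's BigTable paper. It began as a Hadoop subproject and provides storage for structured data.
Official website: http://hbase.apache.org
– 2006: Google publishes the BigTable paper
– 2006: development of HBase begins
– 2008: while Beijing hosts the Olympics, programmers quietly turn HBase into a Hadoop subproject
– 2010: HBase becomes an Apache top-level project
– Today many companies ship their own distributions built on it, and you are starting to use it too.
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With it you can build a large structured-storage cluster on inexpensive PC servers.
The goal of HBase is to store and process very large data sets: using only commodity hardware, it aims to handle tables on the order of billions of rows by millions of columns.
HBase is an open-source implementation of Google Bigtable, with a number of differences. Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS. Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce. Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper as the counterpart.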

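1.2 HBase Features

1）Massive storage
HBase is suited to storing PB-scale data. Even at PB scale on inexpensive PC hardware, it can return data within tens to a few hundred milliseconds. This follows from how easily HBase scales: its scalability is what makes storing data at this volume practical.
2）Column-oriented storage
"Column-oriented" here really means column-family-oriented: HBase stores data by column family. A column family can contain a very large number of columns, and column families must be specified when the table is created.
3）Easy to scale
HBase scales in two ways: at the processing layer (RegionServer) and at the storage layer (HDFS).
Adding RegionServer machines scales HBase horizontally, raising its processing capacity and the number of Regions it can serve.
Note: a RegionServer manages Regions and handles client requests; this is covered in detail later.
Adding DataNode machines expands the storage layer, increasing HBase's storage capacity and the read/write capacity of the backing store.
4）High concurrency
Most HBase deployments run on inexpensive PCs, so the latency of a single I/O is not small, typically tens to hundreds of milliseconds. "High concurrency" here means that this per-I/O latency degrades little as concurrent load rises, so the service stays responsive under many simultaneous requests.
5）Sparse
Sparsity refers to the flexibility of HBase columns: within a column family you can define any number of columns, and a column with no data takes up no storage space.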
[Figure: HBase architecture]
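As the figure shows, HBase consists of the Client, ZooKeeper, the Master, HRegionServers, and HDFS. The role of each component is as follows:
1）Client
The Client contains the interfaces for accessing HBase. It also maintains a cache to speed up access, for example caching the .META. metadata.
2）ZooKeeper
HBase relies on ZooKeeper for Master high availability, RegionServer monitoring, the entry point to the metadata, and maintenance of cluster configuration. Specifically:
ZooKeeper ensures that only one Master is running in the cluster; if the Master fails, a new Master is chosen through a competition mechanism and takes over.
ZooKeeper monitors RegionServer state; when a RegionServer fails, the Master is notified through a callback that the RegionServer has gone online or offline.
ZooKeeper stores the single entry address of the metadata.
3）HMaster
The Master node's main responsibilities:
Assign Regions to RegionServers
Balance load across the whole cluster
Maintain the cluster's metadata
Detect failed Regions and reassign them to healthy RegionServers
Coordinate the splitting of the corresponding HLog when a RegionServer fails
4）HRegionServer
The HRegionServer serves user read and write requests directly; it is the node that does the real work. Its functions:
Manage the Regions assigned to it by the Master
Handle read and write requests from clients
Interact with the underlying HDFS and store data there
Split Regions that have grown too large
Compact StoreFiles
5）HDFS
HDFS provides the underlying data storage for HBase and supports HBase's high availability (the HLog is stored in HDFS). Its functions:
Provide the underlying distributed storage for metadata and table data
Keep multiple replicas of the data to ensure high reliability and availability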

1.3 Roles in HBase

1.3.1 HMaster

Functions:
1．Monitor RegionServers
2．Handle RegionServer failover
3．Handle metadata changes
4．Handle the assignment or movement of regions
5．Balance data load during idle time
6．Publish its own location to clients through ZooKeeper

1.3.2 RegionServer

Functions:
1．Store the actual HBase data
2．Serve the Regions assigned to it
3．Flush the cache to HDFS
4．Maintain the HLog
5．Perform compactions
6．Handle Region splits

1.3.3 Other Components

1．Write-Ahead Log (WAL)
The log of modifications made to HBase. When data is written to HBase, it is not written straight to disk; it stays in memory for a while (the time and size thresholds are configurable). Keeping data only in memory makes loss more likely, so each write goes first to a file called the Write-Ahead Log and only then into memory. If the system fails, the data can be rebuilt from this log.
2．Region
A shard of an HBase table. A table is split by RowKey into Regions stored on RegionServers; one RegionServer can host many Regions.
3．Store
HFiles are stored in Stores; one Store corresponds to one column family of a table.
4．MemStore
As the name suggests, an in-memory store that holds current data operations: once data has been saved to the WAL, the RegionServer stores the key-value pairs in memory.
5．HFile
The physical file on disk that holds the actual data. A StoreFile is stored in HDFS in HFile format.

Chapter 2 Installing HBase

2.1 Deploy ZooKeeper

First make sure the ZooKeeper cluster is deployed correctly, then start it:

[atguigu@hadoop102 zookeeper-3.4.10]$ bin/zkServer.sh start
[atguigu@hadoop103 zookeeper-3.4.10]$ bin/zkServer.sh start
[atguigu@hadoop104 zookeeper-3.4.10]$ bin/zkServer.sh start

2.2 Deploy Hadoop

Make sure the Hadoop cluster is deployed correctly, then start it:

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

2.3 Extract HBase

Extract HBase into the target directory:

[atguigu@hadoop102 software]$ tar -zxvf hbase-1.3.1-bin.tar.gz -C /opt/module

2.4 HBase Configuration Files

Edit the following HBase configuration files.
1）hbase-env.sh:

export JAVA_HOME=/opt/module/jdk1.8.0_144
export HBASE_MANAGES_ZK=false

2）hbase-site.xml:

<configuration>
	<property>
		<name>hbase.rootdir</name>
		<value>hdfs://hadoop102:9000/hbase</value>
	</property>

	<property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
	</property>

	<!-- Changed after 0.98: earlier versions had no .port property, and the default port was 60000 -->
	<property>
		<name>hbase.master.port</name>
		<value>16000</value>
	</property>

	<property>
		<name>hbase.zookeeper.quorum</name>
		<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
	</property>

	<property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/opt/module/zookeeper-3.4.10/zkData</value>
	</property>
</configuration>

3）regionservers：

hadoop102
hadoop103
hadoop104

4）Symlink the Hadoop configuration files into HBase:

[atguigu@hadoop102 module]$ ln -s /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml \
/opt/module/hbase/conf/core-site.xml

[atguigu@hadoop102 module]$ ln -s /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml \
/opt/module/hbase/conf/hdfs-site.xml

2.5 Distribute HBase to the Other Cluster Nodes

[atguigu@hadoop102 module]$ xsync hbase/

2.6 Start the HBase Services

1．Startup method 1

[atguigu@hadoop102 hbase]$ bin/hbase-daemon.sh start master
[atguigu@hadoop102 hbase]$ bin/hbase-daemon.sh start regionserver

Note: if the clocks of the cluster nodes are not synchronized, the regionserver will fail to start and throw a ClockOutOfSyncException.

Chapter 3 HBase Shell Operations

3.1 Basic Operations

1．Enter the HBase shell

[atguigu@hadoop102 hbase]$ bin/hbase shell

2．Show help

hbase(main):001:0> help

3．List the tables in the current database

hbase(main):002:0> list

3.2 Table Operations

1．Create a table

hbase(main):002:0> create 'student','info'

2．Insert data into the table

hbase(main):003:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'

3．Scan the table

hbase(main):008:0> scan 'student'

hbase(main):009:0> scan 'student',{STARTROW => '1001', STOPROW  => '1001'}

hbase(main):010:0> scan 'student',{STARTROW => '1001'}

4．Show the table structure

hbase(main):011:0> describe 'student'

5．Update the value of a given column

hbase(main):012:0> put 'student','1001','info:name','Nick'
hbase(main):013:0> put 'student','1001','info:age','100'

6．Get a given row, or a given "column family:column"

hbase(main):014:0> get 'student','1001'
hbase(main):015:0> get 'student','1001','info:name'

7．Count the rows in the table

hbase(main):021:0> count 'student'

8．Delete data
Delete all data for a rowkey:

hbase(main):016:0> deleteall 'student','1001'

Delete one column of a rowkey:

hbase(main):017:0> delete 'student','1002','info:sex'

9．Truncate the table

hbase(main):018:0> truncate 'student'

Note: truncate works by disabling the table first and then truncating it.

10．Drop the table
The table must first be put into the disabled state:

hbase(main):019:0> disable 'student'

Only then can it be dropped:

hbase(main):020:0> drop 'student'

Note: dropping an enabled table directly fails with: ERROR: Table student is enabled. Disable it first.

11．Alter the table
Keep 3 versions of the data in the info column family:

hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}

hbase(main):022:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}

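Chapter 4 HBase Data Structures

4.1 RowKey

As in other NoSQL databases, the RowKey is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:
1. Get by a single RowKey
2. Scan a range of RowKeys
3. Full table scan
A RowKey can be any string (maximum length 64 KB; in practice usually 10-100 bytes). Internally HBase stores it as a byte array. Rows are stored sorted by RowKey in lexicographic (byte) order. When designing RowKeys, take full advantage of this sorted storage and place rows that are often read together next to each other (locality).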
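
The effect of byte-order sorting is easy to underestimate, so here is a minimal sketch in plain Java (no HBase dependency). The class name RowKeyOrderDemo and its helpers are hypothetical and only imitate the ordering rule described above; they are not part of the HBase API. It shows that unpadded numeric keys do not sort numerically, and that zero-padding is one way to restore the expected order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that imitates byte-ordered row key sorting.
final class RowKeyOrderDemo {
    // Sort keys by comparing their raw bytes as unsigned values,
    // which is the lexicographic (byte) order described in the text.
    static List<String> sortByByteOrder(List<String> rowKeys) {
        List<String> sorted = new ArrayList<>(rowKeys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    // Left-pad a numeric id with zeros so that byte order matches numeric order.
    static String zeroPad(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // "10" and "1001" sort before "2" and "9" because '1' < '2' < '9' as bytes.
        System.out.println(sortByByteOrder(Arrays.asList("9", "10", "1001", "2")));
        System.out.println(zeroPad(9, 4));
    }
}
```

Padding to a fixed width is only one option; it assumes you know an upper bound for the id in advance.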

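4.2 Column Family

Every column in an HBase table belongs to a column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Column names are prefixed with the column family. For example, courses:history and courses:math both belong to the courses column family.

4.3 Cell

The unit uniquely identified by {rowkey, column family:column, version}. Data in a cell is untyped and stored entirely as bytes.
Keywords: untyped, bytes

4.4 Time Stamp

In HBase, the storage unit identified by a rowkey and a column is called a cell. Each cell holds multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. HBase can assign it automatically at write time, in which case it is the current system time in milliseconds, or the client can assign it explicitly. An application that needs to avoid version conflicts must generate unique timestamps itself. Within a cell, versions are sorted by timestamp in descending order, so the newest data comes first.
To limit the storage and indexing burden of keeping too many versions, HBase offers two ways to reclaim old versions: keep the last n versions, or keep the versions from a recent period (for example, the last seven days). Both can be set per column family.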
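
The versioning rule can be sketched in a few lines of plain Java. VersionedCell below is a hypothetical toy class, not an HBase type: it keeps versions sorted newest first and retains only the last n of them. As a simplifying assumption it prunes at write time; the text above does not say when HBase actually reclaims old versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of one cell that keeps at most maxVersions versions.
final class VersionedCell {
    private final int maxVersions;
    // Reverse ordering on the timestamp: the newest version is the first entry.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // With reverse ordering the last entry is the oldest version; drop extras.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Value with the highest timestamp, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // All retained values, newest first.
    List<String> values() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell(3);
        for (long ts = 1; ts <= 4; ts++) {
            cell.put(ts, "v" + ts);
        }
        System.out.println(cell.latest());
        System.out.println(cell.values());
    }
}
```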

4.5 Namespaces

Structure of a namespace: [Figure: namespace structure]

Table: every table is a member of a namespace, that is, a table must belong to some namespace. If none is specified, it goes into the default namespace.
RegionServer group: a namespace contains a default RegionServer group.
Permission: a namespace lets us define access control lists (ACLs) covering operations such as creating, reading, deleting, and updating tables.
Quota: a namespace can enforce a limit on the number of regions it may contain.

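Chapter 5 How HBase Works

5.1 Read Path

The HBase read path is shown in Figure 3.
[Figure 3: HBase read path]
1）The Client first contacts ZooKeeper to find the location of the region that holds the meta table, then reads the meta table, which in turn stores the region information of user tables;
2）Using the namespace, table name, and rowkey, it finds the matching region information in the meta table;
3）It locates the regionserver that hosts this region;
4）It looks up the region on that server;
5）It looks for the data in the MemStore first; if it is not there, it reads from the BlockCache;
6）If the BlockCache does not have it either, it reads from the StoreFile;
7）Data read from a StoreFile is not returned to the client directly: it is first written into the BlockCache (to speed up later reads) and then returned.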
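
Steps 2 to 4 amount to a range lookup: the region that holds a rowkey is the one with the greatest start key not larger than that rowkey. The following is a minimal sketch of that idea in plain Java. RegionLocator is a hypothetical class, not the HBase client API, and it assumes ASCII keys so that String ordering matches byte ordering.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the region information kept in the meta table.
final class RegionLocator {
    // Maps each region's start key to the server hosting it.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String server) {
        startKeyToServer.put(startKey, server);
    }

    // Find the region whose start key is the greatest key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "hadoop102");
        locator.addRegion("1000", "hadoop103");
        locator.addRegion("2000", "hadoop104");
        System.out.println(locator.locate("1500"));
    }
}
```

A sorted map keeps the lookup logarithmic in the number of regions, which is one reason a client can afford to cache region locations in this form.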

5.2 Write Path

The HBase write path is shown in Figure 2.
[Figure 2: HBase write path]
1）The Client sends a write request to the HRegionServer;
2）The HRegionServer writes the data to the HLog (write-ahead log), for durability and recovery;
3）The HRegionServer writes the data to memory (MemStore);
4）It acknowledges the write to the Client.

5.3 Flush

1）When the MemStore reaches its threshold (128 MB by default, 64 MB in older versions), its data is flushed to disk and removed from memory, and the corresponding historical entries in the HLog are deleted;
2）The flushed data is stored in HDFS;
3）A marker is recorded in the HLog.
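
The write and flush steps can be put together in a toy model. MiniStore below is a hypothetical class in plain Java, not HBase code: the WAL is a list, the MemStore is a sorted map, and the flush threshold is counted in entries instead of bytes (an assumption made to keep the sketch small).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model of the write path: log first, then memory, then flush.
final class MiniStore {
    private final int flushThreshold; // entry count, standing in for the 128 MB limit
    final List<String> wal = new ArrayList<>();
    final TreeMap<String, String> memStore = new TreeMap<>();
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    MiniStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value); // 1. append to the write-ahead log
        memStore.put(rowKey, value);   // 2. then store in memory, sorted by row key
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        // Persist a sorted snapshot as a new store file, then clear the memory
        // and the log entries that the snapshot has made redundant.
        storeFiles.add(new TreeMap<>(memStore));
        memStore.clear();
        wal.clear();
    }

    public static void main(String[] args) {
        MiniStore store = new MiniStore(2);
        store.put("1002", "Janna");
        store.put("1001", "Nick");
        System.out.println(store.storeFiles);
    }
}
```

Because the MemStore is sorted, every flushed file is already sorted by row key, which is what makes later merging cheap.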
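
5.4 Compaction and Splitting

1）When the number of data blocks (StoreFiles) reaches the threshold (4 in this walkthrough), a compaction is triggered: the RegionServer loads the files locally and merges them;
2）When the merged data exceeds 256 MB, the Region is split and the resulting Regions are assigned to different HRegionServers;
3）When an HRegionServer goes down, its HLog is split and handed to other HRegionServers to load, and .META. is updated;
4）Note: the HLog is synced to HDFS.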
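
The merge in step 1 can be illustrated with sorted maps. Compactor is a hypothetical helper, not HBase code, and it assumes the input files are ordered from oldest to newest so that a newer value for the same row key replaces an older one. Real compactions also deal with versions and delete markers, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of merging several sorted store files into one.
final class Compactor {
    // storeFiles must be ordered oldest to newest; later files win on conflicts.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> storeFiles) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>();
        older.put("1001", "Nick");
        older.put("1002", "Janna");
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("1001", "Nicholas");
        List<TreeMap<String, String>> files = new ArrayList<>();
        files.add(older);
        files.add(newer);
        System.out.println(compact(files));
    }
}
```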

Chapter 6 HBase API Operations

6.1 Environment Setup

After creating a new project, add the following dependencies to pom.xml:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.3.1</version>
</dependency>

<dependency>
	<groupId>jdk.tools</groupId>
	<artifactId>jdk.tools</artifactId>
	<version>1.8</version>
	<scope>system</scope>
	<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>

6.2 HBase API

6.2.1 Obtaining the Configuration Object

//Classes used in this section come from org.apache.hadoop.conf, org.apache.hadoop.hbase,
//org.apache.hadoop.hbase.client and org.apache.hadoop.hbase.util
public static Configuration conf;
static{
	//Create the configuration through the HBaseConfiguration factory method
	conf = HBaseConfiguration.create();
	conf.set("hbase.zookeeper.quorum", "192.168.9.102");
	conf.set("hbase.zookeeper.property.clientPort", "2181");
}

6.2.2 Checking Whether a Table Exists

public static boolean isTableExist(String tableName) throws IOException{
	//Managing tables requires an Admin, obtained from a Connection;
	//try-with-resources closes both once the check is done
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Admin admin = connection.getAdmin()){
		return admin.tableExists(TableName.valueOf(tableName));
	}
}

6.2.3 Creating a Table

public static void createTable(String tableName, String... columnFamily) throws IOException{
	//Check whether the table already exists
	if(isTableExist(tableName)){
		System.out.println("Table " + tableName + " already exists");
		return;
	}
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Admin admin = connection.getAdmin()){
		//Create the table descriptor from the table name
		HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf(tableName));
		//Add each column family
		for(String cf : columnFamily){
			descriptor.addFamily(new HColumnDescriptor(cf));
		}
		//Create the table from the descriptor
		admin.createTable(descriptor);
		System.out.println("Table " + tableName + " created");
	}
}

6.2.4 Dropping a Table

public static void dropTable(String tableName) throws IOException{
	if(!isTableExist(tableName)){
		System.out.println("Table " + tableName + " does not exist");
		return;
	}
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Admin admin = connection.getAdmin()){
		TableName name = TableName.valueOf(tableName);
		//A table must be disabled before it can be deleted
		admin.disableTable(name);
		admin.deleteTable(name);
		System.out.println("Table " + tableName + " dropped");
	}
}

6.2.5 Inserting Data into a Table

public static void addRowData(String tableName, String rowKey, String columnFamily,
		String column, String value) throws IOException{
	//Obtain the Table from a Connection; both are closed automatically
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Table table = connection.getTable(TableName.valueOf(tableName))){
		//Build the Put for this row key
		Put put = new Put(Bytes.toBytes(rowKey));
		//Add the column family, column, and value to the Put
		put.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(column), Bytes.toBytes(value));
		table.put(put);
		System.out.println("Row inserted");
	}
}

6.2.6 Deleting Multiple Rows

public static void deleteMultiRow(String tableName, String... rows) throws IOException{
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Table table = connection.getTable(TableName.valueOf(tableName))){
		//Build one Delete per row key and submit them as a batch
		List<Delete> deleteList = new ArrayList<Delete>();
		for(String row : rows){
			deleteList.add(new Delete(Bytes.toBytes(row)));
		}
		table.delete(deleteList);
	}
}

6.2.7 Getting All Data

public static void getAllRows(String tableName) throws IOException{
	//A Scan without start/stop rows covers the whole table
	Scan scan = new Scan();
	//The scanner is closed together with the table and the connection
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Table table = connection.getTable(TableName.valueOf(tableName));
			ResultScanner resultScanner = table.getScanner(scan)){
		for(Result result : resultScanner){
			for(Cell cell : result.rawCells()){
				//Row key
				System.out.println("Row key: " + Bytes.toString(CellUtil.cloneRow(cell)));
				//Column family, column, and value
				System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
				System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
				System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
			}
		}
	}
}

6.2.8 Getting a Single Row

public static void getRow(String tableName, String rowKey) throws IOException{
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Table table = connection.getTable(TableName.valueOf(tableName))){
		Get get = new Get(Bytes.toBytes(rowKey));
		//get.setMaxVersions(); //return all versions
		//get.setTimeStamp(ts); //return only the version with the given timestamp
		Result result = table.get(get);
		for(Cell cell : result.rawCells()){
			System.out.println("Row key: " + Bytes.toString(result.getRow()));
			System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
			System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
			System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
			System.out.println("Timestamp: " + cell.getTimestamp());
		}
	}
}

6.2.9 Getting a Given "Column Family:Column" of a Row

public static void getRowQualifier(String tableName, String rowKey, String family,
		String qualifier) throws IOException{
	try(Connection connection = ConnectionFactory.createConnection(conf);
			Table table = connection.getTable(TableName.valueOf(tableName))){
		Get get = new Get(Bytes.toBytes(rowKey));
		//Restrict the Get to one column of one column family
		get.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier));
		Result result = table.get(get);
		for(Cell cell : result.rawCells()){
			System.out.println("Row key: " + Bytes.toString(result.getRow()));
			System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
			System.out.println("Column: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
			System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
		}
	}
}

6.3 MapReduce

With the HBase Java API we can run MapReduce jobs alongside HBase operations, for example using MapReduce to import data from the local file system into an HBase table, or reading raw data from HBase and analyzing it with MapReduce.

6.3.1 Official HBase-MapReduce

1．Show the classpath that HBase MapReduce jobs need

$ bin/hbase mapredcp

2．Import the environment variables
（1）Temporary (run the following in the current shell):

$ export HBASE_HOME=/opt/module/hbase-1.3.1
$ export HADOOP_HOME=/opt/module/hadoop-2.7.2
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`

（2）Permanent: configure in /etc/profile

export HBASE_HOME=/opt/module/hbase-1.3.1
export HADOOP_HOME=/opt/module/hadoop-2.7.2

and in hadoop-env.sh add the following (note: place it after the for loop):

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase/lib/*

3．Run the official MapReduce jobs
– Case 1: count the rows in the student table

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar rowcounter student

– Case 2: import local data into HBase with MapReduce
1）Create a tab-separated file locally: fruit.tsv

1001	Apple	Red
1002	Pear	Yellow
1003	Pineapple	Yellow

2）Create the HBase table

hbase(main):001:0> create 'fruit','info'

3）Create the input_fruit directory in HDFS and upload fruit.tsv

$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -mkdir /input_fruit/

$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -put fruit.tsv /input_fruit/

4）Run the MapReduce job to load the data into the fruit table

$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit \
hdfs://hadoop102:9000/input_fruit

5）Check the imported data with scan

hbase(main):001:0> scan 'fruit'

6.3.2 Custom HBase-MapReduce 1

Goal: migrate part of the data in the fruit table into the fruit_mr table using MapReduce.
Steps:
1．Build the ReadFruitMapper class, which reads data from the fruit table

package com.atguigu;

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadFruitMapper extends TableMapper<ImmutableBytesWritable, Put> {

	@Override
	protected void map(ImmutableBytesWritable key, Result value, Context context)
	throws IOException, InterruptedException {
		//Extract the name and color of each fruit: copy the row's cells into a Put
		Put put = new Put(key.get());
		for(Cell cell: value.rawCells()){
			//Only keep cells from the info column family
			if("info".equals(Bytes.toString(CellUtil.cloneFamily(cell)))){
				String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
				//Copy the name and color columns into the Put
				if("name".equals(qualifier) || "color".equals(qualifier)){
					put.add(cell);
				}
			}
		}
		//Emit the row as map output; skip rows without matching cells,
		//because an empty Put cannot be written to the target table
		if(!put.isEmpty()){
			context.write(key, put);
		}
	}
}

2．Build the WriteFruitMRReducer class, which writes the data read from the fruit table into the fruit_mr table

package com.atguigu.hbase_mr;

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
	throws IOException, InterruptedException {
		//Write every row that was read into the fruit_mr table
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}

3．Build Fruit2FruitMRRunner (extends Configured implements Tool) to assemble and run the Job

	//Assemble the Job
	public int run(String[] args) throws Exception {
		//Get the Configuration
		Configuration conf = this.getConf();
		//Create the Job
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(Fruit2FruitMRRunner.class);

		//Configure the scan used as job input
		Scan scan = new Scan();
		scan.setCacheBlocks(false);
		scan.setCaching(500);

		//Set the Mapper. Import from the mapreduce package, not the older mapred package
		TableMapReduceUtil.initTableMapperJob(
		"fruit", //source table name
		scan, //scan that controls what is read
		ReadFruitMapper.class,//Mapper class
		ImmutableBytesWritable.class,//Mapper output key type
		Put.class,//Mapper output value type
		job//the Job being configured
		);
		//Set the Reducer
		TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRReducer.class, job);
		//Set the number of reduce tasks, at least 1
		job.setNumReduceTasks(1);

		boolean isSuccess = job.waitForCompletion(true);
		if(!isSuccess){
			throw new IOException("Job running with error");
		}
		return isSuccess ? 0 : 1;
	}

4．Run the Job from the main method

public static void main( String[] args ) throws Exception{
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Fruit2FruitMRRunner(), args);
	System.exit(status);
}

5．Package and run the job

$ /opt/module/hadoop-2.7.2/bin/yarn jar ~/softwares/jars/hbase-0.0.1-SNAPSHOT.jar \
 com.z.hbase.mr1.Fruit2FruitMRRunner

Note: if the target table does not exist, create it before running the job.
Note: Maven packaging goals: -P local clean package, or -P dev clean package install (bundling third-party jars requires the maven-shade-plugin)

6.3.3 Custom HBase-MapReduce 2

Goal: write data from HDFS into an HBase table.
Steps:
1．Build ReadFruitFromHDFSMapper to read the file data from HDFS

package com.atguigu;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReadFruitFromHDFSMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		//One line of the file read from HDFS
		String lineValue = value.toString();
		//Split the line on tabs into a String array
		String[] values = lineValue.split("\t");
		//Skip malformed lines that do not have rowkey, name, and color
		if(values.length < 3){
			return;
		}

		//Pick the fields by position
		String rowKey = values[0];
		String name = values[1];
		String color = values[2];

		//Build the output key from the row key
		ImmutableBytesWritable rowKeyWritable = new ImmutableBytesWritable(Bytes.toBytes(rowKey));

		//Build the Put for this row
		Put put = new Put(Bytes.toBytes(rowKey));

		//Arguments: column family, column, value
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes(name));
		put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("color"), Bytes.toBytes(color));

		context.write(rowKeyWritable, put);
	}
}

2．Build the WriteFruitMRFromTxtReducer class

package com.z.hbase.mr2;

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

public class WriteFruitMRFromTxtReducer extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {
	@Override
	protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context) throws IOException, InterruptedException {
		//Write every row that was read into the target table
		for(Put put: values){
			context.write(NullWritable.get(), put);
		}
	}
}

3．Create Txt2FruitRunner to assemble the Job

public int run(String[] args) throws Exception {
	//Get the Configuration
	Configuration conf = this.getConf();

	//Create the Job
	Job job = Job.getInstance(conf, this.getClass().getSimpleName());
	job.setJarByClass(Txt2FruitRunner.class);
	Path inPath = new Path("hdfs://hadoop102:9000/input_fruit/fruit.tsv");
	FileInputFormat.addInputPath(job, inPath);

	//Set the Mapper
	job.setMapperClass(ReadFruitFromHDFSMapper.class);
	job.setMapOutputKeyClass(ImmutableBytesWritable.class);
	job.setMapOutputValueClass(Put.class);

	//Set the Reducer
	TableMapReduceUtil.initTableReducerJob("fruit_mr", WriteFruitMRFromTxtReducer.class, job);

	//Set the number of reduce tasks, at least 1
	job.setNumReduceTasks(1);

	boolean isSuccess = job.waitForCompletion(true);
	if(!isSuccess){
		throw new IOException("Job running with error");
	}

	return isSuccess ? 0 : 1;
}

4．Run the Job

public static void main(String[] args) throws Exception {
	Configuration conf = HBaseConfiguration.create();
	int status = ToolRunner.run(conf, new Txt2FruitRunner(), args);
	System.exit(status);
}

5．Package and run

$ /opt/module/hadoop-2.7.2/bin/yarn jar hbase-0.0.1-SNAPSHOT.jar com.atguigu.hbase.mr2.Txt2FruitRunner

Note: if the target table does not exist, create it before running the job.
Note: Maven packaging goals: -P local clean package, or -P dev clean package install (bundling third-party jars requires the maven-shade-plugin)

6.4 Integration with Hive

6.4.1 HBase vs. Hive

1．Hive
(1) Data warehouse
In essence, Hive keeps a mapping in MySQL for files already stored in HDFS, so that they can be managed and queried with HQL.
(2) Used for data analysis and cleansing
Hive is suited to offline data analysis and cleansing; its latency is high.
(3) Built on HDFS and MapReduce
Data stored by Hive still lives on DataNodes, and HQL statements are ultimately translated into MapReduce code for execution.
2．HBase
(1) Database
A column-oriented, non-relational database.
(2) Stores structured and unstructured data
Suited to single-table, non-relational storage; not suited to relational queries such as JOIN.
(3) Built on HDFS
Data is persisted as HFiles on DataNodes and managed by RegionServers in the form of regions.
(4) Low latency, suitable for online services
Faced with large volumes of enterprise data, HBase can store very large single tables while still providing efficient access.

6.4.2 Using HBase with Hive

Important: the HBase and Hive versions used here are not compatible out of the box, so, painful as it is, hive-hbase-handler-1.2.2.jar has to be recompiled.
Environment setup
Because later operations in Hive will also affect HBase, Hive needs the jars for operating HBase. Copy the jars Hive depends on (or symlink them):

export HBASE_HOME=/opt/module/hbase
export HIVE_HOME=/opt/module/hive

ln -s $HBASE_HOME/lib/hbase-common-1.3.1.jar  $HIVE_HOME/lib/hbase-common-1.3.1.jar
ln -s $HBASE_HOME/lib/hbase-server-1.3.1.jar $HIVE_HOME/lib/hbase-server-1.3.1.jar
ln -s $HBASE_HOME/lib/hbase-client-1.3.1.jar $HIVE_HOME/lib/hbase-client-1.3.1.jar
ln -s $HBASE_HOME/lib/hbase-protocol-1.3.1.jar $HIVE_HOME/lib/hbase-protocol-1.3.1.jar
ln -s $HBASE_HOME/lib/hbase-it-1.3.1.jar $HIVE_HOME/lib/hbase-it-1.3.1.jar
ln -s $HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar $HIVE_HOME/lib/htrace-core-3.1.0-incubating.jar
ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-1.3.1.jar $HIVE_HOME/lib/hbase-hadoop2-compat-1.3.1.jar
ln -s $HBASE_HOME/lib/hbase-hadoop-compat-1.3.1.jar $HIVE_HOME/lib/hbase-hadoop-compat-1.3.1.jar

Also set the ZooKeeper properties in hive-site.xml as follows:

<property>
  <name>hive.zookeeper.quorum</name>
  <value>hadoop102,hadoop103,hadoop104</value>
  <description>The list of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
  <description>The port of ZooKeeper servers to talk to. This is only needed for read/write locks.</description>
</property>

1．Case 1
Goal: create a Hive table associated with an HBase table, so that inserting data into the Hive table also affects the HBase table.
Steps:
(1) Create a table in Hive and associate it with HBase

CREATE TABLE hive_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno")
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

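Note: afterwards, check in both Hive and HBase; the corresponding table has been created in each.
(2) Create a temporary staging table in Hive for loading data from a file
Note: data cannot be loaded directly into the Hive table that is associated with HBase.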

CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
row format delimited fields terminated by '\t';

(3) Load data into the Hive staging table

hive> load data local inpath '/home/admin/softwares/data/emp.txt' into table emp;

(4) Use insert to copy the data from the staging table into the Hive table associated with HBase

hive> insert into table hive_hbase_emp_table select * from emp;

(5) Check that the data has been inserted into both Hive and the associated HBase table
Hive:

hive> select * from hive_hbase_emp_table;

HBase:

hbase> scan 'hbase_emp_table'

2．Case 2
Goal: a table hbase_emp_table already exists in HBase. Create an external table in Hive associated with hbase_emp_table, so that Hive can be used to analyze the data in this HBase table.
Note: Case 2 builds on Case 1, so complete Case 1 first.
Steps:
(1) Create an external table in Hive

CREATE EXTERNAL TABLE relevance_hbase_emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = 
":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno") 
TBLPROPERTIES ("hbase.table.name" = "hbase_emp_table");

(2) Once associated, the table can be analyzed with Hive functions

hive (default)> select * from relevance_hbase_emp;

Chapter 7 HBase Optimization

7.1 High Availability

In HBase the HMaster monitors the life cycle of RegionServers and balances their load. If the HMaster goes down, the whole cluster falls into an unhealthy state that cannot be sustained for long. HBase therefore supports a high-availability configuration for the HMaster.
1．Stop the HBase cluster (skip this step if it is not running)

[atguigu@hadoop102 hbase]$ bin/stop-hbase.sh

2．Create the backup-masters file in the conf directory

[atguigu@hadoop102 hbase]$ touch conf/backup-masters

3．List the standby HMaster node in backup-masters

[atguigu@hadoop102 hbase]$ echo hadoop103 > conf/backup-masters

4．Copy the whole conf directory to the other nodes with scp

[atguigu@hadoop102 hbase]$ scp -r conf/ hadoop103:/opt/module/hbase/

[atguigu@hadoop102 hbase]$ scp -r conf/ hadoop104:/opt/module/hbase/

5．Open the web UI to verify

http://hadoop102:16010

7.2 Pre-splitting

Each region maintains a startRow and an endRowKey. If incoming data falls within the rowKey range a region maintains, that region takes the data. Following this principle, we can roughly plan in advance which partitions the data will land in, to improve HBase performance.
1．Set the split points manually

hbase> create 'staff1','info','partition1',SPLITS => ['1000','2000','3000','4000']

2．Generate split points as a hexadecimal sequence

create 'staff2','info','partition2',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

3．Pre-split according to rules in a file
Create splits.txt with the following content:

aaaa
bbbb
cccc
dddd

Then run:

create 'staff3','partition3',SPLITS_FILE => 'splits.txt'

4．Pre-split with the Java API

//Use your own algorithm to produce a series of split keys as a two-dimensional byte array
//(computeSplitKeys is a hypothetical helper that you supply)
byte[][] splitKeys = computeSplitKeys();
//Create the HBaseAdmin instance
HBaseAdmin hAdmin = new HBaseAdmin(HBaseConfiguration.create());
//Create the HTableDescriptor instance
HTableDescriptor tableDesc = new HTableDescriptor(tableName);
//Create the pre-split HBase table from the descriptor and the split keys
hAdmin.createTable(tableDesc, splitKeys);
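
The text leaves the split-key function open. As one possible sketch, assuming row keys are zero-padded decimal numbers spread evenly over a known key space, the split keys can be computed in plain Java as below. SplitKeys and evenSplits are hypothetical names, not part of the HBase API; the returned array has the byte[][] shape that createTable expects.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that computes evenly spaced split keys for pre-splitting.
final class SplitKeys {
    // Returns regions - 1 boundaries that divide [0, keySpace) into equal ranges.
    // Each boundary is zero-padded to the given width so byte order matches numeric order.
    static byte[][] evenSplits(int regions, long keySpace, int width) {
        if (regions < 2) {
            return new byte[0][]; // a single region needs no split keys
        }
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            long boundary = keySpace * i / regions;
            splits[i - 1] = String.format("%0" + width + "d", boundary)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five regions over keys 0000-4999 give the boundaries 1000, 2000, 3000, 4000.
        for (byte[] split : evenSplits(5, 5000, 4)) {
            System.out.println(new String(split, StandardCharsets.UTF_8));
        }
    }
}
```

Even spacing only balances regions when the keys themselves are evenly distributed; for skewed keys, sample the data first and pick boundaries from the sample.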

7.3 RowKey Design

The rowkey is the unique identifier of a row, and the partition a row is stored in depends on which pre-split range its rowkey falls into. The main purpose of rowkey design is to spread data evenly across all regions and so limit data skew. Some common rowkey design approaches follow.
1．Random numbers, hashes, digests
For example:
rowKey 1001 becomes, after SHA1: dd01903921ea24941c26a48f2cec24e0bb0e8cc7
rowKey 3001 becomes, after SHA1: 49042c54de64a1e9bf0b33e00245660ef92dc7bd
rowKey 5001 becomes, after SHA1: 7b61dec07e02c188790670af43e717f0f46e8913
Before doing this, we usually take a sample of the data set to decide which hashed rowKeys to use as the boundary of each partition.
2．String reversal
20170524000001 becomes 10000042507102
20170524000002 becomes 20000042507102
This also spreads out, to some extent, data that is put in sequentially.
3．String concatenation
20170524000001_a12e
20170524000001_93i7
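
The three approaches can be sketched with the Java standard library alone. RowKeyDesign and its method names are hypothetical helpers written for illustration; how the suffix in the third approach is generated is not specified above, so it is simply passed in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helpers illustrating the three rowkey transformations.
final class RowKeyDesign {
    // 1. Hash: SHA-1 digest of the original key as 40 lowercase hex characters.
    static String sha1Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    // 2. Reversal: the fast-changing low-order digits move to the front.
    static String reverse(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // 3. Concatenation: append a caller-supplied suffix to the key.
    static String withSuffix(String key, String suffix) {
        return key + "_" + suffix;
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("1001"));
        System.out.println(reverse("20170524000001"));
        System.out.println(withSuffix("20170524000001", "a12e"));
    }
}
```

Hashing and reversal give up range scans over the original key order, so they fit workloads that mostly read single rows.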

7.4 內(nèi)存優(yōu)化

HBase操作過程中需要大量的內(nèi)存開銷，畢竟Table是可以緩存在內(nèi)存中的，一般會分配整個可用內(nèi)存的70%給HBase的Java堆。但是不建議分配非常大的堆內(nèi)存，因?yàn)镚C過程持續(xù)太久會導(dǎo)致RegionServer處于長期不可用狀態(tài)，一般16~48G內(nèi)存就可以了，如果因?yàn)榭蚣苷加脙?nèi)存過高導(dǎo)致系統(tǒng)內(nèi)存不足，框架一樣會被系統(tǒng)服務(wù)拖死。

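7.5 Basic Optimizations

1. Allow appending to files in HDFS
hdfs-site.xml, hbase-site.xml
Property: dfs.support.append
Explanation: enabling HDFS append and sync works well with HBase's data synchronization and persistence. The default value is true.
2. Tune the maximum number of open files allowed on a DataNode
hdfs-site.xml
Property: dfs.datanode.max.transfer.threads
Explanation: HBase usually works on a large number of files at the same time. Depending on the number and size of the clusters and on the data workload, set this to 4096 or higher. Default: 4096.
3. Tune the wait time for high-latency data operations
hdfs-site.xml
Property: dfs.image.transfer.timeout
Explanation: if a data operation has very high latency and the socket needs to wait longer, set this to a larger value (default 60000 ms) so that the socket does not time out.
4. Improve write efficiency
mapred-site.xml
Properties:
mapreduce.map.output.compress
mapreduce.map.output.compress.codec
Explanation: enabling these two properties can greatly improve file write efficiency and reduce write time. Set the first to true and the second to org.apache.hadoop.io.compress.GzipCodec or another compression codec.
5. Set the number of RPC handlers
hbase-site.xml
Property: hbase.regionserver.handler.count
Explanation: the default is 30. It specifies the number of RPC handler threads and can be adjusted to the number of client requests; increase it when there are many read and write requests.
6. Tune the HStore file size
hbase-site.xml
Property: hbase.hregion.max.filesize
Explanation: the default is 10737418240 (10 GB). If you need to run MapReduce jobs against HBase, you can lower this value, because one region corresponds to one map task, and an oversized region makes its map task run too long. The value means that once an HFile reaches this size, the region is split into two regions.
7. Tune the HBase client write buffer
hbase-site.xml
Property: hbase.client.write.buffer
Explanation: specifies the HBase client write buffer. A larger value reduces the number of RPC calls but uses more memory, and a smaller value does the opposite. We usually set a reasonable buffer size in order to reduce the number of RPCs.
8. Set the number of rows fetched by scan.next
hbase-site.xml
Property: hbase.client.scanner.caching
Explanation: specifies the default number of rows fetched by the scan.next method. The larger the value, the more memory is used.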
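9. The flush, compact, and split mechanisms
When a MemStore reaches its threshold, its data is flushed into a StoreFile. Compaction merges the small files produced by flushes into larger StoreFiles. Split happens when a region reaches its threshold: the oversized region is divided in two.
Related properties:
hbase.hregion.memstore.flush.size: 134217728
That is, 128 MB is the default MemStore threshold. When the total size of all MemStores in a single HRegion exceeds this value, all MemStores of that HRegion are flushed. The RegionServer handles flushes asynchronously by adding requests to a queue, in a producer-consumer model. This raises a problem: when the queue cannot be consumed fast enough and a large backlog of requests builds up, memory may spike, and in the worst case this triggers an OOM.
hbase.regionserver.global.memstore.upperLimit: 0.4
hbase.regionserver.global.memstore.lowerLimit: 0.38
That is, when the total memory used by MemStores reaches the value specified by hbase.regionserver.global.memstore.upperLimit, multiple MemStores are flushed to files. They are flushed in descending order of size until MemStore memory use drops slightly below lowerLimit.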
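
The interplay of the upper and lower limits can be sketched as a small selection routine. This is a simplified model under stated assumptions, not the RegionServer's actual implementation: the class and method names are hypothetical, the limits are passed in as byte counts (in HBase they are fractions of the RegionServer heap), and the flush itself is reduced to subtracting a size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MemStoreFlushSketch {
	// Returns the sizes of the MemStores that would be flushed, largest first.
	// Nothing is flushed unless the total has reached upperBytes; once triggered,
	// flushing continues until the total drops below lowerBytes.
	static List<Long> pickFlushes(List<Long> memStoreSizes, long upperBytes, long lowerBytes) {
		long total = 0;
		for (long size : memStoreSizes) {
			total += size;
		}
		List<Long> flushed = new ArrayList<Long>();
		if (total < upperBytes) {
			return flushed;
		}
		List<Long> sorted = new ArrayList<Long>(memStoreSizes);
		Collections.sort(sorted, Collections.reverseOrder());
		for (long size : sorted) {
			if (total < lowerBytes) {
				break;
			}
			flushed.add(size);
			total -= size;
		}
		return flushed;
	}

	public static void main(String[] args) {
		long heapMb = 1000;                 // hypothetical heap size, in MB for readability
		long upper = heapMb * 40 / 100;     // upperLimit 0.4
		long lower = heapMb * 38 / 100;     // lowerLimit 0.38
		List<Long> sizes = new ArrayList<Long>();
		Collections.addAll(sizes, 100L, 50L, 200L, 60L);
		System.out.println("flush order: " + pickFlushes(sizes, upper, lower));
	}
}
```

Keeping the two limits close together, as the defaults do, means a global flush frees only a little memory each time instead of emptying every MemStore at once.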

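Chapter 8 HBase in Practice: Guli Weibo

8.1 Requirements Analysis

Browsing Weibo posts, and the database table design
Social features: following and unfollowing users
Pulling the posts of followed users
8.2 Code Implementation
8.2.1 Code design overview:
Create the namespace and define the table names
Create the Weibo content table
Create the user relations table
Create the Weibo inbox table (content received by each user)
Publish a Weibo post
Follow users
Remove (unfollow) users
Get the posts of followed users
Test

8.2.2 Creating the namespace and defining the table names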

// Get the configuration
private Configuration conf = HBaseConfiguration.create();

// Name of the Weibo content table
private static final byte[] TABLE_CONTENT = Bytes.toBytes("weibo:content");
// Name of the user relations table
private static final byte[] TABLE_RELATIONS = Bytes.toBytes("weibo:relations");
// Name of the Weibo inbox table
private static final byte[] TABLE_RECEIVE_CONTENT_EMAIL = Bytes.toBytes("weibo:receive_content_email");
public void initNamespace(){
	HBaseAdmin admin = null;
	try {
		admin = new HBaseAdmin(conf);
		// A namespace is similar to a schema in a relational database; think of it as a folder
		NamespaceDescriptor weibo = NamespaceDescriptor
				.create("weibo")
				.addConfiguration("creator", "Jinji")
				.addConfiguration("create_time", System.currentTimeMillis() + "")
				.build();
		admin.createNamespace(weibo);
	} catch (MasterNotRunningException e) {
		e.printStackTrace();
	} catch (ZooKeeperConnectionException e) {
		e.printStackTrace();
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != admin){
			try {
				admin.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}

8.2.3 Creating the Weibo content table

Table structure:
Method name createTableContent
Table Name weibo:content
RowKey userID_timestamp
ColumnFamily info
ColumnLabel title, content, image
Version 1 version
Code:

/**
 * Create the Weibo content table
 * Table Name: weibo:content
 * RowKey: userID_timestamp
 * ColumnFamily: info
 * ColumnLabel: title, content, image URL
 * Version: 1 version
 */
public void createTableContent(){
	HBaseAdmin admin = null;
	try {
		admin = new HBaseAdmin(conf);
		// Create the table descriptor
		HTableDescriptor content = new HTableDescriptor(TableName.valueOf(TABLE_CONTENT));
		// Create the column family descriptor
		HColumnDescriptor info = new HColumnDescriptor(Bytes.toBytes("info"));
		// Enable the block cache
		info.setBlockCacheEnabled(true);
		// Set the block size to 2 MB (this is the HFile block size, not the size of the cache)
		info.setBlocksize(2097152);
		// Set the compression type
		// info.setCompressionType(Algorithm.SNAPPY);
		// Set the version bounds
		info.setMaxVersions(1);
		info.setMinVersions(1);
		
		content.addFamily(info);
		admin.createTable(content);
		
	} catch (MasterNotRunningException e) {
		e.printStackTrace();
	} catch (ZooKeeperConnectionException e) {
		e.printStackTrace();
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != admin){
			try {
				admin.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}

8.2.4 Creating the user relations table

Table structure:
Method name createTableRelations
Table Name weibo:relations
RowKey userID
ColumnFamily attends, fans
ColumnLabel followed user ID, fan user ID
ColumnValue userID
Version 1 version
Code:

/**
 * User relations table
 * Table Name: weibo:relations
 * RowKey: userID
 * ColumnFamily: attends, fans
 * ColumnLabel: followed user ID, fan user ID
 * ColumnValue: userID
 * Version: 1 version
 */
public void createTableRelations(){
	HBaseAdmin admin = null;
	try {
		admin = new HBaseAdmin(conf);
		HTableDescriptor relations = new HTableDescriptor(TableName.valueOf(TABLE_RELATIONS));
		
		// Column family for the users being followed
		HColumnDescriptor attends = new HColumnDescriptor(Bytes.toBytes("attends"));
		// Enable the block cache
		attends.setBlockCacheEnabled(true);
		// Set the block size to 2 MB (this is the HFile block size, not the size of the cache)
		attends.setBlocksize(2097152);
		// Set the compression type
		// attends.setCompressionType(Algorithm.SNAPPY);
		// Set the version bounds
		attends.setMaxVersions(1);
		attends.setMinVersions(1);
		
		// Column family for fans
		HColumnDescriptor fans = new HColumnDescriptor(Bytes.toBytes("fans"));
		fans.setBlockCacheEnabled(true);
		fans.setBlocksize(2097152);
		fans.setMaxVersions(1);
		fans.setMinVersions(1);
		
		
		relations.addFamily(attends);
		relations.addFamily(fans);
		admin.createTable(relations);
		
	} catch (MasterNotRunningException e) {
		e.printStackTrace();
	} catch (ZooKeeperConnectionException e) {
		e.printStackTrace();
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != admin){
			try {
				admin.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}

8.2.5 Creating the Weibo inbox table

Table structure:
Method name createTableReceiveContentEmail
Table Name weibo:receive_content_email
RowKey userID
ColumnFamily info
ColumnLabel userID
ColumnValue RowKey of the Weibo post
Version 1000
Code:

/**
 * Create the Weibo inbox table
 * Table Name: weibo:receive_content_email
 * RowKey: userID
 * ColumnFamily: info
 * ColumnLabel: userID of the user who published the post
 * ColumnValue: RowKey of the followed user's post
 * Version: 1000
 */
public void createTableReceiveContentEmail(){
	HBaseAdmin admin = null;
	try {
		admin = new HBaseAdmin(conf);
		HTableDescriptor receive_content_email = new HTableDescriptor(TableName.valueOf(TABLE_RECEIVE_CONTENT_EMAIL));
		HColumnDescriptor info = new HColumnDescriptor(Bytes.toBytes("info"));
		
		info.setBlockCacheEnabled(true);
		info.setBlocksize(2097152);
		info.setMaxVersions(1000);
		info.setMinVersions(1000);
		
		receive_content_email.addFamily(info);
		admin.createTable(receive_content_email);
	} catch (MasterNotRunningException e) {
		e.printStackTrace();
	} catch (ZooKeeperConnectionException e) {
		e.printStackTrace();
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != admin){
			try {
				admin.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}
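
The inbox table keeps up to 1000 versions per cell because each column (one per followed user) accumulates the rowkeys of that user's posts as successive versions. The following plain-Java sketch models one such cell under simplifying assumptions: the class name `VersionedCellSketch` is hypothetical, and old versions are dropped immediately on write, whereas HBase itself only hides them on read and removes them physically later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class VersionedCellSketch {
	private final int maxVersions;
	// timestamp -> value, kept in ascending timestamp order
	private final TreeMap<Long, String> versions = new TreeMap<Long, String>();

	VersionedCellSketch(int maxVersions) {
		this.maxVersions = maxVersions;
	}

	// Write a new version; when the limit is exceeded the oldest version goes away
	void put(long timestamp, String value) {
		versions.put(timestamp, value);
		while (versions.size() > maxVersions) {
			versions.pollFirstEntry();
		}
	}

	// Read at most 'limit' versions, newest first
	List<String> get(int limit) {
		List<String> result = new ArrayList<String>();
		for (String value : versions.descendingMap().values()) {
			if (result.size() >= limit) {
				break;
			}
			result.add(value);
		}
		return result;
	}

	public static void main(String[] args) {
		VersionedCellSketch cell = new VersionedCellSketch(3);
		cell.put(1L, "0008_1");
		cell.put(2L, "0008_2");
		cell.put(3L, "0008_3");
		cell.put(4L, "0008_4");
		System.out.println(cell.get(2));
	}
}
```

Reading with a version limit is what lets a later query fetch only the most recent few posts of each followed user.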
8.2.6 Publishing a Weibo post
a. Add one row to the Weibo content table
b. Add data to the Weibo inbox table for every fan
Code: Message.java
package com.atguigu.weibo;

public class Message {
	private String uid;
	private String timestamp;
	private String content;
	
	public String getUid() {
		return uid;
	}
	public void setUid(String uid) {
		this.uid = uid;
	}
	public String getTimestamp() {
		return timestamp;
	}
	public void setTimestamp(String timestamp) {
		this.timestamp = timestamp;
	}
	public String getContent() {
		return content;
	}
	public void setContent(String content) {
		this.content = content;
	}
	@Override
	public String toString() {
		return "Message [uid=" + uid + ", timestamp=" + timestamp + ", content=" + content + "]";
	}
}
Code: public void publishContent(String uid, String content)
/**
 * Publish a Weibo post
 * a. Add one row to the Weibo content table
 * b. Add the post's rowkey to the Weibo inbox table
 */
public void publishContent(String uid, String content){
	HConnection connection = null;
	try {
		connection = HConnectionManager.createConnection(conf);
		// a. Add one row to the content table; first get a handle to the table
		HTableInterface contentTBL = connection.getTable(TableName.valueOf(TABLE_CONTENT));
		// Build the rowkey
		long timestamp = System.currentTimeMillis();
		String rowKey = uid + "_" + timestamp;
		
		Put put = new Put(Bytes.toBytes(rowKey));
		put.add(Bytes.toBytes("info"), Bytes.toBytes("content"), timestamp, Bytes.toBytes(content));
		
		contentTBL.put(put);
		
		// b. Add the published rowkey to the inbox table
		// b.1 Query the relations table to find the current user's fans
		HTableInterface relationsTBL = connection.getTable(TableName.valueOf(TABLE_RELATIONS));
		// b.2 Fetch the target data
		Get get = new Get(Bytes.toBytes(uid));
		get.addFamily(Bytes.toBytes("fans"));
		
		Result result = relationsTBL.get(get);
		List<byte[]> fans = new ArrayList<byte[]>();
		
		// Collect all fans of the user who is publishing
		for(Cell cell : result.rawCells()){
			fans.add(CellUtil.cloneQualifier(cell));
		}
		// If the user has no fans, return immediately
		if(fans.size() <= 0) return;
		// Start working on the inbox table
		HTableInterface recTBL = connection.getTable(TableName.valueOf(TABLE_RECEIVE_CONTENT_EMAIL));
		List<Put> puts = new ArrayList<Put>();
		for(byte[] fan : fans){
			Put fanPut = new Put(fan);
			fanPut.add(Bytes.toBytes("info"), Bytes.toBytes(uid), timestamp, Bytes.toBytes(rowKey));
			puts.add(fanPut);
		}
		recTBL.put(puts);
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != connection){
			try {
				connection.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}
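
The publish step is a fan-out on write: one row goes into the content table, and one inbox entry is written per fan. The logic can be tried without a cluster using the in-memory sketch below. Everything in it is an assumption made for illustration (the class `FanOutSketch`, the map-based "tables", and the caller-supplied timestamp); it mirrors the shape of `publishContent` but is not the HBase code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FanOutSketch {
	// Stand-in for weibo:content: rowkey (uid_timestamp) -> text
	final Map<String, String> content = new TreeMap<String, String>();
	// Stand-in for the fans column family of weibo:relations: uid -> fan uids
	final Map<String, Set<String>> fans = new HashMap<String, Set<String>>();
	// Stand-in for weibo:receive_content_email: fan uid -> content rowkeys, newest first
	final Map<String, List<String>> inbox = new HashMap<String, List<String>>();

	void addFan(String uid, String fan) {
		Set<String> set = fans.get(uid);
		if (set == null) {
			set = new LinkedHashSet<String>();
			fans.put(uid, set);
		}
		set.add(fan);
	}

	// a. store the post; b. push its rowkey into the inbox of every fan
	String publish(String uid, long timestamp, String text) {
		String rowKey = uid + "_" + timestamp;
		content.put(rowKey, text);
		Set<String> targets = fans.containsKey(uid) ? fans.get(uid) : Collections.<String>emptySet();
		for (String fan : targets) {
			List<String> box = inbox.get(fan);
			if (box == null) {
				box = new ArrayList<String>();
				inbox.put(fan, box);
			}
			box.add(0, rowKey);
		}
		return rowKey;
	}

	public static void main(String[] args) {
		FanOutSketch weibo = new FanOutSketch();
		weibo.addFan("0001", "0002");
		weibo.addFan("0001", "0003");
		weibo.publish("0001", 100L, "hello");
		System.out.println(weibo.inbox);
	}
}
```

The design choice is the usual one for feeds: writes cost more as the number of fans grows, in exchange for reads that only touch the reader's own inbox row.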

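8.2.7 Following users

a. In the user relations table, add the newly followed users to the user performing the action
b. In the user relations table, add a new fan to each followed user
c. In the Weibo inbox table, add the posts published by the followed users
Code: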

public void addAttends(String uid, String... attends)
/**
 * Follow-user logic
 * a. In the user relations table, add the newly followed users to the user performing the action
 * b. In the user relations table, add a fan (the current user) to each followed user
 * c. Add the rowkeys of the followed users' posts to the current user's inbox
 */
public void addAttends(String uid, String... attends){
	// Validate the arguments
	if(attends == null || attends.length <= 0 || uid == null || uid.length() <= 0){
		return;
	}
	HConnection connection = null;
	try {
		connection = HConnectionManager.createConnection(conf);
		// Handle to the user relations table
		HTableInterface relationsTBL = connection.getTable(TableName.valueOf(TABLE_RELATIONS));
		List<Put> puts = new ArrayList<Put>();
		// a. In the user relations table, add the newly followed users
		Put attendPut = new Put(Bytes.toBytes(uid));
		for(String attend : attends){
			// Add the followed user to the current user
			attendPut.add(Bytes.toBytes("attends"), Bytes.toBytes(attend), Bytes.toBytes(attend));
			// b. Add the current user as a fan of the followed user
			Put fansPut = new Put(Bytes.toBytes(attend));
			fansPut.add(Bytes.toBytes("fans"), Bytes.toBytes(uid), Bytes.toBytes(uid));
			// Collect the fan Puts one by one in the puts list
			puts.add(fansPut);
		}
		puts.add(attendPut);
		relationsTBL.put(puts);
		
		// c.1 Collect the rowkeys of the posts published by the followed users
		HTableInterface contentTBL = connection.getTable(TableName.valueOf(TABLE_CONTENT));
		Scan scan = new Scan();
		// Holds the rowkeys of the posts published by the followed users
		List<byte[]> rowkeys = new ArrayList<byte[]>();
		
		for(String attend : attends){
			// Filter rowkeys by prefix, i.e. match the followed user's "uid_".
			// BinaryPrefixComparator (org.apache.hadoop.hbase.filter) is used instead of
			// SubstringComparator, which would also match "0001_" inside "10001_..."
			RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes(attend + "_")));
			// Attach the filter to the scan
			scan.setFilter(filter);
			// Get a scanner from the scan
			ResultScanner result = contentTBL.getScanner(scan);
			// Iterate over the scanned results
			Iterator<Result> iterator = result.iterator();
			while(iterator.hasNext()){
				// Take each matching row
				Result r = iterator.next();
				for(Cell cell : r.rawCells()){
					// Put the rowkey into the collection
					rowkeys.add(CellUtil.cloneRow(cell));
				}
				
			}
			// Release the scanner once this user's rows have been read
			result.close();
		}
		
		// c.2 Put the collected rowkeys into the current user's inbox
		if(rowkeys.size() <= 0) return;
		// Handle to the inbox table
		HTableInterface recTBL = connection.getTable(TableName.valueOf(TABLE_RECEIVE_CONTENT_EMAIL));
		// Holds the rowkeys of the posts published by all followed users
		List<Put> recPuts = new ArrayList<Put>();
		for(byte[] rk : rowkeys){
			Put put = new Put(Bytes.toBytes(uid));
			// uid_timestamp
			String rowKey = Bytes.toString(rk);
			// Extract the uid
			String attendUID = rowKey.substring(0, rowKey.indexOf("_"));
			long timestamp = Long.parseLong(rowKey.substring(rowKey.indexOf("_") + 1));
			// Add the post's rowkey to the target cell
			put.add(Bytes.toBytes("info"), Bytes.toBytes(attendUID), timestamp, rk);
			recPuts.add(put);
		}
		
		recTBL.put(recPuts);
		
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		if(null != connection){
			try {
				connection.close();
			} catch (IOException e) {
				// TODO Auto-generated catch block
				e.printStackTrace();
			}
		}
	}
}
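
The scan in `addAttends` is meant to select rows whose key starts with `uid_`. The plain-Java sketch below shows why a prefix test and a substring test give different answers for rowkeys of the form `uid_timestamp`, and how such a key is split back into its parts. The class and helper names are hypothetical and the sample keys are made up.

```java
class RowKeyMatchSketch {
	// Substring test: true whenever "uid_" appears anywhere in the rowkey
	static boolean substringMatch(String rowKey, String uid) {
		return rowKey.contains(uid + "_");
	}

	// Prefix test: true only when the rowkey starts with "uid_"
	static boolean prefixMatch(String rowKey, String uid) {
		return rowKey.startsWith(uid + "_");
	}

	// The part before the first underscore is the publisher's uid
	static String uidOf(String rowKey) {
		return rowKey.substring(0, rowKey.indexOf('_'));
	}

	// The part after the first underscore is the publish timestamp
	static long timestampOf(String rowKey) {
		return Long.parseLong(rowKey.substring(rowKey.indexOf('_') + 1));
	}

	public static void main(String[] args) {
		String own = "0001_1495584000000";
		String other = "10001_1495584000000";
		System.out.println("substring on other user's row: " + substringMatch(other, "0001"));
		System.out.println("prefix on other user's row:    " + prefixMatch(other, "0001"));
		System.out.println(uidOf(own) + " / " + timestampOf(own));
	}
}
```

With user IDs of varying length, only the prefix test keeps one user's posts out of another user's inbox.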

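8.2.8 Removing (unfollowing) users

a. In the user relations table, remove the unfollowed users (attends) from the user performing the action
b. In the user relations table, remove the fan from each unfollowed user
c. In the Weibo inbox, delete the posts published by the unfollowed users
Code: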

public void removeAttends(String uid, String... attends)
/**
 * Unfollow (remove)
 * a. In the user relations table, delete the unfollowed users from the user performing the action
 * b. In the user relations table, delete the fan (the current user) from each unfollowed user
 * c. Delete the rowkeys of the unfollowed users' posts from the inbox
 */
public void removeAttends(String uid, String... attends){
	// Validate the arguments
	if(uid == null || uid.length() <= 0 || attends == null || attends.length <= 0) return;
	HConnection connection = null;
	
	try {
		connection = HConnectionManager.createConnection(conf);
		// a. In the user relations table, delete the followed users
		HTableInterface relationsTBL = connection.getTable(TableName.valueOf(TABLE_RELATIONS));
		
		// All deletes to apply to the user relations table
		List<Delete> deletes = new ArrayList<Delete>();
		// Delete object for the user who is unfollowing
		Delete attendDelete = new Delete(Bytes.toBytes(uid));
		// For each unfollowed user, also remove the current user from that user's fans
		for(String attend : attends){
			attendDelete.deleteColumn(Bytes.toBytes("attends"), Bytes.toBytes(attend));
			// b
			Delete fansDelete = new Delete(Bytes.toBytes(attend));
			fansDelete.deleteColumn(Bytes.toBytes("fans"), Bytes.toBytes(uid));
			deletes.add(fansDelete);
		}
		
		deletes.add(attendDelete);
		relationsTBL.delete(deletes);
		
		// c. Delete the unfollowed users' post rowkeys from the inbox table
		HTableInterface recTBL = connection.getTable(TableName.valueOf(TABLE_RECEIVE_CONTENT_EMAIL));
		
		Delete recDelete = new Delete(Bytes.toBytes(uid));
		for(String attend : attends){
			// deleteColumns removes every version of the column; deleteColumn would only
			// remove the newest one, and the inbox keeps up to 1000 versions
			recDelete.deleteColumns(Bytes.toBytes("info"), Bytes.toBytes(attend));
		}
		recTBL.delete(recDelete);
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		// Close the connection, as the other methods do
		if(null != connection){
			try {
				connection.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
}
8.2.9 Getting the posts of followed users
a. Get the RowKeys of the followed users' posts from the Weibo inbox
b. Use those RowKeys to fetch the post content
Code: public List<Message> getAttendsContent(String uid)
/**
 * Get the actual Weibo content
 * a. Get the rowkeys of all posts by followed users from the inbox
 * b. Use the rowkeys to fetch data from the Weibo content table
 * c. Wrap the data in Message objects
 */
public List<Message> getAttendsContent(String uid){
	HConnection connection = null;
	try {
		connection = HConnectionManager.createConnection(conf);
		HTableInterface recTBL = connection.getTable(TableName.valueOf(TABLE_RECEIVE_CONTENT_EMAIL));
		// a. Get the post rowkeys from the inbox
		Get get = new Get(Bytes.toBytes(uid));
		// Read at most the 5 newest versions per column
		get.setMaxVersions(5);
		List<byte[]> rowkeys = new ArrayList<byte[]>();
		Result result = recTBL.get(get);
		for(Cell cell : result.rawCells()){
			rowkeys.add(CellUtil.cloneValue(cell));
		}
		// b. Look up every rowkey in the Weibo content table
		HTableInterface contentTBL = connection.getTable(TableName.valueOf(TABLE_CONTENT));
		List<Get> gets = new ArrayList<Get>();
		// Build one Get per rowkey to fetch the post content
		for(byte[] rk : rowkeys){
			Get g = new Get(rk);
			gets.add(g);
		}
		// Result objects for all the posts
		Result[] results = contentTBL.get(gets);
		
		List<Message> messages = new ArrayList<Message>();
		for(Result res : results){
			for(Cell cell : res.rawCells()){
				Message message = new Message();
				
				String rowKey = Bytes.toString(CellUtil.cloneRow(cell));
				String userid = rowKey.substring(0, rowKey.indexOf("_"));
				String timestamp = rowKey.substring(rowKey.indexOf("_") + 1);
				String content = Bytes.toString(CellUtil.cloneValue(cell));
				
				message.setContent(content);
				message.setTimestamp(timestamp);
				message.setUid(userid);
				
				messages.add(message);
			}
		}
		return messages;
	} catch (IOException e) {
		e.printStackTrace();
	}finally{
		// connection is still null if createConnection failed
		if(null != connection){
			try {
				connection.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
	}
	// On failure return an empty list so that callers can iterate safely
	return new ArrayList<Message>();
}

8.2.10 Testing

-- Test publishing posts
public void testPublishContent(WeiBo wb)
-- Test following users
public void testAddAttend(WeiBo wb)
-- Test unfollowing users
public void testRemoveAttend(WeiBo wb)
-- Test displaying content
public void testShowMessage(WeiBo wb)

Code:

/**
 * Publish posts
 * Follow users
 * Unfollow users
 * Display content
 */
public void testPublishContent(WeiBo wb){
	wb.publishContent("0001", "Bought a bag of air today and got some chips for free, so happy!!");
	wb.publishContent("0001", "Nice weather today.");
}

public void testAddAttend(WeiBo wb){
	wb.publishContent("0008", "Class is about to end!");
	wb.publishContent("0009", "About to shut down!");
	wb.addAttends("0001", "0008", "0009");
}

public void testRemoveAttend(WeiBo wb){
	wb.removeAttends("0001", "0008");
}

public void testShowMessage(WeiBo wb){
	List<Message> messages = wb.getAttendsContent("0001");
	for(Message message : messages){
		System.out.println(message);
	}
}

public static void main(String[] args) {
	WeiBo weibo = new WeiBo();
	// initTable() is not shown in this article; it is assumed to call initNamespace()
	// and the three createTable* methods above
	weibo.initTable();
	weibo.testPublishContent(weibo);
	weibo.testAddAttend(weibo);
	weibo.testShowMessage(weibo);
	weibo.testRemoveAttend(weibo);
	weibo.testShowMessage(weibo);
}

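Chapter 9 Extensions

9.1 HBase's Capabilities in Commercial Projects

Per day:

Message volume: more than 6 billion messages sent and received
Nearly 100 billion reads and writes
About 1.5 million operations per second at peak
Overall, reads account for about 55% and writes for 45%
More than 2 PB of data, 6 PB in total including replication
Data grows by roughly 300 gigabytes per month.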

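9.2 Bloom Filter

In everyday life, and when designing software, we often need to decide whether an element belongs to a set. A word processor has to check whether an English word is spelled correctly (that is, whether it is in a known dictionary); the FBI has to check whether a suspect's name is already on a watch list; a web crawler has to check whether a URL has already been visited. The most direct approach is to store every element of the set and compare each new element against them. Sets are usually stored in a hash table, which is fast and exact but costs a lot of space. For a small set this hardly matters, but for a huge set the low storage efficiency of a hash table becomes a real problem.
Take a public email provider such as Yahoo, Hotmail, or Gmail, which must always filter out mail from spammers. One approach is to record the email addresses that send spam. Because spammers keep registering new addresses, there are at least several billion spam addresses worldwide, and storing them all would take a large number of servers. With a hash table, every 100 million email addresses need 1.6 GB of memory: each address is mapped to an eight-byte information fingerprint (googlechinablog.com/2006/08/blog-post.html), the fingerprints are stored in the hash table, and because a hash table is typically only about 50% full, each address takes up sixteen bytes, so 100 million addresses need about 1.6 GB, that is, 1.6 billion bytes. Storing several billion addresses may therefore need hundreds of GB of memory, which an ordinary server, as opposed to a supercomputer, cannot hold.
A Bloom filter solves the same problem with only 1/8 to 1/4 of the space of a hash table.
A Bloom filter is a highly space-efficient randomized data structure that represents a set compactly with a bit array and can test whether an element belongs to it. This efficiency has a price: when testing membership, it may mistake an element that is not in the set for one that is (a false positive). Bloom filters are therefore unsuitable for applications that require zero errors. Where a low error rate is tolerable, a Bloom filter trades a small number of errors for a very large saving in storage.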
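
A minimal Bloom filter can be sketched in a few lines of Java. This is an illustration under stated assumptions rather than HBase's own implementation: the class name `BloomFilterSketch`, the FNV-style hash with two arbitrary seeds, and the sizes used in `main` are all choices made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class BloomFilterSketch {
	private final BitSet bits;
	private final int size;
	private final int hashCount;

	BloomFilterSketch(int size, int hashCount) {
		if (size <= 0 || hashCount <= 0) {
			throw new IllegalArgumentException("size and hashCount must be positive");
		}
		this.size = size;
		this.hashCount = hashCount;
		this.bits = new BitSet(size);
	}

	// FNV-1a style hash over the bytes, starting from the given seed
	private static int hash(byte[] data, int seed) {
		int h = seed;
		for (byte b : data) {
			h ^= (b & 0xff);
			h *= 16777619;
		}
		return h;
	}

	// Derive the i-th bit position from two base hashes (double hashing)
	private int index(String item, int i) {
		byte[] data = item.getBytes(StandardCharsets.UTF_8);
		int h1 = hash(data, 0x811C9DC5);
		int h2 = hash(data, 0x9E3779B9);
		return Math.floorMod(h1 + i * h2, size);
	}

	// Adding an item sets hashCount bits
	void add(String item) {
		for (int i = 0; i < hashCount; i++) {
			bits.set(index(item, i));
		}
	}

	// false means "definitely not in the set"; true means "probably in the set"
	boolean mightContain(String item) {
		for (int i = 0; i < hashCount; i++) {
			if (!bits.get(index(item, i))) {
				return false;
			}
		}
		return true;
	}

	public static void main(String[] args) {
		BloomFilterSketch filter = new BloomFilterSketch(1 << 16, 3);
		filter.add("spammer@example.com");
		System.out.println(filter.mightContain("spammer@example.com"));
	}
}
```

An added element is always reported as present, since all of its bits are set; false positives arise only when other elements happen to have set every bit that a missing element maps to.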

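9.3 New Features in HBase 2.0

At around 2 a.m. on August 22, 2017, HBase released 2.0.0 alpha-2, which includes about 500 fixes compared with the previous release. Let's look at the new features of HBase 2.0.
Latest documentation:
http://hbase.apache.org/book.html#ttl
Official release announcement: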
http://mail-archives.apache.org/mod_mbox/www-announce/201708.mbox/<CADcMMgFzmX0xYYso-UAYbU7V8z-Obk1J4pxzbGkRzbP5Hps+iA@mail.gmail.com
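Example:

Regions are replicated in multiple copies
The primary region handles reads and writes. Secondary regions are hosted on other HRegionServers; they serve reads and synchronize from the primary region. If synchronization lags, a client may read stale data from a secondary region (because the primary has not yet flushed the changes held in its MemStore).
More changes are listed at: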
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340859&styleName=&projectId=12310753&Create=Create&atl_token=A5KQ-2QAV-T4JA-FDED%7Ce6f233490acdf4785b697d4b457f7adb0a72b69f%7Clout

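I have been rather busy lately. I find that when I feel tired I learn a lot and spot many problems, and that feeling is the right one: pressure brings motivation, and finding problems means steady progress.
It reminds me of a famous saying by Comrade Mao Zedong: "In times of difficulty our comrades must see the achievements, see the bright future, and pluck up our courage."
Let's keep at it together and grow stronger together!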
