1. Snapshots
Create a simple table in HBase for testing:
hbase(main):044:0> scan 'student'
ROW          COLUMN+CELL
 num1        column=shuxing:name, timestamp=1412189531346, value=jaybing
 num2        column=shuxing:name, timestamp=1412189623682, value=jaychou
 num3        column=shuxing:like, timestamp=1412189669404, value=game
3 row(s) in 0.0260 seconds

Create a snapshot of this table:
hbase(main):045:0> snapshot 'student','snapshot_student'
0 row(s) in 1.2620 seconds

[root@nn ~]# hadoop fs -ls /tmpdir/
Found 9 items
drwxr-xr-x   - root supergroup          0 2014-10-02 02:58 /tmpdir/.hbase-snapshot
drwxr-xr-x   - root supergroup          0 2014-10-01 21:48 /tmpdir/.tmp
drwxr-xr-x   - root supergroup          0 2014-10-01 21:37 /tmpdir/WALs
drwxr-xr-x   - root supergroup          0 2014-10-02 02:42 /tmpdir/archive
drwxr-xr-x   - root supergroup          0 2014-09-28 00:42 /tmpdir/corrupt
drwxr-xr-x   - root supergroup          0 2014-09-26 11:20 /tmpdir/data
-rw-r--r--   2 root supergroup         42 2014-09-26 11:20 /tmpdir/hbase.id
-rw-r--r--   2 root supergroup          7 2014-09-26 11:20 /tmpdir/hbase.version
drwxr-xr-x   - root supergroup          0 2014-10-02 02:48 /tmpdir/oldWALs
[root@nn ~]# hadoop fs -ls /tmpdir/.hbase-snapshot
Found 2 items
drwxr-xr-x   - root supergroup          0 2014-10-02 02:58 /tmpdir/.hbase-snapshot/.tmp
drwxr-xr-x   - root supergroup          0 2014-10-02 02:58 /tmpdir/.hbase-snapshot/snapshot_student

The snapshot_student directory should be where the snapshot is stored (a snapshot records metadata and references to the table's HFiles rather than a second copy of the data). Now drop the student table to simulate data-file corruption:
hbase(main):061:0> disable 'student'
0 row(s) in 2.0310 seconds

hbase(main):062:0> is_enabled 'student'
false
0 row(s) in 0.0800 seconds

hbase(main):063:0> drop 'student'
0 row(s) in 0.1940 seconds

hbase(main):064:0> list
TABLE
0 row(s) in 0.0200 seconds

=> []
Restore the table from the snapshot:
hbase(main):070:0> restore_snapshot 'snapshot_student'
0 row(s) in 6.4950 seconds

hbase(main):071:0> scan 'student'
ROW          COLUMN+CELL
 num1        column=shuxing:name, timestamp=1412189531346, value=jaybing
 num2        column=shuxing:name, timestamp=1412189623682, value=jaychou
 num3        column=shuxing:like, timestamp=1412189669404, value=game
3 row(s) in 0.2190 seconds

Note: a snapshot only captures the table's data as of the moment it was taken. Data written after that point is not covered by the snapshot.
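To inspect a snapshot without overwriting the current table, one option is to clone it into a separate table. The sketch below feeds a few HBase shell commands (list_snapshots, clone_snapshot, scan) to `hbase shell` on standard input. The clone name student_clone and the HBASE_RUN switch are assumptions introduced here for illustration and are not part of the session above; by default the script only prints the commands it would send.

```shell
#!/bin/sh
# Snapshot name taken from the session above.
SNAPSHOT="snapshot_student"
# Hypothetical name for the cloned table (assumption, not from the original).
CLONE="student_clone"

# HBase shell commands: list snapshots, clone the snapshot into a new table,
# then scan the clone to check its contents.
HBASE_CMDS=$(cat <<EOF
list_snapshots
clone_snapshot '$SNAPSHOT', '$CLONE'
scan '$CLONE'
EOF
)

# Dry run by default; set HBASE_RUN=1 on a machine where `hbase` is on PATH.
if [ "${HBASE_RUN:-0}" = "1" ]; then
  printf '%s\n' "$HBASE_CMDS" | hbase shell
else
  printf '%s\n' "$HBASE_CMDS"
fi
```

Because the clone is a new table, the original student table (if it still exists) is left untouched, which makes this a cautious way to check what a snapshot contains before restoring it.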
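2. Exporting a table (Export)
HBase's Export tool is a built-in utility that makes it easy to dump a table from HBase into SequenceFiles under an HDFS directory. It launches a MapReduce job that uses the HBase API to read every row of the specified table from the cluster and writes the data to the specified HDFS directory.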
........
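As a rough sketch of how the Export tool is typically invoked, the script below builds the command line for the Export MapReduce job and for its counterpart, Import. The output directory /backup/student_export and the HBASE_RUN switch are assumptions made for illustration; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Table from the earlier example; the HDFS directory is a hypothetical path.
TABLE="student"
EXPORT_DIR="/backup/student_export"

# Print the command by default; run it only when HBASE_RUN=1 is set.
run() {
  if [ "${HBASE_RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Export: MapReduce job that writes the table's rows as SequenceFiles.
# The output directory is expected not to exist before the job starts.
run hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$EXPORT_DIR"

# Import: loads the SequenceFiles back into a table that already exists.
run hbase org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$EXPORT_DIR"
```

Import does not create the target table, so when recovering a dropped table it would need to be recreated (with the shuxing column family in this example) before the Import job is run.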
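3. Copying a table (CopyTable)
HBase's CopyTable tool is similar to Export: it also uses the HBase API to create a MapReduce job that reads from the source table. The difference is that the output of the copy is another HBase table, which can live on the local cluster or on a remote cluster.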
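The two cases described above (a copy within the local cluster and a copy to a remote cluster) might look roughly like the following. The destination table student_copy, the ZooKeeper address in PEER_ZK, and the HBASE_RUN switch are all hypothetical values chosen for illustration; option names can differ between HBase versions, so check the tool's usage output on your installation. By default the script only prints the commands.

```shell
#!/bin/sh
SRC_TABLE="student"
# Hypothetical destination table; it must already exist with the same
# column families as the source (here: shuxing).
DST_TABLE="student_copy"
# Hypothetical remote cluster address, in the form
# <zookeeper quorum>:<client port>:<znode parent>.
PEER_ZK="zk1,zk2,zk3:2181:/hbase"

# Print the command by default; run it only when HBASE_RUN=1 is set.
run() {
  if [ "${HBASE_RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Copy into a differently named table on the same cluster.
run hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --new.name="$DST_TABLE" "$SRC_TABLE"

# Copy into a table of the same name on a remote cluster.
run hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  --peer.adr="$PEER_ZK" "$SRC_TABLE"
```

Unlike Export, no intermediate files are written to HDFS: rows read from the source table are written straight into the destination table.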