HBase/Phoenix
HBase terminology
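A region holds part of a table's data, i.e., a subset of all its rows. A region always contains complete rows, so it is a row-based subset of the table. The region is the basic unit of scaling and load balancing in HBase. Each region is defined by three elements:
- the table it belongs to;
- the first row it contains (the first region has no first row);
- the last row it contains (the last region has no last row).
From the HMaster's point of view, each HRegion records its StartKey and EndKey. The first HRegion's StartKey is empty and the last HRegion's EndKey is empty; StartKey is inclusive and EndKey is exclusive, so each region holds the rows in [StartKey, EndKey).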
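Because regions partition the sorted row-key space by StartKey, finding the region that holds a row is a binary search over the start keys. The sketch below illustrates that idea only; the four-region table and its split points are hypothetical, and this is not HBase's client code (which looks regions up in hbase:meta).

```python
from bisect import bisect_right

# Hypothetical table split into four regions, listed by StartKey.
# Region i covers [START_KEYS[i], START_KEYS[i + 1]); the first StartKey
# is empty and the last region has an empty (unbounded) EndKey.
START_KEYS = [b"", b"g", b"n", b"t"]


def region_for_row(row_key: bytes, start_keys=START_KEYS) -> int:
    """Return the index of the region whose key range contains row_key."""
    # bisect_right finds the first StartKey strictly greater than row_key;
    # the region just before that position holds the row.
    return bisect_right(start_keys, row_key) - 1


def region_bounds(index: int, start_keys=START_KEYS) -> tuple:
    """Return (StartKey, EndKey) for a region; an empty EndKey means no upper bound."""
    start = start_keys[index]
    end = start_keys[index + 1] if index + 1 < len(start_keys) else b""
    return start, end


if __name__ == "__main__":
    for key in [b"apple", b"g", b"melon", b"zebra"]:
        idx = region_for_row(key)
        print(key, "-> region", idx, region_bounds(idx))
```

Note that b"g" lands in region 1, not region 0, because a StartKey is inclusive while the previous region's EndKey is exclusive.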
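HBase (Hadoop Database) is an open-source, column-oriented, distributed storage system suited to storing massive amounts of unstructured or semi-structured data. It is highly reliable, performs well, scales out flexibly, and supports real-time reads and writes.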
There is one MemStore per column family (CF) in each region; when one MemStore is full, all of the region's MemStores flush. Each flush also saves the last written sequence number, so the system knows what has been persisted so far.
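A minimal simulation of the flush-all behavior and the persisted sequence number described above. The cell-count threshold is a hypothetical stand-in for HBase's byte-based flush size, and the class is an illustrative model, not HBase's HRegion.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: real HBase flushes on MemStore size in bytes
# (hbase.hregion.memstore.flush.size); cells are counted here for simplicity.
FLUSH_THRESHOLD = 4


@dataclass
class Region:
    families: tuple
    memstores: dict = field(default_factory=dict)  # family -> [(seq_id, cell), ...]
    hfiles: dict = field(default_factory=dict)     # family -> [flushed cell lists]
    next_seq_id: int = 1
    flushed_seq_id: int = 0  # last sequence number known to be persisted

    def __post_init__(self):
        for family in self.families:
            self.memstores[family] = []
            self.hfiles[family] = []

    def put(self, family, cell):
        # Every edit gets the next sequence number before it enters a MemStore.
        seq_id = self.next_seq_id
        self.next_seq_id += 1
        self.memstores[family].append((seq_id, cell))
        if len(self.memstores[family]) >= FLUSH_THRESHOLD:
            self.flush_all()

    def flush_all(self):
        """One full MemStore triggers a flush of every MemStore in the region."""
        for family, cells in list(self.memstores.items()):
            if cells:
                self.hfiles[family].append([cell for _, cell in cells])
                self.flushed_seq_id = max(self.flushed_seq_id, cells[-1][0])
            self.memstores[family] = []
```

For example, with families ("a", "b"), one put to "b" followed by four puts to "a" fills "a" and flushes both MemStores, leaving flushed_seq_id at 5.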
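A Region has one or more MemStores (one per column family).
Column family: every column in an HBase table belongs to a column family. Column families are part of the table schema (columns are not) and must be defined before the table is used. Every column name is prefixed with its column family; for example, courses:history and courses:math both belong to the courses column family.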
Column families are stored in separate files.
Column qualifier: data within a column family is addressed via its column qualifier, or column. Column qualifiers need not be specified in advance, and they need not be consistent between rows. Like row keys, column qualifiers have no data type and are always treated as a byte[]. A full column name is the column family prefix and the qualifier joined by a colon (family:qualifier).
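Since both parts are plain byte arrays joined by a colon, a full column name can be built and split as below. This is an illustrative sketch; it relies on the fact that family names cannot contain ':' while qualifiers are arbitrary bytes, so only the first colon is a separator.

```python
def make_column(family: bytes, qualifier: bytes) -> bytes:
    """Join a column family and a qualifier into a full 'family:qualifier' name."""
    return family + b":" + qualifier


def split_column(column: bytes) -> tuple:
    """Split a full column name at the first colon into (family, qualifier).

    Qualifiers are arbitrary bytes and may themselves contain ':', so only the
    first colon separates the family from the qualifier.
    """
    family, sep, qualifier = column.partition(b":")
    if not sep or not family:
        raise ValueError(f"not a family:qualifier column name: {column!r}")
    return family, qualifier
```

For example, split_column(b"courses:math") returns (b"courses", b"math").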
Row key: internally, HBase stores the row key as a byte array (byte[]), and rows are kept sorted by row key in byte order.
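Byte-order sorting has a practical consequence for key design: numeric strings do not sort numerically unless they are fixed-width. The sketch below shows the ordering; the zero-padding helper is hypothetical, one common way to make byte order match numeric order.

```python
def sort_row_keys(keys):
    """Order row keys as HBase does: lexicographically, comparing unsigned bytes."""
    # Python compares bytes objects element by element as unsigned 0..255 values.
    return sorted(keys)


def padded_key(prefix: str, n: int, width: int = 5) -> bytes:
    """Hypothetical helper: zero-pad a number so byte order matches numeric order."""
    return f"{prefix}{n:0{width}d}".encode("ascii")


if __name__ == "__main__":
    print(sort_row_keys([b"row10", b"row2", b"row1"]))
    print(sort_row_keys([padded_key("row", n) for n in (10, 2, 1)]))
```

Unpadded, b"row10" sorts between b"row1" and b"row2"; padded keys sort as b"row00001", b"row00002", b"row00010".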
Auto-sharding: tables are dynamically split into regions and distributed by the database when they become too large.
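A toy model of auto-sharding: keep splitting any region that grows past a limit, with the split key becoming the second daughter's StartKey. This is an assumption-laden sketch; real HBase decides on size (hbase.hregion.max.filesize and the configured split policy) and picks a midkey from store files, whereas here the limit is a hypothetical row count.

```python
def split_region(sorted_keys, max_rows):
    """Split an oversized region into two daughters at its middle row key.

    Returns (daughters, split_key); split_key becomes the second daughter's StartKey.
    """
    if len(sorted_keys) <= max_rows:
        return [sorted_keys], None
    mid = len(sorted_keys) // 2
    return [sorted_keys[:mid], sorted_keys[mid:]], sorted_keys[mid]


def auto_shard(sorted_keys, max_rows):
    """Keep splitting until every region holds at most max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    pending, done = [sorted_keys], []
    while pending:
        region = pending.pop()
        daughters, _ = split_region(region, max_rows)
        if len(daughters) == 1:
            done.append(region)
        else:
            pending.extend(daughters)
    # Regions are non-empty and non-overlapping, so sorting orders them by key range.
    return sorted(done)
```

With ten rows and a limit of three, the table ends up in four regions of 2, 3, 2 and 3 rows.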
HBase can handle petabytes of data and is designed for fast, random, real-time reads and writes over massive data sets.
A region is further composed of one Store per column family. A Store has an in-memory component called the MemStore and persistent storage in HFiles (also called StoreFiles). A Store therefore holds one column family's data for the region's contiguous range of row keys.
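The region/Store/MemStore/HFile layering can be sketched as follows: writes land in the MemStore, a flush turns it into an immutable file, and a read must consult the MemStore plus every file, keeping the newest timestamp. This is a simplified illustration under stated assumptions (one version per cell, dicts standing in for HFiles, hypothetical family names), not HBase's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Simplified Store: one column family's data within one region."""
    memstore: dict = field(default_factory=dict)  # (row, qualifier) -> (ts, value)
    hfiles: list = field(default_factory=list)    # immutable snapshots, same layout

    def put(self, row, qualifier, ts, value):
        key = (row, qualifier)
        current = self.memstore.get(key)
        if current is None or ts >= current[0]:
            self.memstore[key] = (ts, value)

    def flush(self):
        """Write the MemStore out as a new immutable 'HFile' and start a fresh one."""
        if self.memstore:
            self.hfiles.append(dict(self.memstore))
            self.memstore = {}

    def get(self, row, qualifier):
        """Read from the MemStore and every HFile; the newest timestamp wins."""
        key = (row, qualifier)
        found = [src[key] for src in [self.memstore, *self.hfiles] if key in src]
        return max(found, key=lambda tv: tv[0])[1] if found else None


# A region keeps one Store per column family (family names are hypothetical).
region = {b"info": Store(), b"data": Store()}
```

A late write with an older timestamp does not hide a newer value that was already flushed, because the read merges all sources.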
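HBase sizing: capacity planning for an HBase cluster. Catalog table: the system table (hbase:meta) that records every region of every table and where it is hosted.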
Message digests such as MD5 have a very low risk of collisions, but the risk still exists.
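To put "very low" into numbers, the birthday bound approximates the chance of at least one collision among n digests, for example when digests are used as row keys or key prefixes. The sketch assumes the digest behaves like a uniform random function, which is an idealization; the key counts are hypothetical.

```python
import math


def collision_probability(num_keys: int, hash_bits: int) -> float:
    """Birthday-bound approximation of P(at least one collision).

    Assumes the hash behaves like a uniform random function over 2**hash_bits values.
    """
    space = 2.0 ** hash_bits
    exponent = num_keys * (num_keys - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(collision_probability(10**9, 128))   # full 128-bit MD5, a billion keys
    print(collision_probability(100_000, 32))  # only a 4-byte prefix of the digest
```

With the full 128 bits and a billion keys the probability is on the order of 1e-21, but keeping only a 32-bit prefix makes a collision among 100,000 keys more likely than not (about 0.69).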
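To connect the HBase shell to a remote HBase cluster, you only need to change the ZooKeeper settings: point hbase.zookeeper.quorum in the client's hbase-site.xml at the remote cluster's ZooKeeper ensemble.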
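A small helper that writes the minimal client-side hbase-site.xml for this. It is a sketch: the host names are hypothetical, port 2181 is ZooKeeper's usual default, and zookeeper.znode.parent is only needed if the remote cluster does not use the default /hbase znode.

```python
import xml.etree.ElementTree as ET


def client_site_xml(zk_quorum, zk_port=2181, znode_parent=None):
    """Build a minimal client-side hbase-site.xml that points at a remote ZooKeeper ensemble."""
    props = [
        ("hbase.zookeeper.quorum", zk_quorum),
        ("hbase.zookeeper.property.clientPort", str(zk_port)),
    ]
    if znode_parent:  # only if the remote cluster does not use the default /hbase
        props.append(("zookeeper.znode.parent", znode_parent))
    config = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host names; replace with the remote cluster's ZooKeeper hosts.
    print(client_site_xml("zk1.example.com,zk2.example.com,zk3.example.com"))
```

Save the output as hbase-site.xml in the client's configuration directory (for example the one HBASE_CONF_DIR points to) before starting hbase shell.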
In the land of Hadoop/HBase, RAM is king: generally, the more RAM you can throw at the problem, the better. In the Hadoop/HBase deployments I've managed in the past, 24 GB to 36 GB per node was the norm, and 16 GB was the minimum.
The HBase Master (HMaster) has a small set of responsibilities that do not require much memory. HMaster handles administrative tasks such as assigning regions, updating the meta table, and executing DDL statements. Clients do not interact with HMaster when they read, scan, or write data.
You can usually reduce the HMaster heap size to 4 GB (for example, with -Xmx4g in HBASE_MASTER_OPTS in conf/hbase-env.sh).
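HBase MVCC (Multi-Version Concurrency Control): a concurrency-control protocol widely used in database systems. In HBase it lets readers see a consistent view of a row without blocking concurrent writers.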
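The core MVCC idea can be sketched with write numbers and a read point: each write gets a number, and the read point only advances over a contiguous run of completed writes, so readers never see a later write while an earlier one is still in flight. This is loosely modeled on that idea and is not HBase's actual class.

```python
class Mvcc:
    """Simplified MVCC: readers only see writes at or below the read point."""

    def __init__(self):
        self.next_write = 1
        self.read_point = 0
        self.completed = set()  # finished writes not yet covered by read_point

    def begin_write(self) -> int:
        number = self.next_write
        self.next_write += 1
        return number

    def complete_write(self, number: int) -> None:
        self.completed.add(number)
        # Advance only across a contiguous run of completed writes.
        while self.read_point + 1 in self.completed:
            self.read_point += 1
            self.completed.remove(self.read_point)

    def visible(self, write_number: int) -> bool:
        return write_number <= self.read_point
```

If write 2 finishes before write 1, the read point stays at 0 until write 1 completes, then jumps to 2 and both become visible at once.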
What exactly is an HBase “block”?
In the HBase context, a block is a single unit of I/O. When HBase writes data out to an HFile, the block is the smallest unit of data written; likewise, a single block is the smallest amount of data HBase can read back out of an HFile. Be careful not to confuse an HBase block with an HDFS block or with the blocks of the underlying file system; these are all different.
Run HBase commands from a script:
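$HBASE_HOME/bin/hbase shell $PATH_TO_SCRIPTS/hbase-create.hbase
(End the script with exit; otherwise the shell stays open after running it.)
Move a region with the HBase 'move' command; the target RegionServer is named as host,port,startcode.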
Get a RegionServer's startcode:
status 'simple'
List the regions of an HBase table from the shell (rows in hbase:meta that start with the table name):
scan 'hbase:meta',{FILTER=>"PrefixFilter('TraceV2')"}
move '6f6b83192cbaf4ddf13602683b71386c','apm-datanode01,16020,1498020609021'
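The move command takes the encoded region name (the hex suffix of a full region name as listed in hbase:meta) and a server name of the form host,port,startcode, where the startcode is the RegionServer's start time in epoch milliseconds. The helpers below are a sketch for assembling that command; the full region name in the example is hypothetical.

```python
from datetime import datetime, timezone


def parse_server_name(server_name):
    """Split 'host,port,startcode'; the startcode is the RegionServer's start time in epoch ms."""
    host, port, startcode = server_name.split(",")
    started = datetime.fromtimestamp(int(startcode) / 1000, tz=timezone.utc)
    return host, int(port), int(startcode), started


def encoded_region_name(region_name):
    """Return the encoded name, the hash after the last '.' in 'table,startkey,id.ENCODED.'."""
    return region_name.rstrip(".").rsplit(".", 1)[-1]


def move_command(region_name, server_name):
    """Build the shell 'move' command for a full region name and a target server."""
    return f"move '{encoded_region_name(region_name)}', '{server_name}'"


if __name__ == "__main__":
    # Hypothetical full region name as it might appear in a hbase:meta scan.
    region = "TraceV2,,1497000000000.6f6b83192cbaf4ddf13602683b71386c."
    server = "apm-datanode01,16020,1498020609021"
    print(parse_server_name(server))
    print(move_command(region, server))
```

For the server above, the startcode decodes to a start time on 2017-06-21 (UTC).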
More Commands:
Start the HBase REST server listening on an available port, for example 9080:
./hbase-daemon.sh start rest -p 9080
Determine the size of HBase tables (each directory under /hbase/data/default is one table in the default namespace):
$HADOOP_HOME/bin/hdfs dfs -du -h /hbase/data/default
Get one row (use double quotes so the shell interprets the \x escapes in a binary row key):
get "AgentInfo","i-0357ef7620c546981\x00\x00\x00\x00\x00\x7F\xFF\xFE\xA32B4\x09"
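Binary row keys like the one above are written as printable ASCII with \xNN escapes for every other byte. The sketch below converts between that text form and raw bytes; it assumes the shell's rendering (as in HBase's Bytes.toStringBinary) keeps printable ASCII except the backslash and escapes everything else with two uppercase hex digits.

```python
def to_string_binary(data: bytes) -> str:
    """Render bytes as printable ASCII as-is and every other byte as \\xNN."""
    return "".join(
        chr(b) if 0x20 <= b <= 0x7E and b != 0x5C else f"\\x{b:02X}" for b in data
    )


def from_string_binary(text: str) -> bytes:
    """Parse that form back into raw bytes; each \\x escape takes exactly two hex digits."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text.startswith("\\x", i) and i + 4 <= len(text):
            out.append(int(text[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


if __name__ == "__main__":
    key = r"i-0357ef7620c546981\x00\x00\x00\x00\x00\x7F\xFF\xFE\xA32B4\x09"
    raw = from_string_binary(key)
    print(len(raw), raw)
    print(to_string_binary(raw) == key)
```

The example key decodes to 32 raw bytes; note that \xA32B4 is the single byte 0xA3 followed by the literal characters 2, B and 4.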
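List the row keys of a table (count prints the running count and the current row key every INTERVAL rows, so INTERVAL => 1 prints every key):
count 'table_name', INTERVAL => 1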
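To turn that output into a plain list of keys, capture it (for example with echo "count 'table_name', INTERVAL => 1" | hbase shell > counts.txt) and extract the key from each progress line. The sketch assumes progress lines look like "Current count: N, row: KEY"; check this against your HBase version's output before relying on it.

```python
import re

# Assumed line format printed by the shell's count command every INTERVAL rows.
COUNT_LINE = re.compile(r"^Current count: (\d+), row: (.*)$")


def row_keys_from_count_output(output: str) -> list:
    """Collect row keys from captured 'count ..., INTERVAL => 1' shell output."""
    keys = []
    for line in output.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            keys.append(match.group(2))
    return keys
```

Summary lines such as the final row total do not match the pattern and are skipped.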