Getting Started with Cassandra Operations

Source: http://blog.csdn.net/fenglibing/article/details/9411021

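1. What is Cassandra

Apache Cassandra is an open-source distributed NoSQL database system. It was originally developed at Facebook to store simply structured data such as inbox contents, and it combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008. Since then, thanks to its good scalability, it has been adopted by well-known Web 2.0 sites such as Digg and Twitter and has become a popular distributed store for structured data.

For details, see: http://zh.wikipedia.org/wiki/Cassandra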

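2. Downloading, installing, and running the server and client

Download: http://cassandra.apache.org/download/

Installation: Cassandra is written in Java, so in principle it runs on any machine with JDK 6 or later. The officially tested JDKs are OpenJDK and Sun's JDK.

Running the server: on Windows no files need to be modified; just run bin/cassandra.bat.

On Linux, if you do not modify the configuration file, make sure the directories "/var/log/cassandra" and "/var/lib/cassandra" exist and that the user running Cassandra has permission to write to them, then run bin/cassandra.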
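The directory requirement can be checked from Java before starting the server. This is only a minimal sketch: the class name DirCheck and the method usable are made up for illustration, and the two paths are simply the defaults named above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class DirCheck {
    // True when the path exists, is a directory, and is writable by the
    // user running this program.
    static boolean usable(Path dir) {
        return Files.isDirectory(dir) && Files.isWritable(dir);
    }

    public static void main(String[] args) {
        // Default Linux locations mentioned in the text.
        String[] required = {"/var/log/cassandra", "/var/lib/cassandra"};
        for (String name : required) {
            Path dir = Paths.get(name);
            System.out.println(name + (usable(dir) ? " is ready" : " is missing or not writable"));
        }
    }
}
```

Run it as the same user that will start Cassandra, since writability depends on who is asking.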

Running the client:

On Windows run bin/cassandra-cli.bat; on Linux run bin/cassandra-cli. If there are no errors and a prompt like the following appears, the connection succeeded:

[default]

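3. Configuration files

conf/cassandra.yaml: this is the core configuration file. It covers the various strategies and the locations where data, logs, and cached data are stored, for example the data file setting "data_file_directories". Because we started Cassandra above without changing anything, the default log and data directories are:

Windows:

A directory named var is created at the root of the drive Cassandra runs from. Beneath it, the lib directory holds the data and the log directory holds the logs.

Linux:

Logs and data are stored in "/var/log/cassandra" and "/var/lib/cassandra" respectively.

See the file itself for the remaining settings.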

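4. Examples

4.1 Introduction

Cassandra's commands resemble those of the relational databases we use every day. Anyone familiar with MySQL will recognize them: create to create, drop to delete, update to update, show to list objects, and use to select a keyspace. They are easy to remember. If this is your first time, it is still advisable to read the official getting-started document: http://wiki.apache.org/cassandra/GettingStarted.

4.2 Creating a keyspace

Cassandra's storage abstractions map onto those of a relational database: a keyspace corresponds to a database or schema, and a column family corresponds to a table. So, just as with a relational database, the first step after connecting is to create a keyspace (note: if you are unsure how a command works, type help, which shows the usage of most commands):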

create keyspace myspace
  with placement_strategy='org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options={replication_factor:1};

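The first line is straightforward: it creates a keyspace named myspace. The second line sets the replica placement strategy, and the third line gives the options for that strategy. There are three placement strategies: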

org.apache.cassandra.locator.SimpleStrategy
org.apache.cassandra.locator.NetworkTopologyStrategy
org.apache.cassandra.locator.OldNetworkTopologyStrategy

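SimpleStrategy is meant for multiple storage nodes within a single data center. The replication_factor in strategy_options is the total number of replicas of each piece of data across those nodes. Replicas are placed by walking clockwise around the ring of nodes in the data center;

NetworkTopologyStrategy handles multiple data centers. It guards against all nodes in one data center failing at the same time, for example in a power outage;

OldNetworkTopologyStrategy is rarely needed now. Its support for the number of data centers and replicas is limited, and NetworkTopologyStrategy makes it unnecessary.

For details, see: http://www.datastax.com/docs/1.0/cluster_architecture/replication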
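The clockwise selection described for SimpleStrategy can be illustrated in a few lines of Java. This is a simplified sketch rather than Cassandra's actual implementation: the class SimpleStrategySketch, the node names, and the small token values are all invented for the example, and each node is assumed to hold exactly one token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class SimpleStrategySketch {
    // Ring positions (tokens) mapped to node names, kept sorted by token.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    // Starts at the first node whose token is >= the key's token, then keeps
    // walking clockwise (wrapping past the largest token) until
    // replicationFactor nodes have been collected.
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey(); // wrap around the ring
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        SimpleStrategySketch sketch = new SimpleStrategySketch();
        sketch.addNode(0L, "node-a");   // hypothetical tokens and names
        sketch.addNode(100L, "node-b");
        sketch.addNode(200L, "node-c");
        System.out.println(sketch.replicasFor(150L, 2));
    }
}
```

With replication_factor 1, as in the keyspace created above, only the first node found on the walk stores the data.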

4.3 Creating a column family

First select the keyspace we just created:

use myspace;

Then create the column family:

create column family mycolumn
  with key_validation_class = 'UTF8Type'
  and comparator = 'UTF8Type'
  and default_validation_class = 'UTF8Type';

4.4 Inserting and retrieving data

Insert data:

set mycolumn[1][name1]=tom;

Retrieve data:

get mycolumn[1];

The output looks like this:

[default@myspace] get mycolumn[1];
=> (name=name1, value=tom, timestamp=1374485996562000)
Returned 1 results.
Elapsed time: 7.99 msec(s).
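To make the set and get commands more concrete, a column family can be pictured as a map from row key to a sorted map of columns, each column carrying a value and a write timestamp. The following is a mental model only, not how Cassandra stores data on disk; the class ColumnFamilySketch and its members are invented for illustration, and the timestamp is assumed to be in microseconds, which matches the magnitude of the value in the output above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ColumnFamilySketch {
    // One stored column: its value plus the time it was written.
    static final class Column {
        final String value;
        final long timestampMicros;

        Column(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    // Row key -> columns sorted by column name (the role of the comparator).
    private final Map<String, TreeMap<String, Column>> rows = new HashMap<>();

    // Rough counterpart of: set mycolumn[rowKey][columnName]=value;
    void set(String rowKey, String columnName, String value) {
        long nowMicros = System.currentTimeMillis() * 1000L;
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(columnName, new Column(value, nowMicros));
    }

    // Rough counterpart of: get mycolumn[rowKey];
    Map<String, Column> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch mycolumn = new ColumnFamilySketch();
        mycolumn.set("1", "name1", "tom");
        mycolumn.get("1").forEach((name, col) ->
            System.out.println("=> (name=" + name + ", value=" + col.value
                + ", timestamp=" + col.timestampMicros + ")"));
    }
}
```

Setting the same row key and column name again simply overwrites the column, which is why the Hector example further on uses an insert call for its update step.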

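4.5 Working with Cassandra from Java

Hector is a good choice and is fully open source. Its source code is on GitHub: https://github.com/rantav/hector. Below is a Hector-based CRUD example; the packages it depends on can be found in Cassandra's lib directory: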

package test.cassandra;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.MultigetSliceQuery;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

// Note: this example expects a keyspace "AuthDB" containing a column family
// "authCollection" to exist already (created through the CLI). These are not
// the myspace/mycolumn objects created in the earlier sections.
public class CassandraExample {

  // The string serializer translates the byte[] to and from String using
  // utf-8 encoding
  private static StringSerializer stringSerializer = StringSerializer.get();

  public static void insertData() {
    try {
      // Create a cluster object from your existing Cassandra cluster
      Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

      // Create a keyspace object from the existing keyspace we created
      // using CLI
      Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

      // Create a mutator object for this keyspace using utf-8 encoding
      Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

      // Use the mutator object to insert a column and value pair to an
      // existing key
      mutator.insert("sample", "authCollection", HFactory.createStringColumn("username", "admin"));
      mutator.insert("sample", "authCollection", HFactory.createStringColumn("password", "admin"));

      System.out.println("Data Inserted");
      System.out.println();
    } catch (Exception ex) {
      System.out.println("Error encountered while inserting data!!");
      ex.printStackTrace();
    }
  }

  public static void retrieveData() {
    try {
      // Create a cluster object from your existing Cassandra cluster
      Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

      // Create a keyspace object from the existing keyspace we created
      // using CLI
      Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);
      SliceQuery<String, String, String> sliceQuery =
          HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
      sliceQuery.setColumnFamily("authCollection").setKey("sample");
      sliceQuery.setRange("", "", false, 4);

      QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
      System.out.println("\nInserted data is as follows:\n" + result.get());
      System.out.println();
    } catch (Exception ex) {
      System.out.println("Error encountered while retrieving data!!");
      ex.printStackTrace();
    }
  }

  public static void updateData() {
    try {
      // Create a cluster object from your existing Cassandra cluster.
      // The same name is used as in the other methods so that the Cluster
      // object is reused and later shut down in deleteData().
      Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

      // Create a keyspace object from the existing keyspace we created
      // using CLI
      Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

      // Create a mutator object for this keyspace using utf-8 encoding
      Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

      // Use the mutator object to update a column and value pair to an
      // existing key
      mutator.insert("sample", "authCollection", HFactory.createStringColumn("username", "administrator"));

      // Check if data is updated
      MultigetSliceQuery<String, String, String> multigetSliceQuery =
          HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
      multigetSliceQuery.setColumnFamily("authCollection");
      multigetSliceQuery.setKeys("sample");

      // The 3rd parameter returns the columns in reverse order if true
      // The 4th parameter in setRange determines the maximum number of
      // columns returned per key
      multigetSliceQuery.setRange("username", "", false, 1);
      QueryResult<Rows<String, String, String>> result = multigetSliceQuery.execute();
      System.out.println("Updated data..." + result.get());
    } catch (Exception ex) {
      System.out.println("Error encountered while updating data!!");
      ex.printStackTrace();
    }
  }

  public static void deleteData() {
    try {
      // Create a cluster object from your existing Cassandra cluster
      Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");

      // Create a keyspace object from the existing keyspace we created
      // using CLI
      Keyspace keyspace = HFactory.createKeyspace("AuthDB", cluster);

      // Create a mutator object for this keyspace using utf-8 encoding
      Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

      // Use the mutator object to delete row
      mutator.delete("sample", "authCollection", null, stringSerializer);

      System.out.println("Data Deleted!!");

      // try to retrieve data after deleting
      SliceQuery<String, String, String> sliceQuery =
          HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
      sliceQuery.setColumnFamily("authCollection").setKey("sample");
      sliceQuery.setRange("", "", false, 4);

      QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
      System.out.println("\nTrying to Retrieve data after deleting the key 'sample':\n" + result.get());

      // close connection
      cluster.getConnectionManager().shutdown();
    } catch (Exception ex) {
      System.out.println("Error encountered while deleting data!!");
      ex.printStackTrace();
    }
  }

  public static void main(String[] args) {
    insertData();
    retrieveData();
    updateData();
    deleteData();
  }
}

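5. Building and verifying a multi-node cluster

Cassandra is built on the Gossip protocol, which makes horizontal scaling very convenient: new nodes can be added without restarting the service, and the nodes discover each other automatically. Setting up multiple nodes in a single cluster is therefore simple and takes only a few steps:

5.1 In conf/cassandra.yaml, list the IPs of the existing nodes under "seed_provider". These must be the addresses of the machines' actual network interfaces, not 127.0.0.1 or similar: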

seeds: "192.168.26.128,192.168.2.204"

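The separator is a comma, and several IPs can be listed at once;

5.2 Set listen_address. This is the address on which the node listens for other nodes, and it must be the IP address of the current node's network interface, for example 192.168.26.128;

5.3 Set rpc_address. This is the address on which the node listens for clients. Because a server may have several network interfaces, it can be set to the same value as listen_address, or to 0.0.0.0 to listen on all interfaces.

That completes the configuration of one storage node. To build more nodes, copy the Cassandra installation from this node to each new server and change listen_address and rpc_address to the new node's interface IP. The seeds setting does not need to change.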
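A small sanity check for these settings can be sketched in Java: split the seeds value on commas and reject loopback addresses. The class SeedListCheck and its methods are hypothetical helpers written for this post, not part of Cassandra, and the check only accepts literal IP addresses that Java can parse.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class SeedListCheck {
    // Splits a comma-separated seeds value into trimmed, non-empty entries.
    static List<String> parseSeeds(String seeds) {
        List<String> result = new ArrayList<>();
        for (String part : seeds.split(",")) {
            String entry = part.trim();
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    // True when the address is a loopback address such as 127.0.0.1.
    static boolean isLoopback(String ip) {
        try {
            return InetAddress.getByName(ip).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a usable address: " + ip, e);
        }
    }

    public static void main(String[] args) {
        // Values taken from the configuration shown in this section.
        String seeds = "192.168.26.128,192.168.2.204";
        String listenAddress = "192.168.26.128";

        for (String seed : parseSeeds(seeds)) {
            System.out.println(seed + (isLoopback(seed) ? " is loopback, fix it" : " looks fine"));
        }
        System.out.println("listen_address " + listenAddress
            + (isLoopback(listenAddress) ? " is loopback, fix it" : " looks fine"));
    }
}
```

The same isLoopback check applies to the listen_address of every node the installation is copied to.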

That completes the setup. Next we verify it.

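5.4 Verifying the multi-node cluster

Cassandra ships with a very handy tool, nodetool, which sends commands to Cassandra over JMX and returns the results. At present nodetool can only be run on a node that has a Cassandra environment, because it shares some of Cassandra's own configuration files, such as the log4j configuration. It needs the IP and the JMX port, in the form "nodetool -host <host> -port <JMX_PORT> <command>", for example: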

nodetool -host 192.168.26.128 -port 7199 ring
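When nodetool has to be called from a program, the same command format can be assembled as an argument list. A minimal sketch follows; the class NodetoolCommand is invented for illustration, and it assumes nodetool is on the PATH of the machine where it runs.

```java
import java.util.List;

class NodetoolCommand {
    // Builds the arguments for: nodetool -host <host> -port <JMX_PORT> <command>
    static List<String> build(String host, int jmxPort, String command) {
        return List.of("nodetool", "-host", host, "-port", Integer.toString(jmxPort), command);
    }

    public static void main(String[] args) {
        List<String> cmd = build("192.168.26.128", 7199, "ring");
        // Print the command line instead of executing it.
        System.out.println(String.join(" ", cmd));
    }
}
```

To actually run it, the list can be handed to java.lang.ProcessBuilder, for example new ProcessBuilder(cmd).inheritIO().start().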

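Note: the JMX_PORT variable is set in cassandra-env.sh, where its value is 7199. The Windows configuration files do not appear to set it, so it presumably defaults to 7199.

Commonly used nodetool commands:

ring: shows information about the nodes in the cluster. The name comes from consistent hashing, in which the nodes form a circle, usually called the ring.

The output of ring includes the nodes currently in the cluster, each node's status (Up or Down), its load (amount of data), its position on the ring, and so on.

Sample output: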

Starting NodeTool
Note: Ownership information does not include topology; for complete information, specify a keyspace

Datacenter: datacenter1
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               7160946931665707836
192.168.26.128  rack1       Up     Normal  78.18 KB        43.86%              -3195122621607553968
192.168.2.204   rack1       Up     Normal  81.56 KB        56.14%              7160946931665707836

This sample shows two nodes, both currently Up.
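The Owns percentages in this output can be reproduced from the two tokens. The sketch below assumes the token space runs from -2^63 to 2^63 - 1 (the range used by Murmur3Partitioner; the post does not say which partitioner is configured, but the signed 64-bit tokens are consistent with it) and that each node owns the range from its predecessor's token up to its own. The class RingOwnership is made up for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

class RingOwnership {
    // Size of the assumed token space: 2^64 tokens, from -2^63 to 2^63 - 1.
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(64);

    // Percentage of the ring owned by the node holding `token`, given the
    // token of the node that precedes it on the ring.
    static BigDecimal ownsPercent(long previousToken, long token) {
        BigInteger span = BigInteger.valueOf(token)
            .subtract(BigInteger.valueOf(previousToken))
            .mod(RING_SIZE); // non-negative even when the range wraps around
        if (span.signum() == 0) {
            span = RING_SIZE; // a single node owns the whole ring
        }
        return new BigDecimal(span)
            .multiply(BigDecimal.valueOf(100))
            .divide(new BigDecimal(RING_SIZE), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        long tokenA = -3195122621607553968L; // 192.168.26.128
        long tokenB = 7160946931665707836L;  // 192.168.2.204
        System.out.println("192.168.26.128 owns " + ownsPercent(tokenB, tokenA) + "%"); // 43.86%
        System.out.println("192.168.2.204 owns " + ownsPercent(tokenA, tokenB) + "%");  // 56.14%
    }
}
```

The two results add up to 100%, as they must when the tokens cover the whole ring.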

info: shows information about a single node, including its current load (amount of data), uptime, memory usage, and so on.

Sample output:

Starting NodeTool
Token            : -3195122621607553968
ID               : 1c65f178-0742-4379-bd8d-9011b9f7c4a3
Gossip active    : true
Thrift active    : true
Load             : 78.18 KB
Generation No    : 1374563151
Uptime (seconds) : 3802
Heap Memory (MB) : 18.18 / 1022.44
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 0
Key Cache        : size 952 (bytes), capacity 53477376 (bytes), 43 hits, 59 requests, 0.729 recent hit rate, 14400 save period in seconds
Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
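The hit rates in this sample agree with a plain hits-to-requests ratio, including the NaN for a cache that has served no requests. This is only an observation about the numbers shown here; how Cassandra defines the "recent" window is not covered in this post, and the class HitRate below is a made-up helper.

```java
import java.util.Locale;

class HitRate {
    // Ratio of cache hits to requests; NaN when there were no requests.
    static double hitRate(long hits, long requests) {
        return requests == 0 ? Double.NaN : (double) hits / requests;
    }

    public static void main(String[] args) {
        // Key cache figures from the sample: 43 hits out of 59 requests.
        System.out.println(String.format(Locale.ROOT, "%.3f", hitRate(43, 59))); // 0.729
        // Row cache figures from the sample: no hits, no requests.
        System.out.println(hitRate(0, 0)); // NaN
    }
}
```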

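cfstats: shows detailed information for each column family, including read and write counts, response times, memtables, and SSTables.

The output is long, so no sample is included here.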
