Summary of Cassandra's features:
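1. Supports SQL-like statements (CQL) and ships with its own CQL tool (cqlsh). Equality queries are supported, but range-query support is very weak (range conditions generally work only on clustering columns within a single partition), and joins are not supported.
2. Although Cassandra is a NoSQL database, it emphasizes schema more than MongoDB does, but it is not as ... as MongoDB.
3. Cassandra is easy to install and configure; HBase is much more complicated to configure by comparison.
4. Linear scalability and high availability (including high availability across multiple data centers).
5. Flexibility: if you care most about performance, set the consistency level lower; if you care most about data integrity, set it higher.
6. Cassandra is suited to storing large volumes of data, and its write throughput is very high, much higher than both RDBMSs and HBase. In addition, it provides SQL-like statements, so it is quick to pick up.
Cassandra only suits specific scenarios. Below are the use cases I have listed: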
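1. [Not recommended] Transactional systems. These clearly need full ACID support, so a traditional RDBMS is still the recommended choice.
2. [Not recommended] Other OLTP systems. If you must use a NoSQL database, use MongoDB instead; MongoDB ...
3. [Not recommended] General-purpose data analysis systems (BI-style applications). Cassandra does not support joins and its range queries are limited, so it is a poor fit for most OLAP scenarios (at least when used on its own).
4. [Recommended] Replacing Redis, e.g. as a cache server, or instead of Redis in group-buying/flash-sale business, relying on Cassandra's data TTL feature.
5. [Recommended] Some scenarios where it works alongside an RDBMS. For example, when individual records are very large, store those large records in Cassandra and keep each record's key query fields in the RDBMS, which then provides rich range and aggregate queries.
6. [Recommended] User-profile databases. A user profile has a very large number of tags, and only equality queries are needed; Cassandra lets you keep adding as many columns as you need.
7. [Recommended] Other scenarios that would otherwise use HBase. HBase must run on top of a Hadoop cluster and is complex to manage, whereas a Cassandra cluster is very simple to set up. Cassandra can replace the HBase + Phoenix combination, and its write performance is better.
For a Cassandra vs. MongoDB comparison, see: https://scalegrid.io/blog/cassandra-vs-mongodb/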
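Use case 4 relies on TTL. In CQL a TTL is attached to a write (for example `INSERT INTO ... VALUES (...) USING TTL 60;`), and the written values stop appearing in reads once it elapses. The Python sketch below is only an in-memory simulation of those semantics, not how Cassandra implements them (Cassandra turns expired cells into tombstones that compaction later purges). The `TtlStore` class and the flash-sale key name are hypothetical, and the clock is injectable so expiry can be shown without sleeping.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TtlStore:
    """Toy key-value store mimicking per-write TTL expiry (in-memory simulation only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # Injectable clock so expiry can be demonstrated deterministically.
        self._clock = clock
        self._data: Dict[Hashable, Tuple[object, Optional[float]]] = {}

    def put(self, key: Hashable, value: object, ttl_seconds: Optional[float] = None) -> None:
        # Analogous to: INSERT INTO ... VALUES (...) USING TTL <ttl_seconds>;
        # A new write replaces the value and restarts (or removes) the TTL.
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired values are invisible to readers; this sketch drops them lazily.
            del self._data[key]
            return None
        return value


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    store = TtlStore(clock=lambda: now[0])
    # Hypothetical flash-sale key: remaining stock, cached for 60 seconds.
    store.put("flash_sale:item42:stock", 100, ttl_seconds=60)
    now[0] = 59.0
    print(store.get("flash_sale:item42:stock"))  # 100
    now[0] = 60.0
    print(store.get("flash_sale:item42:stock"))  # None
```

Deleting on read keeps the sketch short; a real store also needs background cleanup, which in Cassandra is handled by compaction.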
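Very good learning material, covering day-to-day operations, upgrades, data migration, and architecture: http://zqhxuyuan.github.io/tags/cassandra
Building a highly available multi-availability-zone Cassandra cluster at low cost: http://chuansong.me/n/840485751454 (explains Cassandra's high availability under different replication factors, data-center layouts, and write strategies)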
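Feature 5 (tuning the consistency level) and the replication-factor article above both come down to simple replica arithmetic. The Python sketch below is a simplified, single-data-center model: `Level`, `replicas_required` and the other helpers are hypothetical names, not the DataStax driver's API, and levels such as ANY, LOCAL_QUORUM and EACH_QUORUM are left out. It computes how many replicas each level waits for, how many replica failures it tolerates, and the usual overlap rule that a read is guaranteed to reach a replica holding the latest acknowledged write when R + W > RF.

```python
from enum import Enum


class Level(Enum):
    """A subset of Cassandra consistency levels (single data center only)."""
    ONE = "ONE"
    TWO = "TWO"
    THREE = "THREE"
    QUORUM = "QUORUM"
    ALL = "ALL"


_FIXED_COUNTS = {Level.ONE: 1, Level.TWO: 2, Level.THREE: 3}


def replicas_required(level: Level, rf: int) -> int:
    """How many replicas must acknowledge a read or write at `level`."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if level is Level.QUORUM:
        return rf // 2 + 1  # a strict majority of the replicas
    if level is Level.ALL:
        return rf
    needed = _FIXED_COUNTS[level]
    if needed > rf:
        raise ValueError(f"{level.name} needs {needed} replicas but RF is {rf}")
    return needed


def tolerated_replica_failures(level: Level, rf: int) -> int:
    """Replicas that may be down while requests at `level` still succeed."""
    return rf - replicas_required(level, rf)


def read_sees_latest_write(read_level: Level, write_level: Level, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set (R + W > RF)."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


if __name__ == "__main__":
    rf = 3  # same replication_factor as the practice keyspace
    for level in Level:
        print(f"{level.name:>6}: waits for {replicas_required(level, rf)} of {rf} replicas, "
              f"tolerates {tolerated_replica_failures(level, rf)} down")
    for r, w in [(Level.ONE, Level.ONE), (Level.QUORUM, Level.QUORUM), (Level.ONE, Level.ALL)]:
        print(f"read {r.name} / write {w.name}: overlap guaranteed = "
              f"{read_sees_latest_write(r, w, rf)}")
```

With RF = 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) keep the overlap guarantee while tolerating one replica down, whereas ONE/ONE is faster but may return stale data.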
Cassandra Note: https://chenhm.com/slides/cassandra/cassandra.html#true-columns
A very good series: https://www.flyml.net/2016/10/30/some-comments-on-column-family-database/
【问底】 Xu Peng: Building a High-Performance Data Analysis Platform with Spark + Cassandra (Part 1): http://www.csdn.net/article/2014-10-24/2822278-how-to-bulida-spark-and-cassandra-based-high-performance-data-pipeline/2
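https://killrvideo.github.io/ is a complete Cassandra + C# web sample application. It is also a complete microservices case study, covering how to use Docker + etcd as well as the full design of the Cassandra data model.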
Spark machine learning notes (includes several public datasets commonly used in machine learning): http://blog.csdn.net/u013719780/article/details/51768720
Learn Apache Cassandra by Example with CDM (Cassandra Dataset Manager): http://thelastpickle.com/blog/2016/09/21/learn-cassandra-by-example-with-cdm.html
=========================================Use cases=========================================
www.tuicool.com/articles/RjUjUrB
https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
http://stackoverflow.com/questions/2634955/when-not-to-use-cassandra
http://www.datastax.com/nosql-databases/nosql-cassandra-and-hadoop
http://www.edureka.co/blog/why-learn-cassandra-with-hadoop/
=========================================Tutorials=========================================
Chinese documentation: http://pimin.net/tags/Cassandra
http://rustyrazorblade.com/2015/08/migrating-from-mysql-to-cassandra-using-spark/
http://rustyrazorblade.com/2016/05/working-relationally-with-cassandra/
Spark SQL + Cassandra: https://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/spark/sparkTOC.html
http://www.planetcassandra.org/blog/the-new-analytics-toolbox-with-apache-spark-going-beyond-hadoop/
Cassandra Tutorial: https://intellipaat.com/tutorial/cassandra-tutorial/
Cassandra + PySpark DataFrames Revisited: http://rustyrazorblade.com/2015/07/cassandra-pyspark-dataframes-revisted/
Developing with Python:
http://datastax.github.io/python-driver/getting_started.html
http://slides.com/amberdoctor/getting-started-with-cassandra-python#/6
Python on Cassandra: http://yyri.blog.163.com/blog/static/148943951201221983458871/
Working with Cassandra from Python: http://www.cnblogs.com/zhfan/p/4181529.html
http://pycon-2012-notes.readthedocs.io/en/latest/apache_cassandra_and_python.html
=========================================Model=========================================
Cassandra Modeling for Real-Time Analytics: http://www.datasciencecentral.com/profiles/blogs/cassandra-modeling-for-real-time-analytics
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
http://www.datastax.com/dev/blog/thrift-to-cql3
CQL3 for DataStax 2.0 & 2.1: https://docs.datastax.com/en/cql/3.1/pdf/cql31.pdf
Cassandra By Example: Data Modelling with CQL3: http://www.slideshare.net/jericevans/cassandra-by-example-data-modelling-with-cql3
Primary Key (Partitioning Key, Clustering Key): http://www.planetcassandra.org/blog/primary-keys-in-cql/
http://opensourceconnections.com/blog/2013/07/24/understanding-how-cql3-maps-to-cassandras-internal-data-structure/
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/
=========================================Data migration=========================================
www.tuicool.com/articles/ie67Vn
http://www.svds.com/flexible-data-architecture-with-spark-cassandra-and-impala/
http://www.codeproject.com/Articles/279947/Migration-of-Relational-Data-structure-to-Cassandr
http://wiki.apache.org/cassandra/FAQ
http://wiki.apache.org/cassandra/Operations
=========================================Internals=========================================
Consistent hashing explained: http://www.cnblogs.com/haippy/archive/2011/12/10/2282943.html
https://my.oschina.net/xianggao/blog/394545
The following article is about LevelDB, but many Cassandra concepts are similar, especially the in-memory and log storage structures. "Data Analysis and Processing, Part 2 (How LevelDB Is Implemented)": http://www.cnblogs.com/haippy/archive/2011/12/04/2276064.html
[Translation] A Brief Overview of Cassandra's Architecture: http://www.cnblogs.com/hxdong/archive/2013/06/16/3135455.html
Reclaiming disk space after delete operations: http://www.sestevez.com/range-tombstones/
The GCGraceSeconds parameter (gc_grace_seconds in CQL): the default setting is very conservative, at 10 days (864000 seconds); tombstones are kept at least this long before compaction can purge them.
Setting a table's compaction strategy: https://www.instaclustr.com/blog/2016/01/27/apache-cassandra-compaction/
Modeling: http://www.devx.com/dbzone/cassandra-for-sql-developers.html
Modeling: https://academy.datastax.com/resources/getting-started-time-series-data-modeling
Limitations of the SELECT statement:
http://mechanics.flite.com/blog/2013/11/05/breaking-down-the-cql-where-clause/
Modeling:
http://rustyrazorblade.com/2015/08/migrating-from-mysql-to-cassandra-using-spark/
http://rustyrazorblade.com/2016/05/working-relationally-with-cassandra/
Cassandra + PySpark DataFrames Revisited: http://rustyrazorblade.com/2015/07/cassandra-pyspark-dataframes-revisted/
=========================================Installation=========================================
Cassandra download:
https://academy.datastax.com/planet-cassandra//cassandra/
Cassandra deployment and installation:
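http://dongxicheng.org/nosql/cassandra-install/
Adding a new data center to a Cassandra cluster: http://openwares.net/database/cassandra_add_new_datacenter.html
Strategies in Cassandra (partitioning, replication, consistency, storage, etc.): http://dongxicheng.org/nosql/cassandra-strategy/
The Cassandra 0.7 configuration file explained: http://www.cnblogs.com/gpcuster/archive/2010/11/12/1875388.html
A two-node single-cluster Cassandra experiment: http://blog.fens.me/cassandra-clustor/
Dynamically adding and removing Cassandra nodes: http://www.codes51.com/article/detail_430313.html
=========================================Practice=========================================
cqlsh> CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- Switch to the keyspace first; otherwise the CREATE TABLE statements below have no keyspace.
cqlsh> USE mykeyspace;
-- Breaking Down the CQL Where Clause
-- http://mechanics.flite.com/blog/2013/11/05/breaking-down-the-cql-where-clause/
-- Partition key is the composite (weatherstation_id, date); event_time is the clustering column.
cqlsh:mykeyspace> CREATE TABLE temperature_by_day (weatherstation_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weatherstation_id, date), event_time));
-- Partition key is weatherstation_id alone; event_time is the clustering column.
cqlsh:mykeyspace> CREATE TABLE temperature (weatherstation_id text, event_time timestamp, temperature text, PRIMARY KEY (weatherstation_id, event_time));
-- Rejected as written: it restricts only the clustering column event_time, not the partition key (weatherstation_id, date).
-- SELECT * FROM temperature_by_day WHERE event_time = '2013-04-03 06:00:00';
-- Valid: restrict every partition key column, then the clustering column ('1234ABCD' and the date are example values).
cqlsh:mykeyspace> SELECT * FROM temperature_by_day WHERE weatherstation_id = '1234ABCD' AND date = '2013-04-03' AND event_time = '2013-04-03 06:00:00';
cqlsh:mykeyspace> CREATE TABLE mytable (a INT PRIMARY KEY, b INT, c INT, d INT);
cqlsh:mykeyspace> INSERT INTO mytable (a, b, c, d) VALUES (1, 2, 3, 4);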
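The consistent-hashing links under Internals and the practice tables fit together: Cassandra hashes the complete partition key to a token, the node whose range contains that token stores the partition, and with SimpleStrategy the remaining replicas go to the next nodes clockwise around the ring. That is also why a SELECT on temperature_by_day that restricts only event_time cannot be routed: without both weatherstation_id and date there is no token to compute. The Python sketch below illustrates this under stated assumptions: MD5 is a stand-in hash (closer to the older RandomPartitioner than to the default Murmur3Partitioner, so real tokens differ), each hypothetical node gets a single token derived from its name (real clusters configure tokens or use vnodes), and `token`/`Ring` are illustrative names, not a Cassandra API.

```python
import bisect
import hashlib
from typing import List, Sequence


def token(*partition_key_parts: str) -> int:
    """Map a complete partition key to a ring position in 0 .. 2**128 - 1.

    Stand-in hash (MD5 over the joined parts); Cassandra's default
    Murmur3Partitioner uses a different hash and token range.
    """
    data = "\x00".join(partition_key_parts).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest(), "big")


class Ring:
    """Toy token ring with one token per node (hypothetical token assignment)."""

    def __init__(self, nodes: Sequence[str]) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        # Hash each node name onto the ring and keep nodes sorted by token.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key_token: int, rf: int) -> List[str]:
        """SimpleStrategy-style placement: owning node, then the next rf - 1 nodes clockwise."""
        if not 1 <= rf <= len(self._ring):
            raise ValueError("replication factor must be between 1 and the number of nodes")
        # Owner = first node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self._tokens, key_token) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4", "node5"])
    # temperature_by_day: PRIMARY KEY ((weatherstation_id, date), event_time)
    key_token = token("1234ABCD", "2013-04-03")
    print(ring.replicas(key_token, rf=3))
    # event_time is not part of the token, so a query supplying only event_time
    # gives no way to pick the replicas that hold the matching rows.
```

Because event_time is the clustering column, it does not feed into the token: all readings for one station and day land in the same partition, stored sorted by event_time.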