Apache Cassandra

Posted on 2014-2-26 09:32:28
Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters,[1] with asynchronous masterless replication allowing low latency operations for all clients.
Cassandra also places a high value on performance. University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments."[2]
Cassandra's data model is a partitioned row store with tunable consistency.[3] Rows are organized into tables; the first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the key.[4] Other columns may be indexed separately from the primary key.[5]
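
As a rough sketch of the partition key and clustering columns described above, the following CQL assumes a hypothetical sensor_readings table inside a keyspace that is already in use. The table and column names are illustrative only and do not come from the quoted text. Here sensor_id is the partition key, reading_time is the clustering column, and location is indexed separately from the primary key.

    CREATE TABLE sensor_readings (
      sensor_id int,
      reading_time timestamp,
      location text,
      temperature double,
      PRIMARY KEY (sensor_id, reading_time)
    );

    -- Index a column that is not part of the primary key.
    CREATE INDEX ON sensor_readings (location);

    -- Within one partition, rows can be returned in clustering order.
    SELECT reading_time, temperature FROM sensor_readings
      WHERE sensor_id = 42
      ORDER BY reading_time DESC;

Restricting the query to a single sensor_id keeps it inside one partition, which is what makes the ORDER BY on the clustering column possible.
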
Tables may be created, dropped, and altered at runtime without blocking updates and queries.[6]
Cassandra does not support joins or subqueries, except for batch analysis via Hadoop. Rather, Cassandra emphasizes denormalization through features like collections.[7]
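
To illustrate denormalization with collections, here is a minimal sketch that stores a user's email addresses in the same row instead of in a second table that would need a join. The users_with_emails table and its columns are hypothetical names chosen for this example, and a keyspace is assumed to be in use.

    CREATE TABLE users_with_emails (
      user_id int PRIMARY KEY,
      fname text,
      emails set<text>
    );

    INSERT INTO users_with_emails (user_id, fname, emails)
      VALUES (1745, 'john', {'john@example.com'});

    -- Add an element to the set without reading the row first.
    UPDATE users_with_emails SET emails = emails + {'jsmith@example.com'}
      WHERE user_id = 1745;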



The Apache Cassandra Project
Original poster | Posted on 2014-2-26 09:53:39
Introducing DataStax Enterprise

What is DataStax Enterprise?

DataStax Enterprise is a NoSQL database platform architected for today's line-of-business applications. It is powered by Apache Cassandra and designed to securely manage real-time, analytic, and search data all in the same database cluster.

How Does DataStax Enterprise Work?

DataStax Enterprise contains a production-certified version of Cassandra for handling real-time, transactional workloads, as well as advanced security for protecting sensitive data.

Analytics on Cassandra data can be performed by adding nodes dedicated to analytic operations (currently powered by Hadoop). Enterprise search operations on Cassandra data can be run by adding nodes devoted to search (currently handled by Solr) to a cluster.

Each workload (real-time, analytics, and search) is isolated to nodes devoted to its respective operations, so that real-time transactional workloads do not negatively impact analytic operations, which in turn do not affect search tasks. Full workload management is built into each cluster.

Capacity and new workloads are added simply by adding new nodes to a cluster and choosing how to replicate data between them.
   


  


Original poster | Posted on 2014-2-26 10:25:56
Architecture in brief
An overview of Cassandra's structure.
  
Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure. Its architecture is based on the understanding that system and hardware failures can and do occur. Cassandra addresses the problem of failures by employing a peer-to-peer distributed system where all nodes are the same and data is distributed among all nodes in the cluster. Each node exchanges information across the cluster every second. A commit log on each node captures write activity to ensure data durability. Data is also written to an in-memory structure, called a memtable, and then written to a data file called an SSTable on disk once the memory structure is full. All writes are automatically partitioned and replicated throughout the cluster.

Cassandra is a row-oriented database. Cassandra's architecture allows any authorized user to connect to any node in any data center and access data using the CQL language. For ease of use, CQL uses a syntax similar to SQL. From the CQL perspective, the database consists of tables. Typically, a cluster has one keyspace per application. Developers can access CQL through cqlsh as well as via drivers for application languages.

Client read or write requests can go to any node in the cluster. When a client connects to a node with a request, that node serves as the coordinator for that particular client operation. The coordinator acts as a proxy between the client application and the nodes that own the data being requested. The coordinator determines which nodes in the ring should get the request based on how the cluster is configured. For more information, see Client requests.
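
One way to watch a coordinator at work is to turn on request tracing in cqlsh. The sketch below assumes the mykeyspace.users table from the querying walkthrough later in this thread already exists. CONSISTENCY and TRACING are cqlsh commands rather than CQL statements, and the exact trace output depends on the Cassandra version, so none is shown here.

    CONSISTENCY ONE;
    TRACING ON;
    SELECT * FROM mykeyspace.users WHERE user_id = 1745;
    TRACING OFF;

The trace lists the activity on the node cqlsh is connected to (the coordinator for this request) and on any replicas it contacts. A stricter level such as QUORUM only succeeds when enough replicas for the keyspace are alive.
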
Key components for configuring Cassandra

  • Gossip: A peer-to-peer communication protocol to discover and share location and state information about the other nodes in a Cassandra cluster.

    Gossip information is also persisted locally by each node to use immediately when a node restarts. You may want to purge gossip history on node restart for various reasons, such as when the node's IP address has changed.

  • Partitioner: A partitioner determines how to distribute the data across the nodes in the cluster. Choosing a partitioner determines which node to place the first copy of data on.

    You must set the partitioner type and assign each node a num_tokens value. If not using virtual nodes (vnodes), use the initial_token setting instead.

  • Replica placement strategy: Cassandra stores copies (replicas) of data on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines which nodes to place replicas on. The first replica of data is simply the first copy; it is not unique in any sense.

    When you create a keyspace, you must define the replica placement strategy and the number of replicas you want.

  • Snitch: A snitch defines the topology information that the replication strategy uses to place replicas and route requests efficiently.

    You need to configure a snitch when you create a cluster. The snitch is responsible for knowing the location of nodes within your network topology and distributing replicas by grouping machines into data centers and racks.

  • The cassandra.yaml file is the main configuration file for Cassandra. In this file, you set the initialization properties for a cluster, caching parameters for tables, properties for tuning and resource utilization, timeout settings, client connections, backups, and security.

  • Cassandra stores table properties in the system keyspace. You set storage configuration attributes on a per-keyspace or per-table basis programmatically or using a client application, such as CQL.

    By default, a node is configured to store the data it manages in the /var/lib/cassandra directory. In a production cluster deployment, you should set commitlog_directory to a different disk drive from the one used for data_file_directories.
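
As a sketch of how the replica placement strategy and the snitch come together when a keyspace is created, the CQL below defines a keyspace replicated across two data centers. The keyspace name multi_dc_example and the data center names DC1 and DC2 are assumptions made for illustration. In a real cluster, the data center names must match the names the configured snitch reports.

    CREATE KEYSPACE multi_dc_example WITH REPLICATION = {
      'class' : 'NetworkTopologyStrategy',
      'DC1' : 3,
      'DC2' : 2
    };

    -- Inspect the data center and rack this node has recorded for its peers.
    SELECT peer, data_center, rack FROM system.peers;

The second statement reads the locally persisted view of the other nodes, which is one place to check the data center names before using them in a replication map.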

   



Original poster | Posted on 2014-2-26 10:46:21
Querying Cassandra

Quickly master inserting and retrieving data from Cassandra 2.0 using the cqlsh utility.
Attention: The information presented here applies only to Cassandra 2.x, not to Cassandra 1.2.

You can run Cassandra Query Language (CQL) using the cqlsh utility to:

    Create a keyspace, which is akin to the namespace of an SQL database.
    Use the keyspace to create a table, which is similar to an SQL table.
    Insert data into the table.
    Use queries to sort, retrieve, alter, automatically expire, and drop the data.

Procedure

From a terminal:

    Assuming Cassandra is running, start cqlsh on Windows or Linux from the installation directory. In a shell on Mac OS X, for example:

    $ ./bin/cqlsh

    At the cqlsh prompt, use the DESCRIBE cqlsh command to see the keyspaces that already exist in Cassandra:

    DESCRIBE keyspaces;

    The output is a list of system keyspaces containing tables of details about database objects and cluster configuration:

    system system_auth system_traces

    Create a keyspace.

    CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

    Use the keyspace, just as you would use an SQL database.

    USE mykeyspace;

    Create a simple table with three columns for the ids, first names, and last names of users.

    CREATE TABLE users (
      user_id int PRIMARY KEY,
      fname text,
      lname text
    );

    Check that your table has been created in the keyspace.

    DESCRIBE TABLES;

    The output is the list of tables, in this case just one, in the keyspace you're using:

    users

    Insert the ids, first names, and last names of a few users into the table.

    INSERT INTO users (user_id,  fname, lname)
      VALUES (1745, 'john', 'smith');
    INSERT INTO users (user_id,  fname, lname)
      VALUES (1744, 'john', 'doe');
    INSERT INTO users (user_id,  fname, lname)
      VALUES (1746, 'john', 'smith');

    Retrieve all the data from the users table.

    SELECT * FROM users;

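    The output lists the data in the order Cassandra stores it, which follows the partitioner's token for each user_id rather than insertion order or numeric order.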

     user_id | fname | lname
    ---------+-------+-------
        1745 |  john | smith
        1744 |  john |   doe
        1746 |  john | smith

    Retrieve data about users whose last name is smith by first creating an index, and then querying the table.

    CREATE INDEX ON users (lname);

    Note: You need the index because your WHERE clause will use a column that isn't the primary key.

    SELECT * FROM users WHERE lname = 'smith';

     user_id | fname | lname
    ---------+-------+-------
        1745 |  john | smith
        1746 |  john | smith

    Drop the users table.

    DROP TABLE users;
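
    The overview mentions altering and automatically expiring data, which the steps above do not show. As a minimal sketch, assuming a hypothetical sessions table in the same keyspace (the table and column names are illustrative only), an expiry can be set with USING TTL and a column added with ALTER TABLE:

    CREATE TABLE sessions (
      session_id int PRIMARY KEY,
      user_name text
    );

    -- This row expires 86400 seconds (one day) after it is written.
    INSERT INTO sessions (session_id, user_name)
      VALUES (1, 'john') USING TTL 86400;

    -- Show the remaining time to live, in seconds, for the column.
    SELECT session_id, TTL(user_name) FROM sessions;

    -- Alter the table by adding a column, then drop the table.
    ALTER TABLE sessions ADD last_seen timestamp;
    DROP TABLE sessions;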
