Setting up access to Amazon Keyspaces in a Jupyter notebook

Prerequisites

Two files are required to access Amazon Keyspaces from a Jupyter notebook:

  1. Starfield digital certificate
  2. AWS credentials for Amazon Keyspaces

More details about the setup can be found in Using a Cassandra Python client.

Starfield digital certificate

Download the certificate files with:

!curl -O https://www.amazontrust.com/repository/AmazonRootCA1.pem
!curl -O https://www.amazontrust.com/repository/AmazonRootCA2.pem
!curl -O https://www.amazontrust.com/repository/AmazonRootCA3.pem
!curl -O https://www.amazontrust.com/repository/AmazonRootCA4.pem
!curl -O https://certs.secureserver.net/repository/sf-class2-root.crt
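# Combine the certificate files into a single bundle.
# `type` is the Windows command; on macOS or Linux, use `cat` with the same arguments.
!type AmazonRootCA1.pem AmazonRootCA2.pem AmazonRootCA3.pem AmazonRootCA4.pem sf-class2-root.crt > keyspaces-bundle.pem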
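
The shell command above depends on the operating system. As a sketch of a portable alternative, the bundle could also be built in Python. The function name `build_bundle` and its parameters are hypothetical and not part of the original setup; the file names are the ones downloaded above.

```python
from pathlib import Path

# File names taken from the download step above.
CERT_FILES = [
    "AmazonRootCA1.pem",
    "AmazonRootCA2.pem",
    "AmazonRootCA3.pem",
    "AmazonRootCA4.pem",
    "sf-class2-root.crt",
]


def build_bundle(cert_files, bundle_path="keyspaces-bundle.pem", directory="."):
    """Concatenate certificate files into one bundle and return its path.

    `build_bundle` is a hypothetical helper for illustration.
    """
    directory = Path(directory)
    parts = []
    for name in cert_files:
        text = (directory / name).read_text()
        # End every certificate with a newline so PEM blocks do not run together.
        parts.append(text if text.endswith("\n") else text + "\n")
    out = directory / bundle_path
    out.write_text("".join(parts))
    return out
```

Calling `build_bundle(CERT_FILES)` in the folder that holds the downloaded files should write the same `keyspaces-bundle.pem` on Windows, macOS, or Linux.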

AWS credentials

The AWS credentials file can be downloaded from Canvas. Place it in the .aws/ folder in your home directory, on your local machine or your EC2 instance, so that the AWS tools know where to look for the credentials.

MAJOR NOTE: Do not store these credentials anywhere that is publicly accessible, such as GitHub or a public S3 bucket. That is the primary reason why this file is only available on Canvas.
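
To check that the file is where the AWS tools expect it, a small parser like the one below may help. This is a sketch that assumes the file uses the standard INI layout of `~/.aws/credentials` (a `[default]` section with `aws_access_key_id` and `aws_secret_access_key`); if the Canvas download is a CSV export instead, it does not apply as written. The helper name `read_aws_profile` is hypothetical.

```python
import configparser
from pathlib import Path


def read_aws_profile(path=None, profile="default"):
    """Return (access_key_id, secret_access_key) from a shared credentials file.

    Hypothetical helper; assumes the standard INI credentials format.
    """
    path = Path(path) if path else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to read.
    if not parser.read(path):
        raise FileNotFoundError(f"No credentials file found at {path}")
    section = parser[profile]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

Avoid printing the returned secret in a notebook cell, since cell outputs are saved with the notebook.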

Sample code for connection

  1. Install cassandra-sigv4 via the following command:
!pip install cassandra-sigv4
  2. Set up a boto3 session (boto3 is the Python way of interacting with AWS) and a Cassandra cluster.
from cassandra.cluster import Cluster
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
import boto3
from cassandra_sigv4.auth import SigV4AuthProvider

ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('keyspaces-bundle.pem')
ssl_context.verify_mode = CERT_REQUIRED
# Note: on recent Python versions the SSLContext call emits
# "DeprecationWarning: ssl.PROTOCOL_TLSv1_2 is deprecated". The connection still works.
import pandas as pd
access_key = pd.read_csv('../admin/de300-keyspaces_accessKeys.csv')
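# Build the boto3 session from the downloaded key file rather than hard-coding the key in the notebook.
# The 'Access key ID' column name is an assumption based on the standard AWS access-key CSV; adjust it if your file differs.
boto_session = boto3.Session(aws_access_key_id=access_key['Access key ID'].values[0],
                             aws_secret_access_key=access_key['Secret access key'].values[0],
                            #  aws_session_token="AQoDYXdzEJr...<remainder of token>",
                             region_name="us-east-1")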
auth_provider = SigV4AuthProvider(boto_session)

cluster = Cluster(['cassandra.us-east-1.amazonaws.com'], 
                  ssl_context=ssl_context, 
                  auth_provider=auth_provider,
                  port=9142)
session = cluster.connect()
r = session.execute('select * from system_schema.keyspaces')
print(r.current_rows)
[Row(keyspace_name='system_schema', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system_schema_mcs', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system_multiregion_info', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_acharya', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_chan', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_demo', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_fingerson', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_lacombefarina', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_sokolenko', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_wilks', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), 
('replication_factor', '3')]))]
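
The raw row dump is hard to read. Since the driver returns rows whose columns are available as attributes, the keyspace names can be pulled out with a small helper. The sketch below uses stand-in rows so that it runs without a connection; `keyspace_names` and `demo_rows` are hypothetical names.

```python
from collections import namedtuple


def keyspace_names(rows, prefix=""):
    """Return sorted keyspace names from result rows, optionally filtered by prefix."""
    return sorted(
        row.keyspace_name for row in rows if row.keyspace_name.startswith(prefix)
    )


# Stand-in rows for illustration only; with a live connection, pass `r.current_rows`.
Row = namedtuple("Row", ["keyspace_name", "durable_writes"])
demo_rows = [
    Row("system_schema", True),
    Row("de300_demo", True),
    Row("de300_chan", True),
]

print(keyspace_names(demo_rows, prefix="de300_"))  # ['de300_chan', 'de300_demo']
```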

Working with Cassandra (Amazon Keyspaces)

# Establish a connection to Amazon Keyspaces
session = cluster.connect()
# Run any CQL queries between .connect() and .shutdown()

# For example, show all keyspaces created
r = session.execute('''
    SELECT * FROM system_schema.keyspaces;
    ''')
print(r.current_rows)
[Row(keyspace_name='system_schema', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system_schema_mcs', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='system_multiregion_info', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_acharya', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_chan', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_demo', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_fingerson', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_lacombefarina', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_sokolenko', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')])), Row(keyspace_name='de300_wilks', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), 
('replication_factor', '3')]))]
q = '''
SELECT * FROM de300_chan.test_table;
'''

r = session.execute(q)
r.current_rows
[Row(country='US', user_id=1002, gender='F'),
 Row(country='GB', user_id=1001, gender='M')]
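from cassandra import ConsistencyLevel
# Amazon Keyspaces only accepts writes at LOCAL_QUORUM, so set it before running INSERT statements.
session.default_consistency_level = ConsistencyLevel.LOCAL_QUORUM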

q = '''
INSERT INTO de300_chan.test_table (country, gender, user_id) VALUES ('US', 'F', 1002);
'''

r = session.execute(q)
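
When the values come from variables rather than literals, it is usually safer to let the driver bind them than to format them into the query string. A minimal sketch, assuming the same `de300_chan.test_table` as above and the driver's `%s` placeholders for non-prepared statements; `insert_user` is a hypothetical helper.

```python
# %s placeholders are filled in by the driver, not by Python string formatting.
INSERT_USER = (
    "INSERT INTO de300_chan.test_table (country, gender, user_id) "
    "VALUES (%s, %s, %s)"
)


def insert_user(session, country, gender, user_id):
    """Insert one row, passing the values as bound parameters."""
    # user_id is stored as an integer in the sample rows, so coerce it here.
    return session.execute(INSERT_USER, (country, gender, int(user_id)))
```

With the session from the cells above, `insert_user(session, 'US', 'F', 1002)` would issue the same insert as the literal query.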

# For example, create a keyspace for HW2
r = session.execute('''
    CREATE KEYSPACE IF NOT EXISTS de300_demo 
    WITH replication = {'class': 'SingleRegionStrategy'};
    ''')
print(r.current_rows)
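
Keyspace names cannot be passed as bound parameters, so a name that comes from a variable has to be placed into the CQL text directly. The sketch below validates the name first. The helper `create_keyspace_cql` is hypothetical, and the naming rule it enforces (a letter followed by letters, digits, or underscores, at most 48 characters) is an assumption about unquoted Cassandra identifiers.

```python
import re

# Assumed rule for unquoted keyspace names: letter first, then letters/digits/underscores, max 48 chars.
_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,47}")


def create_keyspace_cql(name):
    """Return a CREATE KEYSPACE statement for Amazon Keyspaces after validating the name."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"Invalid keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH replication = {'class': 'SingleRegionStrategy'};"
    )
```

For example, `session.execute(create_keyspace_cql('de300_demo'))` would run the same statement as the cell above.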

Shut down your Cassandra connection

session.shutdown()
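
If a query raises an error, the shutdown cell may never run. One way to guard against that is a context manager that always closes the session. This is a sketch; `open_session` and its `shutdown_cluster` flag are hypothetical names, and it relies only on the `connect()` and `shutdown()` methods used above.

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, shutdown_cluster=False):
    """Yield a session and shut it down afterwards, even if a query fails."""
    session = cluster.connect()
    try:
        yield session
    finally:
        session.shutdown()
        if shutdown_cluster:
            # After this the cluster object cannot be reused for new connections.
            cluster.shutdown()
```

Usage would look like `with open_session(cluster) as session: session.execute(q)`.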