Setting up an EC2 instance for Jupyter Notebook

  1. Access AWS
    • Always access AWS by using the Northwestern AWS portal and your NetID: https://nu-sso.awsapps.com/start/
    • You should have access through the mse-tl-dataeng300-EMR user account.
    • If you do not, reach out to the instructor.
    • (Preferred) Check that you are in the N. Virginia region (shown in the top-right corner).

(Figure: AWS Start page)
  1. If you do not currently have an EC2 instance under your name, launch one as follows:
    • Services -> EC2
    • (Optional) Launch an instance using Launch instance from template:
      • You will find a de300-t2.medium template available.
    • Verify the following configurations (or set up an instance from scratch):
      • Image: AMI (Linux 64-bit)
      • Instance type: t2.medium
      • Key pair: Create a new key pair for your instance if you do not already have one.
        • Name your key pair so that you can easily identify it. The default options are fine.
        • When you create a key pair, a “key file” [name].pem is downloaded automatically. Store it in a location on your machine where you have read/write access.
        • (You only have to do this once.) Run chmod 400 [name].pem in your console. (If you use a Windows machine, use either bash or PowerShell.) The command makes the “key file” readable only by you, its owner, which SSH requires.
      • Subnet: RDS-Pvt-subnet-5
      • Firewall: Select launch-wizard-3 security group
      • Storage: Specify 12 GiB of EBS storage.
      • Resource tags: Key: Name, Value: [firstname][lastname]-de300 (this becomes your instance name).
    • Launch instance.
      • The instance starts running automatically. You can track running instances on the Instances page.
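
As a minimal sketch of what the chmod 400 step in the key pair instructions does, the following shell snippet applies it to a throwaway stand-in file and prints the resulting permission string. The file name demo-key.pem and the temporary directory are assumptions for illustration only; with a real key you would run chmod 400 on your own [name].pem.

```shell
# Create a stand-in file so the sketch never touches a real key.
# "demo-key.pem" is a hypothetical name used only for this illustration.
workdir=$(mktemp -d)
key="$workdir/demo-key.pem"
printf 'placeholder, not a real key\n' > "$key"

# Owner may read the file; nobody may write it; group and others get nothing.
chmod 400 "$key"

# The first ten characters of `ls -l` output are the permission string.
key_perms=$(ls -l "$key" | cut -c1-10)
echo "$key_perms"

# Remove the stand-in file and its temporary directory.
rm -rf "$workdir"
```

The printed string should be -r--------, meaning read permission for the owner and no access for anyone else.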

(Figure: Launching from template)
  1. SSH into EC2
    • Once your instance is running, select the instance on the Instances page and click Connect.
    • Under the SSH client tab, you will see the instructions to connect to the EC2 instance via SSH on your console.
    • Troubleshoot:
      • If access is denied, double-check that your [name].pem file is in the working directory of your console.
      • If an error about an unprotected key file appears, make sure you have run chmod 400 [name].pem.
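
The two troubleshooting checks above can be sketched as a small pre-flight function to run before connecting. check_key is a hypothetical helper name, not part of ssh or AWS. The sketch assumes a POSIX shell with ls and cut available, and it treats a key as protected when group and others have no permissions on it, which is the case after chmod 400.

```shell
# check_key: hypothetical helper that reports whether a key file is
# present in the given location and protected from other users.
check_key() {
  key_file="$1"

  # Troubleshooting check 1: is the key file where we expect it?
  if [ ! -f "$key_file" ]; then
    echo "missing"
    return 1
  fi

  # Troubleshooting check 2: the permission string must show no access
  # for group and others (for example -r-------- after chmod 400).
  perms=$(ls -l "$key_file" | cut -c1-10)
  case "$perms" in
    -???------)
      echo "ok"
      ;;
    *)
      echo "unprotected"
      return 1
      ;;
  esac
}

# Usage sketch: [name].pem stays a placeholder for your own key file.
#   check_key "[name].pem"
# If it prints "ok", run the ssh command shown under the SSH client tab.
```

A result of "missing" points to the working-directory problem, and "unprotected" points to the chmod 400 step not having been run.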

(Figure: EC2 Instances)
  1. Anaconda installation on EC2
    • Download the installer: wget https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh (or the installer appropriate for your system).
    • (Optional) Verify the installer's SHA-256 checksum: run sha256sum [filename].sh and compare the output with the hash published by Anaconda.
    • Install: bash Anaconda3-2023.09-0-Linux-x86_64.sh
      • The first prompt requires you to read the license agreement and respond yes.
      • You may have to confirm the installation location. The default is fine.
      • At the end of the installation, you will see a prompt about conda init. You may respond yes so that the conda environment is loaded automatically each time you log into EC2.
    • Activate conda environment:
      • If you responded yes above, run source ~/.bashrc to activate the conda environment. You should see (base) at the beginning of your console prompt.
      • If you did not respond yes above, run eval "$($CONDA_PATH/bin/conda shell.bash hook)" to activate the conda environment, where $CONDA_PATH is your Anaconda installation directory (the installer's default is ~/anaconda3).
    • To verify that Anaconda is successfully installed, run which python in the console; it should return a path to the Python executable.
      • If no python is found, your conda environment was not activated successfully.
  2. Set Jupyter notebook password
    • Run jupyter notebook password.
    • Enter and verify your password.
  3. SSH into Jupyter notebook on EC2
    • In the console that is connected to your EC2 instance, run jupyter notebook --no-browser --port=8888
    • Once the Jupyter notebook server is running, open a new console on your local machine.
    • In the new console, run ssh -i [name].pem -N -f -L 8888:localhost:8888 ec2-user@[your-public-DNS].compute-1.amazonaws.com (without a trailing period). This forwards port 8888 on your machine to port 8888 on the instance and keeps the tunnel running in the background.
    • Open a browser (Chrome, Firefox, etc.) and go to http://localhost:8888.
    • Enter your password to access your Jupyter notebook.
  4. Stop your EC2 instance on the AWS Instances page.
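
The optional checksum verification in the Anaconda installation step can be sketched as follows. verify_sha256 is a hypothetical helper name, and the expected digest must be the value published by Anaconda for the installer you downloaded; it is deliberately left as a placeholder here. The sketch assumes sha256sum is available, as it is on typical Linux images.

```shell
# verify_sha256: hypothetical helper that compares a file's SHA-256
# digest with an expected value and reports the result.
verify_sha256() {
  file="$1"
  expected="$2"

  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$file" | cut -d ' ' -f 1)

  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH"
    return 1
  fi
}

# Usage sketch on the EC2 instance. Replace <expected-digest> with the
# published value before running; it is not filled in here on purpose.
#   verify_sha256 Anaconda3-2023.09-0-Linux-x86_64.sh "<expected-digest>" \
#     && bash Anaconda3-2023.09-0-Linux-x86_64.sh
```

Chaining the installer behind && means it only runs when the digests match, so a corrupted or incomplete download is caught before installation starts.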

Important: Whenever you are not running anything for more than 15 minutes, stop the EC2 instance on the AWS Instances page.

Reconnect to EC2 Jupyter notebook

  1. Log in at https://nu-sso.awsapps.com/start/ and start your EC2 instance (the one you have a private key for).
  2. Once the EC2 instance is running, connect to the instance via SSH.
    • Under the SSH client tab, you will see the instructions to connect to the EC2 instance via SSH on your console.
  3. If your EC2 console shows (base) at the beginning of the prompt, the conda environment is already active.
    • If not, run eval "$($CONDA_PATH/bin/conda shell.bash hook)" to activate the conda environment.
  4. Open the SSH tunnel to the Jupyter notebook, as in the “SSH into Jupyter notebook on EC2” step above.
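
Because the tunnel command has to be retyped on every reconnect, it can help to assemble it from its parts. The sketch below only prints the command from the “SSH into Jupyter notebook on EC2” step and does not open a connection. tunnel_command is a hypothetical helper name, and the key file and host name in the usage line are made-up placeholders. An instance's public DNS name usually changes after a stop and start, so it is worth copying the current value from the Connect page each time.

```shell
# tunnel_command: hypothetical helper that prints the SSH tunnel command
# for the Jupyter notebook; it does not run ssh itself.
tunnel_command() {
  key_file="$1"      # path to your [name].pem
  public_dns="$2"    # the instance's current public DNS name
  port="${3:-8888}"  # local and remote port; 8888 as in the steps above

  echo "ssh -i $key_file -N -f -L $port:localhost:$port ec2-user@$public_dns"
}

# Usage with placeholder values (neither the key nor the host is real):
tunnel_command "my-key.pem" "ec2-203-0-113-10.compute-1.amazonaws.com"
```

Run the printed command on your local machine, then open http://localhost:8888 in a browser as before.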