Commit 411a5979 authored by Simon Clarke

Added introduction to Colab notebook

parent 03ac5b22
......@@ -3540,7 +3540,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
......@@ -3554,7 +3554,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.8.10"
}
},
"nbformat": 4,
......
{
"cells": [
{
"cell_type": "markdown",
"id": "84ffb5ad",
"metadata": {},
"source": [
"# Introduction to Google Colab"
]
},
{
"cell_type": "markdown",
"id": "48040d43",
"metadata": {},
"source": [
"Here we will cover the basic of accessing data and filesystem navigation on Google Colab (https://colab.research.google.com/). This includes:\n",
"* Accessing notebooks in Google Drive,\n",
"* Sharing notebooks,\n",
"* Uploading and downloading data,\n",
"* Accessing data via a git repository,\n",
"* Accessing data via Google Drive.\n",
"\n",
"For further information, see the online help in Colab or pages such as https://neptune.ai/blog/google-colab-dealing-with-files-2, which has information on how to access Cloud Data on Amazon and Kaggle servers."
]
},
{
"cell_type": "markdown",
"id": "a4e49ac0",
"metadata": {},
"source": [
"## Accessing notebooks in Google Drive"
]
},
{
"cell_type": "markdown",
"id": "500c6ca6",
"metadata": {},
"source": [
"When you first open Colab you will see a splash screen which gives you the following options to open a notebook:\n",
"* Examples (preloaded notebooks with help on how to use Colab and code snippets),\n",
"* Recent (your recent activity),\n",
"* Google Drive (files you have saved or uploaded to Google Drive),\n",
"* Github (files on a github repository),\n",
"* Upload (files to upload from your local machine).\n",
"\n",
"To get started, read the [Introduction to Colab](https://colab.research.google.com/notebooks/intro.ipynb), which is the first file in `Examples`.\n",
"\n",
"To get used to dealing with notebooks and data download this [notebook](https://gitlab.erc.monash.edu.au/bads/data-challenges-resources/-/blob/main/Pandas-DataFrames/05-PandasWeather.ipynb) and [data file](https://gitlab.erc.monash.edu.au/bads/data-challenges-resources/-/blob/main/Pandas-DataFrames/IDV60901.94866.csv), and then upload the notebook to Colab.\n",
"\n",
"The interface for Colab is fairly similar to Jupyter, and most of the commands that you use in Jupyter are transferrable to Colab. At the top there are two buttons: **+Code** and **+Text**, to add Python code and Markdown text respectively. When you are editing a cell, there is a menubar of useful commands that appears in the upper right-hand side of the cell. While when your are editing a text cell, there is a second menubar which appears at the upper left-hand side of the cell and a real-time preview on the right-hand side. This preview can be toggled, so it is below the Markdown code. The completion feature for editing of code is particularly useful. This will suggest possible completions for functions and data structures.\n",
"\n",
"On the left of each cell, you will see an arrowhead in a circle to run the cell. Press this and once the cell calculations have been completed a green tick will appear, along with the time taken to run the cell. Alternatively, you can run a cell by focussing on the cell and pressing `Shift + Enter`. You can also use the commands in the **Runtime** menu to run individual cells, or run all of the cells. \n",
"\n",
"Click on the first cell of the notebook and try to run all the cells. You should get an error that it can't find the data file. Goto the left-side panel and click on the folder icon. That will open an interface which shows a folder called _sample_data_, and three icons above this. Click on **Upload** and upload the csv file you previously downloaded. You might get a warning that the data will only be available for this session. Once the file is uploaded, click on the three dots at the side and select **Copy path**, then paste this in the `pd.read_csv` command that failed. You should now get that the path has _/content/_ prepended to the filename. \n",
"\n",
"Now try running the notebook again. It should run without any errors.\n",
"\n",
"Shell escapes are commands which are preceeded by a `!`. These run shell commands which interact with the computer's operating system, rather than Python commands. On Colab these are Unix or Linux commands.\n",
"\n",
"If you type (pwd => print working directory)\n",
"\n",
" !pwd\n",
"\n",
"in a cell, it will give you the name of the current directory or folder that you are working in. This should return _/content_, which is the default directory that Colab opens in. Now we can open files by giving their absolute or relative path. Just before we gave the absolute path, i.e., the path from the top of the directory structure, which always starts with _/_. The relative path is the path relative to the current directory, this will not include _/_ at the beginning. Since the file _IDV60901.94866.csv_ is in the current directory, we could use just this field to open the file. Change `pd.read_csv('/content/IDV60901.94866.csv')` to `pd.read_csv('IDV60901.94866.csv')` and rerun the notebook. \n",
"\n",
"In the directory _sample_data_ there is a file _california_housing_test.csv_. To open this file from the current directory (_/content_) we can specify the absolute path\n",
"\n",
" cht = pd.read_csv('/content/sample_data/IDV60901.94866.csv')\n",
"\n",
"or the relative path\n",
"\n",
" cht = pd.read_csv('sample_data/IDV60901.94866.csv')\n",
"\n",
"You could also open the file by first moving into the directory:\n",
"\n",
" %cd sample_data\n",
" cht = pd.read_csv('california_housing_test.csv')\n",
"\n",
"Here we have used the magic command `%cd`, rather than the shell escape `!cd`, as the shell escape only applies for that command, whereas the magic command permanently changes our working directory.\n",
"\n",
"To change back to the original directory you can use (the double dots represent the parent directory of the current directory)\n",
"\n",
" %cd /content\n",
"or\n",
" \n",
" %cd ..\n",
"\n",
"Check your directory after entering these commands using `!pwd`. \n",
"\n",
"To see a listing of the files in the current directory, use\n",
"\n",
" !ls\n",
"\n",
"Other shell commands that you might find useful are `mkdir`, `rm`, `mv`, `cp` and `man`. \n",
"\n",
"`mkdir` will create a directory in the current directory using the syntax :\n",
"\n",
" !mkdir newdir\n",
"\n",
"`rm` will remove a file or a directory. To remove a file use:\n",
"\n",
" !rm filename\n",
"\n",
"To remove an empty directory use:\n",
"\n",
" !rm -d newdir\n",
"\n",
"and to remove a directory and any files or directory contained in this directory (be careful about doing this, as there is no going back) use\n",
"\n",
" !rm -r newdir\n",
"\n",
"`mv` will rename or move a file or directory. We will just consider files, but the syntax for directories is the same. To rename a file, but keep it in the same directory, use\n",
"\n",
" !mv filename anotherfile\n",
"\n",
"To move a file to a different directory use\n",
"\n",
" !mv filename pathdir \n",
"\n",
"where pathdir is the absolute or relative path of the directory. \n",
"\n",
"`cp` is similar to `mv`, except that it makes a copy of the file or directory.\n",
"\n",
"`man` display the manual page for the command. For example, to find out everything your need to know about the `rm` command, type\n",
"\n",
" !man rm\n",
"\n",
"To obtain a brief listing of the usage of the command use the `--help` option. For example,\n",
"\n",
" !rm --help\n",
"\n",
"These are always useful, as it is too difficult to remember all the commands and options."
]
},
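{
"cell_type": "markdown",
"id": "a1b2c3d1",
"metadata": {},
"source": [
"The difference between absolute and relative paths described above can also be checked from Python. The following is a minimal sketch using the standard `pathlib` module, which is not otherwise used in this section. It assumes the working directory is _/content_ and that the _sample_data_ folder provided by Colab is present; elsewhere the printed paths will differ and the final check may report `False`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d2",
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# A relative path: it does not start with '/'.\n",
"relative = Path('sample_data') / 'california_housing_test.csv'\n",
"\n",
"print('Working directory:', Path.cwd())\n",
"print('Relative path:    ', relative)\n",
"print('Is absolute?      ', relative.is_absolute())\n",
"\n",
"# resolve() joins the relative path onto the current working directory.\n",
"absolute = relative.resolve()\n",
"print('Absolute path:    ', absolute)\n",
"print('Is absolute?      ', absolute.is_absolute())\n",
"\n",
"# Both forms refer to the same file, if it exists in this session.\n",
"print('File exists?      ', absolute.exists())"
]
},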
{
"cell_type": "markdown",
"id": "29d3be77",
"metadata": {},
"source": [
"![Xkcd](https://imgs.xkcd.com/comics/tar.png)"
]
},
{
"cell_type": "markdown",
"id": "66ac959e",
"metadata": {},
"source": [
"You can also upload and download files using the Python cells. The following code will upload files and store them in the current directory. Then any of the files can be accessed as normal.\n",
"\n",
" from google.colab import files\n",
" files.upload()\n",
"\n",
"To download files, we can use the `files.download()` method. For example, to download the file _california_housing_test.csv_ to our default Download directory on our local machine, we would use\n",
"\n",
" files.download('/content/sample_data/california_housing_test.csv')\n",
"\n",
"Now goto to the **File** menu. From there you can download the current notebook in various formats. However, click on **Locate in Drive**, which will open a window which shows that the file is saved in _My Drive > Colab Notebooks_.\n",
"\n",
"Goto **Runtime > Manage session** and terminate the current session. Since sessions have an idle time-limit of 90 minutes and a maximum runtime-limit of 12 hours, this simulates starting a new session the next day. Then goto **File > Open Notebook** and open the notebook that was just closed using **Recent** or **Google Drive**. Once the session opens, open the file explorer on the left-hand side. Now the csv file no longer appears. To run this notebook again, you will need to upload the csv file using the methods described above.\n",
"\n",
"You could also open the notebook by going to the web interface for Google Drive and navigating to _My Drive > Colab Notebooks_. Then click on the notebook and Colab will open. Notebook files don't need to be stored in _Colab Notebooks_. Any notebook in Google Drive will be opened in Colab.\n",
"\n",
"You can't open notebooks using the file explorer on Colab, this is only for accessing data. \n"
]
},
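{
"cell_type": "markdown",
"id": "a1b2c3d3",
"metadata": {},
"source": [
"As a small illustration of the upload step above, the sketch below keeps the value returned by `files.upload()`, which is a dictionary mapping each uploaded filename to its contents as bytes, and reports what arrived. It only runs inside Colab, since `google.colab` is not available elsewhere, and it assumes the files are saved in the current working directory, as described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d4",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from google.colab import files\n",
"\n",
"# Opens a file picker; returns {filename: contents as bytes}.\n",
"uploaded = files.upload()\n",
"\n",
"for name, content in uploaded.items():\n",
"    # Report the size of each file and whether it is now in the working directory.\n",
"    print(f'{name}: {len(content)} bytes, on disk: {os.path.exists(name)}')"
]
},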
{
"cell_type": "markdown",
"id": "bce91aa7",
"metadata": {},
"source": [
"## Sharing notebooks"
]
},
{
"cell_type": "markdown",
"id": "a01139b9",
"metadata": {},
"source": [
"Sharing notebooks works the same as sharing files in Google Drive. Click **Share** in the upper right-hand corner and this will open a panel, where you can share the notebook by entering the person's email address.\n",
"\n",
"Sharing is a bit like using a git repository, but without the flexibility. You will need to determine a workflow for managing files. The simplest approach is to work on a copy of the file, and then let one person in your group integrate the changes."
]
},
{
"cell_type": "markdown",
"id": "67cc3c14",
"metadata": {},
"source": [
"## Accessing data via a git repository"
]
},
{
"cell_type": "markdown",
"id": "d47e98c6",
"metadata": {},
"source": [
"If your data is in a git repository, you can access it by cloning the repository to your current directory. Last semester you should have created a git repository on https://github.com of the form https://github.com/username/reponame, which we will refer to as `giturl`. Use your repository name wherever giturl occurs in the following cells and replace `filename`, `username`, `reponame` and `personalaccesstoken` by your personal details. \n",
"\n",
"To clone that repository to the current directory we can use\n",
"\n",
" !git clone giturl\n",
"\n",
"Once this is completed all your files in the repository will have been uploaded to the Colab filesystem in your current directory and can be accessed either using the file explorer or the Python interface.\n",
"\n",
"This process is the same as choosing the **Gitlab** option when opening a file.\n",
"\n",
"Now change directory into your git repository, and create a file. This could be done by saving data using `pd.to_csv()`, copying a file from another directory or using (`touch` creates an empty file):\n",
"\n",
" !touch filename\n",
"\n",
"In a new window in your browser goto your github repository and create a Personal Access Token by following the instructions at https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token. Then in Colab change back to the top directory of your git repository and upload the changes you have made on Colab to https://github.com/username/reponame using the following commands (just make sure you don't store the notebook with the Personal Access Token included):\n",
"\n",
" !git status\n",
" !git add filename\n",
" !git commit -m 'My message'\n",
" !git push https://username:personalaccesstoken@github.com/username/reponame.git\n",
"\n",
"We can also clone public repositories. For example, to clone the repository with the BADS notebooks, we can use:\n",
"\n",
" !git clone https://gitlab.erc.monash.edu.au/bads/data-challenges-resources\n",
"\n",
"However, as this is public, we cannot upload any changes that are made locally to the repository. If you want to save the changes to the filesystem, you will need to download the files or copy the filesystem to Google Drive.\n",
"\n",
"One last trick is that we can open any public notebook on github by copy the link to the file and changing `github.com` to `colab.research.google.com/github`. For example, try opening the file https://github.com/coramthomas/bads-repo/blob/main/SinPlot.ipynb in Colab."
]
},
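{
"cell_type": "markdown",
"id": "a1b2c3d5",
"metadata": {},
"source": [
"One possible way to avoid typing the Personal Access Token into a cell is to read it at run time with `getpass` from the Python standard library. This is only a sketch: `username` and `reponame` are the same placeholders as above, the cell must be run from inside the cloned repository, and you should still check the cell output before saving or sharing the notebook. If `git commit` reports that it does not know who you are, setting `user.name` and `user.email` with `git config` should resolve it."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d6",
"metadata": {},
"outputs": [],
"source": [
"from getpass import getpass\n",
"\n",
"# Placeholders: replace these with your own GitHub details.\n",
"username = 'username'\n",
"reponame = 'reponame'\n",
"\n",
"# getpass() prompts for the token without displaying it or writing it into a cell.\n",
"token = getpass('Personal Access Token: ')\n",
"\n",
"# In a shell escape, IPython replaces {name} with the value of the Python variable.\n",
"!git push https://{username}:{token}@github.com/{username}/{reponame}.git"
]
},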
{
"cell_type": "markdown",
"id": "0f4d16f7",
"metadata": {},
"source": [
"## Accessing data via Google Drive"
]
},
{
"cell_type": "markdown",
"id": "4f2e5b65",
"metadata": {},
"source": [
"The most effective way for accessing data is using Google Drive. We can mount the drive or connect it to the Colab filesystem using the file explorer or the Python interface.\n",
"\n",
"To mount Google Drive using the file explorer, click on the **Google Drive** icon, which is the third icon at the top of the file explorer. This will insert the following codelet into your notebook:\n",
"\n",
" from google.colab import drive\n",
" drive.mount('/content/drive')\n",
"\n",
"You will then need to run the cell and follow the instructions to generate an authorization code to copy into the cell. Then your Google Drive files will appear at _/content/drive_, with _Shareddrive_ and _My Drive_ folders under these directories.\n",
"\n",
"The second way to mount the drive is just to copy the above codelet into a cell in one of your notebooks, which will then be run automatically each time you run the notebook.\n",
"\n",
"Once you have mounted Google Drive you can then navigate the directories and files using the file explorer or using the commands explained above."
]
},
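{
"cell_type": "markdown",
"id": "a1b2c3d7",
"metadata": {},
"source": [
"To confirm that the mount worked, the following sketch mounts the drive and prints the folders found directly under the mount point. It only runs inside Colab. The folder names are deliberately not hard-coded, since they depend on your account, and `mount_point` is just a variable name chosen for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1b2c3d8",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from google.colab import drive\n",
"\n",
"mount_point = '/content/drive'\n",
"drive.mount(mount_point)\n",
"\n",
"# Print the folders directly under the mount point.\n",
"for entry in sorted(os.listdir(mount_point)):\n",
"    path = os.path.join(mount_point, entry)\n",
"    if os.path.isdir(path):\n",
"        print(path)"
]
},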
{
"cell_type": "markdown",
"id": "32bae118",
"metadata": {},
"source": [
"## Exercises"
]
},
{
"cell_type": "markdown",
"id": "45375cf8",
"metadata": {},
"source": [
"You will need to complete the exercises in Colab. Upload this file to Colab, then when complete download the notebook to your local computer and then submit via Moodle."
]
},
{
"cell_type": "markdown",
"id": "8fe8734a",
"metadata": {},
"source": [
"### Exercise 1 (2 marks)"
]
},
{
"cell_type": "markdown",
"id": "c755caa5",
"metadata": {},
"source": [
"The following is a codelet from https://gitlab.erc.monash.edu.au/bads/data-challenges-resources/-/blob/main/Topic-Hints/Time-Lag-Features/Pedestrians.ipynb. Upload the data file from the directory for this notebook and then run this code. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fad1f855",
"metadata": {},
"outputs": [],
"source": [
"import pathlib\n",
"import pandas as pd \n",
"# Read from csv file and set up a datetime index with the correct type.\n",
"data = (\n",
" pd.read_csv(\"pedestrians-september.csv\")\n",
" .assign(TimeOfDay=lambda df: pd.to_datetime(df['TimeOfDay']))\n",
" .rename(columns={\"TimeOfDay\": \"Timestamp\"})\n",
" .set_index(\"Timestamp\")\n",
")\n",
"data.head()"
]
},
{
"cell_type": "markdown",
"id": "853b8b24",
"metadata": {},
"source": [
"### Exercise 2 (3 marks)"
]
},
{
"cell_type": "markdown",
"id": "09df8fbb",
"metadata": {},
"source": [
"In your home directory on Colab (_/content_), using one call to `mkdir` create a directory _data/pedestrians_ and copy the data file from Exercise 1 into this directory. Change into this directory and create a README file with a one line description by using `echo` and `>`. The latter redirects the output of a command to a file. For example, `!pwd > WorkingDirectory` will create a file called `WorkingDirectory` containing the name of the current directory. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e91cdbd3",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "4f50d4ce",
"metadata": {},
"source": [
"### Exercise 3 (3 marks)"
]
},
{
"cell_type": "markdown",
"id": "fa775ef7",
"metadata": {},
"source": [
"Clone your github repository and create a text file in the top directory of this repository, then commit these changes to the local repository (don't commit the changes to the remote repository). If you don't have a github repository, use a public repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8046f5a2",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "f31ef57c",
"metadata": {},
"source": [
"### Exercise 4 (2 marks)"
]
},
{
"cell_type": "markdown",
"id": "7baf4e91",
"metadata": {},
"source": [
"Mount your Google Drive, change directory to where the Colab notebooks are stored and list the files with a long listing, ordered from newest to oldest."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "889efdd7",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}