💻 Labs

💻 Labs#

Table of Contents! 👇🏽

Lab 0: Tools
Lab 1: Tidy Data
Lab 2: Data Engineering
Lab 3: Geo-Visualisation
Lab 4: Networks and Spatial Weights
Lab 5: Linear Regression
Lab 6: Clustering
Lab 7: Points

Tip

DOWNLOAD ALL THE LABS

You can download all the labs, associated datasets, html files showing the output of the labs, and homework notebooks using the links below.

Lab 0 - Tools#

lab-0

IMPORTANT This is a supplementary notebook that covers many basics of the tools we will use in the course but does not explain anything directly related to Spatial Data Science.

Students are encouraged to read it before getting started with the other notebooks and then keep it as a reference throughout the rest of the course. There are some basic Python operations in there that act as a refresher, practice or learning material, if you may need it.

Extra Material#

If you want to explore further by yourself the contents presented in this tutorial, the following pointers are good places to start:

Gallery of interesting notebooks: a wealth of examples of Jupyter (formerly called IPython) notebooks.
(Downey, 2012): very good general introduction to Python as a programming language and to the algorithmic way of thinking. The book is freely available in HTML and PDF.

Lab 1 - Tidy Data#

lab-1

Data#

This session uses the “Census socio-demographics” dataset of The Hague, Netherlands and Liverpool, United Kingdom in two parts. The dataset for this lab is provided in the zipped lab files above.

Table of neighbourhoods called Buurten areas in The Hague with population counts by World region and Age groups. The table is derived from the denhaagcijfers dataset.
Table of LSOA areas in Liverpool with population counts by World region. The table is derived from the CDRC Census data pack.
Collection of socio-demographic characteristics from the 2011 Census for the city of Liverpool.

Extra Material#

[Visualization] Python library seaborn tutorial.
NY Times article about the importance of cleaning data.

Lab 2 - Data Engineering#

lab-2

Data#

This session uses two datasets which are provided in the zipped lab files above.

A dataset about wines from different countries, download from Kaggle.
A dataset scraped and collected from Goodreads.

Lab 3 - Geo-Visualisation#

lab-3

The homework exercises are embedded within the lab files itself. You have to complete the exercises as you go along understanding the rest of the code. There are two files for this lab, geovis and eda. Since this geocomputational lab is not as straightforward as other python code, a solution set is also provided for questions indicated in the lab.

Data#

This session uses multiple datasets which are provided in the zipped lab files above.

A “Census socio-demographics” dataset from the Municipality of the city of Cape Town from the Open Data Portal for the City of the Cape Town. Source.
Shape files for the city of Cape Town from Media Monitoring Africa (MMA) which is an independent watchdog for ethical and fair journalism that upholds human rights. Source.
Simple datasets on heart diseases, titanic and a mystery are also provided.

Extra Material#

A good introduction to the geopandas project is provided by Kelsey Jordahl, the project’s founder in this set of slides from a 2015 talk and the companion repository.
An additional great resource is this 4h. workshop by Carson Farmer.

Lab 4 - Networks and Spatial Weights#

lab-4

Data#

This session uses multiple datasets which are all provided in the zipped lab files above.

This session uses multiple datasets which are provided in the zipped lab files above.

An “Index of Multiple Deprivation”” dataset as well as the Ordnance Survey (OS) Geodata Pack. Scores, ranks, and components of the 2015 Index of Multiple Deprivation (IMD). Source: CDRC’s English Indices of Deprivation 2015 Geodata Pack for the city of Liverpool (UK).
Additionally, you will need the raster file for the basemap of Liverpool. This has been assembled by Dani Arribas-Bel from the OS VectorMap District (Backdrop Raster), and it is licensed as OpenData.
A Brexit dataset.

The Brexit dataset is of the results of the 2016 referendum vote to leave the EU, at the local authority level. All the necessary data have been assembled for convenience in a single file that contains geographic information about each local authority in England, Wales and Scotland, as well as the vote attributes. The file is in the modern geospatial format GeoPackage, which presents several advantages over the more traditional shapefile (chief among them, the need of a single file instead of several).

The source data used to compile the file linked above include:

Electoral Commission data on the EU referendum results (url)
Local Authority District boundaries (archived, not available online anymore)

Lab 5 - Linear Regression#

lab-5

Since this lab is not as straightforward as other python code, a solution set is provided for questions indicated in the lab and homework. It is better if you try yourself and get feedback from your peers, and then look at the solutions.

Data#

This session uses multiple data files.

A dataset downloaded from Kaggle on stats about the premiere league.
A Boston housing dataset and its training set companion.
An IMDB cast dataset.
A car dataset.
A cab dataset.

These sets are not that relevant to global urban issues but simple to work with on small regression practice sets.

Lab 6 - Clustering#

lab-6

Data#

This session uses the “AirBnb listing for Inner London - MSOA level” dataset.

This dataset contains information for AirBnb properties for the area of Inner London aggregated at the MSOA level. It has been prepared by Dani Arribas-Bel using as the original source the full listing of AirBnb locations for London provided by Inside AirBnb. Same as the source, the dataset is released under a CC0 1.0 Universal License.

For every polygon, the following variables are provided:

id: MSOA unique identifier.
accommodates: average property capacity in the MSOA.
bathrooms: average number of bathrooms in the properties within the MSOA.
bedrooms: average number of bedrooms in the properties within the MSOA.
beds: average number of beds in the properties within the MSOA.
number_of_reviews: average number of reviews received by the properties within the MSOA.
reviews_per_month: average number of reviews per month received by the properties within the MSOA.
review_scores_ratings: average rating score received by the properties within the MSOA.
review_scores_accuracy: average accuracy score received by the properties within the MSOA.
review_scores_cleanliness: average cleanliness score received by the properties within the MSOA.
review_scores_checkin: average checkin score received by the properties within the MSOA.
review_scores_communication: average communication score received by the properties within the MSOA.
review_scores_location: average location score received by the properties within the MSOA.
review_scores_value: average value score received by the properties within the MSOA.
property_count: total number of AirBnb properties listed withing the MSOA.

The lab also uses an additional file that contains the boundary lines of the London boroughs provided in the data folder as well.

Extra Material#

The documentation for scikit-learn, a world-class Python library for machine learning, is excellent and includes many examples that cover the entire functionality set of the library.

Lab 7 - Points#

lab-7

Data#

This lab uses a sample of geo-referenced locations of photographs taken in Tokyo.

Extra Material#

A very good resource for kernel density estimation in Python is provided in this blog post by Jake Vanderplas.