Tools, resources, and documentation to get started at data night

NewportDataProject/data-night

data-night

Welcome to Data Night! This is a community event where we explore our neighborhood through the lens of data. The goal is to find, clean up, analyze, and visualize data sets related to our community.

The Problem

The focus of this Data Night is pedestrian safety, specifically crosswalks. The City of Newport Bicycle and Pedestrian Advisory Commission (BPAC) has been asked to make recommendations on crosswalk improvements and additions. We're helping out by combing through publicly available data to get a good understanding of what the best options are for the city. We're also building visualizations to bring out the stories behind the datasets.

At this Data Night we are going to explore the data we already have and find new data that we need. We want to end with clean, well-structured data files and interesting plots, graphs, and maps. This notebook walks through the general process of what we are trying to accomplish.

Setup

This repository has pretty much everything you need to get started. Fork it and clone it to your computer. Alternatively, you can download a zip file. Create a new folder in the /workspace directory and save your work there so we can pull it all back together!

There are a few ways to contribute at Data Night:

  • If you're more comfortable with researching on the web and using Google tools, check out the data mining team.
  • If you know Python or want to practice it, keep reading below for a few options.
  • If you've already got your own data science workflow, grab some datasets and have fun.

Python in Jupyter

Jupyter is a browser-based development environment for Python. Run code and display data and graphics from your browser. There are some pre-built notebooks in the /workspace directory to kick-start your analysis.

Jupyter Hub

If you just want to skip the setup and start coding, we have a JupyterHub server running for Data Night at http://jupyter.newportdatproject.org.

  1. Log in with your GitHub credentials (sign up for a GitHub account if you don't have one).
  2. Upload one of the example *.ipynb notebooks from the /workspace directory or start a fresh one.
  3. When you're done, make sure you download your notebooks, because the server will be shut down after the event.

Docker

Docker is a tool for creating and deploying reproducible, consistent environments called "containers". The idea behind a container is that the environment and runtime are scripted, so they can be rebuilt the same way over and over again.

For Data Night, the intent of using Docker is to provide a unified and consistent environment where anyone can get the most useful tools in one place, without the overhead of installing them all manually.

Starting the Container

  1. Install Docker.
  2. In the command shell, navigate to the root of this repository.
  3. Run docker-compose up -d (the first run may take a little time).
  4. Navigate to localhost:8889 in your browser to access a Jupyter Notebook (served from the container).

Using the Container

After following the steps above, you can use the Docker container like any other shell.

  1. Connect to the container by running docker exec -it datanight_python_1 /bin/bash
  2. Run some commands! For example:
:/# python --version
Python 3.6.3 :: Anaconda, Inc.

The default working directory is /workspace
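
The same check can be made from Python itself, which is handy inside a notebook where no shell is open. This is a minimal sketch using only the standard library; the helper name describe_environment is made up for illustration, and the values it reports depend on where you run it.

```python
import os
import sys


def describe_environment():
    """Return the Python version and current working directory as a dict."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {"python_version": version, "working_directory": os.getcwd()}


if __name__ == "__main__":
    info = describe_environment()
    print("Python:", info["python_version"])
    print("Working directory:", info["working_directory"])
```

Inside the container, the working directory reported should match the default mentioned above.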

Anaconda (Python)

Anaconda is a distribution of Python focused on data science. It provides a very good package manager, conda, and manages isolated virtual environments so you can run multiple versions of Python on your machine for different projects. These instructions will get you up and running with the base packages needed for the examples.

  1. If you don't have it already, install Anaconda or Miniconda.

  2. Set up the data-night conda environment with the command:

    $ conda create -n data-night python=3.6
    $ activate data-night  # if you're using bash, use source activate data-night instead
    $ conda install -c conda-forge geopandas bokeh folium osmnx geopy

  3. Open your console in this directory, activate the environment with activate data-night (source activate data-night if you're using bash), and start Jupyter with jupyter notebook.

  4. Your browser should open to the Jupyter server. If not, copy and paste the link (with the token) from your console. It should look something like 127.0.0.1:8888/?token=####.
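
Once Jupyter is running, it can be worth confirming that the environment actually contains the packages from the install step. The sketch below is one way to do that with the standard library only; the helper name find_missing and the REQUIRED_PACKAGES list are illustrative choices, not part of this repository.

```python
import importlib.util

# Packages from the conda install step, plus pandas (a dependency of geopandas).
REQUIRED_PACKAGES = ["pandas", "geopandas", "bokeh", "folium", "osmnx", "geopy"]


def find_missing(packages):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All data-night packages are available.")
```

If anything is reported missing, re-running the conda install command with the data-night environment activated is the likely fix.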

Python Packages

These are some Python libraries that may come in handy at Data Night. They're installed in the Docker image and on the Jupyter notebook server.

  • Pandas - The favorite data processing library. Read and write data files; sort, slice, and access data elements; analyze and visualize data. Pandas is also used as a base for a number of more specialized packages.
  • GeoPandas - An extension of Pandas that works with geospatial data.
  • osmnx - A tool for analyzing and visualizing street networks, pulling from OpenStreetMap data and using standard Python network analysis libraries.
  • Folium and Bokeh - Two visualization libraries that work well in Jupyter notebooks. Folium is an interface to build interactive maps, and Bokeh builds interactive charts, including maps.
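
To give a feel for the clean-up-then-analyze workflow with Pandas, here is a minimal sketch. The records and the column names (street, crosswalk_count) are made up for illustration and do not come from any dataset in this repository.

```python
import pandas as pd

# Made-up records for illustration only; real Data Night datasets will differ.
raw = pd.DataFrame(
    {
        "street": [" Main St", "Elm St ", "Main St", None],
        "crosswalk_count": ["3", "5", "2", "1"],
    }
)

# Clean up: drop rows without a street name, trim whitespace,
# and convert the counts from text to integers.
clean = raw.dropna(subset=["street"]).copy()
clean["street"] = clean["street"].str.strip()
clean["crosswalk_count"] = clean["crosswalk_count"].astype(int)

# Analyze: total crosswalks recorded per street.
totals = clean.groupby("street")["crosswalk_count"].sum()
print(totals)
```

Real data would typically be loaded from a file (for example with pd.read_csv) rather than typed in, but the same cleaning steps apply.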

Data

There are public datasets available all over the internet. Here are a couple places to start looking:
