Setup Guide for Coding Machine Learning Methods in Python

In order to implement the algorithms seen in class and work on the projects, we’ll be using Python notebooks. This first lab will serve as an introduction to the Python language, the environment we are going to be using, and how to do basic vector and matrix manipulations.

Once you have the notebook system up and running, if you are new to the NumPy library, you can try the exercises in the Matlab introduction PDF, but using NumPy instead.

The environment

Python distribution: Anaconda

We will be using the Anaconda distribution to run Python 3, as it is easy to install and comes with most of the packages we will need. To install Anaconda, go to the download page and get the Python installer for your OS. Make sure to pick the newer 3.x version, not 2.x. Follow the instructions of the installer and you’re done.

Warning! The installer will ask you if you want to add Anaconda to your path. Your default answer should be yes, unless you have specific reasons not to want this.
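Once the installer has finished, a quick sanity check can confirm that the setup works. The sketch below is only one possible way to do it: check_environment is a hypothetical helper (not part of Anaconda), and the package list is an assumption based on the libraries used in this lab. It reports the Python version, whether the conda command is found on your path, and whether each package can be imported.

```python
import importlib.util
import shutil
import sys


def check_environment(packages=("numpy", "matplotlib", "pandas", "notebook")):
    """Return a dict describing the current Python setup (hypothetical helper)."""
    report = {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "is_python3": sys.version_info.major == 3,
        # shutil.which returns None when the command is not found on the path
        "conda_on_path": shutil.which("conda") is not None,
    }
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If conda_on_path or one of the packages shows False, the path question in the installer (or the Installation FAQ at the end of this guide) is the first thing to look at.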

Development Environment

During the course, we will use Jupyter Notebooks, which are a great tool for exploratory and interactive programming, and for data analysis in particular. Notebooks are browser based, and you start them on your localhost by typing jupyter notebook in the console. Notebooks come with Anaconda by default. The interface is pretty intuitive, but there are a few tweaks and shortcuts that will make your life easier, which we’ll detail in the next section. You can of course ask any of the TAs for help on using the Notebooks.

The Notebook System

For additional resources on how the notebook system works, we recommend

Examples

We provide you with an example of a notebook for this first lab, but if you want to see some more examples already, feel free to take a look at

Tips & Tricks

There are a few handy commands that you should start every notebook with:

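# Plot figures in the notebook (instead of a new window).
# Alternative: %matplotlib inline
# (keep comments on their own line, a trailing comment can be parsed as an argument of the magic)
%matplotlib notebook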

# Automatically reload modules
%load_ext autoreload
%autoreload 2        

# The usual imports
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Keyboard shortcuts

Python

We will be working in Python. If you have already been introduced to Python, feel free to skip this section. If you come from another background, you might want to work through some tutorials in addition to this lab over the next week to feel comfortable with it. You do not need to become an expert in Python, but you should be comfortable with the general syntax and some of the idiosyncrasies of Python, and know how to do basic vector and matrix algebra. For the last part, we will be using NumPy, a library we will introduce later.
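To give a feeling for what these idiosyncrasies look like, here is a short illustrative sketch in plain Python (no libraries). The variable names are made up for the example, and it only covers a few points that tend to surprise newcomers: zero-based indexing, slices that exclude their end index, assignment that shares a list instead of copying it, and the two division operators.

```python
# Indexing starts at 0, and a slice excludes its end index.
values = [10, 20, 30, 40, 50]
first = values[0]        # first element
middle = values[1:3]     # elements at positions 1 and 2
last = values[-1]        # negative indices count from the end

# Assignment does not copy a list: both names refer to the same object.
alias = values
alias[0] = 99            # this also changes values[0]

# An explicit copy is independent of the original.
independent = values.copy()
independent[1] = -1      # values[1] is unchanged

# In Python 3, / is true division and // is floor division.
half = 7 / 2
floor_half = 7 // 2

# List comprehensions build a list from a loop in a single expression.
squares = [x ** 2 for x in range(4)]
```

Printing each variable in a notebook cell is a good way to check that the results match your expectations.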

For a nice introduction to Python, you should take a look at the Python tutorial. Here are some reading recommendations:

Here are some additional resources on Python:

NumPy and Vector Calculations

The npprimer.ipynb notebook, provided as part of the first lab, has some useful commands and exercises to help you get started with NumPy.

If you are familiar with Matlab, a good starting point is this guide. Be aware that we will use the array structure far more often than the matrix structure.
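The difference matters most for multiplication. The sketch below, with small made-up arrays, shows how the NumPy array structure behaves: the * operator multiplies element by element, while the @ operator performs the matrix product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
v = np.array([1, 1])      # a 1-D array of shape (2,), neither a row nor a column

elementwise = A * B       # element-by-element product, same shape as A and B
matrix_product = A @ B    # matrix product, equivalent to np.dot(A, B)
matvec = A @ v            # matrix-vector product, result has shape (2,)
transposed = A.T          # transpose

print(elementwise)
print(matrix_product)
print(matvec)
```

Matlab users should note that * on arrays is not the matrix product, and that indices start at 0 rather than 1.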

A good and probably more complete reference is this one.

Installation FAQ

Other shell. If you are using a shell other than bash (e.g. zsh on macOS), you still need to add the installed software to your path after installing Anaconda, that is, add it to the correct profile for your shell. To do so, run the following commands in your terminal: touch ~/.bash_profile; open ~/.bash_profile. This opens your bash profile, where you’ll see the line that was added by the Python installer. Copy it. Then run touch ~/.zshrc; open ~/.zshrc to open the profile for zsh, and paste the line at the bottom of the file.

Alternative Python IDEs. While we recommend plain Jupyter Notebooks, if you are more comfortable with a traditional IDE, you can give PyCharm a try. Your EPFL email gives you access to the free educational version. Keep this option in mind if you need a full-fledged debugger to track down a nasty bug.

And of course, as a third alternative, you can always use a decent text editor and run your code from the console or any plugin. Keep in mind that the TAs might not be able to help you with your setup if you go down this path.

Additional References

A good Python and NumPy tutorial from CS231n, Stanford. (WARNING: uses Python 2, not 3)