Coursera and online MOOCs for Machine Learning

Wow, what a great resource: Andrew Ng’s CS229A is available for free on Coursera. It reminds me that to really learn something, you have to code it and try it. The course is free and takes a nominal 12 weeks. They even supply TAs and folks to talk to. The only hard part is understanding how vectorization works (and remembering your linear algebra and calculus).
It’s one of the top-rated courses on machine learning, and I love that you can submit your exercises for grading.
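To make the vectorization point concrete, here is a minimal NumPy sketch of the linear regression cost J(θ) = (1/2m) Σ (xᵢ·θ − yᵢ)², computed once with explicit loops and once as matrix operations, plus a vectorized gradient descent step θ := θ − (α/m) Xᵀ(Xθ − y). The course itself uses Octave, so this is a Python rendering, and the function names are illustrative.

```python
import numpy as np

def cost_loop(X, y, theta):
    """Squared-error cost computed one training example and one feature at a time."""
    m = len(y)
    total = 0.0
    for i in range(m):
        prediction = 0.0
        for j in range(len(theta)):
            prediction += X[i, j] * theta[j]
        total += (prediction - y[i]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, theta):
    """Same cost as a matrix-vector product: (X theta - y)^T (X theta - y) / 2m."""
    m = len(y)
    errors = X @ theta - y
    return float(errors @ errors) / (2 * m)

def gradient_step(X, y, theta, alpha):
    """One batch gradient descent update, with no loop over examples or features."""
    m = len(y)
    return theta - (alpha / m) * (X.T @ (X @ theta - y))

# Tiny example: a column of ones for the intercept plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])
print(cost_loop(X, y, theta), cost_vectorized(X, y, theta))
```

Both cost functions give the same number; the vectorized one simply lets NumPy (or Octave) do the loops in optimized native code.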
The next level of classes are Stanford’s CS231n (convolutional neural networks for computer vision, which Andrej Karpathy taught) and CS224n (natural language processing with deep learning). Stanford posts these, but doesn’t grade the exercises, although you can check your work by doing a Google search for the answers.
Coursera offers the deeplearning.ai classes for $49 a month, or you can spend $400 a month at Udacity. I’m trying the $49-a-month five-part series. It is decent, but it uses Jupyter notebooks that they host. I’d much rather run offline and keep the code.

Using Coursera offline

There is a tool called https://github.com/courseradl/courseradl (coursera-dl) that downloads everything from a course. This includes all the videos and other materials, which is super useful for offline work when you are on a plane. You can even use a Docker container to extract the data with a simple command:

docker run --rm -it courseradl/courseradl -u <username> <course1> <course2> ...

This works semi-well; the problem is that the Coursera API doesn’t work all the time. Note that the documentation suggests putting your password in cleartext on the command line. I wouldn’t recommend that, since it puts your password into your shell history file.
The developer suggests that you keep trying, and also that you try logging in to a subsidiary site. They also suggest digging the authentication cookie (CAUTH) out of Chrome and passing it with -ca. This example assumes an earlier download failed, so it uses --resume to restart it:

coursera-dl -ca '<CAUTH cookie value>' --resume machine-learning
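Since the advice is to keep trying, a small retry loop saves babysitting the terminal. This is a minimal sketch, assuming coursera-dl is on your PATH and exits with a nonzero status when a download fails; the function name, attempt count, and delay are hypothetical choices, and `run` is injectable only so the loop can be tested without touching Coursera.

```python
import subprocess
import time

def download_with_retries(course, cauth, attempts=5, delay_seconds=30, run=subprocess.run):
    """Re-run `coursera-dl -ca <cookie> --resume <course>` until it exits with status 0.

    Returns the attempt number that succeeded; raises RuntimeError if all attempts fail.
    """
    command = ["coursera-dl", "-ca", cauth, "--resume", course]
    for attempt in range(1, attempts + 1):
        result = run(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}")
        if attempt < attempts:
            time.sleep(delay_seconds)  # give the flaky API a moment before retrying
    raise RuntimeError(f"coursera-dl still failing after {attempts} attempts")
```

For example, `download_with_retries("machine-learning", os.environ["COURSERA_CAUTH"])` reads the cookie from an environment variable (a hypothetical name you would export yourself), which also keeps it out of the command you type.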

Running Notebooks Offline

To get the notebooks for the exercises, Jan Van De Poel explains how you can pull out the notebooks with a magic tar command and then run them locally. The basic trick is that at the end of every notebook, you create a new cell with the incantation below, which takes the whole notebook directory and turns it into a compressed archive file, and then hit Shift+Enter to run it (the leading ! tells Jupyter to run the line in a shell). In the flags, c creates an archive, h follows symbolic links so you get the real files, v is verbose so you can see what it is doing, z compresses with gzip, and the last f gives the name of the archive file. Finally, the --exclude keeps tar from copying the archive into itself, and the asterisk says everything in the current directory (tar recurses into subdirectories).

!tar chvzf notebook.tar.gz --exclude notebook.tar.gz *
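If the shell escape is ever unavailable, the same archive can be built from a notebook cell with Python’s standard `tarfile` module. This is a rough sketch of what the tar command does, not Jan Van De Poel’s method; the function name is hypothetical.

```python
import os
import tarfile

def archive_directory(archive_name="notebook.tar.gz", source_dir="."):
    """Approximate `tar chvzf notebook.tar.gz --exclude notebook.tar.gz *`.

    Like the shell glob `*`, top-level hidden entries (names starting with '.') are skipped.
    """
    archive_basename = os.path.basename(archive_name)

    def skip_archive(info):
        # Mirrors --exclude: drop any member whose file name matches the archive's name
        return None if os.path.basename(info.name) == archive_basename else info

    # "w:gz" is the z flag; dereference=True stores the files symlinks point to (the h flag)
    with tarfile.open(archive_name, "w:gz", dereference=True) as tar:
        for name in sorted(os.listdir(source_dir)):
            if name.startswith("."):
                continue
            path = os.path.join(source_dir, name)
            print(path)  # list each top-level entry, a lighter version of the v flag
            tar.add(path, arcname=name, filter=skip_archive)  # recurses into directories
    return archive_name
```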

Then choose File/Open, select this magic file in the Jupyter file browser, and save it to your local machine. You definitely want to commit it to a Git repository. As an aside, GitHub just changed their policy, so free accounts can have private repositories; save yourself $7/month if you are just hacking around. They also give you 1GB of LFS storage for free now too.
This doesn’t work super well, mainly because the Coursera server rejects big transfers with a “25MB chunk is too long” error, so instead you have to copy out pieces. The easiest way to do this is to tar up pieces, using -T - so that tar reads the list of files to archive from standard input:

!find .. -name "*.h5" | tar chvzf h5.tar.gz --exclude h5.tar.gz -T -
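To decide how to cut the files into pieces, a greedy grouping by file size works well enough. This is a minimal sketch under the assumption that keeping each group’s uncompressed total at or under 25MB keeps the gzipped archive under the limit too (compression only shrinks most files); the function name is hypothetical.

```python
import os

LIMIT_BYTES = 25 * 1024 * 1024  # the 25MB transfer limit

def batch_by_size(paths, limit=LIMIT_BYTES):
    """Greedily group files so each group's total size is at most `limit`.

    A single file larger than `limit` ends up alone in its own group and still needs splitting.
    """
    batches, current, current_size = [], [], 0
    for path in paths:
        size = os.path.getsize(path)
        if current and current_size + size > limit:
            batches.append(current)  # this group is full, start a new one
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

You could write each batch to a text file, one path per line, and hand it to tar with `-T batch0.txt` instead of `-T -`, producing one archive per batch.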

If you want all the exercises, you need to know a little bit about the directory structure. For Courses 1, 2 and 5, the top is two levels up, so:

!tar chvzf notebook.tar.gz --exclude notebook.tar.gz ../..

Course 4 is different: the top is only one level up, but the files are bigger. At least on Safari, you can’t exceed 25MB in a transfer, so a single archive of everything may not copy for you; because the test sets are big, you have to divide it down, as below. Alternatively, you can use one of the GitHub repos that have the files, but be careful: they also have the answers, so you will have to remove the code between the “START CODE HERE” and “END CODE HERE” markers so you can actually do the programming.

!tar chvzf notebook.tar.gz --exclude notebook.tar.gz ../week1 ../dummy
!tar chvzf notebook2.tar.gz --exclude notebook2.tar.gz ../week2
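If one of these archives is still over the limit, a different way to divide it (a swapped-in technique, the same idea as the Unix `split` command, not the approach above) is to cut the finished .tar.gz into fixed-size parts, download each part, and concatenate them back together locally. The 20MB part size and the `.partNNN` naming are assumptions chosen to stay safely under 25MB; the function names are hypothetical.

```python
PART_BYTES = 20 * 1024 * 1024  # assumption: leave headroom under the 25MB limit

def split_file(path, part_bytes=PART_BYTES):
    """Write path.part000, path.part001, ... each at most part_bytes long; return their names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(part_bytes)
            if not chunk:
                break
            part_name = f"{path}.part{index:03d}"  # zero-padded so names sort in order
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts

def join_parts(parts, output_path):
    """Concatenate the downloaded parts, in name order, back into the original archive."""
    with open(output_path, "wb") as dst:
        for part_name in sorted(parts):
            with open(part_name, "rb") as src:
                dst.write(src.read())
    return output_path
```

Run `split_file("notebook2.tar.gz")` in a notebook cell, download each part through File/Open, and run `join_parts` (or simply `cat` the parts in order) on your own machine.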

Finally, to actually extract it, choose File/Open and click on notebook.tar.gz, which will then download and expand it so you can get the files. It is actually really hard to get all the big files down because there is a 25MB limit, but if you go into the File/Open dialog, then you
Now you just need a Python and Jupyter setup on your local machine (Anaconda is one way to get it). You have two good choices. The first is to run natively on the Mac with Homebrew and pip:

# you may want to use pyenv if you also need Python 2.7
brew install python  # installed Python 3.7 in May 2019
# you may want a virtual environment if you have clashing libraries (the path is just an example)
python3 -m venv ~/coursera-env
source ~/coursera-env/bin/activate
pip install matplotlib pillow scipy numpy tensorflow jupyter
cd <location of course files>
jupyter notebook
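Before opening the exercises, it can save some head-scratching to confirm that the libraries from the pip line are importable from the Python that Jupyter is using. This is a minimal sketch to run in a notebook cell; the mapping below is an assumption based on the pip line above, and note that the pillow package is imported as `PIL`.

```python
import importlib.util

# pip package name -> module name you import in the notebooks
REQUIRED = {
    "matplotlib": "matplotlib",
    "pillow": "PIL",
    "scipy": "scipy",
    "numpy": "numpy",
    "tensorflow": "tensorflow",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose modules cannot be found, without importing them."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

print(missing_packages() or "all set")
```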

Personally, I don’t find the native install that useful.
The second is to install a Dockerized version of Anaconda. Since Jupyter runs as a web server anyway, this is just as convenient as running it locally, and it is reproducible, as continuum.io explains. You probably want to fork the Docker build so you have a snapshot of what they are up to.
If you want a prebuilt container that runs this class, you can also use the Docker container that I maintain. When it starts, it prints a gigantic token number; paste that into the login page:

docker run -p 8888:8888 tongfamily/jupyter  # -p publishes the notebook port to your Mac
open http://localhost:8888  # run this in a second terminal
# you will get a login page; enter the token printed by docker run
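Rather than copying the long token by hand, you can pull it out of the container’s startup log (for example, the output of `docker logs <container>`). This sketch assumes the image prints the standard Jupyter startup line containing `?token=` followed by a hex token; the sample log line is made up for illustration and the function name is hypothetical.

```python
import re

TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")

def login_url(log_text, host="localhost", port=8888):
    """Find the first Jupyter token in the log text and build a URL that logs you in directly."""
    match = TOKEN_PATTERN.search(log_text)
    if match is None:
        raise ValueError("no token found; is the container still starting?")
    return f"http://{host}:{port}/?token={match.group(1)}"

sample_log = "    http://127.0.0.1:8888/?token=3f2a9c0d1e"
print(login_url(sample_log))  # prints http://localhost:8888/?token=3f2a9c0d1e
```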
