"In this exercise, we will be doing image classification with a simple neural network. For simplicity, let's assume we will be working with black and white images.\n",
"Given an input image $i$ represented as a vector of pixel intensities $ \\mathbf{x}_i \\in [0,1]^d$, we want to predict its correct label $\\mathbf{y}_i$, which is represented as a one-hot vector in $\\{0,1\\}^K$, where $K$ is the number of possible categories (classes) that the image may belong to. For example, we may have pictures of cats and dogs, and our goal would be to correctly tag those images as either cat or dog. In that case we would have $K=2$, and the vectors $\\begin{pmatrix}0 \\\\ 1\\end{pmatrix}$ and $\\begin{pmatrix}1 \\\\ 0\\end{pmatrix}$ to represent the classes of cat and dog. \n",
"\n",
"In today's example we will be using the MNIST handwritten digit dataset. It contains images of handwritten numbers from 0 to 9 and our goal is to create a model that can accurately tag each image with its number. Let's load the data first."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "exact-cycling",
"metadata": {},
"outputs": [],
"source": [
"### Load the data\n",
"\n",
"# Download data if needed\n",
"if not Path(\"./mnist_data.npz\").is_file():\n",
" r = requests.get('https://os.unil.cloud.switch.ch/swift/v1/lts2-ee312/mnist_data.npz', allow_redirects=True)\n",
" with open('mnist_data.npz', 'wb') as f: # save locally\n",
" f.write(r.content)\n",
"\n",
"\n",
"mnist = np.load('mnist_data.npz')"
]
},
{
"cell_type": "markdown",
"id": "varying-taylor",
"metadata": {},
"source": [
"In the context of classification, neural networks are models that given one (or multiple) input data points produce as output a set of corresponding labels for each input. The model itself consists of parametric functions $g_i$ which can be applied sequentially to the input data, resulting in a set of labels which are the model's prediction for the data. For example, in a model that consists of two parameteric functions $g_1$ and $g_2$, for a given $\\mathbf{x}_i$, we have the predicted label $ \\hat{\\mathbf{y}}_i = g_1(g_2(\\mathbf{x}_i))$. The parametric functions are commonly called \"layers\".\n",
"\n",
"In a standard image classification setup, we are given some training data which we can use to tune the parameters of the parametric functions $g_i$ in order to improve its ability to predict the labels correctly. The parameters are generally tuned with respect to some objective (commonly called a loss function). We want to find the parameters of the model that minimize this loss function. Various loss functions can be used, but in general they tend to encode how \"wrong\" the model is. For\n",
"example, on a given image $i$ one can use the loss $\\mathcal{L}(\\hat{\\mathbf{y}_i}, \\mathbf{y}_i)= \\sum_{j=1}^{K}(\\hat{{y}}_{ij} -{y}_{ij})^2 $, which is the mean squared difference between the vector coordinates of the predicted label of the image and the ones of the actual label $\\mathbf{y}_i$.\n",
"Minimizing the loss over the whole training set is referred to as \"training the model\". Furthermore, the goal is that given new data we have not seen before and we have not trained our model with, the model will still be able to classify accurately.\n",
"\n",
"Before we go into the details of the model and how we will train it, let's prepare the data."
"For our task, we will be using Radial Basis Function (RBF) Networks as our neural network model.\n",
"The pipeline, which is presented in the image below, consists of two layers. The first employs non-linear functions $g_1(\\mathbf{x};\\boldsymbol{\\mu}): \\mathbb{R}^{n \\times d} \\rightarrow \\mathbb{R}^{n \\times c}$.\n",
"The second is a linear layer, represented by a matrix of weights $\\mathbf{W} \\in \\mathbb{R}^{c \\times K}$, which maps the output of the previous layer to class scores; its role is to predict labels. \n",
"\n",
"The pipeline proceeds in the following steps:\n",
"\n",
"i) Choose a set of $c$ points $\\boldsymbol{\\mu}_j\\in [0,1]^d$. \n",
"ii) Compute $g_1(\\mathbf{x}_i;\\boldsymbol{\\mu}_j) = \\exp^{-\\frac{||{\\mathbf{x}_i-\\boldsymbol{\\mu}_j||^2}}{\\sigma^2}}=a_{ij}$ for all possible pairs of $i$ and $j$. Here $\\sigma$ is a hyperparameter that controls the width of the gaussian. \n",
"iii) Compute the predicted labels $g_2(\\mathbf{a}_i)= \\mathbf{a}_i^{\\top}\\mathbf{W}= \\hat{\\mathbf{y}}_i$. Here $\\mathbf{a}_i \\in \\mathbb{R}^c$ are the outputs of the layer $g_1$ for an input image $i$. $\\hat{\\mathbf{y}}_i$ is a row vector and $\\hat{y}_{ij} = \\sum_{m=1}^{c}a_{im}w_{mj}$, $j\\in\\{1,...,K\\}$. \n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "f762f16e-4a88-43a8-813f-c93c62407c72",
"metadata": {},
"source": [
"Intuitively, the first layer of the RBF network can be viewed as matching the input data with a set of prototypes (templates) through a gaussian whose width is determined by $\\sigma$. The second layer performs a weighted combination of the matching scores of the previous layer to determine the predicted label for a given point. "
]
},
{
"cell_type": "markdown",
"id": "126fdb8e-9607-4f93-9726-692e5ed8bb91",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"**1.** For hyperparameters $c$ and $\\sigma$ of your choice, select $c$ prototypes and obtain the output of the first layer of the RBF network. The prototypes can simply be random images from your training set.\n",
"You can (optionally) perform an additional normalization step on the activations using the [softmax](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html) function.\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0e4ab127-352e-49c3-9b72-9e69bfc8b4ba",
"metadata": {
"tags": [
"otter_assign_solution_cell"
]
},
"outputs": [],
"source": [
"def get_rand_centers(num_centers, imgs):\n",
" \"\"\"\n",
" Sample num_centers (randomly) from imgs\n",
"\n",
" Parameters\n",
" ----------\n",
" num_centers : number of samples\n",
" imgs : matrix to sample rows from\n",
"\n",
" Returns\n",
" -------\n",
" The samples matrix\n",
" \"\"\"\n",
" # BEGIN SOLUTION\n",
" num_imgs = imgs.shape[0]\n",
" if num_centers > num_imgs or num_centers < 1:\n",
" raise ValueError(\"Invalid number of centers requested\")\n",
"To make things easier, we will fix the parameters $\\boldsymbol{\\mu}$ and $\\sigma$ of the network, i.e., we decide their values before and the remain constant throughout training and testing of the model. Therefore, the only trainable parameters are going to be the weights of the second layer.\n",
"To train the model, we are going to use the mean squared loss function that we mentioned earlier. For a training dataset with $n$ images we have\n",
"Backpropagation depends on [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent#Description). The goal is to update the trainable parameters of the network by \"moving them\" in the direction that will decrease the loss function.\n",
"In our case, the weights $w_{kl}$ are updated in the following manner\n",
"where $\\gamma$ is a hyper-parameter called the learning rate. The gradient of the Loss points towards the direction of steepest descent, hence we update the weights of the network towards that direction. \n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "189e8b8e-a909-4f9e-950a-e0f65a48d70c",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"<!-- BEGIN QUESTION -->\n",
"\n",
"2. For the mean squared error loss, what is the gradient of the loss with respect to the weights $w_{kl}$ of the network?"
]
},
{
"cell_type": "markdown",
"id": "0df8dee2",
"metadata": {
"tags": [
"otter_answer_cell"
]
},
"source": [
"_Type your answer here, replacing this text._"
]
},
{
"cell_type": "markdown",
"id": "9813a207-46ae-4891-9b43-1b59a2a3717c",
"metadata": {
"tags": [
"otter_assign_solution_cell"
]
},
"source": [
"First let us expand the expression of the loss:\n",
"3. Train the weights of the linear layer using stochastic gradient descent. For $p$ iterations (called epochs), you have to update each weight $w_{kl}$ of the network once for each image, by computing the gradient of the loss with respect to that weight.\n",
"\n",
"NB: if you implement gradient computation naively, it might be very slow. Consider using [numpy.outer](https://numpy.org/doc/stable/reference/generated/numpy.outer.html) to speed up computation.\n"
"print(f\"The accuracy on the test set is: {get_accuracy(test_predictions, int_labels_test, test_set_size)*100} %\") "
]
},
{
"cell_type": "markdown",
"id": "promising-funds",
"metadata": {},
"source": [
"### 3.2 Solving the linear system"
]
},
{
"cell_type": "markdown",
"id": "f8234141-0eeb-49da-8d37-892a6416e41d",
"metadata": {},
"source": [
"Since we only have one weight matrix to tune, we can avoid learning with backpropagation entirely. Consider the mean squared error for the whole dataset and a one-dimensional binary label $y_i$ for each data point for simplicity. The mean squared loss for the dataset is\n",
"$$ \\sum_{i=1}^n (\\hat{{y}}_{i} - {y}_{i})^2= ||(\\mathbf{A}\\mathbf{w} - \\mathbf{y})||^2.$$ Here $\\mathbf{A} \\in \\mathbb{R}^{n \\times c}$ is the matrix that contains the outputs (activations) of the first layer. From a linear algebra perspective, we are looking for a matrix $\\mathbf{w}$ that solves the linear system $ \\mathbf{A}\\mathbf{w} = \\mathbf{y}.$ \n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "4e691348-afe4-4021-a42b-3985a4e9999e",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"<!-- BEGIN QUESTION -->\n",
"\n",
"4. Can we find solutions to this system (justify) and how ?"
]
},
{
"cell_type": "markdown",
"id": "cf9e153e",
"metadata": {
"tags": [
"otter_answer_cell"
]
},
"source": [
"_Type your answer here, replacing this text._"
]
},
{
"cell_type": "markdown",
"id": "express-office",
"metadata": {
"tags": [
"otter_assign_solution_cell"
]
},
"source": [
"The system is overdetermined. We have more equations (1 equation per data point) than unknown variables (1 variable for each template). We can find approximate solutions with a [least squares](https://en.wikipedia.org/wiki/Overdetermined_system#Approximate_solutions) approach, i.e., using the pseudoinverse."
]
},
{
"cell_type": "markdown",
"id": "subsequent-exercise",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"<!-- END QUESTION -->\n",
"\n",
"<!-- BEGIN QUESTION -->\n",
"\n",
"5. Based on your answer above, compute the weights of the neural network that best classify the data points of the training set."
"print(f\"The accuracy on the test set is: {get_accuracy(test_predictions_lsq, int_labels_test, test_set_size)*100} %\")"
]
},
{
"cell_type": "markdown",
"id": "handmade-warrant",
"metadata": {
"deletable": false,
"editable": false
},
"source": [
"<!-- END QUESTION -->\n",
"\n",
"<!-- BEGIN QUESTION -->\n",
"\n",
"### 6. **Open ended**: On the choice of templates. \n",
"Suggest a different or more refined way to select templates for the RBF network and implement it. Check how it compares with your original approach.\n",
"Check how it works with the backpropagation and linear system solutions."
]
},
{
"cell_type": "markdown",
"id": "4b05dc5e",
"metadata": {
"tags": [
"otter_answer_cell"
]
},
"source": [
"_Type your answer here, replacing this text._"
]
},
{
"cell_type": "markdown",
"id": "c84c3a52-26bd-49d5-b92c-724e6f309b82",
"metadata": {
"tags": [
"otter_assign_solution_cell"
]
},
"source": [
"Here are a few things that can be tried in order to better select the template images:\n",
"\n",
"- split up the images based on label and sample an equal amount of images from each class. This ensures that there won't be imbalances in the representations of templates/centers.\n",
"- split up the images based on label and furthermore subdivide the images of each class into a few groups. Compute the average image in each group. You will have a few \"average images\" in each class. Use those as centroids/templates. (All of this averaging stuff is easily done with matrix ops)\n",
"- filter the images (similar to how it was done in previous notebooks) and then sample extra centers from the filtered data. Use those as additional centers.\n",
"- apply standard clustering techniques like [K-Means](https://en.wikipedia.org/wiki/K-means_clustering) and [Spectral Clustering](https://en.wikipedia.org/wiki/Spectral_clustering)"