Compare revisions
Commits on Source (21)
Showing with 9141 additions and 10 deletions
......@@ -26,13 +26,13 @@ Follow the below instructions to install it and create an environment for the co
1. Download the Python 3.x installer for Windows, macOS, or Linux from <https://conda.io/miniconda.html> and install with default settings. For Windows, it is advised to use the Windows Subsystem for Linux (WSL), which allows you to run all Linux commands and applications within Windows. In order to use it, you need to install it first, e.g. for [Ubuntu](https://ubuntu.com/wsl) (but other Linux distributions are available as well). Once installed, open the Ubuntu WSL and proceed with the Miniconda installation for Linux and create the environment.
Skip this step if you have conda already installed.
* macOS: double-click on `Miniconda3-latest-MacOSX-x86_64.pkg` or run `bash Miniconda3-latest-MacOSX-x86_64.sh` in a terminal.
* macOS: double-click on `Miniconda3-latest-MacOSX-x86_64.pkg` or run `bash Miniconda3-latest-MacOSX-x86_64.sh` in a terminal. Use the `MacOSX-arm64` file on newer Macs with M1/M2/M3 chips.
* Linux: run `bash Miniconda3-latest-Linux-x86_64.sh` in a terminal or use your package manager.
1. Open a terminal.
Windows: open the Anaconda Prompt from the Start menu.
1. Install git with `conda install git`.
1. Navigate to the folder where you want to store the course material with `cd path/to/folder`.
1. Download this repository with `git clone https://gitlab.epfl.ch/lts2/matrix-analysis-2024`.
1. Download this repository with `git clone https://gitlab.epfl.ch/lts2/matrix-analysis-2025`.
1. Enter the repository with `cd matrix-analysis-2025`.
1. Create an environment with the packages required for the course with `conda env create -f environment.yml`.
1. Run the steps below to start Jupyter. You should be able to run the [`test_install.ipynb`][test_install] notebook.
......@@ -54,18 +54,17 @@ Every time you want to work, do the following:
[miniconda]: https://conda.io/miniconda.html
[conda]: https://conda.io
[conda-forge]: https://conda-forge.org
[test_install]: https://nbviewer.org/github/epfl-lts2/matrix-analysis-2025/blob/master/test_install.ipynb
[test_install]: https://nbviewer.org/urls/gitlab.epfl.ch/lts2/matrix-analysis-2025/-/raw/master/test_install.ipynb/%3Fref_type%3Dheads
### Binder/Colab...
You can also run those notebooks using other online services such as [binder](https://mybinder.org) or [Google colab](https://colab.research.google.com/).
### Binder
You can also run those notebooks using other online services such as [binder](https://mybinder.org).
Clicking on one of the badges below will open this repository in binder/colab.
Clicking on the badge below will open this repository in binder.
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/epfl-lts2/matrix-analysis-2024/HEAD)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/epfl-lts2/matrix-analysis-2024/)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fgitlab.epfl.ch%2Flts2%2Fmatrix-analysis-2025/HEAD?urlpath=%2Fdoc%2Ftree%2Ftest_install.ipynb)
### Workaround for macOS Swiss layout keyboards
JupyterLab suffers from an annoying quirk when using a Swiss layout keyboard on macOS which prevents you from (simply) typing the '@' character in a cell using `Alt + G`. This is a [known issue](https://github.com/jupyterlab/jupyterlab/issues/7704), for which a workaround exists.
JupyterLab might suffer from an annoying quirk when using a Swiss layout keyboard on macOS which prevents you from (simply) typing the '@' character in a cell using `Alt + G`. This is a [known issue](https://github.com/jupyterlab/jupyterlab/issues/7704), for which a workaround exists.
Create a keyboard shortcut: open `Settings>Advanced Settings>Keyboard Shortcuts` and add the following configuration block:
```
......
......@@ -11,4 +11,5 @@ dependencies:
- scikit-image=0.24.0
- scikit-learn=1.5.1
- scipy=1.14.1
\ No newline at end of file
- pip:
- otter-grader==6.1.0
%% Cell type:code id:9c8c9f98 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Background material - introduction to python.ipynb")
```
%% Cell type:markdown id:6ea9e1f3-15b6-494e-ad15-de8b909eb99a tags:
# Matrix Analysis 2025 - EE312
## Week 1 - Background material - Python/Numpy
[N. Aspert](https://people.epfl.ch/nicolas.aspert) - [LTS2](https://lts2.epfl.ch)
The assignments and exercises for this course will use Python as the programming language. Python is a general-purpose programming language used in various areas. It benefits from an ecosystem of libraries, such as [numpy](https://numpy.org), [scipy](https://scipy.org), [matplotlib](https://matplotlib.org) and many others, which turns it into a powerful tool for scientific computing.
## Objectives
This notebook is meant as an introduction (or a reminder) to Python, Numpy and Matplotlib.
### Notebooks
During this course, we will be using Python through **notebooks**. A notebook allows you to write and execute Python code within your web browser. You can either run the notebook locally on your computer, or use an online service such as [noto](https://noto.epfl.ch) or [binder](https://mybinder.org). See the [README](https://gitlab.epfl.ch/lts2/matrix-analysis-2025) file in the course GitLab repository for more details and installation instructions.
You can always call the `help` function within the notebook to access the documentation of a particular function, type, etc., e.g.
%% Cell type:code id:8a73970c-f4a0-4827-badc-987031ea32b9 tags:
``` python
help(complex)
```
%% Cell type:markdown id:c6d34fae-b5e4-492a-8acb-d51523b4e3a0 tags:
## Python
Python is a very popular dynamically-typed interpreted language. Its design emphasizes readability, by using indentation as code block delimiter, and plain English words (`and`, `or`, ...) instead of the logical operator symbols you may encounter in C or Java:
```
// in C or Java, indentation is optional (but highly recommended to ensure code remains readable)
int f(int x, int y) {
    if (x == 1 || y > 2) {
        x += 2;
        y = x*3;
    }
    return x*y;
}
```
is written as
```
# in Python indentation is NOT optional
def f(x, y):
    if x == 1 or y > 2:
        x += 2
        y = x*3
    return x*y
```
### Basic types
Unsurprisingly, Python has a number of base types you can also find in other programming languages, such as integers, floats, strings and booleans, which behave in a similar way.
#### Numbers
%% Cell type:code id:8717ffea-a560-4c93-ba08-a7e1fae84ce0 tags:
``` python
x = 5
print(type(x)) # Prints "<class 'int'>"
print(x) # Prints "5"
print(x + 1) # Addition; prints "6"
print(x - 1) # Subtraction; prints "4"
print(x * 2) # Multiplication; prints "10"
print(x ** 2) # Exponentiation; prints "25"
x += 1
print(x) # Prints "6"
x *= 2
print(x) # Prints "12"
y = 1.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "1.5 2.5 3.0 2.25"
```
%% Cell type:markdown id:1ee1f385-dbfe-44bf-85ee-34bb49bb85f4 tags:
In Python, the `float` type is *double-precision* (i.e. 64-bit), unlike the single-precision `float` of C or Java. Python also benefits from a built-in `complex` type:
%% Cell type:code id:2fc7c8dd-8237-4cac-8213-068736b3ff74 tags:
``` python
z = 1. + 1.j
print(2*z) # Prints "(2+2j)"
print(z**2) # Prints "2j"
print(type(z)) # Prints "<class 'complex'>"
```
%% Cell type:markdown id:16707a13-1a84-46db-8ef6-60ce83a893f7 tags:
#### Boolean
The boolean type can have (surprise) two values: `True` or `False` (the capital first character matters). Instead of using symbolic operators, Python uses the plain English words `and`, `or`, `not`:
%% Cell type:code id:a3d629ea-4dda-4a4d-8da7-08e1f2cf153b tags:
``` python
b = True
c = False
print(type(b)) # Prints "<class 'bool'>"
print(b and c) # Prints "False"
print(b and not c) # Prints "True"
```
%% Cell type:markdown id:5da8a22b-b99a-476c-94b8-312f4e587df3 tags:
A little care needs to be taken about the `None` value. The `None` keyword is used to define a null value, or no value at all. `None` is not the same as 0, False, or an empty string.
%% Cell type:code id:d27e37db-af3e-4330-ba5a-a5cdde69738f tags:
``` python
n = None
print(type(n)) # Prints "<class 'NoneType'>"
```
%% Cell type:markdown id:22da9ea1-828b-49c7-bbfc-a690ca9febce tags:
However, logical operators treat `None` in a specific way:
%% Cell type:code id:191609f5-93ab-421a-abe6-d5e39509f6ae tags:
``` python
print(n and b) # Prints "None"
print(not n) # Prints "True"
print(n or b) # Prints "True"
```
%% Cell type:markdown id:9c115ccf-6f2a-43aa-9d75-5d77244917f4 tags:
#### Strings
Python supports strings
%% Cell type:code id:78e0f787-45d8-4843-b7e1-d51d115cc7c5 tags:
``` python
ma = 'matrix' # String literals can use single quotes
an = "analysis" # or double quotes; it does not matter.
print(ma) # Prints "matrix"
print(len(ma)) # String length; prints "6"
maan = ma + ' ' + an # String concatenation
print(maan) # prints "matrix analysis"
maan23 = '{} {} {}'.format(ma, an, 2023) # string formatting
print(maan23) # prints "matrix analysis 2023"
y0 = 2000
y1 = 23
maan23b = f'{ma} {an} {y0 + y1}' # in python >= 3.6, you can do "string interpolation"
print(maan23b)
```
%% Cell type:markdown id:e8159eda-9dfe-4ee1-b1f7-cd0a4ea0f28e tags:
The `str` type has several built-in methods (check the official [documentation](https://docs.python.org/3.10/library/stdtypes.html#string-methods) for more):
%% Cell type:code id:23c524d0-6456-4ed0-8c01-cbeeb8da280e tags:
``` python
s = "matrix"
print(s.capitalize()) # Capitalize a string; prints "Matrix"
print(s.upper()) # Convert a string to uppercase; prints "MATRIX"
print(s.rjust(7)) # Right-justify a string, padding with spaces; prints " matrix"
print(s.center(8)) # Center a string, padding with spaces; prints " matrix "
print(an.replace('a', '(a)')) # Replace all instances of one substring with another;
# prints "(a)n(a)lysis"
print(' analysis '.strip()) # Strip leading and trailing whitespace; prints "analysis"
```
%% Cell type:markdown id:97c55eaf-76ae-4ae0-a8ff-590f79fbb440 tags:
### Functions
Python functions are defined using the `def` keyword, e.g. :
%% Cell type:code id:2c3c69a7-329e-4618-a1cc-58975e4ed9bc tags:
``` python
def f(x, y):
    if x == 1 or y > 2:
        x += 2
        y = x*3
    return x*y
```
%% Cell type:code id:8de94c7f-42d3-4168-a072-7ede461ee21d tags:
``` python
f(1, 2)
```
%% Cell type:code id:54b35024-477f-4cd2-a126-b9288752f73e tags:
``` python
f(2, 2)
```
%% Cell type:markdown id:cde2dc6c-a93b-4589-bb11-3f78cb153790 tags:
Default arguments can be handy:
%% Cell type:code id:b69bb42a-b8d3-435e-9c45-8cb3f463b1dc tags:
``` python
def f_default(x, y=2):
    return x**y + x
```
%% Cell type:code id:26e9ee43-51d5-45dc-a822-551fac13ff8b tags:
``` python
f_default(3) # equivalent to writing: f_default(3, 2)
```
%% Cell type:markdown id:db9c6909-e292-4118-bd70-d59770bee052 tags:
It is often the case that functions have a lot of default arguments and you only need to specify a small number of them. You can then use the named argument shortcut:
%% Cell type:code id:acdbc77f-bb18-4499-9f78-c908d6daf04b tags:
``` python
def f_long(x, y=1, z=2, h=3, t=None):
    res = x + y - z + h
    if t: # test if t has a value
        res = res*t
    return res
```
%% Cell type:code id:25b5a8ca-1e7d-4935-8842-3ac2e685e12b tags:
``` python
f_long(12, h=4)
```
%% Cell type:markdown id:134846f6-2f40-49bc-aa11-b1ffa3352b71 tags:
Check the [documentation](https://docs.python.org/3.10/tutorial/controlflow.html#defining-functions) for more details.
#### Control flow
Python supports `if`...`elif`...`else`, `match`..., `for` loops, `while`..., check the [documentation](https://docs.python.org/3.10/reference/compound_stmts.html) or the [tutorial](https://docs.python.org/3.10/tutorial/controlflow.html#) for more details.
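For instance, a minimal sketch combining several of these constructs:
```
def classify(n):
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

for k in [-2, 0, 3]:
    print(k, classify(k)) # Prints "-2 negative", "0 zero", "3 positive"

count = 0
while count < 3:
    count += 1
print(count) # Prints "3"
```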
### Data structures
Python has several built-in containers; we will only present lists and tuples. Check the [documentation](https://docs.python.org/3.10/tutorial/datastructures.html) for more details and other interesting data structures (such as dictionaries or sets).
#### Lists
A Python list is the equivalent of an array, but it can be resized and can mix different types of elements.
%% Cell type:code id:249faa6f-1ebd-4f41-8326-9354f6f685ba tags:
``` python
xs = [2, 0, 3] # Create a list
print(xs, xs[2]) # Prints "[2, 0, 3] 3"
print(xs[-1]) # Negative indices count from the end of the list; prints "3"
xs[2] = 'hello' # Lists can contain elements of different types
print(xs) # Prints "[2, 0, 'hello']"
xs.append('world') # Add a new element to the end of the list
print(xs) # Prints "[2, 0, 'hello', 'world']"
x = xs.pop() # Remove and return the last element of the list
print(x, xs) # Prints "world [2, 0, 'hello']"
```
%% Cell type:markdown id:cbd37263-ab32-4b5e-b62f-177598215443 tags:
You can not only access individual elements, but also ranges of elements. This is known as *slicing*:
%% Cell type:code id:993c8e49-4394-41e7-b8ac-7208e018e152 tags:
``` python
nums = list(range(8)) # range creates a sequence of integers, list() turns it into a list
print(nums) # Prints "[0, 1, 2, 3, 4, 5, 6, 7]"
print(nums[3:5]) # Get a slice from index 3 to 5 (exclusive); prints "[3, 4]"
print(nums[4:]) # Get a slice from index 4 to the end; prints "[4, 5, 6, 7]"
print(nums[:2]) # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:]) # Get a slice of the whole list; prints "[0, 1, 2, 3, 4, 5, 6, 7]"
print(nums[:-1]) # Slice indices can be negative; prints "[0, 1, 2, 3, 4, 5, 6]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print(nums) # Prints "[0, 1, 8, 9, 4, 5, 6, 7]"
```
%% Cell type:markdown id:75654e1e-0b1a-4bca-82a0-4f6568768f66 tags:
You can iterate over a list:
%% Cell type:code id:23e132bb-a53a-448d-b557-f4ca688f6f81 tags:
``` python
fruits = ['apple', 'banana', 'pear']
for fruit in fruits:
    print(fruit)
# Prints "apple", "banana", "pear", each on its own line.
```
%% Cell type:markdown id:2cc8dedf-42d3-4e4d-ab8c-da6225ae1b7c tags:
*List comprehensions* are a convenient shortcut for list transformation operations :
%% Cell type:code id:14e8ea49-de95-43ce-aab5-074895e8cbba tags:
``` python
nums = list(range(4))
dbl = []
for n in nums:
    dbl.append(n*2)
print(dbl)
# Will print "[0, 2, 4, 6]"
```
%% Cell type:markdown id:9b98c63a-a15c-4f27-9c32-f603ac54c80b tags:
Rewrite the above loop with a list comprehension:
%% Cell type:code id:8008e4fe-df16-4bb5-a0a1-c9a17576f77a tags:
``` python
dbl_c = [n*2 for n in nums]
print(dbl_c) # Prints "[0, 2, 4, 6]"
```
%% Cell type:markdown id:df73bcde-1c47-4b07-b6da-5ae74f2516cb tags:
You can include conditions in list comprehensions:
%% Cell type:code id:21d6a2a8-910c-4b2d-8043-2b2b9ff9e2e2 tags:
``` python
dbl_cc = [n*2 for n in nums if n < 2]
print(dbl_cc) # Prints "[0, 2]"
```
%% Cell type:markdown id:801114b6-3e68-4924-95ad-becc288ebd54 tags:
#### Tuples
A tuple is an immutable list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot.
%% Cell type:code id:c183a69f-bce4-4d51-93ab-823b736b9526 tags:
``` python
t = (5, 6) # Create a tuple
print(type(t)) # Prints "<class 'tuple'>"
print(t[0]) # Prints "5"
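# A minimal illustration of the remark above (tuples are hashable,
# so they can be used as dictionary keys, unlike lists):
d = {(1, 2): 'a', (3, 4): 'b'} # tuples as dictionary keys
print(d[(1, 2)]) # Prints "a"
# d = {[1, 2]: 'a'} # would raise "TypeError: unhashable type: 'list'"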
```
%% Cell type:markdown id:6a8c72d4-8edc-4918-a81c-13fc68494493 tags:
## Numpy
[Numpy](https://numpy.org) is a core library for scientific computing. It provides optimized routines for multi-dimensional arrays. You can find many online tutorials about Numpy. If you are familiar with Matlab, you may want to check the [Numpy for Matlab users tutorial](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html). As always, check the [Numpy documentation](https://numpy.org/doc/stable/user/basics.html) for more information.
### Arrays
Arrays are the base type used by Numpy. Unlike Python lists, they consist of elements that all have the same type. They are indexed by non-negative integers, and their elements can also be accessed using square brackets.
You can easily create Numpy arrays from nested lists.
%% Cell type:code id:3ded689b-7cf3-4523-9248-bb096e3809e8 tags:
``` python
import numpy as np
a = np.array([1, 2, 3]) # Create a 1D array
print(type(a)) # Prints "<class 'numpy.ndarray'>"
print(a.shape) # Prints "(3,)", this is Python's way of displaying a tuple with only one element
print(a[0], a[1], a[2]) # Prints "1 2 3"
a[0] = 42 # Change an element of the array
print(a) # Prints "[42  2  3]"
b = np.array([[1,2,3],[4,5,6]]) # Create a 2D array
print(b.shape) # Prints "(2, 3)"
print(b[0, 0], b[0, 2], b[1, 0]) # Prints "1 3 4"
```
%% Cell type:markdown id:645ed555-832f-457d-aeea-e517b0529a12 tags:
Numpy has helper functions to create arrays:
%% Cell type:code id:eff9d7e8-96d7-4850-8af7-d03be195390d tags:
``` python
a = np.zeros((3,3)) # Create an array of all zeros. You have to pass a tuple as argument
# (beware of the double parentheses)
print(a) # Prints "[[ 0. 0. 0.]
# [ 0. 0. 0.]
# [ 0. 0. 0.]]"
b = np.ones((1, 3)) # Create an array of all ones
print(b) # Prints "[[ 1. 1. 1.]]"
c = np.full((2,2), 4) # Create a constant array (dtype inferred from the fill value, here int)
print(c) # Prints "[[4 4]
 # [4 4]]"
d = np.eye(2) # Create a 2x2 identity matrix
print(d) # Prints "[[ 1. 0.]
# [ 0. 1.]]"
e = np.random.random((2,2)) # Create an array filled with random values
print(e) # Might print "[[0.45199657 0.95344055]
# [0.65255911 0.79999078]]"
```
%% Cell type:markdown id:805a0618-b803-40e9-8480-50520f9ff1be tags:
### Array indexing
We will just review the most important ways of accessing Numpy array elements. Check the [documentation](https://numpy.org/doc/stable/reference/arrays.indexing.html) for details.
You can slice Numpy arrays similarly to lists.
%% Cell type:code id:03a2dad6-8df9-4ddf-8c6c-3ee93a8e0c72 tags:
``` python
# Create the following rank 2 array with shape (3, 4)
# [[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
# [6 7]]
b = a[:2, 1:3]
# A slice of an array is a view into the same data, so modifying it
# will modify the original array !!
print(a[0, 1]) # Prints "2"
b[0, 0] = 77 # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1]) # Prints "77"
# You can use ':' to specify the whole range of a dimension.
# This prints
# [[ 1 77]
# [ 5 6]
# [ 9 10]]
print(a[:, :2])
# When dealing with n-dimensional arrays, the special shortcut '...' stands for 'all remaining dimensions'
c = np.ones((3,2,2)) # create a 3D array
c[1, ...] = c[1, :, :]*2
c[2, ...] = c[2, :, :]*3
# This prints
# [[[1. 1.]
# [1. 1.]]
#
# [[2. 2.]
# [2. 2.]]
#
# [[3. 3.]
# [3. 3.]]]
print(c)
# This prints
# [[[1. 1.]
#   [1. 1.]]
#
#  [[2. 2.]
#   [2. 2.]]]
print(c[:2, ...])
```
%% Cell type:markdown id:c66cff24-0346-4eca-a259-ce095e5db4a1 tags:
If you want to avoid modifying the original array when accessing a slice, you need to make a copy of it before writing values.
%% Cell type:code id:c1deae7f-fb12-4ee4-8487-2286f9844993 tags:
``` python
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
b = np.array(a) # create a copy of a
c = a[:2, :2] # create a slice
b[0, 0] = 42 # modify the copy
c[0, 0] = 17 # slice modification
# The following will print :
# [[17 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
# [[42 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
# [[17 2]
# [ 5 6]]
print(a)
print(b)
print(c)
```
%% Cell type:markdown id:00fcee14-8d4a-4693-83d2-26e274e5ec8c tags:
While slicing produces a sub-array of the original array, **integer array indexing** provides a way to construct an arbitrary array from the original one.
%% Cell type:code id:92810750-d561-4c76-ab6f-b8cbc961f141 tags:
``` python
a = np.array([[1,2], [3, 4], [5, 6], [7, 8]])
# An example of integer array indexing.
# The returned array will have shape (3,) and
# the elements a[0, 0], a[2, 1] and a[3, 0]
print(a[[0, 2, 3],
[0, 1, 0]]) # Prints "[1 6 7]"
# When using integer array indexing, you can use the same
# element from the source array multiple times:
print(a[[0, 0, 0], [1, 1, 1]]) # Prints "[2 2 2]"
# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1], a[0, 1]])) # Prints "[2 2 2]"
```
%% Cell type:markdown id:13801e62-70fb-4947-a27c-63e7b53c0e61 tags:
**Boolean indexing** lets you pick arbitrary elements from an array. This is useful when selecting elements that satisfy some condition.
%% Cell type:code id:65eab8fb-98ff-46ae-a0f7-abe744b012d3 tags:
``` python
a = np.array([[1,2], [3, 4], [5, 6], [7, 8]])
bool_idx = (a > 3) # Returns a numpy array of booleans, with a shape identical to that of a.
 # Each element will be True if it satisfies the condition, i.e. the corresponding element of a is > 3.
# This prints
# [[False False]
# [False True]
# [ True True]
# [ True True]]
print(bool_idx)
# This will construct a 1D array made of the values of a corresponding to the True values in bool_idx
print(a[bool_idx]) # Prints "[4 5 6 7 8]"
# This can also be done in a single statement:
print(a[a > 3]) # Prints "[4 5 6 7 8]"
```
%% Cell type:markdown id:7d677384-af91-40e1-b35d-8c0f62919a63 tags:
### Data types
Numpy arrays' elements, unlike Python lists, are all of the same type. Numpy provides [many types](https://numpy.org/doc/stable/reference/arrays.dtypes.html) that can be used to build an array. You can either let Numpy guess the best datatype for your elements, or specify it explicitly.
%% Cell type:code id:8ff11366-992d-4b0f-8198-662759c58bd3 tags:
``` python
x = np.array([1, 2]) # Let numpy choose the datatype
print(x.dtype) # Prints "int64"
x = np.array([1.0, 2.0]) # Let numpy choose the datatype
print(x.dtype) # Prints "float64"
x = np.array([1, 2], dtype=np.float64) # Force a particular datatype
print(x.dtype) # Prints "float64"
```
%% Cell type:markdown id:8530c0db-4133-4117-8c30-07e5eddcc983 tags:
### Array operations
You can perform element-wise mathematical operations on Numpy arrays, either using operator overloads or Numpy functions. If you are familiar with Matlab, be careful: `*` is the element-wise multiplication in Numpy, NOT the matrix multiplication (as it is in Matlab)!
Check the [full list of mathematical operations](https://numpy.org/doc/stable/reference/routines.math.html).
%% Cell type:code id:0b9a9d7a-b927-4f41-b477-7da5f376c100 tags:
``` python
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
# Elementwise sum; both produce the array
# [[ 6.0 8.0]
# [10.0 12.0]]
print(x + y)
print(np.add(x, y))
# Elementwise difference; both produce the array
# [[-4.0 -4.0]
# [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))
# Elementwise product; both produce the array
# [[ 5.0 12.0]
# [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))
# Elementwise division; both produce the array
# [[ 0.2 0.33333333]
# [ 0.42857143 0.5 ]]
print(x / y)
print(np.divide(x, y))
# Elementwise square root; produces the array
# [[ 1. 1.41421356]
# [ 1.73205081 2. ]]
print(np.sqrt(x))
```
%% Cell type:markdown id:e7fcfee6-e5f7-499d-8bb2-27a235459bf0 tags:
Changing the shape of an array is often useful. The most common case is transposition, which can be achieved using the `T` attribute; for more general reshaping, Numpy has a `reshape` function.
%% Cell type:code id:530bbab9-c477-429f-908a-7402391161e1 tags:
``` python
x = np.array([[1, 0], [2, -1]])
print(x) # Prints "[[ 1 0]
# [ 2 -1]]"
print(x.T) # Prints "[[ 1 2]
# [ 0 -1]]"
# Be careful with complex matrices: if you want the Hermitian transpose,
# you need to conjugate explicitly, e.g. via np.conjugate(z.T), since .T only transposes
z = np.array([[1+1.j, 1.j], [0, -1.j]])
print(z.T) # Prints "[[ 1.+1.j 0.+0.j]
 # [ 0.+1.j -0.-1.j]]", i.e. this is just a transposition
print(np.conjugate(z.T)) # Prints "[[ 1.-1.j 0.-0.j]
 # [ 0.-1.j -0.+1.j]]"
# reshape examples
a = np.arange(0, 6)
print(a) # Prints "[0 1 2 3 4 5]"
# reshape the 1D array into a 2D one
print(a.reshape((2,3))) # Prints "[[0 1 2]
# [3 4 5]]"
# reshape a 2D array into a 1D one
print(x.reshape(4)) # Prints "[ 1 0 2 -1]"
# You can choose if you want to reshape in C (default for Numpy) or Fortran (default for Matlab) layout,
# i.e. by rows or columns.
print(x.reshape(4, order='F')) # Prints "[ 1 2 0 -1]"
```
%% Cell type:markdown id:d6dc2e0b-105c-4c1d-a6cf-ae73eed52b71 tags:
As stated previously, `*` is the element-wise multiplication. If you want to multiply two matrices, you need to call
`np.matmul` or its shortcut `@` (make sure you checked the course repository's README if you have trouble typing it in Jupyter). `np.dot` computes the inner product between two vectors. It will return the same results as `np.matmul` for matrices (i.e. 2D arrays), but using `np.matmul` or `@` should be preferred.
%% Cell type:code id:bcc63eed-b6ca-4ca5-86c9-482d684cdba9 tags:
``` python
a = np.array([[1, 2], [-2, 1]])
b = np.array([[1, 0], [0, -1]])
# This will print 4 times
# [[ 1 -2]
#  [-2 -1]]
# (np.inner(a, b) computes a@b.T, which coincides here only because b is symmetric.)
# These equivalences will *NOT* hold for arrays of dimension greater than 2.
print(np.dot(a, b))
print(np.matmul(a, b))
print(a@b)
print(np.inner(a, b))
```
%% Cell type:markdown id:04c0f3a7-a464-4bd4-b7a7-9da71cf69f2c tags:
There are several ways of computing the inner product $\langle u,v \rangle = u^Tv = \sum_i u_i v_i$, but those equivalences hold only for matrices and (often) not for $N$-dimensional arrays (with $N>2$).
Similarly, you can compute the outer product $uv^T$ using different methods, with the same remarks regarding dimension.
%% Cell type:code id:fa3e7434-1e1f-4610-a1fe-6147665826ed tags:
``` python
import numpy as np
u = np.array([1, 1]) # shape = (2,)
uu = np.array([[1], [1]]) # shape = (2,1), we made sure uu would be a column vector, as in the definitions used in the slides.
v = np.array([-1, 0]) # shape = (2,)
vv = np.array([[-1], [0]]) # shape = (2,1)
# Compute the inner product between 2 vectors.
# To compute u^T v via matrix multiplication, you need to reshape the vectors
# from (2,) to (2,1), since .T has no effect on 1D arrays (here u.T = u and v.T = v).
# This prints twice "-1" and once "[[-1]]": the matrix multiplication returns an array instead of a scalar.
print(np.dot(u, v))
print(np.inner(u, v))
print(uu.T@vv)
# If you want to compute the outer product u v^T, you likewise need to reshape the vectors from (2,) to (2,1) (or use np.outer)
# This prints 3 times
# [[-1 0]
# [-1 0]]
print(np.dot(uu, vv.T))
print(np.outer(u, v))
print(uu@vv.T)
```
%% Cell type:markdown id:dd8254a1-f9e9-4d9b-8ccc-746ea4a43e98 tags:
### Broadcasting
Broadcasting allows you to perform arithmetic operations between arrays of different shapes. Of course you need to follow a rule in order to do so:
"In order to broadcast, the size of the trailing axes for both arrays in an operation must either be the same size or one of them must be one."
In the following example, `x.shape` is `(4, 3)` and `v.shape` is `(3,)`, which fulfills the "equal trailing axis size" condition.
%% Cell type:code id:437cb02e-ab70-4360-95ef-04d7a78dfb09 tags:
``` python
# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v # Add v to each row of x using broadcasting
print(y) # Prints "[[ 2 2 4]
# [ 5 5 7]
# [ 8 8 10]
# [11 11 13]]"
```
%% Cell type:markdown id:0b6e33b9-df5b-4745-a8a6-04ac8851df20 tags:
Implementing this without broadcasting could be done as:
%% Cell type:code id:7eb57e40-daf5-4267-8e8f-586de76df4e3 tags:
``` python
y = np.empty_like(x) # Create an empty matrix with the same shape as x
# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v
# Now y is the following
# [[ 2 2 4]
# [ 5 5 7]
# [ 8 8 10]
# [11 11 13]]
print(y)
```
%% Cell type:markdown id:33807f29-227f-463a-994b-658f6271acb7 tags:
In addition to being more compact notation-wise, broadcasting allows for faster execution time and reduced memory requirements (useful when you work with bigger matrices).
In general, for performance reasons, you should try to avoid using `for` loops and use Numpy functions and broadcasting.
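As a quick illustration (a minimal sketch; exact timings vary by machine), the explicit loop below and the vectorized expression compute the same value, but the vectorized version runs in compiled code and is typically orders of magnitude faster:
```
big = np.random.random(1_000_000)
total = 0.0
for v in big: # slow: the loop runs in the Python interpreter
    total += v**2
total_np = np.sum(big**2) # fast: computed by Numpy in compiled code
print(np.allclose(total, total_np)) # Prints "True"
```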
### Exercise
In order to experience that for yourself, fill the function below that will implement "naive" matrix multiplication, using `for` loops (do not use Numpy functions or the `@` operator). Do not forget to check that those matrices can be multiplied together before proceeding with the loops. If you find they are not compatible, you can use the `raise` statement to throw an exception, e.g.
```
def matmul(a, b):
    if <insert test here>:
        raise ValueError('Incompatible sizes')
```
%% Cell type:code id:10e5409b-35a5-4c39-8fe8-d84f6096a89a tags:otter_answer_cell
``` python
def my_matmul(a, b):
    ...
```
%% Cell type:markdown id:541328cb-5986-4667-9288-673b8026f093 tags:
Make sure those examples give the expected results:
%% Cell type:code id:879bf25c-02c5-4961-9046-3a367fbd407f tags:
``` python
a = np.ones((2,2))
b = np.eye(2)
c = np.array([[1, -1, 0], [-1, 0, 1]])
print(my_matmul(a,b))
print(a@b)
print(my_matmul(a, c))
print(a@c)
# print(my_matmul(a, c.T)) # This should generate an exception because of incompatible sizes
# print(a@c.T) # This will generate an exception because of incompatible sizes
```
%% Cell type:code id:4a47ced4 tags:
``` python
grader.check("q1")
```
%% Cell type:markdown id:bce512cf-2040-4589-94db-f09563ce075c tags:
Once done, we will check its performance when compared to the Numpy version. Let us generate matrices of suitable size filled with random coefficients (feel free to play with the size to see the impact on performance).
%% Cell type:code id:dac07fba-b91d-44ee-877c-9ef8363e048a tags:
``` python
A1 = np.random.rand(100, 200)
A2 = np.random.rand(200, 100)
```
%% Cell type:code id:675d7469-2cdf-4db8-94cc-e9eb346aec3b tags:
``` python
%%timeit
my_matmul(A1, A2)
```
%% Cell type:markdown id:284efd22-85cb-4219-bf5d-1c053bf14cbc tags:
The `%%timeit` cell magic will display timing information about the execution of a cell. You can try different matrix sizes to see the impact. In general it is always better to use native Numpy functions than to implement them yourself (if a function seems to be missing, look again in the documentation, it is very likely that you missed it ;)).
%% Cell type:code id:e951d875-6450-4bc6-b03b-6b4c98c03b8a tags:
``` python
%%timeit
A1@A2
```
%% Cell type:markdown id:4ca9606e-f97d-4b54-a28c-ef40f9a6525c tags:
## Matplotlib
[Matplotlib](https://matplotlib.org/) is a library to create plots in Python. It has lots of features, so we will only focus on a small subset of them, especially `pyplot`, which is a collection of functions that make Matplotlib work like Matlab. Make sure to check the [cheatsheets](https://matplotlib.org/cheatsheets), the [tutorials](https://matplotlib.org/stable/tutorials/index) and of course the [documentation](https://matplotlib.org/stable/api/index.html).
### Basic plots
%% Cell type:code id:1e2c66bd-fd49-4ca9-82c3-04556f08319f tags:
``` python
import matplotlib.pyplot as plt
# Build some data to be plotted
# x between -2pi and +2pi, sampled with a 0.2 interval
x = np.arange(-2*np.pi, 2*np.pi, 0.2)
y = np.cos(x)
# you can plot the 'y' values directly (matplotlib will use the indices as 'x' values)
plt.plot(y)
plt.show() # can be omitted, but avoids extra info being displayed
```
%% Cell type:code id:538ac798-77f3-4e4c-8f90-38c493a9789a tags:
``` python
plt.plot(x,y) # if you specify 'x', the x-axis uses those values for display.
plt.show()
```
%% Cell type:markdown id:62ee9b8b-cce4-44ee-90de-9612e11ab8da tags:
It is fairly easy to add legends, axis labels, etc.
%% Cell type:code id:7cd1d0b9-55ba-4b45-915a-0de4e438e400 tags:
``` python
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(-2*np.pi, 2*np.pi, 0.2)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
```
%% Cell type:markdown id:381eca03-226d-4c86-9c79-ca6d7ebf9868 tags:
You can style each plot differently (axes, colors, markers, ...). Check the [plot documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html) for more.
%% Cell type:code id:b56af9d3-fb1f-4a33-85fa-33a4798fbeb4 tags:
``` python
plt.plot(x, y_sin, 'r+')
plt.plot(x, y_cos, 'g-')
plt.axis([-10, 10, -2, 2])
plt.show()
```
%% Cell type:markdown id:016e33cd-b947-4737-950e-62b9f2fe3466 tags:
### Subplots
A single figure can contain multiple plots thanks to `subplot`
%% Cell type:code id:5f6e754e-c815-4465-a299-282ccea4c162 tags:
``` python
# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)
# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')
# Set the second subplot as active, and make the second plot.
plt.subplot(212) # shortcut notation for plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')
plt.subplots_adjust(top=1.5) # avoid plot legend collision with axis
plt.show()
```
%% Cell type:markdown id:be159e75-4e55-419c-a36b-3e5d2332dfef tags:
### Visualizing linear transforms
Let us use Matplotlib to visualize the effect of (simple) linear transforms.
We use `mgrid` to generate the x/y coordinates of a regular lattice.
%% Cell type:code id:f1f42c59-0d19-4e25-8eac-7528c72da926 tags:
``` python
c = np.mgrid[-5:6, -5:6]
plt.scatter(c[0], c[1]) # 'scatter' plots point clouds
plt.axis([-10, 10, -10, 10])
plt.grid(visible=True) # show the grid lines
plt.show()
```
%% Cell type:markdown id:9093b27d-07bd-4015-8308-f6ff76521384 tags:
For ease of visualization, our linear transform will be defined by a matrix $A \in \mathbb{R}^{2 \times 2}$, i.e. a transformation $\mathbb{R}^2 \rightarrow \mathbb{R}^2$.
%% Cell type:code id:b349ad3c-ceba-4911-9610-8071fec5a3c9 tags:
``` python
A = np.array([[2., 0.5], [0, 0.5]])
print(A)
```
%% Cell type:code id:d4c285b9-aedb-42d2-a917-126acb647621 tags:
``` python
# coordinates need to be reshaped for matrix multiplication
coords = np.vstack([c[0].ravel(), c[1].ravel()]) # ravel will flatten a 2D array into a 1D one
print(coords.shape)
```
%% Cell type:code id:14cec790-0d80-42c9-b1fa-4f0ccf851a15 tags:
``` python
coord_transformed = A@coords
```
%% Cell type:code id:e0a3cd72-8bdb-4f0c-9c0c-a0c7e719ad5a tags:
``` python
plt.scatter(c[0], c[1]) # 'scatter' plots point clouds
plt.axis('equal') # display x and y axes with equal steps
plt.scatter(coord_transformed[0, :], coord_transformed[1, :], marker='+', color='r')
plt.grid(visible=True) # show the grid lines
plt.show()
```
%% Cell type:markdown id:56299471-4125-48fc-ab97-f7c6037c95a3 tags:
For future use, let us wrap this in a function (you may tweak it according to your needs/preferences)
%% Cell type:code id:f35fb90b-d021-4cc5-87c6-3232a1bc8ac6 tags:
``` python
# expects an input grid produced via mgrid
def visualize_transform_grid(input_matrix, input_grid):
    coords = np.vstack([input_grid[0].ravel(), input_grid[1].ravel()])
    coords_tr = input_matrix@coords
    plt.scatter(input_grid[0], input_grid[1])
    plt.axis('equal')
    plt.scatter(coords_tr[0, :], coords_tr[1, :], marker='+', color='r')
    plt.grid(visible=True) # show the grid lines
    plt.show()
```
%% Cell type:markdown id:c548ea41-003c-4184-bc6c-0121d09d8506 tags:
Let us visualize the effect of a transformation using an "ellipse plot"
%% Cell type:code id:616fe759-be09-4db8-b74c-114ca9a0d0b0 tags:
``` python
# v1 and v2 are the indices of two unit vectors
def visualize_transform_ellipse(input_matrix, v1=65, v2=100):
    # Create the vectors for a circle and store them in x
    xi1 = np.linspace(-1.0, 1.0, 100)
    xi2 = np.linspace(1.0, -1.0, 100)
    yi1 = np.sqrt(1 - xi1**2)
    yi2 = -np.sqrt(1 - xi2**2)
    xi = np.concatenate((xi1, xi2), axis=0)
    yi = np.concatenate((yi1, yi2), axis=0)
    x = np.vstack((xi, yi))
    # get two sample vectors from x
    x_sample1 = x[:, v1]
    x_sample2 = x[:, v2]
    # compute the action of A on x
    t = input_matrix @ x
    # find the transformed sample vectors
    t_sample1 = t[:, v1]
    t_sample2 = t[:, v2]
    # plot the result
    f, (ax1, ax2) = plt.subplots(1, 2, sharex=True)
    ax1.plot(x[0,:], x[1,:], color='black')
    ax1.axis('equal')
    ax1.quiver(x_sample1[0], x_sample1[1], angles='xy', scale_units='xy', scale=1, color='b')
    ax1.quiver(x_sample2[0], x_sample2[1], angles='xy', scale_units='xy', scale=1, color='r')
    ax1.grid()
    ax2.plot(t[0,:], t[1,:], color='black')
    ax2.axis('equal')
    ax2.quiver(t_sample1[0], t_sample1[1], angles='xy', scale_units='xy', scale=1, color='b')
    ax2.quiver(t_sample2[0], t_sample2[1], angles='xy', scale_units='xy', scale=1, color='r')
    ax2.grid()
```
%% Cell type:code id:a827f052-58fa-41d2-a94d-42dbae2d2140 tags:
``` python
# change the values of v1 and v2 to see the effect depending on the orientation
visualize_transform_ellipse(A, 75, 100)
```
%% Cell type:markdown id:a2ecaec6-1d68-49a3-ae2f-31ff96091ff9 tags:
The blue vector gets rotated and stretched, but the red horizontal vector is only stretched:
$$
A x_2 = \lambda x_2
$$
In this example the eigenvector was easy to spot: reading the columns of the matrix, we see that the vector $(1,0)$ is mapped to $(2,0)$, i.e. it is an eigenvector with eigenvalue 2.
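A quick numerical check (using the matrix $A$ defined above):
```
print(A @ np.array([1, 0])) # Prints "[2. 0.]": (1,0) is mapped to 2*(1,0)
```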
%% Cell type:markdown id:6731f49f-f631-44cb-9b2f-f2c636311023 tags:
### Exercise
Let us study one particular transform, defined by the following matrix:
$$B = \begin{pmatrix} \frac{1}{2} & -\frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{pmatrix}$$
%% Cell type:markdown id:37da1312-a192-40dc-892c-a759a9e571a3 tags:
<!-- BEGIN QUESTION -->
*Question:* What are the properties of this matrix? What is the effect of this transformation (you can use the visualization functions defined previously)?
%% Cell type:markdown id:0bc1090d tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:abd49697 tags:
<!-- END QUESTION -->
%% Cell type:code id:8b4a8658-7242-459d-96ab-75692a91e0e0 tags:
``` python
B = np.array([[0.5, -0.5*np.sqrt(3)],[0.5*np.sqrt(3), 0.5]])
```
%% Cell type:markdown id:440262bf-caab-4fd3-9e11-8ab411387ea3 tags:
Compute the matrix of the inverse transformation (without using Numpy's builtin `numpy.linalg.inv`). Hint: remember the properties of $B$.
%% Cell type:code id:d3bdce77-a2a3-43cb-bd41-9e5b9d0d3af5 tags:otter_answer_cell
``` python
B_inv = ...
```
%% Cell type:code id:81be4c98-b933-47c1-ae60-18f05f396855 tags:
``` python
# verify that you are correct
B@B_inv
```
%% Cell type:code id:b75d9858 tags:
``` python
grader.check("q3")
```
%% Cell type:code id:60552e92 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Linear transforms.ipynb")
```
%% Cell type:markdown id:73bbe3f2-6e0a-463e-abc6-01213d4e5811 tags:
# Matrix Analysis 2025 - EE312
## Week 2 - Linear transforms
[N. Aspert](https://people.epfl.ch/nicolas.aspert) - [LTS2](https://lts2.epfl.ch)
The first week notebook (introduction to Python, Numpy and Matplotlib) can be used as a help.
## Objective
The end goal is to understand a purely algebraic, matrix-based view of a few linear transforms. You will use those linear transforms to perform some basic time-frequency analysis of signals.
%% Cell type:code id:9e522683-69ce-458f-a78c-9780e07a865e tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:5901baaa-4eb1-446a-8c0b-88369a9e213e tags:
## Part I - Fourier
%% Cell type:markdown id:7e81811f-e9c9-4d57-bbb9-bee71b7df722 tags:
<!-- BEGIN QUESTION -->
1. Prove that any set of orthogonal vectors $v_i \in \mathbb{C}^N, \, i=1, \ldots , M \leq N$ such that $v_i^H v_j = C \delta_{i,j}$ is linearly independent (where $C$ is some constant).
%% Cell type:markdown id:b6087a12 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:accea561-acef-429f-aa68-cc002cba19b8 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
2. Compute $a_r = \sum_{n=0}^{N-1}e^{j2\pi r\frac{n}{N}}$, where $r$ is an integer (discuss the result depending on the value of $r$).
%% Cell type:markdown id:3f0ff7a0 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:b591574c-ba6d-4a15-8b42-0cb62508da73 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
3. Let $v_k \in \mathbb{C}^N$ be such that $v_k[n] = e^{-j 2 \pi \frac{kn}{N}}$, for $k,n = 0, \ldots , N-1$.
- Prove that these vectors are mutually orthogonal, hence linearly independent.
- Compute the norm of $v_k$.
%% Cell type:markdown id:d8031ee7 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:efc7a6a3-1602-4100-8ec8-d82f9e01ad86 tags:
<!-- END QUESTION -->
4. Implement the function `get_fourier_matrix` that returns a normalized Fourier matrix of size $N\times N$. Do not make use of the builtin DFT/FFT functions in `numpy` or `scipy`. Raise a `ValueError` exception if a negative or zero $N$ value is supplied.
%% Cell type:code id:f58a90c7-d4e3-48d9-a21b-42a4d79e2344 tags:otter_answer_cell
``` python
def get_fourier_matrix(N):
    ...
```
%% Cell type:code id:2283adf4 tags:
``` python
grader.check("q4")
```
%% Cell type:markdown id:a5f6b819-1e53-456a-bf67-b76258922d6e tags:
Let us now generate two test signals.
The first one $x_1(t)$ is made of four piecewise sinusoids, of different frequencies:
$$
x_1(t) = \begin{cases}
\cos(2\pi\, 5t), & 0\le t < 2\\
\cos(2\pi\, 10t), & 2\le t < 4\\
\cos(2\pi\, 25t), & 4\le t < 6\\
\cos(2\pi\, 50t), & 6\le t < 8
\end{cases}
$$
%% Cell type:code id:081e39c8-1874-4274-a7dc-59c8bdab84e4 tags:
``` python
Fs = 256 # sampling frequency
t = np.arange(0, Fs*8)/Fs
x1 = np.zeros(Fs*8)
x1[0:Fs*2] = np.cos(2*np.pi*5*t[0:Fs*2])
x1[Fs*2:Fs*4] = np.cos(2*np.pi*10*t[Fs*2:Fs*4])
x1[Fs*4:Fs*6] = np.cos(2*np.pi*25*t[Fs*4:Fs*6])
x1[Fs*6:Fs*8] = np.cos(2*np.pi*50*t[Fs*6:Fs*8])
```
%% Cell type:markdown id:b7f7475c-bfb0-4e9f-9dd6-ac9eeb5abef1 tags:
The second signal $x_2(t)$ is the sum of the same sinusoids over the complete time interval, with a scaling term s.t. the amplitude of both signals is identical.
%% Cell type:code id:8e0199b4-baf4-4376-a61c-b778aabe59f7 tags:
``` python
x2 = 0.25*(np.cos(2*np.pi*5*t) + np.cos(2*np.pi*10*t) + np.cos(2*np.pi*25*t) + np.cos(2*np.pi*50*t))
```
%% Cell type:markdown id:c52f3daf-0017-4252-b297-b5d6bb6c8151 tags:
<!-- BEGIN QUESTION -->
5.
- Display the generated signals using `plt.plot`.
- Compute their Fourier transforms using the Fourier matrix generated in the previous question.
- Display the amplitude of their Fourier spectrum. What do you observe?
%% Cell type:markdown id:cbde7de1 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:7d4aee96-f61e-4e51-84e8-56b4e357aa0f tags:otter_answer_cell
``` python
# plot x1
...
```
%% Cell type:code id:7a504349-03a3-4efd-bc3b-8249c09453fb tags:otter_answer_cell
``` python
# plot x2
...
```
%% Cell type:code id:7bf1bf4a-04b8-43f6-8128-29647e1f964a tags:otter_answer_cell
``` python
# Compute the Fourier transform of x1 and x2
...
```
%% Cell type:code id:159e693c-4427-4a8b-9f53-42dc63bafbc9 tags:otter_answer_cell
``` python
# Plot the spectrum of x1
...
```
%% Cell type:code id:f2c3061b-900e-44d9-bb73-02c98334a818 tags:otter_answer_cell
``` python
# Plot the spectrum of x2
...
```
%% Cell type:markdown id:19060be9-6494-4b4e-9803-1a4cb6fc36ac tags:
<!-- END QUESTION -->
In order to have a better compromise between time and frequency, the input signal will be split into smaller non-overlapping blocks of length $p$, and we will perform the DFT of each block.
6. Using the `get_fourier_matrix` implemented previously, fill the `get_block_dft_matrix` function below s.t. it returns a $N\times N$ matrix that will perform the block Fourier transform when applied to an input vector. Raise a `ValueError` if $p$ does not divide $N$.
Hint: [numpy.pad](https://numpy.org/doc/stable/reference/generated/numpy.pad.html#numpy.pad) and/or [numpy.kron](https://numpy.org/doc/stable/reference/generated/numpy.kron.html) might prove useful.
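As an illustration of the hint (this is only a sketch of the block-diagonal structure, not the graded implementation): the Kronecker product of an identity matrix with a small matrix places copies of that matrix along the diagonal.
```
M = np.array([[1, 1],
              [1, -1]])
print(np.kron(np.eye(2, dtype=int), M))
# Prints "[[ 1  1  0  0]
#          [ 1 -1  0  0]
#          [ 0  0  1  1]
#          [ 0  0  1 -1]]"
```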
%% Cell type:code id:903005b2-e249-48a0-b28b-24daef16cdb7 tags:otter_answer_cell
``` python
def get_block_dft_matrix(N, p):
    ...
```
%% Cell type:code id:58978e43 tags:
``` python
grader.check("q6")
```
%% Cell type:markdown id:4e894db3-b5e6-4b44-a7d1-988c59542b74 tags:
<!-- BEGIN QUESTION -->
We will now use this block Fourier transform to see how the frequencies of the signal evolve through time.
7.
- Using the `reshape` and `plt.imshow` functions, display the amplitude of the result of the block Fourier transform of $x_1$ and $x_2$. Is the result improved when compared to the one observed in question 5?
- What is the influence of the parameter $p$?
%% Cell type:markdown id:6239185e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:6da33694-7bd6-4257-9a5f-9bb300cd5c69 tags:otter_answer_cell
``` python
# Compute the block DFT matrix Wb
...
```
%% Cell type:code id:e1eb36fe-3079-4feb-86df-10596293e550 tags:otter_answer_cell
``` python
# Plot the block DFT of x1
...
```
%% Cell type:code id:4de97281-75ba-49d0-bc02-2573cf9791ec tags:otter_answer_cell
``` python
# Plot the block DFT of x2
...
```
%% Cell type:markdown id:76cc3589-b7d9-44ec-98ea-07fc6550a53a tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
8. In a real-world application, would generating an $N\times N$ matrix to perform the block Fourier transform be a good way to implement it? (Justify)
%% Cell type:markdown id:ff1a761a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:87372bd1-4055-4742-ab6a-f7b6ba8750af tags:
<!-- END QUESTION -->
## Part II - Haar transform
In this part we will study another approach to analyzing the time/frequency properties of signals.
Let us consider a vector $x\in\mathbb{R}^N$, with $N$ being even. The single-level Haar transform decomposes $x$ into two vectors $a^1$ and $d^1$ belonging to $\mathbb{R}^{\frac{N}{2}}$.
The coefficients of $a^1$ are defined as follows: $a^1_n = \frac{1}{\sqrt{2}}(x_{2n-1} + x_{2n}), n=1, ..., \frac{N}{2}$. $a^1$ is referred to as the *average coefficients vector*.
The coefficients of $d^1$ are defined as follows: $d^1_n = \frac{1}{\sqrt{2}}(x_{2n-1} - x_{2n}), n=1, ..., \frac{N}{2}$. $d^1$ is referred to as the *detail coefficients vector*.
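As an illustration only (the questions below ask for the matrix form), the single-level coefficients can be computed directly from the definition; note the definitions above are 1-based, while the slicing below is 0-based:
```
x = np.array([4., 6., 10., 12.])
a1 = (x[0::2] + x[1::2]) / np.sqrt(2) # average coefficients
d1 = (x[0::2] - x[1::2]) / np.sqrt(2) # detail coefficients
print(a1) # Prints "[ 7.07106781 15.55634919]"
print(d1) # Prints "[-1.41421356 -1.41421356]"
```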
%% Cell type:markdown id:170a3e62-8b25-482c-a8dd-91988b1d08ed tags:
<!-- BEGIN QUESTION -->
9. Let us represent the single-level Haar transform by a matrix $H_1$ s.t.
$$
H_1 x = \begin{pmatrix}a^1 \\ d^1\end{pmatrix}
$$
Prove that $H_1$ is orthonormal.
%% Cell type:markdown id:56fd9eda tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:92f1b78b-3017-4129-bc95-f132717bc7b9 tags:
<!-- END QUESTION -->
10. Write a function that returns the single-level Haar transform matrix $H_1$ for a given $N$. Raise a `ValueError` if $N$ is invalid.
%% Cell type:code id:5d123a1c-7f98-4edf-b896-7d773bf48e76 tags:otter_answer_cell
``` python
def get_sl_haar_matrix(N):
    ...
```
%% Cell type:code id:86b9b019 tags:
``` python
grader.check("q10")
```
%% Cell type:markdown id:e2e7a4af-0f4e-4972-a051-468a12967e79 tags:
The multi-level Haar transform is defined by recursively applying the single-level transform **to the average coefficient parts**.
For instance, constructing the 2-level Haar transform over $N$ points starts with the previously defined $H_{1,N}$ matrix of size $N\times N$ and the corresponding $\frac{N}{2}\times\frac{N}{2}$ version denoted by $H_{1,\frac{N}{2}}$.
$H_{1,N}$ can be written as
$$
\begin{pmatrix} H_{1, N}^a \\ H_{1,N}^d \end{pmatrix},
$$
where $H_{1, N}^a$ and $H_{1,N}^d$ are respectively the average and detail coefficient matrices, both of size $\frac{N}{2}\times N$.
Following these notations, the 2-level Haar transform matrix $H_{2,N}$ can be written as:
$$
\begin{pmatrix} H_{1,\frac{N}{2}}H_{1, N}^a \\ H_{1,N}^d \end{pmatrix},
$$
11. Implement a function that returns the $H_{p,N}$ matrix of size $N\times N$ that performs a $p$-level Haar transform. Raise a `ValueError` if the size and the level are incompatible, or if the level is smaller than 1.
%% Cell type:code id:d73eb83b-8d4b-4bf9-9c19-b3d15ee927c1 tags:otter_answer_cell
``` python
def get_haar_matrix(N, level):
    ...
```
%% Cell type:code id:c3138843 tags:
``` python
grader.check("q11")
```
%% Cell type:markdown id:918653e3-22c1-4b25-abbc-2461a6736557 tags:
<!-- BEGIN QUESTION -->
12. Prove that $H_{p,N}$ is orthonormal.
%% Cell type:markdown id:02e21ab7 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:8224d668-7cdc-4175-a587-0647df59b2b4 tags:
<!-- END QUESTION -->
### Haar transform visualization
In order to make the visualization of the Haar decomposition easy, we provide the `plot_haar_coeffs` function that will display the average and detail coefficients of the different levels.
The function takes 3 arguments:
- the input signal
- a title for the figure
- the number of levels
%% Cell type:markdown id:b6cac5b0-a173-499c-ad89-9e4b82166c7a tags:
<!-- BEGIN QUESTION -->
13. Display the Haar transform of $x_1$ and $x_2$.
%% Cell type:code id:5ca2ec21-222e-430c-a7bd-d2af54e2de47 tags:
``` python
from display_helper import plot_haar_coeffs
```
%% Cell type:code id:6a946638-18dc-44fa-8ffc-0b44a3f653e1 tags:otter_answer_cell
``` python
# display the decomposition of x1
...
```
%% Cell type:code id:b01b9267-9609-474e-b5e0-8b7adb687164 tags:otter_answer_cell
``` python
# display the decomposition of x2
...
```
%% Cell type:markdown id:81e23ac6-de1a-4cb4-98f2-89545052f9f3 tags:
<!-- END QUESTION -->
## Part III - Denoising
We will now use the different transforms defined in parts I and II to perform denoising.
Let us create a noisy signal for this purpose.
%% Cell type:code id:e5583ca2-13ea-49f8-baf6-4d6a223f3b2e tags:
``` python
angle1 = np.linspace(0, 5*np.pi/2, 300)
wave1 = np.sin(angle1)
angle2 = np.linspace(0, 3*np.pi/2, 300)
wave2 = np.sin(angle2)
angle3 = np.linspace(np.pi/2, 2*np.pi, 424)
wave3 = np.sin(angle3)
wave = np.append(wave1, wave2)
wave = np.append(wave, wave3)
wave_noisy = wave + 0.2*np.random.normal(0, 1, 1024)
```
%% Cell type:code id:d3278fc1-e814-47b9-83f0-61f4fad8d4d7 tags:
``` python
plt.plot(wave_noisy, 'r')
plt.plot(wave)
```
%% Cell type:markdown id:33489d2d-b038-4896-87e7-0af6568ad702 tags:
The noise is usually located in the higher frequencies. However, the signal we created is a bit special, as it has two discontinuities which also generate high-frequency components (remember that the Fourier transform of a rectangle function is a sinc).
%% Cell type:markdown id:fbd9ce45-7214-4ca7-a5e3-cf0c340458a3 tags:
<!-- BEGIN QUESTION -->
14. Implement a function `denoise_signal` that performs denoising of the input signal by using a supplied orthonormal transform matrix, and by setting the transformed coefficients having an amplitude smaller than a given threshold to 0. You might want to use the [numpy.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html) function for this. When denoising using the Haar transform, you can perform the thresholding only on the detail coefficients. Implement the function `denoise_signal_haar` that performs this operation.
NB: The returned result should be real, in order to be displayed.
%% Cell type:code id:cf7270a7-de1d-4c24-913c-916479ee8ab5 tags:otter_answer_cell
``` python
def denoise_signal(W, x, threshold=0.1):
    """
    W: NxN input orthonormal matrix (Fourier, block-Fourier or Haar)
    x: input signal (of length N)
    """
    ...
```
%% Cell type:code id:c029ab19-11a5-4bc9-b348-31bf8ca07bca tags:otter_answer_cell
``` python
def denoise_signal_haar(W, x, threshold=0.1, detail_start_index=256):
    """
    W: NxN input orthonormal matrix (Fourier, block-Fourier or Haar)
    x: input signal (of length N)
    detail_start_index: thresholding is performed on x[detail_start_index:]
    """
    ...
```
%% Cell type:code id:1b47fbc5-070b-49c4-a616-d8e753adede1 tags:otter_answer_cell
``` python
# Perform denoising with the full Fourier transform and display the result.
# Make sure you choose a good threshold
...
```
%% Cell type:code id:d06fa927-ca94-4387-b987-a4e3ffeabeb2 tags:otter_answer_cell
``` python
# Perform denoising with the block Fourier transform and display the result
# Make sure you choose a good threshold and block size
...
```
%% Cell type:code id:98e6e4a6-a289-4552-b5a9-27f40c40cb40 tags:otter_answer_cell
``` python
# Perform denoising with the Haar transform and display the result
# Make sure you choose a good threshold and an appropriate number of levels
...
```
%% Cell type:code id:d5cb5707 tags:
``` python
grader.check("q14")
```
%% Cell type:markdown id:e6b05f2f-34a1-4689-82ef-c3be92c842ba tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
15. Compare the three denoising methods (Fourier, block Fourier and Haar). Which one performs better (in terms of noise removal, but also in terms of discontinuity preservation)? Was that expected (justify)?
%% Cell type:markdown id:a999fadd tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:e705a9ec tags:
<!-- END QUESTION -->
import pywt
import matplotlib.pyplot as plt
from scipy.interpolate import griddata


def plot_haar_coeffs(data, title, levels=5):
    """Show dwt coefficients for given data."""
    w = pywt.Wavelet('haar')
    a = data
    ca = []
    cd = []
    mode = pywt.Modes.sp1DWT = 1
    for i in range(levels):
        (a, d) = pywt.dwt(a, w, mode)
        ca.append(a)
        cd.append(d)
    fig = plt.figure()
    ax_main = fig.add_subplot(len(ca) + 1, 1, 1)
    ax_main.set_title(title)
    ax_main.plot(data)
    ax_main.set_xlim(0, len(data) - 1)
    for i, x in enumerate(ca):
        ax = fig.add_subplot(len(ca) + 1, 2, 3 + i * 2)
        ax.plot(x, 'r')
        ax.set_ylabel("A%d" % (i + 1))
        ax.set_xlim(0, len(x) - 1)
    for i, x in enumerate(cd):
        ax = fig.add_subplot(len(cd) + 1, 2, 4 + i * 2)
        ax.plot(x, 'g')
        ax.set_ylabel("D%d" % (i + 1))
        # Scale axes
        ax.set_xlim(0, len(x) - 1)
        ax.set_ylim(min(0, 1.4 * min(x)), max(0, 1.4 * max(x)))
    fig.subplots_adjust(left=0.2, right=1.5, top=1.9, bottom=0.1)
\ No newline at end of file
%% Cell type:code id:0b9af790 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Projections and signal restoration.ipynb")
```
%% Cell type:markdown id:4133723e-952b-4b1c-bb3d-a7f78a957024 tags:
# Matrix Analysis 2025 - EE312
## Week 3 - Signal restoration using projections
[LTS2](https://lts2.epfl.ch)
%% Cell type:markdown id:fb42154f-8832-4960-9fee-3372b473ef69 tags:
Let us consider a signal with $N$ elements, i.e. a vector in $\mathbb{R}^N$.
Under our observation conditions, we can only partially recover the signal's values; the others remain unknown, i.e. we know that:
$x[k] = x_k$ for $k\in E = \{e_0, e_1, \dots, e_{M-1}\}$, with $E\in\mathbb{N}^M$ and $e_i\neq e_j \ \forall i\neq j$ (i.e. each known index is unique).
We also make the assumption that the signal is contained within **lower frequencies of the spectrum**. This can be expressed using the (normalized) Fourier matrix you have constructed last week $\hat{W}$.
In this notebook, we will try to reconstruct the signal by projecting its observation successively onto the Fourier subspace defined above, then back onto its original domain (with the constraint regarding its known values), then onto the Fourier domain again, etc. This is a simplified version of a more general method called ["Projection onto convex sets" (POCS)](https://en.wikipedia.org/wiki/Projections_onto_convex_sets). The method is illustrated by the figure below (of course you do not project onto lines but onto spaces of larger dimension).
![POCS](images/pocs.png "POCS")
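Schematically, one POCS iteration alternates the two projections. A minimal sketch (with hypothetical projector functions `proj_C1` and `proj_C2` standing in for the two projections; the actual operators are constructed in the questions below):
```
def pocs(x0, proj_C1, proj_C2, n_iter=50):
    # alternate projections onto the two constraint sets
    x = x0
    for _ in range(n_iter):
        x = proj_C1(x) # e.g. keep only the low-frequency part
        x = proj_C2(x) # e.g. restore the known sample values
    return x
```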
### 1. Low-pass filter
Let us first create example signals to validate our implementation of the filtering operation.
%% Cell type:code id:d238cbc9-055f-4d55-a6ae-29d83881e2f5 tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
N = 100
k = np.arange(0, N)
w1 = 3
w2 = 7
w3 = 12
# generate simple signals
x1 = np.cos(2*w1*np.pi*k/N) + np.sin(2*w2*np.pi*k/N)
x2 = np.sin(2*w1*np.pi*k/N) + np.cos(2*w3*np.pi*k/N)
```
%% Cell type:code id:59a703e8-f351-4a02-8278-12560a67b023 tags:
``` python
# visualize the base signals
plt.plot(x1)
plt.plot(x2, 'r')
```
%% Cell type:markdown id:09249ca9-e441-4537-9bc5-018be67bc099 tags:
1. Implement a function that returns the normalized Fourier matrix of size $N$ (use your implementation from week 2).
%% Cell type:code id:f886d8a6-4cb7-4187-ae4b-4615a06caa2d tags:otter_answer_cell
``` python
def fourier_matrix(N):
    ...
```
%% Cell type:code id:f367f4e1 tags:
``` python
grader.check("q1")
```
%% Cell type:code id:62fae5a8-f3c4-477b-a53f-a0106730b842 tags:
``` python
W_hat = fourier_matrix(N)
```
%% Cell type:markdown id:1e0a2337-446d-49d3-8521-55d07a1539bc tags:
Let $X = \hat{W}x$, $X$ is the *discrete Fourier transform* of the input signal $x$. The frequency condition can then be rewritten as
$X[k] = 0$ for $w_c < k\leq N-w_c$.
2. Implement the function that returns an $N\times N$ matrix $P$, s.t. $PX$ satisfies the above condition for a given $w_c$. Make sure the input values are valid; if not, raise a `ValueError` exception.
%% Cell type:code id:aeed4067-1130-4e6c-9fbd-ab8d0a3a41c0 tags:otter_answer_cell
``` python
def lowpass_matrix(N, w_c):
"""
Computes the P matrix that will remove high-frequency coefficients in a DFT transformed signal
Parameters
----------
N : length of the input signal
w_c : cutoff frequency
Returns
-------
The P matrix
"""
...
```
%% Cell type:code id:38529611 tags:
``` python
grader.check("q2")
```
%% Cell type:markdown id:fef68750-d97f-4625-bd5e-5f13e7894f7d tags:
3. Now let us try the filtering on the test signals and make sure it behaves appropriately. Using the matrix $P$ defined above, choose the parameter $w_c$ appropriately s.t. the filtered signals retain $w_1$ and $w_2$ but discard $w_3$.
%% Cell type:code id:bc1370d5-c7b2-4598-b573-d57bec56aa1b tags:otter_answer_cell
``` python
w_c = ...
```
%% Cell type:code id:85787e70-1427-4860-8a7b-94d20432bd76 tags:otter_answer_cell
``` python
# Compute the DFT of x1 and x2
X1 = W_hat@x1
X2 = W_hat@x2
# Get the lowpass filter matrix
P = lowpass_matrix(N, w_c)
# Filter X1 and X2 and apply inverse DFT
x1_f = ...
x2_f = ...
```
%% Cell type:markdown id:700eb2e2-3137-4916-a829-9e8e2ba8c6e5 tags:
NB: Make sure the filtered output is **real** (or its imaginary part should be smaller than $10^{-12}$).
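One way to check this numerically (a minimal sketch, assuming `x1_f` has been computed in the cell above):
%% Cell type:code id:real-check tags:

``` python
# discard a numerically negligible imaginary part (sketch)
assert np.max(np.abs(np.imag(x1_f))) < 1e-12
x1_f = np.real(x1_f)
```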
%% Cell type:code id:e960728d-c2c0-4b9d-ab73-3db971f559a1 tags:
``` python
# Plot the filtered signals
# x1_f should still contain both frequencies, x2_f only one
plt.plot(x1_f)
plt.plot(x2_f, 'r')
```
%% Cell type:code id:b6f5b443 tags:
``` python
grader.check("q3")
```
%% Cell type:markdown id:c34b8fc7-8a51-4fe5-a6ab-1cc27da2cd1e tags:
<!-- BEGIN QUESTION -->
4. Prove that $P$ is a projection.
%% Cell type:markdown id:0cd9f0a9 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:7f17f3ae-6a39-4e86-a29c-879c96b4e5fb tags:
<!-- END QUESTION -->
### 2. Signal extension
In order to express the condition on $x[k]$ as a pure matrix operation, we need to extend the input signal and design a slightly different Fourier transform matrix that operates properly on those extended signals.
Let us denote by $x_E$ the vector from $\mathbb{R}^M$ containing the known values of $x$, i.e. the $x_k$ for $k\in E$.
For each vector $y$ in $\mathbb{R}^N$ we define its extension as $\tilde{y} = \begin{pmatrix}y \\ x_E\end{pmatrix}$.
**This notation will remain throughout the notebook**, i.e. a vector with a tilde denotes its extension made by adding the $x_E$ values at the end.
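For instance, a minimal sketch of building such an extension with NumPy (toy values, for illustration only):
%% Cell type:code id:extension-demo tags:

``` python
x_demo = np.arange(5.0)                       # example signal, N = 5
E_demo = [1, 3]                               # known indices, M = 2
xt_demo = np.append(x_demo, x_demo[E_demo])   # extended vector (x, x_E) of length N + M
```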
%% Cell type:markdown id:f9434f5f-ceb9-42bf-9b5e-bd7d7a94dae8 tags:
5. Let us define the matrix $B_E$ s.t. $y=B_E x$ satisfies $y[k] = 0$ if $k\in E$ and $y[k] = x[k]$ otherwise. Write a function that returns its extended version $\tilde{B_E}$ s.t. $\tilde{y}=\tilde{B_E}\tilde{x}$, for any $x\in\mathbb{R}^N$.
- $\tilde{B_E}$ is a square matrix of size $N+M$.
- Check the validity of parameters and raise a `ValueError` in case of invalid inputs.
%% Cell type:code id:dae1c4aa-c6ce-4f35-b309-aa6dd09e9abf tags:otter_answer_cell
``` python
def Bt_E(N, E):
"""
Computes the $\tilde{B}_E$ matrix
Parameters
----------
N : length of the input signal
E : list of suitable indices. e.g. for N=5 a valid E could be [1, 3]
Returns
-------
The $\tilde{B}_E$ matrix
"""
...
```
%% Cell type:code id:751ff499 tags:
``` python
grader.check("q5")
```
%% Cell type:markdown id:853c1a3b-9ed9-461c-86f9-12992e07337d tags:
Let us now design $C_E$ as an operator from $\mathbb{R}^{N+M} \rightarrow \mathbb{R}^{N+M}$ such that, when applied to any extended vector $\tilde{z}$ s.t. $\tilde{z}[k] = 0, \forall k\in E$ (for instance the output of $\tilde{B}_E$), it generates a vector $\tilde{z}_R$ s.t.:
$\tilde{z}_R[k] = \left\{\begin{matrix} x_k \mbox{ if } k\in E \\ \tilde{z}[k] \mbox{ otherwise} \end{matrix}\right.$
6. Write a function that generates $C_E$.
%% Cell type:code id:932ae153-f336-4b0f-b7c0-774c02ccaad1 tags:otter_answer_cell
``` python
def C_E(N, E):
"""
Computes the $C_E$ matrix
Parameters
----------
N : length of the input signal
E : list of suitable indices. e.g. for N=5 a valid E could be [1, 3]
Returns
-------
The $C_E$ matrix
"""
...
```
%% Cell type:code id:473d2fb2 tags:
``` python
grader.check("q6")
```
%% Cell type:markdown id:0e57be3f-1c44-445a-86ad-07e20fbcd1d2 tags:
<!-- BEGIN QUESTION -->
7. What is the effect of $D_E = C_E \tilde{B}_E$ on an extended signal $\tilde{x}$?
%% Cell type:markdown id:786cd4ed tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:3b2caf03-41cb-41f9-93eb-778047b9b944 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
8. Verify (numerically, on an example) that $D_E$ is a projector. You can use the function `numpy.testing.assert_array_almost_equal` to check that two arrays are almost equal (to good precision).
%% Cell type:code id:94dbc7f8-c2fc-4337-9828-dd2e3311830d tags:otter_answer_cell
``` python
# Set some parameters
N=5
E=[1,3]
# Compute D_E
D_E = ...
# Now check that D_E is a projector
...
```
%% Cell type:markdown id:7fb30993-5350-4e9b-aa45-b657b450a3a1 tags:
<!-- END QUESTION -->
### 3. Extended signals in the Fourier domain
Let us now define an operator that behaves almost like the normalized Fourier transform, except that it is applied to extended signals and preserves the $x_E$ part. This can easily be done using the following block matrix:
$\tilde{W} = \begin{pmatrix}\hat{W} & 0 \\0 & I_M\end{pmatrix}$.
Using this definition, multiplying an extended signal $\tilde{x}$ by $\tilde{W}$ will result in a vector containing the Fourier transform of $x$ followed by $x_E$, preserving the known initial values.
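As an illustration, a block-diagonal matrix of this shape can be assembled with `np.block` (a sketch in which a small identity matrix stands in for $\hat{W}$):
%% Cell type:code id:block-demo tags:

``` python
W_demo = np.eye(4)                            # placeholder for the N x N matrix W_hat
M_demo = 2
Wt_demo = np.block([[W_demo, np.zeros((4, M_demo))],
                    [np.zeros((M_demo, 4)), np.eye(M_demo)]])
```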
%% Cell type:markdown id:3157e22e-f7d7-44b8-a4fd-9ddec1d7aa8a tags:
<!-- BEGIN QUESTION -->
9. Prove that $\tilde{W}$ is orthonormal.
%% Cell type:markdown id:7abf655c tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:f140c720-eaed-4fd2-8b9f-6b99a1a5b0ff tags:
<!-- END QUESTION -->
10. Write a function that returns $\tilde{W}$.
%% Cell type:code id:b744a698-40d5-41a6-9623-76a19f50bcd2 tags:otter_answer_cell
``` python
def Wt_E(N, E):
"""
Computes the $\tilde{W}_E$ matrix
Parameters
----------
N : length of the input signal
E : list of suitable indices. e.g. for N=5 a valid E could be [1, 3]
Returns
-------
The $\tilde{W}_E$ matrix
"""
...
```
%% Cell type:code id:60244622 tags:
``` python
grader.check("q10")
```
%% Cell type:markdown id:1afee45d-78e6-498b-905b-cbc97eeaf2d5 tags:
11. Recalling the low-pass matrix $P$ defined previously, build $\tilde{P}$ s.t. when applied to $\tilde{W}\tilde{x}$ it results in a vector containing the filtered values (still in the Fourier domain) followed by $x_E$.
%% Cell type:code id:4576bfbc-a249-4d89-919d-47763c8a60f9 tags:otter_answer_cell
``` python
def Pt_E(N, M, w_c):
...
```
%% Cell type:markdown id:781d7752-74ae-4ec2-8ea2-4279b54ee5e7 tags:
<!-- BEGIN QUESTION -->
12. A signal $\tilde{x}$ will be filtered by doing $\tilde{W}^H \tilde{P}\tilde{W}\tilde{x}$.
Prove that this is a projection.
%% Cell type:markdown id:d38cefa0 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:1c8c4469-7264-4968-b700-897be635a193 tags:
<!-- END QUESTION -->
### 4. Signal restoration
%% Cell type:markdown id:ac24806d-105d-4fbf-bc98-788dbcc7c0ba tags:
<!-- BEGIN QUESTION -->
13. We can now use everything defined above to implement a function that performs one iteration of the method. Taking the extension of $x$ as input, it does the following:
- compute the filtered version of the signal using $\tilde{W}^H \tilde{P}\tilde{W}$ (i.e. project onto the space of "smooth" signals)
- restore the known values in the signal using $D_E = C_E\tilde{B}_E$ (i.e. project back onto the space of signals having the known initial values we have set)
Make sure to force the result to real values using `np.real` before returning.
%% Cell type:code id:155e3277-bcef-44ed-b53f-a1b2e5ad7363 tags:otter_answer_cell
``` python
def restoration_iter(Wt, Pt, Dt, xt):
"""
Performs a full restoration iteration by
- projecting the signal into Fourier, performing low-pass filtering followed by inverse DFT
- restoring the known initial values into the filtered signal
Parameters
----------
Wt : \tilde{W} matrix
Pt : \tilde{P} matrix
Dt : \tilde{D} matrix
xt : \tilde{x} vector
Returns
-------
The new $\tilde{x} vector after the iteration
"""
...
```
%% Cell type:markdown id:758aa9ce-1ebc-49e1-b2f6-ca8e1035b31c tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
14. Finally, we are ready to apply all this to an example.
%% Cell type:code id:b467649e-2a46-40f2-bc98-73c354fb0484 tags:
``` python
# Setup an example
N = 100
w_c = 10 # cut-off
w1 = 3
w2 = 7
E = np.array([0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95])
# E = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90]) # try with less known points
M = len(E)
x = np.cos(2*w1*np.pi*np.arange(0, N)/N) + np.sin(2*w2*np.pi*np.arange(0,N)/N) # original signal
# Create a signal that is only noise, except at known initial values
y = np.random.rand(N) #
y[E] = x[E] # restore known values
xe = x[E]
```
%% Cell type:code id:510b6f8f-b1c2-48b7-af00-eee8e55b5868 tags:
``` python
plt.plot(y) # plot the "acquired" signal
plt.plot(x, 'r+-', alpha=0.5) # plot the ground truth signal
```
%% Cell type:markdown id:b71f1ff5-73d6-4bc1-ba06-1452c8bf8adb tags:
Generate $\tilde{y}$ (this only needs to be done once).
%% Cell type:code id:4e0943bf-0f35-4837-b6db-1615ba466ae5 tags:
``` python
yt = np.append(y, xe)
```
%% Cell type:markdown id:c25239d9-3015-4a33-b0c0-0c906ee4129e tags:
Run a number of iterations of the method and plot the result:
%% Cell type:code id:6aaf1a71-6c2a-4f43-a51e-82b40cc6d6d0 tags:
``` python
def signal_restore(Wt, Pt, Dt, yt, niter=20):
yr0 = yt # initialize
for k in range(niter):
yr1 = restoration_iter(Wt, Pt, Dt, yr0)
yr0 = yr1
return yr1
```
%% Cell type:code id:951418c2-ee12-4e8d-8f58-cba29d26dd7e tags:
``` python
Wt = Wt_E(N, E)
Dt = C_E(N, E)@Bt_E(N, E)
Pt = Pt_E(N, M, w_c)
yr = signal_restore(Wt, Pt, Dt, yt)
```
%% Cell type:code id:43b3d939-eeb2-4171-9293-0076f1c06c83 tags:
``` python
plt.plot(yr[:N]) # plot reconstructed signal
plt.plot(x, 'r+', alpha=0.7) # plot original for comparison
```
%% Cell type:markdown id:bf876405-b60c-4f7c-9162-56d58cafd19f tags:
Depending on $M$, you will find that the method either reconstructs the signal perfectly or does not.
%% Cell type:markdown id:ccf83efe tags:
<!-- END QUESTION -->
%% Cell type:code id:faef2260 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Deblurring.ipynb")
```
%% Cell type:markdown id:b8b107c5-c56c-4e5f-8cee-b970f8e10f8f tags:
# Matrix Analysis 2025 - EE312
## Week 4 - Image deblurring using projections
[LTS2](https://lts2.epfl.ch)
### Objectives
In this week's exercises you will need to use the properties of the four fundamental subspaces, as well as the projection operators onto those subspaces. You will apply them to deblur an image.
## I. Exercises
%% Cell type:markdown id:3acf015e-a0ae-48a6-925f-7ef5e5b18c01 tags:
<!-- BEGIN QUESTION -->
### 1.
Let $A=\begin{pmatrix}1 & 1 & 1 &0\\1 & 2 & 2 & 1\\2 & 5 & 5 & 3\end{pmatrix}$
Compute a basis for each of its four fundamental subspaces.
%% Cell type:markdown id:c5974723 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:5504f856-3edd-436f-b14f-4e660a7faf7c tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
### 2.
Let $A \in \mathbb{R}^{m \times n}$ and suppose it is onto. Prove that $AA^T$ is nonsingular.
%% Cell type:markdown id:658270cd tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:44bf7882-513e-4c29-9619-96bd357670b0 tags:
<!-- END QUESTION -->
## II. Image deblurring
### Introduction
Since we will need to visualize images, here is a brief reminder on how to use matplotlib for this task.
%% Cell type:code id:a05d6e95-711e-41ce-bf8a-5b983d8999fd tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import skimage
import skimage.io
import os
```
%% Cell type:markdown id:f387c338-7c82-46cf-9950-def341903021 tags:
We will use the `imread` function from the `scikit-image` package (pre-installed on noto, make sure to install it if you are using a local environment)
%% Cell type:code id:97017b6f-fc95-4d94-b1f6-f7da6621444f tags:
``` python
filename = os.path.join(skimage.data_dir, 'camera.png')
camera = skimage.io.imread(filename)
```
%% Cell type:markdown id:6b4b81f7-6867-43b7-9fd5-9478d421db86 tags:
Some basic information regarding the contents of `camera`:
%% Cell type:code id:03e539bd-2d68-4240-9d72-ee4ef02b9081 tags:
``` python
camera.shape
```
%% Cell type:code id:3b723f81-ebf9-45e5-83a9-6e3d7048586b tags:
``` python
camera.dtype
```
%% Cell type:markdown id:f2c5f73f-bfee-47e7-b68d-d364f6478663 tags:
The type `uint8` means each value in `camera` is represented using a single byte, and is within $[0, 255]$.
Displaying the image is fairly straightforward:
%% Cell type:code id:04ce2927-48e4-41d6-961f-f1a5f0791cc3 tags:
``` python
plt.imshow(camera, cmap='gray')
```
%% Cell type:markdown id:18174a60-7b66-4ccf-a5f4-43ec56299c0e tags:
This image is widely used in image processing scientific publications.
The `cmap` parameter specifies the colormap used to display the image. Since Matplotlib's default colormap is not grayscale, passing `cmap='gray'` is needed. You can find [several predefined colormaps](https://matplotlib.org/stable/tutorials/colors/colormaps.html) in Matplotlib.
You can also check the images distributed with `scikit-image` on their [github repository](https://github.com/scikit-image/scikit-image/tree/main/skimage/data). Feel free to use another image, or even one of your own.
%% Cell type:markdown id:30dbc6b0-be1f-43b6-a575-ffdedac904b8 tags:
## 1. Blurring operator
In order to efficiently deblur an image, let us design our own blurring operator. Since it will be entirely known, it will be possible to compute its inverse.
Let us consider an input vector $x=\begin{pmatrix}x_0 \\ x_1 \\ ... \\ x_{N-1}\end{pmatrix} \in \mathbb{R}^N$.
Our blurring operator $B$ will do two things:
- replace each value $x_k$ by the average of $2p+1$ values surrounding $x_k$, i.e. between $x_{k-p}$ and $x_{k+p}$.
- compute this average not for every $x_k$ but only every $s$ values, with $s$ being a subsampling factor, $s>0$.
Formally, if we denote by $y$ the output of the blurring operator on $x$, this means that
- $y\in\mathbb{R}^M$, with $M=\frac{N}{s}$ (NOTE: we will only consider the case where $N$ is a multiple of $s$)
- $y_m = \frac{1}{2p+1}\sum_{k=sm-p}^{sm+p}x_k$
As you may have noticed, our summation indices in the definition of $y_m$ can be negative in some cases and bigger than $N-1$ in others. In those cases the missing value of $x_k$ is replaced by 0. This way of handling borders is often referred to as **zero-padding**, as it can be achieved by creating an extended version of $x$ with a number of leading and trailing zeros. There are other ways of handling borders, e.g. by using $x_0$ for negative indices and $x_{N-1}$ for indices greater than $N-1$, or even by mirroring the input vector.
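As an illustration, zero-padding a vector by $p$ samples on each side can be done with `np.pad` (a minimal sketch):
%% Cell type:code id:zeropad-demo tags:

``` python
x_demo = np.array([1., 2., 3., 4.])
p_demo = 2
np.pad(x_demo, p_demo)   # -> array([0., 0., 1., 2., 3., 4., 0., 0.])
```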
%% Cell type:markdown id:63fecde4-7ee7-4692-b21e-dfc33e4a4de7 tags:
1. Write a function that creates the matrix $B$ that performs the operations described above.
Remember that in Python the `//` operator performs integer division.
**!!Warning!!** The automated tests are here to help you; they might however be incomplete and do not fully guarantee that your implementation is correct, even if they all pass!
%% Cell type:code id:e40d4cba-1187-448d-96fb-d0033c129aaf tags:otter_answer_cell
``` python
def blur_matrix(N, p, s):
"""
Computes the blurring matrix
Parameters
----------
N : length of the input signal
p : half-length of the blurring filter
s: subsampling factor
Returns
-------
The blurring matrix
"""
...
```
%% Cell type:code id:73ed3a03 tags:
``` python
grader.check("q2p1")
```
%% Cell type:markdown id:af393d26-6c86-4ff0-b8fb-2f78fae94c57 tags:
<!-- BEGIN QUESTION -->
2. What is the rank of $B$?
Hints:
- the row rank might be easier to consider
- you might also need to add constraints on $s$ and $p$
%% Cell type:markdown id:d4849ba8 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:67659a9b-5206-40da-a8ab-0f190884d901 tags:
<!-- END QUESTION -->
## 2. Blurring the image
We now have our blurring matrix ready; let us use it to blur the `camera` image.
As $B$ is designed to operate on vectors, we will need two steps to blur the image:
- apply $B$ on each column of the input image
- apply $B$ on each row of the column-blurred image computed above
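In matrix form, these two steps compute $Y = B X B^T$, where $X$ is the input image: blurring the columns gives $BX$, and blurring the rows of the result amounts to right-multiplying by $B^T$.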
%% Cell type:markdown id:596b49e7-438a-42c6-b5e6-5ea1a7f338f2 tags:
3. Implement a function that blurs an image using the matrix defined in the previous question
%% Cell type:code id:0a8714b7-1dfa-4aa9-b09a-a5a9caa317bc tags:otter_answer_cell
``` python
def blur_image(img, B):
"""
Blurs the input image using the blur matrix
Parameters
----------
img : input image
B : blur matrix
Returns
-------
The blurred image
"""
...
```
%% Cell type:code id:6498302c tags:
``` python
grader.check("q2p3")
```
%% Cell type:code id:c167c207-66da-41a7-b536-5fc4577420e3 tags:
``` python
# build the blur matrix
B = blur_matrix(512, 3, 2)
```
%% Cell type:code id:42231cfd-b223-4f90-97c5-dec703aee5af tags:
``` python
# Blur on rows
camera_blurred = blur_image(camera, B)
```
%% Cell type:code id:78aa4a09-ff28-4e21-83f9-2b14c6d6ad83 tags:
``` python
plt.imshow(camera_blurred, cmap='gray') # check the result
```
%% Cell type:code id:492b6b43-abb8-49a1-a8cd-4a34c9b1a4eb tags:
``` python
# Display blurred and original side by side
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(15, 7), dpi=80, sharey=True, sharex=True)
ax[1].imshow(camera_blurred, cmap='gray')
ax[0].imshow(camera, cmap='gray')
```
%% Cell type:markdown id:1d64733c-35eb-440b-a572-60bbdfd446e5 tags:
## 3. Deblurring
Now everything is setup, we can proceed with the image restoration.
%% Cell type:markdown id:c962b694-45b2-4c2b-b7f5-ee69ffd8e342 tags:
<!-- BEGIN QUESTION -->
4. Let us denote the result of a blur operation by $y=Bx$. Using the properties of the four fundamental subspaces, write $x^*$ as a sum, making use of projection operators onto those subspaces. Hint: you might need results from question 2.
%% Cell type:markdown id:c35a1228 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:8f024ccc-04b6-4437-aa6a-daba627ed4d4 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
5. Using this result, find a possible solution for the complete blurring operation (on both rows and columns).
%% Cell type:markdown id:ab1c7611 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:cf20cfff-0c77-4367-8b4e-92f41c41acad tags:
<!-- END QUESTION -->
6. Implement a function that computes the right inverse of the blur matrix and uses it to perform image restoration. Do not forget to consider both row and column blurring.
You can use `np.linalg.inv` to compute the inverse of a matrix.
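For reference, since $B$ has full row rank (cf. question 2), $B^T(BB^T)^{-1}$ is a right inverse of $B$; a minimal sketch on a toy matrix:
%% Cell type:code id:rightinv-demo tags:

``` python
# right inverse of a full row-rank matrix (sketch, toy values)
B_demo = np.array([[1., 0., 1.],
                   [0., 1., 0.]])
B_right = B_demo.T @ np.linalg.inv(B_demo @ B_demo.T)
np.testing.assert_array_almost_equal(B_demo @ B_right, np.eye(2))
```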
%% Cell type:code id:32426413-a8ea-46ac-ae3e-968f480ff150 tags:otter_answer_cell
``` python
def inverse_blur(blurred_img, B):
...
```
%% Cell type:code id:f82aa4d6 tags:
``` python
grader.check("q2p6")
```
%% Cell type:markdown id:812c7443-3492-46d2-ad99-b2c83826163b tags:
Using this function, compute the reconstructed image from the blurred one
%% Cell type:code id:f270a921-f7c3-40fd-a969-d37010ad2954 tags:
``` python
camera_unblur = inverse_blur(camera_blurred, B)
```
%% Cell type:code id:75296a85-053c-407c-915f-b562ba9c8acd tags:
``` python
plt.imshow(camera_unblur, cmap='gray') # check the result
```
%% Cell type:code id:0dfd3731-b40a-4425-b78d-e0b65590c9f3 tags:
``` python
# Let us compare the original image, the restored image and an upscaled version of the blurred image
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(15, 7), dpi=80, sharey=True, sharex=True)
ax[1].imshow(camera_unblur, cmap='gray')
ax[0].imshow(camera, cmap='gray')
ax[2].imshow(skimage.transform.resize(camera_blurred, (512,512)), cmap='gray')
```
%% Cell type:markdown id:ca9e1169-772a-402a-be66-8fd166826e54 tags:
<!-- BEGIN QUESTION -->
7. Evaluate the performance of the restoration method. What are its limitations for real-world usage? Check that the computed solution is valid by blurring the restored image.
%% Cell type:markdown id:2d3adde9 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:3658f662-3832-4098-84f4-5d618b95fd02 tags:otter_answer_cell
``` python
...
```
%% Cell type:code id:54f26333 tags:
``` python
grader.check("q2p6")
```
%% Cell type:markdown id:650659df tags:
<!-- END QUESTION -->
%% Cell type:markdown id:0bd3e058 tags:
## Submission
Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**
%% Cell type:code id:d10563fe tags:
``` python
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)
```
%% Cell type:markdown id:de374d0d tags:
%% Cell type:code id:044b8034 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Linear systems - Poisson equation.ipynb")
```
%% Cell type:markdown id:d133bce9-d159-4b1d-a25f-69bed8da493a tags:
# Matrix Analysis 2025 - EE312
## Week 5 - Discrete Poisson equation
[LTS2](https://lts2.epfl.ch)
### Objectives
Apply your knowledge about linear systems to Poisson equation resolution.
%% Cell type:markdown id:654ad4ee-c060-4bc3-b218-3a6709952750 tags:
## Poisson equation
Let $u,v \in \mathbb{R}^n$ represent two physical quantities $f, g: \mathbb{R} \mapsto \mathbb{R}$ sampled at $n$ equi-spaced locations $x_i$, i.e. $u_i = f(x_i)$ and $v_i = g(x_i)$.
Let us assume that the underlying continuous object satisfies the Poisson equation: $\frac{d^2}{dx^2} f(x) = g(x)$, with constraints $f(x_j) = y_j$ for a subset of $m$ locations $j \in \{i_1, \ldots, i_m\}$.
We will assume that **all the $i_k$ are distinct**.
This equation governs a number of physical principles, e.g. gravity, fluid mechanics and electrostatics. In the latter we have $\Delta \varphi = -\frac{\rho}{\varepsilon}$, where $\Delta$ is the Laplacian operator $(\frac{d^2}{dx^2} + \frac{d^2}{dy^2} + \frac{d^2}{dz^2})$, $\varphi$ is the electric potential, $\rho$ the density of electric charges and $\varepsilon$ the electric permittivity.
---
%% Cell type:markdown id:06ac5cfa-62ef-4477-a957-e240ddeb85d8 tags:
### 1D case
%% Cell type:code id:7f3cb74d-72f0-460b-a978-a47f87de8557 tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:9fd89d0b-50df-4711-a32b-ec1026e6b122 tags:
<!-- BEGIN QUESTION -->
1. Write down a matrix equation for the discrete version of $\frac{d^2}{dx^2} f(x) = g(x)$, using the finite-difference approximation of the second derivative $\frac{d^2}{dx^2} f(x_k) \approx f(x_{k+1}) - 2f(x_k) + f(x_{k-1})$. For the boundary conditions we will assume that $f(x_{-1})=f(x_0)$ and $f(x_n)=f(x_{n-1})$, i.e. the signal is extended by a constant at each end (a zero-derivative, Neumann-type boundary condition).
What is the rank of the corresponding matrix $D$?
%% Cell type:markdown id:d1f8bd46 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:aed76fe2-8cb5-4b28-bd3b-fa0bea113a46 tags:
<!-- END QUESTION -->
%% Cell type:markdown id:cceac118-c43d-4782-9f7e-4392e5ecc451 tags:
2. Implement a function that creates the $D$ matrix (also called Laplacian). The [diag](https://numpy.org/doc/stable/reference/generated/numpy.diag.html) function in numpy might be useful.
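As an illustration of `np.diag`, a generic tridiagonal matrix can be assembled as follows (a sketch; the boundary rows still need adjusting to match the boundary conditions above):
%% Cell type:code id:tridiag-demo tags:

``` python
N_demo = 5
T_demo = (np.diag(-2.0 * np.ones(N_demo))
          + np.diag(np.ones(N_demo - 1), k=1)    # superdiagonal
          + np.diag(np.ones(N_demo - 1), k=-1))  # subdiagonal
```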
%% Cell type:code id:d41105d8-6933-4ef8-b2f0-c6ac053e1f55 tags:otter_answer_cell
``` python
def lapl_matrix(N):
...
```
%% Cell type:code id:756d9370 tags:
``` python
grader.check("q2")
```
%% Cell type:markdown id:4cea41dc-6f12-42f2-8644-b51e4918504d tags:
<!-- BEGIN QUESTION -->
3. Write down a matrix equation for the discrete version of $f(x_j) = y_j$. What is the rank of the corresponding matrix $B$ ?
%% Cell type:markdown id:fc67bfcb tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:418bc83f-7d46-41c2-8a7e-df892857320c tags:
<!-- END QUESTION -->
4. Implement a function that creates matrix $B$
%% Cell type:code id:1d294eda-7edc-42c2-9640-13bec779bd48 tags:otter_answer_cell
``` python
def b_matrix(N, idx_list): # idx_list is a list of valid indices, e.g. N=5, idx_list=[1,3]
...
```
%% Cell type:code id:d592616d tags:
``` python
grader.check("q4")
```
%% Cell type:markdown id:b77f215e-e4d0-4976-9ed6-7ee210c689a6 tags:
<!-- BEGIN QUESTION -->
5. Write down a matrix equation for the full problem (hint: consider the matrix $C=\begin{pmatrix}D\\B\end{pmatrix}$). Discuss the rank of this matrix and deduce a way to numerically solve the linear system. Implement a function that returns the matrix $C$.
%% Cell type:markdown id:70f5637a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:2eaf5a8c-b83a-4f53-9c4e-6192bf4e6fa4 tags:otter_answer_cell
``` python
def c_matrix(D, B):
...
```
%% Cell type:code id:d77116d4 tags:
``` python
grader.check("q4")
```
%% Cell type:markdown id:10a3d136-f5f3-4c1b-9c0a-ece58f891dfe tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
6. What explicit formula can you use to compute the pseudo-inverse of $C$ (justify)?
%% Cell type:markdown id:ce8aa19a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:3726eb03-0c82-4aed-b179-77ff947583c3 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
7. Implement a function that returns the solution of the problem. You can either use the formula above (in which case [linalg.inv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html) may be useful) or compute the pseudo-inverse using [linalg.pinv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html).
%% Cell type:code id:79f8c6d1-7b92-48b6-bb90-e888aaa41243 tags:otter_answer_cell
``` python
# v is a vector of size N
# u is a vector of size len(idx_list)
def solve_poisson(N, v, idx_list, u):
...
```
%% Cell type:markdown id:ff6184e1-e308-47d1-8b72-ab4559b48adb tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
8. Let us now consider the solutions in the particular case $g(x) = 0, \forall x\in\mathbb{R}$. What are the analytical solutions of the Poisson equation in this case?
%% Cell type:markdown id:53e63141 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:4a33db39-077d-4775-a55d-ae5bd7400b23 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
Let us verify that our implementation of the solution works with a small example. Fill in the values for $u$ s.t. they belong to the solution space.
%% Cell type:code id:96381701-e737-41c1-8642-fa1a5c13924c tags:otter_answer_cell
``` python
N = 50
v = np.zeros(N)
idx_list = [10, 20, 30, 40]
u = ...
sol = solve_poisson(N, v, idx_list, u)
plt.plot(sol) # plot the result
```
%% Cell type:markdown id:04e6fc55-6c8a-4dfa-8ca0-9b45ed2f157c tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
9. Do the results conform to what we expected? What happens near the boundaries?
%% Cell type:markdown id:e1c9c95f tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:ecf46b87-accd-4043-8ab6-06835830699b tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
10. Let us use a step function for $f$ and try to solve the system
%% Cell type:code id:d81b4501-b2a5-4412-a266-d04ffa8dd2c7 tags:
``` python
idx_list = [10, 20, 30, 40]
u2 = [1, 1, -1, -1]
sol2 = solve_poisson(N, v, idx_list, u2)
plt.scatter(idx_list, u2, marker='+', color='r')
plt.plot(sol2)
```
%% Cell type:markdown id:4918d31c-40a2-4df3-88ec-d18a80e879e7 tags:
What do you observe and what is the reason for that?
%% Cell type:markdown id:1482c041 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:b520b2c6-7c84-4c3e-be16-8eb866648a97 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
11. Let us now consider the application of the Poisson equation to electrostatics. Knowing a distribution of electric charges, we can solve it to compute the electric potential. The charge distribution here will be very simple, consisting of two point charges.
%% Cell type:code id:96932aa1-74aa-4f4e-bbf5-df0f0d85c47c tags:
``` python
N = 50
v3 = np.zeros(N)
v3[10] = 1
v3[40] = 1
plt.plot(v3)
```
%% Cell type:markdown id:42861911-53e1-4cb3-8dc7-aa617c01a7e8 tags:
- What are the analytical solutions for this problem? (Remember that the discrete derivative of a step function is a Dirac delta, i.e. 0 everywhere except at one point where it is 1.)
- Plot the analytical solution
- Compute and plot the numerical solution
- What do you observe ?
%% Cell type:markdown id:1b19e6d2 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:f42fc23b-c412-4791-b5b6-bef54fa3bab3 tags:otter_answer_cell
``` python
...
```
%% Cell type:code id:f2f9bd66-067e-4590-8088-87b7969621bb tags:otter_answer_cell
``` python
# Plot analytical solution
...
```
%% Cell type:code id:22eae7ab-4c91-4000-b123-7105cc4c023a tags:otter_answer_cell
``` python
idx_list = [10, 20, 30, 40]
u3 = [1, 1, 1, 1]
sol3 = solve_poisson(N, v3, idx_list, u3)
plt.scatter(idx_list, u3, marker='+', color='r')
plt.plot(sol3)
```
%% Cell type:code id:b0310f39-9bbf-4518-be7d-af0e3f6724a7 tags:otter_answer_cell
``` python
# check the 2nd derivative of the solution computed
...
```
%% Cell type:markdown id:c0f0755f tags:
<!-- END QUESTION -->
%% Cell type:code id:e3f2b37e tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Recursive least squares.ipynb")
```
%% Cell type:markdown id:eb88859c-9077-43c0-868b-e5f45289ff06 tags:
# Matrix Analysis 2025 - EE312
## Week 6 - Recursive least squares
[N. Aspert](https://people.epfl.ch/nicolas.aspert) - [LTS2](https://lts2.epfl.ch)
## Important
- The unit tests are here to help you; however, they might pass even if your implementation is faulty (and conversely fail even if it is correct), due to platform/environment variability. Feel free to ignore them (after carefully reviewing your answers/results).
## Objective
The goal of this notebook is to find the parameters of a system using least squares, and more specifically use a recursive formulation to update those parameters with a reduced amount of computations.
%% Cell type:code id:f4ab0d05-aa5a-4dc7-b2b5-b04ad6dd426e tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:d1a159ae-f3e6-4629-bf5b-6d93af1b0b68 tags:
## Part I - Least-square parameter estimation
In this notebook, we will estimate physical parameters of a moving system. Its $(x,y)$ coordinates are sampled at regular time intervals, but are noisy (e.g. due to measurement artefacts). You can load them from a file using the code below.
%% Cell type:code id:68883575-5b43-4ae7-8e23-13cdab9b2fa4 tags:
``` python
data = np.load('data.npz')
```
%% Cell type:markdown id:b82107e6-9019-4598-9e9a-9b123fba4e58 tags:
Once loaded, the `data` variable is a dictionary containing 3 arrays:
- `time` for the time values corresponding to each sample
- `tr1` for the first system trajectory
- `tr2` for the second system trajectory
The trajectories are stored as complex numbers so that a single value represents both the $x$ and $y$ components. In the first part of the problem we will only use the data from `tr1`.
%% Cell type:code id:141016a9-b347-4c7b-b76e-13af8ade65e3 tags:
``` python
plt.scatter(np.real(data['tr1']), np.imag(data['tr1']), marker='+')
plt.xlabel('x')
plt.ylabel('y')
```
%% Cell type:markdown id:575ce09b-fcbd-4442-a8e7-dc2d5e587226 tags:
<!-- BEGIN QUESTION -->
Let us now build the model of this system. In this part we will assume that the system is defined by 3 parameters:
- its initial position $p_0$
- its initial velocity $v_0$
- its acceleration $a$ which is constant
Those parameters will be represented as complex values (in order to take into account their $x$ and $y$ components)
1. Express the position of the system $p(t)$ as a function of $t$ and of the 3 above parameters.
%% Cell type:markdown id:2857a8f4 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:5e93d1f2-82a7-4e1b-b543-d89dd38c1c69 tags:
<!-- END QUESTION -->
Since we have the whole trajectory of the system and given the relatively small number of samples, we can now estimate directly the 3 parameters using a least-square approximation.
2. Using the previous question, write a function that generates the matrix $A$ s.t.
$$
A\begin{pmatrix}p_0 \\ v_0 \\ a\end{pmatrix} = \begin{pmatrix}p(t_0) \\ p(t_1) \\ \vdots \\ p(t_{N-1}) \end{pmatrix}
$$
%% Cell type:code id:fbeb5b6b-77ee-4b03-8871-81be21c5cf01 tags:otter_answer_cell
``` python
def computeA(t):
"""
Computes the A matrix
Parameters
----------
t : vector containing values of time samples
Returns
-------
The A matrix
"""
...
```
%% Cell type:code id:f62efb3c tags:
``` python
grader.check("q2")
```
%% Cell type:markdown id:9903f3fe-bd77-4e83-880f-5fdddbd450a6 tags:
3. Estimate the parameters of the system using a least square approximation
%% Cell type:code id:a8623196-a2cd-47ef-b2c6-30804172868d tags:otter_answer_cell
``` python
def solve_lsq(t, pos):
"""
Solve the system Ax = pos, where x is the vector (p_0, v_0, a) and pos contains the position values at time t
Parameters
----------
t : vector containing values of time samples
pos: vector containing the positions of the system corresponding to the values of t (complex)
Returns
-------
The vector x containing the approximation of the system's parameters in the least-square sense.
"""
...
```
%% Cell type:code id:8182465f tags:
``` python
grader.check("q3")
```
%% Cell type:markdown id:f7edef23-f5d4-4c59-b4c8-ef57ad9d6b8b tags:
<!-- BEGIN QUESTION -->
4. Display the measured trajectory `tr1` and the one reconstructed from the parameters you estimated. If your implementation is correct, they should match. Also plot the Euclidean distance between the measured and reconstructed points for each timestep.
%% Cell type:markdown id:c1867853 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:18e792a6-6104-422d-a7b3-c7b53a56b32b tags:otter_answer_cell
``` python
# Plot the observed and predicted data
...
```
%% Cell type:code id:fa59fa59-f036-4906-87d3-52e558e7685c tags:otter_answer_cell
``` python
# Plot the distance between each point
...
```
%% Cell type:markdown id:832f6e06-46c1-4ac9-a01f-6acb138560df tags:
<!-- END QUESTION -->
## Part II - Recursive least-square estimation
The problem addressed in the first part was fairly easy: we had all measurements available at once (which is fine in an "offline" setting), and computing the least-square solution did not require inverting a huge matrix. In a more realistic setting, the measurements would arrive gradually, and it would be nice to update our parameters without recomputing the whole solution at each step. This is where recursive least squares comes into play.
%% Cell type:markdown id:f51903c3-e84d-4352-a4ea-e568bdc3aead tags:
<!-- BEGIN QUESTION -->
We first need to prove a classical result. Let us consider four matrices $A,B,C$ and $D$ such that:
- $A$ is an invertible $N\times N$ matrix
- $B$ is a $N\times M$ matrix
- $C$ is $M\times N$ matrix
- $D$ is an invertible $M\times M$ matrix
5. Prove that $(A + BDC)^{-1} = A^{-1} - A^{-1}B(D^{-1}+CA^{-1}B)^{-1}CA^{-1}$
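Before proving it, you can sanity-check the identity numerically (a minimal sketch with random, diagonally-shifted matrices so that the inverses exist):
%% Cell type:code id:woodbury-check tags:

``` python
# numerical sanity check of the identity (sketch; names here are illustrative)
rng = np.random.default_rng(0)
Nd, Md = 6, 3
A = rng.normal(size=(Nd, Nd)) + Nd * np.eye(Nd)   # shifted to be (almost surely) invertible
B = rng.normal(size=(Nd, Md))
C = rng.normal(size=(Md, Nd))
D = rng.normal(size=(Md, Md)) + Md * np.eye(Md)
lhs = np.linalg.inv(A + B @ D @ C)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(D) + C @ Ai @ B) @ C @ Ai
np.testing.assert_array_almost_equal(lhs, rhs)
```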
%% Cell type:markdown id:be98a834 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:6768f75a-e59d-42f1-99b7-9c16150ec2a0 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
Let us consider the least-square problem $Ax=y$, and suppose we already have received a small (but still keeping the system over-determined) number of measurements $y_0$ and computed $\hat{x}_0$ that is the least-square solution of $A_0\hat{x}_0=y_0$. Let us assume we have already computed $Q_0=(A_0^TA_0)^{-1}$.
Now another set of measurements $y_1$ arrives, and the system you want to solve looks like
$$
\begin{pmatrix}A_0\\A_1\end{pmatrix}x = \begin{pmatrix}y_0\\y_1\end{pmatrix}
$$
6. Using this block matrix formulation, rewrite the normal equations and express the updated solution $\hat{x}_1$ using $A_0$, $A_1$, $Q_0^{-1}$, $\hat{x}_0$, and $y_1$.
Let us denote $Q_1 = (A_0^TA_0 + A_1^TA_1)^{-1}$. Express $\hat{x}_1$ as a function of $Q_1$, $A_1$, $y_1$ and $\hat{x}_0$.
%% Cell type:markdown id:c3516e6e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:58d88136-aa54-4839-af16-672638af3087 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
7. Using the result shown in question 5, express $Q_1$ as a function of $Q_0$, $A_1$ and $U=Q_0A_1^T$.
Hints:
- you might consider $Q_1^{-1}$
- what about the symmetry of $Q_0$?
Why is it interesting to update the system's parameters by computing $Q_1$ this way?
%% Cell type:markdown id:4368082f tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:29107f5e-92d0-4790-b235-8e1362cd8250 tags:
<!-- END QUESTION -->
8. Using the results of questions 6 and 7, implement a function that computes the updated parameters for a new set of measurements, and complete the `recursive_lsq` implementation.
%% Cell type:code id:bab14427-11c5-47b9-99b1-04e99fb92571 tags:otter_answer_cell
``` python
def update_lsq(Q0, xhat0, meas_time, y1):
"""
Compute the updated parameters xhat1 from Q0, y1, xhat0
Parameters
----------
Q0 : matrix Q0
xhat0: previously computed parameters of the system
meas_time: vector containing the times at which samples were acquired
y1: vector containing the new positions of the system corresponding to the values of meas_time
Returns
-------
xhat1 and Q1
"""
...
```
%% Cell type:code id:371b7525-b48b-43b8-a636-180cbd2e2566 tags:otter_answer_cell
``` python
def recursive_lsq(t, pos, batch_size):
"""
Perform a recursive LSQ fit on input data split into batches
Parameters
----------
t: vector containing values of time samples
pos: vector containing the positions of the system corresponding to the values of t (complex)
batch_size: size of each batch of measurements
Returns
-------
array containing model parameters estimated after each batch
"""
if batch_size < 2:
raise ValueError("Invalid batch size")
nsteps = len(t)//batch_size
# initialization step
A0 =computeA(t[:batch_size])
Q0 = np.linalg.inv(A0.T@A0)
xhat0 = ...
xh_cur = xhat0
Q_cur = Q0
x_h = [xhat0] # store the history of parameters in a list
for k in np.arange(1, nsteps):
cur_time = t[k*batch_size:(k+1)*batch_size]
cur_pos = pos[k*batch_size:(k+1)*batch_size]
xhat1, Q1 = ...
x_h.append(xhat1)
xh_cur = xhat1
Q_cur = Q1
return np.array(x_h)
```
%% Cell type:code id:ebdfb081 tags:
``` python
grader.check("q8")
```
%% Cell type:markdown id:a9f907da-74a3-4d1c-9892-8d2b9fa9c69f tags:
<!-- BEGIN QUESTION -->
9. Complete the `predict_trajectory` function and, using the `tr1` data:
- display the predicted trajectory of the system when using recursive least squares
- display the distance between the predicted and observed trajectories
Make sure to choose a suitable batch size.
%% Cell type:code id:7cbfe794-5aa1-4e12-a13c-3af377dfa828 tags:otter_answer_cell
``` python
def predict_trajectory(x_h, t, pos, batch_size):
pos_pred = np.zeros(pos.shape, dtype='complex')
# Initialization
pos_pred[:batch_size] = pos[:batch_size]
for k in np.arange(1, len(x_h)):
A1 = computeA(t[k*batch_size:(k+1)*batch_size])
pos_pred[k*batch_size:(k + 1)*batch_size] = ...
return pos_pred
```
%% Cell type:code id:97513eaf-ef5a-43c5-86f9-29cbc9bbd4de tags:otter_answer_cell
``` python
batch_size = ...
```
%% Cell type:code id:2cfd4110-2ee6-4e5e-bfdc-e6155745e19e tags:otter_answer_cell
``` python
x_h = ...
```
%% Cell type:code id:5daa05c1-23bf-4f00-a7d4-9849b6ef27f6 tags:otter_answer_cell
``` python
pred_traj = ...
```
%% Cell type:code id:e1961a97-4a16-4d47-8f0e-df2e106a68a6 tags:otter_answer_cell
``` python
# Plot the predicted and observed data
...
```
%% Cell type:code id:11ec5c11-9ee2-49e0-b4d3-507629f2f71d tags:otter_answer_cell
``` python
# Plot the error between prediction and observation
...
```
%% Cell type:markdown id:31ef60cf-1277-401a-b8d3-32b8e50e5797 tags:
<!-- END QUESTION -->
## Part III - Recursive least square with forgetting factor
%% Cell type:markdown id:528dc1bb-b87b-4af9-8247-c718a74fbf7a tags:
<!-- BEGIN QUESTION -->
10. Use the recursive least squares on the `tr2` trajectory, whose acceleration varies across time. Display the predicted trajectory and the approximation error. Is this what you expected?
%% Cell type:markdown id:31abd7d1 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:e39bedb4-7c88-4398-a41b-26c8b815640c tags:otter_answer_cell
``` python
x2_h = ...
```
%% Cell type:code id:341e47da-1516-4a69-a90f-58e227297e10 tags:otter_answer_cell
``` python
pred_traj2 = ...
```
%% Cell type:code id:ebe4b99f-9f32-47b4-9af6-3086608358f4 tags:otter_answer_cell
``` python
# Plot the predicted and observed data
...
```
%% Cell type:code id:d564e0b0-73f5-4315-b5ea-0986822e1388 tags:otter_answer_cell
``` python
# Plot the error between prediction and observation
...
```
%% Cell type:markdown id:70180a33-f723-4ccd-8f09-6c772941865a tags:
<!-- END QUESTION -->
In order to solve this issue, we could build a more complex model. However, making the assumption that the acceleration changes infrequently, we can instead update the recursive least squares in a way that gives higher importance to recent measurements with respect to older ones. In order to do so, we will introduce a **forgetting factor** $\alpha$, with $0<\alpha<1$.
We will use it as follows:
when another set of measurements $y_1$ arrives, the system defined in question 6 is rewritten as
$$
\begin{pmatrix}\alpha A_0\\A_1\end{pmatrix}x = \begin{pmatrix}\alpha y_0\\y_1\end{pmatrix}
$$
%% Cell type:markdown id:6275538d-4edd-42d1-aa15-312936c217d9 tags:
<!-- BEGIN QUESTION -->
11. Using the new "forgetful" system defined above, update the expressions for $Q_1$ and $\hat{x}_1$ derived in questions 6 and 7.
Hint:
- You might want to introduce $U_\alpha = \frac{1}{\alpha} Q_0A_1^T$.
%% Cell type:markdown id:4d5f71ee tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:bc7bc015-18a8-4059-a2cb-cc680f199b1e tags:
<!-- END QUESTION -->
12. Using the results found in the previous question, implement the recursive least squares with a forgetting factor.
%% Cell type:code id:6476b120-1e67-43b7-80fa-42eff433fb46 tags:otter_answer_cell
``` python
def update_lsq_ff(Q0, xhat0, meas_time, y1, ff):
"""
Compute the updated parameters xhat1 from Q0, y1, xhat0, ff
Parameters
----------
Q0 : matrix Q0
xhat0: previously computed parameters of the system
meas_time: vector containing the times at which samples were acquired
y1: vector containing the new positions of the system corresponding to the values of meas_time
ff: forgetting factor
Returns
-------
xhat1 and Q1
"""
...
```
%% Cell type:code id:70c04c51-59d8-4742-8bba-6c5ebe639c5f tags:otter_answer_cell
``` python
def recursive_lsq_ff(t, pos, batch_size, ff):
"""
Perform a recursive LSQ fit on input data split into batches
Parameters
----------
t: vector containing values of time samples
pos: vector containing the positions of the system corresponding to the values of t (complex)
batch_size: size of each batch of measurements
ff: forgetting factor (in practice, keep it close to 1)
Returns
-------
array containing model parameters estimated after each batch
"""
if batch_size < 2:
raise ValueError("Invalid batch size")
nsteps = len(t)//batch_size
# initialization step
A0 = computeA(t[:batch_size])
Q0 = np.linalg.inv(A0.T@A0)
xhat0 = ...
xh_cur = xhat0
Q_cur = Q0
x_h = [xhat0] # store the history of parameters in a list
for k in np.arange(1, nsteps):
cur_time = t[k*batch_size:(k+1)*batch_size]
cur_pos = pos[k*batch_size:(k+1)*batch_size]
xhat1, Q1 = ...
x_h.append(xhat1)
xh_cur = xhat1
Q_cur = Q1
return np.array(x_h)
```
%% Cell type:code id:cf21410e tags:
``` python
grader.check("q12")
```
%% Cell type:markdown id:fb61e06c-cf61-41c0-88c1-2d49c58870ca tags:
<!-- BEGIN QUESTION -->
13. Use the functions defined in question 12 to predict the `tr2` trajectory. Adjust the batch size and the forgetting factor (hint: keep it close to 1) to improve the results. Display the predicted and observed trajectories, as well as the evolution of the prediction error across time.
%% Cell type:markdown id:d01eed91 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:0de3bc00-2da1-4758-9158-829d3724322a tags:otter_answer_cell
``` python
ff = ...
```
%% Cell type:code id:6adab39d-b558-4b7f-a4d2-9ef44350f312 tags:otter_answer_cell
``` python
batch_size = ...
```
%% Cell type:code id:b421fd4d-d7ca-4b7b-8aa5-09280ed01322 tags:otter_answer_cell
``` python
x2_h_ff = ...
```
%% Cell type:code id:7f4204d6-6da3-4652-a491-aaf95bc6e455 tags:otter_answer_cell
``` python
pred_traj2_ff = ...
```
%% Cell type:code id:31cd1cdc-0b4e-4563-a2cd-4a0b1874d4cb tags:otter_answer_cell
``` python
# Plot the predicted and observed data
...
```
%% Cell type:code id:e8414ea3-f39e-4ede-a690-4e3d94ce6b29 tags:otter_answer_cell
``` python
# Plot the error between prediction and observation
...
```
%% Cell type:markdown id:a84ca578 tags:
<!-- END QUESTION -->
%% Cell type:code id:bd39160c tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("RBF networks.ipynb")
```
%% Cell type:markdown id:b62d1e09-41c3-40d0-a763-e8231d437183 tags:
# Matrix Analysis 2025 - EE312
## Week 8 - Image classification with Radial Basis Function (RBF) networks
[LTS2](https://lts2.epfl.ch)
%% Cell type:code id:nasty-access tags:
``` python
import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import pairwise_distances
from scipy.spatial import distance_matrix
from scipy.special import softmax
import requests
from pathlib import Path
```
%% Cell type:markdown id:vietnamese-basin tags:
## 1. Image Classification
In this exercise, we will be doing image classification with a simple neural network. For simplicity, let's assume we will be working with black and white images.
Given an input image $i$ represented as a vector of pixel intensities $ \mathbf{x}_i \in [0,1]^d$, we want to predict its correct label $\mathbf{y}_i$, which is represented as a one-hot vector in $\{0,1\}^K$, where $K$ is the number of possible categories (classes) that the image may belong to. For example, we may have pictures of cats and dogs, and our goal would be to correctly tag those images as either cat or dog. In that case we would have $K=2$, and the vectors $\begin{pmatrix}0 \\ 1\end{pmatrix}$ and $\begin{pmatrix}1 \\ 0\end{pmatrix}$ to represent the classes of cat and dog.
In today's example we will be using the MNIST handwritten digit dataset. It contains images of handwritten numbers from 0 to 9 and our goal is to create a model that can accurately tag each image with its number. Let's load the data first.
%% Cell type:code id:exact-cycling tags:
``` python
### Load the data
# Download data if needed
if not Path("./mnist_data.npz").is_file():
r = requests.get('https://os.unil.cloud.switch.ch/swift/v1/lts2-ee312/mnist_data.npz', allow_redirects=True)
with open('mnist_data.npz', 'wb') as f: # save locally
f.write(r.content)
mnist = np.load('mnist_data.npz')
```
%% Cell type:markdown id:varying-taylor tags:
In the context of classification, neural networks are models that, given one (or multiple) input data points, produce as output a set of corresponding labels for each input. The model itself consists of parametric functions $g_i$ which can be applied sequentially to the input data, resulting in a set of labels which are the model's prediction for the data. For example, in a model that consists of two parametric functions $g_1$ and $g_2$, for a given $\mathbf{x}_i$, we have the predicted label $\hat{\mathbf{y}}_i = g_2(g_1(\mathbf{x}_i))$. The parametric functions are commonly called "layers".
In a standard image classification setup, we are given some training data which we can use to tune the parameters of the parametric functions $g_i$ in order to improve the model's ability to predict labels correctly. The parameters are generally tuned with respect to some objective (commonly called a loss function). We want to find the parameters of the model that minimize this loss function. Various loss functions can be used, but in general they tend to encode how "wrong" the model is. For
example, on a given image $i$ one can use the loss $\mathcal{L}(\hat{\mathbf{y}}_i, \mathbf{y}_i)= \sum_{j=1}^{K}(\hat{y}_{ij} - y_{ij})^2$, which is the squared difference between the vector coordinates of the predicted label and those of the actual label $\mathbf{y}_i$.
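For instance, a quick numeric illustration of this per-image loss (a sketch with made-up values):
%% Cell type:code id:loss-demo tags:

``` python
y_hat = np.array([0.8, 0.2])      # predicted label vector
y = np.array([1.0, 0.0])          # one-hot ground-truth label
loss = np.sum((y_hat - y) ** 2)   # (0.8-1)^2 + (0.2-0)^2 = 0.08
```
%% Cell type:markdown id:training-goal tags: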
Minimizing the loss over the whole training set is referred to as "training the model". Furthermore, the goal is that, given new data the model has not seen or been trained on, it will still be able to classify accurately.
Before we go into the details of the model and how we will train it, let's prepare the data.
%% Cell type:code id:e2140162-4414-47c2-ae99-d616300a1d65 tags:
``` python
# Preprocess the data
images = mnist['data']
num_images = images.shape[0]
train_set_size = 60000
test_set_size = 10000
train_images = images[:train_set_size]
train_images = train_images/255.
train_images = train_images
test_images = images[-test_set_size:]
test_images = test_images/255.
test_images = test_images
#create one-hot encodings of labels
mnist_target = mnist['target']
num_classes = mnist_target.max()+1
labels = []
for k in range(num_images):
one_hot = np.zeros(num_classes)
one_hot[int(mnist_target[k])]=1
labels += [one_hot]
labels = np.array(labels)
#labels in one-hot
train_labels = labels[:train_set_size]
test_labels = labels[-test_set_size:]
#labels in integer form
int_labels = np.array(mnist_target, dtype=int)
int_labels_train = int_labels[:train_set_size]
int_labels_test = int_labels[-test_set_size:]
```
%% Cell type:code id:00b3e353-a254-4cb5-975d-0741a50576c7 tags:
``` python
# View an image to make sure everything went well
which_one = 5
plt.imshow(train_images[which_one].reshape((28,28)));
```
%% Cell type:markdown id:textile-tower tags:
## 2. Radial Basis Function (RBF) networks
%% Cell type:markdown id:acquired-malaysia tags:
For our task, we will be using Radial Basis Function (RBF) Networks as our neural network model.
The pipeline, which is presented in the image below, consists of two layers. The first employs non-linear functions $g_1(\mathbf{x};\boldsymbol{\mu}): \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times c}$.
The second is a linear layer, represented by a matrix of weights $\mathbf{W} \in \mathbb{R}^{c \times K}$, which maps the output of the previous layer to class scores; its role is to predict labels.
The pipeline proceeds in the following steps:
i) Choose a set of $c$ points $\boldsymbol{\mu}_j\in [0,1]^d$.
ii) Compute $g_1(\mathbf{x}_i;\boldsymbol{\mu}_j) = e^{-\frac{\|\mathbf{x}_i-\boldsymbol{\mu}_j\|^2}{\sigma^2}}=a_{ij}$ for all possible pairs of $i$ and $j$. Here $\sigma$ is a hyperparameter that controls the width of the Gaussian.
iii) Compute the predicted labels $g_2(\mathbf{a}_i)= \mathbf{a}_i^{\top}\mathbf{W}= \hat{\mathbf{y}}_i$. Here $\mathbf{a}_i \in \mathbb{R}^c$ are the outputs of the layer $g_1$ for an input image $i$. $\hat{\mathbf{y}}_i$ is a row vector and $\hat{y}_{ij} = \sum_{m=1}^{c}a_{im}w_{mj}$, $j\in\{1,...,K\}$.
![RBF_NN.png](images/RBF_NN.png)
%% Cell type:markdown id:f762f16e-4a88-43a8-813f-c93c62407c72 tags:
Intuitively, the first layer of the RBF network can be viewed as matching the input data against a set of prototypes (templates) through a Gaussian whose width is determined by $\sigma$. The second layer computes a weighted combination of the matching scores of the previous layer to determine the predicted label for a given point.
%% Cell type:markdown id:126fdb8e-9607-4f93-9726-692e5ed8bb91 tags:
**1.** For hyperparameters $c$ and $\sigma$ of your choice, select $c$ prototypes and obtain the output of the first layer of the RBF network. The prototypes can simply be random images from your training set.
The following functions might be helpful:
- [pairwise_distances (from scikit-learn)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)
- [random.choice (from numpy)](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html)
You can (optionally) perform an additional normalization step on the activations using the [softmax](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html) function.
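A minimal illustration of `pairwise_distances` on toy data (a sketch, not the full answer):
%% Cell type:code id:pairwise-demo tags:

``` python
X_demo = np.array([[0.0, 0.0], [1.0, 0.0]])              # 2 points in R^2
C_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 candidate centers
pairwise_distances(X_demo, C_demo)   # shape (2, 3); entry (i, j) = ||X_demo[i] - C_demo[j]||
```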
%% Cell type:code id:0e4ab127-352e-49c3-9b72-9e69bfc8b4ba tags:otter_answer_cell
``` python
def get_rand_centers(num_centers, imgs):
"""
Sample num_centers (randomly) from imgs
Parameters
----------
num_centers : number of samples
imgs : matrix to sample rows from
Returns
-------
The samples matrix
"""
...
def get_activations(imgs, rand_centers, sigma, softmax_norm=False):
"""
Computes the activations of the images vs. the sample centers
Parameters
----------
imgs: images matrix to compute activations for
rand_centers: matrix of centers points
sigma: parameter of the gaussian kernel
softmax_norm: if true, perform softmax activation on the activations
Returns
-------
The activation matrix A
"""
...
```
%% Cell type:code id:19e1cca2-bb2c-40c7-b64a-e14c3b42e54d tags:otter_answer_cell
``` python
#pick random centers
number_of_centers = 200
rand_centers = get_rand_centers(number_of_centers, train_images)
sigma = 10
activations = get_activations(train_images, rand_centers, sigma)
```
%% Cell type:code id:6bec41d9 tags:
``` python
grader.check("q1")
```
%% Cell type:markdown id:institutional-thompson tags:
## 3. Training the network
%% Cell type:markdown id:disciplinary-present tags:
To make things easier, we will fix the parameters $\boldsymbol{\mu}$ and $\sigma$ of the network, i.e., we decide their values beforehand and they remain constant throughout training and testing of the model. Therefore, the only trainable parameters are going to be the weights of the second layer.
To train the model, we are going to use the mean squared loss function that we mentioned earlier. For a training dataset with $n$ images we have
$$ \mathcal{L}(\text{training data}, \text{training labels}) = \frac{1}{2n}\sum_{i=1}^n\mathcal{L}(\hat{\mathbf{y}}_i,\mathbf{y}_i) = \frac{1}{2n}\sum_{i=1}^n \|\hat{\mathbf{y}}_{i} - \mathbf{y}_{i}\|^2.$$
There are two ways of tuning those:
i) Backpropagation.
ii) Solve a linear system.
%% Cell type:markdown id:f0ef20c5-0cc0-4a7d-a80b-8acfa207f82e tags:
### 3.1 Training with backpropagation
Backpropagation depends on [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent#Description). The goal is to update the trainable parameters of the network by "moving them" in the direction that will decrease the loss function.
In our case, the weights $w_{kl}$ are updated in the following manner
$$ w_{kl}' = w_{kl}- \gamma \frac{\partial\mathcal{L}(\text{training data}, \text{training labels})}{\partial w_{kl}}, $$
where $\gamma$ is a hyper-parameter called the learning rate. The gradient of the loss points in the direction of steepest ascent, hence we update the weights of the network in the opposite direction, following the steepest descent.
---
%% Cell type:markdown id:189e8b8e-a909-4f9e-950a-e0f65a48d70c tags:
<!-- BEGIN QUESTION -->
2. For the mean squared error loss, what is the gradient of the loss with respect to the weights $w_{kl}$ of the network?
%% Cell type:markdown id:0df8dee2 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:injured-mother tags:
<!-- END QUESTION -->
3. Train the weights of the linear layer using stochastic gradient descent. For $p$ iterations (called epochs), you have to update each weight $w_{kl}$ of the network once per image, by computing the gradient of the loss with respect to that weight.
NB: if you implement the gradient computation naively, it might be very slow. Consider using [numpy.outer](https://numpy.org/doc/stable/reference/generated/numpy.outer.html) to speed up the computation (see the quick illustration below).
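A quick illustration of `np.outer` (a sketch):
%% Cell type:code id:outer-demo tags:

``` python
u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0, 5.0])
np.outer(u, v)   # shape (2, 3): [[3., 4., 5.], [6., 8., 10.]]
```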
%% Cell type:code id:ddb1a6cc-c9d5-4561-8c0d-0f0c5b396ae1 tags:
``` python
# Helper function to compute the loss
def get_predictions_loss(activations, weights, labels, int_labels):
predictions = activations@weights
num_correct_predictions = ((predictions.argmax(1) - int_labels)==0).sum()
loss = ((predictions - labels)*(predictions - labels)).sum(1).mean()
return loss, num_correct_predictions
```
%% Cell type:code id:e365cb99-d2c3-4b98-a814-e5c5485f3967 tags:otter_answer_cell
``` python
# compute the gradient for a single input
def compute_gradient(activation, weights, train_label):
"""
Computes gradients of the weight for a single activation
Parameters
----------
activation : vector containing the activation of the current image
weights: current weights
train_label: label of the current image
Returns
-------
The gradient to update the weights
"""
...
```
%% Cell type:code id:46e31c54-3682-4124-8a08-f675fba974f7 tags:otter_answer_cell
``` python
# Initial values for hyperparams. Feel free to experiment with them.
weights = (1/28)*np.random.randn(number_of_centers, num_classes)
epochs = 5 # you should train for more epochs !
learning_rate = 0.1
#Backpropagation with SGD
for k in range(epochs):
for counter, activation in enumerate(activations):
gradient = ...
weights = ...
loss_train, num_correct_predictions_train = get_predictions_loss(activations, weights, train_labels, int_labels_train)
print("Loss:", loss_train)
print("Number of correct predictions:", num_correct_predictions_train)
```
%% Cell type:code id:957d63fb tags:
``` python
grader.check("q3")
```
%% Cell type:markdown id:criminal-imperial tags:
We can now check how well your network does on the test set and print its accuracy.
%% Cell type:code id:civil-riding tags:
``` python
def get_accuracy(predictions, int_labels, set_size):
num_correct_predictions = ((predictions.argmax(1) - int_labels)==0).sum()
return num_correct_predictions/set_size
test_activations = get_activations(test_images, rand_centers, sigma)
test_predictions = test_activations@weights
print(f"The accuracy on the test set is: {get_accuracy(test_predictions, int_labels_test, test_set_size)*100} %")
```
%% Cell type:markdown id:promising-funds tags:
### 3.2 Solving the linear system
%% Cell type:markdown id:f8234141-0eeb-49da-8d37-892a6416e41d tags:
Since we only have one weight matrix to tune, we can avoid learning with backpropagation entirely. Consider the mean squared error for the whole dataset and a one-dimensional binary label $y_i$ for each data point for simplicity. The mean squared loss for the dataset is
$$ \sum_{i=1}^n (\hat{y}_{i} - y_{i})^2= \|\mathbf{A}\mathbf{w} - \mathbf{y}\|^2.$$ Here $\mathbf{A} \in \mathbb{R}^{n \times c}$ is the matrix that contains the outputs (activations) of the first layer. From a linear algebra perspective, we are looking for a vector $\mathbf{w}$ that solves the linear system $ \mathbf{A}\mathbf{w} = \mathbf{y}.$
---
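As an illustration of this least-squares viewpoint, here is a minimal sketch on synthetic data (the dimensions, the random data and the use of `np.linalg.lstsq` are illustrative choices, not part of the exercise):
``` python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))   # plays the role of the activation matrix (n x c)
y = rng.standard_normal(100)         # plays the role of the labels
w, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A w - y||^2
print(np.linalg.norm(A @ w - y))
```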
%% Cell type:markdown id:4e691348-afe4-4021-a42b-3985a4e9999e tags:
<!-- BEGIN QUESTION -->
4. Can we find solutions to this system, and how? (Justify.)
%% Cell type:markdown id:cf9e153e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:subsequent-exercise tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
5. Based on your answer above, compute the weights of the neural network that best classify the data points of the training set.
%% Cell type:markdown id:6e7fb43a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:collected-renaissance tags:otter_answer_cell
``` python
#calculate the weights of the linear layer
weights_lsq = ...
```
%% Cell type:code id:chinese-foster tags:otter_answer_cell
``` python
#predict the labels of each image in the training set and compute the accuracy
train_prediction_lsq = ...
print(f"The accuracy on the training set is: {get_accuracy(train_prediction_lsq, int_labels_train, train_set_size)*100} %")
```
%% Cell type:code id:young-invitation tags:otter_answer_cell
``` python
#calculate the activations of the test set
test_activations = get_activations(test_images, rand_centers, sigma)
```
%% Cell type:code id:naked-dover tags:otter_answer_cell
``` python
#predict the accuracy on the test set
test_predictions_lsq = ...
print(f"The accuracy on the test set is: {get_accuracy(test_predictions_lsq, int_labels_test, test_set_size)*100} %")
```
%% Cell type:markdown id:handmade-warrant tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
### 6. **Open ended**: On the choice of templates.
Suggest a different or more refined way to select templates for the RBF network and implement it. Check how it compares with your original approach.
Check how it works with the backpropagation and linear system solutions.
%% Cell type:markdown id:4b05dc5e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:8734c27e tags:
<!-- END QUESTION -->
exercises/week07/images/RBF_NN.png
%% Cell type:code id:60552e92 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Linear transforms.ipynb")
```
%% Cell type:markdown id:73bbe3f2-6e0a-463e-abc6-01213d4e5811 tags:
# Matrix Analysis 2025 - EE312
## Week 2 - Linear transforms
[N. Aspert](https://people.epfl.ch/nicolas.aspert) - [LTS2](https://lts2.epfl.ch)
The first week notebook (introduction to Python, Numpy and Matplotlib) can be used as a help.
## Objective
The end goal is to understand a purely algebraic, matrix-based view of a few linear transforms. You will use those linear transforms to perform some basic time-frequency analysis of signals.
%% Cell type:code id:9e522683-69ce-458f-a78c-9780e07a865e tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:5901baaa-4eb1-446a-8c0b-88369a9e213e tags:
## Part I - Fourier
%% Cell type:markdown id:7e81811f-e9c9-4d57-bbb9-bee71b7df722 tags:
<!-- BEGIN QUESTION -->
1. Prove that any set of orthogonal vectors $v_i \in \mathbb{C}^N, \, i=1, \ldots , M \leq N$ such that $v_i^H v_j = C \delta_{i,j}$ is linearly independent (where $C$ is some constant).
%% Cell type:markdown id:b6087a12 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:5d9c0799-a145-4b31-b8a7-3dc6d9fcf6da tags:otter_assign_solution_cell
We have a set of orthogonal vectors $v_i \in \mathbb{C}^N,$ with $i=1, \ldots , M \leq N$.\
Therefore, $v_i^H v_j = C \delta_{i,j} =\left\{\begin{matrix}
C,& \text{if} & i=j\\
0,& \text{if} & i\neq j
\end{matrix}\right.$.<br/>
We want to prove that the set is linearly independent, in other words that, with $c_{j}$ some scalars,\
$\sum_{j = 1}^{M} c_{j}v_{j} = 0 \Leftrightarrow c_{j} = 0$ for all $j$.<br/><br/>
Suppose $\sum_{j = 1}^{M} c_{j}v_{j} = 0$. Multiplying on the left by $v_{i}^H$ gives\
$0 = v_{i}^H \Big(\sum_{j = 1}^{M}c_{j}v_{j}\Big) = \sum_{j = 1}^{M}c_{j}\,v_{i}^H v_{j}$.<br/><br/>
Since the set of vectors $v_i$ is orthogonal, $v_i^H v_j = C \delta_{i,j}$, so all the terms of the sum are equal to zero except the one where $j=i$:<br/>
$0 = \sum_{j = 1}^{M}c_{j}\,v_{i}^H v_{j} = c_{i}\,v_{i}^H v_{i} = c_{i}C$.<br/><br/>
Since $v_{i}$ is a non-zero vector, $C = v_{i}^H v_{i} = \|v_i\|^2 \neq 0$. Hence:\
$0 = c_{i}\,v_{i}^H v_{i}$ *if and only if* $c_{i} = 0$.<br/><br/>
This reasoning works for all $i=1, \ldots , M$.<br/><br/>
We can conclude that:\
$\sum_{j = 1}^{M} c_{j}v_{j} = 0 \Leftrightarrow c_{j} = 0$ for all $j$.<br/>
**The set of orthogonal vectors is linearly independent.**
%% Cell type:markdown id:accea561-acef-429f-aa68-cc002cba19b8 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
2. Compute $a_r = \sum_{n=0}^{N-1}e^{j2\pi r\frac{n}{N}}$, where $r$ is an integer (discuss the result depending on the value of $r$).
%% Cell type:markdown id:3f0ff7a0 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:6da31060-083f-4873-9ad3-bf33c4a62b84 tags:otter_assign_solution_cell
If $r\equiv 0 \mod N$, then $e^{j2\pi r\frac{n}{N}}=1$, and $a_r = \sum_{n = 0}^{N-1} 1 = N$.
If $r\not\equiv 0 \mod N$, this is the sum of a geometric series $a_r = \sum_{n = 0}^{N-1} b^{n} = \frac{1 - b^N}{1 - b}$, with $b=e^{\frac{2jr\pi}{N}}$.
Since $b^N = e^{2jr\pi} = 1$, we then have $a_r = 0$.
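A quick numerical check of this result (a sketch; the values of `N` and `r` are arbitrary):
``` python
import numpy as np

N = 8
for r in [0, 3, 8, 11]:
    a_r = np.exp(2j * np.pi * r * np.arange(N) / N).sum()
    print(r, np.round(a_r, 12))  # N when r is a multiple of N, ~0 otherwise
```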
%% Cell type:markdown id:b591574c-ba6d-4a15-8b42-0cb62508da73 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
3. Let $v_k \in \mathbb{C}^N$ be such that $v_k[n] = e^{-j 2 \pi \frac{kn}{N}}$, for $k,n = 0, \ldots , N-1$.
- Prove that these vectors are mutually orthogonal, hence linearly independent.
- Compute the norm of $v_k$.
%% Cell type:markdown id:d8031ee7 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:1cd4d461-705e-405d-a433-ee0d4caeb522 tags:otter_assign_solution_cell
$v_{k}$ has the form $\begin{pmatrix}
e^{0}\\
e^{-j 2 \pi \frac{k}{N}}\\
\vdots\\
e^{-j 2 \pi \frac{k(N-1)}{N}}
\end{pmatrix}$, for $k = 0, \ldots , N-1$.<br/><br/>
As mentioned in the previous question, to prove that the vectors are mutually orthogonal, we need to show that $v_k^H v_l = C \delta_{k,l}$, with $C$ some constant.<br/><br/>
$v_k^H v_l = \sum_{n = 0}^{N-1} \overline{v_{k}[n]}\, v_l[n]$,&nbsp; with $k,l = 0, \ldots , N-1$ &nbsp;&nbsp;&nbsp; (the usual inner product on $\mathbb{C}^N$)<br/><br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $= \sum_{n = 0}^{N-1} e^{j2\pi k\frac{n}{N}}e^{-j2\pi l\frac{n}{N}} $<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $= \sum_{n = 0}^{N-1} e^{j2\pi \frac{n}{N}(k-l)}$<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $= \sum_{n = 0}^{N-1} e^{j2\pi \frac{n}{N}r}$, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if we define $r=k-l$.<br/><br/>
Using the result of question 2, $v_k^H v_l = 0$ for $k\neq l$ (since $0 < |r| < N$), and $v_k^H v_l = N$ for $k = l$ (i.e. $r = 0$).<br/><br/>
We can conclude that the vectors $v_k \in \mathbb{C}^N$ with $v_k[n] = e^{-j 2 \pi \frac{kn}{N}}$, for $k,n = 0, \ldots , N-1$, satisfy<br/>
$v_k^H v_l = N \delta_{k,l}$.
**The vectors are mutually orthogonal, hence linearly independent.**
From the above result we have $v_k^Hv_k = N = \|v_k\|^2$, and therefore $\|v_k\| = \sqrt{N}$.
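This can also be verified numerically by stacking the $v_k$ as rows of a matrix $V$ and checking that $VV^H = N I$ (a sketch with an arbitrary $N$):
``` python
import numpy as np

N = 6
n = np.arange(N)
V = np.exp(-2j * np.pi * np.outer(n, n) / N)        # row k is the vector v_k
print(np.allclose(V @ V.conj().T, N * np.eye(N)))   # True
```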
%% Cell type:markdown id:efc7a6a3-1602-4100-8ec8-d82f9e01ad86 tags:
<!-- END QUESTION -->
4. Implement the function `get_fourier_matrix` that returns a normalized Fourier matrix of size $N\times N$. Do not make use of the builtin DFT/FFT functions in `numpy` or `scipy`. Raise a `ValueError` exception if a negative or zero $N$ value is supplied.
%% Cell type:code id:f58a90c7-d4e3-48d9-a21b-42a4d79e2344 tags:otter_assign_solution_cell
``` python
def get_fourier_matrix(N):
# BEGIN SOLUTION
if N <= 0:
raise ValueError("Invalid size")
k = np.arange(0, N)
Ws = np.exp(-2j*np.pi*np.outer(k, k)/N)
return Ws/np.sqrt(N)
# END SOLUTION
```
%% Cell type:code id:2283adf4 tags:
``` python
grader.check("q4")
```
%% Output
q4 results: All test cases passed!
q4 - 1 message: Good, your implementation returns correct results
q4 - 2 message: Good, you did not use the scipy.linalg.dft function
q4 - 3 message: Good, you did not use the scipy.fft.fft function
q4 - 4 message: Good, you did not use the numpy.fft.fft function
q4 - 5 message: Good, your implementation returns an orthogonal matrix
q4 - 6 message: Good, you properly validated size before computing the result
%% Cell type:markdown id:a5f6b819-1e53-456a-bf67-b76258922d6e tags:
Let us now generate two test signals.
The first one $x_1(t)$ is made of four piecewise sinusoids, of different frequencies:
$$
x_1(t) = \cos(2\pi 5t), 0\le t < 2\\
x_1(t) = \cos(2\pi 10t), 2\le t < 4\\
x_1(t) = \cos(2\pi 25t), 4\le t < 6\\
x_1(t) = \cos(2\pi 50t), 6\le t < 8\\
$$
%% Cell type:code id:081e39c8-1874-4274-a7dc-59c8bdab84e4 tags:
``` python
Fs = 256 # sampling frequency
t = np.arange(0, Fs*8)/Fs
x1 = np.zeros(Fs*8)
x1[0:Fs*2] = np.cos(2*np.pi*5*t[0:Fs*2])
x1[Fs*2:Fs*4] = np.cos(2*np.pi*10*t[Fs*2:Fs*4])
x1[Fs*4:Fs*6] = np.cos(2*np.pi*25*t[Fs*4:Fs*6])
x1[Fs*6:Fs*8] = np.cos(2*np.pi*50*t[Fs*6:Fs*8])
```
%% Cell type:markdown id:b7f7475c-bfb0-4e9f-9dd6-ac9eeb5abef1 tags:
The second signal $x_2(t)$ is the sum of the same sinusoids over the complete time interval, with a scaling term s.t. the amplitude of both signals is identical.
%% Cell type:code id:8e0199b4-baf4-4376-a61c-b778aabe59f7 tags:
``` python
x2 = 0.25*(np.cos(2*np.pi*5*t) + np.cos(2*np.pi*10*t) + np.cos(2*np.pi*25*t) + np.cos(2*np.pi*50*t))
```
%% Cell type:markdown id:c52f3daf-0017-4252-b297-b5d6bb6c8151 tags:
<!-- BEGIN QUESTION -->
5.
- Display the generated signals using `plt.plot`.
- Compute their Fourier transforms using the Fourier matrix generated in the previous question.
- Display the amplitude of their Fourier spectrum. What do you observe ?
%% Cell type:markdown id:cbde7de1 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:7d4aee96-f61e-4e51-84e8-56b4e357aa0f tags:otter_assign_solution_cell
``` python
# plot x1
# BEGIN SOLUTION
plt.plot(x1)
# END SOLUTION
```
%% Cell type:code id:7a504349-03a3-4efd-bc3b-8249c09453fb tags:otter_assign_solution_cell
``` python
# plot x2
# BEGIN SOLUTION
plt.plot(x2)
# END SOLUTION
```
%% Cell type:code id:7bf1bf4a-04b8-43f6-8128-29647e1f964a tags:otter_assign_solution_cell
``` python
# Compute the Fourier transform of x1 and x2
# BEGIN SOLUTION
W = get_fourier_matrix(len(x1))
xf1 = W@x1
xf2 = W@x2
# END SOLUTION
```
%% Cell type:code id:159e693c-4427-4a8b-9f53-42dc63bafbc9 tags:otter_assign_solution_cell
``` python
# Plot the spectrum of x1
# BEGIN SOLUTION
plt.plot(np.abs(xf1))
# END SOLUTION
```
%% Cell type:code id:f2c3061b-900e-44d9-bb73-02c98334a818 tags:otter_assign_solution_cell
``` python
# Plot the spectrum of x2
# BEGIN SOLUTION
plt.plot(np.abs(xf2))
# END SOLUTION
```
%% Cell type:markdown id:61c226cc-df74-4a71-9e18-8b2071e65a91 tags:otter_assign_solution_cell
Despite $x_1$ and $x_2$ being quite different signals, we notice that the amplitudes of their Fourier transforms are very similar. The Fourier transform of $x_1$ exhibits more ripples around the peaks due to the discontinuities, but both show four symmetrical peaks at the frequencies of the 4 sinusoids composing the signals. While the input signals can be reconstructed from their Fourier transforms, these representations are not very well suited to our purpose: they give precise information about the frequencies present in both signals, but no information about their location in time.
%% Cell type:markdown id:19060be9-6494-4b4e-9803-1a4cb6fc36ac tags:
<!-- END QUESTION -->
In order to have a better compromise between time and frequency, the input signal will be split into smaller non-overlapping blocks of length $p$, and we will perform the DFT of each block.
6. Using the `get_fourier_matrix` implemented previously, fill the `get_block_dft_matrix` function below s.t. it returns a $N\times N$ matrix that will perform the block Fourier transform when applied to an input vector. Raise a `ValueError` if $p$ does not divide $N$.
Hint: [numpy.pad](https://numpy.org/doc/stable/reference/generated/numpy.pad.html#numpy.pad) and/or [numpy.kron](https://numpy.org/doc/stable/reference/generated/numpy.kron.html) might prove useful.
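If you have not encountered it before, `np.kron` builds exactly this kind of block-diagonal structure; a small sketch with an arbitrary $2\times 2$ block:
``` python
import numpy as np

B = np.array([[1, 2],
              [3, 4]])
print(np.kron(np.eye(2, dtype=int), B))
# [[1 2 0 0]
#  [3 4 0 0]
#  [0 0 1 2]
#  [0 0 3 4]]
```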
%% Cell type:code id:903005b2-e249-48a0-b28b-24daef16cdb7 tags:otter_assign_solution_cell
``` python
def get_block_dft_matrix(N, p):
# BEGIN SOLUTION
if N % p != 0:
raise ValueError("Incorrect size")
return np.kron(np.eye(N//p), get_fourier_matrix(p))
# END SOLUTION
```
%% Cell type:code id:58978e43 tags:
``` python
grader.check("q6")
```
%% Cell type:markdown id:4e894db3-b5e6-4b44-a7d1-988c59542b74 tags:
<!-- BEGIN QUESTION -->
We will now use this block Fourier transform to see how the frequencies of the signal evolve through time.
7.
- Using the `reshape` and `plt.imshow` functions, display the amplitude of the result of the block Fourier transform of $x_1$ and $x_2$. Is the result improved when compared to the one observed in question 5 ?
- What is the influence of parameter $p$ ?
%% Cell type:markdown id:6239185e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:6da33694-7bd6-4257-9a5f-9bb300cd5c69 tags:otter_assign_solution_cell
``` python
# Compute the block DFT matrix Wb
# BEGIN SOLUTION
p = 64
Wb = get_block_dft_matrix(len(x1), p)
# END SOLUTION
```
%% Cell type:code id:e1eb36fe-3079-4feb-86df-10596293e550 tags:otter_assign_solution_cell
``` python
# Plot the block DFT of x1
# BEGIN SOLUTION
plt.imshow(np.rot90(np.abs(Wb@x1).reshape(len(x1)//p, p)))
# END SOLUTION
```
%% Cell type:code id:4de97281-75ba-49d0-bc02-2573cf9791ec tags:otter_assign_solution_cell
``` python
# Plot the block DFT of x2
# BEGIN SOLUTION
plt.imshow(np.rot90(np.abs(Wb@x2).reshape(len(x2)//p, p)))
# END SOLUTION
```
%% Cell type:markdown id:34644739-72f3-4570-b27f-579baeb2f33d tags:otter_assign_solution_cell
We now have a better idea of how the frequencies present in the signals evolve through time. It is more obvious that $x_1$ contains a sequence of sinusoids and $x_2$ a superposition of sinusoids.
However, the parameter $p$ controls the trade-off between time and frequency resolution. If we choose a value of $p$ that is too small, the frequency resolution is poor, and if we choose a $p$ that is too big, the time resolution becomes poor in turn. So while we improve the time-frequency display, there is a compromise to be made between time and frequency resolutions, as both cannot be simultaneously high.
%% Cell type:markdown id:76cc3589-b7d9-44ec-98ea-07fc6550a53a tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
8. In a real-world application, would generating a $N\times N$ matrix to perform the block Fourier transform be a good way to implement it ? (Justify)
%% Cell type:markdown id:ff1a761a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:21f8ce3e-1548-45fe-aa05-7b59aa052eca tags:otter_assign_solution_cell
In a real-world application, the matrix as we implemented it is a poor choice. As the blocks that compose it are identical, it would be much more efficient to apply the smaller block Fourier matrix directly to the chunks of the input signals.
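For instance, reusing `get_fourier_matrix` from question 4 and the `Wb` and `x1` defined above, the block transform can be applied chunk-wise without ever forming the $N\times N$ matrix (a sketch; `Wp` and `blockwise` are illustrative names):
``` python
p = 64
Wp = get_fourier_matrix(p)
# each row of the reshaped signal is one chunk of length p; right-multiplying
# by Wp.T applies Wp to every chunk at once
blockwise = (x1.reshape(-1, p) @ Wp.T).ravel()
print(np.allclose(blockwise, Wb @ x1))  # True: matches the full matrix version
```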
%% Cell type:markdown id:87372bd1-4055-4742-ab6a-f7b6ba8750af tags:
<!-- END QUESTION -->
## Part II - Haar transform
In this part we will study another approach to analyzing the time/frequency properties of signals.
Let us consider a vector $x\in\mathbb{R}^N$, with $N$ being even. The single-level Haar transform decomposes $x$ into two vectors $a^1$ and $d^1$ belonging to $\mathbb{R}^{\frac{N}{2}}$.
The coefficients of $a^1$ are defined as follows: $a^1_n = \frac{1}{\sqrt{2}}(x_{2n-1} + x_{2n}), n=1, ..., \frac{N}{2}$. $a^1$ is referred to as the *average coefficients vector*.
The coefficients of $d^1$ are defined as follows: $d^1_n = \frac{1}{\sqrt{2}}(x_{2n-1} - x_{2n}), n=1, ..., \frac{N}{2}$. $d^1$ is referred to as the *detail coefficients vector*.
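Equivalently, with 0-based indexing as used in numpy, both coefficient vectors can be computed directly by slicing (a sketch, assuming an even-length input):
``` python
import numpy as np

x = np.arange(8.0)
a1 = (x[0::2] + x[1::2]) / np.sqrt(2)  # average coefficients
d1 = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
print(a1, d1)
```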
%% Cell type:markdown id:170a3e62-8b25-482c-a8dd-91988b1d08ed tags:
<!-- BEGIN QUESTION -->
9. Let us represent the single-level Haar transform by a matrix $H_1$ s.t.
$$
H_1 x = \begin{pmatrix}a^1 \\ d^1\end{pmatrix}
$$
Prove that $H_1$ is orthonormal.
%% Cell type:markdown id:56fd9eda tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:e5906e37-9a3f-469c-9904-ba4fe551612c tags:otter_assign_solution_cell
In order to check that $H_1H_1^T=H_1^TH_1 = I$ we need to distinguish different cases.
First, we see that $H_1$ has two types of rows: the upper-half is made of shifted copies of $\frac{1}{\sqrt{2}}(1, 1, 0, ...., 0)$, the lower-half is made of shifted copies of
$\frac{1}{\sqrt{2}}(1, -1, 0, ...., 0)$. Both are shifted by multiple of 2 elements, and the only case where those vectors have a common support is when they are shifted by the same amount of elements. Otherwise they have disjoint supports, and their scalar product is 0, which makes them orthogonal.
When they have a common support, there are two cases:
- we compute the scalar product of a row of $H_1$ with itself, which in this case is clearly equal to 1
- we compute the scalar product of a row of $H_1$ with its counterpart having the same support in the other half of the matrix, i.e. we compute the scalar product of $(..., 1, 1, 0, ... )$ with $(..., 1, -1, ...)$, which is clearly equal to 0.
This shows that $H_1$ is an orthonormal matrix.
%% Cell type:markdown id:92f1b78b-3017-4129-bc95-f132717bc7b9 tags:
<!-- END QUESTION -->
10. Write a function that returns the single-level Haar transform matrix $H_1$ for a given $N$. Raise a `ValueError` if $N$ is invalid.
%% Cell type:code id:5d123a1c-7f98-4edf-b896-7d773bf48e76 tags:otter_assign_solution_cell
``` python
def get_sl_haar_matrix(N):
# BEGIN SOLUTION
if N <= 0 or N%2 != 0:
raise ValueError("Invalid size")
Hu = np.kron(np.eye(N//2), [1, 1])
Hl = np.kron(np.eye(N//2), [1, -1])
return np.vstack([Hu, Hl])/np.sqrt(2)
# END SOLUTION
```
%% Cell type:code id:86b9b019 tags:
``` python
grader.check("q10")
```
%% Cell type:markdown id:e2e7a4af-0f4e-4972-a051-468a12967e79 tags:
The multi-level Haar transform is defined by recursively applying the single-level transform **to the average coefficient parts**.
For instance, constructing the 2-level Haar transform over $N$ points starts with the previously defined $H_{1,N}$ matrix of size $N\times N$ and the corresponding $\frac{N}{2}\times\frac{N}{2}$ version denoted by $H_{1,\frac{N}{2}}$.
$H_{1,N}$ can be written as
$$
\begin{pmatrix} H_{1, N}^a \\ H_{1,N}^d \end{pmatrix},
$$
where $H_{1, N}^a$ and $H_{1,N}^d$ are respectively the average and detail coefficient matrices, both of size $\frac{N}{2}\times N$.
Following these notations, the 2-level Haar transform matrix $H_{2,N}$ can be written as:
$$
\begin{pmatrix} H_{1,\frac{N}{2}}H_{1, N}^a \\ H_{1,N}^d \end{pmatrix},
$$
11. Implement a function that returns the $H_{p,N}$ matrix of size $N\times N$ that performs a $p$-level Haar transform. Raise a `ValueError` if the size and the level are incompatible, or if the level is smaller than 1.
%% Cell type:code id:d73eb83b-8d4b-4bf9-9c19-b3d15ee927c1 tags:otter_assign_solution_cell
``` python
def get_haar_matrix(N, level):
# BEGIN SOLUTION
if level < 1:
raise ValueError("Invalid level")
H = get_sl_haar_matrix(N)
cur_N = N
for k in range(level - 1):
cur_N = cur_N//2
if cur_N % 2 != 0:
raise ValueError("Incompatible size")
Hs = get_sl_haar_matrix(cur_N)
H[:cur_N, :] = Hs@H[:cur_N, :]
return H
# END SOLUTION
```
%% Cell type:code id:c3138843 tags:
``` python
grader.check("q11")
```
%% Cell type:markdown id:918653e3-22c1-4b25-abbc-2461a6736557 tags:
<!-- BEGIN QUESTION -->
12. Prove that $H_{p,N}$ is orthonormal.
%% Cell type:markdown id:02e21ab7 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:136913af-42f3-417a-a94d-2ae7fa7411c7 tags:otter_assign_solution_cell
We have shown that $H_{1,N}$ is orthonormal. Using the definition of $H_{2,N}$, we have
$$
H_{2,N}H_{2,N}^T=\begin{pmatrix} H_{1,\frac{N}{2}}H_{1, N}^a \\ H_{1,N}^d \end{pmatrix}\begin{pmatrix} {H_{1, N}^a}^T H_{1,\frac{N}{2}}^T & {H_{1,N}^d}^T \end{pmatrix}\\
= \begin{pmatrix} H_{1,\frac{N}{2}}(H_{1, N}^a{H_{1, N}^a}^T) H_{1,\frac{N}{2}}^T & H_{1,\frac{N}{2}}H_{1, N}^a {H_{1,N}^d}^T \\ H_{1,N}^d{H_{1, N}^a}^T H_{1,\frac{N}{2}}^T & H_{1,N}^d{H_{1,N}^d}^T\end{pmatrix}.
$$
Since the rows of $H_{1,N}$ are orthonormal, $H_{1,N}^a {H_{1,N}^a}^T = I$, $H_{1,N}^d {H_{1,N}^d}^T = I$ and $H_{1,N}^a {H_{1,N}^d}^T = 0$. Together with the orthonormality of $H_{1,\frac{N}{2}}$, the expression simplifies to
$$
H_{2,N}H_{2,N}^T=\begin{pmatrix} I & 0 \\ 0 & I\end{pmatrix}.
$$
By induction, this is true for all levels.
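Assuming `get_haar_matrix` from question 11, this can also be checked numerically (a quick sketch with an arbitrary size and level):
``` python
H = get_haar_matrix(16, 3)
print(np.allclose(H @ H.T, np.eye(16)))  # True: H is orthonormal
```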
%% Cell type:markdown id:8224d668-7cdc-4175-a587-0647df59b2b4 tags:
<!-- END QUESTION -->
### Haar transform visualization
In order to make the visualization of the Haar decomposition easy, we provide you the `plot_haar_coeffs` function that will display the average and detail coefficients of the different levels.
The function takes 2 arguments:
- the input signal
- the number of levels
%% Cell type:markdown id:b6cac5b0-a173-499c-ad89-9e4b82166c7a tags:
<!-- BEGIN QUESTION -->
13. Display the Haar transform of $x_1$ and $x_2$.
%% Cell type:code id:5ca2ec21-222e-430c-a7bd-d2af54e2de47 tags:
``` python
from display_helper import plot_haar_coeffs
```
%% Cell type:code id:6a946638-18dc-44fa-8ffc-0b44a3f653e1 tags:otter_assign_solution_cell
``` python
# display the decomposition of x1
# BEGIN SOLUTION
plot_haar_coeffs(x1, 5)
# END SOLUTION
```
%% Cell type:code id:b01b9267-9609-474e-b5e0-8b7adb687164 tags:otter_assign_solution_cell
``` python
# display the decomposition of x2
# BEGIN SOLUTION
plot_haar_coeffs(x2, 5)
# END SOLUTION
```
%% Cell type:markdown id:81e23ac6-de1a-4cb4-98f2-89545052f9f3 tags:
<!-- END QUESTION -->
## Part III - Denoising
We will now use the different transforms defined in part I and II to perform denoising.
Let us create a noisy signal for this purpose.
%% Cell type:code id:e5583ca2-13ea-49f8-baf6-4d6a223f3b2e tags:
``` python
angle1 = np.linspace(0, 5*np.pi/2, 300)
wave1 = np.sin(angle1)
angle2 = np.linspace(0, 3*np.pi/2, 300)
wave2 = np.sin(angle2)
angle3 = np.linspace(np.pi/2, 2*np.pi, 424)
wave3 = np.sin(angle3)
wave = np.append(wave1, wave2)
wave = np.append(wave, wave3)
wave_noisy = wave + 0.2*np.random.normal(0, 1, 1024)
```
%% Cell type:code id:d3278fc1-e814-47b9-83f0-61f4fad8d4d7 tags:
``` python
plt.plot(wave_noisy, 'r')
plt.plot(wave)
```
%% Cell type:markdown id:33489d2d-b038-4896-87e7-0af6568ad702 tags:
The noise is usually located in the higher frequencies. However, the signal we created is a bit special: it has two discontinuities, which also generate high-frequency components (remember the Fourier transform of a rectangle function is a sinc).
%% Cell type:markdown id:fbd9ce45-7214-4ca7-a5e3-cf0c340458a3 tags:
<!-- BEGIN QUESTION -->
14. Implement a function `denoise_signal` that performs denoising of the input signal using a supplied orthonormal transform matrix, by setting the transformed coefficients having an amplitude smaller than a given threshold to 0. You might want to use the [numpy.where](https://numpy.org/doc/stable/reference/generated/numpy.where.html) function for this. When denoising using the Haar transform, you can perform the thresholding only on the detail coefficients. Implement the function `denoise_signal_haar` that performs this operation.
NB: The result returned should be real, in order to be displayed.
%% Cell type:code id:cf7270a7-de1d-4c24-913c-916479ee8ab5 tags:otter_assign_solution_cell
``` python
def denoise_signal(W, x, threshold=0.1):
"""
W: NxN input orthonormal matrix (Fourier, block-Fourier or Haar)
x: input signal (of length N)
"""
# BEGIN SOLUTION
x_tf = W@x
Winv = np.conj(W.T)
return np.real(Winv@np.where(np.abs(x_tf)<threshold, 0, x_tf))
# END SOLUTION
```
%% Cell type:code id:c029ab19-11a5-4bc9-b348-31bf8ca07bca tags:otter_assign_solution_cell
``` python
def denoise_signal_haar(W, x, threshold=0.1, detail_start_index=256):
"""
W: NxN input orthonormal matrix (Fourier, block-Fourier or Haar)
x: input signal (of length N)
detail_start_index: thresholding is performed on x[detail_start_index:]
"""
# BEGIN SOLUTION
x_tf = W@x
Winv = W.T
    x_dn = x_tf.copy()  # copy so the thresholding below does not alias x_tf
x_dn[detail_start_index:] = np.where(np.abs(x_tf[detail_start_index:])<threshold, 0, x_tf[detail_start_index:])
return Winv@x_dn
# END SOLUTION
```
%% Cell type:code id:1b47fbc5-070b-49c4-a616-d8e753adede1 tags:otter_assign_solution_cell
``` python
# Perform denoising with the full Fourier transform and display the result.
# Make sure you choose a good threshold
# BEGIN SOLUTION
waverec_fourier = denoise_signal(get_fourier_matrix(len(wave_noisy)), wave_noisy, 0.55)
plt.plot(wave_noisy, 'r')
plt.plot(waverec_fourier)
# END SOLUTION
```
%% Cell type:code id:d06fa927-ca94-4387-b987-a4e3ffeabeb2 tags:otter_assign_solution_cell
``` python
# Perform denoising with the block Fourier transform and display the result
# Make sure you choose a good threshold and block size
# BEGIN SOLUTION
waverec_block_fourier = denoise_signal(get_block_dft_matrix(len(wave_noisy), 16), wave_noisy, 0.4)
plt.plot(wave_noisy, 'r')
plt.plot(waverec_block_fourier)
# END SOLUTION
```
%% Cell type:code id:98e6e4a6-a289-4552-b5a9-27f40c40cb40 tags:otter_assign_solution_cell
``` python
# Perform denoising with the Haar transform and display the result
# Make sure you choose a good threshold and an appropriate number of levels
# BEGIN SOLUTION
waverec_haar = denoise_signal_haar(get_haar_matrix(len(wave_noisy), 5), wave_noisy, 0.9, 256)
plt.plot(wave_noisy, 'r')
plt.plot(waverec_haar)
# END SOLUTION
```
%% Cell type:code id:d5cb5707 tags:
``` python
grader.check("q14")
```
%% Cell type:markdown id:e6b05f2f-34a1-4689-82ef-c3be92c842ba tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
15. Compare the three denoising methods (Fourier, block Fourier and Haar). Which one performs better (in terms of noise removal but also in terms of discontinuity preservation) ? Was that expected (justify) ?
%% Cell type:markdown id:a999fadd tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:eb6b1561-b5cb-4db0-87a0-ca9a4245b7f3 tags:otter_assign_solution_cell
We see quite different results in terms of output. The full Fourier transform generates a cleaner denoised signal, however it also smoothes out the discontinuities, which might not be desirable (if you think of the 2D version applied to an image with sharp contours, they would be blurred out). Block DFT and Haar preserve discontinuities better, but our very simple denoising method also introduces undesirable artifacts in the block DFT version, while Haar yields better results (good noise removal and discontinuity preservation). The Haar detail coefficients are computed using only a few neighboring samples, unlike high-frequency Fourier coefficients; this helps preserve the discontinuities when denoising.
%% Cell type:markdown id:e705a9ec tags:
<!-- END QUESTION -->
solutions/week03/images/pocs.png
%% Cell type:code id:5410f063 tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("Linear systems - Poisson equation.ipynb")
```
%% Cell type:markdown id:d133bce9-d159-4b1d-a25f-69bed8da493a tags:
# Matrix Analysis 2025 - EE312
## Week 5 - Discrete Poisson equation
[LTS2](https://lts2.epfl.ch)
### Objectives
Apply your knowledge about linear systems to Poisson equation resolution.
%% Cell type:markdown id:654ad4ee-c060-4bc3-b218-3a6709952750 tags:
## Poisson equation
Let $u,v \in \mathbb{R}^n$ represent two physical quantities $f$ and $g: \mathbb{R} \mapsto \mathbb{R}$ sampled at $n$ equi-spaced locations $x_i$, i.e. $u_i = f(x_i)$, $v_i = g(x_i)$.
Let us assume that the underlying continuous object satisfies the Poisson equation: $\frac{d^2}{dx^2} f (x)= g(x)$ with constraints $f(x_j) = y_j$ for a subset of $m$ locations $j \in \{i_1, \ldots i_m \}$.
We will assume that **all the $i_k$ are distinct**.
This equation governs a number of physical principles, e.g. gravity, fluid mechanics and electrostatics. In the latter we have $\Delta \varphi = -\frac{\rho}{\varepsilon}$, where $\Delta$ is the Laplacian operator $(\frac{d^2}{dx^2} + \frac{d^2}{dy^2} + \frac{d^2}{dz^2})$, $\varphi$ is the electric potential, $\rho$ the density of electric charges and $\varepsilon$ the electric permittivity.
---
%% Cell type:markdown id:06ac5cfa-62ef-4477-a957-e240ddeb85d8 tags:
### 1D case
%% Cell type:code id:7f3cb74d-72f0-460b-a978-a47f87de8557 tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
```
%% Cell type:markdown id:9fd89d0b-50df-4711-a32b-ec1026e6b122 tags:
<!-- BEGIN QUESTION -->
1. Write down a matrix equation for the discrete version of $\frac{d^2}{dx^2} f (x)= g(x)$, using the finite-difference approximation of the second derivative $\frac{d^2}{dx^2} f(x_k) \approx f(x_{k+1}) - 2f(x_k) +f(x_{k-1})$. For the boundary conditions we will make the assumption that $f(x_{-1})=f(x_0)$ and $f(x_n)=f(x_{n-1})$, i.e. a constant extension of the signal (also referred to as Dirichlet boundary conditions).
What is the rank of the corresponding matrix $D$ ?
%% Cell type:markdown id:ca746265 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:a21ed0d8-4b5d-42eb-b1bf-6955938ebe2d tags:otter_assign_solution_cell
If we use zero-padding instead of the constant extension, the equation reads $Du = v$ with
$D=\begin{pmatrix}
-2 & 1 & 0 & 0& ... & 0\\
1 & -2 & 1 & 0 &... &0\\
0 & 1 & -2 & 1 &... &0\\
& & & ... & & \\
0 & 0 & ... & 1 & -2 & 1\\
0 & 0 & ... & 0 & 1 & -2\\
\end{pmatrix}$
This is the Laplacian matrix, and $\mbox{rank }D = n$.
If we use a Dirichlet boundary condition, i.e. constant extension, we have $u_{-1}=u_0$ and $u_n = u_{n-1}$, therefore
$D=\begin{pmatrix}
-1 & 1 & 0 & 0& ... & 0\\
1 & -2 & 1 & 0 &... &0\\
0 & 1 & -2 & 1 &... &0\\
& & & ... & & \\
0 & 0 & ... & 1 & -2 & 1\\
0 & 0 & ... & 0 & 1 & -1\\
\end{pmatrix}$
In this case the rank is $n-1$: the rows sum to zero, so the rank is at most $n-1$, and removing any one row leaves a linearly independent set.
%% Cell type:markdown id:aed76fe2-8cb5-4b28-bd3b-fa0bea113a46 tags:
<!-- END QUESTION -->
%% Cell type:markdown id:cceac118-c43d-4782-9f7e-4392e5ecc451 tags:
2. Implement a function that creates the $D$ matrix (also called Laplacian). The [diag](https://numpy.org/doc/stable/reference/generated/numpy.diag.html) function in numpy might be useful.
%% Cell type:code id:d41105d8-6933-4ef8-b2f0-c6ac053e1f55 tags:otter_assign_solution_cell
``` python
def lapl_matrix(N):
# BEGIN SOLUTION
O = np.ones(N-1)
D = -2*np.eye(N) + np.diag(O, -1) + np.diag(O, 1)
D[0,0] = -1
D[N-1, N-1] = -1
return D
# END SOLUTION
```
%% Cell type:code id:1abc2b2c tags:
``` python
grader.check("q2")
```
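%% Cell type:markdown id:laplacian-check tags:
As a quick sanity check (a sketch, not part of the graded exercise): away from the boundary rows, applying $D$ to samples of $f(x)=x^2$ on an integer grid should return the constant second difference $2$.
%% Cell type:code id:laplacian-check-code tags:
``` python
u = (np.arange(10)**2).astype(float)  # samples of x^2 on an integer grid
print(lapl_matrix(10) @ u)  # interior entries are all 2; the two boundary rows differ
```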
%% Cell type:markdown id:4cea41dc-6f12-42f2-8644-b51e4918504d tags:
<!-- BEGIN QUESTION -->
3. Write down a matrix equation for the discrete version of $f(x_j) = y_j$. What is the rank of the corresponding matrix $B$ ?
%% Cell type:markdown id:63a9357b tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:d47029a2-bd30-4da8-ae9a-f41240154377 tags:otter_assign_solution_cell
This operation can be represented by a matrix $B\in\mathbb{R}^{m\times n}$ with coefficients $b_{kp} = \delta_{p,i_k}$, i.e. row $k$ has a single 1, located at column $i_k$.
The rank of this matrix is $m$, under the assumption that all $i_k$ are distinct, since every row has its single non-zero value at a distinct position.
Using this matrix we can rewrite the condition $f(x_j) = y_j$ as $Bu=\begin{pmatrix}y_{i_1}\\ \vdots \\y_{i_m}\end{pmatrix}$
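For example, with $n=5$ and constrained indices $\{1, 3\}$ (0-based, as in the code example below), we get $B=\begin{pmatrix}0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0\end{pmatrix}$.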
%% Cell type:markdown id:418bc83f-7d46-41c2-8a7e-df892857320c tags:
<!-- END QUESTION -->
4. Implement a function that creates matrix $B$
%% Cell type:code id:1d294eda-7edc-42c2-9640-13bec779bd48 tags:otter_assign_solution_cell
``` python
def b_matrix(N, idx_list): # idx_list is a list of valid indices, e.g. N=5, idx_list=[1,3]
# BEGIN SOLUTION
m = len(idx_list)
if m<1:
raise ValueError("Supply at least one index")
if not np.all(np.array(idx_list)<N):
raise ValueError("Invalid list of indices supplied")
if not np.all(np.array(idx_list)>=0):
raise ValueError("Negative indices supplied")
if N<1 or m >= N:
raise ValueError("Invalid dimensions")
B = np.zeros((m, N))
for p in zip(range(m), idx_list):
B[p[0], p[1]] = 1.
return B
# END SOLUTION
```
%% Cell type:code id:58dbd947 tags:
``` python
grader.check("q4")
```
%% Cell type:markdown id:b77f215e-e4d0-4976-9ed6-7ee210c689a6 tags:
<!-- BEGIN QUESTION -->
5. Write down a matrix equation for the full problem (Hint: Consider the matrix $C=\begin{pmatrix}D\\B\end{pmatrix}$). Discuss the rank of the matrix and deduce a way to numerically solve the linear system. Implement a function that returns matrix $C$.
%% Cell type:markdown id:19ecce3a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:c9ea268f-a134-43d5-a74f-7853bd8d3547 tags:otter_assign_solution_cell
Let us use the matrix $C=\begin{pmatrix}D\\B\end{pmatrix}$.
The following holds $Cu = z$, with $z=\begin{pmatrix}v_0\\v_1\\...\\v_{n-1}\\y_0\\y_1\\...\\y_{m-1}\end{pmatrix}$.
To determine the rank of $C$, note that a vector $u$ in the null space of $C$ must satisfy both $Du = 0$ and $Bu = 0$. Since the rows of $D$ sum to zero, the null space of $D$ is spanned by the constant vector $\mathbf{1}$ (rank $n-1$), so the first condition forces $u = \alpha\mathbf{1}$ for some scalar $\alpha$. Each row of $B$ simply picks one entry of $u$, so $Bu = 0$ then gives $\alpha = 0$. The null space of $C$ is therefore trivial: $C$ has full column rank, i.e. $\operatorname{rank} C = n$.
We can get the solution using the pseudo-inverse $C^+$.
%% Cell type:code id:2eaf5a8c-b83a-4f53-9c4e-6192bf4e6fa4 tags:otter_assign_solution_cell
``` python
def c_matrix(D, B):
# BEGIN SOLUTION
sd = D.shape
    sb = B.shape
C = np.zeros((sd[0]+sb[0], sd[1]))
C[:sd[0], :] = D
C[sd[0]:, :] = B
return C
# END SOLUTION
```
%% Cell type:code id:c0c17d8b tags:
``` python
grader.check("q4")
```
%% Cell type:markdown id:10a3d136-f5f3-4c1b-9c0a-ece58f891dfe tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
6. What explicit formula can you use to compute the pseudo-inverse of $C$ (justify)?
%% Cell type:markdown id:fd6f7561 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:4daa620b-d617-49e8-9504-41151b02ab00 tags:otter_assign_solution_cell
The rank of $C$ is $n$, i.e. the columns of $C$ are linearly independent, therefore the matrix is one-to-one (injective). We can use the left pseudo-inverse:
$C^+=(C^TC)^{-1}C^T$
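A quick numerical illustration of this formula (a sketch on a random tall matrix, which has full column rank almost surely):
``` python
rng = np.random.default_rng(0)
C = rng.standard_normal((8, 3))
C_pinv = np.linalg.inv(C.T @ C) @ C.T
print(np.allclose(C_pinv, np.linalg.pinv(C)))  # True: both compute the pseudo-inverse
print(np.allclose(C_pinv @ C, np.eye(3)))      # True: C+ is a left inverse of C
```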
%% Cell type:markdown id:3726eb03-0c82-4aed-b179-77ff947583c3 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
7. Implement a function that returns the solution of the problem. You can either use the formula above (you may then use the [linalg.inv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html) function) or compute the pseudo-inverse using [linalg.pinv](https://numpy.org/doc/stable/reference/generated/numpy.linalg.pinv.html).
%% Cell type:code id:79f8c6d1-7b92-48b6-bb90-e888aaa41243 tags:otter_assign_solution_cell
``` python
# v is a vector of size N
# u is a vector of size len(idx_list)
def solve_poisson(N, v, idx_list, u):
# BEGIN SOLUTION
D = lapl_matrix(N)
B = b_matrix(N, idx_list)
C = c_matrix(D, B)
z = np.zeros(C.shape[0])
z[:N] = v
for p in zip(range(len(u)), u):
z[N+p[0]] = p[1]
Cinv = np.linalg.pinv(C)
#or
# Cinv = np.linalg.inv(C.T@C)@C.T
return Cinv@z
# END SOLUTION
```
%% Cell type:markdown id:ff6184e1-e308-47d1-8b72-ab4559b48adb tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
8. Let us now consider the solutions of the particular case $g(x) = 0, \forall x\in\mathbb{R}$. What are the analytical solutions that would solve the Poisson equation ?
%% Cell type:markdown id:d47ff697 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:fc80136a-8da3-4fa2-a9c1-c77962761e38 tags:otter_assign_solution_cell
The analytical solutions are polynomials of degree at most 1 (s.t. their second derivative is 0), i.e. affine functions $f(x)=ax+b$.
%% Cell type:markdown id:4a33db39-077d-4775-a55d-ae5bd7400b23 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
Let us verify that our implementation of the solution works with a small example. Fill the values for $u$ s.t. they belong to the solution space.
%% Cell type:code id:96381701-e737-41c1-8642-fa1a5c13924c tags:otter_assign_solution_cell
``` python
N = 50
v = np.zeros(N)
idx_list = [10, 20, 30, 40]
u = [10, 20, 30, 40] # SOLUTION
sol = solve_poisson(N, v, idx_list, u)
plt.plot(sol) # plot the result
```
%% Output
[<matplotlib.lines.Line2D at 0x10af67a10>]
%% Cell type:markdown id:04e6fc55-6c8a-4dfa-8ca0-9b45ed2f157c tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
9. Do the results conform to what we expected ? What happens near the boundaries ?
%% Cell type:markdown id:41b1af96 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:5710b3b8-a7be-4263-86a5-81d735062bad tags:otter_assign_solution_cell
We obtain a solution that is linear except near the boundaries. The behavior of the solution near the boundaries is due to the Dirichlet conditions we impose (i.e. the solution should be constant outside the displayed interval). The values we set as constraints are matched closely.
%% Cell type:markdown id:ecf46b87-accd-4043-8ab6-06835830699b tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
10. Let us use a step function for $f$ and try to solve the system
%% Cell type:code id:d81b4501-b2a5-4412-a266-d04ffa8dd2c7 tags:
``` python
idx_list = [10, 20, 30, 40]
u2 = [1, 1, -1, -1]
sol2 = solve_poisson(N, v, idx_list, u2)
plt.scatter(idx_list, u2, marker='+', color='r')
plt.plot(sol2)
```
%% Output
[<matplotlib.lines.Line2D at 0x10fe128d0>]
%% Cell type:markdown id:4918d31c-40a2-4df3-88ec-d18a80e879e7 tags:
What do you observe and what is the reason for that ?
%% Cell type:markdown id:e0d2ec3d tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:6f86a2fe-3fd2-45ed-871f-8850cbe198a4 tags:otter_assign_solution_cell
If we deliberately choose $f$ to be outside of the solution space, we obtain a poor approximation and our constraints are not met (which is expected).
We can check that the Laplacian of the computed solution and the Laplacian of the step function (from which the constraints are sampled) do not match:
%% Cell type:code id:21d1f686-af17-4660-a29e-656e6eee670b tags:otter_assign_solution_cell
``` python
D = lapl_matrix(N)
z = D@sol2 # Laplacian of the computed solution
step=np.zeros(50)
step[:25]=1
step[25:]=-1
plt.plot(z)
plt.plot(D@step) # plot the Laplacian of the step
plt.legend(['Lapl. solution', 'Lapl. step'])
```
%% Output
<matplotlib.legend.Legend at 0x1187e2610>
%% Cell type:markdown id:b520b2c6-7c84-4c3e-be16-8eb866648a97 tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
11. Let us now consider the application of the Poisson equation to electrostatics. Knowing a distribution of electric charges, we can solve it to compute the electric potential. The distribution of electric charges will be very simple, consisting of two point charges.
%% Cell type:code id:96932aa1-74aa-4f4e-bbf5-df0f0d85c47c tags:
``` python
N = 50
v3 = np.zeros(N)
v3[10] = 1
v3[40] = 1
plt.plot(v3)
```
%% Output
[<matplotlib.lines.Line2D at 0x10feeef50>]
%% Cell type:markdown id:42861911-53e1-4cb3-8dc7-aa617c01a7e8 tags:
- What are the analytical solutions for this problem ? (remember the discrete derivative of a step function is a Dirac function, i.e. 0 everywhere except at one point where it is 1).
- Plot the analytical solution
- Compute and plot the numerical solution
- What do you observe ?
%% Cell type:markdown id:ddc8c11f tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:bc86b8ba-89c7-42a4-bae3-4279b8314fee tags:otter_assign_solution_cell
The analytical solution is piecewise linear: integrating the two-peaked charge distribution once leads to a piecewise constant function, and integrating a second time gives a piecewise linear function.
Using the constraints we impose in the numerical example, you can compute the theoretical solution of the problem satisfying those constraints (this was not asked in the question, so the answer above is sufficient). If you impose the solution to be constant between 10 and 40, its derivative is 0 in this interval; below 10 the derivative is then -1, and above 40 the derivative is 1.
%% Cell type:code id:f42fc23b-c412-4791-b5b6-bef54fa3bab3 tags:otter_assign_solution_cell
``` python
# BEGIN SOLUTION
d=np.zeros(50)
d[:10]=-1
d[40:]=1
plt.plot(d)
# END SOLUTION
```
%% Output
[<matplotlib.lines.Line2D at 0x10ff78890>]
%% Cell type:markdown id:6aba7c71-0e5b-40de-ae37-bce24524da06 tags:otter_assign_solution_cell
If we solve $-x+a = 1$ for $x=10$ and $x+b=1$ for $x = 40$, the analytical solution that satisfies the constraint is:
- $f(x) = -x+11, x<10$
- $f(x) = 1, 10\le x\le 40$
- $f(x) = x - 39, x>40$
%% Cell type:code id:f2f9bd66-067e-4590-8088-87b7969621bb tags:otter_assign_solution_cell
``` python
# Plot analytical solution
# BEGIN SOLUTION
x1 = np.arange(0, 11)
x2 = np.arange(10, 41)
x3 = np.arange(40, 51)
plt.plot(x1, -x1+11)
plt.plot(x2, np.ones(x2.shape))
plt.plot(x3, x3-39)
# END SOLUTION
```
%% Output
[<matplotlib.lines.Line2D at 0x10fff5990>]
%% Cell type:code id:22eae7ab-4c91-4000-b123-7105cc4c023a tags:otter_assign_solution_cell
``` python
idx_list = [10, 20, 30, 40]
u3 = [1, 1, 1, 1]
sol3 = solve_poisson(N, v3, idx_list, u3)
plt.scatter(idx_list, u3, marker='+', color='r')
plt.plot(sol3)
```
%% Output
[<matplotlib.lines.Line2D at 0x10ff7ad90>]
%% Cell type:code id:b0310f39-9bbf-4518-be7d-af0e3f6724a7 tags:otter_assign_solution_cell
``` python
# check the 2nd derivative of the solution computed
plt.plot(lapl_matrix(50)@sol3[:50]) # SOLUTION
```
%% Output
[<matplotlib.lines.Line2D at 0x1180751d0>]
%% Cell type:markdown id:6e71baf6-6e79-4b39-b420-0b29d47444b5 tags:otter_assign_solution_cell
We can see the solution computed is acceptable, as its second derivative matches the original distribution (it is not fully constant between the peaks, but close enough). The Dirichlet boundary conditions are once again interfering near the boundaries, and the derivative not being continuous also has an impact on the constant part.
%% Cell type:markdown id:daf06368 tags:
<!-- END QUESTION -->
%% Cell type:code id:bd39160c tags:
``` python
# Initialize Otter
import otter
grader = otter.Notebook("RBF networks.ipynb")
```
%% Cell type:markdown id:b62d1e09-41c3-40d0-a763-e8231d437183 tags:
# Matrix Analysis 2025 - EE312
## Week 8 - Image classification with Radial Basis Function (RBF) networks
[LTS2](https://lts2.epfl.ch)
%% Cell type:code id:nasty-access tags:
``` python
import numpy as np
from matplotlib import pyplot as plt
from sklearn.metrics import pairwise_distances
from scipy.spatial import distance_matrix
from scipy.special import softmax
import requests
from pathlib import Path
```
%% Cell type:markdown id:vietnamese-basin tags:
## 1. Image Classification
In this exercise, we will be doing image classification with a simple neural network. For simplicity, let's assume we will be working with black and white images.
Given an input image $i$ represented as a vector of pixel intensities $ \mathbf{x}_i \in [0,1]^d$, we want to predict its correct label $\mathbf{y}_i$, which is represented as a one-hot vector in $\{0,1\}^K$, where $K$ is the number of possible categories (classes) that the image may belong to. For example, we may have pictures of cats and dogs, and our goal would be to correctly tag those images as either cat or dog. In that case we would have $K=2$, and the vectors $\begin{pmatrix}0 \\ 1\end{pmatrix}$ and $\begin{pmatrix}1 \\ 0\end{pmatrix}$ to represent the classes of cat and dog.
In today's example we will be using the MNIST handwritten digit dataset. It contains images of handwritten numbers from 0 to 9 and our goal is to create a model that can accurately tag each image with its number. Let's load the data first.
%% Cell type:code id:exact-cycling tags:
``` python
### Load the data
# Download data if needed
if not Path("./mnist_data.npz").is_file():
r = requests.get('https://os.unil.cloud.switch.ch/swift/v1/lts2-ee312/mnist_data.npz', allow_redirects=True)
with open('mnist_data.npz', 'wb') as f: # save locally
f.write(r.content)
mnist = np.load('mnist_data.npz')
```
%% Cell type:markdown id:varying-taylor tags:
In the context of classification, neural networks are models that, given one (or multiple) input data points, produce as output a set of corresponding labels for each input. The model itself consists of parametric functions $g_i$ which can be applied sequentially to the input data, resulting in a set of labels which are the model's prediction for the data. For example, in a model that consists of two parametric functions $g_1$ and $g_2$, for a given $\mathbf{x}_i$, we have the predicted label $ \hat{\mathbf{y}}_i = g_1(g_2(\mathbf{x}_i))$. The parametric functions are commonly called "layers".
In a standard image classification setup, we are given some training data which we can use to tune the parameters of the parametric functions $g_i$ in order to improve its ability to predict the labels correctly. The parameters are generally tuned with respect to some objective (commonly called a loss function). We want to find the parameters of the model that minimize this loss function. Various loss functions can be used, but in general they tend to encode how "wrong" the model is. For
example, on a given image $i$ one can use the loss $\mathcal{L}(\hat{\mathbf{y}_i}, \mathbf{y}_i)= \sum_{j=1}^{K}(\hat{{y}}_{ij} -{y}_{ij})^2 $, which is the sum of squared differences between the vector coordinates of the predicted label of the image and those of the actual label $\mathbf{y}_i$.
Minimizing the loss over the whole training set is referred to as "training the model". Furthermore, the goal is that given new data we have not seen before and we have not trained our model with, the model will still be able to classify accurately.
Before we go into the details of the model and how we will train it, let's prepare the data.
%% Cell type:code id:e2140162-4414-47c2-ae99-d616300a1d65 tags:
``` python
# Preprocess the data
images = mnist['data']
num_images = images.shape[0]
train_set_size = 60000
test_set_size = 10000
train_images = images[:train_set_size]/255.
test_images = images[-test_set_size:]/255.
#create one-hot encodings of labels
mnist_target = mnist['target']
num_classes = mnist_target.max()+1
labels = []
for k in range(num_images):
one_hot = np.zeros(num_classes)
one_hot[int(mnist_target[k])]=1
labels += [one_hot]
labels = np.array(labels)
#labels in one-hot
train_labels = labels[:train_set_size]
test_labels = labels[-test_set_size:]
#labels in integer form
int_labels = np.array(mnist_target, dtype=int)
int_labels_train = int_labels[:train_set_size]
int_labels_test = int_labels[-test_set_size:]
```
%% Cell type:code id:00b3e353-a254-4cb5-975d-0741a50576c7 tags:
``` python
# View an image to make sure everything went well
which_one = 5
plt.imshow(train_images[which_one].reshape((28,28)));
```
%% Output
%% Cell type:markdown id:textile-tower tags:
## 2. Radial Basis Function (RBF) networks
%% Cell type:markdown id:acquired-malaysia tags:
For our task, we will be using Radial Basis Function (RBF) Networks as our neural network model.
The pipeline, which is presented in the image below, consists of two layers. The first employs non-linear functions $g_1(\mathbf{x};\boldsymbol{\mu}): \mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times c}$.
The second is a linear layer, represented by a matrix of weights $\mathbf{W} \in \mathbb{R}^{c \times K}$, which maps the output of the previous layer to class scores; its role is to predict labels.
The pipeline proceeds in the following steps:
i) Choose a set of $c$ points $\boldsymbol{\mu}_j\in [0,1]^d$.
ii) Compute $g_1(\mathbf{x}_i;\boldsymbol{\mu}_j) = e^{-\frac{\|\mathbf{x}_i-\boldsymbol{\mu}_j\|^2}{\sigma^2}}=a_{ij}$ for all possible pairs of $i$ and $j$. Here $\sigma$ is a hyperparameter that controls the width of the gaussian.
iii) Compute the predicted labels $g_2(\mathbf{a}_i)= \mathbf{a}_i^{\top}\mathbf{W}= \hat{\mathbf{y}}_i$. Here $\mathbf{a}_i \in \mathbb{R}^c$ are the outputs of the layer $g_1$ for an input image $i$. $\hat{\mathbf{y}}_i$ is a row vector and $\hat{y}_{ij} = \sum_{m=1}^{c}a_{im}w_{mj}$, $j\in\{1,...,K\}$.
![RBF_NN.png](images/RBF_NN.png)
%% Cell type:markdown id:f762f16e-4a88-43a8-813f-c93c62407c72 tags:
Intuitively, the first layer of the RBF network can be viewed as matching the input data with a set of prototypes (templates) through a gaussian whose width is determined by $\sigma$. The second layer performs a weighted combination of the matching scores of the previous layer to determine the predicted label for a given point.
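As a tiny illustration of this template-matching view (a sketch with made-up 2-D points, not MNIST images):
``` python
import numpy as np

x = np.array([0.9, 0.1])            # input point
templates = np.array([[1.0, 0.0],   # prototype close to x
                      [0.0, 1.0]])  # prototype far from x
sigma = 0.5
dists = np.linalg.norm(templates - x, axis=1)
print(np.exp(-dists**2 / sigma**2))  # the closer prototype gets the higher score
```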
%% Cell type:markdown id:126fdb8e-9607-4f93-9726-692e5ed8bb91 tags:
**1.** For hyperparameters $c$ and $\sigma$ of your choice, select $c$ prototypes and obtain the output of the first layer of the RBF network. The prototypes can simply be random images from your training set.
The following functions might be helpful:
- [pairwise_distances (from scikit-learn)](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)
- [random.choice (from numpy)](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html)
You can (optionally) perform an additional normalization step on the activations using the [softmax](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html) function.
%% Cell type:code id:0e4ab127-352e-49c3-9b72-9e69bfc8b4ba tags:otter_assign_solution_cell
``` python
def get_rand_centers(num_centers, imgs):
"""
Sample num_centers (randomly) from imgs
Parameters
----------
num_centers : number of samples
imgs : matrix to sample rows from
Returns
-------
The samples matrix
"""
# BEGIN SOLUTION
num_imgs = imgs.shape[0]
if num_centers > num_imgs or num_centers < 1:
raise ValueError("Invalid number of centers requested")
rand_center_indices = np.random.choice(num_imgs, size=num_centers, replace=False)
rand_centers = imgs[rand_center_indices]
return rand_centers
# END SOLUTION
def get_activations(imgs, rand_centers, sigma, softmax_norm=False):
"""
Computes the activations of the images vs. the sample centers
Parameters
----------
imgs: images matrix to compute activations for
rand_centers: matrix of centers points
sigma: parameter of the gaussian kernel
softmax_norm: if true, perform softmax activation on the activations
Returns
-------
The activation matrix A
"""
# BEGIN SOLUTION
# compute activations of the first layer
if imgs.shape[1] != rand_centers.shape[1]:
raise ValueError("Size mismatch between images/centers")
distmat = pairwise_distances(imgs, rand_centers)
activations = np.exp(-distmat*distmat/(sigma*sigma))
# optional normalization step
if softmax_norm:
activations = softmax(activations, axis=1)
return activations
# END SOLUTION
```
%% Cell type:code id:19e1cca2-bb2c-40c7-b64a-e14c3b42e54d tags:otter_assign_solution_cell
``` python
#pick random centers
number_of_centers = 200
rand_centers = get_rand_centers(number_of_centers, train_images)
sigma = 10
activations = get_activations(train_images, rand_centers, sigma)
```
%% Cell type:code id:6bec41d9 tags:
``` python
grader.check("q1")
```
%% Cell type:markdown id:institutional-thompson tags:
## 3. Training the network
%% Cell type:markdown id:disciplinary-present tags:
To make things easier, we will fix the parameters $\boldsymbol{\mu}$ and $\sigma$ of the network, i.e., we decide their values beforehand and they remain constant throughout training and testing of the model. Therefore, the only trainable parameters are going to be the weights of the second layer.
To train the model, we are going to use the mean squared loss function that we mentioned earlier. For a training dataset with $n$ images we have
$$ \mathcal{L}(\text{training data}, \text{training labels}) = \frac{1}{2n}\sum_{i=1}^n\mathcal{L}(\hat{\mathbf{y}}_i,\mathbf{y}_i) = \frac{1}{2n}\sum_{i=1}^n ||(\hat{\mathbf{y}}_{i} - \mathbf{y}_{i})||^2.$$
There are two ways of tuning these weights:
i) Backpropagation.
ii) Solve a linear system.
%% Cell type:markdown id:f0ef20c5-0cc0-4a7d-a80b-8acfa207f82e tags:
### 3.1 Training with backpropagation
Backpropagation relies on [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent#Description). The goal is to update the trainable parameters of the network by "moving them" in the direction that decreases the loss function.
In our case, the weights $w_{kl}$ are updated in the following manner
$$ w_{kl}' = w_{kl}- \gamma \frac{\partial\mathcal{L}(\text{training data}, \text{training labels})}{\partial w_{kl}}, $$
where $\gamma$ is a hyper-parameter called the learning rate. The gradient of the loss points in the direction of steepest ascent, hence we update the weights in the opposite direction, i.e. along the negative gradient.
---
%% Cell type:markdown id:189e8b8e-a909-4f9e-950a-e0f65a48d70c tags:
<!-- BEGIN QUESTION -->
2. For the mean squared error loss, what is the gradient of the loss with respect to the weights $w_{kl}$ of the network?
%% Cell type:markdown id:0df8dee2 tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:9813a207-46ae-4891-9b43-1b59a2a3717c tags:otter_assign_solution_cell
First let us expand the expression of the loss:
$\sum_{i=1}^n \|\hat{\mathbf{y}}_{i} - \mathbf{y}_{i}\|^2 = \sum_{i=1}^n \sum_{p=1}^K(\hat{y}_{ip} - y_{ip})^2$
The gradient of the loss with respect to the weights for a given input image can be calculated as
$$\frac{\partial \mathcal{L}(\hat{\mathbf{y}}_i,\mathbf{y}_i)}{\partial w_{kl}} =\frac{\partial \sum_{p=1}^K (\hat{y}_{ip}-y_{ip})^2}{\partial w_{kl}}= \frac{ \sum_{p=1}^K \partial(\hat{y}_{ip}-y_{ip})^2}{\partial w_{kl}} = \sum_{p=1}^K 2(\hat{y}_{ip}-y_{ip})\frac{ \partial\hat{y}_{ip}}{\partial w_{kl}}. $$
If we expand $\hat{y}_{ip} = \sum_{m=1}^{c}a_{im}w_{mp}$ in the partial derivative, this leads to :
$\frac{\partial \mathcal{L}(\hat{\mathbf{y}}_i,\mathbf{y}_i)}{\partial w_{kl}} = \sum_{p=1}^K 2(\hat{y}_{ip}-y_{ip})\frac{ \partial\sum_{m=1}^{c}a_{im}w_{mp}}{\partial w_{kl}} = \sum_{p=1}^K 2(\hat{y}_{ip}-y_{ip})\sum_{m=1}^{c}a_{im}\frac{ \partial w_{mp}}{\partial w_{kl}}$.
The term $\frac{ \partial w_{mp}}{\partial w_{kl}}$ is always zero except when $p=l$ and $m=k$, therefore:
$\frac{\partial \mathcal{L}(\hat{\mathbf{y}}_i,\mathbf{y}_i)}{\partial w_{kl}} = 2(\hat{y}_{il}-y_{il})a_{ik}$.
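Averaging over the $n$ training images and including the $\frac{1}{2n}$ factor of the full loss, the gradient used for training is
$$\frac{\partial\mathcal{L}(\text{training data}, \text{training labels})}{\partial w_{kl}} = \frac{1}{n}\sum_{i=1}^n(\hat{y}_{il}-y_{il})\,a_{ik},$$
or, collecting all entries in matrix form with activations $\mathbf{A}$, weights $\mathbf{W}$ and one-hot labels $\mathbf{Y}$,
$$\nabla_{\mathbf{W}}\mathcal{L} = \frac{1}{n}\,\mathbf{A}^\top(\mathbf{A}\mathbf{W}-\mathbf{Y}).$$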
%% Cell type:markdown id:injured-mother tags:
<!-- END QUESTION -->
3. Train the weights of the linear layer using stochastic gradient descent. For $p$ iterations (called epochs), you have to update each weight $w_{kl}$ of the network once for each image, by computing the gradient of the loss with respect to that weight.
NB: if you implement gradient computation naively, it might be very slow. Consider using [numpy.outer](https://numpy.org/doc/stable/reference/generated/numpy.outer.html) to speed up computation.
%% Cell type:code id:ddb1a6cc-c9d5-4561-8c0d-0f0c5b396ae1 tags:
``` python
# Helper function to compute the loss and count correct predictions
def get_predictions_loss(activations, weights, labels, int_labels):
    predictions = activations@weights
    # a prediction is correct when the largest output matches the integer label
    num_correct_predictions = ((predictions.argmax(1) - int_labels)==0).sum()
    # mean over images of the squared error ||y_hat - y||^2
    loss = ((predictions - labels)*(predictions - labels)).sum(1).mean()
    return loss, num_correct_predictions
```
%% Cell type:code id:e365cb99-d2c3-4b98-a814-e5c5485f3967 tags:otter_assign_solution_cell
``` python
# compute the gradient for a single input
def compute_gradient(activation, weights, train_label):
"""
Computes gradients of the weight for a single activation
Parameters
----------
activation : vector containing the activation of the current image
weights: current weights
train_label: label of the current image
Returns
-------
The gradient to update the weights
"""
    # BEGIN SOLUTION
    prediction = activation@weights # denoted as \hat{y} in the text
    delta = prediction - train_label
    # gradient of ||y_hat - y||^2 w.r.t. the weights: grad[k, l] = 2 * delta[l] * activation[k]
    grad = 2*np.outer(delta, activation).T
    # the above line is a much faster shortcut for:
    #grad = np.zeros(weights.shape)
    #for k in range(grad.shape[0]):
    #    grad[k, :] = 2*delta*activation[k]
    return grad
# END SOLUTION
```
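%% Cell type:markdown id:careful-check tags:
Before training, it can be reassuring to sanity-check the analytic gradient against a finite difference. This is a minimal sketch on fake data; it assumes the `compute_gradient` above and the `number_of_centers` and `num_classes` variables from earlier cells, and perturbs a single entry of the weight matrix:
%% Cell type:code id:careful-check-code tags:
``` python
# Compare compute_gradient with a central finite difference of the
# per-image squared error ||a @ W - y||^2 at one entry (k, l) of W.
rng = np.random.default_rng(0)
a = rng.standard_normal(number_of_centers)   # fake activation vector
y = np.zeros(num_classes); y[3] = 1          # fake one-hot label
W = rng.standard_normal((number_of_centers, num_classes))

analytic = compute_gradient(a, W, y)

eps, k, l = 1e-6, 5, 3
W_plus, W_minus = W.copy(), W.copy()
W_plus[k, l] += eps
W_minus[k, l] -= eps
loss = lambda M: ((a @ M - y)**2).sum()
numeric = (loss(W_plus) - loss(W_minus)) / (2 * eps)
print(np.isclose(analytic[k, l], numeric))   # expect True
```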
%% Cell type:code id:46e31c54-3682-4124-8a08-f675fba974f7 tags:otter_assign_solution_cell
``` python
# Initial values for hyperparams. Feel free to experiment with them.
weights = (1/28)*np.random.randn(number_of_centers, num_classes)
epochs = 5 # you should train for more epochs !
learning_rate = 0.1
#Backpropagation with SGD
for k in range(epochs):
for counter, activation in enumerate(activations):
gradient = compute_gradient(activation, weights, train_labels[counter])/(2*train_set_size) # SOLUTION
weights = weights - learning_rate*gradient # SOLUTION
loss_train, num_correct_predictions_train = get_predictions_loss(activations, weights, train_labels, int_labels_train)
print("Loss:", loss_train)
print("Number of correct predictions:", num_correct_predictions_train)
```
%% Cell type:code id:957d63fb tags:
``` python
grader.check("q3")
```
%% Cell type:markdown id:criminal-imperial tags:
We can now check how well your network does on the test set and print its accuracy.
%% Cell type:code id:civil-riding tags:
``` python
def get_accuracy(predictions, int_labels, set_size):
num_correct_predictions = ((predictions.argmax(1) - int_labels)==0).sum()
return num_correct_predictions/set_size
test_activations = get_activations(test_images, rand_centers, sigma)
test_predictions = test_activations@weights
print(f"The accuracy on the test set is: {get_accuracy(test_predictions, int_labels_test, test_set_size)*100} %")
```
%% Output
The accuracy on the test set is: 50.73 %
%% Cell type:markdown id:promising-funds tags:
### 3.2 Solving the linear system
%% Cell type:markdown id:f8234141-0eeb-49da-8d37-892a6416e41d tags:
Since we only have one weight matrix to tune, we can avoid learning with backpropagation entirely. For simplicity, consider a one-dimensional binary label $y_i$ for each data point. The squared loss over the whole dataset is then
$$ \sum_{i=1}^n (\hat{y}_{i} - y_{i})^2 = \|\mathbf{A}\mathbf{w} - \mathbf{y}\|^2.$$ Here $\mathbf{A} \in \mathbb{R}^{n \times c}$ is the matrix that contains the outputs (activations) of the first layer. From a linear algebra perspective, we are looking for a vector $\mathbf{w}$ (a weight matrix $\mathbf{W}$ in the multi-class case) that solves the linear system $\mathbf{A}\mathbf{w} = \mathbf{y}$.
---
%% Cell type:markdown id:4e691348-afe4-4021-a42b-3985a4e9999e tags:
<!-- BEGIN QUESTION -->
4. Can we find solutions to this system? Justify your answer and explain how.
%% Cell type:markdown id:cf9e153e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:express-office tags:otter_assign_solution_cell
The system is overdetermined: we have more equations (one per data point) than unknown variables (one per template/center), so an exact solution does not exist in general. We can find approximate solutions with a [least squares](https://en.wikipedia.org/wiki/Overdetermined_system#Approximate_solutions) approach, i.e., using the pseudoinverse.
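Concretely, minimizing $\|\mathbf{A}\mathbf{w}-\mathbf{y}\|^2$ leads to the normal equations
$$\mathbf{A}^\top\mathbf{A}\,\mathbf{w} = \mathbf{A}^\top\mathbf{y},$$
whose minimum-norm solution is $\mathbf{w} = \mathbf{A}^{+}\mathbf{y}$, with $\mathbf{A}^{+}$ the Moore-Penrose pseudoinverse; when $\mathbf{A}^\top\mathbf{A}$ is invertible this reduces to $\mathbf{A}^{+} = (\mathbf{A}^\top\mathbf{A})^{-1}\mathbf{A}^\top$.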
%% Cell type:markdown id:subsequent-exercise tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
5. Based on your answer above, compute the weights of the neural network that best classify the data points of the training set.
%% Cell type:markdown id:6e7fb43a tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:code id:collected-renaissance tags:otter_assign_solution_cell
``` python
#calculate the weights of the linear layer
weights_lsq = np.linalg.pinv(activations)@train_labels # SOLUTION
```
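%% Cell type:markdown id:steady-lstsq tags:
As a side note, the same least-squares solution can be obtained without forming the pseudoinverse explicitly. This is a minimal sketch, assuming the `activations` and `train_labels` from the cells above; `np.linalg.lstsq` solves $\min_{\mathbf{w}}\|\mathbf{A}\mathbf{w}-\mathbf{y}\|^2$ directly and is often the numerically preferred route:
%% Cell type:code id:steady-lstsq-code tags:
``` python
# Solve the least-squares problem directly instead of computing pinv(A) @ y.
weights_lstsq, residuals, rank, sv = np.linalg.lstsq(activations, train_labels, rcond=None)
# Both routes solve the same problem, so the solutions should agree closely.
print(np.allclose(weights_lstsq, weights_lsq))
```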
%% Cell type:markdown id:perfect-intersection tags:otter_assign_solution_cell
Using the weights you computed, classify the points in the training set and print the accuracy.
%% Cell type:code id:chinese-foster tags:otter_assign_solution_cell
``` python
#predict the labels of each image in the training set and compute the accuracy
train_prediction_lsq = activations@weights_lsq # SOLUTION
print(f"The accuracy on the training set is: {get_accuracy(train_prediction_lsq, int_labels_train, train_set_size)*100} %")
```
%% Output
The accuracy on the training set is: 91.43 %
%% Cell type:markdown id:determined-start tags:otter_assign_solution_cell
Using the weights you computed, classify the points in the test set and print the accuracy.
%% Cell type:code id:young-invitation tags:otter_assign_solution_cell
``` python
#calculate the activations of the test set
test_activations = get_activations(test_images, rand_centers, sigma)
```
%% Cell type:code id:naked-dover tags:otter_assign_solution_cell
``` python
#predict the accuracy on the test set
test_predictions_lsq = test_activations@weights_lsq # SOLUTION
print(f"The accuracy on the test set is: {get_accuracy(test_predictions_lsq, int_labels_test, test_set_size)*100} %")
```
%% Output
The accuracy on the test set is: 91.75 %
%% Cell type:markdown id:handmade-warrant tags:
<!-- END QUESTION -->
<!-- BEGIN QUESTION -->
### 6. **Open ended**: On the choice of templates.
Suggest a different or more refined way to select templates for the RBF network and implement it. Compare it with your original approach, and check how it behaves with both the backpropagation and linear-system solutions.
%% Cell type:markdown id:4b05dc5e tags:otter_answer_cell
_Type your answer here, replacing this text._
%% Cell type:markdown id:c84c3a52-26bd-49d5-b92c-724e6f309b82 tags:otter_assign_solution_cell
Here are a few things that can be tried in order to better select the template images:
- split up the images by label and sample an equal number of images from each class. This ensures that no class is over- or under-represented among the templates/centers (a sketch of this idea is given after this list).
- split up the images by label and further subdivide the images of each class into a few groups. Compute the average image of each group, so that you obtain a few "average images" per class, and use those as centroids/templates. (All of this averaging is easily done with matrix operations.)
- filter the images (similar to how it was done in previous notebooks) and then sample extra centers from the filtered data. Use those as additional centers.
- apply standard clustering techniques such as [K-Means](https://en.wikipedia.org/wiki/K-means_clustering) or [Spectral Clustering](https://en.wikipedia.org/wiki/Spectral_clustering) and use the resulting cluster centroids as templates.
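%% Cell type:markdown id:balanced-centers tags:
Below is a minimal sketch of the first idea (class-balanced sampling). The helper name `get_balanced_centers` is hypothetical, and the snippet assumes the variables `train_images`, `train_labels`, `int_labels_train`, `number_of_centers`, `num_classes`, `sigma` and `train_set_size` defined earlier in the notebook:
%% Cell type:code id:balanced-centers-code tags:
``` python
def get_balanced_centers(num_centers, imgs, int_labels, num_classes):
    """Sample (approximately) the same number of centers from every class."""
    per_class = num_centers // num_classes
    centers = []
    for c in range(num_classes):
        class_idx = np.flatnonzero(int_labels == c)  # indices of the images of class c
        chosen = np.random.choice(class_idx, size=per_class, replace=False)
        centers.append(imgs[chosen])
    return np.vstack(centers)

balanced_centers = get_balanced_centers(number_of_centers, train_images, int_labels_train, num_classes)
balanced_activations = get_activations(train_images, balanced_centers, sigma)
# reuse the linear-system solution with the new centers
weights_bal = np.linalg.pinv(balanced_activations)@train_labels
pred_bal = balanced_activations@weights_bal
print(f"Training accuracy with balanced centers: {get_accuracy(pred_bal, int_labels_train, train_set_size)*100} %")
```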
%% Cell type:markdown id:8734c27e tags:
<!-- END QUESTION -->