# Lab 3: Stochastic Gradient Descent

The goal of this lab session is to code an optimization algorithm that minimizes the penalized loss function of the logistic regression model.

You have to send the filled notebook named **"L3_familyname1_familyname2.ipynb"** (groups of 2) by email to aml.centralesupelec.2019@gmail.com by October 17, 2019. Please put **"AML-L3"** in the subject. 

We begin with the standard imports:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
import pandas as pd

We import the dataset that we are going to use, an Indian dataset whose last column gives the diabetes status of each patient:

In [None]:
from sklearn import model_selection

diabetes_data = pd.read_csv("diabetes_data.csv", sep=",")

diabetes_train, diabetes_test = model_selection.train_test_split(diabetes_data)
diabetes_train_x = diabetes_train.iloc[:, :-1].values
# Copy the labels before recoding them in place from {0, 1} to {-1, 1}
diabetes_train_y = diabetes_train.iloc[:, -1].values.copy()
diabetes_train_y[diabetes_train_y == 0] = -1

diabetes_test_x = diabetes_test.iloc[:, :-1].values
diabetes_test_y = diabetes_test.iloc[:, -1].values.copy()
diabetes_test_y[diabetes_test_y == 0] = -1

## Logistic Regression



Today we'll be moving from linear regression to logistic regression, one of the simplest ways to deal with a classification problem. Instead of fitting a line, logistic regression models the probability that the outcome is 1 given the values of the predictors. To do this, we need a function that maps a linear combination of the predictors to a value between 0 and 1. Many functions can do that, but the logistic function is the most common choice:

$$f(z) = \frac{1}{1+\exp(-z)}.$$
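As a minimal sketch, the logistic function can be evaluated with NumPy as shown below. The function name `logistic` and the `np.logaddexp` formulation are illustrative choices, not something the lab prescribes. The formulation is used here only because it avoids overflow in `exp` when `z` is very negative.

```python
import numpy as np

def logistic(z):
    """Evaluate f(z) = 1 / (1 + exp(-z)) elementwise."""
    z = np.asarray(z, dtype=float)
    # np.logaddexp(0, -z) computes log(1 + exp(-z)) without overflow,
    # and f(z) = exp(-log(1 + exp(-z))).
    return np.exp(-np.logaddexp(0.0, -z))

# The output stays in [0, 1], even for extreme inputs.
values = logistic(np.array([-1000.0, 0.0, 1000.0]))
print(values)
```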

To predict the class of our observations, we have to minimize the corresponding loss function. As we are in a high-dimensional context, we add an $\ell_2$ regularization term to the model:

$$L(\textbf{w}) = \sum_{i=1}^n \log(1+\exp(-y_i\textbf{w}^Tx_i))+\frac{\lambda}{2} \| \textbf{w} \|^2,$$

where $x_i$ is the vector of features for the observation $i$ and $y_i \in \{-1, 1\}$ is the class label. 
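Differentiating the loss term by term gives $\nabla L(\textbf{w}) = -\sum_{i=1}^n \frac{y_i x_i}{1+\exp(y_i\textbf{w}^Tx_i)} + \lambda \textbf{w}$. The sketch below evaluates the loss and this gradient on random toy data and compares the gradient with finite differences. The names `penalized_loss`, `penalized_grad` and `lam` are hypothetical, and the toy data is generated only for the check.

```python
import numpy as np

def penalized_loss(w, X, y, lam):
    """L(w) = sum_i log(1 + exp(-y_i w^T x_i)) + lam / 2 * ||w||^2."""
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins)) + 0.5 * lam * np.dot(w, w)

def penalized_grad(w, X, y, lam):
    """Gradient of penalized_loss with respect to w."""
    margins = y * (X @ w)
    # 1 / (1 + exp(m)) computed stably as exp(-log(1 + exp(m)))
    weights = np.exp(-np.logaddexp(0.0, margins))
    return -(X.T @ (y * weights)) + lam * w

# Toy data (assumption: random, for the gradient check only)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(20, 3))
y_toy = rng.choice([-1.0, 1.0], size=20)
w_toy = rng.normal(size=3)
lam = 0.5

# Central finite differences along each coordinate direction
eps = 1e-6
num_grad = np.array([
    (penalized_loss(w_toy + eps * e, X_toy, y_toy, lam)
     - penalized_loss(w_toy - eps * e, X_toy, y_toy, lam)) / (2 * eps)
    for e in np.eye(3)
])
ana_grad = penalized_grad(w_toy, X_toy, y_toy, lam)
print(np.max(np.abs(num_grad - ana_grad)))
```

A stochastic variant would evaluate the same sum over a mini-batch of rows instead of all $n$ observations.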


We first use the `sklearn` implementation:

In [None]:
from sklearn.linear_model import LogisticRegression
# C is the inverse of the regularization strength (C = 1 / lambda in the loss above)
model = LogisticRegression(penalty="l2", C=2)
model.fit(diabetes_train_x, diabetes_train_y)
y_pred = model.predict(diabetes_test_x)

and we compute the accuracy score to evaluate the model performance:

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(diabetes_test_y, y_pred)

### Assignment

Implement from scratch your own logistic regression model with stochastic gradient descent optimization. 

- Fill in the class

- Display the evolution of the cost function along the iterations. Do this for several strategies for setting the learning rate

- Try the different acceleration strategies

- Train the model on the training set and evaluate its performance on the test set
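The lab does not prescribe which learning-rate strategies to compare. As one possible starting point, the sketch below defines three common schedules as functions of the iteration counter `t`. The names `constant_lr`, `inverse_time_lr`, `inverse_sqrt_lr` and the `decay` parameter are hypothetical.

```python
import numpy as np

def constant_lr(lr0):
    """Same step size at every iteration."""
    return lambda t: lr0

def inverse_time_lr(lr0, decay=0.01):
    """Step size decreasing as lr0 / (1 + decay * t)."""
    return lambda t: lr0 / (1.0 + decay * t)

def inverse_sqrt_lr(lr0):
    """Step size decreasing as lr0 / sqrt(1 + t)."""
    return lambda t: lr0 / np.sqrt(1.0 + t)

schedules = {
    "constant": constant_lr(0.1),
    "inverse time": inverse_time_lr(0.1),
    "inverse sqrt": inverse_sqrt_lr(0.1),
}

# At iteration t, an SGD update would read: w = w - schedule(t) * minibatch_gradient
for name, schedule in schedules.items():
    print(name, [round(float(schedule(t)), 4) for t in (0, 10, 100)])
```

Plotting the recorded cost values for each schedule on the same axes makes the comparison between strategies easier to read.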

In [None]:
class StochasticLogisticRegression():
    """ Class for logistic regression:

    Attributes:
    -----------
    coef_: 1-dimensional np.array
        coefficients
    alpha_: float
        regularization parameter
    lr_: float
        the learning rate
    bsize_: integer
        the size of the mini-batch >=1
    coef_history_: list
        the list of all visited betas
    f_history_: list
        the list of all evaluations of the cost function at the visited betas
    """
    def __init__(self, alpha):
        self.coef_ = None
        self.alpha_ = alpha
        self.lr_ = None
        self.bsize_ = None
        self.coef_history_ = []
        self.f_history_ = []

    def logistic(self, z):
        # logistic function
        raise NotImplementedError  # TODO: to be filled in

    def fit(self, X, y, start, lr=1e-1, bsize=50, max_iter=500):
        """ Fit the data (X, y).

        Parameters:
        -----------
        X: (num_samples, num_features) np.array
            Design matrix
        y: (num_samples, ) np.array
            Output vector
        start: 1-dimensional np.array
            Starting point for the coefficients
        lr: float
            The learning rate
        bsize: integer
            The size of the mini-batch >=1
        max_iter: integer
            The maximum number of iterations

        Note:
        -----
        Updates self.coef_
        """

        def f_lr(beta):
            r'''evaluate F=\sum_{i=1}^n f_i at beta'''
            raise NotImplementedError  # TODO: to be filled in

        raise NotImplementedError  # TODO: to be filled in

    def predict(self, X):
        """ Make binary predictions for data X.

        Parameters:
        -----------
        X: (num_samples, num_features) np.array
            Design matrix

        Returns:
        -----
        y_pred: (num_samples, ) np.array
            Predictions (-1 or 1)
        """
        raise NotImplementedError  # TODO: to be filled in

Apply to the data

Comment on the results

Implement only one acceleration method and compare the results
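The lab leaves the choice of acceleration method open. As an illustration only, the sketch below shows the classical momentum (heavy-ball) update on a small deterministic quadratic. The function `momentum_descent`, its parameters and the test problem are assumptions. In the stochastic setting, `grad` would return a mini-batch gradient of the penalized loss instead.

```python
import numpy as np

def momentum_descent(grad, start, lr=0.05, momentum=0.9, max_iter=500):
    """Gradient descent with heavy-ball momentum.

    grad: callable returning the (possibly stochastic) gradient at w
    start: starting point, 1-dimensional array-like
    """
    w = np.asarray(start, dtype=float).copy()
    velocity = np.zeros_like(w)
    for _ in range(max_iter):
        # The velocity accumulates an exponentially decaying sum of past gradients
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w

# Toy problem (assumption): f(w) = 0.5 * w^T A w, whose minimizer is w = 0
A = np.diag([1.0, 10.0])
w_final = momentum_descent(lambda w: A @ w, start=[1.0, 1.0])
print(np.linalg.norm(w_final))
```

Setting `momentum=0.0` recovers plain gradient descent, which gives a direct baseline for the comparison.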