{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lab 7: Non-negative Matrix Factorization"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The goal of this lab session is to code a NMF algorithm and use it in some applications.\n",
    "\n",
    "You have to send the filled notebook named **\"L7_familyname1_familyname2.ipynb\"** (groups of 2) by email to aml.centralesupelec.2019@gmail.com before 23:59 on December 5, 2018 and put **\"AML-L7\"** in the subject. \n",
    "\n",
    "We begin with the standard imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "sns.set_context('poster')\n",
    "sns.set_color_codes()\n",
    "plot_kwds = {'alpha' : 0.25, 's' : 80, 'linewidths':0}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NMF"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Non-negative Matrix Factorization is a model where a matrix V is factorized into two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to interpret.\n",
    "\n",
    "Fill in the following class that implements a NMF by multiplicative updates using the Frobenius norm or the Kullback-Leiber divergence as loss function (implement both), you can add more methods if needed. Try 10 different random initializations and choose the best one."
   ]
  },
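  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a reminder, here is a sketch of the standard multiplicative update rules (due to Lee and Seung), assuming the factorization is written $V \\approx WH$. In the formulas below, $\\odot$ and $\\oslash$ denote element-wise product and division, fraction bars are also element-wise, and $\\mathbf{1}$ is an all-ones matrix with the same shape as $V$ (this notation is introduced here for illustration).\n",
    "\n",
    "For the Frobenius loss $\\|V - WH\\|_F^2$:\n",
    "\n",
    "$$H \\leftarrow H \\odot \\frac{W^\\top V}{W^\\top W H}, \\qquad W \\leftarrow W \\odot \\frac{V H^\\top}{W H H^\\top}$$\n",
    "\n",
    "For the Kullback-Leibler divergence $D(V \\,\\|\\, WH) = \\sum_{i,j} \\left( V_{ij} \\log \\frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \\right)$:\n",
    "\n",
    "$$H \\leftarrow H \\odot \\frac{W^\\top \\left( V \\oslash (WH) \\right)}{W^\\top \\mathbf{1}}, \\qquad W \\leftarrow W \\odot \\frac{\\left( V \\oslash (WH) \\right) H^\\top}{\\mathbf{1} H^\\top}$$\n",
    "\n",
    "In practice it is common to add a small positive constant to each denominator to avoid division by zero; whether you do so, and with what value, is an implementation choice."
   ]
  },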
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class my_NMF():\n",
    "    \n",
    "    def __init__(self, n_components, loss, epsilon, max_iter = 60):\n",
    "        '''\n",
    "        Attributes:\n",
    "        \n",
    "        n_components_ : integer\n",
    "            the unknown dimension of W and H\n",
    "        max_iter_: integer\n",
    "            maximum number of iterations\n",
    "        epsilon_: float\n",
    "            convergence\n",
    "        loss_ = {\"Frobenius\", \"KL\"}\n",
    "        w_: np.array\n",
    "            W Matrix factor\n",
    "        H_: np.array\n",
    "            H Matrix factor\n",
    "        '''\n",
    "        self.n_components_ = n_components\n",
    "        self.max_iter_ = max_iter\n",
    "        self.loss_ = loss\n",
    "        self.epsilon_ = epsilon\n",
    "        self.W_ = None\n",
    "        self.H_ = None\n",
    "        \n",
    "    def fit_transform(self, X):\n",
    "        \"\"\" Find the factor matrices W and H\n",
    "        \n",
    "        Parameters:\n",
    "        -----------\n",
    "        X: (n, p) np.array\n",
    "            Data matrix\n",
    "        \n",
    "        Returns:\n",
    "        -----\n",
    "        self\n",
    "        \"\"\"        \n",
    "        # TODO:\n",
    "        # initialize both matrices\n",
    "        #            random(0, 1)\n",
    "\n",
    "        # While not(convergence):\n",
    "        #     Update W\n",
    "        #     Update H\n",
    "        \n",
    "        # Return self"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Bonus (not graded)**: Implement the regularized version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Applications"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### First application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the first application you are going to analyse the following data to give an interpretation of the factorization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import fetch_olivetti_faces\n",
    "\n",
    "dataset = fetch_olivetti_faces(shuffle=True)\n",
    "\n",
    "faces = dataset.data\n",
    "image_shape = (64, 64)\n",
    "\n",
    "n_samples, n_features = faces.shape\n",
    "\n",
    "def plot_faces(title, images, image_shape, n_col=5, n_row=5, cmap=plt.cm.gray):\n",
    "    plt.figure(figsize=(2. * n_col, 2.26 * n_row))\n",
    "    plt.suptitle(title, size=16)\n",
    "    for i, comp in enumerate(images):\n",
    "        plt.subplot(n_row, n_col, i + 1)\n",
    "        vmax = max(comp.max(), -comp.min())\n",
    "        plt.imshow(comp.reshape(image_shape), cmap=cmap,\n",
    "                   interpolation='nearest',\n",
    "                   vmin=-vmax, vmax=vmax)\n",
    "        plt.xticks(())\n",
    "        plt.yticks(())\n",
    "    plt.subplots_adjust(0.01, 0.05, 0.99, 0.93, 0.04, 0.)\n",
    "    \n",
    "plot_faces(\"Some faces\", faces[:25], image_shape)\n",
    "\n",
    "faces.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Apply your NMF algorithm for this dataset and plot the approximated face pictures."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Plot the $W$ matrix as images in a $(\\sqrt{r}, \\sqrt{r})$ grid\n",
    "- Choose one face, plot its corresponding weights (in $H$) in a grid  and explain the interpretation of both factor matrices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# TODO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Second application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Import the 20newsgroups dataset (from sklearn.datasets import fetch_20newsgroups_vectorized) that contains a collection of ~18,000 newsgroup documents from 20 different newsgroups.\n",
    "\n",
    "Model the topics present in a subsample with NMF. Print the most common words of each topic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# TODO"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}