{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 1: Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to the advanced Machine Learning Course.\n", "\n", "The objective of this lab session is to code a few regression algorithms and to apply them to synthetic and real datasets.\n", "\n", "We begin with the standard imports:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# import libraries\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns; sns.set()\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Simple Linear Regression\n", "\n", "We will start with the most familiar linear regression, a straight-line fit to data.\n", "A straight-line fit is a model of the form\n", "$$\n", "y = ax + b\n", "$$\n", "where $a$ is commonly known as the *slope*, and $b$ is commonly known as the *intercept*.\n", "\n", "Consider the following data, which is scattered about a line with a slope of 2 and an intercept of -5:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Simulate data\n", "rng = np.random.RandomState(1)\n", "x = rng.rand(30)\n", "slope, intercept, noise = 2, -5, 0.1 * rng.randn(30)\n", "y = slope * x + intercept + noise\n", "\n", "# Plot data\n", "fig, ax = plt.subplots()\n", "ax.scatter(x, y)\n", "ax.set(title='Data',\n", "       xlabel='x', ylabel='y')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fill in the `MultivariateLinearRegression` class, whose `fit` method takes a matrix $X$ and a vector $y$ as input and stores the fitted vector of coefficients $\\beta$ in `coef_`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "class MultivariateLinearRegression():\n", "    # Class for least-squares linear regression\n", "\n", "    def __init__(self):\n", "        self.coef_ = None\n", "\n", "    def fit(self, X, y):\n", "        \"\"\" Fit the data (X, y).\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "        y: (num_samples, ) np.array\n", "            Output vector\n", "\n", "        Note:\n", "        -----\n", "        Updates self.coef_\n", "        \"\"\"\n", "        # Create a (num_samples, num_features+1) np.array X_aug whose first column\n", "        # is a column of all ones (so as to fit an intercept).\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        # Update self.coef_ with the least-squares solution\n", "        self.coef_ = np.linalg.pinv(X_aug.T @ X_aug) @ X_aug.T @ y\n", "\n", "    def predict(self, X):\n", "        \"\"\" Make predictions for data X.\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "\n", "        Returns:\n", "        --------\n", "        y_pred: (num_samples, ) np.array\n", "            Predictions\n", "        \"\"\"\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        y_pred = X_aug @ self.coef_\n", "        return y_pred" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Try your model on the data and plot the data points and the fitted line:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Train model\n", "model = MultivariateLinearRegression()\n", "model.fit(x[:, np.newaxis], y)\n", "\n", "# Visualize the linear fit\n", "xfit = np.linspace(0, 1, 1000)\n", "yfit = model.predict(xfit[:, np.newaxis])\n", "\n", "fig, ax = plt.subplots()\n", "ax.scatter(x, y, label='Truth')\n", "ax.plot(xfit, yfit, label='Linear Fit')\n", "ax.set(title='Data', \n", " xlabel='x', ylabel='y')\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Print the slope and the intercept:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True slope: 2\n", "True intercept: -5 \n", "\n", "Model slope: 1.9292055341290553\n", "Model intercept: -4.976046835178198\n" ] } ], "source": [ "print(f\"True slope: {slope}\")\n", "print(f\"True intercept: {intercept} \\n\")\n", "\n", "print(f\"Model slope: {model.coef_[1]}\")\n", "print(f\"Model intercept: {model.coef_[0]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the results are very close to the inputs, as we might hope.\n", "\n", "Of course our linear regression estimator is much more capable than this, however—in addition to simple straight-line fits, it can also handle multidimensional linear models of the form\n", "$$\n", "y = a_0 + a_1 x_1 + a_2 x_2 + \\cdots\n", "$$\n", "where there are multiple $x$ values.\n", "Geometrically, this is akin to fitting a plane to points in three dimensions, or fitting a hyper-plane to points in higher dimensions.\n", "\n", "The multidimensional nature of such regressions makes them more difficult to visualize, but we can see one of these fits in action by building a toy example:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model coefficients: [ 0.5 1.5 -2. 1. 
]\n", "True coefficients: [ 0.5 1.5 -2. 1. ]\n" ] } ], "source": [ "# Simulate data\n", "rng = np.random.RandomState(1)\n", "X = 3 * rng.rand(100, 3)\n", "coef = np.array([0.5, 1.5, -2., 1.])\n", "y = coef[0] + np.dot(X, coef[1:])\n", "\n", "# Fit model\n", "model.fit(X, y)\n", "\n", "# Assess quality-of-fit\n", "print(\"Model coefficients: \", model.coef_)\n", "print(\"True coefficients: \", coef)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here the $y$ data is constructed from three random $x$ values, and the linear regression recovers the coefficients used to construct the data (exactly, since no noise was added).\n", "\n", "In this way, we can use our estimator to fit lines, planes, or hyperplanes to our data.\n", "It still appears that this approach would be limited to strictly linear relationships between variables, but it turns out we can relax this as well.\n", "\n", "## 2. Basis Function Regression\n", "\n", "One trick you can use to adapt linear regression to nonlinear relationships between variables is to transform the data according to *basis functions*.\n", "\n", "The idea is to take our multidimensional linear model:\n", "$$\n", "y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + \\cdots\n", "$$\n", "and replace the variables $(x_1, x_2, x_3, ...)$ with *transformations* of them, for example $(x_1^2, x_1 x_2, x_3^2, ...)$.\n", "\n", "Notice that this is *still a linear model*: the linearity refers to the fact that the coefficients $a_n$ never multiply or divide each other.\n", "What we have effectively done is to create a *nonlinear* dictionary of terms, so that a linear model can fit more 'complicated' relationships between $x$ and $y$.\n", "\n", "### Polynomial basis functions\n", "\n", "Projecting the data onto polynomial basis functions is useful enough that it is built into Scikit-Learn, as the ``PolynomialFeatures`` transformer:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 2., 4., 8., 16., 32.],\n", " [ 3., 9., 27., 81., 
243.],\n", " [ 4., 16., 64., 256., 1024.]])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.preprocessing import PolynomialFeatures\n", "x = np.array([2, 3, 4])\n", "poly = PolynomialFeatures(5, include_bias=False) # degree 5; include_bias=False omits the constant column\n", "poly.fit_transform(x[:, None])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see here that the transformer has converted our one-dimensional array into an array with five columns by raising each value to the powers 1 through 5.\n", "This new, higher-dimensional data representation can then be plugged into a linear regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this transform, we can use the linear model to fit much more complicated relationships between $x$ and $y$.\n", "For example, here is a sine wave with noise:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Simulate data\n", "rng = np.random.RandomState(1)\n", "x = rng.rand(50)\n", "noise = 0.1 * rng.randn(50)\n", "y = 2 * np.sin(1.8*np.pi*x) + noise\n", "\n", "# Fit Model\n", "poly = PolynomialFeatures(degree=25, include_bias=False)\n", "polyx = poly.fit_transform(x[:, None]) # apply a polynomial transform to 1 feature\n", "polyxfit = poly.fit_transform(xfit[:, None])\n", "model.fit(polyx, y)\n", "yfit = model.predict(polyxfit)\n", "\n", "# Visualize \n", "fig, ax = plt.subplots()\n", "ax.scatter(x, y, label=\"Truth\")\n", "ax.plot(xfit, yfit, label=\"Fit\")\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try with different maximum degrees. Our linear model can provide an excellent fit to this non-linear data!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Regularization\n", "\n", "The introduction of basis functions into our linear regression makes the model much more flexible, but it also can very quickly lead to over-fitting and numeric problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ridge regression ($L_2$ Regularization)\n", "\n", "Perhaps the most common form of regularization is known as *ridge regression* or $L_2$ *regularization*, sometimes also called *Tikhonov regularization*.\n", "This proceeds by penalizing the sum of squares (2-norms) of the model coefficients; in this case, the penalty on the model fit would be \n", "$$\n", "P = \\alpha\\sum_{n=1}^N \\theta_n^2\n", "$$\n", "where $\\alpha$ is a free parameter that controls the strength of the penalty." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fill in the following class:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "class RidgeRegularization():\n", "    # Class for ridge (L2-regularized) least-squares regression\n", "\n", "    def __init__(self, alpha):\n", "        self.coef_ = None\n", "        self.alpha_ = alpha\n", "\n", "    def fit(self, X, y):\n", "        \"\"\" Fit the data (X, y).\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "        y: (num_samples, ) np.array\n", "            Output vector\n", "\n", "        Note:\n", "        -----\n", "        Updates self.coef_\n", "        \"\"\"\n", "        # Create a (num_samples, num_features+1) np.array X_aug whose first column\n", "        # is a column of all ones (so as to fit an intercept).\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        # Update self.coef_ with the ridge solution (X^T X + alpha I)^{-1} X^T y.\n", "        # Note that the identity matrix also penalizes the intercept here.\n", "        self.coef_ = np.linalg.inv(X_aug.T @ X_aug + self.alpha_ * np.eye(n_features+1)) @ X_aug.T @ y\n", "\n", "    def predict(self, X):\n", "        \"\"\" Make predictions for data X.\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "\n", "        Returns:\n", "        --------\n", "        y_pred: (num_samples, ) np.array\n", "            Predictions\n", "        \"\"\"\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        y_pred = X_aug @ self.coef_\n", "        return y_pred" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try the model on our data:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "alpha = 1e-4\n", "\n", "# Fit polynomial-basis regression with L2 regularization\n", "poly = PolynomialFeatures(300, include_bias=False)\n", "X_poly = poly.fit_transform(x[:, None])\n", "model = RidgeRegularization(alpha)\n", "model.fit(X_poly, y)\n", "\n", "pred = model.predict(poly.transform(xfit[:, None]))\n", "\n", "# Visualize\n", "fig, ax = plt.subplots()\n", "ax.scatter(x[:, np.newaxis], y, label='Truth')\n", "ax.plot(xfit, pred, label='Fit')\n", "ax.set(xlim=(-0.02, 1.02),\n", "       ylim=(-2.5, 2.5))\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The $\\alpha$ parameter is essentially a knob controlling the complexity of the resulting model.\n", "In the limit $\\alpha \\to 0$, we recover the standard linear regression result; in the limit $\\alpha \\to \\infty$, all model responses will be suppressed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lasso regression ($L_1$ Regularization)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "class LassoRegression():\n", "    # Class for lasso (L1-regularized) least-squares regression,\n", "    # fitted with the coordinate descent algorithm\n", "\n", "    def __init__(self, alpha):\n", "        self.coef_ = None\n", "        self.alpha_ = alpha\n", "\n", "    def soft_threshold(self, alpha, x):\n", "        # Shrink x towards zero by alpha; values in [-alpha, alpha] become 0\n", "        if x > alpha:\n", "            return x - alpha\n", "        elif x < -alpha:\n", "            return x + alpha\n", "        else:\n", "            return 0\n", "\n", "    def fit(self, X, y):\n", "        \"\"\" Fit the data (X, y).\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "        y: (num_samples, ) np.array\n", "            Output vector\n", "\n", "        Note:\n", "        -----\n", "        Updates self.coef_\n", "        \"\"\"\n", "        # Create a (num_samples, num_features+1) np.array X_aug whose first column\n", "        # is a column of all ones (so as to 
fit an intercept).\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features+1))\n", "        X_aug[:, 1:] = X\n", "        X = X_aug\n", "\n", "        # Start from zero coefficients and an intercept equal to the mean of y\n", "        self.coef_ = np.zeros((n_features+1, ))\n", "        self.coef_[0] = np.mean(y)\n", "\n", "        convergence = False\n", "\n", "        while not(convergence):\n", "            newcoef = np.copy(self.coef_)\n", "            # coordinate descent: update one coefficient at a time\n", "            for j in range(1, n_features + 1):\n", "                exclude_j = np.array(range(n_features + 1)) != j\n", "                partial_residuals = y - X[:, exclude_j] @ newcoef[exclude_j]\n", "                beta = self.soft_threshold(self.alpha_, X[:, j] @ partial_residuals)/(X[:, j]**2).sum()\n", "                newcoef[j] = beta\n", "            # the intercept is not penalized: set it to the mean residual\n", "            newcoef[0] = np.sum(y - (X[:, 1:] @ newcoef[1:])) / (X.shape[0])\n", "            # stop when the squared change in the coefficients is small\n", "            convergence = np.inner(self.coef_ - newcoef, self.coef_ - newcoef) < 10**(-5)\n", "            self.coef_ = newcoef\n", "\n", "    def predict(self, X):\n", "        \"\"\" Make predictions for data X.\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "\n", "        Returns:\n", "        --------\n", "        y_pred: (num_samples, ) np.array\n", "            Predictions\n", "        \"\"\"\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        y_pred = X_aug @ self.coef_\n", "        return y_pred" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "alpha = 1e-4\n", "\n", "# Fit polynomial-basis regression with L1 regularization\n", "model = LassoRegression(alpha)\n", "poly = PolynomialFeatures(degree=300, include_bias=False)\n", "X_poly = poly.fit_transform(x[:, None])\n", "\n", "model.fit(X_poly, y)\n", "\n", "pred = model.predict(poly.transform(xfit[:, None]))\n", "\n", "# Visualize\n", "fig, ax = plt.subplots()\n", "ax.scatter(x[:, np.newaxis], y, label='Truth')\n", "ax.plot(xfit, pred, label='Fit')\n", "ax.set(xlim=(-0.02, 1.02),\n", "       ylim=(-2.5, 2.5))\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Robust regression" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "class RobustRegression():\n", "    # Class for robust linear regression, fitted by\n", "    # iteratively reweighted least squares\n", "\n", "    def __init__(self, potential, k):\n", "        self.coef_ = None\n", "        self.potential_ = potential\n", "        self.k_ = k\n", "\n", "    def mad(self, x):\n", "        '''Computes the Median absolute deviation. 
\n", "        It is a robust, L1-type analogue of the standard deviation.'''\n", "        return np.median(np.abs(x - np.median(x)))\n", "\n", "    def weight_function(self, x, potential, k):\n", "        # Weight given to a normalized residual x under the chosen potential\n", "        if potential == \"huber\":\n", "            if np.abs(x) <= k:\n", "                return 1\n", "            else:\n", "                return k/np.abs(x)\n", "        if potential == \"bisquare\":\n", "            if np.abs(x) <= k:\n", "                return (1-(x/k)**2)**2\n", "            else:\n", "                return 0\n", "\n", "    def fit(self, X, y):\n", "        \"\"\" Fit the data (X, y).\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "        y: (num_samples, ) np.array\n", "            Output vector\n", "\n", "        Note:\n", "        -----\n", "        Updates self.coef_\n", "        \"\"\"\n", "        # Create a (num_samples, num_features+1) np.array X_aug whose first column\n", "        # is a column of all ones (so as to fit an intercept).\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        # Initialize with the ordinary least-squares solution\n", "        self.coef_ = np.linalg.inv(X_aug.T @ X_aug) @ X_aug.T @ y\n", "\n", "        convergence = False\n", "\n", "        while not(convergence):\n", "            residuals = y - X_aug @ self.coef_\n", "            norm_residuals = residuals/self.mad(residuals) # normalize with MAD\n", "            weight_matrix = np.diag(np.array([self.weight_function(ri, self.potential_, self.k_) for ri in norm_residuals]))\n", "            # weighted least-squares update\n", "            newcoef = np.linalg.inv(X_aug.T @ weight_matrix @ X_aug) @ X_aug.T @ weight_matrix @ y\n", "\n", "            # stop when the squared change in the coefficients is small\n", "            convergence = np.inner(self.coef_-newcoef, self.coef_-newcoef) < 10**(-5)\n", "            self.coef_ = np.copy(newcoef)\n", "\n", "    def predict(self, X):\n", "        \"\"\" Make predictions for data X.\n", "\n", "        Parameters:\n", "        -----------\n", "        X: (num_samples, num_features) np.array\n", "            Design matrix\n", "\n", "        Returns:\n", "        --------\n", "        y_pred: (num_samples, ) np.array\n", "            Predictions\n", "        \"\"\"\n", "        n_samples, n_features = X.shape\n", "        X_aug = np.ones((n_samples, n_features + 1))\n", "        X_aug[:, 1:] = X\n", "\n", "        y_pred = X_aug @ self.coef_\n", "        return y_pred" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "Try it on the following data and compare its performance with that of the other models:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Simulate data\n", "np.random.seed(300)\n", "rng = np.random.RandomState(1)\n", "x = rng.rand(30)\n", "noise = 0.1 * np.random.standard_cauchy(30)\n", "y = 2 * x - 5 + noise\n", "\n", "# Fit model\n", "model = RobustRegression(\"huber\", 1.0)\n", "model.fit(x[:, np.newaxis], y)\n", "\n", "xfit = np.linspace(0, 1.0, 1000)\n", "yfit = model.predict(xfit[:, np.newaxis])\n", "\n", "# Visualize\n", "fig, ax = plt.subplots()\n", "ax.scatter(x, y, label='Truth')\n", "ax.plot(xfit, yfit, label='Fit')\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Application: Predicting Bicycle Traffic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an example, let's take a look at whether we can predict the number of bicycle trips across Seattle's Fremont Bridge based on weather, season, and other factors.\n", "\n", "In this section, we have joined the bike data with another dataset, and we try to determine the extent to which weather and seasonal factors (temperature, precipitation, and daylight hours) affect the volume of bicycle traffic through this corridor.\n", "\n", "We will perform a simple linear regression to relate weather and other information to bicycle counts, in order to estimate how a change in any one of these parameters affects the number of riders on a given day.\n", "\n", "Let's start by loading the dataset:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "daily = pd.read_csv('data.csv', index_col='Date', parse_dates=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this in place, we can choose the columns to use, and fit a linear regression model to our data:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Drop any rows with null values\n", "daily.dropna(axis=0, how='any', inplace=True)\n", 
"\n", "column_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'holiday',\n", "                'daylight_hrs', 'PRCP', 'dry day', 'Temp (C)']\n", "# PRCP: precipitation\n", "X = daily[column_names].values.astype(float) # converts to numpy array\n", "y = daily['Total'].values.astype(float)\n", "\n", "# Fit a ridge regression on the raw (unscaled) features, so that each\n", "# coefficient is expressed in the original units of its feature\n", "model = RidgeRegularization(0.1)\n", "model.fit(X, y)\n", "y_pred = model.predict(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can compare the total and predicted bicycle traffic visually:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "ax.plot(y, alpha=0.5, label='Truth')\n", "ax.plot(y_pred, alpha=0.5, label='Fit')\n", "ax.legend()\n", "ax.set(title=\"Bicycle Count\",\n", "       ylabel=\"Number of Bicycles\",\n", "       xlabel=\"Instance of different factors (weather, etc.)\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is evident that we have missed some key features, especially during the summer time.\n", "Either our features are not complete (i.e., people decide whether to ride to work based on more than just these) or there are some nonlinear relationships that we have failed to take into account (e.g., perhaps people ride less at both high and low temperatures).\n", "Nevertheless, our rough approximation is enough to give us some insights, and we can take a look at the coefficients of the linear model to estimate how much each feature contributes to the daily bicycle count:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Intercept          34.314897\n", "Mon               490.296158\n", "Tue               596.243539\n", "Wed               578.405151\n", "Thu               468.195949\n", "Fri               164.217455\n", "Sat             -1116.446711\n", "Sun             -1146.596643\n", "holiday         -1184.393535\n", "daylight_hrs      129.240672\n", "PRCP             -663.374237\n", "dry day           551.976034\n", "Temp (C)           66.065583\n", "dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Label each coefficient; the first entry of coef_ is the intercept\n", "params = pd.Series(model.coef_, index=[\"Intercept\"] + column_names)\n", "params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We first see that there is a relatively stable trend in the weekly baseline: there are many more riders on weekdays than on weekends and holidays.\n", "We see that for each additional hour of daylight, 129 more people choose to ride; a temperature increase of one degree Celsius encourages 66 people to grab their 
bicycle; a dry day means an average of 552 more riders, and each inch of precipitation means 663 more people leave their bike at home." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }