{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Wizualizacja tricku kernelowego" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "W pierwszym kroku generujemy dane które nie są separowalne w przestrzeni 2-D" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:34.093325Z", "start_time": "2020-05-19T15:53:33.908256Z" } }, "outputs": [], "source": [ "from sklearn.datasets import make_circles\n", "\n", "X, y = make_circles(n_samples=100, noise=0.05)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:34.740342Z", "start_time": "2020-05-19T15:53:34.561022Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from mpl_toolkits import mplot3d\n", "import matplotlib.pyplot as plt\n", "%matplotlib notebook\n", "\n", "\n", "plt.scatter(X[:, 0], X[:, 1], c=y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:37.807737Z", "start_time": "2020-05-19T15:53:37.805154Z" } }, "outputs": [], "source": [ "x1, x2 = X[:, 0], X[:, 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "W następnym kroku posłużymy się trickiem kernelowym żeby przetransformować dane z przestrzeni 2-D (w której dane nie są liniowo separowalne) do przestrzeni 3-D, w której będą liniowo separowalne. Użyjemy następującego kernela (symbol $<>$ oznacza iloczyn skalarny wektorów): $$K(x,y)=^2$$ który przeprowadza następującą transformacją: \n", "\n", "$$<[x_1x_2],[y_1y_2]>^2=(x_1y_1 * x_2y_2)^2=x_1^2y_1^2 + 2x_1y_1x_2y_2 + x_2^2y_2^2 = x_1^2y_1^2 + \\sqrt{2}x_1x_2\\sqrt{2}y_1y_2 + x_2^2y_2^2 = <[x_1^2,\\sqrt{2}x_1x_2,x_2^2],[y_1^2,\\sqrt{2}y_1y_2,y_2^2]>$$\n", "\n", "$$\\phi([x_1,x_2]=[x_1^2, \\sqrt{2}x_1x_2, x_2^2]$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:39.652619Z", "start_time": "2020-05-19T15:53:39.649058Z" } }, "outputs": [], "source": [ "z1 = x1**2\n", "z2 = np.sqrt(2)*x1*x2\n", "z3 = x2**2\n", "\n", "Z = np.column_stack((z1, z2, z3))\n", "\n", "print(Z[:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Po transformacji dane są w sposób oczywisty separowalne w przestrzeni 3-D" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:40.828850Z", "start_time": "2020-05-19T15:53:40.790886Z" } }, "outputs": [], "source": [ "plt.clf()\n", "ax = plt.axes(projection='3d')\n", "ax.scatter3D(z2, z3, z1, c=y);" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:44:18.413178Z", "start_time": "2020-05-19T15:44:18.410349Z" } }, "source": [ "### dla chętnych\n", "\n", "Przeprowadź analogiczną wizualizację przy użyciu kernela $K(x,y)=(1+)^2$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Klasyfikacja przy użyciu SVM" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do realizacji ćwiczenia posłużymy się klasą [sklearn.svm.SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) i sprawdzimy wrażliwość algorytmu na poszczególne parametry." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:53:55.300254Z", "start_time": "2020-05-19T15:53:55.295645Z" } }, "outputs": [], "source": [ "from sklearn.datasets import load_wine\n", "wines = load_wine()\n", "\n", "print(wines['DESCR'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:54:01.813312Z", "start_time": "2020-05-19T15:54:01.808455Z" } }, "outputs": [], "source": [ "X, y = wines['data'], wines['target']" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:54:02.616414Z", "start_time": "2020-05-19T15:54:02.406030Z" } }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "plt.hist(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:55:20.171512Z", "start_time": "2020-05-19T15:55:20.165073Z" } }, "outputs": [], "source": [ "from sklearn.pipeline import make_pipeline\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import classification_report, confusion_matrix\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "\n", "scaler = StandardScaler()\n", "\n", "# https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html\n", "svm = SVC(C=1e-2, kernel='poly', degree=2)\n", "\n", "X = scaler.fit(X).transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:55:20.629940Z", "start_time": "2020-05-19T15:55:20.626950Z" } }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:55:20.809762Z", "start_time": "2020-05-19T15:55:20.805781Z" } }, "outputs": [], "source": [ "y_pred = svm.fit(X_train, y_train).predict(X_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2020-05-19T15:55:21.113405Z", "start_time": "2020-05-19T15:55:21.105950Z" } }, "outputs": [], "source": [ "print(f\"\"\" {\n", "\n", "classification_report(y_pred, y_test)}\n", "\n", "Confusion matrix:\n", "{confusion_matrix(y_pred, y_test)} \n", "\n", "Number of support vectors per class: {svm.n_support_}\n", "\n", "\"\"\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }