{"cells":[{"cell_type":"markdown","metadata":{"id":"w1uF0NGuOmkF"},"source":["# Adapted by Toni Monleón, University of Barcelona. 2022 (Only for teaching purposes)\n","---"]},{"cell_type":"markdown","metadata":{"id":"pQMtoMeuOmkI"},"source":["# Preliminaries to this course\n","---\n","\n","Material obtained and adapted from other courses (Postgraduate in Data Science for Medicine and Biology with Python and R, Faculty of Biology, UB) and from other freely available sources (internet). The origin of all material and bibliographical sources will be cited.\n","\n","Once we know what Machine Learning is, using the different existing methods requires knowledge of a programming language, such as Python, and of the libraries it relies on. All of this is illustrated with biomedical examples.\n","\n","We introduce Python and these libraries next.\n","\n","\n","# Colab (Colaboratory) and .ipynb notebook files\n","\n","In this Machine Learning course we will use Google Colab (Colaboratory) for the lab sessions.\n","Colaboratory, or \"Colab\" for short, is a product of Google Research. It allows any user to write and execute arbitrary Python code in the browser. It is especially suitable for machine learning, data analysis, and education tasks. From a more technical standpoint, Colab is a zero-configuration Jupyter hosted notebook service that provides free access to computing resources, such as GPUs.\n","\n","A Google account is required to use it.\n","\n","Colab notebooks are stored in Google Drive, and they can also be loaded from GitHub. Colab notebooks can be shared just like Google Docs and Google Sheets files. 
To do so, click the Share button at the top right of any Colaboratory notebook, or use the usual Google Drive sharing options.\n","\n","See the Colab FAQ for the features and requirements of this system: https://research.google.com/colaboratory/intl/es/faq.html\n","\n","We will use Python notebooks with the .ipynb extension. An .ipynb file is a Jupyter notebook: a document that combines executable Python code cells with explanatory text cells. Jupyter is the open source project that Colab is based on. Colab allows you to use and share Jupyter notebooks with other users without having to download, install, or run anything.\n"]},{"cell_type":"markdown","metadata":{"id":"6BBWt2kxOmkJ"},"source":["# Python\n","\n","---\n","\n","Python is a programming language that lets you work quickly and integrate systems more effectively.\n","\n","Official website: https://www.python.org/\n","\n","A brief introduction to Python: Python For Beginners (https://www.python.org/about/gettingstarted/)\n","\n","Beginners Guide, Download: https://wiki.python.org/moin/BeginnersGuide/Download\n","\n","Python is one of the most popular programming languages, alongside Java, JavaScript, Go, and C#. Although it is often thought of as a \"scripting\" language, it is really a general-purpose language. Today, Python is used for everything from simple \"scripts\" to large web servers that provide uninterrupted 24x7 service. It is used to program graphical interfaces and databases, for server-side web development (see Django or Flask), and for testing applications. 
It is also widely used by scientists writing applications for the world's fastest supercomputers and by children who are just learning to program.\n","\n","The history of the Python programming language dates back to the late 1980s and early 1990s. Its implementation began in December 1989, when, over Christmas, Guido van Rossum, who worked at the CWI (a Dutch public research center that, among other things, has hosted a W3C office), decided to start the project as a hobby, as a successor to the ABC programming language, whose development team at the CWI he had been part of.\n","\n","The name \"Python\" comes from van Rossum's fondness for the Monty Python group. (Wikipedia)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":321},"id":"ytSboS9aOmkJ","outputId":"517aaf56-dc6b-4b17-b367-f32fb140a197","executionInfo":{"status":"ok","timestamp":1676577850284,"user_tz":-60,"elapsed":37,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/html":[""],"text/plain":[""]},"metadata":{},"execution_count":1}],"source":["\n","# import image module\n","from IPython.display import Image\n"," \n","# get the image\n","Image(url=\"https://images.squarespace-cdn.com/content/v1/5c75dfa97d0c9166551f52b1/1566331496763-W6C5O2YI6Z8GVSK0HXSL/531b3e8a9e44460422611ae63fd929c25cb815c5.jpg\", width=500, height=300)"]},{"cell_type":"markdown","metadata":{"id":"tQtSi3LiOmkK"},"source":["# NUMPY. 
Numerical Computing with Python\n","\n","The Python language is an excellent tool for general-purpose programming, with a highly readable syntax and rich, powerful data types."]},{"cell_type":"markdown","metadata":{"id":"fQ4o1JN5OmkL"},"source":["---\n","\n","However, it was not designed specifically for mathematical and scientific computing.\n","In particular, Python lists are very flexible containers, but they are poorly suited to representing common mathematical constructs such as vectors and matrices efficiently.\n","\n","Fortunately, the **numpy** package (https://numpy.org/) provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good. It is used in almost all numerical computation in Python.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"j3HQndd4OmkL"},"source":["Why not simply use Python lists for computations instead of creating a new array type?\n","\n","There are several reasons:\n","\n","* Python lists are very general. They can contain any kind of object. They are dynamically typed. They do not support mathematical operations such as matrix multiplication or dot products. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.\n","* Numpy arrays are statically typed and homogeneous. 
The type of the elements is determined when the array is created.\n","* Numpy arrays are memory efficient.\n","* Because of the static typing, fast mathematical functions such as multiplication and addition of numpy arrays can be implemented in a compiled language (C and Fortran are used).\n","\n","# Basics of Python\n","\n","**Installing Packages**\n","\n","This section covers the basics of how to install Python packages.\n","\n","It’s important to note that the term “package” in this context is being used to describe a bundle of software to be installed (i.e. as a synonym for a distribution). It does not refer to the kind of package that you import in your Python source code (i.e. a container of modules). It is common in the Python community to refer to a distribution using the term “package”. Using the term “distribution” is often not preferred, because it can easily be confused with a Linux distribution, or another larger software distribution like Python itself.\n","\n","Before you go any further, make sure you have Python and that the expected version is available from your command line. You can check this by running (only on a local installation, not in Colab):\n","`python3 --version`\n","\n","**Ensure you can run pip from the command line**\n","\n","Additionally, you’ll need to make sure you have pip available. You can check this by running (only on a local installation, not in Colab): `python3 -m pip --version`\n","\n","\n"]},{"cell_type":"markdown","source":["**Use pip for Installing**\n","\n","pip is the recommended installer. Below, we’ll cover the most common usage scenarios. 
For more detail, see the pip docs, which include a complete Reference Guide (https://pip.pypa.io/en/latest/cli/).\n","\n","Example: installing the numpy library (only on a local installation, not in Colab):\n","\n","`python3 -m pip install numpy`\n","\n","The next cell installs NumPy in Colab if it is not already available (remember that the basic libraries are preinstalled in Colab). The leading `!` runs the line as a shell command:"],"metadata":{"id":"n_h--EP0khDl"}},{"cell_type":"code","source":["!pip install numpy"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"OIpbLBmZkheD","executionInfo":{"status":"ok","timestamp":1676634204848,"user_tz":-60,"elapsed":6067,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}},"outputId":"f8d89ce2-210c-40c1-bc0e-b04726158ae8"},"execution_count":1,"outputs":[{"output_type":"stream","name":"stdout","text":["Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n","Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (1.21.6)\n"]}]},{"cell_type":"markdown","source":["\n","See more at: https://packaging.python.org/en/latest/tutorials/installing-packages/\n","\n"],"metadata":{"id":"VSPtVof3k1Ct"}},{"cell_type":"markdown","metadata":{"id":"ZJkgXdm4OmkL"},"source":["## Basics of Numpy\n","\n","To use **numpy**, we first need to import the module:"]},{"cell_type":"code","execution_count":2,"metadata":{"id":"UeTOAaAFOmkM","executionInfo":{"status":"ok","timestamp":1676634209377,"user_tz":-60,"elapsed":2,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[],"source":["# numpy must be installed beforehand (it is preinstalled in Colab)\n","import numpy as np"]},{"cell_type":"markdown","metadata":{"id":"TgLhqY-KOmkM"},"source":["## Creating numpy arrays\n","There are a number of ways to initialize new numpy arrays, for example:\n","\n","1. From Python lists or tuples\n","2. Using array-generating functions, such as `arange`, `linspace`, etc.\n","3. 
Reading data from files"]},{"cell_type":"markdown","metadata":{"id":"uI7gD1kFOmkN"},"source":["### 1. From a list\n","For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function."]},{"cell_type":"code","execution_count":3,"metadata":{"id":"5tDacbVoOmkN","outputId":"4955245d-b33d-4513-b336-53013a767f1e","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634257335,"user_tz":-60,"elapsed":176,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([1, 2, 3, 4])"]},"metadata":{},"execution_count":3}],"source":["# a vector: the argument to the array function is a Python list\n","v = np.array([1,2,3,4])\n","v"]},{"cell_type":"code","execution_count":4,"metadata":{"id":"Cc7w9oW6OmkO","outputId":"4342ccfc-c1b3-471f-cd95-8f3f5e4aecb0","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634273662,"user_tz":-60,"elapsed":179,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[1, 2],\n"," [3, 4]])"]},"metadata":{},"execution_count":4}],"source":["# a matrix: the argument to the array function is a nested Python list\n","M = np.array([[1, 2], [3, 4]])\n","M"]},{"cell_type":"markdown","metadata":{"id":"zJuNsgUAOmkO"},"source":["If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: "]},{"cell_type":"code","execution_count":5,"metadata":{"id":"M_ntR-VyOmkO","outputId":"9cb2fd32-8bba-4b37-ba73-3d56cc471a92","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634290797,"user_tz":-60,"elapsed":189,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[1, 2],\n"," [3, 
4]])"]},"metadata":{},"execution_count":5}],"source":["M = np.array([[1, 2], [3, 4]], dtype=int)\n","M"]},{"cell_type":"markdown","metadata":{"id":"EvmjIvGmOmkP"},"source":["Common types that can be used with dtype are: int, float, complex, bool, object, etc.\n","\n","We can also explicitly define the bit size of the data types, for example: int64, int16, float128, complex128 (note that float128 is not available on every platform)."]},{"cell_type":"markdown","metadata":{"id":"-z1NA-L1OmkP"},"source":["### 2. Using array-generating functions\n","For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the most common are:\n","\n","**Zeros and Ones**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"heDmvbj_OmkQ","outputId":"26af6437-6ea5-4f1f-d6d2-1703131b40a4","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676288548048,"user_tz":-60,"elapsed":280,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([0., 0., 0., 0., 0.])"]},"metadata":{},"execution_count":8}],"source":["np.zeros(5, dtype=float)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Gr3d1jQOOmkQ","outputId":"8ea5474c-a255-4039-f35c-3c41cf707e64","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676288548304,"user_tz":-60,"elapsed":3,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([1., 1., 1., 1., 
1.])"]},"metadata":{},"execution_count":9}],"source":["np.ones(5,dtype=float)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pMt6hvT7OmkQ","outputId":"a72da12a-93f4-4da2-a5da-1803a41e5f44","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676288548305,"user_tz":-60,"elapsed":3,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[0, 0, 0],\n"," [0, 0, 0]])"]},"metadata":{},"execution_count":10}],"source":["np.zeros((2,3),dtype=np.int64)"]},{"cell_type":"markdown","metadata":{"id":"MRXCFP-eOmkR"},"source":["**arange**"]},{"cell_type":"code","execution_count":6,"metadata":{"id":"jgf5MtS3OmkR","outputId":"3eb46fb0-9610-4bf0-e57c-825ee89c0d4c","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634320534,"user_tz":-60,"elapsed":177,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n"," 17, 18, 19])"]},"metadata":{},"execution_count":6}],"source":["x = np.arange(0, 20, 1) # arguments: start, stop, step\n","x"]},{"cell_type":"markdown","metadata":{"id":"pbi6uPj5OmkR"},"source":["**linspace and logspace**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8VmCFMSJOmkR","outputId":"78627886-411c-4b85-932c-cb391ef5659e","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676288571622,"user_tz":-60,"elapsed":473,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["A linear grid of 5 elements between 0 and 1:\n","[0. 0.25 0.5 0.75 1. 
]\n"]}],"source":["print (\"A linear grid of 5 elements between 0 and 1:\")\n","print (np.linspace(0, 1, 5))\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ydrCMXtkOmkR","outputId":"526300c3-3d03-4cc3-ea6a-d8f1a1843211","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676288587413,"user_tz":-60,"elapsed":326,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["A logarithmic grid of 10 elements between 10**0 and 10**3:\n","[ 1. 2.15443469 4.64158883 10. 21.5443469\n"," 46.41588834 100. 215.443469 464.15888336 1000. ]\n"]}],"source":["print (\"A logarithmic grid of 10 elements between 10**0 and 10**3:\")\n","print (np.logspace(0, 3, 10))"]},{"cell_type":"markdown","metadata":{"id":"iY4TJlg4OmkS"},"source":["**Creating random arrays**"]},{"cell_type":"code","execution_count":7,"metadata":{"id":"R8kwCtiWOmkS","outputId":"e4d660b9-b70e-4386-e171-7397e0518705","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634337637,"user_tz":-60,"elapsed":190,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[0.53824801, 0.29722694, 0.05297801, 0.82744711, 0.8844692 ],\n"," [0.9189721 , 0.53592256, 0.21703841, 0.41549287, 0.6827314 ],\n"," [0.52677334, 0.73939667, 0.94781049, 0.99097768, 0.47457526],\n"," [0.70145715, 0.4929748 , 0.31151314, 0.23250583, 0.75479595],\n"," [0.1129694 , 0.61987393, 0.65915877, 0.16901694, 0.68650668]])"]},"metadata":{},"execution_count":7}],"source":["# uniform random numbers in 
[0, 1)\n","np.random.rand(5,5)"]},{"cell_type":"code","execution_count":8,"metadata":{"id":"Ft7ettfCOmkS","outputId":"dcfa632c-6d49-4860-fb60-7cb2d7b46ba3","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634370161,"user_tz":-60,"elapsed":172,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([ 8.78970308, 12.0251407 , 16.79559352, 11.0137931 , 13.27737495])"]},"metadata":{},"execution_count":8}],"source":["# 5 samples from a normal distribution with a mean of 10 and a standard deviation of 3:\n","np.random.normal(10, 3, 5)"]},{"cell_type":"markdown","metadata":{"id":"vuh2oc0lOmkS"},"source":["**diag**"]},{"cell_type":"code","execution_count":9,"metadata":{"id":"38enYhXlOmkT","outputId":"f2828abc-8743-41fa-9d2a-eeb444d3e8c1","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634378321,"user_tz":-60,"elapsed":189,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[1, 0, 0, 0],\n"," [0, 1, 0, 0],\n"," [0, 0, 1, 0],\n"," [0, 0, 0, 1]])"]},"metadata":{},"execution_count":9}],"source":["# a diagonal matrix\n","np.diag([1,1,1,1])"]},{"cell_type":"markdown","metadata":{"id":"FmasiQbBOmkT"},"source":["### 3. 
Reading data from files\n"]},{"cell_type":"markdown","metadata":{"id":"OnqJhy4HOmkT"},"source":["**Comma-separated values (CSV)**"]},{"cell_type":"markdown","metadata":{"id":"nXdT6jcMOmkT"},"source":["A very common format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values).\n","The example below uses open data from https://github.com/datasets:\n","\n","[vix-daily.csv](https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv)\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"G_T7PP4WOmkT"},"outputs":[],"source":["np.genfromtxt?"]},{"cell_type":"code","execution_count":10,"metadata":{"id":"wuEqI26rOmkU","outputId":"346c1f5b-f8fa-4a09-c180-1d35f04bfd3c","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634386626,"user_tz":-60,"elapsed":454,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[ nan, 17.96, 18.68, 17.54, 18.22],\n"," [ nan, 18.45, 18.49, 17.44, 17.49],\n"," [ nan, 17.66, 17.67, 16.19, 16.73],\n"," ...,\n"," [ nan, 21.97, 22.89, 19.47, 21.3 ],\n"," [ nan, 20.28, 20.56, 17.55, 17.62],\n"," [ nan, 17.06, 19.55, 17.06, 17.4 ]])"]},"metadata":{},"execution_count":10}],"source":["# Open data from https://github.com/datasets\n","data = np.genfromtxt('https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv'\\\n"," ,skip_header=1,delimiter=',')\n","data"]},{"cell_type":"markdown","metadata":{"id":"N4XHZoLiOmkU"},"source":["*A few remarks on NaNs:*\n","\n","The first column of this file contains dates, which cannot be converted to floating-point numbers, so `genfromtxt` stores them as `nan`.\n","\n","By definition, NaN is a floating-point value that is not equal to any number, including itself:\n"]},{"cell_type":"code","execution_count":11,"metadata":{"id":"oNGfnClUOmkU","outputId":"82ce3371-08b8-4fc5-b55f-c32debfce0f7","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634395731,"user_tz":-60,"elapsed":169,"user":{"displayName":"toni 
Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{},"execution_count":11}],"source":["np.nan != np.nan"]},{"cell_type":"markdown","metadata":{"id":"nCAFpTB3OmkU"},"source":["Thus, the equality operator cannot be used to detect NaN values: the comparison below returns `False` everywhere, even where `data` contains `nan`."]},{"cell_type":"code","execution_count":12,"metadata":{"id":"mdDZAbfQOmkU","outputId":"bba5a3cf-bf33-4c58-bd65-6a0e016d2f18","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634429378,"user_tz":-60,"elapsed":175,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[False, False, False, False, False],\n"," [False, False, False, False, False],\n"," [False, False, False, False, False],\n"," ...,\n"," [False, False, False, False, False],\n"," [False, False, False, False, False],\n"," [False, False, False, False, False]])"]},"metadata":{},"execution_count":12}],"source":["data== np.nan"]},{"cell_type":"markdown","metadata":{"id":"lw23AkTOOmkV"},"source":["Instead, use the `np.isnan` function:"]},{"cell_type":"code","execution_count":13,"metadata":{"id":"rxD6xo6sOmkV","outputId":"8fb75509-0302-46e3-b2d9-cfa8b8e0b26e","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634435730,"user_tz":-60,"elapsed":2,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[ True, False, False, False, False],\n"," [ True, False, False, False, False],\n"," [ True, False, False, False, False],\n"," ...,\n"," [ True, False, False, False, False],\n"," [ True, False, False, False, False],\n"," [ True, False, False, False, False]])"]},"metadata":{},"execution_count":13}],"source":["np.isnan(data)"]},{"cell_type":"markdown","metadata":{"id":"66tWviieOmkW"},"source":["We can skip one or more columns (here, the date column) by listing the columns to load in `usecols` when 
importing:"]},{"cell_type":"code","execution_count":14,"metadata":{"id":"LoKkKYkQOmkX","outputId":"ddfeec5b-2eda-4022-c806-a585c16f7179","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634457742,"user_tz":-60,"elapsed":178,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[17.96, 18.68, 17.54, 18.22],\n"," [18.45, 18.49, 17.44, 17.49],\n"," [17.66, 17.67, 16.19, 16.73],\n"," ...,\n"," [21.97, 22.89, 19.47, 21.3 ],\n"," [20.28, 20.56, 17.55, 17.62],\n"," [17.06, 19.55, 17.06, 17.4 ]])"]},"metadata":{},"execution_count":14}],"source":["# Open data from https://github.com/datasets\n","data = np.genfromtxt('https://raw.githubusercontent.com/datasets/finance-vix/master/data/vix-daily.csv'\\\n"," ,skip_header=1,delimiter=',',usecols=[1,2,3,4])\n","data"]},{"cell_type":"markdown","metadata":{"id":"WDeDjwPFOmkX"},"source":["Using `numpy.savetxt` we can store a Numpy array to a file in CSV format:"]},{"cell_type":"code","execution_count":15,"metadata":{"id":"Env3L3KnOmkX","outputId":"b731425d-6d37-4c47-8d16-1a243ad13de3","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634462004,"user_tz":-60,"elapsed":186,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"stream","name":"stdout","text":["[[0.59885581 0.99371314 0.05216767]\n"," [0.68111522 0.17407148 0.68324666]\n"," [0.02297682 0.15409137 0.89717348]]\n","5.988558111047993515e-01,9.937131393646133626e-01,5.216767170987968161e-02\n","6.811152219337583968e-01,1.740714818650686002e-01,6.832466557644286675e-01\n","2.297682101587938952e-02,1.540913699913241119e-01,8.971734846803304242e-01\n"]}],"source":["M = np.random.rand(3,3)\n","np.savetxt(\"random-matrix.csv\", M, delimiter=',')\n","print (M)\n","%cat 
random-matrix.csv"]},{"cell_type":"code","execution_count":16,"metadata":{"id":"edYvLGkuOmkX","executionInfo":{"status":"ok","timestamp":1676634478209,"user_tz":-60,"elapsed":164,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[],"source":["np.savetxt(\"random-matrix.csv\", M, fmt='%.5f', delimiter=',')  # fmt sets the number format (5 decimal places)\n","%cat random-matrix.csv"]},{"cell_type":"markdown","metadata":{"id":"rLkEvXvUOmkX"},"source":["To read data from such a file into a Numpy array we can use the `numpy.genfromtxt` function.\n"]},{"cell_type":"code","execution_count":17,"metadata":{"id":"CBSsopHsOmkX","outputId":"49224b38-27c3-4617-d6b3-fba1580f0a76","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1676634480242,"user_tz":-60,"elapsed":167,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[0.59886, 0.99371, 0.05217],\n"," [0.68112, 0.17407, 0.68325],\n"," [0.02298, 0.15409, 0.89717]])"]},"metadata":{},"execution_count":17}],"source":["data = np.genfromtxt('random-matrix.csv',delimiter=',')\n","data"]},{"cell_type":"markdown","metadata":{"id":"fbjGZ5VzOmkX"},"source":["**Numpy's native file format**"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"vA4lAjPpOmkX"},"outputs":[],"source":["np.save(\"random-matrix.npy\", M) #!file random-matrix.npy"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"RjorritMOmkY","outputId":"4cffb836-b774-4d5c-f0a8-d60276dff1c8"},"outputs":[{"data":{"text/plain":["array([[ 0.15663045, 0.10134917, 0.07489006],\n"," [ 0.0055186 , 0.07112554, 0.43598842],\n"," [ 0.23353596, 0.11720206, 0.61343514]])"]},"execution_count":45,"metadata":{},"output_type":"execute_result"}],"source":["np.load(\"random-matrix.npy\")\n"]},{"cell_type":"markdown","metadata":{"id":"9oUp8vnpOmkY"},"source":["## Manipulating 
arrays\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"4Wfz4ChHOmkY"},"outputs":[],"source":["lst = [10, 20, 30, 40] #python list\n","arr = np.array([10, 20, 30, 40],dtype='int64') #numpy array\n","M = np.array([[10, 20, 30, 40],[50, 60, 70, 80]]) #numpy matrix"]},{"cell_type":"markdown","metadata":{"id":"aKjQ0Jn3OmkY"},"source":["### Element indexing \n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"swUpG_YFOmkY","outputId":"d863306e-56f1-4090-895b-83a30222ed91"},"outputs":[{"data":{"text/plain":["10"]},"execution_count":28,"metadata":{},"output_type":"execute_result"}],"source":["# get the first element of the list\n","lst[0]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"QP3kDdu4OmkZ","outputId":"dbc3867e-de0d-43e6-bc27-fd09721f228a"},"outputs":[{"data":{"text/plain":["10"]},"execution_count":29,"metadata":{},"output_type":"execute_result"}],"source":["# get the first element of the array\n","arr[0]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Rg0yXQbCOmkZ","outputId":"e620542c-22f7-432c-b3d0-acf56c089476"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[10 20 30 40]\n"," [50 60 70 80]]\n","10\n","10\n","60\n","70\n"]}],"source":["# M is a matrix, or a 2-dimensional array, taking two indices\n","print (M)\n","#M[row][col] or M[row,col]\n","print (M[0][0]) # element from first row first column \n","print (M[0,0]) # element from first row first column \n","print (M[1,1]) # element from second row second column\n","print (M[1,2]) # element from second row third column"]},{"cell_type":"markdown","metadata":{"id":"PH2L6IUaOmkZ"},"source":["If we omit an index of a multidimensional array, it returns the whole row\n","(or, in general, an (N-1)-dimensional array)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"hnRVK0SeOmkZ","outputId":"fc57b79e-9ebe-4c2b-9712-b9f7dce92fb4"},"outputs":[{"data":{"text/plain":["array([50, 60, 70, 80])"]},"execution_count":31,"metadata":{},"output_type":"execute_result"}],"source":["M[1] # 
second row"]},{"cell_type":"markdown","metadata":{"id":"764YRwEYOmkZ"},"source":["The same result can be achieved by using `:` instead of an index: "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"eHUXS1m5Omka","outputId":"ccc3c72b-90ce-4752-8c0a-b2fd08a6f08e"},"outputs":[{"data":{"text/plain":["array([50, 60, 70, 80])"]},"execution_count":32,"metadata":{},"output_type":"execute_result"}],"source":["M[1,:] # second row, all columns "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JgOOC2L0Omka","outputId":"15f90fd7-ce1c-4116-e450-9598fc6d2f52"},"outputs":[{"data":{"text/plain":["array([40, 80])"]},"execution_count":33,"metadata":{},"output_type":"execute_result"}],"source":["M[:,3] # all rows, fourth column "]},{"cell_type":"markdown","metadata":{"id":"yTZW50QyOmka"},"source":["We can assign new values to elements in an array using indexing:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9_rl59QtOmka","outputId":"f995ae88-6240-47f3-cfc0-dfe2e5b0a145"},"outputs":[{"data":{"text/plain":["array([[ 1, 20, 30, 40],\n"," [50, 60, 70, 80]])"]},"execution_count":34,"metadata":{},"output_type":"execute_result"}],"source":["M[0,0] = 1\n","M"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"V_6wclcOOmka","outputId":"55e58e18-028d-42f6-c8a3-9065e8bfb08a"},"outputs":[{"data":{"text/plain":["array([[ 1, 20, 30, 40],\n"," [ 0, 0, 0, 0]])"]},"execution_count":35,"metadata":{},"output_type":"execute_result"}],"source":["# also works for rows and columns\n","M[1,:] = 0\n","M"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"R9gCYYKmOmkb","outputId":"7c375530-a849-496e-b3b3-cf31a7371fa5"},"outputs":[{"data":{"text/plain":["array([[ 1, 20, -1, 40],\n"," [ 0, 0, -1, 0]])"]},"execution_count":36,"metadata":{},"output_type":"execute_result"}],"source":["M[:,2] = -1\n","M"]},{"cell_type":"markdown","metadata":{"id":"f201zjQ1Omkc"},"source":["Arrays are homogeneous; i.e. 
all elements of an array must be of the same type\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_y29hzo7Omkc","outputId":"daaae43e-ae8f-40ac-9530-c7f22259181b"},"outputs":[{"data":{"text/plain":["[10, 'a string inside a list', 30, 40]"]},"execution_count":37,"metadata":{},"output_type":"execute_result"}],"source":["#Lists are heterogeneous\n","lst[1] = 'a string inside a list'\n","lst"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Cr3U9wmFOmkc","outputId":"0eed2a15-084f-4865-be9d-3486e10b2bec"},"outputs":[{"ename":"ValueError","evalue":"invalid literal for int() with base 10: 'a string inside an array'","output_type":"error","traceback":["\u001b[1;31m---------------------------------------------------------------------------\u001b[0m","\u001b[1;31mValueError\u001b[0m Traceback (most recent call last)","\u001b[1;32mc:\\Users\\Antonio Monleon\\Dropbox\\Material docente UB-UPC\\MACHINE_LEARNING_ALGORITHMS_DS\\micurso_ML\\DataScience-Numpy.ipynb Cell 68'\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[0;32m 1\u001b[0m \u001b[39m#Arrays are homogeneous\u001b[39;00m\n\u001b[1;32m----> 2\u001b[0m arr[\u001b[39m1\u001b[39m] \u001b[39m=\u001b[39m \u001b[39m'\u001b[39m\u001b[39ma string inside an array\u001b[39m\u001b[39m'\u001b[39m\n","\u001b[1;31mValueError\u001b[0m: invalid literal for int() with base 10: 'a string inside an array'"]}],"source":["#Arrays are homogeneous\n","arr[1] = 'a string inside an array'"]},{"cell_type":"markdown","metadata":{"id":"P-VqmPTUOmkc"},"source":["Once an array has been created, its dtype is fixed and it can only store elements of the same type. 
For this example, where the dtype is integer, if we store a floating-point number it is automatically converted to an integer (the fractional part is discarded):"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"7wdVGiIxOmkd","outputId":"2ef303b7-1fa1-4dad-f240-dcb259877bf3"},"outputs":[{"data":{"text/plain":["array([10, 1, 30, 40])"]},"execution_count":77,"metadata":{},"output_type":"execute_result"}],"source":["arr[1] = 1.234\n","arr"]},{"cell_type":"markdown","metadata":{"id":"h9b07wtqOmkd"},"source":["### Index slicing "]},{"cell_type":"markdown","metadata":{"id":"BsyZAvDWOmkd"},"source":["Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"bFCFqmUXOmkd","outputId":"892b5129-d64f-480c-efdd-121d5a7819bf"},"outputs":[{"data":{"text/plain":["array([2, 3])"]},"execution_count":79,"metadata":{},"output_type":"execute_result"}],"source":["A = np.array([1,2,3,4,5])\n","# slice from the second to the third element (the upper index 3 is excluded), step is one\n","A[1:3:1]"]},{"cell_type":"markdown","metadata":{"id":"ipGyRCUMOmkd"},"source":["Array slices are *mutable*: if they are assigned a new value, the original array from which the slice was extracted is modified:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"FMLlxpcKOmke","outputId":"b40f732e-22cd-4e24-d0be-c9ae89fd6733"},"outputs":[{"data":{"text/plain":["array([ 1, -2, -3, 4, 5])"]},"execution_count":82,"metadata":{},"output_type":"execute_result"}],"source":["A[1:3:1] = [-2,-3]\n","A"]},{"cell_type":"markdown","metadata":{"id":"yzlZqLV-Omke"},"source":["We can omit any of the three parameters in `M[lower:upper:step]`: by default, `lower` is the beginning of the array, `upper` is the end, and `step` is one."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"-aKjpwggOmke","outputId":"d35ece0e-9c5d-45be-d5a0-61751e9beb0e"},"outputs":[{"data":{"text/plain":["array([ 1, -3, 
5])"]},"execution_count":83,"metadata":{},"output_type":"execute_result"}],"source":["A[::2] # step is 2; lower and upper default to the beginning and end of the array"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_EpU7umAOmke","outputId":"24e563fc-63a1-4251-88b4-a02e059ed396"},"outputs":[{"data":{"text/plain":["array([ 1, -2, -3])"]},"execution_count":84,"metadata":{},"output_type":"execute_result"}],"source":["A[:3] # first three elements"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"noEFWGEpOmke","outputId":"4955aff2-53af-4a91-cca9-a6aa368adcd4"},"outputs":[{"data":{"text/plain":["array([4, 5])"]},"execution_count":85,"metadata":{},"output_type":"execute_result"}],"source":["A[3:] # elements from index 3 to the end"]},{"cell_type":"markdown","metadata":{"id":"AgUYieOqOmke"},"source":["Negative indices count from the end of the array:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"fX0r3MvAOmkf","outputId":"42826524-ad4e-4a5c-af84-e820341d8453"},"outputs":[{"data":{"text/plain":["5"]},"execution_count":86,"metadata":{},"output_type":"execute_result"}],"source":["A[-1] # the last element in the array"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"YA99WMRKOmkf","outputId":"6cdb765c-7073-48fc-ca75-110503eee23c"},"outputs":[{"data":{"text/plain":["array([-3, 4, 5])"]},"execution_count":87,"metadata":{},"output_type":"execute_result"}],"source":["A[-3:] # the last three elements"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Wd_LK-PPOmkf","outputId":"7b74760a-3408-4d82-8e49-0ef061e31269"},"outputs":[{"data":{"text/plain":["array([ 5, 4, -3, -2, 1])"]},"execution_count":89,"metadata":{},"output_type":"execute_result"}],"source":["A[::-1] # a negative step walks backwards, returning the elements in reverse order"]},{"cell_type":"markdown","metadata":{"id":"TmoAOR03Omkf"},"source":["Index slicing works exactly the same way for multidimensional arrays, with the slices for the different dimensions separated by a 
comma:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Q7u_xxBJOmkf","outputId":"c96243c5-14c2-4f07-b0f3-152672ed02d4"},"outputs":[{"data":{"text/plain":["array([[ 1, 20, -1, 40],\n"," [ 0, 0, -1, 0]])"]},"execution_count":90,"metadata":{},"output_type":"execute_result"}],"source":["M"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"rc3RNS-JOmkg","outputId":"b5a3a126-ee8c-4622-f77b-573be53ff1a5"},"outputs":[{"data":{"text/plain":["array([[20, -1],\n"," [ 0, -1]])"]},"execution_count":91,"metadata":{},"output_type":"execute_result"}],"source":["# a block from the original array:\n","# all rows, the two central columns\n","M[:, 1:3]\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ouHrwBw5Omkg","outputId":"dfaa9bba-184e-4965-bb2d-8f6186fd68ac"},"outputs":[{"data":{"text/plain":["array([[ 1, -1],\n"," [ 0, -1]])"]},"execution_count":92,"metadata":{},"output_type":"execute_result"}],"source":["# all rows, every other column (columns 0 and 2)\n","M[:, ::2]"]},{"cell_type":"markdown","metadata":{"id":"-GChXgsNOmkg"},"source":["You can master your **index slicing** skills by solving the exercises at the end of this\n","notebook."]},{"cell_type":"markdown","metadata":{"id":"oBxy8rcCOmkg"},"source":["### Comparison operators and value testing \n","\n","Boolean comparisons can be used to compare the elements of arrays of equal size elementwise:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"nLFkHJAEOmkh","outputId":"6cd1752c-7b0d-4cd4-d510-c332bfffda65"},"outputs":[{"name":"stdout","output_type":"stream","text":["[ True False False]\n","[False True False]\n","[False True True]\n"]}],"source":["a = np.array([1, 3, 0], float) \n","b = np.array([0, 3, 2], float) \n","print (a > b )\n","print (a == b )\n","print (a <= b )"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9wESB-nFOmkh","outputId":"0cb870a7-09e7-4390-fab1-3ef0e13eadb8"},"outputs":[{"data":{"text/plain":["array([False, True, False], 
dtype=bool)"]},"execution_count":94,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([1, 3, 0], float) \n","a > 2"]},{"cell_type":"markdown","metadata":{"id":"VMyU_4WxOmkh"},"source":["The `any` and `all` functions can be used to determine whether any or all elements of a \n","Boolean array are true (the example below uses Python's built-ins; NumPy also provides `np.any` and `np.all`):"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_MCH1hufOmkh","outputId":"50d48d57-391f-4eb8-8238-c8e431281a8d"},"outputs":[{"name":"stdout","output_type":"stream","text":["True False\n"]},{"data":{"text/plain":["False"]},"execution_count":96,"metadata":{},"output_type":"execute_result"}],"source":["c = np.array([ True, False, False], bool) \n","print (any(c), all(c))\n","any([False,False])"]},{"cell_type":"markdown","metadata":{"id":"0hh6Twt3Omkh"},"source":["The ``where`` function forms a new array from two arrays of equal size, using a Boolean filter to choose between the elements of the two. Its basic syntax is:
\n","where(boolarray, truearray, falsearray)\n","\n","Note that both `truearray` and `falsearray` are computed in full before `where` chooses between them; this is why the example below still warns about a division by zero."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9YUfXh3BOmki","outputId":"dec8da66-f6a2-4e93-b207-85f060d62d0a"},"outputs":[{"name":"stderr","output_type":"stream","text":["/usr/local/lib/python3.5/site-packages/ipykernel/__main__.py:2: RuntimeWarning: divide by zero encountered in true_divide\n"," from ipykernel import kernelapp as app\n"]},{"data":{"text/plain":["array([ 1. , 0.33333333, 0. ])"]},"execution_count":98,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([1, 3, 0], float) \n","np.where(a != 0, 1/a, 0) \n"]},{"cell_type":"markdown","metadata":{"id":"NT8Kvw8ZOmki"},"source":["### Indexing with other arrays (*Fancy indexing*)\n","\n","Arrays allow for a more sophisticated kind of indexing: you can index an array with another array, and in particular with an array of boolean values. This is particularly useful to **filter**\n","the elements of an array that match a certain condition."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"k71cQmVROmki","outputId":"1e9c9a2e-c2cb-42dc-c78c-77b453dd035b"},"outputs":[],"source":["arr = np.array([10,8,30,40])\n","print (arr)\n","mask = arr < 9 # construct a boolean array\n","# where the i-th element is True if the i-th element of arr is less than 9\n","mask"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"RUEwWwPjOmkj","outputId":"5d0069f3-dc75-4c0e-dd57-93d4cb5a0a11"},"outputs":[{"name":"stdout","output_type":"stream","text":["Values below 9: [8]\n"]}],"source":["print ('Values below 9:', arr[mask])"]},{"cell_type":"markdown","metadata":{"id":"48CcJr9LOmkj"},"source":["The Boolean mask can be converted into positional indices using the `where` function:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"A5cU_GH6Omkk","outputId":"4b40c96a-6f31-4870-8003-21554b443b94"},"outputs":[{"name":"stdout","output_type":"stream","text":["[False True False False]\n"]},{"data":{"text/plain":["(array([1]),)"]},"execution_count":109,"metadata":{},"output_type":"execute_result"}],"source":["print (mask)\n","indices = np.where(mask)\n","indices"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zvkWv1jKOmkk","outputId":"a4a92c24-45ae-4353-e9e2-b6a4752fc5be"},"outputs":[{"name":"stdout","output_type":"stream","text":["Resetting all values below 9 to 10...\n","[False True False False]\n","[10 10 30 40]\n"]},{"data":{"text/plain":["array([False, False, False, False], dtype=bool)"]},"execution_count":113,"metadata":{},"output_type":"execute_result"}],"source":["print ('Resetting all values below 9 to 10...')\n","print (arr < 9)\n","arr[arr < 9] = 10\n","print (arr)\n","arr < 9"]},{"cell_type":"markdown","metadata":{"id":"xnnHdMaZOmkk"},"source":["It is also possible to select elements using **integer arrays** that represent indices."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zzwyQPp6Omkk","outputId":"1049a74a-00d3-4632-ab0f-fe070f717ead"},"outputs":[{"name":"stdout","output_type":"stream","text":["[10 10 30 
40]\n"]},{"data":{"text/plain":["array([10, 30, 40])"]},"execution_count":123,"metadata":{},"output_type":"execute_result"}],"source":["print (arr)\n","row_indices = [1, 2, 3]\n","arr[row_indices]"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"vhHlXeFUOmkk","outputId":"684c6a7a-a7b4-429a-d44e-050ad0662f8f"},"outputs":[{"data":{"text/plain":["array([ 2., 2., 4., 8., 6., 4.])"]},"execution_count":126,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([2, 4, 6, 8], float) \n","b = np.array([0, 0, 1, 3, 2, 1], int) # the 0th, 0th, 1st, 3rd, 2nd, and 1st elements of a\n","a[b] "]},{"cell_type":"markdown","metadata":{"id":"N1q5HwSnOmkl"},"source":["For multidimensional arrays, we need one one-dimensional integer array per axis; the i-th elements of these arrays together give the position of the i-th selected element."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"WzpJjs5cOmkl","outputId":"d43eb2d1-1ba0-4efb-b758-821114713947"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1. 4.]\n"," [ 9. 16.]]\n"]},{"data":{"text/plain":["array([ 1., 4., 16., 16., 9.])"]},"execution_count":127,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([[1, 4], [9, 16]], float) \n","print (a)\n","b = np.array([0, 0, 1, 1, 1], int) \n","c = np.array([0, 1, 1, 1, 0], int) \n","a[b,c] "]},{"cell_type":"markdown","metadata":{"id":"TrLrdV7nOmkl"},"source":["## Array Attributes and Methods\n","The type of an array itself is `numpy.ndarray`, while the type of the elements it stores is contained in its *dtype* attribute:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zJ_RXpCEOmkm","outputId":"097f40ff-97e7-4863-958d-51764b57a8a9"},"outputs":[{"data":{"text/plain":["numpy.ndarray"]},"execution_count":128,"metadata":{},"output_type":"execute_result"}],"source":["# arr is an object of the type ndarray that the numpy module 
provides.\n","type(arr)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"cSEnnLTWOmkm","outputId":"198b7abf-64f5-41d0-a02c-f22f6c0cc338"},"outputs":[{"data":{"text/plain":["dtype('int64')"]},"execution_count":129,"metadata":{},"output_type":"execute_result"}],"source":["arr.dtype"]},{"cell_type":"markdown","metadata":{"id":"IkVBQY4sOmkm"},"source":["Both `arr` and `M` are of type `ndarray`; what distinguishes them structurally is their shape. We can get information about the shape of an array by using the `ndarray.shape` attribute."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"AFxf8l97Omkm","outputId":"86eda82b-384d-4cff-84e7-5261bf91a1de"},"outputs":[{"name":"stdout","output_type":"stream","text":["[10 10 30 40]\n"]},{"data":{"text/plain":["(4,)"]},"execution_count":143,"metadata":{},"output_type":"execute_result"}],"source":["arr = np.array([10,10,30,40])\n","print (arr)\n","arr.shape"]},{"cell_type":"code","execution_count":null,"metadata":{"collapsed":true,"id":"Q57UHQVJOmkm"},"outputs":[],"source":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6dEALPgNOmkm","outputId":"a7829aaa-2ec9-4f6e-88ba-6675e3fdf8ec"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n"]},{"data":{"text/plain":["(2, 4)"]},"execution_count":146,"metadata":{},"output_type":"execute_result"}],"source":["print( M)\n","M.shape"]},{"cell_type":"markdown","metadata":{"id":"OgrYX_FbOmkm"},"source":["**Don't confuse a matrix with only one row with a vector!** Their shapes are not equal:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_xdb-Ba3Omkn","outputId":"6e6d7d76-25bd-468e-928e-37de4decffcb"},"outputs":[{"name":"stdout","output_type":"stream","text":["[10 10 30 40]\n","(4,)\n","[[10 10 30 40]]\n","(1, 4)\n"]}],"source":["a1 = np.array([[10,10,30,40]])\n","print (arr)\n","print (arr.shape)\n","print (a1)\n","print (a1.shape)"]},{"cell_type":"markdown","metadata":{"id":"2-Vrice9Omkn"},"source":["The number of elements in the array is available through the `ndarray.size` attribute:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"cM00LmQ9Omkn","outputId":"f6a14282-7838-4de5-b93d-2f0794de1f69"},"outputs":[{"data":{"text/plain":["8"]},"execution_count":151,"metadata":{},"output_type":"execute_result"}],"source":["M.size"]},{"cell_type":"markdown","metadata":{"id":"5QOIvi-mOmkn"},"source":["Equivalently, we could use the functions `numpy.shape` and `numpy.size`:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"S-lvk1ZCOmko","outputId":"4209115e-7ffa-485c-ac8c-f08cdf1512a8"},"outputs":[{"data":{"text/plain":["(2, 4)"]},"execution_count":153,"metadata":{},"output_type":"execute_result"}],"source":["np.shape(M)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dJcNYCHtOmko","outputId":"28e9adef-0758-4df7-974c-d3f070cd0ade"},"outputs":[{"data":{"text/plain":["8"]},"execution_count":154,"metadata":{},"output_type":"execute_result"}],"source":["np.size(M)"]},{"cell_type":"markdown","metadata":{"id":"4wYiQ7tcOmko"},"source":["### More attributes "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zE-m1pLCOmko","outputId":"28ab0288-0689-47ca-f921-9b93e628ca6b"},"outputs":[{"data":{"text/plain":["8"]},"execution_count":155,"metadata":{},"output_type":"execute_result"}],"source":["arr.itemsize # bytes per element, int64 -> 
(8 bytes)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"-LXtmW9POmkt","outputId":"db0c3264-879f-4def-fb84-14b167cb9cfd"},"outputs":[{"data":{"text/plain":["32"]},"execution_count":156,"metadata":{},"output_type":"execute_result"}],"source":["arr.nbytes # total number of bytes: 8 bytes * 4 elements"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_YIOb24EOmkt","outputId":"0acdbce2-e292-4482-bd5f-03b06bd6421d"},"outputs":[{"name":"stdout","output_type":"stream","text":["Num dim arr: 1 Num dim M: 2\n"]}],"source":["print (\"Num dim arr:\", arr.ndim, \"Num dim M:\", M.ndim) # number of dimensions"]},{"cell_type":"markdown","metadata":{"id":"geDs67gTOmku"},"source":["### Useful Methods "]},{"cell_type":"markdown","metadata":{"id":"BAEp4UotOmku"},"source":["NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. Among these are the functions `abs`, `sign`, `sqrt`, `log`, `log10`, `exp`, `sin`, `cos`, `tan`, `arcsin`, `arccos`, `arctan`, `sinh`, `cosh`, `tanh`, `arcsinh`, `arccosh`, and `arctanh`.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6w5TTcutOmku","outputId":"22e0bd49-75be-43c1-c53d-105790dd7ce0"},"outputs":[{"data":{"text/plain":["array([ 1., 2., 3.])"]},"execution_count":158,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([1, 4, 9], float) \n","np.sqrt(a)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"XaqPKD-oOmku","outputId":"4ef8d34d-6288-4a8e-d271-cf00c49753c0"},"outputs":[{"name":"stdout","output_type":"stream","text":["[ 1. 4. 
9.]\n","Minimum and maximum : 1.0 9.0\n","Sum and product of all elements : 14.0 36.0\n","Mean and standard deviation : 4.66666666667 3.29983164554\n"]}],"source":["print (a)\n","print ('Minimum and maximum :', a.min(), a.max())\n","print ('Sum and product of all elements :', a.sum(), a.prod())\n","print ('Mean and standard deviation :', a.mean(), a.std())"]},{"cell_type":"markdown","metadata":{"id":"IRtC0JXKOmku"},"source":["To find the index of the maximum or minimum value, we can use `argmax` and `argmin`:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"4v4xul6JOmku","outputId":"1e84ce3a-b95c-42e6-88ad-e23d8b26604a"},"outputs":[{"name":"stdout","output_type":"stream","text":["[ 1. 4. 9.]\n"]},{"data":{"text/plain":["2"]},"execution_count":160,"metadata":{},"output_type":"execute_result"}],"source":["print (a)\n","np.argmax(a)"]},{"cell_type":"markdown","metadata":{"id":"GJ_NjpDEOmkv"},"source":["By default, these methods compute the operations above over all the elements of the array. 
But for a multidimensional array, it is possible to do the computation along a single dimension by passing the `axis` parameter; for example:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3BtNlE6pOmkv","outputId":"a40b90f1-0f01-431c-81c5-101240df47b5"},"outputs":[{"name":"stdout","output_type":"stream","text":["For the following array:\n"," [[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n","The sum of all elements is : 59\n","The sum of elements along the columns is : [ 1 20 -2 40]\n","The sum of elements along the rows is : [60 -1]\n"]}],"source":["print ('For the following array:\\n', M)\n","print ('The sum of all elements is :', M.sum())\n","print ('The sum of elements along the columns is :', M.sum(axis=0))\n","print ('The sum of elements along the rows is :', M.sum(axis=1))\n"]},{"cell_type":"markdown","metadata":{"id":"Vtja7psTOmkv"},"source":["To find the unique values in an array, we can use the `unique` function:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"YJjWInQzOmkv","outputId":"3a80356f-e9ce-4399-889e-e6399e3ae35f"},"outputs":[{"name":"stdout","output_type":"stream","text":["[10 10 30 40]\n"]},{"data":{"text/plain":["array([10, 30, 40])"]},"execution_count":163,"metadata":{},"output_type":"execute_result"}],"source":["print (arr)\n","np.unique(arr)"]},{"cell_type":"markdown","metadata":{"id":"sD-9wVtmOmkv"},"source":["### Reshaping, resizing and stacking arrays"]},{"cell_type":"markdown","metadata":{"id":"m3vJgRK8Omkv"},"source":["The shape of a NumPy array can usually be modified without copying the underlying data (the reshaped array is a *view* of the same data), which makes it a fast operation even for large arrays."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"xW6YzVnqOmkw","outputId":"05ece7e9-65fe-4da7-b23b-f8db82f3a193"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n"]},{"data":{"text/plain":["(2, 4)"]},"execution_count":164,"metadata":{},"output_type":"execute_result"}],"source":["print (M)\n","n, m = 
M.shape\n","n,m"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MVbmfNSLOmkw","outputId":"39afd948-f3ab-44a1-d3b6-97bf0f22cc58"},"outputs":[{"name":"stdout","output_type":"stream","text":["(8,)\n"]},{"data":{"text/plain":["array([ 1, 20, -1, 40, 0, 0, -1, 0])"]},"execution_count":165,"metadata":{},"output_type":"execute_result"}],"source":["B = M.reshape(n*m) # 2-D matrix to 1-D array\n","print (B.shape)\n","B"]},{"cell_type":"markdown","metadata":{"id":"asutAXf0Omkw"},"source":["Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate`, we can create larger vectors and matrices from smaller ones:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_eIIZaEgOmkw","outputId":"71d13ddc-6239-4704-e2b1-42bc419421b0"},"outputs":[{"data":{"text/plain":["array([[1, 2],\n"," [3, 4]])"]},"execution_count":166,"metadata":{},"output_type":"execute_result"}],"source":["a = np.array([[1, 2], [3, 4]])\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"_1WMixANOmkw","outputId":"92a1f699-4f41-4ea7-c3a6-d2ac79f54fa8"},"outputs":[{"data":{"text/plain":["array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])"]},"execution_count":167,"metadata":{},"output_type":"execute_result"}],"source":["# repeat each element 3 times (without an axis, the result is flattened)\n","np.repeat(a, 3)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"uG10yjv1Omkw","outputId":"cb4a8311-8670-43c8-c4ea-73a79beb63ac"},"outputs":[{"data":{"text/plain":["array([[1, 2, 1, 2, 1, 2],\n"," [3, 4, 3, 4, 3, 4]])"]},"execution_count":168,"metadata":{},"output_type":"execute_result"}],"source":["# tile the matrix 3 times along the last axis\n","np.tile(a, 3)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"mSMK0BuIOmkx","outputId":"e6a6d9f7-fded-43c2-9353-91c8d7803e74"},"outputs":[{"data":{"text/plain":["array([[1, 2],\n"," [3, 4],\n"," [5, 6]])"]},"execution_count":169,"metadata":{},"output_type":"execute_result"}],"source":["np.concatenate((a, np.array([[5, 6]])), 
axis=0)"]},{"cell_type":"markdown","metadata":{"id":"LO3V11TdOmkx"},"source":["To transpose a matrix, we can use the array attribute `T`; here it turns the row `[[5, 6]]` into a column before concatenating along `axis=1`:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"GXTNcOTzOmkx","outputId":"e1aeae90-508b-4bee-f542-1592cac87465"},"outputs":[{"data":{"text/plain":["array([[1, 2, 5],\n"," [3, 4, 6]])"]},"execution_count":170,"metadata":{},"output_type":"execute_result"}],"source":["np.concatenate((a, np.array([[5, 6]]).T), axis=1)\n"]},{"cell_type":"markdown","metadata":{"id":"svWz4667Omkx"},"source":["**hstack** and **vstack**: shortcuts for concatenating arrays horizontally and vertically"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"MIi3wV4VOmkx","outputId":"793d76d4-db5f-4ea0-e516-b4e93c2af05c"},"outputs":[{"data":{"text/plain":["array([[1, 2],\n"," [3, 4],\n"," [5, 6]])"]},"execution_count":171,"metadata":{},"output_type":"execute_result"}],"source":["np.vstack((a,np.array([[5, 6]])))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"cp5YEl1fOmkx","outputId":"f9c448c7-0bfe-4cb8-85b1-a20bac7535e7"},"outputs":[{"data":{"text/plain":["array([[1, 2, 5],\n"," [3, 4, 6]])"]},"execution_count":172,"metadata":{},"output_type":"execute_result"}],"source":["np.hstack((a,np.array([[5, 6]]).T))"]},{"cell_type":"markdown","metadata":{"id":"aJTlqTT4Omkx"},"source":["## Copy and \"deep copy\""]},{"cell_type":"markdown","metadata":{"id":"wBZGl6VFOmky"},"source":["To achieve high performance, assignments in Python usually do not copy the underlying objects. 
This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: pass by reference)."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tXGf5-MWOmky","outputId":"64bb8adc-0cc0-4757-df5d-710b5e3b13b0"},"outputs":[{"data":{"text/plain":["array([[1, 2],\n"," [3, 4]])"]},"execution_count":173,"metadata":{},"output_type":"execute_result"}],"source":["A = np.array([[1, 2], [3, 4]])\n","A"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"TRu42MggOmky"},"outputs":[],"source":["# now B is referring to the same array data as A \n","B = A "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"tBxs9iY_Omky","outputId":"fc4b9728-6752-41c9-a528-a29c5248e4b3"},"outputs":[{"data":{"text/plain":["array([[10, 2],\n"," [ 3, 4]])"]},"execution_count":177,"metadata":{},"output_type":"execute_result"}],"source":["# changing B affects A\n","B[0,0] = 10\n","B"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Nb3pUPoBOmky","outputId":"8ab0c96b-7803-42d6-9757-359464811a14"},"outputs":[{"data":{"text/plain":["array([[10, 2],\n"," [ 3, 4]])"]},"execution_count":178,"metadata":{},"output_type":"execute_result"}],"source":["A"]},{"cell_type":"markdown","metadata":{"id":"eJsQW91SOmkz"},"source":["If we want to avoid this behavior and obtain a new, completely independent object B copied from A, we need to make a so-called \"deep copy\" using the function `copy`:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"A8NEKeOMOmkz"},"outputs":[],"source":["B = np.copy(A)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"IBKnfSM1Omkz","outputId":"5cc3ecbf-f2fa-44ce-90e3-d620f632603f"},"outputs":[{"data":{"text/plain":["array([[-5, 2],\n"," [ 3, 4]])"]},"execution_count":181,"metadata":{},"output_type":"execute_result"}],"source":["# now, if we modify B, A is not affected\n","B[0,0] = 
-5\n","B"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"6goWGDcgOmkz","outputId":"e489d776-c63b-44fe-ad12-220f9a5038b4"},"outputs":[{"data":{"text/plain":["array([[10, 2],\n"," [ 3, 4]])"]},"execution_count":182,"metadata":{},"output_type":"execute_result"}],"source":["A"]},{"cell_type":"markdown","metadata":{"id":"IpLoqaVXOmkz"},"source":["## Operating with arrays\n","Arrays support all regular arithmetic operators, and the numpy library also contains a complete collection of basic mathematical functions that operate on arrays. It is important to remember that in general, all operations with arrays are applied *element-wise*, i.e., are applied to all the elements of the array at the same time. "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"c_yocYIROmkz","outputId":"701aebce-401b-481a-d9bf-77d8fb650e37"},"outputs":[{"data":{"text/plain":["array([0, 1, 2, 3])"]},"execution_count":183,"metadata":{},"output_type":"execute_result"}],"source":["v1 = np.arange(0, 4)\n","v1"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"N-AM2CIsOmkz","outputId":"fdde4232-b8b3-448f-ba93-f4ed4e0507df"},"outputs":[{"data":{"text/plain":["array([0, 2, 4, 6])"]},"execution_count":184,"metadata":{},"output_type":"execute_result"}],"source":["v1 * 2"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9h1C9Ye1Omk0","outputId":"36c9b25a-73fd-4791-b6df-ad566d6d77b9"},"outputs":[{"data":{"text/plain":["array([2, 3, 4, 5])"]},"execution_count":185,"metadata":{},"output_type":"execute_result"}],"source":["v1 + 2"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"rSupK0i_Omk0","outputId":"1d001455-9f20-4781-9a00-e8249d3c0a78"},"outputs":[{"data":{"text/plain":["array([[ 2, 40, -2, 80],\n"," [ 0, 0, -2, 0]])"]},"execution_count":186,"metadata":{},"output_type":"execute_result"}],"source":["M*2"]},{"cell_type":"markdown","metadata":{"id":"u5WYQHDHOmk0"},"source":["When we add, subtract, multiply and divide arrays with each 
other, the operations are, by default, **element-wise**:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"CerxJValOmk0","outputId":"78dc7da8-94d3-49ea-b86f-0e7ed07a0ef4"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n"]},{"data":{"text/plain":["array([[ 1, 400, 1, 1600],\n"," [ 0, 0, 1, 0]])"]},"execution_count":187,"metadata":{},"output_type":"execute_result"}],"source":["print (M)\n","M*M"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"asaa1T4vOmk0","outputId":"0c1913d3-21a7-4cc0-8180-04e1a2be3e17"},"outputs":[{"data":{"text/plain":["array([0, 1, 4, 9])"]},"execution_count":188,"metadata":{},"output_type":"execute_result"}],"source":["v1*v1"]},{"cell_type":"markdown","metadata":{"id":"6159t26cOmk0"},"source":["If we multiply arrays with compatible shapes, each row of the matrix is multiplied element-wise by the vector:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"xdpI01HEOmk1","outputId":"78858f7f-23d3-4509-eae8-5257f31ee397"},"outputs":[{"data":{"text/plain":["((2, 4), (4,))"]},"execution_count":189,"metadata":{},"output_type":"execute_result"}],"source":["M.shape, v1.shape"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JsRWjmBrOmk1","outputId":"aa55f67d-ed2a-423a-e1be-aae087aa3858"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n","[0 1 2 3]\n"]},{"data":{"text/plain":["array([[ 0, 20, -2, 120],\n"," [ 0, 0, -2, 0]])"]},"execution_count":190,"metadata":{},"output_type":"execute_result"}],"source":["print (M)\n","print (v1)\n","M * v1"]},{"cell_type":"markdown","metadata":{"id":"I_H56z7WOmk1"},"source":["What about matrix multiplication? 
We can use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner (vector-vector) product to its two arguments:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2ccP8mhGOmk1","outputId":"aa64b094-a78f-4d2e-c66d-447fff557921"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 1 20 -1 40]\n"," [ 0 0 -1 0]]\n","[0 1 2 3]\n"]},{"data":{"text/plain":["array([138, -2])"]},"execution_count":191,"metadata":{},"output_type":"execute_result"}],"source":["print (M)\n","print (v1)\n","np.dot(M,v1)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"lTx-or_yOmk2","outputId":"c29da683-c32b-4433-e7c9-b0388fb06271"},"outputs":[{"data":{"text/plain":["14"]},"execution_count":192,"metadata":{},"output_type":"execute_result"}],"source":["np.dot(v1,v1)"]},{"cell_type":"markdown","metadata":{"id":"tqghkS9SOmk2"},"source":["### Broadcasting "]},{"cell_type":"markdown","metadata":{"id":"I9Of8Z_lOmk2"},"source":["Broadcasting means that, although in principle arrays must match in their dimensionality for an operation to be valid, numpy will *broadcast* dimensions when possible. 
The previous examples of operations between a scalar and a vector are already a form of broadcasting:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"GO8kXl0nOmk2","outputId":"ece5989a-8675-4eb1-a84e-facbea155c64"},"outputs":[{"name":"stdout","output_type":"stream","text":["[0 1 2 3]\n"]},{"data":{"text/plain":["array([5, 6, 7, 8])"]},"execution_count":193,"metadata":{},"output_type":"execute_result"}],"source":["print (v1)\n","v1 + 5 # broadcasting => [0 1 2 3] + [5 5 5 5]"]},{"cell_type":"markdown","metadata":{"id":"OOFwox_0Omk2"},"source":["We can also broadcast a 1D array to a 2D array, in this case adding a vector to all rows of a matrix:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"iMXSAwylOmk2","outputId":"ced84c72-8fa8-4d6f-e881-ba461e41461e"},"outputs":[{"data":{"text/plain":["array([[ 1., 2., 3., 4.],\n"," [ 1., 2., 3., 4.],\n"," [ 1., 2., 3., 4.],\n"," [ 1., 2., 3., 4.]])"]},"execution_count":194,"metadata":{},"output_type":"execute_result"}],"source":["np.ones((4, 4)) + v1 # broadcasting = np.ones((4, 4)) + np.tile(v1, (4, 1))"]},{"cell_type":"markdown","metadata":{"id":"af5UM4dxOmk2"},"source":["We can also broadcast in two directions at a time:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"s7QDnIdDOmk3","outputId":"0f7dec65-c0ee-4368-c3c9-6fa41bf496ab"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[0]\n"," [1]\n"," [2]\n"," [3]]\n","[0 1 2 3]\n"]},{"data":{"text/plain":["array([[0, 1, 2, 3],\n"," [1, 2, 3, 4],\n"," [2, 3, 4, 5],\n"," [3, 4, 5, 6]])"]},"execution_count":195,"metadata":{},"output_type":"execute_result"}],"source":["print (v1.reshape((4, 1)))\n","print (np.arange(4))\n","v1.reshape((4, 1)) + np.arange(4)"]},{"cell_type":"markdown","metadata":{"id":"0PrUHU9ZOmk3"},"source":["**Rules of Broadcasting**\n","\n","Broadcasting follows these rules:\n","\n","1. 
If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.\n","\n","2. If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.\n","\n","3. If in any dimension the sizes disagree and neither is equal to 1, an error is raised.\n","\n","All of this happens without ever creating the stretched arrays in memory: when NumPy broadcasts to create new dimensions or to *stretch* existing ones, it does not replicate the data. This is what makes broadcasting so powerful in practice.\n","\n","In the first example, `v1 + 5`, the operation is carried out as if 5 were a 1-D array with 5 in all of its entries, but no such array is ever created.\n","\n","In the example `v1.reshape((4, 1)) + np.arange(4)`:\n","\n","- the second array, of shape (4,), is 'promoted' to a 2-dimensional array of shape (1, 4)\n","- the second array is 'stretched' to shape (4, 4)\n","- the first array, of shape (4, 1), is 'stretched' to shape (4, 4)\n","\n","Then the operation proceeds as if on two 4 $\\times$ 4 arrays."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"7OH1Y3dCOmk3","outputId":"0afca0d9-ec19-41c6-a2c9-4fa97ba43c05"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[0 1 2 3]\n"," [0 1 2 3]\n"," [0 1 2 3]\n"," [0 1 2 3]]\n","[[0 0 0 0]\n"," [1 1 1 1]\n"," [2 2 2 2]\n"," [3 3 3 3]]\n"]},{"data":{"text/plain":["array([[0, 1, 2, 3],\n"," [1, 2, 3, 4],\n"," [2, 3, 4, 5],\n"," [3, 4, 5, 6]])"]},"execution_count":196,"metadata":{},"output_type":"execute_result"}],"source":["# Broadcasting unrolled: build the stretched arrays explicitly with np.tile\n","print (np.tile(np.arange(4),(4,1)))\n","print (np.tile(v1.reshape((4,1)),4))\n","np.tile(np.arange(4),(4,1)) + np.tile(v1.reshape((4,1)),4)"]},{"cell_type":"markdown","metadata":{"id":"Cz_6AQLBOmk3"},"source":["### Visualizing 
Broadcasting\n","\n","\n","\n","([image source](http://www.astroml.org/book_figures/appendix/fig_broadcast_visual.html))"]},{"cell_type":"markdown","metadata":{"id":"338O6FNQOmk3"},"source":["We can also use the `newaxis` constant to specify explicitly how an array should be broadcast:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"NrlLRl7oOmk3","outputId":"97e3aaed-cee8-4063-b03d-803c0e7bb1ae"},"outputs":[{"name":"stdout","output_type":"stream","text":["[[ 0. 0.]\n"," [ 0. 0.]] [-1. 3.]\n","[[-1. 3.]\n"," [-1. 3.]]\n","[[-1. 3.]\n"," [-1. 3.]]\n","[[-1. -1.]\n"," [ 3. 3.]]\n"]}],"source":["a = np.zeros((2,2), float)\n","b = np.array([-1., 3.], float)\n","print (a, b)\n","print ()\n","print (a + b)\n","print ()\n","print (a + b[np.newaxis,:])  # b as a row, shape (1, 2)\n","print ()\n","print (a + b[:,np.newaxis])  # b as a column, shape (2, 1)"]},{"cell_type":"markdown","metadata":{"id":"34nVkeTeOmk4"},"source":["# Further reading\n","\n","* http://numpy.scipy.org\n","* http://scipy.org/Tentative_NumPy_Tutorial\n","* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"KUZNE4mQOmk4"},"outputs":[],"source":[]},{"cell_type":"markdown","metadata":{"id":"hff_Y_wuOmk4"},"source":["# Exercises"]},{"cell_type":"markdown","metadata":{"id":"Mhe9EvavOmk4"},"source":["1) The file `genes.csv` contains expression values for 5 genes at 4 time points."]},{"cell_type":"code","execution_count":19,"metadata":{"id":"vflUdqRFOmk4","colab":{"base_uri":"https://localhost:8080/","height":52},"executionInfo":{"status":"ok","timestamp":1676634579607,"user_tz":-60,"elapsed":201,"user":{"displayName":"toni 
Monleon","userId":"05269440626861479105"}},"outputId":"9e0009cf-8753-4f63-ebdc-79c3dc274684"},"outputs":[{"output_type":"stream","name":"stdout","text":["cat: genes.csv: No such file or directory\n"]},{"output_type":"execute_result","data":{"text/plain":["/content/genes.csv"],"text/html":["Path (genes.csv) doesn't exist. 
It may still be in the process of being generated, or you may have the incorrect path."]},"metadata":{},"execution_count":19}],"source":["from IPython.lib import display\n","%cat 'genes.csv' \n","display.FileLink('genes.csv')\n"]},{"cell_type":"markdown","metadata":{"id":"PX-uj88kOmk5"},"source":[" - Create a single array for the data (5 genes x 4 time points)\n"," - Find the mean expression value per gene\n"," - Find the mean expression value per time point\n"," - Which gene has the maximum mean expression value? (Use `Tab` completion on an array to explore its methods.)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"l6XRTigLOmk5"},"outputs":[],"source":["# Your code here"]},{"cell_type":"markdown","metadata":{"id":"Rrc_prZyOmk5"},"source":["
`ipythonblocks` is a teaching tool that lets students experiment with Python flow-control concepts and immediately see the effects of their code, represented as a grid of colored blocks. `BlockGrid` objects can be **indexed and sliced like 2D NumPy arrays**, which makes them good practice for learning how to access arrays.
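
Because `BlockGrid` indexing mirrors NumPy indexing, the slice assignments used in the cells below can be tried first on a plain NumPy integer array. This also works where `ipythonblocks` is not installed. The sketch below is a minimal illustration of that idea. The integer labels simply stand in for colors; they are an illustrative choice and not part of the `ipythonblocks` API.

```python
import numpy as np

# Plain NumPy stand-in for an 8 x 8 BlockGrid: each integer label plays the role of a color
a = np.zeros((8, 8), dtype='int64')

a[0:2, :] = 1     # rows 0 and 1, every column
a[2, 1:] = 2      # row 2, from column 1 to the end
a[:, ::2] = 4     # every second column, starting at column 0
a[::2, ::3] = 5   # every second row and every third column

print(a)
```

Where slices overlap, later assignments overwrite earlier ones, just as later color assignments overwrite earlier ones on a grid.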
"]},{"cell_type":"code","execution_count":20,"metadata":{"id":"TUsDAJAqOmk5","outputId":"c9a9a3cb-0908-437f-c4e9-f216784e1976","colab":{"base_uri":"https://localhost:8080/","height":235},"executionInfo":{"status":"error","timestamp":1676634588858,"user_tz":-60,"elapsed":169,"user":{"displayName":"toni Monleon","userId":"05269440626861479105"}}},"outputs":[],"source":["# ipythonblocks is not preinstalled in Colab, so install it first\n","!pip install -q ipythonblocks\n","import numpy as np\n","from ipythonblocks import BlockGrid\n","from ipythonblocks import colors\n","grid = BlockGrid(8, 8, fill=(123, 234, 
123))\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"vpQcQf1COmk5","outputId":"88281a99-49cf-4abb-e6bb-62a4c2ece3c8"},"outputs":[{"data":{"text/plain":["array([[0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0]])"]},"execution_count":199,"metadata":{},"output_type":"execute_result"}],"source":["a = np.zeros((8, 8), dtype='int64')  # numeric twin of the 8 x 8 grid\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"crfF7dIHOmk5","outputId":"2e7d087d-4896-44aa-840a-42ab331f5f22"},"outputs":[{"data":{"text/html":["
"],"text/plain":["Block(123, 234, 123, size=20)"]},"execution_count":200,"metadata":{},"output_type":"execute_result"}],"source":["grid[0, 0]  # access the block at row 0, column 0"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"P14HvuedOmk5","outputId":"c33bebfb-3d32-40a3-b4c5-92c616287a93"},"outputs":[{"data":{"text/html":["
"],"text/plain":[""]},"metadata":{},"output_type":"display_data"}],"source":["grid[0:2,:] = colors['Teal']\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"FhTT2nnQOmk5","outputId":"fa2c602f-13b6-4d08-af3c-cafcd9c411eb"},"outputs":[{"data":{"text/plain":["array([[1, 1, 1, 1, 1, 1, 1, 1],\n"," [1, 1, 1, 1, 1, 1, 1, 1],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0],\n"," [0, 0, 0, 0, 0, 0, 0, 0]])"]},"execution_count":202,"metadata":{},"output_type":"execute_result"}],"source":["a[0:2,:] = 1\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Wmou7EEVOmk6","outputId":"53b6ed9f-9d31-49f4-b11c-666d194eddfb"},"outputs":[{"data":{"text/html":["
"],"text/plain":[""]},"metadata":{},"output_type":"display_data"}],"source":["grid[2,1:] = colors['Blue']\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BjPcKWxGOmk6"},"outputs":[],"source":["a[2,1:] = 2\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"aLB4QxcNOmk6"},"outputs":[],"source":["grid[:2,2:3] = colors['Peru']\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"9OvagzXUOmk6"},"outputs":[],"source":["a[:2,2:3] = 3\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"C0NXb6APOmk6"},"outputs":[],"source":["grid[:,::2] = colors['Peru']\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"pQph2OxZOmk6"},"outputs":[],"source":["a[:,::2] = 4\n","a"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BfIdhTnAOmk6"},"outputs":[],"source":["grid[::2,::3] = colors['Red']\n","grid.show()"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ON7tEiKBOmk7"},"outputs":[],"source":["a[::2,::3] = 5\n","a"]},{"cell_type":"markdown","metadata":{"id":"klGJsmOwOmk7"},"source":["2) Build a graphical representation of all the multiples of 3 from 0 to 49 using only the slicing operator (no loops)."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"8NFsy7_lOmk7"},"outputs":[],"source":["grid = BlockGrid(50, 1, block_size=10, fill=(123, 234, 123))\n","grid\n","# Your solution here"]},{"cell_type":"markdown","metadata":{"id":"ExPrBbHSOmk7"},"source":["3) Build a graphical representation of an 8x8 chessboard using only the slicing operator (no loops)."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"FN1JXE2VOmk7"},"outputs":[],"source":["grid = BlockGrid(8, 8, block_size=20, fill=(0, 0, 0))\n","# Your solution here"]},{"cell_type":"markdown","metadata":{"id":"U-gldXIVOmk8"},"source":["4) Build a graphical representation of the prime numbers from 0 to 4999. 
(Hint: compute the list of prime numbers and map it onto the grid.)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ZxV62qVJOmk8"},"outputs":[],"source":["grid = BlockGrid(50, 100, block_size=10, fill=(123, 234, 123))\n","grid\n","# Your solution here"]}],"metadata":{"kernelspec":{"display_name":"Python 3.7.7rc1 64-bit","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.7.7"},"vscode":{"interpreter":{"hash":"5060c395ae64037ee8844c2173a43fa7410db02f7938ed2393fe6924352a78b7"}},"colab":{"provenance":[],"collapsed_sections":["-z1NA-L1OmkP","FmasiQbBOmkT","9oUp8vnpOmkY","aKjQ0Jn3OmkY","h9b07wtqOmkd","oBxy8rcCOmkg","NT8Kvw8ZOmki","TrLrdV7nOmkl","4wYiQ7tcOmko","geDs67gTOmku","sD-9wVtmOmkv","aJTlqTT4Omkx","IpLoqaVXOmkz","tqghkS9SOmk2","Cz_6AQLBOmk3"]}},"nbformat":4,"nbformat_minor":0}