Introduction

In the previous chapter, we discussed some applications of machine learning and even built models with the scikit-learn Python package. The previous chapter also covered how to preprocess real-world datasets so that they can be used for modeling. To do this, we converted all the variables into numerical data types and converted categorical variables into dummy variables. We applied the logistic regression algorithm to the online shoppers purchasing intention dataset to classify website users by their purchase intention. We then advanced our model-building skills by adding regularization to our models to improve their performance.

In this chapter, we will continue learning how to build machine learning models and extend our knowledge so that we can build an Artificial Neural Network (ANN) with the Keras package. (Remember that ANNs represent a large class of machine learning algorithms, so called because their architecture resembles the network of neurons in the human brain.)

Keras is a machine learning library designed specifically for building neural networks. While scikit-learn covers a broader range of machine learning algorithms, its functionality for neural networks is minimal.

ANNs can perform the same machine learning tasks as other algorithms, such as logistic regression for classification, linear regression for regression, and k-means for clustering. At the start of any machine learning problem, we need to ask the following questions to determine what kind of task it is (regression, classification, or clustering):

  • What outcomes matter the most to me or my business? For example, if you are predicting the value of stock market indices, you could predict whether the price is higher or lower than the previous time point (which would be a classification task) or you could predict the value itself (which would be a regression problem). Each may lead to a different subsequent action or trading strategy.

    The following plot shows a candlestick chart, which describes price movements in financial data; here, it depicts the price of a stock index. The colors represent whether the price increased (green) or decreased (red) over each period, and each candlestick shows the open, close, high, and low values for that period, which are important pieces of information for stock prices.

    Note

    You can find the high-quality color images for this chapter at: https://packt.live/38nenXS.

    One goal of modeling this data would be to predict what happens the following day. A classification task might predict a positive or negative change in the stock price, and since there are only two possible values, this would be a binary classification task. Another option would be to predict the value of the stock the following day. Since the predicted value would be a continuous variable, this would be a regression task:

Figure 2.1: A candlestick chart indicating the movement of a stock index over the span of a month

  • Do we have appropriately labeled data to train a model? For a supervised learning task, we must have at least some labeled data in order to train a model. For example, if we want to build a model that classifies images as dog images or cat images, we need training data consisting of the images themselves and labels indicating whether each one is a dog image or a cat image. ANNs often need a lot of data: for image classification, developing accurate, robust models can take millions of images. This may be a determining factor when deciding which algorithm is appropriate for a given task.
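To make the stock-price example from the first question concrete, the following minimal sketch derives both kinds of target from one series of daily closing prices. The prices are made-up values, the function name `make_targets` is hypothetical, and labeling an unchanged price as 0 (not an increase) is an assumption of this sketch. Note that for next-day prediction the labels come from the data itself: each day's target is simply the following day's price, or the direction of its change.

```python
# Hypothetical daily closing prices; these numbers are made up for illustration.
closes = [101.2, 102.5, 101.9, 103.4, 103.1, 104.0]

def make_targets(prices):
    """Build features and both kinds of next-day target from a price series.

    Each example uses today's close as its (single) feature. The classification
    target is 1 if the next close is higher than today's close and 0 otherwise
    (an unchanged price counts as 0, an assumption of this sketch). The
    regression target is the next close itself.
    """
    features = prices[:-1]   # the last day has no "next day", so drop it
    next_day = prices[1:]    # tomorrow's close for each remaining day
    direction = [1 if nxt > cur else 0 for cur, nxt in zip(features, next_day)]
    return features, direction, next_day

X, y_class, y_reg = make_targets(closes)
print(y_class)  # binary labels -> classification task
print(y_reg)    # continuous values -> regression task
```

The same inputs support either task; what changes is only the target we choose to predict, which is why the question about which outcome matters comes first.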

ANNs are one type of machine learning algorithm among many. They excel in certain aspects and have drawbacks in others, and these pros and cons should be considered before choosing this type of algorithm. Deep learning networks are distinguished from single-layer ANNs by their depth, that is, the total number of hidden layers within the network.

So, deep learning is really just a specific subgroup of machine learning that relies on ANNs with multiple layers. We encounter the results of deep learning on a regular basis, whether in image classification models, such as the friend recognition models that help tag friends in your Facebook photos, or in the recommendation algorithms that help suggest your next favorite songs on Spotify. Deep learning models are becoming more prevalent than traditional machine learning models for a variety of reasons, including the growing volume of unstructured data, which deep learning models excel at handling, and the falling cost of computation.

Choosing whether to use ANNs or traditional machine learning algorithms, such as linear regression and decision trees, for a particular task is a matter of experience and of understanding the inner workings of the algorithms themselves. As such, the next two sections outline the advantages of each approach.

Advantages of ANNs over Traditional Machine Learning Algorithms

  • The best performance: For many supervised learning tasks, the best-performing models have been ANNs trained on large amounts of data. For example, in classification tasks such as classifying images from the ImageNet challenge (a large-scale visual recognition challenge for classifying images into 1000 classes), ANNs can attain greater accuracy than humans.
  • Scale effectively with data: The performance of traditional machine learning algorithms, such as logistic regression and decision trees, plateaus as the amount of data grows, whereas the ANN architecture is able to learn higher-level features, that is, nonlinear combinations of the input features that may be important for classification or regression tasks. This allows ANNs, especially those with a deep architecture, to keep improving when provided with large amounts of data. For example, the ImageNet database contains over 14 million labeled images, and the models that perform well in the ImageNet challenge are trained on more than a million of them. The following figure shows the performance scaling with the amount of data for both deep learning algorithms and traditional machine learning algorithms:

Figure 2.2: Performance scaling with the amount of data for both deep learning algorithms and traditional machine learning algorithms

  • No need for feature engineering: ANNs are able to identify which features are important, so they can model directly from raw data. For example, in the binary classification of dog and cat images, there is no need to define features such as the color, size, or weight of the animal; the images themselves are sufficient for the ANN to learn the classification. In traditional machine learning algorithms, these features must be engineered in an iterative process that is manual and can be time-consuming.
  • Adaptable and transferable: Weights and features that are learned by ANNs can be applied to similar tasks. In computer vision tasks, pre-trained classification models can be used as the starting points for building models for other classification tasks. For example, VGG-16 is a 16-layer deep learning model trained on ImageNet data to classify images into 1000 object categories. The weights learned by this model can be transferred to classify other objects in significantly less time than training from scratch would take.
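The idea of higher-level features as nonlinear combinations of the inputs can be illustrated with the classic XOR problem: the output should be 1 when exactly one of two binary inputs is 1. No single linear decision boundary, which is what logistic regression uses, separates these cases, but one hidden layer of nonlinear units can build features that make the problem easy. The weights below are chosen by hand for illustration (a trained network would learn its own), and ReLU is assumed as the nonlinearity, so this is a sketch rather than a trained model.

```python
def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out negative ones."""
    return max(0.0, z)

def xor_net(x1, x2):
    """A 2-input, 2-hidden-unit, 1-output network with hand-picked weights.

    Each hidden unit computes a weighted sum of the inputs plus a bias
    (a linear step), then applies ReLU (a nonlinear step). The output is a
    linear combination of the two hidden features.
    """
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    return h1 - 2 * h2        # 1 when exactly one input is on, else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Neither hidden feature solves XOR on its own, but their combination does; this is the kind of derived feature that a network learns automatically instead of relying on manual feature engineering.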

However, traditional machine learning algorithms also have some advantages over ANNs, as explained in the following section.

Advantages of Traditional Machine Learning Algorithms over ANNs

  • Relatively good performance when the available training data is small: In order to attain high performance, ANNs require a lot of data, and the deeper the network, the more data is required. As layers are added, the number of parameters that need to be learned also increases, which means more training time is needed to reach good parameter values. For example, VGG-16 has over 138 million parameters and was trained on more than a million hand-labeled ImageNet images to learn them all.
  • Cost-effective: Deep networks can be costly, both financially and computationally, because they take a lot of computing power and time to train, and these resources may not be available to everyone. Moreover, these models are time-consuming to tune effectively and require a domain expert who's familiar with the inner workings of the model to achieve optimal performance.
  • Easy to interpret: Many traditional machine learning models are easy to interpret, so identifying which features have the most predictive power in the model is straightforward. This can be incredibly useful when working with non-technical team members who wish to understand and interpret the results of the model. ANNs are considered more of a black box: although they are successful at classifying images and performing other tasks, how they arrive at their predictions is unintuitive and buried in layers of computations. As such, interpreting their results requires more effort.
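The figure of over 138 million parameters for VGG-16 can be checked with simple arithmetic. The sketch below counts the weights and biases of each layer, assuming the standard VGG-16 layer configuration: thirteen 3x3 convolutional layers on a 224x224x3 input, followed by three fully connected layers ending in 1000 classes. The helper names are hypothetical.

```python
# Assumed VGG-16 configuration: thirteen 3x3 convolutional layers followed by
# three fully connected layers. Each tuple is (input_channels, output_channels).
CONV_LAYERS = [
    (3, 64), (64, 64),                     # block 1
    (64, 128), (128, 128),                 # block 2
    (128, 256), (256, 256), (256, 256),    # block 3
    (256, 512), (512, 512), (512, 512),    # block 4
    (512, 512), (512, 512), (512, 512),    # block 5
]
# Fully connected layers as (inputs, outputs). The first one takes the
# flattened 7x7x512 feature map produced for a 224x224 input image.
DENSE_LAYERS = [(7 * 7 * 512, 4096), (4096, 4096), (4096, 1000)]

def conv_params(c_in, c_out, k=3):
    """k*k*c_in weights per filter, c_out filters, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight for every input-output pair plus one bias per output."""
    return n_in * n_out + n_out

conv_total = sum(conv_params(i, o) for i, o in CONV_LAYERS)
dense_total = sum(dense_params(i, o) for i, o in DENSE_LAYERS)
total = conv_total + dense_total
print(f"conv: {conv_total:,}  dense: {dense_total:,}  total: {total:,}")
```

Under these assumptions the total comes to 138,357,544, and roughly 124 million of those parameters sit in the three fully connected layers. Every added layer contributes its own weights and biases, which is why deeper networks generally need more data and training time.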

Hierarchical Data Representation

One reason that ANNs are able to perform so well is that a large number of layers allows the network to learn representations of the data at many different levels. This is illustrated in the following diagram, which shows the representations learned by an ANN that is used to identify faces. At lower levels of the model, simple features are learned, such as edges and gradients, as can be seen in the features learned by the initial layers. As the model progresses, combinations of lower-level features activate to form face parts, and at later layers of the model, generic faces are learned. This is known as a feature hierarchy, and it illustrates the power that this layered representation has for model building and interpretation.

Real-world applications of deep neural networks often take images, video, or natural language text as input. The feature hierarchy learned by deep neural networks allows them to discover latent structure within such unlabeled, unstructured data, which makes them useful for processing real-world data that is most often raw and unprocessed.

The following diagram shows an example of the learned representation of a deep learning model, in which lower-level features such as edges and gradients activate together to form the generic face shapes seen in the deeper layers:

Figure 2.3: Learned representation at various parts of a deep learning model
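To give a feel for what an edge feature in the lowest layers of such a model computes, the following sketch slides a small 3x3 kernel over a tiny image, which is the operation a convolutional layer performs. The image and the vertical-edge kernel are hand-designed assumptions for illustration; a trained network would learn its own kernels from data, and the function name `convolve2d` is hypothetical.

```python
# A tiny grayscale "image": dark (0) on the left, bright (1) on the right,
# so there is a single vertical edge between columns 2 and 3.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# A hand-designed vertical-edge kernel: it responds to a left-to-right
# increase in brightness and gives 0 on regions of uniform brightness.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve2d(img, k):
    """'Valid' 2D cross-correlation (no padding, stride 1).

    Deep learning frameworks typically compute cross-correlation in their
    "convolution" layers, so that is what this sketch implements.
    """
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[r + i][c + j] * k[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

for row in convolve2d(image, kernel):
    print(row)
```

The output is nonzero only where the kernel overlaps the edge. Deeper layers combine many such responses, which is how edges and gradients build up into face parts and whole faces.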

Since deep neural networks have become more accessible, various companies have started applying them. The following are some examples of companies that use ANNs:

  • Yelp: Yelp uses deep neural networks to process, classify, and label its images more efficiently. Since photos are an important aspect of Yelp reviews, the company places an emphasis on classifying and categorizing them.
  • Clarifai: This cloud-based company is able to classify images and videos using deep neural network-based models.
  • Enlitic: This company uses deep neural networks to analyze medical image data such as X-rays or MRIs, with the aim of increasing diagnostic accuracy and decreasing diagnostic time and cost.

Now that we understand the potential applications of ANNs, we can turn to the mathematics behind how they work. While ANNs may seem intimidating and complex, each one is built by sequentially combining a series of linear and nonlinear transformations, which are themselves simple to understand. The next section discusses the basic components and operations involved in linear transformations, which form the mathematical foundation of ANNs.
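As a minimal sketch of that idea, the following pure-Python forward pass alternates a linear step (a weighted sum plus a bias) with a nonlinear step. The choice of ReLU as the nonlinearity, the helper names, and the weights are all hypothetical, chosen only for illustration; libraries such as Keras perform the same kind of computation with optimized matrix operations.

```python
def linear(x, W, b):
    """Linear (affine) transformation: y[i] = sum_j W[i][j] * x[j] + b[i]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(v):
    """Element-wise nonlinearity: replaces negative entries with 0."""
    return [max(0.0, z) for z in v]

def forward(x, layers):
    """Apply each (W, b) layer in turn: a linear step, then ReLU.

    For simplicity this sketch applies ReLU after every layer, including the
    last; real networks usually pick the output activation to suit the task.
    """
    for W, b in layers:
        x = relu(linear(x, W, b))
    return x

# Hypothetical weights for a 2-input -> 3-hidden -> 1-output network.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0]),  # hidden layer
    ([[1.0, 1.0, 1.0]], [0.0]),                                  # output layer
]
print(forward([2.0, 1.0], layers))
```

For the input `[2.0, 1.0]`, the hidden layer produces `[1.0, 1.5, 0.0]` (the third unit's negative sum is zeroed by ReLU), and the output layer sums these to `[2.5]`. Without the `relu` calls, the two stacked linear steps would collapse into a single linear (affine) transformation, which is why the nonlinear steps are essential.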