书名：Machine Learning Projects for Mobile Applications
作者名：Karthikeyan NG
本章字数：1017字
更新时间：2021-06-10 19:41:37

Linear regression - supervised learning

Let's look at one simple example of linear regression with its implementation on TensorFlow.

Let's predict the price of a house, looking at the prices of other houses in the same area along with its size information:

We have a house that costs $82,000 and another one at $55,000. Now, our task is to find the price of the third house. We know the sizes of all of the houses along with the prices, and we can map this data in a graph. Let's find out the price of the third house based on the other two data points that we have:

You may wonder how to draw the line now. Draw a random line close to all the points marked on the graph. Now, calculate the distance from each point to the line and add them together. The result of this is the error value. Our algorithm should move toward minimizing the error, because the best fit line has a lower error value. This procedure is known as a gradient descent.

The prices of all the houses in the particular locality are mapped on the graph accordingly. Now, let's plot the values of the two houses that we already know:

After that, let's draw one line that passes closer to most of the values. The line fits the data perfectly. From this, we should be able to identify the price of house number three:

Based on the size of the data provided for home number three, we can map the size of the data provided for the home number on the graph. This will allow us to figure out the connecting point on the line drawn through all points. That maps to $98,300 on the y axis. This is known as linear regression:

Let's try to translate our problem into the form of a pseudocode:

def estimate_house_price(sqft, location): 
 price = 0 
 #In my area, the average house costs 2000 per sq.ft 
 price_per_sqft = 2000 
 if location == "vegas": 
     #but some areas cost a bit more 
     price_per_sqft = 5000 
 elif location == "newyork": 
     #and some areas cost less 
     price_per_sqft = 4000 
 #start with a base price estimate based on how big the place is 
 price = price_per_sqft * sqft 
 return price

This will be your typical method for estimating the house price. We can keep adding more and more condition checks to this, but it can't be controlled beyond the point at which the number of locations or the parameter increases. For a typical house price estimation, there are a lot of other factors also considered, such as the number of rooms, locations in the vicinity, schools, gas stations, hospitals, water level, transport, and so on. We can generalize this function into something very simple and guess the right answer:

def estimate_house_price(sqft, location): 
 price = < DO MAGIC HERE >
 return price

How can we identify a line that fits perfectly without writing condition checks? Typically, the linear regression line is represented in the following form:

Y = XW +b

In our example, let's put this into a more simple form in order to gain a better understanding:

prediction = X * Weight +bias

Weight is the slope of the line and bias is the intercept (the value of Y when X = 0). After constructing the linear model, we need to identify the gradient descent. The cost function identifies the mean squared error to bring out the gradient descent:

Let's represent the cost function through pseudocode to solve our problem of estimating the house price:

def estimate_house_price(sqft, location):
 price = 0
 #and this
 price += sqft * 235.43
 #maybe this too
 price += location * 643.34
 #adding a little bit of salt for a perfect result 
 price += 191.23
 return price

The values 235.43, 643.34, and 191.23 may look random, but with these values we are able to discover the estimation of any new house. How do we arrive at this value? We should do an iteration to arrive at the right value while reducing our error in the right direction:

def estimate_house_price(sqft, location):
 price = 0
 #and this
 price += sqft * 1.0
 #maybe this too
 price += location * 1.0
 #adding a little bit of salt for a perfect result
 price += 1.0
 return price

So, we will start our iterations from 1.0 and start minimizing errors in the right direction. Let's put this into code using TensorFlow. We will look at the methods that are used in more depth later on:

#import all the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy

#Random number generator
randnumgen = numpy.random

#The values that we have plotted on the graph
values_X = 
  numpy.asarray([1,2,3,4,5.5,6.75,7.2,8,3.5,4.65,5,1.5,4.32,1.65,6.08])
values_Y = 
 numpy.asarray([50,60,65,78,89,104,111,122,71,85,79,56,81.8,55.5,98.3])

# Parameters
learning_rate = 0.01
training_steps = 1000
iterations = values_X.shape[0]

# tf float points - graph inputs
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set the weight and bias
W = tf.Variable(randnumgen.randn(), name="weight")
b = tf.Variable(randnumgen.randn(), name="bias")

# Linear model construction
# y = xw + b
prediction = tf.add(tf.multiply(X, W), b)

#The cost method helps to minimize error for gradient descent. 
#This is called mean squared error.
cost = tf.reduce_sum(tf.pow(prediction-Y, 2))/(2*iterations)

# In TensorFlow, minimize() method knows how to optimize the values for # weight & bias. 
optimizer = 
   tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

#assigning default values 
init = tf.global_variables_initializer()

#We can start the training now
with tf.Session() as sess:

    # Run the initializer. We will see more in detail with later       
    #chapters
     sess.run(init)

    # Fit all training data
     for step in range(training_steps):
         for (x, y) in zip(values_X, values_Y):
             sess.run(optimizer, feed_dict={X: x, Y: y})
             c = sess.run(cost, feed_dict={X: values_X, Y:values_Y})
             print("Step:", '%04d' % (step+1), "cost=", "  
                             {:.4f}".format(c), \
                             "W=", sess.run(W), "b=", sess.run(b))

     print("Successfully completed!")
     # with this we can identify the values of Weight & bias
     training_cost = sess.run(cost, feed_dict={X: values_X, Y: 
                                               values_Y})
     print("Training cost=", training_cost, "Weight=", sess.run(W), 
           "bias=", sess.run(b))

     # Lets plot all the values on the graph
     plt.plot(values_X, values_Y, 'ro', label='house price points')
     plt.plot(values_X, sess.run(W) * values_X + sess.run(b), 
                                       label='Line Fitting')
     plt.legend()
     plt.show()

You can find the same code from our GitHub repository (https://github.com/PacktPublishing/Machine-Learning-Projects-for-Mobile-Applications) under Chapter01.