Reading the training dataset

The dataset is distributed as an Excel file, Cryotherapy.xlsx, which contains the data along with the data usage agreement text. I therefore copied only the data and saved it in a CSV file named Cryotherapy.csv. Let's start by creating a SparkSession, the gateway to Spark:

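import org.apache.spark.sql.SparkSession

// Create (or reuse) a local SparkSession that uses all available cores
val spark = SparkSession
      .builder
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "/temp")
      .appName("CryotherapyPrediction")
      .getOrCreate()

// Brings implicit conversions such as the $"col" syntax and toDF() into scope
import spark.implicits._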

Then let's read the training set and see a glimpse of it:

var CryotherapyDF = spark.read.option("header", "true")
              .option("inferSchema", "true")
              .csv("data/Cryotherapy.csv")

Let's check whether the preceding CSV reader picked up the header and the column types correctly:

CryotherapyDF.printSchema()

As the schema output shows, the schema of the Spark DataFrame has been correctly identified. Also, as expected, all the features that my ML algorithms will use are numeric (in other words, in integer or double format).
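The same check can also be done in code rather than by reading the printed schema. The following is a minimal sketch, not part of the original workflow; the helper name nonNumericColumns is hypothetical. It relies only on the DataFrame's schema and Spark's NumericType, the common parent of the integer and double types:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.NumericType

// Hypothetical helper: returns the names of all columns whose
// inferred type is not numeric (for example, strings or timestamps).
def nonNumericColumns(df: DataFrame): Seq[String] =
  df.schema.fields
    .filterNot(_.dataType.isInstanceOf[NumericType])
    .map(_.name)
    .toSeq
```

For example, calling nonNumericColumns(CryotherapyDF) should return an empty sequence if every column was inferred as an integer or a double; any name it does return points at a column that inferSchema could not parse as a number.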

A snapshot of the dataset can be seen using the show() method, which accepts the number of rows to display; here, let's use 5:

CryotherapyDF.show(5)

The output of the preceding line of code shows the first five samples of the DataFrame:
