Feature engineering and extraction/selection

Consider a situation where an employee at an analytics firm is given the company's billing data and is asked by their manager to build a machine learning system with it so that the company's overall financial budget can be optimized. This data is not in a format that can be fed directly to an ML model, since ML models expect their input in the form of numeric vectors.

Even if the data is in good shape, the employee will still have to convert it into a suitable form. Given that the data is already wrangled, they still need to decide which features they are going to include in the final dataset. Practically anything measurable can be a feature here. This is where good domain knowledge comes in: it can help the employee choose the features that have high predictive power. It may sound lightweight, but it requires a lot of skill and is definitely a challenging task. This is a classic example of feature engineering.
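As a sketch of what this might look like in practice, the following Python snippet derives a few model-ready numeric features from a toy billing table. The column names (invoice_date, amount, department) and the derived features are illustrative assumptions, not taken from any real billing dataset:

import numpy as np
import pandas as pd

# Hypothetical billing records; columns and values are made up for illustration.
billing = pd.DataFrame({
    "invoice_date": pd.to_datetime(["2023-01-05", "2023-02-17", "2023-03-02"]),
    "amount": [1200.0, 450.0, 980.0],
    "department": ["sales", "engineering", "sales"],
})

# Derive numeric features that an ML model can consume.
features = pd.DataFrame({
    "month": billing["invoice_date"].dt.month,    # captures seasonality
    "log_amount": np.log1p(billing["amount"]),    # compresses skewed amounts
})

# One-hot encode the categorical column into numeric indicator features.
features = features.join(pd.get_dummies(billing["department"], prefix="dept"))
print(features)

Which features to derive here (a seasonality signal, a log-scaled amount, department indicators) is exactly the kind of decision that calls for domain knowledge.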

Sometimes, we employ techniques that automatically pick out the most meaningful features from a given dataset. This is particularly useful when the data is very high-dimensional and the features are hard to interpret. This is known as feature selection. Feature selection not only lets us develop an ML model from the most relevant features, but it also helps to enhance the model's predictive performance and to reduce its computation time.
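A minimal sketch of feature selection with scikit-learn, using a synthetic regression dataset as a stand-in for the billing data; SelectKBest with the f_regression score keeps the k features most strongly related to the target:

from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: 100 samples, 20 candidate features,
# only 5 of which actually carry signal about the target.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       random_state=0)

# Keep the 5 features with the strongest univariate relationship to the target.
selector = SelectKBest(score_func=f_regression, k=5)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (100, 5)
print(selector.get_support(indices=True))  # indices of the retained columns

Note that the selected columns are a subset of the original features, untouched; nothing new is computed from them.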

Apart from feature selection, we might want to reduce the dimensionality of the data in order to visualize it better. Dimensionality reduction is also employed to capture a representative set of features from the complete feature set. Principal Component Analysis (PCA) is one very popular dimensionality reduction technique.
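A minimal PCA sketch with scikit-learn, using the bundled Iris dataset purely as stand-in data; it projects four original features down to two components suitable for a 2D plot:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                  # 150 samples, 4 original features

# Project the data onto its 2 leading principal components
# so it can be visualized in two dimensions.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component

Unlike feature selection, each of the two resulting columns is a linear combination of all four original features rather than one of them.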

It is important to keep in mind that feature selection and dimensionality reduction are not the same: feature selection retains a subset of the original features as they are, whereas dimensionality reduction techniques such as PCA transform the original features into a smaller set of new, derived ones.