- Modern Big Data Processing with Hadoop
- V. Naresh Kumar, Prashant Shindgikar
- 2025-04-04 17:12:20
Erasing
As the name suggests, erasing causes data loss when applied to the input data. Whether we apply this technique depends on the significance of the data we are dealing with. A typical example is setting a NULL value for every record in a column. Since NULL data cannot be used to infer anything meaningful, this technique helps ensure that confidential data is not sent to the other phases of data processing.
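A minimal sketch of column erasing in plain Python, assuming records are held as dictionaries. The function name `erase_columns`, the field names, and the second sample record are hypothetical and chosen only for illustration; Python's `None` stands in for NULL.

```python
from typing import Any, Dict, Iterable, List


def erase_columns(
    records: Iterable[Dict[str, Any]], columns: Iterable[str]
) -> List[Dict[str, Any]]:
    """Return copies of the records with the named columns set to None (NULL)."""
    to_erase = set(columns)
    return [
        # Keep every key so the record shape is unchanged; only the value is lost.
        {key: (None if key in to_erase else value) for key, value in record.items()}
        for record in records
    ]


# Hypothetical sample data (not taken from a real dataset).
employees = [
    {"name": "Ravi", "salary_inr": 1000, "department": "Sales"},
    {"name": "Asha", "salary_inr": 1200, "department": "Support"},
]

erased = erase_columns(employees, ["name", "salary_inr"])
for row in erased:
    print(row)
```

The sketch returns new dictionaries instead of modifying the input, so the original records stay available to any step that is allowed to see them, while only the erased copies are passed downstream.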
Let's take a few examples of erasing:
| Input Data | Output Data | What's erased |
| --- | --- | --- |
| Ravi earns 1000 INR per month | NULL earns NULL per month | Salary and name |
| Ravi's mobile number is 0123456789 | NULL mobile number is NULL | Mobile number and name |
From the examples, you might be wondering why we nullify these values. This technique is useful when we are not interested in the personally identifiable information (PII) itself, only in a summary such as how many salary records or mobile number records there are in our database or input.
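The following sketch illustrates that record counts survive erasing. It uses Python's built-in `sqlite3` module as a stand-in for whatever SQL engine holds the data; the `contacts` table, its columns, and the second row are hypothetical. One detail worth keeping in mind: in SQL, `COUNT(column)` skips NULL values, so after erasing, the summary should count rows with `COUNT(*)`.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [("Ravi", "0123456789"), ("Asha", "0987654321")],  # hypothetical rows
)

# Erase the PII: every record in both columns becomes NULL.
conn.execute("UPDATE contacts SET name = NULL, mobile = NULL")

# The number of records is still available after erasing.
row_count = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]

# COUNT(column) ignores NULLs, so it no longer reflects the record count.
non_null_mobiles = conn.execute("SELECT COUNT(mobile) FROM contacts").fetchone()[0]

print(row_count)         # 2
print(non_null_mobiles)  # 0
conn.close()
```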
This concept can be extended to other use cases as well.