- Modern Big Data Processing with Hadoop
- V. Naresh Kumar Prashant Shindgikar
- 2025-04-04 17:12:20
Hiding
In this approach, the data is considered too sensitive to reveal in full, even to its original owner. To protect its confidentiality, certain portions of the text are masked with a predefined character, such as X, so that only someone who already knows the complete value can recognize it from the pieces that remain visible.
Example: Credit card information is highly confidential and should never be revealed in full to anyone. If you have purchased anything online from websites such as Amazon, you will have seen that your full credit card number is not shown; only the last four digits are. As the genuine owner of the card, you can still easily identify it and continue with the transaction.
Similarly, when analysts need to see portions of the data, it's important to mask significant pieces of it so that they do not get the complete picture but can still use the data for their analysis.
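The last-four-digits display described here can be sketched in a few lines of Python. This is only an illustration, not how any particular website implements it; the function name `show_last_four` and the choice of `X` as the mask character are assumptions made for the example.

```python
def show_last_four(card_number: str, mask_char: str = "X") -> str:
    """Mask every digit except the last four, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    digits_seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            digits_seen += 1
            # Only the final four digits stay visible.
            is_visible = digits_seen > total_digits - 4
            masked.append(ch if is_visible else mask_char)
        else:
            # Spaces and other separators are copied unchanged.
            masked.append(ch)
    return "".join(masked)


print(show_last_four("4485 4769 3682 9843"))  # XXXX XXXX XXXX 9843
```

Counting digits rather than characters keeps the result independent of how the number is grouped or spaced.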
Let's see a few examples to understand this better:
| Data type | Input | Output | Network |
| --- | --- | --- | --- |
| Creditcard | 4485 4769 3682 9843 | 4485 XXXX XXXX 9843 | Visa |
| Creditcard | 5402 1324 5087 3314 | 5402 XXXX XXXX 3314 | Mastercard |
| Creditcard | 3772 951960 72673 | 3772 XXXXXX 72673 | American Express |
In the preceding examples, the numbers follow a predefined format and length, so a simple technique of masking digits at fixed positions works well.
Let's take another example: hiding portions of email addresses, which vary in both length and complexity. In this case, we have to use different techniques to hide characters so that the complete address is not revealed:
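As a minimal sketch of fixed-position masking, the following Python function keeps the first and last groups of a card number and masks everything in between. It assumes the number arrives as space-separated groups, as in the examples above; the function name `mask_middle_groups` is hypothetical.

```python
def mask_middle_groups(card_number: str, mask_char: str = "X") -> str:
    """Keep the first and last groups of digits; mask the groups in between."""
    groups = card_number.split()
    if len(groups) < 3:
        # Assumption: with fewer than three groups there is no "middle" to mask.
        raise ValueError("expected at least three space-separated groups")
    masked = (
        [groups[0]]
        + [mask_char * len(group) for group in groups[1:-1]]
        + [groups[-1]]
    )
    return " ".join(masked)


print(mask_middle_groups("4485 4769 3682 9843"))  # 4485 XXXX XXXX 9843
print(mask_middle_groups("3772 951960 72673"))    # 3772 XXXXXX 72673
```

Working on groups rather than absolute character offsets lets the same function handle both the 4-4-4-4 layout and the 4-6-5 layout without a per-network rule.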
| Data type | Input | Output | Method |
| --- | --- | --- | --- |
| Email | hello@world.com | h.l.o@w.r.d.com | Even Hide |
| Email | simple@book.com | .i.p.e@.o.k.c.m | Odd Hide |
| Email | something@something.com | s...th.ng@..me...com | Complex Hide |
The techniques can be as simple as:
- Even Hide: In this technique, we hide every character that is in an even position
- Odd Hide: In this technique, we hide every character that is in an odd position
- Complex Hide: In this technique, we use NLP to understand the data we are dealing with and then apply an algorithm that does not reveal enough information for an intelligent person to decode the original value
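The Even Hide and Odd Hide techniques can be sketched as follows. This is one possible reading, not a definitive implementation: positions are counted from 1, the part before `@` and the part after it are masked separately, and `.` is used as the mask character. The helper names `hide_positions`, `even_hide`, `odd_hide`, and `hide_email` are hypothetical. Because the whole domain is treated as one string, the `.com` suffix is masked as well, which matches the Odd Hide row of the table but not the Even Hide row, where `.com` is left intact. Complex Hide depends on the NLP model chosen and is not sketched here.

```python
from typing import Callable


def hide_positions(text: str, hide_even: bool, mask_char: str = ".") -> str:
    """Mask characters at even or odd positions, counting from 1."""
    masked = []
    for position, ch in enumerate(text, start=1):
        is_even = position % 2 == 0
        masked.append(mask_char if is_even == hide_even else ch)
    return "".join(masked)


def even_hide(text: str) -> str:
    """Hide the 2nd, 4th, 6th, ... characters."""
    return hide_positions(text, hide_even=True)


def odd_hide(text: str) -> str:
    """Hide the 1st, 3rd, 5th, ... characters."""
    return hide_positions(text, hide_even=False)


def hide_email(email: str, hider: Callable[[str], str]) -> str:
    """Apply a hiding function to the local part and the domain separately."""
    local, separator, domain = email.partition("@")
    return hider(local) + separator + hider(domain)


print(hide_email("simple@book.com", odd_hide))   # .i.p.e@.o.k.c.m
print(hide_email("hello@world.com", even_hide))  # h.l.o@w.r.d.c.m
```

Keeping the `@` separator visible preserves the overall shape of the address, so the masked value is still recognizable as an email.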