Hiding

In this approach, the data is considered too sensitive to reveal in full, even to its original owner. To protect its confidentiality, certain portions of the text are masked with a predefined character, such as X, so that only a person who already knows the complete value can recognize it from the visible portions.

Example: Credit card information is highly confidential and should never be revealed in full to anyone. If you have purchased anything online, on a website such as Amazon, you will have seen that your full credit card number is not displayed; only the last four digits are shown. As the genuine owner of the card, you can easily identify it from those digits and continue with the transaction.

Similarly, when analysts need to see portions of the data, it's important to mask significant pieces of it so that they do not get the complete picture but can still use the data for their analysis.

Let's see a few examples to understand this better:

Data type  | Input               | Output              | Network
Creditcard | 4485 4769 3682 9843 | 4485 XXXX XXXX 9843 | Visa
Creditcard | 5402 1324 5087 3314 | 5402 XXXX XXXX 3314 | Mastercard
Creditcard | 3772 951960 72673   | 3772 XXXXXX 72673   | American Express

In the preceding examples, the numbers follow a predefined format and length, so a simple technique of masking digits at fixed positions works well.
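The fixed-position masking shown in the table can be sketched in a few lines of Python. This is only an illustration, not a production routine: the function name `mask_card` and its `mask_char` parameter are hypothetical, and the sketch assumes the number arrives as space-separated digit groups, as in the table, with the first and last groups left visible.

```python
def mask_card(number, mask_char="X"):
    """Mask every digit group except the first and the last.

    Assumption: the card number is written as space-separated groups,
    for example "4485 4769 3682 9843" or "3772 951960 72673".
    """
    groups = number.split()
    if len(groups) < 3:
        raise ValueError("expected at least three space-separated groups")
    # Replace each middle group with mask characters of the same length.
    masked_middle = [mask_char * len(group) for group in groups[1:-1]]
    return " ".join([groups[0], *masked_middle, groups[-1]])


print(mask_card("4485 4769 3682 9843"))  # 4485 XXXX XXXX 9843
print(mask_card("3772 951960 72673"))    # 3772 XXXXXX 72673
```

Because the mask keeps the length of each group, the output retains the shape of the original number, which is what lets the card owner recognize it.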

Let's take another example: hiding portions of email addresses, which vary in both length and complexity. In this case, we have to use different techniques to hide characters so that the complete information is not revealed:

Data type | Input                   | Output               | Method
Email     | hello@world.com         | h.l.o@w.r.d.com      | Even Hide
Email     | simple@book.com         | .i.p.e@.o.k.c.m      | Odd Hide
Email     | something@something.com | s...th.ng@..me...com | Complex Hide

The techniques can be as simple as:

  • Even Hide: In this technique, we hide every character that is in an even position
  • Odd Hide: In this technique, we hide every character that is in an odd position
  • Complex Hide: In this technique, we use NLP to understand the data we are dealing with and then apply an algorithm that does not reveal enough information for an intelligent person to decode it
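Even Hide and Odd Hide are simple enough to sketch in Python. The following is a minimal illustration under stated assumptions, not a definitive implementation: the names `mask_positions` and `hide_email` are hypothetical, positions are counted from 1, and the part before and after the @ sign are masked separately so that the @ itself stays visible. The two example rows for these methods differ in whether the trailing ".com" is masked, so the sketch exposes that choice as a hypothetical `keep_suffix` flag. Complex Hide is not covered, because it depends on an understanding of the data that the text does not specify.

```python
def mask_positions(text, hide_even, mask_char="."):
    """Mask the characters at even (or odd) 1-based positions."""
    out = []
    for pos, ch in enumerate(text, start=1):
        is_even = pos % 2 == 0
        # Hide the character when its parity matches the requested one.
        out.append(mask_char if is_even == hide_even else ch)
    return "".join(out)


def hide_email(email, hide_even, keep_suffix=False, mask_char="."):
    """Apply Even Hide or Odd Hide to each side of the @ sign."""
    if "@" not in email:
        return mask_positions(email, hide_even, mask_char)
    local, _, domain = email.partition("@")
    suffix = ""
    if keep_suffix and "." in domain:
        # Assumption: the last dot-separated label (e.g. ".com") stays visible.
        domain, dot, last_label = domain.rpartition(".")
        suffix = dot + last_label
    return (
        mask_positions(local, hide_even, mask_char)
        + "@"
        + mask_positions(domain, hide_even, mask_char)
        + suffix
    )


# Even Hide, leaving ".com" visible
print(hide_email("hello@world.com", hide_even=True, keep_suffix=True))  # h.l.o@w.r.d.com
# Odd Hide, masking the whole domain
print(hide_email("simple@book.com", hide_even=False))  # .i.p.e@.o.k.c.m
```

Note that a positional mask like this still leaks the length of the address and half of its characters, which is why short or predictable addresses may need a stronger scheme.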