- Modern Big Data Processing with Hadoop
- V. Naresh Kumar, Prashant Shindgikar
- 2025-04-04 17:12:20
Data standardization
Once information extraction is complete and any necessary cleanup is done, we need to decide how to save the outcome of this process. Typically, a simple CSV (comma-separated values) format works well for this data. If the output has a more complicated structure, such as nested or hierarchical records, we can choose XML (Extensible Markup Language) or JSON (JavaScript Object Notation) instead.
These formats are well-established standards, and almost every technology in use today can read and write them easily. To keep things simple at first, though, it is good to start with the CSV format.
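To make the choice concrete, here is a minimal sketch that saves the same cleaned records as both CSV and JSON using only the Python standard library. The `records` list, its field names, and the helper functions `to_csv` and `to_json` are hypothetical and exist only for illustration; real extracted data will look different.

```python
import csv
import io
import json

# Hypothetical records left over after extraction and cleanup.
records = [
    {"id": 1, "name": "Alice", "city": "Pune"},
    {"id": 2, "name": "Bob", "city": "Austin, TX"},
]


def to_csv(rows):
    """Serialize a list of flat dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    # Values that contain a comma are quoted automatically by the csv module.
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows to JSON, which can also hold nested values."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)

print(csv_text)
print(json_text)
```

CSV stays compact and readable as long as every record has the same flat set of fields. Once a record needs to carry a list or another record inside it, JSON (or XML) is usually the easier fit.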