Cleaning and Profiling Data – Java
Cleaning and Profiling Code
Use only Hadoop MapReduce in this part of your project.
Do not use anything else.
You must write and submit 2 separate MapReduce jobs:
MR Job 1. Data profiling – to explore your data
– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java
(Please use these exact names for your classes)
– This MR job counts the number of records in a dataset
– Run it on the original dataset, before cleaning, and output the number of records
– Run it on the cleaned dataset (the result of MR Job 2, described below) and output the number of records
– If the two record counts don’t match, figure out why
– Re-submit a schema if it has changed.
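The counting logic behind MR Job 1 can be sketched in plain Java, without any Hadoop classes, to show what the map and reduce steps compute. This is only an illustration under assumptions: the class name CountRecsSketch, the key string "Total records", and the sample lines are hypothetical and not part of the assignment. In the submitted job, the map step would presumably live in CountRecsMapper, the sum in CountRecsReducer, and the job setup in CountRecs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the counting logic only; no Hadoop classes are used.
class CountRecsSketch {
    // Hypothetical constant key, so every record falls into a single reduce group.
    static final String KEY = "Total records";

    // Map step: emit (KEY, 1) for each input line, whatever the line contains.
    static Map.Entry<String, Long> map(String line) {
        return new SimpleEntry<>(KEY, 1L);
    }

    // Reduce step: add up all the 1s that share the same key.
    static long reduce(String key, List<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Simulates one map call per line followed by a single reduce call.
    static long countRecords(List<String> lines) {
        List<Long> ones = new ArrayList<>();
        for (String line : lines) {
            ones.add(map(line).getValue());
        }
        return reduce(KEY, ones);
    }

    public static void main(String[] args) {
        // Made-up sample data: a header line plus two data rows.
        List<String> sample = Arrays.asList("id,name,age", "1,Ann,30", "2,Bob,41");
        System.out.println(KEY + "\t" + countRecords(sample));
    }
}
```

Note that this sketch counts every line, including a header row if the file has one. That is one possible reason the counts before and after cleaning could differ.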
MR Job 2. Data cleaning – to avoid nasty exceptions later on in your analytic
– Name the files: Clean.java, CleanMapper.java, CleanReducer.java
(Please use these exact names for your classes)
– This MR job cleans the data – for example, by dropping columns you don’t need.
– It should write out a new file with only the columns you will use in your analytic.
– Provide the selected columns as your data schema.
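The per-record work of the cleaning job can be sketched as a plain-Java helper that a CleanMapper might call for each input line. Everything specific here is an assumption for illustration: the class name CleanSketch, the comma delimiter, the column indexes, and the choice to drop rows with the wrong number of fields or an empty kept value. The naive split also assumes that no field contains a quoted comma.

```java
import java.util.Optional;

// Plain-Java sketch of per-line cleaning; no Hadoop classes are used.
class CleanSketch {
    /**
     * Keeps only the columns listed in 'keep' (0-based indexes, assumed valid).
     * Returns Optional.empty() when the row should be dropped.
     */
    static Optional<String> cleanLine(String line, int[] keep, int expectedColumns) {
        // A limit of -1 preserves trailing empty fields, so the length check is reliable.
        String[] fields = line.split(",", -1);
        if (fields.length != expectedColumns) {
            return Optional.empty(); // assumption: malformed rows are dropped
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < keep.length; i++) {
            String value = fields[keep[i]].trim();
            if (value.isEmpty()) {
                return Optional.empty(); // assumption: rows missing a kept value are dropped
            }
            if (i > 0) {
                out.append(',');
            }
            out.append(value);
        }
        return Optional.of(out.toString());
    }

    public static void main(String[] args) {
        int[] keep = {0, 2}; // hypothetical: keep the first and third columns
        String[] sample = {"1,Ann,30,NY", "2,Bob,,LA", "broken row"};
        for (String line : sample) {
            // Print only the rows that survive cleaning.
            cleanLine(line, keep, 4).ifPresent(System.out::println);
        }
    }
}
```

If rows are dropped this way, the record count on the cleaned dataset will be lower than on the original, which is the kind of mismatch MR Job 1 is meant to surface.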
FOR FULL CREDIT, PROVIDE THE CLASSES FOR EACH JOB