Cleaning and Profiling Data – Java

Cleaning and Profiling Code
Use only Hadoop MapReduce in this part of your project.
Do not use anything else.
You must write and submit 2 separate MapReduce jobs:

MR Job 1.
Data profiling – to explore your data
– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java
(Please use these exact names for your classes)
– This MR job counts the number of records in a dataset
– Run it on the original dataset, before cleaning, and output the number of records
– Run it on the cleaned dataset (the output of MR Job 2, described below) and output the number of records
– If the two record counts don't match, figure out why
– Re-submit a schema if it has changed.
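
A minimal, Hadoop-free sketch of the CountRecs data flow follows. It is a plain-Java simulation, not the Hadoop classes themselves. mapLine() plays the role of CountRecsMapper.map(), reduce() plays the role of CountRecsReducer.reduce(), and run() stands in for Hadoop's shuffle. The class name CountRecsSketch and the constant key "records" are hypothetical. The key point is the pattern: emit one (key, 1) pair per input line under a single constant key, then sum the 1s in the reducer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hadoop-free sketch of the CountRecs data flow: map, shuffle, reduce.
 * mapLine() plays the role of CountRecsMapper.map(), reduce() the role of
 * CountRecsReducer.reduce(), and run() stands in for Hadoop's shuffle.
 */
public class CountRecsSketch {

    // Hypothetical constant key: every record goes to the same reduce group.
    static final String KEY = "records";

    // Map step: each input line, whatever its content, becomes one (KEY, 1) pair.
    public static Map.Entry<String, Integer> mapLine(String line) {
        return Map.entry(KEY, 1);
    }

    // Reduce step: add up all the 1s emitted under one key.
    public static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Group mapper output by key (Hadoop's shuffle does this), then reduce each group.
    public static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Integer> kv = mapLine(line);
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // The header row is counted like any other line.
        List<String> lines = List.of("id,name,city", "1,Ann,NYC", "2,Bob,LA");
        System.out.println(run(lines)); // prints {records=3}
    }
}
```

In the actual job, CountRecsMapper would extend org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> and call context.write(new Text("records"), new IntWritable(1)) for each line. CountRecsReducer would extend Reducer<Text, IntWritable, Text, IntWritable> and write the sum. Note that a header row is counted as a record. If the cleaning job drops it, the cleaned count will be one lower even though no data rows were lost.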

MR Job 2.
Data cleaning – to avoid nasty exceptions later on in your analytic
– Name the files: Clean.java, CleanMapper.java, CleanReducer.java
(Please use these exact names for your classes)
– This MR job cleans the data – for example, by dropping columns you don’t need.
– It should write out a new file with only the columns you will use in your analytic.
– The selected columns form your data schema
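
The following is a minimal sketch of the per-record logic a CleanMapper could delegate to. It is kept free of Hadoop types so it can be tested locally. It makes several hypothetical assumptions that you should adapt to your own dataset:

- the input is plain comma-separated text with no quoted commas;
- every valid row has EXPECTED_FIELDS columns;
- KEEP lists the zero-based indices of the columns your analytic uses.

The class name CleanRecordSketch is also hypothetical. A row is dropped if it is blank, has the wrong number of columns, or is missing a value in a kept column.

```java
import java.util.StringJoiner;

/**
 * Hadoop-free sketch of the per-record logic a CleanMapper could call.
 * ASSUMPTIONS (hypothetical, adapt to your data): plain comma-separated
 * lines with no quoted commas, EXPECTED_FIELDS columns per row, and KEEP
 * holding the zero-based indices of the columns your analytic uses.
 */
public class CleanRecordSketch {

    static final int EXPECTED_FIELDS = 5;   // hypothetical column count
    static final int[] KEEP = {0, 2, 4};    // hypothetical columns to keep

    /** Returns the cleaned line, or null if the record should be dropped. */
    public static String clean(String line) {
        if (line == null || line.isBlank()) {
            return null;                    // drop empty lines
        }
        // Limit -1 keeps trailing empty fields, so "a,b,,," still has 5 fields.
        String[] fields = line.split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return null;                    // drop rows with the wrong column count
        }
        StringJoiner out = new StringJoiner(",");
        for (int i : KEEP) {
            String value = fields[i].trim();
            if (value.isEmpty()) {
                return null;                // drop rows missing a value the analytic needs
            }
            out.add(value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] sample = {"1,Ann,NYC,x,42", "2,Bob,,y,17", "3,Cy,LA", ""};
        for (String line : sample) {
            System.out.println("[" + line + "] -> " + clean(line));
        }
    }
}
```

In the Hadoop job, CleanMapper could extend Mapper<LongWritable, Text, NullWritable, Text> and call clean(value.toString()). When the result is non-null, it would write it with context.write(NullWritable.get(), new Text(result)). Because TextOutputFormat omits NullWritable keys, the output file contains only the selected columns. No aggregation is needed, so CleanReducer can simply write each value it receives back out. Every dropped row is one reason the before and after CountRecs totals can differ. A header row with the right number of columns would pass this check, so handle it explicitly if your analytic should not see it.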
FOR FULL CREDIT, PROVIDE THE CLASSES FOR EACH JOB
