Python programming language
This project needs to be completed. I believe the only thing missing is the CSV file. The Instructions/Guidelines are for the entire project and the other three files are what has been completed for the project.
Comments from Customer
My understanding is that the programming is complete. The files I uploaded should have all the screenshots of the programming. However, if the expert finds that programming is required to get the CSV file and have the assignment completed to the instructor's requirements, please let me know what the cost increase will be.
Data analysts frequently need to extract large amounts of data from websites and save them to local files for use in the analysis of a phenomenon that is under investigation. That often includes creating a copy of all web links in a page for later processing or automating maintenance tasks on a web site, such as checking links or validating HTML code.
For this project, you will use the Python programming language to scrape the web links from the HTML code of the “U.S. Census Bureau’s Population Estimates.”
Your submission must be your original work. No more than a combined total of 30% of a submission can be directly quoted or closely paraphrased from sources, even if cited correctly. Use the report provided when submitting your task as a guide.
You must use the rubric to direct the creation of your submission because it provides detailed criteria that will be used to evaluate your work. Each requirement below may be evaluated by more than one rubric aspect. The rubric aspect titles may contain hyperlinks to relevant portions of the course.
Submit one zipped folder that includes the code, input, and output files from the task. Place the responses to the task prompts in one PDF file.
Note: This assessment requires you to submit pictures, graphics, and/or diagrams. Each file must be an attachment no larger than 30 MB in size. Diagrams must be original and may be hand-drawn or drawn using a graphics program. Do not use CAD programs because attachments will be too large.
Develop a web links scraper program in Python that extracts all of the unique web links that point out to other web pages from the HTML code of the “Current Estimates” web link, both from the “US Census Bureau” website (see web link below) and outside that domain, and that populates them in a comma-separated values (CSV) file as absolute uniform resource identifiers (URIs).
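As a rough illustration of the extraction step described above, the following is a minimal sketch that uses only the standard library (`html.parser` and `urllib.parse.urljoin`). It is not the customer's completed program. The base URL, the `LinkCollector` class, the `extract_links` function, and the sample HTML are all hypothetical names and values chosen for the example; the real page address should be substituted for `BASE_URL`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical base URL; substitute the address of the page actually scraped.
BASE_URL = "https://www.example.gov/programs/estimates.html"


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # urljoin leaves absolute URLs unchanged and resolves
            # relative ones against the base URL.
            self.links.append(urljoin(self.base_url, href.strip()))


def extract_links(html, base_url=BASE_URL):
    """Return all anchor targets in the HTML text as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Small made-up sample standing in for the downloaded page.
SAMPLE_HTML = """
<a href="/data/tables.html">Tables</a>
<a href="about.html">About</a>
<a href="https://www.commerce.gov/">Commerce</a>
<a name="top">Anchor without href</a>
"""

links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

In this sketch the relative-to-absolute conversion happens in the single `urljoin` call, which is the kind of code segment the explanation prompts ask to be identified.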
- Explain how the Python program extracts the web links from the HTML code of the “Current Estimates,” found in the web links section.
- Explain the criteria you used to determine if a link is a locator to another HTML page. Identify the code segment that executes this action as part of your explanation.
- Explain how the program ensures that relative links are saved as absolute URIs in the output file. Identify the code segment that executes this action as part of your explanation.
- Explain how the program ensures that there are no duplicated links in the output file. Identify the code that executes this action as part of your explanation.
Note: Please consider weblinks that point to the same web pages as identical (e.g., www.commerce.gov and www.commerce.gov/).
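One possible way to apply the page-link criteria and the duplicate rule from the note is sketched below. It assumes the links have already been converted to absolute URLs. The function names (`is_html_page`, `normalize`, `unique_links`) and the list of excluded file extensions are illustrative assumptions, not requirements taken from the assignment, and the exact criteria for what counts as an HTML page remain a judgment call to be explained in the report.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of extensions that indicate a file rather than an HTML page.
NON_HTML_EXTENSIONS = (".pdf", ".csv", ".xlsx", ".zip", ".jpg", ".png")


def is_html_page(url):
    """Keep only http(s) links that do not end in a known file extension."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # drops mailto:, javascript:, tel:, and similar
    return not parts.path.lower().endswith(NON_HTML_EXTENSIONS)


def normalize(url):
    """Lowercase scheme and host, drop the fragment and trailing slashes."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def unique_links(urls):
    """Return normalized page links in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        if not is_html_page(url):
            continue
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


candidates = [
    "https://www.commerce.gov",
    "https://www.commerce.gov/",
    "https://WWW.commerce.gov/#top",
    "mailto:someone@example.gov",
    "https://www.example.gov/report.pdf",
    "https://www.example.gov/data/",
]
deduplicated = unique_links(candidates)
print(deduplicated)
```

Stripping the trailing slash before comparing is what makes the two forms of the commerce.gov address in the note count as the same link; a set records which normalized forms have already been kept.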
- Provide the Python code you wrote to extract all the unique web links that point out to other HTML pages from the HTML code of the “Current Estimates” (in the web links section).
- Provide the HTML code of the “Current Estimates” web page scraped at the time when the scraper was run and the CSV file was generated.
- Provide the CSV file that your script created.
- Run your script and provide a screenshot of the successfully executed results.
- Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.
- Demonstrate professional communication in the content and presentation of your submission.
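For the CSV deliverable listed above, a minimal sketch of the output step with the standard `csv` module might look like the following. The `write_links` function, the `link` header name, and the `links.csv` file name in the comment are assumptions made for illustration; the assignment text does not prescribe them.

```python
import csv
import io


def write_links(links, handle):
    """Write one link per row to an open text file or file-like object."""
    writer = csv.writer(handle)
    writer.writerow(["link"])  # header name is an assumption
    for link in links:
        writer.writerow([link])


# Demonstration with an in-memory buffer and made-up links.
example_links = [
    "https://www.commerce.gov",
    "https://www.example.gov/data",
]
buffer = io.StringIO()
write_links(example_links, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To create a real file instead (file name is hypothetical):
# with open("links.csv", "w", newline="", encoding="utf-8") as f:
#     write_links(example_links, f)
```

Opening the file with `newline=""` follows the `csv` module documentation and avoids blank lines between rows on Windows.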
File name may contain only letters, numbers, spaces, and these symbols: ! – _ . * ‘ ( )
File size limit: 200 MB
File types allowed: doc, docx, rtf, xls, xlsx, ppt, pptx, odt, pdf, txt, qt, mov, mpg, avi, mp3, wav, mp4, wma, flv, asf, mpeg, wmv, m4v, svg, tif, tiff, jpeg, jpg, gif, png, zip, rar, tar, 7z