Final Project Update

Ethan Ash, Margo Lewis, Kenton Young, Jenna Young

So far, we have scraped the data from the zoobooks and put it into a txt file. We have a file with all the high school names and are currently adding towns and states to it. We have gathered sources such as databases of public and private schools, which we will check our school names against. We have also gathered sources on patterns in college admissions over the period that zoobooks have existed. We hope this research will help us interpret the patterns we find in the zoobook data. We have talked to Austin about our project, and he thinks we are on the right track.
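A minimal sketch of how checking our school names against those databases could work. It is an illustration, not our actual code: the function names, the similarity cutoff of 0.85, and the idea of loading the database names into a plain list are all assumptions.

```python
import difflib


def normalize(name):
    """Lowercase a school name and collapse whitespace so small differences do not block a match."""
    return " ".join(name.lower().split())


def match_school(name, known_schools, cutoff=0.85):
    """Return the closest known school name, or None if nothing is similar enough.

    known_schools is assumed to be a list of names taken from a school database;
    cutoff is a hypothetical similarity threshold that would need tuning.
    """
    lookup = {normalize(s): s for s in known_schools}
    hits = difflib.get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```

Calling `match_school` on each scraped name would give either a database name to take the town and state from, or `None` for names that need to be checked by hand.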

Cleaning the data has been more difficult than we expected. The PDF reader did not format the data the way we wanted, so we have had to go through and clean it further. We are trying to use regular expressions to remove the information we don’t want from the txt file. We haven’t had to change our initial plan; the data cleaning process is just taking longer than we originally thought.
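A minimal sketch of the kind of regular-expression cleanup described above. The patterns are illustrative assumptions (lines that hold only a page number, stray symbols, repeated spaces), not the exact rules we are using, and the real zoobook text may need different ones.

```python
import re

# Hypothetical patterns; adjust them to what the PDF reader actually produces.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")        # a line holding only a page number
NON_TEXT = re.compile(r"[^A-Za-z0-9 .,'&()-]")  # anything outside plain letters, digits, and common punctuation
EXTRA_SPACE = re.compile(r"\s+")                # runs of whitespace


def clean_lines(lines):
    """Return cleaned, non-empty lines with page numbers and stray symbols removed."""
    cleaned = []
    for line in lines:
        if PAGE_NUMBER.match(line):
            continue  # skip page-number lines entirely
        line = NON_TEXT.sub(" ", line)             # replace stray symbols with a space
        line = EXTRA_SPACE.sub(" ", line).strip()  # collapse whitespace and trim the ends
        if line:
            cleaned.append(line)
    return cleaned
```

Note that `NON_TEXT` as written would also strip accented letters, so names containing them would need a wider character class.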

We plan to use ArcGIS by adding our public and private school data as a layer once we have finished cleaning it. We also plan to create a heatmap in ArcGIS. The project is still on track, and we are heading in a good direction. We plan to finish the scraping and data cleaning by Tuesday of next week. Ethan has been leading the scraping and cleaning of the data. Jenna, Margo, and Kenton have assisted with the cleaning and with finding sources to use. They have also been researching the capabilities of ArcGIS to plan how we will use it to present the data.

Sources:

Carleton College Archives. (1991). Carleton College Zoobooks. Carleton Digital Collections. https://contentdm.carleton.edu/digital/collection/Zoobooks

National Center for Education Statistics. (1965). Search for Private Schools. https://nces.ed.gov/surveys/pss/privateschoolsearch/index.asp

National Center for Education Statistics. (1965a). Search for Public Schools. https://nces.ed.gov/ccd/schoolsearch/

5 thoughts on “Final Project Update”

  1. Interesting project! I’m excited to see what the end results will be. Scraping data from the zoobooks may prove to be a challenge, especially since the images are scanned and the software output can be messy. I can definitely relate to this issue, as my team has faced a similar problem and had to resort to manually entering data into an Excel sheet.

  2. Our group also looked into Carleton’s Digital Archives for data and found that cleaning the variedly formatted files could be a little tricky. I feel like compiling the data into an organized structure is always the hardest part, and once we get into visualization things would become more straightforward. Also, the idea of creating a heat map sounds very interesting! As a viewer, I would definitely appreciate an easily interpretable visual that clearly lays out an amount of geographic and demographic information. By the way, I never knew zoobooks were a thing, so I cannot wait to see what your project reveals about Carleton’s student body and its evolution over time!

  3. This is an interesting idea for the project and I am curious to see what results you will find. I am curious if you are doing only the zoobooks for those 3 years, or will there be more? Is there a reason why you chose those years?

  4. I think that your group’s project seems really cool! I am interested to see what kind of patterns you will be able to see once your heat map is completed. In addition, I wonder what the heat map will look like in the end and if it will vary from a standard one. Will you guys be using other software to make any other data visualizations?

  5. Wow, it sounds like an interesting project but also a time-consuming one. Gathering data can be hard, especially with the issue you guys have with the pdf reader not doing a great job of formatting the data how you guys wanted it to be. Our group also faced a similar issue where the pdf file didn’t convert into a CSV file where we could clean the data up. Unfortunately, we have to either write code to extract the data we need or manually do it ourselves, which kinda takes the fun out of our project. Overall, I feel like everyone has faced some type of issue or challenge going into this project, but in the end, I feel like it will be worth it and the end results will come out great.
