How can one summarize information and data and convey their meaning to others? What is an effective data visualization? What is an ineffective or dishonest one? And, for that matter, what is data? This course will explore these questions by introducing students to the broad field of information visualization. Students will learn about different types of visualizations that may be used to explore variation and covariation, to show the evolution of processes through time and space, and to represent parts of a whole. Much of the work of this course will be carried out using computers and the R programming language, but we will also explore non-computational approaches to visualization.
Students will develop skills in data collection, data cleaning, and creating different types of data visualizations (e.g. bar charts, scatter plots, density plots, heat maps, violin plots, time series, and interactive graphics) and effective data communication while working on problems and case studies inspired by and based on real-world questions. We will also critique and reflect upon data visualizations in our daily lives. Students will also gain familiarity with descriptive statistics and ways to organize and summarize categorical and numerical data to pick out key information.
Topic Intro Tuesday and Wednesday, Lab Friday
Topic Intro - 9:00-10:25
Lab - 9:00-10:25
Help sessions with Shea: Monday 5:00-7:00; Tuesday 2:30-4:00; Thursday 4:15-5:15; and Friday 1:00-2:30. Help sessions with Laurie: Wednesday 1:30-3:00 in GIS Lab. Help sessions with Joshua: Wednesday 4:00-5:30 in GIS Lab.
This course is designed to serve as an introduction to programming in R. Students will learn to gain insight from data, to use literate programming and version control so that these insights are reproducible by others, and to develop code collaboratively. Students who successfully complete this course will be able to work with large data sets, transform those data, and implement effective visualizations. Throughout the course we will be using GitHub, ggplot2, R Markdown, gganimate, Shiny, and the tidyverse packages for data manipulation and visualization. This course is intended to appeal to a wide range of students. The skills and habits of mind taught in this course are applicable not only in the sciences and social sciences, but in almost all fields. Evaluation will be based on several short homework and lab assignments, participation in in-class activities, and a final project.
The course content is organized in four units:
Unit 1 - Hello world: This unit is an introduction to the content, pedagogy, and toolkit of the course.
Unit 2 - Exploring data: This unit focuses on data visualization and data wrangling. Specifically we cover fundamentals of data and data visualization, confounding variables, and Simpson’s paradox as well as the concept of tidy data, data import, data cleaning, and data curation. Also in this unit students are introduced to the toolkit: R, RStudio, R Markdown, Git, and GitHub.
Unit 3 - Data science ethics: In this unit we discuss misrepresentation of findings, particularly in data visualizations, breaches of data privacy, and algorithmic bias. We also cover accessibility of data visualizations.
Unit 4 - Looking forward: In the last unit we will explore a series of short modules. These could include modules such as interactive reporting and visualization with Shiny, creating animated plots using gganimate, and creating maps.
The computing courses at COA are designed to bridge the liberal arts education to computing and the digital world. In this, I am committed to actively creating digital and computational spaces that are radically inclusive. This includes integrating equity and social justice throughout our curriculum, and engaging students in metacognition to support this work.
This course is designed as a community learning journey. Together, we will:
It is also my hope that in this course you:
All books for this course are freely available online as e-books. We will be using two main texts:
The class meets on Tuesday, Wednesday and Friday from 9:00-10:25. The typical weekly class schedule will be:
Day | Activity |
---|---|
Tuesday | Topic Introduction |
Wednesday | Topic Introduction |
Friday | Lab |
In addition to the 4.5 hours of scheduled class time every week, I expect that between readings, going over notes, and doing assignments you will spend at least an additional 10.5 hours a week on this course, for a total of at least 150 hours over the term devoted to this class.
For labs and homeworks I am trying out deadline windows this year. I would like you to strive to turn in the assignments on their stated due date, but there is flexibility to turn in homework and lab assignments through Sunday at 23:59 EST. I don’t want to penalize you when things are taking a little more time to complete, but the deadlines are there to keep you on track with the work and so that I can get you feedback in a timely manner. I will start my review of your work on Mondays (hence the window through Sunday).
For all of the team-based assignments in this class you will be randomly assigned to teams of 2 or 3 students; these teams will change throughout the trimester. You will work in these teams during class and on the lab assignments. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment. During the labs, we will be working together using pair programming, where you will take it in turns to write and review the code, swapping roles frequently. Once the assignment is submitted, the contributors will share responsibility for any revisions to be made based on feedback. Failure to adequately contribute to an assignment will result in a penalty to your grade relative to the team’s overall grade.
What if I need help? First, I expect you to make a good-faith effort whenever tackling a new problem. While there is no single best answer to “How long should I spend trying to solve a problem before asking a question?” five minutes is too little and five hours is way too long. We will be developing our debugging toolkit as we progress through the course. Learning how to spot and fix mistakes in our code (bugs) is a key part of programming.
Remember that struggling with a new concept is normal and is part of the process of learning; becoming frustrated to the point of desperation is not. If you are unable to solve the issue after using the tools in your debugging toolkit, there are several mechanisms by which you can, and should, reach out for help:
The Teaching Assistants and I will have a handful of help sessions every week. You are warmly invited and encouraged to attend these sessions. Help sessions are relaxed, informal, and hopefully fun. Things that happen at help sessions:
Everyone is welcome at help sessions! Attending these sessions helps students do well in class and get as much out of it as possible.
The help sessions for this term will be in CHE 103; times are TBD:
Day | Help Session |
---|---|
Wednesday | |
Thursday | |
Friday | |
Sunday | |
Building a strong peer network of classmates you can ask for help will be valuable for staying on top of homework and assignments. To help you be successful in this course, you will be assigned to a peer group at the start of the term. Outside of help sessions your peer group should be your first point of call for questions about the course and assignments and also to discuss concepts and your progress and questions in the course.
The question forum on Google Classroom will be used for online Q&A and collaborative discussion. You can post questions, and answers can be posted by the instructor, TAs, and/or other students. Please check Google Classroom to see whether a question has already been asked and answered before starting a new post. Instead of sharing code on Google Classroom, describe the issue or post the error message; if we need to troubleshoot outside Google Classroom, I will let you know!
I will make an effort to check the question forum every few days during the week to answer any questions. I will check the board on Monday to answer any questions posted over the weekend.
Online resources (e.g., StackOverflow) can be a good source for understanding and inspiration. These should not be an immediate go-to without your having struggled with the problem first. You should also take any such online postings with a grain of salt, as they can sometimes be misleading, written for a different context than your own, or, in some cases, simply wrong. Moreover, take special note of the “Sharing and Reusing Code” section below.
These will be held on Fridays. During these sessions you will work individually or in teams on computing lab exercises and you will finish the exercises after class and turn in your lab reports the following Thursday at 23:59 EST. Attendance to class is important as you will be working on your labs individually and together in class. Labs will be submitted as GitHub repositories.
A frequently asked question is “What happens if I can’t make it to a lab one week because I’m sick or have another obligation at that time?” Answers below:
If you’re missing a workshop day due to short-term illness or some other reason, you should communicate this with your team and attend a team meeting before the deadline for the assignment to contribute to the teamwork. If you have made 0 commits towards a lab assignment, you will receive a 0 for that assignment, so you need to participate both for being a team player and also for your own individual score.
If you’re unable to contribute to a lab assignment because of an illness taking you away from school work for an extended period of time, you should let me and your team know that you won’t be able to contribute to those lab(s) and we can discuss special circumstances and explore alternative arrangements to make up that work.
Overall, these policies are put in place to ensure communication between team members, respect for each other's time, and also to give you a safety net in the case of illness or other reasons that keep you away from attending class once or twice.
Beyond the in class activities, you will be assigned weekly larger programming tasks throughout the semester. These assignments will be completed individually, and submitted as GitHub repositories. Tip: Do the (optional) R tutorials which will introduce you to the datasets and topics covered in the homework assignments.
These weekly multiple choice quizzes will help you evaluate your learning continuously. The online quiz will be graded for completion only. You do not need to get any answers right, but it should help you identify what parts of the material you should review. Tip: Don’t leave it until the last minute!
Throughout the course there will be reflection assignments where we engage and reflect on contemporary issues in environmental and social justice related to our digital world, community and identity. These assignments may be based on an assigned reading or visualizations we encounter in our daily lives.
You will be responsible for the completion of an open ended final project for this course, the goal of which is to tackle an “interesting” problem using the tools and techniques covered in this class. Additional details on the project will be provided as the course progresses. Each team’s work will also be shared with and evaluated by at least one other team at an earlier stage in order to provide feedback. You must complete the final project and be in class to present it in order to pass this course. Tip: Stick to optional interim deadlines (outline, draft presentation) to pace your work on the project.
For all of the team-based assignments in this class you will be randomly assigned to teams of 2 or 3 students; these teams will change throughout the trimester. You will work in these teams during class and on the lab assignments. For team-based assignments, all team members are expected to contribute equally to the completion of each assignment, and you will be asked to evaluate your team members on lyceum after each assignment is due. During the labs, we will be working together using pair programming, where you will take it in turns to write and review the code, swapping roles frequently. Once the assignment is submitted, the contributors will share responsibility for any revisions to be made based on feedback. Failure to adequately contribute to an assignment will result in a penalty to your grade relative to the team’s overall grade.
Students are expected to make use of the provided GitHub repository as their central collaborative platform. Commits to this repository will be used as a metric (one of several) of each team member’s relative contribution for each homework.
A growing body of research indicates that traditional approaches to grading fail to produce the sorts of meaningful learning desired by both teachers and students. Such approaches often reinforce inequitable power dynamics between teachers and students, promote faulty reward systems that disincentivize creativity and risk-taking, and devalue important aspects of learning (including revision and feedback). Given this context, instead of a traditional approach to grading in which you do work that is evaluated solely by me, this course assumes that you opt to take ownership of and responsibility for your performance and engagement with the class. To make this happen, this course uses a “contract grading” scheme, which gives you a voice in the grading process, provides you with the agency to specify your intended course performance, and lets you share in the responsibility for evaluating whether or not you fulfilled your intended obligations. Please see the contract grading document (on Google Classroom) for a more fleshed-out explanation of this approach and how it will operate in the course.
The work in this course comprises the following components and their approximate weights:
Component | Approximate weight |
---|---|
Weekly homeworks and labs | 40% |
Project | 25% |
Reflections | 15% |
Quizzes and self evaluation | 10% |
Attendance (Participation, Engagement, and Leadership) in all aspects of the course | 10% |
Only work that is clearly assigned as team work should be completed collaboratively. Individual assignments must be completed individually, but you are welcome to discuss the problems in general and to ask for help and advice from classmates, teaching assistants, and me. You may not directly share your code with anyone other than the instructor and teaching assistants, but you can help each other spot errors and give suggestions.
I am well aware that a huge volume of code is available on the web to solve any number of problems. Unless I explicitly tell you not to use something, the course’s policy is that you may make use of any online resources (e.g. StackOverflow), but you must explicitly cite where you obtained any code you directly use (or use as inspiration). Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. On individual assignments you may not directly share code with another student in this class, and on team assignments you may not directly share code with another team in this class. You are welcome to discuss the problems together and ask for advice, but you may not send or make use of code from another team.
Generative AI tools (such as ChatGPT, DALL-E, etc.) may be used to help clarify concepts or to ask questions to assist learning, but you are responsible for ground-truthing that information with other sources.
Generative AI tools (such as ChatGPT, DALL-E, etc.) cannot be used to work on specific problems on homework, labs, or the project; any use of AI tools for these assignments may be considered a violation of College of the Atlantic’s Academic Integrity policy, since the work is not your own. The use of unauthorized AI tools will result in no credit for that assignment.
When in doubt, ask me beforehand. For instance, I am happy for you to ask generative AI to explain how the function select works in R and what a pipe is, and to compare its answer to your notes. Asking it to “Solve the following problem: [copy-paste homework question]” is not an acceptable use of this tool and is in violation of the Academic Integrity policy.
By enrolling in an academic institution, a student is subscribing to common standards of academic honesty. Any cheating, plagiarism, falsifying or fabricating of data is a breach of such standards. A student must make it their responsibility to not use words or works of others without proper acknowledgement. Plagiarism is unacceptable and evidence of such activity is reported to the provost or their designee. Two violations of academic integrity are grounds for dismissal from the college. Students should request in-class discussions of such questions when complex issues of ethical scholarship arise.
In my experience, issues with Academic Integrity are more likely to arise when you are coming up against a deadline. Start your work early and give yourself plenty of time to attend help sessions.
Many of us learn in different ways. For example, you may process information by speaking and listening, so while lectures are quite helpful for you, some of the written material may be difficult to absorb. You might have difficulty following lectures, but are able to quickly assimilate written information. You may need to fidget to focus in class. You might take notes best when you can draw a concept. For some of you, speaking in class can be a stressful or daunting experience. For some of you, certain topics or themes might be so traumatic as to be disruptive to learning. The principle of Universal Design for Learning calls for our classrooms, our virtual spaces, our practices and our interactions to be designed to include as many different modes of learning as possible, and is a principle I take seriously in this class.
It is also my goal to create an inclusive classroom, which depends on community building, and which requires everyone to come to class with mutual respect, civility, and a willingness to listen to and observe others. As such the syllabus serves as a contract of some expectations between all members of the class, including myself.
If you anticipate or experience any barriers to learning in this course, please reach out to me and your student support advisor. If you have a disability, or think you may have a disability, please contact COA’s Disability Support Services, located within the Office of Student Life in Deering Commons, to develop a plan for your academic accommodations. You can find more information in the course catalog under Accommodating students with disabilities. If you have already been approved for accommodations through Disability Support Services, please let me know! We can meet 1-1 to explore concerns and potential options.
All work is due on the stated due date through the due date window (Sunday for homeworks and labs). Due date windows are there to help guide your pace through the course and they also allow me to return feedback to you in a timely manner. However, sometimes life gets in the way and you might not be able to turn in your work on time, which is why instead of a single due date, I am trying out due date windows this term.
If you intend to submit work for an assignment or project outside the due date window, you must notify me, ideally before the original deadline, and again as soon as the completed work is submitted on GitHub. This allows me to return feedback to you and lets me know when to check your work, as GitHub does not send me notifications. Lab work cannot be submitted outside the due date window.
COA is dedicated to establishing and maintaining a safe and inclusive campus where all community members have equal access to COA’s educational and employment opportunities. We strive to promote an environment of respect, safety, and well-being and will not tolerate gender-based or sexual discrimination nor sexual harassment of any kind.
As a faculty member, I am considered a “responsible employee” and am required to share any disclosures of sexual or gender-based misconduct with the Title IX Coordinator. This includes disclosures of experiences that happened before an individual’s time at COA. This is to ensure that all community members who have experienced sexual misconduct receive support, options, and information about their rights and resources. Community members are not obligated to respond to this outreach, and this will not generate a report to law enforcement.
For more information regarding Title IX, our institutional policy, and to access helpful resources, visit COA’s Title IX website: coa.edu/human-resources/title-ix.
If you have any questions or want to explore support and assistance, please contact COA’s Title IX Coordinator, Puranjot Kaur, at pkaur@coa.edu. Speaking to the Title IX Coordinator does not automatically initiate a college resolution. Instead, much of her work is around providing supportive measures to ensure you can continue to engage in COA’s programs and activities.
Note on Pregnancy and Related Conditions
Title IX prohibits discrimination based on sex in education programs and activities. This prohibition on discrimination extends to pregnancy and related conditions. Pregnancy and related conditions encompass pregnancy, childbirth, miscarriage, termination of pregnancy, false pregnancy, lactation, or recovery from any of these conditions.
Students experiencing pregnancy or related conditions may voluntarily initiate contact with the Title IX Coordinator to request reasonable adjustments available under Title IX. Reasonable adjustments may include but are not limited to: excusing student absences; allowing students to make up missed work; opportunities to move around during class; additional breaks; missing some or all of a class session to nurse or pump, and to have the opportunity to make up any work missed. Information on lactation space on campus can be found here: coa.edu/human-resources/title-ix/support-resources/lactation-space.
Students who believe they have been subject to discrimination because of pregnancy or related condition status may file a formal complaint with the Title IX Coordinator. If you are a pregnant or parenting student and you are in need of any adjustments, please let me know at your earliest convenience.
I want to make sure that you learn everything you were hoping to learn from this class. If this requires flexibility, please don’t hesitate to ask.
You never owe me personal information about your health (mental or physical) but you’re always welcome to talk to me. If I can’t help, I likely know someone who can.
I want you to learn lots of things from this class, but I primarily want you to stay healthy, balanced, and grounded.
Most of you will need help at some point and I want to make sure you can identify when that is without getting too frustrated and feel comfortable seeking help.
Everyone is welcome at help sessions! Attending these sessions helps students do well in class and get as much out of it as possible.
Showcase your inner data scientist
Pick a dataset, any dataset…
…and do something with it. That is your final project in a nutshell. More details below.
The final project for this class will consist of a visualization of a dataset of your own choosing. The dataset may already exist, or you may collect your own data using a survey or by conducting an experiment. You can choose the data based on your interests or based on work in other courses or research projects. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a novel dataset in a meaningful way.
The goal is not to do an exhaustive data analysis i.e., do not create every visualization you have learned for every variable, but rather let me know that you are proficient at asking meaningful questions and exploring them with data visualizations, that you are proficient in using R, and that you are proficient at interpreting and presenting the results.
The project is very open ended. You should create some kind of compelling visualizations of this data in R. There is no limit on what tools or packages you may use, but sticking to packages we learned in class (tidyverse) is required. You do not need to visualize all of the data at once. A few high quality visualizations will receive a much higher grade than a large number of poor quality visualizations. At least one visualization should be made using R but you are also encouraged to create your own visualizations using other mediums/crafts if you choose. Also pay attention to your presentation. Neatness, coherency, and clarity will count. All analyses must be done in RStudio, using R.
Here is an example of a past project write up and presentation on Lessons to be Learned from Super Bowl Advertisements.
In order for you to have the greatest chance of success with this project it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset must have at least 50 observations and between 10 and 20 variables (exceptions can be made but you must speak with me first). The variables in the data should include categorical variables, discrete numerical variables, and continuous numerical variables.
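The size guidelines above can be checked quickly in R before you commit to a dataset. The sketch below is only an illustration: check_project_data is a hypothetical helper (not part of any package), and it assumes that factor, character, and logical columns count as categorical. It does not try to tell discrete from continuous numerical variables, which you should still check by eye.

```r
# Hypothetical helper: checks a data frame against the size guidelines above.
check_project_data <- function(df) {
  # Assumption: factor, character, and logical columns are "categorical"
  is_categorical <- vapply(
    df,
    function(x) is.factor(x) || is.character(x) || is.logical(x),
    logical(1)
  )
  is_numeric <- vapply(df, is.numeric, logical(1))

  c(
    enough_rows        = nrow(df) >= 50,
    variables_in_range = ncol(df) >= 10 && ncol(df) <= 20,
    has_categorical    = any(is_categorical),
    has_numeric        = any(is_numeric)
  )
}

# Example with a dataset that ships with R (32 rows, 11 numeric columns)
check_project_data(mtcars)
```

Any FALSE in the result is a prompt to pick a different dataset or to talk with me about an exception.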
If you are using a dataset that comes in a format that we haven’t encountered in class, make sure that you are able to load it into R as this can be tricky depending on the source. If you are having trouble ask for help before it is too late.
Note on reusing datasets from class: Do not reuse datasets used in examples, homework assignments, or labs in the class.
Below is a list of data repositories that might be of interest to browse. You’re not limited to these resources, and in fact you’re encouraged to venture beyond them. But you might find something interesting there:
One of the main goals of this course is that you build and develop community leadership skills as a collaborator who shares strengths, works on weaknesses, and contributes to a broader shared understanding. These skills will serve you in this course and beyond in your careers. A crucial part of building strong collaborations is good communication.
Each team will draft a group contract. A group contract is a document to help you formalize the expectations you have for your group members and what they can expect of you. It will help you think about what you need from each other to work effectively as a team! You will create and agree on this contract as a team and refer to it during the project.
At a minimum, your group contract must address these questions:
Each member should “sign” (you can just type out your name) at the bottom of the submission.
Credit for Group Contract: Tiffany Timbers, University of British Columbia
You will write your proposal in the proposal.Rmd file in your GitHub project.
Section 1 - Introduction: The introduction should introduce your general research question and your data (where it came from, how it was collected, what are the cases, what are the variables, etc.).
Section 2 - Data: Place your data in the /data folder, and add dimensions and codebook to the README in that folder. Then print out the output of glimpse() or skim() of your data frame.
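As a minimal sketch of this step, the code below reports the dimensions of a data frame and prints a glimpse() of it. The built-in airquality data stands in for your own dataset (an assumption for illustration only; you would load your file from the /data folder instead), and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Stand-in for your own data (assumption for illustration); in your proposal
# you would load your file from the /data folder instead.
project_data <- datasets::airquality

# Dimensions: number of rows (observations) and columns (variables)
dim(project_data)

# Compact overview: one line per variable with its type and first few values
glimpse(project_data)

# skimr::skim(project_data) gives a longer summary if skimr is installed
```

The output of dim() is what goes into the README as the dataset's dimensions; the glimpse() output is a useful starting point for writing the codebook.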
Section 3 - Data analysis plan:
Each section should be no more than 1 page (excluding figures). You can check a print preview to confirm length.
10 minutes maximum, and each team member should say something substantial.
Prepare a slide deck using either Google Slides or the template in your repo. This template uses a package called xaringan, and allows you to make presentation slides using R Markdown syntax. There isn’t a limit to how many slides you can use, just a time limit (10 minutes total). A rough guide to follow is that one slide is equal to one minute. Each team member should get a chance to speak during the presentation. Your presentation should not just be an account of everything you tried (“then we did this, then we did this, etc.”). Instead, it should convey what choices you made, and why, and what you found.
If you use xaringan to make your slides, make sure your chunks are turned off with echo = FALSE as you finalize your presentation.
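One way to do this for every chunk at once, rather than chunk by chunk, is to set the option globally in a setup chunk near the top of the .Rmd file. This is a sketch that assumes your slides are knit with knitr, as R Markdown documents are; it is not the only way to hide code.

```r
library(knitr)

# Hide the source code of all chunks by default; output and plots still appear
opts_chunk$set(echo = FALSE)
```

A chunk whose code you do want to show can still set echo = TRUE in its own chunk header.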
Presentation schedule: Presentations will take place during the Tuesday and Wednesday of the last week of the course. During the class you will watch presentations from the other teams and provide feedback in the form of peer evaluations. The presentation line-up will be generated randomly.
Along with your presentation slides, I want you to provide a brief summary of your project in the README of your repository.
This write-up, which you can also think of as a summary of your project, should provide information on the dataset you’re using, your research question(s), your approach (how you decided to visualize the data), and your findings.
The following folders and files should be in your project repository:
presentation.Rmd + presentation.html: Your presentation slides. If you use Google Slides you can put a link to your presentation in your README.md instead.
README.md: Your write-up
/data/*: Your dataset in csv or RDS format, in the /data folder.
/proposal: Your proposal from earlier in the semester
Style and format do count for this assignment, so please take the time to make sure everything looks good and your data and code are properly formatted, including labelling code chunks. Pay attention to images and plots included in the presentation and make sure to include appropriate alternative text.
Hide your code (echo = FALSE) so that your document is neat and easy to read. However, your document should include all your code such that if I re-knit your R Markdown file I should be able to obtain the results you presented. Exception: If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
Total | 100 % |
---|---|
Proposal | 10 % |
Presentation | 50 % |
Write-up | 15 % |
Reproducibility and organization | 10 % |
Team peer evaluation | 10 % |
Classmates' evaluation | 5 % |
Showcase your inner data scientist
The final project for this class will consist of a visualization of a dataset of your own choosing. The dataset may already exist, or you may collect your own data using a survey or by conducting an experiment. You can choose the data based on your interests, data you’ve collected outside of class, or a community dataset. The goal of this project is for you to demonstrate proficiency in the techniques we have covered in this class (and beyond, if you like) and apply them to a novel dataset in a meaningful way.
The goal is not to do an exhaustive data analysis (i.e., do not create every visualization you have learned for every variable), but rather to show me that you are proficient at asking meaningful questions and exploring them with data visualizations, that you are proficient in using R, and that you are proficient at interpreting and presenting the results.
The project is very open ended. You should create compelling visualizations of this data in R and present them in a handout. You may use additional tools or packages, but sticking to the packages we learned in class (tidyverse) is required. You do not need to visualize all of the data at once. A few high-quality visualizations will receive a much higher grade than a large number of poor-quality visualizations. At least two visualizations should be made using R, but you are also encouraged to create your own visualizations using other mediums/crafts if you choose. Also pay attention to your presentation. Neatness, coherency, and clarity will count. All analyses must be done in Posit, using R; however, you can get creative with the presentation of this data by adding more elements using a graphics editor.
Note: This is the first year we are making handouts, so check back here later in the course for more guidance.
Here are examples of past handouts made by students at Georgia State University: Yellowstone Travel and Buckethead.
In order for you to have the greatest chance of success with this project, it is important that you choose a manageable dataset. This means that the data should be readily accessible and large enough that multiple relationships can be explored. As such, your dataset must have at least 50 observations and between 10 and 20 variables (exceptions can be made, but you must speak with me first). The variables in the data should include categorical variables, discrete numerical variables, and continuous numerical variables.
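As a rough aid, the size guidelines above can be checked in R before you commit to a dataset. The sketch below is only an illustration: `check_dataset()` is a hypothetical helper (not something provided in class), and it uses the built-in `mtcars` data purely as a stand-in. It cannot tell discrete from continuous numerical variables, so that part still needs your own judgment.

```r
# Hypothetical helper: compares a data frame with the project's size guidelines
# (at least 50 observations, between 10 and 20 variables).
check_dataset <- function(df, min_rows = 50, min_vars = 10, max_vars = 20) {
  # Treat character, factor, and logical columns as categorical.
  is_categorical <- vapply(
    df,
    function(x) is.character(x) || is.factor(x) || is.logical(x),
    logical(1)
  )
  is_numeric <- vapply(df, is.numeric, logical(1))

  list(
    n_observations      = nrow(df),
    n_variables         = ncol(df),
    enough_observations = nrow(df) >= min_rows,
    variable_count_ok   = ncol(df) >= min_vars && ncol(df) <= max_vars,
    n_categorical       = sum(is_categorical),
    n_numeric           = sum(is_numeric)
  )
}

# Stand-in example; replace mtcars with your own data frame.
result <- check_dataset(mtcars)
str(result)
```

A dataset that fails one of these checks is not automatically ruled out, since exceptions can be discussed, but it is a sign to talk with me early.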
If you are using a dataset that comes in a format that we haven’t encountered in class, make sure that you are able to load it into R, as this can be tricky depending on the source. If you are having trouble, ask for help before it is too late.
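One way to test this early is to load the file once and immediately save a copy in csv or RDS format, the two formats the project repository asks for. The sketch below assumes a tab-separated source file as an example of an unfamiliar format; the file names are made up, and everything is written to a temporary folder so the code runs anywhere.

```r
# Create a small tab-separated file in a temporary folder (stand-in for your source file).
raw_path <- file.path(tempdir(), "example_raw.tsv")
writeLines(c("site\tcount", "A\t3", "B\t7"), raw_path)

# Read the tab-separated file with base R.
raw <- read.delim(raw_path, stringsAsFactors = FALSE)

# Save copies as csv and RDS in a data folder (here inside the temporary folder).
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(raw, file.path(data_dir, "example.csv"), row.names = FALSE)
saveRDS(raw, file.path(data_dir, "example.rds"))

# Reload both copies to confirm they can be read back.
from_csv <- read.csv(file.path(data_dir, "example.csv"))
from_rds <- readRDS(file.path(data_dir, "example.rds"))
```

In a real project you would point `data_dir` at the /data folder of your repository instead of a temporary folder.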
Note on reusing datasets from class: Do not reuse datasets used in examples, homework assignments, or labs in the class.
Below is a list of data repositories that might be of interest to browse. You’re not limited to these resources, and in fact you’re encouraged to venture beyond them. But you might find something interesting there:
One of the main goals of this course is that you build and develop community leadership skills as a collaborator who shares strengths, works on weaknesses, and contributes to a broader shared understanding. These skills will serve you in this course and beyond in your careers. A crucial part of building strong collaborations is good communication.
Each team will draft a group contract. A group contract is a document to help you formalize the expectations you have for your group members and what they can expect of you. It will help you think about what you need from each other to work effectively as a team! You will create and agree on this contract as a team and refer to it during the project.
At a minimum, your group contract must address these questions:
Each member should “sign” (you can just type out your name) at the bottom of the submission.
Credit for Group Contract: Tiffany Timbers, University of British Columbia
You will write your proposal in the proposal.Rmd file in your GitHub project.
Section 1 - Introduction: The introduction should introduce your general research question and your data (where it came from, how it was collected, what are the cases, what are the variables, etc.).
Section 2 - Data: Place your data in the /data folder, and add dimensions and codebook to the README in that folder. Then print out the output of glimpse() or skim() of your data frame.
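For Section 2, a minimal sketch of getting the dimensions and the glimpse() or skim() output might look like the following. It assumes the dplyr package is installed and uses the built-in `iris` data as a stand-in; the commented file path is hypothetical. Because skim() comes from the separate skimr package, the sketch only calls it if that package is available.

```r
library(dplyr)

# Stand-in data frame; replace with your own data, for example
# my_data <- read.csv("data/your_file.csv")  # hypothetical path
my_data <- as_tibble(iris)

# Dimensions to report in the README: number of rows, then number of columns.
dim(my_data)

# Compact overview of every column and its type.
glimpse(my_data)

# skim() gives a longer summary; run it only if skimr is installed.
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(my_data))
}
```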
Section 3 - Data analysis plan:
Each section should be no more than 1 page (excluding figures). You can check a print preview to confirm length.
More instructions to come.
The following folders and files should be in your project repository:
memo.qmd + memo.html: The code and narrative about your process for your final handout.
README.md: A brief overview of the project.
/data/*: Your dataset in csv or RDS format, in the /data folder.
/proposal: Your proposal from earlier in the semester.
/handout: A pdf of your final handout.
Style and format do count for this assignment, so please take the time to make sure everything looks good and your data and code are properly formatted, including labelling code chunks. Pay attention to images and plots included in the presentation and make sure to include appropriate alternative text.
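As an illustration of a labelled code chunk with alternative text, here is a sketch of what the body of one chunk in a Quarto file such as memo.qmd could contain. The label, the alt text wording, and the use of the built-in `mpg` data are all assumptions made for the example. The `#|` lines are Quarto chunk options (in R Markdown the corresponding knitr option is `fig.alt`), and `labs(alt = ...)` additionally stores the text on the ggplot2 object itself.

```r
#| label: mileage-scatter
#| fig-alt: "Scatter plot of highway miles per gallon against engine displacement; mileage tends to decrease as displacement increases."

library(ggplot2)

# Build the plot and store the alternative text on the plot object as well.
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    alt = "Scatter plot of highway miles per gallon against engine displacement; mileage tends to decrease as displacement increases."
  )

# Retrieve the stored alternative text to check it.
alt_text <- get_alt_text(p)
alt_text
```

Good alternative text names the chart type, the variables, and the main pattern a reader should take away.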
Total | 100 % |
---|---|
Proposal | 15 % |
Handout | 30 % |
Memo | 30 % |
Reproducibility and organization | 15 % |
Team peer evaluation | 10 % |
tidyverse functions, ggplot2)
Thorndike Library offers many resources and services that can assist you in your academic endeavors, including individualized research support and access to resources beyond COA. Study spaces are also available. The library is open 7 days/week. Remote access to the research databases is available 24/7. Contact library@coa.edu or visit the library website for details.
If you are in need of a term-long loaner laptop, please contact the IT department at helpdesk@coa.edu. Mention that you are taking a data science class, and pick up the laptop in A&S right by the whale skull.
I am committed to the promotion and use of open educational resources and software in the journey of designing an accessible computing education. Much of the course description, design, syllabus, website, and educational materials have been adapted from “Data Science in a Box,” https://datasciencebox.org/, by Mine Çetinkaya-Rundel under the Creative Commons Attribution Share Alike 4.0 International license.
Conceptually, intellectually, and substantively, the course policies and learning objectives draw heavily upon the work of current and past colleagues at College of the Atlantic, Bates College, and beyond, including Carrie Diaz Eaton, Anelise H. Shrout, Barry Lawson, Meredith Greer, Ethan Miller, Misty Beck, Francis Eanes, Dave Feldman, as well as scholars beyond these institutions.
This course has also been greatly improved by the feedback of Bates students enrolled in DCS 210 in the Fall and Winter 2021-2022 and students enrolled in ES 1085 at the College of the Atlantic in Fall (2022, 2023) and Winter (2023, 2024). Special thanks to Bates students Liza Dubinsky and Max Devon, who worked with me on the design of the accessible data visualization lab.
Featured artwork is by @AllisonHorst.