# Project Specification



## Task overview
In this project, you will:

1. Identify two or more related datasets of your interest,
2. Design questions surrounding the data,
3. Conduct analyses, and
4. Communicate your findings.

### Group formation
A project group may consist of up to 3 students.  If you decide to work solo, you must get approval from the instructor in advance.  Declare your group in the project proposal.

## Project grading 
The project is worth 30% of the class grade, with the following details:

| Component           |      Due date          |  Proportion (%)   |
|---------------------|:----------------------:|:-----------------:|
| Proposal            | Feb 7                  |       10          |
| Presentations       | All slides due March 10|       30          |
| Report              | March 19               |       60          |
| **Total**           |                        |      100          |
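
As a rough illustration of how these weights combine, the sketch below converts each component's share of the project into its share of the overall class grade. It assumes the proportions in the table apply within the project's 30%; the variable and function names are illustrative only.

```python
# Share of the class grade carried by the whole project (from the text above).
PROJECT_WEIGHT = 30.0

# Share of the project grade carried by each component (from the table).
COMPONENT_PROPORTIONS = {
    "Proposal": 10.0,
    "Presentations": 30.0,
    "Report": 60.0,
}


def class_grade_share(proportion, project_weight=PROJECT_WEIGHT):
    """Return a component's share of the class grade, in percentage points."""
    return project_weight * proportion / 100.0


shares = {name: class_grade_share(p) for name, p in COMPONENT_PROPORTIONS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0f}% of the class grade")
```

Under this reading, the proposal, presentations, and report account for 3, 9, and 18 percentage points of the class grade, respectively.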

## Project proposal

Develop a proposal that includes your topic, your selection of relevant dataset(s), and a plan to answer your questions of interest.  Keep in mind the timeline of the quarter and set achievable goals.

The proposal will contain several parts of your project report.  You may incorporate feedback from the proposal to improve your project report.

### Canvas project team declaration
After forming your team of 2-3 students, add your group on **Canvas -> People -> Groups -> + Group**.

Only form a group with the explicit approval of your teammates.

### Guidelines for proposal

The proposal should follow this format (5 points): 

- Include full names of all teammates.
- Between 1 and 2 pages, single-spaced, 11-point type, 1-inch margins.
- Do not include graphics.
- Submit a PDF named `[GroupName]_proposal.pdf`, where `[GroupName]` is a group name of your choosing.

The proposal should cover the following components (but in a narrative format, not Q/A):

- **Introduction** (10 points): Introduce your project and motivation, e.g.,
  - What is the main issue you are interested in?
  - Why is this topic important?
  - In what way does this project provide a solution?
  - What data do you require to answer the question?
- **Dataset(s)** (10 points): Describe the dataset(s) you have chosen, e.g.,
  - How can this dataset be retrieved?
  - How is this dataset related to the topic?
  - State the amount of data you will be working with.
  - State the relationship among the datasets and how you intend to relate them.
- **Questions of interest** (20 points): State three questions you are interested in answering with the selected datasets.  For each question, briefly describe:
  - How do you intend to answer the question?  Suggest a possible analysis.
  - What can you answer (and not answer) with the proposed datasets?
  - *This should be the focus of your proposal.*
- **Reference** (5 points): Provide valid references for your dataset(s). (Reference does not count toward the page limit.)
  - For example, using APA guidelines, the MIMIC-III database can be cited as
    > Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). *PhysioNet*. https://doi.org/10.13026/C2XW26.
  - or the original article 
    > Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. W. H., Feng, M., Ghassemi, M., ... & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. *Scientific Data*, 3(1), 1-9.
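
To make "the relationship among the datasets" more concrete, here is a minimal sketch of linking two datasets through a shared key and counting how many rows match. Everything in it is a placeholder: the two toy datasets, the column names (`county`, `pm25`, `asthma_rate`), and all values are made up for illustration and are not part of the assignment. The helper `inner_join` is a hypothetical function written for this example.

```python
# Toy records standing in for two related datasets (all values are made up).
air_quality = [
    {"county": "A", "pm25": 8.1},
    {"county": "B", "pm25": 12.4},
    {"county": "C", "pm25": 9.7},
]
health = [
    {"county": "A", "asthma_rate": 7.2},
    {"county": "B", "asthma_rate": 9.5},
    {"county": "D", "asthma_rate": 6.8},
]


def inner_join(left, right, key):
    """Combine rows from both lists that share the same value of `key`."""
    # Index the right-hand rows by key; assumes each key appears once there.
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


linked = inner_join(air_quality, health, key="county")

# Keys that appear in exactly one of the two datasets.
unmatched = sorted(
    {r["county"] for r in air_quality} ^ {r["county"] for r in health}
)

print(f"{len(air_quality)} + {len(health)} rows -> {len(linked)} linked rows")
print("Keys present in only one dataset:", unmatched)
```

Counting linked and unmatched rows early can help you state the amount of usable data and what the combined datasets can and cannot answer. If you work in pandas, `DataFrame.merge` offers a similar join.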
  

### Dataset sources
Below are some possible avenues for finding datasets:
- [Google dataset search](https://datasetsearch.research.google.com/)
- [Awesome public datasets](https://github.com/awesomedata/awesome-public-datasets)
- [BuzzFeed news repository](https://github.com/BuzzFeedNews/everything)

## Sample project topics

- Food Allergies and their Relationships to the Atopic Triad
- Air Pollutant Emissions and Environmental Justice in Texas
- Analyzing the Geographical and Market Trends in the US Fast Food Industry
- Analysis on Recipes and Reviews on Food.com
- Analysis of Childhood Allergies
- Optimizing NBA Coaching Perspectives

## Project presentation

Prepare an in-class presentation for March 11 and 13, 2024.  The presentation schedule will be published on Canvas.

**Submission:** Submit your slides (either a `.pptx` or a `.pdf` file) on Canvas by 11:59 pm on March 10.

### Guidelines for presentation (80 points)

The presentation should be *up to 10 minutes*, with an additional 2-3 minutes for Q/A. 

The presentation should include, *but is not limited to*, the following:

- Motivation of your project.
- Introduction to your dataset(s) and its relevance.
- Key questions, method of analyses, and findings.
  - You may present **one to two** of your proposed questions.  You may present all of them, but *consider how much time you need to explain each one well enough for the audience to follow.*  Choose what is important to you.
  - *Tips:*
    - Quality over quantity here.
    - Provide an overall narrative that connects the questions, rather than jumping from one to another.
- *(Optional, quick)* Major lesson learned or barrier in the project.
- Summary.

### Guidelines for participation (20 points)

Visit [[link to be inserted on March 11]] for entering presentation feedback.

For each presentation besides your own, enter your response (see pinned example).

When leaving your response,
- Include your full name.
- Note one thing you learned from the presentation.
- Note one question you have about the presentation.

### Grading criteria

The presentation is evaluated via the following criteria:
- Organization of content
- Appropriate use of language
- Delivery
- Use of supporting materials (visuals, statistics, etc.)
- Clarity of central idea

Note that while the evaluation is primarily for the group as a whole, individual evaluations may differ if a significant discrepancy among students is observed.  You may refer to the [sample reference rubric](a1-presentation-rubric) for details.

## Project report

Prepare a project report to clearly convey your topic choice and its importance, your approach to analysis, and your findings.

### Guidelines for report

The report should follow this format (5 points):

- Include full names of all teammates.
  - This is a reminder that it is an academic violation to include your name on any work to which you have not contributed.
- Up to 15 pages (not including Appendix), single-spaced, 11-point type, 1-inch margins.

A PDF named `[GroupName]_proposal.pdf` should be submitted.

The report should include the following components in a narrative format:

- **Introduction** (15 points): Introduce your project and motivation, e.g.,
  - What is the main issue you are interested in?
  - Why is this topic important?
  - In what way does this project provide a solution?
- **Dataset(s)** (15 points): Describe the dataset(s) you have chosen, e.g.,
  - How can this dataset be retrieved?
  - How is this dataset related to the topic?
  - State the amount of data you will be working with.
  - State the relationship among the datasets and how you intend to relate them.
- **Questions and analyses** (90 points): State three questions you are answering with the selected datasets.  For each question, provide the method of analysis, its implementation, and your findings.  Use any graph, table, or other visualization to support your findings.
  - *Note:* Do not pad your reports with graphics.  Make sure any graphs you use are necessary and clearly presented.
  - Graphics should be included in-text (but not side-by-side with any text), referenced in the text, and described fully by the caption.
- **Distribution of work** (10 points): Clearly describe the distribution of work among teammates. 
- **Summary** (10 points): Provide a brief summary of your project, and describe any loose ends or future opportunities.
- **Generative AI statement** and **Reference** (5 points):
  - Generative AI statement: If you have used generative AI tools in any way, report all usage according to the syllabus. *This statement does not count toward the page limit.*
  - Reference: Provide valid references for your dataset(s) and any relevant sources you have cited. *Reference does not count toward the page limit.*
    - For example, using APA guidelines, the MIMIC-III database can be cited as
    > Johnson, A., Pollard, T., & Mark, R. (2016). MIMIC-III Clinical Database (version 1.4). *PhysioNet*. https://doi.org/10.13026/C2XW26.
    - or the original article 
    > Johnson, A. E., Pollard, T. J., Shen, L., Lehman, L. W. H., Feng, M., Ghassemi, M., ... & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. *Scientific Data*, 3(1), 1-9.
- **Appendix**: Include all code you have used in the appendix.  If you have used Tableau, submit the `.twbx` file(s) as supplementary materials. 