Final Project
===============
**Due Date for Part 1: Thursday, Apr 3, by 5:00pm central time**

**Due Date for Parts 2-4: Friday, May 2, by 3:30pm central time**
The Songs of Distant Final
--------------------------
PART 1: Pitch
~~~~~~~~~~~~~
Form groups of two or three students to work collaboratively on the final project.
You may choose your own groups, or you may ask the instructors to assign you to a
group. Please let us know as soon as possible which group you are in, or that you
would like to be assigned to one.
The first part of the final project is to identify an interesting data set that you
want to work on. The data set should be Engineering focused, broadly defined. It
should also be amenable to CRUD operations and some sort of analysis (e.g. plotting
some values or generating summary statistics). Most data sets that are in a
*list-of-dictionaries* format and contain time stamps and/or quantitative values
should work. The Meteorite Landing data and ISS Positional data are good
examples of this, but they should not be used for the final project. There are several
links to Engineering-focused data sets at the bottom of this page, but you may
look elsewhere too.
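
As a rough illustration of the *list-of-dictionaries* shape, here is a minimal sketch of a few records and the kind of summary statistics an analysis might produce. The field names (``id``, ``timestamp``, ``thrust_kN``) and the values are hypothetical placeholders. They are not a suggested data set.

.. code-block:: python

   from datetime import datetime
   from statistics import mean, stdev

   # Hypothetical records in list-of-dictionaries form: each dict is one
   # observation with a unique id, a time stamp, and a quantitative value.
   records = [
       {"id": "1", "timestamp": "2024-01-01T00:00:00", "thrust_kN": 10.0},
       {"id": "2", "timestamp": "2024-01-02T00:00:00", "thrust_kN": 12.0},
       {"id": "3", "timestamp": "2024-01-03T00:00:00", "thrust_kN": 14.0},
   ]

   def summarize(records, field):
       """Return simple summary statistics for one numeric field."""
       values = [float(r[field]) for r in records]
       return {
           "count": len(values),
           "min": min(values),
           "max": max(values),
           "mean": mean(values),
           # Sample standard deviation needs at least two values.
           "stdev": stdev(values) if len(values) > 1 else 0.0,
       }

   def time_range(records):
       """Return the earliest and latest time stamps as ISO strings."""
       stamps = sorted(datetime.fromisoformat(r["timestamp"]) for r in records)
       return stamps[0].isoformat(), stamps[-1].isoformat()

   print(summarize(records, "thrust_kN"))
   print(time_range(records))

Some data sets can be reshaped into records like these, each with a unique identifier. Those data sets usually map cleanly onto CRUD routes and simple plots.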
Once your group has identified a potential data set, write a ~1 page summary that
includes the proposed project title, a list of group members, and a description
of the data. Then schedule a ~5 minute meeting with one of the instructors in order
to “pitch” your project. We want to know the source of the data, see what
the data look like, and hear your plan for working with the data.
PART 2: Code Repository
~~~~~~~~~~~~~~~~~~~~~~~
The final project involves building a REST API for interacting with an interesting
Engineering data set. The API should allow users to perform basic HTTP requests
(e.g. POST, GET, DELETE) to load the data, view it in a RESTful / collections
style, and delete it. The API should also allow users to submit an analysis job,
which is handled asynchronously by workers that generate images and store them in a database.
The application must be hosted on the Kubernetes cluster and accessible to the outside world
at a public URL.
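
The load / view / delete cycle described above can be sketched without a web framework. This is a minimal sketch under these assumptions:

* A plain ``dict`` stands in for the Redis database.
* Each function mimics what a route handler might return, as a (body, status code) pair.
* The route paths in the docstrings (``/data``, ``/data/<id>``) are hypothetical.

In the real project these functions would be wrapped in Flask routes backed by Redis.

.. code-block:: python

   import json

   # A plain dict stands in for the Redis 'raw data' database in this sketch.
   raw_db = {}

   def post_data(records):
       """POST /data (hypothetical route): load every record, keyed by its id."""
       for record in records:
           raw_db[record["id"]] = json.dumps(record)
       return {"loaded": len(records)}, 201

   def get_data():
       """GET /data (hypothetical route): return the whole collection."""
       return [json.loads(v) for v in raw_db.values()], 200

   def get_item(item_id):
       """GET /data/<id> (hypothetical route): return one item or a 404 error."""
       if item_id not in raw_db:
           return {"error": f"id {item_id} not found"}, 404
       return json.loads(raw_db[item_id]), 200

   def delete_data():
       """DELETE /data (hypothetical route): remove every record."""
       count = len(raw_db)
       raw_db.clear()
       return {"deleted": count}, 200

Each record is stored as a JSON string, which mirrors the fact that Redis stores string values. Swapping the dict for a Redis client therefore mostly changes the storage calls, not the route logic.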
**All requirements from Homeworks 6-8 apply.** Here are some specific requirements we will look for:
* The front-end REST API must have:

  * a ``/help`` endpoint describing all of its routes
  * appropriate endpoints (and methods) to POST / GET / DELETE the Engineering data set to / from the Redis database
  * sufficient REST-style endpoints for returning the data collections (and subsets therein) in an intuitive way
  * endpoints for submitting a job to plot data and retrieving the results

* The back-end workers must:

  * use queue functionality to watch for jobs submitted by the user
  * generate a plot of some aspect of the data set as guided by the user's job instructions
  * contain functionality to add the resulting image back into the Redis database so the user can download it

* The Redis database must support:

  * a 'raw data' database holding the Engineering data set in a logical format
  * a 'hot queue' for managing asynchronous tasks
  * a 'jobs' database for handling job instructions and statuses
  * a 'results' database for holding the final images / results generated by the workers
  * a scheme for backing up the database at regular intervals in a way that it can be restored
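
The interaction between the front end, the hot queue, and the workers can be sketched with standard-library stand-ins. ``queue.Queue`` plays the role of the hot queue, and plain dicts play the 'jobs' and 'results' databases. This is a minimal sketch under those assumptions. The function names, the job fields, and the ``make_plot`` placeholder are hypothetical. A real worker would loop indefinitely on a Redis-backed queue (e.g. HotQueue) and produce an actual image.

.. code-block:: python

   import json
   import queue
   import uuid

   # Stand-ins for the Redis databases (hypothetical layout): a FIFO queue
   # for the 'hot queue', and dicts for the 'jobs' and 'results' databases.
   job_queue = queue.Queue()
   jobs_db = {}
   results_db = {}

   def submit_job(params):
       """Front end: record the job as 'submitted' and enqueue only its id."""
       job_id = str(uuid.uuid4())
       jobs_db[job_id] = json.dumps({"id": job_id, "status": "submitted", **params})
       job_queue.put(job_id)
       return job_id

   def update_status(job_id, status):
       """Rewrite the stored job record with a new status string."""
       job = json.loads(jobs_db[job_id])
       job["status"] = status
       jobs_db[job_id] = json.dumps(job)

   def make_plot(params):
       """Placeholder for real plotting code; returns stand-in bytes."""
       return f"plot of {params['field']}".encode()

   def work_one():
       """Worker: take the next job id, run it, and store the result bytes."""
       job_id = job_queue.get()
       update_status(job_id, "in progress")
       params = json.loads(jobs_db[job_id])
       results_db[job_id] = make_plot(params)
       update_status(job_id, "complete")
       job_queue.task_done()
       return job_id

This sketch puts only the job id on the queue and keeps the full job record in the 'jobs' database. That design lets the API report a job's status at any time without touching the queue.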
The project must also include a well-written README following all the guidelines
given in previous class assignments. This README should emphasize two sections:
instructions for deploying and testing the application both on local hardware (e.g. Jetstream) and
on a Kubernetes cluster, and instructions for using the application both on local hardware
(e.g. Jetstream) and at a public endpoint.
The project is also expected to include other files: Kubernetes configuration files for
both test and production environments, Dockerfile(s), a ``docker-compose.yml`` file,
a ``requirements.txt`` file, test scripts compatible with pytest, and a software diagram
illustrating some part of the project (see 'What to Turn In' below).
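
Test scripts compatible with pytest can be as simple as functions whose names start with ``test_`` and that use plain ``assert`` statements. The sketch below tests a hypothetical ``validate_job_params`` helper that checks a user's job request before it is queued. The helper and its field names are illustrative assumptions, not a required interface.

.. code-block:: python

   def validate_job_params(params, allowed_fields):
       """Hypothetical helper: check a user's job request before queueing it."""
       field = params.get("field")
       if field not in allowed_fields:
           return False, f"unknown field: {field}"
       start, end = params.get("start"), params.get("end")
       if start is not None and end is not None and start > end:
           return False, "start must not be after end"
       return True, "ok"

   # Hypothetical set of plottable fields for this sketch.
   ALLOWED = {"thrust_kN", "temperature_C"}

   # pytest discovers functions named test_* and reports any failing assert.
   def test_accepts_known_field():
       assert validate_job_params({"field": "thrust_kN"}, ALLOWED) == (True, "ok")

   def test_rejects_unknown_field():
       ok, message = validate_job_params({"field": "color"}, ALLOWED)
       assert not ok
       assert "unknown field" in message

   def test_rejects_reversed_range():
       params = {"field": "thrust_kN", "start": 2020, "end": 2010}
       assert validate_job_params(params, ALLOWED)[0] is False

Running ``pytest`` from the repository root discovers files named ``test_*.py``, such as those in the ``test`` folder shown below. It then runs every ``test_`` function in them.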
PART 3: Write Up
~~~~~~~~~~~~~~~~
We are looking for a written document (roughly 10-11 pages as a PDF) describing the project.
The document should be thorough and targeted toward a technically savvy layperson who is
not a user of the application (e.g. one of your fellow engineering students who is not taking this
class). Here are some things we will be looking for:
* Title page containing a descriptive title and the students' names
* Write-up contains a logical progression of sections with appropriate headers
* High-level introduction to the project that describes the motivation
* Detailed but concise description of the data
* Key technologies (e.g. Flask, Docker, Kubernetes) are defined at a high level for people who might not know what they are
* List of route endpoints is easy to read and gives a nice overall picture of the API
* Usage section shows representative example code snippets; these need not be exhaustive, just enough to show how the API is used
* Ethical and Professional Responsibilities section is well thought out
* Section connecting parts of this project to key software design principles (see Unit 04)
* Citations page at the end
PART 4: Video Demo
~~~~~~~~~~~~~~~~~~
Prepare a video demo of the application that is less than 10 minutes long. Use Zoom to
screen share and record your narration of the process. At a minimum, we want to see you
describe and show the resources deployed to Kubernetes, curl various routes to display
select data, describe the data you are showing and why it is important, curl the
appropriate routes to submit an analysis job and then retrieve and display the results,
and highlight anything else you think is interesting or unique about your application.
What to Turn In
---------------
This final project should be pushed to a standalone repository with a descriptive
name. It should not be part of your existing homework repository. After you complete
the final project, your Git repository may contain something similar to the following
(your filenames may vary):
.. code-block:: text

   repo-name/
   ├── data
   │   └── .gitcanary
   ├── diagram.png
   ├── docker-compose.yml
   ├── Dockerfile
   ├── kubernetes
   │   ├── prod
   │   │   ├── app-prod-deployment-flask.yml
   │   │   ├── app-prod-deployment-redis.yml
   │   │   ├── app-prod-deployment-worker.yml
   │   │   ├── app-prod-ingress-flask.yml
   │   │   ├── app-prod-pvc-redis.yml
   │   │   ├── app-prod-service-flask.yml
   │   │   ├── app-prod-service-nodeport-flask.yml
   │   │   └── app-prod-service-redis.yml
   │   └── test
   │       ├── app-test-deployment-flask.yml
   │       ├── app-test-deployment-redis.yml
   │       ├── app-test-deployment-worker.yml
   │       ├── app-test-ingress-flask.yml
   │       ├── app-test-pvc-redis.yml
   │       ├── app-test-service-flask.yml
   │       ├── app-test-service-nodeport-flask.yml
   │       └── app-test-service-redis.yml
   ├── Makefile
   ├── README.md
   ├── requirements.txt
   ├── src
   │   ├── flask_api.py
   │   ├── jobs.py
   │   └── worker.py
   └── test
       ├── test_flask_api.py
       ├── test_jobs.py
       └── test_worker.py

Send an email to wallen@tacc.utexas.edu with the PDF write-up
attached, a link to your new GitHub repository, and a link to download the
Zoom recording. Please include "Final Project" in the subject line. We will clone
all of your repositories at the due date / time for evaluation. Only one email
per group is required.
Additional Resources
--------------------
Here are some example sites where you can find suitable data sets. This is not
an exhaustive list:
* `Registry of Research Data Repositories `_
* `Kaggle `_
* `Data.gov `_
* `NASA Earth Data `_

Please find us in the class Slack channel if you have any questions!