2008 Class Project

Project Proposal

A 1-2 page project proposal is due Fri Nov 7th. In this proposal, you should briefly describe your general problem area and outline your plans for your project in some detail. You should demonstrate, by citing appropriate sources, that you have done the necessary research to develop a clear idea of what has been done previously in your problem area and what some of the important remaining challenges are. Show clearly how ideas and methods covered in BTRY4790/6790 can be applied. Comment in particular on the types of models and algorithms you plan to use, how you will implement these methods, and what data sets you will analyze. If your project involves research problems or data sets you have examined in other settings (such as in graduate or undergraduate research projects or other classes), then provide some brief background on these problems or data sets, and explain clearly what contributions from your BTRY4790/6790 project will be new. The proposal does not need to be exceptionally detailed, and it is understood that your plans may change as you learn more about your problem, but you should show that you have thought carefully about your topic, and have a good high-level plan for how to proceed.

General Guidelines

Broadly speaking, a good project should be:

  • challenging. The project is an opportunity to do something larger in scale, more difficult, and more open-ended than what you have done for the homeworks. You should choose something sufficiently ambitious to give you a taste of what it is like to apply probabilistic graphical models to real research questions. Recall that the project is worth 30% of your grade (equivalent to 2 1/2 homeworks); the amount of work you do for it should be commensurate.
  • focused. While the project should be challenging, you have to be realistic about how much can be done in 5-6 weeks. Focus on a small enough slice of your area of interest that you have time to do a good job with all components of the project, from model design through the project writeup. If things go well, you can always expand the project later.
  • relevant. It's good to find synergy with your other work, but the project has to be closely related to the material covered in this class.
  • concrete. Your project should involve modeling, implementation, and analysis of real and/or synthetic data. Purely theoretical projects are discouraged: they are difficult to complete in such a short time frame and hard to evaluate if they do not pan out.
  • interesting and fun. Perhaps the single most important thing is to choose a topic that you find deeply interesting and that you will enjoy investigating further. Much of the rest will come naturally if you're excited about your topic.

Recall that the expectations for BTRY 4790 and BTRY 6790 are somewhat different. A 6790 project should involve some novel research. It may not be possible in the time available to make a huge leap forward, but you should try something new. A 4790 project should also involve some implementation and data analysis, but novelty is not required -- it is okay to reimplement a published method, provided it is sufficiently challenging.

Project Report

Guidelines

As discussed in lecture, the project report should be roughly 6-10 pages long (not including appendices), but the quality of the report is more important than its length. It should be written like a scientific paper, with an introduction (summarizing the problem area and background), a methods section (describing your model(s), your approach(es) to inference, etc.), a results section (showing the performance of your methods, possibly in comparison to competing methods, and anything interesting your inferences say about your data set), and a discussion (emphasizing interpretation). The report should include well-designed figures that succinctly illustrate key findings. Please do not include large amounts of raw data or thousands of lines of program source code. Key routines from your programs, along with any raw data of particular interest, should be included as an appendix. As with any good scientific paper, the report should not simply be descriptive, but should include some nontrivial interpretation and analysis.

Due Date

All project reports are due by 5pm Monday, December 15. Please turn them in to me at my office in hard copy. A penalty of 10 points will be applied to projects submitted by email. Because grades are due soon after the deadline, there will be no extensions for the project.

Possible Approaches

Many of you are doing research in AI, statistics, or computational biology, and should have no trouble finding research problems and data sets to which graphical models can fruitfully be applied. However, a few (particularly undergraduate and master's) students have approached me with concerns about how to get started. Below are some possible paths available to these students:

  • Grab a topic we have touched on in the class but not explored deeply and run with it. Good possibilities include HMMs, probabilistic PCA, Kalman filters, and conditional random fields. These are all flexible and powerful classes of methods with many possible applications.
  • Assemble a new model from individual pieces we have discussed separately, such as finite mixture models, linear and logistic regression models, Markov chains, and PCA. For example, build a hidden Markov model in which state transitions are conditioned on covariates via logistic regression, develop an EM algorithm for inference, and apply your methods to an appropriate data set. Or do something similar with a mixture of probabilistic PCA models.
  • Try to address a problem of current interest to the public, using data found online. Can you do a better job than the pollsters of predicting the presidential election results by state, shed light on climate change, predict which way the Dow Jones average will go, or beat the odds on next week's football games?
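
To make the second path above a little more concrete, here is a minimal Python sketch of one building block for such a model: the likelihood of an observation sequence under an HMM whose transition probabilities depend on covariates through a (multinomial) logistic regression. It is only an illustration under stated assumptions, not a recommended design. The function names, the discrete emission table, and all the numbers are hypothetical, and the sketch covers only the forward pass that an EM algorithm would build on, not parameter estimation itself.

```python
import math
from itertools import product


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(weights, x):
    """Row i holds P(next state = j | current state = i, covariates x).

    weights[i][j] is a (hypothetical) coefficient vector for the move i -> j.
    """
    return [
        softmax([sum(w * xi for w, xi in zip(w_ij, x)) for w_ij in row])
        for row in weights
    ]


def forward_loglik(init, weights, emit, covariates, obs):
    """Log-likelihood of obs via the scaled forward pass.

    covariates[t] governs the transition into time t; covariates[0] is unused.
    """
    n = len(init)
    alpha = [init[k] * emit[k][obs[0]] for k in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for t in range(1, len(obs)):
        A = transition_matrix(weights, covariates[t])
        alpha = [
            emit[j][obs[t]] * sum(alpha[i] * A[i][j] for i in range(n))
            for j in range(n)
        ]
        scale = sum(alpha)  # rescale each step to avoid underflow
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def brute_force_loglik(init, weights, emit, covariates, obs):
    """Same quantity by summing over every state path (tiny inputs only)."""
    n = len(init)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            A = transition_matrix(weights, covariates[t])
            p *= A[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return math.log(total)


# Toy two-state example; every number below is made up for illustration.
init = [0.6, 0.4]
emit = [[0.9, 0.1], [0.2, 0.8]]  # emit[state][symbol]
weights = [
    [[0.0, 0.0], [-1.0, 2.0]],  # from state 0: (intercept, slope) per target
    [[0.0, 0.0], [0.5, -1.5]],  # from state 1
]
covariates = [[1.0, 0.0], [1.0, 0.3], [1.0, -0.7], [1.0, 1.2]]
obs = [0, 1, 1, 0]

print(forward_loglik(init, weights, emit, covariates, obs))
print(brute_force_loglik(init, weights, emit, covariates, obs))
```

The two printed values should agree, which is a useful sanity check before moving on to the backward pass and the M-step. With two states and the coefficients for the first target state fixed at zero, as in the toy example, the softmax reduces to ordinary logistic regression.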

Example Project Reports

This is the first year BTRY 4790/6790 has been taught, so there are no example projects available, but BTRY484/684 (Computational Genomics) has a very similar project requirement, and the following project reports from that course may be useful as examples. They obviously focus on somewhat different subject matter, but they illustrate the expected scope, quality, and style for project reports. These projects all received A or A- grades.

  • "Predicting the loss of a microRNA motif in one of twelve species of Drosophila," Nandita Garud (undergrad, Biology/Biometry). This is a good example of a class project that nicely complemented an undergraduate thesis project. ( PDF )
  • "RNA motif finding: A fully Bayesian approach," Benjamin Logsdon (grad, Computational Biology). This is a first-rate grad project that could grow into a publication. ( PDF )
  • "Prediction of RNA secondary structures by modified Nussinov algorithm," Jalal Siddiqui (undergrad, Chemical Engineering). This is an example of a solid undergrad project by a student who didn't have much previous background in computational biology. Jalal started with a fairly straightforward algorithm discussed in the course textbook but extended it in an interesting way and applied it to a real biological data set. ( PDF )
  • "Motif finding," Chun-Nam Yu (grad, Computer Science). This is a good example of a solid "reimplementation" project. Chun-Nam built a motif finder similar in many ways to MEME and used it to reanalyze a published data set. As a grad project, this one would be stronger if it involved some novel extensions of previous work, but it is quite good on the implementation and data analysis side. ( PDF )

Example Reports from 2008

(Updated January, 2009) Below are several of the best project reports from 2008.

  • "Automatic music composition: an approach with graphical models," Hyung-Chan An (grad student, CS). Very unusual, creative project. Came with a CD! ( PDF )
  • "Discovery of consensus sequence of motifs from multiple sequences through EM," David Kupiec (undergrad, CS). Good reimplementation project for 4782. ( PDF )
  • "Identifying obstacles in LIDAR data with CRFs," Andrew Owens (grad student, CS). High-quality project connected with the author's work on the DARPA Urban Challenge. ( PDF )
  • "Extensions to Markov modeling framework for bird migration," Dan Sheldon (grad student, CS). Nice extension of previous research by the author incorporating ideas from the class. ( PDF )