NeuroData Design is a year-long design course focused on building machine learning and statistical tools to answer neuroscientific questions. The course consists of some short lectures and weekly progress reports from each design team. Students work closely with neuroscientists, software developers, and data scientists to extend and/or develop algorithms, software, and scientific findings.
Students work in small teams of four to six people. During the first semester, teams work with instructors and TAs to scope a project that can reasonably be completed during the year. Teams are typically tasked with merging code into an open source scientific software repository. Emphasis is placed on the principles of open source software development and agile development practices. During the second semester, teams apply their tools to real data and submit manuscript drafts of their work for future publication.
2018-2019 NeuroData Design Projects
Team members: Jaewon Chung, Benjamin D. Pedigo, Ronan Perry, Bijan Varjavand
Description: A graph, or network, provides a mathematically intuitive representation of data in which items are related to one another. For example, a social network can be represented as a graph by treating each participant as a node, with an edge between two nodes indicating that the corresponding pair of individuals are friends. Naively applying traditional statistical techniques to a graph neglects the arrangement of nodes within the network and therefore does not use all of the information present in the graph. GraSPy is a package that provides utilities and specialized graph statistical algorithms for processing and analyzing graphs.
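The social network example can be made concrete with a minimal sketch that stores friendships as a symmetric adjacency matrix. This uses only NumPy and is not GraSPy's API; the names `people`, `friendships`, and `adjacency_matrix` are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical toy social network: nodes are people, edges are friendships.
people = ["Ana", "Ben", "Caro", "Dev"]
friendships = [("Ana", "Ben"), ("Ana", "Caro"), ("Caro", "Dev")]


def adjacency_matrix(nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    index = {name: i for i, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)), dtype=int)
    for u, v in edges:
        # Friendship is mutual, so set both (u, v) and (v, u).
        A[index[u], index[v]] = 1
        A[index[v], index[u]] = 1
    return A


A = adjacency_matrix(people, friendships)

# Row sums give each person's number of friends (the node degree).
degrees = A.sum(axis=1)
```

The matrix keeps the full pattern of who is connected to whom, which is the information that is lost when a graph is summarized by per-node quantities such as `degrees` alone.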
Team members: Sambit Panda, Satish Palaniappan, Ronak Mehta, Ananya Sandhya, Junhao Xiong
Description: As the amount of data grows in many fields, methods that consistently and efficiently decipher relationships within high-dimensional datasets become important. Because many modern datasets are high-dimensional, univariate independence tests are not applicable. While many multivariate independence tests have R packages available, their interfaces are inconsistent, and most are not available in Python. mgcpy is an extensive Python library that includes many state-of-the-art high-dimensional independence testing procedures behind a common interface. The package is easy to use and flexible enough to enable future extensions.
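To illustrate what a common interface for multivariate independence tests might look like, here is a minimal NumPy sketch. It is not mgcpy's API: the class names `IndependenceTest` and `DistanceCovariance`, the method names, and the parameters are all assumptions made for illustration. The statistic is the standard sample distance covariance, and the p-value comes from a generic permutation test.

```python
import numpy as np


class IndependenceTest:
    """Hypothetical base class: subclasses only implement statistic()."""

    def statistic(self, x, y):
        raise NotImplementedError

    def p_value(self, x, y, n_permutations=200, seed=0):
        """Permutation p-value: shuffle rows of y to break any dependence on x."""
        rng = np.random.default_rng(seed)
        observed = self.statistic(x, y)
        exceed = 0
        for _ in range(n_permutations):
            perm = rng.permutation(len(y))
            if self.statistic(x, y[perm]) >= observed:
                exceed += 1
        # Add one to numerator and denominator so the p-value is never zero.
        return (exceed + 1) / (n_permutations + 1)


def _centered_distances(z):
    """Pairwise Euclidean distance matrix, double-centered."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


class DistanceCovariance(IndependenceTest):
    """Sample (squared) distance covariance between multivariate x and y."""

    def statistic(self, x, y):
        return float((_centered_distances(x) * _centered_distances(y)).mean())


# Usage on synthetic data: y depends nonlinearly on x.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
y = x ** 2 + 0.1 * rng.normal(size=(50, 3))

test = DistanceCovariance()
stat = test.statistic(x, y)
p = test.p_value(x, y)
```

Under this design, adding another test means writing one new subclass with a `statistic` method; the permutation machinery and calling convention stay the same for every test.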
Team members: Patrick Myers, Allison Lemmer, Paige Frank, Ganesh Arvapalli, Karun Kannan, Ananyas Swaminathan
Description: Brain atlases are key for localizing neurobiological regions of interest. Using volumetric coordinate spaces, functional relationships can be assessed in conjunction with anatomical data. Over the past fifty years, many human brain atlases have been developed using a variety of methods. However, these atlases are stored in different formats, orientations, and coordinate spaces, making comparisons across atlases and studies difficult. We consolidate the popular human brain atlases into a single location and transform them all into the same standard format. This format serves as the basis for a specification that we introduce for storing future atlases. To demonstrate the utility of collecting these atlases under a common specification, we conduct an experiment using the Healthy Brain Network data, quantifying the dependence between each parcel in each atlas and various phenotypic variables.