Open Topics
Text Mining:
Machine Learning:
- Interpreting Deep Clustering Results
- Visually exploring neural networks
- Visually Analyzing the Fault Tolerance of Deep Neural Networks
- Library for visualization of slices
- Visualization for optimization problems
- Semantic understanding of Charts
- Clustering in high noise astronomical data
- Exploratory data analysis in Gaia
- Anomaly Detection in High Energy Physics with Variational Autoencoders (in cooperation with CERN)
Data Visualization/Story Telling:
- Understanding climate change data
- Understanding COVID-19 data
- Communicating complex vaccination data
HCI - Human Computer Interaction:
HDI - Human Data Interaction:
- How do people understand charts?
- Explaining descriptive statistics
- Data documentation
- Data descriptions
- Common data or spreadsheet fears
- How do researchers discover and use data?
Data Science:
Image Processing:
- Neuron signature visualization
- Visualization-Supported Comparison of Image Segmentation Metrics
- Usability Evaluation of Open Source Volume Analysis Software
- Improving ground truth for CNNs through Uncertainty and a Human-in-the-loop
- Ensemble methods in image segmentation
- Smart Image Filter Preview
Theory of Vis:
Text Mining:
Explore novel interaction techniques - Chatbots for Data Analysis
Chatbots are becoming more prevalent and are actively used by many companies. They offer a voice or text interface for interacting with a computer. An example of a chatbot is Amazon’s Alexa, which can tell the time when asked.
The goal of this project is to find new possible ways to interact with an exploratory data analysis tool. Developing new interaction techniques would allow the user to explore and understand the data in a new fashion. For example, it could be possible to have a chat window next to a scatterplot that enables the user to enter queries such as: ‘show me the average’, which would then be reflected in the scatterplot.
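The chat-window idea can be illustrated with a minimal sketch that maps a typed query to an overlay for a scatterplot. The keyword table, the function name `interpret`, and the returned dictionary are hypothetical choices made for illustration; a real project would likely rely on proper natural language processing rather than keyword matching.

```python
import re
from statistics import mean, median

# Hypothetical mapping from query keywords to aggregate functions.
# The keyword set is an illustrative assumption, not part of the project.
AGGREGATES = {
    "average": mean,
    "mean": mean,
    "median": median,
    "maximum": max,
    "minimum": min,
}


def interpret(query, points):
    """Translate a chat query into an overlay for a scatterplot.

    `points` is a list of (x, y) tuples. Returns a dict describing what
    the plot should draw, or None if the query is not understood.
    """
    words = re.findall(r"[a-z]+", query.lower())
    for word in words:
        if word in AGGREGATES:
            func = AGGREGATES[word]
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            return {"overlay": word, "x": func(xs), "y": func(ys)}
    return None


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]
print(interpret("show me the average", points))
```

The returned dictionary is what a plotting component would consume, for example by drawing a marker at the reported coordinates. Queries without a known keyword return None, which the chat window could answer with a clarifying question.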
- Learn about natural language processing
- Understand and compare interaction techniques
- Develop a ‘conversation’ with a data analysis tool
Prerequisites: VIS, HCI
Contact: Torsten Möller
Access to Justice
The challenge of "Access to Justice (A2J)" is to make laws and court decisions accessible to lawyers and laymen. A lot of legal information is available online. However, this data is often spread across different, hard-to-use databases that are mostly aimed at experts.
While you are unlikely to solve this issue within a semester project, an interesting first step is to better understand the needs of different users (within the user group of lawyers as well as the user group of laymen) and to translate them into mockups of a possible website for retrieving the appropriate information.
Your results can build upon efforts by legal scholars (in particular Paul Eberstaller and his work at https://risplus.at/) with whom we are collaborating.
Milestones:
- conduct interviews with several potential users of said information retrieval interfaces
- conduct a literature search for interfaces trying to achieve similar tasks
- summarize your findings through a requirement analysis, including context analysis, task analysis, and personas
- build low-fidelity prototypes and gather feedback on them
- build high-fidelity prototypes and gather feedback on them
- improve your high-fidelity prototype based on this feedback
Some helpful references:
- www.canlii.org/en/
- en.wikipedia.org/wiki/LexisNexis
- eur-lex.europa.eu
- www.ris.bka.gv.at
- risplus.at
- www.gesetze-im-internet.de
- "The legal macroscope: Experimenting with visual legal analytics", Nicola Lettieri, Antonio Altamura, Delfina Malandrino
- "Effective User Interaction for High-Recall Retrieval: Less is More", Zhang, Abualsaud, Ghelani, Smucker, Cormack, Grossman
Contact: Torsten Möller
In collaboration with: Paul Eberstaller, Nikolaus Forgo (for advice on users and use case); Evangelos Milios (for advice on information retrieval and text analysis)
Prerequisites: HCI (NLP would be nice, but is not required)
Machine Learning:
Interpreting Deep Clustering Results
Deep embedded clustering, also called deep clustering, is a growing field that combines ideas from clustering and deep learning. The integration of these techniques makes it possible to learn features automatically from the data to increase clustering performance. Current deep clustering methods are hard to interpret, making it difficult to understand how a clustering result was reached.
The goal of this project is to develop an interactive visualization tool, e.g. a web-based application, for exploring the predictions of deep clustering algorithms and helping to understand their decision-making process.
The student is expected to do a literature review of existing visualization techniques developed for (supervised) deep learning, e.g. feature visualizations, that could be applicable to interpreting unsupervised deep clustering algorithms. The identified methods should then be applied to existing deep clustering algorithms (adapted if necessary) and compared.
Some research questions of interest that should be considered during the project would be: How suitable are existing visualization techniques to interpret deep clustering results? How do the different parts of the multi-objective loss of deep clustering techniques relate to each other? Considering multiple clustering models, e.g. K-Means vs DBSCAN, how do the neural network visualizations differ for each of them?
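To make the question about the multi-objective loss more concrete, here is a minimal sketch that assumes a loss of the form "reconstruction error plus a weighted k-means-style clustering term". This particular form, the function names, and the default weight are assumptions for illustration; actual deep clustering methods differ in how they define and balance the terms.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def deep_clustering_loss(inputs, reconstructions, embeddings, centroids, weight=0.1):
    """Return (total, reconstruction_term, clustering_term).

    reconstruction_term: mean squared distance between inputs and their
        autoencoder reconstructions.
    clustering_term: mean squared distance between each embedding and its
        nearest centroid (a k-means-style objective, assumed here).
    """
    n = len(inputs)
    reconstruction = sum(
        squared_distance(x, r) for x, r in zip(inputs, reconstructions)
    ) / n
    clustering = sum(
        min(squared_distance(z, c) for c in centroids) for z in embeddings
    ) / n
    return reconstruction + weight * clustering, reconstruction, clustering


inputs = [[1.0, 0.0], [0.0, 1.0]]
reconstructions = [[0.9, 0.1], [0.1, 0.8]]
embeddings = [[0.0], [1.0]]
centroids = [[0.0], [0.8]]
total, rec, clu = deep_clustering_loss(inputs, reconstructions, embeddings, centroids)
print(total, rec, clu)
```

Returning the individual terms alongside the total is a deliberate choice: plotting them separately over the course of training is one simple way to see how the objectives trade off against each other.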
Students working on this project need basic background knowledge in machine learning (e.g. Foundations of Data Analysis) and visualisation (e.g. Visualisation and Visual Data Analysis), solid programming skills in Python, and ideally some experience with PyTorch, deep learning, and a visualization framework such as D3.
Prerequisites: VIS, FDA
Contact: Aleksandar Doknic, Torsten Möller, in collaboration with Claudia Plant & Lukas Miklautz
Visually exploring neural networks
We have a collection of 100,000 different neural networks from the TensorFlow Playground. The core goal of this project is to create a visual interface for understanding some of the basic properties of neural networks. Letting a user explore this collection should help answer questions about, for example, the relationship between the number of neurons and the number of hidden layers, and the impact of batch size, activation functions, and other parameters on the quality of the network. Your tasks include:
- fast prototyping with Tableau
- getting familiar with the data set
- querying neural network users on what parameters they want to explore (requirement analysis)
- development of low-fi and high-fi prototypes
Prerequisites: VIS, FDA
Contact: Torsten Möller
Visually Analyzing the Fault Tolerance of Deep Neural Networks
The main objective is to design and implement a good and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips) based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage which implies lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.
The starting point for the project is a given example data set which contains information about the relationship between single bit flips across various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.
The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.
Examples for interesting questions are the following:
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer in the network architecture
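Why the position within the floating-point representation matters can be seen from a small sketch that flips a single bit of a 32-bit IEEE 754 value. The function name `flip_bit` and the example weight are illustrative assumptions; the actual data set and network are provided separately.

```python
import struct


def flip_bit(value, bit):
    """Return `value` (stored as float32) with one bit flipped.

    Bit 0 is the least significant mantissa bit, bits 23-30 hold the
    exponent, and bit 31 is the sign (IEEE 754 single precision).
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the requested bit and reinterpret the result as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5
for bit in (0, 22, 30, 31):
    print(bit, flip_bit(weight, bit))
```

For a weight of 0.5, flipping the sign bit (31) yields -0.5, flipping the highest mantissa bit (22) yields 0.75, and flipping the highest exponent bit (30) yields 2^127. A flip in the low mantissa bits barely changes the value, which is why the distribution over bit positions is an interesting thing to visualize.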
To answer these questions, an iterative design process is required, consisting of:
- a requirement analysis (task & data analysis)
- low-fi prototypes
- high-fi prototypes
- refinement
- constant evaluation of the visual analysis tool.
The data set, the problem setting and the details of the requirements are provided by Prof. Gansterer; the supervision in visual analysis aspects is provided by Prof. Möller.
Prerequisites: VIS, FDA
Contact: W. Gansterer | Torsten Möller
Library for visualization of slices
This is primarily a programming project. Slicing methods are a novel way of visualizing multi-dimensional data. However, there is no publicly available library for R or Python that makes it easy to use these visualization techniques. The goal of this project is to develop such a library. Students should have knowledge of JavaScript and either R or Python.
Prerequisites: VIS
Programming languages: JavaScript and (R or Python)
Contact: Torsten Möller | Thomas Torsney-Weir
Visualization for optimization problems
Can visualization beat traditional (offline) optimization algorithms? The goal of this project is to see how well visually guided optimization can compete with traditional optimization algorithms. Students will develop a visualization system to find optimum configurations of black-box (i.e. unknown) algorithms from a contest.
Prerequisites: VIS, Mathematical Modeling
Programming languages: JavaScript, (R or Python), and C++
Contact: Torsten Möller | Thomas Torsney-Weir
Semantic understanding of Charts
Research areas: Machine Learning, HCI - Human Data Interaction
This project aims to automatically understand charts (= data visualizations) and translate their meaning into natural language text. This will be done using deep learning. Neural nets draw bounding boxes around objects and label these objects. After detecting all possible objects another network creates a sentence describing the scene in the image by using the object labels. Examples of deep neural networks for image descriptions and chart extraction can be found here:
A resulting sentence would sound something like this: ‘black and white dog jumps over bar.’
The goal of this project is to use such an approach and apply it to different types of charts. We would have a neural network detecting the objects in a plot and another describing the objects by creating a sentence.
- Learn about state-of-the-art machine learning
- Use machine learning libraries such as TensorFlow
- Find/Aggregate datasets
Prerequisites: FDA, possibly VIS
Contact: Torsten Möller, Laura Koesten
Clustering in high noise astronomical data
Area: Machine Learning
With the second Gaia data release, astronomers have been flooded with data. One interesting research question that Gaia helps to answer concerns open clusters (OCs). OCs are groups of stars born in the same place and at the same time, which are regarded as the building blocks of galaxies. Gaia provides precise positions and velocities of 1.6 billion stars, which are the features used to extract these open clusters. However, OCs constitute only a small fraction of the full data set and are embedded in a sea of field stars, which we consider noise. Hence, the results of density-based clustering techniques depend strongly on small changes in the algorithm's hyperparameters. The goal of this project is to survey current techniques which can help to better extract these OCs from the Gaia catalog and potentially design new algorithms which better suit this task.
- Unsupervised learning
- Density-based clustering
- Big data
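The sensitivity of density-based clustering to its hyperparameters can be illustrated with a minimal pure-Python DBSCAN on a toy data set. DBSCAN is used here only as a well-known example of the technique; the toy points (two tight groups plus two "field stars") and the parameter values are assumptions and have nothing to do with the actual Gaia data.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 means noise)."""

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def count_clusters(labels):
    return len(set(labels) - {-1})


# Two tight groups and two isolated points in between (toy data).
stars = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
    (0.5, 0.5), (0.6, 0.6),
]
for eps in (0.05, 0.15, 0.6):
    labels = dbscan(stars, eps, min_pts=3)
    print(eps, count_clusters(labels), labels.count(-1))
```

On this toy data, the smallest radius labels every point as noise, the middle one recovers the two groups, and the largest merges everything into a single cluster. The same effect, on a vastly larger and noisier scale, is what makes extracting open clusters difficult.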
Prerequisites: FDA
Contact: Sebastian Ratzenböck
Exploratory data analysis in Gaia
Area: VIS/Machine Learning
Astronomical discoveries depend on the quality of the data. Therefore, quality criteria are introduced to filter out bad data points. Within the Gaia data set there are multiple metrics which provide information about the quality of a single entry in the tabular data set (e.g. a star). The goal of this project is to analyze current quality criteria and their effects on the data features and, ideally, to find better-suited filters.
- Visualize the effects of different data filters
- Unsupervised learning
- Big data
Prerequisites: VIS, FDA
Contact: Sebastian Ratzenböck
Anomaly Detection in High Energy Physics with Variational Autoencoders (in cooperation with CERN)
At CERN's Large Hadron Collider (LHC), researchers are searching for new unobserved physics phenomena that could convey missing pieces of today's understanding of the universe. For more than 40 years, many theories for new particles have been put forward and the LHC's data was probed for their evidence. This led to many discoveries, most notably that of the Higgs particle. However, to this date, numerous questions about the nature of matter remain unanswered. Thus, CERN is exploring the use of Machine Learning (ML) in its quest to shed light on those rare unknown phenomena, called anomalies.
Inside the LHC, 1 billion proton-proton collisions are produced every second. The collisions result in new particles which are registered by detectors. Information from the detectors is sent through a data-processing pipeline, where it is denoised, filtered and interpreted. On that resulting data, ML techniques can be applied to search for patterns hinting at anomalous phenomena.
Variational Autoencoding: A promising approach to tackle this challenge is to use unsupervised ML techniques, where no prior theory about the anomaly is needed. One prominent unsupervised ML algorithm is the Variational Autoencoder, which learns to compress its input into a much lower-dimensional representation, called the latent space, from which it reconstructs the input. This compression-decompression flow can be used to flag anomalies whenever the reconstruction does not resemble the corresponding input.
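The flagging step can be sketched independently of any particular model. The example below assumes that reconstructions are already available from a trained autoencoder and uses mean squared error with a "mean plus three standard deviations" threshold; the error measure, the threshold rule, and all names are illustrative assumptions rather than the procedure used at CERN.

```python
from statistics import mean, stdev


def reconstruction_error(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def flag_anomalies(inputs, reconstructions, reference_errors, n_sigma=3.0):
    """Flag inputs whose error exceeds mean + n_sigma * std of reference errors.

    `reference_errors` are errors measured on events assumed to be normal.
    Returns (flags, threshold).
    """
    threshold = mean(reference_errors) + n_sigma * stdev(reference_errors)
    errors = [reconstruction_error(x, r) for x, r in zip(inputs, reconstructions)]
    return [e > threshold for e in errors], threshold


# Hypothetical errors observed on background events.
reference_errors = [0.010, 0.012, 0.009, 0.011, 0.008]
events = [[1.0, 0.0], [1.0, 0.0]]
# Placeholder reconstructions: the first is close to its input, the second is not.
reconstructions = [[0.95, 0.05], [0.5, 0.5]]
flags, threshold = flag_anomalies(events, reconstructions, reference_errors)
print(flags, threshold)
```

Only the second event is flagged here, because its reconstruction differs strongly from the input. How to choose the threshold is itself a design question that a visual analysis of the error distribution can help with.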
Project Description: The goal of this project is to explore a variational autoencoder applied to simulated LHC collision events. Concretely the four tasks at hand are:
- Train a Variational Autoencoder on a simulation of anomalous particles (e.g. the Randall-Sundrum Graviton)
- Explore functional space, especially through a visual analysis
- Hyperparameters (loss function weighting, dimensionality of latent space, learning rate, etc.)
- Architecture (number of layers, size of layers and of convolutional filters, pooling, etc.)
- Analyze latent space (Can we find an interpretation?)
Programming: Python, TensorFlow
Prerequisites: VIS, FDA
Contact: Torsten Möller
Data Visualization/Story Telling:
Understanding climate change data
Research area: Data Visualisation, HDI / Human Data Interaction
Description: Data visualisations, such as charts, are often used to communicate data about climate change, both in research and in popular news sources. This project investigates how people make sense of common data visualizations about climate change by conducting interview studies with doctoral researchers and students at the University of Vienna.
Tasks:
- Collect sample types of charts commonly used with respect to climate change (e.g. on social media)
- Design and conduct an interview study
- Qualitative data analysis
Prerequisites:
- FDA
- VIS
Contact: Laura Koesten
Identifying charts on climate change or COVID-19
Research area: Data Visualisation
Description: Data visualisations, such as charts, are often used to communicate data about climate change and the COVID-19 pandemic, both in research and in popular news sources. Which charts are commonly used to communicate data about these topics? This project tests and modifies an existing algorithm developed to extract charts from articles in journals and popular news sources in order to inform the creation of a larger dataset of charts which are used to communicate data about climate change and COVID-19.
Tasks:
- Create sample of papers from a journal or news source
- Test and modify the algorithm described in https://arxiv.org/pdf/2105.14931.pdf
Prerequisites:
- FDA
Programming languages: any
Contact: Torsten Möller, Laura Koesten, Kathleen Gregory
Understanding COVID-19 data
Research area: Data Visualisation, HCI / Human Data Interaction
Description: Data visualisations, such as charts, are used frequently to communicate data about COVID-19, both in research and in popular news sources. In this project we investigate the types of questions that are frequently asked during the COVID-19 pandemic and how charts are used to answer them. We will do this by collecting commonly asked questions and conducting a qualitative study about how people answer these questions for themselves using COVID data visualisations.
Tasks:
- Collect a sample dataset of COVID related questions (from online resources)
- Design a study aiming to investigate people’s sensemaking practices
Prerequisites:
- FDA
- possibly VIS
Contact: Laura Koesten (+ Kathleen Gregory)
Communicating complex vaccination data
Research area: Data Vis / Storytelling
Description:
The question is simple: How many people are protected against COVID-19? But the data gets complex rather quickly. Some people have been ill before, but some of them had COVID-19 too long ago and are no longer protected. Some had it without knowing about it. Some are vaccinated but have lost their protection (since they didn't have a booster shot), etc. Are you able to explain these complex connections to
- a vis expert
- an epidemiologist
- a doctor
- a politician
- your relatives?
Would you explain it the same way each time, or would you choose a different way? Why? The goal of this project is to iterate on approaches to communicating this data, find the most effective ones, and compare what works and what doesn't.
Prerequisites: HCI + Vis
Contact: Torsten Möller
HCI - Human Computer Interaction:
Interfaces and influences for fruit fly larvae brains
The goal of this project is to develop interfaces and visualizations to answer questions related to the connections and similarities between different anatomical structures of fruit fly larvae brains. For example, how do neurons in the left and right hemispheres compare? Or, how does a particular neuron change as the organism grows? To answer these questions we need effective interfaces and visualizations.
Prerequisites:
- Programming languages: Knowledge of or willingness to learn TypeScript and Go
Contact: Thomas Torsney-Weir, Torsten Möller
Sliders for decision making
Research area: Human Computer Interaction, Data Science, Interfaces
Description: Sliders on interfaces provide a range to select an input value. Sliders can restrict users to entering valid values by only offering a valid range, or they can be used to support multi-criteria decision making. In this project we aim to compare different types of sliders for decision making. This includes triangular, binary, and single sliders, as well as “scented widgets”, which are embedded visualizations to facilitate navigation in information spaces.
(See for instance https://dl.acm.org/doi/pdf/10.1145/3240167.3240185)
Tasks:
- Create interfaces using different slider types and develop simple alternatives to slider components
- Design an online user study (including task design, recruitment, usability evaluation)
- Analyse quantitative and qualitative data from the user study
Prerequisites:
- HCI, possibly FDA
- Programming languages: Python or R
Contact: Laura Koesten, Torsten Möller
HDI - Human Data Interaction:
How do people understand charts?
Research areas: HCI - Human Data Interaction, Data Visualization
Textual descriptions of charts are relevant for a variety of application and research areas.
In this project we will create a crowdsourcing study to collect a dataset of charts annotated with a description of their key messages as perceived by the readers of the charts. The data will consist of images (charts) and free text interpretations of the charts. We will analyse the resulting descriptions qualitatively and visualise the results in an interactive manner.
- Qualitative (content analysis) and quantitative analysis of text and image data
- Apply NLP techniques to cluster and analyse free text data
Prerequisites: FDA, VIS
Contact: Laura Koesten
Explaining descriptive statistics
With Anscombe's Quartet [1], it was demonstrated quite vividly that summary statistics can be very misleading or, at least, hard to interpret. More recently, this example has been given a playful twist with the Datasaurus Dozen [2]. However, there are a number of statistical measures that don't have an easy (visual) explanation. One of them is Krippendorff's alpha [3], a very common measure in the social sciences for measuring the agreement between subjective coders (as in labeling text or documents). The challenge of this project will be to:
- understand the measure
- develop simple alternatives
- develop different visual representations that "bring this measure to life", i.e. make it easy(er) to understand
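The point made by Anscombe's Quartet can be checked with a few lines of code. The sketch below uses the first two of the four published datasets (which share the same x values) and computes the mean, the sample variance, and the Pearson correlation; the helper names are illustrative.

```python
from statistics import mean, variance

# Datasets I and II of Anscombe's Quartet (Anscombe, 1973); both share X.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Summary statistics rounded to two decimal places."""
    return {
        "mean_y": round(mean(ys), 2),
        "var_y": round(variance(ys), 2),
        "r": round(pearson(xs, ys), 2),
    }


print(summary(X, Y1))
print(summary(X, Y2))
```

Both summaries agree to two decimal places, even though a scatterplot shows a roughly linear trend for the first dataset and a clear curve for the second. A measure such as Krippendorff's alpha poses the same problem: the number alone says little about the structure of the underlying data.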
Prerequisites: VIS
Contact: Torsten Möller, Aleksandar Doknic
[1] Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899, see also en.wikipedia.org/wiki/Anscombe%27s_quartet
[2] Justin Matejka, George Fitzmaurice (2017), "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing," ACM SIGCHI Conference on Human Factors in Computing Systems. see also www.autodesk.com/research/publications/same-stats-different-graphs
[3] Krippendorff, Klaus (1970). Estimating the reliability, systematic error, and random error of interval data. Educational and Psychological Measurement, 30 (1), 61–70. see also en.wikipedia.org/wiki/Krippendorff%27s_alpha
Data documentation
Documenting data is as important as publishing it. There are many proposals that describe the content and format of data documentation, capturing the entire data science lifecycle, from collecting the data (for instance using sensors) to cleaning and analysing it. The aim of this project is twofold:
1. To apply these documentation proposals on known and less known datasets to understand how easy to use they are and how subjective documentation practices are.
2. To explore collaborative documentation practices to reduce inconsistencies in documentation. To do this we will investigate the differences when people use traditional metadata schemata versus a more creative setting, such as using Jamboard, to describe a dataset.
Tasks:
- Design, conduct and analyse a qualitative study
Prerequisite:
- FDA
- basic knowledge of qualitative research methods
Contact: Laura Koesten
Data descriptions
Research areas: Human data interaction, research data management
Description: Metadata, or standardized descriptions of data, are powerful surrogates for data. They impact how data are discovered, how data are understood, and how data are used. Metadata are most often created manually at data repositories, although there is great variation in how this is done. This project will use a large-scale survey (e.g. an online questionnaire) to understand the metadata generation processes at data repositories included in the re3data.org database.
Tasks:
- Create sample of data repositories to include
- Create questionnaire
- Recruit respondents
- Analysis of questionnaire responses
Prerequisite:
- FDA
- Programming languages: Python or R
Contact: Laura Koesten (+ Kathleen Gregory)
Common data or spreadsheet fears
Research area: Human Data Interaction
Description: We are increasingly exposed to data in different aspects of our lives, be that in an ever growing range of professions reliant on data analysis, or in our private lives exposing us to data about us, our activities or using data to inform our decisions. However, many people still do not feel comfortable engaging with a spreadsheet, nor do they have the skills to perform more complex types of data analysis. In this project we aim to conduct a qualitative study to better understand people’s preconceptions by observing them interacting with a spreadsheet and discussing their experiences.
Tasks:
- Design a mixed method study
- Recruit respondents
- Qualitative data analysis
Prerequisites:
- FDA
- possibly VIS and HCI
Contact: Laura Koesten
How do researchers discover and use data?
Research area: data search/discovery, human data interaction, research data management, information retrieval
Description:
Scientists and researchers are increasingly encouraged to use data which other people create. How do these researchers find, use and understand these data? This project performs quantitative analysis of an existing dataset, collected through a global survey of researchers, to examine these questions according to, e.g. academic disciplines, career ages or geographic location.
Tasks:
- Exploratory data analysis of publicly available survey dataset
- Identification of questions from survey data
- Descriptive statistics, possible inferential statistics, possible textual analysis of free-text responses
Prerequisites:
- FDA
- Programming languages: Python or R
Contact: Laura Koesten (+ Kathleen Gregory)
Data Science:
Understanding data conversations to understand data science communities
Research area: Data Science
The project will build a corpus of conversations around datasets and data science activities from forums of data communities such as Kaggle, data.world, or Reddit. The aim is to carry out content and community analysis, using qualitative or quantitative methods to understand how people talk about data and to learn what that means for data community platform design.
Tasks:
- Collecting available forum messages of two data platforms (e.g. Kaggle)
- Getting familiar with the data set
- Content and community analysis of the messages and their authors
Prerequisites:
- FDA, VIS
- Basic qualitative and quantitative data analysis
- Basic Python
Contact: Laura Koesten
Crowdsourcing dataset summaries
Research area: Data Science, Human Computation
Text is more accessible than metadata when describing what a dataset is about and how it should be used. In previous studies we used crowdsourcing to generate data summaries and understand what good summaries look like:
https://www.sciencedirect.com/science/article/pii/S1071581918306153
In this project, the aim is to improve on this method to iterate over the summaries written by the crowd and create a larger dataset of summaries. This can be a useful resource, for instance to train Machine Learning algorithms to create dataset summaries automatically.
Tasks:
- Learn crowdsourcing as a method
- Create a dataset of crowdsourced summaries
- Analyse text data
Prerequisites: FDA, HTML and basic JavaScript, basic Python, possibly familiarity with APIs
Contact: Laura Koesten
Image Processing:
Neuron signature visualization
Neuroscience is focused on understanding the brain both structurally and functionally. This is accomplished by imaging brains using a variety of different methods including EM, confocal, LM, etc. The challenge is to identify neurons across these images to understand how the brain develops throughout an organism's life. The aim of this master's thesis is to push the state of the art in searching for neurons through exploration and evaluation of different methods of exploring similarities between neurons.
Prerequisites: SIP
Contact: Thomas Torsney-Weir | Torsten Möller
Visualization-Supported Comparison of Image Segmentation Metrics
Area: Visualization, Image Processing
Segmentation algorithms, which assign labels to each element in a 2D/3D image, need to be evaluated regarding their performance on a given dataset. The quality of an algorithm is typically determined by comparing its result to a manually labelled image. Many metrics can be used to compute a single number representing the similarity of two such segmentation results, all with specific advantages and disadvantages. The goal in this project is to:
- Research the segmentation metrics in use in the literature.
- Create a tool that calculates multiple segmentation quality metrics on an image.
- With the help of this tool, analyze how the single segmentation metrics perform in detecting specific kinds of errors in the segmentation results, as well as correlations between the metrics.
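As a starting point for the tool, two widely used overlap metrics, the Dice coefficient and the Jaccard index, can be computed as follows. The sketch assumes binary masks given as flat sequences of 0 and 1; the choice of these two metrics and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice(pred, truth):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|) for binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2 * overlap / total


def jaccard(pred, truth):
    """Jaccard index: |A ∩ B| / |A ∪ B| for binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return 1.0 if union == 0 else overlap / union


truth = [0, 1, 1, 1, 0, 0]  # manually labelled reference
pred = [0, 0, 1, 1, 1, 0]   # algorithm result, shifted by one pixel
print(dice(pred, truth), jaccard(pred, truth))
```

For the same pair of masks the two metrics give different numbers (here two thirds versus one half), which already hints at why comparing metrics side by side is worthwhile.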
Prerequisites: SIP, VIS
Contact: Bernhard Fröhler | Torsten Möller
Usability Evaluation of Open Source Volume Analysis Software
open_iA enables users to perform general and specialized visual analysis and processing of volumetric datasets (such as those from a computed tomography device). Since it has been developed mainly as a basis for research prototypes, its user interface has so far not been developed with usability as the primary concern.
The goals of this project are:
- To evaluate the usability of its general capabilities, and optionally of its advanced visual analysis tools. This could for example happen through usability interviews, or user studies comparing it to other (open source and commercially available) solutions.
- To find innovative ways of overcoming the problems found in the evaluation.
- Depending on time and interest, to implement some or all of these improvements.
Prerequisites: finished the Signal and Image Processing & the Human Computer Interaction class
Contact: Bernhard Fröhler
Improving ground truth for CNNs through Uncertainty and a Human-in-the-loop
Convolutional neural networks (CNNs) have become one of the most used tools for image segmentation in any application domain. CNNs, however, require a lot of training data. Especially in volumetric datasets (i.e. 3D images such as from a Computed Tomography device), where the size of a single dataset is typically no less than 1000³ = 1 billion voxels, it is already infeasible to fully manually segment just one of these. Other segmentation algorithms (or CNNs trained for slightly different application domains or input data) can be used to help in creating the ground truth, but their results need to be checked. The goal of this project is to
- Design and implement a hybrid segmentation method, based on
- A CNN performing segmentation of an input volume, and
- Gathering user input for iterative refinement
- For gathering the user input, a tool should be implemented which visualizes the results of the CNN and allows a user to provide input for future trainings of the CNN
- The tool should ideally provide guidance on regions to check via uncertainty metrics from the CNN
- Evaluate the implemented method in comparison to state-of-the-art methods in e.g. material science (datasets and reference algorithms will be provided)
Prerequisites: finished the Signal and Image Processing & the Machine Learning class; having attended the Visualization and/or Human Computer Interaction class beforehand would also be beneficial
Contact: Bernhard Fröhler
Ensemble methods in image segmentation
An image segmentation algorithm assigns a label to each pixel. Since no segmentation algorithm is always correct, the idea is to work with many different segmentation algorithms that each create a label for a pixel. We call this an ensemble. The aim of this project is to explore how best to combine these different ensemble members to "always" create the right label for the pixel (to explore 'the wisdom of the crowd').
In image segmentation, several methods are known for combining a given collection of segmentation results. For example, voting methods might label a pixel according to the majority of labels for that pixel in the collection. However, such a vote can be ambiguous; additional rules might therefore be required to arrive at a definitive labeling.
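A per-pixel majority vote that makes ambiguity explicit can be sketched as follows. Segmentations are assumed to be equally sized flat lists of labels, and tied pixels are marked with None so that additional rules can be applied to them later; both conventions are illustrative assumptions.

```python
from collections import Counter

AMBIGUOUS = None  # marker for pixels where the vote is tied


def majority_vote(labels):
    """Return the most frequent label, or AMBIGUOUS if the top count is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return AMBIGUOUS
    return counts[0][0]


def combine(segmentations):
    """Combine equally sized flat label lists pixel by pixel."""
    return [majority_vote(pixel_labels) for pixel_labels in zip(*segmentations)]


ensemble = [
    [0, 1, 1],  # result of segmentation algorithm A
    [0, 1, 2],  # result of segmentation algorithm B
    [1, 1, 0],  # result of segmentation algorithm C
]
print(combine(ensemble))
```

In this example the first two pixels receive a clear majority label, while the third is a three-way tie and is returned as ambiguous. Handling such pixels, for instance by looking at their neighborhood, is where further rules would come in.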
Goal:
- Gather and/or define a set of useful rules to combine image segmentation results. Furthermore, define a pipeline containing these rules, such that the usage of the rules depends on the parameterization. A simple example: The pipeline could be based on the majority voting rule, combined with intelligent rules for handling the case of ambiguous pixels, for example through considering the neighborhood of the pixel or the uncertainty of the single segmentation results (if probabilistic segmentation algorithms are used).
- Explore the parameter space of this generalized pipeline. Set up a framework to “learn” suitable parameters for this pipeline. Test your pipeline on several different datasets and try to come up with optimal parameters. Refine your pipeline until it can produce results at least close to the state-of-the-art algorithms for segmenting such images.
- Once a set of optimal parameters for some limited number of datasets is established, perform experiments on whether the parameters learned for the generalized combination pipeline are transferable to the processing of new datasets, i.e. datasets other than those on which the parameters were learned.
Milestones:
- Definition of a parameterized, rule-based pipeline for (specific) image analysis tasks.
- Evaluation of the pipeline and refinement of its parameters on a limited number of datasets
- Application of the pipeline and the found parameters on a broader range of datasets
Prerequisites: VIS, SIP
Contact: Torsten Möller | Bernhard Fröhler
Smart Image Filter Preview
The analysis of large images (2D or 3D) requires applying filters such as smoothing or denoising. Finding the most suitable parameters for a given analysis task through trial and error can be time-consuming. The goal of this project is to develop a tool that provides a smart preview of the possible outcomes of image processing filters for different parameters on a small region of the image. The outcomes of different parameterizations could, for example, be presented in a matrix. The tool should also be evaluated regarding usability.
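The core of such a preview can be sketched as "crop a small region, then apply the filter once per parameter value". The example below uses a simple mean (box) filter on a 2D image stored as nested lists; the filter, the function names, and the square region are assumptions for illustration, and a real tool would more likely build on an image processing library.

```python
def box_filter(image, radius):
    """Mean filter with a (2*radius+1)^2 window, clamped at the borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def crop(image, top, left, size):
    """Cut a size x size region out of the image."""
    return [row[left:left + size] for row in image[top:top + size]]


def preview(image, top, left, size, radii):
    """Filter only a small region, once per parameter value."""
    region = crop(image, top, left, size)
    return {r: box_filter(region, r) for r in radii}


image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
results = preview(image, top=0, left=0, size=3, radii=[0, 1])
for radius, filtered in results.items():
    print(radius, filtered)
```

Filtering only the cropped region keeps the preview fast, at the cost of slightly different values near the region border compared with filtering the full image. Whether that simplification is acceptable is one of the things a usability evaluation could examine.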
Prerequisites: SIP, HCI
Programming languages: Python, C++
Contact: Bernhard Fröhler | Torsten Möller
Theory of Vis:
iTuner: Touch interfaces for high-D visualization
In order to understand simulations, machine learning algorithms, and geometric objects we need to interact with them. This is difficult to perform with something like a mouse which only has 2 axes of movement. Multitouch interfaces let us develop novel interactions for multi-dimensional data. The goals of this project are:
- Develop a touch-screen interface for navigating high-dimensional spaces.
- Design the user interface for a tablet (e.g. an iPad) to be used in concert with a larger screen such as a monitor or television.
Prerequisites: VIS, HCI
Contact: Torsten Möller | Thomas Torsney-Weir