VPSA for evaluating user studies

Much has been written about how replicating studies affects data collection, but the choice of analysis methods matters as well. There is a wide range of data-cleaning procedures and statistical tests that can be applied to the same data. We want to examine how these different methods affect the outcome, using a variety of datasets: how would significance and power change under different types of analysis?

Goals and tasks of this project are:

  • Collect a number of open-data user studies from the visualization and HCI communities
  • Design a pipeline (e.g., in R or Python) to run a number of different statistical analyses on the data
  • Sample analysis variants from this pipeline and compare the results (a minimal sketch follows below)
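
For illustration, such a pipeline could cross every cleaning option with every test, as in the Python sketch below; the particular cleaners and tests are examples, not a prescribed set:

    # Minimal sketch: cross each data-cleaning option with each statistical
    # test and collect the p-values, so the effect of each choice is visible.
    import numpy as np
    from scipy import stats

    def clean_raw(x):
        return np.asarray(x)

    def clean_trimmed(x, z=3.0):
        # drop values more than z standard deviations from the mean
        x = np.asarray(x)
        return x[np.abs(x - x.mean()) <= z * x.std()]

    cleaners = {"raw": clean_raw, "trimmed": clean_trimmed}
    tests = {"t-test": stats.ttest_ind, "Mann-Whitney": stats.mannwhitneyu}

    def run_pipeline(a, b):
        return {(c, t): test(clean(a), clean(b)).pvalue
                for c, clean in cleaners.items()
                for t, test in tests.items()}

    rng = np.random.default_rng(0)
    print(run_pipeline(rng.normal(0, 1, 50), rng.normal(0.4, 1, 50)))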

Contact: Thomas Torsney-Weir

University Course Evaluation

Area: Visualization / HCI

The university gathers data about all of its courses after each semester, in the form of evaluations and other quantitative measures. The main goal is to create an interactive dashboard for finding and analyzing correlations and outliers in this dataset.

  • Design a dashboard which visualizes course evaluation data and is able to provide answers to common questions regarding this data.
  • It should also be able to show changes in certain courses over time.
  • The dataset contains real (anonymized) evaluations from the university, and one challenge will be analyzing this dataset.

Contact: Raphael Sahann

Study Path Visualization

Area: Visualization

Each curriculum has a "suggested path" which students are supposed to take in order to finish their studies in time. Data shows that almost no one actually does so. The main question in this project is: how do students actually complete their studies, and how similar are the paths they take to do so?

  • Find a suitable visualization for the path an individual student takes through their studies
  • Compute a measure to estimate the difference between two study paths (see the sketch after this list)
  • Create an interface which visualizes multiple student paths at once, lets the user select a group of similar paths, and supports comparing and exploring individual paths.
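
As one possible starting point for the difference measure, the sketch below uses the edit (Levenshtein) distance over the ordered list of courses a student has taken; this choice is illustrative, not prescribed:

    # Minimal sketch: edit distance between two study paths, where a path
    # is the ordered list of course IDs a student completed.
    def path_distance(p, q):
        m, n = len(p), len(q)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if p[i - 1] == q[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n]

    print(path_distance(["M1", "P1", "M2"], ["P1", "M1", "M2"]))  # -> 2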

Contact: Raphael Sahann

Clustering of grade distributions

We have grade distributions from nearly 1000 courses at our faculty. We would like to investigate visual and algorithmic ways of exploring this data set. Core questions are:

  • are there clusters of grade distributions?
  • how pronounced are these clusters?

In order to tackle this problem, several tasks need to be combined:

  • convert each distribution into a space of percentages plus the number of students
  • create an interface to "view" this (4+1)-dimensional space
  • apply standard clustering methods and visualize the results (a minimal sketch follows below)
  • visually show the impact of changing the clustering algorithm's parameters
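
A minimal sketch of the first and third tasks, assuming grades 1-5 and using k-means as the standard clustering method (the data and the number of clusters are illustrative):

    # Each course is given as counts per grade 1-5; convert to the (4+1)-dim
    # space of four free percentages (they sum to 1) plus the student count.
    import numpy as np
    from sklearn.cluster import KMeans

    counts = np.array([[30, 25, 20, 15, 10],
                       [5, 10, 20, 30, 35],
                       [22, 23, 20, 18, 17]])
    n_students = counts.sum(axis=1, keepdims=True)
    pct = counts / n_students
    X = np.hstack([pct[:, :4], n_students])  # consider standardizing first

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(labels)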

Contact: Torsten Möller | Claudia Plant

Personal finance

Area: Visualization / HCI

Today, an abundance of data and a scarcity of time can make it very difficult to stay well informed about specific needs. This becomes even more relevant when it comes to our own finances. The main goal of this project is to create a quick snapshot of your state-of-finances.

  • Design an interface which visualizes the valuable financial information belonging to a user. The purpose is to create a visual analysis pipeline that allows the user to grasp, in less than 60 seconds and with a handful of indicators, a full picture of their financial standing.
  • The solution can draw data from a dataset which will be provided by Erste Bank. The data set contains real (anonymized) transactions in different categories. One challenge will be analyzing this dataset.
  • The visual outcome of the process should capture a specific time frame (e.g., last month, current month, last 3 months, or a personalized time frame); see the sketch below.
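
A minimal sketch of the underlying snapshot computation, assuming the provided transactions come with date, category, and amount columns (the column names are assumptions):

    # Total spending per category within a chosen time frame.
    import pandas as pd

    tx = pd.DataFrame({
        "date": pd.to_datetime(["2018-05-02", "2018-05-10", "2018-06-01"]),
        "category": ["groceries", "rent", "groceries"],
        "amount": [-54.20, -800.00, -61.80],
    })

    def snapshot(tx, start, end):
        window = tx[(tx["date"] >= start) & (tx["date"] <= end)]
        return window.groupby("category")["amount"].sum()

    print(snapshot(tx, "2018-05-01", "2018-05-31"))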

Contact: Torsten Möller

Visual Document Exploration for Journalists

Area: Visualization / Machine Learning / NLP

Typical document categorization systems use automatic clustering. There is evidence that this method does not produce human-understandable categorizations and does not match how a human would categorize documents. This project would combine machine learning with an interactive document exploration system to better support humans in classifying documents.

  • Analyze state-of-the-art in research and practice
  • Identify an interesting test case of a document collection (e.g. wikileaks data, or wikipedia articles)
  • Develop a tool that
    • lets the user manually group documents,
    • trains and updates a classifier in the background (a minimal sketch follows below),
    • recommends other documents the journalist might be interested in, and
    • visually represents the data to foster overview, understanding, and usability
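
A minimal sketch of the background-classifier part, assuming documents arrive as plain strings; the vectorizer and classifier choices are illustrative:

    # A hashing vectorizer is stateless, so the classifier can be updated
    # incrementally each time the journalist labels more documents.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    vec = HashingVectorizer(n_features=2**18)
    clf = SGDClassifier(loss="log_loss")  # logistic loss -> probabilities

    def update(docs, labels, classes):
        # called whenever documents are manually assigned to groups
        clf.partial_fit(vec.transform(docs), labels, classes=classes)

    def recommend(docs, group, k=3):
        # rank unlabeled documents by predicted probability for one group
        proba = clf.predict_proba(vec.transform(docs))[:, group]
        return sorted(zip(proba, docs), reverse=True)[:k]

    update(["tax fraud report", "football results"], [0, 1], classes=[0, 1])
    print(recommend(["new fraud memo", "match highlights"], group=0, k=1))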

Contact: Thomas Torsney-Weir | Elena Rudkowsky

Visually exploring neural networks

We have a collection of 100,000 different neural networks from the Tensorflow Playground. The core goal of this project is to create a visual interface for understanding some of the basic properties of neural networks. Enabling a user to explore the collection should help answer questions such as: what is the relationship between the number of neurons and the number of hidden layers, and what impact do batch size, activation functions, and other parameters have on the quality of the network? Your tasks include:

  • fast prototyping with Tableau
  • getting familiar with the data set
  • querying neural network users on what parameters they want to explore (requirement analysis)
  • development of low-fi and high-fi prototypes
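
For the prototyping stage, a minimal first look at the data in Python (the file and column names are assumptions about how the collection might be laid out):

    # Median network quality per (number of layers, activation) combination.
    import pandas as pd

    runs = pd.read_csv("playground_runs.csv")  # hypothetical export
    summary = (runs.groupby(["num_layers", "activation"])["test_loss"]
                   .median()
                   .unstack())
    print(summary)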

Contact: Torsten Möller

Exploration of confusion matrices

When you are creating a classifier, the error is often reduced to just one number, typically the average over all misclassifications. However, if you train on two classes A and B, an A could be confused for a B and a B could be confused for an A. The difference really matters if, for instance, class A leads to surgery. Hence, given the data set of 100,000 neural networks from the Tensorflow Playground, try to understand the tradeoff between the two classes (orange and blue). What impact does the structure of the neural network have on this tradeoff? Your tasks include:

  • build an interactive interface to change the relative importance of the classification errors "A given B" vs. "B given A" (a minimal sketch follows below)
  • understand the tradeoffs between different neural networks with regard to the type of classification error, but also complexity, accuracy, etc.
  • suggest an extension to classifiers with more than two classes.
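
The quantity behind the first task could look like the following sketch, where the weight w is exactly what the interactive interface would expose (the matrix values are illustrative):

    # Cost-weighted error for a 2x2 confusion matrix C, where C[i][j]
    # counts points of true class i predicted as class j.
    import numpy as np

    def weighted_error(C, w):
        # w in [0, 1]: importance of "A predicted as B" vs. "B predicted as A"
        C = np.asarray(C, dtype=float)
        return (w * C[0, 1] + (1 - w) * C[1, 0]) / C.sum()

    C = [[45, 5],    # 5 class-A points misclassified as B
         [12, 38]]   # 12 class-B points misclassified as A
    for w in (0.1, 0.5, 0.9):
        print(w, weighted_error(C, w))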

Contact: Torsten Möller

Visualisation-Supported Comparison of Image Segmentation Metrics

Area: Visualization, Image Processing

Segmentation algorithms, which assign a label to each element of a 2D/3D image, need to be evaluated regarding their performance on a given dataset. The quality of an algorithm is typically determined by comparing its result to a manually labelled image. Many metrics can be used to compute a single number representing the similarity of two such segmentation results, each with specific advantages and disadvantages. The goals in this project are to:

  • Research the segmentation metrics in use in the literature.
  • Create a tool that calculates multiple segmentation quality metrics on an image (a minimal sketch follows below).
  • With the help of this tool, analyze how the individual segmentation metrics perform in detecting specific kinds of errors in segmentation results, as well as the correlations between the metrics.
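
As a minimal sketch of the tool's core, two widely used metrics (Dice and Jaccard) computed for a binary segmentation against a manually labelled ground truth:

    import numpy as np

    def dice(a, b):
        a, b = np.asarray(a, bool), np.asarray(b, bool)
        return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    def jaccard(a, b):
        a, b = np.asarray(a, bool), np.asarray(b, bool)
        return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

    gt = np.array([[0, 1, 1], [0, 1, 0]])   # illustrative ground truth
    seg = np.array([[0, 1, 0], [1, 1, 0]])  # illustrative result
    print(dice(gt, seg), jaccard(gt, seg))  # -> 0.666..., 0.5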

Contact: Bernhard Fröhler | Torsten Möller

iTuner: Touch interfaces for high-D visualization

Area: Visualization / HCI

Motivation: It is very hard for users to build up a visual understanding of spaces with dimensionality greater than 3.

  • Develop a touch-screen interface for navigating high-dimensional spaces.
  • The user interface should be designed for a tablet (e.g., an iPad) to be used in concert with a larger screen such as a monitor or television.

Contact: Thomas Torsney-Weir

The Perception of Visual Uncertainty Representation by Non-Experts

Area: Visualization / HCI


Motivation:

  • Understanding / communicating uncertainty and sensitivity information is difficult
  • Uncertainty is part of everyday life for any type of decision-making process
  • Some of the previous studies are unclear and could be improved

Goals and tasks of several different projects:

  • Brainstorm about different visual encodings
  • Run and evaluate a larger Amazon Mechanical Turk study

Contact: Torsten Möller | Thomas Torsney-Weir

Histogram Design

Area: Visualization / HCI

Histograms are often used as the first method to gain a quick overview of the statistical distribution of a collection of values, such as the pixel intensities in an image. Depending, for example, on the data type of the underlying data (categorical, ordinal, or continuous) and the number of available data values, several visualization parameters can be considered when constructing a histogram, such as bin width, aspect ratio, and tick marks. The perception of a histogram might vary quite a bit depending on the exact parameters chosen, and this might also influence its interpretation. For some of the above points, you should already be able to find literature.

  • Create a web application (e.g., in d3) that allows the user to enter data in a tabular format and creates different histograms from these values.
  • At least the parameters mentioned above should be adjustable by the user.
  • Search for rules that determine the above parameters automatically from the data, and implement a few (a minimal sketch follows below).
  • Research the variety of tasks that histograms are used for, for instance understanding distributions, filtering data, or finding modes in a distribution (number and count).
  • Evaluate the different encodings regarding their effect on the identified tasks.
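
Three classic rules for bin count or bin width that such an implementation might start from (a minimal sketch; numpy ships these rules as well):

    import numpy as np

    def sturges_bins(x):
        return int(np.ceil(np.log2(len(x)) + 1))

    def scott_width(x):
        return 3.49 * np.std(x) / len(x) ** (1 / 3)

    def freedman_diaconis_width(x):
        q75, q25 = np.percentile(x, [75, 25])
        return 2 * (q75 - q25) / len(x) ** (1 / 3)

    x = np.random.default_rng(0).normal(size=500)
    print(sturges_bins(x), scott_width(x), freedman_diaconis_width(x))
    # numpy equivalent: np.histogram_bin_edges(x, bins="fd")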

Contact: Torsten Möller | Bernhard Fröhler

Semi-Automated Data Cleansing of Time Series

Area: Visualization

Many application domains involve a large number of time series, e.g., the energy sector and industrial quality management. However, such data is often afflicted by quality problems like missing values, outliers, and other types of anomalies. For various downstream tasks, it is not sufficient to merely detect such quality problems; the data must also be cleansed. Doing this manually for regularly acquired data can become very time-consuming. On the other hand, fully automated data cleansing may cause domain experts to lose trust in the data.

The goal of this work is to design and implement a software prototype that supports a semi-automated process of cleansing time series data. The key idea is to offer the user different mechanisms for cleansing data problems which are suggested by the system in a context-specific way. The flexibility of the user should range from a fully automated "cleanse everything" action to a detailed manual inspection of each detected problem and a corresponding individual choice of cleansing strategy.
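
A minimal sketch of one such semi-automated step, assuming the data is a pandas Series indexed by time; the detection rule and the cleansing strategy are illustrative:

    # Flag points far from a rolling median, then let the user pick a
    # cleansing strategy for the flagged problems (or cleanse everything).
    import numpy as np
    import pandas as pd

    def detect_outliers(s, z=3.0):
        med = s.rolling(7, center=True, min_periods=1).median()
        resid = (s - med).abs()
        return resid > z * resid.median()  # robust, MAD-like threshold

    def cleanse(s, mask, strategy="interpolate"):
        s = s.copy()
        s[mask] = np.nan
        return s.interpolate() if strategy == "interpolate" else s

    idx = pd.date_range("2020-01-01", periods=8, freq="D")
    s = pd.Series([1, 1.1, 0.9, 9.0, 1.0, np.nan, 1.2, 1.1], index=idx)
    print(cleanse(s, detect_outliers(s)))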

Contact: Torsten Möller | Harald Piringer (VRVis)

Task-Oriented Guidance for Visualization Recommendation

Area: Visualization

In many application domains, data involves a large number of attributes and categories. In industrial manufacturing, for example, numerous quality indicators are measured for each produced item along with process information such as the order ID, the used machinery, and much more. For such complex data, manually searching for visualizations that reveal interesting patterns such as correlations, trends, and outliers may become very tedious and time-consuming.

The goal of this work is to extend well-known views such as scatterplots, histograms, or categorical views with on-demand recommendations of view parameterizations which may be worth looking at. Typical examples could include "list all scatterplots showing correlations between data attributes for any data subset", or "rank all time-series plots by how clearly they show a trend over the past weeks". Important tasks of this work are thus to:

  • identify meaningful tasks in the context of various visualization types
  • implement corresponding quality metrics, which should ideally be computed efficiently in the background without disturbing the actual analysis (a minimal example follows this list)
  • design and implement intuitive ways to present the possible visualization options as previews to the user, in a way that is not obtrusive to the analysis and that scales to a large number of possible variants (e.g., by clustering the variants into dissimilar groups).
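
As a minimal example of one such quality metric, the sketch below ranks all candidate scatterplots (attribute pairs) by absolute Pearson correlation; the metric choice is illustrative:

    import itertools
    import pandas as pd

    def rank_scatterplots(df):
        # score every pair of numeric attributes by |correlation|
        pairs = itertools.combinations(df.select_dtypes("number"), 2)
        return sorted(((abs(df[x].corr(df[y])), x, y) for x, y in pairs),
                      reverse=True)

    df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9],
                       "c": [5, 1, 4, 2]})
    for score, x, y in rank_scatterplots(df):
        print(f"{x} vs {y}: {score:.2f}")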

Contact: Torsten Möller | Harald Piringer (VRVis)

Transformations of visualizations

Area: Visualization / programming languages

We want to investigate how programming-language paradigms, such as the functional style of Haskell, can be used in visualization. We want to understand how changing the underlying data should change the visual representation. Furthermore, we want to investigate how visualizations can be combined with each other in an automated fashion. The overall goal is a visualization library that can automatically build visualizations of complex datasets.

An interested student is *not* expected to work on all the following sub-goals. We can design the project based on interest/skillset. Specific sub-goals:

  • Develop a visualization library in Haskell or Purescript
  • Investigate how category theory fits into visualization

Contact: Thomas Torsney-Weir

Visually Analyzing the Fault Tolerance of Deep Neural Networks

The main objective is to design and implement an effective and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips), based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage implying lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.

The starting point for the project is a given example data set which contains information about the relationship between single bit flips at various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.

The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.

Examples for interesting questions are the following:

  • (empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
  • (empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
  • (empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer in the network architecture
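
For intuition about the fault model, a minimal sketch of flipping a single bit in the IEEE-754 float32 representation of a weight (the example value and positions are illustrative):

    import struct

    def flip_bit(value, pos):
        # pos 0..31, counted from the least significant bit
        (bits,) = struct.unpack("<I", struct.pack("<f", value))
        (out,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << pos)))
        return out

    w = 0.75
    for pos in (0, 23, 30):  # mantissa bit, lowest and highest exponent bits
        print(pos, flip_bit(w, pos))  # impact varies by orders of magnitude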

In order to answer these questions, an iterative design process is required:

  • start with a requirement analysis (task & data analysis)
  • develop low-fi prototypes
  • develop high-fi prototypes
  • refine the prototypes
  • constantly evaluate the visual analysis tool.

The data set, the problem setting, and the details of the requirements are provided by Prof. Gansterer; supervision on the visual analysis aspects is provided by Prof. Möller.

Contact: W. Gansterer | Torsten Möller

Ensemble methods in image segmentation

An image segmentation algorithm assigns a label to each pixel. No segmentation algorithm is always correct, so the idea is to work with many different segmentation algorithms that each produce a label for every pixel; we call this an ensemble. The idea of this project is to explore how to best combine these ensemble members to "always" produce the right label for each pixel (exploring 'the wisdom of the crowd').
In image segmentation, several methods are known for combining a given collection of segmentation results. Voting methods, for example, might label a pixel according to the majority of labels for that pixel in the collection. However, such a vote can be ambiguous, so additional rules might be required to arrive at a definitive labeling (see the sketch below).
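
A minimal sketch of the majority-voting baseline, which also marks the ambiguous pixels that the additional rules would have to handle (the data is illustrative):

    import numpy as np

    def majority_vote(segs):
        # segs: (n_algorithms, H, W) integer label images
        segs = np.asarray(segs)
        labels = np.unique(segs)
        votes = np.stack([(segs == l).sum(axis=0) for l in labels])
        winner = labels[votes.argmax(axis=0)]
        ambiguous = votes.max(axis=0) <= segs.shape[0] // 2  # no strict majority
        return winner, ambiguous

    segs = [[[0, 1], [1, 0]],
            [[0, 1], [0, 0]],
            [[1, 1], [0, 0]],
            [[1, 1], [0, 1]]]
    winner, ambiguous = majority_vote(segs)
    print(winner)
    print(ambiguous)  # where, e.g., neighborhood rules would take over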


  • Gather and/or define a set of useful rules for combining image segmentation results. Furthermore, define a pipeline containing these rules, such that the usage of the rules depends on the parameterization. A simple example: the pipeline could be based on the majority voting rule, combined with intelligent rules for handling ambiguous pixels, for example by considering the neighborhood of the pixel or the uncertainty of the individual segmentation results (if probabilistic segmentation algorithms are used).
  • Explore the parameter space of this generalized pipeline. Set up a framework to "learn" suitable parameters for it. Test your pipeline on several different datasets and try to come up with optimal parameters. Refine your pipeline until it produces results at least close to those of state-of-the-art algorithms for segmenting such images.
  • Once a set of optimal parameters for a limited number of datasets is established, perform experiments on whether the parameters learned for the generalized combination pipeline transfer to new datasets, i.e., datasets other than those the parameters were learned on.


In summary, the deliverables are:

  • Definition of a parameterized, rule-based pipeline for (specific) image analysis tasks.
  • Evaluation of the pipeline and refinement of its parameters on a limited number of datasets.
  • Application of the pipeline and the found parameters to a broader range of datasets.

Contact: Torsten Möller | Bernhard Fröhler

Implementing and Exploring the Toronto Paper Matching System

Assigning reviewers to papers is an important and time-consuming task that every conference must tackle. NLP techniques allow this matching to be done semi- or even fully automatically. The Toronto Paper Matching System is one system that addresses this problem. We want to implement and explore the proposed system and understand the benefits and challenges of such a system.

Goals and tasks of this project are:

  • Implement the system from the paper
  • Explore different design decisions
  • Evaluate the performance of the system
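
A minimal sketch of the core affinity computation, assuming each reviewer is represented by the concatenated text of their past papers; the published system explores richer scoring models, so this is only a starting point:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    reviewers = ["neural networks deep learning optimization",
                 "user studies visualization perception"]
    submissions = ["a study on perception of charts",
                   "training deep networks faster"]

    vec = TfidfVectorizer()
    m = vec.fit_transform(reviewers + submissions)
    # rows: submissions, columns: reviewers
    affinity = cosine_similarity(m[len(reviewers):], m[:len(reviewers)])
    print(affinity)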

Contact: Christoph Kralj