Open Topics

Guided interactive definition of derived time series data

The analysis of time-dependent data - such as sensor data from industrial processes - is an important step in creating predictive models, for example to detect failures at an early stage. In many cases, transformations of the measured data play a central role, such as convolution (e.g., smoothing with a kernel), time offsets, derivatives, and many more. In addition to quantitative transformations, the categorization of data by the user is also an important type of derived data, e.g., for summarization.

The goal of this topic is to create an interactive tool that allows users without programming knowledge to define and parameterize such transformations and pipelines of transformations. The tool should help the user to select parameters such as the window size of a convolution kernel in a guided way and to validate them immediately.
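As an illustration, the kinds of transformations mentioned above can be sketched in a few lines of Python using only the standard library. The window size, lag, and threshold here are hypothetical parameters that a user of the envisioned tool would choose interactively:

```python
def moving_average(values, window):
    """Convolution with a uniform kernel of the given window size (truncated at the edges)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def lag(values, offset, fill=None):
    """Time offset: shift the series by `offset` samples."""
    return [fill] * offset + values[:-offset] if offset else list(values)

def derivative(values, dt=1.0):
    """Finite-difference approximation of the first derivative."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def categorize(values, threshold):
    """User-defined categorization of a series, e.g., for summarization."""
    return ["high" if v > threshold else "low" for v in values]

signal = [1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]
smooth = moving_average(signal, window=3)
```

The interactive tool would let the user tune parameters like `window` and `threshold` and immediately see the resulting derived series.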

Tasks:

  • Research the state of the art in scientific methodology and commercial tools for interactive analysis of time series
  • Design an interactive tool featuring a guided definition and validation of transformed and categorized time series
  • Implement the tool as an extension of the software Visplore (http://goo.gl/wqJ4AS)
  • Evaluate the design for real-world use cases from the fields of industrial production and energy management

Pre-requisites: Vis, HCI

Supervisor: Torsten Möller / Harald Piringer (VRVis)

Publication Structure Detection using Image Segmentation

Area: Machine Learning

PDFs are one of the most important ways of sharing academic research results, and are therefore an important source for data mining applications. In this project, we want to extract the different parts of a publication - text, formulas, tables, references, and figures - using deep learning methods. More precisely, we want to create a convolutional neural network for image segmentation.

Goals:

  • Create a training dataset by building a small tool that allows experts to annotate publications. Annotation classes: text, figure, formula, reference, and table.
  • Build and parameterize an image segmentation algorithm based on the annotated publications.
  • As a first step, segment only images vs. non-images
  • As a second step, segment all annotation classes
  • Create a tool which takes PDFs as input and returns segmentation results

Requirements: FDA
Programming languages: Python

Publication Structure Detection using NLP

Area: Natural Language Processing, Data Mining

PDFs are one of the most important ways of sharing academic research results, and are therefore an important source for data mining applications. There are multiple libraries that extract text, but each one has its own drawbacks. The goal of this project is to improve the quality of the extracted texts. This can be achieved using regular expressions and preprocessing techniques such as stemming, sentence segmentation, and spell correction. In addition, the text should be structured into text, figure, formula, reference, and table parts for easier further use.
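A minimal sketch of the regular-expression part, using only the Python standard library. The patterns are assumptions about typical PDF-extraction noise (soft hyphens, words broken at line ends, excess whitespace), not a definitive pipeline:

```python
import re

def clean_extracted_text(raw):
    text = raw.replace("\u00ad", "")        # drop soft hyphens
    text = re.sub(r"-\n(\w)", r"\1", text)  # rejoin words hyphenated at line breaks
    text = re.sub(r"\n{2,}", "\n\n", text)  # collapse runs of blank lines
    text = re.sub(r"[ \t]+", " ", text)     # normalize horizontal whitespace
    return text.strip()

raw = "Convolu-\ntional   networks\n\n\n\nare common."
print(clean_extracted_text(raw))
```

Stemming and spell correction would typically come from a library (e.g., NLTK) on top of such a normalization step.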

Goals:

Requirements: FDA
Programming languages: Python

Contact: Christoph Kralj 

Multiverse analysis

Many people have discussed and analyzed the effect of replicating studies in terms of data collection. However, the choice of analysis methods makes a difference too: there is a variety of data cleaning options and statistical tests that one can use to analyze the data. We want to examine how these different choices affect the outcome using a variety of datasets. How would significance and power change with different types of analysis?

Goals and tasks of this project are:

  • Collect a number of open-data user studies from the visualization and HCI community
  • Design a pipeline, using e.g. R or Python, to run a number of different statistical analyses of the data
  • Sample this pipeline and analyze the results
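The pipeline idea above can be sketched as follows: enumerate combinations of analysis choices and record the outcome of each path. The data, the outlier rule, and the effect measure below are toy placeholders for real cleaning options and statistical tests:

```python
from statistics import mean, stdev

def drop_outliers(xs, k):
    """Keep only values within k sample standard deviations of the mean."""
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]

group_a = [4.1, 4.5, 3.9, 4.2, 9.0]   # toy data with one extreme value
group_b = [3.0, 3.4, 3.1, 2.9, 3.3]

cutoffs = [1.5, 2.0, None]            # outlier rules to compare (None = keep all)
effects = {}
for k in cutoffs:
    a = drop_outliers(group_a, k) if k is not None else group_a
    b = drop_outliers(group_b, k) if k is not None else group_b
    effects[k] = mean(a) - mean(b)    # effect size for this analysis path
```

Even in this toy "multiverse", the stricter cleaning rule removes the extreme value in `group_a` and shrinks the apparent effect; a real pipeline would cross all cleaning choices with all test choices and report significance and power per path.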


Course requirements: Visualization

Contact: Thomas Torsney-Weir

Transformations of visualizations

We want to investigate how programming language paradigms like Haskell can be used in visualization. We want to understand how changing the underlying data should change the visual representation. Furthermore, we want to investigate how visualizations can be combined with each other in an automated fashion. The overall goal is to have a visualization library that can automatically build visualizations of complex datasets.

An interested student is not expected to work on all of the following sub-goals. We can design the project based on interest/skillset.

Specific sub-goals:

  • Develop a visualization library in Haskell or Purescript
  • Investigate how category theory fits into visualization

Course requirements: Visualization

Programming languages: Haskell

Contact: Thomas Torsney-Weir

Visualization for small screen devices

People consume more and more information through small-screen devices like phones and tablets. We want to investigate what visualizations are possible on such devices.

Goals and tasks of this project are:

  • Create a list (with examples) of visualization methods that can be implemented on a small screen
  • Determine why

Course requirements: Visualization / HCI 

Contact: Thomas Torsney-Weir

e-learning methods for visualization

Evaluating the effectiveness of visualizations as a whole is difficult. e-learning technology lets us measure how well people accomplish different learning objectives. Is it possible to use this for evaluating visualizations?

Goals and tasks of this project are:

  • Conduct a literature review on e-learning technology
  • Design and execute a user study using these methods
  • Evaluate the effectiveness of this method for visualization effectiveness

Course requirements: Visualization

Contact: Thomas Torsney-Weir

Machine learning to detect user behavior

The specific application is to detect "click throughers" in crowd-sourced studies such as Amazon Mechanical Turk. Click throughers produce inconsistent results because they answer randomly to get through the study as quickly as possible rather than thinking carefully about their choices. We would like to develop algorithms to detect these participants.

Goals and tasks of this project are:

  • Verify mouse tracking data from previous studies
  • Analyze mouse tracking data to understand different user behaviors 
  • Use a machine learning algorithm to classify users based on mouse tracking data
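As a starting point, simple behavioral features can be derived from a mouse trail. The feature names and the threshold rule below are assumptions for illustration, not a validated classifier; the project would replace the rule with a learned model:

```python
from math import hypot

def features(trail, timestamps):
    """trail: list of (x, y) mouse positions; timestamps: seconds per sample."""
    path_len = sum(hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(trail, trail[1:]))
    duration = timestamps[-1] - timestamps[0]
    return {"path_length": path_len, "duration": duration}

def looks_like_click_through(f, min_duration=2.0, min_path=50.0):
    # Very fast answers with almost no mouse movement are suspicious.
    return f["duration"] < min_duration and f["path_length"] < min_path

f = features([(0, 0), (3, 4)], [0.0, 0.5])
```

A classifier trained on such per-trial feature vectors (plus labels from attention checks) would be the machine-learning step of the last task.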

 

Course requirements: FDA / HCI

Contact: Thomas Torsney-Weir

Library for visualization of slices

This is primarily a programming project. Slicing methods are a novel way of visualizing multi-dimensional data. However, there is no publicly-available library for R or Python that makes it easy to use these visualization techniques. The goal of this project is to develop such a library. Students should have knowledge of Javascript and either R or Python.

 

Course requirements: Visualization

Programming languages: Javascript and (R or Python)

Contact: Thomas Torsney-Weir

Visualization for optimization problems

Can visualization beat traditional (offline) optimization algorithms? The goal of this project is to see how well visually guided optimization can compete with traditional optimization algorithms. Students will develop a visualization system to find optimal configurations of black-box (i.e. unknown) algorithms from a contest.

Course requirements: Visualization / Foundations of Mathematics

Programming languages: Javascript, (R or Python), and C++

Contact: Thomas Torsney-Weir

University Course Evaluation

Area: Visualization / HCI

The university gathers data about all its courses after each semester in the form of evaluations and other quantitative measures. The main goal is to create an interactive dashboard to find correlations/outliers in the dataset and analyze them.

  • Design a dashboard which visualizes course evaluation data and is able to provide answers to common questions regarding this data.
  • It should also be able to show changes in certain courses over time.
  • The dataset contains real (anonymized) evaluations from the university and one challenge will be analyzing the dataset.

Contact: Raphael Sahann

Study Path Visualization

Area: Visualization

Each curriculum has a "suggested path" which students are supposed to take in order to finish their studies in time. Data shows that almost no one actually does so. The main question in this project is: how do students actually complete their study and how similar are the paths they take to do so?

  • Find a suitable visualization for the path an individual student takes through his/her studies
  • Compute a measure to estimate the difference between two study paths
  • Create an interface which visualizes multiple student paths at once, lets the user select a group of similar paths, and allows comparing and exploring individual paths.
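One possible difference measure, sketched here as an assumption rather than a requirement, is the Levenshtein edit distance between two paths modeled as sequences of course IDs (the IDs below are made up):

```python
def path_distance(a, b):
    """Levenshtein edit distance between two course sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

path_1 = ["EP1", "MG", "EP2", "ADS"]
path_2 = ["EP1", "EP2", "MG", "ADS"]
d = path_distance(path_1, path_2)
```

Such a pairwise distance could then drive the grouping of similar paths in the interface, e.g., via clustering or a distance-preserving layout.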

Contact: Raphael Sahann

Visual Document Exploration for Journalists

 

Typical document categorization systems use automatic clustering. There is evidence that this method does not produce human-understandable categorizations and does not match how a human would categorize documents. This project would combine machine learning with an interactive document exploration system to better support humans in classifying documents.

  • Analyze state-of-the-art in research and practice
  • Identify an interesting test case of a document collection (e.g. wikileaks data, or wikipedia articles)
  • Develop a tool that
    • lets the user manually group documents,
    • trains and updates a classifier in the background,
    • recommends other documents the journalist might be interested in, and
    • visually represents the data to foster overview, understanding, and usability
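The "recommend similar documents" step can be sketched with bag-of-words cosine similarity using only the standard library. A real tool would use a proper trained classifier and TF-IDF weighting; the documents and labels here are made up:

```python
from collections import Counter
from math import sqrt

def cosine(doc_a, doc_b):
    """Cosine similarity of two documents as word-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

labeled = {"politics": "election vote parliament",
           "science": "experiment data hypothesis"}
new_doc = "the vote in parliament"
best = max(labeled, key=lambda label: cosine(labeled[label], new_doc))
```

Each time the journalist regroups documents, the group "prototypes" (here, the `labeled` strings) would be re-derived and the recommendations updated.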

Course requirements: Visualization / FDA

Programming languages: Javascript / Python

Contact: Thomas Torsney-Weir 

Visually exploring neural networks

We have a collection of 100,000 different neural networks from the Tensorflow Playground. The core goal of this project is to create a visual interface to understand some of the basic properties of neural networks. Enabling users to explore the data should help answer questions such as the relationship between the number of neurons and the number of hidden layers, and the impact of batch size, activation functions, and other parameters on the quality of the network. Your tasks include:

  • fast prototyping with Tableau
  • getting familiar with the data set
  • querying neural network users on what parameters they want to explore (requirement analysis)
  • development of low-fi and high-fi prototypes

Contact: Torsten Möller

Visualization-Supported Comparison of Image Segmentation Metrics

Area: Visualization, Image Processing

Segmentation algorithms, which assign labels to each element in a 2D/3D image, need to be evaluated regarding their performance on a given dataset. The quality of an algorithm is typically determined by comparing its result to a manually labelled image. Many metrics can be used to compute a single number representing the similarity of two such segmentation results, all with specific advantages and disadvantages. The goal in this project is to:

  • Research the segmentation metrics in use in the literature.
  • Create a tool that calculates multiple segmentation quality metrics on an image.
  • With the help of this tool, analyze how the individual segmentation metrics perform in detecting specific kinds of errors in the segmentation results, as well as correlations between the metrics.
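Two widely used overlap metrics, shown here on flattened binary label lists for simplicity (real inputs would be 2D/3D multi-label images):

```python
def dice(pred, truth):
    """Dice coefficient: 2|A∩B| / (|A| + |B|) for binary labels."""
    tp = sum(p == t == 1 for p, t in zip(pred, truth))
    return 2 * tp / (sum(pred) + sum(truth))

def jaccard(pred, truth):
    """Jaccard index: |A∩B| / |A∪B| for binary labels."""
    inter = sum(p == t == 1 for p, t in zip(pred, truth))
    union = sum(p == 1 or t == 1 for p, t in zip(pred, truth))
    return inter / union

pred  = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
# Dice = 2*2/(3+3), Jaccard = 2/4: the two metrics rank overlaps
# identically but live on different scales, which is exactly the kind
# of relationship the tool should make visible.
```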

Contact: Bernhard Fröhler | Torsten Möller

iTuner: Touch interfaces for high-D visualization

Area: Visualization / HCI

In order to understand simulations, machine learning algorithms, and geometric objects we need to interact with them. This is difficult to perform with something like a mouse which only has 2 axes of movement. Multitouch interfaces let us develop novel interactions for multi-dimensional data. The goals of this project are:

  • Develop a touch-screen interface for navigating high-dimensional spaces.
  • Design the user interface for a tablet (e.g., iPad) to be used in concert with a larger screen such as a monitor or television.

Contact: Thomas Torsney-Weir

The Perception of Visual Uncertainty Representation by Non-Experts

Area: Visualization / HCI

Motivation:

  • Understanding / Communicating uncertainty and sensitivity information is difficult
  • Uncertainty is part of everyday life for any type of decision making process
  • Some of the previous studies done are unclear and could be improved

Goals and Tasks of several different projects:

  • Brainstorm about different visual encodings
  • Run and evaluate a larger Amazon Turk study

Contact: Torsten Möller | Thomas Torsney-Weir

Histogram Design

Area: Visualization / HCI

Histograms are often used as a first method to gain a quick overview of the statistical distribution of a collection of values, such as the pixel intensities in an image. Depending, for example, on the data type of the underlying data (categorical, ordinal, or continuous) and the number of available data values, several visualization parameters can be considered in constructing a histogram, such as bin width, aspect ratio, and tick marks. The perception of a histogram might vary quite a bit depending on the exact parameters chosen, and this might also influence its interpretation. On some of the above points, you should be able to find literature already.

  • Create a web application (e.g. in d3) that allows the user to enter data in a tabular format and creates different histograms based on these values.
  • At least the parameters mentioned above should be adaptable by the user
  • Search for rules for determining the above parameters automatically from the data, and implement a few
  • Research the variety of tasks that histograms are used for, for instance understanding distributions, filtering of data, and finding modes in a distribution (number and count)
  • Evaluate the different encodings regarding their effect on the identified tasks.
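Two classic rules for choosing the binning automatically, as a possible starting point for the third task (the quartile estimate in the second rule is deliberately crude):

```python
from math import ceil, log2

def sturges_bins(n):
    """Sturges' rule: number of bins for n values."""
    return ceil(log2(n)) + 1

def freedman_diaconis_width(values):
    """Freedman-Diaconis rule: bin width 2*IQR / n^(1/3)."""
    xs = sorted(values)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]   # crude quartile estimate
    return 2 * (q3 - q1) / n ** (1 / 3)

print(sturges_bins(100))  # 8 bins for 100 values
```

Comparing how such rules disagree on the same data would itself be an interesting part of the evaluation.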

Contact: Torsten Möller | Bernhard Fröhler

Semi-Automated Data Cleansing of Time Series

Area: Visualization

Many application domains involve a large number of time series, e.g., the energy sector and industrial quality management. However, such data is often afflicted by data quality problems like missing values, outliers, and other types of anomalies. For various downstream tasks, it is not sufficient to merely detect such quality problems; the data must also be cleansed. Doing this manually for regularly acquired data can become very time-consuming. On the other hand, fully automated data cleansing may cause domain experts to lose trust in the data.

The goal of this work is to design and implement a software prototype that supports a semi-automated process of cleansing time series data. The key idea is to offer the user different mechanisms for cleansing data problems which are suggested by the system in a context-specific way. The flexibility of the user should range from a fully automated "cleanse everything" action to a detailed manual inspection of each detected problem and a corresponding individual choice of cleansing strategy.
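Two cleansing mechanisms such a prototype could offer, sketched with the standard library. The k-sigma rule and linear interpolation are assumed defaults for illustration; the point of the project is that the user can inspect and override each suggestion:

```python
from statistics import mean, stdev

def flag_outliers(xs, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    m, s = mean(xs), stdev(xs)
    return [abs(x - m) > k * s for x in xs]

def fill_missing(xs):
    """Fill None gaps by linear interpolation (endpoints must be present)."""
    out = list(xs)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while out[j] is None:
                j += 1
            step = (out[j] - out[i - 1]) / (j - i + 1)
            for g in range(i, j):
                out[g] = out[g - 1] + step
            i = j
        i += 1
    return out

series = [1.0, None, None, 4.0]
```

In the envisioned workflow, the system would suggest `fill_missing` for each detected gap and let the user accept it per gap, per series, or globally ("cleanse everything").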

Contact: Torsten Möller | Harald Piringer (VRVis)

Task-Oriented Guidance for Visualization Recommendation

Area: Visualization

In many application domains, data involves a large number of attributes and categories. In industrial manufacturing, for example, numerous quality indicators are measured for each produced item along with process information such as the order ID, the used machinery, and much more. For such complex data, manually searching for visualizations that reveal interesting patterns such as correlations, trends, and outliers may become very tedious and time-consuming.

The goal of this work is to extend well-known views such as scatterplots, histograms, or categorical views by integrating on-demand recommendations of view parameterizations which may be worth looking at. Typical examples could include “list all scatterplots showing correlations between data attributes for any data subset”, or “rank all time-series plots by how clearly they show a trend over the past weeks”. Important tasks of this work are thus to:

  • identify meaningful tasks in the context of various visualization types
  • implement corresponding quality metrics which should ideally be computed efficiently in the background without disturbing the actual analysis
  • design and implement intuitive ways to present the possible visualization options as previews to the user in a way that is not obtrusive to the analysis and which scales to a large number of possible variants (e.g., by clustering the variants into dissimilar groups).
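One example of such a quality metric, sketched under the assumption that "scatterplots worth looking at" are those whose attribute pair is strongly correlated (the attribute names and values are made up):

```python
from statistics import mean
from math import sqrt
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

data = {"temperature": [20, 22, 25, 30],
        "pressure":    [5, 6, 7, 9],
        "order_id":    [3, 1, 4, 2]}

# Rank all attribute pairs by the strength of their linear correlation.
ranked = sorted(combinations(data, 2),
                key=lambda pair: -abs(pearson(data[pair[0]], data[pair[1]])))
```

In the tool, such scores would be computed in the background and the top-ranked pairs offered as scatterplot previews.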

Contact: Torsten Möller | Harald Piringer (VRVis)


Visually Analyzing the Fault Tolerance of Deep Neural Networks

The main objective is to design and implement an effective and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips), based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage which implies lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.

Starting point for the project is a given example data set which contains information about the relationship between single bit flips across various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.
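For intuition about the underlying data, a single bit flip at a given position of a weight's IEEE-754 double representation can be simulated with the standard library (64-bit doubles assumed here; the data set may use a different floating-point format):

```python
import struct

def flip_bit(value, bit):
    """Flip bit `bit` (0 = least significant) of a 64-bit float."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int))
    return flipped

w = 0.75
# A flip in the high exponent bits changes the value drastically,
# while a flip in the low mantissa bits is barely noticeable --
# precisely the position-dependence the tool should make visible.
big_change = flip_bit(w, 62)
tiny_change = flip_bit(w, 0)
```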

The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.

Examples for interesting questions are the following:

  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer in the network architecture

In order to answer these questions, an iterative design process is required to

  • start with a requirement analysis (task & data analysis)
  • low-fi prototypes
  • high-fi prototypes
  • refinement
  • constant evaluation of the visual analysis tool.

The data set, the problem setting, and the details of the requirements are provided by Prof. Gansterer; supervision in visual analysis aspects is provided by Prof. Möller.

Contact: W. Gansterer | Torsten Möller

Ensemble methods in image segmentation

An image segmentation algorithm assigns a label to each pixel. Since no segmentation algorithm is always correct, the idea is to work with many different segmentation algorithms, each of which creates a label for every pixel. We call this an ensemble. The idea of this project is to explore how to best combine these different ensemble members to "always" create the right label for each pixel (to explore 'the wisdom of the crowd').
In image segmentation, several methods are known for combining a given collection of segmentation results. For example, voting methods might label a pixel according to the majority of labels for that pixel in the collection. However, such a vote can be ambiguous, so additional rules might be required to arrive at a definitive labeling.
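The majority-vote baseline with an explicit ambiguity flag can be sketched as follows; the labels are made up, and resolving flagged pixels (e.g., via the neighborhood) is exactly what the rule-based pipeline below would add:

```python
from collections import Counter

def majority_vote(labels_per_pixel):
    """labels_per_pixel: one label per ensemble member for a single pixel.
    Returns (winning label, ambiguous?) -- ambiguous pixels need an extra rule."""
    counts = Counter(labels_per_pixel).most_common()
    top, top_n = counts[0]
    tie = len(counts) > 1 and counts[1][1] == top_n
    return top, tie

label, ambiguous = majority_vote(["cell", "cell", "background"])
```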

Goal:

  • Gather and/or define a set of useful rules to combine image segmentation results. Furthermore, define a pipeline containing these rules, such that the usage of the rules depends on the parameterization. A simple example: the pipeline could be based on the majority voting rule, combined with intelligent rules for handling ambiguous pixels, for example by considering the neighborhood of the pixel or the uncertainty of the individual segmentation results (if probabilistic segmentation algorithms are used).
  • Explore the parameter space of this generalized pipeline. Set up a framework to “learn” suitable parameters for this pipeline. Test your pipeline on several different datasets and try to come up with optimal parameters. Refine your pipeline until it produces results at least close to state-of-the-art algorithms for segmenting such images.
  • Once a set of optimal parameters for a limited number of datasets is established, perform experiments on whether the parameters learned for the generalized combination pipeline are transferable to the processing of new datasets, i.e. other than those the parameters were learned with.

Milestones:

  • Definition of a parameterized, rule-based pipeline for (specific) image analysis tasks.
  • Evaluation of the pipeline and refinement of its parameters on a limited number of datasets
  • Application of the pipeline and the found parameters on a broader range of datasets

Contact: Torsten Möller | Bernhard Fröhler