Open Topics

Industry-related topics:

Text Mining:

Machine Learning:

Image Processing:

User Studies:

Theory of Vis:

Analysis of University Data:

Industry-related topics:

Guided interactive definition of derived time series data

The analysis of time-dependent data - such as sensor data from industrial processes - is an important step in creating predictive models, for example to detect failures at an early stage. In many cases, transformations of the measured data play a central role, such as convolutions, time offsets, derivatives, and many more. In addition to quantitative transformations, the categorization of data by the user is also an important type of derived data, e.g., for summarization.
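
As a minimal illustration of the kinds of derived series meant here, the sketch below computes a smoothed, a time-shifted, a differentiated, and a user-categorized series from one measured signal. pandas is assumed purely for illustration; the project itself is to be realized in Visplore.

    # Sketch: derived time series via common transformations (pandas assumed).
    import pandas as pd
    import numpy as np

    # Hypothetical sensor signal, one measurement per minute.
    idx = pd.date_range("2021-01-01", periods=1000, freq="min")
    sensor = pd.Series(np.random.randn(1000).cumsum(), index=idx, name="sensor")

    derived = pd.DataFrame({
        "smoothed": sensor.rolling(window=15, center=True).mean(),  # convolution-like smoothing
        "lagged":   sensor.shift(periods=60),                       # time offset by one hour
        "slope":    sensor.diff(),                                  # discrete derivative
        # user-defined categorization of the measured values
        "category": pd.cut(sensor, bins=[-np.inf, -1.0, 1.0, np.inf],
                           labels=["low", "normal", "high"]),
    })
    print(derived.head())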

The goal of this topic is to create an interactive tool that allows users without programming knowledge to define and parameterize such transformations and pipelines of transformations. The tool should help the user to select parameters such as the window size of a convolution kernel in a guided way and to validate them immediately.

Tasks:

  • Research the state of the art in scientific methodology and commercial tools for interactive analysis of time series
  • Design an interactive tool featuring a guided definition and validation of transformed and categorized time series
  • Implement the tool as an extension of the software Visplore (http://goo.gl/wqJ4AS)
  • Evaluate the design for real-world use cases from the fields of industrial production and energy management

Prerequisites: VIS, HCI

Contact: Torsten Möller | Harald Piringer (VRVis)

Semi-Automated Data Cleansing of Time Series

Area: Visualization

Many application domains involve a large number of time series, e.g., the energy sector and industrial quality management. However, such data is often afflicted by data quality problems like missing values, outliers, and other types of anomalies. For various downstream tasks, it is not sufficient to merely detect such quality problems; the data must also be cleansed. Doing this manually for regularly acquired data may become very time-consuming. On the other hand, fully automated data cleansing may undermine the trust of domain experts in the data.

The goal of this work is to design and implement a software prototype that supports a semi-automated process of cleansing time series data. The key idea is to offer the user different cleansing mechanisms for data problems, which are suggested by the system in a context-specific way. The user's flexibility should range from a fully automated "cleanse everything" action to a detailed manual inspection of each detected problem and a corresponding individual choice of cleansing strategy.
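
The concrete detection rules and cleansing strategies are exactly what the design work should determine; the following pandas-based sketch (an assumption, not a prescription) only illustrates the idea of flagging a problem and letting the user choose a strategy.

    # Sketch: detect simple quality problems, then apply a user-chosen cleansing strategy.
    import pandas as pd
    import numpy as np

    def detect_outliers(series, window=25, threshold=4.0):
        """Flag points that deviate strongly from a rolling median."""
        median = series.rolling(window, center=True, min_periods=1).median()
        deviation = (series - median).abs()
        mad = deviation.rolling(window, center=True, min_periods=1).median()
        return deviation > threshold * (mad + 1e-9)

    def cleanse(series, strategy="interpolate"):
        """Replace flagged outliers (and existing gaps) according to one strategy."""
        cleaned = series.mask(detect_outliers(series))   # turn outliers into NaN
        if strategy == "interpolate":
            return cleaned.interpolate(method="time")    # assumes a DatetimeIndex
        if strategy == "ffill":
            return cleaned.ffill()
        return cleaned                                    # "keep gaps": leave NaN for manual inspection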

Prerequisites: VIS

Contact: Torsten Möller | Harald Piringer (VRVis)

Task-Oriented Guidance for Visualization Recommendation

Area: Visualization

In many application domains, data involves a large number of attributes and categories. In industrial manufacturing, for example, numerous quality indicators are measured for each produced item along with process information such as the order ID, the used machinery, and much more. For such complex data, manually searching for visualizations that reveal interesting patterns such as correlations, trends, and outliers may become very tedious and time-consuming.

The goal of this work is to extend well-known views such as scatterplots, histograms, or categorical views by integrating on-demand recommendations of view parameterizations that may be worth looking at. Typical examples could include “list all scatterplots showing correlations between data attributes for any data subset”, or “rank all time-series plots by how clearly they show a trend over the past weeks”. Important tasks of this work are thus to:

  • identify meaningful tasks in the context of various visualization types
  • implement corresponding quality metrics, which should ideally be computed efficiently in the background without disturbing the actual analysis (a minimal example of such a metric is sketched after this list)
  • design and implement intuitive ways to present the possible visualization options as previews to the user, in a way that is not obtrusive to the analysis and scales to a large number of possible variants (e.g., by clustering the variants into dissimilar groups).
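
As referenced above, one minimal quality metric could rank all attribute pairs by absolute Pearson correlation to obtain scatterplot recommendations. The sketch below (pandas assumed, purely illustrative) shows the idea; real metrics would cover further tasks such as trends and outliers.

    # Sketch: rank all attribute pairs of a table by absolute Pearson correlation,
    # as a simple quality metric for recommending scatterplots (pandas assumed).
    import itertools
    import pandas as pd

    def rank_scatterplots(df: pd.DataFrame, top=10):
        numeric = df.select_dtypes("number")
        scores = []
        for x, y in itertools.combinations(numeric.columns, 2):
            r = numeric[x].corr(numeric[y])           # Pearson correlation as "interestingness"
            scores.append((abs(r), x, y))
        return sorted(scores, reverse=True)[:top]     # strongest correlations first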

Prerequisites: VIS

Contact: Torsten Möller | Harald Piringer (VRVis)

Automatic basic parametrization in the area of quality assurance solutions for automated production processes

Industry is characterized by automated production processes. Quality assurance is central to monitoring and optimizing these processes. The difficulty in process monitoring is its dependency on many parameters, which are usually set manually. The goal of this project is to identify methods that allow a first automated parametrization of the respective monitoring processes.

Goals and tasks of this project are:

  • Create a task / data requirement analysis
  • Collect, classify, and visualize the data
  • Provide a visual interface to allow the exploration of different results given different parameter settings
  • Iterate with the users to evaluate your design
  • Help build a regression model that matches the given input/output data

Prerequisites: FDA, VIS

Contact: Torsten Möller | Christian Kersjes (Plasmo)

Automatic welding seam detection in data, collected by electro-optical distance measurement (laser triangulation)

Large amounts of data from welding processes have been collected by means of laser triangulation. These data should be analyzed and visualized (using VTK, the Visualization Toolkit, for 2D and 3D graphs). The data analysis concerns the creation of algorithms for automatic seam detection in welding processes.

Goals and tasks of this project are:

  • Implement a segmentation algorithm for welding seam detection (we are open regarding the particular algorithm; this could depend on the expertise of the student. One option would be deep neural networks.)
  • Create a visual interface to allow the manual validation of the segmentation results.

Prerequisites: SIP, possibly IPA

Supervisors: Torsten Möller | Christian Kersjes (Plasmo)

Bird Song Classification using Convolutional Neural Networks

The main task is to find the best-suited deep learning architecture for bird song classification. The process includes non-trivial data preparation: the original sound files must be split into smaller samples of optimal length and converted into spectrograms, which then serve as the input of a convolutional neural network.
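
A rough sketch of the intended pipeline is shown below, assuming librosa for the spectrograms and Keras for the CNN; both libraries, the clip length, and the layer sizes are only assumptions, since finding a good architecture is the subject of the work.

    # Sketch: sound file -> mel spectrogram -> small CNN classifier (librosa + Keras assumed).
    import librosa
    import numpy as np
    from tensorflow.keras import layers, models

    def to_spectrogram(path, clip_seconds=5.0, sr=22050):
        y, sr = librosa.load(path, sr=sr, duration=clip_seconds)
        S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
        return librosa.power_to_db(S)                  # shape: (128, time_frames)

    def build_cnn(input_shape, n_species):
        return models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(n_species, activation="softmax"),
        ])

    # Hypothetical usage (input shape depends on the chosen clip length):
    # model = build_cnn(input_shape=(128, 216, 1), n_species=50)
    # model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])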

The student shall focus on the following points:

  • Find the optimal length of a sound sample, so that the relevant spectrograms can be best classified in a Convolutional Neural Network (CNN)
  • Build a CNN that classifies the spectrograms derived from the original sound files, focusing on accuracy
  • Compare the performance of the constructed architecture with existing, publicly available architectures

Prerequisites: FDA, VIS

Programming language: Python


Contact: Torsten Möller | Elena Ginina (VRVis)

Anomaly Detection in Time Series using Recurrent Neural Networks

The goal of this project is to evaluate the performance of a recurrent neural network (LSTM) for detecting anomalies in time-dependent data in the context of real-time interactive visualization.
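
One minimal, prediction-based variant of the idea is sketched below, assuming Keras (an assumption only; the concrete model, training regime, and thresholding are part of the project): train an LSTM to predict the next value and flag points with a large prediction error as anomalies.

    # Sketch: LSTM next-step prediction; large prediction errors are flagged as anomalies.
    import numpy as np
    from tensorflow.keras import layers, models

    def make_windows(series, length=50):
        X = np.array([series[i:i + length] for i in range(len(series) - length)])
        y = series[length:]
        return X[..., np.newaxis], y            # LSTM expects (samples, timesteps, features)

    def detect_anomalies(series, length=50, threshold=3.0):
        X, y = make_windows(series, length)
        model = models.Sequential([
            layers.Input(shape=(length, 1)),
            layers.LSTM(32),
            layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X, y, epochs=5, batch_size=64, verbose=0)
        errors = np.abs(model.predict(X, verbose=0).ravel() - y)
        return errors > errors.mean() + threshold * errors.std()   # boolean anomaly mask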


The student shall focus on the following points:

  • Study the generic anomalies observed in time-dependent data
  • Implement the Long short-term memory (LSTM) technique for detecting the anomalies
  • Compare and estimate the performance of the LSTM against other existing state-of-the-art techniques
  • Research the possible improvements for real-time calculations


Prerequisites: FDA, VIS

Programming language: Python

Contact: Torsten Möller | Elena Ginina (VRVis)

Text Mining:

Publication Structure Detection using Image Segmentation

Area: Machine Learning

PDFs are an important, if not the most important, way of sharing academic research results, and therefore an important source for data mining applications. In this project we want to extract the different parts of a publication - text, formulas, tables, references, and figures - using deep learning methods. More precisely, we want to create a convolutional neural network for image segmentation.
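
For the first step (distinguishing image from non-image regions per pixel), a tiny fully convolutional network could look like the sketch below. Keras, a fixed page rendering size, and the layer sizes are all assumptions for illustration; the actual architecture is up to the student.

    # Sketch: tiny fully convolutional network for per-pixel binary segmentation
    # (figure vs. non-figure) of rendered publication pages (Keras assumed).
    from tensorflow.keras import layers, models

    def build_segmenter(height=512, width=512):
        return models.Sequential([
            layers.Input(shape=(height, width, 1)),          # grayscale page image
            layers.Conv2D(16, 3, padding="same", activation="relu"),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.Conv2D(1, 1, activation="sigmoid"),       # per-pixel probability of "figure"
        ])

    # Hypothetical usage: masks come from the annotation tool built in the first goal.
    # model = build_segmenter()
    # model.compile(optimizer="adam", loss="binary_crossentropy")
    # model.fit(page_images, figure_masks, epochs=10)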

Goals:

  • Create a training dataset by building a small tool that allows experts to annotate publications. Annotation classes: text, figure, formula, reference, and table.
  • Build and parametrize an image segmentation algorithm based on the annotated publications.
  • In a first step, we would just like to segment images and non-images
  • In a second step, we would like to segment all annotation classes
  • Create a tool which takes PDFs as input and returns segmentation results

Prerequisites: FDA
Programming language: Python

Contact: Christoph Kralj | Torsten Möller

Publication Structure Detection using NLP

Area: Natural Language Processing, Data Mining

PDFs are an important, if not the most important, way of sharing academic research results, and therefore an important source for data mining applications. There are multiple libraries that extract text, but each one has its own drawbacks. The goal of this project is to improve the quality of the extracted texts. This can be achieved by using smart regular expressions and preprocessing techniques like word stemming and sentence segmentation, as well as spell correction and other methods. In addition, the text should be structured into text, figure, formula, reference, and table parts for easier further usage.
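
A small sketch of the kind of cleanup and preprocessing meant here is given below, assuming NLTK for stemming (an assumption only; any NLP toolkit and far more sophisticated rules would be possible).

    # Sketch: regex-based cleanup plus stemming of text extracted from a PDF (NLTK assumed).
    import re
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def clean_extracted_text(raw: str) -> str:
        text = re.sub(r"-\n", "", raw)            # re-join words hyphenated across line breaks
        text = re.sub(r"\s+", " ", text)          # collapse whitespace and stray line breaks
        text = re.sub(r"[^\x20-\x7E]", "", text)  # drop non-printable extraction artifacts
        return text.strip()

    def stem_tokens(text: str):
        return [stemmer.stem(tok) for tok in re.findall(r"[A-Za-z]+", text)]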

Goals:

Prerequisites: FDA
Programming language: Python

Contact: Christoph Kralj 

Visual Document Exploration for Journalists


Typical document categorization systems use automatic clustering. There is evidence that this method does not produce human-understandable categorizations and does not match how a human would categorize documents. This project would combine machine learning with an interactive document exploration system to better support humans in classifying documents.

  • Analyze state-of-the-art in research and practice
  • Identify an interesting test case of a document collection (e.g. wikileaks data, or wikipedia articles)
  • Develop a tool that
    • allows the user to manually group documents,
    • trains and updates a classifier in the background,
    • recommends other documents the journalist might be interested in, and
    • visually represents the data to foster overview, understanding, and usability

Prerequisites: Visualization / FDA

Programming languages: Javascript / Python

Contact: Thomas Torsney-Weir 

Machine Learning:

Visually exploring neural networks

We have a collection of 100,000 different neural networks from the Tensorflow Playground. The core goal of this project is to create a visual interface for understanding some of the basic properties of neural networks. Enabling a user to explore the collection should help answer questions like the relationship between the number of neurons and the number of hidden layers, or the impact of batch size, activation functions, and other parameters on the quality of the network. Your tasks include:

  • fast prototyping with Tableau
  • getting familiar with the data set
  • querying neural network users on what parameters they want to explore (requirement analysis)
  • development of low-fi and high-fi prototypes


Prerequisites: VIS, FDA

Contact: Torsten Möller

Visually Analyzing the Fault Tolerance of Deep Neural Networks

The main objective is to design and implement an effective and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips), based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage which implies lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.

Starting point for the project is a given example data set which contains information about the relationship between single bit flips across various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.
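
For intuition, a single bit flip in a float32 weight can be simulated by reinterpreting the value's 32-bit pattern; the sketch below (NumPy assumed, purely illustrative) shows why flips at different bit positions can have very different effects on the resulting accuracy.

    # Sketch: flip one bit in the IEEE-754 float32 representation of a weight (NumPy assumed).
    import numpy as np

    def flip_bit(value, bit):
        """Return value with bit 0..31 of its float32 representation inverted."""
        arr = np.array([value], dtype=np.float32)
        bits = arr.view(np.uint32)        # reinterpret the same memory as an unsigned integer
        bits ^= np.uint32(1 << bit)       # invert the requested bit in place
        return float(arr[0])

    print(flip_bit(0.75, 31))   # sign bit flipped -> -0.75 (large effect)
    print(flip_bit(0.75, 0))    # lowest mantissa bit flipped -> almost unchanged (tiny effect)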

The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.

Examples for interesting questions are the following:

  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
  • the (empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer of the network architecture

In order to answer these questions, an iterative design process is required, which should

  • start with a requirement analysis (task & data analysis),
  • develop low-fi prototypes,
  • develop high-fi prototypes,
  • refine the design, and
  • constantly evaluate the visual analysis tool.

The data set, the problem setting, and the details of the requirements are provided by Prof. Gansterer; supervision of the visual analysis aspects is provided by Prof. Möller.

Prerequisites: VIS, FDA

Contact: W. Gansterer | Torsten Möller

Library for visualization of slices

This is primarily a programming project. Slicing methods are a novel way of visualizing multi-dimensional data. However, there is no publicly available library for R or Python that makes it easy to use these visualization techniques. The goal of this project is to develop such a library. Students should have knowledge of Javascript and either R or Python.
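
A minimal example of what such slices look like is sketched below: 1D slices of a multi-dimensional function through a focus point (NumPy and matplotlib assumed; designing a clean, reusable API around this idea is exactly what the library project is about).

    # Sketch: 1D slices of a d-dimensional function through a focus point (NumPy/matplotlib assumed).
    import numpy as np
    import matplotlib.pyplot as plt

    def f(x):                                   # example function on [0, 1]^d
        return np.sum(np.sin(3 * x) ** 2, axis=-1)

    focus = np.array([0.5, 0.5, 0.5])           # point all slices pass through
    t = np.linspace(0.0, 1.0, 200)

    fig, axes = plt.subplots(1, len(focus), sharey=True)
    for dim, ax in enumerate(axes):
        pts = np.tile(focus, (len(t), 1))       # keep all other dimensions fixed
        pts[:, dim] = t                         # vary only one dimension
        ax.plot(t, f(pts))
        ax.set_xlabel(f"x{dim}")
    plt.show()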


Prerequisites: VIS
Programming languages: Javascript and (R or Python)

Contact: Thomas Torsney-Weir

Visualization for optimization problems

Can visualization beat traditional (offline) optimization algorithms? The goal of this project is to see how well visually guided optimization can compete with traditional optimization algorithms. Students will develop a visualization system to find optimum configurations of black-box (i.e. unknown) algorithms from a contest.

Prerequisites: VIS, Mathematical Modeling
Programming languages: Javascript, (R or Python), and C++

Contact: Thomas Torsney-Weir

Image Processing:

Visualization-Supported Comparison of Image Segmentation Metrics

Area: Visualization, Image Processing

Segmentation algorithms, which assign labels to each element in a 2D/3D image, need to be evaluated regarding their performance on a given dataset. The quality of an algorithm is typically determined by comparing its result to a manually labelled image. Many metrics can be used to compute a single number representing the similarity of two such segmentation results, all with specific advantages and disadvantages. The goal in this project is to:

  • Research the segmentation metrics in use in the literature.
  • Create a tool that calculates multiple segmentation quality metrics on an image (two common overlap metrics are sketched after this list).
  • With the help of this tool, analyze how the individual segmentation metrics perform in detecting specific kinds of errors in segmentation results, as well as correlations between the metrics.
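
As referenced above, two of the most common overlap metrics for a binary segmentation (Dice and Jaccard) could be computed as in the sketch below (NumPy assumed; the project would cover many more metrics, including boundary- and volume-based ones).

    # Sketch: two common segmentation quality metrics for binary label images (NumPy assumed).
    import numpy as np

    def dice(seg: np.ndarray, gt: np.ndarray) -> float:
        seg, gt = seg.astype(bool), gt.astype(bool)        # assumes non-empty masks
        intersection = np.logical_and(seg, gt).sum()
        return 2.0 * intersection / (seg.sum() + gt.sum())

    def jaccard(seg: np.ndarray, gt: np.ndarray) -> float:
        seg, gt = seg.astype(bool), gt.astype(bool)
        union = np.logical_or(seg, gt).sum()
        return np.logical_and(seg, gt).sum() / union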


Prerequisites: SIP, VIS

Contact: Bernhard Fröhler | Torsten Möller

Ensemble methods in image segmentation

An image segmentation algorithm assigns a label to each pixel. While no segmentation algorithm is always correct, the idea is to work with many different segmentation algorithms that each create a label for a pixel. We call this an ensemble. The idea of this project is to explore how to best combine these different ensemble members to "always" create the right label for the pixel (to explore 'the wisdom of the crowd').
In image segmentation, several methods are known for combining a given collection of segmentation results. For example, voting methods might label a pixel according to the majority of labels for that pixel in the collection. However, such a vote can be ambiguous; therefore, additional rules might be required to arrive at a definitive labeling.
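
A minimal majority-voting sketch over a stack of label images is given below (NumPy assumed, labels assumed to be non-negative integers); pixels without a unique majority are marked so that additional rules, as described above, can handle them.

    # Sketch: pixel-wise majority vote over an ensemble of label images (NumPy assumed).
    # Ambiguous pixels (no unique majority) are marked for handling by further rules.
    import numpy as np

    def majority_vote(label_images, ambiguous_label=-1):
        stack = np.stack(label_images)                 # shape: (n_members, H, W)
        n_labels = stack.max() + 1
        # votes[l, y, x] = number of ensemble members voting for label l at pixel (y, x)
        votes = np.stack([(stack == l).sum(axis=0) for l in range(n_labels)])
        winner = votes.argmax(axis=0)
        top = votes.max(axis=0)
        runner_up = np.sort(votes, axis=0)[-2] if n_labels > 1 else np.zeros_like(top)
        return np.where(top > runner_up, winner, ambiguous_label)   # ties -> ambiguous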

Goal:

  • Gather and/or define a set of useful rules to combine image segmentation results. Furthermore, define a pipeline containing these rules, such that the usage of the rules depends on the parameterization. A simple example: the pipeline could be based on the majority voting rule, combined with intelligent rules for handling ambiguous pixels, for example by considering the neighborhood of the pixel or the uncertainty of the individual segmentation results (if probabilistic segmentation algorithms are used).
  • Explore the parameter space of this generalized pipeline. Set up a framework to “learn” suitable parameters for this pipeline. Test your pipeline on several different datasets and try to come up with optimal parameters. Refine your pipeline until it can produce results at least close to state-of-the-art algorithms for segmenting such images.
  • Once a set of optimal parameters for some limited number of datasets is established, perform experiments on whether the parameters learned for the generalized combination pipeline are transferable to the processing of new datasets, i.e., datasets other than those the parameters were learned on.

Milestones:

  • Definition of a parameterized, rule-based pipeline for (specific) image analysis tasks.
  • Evaluation of the pipeline and refinement of its parameters on a limited number of datasets
  • Application of the pipeline and the found parameters on a broader range of datasets

Prerequisites: VIS, SIP

Contact: Torsten Möller | Bernhard Fröhler

Smart Image Filter Preview

The analysis of large images (2D or 3D) requires applying filters like smoothing or denoising. Finding the most suitable parameters for a given analysis task through a trial-and-error approach can be time-consuming. The goal of this project is to develop a tool that provides a smart preview of the possible outcomes of image processing filters for different parameters on a small region of the image; the outcomes of different parameterizations could, for example, be presented in a matrix. The tool should also be evaluated regarding usability.
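
The preview idea in its simplest form is sketched below: apply one filter with several parameter values to a small region of interest and show the results side by side (SciPy and matplotlib assumed; the real tool would support arbitrary filters and an interactive UI).

    # Sketch: preview a Gaussian smoothing filter for several sigmas on a small ROI
    # (SciPy/matplotlib assumed; placeholder data instead of a real large image).
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.ndimage import gaussian_filter

    image = np.random.rand(512, 512)              # placeholder for a large 2D image
    roi = image[200:264, 200:264]                 # small region the user selected
    sigmas = [0.5, 1.0, 2.0, 4.0]

    fig, axes = plt.subplots(1, len(sigmas))
    for sigma, ax in zip(sigmas, axes):
        ax.imshow(gaussian_filter(roi, sigma=sigma), cmap="gray")
        ax.set_title(f"sigma={sigma}")
        ax.axis("off")
    plt.show()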

Prerequisites: SIP, HCI

Programming languages: Python, C++


Contact: Bernhard Fröhler | Torsten Möller

User Studies:

The Perception of Visual Uncertainty Representation by Non-Experts

Area: Visualization / HCI

Motivation:

  • Understanding / Communicating uncertainty and sensitivity information is difficult
  • Uncertainty is part of everyday life for any type of decision making process
  • Some of the previous studies done are unclear and could be improved

Goals and Tasks of several different projects:

  • Brainstorm about different visual encodings
  • Run and evaluate a larger Amazon Mechanical Turk study

Contact: Torsten Möller | Thomas Torsney-Weir

Machine learning to detect user behavior

The specific application is to detect "click-throughers" in crowdsourced studies such as those run on Amazon Mechanical Turk. Click-throughers produce inconsistent results in crowdsourced studies because they answer randomly to get through the study as quickly as possible rather than thinking carefully about their choices. We would like to develop algorithms to detect these participants.
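
One possible approach is sketched below: summarize each participant's mouse trace with a few features and train a standard classifier (NumPy and scikit-learn assumed; the feature set, the labels, and the choice of model are exactly the open questions of the project).

    # Sketch: per-participant features from mouse traces + a standard classifier
    # (NumPy/scikit-learn assumed; `traces` and `labels` below are hypothetical data).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def trace_features(xy, timestamps):
        """xy: (n, 2) cursor positions, timestamps: (n,) seconds."""
        steps = np.diff(xy, axis=0)
        dist = np.linalg.norm(steps, axis=1)
        dt = np.diff(timestamps)
        speed = dist / np.maximum(dt, 1e-6)
        return [dist.sum(), speed.mean(), speed.std(), timestamps[-1] - timestamps[0]]

    # X = np.array([trace_features(t.xy, t.time) for t in traces])
    # y = np.array(labels)                                    # 1 = click-througher
    # clf = RandomForestClassifier(n_estimators=200).fit(X, y)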

Goals and tasks of this project are:

  • Verify mouse tracking data from previous studies
  • Analyze mouse tracking data to understand different user behaviors 
  • Use a machine learning algorithm to classify users based on mouse tracking data


Prerequisites: FDA, HCI

Contact: Thomas Torsney-Weir

e-learning methods for visualization

Evaluating the effectiveness of visualizations as a whole is difficult. e-learning technology lets us measure how well people accomplish different learning objectives. Is it possible to use this for evaluating visualizations?

Goals and tasks of this project are:

  • Conduct a literature review on e-learning technology
  • Design and execute a user study using these methods
  • Evaluate the suitability of these methods for assessing visualization effectiveness

Prerequisites: VIS

Contact: Thomas Torsney-Weir

Theory of Vis:

Visualization for small screen devices

People consume more and more information through small-screen devices like phones and tablets. We want to investigate what visualizations are possible on such devices.

Goals and tasks of this project are:

  • Create a list (with examples) of visualization methods that can be implemented on a small screen
  • Explain what methods are and are not possible and why

Prerequisites: VIS, HCI 

Contact: Thomas Torsney-Weir

Transformations of visualizations

We want to investigate how programming language paradigms, like the functional paradigm underlying Haskell, can be used in visualization. We want to understand how changing the underlying data should change the visual representation. Furthermore, we want to investigate how visualizations can be combined with each other in an automated fashion. The overall goal is to have a visualization library that can automatically build visualizations of complex datasets.

An interested student is not expected to work on all of the following sub-goals. We can design the project based on interest/skillset.

Specific sub-goals:

  • Develop a visualization library in Haskell or Purescript
  • Investigate how category theory fits into visualization

Prerequisites: VIS
Programming languages: Haskell

Contact: Thomas Torsney-Weir

iTuner: Touch interfaces for high-D visualization


In order to understand simulations, machine learning algorithms, and geometric objects, we need to interact with them. This is difficult to perform with something like a mouse, which only has 2 axes of movement. Multitouch interfaces let us develop novel interactions for multi-dimensional data. The goals of this project are:

  • Develop a touch-screen interface for navigating high-dimensional spaces.
  • Design the user interface for a tablet (iPad), to be used in concert with a larger screen such as a monitor or television.

Prerequisites: VIS, HCI

Contact: Thomas Torsney-Weir

Multiverse analysis

Many people have discussed and analyzed the effect of replicating studies in terms of data collection. However, the choice of methods used to analyze the data obviously makes a difference as well. There is a variety of options for data cleaning and statistical tests that one can use to analyze the data. We want to examine how these different methods can affect the outcome, using a variety of datasets. How would the significance and the power of the tests change with different types of analysis?
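
A minimal version of such a pipeline is sketched below: run every combination of a few cleaning choices and statistical tests on the same data and record the resulting p-values (NumPy/SciPy assumed; collecting the realistic analysis options and datasets is part of the project).

    # Sketch: run all combinations of cleaning rules and statistical tests on the same data
    # and collect the resulting p-values (NumPy/SciPy assumed).
    import itertools
    import numpy as np
    from scipy import stats

    cleaning_rules = {
        "none":        lambda x: x,
        "drop_3sigma": lambda x: x[np.abs(x - x.mean()) < 3 * x.std()],
    }
    tests = {
        "t_test":       lambda a, b: stats.ttest_ind(a, b).pvalue,
        "mann_whitney": lambda a, b: stats.mannwhitneyu(a, b).pvalue,
    }

    def multiverse(group_a, group_b):
        results = {}
        for (c_name, clean), (t_name, test) in itertools.product(cleaning_rules.items(), tests.items()):
            results[(c_name, t_name)] = test(clean(group_a), clean(group_b))
        return results   # one p-value per path through the analysis multiverse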

Goals and tasks of this project are:

  • Collect a number of open-data user studies from the visualization and HCI community
  • Design a pipeline, using e.g. R or Python, to run a number of different statistical analyses of the data
  • Sample this pipeline and analyze the results


Prerequisites: VIS

Contact: Thomas Torsney-Weir

Analysis of University Data:

Create a Moodle student interface to see personal learning goals

During a semester, students are able to use materials in Moodle at their own pace. The grade view is very important for this task, because this is where students can review their assessments. In Moodle this is currently realized as a table. How could this interface be improved to fit the needs of students better?

  • perform a requirement analysis. This has two parts:
    • user-centered: interview a number of different potential users and ask them what they would need and expect from the Moodle overview
    • data-centered: create an overview of courses taught at the faculty of computer science and create an overview of the different evaluation schema
    • analyse the available possibilities in Moodle and compare them with the requirement analysis; indicate which features already exist and which need to be developed
  • develop low-fi prototypes, evaluate them with a number of potential users
  • propose a hi-fi prototype and develop it (as a Moodle plugin)
  • evaluate the hi-fi prototype and improve it according to the feedback you get

Prerequisites: finished the Human Computer Interaction class
Programming: JavaScript (D3.js)

Contact: Raphael Sahann | Torsten Möller | Daniel Handle-Pfeiffer (of the CTL)


Enhance the existing Moodle teacher interface for grading

The Moodle teacher overview page of a course shows all partial grades that students in a particular course received over the course of the semester. Since all the data is shown in a single large table, it is hard to perceive, and teachers do not get an overview of how students are doing in their course. The task is to create an interactive overview that shows the overall performance of the course and highlights outliers (e.g., students that need more help). This view should be enhanced with data from previous semesters to compare how students progressed in the past.

  • Create an interactive overview for teachers that shows the overall performance of students in the course and highlights outliers (e.g. students that need more help)
    • It should be built in such a fashion that data can be gradually added over the course of the semester and the view updates in a meaningful fashion
  • The overview should be further enhanced with data from previous semesters to compare how students progressed in the past
  • Simple data analysis tasks should also be possible with the interface (e.g. correlation analysis of grades)
  • Develop the interface as a Moodle plugin

Prerequisites: finished the Human Computer Interaction class or read the book "Visualization Analysis and Design" by T. Munzner
Programming: JavaScript (D3.js)

Contact: Raphael Sahann | Torsten Möller | Daniel Handle-Pfeiffer (of the CTL)


Study Path Overview Visualization

Data shows that almost all students take different courses at different times during their studies. The main question in this project is: how do students actually complete their studies, and how similar are the paths they take to do so?

  • create an interface which visualizes multiple student paths at once
  • find meaningful ways to filter and select groups of similar paths to compare them and explore trends


Prerequisites: finished the Human Computer Interaction class
Programming: JavaScript (D3.js)

Contact: Raphael Sahann 

Study Path Detail Visualization

Data shows that almost all students take different courses at different times during their studies. How can the individual path a student takes be visualized in such a fashion that it gives insight into what happened during the studies and makes it comparable to a second path?

  • Find a suitable visualization for the path which an individual student takes through his/her studies
  • Create a dashboard that lets the user explore and compare two study paths in detail


Prerequisites: finished the Human Computer Interaction class
Programming: JavaScript (D3.js)


Contact: Raphael Sahann 

Visualize grades in relation to courses and their semesters

The paper "Visualizing Student Histories Using Clustering and Composition" by Trimm et al. presents a visualization method that relates grades with courses taken in a particular semester. This should be transferred to student data of the University of Vienna.

  • automatically enhance the existing data of the University of Vienna to fit into the data scheme of the paper
  • reproduce the interface presented in the paper with our own data

Prerequisites: knowledge of data clustering and enhancing
Programming: expert knowledge of any graphical programming language of choice (WebGL is the best fit, possibly D3)


Contact: Raphael Sahann