Open Topics

Text Mining:

Explore novel interaction techniques - Chatbots for Data Analysis

Chatbots are becoming more prevalent and are actively used by many companies. They offer a voice or text interface for interacting with a computer. An example of a chatbot is Amazon’s Alexa, which can tell the time when asked.
The goal of this project is to find new possible ways to interact with an exploratory data analysis tool. Developing new interaction techniques would allow the user to explore and understand the data in a new fashion. For example, it could be possible to have a chat window next to a scatterplot that enables the user to enter queries such as: ‘show me the average’, which would then be reflected in the scatterplot.
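
As a rough illustration of the scatterplot example, the following minimal sketch maps a typed query to an aggregate over the plotted points. The keyword table and the function name `interpret` are hypothetical and chosen only for this example; a real system would use proper natural language processing rather than keyword matching.

```python
from statistics import mean

# Hypothetical keyword table: maps a word in the query to an aggregate function.
AGGREGATES = {
    "average": mean,
    "mean": mean,
    "maximum": max,
    "minimum": min,
}

def interpret(query, points):
    """Return (keyword, (x, y)) for the first known keyword in the query, else None."""
    text = query.lower()
    for keyword, func in AGGREGATES.items():
        if keyword in text:
            xs = [p[0] for p in points]
            ys = [p[1] for p in points]
            return keyword, (func(xs), func(ys))
    return None

points = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]
print(interpret("Show me the average", points))  # ('average', (2.0, 5.0))
```

The returned coordinates could then be drawn as an extra marker in the scatterplot.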

Learn about natural language processing
Understand and compare interaction techniques
Develop a ‘conversation’ with a data analysis tool

Prerequisites: VIS, HCI

Contact: Torsten Möller

Machine Learning:

Density deconvolution on Gaia data: estimating accurate distances to millions of stars

Project Overview:

This project aims to tackle the challenge of identifying stellar clusters using data from the Gaia satellite, focusing on the application of density-based clustering methods. A significant limitation in this task arises from noise-corrupted samples, which obscure critical details and lead to less accurate clustering outcomes. The primary goal of this thesis will be to implement and refine a density deconvolution approach to recover the noise-free underlying distribution function common to all samples, enhancing the accuracy and reliability of stellar cluster identification.

Background:

Density estimation is a fundamental statistical task involving the estimation of a distribution's density from a finite set of measurements. In many scientific fields, researchers are often limited to working with noise-corrupted measurements. When the statistics of the noise are known, density deconvolution methods can be employed to approximate the density function of the unobserved, noise-free samples, as opposed to the noisy measurements themselves. The Gaia satellite's measurements present a scenario of additive noise, where the observed samples are the result of adding independent noise to unobserved (hidden) values.
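
The additive noise model can be illustrated with a minimal sketch for the simplest case, assuming Gaussian hidden values and Gaussian noise with a known standard deviation. The numbers below are synthetic and chosen only for the demonstration; they are not Gaia data, and real deconvolution methods recover the full density, not just its variance.

```python
import random
from statistics import pvariance

random.seed(0)

TRUE_STD = 1.0   # spread of the hidden, noise-free values (assumed for the demo)
NOISE_STD = 0.5  # known measurement noise (assumed for the demo)

# Observed sample = hidden value + independent noise.
hidden = [random.gauss(0.0, TRUE_STD) for _ in range(20000)]
observed = [x + random.gauss(0.0, NOISE_STD) for x in hidden]

# For independent additive noise the variances add, so the known noise
# variance can be subtracted to estimate the variance of the hidden values.
naive_var = pvariance(observed)
deconvolved_var = naive_var - NOISE_STD ** 2

print("variance of noisy samples:", round(naive_var, 2))
print("deconvolved estimate:     ", round(deconvolved_var, 2))
```

The noisy samples overestimate the spread of the hidden distribution, while the corrected estimate lands close to the true variance of 1.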

Aims and Expected Outcomes:

The aim is to understand and implement existing deconvolution approaches on Gaia data. The successful completion of this project is expected to yield a comparison of existing deconvolution methods and, ideally, an extension of an existing methodology tailored to astronomical data. This work could significantly enhance the precision of astrophysical research and potentially lead to new discoveries in the field.

Suitable for Candidates Who:

- Have an interest in applied data science and a solid understanding of statistical and machine learning concepts.

- Possess strong programming skills in Python.

Contact: Sebastian Ratzenböck

Scalable Sampling with Lattices

The challenge is easy to describe:

Come up with an algorithm that takes as input N (the number of samples) and D (the dimension) and outputs a scale factor s and a rotation alpha (D-1 angles) such that exactly N points of a Cartesian lattice fit in the unit box [0,1]^D.
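
To make the problem statement concrete, here is a brute-force checker for the 2D case only. It is not the algorithm asked for: it merely counts how many lattice points land in the unit square for a given scale and angle. Anchoring the lattice at the origin and the small boundary tolerance `tol` are assumptions made for this sketch.

```python
import math

def count_lattice_points(scale, angle, tol=1e-9):
    """Count points scale * R(angle) * (i, j), with integers i and j, inside [0,1]^2."""
    c, s = math.cos(angle), math.sin(angle)
    # Every point in the unit square has norm <= sqrt(2), which bounds |i| and |j|.
    reach = math.ceil(math.sqrt(2.0) / scale) + 1
    count = 0
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            x = scale * (c * i - s * j)
            y = scale * (s * i + c * j)
            if -tol <= x <= 1.0 + tol and -tol <= y <= 1.0 + tol:
                count += 1
    return count

print(count_lattice_points(0.5, 0.0))          # 9: a 3 x 3 axis-aligned grid
print(count_lattice_points(0.5, math.pi / 4))  # 5
```

The sought algorithm would invert this relationship: given N and D, return a scale and rotation for which the count equals N.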

Prerequisites: Math

Contact: Torsten Möller

Interpreting Deep Clustering Results

Deep embedded clustering, also called deep clustering, is a growing field that combines ideas from clustering and deep learning. The integration of these techniques makes it possible to learn features automatically from the data to increase clustering performance. Current deep clustering methods are hard to interpret, making it difficult to understand how a clustering result was reached.

The goal of this project is to develop an interactive visualization tool, e.g. a web based application, for exploring the predictions of deep clustering algorithms and helping to understand their decision making process.
The student is expected to do a literature review of existing visualization techniques developed for (supervised) deep learning, e.g. feature visualizations, that could be applicable to interpreting unsupervised deep clustering algorithms. The identified methods should then be applied to existing deep clustering algorithms (and adapted if necessary) and compared.

Some research questions of interest that should be considered during the project would be: How suitable are existing visualization techniques to interpret deep clustering results? How do the different parts of the multi-objective loss of deep clustering techniques relate to each other? Considering multiple clustering models, e.g. K-Means vs DBSCAN, how do the neural network visualizations differ for each of them?

Students working on this project need basic background knowledge in machine learning (e.g. Foundations of Data Analysis), visualisation (e.g. Visualisation and Visual Data Analysis), solid programming skills in Python, and desirably some background with PyTorch, deep learning and some visualization framework, like d3.

Prerequisites: VIS, FDA

Contact: Aleksandar Doknic, Torsten Möller, in collaboration with Claudia Plant & Lukas Miklautz

Visually exploring neural networks

We have a collection of 100,000 different neural networks from the TensorFlow Playground. The core goal of this project is to create a visual interface for understanding some of the basic properties of neural networks. Letting a user explore this collection should help answer questions about, for example, the relationship between the number of neurons and the number of hidden layers, and the impact of batch size, activation functions and other parameters on the quality of the network. Your tasks include:

fast prototyping with Tableau
getting familiar with the data set
querying neural network users on what parameters they want to explore (requirement analysis)
development of low-fi and high-fi prototypes

Prerequisites: VIS, FDA

Contact: Torsten Möller

Visually Analyzing the Fault Tolerance of Deep Neural Networks

The main objective is to design and implement a good and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips) based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage which implies lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.

The starting point for the project is a given example data set which contains information about the relationship between single bit flips across various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.
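
To illustrate why the position of a bit flip within the floating-point representation matters, the following minimal sketch flips a single bit of a 32-bit IEEE 754 float. The choice of 32-bit floats and the helper name `flip_bit` are assumptions for this example; the given data set may use a different representation.

```python
import struct

def flip_bit(value, bit):
    """Flip one bit (0 = least significant mantissa bit, 31 = sign) of a float32."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer, flip, and convert back.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

for bit in (0, 22, 23, 30, 31):
    print(bit, flip_bit(1.0, bit))
```

Starting from a weight of 1.0, flipping the highest mantissa bit (22) gives 1.5, the lowest exponent bit (23) gives 0.5, the highest exponent bit (30) gives inf, and the sign bit (31) gives -1.0, so a single flip can range from negligible to catastrophic.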

The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.

Examples for interesting questions are the following:

(empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
(empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
(empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer in the network architecture

Answering these questions requires an iterative design process consisting of:

requirement analysis (task & data analysis)
low-fi prototypes
high-fi prototypes
refinement
constant evaluation of the visual analysis tool.

The data set, the problem setting and the details of the requirements are provided by Prof. Gansterer; the supervision in visual analysis aspects is provided by Prof. Möller.

Prerequisites: VIS, FDA

Contact: W. Gansterer | Torsten Möller

Library for visualization of slices

This is primarily a programming project. Slicing methods are a novel way of visualizing multi-dimensional data. However, there is no publicly available library for R or Python that makes it easy to use these visualization techniques. The goal of this project is to develop such a library. Students should have knowledge of JavaScript and either R or Python.

Prerequisites: VIS
Programming languages: JavaScript and (R or Python)

Contact: Torsten Möller | Thomas Torsney-Weir

Visualization for optimization problems

Can visualization beat traditional (offline) optimization algorithms? The goal of this project is to see how well visually guided optimization can compete with traditional optimization algorithms. Students will develop a visualization system to find optimum configurations of black box (i.e. unknown) algorithms from a contest.

Prerequisites: VIS, Mathematical Modeling
Programming languages: JavaScript, (R or Python), and C++

Contact: Torsten Möller | Thomas Torsney-Weir

Clustering in high noise astronomical data

Area: Machine Learning

With the second Gaia data release, astronomers have been flooded with data. One interesting research question that Gaia helps answer concerns open clusters (OCs). OCs are groups of stars born in the same place and at the same time, and they are regarded as the building blocks of galaxies. Gaia provides precise positions and velocities of 1.6 billion stars, which are the features used to extract these open clusters. However, OCs constitute only a small fraction of the full data set and are embedded in a sea of field stars, which we consider noise. Hence, the results of density-based clustering techniques depend strongly on small changes in the algorithm's hyperparameters. The goal of this project is to survey current techniques that can help to better extract these OCs from the Gaia catalog and potentially to design new algorithms that better suit this task.

Unsupervised learning
Density based clustering
Big Data

Prerequisites: FDA

Contact: Sebastian Ratzenböck

Understanding anomalies in high-dimensional spaces

The goal of this project is to visually explain how datasets with varying number of features (or dimensions) per data point behave differently. We understand that high-dimensional spaces behave differently than low-dimensional spaces, but it is difficult to develop an intuition of what these differences are and how they affect datasets.

Visualization could help us intuitively understand what happens to the distances, local geometry, projections, etc. when data points have 3, 10 or 50 dimensions. In other words: Show how the dimensionality of the data affects mathematical operations (dist(X,Y), PCA(X), etc.).
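
One well-known effect of this kind is that distances tend to "concentrate" as the dimension grows: the nearest and farthest neighbors of a point become almost equally far away. The following minimal sketch measures this on uniformly random synthetic points; the function name `relative_contrast`, the sample size and the uniform distribution are choices made only for this illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one query point to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

for dim in (3, 10, 50):
    print(dim, round(relative_contrast(dim), 2))
```

The printed contrast shrinks as the dimension increases, which is one of the behaviors a visualization could make tangible.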

1. Create synthetic datasets to play around with

2. Validate with real datasets with different numbers of dimensions

3. Validate your approach with a user study

Optional: How does data dimensionality affect ML models?

Prerequisites: FDA, optional: VIS

Contact: Aleksandar Doknic, Torsten Möller

Data Visualization/Human Computer Interaction:

Dashboard for Heart Rate Estimations During Football

The goal is to provide real-time heart rate estimates to coaches during a football match. Software is needed that collects data from a local network and stores it in a database (the manufacturer of the sensor infrastructure will help to clarify implementation details). This data will then be used to model/predict the heart rate of all players on the field. The results are to be displayed in a dashboard that is clear, concise and easy to use during a stressful football match. The software should be modular in the sense that the prediction model is interchangeable. In the scope of the project, only a simple prediction model is to be implemented. The software should run on a laptop/tablet.
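
One possible way to keep the prediction model interchangeable is to have the dashboard depend only on a small interface. The sketch below is purely illustrative: the interface `HeartRateModel`, the two toy models and the assumption that a model receives a short history of beats-per-minute values are all hypothetical, since the actual sensor data format is defined by the manufacturer.

```python
from typing import Protocol, Sequence

class HeartRateModel(Protocol):
    """Anything with a matching predict() method can be plugged into the dashboard."""
    def predict(self, recent_bpm: Sequence[float]) -> float: ...

class LastValueModel:
    """Simplest baseline: repeat the most recent value."""
    def predict(self, recent_bpm):
        return recent_bpm[-1]

class MovingAverageModel:
    """Average over the last `window` values."""
    def __init__(self, window=3):
        self.window = window

    def predict(self, recent_bpm):
        tail = recent_bpm[-self.window:]
        return sum(tail) / len(tail)

def dashboard_row(player, recent_bpm, model: HeartRateModel):
    """Format one dashboard line; the model is passed in, not hard-coded."""
    return f"{player}: {model.predict(recent_bpm):.0f} bpm"

print(dashboard_row("Player 7", [150, 160, 170], MovingAverageModel()))  # Player 7: 160 bpm
print(dashboard_row("Player 7", [150, 160, 170], LastValueModel()))      # Player 7: 170 bpm
```

Swapping the model then requires no change to the dashboard code.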

It might be necessary to sign an NDA with the manufacturer.

Prerequisites: Python (for prediction model), VIS

Contact: Tjorven Schnack

Visual complexity of data visualizations

In this project, you will design an online study to investigate how people perceive the complexity of different charts. It is currently unclear what aspects of charts influence people’s complexity ratings. This could include dimensions such as different chart types, different numbers of data categories, and differences in the dimensions of the data.

Here, we aim to investigate the complexity of data visualizations through an experiment. Drawing from a previous study on web pages, the approach will focus on evaluating participants' perceptions of visual complexity (and potentially aesthetic characteristics) in data visualizations. We will gather demographic information, including age and sex, as well as information about participants’ experiences and opinions, such as their familiarity with data visualization or political leaning. The core study will present participants with a series of data visualizations for a brief duration, followed by a ranking task to assess perceived visual complexity and aesthetic attributes (informed by a validated scale for measuring aesthetic pleasure of visual representations by He et al., 2022). After the task, participants will be asked feedback questions about their perceptions of visual complexity in data visualizations.

Background literature:
He, T., Isenberg, P., Dachselt, R. and Isenberg, T., 2022. BeauVis: A Validated Scale for Measuring the Aesthetic Pleasure of Visual Representations. IEEE Transactions on Visualization and Computer Graphics, 29(1), pp.363-373.

Prerequisites: Vis, HCI
Contact: Torsten Möller, Laura Koesten

Mistrust in data (visualizations)

Political and societal trust are declining, fueled by polarization and disinformation. Media studies often assume baseline trust in information sources, neglecting widespread distrust in the press. Visual representations of data are not exempt from such dynamics, and the number of decisions around data and visual design additionally interact with feelings of trust or mistrust. In this project, we want to explore ways to create intentional distrust in visualizations to facilitate critical thinking. This will be informed by the literature on uncertainty visualizations but go beyond that to create several prototypes that create distrust, compare those in an experimental study, and evaluate levels of mistrust amongst participants.

Background literature:
Ge, L.W., Cui, Y. and Kay, M., 2023, April. CALVI: Critical Thinking Assessment for Literacy in Visualizations. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1-18). https://dl.acm.org/doi/abs/10.1145/3544548.3581406

Pandey, S., McKinley, O.G., Crouser, R.J. and Ottley, A., 2023, October. Do You Trust What You See? Toward A Multidimensional Measure of Trust in Visualization. In 2023 IEEE Visualization and Visual Analytics (VIS) (pp. 26-30). IEEE. https://ieeexplore.ieee.org/abstract/document/10360907

Prerequisites: Vis, HCI
Contact: Laura Koesten, Torsten Möller

Personal data visualizations

By better comprehending the subjective and influential elements within a visualization, we can more effectively examine design choices and their impact on an audience. In this project, we explore how personal identification with the data in a visualization elicits emotional and behavioral changes in viewers.
This will be done via several techniques, including creating a visualization that integrates a photo of the viewer into depictions of people in the data. The approach will be tested in a laboratory user study. Tasks include the creation of the prototype, the study design and setup, data analysis, and write-up.

Background literature:
Campbell, S. and Offenhuber, D., 2019. Feeling numbers: The emotional impact of proximity techniques in visualization. Information Design Journal, 25(1), pp.71-86.
https://www.jbe-platform.com/content/journals/10.1075/idj.25.1.06cam

Boy, J., Pandey, A.V., Emerson, J., Satterthwaite, M., Nov, O. and Bertini, E., 2017, May. Showing people behind data: Does anthropomorphizing visualizations elicit more empathy for human rights data?. In Proceedings of the 2017 CHI conference on human factors in computing systems (pp. 5462-5474). https://doi.org/10.1145/3025453.3025512

Prerequisites: Vis, HCI
Contact: Laura Koesten, Torsten Möller

Cognitive load and emotions in the context of data visualizations

This project will investigate the interplay between cognitive load and emotions, delving into how the cognitive demands of a data visualization influence emotional responses. The findings will create insights that inform the design of more effective data visualizations and strategies for managing emotional responses in information-processing contexts with data visualizations. You will design a study using eye-tracking technology, cognitive load assessments, and emotional self-reporting methods. The project will be conducted in collaboration with the AIT - Austrian Institute of Technology (Markus Murtinger).

Background literature:
Huang, W., Eades, P. and Hong, S.H., 2009. Measuring effectiveness of graph visualizations: A cognitive load perspective. Information Visualization, 8(3), pp.139-152.
https://journals.sagepub.com/doi/10.1057/ivs.2009.10

Prerequisites: Vis
Contact: Laura Koesten, Torsten Möller

Visualizing results from a forced choice experiment

In a forced-choice experiment, a participant is presented with two or more stimuli and must choose one of them; a neutral or custom response is not allowed.
This project will investigate how to visualize results from such experiments by exploring existing options and developing new or adapted visualization prototypes, which will be evaluated in a design study with users.
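
As a starting point for thinking about what such a visualization has to show, the following minimal sketch aggregates pairwise forced-choice trials into a win rate per stimulus. The trial format `(shown_a, shown_b, chosen)` and the function name `win_rates` are assumptions for this example; real experiments may present more than two alternatives per trial.

```python
from collections import defaultdict

def win_rates(trials):
    """trials: iterable of (shown_a, shown_b, chosen). Returns {stimulus: share of its trials won}."""
    shown = defaultdict(int)
    won = defaultdict(int)
    for a, b, chosen in trials:
        if chosen not in (a, b):
            raise ValueError("forced choice: the chosen stimulus must be one of the two shown")
        shown[a] += 1
        shown[b] += 1
        won[chosen] += 1
    return {stimulus: won[stimulus] / shown[stimulus] for stimulus in shown}

trials = [("A", "B", "A"), ("A", "C", "A"), ("B", "C", "C"), ("A", "B", "B")]
print(win_rates(trials))
```

A single number per stimulus hides who beat whom, which is exactly the kind of information loss that alternative visual encodings could address.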

Prerequisites: Vis
Contact: Torsten Möller, Laura Koesten

Journaling data visualizations: An (Online) diary study

Data about various topics are often communicated visually through diverse sources. But what kind of data visualizations do lay people see in their everyday life? In which circumstances do they consume data and which news sources do they use? What kind of data representations catch people’s attention, and what do people like or trust? In this project, you will create a study design in which lay participants are asked to keep an (online) diary of their day-to-day encounters with data visualizations. This could involve taking screenshots or pictures of data visualizations, as well as documenting their opinions about them. You will conduct a thematic analysis of the gathered data visualizations, participants’ data consumption habits, as well as their assessments of different aspects of the visualizations (e.g. understandability, trustworthiness, aesthetics).

Prerequisites: FDA, HCI, VIS
Programming: Python or R
Field of research: Data visualization, HCI

Contact: Laura Koesten

HDI - Human Data Interaction

Explaining descriptive statistics

Anscombe's Quartet [1] demonstrated quite vividly that summary statistics can be very misleading or, at least, hard to interpret. More recently, this example has been given a playful twist with the Datasaurus Dozen [2]. However, there are a number of statistical measures that don't have an easy (visual) explanation. One of them is Krippendorff's alpha [3], a very common measure in the social sciences for the agreement between subjective coders (as in labeling text or documents). The challenge of this project will be to:

understand the measure
develop simple alternatives
develop different visual representations that "bring this measure to life", i.e. make it easy(er) to understand
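
As a first step toward understanding the measure, the following minimal sketch computes Krippendorff's alpha for nominal labels from its coincidence-matrix definition, alpha = 1 - D_o / D_e (observed over expected disagreement). The function name and the input format (one list of labels per coded unit) are choices made for this example, and returning NaN when only one label occurs is a convention of this sketch.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists; each inner list holds the labels that the
    coders assigned to one unit (missing labels are simply left out).
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with fewer than two labels contributes no pairs
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one label in use: alpha is undefined
    return 1.0 - (n - 1) * observed / expected

# Two coders, four units; they disagree on the last unit only.
units = [["a", "a"], ["a", "a"], ["b", "b"], ["a", "b"]]
print(round(krippendorff_alpha_nominal(units), 3))  # 0.533
```

Note that a single disagreement in four units already pulls alpha down to about 0.53, which hints at why the measure is hard to interpret intuitively.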

Prerequisites: VIS

Contact: Torsten Möller, Aleksandar Doknic

[1] Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899, see also en.wikipedia.org/wiki/Anscombe%27s_quartet

[2] Justin Matejka, George Fitzmaurice (2017), "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing," ACM SIGCHI Conference on Human Factors in Computing Systems. see also www.autodesk.com/research/publications/same-stats-different-graphs

[3] Krippendorff, Klaus (1970). Estimating the reliability, systematic error, and random error of interval data. Educational and Psychological Measurement, 30 (1), 61–70. see also en.wikipedia.org/wiki/Krippendorff%27s_alpha

Image Processing:

Usability Evaluation of Open Source Volume Analysis Software

open_iA enables users to perform general and specialized visual analysis and processing of volumetric datasets (such as from a computed tomography device). Since it has been developed mainly as a basis for research prototypes, its user interface has so far not been developed with usability as a primary concern.

The goals of this project are:

To evaluate the usability of its general capabilities, and optionally of its advanced visual analysis tools. This could for example happen through usability interviews, or user studies comparing it to other (open source and commercially available) solutions.
To find innovative ways of overcoming the problems found in the evaluation.
Depending on time and interest, to implement some or all of these improvements.

Prerequisites: finished the Signal and Image Processing & the Human Computer Interaction class

Contact: Bernhard Fröhler

Smart Image Filter Preview

The analysis of large images (2D or 3D) requires applying filters such as smoothing or denoising. Finding the most suitable parameters for a given analysis task through trial and error can be time-consuming. The goal of this project is to develop a tool that gives a smart preview of the possible outcomes of image processing filters under different parameters for a small region of the image. The outcomes of different parameterizations could, for example, be presented in a matrix. The tool should also be evaluated regarding usability.
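
The matrix idea can be sketched in a few lines, assuming a simple mean (box) filter with two parameters, the window radius and the number of passes. The filter, the parameter names and the region format `(top, left, height, width)` are assumptions for this illustration; an actual tool would call the filters of an image processing library.

```python
def box_blur(patch, radius):
    """Mean filter with a (2*radius+1)^2 window; at the edges only existing pixels are used."""
    h, w = len(patch), len(patch[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                patch[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) / len(window)
    return out

def preview_matrix(image, region, radii, passes):
    """Rows = radii, columns = number of passes; each cell holds the filtered region."""
    top, left, height, width = region
    patch = [row[left:left + width] for row in image[top:top + height]]
    matrix = []
    for radius in radii:
        row = []
        for n in passes:
            result = patch
            for _ in range(n):
                result = box_blur(result, radius)
            row.append(result)
        matrix.append(row)
    return matrix

image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
previews = preview_matrix(image, region=(0, 0, 3, 3), radii=[0, 1], passes=[1, 2])
print(previews[1][0][1][1])  # 1.0: the bright pixel averaged over its 3 x 3 window
```

Because only the small region is filtered, all parameter combinations stay cheap to compute; the price is that pixels near the region border can differ from a filter run on the full image.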

Prerequisites: SIP, HCI

Programming languages: Python, C++

Contact: Bernhard Fröhler | Torsten Möller

Theory of Vis:

iTuner: Touch interfaces for high-D visualization

In order to understand simulations, machine learning algorithms, and geometric objects we need to interact with them. This is difficult to perform with something like a mouse which only has 2 axes of movement. Multitouch interfaces let us develop novel interactions for multi-dimensional data. The goals of this project are:

Develop a touch-screen interface for navigating high-dimensional spaces.
Design the user interface for a tablet (iPad) to be used in concert with a larger screen such as a monitor or television.

Prerequisites: VIS, HCI

Contact: Torsten Möller | Thomas Torsney-Weir
