Open Topics
Text Mining:
- Explore novel interaction techniques - Chatbots for Data Analysis
Machine Learning:
- Scalable Sampling with Lattices
- Me and my algorithm
- Interpreting Deep Clustering Results
- Visually exploring neural networks
- Visually Analyzing the Fault Tolerance of Deep Neural Networks
- Library for visualization of slices
- Visualization for optimization problems
- Clustering in high noise astronomical data
- Exploratory data analysis in Gaia
- Understanding anomalies in high-dimensional spaces
Data Visualization / Human Computer Interaction:
- Journaling data visualizations: An (Online) diary study
- Studying visualizations in different ways
HDI - Human Data Interaction:
- Creating data showcases
- Explaining descriptive statistics
Digital Humanism:
- An AI's Perspective on Democracy
- ChatGPT - Threat or Hope for Democracy - Discussing Fake News with an AI
Image Processing:
- Usability Evaluation of Open Source Volume Analysis Software
- Smart Image Filter Preview
Theory of Vis:
- iTuner: Touch interfaces for high-D visualization
Text Mining:
Explore novel interaction techniques - Chatbots for Data Analysis
Chatbots are becoming more prevalent and are actively used by many companies. They offer a voice or text interface for interacting with a computer. An example of a chatbot is Amazon's Alexa, which can tell you the time when asked.
The goal of this project is to find new ways to interact with an exploratory data analysis tool. Developing new interaction techniques would allow users to explore and understand data in a new fashion. For example, a chat window next to a scatterplot could let the user enter queries such as 'show me the average', which would then be reflected in the scatterplot.
- Learn about natural language processing
- Understand and compare interaction techniques
- Develop a ‘conversation’ with a data analysis tool
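The query-to-operation idea above can be sketched in a few lines; the keyword matching below is only a toy stand-in for real natural language processing, and all names are illustrative:

```python
import statistics

def handle_query(query, values):
    """Toy query handler: maps a few hard-coded phrases to
    aggregate operations on a list of values."""
    q = query.lower()
    if "average" in q or "mean" in q:
        return statistics.mean(values)
    if "median" in q:
        return statistics.median(values)
    if "maximum" in q or "max" in q:
        return max(values)
    raise ValueError(f"unrecognized query: {query!r}")

avg = handle_query("show me the average", [2, 4, 6])
```

In the envisioned tool, the returned value would be rendered into the linked scatterplot instead of printed.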
Prerequisites: VIS, HCI
Contact: Torsten Möller
Machine Learning:
Scalable Sampling with Lattices
The challenge is easy to describe:
Come up with an algorithm that takes as input N (the number of samples) and D (the dimension) and outputs a scale factor s and a rotation angle alpha (D-1 angles) such that exactly N samples of a Cartesian lattice fit in the unit box [0,1]^D.
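For the axis-aligned special case (all rotation angles zero) the scale factor is easy to derive, which also shows why the rotation is needed: without it, only values of N that are perfect D-th powers can be matched exactly. A minimal sketch (function name and interface are illustrative):

```python
def axis_aligned_lattice(n, d):
    """Fit a Cartesian lattice of exactly n points into [0,1]^d,
    axis-aligned (no rotation). Only possible when n is a perfect
    d-th power -- handling arbitrary n is the actual challenge."""
    k = round(n ** (1.0 / d))            # points per axis
    if k ** d != n:
        raise ValueError("n must be a perfect d-th power in the axis-aligned case")
    s = 1.0 / (k - 1) if k > 1 else 1.0  # scale factor (lattice spacing)
    points = [[(i // k ** j) % k * s for j in range(d)]
              for i in range(k ** d)]    # enumerate all k^d lattice points
    return s, points

s, pts = axis_aligned_lattice(9, 2)      # a 3x3 grid in the unit square
```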
Prerequisites: Math
Contact: Torsten Möller
Me and my algorithm
In this project, you will create a mobile experience designed to explore how people understand and choose different types of algorithms for a scenario with societal impact, e.g. algorithmic decision-making (ADM) in employment services or university admissions. This will be realized in the form of a mobile game. The project aims to examine whether stakeholders can select, from a set of algorithms, the one that best aligns with their values. Values correspond to expectations (or changes in expectations) over the population affected by ADM (see https://dl.acm.org/doi/abs/10.1145/3531146.3533097 for an example). You will analyse the information needs of stakeholders for making those decisions in three settings (democratic decision-making, business, and public services) and whether and how these needs can be met by combining gamification with techniques for explainable artificial intelligence.
Prerequisites: HCI + FDA / IML / Doing Data Science
Programming: Any framework for app development
Fields of research: Machine Learning, HCI, Gamification
Contact: Laura Koesten, Sebastian Tschiatschek (https://dm.cs.univie.ac.at/team/person/109359/)
Interpreting Deep Clustering Results
Deep embedded clustering, also called deep clustering, is a growing field that combines ideas from clustering and deep learning. The integration of these techniques makes it possible to learn features automatically from the data in order to increase clustering performance. Current deep clustering methods are hard to interpret, making it difficult to understand how a clustering result was reached.
The goal of this project is to develop an interactive visualization tool, e.g. a web-based application, for exploring the predictions of deep clustering algorithms and helping to understand their decision-making process.
The student is expected to do a literature review of existing visualization techniques developed for (supervised) deep learning, e.g. feature visualizations, that could be applicable to interpreting unsupervised deep clustering algorithms. The identified methods should then be applied to (and, if necessary, adapted for) existing deep clustering algorithms and compared.
Some research questions of interest that should be considered during the project are: How suitable are existing visualization techniques for interpreting deep clustering results? How do the different parts of the multi-objective loss of deep clustering techniques relate to each other? Considering multiple clustering models, e.g. k-means vs. DBSCAN, how do the neural network visualizations differ for each of them?
Students working on this project need basic background knowledge in machine learning (e.g. Foundations of Data Analysis) and visualization (e.g. Visualisation and Visual Data Analysis), solid programming skills in Python, and ideally some background in PyTorch, deep learning, and a visualization framework such as d3.
Prerequisites: VIS, FDA
Contact: Aleksandar Doknic, Torsten Möller, in collaboration with Claudia Plant & Lukas Miklautz
Visually exploring neural networks
We have a collection of 100,000 different neural networks from the Tensorflow Playground. The core goal of this project is to create a visual interface for understanding some of the basic properties of neural networks. Enabling a user to explore should help answer questions such as the relationship between the number of neurons and the number of hidden layers, or the impact of batch size, activation functions, and other parameters on the quality of the network. Your tasks include:
- fast prototyping with Tableau
- getting familiar with the data set
- querying neural network users on what parameters they want to explore (requirement analysis)
- development of low-fi and high-fi prototypes
Prerequisites: VIS, FDA
Contact: Torsten Möller
Visually Analyzing the Fault Tolerance of Deep Neural Networks
The main objective is to design and implement a good and efficient way of visually investigating the resilience of deep neural networks against silent data corruption (bit flips), based on given empirical measurements. There are many possible causes for such faults (e.g., cosmic radiation, increasing density in chips, lower voltage, which implies lower signal charge, etc.), and their "incidence" is expected to increase with current trends in chip architecture.
The starting point for the project is a given example data set, which contains information about the relationship between single bit flips at various locations of a certain neural network (which layer, which neuron, which weight, which position within the floating-point representation of a real number, etc.) and the resulting accuracy of the network.
The task is to develop a tool which supports answering various questions about the influence of a bit flip on the resulting accuracy.
Examples for interesting questions are the following:
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the positions in the floating-point representation
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the layers in the network architecture
- (empirical) distribution of the influence of a bit flip on the resulting accuracy over the weights in a given layer in the network architecture
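To make the floating-point aspect concrete, the following small sketch (not part of the project's given data set) shows how a single bit flip corrupts a float32 weight, using only the standard library; it illustrates why the position within the representation matters so much:

```python
import struct

def flip_bit(x, pos):
    """Flip bit `pos` (0 = least significant) in the IEEE-754
    float32 representation of x and return the corrupted value."""
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    (y,) = struct.unpack('<f', struct.pack('<I', bits ^ (1 << pos)))
    return y

# A flip in the exponent field (e.g. bit 30) changes the value by
# orders of magnitude, while a flip in a low mantissa bit (bit 0)
# is barely noticeable:
huge = flip_bit(0.5, 30)   # exponent bit flipped -> around 2**127
tiny = flip_bit(0.5, 0)    # mantissa bit flipped -> still close to 0.5
```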
In order to answer these questions, an iterative design process is required:
- requirement analysis (task & data analysis)
- low-fi prototypes
- high-fi prototypes
- refinement
- constant evaluation of the visual analysis tool
The data set, the problem setting, and the details of the requirements are provided by Prof. Gansterer; supervision of the visual analysis aspects is provided by Prof. Möller.
Prerequisites: VIS, FDA
Contact: W. Gansterer | Torsten Möller
Library for visualization of slices
This is primarily a programming project. Slicing methods are a novel way of visualizing multi-dimensional data. However, there is no publicly available library for R or Python that makes it easy to use these visualization techniques. The goal of this project is to develop such a library. Students should have knowledge of Javascript and either R or Python.
Prerequisites: VIS
Programming languages: Javascript and (R or Python)
Contact: Torsten Möller | Thomas Torsney-Weir
Visualization for optimization problems
Can visualization beat traditional (offline) optimization algorithms? The goal of this project is to see how well visually guided optimization can compete with traditional optimization algorithms. Students will develop a visualization system to find optimal configurations of black-box (i.e. unknown) algorithms from a contest.
Prerequisites: VIS, Mathematical Modeling
Programming languages: Javascript, (R or Python), and C++
Contact: Torsten Möller | Thomas Torsney-Weir
Clustering in high noise astronomical data
Area: Machine Learning
With the second Gaia data release, astronomers have been flooded with data. One interesting research question which Gaia helps answer concerns open clusters (OCs). OCs are groups of stars born in the same place and at the same time, which are regarded as the building blocks of galaxies. Gaia provides precise positions and velocities of 1.6 billion stars, which are the features used to extract these open clusters. However, OCs constitute only a small fraction of the full data set and are embedded in a sea of field stars, which we consider noise. Hence, the results of density-based clustering techniques depend strongly on small changes in the algorithms' hyper-parameters. The goal of this project is to survey current techniques which can help to better extract these OCs from the Gaia catalog and potentially to design new algorithms better suited to this task.
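To illustrate the setting, here is a minimal from-scratch DBSCAN sketch (illustrative only; not the algorithms actually used on Gaia data) showing how density-based clustering separates tight clumps from sparse "field star" noise, and how the result hinges on the eps and min_pts hyper-parameters:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point (-1 = noise)."""
    n = len(points)
    neighbors = [[j for j in range(n)
                  if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1           # noise (may later become a border point)
            continue
        cluster += 1                 # start a new cluster from a core point
        labels[i] = cluster
        seeds = list(neighbors[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point claimed by the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                seeds.extend(neighbors[j])  # expand from core points only
    return labels

# Two tight clumps ("open clusters") in a sea of sparse field stars:
clump_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
clump_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
field   = [(2.0, 3.0), (8.0, 1.0), (3.0, 7.0)]
labels = dbscan(clump_a + clump_b + field, eps=0.2, min_pts=3)
```

Shrinking eps only slightly (e.g. to 0.05) makes every point noise, which is exactly the hyper-parameter sensitivity the project addresses.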
- Unsupervised learning
- Density-based clustering
- Big data
Prerequisites: FDA
Contact: Sebastian Ratzenböck
Exploratory data analysis in Gaia
Area: VIS/Machine Learning
Astronomical discoveries depend on the quality of the data. Therefore, quality criteria are introduced to filter out bad data points. Within the Gaia data set, there are multiple metrics which provide information about the quality of a single entry (e.g. a star) in the tabular data set. The goal of this project is to analyze current quality criteria and their effects on the data features and, ideally, to find better-suited filter solutions.
- Visualize the effects of different data filters
- Unsupervised learning
- Big data
Prerequisites: VIS, FDA
Contact: Sebastian Ratzenböck
Understanding anomalies in high-dimensional spaces
The goal of this project is to visually explain how datasets with a varying number of features (or dimensions) per data point behave differently. We understand that high-dimensional spaces behave differently than low-dimensional spaces, but it is difficult to develop an intuition of what these differences are and how they affect datasets.
Visualization could help us intuitively understand what happens to the distances, local geometry, projections, etc. when data points have 3, 10 or 50 dimensions. In other words: Show how the dimensionality of the data affects mathematical operations (dist(X,Y), PCA(X), etc.).
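As a taste of the phenomenon, the following sketch (all names illustrative) estimates how distances "concentrate" as the dimension grows: the spread of distances from a reference point to random points shrinks relative to their mean, which is one of the effects a visualization tool would need to convey.

```python
import math
import random

def distance_concentration(d, n=500, seed=0):
    """Ratio of the distance spread (max - min) to the mean distance
    from one reference point to n random points in [0,1]^d.
    The ratio shrinks as d grows: distances concentrate."""
    rng = random.Random(seed)
    ref = [rng.random() for _ in range(d)]
    dists = [math.dist(ref, [rng.random() for _ in range(d)])
             for _ in range(n)]
    return (max(dists) - min(dists)) / (sum(dists) / n)

# The relative spread drops noticeably from 3 to 10 to 50 dimensions:
ratios = [distance_concentration(d) for d in (3, 10, 50)]
```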
1. Create synthetic datasets to play around with
2. Validate with real datasets with different numbers of dimensions
3. Validate your approach with a user study
Optional: How does data dimensionality affect ML models?
Prerequisites: FDA, optional: VIS
Contact: Aleksandar Doknic, Torsten Möller
Data Visualization / Human Computer Interaction:
Journaling data visualizations: An (Online) diary study
Data about various topics are often communicated visually through diverse sources. But what kind of data visualizations do lay people see in their everyday lives? In which circumstances do they consume data, and which news sources do they use? What kind of data representations catch people's attention, and what do people like or trust? In this project, you will create a study design in which lay participants are asked to keep an (online) diary of their day-to-day encounters with data visualizations. This could involve taking screenshots or pictures of data visualizations, as well as documenting their opinions about them. You will conduct a thematic analysis of the gathered data visualizations, participants' data consumption habits, as well as their assessments of different aspects of the visualizations (e.g. understandability, trustworthiness, aesthetics).
Prerequisites: FDA, HCI, VIS
Programming: Python or R
Field of research: Data visualization, HCI
Contact: Laura Koesten
Studying visualizations in different ways
Designing a chart involves many decisions, including the data to be shown, the type and style of the chart, and the way it is tailored for a purpose or audience. While a lot of research on perceptual guidelines exists, we still know relatively little about how some design choices influence people's willingness to engage with visualizations and their ability to make sense of the data effectively. In this thesis, you will study how visualizations of one chart type are understood and rated by different audiences. You will create a study design that you will implement in three different settings: 1. a lab study in which participants can be observed and interviewed; 2. a crowdsourcing study using Prolific; and 3. an online survey study distributed to volunteers. You will compare and contrast insights gained from these three different methods.
Prerequisites: FDA, VIS, ideally HCI
Programming: Python or R
Field of research: HCI and Data visualization
Contact: Laura Koesten, Torsten Möller
HDI - Human Data Interaction:
Creating data showcases
The web provides access to millions of datasets. These data can have more impact when used by others, beyond the context for which they were originally created. But using a dataset beyond the context in which it originated remains challenging. Simply making data available does not mean it will be or can be easily used by others. More research is needed to understand the information that researchers need to know about data before those data can be reused. Making data more visible and navigable can also aid data reuse.
In this project, you will do a requirement analysis of what researchers need in order to use datasets that have been created for different purposes. You will create different prototypes of what a data showcase could look like, in the form of a website. This includes thinking about how to describe a dataset, what metadata schemas to use, how to publish data, and how to visually communicate the methodology and context of a dataset. You will do this by combining insights from the literature, visualizations of the data, and a survey.
Prerequisites: FDA, HCI, VIS
Programming: Front-end development, visualizations
Field of research: HCI, Human Data Interaction
Contact: Laura Koesten, Kathleen Gregory
Explaining descriptive statistics
With Anscombe's Quartet [1], it was demonstrated quite figuratively that summary statistics can be very misleading or, at least, hard to interpret. More recently, this example has become quite playful with the Datasaurus Dozen [2]. However, there are a number of statistical measures that don't have an easy (visual) explanation. One of them is Krippendorff's alpha [3], a very common measure in the social sciences for measuring the agreement between subjective coders (as in labeling text or documents). The challenge of this project will be to:
- understand the measure
- develop simple alternatives
- develop different visual representations that "bring this measure to life", i.e. make it easier to understand
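As a starting point for "understand the measure", the following is a hand-rolled sketch of Krippendorff's alpha for nominal data, assuming complete data (every unit rated by at least two coders); the function name and interface are illustrative, not a library API:

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data via the coincidence
    matrix. `units` is a list of lists: the ratings each unit
    received (units with fewer than two ratings are ignored)."""
    coincidence = Counter()
    for ratings in units:
        m = len(ratings)
        if m < 2:
            continue
        # each ordered pair of ratings within a unit, weighted 1/(m-1)
        for a, b in permutations(ratings, 2):
            coincidence[(a, b)] += 1.0 / (m - 1)
    n = sum(coincidence.values())           # total pairable values
    n_c = Counter()
    for (a, _), w in coincidence.items():
        n_c[a] += w                         # marginal per category
    d_observed = sum(w for (a, b), w in coincidence.items() if a != b) / n
    d_expected = sum(n_c[a] * n_c[b]
                     for a in n_c for b in n_c if a != b) / (n * (n - 1))
    return 1.0 - d_observed / d_expected

alpha = krippendorff_alpha_nominal([[1, 1], [1, 2], [2, 2], [1, 1]])
```

Perfect agreement yields alpha = 1, chance-level agreement yields 0, and systematic disagreement yields negative values; visualizing how the observed and expected disagreement terms interact is exactly what the project asks for.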
Prerequisites: VIS
Contact: Torsten Möller, Aleksandar Doknic
[1] Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician. 27 (1): 17–21. doi:10.1080/00031305.1973.10478966. JSTOR 2682899, see also en.wikipedia.org/wiki/Anscombe%27s_quartet
[2] Justin Matejka, George Fitzmaurice (2017), "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing," ACM SIGCHI Conference on Human Factors in Computing Systems. see also www.autodesk.com/research/publications/same-stats-different-graphs
[3] Krippendorff, Klaus (1970). Estimating the reliability, systematic error, and random error of interval data. Educational and Psychological Measurement, 30 (1), 61–70. see also en.wikipedia.org/wiki/Krippendorff%27s_alpha
Digital Humanism
An AI's Perspective on Democracy
In this thesis, you will study an AI's perspective on democracy (actually, a language model's "perspective" - ChatGPT). In particular, you will systematically investigate and probe the AI's explanations and "understanding" of different types of democratic theories and relate them to the original theories. You will investigate the AI's take on prevalent statements like "direct democracy does not work" or "democracy is the rule of the rich" and identify sources supporting or contradicting these statements according to the AI.
Prerequisites: HCI + FDA / IML / Doing Data Science
Programming: Any framework for app development
Field of research: Digital Humanism
Contact:
Sebastian Tschiatschek (https://dm.cs.univie.ac.at/team/person/109359/)
Laura Koesten
ChatGPT - Threat or Hope for Democracy - Discussing Fake News with an AI
In this thesis, you will investigate how ChatGPT treats fake news and discuss the implications for democracy. In particular, you will focus on a selection of fake news relevant to recent politics (probably US-related) and contrast the AI's responses to those for controversial topics. If time permits, you will investigate whether the responses allow classification as fake news using machine learning techniques, i.e., whether there is information about the truth of a statement in the AI's response. You will discuss whether ChatGPT could be used for fact-checking and whether there are topics for which such a model should not be used for gathering information, in order to ensure the proper functioning of a democratic system.
Prerequisites: HCI + FDA / IML / Doing Data Science
Field of research: Digital Humanism
Contact:
Sebastian Tschiatschek (https://dm.cs.univie.ac.at/team/person/109359/)
Laura Koesten
Image Processing:
Usability Evaluation of Open Source Volume Analysis Software
open_iA enables users to perform general and specialized visual analysis and processing of volumetric datasets (such as those from a computed tomography device). Since it has been developed mainly as a basis for research prototypes, its user interface has so far not been developed with usability as the first concern.
The goals of this project are:
- To evaluate the usability of its general capabilities, and optionally of its advanced visual analysis tools. This could, for example, happen through usability interviews or user studies comparing it to other (open-source and commercial) solutions.
- To find innovative ways of overcoming the problems found in the evaluation.
- Depending on time and interest, to implement some or all of these improvements.
Prerequisites: finished the Signal and Image Processing & Human Computer Interaction classes
Contact: Bernhard Fröhler
Smart Image Filter Preview
The analysis of large images (2D or 3D) requires applying filters like smoothing or denoising. Finding the most suitable parameters for a given analysis task through trial and error can be time-consuming. The goal of this project is to develop a tool for a smart preview of the possible outcomes of image processing filters under different parameters for a small region of the image; the outcomes of different parameterizations could, for example, be presented in a matrix. The tool should also be evaluated regarding usability.
Prerequisites: SIP, HCI
Programming languages: Python, C++
Contact: Bernhard Fröhler | Torsten Möller
Theory of Vis:
iTuner: Touch interfaces for high-D visualization
In order to understand simulations, machine learning algorithms, and geometric objects, we need to interact with them. This is difficult with something like a mouse, which only has two axes of movement. Multitouch interfaces let us develop novel interactions for multi-dimensional data. The goals of this project are:
- Develop a touch-screen interface for navigating high-dimensional spaces.
- Design the user interface for a tablet (iPad) to be used in concert with a larger screen such as a monitor or television.
Prerequisites: VIS, HCI
Contact: Torsten Möller | Thomas Torsney-Weir