## 2022

### Monitoring Model Deterioration with Explainable Uncertainty Estimation via Non-parametric Bootstrap

#### Submitted to ICML ’22

Monitoring machine learning models once they are deployed is challenging. It is even more challenging to decide when to retrain models in real-world scenarios where labeled data is out of reach and monitoring performance metrics becomes infeasible. In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation, a technique that aims to monitor the deterioration of machine learning models in deployment environments and to determine the source of that deterioration when target labels are not available. Classical methods are aimed purely at detecting distribution shift, which can lead to false positives in the sense that the model has not deteriorated despite a shift in the data distribution. To estimate model uncertainty, we construct prediction intervals using a novel bootstrap method, which improves upon the work of Kumar & Srivastava (2012). We show that both our model deterioration detection system and our uncertainty estimation method achieve better performance than the current state of the art. Finally, we use explainable AI techniques to gain an understanding of the drivers of model deterioration. We release an open source Python package, doubt, which implements our proposed methods, as well as the code used to reproduce our experiments.
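
For readers unfamiliar with bootstrap prediction intervals, the following is a minimal sketch of a generic percentile bootstrap around a least-squares linear model. It is not the method proposed in the paper and does not use the doubt package's API; the function name `bootstrap_prediction_interval`, its parameters, and the synthetic data are all hypothetical and chosen for illustration only.

```python
import numpy as np


def bootstrap_prediction_interval(X, y, x_new, n_boot=500, alpha=0.05, seed=0):
    """Generic percentile-bootstrap prediction interval (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Add an intercept column to the training features and the new point.
    X1 = np.column_stack([np.ones(n), X])
    x1 = np.concatenate([[1.0], np.atleast_1d(x_new)])

    # Residuals of the full-data fit approximate the noise distribution.
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta

    samples = np.empty(n_boot)
    for b in range(n_boot):
        # Resample rows with replacement and refit the model.
        idx = rng.integers(0, n, size=n)
        beta_b, *_ = np.linalg.lstsq(X1[idx], y[idx], rcond=None)
        # Add a resampled residual so the interval covers observation noise,
        # not only the uncertainty in the fitted coefficients.
        samples[b] = x1 @ beta_b + rng.choice(residuals)

    lower, upper = np.quantile(samples, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic data (an assumption for this sketch): y = 2x + 1 + small noise.
data_rng = np.random.default_rng(42)
X = data_rng.uniform(0.0, 1.0, size=200)
y = 2.0 * X + 1.0 + data_rng.normal(0.0, 0.1, size=200)

lower, upper = bootstrap_prediction_interval(X, y, x_new=0.5)
print(f"95% prediction interval at x=0.5: [{lower:.3f}, {upper:.3f}]")
```

In a monitoring setting, the width of such intervals on unlabeled data is the kind of uncertainty signal the abstract refers to. The paper's own bootstrap construction may differ from this sketch.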

### ScandEval: A Benchmark for Scandinavian Natural Language Understanding

#### Submitted to ACL ARR February ’22

This paper introduces a Scandinavian benchmarking platform, ScandEval, which can benchmark any pretrained or finetuned model on 29 datasets in Danish, Norwegian, Swedish, Icelandic and Faroese, two of which are new. We develop and release a Python package and Command-Line Interface (CLI), scandeval, which can benchmark any model that has been uploaded to the HuggingFace Hub, with reproducible results. Using this package, we benchmark over 60 Scandinavian or multilingual models and present the results in an interactive online leaderboard. The benchmarking results show that the investment in language technology in Norway, Sweden and Iceland has led to language models that outperform massively multilingual models such as XLM-RoBERTa and LaBSE. We release the source code for both the package and the leaderboard.

### Can we automate the truth? Mapping the contingencies of automated misinformation detection

#### Submitted to FAccT ’22

The stark rise of online misinformation in recent years has sparked a growing interest in the automatic detection of misinformation using machine learning algorithms. In the wake of COVID-19, the issue became even more rampant and harmful, leading major social media companies like Facebook, YouTube and Twitter to rely more on automated and less on human moderation of online content. The use of supervised machine learning models is a promising approach to tackling the sheer volume of misinformation, but it also brings about challenges related to the reproduction of biases in the data, undue censorship, and potential backfire effects. Drawing on an interdisciplinary collaboration between academics from the fields of science and technology studies and data science, we critically unpack the technical and epistemic practices involved in the construction of misinformation classification models. We outline a series of contingencies throughout the stages of problematization, formalization, curation of ground truth datasets and model evaluation. We then suggest three concrete responses and future research paths. This paper contributes to the ongoing scholarly debate on fairness in algorithmic systems, which has not yet systematically looked at the distinctive issues linked to the use of ML algorithms in combatting misinformation.

## 2020

### The Virtual Large Cardinal Hierarchy

#### Submitted to Fundamenta Mathematicae

We continue the study of the virtual large cardinal hierarchy, initiated in Gitman and Schindler (2018), by analysing virtual versions of superstrong, Woodin, Vopěnka, and Berkeley cardinals. Gitman and Schindler showed that virtualizations of strong and supercompact cardinals yield the same large cardinal notion (Gitman and Schindler, 2018). We show the same result for a (weak) virtualization of Woodin and a virtualization of Vopěnka cardinals. We also show that there is a virtually Berkeley cardinal if and only if the virtual Vopěnka principle holds, but On is not Mahlo.

## 2019

### Games and Ramsey-like Cardinals

#### Published in the Journal of Symbolic Logic

We generalise the $\alpha$-Ramsey cardinals introduced in Holy and Schlicht (2018) for cardinals $\alpha$ to arbitrary ordinals $\alpha$, and answer several questions posed in that paper. In particular, we show that $\alpha$-Ramseys are downwards absolute to the core model $K$ for all $\alpha$ of uncountable cofinality, that strategic $\omega$-Ramsey cardinals are equiconsistent with remarkable cardinals and that strategic $\alpha$-Ramsey cardinals are equiconsistent with measurable cardinals for all $\alpha>\omega$. We also show that the $n$-Ramseys satisfy indescribability properties and use them to provide a game-theoretic characterisation of completely ineffable cardinals, as well as establishing further connections between the $\alpha$-Ramsey cardinals and the Ramsey-like cardinals introduced in Gitman (2011), Feng (1990), and Sharpe and Welch (2011).