# Guest Speaker: Shay Moran

**Shay Moran**

480 Dreese Labs

2015 Neil Ave, Columbus, Ohio 43210

**On the expressiveness of comparison queries**

Comparisons are a classical and well-studied algorithmic tool, used in a variety of contexts and applications.

We will discuss two manifestations of the expressiveness of these queries in machine learning and complexity theory (a more detailed overview is given below). Both manifestations are based on the notion of “inference dimension”, which can be viewed as another instance of the fruitful link between machine learning and discrete mathematics, a link dating back to the discovery of the VC dimension.

1. Active classification with comparison queries [Kane, Lovett, Moran, Zhang, FOCS ’17]. Active learning is a model of semi-supervised learning that captures situations in which unlabeled data is abundant and manually labeling it is expensive. We consider an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system (say, for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query), or which of two restaurants she liked more (a comparison query). We prove that the use of comparison queries leads to an exponential improvement in the query complexity of some well-studied problems. Specifically, for the class of half-spaces, we show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(log n) queries.
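As a toy illustration of how comparisons can cut the number of label queries exponentially, consider half-spaces on the line, i.e., threshold classifiers: comparison queries suffice to sort the sample by signed distance to the boundary, after which a single binary search reveals all n labels with only O(log n) label queries. The sketch below is a 1-D caricature with a hypothetical threshold of 0.6; it conveys the flavor of the result but is not the algorithm from the paper.

```python
import random

def reveal_all_labels(sample, signed_dist, label_of):
    """Toy 1-D sketch: sort by comparison queries on signed distance,
    then binary-search for the decision boundary with label queries."""
    # Each key comparison during sorting plays the role of one comparison query.
    order = sorted(sample, key=signed_dist)
    # Binary search for the first positively labeled point in sorted order.
    lo, hi, label_queries = 0, len(order), 0
    while lo < hi:
        mid = (lo + hi) // 2
        label_queries += 1
        if label_of(order[mid]) == +1:
            hi = mid
        else:
            lo = mid + 1
    # Everything before position lo is negative, everything after is positive.
    labels = {x: (+1 if i >= lo else -1) for i, x in enumerate(order)}
    return labels, label_queries

# Hypothetical threshold classifier on [0, 1]: label(x) = +1 iff x >= 0.6.
random.seed(0)
sample = [random.random() for _ in range(1000)]
labels, used = reveal_all_labels(sample,
                                 signed_dist=lambda x: x - 0.6,
                                 label_of=lambda x: +1 if x >= 0.6 else -1)
assert all(labels[x] == (+1 if x >= 0.6 else -1) for x in sample)
# All 1000 labels are revealed with at most ~log2(1000) = 10 label queries.
```

The point of the sketch is the split of the query budget: the n log n comparisons carry the combinatorial work, while the expensive label queries shrink from n to logarithmically many.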

2. Nearly optimal linear decision trees for k-SUM and related problems [Kane, Lovett, Moran, STOC ’18]. We use the above framework to construct linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any k, we construct linear decision trees that solve the k-SUM problem on n elements using O(nk log n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,+1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with an optimal number of queries, up to poly-logarithmic factors.
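To make the "2k-sparse linear query" view concrete, here is a small sketch (with a made-up instance) of a single comparison query for k = 3: comparing the sums of two k-subsets S and T is the same as asking for the sign of ⟨c, x⟩, where c has +1 entries on S and −1 entries on T and zeros elsewhere, hence at most 2k nonzero coefficients, all of them ±1.

```python
def comparison_query(x, S, T):
    """One comparison query: which k-subset has the larger sum?
    Returns the sign of (sum over S) - (sum over T)."""
    d = sum(x[i] for i in S) - sum(x[j] for j in T)
    return (d > 0) - (d < 0)

def as_linear_query(n, S, T):
    """The same query viewed as a linear query sign(<c, x>):
    c has +1 on S and -1 on T (entries cancel on any overlap),
    so c is at most 2k-sparse and its nonzero entries are +/-1."""
    c = [0] * n
    for i in S:
        c[i] += 1
    for j in T:
        c[j] -= 1
    return c

# Hypothetical instance: n = 8 numbers, two 3-subsets (k = 3).
x = [3, -1, 4, -1, 5, -9, 2, 6]
S, T = {0, 2, 4}, {1, 5, 7}
c = as_linear_query(len(x), S, T)

assert sum(ci != 0 for ci in c) <= 2 * 3             # at most 2k-sparse
assert set(c) <= {-1, 0, 1}                          # coefficients in {-1, 0, +1}
ip = sum(ci * xi for ci, xi in zip(c, x))
assert comparison_query(x, S, T) == (ip > 0) - (ip < 0)   # same answer either way
```

This only spells out what a single query is; the content of the result is that nearly optimally few such structured queries suffice to decide k-SUM.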

Based on joint works with Daniel Kane, Shachar Lovett, and Jiapeng Zhang.

**Bio:** Shay Moran is a postdoc at Princeton University and at the Institute for Advanced Study in Princeton.

He graduated from the Technion in September ’16, and spent a year in California as a postdoc at UCSD and at the Simons Institute during the “Foundations of Machine Learning” program.

In October ’19 he will join the mathematics department at the Technion as an assistant professor.

Shay’s research interests revolve around mathematical problems related to computer science, with a special focus on machine learning.

**Host:** Raef Bassily