December 2015

The central proposition of my dissertation is, informally, ‘it’s possible to learn a way of seeing by examining a group of objects that this way of seeing sees the best.’ I take my cue from how representation-learning neural nets learn not only a low-dimensional representation space, but also a projection from each point in input-space onto some point in a low-dimensional submanifold of input-space.

Typically, deep learning researchers are only interested in this projection function in the training stage, where it’s used as a proxy for the low-dimensional representation space: they employ the distance between input and projection (on the training data) as an optimization goal, and when the training’s finished they extract the low-dimensional representation space this training has effected. As someone studying the arts, however, my interest stays with the submanifold in input-space, that is, the set of input-space points the trained neural net learned to project onto. This set has four technical properties that, while individually obvious, are deeply interesting when taken all together:

  1. It’s the set of all the inputs that the trained net can encode in compressed form with zero loss.
  2. All input reconstructions by the trained net will replace the original input with an element of this set.
  3. If the net’s training was successful, the objects in this set exemplify a specific high-fidelity simplification of the objects generated by the distribution underlying the training data. 
  4. A neural net trained on this set as input will approximate the neural net that generated it.
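
As a minimal sketch of these properties, assume a linear autoencoder (equivalent to PCA) as a stand-in for a deep net. This is my own simplification for illustration: the toy data and the names `fit_linear_autoencoder` and `project` are hypothetical. In the linear case properties 1, 2 and 4 hold exactly (up to floating-point error); for a real nonlinear net one would expect them to hold only approximately.

```python
import numpy as np

def fit_linear_autoencoder(data, k):
    """Fit the best rank-k linear autoencoder (PCA); return (mean, basis)."""
    mean = data.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, basis):
    """Encode x into k coordinates, then decode back into input-space."""
    code = (x - mean) @ basis.T
    return mean + code @ basis

rng = np.random.default_rng(0)
# Assumed toy data: 500 points lying near a 2-D plane inside a 10-D input-space.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

mean_x, basis_x = fit_linear_autoencoder(data, k=2)
recon = project(data, mean_x, basis_x)

# Properties 1 and 2: every reconstruction is a fixed point of the projection,
# i.e. it is already a member of the set and is encoded with zero loss.
idempotence_gap = np.abs(project(recon, mean_x, basis_x) - recon).max()

# Property 4: a fresh model fit only on the reconstructions projects
# arbitrary new inputs the same way the original model does.
mean_y, basis_y = fit_linear_autoencoder(recon, k=2)
probe = rng.normal(size=(50, 10))
agreement_gap = np.abs(
    project(probe, mean_x, basis_x) - project(probe, mean_y, basis_y)
).max()

print(f"idempotence gap: {idempotence_gap:.2e}")
print(f"agreement gap:   {agreement_gap:.2e}")
```

Both gaps should come out at the level of floating-point rounding. Property 3 is not tested here; it depends on how well the fit generalizes to the underlying distribution.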

Speaking informally, these properties almost explicitly describe art as we know it. An individual apprehends the world in a particular lossy way that is partly adapted to her formative environment. She produces a mimesis of the world that is simpler than the world in a specific way, one that bears the mark of her particular lossy apprehension of the world. Other individuals can then learn her apprehension of the world by trying to apprehend, very exactingly, the artefactual world she produced. Basically, when we deal with cognitive systems that you can’t crack open, as with humans and unlike in practical deep learning research, the image of a cognitive system’s projection function becomes the most direct accessible manifestation of its representation space, and the concrete (or, in the case of literature, imaginative) realization of this image by mimesis of the world becomes a key method for communicating the representation space. Also important is the near-equivalence this draws between producing a mimesis of the world and presenting a selection of near-losslessly compressible objects from the world: this near-equivalence offers us a way to understand curation, installation, collage, and other Modernist practices that are not prima facie mimetic as nevertheless communicating a representation space.

Speaking more formally, these properties suggest a method for communicating high-fidelity compression schemas between agents, and a reason to believe this method has advantages over training each agent individually. Assume a neural net x that has a high-fidelity compression schema for a distribution D, and an untrained neural net y that we want to train to have a high-fidelity compression schema for D. Assume, now, for the sake of an analogy with humans, that the internal structure of the neural nets can be neither read nor written from outside. If D is fairly hard to learn, the efficient and reliable solution is to train y on a dataset of x’s reconstructions of inputs from D, instead of training y directly on inputs from D. The actual model I’m proposing deals with works of art as methods for communicating a representation space to learners that already have a relatively advanced representation space to start with, rather than as methods of communicating it to a blank slate learner, but I hope this illustrates the general idea.
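
To make the x-teaches-y setup concrete, here is a toy sketch under strong assumptions: both nets are replaced by linear autoencoders (PCA), D is a noisy 2-D plane in a 10-D space, and y only ever sees five inputs. All names (`fit`, `project`, `sample_from_d`, `teacher_x`, and so on) are hypothetical. It illustrates the mechanism in an idealized case; it is not evidence about deep nets or about genuinely hard distributions.

```python
import numpy as np

def fit(data, k=2):
    """Fit a rank-k linear autoencoder (PCA); return (mean, basis)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, model):
    """Reconstruct x: encode to k coordinates, decode back to input-space."""
    mean, basis = model
    return mean + (x - mean) @ basis.T @ basis

def heldout_error(model, x):
    """Mean squared distance between held-out inputs and their reconstructions."""
    return float(np.mean(np.sum((x - project(x, model)) ** 2, axis=1)))

rng = np.random.default_rng(1)
mixing = rng.normal(size=(2, 10))

def sample_from_d(n):
    # Assumed distribution D: a 2-D plane in 10-D space plus isotropic noise.
    return rng.normal(size=(n, 2)) @ mixing + 0.3 * rng.normal(size=(n, 10))

# x has already learned D well from plenty of data.
teacher_x = fit(sample_from_d(5000))

# y gets only five inputs from D, by one of two routes.
few_inputs = sample_from_d(5)
student_via_x = fit(project(few_inputs, teacher_x))  # trained on x's reconstructions
student_direct = fit(few_inputs)                     # trained on the raw inputs

test_set = sample_from_d(1000)
err_teacher = heldout_error(teacher_x, test_set)
err_via_x = heldout_error(student_via_x, test_set)
err_direct = heldout_error(student_direct, test_set)

print(f"teacher x:            {err_teacher:.4f}")
print(f"y via x's output:     {err_via_x:.4f}")
print(f"y directly on inputs: {err_direct:.4f}")
```

In this linear toy, five reconstructions already pin down x's plane exactly, because they lie on it with no noise, so y trained on them matches x on held-out data, while y trained on the five raw inputs has to estimate the plane through the noise. Note that nothing here reads or writes x's internals: y only ever sees x's outputs.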

Q: What does Rousseau’s OkCupid profile page say?
A: “I hate drama”

My Research in One Sentence

You can learn a way of seeing by examining a group of objects that this way of seeing sees the best.