We present a neural scene graph—a modular and controllable representation of scenes with elements that are learned from data. We focus on the forward rendering problem, where the scene graph is provided by the user and references learned elements. The elements correspond to geometry and material definitions of scene objects and constitute the leaves of the graph; we store them as high-dimensional vectors. The position and appearance of scene objects can be adjusted in an artist-friendly manner via familiar transformations, e.g. translation, bending, or color hue shift, which are stored in the inner nodes of the graph. In order to apply a (non-linear) transformation to a learned vector, we adopt the concept of linearizing a problem by lifting it into higher dimensions: we first encode the transformation into a high-dimensional matrix and then apply it by standard matrix-vector multiplication. The transformations are encoded using neural networks. We render the scene graph using a streaming neural renderer, which can handle graphs with a varying number of objects, and thereby facilitates scalability. Our results demonstrate a precise control over the learned object representations in a number of animated 2D and 3D scenes. Despite the limited visual complexity, our work presents a step towards marrying traditional editing mechanisms with learned representations, and towards high-quality, controllable neural rendering.