Our system detects and tracks people in a video stream. It provides state-of-the-art detection quality with a model that is 50 times smaller and requires 370 times fewer computations than competing models.