You can use an orthographic camera with a lower resolution than the backbuffer without going through a render target; the GPU will sample/blend the texture for each backbuffer pixel.
That said, you can also use larger textures and still render them as 16x16 'units' in your virtual camera resolution; the texture size only changes how much detail ends up inside those units.
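To make the "units" idea concrete, here is a minimal sketch in plain Python rather than any particular engine's API. `OrthoCamera`, `on_screen_size` and `texels_per_pixel` are hypothetical names made up for this illustration, and the 320x180 view and 1280x720 backbuffer are assumed numbers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OrthoCamera:
    """Hypothetical orthographic camera whose view is defined in world units."""
    view_width: float   # world units visible horizontally
    view_height: float  # world units visible vertically

    def pixels_per_unit(self, backbuffer: Tuple[int, int]) -> Tuple[float, float]:
        # How many backbuffer pixels one world unit covers on each axis.
        return (backbuffer[0] / self.view_width,
                backbuffer[1] / self.view_height)


def on_screen_size(size_units: Tuple[float, float],
                   camera: OrthoCamera,
                   backbuffer: Tuple[int, int]) -> Tuple[float, float]:
    # The on-screen size depends only on the camera and the backbuffer,
    # never on the resolution of the texture mapped onto the quad.
    sx, sy = camera.pixels_per_unit(backbuffer)
    return (size_units[0] * sx, size_units[1] * sy)


def texels_per_pixel(texture_size: int, screen_pixels: float) -> float:
    # Texture detail per backbuffer pixel; the GPU's sampler handles
    # the scaling (filtering) when this is not exactly 1.
    return texture_size / screen_pixels


camera = OrthoCamera(view_width=320, view_height=180)  # assumed virtual resolution
backbuffer = (1280, 720)                               # assumed window size

tile_px = on_screen_size((16, 16), camera, backbuffer)
low_res_density = texels_per_pixel(16, tile_px[0])   # 16x16 texture on the tile
high_res_density = texels_per_pixel(64, tile_px[0])  # 64x64 texture on the same tile
```

With these assumed numbers the tile covers the same area of the screen for both textures; only the texel density differs, which is why swapping in larger art doesn't need to change the tile size in units.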
That's strange. I suspect the pixel resolution has leaked into your movement logic somewhere. Normally, a change in rendering shouldn't affect the physics. You have to think of your tile/camera/movement units independently of the backbuffer/texture resolution.
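As a rough sketch of that separation (plain Python, not tied to any engine; `step_position`, `simulate` and `world_to_screen_x` are hypothetical helpers, and the speed and sizes are assumed values), the movement code below works purely in world units per second, and pixels only show up in the final draw-time conversion:

```python
def step_position(x_units: float, speed_units_per_sec: float, dt: float) -> float:
    # Physics update: world units and seconds only, no pixels involved.
    return x_units + speed_units_per_sec * dt


def simulate(seconds: float, speed_units_per_sec: float, dt: float = 1 / 60) -> float:
    # Advance a position from 0 using fixed time steps.
    x = 0.0
    for _ in range(round(seconds / dt)):
        x = step_position(x, speed_units_per_sec, dt)
    return x


def world_to_screen_x(x_units: float, view_width_units: float,
                      backbuffer_width_px: int) -> float:
    # Draw-time conversion: the only place the backbuffer size appears.
    return x_units * backbuffer_width_px / view_width_units


# Assumed values: speed of 32 units/s, a view 320 units wide.
x = simulate(seconds=1.0, speed_units_per_sec=32.0)

# The same world position is drawn at two different backbuffer widths.
screen_x_small = world_to_screen_x(x, 320, 640)
screen_x_large = world_to_screen_x(x, 320, 1280)
```

The simulated position is identical whatever the window size; only the screen coordinate scales with the backbuffer. If changing the resolution changes how far or how fast something moves, a pixel-based value is probably being used somewhere inside the equivalent of `step_position`.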