Implementation of deep deterministic policy gradients for controlling dynamic bipedal walking