
BAR-K9
Behavioral Autonomous Robotic K9
Primary Objective
Soft Actor-Critic (SAC) and Deep Q-Learning (DQL)
Using reinforcement learning (RL) to navigate uneven and rough terrain
RL Training Framework
DQL OVERVIEW
Collect transitions through interaction with the environment
Store experiences in a replay buffer
Update the Eval-Net on sampled experiences via backpropagation
Periodically copy Eval-Net weights to the Target-Net
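The four steps above can be sketched end to end. This is a minimal illustration only (hypothetical class and parameter names; a tabular Q-value array stands in for a deep network):

```python
import random
from collections import deque

import numpy as np

class DQNAgent:
    """Minimal sketch of the DQL loop: replay buffer, Eval-Net,
    and a periodically synced Target-Net, on a small discrete state space."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99,
                 buffer_size=1000, batch_size=32, target_sync=50):
        self.eval_net = np.zeros((n_states, n_actions))  # Eval-Net weights
        self.target_net = self.eval_net.copy()           # Target-Net weights
        self.buffer = deque(maxlen=buffer_size)          # replay buffer
        self.lr, self.gamma = lr, gamma
        self.batch_size, self.target_sync = batch_size, target_sync
        self.updates = 0

    def store(self, s, a, r, s_next, done):
        # Collect transitions through interaction; store them in the buffer.
        self.buffer.append((s, a, r, s_next, done))

    def update(self):
        if len(self.buffer) < self.batch_size:
            return
        # Update Eval-Net from sampled experiences (a TD step stands in
        # for backpropagation through a deep network).
        for s, a, r, s_next, done in random.sample(list(self.buffer),
                                                   self.batch_size):
            target = r if done else r + self.gamma * self.target_net[s_next].max()
            self.eval_net[s, a] += self.lr * (target - self.eval_net[s, a])
        # Periodically copy Eval-Net weights to Target-Net.
        self.updates += 1
        if self.updates % self.target_sync == 0:
            self.target_net = self.eval_net.copy()
```

Freezing the bootstrap target in a separate network is what keeps the regression target stable between syncs.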
OUTCOME?
the robot adapts dynamically to rough terrain,
optimizing stability and path efficiency
GOAL ?
maximize entropy (to encourage exploration)
while also maximizing the expected reward
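This goal is the standard maximum-entropy RL objective that SAC optimizes (textbook form; the notation is assumed, not taken from this poster):

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

where the temperature \(\alpha\) trades off reward maximization against entropy (exploration).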
SAC OVERVIEW
Actor: Policy that determines actions given the current state.
Critic:
Estimates state-action (Q) values.
Two critics are trained independently to avoid overestimation of Q-values.
Replay Buffer: Stores information on state, action, reward, and next state.
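The twin-critic trick can be illustrated by the critics' shared Bellman target (a hedged sketch; the function name and default hyperparameters are assumptions, not the project's code):

```python
def sac_target(reward, next_q1, next_q2, next_log_prob,
               alpha=0.2, gamma=0.99, done=False):
    """Bellman target for SAC's critics: take the minimum of the two
    independently trained critics to curb Q-value overestimation, and
    add an entropy bonus (-alpha * log_prob) to reward exploration."""
    next_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + (0.0 if done else gamma * next_value)
```

Each critic is then regressed toward this shared target, while the actor is trained to maximize the same entropy-augmented value.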
Sim2Real
The quadruped XML models were first built and then loaded into MuJoCo (a physics simulator) for RL training
PATRIQ robot XML model ➡️ RL Training
ROBOT XML DESCRIPTIONS ➡️ adaptation of Google Barkour v0 into MuJoCo
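As a concrete illustration of what such an XML description contains (a minimal hand-written sketch, not the actual PATRIQ or Barkour v0 file), a MuJoCo MJCF model defines bodies, joints, geoms, and actuators:

```xml
<mujoco model="minimal-leg">
  <!-- Hypothetical single-leg fragment; the real quadruped files
       define four legs plus sensors and full actuation. -->
  <worldbody>
    <geom name="floor" type="plane" size="5 5 0.1"/>
    <body name="torso" pos="0 0 0.5">
      <freejoint/>
      <geom type="box" size="0.2 0.1 0.05"/>
      <body name="thigh" pos="0.15 0 -0.05">
        <joint name="hip" type="hinge" axis="0 1 0" range="-45 45"/>
        <geom type="capsule" fromto="0 0 0 0 0 -0.2" size="0.02"/>
      </body>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hip" gear="10"/>
  </actuator>
</mujoco>
```

Such a file can be loaded for training with `mujoco.MjModel.from_xml_path(...)` in the official MuJoCo Python bindings.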
Terrain Adaptation
uneven rough terrain
minimal-friction slope
stairs
grids
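Terrain variants like the uneven case above are often generated procedurally. A small sketch (hypothetical helper, NumPy only) produces a normalized heightfield that a simulator such as MuJoCo could consume as an hfield asset:

```python
import numpy as np

def make_heightfield(nrow=64, ncol=64, max_height=0.15, smooth=3, seed=0):
    """Random uneven-terrain heightfield with values in [0, max_height]."""
    rng = np.random.default_rng(seed)
    h = rng.random((nrow, ncol))
    # Cheap smoothing: average each cell with its 4 neighbours a few times
    # so the terrain is rough but not pure noise.
    for _ in range(smooth):
        h = (h + np.roll(h, 1, 0) + np.roll(h, -1, 0)
               + np.roll(h, 1, 1) + np.roll(h, -1, 1)) / 5.0
    h -= h.min()          # normalize to [0, 1] ...
    h /= h.max()
    return h * max_height  # ... then scale to the desired height range
```

Regenerating the field with a new seed each episode exposes the policy to varied terrain during training.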