I have no experience with this, but if I were to do it, I would use something like an optical rotary encoder (an absolute encoder, not an incremental one) at every joint -- i.e. at every place where there will be a servo on the puppet.
So I would have two "puppets". One would be a "rehearsal puppet" with encoders -- kind of like a stick figure with the same dimensions as the puppet, or like the wooden poseable dummies artists use, except with encoders at every joint. The other would be the "performance puppet", which would have servos at its joints and would be the one in full "costume". (Perhaps the rehearsal puppet would also be in costume to make it easier to visualize the actions of its "performance" -- but if left as a plain stick figure, it could be reused for rehearsing other puppets without having to "dress" it.)
The encoders would then be tied to the inputs of a microcontroller (perhaps encoders that report their position over a serial bus such as I2C or SPI). The microcontroller would collect the position data, timestamp it (or datalog it at a fixed interval), package it up, and transfer it to a workstation (such as a PC or Mac). The workstation would then have the intelligence to read that data and apply it to the servos on the puppet.
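As a rough illustration, here is what that packaging step might look like. The frame layout (a 4-byte millisecond timestamp followed by one signed 16-bit position per joint) and the joint count are assumptions made up for this sketch, not a real protocol:

```python
import struct

# Hypothetical frame layout: uint32 timestamp (ms since recording start),
# then one signed 16-bit position per joint. Joint count and field sizes
# are assumptions for illustration only.
NUM_JOINTS = 8
FRAME_FORMAT = "<I" + "h" * NUM_JOINTS   # little-endian: uint32 + 8 x int16
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)

def pack_frame(timestamp_ms, positions):
    """Package one datalogged sample for transfer to the workstation."""
    assert len(positions) == NUM_JOINTS
    return struct.pack(FRAME_FORMAT, timestamp_ms, *positions)

def unpack_frame(raw):
    """Workstation side: recover the timestamp and joint positions."""
    fields = struct.unpack(FRAME_FORMAT, raw)
    return fields[0], list(fields[1:])

# Round-trip example: one sample at t = 250 ms
frame = pack_frame(250, [512, 300, -120, 0, 45, 900, -15, 77])
t, joints = unpack_frame(frame)
print(t, joints)
```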
The workstation would have three human interfaces:
1. Recording -- with controls like "start recording", "pause", "stop", "save", etc. This would produce "performances".
2. Editing -- where one could manipulate the performances: linking them together into "scenes", removing recorded movement data (i.e. deleting "gestures"), and even manipulating the movement data itself (e.g. making an arm swing a little further; see the sketch after this list).
3. Performance -- with a Big Green Play Button -- hit play, and watch your puppet dazzle the crowd
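To make the playback and editing ideas concrete, here is a minimal sketch of a real-time playback loop plus one editing manipulation (scaling a joint's swing). `send_positions` is a stand-in for whatever actually drives the servo controller, and the frames are the `(timestamp_ms, positions)` pairs produced by `unpack_frame()` above:

```python
import time

def play_performance(frames, send_positions):
    """Replay recorded frames in real time.

    frames: list of (timestamp_ms, positions) tuples, sorted by timestamp.
    send_positions: placeholder for the function that drives the servos.
    """
    start = time.monotonic()
    for timestamp_ms, positions in frames:
        # Wait until this frame's moment in the performance arrives.
        target = start + timestamp_ms / 1000.0
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send_positions(positions)

def scale_joint(frames, joint_index, factor, center=0):
    """Editing example: make an arm swing a little further by scaling
    one joint's positions around a center value."""
    return [
        (t, [int(center + (p - center) * factor) if i == joint_index else p
             for i, p in enumerate(pos)])
        for t, pos in frames
    ]
```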

The Performance GUI could also have a "kiosk" mode, where a schedule could be set for multiple performances throughout the day.
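A kiosk mode could be as simple as a polling loop against a show schedule. This is just one way to sketch it; the schedule format and the `load_and_play` hook are hypothetical:

```python
import datetime
import time

# Hypothetical kiosk schedule: show times mapped to saved performance
# files. The names and file format are assumptions for this sketch.
SCHEDULE = [
    (datetime.time(10, 0), "morning_show.dat"),
    (datetime.time(14, 30), "afternoon_show.dat"),
]

def run_kiosk(load_and_play):
    """Loop forever, firing each scheduled performance once per day.
    load_and_play stands in for loading a saved performance and handing
    it to the playback routine sketched earlier. Note: shows whose time
    has already passed today will fire once at startup."""
    fired = set()
    while True:
        now = datetime.datetime.now()
        for show_time, filename in SCHEDULE:
            key = (now.date(), show_time)
            if key not in fired and now.time() >= show_time:
                fired.add(key)
                load_and_play(filename)
        time.sleep(1)  # poll once a second; plenty for show scheduling
```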