Learning and Conditioning
In the late nineteenth century, psychologist Edward Thorndike proposed the law of effect. The law of effect states that any behavior that has good consequences will tend to be repeated, and any behavior that has bad consequences will tend to be avoided. In the 1930s, another psychologist, B. F. Skinner, extended this idea and began to study operant conditioning. Operant conditioning is a type of learning in which responses come to be controlled by their consequences. Unlike the reflexive responses of classical conditioning, operant responses are often new, voluntary behaviors.
Just as Pavlov’s fame stems from his experiments with salivating dogs, Skinner’s fame stems from his experiments with animal boxes. Skinner used a device called the Skinner box to study operant conditioning. A Skinner box is a cage set up so that an animal can automatically get a food reward if it makes a particular kind of response. The box also contains an instrument that records the number of responses an animal makes.
Psychologists use several key terms to discuss operant conditioning principles, including reinforcement and punishment.
Reinforcement is delivery of a consequence that increases the likelihood that a response will occur. Positive reinforcement is the presentation of a stimulus after a response so that the response will occur more often. Negative reinforcement is the removal of a stimulus after a response so that the response will occur more often. In this terminology, positive and negative don’t mean good and bad. Instead, positive means adding a stimulus, and negative means removing a stimulus.
Punishment is the delivery of a consequence that decreases the likelihood that a response will occur. Positive and negative punishments are analogous to positive and negative reinforcement. Positive punishment is the presentation of a stimulus after a response so that the response will occur less often. Negative punishment is the removal of a stimulus after a response so that the response will occur less often.
Reinforcement helps to increase a behavior, while punishment helps to decrease a behavior.
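Students often confuse these four terms. The two features that define them — whether a stimulus is added or removed, and whether the response becomes more or less frequent — can be captured in a short Python sketch (the function name and examples are hypothetical, offered only as a study aid):

```python
def classify_consequence(stimulus_added: bool, response_increases: bool) -> str:
    """Name an operant consequence from its two defining features:
    'positive' = a stimulus is presented, 'negative' = a stimulus is removed;
    'reinforcement' = the response increases, 'punishment' = it decreases."""
    kind = "reinforcement" if response_increases else "punishment"
    sign = "positive" if stimulus_added else "negative"
    return f"{sign} {kind}"

# Giving a dog a treat for sitting (stimulus added, sitting increases):
print(classify_consequence(True, True))    # positive reinforcement
# Taking away a teenager's phone for missing curfew (stimulus removed,
# curfew-breaking decreases):
print(classify_consequence(False, False))  # negative punishment
```

Note that the sign (positive/negative) depends only on whether a stimulus is presented or removed, never on whether the outcome feels good or bad.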
Primary and Secondary Reinforcers and Punishers
Reinforcers and punishers are different types of consequences:
- Primary reinforcers, such as food, water, and caresses, are naturally satisfying.
- Primary punishers, such as pain and freezing temperatures, are naturally unpleasant.
- Secondary reinforcers, such as money, fast cars, and good grades, are satisfying because they’ve become associated with primary reinforcers.
- Secondary punishers, such as failing grades and social disapproval, are unpleasant because they’ve become associated with primary punishers.
- Secondary reinforcers and punishers are also called conditioned reinforcers and punishers because they arise through classical conditioning.
Shaping is a procedure in which reinforcement is used to guide a response closer and closer to a desired response.
Example: Lisa wants to teach her dog, Rover, to bring her the TV remote control. She places the remote in Rover’s mouth and then sits down in her favorite TV-watching chair. Rover doesn’t know what to do with the remote, and he just drops it on the floor. So Lisa teaches him by first praising him every time he accidentally walks toward her before dropping the remote. He likes the praise, so he starts to walk toward her with the remote more often. Then she praises him only when he brings the remote close to the chair. When he starts doing this often, she praises him only when he manages to bring the remote right up to her. Pretty soon, he brings her the remote regularly, and she has succeeded in shaping a response.
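The logic of shaping — reinforcing successive approximations while gradually tightening the criterion — can be sketched as a toy simulation. The numbers and the simple learning rule here are hypothetical, not a real model of animal learning:

```python
import random

def shape(trials=200, seed=1):
    """Toy shaping loop: 'behavior' is how far Rover drops the remote from
    the chair. Any attempt at least as close as the current criterion is
    'praised', and the criterion then tightens to match that attempt."""
    random.seed(seed)
    behavior = 10.0       # typical drop distance at the start
    criterion = behavior  # initially, anything this close earns praise
    for _ in range(trials):
        attempt = max(behavior + random.uniform(-2.0, 1.0), 0.0)  # behavior varies
        if attempt <= criterion:   # close enough: reinforce
            criterion = attempt    # demand an even closer response next time
            behavior = attempt     # the reinforced response becomes typical
    return behavior
```

With enough trials, the behavior drifts toward a distance of zero, mirroring how Lisa's step-by-step praise moved Rover from merely wandering toward her to delivering the remote right to her chair.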
A reinforcement schedule is the pattern in which reinforcement is given over time. Reinforcement schedules can be continuous or intermittent. In continuous reinforcement, someone provides reinforcement every time a particular response occurs. Suppose Rover, Lisa’s dog, pushes the remote under her chair. If she finds this amusing and pats him every time he does it, she is providing continuous reinforcement for his behavior. In intermittent or partial reinforcement, someone provides reinforcement on only some of the occasions on which the response occurs.
Types of Intermittent Reinforcement Schedules
There are four main types of intermittent schedules, which fall into two categories: ratio and interval. In a ratio schedule, reinforcement happens after a certain number of responses. In an interval schedule, reinforcement happens after a particular interval of time.
- In a fixed-ratio schedule, reinforcement happens after a set number of responses, such as when a car salesman earns a bonus after every three cars he sells.
- In a variable-ratio schedule, reinforcement happens after a number of responses that varies around a particular average. For example, a person trying to win a game by getting heads on a coin toss gets heads, on average, once every two times she tosses a penny. Sometimes she may toss the penny just once and get heads, but other times she may have to toss it two, three, four, or more times before getting heads.
- In a fixed-interval schedule, reinforcement happens after a set amount of time, such as when an attorney at a law firm gets a bonus once a year.
- In a variable-interval schedule, reinforcement happens after a particular average amount of time. For example, a boss who wants to keep her employees working productively might walk by their workstations and check on them periodically, usually about once a day, but sometimes twice a day, or sometimes every other day. If an employee is slacking off, she reprimands him. Since the employees know there is a variable interval between their boss’s appearances, they must stay on task to avoid a reprimand.
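Each of the four schedules above is essentially a rule for when a response earns reinforcement, which makes them easy to express as a short simulation. This Python sketch is illustrative only; the class names and parameters are hypothetical:

```python
import random

class FixedRatio:
    """Reinforce every n-th response (the salesman's every-3-cars bonus)."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """Reinforce after a number of responses averaging n: each response
    pays off independently with probability 1/n (like the coin tosses)."""
    def __init__(self, n):
        self.p = 1.0 / n
    def respond(self):
        return random.random() < self.p

class FixedInterval:
    """Reinforce the first response after a set time has elapsed (the
    attorney's once-a-year bonus). respond() takes the current time t."""
    def __init__(self, interval):
        self.interval, self.last = interval, 0.0
    def respond(self, t):
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False

class VariableInterval:
    """Reinforce the first response after a random delay averaging `mean`
    (the boss's unpredictable check-ins)."""
    def __init__(self, mean):
        self.mean = mean
        self.next_time = random.expovariate(1.0 / mean)
    def respond(self, t):
        if t >= self.next_time:
            self.next_time = t + random.expovariate(1.0 / self.mean)
            return True
        return False
```

Notice that on the ratio schedules, responding faster earns reinforcement sooner, while on the interval schedules it does not — one reason ratio schedules produce higher response rates, as described below.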
These different types of reinforcement schedules result in different patterns of responses:
- Partial or intermittent schedules of reinforcement result in responses that resist extinction better than responses resulting from continuous reinforcement. Psychologists call this resistance to extinction the partial reinforcement effect.
- Response rates are faster in ratio schedules than in interval schedules. Ratio schedules depend on the number of responses, so the faster the subject responds, the sooner reinforcement happens.
- A fixed-interval schedule tends to result in a scalloped response pattern, which means that responses are slow in the beginning of the interval and faster just before reinforcement happens. If people know when reinforcement will occur, they will respond more at that time and less at other times.
- Variable schedules result in steadier response rates than fixed schedules because reinforcement is less predictable. Responses to variable schedules also cannot be extinguished easily.
As in classical conditioning, extinction in operant conditioning is the gradual disappearance of a response when it stops being reinforced. In the earlier example, Lisa’s dog, Rover, started to put the remote under her chair regularly because she continuously reinforced the behavior with pats on his head. If she decides that the game has gone too far and stops patting him when he does it, he’ll eventually stop the behavior. The response will be extinguished.
If Lisa enjoys Rover’s antics with the TV remote only in the daytime and not at night when she feels tired, Rover will put the remote under her chair only during the day, because daylight has become a signal that tells Rover his behavior will be reinforced. Daylight has become a discriminative stimulus. A discriminative stimulus is a cue that indicates the kind of consequence that’s likely to occur after a response. In operant conditioning, stimulus discrimination is the tendency for a response to happen only when a particular stimulus is present.
Suppose Lisa’s dog, Rover, began to put the remote under her chair not only during the day but also whenever a bright light was on at night, thinking she would probably pat him. This is called stimulus generalization. In operant conditioning, stimulus generalization is the tendency to respond to a new stimulus as if it is the original discriminative stimulus.