In the late 19th century, psychologist Edward Thorndike proposed the law of effect, which states that any behavior with good consequences will tend to be repeated, and any behavior with bad consequences will tend to be avoided. In the 1930s, another psychologist, B. F. Skinner, extended this idea and began to study operant conditioning.
Operant conditioning is a learning process in which behaviors are shaped by their consequences. It focuses on how rewards (reinforcements) and punishments influence the likelihood of a behavior being repeated. When a behavior is followed by a positive consequence, such as a reward, it is more likely to occur again. Conversely, if a behavior is followed by a negative consequence, it becomes less likely to occur again. This type of conditioning emphasizes the role of environmental consequences in actively shaping voluntary behavior.
Reinforcement and Punishment
Psychologists use several key terms to discuss operant conditioning principles, including reinforcement and punishment.
Reinforcement is the delivery of a consequence that increases the likelihood that a response will occur. Positive reinforcement is the presentation of a pleasant stimulus after a response so that the response will occur more often. Negative reinforcement is the removal of an unpleasant stimulus after a response so that the response will occur more often. In this terminology, “positive” and “negative” do not mean “good” and “bad.” Instead, positive means adding a stimulus, and negative means removing a stimulus. For example, giving a dog a treat for sitting is positive reinforcement, while taking an aspirin that removes a headache negatively reinforces aspirin-taking.
Punishment is the delivery of a consequence that decreases the likelihood that a response will occur. Positive punishment is the presentation of an aversive stimulus after a response so that the response will occur less often. Negative punishment is the removal of a desirable stimulus after a response so that the response will occur less often. For example, receiving a speeding ticket is positive punishment, while losing phone privileges is negative punishment.
In short, reinforcement increases a behavior, while punishment decreases it.
Primary and Secondary Reinforcers and Punishers
Primary reinforcers satisfy basic biological needs and are inherently rewarding, such as food, water, and warmth. Primary punishers, such as pain and freezing temperatures, are naturally unpleasant.
Secondary reinforcers, such as money, fast cars, and good grades, are satisfying because they have become associated with primary reinforcers. Secondary punishers, such as failing grades and social disapproval, are unpleasant because they’ve become associated with primary punishers.
Secondary reinforcers and punishers are also called conditioned reinforcers and punishers because they arise through classical conditioning. To distinguish between primary and secondary reinforcers, people can ask themselves this question: “Would a newborn baby find this stimulus satisfying?” If the answer is yes, the reinforcer is primary. If the answer is no, it’s secondary. The same idea can be applied to punishers by asking whether a baby would find the stimulus unpleasant.
Skinner’s Box
Just as Pavlov’s fame stems from his experiments with salivating dogs, Skinner’s fame stems from his experiments with animal boxes. Skinner used a device called the Skinner box to study operant conditioning. A Skinner box is a cage set up so that an animal can automatically get a food reward if it makes a particular kind of response. The box also contains an instrument that records the number of responses an animal makes.
In Skinner’s experiment, a hungry rat (or pigeon) was placed in the Skinner box, which contained a lever (for rats) or a disk (for pigeons) that, when pressed or pecked, would release a food pellet. Initially, the rat would move around the box randomly, but eventually, it would press the lever by chance, which would result in the release of food. This positive consequence (the food) served as positive reinforcement, encouraging the rat to press the lever more frequently.
Skinner observed that after repeated trials, the rat learned to press the lever deliberately to receive food. This demonstrated that behaviors followed by rewarding outcomes are more likely to be repeated, a central principle of operant conditioning.
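To make the contingency concrete, here is a minimal Python sketch of the experiment. It is a toy model under invented assumptions: the function name, probabilities, and the rule that each reinforced press slightly raises the chance of pressing again are all illustrative, not Skinner’s actual apparatus or data.

```python
import random

def simulate_skinner_box(trials=500, seed=42):
    """Toy Skinner box: every lever press is reinforced with food,
    and each reinforcement nudges the press probability upward.
    All numbers here are invented for illustration."""
    rng = random.Random(seed)
    press_prob = 0.05                  # at first, presses happen only by chance
    presses_per_block, presses = [], 0
    for t in range(1, trials + 1):
        if rng.random() < press_prob:  # the rat happens to press the lever
            presses += 1
            press_prob = min(1.0, press_prob + 0.02)  # food strengthens pressing
        if t % 100 == 0:               # record presses per 100-trial block
            presses_per_block.append(presses)
            presses = 0
    return presses_per_block

print(simulate_skinner_box())          # press counts climb from block to block
```

Run as-is, the per-block press counts rise steadily, mirroring the rat’s shift from accidental to deliberate lever pressing.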
Reinforcement Discrimination and Generalization
Discrimination in operant conditioning occurs when an organism learns to respond only to specific stimuli that signal a reinforcement opportunity and to ignore other stimuli. For example, a dog may learn to sit only when it hears its owner’s command but not when someone else uses the same command.
Example: In his experiments, Skinner found that pigeons could learn to peck specific colored disks, such as red, to receive a reward, but ignore other colors. This demonstrated that pigeons could discriminate between colors, responding only to the color that signaled reinforcement.
Generalization involves responding to similar, but not identical, stimuli with the conditioned behavior. If a child learns to say “please” for candy at home and then begins saying “please” in other settings to obtain treats, this demonstrates generalization. Generalization shows how learned behaviors can transfer to similar situations.
Example: When Skinner varied the color of the disk slightly (e.g., from red to a slightly lighter shade), pigeons would often still peck, showing that they generalized the learned response to similar colors. The closer the color was to red, the more likely they were to peck, indicating generalization across similar stimuli.
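One common way to idealize such a generalization gradient is as a smooth falloff in responding with distance from the trained stimulus. The sketch below assumes a Gaussian-shaped gradient and made-up hue units; it illustrates the idea rather than reproducing Skinner’s data.

```python
import math

def peck_probability(test_hue, trained_hue=0.0, width=0.15):
    """Toy generalization gradient: responding falls off smoothly as the
    test stimulus moves away from the reinforced (trained) stimulus."""
    return math.exp(-((test_hue - trained_hue) ** 2) / (2 * width ** 2))

# 0.00 stands for the reinforced red; larger offsets are increasingly
# different hues (the units are arbitrary).
for offset in [0.0, 0.1, 0.2, 0.3, 0.5]:
    print(f"hue offset {offset:.2f} -> peck probability {peck_probability(offset):.2f}")
```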
Shaping
Shaping is a technique in which reinforcement is used to guide behavior closer and closer to a desired response. In shaping, each step toward the desired behavior is reinforced until the individual or animal can perform the full behavior.
Example: Lisa wants to teach her dog, Arséne, to bring her the TV remote control. She places the remote in Arséne’s mouth and then sits down in her favorite TV-watching chair. Arséne doesn’t know what to do with the remote, and he just drops it on the floor. So Lisa teaches him by first praising him every time he accidentally walks toward her before dropping the remote. He likes the praise, so he starts to walk toward her with the remote more often. Then she praises him only when he brings the remote close to the chair. When he starts doing this often, she praises him only when he manages to bring the remote right up to her. Pretty soon, he brings her the remote regularly, and she has succeeded in shaping a response.
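Shaping is essentially a loop that reinforces successive approximations while raising the bar. The toy Python model below, with invented numbers, treats the “quality” of a response (say, how close Arséne carries the remote to the chair) as a single number and ratchets the praise criterion upward as performance improves; it is a sketch of the logic, not any published procedure.

```python
import random

def shape_behavior(target=10.0, steps=2000, seed=1):
    """Toy model of shaping by successive approximations. Response
    'quality' stands in for how close the dog carries the remote
    (0 = drops it immediately, `target` = delivers it to Lisa)."""
    rng = random.Random(seed)
    mean = 1.0        # average quality of the learner's responses
    criterion = 1.0   # praise anything at least this good
    for _ in range(steps):
        response = max(0.0, rng.gauss(mean, 1.5))
        if response >= criterion:                      # good enough: praise
            mean += 0.1 * (response - mean)            # reinforcement shifts behavior
            criterion = min(target, criterion + 0.05)  # then raise the bar slightly
    return mean, criterion

print(shape_behavior())  # both values end up near the target behavior
```

After enough trials, both the learner’s average response and the criterion sit near the target, which is exactly the endpoint Lisa was shaping toward.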
Instinctive Drift
While shaping can be effective for many behaviors, research has shown that certain natural, instinctual behaviors can interfere with conditioned behaviors. This tendency is called instinctive drift. Two psychologists, Keller and Marian Breland, were the first to describe instinctive drift. The Brelands found that through operant conditioning, they could teach raccoons to put a coin in a box by using food as a reinforcer. However, they couldn’t teach raccoons to put two coins in a box. Given two coins, raccoons simply held on to them and rubbed them together, because the coins triggered their instinctive food-washing behavior: raccoons rub edible things together to clean them before eating. Once the coins became associated with food, it became impossible to train the raccoons to drop them into the box. This research demonstrated that some behaviors are deeply ingrained due to biological predispositions, making them resistant to conditioning. Therefore, not all behaviors can be shaped through reinforcement. Shaping works best with behaviors that are not strongly opposed by instinctual tendencies.
Superstitious Behavior
Superstitious behavior arises when an individual mistakenly associates a behavior with a consequence, even though there is no actual connection between the two. This occurs when a behavior is accidentally reinforced, leading the individual to repeat the behavior in hopes of achieving the same outcome.
B. F. Skinner demonstrated superstitious behavior in one of his experiments with pigeons. He placed the pigeons in a box where food was dispensed at random intervals, independent of any action by the pigeons. However, the pigeons began to repeat whatever behaviors they had been performing when the food appeared (such as spinning in circles or pecking in a corner), believing these actions were responsible for the food delivery. The birds developed “superstitious” behaviors by associating them with rewards that were actually delivered randomly.
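Because the food in this experiment was non-contingent, the process is easy to mimic in code. The sketch below is a toy model with invented behaviors and numbers: food arrives at random, yet whichever behavior it happens to follow is strengthened, and a rich-get-richer dynamic lets one arbitrary “ritual” take over.

```python
import random

def superstition_demo(steps=5000, seed=7):
    """Toy version of the superstition experiment: food arrives at random,
    independent of behavior, yet whichever behavior happens to precede
    food is strengthened. Behaviors and numbers are invented."""
    rng = random.Random(seed)
    behaviors = ["spin", "peck corner", "bob head", "flap"]
    weights = [1.0] * len(behaviors)
    for _ in range(steps):
        i = rng.choices(range(len(behaviors)), weights=weights)[0]
        if rng.random() < 0.05:   # non-contingent food delivery
            weights[i] += 1.0     # accidental reinforcement of behavior i
    return sorted(zip(behaviors, weights), key=lambda bw: -bw[1])

print(superstition_demo())        # one arbitrary "ritual" ends up dominating
```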
In humans, superstitious behavior can be observed in rituals or “lucky” actions performed to bring about a desired outcome, like bringing one’s “lucky” pen or pencil to an exam or wearing one’s “lucky” shirt or jersey to a sporting event in hopes of increasing the likelihood of one’s desired outcome. Even though these behaviors are not causally related to the outcome, individuals continue performing them because of the perceived association.
Learned Helplessness
Learned helplessness occurs when an organism learns that it has no control over repeated aversive events and, as a result, stops trying to avoid or escape them. Psychologists Martin Seligman and Steven Maier demonstrated this phenomenon in an experiment with dogs. Dogs were placed in situations where they received unavoidable shocks, and after repeated exposure, the dogs eventually stopped attempting to escape, even when an opportunity to avoid the shock was later presented. The dogs learned to feel helpless, assuming that nothing they did would change their outcome.
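A rough two-phase simulation can illustrate the logic. The probabilities and update rule below are invented for illustration and are not from Seligman and Maier’s procedure; the point is only that extinguishing escape attempts in phase one leaves the animal unable to discover that escape works in phase two.

```python
import random

def escape_trials(attempt_prob, escapable, rng, trials=100):
    """Run shock trials; an escape attempt succeeds only if `escapable`.
    Failure weakens the tendency to try; success strengthens it."""
    escapes = 0
    for _ in range(trials):
        if rng.random() < attempt_prob:                      # the dog tries
            if escapable:
                escapes += 1
                attempt_prob = min(1.0, attempt_prob + 0.05)
            else:
                attempt_prob = max(0.01, attempt_prob - 0.05)
    return attempt_prob, escapes

rng = random.Random(3)
# "Helpless" group: 100 inescapable trials first, then 100 escapable ones.
p, _ = escape_trials(0.9, escapable=False, rng=rng)
_, helpless_escapes = escape_trials(p, escapable=True, rng=rng)
# Control group: escape is possible from the very first trial.
_, control_escapes = escape_trials(0.9, escapable=True, rng=rng)
print(helpless_escapes, control_escapes)  # the pre-shocked group barely tries
```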
Learned helplessness has significant implications for human behavior, particularly in understanding how repeated failures or uncontrollable negative events can impact mental health. For example, individuals who experience repeated exposure to stressors they feel they cannot control, such as chronic stress, abusive relationships, or workplace challenges, may develop feelings of helplessness and stop trying to change their circumstances. This concept is also linked to depression in humans, as individuals may come to believe that they have no control over negative events in their lives, leading to passivity and withdrawal.
Reinforcement Schedules
In operant conditioning, the schedule by which reinforcement is delivered greatly impacts the strength and persistence of the learned behavior. The two primary types of reinforcement schedules are continuous and partial (or intermittent), each of which affects how quickly a behavior is acquired and how resistant it is to extinction.
Continuous Reinforcement
Continuous reinforcement involves giving reinforcement every time a desired behavior is performed. This schedule is highly effective for initially establishing a new behavior because the immediate and consistent reward strengthens the association between the behavior and its consequence. However, behaviors learned under continuous reinforcement are generally more susceptible to extinction, meaning they may fade quickly if reinforcement stops. For example, training a dog to sit by giving it a treat every time it obeys uses continuous reinforcement.
Partial Reinforcement
In partial reinforcement, reinforcement is delivered only some of the time, making it more resistant to extinction compared to continuous reinforcement. Partial reinforcement can be structured based on time intervals or number of responses and is broken down into four main types: fixed-interval, variable-interval, fixed-ratio, and variable-ratio schedules.
In a fixed-interval schedule, reinforcement is given for the first response after a set period. For example, employees might receive a paycheck every two weeks regardless of their productivity during that time. Fixed-interval schedules tend to produce a scalloped pattern on a graph, where the response rate increases as the reinforcement time approaches and then drops after the reinforcement is delivered, resulting in bursts of activity followed by pauses.
In a variable-interval schedule, reinforcement is given after varying amounts of time, making the timing unpredictable. For instance, a supervisor who occasionally drops by at unpredictable times to reward hard-working employees with praise is using a variable-interval schedule. This schedule results in a steady, moderate response rate because the individual never knows when the next reinforcement will occur.
In a fixed-ratio schedule, reinforcement is provided after a set number of correct responses. For example, a coffee shop’s loyalty program may offer a free drink after every 10 purchases. Fixed-ratio schedules generally produce a high rate of response, with a slight pause after each reinforcement, creating a staircase-like pattern on a graph.
In a variable-ratio schedule, reinforcement is delivered after an unpredictable number of responses, such as in gambling or lottery games. Variable-ratio schedules produce the highest and most consistent response rates, as individuals are motivated to keep responding, not knowing when the next reinforcement will occur. This schedule results in a steady and high rate of response without predictable pauses.
Each reinforcement schedule produces a unique response pattern when graphed. Fixed-interval schedules show a scalloped pattern, with responses increasing as the time for reinforcement approaches. Fixed-ratio schedules create a step-like pattern, with high responses followed by brief pauses. Both variable-interval and variable-ratio schedules lead to steady response rates, though the response rate is typically higher with variable-ratio schedules.
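Because each schedule is just a rule for when a response earns reinforcement, the four rules can be written out directly. The Python sketch below uses assumed parameters (20-step intervals, 10-response ratios) and invented function names; note that continuous reinforcement is simply the degenerate case fixed_ratio(1).

```python
import random

rng = random.Random(0)

def fixed_interval(period=20):
    """Reinforce the first response after `period` time steps have passed."""
    last = 0
    def check(t, responded):
        nonlocal last
        if responded and t - last >= period:
            last = t
            return True
        return False
    return check

def variable_interval(mean_period=20):
    """Reinforce the first response after a randomly varying interval."""
    next_time = rng.expovariate(1 / mean_period)
    def check(t, responded):
        nonlocal next_time
        if responded and t >= next_time:
            next_time = t + rng.expovariate(1 / mean_period)
            return True
        return False
    return check

def fixed_ratio(n=10):
    """Reinforce every n-th response (like a free drink per 10 purchases)."""
    count = 0
    def check(t, responded):
        nonlocal count
        if responded:
            count += 1
            if count >= n:
                count = 0
                return True
        return False
    return check

def variable_ratio(mean_n=10):
    """Reinforce after an unpredictable number of responses (like a slot machine)."""
    remaining = rng.randint(1, 2 * mean_n - 1)
    def check(t, responded):
        nonlocal remaining
        if responded:
            remaining -= 1
            if remaining <= 0:
                remaining = rng.randint(1, 2 * mean_n - 1)
                return True
        return False
    return check

# A learner that responds on every time step earns roughly these rewards:
for name, schedule in [("fixed-interval", fixed_interval()),
                       ("variable-interval", variable_interval()),
                       ("fixed-ratio", fixed_ratio()),
                       ("variable-ratio", variable_ratio())]:
    earned = sum(schedule(t, True) for t in range(1, 1001))
    print(f"{name:17s} reinforcements in 1000 steps: {earned}")
```

Run with a learner that responds on every step, the interval schedules cap reinforcement at roughly one per interval no matter how fast the learner responds, while the ratio schedules reward sheer output, which is one reason ratio schedules sustain higher response rates.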