How to study for the LSAT

This post was shared by Trevor Klee, Tutor, a Boston-based and online LSAT tutor who scored 175 on his LSAT.

1. How to Prepare for the LSAT

a) Start with a diagnostic test. What are your specific strengths and weaknesses? Use 21st Night to discern the patterns, and follow the error log’s patterns for review (don’t skip them)!

b) Do questions to focus on your weaknesses as revealed in the diagnostic test. Really try to understand the process of how to solve questions: you’ll find a lot of examples online. Ask yourself why certain techniques are used, and why your initial instinct may be wrong.

Don’t worry about speed; that comes with being confident and fluent in the techniques. As the old Army saying goes, “Slow is smooth and smooth is fast.” Focus on being smooth in your application of techniques.

c) Once you feel like you’ve covered your initial weaknesses, or you feel confused about what to do next, take another practice test. Then start with a) again.

d) There are two parts to studying for the LSAT.

One part is like being a marathon runner. You need to put the miles in on the pavement to run a marathon. Anyone can do it, but it takes effort. Doing questions, getting them wrong, and then learning how to do them correctly is the equivalent of putting those miles in. It’s going to suck, but that’s how you learn.

The second part is like being your own coach. You need to reflect on your own progress and what you get wrong and right. What are the patterns in what you get wrong? What techniques do you have difficulty applying?

2. Your LSAT preparation materials


–Khan Academy LSAT Prep

–21st Night

-Error log helps you organize yourself, and shows you what questions you still need to do, which questions you need to understand, and the patterns in what you’re getting wrong. It will also help you repeat questions, so you remember the strategies on test day. for answer explanations (thanks Graeme!)


-My recommendations: my videos

3. Your LSAT study plan

-Plan for roughly 100-150 hours of hardcore studying to go up 20 points

-So, if you’re starting at 150 and want to get to 170, plan to spend 3-4 months spending 20 hours a week studying (to give yourself some wiggle room)

-That’s 2 hours a day on weekdays, 5 hours a day on weekends

-It’s a lot! But packing it all into a few months is the best way to do it. People get discouraged when they spend months working on the LSAT, especially when it’s hard to see yourself making improvements week by week. Packing it into a short time prevents that.

If you get to the point where you can not just do, but also explain every question in Khan Academy (why the right answer is right and why the wrong answers are wrong), you can get a 165+.

4. How to review the LSAT sections

-This is both how you should approach the questions, and, more importantly, how to analyze a question you got incorrect

-Analyzing incorrect questions is more important than doing new ones. Use these questions!

Reading Comprehension:

How did the passage fit together (i.e. why did the author include each paragraph in the section)? What precise part of the passage did I need to read to get the correct answer?

Logical Reasoning:

How does the argument’s reasoning lead to its conclusion (or, if it doesn’t, why not)? How does the correct answer fit into the argument’s flow from reasoning to conclusion?

Logic Games:

What’s the model of how the game works (i.e. what would be one correct answer to the game)? How can you minimize the logical steps you need to take to get or eliminate an answer (think of it like golf, and get a low score)?

5. When to get LSAT tutoring

You should get tutoring when

  1. You took an LSAT and it didn’t go well
  2. You feel overwhelmed

You don’t have to start with tutoring!

But, if you do want an LSAT tutor, contact me at trevor [at] .

How to Study for the GMAT

This post was shared by Trevor Klee, Tutor.

1. Your overall process to start preparing for the GMAT

a) Start with a diagnostic test. What are your specific strengths and weaknesses?

Use 21st Night to discern the patterns. Put all questions you got wrong in the error log, and see which sorts of questions you tend to struggle with. The analytics section of the app will help.
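If you want to see the pattern-hunting step spelled out, here is a minimal Python sketch of the kind of tally an error log does for you. It is not how 21st Night works internally; the `missed` list and the `weakest_types` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical error-log entries: one (section, question type) pair per missed question.
missed = [
    ("Quant", "Data Sufficiency"),
    ("Verbal", "Critical Reasoning"),
    ("Quant", "Data Sufficiency"),
    ("Verbal", "Sentence Correction"),
    ("Quant", "Data Sufficiency"),
    ("Verbal", "Critical Reasoning"),
]

def weakest_types(entries, top=3):
    """Return the question types missed most often, most frequent first."""
    counts = Counter(question_type for _section, question_type in entries)
    return counts.most_common(top)

for question_type, misses in weakest_types(missed):
    print(f"{question_type}: {misses} missed")
```

Whatever comes out on top is where the next block of practice questions should go.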

b) Do questions to focus on your weaknesses. Really try to understand the process of how to solve questions: you’ll find a lot of examples online. Ask yourself why certain techniques are used, and why your initial instinct may be wrong.

Don’t worry about speed; that comes with being confident and fluent in the techniques. As the old Army saying goes, “Slow is smooth and smooth is fast.” Focus on being smooth in your application of techniques.

c) Once you feel like you’ve covered your initial weaknesses, or you feel confused about what to do next, take another practice test. Then start with a) again.

d) There are two parts to studying for the GMAT.

One part is like being a marathon runner. You need to put the miles in on the pavement to run a marathon. Anyone can do it, but it takes effort. Doing questions, getting them wrong, and then learning how to do them correctly through the error log is the equivalent of putting those miles in. It’s going to suck, but that’s how you learn.

The second part is like being your own coach. You need to reflect on your own progress and what you get wrong and right. What are the patterns in what you get wrong? What techniques do you have difficulty applying?

2. Your materials to prepare for the GMAT


-Official GMATPrep tests

-Official Guide questions (which are all available on

– 21st Night


-Strategy guides, for the necessary techniques

-My recommendations: my strategy guides, Manhattan’s

3. Your GMAT study plan

-Generally speaking, you need to work 100 hours (intelligently) to improve 100 points on the GMAT. This is, of course, a very rough estimate, and depends heavily on the quality of the hours you put into studying.

-A reasonable way to accomplish this is to plan to work for 20 hours a week for 2 months (giving yourself room for breaks and slow days). That means working 2 hours per day on the weekdays, and 5 hours per day on the weekends.

-Studying for the GMAT should be a sprint. If you plan to spend 6 months, you will get demotivated midway through and lose track of what you’ve learned. Make it a major part of your life for 2–3 months, then be done with it.

-For a specific 60 day study plan, you can check out my email course.
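The arithmetic behind the plan above is easy to check. Here is a small Python sketch; the function names are mine, and the 100-hour target is the rough estimate from this section, not a guarantee.

```python
import math

def weekly_hours(weekday_hours=2, weekend_hours=5):
    """Hours per week given a daily load for the 5 weekdays and the 2 weekend days."""
    return 5 * weekday_hours + 2 * weekend_hours

def weeks_needed(target_hours, hours_per_week):
    """Whole weeks needed to reach the target if every week goes to plan."""
    return math.ceil(target_hours / hours_per_week)

per_week = weekly_hours()
print(per_week)                     # 20
print(weeks_needed(100, per_week))  # 5
```

At full pace, 100 hours takes 5 weeks. Planning for 2 months leaves about 3 weeks of slack for breaks and slow days.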

4. How to review the GMAT sections

-This is both how you should approach the questions, and, more importantly, how to analyze a question you got incorrect

-Revision through the error log is the key to learning

Reading Comprehension: what precisely did I need to read to get the correct answer?

Critical Reasoning: how does the argument work (premise, reasoning, conclusion)? How does the correct answer fit into the argument?

Sentence Correction: how does the correct answer correctly and efficiently convey the meaning?

Problem Solving: what equations do I need to start with? How do I get from there to the answers?

Data Sufficiency: How do I simplify the prompt? Or, in other words, what’s the prompt really asking for?

5. When to seek out GMAT tutoring

You might expect a tutor to say, “Seek out tutoring, all the time, for as many hours as possible, no matter what”. As my Dad says, “Don’t ask the barber when you should get a haircut”.

But, this isn’t the case. Or, at least, it’s not what I recommend.

You should seek out tutoring in two cases:

1. You took a practice GMAT or a real GMAT, and it didn’t go the way you expected or wanted

2. You’ve been studying for a while, and you’re overwhelmed

In either case, you shouldn’t seek out tutoring until you’ve put in some serious effort on your own. It’ll save your wallet, and give you a better idea of what you can get out of tutoring.

In that case, you can start your GMAT tutoring journey by emailing me at the address on the top of the page.

Genetically engineering virus-immune bees

This is a post I wrote outside my comfort zone, mainly because I was having serious writer’s block about writing things inside my comfort zone. I think everything I wrote is correct, but who am I to judge?

Because there’s literally nothing else interesting or important going on in the world right now, I thought I’d take a close look at this neat paper on bees called Engineered symbionts activate honey bee immunity and limit pathogens and its supplement.

In this paper, they detail how they genetically engineered the gut bacteria of bees to produce double stranded RNA, which they used to cause bees to gain weight, defend themselves against deformed wing virus, and kill parasitic Varroa mites. The latter two are the main causes of colony collapse disorder, if you’re familiar.

This is cool already, but the way they did it was cool, too: they figured out what part of the bee/virus/mite genome they needed to target, bought online the custom-made plasmids (small DNA sequences) to produce the RNA that’d target the necessary parts of the genome, put the plasmids into bee gut bacteria, then put the gut bacteria into the bees.

That’s really cool, right? We now live in an era where you can just like… genetically engineer bees with stuff you order online. Realistically, you could genetically engineer yourself with stuff you order online. You could make yourself resistant to a virus or lactose intolerant (if you really wanted to).

Before we wax too rhapsodic, though, let’s talk a bit about exactly what they did, how they did it, what the limitations are, some unanswered questions/issues, and then, finally, how soon this stuff can actually be ready for real-world use.

So, to explain what they did, let’s start with their goals. Their goal was to cause RNA interference with other RNA strands in the bees’ bodies. RNA, as you might recall from biology class, is a lot like DNA, in that it contains instructions for other parts of the cell. Bees’ bodies (and our bodies) use RNA to help transmit instructions from DNA to cell machinery, while viruses just keep all their instructions in RNA in the first place.

RNA interference, therefore, means that the instructions are being disrupted. If you disrupt the instructions to reproduce a virus, the virus will not be reproduced. If you disrupt instructions to produce insulin, the insulin will not be produced. One of the ways RNA can be interfered with (and the way that these people specifically interfered with RNA) is by double stranded RNA. 

Double stranded RNA (dsRNA) is what it sounds like: RNA, but double stranded. This is weird because RNA is usually single stranded. When you put targeted double stranded RNA into the body, an enzyme called dicer dices it (great name, right?) into short double stranded fragments, each of which then gets unwound into two single strands.

One of these strands will then be complementary (fits like a puzzle piece) to the target RNA, so it’ll latch onto it, serving as a sort of flag to the immune system. Then a protein called argonaute, now that it knows what it’s targeting, comes in and slices the target RNA in two. The target RNA is effectively interfered with.
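To make "complementary" concrete, here is a toy Python sketch of the base-pairing check. The sequences are made up and much shorter than real guide strands (which run about 21 bases), and `reverse_complement` and `binding_sites` are names I invented for illustration.

```python
# Base-pairing rules for RNA: A pairs with U, G pairs with C.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the strand that would base-pair with `rna` (read in the opposite direction)."""
    return "".join(PAIRS[base] for base in reversed(rna))

def binding_sites(guide, target):
    """Return every index in `target` where `guide` can pair perfectly."""
    site = reverse_complement(guide)
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]

target = "AUGGCUACGUUAGC"  # made-up target RNA
guide = "AACGUAGC"         # made-up guide strand
print(binding_sites(guide, target))  # [3]
```

The guide only latches on where every base lines up with its partner, which is why the flags are so specific.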

Now, this is something that happens naturally in the body all the time as part of the immune system. However, the body has to be producing the right double stranded RNA already, so it can flag things correctly (the flags are super specific). What if the body isn’t producing the right dsRNA yet?

Well, if it isn’t, you need to get the dsRNA in there somehow, so the flagging process can start. One option is, of course, just to inject a ton of double stranded RNA into the body. You have to make it all first, of course, and it has a limited shelf-life, but it’s doable. That’s been done before with bees.

This paper took a different tactic. The authors wanted double stranded RNA to be produced inside the bees’ bodies. Bees’ bodies (and all bodies) contain all the machinery to produce any type of RNA you want. That’s how viruses work, actually: they force the body to produce the viral RNA. It’s all just building blocks put together in different orders, after all.

So, in order to get it produced inside the bees’ bodies, first they designed a plasmid, which is a circular ring of DNA (DNA can produce RNA). This was the thing that they literally just went to the Internet for. They knew the result they wanted to get (the order of the dsRNA), so then they just went online and ordered a plasmid that would produce dsRNA in a certain order, and they got their plasmid in the mail. That’s amazing, right?

Once they had a plasmid, they “transformed” it into S. alvi, a gut bacterium that’s very common in bees. This is basically like slipping an extra page of instructions into a cookbook: the bacterium takes up the plasmid whole, and the plasmid sits alongside S. alvi’s own DNA as a separate ring that gets copied and read by the same machinery.

Then, getting it into the bees was relatively easy: they dunked the bees in a solution with sugar water and the bacteria. The bees clean each other off, and then they get infected with the bacteria. Now, whenever S. alvi is doing its normal gut bacteria stuff in the bee, it’ll also produce this dsRNA.

From there pretty much everything else was just testing. They tested where the RNA ended up being produced by including green fluorescent protein in their plasmid, which is a super common (but still cool) tactic in biology. If you include “make this protein that glows green under UV light” in your plasmid’s instructions, then wherever your RNA is being produced, there will also be bright green light.

They also tested whether dicer and argonaute were active, to see if the dsRNA was actually doing its thing. Finally, they got into whether they could actually make the experiment work. First, they used one kind of dsRNA to interfere with insulin RNA (i.e. disrupt the production of insulin). They found that insulin production halved (or even quartered) in all areas of the bee body compared to control.

As you’d expect, this has pretty dramatic effects on the bees. The bees who had insulin interfered with were more interested in sugar water, and also gained weight compared to normal bees. I’ve put the weight graph down below, as I think it’s convincing. The sugar water graph I’m also going to put down below, but I’ll discuss it later, because it’s kind of weird.

pDS-GFP is the plasmid that only produces green light. pDS-InR1 is the plasmid that knocks out insulin. As you can see, the bees that were infected with pDS-InR1 started off lighter, on average, than the bees with pDS-GFP, then ended up heavier.

Same deal as above.
So, this is a complicated graph. Essentially, they don’t feed the bees for an hour, then they strap them down and put them next to sugar water. If a bee extends its proboscis, that’s a response. A response rate of 0.25 means 25% of the bees in a treatment group responded. All bees that responded just to water or never responded were kicked out. pDS-InR1 are the bees that had their insulin knocked out; pDS-GFP are the bees that only produce green light; pNR are bees who were infected with empty plasmids. The 0.01 is a p-value for the difference between the empty plasmid and the insulin one, which is roughly 2.6 standard deviations if you read it as a two-sided test on a normal distribution.
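For anyone who wants to translate between p-values and standard deviations, here is a quick Python sketch. It assumes a two-sided test on a normal distribution, which is my assumption; the paper may have used a different test, so treat the numbers as a rough guide.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def two_sided_p_to_z(p):
    """Number of standard deviations that corresponds to a two-sided p-value."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)

def z_to_two_sided_p(z):
    """Two-sided p-value for a result `z` standard deviations from the mean."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

print(round(two_sided_p_to_z(0.01), 2))  # 2.58
print(round(z_to_two_sided_p(2), 3))     # 0.046
print(round(z_to_two_sided_p(3), 4))     # 0.0027
```

So a p-value of 0.01 is a bit stronger than 2 standard deviations, and 3 standard deviations is a much higher bar than either.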

Then, they tried for the interesting stuff. They used another kind of dsRNA to interfere with the reproduction of deformed wing virus (DWV) in bees. The combo of deformed wing virus and Varroa mites is super deadly for bees, and the two are the main direct causes of colony collapse disorder. So, first, they infected honey bees with DWV, then gave them a plasmid to produce dsRNA to interfere with the reproduction. 45% of infected bees survived 10 days later with the plasmid; only 25% of bees survived without it.

Dashed line is when the bees were injected with just buffer solution, solid line is when they were injected with the virus. pDS-DWV2 are the bees that have the plasmid that protects against the virus; pDS-GFP are the bees that have the plasmid that just produces green light; pNR are bees with an empty plasmid. *** marks a highly significant difference (asterisks conventionally stand for p-value thresholds rather than a count of standard deviations), NS means not significant.

Next, they tried to use dsRNA to kill mites feeding on bees. This is a little more complicated, because the dsRNA gets produced in the bees, and then the mites feed on the bees and ingest the RNA. That then kills the mites. 50% of mites survived after 10 days when they didn’t feed on bees with the plasmid; only 25% of mites survived after 10 days when they fed on bees with the plasmid.

This is a confusing graph, because they infected the bees with the plasmid, then measured the survival rate of the mites that fed on those bees. pDS-VAR is the plasmid that kills mites. pDS-GFP is the green light, and pNR is just empty plasmid. ** marks a significant difference (asterisks conventionally stand for p-value thresholds rather than a count of standard deviations).

Overall, this is a pretty cool paper with some cool methods. Unless they really, really screwed up their data, it seems like they definitely found an effective way to protect bees against viruses and mites through genetic engineering. A lot more bees survived and a lot more mites died than the control.

But I want to talk about some of the limitations and problems I have with the study, too.

First of all, one of their claims is that, even though they produced dsRNA only in the gut, it was effective throughout the bees’ entire body. They used this graph to show this.

The orange is the plasmid producing dsRNA with the green fluorescent protein (the GFP marker), and the gray is a plasmid with nothing in it as a control. The y axis is the number of copies of GFP RNA per ng of normal RNA, x axis is days after inoculation. The y axis is a log scale, so each tick is 10 times more than the last one.

You see that, by day 15, there’s like 10,000 copies of GFP RNA in the gut, 100 in the abdomen, and like 10 in the head for the treatment group. There’s also around 10 copies of GFP RNA in the gut and abdomen for the control group, which is presumably some kind of cross contamination or measurement error.

That’s fine, but I’m not a huge fan of 10 copies of GFP being an error in the control group, but a positive signal in the head treatment group. They try to defend it by the head control group having zero GFP, but I’m not sure that’s the right comparison to make.

I think that the rest of the biomarkers in the head bear that out. Almost all of them are NS (not significant), and I’d imagine the one that is significant is just chance.

So, I’m not convinced any of this stuff makes it to the bees’ head. I think it definitely makes it to the bees’ abdomens, but the effect there is still weird. Look at the biomarker graphs below:

I’ve drawn in a bunch of lines just to point out how confusing the biomarker patterns are. For example, the graphs in row B are for dicer, which should go up with dsRNA (the body produces more of it because it has more dsRNA to dice). Note the y axis is the change, rather than an absolute number (annoying choice on their part).

For both the gut and abdomen, it increases from day 5 to day 10 (column a to column b). However, it stays flat in the abdomen at day 15, but still goes up in the gut. This is supposedly one of their significant results, but what gives? Is there some natural limit in the abdomen that doesn’t exist in the gut?

They buried all these biomarker graphs in the supplement, but I think they’re interesting. At the very least, they complicate the story.

The next issue I have is with the sucrose response graph, which I’ve reproduced below.

If we just had the pDS-InR1 and pNR, I think it’d be a relatively clear story of insulin getting knocked out and bees becoming more sensitive to sugar. But, it’s really confusing what’s going on with the pDS-GFP bees.

It looks like those bees are consistently more responsive to sugar than the pNR bees, even though they should be virtually identical. Why is that? I really wish we could have seen the weight of the pNR bees compared to the other two, so we could have another point of comparison.

The final issue I have is with the bee and mite mortality rate graphs. Below is the bees with virus graph again.

The bees with the protective dsRNA (pDS-DWV2) definitely do better than the bees without (i.e. the bees with just GFP or just the empty plasmid NR) in the treatment group.

But the pDS-DWV2 bees also definitely do better in the control group, which shouldn’t happen (they’re not being attacked by the virus). The graph says the gap is not significant, and it might be right, but it’s still a big gap. It’s almost the same size as the treatment gap.

I’m also wondering why up to 40% of bees are dying in 10 days in the control group (the orange dashed line). Bees, according to Google, live 122 to 152 days, so they shouldn’t be dying that quickly. I mean, obviously it’s traumatic to get stabbed in the chest with a comparatively giant needle and pumped full of fluid, but, if that’s the case, what effect is that having on the treatment group? How much of the death is from the virus vs. the trauma? Couldn’t they find a better way of infecting them?

I also wonder about the Varroa mite graph, which I’ve reproduced again below. In the graph, 50% of mites die after 10 days in the control group. According to Google, mites live for 2 months. Why are so many of them dying after 10 days?

I’d like to see a control group of mites feeding on bees in “the wild” (i.e. in a normal beehive), to see what the normal survival rate should be.

So, I think these are some strong results, but they’re complicated. 

I don’t think the story of how dsRNA moves around the body is super clear, and I think it probably doesn’t make it to the head at all, contrary to what the paper claims. 

I think that either insulin is more complicated in bees than this paper assumes (i.e. it doesn’t have such a clear relationship to propensity towards sugar water), or there was something wrong with the GFP bees.

Finally, I think this method does protect against mites and deformed wing virus, but how much it does is complicated by the fact that the bees and mites died a lot regardless of what was done.

Final question: how close is this to production?

Well, barring legal issues, I think this could actually be close. Here are the two big issues standing in the way:

First, it seems like each bee has to be individually treated. The study authors actually tried to see if bees could infect each other with the genetically engineered bacteria, but it only worked on 4/12 newly emerged workers (which is obviously not a large sample size). They’d need to figure out some way to encourage bees to infect each other more, or beekeepers would have to individually treat each bee (which is labor intensive and probably traumatizing for the bees).

The other issue would be with regards to creating the plasmids, putting them into bacteria, then infecting the bees with the bacteria. That’s all expensive and labor intensive, and definitely not the sort of thing that beekeepers would want to do themselves. I’m actually very curious about the total cost of the plasmids for this experiment, and how much that would increase given the number of plasmids you’d need for all the bacteria.

Of course, the ultimate dream would be to do this to humans. Given that there are some viruses (which shall not be named) which currently don’t have effective treatments, the possibility of simply injecting ourselves with gut bacteria of our own transformed to produce dsRNA is really attractive.

I think there are some serious issues with that though. Humans have a really complex immune system compared to bees, and I’m not sure how our immune system would react to a bunch of random snippets of RNA floating around our blood stream. As a last ditch effort in a severe case though… might be interesting. I’ll explore that with my next post.

Self-organized criticality: the potential and problems of a theory of everything

Note: this essay is outside of my comfort zone, so there might be a few mistakes. I relied a lot on this paper and Wikipedia to help me think about it. Mistakes are my own.

The 1987 paper “Self-organized criticality: An explanation of the 1/f noise”, by Bak, Tang, and Wiesenfeld, has 8612 citations. That is an astonishingly high number for a paper that presents a model for statistical mechanics. Even more astonishing is the range of papers that cite it. Just in 2020, it’s been cited by a paper on brain activity as related to genes, a paper on the “serrated flow dynamic” of metallic glass, and a paper on producing maps of Australian gold veins.

It is an incredibly influential paper on a huge variety of subjects. I mean, I doubt the scientists who wrote those papers have a single other citation in common in their whole research history. How did they all end up citing this one paper? What’s been the effect on science of having this singular paper reach across such a wide range of subjects?

These are the topics that I want to explore in this essay. Before I can, though, we have to start by explaining what the paper is and what it tries to be.

Self-organized criticality, or SOC, is a concept coming out of complexity science. Complexity science is generally the study of complex systems, which covers a really broad range of fields. You might remember it as the stuff that Jeff Goldblum was muttering about in Jurassic Park. When you get a system with a lot of interacting parts, you can get some very surprising behavior coming out depending on the inputs.

Bak, Tang, and Wiesenfeld, or BTW, were physicists. They knew that there were some interesting properties of complex systems, namely that they often displayed some similar signals. 

For one, if you measure the activity of complex systems over time, you often see a 1/f signal, or “pink noise”. For instance, the “flicker noise” of electronics is pink noise, as is the pattern of the tides and the rhythms of heart beats (when you graph them in terms of frequency).

From . Notice how the baseline of 1/f wanders? The basic reason is that it’s from a complex system with complex inputs.

For another, if you measure the structure of complex systems over space, you often see fractals. They’re present in both Romanesco broccoli and snowflakes. 

This is a gorgeous image of Romanesco broccoli from Wikipedia. It naturally approximates a fractal.

BTW proposed that these two are intimately related to each other, which had been suggested by others before. What was new in their proposal was that both can come from the same source [1]. In other words, 1/f noise and fractals can be caused by the same thing: criticality.

Criticality distinctions

Criticality is a phenomenon that occurs in phase transitions, where a system rapidly changes to have completely different properties. The best studied example is with water. Normally, if you heat water, it’ll go from solid, to liquid, to gas. If you pressurize water, it’ll go from liquid to solid (this is really hard and requires a lot of pressure).

However, if you both heat and pressurize water to 647 K and 22 MPa (or 700 degrees Fahrenheit and 218 times atmospheric pressure), it reaches a critical point. Water gets weird. At the critical point (and in the vicinity of it), water is compressible, expandable, and doesn’t like dissolving electrolytes (like salt). If you keep heating water past that, it becomes supercritical, which is a whole different thing.

A nicely labeled phase transition diagram from Wikipedia. Note that criticality is within the very close vicinity of that red dot.

So, there are two important things about criticality. First, the system really rapidly changes characteristics within a very close vicinity of the critical parameters. Water becomes something totally unlike what it was before (liquid/gas) or after (supercritical fluid). Second, the parameters have to be very finely tuned in order to see this. If the temperature or pressure is off by a bit, the criticality disappears.

So what does that have to do with 1/f noise and fractals? Well, systems are scale invariant at their critical point. That means that, if you graph the critical parameters (i.e. the things that allow the system to correlate and form a coherent system, like electromagnetic forces for water), you should always see a similar graph, no matter what scale you’re using (nano, micro, giga, etc.). This is different from systems away from their critical point, which usually are a mess of interactions that change depending on where you zoom in, like the electromagnetic attractions shifting among hydrogen and oxygen atoms in water.

A mesmerizing display of scale invariance from Stack Exchange. No matter how much we zoom in, the graph remains the same.

This is suggestive of fractals and 1/f noise, which are also scale invariant. So, maybe there can be a connection that can be explored. 
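To make "scale invariant" concrete, the standard example is a power law. The symbols below (f, x, C, a, c) are mine, not BTW's notation. Rescaling x by any factor c only multiplies the function by a constant, so the shape of the curve looks the same at every zoom level:

```latex
\begin{aligned}
f(x) &= C\,x^{-a} \\
f(cx) &= C\,(cx)^{-a} = c^{-a}\,f(x) \\
\log f(x) &= \log C - a \log x
\end{aligned}
```

The last line is why power laws show up as straight lines on log-log plots, with slope -a. 1/f noise is the case a = 1 applied to the power at each frequency.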

Before BTW could make that connection stronger, though, they needed to fix the second important thing about criticality: the finely tuned parameters. 1/f noise and fractals are everywhere in nature, so they can’t come from something that’s finely tuned. To go back to water, you’re never going to see water just held at 647 K, 22 MPa for an extended period of time outside of a lab. 

This is where BTW made their big step. What if, they asked, systems didn’t have to be tuned to the parameters? What if they tuned themselves? Or, to use their terminology, what if they self-organized?

Now, for our water example, this is clearly out of the question. Water isn’t going to heat itself. However, not every phase transition has to be solid-liquid-gas. It just needs to involve separate phases that are organized and can transition (roughly). Wikipedia lists 17 examples of phase transitions. All of these have critical points. Some of them have more than one. 

So BTW just needed to find phase transitions that could be self-organized. And they kind of, sort of, did. They created a model of a phase transition that could self-organize (ish) to criticality. This was the sandpile model.

It goes like this: imagine dropping grains of sand on a chessboard on a table. When the grains of sand get too high, they topple over, spilling over into the other squares. If they topple over on the edge, they fall off the edge of the table. If we do this enough, we end up with an uneven, unstable pile of sand on our chessboard.

If we then start dropping grains randomly, we’ll see something interesting. We see a range of avalanches. Most of the time, one sand grain will cause a small avalanche, as it only destabilizes a few grains. Sometimes, that small avalanche causes a massive destabilization, and we get a huge avalanche.

What BTW consider important about this model is that the sandpile is in a specific phase, namely heaped on a chessboard. This phase is highly sensitive to perturbations, as a single grain of sand in a specific spot can cause a massive shift in the configuration of the piles.

If you graph the likelihood of a massive avalanche vs. a tiny one, you get a power law correlation, which is scale invariant. And, most importantly, once you set up your initial conditions, you don’t need to tune anything to get your avalanches. The sandpiles self-organize to become critical.

This graph from the original paper is on a log-log scale, so a straight line means that there’s a power-law correlation. The fall-off at the end is dictated by the size of the chessboard.
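The sandpile described above is simple enough to simulate. Here is a minimal Python sketch using the common textbook rules, which are my assumptions rather than a reproduction of the original paper's code: a square topples when it holds 4 grains, sending one grain to each of its 4 neighbors, and avalanche size is counted as the number of topplings.

```python
import random

def drop_grain(grid, row, col, threshold=4):
    """Add one grain at (row, col), topple until stable, and return the avalanche size."""
    n = len(grid)
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier toppling
        grid[r][c] -= threshold
        topplings += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))  # still too tall, topple again later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                grid[nr][nc] += 1
                if grid[nr][nc] >= threshold:
                    unstable.append((nr, nc))
            # grains pushed past the edge fall off the table
    return topplings

def simulate(size=20, grains=5000, seed=0):
    """Drop grains on random squares and record the size of each avalanche."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    sizes = [drop_grain(grid, rng.randrange(size), rng.randrange(size))
             for _ in range(grains)]
    return grid, sizes

grid, sizes = simulate()
print("largest avalanche:", max(sizes))
print("drops causing no avalanche:", sizes.count(0))
```

Nothing in the loop tunes the pile toward any particular state. You just keep dropping grains, and the mix of small and large avalanches appears on its own.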

So, with this admittedly artificial system that ignores physical constraints (e.g. friction), we get something that looks like criticality that can organize itself. From there, we can try to get to fractals and 1/f noise, but still in artificial systems. Neat! So how does that translate to 8600 citations across an incredibly broad range of subjects?

Getting to an avalanche of citations for SOC

Well, because BTW (especially Bak, the B) weren’t just going to let that sit where it was. They started pushing SOC hard as a potential explanation any time you saw both fractals and 1/f noise together, or even a suggestion of one or both of them.

As long as you had a slow buildup (like grains of sand dropping onto a pile), rapid relaxation (like the sand avalanche), power laws (big avalanches and small ones), and long range correlation (the possibility of a grain of sand in one pile causing an avalanche in a pile far away), they thought SOC was a viable explanation.

One of the earliest successful evangelizing efforts was an attempt to explain earthquakes. It’s been known since Richter that earthquakes follow a power law distribution (small earthquakes are 10-ish times more likely to happen than 10-ish times larger earthquakes). In 1987, around the same time as SOC, it became known that the spatial distributions and the fault systems of earthquakes are fractal.

Gutenberg-Richter law plotted with actual earthquakes, number of earthquakes vs magnitude. The magnitude on the bottom is log scale (the Richter scale), so we get a clear power law correlation. From Research Gate.
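
As a rough illustration of that “10-ish times” relationship, the Gutenberg-Richter law is usually written as log10(N) = a - b*M, where N is the number of earthquakes of at least magnitude M. The values a = 8 and b = 1 below are placeholder assumptions I picked for the sketch, not fitted data, and `expected_count` is just an illustrative name.

```python
def expected_count(magnitude, a=8.0, b=1.0):
    """Number of quakes at or above `magnitude` under log10(N) = a - b*M."""
    return 10 ** (a - b * magnitude)


# Each extra unit of magnitude cuts the expected count by a factor of 10**b.
for m in (4, 5, 6, 7):
    print(m, expected_count(m))
```

Because the magnitude axis is already logarithmic, plotting log10(N) against M gives a straight line with slope -b.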

From there, it wasn’t so far to say that the slow buildup of tension in the earth’s crust, the rapid relaxation of an earthquake, and the long range correlation of seismic waves meant SOC. So Bak created a model that produced 1/f noise in the time gap between large earthquakes, and that was that! (Note: if this seems a little questionable to you, especially the 1/f noise bit, read on for the parts about problems with SOC).

Next: the floodgates were open. Anything with a power law distribution was open for a swing. Price fluctuations in the stock market follow a power law, and they have slow-ish build-up and can have rapid relaxation. Might as well. Forest fire size follows a power law, and there’s a slow buildup of trees growing then a rapid relaxation of fires. Sure! Punctuated equilibrium means that there’s a slow buildup of evolution, and then a rapid evolutionary change (I guess). Why not?

Bak created a movement. He was very deliberate about it, too. If you look at the citations, both the earthquake paper and the punctuated equilibrium paper were co-written by Bak. They were then cited and carried forward by other people in those specific fields, but he started the avalanche (if you’ll forgive the pun).

And he didn’t stop with scientific papers, either. He spread the idea to the public as well, with his book How Nature Works: the science of self-organized criticality. First of all, hell of a title, no? Second of all, he was actually pretty successful with this. His book, published in 1996, currently has 230 ratings on Goodreads. For a wonky book from 1996, that’s pretty amazing!

That, in a nutshell, is how the paper got to 8600 citations across such a broad range of fields. It started off as a good, interesting idea with maybe some problems. It could serve as an analogy and possibly an explanation for a lot of things. Then it got evangelized really, really fervently until it ended up being an explanation for everything.

But what about science?

This brings us back to our second question, which is: what have the effects on science been of this? Well, of course, good and bad.

Let’s discuss the good first, because that’s easier. The good has been when SOC has proved to be an interesting explanation of phenomena that didn’t have great explanations before. For example, this paper, which I relied on heavily in general for this essay, discusses how SOC has apparently been a good paradigm for plasma instabilities that did not have a good paradigm before.

Now, I completely lack any knowledge of plasma instabilities, so I’ll have to take their word for it, but it seems unlikely that the plasma instability community would know of SOC without Per Bak’s ceaseless evangelism.

The bad is more interesting. Any scientific theory of everything is always going to have gaps. However, most of them never have any validity in the first place. Think of astrology, Leibniz’s monads, or Aristotle’s essentialism: they started off poorly and were evangelized by people who didn’t really understand any science in the first place.

SOC is more nuanced. It had and has some real validity and usefulness. Most of the people who evangelized and cited it were intelligent, honest people of science. However, Bak’s enthusiastic evangelism meant that it was pushed way harder than the average theory. As it was pushed, it revealed problems not just with how SOC was applied, but with a lot of the way theory is argued in general.

The first and most obvious problem was with the use of biased models. This is always a tough problem, because not everything can be put in a lab or even observed. There is always a tension between when a model is good enough, and what things are ok to put in a model or leave out. But Bak and his disciples clearly created models that were designed to display SOC first of all, and only model the underlying behavior secondarily.

Bak’s model of punctuated equilibrium is a particularly egregious example. Bak chose to model entire species (rather than individuals), chose to model them only interacting with each other (ignoring anything else), and modeled a fitness landscape (which is itself a model) on a straight line. In more straightforward terms, his model of evolution is dots on a line. Each dot is assigned a number. When the numbers randomly change, they are allowed to change the dots around them too with some correlation.
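
For the curious, here’s roughly what “dots on a line” looks like in code. This is a sketch of the update rule as the Bak-Sneppen model is usually described (find the lowest number, then re-roll it and its two neighbors). The ring layout, the species count, the step count, and the name `bak_sneppen` are my assumptions, not details checked against the paper.

```python
import random


def bak_sneppen(n_species=50, steps=2000, seed=0):
    """Repeatedly replace the least-fit species and its two neighbors with new random numbers."""
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_species)]
    for _ in range(steps):
        # The "dot" with the lowest number is the one that changes next.
        weakest = min(range(n_species), key=fitness.__getitem__)
        # Changing it also re-rolls the dots on either side (wrapping around the ends).
        for offset in (-1, 0, 1):
            fitness[(weakest + offset) % n_species] = rng.random()
    return fitness


final = bak_sneppen()
print("lowest fitness after the run:", min(final))
```

That really is the whole model: no individuals, no environment, just a list of random numbers and one replacement rule.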

Something like this, with the height of the dots being the numbers he assigned. Seriously. That’s his model of evolution. From here.

This is way, way too far from the underlying reality of individuals undergoing selection. It makes zero sense, and was clearly constructed just to show SOC. Somehow, though, it got 1800 citations.

However, I feel less confident criticizing Bak’s model of earthquakes. In this, he models earthquakes as a two dimensional array of particles. When a force is applied to one particle, it’s also applied to its neighbors. Now, obviously earthquakes are 3-dimensional, and there is a wave component to them that’s not well-represented here, but this seems like an ok place to start.

Maybe it’s not though. Maybe we should really start with three dimensions, and model everything we know about earthquakes before we call an earthquake model useful. Or maybe we should go one step further, and say an earthquake model is only useful once it’s able to make verifiable predictions. Newton’s models of the planets could predict their orbits, after all.

Then again, the incredibly complicated epicycle model could also predict the movements of the planets, to a point. Prediction can’t be the be-all and end-all. This image from Wikipedia is from Cassini’s highly developed epicycle model.

A purist might hold that models aren’t useful until they’re predictive, but that’s a tough stance for people actually working in science. They have to publish something, and waiting until your model can make verifiable predictions means that people won’t really be communicating any results at all. Practically speaking, where do we draw the line? Should we draw the line at any model which is created to demonstrate a theory, but allow any “good faith” model, no matter how simplistic?

A different sort of issue comes up with SOC’s motte-and-bailey problem. Bak, in his book How Nature Works, proposed SOC for lots of things that it doesn’t remotely apply to. Punctuated equilibrium was just one example. When he was pressed on it, he’d defend it by going back to the examples that SOC was pretty good on.

The motte and bailey style of argumentation, from A Moment With Mumma. I had to find this one, because the top result had text from arguments about communism. Oh, the Internet.

It’s not a problem to propose a theory to apply to new situations, of course. However, so many theorists rely on the validity of a theory for a limited example to justify it for a much broader application, rather than defending it for the broader application.

On one level, that’s just induction: recognizing the pattern. However, it’d seem that there should be much more effort put into establishing that there is a pattern, and then justifying the new application as soon as possible.

This ties into the next problem: confusing necessary and sufficient assumptions. In the initial paper, BTW were pretty careful about their claim. SOC was sufficient, but not necessary, to explain the emergence of fractals and 1/f noise. It was necessary, but not sufficient, to have power law distribution, long range correlations, slow buildup with rapid relaxations, and scale invariance to have SOC [2].

Sufficient vs. necessary: before you can identify something as a square, it is necessary for it to have 4 sides, but it’s not sufficient (it could be a rectangle). It is sufficient and necessary for a square to have 4 sides of equal length and 4 90 degree angles. Image from Quora.

When Bak was hunting for more things to apply SOC to, he got sloppy. He would come close to making claims like fractals and 1/f noise implied SOC, or power laws implied SOC. Now, this is maybe ok at a preliminary part of the hunt. If you’re looking to find more applications of SOC, you have to look somewhere, and anything involving fractals, power laws, or the like is an ok place to start looking. But you can’t make that implication in your final paper.

Not only does this make your paper bad, but it poisons the papers that cite it, too. This is exactly what’s happened with some of the stranger papers that have cited BTW, which is another reason for its popularity besides Bak’s ceaseless evangelism and its validity for limited cases. SOC got involved in neurology through this paper, which uses a power law in neuronal avalanches to justify the existence of criticality. In other words, it says a power law is sufficient to assume criticality, and then goes from there to create a model which will justify self-organization.

But that’s backwards! Power laws are necessary for criticality; they aren’t sufficient. Power laws show up literally everywhere, including in the laws of motion, the Stefan-Boltzmann equation for black body radiation, and the growth rate of bacteria. None of those things are remotely related to criticality, so they obviously can’t imply criticality. The paper, which is cited 454 times (!), is based on a misunderstanding.

SOC is actually kind of a unique case, scientifically, because it did lay out its necessary and sufficient hypotheses so clearly. That’s why I can point out the mistake in this paper. However, many less ambitious scientific hypotheses aren’t nearly so clear. For example, here’s the hypothesis of the neurology SOC paper, copy pasted from the abstract: “Here, we demonstrate analytically and numerically that by assuming (biologically more realistic) dynamical synapses in a spiking neural network, the neuronal avalanches turn from an exceptional phenomenon into a typical and robust self-organized critical behaviour, if the total resources of neurotransmitter are sufficiently large.”

The language is a bit dense, but you should be able to see that it’s unclear if they think SOC is sufficient and necessary for neuronal avalanches (you have to have it), or just sufficient (it’s good enough to have it). In fact, I’d wager that they wouldn’t even bother to argue the difference.

It’s only because SOC is such an ambitious theory and Bak tried to apply it to so many things that he was forced to be so clear about necessary vs. sufficient. Way, way too often in scientific papers suggestive correlations are presented, and then the author handwaves what exactly those correlations mean. If you present a causal effect, is that the only way the causal effect can occur? Are there any other ways?

The weighing

So, in conclusion, the bad parts of SOC’s incredibly wide-ranging influence are a lot of the bad parts of scientific research as a whole. Scientists are incentivized professionally to publish papers explaining things. That’s one of the big purposes of scientific papers. They are not particularly incentivized to be careful with their explanations, as long as they can pass a sniff test by their peers and journal editors.

This means that scientists end up overreaching and papering over gaps constantly. They develop biased models, over-rely on induction without justification, and confuse or ignore sufficient and necessary. 

BTW’s impact, in the end, was big and complicated. They created an interesting theory which had an enormous impact on an incredible variety of scientific fields in a way that very few other theories ever have. On most of the fields, SOC was probably not the right fit, although it may have driven the conversation forward. On a few of the fields, it was a good fit, and helped explain some hitherto unexplained phenomena. On all of the fields, it introduced new terms and techniques that practitioners were almost certainly unfamiliar with.

It’ll be interesting to see when the next theory of everything comes about. Deep learning and machine learning are turning into a technique of everything, which comes with problems of its own. Who knows?


1. This is where some of the problems and the ubiquity of SOC come from. Bak, in particular, has come very close to suggesting they always come from the same source, which is far less defensible than saying they can come from the same source. See the motte and bailey discussion further on.

2. Quick primer on sufficient and necessary: Sufficient means that if you have SOC, you are guaranteed to have fractals and 1/f noise, but you don’t need to have SOC to have those. Necessary means you need power laws, etc. for SOC, but you might also need more things for SOC.

How to fix how people learn calculus: make calculus exciting again

Most people who take a calculus course never really learn calculus. They have only a hazy grasp of how the pieces fit together. Sure, they might be able to tell you that the derivative of x^2 is 2x, but ask them why and you’ll get a blank look. They learned to mask their confusion with shortcuts, and their teachers never really checked to see if there was anything deeper.

This is a pity, and a mark of how we fail to teach calculus correctly. If students really learned calculus, they wouldn’t find it confusing. They’d find it shocking, instead. Calculus is unlike arithmetic, algebra, or geometry, and learning calculus is learning a whole new way to think about math.

Arithmetic can be understood by simply counting on your fingers. Geometry can be understood by drawing shapes in the sand (or on the blackboard). Basic algebra can be understood by replacing the numbers in our arithmetic with variables. All of these build off of real-world analogues.

Calculus is different. Calculus has paradoxes at its core, and understanding calculus means coming to grips with these paradoxes in a way that doesn’t have a real world analogue. This is a tall order. In fact, it’s so tall it took literally 2000 years to do so.

When Democritus and Archimedes first approached the integration part of calculus through geometry, they recognized the usefulness of it quickly. Have an irregular shape or parabola that you need to calculate the area of? You just divide it up into infinitely small triangles, and “exhaust” the area. It actually works pretty well.

Illustrates Archimedes’ method of exhaustion for finding the area of a region under a parabola.
From AMSI. Note that the smaller the triangles get, the closer they get to approximating the area, but there are still gaps.

But what does infinitely small mean? The Greeks couldn’t figure it out. If they’re triangles, they presumably have an area. If they have an area, putting a bunch of them together should add up to some huge number, even if each one individually is small. If we say instead that they don’t have an area, putting a bunch together should add up to an area of zero. But somehow, calculus is supposed to tell us that putting an infinite number of triangles together adds up to a finite area.
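
To see how the pieces can add up to something finite, it helps to do the bookkeeping. In Archimedes’ construction for the parabola, each new round of triangles adds a quarter of the area that the previous round added, so the running total creeps toward 4/3 of the first triangle without ever passing it. Here’s a small Python sketch of that; the name `exhaustion_area` and the choice to measure area in units of the first triangle are mine.

```python
from fractions import Fraction


def exhaustion_area(rounds, first_triangle=Fraction(1)):
    """Total area after `rounds` rounds of triangles, each round a quarter of the last."""
    total = Fraction(0)
    added = first_triangle
    for _ in range(rounds):
        total += added
        added /= 4  # the next round of smaller triangles covers 1/4 as much area
    return total


limit = Fraction(4, 3)
for rounds in (1, 2, 3, 10):
    area = exhaustion_area(rounds)
    print(rounds, area, "gap to 4/3:", limit - area)
```

The gap shrinks by a factor of four every round but never reaches zero in any finite number of rounds, which is exactly the part the Greeks found so uncomfortable.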

Illustration of Zeno's dichotomy paradox
It’s related to one of Zeno’s paradoxes: in order for the guy to reach the end of the race, he has to cross to the halfway mark. Once he gets there, he has to cross to the halfway mark of what’s left (3/4). And halfway again, and so forth. He’s traveling an infinite number of increasingly small distances, so you’d expect for him to never get there. Yet somehow, he does. How?

This should be shocking. Our understanding of (Euclidean) geometry is almost entirely based on what we can draw. Integrative calculus looks like something we can draw on paper, but, when we try to, we end up with something that doesn’t really make sense. This very much disturbed the Greeks (and really disturbed the Jesuits as they followed in the Greek footsteps, to the point that the Jesuits declared “infinitesimals” heretical). What kind of geometry made less sense when you drew it out?

Understanding calculus from our other main method of understanding math, algebra, was even less fruitful at first. The Islamic Golden Age mathematicians, like Sharaf al-Din al-Tusi, experimented a lot with solving polynomials, and eventually realized they could find the maxima of certain functions by limits (note: that link goes to a clever recast of al-Tusi’s work into a modern day calculus word problem). But the use of that sort of “pure” algebra stops there. Without strict definitions of functions or limits, it’s hard to recognize a problem like “finding the maximum of a cubic polynomial” as what it is, which is finding the derivative of a function.

We had to literally extend algebra before we could get to derivative calculus from the other way. We needed functions, and the Cartesian coordinate plane, the latter of which was literally invented by Descartes and his academic descendants to help understand calculus (a fact which surprised me while researching this, given the standard math curriculum). If we understand functions and we have the coordinate plane, we can plot functions onto the coordinate plane. Then we can think about dividing up a parabola into line segments, and examining the slope of those line segments can get us thinking about how we might predict the way the slope changes over the course of a curve.
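
Here’s that line-segment idea as a quick Python sketch: take the parabola y = x^2, draw a segment from x to x + h, and watch the slope as h shrinks. The helper names `secant_slope` and `square` are just illustrative.

```python
def square(x):
    return x * x


def secant_slope(f, x, h):
    """Slope of the straight segment joining (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


# Shorter and shorter segments starting at x = 3.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, secant_slope(square, 3.0, h))
```

For this particular curve the algebra works out to a slope of 2x + h, so at x = 3 the printed slopes settle toward 6 as the segments get shorter.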

From this random PowerPoint presentation. The slope of the tangent line is, of course, the slope of the curve at the spot where the line touches it. The title of this slide suggests one of the main reasons that mathematicians became interested in finally formalizing derivatives: to relate acceleration, velocity, and position. Newton, of course, was the most famous, but Descartes and Galileo before him made huge amounts of progress on the same issue. Leibniz, interestingly, came to calculus on purely theoretical concerns, like the Arabs before him.

This is a useful trick. We can get to maxima and minima by looking at when the slope will reach zero. And, we’re at the point where we can be shocked again. It’s another paradox!

A smooth parabola can’t really be made of line segments. It’d end up being choppy. So the line segments would have to be infinitely small, and then we get the same issue as before. A parabola of definite length being made up of an infinite number of infinitely small line segments seems like a contradiction. Either they have a length, in which case the parabola is choppy, or they don’t, in which case they shouldn’t be able to “make up” anything.

Here we have a choppy parabola. Each line segment is straight (i.e. has a definite slope) and obviously has an actual length, but they need gaps in between them in order to make the suggestion of a parabola. If we tried to connect them with straight lines, they’d run past each other. From mathworks.

So we’ve got paradoxes on either side of calculus. If we try to understand calculus by our old geometry, we’ve got a paradox of infinitely small triangles. If we try to understand calculus by our old algebra, we’ve got a paradox of infinitely small line segments. And yet both seem to work well for limited cases. How?

Ironically, pointing out these paradoxes of infinity can clue us into the greatest shock of all, which is that integrals and derivatives are two sides of the same coin. This is when calculus gets really surprising. The flip side of the rate of change of a parabola is the area under a parabola. 

Every time we add a bit of area (the red stripe), we are adding about the same as the function multiplied by the amount forward. The smaller the amount forward, the more exact the equality. This can only work with a strict definition of functions, though, which is what the Islamic mathematicians and the Greeks were lacking. From Wikipedia.
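
If you’d like to check that relationship numerically, here’s a rough Python sketch. It adds up thin strips under the parabola y = x^2 and then asks how fast that accumulated area grows as we move forward. The strip count, the step size, and the helper names are all arbitrary choices of mine.

```python
def parabola(x):
    return x * x


def area_under(f, a, b, strips=10_000):
    """Approximate the area under f between a and b by adding up thin strips."""
    width = (b - a) / strips
    # Each strip's height is the function's value at the middle of the strip.
    return sum(f(a + (i + 0.5) * width) for i in range(strips)) * width


def area_growth_rate(f, x, step=1e-3):
    """How quickly the area accumulated from 0 to x grows as x moves forward by `step`."""
    return (area_under(f, 0.0, x + step) - area_under(f, 0.0, x)) / step


print(area_under(parabola, 0.0, 2.0))    # close to 2**3 / 3
print(area_growth_rate(parabola, 2.0))   # close to parabola(2.0) == 4
```

The second number is the point: the rate at which the area grows at x = 2 comes out very near the height of the curve at x = 2, and the match tightens as `step` shrinks.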

There is literally nothing in geometry and algebra so shocking as this. It’s so shocking that it took humanity 2000 years, from Democritus to Barrow (the first man to come up with a geometric proof of this) to realize this. Not only are geometry and algebra fundamentally related, but they’re related through something that’s both paradoxical and entirely physical (just think of the relation between total velocity and amount moved).

When students learn this, they should be like, “Holy shit, math! You mean that this entire time there’s been a deep relationship between two subjects I’ve been holding separate in my head? And now, armed with this new knowledge, I can go out and solve real world issues! That’s amazing!”

But, unfortunately, they’re not like that. I mean, I don’t know if any teenager would be caught dead being that enthusiastic about anything, but teachers don’t even attempt to make students that enthusiastic.

It’s not the teacher’s fault, either. The fault is in the curriculum. In this McGraw Hill textbook (which I think I used in my own AP Calculus class), this fundamental theorem of calculus is taught in chapter 4.5, sandwiched in between “The Definite Integral” and “Integration by Substitution”. So, students are taught how to do a derivative, how to do an integral, then “by the way these two seemingly unrelated topics are actually deeply related”, then “and also here’s another way to do integrals”.

Students react logically to this. They’re not shocked by the fundamental theorem of calculus, they’re confused. If it’s fundamental, why would it be stuck in the middle of another section? If they’re being taught both derivatives and integrals already, isn’t it obvious that they’re related as being part of calculus? It’s just another opportunity to zone out in math class.

That’s certainly how I remember feeling about it, and I was one of the best students in my high school math class. I didn’t care. I was too busy memorizing limits, differentiation, and integration tricks. When test day came, I had a bunch of formulae and equations in my head that I plugged in appropriately, and I scored the highest mark possible on both my AP Calculus exams.

I was so far into this mindset of “memorize techniques to score well on exams” that I don’t think a single lesson would have done anything, to be honest. To have really learned calculus well enough, I think I’d need to have been taught to appreciate the practical concerns that drove the development of calculus, as well as to understand the theoretical underpinnings of calculus.

If I were to teach a calculus class myself, those would actually be the main things I’d focus on. Memorization of formulae and techniques should be a small part of a calculus class, not the majority of it. Sure, it can make parts of calculus easier, but overreliance on it gives a “Chinese room” understanding of calculus. The student sees a problem and is able to put out the correct answer, but doesn’t really understand why the answer is correct. More importantly, if they see a similar problem formulated differently, they’re unable to solve it.

To handle the motivation portion, I’d start by introducing the practical concerns that drove the development of calculus. This could literally be a lab portion of the class. Bring out trebuchets for derivatives or make students try to fill a warehouse for integrals. Show them why people ever cared to calculate these values. But, most importantly, have them try to solve the practical problems first with just geometry and algebra, so they can appreciate the usefulness of calculus in the same way their academic ancestors did. These labs should motivate learning calculus, not illustrate it.

The theoretical underpinnings of calculus would, admittedly, be trickier. Having taught a lot of adults math, I am confident in saying that most students have an ok understanding of geometry and a poor understanding of algebra. This is because calculus is not the only math course that’s structured badly. Precalculus is, I dare say, even worse.

For those who aren’t familiar, precalculus is the American educational system’s way of bridging the gap between algebra and calculus. Instead of focusing on a deeper understanding of algebraic proofs and fundamentals, though, it’s a weird grab bag of introductions to some math that students are probably unfamiliar with. So, a precalculus course introduces functions, polynomials, exponents, logarithms, trigonometry, and polar coordinates, in sequence, one after the other. And because the course is explicitly not about teaching calculus, the only clue that students get as to why they’re taught this is “you’ll need it later”.

Here’s the table of contents of a precalculus textbook from McGrawHill. Again, this is pretty similar to the one I used. Imagine moving from polynomials, to logarithms, to trigonometry in the space of 3 chapters. I literally do not think I could imagine 3 mathematical topics that have less in common.

What kind of student could learn 6 disparate pieces of math, one after another, in a mostly disconnected fashion, then start the next year and apply them all to calculus? Well, in my experience, pretty much none of them. They fail to learn functions, get confused why they’re learning the rest of it, and then start calculus having forgotten algebra but still not understanding precalculus.

A calculus course, then, has to take into account that a lot of students won’t have the right background for it. In that case, I’d say the entire first semester or so should be dedicated to a proper theoretical background for calculus: finding areas with geometry, functions, algebraic proofs, and at last the Cartesian coordinate plane to unite algebra and geometry. This would provide a clear theoretical transition into calculus (and theoretical motivation for calculus with proper foreshadowing). 

Then derivatives and integrals can be covered in the second semester, with add-ons like exponents and logarithms, polar coordinates, and infinite series saved for either a follow-up course or for advanced students. The actual calculus section of the course would likely be similar to this excellent MIT textbook, actually.

A calculus course taught this way would, hopefully, make deep sense to the students. They’d begin with developing the background to calculus, ending the first semester with the same background that Newton and Leibniz had when they developed calculus.

Then the second semester could provide that shocking, “Aha!” moment. Students would not just get the knowledge of Newton and Leibniz, but get some small sense of what it must have been like to be them as they made their groundbreaking discoveries.

It took me a long time to appreciate math, and calculus longest of all. I only realized in college that I had been cheated out of a deep understanding of math and given a shallow collection of tricks instead. The education system focuses so much on the utility of math, even when it’s a reach (see the derivatives of trigonometric functions). It should focus on the beauty, the shock, and the awe instead.

Why most intro philosophy courses feel useless and how to fix them

Introduction to philosophy tends to be a useless class. At its best, it tends to feel like a drier version of the stuff you argue about with your friends while high. At its worst, it feels like listening to high people argue while you’re sober. Neither one makes you feel like you’ve accomplished that much more than high talk.

These problems are structural. It’s not just how the classes are taught, but what’s taught in the classes. For instance, take a look at the syllabus to this Coursera course, which actually receives great reviews.

Syllabus to Introduction to Philosophy

  • What is Philosophy?
  • Morality: Objective, Relative or Emotive?
  • What is Knowledge? And Do We Have Any?
  • Do We Have an Obligation to Obey the Law?
  • Should You Believe What You Hear?
  • Minds, Brains and Computers
  • Are Scientific Theories True?
  • Do We Have Free Will and Does It Matter?
  • Time Travel and Philosophy

Judging by the reviews (4.6 stars from 3,941 ratings!), this is probably a fun class. But this class, without a doubt, is pretty useless.

How do I know that? Well, because literally zero of the questions are answered. It says so in the syllabus. For example, this is how they discuss “Morality: Objective, Relative or Emotive?”:

We all live with some sense of what is good or bad, some feelings about which ways of conducting ourselves are better or worse. But what is the status of these moral beliefs, senses, or feelings? Should we think of them as reflecting hard, objective facts about our world, of the sort that scientists could uncover and study? Or should we think of moral judgements as mere expressions of personal or cultural preferences? In this module we’ll survey some of the different options that are available when we’re thinking about these issues, and the problems and prospects for each.

It’s both sides-ism. It literally guarantees you that you won’t actually get an answer to the question. The best you can get is “options”. This both sides-ism is even worse for those questions that obviously have right answers. Yes, we have knowledge. Yes, we have an obligation to obey the law most of the time. Yes, most scientific theories are true.

Now, it’s possible to cleverly argue these topics using arcane definitions to make a surprisingly compelling case for the other side. That can be fun for a bit. But introduction to philosophy should be about providing clear answers, not confusing options.

Let me make my point with an analogy. Imagine going into an introductory astronomy course, knowing very little about astronomy besides common knowledge. The topic of the first lesson: “Does the Earth revolve around the sun?” The professor then would present compelling arguments for whether or not the Earth revolves around the sun, without concluding for either side. 

If the class was taught well, a student’s takeaway might be something like, “There are very compelling arguments for both sides of whether or not the Earth revolves around the sun.” The student would probably still assume that the Earth revolved around the sun, but assume that knowledge was on a shaky foundation.

This would make for a bad astronomy class, which is why it’s not done. But this is done all the time in philosophy. In fact, most of the readers of this essay, if they only have a surface level impression of philosophy, probably assume philosophy is about continually arguing questions without ever coming to conclusions.

That’s not what philosophy is. At least, that’s not what most of it is. Philosophy is, in fact, the foundation of how to think and how to evaluate questions. Every single academic subject started or was heavily influenced by philosophy, and philosophy can still contribute to all of them.

Making philosophy feel useful again

Introduction to philosophy, properly taught, should be like teaching grammar. Students can think and analyze without philosophy, just like they can speak and write without knowing the formal rules of grammar. But philosophy should provide them with the rigorous framework to think more precisely and, in turn, analyze subjects and ideas in a way that they could not before. Philosophy should change the way a student thinks as irrevocably as knowing when exactly to use a comma changes the way a student writes.

In order for that to be the case, though, introduction to philosophy has to be a different class. It has to be a class with clear answers and clear takeaways, rather than a class with fun questions and arcane discussions. It has to be explicitly didactic: philosophy has things to teach, not just things to discuss.

If I were to design a philosophy course, that’s what I’d do. I’d make a course that assumed no background in philosophy, took a student through the most important, life-changing ideas in philosophy, and gave clear, actionable ways to change how a student should think and live. At the end of the course, I’d want a student to feel confident taking everything I taught out of the classroom to affect the rest of their lives.

And you know what? That’s exactly what I did when I made my own introduction to philosophy course.

Let me give some background. Although I’ve always loved self-studying philosophy, I only took two philosophy courses in college, and I got a B in one and a B+ in the other. The first featured a large, soft man who droned about theory of knowledge while covered in chalk dust. The second featured a hyperactive woman who attempted to engage the class in discussion about our ethical intuitions, while I attempted to engage my intuitions about Flash games (laptops are always a danger in boring classes). 

After college, however, a happenstance meeting got me a job teaching a massive online philosophy course to a Chinese audience. This was a difficult job: I was teaching philosophy in these students’ second language, these students were paying several hundred dollars for the course, and they had no reason to be there besides their own interest (not even grades). The students could drop my class at any time and get a refund.

In fact, because I got real-time viewership numbers, I could literally see people drop my class anytime I ventured into a boring subject. It was terrifying and exhilarating. I lasted 1.5 years at this company (until they switched to Chinese language instruction only), taught around 5000 students total, and went through 3 complete overhauls of my syllabus.

By the time of my last overhaul, I had decided that the guiding principle of my class would be as I wrote above: a backbone to the rest of your intellectual life. Specifically, I had 3 sections to my course: how to think, how to live, and how things should work.

My own course: how to think, how to live, and how things should work

I chose those 3 sections because I felt like those were the most important things that philosophy had to offer, and what had the greatest impact on my own life.

“How to think” directly related to both the academic subjects that my students (most of whom were in college or right out of college) were familiar with, and the later sections of my course. As I described it to my students, the takeaways from “how to think” served as a toolbox. 

The power and the limitations of deduction, induction, and abduction apply to everything built on them, which basically encompasses all academic knowledge. It’s like starting a boxing class with how to throw a punch: a pretty reasonable place to start.

“How to live” was what it sounded like: how to live. I didn’t want to simply call it ethics, as I wanted to make it clear to my students that they should take these lessons out of the classroom. After I described the ethical philosophies to my students, we evaluated them both logically, using the tools from “how to think”, and emotionally, seeing if they resonated.

If the ethical philosophy was both logical and emotionally resonant, I told my students to be open to changing how they lived. All philosophy should be able to be taken out of the classroom, especially something so near to life as ethics. I’m just as horrified by a professor of ethics who goes home and behaves unethically as I would be by a professor of virology who goes home and writes anti-vax screeds.

Finally, “how things should work” was my brief crash course in political philosophy. Political philosophy is a bit hard to teach because the parts that are actually practicable tend to be sequestered off into political science. It’s also hard to teach because, frankly, college students don’t have a lot of political pull anywhere, and China least of all.

So, instead, I taught my students political philosophy in the hopes that they could take it out of the classroom one day. As we discussed, even if their ultimate position is only that of a midlevel bureaucrat, they still will be able to effect change. In our final lesson, actually, we talked about the Holocaust in this respect. The big decisions came from the leaders, but the effectiveness of the extermination came down to the decisions of ordinary men.

Above all, I focused in each section on how what they learned could change their thinking, their lives, and their actions. To do so, I needed to focus on takeaways: what should my students be taking away from the work and life of Socrates, Wittgenstein, or Rawls? As an evidently mediocre philosophy student myself, I am all too aware that asking students to take away an entire lecture is frankly unreasonable. I mean, I taught the course and I have trouble remembering entire lectures a few years later.

So, I focused on key phrases to repeat over and over again, philosophers boiled down to their essence. For Aristotle’s deduction: “it’s possible to use deduction to ‘understand’ everything, but your premises need to be carefully vetted or your understanding will bear no relation to reality”. For Peirce’s pragmatism: “your definitions need to be testable or they need to be reframed to be so”. I ended each lecture by forcing my students to recall the takeaways, so that the takeaways would be the last thing they remembered as they left.

I also distributed mind maps, showing how each philosopher built on the last. The mind maps were distributed at the beginning as empty, then filled in with each subsequent lecture and its takeaways. Not only did this give students a greater understanding of how each philosopher fit into the course, but it gave them a clear sense of progress to see their map filled in.

Philosophy is one of humanity’s greatest achievements. The fact that it’s been relegated to just a collection of neat arguments in a classroom is a tragedy. We live in an age of faulty arguments, misleading news, and a seeming abandonment of even a pretense of ethics. Philosophy can change the way people see the world, if only it’s taught that way.

I’ve discussed the details of how I approached my class below, and also left a link for the course materials I developed. I haven’t touched the material in a couple years, but it’s my hope that others will be able to use it to develop similar philosophy courses. 

A Google Drive link for the course materials I developed

The details of how I structured my course

How to think and takeaways

In “how to think”, we first discussed arguments and their limitations. Arguments are the language of philosophy (and really the language of academia). Being unable to form a philosophical argument and attempting philosophy is like attempting to study math without forming equations. You can appreciate it from the outside, but you won’t learn anything. 

To introduce arguments, I used Socrates, of course. His arguments are fun and counterintuitive, but they also are very clear examples of the importance of analogy and definitions in philosophical arguments. Socratic arguments always started with careful definitions, and always proceeded by analogy. This same structure is omnipresent in philosophy today, and extends to other fields (e.g. legal studies) as well.

To discuss the limitations of this approach I brought in Charles Sanders Peirce’s pragmatic critique of philosophical definitions, and later Wittgenstein’s critique of “language games”, which can easily be extended to analogies. As should probably be clear from my bringing in 19th and 20th century philosophers, I wasn’t aiming to give students a chronological understanding of how argumentation developed in philosophy. I was aiming to give them a tool (argumentation), and show them its uses and limitations.

From there I went on to how to understand new things through deduction and induction. It is easy for philosophy courses, at this point, to discuss deduction, discuss induction, and then discuss why each is unreliable. This leaves the student with the following takeaway: there are only two ways to know anything, and both are wrong. Therefore, it’s impossible to know anything. Given that this is obviously not the case, philosophy is dumb.

I really, really wanted to avoid that sort of takeaway. Instead, I again wanted to give students a sense of deduction and induction as tools with limitations. So I started off students with Aristotle and Bacon as two believers in the absolute power of deduction and induction, respectively. I took care to make sure students knew that there were flaws in what they believed, but I also wanted students to respect what they were trying to do.

For deduction, I then proceeded to use Cartesian skepticism to show the limitations of deduction, and then Kantian skepticism to show the limitations of deduction even beyond that. This reinforced the lesson I taught with Socratic arguments: deduction is powerful, but the premises are incredibly important. Aristotle never went far enough with questioning his premises, which is why so much of his reasoning was ultimately faulty.

Discussing the limits of induction was more interesting. From the Bacon lesson, my students understood that induction was omnipresent in scientific and everyday reasoning. It obviously works. So, Hume’s critique of induction is all the more surprising for its seeming imperviousness. Finally, bringing in Popper to help resolve some of those tensions was a natural conclusion to how to think.

At the end of this section (which took 10 classes, 2 hours each), my students had learned fundamentals of philosophical reasoning and its limitations. My students were prepared to start to apply this reasoning in their own academic and personal lives. They were also prepared to think critically about our next sections, how to live and how things should work.

They did not come away thinking that there were no answers in philosophy. They didn’t inherently distrust all philosophical answers, or think that philosophy was useless. It’s possible to understand the flaws in something without thinking it needs to be thrown out altogether. That was the line I attempted to walk.

How to live and takeaways

Once I finished teaching my students how to think philosophically, I embarked on telling my students how philosophers thought they should live. My theme for this segment was a quote I repeated throughout, from Rilke, “For here there is no place that does not see you. You must change your life.”

In other words, I wanted to introduce my students to philosophy that, if they accepted it, would change the way they chose to live their lives. Ethical philosophy now is often treated like the rest of philosophy, something to argue about but not something to change yourself over. In fact, surveys show that professors of ethics are no more ethical than the average person.

This is a damn shame. It doesn’t have to be this way, and it wasn’t always this way. In fact, it’s not even this way for philosophy outside of the academy today. I’ve personally been very affected by my study of existentialism and utilitarianism, and I know stoicism has been very impactful for many of my contemporaries.

That’s the experience I wanted for my students. I wanted them to engage critically with some major ethical philosophies. If they agreed with those ethical philosophies, I wanted them to be open to changing the way they acted. In fact, I specifically asked them to do so.

The ethical philosophers I covered were the ones that I felt were most interesting to engage with, and the most impactful to me and to thinkers I’ve respected.

First, I covered Stoicism. I asked my students to consider the somewhat questionable philosophical basis for it (seriously, it’s really weird if you look it up), but also consider the incredibly inspiring rhetoric for it. If Hume is right, and thoughts come from what you feel and are only justified by logic, then the prospect of controlling emotions is incredibly appealing. Even if he isn’t, any philosophy that can inspire both slaves and emperors to try to master themselves is worth knowing. Plus, the chance to quote Epictetus is hard to pass up.

I then covered Kant, as an almost polar opposite to Stoicism. Kant’s categorical imperative ethics is well reasoned and dry. You can reason through it logically and it’s interesting to argue about, but it’s about as far away from inspiring as you can get. Even the core idea: “Act only according to that maxim whereby you can, at the same time, will that it should become a universal law,” is uninspiring, and it’s very hard to imagine acting as if everything you did was something everyone else should do. As I asked my students, how do you decide who gets the crab legs at a buffet? But my students needed to decide for themselves: did they want to follow a logical system of ethics, or an inspiring one?

We then covered utilitarianism. This, as I’ve mentioned, is something I’m biased towards. My study of utilitarianism has changed my life. I donate far more to charity than most anyone I know because of the utilitarians’ arguments: keeping a hundred dollars per month more or less for me does not affect my life in the slightest, but it can make an incredible impact on someone less fortunate.

I presented two sides to utilitarianism: the reasoned, calm utilitarianism of Bentham, and the radical demands of Peter Singer. For Bentham, I asked my students to consider how they might apply utilitarianism to their institutions: can they really say if their institutions maximize pleasure and minimize pain? For Singer, I asked my students to consider utilitarianism in their lives: why didn’t they donate more to charity?

What I wanted my students to think about, more than anything, was how and if they should change the way they live. Ethical philosophy, as it was taught at Princeton, was largely designed to be left in the classroom (with the noted exception of Peter Singer’s class). Ethical philosophers today, having been steeped in this culture, likewise leave their ethics in their offices when they go home for the night. To me, that’s as silly as a physics professor going home and practicing astrology: if it’s not worth taking out of the classroom, it’s not worth putting into the classroom.

Finally, we covered existentialism. I knew my students, being in their late teens and early twenties, would fall in love with the existentialists. It’s hard not to. At that age, questions like the purpose of life and how to find a meaningful job start to resonate as students start running into those issues in their own lives. The existentialists were the only ones to meaningfully grapple with this.

My students came into this section with the tools to understand ethical philosophies. They came out of it with ideas that could alter the course of their lives, if they let them. That, to my mind, is what philosophy should be. Of course, we weren’t done yet. Now that they had frameworks to think about how their lives should be, I wanted them to think about how institutions should be.

How things should work and takeaways

Teaching political philosophy to Chinese students is interesting and complicated, but not necessarily for the reasons you’d expect. I wanted to teach my students the foundations of liberalism, and, when I first taught the course, I naively thought that I’d be starting from ground zero. I wasn’t. In fact, as I was informed, Chinese students go over Locke in high school, and often John Stuart Mill in college. They’re just thoroughly encouraged to keep that in the classroom.

So, my task wasn’t to introduce students to the foundations of liberalism. My task turned out to be the same as the impetus for this course: to make political philosophy relevant. 

This is actually tough. Almost all political philosophy is, frankly, so abstract as to be useless. While I was teaching the course, for instance, Donald Trump was rapidly ascending to the Presidency (and became President right before one of my classes, actually). Locke didn’t have a ton to say about that sort of thing.

But, instead of avoiding the tension in my attempt to make political philosophy relevant, I tried my best to exploit it. I roughly divided up the section into practical political philosophy and idealistic political philosophy. Plato and Locke were idealists; Machiavelli, the Federalists, and Hannah Arendt were practical.

When I discussed Plato and Locke, I wanted to discuss their ideas while making it clear they had zero idea how to bring them about. Plato, for his Republic, needed to lie to everyone about a massive eugenics policy. Locke, for his liberal ideals, came up with an idea of property that was guaranteed to please nobody with property. They’re nice ideas (ish), but their most profound impacts are just how people have used them as post-hoc justifications for existing ideals (e.g. the Americans with Locke’s natural rights).

I wanted my students to understand how Machiavelli exploited the nitty gritty of how ruling actually worked, and did so with a ton of examples (inductive reasoning). Even in his “idealistic” work, Discourses on Livy, he wrote his ideals with a detailed knowledge of what did and did not work in the kingdoms of his time.

For the Federalists, I discussed similarly how much more involved they were with property. The Federalists listed Locke as an influence, but they actually had to build a country. They wrote pages upon pages of justifications and details about taxes, because they knew the practicalities of “giving up a little property to secure the rest of it” often led to bloodshed.

Finally, I ended the section with a discussion of Hannah Arendt’s banality of evil. In a time of increasing authoritarianism around the world, I wanted my students to be aware of the parts of political philosophy that would immediately impact them. They were likely not to be rulers or princes, but they were likely to be asked to participate in an evil system if they entered politics (especially in the China of today). I wanted them to be acutely aware of the bureaucracy that made evil governments possible, and the mindset that could stop them.

My political philosophy section ended with the takeaways that politics can be analyzed with the same thinking tools as the rest of philosophy, and weighed with the same ethics of how to live. The minutiae are complicated, but it is not a new world.

Final takeaways

In the end, the fact that nothing is a new world was my intended takeaway from the entire course. Philosophy underpins everything. It is the grammar of thinking. Scientific experiments, legalistic arguments, detailed historical narratives: all of these methods of making sense of the world have their roots in philosophy and can be analyzed philosophically.

And, if everything can be analyzed philosophically, then you might as well start with your life and the society you live in. It’s not enough to just analyze, though. Philosophy should not be something to bring out in the classroom and then put away when you come home. If the way you live your life is philosophically wanting, change it. If your society is on the wrong course, fix it, even if you can only fix it a little.

There’s nothing worse in education than lessons that have no impact on the student. Likewise, there is no higher ideal for education than to permanently change the way a student evaluates the world. The classroom should not simply be a place of empty rhetoric or even entertainment. To paraphrase Rilke, “For there there should be no place that does not see you. You must change your life.”

[Once again, Google Drive link for all my course materials].

Lessons in business from the golden age of advertising

I previously wrote a post on lessons in marketing from the golden age of advertising in early 20th century America, which I think went pretty well. Unfortunately (but fortunately), there are more great stories from the admen than can be fit in such a restrictive format.

So, here’s my attempt at relaying them. For this post, I relied entirely on The Man Who Sold America, a biography of Albert Lasker by Cruikshank and Schultz, available at fine retailers near you.

If you don’t know how to offer something, ask for something instead

Claude Hopkins was one of the great advertising geniuses of his day. Unfortunately, he was somewhat promiscuous with how he lent his advertising genius, and ended up making a tremendous success out of “Liquozone”, which purported to be a germicide made out of liquid oxygen.

When muckrakers revealed that Liquozone was not pure oxygen but instead just water, Claude Hopkins was disgraced. This left him unhappy and also literally a millionaire.

Albert Lasker wanted to offer Hopkins a job at Lord & Thomas, but didn’t know how to go about doing so (as obviously money wasn’t going to be enough). So he asked around, and found out from a mutual friend that Hopkins was quiet, sensitive, and stingy.

So Lasker came upon a solution. He found out that Hopkins had been reluctant to buy his wife a new electric automobile, as he thought they were too expensive. He arranged a lunch with Hopkins, and showed him a contract from Van Camp for $400,000 contingent on satisfactory copy.

He told Hopkins that he needed his help for the contract, as the copy that he had received from his employees was terrible. If Hopkins would agree to help him, Lasker would buy his wife an electric car as thanks. Hopkins agreed.

The rest was history. Lasker knew he couldn’t offer Hopkins anything he didn’t already have. The only thing he could do was ask.

Experts get clients

Any service business is perpetually concerned with how to get new clients. Advertising is no exception.

One of the best ways to get clients is to be seen as an expert in the field. In the age of the Internet, the best way to do so is to publish a blog, vlog, or Twitter feed.

Back in the golden age of advertising, it was not quite so easy. So, instead, Lasker put out an ad announcing the creation of an “advertising advisory board”. The ad read: “Here we decide what is possible and what is impossible, so far as men can. This advice is free. We invite you to submit your problems. Get the combined judgment of these able men on your article and its possibilities. Tell them what you desire, and let them tell you if it can probably be accomplished.”

Of course, the advisory board was entirely made up of Lord & Thomas employees. But this ad worked: they got hundreds of inquiries, rejected the 95% they didn’t want, and took the top 5% as clients.

How to end a partnership with everyone happy-ish

Lasker ended quite a few partnerships over his business career, and he always did so in the same way.

He’d tell his partner, “I’ll buy out your share for 2x, or you can buy out my share for x.”

While it’d still be clear that Lasker wanted to break up, at least people wouldn’t be quite so unhappy about it.

Using a partner to double-team a client

Lasker and Hopkins made a great team. They were excellent at putting the razzle-dazzle on a client.

This started during the introduction. When a Lord & Thomas solicitor would first introduce themselves to a client, they’d speak glowingly of the genius of Albert Lasker. If the client visited the office, Lasker would speak glowingly of the wizardry of Hopkins. By the time the client was pitched by Hopkins, they’d feel like they were getting pitched by the god of marketing himself.

This double-teaming would continue during the pitch. Hopkins would pitch the campaign, and Lasker would remain quiet. If the client disagreed with any part of the pitch, Lasker would automatically side with the client and ask Hopkins to argue his case.

Then, if Lasker actually agreed with the client, he’d say so, and Hopkins would back down. On the other hand, if Lasker actually agreed with Hopkins, Lasker would turn to the client and say, “Well, I guess we’re both wrong.”

This way, the client never felt like they were being sold. Instead, they felt like it was a collaborative process by really smart people who just wanted the best for their product.

Using the techniques of the Mad Men to market a SaaS

I love reading about advertising in the early 20th century. It’s one of my favorite subjects, and it’s way too often neglected by the Internet at large.

Part of it is just a fascination with the ambition of those early admen. It was truly fitting for that age of great works. While bridges and skyscrapers were being built at breakneck speed (sometimes literally), the admen were similarly constructing cultural edifices out of whole cloth, deciding almost singlehandedly what constituted a proper lunch, marriage proposal, or even morning ritual.

So much of what’s “normal” for us today was decided by executives on Madison Avenue 70 or 80 years ago that looking at that history is like getting a front-row seat at the creation of the modern world.

Another part of my fascination, however, is more practical. As I’ve studied these admen, I’ve found that their struggles and techniques for overcoming them have a resonance with my attempts to sell on the Internet today (i.e. my flashcard app for tutors and test-takers). 

Early 20th century American capitalism was a no-holds-barred affair. Regulations were weak and a lot of the legal protections we take for granted today were nonexistent. Anyone could make any claim about their product, everyone was competing for everyone’s attention, and consumers weren’t particularly loyal to one brand or another. If you wanted someone to buy your soap, cereal, or toothpaste, you had to go out there and sell them on it.

Kind of similar to the Internet today, no? My flashcard app is, likewise, one of many. My app could easily get copied, slandered, or slated, and there’s not a lot I can do besides compete even more fiercely. I desperately want customers for it, but I know I will have to work incredibly hard for them.

The admen that survived and thrived in this environment were tough, resourceful, and occasionally unethical. More than anything, though, they were systematic, relentlessly experimenting to find effective marketing and copywriting techniques.

Ironically, just as their advertising campaigns passed into culture, so too have their innovative advertising techniques become part of the advertising landscape. It’s easy to not realize how innovative these techniques were until you look at the before and after, like how a plain description of pork and beans became a prime example of need-focused marketing.

This Van Camp ad is from 1897. Notice the focus: it’s just trying to get you to remember the name of the product, and then essentially bribing you to participate in a contest. The picture has almost nothing to do with what they’re trying to sell.

This Van Camp ad is from 1937. Look at the difference! The headline is telling you why Van Camp’s Pork and Beans will help you, the housewife: it’s a timesaver during the holidays. The picture is still eye-catching, but it relates directly to what they’re trying to sell.

Like most of the readers of this essay, though, I never actually learned this stuff. I had to learn all my marketing the hard way, by failing to sell things. Finding out the actual stories and laws of “scientific advertising” (itself a clever marketing phrase from that era) has made a huge difference in my ability to market myself and my products.

This is doubly so because my advertising looks a lot more like these old magazine ads than like modern television or magazine advertising. I employ hero images, headlines, and long copy to convince people to buy my products. I do not have the funds or resources to run a branding campaign, so these direct response campaigns are my main focus.

I’d like to share with you all the lessons that I learned from these early “Mad Men”. Hopefully, they’ll prove as useful to your marketing as they were to mine.

Before I start, this piece relies heavily on the work of Albert Lasker and his crew at Lord & Thomas: Claude Hopkins, John E. Kennedy, and the like. This is not only because these men were behind some of the greatest advertising successes of their age (Sunkist, Kotex, Puffed Wheat, the election of Warren G. Harding), but also because of convenience: they wrote a lot about what they did. In fact, Claude Hopkins literally wrote a book called Scientific Advertising, which I relied on for this piece along with Cruikshank and Schultz’s biography of Albert Lasker, The Man Who Sold America.

Lesson 1: Measure campaigns (and measure sales especially).

The most important part of direct response copywriting is measuring the direct responses. Ideally, you measure all the way from first contact to point of sale.

For the early admen, this was tough. When they were just putting ads in magazines, they had literally no idea which ads were working and which ones weren’t. That’s why they made really heavy use of coupons. 

For instance, in their campaign to turn Palmolive into the international juggernaut that it still is, they gave out coupons for 10 cent cakes of Palmolive. From there, they could measure the cost of the ad vs. the revenue. For instance, in Cleveland, their coupons cost $2000, their ads cost $1000, but sales increased from a base rate of $3000 per year to $20,000 per year after the campaign. Given that their ad campaign only ran in Cleveland and they had no other ads running at that time, they could tell that their ad campaign returned $14,000 on top of what they would have earned without the campaign (for details about this incredibly clever ad campaign, see footnote [1]).
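The Cleveland arithmetic is easy to spell out. The snippet below is only a sketch using the figures quoted above; the variable and function names are made up for illustration, and it assumes (as the admen did) that the entire jump in sales can be credited to the campaign.

```python
# Figures quoted above for the Cleveland Palmolive test, in dollars.
coupon_cost = 2_000
ad_cost = 1_000
baseline_sales = 3_000    # sales per year before the campaign
campaign_sales = 20_000   # sales per year after the campaign


def net_return(baseline, after, *costs):
    """Incremental sales credited to the campaign, minus what it cost.

    Assumes nothing else changed, so the whole increase is the campaign's.
    """
    incremental = after - baseline
    return incremental - sum(costs)


campaign_net = net_return(baseline_sales, campaign_sales, coupon_cost, ad_cost)
print(campaign_net)  # 14000
```

In other words, sales rose by $17,000, the campaign cost $3,000, and the difference is the $14,000 return.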

I can’t find the original Palmolive ad, but this is an ad for Pepsodent, another big Lord & Thomas success. Notice the free 10 day tube: they’d measure the number of responses to the ad to see the success of their copywriting. Given that this was apparently in the Canadian Home Journal, I’m guessing this was a test campaign in Canada to see if this was an effective ad for the region.

Measuring is way easier in the Internet era, of course. With Google AdWords, segmentation tools, and Google Analytics it’s easy to see the result of a campaign. It’s shocking, then, that more Internet companies don’t measure their advertising, or worse, claim not to care. As Claude Hopkins would put it, “The only purpose of advertising is to make sales. It is profitable or unprofitable according to its actual sales.”

For my app, I care most about the funnel: from first click on my website, to trying out the app, to purchasing. Ironically, articles like this one aren’t actually great from a marketing standpoint (although good investments in SEO), as the people who will read this article are almost certainly not the same people who would purchase a flashcard app. Hopkins would accuse me of being more interested in artistry than sales, which he would be correct about. Oh, well.
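As a rough sketch of what measuring a funnel like that involves, the snippet below computes the conversion rate at each step and overall. The counts are hypothetical, invented purely for illustration, and the stage names simply mirror the three steps described above.

```python
# Hypothetical counts for each funnel stage (invented for illustration).
funnel = [
    ("first click", 1_000),
    ("tried the app", 200),
    ("purchased", 20),
]


def step_conversions(stages):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty stage so we never divide by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


for prev_name, name, rate in step_conversions(funnel):
    print(f"{prev_name} -> {name}: {rate:.1%}")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1%}")
```

The step where the rate drops the most is the one worth rewriting copy for first.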

Lesson 2: Samples and contests work, but only if people have to work for them

The admen at Lord & Thomas loved their free samples, coupons, and contests. However, they also quickly learned that you need to make people work for them.

For example, when they were first advertising puffed rice and puffed wheat (with the immortal tagline “Food shot from guns”), they made the mistake of distributing samples “promiscuously, like waifs”, which were quickly discarded by the consumers. Similarly, they mistakenly attempted to offer puffed wheat for free to anybody who bought puffed rice. As Lasker later put it, “it is just as hard to sell at a half price as at a full price to people who are not converted.” 

Instead, the effective method was to make people work for it. If they wanted a free sample, they had to write a letter. If they wanted to guess the weight of the world’s biggest cake and win a prize, they had to buy a tin of Cotosuet, the lard substitute the cake was made with.

This is a Sunkist orange ad from their “Drink an Orange” campaign. Lord & Thomas was given the task of promoting California Fruit Growers Exchange oranges, which they did incredibly well under the brand name Sunkist. One of their ways was by promoting orange juice (which took a lot of oranges to make). They literally manufactured and sold extractors for this promotion, making a small profit on the extractors and a large profit on the oranges.

It’s a similar story today. Many websites do giveaways, especially venture-backed ones. It’s tempting as a way to juice sales or responses. But, it’s important to make your consumer work for the giveaway. If they don’t believe in the product enough to work for the giveaway, it’s a waste of time to give it to them.

This is something I’ve noticed as well. Whenever I just give away something (like a book, a video, or advice), I get poor quality responses and a lot of hassle. In other words, people treat the giveaway as worthless. But, even if I just require people to agree to sign up for my newsletter, people end up treating the giveaway with more care.

Lesson 3: Your headline needs to capture the most exciting part of your product.

While before we were talking about general marketing advice, now we can talk about copywriting specifically (like for your landing page).

It’s very common for people marketing on the Internet to feel like their headline needs to either reflect exactly what their product is, or their aspirations. That’s how you get generic headlines like “Connect with friends and the world around you on Facebook. ” or “A new way to manage your work” (from

Part of the problem, of course, is that space on the Internet is unlimited, and products on the Internet don’t face the intense pressure from competition that the products these admen were advertising did. If you’re advertising a bar of soap, your headline has to be really attention-grabbing, because honestly soaps are pretty similar. If you’re advertising Facebook, not only do you get a ridiculous amount of free advertising and brand awareness, you’re also advertising a uniquely sticky product that seems to sell itself pretty darn well.

The admen didn’t have that luxury. They spent countless hours debating on the perfect headline, including punctuation and capitalization. Their headlines were attention grabbers: they made you want to read the ad and, crucially, buy the product (or send off for the sample).

From 1912: the legendary “food shot from guns” headline. Claude Hopkins admitted it was stupid, but also recognized that it was instantly a classic. The other thing to note here is another Lord & Thomas staple: selling by creating a personality. Professor Anderson was just a technician working for Quaker Oats, but Hopkins made him a star.

So, they ended up with classic headlines like “A cow in your pantry” for evaporated milk, the aforementioned “food shot from guns” for puffed wheat and rice (it has to do with the manufacturing process), “Let this Machine do your Washing Free” [sic] for a hand-cranked washing machine, “Better than jewels — that schoolgirl complexion” for Palmolive “beauty soap”, and “Women like the convenience of Kotex” for the hitherto impossible-to-advertise sanitary pads.

None of these were exact descriptions of what was being sold, but they were exciting! Some of them were definitely oversold, like describing a hand-cranked washing machine like it was automatic, but they grabbed the reader’s attention and sold the product. The Palmolive headline is especially interesting: it doesn’t even sell the product, but the benefits of the product.

Similarly, if you want to sell products online, your headline needs to be something that captures the most exciting part of your product. I think Slack has an interesting take on this: “Slack replaces email inside your company.” For companies that have been drowning in email, it’s an appealing proposition.

Note that this headline does not actually describe what Slack does in any meaningful sense. This headline would also work for a personal whiteboard company (if they were really ambitious about the possibilities). But it addresses a specific problem and makes the reader want to find out more.

For my own app, I’ve decided on “Say goodbye to unproductive studying”. While it’s not quite as good as Lord & Thomas’s, it captures the benefits of my app pretty well, while still remaining intriguing.

Lesson 4: the core of copywriting is self-interest

This is something Claude Hopkins hammered home again and again. Copywriting (and marketing) should not be about winning awards or making art. At the end of the day, your copywriting is about communicating to the consumer why it benefits them to buy your product.

Too often copywriters think they have to win awards with their copywriting or entertain the reader. That’s nice, but that’s not what you’re aiming for. Your copywriting doesn’t need to be hilarious or heartwarming, it just needs to tell the reader why the product is to their benefit.

There’s another corollary to this, too: as long as you’re telling the reader why a product is to their benefit, it doesn’t matter how much you write. Again, going back to Claude Hopkins, “Some say, ‘Be but very brief. People will read but little.’ Would you tell that to a salesman?”

If you look at the early ads, like for Palmolive, it’s amazing how much text there is. This amount of text actually cost the advertisers more money, but they wrote as much as they needed to sell the product.

This ad is, of course, gorgeous, but it also has an incredible amount of text. It’d be easy to say, “Why would anyone want to read that much about soap?”, but these are the ads that took Palmolive to #1 in its category (and made the manufacturer change its name from BJ Johnson to Palmolive).

Nowadays, a lot of effective advertising has been replaced by video, which is even closer to salesmanship than longform text. Slack, for instance, puts a 2 and a half minute video on their homepage, along with around 100 words of text. If you look at someone like Tai Lopez, who is undoubtedly successful as a self-marketer (no comment on anything else), his videos selling his courses run over an hour in length.

There are still some practitioners of the art of the longform text sales page on the Internet, though. Brennan Dunn at Double Your Freelancing Rate does exactly that, as do a lot of the skeezy get-rich-quick guys who advertise on Facebook. I can’t say if their products perform as they promise, but their marketing obviously works, or else they wouldn’t keep running ads.

In your own marketing, don’t be afraid to write more and talk more. As long as your marketing keeps speaking to your customers’ self-interest (i.e. why your product will benefit them), they will keep reading. Or, as John E. Kennedy put it, “Copywriting is salesmanship in print.” As long as you’re still selling, keep talking.

My own landing page runs several hundred words in length. I’ve continually lengthened it over the months that it’s been up, and had good success with it. I’ve refined it based on the in-person communication I’ve had with people about my app, and what’s worked successfully in selling it. What works to sell in person also works to sell online.

Lesson 5: Marketing a product is about making the features seem extraordinary

There’s a classic Mad Men scene where Don Draper first figures out how to market Lucky Strike cigarettes: it’s toasted. The tobacco manufacturers are confused, because “everybody else’s tobacco is toasted”. Don Draper corrects them, “Everyone else’s tobacco is poisonous. Lucky Strike is toasted.”

This is an ad campaign that actually ran, masterminded by none other than Albert Lasker and his Lord & Thomas gang. The timeline in Mad Men is a bit messed up: that ad campaign was run well before the cancer scare (it started in 1917), and wasn’t actually the most successful of the Lucky Strike campaigns (we’ll talk about that later).

But the core of the scene is correct. Making features seem extraordinary even when they were commonplace was a key technique of Lord & Thomas. For example, that’s exactly what Claude Hopkins did to sell Schlitz Beer. He created an advertising campaign called “poor beer vs. pure beer”, and claimed Schlitz’s “pure beer” was good for you because it was made with pure water, filtered air, and 4-times washed bottles. This was true of all beers, but as Hopkins put it, “Again and again I have told common facts, common to all makers in the line–too common to be told. But they have given the article first allied with them an exclusive and lasting prestige.” As a result, Schlitz went from 5th in sales to 2nd in sales in St. Louis, where the test campaign was run.

This is a later ad, from 1959. It relies much more heavily on color and images than the early copywriting. But it still uses the classic formula: take something ordinary (“slow distilling”) and make it extraordinary. In case you haven’t guessed, “slow distilling” is very common among bourbons.

This technique is even more effective, however, when there is actually a cool new feature (and if you’re not willing to lie and say that a certain sort of beer is healthier for you). So, for instance, Lord & Thomas had a really hard time marketing Pepsodent, because it was pretty much identical to all other toothpastes. However, they had a lucky break when Pepsodent purchased the exclusive right to add sodium lauryl sulfate to their toothpaste, which is the ingredient that makes toothpaste foam.

So, how did Lord & Thomas capitalize on this lucky break? By calling out this new feature in the most ostentatious way possible. They couldn’t possibly call the new ingredient “Sodium Lauryl Sulfate”, so instead Lasker put out a challenge to Lord & Thomas employees: he wanted a 5 letter name with two consonants and three vowels. His employees came up with “irium”, which sounded suitably futuristic.

Then Lord & Thomas went buckwild. They rebranded Pepsodent as “Pepsodent with irium”. Having already been bankrolling Amos ’n’ Andy, the most popular show on radio, they flooded the airwaves with ads for new “Pepsodent with irium”. Pepsodent jumped to the top of toothpaste sales. And, to the delight of Albert Lasker, irium became so famous that the American Dental Association had to hire a receptionist specifically to talk to all the people who were calling in and asking about it.

In your own marketing, call out every single feature in your product and make it sound extraordinary. Describe it in loving detail and make up a name for it if you have to.

And remember: your customer is going to come to your product with a different view of what’s interesting than you are. My own product is a flashcard web app aimed at tutors and test-takers. Because it’s a web app, you can access it through your desktop or mobile browser. Duh, right?

Well, no. Pretty much every tutor I’ve talked to has been surprised and impressed to hear that. What is ordinary to me (and likely to you) is extraordinary to your customers. Treat it that way.

Lesson 6: The difference between a bug and a feature can be a matter of positioning

Sometimes bugs are just bugs. If you have an app that randomly shuts down at inopportune moments, that’s just something to be fixed.

However, sometimes bugs can be features for some users, as this immortal xkcd comic reminds us. In your marketing, you can make use of this.

This is what Lord & Thomas did when selling Van Camp evaporated milk. They first got housewives to try it by advertising it as “a cow in your pantry”. However, they couldn’t get housewives to stick with it, because it tasted scalded (which, to be fair, it was). In other words, they didn’t have product-market fit, and advertising more was just going to burn through potential customers.

So they positioned it differently. They called it “sterilized milk”, and told housewives “you’ll know it’s genuine by its almond flavor”. The bug, bad taste, was transformed into a mark of cleanliness. In an era before strong food controls, that was a good marketing technique.

In your own marketing, you can do similar things. If you run a chat app, for instance, one of the main problems is that people feel such pressure to respond instantly that it totally distracts them from getting their work done. So, in your marketing, why not include a testimonial that suggests email is just too darn slow to have a proper conversation?

In other words, if your weaknesses are impossible to hide, might as well turn them into a strength.

Lesson 7: Fit your marketing in with existing habits

One of the first things Albert Lasker realized while helping to invent the modern marketing industry was that changing people’s habits with marketing was almost never profitable. The trouble was that you’d have to spend a lot of effort changing people’s habits, and then spend even more effort getting them to buy your product.

The better way was to identify a habit people already had, and exploit and popularize it with your marketing. People don’t want to just buy your product, they want to use it. If you tell them that they can use it to help them do what they already want to do, they’re a lot more likely to buy it.

Lord & Thomas exploited this constantly when they were tasked to sell fruit. They would promote recipes with the fruit, ways to use the fruit, and even, in the case of Sunkist oranges, manufactured juicers that were easier to use than the ones on the market to get people to make more orange juice.

However, their biggest success was with Lucky Strike cigarettes. They were successful with the “It’s toasted” campaign, but they still weren’t top of the market. The top of the market came when Lasker heard that doctors were prescribing cigarettes as appetite suppressants. So, he came up with the campaign, “Reach for a Lucky instead of a sweet.” 

This was a classic case of exploiting existing habits. Lasker didn’t need to convince anyone that cigarettes worked as an appetite suppressant. He just needed to convince people to use a Lucky as a suppressant instead of another brand. This was so successful that sales increased by 8.3 billion units from 1928 to 1929, then reached 10 billion units in 1930.

In fact, this advertising campaign ended up being banned by the FTC because of complaints from candy manufacturers, so it was changed to just “Reach for a Lucky Instead”. At that point, the catchphrase was so ingrained that it still worked.

This ad campaign was so successful that it was banned. That is a heck of an ad campaign. Note how it still has “It’s Toasted” in the bottom right, and a sexy woman up top. Some advertising techniques don’t need to be explained.

The best products, likewise, are positioned to fit in with existing habits. Slack couldn’t work as an email killer if people hadn’t already been using Gchat. Excel couldn’t work as well as it did if people hadn’t already been familiar with spreadsheets, and especially with Lotus 1-2-3. Every wildly successful product has been built on helping people do what they’re already doing, but easier.

In your own marketing, find what people are already doing and convince them that your product can help them do it easier and better. Don’t try to make them do something that they’re not already inclined to do.

In my own marketing, I’ve relied heavily on the existing popularity of spaced repetition flashcards like Anki and Quizlet. I have to convince users that my app is better than those, but not that the app is worthwhile at all.


You don’t have to reinvent the wheel. Marketing and selling have existed for hundreds of years, while modern marketing has existed for dozens.

The products being sold change, but the selling remains the same. Convince your customer that your product will benefit them, and they will listen to you and buy from you.


[1] The Palmolive campaign was actually incredibly clever. They needed to get Palmolive distributed in drug stores as a “beauty soap”, so it wouldn’t be competing against normal soap in general stores. In order to do so, they employed a carrot and a stick. 

The carrot was that the copy of the ad claimed the drug stores were distributing essentially free samples of Palmolive (they paid 10 cents for them and the consumer was using a 10 cent coupon for them). However, in reality, the drug stores paid wholesale prices for the sample, and were reimbursed at retail prices by the manufacturer. The drug stores got the publicity of pretending to do a giveaway, while actually profiting.

The stick was that, before the coupons were sent out to consumers, they were sent out to all the drug stores in the area, along with a message that the coupons would surely be redeemed somewhere. The not so hidden message: if you don’t stock Palmolive yourself, be prepared to watch your customers form a line out the door of your competitors.

Why introductory chemistry is boring: a long-term historical perspective

Looking for flashcards for chemistry? Try 21st Night! It allows you to create and share flashcards with anything on them: images, videos, equations, you name it. Plus, there’s a built-in to-do list!

We are better at chemistry now than at any other point in history. In 50 years, we will almost certainly be better at chemistry than we are now.

This statement above is one of the central dogmas of not only chemistry, but of science. In fact, it’s so central that it probably just seems factual. Of course we’re better now, right? We have the benefit of history!

It’s easy to forget or not realize how controversial this statement would have been even 400 years ago, though. Scholars then were raised in a culture that worshipped the achievements of their intellectual ancestors. Claiming that you knew more than Aristotle about science was equivalent to claiming you knew more than Paul about Christianity. I mean this literally. Galileo was famously imprisoned not simply for disagreeing with the church, but also for disagreeing with the words of the “Philosopher”, Aristotle. Aristotle, if you’re unfamiliar, lived approximately 2000 years before Galileo.

I am glad that we no longer live in that era. That was a worse time for science and a worse time for humanity. On a moral level, I abhor the idea that intellectual opinions can be censured by imprisonment or death. On an intellectual level, I think an obsession with the achievements of the past necessarily inhibits recognizing advancements in the present, and leads to stagnation.

But the pendulum has swung too far the other way. Specifically, I think the “obviousness” of the inevitability of scientific progress, even though it’s well-founded, tends to cloak that the progress of the associated scientific culture is by no means inevitable. For instance, there have been some recent great essays arguing that our research culture has regressed, at least judging by the number of people involved versus our rate of scientific advancements.

What I’d like to argue in this essay is that, similarly, the obviousness of progress in chemistry has masked the regression of the associated culture of teaching chemistry, especially general chemistry.

It’s regressed in the sense that students today come out of general chemistry with a mistaken view of what chemistry is, and lacking in practical chemistry skills. In the past, students have come out of general chemistry with a better idea of the principles of chemistry, and with greater practical skills.

A big reason why today’s students are mistaken in their view of chemistry can be seen in the texts we use. In modern chemistry textbooks, very few portions are justified by empirical evidence. They are justified by principles and deduction, and occasionally illustrated by real-world examples. But, it is incredibly rare to see citations from the original laboratory work that proved the theories.

Meanwhile, the practical side of chemistry (as well as the emphasis of the importance of it) gets let down by the hands-on portion of chemistry, laboratory work. Laboratory work is meant to be training in empiricism. However, this empiricism is only in theory. The laboratory work of introductory chemistry is closer to a cookbook than an experiment. Students are given incredibly precise instructions and told to find a certain result. If they fail to find the result, they are made to redo the “experiment”.

The textbook and laboratory work combined leave a general chemistry student today with two takeaway impressions:

1. Chemistry is a monumental edifice of theory and deduction, disconnected from empirical evidence.

2. When empirical evidence does conflict with chemical theory, the evidence must change.

These takeaway impressions are regressive. These would be the explicit lessons of Aristotelian chemistry, which distrusted experimentation except as illustration. When the ancient Greeks discussed chemistry in terms of the four elements of fire, air, water, and earth, they would illustrate the similarity of fire and air by discussing candles. However, it was literally anathema to the Greeks to disprove the theory of the elements by experiment.

So, if chemistry teaching has regressed, then we need to return it to its former glory by progressing it again, right?

Before our current shoddy state, chemistry teaching was much better at impressing upon students the empirical basis of chemistry. Even more importantly, it impressed upon students the importance of doing chemistry, rather than simply learning it and keeping your lecture notes in a notebook on a shelf. That is the standard of teaching I’d like to go back to, which I’ll explore more later on in this essay.

Before I get there, though, I want to explore how we’ve gotten to where we are, starting from the beginning of chemistry teaching. It would behoove us to be mindful of our history, especially as we’ve already started to repeat it.

But before even that, I wanted to get a few things out of the way.

First of all, my qualifications: I took chemistry in high school and college, both general and geochemistry (I was a geosciences major). I found general chemistry very boring, geochemistry less so. Since college, I’ve worked as a philosophy teacher and own a test-prep business. While I’m not the most qualified to discuss chemistry, I am very qualified to discuss education, and would like to frame this essay as such.

Second, I wanted to define chemistry as the science of chemicals, and the teaching of chemistry as teaching what to do with chemicals. The early years of chemistry had a lot of overlap with philosophy, and more recently chemistry has had a lot of overlap with physics. I’d rather not discuss those right now, so my historical tour is limited to chemicals only.

Finally, it can be a bit tough to get information on what the live teaching of chemistry was like (there weren’t reporters), so I have to rely a lot on textbooks and surviving notes.

With that in mind, I’d like to go back to the first chemistry textbook, and the first hints of tension between students learning chemistry as a theory vs. students learning chemistry as an empirical science. It was when the teaching of chemistry was still in progress…


The first chemistry textbook was, well, a weird one. 500 years ago, chemistry wasn’t a subject. Really, there weren’t many distinct subjects at all, as most things science-like were subsumed under the broad category of “natural philosophy”.

This wasn’t just a nomenclature thing. Because so many subjects were under one umbrella, the techniques and even boundaries of chemistry were shared with a bunch of other subjects. So, we are forced to rely on the definition of chemistry mentioned previously, and define the first chemistry textbook as “the first book that systematically discussed and taught how to use chemicals”.

Using that definition, we find ourselves in the mid 1500s in Bavaria with Georgius Agricola’s De re metallica.

For those of you who are familiar with Latin, you might already see the problem. The title of the book translates to “On the Nature of Metals”, and it was mostly about the operation of mines. This included political instruction, like “Don’t start mines in authoritarian states”, and practical engineering details, like how to make sure your mineshafts don’t collapse.

Crucially for chemistry, though, it also included how to get your ores out of your minerals. Whenever a mineral is mined, it doesn’t come in a pure form. It almost always comes in the form of a weird rock.

So, how do you get usable minerals out of a weird rock? With chemistry! Or, as Agricola would put it, the manufacture and use of “juices”, like alum (a salt), “vitriol” (sulfuric acid), and sulfur. 

Agricola, in his instructional work on mining, therefore also taught chemistry. He didn’t teach chemistry in any traditional way. He was really just a reporter. He reported what the miners did (and included diagrams), without explanation or theory. He didn’t even use units. He would just say things like “leave vitriolous stones out in summer and winter” until they “get soft”.

From a section on how to make sulfuric acid, or “vitriol”. Note how the instructions are very general, and are almost certainly repeated directly from the miners that Agricola consulted.

But it is precisely this approach to teaching chemistry that made Agricola a seminal figure in the teaching of chemistry. Before Agricola, chemistry (or, to be precise, the parts of natural philosophy that dealt with chemicals) was entirely theoretical. Evidence had zero impact on theory, and theory was deductive and based on intuition and first principles.

That’s why Aristotle’s chemistry was based on the four elements of air, fire, water, and earth. It made sense to him, and then he could create all sorts of fun diagrams. Then students were taught to memorize them, and replicate them using deduction. By the time we get to chemical “equations” like dry + hot = fire, we’ve created an entire theory which doesn’t help us understand the actual reactions at all.

This is literally what Aristotelian chemical diagrams looked like, although this is prettier than most. They were useless. From

Agricola just ignored all of it, despite the fact that it was considered the foremost science of his time and something he almost certainly learned. There is nothing in Aristotle’s chemistry that could explain why the miners’ processes to create sulfuric acid worked. Any attempt to use Aristotle’s theory to do so would just be misleading.

So, if we think about what chemistry students would take away from this work, we can say their takeaway would be something like “In order to make use of chemicals, you have to follow a process that is known to work.”

It’s not a great foundation to science, but it’s a start. It’s a place to build from, and, more importantly, puts the use of chemicals at the forefront of chemistry. But it’s a long ways away from good chemistry teaching.

On a side note, I read De re metallica in an English translation, translated by none other than Herbert Hoover (former President of the US and mining engineer) and his wife. Hoover was an absolutely fascinating and incredibly impressive person, and it is really a pity that he screwed up handling the Great Depression so badly as President.


The next big innovation in chemistry teaching came from an unlikely source: yet more theorists. This time, not the Aristotelians, but the alchemists. Yes, those alchemists, of eternal life and transmutation into gold renown.

Alchemy had been around for a while, but, by the 1500s, it was the rage among intellectuals. Even Newton was obsessed with it. Now, the theoretical foundations of alchemy were total nonsense, much like the theoretical foundations of Aristotelian science. For example, Paracelsus, one of the founding fathers of Renaissance alchemy, believed that all diseases came from sulfur, salt, and mercury, which he “proved” by burning wood. It’s hard to imagine being more wrong, no?

But the alchemists, despite their foundations being flawed, did have a distinct advantage over the Aristotelians in both their chemistry and chemistry teaching. The alchemists were meticulous experimentalists, precise and excellent at record-keeping. That meticulousness and desire to repeat experiments is exactly why they were responsible for the next big innovation in chemistry teaching. They just needed to be bold enough to throw out theory entirely.

That is precisely what Jean Beguin, a French alchemist, did. In 1610, he published a chemistry textbook that represented the best of the alchemists, called the Tyrocinium Chymicum, or “Beginner’s Chemistry”. It was a guide to manufacturing pharmaceuticals (there was a lot of overlap between medicine and alchemy, as the Paracelsus example shows), and it was a big step forwards in that it was the first chemistry textbook to actually use units (hurray!).

Well, Beguin didn’t exactly publish his textbook. He actually shared it privately with his students. Then it was pirated, became incredibly popular, and Beguin improved and republished it under the idea that, if it was going to be popular, it might as well be good. This does raise the alternative topic of the connection between piracy and improved scientific education, but that can be tabled for now.

Tyrocinium Chymicum represented the best of the alchemists because it is bold. It takes the experimentation of the alchemists and throws out the rest.

At this time, this was not an easy thing to do. Beguin emphasizes in his preface that he has nothing but respect for alchemy, and for Galen, Hippocrates, and the whole Greek tradition of science. He also emphasizes that, because it’s obvious he respects the whole tradition, he requests that other alchemists stop getting so upset with him. The subtext, of course, is that he would especially appreciate it if they didn’t somehow get the church involved, as he didn’t wish to be Galileo-d.

After emphasizing his respect for alchemy and the Greek tradition in his preface, Beguin proceeds to not discuss it for the rest of the book. The entire rest of the book is just incredibly detailed cataloguing of how to produce various pharmaceuticals. And, unlike Agricola, Beguin’s alchemical background let him be very, very detailed. In Tyrocinium, there are never any descriptions like “put rocks in a ditch and wait”. There are weights, instruments, and a lot of precise cataloguing of inputs and outputs.

An excerpt from Tyrocinium. Unfortunately, I could only find it in the original Latin. Even Google Translate has trouble, but you can pick out terms like “destillatio” (distilled), “alembici” (alembic), etc. This looks a lot more like chemistry instructions, no? From

The fact that Beguin was willing to throw out theory meant that this was actually a very effective way to teach how to use chemicals. For example, Beguin had no idea why soaking oak and filtering it produced a liquid that could dissolve pearls. That would require him knowing what tannic acid was, and then understanding how it related to pearls (calcium carbonate). If he tried to understand it using alchemical theory, he’d get confused.

But, he didn’t understand it, and he didn’t really care why he didn’t. He just described it. Exhaustively. In minute detail. Making sure not to forget literally any step even if he had no idea if it was important. 

He does, unfortunately, prescribe literally everything he creates for some ailment or another. So, for instance, he prescribes putting ammonium hydrosulfide directly into open wounds (pg 28), which, if you’re unfamiliar, is the active ingredient in stink bombs. He was a better chemist than physician.

Tyrocinium Chymicum is not a fun read, even when you compare it to other alchemy books. But it’s an incredibly important text in chemistry. It is a book by someone who experimented excessively with chemicals, laboriously detailed everything he found, and then instructed others so they could do the same, even under fear of censure, imprisonment, or death. It also contains the first glimpse of chemical equations in its excessive cataloguing, the first big step in useful chemical theory.

The equation for stibnite as found in Tyrocinium, translated and simplified by Patterson, a chemical historian in 1937. From sci-hub, page 45.

So, if we think back to the idea of takeaways by chemistry students, we might get something from Tyrocinium like, “Chemistry is about following steps meticulously, using the proper instruments, and exhaustively recording inputs and outputs”. Funnily enough, this isn’t so different from our chemistry labs today.

So, we’re at the 1600s, and we’ve already reached the state of laboratory work today in general chemistry. If that’s how far we’ve regressed, then it is worthwhile to see what we’ve given up.


Human beings are natural categorizers, and we’re good at it. It’s easy to mistake that for being good at science. Aristotle certainly did, categorizing elements as “hot” or “vaporous”, fooling himself and much of the Western world (up to the mid 1600s!) into not noticing that the categories said more about man’s perception than the elements themselves.

By the mid 1700s, categorization had come roaring back into chemistry teaching. This time, at least, it was backed up by the rigorous experimentation exemplified by Beguin. Unfortunately, it still came with the baggage of theory, although it fortunately no longer got in the way of experimentation quite so much. For instance, it was much more unlikely that you’d get thrown in jail for your experimental findings disagreeing with theory.

The year was 1766, and our hero was Joseph Black, a brilliant scholar and professor in the midst of the Scottish Enlightenment. The Scottish Enlightenment, if you’re unfamiliar, was one of those strange moments in time when all of the world’s most influential thinkers were, for some reason, in one place. In this case, it was Edinburgh.

While we don’t have any textbooks from that time, we do have copies of lecture notes that people took in Black’s classes. Black was very popular as a lecturer, and people would actually attend just for fun. A big part of this was his innovative use of flashy experiments to illustrate his points. 

This is more important than it seems. Black’s lectures, by focusing around experiments, made it clear to the students that chemistry, in the end, was about describing reality. The lectures helped make sense of what the demonstrations showed. Students imbibed the importance of empirical evidence.

Another big innovation of Black’s time (if not Black himself) was, as previously mentioned, useful categorization. Agricola and Beguin threw out the categorization of Aristotle and Paracelsus (despite Beguin’s protests to the contrary) because it simply did not correspond to reality. It’s one thing to say burning wood shows the unification of sulfur, mercury, and salt, but if you’re actually an experimentalist, you know that’s dumb. However, once they threw it out, they didn’t have anything to replace it with.

Black and his brethren, along with further developing the rudimentary chemical equations originated by Beguin, developed a new type of categorization: affinity tables. They knew certain elements ended up bonded with other elements. They didn’t really know why, or have a good way of predicting future bonds besides noticing obvious similarities between elements.

So, that’s what they did. They grouped elements by similarities, and created tables of which elements bonded with which other elements. As a teaching tool, this was handy, in a sort of similar way to memorizing the periodic table of elements.

This is an example of one way that Joseph Black discussed “affinities”, by creating deliberately general chemical equations for his students to learn. In this, he shows how an acid solution of metallic salt (the left circle) would result in a solution of an alkali plus “mephitic air”, or CO2. A modern equation would be something like CaCl2(aq) + Na2CO3(aq) → CaCO3(s) + 2NaCl(aq). From Crosland, a chemical historian in 1959, page 13.

Meanwhile, a more general affinity table would look like this, although this one is French. Everything on the top can react with everything below it. So, on the left, an acid can react with a “fixed alkali salt”, a “volatile alkali salt”, an “absorbant earth”, and a “metallic substance”. From Geoffroy, a French chemist, in 1718.

As mentioned, though, this came with the burden of theory. Students learned that bonding was due to an invisible force called “affinity” (hence, affinity table). One can only assume that students who learned this and scholars who studied affinity theoretically massively wasted their time. Oh, well.

So, here’s what the students of Black would take away from his classes:

1) Chemistry is an interesting, useful subject founded upon experimentation.

2) Theory is a useful tool for a chemist when it’s grounded in experimental results.

This is getting closer to what a chemistry student should believe, and better than the impressions a student would get from general chemistry today. However, what we’re still missing is a sense of all of the uses of theory (although Black himself wasn’t aware of them), as well as the connection of theory not just to usefulness but to reality.


After the 1700s, chemistry was set on a clear trajectory. An openness to scientific inquiry and state sponsorship meant that scientific progress was basically inevitable at this point. Meanwhile, this steady march of improved understanding led to an appreciation for scientific knowledge in general, and a great atmosphere for scientific learning.

Here’s where I say something controversial.

The atmosphere was so great, in fact, that the teaching of general chemistry actually reached its peak in the mid 1800s. The mid 1800s marks our point of regression.


Let me explain.

The mid 1800s were a unique point in chemistry. The usefulness of chemistry had exploded. Chemical industries were in full swing. The universities incorporated chemical labs. In Europe, the state sponsored research and development, while in America, the state sponsored intellectual piracy of European research and development.

Chemical theory, meanwhile, had actually become really useful! In Germany, August Kekulé discovered and popularized the use of diagrams for organic chemistry. Simultaneously, the careful experimentation of laboratory scientists meant that categorizing chemicals could actually be done in a systematic way (like by weight of each element in a given chemical). Eventually, by the mid-late 1800s (1869, to be precise), the ultimate chemical categorization tool arrived: the periodic table.

Now, what’s interesting is that a lot of what we consider “fundamental” to chemistry was still not on the table (pun intended). Gilbert Lewis introduced the idea of electron pairs only in 1916. In fact, the electron wasn’t even discovered until 1897, and the nucleus in 1909. We’re still 50 years out from all of that.

So what did that mean for chemistry students? Well, it meant that their chemistry instruction was incredibly practical. Theory wasn’t developed enough to be learned solely on its own, so it was taught to supplement the laboratory work. Germany especially was a paragon of practical chemical learning with its massive research laboratories. The “Giessen method” of teaching was developed during this time by Justus von Liebig, one of the foremost chemists of his day. It involved massive amounts of laboratory work, hands-on training in what chemists actually do.

From August Kekulé’s organic chemistry textbook, Lehrbuch der organischen Chemie, oder, der Chemie der Kohlenstoffverbindungen. Actual chemical equations! Helpful diagrams! Chemical theory based on experiments and also reality! It’s a miracle!

Crucially, this wasn’t just cookbook laboratory work, like we do today. These were legitimate experiments, closely tied to the active research von Liebig was carrying out at the time. The assignments would give instructions as to the goal, but only vague instructions on how to carry it out. 

The final exam at the end of a student’s chemistry education was the so-called 100 bottle challenge, where a student had to analyze 100 bottles of unknown compounds for their constituents. The reward once you were done? Well, joining von Liebig’s lab as an assistant, of course!

As you can tell, every piece of a student’s chemistry education under the Giessen method was meant to prepare them to go out in the world and do chemistry. Students were in the lab from the beginning, learning theory and performing experiments that actually helped them do chemistry.

That’s why I say this is the best time for chemistry teaching. Here’s what a chemistry student’s takeaways would be:

1) Practical methods on how to do chemistry as taught by someone who was at the forefront of research chemistry.

2) Specific theories, as well as how to apply them, why they’re useful, and the empirical evidence supporting them

In this era and place, general chemistry students were taught to the greatest extent possible to be chemists. That, to my mind, is the greatest possible way to teach chemistry: not as something to learn, but as something to do.


If the Giessen method was indeed the peak of chemistry teaching, then it’s all downhill from there (as peaks tend to operate). That’s unfortunate, because next up on our historical tour of chemistry teaching is Linus Pauling.

I really shouldn’t throw Linus Pauling under the bus. The man was incredibly brilliant, and an absolute font of ideas, some Nobel Prize winning and only a few that can be described as stupid (e.g. using Vitamin C to cure AIDS). And his approach to teaching chemistry, especially general chemistry, was pretty clever.

Pauling, somewhat similarly to Joseph Black, saw introductory chemistry as serving two functions: introducing chemistry and making people interested in it. So, that’s how he set up his teaching and his textbook.

The Table of Contents for Linus Pauling’s General Chemistry, 1947. Notice how chapters 1-10 introduce chemistry in more or less the way we would today, while Chapter 11 veers off to discuss Chromium as an interesting example of a transition metal.

When it came to introducing chemistry, Pauling took the approach of building up chemistry logically, like the textbooks of today, by starting with atoms and working his way up to chemical bonding. However, unlike the textbooks of today, he still included the empirical evidence that supported the fundamental assertions.

For instance, when introducing the periodic table, Pauling puts it in, explains it, and then briefly talks about the experimental evidence that confirmed it. He doesn’t simply cite facts referenced in the book (like similarities of alkali metals, as a modern chemistry textbook would do), but references the actual experiments done and includes their results.

It’s when dealing with what makes people interested in chemistry, though, that Pauling’s love for science really shines through and makes an impact on his students. He obviously loves collecting interesting and illustrative reactions and properties. It’s clear that he isn’t just familiar with chemistry as an academic subject, either, as he weaves in the use of chemicals in engineering as well.

This short paragraph introducing chlorine is the perfect example of how Linus Pauling’s love for chemistry shines through. He introduces the name, how it was discovered, and how it’s manufactured. He actually proceeds to discuss it further in the next couple paragraphs, too.

Where does his teaching fall short, though? Well, exactly where Justus von Liebig’s work shone: as an example of how to do chemistry. The practice problems in the book are abysmal, rote, and seem to mostly exist to fill space. They are not problems that would come up naturally, nor ones that trigger deeper thought from the students.

Just looking at these practice problems from the chapter on Chemical Equilibrium is giving me flashbacks to my days in Gen Chem.

So, then, what would the takeaways be from Linus Pauling’s course as a student?

1) Chemistry is logical, interesting, and based on empirical evidence

2) The empirical evidence that chemistry is based on is beyond the ability of introductory chem students to repeat

In 100 years, this was the shift. Chemistry, as something that one did, was relegated to more advanced courses. You would no longer be able to do chemistry by taking an intro course alone.

The regress, then, would come in disregarding the empirical evidence entirely, and transforming chemistry entirely into something that one learned, rather than did.


This brings us to the present day, the year of our lord, 2020.

As mentioned, I took high school chemistry and 2 semesters of college chemistry. I don’t remember what textbook I used for my course, but it was much like this free online textbook, Chemistry 2e, for both my high school and college classes.

With this textbook, we’ve moved completely away from chemistry as something to do to chemistry as something to learn. The book is illustrated with plenty of examples, sure, but not of actual experiments that any real scientist did to prove or explore concepts. These are examples that literally only exist within the bounds of a chemistry textbook, the sort that use color changes with easy chemical formulae.

The table of contents is literally all theory. There is nothing put in here because the authors found it interesting, or wanted students to know how chemistry is used in the real world.

This is how the book uses real-world examples. There are so many choices of real examples to use when discussing acids, including the sorts of problems that drove men like Agricola and Beguin to start chemistry in the first place. But instead, we discuss acid rain in the context of trees and statues. It’s not even necessary to understand chemistry for this example! No chemist has ever been inspired to learn more by the example of acid rain, so why use it as an illustration?

It’s sterile chemistry. It’s the equivalent of teaching kids scales before they learn to play a song, or insisting that kids learn grammar before teaching them how to write. It’s chemistry as rote and rules, with no joy to exploration.

What’s frustrating about this is that it isn’t wrong, and that’s what its proponents hide behind. This is a way to teach students chemistry so that they learn the right things. They learn an introduction to everything in order, and learn the right facts to make sense of it all.

But, at the end of it, there’s nothing to take away. There’s certainly nothing that a chemistry student can do with this knowledge. It’s as useless as the dates he learned in history class or the declensions he learned in Latin. It’s another thing that will simply molder away in a notebook until it’s thrown out.

It doesn’t have to be this way. In fact, it wasn’t supposed to be this way. While researching for this essay, I stumbled across the CHEM project, which was a project in the 1960s to modernize the teaching of high school chemistry. While it’s not perfect, its emphasis on teaching chemistry that is grounded in the sort of experiments that high schoolers can do is an improvement.

Unfortunately, from what I can tell, it never got much traction. A follow-up survey in the 70s found only 20% of high schools using the curriculum, and I’m not sure if any do now.

So, the takeaways, then, for a modern college chemistry student are these:

1) Chemistry is logical. 

2) The empirical evidence for chemistry is unimportant. 

3) The applications of chemistry are for changing the colors of solutions, and for describing things that are already kind of obvious.

4) Chemistry experiments are about repeating a process until you get to a specified endpoint

Students walk out of modern general chemistry classes neither finding chemistry interesting nor understanding its use. These classes are a process repeated over and over again, in college and high school classrooms around the world, with no clear goal or even metrics. They are closer to rituals than anything else.

As I’ve hopefully shown, this is a relatively recent phenomenon. Before the past 100 years or so, theory wasn’t developed enough to dominate any class, never mind general chemistry. As a result, classes could be far more practical, not only engaging the student but teaching them chemistry as something useful.

If nothing else, I hope we can think about going back to that time. Chemistry teaching should be held to as high a standard as chemistry itself, and not left simply to wither.

Using spaced repetition flashcards to learn pretty much anything

Flashcards are an incredibly useful learning tool. In fact, they can be used to learn pretty much anything. That’s why it’s a pity that so few people use flashcards to their full potential.

Flashcards can be used to learn pretty much anything, because, in short, remembering is essential to learning. Once something is taught to you, you need to be able to remember it to use it.

Remembrance gets stronger with repetition. The more you remember something, the more vivid it is in your memory and the easier it is to remember. For instance, the more often you’re forced to recall a certain password, the more likely it is that you will remember it in the future.

Flashcards directly force you to recall chunks of information. In doing so, they make it easier for you to recall that information in the future.

Flashcards, therefore, have an obvious use when memorizing discrete content, like facts or vocabulary. Mnemonics can aid in memorization, but nothing tests and strengthens memorization quite so well as literally forcing yourself to recall. In fact, if you force yourself to recall the mnemonics as well, it increases their effectiveness.

As a graduate exam tutor (and flashcard app developer), however, I’ve discovered that flashcards can also be an excellent way to help yourself with processes and problems. The basic idea is the same: in order to perform a process or solve a problem, you have to remember how to do it. Flashcards can help you do so.

However, the complication with learning a process is that you don’t want to simply memorize the problem or process. After all, the point of learning a process isn’t simply to be able to repeat it on command. Instead, you want to be able to apply it when appropriate, even when the problem comes in a different form.

In other words, you don’t want to use the flashcard to just recall the problem. You want to use the flashcard to recall the thought process inherent to the problem, so you can use it to help solve future problems. 

The best way to do so is to create a 3 sided flashcard. There’s the problem, the answer, then the explanation. The answer to the problem isn’t enough. The answer only confirms if your attempt to solve the problem is correct, before you check the explanation for the process itself.

This is essentially how I work with my tutoring clients. I put every question they have trouble with into my flashcard app, 21st Night. We work through the problem together, and then, once we’re finished, I have them write their own explanation in the app. For homework, they review the cards.

Now, when my tutoring clients write an explanation for a problem, they often make the mistake of just literally writing the step-by-step to solve the problem. However, in order to really use the flashcard to aid your pattern recognition, the explanation needs to anticipate future problems. To do so, it’s really important that the explanation incorporates not only how to solve the problem, but how you know that this is the way to solve the problem.

So, for instance, if it’s an algebra problem, the explanation that you test yourself on shouldn’t just be the step-by-step process to solve the problem. It should discuss how you know how to set up the problem, and how you will recognize similar problems with similar setups in the future.
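To make the three sides concrete, here is a minimal sketch in Python of what such a card might look like. This is not how 21st Night or Anki store cards; the class name, field names, and helper functions are all hypothetical, and the algebra problem is made up for illustration. The point is only that the answer and the explanation are separate sides, and that the explanation carries the "how do I recognize this" part.

```python
from dataclasses import dataclass


@dataclass
class ThreeSidedCard:
    """Hypothetical three-sided flashcard: problem, answer, explanation."""
    problem: str      # side 1: the question as you would meet it
    answer: str       # side 2: only confirms whether the attempt was right
    explanation: str  # side 3: how to solve it and how to recognize this type


def check_attempt(card: ThreeSidedCard, attempt: str) -> bool:
    """Compare an attempt to side 2, ignoring case and surrounding spaces."""
    return attempt.strip().lower() == card.answer.strip().lower()


def review(card: ThreeSidedCard, attempt: str) -> str:
    """Give the verdict and answer first, then the explanation to test against."""
    verdict = "Correct" if check_attempt(card, attempt) else "Incorrect"
    return f"{verdict}. Answer: {card.answer}\nWhy: {card.explanation}"


# Illustrative card; the explanation says how to spot the setup, not just the steps.
card = ThreeSidedCard(
    problem="Solve for x: 3x + 5 = 20",
    answer="x = 5",
    explanation=(
        "One unknown in a linear equation: undo the addition, then the "
        "multiplication. Any problem shaped like ax + b = c sets up the same way."
    ),
)

print(review(card, " X = 5 "))
```

Keeping the answer and explanation apart means a review can fail in two distinct ways: getting the answer wrong, or getting it right without being able to say why this approach applies.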

Then, when reviewing, it’s important to test your recall not only of the problem and the answer, and not only of the process, but also of your recognition of the type of problem. No matter the subject, there are only a limited number of types of problems. Similarly, there are only a limited number of approaches to solve the problems. As long as you can recall the approaches and recall how to recognize when to use them, you will be able to solve the problems in the subject.

In my own tutoring practice, this approach of putting everything into flashcards and practicing recall has proven very effective. While before my students would consistently forget the complex details of what I taught, now my students remember what I teach and how to solve the problems that used to vex them.

In summary, flashcards can be used to learn basically any subject that can go in a textbook. If the subject is content-heavy, the flashcards will help you memorize the content. If it’s process or problem-heavy, the flashcards will help with your pattern recognition, as long as you structure the explanation correctly.

If you’re looking for flashcards, I’d of course recommend my own, which come with detailed studying analytics, three sides, and built-in sharing with friends and tutors. If you’re looking for free flashcards, though, I’d recommend Anki, although you’ll have to get comfortable with creating hint fields in order to get the 3 sided functionality.