Wednesday, April 24, 2024

Why Evolution is False

Recently deceased philosopher Daniel Dennett called Darwin's idea of natural selection the best idea anyone ever had, a universal solvent that melts away all the problems in biology. Dennett had contempt for Christians, championed the term 'brights' for those who shared his worldview, and thought it wise not to respect religion because of its damage to the 'epistemological fabric of society.' Like fellow atheist Richard Dawkins, he never debated biologists, just theologians.

In a 2009 debate, Dennett mentioned that he had brought a friend, an evolutionary biologist, to a 1997 debate with Michael Behe about Behe's book Darwin's Black Box (1996) because he felt unqualified to address the microbiology arguments himself. Dennett described Behe's book as 'hugely disingenuous propaganda, full of telling omissions and misrepresentations,' that was 'neither serious nor quantitative.' Dennett then added he would not waste time evaluating Behe's newest book, The Edge of Evolution (2007).1

Dennett emphasized that his approach to understanding the world was rational, reasoned, and evidence-based. Yet he never directly addressed Behe's arguments, instead sticking to the cowardly pleasure of debating non-scientist theologians over whom he had greater knowledge of biology. By enlisting a biologist to help him debate Behe, he admitted he could not evaluate the arguments alone. If he could not trust himself to evaluate Behe's argument, a rational approach would be to take a trusted source's opinion as a Bayesian prior, not as a fact so certain that its negation damages the epistemic fabric of society. Unfortunately, many, perhaps most, scientists agree with Dennett and think the only people who don't believe in evolution are ignorant (e.g., see Geoffrey Miller here, or Dawkins here).

If Dennett had read Behe's Edge of Evolution, he would have seen it as a logical extension of his earlier book, not moving the goalposts. Behe's argument isn't based on parochial microbiology knowledge; it's pretty simple once one gets the gist.

Behe highlighted the edge of evolutionary power using data on malarial resistance to two different antimalarial drugs. For atovaquone, resistance develops spontaneously in roughly every third patient; given the number of malaria parasites in a single infected person, the probability that the parasite successfully adapts to this threat can be estimated at one in 1e12, or 1e-12. For chloroquine, resistance arises only about once in every billion patients, giving an estimated successful mutation probability of 1e-20. This is roughly the square of the atovaquone probability, which led Behe to suggest that chloroquine resistance requires at least two specific mutations, as two independent mutations imply a squaring of the single-mutation probability. This prediction was confirmed a few years later.

Extending this logic, given a base mutation rate of p per base pair per generation (e.g., p ~ 1e-8 for humans), if n specific mutations are needed, the probability of that happening scales as p^n. Given that new proteins require at least 10 changes to some ancestor (out of an average of 300 amino acids), the probability of an existing genome evolving into a new specific protein would be about 1e-80. Given that only about 1e40 organisms have ever lived on Earth, this implies that evolution is limited in what it can do (note that most evolution we observe, as in dog breeds, just involves changes in allele frequencies).
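To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python using the round numbers above (the mutation rate, the number of required changes, and the organism count are the post's ballpark figures, not new estimates):

```python
# Back-of-the-envelope version of the p^n scaling argument (round numbers from the text)
p = 1e-8                 # per-base-pair mutation rate per generation (human ballpark)
n = 10                   # specific changes assumed necessary for a new protein function
organisms_ever = 1e40    # rough count of organisms that have ever lived on Earth

prob_per_attempt = p ** n                       # ~1e-80
expected_successes = prob_per_attempt * organisms_ever
print(prob_per_attempt, expected_successes)     # ~1e-80 and ~1e-40
```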

A reasonable criticism is that this argument works for a specific target, such as Behe's example of malaria overcoming an antibiotic. However, the state space of possible unknown proteins is much larger. For example, the average protein has 350 amino acids; with 20 amino acids, that's 20^350 or 1e455 possible permutations. If there are islands of functional proteins within this vast state space, say at only a rate of 1e-55, that leaves 1e400 potential targets. The 'singular target' criticism does not invalidate Behe's assertion, but addressing it would require too much space for this post, so I will just mention it as a defensible criticism.  

However, most reviews of Behe's malaria-based argument centered on the simple assertion that generating n specific mutations scales as p^n.

Ken Miller (Nature, 2007):

Behe obtains his probabilities by considering each mutation as an independent event

Sean Carroll (Nature, 2007):

Behe's chief error is minimizing the power of natural selection to act cumulatively.

Jerry Coyne (The New Republic, 2007):

 If it looks impossible, this is only because of Behe's bizarre and unrealistic assumption that for a protein-protein interaction to evolve, all mutations must occur simultaneously, because the step-by-step path is not adaptive.

These criticisms are based on two possible assumptions. One is that if multiple mutations are needed, the single mutations encountered along the way may each confer a fitness advantage, allowing selection to fix the intermediate cases. While this can be true, it is not true for the mutations needed for malaria to develop chloroquine resistance, which requires at least two, and it is certainly not true in general. Indeed, a good fraction of the intermediate steps reduce fitness, some severely (see here or here), which is why a fitness advantage requiring two mutations does not arise with the same frequency as one requiring a single mutation: in the wild, the intermediate mutations are eliminated from the population.

The other assumption is that the number of indirect paths overwhelms the specific case where the needed mutations occur in direct sequence. There are many indirect paths from one sequence of 300 amino acids to another. The question is their aggregate probability: the probability summed over all successful paths relative to all possible paths.

Intuitively, multiple specific simultaneous mutations are astronomically improbable and would constitute an impenetrable fitness valley, which is why evolution proponents are emphatic that it is an absurd approximation. Their intuition is that if one needs a handful of specific mutations when one already has 100 or 400 of the needed amino acids in the correct spot, a cumulative process of some sort should be likely, even if the final steps involve neutral mutations that neither help nor hurt fitness. However, none of Behe's critics generated a model to quantify their reasoning, even though they are all scientists in this field.

Model of the Behe Edge of Evolution Argument

Hypothesis: Prob(get n specific mutations sequentially) ~= Prob(get n specific mutations over 1e40 generations)

The model below uses a trinomial lattice to show that if the probability of getting one mutation correct is p, the probability of getting n mutations correct is on the order of p^n. Given the small probability of getting one mutation correct, this highlights what Michael Behe calls the edge of evolution: the fitness landscape generates an impenetrable barrier for naturalistic processes. Assuming simultaneous mutations towards the target captures most of the cumulative probability of reaching that target if mutations are neutral until the target is achieved. The other paths drift away at a rate that initially eliminates the benefit of being so close.

We can model this using a variant of Richard Dawkins's Weasel program, which he used to demonstrate the power of evolution. It starts with a string of letters and spaces, 27 potential characters in a string of length 28.

neuh utnaswqvzwzsththeouanbm

Dawkins randomly generated strings, fixing characters that matched the target phrase from a Shakespearean play. The application to genetics is straightforward if we think of the phrase as a sequence of nucleotides or amino acids creating a protein.

xethinks dt is like a weasek

xethinks dt is like a weasel

xethinks it is like a weasel

methinks it is like a weasel

This algorithm fixes characters that confer no benefit until the entire phrase is complete. This is like assuming the necessary mutations are selected with foresight, which would require some outside designer shepherding the process. Evolutionists countered that Dawkins' weasel program was merely meant to demonstrate the difference between cumulative selection and single-step selection, but this is disingenuous, as removing forward-looking selection increases the number of steps needed from around 100 to 1e39, which would not make for a convincing TV demonstration (see Dawkins promoting his weasel here).

Nonetheless, the Weasel program is familiar to many who study evolution and can be used to illustrate my point. Consider the case where we are two characters away from the target sequence.

Start: 2 wrong, 26  correct

XXTHINKS IT IS LIKE A WEASEL

Case 1: 1 wrong, 27 correct.

MXTHINKS IT IS LIKE A WEASEL

XETHINKS IT IS LIKE A WEASEL

Case 2: 2 wrong, 26 correct. The two mismatched characters can each change into 25 potential targets (27 – 2) that are also wrong, for a total of 2*25. For example, here are three.

YXTHINKS IT IS LIKE A WEASEL

XYTHINKS IT IS LIKE A WEASEL

AXTHINKS IT IS LIKE A WEASEL

Case 3: 3 wrong, 25 correct. Each of the remaining matching characters can become unmatched by changing into any of the 27-1 potential characters. The total number of paths is 26 x 26. Here are two.

XXXHINKS IT IS LIKE A WEASEL

XXTHXNKS IT IS LIKE A WEASEL

To generalize the problem, let us define L as the string length, c the number of potential characters at each element in the string, and n the number of needed changes to the string (initial mismatches). The set of potential new strings can be split into three groupings with their distinctive number of paths:

1. strings that have more characters that match the target: n (e.g., n moving from 3 to 2)

2. strings that have the same number of characters matching the target: n*(c - 2) (e.g., a mismatched character changing to another mismatched character, so n stays at 3)

3. strings that have fewer characters that match the target: (L - n)*(c - 1) (e.g., n moving from 3 to 4)

The probabilities are the same regardless of which n characters are referenced. For example, the two sequences below are mathematically identical regarding how many paths are towards and away from the target, so they will have the same probabilities of moving towards or away from the target.

XXTHINKS IT IS LIKE A WEASEL = METHINKS IT IS XIKE A WEASEX

This makes the problem simple to model because, regardless of where the string starts, the number of cases that need probabilities is at most L+1, as n can range from 0 to L. All single mutations are considered equally probable; the probability of moving up is the number of upward paths divided by the total number of possible paths (up, same, and down). We can, therefore, calculate the probability of moving closer to, staying the same distance from, or moving further from the target (i.e., n(t+1) < n(t), n(t+1) = n(t), n(t+1) > n(t)) for each n <= L. This allows us to model the evolution of the string using a recombining lattice that evolves from an initial node, a standard tool for modeling option values.

At every row of the L+1 nodes in the lattice, we calculate the probabilities of moving up, across, and down via the following formulas.

1. Prob(closer) = n/(L*(c-1))

2. Prob(same) = n*(c-2)/(L*(c-1))

3. Prob(farther) = (L-n)*(c-1)/(L*(c-1))
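As a sanity check, here is a small Python sketch of these three branch probabilities (my own re-implementation, not the author's worksheet); with the Weasel parameters L=28 and c=27, and n=2, it reproduces the 92.8%, 6.9%, and 0.3% split used in Figure 2 below.

```python
# Branch probabilities for the trinomial lattice described above
def branch_probs(L, c, n):
    denom = L * (c - 1)                      # total number of equally likely single mutations
    closer = n / denom                       # a mismatched position flips to the correct character
    same = n * (c - 2) / denom               # a mismatched position flips to another wrong character
    farther = (L - n) * (c - 1) / denom      # a matched position flips to a wrong character
    return closer, same, farther

print(branch_probs(L=28, c=27, n=2))         # approx (0.0027, 0.0687, 0.9286)
```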

Figure 1 below shows a case with five rows representing a string of 4 characters. The columns represent generations defined by a change in one of the spaces in the sequence (a mutation). The bottom row of nodes reflects zero matches, and the top level is a complete match. In this case, the initial node on the left implies 2 mismatches out of 4. If the new mutation flips a mismatched position to the target, it is closer to the target and thus moves up one level, just below the top row; if it stays two away by having an incorrect letter changed to a different incorrect letter, it moves horizontally; if a correct letter changes to an incorrect letter, it moves down.

Figure 1

A complete match is a success, and we assume that once the sequence reaches the target, it does not regress (it is then subject to selection and fixates in the population). Thus, there is no path downward from the top row. While this is not realistic in practice, it inflates the total probability of reaching the target and does not affect the gist of my argument. The point is to highlight the relative importance of the direct path as opposed to the actual probability.

In Figure 2 below, we have a snip from a worksheet modeling the case where the starting string has two mismatched characters from the target sequence of 'methinks...' For the first mutation, there are 676 changes away from our target, 50 that would maintain the same distance, and 2 that move towards the target, giving probabilities of 92.8%, 6.9%, and 0.3% to the node branches. At the next upward node in step 1, the probability of going up again and reaching the target is 1/728 or 0.14%, so the probability of reaching the target in two mutations is 0.27% * 0.14%, or 3.77e-6.

Figure 2

The 'nodeProb' row represents the probability of reaching the target on that mutation, while the 'cum prob' row represents the cumulative sum of those nodes, given we assume reaching the target allows the organism to thrive with its new protein function.

The node probabilities decline asymptotically, so taking the hundredth nodeProb as equal to all subsequent nodeProbs generates a conservative estimate of the number of generations (G) needed to double the direct path probability.

2*nodeProb(2) = cumProb(100) + G*nodeProb(100)

G = (2*nodeProb(2) - cumProb(100) )/nodeProb(100)

For this example

G = (2*3.77e-6 – 4.23e-6)/3.00e-37 = 1e31

This implies at least 1e31 generations are needed for the cumulative probability to reach twice the direct-path probability. As estimates in this field have logarithmic standard errors (e.g., 1e-10 vs. 1e-11, as opposed to 1.0e-10 vs. 1.1e-10), a direct-path probability within a factor of 2 of the cumulative probability over 1e31 generations is effectively the same number.
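For readers without the workbook, here is a self-contained Python sketch of the lattice propagation under my reading of the setup; it reproduces the 3.77e-6 two-step direct-path probability, while the 100-step tail values (and hence the exact G) depend on worksheet conventions I am guessing at, so treat those outputs as illustrative.

```python
# Trinomial-lattice sketch of the Weasel example (my re-implementation, not the Excel worksheet)
L, c, n_start, steps = 28, 27, 2, 100
denom = L * (c - 1)

probs = [0.0] * (L + 1)        # probs[n] = probability of currently being n mismatches away
probs[n_start] = 1.0
node_prob, cum_prob = [], []

for t in range(steps):
    new = [0.0] * (L + 1)
    new[0] = probs[0]                                     # the target is absorbing: no regression
    for n in range(1, L + 1):
        p = probs[n]
        if p == 0.0:
            continue
        new[n - 1] += p * n / denom                       # move closer
        new[n] += p * n * (c - 2) / denom                 # stay the same distance
        if n < L:
            new[n + 1] += p * (L - n) * (c - 1) / denom   # move farther
    node_prob.append(new[0] - probs[0])                   # probability of first hitting the target at this step
    cum_prob.append(new[0])
    probs = new

G = (2 * node_prob[1] - cum_prob[-1]) / node_prob[-1]     # generations to double the direct-path probability
print(node_prob[1])      # ~3.77e-6: the two-mutation direct path
print(cum_prob[-1])      # cumulative probability of reaching the target within 100 mutations
print(f"{G:.1e}")        # astronomically large, echoing the ~1e31 figure above
```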

Modeling this with a lattice lets us see the relative importance of the direct path. Over an infinite amount of time, everything reaches the target, so the limiting probability is the same regardless of the starting point; with a lattice, we can model a finite case that is still large (e.g., 1e40 generations) and see the relative probabilities.

The probability of a direct path is approximately equal to a simultaneous path because, if we assume mutations obey a Poisson process, the probability of a simultaneous mutation is the same as sequential mutations, just one probability times the other. For example, the malarial parasite cycles through several generations during an infection, and the resistant parasite could acquire the necessary mutations in generations 3 and 5 or simultaneously in generation 3 (both would have the same probability).

Thus, Behe's hypothesis that the probability of reaching a target n mutations away scales with p^n is a reasonable estimate when the intermediate steps are neutral.

The worksheet contains the Weasel program applied to 2 and 3 characters away from the target. It can be generalized straightforwardly for an arbitrary n of needed mutations, 20 amino acids, and protein of length L, as shown in the worksheet ‘12awayAminos.’ Looking at the worksheet '2away' in the Excel workbook, the probability of reaching the target in a direct sequence is the probability of hitting the first top node. All the probabilities here are relative because this model assumes a mutation along an existing string of amino acids. This occurs only at a rate of 1e-8 per nucleotide per generation in humans. So, the extension to amino acid changes needs an extra adjustment (nucleotide changes do not necessarily change amino acids). The purpose of this post is just to show the relative probability of the direct and indirect paths, which is independent of the absolute probability.

There are an estimated 35 million single nucleotide differences and 90 Mb of insertions and deletions between chimps and humans, creating 700 new genes not present in our last common ancestor. That's at least 100 mutations that must be fixed every generation over the past 6 million years. Genetic drift can generate the 100 newly fixed mutations, but this drift is random. The probability that a random amino acid sequence would do something helpful is astronomically small (estimates range from 1e-11 to 1e-77, a range that highlights the bizarre nature of evolution debates), which creates a rift within the evolution community between those who emphasize drift and those who emphasize natural selection.2 This debate highlights that there are rational reasons to reject both; each side thinks the other side's mechanism cannot work, and I agree.

In Dawkins's Climbing Mount Improbable, he admits the de novo case is effectively impossible. He suggests indirect paths involving gene duplication, as the base structure provided by an existing gene would give the scaffolding needed for protein stability; then one must merely tweak a few segments toward some new function. The above shows that if one needs a modest number of specific amino acids (e.g., a dozen), the odds of reaching such a target are worse than 1 in 1e100. For this copy-and-refine process to work, it requires a protein space with a dense set of targets—islands of functionality in the protein state space—which is possible but eminently debatable.

1

See the 2009 debate between Dennett and Christian philosopher Alvin Plantinga, especially around 64 minutes in.

2

The 1e-11 estimate applies to an 80-amino-acid string converting from weakly to strongly binding ATP, which is understandable given that ATP is the most commonly bound molecule in a cell, as it powers everything. Further, this was studied in vitro, and such in vitro results are roughly 1e10 less likely to work in vivo. The 1e-77 estimate is for beta-lactamase, an enzyme produced in bacteria with a specific function, unlike binding to ATP, which is common. Other estimates include 1e-65 for cytochrome c, 1e-63 for the bacteriophage lambda repressor, and 1e-70 for a certain phage protein.

Tuesday, April 16, 2024

How to Eliminate Impermanent Loss

Generally, markets are efficient in that it isn't easy to make above-average returns day-trading, and most mutual funds underperform market-weighted ETFs. Yet historically, in various applications, options have been underpriced for decades. For example, asset return distributions were known to have fatter tails than the lognormal distribution back in the 1960s (see Benoit Mandelbrot ('62) or Eugene Fama ('65)). Most option market makers, however, applied the basic Black-Scholes model with a single volatility parameter, which underpriced out-of-the-money options. On a single day, October 19, 1987, the stock market fell over 20%, which, using a Garch volatility estimate, was a 7.9 stdev event. From a Bayesian perspective, that did not imply a miracle but, rather, a misspecified model. Option market-makers soon adjusted their models for out-of-the-money puts and calls, ending decades of underpricing (20 years before NN Taleb's Black Swan introduced the concept of fat tails to the masses as if it were still 1986).

Another example is convertible bonds: regular bonds that are convertible into stock at a fixed price, i.e., a bond plus an equity call option. The optionality was underpriced for decades. In the 1990s, a hedge fund that merely bought these bonds and hedged the option's equity delta and bond duration generated a 2.0 Sharpe ratio, which is exceptional for a zero-beta strategy. Eventually, in the early 2000s, investment bankers broke these up into bonds and options, allowing them to be priced separately. This isolation highlighted the option underpricing, and this market inefficiency went away. One could add Korean FX options that were underpriced in the 90s, and I'm sure there were many other cases.

So, there is precedent for a lot of money being locked up in money-losing short-convexity positions, as with liquidity providers (LPs) on automated market makers (AMMs). These LP losses explain why AMM volume is still below its 2021 peak; transformative technologies do not stagnate for three years early in their existence unless something is really wrong. It's like how multi-level marketing scams like Amway have a viable business model relying on a steady stream of new dupes; they can persist, but it's not a growth industry.

The Target

The way to eliminate AMM LP convexity costs centers on this formula:

Impermanent Loss (IL) = LPvalue(pt) – HODL(pt)

LPvalue(pt) is the value of the LP position at the new price, pt, and the 'hold on for dear life' (HODL) portfolio is the initial deposit of LP tokens valued at the new price. It turns out that this difference is not an arbitrary comparison but instead captures a fundamental aspect of the LP position: the cost of its negative convexity. The expected value of the IL is the LP's convexity cost, which is called theta decay in option markets.

The LP convexity cost also equals the expected value decay of a perfectly hedged LP position (minus fees). Hedging is often thought of as reducing IL, and it does reduce the variance of the IL, eliminating extreme LP losses. However, this does not affect the mean IL, which, over time, equals the expected value. Lowering a cost's variance reduces required capital, which is why hedging is a good thing, but it will not solve the IL’s main problem, its mean, which is generally larger than fees for significant capital-efficient pools.

If we expand the above formula, we can rearrange variables to get the following identical equation as a function of pool token quantity changes and the current price (see here for a derivation).

IL = USDchange + pt×ETHchange

I am using a stablecoin USD and the crypto ETH as my two tokens because it makes it easier to intuit, though this generalizes to any pair of tokens (I like to think of prices in terms of dollars, not token A, but that's just me). The horizon used to calculate the token changes corresponds to the LP's hedging frequency. 24 hours is a pretty good frequency, as hedge errors cancel out over a year via the law of large numbers, but anything between 6 hours and 1 week will generate similar numbers. If we can set ETHchange=0, USDchange will be near zero, and we will effectively eliminate IL.
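As a quick check that the two formulations above agree, here is a small Python sketch for a full-range constant-product (v2-style) position; the prices and deposit sizes are arbitrary illustrations, not figures from any actual pool.

```python
from math import sqrt

# IL computed two ways for a constant-product LP position (illustrative numbers only)
p0, pt = 3000.0, 3300.0       # old and new ETH price in USD
eth0 = 1000.0                 # initial ETH deposit
usd0 = eth0 * p0              # initial USD deposit (balanced at p0)
liq = sqrt(usd0 * eth0)       # constant-product liquidity, L = sqrt(x * y)

# 1) IL = LPvalue(pt) - HODL(pt)
lp_value = 2 * liq * sqrt(pt)            # pool holdings revalued at the new price
hodl = usd0 + eth0 * pt                  # the initial deposit simply held
il_1 = lp_value - hodl

# 2) IL = USDchange + pt * ETHchange
usd_chg = liq * sqrt(pt) - liq * sqrt(p0)
eth_chg = liq / sqrt(pt) - liq / sqrt(p0)
il_2 = usd_chg + pt * eth_chg

print(round(il_1, 2), round(il_2, 2))    # the same negative number either way
```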

An Extreme Base Case

One way to eliminate IL is to have a single LP who can trade for free while everyone else trades for a fee. Whenever the AMM price differs from the true price by less than the fee, only the LP can profit from arbitrage trading. A simple way to intuit this is to imagine if an AMM was created and not publicized, so it has no traders outside of the creator who assumes the role of LP and arbitrageur. With no other traders, he is trading with himself in a closed system. If his accounts are netted, he could both set the price on his contract efficiently and hedge his IL; nothing changes except the price. His net position in the tokens is static: ETHchange=0, IL=0.

It's helpful to think of traders as being two types: arbitrage and noise. Arbitrage traders are trying to make instant money off minor price discrepancies between exchanges; noise traders are like white noise, mean zero net demand that flows in like Brownian motion. A market price equilibrates supply and demand, so the volume that nets out is, by definition, noise. Noise traders are motivated by individual liquidity shocks, such as when a trader needs money to pay taxes, or the various random buy and sell signals generated by zero-alpha trading strategies.

If the AMM's base trading fee were 15 bps and the LP could trade for free, the LP could turn loose an automated trading bot based on the Binance/CME/Coinbase price, trading whenever the mispricing exceeded 10 bps. Over time, the LP/arbitrageur will be responsible for all the AMM's net price changes. With the AMM at the equilibrium price, immune to arbitrage by non-LP traders, the other trades will be white noise, net zero demand, by definition.

The figure below shows how the LP's ETH position changes for an AMM with a restricted range from $1,300 to $1,700. The pool's ETH change implies traders are accumulating an offsetting amount; if the pool lost 2.5 ETH, traders gained 2.5 ETH. This net trader accumulation is arbitrage trading because it is not random; noise trading is mean-zero over time. Gross trading will be greater due to noise trading, and in a healthy exchange, the noise traders will compensate the LP for his predictable IL over time.

If the monopolist zero-fee LP were the arbitrageur, at any price his net ETH position would be the same; the sum of those lines, his net position on the contract, would be horizontal. With a constant net ETH position, his impermanent loss would be zero. This works because, unlike traditional markets, the option seller is not reacting to prices but setting them. The latency inherent in any decentralized AMM implies the price setter does not need any alpha, just an API to the lower latency centralized exchanges.

The LP would still have exposure to this net ETH position, but this is simple to hedge off the AMM with futures, as it would not need frequent adjusting.

If we add a separate margin account on the contract that held the LP's trades (with himself), they will net to a constant, his initial token position. The monopolist LP's net position on the contract would be represented as in the table below (on average).

Single LP with Exclusive Arbitrage Trading Access

Thus, if the LP could trade for free, allowing him to dominate arbitrage, and he had a separate trading account on the contract, he could eliminate his IL without moving tokens on and off the blockchain or making other trades on different exchanges.

Multiple LPs

An AMM with only one LP would not work. An extensive set of LPs is needed to avoid centralization, which presents an attack surface for regulators and hackers; it is also needed to give the AMM unlimited scale. However, now we must consider LP competition. If we gave all the LPs free trading, a power law distribution of trading efficiency would invariably reveal a few dominant LPs monopolizing the arbitrage trading, shutting the slower LPs out and leaving them all exposed to IL as before.

Fortunately, the AMM provides a simple mechanism to allow all the LPs to arbitrage the price and hedge their positions simultaneously. This is because, on an AMM, the price is only changed by trades, so given a specific liquidity amount, a price change implies a specific change in ETH and vice versa. This makes it feasible to apply a rule using the LP's net ETH position on the contract to see if they qualify for the free discount.

Consider the following framework. Each LP gets a trading account in addition to their LP position. The key point is these accounts are netted, so like above, the LPs can retain a constant net position as the price moves significantly.

This can be adjusted to give them capital efficiency without changing the implications (see the footnote below). For my purpose in this post, this is irrelevant.

In the earlier example with one LP, his margin account's net change is calculated using the same liquidity as the pool because, with one LP, his liquidity is the pool's liquidity. This implied the LP's trades would always exactly offset his pool token changes. With many LPs, this is not so. A trade is against the entire pool, which is necessarily greater than any one LP. Thus, if LP(i) trades in a pool with many LPs, it will change LP(i)’s individual net position.

netETHchg(i) = (totLiq - liq(i))/totLiq *ETHtrade

As totLiq > liq(i), each trade changes the LP's net position, unlike when the monopolist LP trades with himself. For example, if an LP owns 10% of the liquidity, buying 1.0 token increases his margin position by 1.0 but decreases his pool position by only 0.1.
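A minimal sketch of that bookkeeping, with function and variable names of my own choosing (nothing here comes from an actual contract):

```python
# Per-LP net position change when LP(i) trades against a pool it partly owns
def net_eth_change(total_liq, lp_liq, eth_trade):
    # The LP's margin account moves by the full trade, but his share of the pool
    # moves the other way, so only the portion absorbed by the other LPs changes his net position.
    return (total_liq - lp_liq) / total_liq * eth_trade

print(net_eth_change(total_liq=100.0, lp_liq=10.0, eth_trade=1.0))   # 0.9: margin +1.0, pool share -0.1
```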

Assume LPs can only trade for free in the following conditions:

Buy for free only if:

PoolEth(i) + MarginEth(i) - initEthDeposit(i) < +0.01

Sell for free only if:

PoolEth(i) + MarginEth(i) - initEthDeposit(i) > -0.01

If we assume the LP hedged his initial ETH deposit off-chain, the rules basically say that if an LP is net-net long, he cannot buy for free, and if he is net-net short, he cannot sell for free [an LP's 'net net' position is his net position on the contract minus his initial deposit, which we assume is hedged elsewhere].
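Here is a hedged sketch of that eligibility rule in Python; the 0.01 tolerance and the account fields mirror the inequalities above, but the names are my own, not a contract specification, and the sample numbers anticipate the Alice and Bob example below.

```python
# Zero-fee eligibility check based on the LP's 'net net' ETH position
def can_trade_free(pool_eth, margin_eth, init_eth_deposit, side, tol=0.01):
    net_net = pool_eth + margin_eth - init_eth_deposit
    if side == "buy":
        return net_net < +tol    # a net-net long LP cannot buy for free
    if side == "sell":
        return net_net > -tol    # a net-net short LP cannot sell for free
    raise ValueError("side must be 'buy' or 'sell'")

print(can_trade_free(988.0, 4.0, 990.0, "buy"))   # False: this LP is already net-net long
print(can_trade_free(988.0, 0.0, 990.0, "buy"))   # True: this LP is net-net short, so it may buy
```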

To see how this works, assume we have two LPs, Alice and Bob, with equal amounts of liquidity. Both Alice and Bob start with pool ETH positions of 990 and zero balances in their margin accounts. The trading fee is 0.5%, so all the price changes in this example are profitable arbitrage opportunities only for Alice and Bob.

LP Pool, Margin, and Net ETH Balances

Assume the price initially rises by 0.41%, generating an arbitrage opportunity for the LPs. Alice wins the race and arbitrages this price discrepancy, setting the AMM price at its new equilibrium level of $3,313. She bought 4 ETH, and her pool position declined by 2, for a new 'net net' ETH position of +2. LP Bob just sees his pool position decline and has a new net net ETH position of -2; Bob is short, and Alice is long.

Alice cannot arb the AMM if the price rises again due to the rule preventing long LPs from buying at zero fee. If the price rises, Bob will win the arb race (by default), and LPs Bob and Alice are each flat again. In the table above, Alice buys in the rows where she has a light blue box, and Bob buys in the rows where he has a light orange box. They can only buy when their net net position is flat or negative when prices are rising. This prevents Alice or Bob from dominating arbitrage.

If the price immediately went back down after Alice initially arbitraged the pool, only Alice could arbitrage the pool price. This is because in period 1, Alice is long, and Bob is short, so Bob cannot sell for free, while Alice can. When she does sell, she restores the initial net position for both herself and Bob. Alice’s extra trading is benign.

The LPs will experience net token position changes for single trades or small price movements. Over longer price movements, however, each individual LP's net token change would be insignificant, and assuming the LP hedged, their net net position would be insignificant.

There is no reason to pay arbs to set AMM prices, as the latency in decentralized blockchains implies arbitrage requires no special expertise, unlike the price discovery on the lowest latency exchanges. External arbitrageurs generate a deadweight loss for high-latency decentralized AMMs, so there is no trade-off with price efficiency.

A specialist with fast APIs and redundant arbitrage bots distributed in the cloud could manage a bot for a set of LPs if the contract allowed LPs to whitelist another address that could do one thing on their account: trade for zero fee. The rule would prevent this arb vault manager from overtrading some accounts at the expense of others: either he would single out one account for trades that cancel out, as in the case of Alice buying and then selling above, or he would quickly find that accounts that traded first in one direction can no longer trade in that direction. One could cap trade size so the arb vault manager does not max out a trade on one LP to generate a profit for the other LPs. Competition among such specialists would allow passive LPs to get most of the benefit from arbitrage/hedging without creating and running their own arb bots.

This approach wouldn't work for coins not traded on major centralized exchanges because it presumes the true price is common knowledge. The next Shiba Inu will have to first trade on a conventional AMM. Yet that's a minor amount of AMM volume.

Alternatives

As the equilibrium price is presumed to be on CEXes, and an equilibrium price implies net zero demand trading, one could use an oracle to update the AMM's last price to this price just as an LP arbitrage trade would. Over time, the LP net token changes would not be explicitly tied to price changes such that LP position values had negative convexity (i.e., linear in the square root of price). The main problem for oracle-based AMMs is centralization. This creates an attack surface for hackers. It also creates an attack surface for regulators, as the SEC knows where American Chainlink CEO Sergey Nazarov, for example, lives. Once they figure out how to do it—it took regulators several years to shut down InTrade—they will prevent his company from supporting anything that competes with regulated industries like finance and sports betting because regulation’s quid pro quo is protection from competition. Another problem is incentives, in that with more players with different actions, the state space complexity increases exponentially, so any solution will be more costly and less efficient. Arbitrageurs with more skin in the game would invest more time exploiting the oracle than the oracle would defending itself, especially as it requires a collective decision-making process and will be much slower to respond.

A decentralized limit order book (LOB) could avoid IL like an oracle-based AMM by allowing the LPs to move their resting limit orders costlessly to the latest CEX price, but there would be several problems. First, on-chain LOBs aspire to look and feel like the trading interfaces we are used to on centralized exchanges, so they emphasize low latency. Low latency leads to quasi, if not complete, centralization, an attack surface, and, more importantly, insiders who will get the equivalent of co-located server access. As third parties do not audit these ‘decentralized’ exchanges, the LOB leaders or a lower-level dev could sell undisclosed privileged access, and the costs would be virtually zero. It would take a lot of faith to presume they are immune to this subterfuge. Secondly, it would require many messages to cancel and replace resting limit orders. Unlike my mechanism, where one account rectifies the mispricing, on an LOB each LP account must be adjusted separately. On a CEX, these cancel-and-replace orders are costless, but if the blockchain is marginally decentralized, it will cost a few cents, and zero is a significant price point. Lastly, no one has presented a mechanism to incent a manager to oversee a set of LPs who wish to outsource their active management coherently. The LPs would still compete for favorable queue positioning, making it needlessly complicated and shutting out potential passive LPs.

The auction approach proposed by Moallemi et al. lets people bid to be the zero-fee trader. In that approach, the LPs would sell their right to fees to an arbitrageur who would get privileged zero-fee trading rights over some future period. The bidder pays a lump sum to the LPs, and the trading fees for that period go to the bid winner (meaning his trading fees return to him, so he trades for free). Assuming risk-neutrality and zero capital costs or other expenses, the arb would pay the expected arbitrage profit, which equals the expected LP convexity cost. He would also pay for the expected noise-trader fee revenue. Considering the arb bidder would be exposed to risk, in that actual volatility and noise-trader volume will on occasion be much lower than expected, he would pay considerably less than the expected value of the arbitrage opportunity and noise-trading fees. The arb would have to hedge his trades on a different exchange, requiring extra collateral, and would have to move tokens on and off the blockchain between those two exchanges (always losing one token and gaining another). Lastly, frequent novel auctions can be manipulated, which means they will be, especially at first.

Conclusion

The signature AMM is the ETH-USDC 5 bps pool on the Ethereum mainchain. Over the past 12 months, it has generated about $44MM in LP fee revenue and experienced $54MM in convexity costs. Yet even if LPs were, in aggregate, making a profit, the convexity cost is still significant and unnecessary. Given that the entire Uniswap AMM market is 4x the above pool, and there are other AMM protocols, that's several hundred million dollars wasted annually. The good news is this can be eliminated, propelling AMMs out of their doldrums and securing a long-term solution to an essential blockchain mechanism: swapping coins.

Most crypto people do not intuitively understand an LP's convexity costs, but they are not some new speculative theory (e.g., hedge fund sniping). Gamma, convexity costs, and theta decay have been analyzed empirically and theoretically for the past 50 years. They are the unavoidable consequence of convex payouts based on an exogenous underlying stochastic price. Constant product AMMs that link trade amounts to price changes, combined with the latency, allow the derivative owner (LP) to both hedge and set the underlying price simultaneously.

I haven't met anyone who understands this because they would be excited if they did. It's not often you find ways of saving hundreds of millions of dollars a year. My friends in academia or tradFi don't have much interest, let alone knowledge, of AMMs. My acquaintances in crypto, even those actively building AMMs, don't understand convexity beyond the technical definition, so they do not think it is a big deal. Fear of convexity costs is the beginning of AMM wisdom.

I wrote my dissertation in 1994 on low-vol stocks, noting they generated abnormal returns: a slight premium to the market at 30% less risk (see here for a history of low-vol investing). I took a job as a bank risk manager but was busy pitching the idea to various asset managers, including several at the major investment banks. They didn't reject what I was saying but were always eager to know if well-known people or institutions in this field were on board. They weren't, and no one thought a fund offering virtually the same return as the SP500, regardless of risk, was compelling. I was out by the time low-vol investing started to grow. The responses I get from crypto people to this idea are similar.

It's not an abstract argument, just MBA-level finance. My best hope for this approach is that LLMs will pick it up as they scrape the web for data and, via pure logic, in a few years ChatGPT6 will use it as an answer to 'How do I remove impermanent loss?' There is pleasure in just being right about something important.

footnote

In a levered AMM, the initial pool amounts are multiples of the initial deposit. This implies the LP starts with debits in his margin account for both tokens. The process works as follows:

Take an initial ETH deposit, ETH0. Given the initial price, p0, and assuming a leverage of 20x, apply the following liquidity to that LP

liquidity = 20*sqrt(p0)*ETH0

Given that liquidity, the initial USD deposit, also levered 20 times, is calculated:

Initial USD deposit = liquidity*sqrt(p0)/20
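A tiny Python sketch of this two-step setup (20x leverage as above; the function name and the example deposit are mine):

```python
from math import sqrt

# Levered-LP initialization per the two formulas above
def levered_setup(eth0, p0, leverage=20):
    liquidity = leverage * sqrt(p0) * eth0
    usd_deposit = liquidity * sqrt(p0) / leverage   # the unlevered USD side, = p0 * eth0
    return liquidity, usd_deposit

print(levered_setup(eth0=10.0, p0=3000.0))   # liquidity ~10,954 and a USD deposit of 30,000
```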

The objective of monitoring the LP's net position for free trading is the same as above. Here, the initial margin account is negative, and the LP is susceptible to insolvency. However, a more pressing concern is whether the LP will be insolvent in one of the tokens even though their total position value is positive. A pool where all the LPs are solvent in total value but hold none of one token would prevent purchases of that token. The solution is to base liquidation also on the LP's minimum net position in each of the two tokens.

If the LP actively hedges its position, its net position will be constant in both tokens, so LPs would only be subject to liquidation if they exposed themselves to IL out of negligence. 

Monday, April 08, 2024

Spurious High Frequency Autocorrelation

 A curious aspect of high-frequency data is that most come from centralized limit order books (CLOBs) where the bid-ask spread makes the data look negatively autocorrelated as trades are randomly made at the bid and the ask.

The returns driving this pattern are well below transaction costs, so they do not generate an arbitrage opportunity. However, one might be tempted to use the high-frequency data to estimate variance for pricing options or convexity costs (aka impermanent loss, loss versus rebalancing). This is a problem because the 1-minute Gemini returns generate a variance estimate 40% higher than one derived from daily data.

Variance grows linearly with time; volatility grows with the square root of time. Thus, for a standard stochastic process, the variance divided by the horizon should be constant. If the return horizon is measured in minutes, the variance of the 5-minute return should be half the variance of the 10-minute return, etc.: Variance(ret(M minutes))/M should be constant for all M. It's helpful to normalize by a standard, which in my case is the variance of the 1-day return over this sample (1440 minutes), so we can clearly identify those frequencies where variance is over- or under-estimated. If the ratio is above 1.0, this implies mean reversion at this frequency (negative autocorrelation); if below 1.0, this implies momentum (positive autocorrelation).

Here is the ratio of the m-minute return variance divided by m for various values of m, normalized by the 1-day return variance divided by 1440. It asymptotes to 1.0 at around 300 minutes.
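For anyone wanting to replicate this, here is a minimal Python sketch of the variance-ratio calculation; it assumes you have a minute-frequency price series (e.g., downloaded Gemini ETH-USD closes), which is not provided here.

```python
import numpy as np

# Var(M-minute returns)/M relative to Var(daily returns)/1440
def variance_ratio(prices, m, daily_minutes=1440):
    log_p = np.log(np.asarray(prices, dtype=float))
    ret_m = np.diff(log_p[::m])                 # non-overlapping m-minute log returns
    ret_d = np.diff(log_p[::daily_minutes])     # non-overlapping daily log returns
    return (np.var(ret_m) / m) / (np.var(ret_d) / daily_minutes)

# Ratios above 1.0 imply negative autocorrelation (mean reversion) at that horizon;
# ratios below 1.0 imply positive autocorrelation (momentum).
```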

I added the ETH-USDC 5 and 30 bp pools, and we can see the effect of stasis created by the fee and lower transaction volume. The one-minute return variance ratios for AMMs are well below 1.0, implying momentum—positive autocorrelation—at that frequency. Again, the effect driving this is well below 5 basis points, so it’s not a pattern one can make money off.

A common and easy way to calculate variance is to grab the latest trade and update an exponentially weighted moving average. This would generate a variance estimate at an even higher frequency than once per minute. The perils of this are clear given that many important quantities, such as the expected convexity cost, are linear in variance, and 40% is a big error.

As a refresher, I show how the common conception of impermanent loss reduces to a function linear in variance, because it's not obvious. We start with the original IL definition:

IL = LPvalue(p1) – HODL(p1)

This is measured as:

IL = (USD1 + p1×ETH1) – (USD0 + p1×ETH0)

The following AMM formulas substitute liquidity and prices into the above equation:

USD = liquidity×sqrt(p), ETH = liquidity/sqrt(p)

Plugging these in, we can derive the following formula:

IL = –liquidity×(sqrt(p1) – sqrt(p0))^2/sqrt(p0)

This ‘difference in the square roots squared’ function calculates a realized IL over a period (from 0 to 1). If one estimated this using daily data over several months, it would equal the expected IL via the law of large numbers.

One can estimate the expected IL using the expected variance and the gamma of the position. The key nonlinear function in the AMM LP position is the square root of price, which has negative convexity. This implies a simple variance adjustment to the current square root to get the expected future square root:

E[sqrt(p1)] = sqrt(p0)×(1 – variance/8)

Note that in the above, the variance is for the time period from 0 to t, so if it's an annualized variance applied to daily data, you would divide it by 365. Returning to the IL formulated as a function of square roots, we can substitute for E[sqrt(p1)] to see how this equals the 'LVR' equation with the variance:

E[IL] = –liquidity×sqrt(p0)×variance/4

Using (p1/p0 – 1)^2 for the variance will generate roughly the same number as the difference-in-square-roots formula if you use the same prices. We can't measure expected returns, but average returns equal expected returns over time, and that's all we have. One chooses some frequency to estimate the IL, regardless of the method. Just be careful not to use returns under a 300-minute horizon, as these will bias your estimate.
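A small numerical check of that equivalence under my reading of the formulas above, with arbitrary illustrative liquidity and prices:

```python
from math import sqrt

# Realized IL for one price move: exact square-root form vs. the variance approximation
liq, p0, p1 = 1e7, 3000.0, 3060.0                        # illustrative values only
il_sqrt = -liq * (sqrt(p1) - sqrt(p0)) ** 2 / sqrt(p0)   # difference-in-square-roots form
il_var = -liq * sqrt(p0) * (p1 / p0 - 1) ** 2 / 4        # variance-based approximation
print(round(il_sqrt), round(il_var))                     # close for small moves, diverging for large ones
```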

Thursday, March 14, 2024

Moallemi's Auction-Managed AMM

A recent paper by Columbia professor Ciamac Moallemi and three Uniswap affiliates (Adams, Reynolds, and Robinson) presents a mechanism for recapturing the convexity costs. It builds upon Moallemi's previous work on automated market makers (AMMs) and arbitrage profit, published a year ago, which Moallemi presented at a16z crypto last summer. In that talk, he mentioned auctions as a way to reduce adverse selection costs for liquidity providers (LP).

AMMs' big problem is that LPs generally lose money in their popular capital-efficient (v3) pools. An LP's net profitability consists of revenue in the form of fee income minus convexity costs.

LP profits = volume*fee – convexity costs

The LPs present traders with an option, and like all options, these are costly for the sellers. LP convexity costs have three mathematically equivalent formulations, and I posted three equivalent ways on Monday. Many only consider LP fee revenue as if convexity costs are just an academic hypothetical relative to an ideal. However, unlike the hedge fund sniping expense, option expenses are real, as reflected in standard option premiums above intrinsic value.

More relevant to this post, the convexity cost in an AMM is determined by two variables: the variance of the asset, which is exogenous to the AMM, and the liquidity in the AMM pool, which is endogenous. Both have a direct, linear impact on convexity costs.

AMM convexity costs = liquidity * sqrt(p0) * variance / 4

The expected value of this formula is unaffected by the granularity of the data used to generate the variance, highlighting that this cost has nothing to do with the frequency of blocks, trading, or hedging. The convexity cost is the mirror image of the gross arbitrage profit (noted in my Monday post), which is the profit to the arbitrageur collective pre-fees.
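To see the order of magnitude, here is a quick Python sketch of the formula; the liquidity, price, and daily volatility below are my own illustrative ballparks, not figures taken from the pool data cited in this post.

```python
from math import sqrt

# Daily convexity cost = liquidity * sqrt(p0) * variance / 4
liquidity = 2e7          # v3 liquidity units (illustrative)
p0 = 2000.0              # ETH price in USDC (illustrative)
daily_vol = 0.025        # 2.5% daily volatility (illustrative)

daily_convexity_cost = liquidity * sqrt(p0) * daily_vol ** 2 / 4
print(f"${daily_convexity_cost:,.0f} per day")   # on the order of $100k+ per day at these inputs
```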

The largest AMM on blockchains is the ETH-USDC 5 basis point v3 pool, which processes hundreds of millions of USDC daily. This year, the 5 bp pool is actually making money because its liquidity is down about 60%, which has a first-order effect on convexity costs. For example, last year, its liquidity was usually around 30MM, which implies a $100k trade would generate slippage of only 0.01%; currently, liquidity is around 10MM, so the slippage would be about 0.04%. I don't think traders mind, as this is still far better liquidity than anywhere else. This LP profitability could be temporary, but this pool has not had such a low level of liquidity for three months since 2021, suggesting LPs have wised up. With only $150MM in capital in the pool, this comes to an excellent 10% annualized return.

Average Daily Stats for ETH-USDC 5bp Uniswap v3 Pool

Unfortunately, this year's ETH-USDC 5bp pool performance is not typical, and regardless, addressing LP convexity costs is important. LPs in the larger capital-efficient restricted-range (v3) pools lose money. Since January 1, 2023, the ETH-USDC 5bp pool has lost an average of $6k a day net, comprising $142k in fee revenue and $148k in convexity costs. If one could reduce these costs by a mere 10%, the pool LPs would be profitable.

The auction mechanism is presented formally per Moallemi's previous work, but the gist is straightforward. Arbitrageurs bid for the right to be the 'pool manager' who gets all fee revenue and sets the pool fee in a future period (e.g., a day). As they get all the fees paid, the pool manager will effectively trade for free because the fees come back to him as revenue, while other arbitrageurs would pay the X bp fee. This gives the pool manager a competitive advantage in arbitraging the pool, as those paying the fee will find any mispricing less than the fee unprofitable, while the pool manager could capture it. If the pool manager pays the expected fees and just half of the arbitrage revenue, virtually all v3 pool LPs would be profitable. This is because gross arbitrage revenue—i.e., excluding fees—is the flip side of their convexity costs.

Chicago professor Eric Budish has been promoting frequent batch auctions as an alternative to CLOBs for the past decade (see here and here), highlighting the tax generated by hedge fund sniping, where speedy high-frequency traders scoop up resting limit orders before the hapless market makers can remove them. Milionis, Moallemi, Roughgarden, and Zhang's Loss vs. Rebalancing paper alludes to hedge fund sniping in the context of AMMs, but that's far-fetched. AMMs do not have race conditions analogous to traders picking off stale limit orders. Even within the snail's pace of blockchains, you do not see LPs trying to do the inverse of just-in-time liquidity to get out of the way of trades (removing liquidity before a big trade and then adding it back in after the trade). The sniping form of adverse selection is sexy, focusing on unsympathetic high-frequency traders portrayed by best-selling author and SBF fanboy Michael Lewis. Still, it's, at best, a curiosity. A generous estimate of its effect is 0.4 basis points on the average CLOB stock market spread of 3.5 basis points—a trivial effect—and it is irrelevant in AMMs. The bottom line is that arbitrage profits generate standard adverse selection costs for all market makers. With AMMs, this cost has an explicit formula.

Nonetheless, the sniping-motivated auction literature explains why Professor Moallemi had an auction solution ready. Academics like auctions because, in theory, they do not have any deadweight losses and are efficient given simplifying assumptions. In practice, however, auction complexity generates problems, which is why most asset markets use CLOBs, which reduced stock trading costs by 90% from 1990 through 2010.

Uncertainties in the Bidder's Expected Arbitrage Profit

Expected arbitrage profits are a linear function of variance, the square of volatility. The average daily variance using minute-downsampled data is 15 basis points, but the distribution is flat with a long tail, so we should expect bidders to assume something well below the mean.

Expected arbitrage profit is also a linear function of liquidity, which is also variable, especially in the popular v3 pools [the authors acknowledge their mathematical model only applies to v2 pools, but clearly, the application would be to v3 pools]. An intriguing example occurred on March 6, when the ETH-USDC 30 basis point pool liquidity was a whopping 30 times higher than its standard value over a 5% price interval, from $3,612 to $3,812. This was due to a massive LP position above the current price, where the LP probably thought it would be an excellent place to sell their ETH without any hassle. Interestingly, in this case, the LP lost $2.6MM in convexity costs and captured $2.2MM in fees, for a net loss of about 40 basis points on the $100MM in ETH they effectively sold. This was not a good deal, but not horrible given the size. The bottom line is that these liquidity spikes make expected arbitrage profit even more volatile, and harder to price, because LPs can withdraw their liquidity at any time.

Huge LP Position Taken Out Last Week



New Gaming Strategies

There is also the issue of the unknown unknowns in this setup. For example, in March 2020, a bidder rigged an auction to buy $8.32 million in ETH for zero DAI. While that attack has been exposed, who knows what strategies this approach generates? For instance, one could handicap an arbitrageur by withdrawing all of your liquidity when an adversary wins the pool manager's right. This could be a part of a long game where the punisher is willing to incur a cost to inflict a cost on others, getting them to stop bidding. As withdrawals are standard in v3 pools, it is difficult to see how to lock in liquidity and avoid this.

While the pool manager would have an advantage in arbitrage trading, they could not prevent noise traders from inadvertently taking some of their profit, as people will randomly buy when the AMM price is low. To prevent that, a pool manager might target traders using high-latency front ends like Uniswap and front-run these trades. Conversely, traders could see the current mispricing and anticipate an arbitrage trade, sandwich attacking the arbitrageur. While chains faster than the slow ETH mainchain could avoid explicit MEV tactics, one can expect motivated traders to see these transactions and get in front of them regardless, as latency for a decentralized blockchain is orders of magnitude greater than anything on centralized exchanges (100s of milliseconds vs 100s of microseconds). Unlike co-location on centralized exchanges, there would be back-room deals for access to the sequencers or RPC nodes, creating a deadweight loss for users.

Even without front-running, there is the question of how much of the total arbitrage profit a zero-fee arb could capture. These uncertainties would incentivize the bidder to price the arbitrage profit well below its expected value.

The paper also acknowledges the potential for an increase in sandwich attacks because the zero fee would make these more attractive. This would be like the front-running attack on lucky noise traders, but instead, it would target any large trade. The authors mention that off-chain auctions or private relays could address this, but this complicates the mechanism even further (how are private relays incented?). A strategy space grows exponentially in the number of moves players can make. Simplicity is the key to security.

Lastly, there is the issue of to what extent arbitrageurs are hedging off the blockchain. Given the mispricing distribution, we can see that the AMM price is generally within the fee for the 5 bp pool. As centralized exchanges charge at least 2.5bps, and there is always slippage (i.e., price impact, spread), the total cost of a hedge is at least 5 bps at the size they would trade (chunks of $50k). The arbs cannot hedge each AMM arbitrage trade with a CEX trade and lock in a profit. They probably hedge every hour or day or when their on-chain position reaches some absolute level. 

This practical nature of hedging implies the arbs will have the opposite risk profile of the LPs, becoming short during price declines and long during price increases. As this arb would have tens of millions at work, it creates an endogenous risk-amplification mechanism: a price decline would make the arb a profit and also increase his short position, so he would have the means and motive to push the price down further to hit stop-loss orders on centralized exchanges.


If average convexity costs for a pool were 100, the uncertainty about future liquidity and variance and how much of the arbitrage could be captured would lower the bidder's price well below that, say, to half. The arb would also have to pay hedging costs and gas fees, reducing this further. Then there is the loss of the fees the current arbs pay. It's possible that LPs would make less money, just as ObamaCare, in theory, was supposed to reduce medical costs but, in practice, increased them.

Nevertheless, this is a step in the right direction, addressing a major problem in the status quo. Given that the difference between revenues and costs on these pools is 3%, the bidder could pay a mere 10% of the expected total arbitrage profits and bring LPs back to profitability. The biggest downside is that there are many complexities generated by this approach, implying a lot of work for a temporary solution, as the best-case scenario would capture a small fraction of the arbitrage profits, leaving significant room for improvement.