Wednesday, April 24, 2024

Why Evolution is False

Recently deceased philosopher Daniel Dennett called Darwin's idea of natural selection the best idea anyone ever had, a universal solvent that melts away all the problems in biology. Dennett had contempt for Christians, promoted the term 'brights' for those who shared his worldview, and thought it wise not to respect religion because of its damage to the 'epistemological fabric of society.' Like fellow atheist Richard Dawkins, he never debated biologists, just theologians.

In a 2009 debate, Dennett mentioned he brought his friend, an evolutionary biologist, to a 1997 debate with Michael Behe about his book Darwin's Black Box (1996) because he felt unqualified to address the microbiology arguments. Dennett described Behe's book as 'hugely disingenuous propaganda, full of telling omissions and misrepresentations,' that was 'neither serious nor quantitative.' Dennett then added he would not waste time evaluating Behe's newest book, The Edge of Evolution (2007).1

Dennett emphasized that his approach to understanding the world was rational, reasoned, and evidence-based. Yet he never directly addressed Behe's arguments and instead stuck to the cowardly pleasure of debating non-scientist theologians who knew less biology than he did. By enlisting a biologist to help him debate Behe, he admitted he could not evaluate the arguments on his own. If he could not trust himself to evaluate Behe's argument, a rational approach would be to take a trusted source's opinion as a Bayesian prior, not as a fact so certain that its negation damages the epistemic fabric of society. Unfortunately, many, perhaps most, scientists agree with Dennett and think the only people who don't believe in evolution are ignorant (e.g., see Geoffrey Miller here, or Dawkins here).

If Dennett had read Behe's Edge of Evolution, he would have seen it as a logical extension of his earlier book, not moving the goalposts. Behe's argument isn't based on parochial microbiology knowledge; it's pretty simple once one gets the gist.

Behe highlighted the edge of evolutionary power using data on malarial resistance to two different antimalarial drugs. For atovaquone, resistance develops spontaneously in roughly every third patient; given the number of malaria parasites in a single infected person, the probability that the parasite successfully adapts to this threat can be estimated as one in 1e12, or 1e-12. For chloroquine, resistance arises only about once per billion patients, giving an estimated successful mutation probability of 1e-20. This roughly squares the original probability, which led Behe to suggest that at least two mutations were required to develop resistance, as two mutations represent a squaring of the initial probability. This prediction was confirmed a few years later.

Extending this logic, given a base mutation rate of p per base pair per generation (e.g., p ~ 1e-8 for humans), if n specific mutations are needed, the probability of that happening scales as p^n. Given that new proteins require at least 10 changes to some ancestor (out of an average of 300 amino acids), the probability of an existing genome evolving into a new specific protein would be 1e-80. Given that only about 1e40 organisms have ever lived on Earth, this implies that evolution is limited in what it can do (note most evolution we observe, as in dog breeds, just involves changes in allele frequencies).
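The arithmetic is simple enough to sketch in a few lines of Python; the inputs are the figures cited above, and the variable names are mine:

```python
from math import log10

p = 1e-8          # assumed per-base-pair mutation probability per generation (humans)
n = 10            # assumed number of specific changes needed for a new protein
organisms = 1e40  # rough estimate of all organisms that have ever lived

prob_specific_target = p ** n                      # scales as p^n -> 1e-80
expected_hits = prob_specific_target * organisms   # 1e-40: effectively never

print(f"P(target) ~ 1e{log10(prob_specific_target):.0f}")
print(f"Expected successes over all of life: 1e{log10(expected_hits):.0f}")
```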

A reasonable criticism is that this argument works for a specific target, such as Behe's example of malaria overcoming an antibiotic. However, the state space of possible unknown proteins is much larger. For example, the average protein has 350 amino acids; with 20 amino acids, that's 20^350 or 1e455 possible permutations. If there are islands of functional proteins within this vast state space, say at only a rate of 1e-55, that leaves 1e400 potential targets. The 'singular target' criticism does not invalidate Behe's assertion, but addressing it would require too much space for this post, so I will just mention it as a defensible criticism.  

However, most reviews of Behe's malaria-based argument centered on the simple assertion that generating n specific mutations scales as p^n.

Ken Miller (Nature, 2007):

Behe obtains his probabilities by considering each mutation as an independent event

Sean Carroll (Nature, 2007):

Behe's chief error is minimizing the power of natural selection to act cumulatively.

Jerry Coyne (The New Republic, 2007):

 If it looks impossible, this is only because of Behe's bizarre and unrealistic assumption that for a protein-protein interaction to evolve, all mutations must occur simultaneously, because the step-by-step path is not adaptive.

These criticisms rest on two possible assumptions. One is that when multiple mutations are needed, the single mutations encountered along the way may each confer a fitness advantage, allowing selection to fix the intermediate cases. While this can be true, it is not true for the mutations needed for malaria to develop chloroquine resistance, which requires at least two, and it is certainly not true in general. Indeed, a good fraction of the intermediate steps reduce fitness, some severely (see here or here), which is why a fitness advantage requiring two mutations does not arise with the same frequency as one requiring a single mutation: in the wild, the intermediate mutants are eliminated from the population.

The other assumption is that the number of indirect paths overwhelms the specific case where the needed mutations occur back-to-back. There are many indirect paths from one sequence of 300 amino acids to another. The question is their probability: the sum of the probabilities of these successful paths relative to all possible paths.

Intuitively, multiple specific simultaneous mutations are astronomically improbable and would constitute an impenetrable fitness valley, which is why evolution proponents are emphatic that it is an absurd approximation. Their intuition is that if one needs a handful of specific mutations when one already has 100 or 400 of the needed amino acids in the correct spot, a cumulative process of some sort should be likely, even if the final steps involve neutral mutations that neither help nor hurt fitness. However, none of Behe's critics generated a model to quantify their reasoning, even though they are all scientists in this field.

Model of the Behe Edge of Evolution Argument

Hypothesis: Prob(get n specific mutations sequentially) ~= Prob(get n specific mutations over 1e40 generations)

The model below uses a trinomial lattice to show that if the probability of getting one mutation correct is p, the probability of getting n mutations correct is on the order of p^n. Given the small probability of getting one mutation correct, this highlights what Michael Behe calls the edge of evolution: the fitness landscape generates an impenetrable barrier for naturalistic processes. Assuming simultaneous mutations towards the target captures most of the cumulative probability of reaching that target if mutations are neutral until the target is achieved. The other paths drift away at a rate that initially eliminates the benefit of being so close.

We can model this using a variant of Richard Dawkins's Weasel program, which he used to demonstrate the power of evolution. It starts with a string of letters and spaces, 27 potential characters in a string of length 28.

neuh utnaswqvzwzsththeouanbm

Dawkins randomly generated strings, fixing characters that matched the target phrase from a Shakespearean play. The application to genetics is straightforward if we think of the phrase as a sequence of nucleotides or amino acids creating a protein.

xethinks dt is like a weasek

xethinks dt is like a weasel

xethinks it is like a weasel

methinks it is like a weasel

This algorithm locks in characters that only make sense at completion. This is like assuming necessary mutations are selected with foresight, which would require some outside designer shepherding the process. Evolutionists countered that Dawkins' weasel program was merely meant to demonstrate the difference between cumulative selection and single-step selection, but this is disingenuous, as removing forward-looking selection increases the number of steps needed from about 100 to 1e39, which would not make for a convincing TV demonstration (see Dawkins promoting his weasel here).
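For concreteness, here is a minimal Python sketch of the locking behavior described above. Dawkins' actual demo bred populations of mutated copies and selected the closest match, but the effect, matched characters never regress, is the same; this implementation is mine, not his:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus space = 27 characters

def weasel(seed=None):
    """Cumulative selection with locking: once a character matches the target
    it never changes again, the 'foresight' step objected to above."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in TARGET]
    mutations = 0
    while True:
        mismatched = [i for i, ch in enumerate(current) if ch != TARGET[i]]
        if not mismatched:
            return mutations
        i = rng.choice(mismatched)           # only unmatched positions mutate
        current[i] = rng.choice(ALPHABET)    # random single-character mutation
        mutations += 1

# Converges after on the order of a thousand single-character mutations,
# versus roughly 27**28 (~1e40) attempts for undirected single-step search.
print(weasel(seed=1))
```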

Nonetheless, the Weasel program is familiar to many who study evolution and can be used to illustrate my point. Consider the case where we are two characters away from the target sequence.

Start: 2 wrong, 26 correct

XXTHINKS IT IS LIKE A WEASEL

Case 1, closer: 1 wrong, 27 correct.

MXTHINKS IT IS LIKE A WEASEL

XETHINKS IT IS LIKE A WEASEL

Case 2, stasis: 2 wrong, 26 correct. The two mismatched characters can each change into 25 potential targets (27 – 2) that are also wrong, for a total of 2*25. For example, here are three.

YXTHINKS IT IS LIKE A WEASEL

XYTHINKS IT IS LIKE A WEASEL

AXTHINKS IT IS LIKE A WEASEL

Case 3, further: 3 wrong, 25 correct. Each of the remaining matching characters can become unmatched by changing into any of the 27-1 potential characters. The total number of paths is 26 x 26. Here are two.

XXXHINKS IT IS LIKE A WEASEL

XXTHXNKS IT IS LIKE A WEASEL

To generalize the problem, let us define L as the string length, c the number of potential characters at each element in the string, and n the number of needed changes to the string (initial mismatches). The set of potential new strings can be split into three groupings with their distinctive number of paths:

1.      strings that have more characters that match the target: n possibilities

a.       e.g., n moving from 3 to 2

2.      strings that have the same number of characters matching the target: n*(c - 2) possibilities

a.       e.g., a mismatched character changing to a different incorrect character, so n stays at 3

3.      strings that have fewer characters that match the target: (L - n)*(c - 1) possibilities

a.       e.g., n moving from 3 to 4

The probabilities are the same regardless of which n characters are referenced. For example, the two sequences below are mathematically identical regarding how many paths are towards and away from the target, so they will have the same probabilities of moving towards or away from the target.

XXTHINKS IT IS LIKE A WEASEL = METHINKS IT IS XIKE A WEASEX

This makes the problem simple to model because, regardless of where the string starts, the number of cases that need probabilities is at most L+1, as n can range from 0 to L. All single-character mutations are assumed equally probable, so the number of paths up divided by the total number of possible paths {up, same, down} is the probability of moving up. We can, therefore, calculate the probability of moving closer to, staying the same distance from, or moving further from the target (i.e., nt+1 < nt, nt+1 = nt, nt+1 > nt) for each n <= L. This allows us to model the evolution of the string using a recombining lattice that evolves from an initial node, a standard tool for modeling option values.

At each of the L+1 node levels in the lattice, we calculate the probabilities of moving up, across, and down via the following formulas.

1.      Prob(closer): n/(L*(c-1))

2.      Prob(same): n*(c-2)/(L*(c-1))

3.      Prob(farther): (L-n)*(c-1)/(L*(c-1))
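These formulas translate directly into code. Here is a minimal Python version (my sketch, not the worksheet), which reproduces the branch probabilities used in Figure 2:

```python
def transition_probs(n, L=28, c=27):
    """Probability that a single random mutation moves a string with n
    mismatches closer to, the same distance from, or farther from the target."""
    total = L * (c - 1)                    # all possible single-character changes
    closer = n / total                     # a mismatched position hits its target
    same = n * (c - 2) / total             # a mismatch becomes a different mismatch
    farther = (L - n) * (c - 1) / total    # a matched position becomes mismatched
    return closer, same, farther

# Two mismatches in the 28-character weasel string (Figure 2's starting node):
print(transition_probs(2))   # ~(0.00275, 0.0687, 0.929), i.e., 2, 50, and 676 paths out of 728
```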

Figure 1 below shows a case with five rows representing a string of 4 characters. The columns represent generations defined by a change in one of the spaces in the sequence (a mutation). The bottom row of nodes reflects zero matches, and the top level is a complete match. In this case, the initial node on the left implies 2 mismatches out of 4. If the new mutation flips a mismatched position to the target, it is closer to the target and thus moves up one level, just below the top row; if it stays two away by having an incorrect letter changed to a different incorrect letter, it moves horizontally; if a correct letter changes to an incorrect letter, it moves down.

Figure 1

A complete match is a success, and we assume that once the sequence reaches the target, it does not regress (it is then subject to selection and fixates in the population). Thus, there is no path downward from the top row. While this is not realistic in practice, it inflates the total probability of reaching the target and does not affect the gist of my argument. The point is to highlight the relative importance of the direct path as opposed to the actual probability.

In Figure 2 below, we have a snip from a worksheet modeling the case where the starting string has two mismatched characters from the target sequence of 'methinks...' For the first mutation, there are 676 changes away from our target, 50 that would maintain the same distance, and 2 that move towards the target, giving probabilities of 92.8%, 6.9%, and 0.3% to the node branches. At the next upward node in step 1, the probability of going up again and reaching the target is 1/728 or 0.14%, so the probability of reaching the target in two mutations is 0.27% * 0.14%, or 3.77e-6.

Figure 2

The 'nodeProb' row represents the probability of reaching the target on that mutation, while the 'cum prob' row represents the cumulative sum of those nodes, given we assume reaching the target allows the organism to thrive with its new protein function.

The node probabilities asymptotically decline, so taking the hundredth 'nodeProb' as equal to all subsequent nodeProbs generates a conservative estimate of the number of generations (G) needed to double the direct path probability.

2*nodeProb(2) = cumProb(100) + G*nodeProb(100)

G = (2*nodeProb(2) - cumProb(100) )/nodeProb(100)

For this example

G = (2*3.77e-6 – 4.23e-6)/3.00e-37 = 1e31

This implies at least 1e31 generations are needed for the cumulative probability to be twice the direct probability. As estimates in this field have logarithmic standard errors (e.g., 1e-10 to 1e-11, as opposed to 1.0e-10 to 1.1e-10), a direct path probability within a factor of 2 of the cumulative case over 1e31 paths is about the same.
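Here is a minimal sketch of that calculation: iterate the mismatch distribution forward one mutation at a time using the branch probabilities above, treating the target as absorbing, then apply the formula for G. Conventions may differ slightly from the worksheet, but the two-step direct-path probability of ~3.77e-6 falls out immediately:

```python
def lattice(n0=2, L=28, c=27, steps=100):
    """Track the probability distribution over the number of mismatches n,
    absorbing at n = 0, and return the per-step and cumulative probabilities
    of first reaching the target (the worksheet's nodeProb and cum prob rows)."""
    total = L * (c - 1)
    dist = {n0: 1.0}                   # probability mass over n = 1..L
    node_prob, cum_prob = [], []
    cum = 0.0
    for _ in range(steps):
        new = {}
        absorbed = 0.0
        for n, mass in dist.items():
            closer = n / total
            same = n * (c - 2) / total
            farther = (L - n) * (c - 1) / total
            if n == 1:
                absorbed += mass * closer              # reached the target
            else:
                new[n - 1] = new.get(n - 1, 0.0) + mass * closer
            new[n] = new.get(n, 0.0) + mass * same
            if n < L:
                new[n + 1] = new.get(n + 1, 0.0) + mass * farther
        dist = new
        cum += absorbed
        node_prob.append(absorbed)
        cum_prob.append(cum)
    return node_prob, cum_prob

node_prob, cum_prob = lattice()
direct = node_prob[1]                              # nodeProb(2), the two-step direct path (~3.77e-6)
G = (2 * direct - cum_prob[-1]) / node_prob[-1]    # generations to double the direct-path probability
print(direct, cum_prob[-1], G)
```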

Modeling this with a lattice allows us to see the relative importance of the direct path because everything reaches the target over an infinite amount of time, so the probability of reaching the target is the same regardless of the starting point. With a lattice approach, we can model the finite case that is still large enough (e.g., 1e40) and see the relative probabilities.

The probability of a direct path is approximately equal to a simultaneous path because, if we assume mutations obey a Poisson process, the probability of a simultaneous mutation is the same as sequential mutations, just one probability times the other. For example, the malarial parasite cycles through several generations during an infection, and the resistant parasite could acquire the necessary mutations in generations 3 and 5 or simultaneously in generation 3 (both would have the same probability).

Thus, Behe's hypothesis that the probability of reaching a target n mutations away scales with p^n is a reasonable estimate when intermediate steps are neutral.

The worksheet contains the Weasel program applied to 2 and 3 characters away from the target. It can be generalized straightforwardly for an arbitrary n of needed mutations, 20 amino acids, and protein of length L, as shown in the worksheet ‘12awayAminos.’ Looking at the worksheet '2away' in the Excel workbook, the probability of reaching the target in a direct sequence is the probability of hitting the first top node. All the probabilities here are relative because this model assumes a mutation along an existing string of amino acids. This occurs only at a rate of 1e-8 per nucleotide per generation in humans. So, the extension to amino acid changes needs an extra adjustment (nucleotide changes do not necessarily change amino acids). The purpose of this post is just to show the relative probability of the direct and indirect paths, which is independent of the absolute probability.

There are an estimated 35 million single nucleotide differences and 90 Mb of insertions and deletions, creating 700 new genes in chimps and humans that were not present in our last common ancestor. That's at least 100 functional mutations that must be fixed every generation over the past 6 million years. Genetic drift can generate the 100 newly fixed mutations, but this drift is random. The probability that a random amino acid sequence would do something helpful is astronomically small (estimates range from 1e-11 to 1e-77, a range that highlights the bizarre nature of evolution debates), which creates a rift within the evolution community between those who emphasize drift vs. those who emphasize natural selection.2 This debate highlights that there are rational reasons to reject both; each side thinks the other side's mechanism cannot work, and I agree.

In Dawkins's Climbing Mount Improbable, he admits the de novo case is effectively impossible. He suggests indirect paths involving gene duplication, as the base structure provided by an existing gene would give the scaffolding needed for protein stability. Then, one must merely tweak a few segments toward some new function. The above shows that if one needs a modest number of specific amino acids (e.g., a dozen), the odds of reaching such a target are worse than 1 in 1e100. For this copy-and-refine process to work, it requires a protein space with a dense set of targets—islands of functionality in the protein state space—which is possible but eminently debatable.

1

See the 2009 debate between Dennett and Christian philosopher Alvin Plantinga, especially around 64 minutes in.

2

The 1e-11 estimate applies to an 80-amino-acid string converting from weakly to strongly binding ATP, which is understandable given that ATP binding is the most common protein binding function in a cell, as it powers everything. Further, this was studied in vitro; such results are roughly 1e10 times less likely to work in vivo. The 1e-77 estimate is for beta-lactamase, an enzyme produced in bacteria with a specific function, unlike ATP binding, which is common. Other estimates include 1e-65 for cytochrome c, 1e-63 for the bacteriophage lambda repressor, and 1e-70 for a certain phage protein.

Tuesday, April 16, 2024

How to Eliminate Impermanent Loss

Generally, markets are efficient in that it isn't easy to make above-average returns day-trading, and most mutual funds underperform market-weighted ETFs. Yet historically, in various applications, options have been underpriced for decades. For example, asset return distributions were known to have fatter tails than the lognormal distribution back in the 1960s (see Benoit Mandelbrot ('62) or Eugene Fama ('65)). Most option market makers, however, applied the basic Black-Scholes model with a single volatility parameter, which underpriced out-of-the-money options. On a single day, October 19, 1987, the stock market fell 17%, which, using a Garch volatility estimate, was a 7.9 stdev event. From a Bayesian perspective, that did not imply a miracle but, rather, a misspecified model. Option market makers soon adjusted their models for out-of-the-money puts and calls, ending decades of underpricing (20 years before NN Taleb's Black Swan introduced the concept of fat tails to the masses as if it were 1986).

Another example is convertible bonds, which are regular bonds that are convertible into stock at a fixed price: a bond plus an equity call option. The optionality was underpriced for decades. In the 1990s, a hedge fund that merely bought these bonds and hedged the option's equity delta and bond duration generated a Sharpe ratio of 2.0, which is exceptional for a zero-beta strategy. Eventually, in the early 2000s, investment bankers broke these up into bonds and options, allowing them to be priced separately. This isolation highlighted the option underpricing, and this market inefficiency went away. One could add Korean FX options that were underpriced in the 90s, and I'm sure there were many other cases.

So, there is precedent for a lot of money being locked up in money-losing short convexity positions, as with liquidity providers (LPs) on automated market makers (AMMs). Indeed, LP losses explain why AMM volume is still below its 2021 peak; transformative technologies do not stagnate for three years early in their existence unless something is really wrong. It's like how multi-level marketing scams like Amway have a viable business model relying on a steady stream of new dupes; they can persist, but it's not a growth industry.

The Target

The way to eliminate AMM LP convexity costs centers on this formula:

Impermanent Loss (IL) = LPvalue(pt) – HODL(pt)

LPvalue(pt) is the value of the LP position at the new price, pt, and the 'hold on for dear life' (HODL) portfolio is the initial deposit of LP tokens valued at the new price. It turns out that this difference is not an arbitrary comparison but instead captures a fundamental aspect of the LP position, the cost of negative convexity in the LP position. The expected value of the IL is the LP's convexity cost, which is called theta decay in option markets.

The LP convexity cost also equals the expected value decay of a perfectly hedged LP position (minus fees). Hedging is often thought of as reducing IL, and it does reduce the variance of the IL, eliminating extreme LP losses. However, this does not affect the mean IL, which, over time, equals the expected value. Lowering a cost's variance reduces required capital, which is why hedging is a good thing, but it will not solve the IL’s main problem, its mean, which is generally larger than fees for significant capital-efficient pools.

If we expand the above formula, we can rearrange variables to get the following identical equation as a function of pool token quantity changes and the current price (see here for a derivation).

IL = USDchange + pt×ETHchange

I am using a stablecoin USD and the crypto ETH as my two tokens because it makes it easier to intuit, though this generalizes to any pair of tokens (I like to think of prices in terms of dollars, not token A, but that's just me). The duration used to calculate the token changes implies the same LP hedging frequency. 24 hours is a pretty good frequency as hedge errors cancel out over a year via the law of large numbers, but anything between 6 hours and 1 week will generate similar numbers. If we can set ETHchange=0, USDchange will be near zero, and we will effectively eliminate IL.
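To make the two equivalent formulations concrete, here is a small Python sketch using full-range constant-product (Uniswap v2 style) math, which is my simplifying assumption for illustration; the liquidity and prices are made up:

```python
from math import sqrt

def pool_balances(liquidity, price):
    """Constant-product pool: ETH = liquidity/sqrt(p), USD = liquidity*sqrt(p)."""
    return liquidity / sqrt(price), liquidity * sqrt(price)

L = 1000.0
p0, p1 = 3000.0, 3300.0              # price moves from $3,000 to $3,300

eth0, usd0 = pool_balances(L, p0)
eth1, usd1 = pool_balances(L, p1)

# IL as LP value minus HODL, both marked at the new price p1
il_value = (usd1 + p1 * eth1) - (usd0 + p1 * eth0)

# IL as pool token-quantity changes valued at the new price
il_changes = (usd1 - usd0) + p1 * (eth1 - eth0)

print(il_value, il_changes)          # identical, and always <= 0
```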

An Extreme Base Case

One way to eliminate IL is to have a single LP who can trade for free while everyone else trades for a fee. Whenever the AMM price differs from the true price by less than the fee, only the LP can profit from arbitrage trading. A simple way to intuit this is to imagine if an AMM was created and not publicized, so it has no traders outside of the creator who assumes the role of LP and arbitrageur. With no other traders, he is trading with himself in a closed system. If his accounts are netted, he could both set the price on his contract efficiently and hedge his IL; nothing changes except the price. His net position in the tokens is static: ETHchange=0, IL=0.

It's helpful to think of traders as being two types: arbitrage and noise. Arbitrage traders are trying to make instant money off minor price discrepancies between exchanges; noise traders are like white noise, mean zero net demand that flows in like Brownian motion. A market price equilibrates supply and demand, so the volume that nets out is, by definition, noise. Noise traders are motivated by individual liquidity shocks, such as when a trader needs money to pay taxes, or the various random buy and sell signals generated by zero-alpha trading strategies.

If the AMM's base trading fee were 15 bps, and the LP could trade for free, the LP could turn loose an automated trading bot based on the Binance/CME/Coinbase price, trading whenever the mispricing exceeded 10 bps. Over time, the LP/arbitrageur will be responsible for all the AMM's net price changes. With the AMM at the equilibrium price, immune to arbitrage by non-LP traders, the other trades will be white noise, net zero demand, by definition.
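A sketch of such a bot's trigger, again assuming full-range constant-product math; the 10 bps threshold comes from the example above, and the function is illustrative rather than a production bot:

```python
from math import sqrt

THRESHOLD = 0.0010    # LP's zero-fee bot trades beyond a 10 bps gap (others pay 15 bps)

def lp_arb_trade(amm_price, cex_price, liquidity):
    """ETH the zero-fee LP buys (+) or sells (-) to move the AMM price to the
    CEX price, if the gap exceeds the trigger; constant-product math."""
    gap = cex_price / amm_price - 1
    if abs(gap) < THRESHOLD:
        return 0.0
    # The pool's ETH reserve is liquidity/sqrt(price), so the trade that moves
    # the price from amm_price to cex_price is the change in that reserve
    # (positive = the LP buys ETH from the pool as the price rises).
    return liquidity / sqrt(amm_price) - liquidity / sqrt(cex_price)

print(lp_arb_trade(3000.0, 3004.0, 1000.0))   # small buy; the gap is ~13 bps
```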

The figure below shows how the LP's ETH position changes for an AMM with a restricted range from $1,300 to $1,700. The Pool ETH change implies traders are accumulating an offsetting amount; if the pool lost 2.5 ETH, traders gained 2.5 ETH. This net trader accumulation is arbitrage trading because it is not random noise, which is mean-zero over time. Gross trading will be greater due to noise trading, and in a healthy exchange, the noise traders will compensate the LP for his predictable IL over time.

If the monopolist zero-fee LP were the arbitrageur, at any price his net ETH position would be the same; the sum of those lines, his net position on the contract, would be horizontal. With a constant net ETH position, his impermanent loss would be zero. This works because, unlike traditional markets, the option seller is not reacting to prices but setting them. The latency inherent in any decentralized AMM implies the price setter does not need any alpha, just an API to the lower latency centralized exchanges.

The LP would still have exposure to this net ETH position, but this is simple to hedge off the AMM with futures, as it would not need frequent adjusting.

If we add a separate margin account on the contract that held the LP's trades (with himself), they will net to a constant, his initial token position. The monopolist LP's net position on the contract would be represented as in the table below (on average).

Single LP with Exclusive Arbitrage Trading Access

Thus, if the LP could trade for free, allowing him to dominate arbitrage, and he had a separate trading account on the contract, he could eliminate his IL without moving tokens on and off the blockchain or making other trades on different exchanges.

Multiple LPs

An AMM with only one LP would not work. An extensive set of LPs is needed to avoid centralization, which presents an attack surface for regulators and hackers; it is also needed to give the AMM unlimited scale. However, now we must consider LP competition. If we gave all the LPs free trading, a power law distribution of trading efficiency would invariably reveal a few dominant LPs monopolizing the arbitrage trading, shutting the slower LPs out and leaving them all exposed to IL as before.

Fortunately, the AMM provides a simple mechanism to allow all the LPs to arbitrage the price and hedge their positions simultaneously. This is because, on an AMM, the price is only changed by trades, so given a specific liquidity amount, a price change implies a specific change in ETH and vice versa. This makes it feasible to apply a rule using the LP's net ETH position on the contract to see if they qualify for the free discount.

Consider the following framework. Each LP gets a trading account in addition to their LP position. The key point is these accounts are netted, so like above, the LPs can retain a constant net position as the price moves significantly.

This can be adjusted to give them capital efficiency without changing the implications.[see footnote below] For my purpose in this post, this is irrelevant.

In the earlier example with one LP, his margin account's net change is calculated using the same liquidity as the pool because, with one LP, his liquidity is the pool's liquidity. This implied the LP's trades would always exactly offset his pool token changes. With many LPs, this is not so. A trade is against the entire pool, which is necessarily greater than any one LP. Thus, if LP(i) trades in a pool with many LPs, it will change LP(i)’s individual net position.

netETHchg(i) = (totLiq - liq(i))/totLiq *ETHtrade

As totLiq > liq(i), each trade changes the LP's net position, unlike when the monopolist LP trades with himself. For example, if an LP owns 10% of the liquidity, buying 1.0 token increases his margin position by 1.0 but decreases his pool position by only 0.1.

Assume LPs can only trade for free in the following conditions:

Buy for free only if

PoolEth(i) + MarginEth(i) - initEthDeposit(i) < + 0.01

Sell for free only if

PoolEth(i) + MarginEth(i) - initEthDeposit(i) > -0.01

If we assume the LP hedged his initial ETH deposit off-chain, the rules basically say that if an LP is net net long, he cannot buy for free, and if he is net net short, he cannot sell for free [his 'net net' position is his net position minus his initial deposit, which we assume is hedged elsewhere].
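A sketch of the bookkeeping behind these rules, with hypothetical field names (pool_eth, margin_eth, and init_eth_deposit map to the terms above); it reproduces the Alice and Bob example that follows, with price impact omitted for brevity:

```python
from dataclasses import dataclass

TOLERANCE = 0.01   # the +/- 0.01 ETH band in the rules above

@dataclass
class LP:
    liquidity: float
    pool_eth: float          # current share of the pool's ETH
    margin_eth: float        # ETH bought/sold in the LP's margin account
    init_eth_deposit: float

    def net_net(self):
        # net position minus the (assumed hedged) initial deposit
        return self.pool_eth + self.margin_eth - self.init_eth_deposit

    def can_buy_free(self):
        return self.net_net() < +TOLERANCE

    def can_sell_free(self):
        return self.net_net() > -TOLERANCE

def apply_trade(lps, trader, eth_bought):
    """trader (an LP) buys eth_bought from the pool at zero fee; every LP's
    pool share falls pro rata, and only the trader's margin account moves."""
    total_liq = sum(lp.liquidity for lp in lps)
    for lp in lps:
        lp.pool_eth -= eth_bought * lp.liquidity / total_liq
    trader.margin_eth += eth_bought

# Alice and Bob with equal liquidity, as in the example below
alice = LP(liquidity=100.0, pool_eth=990.0, margin_eth=0.0, init_eth_deposit=990.0)
bob = LP(liquidity=100.0, pool_eth=990.0, margin_eth=0.0, init_eth_deposit=990.0)

apply_trade([alice, bob], alice, 4.0)              # Alice arbs a price rise, buying 4 ETH
print(alice.net_net(), bob.net_net())              # +2.0 and -2.0
print(alice.can_buy_free(), bob.can_buy_free())    # False, True
```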

To see how this works, assume we have two LPs, Alice and Bob, with equal amounts of liquidity. Both Alice and Bob start with pool ETH positions of 990, and they have zero positions in their margin accounts. The trading fee is 0.5%, so the price changes in this example are profitable opportunities only for Alice and Bob, who trade for free.

LP Pool, Margin, and Net ETH Balances

Assume the price initially rises by 0.41%, generating an arbitrage opportunity for the LPs. Alice wins the race and arbitrages this price discrepancy, setting the AMM price at its new equilibrium level of $3,313. She bought 4 ETH, and her pool position declined by 2, for a new 'net net' ETH position of +2. LP Bob just sees his pool position decline and has a new net net ETH position of -2; Bob is short, and Alice is long.

Alice cannot arb the AMM if the price rises again due to the rule preventing long LPs from buying at zero fee. If the price rises, Bob will win the arb race (by default), and LPs Bob and Alice are each flat again. In the table above, Alice buys in the rows where she has a light blue box, and Bob buys in the rows where he has a light orange box. They can only buy when their net net position is flat or negative when prices are rising. This prevents Alice or Bob from dominating arbitrage.

If the price immediately went back down after Alice initially arbitraged the pool, only Alice could arbitrage the pool price. This is because in period 1, Alice is long, and Bob is short, so Bob cannot sell for free, while Alice can. When she does sell, she restores the initial net position for both herself and Bob. Alice’s extra trading is benign.

The LPs will experience net token position changes for single trades or small price movements. Over longer price movements, however, each individual LP's net token change would be insignificant, and assuming the LP hedged, their net net position would be insignificant.

There is no reason to pay arbs to set AMM prices, as the latency in decentralized blockchains implies arbitrage requires no special expertise, unlike the price discovery on the lowest latency exchanges. External arbitrageurs generate a deadweight loss for high-latency decentralized AMMs, so there is no trade-off with price efficiency.

A specialist with fast APIs and redundant arbitrage bots distributed in the cloud could manage a bot for a set of LPs if the contract allowed LPs to whitelist another address that could do one thing on their account: trade for zero fee. The rule would prevent this arb vault manager from overtrading some accounts at the expense of others: either it would single out one account for trades that cancel out, as in the case of Alice buying and then selling above, or it would quickly find that the accounts that traded first in one direction can no longer trade in that direction. One could cap trade size so the arb vault manager does not max out a trade on one LP to generate a profit for the other LPs. Competition among such specialists would allow passive LPs to get most of the benefit from arbitrage/hedging without creating and running their own arb bots.

It wouldn't work on coins not traded on major centralized exchanges because it presumes the true price is common knowledge. The next Shiba Inu will have to first trade on a conventional AMM. Yet that's a minor amount of AMM volume.

Alternatives

As the equilibrium price is presumed to be on CEXes, and an equilibrium price implies net zero demand trading, one could use an oracle to update the AMM's last price to this price just as an LP arbitrage trade would. Over time, the LP net token changes would not be explicitly tied to price changes such that LP position values had negative convexity (i.e., linear in the square root of price). The main problem for oracle-based AMMs is centralization. This creates an attack surface for hackers. It also creates an attack surface for regulators, as the SEC knows where American Chainlink CEO Sergey Nazarov, for example, lives. Once they figure out how to do it—it took regulators several years to shut down InTrade—they will prevent his company from supporting anything that competes with regulated industries like finance and sports betting because regulation’s quid pro quo is protection from competition. Another problem is incentives, in that with more players with different actions, the state space complexity increases exponentially, so any solution will be more costly and less efficient. Arbitrageurs with more skin in the game would invest more time exploiting the oracle than the oracle would defending itself, especially as it requires a collective decision-making process and will be much slower to respond.

A decentralized limit order book (LOB) could avoid IL like an oracle-based AMM by allowing the LPs to move their resting limit orders costlessly to the latest CEX price, but there would be several problems. First, on-chain LOBs aspire to look and feel like the trading interfaces we are used to on centralized exchanges, so they emphasize low latency. Low latency leads to quasi, if not complete, centralization, an attack surface, and, more importantly, insiders who will get the equivalent of co-located server access. As third parties do not audit these ‘decentralized’ exchanges, the LOB leaders or a lower-level dev could sell undisclosed privileged access, and the costs would be virtually zero. It would take a lot of faith to presume they are immune to this subterfuge. Secondly, it would require many messages to cancel and replace resting limit orders. Unlike my mechanism, where one account rectifies the mispricing, on an LOB each LP account must be adjusted separately. On a CEX, these cancel-and-replace orders are costless, but if the blockchain is marginally decentralized, it will cost a few cents, and zero is a significant price point. Lastly, no one has presented a mechanism to incent a manager to oversee a set of LPs who wish to outsource their active management coherently. The LPs would still compete for favorable queue positioning, making it needlessly complicated and shutting out potential passive LPs.

The auction approach proposed by Moallemi et al. lets people bid to be the zero-fee trader. In that approach, the LPs would sell their right to fees to an arbitrageur who would get privileged zero-fee trading rights over some future period. The bidder pays a lump sum to the LPs, and the trading fees for that period go to the bid winner (meaning his trading fees return to him, so he trades for free). Assuming risk neutrality and zero capital costs or other expenses, the arb would pay the expected arbitrage profit, which equals the expected LP convexity cost. They would also pay for the expected noise trader volume. Considering the arb bidder would be exposed to risk in that actual volatility and noise trader volume will sometimes be much lower than expected, he would pay considerably less than the expected value of the arbitrage opportunity and noise trading fees. The arb would have to hedge his trades on a different exchange, requiring extra collateral, and move tokens on and off the blockchain between those two exchanges (always losing one and gaining another). Lastly, frequent novel auctions can be manipulated, which means they will be, especially at first.

Conclusion

The signature AMM is the ETH-USDC 5 bps pool on the Ethereum mainchain. Over the past 12 months, it has generated about $44MM in LP fee revenue and experienced $54MM in convexity costs. Yet even if LPs were, in aggregate, making a profit, the convexity cost is still significant and unnecessary. Given that the entire Uniswap AMM market is 4x the above pool, and there are other AMM protocols, that's several hundred million dollars wasted annually. The good news is this can be eliminated, propelling AMMs out of their doldrums and securing a long-term solution to an essential blockchain mechanism: swapping coins.

Most crypto people do not intuitively understand an LP's convexity costs, but they are not some new speculative theory (e.g., hedge fund sniping). Gamma, convexity costs, and theta decay have been analyzed empirically and theoretically for the past 50 years. They are the unavoidable consequence of convex payouts based on an exogenous underlying stochastic price. Constant product AMMs that link trade amounts to price changes, combined with the latency, allow the derivative owner (LP) to both hedge and set the underlying price simultaneously.

I haven't met anyone who understands this because they would be excited if they did. It's not often you find ways of saving hundreds of millions of dollars a year. My friends in academia or tradFi don't have much interest, let alone knowledge, of AMMs. My acquaintances in crypto, even those actively building AMMs, don't understand convexity beyond the technical definition, so they do not think it is a big deal. Fear of convexity costs is the beginning of AMM wisdom.

I wrote my dissertation in 1994 on low-vol stocks, noting they generated abnormal returns: a slight premium to the market with 30% less risk (see here for a history of low-vol investing). I took a job as a bank risk manager but was busy pitching the idea to various asset managers, including several at the major investment banks. They didn't reject what I was saying but were always eager to know if well-known people or institutions in this field were on board. They weren't, and no one thought a fund offering virtually the same return as the SP500, regardless of risk, was compelling. I was out by the time low-vol investing started to grow. The responses I get from crypto people to this idea are similar.

It's not an abstract argument, just MBA-level finance. My best hope for this approach is that in a few years, LLMs will pick it up as they scrape the web for data, and via pure logic, over a couple of years, ChatGPT6 will use it as an answer for 'How do I remove impermanent loss?' There is pleasure in just being right about something important. 

footnote

In a levered AMM, the initial pool amounts are multiples of the initial deposit. This implies the LP starts with debits in his margin account for both tokens. The process works as follows:

Take an initial ETH deposit, ETH0. Given the initial price, p0, and assuming a leverage of 20x, apply the following liquidity to that LP

liquidity = 20*sqrt(p0)*ETH0

Given that liquidity, the initial USD deposit, also levered 20 times, is calculated:

Initial USD deposit = liquidity*sqrt(p0)/20
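In code, the levered setup just described is two lines; the function name is illustrative, and the 20x default matches the example above:

```python
from math import sqrt

def levered_lp(eth_deposit, p0, leverage=20):
    """Return (liquidity, usd_deposit) for a levered LP: pool amounts are
    `leverage` times the actual deposits, per the formulas above."""
    liquidity = leverage * sqrt(p0) * eth_deposit
    usd_deposit = liquidity * sqrt(p0) / leverage   # equals p0 * eth_deposit
    return liquidity, usd_deposit

print(levered_lp(1.0, 3000.0))   # liquidity ~1095.4, USD deposit $3,000
```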

The objective of monitoring the LP’s net position for free trading is the same as above. Here, the initial margin account is negative, and the LP is susceptible to insolvency. However, a more pressing concern is whether the LP will be insolvent in one of the tokens, even though their total position value is positive. A pool where all the LPs are solvent, but the LPs have zero of one of the tokens would prevent purchases of that token. The solution is to allow liquidation based also on the minimum net position for the LP in the two tokens.

If the LP actively hedges its position, its net position will be constant in both tokens, so LPs would only be subject to liquidation if they exposed themselves to IL out of negligence. 

Monday, April 08, 2024

Spurious High Frequency Autocorrelation

 A curious aspect of high-frequency data is that most come from centralized limit order books (CLOBs) where the bid-ask spread makes the data look negatively autocorrelated as trades are randomly made at the bid and the ask.

The returns driving this pattern are well below transaction costs, so they do not generate an arbitrage opportunity. However, one might be tempted to use the high-frequency data to estimate variance for pricing options or convexity costs (aka impermanent loss, loss versus rebalancing). This is a problem because the 1-minute Gemini returns generate a variance estimate 40% higher than one derived from daily data.

Variance grows linearly with time; volatility grows with the square root of time. Thus, for a standard stochastic process, the variance divided by the length of the return horizon should be constant. If the return horizon is measured in minutes, the variance of the 5-minute return should be half the variance of the 10-minute return, etc.: Variance(ret(M minutes))/M should be constant for all M. It's helpful to divide the data by a standard, which in my case is the variance of the 1-day return over this sample (1440 minutes), so we can clearly identify those frequencies where variance is over- and under-estimated. If the ratio is above 1.0, this implies mean reversion at this frequency (negative autocorrelation), and if below 1.0, this implies momentum (positive autocorrelation).

Here is the ratio of the m-minute return variance divided by m for various minutes, normalized by dividing by the 1-day return. It asymptotes to 1.0 at 300 minutes.
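The variance-ratio diagnostic is easy to compute from a one-minute price series; here is a sketch in Python (pandas), run on a synthetic random walk for illustration, where in practice `prices` would be one-minute closes from Gemini or an AMM pool:

```python
import numpy as np
import pandas as pd

def variance_ratio(prices: pd.Series, m: int, base: int = 1440) -> float:
    """Variance of m-minute log returns, per minute, normalized by the same
    statistic at the base horizon (1 day = 1,440 minutes). Above 1.0 implies
    mean reversion at that horizon; below 1.0 implies momentum."""
    def per_minute_var(h):
        rets = np.log(prices).diff(h).dropna()   # overlapping h-minute returns
        return rets.var() / h
    return per_minute_var(m) / per_minute_var(base)

# Synthetic driftless random walk: ratios should hover near 1.0 at all horizons.
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.0005, 100_000))))
for m in (1, 5, 30, 60, 300):
    print(m, round(variance_ratio(prices, m), 3))
```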

I added the ETH-USDC 5 and 30 bp pools, and we can see the effect of stasis created by the fee and lower transaction volume. The one-minute return variance ratios for AMMs are well below 1.0, implying momentum—positive autocorrelation—at that frequency. Again, the effect driving this is well below 5 basis points, so it’s not a pattern one can make money off.

A common and easy way to calculate variance is to grab the latest trade and update an exponentially weighted moving average. This generates a variance estimate at an even higher frequency than once per minute. The peril is clear when many important quantities, such as the expected convexity cost, are linear in variance, and 40% is big.

As a refresher, I show how impermanent loss equals a function that is linear in variance, because it's not obvious. We start with the original IL definition:

IL = LPvalue(p1) - HODL(p1)

This is measured as

IL = (USD1 + p1×ETH1) - (USD0 + p1×ETH0)

where USDt and ETHt are the pool token amounts at times 0 and 1.

The following AMM formulas substitute liquidity and prices into the above equation:

USD = liquidity×sqrt(p), ETH = liquidity/sqrt(p)

Plugging these in, we can derive the following formula:

IL = -liquidity×(sqrt(p1) - sqrt(p0))^2/sqrt(p0)

This ‘difference in the square roots squared’ function calculates a realized IL over a period (from 0 to 1). If one estimated this using daily data over several months, it would equal the expected IL via the law of large numbers.

One can estimate the expected IL using the expected variance and gamma of the position. The key nonlinear function in the AMM LP position is the square root of price, which has negative convexity. This implies a simple variance adjustment to the current square root to get the expected future square root:

E[sqrt(p1)] ≈ sqrt(p0)×(1 - variance/8)

Note that in the above, the variance is for the time period from 0 to t, so if it's an annualized variance applied to daily data, you would divide it by 365. Returning to the above IL formulated as a function of square roots, we can substitute for E[sqrt(p1)] to see how this equals the 'LVR' equation with the variance:

E[IL] ≈ -liquidity×sqrt(p0)×variance/4

Using (p1/p0 - 1)^2 for the variance will generate essentially the same formula as the difference in square roots squared if you use the same prices. We can't measure expected returns, but average returns equal expected returns over time, and that's all we have. One chooses some frequency to estimate the IL, regardless of the method. Just be careful not to use returns under a 300-minute horizon, as these will bias your estimate.
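Putting the pieces together, here is a short sketch comparing the realized 'difference in the square roots squared' IL with the variance-based approximation; the liquidity and prices are my own illustrative numbers:

```python
from math import sqrt

def realized_il(liquidity, p0, p1):
    """Realized IL over one period: -L * (sqrt(p1) - sqrt(p0))^2 / sqrt(p0)."""
    return -liquidity * (sqrt(p1) - sqrt(p0)) ** 2 / sqrt(p0)

def expected_il(liquidity, p0, variance):
    """Expected IL from the period's return variance: -L * sqrt(p0) * var / 4,
    equivalently variance/8 times the pool value 2 * L * sqrt(p0)."""
    return -liquidity * sqrt(p0) * variance / 4

L, p0, p1 = 1000.0, 3000.0, 3030.0             # a 1% move
print(realized_il(L, p0, p1))                  # ~ -1.36
print(expected_il(L, p0, (p1 / p0 - 1) ** 2))  # ~ -1.37, the same to first order
```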