Yann LeCun, one of the world’s most influential AI researchers, recently shared a 3-hour-long interview with Lex Fridman1 titled “Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI”. This interview is remarkable: with his clear and articulate thinking, Yann LeCun demonstrates what science in motion looks like at its best. I highly recommend this interview to anyone with an interest in Artificial Intelligence (AI). However, I do not fully support all the propositions put forward by Yann LeCun, and these divergences of opinion might turn out to be more consequential than a mere academic debate. In the following entry, I review this interview, starting with the propositions that are in the most dire need of broader support, and continuing with the ones that I challenge.

A scientist teaches an AI how to pour tea.

The current hype around AI derives from Large Language Models (LLMs) - what Yann LeCun correctly characterizes as being, in fact, autoregressive LLMs. Tokens (word fragments) are generated one at a time, with each newly generated token fed back as input to the model for the next one. LLMs are the secret sauce behind ChatGPT. Most people, including myself, who spend a few hours with GPT4 cannot help but feel a sense of vertigo. This might not be Artificial General Intelligence (AGI), but it is still mightily impressive. The shock is profound, and many have started to call for drastic emergency legal measures, as AGI is frequently presented as a genuine extinction-level threat to humanity - a real-world Terminator scenario.
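For readers less familiar with the mechanics, the autoregressive loop can be sketched in a few lines of Python. This is a minimal illustration only; `model.sample_next_token` is a hypothetical placeholder, not the API of any particular library.

```python
# Minimal sketch of autoregressive generation. Hypothetical model interface:
# `model.sample_next_token(tokens)` returns one token id given the context so far.

def generate(model, prompt_tokens, max_tokens=256, eos_token=0):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        # The model only ever predicts the *next* token, conditioned on
        # everything generated so far (prompt + its own previous outputs).
        next_token = model.sample_next_token(tokens)
        tokens.append(next_token)
        if next_token == eos_token:
            # The model emits its own termination token; generation stops here.
            break
    return tokens
```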

Doom and Gloom

Yann LeCun essentially argues that this ‘doom and gloom’ perspective is complete nonsense, and I fully support this position. The doomsday scenario requires AGI to be an event: there must be a point in time where one AGI is so far ahead of everything else that it can simply take over. However, Yann LeCun argues that AGI is not going to be an event, and I could not agree more.

While the general public may get the impression that the technology leapt forward (essentially overnight) back in 2023, this is absolutely not the case. On the contrary, progress has been incredibly, tediously, and maddeningly incremental over the last 5 decades. Furthermore, it is not “one” thing being improved, but dozens of largely unrelated concerns: better paradigms, better algorithms, better hardware, better datasets, better methodologies, better codebases, etc. All of these improvements have been generated (for want of a better phrase) by an incredibly diverse crowd of contributors. Thus, I estimate the probability of leapfrogging into AGI to be nil. Unfortunately, prophesying the end of times is as old as mankind itself, and quite a few people have hopped on the bandwagon for self-aggrandizement purposes2.

Once we remove the ‘event’ aspect of AGI, it is very hard to see why AGI would ever present as an extinction-level threat. Any malicious AGI, or any AGI operated by a malicious actor, will be countered by somewhat similarly capable AGIs that armies, intelligence agencies, companies or even universities or hobbyists would have developed. AGI will just be a late addition to the already very long list of technologies that can do immense damage when used for nefarious purposes.

In fact, I would even argue that even if humanity were to leap straight to AGI - for example, thanks to an alien civilization handing over ready-made blueprints - the doomsday scenario would still be very unlikely. As pointed out 80 years ago by Hayek3, there are two types of knowledge: the special (formulas, concepts, etc.), and the mundane (knowledge of time and place). Intellectuals almost invariably treat special knowledge as if it were the only sort of knowledge that matters. An AGI would be the pinnacle of special knowledge and thus, unsurprisingly, many intellectuals derive from this observation that, as the master of special knowledge, this AGI would also be the master of the world overall. Yet, for all its “special knowledge”, this AGI would still lack all the mundane knowledge, and this is an immense gap that should not be understated.

Furthermore, AGI will be the product of an advanced industrial civilization. As such, it will be acutely dependent on complex - and fragile - supply chains. Indeed, semiconductors are one of the most complex industries there are4, and any alternative hardware technologies (if AGI does not happen through semiconductors) will almost certainly be just as sophisticated - and just as fragile. The continued existence of the AGI will depend on the active support of humanity for decades, if not centuries. People have had plenty of time (and opportunities) to address the inevitable problems that have accompanied the rise of every technology so far. There is, at least so far, no indication that this technology will be any different.

LLMs are insufficient for AGI

LeCun argues that bigger and faster LLMs are insufficient to get us to AGI. I agree, although, as we will see in the following, for reasons that are quite unlike the ones presented by LeCun. Yet, I do fully agree that LLMs, as they presently exist, are at the very least incomplete. While LLMs do pass the Turing Test with flying colors, as LeCun correctly points out, the Turing Test is just a very bad test of intelligence, one that can be gamed in numerous ways that were not conceivable for Alan Turing and his contemporaries back in 1950.

The property that LeCun attributes to intelligence is the capacity to predict the future: the better the prediction, the greater the intelligence. Overall, I approve of this perspective, although I favor a variant. I define intelligence as the capacity to make better choices when considering future rewards. My definition differs in the sense that it does not care for knowledge per se, but accounts for the benefits that can be derived from it.

My definition covers the whole spectrum of intelligence, from that of a snail to that of a great scientist. However, it leaves open what should be considered “intelligent”, i.e., where, within this spectrum, the threshold lies for what counts as a manifestation of “general intelligence”.

I propose that general intelligence is the capacity to intentionally improve intelligence itself. This intentional self-improvement of intelligence is something that is very specifically human (at least for now). Smart animals, like orcas or chimps, are capable of passing cultural baggage - hunting or foraging techniques - to their offspring. However, while they do exhibit a great deal of intelligence, they do not have anything remotely close to the riddles and tales that have been with mankind since the dawn of time. A riddle is nothing but an exercise intended to sharpen the mind for the sake of it. Tales play a similar role but leverage our emotional response to facilitate memorization.

Based on this test, LLMs are not intelligent yet as they remain unable to self-improve. However, as LLMs are mightily capable of generating useful code, they are closer to intentional self-improvement than any technology we have ever had. Yet the continuation of the journey to AGI still exclusively falls to human minds.

The curse of long sequences

As a variant of the decades-old argument of the curse of dimensionality, LeCun argues that LLMs suffer from an unrecoverable design flaw: as token generation progresses, the error rate increases exponentially, hence guaranteeing garbage for any sufficiently long sequence. The technical argument is straightforward: let $${p}$$ be a lower bound for the probability of picking an incorrect token. The probability of getting the correct sequence is then at most $${(1-p)^n}$$ for a sequence of $${n}$$ tokens. Hence, the exponential convergence to zero guarantees nonsense for any long-winded answer.
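To put rough numbers on this decay (an illustration only, assuming a flat 1% per-token error floor - an arbitrary figure of mine, not one from the interview):

```python
# How fast (1 - p)^n collapses, assuming an illustrative p = 1% per-token error floor.
p = 0.01
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: sequence-level bound = {(1 - p) ** n:.2e}")
# n =     10: 9.04e-01
# n =    100: 3.66e-01
# n =   1000: 4.32e-05
# n =  10000: 2.25e-44
```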

I do politely, but firmly, disagree with Yann LeCun on that one. This perspective would be correct if the goal were to produce one single “correct” answer - for example, predicting the sequence of digits of $${\pi}$$. For such a task, the LLM design guarantees that the digits will rapidly come out wrong, which is exactly what empirically happens when trying to get an LLM to generate numerous digits of $${\pi}$$.

My first objection is that the ‘asymptotic’ perspective with $${n}$$ growing larger is an unreasonable demand for any system, even a system of great intelligence. In fact, is it reasonable to demand that an LLM produce arbitrarily long, non-repetitive, and continuously correct answers? I think not. This demand would not be reasonable for a very intelligent human either. There is a limit to how much you can say about something, or how much you can plan a course of action, before the abstract reasoning process (whatever its form) devolves into idle speculation. In practice, LLMs generate their own termination conditions. Thus, $${n}$$ is upper bounded by the model itself, and the convergence to garbage does not have to happen. We just have to accept that “I don’t know” is an acceptable, intelligent answer to most questions.

The second objection is that the supposedly “self-evident” transition from per-token validity to per-sequence validity is, in fact, incorrect, and deceptively so. Indeed, the whole argument is derived from the idea that there is such a thing as an “incorrect” token (i.e., the probability $${p}$$). This is not the case, as the following example demonstrates:

Question: Was Pierre-Simon de Laplace a great mathematician? (respond like an opinionated Frenchman)

Answer A: No, absolutely not.

Answer B: No, absolutely not, he was the greatest of his time!

Starting the answer with No is seemingly a wholly incorrect token, as demonstrated by Answer A. However, when this answer is extended with the second part of the sentence, it becomes the correct token, and captures the tone and structure that would be expected here. The validity of the token “No” cannot be assessed independently from the tokens to be later generated.

Here we are touching on the problem, also mentioned in the interview by LeCun, that we do not know any genuinely useful ways to appreciate high-dimensional probability distributions. In language, insights cannot be derived from token-wise perspectives, just like they cannot be derived from pixel-wise perspectives for vision either.
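To make this objection slightly more formal (a sketch, using nothing more than the standard factorization of sequence probabilities): the model defines $${P(t_1,\dots,t_n) = \prod_{i=1}^{n} P(t_i \mid t_1,\dots,t_{i-1})}$$, and the probability of producing an acceptable answer is $${\sum_{(t_1,\dots,t_n) \in \mathcal{A}} \prod_{i=1}^{n} P(t_i \mid t_{<i})}$$, where $${\mathcal{A}}$$ denotes the set of acceptable sequences. This quantity does not reduce to $${(1-p)^n}$$ for any fixed per-token $${p}$$, precisely because whether a prefix such as “No” belongs to an acceptable sequence depends on tokens that have not been generated yet.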

Constant-resource answering machines

By and large, LLMs require a constant amount of computing resources (i.e., bandwidth, memory, and compute) to produce the next token. LeCun identifies this as a manifestation of a design flaw of LLMs, pointing out that certain questions are much more difficult to answer than others, and that LLMs appear to consume the same resources no matter the difficulty of the question.

I agree that it is self-evident that certain questions are harder to answer, and that answering those questions should require more resources. However, I disagree with the assessment that LLMs are constant-resource answering machines. The discovery of chain-of-thought prompting5 back in 2022 was a landmark for LLMs. In essence, by adding “Let’s think one step at a time” to the prompt, LLMs produce much better answers for entire classes of problems. My own personal empirical assessment of LLMs confirms this: hinting at the possibility of a staged problem resolution vastly increases the quality of the delivered answers. However, such staged answers, where the LLM does not rush into immediately answering but instead constructs the reasoning that will lead to the correct answer, are a direct manifestation of a varying expenditure of resources.
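As a concrete illustration of the technique (a sketch only; `ask_llm` is a hypothetical stand-in for whatever completion endpoint is being used, not a real API):

```python
# Chain-of-thought prompting: the same question, asked two ways.
# `ask_llm` is a hypothetical stand-in for a completion call, not a real API.

question = "A warehouse ships 340 units per day and holds 5,100 units. In how many days is it empty?"

direct_prompt = question
cot_prompt = question + "\nLet's think one step at a time."

# ask_llm(direct_prompt)  -> short, immediate answer; cheap, but more likely to be wrong
# ask_llm(cot_prompt)     -> staged reasoning before the answer; more tokens, more compute
```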

In 2023, I personally witnessed several LLMs being gradually improved at automatically picking the appropriate pace for delivering their answers, which now ranges from a 1-word delivery (instant answer) to 1 page of tergiversation prior to reaching a conclusion. My working hypothesis is that the Reinforcement Learning from Human Feedback (RLHF) datasets have been progressively expanded to make LLMs better at auto-picking the right pace for delivering their answers. This behavior disproves the idea that LLMs are constant-resource answering machines. Once again, LeCun appears to be making the mistake of deriving sequence-level properties from token-level observations.

However, it is also clear to me that LeCun is directionally correct in the sense that LLMs can almost certainly be vastly improved by being a lot more aggressive in varying their expenditure of resources. It is straightforward to imagine a setup where each question is fed to a classifier that picks the “right-sized” LLM to perform the token generation, as sketched below. In fact, for any given challenge, most LLM practitioners - me included - seem to start with the biggest and costliest LLMs out there. Once it is working, we attempt to gradually move toward smaller and cheaper LLMs, stopping when the savings do not seem to be worth the extra effort. In the future, this process will most likely go away, either through better LLMs or through better instrumentation.
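A minimal sketch of such a setup, under the assumption of a cheap difficulty classifier and a family of models of increasing size (all names below are illustrative placeholders, not real products):

```python
# Hypothetical router: easy questions go to a small, cheap model; hard ones to a large one.
# `estimate_difficulty` is assumed to be a cheap classifier returning a score in [0, 1].

MODELS = {
    "small": "small-llm",    # cheap and fast; fine for routine questions
    "medium": "medium-llm",
    "large": "large-llm",    # expensive; reserved for genuinely hard questions
}

def route(question: str, estimate_difficulty) -> str:
    score = estimate_difficulty(question)
    if score < 0.3:
        return MODELS["small"]
    if score < 0.7:
        return MODELS["medium"]
    return MODELS["large"]
```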

The high-bandwidth path to AGI

Language is a low-bandwidth channel compared to vision. In fact, LeCun points out that a child will acquire more information (think informational entropy) through her eyes than there is to be found in reading the entirety of Wikipedia. LeCun also points out that vision is a highly redundant signal, which is very desirable from a learning perspective. From those observations, LeCun proposes that the path to AGI most certainly involves putting a high-bandwidth channel, like vision, front and center. This is because, realistically, there will never be enough written materials to go around to get to AGI. In essence, LLMs bypassed the need for high-bandwidth channels by cheating: LLMs work because of a short series of preexisting, large, high-quality corpuses (e.g., Wikipedia, GitHub, Stack Overflow), but now that those corpuses have been exploited, we have nothing comparable to turn to. It would be nice to have 10 other online encyclopedias, just as rich and diverse as Wikipedia, but there simply are none. In all likelihood, there will not be any more until we get an AGI capable of writing them.
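The orders of magnitude can be sketched with a crude back-of-envelope computation; every figure below is an assumption of mine chosen for illustration, not a number taken from the interview:

```python
# Crude back-of-envelope: visual intake of a young child vs. the text of Wikipedia.
# All figures are illustrative assumptions, not measurements.

optic_nerve_bytes_per_s = 1e6                   # assumed ~1 MB/s per eye
waking_seconds_by_age_4 = 4 * 365 * 12 * 3600   # ~12 waking hours per day, for 4 years
visual_bytes = 2 * optic_nerve_bytes_per_s * waking_seconds_by_age_4

wikipedia_plain_text_bytes = 2e10               # assumed ~20 GB of plain English text

print(f"visual intake by age 4 : ~{visual_bytes:.1e} bytes")
print(f"English Wikipedia text : ~{wikipedia_plain_text_bytes:.1e} bytes")
print(f"ratio                  : ~{visual_bytes / wikipedia_plain_text_bytes:,.0f}x")
```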

Yet, while this argument is seductive, there is contrary evidence. I have had the chance to meet a few blind-from-birth people. If we were to follow LeCun’s line of reasoning, then not having access to a high-bandwidth channel such as vision should hinder intelligence; thus, childhood blindness should hinder cognitive development, at least a little, when it comes to intelligence. Yet, my anecdotal sample suggests the opposite. I have found every single one of those persons to be of notably above-average intelligence (e.g., better memorization, better verbal communication, and better planning). Considering the nature of their handicap, this is unsurprising: blind people have to use their intellect all the time, and in all sorts of ways, to compensate for their lack of vision. Those observations weigh against the hypothesis that high-bandwidth channels are the path to AGI.

In fact, considering the animal kingdom, I would pretty much state the opposite. Compared to most animals, humans have remarkably undeveloped senses. Most cat owners would agree that from their pet’s perspective, humans are half-blind and half-deaf creatures that cannot perceive a fly until it lands on their nose. Again, it is another piece of circumstantial evidence contradicting the proposition that high bandwidth channels pave the way to AGI.

While blindness at birth may not (in and of itself) hinder cognitive development, language deprivation does. While the experiment of intentionally depriving a child of access to language is a moral atrocity, it has been done multiple times throughout history6. Invariably, the children have grown up “retarded” (in the clinical sense of the term). Similar results have been obtained with primates, although, once again, ethical concerns prevent further investigation. Once more, while circumstantial, these pieces of evidence suggest that language is instrumental in the emergence of intelligence, despite being a low-bandwidth channel.

Resolving Moravec’s Paradox

Moravec’s Paradox7 is the observation, originally made in the 1980s, that what seem to be amongst the simplest of problems, like grabbing a teapot and pouring tea into a cup, are the sorts of problems that are the most challenging for artificial intelligence to replicate. LeCun argues that the one key ingredient that has been lacking so far is an abstract representation of the world, or worldview. This worldview is expected to be critical to support all the base mobility operations (be it for a driverless car or an android), and would also act as the true foundational layer of an AGI.

Presently, LeCun argues, LLMs do not know anything because they do not even have a sense of the world itself. From this, it follows that all there is to LLMs is elaborate mimicry. This will not get us all the way to AGI because, well, LLMs are just very good at faking intelligence. Further, the lack of a worldview will doom them to be forever baffled by mundane aspects of our base reality that are so self-evident that humans do not ever bother putting them into words (and maybe would not even be able to). LeCun proposes that joint-embedding methods applied to vision datasets are currently the best shot we have at engineering this worldview, which will later pave the way to AGI.

I disagree with this perspective on two levels.

First, Moravec’s Paradox is not a paradox at all when approached from the correct perspective. The implicit hypothesis behind perceiving this situation as a ‘paradox’ is that animal mobility (and animal planning) is “nurture” rather than “nature”. In other words, acquiring mobility is a result of a learning process – one that has proven incredibly difficult to replicate through artificial means over the last four decades.

My proposition is that to a very large extent it is not a learning process at all. Let us consider that foals (and other members of the horse family, such as donkeys and zebras) can stand and walk within hours of birth. Thinking of this process as “learning” is incorrect. It is much more apt to consider this process as some sort of “calibration”. All the cognitive structures are almost fully developed. The system only needs to be nudged a little bit for the pathways to fall into place.

This largely innate aspect of animal mobility and planning is less obvious among humans because, compared to nearly all other mammals, human births are extremely premature, the development of the head being the bottleneck for a longer gestation. What many people perceive as infants “learning stuff” is cognitive development that would proceed unimpeded if the infant were still in the womb for a few more months.

It stands to reason that those cognitive structures, being largely innate, are also a product of evolution. Those structures have (give or take) 800 million years of runaway evolutionary pressure to improve, over an incredibly large number of individuals. Thus, if we take present day animal mobility as the baseline of what we want from a machine, it is entirely unsurprising that this baseline turns out to be incredibly difficult to beat. The challenge is just as difficult as engineering a device better than trees at converting sunlight into structural material, or out-competing ribosomes as macromolecule factories. Outperforming evolution at games that have been played for eons is brutally difficult.

However, abstract skills, like storytelling, have only been part of the evolutionary game for the last 1 to 5 million years (give or take). Furthermore, the number of individuals was also incredibly limited, as great apes are not only very rare (putting modern humans aside) but long-lived as well, hence slowing down the evolutionary process, at least compared to most other animals. Thus, to achieve mobility, we are left to engineer what nature most likely does through hundreds of intricately coupled heuristics. Those heuristics may revolve around some sort of inner “worldview” as LeCun suggests, but my gut feeling tells me it is not the case. I am profoundly skeptical that a snail has any worldview whatsoever. Yet, the snail can correctly perceive and navigate the complex 3D maze that vegetation represents. Snails make intelligent choices, but without an intelligent worldview.

Thus, as a second point of disagreement, I see LeCun’s concept of a worldview as mostly peripheral to the emergence of AGI. My take is that the “worldview” (as seen by LeCun) will be one of those optional modules that an AGI could leverage for specialized tasks that require mobility, exactly like an arithmetic module. However, fundamentally, it will not be needed for the AGI itself.

In nature, a “worldview” was a necessary evolutionary path to AGI because animals cannot leapfrog into general intelligence. However, machines have no such constraint, as long as there are humans to cater to their needs (e.g., energy, materials) while we jointly figure out how to crack the last frustrating tidbits, like finding inorganic substitutes for muscle fibers, and all the sorts of problems that evolution has solved in ways that do not lend themselves to industrialization.

Nevertheless, while I do not think that the vision-driven path LeCun pursues will lead to AGI, I do believe that this path has a fairly good chance of cracking the mobility problem, which is a grand problem that very much deserves a good solution.

Hierarchical planning

Through the anecdote of planning a trip from Paris to New York, LeCun illustrates that we lack truly general solutions to approach hierarchical planning. For any given problem, we can somehow conjure ad hoc hierarchies, but we have nothing close to being able to perform this process automatically - operating within abstract representation spaces. While I do agree with LeCun that we do not have the capabilities to do that (i.e., hierarchical planning in abstract representation spaces), I disagree that we do not have a path for effective hierarchical planning.

I propose that hierarchical planning can be solved satisfyingly - with complete automation - with text, aka LLM-style. Being able to leverage abstract representation spaces is an appealing idea, but just as the absence of fusion reactors does not prevent us from using the next best thing - fission reactors - as a satisfying answer to energy production, the absence of abstract representation spaces does not prevent text from being a satisfying answer to hierarchical planning.

Returning to the Paris-New York trip, planning through words is incredibly efficient because humanity has organized itself through those words. There is no such thing as a “taxi” from an automotive engineering perspective: it is just a regular car. What makes “taxi” (the concept) so effective planning-wise is that, in any airport, the voyager can expect to find a series of taxi signs - bearing the word “taxi” - that lead to a place where the services of a driver (with a car) can be rented relatively cheaply and on-demand.

LeCun proposes to decouple hierarchical planning from language as if it were a given that doing so would result in better planning, liberated from accidental, inconsequential linguistic concerns. I argue that those concerns are neither accidental nor inconsequential. The taxi signs are physically real, and when there are not enough of those signs, voyagers get lost and distressed.

Unlike in the animal kingdom, planning in general intelligence settings - aka planning within a civilization - is most efficiently performed by rigorously sticking to the agreed-upon abstractions. For example, the concept of a “US plug” means very little to a physicist, but for our traveler going from Paris to New York, anticipating the need for a US plug adapter to charge his smartphone is a smart move. Again, the voyager will acquire a “US plug adapter” by checking what is written on the box prior to purchasing the device.

Civilization is shaped by language, and language shapes civilization. I am open to the idea that there exists something better than language to perform truly general, deeply layered, planning. However, this is far from being as self-evident as LeCun makes it out to be. An AGI will have to operate in the world of humans for a long time. Centuries down the path, AGIs might bypass tokenized language entirely, possibly communicating and planning through embeddings derived from agreed-upon worldviews. However, this is highly speculative.

My take is that beyond-human AGI would start by simply improving the linguistic constructs themselves. Few people realize how closely science has progressed alongside language. Many things that we consider self-evident today are only so because we have proper words for them, with the proper semantics attached. The Sun is a star like all other stars, or conversely, stars are just distant suns. Yet, until Giordano Bruno postulated it was the case in the 16th century, the Sun and the stars were perceived as wholly different in nature. Thus, when we get AGI, it is reasonable to assume that it will simply resume the linguistic process, at least for a while.

Learning and inference

In present-day LLMs, mathematical optimization only happens during the learning stage, while inference is a single-stage, non-iterative process (cf. the constant-resource answering machine discussed previously). As a more difficult question should require more effort to answer, LeCun postulates that optimization should happen during the inference process. In other words, if we had any kind of truth-seeking optimization process to apply to the inference stage of the LLM, this process would nicely fit what we expect from both easy and hard questions.

Here, LeCun openly speculates about the characteristics of superior systems, and while I disagree, my disagreement is just as speculative as the proposition put forward by LeCun. So, take this with a grain of salt. In a nutshell, my take is that the crux of the problem is the incorrect segregation of learning from inference. The problem is not so much that inference lacks optimization, but rather that learning and inference are segregated when they should not be.

To unpack this proposition, let us consider that one of the most severe flaws that I perceive in LLMs is their strange behavior when it comes to learning—at least if AGI is the intended outcome. This behavior is strange on three accounts.

First, LLMs are quite insensitive to low-quality corpuses at the very beginning of their training process and become increasingly sensitive as their training progresses. Once we reach the stage of late fine-tuning, even a very minor amount of garbage input can completely unravel the whole LLM (been there, done that).

This behavior is exactly the opposite of what I would expect from a system gradually acquiring “intelligence”. As a thought experiment, imagine that two people had to listen to a 15-minute talk arguing that the earth is flat. The first person is a 5-year-old boy; the second is a 15-year-old boy. It is evident that the odds of messing with the belief system of the younger boy are much higher than the odds of achieving the same outcome with the older boy. Thus, as we age and gain intelligence, we become more resilient to garbage input. Naturally, this poses the problem of rigidifying our minds around bogus concepts, but all in all, becoming proficient at dismissing bogus information is a vital skill that every functioning human has to master to a certain degree.

Second, LLMs learn by processing input in a completely randomized manner. For example, an LLM might be randomly fed pieces of the Linux kernel codebase, obscure accounting regulations from Scotland, and kindergarten lyrics. The LLM being trained keeps ingesting randomly selected documents, and keeps progressing as more documents are ingested, but the whole process is a little baffling.

Similarly, it is obvious that if some insane parents were to decide to spend hours every day reading treatises about quantum physics, in random order, to their infant, this would likely not prove conducive to healthy intellectual development. Parents instinctively know that they have to start simple and gradually ramp up the sophistication of the materials as the intelligence of the child develops. While a machine intelligence does not have to abide by all the properties of human intelligence, I am quite skeptical that the genesis of a mechanical intelligence could entirely bypass any sort of intellectual progression.

Third, LLMs are insanely inefficient with regard to their source materials. In terms of raw reasoning abilities, it takes the equivalent of hundreds of lifetimes of reading sessions for an LLM to acquire skills that many humans acquire in less than 20 years. LeCun mentions this very defect, and he attributes it to the lack of a worldview (see above). My take is very different: I expect a truly well-engineered machine intelligence to be able to mature with less linguistic information than a typical human gets in their first 20 years of life.

Indeed, as a casual observation, smart people are able to do much more, intellectually speaking, with much less. Such people don’t need to have things re-explained to them from five different perspectives - the ‘redundancy is good’ point mentioned by LeCun - to “get it”. They read between the lines. They make the right derivations from the little piece of evidence that is in front of them. They can even afford to be quite distracted when ingesting materials (e.g., a talk, a book), and correctly fill in the gaps afterward.

Thus, going back to my original proposition that the problem lies in the segregation between learning and inference, what is lacking for LLMs is most likely (my humble hunch) an ‘active listening’ skill - while all they have right now is a ‘passive listening’ one. Every time the system answers a question, it should correspondingly update its internal cognitive state. Indeed, the process by which an intelligence (machine or human) answers a question should lead to the improvement of said intelligence. By answering a novel question, something must have been learned.

LLMs already have a spark of this potential in them. Let us consider the GPT4-Turbo chat below (see the Annex). My basic question is: Do you think that the labor theory of value is correct? In its first answer, the LLM is not capable of giving the correct answer. Instead, it regurgitates the ambient confusion found on the internet, throwing in a haphazard list of arguments made by countless people on the case. However, once probed again to reason and reach an actual conclusion, instead of being wishy-washy about it, the LLM gets to the correct conclusion. Spoiler alert: the labor theory of value is disproven. A great answer to this question would have started with The labor theory of value is a now disproven theory that was instrumental to the progress of economics … or something along those lines. However, as the LLM never had the opportunity to actively digest the information it received, this understanding never crystallized.

I can see several ways to augment LLMs with such an active listening capacity, which would include listening to the very answers they produce. One way could be a sort of ‘inner monologue’ approach, where every input is turned into an output, with this output then being used for learning. This pattern would address the first two problems above: the input would have to be fed in a sensible order to get the desired outcome (starting simple and ramping up the sophistication afterward); and the LLM would become increasingly capable of dismissing garbage input, as it is the output that is used for learning, not the input.
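A minimal sketch of what such an ‘inner monologue’ loop could look like; everything here is hypothetical - the model interface, the prompt, and the curriculum ordering - and is only meant to make the pattern concrete:

```python
# Hypothetical 'inner monologue' loop: the model restates every input in its own words,
# and only that restatement is used as a learning target, not the raw input.

def inner_monologue_step(model, document: str) -> None:
    monologue = model.generate(
        "Restate, in your own words, what is reliable in the following text. "
        "If it is nonsense, say so and explain why.\n\n" + document
    )
    # Learn from the model's own digestion of the input; garbage that the model
    # recognizes as garbage never makes it into the weights.
    model.train_on(monologue)

def train(model, curriculum) -> None:
    # `curriculum` is assumed to be ordered from simple to sophisticated materials,
    # instead of the usual uniformly shuffled corpus.
    for document in curriculum:
        inner_monologue_step(model, document)
```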

Another way could be a variant of the energy-based approach proposed by LeCun. The current SGD (stochastic gradient descent) step applied to every document ingested by the LLM is only the passive listening part. A supplementary process would be required to compress/narrow the belief system of the LLM to a smaller region of space whenever a new input is ingested, including a naked question. A numerical optimization process should be introduced to act as a truth-seeking mechanism, which would most likely take the form of seeking greater self-consistency among the outputs.
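One very rough way to picture such a truth-seeking step (a sketch only; the sampling interface and the consistency scoring are my own placeholders, and this is not LeCun’s energy-based proposal itself): sample several answers to the same question, keep the one most consistent with the others, and feed it back as a weak learning signal.

```python
# Hypothetical self-consistency step: sample several answers, score each answer by its
# agreement with the others, and use the most self-consistent one as a learning target.

def self_consistency_step(model, question: str, k: int = 5) -> str:
    answers = [model.generate(question) for _ in range(k)]

    def agreement(answer: str) -> float:
        # Placeholder score: average similarity with the other sampled answers.
        others = [a for a in answers if a is not answer]
        return sum(model.similarity(answer, other) for other in others) / len(others)

    best = max(answers, key=agreement)
    model.train_on(question + "\n" + best)  # the winner becomes a (weak) learning target
    return best
```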

Conclusion

Yann LeCun is a great scientist, and this interview raises a whole series of fundamental questions. Whether humanity manages to craft an AGI in 20 years or 200 will really depend on how fast we can get both the right questions and the right answers. LeCun is banking on video data to pave the way to AGI. While I believe that LeCun has a fair chance of getting something truly remarkable out of his research efforts, namely solving the decades-old problem of general mobility for robotics, I am very skeptical that it will contribute to AGI. My humble take goes in the opposite direction: we already have more textual data than we should need. We just lack novel paradigms. If my hunch is correct, then it is fantastic news for humanity, because paradigms can never be owned, let alone monopolized. Plus, paradigms are cheap. However, even if my hunch turns out to be incorrect, I am glad that the Chief AI Scientist of Meta happens to be a fierce open-source proponent.

Annex: Chat, a critique of the labor theory of value

Joannes: Do you think that the labor theory of value is correct?

ChatGPT: The labor theory of value is a foundational concept in classical economics, particularly associated with Adam Smith, David Ricardo, and most famously, Karl Marx. It proposes that the value of a commodity can be objectively measured by the average number of labor hours required to produce it. This theory has been influential in shaping economic thought, especially Marxist economics, which uses it to analyze capitalist societies and the exploitation of labor. However, the labor theory of value is subject to debate and criticism, especially from the perspective of neoclassical economics and marginalist theories which emerged in the late 19th and early 20th centuries. These theories argue that value is subjective and determined by individual preferences and the marginal utility of goods, rather than solely by the labor invested in them. Some critiques of the labor theory of value include:

  1. Subjectivity of Value: Marginalist economists argue that value is subjective and depends on individual preferences, needs, and the context in which choices are made. For example, water in a desert has a different value from water in a lake-rich region, irrespective of the labor involved in acquiring it.
  2. Neglect of Capital and Entrepreneurship: Critics argue that the labor theory of value underestimates or ignores the roles of capital, risk, and entrepreneurship in the production process. These factors can significantly affect the cost and price of goods and services.
  3. Technological Change and Productivity: The theory doesn’t fully account for how technological advancements and productivity improvements affect value. For example, automation can drastically reduce the labor needed to produce goods, which can lead to significant discrepancies between the labor value and market price of goods.
  4. Variability of Skill and Intensity: The theory assumes that all labor hours are equal, but in reality, the skill, intensity, and productivity of labor can vary widely. This makes it difficult to standardize the measure of labor value. Despite these criticisms, the labor theory of value remains an important concept in understanding economic history and the evolution of economic thought. It has been particularly influential in critiques of capitalism and discussions about income inequality, exploitation, and the distribution of wealth. Whether one considers it “correct” can depend largely on their economic philosophy, values, and the aspects of economic activity they emphasize.

Joannes: As a testable, and many times tested, scientific hypothesis, do you consider this theory to be proven or disproven? Reason about it, and then conclude. Your conclusion should be proven or disproven, nothing in between.

ChatGPT: When evaluating the labor theory of value as a testable, scientific hypothesis, it’s crucial to consider the methodology of economic analysis and the criteria for proving or disproving theories in social sciences, which can be inherently different from natural sciences due to the complexity of human behavior and economic systems.

Testability and Empirical Evidence:

  • The labor theory of value posits that the value of a commodity is determined by the labor required to produce it. This principle can be partially tested by examining the correlation between labor costs and market prices. However, empirical evidence shows a mixed picture. While there is often a correlation between labor input and prices, numerous exceptions are driven by supply and demand dynamics, technological changes, and consumer preferences.
  • Additionally, the theory struggles to account for the value of goods that require little to no labor to produce (such as digital goods or natural resources) but can command high market prices due to scarcity or demand.

Scientific Consensus and Critiques:

  • The advent of marginalist and neoclassical economics introduced the concept of subjective value and marginal utility, which provide a more flexible framework for understanding price determination and consumer behavior. These models better account for the variations in prices that cannot be explained by labor inputs alone.
  • The labor theory of value has been criticized for not adequately explaining the role of capital, risk, entrepreneurship, and technological innovation in value creation.

Conclusion: Given the criteria for scientific proof, which demands consistent, repeatable evidence and predictive capability, the labor theory of value as a universal explanation for value determination in economics is disproven. While it captures certain aspects of value creation and can explain some economic phenomena, it fails to account for the full complexity of how prices are determined in a market economy. The theory does not hold up against empirical evidence across diverse economic contexts and has been supplanted by theories that better accommodate the wide range of factors influencing value and price.


  1. Yann LeCun: Meta AI, Open Source, Limits of LLMs, AGI & the Future of AI | Lex Fridman Podcast #416 ↩︎

  2. See the Geoffrey Hinton interview given to the New York Times, ‘The Godfather of A.I.’ Leaves Google and Warns of Danger Ahead, May 1st, 2023. ↩︎

  3. The Use of Knowledge in Society, by Friedrich Hayek, September 1945 ↩︎

  4. For example, in 2022, TSMC Chairman Mark Liu reminded the public that if China were to invade Taiwan and to capture TSMC facilities and its personnel, the production capacities of TSMC would be rendered non-operable merely by their Western partners becoming unwilling to keep supporting them. ↩︎

  5. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou, January 2022. ↩︎

  6. Language deprivation experiments ↩︎

  7. Moravec’s paradox ↩︎