00:00:01 Introduction and discussion on generative AI.
00:01:11 Exploring the generative problem and its solutions.
00:02:07 Discussing the evolution and progress of generative AI.
00:03:02 Milestones in machine learning and accessibility of tools.
00:04:03 Discussing the quirks and results of AI tools.
00:05:00 Popularity and accessibility of generative AI.
00:06:33 Image generation with Stable Diffusion, becoming accessible.
00:07:37 Discussing accessibility of generator tools.
00:08:43 Explaining high dimensional object generation.
00:09:38 Challenges and improvements in dimensional capacity.
00:10:07 Exploration of text generation and its limitations.
00:11:15 Discussing consistency at different scales.
00:12:24 Moving to topic of generator specificity and versatility.
00:13:46 Comparison of AI-generated output with human output.
00:14:59 Discussion on machine learning models and language generation.
00:15:51 Exploring the cutting and pasting method in AI.
00:16:30 AI’s lack of common sense highlighted.
00:17:26 Mention of ChatGPT’s IQ test performance.
00:18:45 Discussing AI understanding and examples.
00:19:47 AI’s shallow understanding and high dimensional blending.
00:20:41 Complexity of artificial intelligence and its history.
00:21:58 The unknown elements and progression of AI intelligence.
00:22:25 Discussing changing perception of intelligence.
00:23:45 Insights on deep learning and artificial intelligence.
00:24:24 Concept of latent knowledge in human languages.
00:25:59 Understanding of universe in ancient and modern times.
00:27:02 Introduction to the concept of ‘anti-fragility’ from Nassim Taleb’s book.
00:28:01 Anti-fragility in ecosystems and human societies.
00:29:31 Critique of ChatGPT’s ability to generate ‘intelligent’ discourse.
00:31:05 Considering the applications of generative AI in enterprises.
00:31:37 The potential role of generative AI in supply chain management.
00:33:34 Limited capabilities of ChatGPT in data-scarce domains.
00:35:00 Caution for using AI-generated code in critical systems.
00:36:04 AI benefits to the supply chain and peripheral activities.
00:37:37 Discussing the trend towards broader corpus for code completion.
00:38:45 Comparing parameter requirements: ChatGPT vs smaller generator.
00:40:45 The implications of generative AI on enterprise and supply chain.
00:41:19 Discussing Lovecraft’s view on deep truths of the universe.
00:42:01 Relation of technology misuse to supply chain software.
00:42:56 Concerns over creating and verifying fake case studies.
00:44:04 Critique of Lokad competitors’ vague marketing claims.
00:45:10 Discussing AI language model limitations.
00:46:08 Explaining the specifics of AI in technology.
00:47:00 Importance of specific terminology in AI.
00:48:01 Analogy of window purchasing to AI understanding.
00:48:48 Discussion on software architecture integration issues.
00:50:14 The importance of core design in enterprise software.
00:50:54 Example of core design in a transactional database.
00:51:48 The necessity for proper software design and integration.
00:52:52 Advice for assessing vendor’s technology.
00:53:36 Importance of publicizing achievements in tech.
00:54:20 AI as a buzzword and vetting vendors.
00:55:25 Closing remarks and interview end.
In this interview, Joannes Vermorel, the founder of Lokad, discusses the state and impact of generative AI, specifically focusing on advancements like ChatGPT and Stable Diffusion. Vermorel explains generative AI and its history, highlighting the incremental progress made in image and text generation. He mentions the user-friendly nature of recent tools like Stable Diffusion and ChatGPT, which have improved success rates and accessibility. Vermorel emphasizes the limitations of current AI models in terms of common sense and true intelligence. He also discusses the challenges and potential of AI in supply chain management and criticizes the vague and misleading claims made by some companies about their AI capabilities. He stresses the importance of understanding the underlying technology and being cautious when evaluating AI solutions.
In this interview, host Conor Doherty and Joannes Vermorel, the founder of Lokad, discuss the current state and impact of generative AI, particularly focusing on advancements such as ChatGPT for text and Stable Diffusion for images.
Vermorel begins by defining generative AI as a collection of proposed solutions to the generative problem, which involves developing an algorithm or method to create one more instance of a digital representation of a collection of objects. He mentions that these kinds of problems have been around for decades and for narrow applications, there have been successful generators. For instance, generators have been used to create names of realistic sounding locations in England or titles for a Stephen King novel.
Likewise, in the realm of image generation, there have been generators capable of creating a map that looks like a setting from ‘Lord of the Rings’, complete with mountains, forests, coastlines, and fantasy names. The progression in this field, according to Vermorel, has been incremental with the goal of making generators broader and increasingly reliant on input data sets, rather than an extensive set of pre-coded rules.
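As a rough illustration of a narrow, data-driven generator of the kind described above, here is a minimal character-level Markov chain sketch in Python. It is not the method behind any generator mentioned in the interview; the training list `PLACE_NAMES`, the function names, and the context length of 2 are all assumptions chosen for illustration. The point is only that the "rules" come from the input data rather than being coded by hand.

```python
import random
from collections import defaultdict

# Hypothetical training set: a handful of English place names.
PLACE_NAMES = [
    "Ashford", "Bradford", "Guildford", "Oxford", "Stratford",
    "Birmingham", "Nottingham", "Cheltenham", "Tottenham",
    "Brighton", "Taunton", "Bolton", "Luton", "Darlington",
]

START, END = "^", "$"  # padding markers for the start and end of a name


def build_model(names, order=2):
    """Map each `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for name in names:
        padded = START * order + name.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, rng, order=2, max_len=20):
    """Sample one more name, character by character, from the learned counts."""
    context, out = START * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()


model = build_model(PLACE_NAMES)
rng = random.Random(0)  # fixed seed so that runs are repeatable
samples = [generate(model, rng) for _ in range(5)]
print(samples)
```

Swapping in a different list of names changes the style of the output without touching the code, which is the sense in which such generators lean on data sets rather than pre-coded rules.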
Highlighting two notable milestones reached last year by the broader machine learning community, Vermorel references ChatGPT for text and Stable Diffusion for images. While these were significant steps forward, making these tools more accessible, he insists that they were incremental rather than groundbreaking, representing no new discoveries in statistics, mathematics, or computer science.
However, the fact that these tools were packaged and polished enough to allow laypersons to begin using them within minutes was certainly noteworthy. This stood in contrast to earlier generative tools which, although capable of generating impressive images or text, often came with many quirks and required a degree of expertise to operate effectively.
Stable Diffusion and ChatGPT stood out due to their user-friendly nature. With Stable Diffusion, for instance, one could enter a simple prompt, such as ‘beautiful castle in the middle of the forest,’ and get a plausible image 20% of the time. Though this was far from perfect, it represented a significant improvement over previous generation techniques, which only had a 1% success rate.
This marked an order of magnitude improvement, a sentiment echoed when Vermorel speaks about ChatGPT. As with Stable Diffusion, the introduction of ChatGPT marked a shift toward more user-friendly, accessible tools in the realm of generative AI.
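To make the success-rate comparison concrete, here is a small sketch using the 20% and 1% figures quoted above. It assumes, purely for illustration, that each attempt succeeds independently with a fixed probability, so that the average number of attempts until the first plausible image is 1/p; the function name is hypothetical.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


older = expected_attempts(0.01)  # earlier techniques, about 1% success
newer = expected_attempts(0.20)  # Stable Diffusion, about 20% success
print(older, newer, older / newer)
```

Under that assumption, a user would need about 100 attempts on average with the older techniques against about 5 with Stable Diffusion, roughly a 20-fold reduction in effort.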
In this segment of the interview, Conor Doherty and Joannes Vermorel, founder of Lokad, discuss the recent evolution and impact of Generative Pre-trained Transformer (GPT) models. Vermorel emphasizes that the popular ChatGPT model is not fundamentally new, but rather a repackaged and more accessible version of pre-existing technology. He marks 2022 as the milestone year when generative AI became widely available to the public, largely due to improvements in usability.
The conversation then shifts to specific instances where generative models have made a significant public impact. Vermorel draws attention to last year’s releases, like Stable Diffusion and ChatGPT’s third iteration. The appeal and success of these models, he explains, lie in the effort made by research teams to package these technologies in a user-friendly manner.
Vermorel provides examples of this accessibility. He notes that Stable Diffusion, a tool for image generation, was released as open-source software. This allowed users with minimal Python experience to set up a Python programming environment in approximately two hours and explore the tool independently. Vermorel highlights that to use Stable Diffusion, one doesn’t need to be a proficient Python programmer; a basic understanding of command-line execution is sufficient.
He also references the availability of online tutorials and the launch of a free user interface, DreamStudio, which allows users to generate up to 100 images for free. For subsequent batches of images, users need to pay a fee, a model which also applies to GPT’s web application.
Joannes Vermorel initially explains the complexity of generating a high-dimensional image, citing the example of a 1000x1000 pixel image, which amounts to three million dimensions once the three primary colors are taken into account. He further mentions that the initial iterations were limited to 512x512 pixels, though enhancements are in progress.
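The dimension count follows from simple arithmetic: one value per pixel per color channel. A minimal sketch, where the function name is hypothetical and three channels (red, green, blue) are assumed:

```python
def image_dimensions(width: int, height: int, channels: int = 3) -> int:
    """Number of values a generator must produce for one image."""
    return width * height * channels


print(image_dimensions(1000, 1000))  # 3000000
print(image_dimensions(512, 512))    # 786432
```

So the 512x512 limit mentioned above corresponds to roughly a quarter of the dimensions of the 1000x1000 example.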
Similarly, the text generation issue is discussed. Vermorel explains that the dimensionality in text generation arises from two directions. The first pertains to the input size or prompt, which can vary from a line to multiple paragraphs or even pages. The second involves how far the generation can go before the model starts losing consistency.
Vermorel points out the limitations of current models, as they can’t produce an entire book consistently from end-to-end. The challenges increase with the size of the text: a single word requires local consistency, a sentence requires consistency at a larger scale, a paragraph even larger, and a book would potentially involve millions or tens of millions of abstract dimensions.
The conversation then shifts to the “generality” of these models. Vermorel interprets this as the ability of a model to tackle different problems or generate diverse outputs. An interesting development in the past five years, as Vermorel states, is the deep learning community’s capability to leverage massive datasets.
Whether it is text data from a variety of sources like Wikipedia, web forums, or legal texts, the deep learning models have progressed to generate diverse outputs. They can now produce anything from poetry to legalese, code, or even genomic sequences when correctly prompted. The same is true for images, where the outputs could range from pixel art to photorealistic images or different styles of painting.
Conor Doherty asks Joannes Vermorel about the sophistication of AI models such as ChatGPT compared to humans. Vermorel elaborates on the notion of sophistication, explaining that it’s complex due to the need to define and clarify its meaning. In response to a possible application of the Turing test, he states that the current state of AI models heavily depends on blending vast amounts of data together, drawing from an enormous corpus of texts.
To some extent, he argues that what ChatGPT produces is a sort of advanced “cut-and-paste” process, piecing together chunks of text found on the internet. He acknowledges that the model’s power lies in the ability to stitch together these pieces in a grammatically and syntactically correct manner, identifying high-level statistical patterns that exist between words, phrases, and sentences. Vermorel stresses that the resultant text may sound human-like, but it’s mainly a reproduction of existing human-written content.
However, Vermorel tempers the discussion by highlighting that these models don’t possess common sense. He cites an example from the head of AI at Facebook, who asserts that even the most advanced AI models lack the common sense of a cat. This is because AI fundamentally operates on statistical relationships and lacks the intuitive understanding that comes with common sense. He illustrates this point with a humorous scenario where an AI model suggests a GPS route to avoid traffic in the middle of the Atlantic Ocean, missing the absurdity of the situation.
To further articulate the limitations of current AI, Vermorel discusses an Amazon research experiment where ChatGPT was subjected to a series of IQ tests. The results placed the AI model about a standard deviation below the norm, which resonates with his perspective on AI primarily patching together information without the innate understanding that humans possess.
However, he emphasizes that even a person with limited cognitive abilities is far smarter than a cat. This comparison serves to underline that even with all its impressive capabilities, AI is far from matching the intelligence level of a cat, let alone a human. Vermorel reminds us that despite our perception of a cat’s cognitive limitations, we are still a long way from creating an AI model with comparable intelligence.
This conversation underscores the complexity of AI sophistication, the process behind AI text generation, and the limitations that AI currently faces in terms of common sense and intrinsic understanding. It provides a valuable perspective on the state of AI and its current capabilities while tempering expectations for its immediate future.
Vermorel elaborates on the notion that AI’s comprehension of the world is incredibly shallow. He describes the processes these models use as “high dimensional blending of the input data”. He also entertains the possibility that with more sophisticated models, this could be enough to achieve intelligence, but he suspects that real intelligence may be more complicated.
In his view, the journey of AI has been more about identifying what intelligence is not, rather than defining what it is. This clarification process has been ongoing for approximately 70 years. He identifies the breakthrough of deep learning in 2011-2012 as a significant turning point, which allowed for a multitude of applications that have resulted in substantial insights. However, he emphasizes the uncertainty in the field. He posits that our understanding of intelligence might need to be redefined each time a new AI technique is developed.
The host then questions Vermorel about the improvements in AI’s performance across different iterations, focusing on ChatGPT. Vermorel agrees that generative AI, including ChatGPT, has improved significantly over time, but he notes the difficulty of quantifying the improvements needed to bridge the existing gap in AI’s understanding of concepts.
In response to Doherty’s query about how much better the fourth iteration of ChatGPT would have to be, Vermorel candidly admits the lack of certainty. He underscores that the issue is not merely one of linear progression. The fundamental problem, he asserts, lies in not knowing what we’re missing in our understanding of intelligence.
For historical perspective, Vermorel notes that a century ago, capabilities such as inverting a matrix or computing the first 20 digits of pi were considered signs of superior intelligence. Today, these tasks are deemed mechanistic, achievable by a simple pocket calculator, which calls their association with intelligence into question. He notes that computers, despite being orders of magnitude better than humans at these tasks, are not considered intelligent. AI’s development, he implies, might undergo similar transformations as we continue to explore and challenge our conceptions of intelligence.
Vermorel’s discussion transitions to the capabilities and implications of AI, particularly focusing on generation with deep learning. He suggests that AI has exposed many tasks which, on the surface, seem incredibly challenging, but may not reflect intelligence as much as initially thought. As an example, he considers the text generation abilities of ChatGPT. Rather than demonstrating what intelligence is, Vermorel proposes that it reveals what intelligence isn’t. He sees ChatGPT more as a reflection of the enormous amount of latent knowledge within human language than a demonstration of true intelligence.
Expanding on the concept of latent knowledge, Vermorel describes it as the cumulative total of human understanding and knowledge, which is implicitly represented in language. This latent knowledge is often recorded in structured forms like databases, maps, and others, containing details like chemical properties, resistivity of materials, and melting points. However, Vermorel asserts that language also embodies a significant portion of this knowledge. He argues that the words and phrases we use mirror our collective understanding of the universe. For example, saying that ‘planets orbit stars’ presupposes an understanding of astrophysical concepts.
This latent knowledge, he suggests, is embedded in even the simplest forms of linguistic expression, such as dictionary definitions, which can encapsulate much of modern science. He further contends that the absence of certain words or concepts can prevent some forms of knowledge from even being recognized. To illustrate this, he refers to the book ‘Antifragile’ by Nassim Taleb. He explains the concept of “anti-fragility”, a term Taleb coined to describe something that not only resists chaos and disorder, but thrives and improves under such conditions. This stands in contrast to something “fragile,” which degrades under disorder, or something “durable,” which also degrades, only at a slower rate. Vermorel finds this concept significant because it introduced a fresh perspective for understanding systems ranging from ecosystems to human societies.
Their discussion extends to the inherent relationship between language and knowledge. Vermorel illustrates how the introduction of a new term or concept, such as “anti-fragile,” can substantially enrich understanding, albeit in a manner that can be challenging to apprehend due to the limitations of language. He emphasizes the role of language in expressing and communicating knowledge.
Moving towards the subject of artificial intelligence, Vermorel discusses the phenomenon of latent knowledge present in language. He points out that this latent knowledge plays a crucial role in applications like OpenAI’s ChatGPT, a model capable of generating human-like text. Vermorel critically describes ChatGPT as a “platitude generator,” attributing its seeming intelligence to its propensity to piece together widely accepted ideas or idioms from vast, diverse data sets.
Despite his criticism, Vermorel acknowledges the impressiveness of ChatGPT’s ability to generate coherent and contextually apt content even in domains the user might not be familiar with. This feature, he suggests, is due to ChatGPT being trained on a massive data set comprising millions of pages of text from extremely diverse fields.
As the conversation progresses, they contemplate the practical applications of generative AI like ChatGPT within the context of enterprise and supply chain management. In Vermorel’s perspective, the impact of generative AI on supply chain management is unlikely to be significant, at least in a direct sense. However, he also underscores the challenge of predicting the future, implying that the scope and potential of generative AI might still evolve and surprise us in the times to come.
Vermorel asserts that despite the growing prominence and capabilities of AI technologies, they might not have a substantial impact on supply chain optimization. He reasons this by stating that these models thrive on large, freely accessible information sources, such as the web, where they analyze images and text tags. However, the data critical for supply chain management - transactional history, for instance - is specific to each company and is not openly shared or easily accessible. Therefore, the current form of these AI tools may lack the necessary information to optimize supply chain processes effectively.
Focusing on the sales example of door frames, Vermorel explains that generic data about door frames is less useful for supply chain planning compared to a company’s specific sales history of door frames. He emphasizes that this data, hidden within the company’s silo, gives a more accurate prediction of what to order, produce, and stock. It underscores the point that AI technologies like ChatGPT, which perform better with widely available data, may be less effective when the relevant data is scant.
However, Vermorel acknowledges that AI language models could be valuable for some tasks. For example, ChatGPT can assist in generating code snippets due to the large amount of freely available code online, primarily on platforms like GitHub. This availability allows the AI to generate decent code snippets or programs, serving as a productivity tool for programmers. Still, he warns of the need for careful supervision as the AI-generated code could also be faulty.
Looking into the future, Vermorel speculates that AI language models could aid in areas like note-taking, proofreading, and meeting summaries. For instance, they might be able to compress a two-hour meeting discussion into a two-page summary while retaining the critical details. However, he suggests that currently, AI tools like ChatGPT might struggle with such tasks due to their inherent limitations. Nonetheless, he believes that in the next decade, AI technologies will evolve to handle such tasks more effectively.
Vermorel identifies data as the core challenge, indicating that generative AI models don’t necessarily deal well with the inherent complexities of supply chain data. Doherty then brings up GitHub Copilot, a tool designed to assist in coding that can even produce decent code autonomously. He asks whether it isn’t better suited to the job at hand.
Vermorel rebuffs this, saying that GitHub Copilot and ChatGPT-3 share near-identical technological underpinnings: both use the Transformer architecture. The differences lie in the user experience, with GitHub Copilot providing auto-completion with every keystroke, whereas ChatGPT-3 is more dialogue-oriented. Vermorel predicts that the best tool for code completion will likely utilize a broader corpus than just code.
Continuing, Vermorel refers to a recent paper from an Amazon team. It discusses a promising generator that merges image and text data, claiming comparable, and occasionally superior, performance to ChatGPT-3 but with fewer parameters (a billion compared to ChatGPT-3’s hundred billion). This notion, Vermorel says, is intriguing because it suggests that blending more diverse data types can create a model that is simpler yet more potent.
Vermorel highlights a paradoxical observation in AI model development: larger models, like ChatGPT-3, aren’t necessarily better. He references Stable Diffusion, a model significantly leaner and faster than the generative adversarial networks that preceded it, at only about a billion parameters. It’s unclear, Vermorel states, whether models as large as ChatGPT-3 (which he places in the trillion-parameter range) are necessary.
Reinforcing this point, he again mentions the Amazon team’s research claiming that they’ve nearly reproduced ChatGPT-3’s performance with a billion-parameter model. This smaller size, he explains, allows operation on the common graphics cards found in current laptops and workstations. This opens a gateway for wider accessibility.
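A back-of-the-envelope sketch shows why parameter count matters for running a model on consumer hardware. It uses the parameter counts quoted in the discussion (one billion versus one hundred billion) and assumes 16-bit weights by default; it counts only the memory needed to hold the weights and ignores activations and other overhead. The function and table names are hypothetical.

```python
# Bytes needed to store one parameter at a given numeric precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_size_gb(n_params: float, precision: str = "fp16") -> float:
    """Approximate memory (in GB) needed just to hold the model weights."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


small = weights_size_gb(1e9)    # billion-parameter model
large = weights_size_gb(100e9)  # hundred-billion-parameter model
print(f"{small:.0f} GB vs {large:.0f} GB")
```

Under these assumptions, the smaller model needs about 2 GB for its weights, which is within reach of an ordinary graphics card, while the larger one needs about 200 GB, which is not.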
Circling back to the initial topic, Doherty queries whether generative AI brings net positive or negative impacts specifically for enterprises and, more particularly, supply chains.
Vermorel explains that progress in science and technology is generally good, in contrast to the pessimistic perspective of Lovecraft, who believed that there are certain deep truths in the universe so brutal and hostile to the human mind that discovering them would drive people insane.
Vermorel acknowledges that any tool, since the Stone Age, can be used or misused. In the context of supply chain enterprise software, he fears an increase in confusion due to the misuse of technology, specifically artificial intelligence. According to him, vendors are already over-hyping AI, and it may worsen with their marketing departments creating endless fake case studies. This could lead to even more misleading claims and unverifiable case studies.
Vermorel explains that in the past, creating a fake case study took some effort, but now, due to AI, it has become effortless. He also points out that participants in a case study have no incentive to say that the benefits claimed by the company are false. They usually confirm these benefits and attribute some of the success to themselves. Vermorel predicts that these technologies will make the situation more convoluted.
While discussing the marketing strategy of his competitors, Vermorel indicates his disappointment with the flimsy and non-informative use of the term “AI for supply chain.” He criticizes their lack of transparency and how they manage to write lengthy pages filled with platitudes, offering no substantial information about their product. This makes it difficult for him to understand their technology, its function, its design, or the insights presiding over it.
Vermorel points out that genuine AI applications in supply chain optimization involve highly specialized and technical approaches. These applications are based on specific algorithms or structures like Transformer architectures, generative networks, or hierarchical approaches. He expresses a need for companies to be precise and detailed about the AI techniques they utilize. His argument is that claims of simply ‘doing AI’ without specifics are often misleading or entirely baseless.
To illustrate his point, Vermorel compares AI technology to buying a window for a house. When purchasing a window, the buyer expects a detailed description of the product - is it made of wood, aluminum, or plastic? Is it single or double-glazed? Similarly, when it comes to AI, Vermorel believes that companies should offer a detailed explanation of what AI techniques they use and how they benefit the supply chain. He asserts that generic or vague descriptions can be equated to selling ‘generic windows’ without any specifics.
Vermorel extends this analogy to critique the term ‘sustainable windows’. He argues that such vague descriptions add confusion rather than clarity. In the same vein, he criticizes companies that promise ‘excellent light’ from their windows, suggesting that this is equivalent to AI claims that lack concrete evidence or details.
Furthermore, Vermorel anticipates that the use of AI technologies like GPT (Generative Pretrained Transformer) will increase confusion in the industry. While these tools can generate marketing material and be integrated into existing tech stacks with relative ease, they might not contribute significantly to the overall function or optimization of the supply chain if the software architecture wasn’t designed with these capabilities in mind.
In his view, this approach is akin to duct-taping an additional piece to an existing structure - it might not improve the structure or even make sense in its application. Vermorel sees a risk in the further misuse of ‘real’ AI technologies, as companies may integrate valuable algorithms into their operations in nonsensical ways, contributing to industry confusion rather than offering valuable advancements.
Vermorel criticizes the tendency to incorporate AI into supply chain optimization in ways that are ineffective and, in fact, nonsensical. He points out that these processes often add no value to the solutions they are supposed to improve. To support his point, Vermorel brings up the historical pattern of iterations in operations research, data mining, and data science, implying that current trends, like cognitive AI, could well be more of the same.
According to Vermorel, if a company wants to make the most of AI as part of its enterprise software, the integration should be at the design level. He strongly argues against “duct-taping” AI onto existing software, stressing that the core design of a product can only be established at the inception of its development. Attempting to shoehorn AI into a product after it has been created proves exceedingly difficult and often counterproductive.
When asked for an example of the core design level he refers to, Vermorel discusses transactional databases. These databases, built to ensure transactional integrity, are not designed to leverage technologies like image or text generators. In his opinion, these different paradigms are almost incompatible, and achieving a fit between them is not a given. It requires careful design considerations and a guiding principle that ensures compatibility within the software architecture.
Vermorel acknowledges the possibility of having AI as an add-on that sits on the side of an existing product, but he contends that this arrangement seldom leads to proper integration or synergy. Rather, it complicates the software, introducing more moving parts and potential bugs.
His advice to those considering AI integration into supply chain optimization is to thoroughly question vendors about their offerings. He urges customers to ensure that a vendor can explain their technology clearly and sensibly. If a vendor is unable to do so, Vermorel suggests that it could indicate a problem with the product or the vendor’s understanding of their technology.
Vermorel closes his part of the discussion by emphasizing that true achievements in AI technology, such as the creation of complex models, are often made public through research papers and other publications. This openness is partly due to the pride developers feel in achieving something difficult. He points out that these accomplishments are not well-kept secrets but are shared openly for the world to witness, further underscoring the importance of understanding the underlying technology.
Vermorel acknowledges the notable advancements achieved by certain companies in the tech industry. He points out that companies that manage to reach certain technical milestones often publish detailed reports to share how they achieved their successes. He sees this as a common trend within the industry, reinforcing that it’s a sign of actual technological progression.
Next, Vermorel takes a critical stance on the role and perception of AI in the modern corporate world. He characterizes AI as a buzzword that has gained significant traction in the market. Despite the widespread use of the term, he emphasizes that its meaning is so broad and often vague that it can encompass almost anything. He cautions against blind acceptance of vendors’ claims about their AI capabilities, especially when they can’t provide a precise description of what they are offering under the label of AI.
Vermorel firmly advises that when dealing with vendors who claim to offer AI solutions, one must exercise diligence to understand the exact nature of their offerings. He warns against trusting a vendor whose salesperson admits to lacking knowledge about the technology they are selling, passing it off as the domain of a separate tech team. Vermorel considers this a clear indicator that the company may not possess the technological prowess it claims.
He elaborates on this point by cautioning against falling for the “we hire Nobel laureates, we have Einsteins” rhetoric. He asserts that such claims are usually a smokescreen designed to convince potential clients of their technical aptitude without any substantial evidence. More often than not, he argues, these situations imply that there’s nothing truly innovative or technologically advanced behind the claims - it’s just more of the same.
Concluding this segment of the conversation, Doherty expresses his gratitude towards Vermorel for sharing his insights, emphasizing how enlightening the discussion has been. The segment ends with Doherty thanking the audience for their time and attention, promising to return with more insightful conversations in the future.
Conor Doherty: Generative AI is everywhere these days, not only in supply chain. Is this a net positive or negative? Here to explain to us is Joannes Vermorel. Welcome.
Joannes Vermorel: Hello, Conor, pleasure to have you.
Conor Doherty: So, if you want to, let’s set the table a little bit. What exactly is generative AI? What is its purpose because it’s everywhere these days?
Joannes Vermorel: Yes, generative AI is essentially a set, a collection of proposed solutions to the very old generative problem. The generative problem is when you have collections of objects in their digital representation, and you want to find an algorithm, a method, a recipe to generate one more instance. These sorts of problems have been going on for decades. For specific, narrow situations, there have been plenty of generators. For example, there has been for decades a generator that can create the name of a realistic-sounding location in England or a realistic-sounding title for a Stephen King novel. If you wanted to create images, there were generators that would make a map that looks a bit like Lord of the Rings. It carries this sort of fantasy medieval vibe with little mountains, forests, coasts, and fantasy names all over the map. The idea of having a generator has been floating around for decades. The progress has been fairly incremental, with the path being to get the generator broader, leveraging more input data sets rather than an extensive set of pre-coded rules. That’s where we stand, decades into the process. Last year, the machine learning community reached two very notable milestones with ChatGPT for text and Stable Diffusion for images. However, these were milestones in terms of the accessibility of these tools, not necessarily a fundamental breakthrough in statistics, mathematics, or computer science. They were the first products that were packaged and polished enough so that a layman could get started in minutes and play with them. On the image front, for over a decade, there have been generative adversarial networks that can create very nice images. But these tools came with tons of quirks. Stable Diffusion, on the other hand, made it easy for users to enter a prompt, say, “a beautiful castle in the middle of the forest,” and get a decent image. Not perfect, but decent enough.
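As an illustration of the narrow generators mentioned in this answer, here is a minimal sketch of a letter-level Markov chain that learns from a few place names and then produces "one more instance". It is not a tool discussed in the interview: the `PLACES` list, the function names, and the `order` and `max_len` parameters are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def build_model(names, order=2):
    """Map each `order`-letter context to the letters observed right after it."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name.lower() + "$"  # ^ marks the start, $ the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, rng, order=2, max_len=20):
    """Sample one new name, letter by letter, from the learned transitions."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":  # the model decided the name is finished
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()

# Hypothetical training set: a handful of English place names.
PLACES = ["Ashford", "Bradford", "Oxford", "Stamford", "Wexham",
          "Durham", "Chatham", "Farnham", "Ashton", "Brampton", "Kingston"]

model = build_model(PLACES)
rng = random.Random(0)  # fixed seed so a run can be repeated
samples = [generate(model, rng) for _ in range(5)]
print(samples)
```

A generator like this only has to stay consistent over two letters at a time, which is why it works for short names and falls apart on anything longer.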
Conor Doherty: So, it’s about an order of magnitude improvement in the accessibility and usability of these tools?
Joannes Vermorel: Exactly, the same thing with ChatGPT. By the way, the sort of GPT that became popular was actually a model that was already around for a couple of years. It was literally something that had been repackaged in a way that made it much more accessible. It was a matter of usability. The milestone was in 2022 when generative AI became widely accessible as opposed to being obscure. Nothing truly fundamental happened; it was really a matter of pure usability.
Conor Doherty: I recall growing up, there were examples of those generative websites, like the “give me a Ramones name” one. I use that famous example. I think Childish Gambino, the musician, generated his name through a similar website. But I was not familiar with the previous iterations of ChatGPT because the current iteration is the third. So what exactly about last year’s releases, like Stable Diffusion and ChatGPT third iteration, caught the attention of the public? They’re everywhere now.
Joannes Vermorel: What caught the attention of the public was the efforts made by research teams in the packaging of the tech. Stable Diffusion was released as open source. If you were familiar with a Python environment, even if you didn’t know much about Python, you could set up a programming environment in about two hours. You could play with all the moving parts on your own. You didn’t even have to be a Python programmer. You just had to be fluent enough to execute a series of command lines. There were various tutorials. Stable Diffusion made image generation accessible if you could play with the command line. It’s a little geeky but not over the top. There was even a free user interface, DreamStudio, where you could play for free for the first 100 images. After that, you had to pay something like ten dollars to generate the next 100 images. ChatGPT was also a web app. It took just a small registration, and nowadays you have to pay about 20 euros a month to gain access. The interesting thing is that in both cases, you could access a generator in a broad sense in a matter of, let’s say, an hour. You need a bit of experience to start getting the feel of the tool, but it was orders of magnitude less effort than what came before. In terms of true progression, the interesting thing is that these generators have been progressing on two fronts for decades. One front is the dimensionality. You want to be able to generate high dimensional objects in a broad sense. For example, if you want to generate a name for a Roman or a location in England, it is a fairly low-dimensional problem. Something like 10 to 20 dimensions, depending on whether you’re counting the number of letters or syllables. But if you want to generate a piece of text that is one page long, we are talking about something like a few thousand dimensions. If you want to generate an image of a thousand by a thousand pixels, you face something like a three-million-dimensional challenge due to the three primary colors.
It’s a significant increase. The initial iteration of Stable Diffusion was limited to 512 by 512 pixels in terms of capacity. They’re improving it, but this high dimensionality was one significant challenge. The same sort of problem arose with text. Dimensionality plays out in two directions. There’s the amount of text you can use as an input prompt, and it can range from a single line to multiple paragraphs, or even pages. Then there’s the question of how far you can go text-wise before the generator loses any consistency with itself. These models are limited. They can’t generate an entire book end to end with the end being consistent with the beginning. For text generation, one challenge is to navigate these higher dimensions. If you generate one word, you just have to be consistent on a local level. If you generate a sentence, it has to be consistent on a larger scale, and so on. If it’s a book, you’re dealing with maybe millions or tens of millions of abstract dimensions, which can also be seen as degrees of freedom or the complexity of the object you’re examining. The same problem existed with images. One progress avenue is moving towards higher dimensions while maintaining consistency. If you divide the thing, it’s easier to generate two smaller images as opposed to one that’s bigger and consistent.
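The orders of magnitude quoted in this answer can be checked with a few lines of arithmetic. The sketch below is only a rough illustration: counting one dimension per character is a crude convention, the figure of about 3,000 characters per page is an assumption, and the function names are made up for the example.

```python
def image_dims(width, height, channels=3):
    """Independent values in an RGB image: one per pixel per color channel."""
    return width * height * channels

def text_dims(characters):
    """Crude proxy for text: one dimension per character."""
    return characters

examples = {
    "place name (~15 letters)": text_dims(15),
    "one page of text (~3,000 characters)": text_dims(3_000),
    "512 x 512 RGB image": image_dims(512, 512),
    "1000 x 1000 RGB image": image_dims(1000, 1000),
}

for label, dims in examples.items():
    print(f"{label:40s} {dims:>10,d}")
```

Under these assumptions, the 512 by 512 limit corresponds to roughly 0.8 million values, about a quarter of the three million needed for a thousand by a thousand pixels.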
Conor Doherty: So when you speak about these bigger dimensions, you mean the generator should maintain consistency?
Joannes Vermorel: Yes, precisely. The goal is to maintain entanglement and consistency in the generated object, regardless of its size or complexity. Another progress avenue is universality. Are you talking about a generator that’s specific to a narrow problem, or is it a generator that can tackle anything? Over the past five years, the deep learning community has made enormous progress in leveraging gigantic data sets. If it’s text, it covers everything - Wikipedia, web forums, or any other text source. Thus, the generator, when correctly prompted, can produce anything from poetry to legalese, to code, or even genome advice. The same applies to images. We have generators capable of creating anything from pixel art to photorealistic views or even oil paintings. It’s about covering a range of sophistication and style.
Conor Doherty: When you talk about the dimensionality of these applications, how comparable are the outputs? For example, on ChatGPT, how comparable is an essay generated through ChatGPT compared to that which is generated by the average, let’s say, University educated person? Are they comparable levels of sophistication? Are we there yet?
Joannes Vermorel: In terms of sophistication, that’s a difficult question. We would have to define and clarify what we mean by sophistication.
Conor Doherty: Actually, I can jump in there. Let’s say we use the Turing test such that you might not actually be able to determine whether it was generated by ChatGPT or by a student in a classroom.
Joannes Vermorel: It depends, because these models, especially the text generator, work by blending together enormous amounts of corpus. Some people have conducted tests, and to a large extent, the thing that ChatGPT writes is literally a cut-and-paste of stuff that is found somewhere on the web. The power of the model is its ability to glue these pieces together so they’re grammatically and syntactically correct. But it’s essentially about identifying high-level statistical patterns that exist between words, groups of words, and sentences to find things that fit together in ways that are statistically likely or credible. Does it sound like a human? A lot, yes. But the reality is that large parts of what it generates can be found on the web, taken from various sites. However, the breakthrough lies in being able to do this, which was incredibly difficult. It’s not just about cutting and pasting phrases. It’s about understanding high-level statistical dependencies so that they can be blended together in ways that are credible. Yet, when it comes to common sense, as the head of AI at Facebook commented, none of these generators possess the common sense of a cat. That’s the level of understanding we’re dealing with. It’s purely statistical relationships. For instance, ask a basic question such as “How can I avoid the traffic jam in the middle of the Atlantic Ocean?” and it might suggest getting a better route with a newer GPS, completely missing the humor in the question. It’s about gluing pieces of text together based on high-level statistical relationships.
Conor Doherty: I believe researchers at Amazon subjected ChatGPT to a battery of IQ tests and discovered that it was about a standard deviation below the norm, around 83. This seems consistent with what you’re saying here, just gluing pieces of information together that look like they belong.
Joannes Vermorel: But I think you’re missing the point. Even an incredibly unintelligent human, someone who is not brain-dead, is still vastly smarter than a cat. Yet, what’s been postulated, and I tend to agree, is that we are not even close to something as intelligent as a cat. We are still very far. You might say, “Oh, but my cat is completely incapable of telling me anything about, let’s say, the Theory of Relativity.” Yet, ChatGPT is able to do a fairly good job of giving me a couple of paragraphs of introduction. This is because ChatGPT is literally going to cut and paste a nice summarization of this theory from the thousands of instances that can be found on the web, blend them together, and regurgitate. However, that doesn’t mean it understands anything. Even a cat, for example, would understand that if there is something… Let’s use an example with GPT. If you ask your GPT something like, “Three cars need two hours to drive from the city of Paris to the city of Tours. If you have six cars, how much time does it take?” GPT would tell you, “Well, six cars is twice as much as three, so it’s going to take like four hours.” Again, if you think about a cat, and the cat thinks, “If I have a buddy, I want to go over there,” it’s going to take the same time whether there is me or my buddy cat. While the cat is not going to phrase things in a way that is as elaborate, there is some understanding of those very basic things about our three-dimensional universe, time flowing, and so on. Again, GPT is incredibly impressive in its capacity, and the same goes for Stable Diffusion. But you could see that there is this sort of incredibly shallow understanding because all that these models are doing is high-dimensional blending of the input data. Maybe this is sufficient. Maybe if we continue on this path further with even more elaborate models, there is nothing more to intelligence but piling up these sorts of recipes just on a grander scale. 
But I suspect the situation is more complicated than that. I suspect that AI researchers still have plenty of research ahead of them, demonstrating once more that the entire story of artificial intelligence is to clarify what intelligence is not. And that has been like a journey, a journey we’ve been on for the last 70 years or so.
Conor Doherty: Well, I think you said earlier that the current iteration of ChatGPT and Stable Diffusion, or just generative AI, is about an order of magnitude better than previous iterations. How much better would the fourth iteration of ChatGPT have to be to bridge the gap you just described?
Joannes Vermorel: We really don’t know because that’s the thing. Whenever there is a sort of breakthrough, and I believe here the real breakthrough was deep learning, not these applications of deep learning. Deep learning was the breakthrough around 2011-2012. That was the real mathematical, conceptual breakthrough. These are applications and very elaborate insights that were gained through the last decade. But we still don’t really know what we’re missing. It’s still a very open question and you shouldn’t think of it as a linear progression. That’s the problem with intelligence - we don’t know what we are missing. Once we establish a new kind of technique, it lets us even revisit what intelligence means in the first place. If we go back a century in the past and you were to ask, “How can you establish that one person has superior intelligence?” If you were to ask professors in academia, they might say something like, “Well, if this person can invert a matrix or compute the first 20 digits of pi, they have superior intelligence.” Nowadays, people would say that a pocket calculator can do that. It’s a completely mechanical task. There is no intelligence whatsoever in being able to compute the first 20 digits of pi. You have simple recipes that we call algorithms. You can run them with a computer and get thousands of digits. It doesn’t make you smart in any way. This was the situation a century ago, where what was considered as the true reflection of human intelligence turned out to be the easy part of mechanization. Nowadays, computers are literally 10 orders of magnitude, or even 15 orders of magnitude better than humans at doing these calculations, but they are not intelligent at all. At least, that’s the general consensus now. What we have discovered with this generation of AI, with deep learning, is that there are plenty of tasks that on the surface look incredibly difficult or challenging but might not reflect intelligence that much. 
For example, ChatGPT tells more about what intelligence is not compared to what it actually is. What it says is that the amount of latent knowledge in the English language and all human languages is enormous. When you say “latent knowledge,” it means that, let’s say we have this abstract thing which is all the sum total of human knowledge. There are databases, for example, that chemists have collected over the last century. These databases detail the properties of every single chemical compound. So, you have an entire database that lists the resistivity of every single material known on Earth, or the fusion point of every material on Earth. We have maps that collect knowledge in another form. There is also some sort of latent knowledge in the language itself. The words we use reflect a large understanding that we have about the universe. If we say that there are stars and planets, and that planets orbit around stars, it means that we have already understood a lot about the universe. For example, the ancient Greeks had a different understanding of what stars and planets were. To postulate that the sun is a star, like all the other stars, is now accepted and is part of the vocabulary. This is part of the latent knowledge. If you were just to look at the definitions given in a dictionary, you would learn a lot about what there is to be learned from modern sciences. The words themselves tell you about the state of knowledge. Conversely, sometimes having a word missing prevents some sort of knowledge from even existing. A peculiar example of this situation would be the book “Antifragile” by Nassim Taleb. The basic premise of the book was to define the actual opposite of fragile. Fragile, in his definition, is something that will worsen when subject to chaos and disorder. He argued that being durable, hard, or sturdy doesn’t exactly make something the opposite of fragile. 
These characteristics only mean that under chaos and disorder, it will decay or degrade at a slower pace. Taleb wondered what the true opposite would be, something that when subjected to chaos and disorder, would improve. This abstract perspective led him to coin the term ‘anti-fragile’, creating a whole new perspective on how to look at ecosystems, human societies, and many other things. By introducing this one word, he enriched our knowledge, though this might be hard to grasp because the way we communicate knowledge is through language itself.
Conor Doherty: That brings us back to my starting point. The brilliance of ChatGPT demonstrates that there is an enormous amount of latent knowledge in the language itself. This explains, for instance, why a politician can give you ten buzzwords of the day that correspond to the causes you want to defend. They can spin off an entire discourse based on that and appear as though they’re saying something intelligent while providing absolutely no substance.
Joannes Vermorel: Interestingly, this is what ChatGPT does. When you give the tool a prompt, it tends to piece together all sorts of widely accepted ideas that align with common sense or the dominant established perspective. Imagine if you had someone who would only answer your questions using proverbs. ChatGPT does that, but better, by stringing together platitudes from literally every single domain. It’s impressive because you’re usually not even familiar with what would be a platitude in a domain you know nothing about. This is the beauty of training a generator against a super massive dataset that includes millions of pages of text from super diverse fields.
Conor Doherty: When it comes to actually applying all of this, in your opinion or your estimation, are there any useful applications of generative AI when it comes to, let’s say, enterprise or supply chain?
Joannes Vermorel: Enterprise is a very broad field, so I’ll stick with supply chain. For supply chain, I would say most likely not, not directly at least. But it is incredibly hard to predict the future. The reason I’m inclined to think that this wave of generators will not have a massive impact on supply chain is that the strength of these generators is that they tap into a massive pool of ambient knowledge which is basically the web, with all those images and tags that you can access for free. But when it comes to the optimization of a supply chain, the data that is most relevant is your transactional history. If you’re selling, let’s say door frames, it doesn’t really help you when it comes to supply chain planning to know a lot of general things about door frames. Your sales history of door frames from last year tells you a lot more about what exactly you should order, produce, and how you should allocate the stock. So, the data that is most relevant is not exactly shared openly with the world. It exists in the silo of your company. Tools like ChatGPT are skewed by the fact that they perform better when discussing topics for which a lot of material is publicly available online. If you discuss things that are not widely published online, ChatGPT quickly becomes ignorant of that. Very concretely, I would say, if you think of the methods that could be used to do any kind of optimization, I’m not too sure these tools would help, simply because the relevant inputs are not there. However, these tools could potentially become instrumental in supporting your development. For example, ChatGPT is actually quite good at helping you generate code snippets. Because code is just another language, meaning a sequence of characters, ChatGPT can generate text, but also code. Due to the fact that there is a gigantic amount of code that is available online, mostly through GitHub but plenty of other places, you have massive codebases that are readily available for ChatGPT to train on.
Thus, ChatGPT is actually capable of composing halfway decent code snippets or programs. As a productivity tool for programmers, there is a lot of potential. But, beware, the code that ChatGPT generates can be as bogus as the code written by humans. I would not use it without careful supervision if you want to engineer the next generation of autopilot for an aircraft or a car. Also, I suspect that the sort of technology that will come out will be things like meeting minute records. Right now, I’m not too sure that ChatGPT would be able to summarize a two-hour discussion into something like two pages while preserving the maximum amount of details about what was said. But, similar tools, I’m pretty sure that within the next decade, will be able to do that. So for supply chain, there will be plenty of benefits. However, I suspect most of them will be kind of on the fringe, on the periphery, things like facilitating meetings, note-taking, or better systems to proofread the documents. But the core problems and challenges are in the data, and those generators don’t handle the data as it presents itself in supply chains.
Conor Doherty: Aren’t there other programs designed specifically for coding? I mean, ChatGPT is a text-based generative AI, but there’s GitHub Copilot that’s designed to assist in coding, and it can produce pretty decent code by itself, right?
Joannes Vermorel: No, those models are almost identical, almost interchangeable. The sort of technology that is behind them is incredibly similar. They use the same Transformer architecture. The only differences are slight variations in the corpus and the user experience. GitHub Copilot aims to provide an auto-completion at each keystroke, while ChatGPT is more oriented towards dialogue. But the differences are really just a thin layer of veneer on top. Underneath, they are the same. I suspect the best tool for code completion will be built on a corpus that is broader than code. This is illustrated by a recent paper published by a team at Amazon. They presented a promising generator that combines both image and text data, essentially unifying them. They even claim to outperform ChatGPT on a few benchmarks, with comparable results on most other metrics. However, take this with a grain of salt, as determining what makes a good generator is a problem as complex as crafting the generator itself. What’s interesting though, is that their model is as effective as ChatGPT, but with a billion parameters, whereas ChatGPT has almost 100 times more parameters. This suggests that by blending more diverse sorts of data, you can have a model that is more powerful and simpler, which is paradoxical. For instance, the ChatGPT model is gigantic, with a parameter range in the trillions. But it’s unclear whether such an enormous model is necessary. In fact, one of the breakthroughs of Stable Diffusion, compared to other models, was a model that is two orders of magnitude faster and leaner than the Generative Adversarial Network it replaced. Stable Diffusion only has about a billion parameters, making it very small compared to ChatGPT. But, a team recently claimed that they’ve reproduced the performance of ChatGPT with a model that is much smaller, roughly the size of a billion parameters.
This is interesting because it’s about the same size as what can be operated with a graphic card commonly found in notebooks and workstations nowadays.
Conor Doherty: Well, this kind of brings us back to the full circle to what I said right at the start or in the intro overall, is this a net positive or net negative? Now, in the specific context of enterprise or even more granular supply chain, do you see this, generative AI, as a distraction, a boon or a curse?
Joannes Vermorel: As a general line of thinking, my take is that any progress in terms of science and technology is good. I don’t have this Lovecraftian perspective, you know, where there are some profound or deep truths of the universe that are so brutal and so hostile to the human mind that if you discover them, you go insane. My take is not a Lovecraftian one. I believe that generally, it is a good thing. It is certainly better than ignorance. Now, like any tool since the Stone Age, the first hammer could be engineered to hunt an animal or to kill your fellow humans. So, the problem with technology is that it can be misused. It’s been a problem for thousands of years. These sorts of tech can also be misused. The probable misuses in the realm of supply chain enterprise software are going to be an increase in confusion due to noise. Vendors are already hyping AI like crazy, and now they will even be able to turn the thing up to eleven by having their marketing department just spin endless fake case studies. In the past, creating a fake case study took some effort. Yet, you could fake it entirely because nobody is ever going to check your claims. Most of your claims are impossible to verify. And, as I described in my lecture, nobody in a case study has any incentive to say that all the millions that you claim to have saved or earned or generated are fake. Everybody that is part of a case study has a massive incentive to say, “Yeah, everything, all those benefits are true, and it’s all thanks to me, at least in part if we manage to achieve all of that.” So my take is that the situation will become even more muddy because these teams are going to go berserk and generate even more bogus case studies and claims and vapid pages that describe the tech. I’ve spent some time on the websites of many Lokad competitors. The interesting thing is that you can have entire pages of text where you read it and at the end, you haven’t learned anything.
They manage to spin platitudes or stuff that doesn’t give away anything about what it is they are actually doing.
Conor Doherty: Flimflammery, is that what we’re saying?
Joannes Vermorel: Yes, exactly. I’m always a bit baffled when I go through a 10-page long documentation about AI for supply chain, and at the end, I can’t say anything about what it is, what it does, why it was engineered that way, or what sort of insights preside over its design. That’s kind of baffling. I suspect that in the past, marketing teams spent days coming up with these fluffy descriptions. Now, using generative AI, such as ChatGPT, a ten-page description can be instantly created. So, if you’re questioning the validity of vendor content that claims to use AI for supply chain optimization, I would say it’s mostly suspect. Not because AI is bogus, but because it’s being misrepresented in this context. When we talk about generative AI, specific terms are used, like Stable Diffusion, Transformer architecture, and generative adversarial network. These techniques have names. Professionals in this field don’t simply say “I’m doing AI”. They’re more precise. They need these terms to describe their work. This precision develops as part of an emerging process within the community. People who can’t be bothered to describe their technology in detail often resort to vague terms. Let’s take a simple example. If you want to buy a window for your house, the seller will specify the material of the frame, the number of glass layers, and so on. If a seller just says “I’m selling windows, trust me they’re good” without any specifics, that’s questionable. If someone can’t give you technical specifications and instead uses buzzwords like “sustainable”, it’s not clarifying anything. It’s adding more puzzles. This is analogous to what happens with AI and ChatGPT. These tools might generate confusing marketing materials and give vendors the ability to include them in their tech stack without creating anything substantial.
It’s quite easy to integrate these tools into an existing software architecture, but it’ll be a gadget if your existing software architecture hasn’t been designed to optimize the technology’s capabilities. It’s always somewhat easy to duct tape one more piece onto software, but that doesn’t mean it’ll make a difference or be useful. Therefore, I believe this situation will create further confusion. It will give one more opportunity to vendors to plug some sort of real value algorithms, but in ways that are nonsensical. In the end, this doesn’t add any value to the solution, which is another problem. We’ve already experienced several iterations of this with operational research 50 years ago, then Data Mining, and then data science. Now there will be those cognitive AI iterations. However, the issue is, if you want to make the most of this technology as enterprise software, it cannot just be an add-on. It has to come at the design level of your product. It’s a core design that you cannot change afterwards. The problem with the core design of products is that it’s something that you can only do at the beginning. You can’t just duct tape that into your product after the fact.
Conor Doherty: Can you give an example of the core design level that you’re discussing?
Joannes Vermorel: If you have a system where at the core of your system, you have a transactional database designed to ensure transactional integrity, it’s great. But this design is not going to do anything to leverage any kind of image or text generator. It’s completely at odds with the transactional perspective. You’re dealing with transactions, but having some sort of tool that can generate text or image, it’s not even the same realm. So, what I’m saying is that having something that fits is not a given. It usually requires extensive care about the design and the guiding principles for your architecture, so that things fit at all. Otherwise, you’re just on separate tracks. In software, what is misleading is that it’s always possible to have a product and then have an add-on that sits on the side. But it’s not properly integrated, not connected, and there is no synergy between the two. You just have a more complicated mess with more moving parts and more bugs. So, on balance, I would advise against trying to integrate this into supply chain optimizations. But if a vendor comes forward with that, you really need to probe what it is that they’re doing. My parting advice for the audience would be: make sure if you read the technology page of this vendor, that it makes sense to you. You don’t have to be an expert. If the vendor is not capable of conveying in a way that makes sense what their technology is and what it does, and what sort of techniques it uses, it’s most likely a red flag. I’ve never seen in my entire career that a company capable of achieving something difficult hides it. On the contrary, companies that manage to reach this point are more than happy to put their accomplishments on display for the world to witness. By the way, that is true for all those models - Stable Diffusion, ChatGPT, etc. Those achievements are public. There have been papers published about them. These are not well-kept secrets.
On the contrary, the companies that manage to reach this point of technical achievement often publish very detailed papers on how they have accomplished it. This is a very typical behavior. From my perspective, the fundamental advice is that while there is a lot of value in these techniques, “AI” as a term is merely a buzzword. You can categorize almost anything under this umbrella. Therefore, whenever a vendor approaches you, it’s essential to understand what exactly they do. If the person selling to you doesn’t have this understanding, and if the vendor claims ignorance saying, “I’m just a salesperson, it’s the tech team who knows,” don’t trust them. If they say such things, it indicates that there is no substantial technology behind their claims. This is a time-tested technique that’s been in use for decades: claiming to have hired Nobel laureates, boasting of having ‘Einsteins’ in the back room, telling you to trust them because their team is incredibly smart. However, if they profess ignorance about the tech and assure you that it’s the rest of the team who knows, that almost guarantees that there is no substance to their claims. It’s just more of the same.
Conor Doherty: Well, on that note, thank you, Joannes. I’ve learned quite a bit. Thank you for your time, and thank you all for watching. We’ll see you next time.