AI is great! The entire spectrum of AI, from prediction to automation, from classical algorithms to generative AI, and the applications powered by Machine Learning (ML), is generally understood as a technological advancement that can benefit society overall if applied with caution.
However, there’s an ‘if’ and a ‘but’ in this proposition.
It is worth examining the common misconceptions around AI. The question is no longer simply which jobs and tasks will soon be fully automated by AI-controlled systems; much of the general apprehension on that front has eased. Most people now understand that while some routine jobs may disappear, new high-value jobs can be created, existing roles can be enhanced, and AI can be harnessed positively to improve our lives.
That being said, it’s essential to perform a thorough SWOT analysis (Strengths, Weaknesses, Opportunities, Threats) of the current state of AI.
To tell this story, let’s reorder the analysis into Opportunities, Strengths, Weaknesses, and Threats (OSWT).
With Large Language Models (LLMs), we can achieve a lot if we genuinely understand how they work. For instance, ask ChatGPT to explain Einstein’s General Theory of Relativity and you get a reasonably accurate response. At the end of the day, however, ChatGPT is still ‘just’ a computer program (as are all other LLMs), blindly executing its instruction set. It understands Einstein’s General Theory of Relativity no better than your favorite pet does.
“Unfortunately, we use ‘human-like’ terms to describe the techniques AI engineers employ to build AI models and functions. For example, we talk about ‘machine learning’ and ‘training’ when describing how LLMs are built. This is misleading, because an LLM’s brain is not human-like,” clarified Keith Pijanowski, a Senior Technologist and AI/ML SME at MinIO, a company renowned for its open-source, high-performance object storage for cloud-native workloads. Herein lies the trap.
Pijanowski points to a subtle question: how can a chatbot that cannot think summarize the work of one of history’s most intelligent humans? If we can understand this seemingly contradictory nature of LLMs, we can uncover more opportunities to use them in ways that still require further exploration.
The beauty of LLMs lies in their ability to capture the probability distribution of words in the training set used to create them. If the training set is extensive (like a collection of Wikipedia articles or public code on GitHub), the models hold a reservoir of words and associated probabilities that makes their outputs appear as if they understand the real world. But the output is still just text.
If we take an example from philosophy and ask ChatGPT, “What does ‘cogito, ergo sum’ mean, and who wrote it?” we get a result similar to the following text:
“‘Cogito, ergo sum’ is a Latin philosophical proposition, translated into English as ‘I think, therefore I am.’ The statement is associated with the famous French philosopher, mathematician, and scientist René Descartes, who published it in his 1637 work ‘Discourse on the Method.’ The phrase reflects Descartes’ attempt to establish a foundational truth that cannot be doubted – the existence of a thinking being as proof of one’s own existence.”
Here we see the strength element: as mentioned earlier, LLMs generate such results by drawing on a probability distribution learned from billions of training examples. Pijanowski explained, “It works something like this: given the question as a prompt, the model looks at the text and determines that the word ‘Cogito’ is most likely the first word of the answer. From there, it looks at the question plus the first word of the answer and finds the word with the highest probability of coming next.
This continues until a special end-of-answer token has the highest probability. There is nothing to fear in this ability to generate natural-language responses from a probability distribution – it is something to be leveraged for business value. Results are even better when you use newer techniques. For example, with Retrieval Augmented Generation (RAG) and fine-tuning, you can teach an LLM about your specific business.”
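The autoregressive loop Pijanowski describes – pick the most probable next word, append it, repeat until an end-of-answer marker wins – can be sketched with a toy model. The tiny bigram table below is hand-built and every probability is invented for illustration; a real LLM learns a vastly larger distribution from billions of examples.

```python
# Toy illustration (not a real LLM): generation picks the next token from a
# probability distribution conditioned on the previous token, appends it,
# and repeats until the end-of-answer token is the most likely continuation.
# All tokens and probabilities below are made up for the example.
BIGRAM_PROBS = {
    "<start>": {"Cogito,": 0.9, "The": 0.1},
    "Cogito,": {"ergo": 1.0},
    "ergo": {"sum": 1.0},
    "sum": {"<end>": 1.0},
    "The": {"phrase": 1.0},
    "phrase": {"<end>": 1.0},
}

def generate(model, max_tokens=10):
    token, out = "<start>", []
    for _ in range(max_tokens):
        dist = model[token]
        # Greedy decoding: take the continuation with the highest probability.
        token = max(dist, key=dist.get)
        if token == "<end>":
            break
        out.append(token)
    return " ".join(out)

print(generate(BIGRAM_PROBS))  # Cogito, ergo sum
```

Real systems usually sample from the distribution rather than always taking the maximum, which is why the same prompt can yield different answers.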
For Pijanowski and his team, the weaknesses are also clear – a realization that comes from their experience working with MinIO users. LLMs cannot think, comprehend, or reason, and that is their fundamental limitation.
“The ability of language models to reason about a user’s query simply is not there. They are probability machines that produce a decent guess at an answer. No matter how good the guess is, it is still a guess, and what they generate from those guesses will sometimes be something that is not true.
In generative AI, this is known as hallucination,” suggested Pijanowski. “When models are properly trained, hallucinations can be minimized. Techniques like fine-tuning and RAG also reduce them. The key point is that proper training, fine-tuning, validation, and contextual grounding (RAG) all require data – and an infrastructure that can store it at scale and serve it with performance.”
The most popular use of LLMs is undoubtedly generative AI. Unlike other AI use cases, which produce a specific prediction that can easily be validated against a known result, generative AI does not produce an answer that can be compared with one.
“Testing models for image identification, classification, and retrieval is straightforward. But how do you test LLMs used for generative AI in a way that is impartial, fact-based, and scalable? If you are not an expert, how can you be sure the complex responses of an LLM are correct? And even if you are an expert, you cannot be part of the automated tests that run in a CI/CD pipeline,” clarified Pijanowski, pointing out a potential danger in this area.
Fortunately, there are some industry standards that can help. GLUE (General Language Understanding Evaluation) is used to assess how models perform on human-language tasks. SuperGLUE is an extension of GLUE that introduces more challenging language tasks, including reasoning, question answering, and more complex linguistic phenomena.
“While the standards above are helpful, a big part of the solution should be an organization’s own approach to collecting its data. Save every question and answer, and build your own tests from the results that matter to your needs. For that, you will also need a data infrastructure built for scale and performance,” Pijanowski concluded. “Looking at the strengths, weaknesses, opportunities, and threats of LLMs (now back in SWOT order), if we want to capitalize on the strengths and opportunities and minimize the weaknesses and threats, we need a data and storage solution that can handle all of it.”
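The “build your own tests” advice can be wired into a CI/CD pipeline as a golden set of question/expected-fact pairs checked on every model release. The sketch below is hypothetical: `ask_model` is a canned stand-in for a real call to your deployed LLM, and the golden set would come from the questions and answers your organization collects.

```python
# Sketch of an organization-specific evaluation: a curated "golden set" of
# questions, each with facts the answer must contain. In CI, a non-empty
# failure list would fail the build.
def ask_model(question: str) -> str:
    # Placeholder for a real LLM call; returns canned text for the demo.
    canned = {
        "Who wrote 'Cogito, ergo sum'?":
            "It was written by Rene Descartes in 1637.",
    }
    return canned.get(question, "")

GOLDEN_SET = [
    ("Who wrote 'Cogito, ergo sum'?", ["descartes", "1637"]),
]

def run_eval(golden):
    failures = []
    for question, required_facts in golden:
        answer = ask_model(question).lower()
        # Record any expected fact missing from the model's answer.
        missing = [fact for fact in required_facts if fact not in answer]
        if missing:
            failures.append((question, missing))
    return failures

print(run_eval(GOLDEN_SET))  # [] when every required fact appears
```

Substring checks are the simplest possible grader; teams often layer on semantic-similarity scoring or an LLM-as-judge, but the storage requirement is the same: every question, answer, and verdict needs to be kept and queryable at scale.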
Don’t forget, SWOT also stands for success without tears.
Although conducting a SWOT analysis of AI (in any order) is fairly straightforward, even commonplace, and deserves a follow-up audit of the facts and functions involved, these technologies are advancing rapidly, and it is a cautionary diagnostic exercise we should repeat on a regular basis.
Read More: Best AI Chatbots