I planned to give a talk on applications of AI at a local cafe. Unfortunately, not a single person signed up. So as not to waste all the effort I put into preparing, I turned my slides into a blog post. That is why it’s going to read a bit like slides.
😨 Will AI Threaten Humanity?
🤖 What is AI?
Artificial Intelligence (AI) is a broad term encompassing various technologies. Here are some key areas:
- 📖 Machine Learning: Systems that learn from data to make predictions or decisions. For example, Netflix uses ML to analyze your viewing history and recommend new shows and movies you might enjoy.
- 💬 Natural Language Processing (NLP): Technology that enables computers to understand and generate human language. Virtual assistants like Siri and XiaoAi can understand voice commands, answer questions, and even engage in conversations.
- 📷 Computer Vision: Systems that can understand and process visual information. This technology powers facial recognition in your phone’s camera, helps autonomous vehicles identify road signs, and enables augmented reality applications.
- 🚗 Autonomous Driving: Self-driving vehicles that can navigate roads and traffic. Companies like Baidu are developing cars that can drive themselves using a combination of sensors, cameras, and AI algorithms.
- 🦾 Robotics: Intelligent machines that can perform physical tasks. Tesla’s factories use sophisticated robots for manufacturing, assembly, and quality control.
- 🏥 Medical Imaging Analysis: AI systems that can detect diseases in X-rays, MRIs, and other medical images with high accuracy, helping doctors make better diagnoses.
- 📈 Financial Forecasting: Banks and financial institutions use AI to predict market trends, detect fraud, and automate trading decisions.
Two Types of Generative AI
In this post, I will focus on two types of generative models that have seen rapid development in recent years:
Large Language Models (LLMs)
These are AI systems that can understand and generate human-like text. Given a prompt or partial text, LLMs can generate contextually appropriate and coherent content. For example:
Input: “Why are math books always sad?”
LLM Output: “Because they have too many problems!”
Diffusion Models
These are AI systems that can generate images from text descriptions. During training, noise is gradually added to real images so the model can learn to undo it. To generate a new image, the model starts from random noise and removes it step by step, guided by the text description.
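To make the “adding noise” half of the idea concrete, here is a minimal sketch in plain Python. It uses the standard noising formula from the diffusion literature on a made-up list of eight “pixel” values. The names `add_noise` and `alpha_bar` are my own choices for illustration, and the hard part (a trained neural network that removes the noise again) is not shown.

```python
import math
import random

def add_noise(pixels, alpha_bar, rng):
    """Blend a clean signal with Gaussian noise.

    alpha_bar close to 1.0 keeps most of the image;
    alpha_bar close to 0.0 leaves almost pure noise.
    """
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)
# Toy "image": a simple alternating pattern (an assumption for illustration).
image = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

# The pattern is easy to see at 0.99 and mostly buried in noise at 0.01.
for alpha_bar in (0.99, 0.5, 0.01):
    noisy = add_noise(image, alpha_bar, rng)
    print(alpha_bar, [round(v, 2) for v in noisy])
```

A real image generator learns to run this process in reverse, one small denoising step at a time.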
4️⃣ The Four Giants of LLMs
Currently, there are four major players in the global LLM space:
- OpenAI - Creator of ChatGPT
- Anthropic - Developer of Claude
- Google DeepMind - Behind Gemini
- Meta (Facebook) - Leading open-source efforts with LLaMA
Other significant players include:
- Grok by xAI
- Mistral AI with their Mistral model
- Baidu with Ernie
- Alibaba with Tongyi Qianwen
- Tencent with Hunyuan
🤔 How Should We View AI?
There are several perspectives on AI:
😐 “It doesn’t affect me”
Many people believe AI is distant from their daily lives, not realizing that AI is already integrated into many of their everyday activities.
😍 “AI is Amazing!”
Enthusiasts point to:
- Significant productivity improvements in various industries
- Liberation from repetitive tasks
- Personalized art and creative content generation
- Scientific breakthroughs enabled by AI
- Potential solutions for environmental challenges
😠 “AI is Problematic!”
Critics worry about:
- AI’s current limitations and mistakes
- Over-hyping by tech companies
- Copyright infringement issues
- Spread of misinformation and deep fakes
- Potential job displacement
- Environmental impact of AI computing resources
😱 “AI is Scary!”
Concerned voices highlight:
- Privacy concerns and data collection
- Risks of AI systems getting out of control
- Existential threats to humanity
🤪 “Nobody Really Knows!”
Perhaps the most honest perspective is that we’re all speculating about AI’s future impact. Like any powerful technology, its effects will likely depend on how we choose to develop and use it.
📈 Data Analysis
AI has transformed data analysis into a quick and accessible process. Here’s a practical example: Imagine taking photos of restaurant receipts with your iPhone.
AI can instantly:
- Convert these receipts into structured data
- Create organized tables
- Perform detailed analysis
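To give a feel for what “structured data” means here, below is a small sketch. The receipt rows are hypothetical, standing in for what an AI model might extract from the photos, and `spend_by_restaurant` is just an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical rows, as an AI model might extract them from receipt photos.
receipts = [
    {"restaurant": "Noodle House", "date": "2024-11-02", "total": 18.50},
    {"restaurant": "Cafe Luna", "date": "2024-11-05", "total": 9.25},
    {"restaurant": "Noodle House", "date": "2024-11-09", "total": 22.00},
]

def spend_by_restaurant(rows):
    """Add up the total spent at each restaurant."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["restaurant"]] += row["total"]
    return dict(totals)

# Print restaurants from highest to lowest spend.
totals = spend_by_restaurant(receipts)
for name, amount in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {amount:.2f}")
```

Once the receipts are in this shape, questions like “where do I spend the most?” become a few lines of code, or a single follow-up question to the AI.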
🖼️ Image Recognition
The evolution of image recognition technology has been remarkable. A famous 2014 XKCD comic illustrated the stark contrast between simple and complex image recognition tasks:
What seemed like a five-year challenge in 2014 can now be done in minutes. Modern AI can not only recognize static images but also analyze video content in real-time, understanding complex scenes and actions.
🎨 AI Art Creation
AI Photography
The line between AI-generated and real photographs has become increasingly blurred:
AI Painting and Art Styles
Modern AI art generators (Diffusion Models) offer various artistic styles:
- Stable Diffusion: Open-source, highly customizable
- DALL-E 3: Integrated with ChatGPT, user-friendly
- Midjourney: Excellent for artistic style generation
- Flux: Focuses on speed and efficiency
- Imagen (Google): Specializes in high-fidelity images
- Firefly (Adobe): Integrated with Creative Cloud
Here are examples of different styles:
The Challenge of Authenticity
A concerning trend has emerged: when searching for certain images (like “baby peacock”) on Google, many results are AI-generated:
This raises important questions about visual authenticity in the digital age. The old saying “seeing is believing” may no longer apply, making reliable source verification more crucial than ever.
🎵 Music and Audio Creation
AI has ventured into music creation with services like Suno and Udio. These platforms can generate both lyrics and complete songs:
🎬 Video Generation
Video generation AI has made remarkable progress with tools like:
- OpenAI’s Sora
- Google’s Veo
- Meta’s Movie Gen
- RunwayML
🗣️ Voice Processing
Speech Recognition and Interaction
Modern AI systems like OpenAI’s Whisper have achieved impressive accuracy in speech recognition:
Real-time Voice Interaction
What was once science fiction is now reality with tools like ChatGPT voice mode and Gemini Live:
These technologies enable:
- Real-time language teaching and translation
- Virtual customer service
- Personalized education assistance
- Smart personal assistants
- Mental health support and companionship
While there are still challenges with latency and connection stability, the technology continues to improve rapidly.
Voice Summaries and Accessibility
Tools like Google NotebookLM can create podcast-style summaries of written content:
AI voice technology also offers promising applications for accessibility:
- Image-to-speech descriptions for visually impaired users
- Sign language translation to text or speech
- Assisted reading through voice synthesis
- Audiobook creation
- Emotional content in text-to-speech
- Voice cloning and modification
- Real-time translation captioning
- Meeting transcription and summarization
🚨 A Note of Caution
While these AI capabilities are impressive, they also present new challenges. In early 2024, a finance employee at Arup’s Hong Kong office fell victim to a sophisticated scam that used deepfake technology to impersonate company executives in a video conference, resulting in fraudulent transfers of about $25 million (Arup was publicly identified as the victim in May 2024). This serves as a reminder to remain vigilant as AI technology becomes more sophisticated.
As AI continues to evolve, it’s crucial to maintain a balance between embracing its benefits and being aware of potential risks. The future of AI holds immense promise, but it requires responsible development and usage to ensure it benefits society as a whole.
💡 AI Learning Guide: Understanding Large Language Models
Note: I highly recommend Ethan Mollick’s Thinking Like an AI blog post. Most of the content in this part is a summary of that post.
🤖 How Do LLMs Write?
Imagine LLMs as master players of a sophisticated word association game. They’re like incredibly skilled predictors of what comes next in any given text:
- When you write “Today’s weather is…”, the LLM considers multiple possible continuations like “nice,” “terrible,” or “sunny”
- Each prediction is based on patterns learned from millions of similar phrases
- The model weighs the different possibilities and then picks one, usually with a little randomness rather than always taking the single most likely word
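Here is a toy version of that word association game. The three probabilities are made up for illustration. A real model scores tens of thousands of possible tokens using a neural network, not a hand-written table.

```python
import random

# Hypothetical probabilities for what follows "Today's weather is".
next_word_probs = {"nice": 0.5, "sunny": 0.3, "terrible": 0.2}

def pick_next_word(probs, rng):
    """Sample one continuation, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(42)
for _ in range(5):
    print("Today's weather is", pick_next_word(next_word_probs, rng))
```

Likelier words show up more often, but the less likely ones still get picked now and then.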
✏️ How Do LLMs Create Complete Texts?
Think of an LLM as a storyteller crafting a narrative one word at a time:
- Each word is carefully chosen based on all the previous words, creating a coherent flow
- Changes in earlier content can dramatically alter the direction of the text
- This sequential process explains why asking the same question multiple times might yield different responses
- The model maintains context awareness throughout the generation process
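The one-word-at-a-time loop can be sketched with a tiny “bigram” model. This is a big simplification: it only looks at the last word, while a real LLM looks at everything written so far. The training text and function names are made up for illustration.

```python
import random
from collections import defaultdict

# Tiny made-up training text; real models learn from vastly more data.
corpus = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."

def build_bigram_table(text):
    """Record which words were seen following each word."""
    table = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate(table, start, max_words, rng):
    """Extend the text one word at a time, each choice based on the last word."""
    output = [start]
    while len(output) < max_words and output[-1] in table:
        output.append(rng.choice(table[output[-1]]))
    return " ".join(output)

table = build_bigram_table(corpus)
print(generate(table, "the", 8, random.Random(1)))
print(generate(table, "the", 8, random.Random(2)))  # another seed may give another sentence
```

Because every word depends on the ones before it, one different early choice can send the rest of the sentence somewhere else entirely.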
📚 Where Do LLMs Get Their Knowledge?
Picture LLMs as voracious readers who have absorbed information from vast libraries of content:
- 📱 Web Articles: From news sites, blogs, educational platforms, and online forums
- 📖 Books and Magazines: Including literature, textbooks, and specialized publications
- 🔬 Scientific Research: Academic papers, research publications, and technical documentation
- The model processes and interconnects this knowledge to provide informed responses
🧠 How Long is an LLM’s Memory?
LLMs have memory limitations similar to human short-term memory:
- 🆕 Each conversation starts fresh - like waking up with no memory of previous chats
- 💭 They can only work with information provided in the current conversation
- ⚠️ Content beyond their context window is inaccessible during the interaction
LLM Model | Context Window (Memory Length) | Equivalent To |
---|---|---|
GPT-4 Turbo | 128K tokens | 📚 ~300 pages |
Claude 3.5 | 200K tokens | 📚 ~500 pages |
Gemini 1.5 Pro | 2M tokens | 📚 ~5000 pages |
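If you are curious where page counts like these come from, here is a rough back-of-the-envelope calculation. Both constants are assumptions: about 0.75 English words per token is a common rule of thumb, and 300 words per page is a typical figure for a printed page. Actual numbers vary by language and tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English text
WORDS_PER_PAGE = 300    # assumed typical page length

def tokens_to_pages(tokens):
    """Estimate how many pages of text fit in a context window."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE

for label, tokens in [("128K", 128_000), ("200K", 200_000), ("2M", 2_000_000)]:
    print(f"{label} tokens is about {tokens_to_pages(tokens):.0f} pages")
```

Under these assumptions the estimates come out at roughly 320, 500, and 5000 pages.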
💡 How Can We Better Work with LLMs?
Practical tips and future possibilities:
Current Best Practices:
- 🔄 If the LLM gets stuck or confused, don’t hesitate to restart the conversation
- 🎨 Experiment with different prompting styles to encourage more creative and precise responses
- ⚖️ Remember that LLMs have limitations - they’re tools, not omniscient beings
- 👋 Practice regularly to develop your prompting skills and understanding of the model’s capabilities
Looking to the Future:
- 📈 Context windows will continue to expand, allowing for longer conversations and more complex tasks
- 🤝 Future models may better understand and remember user preferences across sessions
- ♾️ Enhanced capabilities in mathematics, programming, and logical reasoning
📖 Recommended Reading: Thinking Like an AI by Ethan Mollick
⌛ Master AI in 10 Hours
I quite like the following quote from Ethan Mollick:
The most effective way to understand AI is through hands-on experience. For about 10 hours, immerse yourself in experimenting with AI—try out tasks you typically do for work or fun, explore its quirks, and ask it unexpected questions. This practical exposure will teach you far more than reading articles alone. Through this, you’ll gain an in-depth understanding of AI’s strengths and limitations, potentially uncovering surprising insights along the way.
💭 What’s Next?
What would you like AI to do for you? I think that by using AI in our daily lives in a productive and ethical way, or at least by understanding its strengths and limitations, we the users can also influence the future.