What Can AI Do?

I planned to give a talk on applications of AI at a local cafe. Unfortunately, not a single person signed up. So as not to waste the effort I put into preparing the talk, I turned my slides into a blog post.

You can check out my original slides (in Chinese) here.

😨 Will AI Threaten Humanity?

🤖 What is AI?

Artificial Intelligence (AI) is a broad term encompassing various technologies. Here are some key areas:

📖 Machine Learning: Systems that learn from data to make predictions or decisions. For example, Netflix uses ML to analyze your viewing history and recommend new shows and movies you might enjoy.
💬 Natural Language Processing (NLP): Technology that enables computers to understand and generate human language. Virtual assistants like Siri and XiaoAi can understand voice commands, answer questions, and even engage in conversations.
📷 Computer Vision: Systems that can understand and process visual information. This technology powers facial recognition in your phone’s camera, helps autonomous vehicles identify road signs, and enables augmented reality applications.
🚗 Autonomous Driving: Self-driving vehicles that can navigate roads and traffic. Companies like Baidu are developing cars that can drive themselves using a combination of sensors, cameras, and AI algorithms.
🦾 Robotics: Intelligent machines that can perform physical tasks. Tesla’s factories use sophisticated robots for manufacturing, assembly, and quality control.
🏥 Medical Imaging Analysis: AI systems that can detect diseases in X-rays, MRIs, and other medical images with high accuracy, helping doctors make better diagnoses.
📈 Financial Forecasting: Banks and financial institutions use AI to predict market trends, detect fraud, and automate trading decisions.

Two Types of Generative AI

In this post, I will focus on two types of generative models that have seen rapid development in recent years:

Large Language Models (LLMs)

These are AI systems that can understand and generate human-like text. Given a prompt or partial text, LLMs can generate contextually appropriate and coherent content. For example:

Input: “Why are math books always sad?”

LLM Output: “Because they have too many problems!”

Diffusion Models

These are AI systems that can generate images from text descriptions. During training, noise is gradually added to real images and the model learns to reverse that process. To generate a new image, the model starts from random noise and removes it step by step, guided by the text description.

Generating a photo of a hot dog - DALL-E 3
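To make the "adding noise" half of diffusion more concrete, here is a minimal sketch in Python. It is only an illustration: the 4-pixel "image", the `add_noise` helper, and the chosen noise levels are all made up for this post, and the hard part of a real diffusion model (a neural network that learns to remove the noise) is not shown.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Blend a clean signal with Gaussian noise (the forward diffusion step).

    alpha_bar close to 1.0 keeps most of the image;
    alpha_bar close to 0.0 leaves almost pure noise.
    """
    keep = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)
image = [0.0, 0.5, 1.0, 0.5]  # hypothetical 4-pixel "image"

# Lower alpha_bar means a later, noisier step of the process.
for alpha_bar in (1.0, 0.9, 0.5, 0.1):
    noisy = add_noise(image, alpha_bar, rng)
    print(alpha_bar, [round(v, 2) for v in noisy])
```

At `alpha_bar = 1.0` the image comes back unchanged, and as the value drops the original pixels are increasingly buried in noise. Generation runs this idea in reverse, starting from noise and denoising step by step.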

4️⃣ The Four Giants of LLM

Currently, there are four major players in the global LLM space:

OpenAI - Creator of ChatGPT
Anthropic - Developer of Claude
Google DeepMind - Behind Gemini
Meta (Facebook) - Leading open-source efforts with Llama

Other significant players include:

Grok by xAI
Mistral AI with their Mistral model
Baidu with Ernie
Alibaba with Tongyi Qianwen
Tencent with Hunyuan

🤔 How Should We View AI?

There are several perspectives on AI:

😐 “It doesn’t affect me”

Many people believe AI is distant from their daily lives, not realizing that AI is already integrated into many of their everyday activities.

You might not have noticed, but AI is already part of your daily life.

😍 “AI is Amazing!”

Enthusiasts point to:

Significant productivity improvements in various industries
Liberation from repetitive tasks
Personalized art and creative content generation
Scientific breakthroughs enabled by AI
Potential solutions for environmental challenges

I thought AI would let me do more interesting work

😠 “AI is Problematic!”

Critics worry about:

AI’s current limitations and mistakes
Over-hyping by tech companies
Copyright infringement issues
Spread of misinformation and deep fakes
Potential job displacement
Environmental impact of AI computing resources

How can AI help us achieve a better future?

😱 “AI is Scary!”

Concerned voices highlight:

Privacy concerns and data collection
Risks of AI systems getting out of control
Existential threats to humanity

🤪 “Nobody Really Knows!”

Perhaps the most honest perspective is that we’re all speculating about AI’s future impact. Like any powerful technology, its effects will likely depend on how we choose to develop and use it.

📈 Data Analysis

AI has transformed data analysis into a quick and accessible process. Here’s a practical example: Imagine taking photos of restaurant receipts with your iPhone.

AI can instantly:

Convert these receipts into structured data
Create organized tables
Perform detailed analysis
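Once the receipts have been turned into structured data, the table and analysis steps are ordinary programming. The sketch below assumes the AI has already extracted the records; the restaurant names, amounts, field names, and the `total_by` and `print_table` helpers are all hypothetical sample values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical records, standing in for what an AI model
# might extract from photos of receipts.
receipts = [
    {"date": "2024-10-01", "restaurant": "Noodle House", "category": "lunch", "total": 12.50},
    {"date": "2024-10-01", "restaurant": "Harbor Grill", "category": "dinner", "total": 30.00},
    {"date": "2024-10-02", "restaurant": "Noodle House", "category": "lunch", "total": 8.25},
    {"date": "2024-10-03", "restaurant": "Harbor Grill", "category": "dinner", "total": 45.00},
]


def total_by(records, key):
    """Sum the 'total' field, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["total"]
    return dict(totals)


def print_table(records):
    """Print the records as a simple aligned text table."""
    print(f"{'Date':<12}{'Restaurant':<16}{'Category':<10}{'Total':>8}")
    for r in records:
        print(f"{r['date']:<12}{r['restaurant']:<16}{r['category']:<10}{r['total']:>8.2f}")


print_table(receipts)
print(total_by(receipts, "category"))
print(total_by(receipts, "restaurant"))
```

In practice you would simply ask the AI for this kind of breakdown in plain language; the code only shows what "structured data" and "analysis" mean here.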

🖼️ Image Recognition

The evolution of image recognition technology has been remarkable. A famous 2014 XKCD comic illustrated how a task that sounds simple, checking whether a photo shows a bird, was at the time considered a multi-year research problem:

What seemed like a five-year challenge in 2014 can now be done in minutes. Modern AI can not only recognize static images but also analyze video content in real-time, understanding complex scenes and actions.

🎨 AI Art Creation

AI Photography

The line between AI-generated and real photographs has become increasingly blurred:

AI Painting and Art Styles

Modern AI art generators (Diffusion Models) offer various artistic styles:

Stable Diffusion: Open-source, highly customizable
DALL-E 3: Integrated with ChatGPT, user-friendly
Midjourney: Excellent for artistic style generation
Flux: Focuses on speed and efficiency
Imagen (Google): Specializes in high-fidelity images
Firefly (Adobe): Integrated with Creative Cloud

Here are examples of different styles:

Traditional Chinese painting style by DALL-E 3

The Challenge of Authenticity

A concerning trend has emerged: when searching for certain images (like “baby peacock”) on Google, many results are AI-generated:

Google search results showing mostly AI-generated images

This raises important questions about visual authenticity in the digital age. The old saying “seeing is believing” may no longer apply, making reliable source verification more crucial than ever.

🎵 Music and Audio Creation

AI has ventured into music creation with services like Suno and Udio. These platforms can generate both lyrics and complete songs:

‘Green Future’ - An AI-generated song by Suno

🎬 Video Generation

Video generation AI has made remarkable progress with tools like:

OpenAI’s Sora
Google’s Veo
Meta’s Movie Gen
RunwayML

A frame from Sora showing a fashionable woman walking in Tokyo

🗣️ Voice Processing

Speech Recognition and Interaction

Modern speech recognition models such as OpenAI’s Whisper have achieved impressive accuracy:

Whisper’s speech recognition error rates across languages
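Speech recognition accuracy is commonly reported as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the correct one, divided by the number of words in the correct transcript. I am assuming that is the kind of error rate meant here. The sketch below shows how such a number could be computed; the `word_error_rate` function and the example sentences are my own illustration, not part of Whisper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # a reference word is missing (deletion)
                curr[j - 1] + 1,     # an extra word was transcribed (insertion)
                prev[j - 1] + cost,  # words match or differ (substitution)
            ))
        prev = curr
    return prev[-1] / len(ref)


correct = "the cat sat on the mat"
transcript = "the cat sat on a mat"
print(f"WER: {word_error_rate(correct, transcript):.1%}")
```

A lower value is better: 0% means the transcript matches the reference word for word.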

Real-time Voice Interaction

What was once science fiction is now reality with tools like ChatGPT voice mode and Gemini Live:

These technologies enable:

Real-time language teaching and translation
Virtual customer service
Personalized education assistance
Smart personal assistants
Mental health support and companionship

While there are still challenges with latency and connection stability, the technology continues to improve rapidly.

Voice Summaries and Accessibility

Tools like Google NotebookLM can create podcast-style summaries of written content:

AI voice technology also offers promising applications for accessibility:

Image-to-speech descriptions for visually impaired users
Sign language translation to text or speech
Assisted reading through voice synthesis
Audiobook creation
Emotional content in text-to-speech
Voice cloning and modification
Real-time translation captioning
Meeting transcription and summarization

🚨 A Note of Caution

While these AI capabilities are impressive, they also present new challenges. In a case that Arup confirmed in May 2024, a finance employee at the company fell victim to a sophisticated scam that used deepfake technology to impersonate company executives in a video conference, resulting in fraudulent transfers of about $25 million (HK$200 million). This serves as a reminder to remain vigilant as AI technology becomes more sophisticated.

As AI continues to evolve, it’s crucial to maintain a balance between embracing its benefits and being aware of potential risks. The future of AI holds immense promise, but it requires responsible development and usage to ensure it benefits society as a whole.

💡 AI Learning Guide: Understanding Large Language Models

Note: I highly recommend Ethan Mollick’s Thinking Like an AI blog post. Most of the content in this part is a summary of that post.

🤖 How Do LLMs Write?

Imagine LLMs as master players of a sophisticated word association game. They’re like incredibly skilled predictors of what comes next in any given text:

When you write “Today’s weather is…”, the LLM considers multiple possible continuations like “nice,” “terrible,” or “sunny”
Each prediction is based on patterns learned from vast amounts of text
The model weighs the different possibilities and then picks a continuation, usually with some randomness rather than always taking the top choice
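The idea of weighing possible continuations can be sketched in a few lines of Python. The probabilities below are invented for illustration; a real LLM computes them with a neural network over a vocabulary of many thousands of tokens, and the helper names `most_likely` and `sample` are my own.

```python
import random

# Made-up probabilities for the word that follows "Today's weather is".
next_word_probs = {"nice": 0.5, "sunny": 0.3, "terrible": 0.2}


def most_likely(probs):
    """Always pick the single most probable word."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Pick a word at random, in proportion to its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(42)
print("Most likely:", most_likely(next_word_probs))
print("Five samples:", [sample(next_word_probs, rng) for _ in range(5)])
```

Always taking the most likely word gives the same answer every time, while sampling lets less likely words appear now and then, which is one reason LLM output varies between runs.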

✏️ How Do LLMs Create Complete Texts?

Think of an LLM as a storyteller crafting a narrative one word at a time:

Each word is carefully chosen based on all the previous words, creating a coherent flow
Changes in earlier content can dramatically alter the direction of the text
Because each word is chosen with some randomness and every later word depends on it, asking the same question multiple times might yield different responses
The model maintains context awareness throughout the generation process
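Here is a toy version of that word-by-word loop. It is a deliberately tiny stand-in: the `NEXT_WORDS` table is hypothetical and only looks at the single previous word, whereas a real LLM takes all of the preceding text into account.

```python
import random

# Hypothetical lookup table: for each word, the possible next words
# and how likely each one is.
NEXT_WORDS = {
    "<start>": {"the": 1.0},
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sleeps": 0.7, "runs": 0.3},
    "dog": {"runs": 0.8, "sleeps": 0.2},
    "sleeps": {"<end>": 1.0},
    "runs": {"<end>": 1.0},
}


def generate(rng, max_words=10):
    """Build a sentence one word at a time until <end> is reached."""
    words = []
    current = "<start>"
    for _ in range(max_words):
        options = NEXT_WORDS[current]
        current = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if current == "<end>":
            break
        words.append(current)
    return " ".join(words)


# Different random seeds can lead to different sentences.
for seed in range(3):
    print(generate(random.Random(seed)))
```

Notice how an early choice ("cat" or "dog") changes which words are likely afterwards, which mirrors how a small change early in a text can send an LLM in a different direction.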

📚 Where Do LLMs Get Their Knowledge?

Picture LLMs as voracious readers who have absorbed information from vast libraries of content:

📱 Web Articles: From news sites, blogs, educational platforms, and online forums
📖 Books and Magazines: Including literature, textbooks, and specialized publications
🔬 Scientific Research: Academic papers, research publications, and technical documentation
The model processes and interconnects this knowledge to provide informed responses

🧠 How Long is an LLM’s Memory?

LLMs have memory limitations similar to human short-term memory:

🆕 Each conversation starts fresh - like waking up with no memory of previous chats
💭 They can only work with information provided in the current conversation
⚠️ Content beyond their context window is inaccessible during the interaction

LLM Model	Context Window (Memory Length)	Equivalent To
GPT-4 Turbo / GPT-4o	128K tokens	📚 ~300 pages
Claude 3.5 Sonnet	200K tokens	📚 ~500 pages
Gemini 1.5 Pro	2M tokens	📚 ~5000 pages
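The page equivalents in the table can be sanity-checked with rough arithmetic. This is a minimal sketch, assuming about 0.75 English words per token and about 300 words per page; both are common rules of thumb that I am using for illustration, not figures published by any model vendor.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough average for English text
WORDS_PER_PAGE = 300    # assumption: a typical printed page


def tokens_to_pages(tokens):
    """Estimate how many pages of text fit in a given number of tokens."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


for label, tokens in [("128K", 128_000), ("200K", 200_000), ("2M", 2_000_000)]:
    print(f"{label} tokens is about {tokens_to_pages(tokens):.0f} pages")
```

Under these assumptions the three windows come out at roughly 320, 500, and 5,000 pages, in line with the table. The real ratio varies with the language and the tokenizer, so treat these as ballpark numbers.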

💡 How Can We Better Work with LLMs?

Practical tips and future possibilities:

Current Best Practices:

🔄 If the LLM gets stuck or confused, don’t hesitate to restart the conversation
🎨 Experiment with different prompting styles to encourage more creative and precise responses
⚖️ Remember that LLMs have limitations - they’re tools, not omniscient beings
👋 Practice regularly to develop your prompting skills and understanding of the model’s capabilities

Looking to the Future:

📈 Context windows will continue to expand, allowing for longer conversations and more complex tasks
🤝 Future models may better understand and remember user preferences across sessions
♾️ Enhanced capabilities in mathematics, programming, and logical reasoning

📖 Recommended Reading: Thinking Like an AI by Ethan Mollick

⌛ Master AI in 10 Hours

I quite like the following quote from Ethan Mollick:

The most effective way to understand AI is through hands-on experience. For about 10 hours, immerse yourself in experimenting with AI—try out tasks you typically do for work or fun, explore its quirks, and ask it unexpected questions. This practical exposure will teach you far more than reading articles alone. Through this, you’ll gain an in-depth understanding of AI’s strengths and limitations, potentially uncovering surprising insights along the way.

💭 What’s Next?

What would you like AI to do for you? I think that by using AI in our daily lives in a productive and ethical way, or at least by understanding its strengths and limitations, we the users can also influence the future.
