Large Language Models (LLMs) are advanced neural networks designed to understand and generate human-like text based on vast amounts of data. They leverage deep learning techniques to process natural language, enabling applications in translation, summarization, and conversational agents, among others. Prominent examples include OpenAI's GPT series, Google's BERT, and Meta's LLaMA models.
1. OpenAI GPT-4o (ChatGPT)
2. Anthropic Claude 3
3. Google Gemini 1.5
...
Retrieval Augmented Generation (RAG) combines the capabilities of retrieval-based models and generative models to improve the accuracy and relevance of generated content. In this approach, relevant documents or information are retrieved from an external database and then used as context for the generative model to produce more informed and accurate responses. This technique is particularly useful for tasks requiring detailed and specific knowledge, such as question answering and summarization.
1. Sentiment Analysis
2. Content Moderation
3. Chatbot integrating custom data
4. Language Translation
5. Health Care - Diagnosis Example
More than 150 examples across various LLMs.
The Diagnosis Service uses generative AI with Large Language Models and RAG (Retrieval Augmented Generation) to summarize a patient's diagnosis data from past years.
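A minimal sketch of this retrieve-then-generate flow, using TF-IDF retrieval over a few hypothetical patient records; the records and the llm_summarize() call are placeholders, not the service's actual implementation:

```python
# Minimal RAG sketch: retrieve the most relevant records, then hand them to an LLM as context.
# The patient records and the llm_summarize() call are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

records = [
    "2021: elevated HbA1c, started metformin",
    "2022: blood pressure stable, continued statins",
    "2023: HbA1c improved, metformin dose unchanged",
]
query = "Summarize the patient's diabetes history"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(records)
scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
top_docs = [records[i] for i in scores.argsort()[::-1][:2]]        # retrieval step

prompt = "Context:\n" + "\n".join(top_docs) + f"\n\nQuestion: {query}\nAnswer:"
# response = llm_summarize(prompt)   # placeholder: call whichever generative model the service uses
print(prompt)
```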
Artificial Intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (acquiring information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI applications include expert systems, natural language processing, speech recognition, and machine vision.
Machine Learning (ML) is a subset of AI that involves the use of algorithms and statistical models to enable computers to improve their performance on a task through experience. ML systems learn from data, identify patterns, and make decisions with minimal human intervention.
Neural Networks are a set of algorithms modeled loosely after the human brain that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling, or clustering of raw input.
Generative AI involves models that can generate new data samples that resemble the training data.
In supervised learning, the model is trained on labeled data. This means the training data includes input-output pairs, and the goal is to learn a mapping from inputs to outputs. Common algorithms include linear regression, logistic regression, and support vector machines.
Classification: A type of supervised learning where the goal is to predict discrete labels (categories) for given inputs.
Example: Email Spam Detection
The model is trained on a labeled dataset of emails, where each email is marked as “spam” or “not spam.” The goal is to classify new emails into these categories based on learned patterns.
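A minimal sketch of such a spam classifier using scikit-learn (assuming it is installed); the emails and labels are made-up toy data:

```python
# Minimal spam-classification sketch with scikit-learn; the emails and labels are toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "claim your free reward", "project update attached"]
labels = [1, 0, 1, 0]                                  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)                    # bag-of-words features
clf = LogisticRegression().fit(X, labels)               # learn the mapping from features to labels

print(clf.predict(vectorizer.transform(["free prize waiting for you"])))   # expected: [1] (spam)
```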
Regression: A type of supervised learning where the goal is to predict continuous values for given inputs.
Example: House Price Prediction
The model is trained on a dataset of house features (e.g., size, number of bedrooms, location) and their corresponding prices. The goal is to predict the price of a house based on its features.
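A corresponding regression sketch with scikit-learn; the house features and prices are illustrative numbers only:

```python
# Minimal regression sketch with scikit-learn; house features and prices are illustrative numbers.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1400, 3], [1600, 3], [1700, 4], [2100, 4]])   # [size in sq ft, bedrooms]
y = np.array([245000, 280000, 305000, 360000])               # sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[1800, 3]]))                            # estimated price for an unseen house
```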
In unsupervised learning, the model is trained on unlabeled data. The system tries to learn the underlying structure of the data without specific input-output pairs. Common techniques include clustering (e.g., k-means, hierarchical clustering) and association (e.g., Apriori algorithm).
Clustering: Customer Segmentation
A retail company uses clustering to group customers based on purchasing behavior. This helps in identifying distinct customer segments for targeted marketing.
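A minimal segmentation sketch with scikit-learn's KMeans; the customer feature values are hypothetical:

```python
# Minimal customer-segmentation sketch with scikit-learn's KMeans; the feature values are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

# [annual spend, visits per month] for a handful of customers
customers = np.array([[200, 2], [220, 3], [1500, 12], [1600, 10], [800, 6]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)        # cluster assignment for each customer, e.g. low/medium/high spenders
```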
Association: Market Basket Analysis
Analyzing transaction data to find associations between products. For example, if customers often buy bread and butter together, the store might place these items close to each other.
Dimensionality Reduction: Principal Component Analysis (PCA) for Image Compression
PCA is used to reduce the number of features in an image dataset while preserving as much variance as possible, making it easier to store and process.
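A minimal PCA sketch with scikit-learn; random arrays stand in for real pixel data:

```python
# Minimal PCA sketch with scikit-learn; random arrays stand in for real pixel data.
import numpy as np
from sklearn.decomposition import PCA

images = np.random.rand(100, 64)                   # 100 "images", 64 pixel features each

pca = PCA(n_components=8)                          # keep only 8 principal components
compressed = pca.fit_transform(images)             # reduced representation (100 x 8)
reconstructed = pca.inverse_transform(compressed)  # approximate originals from the compressed form

print(compressed.shape, round(pca.explained_variance_ratio_.sum(), 2))
```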
Reinforcement Learning (RL) involves training an agent to make a sequence of decisions by rewarding it for good actions and penalizing it for bad ones. The agent learns to achieve a goal by maximizing cumulative rewards. Key algorithms include Q-learning and deep Q-networks (DQN).
Q-Learning: A value-based Reinforcement Learning algorithm that finds the optimal action-selection policy by learning the Q-values (quality values) of state-action pairs.
Deep Q-Networks (DQN): DQNs extend Q-Learning by using deep neural networks to approximate the Q-values, making it feasible to handle large state spaces where maintaining a Q-table is impractical.
Example: Game Playing AI
A reinforcement learning agent is trained to play a game like chess or Go. It learns by playing many games, receiving rewards for winning and penalties for losing, ultimately improving its strategy over time.
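A minimal tabular Q-learning sketch on a made-up 5-state chain environment (not an actual game), showing the core update rule:

```python
# Minimal tabular Q-learning sketch on a made-up 5-state chain; all values are illustrative.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1             # learning rate, discount factor, exploration rate

def step(state, action):
    # hypothetical dynamics: action 1 moves right, action 0 moves left
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0  # reward only when the goal state is reached
    return nxt, reward

state = 0
for _ in range(1000):
    # epsilon-greedy action selection
    action = np.random.randint(n_actions) if np.random.rand() < epsilon else int(Q[state].argmax())
    nxt, reward = step(state, action)
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = 0 if nxt == n_states - 1 else nxt     # restart the episode at the goal
print(Q)
```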
The perceptron is one of the fundamental building blocks of a neural network. It was created by Frank Rosenblatt in 1957. The perceptron architecture follows three key principles:
1. Summing the weighted inputs (inputs × weights)
2. Adding the bias
3. Applying a step function for non-linearity
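A minimal NumPy sketch of these three steps, with illustrative weights and bias:

```python
# Minimal NumPy perceptron sketch; the weights and bias are illustrative values.
import numpy as np

def perceptron(x, w, b):
    z = np.dot(x, w) + b             # 1. sum the weighted inputs  2. add the bias
    return 1 if z > 0 else 0         # 3. step function for non-linearity

x = np.array([1.0, 0.0, 1.0])        # input features
w = np.array([0.5, -0.6, 0.3])       # weights
b = -0.4                             # bias
print(perceptron(x, w, b))           # -> 1
```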
Shallow Networks typically consist of one or two hidden layers. They are simpler and require less computational power compared to deep networks but may not perform as well on complex tasks.
Single Layer Perceptrons
A Single Layer Perceptron is the simplest type of artificial neural network. It consists of a single layer of output neurons connected directly to the input features, with no hidden layers.
Multi-Layer Perceptrons with One Hidden Layer
A Multi-Layer Perceptron (MLP) with one hidden layer is a more complex network that can learn non-linear relationships. It consists of an input layer, one hidden layer, and an output layer.
Radial Basis Function Networks
Radial Basis Function (RBF) Networks use radial basis functions as activation functions. They are typically used for pattern recognition.
Adaptive Resonance Theory (ART)
Adaptive Resonance Theory (ART) networks are designed for pattern recognition and clustering, capable of learning new patterns without forgetting old ones.
Example: Simple Perceptron for Binary Classification
A single-layer neural network (perceptron) is used to classify data points into two categories. For instance, determining whether a given email is spam or not based on basic features.
Deep Learning involves neural networks with many layers (deep neural networks). It allows the model to learn from large amounts of data with high levels of abstraction.
- Convolutional Neural Networks (CNNs)
CNNs are specialized for processing structured grid data like images. They use convolutional layers to automatically and adaptively learn spatial hierarchies of features from low- to high-level patterns.
Example: Image Recognition
CNNs are used in applications like identifying objects in photos. For instance, a CNN can be trained to recognize different breeds of dogs in images by learning hierarchical features such as edges, textures, and shapes.
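A minimal CNN sketch in Keras (assumes TensorFlow is installed); the image size and number of classes are illustrative:

```python
# Minimal CNN sketch in Keras (assumes TensorFlow is installed); sizes and class count are illustrative.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                 # 64x64 RGB images (toy size)
    layers.Conv2D(16, (3, 3), activation="relu"),    # low-level features such as edges
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),    # higher-level patterns such as textures and shapes
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),          # e.g. 10 dog breeds
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```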
- Recurrent Neural Networks (RNNs)
RNNs are designed for sequence data. They use recurrent layers where connections between nodes form directed cycles, allowing the model to maintain information about previous inputs in the sequence.
Example: Language Translation
RNNs have been used in language translation systems. For example, earlier versions of Google Translate used RNN-based sequence models to convert sentences from one language to another by capturing the sequence of words and their context.
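A minimal recurrent sequence-model sketch in Keras (assumes TensorFlow is installed); the vocabulary size and sequence length are toy values, and a real translation system would use a full encoder-decoder setup:

```python
# Minimal recurrent sequence-model sketch in Keras (assumes TensorFlow is installed).
# Vocabulary size and sequence length are toy values; a real translator would use an encoder-decoder.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(20,)),                        # sequences of 20 word IDs
    layers.Embedding(input_dim=5000, output_dim=64),  # map word IDs to dense vectors
    layers.LSTM(128),                                 # recurrent layer carries context along the sequence
    layers.Dense(5000, activation="softmax"),         # predict a target-language word ID
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```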
- Transformer Models (BERT, GPT)
Transformer Models use self-attention mechanisms to process data. BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are examples that excel in natural language processing tasks. BERT is designed for understanding the context of words in a sentence, while GPT is used for generating coherent and contextually relevant text.
Example: BERT (Developed by Google)
- Sentiment Analysis (Movie Rating)
- Text Classification (for news: categorizing articles as politics, sports, movies, etc.)
- Next Word Prediction
- Question Answering Model (Chat)
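A minimal sketch of the first of these use cases (sentiment analysis) with the Hugging Face transformers library (assumes the library is installed; the default pipeline downloads a small BERT-family model the first time it runs):

```python
# Minimal sentiment-analysis sketch with Hugging Face transformers (assumes the library is installed;
# the default pipeline downloads a small BERT-family model on first use).
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("This movie was absolutely wonderful!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```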
Example: Text Generation (GPT-3)
GPT-3 can generate human-like text based on a given prompt. For instance, it can write essays, generate code, or create conversational responses by understanding the context and generating relevant text.
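A minimal text-generation sketch using the openly available GPT-2 model as a stand-in for GPT-3 (assumes the transformers library is installed; the model downloads on first use):

```python
# Minimal text-generation sketch using GPT-2 as an open stand-in for GPT-3
# (assumes the transformers library is installed; the model is downloaded on first use).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Once upon a time", max_length=30, num_return_sequences=1)[0]["generated_text"])
```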
- Generative Adversarial Networks (GANs)
GANs consist of two networks, a generator and a discriminator, that are trained together. The generator creates data samples, and the discriminator evaluates them. The generator aims to produce data indistinguishable from real data, while the discriminator aims to identify fake data.
- Foundation Models
Foundation Models are large pre-trained models that can be fine-tuned for various downstream tasks. They are trained on massive datasets and have broad applicability.
- Large Language Models
Large Language Models (LLMs) are a type of foundation model trained on extensive text data to understand and generate human language. Examples include GPT-3 and BERT, which can perform a wide range of natural language processing tasks such as translation, summarization, and question answering.
Example: Chatbots and Virtual Assistants
Large Language Models like GPT-3 are used to build chatbots that can engage in complex conversations, answer questions, and perform tasks based on natural language input. For instance, a virtual assistant can help schedule meetings, answer customer queries, or provide technical support by leveraging its extensive language understanding capabilities.
Weights are the parameters within a neural network that transform input data within the network’s layers. They represent the strength of the connection between nodes (neurons).
Example: In a neural network predicting house prices, a weight might determine how much importance to give to the size of the house. If the weight is high, the size heavily influences the prediction.
Example: Health Care - Predicting Diabetes
In this neural network, a weight might determine how much importance to give to the glucose level when predicting diabetes. If the weight is high, glucose level heavily influences the prediction.
Biases are additional parameters in a neural network that allow the activation function to be shifted to the left or right. They help the model fit the data better by providing flexibility.
Example: In the same house price prediction network, a bias might be added to ensure that even with no input (e.g., a house of zero size, which is unrealistic), the model outputs a value that makes sense in the context of the problem.
Example: Health Care - Predicting Diabetes
A bias might be added to ensure that even with low or zero values for certain features, the model outputs a value that makes sense in the context of diabetes prediction.
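A minimal NumPy sketch showing how weights and a bias combine in a single neuron of the diabetes example; all feature values and parameters are illustrative:

```python
# Minimal sketch of weights and a bias in a single neuron of the diabetes model;
# feature values, weights, and bias are illustrative only.
import numpy as np

features = np.array([0.9, 0.4])     # [glucose level, BMI], normalized
weights  = np.array([2.5, 0.8])     # a high weight on glucose means glucose dominates the output
bias     = -1.5                     # shifts where the neuron "switches on"

z = np.dot(features, weights) + bias
print(round(z, 2))                  # 1.07: positive, so this neuron signals elevated risk
```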
Activation functions introduce non-linearity into the model, allowing it to learn more complex patterns. They determine the output of a neuron.
Common activation functions include ReLU, Sigmoid, Tanh, and Softmax.
Example: Using ReLU in the house price model helps capture non-linear relationships, like sudden jumps in price due to certain features (e.g., having a pool).
Example: Health Care - Predicting Diabetes
Using the Sigmoid function in the output layer to predict the probability of diabetes.
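A minimal NumPy sketch of the two activation functions referenced above:

```python
# Minimal NumPy sketch of the two activation functions referenced above.
import numpy as np

def relu(z):
    return np.maximum(0, z)          # common in hidden layers, e.g. the house price model

def sigmoid(z):
    return 1 / (1 + np.exp(-z))      # squashes values into (0, 1), useful as a probability

print(relu(np.array([-2.0, 0.5])))       # -> [0.  0.5]
print(sigmoid(np.array([0.0, 2.0])))     # -> [0.5  0.88...]
```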
The loss function measures how well the neural network’s predictions match the actual data. It provides a way to quantify the error in predictions.
Common loss functions include Mean Squared Error (for regression) and Binary Cross-Entropy (for classification).
Example: For house price prediction, Mean Squared Error might be used, calculated as the average of the squared differences between predicted and actual prices.
Example: Health Care - Predicting Diabetes
Binary Cross-Entropy might be used to quantify the difference between the predicted probability of diabetes and the actual outcome (0 for no diabetes, 1 for diabetes).
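A minimal NumPy sketch of these two loss functions; the numbers are illustrative:

```python
# Minimal NumPy sketch of the two loss functions referenced above; numbers are illustrative.
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)                               # regression (house prices)

def binary_cross_entropy(y_true, p):
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))   # classification (diabetes)

print(mse(np.array([300000.0]), np.array([280000.0])))         # large squared error in price
print(binary_cross_entropy(np.array([1.0]), np.array([0.9])))  # small loss: confident, correct prediction
```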
Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively adjusting the weights and biases in the direction that reduces the loss.
Example: In our model, Gradient Descent would adjust the weights and biases to minimize the difference between predicted house prices and actual prices.
Example: Health Care - Predicting Diabetes
Gradient Descent would adjust the weights and biases in the model to minimize the difference between the predicted probability of diabetes and the actual outcomes.
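A minimal gradient descent sketch that fits a single weight to toy data by repeatedly stepping against the gradient of the mean squared error:

```python
# Minimal gradient descent sketch: fit a single weight w on toy data by stepping against the
# gradient of the mean squared error. All numbers are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])       # the underlying relationship is y = 2x

w, lr = 0.0, 0.05                        # initial weight and learning rate
for _ in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)   # d(MSE)/dw
    w -= lr * grad                       # step in the direction that reduces the loss
print(round(w, 3))                       # converges toward 2.0
```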
Back Propagation is the process of computing the gradient of the loss function with respect to each weight using the chain rule; these gradients are then used to update the weights.
Example: In back propagation, the error from the output layer (difference between predicted and actual price) is propagated backward through the network to update the weights. This ensures that each weight adjustment reduces the overall error.
Example: Health Care - Predicting Diabetes
In back propagation, the error from the output layer (difference between predicted and actual diabetes outcomes) is propagated backward through the network to update the weights and biases.
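A minimal backpropagation sketch: a single sigmoid neuron trained with the chain rule on made-up, normalized diabetes features:

```python
# Minimal backpropagation sketch: a single sigmoid neuron trained with the chain rule.
# The [glucose, BMI] values and labels are made-up, normalized numbers.
import numpy as np

X = np.array([[0.9, 0.8], [0.2, 0.3], [0.8, 0.7], [0.1, 0.4]])
y = np.array([1, 0, 1, 0])                                  # 1 = diabetes, 0 = no diabetes

w, b, lr = np.zeros(2), 0.0, 0.5
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(500):
    p = sigmoid(X @ w + b)               # forward pass: predicted probabilities
    error = p - y                        # gradient of binary cross-entropy w.r.t. the pre-activation
    w -= lr * (X.T @ error) / len(y)     # backward pass: chain rule through the weights
    b -= lr * error.mean()               # ...and through the bias
print(sigmoid(X @ w + b).round(2))       # predictions move toward [1, 0, 1, 0]
```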
There’s no guaranteed path to safety as artificial intelligence advances, Geoffrey Hinton, AI pioneer, warns. He shares his thoughts on AI’s benefits and dangers with Scott Pelley.
Competitive pressure among tech giants is propelling society into the future of artificial intelligence, ready or not. Scott Pelley dives into the world of AI with Google CEO Sundar Pichai.
Tom Steinfort interviews a robot. Ameca, as it likes to be called, is the most advanced lifelike robot in the world. A marvel of generative artificial intelligence, it’s curious, chatty and full of attitude.
Professor Geoffrey Hinton, CC, FRS, FRSC, the ‘Godfather of AI’, delivered Oxford's annual Romanes Lecture at the Sheldonian Theatre on Monday, 19 February 2024.
AI won't kill us all — but that doesn't make it trustworthy. Instead of getting distracted by future existential risks, AI ethics researcher Sasha Luccioni thinks we need to focus on the technology's current negative impacts, like emitting carbon, infringing copyrights and spreading biased information. She offers practical solutions to regulate our AI-filled future — so it's inclusive and transparent.
The current explosion of exciting commercial and open-source AI is likely to be followed, within a few years, by creepily superintelligent AI – which top researchers and experts fear could disempower or wipe out humanity. Scientist Max Tegmark describes an optimistic vision for how we can keep AI under control and ensure it's working for us, not the other way around.
This quick start guide explores the differences and relationships between the increasingly important branches of data science: AI, ML, and DL, using practical examples.
Covers questions such as: What is generative AI? How does it work? How do I use it? What are its risks and limitations? Also covers autonomous agents, the role of humans, prompt engineering tips, AI-powered product development, the origin of ChatGPT, different types of models, and some tips about mindset.
Get the latest insights on Artificial Intelligence (AI) 🧠, Natural Language Processing (NLP) 📝, and Large Language Models (LLMs).
MIT Introduction to Deep Learning 6.S191: Lecture 1, Foundations of Deep Learning (New 2024 Edition). Lecturer: Alexander Amini.
Google DeepMind CEO and Co-Founder Demis Hassabis speaks with Bloomberg's Tom Mackenzie about the latest version of the company's AlphaFold AI system (AlphaFold 3), intended to tackle problems in biology including disease protection and treatment.
What is Generative AI and how does it work? What are common applications for Generative AI? Watch this video to learn all about Generative AI, including common applications, model types, and the fundamentals for how to use it.
What is really the difference between Artificial intelligence (AI) and machine learning (ML)? Are they actually the same thing? In this video, Jeff Crume explains the differences and relationship between AI & ML, as well as how related topics like Deep Learning (DL) and other types and properties of each.