Artificial intelligence (AI) is a wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that typically require human intelligence. Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns, and make decisions with minimal human intervention. Deep learning is a subset of machine learning where artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data.
Recent public concern centers on fake images and videos containing facial information generated by digital manipulation, in particular with Deepfake methods. The popular term “Deepfake” refers to a deep learning-based technique that creates fake images and videos by swapping the face of one person with the face of another.
Nowadays, it is becoming increasingly easy to automatically synthesize non-existent faces or manipulate the real face of a person in an image/video, because of: i) the accessibility of large-scale public data, and ii) the evolution of deep learning techniques, such as Autoencoders (AE) and Generative Adversarial Networks (GAN), that eliminate many manual editing steps.
Deepfake relies on deep learning technology. Through deep learning algorithms, it can identify photos of the target person (such as a celebrity or politician) from different angles, postures, and expressions, then train continuously to automatically generate fake pictures and superimpose them onto the faces of the people in the original video, forming a “Deepfake video”.
This has become a social challenge, as deepfakes are difficult, and often impossible, for the human eye to distinguish from authentic images. They have been used maliciously to deceive society through fake news and misleading pictures and videos. Because machines now generate near-perfect images, separating machine-generated images from originals has become very difficult.
Generative Adversarial Networks (GANs)
A Generative Adversarial Network (GAN) is a class of deep learning model, typically built from Convolutional Neural Networks (CNNs). GANs are the form of deep neural network most commonly used to generate deepfakes. Among their advantages, GANs are capable of learning from a training dataset and creating new samples with the same features and characteristics. GANs can be used to swap the “real” face in an image or video of a person with a “fake” one.
The Generative Adversarial Network (GAN) has the following two parts:
- Generator: The generator is trained to produce fake data, using feedback from the discriminator, with the goal of making the discriminator classify its output as real.
- Discriminator: The discriminator learns to distinguish the generator’s fake data from real data.
Both generator and discriminator are trained simultaneously in an adversarial process: the generator tries to produce fake samples that fool the discriminator, while the discriminator tries to correctly identify whether each sample comes from the training dataset or from the generator. The generator’s objective is to produce samples that are indistinguishable from the training data to the discriminator; the discriminator’s objective is to correctly classify each sample as coming from the generator or from the training dataset. Both are updated based on their performance during training.
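The adversarial training loop described above can be sketched in miniature. The toy setup below is an illustrative assumption, not a production architecture: the “real” data is a 1D Gaussian, the generator is a simple linear map x = a·z + b, and the discriminator is a logistic unit D(x) = sigmoid(w·x + c), with gradients worked out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_batch(n):
    # Hypothetical "real" data distribution: N(4, 1.25)
    return rng.normal(4.0, 1.25, n)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Generator parameters (x = a*z + b) and discriminator parameters (D(x) = sigmoid(w*x + c))
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, batch = 0.01, 64

for step in range(2000):
    # --- Discriminator update: gradient ASCENT on log D(real) + log(1 - D(fake))
    xr = real_batch(batch)
    z = rng.standard_normal(batch)
    xf = a * z + b
    sr, sf = sigmoid(w * xr + c), sigmoid(w * xf + c)
    gw = np.mean((1 - sr) * xr) - np.mean(sf * xf)
    gc = np.mean(1 - sr) - np.mean(sf)
    w += lr * gw
    c += lr * gc

    # --- Generator update: gradient DESCENT on -log D(fake), i.e. try to fool D
    z = rng.standard_normal(batch)
    xf = a * z + b
    sf = sigmoid(w * xf + c)
    dx = -(1 - sf) * w            # d(loss)/d(x) for each fake sample
    a -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

# After training, draw samples from the learned generator
fakes = a * rng.standard_normal(1000) + b
```

The key design point mirrors the text: the discriminator’s gradient step pushes it to separate real from fake, while the generator’s step moves its samples in whatever direction raises the discriminator’s “real” score.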
Deepfake Image/Video Detection Model
Deepfake image/video detection is the process of identifying images and videos that have been artificially generated or manipulated using machine learning algorithms. Several techniques have been developed for detecting deepfake images/videos, including:
- Metadata analysis: Analyze the metadata of the image/video to check for inconsistencies or anomalies that may indicate the image/video has been manipulated.
- Pixel analysis: Analyze the pixels of the image for signs of tampering or manipulation, such as distorted or stretched pixels.
- Audio analysis: Analyze the audio of the video to detect any discrepancies or abnormalities that may indicate the audio has been manipulated.
- Frame analysis: Analyze the individual frames of the video to look for signs of tampering or manipulation, such as distorted or stretched pixels.
- Deep learning: Deep learning techniques are used to train models to detect deepfake images/videos. These models are trained on large datasets of real and fake images/videos and can learn to recognize patterns and features that are indicative of a deepfake.
- Machine learning: Machine learning algorithms can be trained to recognize patterns in deepfake images/videos and identify them as such. These models typically require a large dataset of both real and fake images/videos to be effective.
- Watermarking: One proposed technique is to embed a digital watermark into the image/video that is not visible to the human eye. This watermark can then be used to verify the authenticity of the image/video.
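As one concrete illustration of the pixel-analysis idea above, the sketch below uses a frequency-domain heuristic: generative models’ upsampling layers often leave unusual high-frequency artifacts, so a large share of spectral energy far from the image centre can serve as a weak manipulation cue. The function name, disc radius, and test images are illustrative assumptions, not an established detector.

```python
import numpy as np

def high_freq_ratio(img):
    """Fraction of spectral energy outside a central low-frequency disc.

    A grayscale image dominated by artificial high-frequency content
    (one possible GAN artifact) yields a ratio closer to 1.
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low = spec[radius < min(h, w) / 8].sum()  # energy in the low-frequency disc
    return 1.0 - low / spec.sum()

rng = np.random.default_rng(1)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # smooth gradient: low-freq
noisy = smooth + 0.5 * rng.standard_normal((64, 64))             # added high-freq content
```

In practice such a single score would only be one signal among many; real detectors combine many cues or learn them directly from data, as the deep-learning entry above notes.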
A deepfake is a digital image/video of a real person that has been edited to generate an extremely realistic but false depiction of them doing or saying something they did not actually do or say. This realism is achieved using AI, machine learning, and deep learning techniques, which makes it difficult to distinguish real images/videos from fake ones. Deepfakes have the potential to be misused in many ways and to cause significant damage: they have been used to create fake news, false pornographic videos targeting celebrities and politicians, harassment, socially insulting documents, reputational damage, sexual exploitation, and identity theft.
There are different deepfake detection techniques that can be used to detect real or fake images/videos. Using only one technique may not be effective in many cases, so different detection techniques can be used in combination to get the best result.
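Combining techniques as suggested above can be as simple as a majority vote over per-technique scores. The detector names and scores below are hypothetical, and the 0.5 threshold is an assumption; a real system might instead learn weights for each detector.

```python
def combined_verdict(scores, threshold=0.5):
    """Majority vote over per-technique fake-probability scores.

    `scores` maps a technique name to that detector's estimated
    probability that the media is fake. Returns True if a majority
    of detectors exceed the threshold.
    """
    votes = [p > threshold for p in scores.values()]
    return sum(votes) > len(votes) / 2

# Hypothetical scores from three independent detectors
scores = {"metadata": 0.2, "pixel_analysis": 0.8, "deep_learning": 0.9}
print(combined_verdict(scores))  # prints True: two of three detectors flag it
```

A simple vote treats every detector equally; weighting each vote by the detector’s measured accuracy is a natural refinement.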
Mr Adhikari holds a Master's Degree in Computer Engineering from NCIT, Pokhara University, Nepal, and is currently an Assistant Coordinator of IT Programs at KIST.