Artificial intelligence developed to model written language can be used to predict events in people’s lives. A research project from DTU, University of Copenhagen, ITU, and Northeastern University in the US shows that if you use large amounts of data about people’s lives to train so-called ‘transformer models’, which (like ChatGPT) are used to process language, they can systematically organize the data and predict what will happen in a person’s life, even estimating the time of death.
In a new scientific article, ‘Using Sequences of Life-events to Predict Human Lives’, published in Nature Computational Science, researchers analyzed health and labour-market data for 6 million Danes in a model dubbed life2vec. After an initial training phase, in which the model learns the patterns in the data, it has been shown to outperform other advanced neural networks (see fact box) and to predict outcomes such as personality and time of death with high accuracy.
“We used the model to address the fundamental question: to what extent can we predict events in your future based on conditions and events in your past? Scientifically, what is exciting for us is not so much the prediction itself, but the aspects of data that enable the model to provide such precise answers,” says Sune Lehmann, professor at DTU and first author of the article.
Predictions of time of death
The predictions from life2vec are answers to general questions such as: ‘death within four years?’ When the researchers analyze the model’s responses, the results are consistent with existing findings in the social sciences; for example, all else being equal, individuals in a leadership position or with a high income are more likely to survive, while being male, a skilled worker, or having a mental-health diagnosis is associated with a higher risk of dying. Life2vec encodes the data in a large system of vectors, a mathematical structure that organizes the different data. The model decides where to place data on time of birth, schooling, education, salary, housing, and health.
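The vector encoding described above can be illustrated with a minimal sketch. Everything here is hypothetical: the event names, the vector dimension, and the random initialization stand in for the learned embeddings the paper describes.

```python
import numpy as np

# Hypothetical sketch of an embedding table: each discrete life event
# maps to a vector. In a trained model these vectors are learned, so
# events that play similar roles end up close together; here they are
# randomly initialized purely for illustration.

vocab = ["BIRTH", "SCHOOL_START", "JOB_TEACHER", "SALARY_Q2", "DIAGNOSIS_I10"]
dim = 6  # invented dimension; real models use hundreds of dimensions

rng = np.random.default_rng(42)
embedding_table = rng.normal(size=(len(vocab), dim))

def embed(event: str) -> np.ndarray:
    """Look up the vector for a single event token."""
    return embedding_table[vocab.index(event)]

vec = embed("JOB_TEACHER")  # a 6-dimensional vector representing that event
```

In a trained model, distances and directions in this vector space carry meaning, which is what lets the model relate, say, an occupation to a health outcome.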
“What’s exciting is to consider human life as a long sequence of events, similar to how a sentence in a language consists of a series of words. This is usually the type of task for which transformer models in AI are used, but in our experiments we use them to analyze what we call life sequences, i.e., events that have happened in human life,” says Sune Lehmann.
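The life-as-a-sentence analogy can be sketched as tokenization, the same first step a language model applies to words. The event codes below are invented for illustration and are not the registry codes the study uses.

```python
# Hypothetical sketch: a life as a sequence of discrete event tokens,
# analogous to a sentence as a sequence of words. The event names
# below are invented, not actual registry codes.

life_events = [
    "BORN_1985", "SCHOOL_START", "DIAGNOSIS_J45",
    "JOB_NURSE", "SALARY_Q3", "MOVED_CITY",
]

# Build a vocabulary mapping each distinct event to an integer id,
# the way a tokenizer maps words to ids for a language model.
vocab = {event: idx for idx, event in enumerate(sorted(set(life_events)))}

# Encode the life sequence as token ids -- the input a transformer consumes.
token_ids = [vocab[event] for event in life_events]
```

Once life events are ids in a sequence, the same machinery that predicts the next word in a sentence can be trained to predict the next event in a life.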
Raising ethical questions
The researchers behind the article point out that ethical questions surround the life2vec model, such as protecting sensitive data, privacy, and the role of bias in data. These challenges must be understood more deeply before the model can be used, for example, to assess an individual’s risk of contracting a disease or other preventable life events.
“The model opens up important positive and negative perspectives to discuss and address politically. Similar technologies for predicting life events and human behaviour are already used today inside tech companies that, for example, track our behaviour on social networks, profile us extremely accurately, and use these profiles to predict our behaviour and influence us. This discussion needs to be part of the democratic conversation so that we consider where technology is taking us and whether this is a development we want,” says Sune Lehmann.
According to the researchers, the next step would be to incorporate other types of information, such as text and images or information about our social connections. This use of data opens up a whole new interaction between social and health sciences.
The research project
The research project ‘Using Sequences of Life-events to Predict Human Lives’ is based on labour market data and data from the National Patient Registry (LPR) and Statistics Denmark. The dataset includes all 6 million Danes and contains information on income, salary, stipend, job type, industry, social benefits, etc. The health dataset includes records of visits to healthcare professionals or hospitals, diagnoses, patient type, and degree of urgency. The dataset spans 2008 to 2020, but in several analyses the researchers focus on the 2008–2016 period and an age-restricted subset of individuals.
Transformer model
A transformer model is a deep learning architecture used to process language and other sequential data. Such models can be trained to understand and generate language. The transformer is designed to be faster and more efficient to train than earlier sequence models and is the architecture typically used to train large language models on large datasets.
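As a rough illustration of the mechanism at the heart of a transformer, here is a minimal sketch of scaled dot-product self-attention. Real transformers add learned projection matrices, multiple attention heads, and many stacked layers; this shows only the core computation.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over x of shape (seq_len, d)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise token similarity
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
    return weights @ x                              # each output mixes all tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 tokens with 8-dimensional embeddings
out = self_attention(tokens)      # same shape as the input: (4, 8)
```

The key property is that every position in the sequence attends to every other, which is what lets a model relate an event early in a life sequence to one much later.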
Neural networks
A neural network is a computer model inspired by the brain and nervous system of humans and animals. There are many different types of neural networks (e.g., transformer models). Like the brain, a neural network is made up of artificial neurons. These neurons are connected and can send signals to each other: each neuron receives input from other neurons, computes an output, and passes it on to other neurons. A neural network learns to solve tasks by training on large amounts of data, and its accuracy improves as training is refined. Once fine-tuned, these models are potent tools in computer science and artificial intelligence that can classify and group data at high speed. Well-known applications include the neural networks behind Google’s search ranking.
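The artificial neuron described above can be sketched in a few lines. This is a generic textbook neuron (weighted sum, bias, sigmoid activation), not code from the study; the inputs and weights are hand-picked for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs plus a bias,
    squashed through a sigmoid nonlinearity into the range (0, 1)."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Two inputs with hand-picked weights; training a network means
# adjusting weights and biases like these across many such neurons.
output = neuron([0.5, 0.2], [0.8, -0.4], bias=0.1)
```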