IT Brief US - Technology news for CIOs & IT decision-makers

How synthetic data is unlocking new opportunities for intelligent video

Yesterday

Video technology has advanced significantly over the past few decades, not least because of progress in video analytics and AI. According to a MarketsandMarkets forecast, the AI market is projected to reach an eye-watering 1.3 trillion USD by 2030. One potential drag on this massive growth is the availability of large datasets to train AI models. So-called synthetic data could be the answer.

Pioneering work by the so-called "Godfathers of AI," 2018 Turing Award winners Yoshua Bengio, Geoffrey Hinton, and Yann LeCun, along with Fei-Fei Li's creation of ImageNet, helped lay the groundwork for modern AI, particularly in computer vision (CV). That matters most for sensors that create image data, such as video cameras, and it has unlocked many new opportunities to improve the safety of our cities, transport, retail stores and more.

Because of AI, organisations can now gain deeper insights to inform their strategies and make better decisions about where to build a new road, which products to place on a particular store shelf, and how to plan maintenance or cleaning schedules. The combination of video and AI has truly opened up a brave new world.


Accurate AI requires large training datasets

However, these AI models need to be trained on huge datasets to be as accurate as possible. The datasets must be representative and diverse to ensure accuracy and fairness, and legally sourced to respect data owners' IP rights. As AI evolves, the need for these large, (partly) annotated datasets becomes more pressing, and obtaining this data isn't always simple, especially when dealing with sensors such as cameras that can collect a lot of personal or confidential information. Safety, privacy, and practical limitations can restrict the amount and quality of data an AI can be trained with.

This is where synthetic data steps in to open up new opportunities.


The solution offered by synthetic data

Synthetic data refers to artificially generated or augmented datasets that simulate real-world conditions. By using this data, AI developers can train models on vast amounts of diverse and representative information while mitigating the ethical and legal concerns surrounding privacy and consent. Moreover, synthetic data can preserve key real-world characteristics, ensuring that models learn from realistic environments without exposing individuals to risk. It is also a ready-to-use source, which can speed up algorithm development time.
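To make the "augmented" half of that definition concrete, here is a minimal sketch in Python. It assumes a video frame can be represented as a small grid of grayscale pixel values, and the function names and parameter ranges are purely illustrative; real pipelines use far richer transformations and full simulation.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2D grayscale image (list of lists of ints)."""
    return [list(reversed(row)) for row in image]

def shift_brightness(image, delta):
    """Add delta to every pixel, clamping the result to the 0-255 range."""
    return [[max(0, min(255, pixel + delta)) for pixel in row] for row in image]

def augment(image, n_variants, seed=0):
    """Return n_variants randomly flipped and brightness-shifted copies."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variants = []
    for _ in range(n_variants):
        if rng.random() < 0.5:
            variant = flip_horizontal(image)
        else:
            variant = [row[:] for row in image]  # plain copy, original untouched
        variant = shift_brightness(variant, rng.randint(-40, 40))
        variants.append(variant)
    return variants

# A toy 2x3 "frame" standing in for real footage (illustrative values only).
frame = [[10, 200, 30], [40, 50, 250]]
augmented = augment(frame, n_variants=4)
```

One real frame becomes several training examples here, which is the simplest form of the idea; fully generated data goes further by creating scenes that were never filmed at all.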

Furthermore, synthetic data can help reduce bias in AI models. Traditional datasets are often shaped by the biases present in the original data collection process, which can skew the outcomes of AI decision-making. By thoughtfully designing synthetic data generation processes, developers can minimise the biases that arise from relying on historical datasets.
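One way to picture this kind of deliberate design is a generator that requests every combination of scenario attributes equally often, instead of inheriting whatever mix happened to be recorded. The sketch below is an assumption-laden illustration: the attribute lists and the `balanced_scenarios` helper are hypothetical, and a real project would choose its own categories.

```python
import itertools
from collections import Counter

# Hypothetical scenario attributes, chosen only for illustration.
ACTORS = ["pedestrian", "wheelchair user", "cyclist", "skateboarder"]
CONDITIONS = ["day", "night"]

def balanced_scenarios(n_per_combination):
    """List scenario specs so each actor/condition pair appears equally often."""
    combinations = itertools.product(ACTORS, CONDITIONS)
    return [
        {"actor": actor, "condition": condition}
        for actor, condition in combinations
        for _ in range(n_per_combination)
    ]

scenarios = balanced_scenarios(3)
counts = Counter((s["actor"], s["condition"]) for s in scenarios)
```

Each of the eight combinations ends up with the same number of requested scenarios, so no group is under-represented simply because it was rarely captured on camera.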

Lastly, synthetic data is scalable and cost-effective. It enables AI developers to create vast, diverse datasets quickly and affordably, which is particularly useful for tasks that require specific, high-quality data that is not readily available.


In action: protecting Danish harbours

A research project in Denmark shows the potential role of synthetic data in improving safety and saving lives. In this project, AI models that detect someone falling into a harbour have been trained on different datasets, including synthetic data.

Unfortunately, Danish harbours have witnessed numerous drowning incidents over the years, with 1,647 lives lost between 2001 and 2015 in Danish waters, and a quarter of these tragedies occurring in harbours themselves.

Researchers created the most extensive outdoor thermal dataset for video analytics in one of Denmark's busiest ports, Aalborg Harbour. This dataset enables AI-equipped video cameras to detect different types of objects in a thermal setup. To cover fall incidents, the dataset also needed examples of people falling into the water. However, it was too dangerous to ask human volunteers to stage realistic falls. Moreover, jumping into a harbour looks different from someone accidentally losing their footing and falling in. The researchers also needed a representative dataset for wheelchair users, cyclists, and skateboarders.

Warmed-up dummies were used to mimic human bodies, but again, they couldn't capture the full complexity of a human falling into the harbour. Therefore, the best solution was synthetic data that could model more intricate behaviours and diverse falling scenarios.

The project expanded its training dataset using synthetic data without compromising on safety or ethics. The AI model developed through this process shows promising results in alerting rescue teams when a person falls into the harbour, increasing the chances of survival by minimising response times and reducing cold water exposure.
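How a detection turns into an alert is not described for this project, so the following Python sketch is only a hypothetical illustration of one common approach: raise an alert once a model's per-frame fall score stays above a threshold for several consecutive frames. The function name, threshold, frame count and scores are all assumptions, not details from the research.

```python
def first_alert_frame(scores, threshold=0.8, min_consecutive=3):
    """Return the index of the frame where an alert would fire, or None.

    An alert fires once `min_consecutive` frames in a row score at or
    above `threshold`, which filters out one-off false detections.
    """
    streak = 0
    for index, score in enumerate(scores):
        streak = streak + 1 if score >= threshold else 0
        if streak >= min_consecutive:
            return index
    return None

# Made-up per-frame scores standing in for a model's output.
scores = [0.1, 0.9, 0.2, 0.85, 0.9, 0.95, 0.3]
alert_at = first_alert_frame(scores)
```

Requiring a short run of confident frames trades a small delay for fewer false alarms, a balance that matters when every second of response time counts.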


The broader applications for synthetic data

Video analytics is ubiquitous across multiple industries, and the same applies to the synthetic data it is trained on. Further use cases include manufacturing, where AI models trained on synthetic data can ensure automated production lines function correctly by detecting anomalies in production or potential equipment failure. Collecting large volumes of production line footage can be risky, given the confidential information it reveals about manufacturing techniques and components.

Synthetic data may also be helpful in healthcare settings where patient privacy is paramount, and collecting training data for scenarios like falling might be too challenging. It can help train models to detect when a dementia patient is lost and wandering the halls of a hospital or, for example, alert staff when a care home patient has fallen out of bed.


A growing opportunity

As we witness more uses of AI in video and other applications, we can expect a rise in the use of synthetic data, too. Providing a safe, ethical and scalable data source, this data can be the best option in some situations. Therefore, everyone working with data and video should be aware of the opportunities that synthetic data brings to their AI's accuracy, representation, and overall effectiveness.
