Title T. Source: Sean McGrath, Wikimedia Commons CC-BY 2.0

Have you ever wondered how VFX artists create realistic-looking cloth and skin from math? One algorithm they use is called ‘Perlin noise’. It was developed by Ken Perlin in the early 1980s and was revolutionary for computer-generated graphics. One of the main challenges in VFX, even today, is generating effects that look great without much to go off. This is quite similar to trying to model how something behaves without much info on how it behaves in the first place.

If you’re working on a chemical plant with thousands of components, each behaving slightly differently, you can either figure out the mechanics behind each component individually and how they interact, or you can model the system empirically by observing trends and changes when you manipulate certain parameters. For the latter, you need a good bank of data to base your inferences and models on. The problem here is that, for a typical chemical plant, there’s a lot of data for when the plant is operating at steady state, but not much for when things go unsteady. When that happens, the model can’t reliably predict what’s going to happen or how to fix it.

One solution to this is to make up additional data points for fringe conditions to fill in the gaps in the databank. How can you do this? Well, there are a bunch of algorithms currently available to do just that (SMOTE, ROSE, ADASYN). Unfortunately, these mostly work for classification problems, like credit card fraud detection (i.e. fraudulent or not). A team at UoA came up with a new algorithm that is better suited to the issues found in process plant data.

This new resampling algorithm (they call it COVERT) primarily uses the extremes of the dataset as anchor points for creating new synthetic data points, using kernel density estimation (KDE) and k-nearest neighbors (kNN). They tested the algorithm on a sample Gaussian-distributed dataset, and then in a case study using data from a geothermal power plant. According to their conclusions, the COVERT algorithm succeeded in filling in gaps in the data, and the synthetic points were indistinguishable from the actual plant data.
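To give a feel for how KDE and kNN can work together here, below is a rough toy sketch in plain Python. To be clear, this is not the authors' COVERT code (that's in their GitHub repo); it's just my own simplified 1-D illustration of the general idea. It estimates the density around each point with a Gaussian kernel, picks points from sparse regions more often, and interpolates toward one of their nearest neighbors. The function names (`gaussian_kde`, `k_nearest`, `resample_sparse_regions`) and the parameter values are made up for this example.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x from 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def k_nearest(samples, index, k):
    """Return the k samples closest to samples[index], excluding itself."""
    others = [s for i, s in enumerate(samples) if i != index]
    return sorted(others, key=lambda s: abs(s - samples[index]))[:k]


def resample_sparse_regions(samples, n_new, k=3, bandwidth=0.5, seed=0):
    """Create n_new synthetic points, favouring low-density regions.

    Toy illustration only: the real COVERT algorithm differs in its details.
    """
    rng = random.Random(seed)
    density = gaussian_kde(samples, bandwidth)
    # Inverse-density weights: points in sparse regions get picked more often.
    weights = [1.0 / density(s) for s in samples]
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(range(len(samples)), weights=weights)[0]
        neighbour = rng.choice(k_nearest(samples, i, k))
        t = rng.random()  # position along the segment from point to neighbour
        synthetic.append(samples[i] + t * (neighbour - samples[i]))
    return synthetic


# Made-up Gaussian data: lots of points near 0, few in the tails.
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
new_points = resample_sparse_regions(data, n_new=50)
```

Because every synthetic point is interpolated between two real points, nothing lands outside the range of the original data. The inverse-density weighting is what pushes the new points toward the thinly populated tails, with no class labels needed.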

You can look at the program’s source code via the GitHub link in the article. The article also explains the actual modelling algorithm better than I can. Check it out if you’re interested.

– Basil

Article link:

COVERT: A classless approach to generating balanced datasets for process modelling
