Open Banking data quality far superior to screen scraping: Study

Open Banking data, accessed via Australia’s Consumer Data Right (CDR) scheme, is cleaner, more insightful, and offers significant efficiency advantages over screen-scraped datasets, according to the author of a new study released by open data specialist fintech Frollo.

As part of a two-month study late last year comparing the two methods of data extraction, report author Zhitao Xiong, Frollo’s head of data science, found that while screen scraping remains an option, it “doesn’t come close to the data quality offered by Open Banking”.

For one, he noted, more data is retained via a CDR exchange. Xiong said that the process of screen scraping – which captures data visible on a particular screen – yields inherently limited data because it relies on what is visually accessible.

“Key fields like merchant category codes, biller codes, and transaction types – crucial for understanding financial behaviour – are often missing,” he said.

These codes were far more likely to be retained in full via the CDR.

Frollo’s analysis showed that CDR data offers merchant names in 52.3 per cent of transactions compared to just 31.7 per cent for screen scraping – a significant 65.7 per cent relative advantage.
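For readers who want to check the arithmetic, the relative advantage is the gap between the two rates divided by the screen-scraping rate. The short Python sketch below uses only the two rounded percentages quoted above; the function and variable names are illustrative and do not come from the study. With these rounded inputs the result lands at roughly 65 per cent, so the reported 65.7 per cent presumably reflects unrounded underlying figures.

```python
def relative_advantage(rate_a: float, rate_b: float) -> float:
    """Return how much larger rate_a is than rate_b, as a percentage of rate_b."""
    return (rate_a - rate_b) / rate_b * 100


# Rounded figures as quoted in the article (per cent of transactions
# that carry a merchant name).
cdr_rate = 52.3
scraped_rate = 31.7

point_gap = cdr_rate - scraped_rate                    # absolute gap, in percentage points
uplift = relative_advantage(cdr_rate, scraped_rate)    # relative gap, in per cent

print(f"{point_gap:.1f} percentage points, {uplift:.1f} per cent relative advantage")
```

The distinction matters when reading the headline number: the absolute gap is about 20.6 percentage points, while the larger figure describes CDR coverage relative to screen scraping.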

Data (and, notably, large datasets) was also found to be cleaner and less ‘noisy’ when accessed via the CDR.

Xiong noted that large datasets “can contain inconsistencies like meaningless symbols or text fragments, which pollute data and hinder categorisation by advanced, AI-powered enrichment engines like Frollo’s IDEaS service”.

Analysing word frequency across both datasets revealed a stark difference.

“In CDR data, only 14 per cent of words were irrelevant, compared to 34 per cent in screen-scraping data.

“This ‘dirty’ data translates to user effort. Frollo users re-categorise screen-scraped transactions 30 per cent more often than CDR transactions, reflecting the impact of data quality,” Xiong said.
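The study does not spell out how a word was judged “irrelevant”, so the following Python sketch is only one plausible way to measure the idea. It assumes, purely for illustration, that a token counts as noise when it contains no run of three or more letters (for example bare symbols or digit strings); the rule, the function name, and the sample descriptions are all hypothetical and are not Frollo’s method.

```python
import re
from collections import Counter

# Hypothetical rule (an assumption, not from the study): a token is
# "meaningful" if it contains at least three consecutive letters.
MEANINGFUL = re.compile(r"[A-Za-z]{3,}")


def irrelevant_share(descriptions: list[str]) -> float:
    """Return the fraction of whitespace-separated tokens judged to be noise."""
    counts = Counter(tok for text in descriptions for tok in text.split())
    total = sum(counts.values())
    noisy = sum(n for tok, n in counts.items() if not MEANINGFUL.search(tok))
    return noisy / total if total else 0.0


# Made-up transaction descriptions, for illustration only.
sample = ["WOOLWORTHS 1234 SYDNEY", "*** EFTPOS ## COLES"]
share = irrelevant_share(sample)
print(f"{share:.1%} of tokens look like noise")
```

Running the same measure over two datasets and comparing the shares would give a figure of the kind Xiong describes, though the result depends heavily on how “irrelevant” is defined.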

Open Banking offers a secure, consistent, and standardised method for data sharing. It delivers cleaner and more insightful data than screen scraping, resulting in a better user experience and return on investment.

Frollo’s study (covering the November 2023 – January 2024 period) analysed anonymised data from 9.7 million CDR transactions and 1.3 million screen-scraped transactions within its money management app.