A Summary of Data from CmdR ScotSoft 2020

Our CmdR ScotSoft 2020 conference was a bit different this year. Our first-ever virtual conference had almost 40 speakers on a huge range of topics and some 1200 delegates. If you missed any of the talks, you can catch up on the event platform via your registered link, but here’s a quick round-up of some of the data presentations.

Olivia Gambelin of Ethical Intelligence and Joseph Crispell, data scientist with ONS, both talked about building ethics into the approach to your data project. Olivia began by dispelling a few misconceptions. Applying ethics is not a blocker to innovation; it’s an enabler, as it allows you to explore the edges of other constraints. It’s also not actually possible to completely remove bias; the key is understanding it and making explicit choices. Good definition and labelling are vital. Joseph then explored the application of ethics in his project work on some very sensitive topics. He talked about his previous study of whether bovine tuberculosis was being transmitted from badgers (highly emotive, as it was used to determine the case for badger culling), and also his current work with the UNICEF Data for Children Collaborative mapping HIV risk in young people in Côte d’Ivoire. Accuracy, representativeness and anonymity are vital ethical considerations in both projects.

James McMinn of Bellrock Technologies talked about the complexities in deploying a data project. The multiplicity of languages, library versions, platforms, data latencies, physical runtime locations, peaks of resource demand, dependencies and so on means that managing the analytics supply chain can become an all-consuming headache.

He also talked about the difference between the role of the data scientist, whose time is best used responding to business problems, and the role of the data engineer who, coming from the world of IT, is more concerned with having a stable, secure and accurate production process. Both have much they can learn from each other! James covered a number of tips for managing some of these challenges. Borrowing from DevOps, containerisation can be a really useful tool for managing heterogeneous data pipelines. Kubernetes can help with managing resource demand for these containerised workloads, but it is something to build up to: start simple. Bellrock can help to coordinate and organise the end-to-end production process and bring order to chaos with their Lumen platform!

Mike Ferguson’s presentation focussed on the emergence of an Enterprise Data Marketplace where “customers” can shop for trusted data as a service. Over the past few years, with IT departments being seen as a bottleneck and the emergence of self-service data wrangling tools, there is a danger of actually increasing the number of data silos and moving further away from “one version of the truth” (the mantra of data warehousing back in the noughties). The addition of different cloud stores and data held in SaaS systems adds to this. This creates a drive back to having a trusted data service at enterprise level. At the heart of this is a data catalogue which can cover all sorts of different data “sources”, from core data assets to analytical apps and queries, and where “business ready” products are often logical entities. Mike talked about some of the necessary features of the catalogue, such as different types of data classification for retention, security, level of confidence and provenance. He also talked about some emergent organisational roles and responsibilities which the move to an Enterprise Data Marketplace can create.

The presentation by Murray Collins, CEO of Space Intelligence, focussed on their use of AI and integrating different types of data to improve sustainability. Existing mapping approaches leave an information gap in terms of understanding land use. As land use is one of the main drivers of carbon emissions, improving land use will better enable Scotland to meet Net Zero targets. In particular, Murray talked about a collaboration with NatureScot (formerly Scottish Natural Heritage), who needed to be able to monitor land use in certain key areas such as the Cairngorms National Park, but were working with out-of-date maps. By combining the strengths of different types of mapping, Space Intelligence have been able to bring new insights and create a much more dynamically tracked Natural Capital Asset Index for NatureScot. And, hot off the press, Space Intelligence and NatureScot have been given funding for a further phase of this project which will help them understand where best to intervene for improvement, which can then of course be tracked. Note too that Space Intelligence are currently recruiting.

As the closing keynote, Steve Guggenheimer of Microsoft gave a fascinating talk about the AI journey. AI is now becoming embedded into different toolsets, and many pre-trained “cognitive services” are now available which can be incorporated as building blocks into different solutions. Examples include virtual agents, auto-generated writing solutions (this is not one of them!), and logistics planning tools which can predict when people will be in to receive deliveries. Steve did not say “Most AI is built in PowerPoint” – though I do like that phrase – but he did acknowledge the hype around AI, while highlighting that now is the time to get started on the AI journey. He covered a kind of “AI continuum” with different design patterns and solution options for increasing levels of sophistication. While very many products do and will embed AI without the users knowing, he also highlighted the importance of education and the use of AI for good in bringing the public along on the journey, and the importance of building sufficient guardrails. In doing so he reiterated Olivia Gambelin’s earlier point that there is no way to definitively create the right ethics for AI.

Lastly, there were some other data topics too. Gavin Littlejohn talked about the challenges and opportunities in creating a Global Open Finance Centre of Excellence. Alex Bell and Petur Einarsson of BJSS had a good discussion about some of the different approaches to AI. Both are also well worth watching!

If you didn’t make it to the event and want to see more of any of these presentations, they are available on catch-up. That’s definitely one of the benefits of a virtual conference. And while I missed some of the buzz that comes with a real-life conference, it was great to have easier access to so many international speakers.