Mark Girolami is an academic with ten years’ experience as a Chartered Engineer at IBM. In March 2019 he was elected to the Sir Kirby Laing Professorship of Civil Engineering (1965) within the Department of Engineering at the University of Cambridge, where he also holds the Royal Academy of Engineering Research Chair in Data Centric Engineering. Professor Girolami is a fellow of Christ’s College, Cambridge.
Prior to joining the University of Cambridge, Professor Girolami held the Chair of Statistics in the Department of Mathematics at Imperial College London. He was one of the original founding Executive Directors of the Alan Turing Institute, the UK’s national institute for Data Science and Artificial Intelligence, after which he was appointed Strategic Programme Director at Turing, where he established and led the Lloyd’s Register Foundation Programme on Data Centric Engineering. Since October 2021 he has served as Chief Scientist of the Alan Turing Institute.
Professor Girolami is an elected fellow of the Royal Academy of Engineering and the Royal Society of Edinburgh. He was an EPSRC Advanced Research Fellow (2007-2012), an EPSRC Established Career Research Fellow (2012-2018), and a recipient of a Royal Society Wolfson Research Merit Award. Professor Girolami currently serves as Editor-in-Chief of the new open access journal Data Centric Engineering, published by Cambridge University Press.
Mark joins today’s episode to share his take on what recent advancements in AI and ML signal for creativity, scientific discovery, and society, as well as the most important areas for continued research and innovation.
Watch Episode 1 of Future Proof now on YouTube.
Read the transcript below, which has been condensed and revised for clarity.
Opportunities and risks amidst rapid advancements in AI and ML
Lauren Xandra: Thank you so much for joining us, Mark. Our goal on the Future Proof series is to help create new knowledge around the invisible forces of data science and AI and their tangible impact on science, society and the economy. We’re very excited to have you joining us today to share your perspective on how best to apply and really hone the past decade’s incredible rise in computer power, data and scientific breakthroughs for the betterment of society. To start, it would be great if you can tell us about the goals of the Alan Turing Institute, where you serve as Chief Scientist.
Mark Girolami: So the Alan Turing Institute was established by the UK government to make big leaps in what was then called data science, to make the world a better place. And of course, if you think about AI and the big renaissance that we have been experiencing in AI, most of it is driven by the availability of data, and lots of it. So the main goals of the Alan Turing Institute are to advance research in data science and AI, and apply it to some of the biggest global challenges that we as societies face. At the moment, the Alan Turing Institute is about to announce its strategy, and it will be doing that in a couple of weeks at the AI UK Meeting. There we will be announcing our grand challenges: improving the environment and sustainability, improving the defense and security of the UK, and improving health outcomes for the population as a whole. So those are the goals of the Alan Turing Institute.
Lauren Xandra: Yeah, very important goals to tackle. You know, today we’re closer than ever to what Alan Turing imagined in his landmark research paper from 1950…this world where we’re interacting with “machines that think.” In the wake of all the ChatGPT and GPT-3 news, we can sort of just type a message to a friend, and serve up a Shakespearean play, a research paper, or lines of code. It would be great to learn about some of the ways you think these capabilities will inspire creativity, further scientific discovery, or lead to other beneficial outcomes.
Mark Girolami: So I think the first thing to say is that the machines that we work with don’t think. And I think we need to be quite grounded in that: we don’t have intelligent machines. We have very powerful computers that still follow Moore’s Law in terms of the number of transistors that go onto silicon, and that growing transistor count is enabling compute capability to process data at scales we’ve never been able to contemplate before. And so, for example, things like ChatGPT and GPT-3 are the outcomes of very, very large models of language whose parameters have been estimated from pretty much the whole of the Internet. And so it’s hugely exciting that we have that capability, which thirty years ago we couldn’t have imagined.
And so I think that, in terms of the important ways that these capabilities are going to inspire creativity and enhance the scientific discovery process and lead to beneficial societal outcomes…I think we need to look at some of the advancements of technology that we’ve seen previously.
If we go back 30-40 years, to the advent of personal computing, a computer and the capability to program it were put onto individuals’ desks, onto their kitchen tables. That led to an explosion of creativity: people writing programs, playing games, and so on. And huge numbers of capabilities emerged that we just didn’t really expect. Then fast forward to the next big technological advance, the internet, and think of the global connectivity that we then had: the ability to access information that we previously had to physically go to libraries to get, an explosion in online streaming, and access to the creativity of whole populations.
And then if you think of the iPhone or mobile computing, and taking that capability off our desktops and basically putting it into our back pockets. And that mobility again took us to another level of creativity and various applications of that. So I think that these recent advancements in AI are going to give us the tools to harness mankind’s creativity in similar ways to personal computing or the Internet or the iPhone did previously. And that means in ways which we probably can’t really know at the moment. I think in terms of creativity, we should really just be watching and waiting and seeing what’s going to come out of it.
And in terms of scientific discovery, we’ve got developments like AlphaFold, which, again, has combined compute power and the power of data to give us really smart AI algorithms that can start to predict the folds of proteins and so on. Those tools are then being put into the hands of basic scientists who are trying to understand the genesis of some of the diseases that we face as whole populations. Again, I think we’re going to see a supercharging of some of the advancements in scientific discovery because of AI capabilities.
And then again, in terms of beneficial societal outcomes, I think in terms of some of the grand challenges in the Alan Turing Institute–defending our nation, making the population secure; improving health outcomes; mitigating our sustainable infrastructure against climate change, and so on. Some of these AI tools are going to lead to beneficial outcomes in some of these challenge areas.
Lauren Xandra: Thank you–that’s very helpful context and perspective. And you’re absolutely right that we don’t quite have “machines that think,” but we do have machines that appear to be thinking and are sort of outpacing public understanding. It would be great to get your take on some of the risks that you think are important to bear in mind for different audiences, be it businesses, end users or entrepreneurs.
Mark Girolami: So one of the major risks that we face at the moment is that, by and large, the scientists like myself who are developing these AI algorithms are not entirely clear as to how they work, and therefore how they might fail. And I think that one of the big risks we’re going to face is that we are putting these potential tools into the hands of policy makers and governments, and it’s not entirely clear what their failure modes will be. I mean, we’ve seen some very high profile examples of the failures of these large language models – the potential for bias because they’re being trained on, well, we’re not entirely sure what, because it’s just a huge scrape of the global internet. So I think the introduction of these sorts of biases, how we understand what those biases are, and how we engineer them out is going to be a big risk.
Then, I think about the failure modes or how sensitive these AI methods or algorithms or tools are to small perturbations in operating conditions. They can move from something that is operating well, you know, giving good results and sensible answers, to something that could be wildly weird and of course, potentially dangerous. So I think that we really need to be working very, very closely with our government to understand on the one hand, what are the potentials and what are the opportunities? But on the other hand, what are the risks? And what are the dangers of adopting this?
If you think of the history of flight…when we started building aircraft, we didn’t really understand all of the technical details of why a plane stayed up in the air. And by and large, we still don’t know fully, but that doesn’t stop us safely using aircraft and using flight for transportation. But to get to this level of safe usage of aircraft, we’ve had to learn some very, very serious lessons from the aircraft and airline disasters that we’ve experienced over the whole history of flight. What we would ideally like to avoid is being in a similar situation where we have to rely on disasters to better understand and then better use some of these AI tools. So I think entrepreneurs, businesses and end users all face these risks – I suppose entrepreneurs and businesses are at even greater risk, because their businesses are going to be based on something that potentially has inherent risk associated with it. And so I think the mitigation of those risks at the design stage and then the deployment stage is going to be really critical.
The importance of research, and the evolution of education
Lauren Xandra: Building on this thinking around opportunities and risks–what do you think are some of the most important areas for research, at this significant inflection point?
Mark Girolami: The whole notion of robustness: how robust are these tools going to be, so that they always operate in a safe way? You know, preventing something that’s safe from flipping to something that is unsafe. So there’s an awful lot of research going on at the moment looking at the robustness of some of the deep architectures associated with things like ChatGPT, GPT-3, and so on.
I think another really important area, as I mentioned, is bias, and really understanding the importance of data and of course the input of humans into that. I mean, that’s why ChatGPT is so good: because they have used humans to actually help out in the tuning, the learning, and the training of these tools. So having humans in the loop at some point, and being able to use that, is going to be really very important, I would say.
Lauren Xandra: Excellent. Thinking about research and perhaps learning more broadly, how do you think pedagogy will be impacted by these developments?
Mark Girolami: Again, if we go back to when calculators first came along–what was the net effect of that? Well, mental arithmetic is not so important anymore, but what became more important and more prominent was that education in basic mathematics could progress without having to focus on the rote learning of mental arithmetic. Why? Because we have these tools, these calculators, that do that. And so I think that we’ll see similar advancements in terms of education and pedagogy.
One example–one of my former PhD students, who is now a university professor, was writing a research grant application and he used ChatGPT just to write all the boilerplate text that’s required. He could then focus on making the more nuanced arguments about why this research is important and why it should be funded. So I think, like most advances in technology, once we get these tools, we need to revise what is important as far as pedagogy goes, and as far as the necessary skills are concerned, and then develop those. So I think we are in an exciting time as far as learning is concerned. I am not worried about it, you know? I mean, I think it’s a good thing and it will be very exciting.
Lauren Xandra: I think that will be comforting for many to hear. And I suppose, taking a step back, I understand that your background as a civil engineer, both at IBM and as a Professor of Civil Engineering at the University of Cambridge has influenced your interests in mathematics, statistics, and engineering more broadly, and that you sometimes describe this intersection as “data-centric engineering.” I’d love to get your take on what that means and why this is a really important area.
Mark Girolami: So first of all, I should confess that I’m not a civil engineer. I’m usually introduced by my colleagues at Cambridge: “This is Mark. He’s the Professor of Civil Engineering who isn’t a civil engineer.” I actually was brought into Cambridge from the mathematics department at Imperial College, where I was Professor of Statistics. But I think that’s a good example of this intersection of mathematics, statistics, and engineering, and I think it builds on what is happening in AI.
Because most of these AI systems like ChatGPT are in essence big statistical models, and they are at the end of the day, learning the regularities of some sort of language. They’re using mathematical methods to learn their parameters and then there’s a huge amount of very clever engineering in developing these large-scale systems. So I think for the intersection of mathematics, statistics, engineering, and computer science, it’s a hugely fertile area.
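To make the “big statistical model” idea concrete, here is a deliberately tiny sketch (a hypothetical toy, nothing like how ChatGPT is actually built): a bigram model counts which word follows which in a corpus and normalizes those counts into probabilities, which is the simplest possible way of “learning the regularities of a language” from data.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_sentences):
    """Estimate P(next word | current word) from raw word-pair counts."""
    counts = defaultdict(Counter)
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    # Normalize counts into conditional probability distributions.
    return {
        word: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for word, followers in counts.items()
    }

corpus = ["the bridge carries the train", "the train crosses the bridge"]
model = train_bigram(corpus)
print(model["the"])  # distribution over the words observed after "the"
```

Large language models replace these raw counts with billions of learned parameters and condition on whole contexts rather than a single word, but the statistical core, estimating a distribution over the next token from data, is the same.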
I talk about data-centric engineering in two ways. The first is that data-centric engineering is nothing new. The Victorian engineers were doing data-centric engineering. They were conducting experiments. They were making measurements. They were gathering data. And then they were defining empirical laws of, you know, how structures stand up, how fluids flow, what have you. But what is now different, in similar fashion to the revolution that we’ve been seeing in AI, is the sheer amount of data available, data of such fine granularity that it gives us insights into things as small as the cell. We can gather data about the details of the cell, we can gather data about the darkness of the cosmos, and pretty much everything in between.
And so that data has information about the things we want to study, and so using that and putting that data right at the center of whatever it is we do, whether it’s science – like AlphaFold, you know, they took all this data, the smart engineering of the AlphaFold algorithm, put it right at the center of everything that’s being done – and it’s the same with engineering, whether it’s aeronautical engineering, whether it’s agricultural engineering, is being completely transformed because of this availability of data. And you could even go further forward and say that AI – which is enabled by data, it is data-centric – is enabling some of the biggest advances that we are seeing in engineering and science in practice. So it is very important.
Digital twins helping to transform society for good
Lauren Xandra: Excellent, really fascinating. I’d love to hear about some of the projects that you’ve worked on in the past that put data at the center of problem-solving, and are applied to some of the big challenges that we were speaking about earlier in this call.
Mark Girolami: One area of AI that we haven’t mentioned on this call is the idea of the digital twin, or the digital avatar. What does that mean? Well, it means that we have something in the world, whether it’s an aircraft, whether it’s a bridge, whether it’s a process plant, whether it’s a city, whether it’s a person. And what we can do is develop and realize a digital representation of that entity in the physical world. And we can couple them together–we can twin them with data. So we can make measurements across our city, we can make measurements of our aircraft, we can make measurements of our farm, and we can feed that data into the digital representation. And then we can use the coupling between the physical and the digital to control or better design whatever it is we are interested in. I’ll give you two examples.
One is in agriculture. We face really big challenges in terms of the future of agriculture. And one area that is being developed is the use of hydroponics. There is a company that has been developing farming underground, so not above ground, but actually underground. At Clapham Junction in London, there is a disused tube line which has a farm in it.
Lauren Xandra: Incredible.
Mark Girolami: It grows herbs. It grows vegetables that supply London. We were brought in to help with this because growing plants underground is completely different from growing them above ground. The way that you control heat; the way that you control, you know, various gasses, oxygen and so on, is completely different. And what we were asked to do was two things. One was to generate data from the farm about all of the key indicators that were important in controlling the yield of the farm, and then to develop a digital twin of the farm so that they could better control its operating or growing conditions. That’s exactly what we did. We built a digital twin of the environment, of the farm, of the way in which the various crops would grow, how they would use CO2 and humidity, how the farm would be able to reject or retain heat, and so on. And with all of the sensors feeding data into the digital twin, and the digital twin then saying, here are probably the conditions that would be optimal for this yield, we were able to increase yield by many percentage points and make that farm really efficient. And so this is incredibly exciting.
It’s now used – you know, the next time you go into a Michelin-starred restaurant in London, you’ll probably be eating herbs that were grown under Clapham Junction. And that whole area of agriculture, we’re now working with some of the government agricultural research stations where they want to develop digital twins of some of the big agricultural farms. So that’s agriculture.
Another area is our infrastructure. You know – the roads that allow our transportation system on land to work well, the bridges that carry our trains across rivers or gorges, and so on. One project that we’ve been involved with in transportation was when Network Rail were building new rail bridges up in Staffordshire. What they did was embed about 180 fiber optic sensors in the concrete of the bridge while it was being constructed. And when the bridges were constructed and actually deployed, they were described as living structures, because we were able to gather data from the bridge in real time and continuously.
So every time a train went over it, we were getting all of this data. And what that means is that we could continuously monitor the performance of that structure without sending engineers out to look at it, because we are getting its footprint, its heartbeat, on a regular basis. And again, what we did is build a digital twin of that structure, with a continuous feed of data into it, and the digital twin is then able to answer questions about whether the bridge is performing well, whether it is degrading, and whether there will be a point where we might start to see structural problems that could put users at harm. And we have this now: the bridge is monitored continuously, it is controlled continuously, and it is all done remotely. So this notion of a digital twin feeding off the data that comes from the bridge is a complete transformation in the way that bridges and critical infrastructure can be operated and controlled, both to make them more efficient, ensuring that their availability is optimized, and to make them safer: if any faults start to put the structure onto a pathway where it may well fail, we can see that long before the critical point is reached. And I’m sure you’ve seen very recent stories of some of the disasters, like the bridge in Italy that completely collapsed, where the mortality was dreadful. Having something like this will make operations more efficient, but also much safer, and ensure the risk of these catastrophic failures is greatly reduced.
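The monitoring loop Mark describes can be sketched in a few lines (a hypothetical toy: all names, readings, and thresholds here are invented for illustration, and a real structural twin would use rich physics-based models rather than a single drift check). Sensor readings stream into the digital model, which compares each one against the expected baseline and flags drift before it becomes a structural problem.

```python
from dataclasses import dataclass, field

@dataclass
class BridgeTwin:
    """Toy digital twin: tracks strain readings against an expected baseline."""
    baseline_strain: float          # expected strain under a passing train
    alert_threshold: float = 0.15   # flag readings drifting >15% from baseline
    history: list = field(default_factory=list)

    def ingest(self, reading: float) -> bool:
        """Feed one sensor reading into the twin; True means it needs attention."""
        self.history.append(reading)
        drift = abs(reading - self.baseline_strain) / self.baseline_strain
        return drift > self.alert_threshold

twin = BridgeTwin(baseline_strain=100.0)
readings = [101.2, 99.8, 103.5, 121.0]   # last value drifts ~21% above baseline
alerts = [twin.ingest(r) for r in readings]
print(alerts)  # [False, False, False, True]
```

The design point is the one Mark makes: the twin accumulates the structure’s “heartbeat” over time, so anomalies are caught by the model rather than by engineers physically inspecting the bridge.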
Lauren Xandra: Thank you, Mark. Those are really fascinating examples of data science in practice to solve real world issues. To close, it would be great to get your advice to someone who wants to be “future proofed” as we call it. What do you think are the most exciting and important forces to stay knowledgeable about, and to continue to track?
Mark Girolami: I think that you just don’t stop learning. Don’t stop reading. And read widely: read about politics, read about politicians, read about economics, read about finance, read about engineering, read about physics. Read about anything. And just always stay informed. I don’t think there’s one thing; I think it’s just a case of being naturally interested in the world around you. And don’t stop asking questions. I think if you do that, you’re gonna be “future proofed” for sure.
Lauren Xandra: Excellent. Well, thank you so much for your time and insight. Really appreciate it.