Episode 54: Cybersecurity Challenges in the Era of Large Language Models
Hessie Jones
So now we have the emergence of large language models, and today we're starting to see the speed of adoption; how we communicate, how we research, how we improve our productivity has actually transformed society. LLMs offer remarkable abilities when it comes to natural language understanding and the generation of more accurate, human-like language. But they also introduce new challenges and risks when it comes to the practice of cybersecurity and its adjacency to data privacy. Data breaches and attacks have been really prevalent in narrow AI, but with the increased vulnerabilities introduced by LLMs, we're starting to see more sophisticated phishing attacks, manipulation of online content, and exploitation of privacy controls.
Welcome everyone to Tech Uncensored. My name is Hessie Jones. Today we're talking about cybersecurity in the era of large language models.
So here's a little bit of a wrinkle that happened a couple of days ago. On April 22nd, the Biden administration signed Section 702 of FISA, the Foreign Intelligence Surveillance Act, into law, which essentially reauthorizes the government to spy on U.S. citizens without a warrant. So I interviewed Christine Bannon, who was the US public policy manager at the time, and she said this: LLMs will be used by governments to sort through large data sets for intelligence, making it easier to conduct mass surveillance.
So today, we're honored to welcome Saima Fancy. She has expertise in data privacy and its intersection with cybersecurity and AI. We'll dive into many of these vulnerabilities when it comes to LLMs, and we'll actually address the new law and how it legally undermines individuals' data privacy rights as well as their civil liberties, and what the implications are for emerging tech companies. So a little bit about Saima Fancy: she has two decades of professional experience across chemical engineering, privacy engineering, law, data privacy, and security. She's a speaker, she's a mentor, and she's an active member and volunteer with the IAPP, All Tech Is Human, Leading Cyber Ladies, Women in AI Ethics, and the Governing Council for the U of T Faculty of Engineering.
Welcome.
Saima Fancy
Thank you, Hessie. Happy to be here. Great. Thank you.
Hessie Jones
This is going to be an amazing discussion. So let's start with your journey first as an engineer and how you eventually moved towards data privacy.
Saima Fancy
Yeah. So, my journey: I started with doing degrees in biochemistry and then chemical engineering from the University of Toronto, and set off on the traditional engineering route in my career path. I worked as a chemical engineer in an environmental consulting firm, doing environmental site audits of different chemical plants in the US and Canada, working with lawyers along the way for the transactions that were happening on these plants. And along the way, in between doing, you know, lab work and presenting our results to lawyers and potential purchasers of these properties, I fell in love with law and decided to pursue a little bit of law. Not a lawyer, but I did work in a couple of boutique litigation firms and became a scientific legal advisor, and picked up a lot of those skills along the way, and did that for quite some time. I had a start-up of my own where I worked with lawyers, essentially telling their stories by way of medical illustrations in court, and went all the way up to the Supreme Court of Canada with the lawyers who retained my services. Then I went back into engineering again and ran a manufacturing facility here in Northern Ontario making pressure-sensitive labels, working with machinery and running businesses, and learned all that. And still I wasn't fully satisfied. I had all these plethoric skill sets, and I thought, you know, somehow I need to be able to bring them all together, put them to really good use, and be constantly on the path of learning in perpetuity. And that's when I discovered data privacy and cybersecurity, and it wasn't by accident, it was by design. I was looking for a landing professional space for myself where I could use all my skills, and that's exactly what I found. I got a bunch of certifications done, including a privacy engineering certification at Carnegie Mellon University, and from a few other places. And yeah, the journey began with the public sector, where I was working for the Ministry of Health, then going to the private sector, to Twitter, where I learned about social engineering and how that works. And right now I'm back in the health tech space again. So really, really happy about how I came to where I am today.
Hessie Jones
That's amazing. That is so amazing. I find, especially in this space, when it comes to, let's say, responsible tech, you either fall into it or you actually make your way towards it. And I think at the time, data privacy and even responsible tech wasn't in vogue. I think people just decided to do the right thing, and they were trying to find ways to do the right thing when it wasn't popular.
Saima Fancy
Exactly. And for a scientist like myself, a tech geek like myself, and also a philosopher of sorts who wants to do the right thing: how do you do the right thing by putting all of this together? Data privacy, the protection of people's data, is the right thing to do. And I love it.
Hessie Jones
That's great. Well, thank you for joining us today. I'm going to throw a couple of stats at you when it comes to security breaches, just to give context into how serious this problem is. Between 2022 and 2023, there was a 20% increase in the number of data breaches. The number of publicly reported data compromises increased 78% year over year from 2022 to 2023. And the average cost of a data breach reached an all-time high last year of 4.45 million dollars, a 15% increase from three years ago. Here's an interesting one: 90% of organizations have at least one third-party vendor that has suffered a data breach. And globally, there were twice the number of victims in 2023 compared to 2022. So what do you think the underlying causes of these trends are, especially when it comes to the number of data breaches and the level of harm they've done?
Saima Fancy
It's the greed for data, and whatever means there are to get to collecting it, the stocking up of the data that big orgs want, that the FAANGs of the world want. And with that come various irresponsible acts, right? With that comes the deployment of tech without doing enough soul searching and enough scientific reasoning. For example, OpenAI's tool was released way too early, right? And sometimes a lot of these technology pieces are released early by design, to be able to collect as much data as possible. The transaction is: you can download it for free. But it's not really free, because you're going to be putting in your data, which they need to run their models. Because without a corpus of data of that magnitude, in the thousands or hundreds of thousands of petabytes, these LLMs can't learn. They cannot function.
Hessie Jones
So it's almost by design, when you say it was released far too early. But if it wasn't released, then it wouldn't be accurate. So it's almost like a chicken-and-egg thing.
Saima Fancy
No, they could have purchased the data lawfully and trained on it as such, but then perhaps they would have the argument that you don't have the wide variety of data that you want to train on, right? But the responsible thing to do would be to wait. Gemini wasn't out yet, and it was ready. There were many other tools that were ready at the time but weren't released yet, for that particular reason: they were going through sandboxing, they were going through validation testing, verification testing. Right? So I think a lot of this leads to it, and a lot of times it's not done purposefully by companies. I think there's a lot of excitement, there's a lot of need for it, there's a lot of want. Everyone's waiting for the next twinkling toy to come out, so, you know, there's a race for it, which can cause inadvertent harm. And it's just like, OK, let's just release it and see what happens, see what sticks to the wall.
Hessie Jones
Yeah, it seems almost counterintuitive in a lot of ways for technology. I guess from a process perspective, QA, you know, ensures that the technology or the product is working before it's released into the ether. But when it comes to AI, it almost turns that whole process on its head: it's released into public beta, and at the same time the harms get out.
Saima Fancy
Yeah, and don't forget, a lot of these tools are not stable, right? And what that means is that the developers are iteratively learning; they're fine-tuning as time goes on. So what's coming out is not necessarily going to give you the correct information or the output that you want, right? So we're still in the learning stage, and yet it's live on the worldwide forum.
Hessie Jones
OK, so let's talk about LLMs. When they indiscriminately scrape information, it can expose things like your credentials, your API keys, or any kind of confidential information. So from your perspective, are we much more vulnerable today? Because it seems like we're on the defensive, trying to mitigate a lot of this stuff.
Saima Fancy
The public at large is vulnerable, no doubt, more vulnerable than in the past. There's not much to compare it with; this is unprecedented, right? We're in the fifth industrial revolution. AI has been around for a while, but gen AI hasn't. So as we release these tools out in the public and don't educate the public on how to use them, yes, the vulnerability level is super high. People are entering their clinical notes and records and asking ChatGPT, or whatever tool they're using, to summarize them, not realizing that once you put your personal information in, it's gone into the ether and it's going to be used to train models, right? And your information, if it's not protected, redacted, pseudonymized, anonymized, whatnot, will enable the output results to re-identify you. Hence, there goes your privacy, right? So from that perspective, yeah, the vulnerabilities are at an all-time high. And that doesn't just apply at a personal level; it applies at the corporate level too. If you introduce all these tools to your employees and don't train them how to use them, don't train them in the area of prompt engineering, the vulnerability is super high, and for corporations the risk is even higher, because they risk losing trust, they risk data breaches. Remember, as these tools come out, not only are we at the receiving end, you know, benefiting from them, but at the other end you've got the various threat actors using them to fine-tune their ransomware attacks, their phishing attacks and whatnot, becoming more and more sophisticated. Hence you're seeing the attacks go up, and the cost to remediate them go up.
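To make the redaction point concrete, here is a minimal Python sketch of scrubbing obvious identifiers out of text before it is sent to a hosted LLM. The patterns and the sample note are illustrative assumptions, not a vetted PII detector.

```python
import re

# Toy PII scrubber: redact obvious identifiers before a prompt ever leaves
# your environment. Illustrative only; regexes will miss names and free-form
# identifiers, so real systems use NER-based tools (e.g. Microsoft Presidio).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b"),
    "ID_NUMBER": re.compile(r"\b\d{3}[- ]?\d{2,3}[- ]?\d{3,4}\b"),  # loose SSN/SIN shape
}

def redact(text: str) -> str:
    """Replace each match with a typed placeholder so structure is preserved."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient reachable at jane.doe@example.com or 416-555-0199; see chart."
print(redact(note))
# -> "Patient reachable at [EMAIL] or [PHONE]; see chart."
```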
Hessie Jones
OK, so let's dive into that. You mentioned a few there. Can you talk a little bit more in detail about the rise of some of these attacks, and what are they?
Saima Fancy
Yeah, the rise of a lot of these attacks, the most common ones, comes from phishing and malware attacks, with phishing being the easiest. This email comes in, it looks exactly like your bank's, the colors are right, the logos are right, and it says, oh, you know, there was money withdrawn by mistake, click here and you can deposit it back in. And of course, when it comes to something like that, your dopamine level goes high and you're like, yep, I'm going to do it. Click, click, and before you know it, you've been attacked. Right? So that's the most common methodology, both at a commercial and a personal level. As to why the systems are failing so much: a lot of systems out there are legacy systems with faulty APIs, leaky APIs. Companies are still moving to the cloud, or they've got a hybrid system, or they've got just on-prem systems, which actually right now seem to be a little bit safer than some of the cloud computing systems. The other thing is that when startups are starting out, they are not calculating in, forking in, the cost of security and privacy. And that's a huge mistake, because by the time they realize it and do it, it's kind of too late, so to speak. You know, if you look at privacy by design, principle number one is be proactive, not reactive, because reacting is no way to backtrack and fix your system. From the get-go in your software development life cycle, put these principles in from the minute you start developing your software, from the minute you deploy it and the data gets ingested. Those are some of the measures for preventing these attacks. But these attacks are super sophisticated. I've just given you some of the simple ones, but they go through the roof in sophistication, to the point that they can bring banks and hospitals down, which is what's happening right now. If you look all across North America in the healthcare landscape, hospital after hospital is being brought to its knees, and, you know, protocol is don't give in to the attackers. But that means a lot of data is on the dark web being sold for pennies.
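As a concrete aside on the mechanics, one common phishing tell is a link whose visible text claims one domain while the underlying href points somewhere else. A toy check might look like this, with hypothetical URLs; real email-security gateways layer many more signals than this.

```python
from urllib.parse import urlparse

# One classic phishing tell: the link's visible text claims one domain
# while the underlying href points somewhere else entirely.
def mismatched_link(display_text: str, href: str) -> bool:
    shown = urlparse(display_text if "//" in display_text else "https://" + display_text)
    actual = urlparse(href)
    return shown.hostname is not None and shown.hostname != actual.hostname

# The email *looks* like it links to your bank...
print(mismatched_link("www.mybank.com", "https://mybank-refunds.example.net/claim"))
# -> True: flag the message for review before anyone clicks
```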
Hessie Jones
On the dollar? Yeah. So I remember the attack that happened with, I think it was Indigo, and the employee data was compromised, and unfortunately they couldn't do anything about it. They didn't give in to the attack, as you said they shouldn't. The only thing they could do was tell their employees: go back to your bank, try to change your credentials, to limit the effects of the attackers getting in, because the employees had obviously given up their bank account information.
Because that's how they get paid. So from that perspective, it almost seemed like large corporations are handcuffed and don't really understand how they can mitigate it. So imagine that from the perspective of a small startup that's just starting to collect data. They may think that they're immune to it, because no one is going to try to attack them.
Saima Fancy
Yeah. And that is such a naive way of thinking. What these LLMs have done is essentially increase the attack landscape. It's increased so wide; there are so many ways for the various actors to get in. And they do get in, and they linger, and they watch you function, and your employees function, within your servers and live systems. And then they find that back door at the right moment, at the right time, could be the middle of the night, and then boom, they get in, and that's it, the access is there. And a lot of times corporations don't even know that that's happened until it's too late. So what I'm saying is: why allow that to happen? Why allow them to find back doors that you didn't think of? Put these measures in ahead of time. Carve money out of your startup funding ahead of time and make this a priority, because all of this would be for naught at the end of the day; you could lose your entire business if you don't put measures in place. Which brings us to regulations. It's the regulations, which are becoming robust around the world, that will bring you down, right? We know that in certain parts of the States the punitive damages are deep and they are cutting in deep. Canada's catching up; we're getting there. And with AI acts becoming as robust as well, we can't afford to be, let's say, laissez-faire about this.
Hessie Jones
OK, so let's talk about another law, and I alluded to it in the beginning: Section 702 of the Foreign Intelligence Surveillance Act. Just a bit of background for our listeners: FISA was actually established in 1978 for the US government to spy on foreign individuals. After 9/11, the rules changed to allow the broad sweeping of U.S. citizens' information without a warrant. They call this incidental collection; they said, well, if it comes in there, we're still going to allow it, right? So some key dates: in 2013, NSA whistleblower Edward Snowden exposed this, and the PRISM program infamously began to surface, with big tech, Google and Facebook at the time, giving the government unfettered access to people's Internet data. Years later it was ruled unlawful: this bulk collection of U.S. citizens' emails, their mobile communications, their phone records violated the Constitution. Since then, they had actually added provisions to increase the oversight and minimize the incidental collection of people's information, but ultimately, two days ago, those amendments were rejected, and Section 702, without amendment, was put into law. So now we have these LLMs. They're far more powerful than what we had, let's say, eight years ago, when the AI hype was here. They have increasing context and understanding of people's behavior and their intentions. So from your perspective, what do you think the implications are for companies that collect the data and create these models?
Saima Fancy
Yeah. With laws like this passing, and I understand it's been renewed for another two years, the government is able to collect data and force companies to hand over that data. The implications are huge. If the federal government wants to come and collect data on a span of the population, these companies are mandated by law to hand it over, which is surveillance taken to another level, right, government-led surveillance. The implications are scary for regular, day-to-day folks, and it's all justified under the guise of protection of the state, right? We need to be able to spy on our citizens living abroad, on any foreign state actors that may be a threat. Sure, at face value it does look all right; it looks like, OK, we need to do this. But it just opens doors wide for abuse at a larger scale, right, with such a viable tool. So it is unfortunate.
Hessie Jones
What do you think that means for, let's say, companies that collect information about their customers but do not want to hand over the information to the government once it's been requested? What can they do, or what are they doing, from a technological perspective to try to protect data privacy?
Saima Fancy
I don't know what they can do, because the law as written is a nationwide sweep, and it compels companies to hand it over. There is no door to get out of it. If you are going to be collecting data, you are mandated by law to hand over the data set if requested by the federal government.
Hessie Jones
So companies like Signal or Proton have basically said: we can't access the data, it's end-to-end encrypted, and only you as a user have access to the data that you create. Is that a way to absolve your responsibility from the government mandate, if you technologically do not have the capability of actually accessing individual user information?
Saima Fancy
Yeah, I would think so. I think that's a really good point you raised. There's something called data vaults out there, where if you allow your customer to put their data in there and you are just working off of the metadata, those vaults are locked and encrypted, and only the customers themselves can access them. Then yes, blockchain technology is another one, right, where you've got the key to open and close, and unless you're asked personally, which the federal government won't come and do because FISA doesn't apply to them, that is one way to protect them. But that requires a lot more measures and a lot less money-making ability as well, right? Because now you've really constrained your data pool and you're not able to use as much of it. But sure, those are definitely two good use cases.
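A rough sketch of the data-vault idea, using Python's cryptography library, under the assumption that the customer alone generates and holds the key; key management and recovery, glossed over here, are the hard parts in practice.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# The "data vault" idea in miniature: the customer generates and keeps the
# key; the service stores only ciphertext plus non-sensitive metadata.

# --- customer side ---
key = Fernet.generate_key()            # never leaves the customer's device
vault = Fernet(key)
ciphertext = vault.encrypt(b"2024-04-24 clinical note ...")

# --- service side ---
# The provider can store, index, and replicate the blob, but cannot read it.
stored = {"owner": "user-123", "blob": ciphertext, "size": len(ciphertext)}

# --- customer side, later ---
print(vault.decrypt(stored["blob"]).decode())
```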
So you mentioned earlier about startups, and I wouldn't say it's difficult for all of them, but I think they're not mindful at their early stages of the things that they need to do to actually protect the information that they're collecting. Is it a lack-of-money issue, or is it just not being aware?
Hessie Jones
I think in the early stages, startups care about developing a product that meets the market need. They're so focused on building a tool that has solutions around it that the thought of the data collection, and what they're collecting, probably doesn't come in until much later.
Saima Fancy
Right.
Hessie Jones
So I think, en masse, they don't think about that. But you say that data privacy and security should not be left as an afterthought. So what can they do in the beginning to shield themselves from the threat of any kind of cyberattack and minimize the damage from one?
Saima Fancy
Yeah. You know, there are a lot of out-of-the-box tools available in the market that allow them to put security measures in place within the cloud space. Say we're working with a scenario where all the data they're collecting and their computing is happening within a cloud structure of their choice, whether it's private or public. There are a lot of out-of-the-box security tools that they can buy and deploy within their cloud to manage those data sets: to classify them, categorize them, bucket them up such that they can protect them, and then monitor their cloud constantly to see if there are any infractions happening, any kind of loopholes, any kind of openings within their cloud structure. But these tools are not cheap, right? They are expensive, and they can be deployed in the beginning, as you're bringing data in, whether it's unstructured or structured. The other thing that's really important for them to know, if they're going to be dealing with huge amounts of data, is where that data is. There are a lot of big, big data companies that have said: we don't know where our data is sitting, we don't know which data centers it's in, it's just flying all over the world. That alone is a huge risk. The other thing startups can do is make sure that encryption is in place both for data at rest and data in transit. A lot of times they only look after encrypting data at rest and they think, OK, we're done. Well, no: a lot of the infractions happen when your data is traveling from data center to data center, from desktop to data center and whatnot. You've got to make sure the encryption is happening there as well. So there's a lot available, and we're sitting at a really good spot in technological evolution, where we've got a lot of things available at our leisure to use; not to use them is highly, highly irresponsible, and the cost of those tools is coming down. But as your funding is coming in, you've got to set some money aside to be mindful of doing that. Just as you're mindful of your data storage costs, similarly for your privacy and security tooling: set money aside, purchase those tools, and get them in right away, so that you know you're going to be compliant with the laws. Because these laws are coming down hard and fast, they're robust, they have huge penalty-making and order-making powers, and they can shut you down. The Googles and the Metas of the world are able to take the hit, but a small startup can't.
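A small illustration of covering both halves of that encryption advice: ciphertext on disk for data at rest, and TLS with certificate verification for data in transit. The ingest endpoint is hypothetical, and a real deployment would keep keys in a KMS rather than beside the data.

```python
import requests  # pip install requests
from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)

# At rest: never write the plaintext to storage.
with open("record.enc", "wb") as fh:
    fh.write(f.encrypt(b"customer record ..."))

# In transit: HTTPS only, with certificate verification left on
# (verify=True is the default in requests; never pass verify=False).
resp = requests.post(
    "https://ingest.example.com/records",  # hypothetical endpoint
    data=f.encrypt(b"customer record ..."),
    timeout=10,
)
resp.raise_for_status()
```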
Hessie Jones
Can we talk a little bit about some of those costs that Google, that Meta, have actually had to pay because they weren't compliant? And let's be clear: before these privacy laws actually came to be, a lot of these guys were already flying under the radar, because what they were doing didn't have a law attached to it, so they could do it. But they continued to do it even after the law actually came to be. So can you remember some of the infractions that Google or Meta had to absorb because of this?
Saima Fancy
There have been so many I've lost track of them, actually, a lot of them. They've appealed, and remember, the Googles and the Metas of the world have a lot of money, so they can hire lawyers to do appeals and counter-appeals and whatnot. At the end of the day they do end up paying, but they pay a lot lower cost. And in the beginning of time, before regulations came into place, yeah, they were getting away with a lot of it. Even at the beginning, when GDPR came in, they didn't take it very seriously and were able to get away with it. But now we're at a stage in the regulation ecosystem, and the iterative evolution of it, where country after country, over 130 countries, have put in robust privacy and cybersecurity laws with order-making powers, and they're no longer going to accept, you know, flying under the radar. They are aware, they're technologically aware, their legislation is constantly being amended and updated to keep up with technology. So it's going to be a lot harder for companies to get away with it, even the large ones. But the little guys? They're going to be wiped out.
Hessie Jones
I think that's the unfortunate fallout of the legislation, because, I mean, if you're a small guy trying to do the right thing, but you don't necessarily have either the tools or the financial resources to put everything in place, and you inadvertently do something that is not compliant, then, you're right, you could be shut down. And it gives more power to those that already dominate within that space, right?
Saima Fancy
But there are other things that small startups can do. They can practice data minimization; they can practice collecting data only for the purpose it was collected for; they can put robust consent management measures in place, put audit logging in place and manage those logs, and have access controls in place so they can see who's got access. There are these other mechanisms and practices that are fairly healthy within data ecosystems that they can put in place, which can help them get there if they can't afford the big out-of-the-box tools. Or they can even purchase these tools with smaller feature sets, and then eventually, as their funds grow, they can enlarge the features and purchase more of them. But these practices are there for them to implement, exactly.
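Those low-cost practices are concrete enough to sketch. Here is a toy example of purpose-limited access with a consent check and audit logging; the consent registry, user ids, and service names are placeholders, and a real system would use a consent-management platform rather than a dict.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

# Stand-in consent registry: subject id -> purposes they've agreed to.
CONSENT = {"user-123": {"billing", "support"}}

def read_record(user_id: str, purpose: str, actor: str) -> dict:
    """Purpose-limited access: refuse reads outside consented purposes,
    and write an audit-log line either way."""
    allowed = purpose in CONSENT.get(user_id, set())
    audit.info("ts=%s actor=%s subject=%s purpose=%s allowed=%s",
               datetime.now(timezone.utc).isoformat(), actor, user_id, purpose, allowed)
    if not allowed:
        raise PermissionError(f"no consent for purpose '{purpose}'")
    return {"user": user_id, "fields": "only what this purpose needs"}

read_record("user-123", "billing", actor="svc-invoices")   # allowed, logged
# read_record("user-123", "marketing", actor="svc-ads")    # refused, logged
```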
Hessie Jones
I think of the companies out there like Signal and Proton: they're there to provide a service. They don't use any of their users' data for anything, even from a dashboard perspective. The minute you start creating these advanced services for your customers, then you need the client data, and then that becomes gold to you, and then you can monetize it, and then you create this downstream harm.
Saima Fancy
But even then, there are certain tooling companies out there that actually allow you, the customer, to go into a safe space they create for you within your cloud, where you can work with the data using the tools that you buy from them. So they say: we'll support you, we'll give you the tools, here is your space, it's your enclave, you go manage it. We won't have access to it, but we will support you as you need us. So there is that design as well.
Hessie Jones
Yeah. And I think as we start to get more into giving people control over their information, and the ability to delete everything if they need to, we have to rethink the redundancy we create in our systems.
Saima Fancy
That ability to delete is not really in the hands of the consumer, right? What they have available to them is the "forget me" tool: the ability for us to ask you to forget us, to delete our information. But how do you delete data that you don't have a handle on? How do you delete any data that's not been classified and categorized as such, whether it's PII, PHI, financial data, right? And how do you deal with data that's in a mixture of systems, whether it's multi-cloud or hybrid? What about the fact that there's data duplication: you've got data sitting on your developer's laptop, and you've got data sitting in different cloud structures, right?
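That deletion problem is essentially an inventory problem, and a toy sketch shows why: a "forget me" request can only fan out to copies that were catalogued in the first place. The Catalog class and store names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Catalog:
    # data subject id -> (store, key) pairs holding that subject's data
    locations: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def record(self, subject: str, store: str, key: str) -> None:
        self.locations.setdefault(subject, []).append((store, key))

    def forget(self, subject: str) -> list[str]:
        """Issue a delete for every known copy; uncatalogued copies survive."""
        return [f"DELETE {store}:{key}"
                for store, key in self.locations.pop(subject, [])]

cat = Catalog()
cat.record("user-123", "postgres-prod", "users/123")
cat.record("user-123", "s3-analytics", "exports/2024/123.json")
print(cat.forget("user-123"))
# Any copy outside the catalog (a developer laptop, an ad-hoc export) is
# missed, which is exactly the compliance gap described above.
```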
Hessie Jones
How do you effectively delete data, then, and comply with the law? I think that's the issue, right? The law sits here, but the reality sits there. So how can you enforce a law when the systems don't allow you to effectively comply?
Saima Fancy
Yeah. And under the GDPR, I mean, these laws are so robust that, as an organization that collects data, you can't even back up that data without the consent of your customer, without letting them know that it's happening, right? Because if they have mechanisms where they've locked their data, where they've got the keys to their data, you can't just go in, take it, and back it up for them. So those are good, healthy practices; whether they're happening at large or not is the question, right? But they're all there. There are standards out there, like NIST and the ISOs, that help you get there, right? Even if you can't afford the big shiny tools, there are practices that you can implement within your startup that are good.
Hessie Jones
Well, thank you, Saima. I could talk to you all day, one of my favorite topics, but that's all we have for today, unfortunately. So today's progress, as we noted, is not without consequence, and I think individuals need to be a lot more cautious and more informed, while organizations need to be more vigilant as AI continues to expand. So thank you so much for joining us.
Saima Fancy
No problem
Hessie Jones
So Tech Uncensored is powered by Altitude Accelerator and produced by Blue Max and Transistor Radio. My name is Hessie Jones, and until next time, have fun and stay safe.