#136 - Privacy Engineering: How to Build for Data Privacy - Nishant Bhajaria

“Privacy is about handling data in a way that builds for both compliance and trust, maturity and transparency.”

Nishant Bhajaria is a cybersecurity and data privacy executive and the author of “Data Privacy: A Runbook for Engineers”. In this episode, we discussed the importance of data privacy and privacy engineering. Nishant described his definition of data privacy and why it is becoming a key concern for users, companies, and regulators. He explained why doing data privacy is hard and how companies can build a privacy-first culture. Nishant also covered other data privacy topics, including data classification, data sharing, data consent, and data privacy applied to machine learning.

Listen out for:

  • Career Journey - [00:03:29]
  • Writing “Data Privacy” Book - [00:05:45]
  • Building a Course - [00:10:04]
  • Data Privacy Definition - [00:13:43]
  • Data Privacy Concerns - [00:16:03]
  • Data Privacy Regulations - [00:22:07]
  • Data Privacy is Hard - [00:26:23]
  • Privacy & Security - [00:31:22]
  • Privacy-First Culture - [00:35:23]
  • Data for Machine Learning - [00:39:23]
  • Data Privacy Tooling - [00:42:45]
  • Data Sharing - [00:45:45]
  • Data Consent - [00:49:27]
  • Data Classification - [00:52:10]
  • 3 Tech Lead Wisdom - [00:55:46]

_____

Nishant Bhajaria’s Bio
Nishant Bhajaria is an executive in the cybersecurity and data privacy industry. Having started out as an engineer with a second act as a product manager, he pivoted to data protection before it became a high-visibility topic. Besides building and leading teams at Nike, Netflix, Google, and Uber, Nishant has also authored the recently released Data Privacy: A Runbook for Engineers, a deep dive into effectively identifying, communicating, and addressing privacy risks using technical strategies. He also teaches courses on LinkedIn Learning on cybersecurity, career development, and building inclusive teams.


Our Sponsor - Tech Lead Journal Shop
Are you looking for some cool new swag?

Tech Lead Journal now offers swag that you can purchase online. Each item is printed on demand based on your preference and will be delivered safely to you anywhere in the world where shipping is available.

Check out all the cool swag available by visiting techleadjournal.dev/shop. And don't forget to show it off once it arrives.

 

Like this episode?
Follow @techleadjournal on LinkedIn, Twitter, Instagram.
Buy me a coffee or become a patron.

 

Quotes

Writing “Data Privacy” Book

  • I had learned a lot of things both good and bad throughout my own career. And there was a real opportunity to add something to the popular knowledge about security and privacy.

  • How do you build stuff that’s gonna force people to work together in a way that people typically didn’t? People tend to be within their silos, their OKRs, their metrics, their products, their commitments. How do you build something that is not for one product, but for the entire platform? How do you build stuff that is not reactive from a risk perspective, but proactive from an innovation perspective?

  • The lesson here is, if you wanna be a leader in the trust, security, and compliance space, you have to take some chances. You have to write your own book, in a manner of speaking, cause there is no handbook. There is no course that teaches you how to catch these opportunities and make a difference.

  • The second thing is there are a lot of people with the same questions. There are a lot of people struggling with questions I had 7, 8, 9 years ago, because a book didn’t exist.

Building a Course

  • It’s interesting when you teach these courses, you have to essentially combine three things.

    • You have to combine a sort of the actual course material itself, the domain. You have to have your core content. Cause unless you do that, there’s no point in the course being there.

    • The second thing is you need to have a narrative. You cannot just throw instructions at people. Cause human beings fixate around stories. People who ping me with compliments tell me they remember the stories, my life experiences, things that I did well, things that I didn’t do well.

    • The third thing is the ability to promote these stories. So have the core competencies covered. Tell the stories and be ready to sort of really promote your work. Cause otherwise there’s so much content out there that people won’t catch it.

  • I tell them all the time, I’m gonna be very generous with my mistakes so that you can make new mistakes of your own rather than the one I’ve made. So then I can learn from your mistakes rather than just learning from my own.

  • There is a lot of value in telling people about your mistakes, because for every 10 people in Silicon Valley who pretend that they know what they’re talking about, there are like a hundred out there who are afraid to ask the right question because they are concerned that they will not look as smart as the people think they are. So this imposter syndrome is real.

Data Privacy Definition

  • Privacy does not have a definition per se that is universally accepted. So I think of it as two different definitions.

    • From a user perspective or from a customer’s perspective, I like to think about people like my parents, my sibling, my grandparents, my spouse, her dad. For them, privacy is about being treated with respect, like being able to make informed decisions with their own data and not be caught by surprise. Like there should not be an example where somebody intentionally, willfully, or continuously and carelessly did something with your data that you would not have wanted them to do. In other words, I shouldn’t do something with somebody else’s data that I wouldn’t want somebody else to do with mine. So there’s a very human, visceral definition that may not be quantifiable, but something that is easily understandable.

    • The second thing is as a company, as an institution, as a government, you wanna make sure that you use somebody’s data in a way that is respectful, that is transparent, that is compliant, that is continuously improved. If you think about the scale of data, if you think about the nature of human engagement, if you think about the diversity of human beings across the world; no two people are gonna think about privacy the same way. So how do you, as a company, factor in the first part of respect and the second part of scale and governance and maturity?

  • Privacy is about handling data in a way that builds for both compliance and trust, maturity and transparency.

Data Privacy Concerns

  • What has happened in the last 10-13 years is pretty significant, cause multiple forces have colluded together to change our world in ways that often make it hard to recognize the world we live in compared to where we were like just a generation ago. We had an expansion of internet access, unlike any time before in human history. We had a switch from pure laptop, desktop functions to mobile devices. We had the explosion of global ID. So in the past where you had to create a username password every single time, you can now authenticate using your Google ID or a bunch of other IDs. You had the ability to build platforms to help provide people capabilities or to provide other people capabilities to sell stuff to customers at scale.

  • In the past, you had major changes happen in small increments. So you had Intel switch from memory to processing, which was a pretty big shift for its time. We had this amazing tech bubble in the late 1990s, but that was an example of innovation in search of actual utilization. You had people building amazing stuff, but there was no market for it.

  • I don’t think we have fully understood how much humanity has changed, cause in the last 10 years, a bunch of other things have also changed. Platform misinformation, abuse of trust, power consolidation in the tech sector. We’ve also seen examples of unstable democracies, essentially, teetering on the brink. People saying stuff that is factually not true. So because of these things that have happened at the same time, it is very hard to scale anything and measure things in a meaningful fashion.

  • We live in a world where our computational processing power far exceeds our moral processing power. So the ability to measure change, the ability to balance innovation and personalization on the one side with competition and compliance on the other is very hard to do.

  • Companies need to worry about this because you could have things happen to you in a way that you cannot fully predict, at a time and place that is not of your choosing. Whether you are a company that is collecting the data and building the products on the one side, or you’re a customer who wants privacy but also low latency at the same time, you have a bunch of things, a bunch of expectations, and a bunch of actions that are collectively incompatible with each other. And yet somehow we have to figure out how to make sense of this world we live in, because everybody wants everything all the time.

  • How do you catch these things before something bad happens? How do you build the right tools? How do you build the right products? How do you course correct before things go badly? How do you offer training and compliance at the same time? The lack of understanding and the lack of scaling and the lack of ability to undo things is the big challenge. So my advice to companies tends to be you should get things done correctly before you go too far down the path.

  • “Days and days of debugging can save you hours and hours of planning or hours and hours of testing.”

  • So many of us have done things that may not look great from today’s perspective, but there is no online record of it. From a customer’s perspective, how do you make intelligent decisions with your data? The complexity now is that you may end up doing or saying something online that may come back to haunt you.

  • It’s the incompatibility of expectations around privacy and security on the one side, and expectations around quick performance of your service and app on the other side. That’s the challenge here.

  • The other aspect is a lot of customers don’t fully understand how the internet works, how online services get funded, because the domain has grown really quickly. And I think the tech sector has to do a much better job of telling people, “Hey, here’s how we make the internet work. Here’s how your data gets used.”

  • So the lack of patience and the abundance of complexity collectively mean it’s very hard for customers to make an informed decision. And everything moves really quickly. And then there’s the regulatory state: the tools being built to protect customers at the government level and the company level don’t fully appreciate the complexity and the volume of data.

  • Everybody’s moving very fast. The volume of data and the number of transactions are going pretty fast. And as a result, customers cannot always make informed decisions.

Data Privacy Regulations

  • When it comes to regulation, there are two perspectives. One is: let’s come up with something quick to address the most pressing issue in the land. But the second perspective, which is something the policy folks I’ve worked with in the past have educated me on, is that you only get to do so much in the system.

  • What regulators wanna do is pass something in an omnibus fashion that covers as many use cases as possible. Cause the idea that you can pass something once and then pass something a second time and a third time is not always viable. Cause you have multiple bodies to convince. When you say that it took a long time, it’s because the systems that are required to work together to pass regulations are extremely complex. That’s number one.

  • The second thing I’d say is a lot of the people who build complex technical systems and the people who pass regulations are living in very different universes. The people who pass these laws tend to be policy makers, attorneys who don’t always understand technology. And the people who build these tools, collect this data are often engineers who don’t understand the world of policy. So the gap between the doers and the builders on the one side and the enforcers on the other side is a challenge.

  • That may not have been such a big deal 15-20 years ago when, as I mentioned before, cloud computing didn’t exist. Global IDs didn’t exist. Mobile computing was not a big deal. But now, with the volume of data, with the number of good actors and bad actors, the amount of innovation taking place, it’s extremely challenging.

  • I think it is very easy to criticize the fact that the governments of the world have not moved fast enough. But I feel like the challenge is, do you move too fast and break something? Or do you move slowly and come late to the party? There’s a bit of a bad choice on both sides. Nobody wants to be the person who over-promised and under-delivered.

  • The other thing I’d say is no country in the world wants to be responsible for passing laws that stymie their own local tech sector, while allowing companies in a different country an unfair advantage. So there is the antitrust aspect to it as well.

  • I feel like we’re gonna have to rethink the idea of how to pass regulation. And in this case, one of the reasons I wrote the book was hoping that I can have the attorneys, the policy people on the one side and the engineers, product managers on the other side come together to sort of really think about regulation in a meaningful fashion.

Data Privacy is Hard

  • Even though the book is primarily targeted towards engineers, it is aimed at a lot more people than just engineers. So I think of the book as three different books fused together.

  • The first one-third of the book is aimed at engineers, attorneys, and policymakers together to understand, set context, set a common vocabulary, and have a common, shared set of facts to start with. The middle one-third is aimed primarily at engineers, to build the tools and the systems, with some examples from a privacy and security perspective. The last one-third is aimed at policymakers, executives, and senior engineers, because then you wanna build things at scale, think about maturity, think about how you build for trust, how you think about reusing tools, how you make privacy efficient.

  • The end goal of the book is sort of threefold. The first is to build better engineers who can focus on not just depth, but breadth. The second is to close the gap between the engineers and the non-engineers. And the third is to set the conversation on how we need to do these things. Not just because privacy and security are the right thing to do, but because it’s good for business, it’s good for national security, it’s good for the company’s bottom line.

  • So if you can make those three things happen at the same time, build better engineers, bring people together, and make sure that good privacy and security are seen as good business, then this will become not a problem, but something that people see as an opportunity.

  • I’ll quote President Kennedy who said that “The best time to fix the roof is when the sun is shining.” Privacy is kind of like having that flooded house, because you didn’t fix your roof in time. That’s the challenge here. That is why it’s hard, cause by the time you focus on privacy, your home is flooded. The street is full of snow. The people who wanna fix the roof can’t get to your house in time, and as a result, the floodwater keeps rising.

  • Privacy is hard because people start too late, quite frankly. Because people don’t understand that privacy and security risks are not something you happen to come upon in one day. It is the combination of risks you have built over time. Bad decisions you made. Good decisions you didn’t make. Things you delayed. Things you knew were a problem, but you chose to look the other way. So it is a combination of a lot of different risks.

  • I think people sometimes feel like fixing privacy is all about hiring somebody like me or buying my book. But that’s like saying that you can eat badly all day, all year, and then on the first of the year, you’ll make a new year’s resolution, you’ll jump on the treadmill for 10 minutes, and then wonder why you didn’t lose the 40 pounds you gained over the year. It’s about accumulating risk over a long period of time and then trying to do a quick fix that will not fix the issue at hand.

  • The good thing is that there are things you can do incrementally. You can make the argument that collecting only what you need is not just a privacy imperative, it’s sound business. Like, you don’t buy food that you’ll never eat. You don’t buy a car that you’ll never drive. Why would you ship something that you’ll never use? Why would you collect data that you wouldn’t use? Why would you collect bad data? Why would you use data that is outdated? So the things you do wrong from a privacy perspective are also bad from a business perspective. (See the sketch after this list for one way to enforce this in code.)

  • Even if you don’t understand the first thing about privacy, you should know that the things you fix for privacy will also benefit some other part of your business. You should not be encrypting data that you will not be using. You should not give access to data to people who don’t need access to that data.

  • If you think about privacy, not as just a regulatory concern or a trust concern or a compliance concern, but as a business efficiency concern, you are already off to a good start. Just as you build privacy risk over time by not thinking about the business efficiency aspect of things, you can start addressing privacy concerns by asking yourself, “What can I do that is right from a privacy trust perspective, but also right from a business perspective?” So thinking of privacy and business not as competitive tension issues, but as business efficiency issues is the way to go.
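
To make the data minimization point concrete, here is a minimal sketch in Python of enforcing “collect only what you need” at the point of ingestion. Everything in it (the purposes, field names, and records) is hypothetical, invented for illustration, and not taken from Nishant’s book:

```python
# Minimal, hypothetical sketch of data minimization at ingestion.
# Purposes and field names are invented for illustration.

ALLOWED_FIELDS = {
    # declared purpose -> the fields the business actually needs for it
    "order_fulfillment": {"user_id", "shipping_address", "items"},
    "fraud_detection": {"user_id", "payment_fingerprint", "ip_address"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Drop every field the declared purpose does not need."""
    if purpose not in ALLOWED_FIELDS:
        # Fail loudly: a purpose must be declared before data is collected.
        raise ValueError(f"undeclared data purpose: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

raw = {
    "user_id": 42,
    "shipping_address": "1 Main St",
    "items": ["book"],
    "birth_date": "1990-01-01",  # not needed for fulfillment
}
print(minimize(raw, "order_fulfillment"))
# {'user_id': 42, 'shipping_address': '1 Main St', 'items': ['book']}
```

The design choice worth noting is that unknown purposes fail loudly, which forces teams to declare why they collect before they can collect at all.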

Privacy & Security

  • I think of privacy as security plus. If you think about traditional security, we’re talking about firewalls, certificates, encryption keys, things like that. The assumption is that’s all you need to protect data. But with privacy, you have to think of every security risk as a privacy risk. So if something is a security risk, it is by definition a privacy risk. So if you, in an unauthorized fashion, get into a company’s database and you steal somebody’s data, that’s obviously a security risk and a privacy risk at the same time.

  • But what happens if you are able to bypass security, either because you are an employee of the company or because you got into the company’s domain in a sneaky fashion? What happens if you get authorization to the data and then it gets used incorrectly?

  • From an engineering perspective, how do you think of privacy and security, not just as infrastructure and protecting the company, but about using the nuances of the data and protecting the customer as well? What happens if you collect data that you should not have collected, or if you collected data correctly, but now it is being used to do things that were not initially possible?

  • The challenge with data is, data is a living, breathing organism. Say you collected my data three weeks ago, and it was perfectly legitimate to collect that data and use it for a certain purpose. But now, three weeks later, you were also able to obtain some other data about me from some other source on the internet. And both of those combined can tell you things about me that you may not have been able to infer from the first collection alone. That’s a problem. Because now you have possibilities to do stuff to me and my data that you couldn’t do before, and I don’t have the ability as a customer to know that. (The sketch after this list shows how such a linkage works.)

  • Life is about compensating checks and balances, right? So should privacy and security be. So my advice to engineers is, use tooling, use processes, use cross-functional checks and balances to make sure that just as you innovate, you can also protect; just as you collect, you can also destroy. Just as you provide surprises to your customers, you can provide them with transparency and trust and choices. It’s all about making sure that there is a counterweight to everything else you do on a daily basis.
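
As a rough illustration of the point about combined data, here is a hypothetical Python sketch of a linkage: two datasets that look harmless on their own identify a person once joined on shared quasi-identifiers. Every record and field name below is invented:

```python
# Hypothetical sketch of a linkage risk. Every record below is invented.

# Dataset A: "anonymized" usage data that left your system.
usage = [
    {"zip": "98109", "birth_year": 1984, "sex": "F", "watched": "medical documentary"},
    {"zip": "10001", "birth_year": 1990, "sex": "M", "watched": "cooking show"},
]

# Dataset B: auxiliary data obtained elsewhere weeks later
# (public records, a vendor, a breach dump).
aux = [
    {"name": "Alice Example", "zip": "98109", "birth_year": 1984, "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

# Joining on the shared quasi-identifiers re-identifies the usage record,
# even though neither dataset contains a (name, watched) pair on its own.
for u in usage:
    for a in aux:
        if all(u[k] == a[k] for k in QUASI_IDENTIFIERS):
            print(f"{a['name']} watched a {u['watched']}")
# Alice Example watched a medical documentary
```

Neither dataset violated anyone’s privacy when it was collected; the risk appeared only when the two were combined, which is exactly the “living, breathing” quality of data described above.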

Privacy-First Culture

  • I often tell people, and this, I think it’s become almost cliche for me to say this by now, you would not have medicine without checking for the side effects first. When you go to the grocery store and buy milk, you check the expiration date. When you drive a car, and before you turn, you check the light and you check to make sure nobody’s coming.

  • In every other aspect of your life, common sense dictates that you account for safety and you account for some verification. Why on earth would you collect, ship, sell, share data without checking? Especially since if you make a mistake with that data, it could affect somebody’s life. It could lead to a big fine for your business, it could lead to a consent decree, it could lead to roadmaps being permanently affected. So just common sense, from a business perspective, dictates you should have a culture of privacy.

  • The second thing is, as I mentioned before, when you build the tooling and the processes to protect privacy, you are also building tooling and processes to protect your business. If you collect data that you should not collect, and you’ve already used that data for the wrong purposes, you later have to spend a lot of time to understand: okay, what did we do with it? How did this happen? How should we prevent it in the future? And that is time that you could have spent building the next product that’ll get you a ton of engagement and revenue.

  • Privacy mistakes will not only surprise you at a time not of your choosing, but they’ll affect your ability to make money and build stuff that’ll help your company succeed.

  • Having the right tools to check for privacy risks is extremely critical. Building the right privacy tooling, honestly, could help you build those other tools to protect your business.

  • When it comes to bad privacy risks, there is no such day as Christmas or New Year. Every day could be Friday night, right? So you wanna make sure that you build the right tools to protect yourself and the company. So there is the strategic business reason to protect privacy at all times.

  • Whether you see it from a do-the-right-thing perspective or from a business self-preservation perspective, you wanna build a culture of privacy. It’s about the right tools, the right processes, the right verification.

  • Honestly, I tell people that if most companies did the right thing from the basic perspective, I wouldn’t have a job. You wouldn’t need me. The reason I had to do this, to write the book and teach these courses, is because companies often end up in two extremes. They either don’t care about privacy and get surprised and then have to spend the next 10 years trying to fix their mistakes, or they become overcautious and piss everybody off and they end up stifling the engineers in the company with unnecessary process.

  • My job here, my goal, is to find that balance in the middle where companies can make informed decisions based on the right tooling, make the case for intelligent regulation and intelligent innovation, and showcase their work to the customer so they can get credit for doing the right thing from a privacy and security perspective.

Data for Machine Learning

  • A lot of people are using these words without knowing exactly what they mean, because that’s just how the world works these days. So don’t be intimidated. Ask questions and try to make sure you have your facts in place before you make decisions about data or make a case for having more or less data. That’s point number one.

  • AI and data collection is extremely complex. On the one side, you have to collect data to represent the sample accurately, to govern for data quality, to check against bias. On the other side, I’m not as concerned about people collecting data for AI purposes. I’m more concerned about people collecting data without caring about the data.

  • As long as you have the right controls to suss out the utility of the data and then delete it once its usage is complete, I’m okay. As long as people know what they’re collecting and why, and then deal with access control intelligently, that concern goes down. So I think data collection and AI can be done intelligently, thoughtfully, as long as you have the controls in place. Not just to protect people’s privacy, but to make sure that the data itself is useful and correct. That’s number two. (A rough sketch of such a retention control appears after this list.)

  • The third thing is that, from a security perspective, data collection also matters. So it is less about collection and more about careless collection; less about the volume of data and more about the lack of controls to enforce policies on the data.

  • Cause this is a continuous learning process. You collect the right kind of data. You check it and discover shortages or deficiencies in your collection processes, and then you improve those processes. Then you identify something that happens someplace else, and you improve your processes again. So it’s about teaching your AI models to better represent the customer data, and about better utilization of your engineering resources.

  • It’s about continuous learning for yourself, for your business, for your tools, and for your data itself. AI is not this thing that fell from the sky. It’s something that was built by human beings, but with a massive amount of data and a massive amount of scale. So you have to learn not just from the model perspective, but also from yourself in terms of building the model in the first place.
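
Here is a minimal sketch, assuming a simple in-memory store, of the kind of retention control described above: every record carries a declared purpose, and a periodic sweep deletes anything whose usage window has passed. The purposes, windows, and field names are hypothetical:

```python
# Hypothetical sketch of purpose-bound retention. Purposes, windows,
# and record fields are invented for illustration.

from datetime import datetime, timedelta, timezone

RETENTION = {
    "model_training": timedelta(days=90),
    "bias_audit": timedelta(days=365),
}

def sweep(records, now=None):
    """Keep only records still inside their purpose's retention window."""
    now = now or datetime.now(timezone.utc)
    kept = []
    for r in records:
        window = RETENTION.get(r["purpose"])
        # Undeclared purpose: drop by default rather than keep by default.
        if window is not None and now - r["collected_at"] <= window:
            kept.append(r)
    return kept

records = [{
    "purpose": "model_training",
    "collected_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "features": [0.1, 0.7],
}]
print(sweep(records, now=datetime(2024, 2, 1, tzinfo=timezone.utc)))  # kept
print(sweep(records, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))  # []
```

The point of the sketch is the default: data with no declared purpose, or data past its window, disappears on the next sweep rather than lingering indefinitely.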

Data Privacy Tooling

  • The challenge is there is no definitive tool available from a privacy perspective, because there is no definitive single privacy law. The absence of a law means that there’s an absence of a proper tool. There is no tool off the shelf, which is part of the reason I wrote my book, part of the reason I teach all my courses online.

  • The choices for companies are the following:

    • Build something from the ground up within the company. That has the upside of being built by people who have the tribal knowledge. But it has the downside of essentially being built by the same people who didn’t see it coming the first time around. So there is a tradeoff there.

    • There are multiple off-the-shelf solutions, third-party tools. They’re trying to fix these problems from an outsider’s perspective, but also make sure that there is a standard in the industry so that not everybody has their own bespoke software. That’s number two.

    • The third model is to start by building something in-house and then buy a third-party tool, or buy a third-party tool and then build something on top of it to cover your own use cases.

  • The domain is in its relative infancy. So I don’t think we’re at a point where we can just build something for everyone, because we don’t have one law in a given country. We don’t have an example of how one law can be properly, verifiably complied with, and we also don’t have a common way of doing things. There is so much diversification at the engineering level, at the privacy level, at the customer expectation level, at the international legal level, that it’s very hard to have one tool.

  • Which is why I tell people: shift left, start early, keep improving, keep building that virtuous circle. Then you can make this decision on an informed basis without being forced to comply with a law that may be expensive to comply with, and that in the end will protect neither your IP nor your customers.

Data Sharing

  • What happens in Vegas may stay in Vegas, but very little that happens elsewhere stays in that location.

  • Anytime data leaves your system, that is data sharing.

  • The problem starts when that data now gets shared and used for other purposes. When you collect that data as a company and you give it to third-parties without an understanding of what happens to the data once it gets there. Does that third-party have good privacy security practices? Is there an attack possible in the middle while the data is in transit?

  • For me, the biggest risk from a third-party sharing perspective is what happens when the data you shared, the data that exists on the dark web, and the data the vendor may have all combine to fundamentally change the risk calculus.

  • We talked about classification, inventory, tagging, labeling, etc. That happens once or twice in the company’s history. But then, once that data gets pooled with other data, the risk factor changes completely.

  • Think about what we can do to somebody’s anonymity, somebody’s identity, somebody’s physical safety at scale with massive algorithms, massive compute power. That is kind of the challenge when it comes to data sharing. As I mentioned before, data is not static. It is a living, breathing organism. Data is not like tax law that only changes once every generation. Data changes every single moment. Your data, my data, is changing as we speak.

  • What people typically don’t get from a sharing perspective is they go after hacking, they go after exfiltration, they go after attacks. But the real risk is what happens to the data without any malfeasance intended by anyone. Or what happens based on decisions that were made 2, 3, 4, 5 years ago that were totally legitimate decisions based on what we knew at the time. But with the advent of new technology, new algorithms, new manipulation systems, new AI, etc, the fundamental risk calculus has changed. And it’s very hard to reverse those decisions because the cat’s out of the bag at that point.

Data Consent

  • When consent is required, how it should be collected, the clarity of the copy, that is more of a legal question.

  • What I will say is, from the engineering perspective, from a tool perspective, it is critical to ask yourself: are you giving the customer enough information? Are you giving the customer too much information? Are you giving the customer an informed choice? Because at the end of the day, this is a combination of the tools you build, the copy and the clarity of the language itself, and the clarity and the integrity of the policy that’s behind it. (A minimal sketch of such a consent check appears after this list.)

  • This is not just about privacy or security, it’s about the complexity of the law.

  • People are homing in on privacy and consent a little too much, because this is a larger challenge. When it comes to the disconnect between the people building the tools and the people writing the laws, between the people who use the products and the people who push out the policy, there is a significant disconnect that did not begin with privacy.

  • The challenge with privacy is much bigger simply because of the volume of data. But I think, we have to, as a community, figure out a way that the people who build stuff and the people who write these policies are in the same sort of contextual framework as the people who say yes or no to these policies.
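
As one illustration of “the tools you build” for consent, here is a minimal, hypothetical sketch that records consent per user, per purpose, and per policy version, and refuses processing when the policy copy has changed since the user said yes. All names and fields are invented:

```python
# Hypothetical sketch of a consent check. All names and fields invented.

from datetime import datetime, timezone

# (user_id, purpose) -> consent record
consent_store = {}

def record_consent(user_id, purpose, granted, policy_version):
    consent_store[(user_id, purpose)] = {
        "granted": granted,
        "policy_version": policy_version,
        "at": datetime.now(timezone.utc),
    }

def may_process(user_id, purpose, current_policy_version):
    """Require an affirmative answer to the policy copy currently in force.

    If the policy language changed after the user consented, re-ask
    instead of assuming the old 'yes' still applies.
    """
    c = consent_store.get((user_id, purpose))
    return bool(c and c["granted"] and c["policy_version"] == current_policy_version)

record_consent(7, "personalization", granted=True, policy_version="2023-04")
print(may_process(7, "personalization", "2023-04"))  # True
print(may_process(7, "personalization", "2024-01"))  # False: policy copy changed
print(may_process(7, "ad_targeting", "2023-04"))     # False: never asked
```

Tying consent to a policy version is one way to keep the tooling honest with the copy: the engineering check fails automatically the moment the legal language it depends on changes.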

Data Classification

  • You wanna make sure that your categorization of data is as contextual as possible. When you collect data, before you categorize it, before you inventory it and tag it, there are decisions you can make about the data that might impact how seriously you treat its security or privacy. You can reduce the risk by doing things like aggregation, perturbation, data obfuscation, or some other modality of de-identifying the data, in which case you can keep the data for a long time. In other use cases, you can collect the data and not change it at all. In other words, take on the risk of identification, but keep the data for a very limited period of time and minimize access, in which case the risk goes down.

  • There is a constant tug of war between the precision of the data and the retention of the data, the longevity of the data versus its precision. You have to sort of see what that balance looks like for you. And that balance may change on a day-to-day, week-by-week basis, depending upon the volume of data you have, your risk appetite, the nature of the customer, the kind of data, the stage of growth you’re going through, and the country you’re doing business in. (The sketch after this list illustrates this tradeoff.)

  • So privacy is very contextual. It is very visceral. So you have to make sure that the tooling and the processes that you build for it are responsive to that complex nature of privacy.
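
To illustrate the precision-versus-retention tug of war, here is a small hypothetical sketch: three copies of the same field at different points on that curve, where more precision buys shorter retention and narrower access. The tiers, windows, and access lists are invented for illustration:

```python
# Hypothetical sketch of the precision-vs-retention trade-off.
# Tiers, retention windows, and access lists are invented.

import random

def bucket_age(age):
    """Aggregate: replace an exact age with a coarse ten-year range."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def perturb_age(age, jitter=3):
    """Perturb: add small random noise so a single record identifies less."""
    return age + random.randint(-jitter, jitter)

age = 37  # the value as collected

# Three copies of the same field at different points on the curve:
# the more precise the copy, the shorter it lives and the fewer who see it.
tiers = {
    #  tier         stored value      retention   who may read it
    "raw":        (age,               30,         {"fraud_team"}),
    "perturbed":  (perturb_age(age),  180,        {"analytics"}),
    "aggregated": (bucket_age(age),   3650,       {"analytics", "reporting"}),
}

for tier, (value, days, readers) in tiers.items():
    print(f"{tier:>10}: value={value!r} retention={days}d readers={sorted(readers)}")
```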

3 Tech Lead Wisdom

  1. When it comes to fixing for privacy and security, it is no different from any other innovation. Think of privacy as a product.

    • Sometimes people who work in privacy and security make the mistake of thinking of privacy and security as a cause, as a moral issue. But that is the beginning of the conversation.

    • If you went to any corporate CEO, they would tell you, we care deeply about privacy and security. Most important thing. They would also say we care deeply about growing our business and keeping our employees well paid. Most important thing. What happens when there is a conflict between those two?

    • Life is about making choices, right? So recognize that. And recognize that whether it’s privacy, security, misinformation, AI, fairness, equity; whatever your cause is, they are looked at through the prism of the business.

    • When you make the case for funding, for tooling, ask yourself, how do you make the case in a way that responds to the needs of the business? Now, there will be examples where it is critical to do the right thing from privacy and security perspective, no matter the business cost. But that is not true in every use case. Like you don’t have to run privacy and security in a way that hurts the business.

    • You have to have that level of judgment when it comes to privacy and security. You need to be very deliberate about telling the business, we shouldn’t do this because of privacy or security issues, no matter the cost to the business. But there are 50 other cases where you can say, the business wants X, but if we just do X a bit differently, we can get the right privacy outcome. And in the long run, that’s better for the business anyway.

    • Try to recognize that there is sometimes a moral case to be made, but in a lot of other cases, there is a business-sensitive case you can make that will make the right case for privacy and the right case for the business as well.

    • A lot of engineers often get extra careful and they hurt the business with unnecessary process. And in some cases, they become extra careless and they hurt the business, because they didn’t do the right thing. Recognize when it’s important from a moral perspective. When are you doing too little? When are you doing too much?

  2. My lesson to engineers is to ask questions. Seek the advice of the legal team, the comms team. Document things whenever possible. But if you have concerns, say something. The worst thing is maybe you will ask the wrong question at the wrong time. There is a lot of forgiveness in my experience from asking the wrong question or taking initiative. There will be a lot less forgiveness if you knew what the right thing was and still didn’t do it.

  3. Be humble, be creative, be ethical. That’s my advice to engineers, and I would give the same advice no matter what question you ask me, privacy or otherwise.

  4. The other advice I would give is don’t wait for regulation.

    • My big frustration in life, honestly, when it comes to engineers is that engineers have allowed themselves to be painted into a corner.

    • Engineers often accept the idea that their job is to write code and do what somebody tells them to do. No, I think engineers should be willing to understand that what they’re doing with data is extremely complex. It has implications upon people’s lives, but it also makes the company a lot of money. So don’t wait for the regulations. You should wait for the requirements, but don’t always wait for the regulations.

    • If you feel you can find a more intelligent way, build a more intelligent tool, or come up with a more intelligent process to protect privacy, make the case for it. Make the case based on data, make the case based on scenarios. Make the case based on business impact. And recognize that engineering is business from a technical lens and the business is engineering from a non-technical lens. The two are connected.

    • My advice to engineers would be, think about somebody else’s data as if it were your own. And ask yourself, how would you build the right tool for it? So don’t wait for regulation. Like if your house were on fire, you wouldn’t wait for the fire alarm to go off. If you can see the fire, if you can feel the heat, you’ll probably run for the door.

    • Ask yourself, why not do the right thing today rather than waiting for the regulation? Cause it is entirely possible that you have discovered something that the regulators have not. You can build the right tool and inform the next regulation that’ll benefit a lot more people. So this is a chance to do the right thing for your business, for your customers, and also for your own career as well, cause you’ve done something that nobody else has done so far.

Transcript

[00:01:03] Episode Introduction

Henry Suryawirawan: Hey, everyone. Welcome back to the Tech Lead Journal podcast, the podcast where you can learn about technical leadership and excellence from my conversations with great thought leaders in the tech industry. If you haven’t, please follow the show on your podcast app and social media on LinkedIn, Twitter, and Instagram. And to appreciate and support my work, subscribe as a patron at techleadjournal.dev/patron or buy me a coffee at techleadjournal.dev/tip.

My guest for today’s episode is Nishant Bhajaria. Nishant is a cybersecurity and data privacy executive and the author of “Data Privacy: A Runbook for Engineers”. In this episode, we discussed the importance of data privacy and privacy engineering. Nishant described his definition of data privacy and why it is becoming a key concern for users, companies, and regulators. He explained why doing data privacy is hard and how companies can build a privacy-first culture. Nishant also covered other data privacy topics, including data classification, data sharing, data consent, and data privacy applied to machine learning.

I hope you enjoy listening to this episode and learning a lot from it, as much as I learned from this conversation. And if you do, please share this with your colleagues, your friends, and your communities, and also leave a five-star rating and review on Apple Podcasts and Spotify. It will help me a lot in getting more people to discover this podcast. Let’s go to the conversation with Nishant after hearing a few words from our sponsors.

[00:03:01] Introduction

Henry Suryawirawan: Hello, everyone. Welcome back to another new episode of the Tech Lead Journal podcast. Today, I have with me the author of a book titled “Data Privacy”. It is actually quite an interesting topic, because we’ll be covering a lot about what data privacy is, for one thing, and what we can do from the engineering team and product team standpoint in order to protect our users’ data. So Nishant Bhajaria is here with me. And I’m really looking forward to this conversation. Hi, Nishant!

Nishant Bhajaria: Hello. Thank you.

[00:03:29] Career Journey

Henry Suryawirawan: To begin, I would like to ask you to share your career journey. Maybe share your highlights or turning points with the audience so that they can learn from your story.

Nishant Bhajaria: Yeah. Thank you for having me here. I appreciate the opportunity to talk about the book and my career journey here. So I am one of those people that did not quite fit into one lane. When you work for companies, anywhere in the US or anywhere else in the world for that matter, they think of you in terms of your skillset, your ladder. So accounting, engineering, non-engineering, legal, etc. I’m one of those people that likes to operate in multiple realms at the same time, because as companies become large and vast, opportunities exist where people don’t see them.

So my career journey began as an engineer working for Intel. That was my first job after graduate school. And then I made a switch in the late 2008, 2009 timeframe, away from Intel and semiconductor development to healthcare. At the time, it felt like a pretty unwise move, because I was leaving an extremely secure job in the middle of what was gonna become a pretty deep economic recession. And in the short term, it did feel that way, cause I had to go through some instability and a lot of new learning. But then I learned a lot about healthcare, about product management, about security, about compliance, about how to do things that represent the entire spectrum of the company rather than just the one area I would have worked on at Intel.

So that diversification enabled me to then gradually make the pivot to product management, program management, and running big teams, big organizations, really massive cross-functional initiatives. And then over time, that became a full-on segue into more detailed security and privacy engineering. So essentially helping protect the company from a business compliance perspective on the one side, while leveraging data to deliver features to customers without hurting their privacy and security. So essentially, I was able to represent the interest of the business from a commercial and risk perspective on the one side, while building for trust and compliance on the other side.

You have to remember, as you rise in the company, you have multiple customers. Your internal stakeholders are customers, but then your external customers are also your customers as well. But then people in the press, in the media, in the regulatory circles, they also represent your customer base. So how do you support multiple people at the same time? And I love the scale. I love the challenge. I love understanding whether it’s a product or a problem from multiple perspectives. So my career journey basically spans not just different skill sets, different companies, but also different levels of detail and different levels of strategic focus across the company and across the sector as a whole.

[00:05:45] Writing “Data Privacy” Book

Henry Suryawirawan: Thanks for sharing your story. And maybe the story about your book. So you started your journey in security and privacy engineering, maybe in this healthcare company. How did you come to write the book? What kind of problems did you see back then, and why did you decide to write it?

Nishant Bhajaria: So this is the question where people normally have an inspiring story when it comes to the book. I don’t have one. I wrote the book because it was the beginning of the Covid pandemic, and I didn’t have anything better to do. I couldn’t bake bread to save my life. So rather than turning the house into an oven and, like, setting it on fire, I thought writing a book would be a risk-free adventure.

And I never anticipated that the book would do well. I never anticipated that I would finish the book, for one thing, because I had never written one before. In fact, I had not even written a proposal before. And the publisher told me most books end up being abandoned, even if they’re started by an author who has one or two books behind their name. In my case, I didn’t have any experience writing a book. I had taught a lot of privacy, security, and career management courses on LinkedIn Learning before. But having a two-hour course is one thing; writing a book is something totally different. So I wanted to, say, do some research, make some connections, and take a first stab at writing a book, thinking that maybe the next time around I’d be a lot more prepared.

But once I started writing the book, once I started working with the editing team in London, once I started getting feedback from people who read individual chapters of the book, I realized how far I had to go in terms of being able to articulate my views in a way that people understood. But I also realized I had a lot of experience. I had learned a lot of things both good and bad throughout my own career. And there was a real opportunity to add something to the popular knowledge about security and privacy.

How do you build stuff that’s gonna force people to work together in a way that people typically didn’t? People tend to be within their silos, their OKRs, their metrics, their products, their commitments, right? How do you build something that is not for one product, but for the entire platform? How do you build stuff that is not reactive from a risk perspective, but proactive from an innovation perspective? So all I had to do was figure out how to leverage the lessons of my career over a lifetime into a 350-page book (in my case, 380 pages) that would benefit everybody at the same time. So it began as sort of, let me do something fun in the middle of this extremely challenging pandemic. And it became something a lot more inspirational.

The funny thing, Henry, is that this whole journey began for me not with the book, but with LinkedIn Learning and teaching my first course on privacy. I was in the middle of leaving one job and joining another, and I happened to be on LinkedIn one morning. And I saw somebody from LinkedIn Learning post a comment on somebody else’s wall and say, “Hey, we’re teaching this course on privacy, but we don’t have anybody to teach it yet.” Like, we’re thinking about this course. We have approval to put this course out there. And the person from LinkedIn had left this comment on the wall of a high-profile CISO in Silicon Valley. And they had not responded yet. The CISO hadn’t responded yet.

So I contacted this person on LinkedIn Learning and said, “Hey, if this person doesn’t respond, can you ping me, cause I’m interested?” And I had no shame, I had no reluctance to be desperate, cause you know, you only live one time. You have to sort of take your chances. And as it turned out, that person didn’t respond and I got the course. And that course did very well. It led to three other courses, and those three courses and their feedback from learners on LinkedIn Learning made Manning Publications notice me. And they’re like, okay, this guy must wanna do something. So I took initiative the first time; that led to courses, that led to the book, and that led to a lot of interesting opportunities, including this podcast.

So the lesson here is, if you wanna be a leader in the trust, security, and compliance space, you have to take some chances. You have to write your own book, in a manner of speaking, cause there is no handbook. There is no course that teaches you how to catch these opportunities and make a difference.

The second thing is there are a lot of people with the same questions. There are a lot of people struggling with questions I had 7, 8, 9 years ago, because a book didn’t exist. And I wished somebody had a book or some Google resource that I could learn from. Turned out I didn’t have that benefit. But the LinkedIn Learning courses, the podcasts I do, the book I’ve done hopefully serve as a resource for the next Nishant, who’s hopefully gonna do even better than me. Because sometimes you ask questions, and sometimes you ask and answer those questions, and I chose to do the latter.

Henry Suryawirawan: Wow! Thank you for sharing your story. I think that’s a very good message for the listeners here, right. Sometimes we have to create our own opportunity, rather than waiting for an opportunity to be offered to us. So I think that’s really a good thing.

And you started from the LinkedIn course, which I think many people could also do, creating courses and things like that. Which I find very interesting, because not everyone has the confidence, right, to be comfortable creating courses by themselves and then publishing them. So thanks for sharing this story.

[00:10:04] Building a Course

Nishant Bhajaria: When it comes to your last point, can I just make one more point about the courses?

Henry Suryawirawan: Yeah. Sure.

Nishant Bhajaria: So LinkedIn Learning has an excellent program for that, and they help you build out the table of contents, the courses, the scripts. It’s interesting like when you teach these courses, you have to essentially combine three things. You have to combine sort of the actual course material itself, the domain. If you talk about security or AI or career development, whatever that happens to be, you have to have your core content. Cause unless you do that, there’s no point in the course being there.

The second thing is you need to have a narrative. You cannot just throw instructions at people. Cause when we were kids, we have fond memories of our childhood because we remember our teachers, our parents, grandparents, counselors, whatever, telling us stories. Cause human beings fixate around stories. Like Richard Nixon, a former president, said that “you campaign in poetry, you govern in prose.” Most of us live our daily lives in prose. Like we have to get our job done, pay our bills, wake up in the morning to alarms, show up to meetings on time. There’s a lot of prose. But you really think of your favorite moments as the ones that have poetry. You know, the first time somebody gave you a promotion. The first time somebody listened to your idea. The first time you made a mistake and learned from it. So the second thing I’ll say when it comes to these LinkedIn Learning courses, have a narrative. Like, people who ping me with compliments tell me that they remember the stories, my life experiences, things that I did well, things that I didn’t do well. So that’s number two.

The third thing is the ability to promote these stories. Be able to tell people, “Hey, I’ve taught this course. Here’s who I aimed it at. Here’s what I’ve learned from the process.” So have the core competencies covered. Tell the stories and be ready to sort of really promote your work. Cause otherwise, there’s so much content out there that people won’t catch it. Building the course itself was a learning experience, and that leveraged everything I learned in college, in the classroom, outside the classroom, on the debate team, etc.

So I would urge people to think of privacy and security through that realm. Cause it’s one thing for people to say, I believe in security, I believe in privacy. First principles, do the right thing. Everybody says that. But how do you tell the stories? How do you build the course material? How do you come up with the dashboards? How do you build the products? How do you convince people to give you a chance, right? All those things are very important as well.

Henry Suryawirawan: Thank you for the tips. So I think that’s really very important, right? To build the narratives, not just spitting theories and points to share with the people. So I think building your own narrative, sharing your stories, I think is very powerful. And also, especially if you can be vulnerable and share your not so successful stories, I believe. So that people can relate and actually find that it is actually relevant to them.

Nishant Bhajaria: Yeah, I tell them all the time, I’m gonna be very generous with my mistakes so that you can make new mistakes of your own rather than the one I’ve made. So then I can learn from your mistakes rather than just learning from my own. So I think there is a lot of value in telling people about your mistakes, because for every 10 people in Silicon Valley who pretend that they know what they’re talking about, there are like a hundred out there who are afraid to ask the right question because they are concerned that they will not look as smart as the people think they are. So this imposter syndrome is real.

But these fields like security and privacy are pretty brand new. Like a lot of what we know today wasn’t known 10 years ago. The idea of having Open IDs wasn’t as big 10 years ago. Global internet was not as penetrative as it is right now. In fact, even the smartphone is something that happened in a sequence, right? Like Blackberry tried it first. I remember HP released the iPAQ. So innovation happens in peaks and valleys.

So it’s important for you to be honest with yourself and with others, because you may look stupid initially, but I feel in the long sweep of history, you will benefit people a lot more if you can be very comprehensive about your mistakes and your successes. Obviously, when you work for big companies, like I often do, you have to work with your comms team to make sure that you don’t end up revealing something that is IP or trade secret or whatever. But I feel like there is a lot of value in telling those stories and learning and helping others learn as well.

Henry Suryawirawan: Right. I hope one day I could also do the same, you know, publish my own course, telling my stories and let people learn from it.

Nishant Bhajaria: Right after we’re done with this podcast, you can get on LinkedIn, and maybe there’s somebody pinging somebody else, and you can intrude like I did and say, “Hey, if this person doesn’t respond, give me the course.” So that might work.

[00:13:43] Data Privacy Definition

Henry Suryawirawan: So Nishant, on to the topic of data privacy itself. I find it pretty rare to find good resources about it. So when I saw your book, it was one of those things that I just bumped into. But actually, the topic is pretty hot these days, when people talk about data, when people talk about things like GDPR in Europe. Here in Singapore, we also have something similar called the PDPA. And users have also become more aware that data privacy is a key thing for them.

And when I read your book, I found it very interesting. So the first thing I would ask you is actually to define what this data privacy is. And there’s the related term privacy engineering associated with it. Maybe if you can describe what those are, that will be great.

Nishant Bhajaria: Yeah, so privacy does not have a definition per se that is universally accepted. So I think of it as two different definitions, and hopefully we can overlap the two over the course of this conversation.

So from a user perspective, from a customer’s perspective, I like to think about people like my parents, my sibling, my grandparents, my spouse, her dad. For them, privacy is about being treated with respect, like being able to make informed decisions with their own data and not be caught by surprise. Like there should not be an example where somebody intentionally, willfully, or continuously and carelessly did something with your data that you would not have wanted them to do. Or in other words, I shouldn’t do something with somebody else’s data that I wouldn’t want somebody else to do with mine. So there’s a very human, visceral definition that may not be quantifiable, but something that is easily understandable, right? That’s the first definition.

The second thing I would say for privacy is, as a company, as an institution, as a government, you wanna make sure that you use somebody’s data in a way that is respectful, that is transparent, that is compliant, that is continuously improved. If you think about the scale of data, if you think about the nature of human engagement, if you think about the diversity of human beings across the world; no two people are gonna think about privacy the same way. So how do you, as a company, factor in the first part of respect and the second part of scale and governance and maturity? That, for me, is privacy. Making sure that people aren’t surprised, making sure that people aren’t disrespected, and that their business is handled courteously and professionally. So privacy is about handling data in a way that builds for both compliance and trust, maturity and transparency.

Henry Suryawirawan: Right. Thanks for a really great definition. The few key things that I picked up are about trust and about treating our users with respect, right? And also treating others the way you want to be treated, I guess. So if you don’t want your data to be shared, maybe don’t do that with other people’s data as well.

[00:16:03] Data Privacy Concerns

Henry Suryawirawan: In the past, maybe the last five or ten years or so, people started sharing their data on the internet more and more, right? With the introduction of new websites and new applications, people started sharing more data. And in the last few years, we've seen so many data breaches in the news, and people are becoming more concerned about it.

Maybe you can summarize all this. What are the actual concerns from the company's point of view, and also from the user's point of view? Why should they think a lot more about data privacy these days?

Nishant Bhajaria: So I think I'll start with something a lot more high level, and then hone in with a very specific example. What has happened in the last 10, 13 years is pretty significant, cause multiple forces have come together to change our world in ways that often make it hard to recognize compared to where we were just a generation ago. We had an expansion of internet access unlike any time before in human history. We had a switch from pure laptop and desktop functions to mobile devices. We had the explosion of global IDs; where in the past you had to create a username and password every single time, you can now authenticate using your Google ID or a bunch of other IDs. You had the ability to build platforms that give people capabilities, or that let other people sell stuff to customers at scale.

Now, in the past you had major changes happen in small increments. You had Intel switch from memory chips to processors, which was a pretty big shift for its time. We had this amazing tech bubble in the late 1990s, but that was an example of innovation in search of actual utilization. You had people building amazing stuff, but there was no market for it.

But in the last 10 years, we had several changes of that scale happen at the same time. And I don't think we have fully understood how much humanity has changed, cause in the last 10 years a bunch of other things have also changed: platform misinformation, abuse of trust, power consolidation in the tech sector. We've also seen examples of unstable democracies essentially teetering on the brink, people saying stuff that is factually not true. Because all of these things have happened at the same time, it is very hard to scale anything and measure things in a meaningful fashion.

So we have examples of people behaving badly or people behaving carelessly, and sometimes both at the same time. As a result of which, I can say we live in a world where our computational processing power far exceeds our moral processing power. So the ability to measure change, the ability to balance innovation and personalization on the one side with competition and compliance on the other is very hard to do.

So I feel like companies need to worry about this, because things could happen to you in a way that you cannot fully predict, at a time and place that is not of your choosing. And whether you are a company that's collecting the data and building the products on the one side, or a customer who wants privacy but also low latency at the same time, you have a bunch of expectations and a bunch of actions that are collectively incompatible with each other. And yet somehow we have to figure out how to make sense of this world we live in, because everybody wants everything all the time.

So that’s the challenge here. How do you catch these things before something bad happens? How do you build the right tools? How do you build the right products? How do you course correct before things go badly? How do you offer training and compliance at the same time? The lack of understanding and the lack of scaling and the lack of ability to undo things is the big challenge. So my advice to companies tends to be you should get things done correctly before you go too far down the path.

I remember in my undergraduate college days, one of our computer science professors had a sign outside her door saying, "Days and days of debugging save you hours and hours of planning and hours and hours of testing." And I think that irony is even more operative today, especially considering the volume of data, the scale of data, the proficiency of bad actors, and the sheer complexity of the regulations and the tech stack we operate in.

Henry Suryawirawan: And how about from the user side? So what would be your summary of the concerns that people should think about now from the user’s perspective about data privacy?

Nishant Bhajaria: I remember, and this was in 2003, I was an RA in our college dorm. This was the first time people had something akin to an online photo journal, hosted on the university's network. As an RA, you were not allowed to drink. In fact, if I remember correctly, nobody was allowed to drink in the college dorm. And this guy thought it was a good idea to have an open bottle of Bud Light, even though he was not 21, allow himself to be photographed with that bottle, and let somebody upload that photograph in a newsletter. He lost his job the next day.

But so many of us, not me obviously, cause I'm smart that way, have done things that may not look great from today's perspective, but there is no online record of it, right? That was the first example that what you do in a confined space may not remain private for too long. So I feel like that's the lesson here from a customer's perspective: how do you make intelligent decisions with your data?

But the challenge is, unlike somebody holding a beer bottle in their teens, the complexity now is that you may end up doing or saying something online that comes back to haunt you. And at the same time you want things to be fast. When you open the Netflix app, for example, how would you like it if the app took 10 minutes to load? No, you want to go on Netflix and find something within the first 10, 15 seconds so you can get on with your evening, so you can Netflix and chill, right?

With the customers, it’s the same thing. It’s the incompatibility of expectations around privacy and security on the one side, and expectations around quick performance of your service and app on the other side, right? That’s the challenge here. And the other aspect is a lot of customers don’t fully understand how the internet works, how online services get funded, because the domain has grown really quickly. And I think the tech sector has to do a much better job of telling people, “Hey, here’s how we make the internet work. Here’s how your data gets used.”

So the lack of patience and the abundance of complexity collectively mean it's very hard for customers to make an informed decision. Everything moves really quickly. There are too many fingers in the pie, too many cooks in the kitchen at the same time. And there's also the regulatory state: the tools being built to protect customers at the government level and the company level don't fully appreciate the complexity and the volume of data.

So everybody's moving very fast. The volume of data and the number of transactions are growing pretty fast. And as a result, customers cannot always make informed decisions. How many of us read… forget online for a second. When you get a new credit card, you get the bill in the mail, and alongside the bill you get 10 pages of small print, which is the governance and terms and conditions. How many people really read that stuff, right? The gap between the level of clarity and understanding on one side and the implications on the other is, I think, the big challenge for customers to reconcile right now.

[00:22:07] Data Privacy Regulations

Henry Suryawirawan: And you mentioned regulations, right? I also feel that the regulations came out pretty late, after all these things had already become a messy situation. I myself am not familiar with all these data privacy rules and regulations. Maybe you can also share some of the concrete things that some countries have done to protect their citizens, their users, where data privacy is concerned. I know one thing, GDPR. But are there other countries at the forefront of all this?

Nishant Bhajaria: So I would first qualify my answer by saying that when it comes to regulation, there are two perspectives. One is: let's come up with something quick to address the most pressing issue in the land. But the second perspective, which the policy folks I've worked with in the past have educated me on, is the fact that you only get to do so much in the system.

And if you look at the US government system, you have a House of Representatives, 435 members, where you need a majority of 218 to pass something. Then you have the Senate, the second half of the legislative branch: a body of one hundred senators, two per state, 50 states. You need 51 votes to pass something, but in essence you need 60 to overcome a filibuster and get a bill to the point where 51 votes can lead to passage. Then you have the executive, the president, who may or may not sign it. And then you have the judiciary, multiple courts across the country leading up to the Supreme Court, which decides whether the law is constitutional or not. The system is very complex.

So what regulators wanna do is pass something in an omnibus fashion that covers as many use cases as possible. Cause the idea that you can pass something once and then a second time and a third time is not always viable; you have multiple bodies to convince, right? If you look at tax law, it typically only gets passed once a generation. I think the last immigration law of any consequence was passed in the sixties, if I remember correctly. So you have this extremely complex system that has to pass and then enforce regulations, and it's very hard to do. So when you say that it took a long time, it's because the systems that are required to work together to pass regulations are extremely complex. That's number one.

The second thing I’d say is a lot of the people who build complex technical systems and the people who pass regulations are living in very different universes. The people who pass these laws tend to be policy makers, attorneys who don’t always understand technology. And the people who build these tools, collect this data are often engineers who don’t understand the world of policy. So the gap between the doers and the builders on the one side and the enforcers on the other side is a challenge.

Now that may not have been such a big deal 15, 20 years ago when, as I mentioned before, cloud computing didn't exist, global IDs didn't exist, and mobile computing was not a big deal. But now, with the volume of data, the number of good actors and bad actors, and the amount of innovation taking place, it's extremely challenging.

So I think it is very easy to criticize the fact that the governments of the world have not moved fast enough. But I feel like the challenge is do you move too fast and break something? Or do you move too slow and come late to the party? There’s a bit of a bad choice on both sides, right? Nobody wants to be the person that over promised and underdelivered.

The other thing I'd say is no country in the world wants to be responsible for passing laws that stymie their own local tech sector while giving companies in a different country an unfair advantage. So there is an antitrust aspect to it as well. I would say GDPR is a good start. CPRA is a good start. The ISO standard that I was part of when I was at Google back in the day was a good start. But I feel like we're gonna have to rethink how we pass regulation. One of the reasons I wrote the book was the hope that the attorneys and policy people on the one side and the engineers and product managers on the other can come together and really think about regulation in a meaningful fashion. Not pass regulation themselves necessarily, but tell the regulatory state: hey, we were able to work internally in the company, and here's how we think regulations can be better.

And I want the regulatory state to read the book and say, hey, now we have an engineering perspective. Cause the name of the book is “Data Privacy: A Runbook for Engineers”. I want these folks to work with each other and say, here’s what the next step, the next GDPR should look like. Because I want engineers and non-engineers to work together in the company to meet their current obligations and use that cooperation, use those learnings to contribute to the next generation of regulations. Which will in turn improve the next generation of innovation and make that virtuous circle happen without distrust, without talking past each other.

Henry Suryawirawan: Right. So I think that’s a pretty good objective, right? So to have people build more awareness, including the government side.

[00:26:23] Data Privacy is Hard

Henry Suryawirawan: So you mentioned this book is targeted at engineers in the first place. And I feel like a lot of companies, product companies especially, when they build a product, may not start by thinking about data privacy. Maybe some companies do. But a lot of times, they focus on the features, the functional requirements, so to speak, of what the product should do.

In the first few chapters of your book, you actually say that data privacy is hard to do; you open the book by saying data privacy is hard. Maybe you can explain a little bit of the complexity involved in starting to work on privacy engineering?

Nishant Bhajaria: So even though the book is primarily targeted towards engineers, Henry, I think it is aimed at a lot more people than just engineers. I think of the book as three different books fused together.

The first one-third of the book is aimed at engineers, attorneys, and policymakers together, to set context, a common vocabulary, and a shared set of facts to start with. The middle one-third is aimed primarily at engineers, to build the tools and the systems, with some examples from a privacy and security perspective. The last one-third is aimed at policymakers, executives, and senior engineers, because then you wanna build things at scale: think about maturity, how you build for trust, how you reuse tools, how you make privacy efficient. Which is a big topic these days: how do you use resources efficiently?

So even though the book is aimed at engineers, it is aimed at a much bigger universe, cause the end goal of the book is threefold. First, build better engineers who can focus on not just depth, but breadth. Second, close the gap between the engineers and the non-engineers. And third, set the conversation on how we need to do these things. Not just because privacy and security are the right thing to do, but because they're good for business, good for national security, good for the company's bottom line.

So if you can make those three things happen at the same time, build better engineers, bring people together, and make sure that good privacy and security are seen as good business, then this will become not a problem, but something that people see as an opportunity.

Henry Suryawirawan: Right. And there's the other thing I asked just now: how do we get started? Most companies, I would say, may not know the kind of challenge and complexity they have to deal with when they think about data privacy and privacy engineering.

So maybe you can elaborate a little bit more on why data privacy can be hard for engineers or product companies to start thinking about.

Nishant Bhajaria: I would like to quote another president now. I quoted President Nixon once, and I'll quote President Kennedy, who said that "the time to repair the roof is when the sun is shining." The reason privacy is hard is because people buy a really big house. They wanna make sure it looks really nice on the outside. They buy amazing, expensive furniture, kitchen cabinets with granite, etc. But they forget to fix the roof. And it's not a big deal, cause they moved in the summer, cause that's when most people move, because it's the break from school, right? And then the rain comes in the winter, the snow falls, and you realize the fact that you didn't have a good roof means your home is now flooded.

Privacy is kind of like having that flooded house, because you didn’t fix your roof in time. That’s the challenge here, right? So that is why it’s hard, cause by the time you focus on privacy, your home is flooded. The street is full of snow. The people who wanna fix the roof can’t get to your house in time, and as a result, the floodwater keeps rising.

So privacy is hard because people start too late, quite frankly. Because people don’t understand that privacy and security risks are not something you happen to come upon in one day. It is the combination of risks you have built over time. Bad decisions you made. Good decisions you didn’t make. Things you delayed. Things you knew were a problem, but you chose to look the other way. So it is a combination of a lot of different risks.

And I think people sometimes feel like fixing privacy is all about hiring somebody like me or buying my book. But that's like saying you can eat badly all year, and then on the first of January you make a New Year's resolution, jump on the treadmill for 10 minutes, and then wonder why you didn't lose the 40 pounds you gained over the year, right? It's about accumulating risk over a long period of time and then trying a quick fix that will not address the issue at hand. So that's why privacy is hard.

The good thing is that there are things you can do incrementally. You can make the argument that collecting only what you need is not just a privacy imperative, it's sound business. You don't buy food that you'll never eat. You don't buy a car that you'll never drive. Why would you ship something that you'll never use? Why would you collect data that you won't use? Why would you collect bad data? Why would you use data that is outdated? The things you do wrong from a privacy perspective are also bad from a business perspective.

So even if you don’t understand the first thing about privacy, you should know that the things you fix for privacy will also benefit some other part of your business. You should not be encrypting data that you will not be using. You should not give access to data for people who don’t need access to that data, right?

So if you think about privacy, not as just a regulatory concern or a trust concern or a compliance concern, but as a business efficiency concern, you are already off to a good start. Just as you build privacy risk over time by not thinking about the business efficiency aspect of things, you can start addressing privacy concerns by asking yourself, “What can I do that is right from a privacy trust perspective, but also right from a business perspective?” So thinking of privacy and business not as competitive tension issues, but as business efficiency issues is the way to go.

Henry Suryawirawan: I like the way you frame privacy as something that's also good for the business, right? It's not just about complying with regulations or meeting users' needs. It's actually good for the business.

Nishant Bhajaria: Exactly.

[00:31:22] Privacy & Security

Henry Suryawirawan: So, on the actual details of privacy. In your book, you mention that at the fundamentals, privacy is all about handling the data: how you collect it, how you store it, how you classify it, and things like that. In terms of implementation, maybe you can give a little explanation for the engineers listening here: what should their concerns be, during design, during implementation, and in how they handle the data within the whole ecosystem of systems in a product company?

Nishant Bhajaria: So let me give a very specific example here, Henry. I think of privacy as security plus. And I know people get really mad in the privacy domain, cause we don't like it when people lump us with security; we're separate. But honestly, if you think about traditional security, we're talking about firewalls, certificates, encryption keys, things like that. The assumption is that's all you need to protect data. The thing with privacy is, you have to think of security risk as privacy risk. If something is a security risk, it is by definition a privacy risk. If you get into a company's database in an unauthorized fashion and steal somebody's data, that's obviously a security risk and a privacy risk at the same time, right?

But what happens if you are able to bypass security, either because you are an employee of the company or because you got into the company's domain in a sneaky fashion? What happens if you get authorization to the data and then it gets used incorrectly? As an example, say I collected your data to recommend shoes to you, or, on amazon.com, the next thing you should purchase. If you bought dog food six weeks ago and your dog typically needs that food refreshed every six or eight weeks, then at the fourth week it may make total sense for me to show you an ad saying, buy this. That is totally legitimate, as long as we have consent and whatnot, right? But if I infer things about you like your race, your gender, etc., that's a problem.

So from an engineering perspective, how do you think of privacy and security not just as infrastructure and protecting the company, but as understanding the nuances of the data and protecting the customer as well? What happens if you collect data that you should not have collected, or if you collected data correctly, but now it is being used to do things that were not initially possible? The challenge is that data is a living, breathing organism. Say you collected my data three weeks ago, and it was perfectly legitimate to collect it and use it for a certain purpose. But three weeks later, you also obtained some other data about me from some other source on the internet, and the two combined can tell you things about me that you could not have inferred from the first collection alone. That's a problem. Because now you can do things with me and my data that you couldn't do before, and I don't have the ability as a customer to know that.

So my insight to engineers is: continuously classify the data based on risk. Tag the data based on your understanding of the risk. And enforce policies on an ongoing basis. If you do those things, then just as the data and the risk accumulate on an ongoing basis, your ability to understand that risk and protect your customer from it also grows on an ongoing basis.
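
To make "classify, tag, enforce" concrete, here is a minimal sketch of what an ongoing policy check might look like. This is not from the episode or the book; the risk tiers, retention windows, and team names are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum

# Illustrative risk tiers; real tiers would come from your legal/privacy team.
class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

# Hypothetical policy: retention window and allowed teams per risk tier.
POLICY = {
    Risk.HIGH:   {"retention_days": 30,  "allowed_teams": {"security"}},
    Risk.MEDIUM: {"retention_days": 180, "allowed_teams": {"security", "analytics"}},
    Risk.LOW:    {"retention_days": 365, "allowed_teams": {"security", "analytics", "marketing"}},
}

@dataclass
class TaggedField:
    name: str
    risk: Risk
    collected_at: datetime

def is_expired(field: TaggedField, now: datetime) -> bool:
    """A record is expired once it outlives the retention window for its tier."""
    ttl = timedelta(days=POLICY[field.risk]["retention_days"])
    return now - field.collected_at > ttl

def can_access(field: TaggedField, team: str) -> bool:
    """Access is allowed only to teams listed in the policy for that tier."""
    return team in POLICY[field.risk]["allowed_teams"]

# Usage: a nightly job would sweep the catalog, deleting expired records
# and flagging access requests that violate policy.
field = TaggedField("ip_address", Risk.HIGH, datetime(2023, 1, 1, tzinfo=timezone.utc))
print(is_expired(field, datetime.now(timezone.utc)))  # True once past 30 days
print(can_access(field, "marketing"))                 # False: not in the HIGH allow-list
```

A real system would hang these checks off a data catalog and run the sweep continuously, which is the "ongoing basis" Nishant describes.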

It's a bit like eating. Let's assume you eat a lot every single day, like I do. As long as you work out the next morning, there's a good chance the calories you're accumulating are being burned by your running. You balance risk and reward in every other aspect of your life: if you have a big expense, you cut back on something else. If you stay up all night to watch a movie, you get some extra rest over the weekend.

Life is about compensating, about checks and balances, right? So should privacy and security be. My advice to engineers is: use tooling, use processes, use cross-functional checks and balances to make sure that just as you innovate, you can also protect; just as you collect, you can also destroy; just as you surprise your customers, you can also provide them transparency and trust and choices. It's all about making sure there is a counterweight to everything else you do on a daily basis.

Henry Suryawirawan: Wow! I think that's a pretty good message for engineers here: always be conscious, try to classify your data and the risk associated with collecting it. And also think about compensating, right? If you collect more, maybe one day you should think about destroying.

I got this piece of advice once in my career: just collect the data; who knows, one day we will need it. I think that advice no longer applies in this data-privacy-conscious world.

[00:35:23] Privacy-First Culture

Henry Suryawirawan: And one more thing: how can we build a privacy-first culture within the company? Or maybe, for developers, something like privacy-driven development. I dunno whether that term exists, but how can a company start building this culture, so that whenever people work on a feature or a new product, they start thinking: okay, maybe we should put privacy at the forefront of the concerns in the design, in the approval, and so on?

Nishant Bhajaria: So I'm gonna give you two answers to your question. The first is at the more strategic level, and the second is more brass-tacks examples.

So I often tell people, and I think it's become almost cliche for me to say this by now: you would not take medicine without checking for the side effects first. When you go to the grocery store and buy milk, you check the expiration date, right? Or at least I do. When you drive a car, before you turn left or right, you check the light and you check to make sure nobody's coming (hopefully before you turn, although in California people don't do that, and more people should, but this is California, where nobody knows how to drive; different topic for a different day). In every other aspect of your life, common sense dictates that you account for safety and some verification. Why on earth would you collect, ship, sell, or share data without checking?

Especially because, in real life, if you buy $4 worth of milk and it turns out it's not great, you can always go back to the store and return it. Or worst case, you're only out $4; you will not drink bad milk, right? Why would you behave in a cavalier fashion when it comes to large volumes of data? Especially since if you make a mistake with that data, it could affect somebody's life. It could lead to a big fine for your business, a consent decree, roadmaps being permanently affected, right? So just common sense from a business perspective dictates that you should have a culture of privacy.

But the second thing is, as I mentioned before, when you build the tooling and the processes to protect privacy, you are also building tooling and processes to protect your business. If you collect data that you should not have collected anyway, then when it comes to discovering that data, and if you've already used it for the wrong purposes, you then have to spend a lot of time understanding: okay, what did we do with it? How did this happen? How should we prevent it in the future? And that is time you could have spent building the next product that will get you a ton of engagement and revenue. So privacy mistakes will not only surprise you at a time not of your choosing, but they'll affect your ability to make money and build the things that will help your company succeed.

So having the right tools to check for privacy risks is extremely critical. Companies have invested in tools to make sure you can block any code release that will break your build. They will make sure you don't release something on a Friday night right before the weekend. If you work for a retail company, I bet there are checks and balances to make sure you don't release something the day before Christmas, right? So my sense is, building the right privacy tooling, honestly, could help you build those other tools to protect your business.

Because Christmas comes once a year, so you wanna be careful with that release. When it comes to privacy risks, there is no such day as Christmas or New Year. Every day could be Friday night, right? So you wanna make sure you build the right tools to protect yourself and the company. That's the strategic business reason to protect privacy at all times.
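
As a sketch of what a "block the build" style privacy gate could look like, here is a toy pre-release check that fails a pipeline when newly added schema fields carry no privacy classification. The tag taxonomy and schema shape are assumptions for illustration, not a standard:

```python
# A toy pre-release check in the spirit of "block the build":
# fail the pipeline if a schema adds fields with no privacy classification.

ALLOWED_TAGS = {"public", "internal", "sensitive", "highly_sensitive"}  # assumed taxonomy

def check_schema(schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the release may proceed."""
    violations = []
    for field, meta in schema.items():
        tag = meta.get("privacy_tag")
        if tag is None:
            violations.append(f"{field}: missing privacy_tag")
        elif tag not in ALLOWED_TAGS:
            violations.append(f"{field}: unknown tag '{tag}'")
    return violations

new_schema = {
    "user_id":    {"type": "string", "privacy_tag": "internal"},
    "ip_address": {"type": "string"},  # untagged: this should block the release
}

problems = check_schema(new_schema)
if problems:
    raise SystemExit("Privacy gate failed:\n" + "\n".join(problems))
```

Wired into CI, a check like this makes the classification step as unavoidable as a failing unit test, which is the point of the analogy.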

But there is also the ability to protect your roadmap, your own performance, your own bonus, your own release cycles, your own metrics. So whether you see it from a do-the-right-thing business perspective or from a business self-preservation perspective, you wanna build a culture of privacy. It's about the right tools, the right processes, the right verification. And fundamentally, I'm not talking about rocket science. Everything I've talked about, people would do anyway even if privacy and GDPR didn't exist as topics. It just so happens that privacy has become this big scary thing that people are afraid of.

Honestly, I tell people that if most companies did the right thing at the basic level, I wouldn't have a job. You wouldn't need me. Now, I'm glad I have a job. I'm glad I exist. But the reason I had to write the book and teach these courses is that companies often end up at two extremes. They either don't care about privacy, get surprised, and then have to spend the next 10 years trying to fix their mistakes; or they become overcautious, piss everybody off, and end up stifling the engineers in the company with unnecessary process. My goal is to find the balance in the middle, where companies can make informed decisions based on the right tooling, make the case for intelligent regulation and intelligent innovation, and showcase their work to the customer so they get credit for doing the right thing from a privacy and security perspective.

[00:39:23] Data for Machine Learning

Henry Suryawirawan: I wanna bring up one recent trend, which I believe makes some people think differently about collecting data. We're all talking about AI and machine learning these days. And as we all know, for machine learning to work properly, you need lots of data, lots of labels and tags, so to speak, right? You need to describe your users with a lot more attributes.

So what's your take on this new trend? People think we have to collect more and more, and identify users better, so that the machine learning model becomes more accurate. Maybe you can give some advice here for people who think that to build machine learning, we simply need more data.

Nishant Bhajaria: So before I answer the question, I want to be a little snarky here. There are some words people use to appear smart. I remember, after I got married, my wife and I would go to nice grocery stores. Until then I would go to the cheapest grocery store, but with her, I'd go to nice places. And a lot of products had words like organic, farm fresh. I still don't know what any of that actually means, but people often say things to sound smart. In Silicon Valley, a year ago you had to say homomorphic encryption at least once in the first 10 seconds or people didn't think you were smart. Now the topic is generative AI. Six months ago it was governance, right?

So the first thing I would tell people is that a lot of people are using these words without knowing exactly what they mean, because that's just how the world works these days. So don't be intimidated. Ask questions and make sure you have your facts in place before you make decisions about data, or make a case for having more or less data. That's point number one. And I still don't know what farm fresh actually means, but that doesn't stop me from asking the question.

To answer your question specifically, I would say AI and data collection is extremely complex. On the one side, you have to collect data to represent the sample accurately, to govern for data quality, and to check against bias. On the other side, I'm not as concerned about people collecting data for AI purposes. I'm more concerned about people collecting data without caring about the data.

As long as you have the right controls to suss out the utility of the data and then delete it once its usage is complete, I'm okay. As long as people know what they're collecting and why, and then deal with access control intelligently, that concern goes down. So I think data collection for AI can be done intelligently and thoughtfully, as long as you have the controls in place; not just to protect people's privacy, but to make sure that the data itself is useful and correct. That's number two.
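
One hedged way to picture "delete it once its usage is complete" is purpose-bound collection: every record carries the purpose it was collected for, and becomes deletable the moment that purpose is fulfilled. A minimal sketch, with invented field names:

```python
# A toy purpose-bound data store: collection is tied to a declared purpose,
# and the record becomes deletable the moment that purpose is fulfilled.
records = [
    {"user": "u1", "data": "address", "purpose": "ship_order_1042", "purpose_done": True},
    {"user": "u2", "data": "address", "purpose": "ship_order_1043", "purpose_done": False},
]

def sweep(records):
    """Split records into those still serving an active purpose and those to delete."""
    kept, deleted = [], []
    for r in records:
        (deleted if r["purpose_done"] else kept).append(r)
    return kept, deleted

kept, deleted = sweep(records)
print(f"kept {len(kept)}, deleted {len(deleted)}")  # kept 1, deleted 1
```

The design choice here is that deletion is the default outcome rather than an afterthought: data with no live purpose simply doesn't survive the next sweep.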

The third thing is, from a security perspective, data collection is also important, right? Because unless you have the right level of profiling of users, you cannot decide which user is about to DDoS you versus which user is being improperly penalized. So it is less about collection and more about careless collection; less about the volume of data and more about the lack of controls to enforce policies on the data, right?

Cause this is a continuous learning process. You collect the right kind of data, you check for shortfalls or deficiencies in your collection processes, and then you improve those processes. Then you identify something that happens someplace else, and you improve your processes again. It's about teaching your AI models to be a better representation of the customer data and a better use of your engineering resources.

It’s about continuous learning for yourself, for your business, for your tools, and for your data itself. Remember, AI is not this thing that fell from the sky. It’s something that was built by human beings, but with massive amount of data and massive amount of scale. So you have to learn not just from the model perspective, but also from yourself in terms of building the model in the first place.

Henry Suryawirawan: I think those are very good insights about AI and generative AI. People talk about it a lot these days, right? But maybe they are not familiar with the whole thing, and they think: let's just collect data, and maybe one day an ML model will find it useful.

[00:42:45] Data Privacy Tooling

Henry Suryawirawan: So you've mentioned a couple of times now that it's important to have tooling within the company. Maybe you can give a brief overview: what kind of tooling is available out there? Is it something that can be automated? Is it more like a library or a client SDK that we can embed? Or is it something lower level? Maybe you can share some of the tooling that's available so that people become familiar with it.

Nishant Bhajaria: So the challenge is, there is no definitive tool available from a privacy perspective, because there is no definitive single privacy law. For that matter, in the US we have multiple breach notification laws that vary state by state. I'm not an attorney, but I think my facts are in place here. The absence of a single law means the absence of a definitive tool. I mean, in the US we have very complex, very archaic tax law, and we have multiple tax preparation software packages that were written to basically scare the crap out of me, because it's extremely complex and I have no way of knowing if everything is correct. I'm just hoping the tool actually works; my only other choices are to go to a CPA or do it myself. And every option has downsides, right? So there is no tool off the shelf, which is part of the reason I wrote my book and part of the reason I teach my courses online.

The choices for companies are the following. First, build something from the ground up within the company. That has the upside of being built by people who have the tribal knowledge, but the downside of being built by the same people who didn't see it coming the first time around. So there is a trade-off there. Second, there are multiple off-the-shelf solutions, third-party tools. I advise some of these companies, to be totally honest with you, and they're trying to fix these problems from an outsider's perspective, but also to make sure there is a standard in the industry so that not everybody has their own bespoke software. The third model is to start by building something in-house and then buy from a third-party vendor, or buy from a third-party vendor and then build on top of that to cover your own use cases.

I don't think there is one answer for every company. Hopefully, we get to a point where, on a sector-by-sector basis, or for different kinds of data, or for different cloud vendors, there are certain sets of tools that work. But the domain, as you mentioned at the very beginning, Henry, is in its relative infancy. So I don't think we're at a point where we can build something for everyone, because we don't have one law in a given country, we don't have an example of how one law can be properly and verifiably complied with, and we also don't have a common way of doing things.

There are legacy companies moving to the cloud for the first time. There are companies that, for a whole host of reasons, preferred an on-prem infrastructure. There are companies that still have a monorepo, and others that have multiple repos. There are companies that have a single point of failure, and others that have a microservices model. There is so much diversification at the engineering level, the privacy level, the customer expectation level, and the international legal level that it's very hard to have one tool.

Which is why, again, I tell people: shift left, start early, keep improving, keep building that virtuous circle. Then you can make this decision on an informed basis, without being forced to comply with a law that may be expensive to comply with and that, in the end, will protect neither your IP nor your customers.

Henry Suryawirawan: I like the way you mention shift left. As we have shifted left with so many other things, like automation and security, privacy could also be one area where we shift left and do better planning earlier.

[00:45:45] Data Sharing

Henry Suryawirawan: So, one thing is companies collecting data for their own purposes, right? But these days we also see a lot of data being shared with third-party apps or other users. So data sharing, I think, is another thing we could discuss a lot today. What do you think about this aspect, where companies collect data and then share it with others? Think of Google's consent screen: we will share your data with this third-party app. Or other apps doing the same. What would be your key message here about data sharing?

Nishant Bhajaria: Two things. You know, what happens in Vegas may stay in Vegas, but very little that happens elsewhere stays in that location. That's number one. I'm sure I could have landed that joke a bit better, but the general point remains: anytime data leaves your system, that is data sharing. You turn on your TV and open the Netflix app; a bunch of stuff about you goes back to Netflix. Now, this is not creepy at all, because Netflix needs that data. They need to understand where you live, whether you are who you say you are, your device ID, your internet connection, your browser type, etc. The streaming experience has to be customized. It's not like a DVD, which, by the way, is the business the Netflix folks just shut down, right? Everything is now online from a streaming perspective, and that is data driven.

The problem starts when that data gets shared and used for other purposes. When you collect that data as a company and give it to third parties without an understanding of what happens to the data once it gets there. Does that third party have good privacy and security practices? Is an attack possible in the middle, while the data is in transit? That's number two.

For me, the biggest risk from a third-party sharing perspective is what happens when the data you shared, the data that exists on the dark web, and the data the vendor may have all combine to fundamentally change the risk calculus. Remember, we talked about risk analysis at the beginning stage of data collection, right? We talked about classification, inventory, tagging, labeling, etc. That happens once or twice in a company's history. But once that data gets pooled with other data, the risk factor changes completely.

For people listening to this podcast, you should google "Mitt Romney Twitter account". So M-I-T-T, Mitt. R-O-M-N-E-Y, Romney, Twitter account. Governor Romney, now Senator Romney, is a highly placed official in the US government, a former presidential candidate, and a very successful venture capitalist. He mentioned to a journalist, I think three or four years ago, that he has a private Twitter account. He has a public Twitter account because he works in the US government, but he also mentioned he has a private one. He didn't mention the handle.

And a journalist who listened to that interview was able to identify within a few hours what that Twitter account was, based on information about Romney that she had: how many kids he has, what his business ventures were, his history, including where he served as a missionary for his church in his younger years. Based purely on that information, she was able to figure out the account. This was somebody who was not a computer science engineer and did not have privacy domain expertise, and she figured it out within two hours.

Think about what we can do to somebody’s anonymity, somebody’s identity, somebody’s physical safety at scale with massive algorithms, massive compute power, right? So I think that is kind of the challenge when it comes to data sharing. As I mentioned before, data is not static. It is a living, breathing organism. Data is not like tax law that only changes once every generation. Data changes every single moment. Your data, my data is changing as we speak, as more words come out of my mouth and get transcribed on your system, right?

So what people typically miss from a sharing perspective is that they go after hacking, exfiltration, attacks. But the real risk is what happens to the data without any malfeasance intended by anyone. Or what happens based on decisions made 2, 3, 4, 5 years ago that were totally legitimate given what we knew at the time. But with the advent of new technology, new algorithms, new manipulation systems, new AI, etc., the fundamental risk calculus has changed. And it's very hard to reverse those decisions, because the cat's out of the bag at that point.
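
The Romney story is an instance of what's usually called a linkage attack: two datasets that look harmless on their own can re-identify a person once joined on quasi-identifiers. A toy sketch of the mechanics, with entirely made-up data and field names:

```python
# Two datasets that are individually "anonymous" can re-identify people
# once joined on quasi-identifiers (the pattern behind the Romney story).

# Hypothetical "anonymized" activity log: no names, just quasi-identifiers.
activity = [
    {"zip": "10001", "birth_year": 1947, "kids": 5, "item": "private_account_X"},
]

# Public profile data, gathered separately (interviews, public records, etc.).
public_profiles = [
    {"name": "Well-Known Person", "zip": "10001", "birth_year": 1947, "kids": 5},
    {"name": "Someone Else",      "zip": "94103", "birth_year": 1990, "kids": 0},
]

QUASI_IDS = ("zip", "birth_year", "kids")

def link(activity, profiles):
    """Join the two datasets on quasi-identifiers alone."""
    matches = []
    for a in activity:
        for p in profiles:
            if all(a[k] == p[k] for k in QUASI_IDS):
                matches.append((p["name"], a["item"]))
    return matches

print(link(activity, public_profiles))
# [('Well-Known Person', 'private_account_X')] -- identity recovered, no hacking needed
```

No credential was stolen and no system was breached; the join alone did the damage, which is exactly the risk Nishant is describing.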

Henry Suryawirawan: Right. I also wanna discuss this from the end user's point of view. I find we are sometimes in a disadvantaged position. For example, when you see a consent screen where an application wants to access your data from Google, there's often no real option to say no; if you say no, it basically means you can't use any of the app's features. Cookies are similar: now we have all these pop-ups, but most of the time the only real option is accept cookies, right?

So I think end users are sometimes at a disadvantage, without a good option to consciously decline sharing their data. What would be your message here for end users about thinking before we actually give consent to our data?

Nishant Bhajaria: So this question goes into legal turf a little bit, so I'll only be able to provide a limited answer. When consent is required, how it should be collected, the clarity of the copy: that is more of a legal question. And just as the attorneys don't teach me how to write code, build services, and create metrics, I probably shouldn't be moonlighting as an attorney.

What I will say is, from the engineering perspective, from a tool perspective, it is critical to ask yourself, are you giving the customer enough information? Are you giving the customer too much information? Are you giving the customer an informed choice? Because at the end of the day, this is a combination of the tools you build, the copy and the language, the clarity of the language itself, and the clarity and the integrity of the policy that’s behind it, right?

And honestly, people have to get through their lives on a daily basis. As I mentioned before, I don't remember ever reading all the details of the credit card statement that gets sent to me. I pay my balance in full every single month, and my assumption is everything will work out: if I'm paying my bill in full, there will be no interest charges, no late fees. But there are people who may not be able to pay the full balance, and for whom something in those policies might actually matter.

So this is not just about privacy or security; it's about the complexity of the law, about the details in there. Like when I became a naturalized US citizen, I was told multiple times that if there was ever a misunderstanding of anything, it was my responsibility, as if I'm supposed to single-handedly understand the complexities of immigration law passed in 1965. My parents were not even in double digits when that law was passed, and yet I'm supposed to understand every single detail.

So I think people are honing in on privacy and consent a little too much, because this is a larger challenge: the disconnect between the people building the tools and the people writing the laws, between the people who use the products and the people who push out the policy. There is a significant disconnect, and it did not begin with privacy.

The challenge with privacy is much bigger simply because of the volume of data. But I think, we have to, as a community, figure out a way that the people who build stuff and the people who write these policies are in the same sort of contextual framework as the people who say yes or no to these policies. I don’t think we’re there yet. I don’t have an easy answer right now, because as I mentioned before, this challenge predates the emergence of privacy and security as risk carriers.

[00:52:10] Data Classification

Henry Suryawirawan: Thanks for your valuable input. In terms of the data that we collect, you've mentioned a couple of times that you need to do risk analysis and classification. And in the later chapters of your book, you also talk about a privacy maturity model. Maybe you can give a glimpse: how should people start categorizing or classifying the data they collect within the company, and what kind of things could they aspire to build as a privacy maturity model?

Nishant Bhajaria: So let me give a very specific example, right? You wanna make sure that your categorization of data is as contextual as possible. As an example, to stick with the Netflix use case: when you collect customer data as a streaming platform, you could make the argument that somebody's IP address is very sensitive location data. From their IP address, you could pretty much identify where they live. Then you can infer their gender or their race from their streaming data, and you might be able to infer other details about them. If you use their IP address for personalization like that, it's a challenge.

In that case, if you think about it purely through the lens of risk, the IP address should be treated as very, very sensitive data, which means collect and delete quickly, minimize access, things like that. But if you only use the IP address for security purposes, to check from a DDoS perspective, maybe it's better to keep that data in a separate database, retain it for a long time to study trends and patterns, but minimize access.

If you collect an IP address from somebody who lives in New York City, which is very densely populated, it's very hard to hone in on their specific location, so maybe the IP address is not very sensitive, because it's hard to identify someone. But take my father-in-law: he lives in a small town of 600 people, and he genuinely believes the government is trying to keep an eye on him; he's one of those paranoid types. Maybe in that case it is very sensitive, because there is an individual who's concerned, and the identification risk is also very high. Another example: you could collect somebody's IP address, get consent for collection, but lump that IP in with a large group of people, so that the identification risk for individual users is very low; then the risk goes down.

So what I'm generally saying is: when you collect data, before you categorize it, inventory it, and tag it, there are decisions you can make about the data that might impact how seriously you treat its security or privacy. You can reduce the risk by doing things like aggregation, perturbation, or data obfuscation, in which case you can keep the data for a long time. In other use cases, you can collect the data and not change it at all; in other words, take on the identification risk, but keep the data for a very limited period and minimize access, in which case the risk also goes down.

So there is a constant tug of war between the precision of the data and the retention of the data, the longevity of the data and the precision of the data, right? You have to see what that balance looks like for you. And that balance may change on a day-to-day or week-by-week basis, depending on the volume of data you have, your risk appetite, the nature of the customer, the kind of data, the stage of growth you're going through, and the country you're doing business in. What is totally fine to do in Thailand may not be totally fine in Germany, for example: different histories, different risk tolerances, different privacy sensibilities. So privacy is very contextual. It is very visceral. You have to make sure that the tooling and the processes you build are responsive to that complex nature of privacy.
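
One way to picture that tug of war in code: generalizing an IP address (reducing precision) can justify keeping it longer, while the full-precision form is kept only briefly. The prefix lengths and retention windows below are illustrative assumptions, not recommendations:

```python
import ipaddress

# Toy generalization: truncate an IPv4 address to a coarser network prefix.
# Coarser precision lowers identification risk, which may justify longer retention.
def generalize(ip: str, prefix: int = 24) -> str:
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(net)

# Hypothetical tiers trading precision against retention:
TIERS = {
    "personalization": {"prefix": 32, "retention_days": 7},    # full IP, delete fast
    "security_trends": {"prefix": 16, "retention_days": 365},  # coarse IP, keep longer
}

ip = "203.0.113.77"
for use, t in TIERS.items():
    print(use, generalize(ip, t["prefix"]), f"keep {t['retention_days']}d")
# personalization 203.0.113.77/32 keep 7d
# security_trends 203.0.0.0/16 keep 365d
```

The same raw value ends up with two different risk profiles depending on purpose, which matches Nishant's point that classification is contextual rather than fixed per data type.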

Henry Suryawirawan: Thanks for such an elaborate answer. Just looking at the IP address example, there are so many contexts it can lead to; it can be very sensitive for some companies or some users. So these rules are definitely not absolute; they don't apply a hundred percent to every company and every user, right? Think within your context: how could the data be used or misused, and how do we protect it? Classify it, protect it, and maybe even store it differently so that people can't get access.

Nishant Bhajaria: By the way, I appreciate you saying my answers are elaborate. That's a polite way of saying I talk too much, so I appreciate you deploying a euphemism there.

[00:55:46] 3 Tech Lead Wisdom

Henry Suryawirawan: Right. So Nishant, as we get to the last part of the conversation, there's one question I ask all my guests. I call it the three technical leadership wisdom. Think of it as advice you wanna give to people here, so they can learn from your journey and your experience. What would be your three technical leadership wisdom to share, Nishant?

Nishant Bhajaria: So when it comes to technical wisdom, to the extent I have any, I would say that fixing for privacy and security is no different from any other innovation. Think of privacy as a product. Sometimes people who work in privacy and security make the mistake of thinking of them purely as a cause, as a moral issue. Now, it is those things. Your decisions when it comes to data could affect somebody else's life, somebody's preferences, somebody's physical security, right? So it is a moral cause.

But that is the beginning of the conversation. If you went to any corporate CEO, they would tell you: we care deeply about privacy and security, most important thing. They would also say: we care deeply about growing our business and keeping our employees well paid, most important thing. What happens when there is a conflict between those two? Life is about making choices, right? So recognize that. And recognize that whether it's privacy, security, misinformation, AI, fairness, or equity, whatever your cause is, it is looked at through the prism of the business.

So when you make the case for funding, for tooling, ask yourself, how do you make the case in a way that responds to the needs of the business? Now, there will be examples where it is critical to do the right thing from privacy and security perspective, no matter the business cost. Like you would not hire an engineer who says it is okay to say bad things about people based on their race. You would never hire somebody like that, right? Even if they happen to be a very good engineer from a coding perspective. But in some cases, there are choices that are very critical to make.

But that is not true in every case. You don't have to run privacy and security in a way that hurts the business. If you hire an engineer who doesn't speak very well, you can coach them on communication. So it is one thing to say, I'm not gonna hire an engineer who has bad morals, which is exactly the right thing to do; you shouldn't hire somebody like that. But you can't say no to everybody who's different from you.

So you have to have that level of judgment when it comes to privacy and security. You need to be very deliberate about telling the business: we shouldn't do this because of privacy or security issues, no matter the cost to the business. But there are 50 other cases where you can say: the business wants X, but if we just do X a bit differently, we can get the right privacy outcome, and in the long run, that's better for the business anyway. So recognize that sometimes there is a moral case to be made, but in a lot of other cases, there is a business-sensitive case you can make that is the right case for privacy and the right case for the business as well. That is my lesson.

A lot of engineers get extra careful and hurt the business with unnecessary process. And in some cases they become extra careless and hurt the business because they didn't do the right thing. Recognize when it's important from a moral perspective. When are you doing too little? When are you doing too much?

My lesson to engineers is ask questions. Seek the advice of the legal team, the comms team. Document things whenever possible. But if you have concerns, say something. The worst thing is maybe you will ask the wrong question at the wrong time. There is a lot of forgiveness in my experience from asking the wrong question or taking initiative. There will be a lot less forgiveness if you knew what the right thing was and still didn’t do it.

And I have run my career the same way. I ask questions. I do my research, I’m wrong as often as I’m right. And I’m still learning as well. This is a learning experience for me as well. So be humble, be creative, be ethical. That’s my advice to engineers as I would give the same advice no matter what question you ask me, privacy or otherwise.

The other advice I would give is don't wait for regulation. My big frustration in life, honestly, when it comes to engineers is that engineers have allowed themselves to be painted into a corner. Like if you watch movies, people who play attorneys, people who play every other profession get represented in a very, very glamorous way. I don't remember the last time an engineer was cast in a TV sitcom or in a movie where the engineer was the leading role. I don't know if you watched the US sitcom Friends from the 1990s. The only person there who was a borderline engineer was Ross Geller, played by David Schwimmer. And they made fun of dinosaurs, and they made fun of him for his profession as well.

So I think engineers often accept the idea that their job is to write code and do what somebody tells them to do. No, I think engineers should be willing to understand that what they're doing with data is extremely complex. It has implications for people's lives, but it also makes the company a lot of money. So don't wait for the regulations. You should wait for the requirements, but don't always wait for the regulations.

If you feel like you can find a more intelligent way, build a more intelligent tool, or come up with a more intelligent process to protect privacy, make the case for it. Tell people what will happen this way versus that way. Make the case based on data, make the case based on scenarios, make the case based on business impact. And recognize that engineering is business from a technical lens, and business is engineering from a non-technical lens. The two are connected.

So my advice to engineers would be, think about somebody else’s data as if it were your own. And ask yourself, how would you build the right tool for it? So don’t wait for regulation. Like if your house were on fire, you wouldn’t wait for the fire alarm to go off. If you can see the fire, if you can feel the heat, you’ll probably run for the door, right? Hopefully fixing privacy is not like running out of a burning building.

But ask yourself, why not do the right thing today rather than waiting for the regulation? Cause it is entirely possible that you have discovered something that the regulators have not. You can build the right tool and inform the next regulation that'll benefit a lot more people. So this is a chance to do the right thing for your business, for your customers, and also for your own career, cause you've done something that nobody else has done so far.

Henry Suryawirawan: Wow! I find it a very insightful and inspiring message for people to start thinking of data privacy as a product. That's the first thing, right? The second thing is don't wait for regulations. So whenever engineers deal with data, always think about privacy first, right? And think as if you were the user sharing the data with the company. So I think that's a real key message.

It’s been a very exciting conversation, Nishant. For people who would love to connect with you, ask you more about data privacy or learn from your courses and things like that. So is there a place where they can find you online?

Nishant Bhajaria: Yeah, they can go on LinkedIn, and I'm the only person in the universe that I know of whose first name is Nishant and last name is Bhajaria. So there's an irony that the privacy guy has a name that nobody else has. I have zero privacy online in that respect. But yeah, I'm on LinkedIn. I get a lot of messages there. My book is available on Amazon.

And I will say, all proceeds from my book and all proceeds from my LinkedIn courses, from a royalty perspective, go straight to animal welfare, which I care deeply about. So if people want to buy the book or take the courses, they get the benefit of building their own skill sets and protecting their business and their customers, while also donating money to charity indirectly. So any help there would be much encouraged, and much appreciated.

Henry Suryawirawan: Wow, that’s another great cause that you’re doing with the animal welfare. So for people who want to check out Nishant’s resources, please do so.

Nishant Bhajaria: If I can just make one more point. I care deeply about animal welfare. I care about helping dogs get out of high-kill shelters. A cause very close to my heart is elephant conservation. So if you travel around the world, don't ride elephants, and don't go to circuses that use elephants. They get beaten up horribly. It's a cause very close to my heart. I know this doesn't have much to do with privacy, but if you think about the world we live in right now, whether it's addressing the next pandemic, water shortages, air pollution, ecological conservation, elephant welfare; these are all connected to each other.

And if we have learned something from COVID in the last two or three years, it's that the problems we will face in the future are not gonna be problems we can easily fix in one fell swoop. It's a very connected, intermingled, complex ecosystem. So I care deeply about elephants, animal rescue, and the environment in general. But it's a larger issue, and it's gonna be something that's very important in the years to come. Just like privacy and security are from an engineering perspective.

Henry Suryawirawan: Thanks for the important plug and important message for people. I didn't know about elephants being beaten up and things like that. So I think that's new information, maybe, for some of us. So thanks for sharing that.

Nishant Bhajaria: Thank you.

Henry Suryawirawan: Yeah, it’s been a pleasant conversation. Thank you so much for this talk. I learned a lot about data privacy. So thank you again, Nishant.

Nishant Bhajaria: Thank you.

– End –