In a typical manufacturing company, a supply chain is the chain of companies that you rely on to make your product. For example, a mobile phone manufacturer buys processor chips from a supplier. That supplier needs to buy a part from another manufacturer. And that manufacturer relies on yet another company for the raw metal.
But what is the software supply chain? And how do you keep it secure? We spoke with Kim Lewandowski, co-founder and head of product at Chainguard, to explain the details.
Your software supply chain is more complex than you think
The software supply chain can be complicated, mainly because it’s difficult to know how far it reaches. Take a simple example: if you use Salesforce to keep track of your customers, you store your customers’ data on Salesforce’s servers. Not a problem, surely? But Salesforce could have a breach. And what about the servers themselves? They might run on Windows, and if Windows has a security bug, hackers have another way in. How about the software that Salesforce uses to host its website? If that is hacked, you have yet another breach.
“When I think of the software supply chain, it’s all the code and all the mechanics and the processes that went into delivering that core piece of software at the end,” Kim explained. “It’s all the bits and pieces that go into making these things.”
-On the Dev Interrupted Podcast at 11:28
Keeping the software supply chain secure involves checking who has keys
The important part of keeping your supply chain secure is tracking down everything you use and checking that each component is secure and reliable. Every new third party is a potential problem. If you don’t do your due diligence, you won’t know what risks you’re taking.
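That due diligence starts with knowing exactly what you depend on. As a minimal sketch (my own illustration, not something from the podcast), the snippet below lists every Python package installed in a service’s environment, transitive dependencies included, so the inventory can be fed into whatever audit or vulnerability-scanning process you use:

```python
# A minimal sketch: inventory the Python packages a service actually ships
# with, so you know which third parties you are relying on.
# Assumes Python 3.8+; importlib.metadata is part of the standard library.
from importlib import metadata

def list_dependencies():
    """Return (name, version) for every installed distribution, transitive ones included."""
    deps = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            deps.append((name, dist.version))
    return sorted(deps)

if __name__ == "__main__":
    for name, version in list_dependencies():
        print(f"{name}=={version}")
```

Other ecosystems have equivalent listings from their package managers; the point is simply knowing who holds a set of your keys.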
As Kim explained, a favorite analogy of hers is thinking about doing construction work on your own home.
“You have a contractor. Well, they need keys. They have subcontractors. You give the keys out to all their subcontractors. Who are they? Where are they from? What materials are they bringing into your house?”
-On the Dev Interrupted Podcast at 12:09
The more third-party tools you use, the more out of control it can become
It all comes down to accountability, and the sprawl can spread quickly. One third-party tool that you use to create your software might rely on five separate third parties, and you don’t know what code they’ve got hidden under the hood. Your keys are suddenly all over the place.
The only way to keep it under control is to check regularly and audit the services you use. Kim believes it’s helpful to think of every new tool as a package coming to your home.
“How is your package getting to your house?” Kim said. “What truck is it riding on and who is driving those trucks?”
-On the Dev Interrupted Podcast at 12:44
Get the full conversation
If you’d like to learn more about the software supply chain, and how to make sure that yours is secure, you can listen to the full conversation with Kim over on our podcast.
Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is inspired by Dev Interrupted - the go-to podcast for engineering leaders.
Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.
Listen and subscribe on your streaming service of choice today.
Chaos Engineering might sound like a buzzword - but take it from someone who used to joke that his job title was Chief Chaos Engineer (more on that later): it is much more than buzz or a passing fad. It’s a practice.
The world can be a scary place, and more and more companies are turning to Chaos Engineering to proactively poke and prod their systems. In doing so, they improve their reliability and guard against unexpected failures in production and unplanned downtime.
During my career I dealt with my fair share of outages, including one that caught me mid-song during a bout of karaoke and far too many that woke me up at 02:00. As the co-founder and CTO at Gremlin, I do my best to make sure no other engineers have to suffer sleepless nights worrying about their product.
But the question remains, what is Chaos Engineering and where did it come from?
A Short History
The spiritual predecessor to Chaos Engineering is often called by a much more widely recognized name - disaster recovery. The focus when the practice was introduced was much the same as it is today: proactively suss out production problems by injecting failure.
Netflix’s Chaos Monkey is probably the most widely publicized Chaos Engineering tool, as it arguably kickstarted the adoption of Chaos Engineering outside of large companies - but this has led to the erroneous belief that Netflix invented the practice. In fact, the practice was already widely in use amongst the titans of technology.
Over a decade ago, during my time as a Lead Software Engineer at Amazon, we implemented several crude practices designed to inject failure into our systems. The most rudimentary of these was employed by a man named Jesse Robbins, who earned the nickname “Master of Disaster” by running through data centers pulling out cables.
Let’s just say the practice has evolved a lot since then, and your data center cables are much safer these days.
What is Chaos Engineering?
“What Chaos Engineering really is, is the art, if you want to call it that, of introducing controlled chaos.”
- 2:16 on the Dev Interrupted podcast
At its core, Chaos Engineering is a disciplined approach to identifying potential failures before they have an opportunity to become customer-facing outages.
It is a practice that lets you safely test your assumptions about how your systems will behave under duress by actually exercising resilient mechanisms in a controlled fashion. You literally "break things on purpose" to validate and build resiliency. The end goal of Chaos Engineering is not to inject arbitrary failure into a system, but rather to strategically inject turbulence to enhance the stability and resiliency of your systems.
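As a rough illustration of what "breaking things on purpose" can look like in code - a sketch of the general idea, not Gremlin’s product or any real chaos tool, and with made-up function names - the snippet below injects latency and errors into a stand-in dependency and checks whether a simple retry mechanism absorbs them:

```python
# Sketch of a controlled failure-injection experiment (illustrative only).
import random
import time

FAILURE_RATE = 0.3    # fraction of calls that fail during the experiment
EXTRA_LATENCY = 0.05  # seconds of injected delay per call

def flaky_dependency():
    """Stand-in for a downstream service with injected turbulence."""
    time.sleep(EXTRA_LATENCY)
    if random.random() < FAILURE_RATE:
        raise ConnectionError("injected failure")
    return "ok"

def call_with_retry(func, attempts=3):
    """The resilience mechanism under test: bounded retries with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(0.1 * attempt)

if __name__ == "__main__":
    recovered, exhausted = 0, 0
    for _ in range(100):
        try:
            call_with_retry(flaky_dependency)
            recovered += 1
        except ConnectionError:
            exhausted += 1
    print(f"{recovered} calls recovered from injected failures, {exhausted} exhausted their retries")
```

The experiment either confirms the retry logic holds up under the injected failure rate or shows exactly where it breaks - before a real outage does.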
How Chaotic is Chaos Engineering?
I always tell people that Chaos Engineering is a bit of a misnomer, because it’s actually as far from chaotic as you can get. When performed correctly, everything is in the control of the operator. That mentality is the reason our core product principles at Gremlin are safety, simplicity, and security. True chaos can be daunting and can cause harm. But controlled chaos fosters confidence in the resilience of systems and allows operators to sleep a little easier knowing they’ve tested their assumptions. After all, the laws of entropy guarantee the world will keep throwing randomness at you and your systems. You shouldn’t have to add to it yourself.
How do I Start?
One of the most common questions I receive is: “I want to get started with Chaos Engineering, where do I begin?” Unfortunately, there is no one-size-fits-all answer. You could start by validating your observability tooling, ensuring auto-scaling works, testing failover conditions, or any of a myriad of other use cases. The one thing that does apply across all of them: start slow, but do not be slow to start.
What I mean by this is to start by testing across just a few nodes rather than impacting your entire fleet. We refer to the impacted area (the number of systems affected) as the “blast radius,” and we highly recommend starting with a small blast radius and increasing it over time.
By starting small you allow yourself to gain confidence in both the experiments you are running and your systems. Of course, your organization’s risk tolerance also factors into how large a blast radius you will use.
For instance, a large banking institution with millions of customers has a much lower risk tolerance than a tech startup with a couple hundred customers. An institution like that would want to run experiments in a programmatic way and would need to be very explicit about communicating to the rest of the organization what tests are going to be run and when, to avoid any unplanned 2am or 3am disasters.
Eventually you want to get to the point where all of this is automated, a process we refer to as “continuous chaos.” Starting small with automation could be something as simple as taking out a single node; then taking out five nodes; then ten; and so on. Eventually you automate the process at a level you are comfortable with.
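A hedged sketch of that ramp-up, assuming a hypothetical run_experiment() hook into your chaos tooling and a health-check gate between stages, might look like this:

```python
# Illustrative "continuous chaos" ramp: widen the blast radius only after
# the previous, smaller stage passes. All functions here are placeholders.
import random

FLEET = [f"node-{i}" for i in range(100)]  # stand-in for your real inventory

def run_experiment(hosts):
    """Placeholder: apply a fault (latency, CPU pressure, shutdown) via your chaos tooling."""
    print(f"injecting fault into {len(hosts)} host(s): {hosts}")

def health_checks_pass():
    """Placeholder: query dashboards and alerts to confirm customers saw no impact."""
    return True

def continuous_chaos(ramp=(1, 5, 10)):
    for radius in ramp:
        targets = random.sample(FLEET, radius)
        run_experiment(targets)
        if not health_checks_pass():
            print("aborting: investigate before widening the blast radius")
            return

if __name__ == "__main__":
    continuous_chaos()
```

Put a schedule around something like this and you have the beginnings of continuous chaos, at whatever level of automation you are comfortable with.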
“Ultimately you want to be able to handle any of this random chaos being thrown at you, because that's what the world is, it's entropy, it's degradation”
- 7:35 on the Dev Interrupted podcast
No Tolerance for Downtime
When I founded Gremlin, it was just myself and my co-founder developing the first iteration of the product. The business looked very different then, and I jokingly referred to myself as the “Chief Chaos Engineer,” responsible for implementing code that was mostly used by enterprise companies. Many of these companies came to us because they had reliability requirements thrust upon them by the US government, or had top-down reliability standards of their own, and they wanted a tool to help them shore up their systems.
As the company began to evolve, so did the customer base. These days it’s not just Fortune 500 companies that care about reliability, it’s everybody. Planned downtime is a relic of days gone by. It is no longer acceptable to espouse planned maintenance windows as part of development lifecycles, and customers don’t have the patience for products they rely upon to spend any time unavailable. Companies recognize this dynamic - and it’s not a hard one to spot.
Our appetite for technology has seemingly grown exponentially, while our ability to stomach downtime has drastically decreased. Customers expect your product to be always working, always running. If your product is down because of an outage, there are ten other similar products waiting in the wings to take your customers’ money.
Making Lives Better
Visibility is high these days, and companies don’t need the publicity that comes with unforced errors, let alone errors not of their own making. No one wants to be blown up on Twitter because their product isn’t working or because one of their downstream dependencies or their cloud provider had an unexpected outage.
By preparing for the worst, we can be at our best as an industry when disaster eventually comes knocking. That’s the goal: when an unexpected outage or production failure occurs, customers never even know it happened.
I often joke that we are the engineers’ engineers, because many of us know that feeling of being jolted from a dream at 03:00 by our pagers, groggily wiping our eyes and whipping out the laptop to go dig through a sea of monitoring dashboards and logs. It’s not fun, and it’s exactly why I founded Gremlin: there is a better way to approach operations than merely sitting back on our haunches and waiting for the next outage. Chaos Engineering not only helps to protect against the randomness of the world, but also teaches people how to build more reliable software. And if enough people build more reliable software, we build a more reliable internet.
Three years into my software engineering career I was loving life. I could fix anything in the codebase with no doubts in my ability. I was confident, too. Most 24-year-olds are. When I was offered the opportunity to become a dev team lead, I jumped at the chance. With so much confidence, what could go wrong?
The first few months hit me like a freight train. I might have been a good developer, but I wasn’t a good leader - not yet. It was a humbling experience that I continue to grow from to this day. Great leaders understand that learning is a process that evolves over time, but only if you open yourself up.
In the past year as the host of the Dev Interrupted podcast, I have had the pleasure of interviewing and learning from some of the best engineering leaders in the business.
Here are 5 of their most inspiring lessons:
Always be delegating
Brendan Burns, Corporate Vice President at Microsoft
Brendan is widely known as one of the co-founders of Kubernetes. But he is also responsible for managing over 650 engineers at Microsoft. Even though Brendan takes time to schedule as many one-on-ones as possible (sometimes as many as 14 in one day, something he views as a priority as more teams go remote), he knows such large teams can only be managed successfully through delegation.
Let go of the instinct to jump into every project. It’s ok if your teams make mistakes. They’re going to learn, but only if you give them the space and agency to grow. Stepping away from micromanaging can feel scary, but it will set your organization up for long-term success, and your employees will thank you for it.
Remote first, not remote friendly
Shweta Saraf, Senior Director of Engineering at Equinix
Shweta had the unique experience of undergoing a fully remote acquisition during the pandemic. Her small team was acquired by Equinix, the largest data center company in the world. As if this adjustment weren’t difficult enough on its own, Equinix wanted Shweta and her team to teach them - an organization with over 30,000 employees worldwide - how to implement remote work best practices.
To make the transition as successful as possible, they chose to embrace remote work completely. There would be no half measures. If they were going to become a remote work company, they would be remote first - not remote friendly.
Leadership with empathy
Ben Matthews, Director of Engineering at Stack Overflow
Ben wants leaders everywhere to know that no one has ever done a better job because they were scared, stressed, or worried about their future - especially not in jobs centered on creativity and problem solving, like software development. Providing people with benefits such as mental health days does more for an organization’s productivity than measuring hours worked ever could.
When you take care of people, they will work better and faster - which is what they want to do anyway. Everyone wants to be successful. Value creation happens when people are provided for, not when they are treated like widgets.
Comparison leads to unhappiness
Kathryn Koehler, Director of Productivity Engineering at Netflix
Kathryn believes that what’s being delivered is ultimately of greater importance than how something is being delivered. Though she is in charge of making sure engineering teams at Netflix run smoothly and efficiently, she takes great care when evaluating a team’s performance. She understands that productivity isn’t simple math.
That’s because every project is different. The customer base is different, the use case is different, personas are different, and where a team is within the software development life cycle is different. Ranking teams against each other shouldn’t be the goal. Success is best measured in context, not in competition.
Avoid meetings
Darren Murph, Global Head of Remote at GitLab
Darren tells anyone who will listen that there is a quick way to improve your meetings: make them harder to have. He believes people deserve to be able to focus on their work. No one wants to sit on video calls all day. Zoom fatigue is real. Focus should remain on critical day-to-day functions, not on hopping in and out of meetings that leave you feeling exhausted and unproductive.
Leaders should embrace tools like Slack that allow teams to gather consensus asynchronously, reserving synchronous time for purposeful meetings like making decisions or sharing important status updates.
Managing the software development process has been likened to herding cats. In other words, you can’t really do it, but you can sure give it the old college try.
It’s no secret that managing the development of a software project is an imprecise science. Here are nine truisms that I’ve learned over the years that have helped me to understand the limitations of our ability to manage the strange world of software development projects.
1. Estimates Are Always Wrong
Whether you estimate something at one hour or one year, your estimate is wrong. That’s just the way it is. It won’t necessarily be extremely wrong — it might only be a little bit off — but it will be wrong.
If you look at a bug report and think, “That will take an hour to fix,” it almost certainly won’t take an hour. It might take 45 minutes, it might take three hours, but the chances of it taking exactly an hour — even give or take a minute or two — are slim. Now, you might say, “about an hour” instead. That’s a better estimate because actual, precise estimates are wrong.
Now for short projects that might take an hour, this isn’t a big deal. But…
2. The Bigger the Project, the Less Accurate Your Estimate Will Be
The bigger the project, the less precise the estimate will be — especially if estimation takes place at the very beginning of the project. As with the hour estimate above, if you estimate a project at a year, it might take nine months or 36 months. In some cases, it might take five years. There is no way to know when the project is starting out.
The bigger the project, the more “unknown unknowns” there are. There are usually more people involved. That is, as a project’s size increases, there are more variables and more things that will happen that you cannot anticipate. All of these things will add time to the project that you can’t plan for at the beginning because by definition you don’t know that they are going to happen.
3. Focus and Concentration Are Our Most Valuable — and Scarcest — Commodities
When building software, the single most valuable thing required to complete a project is the ability of the developers on the team to concentrate in an undistracted manner.
The fewer distractions, the more productive the team will be. It’s really that simple. One of the main responsibilities of a software development manager is to reduce the number and duration of distractions to the team.
Software developers, when left alone, can be quite productive. When they are interrupted — whether for meetings or by people asking questions or anything else — they can lose that productivity very quickly. We all know about “flow” and how hard it is to get into the flow and stay there. That flow time should be valued like bitcoin and protected as such.
4. Hofstadter’s Law Is the Truth
Hofstadter’s Law is stated as follows:
“It always takes longer than you expect, even when you take into account Hofstadter’s Law.” — Wikipedia
This is related to estimates, but it’s important to note the beauty of this aphorism. You can pad your estimates because you think it will help buy you time to get things done. You can add in extra factors, plan for “unknown unknowns,” and increase your estimates to take into account the belief that it will take longer than you think, but in the end, it will still almost always take you longer than you think to get a project done.
5. You Can Only Run in the Red for Very, Very Brief Periods
You can demand the team put in more hours, come in on weekends -- all those “crack the whip” kinds of things -- and you might get some (very) short-term gains out of that.
But if you try to make it the norm — if you try to run your team's engine at the red line of RPMs on a consistent basis — you will burn out the engine. You will see diminishing returns pretty quickly. Employees will leave. People, like race car engines, cannot be overstressed for extended periods of time without breaking down.
6. Brain Time Is More Important Than Butt Time
This one is so important, I wrote a whole blog post about it.
Nothing will decrease productivity more than demanding Butt Time (i.e. that your developers be seen sitting in their chairs for hours on end). You can measure Butt Time and feel like you’ve got a metric that will really show how productive people are being. But you’d be wrong. Demanding Butt Time will demoralize a team that really wants to spend Brain Time.
Brain Time is what really matters. Think about it this way: Let’s say you are a manager and it is most important for you to see your team sitting at their desks “working.” You wander around the office seeing those developers sitting in their chairs, pounding away at their keyboards. All is well with the world.
But then you run across one developer, and they’re just sitting there staring at their screen. That’s it. They’re sitting and staring. For like half an hour. What the heck! They’re not doing a thing!
But of course, they are. They’re thinking. They’re spending Brain Time solving a very difficult problem. Maybe they even get up and wander around the building for a while. In the end, they sit down, type 11 lines of code, and mark a user story complete.
Did they meet your “Butt Time” criteria? No. Did they produce an elegant solution to a very difficult problem? Yes.
Butt Time proves nothing. Brain Time means everything.
7. Hardware Is Cheaper Than Developer Time — Way Cheaper
Developers are expensive. You pay competitive salaries to attract top talent. An hour of their time is not cheap. Despite this, many companies don’t realize the incredible value of an hour of a developer’s time and skimp on hardware for the team.
But come on, computers are expensive! That extra RAM will bust the budget for hardware!
Well, it might bust the budget, but that’s because you’ve got a budget problem.
Look at it this way: Let’s say that you pay a developer $100,000 a year — or around $50 an hour. Let’s say they spend an hour a day waiting for the compiler to do its work. Now suppose you could add some RAM and a faster processor to that developer's machine and cut that time down to 45 minutes a day. You save 15 minutes a day. At 200 days a year, that is 50 hours. At $50 an hour, that is $2,500 saved per developer per year. And what if the incremental cost of the faster machine is only $500?
You get the point. If you have 20 developers, the faster machines cost an extra $10,000 and save you $50,000 a year in developer time, for a net gain of $40,000. That ought to be a no-brainer.
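For anyone who wants to plug in their own figures, here is the same arithmetic as a few lines of Python (the numbers are the assumed ones from the example above):

```python
# Back-of-the-envelope: developer time saved by faster hardware.
HOURLY_RATE = 50            # dollars per developer hour (~$100k/year)
MINUTES_SAVED_PER_DAY = 15  # compile wait cut from 60 to 45 minutes
WORK_DAYS_PER_YEAR = 200
UPGRADE_COST = 500          # incremental cost of the faster machine
TEAM_SIZE = 20

hours_saved = MINUTES_SAVED_PER_DAY / 60 * WORK_DAYS_PER_YEAR  # 50 hours/dev
savings_per_dev = hours_saved * HOURLY_RATE                    # $2,500/dev/year
team_savings = savings_per_dev * TEAM_SIZE                     # $50,000/year
team_cost = UPGRADE_COST * TEAM_SIZE                           # $10,000 one-off
print(f"Net gain in year one: ${team_savings - team_cost:,.0f}")
```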
And that is only for the faster compile times. Everything else they do will be faster as well.
If your budget doesn’t allow for faster machines, then you need to adjust your budget.
8. If You Haven’t Read “Peopleware”, Then You Aren’t Really a Software Development Manager
As far as I’m concerned, there is but one book that will teach you how to manage software developers: Peopleware by Tom DeMarco and Timothy Lister (be sure to get the third edition…).
This book is excellent, insightful, to the point, clear, and pulls no punches. It is full of wisdom about managing software projects and software developers. It is timeless.
9. Quality Is a Perception — Not a Bug Count
This one is really hard to accept.
Here’s the basic premise: You can have close to zero bugs in your bug tracker and people can still think your software is buggy. You can have a large number of bugs in your bug tracker and people can think your software is as solid as a rock. There’s no correlation between the number of bugs in your tracking system and the perception of the quality of your software.
Now I’m not arguing that you shouldn’t try to reduce your bug count — quite the contrary. But in the end, your software can only be said to be of high quality if your customers perceive it that way — and your bug count won’t necessarily dictate that. Weird, huh?
And while we are on the subject, what does it mean to have a “high” bug count? What is the definition of “high” when your codebase has 100,000 lines of code? 5 million lines of code? Who’s to say?
Embrace Flexibility
Bringing a software project in for a safe landing on a short runway is a challenging and difficult proposition under the best of circumstances. Add in the ambiguities and all the things that can go wrong along the way, and it’s a miracle anything gets done. Development managers need to be flexible and take things as they come.
The trick is to accept and understand those ambiguities and to work with them — not against them. Accepting these nine truisms will help with that.
Sponsored by LinearB
Want to reduce a lot of that ambiguity? LinearB can closely track what is happening in your software pipeline, enabling more brain time and automating away the things that require butt time.
Book a demo today and find out how you can drastically reduce your code delivery times and continuously improve your development process.
Join the Dev Interrupted Community
With over 2500 members, the Dev Interrupted Discord Community is the best place for Engineering Leaders to engage in daily conversation. No sales people allowed. Join the community >>