It’s important to remember that investment isn’t a completely altruistic act. While investors clearly want to encourage innovation, a primary motivation is to see a return on that investment. At the end of the day, they’re gambling that your idea will make them money.

This can make investing in true innovation tricky. True innovations are those rare game-changing technologies that revolutionize an industry. They’re notoriously difficult to spot. How often have you heard that people thought Apple would fail when they released the first iPhone or didn’t believe in Facebook when it first went public? True innovation rarely looks revolutionary to begin with. So how do investors spot which ideas are worth the effort?

We spoke with Jason Warner, managing director at Redpoint Ventures, to understand the reasoning behind investments and why investors are so picky.

1. Typical SaaS companies are easy to invest in, but true innovation doesn’t follow the same model

When developers start searching for investment, it can often be discouraging. While investors might not understand the intricacies of every technology company they invest in, they can at least spot the trends. They know and understand how a Software-as-a-service (SaaS) company grows.

If a company is growing, it has a very familiar pattern. And so investors can be quite confident that they’ll see a return. They’re much more willing to take a risk and ‘YOLO’ an investment.

“SaaS companies are really well understood in terms of how they grow,” explained Jason. “There is no real investor challenge to understand that if a company is growing 2x and its enterprise sales look good then … [investors] can just “yolo” invest into them. Because they understand what these companies look like … It’s all just Excel spreadsheets.” -On the Dev Interrupted Podcast at 40:29

2. Investors often wait until the first round of funding, but developers need seed funding

If you’re developing a revolutionary piece of technology, then it’s likely that you need investment to get you off the ground. However, it’s difficult for investors to sort the good from the bad. How do they know you’ll be successful, without a few years of revenue behind you? It’s a catch 22 situation. You need the investment to get those first few years, but the investors need to see a few years before they’re willing to invest.

Look at how Netflix completely surprised the world. Nobody predicted that it would change how we watch video (most of all Blockbuster, who fatefully ignored the potential). This is a trend that harks back decades. Online shopping, personal computers, the television, even electric light bulbs were all disregarded when they were first conceived.

These industry-changing innovations need investment much earlier than typical SaaS companies. And spotting what works is more of an art than a science.

“[Investors] miss the fundamentals. They can see the ones that are the trends,” Jason said. “It should [then] become obvious in the next round or the round after that from other investors … oh yeah, that is a great company.” -On the Dev Interrupted Podcast at 41:18

3. Developers need to seek out companies like Redpoint for seed investment

If you have a truly new idea, you’ll need to find an alternative to the usual investors. A company like Redpoint, which focuses on giving seed funding, is much more likely to take the time and actually investigate whether your technology will be a success.

It will take longer, of course. And it might not be the full amount you need to get your business started. But it’ll be what you need to begin building a proof of concept, get those first few years under your belt and start pitching to other investors.

“[If you’re] talking to a Redpoint investor, you should be flattered,” Jason explained. “What we’re thinking is that you are a majorly important company in the future. You have the potential to land … If Redpoint invests in you, we want it to basically mean that we think of you as a new primitive on the Internet or in whatever sector that you are in. And other people are going to build upon you.” -On the Dev Interrupted Podcast at 41:35

Listen to the full conversation

If you’d like to learn more about what Jason thinks and how to secure yourself an investment, catch the podcast on our website.

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is inspired by Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

Discover Our Most Popular Podcasts
Join the Dev Interrupted discord

In a typical manufacturing company, a supply chain is the chain of companies that you rely on to make your product. For example, a mobile phone manufacturer buys processor chips from a supplier. That supplier needs to buy a part from another manufacturer. And that manufacturer relies on yet another company for the raw metal.

But what is the software supply chain? And how do you keep it secure? We spoke with Kim Lewandowski, co-founder and head of product at Chainguard, to explain the details.

Your software supply chain is more complex than you think

The software supply chain can be complicated. Mainly because it’s difficult to know how far it reaches. Take a simple example: If you use Salesforce to keep track of your customers, you store your customers’ data on Salesforce’s servers. Not a problem, surely? But Salesforce could have a breach. And what about the servers themselves? Those servers might run on Windows. If that has a security bug, hackers have another way in. How about the software that Salesforce uses to host its website? If that is hacked, you have yet another breach.

 

“When I think of the software supply chain, it’s all the code and all the mechanics and the processes that went into delivering that core piece of software at the end,” Kim explained. “It’s all the bits and pieces that go into making these things.” -On the Dev Interrupted Podcast at 11:28

Keeping the software supply chain secure involves checking who has keys

The important part of keeping your supply chain secure is making sure that you track down what you’re using. And checking that they’re secure and reliable. Every new third party can be a potential problem. If you don’t do your due diligence, you won’t know what risks you’re taking.

As Kim explained, a favorite analogy of hers is thinking about doing construction work on your own home.

“You have a contractor. Well, they need keys. They have subcontractors. You give the keys out to all their subcontractors. Who are they? Where are they from? What materials are they bringing into your house?” -On the Dev Interrupted Podcast at 12:09

The more third party tools you use, the more out of control it can become

It all comes down to accountability. It can easily start spreading rapidly. One third-party tool that you use to create your software might rely on five separate third parties. And you don’t know what code they’ve got hidden under the hood. Your keys are suddenly all over the place.

The only way to keep it under control is to remind yourself to check and to do regular audits of the services you use. Kim believes it’s helpful to think of every new tool as a package coming to your home.

“How is your package getting to your house?” Kim said. “What truck is it riding on and who is driving those trucks?” -On the Dev Interrupted Podcast at 12:44

Get the full conversation

If you’d like to learn more about the software supply chain, and how to make sure that yours is secure, you can listen to the full conversation with Kim over on our podcast.

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is inspired by Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

Discover Our Most Popular Podcasts
Join the Dev Interrupted discord

Hiring neurodiverse developers can be challenging, particularly for smaller companies that are less experienced at hiring. This isn’t because you need an entirely new process or that neurodiverse people are inherently trickier to interview. It’s that small flaws in your hiring process get exacerbated. Obstacles that cause neurotypical people to stumble, become outright blockers to a neurodiverse person.

So we asked Matt Nigh, data engineering manager at UW Medicine, to give his tips on how to make sure your hiring process suits everybody.

“I think there are companies that other organizations could mimic,” Matt explained. “I would look at Google as one of probably the best that I’ve experienced.”-On the Dev Interrupted Podcast at 25:50

1. Interview processes should be conversational

If you use a lot of formal language, jargon and needlessly complicated words, you’ll make it much harder for your interviewee to understand what you want them to do. It also makes the interview artificial and cold, which can lead to unnecessary stress and anxiety in your interviewee. This is true for everybody, but for a neurodiverse developer, it can be much more potent.

 

“The most inclusive interview process I ever experienced was at Google,” Matt said. “And the reason I felt they had such an inclusive process is that it was wildly conversational. They were incredibly good at explaining what they were asking and what they were looking for. And to me, it was an incredibly friendly process.” -On the Dev Interrupted Podcast at 24:10

2. Neurodiverse developers prefer straightforward and clear instructions 

When giving instructions, particularly in practical tests, it’s important to make sure that you’re being clear and straightforward. Leaving ambiguity can cause problems, especially for neurodiverse developers. That ambiguity can distract away from the actual task at hand. The clearer your instructions, the better you’ll test a developer’s actual skills.

 

“I would say the reason I failed the system design interview was (and this is an example of what autism will do during an interview) it was the first system design interview I ever had. And I spent half the time trying to understand the language that the individual was using, rather than solving the problem, trying to make sure we’re just on the same page with what we were saying,” Matt said. -On the Dev Interrupted Podcast at 24:40

3. Neurodiverse developers need diverse recruiters, and stick around for longer once hired

Everyone has their own biases. While we should all strive to overcome those, it’s not always possible. The best way to avoid those problems is to make sure your interview team is diverse. Some coping mechanisms and strategies can seem strange to a neurotypical recruiter at first.

For example, someone with ADHD might ask you to repeat points or be typing as you speak. While it could initially look like they’re answering emails or not paying attention to you, it’s more likely that they’re taking notes to make sure they follow your instructions properly. The more diverse your recruiters, the fewer false assumptions you’ll make.

“Most recruiters are used to looking at neurotypical applicants, and they essentially have mental flags that come up with certain things, certain questions or anything like that,” Matt said. “Companies should ask: Do I have inclusive recruiters? So say, for example, at Google, they had incredibly inclusive recruiters. I was recruited by a deaf individual, for example. So this person very clearly understands me and anything that was going on.”-On the Dev Interrupted Podcast at 25:13

4. Neurodiverse developers could be more productive, and worth changing your processes

A program at Hewlett Packard Enterprise hired over 30 neurodiverse people in software testing roles at Australia’s Department of Human Services. The initial results from the program seem to suggest that those testing teams are 30% more productive than others, according to an article in the Harvard Business Review, called neurodiversity as a competitive advantage.

 

It would seem that, while a neurodiverse person might struggle in some areas—like the social anxiety brought on by an interview—they could exceed in others, such as pattern recognition.

Watch the full interview

If you’d like to hear more from Matt on neurodiversity in software development, you can watch the full podcast on our channel.

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is inspired by Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

Discover Our Most Popular Podcasts
Join the Dev Interrupted discord

At Netflix, we don’t just think about productivity - we engineer it. There’s an entire team within Netflix dedicated to productivity. I lead the Develop Domain along with my Delivery and Observability Domain peers, and together, we make up Productivity Engineering.

I recently sat down with the Dev Interrupted podcast to discuss all things productivity, how I run my team, and how other managers should view employee success. Here’s how we think about it at Netflix:

Can productivity be engineered?

In short, yes! Productivity is not a generic term for team performance or a perfunctory buzzword used during team meetings. The productivity team is an actual organization. The work we do is foundational to Netflix’s development teams. Productivity Engineering lives within the broader, central Platform organization.

The role of the Productivity Engineering team is simple: we exist to make the lives of Netflix developers easier. Abstracting away the various “Netflix-isms” around development, delivery, and observability, productivity allows devs more time to focus on their domain of expertise. 

“We are sort of like the nerds’ nerds, if you will, enabling them to use our platforms and tools so that the work that they're doing is focused on studio and streaming, without thinking about everything that's under the hood.” - On the Dev Interrupted Podcast at 2:31

With the recent addition of Gaming to the list of Netflix’s pursuits, the resulting focus becomes even more important.

Practically speaking, it’s the role of Productivity Engineering to help with things like coding, testing, debugging, dependency management, deployment, alerting, monitoring, performance, incident response, to name a bunch. Netflix utilizes the concept of a “paved road,” the frameworks, platforms, apps, and tools we build and support to keep our devs rolling. The idea is to keep workflows streamlined and enable developers to operate as efficiently and effectively as possible. If the road ahead is cleared of obstacles, you’re going to get to where you need to go faster and with support along the way. 

It’s also about helping developers enjoy the ride. To abuse another metaphor, a sound engineering experience should be like dining at a fine restaurant. If done right, you rarely remember the waitstaff, have a hard time finding something you like, or worry about how they prepared the food; you simply enjoy the experience. If Productivity Engineering is doing their job, they act as the restaurant and waitstaff with developers as the customer, providing nothing short of a beautiful end-to-end experience. 

Measuring Outcomes vs. Output

Measuring all of that productivity can be hard, and there’s no one unicorn measurement to rule them all. Hence, developer productivity teams should focus on impact and outcomes. Above all, Netflix focuses on customer satisfaction. Our philosophy is that while how something is delivered is important, the impact of what’s delivered is ultimately of greater importance. 

"If you're running around a track super-fast, but you're on the wrong track, does it matter? So really, what are you delivering? How you're delivering is important. But if that thing that you're delivering is ultimately doing what you want it to do, that's the most important thing." - On the Dev Interrupted Podcast at 5:05

In this model, the outcome always wins over output or activity. For instance, standard productivity deployment metrics (DORA) as applied to our customers become an important proxy for measuring our success. Key Performance Indicators (KPIs) for productivity are viewed as a reflection of a team’s performance as it relates to customer satisfaction.

I’m a big fan of the SPACE framework, developed by Nicole Forsgren, for precisely this reason. How are our customers doing in terms of Satisfaction, Performance, Activity, Communication, and Efficiency? The answer to those questions reflects how we’re doing as a Productivity organization.

"This is our strategy, these are our hypotheses around, how we're going to improve our customers' productivity. Are those things paying off? And if you can't measure them in some way, who knows? Right? So yeah, we're getting a little more hardcore about this." - On the Dev Interrupted Podcast at 24:17

Key metrics provide productivity teams with a holistic view of performance by establishing benchmarks. Understanding that everything needs to be viewed within the proper context, it’s difficult to improve as an organization if nothing is measured or tracked. 

Comparing Productivity 

Comparing developers’ productivity across teams is a thorny subject at best and downright dangerous for team morale at worst. As the old saying goes, “Comparison is the thief of joy” or what I typically say, “comparisons lead to unhappiness”, or with my kids “eyes on your own paper!”. 

The productivity teams at Netflix take a contextualized view of dev teams rather than relying solely on raw data. Every project is different, the customer base is different, the use case is different, personas are different, and where a team is within the software development life cycle is different.

It’s a basic understanding that comparing apples to oranges is not good math. A team that is just starting out and building something new, is going to look very different than a team with a mature product. By recognizing this, it becomes almost impossible to rank teams against each other because very rarely, if ever, will teams be doing the same thing, in the same space, the same way, with the same people. 

Even a measurement of an outcome pertaining to customer satisfaction (CSAT) is not straightforward. At Netflix and across the industry, we’ve found that satisfaction for internal teams skews lower than satisfaction for customer-facing teams.

The reason? Teams within Netflix are their own harshest critics. When attempting to gauge the performance of an internal team vs a customer-facing team, it’s understood that the internal team is almost always going to score lower on satisfaction, even if both teams are equally effective. 

Context is everything. Measuring productivity means being mindful of context. 

Pushing Productivity 

Any company that wants to be successful must understand how to measure its success. Productivity doesn’t count for much if an organization is not moving towards desired outcomes. 

By viewing productivity as more than just a concept or a raw set of data, the hard-working teams at Netflix have turned productivity into an actual apparatus. It is a living, breathing team of human beings whose devotion to empathetic efficiency improves customer satisfaction and dev team quality of life. I am incredibly proud to lead these teams, and I sincerely hope the work we do inspires other organizations to improve their developers’ experience.

And if you want to be as productive as Netflix, remember that metrics are only as good as their context! 


If you enjoyed this article and you would like to learn more about the work that I do at Netflix, I invite you to come join me at INTERACT on April 7th

This will be the second time that I have sat down for a panel discussion hosted by Dev Interrupted. I love being a member of the Dev Interrupted community because they are such an amazing resource. If you are a team lead, engineering manager, VP or CTO looking to improve your team, come to INTERACT and check out the community - I promise you will learn something.

Pretend you are watching your favorite show on Netflix: Sit back, relax & watch as I share the stage with other amazing engineering leaders from places like Slack, Stack Overflow, American Express, Outsystems, Drata & many more.

>Register Here<

Chaos Engineering might sound like a buzzword - but take it from someone who used to joke his job title was Chief Chaos Engineer (more on that later) it is much more than buzz or a passing fad - it’s a practice. 

The world can be a scary place and more and more companies are beginning to turn to Chaos Engineering to proactively poke and prod their systems and in doing so are improving their reliability and guarding against unexpected failures in production and unplanned downtime. 

During my career I dealt with my fair share of outages, including one that caught me mid-song during a bout of karaoke and far too many that woke me up at 02:00. As the co-founder and CTO at Gremlin, I do my best to make sure no other engineers have to suffer sleepless nights worrying about their product. 

But the question remains, what is Chaos Engineering and where did it come from?

A Short History

The spiritual predecessor to Chaos Engineering is often called by a much more widely recognized name - disaster recovery. The focus when this practice was introduced is much the same as today: proactively suss out production problems by injecting failure. 

Netflix’s Chaos Monkey is probably the most well publicized Chaos Engineering tool as it arguably kickstarted the adoption of Chaos Engineering outside of large companies, but this has led to the erroneous belief that Netflix invented the practice. In fact, the practice was already widely in use amongst the titans of technology. 

Over a decade ago during my time as a Lead Software Engineer at Amazon, we implemented several crude practices designed to inject failure into our systems. The most rudimentary of which was employed by a man called Jesse Robbins, who earned the nickname “Master of Disaster” by running through data centers pulling out cables. 

Let’s just say the practice has evolved a lot since those early days and your data center cables are much safer these days.

What is Chaos Engineering?

“What Chaos Engineering really is, is the art, if you want to call it that, of introducing controlled chaos.” - 2:16 on the Dev Interrupted podcast

At its core, Chaos Engineering is a disciplined approach of identifying potential failures before they have an opportunity to become customer facing outages. 

It is a practice that lets you safely test your assumption about how your systems will behave under duress by actually exercising resilient mechanisms in a controlled fashion. You literally "break things on purpose" to validate and build resiliency. The end goal of Chaos Engineering is not to inject arbitrary failure into a system, but rather to strategically inject turbulence to enhance the stability and resiliency of your systems.

How Chaotic is Chaos Engineering?

I always tell people that Chaos Engineering is a bit of a misnomer because it’s actually as far from chaotic as you can get. When performed correctly everything is in control of the operator. That mentality is the reason our core product principles at Gremlin are: safety, simplicity and security. True chaos can be daunting and can cause harm. But controlled chaos fosters confidence in the resilience of systems and allows for operators to sleep a little easier knowing they’ve tested their assumptions. After all, the laws of entropy guarantee the world will consistently keep throwing randomness at you and your systems. You shouldn’t have to help with that.

How do I Start?

One of the most common questions I receive is: “I want to get started with Chaos Engineering, where do I begin?” There is no one size fits all answer unfortunately. You could start by validating your observability tooling, ensuring auto-scaling works, testing failover conditions, or one of a myriad of other use cases. The one thing that does apply across all of these use cases is start slow, but do not be slow to start.

What I mean by this is to start testing across just a few nodes versus impacting your entire fleet. We refer to the impacted area as the “blast radius” and we highly recommend starting with a small blast radius (the number of systems impacted) and increasing it over time.

By starting small you allow yourself to gain confidence in both the experiments you are running and your systems. Of course your risk tolerance is also a factor of how large a blast radius your organization will use. 

For instance, a large banking institution with millions of customers has a much lower risk tolerance than a tech startup with a couple hundred customers. In that case, they would want to run experiments in a programmatic way and would need to be very explicit about communicating to the rest of the organization what tests are going to be run and when to avoid any unplanned 2am or 3am disasters. 

Eventually you want to get to the point where all of this is automated, a process we refer to as “continuous chaos.” Starting small with automation could be something as simple as taking out a single node; then taking out five nodes; then ten; and so on. Eventually you automate the process at a level you are comfortable with.  

“Ultimately you want to be able to handle any of this random chaos being thrown at you, because that's what the world is, it's entropy, it's degradation” - 7:35 on the Dev Interrupted podcast

No Tolerance for Downtime

When I founded Gremlin, it was just myself and my co-founder developing the first iteration of the product. The business looked very different then and I jokingly referred to myself as the “Chief Chaos Engineer” responsible for implementing code that was mostly used by enterprise companies. Many of these companies came to us because they had reliance thrust upon them by the US government or they had top-down reliability standards and they wanted a tool to help them shore up their systems. 

As the company began to evolve, so did the customer base. These days it’s not just Fortune 500 companies that care about reliability, it’s everybody. Planned downtime is a relic of days gone by. It is no longer acceptable to espouse planned maintenance windows as part of development lifecycles and customers don’t have the patience for products they rely upon to spend any time unavailable. Companies recognize this dynamic - and it’s not a hard one to miss. 

Seemingly our appetite for technology has gone up exponentially while our ability to stomach downtime has drastically decreased. Customers expect that your product is always working, always running. If your product is down because of outages then there are ten other similar products waiting in the wings to take their money. 

Making Lives Better

Visibility is high these days and companies don’t need the publicity that comes with making any unforced errors, let alone to be subject to errors not of their making. No one wants to be blown up on Twitter because their product isn’t working or because one of their downstream dependencies or their cloud provider had an unexpected outage. 

By preparing for the worst, we can be at our best as an industry and can be prepared when disaster eventually comes knocking. That’s why when an unexpected outage occurs or there is a production failure customers will never even know it happened. 

I often joke that we are the engineers’ engineers because many of us know that feeling of being jolted from a dream at 03:00 by our pagers, groggily wiping our eyes and whipping out the laptop to go dig through a sea of monitoring dashboards and logs. It’s not fun and it’s exactly why I founded Gremlin. Because there is a better way to approach operations than merely sitting back on our haunches and waiting for the next outage. Chaos Engineering not only helps to protect against the randomness of the world, but also teaches people how to build more reliable software. And if enough people build more reliable software, we build a more reliable internet.

_____________________

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is inspired by Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

A good SRE engineer will tell you your service is never down. A great SRE engineer will tell you that’s not what you should be measuring. In fact, they’ll tell you their job is customer service. 

Site Reliability Engineering (SRE) has grown immensely popular with many of the world’s largest tech companies, like Netflix, LinkedIn and Airbnb employing SRE teams to keep their systems reliable and scalable.

Along the way, SRE engineers have become one of the most sought after engineering roles in tech. 

The role is traditionally understood as ensuring that services are reliable and unbroken, but reliability and uptime aren’t perfect metrics. Perhaps what organizations should be asking themselves is what their customers think of their service. 

Wandering down to your engineering department and asking your SRE team about customer satisfaction is a good place to start. 

Their answer just might surprise you. 

History of SRE

In practice, Site Reliability Engineering has been around for a while. In the past its functions were covered by roles that had names like production ops, disaster recovery, testing or monitoring. The rise of cloud computing facilitated a need for more engineers in production. The complexity only grew as more organizations transitioned from monolithic infrastructures to distributed microservices. 

Modern Site Reliability Engineering originated at Google in 2003 with the work of Benjamin Treynor, who is seen as the “father” of what we now simply call SRE. Treynor, who coined the term, was a software engineer placed in charge of running a production team. With the goal of making Google’s website as reliable and serviceable as possible, he asked that his team spend half their time on operations tasks so they could better understand software in production. This team would become the first-ever SRE team.

Ben Treynor said, I'm paraphrasing, ‘[SRE] is essentially like throwing a software engineer at an operations problem’, right? Because you come from that developer mindset, that design and, you know, you think about all of these things. So think about it as a developer but apply it to an operational type of problem.” - Brian Murphy on the Dev Interrupted podcast at 4:26

Why not uptime?

So why shouldn't you be too concerned about your uptime metrics? In reality SRE can mean different things to different teams but at its core, it’s about making sure your service is reliable. After all, it’s right there in the name. 

Because of this many people assume that uptime is the most valuable metric for SRE teams. That is flawed logic. 

For instance, an app can be “up” but if it’s incredibly slow or its users don’t find it to be practically useful, then the app might as well be down. Simply keeping the lights on isn’t good enough and uptime alone doesn’t take into account things like degradation or if your site’s pages aren’t loading. 

It may sound counterintuitive, but SRE teams are in the customer service business. Customer happiness is the most important metric to pay attention to. If your service is running well and your customers are happy, then your SRE team is doing a good job. If your service is up and your customers aren’t happy, then your SRE team needs to reevaluate.

A more holistic approach is to view your service in terms of health. 

The Four Golden Signals

As defined by Google, these are the four golden signals of SRE. If these can be managed effectively, then you probably have a healthy system. 

Establishing system health

“The best way to get started is just measuring stuff, you know, just getting the baseline of what's healthy, what's not healthy, what looks like health, and then you can start working from there.” - Brian Murphy on the Dev Interrupted podcast at 10:49

It can be difficult to know whether or not your organization should consider forming an SRE team, or what your next steps are if you’ve already made the decision. 

Again, think of your decision in terms of a holistic approach, not just your uptime. If you have high uptime, that’s fantastic, but what you should be establishing is a benchmark. 

Using the four golden signals to guide you, establish what you think a healthy system should look like and set your benchmark. Keep measuring over time and you will begin to see the areas that are good or require more work. 

These measures will help inform all of your future decisions. Perhaps your organization is ready to roll out new features or make choices around expanding your service. 

Critically, the health you establish provides insights into customer happiness. If things look good you probably have happy customers. 

Internal customers

When done right SREs aren’t just making customers happy, they’re making the lives of developers easier too. Nothing is worse than having to stop because there’s a problem in production. Good SRE teams can shield dev teams by focusing on major hotspots.

If the fires are being managed before they are out of control, it allows developers to keep pushing out features. It even gives them the freedom to keep breaking things, if necessary!

When things do break, or require a slowdown, a dialogue can occur. A good SRE understands that the developer who wrote a piece of code understands it better than anyone. The model for good internal customer service is an SRE who brings in a developer, gives them ownership of the code they created, and offers to help them fix it.

Happy customers are the best customers

Whether you already have an SRE team or are thinking about forming one, remember to think beyond the engineering - think about the customer. 

Ask yourself if your customers are happy and if you would describe your service as healthy. Remember to think about your own teams as well, your developers will thank you for it. 

_____________________

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is based on an episode of Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

 

Following a recent interview on the Dev Interrupted Podcast, OutSystems CEO and founder Paulo Rosado joined us to chat about his path to founding the company, advice for successful leaders, and the growing threat of technical debt. The conversation below has been edited for length and clarity. 

_____________________

Tell us about OutSystems' founding story. What inspired you to start the company?

In February 2021, OutSystems was valued at $9.5 billion dollars - but it certainly didn’t start out that way. The idea behind OutSystems was decades in the making, and its mission stems from what I observed after moving to Silicon Valley back in the mid-nineties. 

My journey in technology began when I graduated with a degree in computer engineering from Universidade Nova de Lisboa in Lisbon, Portugal and moved to the US to get my Masters in Computer Science from Stanford. Afterward, while working in Silicon Valley, I began to understand just how much of a problem technical debt was. 

While working on a very large engineering team, we were faced with tackling a gigantic project in Java and I realized the issues of releasing and maintaining code sustainably. The lack of productivity in the software development process was appalling. Fixing this problem is ultimately what motivated me to found OutSystems. 

Before founding OutSystems, there was a small company I founded and later sold, which focused on internet and intranet projects. It wasn’t a bad company, but we kept failing. Projects were never delivered on time or on budget. 

We would think to ourselves, “We’re smart. How is this possible?” Our inclination was to blame the requirements of the project, labeling the scope as incorrect and adjust from there. However, we began to realize that the companies hiring us for these projects wanted us to make changes as we were developing in response to rapidly changing environments. 

The issue we began to face was the continual accumulation of technical debt. We would reach first production and realize we had built something users didn’t want, requiring us to go back and rework the stuff we had just built. 

“We came up with this realization that the problem was not that the requirements up front were wrong. The problem was that the cost of changing wrong requirements, which are a fact of life, is very high.” - on the Dev Interrupted podcast at 6:03

 

This phenomenon was occurring in 90% of projects at the time. Things were always over budget and always late. 

Today, it’s easy to take this for granted because concepts like Agile, DevOps, CI/CD are mainstream. But at the time, you had to build software the same way you build a bridge.  

Why is technical debt a challenge for companies now? How has this problem changed?

Technical debt has become a large problem for businesses, and one that only compounds with time. Tech debt doesn’t have a singular cause - it’s the accumulation of several factors. 

Over the course of my career, I’ve seen first-hand the complexity brought about by the evolution of software development. For instance, we’ve seen an explosion of languages, paradigms and frameworks that can all be used to achieve a solution. Often these languages are dispersed with no connections between them, so tracking these dependencies requires a great deal of sophistication. 

In addition to this, turnover within the development team is a critical problem that leads to technical debt. The moment a company loses a developer, the knowledge accrued by that developer also departs the company. The hole left behind is complex, including code,  frameworks and intent behind how their systems are structured. 

It’s been my experience that a lost team member can take as much as 20% to 30% of the fundamental knowledge of a system with them. Reverse engineering their work is both time-intensive and inefficient. 

Companies have tried to corral this problem by investing in coding standards. While these constraints can help mitigate the loss of a valued developer, our research indicates turnover remains a significant problem. 

OutSystems recently released a study on the effects of technical debt. What were its findings? 

Recently, OutSystems surveyed 500 large companies around the world to examine the cost of technical debt facing businesses and uncover the challenges companies face as they confront its causes. The results from the companies surveyed were many of the same things I’ve observed throughout my career. 

It’s important to note that while the causes of technical debt have largely remained the same, the pace at which technical debt occurs has grown substantially.

And so it's a hack, right? What we call a hack at OutSystems, they did a hack to just release the software quickly. And those hacks compound into technical debt.” - on the Dev Interrupted podcast at 27:11

The survey we conducted isolated three major causes of technical debt. They are as follows: 

  1. The amount of developer frameworks. An increase in frameworks leads to an increase in technical debt. 
  2. Developer erosion. Employees leaving an organization and taking legacy knowledge with them. 
  3. Compromises in quality of architecture and code. Often caused by a shortsighted view that what needs to be done now is more important than long-term stability of the codebase.

In the past, companies believed they could buy their way out of this problem, but that strategy has proven ineffective. The reality is, the most successful companies must build the software they require to meet their business needs. 

Simply purchasing what you need doesn’t solve your problems because even purchased systems must be cobbled together, requiring unique API’s, unique UI’s, unique portals, and unique mobile applications. 

Does OutSystems play a role in helping companies cut tech debt? 

The core of what we do at OutSystems is focused on tackling those three fundamental problems. We understand that technical debt amasses slowly over time, through a myriad of decisions that appear much smaller at their onset than their totality would suggest. Once these “tiny” decisions become a major problem, they inhibit investment in current operations and future innovations. 

The increasing pressures of today’s fast-paced business environment often push companies toward decisions that spiral into technical debt. The good news is that by creating a development process that marries short-term deadlines with long-term strategic goals, it’s possible to “pay down” that debt. 

I believe that any company is capable of whittling away technical debt with the correct tools and processes, and I founded OutSystems because companies shouldn’t have to choose between building fast and building right. 

To learn more about technical debt, how to combat it, and what to expect in the future, you can download the 2021 Technical Debt Report on our website.  

_____________________

Starved for top-level software engineering content? Need some good tips on how to manage your team? This article is based on an episode of Dev Interrupted - the go-to podcast for engineering leaders.

Dev Interrupted features expert guests from around the world to explore strategy and day-to-day topics ranging from dev team metrics to accelerating delivery. With new guests every week from Google to small startups, the Dev Interrupted Podcast is a fresh look at the world of software engineering and engineering management.

Listen and subscribe on your streaming service of choice today.

 

Darren Murph, Head of Remote at GitLab

The office of the 20th century is a testament to design. A great deal of thought goes into the layout of a building. How are the offices laid out? Where are the elevators located? Where will teams meet? But the focus on co-located office space is quickly becoming a relic of the past. To meet the challenges of the 21st century GitLab's Head of Remote Darren Murph is pushing organizations to put just as much thought into their remote work structure as they would an office building. 

For many companies, the transition to this mindset comes with difficulty. They've shifted into remote work as a necessity, but maintain the 20th-century ‘office-first’ mindset. While this is passable and can work, it's not ultimately taking advantage of the key benefits of a virtual atmosphere. 

To take advantage of the shifting dynamics, GitLab is using their own platform to consolidate all of their virtual collaboration. Providing a single source of truth, GitLab has designed the virtual version of a central hallway where all work is funneled. This breaks down organizational siloes and enables the GitLab team to collaborate with maximum efficiency, by making sure that everything is as visible and as transparent as possible for everyone in the organization. 

A company’s ‘central hallway’ is going to look different from organization to organization, but the takeaway for all remote organizations and engineering leaders should be the importance of de-siloing information across your organization. This will encourage virtual collaboration and boost creativity. 

Meetings that Support Remote Culture

A Chief People Officer once asked Darren, “How do we make our meetings better?” His response? “Make them harder to have.” 

Darren believes that you should have as few meetings as possible because people deserve to be able to focus on their work. From this belief flows the practice of using tools like Slack or Microsoft Teams to gather consensus asynchronously, and then reserve synchronous time for meetings where only decisions are made or important status updates are shared. 

This has the effect of focusing a team’s attention which is important as teams become distributed around the globe, and time zones become a greater issue. It's far too easy for your entire day to be meeting with teams across your organization, with people coming online in various time zones to fill your day. Instead, the focus should remain on having critical day-to-day functions performed asynchronously - with meetings taking a back seat. 

In addition to focusing an organization's efforts, being thoughtful about structuring remote work also reduces meeting fatigue. We’ve all experienced being on Zoom or other video conferencing software continuously throughout the day. Not only is it inefficient and distracting, but it can lower your company morale and leave you exhausted and feeling like you didn't accomplish anything during the day.

Darren’s ideas may have seemed radical just a couple of years ago. But he and the folks at GitLab are pioneering - and thriving - in today’s remote environment. The office of the 21st century is undoubtedly going to be virtual, so remember to put as much rigor and thought into your virtual work structure as you would if you were designing a building. 

To learn more about how GitLab and other companies transitioned to remote work, check out Dev Interrupted's Remote Work Panel on August 11, from 9-10am PST.

Image showing four leaders of remote work from Dev Interrupted

Interested in learning more about how to implement remote work best practices at your organization?

Join us tomorrow, August 11, from 9am-10am PST for a panel discussion with some of tech’s foremost remote work experts. This amazing lineup features:

Dan Lines, COO of LinearB, will be moderating a discussion with our guests on how they lead their teams remotely, how the current workplace is changing, and what's next as the pandemic continues to change

 

Don't miss the event afterparty hosted in discord from 10-10:30am with event speakers Chris and Shweta, as well as LinearB team members Dan Lines and Conor Bronsdon.

______________________________________________________________________________________________________________________________________

If you haven’t already joined the best developer discord out there, WYD?

Look, I know we talk about it a lot but we love our developer discord community. With over 2000 members, the Dev Interrupted Discord Community is the best place for Engineering Leaders to engage in daily conversation. No salespeople allowed. Join the community >>

Dev Interrupted: The New Faces of Engineering Leadership

 

 

 

Shweta Saraf, Senior Director of Engineering at Equinix
Shweta Saraf, the Senior Director of Engineering at Equinix, has a particularly interesting remote work story: she experienced a fully remote acquisition during the pandemic.

Her former employer - Packet - was acquired by Equinix, a huge company with more than 30,000 employees and over 200 locations around the world. Suddenly, the small team at Packet who were experts at remote work found themselves in the position of trying to onboard not just themselves at a new company, but onboard an organization of 30,000+ to the principles, structure, and best practices of a fully remote work environment. Because Equinix is the largest data center company in the world, operating data centers and office hubs all over the globe, the switch to remote work had to be as seamless and efficient as possible.

One of the key areas Equinix first looked to be efficient was in their meeting practices. They began what they refer to as ‘Better Way Wednesdays’ (their name for a best practice also utilized by Shopify) as a way to better inform employees and leadership. These meetings paired with a monthly business memo, capture the state of business along with key achievements, challenges or blockers, and give senior leaders KPIs and metrics.

This practice made it possible to cut down on the number of weekly status meetings, where the same information is passed on in different formats or through different levels of abstraction. The investment in this practice paid off immediately. Teams found that the “Better Way’ meetings would often only take an hour, but would save tons of time across the board. It also had the added benefit of reducing zoom fatigue. More focus time and better communication were realized by a single meeting shift.

The biggest focus Equinix implemented was asynchronous communication, because of the many time zones involved and the number of people all over the world, including engineering teams. Rather than restrict productivity to a specific set of time zones, async communication gives employees the agency to be held accountable for completing their work on their own time. Meaning that it is no longer necessary to align employees on separate continents onto the same Zoom call if that information can be transcribed in a chat app.

However, for companies where office culture is strong, with ceremonies happening in-office, it can be a learning process to adapt to working completely remote. With Packet’s experience aiding the transition for Equinix, a cross-synergy of ideas was realized. Employees from both companies found themselves questioning former agile ceremonies, such as stand-ups and retros, and whether these can be done asynchronously, or if they require a meeting at all. The merger resulted in an easier working environment for everyone.

Equinix, a company of tens of thousands of employees and hundreds of locations, transitioned to remote work successfully during the pandemic not because they were remote-friendly, but because they adopted a mindset of remote-first. Meaning that a developer on the other side of the globe could participate meaningfully and not feel left out. While not every company underwent an acquisition during the pandemic, Equinix's journey to a fully-remote organization is a familiar story for many tech companies this year.  To learn more about Equinix and how other companies transitioned to remote work, check out Dev Interrupted's Remote Work Panel on August 11, from 9-10am PST.

Interested in learning more about how to implement remote work best practices at your organization?

Join us tomorrow, August 11, from 9am-10am PST for a panel discussion with some of tech’s foremost remote work experts. This amazing lineup features:

Dan Lines, COO of LinearB, will be moderating a discussion with our guests on how they lead their teams remotely, how the current workplace is changing, and what's next as the pandemic continues to change

Don't miss the event afterparty hosted in discord from 10-10:30am with event speakers Chris and Shweta, as well as LinearB team members Dan Lines and Conor Bronsdon.

______________________________________________________________________________________________________________________________________

If you haven’t already joined the best developer discord out there, WYD?

Look, I know we talk about it a lot but we love our developer discord community. With over 2000 members, the Dev Interrupted Discord Community is the best place for Engineering Leaders to engage in daily conversation. No salespeople allowed. Join the community >>

Dev Interrupted: The New Faces of Engineering Leadership

 

 

Informing someone that you want to “measure” them is not a great way to start a conversation. Software developers, like all people, tend to look unfavorably upon having their performance closely measured. But measuring developers is one of the hottest trends for companies around the globe. So is it tyranny to measure people?

People are quick to note that numbers don’t tell the whole story and can become defensive at the notion their productivity should be quantified somehow. This resistance can become even more entrenched when teams become stacked against each other. 

Many leaders - like Netflix's Kathryn Koehler - believe we should absolutely avoid stack ranking of individuals and teams. After all, the human elements of teams - communication, coordination, leadership - can all affect the speed or perceived productivity of a team or process, so how can you quantify that? 

Thankfully, plenty of tools exist to enable a data-driven approach to the development process. With the right approach, these tools can be used to enhance any individual’s or team's performance. Beyond performance, when applied correctly, these tools can bring about more harmonious alignment between leadership and employees. 

But first, why do we measure? 

Why measure at all? 

It’s important to remember that measuring is just a tool -- it can be used for both good and for evil. An effective leadership team understands this basic fact. The cool thing about metrics is that they provide insight that might otherwise be difficult to obtain. Employees should note that good leadership will use metrics merely to inform their decisions, not solely drive them.

As a basic rule, it is difficult to improve something that is not measured.

I started with the really basic observation that I believe you cannot be sure that you're improving something if you don't measure it, right? I mean, think of trying to lose weight without weighing yourself. -Luca Rossi, from the Dev Interrupted Podcast at 4:31

Measuring creates a foundationa benchmarkwhich can be built upon. Once this foundation is established, organizations can begin to set goals. If you can set goals, you can rally a team; and if you can rally a team, you can achieve your product goals. This all becomes easier when you adopt a quantitative approach to your process. Success flows from the foundational element that measuring provides. 

Lastly, there exists a psychological benefit to measuring: namely, that people implicitly understand that metrics which are measured are important. After all, no one measures something they don’t care about. That which leadership identifies as measurable naturally informs employees what metrics leadership thinks are important. The very act of deciding to measure something telegraphs your intentions. Inversely, the opposite is also true: people tend to believe what is not measured is not important. 

All of this means leadership should be careful to identify what metrics are important because they define your organization and its behavior.

What should be measured?

First, an organization must decide what metrics are worthy and whether or not they are actionable. A lean approach is best here. Decide what does and does not matter to you, and then move confidently in that direction. 

So what metrics are worthy to an engineering team? 

There are many valid metrics which include things like Velocity, Branch Coverage, and Cycle Time, but one of the most interesting metrics we care about at LinearB is the Pull Request. In fact, it’s kind of our thing

A pull request has two major goals: the first is to improve the quality of code before you release it, while the second is to share knowledge between the team about the code base. 

Pull Requests are great because they let your team have control over what goes into git; they let people comment about what and why things were done a certain way; and they can serve as permanent documentation about a chunk of code that can be useful down the road.

If we at LinearB were to recommend one metric to start measuring, it would be Pull Request Size. Measuring PR Size -- and using that measurement to encourage smaller Pull Requests -- will help drastically lower Cycle Time. It will encourage all kinds of good behavior around code reviews, and it will prove that metrics can have a positive impact on business outcomes.

Once you have identified the best metrics to track for your success, you will need to go about making sure they last. Here it is best to maintain a lean approach, avoiding large upfront investments of time or money. 

How to make metrics stick

Image shows Luca Rossi and his title: Head of Engineering at Translated

And you should, as a leader, as a manager, you should make people feel that you care. There is no substitute for that. - Luca Rossi, Dev Interrupted Podcast at 21:10

We have all worked at a company where a change was decided, people were excited to see the change and then, after a couple of months, everyone has forgotten about it. Metrics are no different. Once a company makes the decision to align itself with a metric it must make the effort for it to last. 

It’s often best to start small. In the beginning, it is best to maintain a lean approach, avoiding large upfront investments of time or money. With small actionable change in mind, there are two ways to approach metrics adoption:

  1. In a top-down approach you ‘ll want to speak with your senior developers early on. Give them strategic input to help them figure out the best way to implement metrics into your dev team. An engineering leader can then explain how the metrics fit into the company’s strategic direction and convey that message clearly to employees. It is important to get buy-in from your employees because they are the ones building value, and further discovering what can and should be measured. When done correctly, the top-down approach will have managers saying, “Wow, I’ve always wanted to measure this. Now I can.” 
  2. In a bottom-up approach, it’s all about making your team understand what you want to achieve. Provide them with the initiative to build value so they understand why and how they are being measured. For instance, most developers can relate to a bad pull request experience - one that takes way too long or one that has a poor review and causes issues during production. People understand that a good code review should be both fast and accurate. So by starting small with a metric that is already understood, you will gain buy-in from your team. When done correctly, the bottom-up approach will have employees saying, “Wow, we didn't think we could measure this, and it's actually valuable.”

How you approach adjusting for change in the long-term is up to you. In an ideal world, both the top-down and bottom-up approaches would be utilized simultaneously.

Most importantly, your people should know that you care and that you value their feedback. 

Bottom Line

To achieve future goals a development team should know where it started. Metrics provide that benchmark. They are a foundation on which future success and business alignment can be built. 

Furthermore, in engineering, just as in life, tiny improvements are staggering when tracked continuously over a period of months and years. These gains boost team confidence and collaboration. 

Because with continual improvement, people feel as if they are working within a team and working as a team. When things go right, and everyone improves together, bonds are formed.

And of course, if you want to learn more about LinearB’s metrics or find out what our customers already know, you can book a free demo of LinearB.

I included a few quotes from Luca Rossi above. He has a great blog for those who are passionate about engineering. You can check it out here: https://refactoring.fm/ 

Who says developers can't be the life of the party? Come see for yourself why the Dev Interrupted Discord has emerged as the go-to community for developers. Growing from 0-1400+ engineering leaders in just the past 8 months, our community is attracting the attention of CTOs and VPs of companies like Netflix, GitHub and Google. Join the community.