Being indistractable is a superpower. Nir Eyal opened his Mind the Product SF 2018 presentation by sharing that, in the five years since his book Hooked came out, he has kept up with the research, gathered feedback, and learned even more about the neuroscience and behavior that drive our motivation and attention.
My main takeaway from his message is simple: plan ahead, and you’ll know when you’re distracted. Using DND (Do Not Disturb) mode to plan your time frees you for what author Cal Newport calls “Deep Work” and Nir Eyal calls “Traction.”
Focus on your inputs each day, rather than your outputs, to get important work done. Nir mentioned the “Forest” app for staying focused, and in the weeks after I attended Mind the Product, my colleague Rachel McRoberts recommended the same app to me. It’s a simple concept: each focus period grows a virtual green tree. If you interrupt the focus, the tree dies and you have to start over. Nir also uses the “Time Guard” app, which lets you set sensible limits on time spent on distractions.
I highly recommend watching this 28-minute video to hear and understand Nir’s latest work and pick up practical tips on decluttering and avoiding distraction.
We are rewarded for the answer. Not another question. It’s beaten out of us from kids, and later in work it can be hazardous for your career. —Warren Berger
I loved this cultural insight from the Farnam Street podcast: an honest assertion that our business culture rewards quick-hit answers instead of the act of slowing down to find the right question.
Why do I avoid the backlog and the overflowing to-do list? Why do I shove one more tool into a drawer already full of bits and bobs? Why do I squeeze yet another outfit into an overflowing closet? Because confronting this mess is hard work. It means making tough choices. Most of the time, I’d rather not decide.
To make sense of my environment, my work, and my life, I need to confront the mess. Once the clutter is gone, I know I’m left with just the essentials. Once the dust clears, I can get to work.
In The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing, Japanese organizing consultant Marie Kondo explains that while the process of decluttering and cleaning your home is important to your physical wellbeing, the true outcome is happiness and clarity in your mind. The habit gives you the freedom to take responsibility for important decisions.
I learned so much from this book, from awareness and mindfulness to practical tips on folding and hanging clothes. The habit of tidiness is now a mindset for me rather than just a chore to be completed.
The process starts by discarding the inessential items. Tidying up defines what is valuable: learning what I can do without; learning which books, clothes, keepsakes, or kitchen tools give me the most joy.
When I applied her principles, books were the hardest category. I had hundreds, many of them in the “I’ll read this someday” pile. I trimmed them down to the 80 to 90 best of the best (including this one! Hah), keeping the sentimental ones, the must-read-agains, and the books I reference often. The rest I gave away as gifts to new homes or donated.
Life becomes far easier once you know that things will still work out even if you are lacking something.
A clean home is a perfect metaphor for a clear and organized mind. If my room and desk are clear and tidy I can face the reality of what’s in front of me. “It is by putting one’s own house in order that one’s mindset is changed. When your room is clean and uncluttered, you have no choice but to examine your inner state.” Am I scared of what I’ll find?
Because you have continued to identify and dispense with things that you don’t need, you no longer abdicate responsibility for decision making to other people.
Decisions are now easier because I see the work in front of me more clearly. And I enjoy the treasures, clothes, and tools I chose to keep even more.
“I have a hiring heuristic called ABCDEF, which stands for: agility, brains, communication, drive, empathy and fit. For gatekeepers, I’ve found agility is the most important attribute. To test it, I ask them: ‘Tell me a best practice from your way of working.’ Then I ask: ‘Tell me a situation where that best practice would be inappropriate.’ Only agile thinkers can demonstrate that a best practice isn’t always best,” says Ries. “For an attorney, that might be probing for a situation where you shouldn’t run everything by a lawyer. Hopefully they don’t say ‘criminal conspiracy,’ but you want someone to say something like: ‘You know what? If you’re a two-person team, and you’re just doing an MVP, and six people are involved, you don’t need a lawyer.’ It requires some common sense and mental flexibility.”
Keep learning. You’ve only touched the edge of the issue. Develop your judgement, which is essentially decision making under uncertainty. Pattern matching: keep growing your pattern matching database, and be very conscious about it.
Heard on the a16z podcast of March 26, 2018, with Andy Rachleff, Wealthfront founder and CEO.
My notes and takeaways from a long read on anomalies and system complexity: the STELLA Report from the SNAFUcatchers Workshop on Coping With Complexity, 2017. Via Matt.
This paper is one of the best I’ve read in a while. Many lessons here match my experiences developing—and breaking—software for WordPress. I gained new insight into adaptive mental models, how best to coordinate teams during an outage, and how much I both love, and depend on, debugging and troubleshooting.
Building and keeping current a useful representation takes effort. As the world changes representations may become stale. In a fast changing world, the effort needed to keep up to date can be daunting.
Several years ago I told a colleague in passing that my professional goal as a software developer was to build a mental model of everything in our codebase. To know where each piece lives and how it works. They just laughed and wished me luck. I was serious.
Though my approach may have seemed naïve, or maybe unnecessary in my job, I saw it as essential for survival in a bug-hunting role. It was a step toward mastery and toward adding more value to the company. What I didn’t know at the time was that we were past the point where one person could keep the entire codebase organized in their head.
This paper suggests my coworker was right to laugh: it’s not useful to hold my own mental model of the entire system. I should, however, strive to learn from every opportunity to update the working knowledge I do have at any given time.
Note: “Resilient performance” sounds like a fancy term for “uptime.”
Much of my team’s work at Automattic is in the area of software quality: preventing errors by blocking deploys when automated tests fail, and building developer confidence by creating smarter, faster testing infrastructure. There’s so much more we could do there in the future.
Many big tech companies have a specific role for this called Site Reliability Engineer (SRE). Together with Release Engineering teams, SREs build safeguards such as deploying each merge to a small percentage of production servers first, or starting with a small share of read-only HTTP requests. If no errors occur, the deploy continues.
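To make the staged-rollout idea concrete, here is a minimal Python sketch. It assumes a hypothetical `deploy_to_percent` hook that pushes a build to a share of servers and a hypothetical `error_count` probe that reports errors seen at the current stage; the stage percentages (1, 5, 25, 100) are illustrative, not any company’s actual values.

```python
from typing import Callable, Sequence

# Illustrative stages: percent of production servers that get the new build.
DEFAULT_STAGES = (1, 5, 25, 100)


def staged_rollout(
    deploy_to_percent: Callable[[int], None],  # hypothetical deploy hook
    error_count: Callable[[], int],            # hypothetical monitoring probe
    stages: Sequence[int] = DEFAULT_STAGES,
) -> int:
    """Widen the deploy one stage at a time, halting at the first stage with errors.

    Returns the last percentage that completed cleanly (0 if the first stage failed).
    """
    completed = 0
    for percent in stages:
        deploy_to_percent(percent)
        if error_count() > 0:
            # Errors observed: stop widening. A real system would also roll back here.
            return completed
        completed = percent
    return completed


# Usage: simulate errors that only appear once 25% of servers run the build.
deployed = []
result = staged_rollout(
    deploy_to_percent=deployed.append,
    error_count=lambda: 3 if deployed[-1] >= 25 else 0,
)
print(result, deployed)  # 5 [1, 5, 25]
```

Passing the deploy and monitoring steps in as functions keeps the gating logic testable without real servers.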
At a software quality conference last year, I learned from Renato Martins how Groupon approaches this. They use “Canary” tests like those we run on WordPress.com: small, critical tests. Once these pass, they push code into a blue/green deployment system. This means that if any error occurs, the deploy system immediately switches all traffic to a previously known safe version (blue) while reverting the broken one (green). It’s a continuous sequence of two systems: one known safe version and one new version.
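A blue/green switch can be modeled as two slots and a pointer to the live one. This toy Python sketch assumes a simplified flow in which a build that survives is promoted into the blue slot as the new known-safe version; the class and method names are hypothetical, not Groupon’s system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlueGreenRouter:
    """Toy model: 'blue' holds the known-safe build, 'green' the new candidate."""

    blue: str                    # version currently trusted
    green: Optional[str] = None  # candidate version, if one is deployed
    live: str = "blue"           # which slot receives all traffic

    def deploy(self, version: str) -> None:
        # Stage the new build in the green slot and route traffic to it.
        self.green = version
        self.live = "green"

    def on_error(self) -> None:
        # Any error: send all traffic back to blue and discard the broken build.
        self.live = "blue"
        self.green = None

    def promote(self) -> None:
        # The candidate proved safe, so it becomes the new known-safe version.
        if self.green is not None:
            self.blue, self.green = self.green, None
            self.live = "blue"

    @property
    def serving(self) -> str:
        if self.live == "green" and self.green is not None:
            return self.green
        return self.blue


router = BlueGreenRouter(blue="v41")
router.deploy("v42")
print(router.serving)  # v42
router.on_error()
print(router.serving)  # v41
```

Because the safe version is never torn down while the candidate runs, rollback is just flipping the pointer back.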
Groupon deploys the blue/green changes to a small subset of the public-facing servers, say 5% of all traffic. On top of that, they have a Dark Canary: a separate server infrastructure that receives live production HTTP traffic but doesn’t actually reply to the end user’s requests. They run statistical analysis on the results of this traffic to determine whether the build is reliable, for example by counting how many HTTP responses are non-200. (It’s more sophisticated than that, but basically it’s risk-free testing on a tiny portion of traffic.)
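One simple way to turn “how many responses are non-200” into a pass/fail signal is a two-proportion z-test comparing the dark canary against the current production baseline. This is a stand-in I chose for illustration, not Groupon’s actual analysis (which the talk described as more sophisticated); the function names and the z threshold of 3.0 are assumptions.

```python
import math
from typing import Sequence


def _non200(codes: Sequence[int]) -> int:
    # Following the example in the text, every non-200 status counts as a failure.
    return sum(1 for code in codes if code != 200)


def canary_looks_healthy(
    baseline_codes: Sequence[int],
    canary_codes: Sequence[int],
    z_threshold: float = 3.0,  # assumed cutoff; tune for your traffic
) -> bool:
    """Two-proportion z-test: fail the canary if its non-200 rate is
    significantly higher than the baseline's."""
    n1, n2 = len(baseline_codes), len(canary_codes)
    if n1 == 0 or n2 == 0:
        raise ValueError("need traffic from both baseline and canary")
    x1, x2 = _non200(baseline_codes), _non200(canary_codes)
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both samples are all-200 or all-failure: pass only if the canary is no worse.
        return p2 <= p1
    z = (p2 - p1) / se
    return z < z_threshold


baseline = [200] * 9980 + [500] * 20  # 0.2% failures in production
print(canary_looks_healthy(baseline, [200] * 998 + [500] * 2))   # True
print(canary_looks_healthy(baseline, [200] * 950 + [503] * 50))  # False
```

Counting every non-200 code as a failure mirrors the example in the text; in practice redirects and other success codes such as 204 would usually be excluded.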
The most interesting point: when Groupon first built this system, the build failed about once every two weeks. Over time that number dropped to almost zero, because developers became conscious of it and didn’t want to be the one to cause a failure. So it changed their culture, too.
Back to the STELLA report.
Proactive learning without waiting for failures to occur.
Experts are typically much better at solving problems than at describing accurately how problems are solved.
Eliciting expertise usually depends on tracing how experts solve problems.
The concept of “above-the-line/below-the-line” also appears in Ray Dalio’s book Principles. Great leaders can navigate above and below the line with ease. Here the concept contrasts people’s mental models of a system (above the line) with the actual system (below). Another way of stating it: below the line are details around “why what matters.” Above the line is the deeper understanding around “why what matters matters.”
A somewhat startling consequence of this is that what is below the line is inferred from people’s mental models of The System. What lies below the line is never directly seen or touched but only accessed via representations.
So true. I remember an internal post that mapped out how a new product worked, with reactions from people like “Wow, I had no idea it was this complex” and “Thank you, now I see and understand it clearly.” When considering a software system, I often think to myself, “This is probably only fully represented in one developer’s mind.”
Two challenges I’ve come across in practice:
To keep an accurate representation yourself in order to get work done.
To hold a good enough understanding of how others represent it in order to work in a team.
I love the SNAFU stories in this paper. I felt the pain reading them, remembering times I’ve caused an outage on WordPress.com or committed bad code to a default WordPress theme.
Pattern: a cascading “pile on” effect—I’ve seen this with user sessions on WordPress.com accumulating into the hundreds of thousands, until our UI tests started failing. We finally saw enough slowdowns that a deeper analysis was warranted to uncover the cause.
Surprise: where my mental model doesn’t match reality (both situational and fundamental shifts).
Uncertainty: failure to distinguish signal from noise can be wasteful. “It is unanticipated problems that tend to be the most vexing and difficult to manage.”
Evolving understanding: start from a fragmented view, expand as you learn how it really works.
Tracing: sweep across the environment looking for clues.
Tools: command line is closest and most common: “in virtually all cases, those struggling to cope with complex failures searched through the logs and analyzed prior system behaviors using them directly via a terminal window.”
Human coordination is interesting and also complex: “This coordination effort is among the most interesting and potentially important aspects of the anomaly response.” (Coworkers and I have noted “watching the systems channel for the entertainment and thrill of the hunt.”)
Communication: chat logs help with the postmortem (I saw this often in themes and WordPress.com outages).
Conflict between a quick fix and gaining a clear understanding of what/why it happened.
Managing risk: pressure is high for a quick fix, but potential for other effects is also high.
Tagging “postmortems”—which at Automattic we do on internal “P2” sites. The paper made me laugh here by calling the archive of these recaps a “morgue” (also used in the journalism/newspaper industry).
Anomalies are unambiguous but highly encoded messages about how systems really work. Postmortems represent an attempt to decode the messages and share them.
Anomalies are indications of the places where the understanding is both weak and important.
This is a key point: learning from outages helps us gain a more accurate understanding of our system. Back to my point about trying to hold it all in my head: “Collectively, our skill isn’t in having a good model of how the system works, our skill is in being able to update our model efficiently and appropriately.”
The authors seem to treat postmortems as a deeply social activity for the teams involved, valuable beyond the dry technical review. At Automattic we could benefit from more intentional structure and synchronous sharing around this activity.
During an anomaly, coordinating the work can be difficult: assigning well-bounded tasks to individuals to speed up recovery and bringing onlookers and potential helpers up to speed, versus doing it all yourself to stay focused on the problem.
Good insight for technical people, from software developers to QA to DevOps:
To be immediately productive in anomaly response, experts may need to be regularly in touch with the underlying processes so that they have sufficient context to be effective quickly.
It’s much harder to work across many codebases and products and be effective in helping resolve an outage. There is high value in “shared experience working in teams” so that communication about the underlying issues is unneeded during a crisis; communications are short and pointed. You already know if your coworker is capable of something, so you don’t even have to ask.
“Sense making” is what I often feel I’m doing in my daily investigative work, and it’s a valuable skill: pattern matching and synthesis.
Strange loops are interdependencies in a failure that cause even more issues. For example, you can’t log errors because logging stopped working after a kernel TCP/IP freeze, while the failures themselves overloaded the log or filled the storage.
This bit applies to WordPress.com: continuous deployment can change the culture around site outages, making them “ordinary” and quickly resolved as brief emergencies because automation is readily available. But when that automation itself fails, as with a hung deploy command, it becomes an existential issue. Now we can’t afford to break the site, because our mechanism for quick recovery is gone.
A good summary of the balance between taking time to avoid or pay down technical debt and the pressure to quickly ship visible product changes for customers.
There is an expectation that technical debt will be managed locally, with individuals and teams devoting just enough effort to keep the debt low while still keeping the velocity of development high.
Reminds me of how software development teams expect framework and platform changes to continue during normal product cycles—most teams I’ve worked with struggle with balancing the need to do both.
Technical debt is generally easy to spot as it’s written, simply by looking at the code, and it’s paid down by refactoring. Dark debt isn’t recognized, or even recognizable, until an anomaly occurs: it surfaces as complex system failures.
In a complex, uncertain world where no individual can have an accurate model of the system, it is adaptive capacity that distinguishes the successful.
A key insight: adding new people to the team or bringing in experts for analysis can help answer the question, “Why are things done the way they are?” That question is often missing from internal discussions. We fix the point problem and move on, fighting fires instead of building a fire suppression system.
The STELLA report shows the value of participating in open discussions with other companies about these issues and sharing common patterns. That’s a big benefit of open source software, where you can follow not only the fix but the discussion around it.
More SRE (site reliability engineering) references:
When I come to a conversation without technique and provide the space to listen, I do so because I’ve failed at this a thousand times. I’ve planned and schemed and got lost in my own mind — missing the conversation, missing the moment, missing the person on the other side.
This time I’m going to do it differently.
I’m going to pause and give enough time and space to see the other person first. Listen deeply so I can adjust my effort to the situation. If it’s the right moment, share what has worked for me. Later, I can ask how I’m doing to measure success.