Atul

Thinking In Systems

2026-05-07T00:00:00+05:30

After writing a short take about system dynamics I recently picked up the book Thinking In Systems again. And this time I finished it.

The book talks about systems, stocks, flows, balancing and reinforcing feedback loops, traps, leverage points, etc.

I won’t add my notes/ highlights here. Instead, you can take a look at the very good chapter-wise summary of the book. A note: the chapter 5 video talks about an ‘inventor’. That should be ‘intervenor’.

The book is written in a simple style. Mentions concepts, terminologies, ideas, etc. Not too technical. And that makes it somewhat easier to read. To the author’s credit, she has deliberately made it even simpler. If you watch some videos on youtube related to the topic, you can see that the simplicity of the book must have been a choice.

There are systems everywhere. And if you pay attention, you can see those systems. From the solar system to your body to the world economy. Almost all are dynamic. If you are familiar with a system under consideration (say your work area, your body, your city, etc.) then it is likely that you know what factors come into play. I think the book will help you reason about a system you are not very familiar with. It, of course, does not make you an expert but it gives you terminology, ideas, etc. to know about and deal with a system.

While reading the book you can immediately relate some ideas to your familiar systems and they are likely to make sense. For example, the author says- ‘it’s easier to learn about a system’s elements than about its interconnections’. It immediately reminded me of Michael Nygard’s observation in Release It: ‘Every architecture diagram ever drawn has boxes and arrows… A new architect will focus on the boxes; an experienced one is more interested in the arrows.’

Then there are other ideas which may take some time to make sense: ‘Structure determines what behaviors are latent in the system.’ Or ‘Systems fool us by presenting themselves- or we fool ourselves by seeing the world- as a series of events… We are less likely to be surprised if we can see how events accumulate into dynamic patterns of behavior… If the news did a better job of putting events into historical context, we would have better behavior-level understanding, which is deeper than event-level understanding. When a systems thinker encounters a problem, the first thing he or she does is look for data, time graphs, the history of the system. That’s because long-term behavior provides clues to the underlying system structure. And structure is the key to understanding not just what is happening, but why.’ Now these sentences may make more sense when we know that ‘the structure of a system is its interlocking stocks, flows, and feedback loops’. Then for example, we can make sense of why every year mango sellers tell you ‘mangoes are in short supply this year’ or why there have been multiple conflicts between say India and Pak or Iran and the US. (While we are on the topic of conflicts between nations here’s a book recommendation: Prisoners Of Geography by Tim Marshall.)

Short Takes 3

2026-04-23T00:00:00+05:30

In the short-takes posts I write about some concepts, phrases, models, ideas, whatever related to my work and life. As these are likely to be known to many, the idea is to just document these together, rather than go into details of these. Will keep updating.

Law of the minimum

Liebig’s law of the minimum states that growth is dictated not by total resources available, but by the scarcest resource (limiting factor). If a farm has ample nitrogen and phosphorus but low potassium, the yield will remain low, limited by the potassium levels. Adding more of a non-limiting nutrient (e.g., more phosphorus) does not improve growth; only adding the limiting nutrient (potassium) will increase the yield. This is related to Theory Of Constraints (The Goal, Phoenix Project), Critical Path Method, etc.
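
The nitrogen/ phosphorus/ potassium example can be written down as a tiny calculation. This is only a sketch: the function name `limited_yield`, the nutrient amounts, and the per-unit requirements are all made up for illustration, and real crop models are far more involved.

```python
def limited_yield(available, needed_per_unit):
    """Return the yield allowed by the scarcest nutrient (law of the minimum).

    Each nutrient alone could support available / needed_per_unit units of
    yield; the actual yield is capped by the smallest of those numbers.
    """
    return min(available[n] / needed_per_unit[n] for n in needed_per_unit)


# Hypothetical farm: every number here is an assumption, not real data.
needed_per_unit = {"nitrogen": 4, "phosphorus": 2, "potassium": 1}
soil = {"nitrogen": 120, "phosphorus": 60, "potassium": 10}

base_yield = limited_yield(soil, needed_per_unit)
# Ten times more phosphorus: potassium is still the limiting factor.
more_phosphorus = limited_yield({**soil, "phosphorus": 600}, needed_per_unit)
# Doubling potassium relaxes the constraint, so the yield goes up.
more_potassium = limited_yield({**soil, "potassium": 20}, needed_per_unit)

print(base_yield, more_phosphorus, more_potassium)
```

With these made-up numbers the yield stays the same when phosphorus is added and doubles only when potassium is added, which is the whole point of the law.
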
Bounded Rationality

(Came across the idea in the book Thinking In Systems)

Adam Smith says that every individual by pursuing his own interest frequently promotes that of society more effectually than when he really intends to promote it.

Unfortunately, we see multiple examples of people acting rationally in their short-term best interests and producing aggregate results that no one likes. Tourists flock to places and then complain that those places have been ruined by all the tourists. Farmers produce surpluses and prices plummet. Fishermen overfish and destroy their own livelihood.

Bounded rationality means that people make quite reasonable decisions based on the information they have. But they don’t have perfect information, especially about more distant parts of the system. Fishermen don’t know how many fish there are, much less how many fish will be caught by other fishermen that same day. Human decision-making is limited by cognitive capacity, time constraints, and imperfect information, leading people to settle for “satisficing” (satisfactory) decisions rather than optimal ones. Satisficing (a blend of “satisfy” and “suffice”): instead of maximizing utility by analyzing all options, individuals choose the first option that meets their minimum requirements.
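
The difference between maximizing and satisficing is easy to see in a few lines of code. A minimal sketch, assuming a made-up list of fishing spots with made-up expected catches; the names `maximize`, `satisfice`, and `expected_catch` are mine, not from the book.

```python
def maximize(options, utility):
    """Look at every option and return the one with the highest utility."""
    return max(options, key=utility)


def satisfice(options, utility, good_enough):
    """Return the first option whose utility meets the threshold, else None."""
    for option in options:
        if utility(option) >= good_enough:
            return option
    return None


def expected_catch(spot):
    """Utility of a spot: the second field of the (name, catch) pair."""
    return spot[1]


# Hypothetical spots in the order a fisherman would come across them.
spots = [("near bay", 40), ("river mouth", 65), ("far reef", 90), ("open sea", 70)]

first_acceptable = satisfice(spots, expected_catch, good_enough=60)
best = maximize(spots, expected_catch)
print(first_acceptable, best)
```

The satisficer stops at the river mouth because it is good enough, and never finds out about the far reef. The maximizer needs to know about every spot, which is exactly the information bounded rationality says we rarely have.
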
Framing effect

(Was reminded of this bias when reading the Wikipedia entry about bounded rationality.)

The framing effect is a cognitive bias where people’s decisions change depending on how options or statements are framed, even when they are logically identical. For example, one fortune teller predicts that the king’s grandchildren will die before him, whereas another says that the king will outlive his grandchildren.

Thoughts In Progress: 3

2026-04-10T00:00:00+05:30

In the Thoughts-In-Progress posts, I write about ideas which I may not be fully convinced of yet. Or about which I have not made up my mind yet- this way or that. Like Work In Progress, these thoughts are not done yet.

I have listened to the Montek Singh Ahluwalia (MSA) interview on the Seen Unseen podcast a few times. I have mentioned the episode before (coincidentally, in both earlier Thoughts-In-Progress posts).

A few ideas from the episode were that infrastructure cannot be traded/ imported, government should ease the movement of people to cities maybe by building new cities, do better in primary education, health, etc.

The above ideas seem logical right away.

One idea he mentioned, and about which I still have not been able to make up my mind, is this (mentioned a little after 1hr30min into the episode): He mentions that at Oxford, Prof. Ian Little (who also taught Dr. Manmohan Singh) said that when you value a project you should value everything at world prices. For a steel manufacturing company, the value the company adds is the international value of steel minus the international value of iron ore (the raw material). MSA mentions that when evaluated like that, Indian steel-making companies of the past decades were inefficient. Before the 1991 reforms, due to protection from world competition, those companies got iron deposits cheaply and the steel they made did not command good prices in the international market.
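
To make the arithmetic concrete, here is a small sketch of value added computed twice: once at protected domestic prices and once at world prices. All quantities and prices are hypothetical numbers I made up to show the shape of the calculation; they are not from the episode or from any real data.

```python
def value_added(quantities, prices):
    """Value of the steel produced minus the value of the ore consumed.

    Both sides are priced from the same price list, so changing the list
    (domestic vs world) changes the verdict on the same physical operation.
    """
    output_value = quantities["steel_out"] * prices["steel"]
    input_value = quantities["ore_in"] * prices["iron_ore"]
    return output_value - input_value


# Hypothetical plant: 2 tonnes of ore in, 1 tonne of steel out.
per_tonne_of_steel = {"ore_in": 2, "steel_out": 1}

# Hypothetical price lists (same currency unit, numbers are assumptions).
protected_prices = {"iron_ore": 50, "steel": 400}  # cheap ore, protected steel price
world_prices = {"iron_ore": 90, "steel": 200}      # what the world would pay

at_protected = value_added(per_tonne_of_steel, protected_prices)  # 400 - 100
at_world = value_added(per_tonne_of_steel, world_prices)          # 200 - 180

print(at_protected, at_world)
```

The plant looks healthy on the protected price list and barely adds anything on the world price list, even though nothing about the plant itself has changed.
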

Makes sense, right? Why then is this thought still in progress in my mind? Well, this must be true for GCC/ Iran/ Russian crude oil, etc. But isn’t this true for Indian software as well? A lot of Indian IT is still about arbitrage advantage. If the raw materials (human resources, real estate, etc. in case of IT) are valued at international prices, what kind of value do the Indian IT companies generate? Is it good enough? But on the other hand, what’s wrong with having comparatively cheap resources? Crude oil is a natural resource there and what is wrong if those countries take advantage of that? If we evaluate everything at world prices what differentiators would be there? Now, if we are talking about efficiencies, quality, systems, world competition, value added, etc. we can say that maybe Indian IT companies could do better. (Was that too politically correct?) But if you have crude oil or IT people cheaply available, why should you evaluate everything at international market prices? Every person, country will want to take advantage of the cheap resources they have and maybe trade the arbitrage advantage for say fresh water, food (in case of middle east), foreign currency, crude oil, GPUs (in case of India), rare minerals (in case of the USA).

Well, the idea makes sense in the context in which MSA mentioned it. Indian companies were protected before the Indian markets were opened in 1991; they had an unfair advantage and had become complacent. They were not competitive and would have faced difficulty in an open market. But on the other hand, no market is really a free market. Open/ free market sounds good. But a lot of countries are pro-business rather than pro-market. They do favors for certain players. And sometimes correctly so. If the US government had not funded the banks with a stimulus package after the subprime crisis, the banks would have been wiped out. A pro-market approach would have been to let them die and let other players enter the market; but then the people would have suffered (if my understanding is correct). All countries do that- pro-business vs pro-market. The USA may favor Shell, fund Stanford University, and every country (people in power) will favor cronies- their own Andani families to set up golf resorts, oil companies, etc. And this favoritism is likely to remain as long as vested interests and the us-vs-them feeling are there. So with these realities, why not count on cheap resources? Why factor in international prices of those resources?

I agree with the principle. But I am not yet convinced of the need. And yet my thoughts are not exactly out-of-line with MSA’s. Because I don’t think that it is conventionally accepted wisdom to evaluate at world prices. In fact, the widely accepted practice seems to be to take advantage of the resources you have- offshore some work, etc. And I think when you estimate at world prices it’s to judge the merit, all things being equal. But as long as all things are not equal, I am not fully sure of the reasons why we should. Maybe I should read more economics, project management books.

Reading 1988 Book Introduction to Functional Programming

2025-11-07T00:00:00+05:30

Finished reading Introduction to Functional Programming by Richard Bird and Philip Wadler. The first edition. Published in 1988. This book is so old it’s not listed even on the authors’ university homepages and wikipedia entries. There is a 1998 version using Haskell that is listed. But I hadn’t read a programming book in a while, someone on twitter had mentioned this book, PDF was available, so I read it. Here are a few impressions.

The 1988 book by Bird and Wadler covers some basic concepts which are mainstream now and you see those covered elsewhere. But it is fun (and at times irritating) to read formal definitions in a style somewhat closer to mathematics than to programming. You know, when you define currying, foldright, etc. with symbols. I wouldn’t consider the book worthwhile if you have already read a functional programming language book. But sometimes changing the perspective (language-maths) helps. In that sense a decent read.

And now for something completely different:

I don’t like when people pick a small area in an ecosystem and become Bible-salesman-like fanatic about it. I have heard about no-carb diets, long distance running, global warming just as much as I have heard about haskell and going cloud-native. I had become (~2011-15) quite fanatic about refactoring, agile, TDD, etc. I am not saying that these are bad things. What’s bad is being dogmatic about those.

Engineering is not purity. It is pragmatism. However much software craftsmanship, category theory, monads, clojure excite you, ideas and technologies like these are tools to better meet business needs. But there are pockets in IT people gravitate towards- e.g. emacs, agile, unix, etc. Mainly people do so because there is merit in these pockets and they do like cherry soda.

Functional programming has been one such pocket for some time. You can, of course, go back to von Neumann, Alonzo Church, lisp. But in the larger scheme of things it is not very old if you limit to multi-core processors, delayed release of Java 8 era, etc. People behave like missionaries when it comes to pure functions, loops, declarative, referential transparency, tail-call, values/ value objects, pattern-matching, list-comprehensions, lazy evaluations, currying, map, reduce, filter, etc.

I have written functional programs- in java, scala, scheme, clojure, erlang, elixir, JS, and to some extent even in haskell. And I have consumed a lot of material advocating functional programming. And thought process wise, it does make you think differently. But you have to manage state, integrate with other systems, etc. So it’s worth repeating that engineering is pragmatism. Immutability and purely functional data structures are ok but things work better if they are made usable (like in clojure, and later elsewhere). Also, in my opinion static typing works better but then- as far as functional programming with static typing is concerned- there could be some compromises you may have to live with.

An Opinionated Programmer

2025-07-19T00:00:00+05:30

Recently Lex Fridman posted a podcast episode of his discussion with DHH (Duration 6hr+). I have previously written about another such Fridman interview- that with John Carmack. DHH is a legendary programmer. Known for his work on Ruby On Rails (RoR). You have read about him here more than once. In fact, the oldest post on this blog was about his sensible but controversial opinion. (Well, I had written a few blog posts before that one. I used wordpress for blogging then. But when migrating to/ starting with GitHub pages, those earlier posts did not seem to have stood the test of time.)

He is not limited to RoR. He is an influencer in the programmer community. He is very opinionated. And having acquired an aura and followers due to RoR work, he exerts influence in a few other areas as well. And he is quite noisy. I mean it as a compliment. For example, he created a lot of noise against App Store 30% payment rules (All app payments must go thr’ Apple and Apple will take 30% of the amount).

DHH is not an abstract-to-concrete thinker like Alan Kay or Rich Hickey (deep-thinking-as-an-activity types). But software is an opinionated world and we need such noisy people. Not all are as noisy as DHH. But we need strongly put counter opinions most of the time. Otherwise we will drown in the deafening noise created by other folks (Software Craftsmanship BS proponents, Agile consultants, Silicon Valley VCs, AWS certified cloud architects, AI hype-men/ doomsayers, LinkedIn fiction writers).

(There might be some, but) there does not seem to be any malice against the entity he is crusading against at a particular time. For example, these days he seems to be anti-Apple, but when M1 chips came out, he was quite vocal about how they outperformed the best chips of the time. He was quite smitten by iPhone javascript performance and praised Safari in the past. But that is not limited to Apple. He has been quite noisy about how his company saved millions by moving out of AWS onto their own servers in professionally managed data centers. But he does not seem to be against everything Amazon/ AWS. He has been supportive of how Musk handled twitter. But he’s called Musk delusional as well.

One thing he mentions in this interview with Fridman is that, being an introvert, he never liked to pair program with another programmer, but these days AI helps him as a pair programmer. I did not think of DHH as an introvert. This is the thing… He knows how to be opinionated and when to create noise, etc. He wrote that ‘TDD is dead’ post some days before his rubyconf keynote. He started App Store payment noise just ahead of WWDC. He started talking about SSR when they were about to launch Hey. Can such a person be an introvert?*

He is rich and can afford luxury. He can also afford luxury beliefs but does not seem to have many such beliefs. He is not a tech bro but at times does sound like one. And there was that smear-campaign like ‘basecamp funny names’ thing shortly after he testified against Apple.

I don’t agree with everything he says. For example, static vs dynamic typing from the interview. In fact, I don’t even read everything he writes on his blog. But a lot of what he says makes actual sense. He is one of those very few people in software who you can agree with and yet choose not to be a dogmatic follower/ bhakt of the person or everything they say.

*Marcus Geduld made this point about introverts. I find it insightful. When I was writing about DHH saying he’s an introvert, I remembered Marcus Geduld immediately. I must have read that answer some 8-10 years ago so I was not sure whether it’s by Marcus or my memory was playing tricks on me. But google search and quora related content helped me find this link quickly.

Short Takes 2

2025-06-12T00:00:00+05:30

PuLa

There are very few people, possibly there isn’t even one, who I admire as much as I admire the late PuLa.

Thanks to him I got introduced to a lot of good things which have stayed long with me- from eating fish to reading Wodehouse.

An important factor is that this admiration, respect, etc. for PuLa has lived long for me. As I matured, I understood some of his shortcomings and limitations, etc. But I also appreciated some other things which I did not earlier. For example, his writing is limited to Marathi language. It is not easily translated to other cultures, languages, etc. In one speech, he himself says that humor and poetry are rooted deeply in the language in which they are written and cannot be easily lifted and shifted. So he was aware of this. Second, many of his plays (excluding ‘Tuze Ahe Tujpashi’) are musical comedies, lacking drama. His translations (Pygmalion to ‘Ti Phulrani’, Old Man And The Sea- ‘Eka Koliyane’) are also said to lose something in translation. On the other hand, he reworked some portions of Pickwick Papers into the iconic ‘Batatyachi Chawl’, which is deeply rooted in Marathi. And when he wrote about people he met in real life (Raosaheb, for example) the balance which he struck between his admiration and knowing that the person he was writing about was not perfect is just breathtaking. I started appreciating this only after a friend pointed it out to me. It is a very difficult balance to achieve.

His fine observation skills, humor, music, etc. are discussed and admired- almost worshipped- a lot. I also admire him because he worked in many areas (writing, films, music, plays, social activities, speaking truth to power, etc.) and in all these areas he did well- almost excelled. He was not always successful (the ‘Subkuch PuLa’ movie ‘GuLaacha Ganpati’, for example). But wherever he went he created a happy place. I don’t know a single- a single- person who did this better than him. I am not saying he was Leonardo Da Vinci. But, anyway. In one QI episode Ronnie Ancona called Stephen Fry a renaissance man. I think PuLa was the same.

In one interview PuLa mentions that he wrote as if he was speaking to someone sitting in front of him. Because I admired his writing I also picked this way of writing to some extent (Wodehouse must have contributed, as well). And at times it is annoying. Why should I write that way when I am writing some technical notes? It makes you somewhat dependent on mostly apathetic readers.

Perhaps the thing that I admire most in him is that he was very appreciative of quality in people and things. Here by appreciative I mean he spotted, praised, and encouraged talent early on. Here are some examples: the first performance of the late Bhimsen Joshi in Pune was at PuLa’s place. The first letter saying ‘I liked your poems’ to the then unknown poet Mahanor was by PuLa. He was the first to support Dalit literature (he wrote a foreword to Anand Yadav’s book and told Narayan Surve that his poem MoneyOrder stunned him). I may not be a people person like PuLa but I hope and strive to appreciate good things like he did.
Madeleine moment

Also known as involuntary memory. What was the most recent (and somewhat dramatic) Madeleine moment you experienced? I recently spent a few weekends away from the city. And once I saw quite a few fireflies. Some memories were triggered. But those were neither dramatic nor quite close to heart. I should meditate more.
Language and script (in entertainment)

Language is something that is primarily spoken. Script is written. Anurag Kashyap says- in this interview- that from Rangeela onwards movies started using spoken language more than language in its pure, polished form. Their (Ram Gopal Varma and Anurag Kashyap) movie Satya has long been a favorite. But I remember I didn’t understand some of the language nuances and a senior friend had to explain those to me. Even now, the expletive packed language of north Indian cinema is not for me. I do use expletives. But while, for example, the raw language fits well in Gangs of Wasseypur movies, I don’t like blatant use of such language everywhere.
Thought subversion is difficult

What kind of thought subversion are we comfortable with? For example, if you are from India/ Asia or were born into a religion that originated in India, would you be able to believe that Karma does not exist, that you are not reborn, that no energy flows thr’ your spine? Well, I could not do it. Christian folks may find it difficult to not believe in Original Sin. Is there any reason why one should think that life is sacred? I don’t bother about past/ future lives but that is when I am thinking/ acting consciously. But the shackles of Karma are difficult to shake off. And as a result I find it easier (and I am almost sure that you’ll find it illogical) to reconcile that while maybe past/ future lives don’t exist, there are Samskara or Saṅkhāra due to karma in this life. But if someone argues with me that all that samskara thing is just escapist nonsense, I would not be able to counter that person well. Yet, such things have a hold on us. Ok, forget religious, spiritual things. Here is some text from Feynman’s book ‘Surely You’re Joking, Mr. Feynman!’: ‘Von Neumann gave me an interesting idea: that you don’t have to be responsible for the world that you’re in. So I have developed a very powerful sense of social irresponsibility as a result of Von Neumann’s advice. It’s made me a very happy man ever since. But it was Von Neumann who put the seed in that grew into my active irresponsibility!’ While I fully understand that it is a healthy attitude, I have not been able to have that even for short periods of time. Why do ideas, thoughts have such a suffocating hold on us? Maybe I am someone who takes themselves/ their ideas more seriously than they should. Maybe it is due to culture, inbreeding, geography, system around us. Actionable part is: how can we escape such holds? Maybe create an environment/ ecosystem around us which promotes different thoughts, and think deeply. What else?
Test cricket and changing tastes

I did not like Test cricket in the past. It’s not easy to find time to watch a match that can go on for 5 days (for men) and still end in a draw. But sometimes you can watch some of it over the weekend, in the background, in the evening thanks to timezone differences, etc. There’s no doubt T20 and One Day matches are here to take cricket forward and Test cricket will eventually be at their mercy. In fact, it already is. There are fewer Tests and some cricket playing countries have more or less given up on it (in spirit, at least). There is even talk of shortening matches to 4 days, etc. But putting aside instant gratification, if you develop a taste for it you start appreciating a drawn match or series, and even a maiden over. I watched more Test cricket in the last 10 years or so than I did before. And in recent years, I have come to the conclusion that if you don’t appreciate Test cricket, you simply don’t understand cricket. Current favorite series- the Ashes, Border-Gavaskar trophy, Anderson-Tendulkar trophy, and maybe WTC. Sadly, there aren’t many countries willing to spend on Test cricket. Teams play but most of the time there’s no heart in it. Teams like Pak, SL have gone downhill due to politics, etc. And capitalism has tilted the balance towards shorter formats. And yet, in the last 5 years, there have been Test matches and series which will be remembered for long.

When the changeover from the likes of Sachin, Dravid, Ganguly was taking place in Indian Cricket, a lot of things were changing- Social media, IPL, Indian cricket had more money, Mumbai was ceding ground to towns like Ranchi. I didn’t like the bearded young fellows of Indian cricket with weird nicknames (Sir, Gabbar), their antics on the ground and in front of camera, and who were comfortable with being- for the want of a better word- themselves. I thought that they were playing to the gallery. And yet today I appreciate Ravindra Jadeja. Frankly, I never thought he would last this long in the Indian side. And while I still think Rishabh Pant should ignore cameras and save risky shots for the IPL, there’s no doubt that he brings a lot to the Indian team. Recent players are more confident, gritty. It’s like Gen Z (or whatever you want to call them) people are more themselves and are breaking the mould like they do in corporate offices. And while each generation is different from the previous one, maybe there are second order effects of social media, and certainly of India’s 1991 reforms as well.
Dentists

Why do lady dentists have the habit of engaging you in, you know, small talk when you are in the dentist’s chair?
End of an era in Tennis

There isn’t always a single definite point in the arrow of time at which we can say a new era has begun and the old one’s gone. There is a transition which takes place where people, objects, practices, customs, and cultures change, and the old makes way for the new. The transition may not be, but generally is, gradual. While the new is getting established, the old may make, and even succeed in, last-ditch attempts to survive and flourish. And yet the old has to go. It is the way of life. There may be some catalysts during this changeover. For example, the second world war accelerated the dismantling of the British empire; the French and Russian revolutions brought corresponding changes in regimes and societies, etc.

One such transition has recently taken place… in men’s tennis. The Big Three (Federer, Nadal, Djokovic) are gone; the Big Four if you include Murray.

Well, Djokovic is still not retired and in that sense he will make some spirited efforts to win the next few championships and may win some. Just like Federer did. But Djokovic’s play being somewhat mechanical, there is no obvious weakness to work on or gap to bridge (unlike Federer, who improved his backhand dramatically and won the 2017 Aus Open). Most likely, this- being somewhat mechanical- made Djokovic a little boring for me. He’s the best returner, if someone broke his serve chances were he’d break back right in the next game, etc. Of course, his mental toughness was extraordinary (for example, 2019 Wimbledon final). So certainly there’s heart in it; it’s not all mechanical. Anyway, coming back to the transition, this tweet should give us some idea. Here’s the image from the tweet:
Doorman Fallacy

Until last month I did not know about the doorman fallacy. It does not seem to have a wikipedia entry; so not sure how widely accepted the fallacy is. I vaguely knew the concept. But did not know that there’s a name for it. Very early in my career, with agile and automation (CI pipeline, CD etc.), an excitable department head decreed that we will re-skill all manual testers to be automation testing engineers, and do away with manual and exploratory testing altogether. The exercise did not end as intended. (You know the drill: they hire external consultants, go too far because they miss the nuances and the fine print in excitement, nobody is fully accountable, and people move on to tell success stories.) While imposition from the top down helps align and speed things up, some of these things take time, and there are second order effects. As my understanding has widened (and sometimes deepened) about things in general I have started understanding how complex various systems are. Many times we act on them with the illusion of knowledge, control, and confidence. And yet there is so much we don’t know. Of course, we can sometimes act with confidence precisely because of our ignorance. And sometimes we have to act. We cannot have the luxury of time, of understanding everything, etc. (as if we ever can). Then there is this related thing called Chesterton’s fence. I got to know about it via youtube comments. There must be many such related concepts which I am not aware of. I suppose premium schools (for example, IITs), management schools, networks and environment help.
Pretty Little Baby

Recently I wasted some time on twitter’s ‘For you’ feed. I follow very few people on twitter and normally my twitter feed is only the people I am following; I don’t visit the ‘For you’ tab at all. I don’t have IG, FB accounts, although I have wasted time on youtube shorts before. Anyway, this twitter ‘For you’ feed doom scrolling is quite extraordinary garbage in economy size. In the past, I thought Indians ruined orkut and quora. Later I understood that it’s not limited to Indians and those sites. Every social site gets ruined. Of course, well curated feed improves even linked-in and twitter. Coming to twitter’s garbage ‘For you’ feed, I know that the algorithm will recommend me more of the kind of content I consume. But if I have not visited the feed in the past, why should I see people with one or two sentence life philosophy, gym fail videos, bro killing it, cuteness overload, linked-in like BS essays and life stories? Of course, there are the kids, daddy’s baby girl, favorite scenes from movies, cats and dogs videos (too many to link), and sometimes light humor which make it somewhat bearable. And this could be the case of frequency illusion but I noticed and came to know that the song Pretty Little Baby is quite popular- I have seen it in videos of kittens and cats, puppies and dogs, other animals, birds, toddlers, etc.
That’d be quite a spectacle

I haven’t visited an eye doctor recently but it looks clear that I may have to wear age related spectacles. During childhood, I envied people with eyeglasses because I genuinely thought glasses made people look intelligent. Yet they were not something I imagined I would someday ‘have to’ wear. I know it’s ok but I feel like screaming ‘My eyes, my eyes’.
Sleep

Sleep- particularly nightly sleep- is more important than you realize. That’s it. That’s the short take.
Women’s ODI Cricket world cup

Well, given the current frenzy around it, the IndWvsAusW ICC WODI WC semifinal match will be forgotten somewhat sooner than you’d expect. But irrespective of who wins the world cup its impact will last forever. Its impact has already happened in a sense. Because, it was not even one particular match as such. And certainly it was not a fluke like some T20 power hitting game. Longer formats test your character better. India Women’s team has been the second best team almost consistently; at least for the last few years in all formats. And Aussies are still the best. But I think the impact of the match is not about the team; it’s about women’s cricket in India. This impact will take a few (5+?) years to be evenly distributed but it would be like the start of IPL or like Aishwarya and Sushmita winning those beauty pageants. Some things will be there- corruption, favoritism, vested interests, power struggles, scarcity mentality, unprofessional attitudes (acquired.fm podcast’s IPL episode said something like ‘these are just the norms’, way to do business in India, etc.). But there will be more money, more mind-space, and there will be progress. From physical fitness to professionalism. There will be unmatched bench strength. More than that people will be more invested in women’s cricket. Twitter became aware of the match as it progressed in favor of India and went crazy after the win. But the match was not trending earlier. With this win, I hope future matches will be better- for the want of an apt word- supported. To be sure the Aussies will continue to dominate cricket. They are a much better team than India in all areas and are thoroughly professional. And they are likely to remain so. But India will be a force to be reckoned with. And inspired performances like this match will build character. For example, I panicked and felt that India had as good as lost the match when Smriti got out. But maybe only for 5 seconds. I continued watching the match. Gone are the days when people switched off the TV as soon as Tendulkar got out. As the match progressed it was evident that the Aussies panicked. Harman-Jemi partnership was the big tilt. Sending in Deepti before Richa was also a great move (if I am not reading too much into it). And I felt certain of India’s win when Ashleigh Gardner bowled a few not-wide-but-out-of-reach balls to Richa in the 45th over. When you put someone like Gardner on the back-foot like that and send the ball beyond the rope in the Ellyse Perry region of the boundary then you have arrived.

A lot of factors have gone into getting there. Indian board being the richest because people are crazy about cricket (and there are a lot of people in India), the adjacent possible, the T20 leagues (IPL, WPL, WBBL, Hundred), Harman may be retiring before the next cup, and she has swallowed the bitter pill many times in the past and would like to win the cup, Deepti’s fighting spirit, India has home advantage so would like to win, the 3 defeats on the road to the semifinal stung and possibly awoke the giant, Smriti overcoming struggles against off-spinners, etc. Jemi’s personal struggles (abuse because of her non-cricket activities like dance reels, being targeted by religious losers, etc.) were of course on a different level. Until about 2 years back she was life of the party on the ground. She seemed outgoing, extroverted, new generation, etc. While the change in her presence on the ground from the past to now was visible, I did not know that she’s fighting those battles still. That reminds me that everyone is fighting a losing battle that you know nothing about; so be kinder. Lastly, if you had asked me about a single differentiating factor during close-to-retirement years of Federer, I would have said that he had better chance of winning if he got the first serve in. In the same vein, if there’s one single factor that worked better than any other in the match it was that India kept cool and never let their run rate drop. Any suggestions for the final, you ask? Keep Renuka, Radha in the team, and as a wild suggestion send Smriti in at number 3.
Weak minds

This short take may sound like a darker side of me. But there it is. Yet let’s be clear that this is not about a person or a country or a religion. It’s more about weakness of mind that is universal. You and I have such weaknesses- maybe differing in specific aspects. Let’s take a case of a popular twitter persona- we’ll call him Kabra. Well, people from Pune cannot escape this persona if they are on twitter. Talks a lot of sense a lot of time. In that sense he is doing more good work than harm. Science, engineering, etc. And yet possibly due to driven by audience or for content engagement, ends up making compromises. People do that. But then this Orwell quote sums it up- ‘He wears a mask, and his face grows to fit it’. At one point he had written a quite messed up thread on a popular verse from the Gita. Totally screwed up meaning. His thread and replies on it told me that he did not understand the meaning- like AT ALL. And the verse is so popular that you’d think that people cannot get the basic meaning wrong. I explained correct meaning of the verse. No acknowledgement. Later he deleted his old related tweets and created a new thread on the same verse and in doing so failed to admit his old interpretation was wrong. Instead, made it sound like the old thread needed rework as the meaning can be applicable in many situations. My point is that his online persona started looking fake to me. While I have innate- well, maybe not innate- habit of correcting people, after a couple of years post college, I started doing it without any sense of superiority, and to the extent possible, privately- not publicly. With genuine desire to help. But problem is that people are not able to separate their identity from their tweets. Just like programmers think they are the code they write. Such people use their misconceptions, echo chambers, and brute force to ignore reason. These people are maybe suitable for the jobs of bad Presidents or Prime Ministers of some country. 
I have been wrong hundreds of times and have admitted to being wrong thousands of times. Ha ha. Now you may say that I am confusing one-off incident with the person- sort of like treating a programmer same as the code they wrote. But due to incentives of user engagement such people are supple. They can sense and adapt the narrative to that which is currently popular. These are the people who always like the top ranked male tennis player. A couple of years post the Gita verse thing, he started some youtube show and in one episode attempted to explain science behind a peculiarly Indian idea… you know one of those ideas like rebirth, karma, avatar, etc. Why? Why do rational sounding intellectuals do that? Why can’t they keep these things separate? Follow the rituals if you want. Religion is not a bad thing. I know a lot of reasons why people do this. Yet this Huxley observation hurts me. Maybe not as much as Naipaul’s ‘A wounded civilization’. Because even though criticized for colonial mindset, Naipaul tries to find reasons why people do that. And the reasons also hurt. It may appear more pronounced in Indian mind but is not limited to it. For example, in the Ross-Phoebe evolution argument Ross should have denied even a teeny tiny possibility that the theory of evolution could be wrong. Until proven otherwise- demonstrably, repeatably.
Is-Ought Fallacy

Recently, while watching the Everything is Everything episode on evolution I came to know the name for the concept- Is-Ought problem or fallacy. Read more about it on the Wikipedia page. One of the things they missed discussing is that there is no directionality to evolution. (Well, Ajay did discuss there being no destination, evolution never ends, etc.) But for example, human brains (some parts) are considerably better developed than in other species but may deteriorate over generations if say we leave everything to AGI. Evolution works on a Good-Enough principle- not on a perfecting principle. (This last statement was from a Robert Full TED talk which has been archived.) Of course, on an evolutionary time-scale there could be local maxima.

Another thing I learned from the episode is that ‘Science expands knowledge but it doesn’t necessarily expand language’.
Content Creation Pressure

They recently stopped making Everything Is Everything(EiE) episodes. I watched a few episodes. I used to read Amit’s blog (maybe 2009-10), and have listened to a few episodes of his The Seen And The Unseen(TSATU) podcast. I learned many things. But I understand why they stopped EiE. Well, Amit’s explanation in the final episode made sense. But Ajay’s reasons resonated. If you have listened to TSATU in the past, a lot of content (particularly from Amit’s side) on EiE was known to you. Of course, the ideas were sometimes put in different contexts, etc. But there was bound to be some repetition. The Sanskrit concept of Nava-rasa (9 emotions(?)) can be packaged in various combinations to create a lot of things. In that sense some ideas are bound to be repeated. Another example- maybe you have read on this blog a few repeated things- importance of naming things, identifying things by their proper names, clean code, SAFe bashing, cognitive biases, etc. When people repeat these things it’s likely because those are close to their hearts- which is true for things repeated on this blog at least. There could be other reasons. One reason, for example, is that those things sell- like dogs feeling sleepy videos or soccer girls rolling their shorts up. But some people are more comfortable with producing what sells than others. And this is where Ajay’s reasons resonated. You have only so much to say; once you’ve said it you can utilize time better by doing some other things, etc. And creating serious content can drain energy. When creating a presentation how many times do you revise the slides? (Is this a case of ‘science does not necessarily advance the language’ mentioned above? Slides? What do you mean slides? If you insist on living in age old days why not call them transparencies instead?) Anyway, bullet pointed talk is still ok. Let’s do one experiment. Say you are going to discuss or talk about some idea for an hour. 
Whatever the idea you want to present or talk about may be (not a lasagna recipe, of course)- just write down what you are going to say. Why write the content down? To give it a fixed form. Then see how many revisions/ corrections you end up doing over a couple of days. Read it after 10-15 days. Next, create such a write-up once every week for, say, two months on different topics- serious enough, and in-depth and broad enough to interest your colleagues- people more or less in your intellectual bandwidth. It’s difficult. This is why- even though I liked EiE- I am not sad that they stopped. The versions we see must have taken a lot of work: coming up with content, editing, etc. I will consume TSATU episodes and whatever Amit comes back with next. But I will also watch out for Ajay’s work, if it’s not too specific to economics and if I can grasp it.
Content targeted towards audience

When you create content, you may have a certain target audience in mind. Wodehouse wrote stories set in upper class old world England with Earls and butlers because he saw that there was a curiosity for such thing among his readers. PuLa wrote primarily for educated, somewhat nostalgic, middle class Marathi brahmins of his era who lived in the cities. During the process some authors educate their readers. Many Marathi writers before PuLa also wrote for similar audience. But they were stuck in that world. For example, humor by authors like C. V. Joshi before PuLa could not rise beyond certain limits set by the audience it was targeted towards. I think when writers like PuLa educate and elevate readers to a higher plane, they work through/ with the readers. Mangesh Padgaonkar mentions that in some plays Shakespeare added vulgarity, sexual innuendos, etc. at the beginning of the play to get people inside the theater on time. Like Mani Ratnam adding dance numbers at the beginning of his movies. People get in and forget the outside and then the drama starts. Another example, Romeo in early acts describes Rosaline using shallow ideas, and even describes Juliet in abstract terms. His love sound bookish. But in the later acts his descriptions, desires acquire more maturity, spiritual references; the love becomes deep. This is to show his own growth and development as a lover but also to get the viewers invested so that the tragedy at the end hits hard. (This- working through/ with the readers- is not always the case. Some authors may elevate the readers in spite of those same readers (they give them cars when people would’ve demanded faster horses).) Amit educated me in the areas of economics, books, ideas, etc. via The Seen And The Unseen podcast. His approach is scholarly in that he cites/ credits quite a few references, books where he came across those ideas, etc. 
On Everything is Everything youtube podcast, maybe because they had to work with the readers/ viewers, there were certain limitations- from topics to depth. Can you imagine what kind of content they will be able to create if they changed the audience from the likes of you and me to say people already on a somewhat higher plane? Whether they’ll want to do that may depend on various factors. But it could be one of the reasons they stopped EiE.
English grammar

I did my schooling in Marathi medium. English was the third language during school. While written English comes easier to me than spoken, it’s not my first language in either form. And it sounds like an excuse (and at some level it no doubt is) but you may see some grammatical errors here: sentence construction, order of words, may-might, a-an-the, I-me, would, who-whom, shall, etc. By the way, do people still use shall or is it limited to grandpas?
Straw man fallacy

A straw man fallacy is when people counter an argument that’s different from the one actually being discussed. They also don’t understand the difference, or if they understand it, they don’t acknowledge it. It is not exactly the same as, but is close to, what Kahneman calls the bias involving substitution, where people answer a cognitively easier question instead of the actual hard question that was asked.
System Dynamics

Systems thinking/ dynamics is an approach where you look at a group of interacting subsystems as a whole, from a particular viewpoint, to understand how they interact with and affect each other and the whole.

In the past, I mentioned “Thinking In Systems”. Weinberg in his An Introduction to General Systems Thinking quotes Tagore poetry: “By plucking her petals you do not gather the beauty of the flower.” (I left both books half read, by the way.) Also, take a look at this video explaining difference between Analysis and Synthesis approaches. If you have an engineering (or related field) background, you are familiar with and have used both these approaches. And yet, statements like Structure generates behaviour immediately hit like a mathematics formula that tells you an absolute truth. Makes sense right away. For example, adding training wheels when a kid is learning to ride a bike may help. Or moving an energy draining person from one team to another seems to improve that person’s as well as both teams’ performance. And it could be one of the reasons why countries want to bring about a change in regime of an enemy country. The linked video makes an interesting point that “our first instinct is to blame people in the system and that is a low leverage response.” This seems to counter Weinberg’s second rule of consulting that says “No matter how it looks at first, it’s always a people problem.” But the video says that actions people take are possible, feasible, and rational because of the universe in which they exist. And dynamics emerge from the interaction of physics, information availability, and decision rules. They make another important point that there are no side effects; they are only effects you haven’t thought about yet. Also, check out this chapter wise summary of Thinking In Systems.
She walked in beauty like a starlit night

I now know that it’s not the exact line from that poem by Lord Byron. I remembered it like ‘starlit night’. Is this called mondegreen? I must have read the line somewhere- possibly in some Marathi book- and memorized it incorrectly. But it’s likely that the author quoted it like that. Also, I cannot be sure, but I don’t think I have read/ recalled/ used that phrase in the last 10-15 years. I can’t say what triggered it but when it came to mind today, it was sudden and came so forcefully that I had to stop and google it. This is when I found the poem. I know I am going to come back to this poem many times for the lines ‘Where thoughts serenely sweet express, How pure, how dear their dwelling-place’. This is also a… At this point- keeping the previous sentence half finished- I had to go and check if the line- ‘starlit night’- was there where I thought it would be. It was in the book I thought it’d be in and in the article I thought it’d be in and it was like I remembered it- ‘starlit night’, and it’s made me happy. Let’s complete the earlier half finished sentence some other time.

GFS Paper

2025-03-24T00:00:00+05:30

Here is my understanding of the Google File System from reading the GFS paper. Some other references: MIT’s lecture video. I will use mostly text here- some of it is directly copied from the paper. So if you need a video walkthrough of the paper with diagrams, etc., here is a good youtube video.

This paper was published in 2003, and while the concepts used were not new to academia, GFS put those to use in production. So in that sense it was revolutionary. Why did I read it now? Well, DeepSeek’s 3FS was recently announced/ open sourced and I wanted to read it but not much material seems available yet (Some reference is there as a part of an old paper… maybe I will read that next). But maybe that was a trigger.

The ideas relevant in this description include: distributed applications, serialization, heartbeat pings, consistency, hierarchical file system, write-ahead-log, etc.

GFS was designed with specific goals/ usage patterns for Google’s applications. For example, random access is not a stated goal, files are large, data is mostly appended rather than overwritten, etc. The paper mentions documents- something like web-crawler result files. But maybe while reading the paper, it’d help to think of large files- like GB sized youtube videos- being chunked apart and pieced back together.

Client applications access GFS using a library. This library provides an API and some utilities. When the word ‘client’ is used in the rest of this post, think of it as this client library.

A GFS cluster consists of a single master and multiple chunk-servers and is accessed by multiple clients. They use commodity hardware.

GFS stores large files by splitting them into fixed-size chunks of 64MB each. Each chunk is stored as a separate Linux file on a local disk. The machines storing these chunks are called chunk servers. Chunks of a file may be stored on different chunk servers. Chunks are replicated and replicas are stored on different servers. (The paper talks about racks, etc. but not much about data centers. Maybe they were using a single data center then.)

The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. The workloads also have many large, sequential writes that append data to files.

In applications using GFS, files are often used as producer-consumer queues or for many-way merging. Hundreds of producers, running one per machine, concurrently append to a file. Atomicity with minimal synchronization overhead is essential.

High sustained bandwidth is more important than low latency.

Each chunk is identified by an immutable and globally unique 64 bit chunk handle assigned by the master at the time of chunk creation. Chunkservers read or write chunk data specified by a chunk handle and byte range. Large chunk size reduces metadata stored on the master and also reduces client-master interaction for this metadata. Clients can also keep a TCP connection to a chunk server alive for a longer duration, as they need to talk to fewer chunk servers due to bigger chunk sizes.

The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master periodically communicates with each chunkserver in HeartBeat messages to give it instructions and collect its state.

Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. The client then caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations.

First, using the fixed chunk size, the client translates the file name and byte offset specified by the application into a chunk index within the file. Then, it sends the master a request containing the file name and chunk index. The master replies with the corresponding chunk handle and locations of the replicas. The client caches this information using the file name and chunk index as the key.
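
The offset-to-chunk translation above is plain integer arithmetic. Here is a minimal sketch of it; the function names `locate` and `split_range` are mine (hypothetical), not from the paper or from any GFS client library. The second function shows how a byte range that straddles a chunk boundary could be cut into per-chunk pieces.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the fixed chunk size from the paper


def locate(byte_offset: int) -> tuple[int, int]:
    """Translate a file byte offset into (chunk index, offset inside that chunk)."""
    return divmod(byte_offset, CHUNK_SIZE)


def split_range(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a file byte range into per-chunk pieces: (chunk index, start, size)."""
    pieces = []
    end = offset + length
    while offset < end:
        index, start = locate(offset)
        # A piece cannot run past the end of its chunk.
        size = min(CHUNK_SIZE - start, end - offset)
        pieces.append((index, start, size))
        offset += size
    return pieces


print(locate(200 * 1024 * 1024))         # (3, 8388608): 200 MB lies 8 MB into chunk 3
print(split_range(CHUNK_SIZE - 10, 30))  # 10 bytes in chunk 0, then 20 bytes in chunk 1
```

The chunk index (together with the file name) is what would go to the master; the offset inside the chunk is what would go to the chunkserver.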

The client then sends a request to one of the replicas, most likely the closest one. The request specifies the chunk handle and a byte range within that chunk. Further reads of the same chunk require no more client-master interaction until the cached information expires or the file is reopened. In fact, the client typically asks for multiple chunks in the same request and the master can also include the information for chunks immediately following those requested. This extra information sidesteps several future client-master interactions at practically no extra cost.

The Master and metadata

The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas. All metadata is kept in the master’s memory. The first two types (namespaces and file-to-chunk mapping) are also kept persistent by logging mutations to a write-ahead-log stored on the master’s local disk and replicated on remote machines. This helps when the master fails and in-memory data is lost. The write-ahead-log (the paper calls it the operation log) also serves as a logical timeline: the files and chunks, as well as their versions, are all uniquely and eternally identified by the logical times at which they were created. Because the write-ahead-log is the critical persistent record, it is replicated on multiple remote machines, and the master responds to a client operation only after flushing the corresponding log record to disk both locally and remotely (records are flushed in batches, not one at a time). To minimize startup time, they keep the log small. The master checkpoints its state whenever the log grows beyond a certain size so that it can recover by loading the latest checkpoint from local disk and replaying only the limited number of log records after that. The checkpoint is in a compact B-tree like form that can be directly mapped into memory and used for namespace lookup without extra parsing. The master switches to a new log file and creates the new checkpoint in a separate thread. The new checkpoint includes all mutations before the switch. It can be created in a minute or so for a cluster with a few million files. When completed, it is written to disk both locally and remotely. Recovery needs only the latest complete checkpoint and subsequent log files. Older checkpoints and log files can be freely deleted, though they keep a few around to guard against catastrophes. A failure during checkpointing does not affect correctness because the recovery code detects and skips incomplete checkpoints.
This persistence mechanism reminded me of SSTables and memtable.
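
To make the log-plus-checkpoint idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the class name `ToyMaster`, the JSON record format, and the in-memory list and string that stand in for the on-disk log and checkpoint. The real master also replicates the log remotely and builds checkpoints in a separate thread, which this sketch does not attempt.

```python
import json


class ToyMaster:
    """Toy master: in-memory namespace plus an operation log and a checkpoint."""

    def __init__(self, checkpoint_every=3):
        self.files = {}                     # file name -> list of chunk handles
        self.log = []                       # stands in for the on-disk log
        self.checkpoint = json.dumps({})    # stands in for the checkpoint file
        self.checkpoint_every = checkpoint_every

    def _apply(self, record):
        if record["op"] == "create":
            self.files[record["name"]] = []
        elif record["op"] == "add_chunk":
            self.files[record["name"]].append(record["handle"])

    def mutate(self, record):
        # Write-ahead: log the mutation first, then change in-memory state.
        self.log.append(json.dumps(record))
        self._apply(record)
        if len(self.log) >= self.checkpoint_every:
            # Checkpoint the full state and switch to a fresh, short log.
            self.checkpoint = json.dumps(self.files)
            self.log = []

    @classmethod
    def recover(cls, checkpoint, log):
        """Rebuild state from the latest checkpoint plus the log records after it."""
        master = cls()
        master.files = json.loads(checkpoint)
        master.checkpoint = checkpoint
        master.log = list(log)
        for line in log:
            master._apply(json.loads(line))
        return master


m = ToyMaster(checkpoint_every=3)
m.mutate({"op": "create", "name": "/a"})
m.mutate({"op": "add_chunk", "name": "/a", "handle": 1})
m.mutate({"op": "add_chunk", "name": "/a", "handle": 2})  # triggers a checkpoint
m.mutate({"op": "create", "name": "/b"})                  # only this stays in the log
restored = ToyMaster.recover(m.checkpoint, m.log)
print(restored.files == m.files)  # True
```

The point of the checkpoint is visible in the last lines: recovery replays one log record, not four.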

The master does not store chunk location information persistently. Instead, it asks each chunkserver about its chunks at master startup and whenever a chunkserver joins the cluster.

GFS Consistency Model

I think this section would be easier to understand if you watch MIT’s video linked above and read the Leases and Mutation Order section below.

A mutation is an operation that changes the contents or metadata of a chunk such as a write or an append operation. When a mutation succeeds without interference from concurrent writers, the affected region is defined (and by implication consistent): all clients will always see what the mutation has written. Concurrent successful mutations leave the region undefined but consistent: all clients see the same data, but it may not reflect what any one mutation has written. Typically, it consists of mingled fragments from multiple mutations. A failed mutation makes the region inconsistent (hence also undefined): different clients may see different data at different times.

Data mutations may be writes or record appends. A write (in infrequent cases, it could be an overwrite as well) causes data to be written at an application-specified file offset. A “regular” append is merely a write at an offset that the client believes to be the current end of file. A record append, by contrast, causes data (the “record”) to be appended atomically at least once even in the presence of concurrent mutations, but at an offset of GFS’s choosing. The offset is returned to the client and marks the beginning of a defined region that contains the record. In addition, GFS may insert padding or record duplicates in between. They occupy regions considered to be inconsistent and are typically dwarfed by the amount of user data.

After a sequence of successful mutations, the mutated file region is guaranteed to be defined and contain the data written by the last mutation. GFS achieves this by (a) applying mutations to a chunk in the same order on all its replicas, and (b) using chunk version numbers to detect any replica that has become stale because it has missed mutations while its chunkserver was down. Afterwards, these stale replicas are never involved in another mutation or given to clients asking the master for chunk locations. They are garbage collected at the earliest opportunity.

Readers deal with the occasional padding and duplicates as follows. Each record prepared by the writer contains extra information like checksums so that its validity can be verified. A reader can identify and discard extra padding and record fragments using the checksums. If it cannot tolerate the occasional duplicates (e.g. if they would trigger non-idempotent operations), it can filter them out using unique identifiers in the records.
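
Here is one way a writer and reader could cooperate on this, as a sketch only. The paper does not spell out a record format, so the layout below (a magic marker, payload length, CRC32, and a 64-bit record id) is entirely my assumption, as are the names `encode` and `read_records`. The byte-by-byte resynchronisation is the simplest thing that works, not an efficient one.

```python
import struct
import zlib

# Hypothetical header layout: magic, payload length, CRC32 of payload, record id.
HEADER = struct.Struct(">4sIIQ")
MAGIC = b"REC1"  # made-up marker so that zero padding never looks like a record


def encode(record_id: int, payload: bytes) -> bytes:
    """Frame a payload so a reader can later validate and de-duplicate it."""
    return HEADER.pack(MAGIC, len(payload), zlib.crc32(payload), record_id) + payload


def read_records(data: bytes):
    """Yield (record id, payload) once per id, skipping padding and fragments."""
    seen = set()
    pos = 0
    while pos + HEADER.size <= len(data):
        magic, length, crc, record_id = HEADER.unpack_from(data, pos)
        body = data[pos + HEADER.size : pos + HEADER.size + length]
        valid = magic == MAGIC and len(body) == length and zlib.crc32(body) == crc
        if not valid:
            pos += 1  # padding or a torn record: resynchronise byte by byte
            continue
        pos += HEADER.size + length
        if record_id not in seen:  # drop duplicates left behind by retried appends
            seen.add(record_id)
            yield record_id, body


a = encode(1, b"alpha")
b = encode(2, b"beta")
# Record 1, some padding, a torn copy of record 1, a duplicate of record 1, record 2.
region = a + b"\x00" * 7 + a[:-2] + a + b
print(list(read_records(region)))
```

The checksum takes care of padding and fragments; the record id takes care of whole duplicates.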

Leases and Mutation Order

Each mutation is performed at all the chunk’s replicas. GFS uses leases to maintain a consistent mutation order across replicas. The master grants a chunk lease to one of the replicas, which is called the primary. The primary picks a serial order for all mutations to the chunk. All replicas follow this order when applying mutations. Thus, the global mutation order is defined first by the lease grant order chosen by the master, and within a lease by the serial numbers assigned by the primary.

1. The client asks the master which chunkserver holds the current lease for the chunk and the locations of the other replicas. If no one has a lease, the master grants one to a replica it chooses (not shown in the above diagram).
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client caches this data for future mutations. It needs to contact the master again only when the primary becomes unreachable or replies that it no longer holds a lease.
3. The client pushes the data to all the replicas. A client can do so in any order. Each chunkserver will store the data in an internal LRU buffer cache until the data is used or aged out. By decoupling the data flow from the control flow, we can improve performance by scheduling the expensive data flow based on the network topology regardless of which chunkserver is the primary.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The request identifies the data pushed earlier to all of the replicas. The primary assigns consecutive serial numbers to all the mutations it receives, possibly from multiple clients, which provides the necessary serialization. It applies the mutation to its own local state in serial number order.
5. The primary forwards the write request to all secondary replicas. Each secondary replica applies mutations in the same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they have completed the operation.
7. The primary replies to the client. Any errors encountered at any of the replicas are reported to the client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the secondary replicas. (If it had failed at the primary, it would not have been assigned a serial number and forwarded.) The client request is considered to have failed, and the modified region is left in an inconsistent state. The client library code handles such errors by retrying the failed mutation. It will make a few attempts at steps (3) through (7) before falling back to a retry from the beginning of the write.
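
The separation of data flow (push the bytes everywhere) from control flow (the primary decides the order) can be sketched in a few lines. This is a toy model under my own assumptions: the class names `Replica` and `Primary`, the `data_id` strings, and the plain dict used as the staging buffer are all hypothetical, and errors, leases, and the network are not modelled at all.

```python
class Replica:
    """Toy chunk replica: stages pushed data, then applies it when told to."""

    def __init__(self):
        self.staged = {}          # data pushed by clients but not yet applied
        self.chunk = bytearray()  # chunk contents
        self.applied = []         # serial numbers applied, in order

    def push(self, data_id, data):
        # Data flow only: nothing is written to the chunk yet.
        self.staged[data_id] = data

    def apply(self, serial, data_id, offset):
        data = self.staged.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            self.chunk.extend(b"\x00" * (end - len(self.chunk)))
        self.chunk[offset:end] = data
        self.applied.append(serial)


class Primary(Replica):
    """The lease holder: it alone assigns serial numbers to mutations."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id, offset):
        serial = self.next_serial          # pick the serial order
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        for replica in self.secondaries:   # forward in the same serial order
            replica.apply(serial, data_id, offset)
        return serial


secondaries = [Replica(), Replica()]
primary = Primary(secondaries)
for replica in [primary, *secondaries]:    # the client pushes data to all replicas
    replica.push("d1", b"hello")
    replica.push("d2", b"HE")
primary.write("d1", offset=0)
primary.write("d2", offset=0)
print(bytes(primary.chunk))  # b'HEllo', and the same on both secondaries
```

Because every replica applies serial 0 before serial 1, they all end up with the same bytes even though two writes touched the same region.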

If a write by the application is large or straddles a chunk boundary, GFS client code breaks it down into multiple write operations.

While control flows from the client to the primary and then to all secondaries, data is pushed linearly along a carefully picked chain of chunkservers in a pipelined fashion.

Atomic Record Appends

(Again, the MIT video may help.) GFS provides an atomic append operation called record append. In a traditional write, the client specifies the offset at which data is to be written. Concurrent writes to the same region are not serializable: the region may end up containing data fragments from multiple clients. In a record append, however, the client specifies only the data. GFS appends it to the file at least once atomically (i.e., as one continuous sequence of bytes) at an offset of GFS’s choosing and returns that offset to the client.

Record append is a kind of mutation and follows the control flow in the diagram above with only a little extra logic at the primary. The client pushes the data to all replicas of the last chunk of the file. Then, it sends its request to the primary. The primary checks to see if appending the record to the current chunk would cause the chunk to exceed the maximum size (64 MB). If so, it pads the chunk to the maximum size, tells secondaries to do the same, and replies to the client indicating that the operation should be retried on the next chunk. (Record append is restricted to be at most one-fourth of the maximum chunk size to keep worst-case fragmentation at an acceptable level.) If the record fits within the maximum size, which is the common case, the primary appends the data to its replica, tells the secondaries to write the data at the exact offset where it has, and finally replies success to the client.
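
The extra logic at the primary is small enough to sketch. The names `Chunk` and `record_append` are hypothetical, and the sketch tracks only how many bytes of the chunk are used, not the data itself or the secondaries.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # maximum chunk size from the paper
MAX_RECORD = CHUNK_SIZE // 4    # record append is limited to a quarter of a chunk


class Chunk:
    """Tracks only the number of bytes used; the data itself is not stored."""

    def __init__(self):
        self.used = 0


def record_append(chunk: Chunk, record_len: int):
    """Return the offset chosen for the record, or None if the client
    should retry on the next chunk (the current one gets padded)."""
    if record_len > MAX_RECORD:
        raise ValueError("record larger than one-fourth of the chunk size")
    if chunk.used + record_len > CHUNK_SIZE:
        chunk.used = CHUNK_SIZE  # pad the chunk to its maximum size
        return None
    offset = chunk.used          # the primary, not the client, picks the offset
    chunk.used += record_len
    return offset


chunk = Chunk()
print(record_append(chunk, 10))  # 0
print(record_append(chunk, 20))  # 10
```

With records capped at a quarter of the chunk, a padded chunk wastes at most a quarter of its space, which is the worst-case fragmentation the paper refers to.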

If a record append fails at any replica, the client retries the operation. As a result, replicas of the same chunk may contain different data possibly including duplicates of the same record in whole or in part. GFS does not guarantee that all replicas are bytewise identical. It only guarantees that the data is written at least once as an atomic unit. This property follows readily from the simple observation that for the operation to report success, the data must have been written at the same offset on all replicas of some chunk. Furthermore, after this, all replicas are at least as long as the end of record and therefore any future record will be assigned a higher offset or a different chunk even if a different replica later becomes the primary.

Snapshot

When the master receives a snapshot request, it first revokes any outstanding leases on the chunks in the files it is about to snapshot. This ensures that any subsequent writes to these chunks will require an interaction with the master to find the lease holder. This will give the master an opportunity to create a new copy of the chunk first. After the leases have been revoked or have expired, the master logs the operation to disk. It then applies this log record to its in-memory state by duplicating the metadata for the source file or directory tree. The newly created snapshot files point to the same chunks as the source files. The first time a client wants to write to a chunk C after the snapshot operation, it sends a request to the master to find the current lease holder. The master notices that the reference count for chunk C is greater than one. It defers replying to the client request and instead picks a new chunk handle C’. It then asks each chunkserver that has a current replica of C to create a new chunk called C’. By creating the new chunk on the same chunkservers as the original, the data can be copied locally, not over the network.
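
The copy-on-write bookkeeping can be sketched with a reference count per chunk handle. The class `SnapshotMaster` and its method names are made up for illustration; lease revocation, logging, and the actual copying of chunk data on the chunkservers are left out.

```python
import itertools


class SnapshotMaster:
    """Toy metadata for snapshots: files share chunk handles until written."""

    def __init__(self):
        self.files = {}                      # file name -> list of chunk handles
        self.refcount = {}                   # chunk handle -> number of files using it
        self._handles = itertools.count(1)   # source of fresh chunk handles

    def create(self, name, chunks):
        handles = [next(self._handles) for _ in range(chunks)]
        self.files[name] = handles
        for handle in handles:
            self.refcount[handle] = 1

    def snapshot(self, source, target):
        # Duplicate metadata only: the snapshot points at the same chunks.
        self.files[target] = list(self.files[source])
        for handle in self.files[target]:
            self.refcount[handle] += 1

    def handle_for_write(self, name, index):
        """Return the handle a client may write to, copying the chunk if shared."""
        handle = self.files[name][index]
        if self.refcount[handle] > 1:
            new_handle = next(self._handles)  # C' in the description above
            self.refcount[handle] -= 1
            self.refcount[new_handle] = 1
            self.files[name][index] = new_handle
            handle = new_handle
        return handle


master = SnapshotMaster()
master.create("/a", chunks=2)
master.snapshot("/a", "/snap")
print(master.handle_for_write("/a", 0))     # 3: a fresh handle, chunk 1 stays with /snap
print(master.files["/a"], master.files["/snap"])
```

Only the chunk that is actually written gets copied; the second chunk is still shared by both files.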

Master Operation

The master executes all namespace operations. In addition, it manages chunk replicas throughout the system: it makes placement decisions, creates new chunks and hence replicas, and coordinates various system-wide activities to keep chunks fully replicated, to balance load across all the chunkservers, and to reclaim unused storage.

GFS logically represents its namespace as a lookup table mapping full pathnames to metadata. With prefix compression, this table can be efficiently represented in memory. Each node in the namespace tree (either an absolute file name or an absolute directory name) has an associated read-write lock. One nice property of this locking scheme is that it allows concurrent mutations in the same directory. For example, multiple file creations can be executed concurrently in the same directory: each acquires a read lock on the directory name and a write lock on the file name. The read lock on the directory name suffices to prevent the directory from being deleted, renamed, or snapshotted. The write locks on file names serialize attempts to create a file with the same name twice.
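
A small sketch of which locks an operation would take. In the paper, an operation on a path takes read locks on every ancestor directory name and a read or write lock on the full path; the helper names `locks_for` and `conflicts` are mine, and no real locking is done here, only the bookkeeping.

```python
def locks_for(path: str, write: bool = True) -> list[tuple[str, str]]:
    """List the (name, mode) locks for an operation on an absolute path:
    read locks on all ancestors, then a read or write lock on the path itself."""
    parts = path.strip("/").split("/")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    locks = [(prefix, "read") for prefix in prefixes[:-1]]
    locks.append((prefixes[-1], "write" if write else "read"))
    return locks


def conflicts(a, b) -> bool:
    """Two lock sets conflict if they share a name and either side wants to write."""
    held = dict(a)
    return any(name in held and "write" in (mode, held[name]) for name, mode in b)


create_a = locks_for("/home/user/a")
create_b = locks_for("/home/user/b")
snapshot_dir = locks_for("/home/user")
print(create_a)
print(conflicts(create_a, create_b))      # False: both only read-lock /home/user
print(conflicts(create_a, snapshot_dir))  # True: snapshot write-locks /home/user
```

This is why two file creations in the same directory can proceed concurrently, while a snapshot or delete of that directory has to wait.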

Replicas: Creation, Re-replication, Rebalancing

Chunk replicas are created for three reasons: chunk creation, re-replication, and rebalancing. When the master creates a chunk, it chooses where to place the initially empty replicas. It considers several factors. (1) Place new replicas on chunkservers with below-average disk space utilization. Over time this will equalize disk utilization across chunkservers. (2) Limit the number of “recent” creations on each chunkserver. Although creation itself is cheap, it reliably predicts imminent heavy write traffic because chunks are created when demanded by writes, and in an append-once-read-many workload they typically become practically read-only once they have been completely written. (3) Spread replicas of a chunk across racks.
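The three placement factors can be combined in many ways; the paper does not give a formula. The sketch below is one possible reading, with a made-up `Chunkserver` record, a made-up `max_recent` threshold and a simple "emptiest disk first, one replica per rack" rule. Treat all of it as an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunkserver:
    name: str
    rack: str
    disk_used: float        # fraction of disk in use, 0.0 to 1.0
    recent_creations: int   # chunks created here recently


def place_replicas(servers, copies=3, max_recent=2):
    """Pick servers for a new chunk: skip servers with many recent creations,
    prefer below-average disk usage, and use a new rack for each replica."""
    average = sum(s.disk_used for s in servers) / len(servers)
    eligible = [s for s in servers if s.recent_creations < max_recent]
    # Below-average servers sort first, then emptier disks before fuller ones.
    eligible.sort(key=lambda s: (s.disk_used >= average, s.disk_used))
    chosen, racks = [], set()
    for s in eligible:
        if len(chosen) < copies and s.rack not in racks:
            chosen.append(s)
            racks.add(s.rack)
    # Fall back to reusing racks if there are fewer racks than copies.
    for s in eligible:
        if len(chosen) < copies and s not in chosen:
            chosen.append(s)
    return [s.name for s in chosen]


servers = [
    Chunkserver("cs1", "rackA", 0.30, 0),
    Chunkserver("cs2", "rackA", 0.20, 0),
    Chunkserver("cs3", "rackB", 0.50, 0),
    Chunkserver("cs4", "rackB", 0.10, 5),   # busy with recent creations
    Chunkserver("cs5", "rackC", 0.90, 0),
]
picked = place_replicas(servers)
```

With this data `cs4` is skipped despite having the emptiest disk, and `cs1` is skipped because `cs2` already covers its rack.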

The master re-replicates a chunk as soon as the number of available replicas falls below a user-specified goal.

The master re-balances replicas periodically: it examines the current replica distribution and moves replicas for better disk space and load balancing.

Master Replication

If the master’s machine or disk fails, monitoring infrastructure outside GFS starts a new master process elsewhere with the replicated write-ahead log. Clients use only the canonical name of the master (e.g. gfs-test), which is a DNS alias that can be changed if the master is relocated to another machine. Moreover, “shadow” masters provide read-only access to the file system even when the primary master is down. They are shadows, not mirrors, in that they may lag the primary slightly, typically fractions of a second.

Data Integrity

Each chunkserver uses checksums to detect corruption of stored data.

A 64MB chunk is broken up into 64KB blocks. Each block has a corresponding 32-bit checksum. Like other metadata, checksums are kept in the chunkserver’s memory and also stored persistently with logging, separate from user data.

For reads, the chunkserver verifies the checksum of data blocks that overlap the read range before returning any data to the requester, whether a client or another chunkserver. Therefore chunkservers will not propagate corruptions to other machines. If a block does not match the recorded checksum, the chunkserver returns an error to the requester and reports the mismatch to the master. In response, the requester will read from other replicas, while the master will clone the chunk from another replica. After a valid new replica is in place, the master instructs the chunkserver that reported the mismatch to delete its replica.
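A minimal sketch of the read-side check, assuming CRC32 from Python's `zlib` as the 32-bit checksum (these notes do not say which checksum function GFS actually uses). The function names and the small 256KB test buffer are made up for illustration.

```python
import zlib

BLOCK_SIZE = 64 * 1024          # 64KB checksum block
CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunk, so 1024 blocks per chunk


def block_checksums(data):
    """One 32-bit checksum per 64KB block of the data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]


def verified_read(data, checksums, offset, length):
    """Verify every block overlapping [offset, offset + length), then
    return the requested bytes. Assumes length > 0."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    for b in range(first, last + 1):
        block = data[b * BLOCK_SIZE:(b + 1) * BLOCK_SIZE]
        if zlib.crc32(block) != checksums[b]:
            raise IOError(f"checksum mismatch in block {b}")
    return data[offset:offset + length]


data = bytes(range(256)) * 1024          # 256KB, i.e. 4 blocks
sums = block_checksums(data)
bad = bytearray(data)
bad[70000] ^= 0xFF                       # flip one byte inside block 1
```

Reading from the corrupted copy fails only when the read range touches block 1; a read from block 0 still succeeds. At 4 bytes per checksum, a full 64MB chunk needs 1024 × 4 bytes = 4KB of checksum metadata, which is why keeping it in memory is cheap.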

GFS client code further reduces the checksumming overhead on reads by trying to align reads at checksum block boundaries. Moreover, checksum lookups and comparison on the chunkserver are done without any I/O, and checksum calculation can often be overlapped with I/Os. (The optimization mentioned in the last part- overlapped with I/O- makes me sad for not getting a CS degree. How many such fine things must be going wasted on me just because I am not aware enough!)

Some passing observations

When you are learning about distributed systems, this is one of the first papers/ systems you learn about. Not many advanced concepts are used (transactions, multi-master, Paxos, for example), and some trade-offs are made (like on consistency). And yet a great system was designed (the paper shares some metrics of actual GFS clusters being used). In that sense it was a very simple paper. Probably, of all the tech papers I have read so far, I finished reading this the fastest. My basic familiarity with some of the concepts (thanks to having read some other papers, the Designing Data Intensive Applications book, etc.) must have helped.

In all distributed systems, failures are far too common. You probably even remember that Leslie Lamport quote or maybe Joe Armstrong’s view of the world. But of the papers which I have read/ documented on this site, only Google’s BigTable paper talked about error scenarios/ worst cases enough. Of course, all these systems do handle failures, etc. But the papers did not talk a lot about those. Ever since reading Michael Nygard’s book ‘Release It’, that is the first thing I think about when looking at a system- how does it handle failure, is it a recovery-oriented design, etc.

Lights Out: A book about fall of General Electric

2025-01-31T00:00:00+05:30

I recently read the book Lights Out, subtitled Pride, Delusion, And The Fall Of General Electric. It talks about troubles at GE. It was published in 2021 and GE stock has improved a lot since then. BillG recommended the book in the past.

Around the 1999-2000-01 turn of the century (or if you prefer, millennium/ Newmannium) time-frame, magazines made a few lists like best discovery/ invention of human history*, greatest human being of last century (Gandhi, Einstein), best company of last century (GE, Coca Cola), etc. If I remember correctly, GE was voted the best company and Jack Welch the CEO of the century. Soon he planned to retire and the next CEO was chosen. I first came to know about GE around that time.

While I will talk about the book it will be mixed up with my own opinions, perceptions, etc. Some of it will not even be related to the book or GE. And some of this post may come across as unstructured, half baked thoughts. Or a quote after quote from the book. Please bear with that.

While I may use past tense and maybe GE is reduced to a fraction of its former self, it is not dead. And maybe the worst is over.

If interested, as a mental exercise, while reading this post, picture this in your mind: some 10-20 years down the line from today, Apple has gone out of business or your nation has become a failed state, and you are reading the post mortem. What will differ and what will be similar to the GE story?

GE was a conglomerate for most of its existence. GE was a brand. The company had its roots in personalities like Edison (mostly for the sake of name) and Morgan (of JPMorgan). It made great products from appliances like alarm clocks, fans to jet engines and turbines. It was in healthcare, insurance, medical systems (C arm was produced in India if I remember correctly). It owned NBC, and produced Friends, and Seinfeld. It spanned across generations and geographies. If I am not wrong, at one point it was the largest software exporter from India.

While GE was best at engineering, in the corporate world it was popular for management, numbers, dividends, six-sigma, etc. Jack Welch was a big factor in this. The GE brand, logo, the GE way, management/ executive programs, culture had acquired an aura of mythical sort and proportion.

The book talks about GE’s fall from grace. In short, it did not face the innovator’s dilemma of the Sony Walkman vs iPod type (though maybe renewable energy challenged its oil and power business). Its business did not get cannibalized. In plain words, the businesses were mismanaged. The practices which GE followed (particularly accounting practices), while not outright fraudulent, were ‘success theatre’. They manipulated numbers so that the quarterly results looked good (and had a culture and practices which allowed that).

The book does not place a lot of direct blame on Welch but, of course, he’s responsible for some of the mess. It just happened to surface after he stepped down.

‘The GE of the late twentieth century and the fitful first two decades of the twenty-first has stood for more than what it made. It has represented a capitalistic meritocracy, a locus not just of success but of a certain version of virtue— the virtue of targets made, goals surpassed, earnings earned, markets won. And it has stood for a vague but well-marketed notion that, in the unapologetic pursuit of a company’s fortunes, and one’s own, there is a certain uprightness— and a lesson for others. But GE has stood for a well-bred hubris as well.’

‘GE wasn’t the company that most people imagined.’

‘GE’s size advantage was also one of its biggest weaknesses.’

The CEO was also the chairman of the board. So no-one really questioned/ opposed him. ‘The absence of robust opposition also pointed to the broader problem, long cultivated and growing into a quiet crisis within the company, of real candor and self-awareness.’

‘The board was openly dysfunctional, problems at the company were deep and troubling…’

It’s a fact that nothing succeeds like success. If you show great profits, dividends, etc. year after year and decade after decade you will acquire an aura. You derive credibility for yourself, for your strategies, for management style from the brand. And even if the good results are thanks to some loopholes and the numbers are gamed, you start believing the myth. You create ivory towers, echo chambers, and yes men. The book mentions how Jack Welch and Jeff Immelt did not want to hear any bad news. Immelt used to say ‘where’s the guy I used to know’ (who had good news for me). This ‘can you do it or shall I find someone who can’ approach can work both positively and negatively. You know, the ‘speaking truth to power’ thing.

‘There was no market for hard truths or bad news. Not as far as the guy at the top was concerned.’

Immelt

Immelt was chosen and groomed by Jack. Four days after Immelt became CEO, the 9/11 attacks happened. Later the sub-prime crisis happened. And GE Capital suffered. Immelt tried to separate the troubled GE Capital from other businesses. And focused on making the conglomerate more of an industrial business, more GE Power (turbines, oil & gas) centric. He did away with NBC, appliances, etc. Why did GE move out of the appliances business? ‘In a culture in which people were accustomed to tossing more treasured devices like phones and computers after just a few years, no one expected to get a full generation of use out of a dryer anymore.’

People felt Immelt (even before he became the CEO) paid more to make deals happen.

The Immelt era did not seem to generate enough cash. The numbers were manipulated (a practice from the pre-Immelt era). But he went on buying shares back to save on dividends and yet promised a dividend target which he or, later, Flannery could not meet.

‘Jeff Immelt will not be remembered for wisely deciding how to spend GE’s cash.’

‘Under Immelt, the company believed that the will to hit a target could supersede the math…’ (An aside: I am never comfortable with American way of saying math instead of maths.)

GE Capital

GE Capital, ‘in the months leading up to the crash (sub-prime), had been effectively the seventh-largest bank in the country, but its operations had been almost impenetrable. It was a “black box,” everyone had said with a shrug— until the black box had almost dragged the company down for good.’

‘The rigorous protocols of the banking industry also bore little resemblance to the customs that prevailed at GE Capital. Like the parent company, GE Capital’s system for evaluating and monitoring deals was rooted in personal accountability more than enterprise risk.’

‘The world in which Jeff Immelt had thought he would be leading GE had been turned upside down. The recession and the uncertainty that followed the terrorist attacks had dampened the global growth on which GE’s industrial businesses depended. And changes to accounting rules in the wake of the Enron scandal… had eliminated an easy and reliable source of paper profits to smooth over rough periods.’

GE Capital had to be bailed out post sub-prime crisis. And had to agree to some oversight from federal agencies/ bodies.

‘The investigative team soon found a slew of questionable practices that showed the scope of GE’s ability to stretch, manipulate, and ignore accounting practices that are required of all public companies for the protection of investors.’

And ‘in addition, the broader market had lost its former patience for letting big industrial companies mess around in the risky business of finance.’

‘…gut-driven management style could be rationalized— or explained at the very least— to the industrial divisions. But many of Immelt’s grand visions fell flat at Capital, and that reception may have contributed to his distaste for that business. The marketing-based commercial strategy Immelt was imposing on GE’s industrial units didn’t really translate to financial services. And his optimism-driven management style, all pep talks and insistence that sheer determination to prevail could surmount any obstacle, didn’t go over well with a division that measured success and failure in fractions of a percentage point and whose analyses almost always turned on quantitative factors.’

Predix

GE’s software foray (Predix); Immelt era. ‘Even GE and Predix had to obey laws of physics that weren’t susceptible to marketing.’

‘… the company was unlikely to turn a profit from customized code, since it usually couldn’t be sold again to someone else. Software businesses usually turn a profit by selling uniform programs that can be sold many times over once they are built, not by designing unique programs, customer by customer. But even making those bespoke sales required something else: a viable product that GE’s customers— oil drillers, airlines, power plant operators, freight railroads, hospital chains— actually wanted. GE didn’t just pour money into Predix— it smothered the project with cash. But without a coherent strategy and well-thought-out processes, the product development path was a wasteful one. GE’s plan to move fast, produce a viable product, and then perfect it in the field got bogged down partly because of the size of the effort.’

‘Instead of charging a small team with developing the best product and then letting the operation grow with the product’s evolution, GE set up a huge organization that wasn’t quite needed yet. Development was often paused or delayed in order to start the process over entirely or just to stabilize the systems.’

Domain knowledge

GE prided itself on producing great managers. But the generalist manager approach did not always work. Sometimes domain expertise was lacking.

‘GE was a siloed organization, in contrast to the image it presented to outsiders.’

‘Some of his colleagues suspected that Immelt’s background in sales and the industrial units had left him with a fuzzy grasp of the financial basics. His positions in those businesses had been meaningful, and he had been responsible for their finances. But GE’s matrix management structure had assigned others to be the crucial arbiters on some of the most complex financial decisions in which he had been involved.’

‘Immelt struggled with basic concepts— the difference between secured and unsecured debt, for instance, which was fundamental to a lending operation like GE Capital.’

‘Their attempts to explain to the boss what they feared he didn’t fully grasp were proving futile. He always waved away their doubts.’

‘In the eyes of Capital workers, the role of the sales organization shifted when Immelt became CEO. Salespeople could push around finance people, but they weren’t held accountable for outcomes. If a deal closed, it was celebrated as a success of the salespeople. If it fell apart, the blame seemed to fall on the risk and finance team. The culture created by this dynamic led to executive promotions of people who were underqualified for complex finance jobs. Some were not only unprepared for their roles but also unfamiliar with basic finance concepts, according to people inside the business at the time.’

And this, of course, was not limited to Immelt.

When, under Immelt, GE went full throttle into oil and gas (turbines), etc., they did not consider what would happen if crude oil prices went up/ down. I am not very knowledgeable about it, but as per the book, this is a very basic consideration and this mistake cost GE Power a lot. Same for the acquisition of Alstom, for which, after announcing the deal, GE had to fight/ accommodate the French bureaucracy and ended up paying much more than the original estimate.

There were attempts at telling a story. Slogan was changed and buzzwords were created. ‘Ecomagination’, for example. Why do new leaders change slogans, come up with buzzwords, coin phrases? The spoken and written language- say English or any other business language- is rich enough. But I think it’s the easiest way of re-arranging pieces on the chess board (an example of bike shedding). I guess it helps them say I have arrived, change is in the air, fall in line or else, etc., and in the process identify who their cronies are. In the corporate world the show of activity instead of actual substantial activity makes you sick.

Another such practice is changing the org structure.

Leaders also seem to introduce themes. For example, GE is an industrial company, we are getting out of entertainment business. Such ideas are logical, of course. They help the company align. You must have come across such ideas: we are going lean, cloud native, etc. We can easily identify such things by the narrative: ‘in the past we slipped on …’ or ‘if we miss the coming wave…’, etc. There is nothing, absolutely nothing wrong with such things. There is no better way to align a large group than Bezos’s API mandate or JFK’s clear goal setting: ‘before this decade is out, of landing a man on the Moon and returning him safely to the Earth’. But we need to understand the dynamics. When people go all-in on such things sometimes they lose responsiveness and at times it’s too late to notice that such initiatives have stopped working (or, in some cases, negatives have overshadowed positives). In general, as Bertrand Russell said, ‘all movements go too far’. Then (in my experience after 3-4 years) some organization restructuring happens and new leaders introduce some different themes. GE did a bit of that- for example, ‘Welch’s core mission was to attack complexity, ripping out layers of bureaucracy that had built up inside the company and making the massive company more nimble’, or Immelt moving out of NBC, Capital and into Power, or Immelt wanting GE to become more start-up like. And sometimes it helps if you are able to tell a story around such ideas. The thing is that the narratives need content. In GE’s case I think investors felt like it’s a lot of just fancy-shmancy stuff… ‘like a big budget movie with a story that goes nowhere’. Like Elaine, the market would have none of that.

‘A carefully curated self-mythology had always been central to GE’s methods, going back to Edison. GE had marshaled the tools of public opinion that had marked every era…’ There is a line in the book ‘The narrative of success begat success’.

‘The premise of GE’s every communication to investors was that its management was so strong, and its vision of the entire conglomerate so penetrating and expansive, that its stock was a gamble worth taking. Even in the inevitable event that something somewhere went wrong, its protocols for working out of trouble would limit the amount of long-term damage and preserve its investors’ money.’

‘In the weeks and months after Immelt left GE in 2017, a parade of negative stories and embarrassing disclosures revealed major problems that sent the company’s stock into a long decline. Conversations about what happened inevitably shifted to blame, and Immelt was the obvious target. He had spent sixteen years at the top and, regardless of what Welch had left for him, he’d had plenty of time to fix it. But there was plenty of blame to go around. Perhaps most of it should be placed on the board of directors, the independent group that oversees the CEO.’… ‘It had been their job to know, however, and their job to ask the hard questions that weren’t fully answered, or were never asked at all. It was their job to oversee management, and it was their job to protect investors from fatal hubris. Still, the path ultimately leads back to Immelt. As chairman, he was also responsible for steering the board.’

‘There is also plenty of blame to put on GE’s top-down culture, which Welch and any number of midlevel managers used to their advantage as readily as Immelt did.’

Post Immelt era

‘Under Flannery, the message reverberating around the headquarters and the entire company was: no more success theater.’

Flannery came across a lot of problems; particularly at GE Power, Insurance (which people believed GE had sold years ago). GE Power profits, on close examination, seemed to exist mostly on paper. ‘Power was building inventory even as the global market for turbines was slowing. “It was like they drove off a cliff,” Flannery later told an observer, “and there were no skid marks.”’

‘Immelt often made it a point to go around the board table to ensure that everyone had a chance to comment on a strategic decision, but directors rarely challenged him. To Immelt, this was proof that he solicited input and encouraged debate.’

‘Some GE lifers saw Flannery as a tragic figure, taking the fall as he tried to clean up years’ worth of others’ mistakes. Flannery has told friends that he remains certain that there was no quick fix for the serious problems he unearthed.’

‘If Immelt was known for his vaulting optimism, Flannery soon became known for his indecision and endless analysis. Few decisions, even major ones, were final. A critical strategic move, like the separation of a major division, could be made, only to be reassessed at any time. Flannery’s style was quickly grating on top executives who worked with him.’

‘Jack Welch continues to privately rage about the man he chose to succeed him. The aging CEO still likes to say he gives himself an A grade for the way he ran GE, and an F for his choice of successor.’ (Welch passed away some time back.)

‘The SEC didn’t name the GE executives, but the order makes it clear that the misbehavior stopped with Flannery’s first full quarter as CEO, the fourth quarter of 2017.’

Post Flannery, Culp became (and still is, as of today) the CEO. First outside CEO though Flannery had brought him on the board of directors.

The authors end the book with: ‘But even the success of Culp’s efforts would undermine a central tenet of GE’s oldest and most precious belief: that it knew how to manage any business and could teach any of its own to do so. In the end, when General Electric most desperately needed a manager to save what was left of the company, it had to go looking somewhere else.’

This ending is the most biting and maybe perfect one in my opinion.

A few additional impressions. The book could have been edited better. For better reading experience and for the layman like myself the chapters are kept small and paragraphs are also small. But this has a side effect that some information gets repeated and a bit of to and fro happens. Some flow gets interrupted. Maybe the chapters were written by individual authors instead of jointly.

The book does not mention this. But in my mind, Immelt seems to be the kind of manager/ leader you used to read about in trashy self help books. Shoulders back, chest out, firm handshake, make and hold eye contact, interest your audience by talking about sport, etc.

Any cash rich company can have inefficiencies, bad practices. (Commercial paper- a form of short term debt in GE’s case comes to mind along with executive perks, etc.) It is better to get rid of these when there is time. When cash crunch occurs some of these things are first to go (company cars). But some others are difficult to get rid of. And sometimes you end up letting wolf inside the door (regulators, watchdog, seat on the board, etc.).

A lot of these things appear obvious or common sense in hindsight. I guess none of the leaders wanted to do any harm to the company. Welch, Immelt, etc. The business of a conglomerate was complex. And most certainly even after reading a book or two we may not fully understand/ appreciate the complexity. But if a CEO gets 15-20 years to shape up the company with uncontested power, like Welch and Immelt did, couldn’t they have done better? Immelt was almost always firefighting. But couldn’t his shortcomings (not enough knowledge about the Capital business, not factoring in crude oil price swings) be overcome? I guess people fall in love with their own images and approaches. When they reach the top, they stop listening. Some of this my-way-or-highway approach must be necessary to avoid second-guessing everything. But I think you should be open to one or two levels above and below your own level. (For example, in the Toyota Production System a worker on the shop floor could bring the assembly line to a halt if he/she found a problem.) A CEO who takes the board for granted is better off without that board, and one who isn’t willing to listen to bad news from his subordinates will have to hear that news from the market. I think life is an incessant flow of choices (irrespective of whether free will exists or not). It is all about the choices we make and paths we take. And the management at GE made bad choices for long.

A few generic reasons I can think of behind such mess:

The people involved are optimizing for their self interests- career paths, perks, money, power, etc. They know their limitations, and that business, companies are too impersonal, their time limited, etc. ‘It’s just a job.’ I don’t think there is anything wrong with that. But you can do that with a bit of professionalism.
While they are in the system, they become part of it and cannot appraise it impartially.
People underestimate complexity because they don’t understand it. Even when they understand complexity, their brains by evolutionary pressures may abstract away most of it.
People have an illusion of control and can become cocksure about the things they do, believe. In case of uncertainty, they double down on their approaches.
People often optimize for easier ones of the many available metrics and when metrics become targets and can be gamed, they stop being good metrics.
People, when they can’t answer a difficult question, tend to substitute the difficult question with an easier one which they can answer. This is the substitution principle from Thinking, Fast and Slow. Any answer you can come up with to questions like: should we have NHS like healthcare system, should we have minimum wages, should we tax the rich more, should primary education be…, etc. is likely to be an answer to a substituted simpler question (unless you are a domain expert). Perhaps this is why companies, as first actions, divest from some segments, fire employees, and even have awards and promotion cycles, etc.
Systems influence people and people influence systems. Both these are dynamic.
Celebrity culture and blind hero worship are factors.
Some of these things appear to be sort of directed acyclic graphs. It may seem that there’s an inevitable directionality to them, ‘it was bound to happen like that’. But as Steve Jobs said you can connect the dots only in hindsight. There’s no way one can be sure how dynamic systems will evolve. If we travel back to K-T boundary and the earth is hit again by same asteroid/ comet at same time and place, there is no way you can be sure that some 66 million years later, girls would want to impress boys by talking about NVIDIA stock.
GE people were not particularly different. You and I fall for these things all the time.

Borrowing money (from VCs, funds, loans, market) is starting to look less appealing to me. And yet companies do need to borrow money from the market and other sources to expand, sustain, etc. The founder of Ikea borrowed 500 Krona to purchase and then sell fountain pens. He repaid the money, and Ikea, since then, has had no outside financing, no debt financing.

Lastly and most importantly I think it’s the culture that matters. Of course, it’s a vague and open-to-interpretation concept. And culture is generally not stagnant; doesn’t remain unchanged for long. But many of your actions are rooted in an existing culture or are aspiring towards some culture. And every action updates the culture- what is acceptable and what’s not, what is rewarded and what isn’t, are you open to criticism, debate, difference of opinion, what you do when things go wrong, etc. Certainly, GE had a culture problem. I have seen a couple of rotten companies and maybe it boiled down to this vague idea of culture. This becomes particularly noticeable when you are inside and company’s fortunes turn (for better or for worse).

*Best invention/ discovery was electricity, wheel. Electricity seems ok. But wheel? No way. The best was probably the second wheel. Have you seen the idiots on unicycles? (Rich Hall gag.) Surprisingly, AFAIK, evolution has not come up with a wheel (some sea-bed dwelling fish have pods closely resembling wheels but not exactly wheels- you know, translate-to-rotary motion, friction reducing, etc.). Evolution works on the good enough principle, doesn’t want to perfect things/ mechanisms.

TODO: A post about power.

While updating the post today, I ended up using Seinfeld’s ‘big budget movie with a story that goes nowhere’ over Shakespeare’s ‘full of sound and fury signifying nothing’. And I love how quotable Seinfeld is- along with Wodehouse, Calvin And Hobbes and P. L. Deshpande. I have heard similar- quoteworthiness- about Dune (purchased but unread). And you have no idea how I resisted putting Immelt’s cavalier approach in terms of Monty Python’s ‘Tis but a scratch/ flesh wound. But I have to say, at times when I understand him, I just love Shakespeare.

Reading Paper Scaling Memcache at Facebook

2024-12-08T00:00:00+05:30

I recently read the paper Scaling Memcache At Facebook. Some of my notes below:

First let’s get familiar with some terms and ideas which are used in the paper/ my notes on it. This section may come across as a teach-me-as-if-I-am-a-five-year-old description. Sorry for that. A caching solution generally holds data in a hash-table like key-value structure- mostly in memory- to provide- more or less- O(1) look up. It saves trips to, and thus reduces load on, the data store, thus improving performance**. The data store can be a database, a service, etc. Data can become stale (inconsistent with the underlying source-of-truth store) and updates/ deletes/ new additions of data need to be managed. Some caching systems may take up persistence of data (data written first into the cache and later flushed to the data store in batches AKA write-back). Evicting data from cache/ cache invalidation can be challenging (There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.- Jeff Atwood of StackOverflow, codinghorror blog.). Look up terms like LRU, LFU, TTL, cache-aside, write-through/back/around, versioning, cache-stampede/ thundering-herd, etc.

memcached- Free & open source, high-performance, distributed memory object caching system. I pronounce it with a d at the end (like in etcd) but you can pronounce it as the past tense of a verb. It was first designed for use in LiveJournal and is now a popular caching solution.

memcache- Notice the missing d at the end. Facebook’s (FB) distributed caching system which uses memcached. When FB designed this system, memcached was not a distributed solution. This paper discusses this system. They added capabilities to memcached and some of them are now open sourced.

One of the authors of the paper discusses some of the techniques/ rationale in this talk.

For convenience and to avoid confusion, some clarification on terms used. They use the term client not to indicate the client app of cached data but the memcached client app/ library to embed into/ call from the Facebook application. The FB web or mobile app interacts with memcached servers via this client side app. This client-library kind of app is called mcrouter. The term ‘frontend clusters’ is used a lot. If I am not mistaken it means the deployments of the user facing Facebook site. As the term frontend can become overloaded with things like webapp, backend for mobile app, UI, javascript, etc. let’s keep in mind that it means deployments of the user facing Facebook site.***

FB users consume an order of magnitude more content than they create. This behavior results in a workload dominated by fetching data. So caching such content helps. Second, the read operations fetch data from a variety of sources such as MySQL databases, HDFS installations, and backend services. These are authoritative sources. Memcache is not the authoritative source of the data. This heterogeneity of data sources requires a flexible caching strategy. Also, they are willing to expose slightly stale data in exchange for insulating a backend storage service from excessive load.

Their use of cache is demand-filled look-aside cache (cache-aside). The content is saved to data store but is cached only when it’s fetched/ read. On updates, they delete cached data instead of updating it because deletes are idempotent.
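To make the look-aside idea concrete, here is a minimal sketch. It is my own toy model, not FB’s code: the class name LookAsideCache, the dict standing in for the data store and the store_reads counter are all assumptions made for illustration.

```python
class LookAsideCache:
    """Toy demand-filled look-aside cache in front of a dict acting as the data store."""

    def __init__(self, store):
        self.store = store      # authoritative source (stand-in for MySQL etc.)
        self.cache = {}         # stand-in for memcache
        self.store_reads = 0    # how many times we had to go to the store

    def read(self, key):
        if key in self.cache:           # cache hit
            return self.cache[key]
        self.store_reads += 1           # cache miss: go to the store
        value = self.store[key]
        self.cache[key] = value         # fill the cache on demand
        return value

    def write(self, key, value):
        self.store[key] = value         # update the source of truth first
        self.cache.pop(key, None)       # then delete (not update) the cached copy
```

Calling write twice for the same key leaves the cache in the same state as calling it once, which is the idempotence the paper likes about deletes.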

The data that is cached isn’t just end-user created data. Some derived data such as data created by ML algorithms is also held in cache.

The scale and reach of FB, wide fanout (say celebrity created content), read heavy use, latency expectations, etc. dictate that FB site is deployed in multiple regions and clusters within those regions. This places demand on data sources. So data sources are also deployed in these clusters and memcached servers as well. But this also means they need data replication and consistency.

The goals of caching at FB: reduce the latency of fetching cached data or the load imposed due to a cache miss.

Reduce Latency

The memcached servers are co-located in a cluster with webservers and storage. This reduces load on databases and other services and also reduces network lags. Cached items are distributed across the memcached servers through consistent hashing. Thus web servers have to routinely communicate with many memcached servers to satisfy a user request. As a result, all web servers communicate with every memcached server in a short period of time. This all-to-all communication pattern can cause incast congestion or allow a single server to become the bottleneck for many web servers. Data replication often alleviates the single-server bottleneck but leads to significant memory inefficiencies in the common case. They solved this using an indirection by introducing a memcache client which runs on each web server. This client serves a range of functions, including serialization, compression, request routing, error handling, and request batching. The memcache client logic is provided as two components: a library that can be embedded into applications or a standalone proxy named mcrouter. For convenience, I will call both of these (library and proxy) mcrouter. This proxy presents a memcached server interface and routes the requests/replies to/from other servers. These mcrouters maintain a map of all available servers, which is updated through an auxiliary configuration system (I believe this aux config service is zookeeper).
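If consistent hashing is new to you, here is a small sketch of the general technique. The paper does not describe FB’s hashing in this detail, so the HashRing class, the use of MD5 and the vnodes parameter are my assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=100):
        # Each server is placed at many points on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode()).hexdigest(), 16)

    def server_for(self, key):
        # Walk clockwise to the first point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters: if one server is removed, only the keys that lived on that server move. Every other key keeps its server, so most of the cache stays warm.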

To minimize the number of network round trips necessary to respond to page requests, FB construct a directed acyclic graph (DAG) representing the dependencies between data. A web server uses this DAG to maximize the number of items that can be fetched concurrently. They also use a sliding-window mechanism to throttle outstanding requests. The window size is tuned from observed metrics: with a smaller window the application has to dispatch more groups of requests one after another, which lengthens the web request, while a larger window puts more requests in flight at once and can cause incast congestion.
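One way to picture the DAG idea is to group keys into rounds where every key in a round can be fetched in one batched request. This sketch uses Python’s standard graphlib; the function fetch_rounds and the example keys are hypothetical, and the paper does not say FB does it this way.

```python
from graphlib import TopologicalSorter


def fetch_rounds(deps):
    """deps maps each key to the set of keys that must be fetched before it.

    Returns a list of rounds; all keys in one round are independent of each
    other and could go out together as a single batched request.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    rounds = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # everything fetchable right now
        rounds.append(ready)
        sorter.done(*ready)                  # unlock the keys that depended on them
    return rounds


# Hypothetical page: friends and photos need the profile, the feed needs both.
page_deps = {
    "profile": set(),
    "friends": {"profile"},
    "photos": {"profile"},
    "feed": {"friends", "photos"},
}
```

Here four keys need only three round trips, because friends and photos can be fetched together.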

They use UDP for get requests to reduce latency and overhead. Since UDP is connectionless, each thread in the web server is allowed to communicate with memcached servers directly, bypassing mcrouter. For reliability, they route set and delete operations over TCP via mcrouter.

They treat get errors as cache misses, but after querying for the data web servers skip inserting entries into memcached, to avoid putting additional load on a possibly overloaded network or server. The number of such errors tends to be small in their experience.

Reduce Load on data stores or services

They use leases to address two problems: stale sets and thundering herds. (If you are not aware, look up thundering herds and also cache stampede.)

A stale set occurs when a web server sets a value in memcache that does not reflect the latest value that should be cached. This can occur when concurrent updates to memcache get reordered. To refresh your memory, two operations are concurrent when each of them is unaware of the other taking place. And re-ordering of operations is interesting and a common problem because of network delays, etc.

Stale sets are particularly relevant for cache-aside solutions. Suppose the entry for Mary Jane’s marital status is missing in the cache while it’s stored as single in the master database. One fine day, Peter Parker looks at her profile and her status. His read from the data store would subsequently want to set the status as single in the demand-filled cache. But that fine day being the first day of spring and all, Mary Jane gets married to Harry. (In a Spiderman story she shouldn’t, but this being a memcached story suppose she does.) Now Harry and MJ would update their marital statuses to married. Suppose this data store update and subsequent read-your-own-write operation inserts her status as married into the cache. But due to concurrent operation reordering, Peter’s read operation overwrites that status to single. It can cause bad blood. This kind of scenario is a stale set. Read more about it here.

A thundering herd happens when a specific key undergoes heavy read and write activity. As the write activity repeatedly invalidates the recently set values, many reads default to the more costly path.

A memcached instance gives a lease to a mcrouter to set data back into the cache when that client experiences a cache miss. The lease is a token bound to the specific key the client originally requested. The client provides the lease token when setting the value in the cache. With the lease token, memcached can verify and determine whether the data should be stored and thus arbitrate concurrent writes. Verification can fail if memcached has invalidated the lease token due to receiving a delete request for that item. Leases prevent stale sets in a manner similar to how load-link/store-conditional operates.
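Here is the Spiderman scenario with leases, as a toy sketch. LeasingCache and its method signatures are my own invention for illustration and not memcached’s real API; the rate limiting of tokens is left out here.

```python
import itertools


class LeasingCache:
    """Toy cache that hands out a lease token on a miss (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._leases = {}                  # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return (value, None) on a hit, or (None, token) on a miss."""
        if key in self._data:
            return self._data[key], None
        token = next(self._tokens)
        self._leases[key] = token
        return None, token

    def set(self, key, value, token):
        """Store the value only if the lease for this key is still valid."""
        if self._leases.get(key) != token:
            return False                   # lease was invalidated: reject the stale set
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        self._data.pop(key, None)
        self._leases.pop(key, None)        # a delete invalidates any outstanding lease


cache = LeasingCache()
_, peters_token = cache.get("mj:status")   # Peter misses and gets a lease
cache.delete("mj:status")                  # MJ's update deletes the key
accepted = cache.set("mj:status", "single", peters_token)  # Peter's late set
```

Peter’s late set is rejected because the delete invalidated his token, so the status single never lands in the cache.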

To address thundering herds, each memcached server regulates the rate at which it returns tokens. By default, a token is returned only once every 10 seconds per key. A request for a key’s value within 10 seconds of a token being issued results in a special notification telling the mcrouter to wait a short amount of time. Typically, the mcrouter with the lease will have successfully set the data within a few milliseconds. Thus, when waiting clients retry the request, the data is often present in cache.
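The rate limiting can be sketched in a few lines. The class LeaseRateLimiter, the strings "TOKEN" and "WAIT" and the explicit now argument are assumptions I made to keep the example testable; only the 10 second default comes from the paper.

```python
class LeaseRateLimiter:
    """Hands out at most one lease token per key per interval (illustrative only)."""

    def __init__(self, interval=10.0):
        self.interval = interval
        self._last_issued = {}     # key -> time the last token was handed out

    def request_token(self, key, now):
        last = self._last_issued.get(key)
        if last is not None and now - last < self.interval:
            return "WAIT"          # someone else holds a recent lease: retry shortly
        self._last_issued[key] = now
        return "TOKEN"             # this caller goes to the database and fills the cache
```

Only the first of many concurrent readers gets a token and goes to the database. The rest are told to wait and will most likely find the value in cache on retry.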

At FB, there are areas where stale values (not stale sets) in cache are ok. The above Spiderman scenario was relevant when data was not present in cache. Because FB delete cached data on data store updates, stale values matter when a value has been updated in the data store but its cached copy has not yet been deleted. A get request can return a lease token or data that is marked as stale. Applications that can continue to make forward progress with stale data do not need to wait for the latest value to be fetched from the databases.

To further reduce load on the different backend stores and services, which can differ in access patterns, memory footprints, etc., FB created multiple cache pools. They partition a cluster’s memcached servers into separate pools. One pool (named wildcard) is the default. For keys not suitable for this wildcard pool, separate pools are provisioned. For example, a small pool for keys that are accessed frequently but for which a cache miss is inexpensive, another large pool for infrequently accessed keys for which cache misses are expensive, an app pool (a pool devoted to a specific application- high churn, high miss rate), a replicated pool for frequently accessed data, and a regional pool for rarely accessed information, etc.

Within some pools, keys are replicated to improve the latency and efficiency of memcached servers. The idea is to replicate a category of keys within a pool when (1) the application routinely fetches many keys simultaneously, (2) the entire data set fits in one or two memcached servers and (3) the request rate is much higher than what a single server can manage. Replication in this instance is better than further dividing the key space. Consider a memcached server holding 100 items and capable of responding to 500k requests per second. Each request asks for 100 keys. The difference in memcached overhead for retrieving 100 keys per request instead of 1 key is small. To scale the system to process 1M requests/sec, splitting the key space by introducing a new server may seem to work. But clients now need to split each request for 100 keys into two parallel requests for ∼50 keys. Consequently, both servers still have to process 1M requests per second. However, if the entire keyspace is replicated instead of split, a client’s request for 100 keys can be sent to any replica. This reduces the load per server to 500k requests per second. Each client chooses replicas based on its own IP address. This approach requires delivering invalidations to all replicas to maintain consistency.
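The arithmetic of that example fits in one small function. This is my own simplified model, not something from the paper: it counts requests (not keys) per server and assumes load spreads evenly. The name requests_per_server is hypothetical.

```python
def requests_per_server(total_rps, servers, servers_touched_per_request):
    """Average request rate one server sees, assuming load spreads evenly."""
    return total_rps * servers_touched_per_request / servers


# Splitting the key space: each 100-key request becomes two ~50-key requests,
# so every request touches both servers.
split_load = requests_per_server(1_000_000, servers=2, servers_touched_per_request=2)

# Replicating the key space: each 100-key request goes to exactly one replica.
replicated_load = requests_per_server(1_000_000, servers=2, servers_touched_per_request=1)
```

With splitting, each server still sees all 1M requests per second. With replication, each sees 500k, which is what one server can handle in the example.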

Handling Failures: There are two scales of failures: (1) a widespread outage that affects many servers within the cluster. In this case requests are routed to different clusters or regions. (2) a small number of hosts are inaccessible due to a network or server failure. For this scenario, FB dedicate a small set of machines, named Gutter, to take over the responsibilities of a few failed servers. Gutter accounts for approximately 1% of the memcached servers in a cluster. When a memcached client receives no response to its get request, the client assumes the server has failed and issues the request again to a special Gutter pool. If this second request misses, the client will insert the appropriate key-value pair into the Gutter machine after querying the database. Entries in Gutter expire quickly to obviate Gutter invalidations. Gutter limits the load on backend services at the cost of slightly stale data. By using Gutter to store these results, a substantial fraction of these failed get requests are converted into hits in the Gutter pool, thereby reducing load on the backing store.
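The Gutter flow for a failed server can be sketched like this. GutterPool, get_with_failover, the 10 second expiry and modelling a dead server as None are all my assumptions for illustration; the paper only says that Gutter entries expire quickly.

```python
class GutterPool:
    """Tiny stand-in for the Gutter pool: entries carry a short expiry time."""

    def __init__(self, ttl_seconds=10):
        self.ttl = ttl_seconds
        self._entries = {}                 # key -> (value, expires_at)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            return None                    # missing or expired
        return entry[0]

    def set(self, key, value, now):
        self._entries[key] = (value, now + self.ttl)


def get_with_failover(key, primary, gutter, database, now):
    """primary is a dict, or None to model a server that does not respond."""
    if primary is not None:                # server answered: normal look-aside path
        return primary[key] if key in primary else database[key]
    value = gutter.get(key, now)           # server is down: try Gutter instead
    if value is None:
        value = database[key]              # Gutter miss: query the database once
        gutter.set(key, value, now)        # later readers will hit Gutter
    return value
```

Because entries simply expire, nobody has to send invalidations to Gutter. The price is that a reader may see a slightly stale value until the entry times out.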

Replication within a region

(refer to the image above) FB split web and memcached servers into multiple frontend clusters. These clusters, along with a storage cluster that contains the databases, define a region. This region architecture also allows for smaller failure domains and a tractable network configuration. Thus FB trade replication of data for more independent failure domains, tractable network configuration, and a reduction of incast congestion.

A web server that modifies data also sends invalidations to its own cluster. SQL statements that modify authoritative state are amended to include memcache keys that need to be invalidated once the transaction commits. Invalidation daemons (named mcsqueal) run on every database. Each daemon inspects the SQL statements that its database commits, extracts any deletes, and broadcasts these deletes to the memcache deployment in every frontend cluster in that region. Invalidation daemons batch deletes into fewer packets and send them to mcrouter instances in each frontend cluster. These mcrouters then unpack individual deletes from each batch and route those invalidations to the right memcached servers. Invalidation via mcsqueal is better than via web servers. First, batching is possible in mcsqueal and second, use of mcrouters avoids misrouting of deletes due to a configuration error.

A system called Cold Cluster Warmup mitigates poor hit rates when the cache is cold by allowing clients in the “cold cluster” (i.e. the frontend cluster that has an empty cache) to retrieve data from the “warm cluster” (i.e. a cluster that has caches with normal hit rates) rather than the persistent storage. Memcached deletes support nonzero hold-off times that reject add operations for the specified hold-off time. By default, all deletes to the cold cluster are issued with a two second hold-off. When a miss is detected in the cold cluster, the client re-requests the key from the warm cluster and adds it into the cold cluster. The failure of the add (rejected because a delete for that key arrived within the hold-off window) indicates that newer data is available on the database and thus the client will refetch the value from the databases.

Consistency across regions

Each region consists of a storage cluster and several frontend clusters. One region holds the master databases and the other regions contain read-only replicas. FB rely on MySQL’s replication mechanism to keep replica databases up-to-date with their masters. When scaling across multiple regions, maintaining consistency between data in memcache and the persistent storage becomes the primary technical challenge. These challenges stem from a single problem: replica databases may lag behind the master database. The paper describes what works for FB. The memcache system at FB represents just one point in the wide spectrum of consistency and performance trade-offs. They provide best-effort eventual consistency but place an emphasis on performance and availability.

Writes from a master region: use mcsqueal described earlier. Writes from a non-master region: Consider a user who updates his data from a non-master region when replication lag is excessively large. The user’s next request could result in confusion if his recent change is missing. A cache refill from a replica’s database should only be allowed after the replication stream has caught up. Without this, subsequent requests could result in the replica’s stale data being fetched and cached. FB use a remote marker mechanism to minimize the probability of reading stale data. The presence of the marker indicates that data in the local replica database are potentially stale and the query should be redirected to the master region. When a web server (non-master region) wishes to update data that affects a key k, that server (1) sets a remote marker rk in the region, (2) performs the write to the master embedding k and rk to be invalidated in the SQL statement, and (3) deletes k in the local cluster. On a subsequent request for k, a web server will be unable to find the cached data, check whether rk exists, and direct its query to the master or local region depending on the presence of rk. In this situation, FB are ok with added latency (introduced due to a cache miss) so as to avoid reading stale data. The presence of a remote marker helps distinguish whether a non-master database holds stale data or not.
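The three write steps and the read path can be sketched as follows. This is a toy model under my own assumptions: the Region class, plain dicts for the databases and a replicate method standing in for the replication stream (which, on catching up, also removes the marker). In this sketch the cache is refilled only from the replica once the marker is gone.

```python
class Region:
    """Toy non-master region using a remote marker per key (illustrative only)."""

    def __init__(self, master_db, replica_db):
        self.master_db = master_db     # lives in the master region
        self.replica_db = replica_db   # local read-only replica, may lag
        self.cache = {}                # local memcache
        self.markers = set()           # remote markers r_k, one per key

    def write(self, key, value):
        self.markers.add(key)          # (1) set the remote marker r_k
        self.master_db[key] = value    # (2) write to the master
        self.cache.pop(key, None)      # (3) delete k from the local cache

    def replicate(self, key):
        """Replication catches up for this key and the marker is cleared."""
        self.replica_db[key] = self.master_db[key]
        self.markers.discard(key)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        if key in self.markers:        # replica may be stale: ask the master
            return self.master_db[key]
        value = self.replica_db[key]   # replica has caught up: safe to refill
        self.cache[key] = value
        return value
```

A read that arrives between the write and the replication goes to the master, which is slower but returns the user’s own latest change.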

Performance Optimizations

The first major optimizations were to: (1) allow automatic expansion of the hash table to avoid look-up times drifting to O(n), (2) make the server multi-threaded using a global lock to protect multiple data structures, and (3) give each thread its own UDP port to reduce contention when sending replies and, later, to spread interrupt processing overhead. The first two optimizations were contributed back to the open source community. Employing fine-grained locking triples the peak get rate. The UDP implementation outperforms the TCP implementation by 13% for single gets and 8% for 10-key multigets (which pack more data into each request than single gets).**

Memcached employs a slab allocator to manage memory. The allocator organizes memory into slab classes, each of which contains pre-allocated, uniformly sized chunks of memory. Memcached stores items in the smallest possible slab class that can fit the item’s metadata, key, and value. Slab classes start at 64 bytes and exponentially increase in size by a factor of 1.07 up to 1 MB. FB implemented an adaptive allocator that periodically re-balances slab assignments to match the current workload. It identifies slab classes as needing more memory if they are currently evicting items and if the next item to be evicted was used at least 20% more recently than the average of the least recently used items in other slab classes. If such a class is found, then the slab holding the least recently used item is freed and transferred to the needy class.
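To get a feel for the slab classes, here is a small sketch that generates the chunk sizes from the numbers above (64 bytes, factor 1.07, 1 MB). It is simplified: the truncation to whole bytes is my assumption, and the real allocator’s rounding and alignment may differ. The function names are hypothetical.

```python
def slab_class_sizes(start=64, factor=1.07, limit=1024 * 1024):
    """Chunk sizes growing by `factor` from `start` up to `limit` bytes."""
    sizes = []
    size = float(start)
    while size < limit:
        sizes.append(int(size))    # truncate to whole bytes (simplification)
        size *= factor
    sizes.append(limit)            # the largest class is capped at the limit
    return sizes


def slab_class_for(item_size, sizes):
    """Smallest chunk size that can hold an item of item_size bytes."""
    for size in sizes:
        if item_size <= size:
            return size
    raise ValueError("item larger than the biggest slab class")
```

A 65-byte item does not fit the 64-byte class, so it lands in the next one and wastes a few bytes. That waste per item is the price of pre-allocated, uniformly sized chunks.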

While memcached supports expiration times, entries may live in memory well after they have expired. Memcached lazily evicts such entries by checking expiration times when serving a get request for that item or when they reach the end of the LRU. Although efficient for the common case, this scheme allows shortlived keys that see a single burst of activity to waste memory until they reach the end of the LRU. FB introduced a hybrid scheme that relies on lazy eviction for most keys and proactively evicts shortlived keys when they expire. To do this they place short-lived items into a circular buffer of linked lists (indexed by seconds until expiration)– called the Transient Item Cache– based on the expiration time of the item. Every second, all of the items in the bucket at the head of the buffer are evicted and the head advances by one.
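The Transient Item Cache is easy to picture as a ring of buckets. In this sketch I use Python lists instead of linked lists, and the class name, the horizon_seconds parameter and the tick method are my assumptions.

```python
class TransientItemCache:
    """Circular buffer of buckets indexed by seconds until expiration."""

    def __init__(self, horizon_seconds=60):
        self.buckets = [[] for _ in range(horizon_seconds)]
        self.head = 0                  # bucket that expires on the next tick

    def add(self, key, seconds_until_expiry):
        if not 0 <= seconds_until_expiry < len(self.buckets):
            raise ValueError("expiry outside the short-lived horizon")
        slot = (self.head + seconds_until_expiry) % len(self.buckets)
        self.buckets[slot].append(key)

    def tick(self):
        """Called once per second: evict the head bucket and advance the head."""
        expired, self.buckets[self.head] = self.buckets[self.head], []
        self.head = (self.head + 1) % len(self.buckets)
        return expired
```

Eviction costs one bucket per second regardless of how many keys are tracked, so short-lived keys leave memory on time instead of waiting to reach the end of the LRU.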

FB modified memcached to store its cached values and main data structures in System V shared memory regions so that the data can remain live across a software upgrade and thereby minimize disruption.

In conclusion section the paper says: Many of the trade-offs discussed are not fundamental, but are rooted in the realities of balancing engineering resources while evolving a live system under continuous product development. While building, maintaining, and evolving our system we have learned the following lessons. (1) Separating cache and persistent storage systems allows us to independently scale them. (2) Features that improve monitoring, debugging and operational efficiency are as important as performance. (3) Managing stateful components is operationally more complex than stateless ones. As a result keeping logic in a stateless client helps iterate on features and minimize disruption. (4) The system must support gradual rollout and rollback of new features even if it leads to temporary heterogeneity of feature sets. (5) Simplicity is vital.

** Performance is one of those hygiene factors, isn’t it? You can read more about hygiene factors on wikipedia. Basically, having a clean table in the cafeteria during lunch hour does not make you happy but seeing it unclean will make you unhappy.

Here is some related content I copied from DDIA book:

It seems intuitively obvious that a fast service is better for users than a slow service. However, it is surprisingly difficult to get hold of reliable data to quantify the effect that latency has on user behavior.

Some often-cited statistics are unreliable. In 2006 Google reported that a slowdown in search results from 400 ms to 900 ms was associated with a 20% drop in traffic and revenue. However, another Google study from 2009 reported that a 400 ms increase in latency resulted in only 0.6% fewer searches per day, and in the same year Bing found that a two-second increase in load time reduced ad revenue by 4.3%. Newer data from these companies appears not to be publicly available.

A more recent Akamai study claims that a 100 ms increase in response time reduced the conversion rate of e-commerce sites by up to 7%; however, on closer inspection, the same study reveals that very fast page load times are also correlated with lower conversion rates! This seemingly paradoxical result is explained by the fact that the pages that load fastest are often those that have no useful content (e.g., 404 error pages). However, since the study makes no effort to separate the effects of page content from the effects of load time, its results are probably not meaningful.

A study by Yahoo compares click-through rates on fast-loading versus slow-loading search results, controlling for quality of search results. It finds 20–30% more clicks on fast searches when the difference between fast and slow responses is 1.25 seconds or more.

*** By and large, from a technical and social perspective, FB created a lot of value. I deleted my FB account long, well actually, long long back. I haven’t been a socially extroverted person ever. But a factor was my experience with orkut. I did not want to be on another such network. Of course, FB makes a lot of things easy. And not being on it has some compromises associated with it which any herd animal has to make by not being part of a herd. And I don’t know if people use FB much these days. Do they even change their DPs? Going back to my post on platforms or quoting from Steve Yegge’s platform rant, “Facebook – that is, the stock service they offer with walls and friends and such – is the killer app for the Facebook Platform. And it is a very serious mistake to conclude that the Facebook App could have been anywhere near as successful without the Facebook Platform.” It’s the platform which enabled things like ads, viral content, suggestions, third party integrations, games, marketplace, smear campaigns, propaganda, etc. If you haven’t, go read Steve Yegge’s platform rant right now. It is more valuable than the paper.

I finished reading the paper yesterday. Reading it took a few sittings over the last two weeks. And even with underlines, annotations, etc. writing this post from the paper took a lot of time today. Plus there are things like nagging internal voice telling you to take a shortcut and watch a related memcache talk, read some post to help you understand. But we do some things precisely because they need the attention thus delaying gratification in the process. Otherwise, I would end up doom-scrolling on social networks, wouldn’t I? So long as we can be happy in long term with the choices we make and paths we take… Of course, I felt bad when I realized I missed watching Ellyse Perry ODI century. And I did watch the movie Aliens. So there is that.

Skin In The Game

2024-11-23T00:00:00+05:30

Recently I listened to the audiobook version of Nassim Nicholas Taleb’s (NNT) book Skin In The Game. Here is my brief understanding of it.

Taleb has been a trader/ hedge fund manager. His books on uncertainty, risks, luck, randomness, etc. are bundled as Incerto. I have only read Fooled By Randomness before Skin In The Game. His books seem to have a sort-of philosophical idea at the center with some statistics, Greek/ Roman/ European classics/ philosophy, experiences in trading in the background.

The idea behind Skin In The Game is that people who decide policy should have to share the risks if the policy does not yield intended results. If you don’t have a stake in the system, you should not be allowed to decide about the system. “For social justice, focus on symmetry and risk sharing. You cannot make profits and transfer the risks to others, as bankers and large corporations do… Forcing skin in the game corrects this asymmetry better than thousands of laws and regulations.”

I think at a high level everyone knows the concept. It’s somewhat like accountability. But a lot of systems we see, participate in or are part of don’t seem to have this mechanism built in. The idea of having accountability built in may generate mixed emotions. People are generally ok with reaping the benefits of success but may be uncomfortable with suffering due to failures. Having it built into systems like politics may make you happy. But- and sorry if the next few lines make you uncomfortable (they made me uncomfortable while writing)- suppose you are a programmer or a project manager and the project fails, the budget overruns or the code has production defects, how would you like to share the burden? You can sense if someone does not have skin in the game. A Monica (Friends) tends to break up with Richard who prefers having fun over having children or a Jim (US version of The Office) is forced to give up on Pam if he gets the feeling that ‘I kinda put it all on the line. Twice, actually. And she said No.’

Now, one may say that we don’t participate in many such Jim-Pam systems. But then your stake in systems could be proportionate. An architect, a manager and a coder need not and will not have the same amount of skin in the game if the system they built fails. One of the finer concepts NNT writes about in the book is that the risk you end up sharing does not have to be as big as the failure. It can be proportionate with some scaling down.

The idea is to commit. The system may fail. You have to have something to lose if the system fails. If you are playing it safe, you do not have skin in the game. In spite of uncertainty- known and unknown unknowns- you have to commit. If someone is drowning in the river then you- sitting on the riverbank- have no right to teach them how to swim. And if you are the person who is drowning, should you listen to someone who is shouting instructions from the bank or place yourself in the care of the not-so-expert swimmer who jumps into the river to save you? Of course, if the swimmer who jumps in is an expert one, then it’s the happy path. But then neither you nor the savior has skin in the game. Because a successful outcome is guaranteed. If the project succeeds, everybody from coder to manager may get rewards. The concept of skin in the game is more prominently apparent in the not-exactly-ideal-outcome scenarios.

A couple of examples of skin in the game: (Based on my reading)

Rahul Bajaj was an industrialist in Pune and led the Bajaj Group. Bajaj Auto- one of the big companies in the group- set up a school for the children of their workers. And Rahul Bajaj’s children studied there.
Anil Awachat was a medical doctor turned social worker. He stayed near the slum-like area in Pune where he worked and both his daughters went to the same school which the children from the neighborhood went to.
The fugu fish possesses a potentially fatal poison so the chef preparing a dish has to eat from the prepared dish before serving to others. (Not sure if this is true these days.) Talking about the fish reminds me of the Columbo episode.

And now something related but not conceptually related: The known unknown, unknown unknown reminds me of the Iraq invasion. NNT is quite critical, naming names of war-mongering people. Also, to quote from wikipedia, ‘Robert Rubin, a highly-paid director and senior advisor at Citigroup, paid no financial penalty when Citigroup had to be rescued by U.S. taxpayers due to overreach. Taleb calls this sort of a trade, with upside gain but no or limited downside risk, a “Bob Rubin trade”.’ I agree with the ideas. But this sort of naming and shaming did not appeal to me. If I remember correctly he has done the same for Soros in Randomness.

Another thing that I did not like was the lack of scientific rigor. NNT’s style seems to be to take an idea, have some anecdotal tales, personal experiences, statistical ideas, philosophical background thrown around the central idea, criticize some people and mindsets, etc. But one of the tenets of science is saying that we were wrong, our experiments did not prove anything conclusive, our new drug does not perform significantly better than a placebo, etc. NNT’s writing does not seem to have the smallest of self-doubt. He even preempts it by telling us to be wary of scientism. Another thing he says is that evolution or the passage of time is the best evaluator of results/ ideas. And yet he says western invasions (Iraq, Syria) did not work out. Why doesn’t he allow the same passage of time there? I am not saying that any of these/ such wars are good. My point is that there seem to be double standards between the ideas he advocates and the ones he opposes. Why read his books then? Because his ideas are important, appealing and seem to make sense. Whether they will make sense when subjected to scientific rigor/ debate is not for me to say as I am not knowledgeable enough. But the lack of rigor is visible. However, his statistical knowledge/ understanding seems deep.

A couple of more points- and these are my shortcomings- due to which there was some impedance mismatch between his books and myself: 1. Most of his writing features his background as a trader. And due to lack of my interest in that area some of his discussion is lost. 2. Many of the references are from Greek Roman myths, Western European thinkers and some of those being foreign to me- at this stage- have lesser appeal. Contrast this to Yuval Noah Harari whose references include Vedas as well as Code of Hammurabi.
