---
title: "Computer Systems Architecture: Understanding Performance"
author: chinono
date: 2026-04-18
---

## Why Should We Care About Performance?

Here's something wild to think about: the laptop you're reading this on has more computing power than an IBM mainframe from just 10–15 years ago. Processors have become so cheap that we literally throw some of them away (think disposable RFID chips). The cost of computing keeps plummeting while performance keeps skyrocketing — and that's not by accident.

Modern desktop applications are *hungry*. Image processing, 3D rendering, speech recognition, video conferencing, multimedia authoring — all of these demand serious computational muscle.
On the server side, businesses rely on powerful machines for transaction processing and massive client/server networks, and cloud providers run enormous banks of servers to handle high-volume workloads for countless clients.

So when we talk about "performance" in computer architecture, we're really asking: **how do we design systems that keep up with these ever-growing demands?**

## Speeding Up the Processor

Modern processors don't just run instructions one after another in a straight line. They use a collection of clever techniques to squeeze out as much speed as possible. Let's walk through the big ones.

### Pipelining

Think of pipelining like an assembly line in a factory. Instead of waiting for one instruction to finish completely before starting the next, the processor breaks execution into stages. While one instruction is being executed, the next one is already being decoded, and the one after that is being fetched. This overlap means instructions effectively complete much faster on average.
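The assembly-line picture can be made concrete with a back-of-envelope timing model. This is an idealized sketch (one cycle per stage, no stalls), so treat the numbers as an upper bound on the benefit, not a real CPU measurement:

```python
def execution_cycles(n_instructions: int, n_stages: int, pipelined: bool) -> int:
    """Cycles to run n instructions on a processor with n_stages stages (ideal case)."""
    if pipelined:
        # The first instruction fills the pipeline (n_stages cycles);
        # after that, one instruction completes every cycle.
        return n_stages + (n_instructions - 1)
    # Without pipelining, each instruction occupies all stages serially.
    return n_stages * n_instructions

print(execution_cycles(1000, 5, pipelined=False))  # 5000 cycles
print(execution_cycles(1000, 5, pipelined=True))   # 1004 cycles
```

With 1,000 instructions on a 5-stage pipeline, the ideal speedup approaches 5×, which is why deeper pipelines were such an attractive lever for designers.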
### Branch Prediction

When your code has an `if-else` statement, the processor encounters a "branch" — it doesn't yet know which path the program will take. Rather than stalling and waiting, the processor *guesses* which branch is more likely (using historical patterns) and starts executing speculatively down that path. If it guesses right, great — no time wasted. If it guesses wrong, it rolls back and takes the correct path. Modern processors guess correctly the vast majority of the time.
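One classic way to "guess from historical patterns" is a 2-bit saturating counter. The sketch below is illustrative only — real predictors track many branches and use far more elaborate schemes — but it shows why loop branches are predicted so well:

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 2  # start in "weakly taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Nudge the counter toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch: taken 9 times, then not taken once at loop exit, repeated.
p = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 10
correct = sum(p.predict() == taken or p.update(taken) for taken in [] ) # placeholder
correct = 0
for taken in outcomes:
    correct += (p.predict() == taken)
    p.update(taken)
print(f"{correct}/{len(outcomes)} correct")  # 90/100 correct
```

The two-bit hysteresis means a single loop exit doesn't flip the prediction, so only the exit itself is mispredicted — 90% accuracy on this pattern.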
### Superscalar Execution

A superscalar processor can issue *more than one instruction* in a single clock cycle. Think of it as having multiple assembly lines running simultaneously. Instead of processing instructions one at a time, the processor identifies independent instructions and fires them off in parallel.

### Data Flow Analysis

The processor analyzes which instructions depend on the results of other instructions. If instruction B needs the result of instruction A, they can't run in parallel. But if instructions C and D are completely independent, the processor can reorder and execute them whenever their inputs are ready — even out of the original program order.

### Speculative Execution

Building on branch prediction, the processor can execute instructions *ahead of time*, before it's certain they'll actually be needed. If it turns out they were needed, the results are already available. If not, the results are simply discarded. This keeps the pipeline full and minimizes wasted cycles.

## The Performance Balance Problem

Here's a core challenge in computer design: **not all components run at the same speed.** The processor might be blazing fast, but if it's constantly waiting for data from memory or an I/O device, that speed is wasted.

Think about typical data rates across different I/O devices — an Ethernet modem, a graphics display, a hard disk, an optical drive, a keyboard — they all operate at vastly different speeds. A keyboard sends data at maybe a few bytes per second, while a graphics display might need gigabytes per second.

The architect's job is to **balance the system** — adjusting the organization and architecture so the mismatch between components doesn't create bottlenecks. Some strategies include:

- Making the bus (the data highway between components) wider or faster
- Adding caches between the processor and main memory
- Using buffering schemes so faster components don't have to wait for slower ones
- Building a hierarchy of memory (registers → cache → RAM → disk) so the most frequently used data is always close to the processor
## Improving Chip Organization and Architecture

Over the decades, chip designers have pushed performance forward in three main ways:

**1. Increasing hardware speed.** Shrinking transistors means more gates packed into less space, which raises the clock rate and reduces signal propagation time. Smaller = faster.

**2. Bigger and faster caches.** By dedicating part of the processor chip itself to cache memory, access times drop dramatically compared to going off-chip to main memory.

**3. Smarter organization.** Even without faster hardware, clever architectural changes — like deeper pipelines, more parallelism, and better instruction scheduling — can increase the effective speed of instruction execution.

## The Wall: Problems with Clock Speed and Logic Density

If shrinking transistors makes everything faster, why not just keep shrinking forever? Well, we've hit some very real physical limits.

### Power and Heat

As you pack more transistors together and run them faster, power consumption goes up. More power means more heat. At some point, you simply can't dissipate heat fast enough — the chip would melt or become unreliable. This is often called the "power wall."

### RC Delay

Electrons flowing through wires face resistance (R) and capacitance (C). As components shrink, the wires connecting them get thinner (higher resistance) and closer together (higher capacitance). The product R × C determines signal delay, and it actually *increases* as things get smaller. So while transistors get faster, the wires between them can get slower.

### Memory Latency and Throughput

Even if the processor can crunch numbers at incredible speed, it still has to wait for data from memory. Memory access speed (latency) and transfer speed (throughput) have historically lagged far behind processor speeds. This gap — sometimes called the "memory wall" — is one of the biggest challenges in modern architecture.

## The Multicore Era

Since we can't just keep cranking up the clock speed, the industry took a different approach: **put multiple processors (cores) on a single chip.**

The idea is straightforward — instead of one very fast core, use two, four, eight, or more simpler cores working in parallel. With multiple cores sharing a chip, larger caches became justified, and as on-chip caches grew it made sense to build two and then three levels of cache hierarchy on a single chip.

The catch? Software has to be written to take advantage of parallelism. A single-threaded program won't magically run faster on eight cores. This shift has had profound implications for how we write software.

### Many Integrated Core (MIC)

MIC takes the multicore concept further — a *large* number of general-purpose cores on a single chip. The leap in raw performance is impressive, but the challenge of writing software that effectively uses dozens or hundreds of cores is significant.
### Graphics Processing Units (GPUs)

GPUs were originally designed to perform parallel operations on graphics data — encoding and rendering 2D/3D graphics and processing video. But their massively parallel architecture turns out to be great for any task involving repetitive computations: scientific simulations, machine learning, cryptography, and more. This is sometimes called GPGPU — General-Purpose computing on Graphics Processing Units.

## Computer Clocks: The Heartbeat of the System

Every digital system is driven by a **clock** — a quartz crystal and a converter that produce a constant, regular electrical signal. This signal is the heartbeat of the computer, and it determines *when* events take place inside the hardware.

A few key definitions:

- **Clock period (or clock cycle time):** The time it takes for one complete cycle. For example, 5 nanoseconds.
- **Clock rate:** The inverse of the clock period — how many cycles happen per second. If the clock period is 5 ns, then the clock rate is 1 / (5 × 10⁻⁹) = **200 MHz**.

So when you hear that a processor runs at "3.5 GHz," that means its clock ticks 3.5 billion times per second. Each tick represents one opportunity for the processor to do work.
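The two definitions are just reciprocals of each other; a tiny sketch of the 5 ns example:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate is the reciprocal of the clock period."""
    return 1.0 / period_s

period = 5e-9                   # 5 ns clock period
rate = clock_rate_hz(period)    # cycles per second
print(f"{rate / 1e6:.0f} MHz")  # 200 MHz
```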
## Measuring Performance: CPU Time

Now let's get quantitative. How do we actually measure how fast a processor runs a program?

### The CPU Time Formula

The time to execute a program can be expressed as:

```text
CPU time = CPU clock cycles × clock cycle time
```

Since clock cycle time = 1 / clock rate, this is equivalent to:

```text
CPU time = CPU clock cycles / clock rate
```

How do we figure out the number of clock cycles? That depends on two things — how many instructions are in the program and how many cycles each instruction takes on average:

```text
CPU clock cycles = instruction count × CPI
```

Where **CPI** stands for **Cycles Per Instruction** — the average number of clock cycles needed to execute one instruction.

Putting it all together:

```text
CPU time = instruction count × CPI × clock cycle time
CPU time = instruction count × CPI / clock rate
```

### A Worked Example

Suppose a computer has a clock rate of **50 MHz** and we want to run a program with **1,000 instructions**, where the average CPI is **3.5**.

```text
CPU time = instruction count × CPI / clock rate
         = 1000 × 3.5 / (50 × 10⁶)
         = 3500 / 50,000,000
         = 70 microseconds
```
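As a sanity check, here is the same worked example in a few lines of Python (the function name is mine; the formula is exactly the one above):

```python
def cpu_time_s(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count × CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

t = cpu_time_s(1000, 3.5, 50e6)  # 1,000 instructions, CPI 3.5, 50 MHz
print(f"{t * 1e6:.0f} µs")       # 70 µs
```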
### What Happens When Clock Rate Changes?

Say a processor's clock rate goes from **200 MHz** to **250 MHz** and everything else stays the same. The speedup is:

```text
Speedup = (old CPU time) / (new CPU time)
        = clock rate new / clock rate old
        = 250 / 200
        = 1.25×
```

The computer is **25% faster** — but only if we assume the CPI and instruction count don't change, which in the real world may not always hold true.

## Computing CPI in Practice

Different types of instructions take different numbers of cycles. A simple ALU operation might take 1 cycle, while a memory load might take 5. If we know the instruction mix (what fraction of instructions are of each type), we can calculate the overall CPI:

```text
CPI = Σ (CPIᵢ × Fᵢ)
```

Where **CPIᵢ** is the cycles for instruction type *i*, and **Fᵢ** is the fraction of instructions that are type *i*.

### Example

| Operation | Fraction (Fᵢ) | CPIᵢ | CPIᵢ × Fᵢ | % of Time |
| --------- | :-----------: | :--: | :-------: | :-------: |
| ALU       |      50%      |  1   |    0.5    |    23%    |
| Load      |      20%      |  5   |    1.0    |    45%    |
| Store     |      10%      |  3   |    0.3    |    14%    |
| Branch    |      20%      |  2   |    0.4    |    18%    |
| **Total** |   **100%**    |      |  **2.2**  | **100%**  |

So the weighted average CPI is **2.2 cycles per instruction**. Notice that even though loads are only 20% of instructions, they account for 45% of the execution time because they're so expensive. This tells the architect where to focus optimization efforts!
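The table's arithmetic can be reproduced directly — the `mix` dictionary below just encodes the Fraction and CPIᵢ columns:

```python
# (fraction of instructions, cycles per instruction) for each class
mix = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}

cpi = sum(f * c for f, c in mix.values())  # weighted average CPI
print(f"CPI = {cpi:.1f}")                  # CPI = 2.2

for name, (f, c) in mix.items():
    # Each class's share of execution time is its cycle contribution / total CPI.
    print(f"{name}: {f * c / cpi:.0%} of time")
```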
## Performance Factors: What Affects What?

Not every design decision affects every aspect of performance. Here's a simplified view:

| Factor                             | Instruction Count | CPI | Clock Rate |
| ---------------------------------- | :---------------: | :-: | :--------: |
| Instruction Set Architecture (ISA) |         ✓         |  ✓  |            |
| Compiler Technology                |         ✓         |  ✓  |            |
| Processor Implementation           |                   |  ✓  |     ✓      |
| Cache and Memory Hierarchy         |                   |  ✓  |     ✓      |

The ISA defines *what* instructions exist (affecting instruction count and CPI). The compiler decides *which* instructions to use — influencing instruction count and CPI, but not how fast the clock ticks. The hardware implementation determines how fast each instruction actually executes.

## Beware of Misleading Metrics

### MIPS (Millions of Instructions Per Second)

```text
MIPS = instruction count / (execution time × 10⁶)
```

MIPS is easy to understand and measure, but it can be deeply misleading. A processor that executes many simple instructions per second might have a higher MIPS rating than one executing fewer but more powerful instructions — even though the second processor finishes the actual task faster. MIPS doesn't account for the fact that not all instructions do the same amount of work.
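A small illustration with made-up numbers: two compilations of the same task, where the version with the *lower* MIPS rating actually finishes sooner:

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """MIPS = instruction count / (execution time × 10⁶)."""
    return instruction_count / (execution_time_s * 1e6)

# Hypothetical numbers: A uses many simple instructions,
# B uses fewer, more powerful instructions for the same task.
a_time, a_insts = 0.50, 400_000_000
b_time, b_insts = 0.40, 200_000_000

print(f"A: {mips(a_insts, a_time):.0f} MIPS, done in {a_time}s")  # 800 MIPS
print(f"B: {mips(b_insts, b_time):.0f} MIPS, done in {b_time}s")  # 500 MIPS
# B "loses" on MIPS yet completes the task first.
```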
### MFLOPS (Millions of Floating-Point Operations Per Second)

```text
MFLOPS = floating-point operations / (execution time × 10⁶)
```

MFLOPS has the same advantages and drawbacks as MIPS, with the additional limitation that it only measures floating-point work — it tells you nothing about integer performance, I/O, or anything else.

### The MHz/GHz Trap

1 Hertz = 1 cycle per second. So 1 GHz = 1 billion cycles per second.

It's tempting to compare processors purely by clock speed, but this ignores CPI differences. A classic example: an 800 MHz Pentium III could outperform a 1 GHz Pentium 4 on certain tasks because the Pentium III had a lower CPI — it did more useful work per cycle.

**Bottom line:** Clock speed, MIPS, and MFLOPS are all *partial* metrics. Relying on any one of them alone can lead you to the wrong conclusion about which system is actually faster for your workload.
## Benchmarks: Measuring Performance Properly

Since individual metrics can be misleading, the industry uses **benchmarks** — standardized programs designed to test real performance. A good benchmark should be:

1. Written in a high-level language (so it's portable across machines)
2. Representative of a real programming domain (systems, numerical, commercial)
3. Easy to measure
4. Widely distributed

### SPEC: The Industry Standard

The **Standard Performance Evaluation Corporation (SPEC)** is an industry consortium that defines and maintains the most widely recognized benchmark suites. SPEC benchmarks are used everywhere — by researchers, hardware vendors, and buyers — to compare systems on a level playing field.

### SPEC CPU2017

The flagship suite for processor-intensive workloads is **SPEC CPU2017**. It's designed for applications that spend most of their time doing computation rather than I/O.
The suite consists of **20 integer benchmarks** and **23 floating-point benchmarks** written in C, C++, and Fortran, containing over **11 million lines of code** in total.

The benchmarks cover a fascinating range of real-world tasks: Perl interpreting, GCC compilation, route planning, video compression (x264), chess AI (alpha-beta search), Go AI (Monte Carlo tree search), Sudoku solving, weather forecasting, molecular dynamics, 3D rendering, ocean modeling, and more.

### Key SPEC Terminology

- **System under test:** The system you're evaluating.
- **Reference machine:** A baseline machine SPEC uses to establish reference times for each benchmark.
- **Base metric:** Results compiled with strict, conservative compiler settings (required for all reported results).
- **Peak metric:** Results where users can aggressively optimize compiler settings to squeeze out maximum performance.
- **Speed metric:** How long a single task takes to complete — useful for comparing single-threaded performance.
- **Rate metric:** How many tasks a system can complete in a given time — a throughput measure that leverages multiple processors.

The SPEC evaluation process follows a structured workflow: get the benchmark program, run it multiple times, select the median result, compute the ratio against the reference machine, and finally compute the geometric mean across all benchmarks to get a single aggregate score.
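That last step — the geometric mean of the per-benchmark ratios — looks like this (the ratios below are hypothetical, not real SPEC results):

```python
import math

def aggregate_score(ratios: list[float]) -> float:
    """Geometric mean of per-benchmark ratios (reference time / measured time)."""
    return math.prod(ratios) ** (1 / len(ratios))

ratios = [2.0, 8.0, 4.0]                  # hypothetical per-benchmark ratios
print(f"{aggregate_score(ratios):.2f}")   # 4.00
```

The geometric mean is used rather than the arithmetic mean so that one outlier benchmark can't dominate the aggregate score.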
## Wrapping Up

Performance in computer architecture isn't just about having the fastest clock. It's a delicate balancing act between processor speed, memory access, I/O throughput, and the software that ties it all together. As we've hit physical limits on clock speed, the industry has pivoted to parallelism — multicore, MIC, and GPUs — putting the burden on software to exploit these architectures.

When evaluating performance, always look beyond a single number. Understand CPI, clock rate, and instruction count together. Use standardized benchmarks like SPEC rather than relying on MIPS or raw clock speed. And remember: the fastest system is the one that finishes *your workload* in the least time.

**Key takeaways:**

- CPU time = instruction count × CPI / clock rate — this is the fundamental equation
- Clock speed alone is misleading; CPI matters just as much
- Physical limits (power, RC delay, memory wall) ended the era of simple clock speed scaling
- Multicore and GPU parallelism are the present and future of performance
- SPEC benchmarks provide the most reliable, standardized performance comparisons
l8-0Y33tPck0lYZnzPjFdJkrOnBN7HkAO3pls",[1179,2659,3230,4062,5301],{"id":1180,"title":1181,"author":6,"body":1182,"date":861,"description":284,"draft":862,"edited_at":2652,"extension":863,"featured_image":864,"meta":2653,"navigation":866,"path":2654,"pinned":862,"seo":2655,"sitemap":2656,"stem":2657,"tags":864,"__hash__":2658},"blog\u002Fblog\u002FWeek-1-—-Introduction-to-Computer-Systems.md","Introduction to Computer Systems",{"type":8,"value":1183,"toc":2623},[1184,1188,1191,1208,1211,1222,1226,1229,1233,1240,1281,1285,1292,1312,1316,1319,1341,1348,1351,1355,1366,1369,1387,1398,1402,1409,1435,1438,1441,1444,1451,1465,1468,1471,1485,1488,1495,1499,1518,1572,1575,1589,1593,1603,1607,1614,1628,1636,1640,1643,1713,1716,1720,1723,1778,1785,1788,1792,1795,1799,1804,1807,1848,1851,1855,1861,1864,1867,1893,1897,1904,1978,1985,1989,1992,2006,2009,2028,2239,2242,2246,2253,2256,2266,2273,2276,2280,2290,2294,2297,2318,2332,2336,2343,2350,2360,2363,2401,2405,2408,2428,2434,2438,2449,2468,2483,2486,2499,2503,2520,2523,2526,2546,2552,2556,2559,2566,2598,2602,2609,2612],[11,1185,1187],{"id":1186},"why-this-series-exists","Why this series exists",[16,1189,1190],{},"Computer Systems Architecture (CSA) is one of those subjects that sounds intimidating but is actually about a very simple question:",[1192,1193,1194],"blockquote",{},[16,1195,1196,1199,1200,1199,1205],{},[31,1197,1198],{},"What is"," ",[23,1201,1202],{},[31,1203,1204],{},"inside",[31,1206,1207],{},"a computer, and how do those parts work together to run the software we write?",[16,1209,1210],{},"Every time you open a browser, play a game, or send a message, billions of tiny switches are doing an enormous amount of coordinated work under the hood. CSA is the map of that hidden world. If you are a programmer, understanding the machine below your code will make you write better, faster, and more memory-friendly programs. 
If you are just curious, it is genuinely one of the most elegant engineering stories of the 20th century.

In this first post we will cover the **big picture**: the vocabulary, the mental model, and a short history that explains *how we got here*. Later posts will zoom in on the pieces one by one.

## 1. Computer Architecture vs. Computer Organization

The very first distinction in CSA is between two words that sound like synonyms but mean different things.

### Computer Architecture

Architecture refers to the parts of the system that are **visible to the programmer** — the things that affect how a program behaves logically. Examples include:

- The **instruction set** (what commands the CPU understands, like `ADD`, `LOAD`, `JUMP`)
- The **number of bits** used for data (8-bit, 32-bit, 64-bit)
- The **I/O mechanism** (how the CPU talks to devices)
- The **addressing technique** (how memory locations are named)

If you change the architecture, programs written for the old one may no longer work. It is the *contract* between hardware and software.

### Computer Organization

Organization refers to the **operational units and how they are interconnected** — the actual hardware implementation that realises the architecture.
Examples:

- **Control signals** between components
- **Interfaces** between the computer and peripherals
- **Memory technology** (DRAM, SRAM, cache levels, etc.)

### An analogy

Think of a car.

- The **architecture** is the driver's interface: steering wheel, pedals, gear stick, dashboard. Every Toyota Camry driver knows how to drive any other Camry because the architecture is the same.
- The **organization** is what is under the hood: the specific engine size, turbocharger, transmission design. Two Camrys can have the *same architecture* but *different organizations* — one is the base model, the other is the sport version. They drive the same, but one is faster and more expensive.

This leads to a key observation from the slides:

> *Same architecture but different organization → different price and performance. Architecture tends to last a long time with only minor changes, while organization changes as technology improves. By changing the organization, the user can decide the performance they want.*

This is literally how Intel and AMD sell you ten different CPUs that all run Windows — same architecture (x86-64), different organizations.

## 2. Structure and Function: the mental model

A computer is a **complex system**.
To understand complex systems, engineers use a universal trick: they break them into **hierarchical levels**, from highest to lowest, and study each level separately. The levels are then tied back together by the interrelationships between them.

At each level, we ask two questions:

- **Structure** — How are the components inter-related? (the wiring, the shape)
- **Function** — What does each component *do* as part of the whole? (the behaviour)

There is a classic question that comes up here: when analysing or designing a computer, do you go **top-down** (start from the whole system and decompose) or **bottom-up** (start from transistors and build up)? Both approaches have value, but in this course we will mostly go top-down: start from what a computer does, then zoom in to how it does it.

## 3. The Four Functions of a Computer

No matter how fancy a computer gets — your phone, a laptop, a supercomputer — it only ever performs **four basic functions**:

1. **Data Processing** — crunching numbers, transforming data.
2. **Data Storage** — keeping data, either briefly (RAM) or for a long time (SSD, hard drive).
3. **Data Movement** — shuffling data between the computer and the outside world.
4. **Control** — orchestrating the other three, deciding what happens when.

Let's look at each one.

### Data Processing

The computer takes data in, transforms it somehow, and produces new data out. Adding two numbers, resizing an image, decoding a video — all data processing.

### Data Storage

Data has to live *somewhere*. Storage is split into two flavours:

- **Short-term storage** — fast but volatile (it disappears when the power goes off). This is RAM.
- **Long-term storage** — slower but persistent.
This is your SSD, hard drive, or USB stick.

### Data Movement

There are two sub-categories worth knowing:

- **Data communications** — moving data over long distances or between remote devices (think: Wi-Fi, Ethernet, the internet).
- **Input/Output (I/O)** — moving data to and from peripherals directly connected to the computer (keyboard, mouse, screen, printer).

### Control

A user-defined **control algorithm** decides when each of the other three functions happens, in what order, and in response to what. Without control, the other parts are just a pile of capabilities with no coordinator.

### The four functions, visualised

The slides use a neat diagram with four circles — *Movement*, *Control*, *Storage*, *Processing* — connected in a way that shows how control sits in the middle, orchestrating everything.
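As a toy illustration (a sketch of my own, not code from the slides), the same division of labour can be phrased in a few lines: one control routine deciding when movement, storage, and processing happen.

```python
# Toy model of the four functions. All names here are my own invention.

storage = {}  # Data Storage: a place to keep data around


def move_in():
    # Data Movement: bring data in from the outside world
    return [3, 1, 4, 1, 5]


def process(data):
    # Data Processing: transform the data somehow
    return sorted(data)


def control():
    # Control: orchestrate the other three, deciding what happens when
    storage["raw"] = move_in()                    # movement, then storage
    storage["sorted"] = process(storage["raw"])   # processing, then storage
    return storage["sorted"]                      # movement back out


print(control())  # [1, 1, 3, 4, 5]
```

Note that `control` does no data work itself; it only decides the order in which the other functions run, which is exactly its role in the diagram.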
Different patterns of the arrows highlight different *modes* of operation:

| Mode | What is happening |
| --- | --- |
| **Data storage device** | Data moves between the external environment and storage (read/write). |
| **Data movement device** | Data is transferred from one peripheral or communication line to another (no storage, no processing). |
| **Data processing, storage-only** | Data already in storage is processed and written back — nothing leaves the machine. |
| **Data processing, involving external environment** | Data comes in from outside, is processed, and results go back out. This is the full pipeline. |

Two important notes from the slides:

1. **All four operations involve the control function.** Control is always in the loop.
2. **The functions can run simultaneously with multi-core processors.** In older single-core machines, things happened one at a time; today, your laptop is doing storage, movement, and processing all at once.

## 4. Structure: how the pieces fit together

Now that we know *what* a computer does, let's look at *how* it is built.
We will zoom in level by level.

### Level 1: The computer and the world outside it

At the highest level, the computer is a single blob that interacts with the **external environment** through:

- **Peripherals** (keyboard, mouse, monitor, printer, etc.)
- **Communication lines** (network cables, Wi-Fi, Bluetooth)

Inside that blob, the essential jobs are **Storage** and **Processing**. Everything else at this level is just "the outside world".

### Level 2: The internal structure of a computer

If we crack open the computer blob, we find four major internal components:

| Component | What it does |
| --- | --- |
| **Central Processing Unit (CPU)** | Controls the operation of the computer and performs data processing. *The brain.* |
| **Main memory** | Stores data. *The short-term workspace.* |
| **I/O** | Moves data between the computer and its external environment. *The hands, eyes, and mouth.* |
| **System Interconnection** | Mechanism that provides communication among CPU, main memory, and I/O. This is what we usually call the **System Bus**. *The nervous system.* |

Everything else you have heard of — hard drive, GPU, network card — is either part of these or hangs off them via I/O.

### Level 3: Inside the CPU

Crack the CPU open and we find another four components:

| Component | What it does |
| --- | --- |
| **Control Unit (CU)** | Controls the operation of the CPU (and, by extension, the whole computer). |
| **Arithmetic and Logic Unit (ALU)** | Performs the computer's data processing functions — the actual adding, comparing, ANDing, ORing. |
| **Registers** | Provide tiny, super-fast storage *inside* the CPU. |
| **CPU Interconnection** | Provides communication between the control unit, ALU, and registers. |

Notice how the structure is **recursive**: at each level, we see "something that processes, something that stores, something that controls, something that connects them." This is not a coincidence — it is a reflection of the four functions from Section 3, applied at every scale.

This recursive pattern is one of the most beautiful ideas in computer architecture. We will see it again and again.

## 5. A Brief History of Computers

Now the fun part.
Let us walk through the four generations of computer hardware and see how each jump unlocked what came next.

### Generation 1: Vacuum Tubes (1940s–1950s)

#### ENIAC — the first general-purpose electronic computer

ENIAC (Electronic Numerical Integrator and Computer) was the first general-purpose computer built with vacuum tubes. Some eye-watering stats:

- Weighed **30 tons**
- Occupied **1,500 square feet** of floor space
- Used **18,000 vacuum tubes**
- Consumed **140 kW** of power (enough to power ~100 modern homes)
- Was a **decimal machine**: 10 vacuum tubes were used to represent a single decimal digit
- Was programmed by **manually flipping switches**

Imagine programming a machine the size of a tennis court by walking around flipping thousands of switches. That was software engineering in 1946.

#### EDVAC and the Von Neumann breakthrough

EDVAC (Electronic Discrete Variable Computer) was proposed by John Von Neumann and introduced the idea that still defines virtually every computer today: the **stored-program concept**.

Instead of rewiring the machine for every new program, you store the program in memory alongside the data, just as numbers.
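A minimal sketch of the stored-program idea (my own illustration with made-up opcodes, not the real EDVAC encoding): memory is one array of numbers, and the machine simply treats some of those numbers as instructions.

```python
# Stored-program toy: instructions and data share the SAME memory.
# Made-up opcodes: 1 = LOAD addr, 2 = ADD addr, 3 = STORE addr, 0 = HALT.
memory = [
    1, 8,       # LOAD  mem[8] into the accumulator
    2, 9,       # ADD   mem[9] to the accumulator
    3, 10,      # STORE the accumulator into mem[10]
    0, 0,       # HALT
    20, 22, 0,  # the data: two operands and a slot for the result
]

pc, ac = 0, 0  # program counter and accumulator
while True:
    opcode, operand = memory[pc], memory[pc + 1]  # fetch
    pc += 2
    if opcode == 1:                               # decode + execute
        ac = memory[operand]
    elif opcode == 2:
        ac += memory[operand]
    elif opcode == 3:
        memory[operand] = ac
    else:
        break                                     # HALT

print(memory[10])  # 42
```

The "program" is just the numbers at the front of `memory`; change those numbers and the same loop runs a different program, with no rewiring.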
The computer then reads its own instructions one by one.

EDVAC also:

- Had the **basic internal structure of a modern CPU** — control unit, memory, ALU, I/O (sound familiar? Section 4!)
- Used a **binary system** — 1s and 0s instead of decimal digits
- Had **1,000 storage locations**, each **40 bits** wide, for both data and instructions

#### The IAS computer and its registers

The IAS machine (built at the Institute for Advanced Study, a relative of EDVAC) is often used as the textbook example of early stored-program architecture. It introduced several specialised **registers** that still exist, in one form or another, in modern CPUs:

| Register | Role |
| --- | --- |
| **Memory Buffer Register (MBR)** | Holds a word being stored to memory or sent to I/O, or a word just read in. |
| **Memory Address Register (MAR)** | Specifies the address in memory of the word to be written from, or read into, the MBR. |
| **Instruction Register (IR)** | Holds the 8-bit opcode of the current instruction. |
| **Instruction Buffer Register (IBR)** | Temporarily holds the right-hand instruction from the most recently fetched word. |
| **Program Counter (PC)** | Holds the address of the next instruction pair to be fetched from memory. |
| **Accumulator (AC) and Multiplier Quotient (MQ)** | Temporarily hold operands and results of ALU operations. |

You do not need to memorise these today, but notice the pattern: *every register has one specific job*. CPU design is all about having the right little box in the right place.

### Generation 2: Transistors (late 1950s–1960s)

In 1947, Bell Labs invented the transistor. It was a revolution:

- **Smaller, cheaper, and dissipated less heat** than a vacuum tube
- A **solid-state** device made from silicon — no fragile glass, no vacuum

To appreciate the jump, picture the difference:

- A **vacuum tube** requires wires, metal plates, a glass capsule, and a vacuum inside. Picture a light bulb with extra plumbing.
- A **transistor** is a tiny chunk of silicon with three leads. That's it.

The flagship second-generation machine was the **IBM 7094**.
Over the course of the IBM 700/7000 series (see the table below), you can watch hardware improve dramatically in just 12 years:

| Model | Year | CPU Tech | Memory Tech | Cycle Time (µs) | Memory (K) | Opcodes | Index Registers | Floating Point | Speed (relative to 701) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 701 | 1952 | Vacuum tubes | Electrostatic tubes | 30 | 2–4 | 24 | 0 | no | 1× |
| 704 | 1955 | Vacuum tubes | Core | 12 | 4–32 | 80 | 3 | yes | 2.5× |
| 709 | 1958 | Vacuum tubes | Core | 12 | 32 | 140 | 3 | yes | 4× |
| 7090 | 1960 | **Transistor** | Core | 2.18 | 32 | 169 | 3 | yes | 25× |
| 7094 I | 1962 | Transistor | Core | 2 | 32 | 185 | 7 | yes (double precision) | 30× |
| 7094 II | 1964 | Transistor | Core | 1.4 | 32 | 185 | 7 | yes (double precision) | 50× |

From 1× to 50× speed in twelve years, purely from hardware progress. And notice the cycle time dropping from 30 µs to 1.4 µs — that is the clock speed getting faster.

#### Data channels: giving the CPU a break

A major architectural shift with the IBM 7094 was the introduction of **data channels**. Here is the idea:

In the old IAS design, the CPU personally supervised every byte moving to or from an I/O device. That is like a CEO answering every email in the company — a terrible use of expensive brainpower.

A **data channel** is a small, specialised processor with its own instruction set, dedicated to I/O. The CPU just signals the data channel ("please read this file"), and the channel does the work on its own and reports back when done. **The burden on the CPU is reduced**, and the CPU can get on with real computing.

Alongside this was the **multiplexor**, which manages how data flows to and from the CPU or memory when there are multiple channels competing for attention.

This pattern — offloading specialised work to specialised hardware so the CPU can focus — is everywhere in modern systems.
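That offloading pattern can be sketched with a background worker standing in for the data channel (a modern analogy of mine, not the 7094's actual channel hardware): the main thread plays the CPU.

```python
# Offloading in miniature: the "CPU" (main thread) hands an I/O request
# to the "data channel" (a worker thread) and keeps computing meanwhile.
import queue
import threading

requests = queue.Queue()   # CPU -> channel: "please do this I/O"
completed = queue.Queue()  # channel -> CPU: "done, here is the result"


def data_channel():
    # The channel handles requests on its own, with no CPU supervision.
    while (item := requests.get()) is not None:
        completed.put(f"read {item}")  # stand-in for the actual device I/O


channel = threading.Thread(target=data_channel)
channel.start()

requests.put("file.txt")   # CPU signals the channel...
busy = sum(range(1_000))   # ...and gets on with real computing
result = completed.get()   # the channel reports back when done
requests.put(None)         # shut the channel down
channel.join()

print(result)  # read file.txt
```

The two queues stand in for the signalling between CPU and channel; the point is that the main thread never supervises the transfer itself.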
GPUs, DMA controllers, network cards… they are all descendants of that 1962 data channel.

### Generation 3: Integrated Circuits (1958 onwards)

In 1958, Jack Kilby (followed independently by Robert Noyce in 1959) invented the **integrated circuit (IC)** — multiple transistors, resistors, and capacitors fabricated together on a single silicon wafer. This started the era of **microelectronics**.

#### The two building blocks

At the heart of every integrated circuit are just two primitive elements:

- **Gates** — perform Boolean logic (AND, OR, NOT). They take inputs and produce an output when the activate signal tells them to. Gates are used for **data processing**.
- **Memory cells** — binary storage cells with a Read/Write signal. They are used for **data storage**.

And that is it. With enough gates and enough memory cells, wired up in the right pattern, you can build anything — including the computer you are reading this on.

The paths among components handle **data movement**, and **control signals** control when gates and memory cells fire. Look at that: the **four functions again**, now implemented as physical hardware elements on silicon.

#### How ICs are made

The integrated circuit (transistors, resistors, and capacitors) is fabricated on a **silicon wafer**.
A single wafer can hold many chips with the same configuration of gates, memory cells, and I/O connections, which are cut out and packaged individually. This is why chips keep getting cheaper: you make them by the wafer, not one at a time.

#### IBM System/360 — the first computer *family*

When IBM built the System/360 in 1964, they did something strategically brilliant. It used integrated circuit technology and, unfortunately, was *not compatible* with the previous IBM machines — so existing customers had to rewrite their software. But in exchange, IBM offered something new: the industry's **first planned family of computers**.

A "family" meant several models with shared characteristics, so customers could upgrade without relearning everything:

1. **Similar or identical instruction set** — code written for a small model still runs on a bigger one.
2. **Similar or identical operating system** — sysadmins do not need retraining.
3. **Increasing speed** — bigger model = faster clock.
4. **Increasing number of I/O ports** — bigger model = more devices can connect.
5. **Increasing memory size** — bigger model = more data can fit.
6. **Increasing cost** — you pay for what you get.

This is literally how every laptop lineup works today. MacBook Air → Pro → Pro Max. Same macOS, same apps, just more horsepower.
IBM invented that pricing model.

### Generation 4: LSI, VLSI, ULSI (1970s and beyond)

As manufacturing improved, engineers kept cramming more components onto a single chip. The industry labelled successive milestones:

- **LSI** (Large Scale Integration)
- **VLSI** (Very Large Scale Integration)
- **ULSI** (Ultra Large Scale Integration)

The **number of components per chip increased yearly** — the phenomenon famously captured by Moore's Law. Two major leaps happened in this era.

#### Semiconductor memory replaces core memory

In the 1950s and 1960s, memory was built from **rings of ferromagnetic material** called **cores**. Core memory was fast for its time, but had serious downsides:

- **Bulky** — imagine tiny magnetic donuts threaded on wires, by hand
- **Expensive**
- **Destructive readout** — reading a bit erased it, so the value had to be re-written immediately afterwards

In **1970, Fairchild introduced semiconductor memory**. It was **non-destructive** (reading did not erase the bit) and **faster than core**.
This is the ancestor of every DRAM and SRAM chip in use today.

To appreciate the insane scale of this transition, the slides end with one of my favourite images in the whole course:

> **8 Bytes of core memory vs. 8 Gigabytes on a microSD card.**
>
> The core memory looks like a woven sculpture the size of a brick. The microSD card is smaller than a fingernail. And the microSD holds **one billion times more data**.

#### The microprocessor: a whole CPU on one chip

In **1971, Intel released the 4004** — the first chip to contain *all* the components of a CPU on a **single chip**. This was the birth of the **microprocessor**.

Before 1971, a CPU was a large circuit board (or several boards) full of separate chips. After 1971, "the CPU" became one tiny square you could hold in tweezers.
This is the innovation that made personal computers, mobile phones, and embedded systems possible.",[16,2524,2525],{},"From there, microprocessor improvements have followed a few steady trends:",[116,2527,2528,2534,2540],{},[119,2529,2530,2533],{},[31,2531,2532],{},"Increase in register width"," (4-bit → 8-bit → 16-bit → 32-bit → 64-bit)",[119,2535,2536,2539],{},[31,2537,2538],{},"Decrease in clock switching time"," (faster clock = more instructions per second)",[119,2541,2542,2545],{},[31,2543,2544],{},"Other hardware improvements"," (pipelining, caches, branch prediction, multiple cores…)",[16,2547,2548,2549,257],{},"The chip in your phone right now is a direct descendant of the Intel 4004 — only about ",[31,2550,2551],{},"a billion times more capable",
Structure is how they are connected; function is what they do.",[119,2593,2594,2597],{},[31,2595,2596],{},"History"," — vacuum tubes gave us programmable electronic computers, transistors made them practical, integrated circuits made them affordable, and LSI\u002FVLSI put a whole computer in your pocket.\nThe same four functions (processing, storage, movement, control) reappear at every level of abstraction — from a warehouse-sized ENIAC, down to the inside of a modern CPU, down to individual gates and memory cells on silicon. Once you see this pattern, the rest of CSA is a lot less intimidating.",[11,2599,2601],{"id":2600},"where-we-go-next","Where we go next",[16,2603,2604,2605,2608],{},"In the next post we will zoom in on ",[31,2606,2607],{},"the CPU"," — how the control unit and ALU cooperate to fetch, decode, and execute a single instruction. That is where the magic of stored-program computing actually happens.",[16,2610,2611],{},"Until then, a little homework to cement the ideas:",[116,2613,2614,2617,2620],{},[119,2615,2616],{},"Look at your own laptop or phone. Can you name its architecture (e.g. 
ARM, x86-64)?",[119,2618,2619],{},"For each of the four functions, point at something on your desk that performs it.",[119,2621,2622],{},"Ask yourself: in an ENIAC with 18,000 vacuum tubes, how would you debug a failure?\nSee you in Week 2.",{"title":284,"searchDepth":818,"depth":818,"links":2624},[2625,2626,2631,2632,2639,2644,2650,2651],{"id":1186,"depth":818,"text":1187},{"id":1224,"depth":818,"text":1225,"children":2627},[2628,2629,2630],{"id":1231,"depth":824,"text":1232},{"id":1283,"depth":824,"text":1284},{"id":1314,"depth":824,"text":1315},{"id":1353,"depth":818,"text":1354},{"id":1400,"depth":818,"text":1401,"children":2633},[2634,2635,2636,2637,2638],{"id":1437,"depth":824,"text":1415},{"id":1443,"depth":824,"text":1421},{"id":1467,"depth":824,"text":1427},{"id":1487,"depth":824,"text":1433},{"id":1497,"depth":824,"text":1498},{"id":1591,"depth":818,"text":1592,"children":2640},[2641,2642,2643],{"id":1605,"depth":824,"text":1606},{"id":1638,"depth":824,"text":1639},{"id":1718,"depth":824,"text":1719},{"id":1790,"depth":818,"text":1791,"children":2645},[2646,2647,2648,2649],{"id":1797,"depth":824,"text":1798},{"id":1987,"depth":824,"text":1988},{"id":2278,"depth":824,"text":2279},{"id":2403,"depth":824,"text":2404},{"id":2554,"depth":818,"text":2555},{"id":2600,"depth":818,"text":2601},"2026-04-19",{},"\u002Fblog\u002FWeek-1-—-Introduction-to-Computer-Systems",{"title":1181,"description":284},{"loc":2654},"blog\u002FWeek-1-—-Introduction-to-Computer-Systems","94-DJj9_cAojlCQAlJd4kiflgnC9KQOhiMrZBfIwCTE",{"id":4,"title":5,"author":6,"body":2660,"date":861,"description":284,"draft":862,"edited_at":861,"extension":863,"featured_image":864,"meta":3227,"navigation":866,"path":867,"pinned":862,"seo":3228,"sitemap":3229,"stem":870,"tags":864,"__hash__":871},{"type":8,"value":2661,"toc":3185},[2662,2664,2666,2670,2674,2676,2678,2680,2682,2684,2690,2692,2696,2698,2700,2702,2706,2708,2712,2714,2718,2728,2730,2732,2736,2740,2744,2746,2748,2750,2752,2754,2758,2760,2762
,2764,2768,2770,2772,2774,2778,2780,2782,2784,2790,2792,2804,2806,2808,2810,2812,2814,2819,2821,2826,2828,2833,2839,2841,2846,2848,2856,2861,2863,2869,2874,2878,2880,2882,2887,2897,2899,2987,2991,2993,2995,3057,3063,3065,3067,3072,3074,3076,3081,3083,3085,3087,3089,3095,3097,3101,3111,3113,3117,3119,3129,3131,3133,3159,3161,3163,3165,3169,3173],[11,2663,14],{"id":13},[16,2665,18],{},[16,2667,21,2668,26],{},[23,2669,25],{},[16,2671,29,2672],{},[31,2673,33],{},[11,2675,37],{"id":36},[16,2677,40],{},[42,2679,45],{"id":44},[16,2681,48],{},[42,2683,52],{"id":51},[16,2685,55,2686,60,2688,64],{},[57,2687,59],{},[23,2689,63],{},[42,2691,68],{"id":67},[16,2693,71,2694,75],{},[23,2695,74],{},[42,2697,79],{"id":78},[16,2699,82],{},[42,2701,86],{"id":85},[16,2703,89,2704,93],{},[23,2705,92],{},[11,2707,97],{"id":96},[16,2709,100,2710,104],{},[31,2711,103],{},[16,2713,107],{},[16,2715,110,2716,114],{},[31,2717,113],{},[116,2719,2720,2722,2724,2726],{},[119,2721,121],{},[119,2723,124],{},[119,2725,127],{},[119,2727,130],{},[11,2729,134],{"id":133},[16,2731,137],{},[16,2733,2734,143],{},[31,2735,142],{},[16,2737,2738,149],{},[31,2739,148],{},[16,2741,2742,155],{},[31,2743,154],{},[11,2745,159],{"id":158},[16,2747,162],{},[42,2749,166],{"id":165},[16,2751,169],{},[42,2753,173],{"id":172},[16,2755,176,2756,180],{},[23,2757,179],{},[42,2759,184],{"id":183},[16,2761,187],{},[11,2763,191],{"id":190},[16,2765,194,2766],{},[31,2767,197],{},[16,2769,200],{},[16,2771,203],{},[42,2773,207],{"id":206},[16,2775,210,2776,214],{},[23,2777,213],{},[42,2779,218],{"id":217},[16,2781,221],{},[11,2783,225],{"id":224},[16,2785,228,2786,232,2788,236],{},[31,2787,231],{},[23,2789,235],{},[16,2791,239],{},[116,2793,2794,2798],{},[119,2795,2796,247],{},[31,2797,246],{},[119,2799,2800,253,2802,257],{},[31,2801,252],{},[31,2803,256],{},[16,2805,260],{},[11,2807,264],{"id":263},[16,2809,267],{},[42,2811,271],{"id":270},[16,2813,274],{},[276,2815,2817],{"className":2816,"code":280,"language":281},[279],[57,2
818,280],{"__ignoreMap":284},[16,2820,287],{},[276,2822,2824],{"className":2823,"code":291,"language":281},[279],[57,2825,291],{"__ignoreMap":284},[16,2827,296],{},[276,2829,2831],{"className":2830,"code":300,"language":281},[279],[57,2832,300],{"__ignoreMap":284},[16,2834,305,2835,309,2837,313],{},[31,2836,308],{},[31,2838,312],{},[16,2840,316],{},[276,2842,2844],{"className":2843,"code":320,"language":281},[279],[57,2845,320],{"__ignoreMap":284},[42,2847,326],{"id":325},[16,2849,329,2850,333,2852,337,2854,257],{},[31,2851,332],{},[31,2853,336],{},[31,2855,340],{},[276,2857,2859],{"className":2858,"code":344,"language":281},[279],[57,2860,344],{"__ignoreMap":284},[42,2862,350],{"id":349},[16,2864,353,2865,356,2867,360],{},[31,2866,256],{},[31,2868,359],{},[276,2870,2872],{"className":2871,"code":364,"language":281},[279],[57,2873,364],{"__ignoreMap":284},[16,2875,369,2876,373],{},[31,2877,372],{},[11,2879,377],{"id":376},[16,2881,380],{},[276,2883,2885],{"className":2884,"code":384,"language":281},[279],[57,2886,384],{"__ignoreMap":284},[16,2888,305,2889,392,2891,396,2893,400,2895,257],{},[31,2890,391],{},[23,2892,395],{},[31,2894,399],{},[23,2896,395],{},[42,2898,406],{"id":405},[408,2900,2901,2915],{},[411,2902,2903],{},[414,2904,2905,2907,2909,2911,2913],{},[417,2906,419],{},[417,2908,423],{"align":422},[417,2910,391],{"align":422},[417,2912,428],{"align":422},[417,2914,431],{"align":422},[433,2916,2917,2929,2941,2953,2965],{},[414,2918,2919,2921,2923,2925,2927],{},[438,2920,440],{},[438,2922,443],{"align":422},[438,2924,446],{"align":422},[438,2926,449],{"align":422},[438,2928,452],{"align":422},[414,2930,2931,2933,2935,2937,2939],{},[438,2932,457],{},[438,2934,460],{"align":422},[438,2936,463],{"align":422},[438,2938,466],{"align":422},[438,2940,469],{"align":422},[414,2942,2943,2945,2947,2949,2951],{},[438,2944,474],{},[438,2946,477],{"align":422},[438,2948,480],{"align":422},[438,2950,483],{"align":422},[438,2952,486],{"align":422},[414,2954,2955,2957,2959,2
961,2963],{},[438,2956,491],{},[438,2958,460],{"align":422},[438,2960,496],{"align":422},[438,2962,499],{"align":422},[438,2964,502],{"align":422},[414,2966,2967,2971,2975,2979,2983],{},[438,2968,2969],{},[31,2970,509],{},[438,2972,2973],{"align":422},[31,2974,514],{},[438,2976,2977],{"align":422},[518,2978],{},[438,2980,2981],{"align":422},[31,2982,524],{},[438,2984,2985],{"align":422},[31,2986,514],{},[16,2988,531,2989,535],{},[31,2990,534],{},[11,2992,539],{"id":538},[16,2994,542],{},[408,2996,2997,3009],{},[411,2998,2999],{},[414,3000,3001,3003,3005,3007],{},[417,3002,551],{},[417,3004,554],{"align":422},[417,3006,308],{"align":422},[417,3008,559],{"align":422},[433,3010,3011,3023,3033,3045],{},[414,3012,3013,3015,3017,3019],{},[438,3014,566],{},[438,3016,569],{"align":422},[438,3018,569],{"align":422},[438,3020,3021],{"align":422},[518,3022],{},[414,3024,3025,3027,3029,3031],{},[438,3026,580],{},[438,3028,569],{"align":422},[438,3030,569],{"align":422},[438,3032,569],{"align":422},[414,3034,3035,3037,3041,3043],{},[438,3036,591],{},[438,3038,3039],{"align":422},[518,3040],{},[438,3042,569],{"align":422},[438,3044,569],{"align":422},[414,3046,3047,3049,3053,3055],{},[438,3048,604],{},[438,3050,3051],{"align":422},[518,3052],{},[438,3054,569],{"align":422},[438,3056,569],{"align":422},[16,3058,615,3059,619,3061,623],{},[23,3060,618],{},[23,3062,622],{},[11,3064,627],{"id":626},[42,3066,631],{"id":630},[276,3068,3070],{"className":3069,"code":635,"language":281},[279],[57,3071,635],{"__ignoreMap":284},[16,3073,640],{},[42,3075,644],{"id":643},[276,3077,3079],{"className":3078,"code":648,"language":281},[279],[57,3080,648],{"__ignoreMap":284},[16,3082,653],{},[42,3084,657],{"id":656},[16,3086,660],{},[16,3088,663],{},[16,3090,3091,669,3093,673],{},[31,3092,668],{},[23,3094,672],{},[11,3096,677],{"id":676},[16,3098,680,3099,684],{},[31,3100,683],{},[686,3102,3103,3105,3107,3109],{},[119,3104,690],{},[119,3106,693],{},[119,3108,696],{},[119,3110,699],{},[42,3112,703]
,{"id":702},[16,3114,706,3115,710],{},[31,3116,709],{},[42,3118,714],{"id":713},[16,3120,717,3121,720,3123,724,3125,728,3127,732],{},[31,3122,714],{},[31,3124,723],{},[31,3126,727],{},[31,3128,731],{},[16,3130,735],{},[42,3132,739],{"id":738},[116,3134,3135,3139,3143,3147,3151,3155],{},[119,3136,3137,747],{},[31,3138,746],{},[119,3140,3141,753],{},[31,3142,752],{},[119,3144,3145,759],{},[31,3146,758],{},[119,3148,3149,765],{},[31,3150,764],{},[119,3152,3153,771],{},[31,3154,770],{},[119,3156,3157,777],{},[31,3158,776],{},[16,3160,780],{},[11,3162,784],{"id":783},[16,3164,787],{},[16,3166,790,3167,794],{},[23,3168,793],{},[16,3170,3171],{},[31,3172,799],{},[116,3174,3175,3177,3179,3181,3183],{},[119,3176,804],{},[119,3178,807],{},[119,3180,810],{},[119,3182,813],{},[119,3184,816],{},{"title":284,"searchDepth":818,"depth":818,"links":3186},[3187,3188,3195,3196,3197,3202,3206,3207,3212,3215,3216,3221,3226],{"id":13,"depth":818,"text":14},{"id":36,"depth":818,"text":37,"children":3189},[3190,3191,3192,3193,3194],{"id":44,"depth":824,"text":45},{"id":51,"depth":824,"text":52},{"id":67,"depth":824,"text":68},{"id":78,"depth":824,"text":79},{"id":85,"depth":824,"text":86},{"id":96,"depth":818,"text":97},{"id":133,"depth":818,"text":134},{"id":158,"depth":818,"text":159,"children":3198},[3199,3200,3201],{"id":165,"depth":824,"text":166},{"id":172,"depth":824,"text":173},{"id":183,"depth":824,"text":184},{"id":190,"depth":818,"text":191,"children":3203},[3204,3205],{"id":206,"depth":824,"text":207},{"id":217,"depth":824,"text":218},{"id":224,"depth":818,"text":225},{"id":263,"depth":818,"text":264,"children":3208},[3209,3210,3211],{"id":270,"depth":824,"text":271},{"id":325,"depth":824,"text":326},{"id":349,"depth":824,"text":350},{"id":376,"depth":818,"text":377,"children":3213},[3214],{"id":405,"depth":824,"text":406},{"id":538,"depth":818,"text":539},{"id":626,"depth":818,"text":627,"children":3217},[3218,3219,3220],{"id":630,"depth":824,"text":631},{"id":643,"depth":824,
"text":644},{"id":656,"depth":824,"text":657},{"id":676,"depth":818,"text":677,"children":3222},[3223,3224,3225],{"id":702,"depth":824,"text":703},{"id":713,"depth":824,"text":714},{"id":738,"depth":824,"text":739},{"id":783,"depth":818,"text":784},{},{"title":5,"description":284},{"loc":867},{"id":3231,"title":3232,"author":6,"body":3233,"date":2652,"description":4055,"draft":862,"edited_at":2652,"extension":863,"featured_image":864,"meta":4056,"navigation":866,"path":4057,"pinned":862,"seo":4058,"sitemap":4059,"stem":4060,"tags":864,"__hash__":4061},"blog\u002Fblog\u002FA-Top-Level-View-of-Computer-Function-and-Interconnection.md","A Top-Level View of Computer Function and Interconnection",{"type":8,"value":3234,"toc":4026},[3235,3242,3245,3249,3256,3288,3292,3295,3319,3322,3336,3340,3347,3351,3376,3380,3383,3408,3412,3423,3443,3454,3471,3474,3478,3485,3495,3499,3552,3556,3566,3584,3587,3591,3594,3624,3628,3638,3642,3645,3655,3660,3670,3676,3721,3725,3728,3739,3743,3750,3754,3760,3770,3780,3786,3789,3793,3800,3804,3811,3835,3839,3908,3912,3918,3924,3938,3942,3949,3964,3967,3991,3995,3998,4013,4023],[16,3236,3237,3238,3241],{},"So you've heard that computers are made of a processor, memory, and I\u002FO devices — but how do they actually ",[23,3239,3240],{},"work together","? This post walks through the big picture: how a computer fetches and runs instructions, how it deals with interrupts, and how all its components talk to each other through buses and point-to-point links.",[16,3243,3244],{},"If you're new to computer architecture, this is a great starting point. Let's dive in.",[11,3246,3248],{"id":3247},"_1-the-von-neumann-architecture","1. The Von Neumann Architecture",[16,3250,3251,3252,3255],{},"Almost every modern computer traces its design back to ideas developed by ",[31,3253,3254],{},"John von Neumann"," at the Institute for Advanced Study in Princeton. 
The architecture rests on three core principles:",[686,3257,3258,3264,3274],{},[119,3259,3260,3263],{},[31,3261,3262],{},"Unified memory"," — Both data and instructions live in the same read-write memory. There isn't one memory for programs and another for data; they share the same space.",[119,3265,3266,3269,3270,3273],{},[31,3267,3268],{},"Address-based access"," — Memory contents are referenced by their ",[23,3271,3272],{},"location"," (address), regardless of whether the content is an integer, a character, or a machine instruction.",[119,3275,3276,3279,3280,3283,3284,3287],{},[31,3277,3278],{},"Sequential execution"," — The processor works through instructions one after another, in order, unless an instruction explicitly tells it to jump somewhere else.\nThis might sound obvious today, but the alternative — a ",[31,3281,3282],{},"hardwired program",", where you physically rewire components to change what the computer does — was once the norm. The von Neumann model gave us the power of ",[23,3285,3286],{},"software",": change the program in memory, and the same hardware does something completely different.",[42,3289,3291],{"id":3290},"the-three-core-components","The Three Core Components",[16,3293,3294],{},"At the highest level, a computer is built from three types of modules:",[116,3296,3297,3303,3313],{},[119,3298,3299,3302],{},[31,3300,3301],{},"Processor (CPU)"," — Reads instructions and data, performs computation, writes results, and coordinates everything via control signals. It also receives interrupt signals (more on that soon).",[119,3304,3305,3308,3309,3312],{},[31,3306,3307],{},"Memory"," — A collection of ",[23,3310,3311],{},"N"," words, each with a unique numerical address (0, 1, …, N−1). The processor can read from or write to any address.",[119,3314,3315,3318],{},[31,3316,3317],{},"I\u002FO Modules"," — From the computer's internal perspective, I\u002FO works a lot like memory — there are read and write operations. 
A single I\u002FO module may control multiple external devices (keyboard, display, disk, etc.).",[16,3320,3321],{},"Two special registers sit between the processor and memory:",[116,3323,3324,3330],{},[119,3325,3326,3329],{},[31,3327,3328],{},"MAR (Memory Address Register)"," — Holds the address of the memory location the processor wants to access.",[119,3331,3332,3335],{},[31,3333,3334],{},"MBR (Memory Buffer Register)"," — Holds the data being written to or read from that address.",[11,3337,3339],{"id":3338},"_2-the-instruction-cycle","2. The Instruction Cycle",[16,3341,3342,3343,3346],{},"The fundamental rhythm of a processor is the ",[31,3344,3345],{},"instruction cycle",": fetch an instruction, then execute it. Over and over again. Let's break that down.",[42,3348,3350],{"id":3349},"the-fetch-phase","The Fetch Phase",[686,3352,3353,3362,3365,3368,3373],{},[119,3354,706,3355,3357,3358,3361],{},[31,3356,1964],{}," holds the address of the ",[23,3359,3360],{},"next"," instruction to fetch.",[119,3363,3364],{},"The processor reads the instruction at that address from memory.",[119,3366,3367],{},"The PC is incremented so it points to the following instruction.",[119,3369,3370,3371,257],{},"The fetched instruction is loaded into the ",[31,3372,1944],{},[119,3374,3375],{},"The processor decodes the instruction and figures out what to do.",[42,3377,3379],{"id":3378},"the-execute-phase","The Execute Phase",[16,3381,3382],{},"Once the processor knows what the instruction says, the action falls into one of four categories:",[116,3384,3385,3391,3397,3403],{},[119,3386,3387,3390],{},[31,3388,3389],{},"Processor–Memory"," — Transfer data between the CPU and main memory (load\u002Fstore).",[119,3392,3393,3396],{},[31,3394,3395],{},"Processor–I\u002FO"," — Transfer data between the CPU and an I\u002FO module.",[119,3398,3399,3402],{},[31,3400,3401],{},"Data processing"," — Perform arithmetic or logical operations on data.",[119,3404,3405,3407],{},[31,3406,1433],{}," — Change 
the sequence of execution (e.g., jump to a different address).",[42,3409,3411],{"id":3410},"a-simple-example","A Simple Example",[16,3413,3414,3415,3418,3419,3422],{},"Imagine we want to add the contents of memory location ",[31,3416,3417],{},"940"," to the contents of location ",[31,3420,3421],{},"941",". With a simple instruction set, this might take three instruction cycles:",[686,3424,3425,3431,3437],{},[119,3426,3427,3430],{},[31,3428,3429],{},"LOAD 940"," — Fetch the value at address 940 into the accumulator.",[119,3432,3433,3436],{},[31,3434,3435],{},"ADD 941"," — Fetch the value at address 941 and add it to the accumulator.",[119,3438,3439,3442],{},[31,3440,3441],{},"STORE 941"," — Write the result from the accumulator back to address 941.\nEach of these is one fetch + one execute. Three cycles total.",[16,3444,3445,3446,3449,3450,3453],{},"Now, some processors have more powerful instructions. Consider the PDP-11 instruction ",[57,3447,3448],{},"ADD B, A",", which does all of this in a ",[23,3451,3452],{},"single"," instruction cycle — but that single execute phase is more complex:",[686,3455,3456,3459,3462,3465,3468],{},[119,3457,3458],{},"Fetch the ADD instruction.",[119,3460,3461],{},"Read memory location A into the processor.",[119,3463,3464],{},"Read memory location B into the processor (the CPU needs two internal registers to hold both values).",[119,3466,3467],{},"Add the two values.",[119,3469,3470],{},"Write the result back to location A.",[16,3472,3473],{},"The takeaway: richer instructions can reduce the number of cycles, but each cycle does more work.",[11,3475,3477],{"id":3476},"_3-interrupts","3. Interrupts",[16,3479,3480,3481,3484],{},"Here's a problem. External devices like printers and disks are ",[23,3482,3483],{},"much"," slower than the processor. If the CPU sends data to a printer and then just waits for the printer to finish, it wastes thousands of instruction cycles doing nothing. 
That's terrible for performance.",[16,3486,3487,3490,3491,3494],{},[31,3488,3489],{},"Interrupts"," solve this by letting other modules (I\u002FO controllers, timers, etc.) signal the processor that something needs attention — ",[23,3492,3493],{},"without"," forcing the CPU to sit idle.",[42,3496,3498],{"id":3497},"classes-of-interrupts","Classes of Interrupts",[408,3500,3501,3511],{},[411,3502,3503],{},[414,3504,3505,3508],{},[417,3506,3507],{},"Class",[417,3509,3510],{},"What triggers it",[433,3512,3513,3523,3533,3542],{},[414,3514,3515,3520],{},[438,3516,3517],{},[31,3518,3519],{},"Program",[438,3521,3522],{},"A condition arising from instruction execution — arithmetic overflow, division by zero, illegal instruction, or an out-of-bounds memory access.",[414,3524,3525,3530],{},[438,3526,3527],{},[31,3528,3529],{},"Timer",[438,3531,3532],{},"A hardware timer inside the processor fires periodically, letting the OS perform housekeeping tasks on a regular schedule.",[414,3534,3535,3539],{},[438,3536,3537],{},[31,3538,1689],{},[438,3540,3541],{},"An I\u002FO controller signals that an operation has completed normally, that it needs service, or that an error occurred.",[414,3543,3544,3549],{},[438,3545,3546],{},[31,3547,3548],{},"Hardware failure",[438,3550,3551],{},"Something physical went wrong — power failure, memory parity error, etc.",[42,3553,3555],{"id":3554},"how-interrupts-improve-efficiency","How Interrupts Improve Efficiency",[16,3557,3558,3561,3562,3565],{},[31,3559,3560],{},"Without interrupts:"," The user program calls WRITE, prepares the I\u002FO data (code segment 4), issues the I\u002FO command, and then ",[23,3563,3564],{},"waits"," until the device finishes before running the completion code (segment 5). During that wait, the CPU is idle.",[16,3567,3568,3571,3572,3575,3576,3579,3580,3583],{},[31,3569,3570],{},"With interrupts:"," The program issues the I\u002FO command and then ",[23,3573,3574],{},"keeps executing other instructions",". 
The I\u002FO device works concurrently in the background. When it finishes, it sends an ",[31,3577,3578],{},"interrupt request signal",". The processor pauses its current work, jumps to an ",[31,3581,3582],{},"interrupt handler"," (a small routine, usually part of the OS, that services the device), and then resumes the original program right where it left off.",[16,3585,3586],{},"This is dramatically more efficient. The CPU stays busy doing useful work instead of waiting.",[42,3588,3590],{"id":3589},"what-happens-during-an-interrupt","What Happens During an Interrupt",[16,3592,3593],{},"When the processor detects an interrupt (typically checked at the end of each instruction cycle):",[686,3595,3596,3606,3612,3618],{},[119,3597,3598,3601,3602,3605],{},[31,3599,3600],{},"Suspend"," the current program and ",[31,3603,3604],{},"save its context"," (register contents, PC value, status flags) — usually onto a system stack.",[119,3607,3608,3611],{},[31,3609,3610],{},"Set the PC"," to the starting address of the appropriate interrupt handler.",[119,3613,3614,3617],{},[31,3615,3616],{},"Fetch and execute"," the interrupt handler instructions (determine the interrupt source, perform the needed action).",[119,3619,3620,3623],{},[31,3621,3622],{},"Restore"," the saved context and resume the interrupted program.\nThere is some overhead — the handler must figure out what caused the interrupt and respond — but the time saved by not idling far outweighs it.",[42,3625,3627],{"id":3626},"short-io-wait-vs-long-io-wait","Short I\u002FO Wait vs. Long I\u002FO Wait",[16,3629,3630,3631,3634,3635,257],{},"If the I\u002FO operation finishes before the program issues its next WRITE, everything flows smoothly. But if the program reaches a ",[23,3632,3633],{},"second"," WRITE before the first I\u002FO operation completes, it has to wait — the device is still busy. 
This is a ",[31,3636,3637],{},"long I\u002FO wait",[42,3639,3641],{"id":3640},"multiple-interrupts","Multiple Interrupts",[16,3643,3644],{},"What if several devices interrupt at the same time? Two strategies:",[16,3646,3647,3650,3651,3654],{},[31,3648,3649],{},"1. Disabled (sequential) approach:","\nWhile handling one interrupt, the processor ",[23,3652,3653],{},"disables"," further interrupts. Any new interrupt stays pending. Once the current handler finishes and re-enables interrupts, the processor checks for and services the next pending interrupt.",[116,3656,3657],{},[119,3658,3659],{},"Simple, but it ignores urgency. A time-critical interrupt might have to wait behind a low-priority one.",[16,3661,3662,3665,3666,3669],{},[31,3663,3664],{},"2. Priority-based (nested) approach:","\nEach interrupt source has a ",[31,3667,3668],{},"priority level",". A higher-priority interrupt can preempt (interrupt) a lower-priority handler. Lower-priority interrupts must wait.",[16,3671,3672,3675],{},[31,3673,3674],{},"Example:"," Suppose we have three devices with these priorities — Printer: 2, Disk: 4, Communications line: 5.",[116,3677,3678,3685,3691,3697,3703,3709,3715],{},[119,3679,3680,3681,3684],{},"At ",[23,3682,3683],{},"t = 0",", the user program starts.",[119,3686,3680,3687,3690],{},[23,3688,3689],{},"t = 10",", the printer interrupts. The user program's state is saved; the printer ISR (Interrupt Service Routine) begins.",[119,3692,3680,3693,3696],{},[23,3694,3695],{},"t = 15",", the communications line interrupts (priority 5 > 2). The printer ISR is paused and its state saved; the comm ISR begins.",[119,3698,3680,3699,3702],{},[23,3700,3701],{},"t = 20",", the disk interrupts (priority 4 \u003C 5). It must wait — the comm ISR continues.",[119,3704,3680,3705,3708],{},[23,3706,3707],{},"t = 25",", the comm ISR finishes. 
The processor restores the printer ISR's state — but before it executes even one instruction, it notices the pending disk interrupt (priority 4 > 2) and services it.",[119,3710,3680,3711,3714],{},[23,3712,3713],{},"t = 35",", the disk ISR finishes. The printer ISR finally resumes.",[119,3716,3680,3717,3720],{},[23,3718,3719],{},"t = 40",", the printer ISR finishes. Control returns to the user program.\nThis priority scheme ensures that urgent devices get serviced quickly, even at the cost of delaying less critical ones.",[11,3722,3724],{"id":3723},"_4-io-function-and-dma","4. I\u002FO Function and DMA",[16,3726,3727],{},"The processor can exchange data with I\u002FO modules directly — reading from or writing to them using special I\u002FO instructions (or memory-mapped I\u002FO, where certain memory addresses correspond to device registers).",[16,3729,3730,3731,3734,3735,3738],{},"But there's a better way for large data transfers: ",[31,3732,3733],{},"Direct Memory Access (DMA)",". With DMA, the processor grants an I\u002FO module the authority to read from or write to main memory ",[23,3736,3737],{},"on its own",". The I\u002FO module handles the entire transfer, and the processor is free to do other work. The CPU only gets involved at the start (to set up the transfer) and at the end (when the I\u002FO module signals completion via an interrupt).",[11,3740,3742],{"id":3741},"_5-interconnection-structures","5. Interconnection Structures",[16,3744,3745,3746,3749],{},"So the processor, memory, and I\u002FO modules need to communicate. How? Through an ",[31,3747,3748],{},"interconnection structure",". There are two main approaches.",[42,3751,3753],{"id":3752},"bus-interconnection","Bus Interconnection",[16,3755,2013,3756,3759],{},[31,3757,3758],{},"bus"," is a shared communication pathway made up of multiple lines (wires). All modules connect to the same bus and share it. 
A bus has three types of lines:",[16,3761,3762,3765,3766,3769],{},[31,3763,3764],{},"Data Bus:","\nCarries the actual data being transferred. The ",[23,3767,3768],{},"width"," of the data bus — how many parallel lines it has (e.g., 32, 64, 128 bits) — directly affects how many bits can move at once and is a key factor in overall system performance.",[16,3771,3772,3775,3776,3779],{},[31,3773,3774],{},"Address Bus:","\nSpecifies ",[23,3777,3778],{},"where"," data should go or come from. The width of the address bus determines the maximum memory the system can address. Higher-order bits select which module (memory chip, I\u002FO port), and lower-order bits select a location within that module.",[16,3781,3782,3785],{},[31,3783,3784],{},"Control Bus:","\nCarries command and timing signals — things like \"this is a read operation,\" \"this is a write,\" and \"the data on the bus is valid now.\" Because the data and address lines are shared by all components, the control bus coordinates who gets to use them and when.",[16,3787,3788],{},"Buses are simple and low-cost, but they become bottlenecks when many high-speed components compete for the same shared pathway.",[42,3790,3792],{"id":3791},"point-to-point-interconnect","Point-to-Point Interconnect",[16,3794,3795,3796,3799],{},"Modern systems have largely moved away from shared buses toward ",[31,3797,3798],{},"point-to-point interconnects",", where each pair of components has its own dedicated connection. This eliminates the arbitration overhead of shared buses and provides much higher bandwidth.",[11,3801,3803],{"id":3802},"_6-intel-quickpath-interconnect-qpi","6. Intel QuickPath Interconnect (QPI)",[16,3805,3806,3807,3810],{},"Introduced by Intel in 2008, ",[31,3808,3809],{},"QPI"," is a point-to-point interconnect that replaced the older shared front-side bus. 
Its key features:",[116,3812,3813,3819,3825],{},[119,3814,3815,3818],{},[31,3816,3817],{},"Multiple direct connections"," — Components are linked in pairs, removing the need for bus arbitration.",[119,3820,3821,3824],{},[31,3822,3823],{},"Layered protocol architecture"," — Rather than simple control signals, QPI uses a structured protocol stack (like a network protocol), which makes it more flexible and robust.",[119,3826,3827,3830,3831,3834],{},[31,3828,3829],{},"Packetized data transfer"," — Data travels in ",[23,3832,3833],{},"packets",", each containing control headers and error control codes.",[42,3836,3838],{"id":3837},"qpi-layers","QPI Layers",[408,3840,3841,3850],{},[411,3842,3843],{},[414,3844,3845,3848],{},[417,3846,3847],{},"Layer",[417,3849,1915],{},[433,3851,3852,3862,3884,3894],{},[414,3853,3854,3859],{},[438,3855,3856],{},[31,3857,3858],{},"Physical",[438,3860,3861],{},"The actual wires and electrical signaling.",[414,3863,3864,3869],{},[438,3865,3866],{},[31,3867,3868],{},"Link",[438,3870,3871,3872,3875,3876,3879,3880,3883],{},"Handles ",[31,3873,3874],{},"flow control"," (preventing a fast sender from overwhelming a slow receiver) and ",[31,3877,3878],{},"error control"," (detecting and recovering from bit errors using CRC codes). Operates on units called ",[23,3881,3882],{},"flits"," (flow control units): 72 bits of payload + 8 bits of CRC.",[414,3885,3886,3891],{},[438,3887,3888],{},[31,3889,3890],{},"Routing",[438,3892,3893],{},"Determines the path a packet takes through the interconnect network. Routes are defined by firmware.",[414,3895,3896,3901],{},[438,3897,3898],{},[31,3899,3900],{},"Protocol",[438,3902,3903,3904,3907],{},"Defines the packet as the unit of transfer. A critical function here is the ",[31,3905,3906],{},"cache coherency protocol",", which ensures that when multiple caches hold copies of the same memory location, they all stay consistent.",[11,3909,3911],{"id":3910},"_7-pci-express-pcie","7. 
PCI Express (PCIe)",[16,3913,3914,3917],{},[31,3915,3916],{},"PCI (Peripheral Component Interconnect)"," was once the dominant I\u002FO bus — a high-bandwidth, processor-independent shared bus. But as device speeds increased, the shared bus became a bottleneck.",[16,3919,3920,3923],{},[31,3921,3922],{},"PCI Express (PCIe)"," replaced it with a point-to-point interconnect scheme. Key motivations:",[116,3925,3926,3932],{},[119,3927,3928,3931],{},[31,3929,3930],{},"High capacity"," for modern I\u002FO devices like Gigabit Ethernet.",[119,3933,3934,3937],{},[31,3935,3936],{},"Support for time-dependent data streams"," (audio, video) that need guaranteed bandwidth.",[42,3939,3941],{"id":3940},"pcie-transaction-layer-tl","PCIe Transaction Layer (TL)",[16,3943,3944,3945,3948],{},"The Transaction Layer sits at the top of the PCIe protocol stack. It receives read and write requests from software and creates ",[31,3946,3947],{},"request packets"," that travel down through the layers to the destination device.",[16,3950,3951,3952,3955,3956,3959,3960,3963],{},"Most transactions use a ",[31,3953,3954],{},"split transaction"," technique: the source sends a request packet, then waits for a ",[31,3957,3958],{},"completion packet"," in response. 
Some writes and messages are ",[23,3961,3962],{},"posted"," (fire-and-forget — no response expected).",[16,3965,3966],{},"The TL supports four address spaces:",[116,3968,3969,3974,3979,3985],{},[119,3970,3971,3973],{},[31,3972,3307],{}," — Maps to system main memory and memory-mapped I\u002FO devices.",[119,3975,3976,3978],{},[31,3977,1689],{}," — For legacy PCI devices with reserved address ranges.",[119,3980,3981,3984],{},[31,3982,3983],{},"Configuration"," — Lets the system read\u002Fwrite configuration registers of PCIe devices.",[119,3986,3987,3990],{},[31,3988,3989],{},"Message"," — For control signals: interrupts, error handling, power management.\nThe TL supports both 32-bit and extended 64-bit memory addressing.",[11,3992,3994],{"id":3993},"summary","Summary",[16,3996,3997],{},"Here's the big picture of what we covered:",[16,3999,4000,4001,4004,4005,4008,4009,4012],{},"The computer's fundamental operation is a loop — ",[31,4002,4003],{},"fetch"," an instruction, ",[31,4006,4007],{},"execute"," it, check for ",[31,4010,4011],{},"interrupts",", repeat. Interrupts make this cycle dramatically more efficient by letting the CPU do useful work while slow I\u002FO devices operate in the background. When multiple interrupts compete, a priority system ensures the most urgent ones get serviced first.",[16,4014,4015,4016,4019,4020,4022],{},"All of this requires a way for the processor, memory, and I\u002FO to communicate. Older systems used ",[31,4017,4018],{},"shared buses"," (simple but limited in bandwidth). Modern systems use ",[31,4021,3798],{}," like Intel's QPI and PCIe, which offer dedicated high-speed links, layered protocols, and packetized data transfer.",[16,4024,4025],{},"Understanding this top-level view — the instruction cycle, interrupts, and interconnection — gives you the foundation for everything else in computer architecture. 
Each of these topics goes much deeper, but now you have the mental map to navigate them.",{"title":284,"searchDepth":818,"depth":818,"links":4027},[4028,4031,4036,4043,4044,4048,4051,4054],{"id":3247,"depth":818,"text":3248,"children":4029},[4030],{"id":3290,"depth":824,"text":3291},{"id":3338,"depth":818,"text":3339,"children":4032},[4033,4034,4035],{"id":3349,"depth":824,"text":3350},{"id":3378,"depth":824,"text":3379},{"id":3410,"depth":824,"text":3411},{"id":3476,"depth":818,"text":3477,"children":4037},[4038,4039,4040,4041,4042],{"id":3497,"depth":824,"text":3498},{"id":3554,"depth":824,"text":3555},{"id":3589,"depth":824,"text":3590},{"id":3626,"depth":824,"text":3627},{"id":3640,"depth":824,"text":3641},{"id":3723,"depth":818,"text":3724},{"id":3741,"depth":818,"text":3742,"children":4045},[4046,4047],{"id":3752,"depth":824,"text":3753},{"id":3791,"depth":824,"text":3792},{"id":3802,"depth":818,"text":3803,"children":4049},[4050],{"id":3837,"depth":824,"text":3838},{"id":3910,"depth":818,"text":3911,"children":4052},[4053],{"id":3940,"depth":824,"text":3941},{"id":3993,"depth":818,"text":3994},"So you've heard that computers are made of a processor, memory, and I\u002FO devices — but how do they actually work together? 
This post walks through the big picture: how a computer fetches and runs instructions, how it deals with interruptions, and how all its components talk to each other through buses and point-to-point links.",{},"\u002Fblog\u002FA-Top-Level-View-of-Computer-Function-and-Interconnection",{"title":3232,"description":4055},{"loc":4057},"blog\u002FA-Top-Level-View-of-Computer-Function-and-Interconnection","UqcUU_uBVGq3CfxbWv0lODHcWxe61h3dxG4b2TSnj3k",{"id":4063,"title":4064,"author":6,"body":4065,"date":2652,"description":284,"draft":862,"edited_at":5294,"extension":863,"featured_image":864,"meta":5295,"navigation":866,"path":5296,"pinned":862,"seo":5297,"sitemap":5298,"stem":5299,"tags":864,"__hash__":5300},"blog\u002Fblog\u002FThe-Memory-Hierarchy-Understanding-Cache-Memory.md","The Memory Hierarchy: Understanding Cache Memory",{"type":8,"value":4066,"toc":5275},[4067,4071,4074,4088,4092,4107,4114,4123,4133,4136,4140,4152,4155,4170,4177,4275,4290,4294,4297,4300,4314,4320,4324,4327,4356,4425,4429,4432,4436,4439,4457,4460,4464,4474,4478,4485,4492,4522,4536,4542,4552,4681,4685,4692,4695,4719,4731,4736,4741,4835,4839,4864,4893,4896,4901,4906,4909,5023,5027,5033,5059,5063,5070,5080,5090,5094,5097,5111,5115,5118,5126,5130,5137,5165,5171,5175,5186,5189,5209,5216,5220,5223,5229,5239,5245,5247,5250,5253,5268],[11,4068,4070],{"id":4069},"why-should-you-care-about-memory","Why Should You Care About Memory?",[16,4072,4073],{},"Imagine you're studying at the library. You have your desk, a small shelf next to you, and the massive library stacks behind you. When you're working on an essay, do you walk to the stacks every time you need to check a single sentence? Of course not — you keep the books you're actively using right on your desk.",[16,4075,4076,4077,4080,4081,4084,4085,257],{},"Computers face the exact same problem. 
The processor (CPU) needs data to work with, and where that data lives — and how quickly it can be fetched — has a ",[23,4078,4079],{},"huge"," impact on performance. This is the core idea behind the ",[31,4082,4083],{},"memory hierarchy"," and, more specifically, ",[31,4086,4087],{},"cache memory",[11,4089,4091],{"id":4090},"the-principle-of-locality","The Principle of Locality",[16,4093,4094,4095,4098,4099,4102,4103,4106],{},"Before we dive into cache, we need to understand ",[23,4096,4097],{},"why"," cache works so well. The answer lies in something called the ",[31,4100,4101],{},"principle of locality"," (also known as ",[31,4104,4105],{},"locality of reference",").",[16,4108,4109,4110,4113],{},"When a program runs, it doesn't access memory randomly. Instead, memory accesses tend to ",[31,4111,4112],{},"cluster"," around certain locations. This clustering comes in two flavours:",[16,4115,4116,4119,4120,4122],{},[31,4117,4118],{},"Temporal locality"," — if a program accessed a piece of data recently, it's very likely to access it again soon. Think about a loop counter: the variable ",[57,4121,395],{}," gets read and written on every iteration, over and over. Constants, temporary variables, and working stacks all exhibit this pattern.",[16,4124,4125,4128,4129,4132],{},[31,4126,4127],{},"Spatial locality"," — if a program accessed a memory address, it's likely to access ",[23,4130,4131],{},"nearby"," addresses soon. When you iterate through an array, you move from element 0 to element 1 to element 2 — each one sitting right next to the previous in memory.",[16,4134,4135],{},"These two tendencies are what make caching effective. 
If we can predict what data will be needed next (because it was used recently or lives nearby), we can keep it in a small, fast storage area and avoid the expensive trip to main memory.",[11,4137,4139],{"id":4138},"the-memory-hierarchy","The Memory Hierarchy",[16,4141,4142,4143,1199,4146,724,4149],{},"Designing a computer's memory system boils down to three competing questions: ",[31,4144,4145],{},"how much?",[31,4147,4148],{},"how fast?",[31,4150,4151],{},"how expensive?",[16,4153,4154],{},"Unfortunately, you can't have it all:",[116,4156,4157,4160,4163],{},[119,4158,4159],{},"Faster memory costs more per bit.",[119,4161,4162],{},"Larger memory is cheaper per bit, but slower.",[119,4164,4165,4166,4169],{},"You always want more capacity ",[23,4167,4168],{},"and"," more speed, but your budget disagrees.",[16,4171,4172,4173,4176],{},"The solution is a ",[31,4174,4175],{},"hierarchy"," — multiple levels of memory, each with different speed, size, and cost characteristics:",[408,4178,4179,4195],{},[411,4180,4181],{},[414,4182,4183,4186,4189,4192],{},[417,4184,4185],{},"Level",[417,4187,4188],{},"Technology",[417,4190,4191],{},"Typical Size",[417,4193,4194],{},"Managed By",[433,4196,4197,4212,4228,4244,4260],{},[414,4198,4199,4203,4206,4209],{},[438,4200,4201],{},[31,4202,1761],{},[438,4204,4205],{},"CMOS (on-chip flip-flops)",[438,4207,4208],{},"A few hundred bytes",[438,4210,4211],{},"Compiler",[414,4213,4214,4219,4222,4225],{},[438,4215,4216],{},[31,4217,4218],{},"Cache (L1\u002FL2\u002FL3)",[438,4220,4221],{},"SRAM \u002F eDRAM",[438,4223,4224],{},"KB to tens of MB",[438,4226,4227],{},"Processor hardware",[414,4229,4230,4235,4238,4241],{},[438,4231,4232],{},[31,4233,4234],{},"Main Memory",[438,4236,4237],{},"DRAM",[438,4239,4240],{},"GB",[438,4242,4243],{},"Operating System",[414,4245,4246,4251,4254,4257],{},[438,4247,4248],{},[31,4249,4250],{},"Secondary Storage",[438,4252,4253],{},"SSD \u002F HDD",[438,4255,4256],{},"TB",[438,4258,4259],{},"OS \u002F 
User",[414,4261,4262,4267,4270,4273],{},[438,4263,4264],{},[31,4265,4266],{},"Offline\u002FArchival",[438,4268,4269],{},"Tape \u002F Cloud",[438,4271,4272],{},"Virtually unlimited",[438,4274,4259],{},[16,4276,4277,4278,4281,4282,4285,4286,4289],{},"As you move ",[23,4279,4280],{},"up"," the pyramid (toward registers), memory gets smaller, faster, and more expensive. As you move ",[23,4283,4284],{},"down"," (toward archival storage), memory gets larger, slower, and cheaper. The trick is that the principle of locality ensures you spend most of your time accessing the ",[23,4287,4288],{},"top"," levels.",[11,4291,4293],{"id":4292},"what-is-cache-memory","What Is Cache Memory?",[16,4295,4296],{},"Cache is a small, fast memory that sits between the CPU and main memory. Its job is simple: keep copies of the data the processor is most likely to need next, so the processor doesn't have to wait for the (relatively) slow main memory.",[16,4298,4299],{},"When the processor needs a piece of data, it checks the cache first:",[116,4301,4302,4308],{},[119,4303,4304,4307],{},[31,4305,4306],{},"Cache hit"," — the data is in the cache. Great! The processor gets it almost instantly.",[119,4309,4310,4313],{},[31,4311,4312],{},"Cache miss"," — the data isn't in the cache. The system has to fetch it from main memory (or a lower cache level), which takes much longer.",[16,4315,706,4316,4319],{},[31,4317,4318],{},"hit rate"," (percentage of accesses that are hits) is the single most important measure of cache performance. A well-designed cache can achieve hit rates above 90%, meaning the processor rarely has to wait.",[42,4321,4323],{"id":4322},"key-terminology","Key Terminology",[16,4325,4326],{},"Before we go further, let's nail down a few terms:",[116,4328,4329,4335,4341,4350],{},[119,4330,4331,4334],{},[31,4332,4333],{},"Block"," — the minimum chunk of data transferred between cache and main memory. 
You don't fetch a single byte; you fetch an entire block at a time (taking advantage of spatial locality).",[119,4336,4337,4340],{},[31,4338,4339],{},"Line"," — a slot in the cache that can hold one block. Think of it as a shelf space.",[119,4342,4343,4346,4347,4349],{},[31,4344,4345],{},"Tag"," — a label attached to each cache line that identifies ",[23,4348,622],{}," block from main memory is currently stored there.",[119,4351,4352,4355],{},[31,4353,4354],{},"Line size"," — the number of data bytes in a block\u002Fline (commonly 32 or 64 bytes).",[4357,4358,4362,4365],"details",{"className":4359},[4360,4361],"info-box","info-box-info",[3993,4363,4364],{},"Analogy",[4366,4367,4370,4373,4383],"div",{"className":4368},[4369],"info-box-content",[16,4371,4372],{},"If you still can't understand what the above term means, here's a more refined version:",[16,4374,4375,4376,4378,4379,4382],{},"Imagine you are writing a research paper. The ",[31,4377,4234],{}," is a massive, multi-story library down the street. The ",[31,4380,4381],{},"Cache"," is a small, fast bookshelf right on your desk.",[686,4384,4385,4398,4407,4413],{},[119,4386,4387,4390,4391,4394,4395,4397],{},[31,4388,4389],{},"Block:"," The library has a rule: you can't check out just one book. You have to check out a whole ",[23,4392,4393],{},"crate"," of books at once. That crate of books is the ",[31,4396,4333],{},". It's the actual data you are moving around.",[119,4399,4400,4403,4404,4406],{},[31,4401,4402],{},"Line:"," On your desk bookshelf, you have exactly 10 empty cubbies to hold crates. Each empty cubby is a ",[31,4405,4339],{},". It is the physical space reserved for the data.",[119,4408,4409,4412],{},[31,4410,4411],{},"Line Size:"," This is simply the physical dimensions of the cubby. If the \"line size\" is 64 bytes, it means that cubby is built to hold a crate that contains exactly 64 bytes of data. 
Every cubby (Line) and every crate (Block) is the exact same size.",[119,4414,4415,4418,4419,4422,4423,257],{},[31,4416,4417],{},"Tag:"," Because you only have 10 cubbies, you are constantly swapping crates in and out depending on what you are researching. If you reach for cubby #3, how do you know what crate is currently in there? You stick a sticky note on the front of the cubby that says, ",[23,4420,4421],{},"\"This cubby currently holds the crate from the Biology section.\""," That note is the ",[31,4424,4345],{},[11,4426,4428],{"id":4427},"elements-of-cache-design","Elements of Cache Design",[16,4430,4431],{},"Designing a cache involves several interrelated decisions. Let's walk through each one.",[42,4433,4435],{"id":4434},"_1-cache-size","1. Cache Size",[16,4437,4438],{},"How big should the cache be? There's a sweet spot:",[116,4440,4441,4447],{},[119,4442,4443,4446],{},[31,4444,4445],{},"Too small"," → not enough room, too many misses.",[119,4448,4449,4452,4453,4456],{},[31,4450,4451],{},"Too large"," → more expensive, physically bigger, and paradoxically ",[23,4454,4455],{},"slightly slower"," because addressing a larger cache requires more gate logic.",[16,4458,4459],{},"In practice, L1 caches are typically 32–64 KB, L2 caches are 256 KB–1 MB, and L3 caches can be several megabytes to tens of megabytes. The exact optimal size depends heavily on the workload, so there's no single \"best\" answer.",[42,4461,4463],{"id":4462},"_2-mapping-function","2. Mapping Function",[16,4465,4466,4467,4469,4470,4473],{},"When a block is fetched from main memory, ",[23,4468,3778],{}," does it go in the cache? 
This is determined by the ",[31,4471,4472],{},"mapping function",", and there are three main approaches.",[1800,4475,4477],{"id":4476},"direct-mapping","Direct Mapping",[16,4479,4480,4481,4484],{},"Each block of main memory maps to ",[31,4482,4483],{},"exactly one"," specific cache line.",[16,4486,4487,4488,4491],{},"The mapping is simple — typically ",[57,4489,4490],{},"cache line = block number mod number of lines",". The memory address is split into three fields:",[408,4493,4494,4506],{},[411,4495,4496],{},[414,4497,4498,4500,4503],{},[417,4499,4345],{},[417,4501,4502],{},"Line (index)",[417,4504,4505],{},"Word (offset)",[433,4507,4508],{},[414,4509,4510,4514,4518],{},[438,4511,4512],{},[518,4513],{},[438,4515,4516],{},[518,4517],{},[438,4519,4520],{},[518,4521],{},[16,4523,4524,4527,4528,4531,4532,4535],{},[31,4525,4526],{},"How it works:"," The ",[23,4529,4530],{},"line"," field tells the cache which slot to check. The ",[23,4533,4534],{},"tag"," field is compared against the tag stored in that slot. If they match, it's a hit. If not, it's a miss, and the existing block in that line gets evicted.",[16,4537,4538,4541],{},[31,4539,4540],{},"Pros:"," Simple, fast hardware. No searching required — go straight to the indexed line.",[16,4543,4544,4547,4548,4551],{},[31,4545,4546],{},"Cons:"," If two frequently-used blocks happen to map to the same line, they'll keep evicting each other. This is called ",[31,4549,4550],{},"thrashing",", and it can destroy performance even when the cache has plenty of empty lines elsewhere.",[4357,4553,4555,4558],{"className":4554},[4360,4361],[3993,4556,4557],{},"Explanation",[4366,4559,4561,4568,4579,4586,4611,4618,4621,4648,4651,4654,4675],{"className":4560},[4369],[16,4562,4563,4564,4567],{},"In a Direct Mapped cache, every single crate of books (Block) in the massive library has ",[31,4565,4566],{},"exactly one specific cubby (Line)"," it is allowed to go into on your desk. 
It is strictly assigned seating.",[16,4569,4570,4571,4574,4575,4578],{},"Imagine your desk cache has only ",[31,4572,4573],{},"4 lines"," (numbered 0, 1, 2, and 3). The library has ",[31,4576,4577],{},"16 blocks"," (numbered 0 to 15).",[16,4580,4581,4582,4585],{},"How do we assign seats? We use modulo math (",[57,4583,4584],{},"Block Number MOD Total Cache Lines","):",[116,4587,4588,4591,4594,4597,4600,4605],{},[119,4589,4590],{},"Block 0 goes to Line 0",[119,4592,4593],{},"Block 1 goes to Line 1",[119,4595,4596],{},"Block 2 goes to Line 2",[119,4598,4599],{},"Block 3 goes to Line 3",[119,4601,4602],{},[23,4603,4604],{},"Block 4 wraps around and goes to Line 0",[119,4606,4607,4610],{},[23,4608,4609],{},"Block 5 goes to Line 1..."," and so on.",[16,4612,4613,4614,4617],{},"Because of this rule, Cache Line 0 is the ",[23,4615,4616],{},"only"," place Blocks 0, 4, 8, and 12 can ever be stored.",[16,4619,4620],{},"When the CPU asks for a specific piece of data, it hands the cache an address. The cache slices this address into three parts to find the data instantly:",[686,4622,4623,4629,4642],{},[119,4624,4625,4628],{},[31,4626,4627],{},"Line (Index):"," \"Which cubby do I check?\" The cache uses this middle chunk of the address to immediately jump to the correct physical slot. There is zero searching involved.",[119,4630,4631,4633,4634,4637,4638,4641],{},[31,4632,4417],{}," \"Is this the right crate?\" Because Line 0 could hold Block 0, 4, 8, or 12, the cache looks at the Tag (the sticky note on the cubby) to see which one is currently sitting there. If the Tag matches the address, it's a ",[31,4635,4636],{},"Cache Hit",". 
If it doesn't, it's a ",[31,4639,4640],{},"Cache Miss",", and the current block gets thrown out (evicted) to make room for the new one.",[119,4643,4644,4647],{},[31,4645,4646],{},"Word (Offset):"," \"Which specific book inside this crate do I read?\" Once the correct crate is confirmed, this tells the CPU exactly which byte of data to pull from the block.",[16,4649,4650],{},"Direct mapping is incredibly fast and cheap to build because the hardware never has to search—it just checks one specific location.",[16,4652,4653],{},"But imagine your code is trying to read data from Block 0 and Block 4 back-to-back in a loop.",[116,4655,4656,4662,4672],{},[119,4657,4658,4659,257],{},"CPU asks for Block 0. It goes into ",[31,4660,4661],{},"Line 0",[119,4663,4664,4665,4668,4669,4671],{},"CPU asks for Block 4. It also ",[23,4666,4667],{},"must"," go into ",[31,4670,4661],{},". It kicks out Block 0.",[119,4673,4674],{},"CPU asks for Block 0 again. It kicks out Block 4.",[16,4676,4677,4678,4680],{},"They are fighting over the exact same seat. Even if Cache Lines 1, 2, and 3 are completely empty, the cache will keep kicking out perfectly good data. This endless cycle of misses and evictions is called ",[31,4679,4550],{},", and it forces the CPU to constantly wait on the slow main memory.",[1800,4682,4684],{"id":4683},"associative-fully-associative-mapping","Associative (Fully Associative) Mapping",[16,4686,4687,4688,4691],{},"Any block can go in ",[31,4689,4690],{},"any"," cache line.",[16,4693,4694],{},"The address is split into just two fields:",[408,4696,4697,4707],{},[411,4698,4699],{},[414,4700,4701,4704],{},[417,4702,4703],{},"Tag (22 bits)",[417,4705,4706],{},"Word (2 bits)",[433,4708,4709],{},[414,4710,4711,4715],{},[438,4712,4713],{},[518,4714],{},[438,4716,4717],{},[518,4718],{},[16,4720,4721,4723,4724,4727,4728,257],{},[31,4722,4526],{}," When looking for a block, the cache must compare the tag against ",[23,4725,4726],{},"every"," line simultaneously. 
This requires special hardware called a ",[31,4729,4730],{},"content-addressable memory (CAM)",[16,4732,4733,4735],{},[31,4734,4540],{}," Maximum flexibility — no thrashing from mapping conflicts.",[16,4737,4738,4740],{},[31,4739,4546],{}," Expensive and slow for large caches because every tag must be checked in parallel.",[4357,4742,4744,4746],{"className":4743},[4360,4361],[3993,4745,4557],{},[4366,4747,4749,4756,4759,4762,4782,4785,4788,4792,4798,4804,4814,4825,4830],{"className":4748},[4369],[16,4750,4751,4752,4755],{},"In a Fully Associative cache, there are no assigned seats. Any crate of books from the library (Block) can be placed into ",[31,4753,4754],{},"any empty cubby (Line)"," on your desk.",[16,4757,4758],{},"If your desk cache has 4 lines, and you fetch Block 0, you can put it in Line 0, 1, 2, or 3. If you fetch Block 4, it can go in any of the remaining empty lines.",[16,4760,4761],{},"Because of this open seating arrangement, the CPU address is sliced differently:",[686,4763,4764,4772,4777],{},[119,4765,4766,1199,4768,4771],{},[31,4767,4627],{},[23,4769,4770],{},"This no longer exists."," Because a block could be sitting anywhere, there is no specific line to jump to.",[119,4773,4774,4776],{},[31,4775,4417],{}," \"Who are you?\" Because any line can hold any block, the sticky note (Tag) on the front of the cubby must be longer and much more specific to identify exactly which crate from the entire library is sitting there.",[119,4778,4779,4781],{},[31,4780,4646],{}," Still tells the CPU exactly which byte inside the block to read.",[16,4783,4784],{},"Remember in Direct Mapping how Block 0 and Block 4 kept kicking each other out of Line 0, even when the rest of the cache was empty?",[16,4786,4787],{},"Fully Associative mapping completely solves this. If you ask for Block 0, it goes in Line 0. If you ask for Block 4, it just goes into the next empty spot (Line 1). 
No fighting, no thrashing.",[42,4789,4791],{"id":4790},"the-catch-finding-the-data-the-cam","The Catch: Finding the Data (The CAM)",[16,4793,4794,4795,257],{},"The flexibility of open seating introduces a massive new problem: ",[31,4796,4797],{},"Searching",[16,4799,4800,4801,4755],{},"When the CPU asks for Block 4, it doesn't know which cubby to check. It has to look at the sticky notes (Tags) of ",[23,4802,4803],{},"every single cubby",[16,4805,4806,4807,4810,4811,257],{},"In software, checking every item in a list takes time (a ",[57,4808,4809],{},"for"," loop). But a CPU cache has to be lightning fast. To solve this, engineers use special hardware called ",[31,4812,4813],{},"Content-Addressable Memory (CAM)",[16,4815,4816,4817,4820,4821,4824],{},"Imagine you have a magical assistant for your bookshelf. Instead of reading the sticky notes one by one, you shout, ",[23,4818,4819],{},"\"Does anyone have Block 4?!\""," and the specific cubby holding Block 4 instantly lights up. The CAM hardware allows the cache to compare the requested Tag against ",[23,4822,4823],{},"every single line in the cache simultaneously"," in one clock cycle.",[16,4826,4827,4829],{},[31,4828,4540],{}," Maximum flexibility. You will almost never get a cache miss unless the entire cache is 100% full.",[16,4831,4832,4834],{},[31,4833,4546],{}," CAM hardware is incredibly expensive, takes up a lot of physical space on the silicon chip, and consumes a lot of power. Because of this, Fully Associative caches are usually only used for very small, highly critical caches.",[1800,4836,4838],{"id":4837},"set-associative-mapping","Set Associative Mapping",[16,4840,4841,4842,4845,4846,4849,4850,4853,4854,4857,4858,4860,4861,4863],{},"The compromise. The cache is divided into ",[31,4843,4844],{},"sets",", each containing ",[23,4847,4848],{},"k"," lines (this is called ",[23,4851,4852],{},"k-way"," set associative). 
A block maps to a specific ",[31,4855,4856],{},"set"," (like direct mapping), but within that set it can go in ",[31,4859,4690],{}," of the ",[23,4862,4848],{}," lines (like associative mapping).",[408,4865,4866,4877],{},[411,4867,4868],{},[414,4869,4870,4872,4875],{},[417,4871,4345],{},[417,4873,4874],{},"Set (index)",[417,4876,4505],{},[433,4878,4879],{},[414,4880,4881,4885,4889],{},[438,4882,4883],{},[518,4884],{},[438,4886,4887],{},[518,4888],{},[438,4890,4891],{},[518,4892],{},[16,4894,4895],{},"For example, in a 2-way set associative cache, each set has 2 lines. A block hashes to one set, and the cache checks both lines in that set.",[16,4897,4898,4900],{},[31,4899,4540],{}," Much less thrashing than direct mapping, while requiring far less comparison hardware than full associativity.",[16,4902,4903,4905],{},[31,4904,4546],{}," Slightly more complex than direct mapping.",[16,4907,4908],{},"Most modern processors use set associative caches (commonly 4-way, 8-way, or even 16-way).",[4357,4910,4912,4914],{"className":4911},[4360,4361],[3993,4913,4557],{},[4366,4915,4917,4924,4930,4944,4951,4961,4989,4996,4999,5002],{"className":4916},[4369],[16,4918,4919,4920,4923],{},"If Direct Mapping is \"Assigned Seating\" (you must sit in chair #3) and Fully Associative is \"Open Seating\" (sit anywhere), Set Associative Mapping is the ",[31,4921,4922],{},"\"Assigned Table\""," rule.",[16,4925,4926,4927,257],{},"Imagine we take your 4 bookshelf cubbies and divide them into ",[31,4928,4929],{},"2 distinct zones (Sets)",[116,4931,4932,4938],{},[119,4933,4934,4937],{},[31,4935,4936],{},"Set 0"," contains Cubby 0 and Cubby 1.",[119,4939,4940,4943],{},[31,4941,4942],{},"Set 1"," contains Cubby 2 and Cubby 3.",[16,4945,4946,4947,4950],{},"Because each Set has exactly 2 cubbies, we call this a ",[31,4948,4949],{},"2-Way Set Associative"," cache.",[16,4952,4953,4954,4957,4958,257],{},"Now, when you bring a crate of books (Block) from the library, it is assigned to a specific 
",[23,4955,4956],{},"Set",", but it can go into ",[23,4959,4960],{},"any empty cubby within that Set",[116,4962,4963,4973,4982],{},[119,4964,4965,4966,4968,4969,4972],{},"Block 0 is assigned to ",[31,4967,4936],{}," -> It can go into Cubby 0 ",[23,4970,4971],{},"or"," Cubby 1.",[119,4974,4975,4976,4978,4979,4981],{},"Block 1 is assigned to ",[31,4977,4942],{}," -> It can go into Cubby 2 ",[23,4980,4971],{}," Cubby 3.",[119,4983,4984,4985,4968,4987,4972],{},"Block 2 is assigned to ",[31,4986,4936],{},[23,4988,4971],{},[16,4990,4991,4992,4995],{},"Because a block can only go to one specific Set, the cache doesn't have to search the entire bookshelf. The CPU address gives it the ",[31,4993,4994],{},"Set Index",", so it instantly jumps to the correct zone. This eliminates the massive, expensive, slow hardware search of Fully Associative caches.",[16,4997,4998],{},"But, because there are multiple \"ways\" (slots) within that Set, it drastically reduces the thrashing problem of Direct Mapping. If your code asks for Block 0, it goes to Set 0, Cubby 0. If your code immediately asks for Block 2, it also goes to Set 0, but instead of kicking out Block 0, it just slides into Cubby 1. No thrashing!",[16,5000,5001],{},"The CPU slices the address like this:",[686,5003,5004,5010,5018],{},[119,5005,5006,5009],{},[31,5007,5008],{},"Set (Index):"," \"Which zone do I check?\" The cache jumps immediately to this Set.",[119,5011,5012,5014,5015,257],{},[31,5013,4417],{}," \"Are you here?\" The cache uses a small, cheap piece of CAM hardware to simultaneously check the sticky notes of ",[23,5016,5017],{},"only the cubbies in this specific Set",[119,5019,5020,5022],{},[31,5021,4646],{}," The specific byte of data you want to read.",[42,5024,5026],{"id":5025},"_3-replacement-algorithms","3. Replacement Algorithms",[16,5028,5029,5030,1408],{},"When a cache set (or the entire cache, for fully associative) is full and a new block needs to come in, which existing block gets evicted? 
For direct mapping there's no choice — there's only one candidate. But for associative and set-associative caches, we need a ",[31,5031,5032],{},"replacement algorithm",[116,5034,5035,5041,5047,5053],{},[119,5036,5037,5040],{},[31,5038,5039],{},"Least Recently Used (LRU)"," — evict the block that hasn't been accessed for the longest time. This aligns perfectly with temporal locality and is the most effective strategy. It's also the most popular because it's relatively simple to implement in hardware.",[119,5042,5043,5046],{},[31,5044,5045],{},"First-In-First-Out (FIFO)"," — evict the block that has been in the cache the longest, regardless of how recently it was accessed. Easy to implement with a circular buffer.",[119,5048,5049,5052],{},[31,5050,5051],{},"Least Frequently Used (LFU)"," — evict the block that has been accessed the fewest times. Requires a counter per line.",[119,5054,5055,5058],{},[31,5056,5057],{},"Random"," — evict a random block. Surprisingly, random replacement performs only slightly worse than LRU in many workloads and is dead simple to implement.",[42,5060,5062],{"id":5061},"_4-write-policy","4. Write Policy",[16,5064,5065,5066,5069],{},"When the processor ",[23,5067,5068],{},"writes"," data, what happens to the cache and main memory?",[16,5071,5072,5075,5076,5079],{},[31,5073,5074],{},"Write Through"," — every write updates ",[23,5077,5078],{},"both"," the cache and main memory immediately. This keeps memory always consistent and is simple to understand. The downside is that it generates a lot of memory traffic, which can become a bottleneck.",[16,5081,5082,5085,5086,5089],{},[31,5083,5084],{},"Write Back"," — writes update only the cache. The modified block is written back to main memory later, only when it gets evicted. This drastically reduces memory traffic but means main memory can be temporarily out of date. 
A ",[31,5087,5088],{},"dirty bit"," on each line tracks whether it's been modified.",[1800,5091,5093],{"id":5092},"what-about-write-misses","What About Write Misses?",[16,5095,5096],{},"If the processor writes to an address that isn't in the cache, there are two options:",[116,5098,5099,5105],{},[119,5100,5101,5104],{},[31,5102,5103],{},"Write allocate"," — fetch the block into the cache first, then perform the write there. This is typically paired with write-back.",[119,5106,5107,5110],{},[31,5108,5109],{},"No write allocate"," — write directly to main memory without loading the block into the cache. This is typically paired with write-through.",[42,5112,5114],{"id":5113},"_5-line-size","5. Line Size",[16,5116,5117],{},"Bigger blocks take better advantage of spatial locality (you prefetch more neighbouring data), but there's a trade-off:",[116,5119,5120,5123],{},[119,5121,5122],{},"Larger blocks mean fewer lines in a fixed-size cache, increasing the chance of evictions.",[119,5124,5125],{},"Each miss takes longer to service because more data must be transferred.\nTypical line sizes are 32 or 64 bytes.",[42,5127,5129],{"id":5128},"_6-number-of-caches","6. Number of Caches",[16,5131,5132,5133,5136],{},"Modern systems don't have just one cache — they use a ",[31,5134,5135],{},"multilevel"," hierarchy:",[116,5138,5139,5149,5159],{},[119,5140,5141,5144,5145,5148],{},[31,5142,5143],{},"L1 cache"," — on the same chip as the processor, tiny but extremely fast. Often ",[31,5146,5147],{},"split"," into separate instruction and data caches (called an I-cache and D-cache) to avoid contention between the fetch and execution units.",[119,5150,5151,5154,5155,5158],{},[31,5152,5153],{},"L2 cache"," — larger and slightly slower. 
Usually ",[31,5156,5157],{},"unified"," (holds both instructions and data).",[119,5160,5161,5164],{},[31,5162,5163],{},"L3 cache"," — even larger, often shared across multiple processor cores.",[16,5166,5167,5168,5170],{},"Splitting L1 into instruction and data caches is important for ",[31,5169,44],{},", where the processor simultaneously fetches a new instruction while executing a previous one. If both operations need the same cache, they'd conflict. Splitting eliminates this contention.",[11,5172,5174],{"id":5173},"cache-coherency","Cache Coherency",[16,5176,5177,5178,5181,5182,5185],{},"Things get interesting, and complicated, when multiple processors (or cores) each have their own cache but share the same main memory. If Core A writes a new value to address X in its cache, Core B's cache might still hold the ",[23,5179,5180],{},"old"," value of X. This is the ",[31,5183,5184],{},"cache coherency"," problem.",[16,5187,5188],{},"Several approaches exist:",[116,5190,5191,5197,5203],{},[119,5192,5193,5196],{},[31,5194,5195],{},"Bus watching (snooping) with write-through"," — each cache controller monitors the memory bus. If it sees another core writing to an address that exists in its own cache, it invalidates that entry. Simple but depends on all caches using write-through.",[119,5198,5199,5202],{},[31,5200,5201],{},"Hardware transparency (snooping protocols)"," — dedicated hardware ensures that any cache update is propagated to all other caches. More complex, but works with write-back policies too.",[119,5204,5205,5208],{},[31,5206,5207],{},"Noncacheable memory"," — shared memory regions are simply marked as noncacheable. Every access goes directly to main memory. 
Simple but sacrifices performance for shared data.",[16,5210,5211,5212,5215],{},"In modern multi-core processors, sophisticated protocols like ",[31,5213,5214],{},"MESI"," (Modified, Exclusive, Shared, Invalid) handle coherency, but the fundamental ideas are the same.",[11,5217,5219],{"id":5218},"inclusion-policy","Inclusion Policy",[16,5221,5222],{},"When you have multiple cache levels, should a block in L1 also be kept in L2? There are three schools of thought:",[16,5224,5225,5228],{},[31,5226,5227],{},"Inclusive"," — if data is in L1, it's guaranteed to also be in L2 (and L3). This simplifies coherency checks in multi-core systems because you only need to search the last-level cache to know whether any core might have a copy.",[16,5230,5231,5234,5235,5238],{},[31,5232,5233],{},"Exclusive"," — if data is in L1, it is ",[23,5236,5237],{},"not"," in L2. No wasted space from duplicate copies, so you effectively get more total cache capacity. The trade-off is that coherency checks become harder since you may need to search multiple levels.",[16,5240,5241,5244],{},[31,5242,5243],{},"Non-inclusive"," — data in L1 may or may not be in L2. A flexible middle ground, but with similar coherency challenges as the exclusive policy.",[11,5246,784],{"id":783},[16,5248,5249],{},"Cache memory is one of those topics that sounds simple on the surface — \"just keep frequently used data nearby\" — but the design decisions run deep. From choosing a mapping function to picking a write policy and managing coherency across cores, every choice involves trade-offs between speed, complexity, cost, and correctness.",[16,5251,5252],{},"The key takeaways:",[16,5254,706,5255,5257,5258,5260,5261,5263,5264,5267],{},[31,5256,4101],{}," is ",[23,5259,4097],{}," caches work. Temporal and spatial locality mean that a small, fast memory can satisfy the vast majority of a processor's requests. 
The ",[31,5262,4083],{}," gives us the best of all worlds: the speed of small memories and the capacity of large ones. ",[31,5265,5266],{},"Cache design"," is a web of interconnected decisions — mapping, replacement, write policy, line size, levels, and inclusion — where no single choice is optimal in isolation; each must be tuned in context.",[16,5269,5270,5271,5274],{},"If you remember nothing else, remember this: the goal of the entire memory system is to create the ",[31,5272,5273],{},"illusion"," that the processor has access to a very large, very fast memory — even though no such memory physically exists.",{"title":284,"searchDepth":818,"depth":818,"links":5276},[5277,5278,5279,5280,5283,5291,5292,5293],{"id":4069,"depth":818,"text":4070},{"id":4090,"depth":818,"text":4091},{"id":4138,"depth":818,"text":4139},{"id":4292,"depth":818,"text":4293,"children":5281},[5282],{"id":4322,"depth":824,"text":4323},{"id":4427,"depth":818,"text":4428,"children":5284},[5285,5286,5287,5288,5289,5290],{"id":4434,"depth":824,"text":4435},{"id":4462,"depth":824,"text":4463},{"id":5025,"depth":824,"text":5026},{"id":5061,"depth":824,"text":5062},{"id":5113,"depth":824,"text":5114},{"id":5128,"depth":824,"text":5129},{"id":5173,"depth":818,"text":5174},{"id":5218,"depth":818,"text":5219},{"id":783,"depth":818,"text":784},"2026-04-20",{},"\u002Fblog\u002FThe-Memory-Hierarchy-Understanding-Cache-Memory",{"title":4064,"description":284},{"loc":5296},"blog\u002FThe-Memory-Hierarchy-Understanding-Cache-Memory","KodXT8URXZpZYHhv6TxzreaODZA4xeIqVxw7HDR0vyo",{"id":5302,"title":5303,"author":6,"body":5304,"date":2652,"description":284,"draft":862,"edited_at":6558,"extension":863,"featured_image":864,"meta":6559,"navigation":866,"path":6560,"pinned":862,"seo":6561,"sitemap":6562,"stem":6563,"tags":864,"__hash__":6564},"blog\u002Fblog\u002FInternal-Memory-How-Your-Computer-Remembers-Things.md","Internal Memory: How Your Computer Remembers 
Things",{"type":8,"value":5305,"toc":6530},[5306,5308,5314,5321,5324,5335,5345,5349,5356,5383,5409,5412,5416,5419,5544,5555,5619,5623,5628,5650,5658,5662,5678,5689,5694,5716,5721,5724,5773,5777,5788,5802,5809,5958,5962,5968,5973,5978,5989,5993,5999,6005,6025,6029,6032,6038,6046,6055,6065,6079,6083,6090,6107,6114,6119,6138,6155,6195,6199,6202,6206,6216,6227,6233,6237,6240,6254,6264,6290,6294,6297,6301,6312,6319,6323,6333,6340,6414,6425,6429,6432,6439,6443,6450,6464,6468,6471,6481,6494,6498,6505,6508,6512,6515,6521,6527],[11,5307,4070],{"id":4069},[16,5309,5310,5311],{},"Here's a fact that might surprise you: ",[31,5312,5313],{},"your processor is way faster than your memory.",[16,5315,5316,5317,5320],{},"A 3 GHz processor can execute a simple \"add\" operation in about 0.33 nanoseconds. But fetching data from main memory? That takes over 33 nanoseconds — roughly ",[31,5318,5319],{},"100 times slower",". That means if your system naively accessed memory every time it needed data, loads and stores would bottleneck everything else.",[16,5322,5323],{},"So engineers face an impossible wish list:",[116,5325,5326,5329,5332],{},[119,5327,5328],{},"Memory that runs at processor speed",[119,5330,5331],{},"Memory large enough for all running programs",[119,5333,5334],{},"Memory that's cheap",[16,5336,5337,5338,5340,5341,5344],{},"You can't have all three at once. That tension is the reason we have a ",[23,5339,4083],{}," — different types of memory layered together, each making a different trade-off between speed, size, and cost. In this post, we'll focus on ",[31,5342,5343],{},"internal memory",": the semiconductor-based memory that lives on or near the processor, as opposed to external storage like hard drives.",[11,5346,5348],{"id":5347},"the-memory-cell-the-smallest-unit","The Memory Cell: The Smallest Unit",[16,5350,5351,5352,5355],{},"Every piece of memory, at its most basic level, is built from ",[31,5353,5354],{},"memory cells",". 
A single memory cell:",[116,5357,5358,5370,5377],{},[119,5359,5360,5361,5364,5365,5367,5368],{},"Has ",[31,5362,5363],{},"two stable states",": representing either a ",[57,5366,2091],{}," or a ",[57,5369,446],{},[119,5371,5372,5373,5376],{},"Can be ",[31,5374,5375],{},"written to"," — setting its state",[119,5378,5372,5379,5382],{},[31,5380,5381],{},"read from"," — sensing its current state",[16,5384,5385,5386,5389,5390,5393,5394,5396,5397,5400,5401,5404,5405,5408],{},"When you ",[23,5387,5388],{},"write"," to a cell, a ",[31,5391,5392],{},"select"," signal activates it, a ",[31,5395,1487],{}," signal tells it \"this is a write operation,\" and ",[31,5398,5399],{},"data in"," provides the value. When you ",[23,5402,5403],{},"read",", the select and control signals activate the cell, and a ",[31,5406,5407],{},"sense"," line outputs the stored value.",[16,5410,5411],{},"Think of it like a light switch with a lock — you can flip it on (1) or off (0), lock it in place, and later check which position it's in.",[11,5413,5415],{"id":5414},"the-big-picture-semiconductor-memory-types","The Big Picture: Semiconductor Memory Types",[16,5417,5418],{},"Before we dive deep, here's a roadmap of the main memory types and how they relate to each other:",[408,5420,5421,5440],{},[411,5422,5423],{},[414,5424,5425,5428,5431,5434,5437],{},[417,5426,5427],{},"Memory Type",[417,5429,5430],{},"Category",[417,5432,5433],{},"Erasure",[417,5435,5436],{},"Write Mechanism",[417,5438,5439],{},"Volatile?",[433,5441,5442,5461,5480,5496,5513,5528],{},[414,5443,5444,5449,5452,5455,5458],{},[438,5445,5446],{},[31,5447,5448],{},"RAM",[438,5450,5451],{},"Read-write",[438,5453,5454],{},"Electrically, byte-level",[438,5456,5457],{},"Electrically",[438,5459,5460],{},"Yes",[414,5462,5463,5468,5471,5474,5477],{},[438,5464,5465],{},[31,5466,5467],{},"ROM",[438,5469,5470],{},"Read-only",[438,5472,5473],{},"Not possible",[438,5475,5476],{},"Masks (at 
factory)",[438,5478,5479],{},"No",[414,5481,5482,5487,5489,5491,5494],{},[438,5483,5484],{},[31,5485,5486],{},"PROM",[438,5488,5470],{},[438,5490,5473],{},[438,5492,5493],{},"Electrically (once)",[438,5495,5479],{},[414,5497,5498,5503,5506,5509,5511],{},[438,5499,5500],{},[31,5501,5502],{},"EPROM",[438,5504,5505],{},"Read-mostly",[438,5507,5508],{},"UV light, chip-level",[438,5510,5457],{},[438,5512,5479],{},[414,5514,5515,5520,5522,5524,5526],{},[438,5516,5517],{},[31,5518,5519],{},"EEPROM",[438,5521,5505],{},[438,5523,5454],{},[438,5525,5457],{},[438,5527,5479],{},[414,5529,5530,5535,5537,5540,5542],{},[438,5531,5532],{},[31,5533,5534],{},"Flash",[438,5536,5505],{},[438,5538,5539],{},"Electrically, block-level",[438,5541,5457],{},[438,5543,5479],{},[16,5545,5546,5547,5550,5551,5554],{},"The two big branches are ",[31,5548,5549],{},"volatile"," memory (loses data without power) and ",[31,5552,5553],{},"nonvolatile"," memory (retains data even when powered off). Let's explore each.",[4357,5556,5558,5560],{"className":5557},[4360,4361],[3993,5559,4557],{},[4366,5561,5563,5609,5612],{"className":5562},[4369],[116,5564,5565,5571,5577,5587,5593,5599],{},[119,5566,5567,5570],{},[31,5568,5569],{},"RAM (Random Access Memory):"," The defining example of volatile memory. It is fast, allows byte-level read\u002Fwrites, and is wiped when power is lost.",[119,5572,5573,5576],{},[31,5574,5575],{},"ROM (Read-Only Memory):"," Hardwired at the factory. 
Cannot be changed.",[119,5578,5579,5582,5583,5586],{},[31,5580,5581],{},"PROM (Programmable ROM):"," Blank from the factory, but can only be written to ",[23,5584,5585],{},"once"," (like burning a CD-R).",[119,5588,5589,5592],{},[31,5590,5591],{},"EPROM (Erasable Programmable ROM):"," A massive leap forward, but it required pulling the chip out of the computer and exposing a little quartz window on it to strong UV light to erase it.",[119,5594,5595,5598],{},[31,5596,5597],{},"EEPROM (Electrically Erasable Programmable ROM):"," Allowed memory to be erased electrically without removing the chip, but only one byte at a time (too slow for bulk storage).",[119,5600,5601,5604,5605,5608],{},[31,5602,5603],{},"Flash:"," The defining example of modern non-volatile memory (used in SSDs, USB drives, and smartphones). You need to know that it is electrically erased at the ",[31,5606,5607],{},"block-level"," (which is why writing data to a nearly full SSD can slow down).",[16,5610,5611],{},"Byte-Level means the computer can target a single byte of data (usually 8 bits) and change it without affecting the data sitting right next to it. It is incredibly convenient and fast for making small changes, but building the microscopic circuitry required to target every individual byte makes the physical chip more complex and expensive to manufacture.",[16,5613,5614,5615,5618],{},"Block-level means the memory is divided into larger chunks called \"blocks\" (often thousands of bytes large). To change even a single byte of data, the computer must erase the ",[23,5616,5617],{},"entire"," block first, and then rewrite the whole block with the new change included. By giving up byte-level precision, engineers were able to drastically simplify the wiring inside the chip. This is exactly why Flash memory became so cheap and can hold massive amounts of data. 
However, it requires clever software controllers to manage the constant copying and erasing of blocks so the drive does not wear out too quickly.",[11,5620,5622],{"id":5621},"ram-the-workhorse","RAM: The Workhorse",[16,5624,5625,5627],{},[31,5626,5448],{}," (Random Access Memory) is the memory your computer uses for active work. It's called \"random access\" because any byte can be read or written in roughly the same amount of time, regardless of its location. The key characteristics:",[116,5629,5630,5637,5643],{},[119,5631,5632,5633,5636],{},"Data is read and written using ",[31,5634,5635],{},"electrical signals"," — fast and easy",[119,5638,5639,5640,5642],{},"It's ",[31,5641,5549],{}," — turn off the power, and everything disappears",[119,5644,5645,5646,5649],{},"It serves as ",[31,5647,5648],{},"temporary storage"," for programs and data currently in use",[16,5651,5652,5653,724,5655,257],{},"There are two major flavors of RAM: ",[31,5654,4237],{},[31,5656,5657],{},"SRAM",[42,5659,5661],{"id":5660},"dram-dynamic-ram-the-forgetful-one","DRAM (Dynamic RAM) — The Forgetful One",[16,5663,5664,5665,5668,5669,5671,5672,5674,5675,257],{},"DRAM stores each bit of data as a ",[31,5666,5667],{},"charge on a tiny capacitor",". A charged capacitor represents a ",[57,5670,446],{},"; a discharged one represents ",[57,5673,2091],{},". The circuit for a single DRAM cell is remarkably simple — just ",[31,5676,5677],{},"one transistor and one capacitor",[16,5679,5680,5681,5684,5685,5688],{},"But there's a catch: capacitors ",[23,5682,5683],{},"leak",". The charge slowly drains away over time, so the data would be lost if left alone. That's why DRAM needs ",[31,5686,5687],{},"periodic refreshing"," — the memory controller must regularly read every row and write it back to restore the charges. 
This is what makes it \"dynamic.\"",[16,5690,5691],{},[31,5692,5693],{},"How DRAM reads and writes:",[116,5695,5696,5702],{},[119,5697,5698,5701],{},[31,5699,5700],{},"Writing:"," A voltage is applied to the bit line (high for 1, low for 0). Then the address line is activated, which turns on the transistor, allowing the charge to transfer to the capacitor.",[119,5703,5704,5707,5708,5711,5712,5715],{},[31,5705,5706],{},"Reading:"," The address line is activated, turning on the transistor. The charge stored in the capacitor flows out through the bit line to a ",[31,5709,5710],{},"sense amplifier",", which compares it against a reference voltage to determine whether it's a 0 or 1. Here's the tricky part — reading is ",[31,5713,5714],{},"destructive",". The act of reading drains the capacitor, so the charge must be written back after every read.",[16,5717,5718],{},[31,5719,5720],{},"Refreshing in practice:",[16,5722,5723],{},"A dedicated refresh circuit is built into the chip. It temporarily disables normal access, steps through each row, reads the data, and writes it back. This takes time and slightly reduces the apparent performance of the chip. Every few milliseconds, the entire memory array must be refreshed.",[4357,5725,5727,5730],{"className":5726},[4360,4361],[3993,5728,5729],{},"Additional Explanation",[4366,5731,5733,5736,5742,5748,5762],{"className":5732},[4369],[16,5734,5735],{},"DRAM is arranged in a massive grid, much like a spreadsheet or a city map. To read or write data, the memory controller needs a way to target specific \"coordinates\" on that grid.",[16,5737,5738,5741],{},[31,5739,5740],{},"The Address Line"," is the horizontal wire running across a row of memory cells. Think of the transistor in the DRAM cell as a door blocking access to the capacitor (where the data lives). The address line controls that door. 
When the memory controller sends a high voltage down the address line, it \"opens the door\" (turns on the transistor) for every single cell in that specific row, connecting their capacitors to the rest of the circuit.",[16,5743,5744,5747],{},[31,5745,5746],{},"The Bit Line"," is the vertical wire running down a column of memory cells. If the address line opens the door, the bit line is the hallway the data travels through.",[116,5749,5750,5756],{},[119,5751,5752,5755],{},[31,5753,5754],{},"During a Write:"," The memory controller forces a high voltage (a 1) or low voltage (a 0) down the bit line. Because the address line opened the transistor door, that voltage flows from the bit line into the capacitor, charging or discharging it.",[119,5757,5758,5761],{},[31,5759,5760],{},"During a Read:"," The transistor door opens, and whatever tiny bit of charge is stored in the capacitor spills out onto the bit line to be read.",[16,5763,5764,5765,5768,5769,5772],{},"The Sense Amplifier is a highly sensitive measuring circuit situated at the end of each bit line. The capacitor inside a DRAM cell is microscopic. When it dumps its charge onto the much larger bit line during a \"read\" operation, the resulting voltage change is incredibly weak—barely a whisper of a signal. The sense amplifier detects that tiny voltage shift and instantly ",[31,5766,5767],{},"amplifies"," it into a strong, clear digital 1 or 0 that the computer's processor can actually understand. Furthermore, because reading the capacitor drained it (destructive read), the sense amplifier immediately pushes that newly amplified strong signal ",[23,5770,5771],{},"back"," up the bit line to recharge the capacitor, saving the data from being lost forever.",[42,5774,5776],{"id":5775},"sram-static-ram-the-fast-one","SRAM (Static RAM) — The Fast One",[16,5778,5779,5780,5783,5784,5787],{},"SRAM takes a completely different approach. 
Instead of a capacitor, it stores each bit using a ",[31,5781,5782],{},"flip-flop"," circuit — a configuration of ",[31,5785,5786],{},"six transistors"," (typically labeled T₁ through T₆). Two pairs of transistors form cross-coupled inverters (T₁\u002FT₃ and T₂\u002FT₄), creating two stable states. The other two transistors (T₅ and T₆) connect the cell to the bit lines and are controlled by the address line.",[116,5789,5790,5796],{},[119,5791,5792,5795],{},[31,5793,5794],{},"State 1:"," Point C₁ is high, C₂ is low. Transistors T₁ and T₄ are off, T₂ and T₃ are on.",[119,5797,5798,5801],{},[31,5799,5800],{},"State 0:"," Point C₂ is high, C₁ is low. Transistors T₂ and T₃ are off, T₁ and T₄ are on.",[16,5803,5804,5805,5808],{},"The beauty of SRAM is that ",[31,5806,5807],{},"as long as power is supplied, the flip-flop holds its state indefinitely"," — no refresh needed. To write, you apply the desired value to bit line B and its complement to B̄, then activate the address line. To read, the value simply appears on bit line B when the address line is activated.",[4357,5810,5812,5814],{"className":5811},[4360,4361],[3993,5813,5729],{},[4366,5815,5817,5823,5826,5833,5839,5847,5850,5864,5870,5873,5885,5891,5905,5917,5920,5926,5951],{"className":5816},[4369],[42,5818,5820],{"id":5819},"_1-the-core-the-flip-flop-t-t-t-t",[31,5821,5822],{},"1. The Core: The Flip-Flop (T₁, T₂, T₃, T₄)",[16,5824,5825],{},"This is where the data actually lives. 
These four transistors are wired together to create two \"inverters.\" An inverter is a simple logic gate: whatever signal goes in, the exact opposite comes out (a 1 becomes a 0, and a 0 becomes a 1).",[16,5827,5828,5829,5832],{},"In SRAM, these two inverters are ",[31,5830,5831],{},"cross-coupled",", meaning the output of the first is plugged into the input of the second, and the output of the second is plugged into the input of the first.",[16,5834,5835,5838],{},[31,5836,5837],{},"The Analogy:"," Imagine two people, Alice and Bob, locked in a room.",[116,5840,5841,5844],{},[119,5842,5843],{},"Alice’s only rule is: \"I must shout the exact opposite of whatever Bob shouts.\"",[119,5845,5846],{},"Bob’s only rule is: \"I must shout the exact opposite of whatever Alice shouts.\"",[16,5848,5849],{},"If Alice shouts \"YES\" (1), Bob hears it and immediately shouts \"NO\" (0). Alice hears Bob's \"NO\" and uses it to justify continuing to shout \"YES\".",[16,5851,5852,5853,5856,5857,724,5860,5863],{},"They will hold this exact state forever, without needing any outside help, as long as they are awake (connected to power). This self-reinforcing loop is what makes SRAM ",[31,5854,5855],{},"static",". It does not need to be refreshed because the circuit actively holds itself in place. Points ",[31,5858,5859],{},"C₁",[31,5861,5862],{},"C₂"," from your text represent what Alice and Bob are currently shouting.",[42,5865,5867],{"id":5866},"_2-the-doors-the-access-transistors-t-and-t",[31,5868,5869],{},"2. 
The Doors: The Access Transistors (T₅ and T₆)",[16,5871,5872],{},"If T₁ through T₄ are Alice and Bob locked in a room, T₅ and T₆ are two separate doors leading into that room.",[116,5874,5875,5882],{},[119,5876,5877,5878,5881],{},"They are both controlled by the ",[31,5879,5880],{},"Address Line"," (the horizontal wire that selects the row).",[119,5883,5884],{},"When the Address Line is activated, both doors open simultaneously.",[42,5886,5888],{"id":5887},"_3-the-pathways-the-bit-lines-b-and-b̄",[31,5889,5890],{},"3. The Pathways: The Bit Lines (B and B̄)",[16,5892,5893,5894,5897,5898,724,5901,5904],{},"Notice that SRAM uses ",[23,5895,5896],{},"two"," bit lines for a single cell: ",[31,5899,5900],{},"B",[31,5902,5903],{},"B̄"," (pronounced \"B-bar\" or \"B-complement\").",[116,5906,5907,5912],{},[119,5908,5909,5911],{},[31,5910,5900],{}," carries the actual data (e.g., a 1).",[119,5913,5914,5916],{},[31,5915,5903],{}," always carries the exact opposite (e.g., a 0).",[16,5918,5919],{},"We use two lines because the internal loop (Alice and Bob) is very stubborn. To read or write quickly and reliably, we need to interact with both sides of the loop at the same time.",[42,5921,5923],{"id":5922},"how-reading-and-writing-works",[31,5924,5925],{},"How Reading and Writing Works",[116,5927,5928,5940],{},[119,5929,5930,5933,5934,5936,5937,5939],{},[31,5931,5932],{},"Writing (Forcing a change):"," Let's say the cell is currently storing a 0, but you want to write a 1. The memory controller sends a strong \"1\" signal down the ",[31,5935,5900],{}," line, and a strong \"0\" signal down the ",[31,5938,5903],{}," line. Then, it activates the Address Line, opening the doors (T₅ and T₆). The powerful signals from the outside flood into the room, overpower Alice and Bob, and force them to flip their stances. 
Once the doors close, the loop stabilizes in its new state.",[119,5941,5942,5945,5946,724,5948,5950],{},[31,5943,5944],{},"Reading (A non-destructive look):"," The controller pre-charges both the ",[31,5947,5900],{},[31,5949,5903],{}," lines to a neutral, middle voltage. Then it activates the Address Line to open the doors (T₅ and T₆). Because the internal loop is actively powered, it pushes its internal voltages out through the doors onto the bit lines. The Sense Amplifiers at the bottom of the bit lines detect which line went slightly up and which went slightly down to figure out if it's a 1 or a 0.",[16,5952,5953,5954,5957],{},"Most importantly: ",[31,5955,5956],{},"reading does not destroy the data."," Because the flip-flop is constantly connected to the main power supply, looking at its state doesn't drain it the way reading a DRAM capacitor does.",[42,5959,5961],{"id":5960},"sram-vs-dram-a-quick-comparison","SRAM vs. DRAM — A Quick Comparison",[16,5963,5964,5965,5967],{},"Both SRAM and DRAM are ",[31,5966,5549],{}," — they both need power to hold their data. But beyond that, they differ in almost every way:",[16,5969,5970,5972],{},[31,5971,4237],{}," is simpler to build (1 transistor + 1 capacitor per cell), smaller per bit, higher density, and cheaper. But it needs constant refreshing.",[16,5974,5975,5977],{},[31,5976,5657],{}," is faster — no refresh delays, no destructive reads. 
But it uses 6 transistors per cell, so it's larger, less dense, and more expensive.",[16,5979,5980,5981,5984,5985,5988],{},"This is why ",[31,5982,5983],{},"DRAM is used for main memory"," (where you need lots of gigabytes at a reasonable price) and ",[31,5986,5987],{},"SRAM is used for cache"," (where you need blazing speed and don't mind the higher cost for smaller amounts).",[11,5990,5992],{"id":5991},"rom-memory-that-doesnt-forget","ROM: Memory That Doesn't Forget",[16,5994,5995,5996,5998],{},"While RAM is great for active computation, sometimes you need memory that survives a power cycle. That's where ",[31,5997,5467],{}," (Read Only Memory) comes in.",[16,6000,6001,6002,6004],{},"ROM is ",[31,6003,5553],{}," — no power source is needed to maintain the stored data. It's used for things like:",[116,6006,6007,6013,6019],{},[119,6008,6009,6012],{},[31,6010,6011],{},"BIOS\u002Ffirmware"," — the code that boots your computer before the OS loads",[119,6014,6015,6018],{},[31,6016,6017],{},"Library subroutines"," — frequently used functions baked into hardware",[119,6020,6021,6024],{},[31,6022,6023],{},"Function tables"," — lookup tables for common calculations",[42,6026,6028],{"id":6027},"the-rom-family","The ROM Family",[16,6030,6031],{},"ROM comes in several varieties, each with different trade-offs between flexibility and cost:",[16,6033,6034,6037],{},[31,6035,6036],{},"ROM (Mask ROM):"," The data is literally wired in during the manufacturing process using photographic masks. It's permanent and cannot be changed. This makes it extremely cheap at scale, but there's absolutely no room for error — if the data is wrong, you throw away the whole chip. Best for mass-produced devices where the firmware will never change.",[16,6039,6040,6042,6043,6045],{},[31,6041,5581],{}," Like ROM, but the user can write data to it ",[31,6044,5585],{}," after manufacture, using special programming equipment. It's non-erasable, so mistakes are permanent. 
PROM is ideal for small production runs or prototyping where mask ROM's setup costs aren't justified.",[16,6047,6048,6050,6051,6054],{},[31,6049,5591],{}," A step up — EPROM can be erased by exposing the chip to ",[31,6052,6053],{},"ultraviolet light"," through a small quartz window on the chip package. UV exposure erases the entire chip (you can't selectively erase), and the process takes a relatively long time (minutes to hours). Once erased, it can be reprogrammed electrically.",[16,6056,6057,6060,6061,6064],{},[31,6058,6059],{},"EEPROM (Electrically Erasable PROM):"," No UV lamp needed — EEPROM can be erased and reprogrammed ",[31,6062,6063],{},"electrically at the byte level",". You can update individual bytes without erasing the whole chip. The downsides: writing takes much longer than reading, and EEPROM is more expensive and less dense than EPROM.",[16,6066,6067,6070,6071,6074,6075,6078],{},[31,6068,6069],{},"Flash Memory:"," Flash is the sweet spot between EPROM and EEPROM. It erases ",[31,6072,6073],{},"electrically"," (no UV light needed) but at the ",[31,6076,6077],{},"block level"," rather than byte level. A section of memory cells is erased in a single fast action — that's where the name \"flash\" comes from. It uses only one transistor per bit (like EPROM), achieving high density, while being much faster to program than EEPROM.",[11,6080,6082],{"id":6081},"inside-a-dram-chip-organisation","Inside a DRAM Chip: Organisation",[16,6084,6085,6086,6089],{},"Let's look at how a real DRAM chip is organised internally, using a ",[31,6087,6088],{},"16 Mbit DRAM (4M × 4)"," as an example. 
This chip stores 16 megabits of data, arranged as 4 million locations, each storing 4 bits.",[4366,6091,6093],{"className":6092},[4360,4361],[16,6094,6095,6098,6099,6102,6103,6106],{},[31,6096,6097],{},"\"4M × 4\""," means there are ",[31,6100,6101],{},"4 million individual lockers"," (locations), and every time you open one locker, you put in or take out exactly ",[31,6104,6105],{},"4 bits"," of data at once.",[16,6108,6109,6110,6113],{},"The memory array is physically laid out as a ",[31,6111,6112],{},"2048 × 2048 × 4"," grid. Sure enough, 2048 rows × 2048 columns × 4 bits = 16,777,216 bits = 16 Mbit. To address 2048 rows, you need 11 address lines (since 2¹¹ = 2048).",[16,6115,6116],{},[31,6117,6118],{},"The clever trick — address multiplexing:",[16,6120,6121,6122,6125,6126,6129,6130,6133,6134,6137],{},"Instead of using 22 address pins (11 for rows + 11 for columns), DRAM chips ",[31,6123,6124],{},"multiplex"," the address: they send the row address and column address ",[23,6127,6128],{},"over the same pins at different times",". First, the row address is sent and latched with the ",[31,6131,6132],{},"RAS (Row Address Strobe)"," signal. Then the column address is sent and latched with the ",[31,6135,6136],{},"CAS (Column Address Strobe)"," signal. This cuts the pin count roughly in half — a huge deal for chip packaging.",[16,6139,6140,6141,6144,6145,6148,6149,396,6152,257],{},"The chip also includes a ",[31,6142,6143],{},"refresh counter"," (to cycle through rows during refresh), a ",[31,6146,6147],{},"MUX"," (to select between external addresses and refresh addresses), ",[31,6150,6151],{},"row and column decoders",[31,6153,6154],{},"data I\u002FO buffers",[4357,6156,6158,6161],{"className":6157},[4360,4361],[3993,6159,6160],{},"Additional notes",[4366,6162,6164,6174,6183,6189],{"className":6163},[4369],[16,6165,6166,6169,6170,6173],{},[31,6167,6168],{},"Row and column decoders",": these are what actually ",[23,6171,6172],{},"use"," the latched addresses. 
The row decoder takes the 11-bit row number and activates one physical wire out of 2048. The column decoder does the same for columns. They're the bridge between \"the chip received address bits\" and \"the correct cells are now connected.\"",[16,6175,6176,6179,6180,6182],{},[31,6177,6178],{},"Refresh counter",": this is just a simple counter that automatically cycles through row numbers (0, 1, 2, ... 2047, 0, 1, ...) so every row gets refreshed periodically. We already covered ",[23,6181,4097],{}," refresh is needed in the DRAM section. The counter is just the mechanism that automates it.",[16,6184,6185,6188],{},[31,6186,6187],{},"MUX (multiplexer)",": during normal operation, the row decoder receives the external address from the CPU. During refresh, it needs the address from the refresh counter instead. The MUX is a simple switch that picks between these two sources.",[16,6190,6191,6194],{},[31,6192,6193],{},"Data I\u002FO buffers",": just the circuitry that drives the data pins during a read and receives data during a write. Straightforward.",[42,6196,6198],{"id":6197},"packaging-and-chip-pinouts","Packaging and Chip Pinouts",[16,6200,6201],{},"Memory chips come in standard packages with defined pinouts. For example, an 8 Mbit EPROM might use a 32-pin DIP (Dual In-line Package), while a 16 Mbit DRAM uses a 24-pin package. The DRAM needs fewer pins partly because of address multiplexing, a neat design win.",[42,6203,6205],{"id":6204},"building-bigger-memory-modules","Building Bigger: Memory Modules",[16,6207,6208,6209,6212,6213,257],{},"A single chip usually stores one bit (or a few bits) per address. To build a full ",[31,6210,6211],{},"byte-wide"," memory system, you combine multiple chips into a ",[31,6214,6215],{},"module",[16,6217,6218,6219,6222,6223,6226],{},"For example, to build a ",[31,6220,6221],{},"256 kByte module",", you can combine ",[31,6224,6225],{},"eight"," 1-bit RAM chips, each with 256k locations. 
All eight chips share the same address bus, and each chip contributes one bit to form an 8-bit (1-byte) data word. The Memory Address Register (MAR) feeds the address to all chips simultaneously, and the Memory Buffer Register (MBR) collects the 8 bits.",[16,6228,2013,6229,6232],{},[31,6230,6231],{},"1 MByte module"," scales this up further by grouping chips into four groups (A, B, C, D), with a chip-group-enable signal selecting which group is active. This approach lets you increase capacity by adding more chip groups without redesigning the chip itself.",[11,6234,6236],{"id":6235},"error-correction-because-bits-flip","Error Correction: Because Bits Flip",[16,6238,6239],{},"Memory isn't perfect. Errors can and do occur:",[116,6241,6242,6248],{},[119,6243,6244,6247],{},[31,6245,6246],{},"Hard failures"," are permanent physical defects — a cell that's stuck at 0 or 1.",[119,6249,6250,6253],{},[31,6251,6252],{},"Soft errors"," are random, non-destructive events. A cosmic ray or electrical noise might flip a bit, but the cell itself is fine. No permanent damage to the memory.",[16,6255,6256,6257,6260,6261,257],{},"To detect and fix these, memory systems use ",[31,6258,6259],{},"error-correcting codes (ECC)",", most commonly ",[31,6262,6263],{},"Hamming codes",[16,6265,6266,6267,6270,6271,6273,6274,6277,6278,6281,6282,6285,6286,6289],{},"Here's the basic idea: when data (M bits) is written to memory, a function ",[23,6268,6269],{},"f"," generates K check bits from the data. Both the M data bits and K check bits are stored together. When the data is read back, the same function ",[23,6272,6269],{}," is applied to the M data bits to generate a ",[23,6275,6276],{},"new"," set of K check bits. These are ",[31,6279,6280],{},"compared"," with the stored check bits. If they match, no error. If they differ, the comparison result (called a ",[31,6283,6284],{},"syndrome",") identifies which bit flipped, and a ",[31,6287,6288],{},"corrector"," circuit fixes it. 
An error signal is also raised so the system knows a correction occurred.",[11,6291,6293],{"id":6292},"advanced-dram-sdram-and-ddr","Advanced DRAM: SDRAM and DDR",[16,6295,6296],{},"Basic DRAM is asynchronous — the processor sends a request and waits an unpredictable amount of time for the data to arrive. As processors got faster, this \"waiting around\" became a serious bottleneck. Enter the advanced DRAM technologies.",[42,6298,6300],{"id":6299},"sdram-synchronous-dram","SDRAM (Synchronous DRAM)",[16,6302,6303,6304,6307,6308,6311],{},"SDRAM changed the game by ",[31,6305,6306],{},"synchronizing memory access to an external clock",". In conventional DRAM, the CPU sends a request and then just... waits. With SDRAM, because data transfers happen in lockstep with the system clock, the CPU ",[31,6309,6310],{},"knows exactly when the data will be ready",". This means the CPU can go do something else in the meantime instead of sitting idle.",[16,6313,6314,6315,6318],{},"SDRAM also supports ",[31,6316,6317],{},"burst mode",": after providing a starting address, the chip can fire out a stream of consecutive data words on successive clock cycles without needing a new address for each one. For example, with a burst length of 4 and a CAS latency of 2, you issue one READ command and get four consecutive data outputs starting two clock cycles later.",[42,6320,6322],{"id":6321},"ddr-sdram-double-data-rate","DDR SDRAM (Double Data Rate)",[16,6324,6325,6326,6329,6330,6332],{},"DDR took SDRAM further by transferring data on ",[31,6327,6328],{},"both edges of the clock signal"," — the rising edge ",[23,6331,4168],{}," the falling edge. This effectively doubles the data rate without increasing the clock frequency.",[16,6334,6335,6336,6339],{},"But DDR's improvements go beyond that. 
Each successive DDR generation added a wider ",[31,6337,6338],{},"prefetch buffer"," — an internal buffer that reads more bits from the memory array in each access:",[408,6341,6342,6358],{},[411,6343,6344],{},[414,6345,6346,6349,6352,6355],{},[417,6347,6348],{},"Generation",[417,6350,6351],{},"Prefetch Buffer",[417,6353,6354],{},"Voltage",[417,6356,6357],{},"Data Rate",[433,6359,6360,6374,6387,6401],{},[414,6361,6362,6365,6368,6371],{},[438,6363,6364],{},"DDR1",[438,6366,6367],{},"2 bits",[438,6369,6370],{},"2.5V",[438,6372,6373],{},"200–400 Mbps",[414,6375,6376,6379,6381,6384],{},[438,6377,6378],{},"DDR2",[438,6380,6105],{},[438,6382,6383],{},"1.8V",[438,6385,6386],{},"400–1,066 Mbps",[414,6388,6389,6392,6395,6398],{},[438,6390,6391],{},"DDR3",[438,6393,6394],{},"8 bits",[438,6396,6397],{},"1.5V",[438,6399,6400],{},"800–2,133 Mbps",[414,6402,6403,6406,6408,6411],{},[438,6404,6405],{},"DDR4",[438,6407,6394],{},[438,6409,6410],{},"1.2V",[438,6412,6413],{},"2,133–4,266 Mbps",[16,6415,6416,6417,6420,6421,6424],{},"Notice the pattern: each generation ",[31,6418,6419],{},"lowers the voltage"," (less power, less heat) while ",[31,6422,6423],{},"increasing the data rate",". The internal memory array runs at roughly the same speed across generations — the speed gains come from the wider prefetch and faster I\u002FO bus. DDR4 also introduced a two-multiplexer design with bank groups, further boosting throughput.",[11,6426,6428],{"id":6427},"flash-memory-the-best-of-both-worlds","Flash Memory: The Best of Both Worlds",[16,6430,6431],{},"Flash memory deserves its own section because it has become incredibly important in modern computing — from USB drives to SSDs to the storage in your phone.",[16,6433,6434,6435,6438],{},"First introduced in the mid-1980s, flash sits between EPROM and EEPROM in both cost and functionality. Like EEPROM, it erases electrically (no UV light). Like EPROM, it uses only one transistor per bit, achieving high density. 
The trade-off is that it ",[31,6436,6437],{},"cannot erase individual bytes"," — you must erase entire blocks at once.",[42,6440,6442],{"id":6441},"how-flash-works","How Flash Works",[16,6444,6445,6446,6449],{},"A flash memory cell is a modified transistor with an extra layer: a ",[31,6447,6448],{},"floating gate"," sandwiched between the control gate and the transistor channel, surrounded by insulating oxide.",[116,6451,6452,6458],{},[119,6453,6454,6457],{},[31,6455,6456],{},"To store a 0:"," Electrons are injected onto the floating gate (by applying a high voltage to the control gate). These trapped electrons raise the transistor's threshold voltage, making it harder to turn on. When read, the cell doesn't conduct — that's interpreted as 0.",[119,6459,6460,6463],{},[31,6461,6462],{},"To store a 1:"," The floating gate has no trapped electrons (or they've been removed by erasure). The transistor turns on normally when read — that's interpreted as 1.",[42,6465,6467],{"id":6466},"nor-vs-nand-flash","NOR vs. NAND Flash",[16,6469,6470],{},"Flash chips come in two main architectures:",[16,6472,6473,6476,6477,6480],{},[31,6474,6475],{},"NOR flash"," connects each cell individually to the bit line (like NOR gates in parallel). This allows ",[31,6478,6479],{},"random access"," to any byte — great for executing code directly from flash (called XIP, \"execute in place\"). NOR flash is typically used for firmware storage and embedded applications.",[16,6482,6483,6486,6487,6489,6490,6493],{},[31,6484,6485],{},"NAND flash"," connects cells in ",[31,6488,981],{}," (like a chain). You can't access individual bytes randomly — you read and write in pages (typically 4 kB or larger). But NAND is ",[31,6491,6492],{},"denser and cheaper"," per bit because the series connection uses less chip area. 
NAND flash is what's inside your SSD, USB drive, and smartphone storage.",[11,6495,6497],{"id":6496},"looking-ahead-nonvolatile-ram-nvram","Looking Ahead: Nonvolatile RAM (NVRAM)",[16,6499,6500,6501,6504],{},"The memory world is actively pursuing technologies that combine the speed of RAM with the persistence of flash. These ",[31,6502,6503],{},"nonvolatile RAM"," (NVRAM) technologies sit in the memory hierarchy between DRAM and flash\u002FSSD, potentially offering both fast access and data retention without power.",[16,6506,6507],{},"Some promising NVRAM technologies include STT-RAM (Spin-Transfer Torque RAM), PCRAM (Phase-Change RAM), and ReRAM (Resistive RAM). Each uses a different physical mechanism to store data persistently while aiming for DRAM-like speeds. While these are still maturing, they could eventually blur the line between memory and storage.",[11,6509,6511],{"id":6510},"summary-putting-it-all-together","Summary: Putting It All Together",[16,6513,6514],{},"Internal memory is all about trade-offs. Here's how the pieces fit together in a typical system:",[16,6516,6517,6520],{},[31,6518,6519],{},"Closest to the CPU → Fastest but smallest and most expensive:","\nRegisters → SRAM Cache (L1, L2, L3) → DRAM Main Memory → Flash\u002FSSD Storage",[16,6522,6523,6524,6526],{},"Each layer compensates for the weaknesses of the next. SRAM cache hides the relative slowness of DRAM. DRAM provides the capacity that SRAM can't afford. Flash provides persistence that DRAM can't. And the entire hierarchy works together to give you the ",[23,6525,5273],{}," of memory that's fast, big, cheap, and permanent — even though no single technology delivers all four.",[16,6528,6529],{},"Understanding this hierarchy — and the physics behind each layer — is one of the most fundamental insights in computer architecture. 
Every optimisation in modern computing, from CPU cache policies to SSD wear levelling, traces back to these core trade-offs.",{"title":284,"searchDepth":818,"depth":818,"links":6531},[6532,6533,6534,6535,6540,6543,6547,6548,6552,6556,6557],{"id":4069,"depth":818,"text":4070},{"id":5347,"depth":818,"text":5348},{"id":5414,"depth":818,"text":5415},{"id":5621,"depth":818,"text":5622,"children":6536},[6537,6538,6539],{"id":5660,"depth":824,"text":5661},{"id":5775,"depth":824,"text":5776},{"id":5960,"depth":824,"text":5961},{"id":5991,"depth":818,"text":5992,"children":6541},[6542],{"id":6027,"depth":824,"text":6028},{"id":6081,"depth":818,"text":6082,"children":6544},[6545,6546],{"id":6197,"depth":824,"text":6198},{"id":6204,"depth":824,"text":6205},{"id":6235,"depth":818,"text":6236},{"id":6292,"depth":818,"text":6293,"children":6549},[6550,6551],{"id":6299,"depth":824,"text":6300},{"id":6321,"depth":824,"text":6322},{"id":6427,"depth":818,"text":6428,"children":6553},[6554,6555],{"id":6441,"depth":824,"text":6442},{"id":6466,"depth":824,"text":6467},{"id":6496,"depth":818,"text":6497},{"id":6510,"depth":818,"text":6511},"2026-04-21",{},"\u002Fblog\u002FInternal-Memory-How-Your-Computer-Remembers-Things",{"title":5303,"description":284},{"loc":6560},"blog\u002FInternal-Memory-How-Your-Computer-Remembers-Things","ijpMb06XXoS4sMkXGGkIJkRveH4ZtcNrH4vSYFSjScw",[6566,6575,6584,6593,6602,6611,6620,6629,6638,6649],{"id":985,"title":986,"avatar":987,"banner":864,"bio":988,"body":6567,"description":284,"extension":863,"meta":6571,"name":986,"navigation":866,"path":994,"seo":6572,"sitemap":6573,"social":6574,"stem":1001,"__hash__":1002},{"type":8,"value":6568,"toc":6569},[],{"title":284,"searchDepth":818,"depth":818,"links":6570},[],{},{"description":284},{"loc":994},{"website":998,"twitter":999,"github":1000},{"id":1004,"title":1005,"avatar":1006,"banner":1007,"bio":1008,"body":6576,"description":284,"extension":863,"meta":6580,"name":1014,"navigation":866,"path":1015,"seo":658
1,"sitemap":6582,"social":6583,"stem":1021,"__hash__":1022},{"type":8,"value":6577,"toc":6578},[],{"title":284,"searchDepth":818,"depth":818,"links":6579},[],{},{"description":284},{"loc":1015},{"github":1019,"twitter":284,"website":1020},{"id":1024,"title":1025,"avatar":1026,"banner":1027,"bio":1028,"body":6585,"description":284,"extension":863,"meta":6589,"name":1034,"navigation":866,"path":1035,"seo":6590,"sitemap":6591,"social":6592,"stem":1040,"__hash__":1041},{"type":8,"value":6586,"toc":6587},[],{"title":284,"searchDepth":818,"depth":818,"links":6588},[],{},{"description":284},{"loc":1035},{"github":1039,"twitter":284},{"id":1043,"title":1044,"avatar":1045,"banner":864,"bio":1046,"body":6594,"description":284,"extension":863,"meta":6598,"name":1044,"navigation":866,"path":1052,"seo":6599,"sitemap":6600,"social":6601,"stem":1057,"__hash__":1058},{"type":8,"value":6595,"toc":6596},[],{"title":284,"searchDepth":818,"depth":818,"links":6597},[],{},{"description":284},{"loc":1052},{"github":1056},{"id":1060,"title":1061,"avatar":1062,"banner":864,"bio":1063,"body":6603,"description":284,"extension":863,"meta":6607,"name":1061,"navigation":866,"path":1069,"seo":6608,"sitemap":6609,"social":6610,"stem":1074,"__hash__":1075},{"type":8,"value":6604,"toc":6605},[],{"title":284,"searchDepth":818,"depth":818,"links":6606},[],{},{"description":284},{"loc":1069},{"github":1073},{"id":1077,"title":1078,"avatar":864,"banner":864,"bio":1079,"body":6612,"description":284,"extension":863,"meta":6616,"name":1078,"navigation":866,"path":1085,"seo":6617,"sitemap":6618,"social":6619,"stem":1089,"__hash__":1090},{"type":8,"value":6613,"toc":6614},[],{"title":284,"searchDepth":818,"depth":818,"links":6615},[],{},{"description":284},{"loc":1085},{"github":284},{"id":1092,"title":1093,"avatar":1094,"banner":864,"bio":1095,"body":6621,"description":284,"extension":863,"meta":6625,"name":1093,"navigation":866,"path":1101,"seo":6626,"sitemap":6627,"social":6628,"stem":1106,"__hash__":1107
},{"type":8,"value":6622,"toc":6623},[],{"title":284,"searchDepth":818,"depth":818,"links":6624},[],{},{"description":284},{"loc":1101},{"github":1105},{"id":1109,"title":1110,"avatar":1111,"banner":1112,"bio":1113,"body":6630,"description":284,"extension":863,"meta":6634,"name":1110,"navigation":866,"path":1119,"seo":6635,"sitemap":6636,"social":6637,"stem":1124,"__hash__":1125},{"type":8,"value":6631,"toc":6632},[],{"title":284,"searchDepth":818,"depth":818,"links":6633},[],{},{"description":284},{"loc":1119},{"github":1123,"twitter":284},{"id":1127,"title":1128,"avatar":1129,"banner":1130,"bio":1131,"body":6639,"description":1136,"extension":863,"meta":6645,"name":1128,"navigation":866,"path":1140,"seo":6646,"sitemap":6647,"social":6648,"stem":1146,"__hash__":1147},{"type":8,"value":6640,"toc":6643},[6641],[16,6642,1136],{},{"title":284,"searchDepth":818,"depth":818,"links":6644},[],{},{"description":1136},{"loc":1140},{"twitter":1144,"github":1145},{"id":1149,"title":1150,"avatar":1151,"banner":864,"bio":1152,"body":6650,"description":1157,"extension":863,"meta":6662,"name":1150,"navigation":866,"path":1171,"seo":6663,"sitemap":6664,"social":6665,"stem":1176,"__hash__":1177},{"type":8,"value":6651,"toc":6660},[6652,6654],[16,6653,1157],{},[16,6655,6656,1163,6658,1167],{},[31,6657,1162],{},[31,6659,1166],{},{"title":284,"searchDepth":818,"depth":818,"links":6661},[],{},{"description":1157},{"loc":1171},{"github":1175},1776830066792]