<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Sihan's Blog</title><link>https://blog.sihanwei.org/</link><description>Recent content on Sihan's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>Sihan Wei</copyright><atom:link href="https://blog.sihanwei.org/index.xml" rel="self" type="application/rss+xml"/><item><title>Teaching Reflection (I)</title><link>https://blog.sihanwei.org/p/teaching-reflection-i/</link><pubDate>Sat, 19 Jul 2025 22:54:24 -0400</pubDate><guid>https://blog.sihanwei.org/p/teaching-reflection-i/</guid><description>&lt;p>I have been a teaching assistant for many years, but I did not realize how difficult teaching is until I started teaching my own course.&lt;/p>
&lt;p>It is often said that being good at a subject does not necessarily make someone a good teacher. That point is well understood. What I find to be an even more subtle and exhausting challenge is this: as a non-native speaker, I have to explain complex concepts in a second language to native speakers while simultaneously paying attention to tone and word choice. I need to make sure my expression is not interpreted as condescending, and that I do not discourage students, even unintentionally.&lt;/p>
&lt;p>When a student gives an incorrect answer, even one that is far off, I cannot simply say “no.” I have to acknowledge their effort, validate any part of their reasoning that makes sense, and gently redirect them. This kind of emotional labor is already demanding for any instructor. For someone teaching in a second language, it requires an added layer of constant mental effort.&lt;/p>
&lt;p>It takes time, self-awareness, and a great deal of patience — not only with students, but also with myself.&lt;/p></description></item><item><title>Changelog</title><link>https://blog.sihanwei.org/changelog/</link><pubDate>Sat, 05 Apr 2025 22:19:18 -0400</pubDate><guid>https://blog.sihanwei.org/changelog/</guid><description>&lt;p>This is where I keep track of changes to the blog — new posts, series, structure tweaks, and tiny milestones.&lt;br>
A little timeline for this little internet corner.&lt;/p>
&lt;hr>
&lt;h2 id="2025">2025
&lt;/h2>&lt;ul>
&lt;li>&lt;strong>2025.04.03&lt;/strong> — Started planning a new series, &lt;em>Unified View&lt;/em>, inspired by the FTRL vs OMD post. It will explore connections between seemingly different ML concepts.&lt;/li>
&lt;li>&lt;strong>2025.03.02&lt;/strong> — Decided to document my learning process in long-form series. First up: &lt;em>Math Tricks for ML&lt;/em> and &lt;em>SVM&lt;/em>.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2024">2024
&lt;/h2>&lt;ul>
&lt;li>&lt;strong>2024.10.18&lt;/strong> — Picked the blog back up, after digging into the connection between FTRL and OMD.&lt;br>
That rabbit hole reminded me why I started writing in the first place.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="2020">2020
&lt;/h2>&lt;ul>
&lt;li>&lt;strong>2020.07.24&lt;/strong> — Got my domain &lt;code>sihanwei.org&lt;/code> on Netlify for &lt;span>$&lt;/span>10.99/year (now &lt;span>$&lt;/span>14.99/year thanks to inflation!)&lt;/li>
&lt;li>&lt;strong>2020.06.27&lt;/strong> — Moved my blog from GitHub Pages to Netlify. &lt;em>(too lazy to build/deploy manually, to be honest)&lt;/em>&lt;/li>
&lt;li>&lt;strong>2017–2018&lt;/strong> — Found GitHub Pages &amp;amp; Hexo during senior year. Tried building an academic homepage to help with grad school apps. Eventually switched to Hugo — it’s fast, simple, and, let’s face it, pretty cool.&lt;/li>
&lt;/ul></description></item><item><title>Support Vector Machines (Part III): What Is a Support Vector?</title><link>https://blog.sihanwei.org/p/support-vector-machines-part-iii-what-is-a-support-vector/</link><pubDate>Wed, 26 Mar 2025 21:29:58 -0400</pubDate><guid>https://blog.sihanwei.org/p/support-vector-machines-part-iii-what-is-a-support-vector/</guid><description>&lt;img src="https://blog.sihanwei.org/p/support-vector-machines-part-iii-what-is-a-support-vector/sv-cover.jpg" alt="Featured image of post Support Vector Machines (Part III): What Is a Support Vector?" />&lt;h2 id="motivation">Motivation
&lt;/h2>&lt;p>As a former TA for machine learning courses—and a learner myself—I’ve noticed that many beginners encounter support vectors as abstract definitions in lecture slides or textbooks. While technically correct, these explanations often lack a visual or intuitive component, making it difficult to see which data points actually matter in practice.&lt;/p>
&lt;p>This post is my attempt to bridge that gap. We’ll revisit what support vectors are, why they matter, and—most importantly—how to recognize them visually. By the end, you should be able to look at the plot of a trained SVM and confidently identify the support vectors: the handful of data points that directly define the decision boundary.&lt;/p>
&lt;hr>
&lt;h2 id="the-role-of-lagrange-multipliers">The Role of Lagrange Multipliers
&lt;/h2>&lt;p>In the SVM framework, each training data point is associated with a Lagrange multiplier, denoted \(\alpha_i\). The optimal weight vector, which determines the decision boundary, is computed as:&lt;/p>
&lt;p>\[
\mathbf{w}^* = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i
\]&lt;/p>
&lt;p>Only those data points for which \(\alpha_i &amp;gt; 0\) contribute to this sum. In other words, if a data point’s corresponding multiplier is zero, it has no impact on the decision boundary. These influential points are what we call &lt;strong>support vectors&lt;/strong>.&lt;/p>
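&lt;p>If you want to inspect these multipliers yourself, here is a minimal sketch (the dataset and variable names are illustrative): scikit-learn’s &lt;code>SVC&lt;/code> stores \(y_i \alpha_i\) for the support vectors in &lt;code>dual_coef_&lt;/code>, and the corresponding training indices in &lt;code>support_&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python">import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative separable dataset
X, y = make_blobs(n_samples=20, centers=2, random_state=6, cluster_std=1.2)
svm = SVC(C=1.0, kernel=&amp;#39;linear&amp;#39;).fit(X, y)

# dual_coef_ holds y_i * alpha_i for the support vectors only;
# alpha_i is implicitly zero for every other training point.
alpha = np.zeros(len(X))
alpha[svm.support_] = np.abs(svm.dual_coef_[0])
print(alpha)  # the non-zero entries are exactly the support vectors
&lt;/code>&lt;/pre>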
&lt;hr>
&lt;h2 id="support-vectors-in-hard-margin-svm">Support Vectors in Hard-Margin SVM
&lt;/h2>&lt;p>For datasets that are perfectly separable, we use a hard-margin SVM. Here, the complementary slackness condition tells us:&lt;/p>
&lt;p>\[
\alpha_i(1 - y_i\mathbf{w}^\top\mathbf{x}_i) = 0
\]&lt;/p>
&lt;p>This equation means that for points &lt;em>not&lt;/em> on the margin (where \(y_i\mathbf{w}^\top\mathbf{x}_i &amp;gt; 1\)), the multiplier \(\alpha_i\) must be 0. Hence, only those points that lie exactly &lt;em>on&lt;/em> the margin (where \(y_i\mathbf{w}^\top\mathbf{x}_i = 1\)) have non-zero multipliers and are thus support vectors.&lt;/p>
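&lt;p>As a quick numerical check (a minimal sketch under the same illustrative setup, with a large \(C\) so the soft-margin solver approximates the hard margin), the decision function should evaluate to \(\pm 1\) at the support vectors:&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python">from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Separable toy data; a very large C approximates the hard-margin SVM
X, y = make_blobs(n_samples=20, centers=2, random_state=6, cluster_std=1.2)
svm = SVC(C=1e6, kernel=&amp;#39;linear&amp;#39;).fit(X, y)

# By complementary slackness, support vectors lie exactly on the margin,
# so the decision function is +1 or -1 there (up to numerical tolerance).
print(svm.decision_function(svm.support_vectors_))
&lt;/code>&lt;/pre>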
&lt;h3 id="try-yourself">Try yourself!
&lt;/h3>&lt;p>Can you identify the support vectors in the plot below?&lt;/p>
&lt;figure>
&lt;img src="hard_margin.png" class="clickable-image w-60">
&lt;figcaption>Hard-Margin SVM&lt;/figcaption>
&lt;/figure>
&lt;details>
&lt;summary>✅ Click to reveal the answer&lt;/summary>
&lt;p>There are &lt;strong>3&lt;/strong> support vectors in the plot above. These are the data points that lie &lt;em>exactly&lt;/em> on the margin boundaries. They are the only points with non-zero Lagrange multipliers $\alpha_i > 0$ and directly influence the position of the decision boundary.&lt;/p>
&lt;figure>
&lt;img src="hard_margin_marked.png" class="clickable-image w-60">
&lt;figcaption>Support Vectors in Hard-Margin SVM&lt;/figcaption>
&lt;/figure>
&lt;/details>
&lt;h3 id="code-for-generating-the-plots">Code for generating the plots
&lt;/h3>&lt;p>Want to experiment yourself? Below is the full code used to generate the plots. Try adjusting the &lt;code>random_state&lt;/code> in &lt;code>make_blobs&lt;/code> to generate different datasets and see how the support vectors change!&lt;/p>
&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt"> 1
&lt;/span>&lt;span class="lnt"> 2
&lt;/span>&lt;span class="lnt"> 3
&lt;/span>&lt;span class="lnt"> 4
&lt;/span>&lt;span class="lnt"> 5
&lt;/span>&lt;span class="lnt"> 6
&lt;/span>&lt;span class="lnt"> 7
&lt;/span>&lt;span class="lnt"> 8
&lt;/span>&lt;span class="lnt"> 9
&lt;/span>&lt;span class="lnt">10
&lt;/span>&lt;span class="lnt">11
&lt;/span>&lt;span class="lnt">12
&lt;/span>&lt;span class="lnt">13
&lt;/span>&lt;span class="lnt">14
&lt;/span>&lt;span class="lnt">15
&lt;/span>&lt;span class="lnt">16
&lt;/span>&lt;span class="lnt">17
&lt;/span>&lt;span class="lnt">18
&lt;/span>&lt;span class="lnt">19
&lt;/span>&lt;span class="lnt">20
&lt;/span>&lt;span class="lnt">21
&lt;/span>&lt;span class="lnt">22
&lt;/span>&lt;span class="lnt">23
&lt;/span>&lt;span class="lnt">24
&lt;/span>&lt;span class="lnt">25
&lt;/span>&lt;span class="lnt">26
&lt;/span>&lt;span class="lnt">27
&lt;/span>&lt;span class="lnt">28
&lt;/span>&lt;span class="lnt">29
&lt;/span>&lt;span class="lnt">30
&lt;/span>&lt;span class="lnt">31
&lt;/span>&lt;span class="lnt">32
&lt;/span>&lt;span class="lnt">33
&lt;/span>&lt;span class="lnt">34
&lt;/span>&lt;span class="lnt">35
&lt;/span>&lt;span class="lnt">36
&lt;/span>&lt;span class="lnt">37
&lt;/span>&lt;span class="lnt">38
&lt;/span>&lt;span class="lnt">39
&lt;/span>&lt;span class="lnt">40
&lt;/span>&lt;span class="lnt">41
&lt;/span>&lt;span class="lnt">42
&lt;/span>&lt;span class="lnt">43
&lt;/span>&lt;span class="lnt">44
&lt;/span>&lt;span class="lnt">45
&lt;/span>&lt;span class="lnt">46
&lt;/span>&lt;span class="lnt">47
&lt;/span>&lt;span class="lnt">48
&lt;/span>&lt;span class="lnt">49
&lt;/span>&lt;span class="lnt">50
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">np&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">matplotlib.pyplot&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">plt&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">sklearn&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datasets&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">sklearn.svm&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">SVC&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Generate a dataset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">X&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">make_blobs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_samples&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">centers&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">random_state&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">6&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">cluster_std&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1.2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create a soft margin SVM classifier&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">svm&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">SVC&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">C&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1.0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;linear&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">svm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">X&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot the data&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">figure&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">figsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">6&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">markers&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;o&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;x&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, marker=&amp;#39;o&amp;#39;, edgecolors=&amp;#39;k&amp;#39;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marker&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">zip&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unique&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">markers&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">X&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">y&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">y&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">80&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">marker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">marker&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Class &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">class_value&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot the decision boundary&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">gca&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xlim&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_xlim&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ylim&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_ylim&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create grid to evaluate model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xx&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linspace&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">xlim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">xlim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="mi">30&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">yy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linspace&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ylim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">ylim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="mi">30&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">YY&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">XX&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">meshgrid&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">yy&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">xx&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">vstack&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ravel&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">YY&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ravel&lt;/span>&lt;span class="p">()])&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">T&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Z&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">decision_function&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">xy&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reshape&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot decision boundary and margins&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">contour&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">YY&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">Z&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">colors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;k&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">levels&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">alpha&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">0.5&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">linestyles&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;-&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">num_support_vectors&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">support_vectors_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;The number of suppost vectors is &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">num_support_vectors&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Highlight the support vectors&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">sv&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">support_vectors_&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sv&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">sv&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">150&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">linewidth&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">facecolors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;none&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">edgecolors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;red&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tick_params&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">axis&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;both&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labelsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xlabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Feature 1&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Feature 2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># plt.savefig(&amp;#39;hard_margin_marked.png&amp;#39;, dpi=300)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">12&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">show&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;hr>
&lt;h2 id="support-vectors-in-soft-margin-svm">Support Vectors in Soft-Margin SVM
&lt;/h2>&lt;p>When data is not perfectly separable, SVMs use a soft-margin approach with slack variables \(\xi_i\). The complementary slackness and KKT conditions become:&lt;/p>
\[
\alpha_i(1 - y_i\mathbf{w}^\top\mathbf{x}_i - \xi_i) = 0 \\
(C - \alpha_i)\xi_i = 0
\]&lt;p>For points with \(\alpha_i &amp;gt; 0\), we then encounter two cases:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>\(\alpha_i &amp;gt; 0\), \(\xi_i = 0\):&lt;/strong>&lt;br>
The point lies exactly on the margin border. It is a &lt;strong>support vector&lt;/strong>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>\(\alpha_i &amp;gt; 0\), \(\xi_i &amp;gt; 0\):&lt;/strong>&lt;br>
The point is either inside the margin or misclassified. Here, \(\alpha_i = C\). These points also influence the decision boundary and are &lt;strong>support vectors&lt;/strong>.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>In contrast, points with \(\alpha_i = 0\) lie far from the margin and do &lt;strong>not&lt;/strong> affect the model.&lt;/p>
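&lt;p>The two kinds of support vectors can be told apart numerically. Here is a minimal sketch (illustrative dataset and names), again using the fact that &lt;code>dual_coef_&lt;/code> stores \(y_i \alpha_i\): multipliers strictly between \(0\) and \(C\) correspond to points exactly on the margin, while multipliers equal to \(C\) correspond to points inside the margin or misclassified.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python">import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs so that some slack variables are active
X, y = make_blobs(n_samples=20, centers=2, random_state=0, cluster_std=1.2)
C = 1.0
svm = SVC(C=C, kernel=&amp;#39;linear&amp;#39;).fit(X, y)

alpha = np.abs(svm.dual_coef_[0])   # alpha_i for each support vector
at_bound = np.isclose(alpha, C)     # alpha_i = C: inside the margin or misclassified
print(svm.support_[~at_bound])      # indices of points exactly on the margin
print(svm.support_[at_bound])       # indices of margin violators
&lt;/code>&lt;/pre>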
&lt;h3 id="try-yourself-1">Try yourself!
&lt;/h3>&lt;p>Can you identify the support vectors in the plot below?&lt;/p>
&lt;figure>
&lt;img src="soft_margin.png" class="clickable-image w-60">
&lt;figcaption>Soft-Margin SVM&lt;/figcaption>
&lt;/figure>
&lt;details>
&lt;summary>✅ Click to reveal the answer&lt;/summary>
&lt;p>There are &lt;strong>6&lt;/strong> support vectors in the plot above. In the soft-margin setting, support vectors are the data points with non-zero Lagrange multipliers $\alpha_i > 0$. These include:&lt;/p>
&lt;ul>
&lt;li>Points lying exactly on the margin boundaries&lt;/li>
&lt;li>Points that are within the margin&lt;/li>
&lt;li>Points that are misclassified (on the wrong side of the decision boundary)&lt;/li>
&lt;/ul>
&lt;p>Only these points influence the position of the decision boundary. Points farther away from the margin have $\alpha_i = 0$ and do not contribute.&lt;/p>
&lt;figure>
&lt;img src="soft_margin_marked.png" class="clickable-image w-60">
&lt;figcaption>Support Vectors in Soft-Margin SVM&lt;/figcaption>
&lt;/figure>
&lt;/details>
&lt;h3 id="code-for-generating-the-plots-1">Code for generating the plots
&lt;/h3>&lt;div class="highlight">&lt;div class="chroma">
&lt;table class="lntable">&lt;tr>&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code>&lt;span class="lnt"> 1
&lt;/span>&lt;span class="lnt"> 2
&lt;/span>&lt;span class="lnt"> 3
&lt;/span>&lt;span class="lnt"> 4
&lt;/span>&lt;span class="lnt"> 5
&lt;/span>&lt;span class="lnt"> 6
&lt;/span>&lt;span class="lnt"> 7
&lt;/span>&lt;span class="lnt"> 8
&lt;/span>&lt;span class="lnt"> 9
&lt;/span>&lt;span class="lnt">10
&lt;/span>&lt;span class="lnt">11
&lt;/span>&lt;span class="lnt">12
&lt;/span>&lt;span class="lnt">13
&lt;/span>&lt;span class="lnt">14
&lt;/span>&lt;span class="lnt">15
&lt;/span>&lt;span class="lnt">16
&lt;/span>&lt;span class="lnt">17
&lt;/span>&lt;span class="lnt">18
&lt;/span>&lt;span class="lnt">19
&lt;/span>&lt;span class="lnt">20
&lt;/span>&lt;span class="lnt">21
&lt;/span>&lt;span class="lnt">22
&lt;/span>&lt;span class="lnt">23
&lt;/span>&lt;span class="lnt">24
&lt;/span>&lt;span class="lnt">25
&lt;/span>&lt;span class="lnt">26
&lt;/span>&lt;span class="lnt">27
&lt;/span>&lt;span class="lnt">28
&lt;/span>&lt;span class="lnt">29
&lt;/span>&lt;span class="lnt">30
&lt;/span>&lt;span class="lnt">31
&lt;/span>&lt;span class="lnt">32
&lt;/span>&lt;span class="lnt">33
&lt;/span>&lt;span class="lnt">34
&lt;/span>&lt;span class="lnt">35
&lt;/span>&lt;span class="lnt">36
&lt;/span>&lt;span class="lnt">37
&lt;/span>&lt;span class="lnt">38
&lt;/span>&lt;span class="lnt">39
&lt;/span>&lt;span class="lnt">40
&lt;/span>&lt;span class="lnt">41
&lt;/span>&lt;span class="lnt">42
&lt;/span>&lt;span class="lnt">43
&lt;/span>&lt;span class="lnt">44
&lt;/span>&lt;span class="lnt">45
&lt;/span>&lt;span class="lnt">46
&lt;/span>&lt;span class="lnt">47
&lt;/span>&lt;span class="lnt">48
&lt;/span>&lt;span class="lnt">49
&lt;/span>&lt;span class="lnt">50
&lt;/span>&lt;span class="lnt">51
&lt;/span>&lt;span class="lnt">52
&lt;/span>&lt;span class="lnt">53
&lt;/span>&lt;span class="lnt">54
&lt;/span>&lt;span class="lnt">55
&lt;/span>&lt;/code>&lt;/pre>&lt;/td>
&lt;td class="lntd">
&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">numpy&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">np&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">import&lt;/span> &lt;span class="nn">matplotlib.pyplot&lt;/span> &lt;span class="k">as&lt;/span> &lt;span class="nn">plt&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">sklearn&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">datasets&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="kn">from&lt;/span> &lt;span class="nn">sklearn.svm&lt;/span> &lt;span class="kn">import&lt;/span> &lt;span class="n">SVC&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Adjusting the dataset to be linearly nonseparable for a soft-margin linear SVM&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">X&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">datasets&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">make_blobs&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">n_samples&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">20&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">centers&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">2&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">random_state&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">cluster_std&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">1.2&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create a soft margin SVM classifier with a linear kernel for the adjusted dataset&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">svm_linear_soft&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">SVC&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">C&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">kernel&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;linear&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># Adjusting C for a softer margin&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">svm_linear_soft&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">fit&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">X&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">y&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot the data&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">figure&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">figsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">8&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">6&lt;/span>&lt;span class="p">))&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">markers&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;o&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;x&amp;#39;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, marker=&amp;#39;o&amp;#39;, edgecolors=&amp;#39;k&amp;#39;)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">marker&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="nb">zip&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">unique&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">y&lt;/span>&lt;span class="p">),&lt;/span> &lt;span class="n">markers&lt;/span>&lt;span class="p">):&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">X&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">y&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">X&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="n">y&lt;/span> &lt;span class="o">==&lt;/span> &lt;span class="n">class_value&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">80&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">marker&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="n">marker&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">label&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s2">&amp;#34;Class &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">class_value&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot the decision boundary&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">gca&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xlim&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_xlim&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ylim&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">get_ylim&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Create grid to evaluate model&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xx&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linspace&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">xlim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">xlim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="mi">30&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">yy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">linspace&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">ylim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">ylim&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="mi">30&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">YY&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">XX&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">meshgrid&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">yy&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">xx&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">xy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">np&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">vstack&lt;/span>&lt;span class="p">([&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ravel&lt;/span>&lt;span class="p">(),&lt;/span> &lt;span class="n">YY&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ravel&lt;/span>&lt;span class="p">()])&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">T&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">Z&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm_linear_soft&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">decision_function&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">xy&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">reshape&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># print the number of support vectors&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">num_support_vectors&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm_linear_soft&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">support_vectors_&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">shape&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="mi">0&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">print&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="sa">f&lt;/span>&lt;span class="s1">&amp;#39;The number of support vectors is &lt;/span>&lt;span class="si">{&lt;/span>&lt;span class="n">num_support_vectors&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s1">.&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Plot decision boundary and margins&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">contour&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">XX&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">YY&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">Z&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">colors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;k&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">levels&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="o">-&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">alpha&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mf">.5&lt;/span>&lt;span class="p">,&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">linestyles&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;-&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="s1">&amp;#39;--&amp;#39;&lt;/span>&lt;span class="p">])&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># Highlight the support vectors&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">sv&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">svm_linear_soft&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">support_vectors_&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">ax&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">scatter&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">sv&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="mi">0&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">sv&lt;/span>&lt;span class="p">[:,&lt;/span> &lt;span class="mi">1&lt;/span>&lt;span class="p">],&lt;/span> &lt;span class="n">s&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">150&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">linewidth&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">facecolors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;none&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">edgecolors&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;red&amp;#39;&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">tick_params&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">axis&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s1">&amp;#39;both&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">labelsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">xlabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Feature 1&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">ylabel&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;Feature 2&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">15&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">legend&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">fontsize&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="mi">12&lt;/span>&lt;span class="p">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="c1"># plt.savefig(&amp;#39;soft_margin_marked.png&amp;#39;, dpi=300)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">plt&lt;/span>&lt;span class="o">.&lt;/span>&lt;span class="n">show&lt;/span>&lt;span class="p">()&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/td>&lt;/tr>&lt;/table>
&lt;/div>
&lt;/div>&lt;hr>
&lt;h2 id="conclusion">Conclusion
&lt;/h2>&lt;p>Support vectors are not just a technical detail in SVMs—they are the essential data points that shape the decision boundary. Whether you’re working with a hard-margin or soft-margin SVM, the concept remains the same:
only the data points with non-zero Lagrange multipliers ($\alpha_i > 0$) influence the final classifier.&lt;/p>
&lt;p>By now, you should be able to look at an SVM plot and confidently pick out the support vectors, understanding exactly why they matter.&lt;/p></description></item><item><title>Support Vector Machines (Part I): What Is a Margin, Really?</title><link>https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/</link><pubDate>Wed, 12 Mar 2025 00:08:20 -0400</pubDate><guid>https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/</guid><description>&lt;img src="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/margin-cover.jpg" alt="Featured image of post Support Vector Machines (Part I): What Is a Margin, Really?" />&lt;h2 id="before-we-get-started">Before we get started
&lt;/h2>&lt;p>SVM is easily &lt;em>my favorite&lt;/em> machine learning algorithm—no &amp;ldquo;one of&amp;rdquo; needed. In fact, I started the whole &lt;strong>Wandering ML&lt;/strong> series because one day it struck me that I had to write something about SVM. It&amp;rsquo;s simple, elegant, and powerful.&lt;/p>
&lt;p>I first heard about SVM during my early college years. At the time, I knew nothing about machine learning and was obsessively into signal processing (you wouldn’t believe how crazy I was about the Fourier transform). Later, after moving to the U.S., I took a machine learning course. The moment I saw the instructor derive the Lagrange dual problem (which I later learned is a common technique in convex optimization), I thought: “Wow, this is so cool.”&lt;/p>
&lt;p>I still remember how fascinated I was when I finally understood the core idea of SVM. That fascination only deepened when I encountered learning theory—thanks to Dr. Vapnik. Hopefully, after reading this SVM series, you’ll share a bit of that excitement too.&lt;/p>
&lt;h2 id="motivation">Motivation
&lt;/h2>&lt;p>Suppose our data is linearly separable. We can draw a line that perfectly separates the two classes. Then we’re done, right?&lt;/p>
&lt;figure>
&lt;img src="many_lines.svg" class="clickable-image w-60">
&lt;figcaption>Fig. 1. Infinite decision boundaries for linearly separable data&lt;/figcaption>
&lt;/figure>
&lt;p>Well, not quite. There are infinitely many lines that can separate the classes without error. As shown in the figures below, all three lines separate the data perfectly. But which one is the best?&lt;/p>
&lt;p>&lt;img src="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad1.png"
width="2700"
height="2100"
srcset="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad1_hu4737866079674126212.png 480w, https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad1_hu1987813729283393096.png 1024w"
loading="lazy"
alt="Decision Boundary (a)"
class="gallery-image"
data-flex-grow="128"
data-flex-basis="308px"
>&lt;img src="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad2.png"
width="2700"
height="2100"
srcset="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad2_hu12563777562775526781.png 480w, https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_bad2_hu10806942748683563302.png 1024w"
loading="lazy"
alt="Decision Boundary (b)"
class="gallery-image"
data-flex-grow="128"
data-flex-basis="308px"
>&lt;img src="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary.png"
width="2700"
height="2100"
srcset="https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_hu3911894267793471473.png 480w, https://blog.sihanwei.org/p/support-vector-machines-part-i-what-is-a-margin-really/decision_boundary_hu4226989026953172112.png 1024w"
loading="lazy"
alt="Decision Boundary (c)"
class="gallery-image"
data-flex-grow="128"
data-flex-basis="308px"
>&lt;/p>
&lt;figure>
&lt;figcaption>Fig. 2. Three different decision boundaries for the same data&lt;/figcaption>
&lt;/figure>
&lt;p>Let’s introduce a new data point—a &amp;ldquo;dangerous point&amp;rdquo; with label -1.&lt;/p>
&lt;figure>
&lt;img src="dangerous_point.svg" class="clickable-image w-60">
&lt;figcaption>Fig. 3. A "dangerous point"&lt;/figcaption>
&lt;/figure>
&lt;p>How do the classifiers behave with this new point?&lt;/p>
&lt;ul>
&lt;li>Classifier (a)? It misclassifies the point.&lt;/li>
&lt;li>Classifier (c)? It gets it right.&lt;/li>
&lt;/ul>
&lt;p>Classifier (c) is more robust to this tricky example. In practice, we want our classifier to behave like (c), avoiding such misclassifications whenever possible. But why is the third classifier more robust? What makes it safer?&lt;/p>
&lt;p>Here’s the key: &lt;strong>the margin&lt;/strong>. The third classifier has a wider margin than the other two. The wider the margin, the more robust the classifier is to noise and outliers.&lt;/p>
&lt;h2 id="what-is-a-margin">What is a margin?
&lt;/h2>&lt;p>Let’s start with a simple analogy. Imagine driving on a two-lane road, one lane per direction. If the road is wide, you feel safe driving in your lane without worrying about the car coming from the opposite direction. But if the road is narrow, you must drive more carefully, keeping a safe distance.&lt;/p>
&lt;figure>
&lt;img src="road.png" class="clickable-image w-60">
&lt;figcaption>Fig. 4. A two-lane road&lt;/figcaption>
&lt;/figure>
&lt;p>We feel safer on wider roads, and classifiers feel the same. A margin acts like a &lt;em>buffer zone&lt;/em>. The wider the margin, the more robust the classifier becomes to noise and outliers.&lt;/p>
&lt;p>Let’s formalize this idea.&lt;/p>
&lt;p>In SVMs, the margin is defined as the &lt;strong>minimum distance&lt;/strong> from the decision boundary to all the training data. In other words, it quantifies the “space” that separates the two classes.&lt;/p>
&lt;figure>
&lt;img src="svm_margin.svg" class="clickable-image w-60">
&lt;figcaption>Fig. 5. Max-margin classifier&lt;/figcaption>
&lt;/figure>
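&lt;p>To make this definition concrete, here is a minimal numerical sketch (the dataset and variable names are illustrative). It computes the margin of a trained linear SVM as the smallest distance from any training point to the boundary, using the point-to-hyperplane distance formula derived in the next section.&lt;/p>
&lt;pre tabindex="0">&lt;code class="language-python" data-lang="python">import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y01 = make_blobs(n_samples=20, centers=2, random_state=6, cluster_std=1.2)
y = 2 * y01 - 1                      # relabel the classes as -1 / +1

svm = SVC(C=1e6, kernel=&amp;#39;linear&amp;#39;).fit(X, y)  # large C: effectively hard margin
w, b = svm.coef_[0], svm.intercept_[0]

# Margin = minimum signed distance from the training points to the boundary
distances = y * (X @ w + b) / np.linalg.norm(w)
print(distances.min())
&lt;/code>&lt;/pre>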
&lt;h2 id="size-of-the-margin">Size of the margin
&lt;/h2>&lt;p>Now that we know what a margin is, let’s see how to compute its size mathematically.&lt;/p>
&lt;p>By definition, it is the distance from the decision boundary to the closest data point. Sound familiar? It’s the classic high-school formula for the distance from a point to a hyperplane.&lt;/p>
&lt;p>The decision boundary is defined as&lt;/p>
$$
\mathbf{w}^\top \mathbf{x} + b = 0
$$&lt;p>What is the distance from a point to a line? Let&amp;rsquo;s say we have an arbitrary data point $(\mathbf{x}, y)$ and a line defined by the equation $\mathbf{w}^\top \mathbf{x} + b = 0$. $\mathbf{x_p}$ is the projection of $\mathbf{x}$ on the line. $\mathbf{w}$ is the normal vector of the line.&lt;/p>
&lt;figure>
&lt;img src="distance.svg" style="width: 40%;">
&lt;figcaption>Fig. 6. Distance of a point to a line.&lt;/figcaption>
&lt;/figure>
&lt;p>We consider the vector between $\mathbf{x}$ and its projection $\mathbf{x_p}$. Then the distance $d$ from the point to the line is given by:&lt;/p>
$$
d\cdot \frac{\mathbf{w}}{\|\mathbf{w}\|} = y\cdot(\mathbf{x} - \mathbf{x_p})
$$&lt;p>Think about it: why is the label $y$ here?&lt;/p>
&lt;details>
&lt;summary>✅ Click to reveal the answer&lt;/summary>
$y$ is the label of the point $\mathbf{x}$. If $\mathbf{x}$ is a positive point, then $y=1$. If $\mathbf{x}$ is a negative point, then $y=-1$. Since the normal vector $\mathbf{w}$ is pointing in the direction of the positive class, we multiply $(\mathbf{x} - \mathbf{x_p})$ by $y$ to ensure that the distance is positive. This way, we can always get a positive distance regardless of the class of the point $\mathbf{x}$.
&lt;/details>
&lt;p>Multiplying both sides by $\mathbf{w}^\top$ (and adding and subtracting $b$ on the right-hand side), we have:
&lt;/p>
$$
d\cdot \frac{\mathbf{w}^\top\mathbf{w}}{\|\mathbf{w}\|} = y\cdot((\mathbf{w}^\top\mathbf{x}+b) - (\mathbf{w}^\top\mathbf{x_p}+b))
$$&lt;p>Since $\mathbf{x_p}$ is on the line, we have:
&lt;/p>
$$
\mathbf{w}^\top \mathbf{x_p} + b = 0
$$&lt;p>
Hence, we can rewrite the above equation as:
&lt;/p>
$$
y\cdot(\mathbf{w}^\top \mathbf{x} + b) = d\cdot \|\mathbf{w}\|
$$&lt;p>
Then we have:
&lt;/p>
$$
d = y\cdot\frac{\mathbf{w}^\top \mathbf{x} + b}{\|\mathbf{w}\|}
$$&lt;p>Since the margin is defined as &lt;strong>the minimum distance from all training data to the decision boundary&lt;/strong>, we can write:&lt;/p>
$$
\text{margin} = \min_{i} \left( y_i\cdot\frac{\mathbf{w}^\top \mathbf{x_i} + b}{\|\mathbf{w}\|} \right)
$$&lt;p>Then the problem of maximizing the margin or the distance from all training data to the decision boundary can be formulated as:
&lt;/p>
$$
\max_{\mathbf{w}, b} \min_{i} \left( y_i\cdot\frac{\mathbf{w}^\top \mathbf{x_i} + b}{\|\mathbf{w}\|} \right)
$$&lt;p>Notice that we can always rescale $\mathbf{w}$ and the bias $b$ by the same positive constant without changing the decision boundary. We can therefore fix the scale so that $\min_i y_i\cdot(\mathbf{w}^\top \mathbf{x_i} + b)=1$. With this normalization, the margin is exactly $\frac{1}{\|\mathbf{w}\|}$.&lt;/p>
&lt;p>This leads to the following optimization problem:
&lt;/p>
$$
\max_{\mathbf{w}, b} \frac{1}{\|\mathbf{w}\|} \quad \text{s.t.} \quad \forall i,y_i\cdot(\mathbf{w}^\top \mathbf{x_i} + b) \geq 1
$$&lt;h2 id="conclusion">Conclusion
&lt;/h2>&lt;p>In this post, we introduced the concept of margin in SVMs. We also saw how to calculate the margin and why maximizing it leads to more robust classifiers.&lt;/p>
&lt;p>In the next post, we will discuss how to find the maximum margin classifier and how to solve the optimization problem using Lagrange multipliers. We will also handle the case of non-linearly separable data.&lt;/p>
&lt;blockquote>
&lt;p>Code and plots are available in the &lt;a class="link" href="https://github.com/RaphelWei/blog-codebase/tree/main/ml-notes/svm-series" target="_blank" rel="noopener"
>GitHub repository&lt;/a>&lt;/p>
&lt;/blockquote></description></item><item><title>Math Tricks for Machine Learning (Part I): Concentration Inequality</title><link>https://blog.sihanwei.org/p/math-tricks-for-machine-learning-part-i-concentration-inequality/</link><pubDate>Sun, 02 Mar 2025 06:52:04 -0500</pubDate><guid>https://blog.sihanwei.org/p/math-tricks-for-machine-learning-part-i-concentration-inequality/</guid><description>&lt;img src="https://blog.sihanwei.org/p/math-tricks-for-machine-learning-part-i-concentration-inequality/ci-cover.jpg" alt="Featured image of post Math Tricks for Machine Learning (Part I): Concentration Inequality" />&lt;h2 id="introduction">Introduction
&lt;/h2>&lt;p>Over the past few years, I’ve passionately studied various machine learning and statistical concepts. One thing I’ve learned is that many research papers rely on clever mathematical “tricks”—techniques that are used so routinely they often go unexplained. In this series, I plan to catalog these tricks to help demystify the math behind modern ML.&lt;/p>
&lt;p>In this first installment, we’ll focus on concentration inequalities, a key tool for understanding how random variables behave. Whether you’re analyzing generalization bounds or just trying to get a grip on how data “concentrates” around its mean, these inequalities provide a rigorous way to quantify uncertainty.&lt;/p>
&lt;h3 id="what-are-concentration-inequalities">What Are Concentration Inequalities?
&lt;/h3>&lt;p>Concentration inequalities provide bounds on the probability that a random variable deviates from some central value (often its expected value). In simpler terms, they tell us how “concentrated” a random variable is around its mean.&lt;/p>
&lt;p>For example, if you compute the average of a large number of independent samples, a concentration inequality can help you answer: How likely is it that the average is far from the true mean? This is crucial for ensuring that what we observe empirically (on our training set, say) is representative of the underlying data distribution.&lt;/p>
&lt;h3 id="why-they-matter-in-machine-learning">Why They Matter in Machine Learning
&lt;/h3>&lt;p>In machine learning, concentration inequalities are the backbone of many generalization guarantees. They help us:&lt;/p>
&lt;ul>
&lt;li>Quantify the reliability of empirical estimates: For instance, ensuring that the training error is close to the true error.&lt;/li>
&lt;li>Derive performance bounds: Many algorithms’ guarantees hinge on these inequalities.&lt;/li>
&lt;li>Analyze convergence: When using stochastic optimization methods, concentration inequalities can show how fast our estimates converge to their true values.&lt;/li>
&lt;/ul>
&lt;h3 id="key-examples-of-concentration-inequalities">Key Examples of Concentration Inequalities
&lt;/h3>&lt;p>Here are some of the most common concentration inequalities that you might encounter in ML literature:&lt;/p>
&lt;ul>
&lt;li>Hoeffding’s Inequality: Provides a bound for the sum of bounded independent random variables.&lt;/li>
&lt;li>McDiarmid’s Inequality: Useful when the function of independent random variables does not change too much when any single variable is altered.&lt;/li>
&lt;li>Chebyshev’s Inequality: Offers a more general (though often looser) bound using the variance of the random variable.&lt;/li>
&lt;li>Chernoff Bounds: Provide exponentially decreasing bounds on tail distributions of sums of independent random variables.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="a-closer-look-hoeffdings-inequality">A Closer Look: Hoeffding’s Inequality
&lt;/h2>&lt;p>To illustrate the concept, consider Hoeffding’s inequality. Suppose you have independent random variables \(X_1, X_2, \dots, X_n\) that are bounded (say, each \(X_i \in [a_i, b_i]\)). Define the empirical average:&lt;/p>
\[
\frac{1}{n} \sum_{i=1}^n X_i
\]&lt;p>Hoeffding’s inequality gives us a bound on how far this average can deviate from its expected value. Specifically, for any \(t > 0\):&lt;/p>
\[
\Pr\left( \left| \frac{1}{n} \sum_{i=1}^n X_i - \mathbb{E}\left[\frac{1}{n} \sum_{i=1}^n X_i\right] \right| \ge t \right) \le 2\exp\left(\frac{-2n^2 t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)
\]&lt;p>In plain terms: the more samples you have, the tighter the concentration around the true mean. The probability of a large deviation shrinks exponentially fast in the number of samples \(n\).&lt;/p>
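&lt;p>As a concrete (made-up) instance: with \( n = 100 \) variables each bounded in \([0, 2]\) and \( t = 0.2 \), the bound evaluates to&lt;/p>
\[
2\exp\left(\frac{-2 \cdot 100^2 \cdot 0.2^2}{100 \cdot 2^2}\right) = 2e^{-2} \approx 0.27,
\]&lt;p>so a deviation of \( 0.2 \) or more occurs in at most roughly a quarter of realizations, and this probability drops rapidly as \( n \) grows.&lt;/p>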
&lt;h3 id="common-variants-in-ml">Common Variants in ML
&lt;/h3>&lt;p>In machine learning, our data is often assumed to be i.i.d. and bounded in \([0, 1]\). In this case, Hoeffding’s inequality simplifies to:&lt;/p>
\[
\Pr\left( \left| \frac{1}{n} \sum_{i=1}^n X_i - \mathbb{E}[X_i] \right| \ge t \right)
\le 2 \exp(-2nt^2)
\]&lt;p>This is commonly used when bounding the difference between empirical risk and true risk.&lt;/p>
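&lt;p>A quick simulation makes this tangible. The sketch below (parameter values are arbitrary choices for illustration) draws Bernoulli(0.5) samples, which are bounded in \([0, 1]\), and compares the empirical deviation frequency with the bound \( 2\exp(-2nt^2) \):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(0)
n, t, trials = 200, 0.1, 100_000

# Sample means of n i.i.d. Bernoulli(0.5) variables, repeated many times.
means = rng.binomial(n, 0.5, size=trials) / n

# Fraction of trials where the mean deviates from 0.5 by at least t,
# versus the two-sided Hoeffding bound.
empirical = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * n * t**2)
print(empirical, bound)  # the empirical frequency stays below the bound
&lt;/code>&lt;/pre>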
&lt;h3 id="one-sided-version">One-Sided Version
&lt;/h3>&lt;p>If you only care about the upper (or lower) tail—for example, bounding overestimation of the mean—you can drop the absolute value:&lt;/p>
\[
\Pr\left( \frac{1}{n} \sum_{i=1}^n X_i - \mathbb{E}[X_i] \ge t \right)
\le \exp(-2nt^2)
\]&lt;p>This is especially handy when applying a union bound across multiple events.&lt;/p>
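&lt;p>For example (a standard calculation, not specific to any particular setting): if we need the one-sided bound to hold simultaneously for \( K \) empirical estimates, the union bound gives&lt;/p>
\[
\Pr\left( \exists k : \frac{1}{n} \sum_{i=1}^n X_i^{(k)} - \mathbb{E}[X_i^{(k)}] \ge t \right) \le K \exp(-2nt^2),
\]&lt;p>which is at most \( \delta \) once \( t \ge \sqrt{\ln(K/\delta) / (2n)} \).&lt;/p>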
&lt;hr>
&lt;h2 id="a-closer-look-mcdiarmids-inequality">A Closer Look: McDiarmid’s Inequality
&lt;/h2>&lt;p>McDiarmid’s inequality is a powerful concentration result that applies to functions of independent random variables—especially when the function doesn&amp;rsquo;t change too much if a single variable is altered. It is sometimes referred to as the &lt;strong>bounded difference inequality&lt;/strong>.&lt;/p>
&lt;h3 id="setup">Setup
&lt;/h3>&lt;p>Let \( X_1, X_2, \dots, X_n \) be independent random variables taking values in arbitrary spaces. Suppose we have a function&lt;br>
&lt;/p>
\[
f : \mathcal{X}_1 \times \dots \times \mathcal{X}_n \to \mathbb{R}
\]&lt;p>&lt;br>
such that changing any one coordinate \( X_i \) (while keeping the others fixed) changes the value of \( f \) by at most \( c_i \). Formally, for all \( i \in \{1, \dots, n\} \):&lt;/p>
\[
\sup_{x_1,\dots,x_n,\,x_i'} \left| f(x_1, \dots, x_i, \dots, x_n) - f(x_1, \dots, x_i', \dots, x_n) \right| \le c_i
\]&lt;p>Then for any \( t > 0 \):&lt;/p>
\[
\Pr\left( f(X_1, \dots, X_n) - \mathbb{E}[f(X_1, \dots, X_n)] \ge t \right)
\le \exp\left( \frac{-2t^2}{\sum_{i=1}^n c_i^2} \right)
\]&lt;p>There is also a &lt;strong>two-sided version&lt;/strong>:&lt;/p>
\[
\Pr\left( \left| f(X_1, \dots, X_n) - \mathbb{E}[f(X_1, \dots, X_n)] \right| \ge t \right)
\le 2\exp\left( \frac{-2t^2}{\sum_{i=1}^n c_i^2} \right)
\]&lt;hr>
&lt;h3 id="why-it-matters-in-ml">Why It Matters in ML
&lt;/h3>&lt;p>McDiarmid’s inequality is especially useful in situations where we evaluate some function over a dataset, like the &lt;strong>empirical risk&lt;/strong>, and want to show that it concentrates around its expected value.&lt;/p>
&lt;p>Unlike Hoeffding’s inequality, which applies to sums of random variables, McDiarmid applies to more general functions—as long as &lt;strong>no single variable has too much influence&lt;/strong>. This makes it highly suitable for:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Stability analysis of algorithms&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Generalization bounds&lt;/strong> when empirical loss functions change only slightly with a single data point&lt;/li>
&lt;li>&lt;strong>Complex random processes&lt;/strong>, such as Rademacher complexities or covering number arguments&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h3 id="example">Example
&lt;/h3>&lt;p>Let’s say \( f \) is the empirical risk over a dataset of \( n \) samples:&lt;/p>
\[
f(X_1, \dots, X_n) = \frac{1}{n} \sum_{i=1}^n \ell(X_i)
\]&lt;p>If the loss function \( \ell \) is bounded in \([0, 1]\), then changing one data point changes the empirical risk by at most \( \frac{1}{n} \). So \( c_i = \frac{1}{n} \), and:&lt;/p>
\[
\sum_{i=1}^n c_i^2 = n \cdot \left(\frac{1}{n}\right)^2 = \frac{1}{n}
\]&lt;p>Plugging this into McDiarmid’s inequality gives:&lt;/p>
\[
\Pr\left( f(X_1, \dots, X_n) - \mathbb{E}[f] \ge t \right)
\le \exp(-2nt^2)
\]&lt;p>— which is exactly the same bound as Hoeffding’s inequality for i.i.d. bounded random variables, but derived in a more general framework.&lt;/p></description></item><item><title>Regret Analysis of FTRL and OMD Algorithms</title><link>https://blog.sihanwei.org/p/regret-analysis-of-ftrl-and-omd-algorithms/</link><pubDate>Fri, 18 Oct 2024 22:54:24 -0400</pubDate><guid>https://blog.sihanwei.org/p/regret-analysis-of-ftrl-and-omd-algorithms/</guid><description>&lt;img src="https://blog.sihanwei.org/p/regret-analysis-of-ftrl-and-omd-algorithms/omd-cover.jpg" alt="Featured image of post Regret Analysis of FTRL and OMD Algorithms" />&lt;h1 id="regret-analysis-of-ftrl-and-omd-algorithms">Regret Analysis of FTRL and OMD Algorithms
&lt;/h1>&lt;h2 id="introduction">Introduction
&lt;/h2>&lt;p>In this note, we&amp;rsquo;ll explore the regret analysis of both the &lt;strong>Follow-The-Regularized-Leader (FTRL)&lt;/strong> algorithm and the &lt;strong>Online Mirror Descent (OMD)&lt;/strong> algorithm. We&amp;rsquo;ll highlight their similarities and differences, and demonstrate how, under certain conditions, they are essentially equivalent. This analysis includes detailed derivations and mathematical expressions.&lt;/p>
&lt;h2 id="follow-the-regularized-leader-ftrl">Follow-The-Regularized-Leader (FTRL)
&lt;/h2>&lt;h3 id="problem-setup">Problem Setup
&lt;/h3>&lt;p>Consider an online convex optimization problem over \( T \) rounds. At each round \( t \):&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Decision Making&lt;/strong>: The learner selects \( \mathbf{x}_t \in \mathcal{X} \subseteq \mathbb{R}^n \).&lt;/li>
&lt;li>&lt;strong>Loss Revealing&lt;/strong>: An adversary reveals a convex loss function \( f_t : \mathcal{X} \rightarrow \mathbb{R} \).&lt;/li>
&lt;li>&lt;strong>Loss Incurred&lt;/strong>: The learner incurs loss \( f_t(\mathbf{x}_t) \).&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Goal&lt;/strong>: Minimize the cumulative &lt;strong>regret&lt;/strong>:&lt;/p>
\[
\text{Regret}_T = \sum_{t=1}^T f_t(\mathbf{x}_t) - \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^T f_t(\mathbf{x}).
\]&lt;h3 id="ftrl-algorithm">FTRL Algorithm
&lt;/h3>&lt;p>At each round \( t \), the FTRL algorithm updates the decision by solving:&lt;/p>
\[
\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{X}} \left\{ \eta \sum_{s=1}^{t-1} f_s(\mathbf{x}) + R(\mathbf{x}) \right\},
\]&lt;p>where:&lt;/p>
&lt;ul>
&lt;li>\( \eta > 0 \) is the learning rate.&lt;/li>
&lt;li>\( R : \mathcal{X} \rightarrow \mathbb{R} \) is a strongly convex regularization function.&lt;/li>
&lt;/ul>
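&lt;p>As a minimal illustration (an assumption-laden special case, not the general algorithm): with linearized losses \( f_s(\mathbf{x}) = \langle \mathbf{g}_s, \mathbf{x} \rangle \), the quadratic regularizer \( R(\mathbf{x}) = \frac{1}{2}\|\mathbf{x}\|^2 \), and an unconstrained domain, the argmin has the closed form \( \mathbf{x}_t = -\eta \sum_{s=1}^{t-1} \mathbf{g}_s \):&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def ftrl_iterates(gradients, eta):
    """FTRL with linearized losses and R(x) = 0.5 * ||x||^2 on an
    unconstrained domain; the update reduces to the closed form
    x_t = -eta * (g_1 + ... + g_{t-1})."""
    cumulative = np.zeros_like(gradients[0])
    iterates = [cumulative.copy()]        # x_1 = argmin R = 0
    for g in gradients:
        cumulative = cumulative + g
        iterates.append(-eta * cumulative)
    return iterates

# Hypothetical gradient sequence, e.g. from linear losses.
gs = [np.array([1.0, -2.0]), np.array([0.5, 0.5]), np.array([-1.0, 1.0])]
print(ftrl_iterates(gs, eta=0.1))
&lt;/code>&lt;/pre>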
&lt;h3 id="regret-analysis">Regret Analysis
&lt;/h3>&lt;p>&lt;strong>Assumptions&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Convexity&lt;/strong>: Each loss function \( f_t \) is convex.&lt;/li>
&lt;li>&lt;strong>Lipschitz Continuity&lt;/strong>: The subgradients are bounded: \( \| \nabla f_t(\mathbf{x}) \|_* \leq G \) for all \( \mathbf{x} \in \mathcal{X} \).&lt;/li>
&lt;li>&lt;strong>Strong Convexity&lt;/strong>: The regularizer \( R \) is \( \lambda \)-strongly convex with respect to a norm \( \| \cdot \| \).&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Key Steps&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>One-Step Regret Bound&lt;/strong>&lt;/p>
&lt;p>Using the convexity of \( f_t \):&lt;/p>
\[
f_t(\mathbf{x}_t) - f_t(\mathbf{x}^*) \leq \langle \nabla f_t(\mathbf{x}_t), \mathbf{x}_t - \mathbf{x}^* \rangle,
\]&lt;p>where \( \mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^T f_t(\mathbf{x}) \).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Regret Decomposition&lt;/strong>&lt;/p>
&lt;p>Summing over \( t \):&lt;/p>
\[
\text{Regret}_T \leq \sum_{t=1}^T \langle \nabla f_t(\mathbf{x}_t), \mathbf{x}_t - \mathbf{x}^* \rangle.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bounding the Inner Product&lt;/strong>&lt;/p>
&lt;p>Using the properties of the regularizer and the FTRL updates, we can relate the sum to the Bregman divergence \( D_R \):&lt;/p>
\[
\sum_{t=1}^T \langle \nabla f_t(\mathbf{x}_t), \mathbf{x}_t - \mathbf{x}^* \rangle \leq \frac{R(\mathbf{x}^*) - R(\mathbf{x}_1)}{\eta} + \frac{\eta}{2 \lambda} \sum_{t=1}^T \| \nabla f_t(\mathbf{x}_t) \|_*^2 \leq \frac{R(\mathbf{x}^*) - R(\mathbf{x}_1)}{\eta} + \frac{\eta G^2 T}{2 \lambda}.
\]&lt;p>The first term is the cost of the regularization; the second is a stability term controlled by the strong convexity of \( R \). &lt;strong>Bregman Divergence Definition&lt;/strong>:&lt;/p>
\[
D_R(\mathbf{x}, \mathbf{y}) = R(\mathbf{x}) - R(\mathbf{y}) - \langle \nabla R(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Regret Bound&lt;/strong>&lt;/p>
&lt;p>Therefore, the total regret is bounded by:&lt;/p>
\[
\text{Regret}_T \leq \frac{R(\mathbf{x}^*) - R(\mathbf{x}_1)}{\eta} + \frac{\eta G^2 T}{2 \lambda}.
\]&lt;p>By choosing \( \eta \) appropriately (e.g., \( \eta = \sqrt{\dfrac{2 \lambda [R(\mathbf{x}^*) - R(\mathbf{x}_1)]}{G^2 T}} \)), we can achieve a regret bound of:&lt;/p>
\[
\text{Regret}_T \leq G \sqrt{\dfrac{2 [R(\mathbf{x}^*) - R(\mathbf{x}_1)] T}{\lambda}}.
\]&lt;/li>
&lt;/ol>
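&lt;p>As a sanity check on the last step: a bound of the form \( A/\eta + B\eta \) is minimized at \( \eta = \sqrt{A/B} \), where it equals \( 2\sqrt{AB} \). Plugging in \( A = R(\mathbf{x}^*) - R(\mathbf{x}_1) \) and \( B = \dfrac{G^2 T}{2\lambda} \) recovers both the learning rate and the final regret bound above.&lt;/p>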
&lt;h2 id="online-mirror-descent-omd">Online Mirror Descent (OMD)
&lt;/h2>&lt;h3 id="algorithm-steps">Algorithm Steps
&lt;/h3>&lt;ol>
&lt;li>
&lt;p>&lt;strong>Initialization&lt;/strong>: Choose an initial point \( \mathbf{x}_1 \in \mathcal{X} \).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>For each round \( t = 1, \dots, T \)&lt;/strong>:&lt;/p>
&lt;p>a. &lt;strong>Compute Subgradient&lt;/strong>:&lt;/p>
\[
\mathbf{g}_t = \nabla f_t(\mathbf{x}_t).
\]&lt;p>b. &lt;strong>Dual Space Update&lt;/strong>:&lt;/p>
\[
\mathbf{z}_{t+1} = \mathbf{z}_t - \eta \mathbf{g}_t,
\]&lt;p>where \( \mathbf{z}_t = \nabla \psi(\mathbf{x}_t) \).&lt;/p>
&lt;p>c. &lt;strong>Primal Space Update&lt;/strong>:&lt;/p>
\[
\mathbf{x}_{t+1} = \nabla \psi^*(\mathbf{z}_{t+1}),
\]&lt;p>with \( \psi^* \) being the convex conjugate of \( \psi \).&lt;/p>
&lt;/li>
&lt;/ol>
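&lt;p>As one concrete instantiation (a sketch under assumptions, not the only choice of mirror map): taking the negative-entropy mirror map \( \psi(\mathbf{x}) = \sum_i x_i \log x_i \) on the probability simplex turns the dual and primal updates into a simple multiplicative rule, often called exponentiated gradient:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

def omd_negative_entropy(gradients, eta, dim):
    """OMD with the negative-entropy mirror map on the probability simplex.
    The dual step z = z - eta * g followed by the primal map through the
    conjugate reduces to a multiplicative update plus normalization."""
    x = np.full(dim, 1.0 / dim)          # uniform starting point
    iterates = [x]
    for g in gradients:
        x = x * np.exp(-eta * g)         # dual-space gradient step
        x = x / x.sum()                  # map back onto the simplex
        iterates.append(x)
    return iterates

# Hypothetical loss gradients over a 3-dimensional simplex.
gs = [np.array([1.0, 0.0, -1.0]), np.array([0.2, -0.4, 0.2])]
print(omd_negative_entropy(gs, eta=0.5, dim=3))
&lt;/code>&lt;/pre>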
&lt;h3 id="regret-analysis-1">Regret Analysis
&lt;/h3>&lt;p>&lt;strong>Assumptions&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Convexity&lt;/strong>: Each \( f_t \) is convex.&lt;/li>
&lt;li>&lt;strong>Lipschitz Continuity&lt;/strong>: Subgradients are bounded: \( \| \mathbf{g}_t \|_* \leq G \).&lt;/li>
&lt;li>&lt;strong>Strong Convexity&lt;/strong>: The mirror map \( \psi \) is \( \lambda \)-strongly convex.&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Key Steps&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Regret Decomposition&lt;/strong>&lt;/p>
&lt;p>The regret can be bounded by:&lt;/p>
\[
\text{Regret}_T \leq \sum_{t=1}^T \langle \mathbf{g}_t, \mathbf{x}_t - \mathbf{x}^* \rangle.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Using Mirror Descent Updates&lt;/strong>&lt;/p>
&lt;p>Utilizing the properties of the Bregman divergence \( D_\psi \) and the mirror descent updates:&lt;/p>
\[
\sum_{t=1}^T \langle \mathbf{g}_t, \mathbf{x}_t - \mathbf{x}^* \rangle = \frac{1}{\eta} \left[ D_\psi(\mathbf{x}^*, \mathbf{x}_1) - D_\psi(\mathbf{x}^*, \mathbf{x}_{T+1}) + \sum_{t=1}^T D_\psi(\mathbf{x}_{t+1}, \mathbf{x}_t) \right].
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bounding the Bregman Divergences&lt;/strong>&lt;/p>
&lt;p>Since \( D_\psi(\mathbf{x}^*, \mathbf{x}_{T+1}) \geq 0 \) and \( D_\psi(\mathbf{x}_{t+1}, \mathbf{x}_t) \leq \dfrac{\eta^2 G^2}{2 \lambda} \), we have:&lt;/p>
\[
\text{Regret}_T \leq \frac{D_\psi(\mathbf{x}^*, \mathbf{x}_1)}{\eta} + \frac{\eta G^2 T}{2 \lambda}.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Optimizing the Learning Rate&lt;/strong>&lt;/p>
&lt;p>Choosing:&lt;/p>
\[
\eta = \sqrt{\dfrac{2 \lambda D_\psi(\mathbf{x}^*, \mathbf{x}_1)}{G^2 T}},
\]&lt;p>yields the regret bound:&lt;/p>
\[
\text{Regret}_T \leq G \sqrt{\dfrac{2 D_\psi(\mathbf{x}^*, \mathbf{x}_1) T}{\lambda}}.
\]&lt;/li>
&lt;/ol>
&lt;h2 id="equivalence-of-ftrl-and-omd">Equivalence of FTRL and OMD
&lt;/h2>&lt;p>Under certain conditions, FTRL and OMD are equivalent algorithms.&lt;/p>
&lt;h3 id="conditions-for-equivalence">Conditions for Equivalence
&lt;/h3>&lt;ul>
&lt;li>&lt;strong>Matching Regularizers and Mirror Maps&lt;/strong>: If the regularizer \( R \) in FTRL is identical to the mirror map \( \psi \) in OMD.&lt;/li>
&lt;li>&lt;strong>Unconstrained Domain&lt;/strong>: When the feasible set \( \mathcal{X} \) is the entire space \( \mathbb{R}^n \).&lt;/li>
&lt;/ul>
&lt;h3 id="demonstration-of-equivalence">Demonstration of Equivalence
&lt;/h3>&lt;ol>
&lt;li>
&lt;p>&lt;strong>FTRL Update in Terms of Gradients&lt;/strong>&lt;/p>
&lt;p>The FTRL update can be expressed as:&lt;/p>
\[
\mathbf{x}_t = \arg\min_{\mathbf{x} \in \mathcal{X}} \left\{ \left\langle \eta \sum_{s=1}^{t-1} \mathbf{g}_s, \mathbf{x} \right\rangle + R(\mathbf{x}) \right\}.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Relation to Dual Variables in OMD&lt;/strong>&lt;/p>
&lt;p>In OMD, the dual variable \( \mathbf{z}_t \) is:&lt;/p>
\[
\mathbf{z}_t = \nabla \psi(\mathbf{x}_t) = \mathbf{z}_1 - \eta \sum_{s=1}^{t-1} \mathbf{g}_s.
\]&lt;/li>
&lt;li>
&lt;p>&lt;strong>Primal Update via Convex Conjugate&lt;/strong>&lt;/p>
&lt;p>The FTRL update becomes:&lt;/p>
\[
\mathbf{x}_t = \nabla R^*\left( -\eta \sum_{s=1}^{t-1} \mathbf{g}_s \right),
\]&lt;p>which matches the OMD update when \( R = \psi \) and \( \mathbf{x}_1 \) minimizes \( R \) (so that \( \nabla \psi(\mathbf{x}_1) = \mathbf{0} \)):&lt;/p>
\[
\mathbf{x}_t = \nabla \psi^*\left( \nabla \psi(\mathbf{x}_1) - \eta \sum_{s=1}^{t-1} \mathbf{g}_s \right).
\]&lt;/li>
&lt;/ol>
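&lt;p>A tiny numerical check (illustrative only; quadratic \( R = \psi \), unconstrained domain, \( \mathbf{x}_1 = \mathbf{0} \)) confirms that the two update rules trace out the same iterates:&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(1)
eta, T, dim = 0.1, 50, 3
gradients = [rng.normal(size=dim) for _ in range(T)]

# With R(x) = psi(x) = 0.5 * ||x||^2, grad psi is the identity map, so the
# OMD dual step acts directly on the primal iterate, and the FTRL argmin
# has the closed form -eta * (sum of past gradients).
x_omd = np.zeros(dim)
cumulative = np.zeros(dim)
for g in gradients:
    cumulative += g
    x_ftrl = -eta * cumulative      # FTRL closed-form update
    x_omd = x_omd - eta * g         # OMD update with identity mirror map
    assert np.allclose(x_ftrl, x_omd)
print("FTRL and OMD iterates coincide")
&lt;/code>&lt;/pre>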
&lt;h3 id="conclusion">Conclusion
&lt;/h3>&lt;p>By aligning the regularization function in FTRL with the mirror map in OMD and considering the unconstrained domain, the updates of both algorithms coincide. This demonstrates that FTRL and OMD are essentially equivalent under these conditions, offering different perspectives on the same optimization process.&lt;/p></description></item><item><title>About</title><link>https://blog.sihanwei.org/about/</link><pubDate>Sun, 28 Jun 2020 08:31:57 -0500</pubDate><guid>https://blog.sihanwei.org/about/</guid><description>&lt;h2 id="about-me">About Me
&lt;/h2>&lt;p>Hi, I’m Sihan Wei &amp;mdash; a learner who documents the path, and lights it up for others.&lt;/p>
&lt;p>I write (and think) about machine learning theory, optimization, and the occasional abstract rabbit hole.&lt;/p>
&lt;p>This blog is a space for slow thoughts: the kind that start with a proof, wander through patterns, and land somewhere in probability.&lt;/p>
&lt;p>I mostly write in English, but every now and then you’ll find me posting something fun in &lt;a class="link" href="https://blog.sihanwei.org/zh-cn/" target="_blank" rel="noopener"
>Chinese&lt;/a> &amp;mdash; it’s my mother tongue, and sometimes it just captures the feeling better.&lt;/p>
&lt;p>For academic stuff: &lt;a class="link" href="https://sihanwei.org" target="_blank" rel="noopener"
>Check out my research homepage&lt;/a>.&lt;/p>
&lt;p>Hope you enjoy hanging out here on my blog!&lt;/p>
&lt;hr>
&lt;h2 id="about-this-blog">About this blog
&lt;/h2>&lt;p>I started this blog because I once didn’t understand — and now that I do, I want to help others get there faster. This is my way of passing the torch.&lt;/p>
&lt;p>This blog has three main flavors:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Research notes&lt;/strong> — mostly for myself. Stuff I’m thinking about, half-finished ideas, little technical rabbit holes.&lt;br>
If you’re into optimization or ML theory, cool — you might find a gem (or at least a weird equation) here and there.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>ML notes&lt;/strong> — for anyone trying to make sense of machine learning.&lt;br>
I write these when I finally understand something I’ve been stuck on — hoping it saves someone else a bit of headache.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Learning log&lt;/strong> — notes from when I’m learning something outside my usual lane.&lt;br>
I try to capture not just what I’ve learned, but how I got there — the questions, the patterns, the mental clicks along the way.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Basically: I write to figure things out. And sometimes, I hit “publish” in case it helps someone else too.&lt;/p>
&lt;hr>
&lt;p>&lt;strong>Proofs&lt;/strong> — because structure matters.&lt;br>
&lt;strong>Patterns&lt;/strong> — because abstraction connects everything.&lt;br>
&lt;strong>Probabilities&lt;/strong> — because uncertainty is part of all learning, and all living.&lt;/p>
&lt;p>This is the lens I bring to research — and sometimes, to writing too.&lt;/p>
&lt;hr>
&lt;h2 id="my-name">My Name
&lt;/h2>&lt;p>My Chinese name is &lt;strong>思涵&lt;/strong>, pronounced Sī Hán in Mandarin. (&lt;a class="link" href="https://translate.google.com/?sl=auto&amp;amp;tl=en&amp;amp;text=%E6%80%9D%E6%B6%B5&amp;amp;op=translate" target="_blank" rel="noopener"
>Hear it here via Google Translate&lt;/a>)&lt;/p>
&lt;p>It was chosen by my mom, and it means a lot to both of us.&lt;/p>
&lt;p>&lt;a class="link" href="https://en.wiktionary.org/wiki/%E6%80%9D" target="_blank" rel="noopener"
>&amp;ldquo;思&amp;rdquo;&lt;/a> means “to think” or “thought,” and &lt;a class="link" href="https://en.wiktionary.org/wiki/%E6%B6%B5" target="_blank" rel="noopener"
>&amp;ldquo;涵&amp;rdquo;&lt;/a> means “to forgive,” “to tolerate,” or “to be lenient.”&lt;/p>
&lt;p>My mom once told me she had lived through a lot of anger and intolerance, and she hoped I’d grow into someone who thinks before speaking, and meets the world with calm and grace.&lt;/p>
&lt;p>I still think about that often. And I hope to live up to the name.&lt;/p>
&lt;hr>
&lt;p>Thanks for stopping by!&lt;/p></description></item><item><title>Archives</title><link>https://blog.sihanwei.org/archives/</link><pubDate>Tue, 28 May 2019 00:00:00 +0000</pubDate><guid>https://blog.sihanwei.org/archives/</guid><description/></item><item><title>Links</title><link>https://blog.sihanwei.org/links/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.sihanwei.org/links/</guid><description>&lt;p>Here are some links I found really helpful or just plain cool. Hope you enjoy them too!&lt;/p></description></item><item><title>Search</title><link>https://blog.sihanwei.org/search/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.sihanwei.org/search/</guid><description/></item><item><title>Subscribe</title><link>https://blog.sihanwei.org/subscribe/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://blog.sihanwei.org/subscribe/</guid><description>&lt;h3 id="subscribe-to-this-blog">Subscribe to This Blog
&lt;/h3>&lt;p>Stay updated with the latest posts from this blog via RSS!&lt;/p>
&lt;ul>
&lt;li>&lt;strong>RSS Feed URL&lt;/strong>: &lt;a class="link" href="https://blog.sihanwei.org/index.xml" >https://blog.sihanwei.org/index.xml&lt;/a>&lt;/li>
&lt;li>&lt;strong>How to subscribe&lt;/strong>:&lt;br>
Copy the feed URL above and paste it into your favorite RSS reader.&lt;/li>
&lt;/ul>
&lt;h3 id="recommended-rss-readers">Recommended RSS Readers
&lt;/h3>&lt;ul>
&lt;li>&lt;a class="link" href="https://feedly.com/" target="_blank" rel="noopener"
>Feedly&lt;/a> – Clean, cloud-based, and free&lt;/li>
&lt;li>&lt;a class="link" href="https://inoreader.com/" target="_blank" rel="noopener"
>Inoreader&lt;/a> – Power-user friendly with automation&lt;/li>
&lt;li>&lt;a class="link" href="https://miniflux.app/" target="_blank" rel="noopener"
>Miniflux&lt;/a> – Minimalist and self-hostable&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>RSS is a simple, privacy-friendly way to follow blogs — no email or accounts required.&lt;br>
You’ll always be the first to know when a new post drops!&lt;/p></description></item></channel></rss>