<a href="/blog">
    {% load static %}
    <div class="bottom_right_div"><img src="{% static '2hu.png' %}"></div>
</a>
<div id="overlay" aria-hidden="true" onclick="removefull()"></div>
<div class="wrapper_article">
    <style scoped>
        convolution {
            display: flex;
            flex-direction: column;
            font-size: 250pt;
            width: 1em;
            height: 1em;
        }

        convolution > * > * {
            flex-grow: 1;
            flex-shrink: 1;
            border: 1px #7f7f7f solid;
            font-size: 30pt;
            display: flex;
            align-content: space-around;
        }

        convolution.c3x3 > * > * {
            flex-basis: 33%;
        }

        convolution.c5x5 > * > * {
            flex-basis: 20%;
        }

        convolution.c7x7 > * > * {
            flex-basis: 14.28%;
        }

        convolution > * > * > span {
            margin: auto;
        }

        convolution > * {
            display: flex;
            flex-direction: row;
            height: 100%;
        }

        convolution > * > transparent {
            background-color: transparent;
        }

        convolution > * > *[data-accent="1"] {
            border-color: #e17800;
        }

        convolution > * > *[data-accent="2"] {
            border-color: #6c3e00;
        }
    </style>
    <p class="heading">Edge Masks</p>
    <div class="content">
        <p class="subhead">Table of contents</p>
        <ul>
            <li><a href="#c_intro">Abstract</a></li>
            <li><a href="#c_theory">Theory, examples, and explanations</a></li>
            <li><a href="#c_deband">Using edge masks</a></li>
            <li><a href="#c_performance">Performance</a></li>
            <li><a href="#c_end">Conclusion</a></li>
        </ul>
        <a id="c_intro" href="#c_intro"><p class="subhead">Abstract</p></a>
        <p>
            Digital video can suffer from a plethora of visual artifacts caused by lossy compression, transmission,
            quantization, and even errors in the mastering process.
            These artifacts include, but are not limited to, banding, aliasing,
            loss of detail and sharpness (blur), discoloration, halos and other edge artifacts, and excessive noise.<br>
            Since many of these defects are rather common, filters have been created to remove, minimize, or at least
            reduce their visual impact on the image. However, the stronger the defects in the source video are, the more
            aggressive the filtering needed to remedy them, which may in turn introduce new artifacts.
        </p>
        <p>
            In order to avoid this, masks are used to specifically target the affected scenes and areas while leaving
            the
            rest unprocessed.
            These masks can be used to isolate certain colors or, more importantly, certain structural components of an
            image.
            Many of the aforementioned defects are either limited to the edges of an image (halos, aliasing) or will
            never
            occur in edge regions (banding). In these cases, the unwanted by-products of the respective filters can be
            limited by only applying the filter to the relevant areas. Since edge masking is a fundamental component of
            understanding and analyzing the structure of an image, many different implementations have been created over the past few
            decades, and many of them are now available to us.</p>
        <p> In this article, I will briefly explain and compare different ways to generate masks that deal with the
            aforementioned problems.</p>
        <a id="c_theory" href="#c_theory"><p class="subhead">Theory, examples, and explanations</p></a>
        <p>
            Most popular algorithms try to detect abrupt changes in brightness by using convolutions to analyze the
            direct
            neighbourhood of the reference pixel. Since the computational complexity of a convolution is
            <em>O(n<sup>2</sup>)</em>
            (where n is the radius), the radius should be as small as possible while still maintaining a reasonable
            level of accuracy. However, decreasing the radius of a convolution will make it more susceptible to noise and similar
            artifacts.</p>
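        <p>To make the cost argument concrete, here is a minimal pure-Python sketch of a naive convolution. It is
            only an illustration: the function names are made up for this example, and clamping coordinates at the
            border is an assumption of the sketch, not a statement about how std.Convolution treats image borders.</p>
        <pre><code class="python">def convolve(image, kernel):
    """Naive 2D convolution (strictly speaking: correlation) with border clamping.

    image and kernel are lists of rows; the kernel must be square with an odd size.
    """
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = 0
            for ky in range(-radius, radius + 1):
                for kx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse their nearest neighbour.
                    sy = min(max(y + ky, 0), height - 1)
                    sx = min(max(x + kx, 0), width - 1)
                    acc += image[sy][sx] * kernel[ky + radius][kx + radius]
            row.append(acc)
        out.append(row)
    return out


def taps_per_pixel(radius):
    """Number of multiplications needed for one output pixel."""
    return (2 * radius + 1) ** 2


# radius 1 is a 3x3 kernel, radius 2 is 5x5, radius 3 is 7x7
print([taps_per_pixel(r) for r in (1, 2, 3)])  # [9, 25, 49]</code></pre>
        <p>The two inner loops are where the quadratic growth comes from: going from 3x3 to 5x5 almost triples the
            work per pixel.</p>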
        <p>Most algorithms use 3x3 convolutions, which offer the best balance between speed and accuracy. Examples are
            the operators proposed by Prewitt, Sobel, Scharr, and Kirsch. Given a sufficiently clean (noise-free)
            source, 2x2 convolutions can also be used<span class="source"><a
                    href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/roberts.htm">[src]</a></span>, but with modern
            hardware being able to calculate 3x3 convolutions
            for HD video in real time, the gain in speed is often outweighed by the decreased accuracy.</p>
        <p>To better illustrate this, I will use the Sobel operator to compute an example image.<br>
            Sobel uses two convolutions to detect edges along the x and y axes. Note that you either need two separate
            convolutions per axis (one for each sign of the gradient) or one convolution that returns the absolute
            value of each pixel rather than clamping negative results to 0.</p>
        <table class="full_width_table">
            <tr>
                <td style="width: 40%;">
                    <convolution class="c3x3 accent1">
                        <row>
                            <transparent><span>-1</span></transparent>
                            <transparent><span>-2</span></transparent>
                            <transparent><span>-1</span></transparent>
                        </row>
                        <row>
                            <transparent><span>0</span></transparent>
                            <transparent><span>0</span></transparent>
                            <transparent><span>0</span></transparent>
                        </row>
                        <row>
                            <transparent><span>1</span></transparent>
                            <transparent><span>2</span></transparent>
                            <transparent><span>1</span></transparent>
                        </row>
                    </convolution>
                </td>
                <td style="width: 40%;">
                    <convolution class="c3x3 accent1">
                        <row>
                            <transparent><span>-1</span></transparent>
                            <transparent><span>0</span></transparent>
                            <transparent><span>1</span></transparent>
                        </row>
                        <row>
                            <transparent><span>-2</span></transparent>
                            <transparent><span>0</span></transparent>
                            <transparent><span>2</span></transparent>
                        </row>
                        <row>
                            <transparent><span>-1</span></transparent>
                            <transparent><span>0</span></transparent>
                            <transparent><span>1</span></transparent>
                        </row>
                    </convolution>
                </td>
            </tr>
        </table>
        <p>
            Every pixel is set to the highest output of any of these convolutions. A simple implementation using the
            Convolution function of VapourSynth would look like this:</p>
        <pre><code class="python">def sobel(src):
    sx = src.std.Convolution([-1, -2, -1, 0, 0, 0, 1, 2, 1], saturate=False)
    sy = src.std.Convolution([-1, 0, 1, -2, 0, 2, -1, 0, 1], saturate=False)
    return core.std.Expr([sx, sy], 'x y max')</code></pre>
        Fortunately, VapourSynth has a built-in Sobel function,
        <code>core.std.Sobel</code>, so we don't even have to write our own code.
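        <p>Why the absolute value matters is easiest to see on a tiny example. The following is a rough pure-Python
            sketch, not VapourSynth code: <code>response</code> and the two 3x3 patches are hypothetical helpers
            made up for this illustration, and the raw numbers are shown before any clipping to the output range.</p>
        <pre><code class="python">SOBEL_H = [-1, -2, -1, 0, 0, 0, 1, 2, 1]   # responds to horizontal edges
SOBEL_V = [-1, 0, 1, -2, 0, 2, -1, 0, 1]   # responds to vertical edges


def response(img, kernel, x, y):
    """Raw 3x3 kernel response at the interior pixel (x, y)."""
    return sum(
        kernel[(dy + 1) * 3 + (dx + 1)] * img[y + dy][x + dx]
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
    )


# Two 3x3 patches with a vertical edge: dark-to-bright and bright-to-dark.
rising = [[0, 0, 255]] * 3
falling = [[255, 0, 0]] * 3

for name, patch in (("rising", rising), ("falling", falling)):
    raw = response(patch, SOBEL_V, 1, 1)
    clamped = max(raw, 0)   # negative responses are cut off at 0
    absolute = abs(raw)     # the behaviour we want from saturate=False
    print(name, raw, clamped, absolute)
# rising 1020 1020 1020
# falling -1020 0 1020</code></pre>
        <p>With clamping, the bright-to-dark edge disappears entirely. Taking the absolute value catches both
            directions with a single convolution per axis.</p>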
        <p>Hover over the following image to see the Sobel edge mask. </p>
        <img src="/media/articles/res_edge/brickwall.png"
             onmouseover="this.setAttribute('src', '/media/articles/res_edge/brickwall_sobel.png')"
             onmouseout="this.setAttribute('src', '/media/articles/res_edge/brickwall.png')"><br>
        <p>Of course, this example is highly idealized. All lines run parallel to either the x or the y axis, there are
            no small details, and the overall complexity of the image is very low.</p>
        Using a more complex image with blurrier lines and more diagonals results in a far less accurate edge mask.
        <img src="/media/articles/res_edge/kuzu.png" onmouseover="this.setAttribute('src','/media/articles/res_edge/kuzu_sobel.png')"
             onmouseout="this.setAttribute('src','/media/articles/res_edge/kuzu.png')"><br>
        <p>
            A simple way to greatly improve the accuracy of the detection is the use of 8-connectivity rather than
            4-connectivity. This means utilizing all eight directions of the <a
                href="https://en.wikipedia.org/wiki/Moore_neighborhood">Moore neighbourhood</a>, i.e. also using the
            diagonals of the 3x3 neighbourhood.<br>
            To achieve this, I will use a convolution kernel proposed by Russell A. Kirsch in 1970<span class="source"><a
                href="https://ddl.kageru.moe/konOJ.pdf">[src]</a></span>.
        </p>
        <convolution class="c3x3 accent1">
            <row>
                <transparent><span>5</span></transparent>
                <transparent><span>5</span></transparent>
                <transparent><span>5</span></transparent>
            </row>
            <row>
                <transparent><span>-3</span></transparent>
                <transparent><span>0</span></transparent>
                <transparent><span>-3</span></transparent>
            </row>
            <row>
                <transparent><span>-3</span></transparent>
                <transparent><span>-3</span></transparent>
                <transparent><span>-3</span></transparent>
            </row>
        </convolution>
        <br>
        This kernel is then rotated in increments of 45° until it reaches its original position.<br>
        Since VapourSynth does not have an internal function for the Kirsch operator, I had to build my own, again
        using the internal convolution.

        <pre><code class="python">def kirsch(src):
    # one kernel per 45° step; the outer ring of the matrix is rotated clockwise
    kirsch1 = src.std.Convolution(matrix=[ 5,  5,  5, -3,  0, -3, -3, -3, -3])
    kirsch2 = src.std.Convolution(matrix=[-3,  5,  5, -3,  0,  5, -3, -3, -3])
    kirsch3 = src.std.Convolution(matrix=[-3, -3,  5, -3,  0,  5, -3, -3,  5])
    kirsch4 = src.std.Convolution(matrix=[-3, -3, -3, -3,  0,  5, -3,  5,  5])
    kirsch5 = src.std.Convolution(matrix=[-3, -3, -3, -3,  0, -3,  5,  5,  5])
    kirsch6 = src.std.Convolution(matrix=[-3, -3, -3,  5,  0, -3,  5,  5, -3])
    kirsch7 = src.std.Convolution(matrix=[ 5, -3, -3,  5,  0, -3,  5, -3, -3])
    kirsch8 = src.std.Convolution(matrix=[ 5,  5, -3,  5,  0, -3, -3, -3, -3])
    return core.std.Expr([kirsch1, kirsch2, kirsch3, kirsch4, kirsch5, kirsch6, kirsch7, kirsch8],
            'x y max z max a max b max c max d max e max')</code></pre>
        <p>
            It should be obvious that the cheap copy-paste approach is not acceptable to solve this problem. Sure, it
            works,
            but I'm not a mathematician, and mathematicians are the only people who write code like that. Also, yes, you
            can pass more than three clips to std.Expr, even though the documentation says otherwise.<br>Or maybe my
            limited understanding of math (not being a mathematician, after all) was simply insufficient to properly
            decode “Expr evaluates an expression per
            pixel for up to <span class="accent1">3</span> input clips.”</p>
        Anyway, let's try that again, shall we?
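        <pre><code class="python">def kirsch(src: vs.VideoNode) -> vs.VideoNode:
    ring = [5]*3 + [-3]*5  # outer ring of the kernel, clockwise from the top-left
    rings = [ring[-i:] + ring[:-i] for i in range(4)]  # four 45° rotations
    # map the ring back to row-major order with the 0 in the center
    c = [core.std.Convolution(src, [r[0], r[1], r[2], r[7], 0, r[3], r[6], r[5], r[4]], saturate=False)
         for r in rings]
    return core.std.Expr(c, 'x y max z max a max')</code></pre>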
        <p>Much better already. Who needed readable code, anyway?</p>
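        <p>For anyone who wants to verify the 45° rotations by hand, here is a small pure-Python sketch that
            generates all eight kernels. Note that “rotating” means shifting the outer ring of the 3x3 matrix, which
            is not the same as shifting the flattened row-major list. The helper name <code>kirsch_kernels</code>
            and the index table are made up for this example.</p>
        <pre><code class="python">def kirsch_kernels():
    """Return the eight Kirsch kernels as flat, row-major 3x3 lists."""
    ring = [5] * 3 + [-3] * 5   # outer ring, clockwise from the top-left cell
    # Row-major index of each ring position, in the same clockwise order.
    ring_to_cell = [0, 1, 2, 5, 8, 7, 6, 3]
    kernels = []
    for step in range(8):
        rotated = ring[-step:] + ring[:-step]   # one step = 45 degrees clockwise
        kernel = [0] * 9                        # the centre weight stays 0
        for value, cell in zip(rotated, ring_to_cell):
            kernel[cell] = value
        kernels.append(kernel)
    return kernels


for k in kirsch_kernels():
    print(k[0:3], k[3:6], k[6:9])</code></pre>
        <p>Every kernel produced this way keeps its three 5s next to each other on the ring and sums to zero, so
            flat areas of the image produce no response.</p>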
        If we compare the Sobel edge mask with the Kirsch operator's mask, we can clearly see the improved accuracy.
        (Hover=Kirsch)<br>
        <img src="/media/articles/res_edge/kuzu_sobel.png" onmouseover="this.setAttribute('src','/media/articles/res_edge/kuzu_kirsch.png')"
             onmouseout="this.setAttribute('src','/media/articles/res_edge/kuzu_sobel.png')"><br>
        <p> The higher overall sensitivity of the detection also results in more noise being visible in the edge mask.
            This can be remedied by denoising the image prior to the analysis.<br>
            The increase in accuracy comes at an almost negligible cost in terms of computational complexity: about
            175 fps
            for 8-bit 1080p content (luma only) compared to 215 fps with the previously shown Sobel
            <span title="You know why I'm putting this in quotes">‘implementation’</span>. The internal Sobel filter is
            not used for this comparison as it also includes a high- and lowpass function as well as scaling options,
            making it slower than the Sobel function above. Note that many of the edges are also detected by the Sobel
            operator; however, these are very faint and only visible after an operation like std.Binarize.</p>
        A more sophisticated way to generate an edge mask is the Canny algorithm (available in VapourSynth through
        the TCanny plugin), which uses a similar procedure to find edges but
        then reduces these edges to lines that are one pixel wide. Ideally, these lines represent the middle
        of each edge, and no edge is marked twice. It also applies a Gaussian blur to the image to eliminate noise
        and other distortions that might incorrectly be recognized as edges. The following example was created with
        TCanny using these settings: <code>core.tcanny.TCanny(op=1,
        mode=0)</code>. op=1 uses a modified operator that has been shown to achieve better signal-to-noise ratios<span
            class="source"><a href="http://www.jofcis.com/publishedpapers/2011_7_5_1516_1523.pdf">[src]</a></span>.<br>
        <img src="/media/articles/res_edge/kuzu_tcanny.png"><br>
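        <p>The thinning step can be illustrated in one dimension. This is a heavily simplified sketch and not how
            TCanny is implemented: the real algorithm compares each pixel with its neighbours along the gradient
            direction in 2D and applies thresholding afterwards. The function name <code>thin_row</code> is made up
            for this example.</p>
        <pre><code class="python">def thin_row(gradient):
    """Keep only local maxima of a 1D gradient profile; everything else becomes 0."""
    thinned = []
    for i, value in enumerate(gradient):
        left = gradient[i - 1] if i > 0 else 0
        right = gradient[i + 1] if i + 1 != len(gradient) else 0
        # '>' on the left and '>=' on the right: a plateau is only marked once.
        thinned.append(value if value > left and value >= right else 0)
    return thinned


# A blurry edge produces a wide bump in the gradient magnitude.
blurry_edge = [0, 10, 40, 90, 40, 10, 0]
print(thin_row(blurry_edge))  # [0, 0, 0, 90, 0, 0, 0]</code></pre>
        <p>Only the peak of the bump survives, which is what turns a wide, blurry edge into a single thin line.</p>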
        Since I've already touched upon bigger convolutions earlier without showing anything specific, here is an
        example of the things that are possible with 5x5 convolutions.
        <pre><code class="python"
        >src.std.Convolution(matrix=[1,  2,  4,  2, 1,
                            2, -3, -6, -3, 2,
                            4, -6,  0, -6, 4,
                            2, -3, -6, -3, 2,
                            1,  2,  4,  2, 1], saturate=False)</code></pre>
        <img src="/media/articles/res_edge/kuzu5x5.png">
        This was an attempt to create an edge mask that draws around the edges. With a few modifications, this might
        become useful for
        halo removal or edge cleaning (although something similar, and probably better, can be created with a regular
        edge mask, std.Maximum, and std.Expr).
        <a id="c_deband" href="#c_deband"><p class="subhead">Using edge masks</p></a>
        <p>
            Now that we've established the basics, let's look at real-world applications. Since 8-bit video sources are
            still everywhere, barely any encode can be done without debanding. As I've mentioned before, restoration
            filters
            can often induce new artifacts, and in the case of debanding, these artifacts are loss of detail and, for
            stronger debanding, blur. An edge mask could be used to remedy these effects, essentially allowing the
            debanding
            filter to deband whatever it deems necessary and then restoring the edges and details via
            std.MaskedMerge.</p>
        <p>
            GradFun3 internally generates a mask to do exactly this. f3kdb, the <span
                title="I know that “the other filter” is misleading since GF3 is not a filter but a script, but if that's your only concern so far, I must be doing a pretty good job.">other popular debanding filter</span>,
            does not have any integrated masking functionality.</p>
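        <p>Conceptually, restoring the edges through a mask is a per-pixel blend between the filtered and the
            unfiltered clip. The following pure-Python sketch shows the idea on a single row of 8-bit pixels. It is
            an approximation for illustration only: the exact integer rounding of std.MaskedMerge may differ, and
            the pixel values are invented.</p>
        <pre><code class="python">def masked_merge(clip_a, clip_b, mask, peak=255):
    """Blend two equally long rows of pixels according to a mask.

    mask == 0 keeps clip_a, mask == peak takes clip_b, values in between mix both.
    """
    return [
        round(a + (b - a) * m / peak)
        for a, b, m in zip(clip_a, clip_b, mask)
    ]


source = [100, 101, 180, 30]
debanded = [100, 100, 150, 60]   # hypothetical debander output, detail lost in the last two pixels
edges = [0, 0, 255, 255]         # edge mask: 255 marks pixels that should be protected

# Start from the debanded clip and restore the source wherever the mask is set.
print(masked_merge(debanded, source, edges))  # [100, 100, 180, 30]</code></pre>
        <p>The flat area keeps the debanded values while the masked pixels are taken from the source unchanged.</p>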
        Consider this image:<br>
        <img src="/media/articles/res_edge/aldnoah.png">
        <p>
            As you can see, there is quite a lot of banding in this image. Using a regular debanding filter to remove it
            would likely also destroy a lot of small details, especially in the darker parts of the image.<br>
            Using the Sobel operator to generate an edge mask yields this (admittedly rather disappointing) result:</p>
        <img src="/media/articles/res_edge/aldnoah_sobel.png">
        <p>
            In order to better recognize edges in dark areas, the retinex
            algorithm can be used for local contrast enhancement.</p>
        <img src="/media/articles/res_edge/aldnoah_retinex.png">
        <div style="font-size: 80%; text-align: right">The image after applying the retinex filter, luma only.</div>
        <p>
            We can now see a lot of information that was previously barely visible due to the low contrast. One might
            think
            that preserving this information is a vain effort, but with HDR monitors slowly making their way into the
            mainstream and more possible improvements down the line, this extra information might be visible on consumer
            grade screens at some point. And since it doesn't waste a noticeable amount of bitrate, I see no harm in
            keeping it.</p>
        Using this newly gained knowledge, some testing, and a little bit of magic, we can create a surprisingly
        accurate edge mask.
        <pre><code class="python">def retinex_edgemask(luma, sigma=1):
    ret = core.retinex.MSRCP(luma, sigma=[50, 200, 350], upper_thr=0.005)
    return core.std.Expr([kirsch(luma), ret.tcanny.TCanny(mode=1, sigma=sigma).std.Minimum(
        coordinates=[1, 0, 1, 0, 0, 1, 0, 1])], 'x y +')</code></pre>
        <p>Using this code, our generated edge mask looks as follows:</p>
        <img src="/media/articles/res_edge/aldnoah_kage.png">
        <p>
            By using std.Binarize (or a similar lowpass/highpass function) and a few std.Maximum and/or std.Inflate
            calls, we can transform this edgemask into a
            more usable detail mask for our debanding function or any other function that requires a precise edge mask.
        </p>
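        <p>What these two steps do to a mask can be sketched in a few lines of pure Python. This only mimics the
            idea behind std.Binarize and std.Maximum on a tiny 2D list; the threshold, the sample values, and the
            inclusive comparison are assumptions made for the example.</p>
        <pre><code class="python">def binarize(mask, threshold, peak=255):
    """Set every pixel at or above the threshold to peak and the rest to 0."""
    return [[peak if v >= threshold else 0 for v in row] for row in mask]


def maximum(mask):
    """Replace each pixel with the largest value in its 3x3 neighbourhood."""
    height, width = len(mask), len(mask[0])
    return [
        [
            max(
                mask[ny][nx]
                for ny in range(max(y - 1, 0), min(y + 2, height))
                for nx in range(max(x - 1, 0), min(x + 2, width))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


edge_mask = [
    [0, 0, 0, 0, 0],
    [0, 12, 0, 0, 0],     # faint response, most likely noise
    [0, 0, 200, 0, 0],    # strong response, an actual edge
    [0, 0, 0, 0, 0],
]
detail_mask = maximum(binarize(edge_mask, threshold=64))
for row in detail_mask:
    print(row)</code></pre>
        <p>The faint pixel is discarded by the threshold, and the remaining edge pixel grows into a 3x3 block that
            also protects its immediate surroundings.</p>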
        <a id="c_performance" href="#c_performance"><p class="subhead">Performance</p></a>
        Most edge mask algorithms are simple convolutions, allowing them to run at over 100 fps even for HD content. A
        complex algorithm like retinex obviously cannot compete with that, as the benchmarks show.
        While a simple edge mask with a Sobel kernel ran consistently above 200 fps, the function described above only
        produces 25 frames per second. Most of that speed is lost to retinex, which, if executed alone, yields about
        36.6 fps. A similar, <span title="and I mean a LOT more inaccurate">albeit more inaccurate</span>, way to
        improve the detection of dark, low-contrast edges would be applying a simple curve to the brightness of the
        image.
        <pre><code class="python">bright = core.std.Expr(src, 'x 65535 / sqrt 65535 *')</code></pre>
        This should (in theory) improve the detection of dark edges in dark images or regions by adjusting their
        brightness
        as shown in this curve:
        <img src="/media/articles/res_edge/sqrtx.svg"
             title="yes, I actually used matplotlib to generate my own image for sqrt(x) rather than taking one of the millions available online">
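        <p>To get a feeling for how much the curve lifts dark values, the same formula can be evaluated for a few
            16-bit inputs in plain Python. The function name <code>sqrt_curve</code> and the sample values are
            chosen just for this illustration.</p>
        <pre><code class="python">from math import sqrt

PEAK = 65535  # largest 16-bit value, matching the expression above


def sqrt_curve(x):
    """Same formula as the Expr string: normalise, take the root, scale back."""
    return sqrt(x / PEAK) * PEAK


for x in (0, 1024, 4096, 16384, 65535):
    print(x, round(sqrt_curve(x)))</code></pre>
        <p>A pixel at roughly 1/16 of full brightness (4096) ends up at about a quarter (16384), while black and
            white stay where they are.</p>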
        <a id="c_end" href="#c_end"><p class="subhead">Conclusion</p></a>
        Edge masks have been a powerful tool for image analysis for decades now. They can be used to reduce an image to
        its most essential components and thus significantly facilitate many image analysis processes. They can also be
        used to great effect in video processing to minimize unwanted by-products and artifacts of more aggressive
        filtering. Using convolutions, one can create fast and accurate edge masks, which can be customized and adapted
        to serve any specific purpose by changing the parameters of the kernel. The use of local contrast enhancement
        to improve the detection accuracy of the algorithm was shown to be possible, albeit significantly slower.<br><br>
        <pre><code class="python"># Quick overview of all scripts described in this article:
################################################################
import vapoursynth as vs
import mvsfunc as mvf  # provides Depth and GetPlane

# Use retinex to greatly improve the accuracy of the edge detection in dark scenes.
# draft=True is a lot faster, albeit less accurate
def retinex_edgemask(src: vs.VideoNode, sigma=1, draft=False) -> vs.VideoNode:
    core = vs.core
    src = mvf.Depth(src, 16)
    luma = mvf.GetPlane(src, 0)
    if draft:
        ret = core.std.Expr(luma, 'x 65535 / sqrt 65535 *')
    else:
        ret = core.retinex.MSRCP(luma, sigma=[50, 200, 350], upper_thr=0.005)
    mask = core.std.Expr([kirsch(luma), ret.tcanny.TCanny(mode=1, sigma=sigma).std.Minimum(
        coordinates=[1, 0, 1, 0, 0, 1, 0, 1])], 'x y +')
    return mask


# Kirsch edge detection. This uses 8 directions, so it's slower but better than Sobel (4 directions).
# more information: https://ddl.kageru.moe/konOJ.pdf
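def kirsch(src: vs.VideoNode) -> vs.VideoNode:
    core = vs.core
    ring = [5]*3 + [-3]*5  # outer ring of the kernel, clockwise from the top-left
    rings = [ring[-i:] + ring[:-i] for i in range(4)]  # four 45° rotations
    # map the ring back to row-major order with the 0 in the center
    c = [core.std.Convolution(src, [r[0], r[1], r[2], r[7], 0, r[3], r[6], r[5], r[4]], saturate=False)
         for r in rings]
    return core.std.Expr(c, 'x y max z max a max')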


# should behave similarly to std.Sobel() but faster since it has no additional high-/lowpass or gain.
# the internal filter is also a little brighter
def fast_sobel(src: vs.VideoNode) -> vs.VideoNode:
    core = vs.core
    sx = src.std.Convolution([-1, -2, -1, 0, 0, 0, 1, 2, 1], saturate=False)
    sy = src.std.Convolution([-1, 0, 1, -2, 0, 2, -1, 0, 1], saturate=False)
    return core.std.Expr([sx, sy], 'x y max')


# a weird kind of edgemask that draws around the edges. probably needs more tweaking/testing
# maybe useful for edge cleaning?
def bloated_edgemask(src: vs.VideoNode) -> vs.VideoNode:
    return src.std.Convolution(matrix=[1,  2,  4,  2, 1,
                                       2, -3, -6, -3, 2,
                                       4, -6,  0, -6, 4,
                                       2, -3, -6, -3, 2,
                                       1,  2,  4,  2, 1], saturate=False)</code></pre>
        <div class="download_centered">
            <span class="source">Some of the functions described here have been added to my script collection on GitHub<br></span><a href="https://gist.github.com/kageru/d71e44d9a83376d6b35a85122d427eb5">Download</a></div>
        <br><br><br><br><br><br><span class="ninjatext">Mom, look! I found a way to burn billions of CPU cycles with my new placebo debanding script!</span>
    </div>
</div>