|
<a href="/blog">
|
||
|
{% load static %}
|
||
|
<div class="bottom_right_div"><img src="{% static '2hu.png' %}"></div>
|
||
|
</a>
|
||
|
<div id="overlay" aria-hidden="true" onclick="removefull()"></div>
|
||
|
<div class="wrapper_article">
|
||
|
<style scoped>
|
||
|
convolution {
|
||
|
display: flex;
|
||
|
flex-direction: column;
|
||
|
font-size: 250pt;
|
||
|
width: 1em;
|
||
|
height: 1em;
|
||
|
}
|
||
|
|
||
|
convolution > * > * {
|
||
|
flex-grow: 1;
|
||
|
flex-shrink: 1;
|
||
|
border: 1px #7f7f7f solid;
|
||
|
font-size: 30pt;
|
||
|
display: flex;
|
||
|
align-content: space-around;
|
||
|
}
|
||
|
|
||
|
convolution.c3x3 > * > * {
|
||
|
flex-basis: 33%;
|
||
|
}
|
||
|
|
||
|
convolution.c5x5 > * > * {
|
||
|
flex-basis: 20%;
|
||
|
}
|
||
|
|
||
|
convolution.c7x7 > * > * {
|
||
|
flex-basis: 14.28%;
|
||
|
}
|
||
|
|
||
|
convolution > * > * > span {
|
||
|
margin: auto;
|
||
|
}
|
||
|
|
||
|
convolution > * {
|
||
|
display: flex;
|
||
|
flex-direction: row;
|
||
|
height: 100%;
|
||
|
}
|
||
|
|
||
|
convolution > * > transparent {
|
||
|
background-color: transparent;
|
||
|
}
|
||
|
|
||
|
convolution > * > *[data-accent="1"] {
|
||
|
border-color: #e17800;
|
||
|
}
|
||
|
|
||
|
convolution > * > *[data-accent="2"] {
|
||
|
border-color: #6c3e00;
|
||
|
}
|
||
|
</style>
|
||
|
<p class="heading">Edge Masks</p>
|
||
|
<div class="content">
|
||
|
<p class="subhead">Table of contents</p>
|
||
|
<ul>
|
||
|
<li><a href="#c_intro">Abstract</a></li>
|
||
|
<li><a href="#c_theory">Theory, examples, and explanations</a></li>
|
||
|
<li><a href="#c_deband">Using edge masks</a></li>
|
||
|
<li><a href="#c_performance">Performance</a></li>
|
||
|
<li><a href="#c_end">Conclusion</a></li>
|
||
|
</ul>
|
||
|
<a id="c_intro" href="#c_intro"><p class="subhead">Abstract</p></a>
|
||
|
<p>
|
||
|
Digital video can suffer from a plethora of visual artifacts caused by lossy compression, transmission,
|
||
|
quantization, and even errors in the mastering process.
|
||
|
These artifacts include, but are not limited to, banding, aliasing,
|
||
|
loss of detail and sharpness (blur), discoloration, halos and other edge artifacts, and excessive noise.<br>
|
||
|
Since many of these defects are rather common, filters have been created to remove, minimize, or at least
|
||
|
reduce their visual impact on the image. However, the stronger the defects in the source video are, the more
|
||
|
aggressive filtering is needed to remedy them, which may induce new artifacts.
|
||
|
</p>
|
||
|
<p>
|
||
|
In order to avoid this, masks are used to specifically target the affected scenes and areas while leaving
|
||
|
the
|
||
|
rest unprocessed.
|
||
|
These masks can be used to isolate certain colors or, more importantly, certain structural components of an
|
||
|
image.
|
||
|
Many of the aforementioned defects are either limited to the edges of an image (halos, aliasing) or will
|
||
|
never
|
||
|
occur in edge regions (banding). In these cases, the unwanted by-products of the respective filters can be
|
||
|
limited by only applying the filter to the relevant areas. Since edge masking is a fundamental component of
|
||
|
understanding and analyzing the structure of an image, many different implementations were created over the past few
|
||
|
decades, many of which are now available to us.</p>
|
||
|
<p> In this article, I will briefly explain and compare different ways to generate masks that deal with the
|
||
|
aforementioned problems.</p>
|
||
|
<a id="c_theory" href="#c_theory"><p class="subhead">Theory, examples, and explanations</p></a>
|
||
|
<p>
|
||
|
Most popular algorithms try to detect abrupt changes in brightness by using convolutions to analyze the
|
||
|
direct
|
||
|
neighbourhood of the reference pixel. Since the computational complexity of a convolution is
|
||
|
<em>O(n<sup>2</sup>)</em>
|
||
|
(where n is the radius), the radius should be as small as possible while still maintaining a reasonable
|
||
|
level of accuracy. Decreasing the radius of a convolution will make it more susceptible to noise and similar
|
||
|
artifacts.</p>
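<p>To put numbers on the quadratic growth, here is a small sketch that counts the multiplications one output pixel costs. It assumes a dense (non-separable) square kernel with a side length of 2r + 1; the function name <code>taps_per_pixel</code> is made up for this illustration.</p>

```python
def taps_per_pixel(radius: int) -> int:
    """Number of multiply-adds one output pixel costs for a dense square kernel."""
    side = 2 * radius + 1
    return side * side


for radius in (1, 2, 3):
    side = 2 * radius + 1
    print(f"{side}x{side}: {taps_per_pixel(radius)} taps per pixel")
```

<p>Going from a 3x3 to a 5x5 kernel raises the cost from 9 to 25 taps per pixel, which is why the smaller kernels are usually preferred.</p>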
|
||
|
<p>Most algorithms use 3x3 convolutions, which offer the best balance between speed and accuracy. Examples are
|
||
|
the operators proposed by Prewitt, Sobel, Scharr, and Kirsch. Given a sufficiently clean (noise-free)
|
||
|
source, 2x2 convolutions can also be used<span class="source"><a
|
||
|
href="http://homepages.inf.ed.ac.uk/rbf/HIPR2/roberts.htm">[src]</a></span>, but with modern
|
||
|
hardware being able to calculate 3x3 convolutions
|
||
|
for HD video in real time, the gain in speed is often outweighed by the decreased accuracy.</p>
|
||
|
<p>To better illustrate this, I will use the Sobel operator to compute an example image.<br>
|
||
|
Sobel uses two convolutions to detect edges along the x and y axes. Note that you either need two separate
|
||
|
convolutions per axis or one convolution that returns the absolute values of each pixel, rather than 0 for
|
||
|
negative values.
|
||
|
<table class="full_width_table">
|
||
|
<tr>
|
||
|
<td style="width: 40%;">
|
||
|
<convolution class="c3x3 accent1">
|
||
|
<row>
|
||
|
<transparent><span>-1</span></transparent>
|
||
|
<transparent><span>-2</span></transparent>
|
||
|
<transparent><span>-1</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>1</span></transparent>
|
||
|
<transparent><span>2</span></transparent>
|
||
|
<transparent><span>1</span></transparent>
|
||
|
</row>
|
||
|
</convolution>
|
||
|
</td>
|
||
|
<td style="width: 40%;">
|
||
|
<convolution class="c3x3 accent1">
|
||
|
<row>
|
||
|
<transparent><span>-1</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>1</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>-2</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>2</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>-1</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>1</span></transparent>
|
||
|
</row>
|
||
|
</convolution>
|
||
|
</td>
|
||
|
</tr>
|
||
|
</table>
|
||
|
<p>
|
||
|
Every pixel is set to the highest output of any of these convolutions. A simple implementation using the
|
||
|
Convolution function of Vapoursynth would look like this:</p>
|
||
|
<pre><code class="python">import vapoursynth as vs
core = vs.core

def sobel(src):
    # saturate=False returns absolute values instead of clamping negatives to 0
    sx = src.std.Convolution([-1, -2, -1, 0, 0, 0, 1, 2, 1], saturate=False)
    sy = src.std.Convolution([-1, 0, 1, -2, 0, 2, -1, 0, 1], saturate=False)
    return core.std.Expr([sx, sy], 'x y max')</code></pre>
|
||
|
Fortunately, Vapoursynth has a built-in Sobel function
|
||
|
<code>core.std.Sobel</code>, so we don't even have to write our own code.
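<p>To see what the two kernels do to actual pixel values, here is a plain-Python sketch that evaluates them at single pixels of a tiny test image. It does not use Vapoursynth; the helper names <code>convolve_at</code> and <code>sobel_at</code> are made up for this example, and the raw sums are neither scaled nor clamped to the bit depth the way a real filter would.</p>

```python
# The two Sobel kernels from the listing above, written as 3x3 grids.
SX = [[-1, -2, -1],
      [ 0,  0,  0],
      [ 1,  2,  1]]
SY = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]


def convolve_at(img, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(kernel[i][j] * img[row + i - 1][col + j - 1]
               for i in range(3) for j in range(3))


def sobel_at(img, row, col):
    """Larger absolute response of the two kernels at one interior pixel."""
    return max(abs(convolve_at(img, SX, row, col)),
               abs(convolve_at(img, SY, row, col)))


# Dark on the left, bright on the right: a single vertical edge.
image = [[0, 0, 0, 100, 100] for _ in range(3)]

print(sobel_at(image, 1, 1))  # pixel in the flat, dark area
print(sobel_at(image, 1, 2))  # pixel right next to the edge
```

<p>The pixel in the flat area yields 0, while the pixel next to the edge yields 400. Only the second kernel responds here, because the edge is vertical.</p>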
|
||
|
<p>Hover over the following image to see the Sobel edge mask. </p>
|
||
|
<img src="/media/articles/res_edge/brickwall.png"
|
||
|
onmouseover="this.setAttribute('src', '/media/articles/res_edge/brickwall_sobel.png')"
|
||
|
onmouseout="this.setAttribute('src', '/media/articles/res_edge/brickwall.png')"><br>
|
||
|
<p>Of course, this example is highly idealized. All lines run parallel to either the x or the y axis, there are
|
||
|
no small details, and the overall complexity of the image is very low.</p>
|
||
|
Using a more complex image with blurrier lines and more diagonals results in a much less accurate edge mask.
|
||
|
<img src="/media/articles/res_edge/kuzu.png" onmouseover="this.setAttribute('src','/media/articles/res_edge/kuzu_sobel.png')"
|
||
|
onmouseout="this.setAttribute('src','/media/articles/res_edge/kuzu.png')"><br>
|
||
|
<p>
|
||
|
A simple way to greatly improve the accuracy of the detection is the use of 8-connectivity rather than
|
||
|
4-connectivity. This means utilizing all eight directions of the <a
|
||
|
href="https://en.wikipedia.org/wiki/Moore_neighborhood">Moore neighbourhood</a>, i.e. also using the
|
||
|
diagonals of the 3x3 neighbourhood.<br>
|
||
|
To achieve this, I will use a convolution kernel proposed by Russell A. Kirsch in 1970<span class="source"><a
|
||
|
href="https://ddl.kageru.moe/konOJ.pdf">[src]</a></span>.
|
||
|
</p>
|
||
|
<convolution class="c3x3 accent1">
|
||
|
<row>
|
||
|
<transparent><span>5</span></transparent>
|
||
|
<transparent><span>5</span></transparent>
|
||
|
<transparent><span>5</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>-3</span></transparent>
|
||
|
<transparent><span>0</span></transparent>
|
||
|
<transparent><span>-3</span></transparent>
|
||
|
</row>
|
||
|
<row>
|
||
|
<transparent><span>-3</span></transparent>
|
||
|
<transparent><span>-3</span></transparent>
|
||
|
<transparent><span>-3</span></transparent>
|
||
|
</row>
|
||
|
</convolution>
|
||
|
<br>
|
||
|
This kernel is then rotated in increments of 45° until it reaches its original position.<br>
|
||
|
Since Vapoursynth does not have an internal function for the Kirsch operator, I had to build my own; again,
|
||
|
using the internal convolution.
|
||
|
|
||
|
<pre><code class="python">def kirsch(src):
    kirsch1 = src.std.Convolution(matrix=[ 5,  5,  5, -3,  0, -3, -3, -3, -3])
    kirsch2 = src.std.Convolution(matrix=[-3,  5,  5,  5,  0, -3, -3, -3, -3])
    kirsch3 = src.std.Convolution(matrix=[-3, -3,  5,  5,  0,  5, -3, -3, -3])
    kirsch4 = src.std.Convolution(matrix=[-3, -3, -3,  5,  0,  5,  5, -3, -3])
    kirsch5 = src.std.Convolution(matrix=[-3, -3, -3, -3,  0,  5,  5,  5, -3])
    kirsch6 = src.std.Convolution(matrix=[-3, -3, -3, -3,  0, -3,  5,  5,  5])
    kirsch7 = src.std.Convolution(matrix=[ 5, -3, -3, -3,  0, -3, -3,  5,  5])
    kirsch8 = src.std.Convolution(matrix=[ 5,  5, -3, -3,  0, -3, -3,  5, -3])
    return core.std.Expr([kirsch1, kirsch2, kirsch3, kirsch4, kirsch5, kirsch6, kirsch7, kirsch8],
                         'x y max z max a max b max c max d max e max')</code></pre>
|
||
|
<p>
|
||
|
It should be obvious that the cheap copy-paste approach is not acceptable to solve this problem. Sure, it
|
||
|
works,
|
||
|
but I'm not a mathematician, and mathematicians are the only people who write code like that. Also, yes, you
|
||
|
can pass more than three clips to std.Expr, even though the documentation says otherwise.<br>Or maybe my
|
||
|
limited understanding of math (not being a mathematician, after all) was simply insufficient to properly
|
||
|
decode “Expr evaluates an expression per
|
||
|
pixel for up to <span class="accent1">3</span> input clips.”</p>
|
||
|
Anyway, let's try that again, shall we?
|
||
|
<pre><code class="python">def kirsch(src: vs.VideoNode) -> vs.VideoNode:
    w = [5]*3 + [-3]*5
    weights = [w[-i:] + w[:-i] for i in range(4)]
    c = [core.std.Convolution(src, (w[:4]+[0]+w[4:]), saturate=False) for w in weights]
    return core.std.Expr(c, 'x y max z max a max')</code></pre>
|
||
|
<p>Much better already. Who needed readable code, anyway?</p>
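<p>For reference, the 45° rotation described earlier can also be written as a rotation of the eight outer coefficients around the fixed centre. The following plain-Python sketch generates the kernels that way. The names <code>RING</code> and <code>compass_kernels</code> are made up for this example. It walks the outer cells clockwise, whereas the listings above shift the row-major list, so the two do not produce the same kernels for every direction and the resulting masks may differ.</p>

```python
# Row-major indices of the eight outer cells of a 3x3 kernel, walked clockwise
# starting at the top-left corner.
RING = [0, 1, 2, 5, 8, 7, 6, 3]


def compass_kernels(weights=(5, 5, 5, -3, -3, -3, -3, -3)):
    """Return eight flat, row-major 3x3 kernels, each rotated a further 45 degrees."""
    kernels = []
    for shift in range(8):
        kernel = [0] * 9  # the centre (index 4) stays 0
        for pos, weight in enumerate(weights):
            kernel[RING[(pos + shift) % 8]] = weight
        kernels.append(kernel)
    return kernels


for kernel in compass_kernels():
    print(kernel[0:3], kernel[3:6], kernel[6:9])
```

<p>Each flat list has nine entries, which is the form the <code>matrix</code> argument of std.Convolution expects for a 3x3 kernel.</p>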
|
||
|
If we compare the Sobel edge mask with the Kirsch operator's mask, we can clearly see the improved accuracy.
|
||
|
(Hover=Kirsch)<br>
|
||
|
<img src="/media/articles/res_edge/kuzu_sobel.png" onmouseover="this.setAttribute('src','/media/articles/res_edge/kuzu_kirsch.png')"
|
||
|
onmouseout="this.setAttribute('src','/media/articles/res_edge/kuzu_sobel.png')"><br>
|
||
|
<p> The higher overall sensitivity of the detection also results in more noise being visible in the edge mask.
|
||
|
This can be remedied by denoising the image prior to the analysis.<br>
|
||
|
The increase in accuracy comes at an almost negligible cost in terms of computational complexity: about
175 fps for 8-bit 1080p content (luma only), compared to 215 fps with the previously shown Sobel
|
||
|
<span title="You know why I'm putting this in quotes">‘implementation’</span>. The internal Sobel filter is
|
||
|
not used for this comparison as it also includes a high- and lowpass function as well as scaling options,
|
||
|
making it slower than the Sobel function above. Note that many of the edges are also detected by the Sobel
|
||
|
operator; however, these are very faint and only become visible after an operation like std.Binarize.</p>
|
||
|
A more sophisticated way to generate an edge mask is the TCanny algorithm, which uses a similar procedure to find
edges but then reduces these edges to lines that are one pixel wide. Ideally, these lines represent the middle
of each edge, and no edge is marked twice. It also applies a Gaussian blur to the image to eliminate noise
|
||
|
and other distortions that might incorrectly be recognized as edges. The following example was created with
|
||
|
TCanny using these settings: <code>core.tcanny.TCanny(op=1,
|
||
|
mode=0)</code>. op=1 uses a modified operator that has been shown to achieve better signal-to-noise ratios<span
|
||
|
class="source"><a href="http://www.jofcis.com/publishedpapers/2011_7_5_1516_1523.pdf">[src]</a></span>.<br>
|
||
|
<img src="/media/articles/res_edge/kuzu_tcanny.png"><br>
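<p>The thinning step can be illustrated in one dimension: along a row of gradient magnitudes, only the local maximum of each edge survives. This is a heavily simplified sketch and not how TCanny is implemented. A real Canny filter works in 2D along the gradient direction and adds hysteresis thresholding. The function name <code>thin_1d</code> is made up for this example.</p>

```python
def thin_1d(magnitudes):
    """Keep only the local maxima of a 1D gradient profile and zero everything else."""
    thinned = [0] * len(magnitudes)
    for i, value in enumerate(magnitudes):
        left = magnitudes[i - 1] if i > 0 else 0
        right = magnitudes[i + 1] if i < len(magnitudes) - 1 else 0
        # Strict on one side, non-strict on the other: a flat plateau is marked once.
        if value > left and value >= right:
            thinned[i] = value
    return thinned


profile = [0, 10, 40, 90, 40, 10, 0]  # a blurry edge that is several pixels wide
print(thin_1d(profile))
```

<p>Of the five non-zero pixels in the profile, only the peak in the middle remains, which is the one-pixel line described above.</p>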
|
||
|
Since I've already touched upon bigger convolutions earlier without showing anything specific, here is an
|
||
|
example of the things that are possible with 5x5 convolutions.
|
||
|
<pre><code class="python">src.std.Convolution(matrix=[1,  2,  4,  2, 1,
                            2, -3, -6, -3, 2,
                            4, -6,  0, -6, 4,
                            2, -3, -6, -3, 2,
                            1,  2,  4,  2, 1], saturate=False)</code></pre>
|
||
|
<img src="/media/articles/res_edge/kuzu5x5.png">
|
||
|
This was an attempt to create an edge mask that draws around the edges. With a few modifications, this might
|
||
|
become useful for
|
||
|
halo removal or edge cleaning, although something similar (and probably better) can be created with a regular edge
mask, std.Maximum, and std.Expr.
|
||
|
<a id="c_deband" href="#c_deband"><p class="subhead">Using edge masks</p></a>
|
||
|
<p>
|
||
|
Now that we've established the basics, let's look at real-world applications. Since 8-bit video sources are
|
||
|
still everywhere, barely any encode can be done without debanding. As I've mentioned before, restoration
|
||
|
filters
|
||
|
can often induce new artifacts, and in the case of debanding, these artifacts are loss of detail and, for
|
||
|
stronger debanding, blur. An edge mask could be used to remedy these effects, essentially allowing the
|
||
|
debanding
|
||
|
filter to deband whatever it deems necessary and then restoring the edges and details via
|
||
|
std.MaskedMerge.</p>
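<p>What the merge does per pixel can be sketched in plain Python: where the mask is 0, the first input is kept, and where the mask is at its peak value, the second input is used. The function <code>masked_merge</code> and the sample values are made up for this illustration, and the rounding used by std.MaskedMerge itself may differ.</p>

```python
def masked_merge(base, overlay, mask, peak=255):
    """Blend two rows of pixels: mask 0 keeps base, mask == peak takes overlay."""
    return [
        (b * (peak - m) + o * m + peak // 2) // peak
        for b, o, m in zip(base, overlay, mask)
    ]


debanded = [100, 100, 100, 100]  # smooth output of the debanding filter
source = [98, 103, 140, 60]      # original pixels, including a real edge
edges = [0, 0, 255, 255]         # edge mask: white where detail must survive

print(masked_merge(debanded, source, edges))
```

<p>The first two pixels keep the debanded value, while the two pixels covered by the mask are restored from the source.</p>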
|
||
|
<p>
|
||
|
GradFun3 internally generates a mask to do exactly this. f3kdb, the <span
|
||
|
title="I know that “the other filter” is misleading since GF3 is not a filter but a script, but if that's your only concern so far, I must be doing a pretty good job.">other popular debanding filter</span>,
|
||
|
does not have any integrated masking functionality.</p>
|
||
|
Consider this image:<br>
|
||
|
<img src="/media/articles/res_edge/aldnoah.png">
|
||
|
<p>
|
||
|
As you can see, there is quite a lot of banding in this image. Using a regular debanding filter to remove it
|
||
|
would likely also destroy a lot of small details, especially in the darker parts of the image.<br>
|
||
|
Using the Sobel operator to generate an edge mask yields this (admittedly rather disappointing) result:</p>
|
||
|
<img src="/media/articles/res_edge/aldnoah_sobel.png">
|
||
|
<p>
|
||
|
In order to better recognize edges in dark areas, the retinex
|
||
|
algorithm can be used for local contrast enhancement.</p>
|
||
|
<img src="/media/articles/res_edge/aldnoah_retinex.png">
|
||
|
<div style="font-size: 80%; text-align: right">The image after applying the retinex filter, luma only.</div>
|
||
|
<p>
|
||
|
We can now see a lot of information that was previously barely visible due to the low contrast. One might
|
||
|
think
|
||
|
that preserving this information is a vain effort, but with HDR monitors slowly making their way into the
mainstream and more possible improvements down the line, this extra information might be visible on
consumer-grade screens at some point. And since it doesn't waste a noticeable amount of bitrate, I see no harm in
|
||
|
keeping it.</p>
|
||
|
Using this newly gained knowledge, some testing, and a little bit of magic, we can create a surprisingly
|
||
|
accurate edge mask.
|
||
|
<pre><code class="python">def retinex_edgemask(luma, sigma=1):
    ret = core.retinex.MSRCP(luma, sigma=[50, 200, 350], upper_thr=0.005)
    return core.std.Expr([kirsch(luma), ret.tcanny.TCanny(mode=1, sigma=sigma).std.Minimum(
        coordinates=[1, 0, 1, 0, 0, 1, 0, 1])], 'x y +')</code></pre>
|
||
|
<p>Using this code, our generated edge mask looks as follows:</p>
|
||
|
<img src="/media/articles/res_edge/aldnoah_kage.png">
|
||
|
<p>
|
||
|
By using std.Binarize (or a similar lowpass/highpass function) and a few std.Maximum and/or std.Inflate
|
||
|
calls, we can transform this edge mask into a
|
||
|
more usable detail mask for our debanding function or any other function that requires a precise edge mask.
|
||
|
</p>
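<p>As a rough illustration of that last step, here is a plain-Python sketch of thresholding a mask and then growing it by one pixel. The helpers <code>binarize</code> and <code>maximum</code> only mimic the idea behind std.Binarize and std.Maximum on nested lists. The threshold of 64 and the sample values are arbitrary assumptions.</p>

```python
def binarize(mask, threshold, peak=255):
    """Set every pixel at or above the threshold to peak and everything else to 0."""
    return [[peak if v >= threshold else 0 for v in row] for row in mask]


def maximum(mask):
    """3x3 dilation: each pixel becomes the largest value in its neighbourhood."""
    height, width = len(mask), len(mask[0])
    return [
        [
            max(mask[y][x]
                for y in range(max(r - 1, 0), min(r + 2, height))
                for x in range(max(c - 1, 0), min(c + 2, width)))
            for c in range(width)
        ]
        for r in range(height)
    ]


edge_mask = [
    [0,  0,   0, 0, 0],
    [0, 12, 200, 9, 0],
    [0,  0,   0, 0, 0],
    [0,  0,   0, 0, 0],
]
detail_mask = maximum(binarize(edge_mask, threshold=64))
for row in detail_mask:
    print(row)
```

<p>The faint values 12 and 9 are discarded as noise, and the one strong edge pixel grows into a 3x3 block that also protects its immediate surroundings.</p>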
|
||
|
<a id="c_performance" href="#c_performance"><p class="subhead">Performance</p></a>
|
||
|
Most edge mask algorithms are simple convolutions, allowing them to run at over 100 fps even for HD content. A
|
||
|
complex algorithm like retinex can obviously not compete with that, as is evident by looking at the benchmarks.
|
||
|
While a simple edge mask with a Sobel kernel ran consistently above 200 fps, the function described above only
|
||
|
produces 25 frames per second. Most of that speed is lost to retinex, which, if executed alone, yields about
|
||
|
36.6 fps. A similar, <span title="and I mean a LOT more inaccurate">albeit less accurate</span>, way to
|
||
|
improve the detection of dark, low-contrast edges would be applying a simple curve to the brightness of the
|
||
|
image.
|
||
|
<pre><code class="python">bright = core.std.Expr(src, 'x 65535 / sqrt 65535 *')</code></pre>
|
||
|
This should (in theory) improve the detection of dark edges in dark images or regions by adjusting their
|
||
|
brightness
|
||
|
as shown in this curve:
|
||
|
<img src="/media/articles/res_edge/sqrtx.svg"
|
||
|
title="yes, I actually used matplotlib to generate my own image for sqrt(x) rather than taking one of the millions available online">
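<p>To get a feeling for the numbers, the expression can be evaluated for a few 16-bit sample values in plain Python. The function name <code>sqrt_curve</code> is made up for this sketch. It simply mirrors the arithmetic of the expression string and does not model rounding or clamping inside std.Expr.</p>

```python
import math


def sqrt_curve(value, peak=65535):
    """Per-pixel equivalent of the expression 'x 65535 / sqrt 65535 *'."""
    return math.sqrt(value / peak) * peak


for value in (0, 1024, 4096, 16384, 65535):
    print(f"{value:>5} -> {sqrt_curve(value):8.0f}")
```

<p>Black and white stay where they are, while dark values are lifted considerably. A step between two dark values is therefore stretched and a step between two bright values is compressed, which is what makes dark edges easier to detect.</p>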
|
||
|
<a id="c_end" href="#c_end"><h2 class="subhead">Conclusion</h2></a>
|
||
|
Edge masks have been a powerful tool for image analysis for decades now. They can be used to reduce an image to
|
||
|
its most essential components and thus significantly facilitate many image analysis processes. They can also be
|
||
|
used to great effect in video processing to minimize unwanted by-products and artifacts of more aggressive
|
||
|
filtering. Using convolutions, one can create fast and accurate edge masks, which can be customized and adapted
|
||
|
to serve any specific purpose by changing the parameters of the kernel. The use of local contrast enhancement
|
||
|
to improve the detection accuracy of the algorithm was shown to be possible, albeit significantly slower.<br><br>
|
||
|
<pre><code class="python"># Quick overview of all scripts described in this article:
################################################################
import mvsfunc as mvf
import vapoursynth as vs

core = vs.core


# Use retinex to greatly improve the accuracy of the edge detection in dark scenes.
# draft=True is a lot faster, albeit less accurate
def retinex_edgemask(src: vs.VideoNode, sigma=1, draft=False) -> vs.VideoNode:
    src = mvf.Depth(src, 16)
    luma = mvf.GetPlane(src, 0)
    if draft:
        ret = core.std.Expr(luma, 'x 65535 / sqrt 65535 *')
    else:
        ret = core.retinex.MSRCP(luma, sigma=[50, 200, 350], upper_thr=0.005)
    mask = core.std.Expr([kirsch(luma), ret.tcanny.TCanny(mode=1, sigma=sigma).std.Minimum(
        coordinates=[1, 0, 1, 0, 0, 1, 0, 1])], 'x y +')
    return mask


# Kirsch edge detection. This uses 8 directions, so it's slower but better than Sobel (4 directions).
# more information: https://ddl.kageru.moe/konOJ.pdf
def kirsch(src: vs.VideoNode) -> vs.VideoNode:
    w = [5]*3 + [-3]*5
    weights = [w[-i:] + w[:-i] for i in range(4)]
    c = [core.std.Convolution(src, (w[:4]+[0]+w[4:]), saturate=False) for w in weights]
    return core.std.Expr(c, 'x y max z max a max')


# should behave similarly to std.Sobel() but faster since it has no additional high-/lowpass or gain.
# the internal filter is also a little brighter
def fast_sobel(src: vs.VideoNode) -> vs.VideoNode:
    sx = src.std.Convolution([-1, -2, -1, 0, 0, 0, 1, 2, 1], saturate=False)
    sy = src.std.Convolution([-1, 0, 1, -2, 0, 2, -1, 0, 1], saturate=False)
    return core.std.Expr([sx, sy], 'x y max')


# a weird kind of edgemask that draws around the edges. probably needs more tweaking/testing
# maybe useful for edge cleaning?
def bloated_edgemask(src: vs.VideoNode) -> vs.VideoNode:
    return src.std.Convolution(matrix=[1,  2,  4,  2, 1,
                                       2, -3, -6, -3, 2,
                                       4, -6,  0, -6, 4,
                                       2, -3, -6, -3, 2,
                                       1,  2,  4,  2, 1], saturate=False)</code></pre>
|
||
|
<div class="download_centered">
|
||
|
<span class="source">Some of the functions described here have been added to my script collection on Github<br></span><a href="https://gist.github.com/kageru/d71e44d9a83376d6b35a85122d427eb5">Download</a></div>
|
||
|
<br><br><br><br><br><br><span class="ninjatext">Mom, look! I found a way to burn billions of CPU cycles with my new placebo debanding script!</span>
|
||
|
</div>
|
||
|
</div>
|