, 2016, 17(1): 32-40 doi: 10.1631/FITEE.1500171

Original article

Image meshing via hierarchical optimization

XIE Hao, TONG Ruo-feng

Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027, China

Corresponding authors: E-mail: xiehao@zju.edu.cn; E-mail: trf@zju.edu.cn


Received: 2015-05-27   Accepted: 2015-09-16  

Fund supported: the National Natural Science Foundation of China (No. 61170141) and the National High-Tech R&D Program (863) of China (No. 2013AA013903)

Abstract

Vector graphics, as a kind of geometric representation of raster images, have many advantages, e.g., definition independence and ease of editing. A popular way to convert raster images into vector graphics is image meshing, the aim of which is to find a mesh that represents an image as faithfully as possible. For traditional meshing algorithms, the crux of the problem resides mainly in the high non-linearity and non-smoothness of the objective, which makes it difficult to find a desirable optimal solution. To ameliorate this situation, we present a hierarchical optimization algorithm that solves the problem from coarser levels to finer ones, providing the initialization for each level with the result of its coarser ascent. To further simplify the problem, the original non-convex problem is converted into a linear least squares one, and thus becomes convex, which makes it much easier to solve. A dictionary learning framework is used to combine geometry and topology elegantly. Then an alternating scheme is employed to solve both parts. Experiments show that our algorithm runs fast and achieves better results than existing ones for most images.

Keywords: Image meshing ; Hierarchical optimization ; Convexification


Cite this article

XIE Hao, TONG Ruo-feng. Image meshing via hierarchical optimization. [J], 2016, 17(1): 32-40 doi:10.1631/FITEE.1500171

1 Introduction

How to represent an image is an old yet still open topic. Currently, two categories, namely raster images and vector graphics, serve in our daily life. A raster image basically stores color values for all pixels (compressed formats such as JPG and PNG are not considered here for simplicity), while a vector graphic is a type of geometric representation. Generally, the predominance of vector graphics is in part due to their ease of editing, independence of display resolution, and usually smaller storage volume. Thus, vector graphics are a potential tool for representing images.

Image meshing, as a class of methods for converting raster images into vector graphics, is a popular branch of 'image vectorization'. Image meshing facilitates many further applications, e.g., image warping, image composition, and object editing. The main process of image meshing is to find a (usually triangular) mesh covering the whole input image (i.e., no pixel in the image is left uncovered), with each vertex assigned the color value of the pixel at which it is located. The colors of other positions on the mesh can then be expressed by the vertex colors mentioned above (mostly through linear interpolation). In this way, the input image can be rendered entirely from the mesh. Naturally, the goal of image meshing is to find an optimal mesh that minimizes the difference between the input image and its rendered counterpart.

To obtain this optimal mesh, various methods have been proposed, most of which employ the Euclidean distance in the color space to measure the difference. In this paper, a hierarchical optimization method is presented. An example of inputs and outputs is illustrated in Fig. 1. The most difficult point here is not the modeling of the problem, but its solution. As will be shown later, these objectives are usually highly non-linear and non-smooth, so direct use of conventional numerical optimization routines may not achieve the desired results, or can hardly make the variables move even a tiny step in the solution space. In numerical optimization, a globally optimal solution to a non-convex problem can rarely be obtained. Generally, initialization plays a key role in the performance of non-convex optimization. Apart from the properties of the problem itself, which can hardly be improved, a proper initialization of the variables is much easier to achieve. To obtain a better initialization, a hierarchical scheme is proposed in this paper. As a preprocessing step, the input image is filtered into several levels, ranging from coarse to fine, by a feature-preserving filter, e.g., the bilateral filter. In this way, the coarsest level becomes much smoother, and thus may be easier to solve than the original one. Then, the result of a coarser level is recycled and employed as the initial value of its finer descendant. As a result, the performance of the optimizer for the finest level, i.e., the original image, is improved.

Fig. 1

Fig. 1   Inputs and outputs. Given an image, 'Lena' (a), and the sampling ratio (1% in this example), our algorithm generates a mesh-based vector graphic (d) with its rendered result (b). Detail regions of both (a) and (b) are enlarged and shown in (c), in which upper and lower parts correspond to (a) and (b), respectively. References to color refer to the online version of this figure


In addition to a hierarchical process, the objective function at the stage of geometry optimization is still non-convex. Generally, without using 'greedy' schemes, the remaining choices of global optimization are mostly quasi-Newton methods, e.g., L-BFGS (Liu and Nocedal, 1989), and the result is still highly dependent on the initial values. To further reduce the complexity of the problem, a conversion from a non-convex problem to a convex one is proposed. Since the color space is also a metric space, it can be considered together with the image space, which thus forms a new 5D metric space. In this way, a pixel in the image, as well as its attached color, is mapped into a 5D point in the newly constructed metric space. Therefore, the original non-convex problem is converted into a simple linear least squares one, and can be efficiently solved via some common routines, independent of the initialization. The running time can also be reduced.

The contributions of this paper are summarized as follows:

1. A hierarchical process is proposed, which provides an initial input for a finer level with the result of a coarser level to make the original problem smoother.

2. By combining the color space and image space, a conversion from a non-convex problem to a convex one is proposed to further reduce the complexity of the problem at the stage of geometry optimization.

2 Related work

Recently, many algorithms have been proposed for image meshing. The most representative one, proposed by Adams (2011), is a type of greedy algorithm. The key idea is to iteratively eliminate redundant vertices from an initial mesh with a dense distribution of vertices. In each loop, the vertex with the least influence on the result of the whole system is removed, and the edges are rearranged, i.e., vertices are re-connected, based on Delaunay triangulation. To accelerate the algorithm, Adams (2011) employed the error diffusion scheme of Yang et al. (2003) to generate an initial mesh, instead of the initial mesh used in the adaptive thinning method (Demaret and Iske, 2004), which consists of vertices at all pixels. Although this algorithm is fast, it does not consider the quality of the recovered image when rearranging edges (simply because it uses Delaunay triangulation), and thus the quality of the generated mesh is not sufficient.

As an improvement, Xie et al. (2014) took the movement of existing vertices into account, and got rid of the constraints of Delaunay triangulation. In Xie et al. (2014), an initial mesh was optimized by performing geometry and topology optimization alternately. Instead of processing a vertex only once as in Adams (2011), Xie et al. (2014) re-adjusted the position of a vertex by a reasonable number of times, so that the local optimality of that vertex would not be corrupted by later processing of other vertices.

Another type of algorithm includes those of Xia et al. (2009) and Liao et al. (2012): the authors considered the generated mesh as a surface mesh, taking the color of the pixel at each vertex as a third coordinate. From this viewpoint, an image can be processed with geometry tools. Note that the three color channels are processed separately but consistently. As a result, three meshes with the same connectivity are generated for one given image. Although this conversion gives good results, the reasons are not clearly stated in either study.

Xia et al. (2009) first constructed an initial dense surface mesh, with some crack holes corresponding to the edge features of the input image, then performed a mesh simplification while maintaining the main edge features of the image, and finally converted it into a Bézier triangular mesh. This algorithm employs a thin-plate spline (TPS) to interpolate colors on the mesh, generating a better result. The shortcomings of this algorithm involve the complex representation of the mesh, and the non-adaptive sampling when calculating the coefficients of the TPS.

Rather than a Bézier triangular mesh, Liao et al. (2012) employed piecewise smooth subdivision surfaces. The initial mesh was the same as the one in Xia et al. (2009), and in the stage of mesh simplification, edge features were maintained at different levels to form a multi-resolution mesh representation. Using these features, one can edit the generated mesh easily, with manipulations such as object deformation and feature enhancement. Although this representation is flexible, it is still somewhat complicated (edge features need to be stored). In addition, there is no standard criterion for the quality of the recovered image for either algorithm proposed by Xia et al. (2009) or Liao et al. (2012). The difference from our algorithm is that these two methods are based mainly on non-global mesh operations, while in ours, a global scheme is employed.

Still, there are many other image meshing methods, such as Demaret et al. (2006), Lecot and Levy (2006), and Swaminarayan and Prasad (2006). These methods assume that the color variation in each region is relatively plain, which can also be considered as being generated by linear interpolation. In another type of algorithm, Sun et al. (2007) and Lai et al. (2009) used gradient mesh to represent images. Their results are good, yet the algorithms depend heavily on image segmentation, as they can deal only with isolated objects with smooth color variance.

3 Objective

Given an input raster image $\boldsymbol{I}$, our goal is to generate a triangular mesh $\boldsymbol{M}=\{\boldsymbol{V}, \boldsymbol{K}\}$ covering the image and representing it as faithfully as possible, where geometry $\boldsymbol{V}$ and topology $\boldsymbol{K}$ are described as follows.

For geometry information, let $\boldsymbol{v}_{i}\in \mathbb{R}^{2}$ ($i = 1, 2, \dots, n_{\mathrm{v}}$) denote a vertex of the triangular mesh, where $n_{\mathrm{v}}$ is the number of vertices, and $\boldsymbol{V}\in \mathbb{R}^{2\times n_{\mathrm{v}}}$, a matrix whose ith column represents vertex $\boldsymbol{v}_{i}$, denotes the set of all vertices on mesh $\boldsymbol{M}$. To recover image $\boldsymbol{I}$, each vertex is attached with a color value, and the colors of other positions on the mesh are linearly interpolated by neighboring vertices, namely all vertices of its belonging triangle. Inspired by Xiong et al. (2014), a dictionary learning framework is introduced for modeling both geometry and topology information.

A sparse matrix $\boldsymbol{K}\in \mathbb{R}^{n_{\mathrm{v}}\times n_{\mathrm{p}}}$ is constructed to denote the connectivity of vertices, whose column $\boldsymbol{\kappa_j}$ denotes the barycentric coordinates of pixel $\boldsymbol{p}_{j} \in \mathbb{R}^{2}$ ($j=1, 2, \dots, n_{\mathrm{p}}$), where $n_{\mathrm{p}}$ is the number of sampling points, i.e., the number of pixels. Note that the sparse matrix $\boldsymbol{K}$ should be constructed such that the mesh can form a manifold. That is, each column of $\boldsymbol{K}$ has at most three non-zero elements, corresponding to up to three vertices of a triangular face, and the summation of the magnitudes of these elements should be one. Thus, our goal is modeled as follows:
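For illustration, one column of $\boldsymbol{K}$, i.e., the barycentric coordinates of a pixel with respect to its containing triangle, can be computed as below. This is a hypothetical helper written for this explanation, not code from the paper:

```python
import numpy as np

def barycentric_column(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c).

    The three weights form one column kappa_j of the sparse matrix K:
    they always sum to one, and they are all non-negative exactly when
    p lies inside (or on the boundary of) the triangle.
    """
    T = np.column_stack((b - a, c - a))      # 2x2 edge matrix of the triangle
    beta, gamma = np.linalg.solve(T, p - a)  # solve T [beta, gamma]^T = p - a
    alpha = 1.0 - beta - gamma
    return np.array([alpha, beta, gamma])
```

For example, the centroid of a triangle yields the weights (1/3, 1/3, 1/3), so the interpolated color there is the average of the three vertex colors.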

$({\boldsymbol{V}}^{*},{\boldsymbol{K}}^{*})=\mathop{\arg\min}\limits_{{\boldsymbol{V}},{\boldsymbol{K}}} E({\boldsymbol{V}},{\boldsymbol{K}}), \quad \mathrm{s.t.}\ \|{\boldsymbol{\kappa}}_{i}\|_{0}\leqslant 3,\ \|{\boldsymbol{\kappa}}_{i}\|_{1}=1,\ i=1,2,\ldots,n_{\mathrm{p}}, \qquad (1)$

where $E(\boldsymbol{V}, \boldsymbol{K})$ measures the fitting error of the recovered image.

Let $\mathcal{I}\in\{f:\mathbb{R}^{2}\mapsto\mathbb{R}^{3}\}$. Then $\mathcal{I}(\boldsymbol{p})$ denotes the color value attached to pixel $\boldsymbol{p}$, and $E(\boldsymbol{V}, \boldsymbol{K})$ is thus expressed as

$E({\boldsymbol{V}},{\boldsymbol{K}})=\frac{1}{n_{\mathrm{p}}}\left\|\mathcal{I}({\boldsymbol{P}})-\mathcal{I}({\boldsymbol{V}}){\boldsymbol{K}}\right\|_{\mathrm{F}}^{2}=\frac{1}{n_{\mathrm{p}}}\sum\limits_{i}\left\|\mathcal{I}({\boldsymbol{p}}_{i})-\mathcal{I}({\boldsymbol{V}}){\boldsymbol{\kappa}}_{i}\right\|_{2}^{2}, \qquad (2)$

where $\|\cdot\|_\mathrm{F}$ is the Frobenius norm, $\boldsymbol{P}\in \mathbb{R}^{2\times n_{\mathrm{p}}}$ is the set of all pixels, whose jth column denotes the jth pixel $\boldsymbol{p}_{j}$, and $\mathcal{I}(\boldsymbol{P}) = \{\mathcal{I}(\boldsymbol{p}_{i})\}_{i=1}^{n_{\mathrm{p}}} \in \mathbb{R}^{3\times n_{\mathrm{p}}}$, $\mathcal{I}(\boldsymbol{V}) = \{\mathcal{I}(\boldsymbol{v}_{i})\}_{i=1}^{n_{\mathrm{v}}} \in \mathbb{R}^{3\times n_{\mathrm{v}}}$.

Note that the sparse matrix $\boldsymbol{K}$ mentioned above is constructed to elegantly combine the geometry and topology information in Eq.(2). In this way, the connectivity of vertices is encoded into a matrix, which makes the objective easy to express. Also, note that the model in this study is actually linear, while a more general higher order model might generate a better visual effect. However, a higher order model can make the whole process more complicated, and entails more running time. Thus, a simple linear model is employed.
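Under these definitions, the fitting error of Eq. (2) can be evaluated directly with a sparse $\boldsymbol{K}$. A minimal sketch with our own function name, assuming colors are stored column-wise as in the text:

```python
import numpy as np
from scipy import sparse

def fitting_error(colors_P, colors_V, K):
    """E(V, K) = (1/n_p) * ||I(P) - I(V) K||_F^2.

    colors_P : 3 x n_p array of pixel colors I(P)
    colors_V : 3 x n_v array of vertex colors I(V)
    K        : sparse n_v x n_p matrix; each column is a barycentric triple
    """
    n_p = colors_P.shape[1]
    # I(V) K, computed as (K^T I(V)^T)^T so the sparse matrix multiplies a dense one
    recovered = (K.T @ colors_V.T).T
    residual = colors_P - recovered
    return (residual ** 2).sum() / n_p
```

With one pixel at a triangle centroid and colors that interpolate exactly, the error is zero; any deviation of the pixel color raises it quadratically.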

4 Hierarchical optimization

4.1 Overview

Once the problem has been modeled, the next task is to solve it. As shown in Algorithm 1, the pipeline simply involves several steps. As a preprocessing step, a hierarchy with several levels is established for the input image. An initial mesh is then generated, followed by the whole iterative process for all levels. At each level, a geometry and topology alternating optimization is performed. To further simplify the problem, a convexification of the objective function is performed beforehand. Some details are discussed in the following paragraphs.

Algorithm 1   Pipeline of our algorithm

1: HierarchyEstablishment
2: MeshInitialization
3: $i\Leftarrow N - 1$
4: while i ≥ 0 do
5:     ObjectiveConvexification(i)
6:     AlternatingOptimization(i)
7:    $i\Leftarrow i - 1$
8: end while



4.2 Hierarchy establishment

As shown in Eq. (1), the objective function is highly non-linear and non-smooth, and thus direct use of common routines of optimization algorithms, e.g., L-BFGS, is not quite suitable in this context. Since the body of the algorithm itself can hardly be improved for such a non-convex problem, another path should be sought. For non-convex optimization, a proper initialization is significant. For this, a hierarchical process is proposed. Specifically, the input image is filtered into N levels beforehand. Intuitively, the coarser level is smoother than the finer one, and is supposed to have better performance through the same algorithm. Therefore, as illustrated in Fig. 2, the whole process starts from the coarsest level (level N-1), travels through all other levels of images, ranging from coarser to finer, and finally reaches the original input image (the finest level, level 0). When dealing with level i (i<N-1), the result from level i+1 is used as an initial value.

Fig. 2

Fig. 2   Hierarchy establishment: a hierarchy with N levels, from coarser (higher level) to finer (lower level), is established. At each level, an image of this level and a mesh from the last level are employed to generate a mesh of this level. Note that the mesh of level 0 is the desired final mesh


As for the choice of the filter, our goal is to make the problem smoother, while preserving some necessary features in the image. Thus, general smoothing filters, e.g., the Gaussian filter, are not quite suitable. Instead, filters preserving features are preferable. In our algorithm, the simple bilateral filter is chosen. In fact, other filters, such as the filter proposed in Xu et al. (2011), might have a better result, yet are somewhat time-consuming compared with the bilateral filter.
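As an illustration of the hierarchy construction, the sketch below repeatedly applies a brute-force bilateral filter to a grayscale image. It is our own minimal implementation with small default parameters; the paper filters color images with a neighborhood diameter of 9 and both σ values set to 50 (Section 5.2):

```python
import numpy as np

def bilateral_filter(img, d=5, sigma_color=50.0, sigma_space=50.0):
    """Brute-force bilateral filter on a 2D grayscale image (values in [0, 255]).

    Edge-preserving: each output pixel is a weighted mean of its neighborhood,
    where weights decay with both spatial distance and color distance.
    """
    h, w = img.shape
    r = d // 2
    padded = np.pad(img, r, mode='edge')
    out = np.empty_like(img, dtype=float)
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    space_w = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma_space ** 2))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 2 * r + 1, j:j + 2 * r + 1]
            color_w = np.exp(-(patch - img[i, j]) ** 2 / (2 * sigma_color ** 2))
            weights = space_w * color_w
            out[i, j] = (weights * patch).sum() / weights.sum()
    return out

def build_hierarchy(img, n_levels):
    """Level 0 is the input image; level i+1 is a filtered, smoother copy of level i."""
    levels = [img.astype(float)]
    for _ in range(n_levels - 1):
        levels.append(bilateral_filter(levels[-1]))
    return levels
```

The returned list is ordered from the finest level (the original image) to the coarsest, matching the level numbering in Fig. 2.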

4.3 Mesh initialization

Before the main process over the various levels, an initial mesh for the coarsest level, i.e., level N-1, needs to be generated properly. Although our algorithm convexifies the problem at the stage of geometry optimization, the situation at the stage of topology optimization is not improved, and the whole problem is still somewhat complex. Thus, a totally random initial mesh might generate an undesirable result and involve a longer running time. Therefore, to achieve better performance and effect, the result generated by Adams (2011) is employed as the initial mesh in our algorithm.

4.4 Objective convexification

To further simplify the problem, a convexification of the objective is performed. Specifically, the position of each vertex $\boldsymbol{v}_{i}\in\mathbb{R}^{2}$ on mesh $\boldsymbol{M}$, together with its attached color value $\mathcal{I}(\boldsymbol{v}_{i})\in\mathbb{R}^{3}$, is mapped into a point $\hat{\boldsymbol{v}}_{i}=(\boldsymbol{v}^\mathrm{T}_{i}, \mathcal{I}(\boldsymbol{v}_{i})^\mathrm{T})^\mathrm{T}\in\mathbb{R}^{5}$. Thus, the planar mesh becomes a 2-manifold in a 5-dimensional (5D) space. Similarly, all pixels of the input image $\boldsymbol{I}$ are converted into such 5D points. In this way, the original non-convex problem is converted into a convex one. Thus, Eq. (2) is converted as follows:

$E(\hat{\boldsymbol{V}})=\frac{1}{n_{\mathrm{p}}}\left\|\hat{\boldsymbol{P}}-\hat{\boldsymbol{V}}{\boldsymbol{K}}\right\|_{\mathrm{F}}^{2}=\frac{1}{n_{\mathrm{p}}}\sum\limits_{i}\left\|\hat{\boldsymbol{p}}_{i}-\hat{\boldsymbol{V}}{\boldsymbol{\kappa}}_{i}\right\|_{2}^{2}, \qquad (3)$

where $\hat{\boldsymbol{p}}_{i}$ is the ith column of $\hat{\boldsymbol{P}}\in\mathbb{R}^{5\times n_{\mathrm{p}}}$, and $\hat{\boldsymbol{v}}_{i}$ is the ith column of $\hat{\boldsymbol{V}}\in\mathbb{R}^{5\times n_{\mathrm{v}}}$.

The rationale for this conversion is explained as follows. In the original problem, the metrics in the image space and color space are both strongly related to Euclidean distances. Therefore, the combination of these two spaces is possible in form. On the other hand, it makes sense geometrically. In fact, this is a type of approximation to the original problem. Take a 1D image with one channel as an example (Fig. 3). Fig. 3a shows the color distance in the original problem, where the segments (red lines) connecting the recovered image (sloped straight line) and the original one (dashed curve) are orthogonal to the image plane (the horizontal line below); in contrast, in Fig. 3b, each segment is orthogonal to the corresponding triangular face of the mesh. When the distances become small enough, these two forms are supposed to generate similar effects. Based on the above explanation, this convexification is proposed and employed in our model.
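The pixel side of the 5D lifting can be sketched as follows (a hypothetical helper; vertices are lifted the same way, by stacking their 2D positions on top of their attached colors):

```python
import numpy as np

def lift_pixels(image):
    """Map every pixel of an H x W x 3 image to a 5D point (x, y, r, g, b).

    Returns a 5 x n_p array P_hat: the first two rows are pixel coordinates,
    the last three rows are the attached colors, so position and color live
    in one joint metric space.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    positions = np.vstack((xs.ravel(), ys.ravel())).astype(float)
    colors = image.reshape(-1, 3).T.astype(float)
    return np.vstack((positions, colors))
```

After this lifting, the unknowns of the geometry stage form a single dense matrix $\hat{\boldsymbol{V}}$, and the fitting term of Eq. (3) is an ordinary linear least squares objective in it.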

Fig. 3

Fig. 3   Image meshing in a 1D image: the original problem (a) and that after convexification (b). References to color refer to the online version of this figure


4.5 Alternating optimization

After the preprocessing, the problem at each level can be solved via common routines. In general, the geometry and topology are treated separately and alternately in each iteration, as in Xie et al. (2014). Both sub-optimization stages are discussed below.

4.5.1 Geometry optimization

In this stage, the topology $\boldsymbol{K}$ of the mesh, i.e., the connectivity of vertices, is fixed, and the only variable concerned is the position matrix of vertices, $\hat{\boldsymbol{V}}$. Since the original problem has been converted into a simple linear least squares one (Section 4.4), the preconditioned linear conjugate gradient method is employed to solve for $\hat{\boldsymbol{V}}$. Note that the rows of $\hat{\boldsymbol{V}}$ are solved separately against the same coefficient matrix $\boldsymbol{K}\boldsymbol{K}^\mathrm{T}$, which can be prepared beforehand to speed up the process.
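With $\boldsymbol{K}$ fixed, the geometry update reduces to normal equations that all five rows of $\hat{\boldsymbol{V}}$ share. A minimal sketch under our own naming; the paper uses a preconditioned conjugate gradient solver, whereas here a direct sparse LU factorization of $\boldsymbol{K}\boldsymbol{K}^\mathrm{T}$ is built once and reused, for brevity:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def solve_geometry(P_hat, K):
    """Minimize ||P_hat - V_hat K||_F^2 over V_hat with the topology K fixed.

    Normal equations: V_hat (K K^T) = P_hat K^T.  All rows of V_hat share the
    coefficient matrix K K^T, so one sparse factorization serves every row.
    P_hat : 5 x n_p dense array of lifted pixels
    K     : sparse n_v x n_p barycentric matrix
    """
    KKt = (K @ K.T).tocsc()          # n_v x n_v, sparse and symmetric
    rhs = np.asarray(K @ P_hat.T)    # n_v x 5, i.e., (P_hat K^T)^T
    V_hat = splu(KKt).solve(rhs).T   # factor once, solve all right-hand sides
    return V_hat                     # 5 x n_v
```

When the data are exactly representable by the mesh, the solver recovers the generating vertex matrix; otherwise it returns the least squares fit.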

4.5.2 Topology optimization

Given a fixed geometry, i.e., the positions of vertices $\hat{\boldsymbol{V}}$, the task of this stage is to optimize the topology, i.e., the connectivity of vertices $\boldsymbol{K}$. Well-known topology operations include the edge flip, edge or face collapse, edge or face split, etc. Since the positions of vertices are constant in this context, which means the number of vertices does not change, the only operator left is the edge flip. To further reduce the total energy, the strategy in Xie et al. (2014) is employed, which is described briefly below. For further details, please refer to Xie et al. (2014).

Before flipping the edges, a suspect set of edges is defined, each element of which is an edge that might need to be flipped. At the beginning, all edges excluding boundary edges, i.e., all interior edges, are inserted into the set. To sort these suspect edges, a fitting error, defined as the sum of the fitting errors of the two adjacent faces, is attached to each suspect as a criterion. The suspect set thus becomes a priority queue. Each time, a suspect is popped and flipped only if the flip reduces the total energy. If a suspect is flipped, its neighboring edges are also labeled as suspects and pushed into the queue. The process ends when the queue is empty. To avoid possible endless loops, the number of times each edge has been visited is recorded, and edges visited too many times are no longer considered in the current pass.
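To make the flip criterion concrete, the sketch below examines a single interior edge shared by two triangles of a quad and flips it only if the total fitting error declines. The scalar `image_fn`, the quad layout, and the function names are illustrative assumptions; the full algorithm additionally maintains the priority queue and visit counts described above:

```python
import numpy as np

def tri_error(tri, pixels, image_fn):
    """Fitting error of one triangle: every pixel inside it is linearly
    interpolated from the (scalar) colors at its three vertices."""
    a, b, c = (np.asarray(v, float) for v in tri)
    T = np.column_stack((b - a, c - a))
    err = 0.0
    for p in pixels:
        w = np.linalg.solve(T, np.asarray(p, float) - a)
        bary = np.array([1.0 - w.sum(), w[0], w[1]])
        if np.all(bary >= -1e-9):  # pixel lies inside this triangle
            interp = bary @ np.array([image_fn(a), image_fn(b), image_fn(c)])
            err += (interp - image_fn(p)) ** 2
    return err

def try_flip(quad, pixels, image_fn):
    """Edge-flip decision for the diagonal of a quad (v0, v1, v2, v3):
    keep diagonal (v0, v2) or flip to (v1, v3), whichever yields the lower
    total fitting error over the two adjacent triangles."""
    v0, v1, v2, v3 = quad
    e_keep = (tri_error((v0, v1, v2), pixels, image_fn)
              + tri_error((v0, v2, v3), pixels, image_fn))
    e_flip = (tri_error((v0, v1, v3), pixels, image_fn)
              + tri_error((v1, v2, v3), pixels, image_fn))
    return ('flip', e_flip) if e_flip < e_keep else ('keep', e_keep)
```

For an image with a crease along one diagonal of the quad, the flip is accepted exactly when it aligns the shared edge with that crease, mirroring how the optimized mesh in Fig. 4 aligns skinny triangles with image edges.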

5 Experiments and discussion

In our experiments, all images are tested on a PC with an AMD Phenom™ II X4 945 CPU and 8 GB RAM. All the algorithms are implemented in C++ on an Arch Linux operating system, with the Surface_mesh library (Sieger and Botsch, 2012) employed for the mesh data structure.

5.1 Measurements

The peak signal-to-noise ratio (PSNR) (Huynh-Thu and Ghanbari, 2008) and running time are used to evaluate our algorithm. Here,

$\mathrm{PSNR}=10\log_{10}\frac{(2^{\rho}-1)^{2}}{\mathrm{MSE}},$

where $\rho$ is the number of bits per sample in the source or target images, and $\mathrm{MSE}$ is the mean squared error between the original image and the reconstructed image, which is the total fitting error in this context (Section 4.4). Generally, the larger the PSNR, the better the fitting.
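The measurement above can be sketched as follows, assuming $\rho$-bit samples so that the peak value is $2^{\rho}-1$ (255 for ordinary 8-bit images):

```python
import numpy as np

def psnr(original, recovered, rho=8):
    """PSNR = 10 * log10(MAX^2 / MSE), with MAX = 2**rho - 1 for rho-bit samples."""
    mse = np.mean((np.asarray(original, float) - np.asarray(recovered, float)) ** 2)
    peak = (2 ** rho - 1) ** 2
    return 10 * np.log10(peak / mse)
```

A reconstruction that is wrong by the full dynamic range everywhere scores 0 dB, while errors of about one gray level score roughly 48 dB, so the 25-32 dB range in Table 1 sits between these extremes.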

5.2 Parameters

The first parameter to be tuned is the number of hierarchy levels. In general, deeper hierarchies might not produce better performance. To verify this, several images are tested with various numbers of levels. In our experiments, we find that using two or three levels usually generates better results. The reason is that after filtering enough times, the image is smooth enough, and the influence of the initialization on the optimization fades.

Other parameters are described here. In bilateral filtering, the diameter of each pixel's neighborhood used during filtering is set to 9, and the two parameters measuring the sizes of the convolution kernels in the color and coordinate spaces, $\sigma_{\mathrm{Color}}$ and $\sigma_{\mathrm{Space}}$, are both set to 50. As for the number of iterations at each level, we find that two iterations give a good trade-off between performance and speed.

5.3 Comparisons

In this section, the two algorithms given in Adams (2011) and Xie et al. (2014) are compared with ours. The images for evaluation are selected diversely, ranging from humans to natural plants and animals. The PSNR results are listed in Table 1. As shown, our algorithm performs better than the other two for most of the test images. Note that the PSNR values obtained here using the algorithm proposed by Xie et al. (2014) differ slightly from those given in Xie et al. (2014), due to the implementation: in Xie et al. (2014), the algorithm is for single-channel images, while in this study, it is implemented for color images for convenience of comparison.

Table 1   Peak signal-to-noise ratio (PSNR) comparison among our algorithm and the algorithms proposed by Adams (2011) and Xie et al. (2014)

| Image  | SR (%) | PSNR (dB), Adams (2011) | PSNR (dB), Xie et al. (2014) | PSNR (dB), Ours |
|--------|--------|-------------------------|------------------------------|-----------------|
| Lena   | 1      | 27.3759                 | 28.8600                      | 29.7244         |
| Lena   | 2      | 29.8263                 | 30.9544                      | 31.3691         |
| Lena   | 3      | 30.9801                 | 31.9918                      | 32.1422         |
| Fruits | 1      | 25.9269                 | 26.9589                      | 27.7363         |
| Fruits | 2      | 28.5327                 | 29.4424                      | 29.7518         |
| Fruits | 3      | 29.9629                 | 30.8024                      | 30.5960         |
| Flower | 1      | 25.8028                 | 26.9718                      | 27.7477         |
| Flower | 2      | 27.9217                 | 29.6071                      | 29.9863         |
| Flower | 3      | 28.6615                 | 30.8009                      | 30.9495         |
| Falcon | 1      | 27.3196                 | 29.9666                      | 30.4886         |
| Falcon | 2      | 28.5323                 | 31.9729                      | 32.1160         |
| Falcon | 3      | 28.8675                 | 32.7877                      | 32.7106         |
| Car    | 1      | 23.1645                 | 24.6478                      | 25.0299         |
| Car    | 2      | 25.6941                 | 26.8857                      | 27.2032         |
| Car    | 3      | 26.8163                 | 27.9490                      | 28.0324         |
| Fish   | 1      | 23.9873                 | 25.2301                      | 25.9045         |
| Fish   | 2      | 26.6296                 | 27.7636                      | 28.1316         |
| Fish   | 3      | 28.1074                 | 29.2166                      | 29.2648         |
| Leaf   | 1      | 26.8061                 | 27.8816                      | 28.6272         |
| Leaf   | 2      | 28.8395                 | 29.9846                      | 30.3554         |
| Leaf   | 3      | 29.5924                 | 31.0008                      | 31.1287         |
| Swan   | 1      | 24.4826                 | 25.9614                      | 26.7547         |
| Swan   | 2      | 26.9890                 | 28.3783                      | 28.9821         |
| Swan   | 3      | 28.3651                 | 29.7019                      | 30.0601         |
| Frog   | 1      | 27.4404                 | 28.9776                      | 29.8903         |
| Frog   | 2      | 29.8848                 | 31.0584                      | 31.5725         |
| Frog   | 3      | 30.8791                 | 31.9689                      | 32.2853         |

SR: sampling rate



To compare the generated meshes, the wireframes of both the initial and final results are shown in Fig. 4. The mesh from Adams (2011) is a Delaunay triangulation, which has less ability than ours to express detailed features. Note that both meshes shown are generated at a 1% sampling rate.

Fig. 4

Fig. 4   Wireframes: (a) is the mesh generated from Adams (2011), employed as the initial mesh; (b) is ours. As shown, the triangles near edges are skinnier after optimization of our algorithm, to represent features better. References to color refer to the online version of this figure


To further illustrate the effect of our algorithm, some images rendered by the three algorithms, and the original ones, are shown in Fig. 5 (see p.39), with some details highlighted and compared.

Fig. 5

Fig. 5   Detailed comparison: the odd rows show the recovered images at their original sizes, while the even rows enlarge the highlighted regions. From left to right: results obtained by the algorithms proposed by Adams (2011), Xie et al. (2014), and this study, and the original raster images. References to color refer to the online version of this figure


As demonstrated in Fig. 5, our algorithm has a better visual effect than the other two. For instance, the mouth of the frog in the first image is recovered much better, which can be seen in the enlarged regions in the second row. The competitiveness of our algorithm can also be seen in the edge of the leaf.

Among many factors, the main reason why our algorithm performs better is the choice of the initial value. A good initial value makes the problem much smoother, and thus makes it easier to reach a better local optimum. In addition, our algorithm converts a non-linear problem into a linear least squares one at the stage of geometry optimization, and this convexification of the original problem speeds up the whole process.

The running time of our algorithm is listed in Table 2. Although not real-time, it is already acceptable for daily use. Generally, a good initialization leads to a shorter running time, since it lies near a local optimum in the solution space.

Table 2   Runtime of our algorithm

| Image  | Resolution | SR (%) | Time (s) |
|--------|------------|--------|----------|
| Lena   | 512×512    | 1      | 2.03     |
| Lena   | 512×512    | 2      | 3.66     |
| Lena   | 512×512    | 3      | 6.23     |
| Fruits | 512×480    | 1      | 1.75     |
| Fruits | 512×480    | 2      | 3.10     |
| Fruits | 512×480    | 3      | 5.19     |
| Flower | 400×300    | 1      | 0.66     |
| Flower | 400×300    | 2      | 1.02     |
| Flower | 400×300    | 3      | 1.61     |
| Falcon | 400×280    | 1      | 0.59     |
| Falcon | 400×280    | 2      | 0.86     |
| Falcon | 400×280    | 3      | 1.33     |
| Car    | 400×300    | 1      | 0.64     |
| Car    | 400×300    | 2      | 1.04     |
| Car    | 400×300    | 3      | 1.61     |
| Fish   | 320×400    | 1      | 0.66     |
| Fish   | 320×400    | 2      | 1.05     |
| Fish   | 320×400    | 3      | 1.66     |
| Leaf   | 373×400    | 1      | 0.83     |
| Leaf   | 373×400    | 2      | 1.36     |
| Leaf   | 373×400    | 3      | 2.20     |
| Swan   | 286×400    | 1      | 0.60     |
| Swan   | 286×400    | 2      | 0.95     |
| Swan   | 286×400    | 3      | 1.48     |
| Frog   | 357×400    | 1      | 0.90     |
| Frog   | 357×400    | 2      | 1.44     |

SR: sampling rate



5.4 Limitations

Our algorithm has some flaws. For instance, it is not quite suitable for images with complicated textures. Fig. 6 shows such an example of a failure case. The textures in the water and feather are not well recovered. Since the initial meshes are obtained by bilateral filtering, the results from our algorithm lack some details of complicated textures, while the main features are preserved.

Fig. 6

Fig. 6   A failure case: (a) is the result for 'Swan' generated by our algorithm at a 1% sampling rate; (b) is the original image. References to color refer to the online version of this figure


6 Conclusions

We present a novel algorithm to convert a raster image into a vector graphic represented by a triangular mesh. Compared with other algorithms, the results are improved by establishing a hierarchy of several levels for the input image. In addition, a convexification of the problem in the stage of geometry optimization is performed to further simplify the non-linear problem into a linear least squares one, which can be solved easily via common optimizers.

In the future, we will seek a way to obtain a better initial value. For this, the PatchNet technique (Hu et al., 2013) might be a favorable choice, for our algorithm to achieve a better local optimum. Meanwhile, we will consider using a more general non-linear model to represent images. We believe such a model would render results with better quality.

Reference

Adams MD .

A flexible content-adaptive meshgeneration strategy for image representation

IEEE Trans. Image Process. 2011, 20(9): 2414-2427 doi: 10.1109/TIP.2011.2128336

DOI:10.1109/TIP.2011.2128336      URL     PMID:21421439      [Cited within: 9]

Based on the greedy-point removal (GPR) scheme of Demaret and Iske, a simple yet highly effective framework for constructing triangle-mesh representations of images, called GPRFS, is proposed. By using this framework and ideas from the error diffusion (ED) scheme (for mesh-generation) of Yang , a highly effective mesh-generation method, called GPRFS-ED, is derived and presented. Since the ED scheme plays a crucial role in our work, factors affecting the performance of this scheme are also studied in detail. Through experimental results, our GPRFS-ED method is shown to be capable of generating meshes of quality comparable to, and in many cases better than, the state-of-the-art GPR scheme, while requiring substantially less computation and memory. Furthermore, with our GPRFS-ED method, one can easily trade off between mesh quality and computational/memory complexity. A reduced-complexity version of the GPRFS-ED method (called GPRFS-MED) is also introduced to further demonstrate the computational/memory-complexity scalability of our GPRFS-ED method.

Demaret L , Iske A .

Advances in digital image compression by adaptive thinning

Ann. MCFA. 2004, 3: 105 -109

URL     [Cited within: 1]

This paper proposes a novel concept for digital image compression. The resulting compression scheme relies on adaptive thinning algorithms, which are recent multiresolution methods from scattered data approximation. Adaptive thinning algorithms are recursive point removal schemes, which are combined with piecewise linear interpolation over decremental Delaunay triangulations. This paper shows the utility of adaptive thinning algorithms in digital image compression. To this end, specific adaptive pixel removal criteria are designed for multiresolution modelling of digital images. This is combined with a previous customized coding scheme for scattered data. The good performance of our compression scheme is finally shown in comparison with the well-established wavelet-based compression method SPIHT.

Demaret L, Dyn N, Iske A. Image compression by linear splines over adaptive triangulations. Signal Process., 2006, 86(7): 1604-1616. doi: 10.1016/j.sigpro.2005.09.003

This paper proposes a new method for image compression. The method is based on the approximation of an image, regarded as a function, by a linear spline over an adapted triangulation, D(Y), which is the Delaunay triangulation of a small set Y of significant pixels. The linear spline minimizes the distance to the image, measured by the mean square error, among all linear splines over D(Y). The significant pixels in Y are selected by an adaptive thinning algorithm, which recursively removes less significant pixels in a greedy way, using a sophisticated criterion for measuring the significance of a pixel. The proposed compression method combines the approximation scheme with a customized scattered data coding scheme. We compare our compression method with JPEG2000 on two geometric images and on three popular test cases of real images.

Hu SM, Zhang FL, Wang M, et al. PatchNet: a patch-based image representation for interactive library-driven image editing. ACM Trans. Graph., 2013, 32(6): 196. doi: 10.1145/2508363.250838

We introduce PatchNets, a compact, hierarchical representation describing structural and appearance characteristics of image regions, for use in image editing. In a PatchNet, an image region with coherent appearance is summarized by a graph node, associated with a single representative patch, while geometric relationships between different regions are encoded by labelled graph edges giving contextual information. The hierarchical structure of a PatchNet allows a coarse-to-fine description of the image. We show how this PatchNet representation can be used as a basis for interactive, library-driven, image editing. The user draws rough sketches to quickly specify editing constraints for the target image. The system then automatically queries an image library to find semantically-compatible candidate regions to meet the editing goal. Contextual image matching is performed using the PatchNet representation, allowing suitable regions to be found and applied in a few seconds, even from a library containing thousands of images.

Huynh-Thu Q, Ghanbari M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett., 2008, 44(13): 800-801. doi: 10.1049/el:20080522

Experimental data are presented that clearly demonstrate the scope of application of peak signal-to-noise ratio (PSNR) as a video quality metric. It is shown that as long as the video content and the codec type are not changed, PSNR is a valid quality measure. However, when the content is changed, correlation between subjective quality and PSNR is highly reduced. Hence PSNR cannot be a reliable method for assessing the video quality across different video contents.
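The metric under discussion is defined as PSNR = 10·log10(peak²/MSE), with peak = 255 for 8-bit images. A minimal sketch of the computation (the helper name and sample values are ours, for illustration):

```python
import math

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio between two equal-length pixel
    sequences: 10 * log10(peak^2 / MSE)."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

orig = [52, 55, 61, 59, 70]
noisy = [53, 54, 60, 60, 69]         # every pixel off by 1, so MSE = 1
print(round(psnr(orig, noisy), 2))   # → 48.13
```

Because PSNR depends only on the pixelwise MSE, it ignores content and viewing conditions entirely, which is exactly the limitation the letter above quantifies.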

Lai YK, Hu SM, Martin RR. Automatic and topology-preserving gradient mesh generation for image vectorization. ACM Trans. Graph., 2009, 28(3): 85. doi: 10.1145/1531326.1531391

Gradient mesh vector graphics representation, used in commercial software, is a regular grid with specified position and color, and their gradients, at each grid point. Gradient meshes can compactly represent smoothly changing data, and are typically used for single objects. This paper advances the state of the art for gradient meshes in several significant ways. Firstly, we introduce a topology-preserving gradient mesh representation which allows an arbitrary number of holes. This is important, as objects in images often have holes, either due to occlusion, or their 3D structure. Secondly, our algorithm uses the concept of image manifolds, adapting surface parameterization and fitting techniques to generate the gradient mesh in a fully automatic manner. Existing gradient-mesh algorithms require manual interaction to guide grid construction, and to cut objects with holes into disk-like regions. Our new algorithm is empirically at least 10 times faster than previous approaches. Furthermore, image segmentation can be used with our new algorithm to provide automatic gradient mesh generation for a whole image. Finally, fitting errors can be simply controlled to balance quality with storage.

Lecot G, Levy B. Ardeco: automatic region detection and conversion. Proc. 17th Eurographics Symp. on Rendering, 2006, p.349-360. doi: 10.2312/EGWR/EGSR06/349-360

Liao ZC, Hoppe H, Forsyth D, et al. A subdivision-based representation for vector image editing. IEEE Trans. Vis. Comput. Graph., 2012, 18(11): 1858-1867. doi: 10.1109/TVCG.2012.76

Vector graphics has been employed in a wide variety of applications due to its scalability and editability. Editability is a high priority for artists and designers who wish to produce vector-based graphical content with user interaction. In this paper, we introduce a new vector image representation based on piecewise smooth subdivision surfaces, which is a simple, unified and flexible framework that supports a variety of operations, including shape editing, color editing, image stylization, and vector image processing. These operations effectively create novel vector graphics by reusing and altering existing image vectorization results. Because image vectorization yields an abstraction of the original raster image, controlling the level of detail of this abstraction is highly desirable. To this end, we design a feature-oriented vector image pyramid that offers multiple levels of abstraction simultaneously. Our new vector image representation can be rasterized efficiently using GPU-accelerated subdivision. Experiments indicate that our vector image representation achieves high visual quality and better supports editing operations than existing representations.

Liu DC, Nocedal J. On the limited memory BFGS method for large-scale optimization. Math. Program., 1989, 45(3): 503-528. doi: 10.1007/BF01589116

Sieger D, Botsch M. Design, implementation, and evaluation of the Surface_mesh data structure. Proc. 20th Int. Meshing Roundtable, 2012, p.533-550. doi: 10.1007/978-3-642-24734-7_29

We present the design, implementation, and evaluation of an efficient and easy to use data structure for polygon surface meshes. The design choices that arise during development are systematically investigated and detailed reasons for choosing one alternative over another are given. We describe our implementation and compare it to other contemporary mesh data structures in terms of usability, computational performance, and memory consumption. Our evaluation demonstrates that our new Surface_mesh data structure is easier to use, offers higher performance, and consumes less memory than several state-of-the-art mesh data structures.

Sun J, Liang L, Wen F, et al. Image vectorization using optimized gradient meshes. ACM Trans. Graph., 2007, 26(3): 11. doi: 10.1145/1239451.1239462

Recently, gradient meshes have been introduced as a powerful vector graphics representation to draw multicolored mesh objects with smooth transitions. Using tools from Adobe Illustrator and Corel CorelDRAW, a user can manually create gradient meshes even for photo-realistic vector arts, which can be further edited, stylized and animated. In this paper, we present an easy-to-use interactive tool, called optimized gradient mesh, to semi-automatically and quickly create gradient meshes from a raster image. We obtain the optimized gradient mesh by formulating an energy minimization problem. The user can also interactively specify a few vector lines to guide the mesh generation. The resulting optimized gradient mesh is an editable and scalable mesh that otherwise would have taken many hours for a user to manually create.

Swaminarayan S, Prasad L. Rapid automated polygonal image decomposition. 35th IEEE Applied Imagery and Pattern Recognition Workshop, 2006, p.28-33. doi: 10.1109/AIPR.2006.30

We present RaveGrid, a software that efficiently converts a raster image to a scalable vector image comprised of polygons whose boundaries conform to the edges in the image. The resulting vector image has good visual quality and fidelity and can be displayed at various sizes and on various display screen resolutions. The software can render vector images in the SVG (scalable vector graphics) format or in EPS (Encapsulated Postscript). The ubiquity of image data in graphics, on the Web, and in communications, as well as the wide range of devices, from big screen TVs to hand-held cellular phones that support image display, calls for a scalable and more manipulable representation of imagery. Moreover, with the growing need for automating image-based search, object recognition, and image understanding, it is desirable to represent image content at a semantically higher level by means of tokens that support computer vision tasks.

Xia T, Liao BB, Yu YZ. Patch-based image vectorization with automatic curvilinear feature alignment. ACM Trans. Graph., 2009, 28(5): 115. doi: 10.1145/1618452.1618461

Raster image vectorization is increasingly important since vector-based graphical contents have been adopted in personal computers and on the Internet. In this paper, we introduce an effective vector-based representation and its associated vectorization algorithm for full-color raster images. There are two important characteristics of our representation. First, the image plane is decomposed into nonoverlapping parametric triangular patches with curved boundaries. Such a simplicial layout supports a flexible topology and facilitates adaptive patch distribution. Second, a subset of the curved patch boundaries are dedicated to faithfully representing curvilinear features. They are automatically aligned with the features. Because of this, patches are expected to have moderate internal variations that can be well approximated using smooth functions. We have developed effective techniques for patch boundary optimization and patch color fitting to accurately and compactly approximate raster images with both smooth variations and curvilinear features. A real-time GPU-accelerated parallel algorithm based on recursive patch subdivision has also been developed for rasterizing a vectorized image. Experiments and comparisons indicate our image vectorization algorithm achieves a more accurate and compact vector-based representation than existing ones do.

Xie H, Tong RF, Zhang Y. Image meshing via alternative optimization. J. Comput. Inform. Syst., 2014, 10(19): 8209-8217. doi: 10.12733/jcis11723

Image meshing is a class of image vectorization methods: it generates a proper mesh covering and representing a given input raster image. In this paper, we present a novel image meshing algorithm that alternately performs geometry and topology optimizations. In each iteration, we guarantee that the total energy strictly declines. Rather than processing a vertex only once as other existing methods do, our algorithm can re-adjust the position of a vertex a reasonable number of times, so that the local optimality of that vertex is not corrupted by later processing of other vertices. Although our algorithm cannot generate a globally optimal mesh, local optimality is obtained in each iteration. Given a legal initial mesh with proper topology, our algorithm converges within several iterations and generates acceptable results, usually with enhanced quality compared to other approaches.

Xiong SY, Zhang JY, Zheng JM, et al. Robust surface reconstruction via dictionary learning. ACM Trans. Graph., 2014, 33(6). doi: 10.1145/2661229.2661263

Surface reconstruction from point cloud is of great practical importance in computer graphics. Existing methods often realize reconstruction via a few phases with respective goals, whose integration may not give an optimal solution. In this paper, to avoid the inherent limitations of multi-phase processing in the prior art, we propose a unified framework that treats geometry and connectivity construction as one joint optimization problem. The framework is based on dictionary learning in which the dictionary consists of the vertices of the reconstructed triangular mesh and the sparse coding matrix encodes the connectivity of the mesh. The dictionary learning is formulated as a constrained ℓ2,q-optimization (0 < q < 1), aiming to find the vertex position and triangulation that minimize an energy function composed of point-to-mesh metric and regularization. Our formulation takes many factors into account within the same framework, including distance metric, noise/outlier resilience, sharp feature preservation, no need to estimate normal, etc., thus providing a global and robust algorithm that is able to efficiently recover a piecewise smooth surface from dense data points with imperfections. Extensive experiments using synthetic models, real world models, and publicly available benchmark show that our method outperforms the state-of-the-art in terms of accuracy, robustness to noise and outliers, geometric feature and detail preservation, and mesh connectivity.

Xu L, Lu CW, Xu Y, et al. Image smoothing via L0 gradient minimization. ACM Trans. Graph., 2011, 30(6): 174. doi: 10.1145/2024156.2024208

Yang YY, Wernick MN, Brankov JG, et al. A fast approach for accurate content-adaptive mesh generation. IEEE Trans. Image Process., 2003, 12(8): 866-881. doi: 10.1109/TIP.2003.812757

Mesh modeling is an important problem with many applications in image processing. A key issue in mesh modeling is how to generate a mesh structure that well represents an image by adapting to its content. We propose a new approach to mesh generation, which is based on a theoretical result derived on the error bound of a mesh representation. In the proposed method, the classical Floyd-Steinberg error-diffusion algorithm is employed to place mesh nodes in the image domain so that their spatial density varies according to the local image content. Delaunay triangulation is next applied to connect the mesh nodes. The result of this approach is that fine mesh elements are placed automatically in regions of the image containing high-frequency features while coarse mesh elements are used to represent smooth areas. The proposed algorithm is noniterative, fast, and easy to implement. Numerical results demonstrate that, at very low computational cost, the proposed approach can produce mesh representations that are more accurate than those produced by several existing methods. Moreover, it is demonstrated that the proposed algorithm performs well with images of various kinds, even in the presence of noise.
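The node-placement step described above can be sketched as classical Floyd-Steinberg error diffusion applied to a per-pixel density map: scanning in raster order, each cell is thresholded to "node" or "no node" and the quantization error is pushed onto unvisited neighbors, so that the local node count tracks the local density. A toy pure-Python version (the function name and the uniform test map are ours; the paper derives the density map from local image content and then connects the nodes with Delaunay triangulation):

```python
def place_nodes(density):
    """Floyd-Steinberg error diffusion on a row-major 2D density map
    (values in [0, 1]); returns (row, col) node positions whose local
    count approximates the local density, as in content-adaptive
    mesh generation."""
    h, w = len(density), len(density[0])
    d = [row[:] for row in density]          # work on a copy
    nodes = []
    for y in range(h):
        for x in range(w):
            old = d[y][x]
            new = 1.0 if old >= 0.5 else 0.0
            if new == 1.0:
                nodes.append((y, x))
            err = old - new
            # diffuse the error to unvisited neighbors (weights /16)
            if x + 1 < w:
                d[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    d[y + 1][x - 1] += err * 3 / 16
                d[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    d[y + 1][x + 1] += err * 1 / 16
    return nodes

# Uniform density 0.25 over 16 pixels: total mass 4.0, so roughly
# four nodes appear, spread out rather than clustered.
dens = [[0.25] * 4 for _ in range(4)]
print(len(place_nodes(dens)))
```

With an image-derived density map, high-gradient regions get dense node placement and smooth regions sparse placement, which is exactly the adaptivity the abstract describes.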

Copyright © Frontiers of Information Technology & Electronic Engineering