Monday, November 23, 2009

Two-view Feature Generation Model for Semi-supervised Learning


We first take a look at their logic: for semi-supervised learning, a generative model is usually preferred, since unlabeled data help estimate the marginal distribution \Pr(x). In a Bayesian MAP formulation, we actually solve
\min_\alpha - \sum_i \log \Pr(y_i \mid x_i, \alpha) - \log \Pr(x_u \mid \alpha) - \log \Pr(\alpha)
which differs a little from a direct generative model. Here the first term is discriminative and the second is a penalty from the unlabeled part (so the formulation is closer to a ``supervised loss + penalty'' model). This paper does discuss models of the latter kind, using auxiliary problems.
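As a toy illustration (my own sketch, not from the paper), the three terms of the MAP objective can be coded directly, assuming a logistic \Pr(y \mid x, \alpha), a made-up Gaussian \Pr(x \mid \alpha) centered at \alpha, and a Gaussian prior:

```python
import numpy as np

def map_objective(alpha, X_lab, y_lab, X_unlab, sigma_prior=1.0):
    """Negative log-posterior: supervised loss + unlabeled penalty + prior.
    The densities are toy choices for illustration only."""
    # Discriminative term: -sum_i log P(y_i | x_i, alpha), logistic, y in {-1,+1}
    logits = X_lab @ alpha
    disc = np.sum(np.log1p(np.exp(-y_lab * logits)))
    # Unlabeled penalty: -sum_u log P(x_u | alpha); here a unit Gaussian
    # centered at alpha (a made-up density, just to be concrete)
    gen = 0.5 * np.sum((X_unlab - alpha) ** 2)
    # Prior term: -log P(alpha), Gaussian prior
    prior = 0.5 * np.sum(alpha ** 2) / sigma_prior ** 2
    return disc + gen + prior
```

Minimizing this over \alpha (e.g. by gradient descent) gives the MAP estimate; the point is only that the unlabeled term enters as a penalty added to a discriminative loss.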

The two-view model means, analogous to co-training, we have two views of the feature vector x, namely z_1(x) and z_2(x), which are independent conditioned on the label. What distinguishes this model is that in order to solve \Pr(y \mid z_1, z_2), we need \Pr(y \mid z_1) and \Pr(y \mid z_2). Consider \Pr(y \mid z_1) only. One possibility is the low-rank decomposition \Pr(z_2 \mid z_1) = \sum_y \Pr(z_2 \mid y) \Pr(y \mid z_1), but the LHS is sometimes impossible to compute. An approximation is to encode z_2 with a set of binary labels t_1^k(z_2). Then \Pr(t_1^k \mid z_1) = \sum_y \Pr(t_1^k \mid y) \Pr(y \mid z_1) can be computed. By increasing the number of related binary labels t_1^k we may obtain a good estimate of \Pr(y \mid z_1).
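Note that the identity \Pr(t_1^k \mid z_1) = \sum_y \Pr(t_1^k \mid y) \Pr(y \mid z_1) is linear in the unknown \Pr(y \mid z_1), so with enough auxiliary problems it can be inverted by least squares. A tiny numpy sketch (mine, with made-up probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)
K, C = 8, 2                    # auxiliary binary problems, classes
A = rng.random((K, C))         # A[k, y] = P(t_1^k = 1 | y), assumed known
p_true = np.array([0.3, 0.7])  # the P(y | z_1) we want to recover
b = A @ p_true                 # P(t_1^k = 1 | z_1), estimable from data

# With K >= C and A full rank, least squares recovers P(y | z_1).
p_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

In practice b is only estimated, so more auxiliary labels t_1^k make the system better conditioned, which matches the remark above.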

They propose two models (one linear and the other log-linear, resembling linear regression and logistic regression respectively). The linear version coincides with the SVD-ASO model in their JMLR paper. The log-linear model is solved via an EM-like algorithm.

The open question is what kind of binary auxiliary functions would be essential to our semi-supervised problems. This might be a key to understanding their JMLR paper on multi-task learning.

A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data


This paper proposes a framework for multi-task learning. The idea is quite simple: there is a common structure \Theta shared across different but related problems. Therefore in each problem P_k, our parameters include w_k, which is problem-specific, and v_k, which depends on the common features controlled by \Theta. To solve the model it is usually desirable to alternately optimize over w_k, v_k and \Theta. A regularizer is usually also included for better generalization.

Using this idea, the authors propose a linear model solved by alternating structure optimization, with an SVD in each iteration to find \Theta (SVD-ASO in their terms). With this idea, they analyze semi-supervised learning with auxiliary functions, which play the role of the multiple tasks.
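The structure-update step can be sketched as follows (a simplified caricature of SVD-ASO, not their exact algorithm): stack the per-task weight vectors into a matrix and take the top singular directions as the shared structure \Theta.

```python
import numpy as np

def svd_aso_step(W, h):
    """One structure-update step of (simplified) SVD-ASO.
    W: p x m matrix whose columns are the m task weight vectors.
    Returns Theta, the h x p matrix of shared directions."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T  # rows are orthonormal shared directions
```

The full algorithm alternates: fit each task's (w_k, v_k) given \Theta, restack the weights, and recompute \Theta with this step until convergence.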

Their extension of this piece of work is reviewed above.

Saturday, November 21, 2009

Discriminative Semi-supervised Feature Selection via Manifold Regularization


This paper performs feature selection via SVM. The semi-supervised part comes from adding a manifold regularizer. Features are selected by multiplying the feature vector with a diagonal 0-1 matrix, and these selection variables enter the optimization as well. The key idea for solving the problem is to reformulate it with the dual of the SVM while leaving the feature-selection variables alone; the optimum is then a saddle point of the resulting problem. Problems of this kind also arise in multiple-kernel learning, which has a standard algorithm (alternating optimization w.r.t. the different groups of variables).
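A caricature of such alternating optimization (my own sketch, not the paper's SVM-dual formulation; a ridge fit stands in for the SVM): alternate between fitting the classifier on the currently selected features and reselecting the k features with the largest weights.

```python
import numpy as np

def select_features(X, y, k, iters=5, lam=1e-2):
    """Alternate between a ridge fit (stand-in for the SVM step)
    and updating the diagonal 0-1 feature selector."""
    d = X.shape[1]
    sel = np.ones(d, dtype=bool)  # the diagonal 0-1 selection matrix
    w = np.zeros(d)
    for _ in range(iters):
        Xs = X * sel              # zero out unselected features
        w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ y)
        sel = np.zeros(d, dtype=bool)
        sel[np.argsort(np.abs(w))[-k:]] = True  # keep k largest weights
    return sel, w
```

The paper instead solves for the selector at the saddle point of the SVM dual, but the alternating structure is the same.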

The idea of using SVMs for feature selection is not new. The contribution might be the semi-supervised setting. In my own research it seems we still do not have a clear way of achieving this with other methods. hmm...

Thursday, November 12, 2009

Dappled Photography: Mask Enhanced Cameras for Heterodyned Light Fields and Coded Aperture Refocusing


A light field conveys both the spatial and angular distribution of light incident on the camera sensor. The pioneering work to capture a light field in one photographic exposure is the plenoptic camera, a device that uses a microlens array to rearrange a 4D light field and capture it with a 2D sensor. However, the optics of the microlens array defines a fixed resolution tradeoff between spatial and angular sampling of the light field.
In this paper, the authors propose to modulate the light field by shadowing the incoming light with a mask in the optical path. In the Fourier Light Field Space (FLS), the mask creates a train of identical kernels positioned in a slanted slice, and thereby, via convolution, pulls high angular frequencies to the central angular slice, the only slice the camera measures in the FLS. Assuming that the incident light field is band limited, the captured image is the flattened version of the incident light field in the Fourier domain.
Moreover, the slant of the mask kernel, which decides the spatial-angular resolution tradeoff, is determined by the location of the mask. Consequently, the resolution tradeoff can be adjusted by translating the mask. The minimal and the maximal angular resolution are achieved by placing the mask at the aperture and at the conjugate plane respectively.
However, the mask-enhanced camera seems to trade reconstruction quality for flexibility. First, the optimal mask pattern varies with its location, yet in practice the pattern is fixed. Second, the mask blocks about half of the incident light, reducing the signal-to-noise ratio of the sensed image. Nevertheless, the paper provides a profound analysis of the principle of mask-enhanced cameras, a major category of computational camera, making it influential in the Computational Photography community.

4D Frequency Analysis of Computational Cameras for Depth of Field Extension

by A. Levin, S. W. Hasinoff, P. Green, F. Durand and W. T. Freeman
Although many types of cameras have been invented to extend the depth of field (DoF), none of them optimizes the quality of the resulting image or, equivalently, maximizes the modulation transfer function (MTF). In this paper, the authors perform a 4D frequency analysis to estimate the maximal frequency spectra of optical systems.
The key to the analysis lies in the dimensional gap between the 3D MTF and the 4D ambiguity function that characterizes a camera: the former is a manifold embedded in the latter, called “the focal segments”. To maximize the MTF, therefore, the ambiguity function should distribute all its energy uniformly on these segments. This analysis leads to an upper bound on the MTF.
Unfortunately, most contemporary computational cameras waste energy outside this region. The only exception is the focal sweep camera, but the phase incongruence of its OTF across focus settings lowers the spectrum magnitude. The authors propose the lattice-focal lens, composed of a number of sub-squares, each responsible for focusing light rays from a specific depth. This spatial division of the aperture also concentrates energy on the focal region, but achieves a much higher spectrum than the focal sweep camera.
The ambiguity function, defined as the auto-correlation of the 2D scalar field of an optical system, is a redundant representation. This prevents the authors from determining a tight upper bound on the frequency spectrum. Still, the proposed analysis sheds much light on the question: although there is no explicit analysis, it indicates that the key to maximizing the MTF may lie in the phase coherence of the optical system.

Flexible Depth of Field Photography


The depth of field (DoF) of a lens is confined to a fronto-parallel slab. In this paper, the authors attempt to break this limitation with flexible depth of field photography, i.e. translating the detector while the shutter is open. At each moment the lens produces a point spread function (PSF) associated with the current sensor position, and the final PSF, called the integrated PSF (IPSF), is the sum of all the PSFs produced over the exposure.

Under this scheme, the authors suggest three applications for manipulating the DoF. In the first, the sensor is translated at constant speed to produce a depth-independent, frequency-preserving IPSF, ensuring good quality of the restored all-in-focus image. In the second, the authors play with non-uniform translations: unwanted depth layers in the middle can be skipped so that they are blurred enough to be unnoticeable. The authors also show that by rolling the detector's exposure during the translation, an arbitrarily shaped DoF can be produced. All the applications are demonstrated with a prototype flexible-DoF imaging system built by the authors.
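The IPSF is easy to illustrate numerically. Assuming (my assumption, purely illustrative) Gaussian per-instant PSFs whose width changes as the sensor translates, the IPSF is their energy-normalized sum:

```python
import numpy as np

def integrated_psf(widths, size=33):
    """IPSF as the average of per-instant PSFs; each instant during
    the exposure contributes one PSF of unit energy."""
    x = np.arange(size) - size // 2
    xx, yy = np.meshgrid(x, x)
    ipsf = np.zeros((size, size))
    for s in widths:                       # one PSF per sensor position
        psf = np.exp(-(xx**2 + yy**2) / (2 * s**2))
        ipsf += psf / psf.sum()            # normalize each instant's energy
    return ipsf / len(widths)

# Uniform sweep: widths sampled evenly over the translation.
ipsf = integrated_psf(np.linspace(0.5, 4.0, 16))
```

Plotting this IPSF against single Gaussians shows the characteristic sharp-peak-plus-wide-tail shape that makes the uniform sweep nearly depth-independent.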

The idea of flexible DoF is interesting, but it is not as novel as the authors claim. Although never thoroughly investigated, focus sweep is a widely used technique in photography under the name variable-focus photography; uniform sweeping of the DoF was proposed as early as 1972 by Hausler. The major difference between Hausler's work and this paper is just replacing focus sweeping with sensor translation, which is minor. The usefulness of the other two applications is somewhat vague due to the underlying assumption that scene depth is known before capture.

Time-Constrained Photography


Current designs and evaluations of depth-of-field (DoF) extended cameras all assume a single photo is captured, yet within the same exposure time a focal stack may give rise to a less noisy restoration. The authors therefore propose to evaluate camera designs in the multiple-shot scenario.

The authors first develop a Bayesian approach to restore the depth map and the all-in-focus image of a scene; they also estimate the scene-independent restoration error as a function of the depth range of interest and the capture schedule. Fixing the total exposure time, they seek the optimal number of photos to capture: a few more photos increase the signal-to-noise ratio (SNR) of the restored image because the dominant source of image noise, photon noise, is signal-dependent; the other source, additive read noise, penalizes excessive photos though.
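A back-of-the-envelope model (mine, much cruder than the paper's analysis) shows the read-noise side of this tradeoff: summing a stack captured under a fixed photon budget keeps the shot-noise variance fixed but adds one unit of read-noise variance per shot. The benefit of extra shots, easier restoration because every depth is captured nearly in focus somewhere in the stack, is not captured by this toy.

```python
import numpy as np

def stack_snr(total_photons, n_shots, read_noise=5.0):
    """SNR of a summed focal stack under a fixed time budget.
    Photon (shot) noise variance equals the total photon count;
    read noise is added once per shot."""
    var = total_photons + n_shots * read_noise**2
    return total_photons / np.sqrt(var)
```

In this summation-only model the SNR falls monotonically with the number of shots, which is exactly why the paper's optimal shot count balances restoration quality against accumulated read noise.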

Based on the expected restoration error, the authors compare the performance of various camera designs. Surprisingly, the conventional camera performs as well as, if not better than, the other types in all but the low-time-budget settings; it also benefits most from the multiple-shot scheme. In contrast, the coded aperture camera performs poorly in terms of light efficiency because its mask blocks most of the light from reaching the sensor.

This paper constitutes a strong defense against intensity-masking techniques by highlighting the crucial role of time budgets in DoF extension. Indeed, there is no point in designing a DoF-extended camera without a time constraint: closing the aperture is most convenient. For the same reason, although computational cameras do not outperform conventional ones on most occasions, they serve the most demanding purpose: capturing a sharp image with the minimal exposure.

Fourier Slice Photography

by Ren Ng

Unlike a 2D image, a light field measures the light radiance along all possible 4D rays. The notion was initially suggested for synthesizing new views of non-Lambertian scenes by ray tracing, yet in this paper the author employs it to model photographic imaging.

The core of this work is the Fourier Slice Photography Theorem, developed from a generalization of Bracewell's Fourier Slice Theorem. The latter states that the Fourier transform (FT) of a signal's integral projection corresponds to a slice of its FT. As the imaging process can be seen as a sheared integral of the light field, in the Fourier domain a photograph formed with the full aperture corresponds to a 2D slice of the 4D light field's FT. This discovery speeds up digital refocusing from O(n^4) to O(n^2 log n) and improves precision, because the Fourier transform has a fast implementation and avoids the numerical loss of the integral. The advantage of the new algorithm is also validated by experiments.
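The classical slice theorem that the photography theorem generalizes is easy to verify numerically in 2D: the 1D FT of an integral projection equals the central slice of the 2D FT.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((64, 64))      # stand-in for a 2D signal

proj = f.sum(axis=0)          # integral projection onto one axis
F_proj = np.fft.fft(proj)     # 1D FT of the projection

F2 = np.fft.fft2(f)
central_slice = F2[0, :]      # zero frequency along the projected axis
```

`F_proj` and `central_slice` agree exactly; Ng's theorem says the same for the sheared 4D-to-2D integral of photographic imaging, which is what turns refocusing into slicing.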

To show the utility of the theorem, the author goes further to provide, in closed form, the performance limit of plenoptic cameras with a finite aperture. Previously, quantitative analysis was only possible for the simplest pinhole model, as a finite aperture complicates the problem by low-pass filtering the light field. The author tackles this difficulty by working in the Fourier domain. First he shows that a 4D convolution of the light field yields a 2D convolution of the photographs focused at a variety of depths. Applying the Fourier Slice Photography Theorem to this statement, he concludes that a band-limit assumption on plenoptic cameras degrades their performance in digital refocusing, and the amount of degradation increases linearly with the directional resolution of the sampled light field.

This paper gives in-depth insight into the relationship between a light field and its photographic images, and pioneers the theoretical analysis of optical designs in the Computational Photography community. It also inspired the invention of new light field cameras, e.g. Raskar's dappled photography. Although the author focuses on aberration-free lens models, the Fourier Slice Photography Theorem can be applied to a broader range of optical systems.

Extended Depth of Field through Wavefront Coding

by E. Dowski and W. Cathey

The authors work on extending the depth of field (DoF) of optical systems, the range of distances from which objects can be imaged in full detail. Beyond this range, the image becomes blurred, mathematically modeled as a convolution between the in-focus image and a point spread function (PSF) associated with the distance of the object. In a more general definition, an object is also considered within the DoF if its image can be ideally restored from what the sensor reads.

Previously the mainstream approach to DoF extension was to block light at the aperture with an apodizer, but this also decreases the optical power at the sensor, resulting in a much longer exposure. The only prior approach with a full aperture was Hausler's focus sweep method; however, it is limited in application by the requirement to continuously change the focus setting during exposure.

The authors propose to attach a phase mask to the optical system to achieve a PSF that is invariant to misfocus and beneficial to recovery of the full-resolution image, in the sense that its optical transfer function (OTF) has large values within its passband. Thus, the in-focus image can be fully recovered from the sensed image without knowledge of the object's depth.

To compute the profile of the phase mask, the authors first express the OTF as a function of depth and the phase profile of the mask. The OTF at a specific distance is shown to be a slice of the ambiguity function through the center, where the slope of the slice corresponds to the amount of misfocus. Therefore, the PSF produced by an optical system is invariant to focal distance only when its ambiguity function is rotationally invariant over the angular region corresponding to the extended DoF.

Limiting the profile of the phase mask to be a monomial, the authors further derive that it has to be cubic. The bandwidth of cubic-phase-masked (cubic-pm) systems is also analyzed as a function of the monomial coefficient, to guarantee that the OTF has no zeros in its passband.
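A toy 1D wavefront-coding model (my sketch; the coefficients are arbitrary) computes the MTF of a pupil carrying a cubic phase plus a quadratic misfocus term; plotting the result for several defocus values shows the near-invariance the authors derive for large cubic coefficients.

```python
import numpy as np

def mtf_1d(cubic_coeff, defocus, n=256):
    """MTF of a 1D pupil exp(i*(alpha*x^3 + psi*x^2)): the cubic term
    is the phase mask, the quadratic term models misfocus."""
    x = np.linspace(-1, 1, n)
    pupil = np.exp(1j * (cubic_coeff * x**3 + defocus * x**2))
    psf = np.abs(np.fft.fft(pupil, 4 * n))**2  # intensity PSF (zero-padded)
    otf = np.fft.fft(psf)                      # OTF = FT of the PSF
    return np.abs(otf) / np.abs(otf[0])        # normalized MTF

m = mtf_1d(cubic_coeff=30.0, defocus=0.0)
```

Comparing `mtf_1d(30.0, d)` across defocus values d against `mtf_1d(0.0, d)` illustrates how the cubic mask trades peak response for defocus insensitivity.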

In experiments the authors compare the half-maximum amplitude and Fisher information of the cubic-pm PSFs to standard ones. Although the cubic-pm design greatly outperforms, the comparison may be unfair to the standard design, because insensitivity to focus change is not indispensable for DoF extension. Nevertheless, the experiments do suggest that the cubic-pm design avoids the obstacle of PSF identification.

A comparison of restored images is also performed by simulation under a noise-free assumption. Again, the cubic-pm optics appears superior to the standard one; the key to its success lies in the wider support of its OTF. Acknowledging that noise is unavoidable, the authors estimate the signal-to-noise ratio of cubic-pm systems to be more than 20 dB. Still, the experiments would have been more persuasive had the authors included results on real optical systems, to account not only for noise but also for manufacturing imprecision of the phase mask.

In summary, this paper presents a novel solution to DoF extension. Although it was published 15 years ago, its influence in Computational Photography (CP) is still significant. On one hand, the need to maximize the amount of light at the sensor is increasingly emphasized in the area, and phase masking remains the only known solution to extended DoF under this constraint. On the other hand, this paper highlights the importance of the ambiguity function, which has recently been found to be the bridge between light field theory in CP and wavefront optics.