A Simple Attempt at the Instagram Engineering Challenge

2011-11-13 02:25

Introduction

I saw this repost on @Teaonly's Weibo:

Instagram posted an image that had been cut into strips, shuffled, and recombined. [1]

The original image:

The task is to automatically rearrange the shuffled strips and reconstruct the original picture:

Leiphone has an introduction. [2]

Instagram gives the image size, 356×640, with a strip width of 32. Teaonly proposed a scheme for finding the width automatically; here I simply use 32.
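Teaonly's scheme is not reproduced here, but as a rough illustration, a width-detection heuristic could look like the sketch below. This is my own guess, not Teaonly's method: it assumes straight vertical cuts, a strip width that evenly divides the image width, and an arbitrary "edge-likeness" threshold factor of 2.

% Width-detection sketch: columns straddling a cut differ far more than
% columns inside a strip, so take the smallest divisor of the image width
% whose implied cut positions ALL look edge-like.
imgOri  = imread('http://instagram-static.s3.amazonaws.com/images/TokyoPanoramaShredded.png');
ncols   = size(imgOri, 2);
colDiff = sum(abs(diff(double(rgb2gray(imgOri)), 1, 2)), 1);  % neighbor-column differences
thresh  = 2 * median(colDiff);  % "edge-like" threshold; the factor 2 is a guess
widClip = ncols;                % fallback: treat the whole image as one strip
for w = 2:ncols/2
	if mod(ncols, w) ~= 0, continue; end
	if all(colDiff(w:w:ncols-1) > thresh)  % every implied cut stands out
		widClip = w;            % smallest width whose cuts all look like seams
		break;
	end
end
fprintf('Detected strip width: %d\n', widClip);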

The image quality is high: the strips were simply cut apart, with no degradation applied, so an engineering-style method is enough:

 

Motivation

Where two strips were originally adjacent, the pixel values on either side of the seam differ very little; this can be used to judge whether two strips belong next to each other.

Method

By collecting and sorting the difference values, we get a hypothesis for each strip about which strip may sit on its left and which on its right; when sorting, the two closest candidates are kept. The left-side decisions are then checked against the right-side ranking for hypothesis verification. Finally, the rightmost strip can be identified, and tracking leftwards from it through the left-neighbor links orders all the strips; concatenating them yields the original image.

The difference between two columns of pixel values is computed simply as

d = \sum_{i=1}^{n}\left|x_i - y_i\right|

where x and y are the two border columns being compared and n is the number of rows.

For the computation I use only intensity information, i.e., the image is first converted to grayscale, which keeps things simple. One could also compute a 3-D Euclidean distance in color space; for this image, however, grayscale is enough.
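For reference, the color variant could look like the following sketch: the per-row Euclidean distance between two RGB border columns, summed down the seam. Here clipA and clipB are placeholder names for two color strips of size nrows×widClip×3 (uint8), like the slices stored in imgColClip in the script below.

% Color-distance sketch (clipA/clipB are hypothetical strip arrays)
lRGB = double(squeeze(clipA(:, 1, :)));      % left border column of clipA, nrows x 3
rRGB = double(squeeze(clipB(:, end, :)));    % right border column of clipB
d    = sum(sqrt(sum((lRGB - rRGB).^2, 2)));  % sum of per-row RGB distances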

Implementation

% The Image Unshredder script
% 
% written by visionfans @ 2011.11.14

clear;clc;close all

% load image
imgOri = imread('http://instagram-static.s3.amazonaws.com/images/TokyoPanoramaShredded.png');

% get image size
nClip = 20;
[nrows ncols nCh] = size(imgOri);
widClip = ncols / nClip;
xAxis = linspace(0, ncols, nClip+1) + 1;

% left/right border columns of every clip (color clips kept for display)
lSide = zeros(nrows, nClip);
rSide = zeros(nrows, nClip);

imgColClip = zeros([nrows widClip nCh nClip], 'uint8');

imgGray = rgb2gray(imgOri);
figure;
for i = 1:nClip
	% crop image patch
	rectArea = [xAxis(i) 1 widClip-1 nrows];  % imcrop includes both endpoints, so width widClip-1 yields widClip columns
	imgClip = imcrop(imgGray, rectArea);
	imgColClip(:,:,:,i) = imcrop(imgOri, rectArea);  % for show
	
	subplot(2, nClip/2, i);	imshow(imgColClip(:,:,:,i));title(num2str(i));
	
	% record the border columns of this clip
	lSide(:, i) = imgClip(1:end, 1);
	rSide(:, i) = imgClip(1:end, end);
end

% compute the column-pixel difference between every pair of strips;
% distMat(i,j) is small when strip j sits immediately to the left of strip i
distMat = zeros(nClip, nClip);
for i = 1:nClip
	for j = 1:nClip
		if i==j
			distMat(i, j) = Inf;
		else
			distMat(i, j) = sum(abs(lSide(:, i) - rSide(:,j)));
		end
	end
end

% find candidate matches: lMatch(1:2,i) holds the two most likely LEFT
% neighbors of strip i, rMatch(1:2,j) the two most likely RIGHT neighbors
[~, lMatch] = sort(distMat');
[~, rMatch] = sort(distMat);

candsAll = zeros(2, nClip);
for i = 1:nClip
	lcands = lMatch(1:2,i);
	for j = 1:2
		rcands = rMatch(1:2, lcands(j));
		
		if ismember(i, rcands)
			candsAll(j,i) = lcands(j);
		end
	end
end

% the rightmost strip is the only one that never appears as a left neighbor
rightElement = setdiff(1:nClip, unique(candsAll(1,:)));

if numel(rightElement)~=1, error('Need more elaborate check.'); end

% recover the order: track leftwards from the rightmost strip
imgOrder = zeros(1,nClip);
idxNext = rightElement;
for i = 1:nClip
	imgOrder(i) = idxNext;
	idxNext = candsAll(1, idxNext);
end
imgOrder = fliplr(imgOrder);

% concatenate the strips left-to-right
outImg = [];
for i = 1:nClip
	outImg = [outImg imgColClip(:,:,:,imgOrder(i))];
end

figure;imshow(outImg);title('Unshredded Image');

 

Result

Segmentation of image strips, each 32 pixels wide:

Sorting result:

imgOrder = [  9    11    15    17    19    14     8     4     3    12     5    20    18    13     7    16     2     6     1    10 ]

Unshredded image:

Discussion

More situations worth considering:

1. How to automatically detect the cut positions. The basic assumption here is straight-line cuts; what to do if the cuts are not straight?

2. For low-quality images, the simple border-difference comparison used here breaks down; the interior content of the strips must be taken into account (see the sketch after this list).

3. For non-straight cuts, how to exploit boundary features for matching.

4. The current algorithm can hardly handle general-purpose reassembly. There are more open questions; interested readers can keep digging.
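As a small illustration of point 2, one possible robustness tweak (a sketch only, not a full solution) is to compare an averaged band of k border columns instead of a single column, so that per-pixel noise is smoothed out before the L1 comparison. Again clipA and clipB are placeholder names for two grayscale strips of size nrows×widClip, and k is a free parameter.

% Noise-robust border comparison sketch (clipA/clipB are hypothetical)
k     = 4;                                      % band width, a free parameter
lBand = mean(double(clipA(:, 1:k)), 2);         % smoothed left border of clipA
rBand = mean(double(clipB(:, end-k+1:end)), 2); % smoothed right border of clipB
d     = sum(abs(lBand - rBand));                % same L1 distance as before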

Just for fun! ^-^

 

References

[1] Instagram Engineering Challenge: The Unshredder, http://instagram-engineering.tumblr.com/

[2] Restore this image and you could have a chance to join Instagram (复原这张图片,你就有机会加入Instagram), http://www.leiphone.com/join-instagram.html?jtss=tsina


On the organization of visual fragments in a codebook

2011-10-18 17:19

On the different uses of a codebook, excerpted from [1]:

The methods differ on the details of the codebook, but more fundamentally they differ in how strictly the geometry of the configuration of parts constituting an object class is constrained.

For example, Csurka et al. [2], Bar-Hillel et al. [3] and Opelt et al. [4] simply use a "bag of visual words" model (with no geometrical relations between the parts at all), Agarwal & Roth [5], Amores et al. [6], and Vidal-Naquet and Ullman [7] use quite loose pairwise relations, whilst Fergus et al. [8] have a strongly parametrized geometric model consisting of a joint Gaussian over the centroid position of all the parts. The approaches using no geometric relations are able to categorize images (as containing the object class), but generally do not provide location information (no detection). Whereas the methods with even loose geometry are able to detect the object's location.

After extracting visual fragments from an image, how should they be used?

  1. Use them without any organization: bag of visual words [2,3,4]
  2. Model the organizational relations between the visual fragments. How should these relations be represented? To read in detail: [5,6,7,8].

 

References:

[1] A. Opelt, A. Pinz, and A. Zisserman. A boundary-fragment-model for object detection. In Proc. ECCV, pages 575-588, 2006.

[2] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In ECCV04. Workshop on Stat. Learning in Computer Vision, pages 59-74, 2004.

[3] A. Bar-Hillel, T. Hertz, and D. Weinshall. Object class recognition by boosting a part-based model. In Proc. CVPR, volume 2, pages 702-709, June 2005.

[4] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. In Proc. ECCV, pages 71-84, 2004.

[5] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. IEEE PAMI, 26(11):1475-1490, Nov. 2004.

[6] J. Amores, N. Sebe, and P. Radeva. Fast spatial pattern discovery integrating boosting with constellations of contextual descriptors. In Proc. CVPR, volume 2, pages 769-774, CA, USA, June 2005.

[7] M. Vidal-Naquet and S. Ullman. Object recognition with informative features and linear classification. In Proc. ICCV, volume 1, pages 281-288, 2003.

[8] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. CVPR, pages 264-271, 2003.


2011.10.19

2011-10-18 16:17

Some notes related to contour grouping and matching

The problem of computational perceptual grouping received considerable attention before the advent of appearance-based recognition, when object models were typically shape-based and image features were typically contour-based.[1]

Moreover, while object databases were rather small, it was generally assumed that a linear search of a database, i.e., matching the image features against each model in succession and choosing the best-matching model, was an unacceptable strategy, for it did not scale to very large databases. In an effort to achieve sublinear scaling, much effort was devoted to the problem of object indexing, i.e., using a set of image features to query the database for candidate objects that might account for the image features.[1]

Matching the image features against each model in succession is now outdated: image databases have grown so large that this is computationally impractical. Research now focuses on how to do object indexing better.

grouping was based not on object-level prior knowledge, but rather on mid-level (object-independent) prior knowledge. Such grouping was essential, since local contour features were highly ambiguous, and without grouping them into more discriminative structures, effective indexing into large databases was problematic.[1]

What grouping produces is mid-level information, and it is useful.

References:

[1] P. Sala and S. J. Dickinson. Contour grouping and abstraction using simple part models. In Proc. ECCV, pages 603-616, 2010.
