English-language materials
|
| Authors |
Title |
Description |
Rating |
| Ioannis Kompatsiaris and Michael Gerassimos Strintzis |
Spatiotemporal Segmentation and Tracking of Objects for Visualization of Videoconference Image Sequences |
Abstract—In this paper, a procedure is described for the segmentation,
content-based coding, and visualization of videoconference
image sequences. First, image sequence analysis is used to estimate
the shape and motion parameters of the person facing the camera.
A spatiotemporal filter, taking into account the intensity differences
between consequent frames, is applied, in order to separate
the moving person from the static background. The foreground is
segmented in a number of regions in order to identify the face. For
this purpose, we propose the novel procedure of K-Means with connectivity
constraint algorithm as a general segmentation algorithm
combining several types of information including intensity, motion
and compactness. In this algorithm, the use of spatiotemporal regions
is introduced since a number of frames are analyzed simultaneously,
and as a result, the same region is present in consequent
frames. Based on this information, a 3-D ellipsoid is adapted to the
person’s face using an efficient and robust algorithm. The rigid 3-D
motion is estimated next using a least median of squares approach.
Finally, a Virtual Reality Modeling Language (VRML) file is created
containing all the above information; this file may be viewed
by using any VRML 2.0 compliant browser.
RAR 391 KB
|
?
|
| Candemir Toklu, A. Murat Tekalp, and A. Tanju Erdem |
Semi-Automatic Video Object Segmentation in the Presence of Occlusion |
Abstract—We describe a semi-automatic approach for segmenting
a video sequence into spatio-temporal video objects in
the presence of occlusion. Motion and shape of each video object
is represented by a 2-D mesh. Assuming that the boundary of an
object of interest is interactively marked on some keyframes, the
proposed method finds the boundary of the object in all other
frames automatically by tracking the 2-D mesh representation
of the object in both forward and backward directions. A key
contribution of the proposed method is automatic detection of
covered and uncovered regions at each frame, and assignment of
pixels in the uncovered regions to the object or background based
on color and motion similarity. Experimental results are presented
on two MPEG-4 test sequences and the resulting segmentations
are evaluated both visually and quantitatively.
RAR 209 KB
|
?
|
| Işıl Celasun and A. Murat Tekalp |
Optimal 2-D Hierarchical Content-Based Mesh Design and Update for Object-Based Video |
Abstract—Representation of video objects (VOs) using hierarchical
2-D content-based meshes for accurate tracking and level
of detail (LOD) rendering have been previously proposed, where
a simple suboptimal hierarchical mesh design algorithm was employed.
However, it was concluded that the performance of the
tracking and rendering very much depends on how well each level
of the hierarchical mesh structure fits the VO under consideration.
To this effect, this paper proposes an optimized design of hierarchical
2-D content-based meshes with a shape-adaptive simplification
and temporal update mechanism for object-based video.
Particular contributions of this work are: 1) analysis of optimal
number of nodes for the initial fine level-of-detail mesh design; 2)
adaptive shape simplification across hierarchy levels; 3) optimization
of the interior-node decimation method to remove only a maximal
independent set to preserve Delaunay topology across hierarchy
levels for better bitrate versus quality performance; and 4)
a mesh-update mechanism which serves to update temporally 2-D
dynamic mesh in case of occlusion due to 3-D motion and self-occlusion.
The proposed optimized and temporally updated hierarchical
mesh representation can be applied in object-based video coding,
retrieval, and manipulation.
RAR 3564 KB
|
?
|
| Toshiaki Fujii, Tadahiko Kimoto, and Masayuki Tanimoto |
A New Flexible Acquisition System of Ray-Space Data for Arbitrary Objects |
Abstract—Conventional ray-space acquisition systems require
very precise mechanisms to control the small movement of
cameras or objects. Most of them adopt camera with a gantry or
a turntable. Although they are good for acquiring the ray space
of small objects, they are not suitable for ray-space acquisition of
very large structures, such as a building, tower, etc. This paper
proposes a new ray-space acquisition system which consists of a
camera and a 3-D position and orientation sensor. It is not only
a compact and easy-to-handle system, but is also free from limitations
of size or shape, in principle. It can obtain any ray-space
data as far as the camera is located within the coverage of the
3-D sensor. This paper describes our system and its specifications.
Experimental results are also presented.
RAR 212 KB
|
?
|
| Jin Liu, David Przewozny, and Siegmund Pastoor |
Layered Representation of Scenes Based on Multiview Image Analysis |
Abstract—This paper describes a novel cooperative procedure
for the segmentation of multiview image sequences exploiting multiple
sources of information. Compared to other approaches, no a
priori information is needed about the structure and the arrangement
of objects in the scene. Three cameras in a particular unsymmetrical
set-up are used in the system. The color distribution
and the object contours in the constituent 2-D images, the disparity
information in stereo image pairs, as well as motion information
in subsequent images, are analyzed and evaluated in a cooperative
procedure to get reliable segmentation results. The scene is decomposed
into a variable number of depth layers, with each layer
showing a subset of the segmented regions. The layered representation
can be used in a variety of applications. In this paper, the application
aims at synthesizing 3-D images for enhanced telepresence
allowing the user to “look around” in natural scenes (intermediate
views for interactive displays). In another application, 3-D images
showing a natural depth-of-focus are synthesized in order to improve
viewing comfort with 3-D displays.
RAR 435 KB
|
?
|
| Atsushi Marugame, Akio Yamada, and Mutsumi Ohta |
Focused Object Extraction with Multiple Cameras |
Abstract—This paper describes a novel framework for object extraction
from images utilizing multiple cameras. Focused regions
in images and disparities of point correspondences among multiple
images are 3-D clues for the extraction. We examine the extraction
of focused objects from images by these automatically acquired
clues. Edges in images captured by the cameras are detected, and
disparities of the edges in focused regions become the clues, called
disparity keys. A focused object is extracted from an image as a
set of edge intervals with the disparity keys. The falsely extracted
parts can be detected by discontinuous contours of the object and
recovered by contour morphing. Some experimental results under
different conditions demonstrate the effectiveness and robustness
of the proposed method. The method can be applied to image synthesis
methods, such as Synthesis/Natural Hybrid Coding (SNHC)
and to object-scalable coding in MPEG-4.
RAR 968 KB
|
?
|
| Sila Ekmekci |
Encoding and Reconstruction of Incomplete 3-D Video Objects |
Abstract—A new approach for compact representation,
MPEG-4 encoding, and reconstruction of video objects captured
by an uncalibrated system of multiple cameras is presented.
The method is based on the incomplete 3-D (I3D) technique,
which was initially investigated for stereo video objects captured
by parallel cameras. Non-overlapping portions of the object
are extracted from the reference views, each view having the
corresponding portion with the highest resolution. This way, the
redundancy of the initial multiview data is reduced. The areas
which are extracted from the basis views are denoted as areas
of interest. The output of the analysis stage, i.e., the areas of
interest and the corresponding parts of the disparity fields are
encoded in the MPEG-4 bitstream. Disparity fields define the
correspondence relations between the reference views. The view
synthesis is performed by disparity-oriented reprojection of the
areas of interest into the virtual view plane and can be seen as
an intermediate postprocessing stage between the decoder and
the scene compositor. This work performs the extension from
parallel stereo views to arbitrary configured multi-views with
new analysis and synthesis algorithms. Moreover, a two-way
interaction is built between the analysis and reconstruction stages,
which provides the tradeoff between the final image quality and
amount of data transmitted. The focus is on a low-complexity
solution enabling online processing capability while preserving
the MPEG-4 compatibility of the I3D representation. It is finally
shown that our method yields quite convincing results despite the
minimal data used and the approximations involved.
RAR 321 KB
|
?
|
| Peter Eisert, Eckehard Steinbach, and Bernd Girod |
Automatic Reconstruction of Stationary 3-D Objects from Multiple Uncalibrated Camera Views |
Abstract—A system for the automatic reconstruction of
real-world objects from multiple uncalibrated camera views is
presented. The camera position and orientation for all views,
the 3-D shape of the rigid object, as well as the associated color
information, are recovered from the image sequence. The system
proceeds in four steps. First, the internal camera parameters
describing the imaging geometry are calibrated using a reference
object. Second, an initial 3-D description of the object is computed
from two views. This model information is then used in a third
step to estimate the camera positions for all available views using
a novel linear 3-D motion and shape estimation algorithm. The
main feature of this third step is the simultaneous estimation
of 3-D camera-motion parameters and object shape refinement
with respect to the initial 3-D model. The initial 3-D shape
model exhibits only a few degrees of freedom and the object
shape refinement is defined as flexible deformation of the initial
shape model. Our formulation of the shape deformation allows
the object texture to slide on the surface, which differs from
traditional flexible body modeling. This novel combined shape
and motion estimation using sliding texture considerably improves
the calibration data of the individual views in comparison to
fixed-shape model-based camera-motion estimation. Since the
shape model used for model-based camera-motion estimation
is only approximate, a volumetric 3-D reconstruction process is
initiated in the fourth step that combines the information from
all views simultaneously. The recovered object consists of a set of
voxels with associated color information that describes even fine
structures and details of the object. New views of the object can
be rendered from the recovered 3-D model, which has potential
applications in virtual reality or multimedia systems and the
emerging field of video coding using 3-D scene models.
RAR 1321 KB
|
?
|
| Gian Luca Foresti |
Object Recognition and Tracking for Remote Video Surveillance |
Abstract—In this paper, a system for real-time object recognition
and tracking for remote video surveillance is presented. In
order to meet real-time requirements, a unique feature, i.e., the
statistical morphological skeleton, which achieves low computational
complexity, accuracy of localization, and noise robustness
has been considered for both object recognition and tracking.
Recognition is obtained by comparing an analytical approximation
of the skeleton function extracted from the analyzed image
with that obtained from model objects stored into a database.
Tracking is performed by applying an extended Kalman filter
to a set of observable quantities derived from the detected
skeleton and other geometric characteristics of the moving object.
Several experiments are shown to illustrate the validity of the
proposed method and to demonstrate its usefulness in video-based
applications.
RAR 1347 KB
|
?
|
| Ebroul Izquierdo and Xiaohua Feng |
Modeling Arbitrary Objects Based on Geometric Surface Conformity |
Abstract—In this paper, we address the problem of efficient and
flexible modeling of arbitrary three-dimensional (3-D) objects and
the accurate tracking of the generated model. These goals are
reached by combining available multiview image analysis tools
with a straightforward 3-D modeling method, which exploit well-established
techniques from both computer vision and computer
graphics, improving and combining them with new strategies. The
basic idea of the technique presented is to use feature points and
relevant edges in the images as nodes and edges of an initial two-dimensional
wire grid. The method is adaptive in the sense that
an initial rough surface approximation is progressively refined at
the locations where the triangular patches do not approximate
the surface accurately. The approximation error is measured
according to the distance of the model to the object surface,
taking into account the reliability of the depth estimated from
the stereo image analysis. Once the initial wireframe is available,
it is deformed and updated from frame to frame according to
the motion of the object points chosen to be nodes. At the end of
this process we obtain a temporally consistent 3-D model, which
accurately approximates the visible object surface and reflects
the physical characteristics of the surface with as few planar
patches as possible. The performance of the presented methods
is confirmed by several computer experiments.
RAR 1081 KB
|
?
|
| Jens-Rainer Ohm and Karsten Müller |
Incomplete 3-D Multiview Representation of Video Objects |
Abstract—This paper introduces a new form of representation
for three-dimensional (3-D) video objects. We have developed
a technique to extract disparity and texture data from video
objects that are captured simultaneously with multiple-camera
configurations. For this purpose, we derive an “area of interest”
(AOI) for each of the camera views, which represents an area on
the video object’s surface that is best visible from this specific
camera viewpoint. By combining all AOI’s, we obtain the video
object plane as an unwrapped surface of a 3-D object, containing
all texture data visible from any of the cameras. This texture
surface can be encoded like any 2-D video object plane, while
the 3-D information is contained in the associated disparity
map. It is then possible to reconstruct different viewpoints from
the texture surface by simple disparity-based projection. The
merits of the technique are efficient multiview encoding of single
video objects and support for viewpoint adaptation functionality,
which is desirable in mixing natural and synthetic images. We
have performed experiments with the MPEG-4 video verification
model, where the disparity map is encoded by use of the tools
provided for grayscale alpha data encoding. Due to its simplicity,
the technique is suitable for applications that require real-time
viewpoint adaptation toward video objects.
RAR 669 KB
|
?
|
| Joo-Hee Moon, Ji-Heon Kweon, and Hae-Kwang Kim |
Boundary Block-Merging (BBM) Technique for Efficient Texture Coding of Arbitrarily Shaped Object |
Abstract—We present an efficient texture coding method which
enhances the coding efficiency of conventional discrete cosine
transform (DCT) with padding techniques for arbitrarily shaped
objects in object-based video coding where shape information
is provided. The BBM (boundary block-merging) technique is
applied to the boundary macroblocks of 16 × 16 pixels of a VOP
(video object plane) which consist of both background and object
pixels. A macroblock consists of four subblocks of 8 × 8 pixels.
For boundary subblocks consisting of object and background
pixels, padding is performed in the background region. For a
pair of padded boundary subblocks in a macroblock of which
alignment belongs to a predefined set, one subblock is rotated
180° and merged into another one if object pixels do not overlap.
After merging, the boundary macroblock is coded using the conventional
DCT coding. The merging process reduces the number
of subblocks to be DCT coded, and high correlation between
adjacent subblocks makes the number of DCT coding bits small.
Experimentation has been done on various test sequences under
different test conditions, and verifies significant coding efficiency
improvement: reduction of coding bits for luminance boundary
blocks by 5.7–11.9% at the same PSNR values compared with
the padding-based DCT without BBM.
RAR 1443 KB
|
?
|
| Di Zhong and Shih-Fu Chang |
An Integrated Approach for Content-Based Video Object Segmentation and Retrieval |
Abstract—Object-based video data representations enable unprecedented
functionalities of content access and manipulation. In
this paper, we present an integrated approach using region-based
analysis for semantic video object segmentation and retrieval.
We first present an active system that combines low-level region
segmentation with user inputs for defining and tracking semantic
video objects. The proposed technique is novel in using an integrated
feature fusion framework for tracking and segmentation
at both region and object levels. Experimental results and extensive
performance evaluation show excellent results compared to
existing systems. Building upon the segmentation framework, we
then present a unique region-based query system for semantic
video object. The model facilitates powerful object search, such
as spatio-temporal similarity searching at multiple levels.
RAR 404 KB
|
?
|
| Munchurl Kim, Jae Gark Choi, Daehee Kim, Hyung Lee, Myoung Ho Lee, Chieteuk Ahn, and Yo-Sung Ho |
A VOP Generation Tool: Automatic Segmentation of Moving Objects in Image Sequences Based on Spatio-Temporal Information |
Abstract—The new MPEG-4 video coding standard enables
content-based functionalities. In order to support the philosophy
of the MPEG-4 visual standard, each frame of video sequences
should be represented in terms of video object planes (VOP’s).
In other words, video objects to be encoded in still pictures or
video sequences should be prepared before the encoding process
starts. Therefore, it requires a prior decomposition of sequences
into VOP’s so that each VOP represents a moving object. This
paper addresses an image segmentation method for separating
moving objects from the background in image sequences.
The proposed method utilizes the following spatio-temporal
information. 1) For localization of moving objects in the image
sequence, two consecutive image frames in the temporal direction
are examined and a hypothesis testing is performed by comparing
two variance estimates from two consecutive difference images,
which results in an F-test. 2) Spatial segmentation is performed to
divide each image into semantic regions and to find precise object
boundaries of the moving objects. The temporal segmentation
yields a change detection mask that indicates moving areas
(foreground) and nonmoving areas (background), and spatial segmentation
produces spatial segmentation masks. A combination
of the spatial and temporal segmentation masks produces VOP’s
faithfully. This paper presents various experimental results.
with objects, and reuse of content information by scene composition,
which are all suitable for multimedia applications.
RAR 681 KB
|
?
|
| Sotiris Malassiotis and Michael G. Strintzis |
Tracking Textured Deformable Objects Using a Finite-Element Mesh |
Abstract—This paper presents an algorithm for the estimation
of the motion of textured objects undergoing nonrigid deformations
over a sequence of images. An active mesh model,
which is a finite-element deformable membrane, is introduced
in order to achieve efficient representation of global and local
deformations. The mesh is constructed using an adaptive
triangulation procedure that places more triangles over high
detail areas. Through robust least squares techniques and modal
analysis, efficient estimation of global object deformations is
achieved, based on a set of sparse displacement measurements.
A local warping procedure is then applied to minimize the
intensity matching error between subsequent images, and thus
estimate local deformations. Among the major contributions of
this paper are novel techniques developed to acquire knowledge
of the object dynamics and structure directly from the image
sequence, even in the absence of prior intelligence regarding
the scene. Specifically, a coarse-to-fine estimation scheme is first
developed, which adapts the model to locally deforming features.
Subsequently, principal components modal analysis is used to
accumulate knowledge of the object dynamics. This knowledge
is finally exploited to constrain the object deformation. The
problem of tracking the model over time is addressed, and a novel
motion-compensated prediction approach is proposed to facilitate
this. A novel method for the determination of the dynamical
principal axes of deformation is developed. The experimental
results demonstrate the efficiency and robustness of the proposed
scheme, which has many potential applications in the areas of
image coding, image analysis, and computer graphics.
RAR 627 KB
|
?
|
| Chuang Gu and Ming-Chieh Lee |
Semiautomatic Segmentation and Tracking of Semantic Video Objects |
Abstract—This paper introduces a novel semantic video object
extraction system using mathematical morphology and a perspective
motion model. Inspired by the results from the study of
the human visual system, we intend to solve the semantic video
object extraction problem in two separate steps: supervised I-frame
segmentation, and unsupervised P-frame tracking. First,
the precise semantic video object boundary can be found using a
combination of human assistance and a morphological segmentation
tool. Second, the semantic video objects in the remaining
frames are obtained using global perspective motion estimation
and compensation of the previous semantic video object plus
boundary refinement as used for I frames.
RAR 376 KB
|
?
|
| Yining Deng and B. S. Manjunath |
NeTra-V: Toward an Object-Based Video Representation |
Abstract—We present here a prototype video analysis and
retrieval system, called NeTra-V, that is being developed to
build an object-based video representation for functionalities
such as search and retrieval of video objects. A region-based
content description scheme using low-level visual descriptors is
proposed. In order to obtain regions for local feature extraction,
a new spatio-temporal segmentation and region-tracking scheme
is employed. The segmentation algorithm uses all three visual
features: color, texture, and motion in the video data. A group
processing scheme similar to the one in the MPEG-2 standard is
used to ensure the robustness of the segmentation. The proposed
approach can handle complex scenes with large motion. After
segmentation, regions are tracked through the video sequence
using extracted local features. The results of tracking are sequences
of coherent regions, called “subobjects.” Subobjects are
the fundamental elements in our low-level content description
scheme, which can be used to obtain meaningful physical objects
in a high-level content description scheme. Experimental results
illustrating segmentation and retrieval are provided.
RAR 347 KB
|
?
|
| Hiroyuki Katata, Norio Ito, and Hiroshi Kusao |
Temporal-Scalable Coding Based on Image Content |
Abstract—An object-based temporal scalability codec
is proposed by introducing shape coding, a new motion
estimation/compensation method, weighting techniques, and
background composition. The major feature of this technique is
determining the frame rate of the selected objects in the motion
picture individually so that the motion of the selected region is
smoother than that of the other area. The observation of the
computer simulation proves that the proposed method achieves
the better image quality and it enables us to represent the
motion of the selected objects hierarchically.
RAR 355 KB
|
?
|
| Philippe Salembier, Ferran Marqués, Montse Pardàs, Josep Ramon Morros, Isabelle Corset, Sylvie Jeannin, Lionel Bouchard, Fernand Meyer, and Beatriz Marcotegui |
Segmentation-Based Video Coding System Allowing the Manipulation of Objects |
Abstract—This paper presents a generic video coding algorithm
allowing the content-based manipulation of objects. This manipulation
is possible thanks to the definition of a spatiotemporal
segmentation of the sequences. The coding strategy relies on a
joint optimization in the rate-distortion sense of the partition
definition and of the coding techniques to be used within each
region. This optimization creates the link between the analysis
and synthesis parts of the coder. The analysis defines the time
evolution of the partition, as well as the elimination or the
appearance of regions that are homogeneous either spatially or in
motion. The coding of the texture as well as of the partition relies
on region-based motion compensation techniques. The algorithm
offers a good compromise between the ability to track and
manipulate objects and the coding efficiency.
RAR 1509 KB
|
?
|
| Kevin J. O’Connell |
Object-Adaptive Vertex-Based Shape Coding Method |
Abstract—The paper presents a new technique for compactly representing
the shape of a visual object within a scene. This method encodes the
vertices of a polygonal approximation of the object’s shape by adapting
the representation to the dynamic range of the relative locations of the
object’s vertices and by exploiting an octant-based representation of
each individual vertex. The object-level adaptation to the relative-location
dynamic range provides the flexibility needed to efficiently encode objects
of different sizes and with different allowed approximation distortion.
At the vertex-level, the octant-based representation allows coding gains
for vertices closely spaced relative to the object-level dynamic range.
This vertex coding method may be used with techniques which code
the polygonal approximation error for further gains in coding efficiency.
Results are shown which demonstrate the effectiveness of the vertex
encoding method. The rate-distortion comparisons presented show that
the technique’s adaptive nature allows it to operate efficiently over a wide
range of rates and distortions and across a variety of input material,
whereas other methods are efficient over more limited conditions.
RAR 121 KB
|
?
|
| Emmanuel Reusens, Touradj Ebrahimi, and Murat Kunt |
Dynamic Coding of Visual Information |
Abstract—This paper introduces a novel approach to visual
data compression. The approach, named dynamic coding, consists
of an effective competition between several representation
models used for describing data portions. The image data is
represented as the union of several regions each approximated
by a representation model locally appropriate. The dynamic
coding concept leads to attractive features such as genericness,
flexibility, and openness and is therefore particularly suited to
a multimedia environment in which many types of applications
are involved. Dynamic coding is a general proposal to visual
data compression and many variations on the same theme may
be designed. They differ by the particular procedure by which
the data is segmented into objects and the local representation
model selected. As an illustrative example, a video compression
scheme based on the principles of dynamic coding is presented.
This compression algorithm performs a joint optimization of
the segmentation (restricted to a so-called generalized quadtree
partition) together with the representation models associated with
each data segment. Four representation models are competing
namely, fractal, motion compensation, text and graphics, and
background modes. Optimality is defined with respect to a rate-distortion
tradeoff and the optimization procedure leads to a
multicriterion segmentation.
RAR 303 KB
|
?
|
| Mark R. Banham and James C. Brailean |
A Selective Update Approach to Matching Pursuits Video Coding |
Abstract—This paper addresses an approach to video coding
utilizing an iterative nonorthogonal expansion technique called
“matching pursuits” (MP) in combination with a new algorithm
for selecting an appropriate coding technique at each frame
in a sequence. This decision algorithm is called “selective update”
and is based on an estimate of the amount and type of
motion occurring between coded frames in a video sequence.
This paper demonstrates that the matching pursuits approach
is most efficient for video coding when motion compensation
results in prediction error which is well localized to the edges
of moving objects. In the presence of global motion, such as
panning and zooming, or in the presence of objects entering or
leaving a scene, matching pursuits becomes less effective than
orthogonal transform-based coding techniques like the block-based
discrete cosine transform (DCT). The rate-distortion characteristics
of matching pursuits and block-wise DCT coding are
used to demonstrate how MP coding can be more efficient than
block-wise DCT-based coding. When an appropriate combination
of these nonorthogonal and orthogonal transforms are used
for encoding a complete low bit-rate video sequence, improved
overall compression efficiency can be achieved. Results are shown
which demonstrate the effectiveness of a hybrid video codec based
on this concept.
RAR 304 KB
|
?
|
| François Brémond and Monique Thonnat |
Tracking Multiple Nonrigid Objects in Video Sequences |
Abstract—This paper presents a method to track multiple
nonrigid objects in video sequences. First, we present related
works on tracking methods. Second, we describe our proposed
approach. We use the notion of target to represent the perception
of object motion. To handle the particularities of nonrigid objects
we define a target as an individually tracked moving region or
as a group of moving regions globally tracked. Then we explain
how to compute the trajectory of a target and how to compute
the correspondences between known targets and moving regions
newly detected. In the case of an ambiguous correspondence
we define a compound target to freeze the associations between
targets and moving regions until a more accurate information is
available. Finally we provide an example to illustrate the way
we have implemented the proposed tracking method for videosurveillance
applications.
RAR 845 KB
|
?
|
| Kiyoharu Aizawa, Kazuya Kodama, and Akira Kubota |
Producing Object-Based Special Effects by Fusing Multiple Differently Focused Images |
Abstract—We propose a novel approach for producing special
visual effects by fusing multiple differently focused images. This
method differs from conventional image-fusion techniques because
it enables us to arbitrarily generate object-based visual effects such
as blurring, enhancement, and shifting. Notably, the method does
not need any segmentation. Using a linear imaging model, it directly
generates the desired image from multiple differently focused
images.
RAR 666 KB
|
?
|
| Radu S. Jasinschi, Thumpudi Naveen, Ali J. Tabatabai, and Paul Babic-Vovk |
Apparent 3-D Camera Velocity—Extraction and Applications |
Abstract—In this paper, we describe a robust method for the extraction
of the apparent 3-D camera velocity and 3-D scene structure
information. Our method performs the extraction of the apparent
3-D camera velocity in a fully automated way without any
knowledge about 3-D scene content information as used in current
methods. This has the advantage that it can be used to fully automate
the generation of natural-looking virtual/augmented environments,
as well as in video-database browsing. First, we describe our
method for the robust extraction of 3-D parameters. This method is
a combination of the eight-point method in structure-from-motion
with a statistical technique to automatically select feature points in
the image, irrespective of 3-D content information. Second, we discuss
two applications which use the results of the 3-D parameter
extraction. The first application is the generation of sprite layers
using 3-D camera velocity information to represent an eight-parameter
perspective image-to-sprite mapping plus 3-D scene depth
information for the sprite layering. The second application is the
use of 3-D camera velocity for the indexing of large video databases
according to a set of seven independent types of camera motion.
RAR 357 KB
|
?
|
| Chun-Jen Tsai and Aggelos K. Katsaggelos |
Sequential Construction of 3-D-Based Scene Description |
Abstract—Binocular camera systems are commonly used to construct
3-D-based scene description. However, there is a tradeoff
between the length of the camera baseline and the difficulty of
the matching problem and the extent of the field of view of the
3-D scene. A large baseline system provides better depth resolution
than a smaller baseline system at the expense of a narrower field of
view. To increase the depth resolution without increasing the difficulty
of the matching problem and decreasing the field of view of
the 3-D scene, a sequential 3-D-based scene description technique
is proposed in this paper. Multiple small-baseline 3-D scene descriptions
from a single moving camera or an array of cameras are
used to sequentially construct a large baseline 3-D scene description
while maintaining the field of view of a small-baseline system.
A Bayesian framework using a disparity-space image (DSI) technique
for disparity estimation is presented. The cost function for
large baseline image matching is designed based not only on the
photometric matching error, the smoothness constraint, and the
ordering constraint, but also on the previous disparity estimates
from smaller baseline stereo image pairs as a prior model. Texture
information is registered along the scan path of the camera(s). Experimental
results demonstrate the effectiveness of this technique
in visual communication applications.
RAR 545 KB
|
?
|
| In Kyu Park, Il Dong Yun, and Sang Uk Lee |
Automatic 3-D Model Synthesis from Measured Range Data |
Abstract—In this paper, we propose an algorithm to construct
a 3-D surface model from a set of range data, based on a non-uniform
rational B-splines (NURBS) surface-fitting technique. It is
assumed that the range data is initially unorganized and scattered
3-D points, while their connectivity is also unknown. The proposed
algorithm consists of three stages: initial model approximation employing
K-means clustering, hierarchical decomposition of the initial
model, and construction of NURBS surface patch network.
The initial model is approximated by both polyhedral and triangular
models. Then, the initial model is represented by a hierarchical
graph, which is efficiently used to construct the G1-continuous
NURBS patch network of the whole object. Experiments are
carried out on synthetic and real range data to evaluate the performance
of the proposed algorithm. It is shown that the initial
model as well as the NURBS patch network are constructed automatically
with tolerable computation. The modeling error of the
NURBS model is reduced to 10%, compared with the initial mesh
model.
RAR 295 KB
|
?
|
| Haruo Hoshino, Fumio Okano, and Ichiro Yuyama |
A Study on Resolution and Aliasing for Multi-Viewpoint Image Acquisition |
Abstract—We equate multi-viewpoint image acquisition with
object sampling from different viewpoints, and calculate the resolution
of multi-viewpoint camera systems. Aliasing, which occurs
with the sampling, leads to depth shifting of objects. For instance,
an image of a distant object may be taken as if it were near when
aliasing occurs. A condition of the camera pitch free from aliasing
is discussed. An appropriate prefilter for the sampling can eliminate
alias-causing spatial–frequency components, even when the
camera pitch is large. We analyze the characteristics of the prefilters
from the aspects of depth shifting, ghosting, and waveform
distortion. The experimental results show that a prefilter, which
reduces ghosting, can be realized optically. For precise acquisition
of multi-viewpoint images, however, a prefilter with electrical processing
is needed.
RAR 354 KB
|
?
|
| André Redert, Emile Hendriks, and Jan Biemond |
3-D Scene Reconstruction with Viewpoint Adaptation on Stereo Displays |
Abstract—In this paper, we propose a generic algorithm for
the geometrically correct reconstruction of 3-D scenes on stereo
displays with viewpoint adaptation. This forms the basis of
multiviewpoint systems, which are currently the most promising
candidates for real-time implementations of 3-D visual communication
systems. The reconstruction algorithm needs 3-D tracking
of the viewers’ eyes with respect to the display. We analyze the
effect of eye-tracking errors. A simple bound will be derived,
below which reconstruction errors cannot be observed. We design
a multiviewpoint system using a recently introduced image-based
scene representation. The design formed the basis of the real-time
multiviewpoint system that was recently built in the European
PANORAMA project. Experiments with both natural and synthetic
scenes show that the proposed reconstruction algorithm
performs well. The experiments are performed by computer
simulations and the real-time PANORAMA system.
RAR 633 KB
|
?
|
| Fabio Lavagetto and Roberto Pockaj |
The Facial Animation Engine: Toward a High-Level Interface for the Design of MPEG-4 Compliant Animated Faces |
Abstract—In this paper, we propose a method for implementing
a high-level interface for the synthesis and animation of animated
virtual faces that is in full compliance with MPEG-4 specifications.
This method allows us to implement the simple facial object
profile and part of the calibration facial object profile.
In fact, starting from a facial wireframe and from a set of
configuration files, the developed system is capable of automatically
generating the animation rules suited for model animation driven
by a stream of facial animation parameters. If the calibration
parameters (feature points and texture) are available, the system
is able to exploit this information for suitably modifying the
geometry of the wireframe and for performing its animation
by means of calibrated rules computed ex novo on the adapted
somatics of the model.
Evidence of the achievable performance is reported at the end
of this paper by means of figures showing the capability of the
system to reshape its geometry according to the decoded MPEG-4
facial calibration parameters and its effectiveness in performing
facial expressions.
RAR 1230 KB
|
?
|
| Jörgen Ahlberg and Haibo Li |
Representing and Compressing Facial Animation Parameters Using Facial Action Basis Functions |
Abstract—In model-based, or semantic, coding, parameters describing
the nonrigid motion of objects, e.g., the mimics of a face,
are of crucial interest. The facial animation parameters (FAP’s)
specified in MPEG-4 compose a very rich set of such parameters,
allowing a wide range of facial motion. However, the FAP’s are
typically correlated and also constrained in their motion due to
the physiology of the human face. We seek here to utilize this
spatial correlation to achieve efficient compression. As it does
not introduce any interframe delay, the method is suitable for
interactive applications, e.g., videophone and interactive video,
where low delay is a vital issue.
RAR 86 KB
|
?
|
| ? |
Introduction to the Special Issue on Object-Based Video Coding and Description |
RAR 136 KB
|
?
|
| Wilfried Philips |
Comparison of Techniques for Intra-Frame Coding of Arbitrarily Shaped Video Object Boundary Blocks |
Abstract— This paper presents experimental results that
demonstrate that the weakly separable polynomial orthonormal
transform outperforms the shape adaptive discrete cosine
transform (SADCT) and the recently introduced improved
SADCT with ΔDC correction at the expense of a nonprohibitive
increase in the number of computations. Some other
improvements to SADCT-like schemes are also suggested.
... peak signal-to-noise ratio (PSNR) of the O-SADCT is typically 1–2 dB
better than that of the NO-SADCT for any given bit rate.
RAR 75 KB
|
?
|
| Bert DeKnuydt, Stef Desmet, and Luc Van Eycken |
Coding of Dynamic Texture for Mapping on 3-D Scenes |
Abstract—As the availability of powerful 3-D scene renderers
grows, the demand for high visual quality 3-D scenes is increasing.
Besides more detailed geometric and texture information, this
presupposes the ability to map dynamic textures. This is obviously
needed to model movies, computers, and TV screens but also, for
example, for the landscape as seen from inside a moving vehicle
or shadow and lighting effects that are not modeled separately.
Downloading the complete scene to the user, before letting him
interact with the scene, becomes very impractical and inefficient
with huge scenes. If, as is often the case, a back channel is available,
on-demand downloading allows the user to start interacting
with the scene immediately.
Specifically for dynamic texture, if we know the viewpoint
of the user (or several users), we can code the texture taking
into account the viewing conditions, i.e., coding and transmitting
each part of the texture with the required resolution only.
Applications that would benefit from view-dependent coding of
dynamic textures include (but are not limited to) multiplayer
three-dimensional (3-D) games, walkthroughs of dynamic constructions
or scenes, and 3-D simulations of dynamic systems.
In this paper, the feasibility of such a scheme based on an
adapted optimal level allocation video codec is shown together
with the huge data-rate reductions that can be achieved with it.
RAR 743 KB
|
?
|
| Hideyuki Fujishima, Yusuke Takemoto, Takao Onoye, and Isao Shirakawa |
An Architecture of a Matrix-Vector Multiplier Dedicated to Video Decoding and Three-Dimensional Computer Graphics |
Abstract—An architecture of a matrix-vector multiplier (MVM)
is devised, which is dedicated to MPEG-4 natural/synthetic video
decoding. The MVM can perform the matrix-vector multiplication
both in the inverse discrete cosine transform (IDCT) and
in the geometrical transformation of three-dimensional computer
graphics (3-D CG); or, specifically, it can achieve the multiplication
of a 4 × 4 matrix by a four-tuple vector necessary in the
one-dimensional IDCT for eight pixels and in the geometrical
transformation for a point in a 3-D space. This paper describes a
new architecture of this MVM and also shows the implementation
result of a functional module composed of four MVM’s with the
use of 440-k transistors, which can operate at 20 MHz or less.
RAR 433 KB
|
?
|
| G. L. Foresti |
A Real-Time System for Video Surveillance of Unattended Outdoor Environments |
Abstract—This paper describes a visual surveillance system
for remote monitoring of unattended outdoor environments. The
system, which works in real time, is able to detect, localize, track,
and classify multiple objects moving in a surveilled area. The
object classification task is based on a statistical morphological
operator, the statistical pecstrum (called specstrum), which is
invariant to translations, rotations, and scale variations, and it
is robust to noise. Classification is performed by matching the
specstrum extracted from each detected object with the specstra
extracted from multiple views of different real object models
contained in a large database. Outdoor images are used to test
the system in real functioning conditions. Performances about
good classification percentage, false and missed alarms, viewpoint
invariance, noise robustness, and processing time are evaluated.
RAR 258 KB
|
?
|
| A. Murat Tekalp, Yucel Altunbasak, and Gozde Bozdagi |
Two- Versus Three-Dimensional Object-Based Video Compression |
Abstract— This paper compares two-dimensional (2-D) and three-dimensional
(3-D) object modeling in terms of their capabilities and
performance (peak signal-to-noise-ratio and visual image quality) for
very low bitrate video coding. We show that 2-D object-based coding
with affine/perspective transformations and triangular mesh models can
simulate almost all capabilities of 3-D object-based approaches using
wireframe models at a fraction of the computational cost. Furthermore,
experiments indicate that a 2-D mesh-based coder–decoder performs
favorably compared to the new H.263 standard in terms of visual quality.
RAR 479 KB
|
?
|
| Minoru Etoh, Choong Seng Boon, and Shinya Kadono |
Template-Based Video Coding with Opacity Representation |
Abstract— We describe an image coding scheme based on
multiple templates for interactive audio-visual (A/V) database
retrieval. The image sequence to be coded consists of overlapping
image planes with opacity information and depth ordering. In this
method, each image plane is independently encoded to a different
bit stream where each video object sequence is reconstructed
from representative frames (i.e., templates) by global and local
deformations. Owing to this coding scheme, the following new
functionalities are supported: identification and manipulation
of two-dimensional (2-D) video objects, selective decoding and
browsing of visual contents, and very high compression efficiency.
We have extended an MPEG1 coding software to an object-based
coding software. Experimental results of the proposed scheme
prove the above advantages.
RAR 665 KB
|
?
|
| Federico Pedersini, Augusto Sarti, and Stefano Tubaro |
Visible Surface Reconstruction With Accurate Localization of Object Boundaries |
Abstract—A common limitation of many techniques for 3-D reconstruction
from multiple perspective views is the poor quality of
the results near the object boundaries. The interpolation process
applied to “unstructured” 3-D data (“clouds” of non-connected
3-D points) plays a crucial role in the global quality of the 3-D reconstruction.
In this paper, we present a method for interpolating
unstructured 3-D data, which is able to perform a segmentation
of such data into different data sets that correspond to different
objects. The algorithm is also able to perform an accurate localization
of the boundaries of the objects. The method is based on an
iterative optimization algorithm. As a first step, a set of surfaces
and boundary curves are generated for the various objects. Then,
the edges of the original images are used for refining such boundaries
as best as possible. Experimental results with real data are
presented for proving the effectiveness of the proposed algorithm.
RAR 1127 KB
|
?
|
| Liang Zhang |
Automatic Adaptation of a Face Model Using Action Units for Semantic Coding of Videophone Sequences |
Abstract—The topic of investigation is automatic adaptation
of a face model at the beginning of a videophone sequence
for implementing mimic analysis by means of action units in a
semantic coder. Here, not only the face model is to be adapted to
match the real face, but also initial values of action units are to be
determined. In the proposed algorithm, eye and mouth features
are first estimated using deformable templates. Then, the face
model Candide is adapted to these estimated features in three
steps, namely: 1) the global adaptation; 2) the local adaptation;
and 3) the mimic adaptation. For the mimic adaptation, six
action units are used and their initial values are determined. The
proposed adaptation algorithm differs from previous works in the
following aspects: 1) there is no restriction on the rotation for the
global adaptation of the face model and 2) initial values of action
units are determined due to the mimic adaptation. The proposed
algorithm has been tested on synthetic images and
natural head-and-shoulder videophone sequences with a spatial
resolution corresponding to CIF and a frame rate of 10 Hz. The
average errors for the estimation of eye and mouth features and
for the adaptation of the face model amount to 1.936 (pel) and
2.009 (pel), respectively. With this adaptation algorithm, mimic
analysis for semantic coding by means of action units in the
subsequent frames is realizable.
RAR 482 KB
|
?
|
| Nebojsa Jojic, Jin Gu, Helen C. Shen, and Thomas S. Huang |
Computer Modeling, Analysis, and Synthesis of Dressed Humans |
Abstract—In this paper, we present computer vision techniques
for building dressed human models using images. In the first
part, we develop an algorithm for three-dimensional body reconstruction
and texture mapping using contour, stereo, and texture
information from several images and deformable superquadrics
as the model parts. In the second part, we demonstrate a novel
vision technique for analysis of cloth draping behavior. This
technique allows for estimation of cloth model parameters, such
as bending properties, but can also be used to estimate the contact
points between the body and clothing in the range data of dressed
humans. Combined with our body reconstruction algorithm and
additional constraints on the articulation model, the detection
of the garment–body contact points allows construction of a
dressed human model in which even the geometry that was
covered by clothing in the available data is reasonably well
estimated.
RAR 905 KB
|
?
|