# Dedicated Multicore Reconfigurable Processor for Image Processing

### Nirmal T M

M.Tech Scholar, Electronics & Communication Dept. Sahrdaya College of Engineering & Technology Thrissur, Kerala, India nirmal.graphic@gmail.com

### Anoop Suraj A

Assistant Professor, Electronics & Communication Dept.
Sahrdaya College of Engineering & Technology
Thrissur, Kerala, India
anoopsuraj@ieee.org

Abstract— Field programmable gate array (FPGA) technology has become a viable target for implementing real-time algorithms suited to video and image processing applications, and it is a powerful tool in this area. Binary image processing is useful in various areas, such as object recognition, tracking, motion detection and machine intelligence, image analysis and understanding, video processing, computer vision, and identification and authentication systems. This paper mainly focuses on implementing a processor architecture that runs two image processing algorithms simultaneously on a single processor. The processor is implemented on an FPGA with a high degree of parallelism. The processor architecture comprises a reconfigurable binary processing module, input and output image control units, a display control unit, and peripheral circuits. The reconfigurable binary processing module, which comprises mixed-grained binary compute units and reconfigurable output control logic, performs binary image processing operations, particularly mathematical morphology operations and related algorithms, at more than 200 frames/s for $1024 \times 1024$ images. The architecture in this paper also includes a subsidiary core, parallel to the first core, dedicated to a complex image processing algorithm. The paper focuses on an optimized implementation of pixel-level fusion based on the discrete wavelet transform (DWT). Pixel-level fusion is the most convenient approach, since it uses the original pixel values of the images and can be performed in both the spatial and the transform domains. Spatial-domain fusion acts directly on the pixels of the source images, but it may introduce spatial and spectral distortions into the fused image. These disadvantages are overcome by a minor modification, and the fusion technique incorporated in this manner is called VTVA. This paper presents a detailed description of the methodology of the VTVA fusion method. The realizable fusion method is a linear pixel-level method and is implemented on a field-programmable gate-array-based hardware system. Future work in this area is intended to extend the method to other image modalities and to objectively evaluate image fusion methods in real time.

Keywords— Binary image processing, field-programmable gate array (FPGA), mixed grained, real time, reconfigurable, image fusion, segmentation, feature extraction, object recognition, DWT, VTVA.

## I. INTRODUCTION

Real-time image processing algorithms are in growing demand. Most current software for running real-time image processing algorithms requires most of the resources of a high-end desktop computer, and it still cannot keep up with the increased scanning speed of real-time applications. Researchers are therefore moving toward hardware implementations that offload much of the image reconstruction workload. FPGAs [5] provide an excellent platform for implementing real-time image processing applications, since the inherent parallelism of the architecture can be exploited explicitly. As a result, image processing tasks executed on FPGAs can run faster than equivalent applications on general-purpose hardware. In this paper, a thorough study of the implementation of an image fusion algorithm using pixel-level DWT is presented.

Binary image processing is extremely useful in various areas, such as object recognition, tracking, motion detection and machine intelligence, image analysis and understanding, video processing, computer vision, and identification and authentication systems. Binary image processing has commonly been implemented on processors such as CPUs or DSPs; however, such processors are inefficient and difficult to use for binary image processing. High-speed binary image processing operations can be realized efficiently with chips specialized for binary image processing, and such chips have therefore attracted much attention in the field of image processing. Along these lines, this paper focuses on the implementation of a dual-core image processor architecture, which is particularly helpful in application-specific areas. The processing speed and processing power can be very large [6]. Complex image processing algorithms have already been implemented on FPGAs, but with high utilization of the FPGA resources. The aim of this architecture is to optimize the core structure and to implement two mutually compatible algorithms on a single FPGA. The algorithms can be changed according to the application, and the reconfigurable nature of the FPGA can easily be exploited for this purpose.

This paper presents a binary image processor that consists of a reconfigurable binary processing module (including reconfigurable binary compute units and output control logic), input and output image control units, and peripheral circuits. The reconfigurable binary compute units have a mixed-grained architecture, which offers high flexibility, efficiency, and performance. The performance of the processor is enhanced by a dynamic reconfiguration approach. The processor is implemented to perform real-time binary image processing. It is found that the processor can process pixel-level images and extract image features, such as boundary and motion images. Basic mathematical morphology operations and complicated algorithms can easily be implemented on it. The processor has the merits of high speed, simple structure, and a wide application range. In addition, it includes a subsidiary core for a DWT-based [3] image fusion algorithm.

Image fusion methods are mainly classified into pixel (low), feature (intermediate), and symbolic (high) levels. Pixel-level techniques that work in the spatial domain have gained considerable interest, mainly because of their simplicity and linearity. Multiresolution analysis is another popular approach to pixel-level image fusion [3]; it uses filters of increasing spatial extent to produce a pyramid of images at different resolutions. In most of these approaches, at each position of the transformed image, the pyramid coefficient with the highest saliency is used. Finally, the inverse transform of the composite image is employed to derive the fused image. In the field of remote sensing, the fusion of multiband images that lie in different spectral bands of the electromagnetic spectrum is one of the main areas of research. The goal is to develop techniques that represent multispectral image data efficiently, i.e., for visualization-oriented applications with a reduced amount of data.
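As an illustration of the multiresolution idea above, the following is a minimal sketch of one-level DWT fusion. It assumes a Haar wavelet, two registered grayscale images of equal, even size, averaging of the approximation (LL) coefficients, and selection of the detail coefficient with the larger absolute value as the saliency rule. The function names `haar2d`, `ihaar2d`, and `fuse_dwt` are hypothetical; the wavelet, decomposition depth, and fusion rules of the paper's DWT core are not specified here.

```python
def haar2d(img):
    """One-level 2-D Haar DWT of an image (list of rows) with even height and width.

    Returns the sub-bands (LL, LH, HL, HH), each half the size in both directions.
    """
    h, w = len(img), len(img[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for r in range(0, h, 2):
        for c in range(0, w, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            i, j = r // 2, c // 2
            LL[i][j] = (a + b + d + e) / 4  # local average
            LH[i][j] = (a + b - d - e) / 4  # top row minus bottom row
            HL[i][j] = (a - b + d - e) / 4  # left column minus right column
            HH[i][j] = (a - b - d + e) / 4  # diagonal difference
    return LL, LH, HL, HH


def ihaar2d(LL, LH, HL, HH):
    """Exact inverse of haar2d."""
    h, w = len(LL), len(LL[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            out[2 * i][2 * j] = ll + lh + hl + hh
            out[2 * i][2 * j + 1] = ll + lh - hl - hh
            out[2 * i + 1][2 * j] = ll - lh + hl - hh
            out[2 * i + 1][2 * j + 1] = ll - lh - hl + hh
    return out


def fuse_dwt(img_a, img_b):
    """Pixel-level fusion: average LL, keep the more salient (larger |coef|) detail."""
    A, B = haar2d(img_a), haar2d(img_b)
    ll = [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(A[0], B[0])]
    details = [
        [[x if abs(x) >= abs(y) else y for x, y in zip(ra, rb)] for ra, rb in zip(da, db)]
        for da, db in zip(A[1:], B[1:])
    ]
    return ihaar2d(ll, *details)


flat = [[10, 10], [10, 10]]
edge = [[0, 8], [0, 8]]
print(fuse_dwt(flat, edge))  # [[3.0, 11.0], [3.0, 11.0]]
```

The fused image keeps the vertical edge of the second input (a step of 8 between columns), while its mean value, 7, is the average of the two inputs. A real multiresolution scheme would repeat the decomposition on LL for several levels.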

There are many image fusion methods that can be used to obtain high-resolution multispectral images from low-resolution multispectral images. Multiresolution image fusion algorithms are particularly helpful in night vision and remote sensing applications. The result of fusion is a new image that is more suitable for human and machine perception and for further image processing operations such as segmentation, feature extraction, and object recognition.

In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to each pixel in an image such that pixels with the same label share certain visual characteristics.

The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). All pixels in a region are similar with respect to some characteristic or computed property, such as color, intensity, or texture, while adjacent regions differ significantly with respect to the same characteristic(s). When applied to a stack of images, as is typical in medical imaging, the contours resulting from segmentation can be used to create 3D reconstructions with the help of interpolation algorithms such as marching cubes.
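To make the idea of labeling pixels concrete, here is a minimal sketch of one simple segmentation scheme: a single global intensity threshold followed by 4-connected component labeling. It is only an illustration under those assumptions, not the segmentation method used in this paper; the names `threshold` and `label_components` and the threshold value 128 are hypothetical.

```python
from collections import deque


def threshold(gray, t):
    """Binarize a grayscale image: 1 where the intensity is at least t."""
    return [[1 if v >= t else 0 for v in row] for row in gray]


def label_components(binary):
    """Give every foreground pixel a label so that 4-connected pixels share it.

    Returns (labels, count); background pixels keep label 0.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not labels[r][c]:
                count += 1  # start a new region and flood it breadth-first
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


gray = [[200, 210, 10, 12],
        [205, 15, 11, 220],
        [9, 8, 230, 225]]
labels, n = label_components(threshold(gray, 128))
print(n)       # 2
print(labels)  # [[1, 1, 0, 0], [1, 0, 0, 2], [0, 0, 2, 2]]
```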

In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction. When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (e.g., the same measurement given in both feet and meters), the input data is transformed into a reduced set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
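As a small example of such a reduction, the sketch below turns a binary region into a six-element feature vector: area, centroid, bounding-box size, and a count of boundary pixels as a crude perimeter. The choice of features and the function name `region_features` are illustrative assumptions, not the feature set used by the processor.

```python
def region_features(binary):
    """Reduce a binary image (list of 0/1 rows) to a short feature vector:
    (area, centroid_row, centroid_col, bbox_height, bbox_width, boundary_pixels).

    A foreground pixel counts as a boundary pixel when at least one of its
    4-neighbours is background or lies outside the image.
    """
    h, w = len(binary), len(binary[0])
    pts = [(r, c) for r in range(h) for c in range(w) if binary[r][c]]
    if not pts:
        return (0, 0.0, 0.0, 0, 0, 0)
    rows = [r for r, _ in pts]
    cols = [c for _, c in pts]
    area = len(pts)

    def is_background(y, x):
        return not (0 <= y < h and 0 <= x < w) or not binary[y][x]

    boundary = sum(
        1 for r, c in pts
        if any(is_background(r + dy, c + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )
    return (area, sum(rows) / area, sum(cols) / area,
            max(rows) - min(rows) + 1, max(cols) - min(cols) + 1, boundary)


square = [[0] * 5 for _ in range(5)]
for r in range(1, 4):
    for c in range(1, 4):
        square[r][c] = 1
print(region_features(square))  # (9, 2.0, 2.0, 3, 3, 8)
```

Here 25 pixel values are reduced to six numbers that still describe the size, position, extent, and boundary of the object.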

In computer vision, object recognition is the task of finding and identifying objects in an image or video sequence. Humans recognize a multitude of objects in images with little effort, even though the appearance of an object may vary with viewpoint, size, and scale, or when the object is translated or rotated. Objects can even be recognized when they are partially obstructed from view. This task is still a challenge for computer vision systems, and many approaches to it have been implemented over many years.

The main idea behind this work is to implement the wavelet transform in a real-time environment by implementing the DWT on an FPGA, on which the pixel-level image fusion algorithm runs. Pixel-level fusion combines images obtained from multiple sensors and is useful for remote sensing and night vision.

The implementation of a real-time image fusion system is very demanding, since it employs algorithms with relatively high runtime complexity. Lately, hardware implementations have emerged as a means of achieving real-time performance in image processing systems. They are tailored primarily to multisensor platforms for video processing applications, such as military, security, and safety deployments. In this paper, a hardware implementation of a real-time fusion system is proposed. The system is based on an Altera Cyclone II FPGA and performs a configurable linear pixel-level fusion algorithm that produces color fused images; it is described in VHSIC hardware description language (VHDL). The overall architecture consists of a control module, a covariance estimation module, a Cholesky decomposition module, and a transformation module. A detailed description of the Cholesky decomposition module is also given.

This paper is organized as follows. Section II reviews related work. Section III presents the image fusion method, and Section IV presents the processor architecture. The processor implementation is described in Section V. In Section VI, the performance of the processor is evaluated and compared with existing processors. Finally, discussions and conclusions are provided in Section VII.

## II. RELATED WORK

Reconfigurable binary image processing chips have been designed to generalize binary image processing applications on a chip. Chips have been presented for fundamental binary morphological operations, such as dilation, erosion, opening, and closing. Programmable analog vision processors based on a cellular nonlinear network or universal machine architecture have been proposed for a wide range of applications, such as motion analysis and texture classification.
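The four morphological operations named above can be sketched for binary images as follows. This is a plain software model that assumes a symmetric structuring element (so that dilation needs no reflection of the element) and treats pixels outside the image as background; the function names are hypothetical and do not describe the cited chips.

```python
def _window_op(binary, se, combine):
    """Apply `combine` (all or any) to the pixels selected by the structuring element."""
    h, w = len(binary), len(binary[0])
    k = len(se) // 2
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [
                binary[r + dy][c + dx] if 0 <= r + dy < h and 0 <= c + dx < w else 0
                for dy in range(-k, k + 1)
                for dx in range(-k, k + 1)
                if se[dy + k][dx + k]
            ]
            out[r][c] = int(combine(vals))
    return out


def erode(binary, se):
    return _window_op(binary, se, all)  # 1 only where the element fits inside the object


def dilate(binary, se):
    return _window_op(binary, se, any)  # 1 wherever the element touches the object


def opening(binary, se):
    return dilate(erode(binary, se), se)  # removes specks smaller than the element


def closing(binary, se):
    return erode(dilate(binary, se), se)  # fills holes smaller than the element


se = [[1] * 3 for _ in range(3)]
block = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
speck = [row[:] for row in block]
speck[0][4] = 1                     # isolated noise pixel
hole = [row[:] for row in block]
hole[2][2] = 0                      # one-pixel hole
print(opening(speck, se) == block)  # True
print(closing(hole, se) == block)   # True
```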

A programmable single instruction multiple data (SIMD) real-time vision chip was presented to achieve high-speed target tracking, and a programmable binary morphology coprocessor was introduced into an on-chip visual content analysis engine used for visual surveillance. A reconfigurable image processing accelerator incorporating eight macro processing elements was designed to support real-time change detection and background-registration-based video object segmentation algorithms. Recently, a massively parallel vision chip architecture with a cellular array of processing elements was presented for image processing using an asynchronous or synchronous processing technique.

It is common practice to build application-specific chips for real-time binary image processing. However, such chips have a limited range of applications. On the other hand, the general-purpose binary image processing chips mentioned above have their own problems. Some of these chips are made of analog circuits, and some consist of analog and digital parts. Compared with their digital counterparts, analog circuits show low robustness, accuracy, and scalability, although they have a small area and low power consumption. Other general-purpose chips use a digital processor array architecture, which assigns a digital processor to every single pixel. When large images are processed, such chips become very large. Therefore, further studies are needed to design high-performance, small-size chips with a wide application range for real-time binary image processing.

Previous works include face/object recognition [4], tracking [5], motion detection and machine intelligence [6], image analysis and understanding [7], video processing [8], computer vision [9], and identification and authentication systems. To recognize natural scenery images containing several objects [4], meaningful image regions should be segmented, extracted, and identified separately to reduce the complexity of the problem. The recognition procedure proposed in [4] consists of coarse region segmentation/extraction, the Gabor wavelet transform (GWT), and dynamic link matching. The resistive-fuse network model is well known to segment images while preserving edges and eliminating noise 1). Several efforts have been made to implement it as analog LSIs 2, 3). However, designing a practical large-scale analog resistive-fuse network circuit (more than 100 × 100 pixels) is very difficult because of unexpected parasitic components and various non-idealities in analog circuits. Accordingly, a resistive-fuse network circuit using pulse-width-modulation cellular neural networks (PWM-CNN) was proposed 4, 5), and a successful LSI implementation for the 1-D case was demonstrated 6). The resistive-fuse network model has also been applied to digital image processing 7).

Computer vision is used in smart embedded systems in a wide range of fields, ranging from robotics to human-computer interaction. Object tracking [5], a basic component of computer vision, can be very beneficial in applications such as unmanned vehicles, surveillance, automated traffic control, biomedical image analysis, and intelligent robots, to name a few. Object tracking is used to identify the trajectories of moving objects in video frame sequences. Like most computer vision tasks, object tracking involves intensive computation to extract the desired information from high-volume video data. Moreover, the real-time processing requirements of many computer vision applications emphasize the need for high-performance implementations of object tracking. The work in [5] proposes an efficient FPGA-based object tracking system that could be employed in a wide range of embedded systems requiring high performance and low power. As shrinking process technologies allow more transistors on a silicon die, FPGAs have become attractive computing platforms for complex applications with high-performance and low-power requirements. With hundreds of thousands of configurable logic blocks, as well as thousands of distributed memory and DSP hardware modules, they offer great flexibility for mapping applications onto spatially parallel architectures. However, benefiting from their reconfigurable hardware modules requires efficient algorithm mapping that carefully balances performance, area, and power. That work describes an object tracking application on an Altera Stratix III FPGA. By profiling the software implementation, performance bottlenecks were identified, and a hardware architecture was designed that effectively leverages the spatial parallelism of the reconfigurable fabric and exposes the different types of parallelism inherent in the selected object tracking algorithm. The reported experimental results show a significant performance improvement (over 100×) compared with software execution for multiobject video.

Many embedded DSP systems implement DSP algorithms on a DSP chip with a single processing core and high memory-bandwidth connections. An alternative approach based on an FPGA embedded system for image processing was developed in [6]. Field programmable gate arrays (FPGAs) are widely used in embedded applications such as automotive, communications, industrial automation, motor control, and medical imaging. An FPGA is chosen because of its reconfigurability: without requiring hardware changes, FPGA-based devices can be updated with new bitstream files, which extends product life. FPGAs have grown to have the capacity to hold an entire system on a single chip, and at the same time they allow the system platform to be tested and debugged.

In addition, FPGAs provide the opportunity to use hardware/software co-design to develop high-performance systems for various applications by incorporating processors (hard or soft processor cores), on-chip buses, memory, and hardware accelerators for specific software functions.

Image fusion has attracted a lot of interest in recent years. As a result, different fusion methods have been proposed, mainly in the fields of remote sensing and computer (e.g., night) vision [7], and hardware implementations have also been presented to cope with real-time processing in different application areas. In this paper, a linear pixel-level fusion method is employed and implemented on a field-programmable gate-array-based hardware system that is well suited to remote sensing data. Our work incorporates a fusion technique (called VTVA) that is a linear transformation based on the Cholesky decomposition of the covariance matrix of the source data. The circuit is composed of several modules, including covariance estimation, Cholesky decomposition, and transformation. The resulting hardware design can be characterized as compact, linear, and configurable, because the color properties of the final fused image can be selected by the user as a way to control the resulting correlation between color components.

Today, a significant number of embedded systems focus on multimedia applications, with an almost insatiable demand for low-cost, high-performance, and low-power hardware [8]. Designing complex systems such as image and video processing, compression, face recognition, object tracking, 3G or 4G modems, multi-standard codecs, and high-definition (HD) decoding schemes requires the integration of many complex blocks and a long verification process. Such complex designs are based on I/O peripherals, one or more processors, bus interfaces, A/D and D/A converters, embedded software, memories, and sensors. A complete system used to be designed with multiple chips connected together on PCBs, but with today's technology all these functions can be incorporated into a single chip.

These systems are called systems-on-chip (SoCs). Designing a complex image and video processing unit is time consuming, and the verification process can take months, depending on the complexity of the system.

Several studies have been carried out in recent years on the design and implementation of multimedia applications on FPGAs, using a systematic formal approach or otherwise. For example, the Streams-C compiler technology maps high-level parallel C language descriptions to circuit-level netlists targeted at FPGAs. To use Streams-C effectively, the programmer must have some application-specific hardware mapping expertise, as well as expertise in parallel programming under the CSP (Communicating Sequential Processes) model of computation. Streams-C consists of a small number of libraries and intrinsic functions added to a subset of C that the user must use to derive synthesizable HDL.

## III. IMAGE FUSION METHOD

In this section, the vector representation of a multidimensional image and the basic principles of the pixel-level image fusion algorithm, which are helpful for the FPGA implementation [1], are discussed.

A multidimensional image can be represented in vector form as follows. Consider an image with M·N pixels per channel and K different channels; each pixel can then be represented by the K-dimensional vector

$$X = [X_1, X_2, \dots, X_K]^T \tag{1}$$

While the mean vector $\bar{X} = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_i$ defines the average or expected position of the pixels in the vector space, the covariance matrix describes their scatter:

$$C_x = \frac{1}{M \cdot N} \sum_{i=1}^{M \cdot N} X_i X_i^T - \bar{X}\bar{X}^T \tag{2}$$

The covariance matrix can be used to quantify the correlation between the multispectral bands. In the case of a high degree of correlation, the corresponding off-diagonal elements of the covariance matrix will be large. The correlation between the different multispectral components can also be described by the correlation coefficient. The correlation coefficient r is related to the corresponding covariance matrix element, since it is that element divided by the product of the standard deviations of the corresponding multispectral components ($r_{ij} = c_{ij}/(\sigma_i\sigma_j)$). The correlation coefficient matrix $R_x$ has as elements the correlation coefficients between the $i$th and $j$th multispectral components. Accordingly, all the diagonal elements are one, and the matrix is symmetric.
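Eq. (2) and the correlation coefficients $r_{ij} = c_{ij}/(\sigma_i\sigma_j)$ can be computed in a single pass over the pixels by accumulating $\sum X_i$ and $\sum X_i X_i^T$, which matches the streaming order in which a hardware covariance estimator would see the data. The following is a minimal software sketch under that assumption; the function names are hypothetical, and floating point is used instead of any particular fixed-point format.

```python
import math


def covariance_matrix(pixels):
    """Estimate the mean vector and C_x of eq. (2) in one pass.

    pixels: list of K-component pixel vectors X_i (M*N of them).
    """
    k = len(pixels[0])
    s1 = [0.0] * k                      # running sum of X_i
    s2 = [[0.0] * k for _ in range(k)]  # running sum of X_i X_i^T
    for x in pixels:
        for i in range(k):
            s1[i] += x[i]
            for j in range(k):
                s2[i][j] += x[i] * x[j]
    n = len(pixels)
    mean = [v / n for v in s1]
    cov = [[s2[i][j] / n - mean[i] * mean[j] for j in range(k)] for i in range(k)]
    return mean, cov


def correlation_matrix(cov):
    """R_x with r_ij = c_ij / (sigma_i * sigma_j); the diagonal is one."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]


pixels = [(1, 2), (2, 4), (3, 6)]  # second channel = 2 x first channel
mean, cov = covariance_matrix(pixels)
print(mean)                        # [2.0, 4.0]
print(correlation_matrix(cov))     # off-diagonal values equal (or very close to) 1
```

The one-pass form subtracts two potentially large quantities, so it can lose precision when the mean is large compared with the spread of the data; this is worth keeping in mind when choosing word lengths.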

An important case is the Karhunen–Loeve transform, also known as principal component analysis (PCA). For this transformation, the matrix $C_x$ is real and symmetric, so a set of K orthonormal eigenvectors can always be found. Let $e_i$ and $\lambda_i$, $i = 1, 2, \ldots, K$, be the eigenvectors and the corresponding eigenvalues of $C_x$, arranged in descending order of eigenvalue.

Furthermore, let A be a matrix whose rows are formed by the eigenvectors of  $C_x$  ordered so that the first row of A is the eigenvector corresponding to the largest eigenvalue and the last row is the eigenvector corresponding to the smallest one. The matrix A is the transformation matrix that maps vector X into Y

$$Y = A(X - \bar{X}) \tag{3}$$

The mean of Y resulting from that transformation is zero, and the covariance matrix  $C_y$  is given by

$$C_y = A C_x A^T \tag{4}$$

The resulting covariance matrix  $C_y$  will be diagonal, and the elements along the main diagonal are the eigenvalues of  $C_x$ . The off-diagonal elements of the covariance matrix are zero, denoting that the elements of the vector population Y are uncorrelated. This transformation will establish a new coordinate system whose origin is at the centroid of the population and whose axes are in the direction of the eigenvectors of  $C_x$ . This coordinate system clearly shows that the transformation in (3) is a rotation transformation that aligns the eigenvectors with the data, and this alignment is exactly the mechanism that decorrelates the data.
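For two channels, the eigenvectors of the 2 × 2 covariance matrix have a closed form, which makes it easy to check numerically that the transform is a rotation that decorrelates the data. The sketch below follows the convention of (4), in which the rows of A are the eigenvectors ordered by decreasing eigenvalue, so that Y = A(X − X̄). The two-channel restriction and the name `pca_2ch` are assumptions made for illustration only.

```python
import math


def pca_2ch(pixels):
    """Karhunen-Loeve transform of two-channel pixel vectors.

    Returns (A, ys): the rows of A are the eigenvectors of C_x (largest
    eigenvalue first), and ys are the transformed vectors A (X - mean).
    """
    n = len(pixels)
    m0 = sum(p[0] for p in pixels) / n
    m1 = sum(p[1] for p in pixels) / n
    a = sum((p[0] - m0) ** 2 for p in pixels) / n
    c = sum((p[1] - m1) ** 2 for p in pixels) / n
    b = sum((p[0] - m0) * (p[1] - m1) for p in pixels) / n
    # Angle of the principal (maximum-variance) axis of [[a, b], [b, c]].
    theta = 0.5 * math.atan2(2 * b, a - c)
    ct, st = math.cos(theta), math.sin(theta)
    A = [[ct, st], [-st, ct]]  # a pure rotation
    ys = [(A[0][0] * (p[0] - m0) + A[0][1] * (p[1] - m1),
           A[1][0] * (p[0] - m0) + A[1][1] * (p[1] - m1)) for p in pixels]
    return A, ys


A, ys = pca_2ch([(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)])
n = len(ys)
cov_y01 = sum(y0 * y1 for y0, y1 in ys) / n  # off-diagonal element of C_y
var_y0 = sum(y0 * y0 for y0, _ in ys) / n
var_y1 = sum(y1 * y1 for _, y1 in ys) / n
print(abs(cov_y01) < 1e-12, var_y0 >= var_y1)  # True True
```

The off-diagonal element of $C_y$ vanishes and the first component carries the larger variance, which is exactly the property that the following paragraph argues is unsuitable for natural-looking color output.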

The PCA transform is optimal in the sense that the first principal component has the highest contrast and can be displayed as a grayscale image containing the largest percentage of the total variance and, thus, the largest percentage of visual information. This property does not carry over to color images. If the three principal components are used to form a red-green-blue (RGB) image (the first component as red, the second as green, and the third as blue), the result is not optimal for the human visual system. The first principal component (red) will exhibit a high degree of contrast, the second (green) will display only a limited range of brightness values, and the third (blue) an even smaller range. In addition, the three components displayed as R, G, and B are totally uncorrelated, an assumption that does not hold for natural images. Therefore, a color image whose RGB channels are the first three principal components of the source multispectral channels usually has unnatural correlation properties compared with natural color images.

A different approach for the RGB representation of multispectral data is not to decorrelate the data totally but to control the correlation between the color components of the final image. This is achieved by means of the covariance matrix. The proposed transformation distributes the energy of the source multispectral bands so that the correlation between the RGB components of the final image may be selected by the user or a visual expert, or adjusted to be similar to that of natural color images. For example, the mean correlation between the red-green, red-blue, and green-blue channels could be calculated over a database containing a large number of natural images. This can be achieved using a linear transformation of the form

$$Y = A^{T}X \tag{5}$$

where X and Y are the population vectors of the source and the final images, respectively. The relation between the covariance matrices is

$$C_y = A^{T}C_{x}A \tag{6}$$

where  $C_x$  is the covariance of the vector population X and  $C_y$  is the covariance of the resulting vector population Y. The required values for the elements in the resulting covariance matrix  $C_y$  are based on the study of natural color images. The RGB correlation coefficients depend on the scenes depicted in the images. However, since a large variety of images with different scenes, perceptually pleasing for the observer, have been chosen from the database, the mean value of the correlation coefficients is not affected by the selection of the scenes. The matrices  $C_x$  and  $C_y$  are of the same dimension, and if they are known, the transformation matrix A can be evaluated using the Cholesky factorization method. Accordingly, a symmetric positive definite matrix S can be decomposed by means of an upper triangular matrix Q so that

$$S = Q^T Q \tag{7}$$

Using the aforementioned factorization, the matrices $C_x$ and $C_y$ can be written as

$$\begin{aligned} C_x &= Q_x^T Q_x \\ C_y &= Q_y^T Q_y \end{aligned} \tag{8}$$

Substituting (8) into (6) gives $C_y = A^T Q_x^T Q_x A = (Q_x A)^T (Q_x A)$, which is satisfied by choosing

$$Q_{y} = Q_{x}A \tag{9}$$

And the transformation matrix A is

$$A = Q_x^{-1} Q_y \tag{10}$$
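The derivation (6) to (10) can be checked numerically. The sketch below computes the upper-triangular Cholesky factors, inverts $Q_x$ by back substitution, forms $A = Q_x^{-1} Q_y$, and verifies that $A^T C_x A$ reproduces $C_y$. It is a floating-point software model with hypothetical function names and example matrices chosen only to be symmetric positive definite; it is not the hardware Cholesky module.

```python
import math


def cholesky_upper(S):
    """Upper-triangular Q with S = Q^T Q, eq. (7); S must be symmetric positive definite."""
    k = len(S)
    Q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        d = S[i][i] - sum(Q[p][i] ** 2 for p in range(i))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        Q[i][i] = math.sqrt(d)
        for j in range(i + 1, k):
            Q[i][j] = (S[i][j] - sum(Q[p][i] * Q[p][j] for p in range(i))) / Q[i][i]
    return Q


def inv_upper(Q):
    """Inverse of an upper-triangular matrix by back substitution, column by column."""
    k = len(Q)
    inv = [[0.0] * k for _ in range(k)]
    for j in range(k):  # solve Q x = e_j
        for i in range(k - 1, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(Q[i][p] * inv[p][j] for p in range(i + 1, k))
            inv[i][j] = rhs / Q[i][i]
    return inv


def matmul(X, Y):
    return [[sum(X[i][p] * Y[p][j] for p in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def vtva_matrix(Cx, Cy):
    """Eq. (10): A = Qx^-1 Qy, with Qx and Qy the Cholesky factors of eq. (8)."""
    Qx, Qy = cholesky_upper(Cx), cholesky_upper(Cy)
    return matmul(inv_upper(Qx), Qy)


Cx = [[4.0, 2.0, 0.6], [2.0, 2.0, 0.5], [0.6, 0.5, 1.0]]  # example source covariance
Cy = [[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]]  # example target covariance
A = vtva_matrix(Cx, Cy)
check = matmul(matmul(transpose(A), Cx), A)                # should equal Cy, eq. (6)
print(max(abs(check[i][j] - Cy[i][j]) for i in range(3) for j in range(3)) < 1e-9)  # True
```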

The final form of the transformation matrix A implies that the proposed transformation depends on the statistical properties of the original multispectral data set. Additionally, the statistical properties of natural color images are taken into account in the design of the transformation. The resulting population vector Y has the same dimension as the original population vector X, but only three of the components of Y are used for color representation.

The relation between the covariance matrix $C_y$ and the correlation coefficient matrix $R_y$ is given by

$$C_{y} = \Sigma R_{y} \Sigma^{T} \tag{11}$$

where

$$\Sigma = \begin{bmatrix} \sigma_{y1} & 0 & \cdots & 0 \\ 0 & \sigma_{y2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{yK} \end{bmatrix} \tag{12}$$

is the diagonal matrix with the standard deviations of the new vectors on the main diagonal, and $R_y$ is the desired correlation coefficient matrix, whose diagonal elements are one and whose off-diagonal elements are the desired correlation coefficients between the output components.

Fig. 1. Flowchart for the proposed system.

The necessary steps for implementing the method are shown in Fig. 1 and can be summarized as follows.

- 1) Estimate the covariance matrix $C_x$ of the population vectors X.
- 2) Compute the covariance matrix $C_y$ of the population vectors Y, using the correlation coefficient matrix $R_y$ and the diagonal matrix $\Sigma$ in (11).
- 3) Decompose the covariance matrices $C_x$ and $C_y$ using the Cholesky factorization in (8), by means of the upper triangular matrices $Q_x$ and $Q_y$, respectively.
- 4) Compute the inverse of the upper triangular matrix $Q_x$, namely, $Q_x^{-1}$.
- 5) Compute the transformation matrix A in (10).
- 6) Compute the transformed population vectors Y using (5).
- 7) Scale the mapped images to the range [0, 255] in order to produce the RGB representation.
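Steps 2, 6, and 7 are simple enough to sketch directly. The hypothetical functions below build $C_y = \Sigma R_y \Sigma^T$ from chosen standard deviations and correlation coefficients, apply $Y = A^T X$ to each pixel vector, and linearly stretch one output channel to [0, 255]. The min-max stretch is an assumption, since the exact scaling rule of step 7 is not specified; the matrix A is taken as an input, computed as in steps 3 to 5.

```python
def target_covariance(sigmas, R):
    """Step 2, eq. (11): C_y = Sigma R_y Sigma^T with Sigma = diag(sigmas)."""
    k = len(sigmas)
    return [[sigmas[i] * R[i][j] * sigmas[j] for j in range(k)] for i in range(k)]


def transform_pixels(A, pixels):
    """Step 6, eq. (5): Y = A^T X for every K-component pixel vector X."""
    k = len(A)
    return [[sum(A[p][i] * x[p] for p in range(k)) for i in range(k)] for x in pixels]


def scale_to_8bit(channel):
    """Step 7 (assumed min-max stretch): map one output channel to integers in [0, 255]."""
    lo, hi = min(channel), max(channel)
    if hi == lo:
        return [0] * len(channel)
    return [round(255 * (v - lo) / (hi - lo)) for v in channel]


Cy = target_covariance([2.0, 1.0, 0.5], [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]])
print(Cy[0])                                # [4.0, 1.0, 0.0]
A = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # example A, not from the paper
print(transform_pixels(A, [[10, 20, 30]]))  # [[30.0, 20.0, 30.0]]
print(scale_to_8bit([0, 1, 3]))             # [0, 85, 255]
```

Note the transpose in `transform_pixels`: the first output component combines the first column of A with the input vector, which is why the example yields 10 + 20 = 30 for it.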

## IV. ARCHITECTURE

The presented processor is designed for applications in image and video processing, computer vision, machine intelligence, and identification and authentication systems. The processor should offer high flexibility and high performance across the whole range of applications; thus, the processor design is oriented toward high flexibility and speed. Some traditional designs target specific applications, and some occupy large areas and have high power consumption. Therefore, a reconfigurable binary processing module with high speed and a simple structure is adopted for wide application, and it requires fewer hardware resources. The proposed architecture of the processor is shown in Fig. 4.1. The core of the processor is the reconfigurable binary processing module, which comprises binary compute units and output control logic. The processor also includes two bus interfaces, input and output image control units, a process control unit, and a configuration program group.



Fig. 4.1. Architecture of the binary image processor.

### A. Reconfigurable binary processing module

The diagram of the reconfigurable binary processing module (RBPM) is given in Fig. 4.2. It can be divided into two main parts. The first part, the output control logic, selects the output from the binary compute units according to the given parameters and converts the serial data of 1-b binary images into parallel data. The second part consists of the binary compute units, which perform binary logic operations on binary images at high speed. Binary image algorithms are realized by computing operations in separate units and connecting the units in different patterns. The units can execute binary image operations in a pipelined or parallel way. The operation executed in each binary compute unit is determined by configurable parameters, including logic operation parameters, image resolution parameters, mask sizes, input and output selection parameters, and auxiliary parameters. Fig. 4.3 shows examples of how the reconfigurable binary processing module with eight binary compute units works. In Fig. 4.3(a), the RBPM is reconfigured as an eight-stage pipelined architecture. In Fig. 4.3(b), the RBPM is reconfigured as two four-stage pipelined architectures, so that two image processing tasks can run at the same time. In Fig. 4.3(c)-(e), the RBPM is reconfigured as a parallel structure. In Fig. 4.3(c), eight images undergo the same image processing operation in the eight binary compute units, respectively. In Fig. 4.3(d), the same operation is performed on eight different parts of the image. In Fig. 4.3(e), different operations are performed on the eight parts of an image.

Fig. 4.2. Diagram of the reconfigurable binary processing module.

The reconfigurable architecture achieves higher hardware utilization than a fixed pipelined architecture. For example, if only a single operation is needed, the hardware utilization of the pipelined architecture shown in Fig. 4.3(a) is low and inefficient. The parallel configuration shown in Fig. 4.3(d) increases hardware utilization, and its processing speed is eight times that of the pipelined architecture shown in Fig. 4.3(a). The architecture of the binary compute unit is shown in Fig. 4.4(a). Each binary compute unit consists of two binary compute elements and one operating element set, and it can perform logic, reduction, median filtering, and set operations. The mixed-grained architecture gives the binary compute unit high flexibility, efficiency, and performance, as well as a short reconfiguration time. Granularity refers to the level at which data are manipulated. Usually, there are two types of granularity: fine-grained, corresponding to bit-level data manipulation, and coarse-grained, corresponding to word-level manipulation. A fine-grained architecture is very flexible, whereas a coarse-grained architecture has fewer reconfiguration parameters and is very efficient. A mixed-grained architecture is more flexible and efficient than a coarse-grained architecture, and it has fewer reconfiguration parameters than a fine-grained architecture.



Fig. 3.3. Some examples of the reconfigurable binary processing module. Pipelined configurations: (a) Eight-stage pipelined architecture. (b) Two four-stage pipelined architectures. Parallel configurations: (c) Eight images undergo the same image processing operation. (d) The same operation is performed on eight different parts of an image. (e) Different operations are performed on eight parts of an image. BCU: binary compute unit.

The operating element set can perform binary set operations such as union, intersection, complement, subtraction, addition, and straight-through output. The inputs of the operating element set and the outputs of the binary compute unit are routed through two sets of multiplexers, respectively, which makes the architecture of the unit more flexible. Through the multiplexers, the input of the operating element set can be selected from the results of the binary logic elements, the reduction results, and the median filtering results. Likewise, the output of the binary compute unit can be selected from the original input of the unit, the results of the binary logic elements, the reduction results, the median filtering results, and the result of the operating element set. The operating element set has a fine-grained architecture: its operands are 1 b wide, so its logic block operates on single bits, which gives it high flexibility and efficiency.
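The multiplexer routing described above can be sketched per pixel as follows. This is an illustrative model, not the processor's RTL: the dictionary keys, the function name `bcu_output`, and the encoding of each set operation on 1-b operands are assumptions. "Addition" is left out because its 1-b definition is not given in the text.

```python
from typing import Callable, Dict

# Fine-grained operating element set on 1-b operands (names are hypothetical).
SET_OPS: Dict[str, Callable[[int, int], int]] = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "complement": lambda a, b: a ^ 1,        # unary: b is ignored
    "subtraction": lambda a, b: a & (b ^ 1),  # A minus B
    "xor": lambda a, b: a ^ b,
    "through": lambda a, b: a,               # straight-through output
}

SET_INPUTS = ("logic", "reduction", "median")  # input multiplexer choices

def bcu_output(sources: Dict[str, int], sel_a: str, sel_b: str,
               set_op: str, out_sel: str) -> int:
    """Route one pixel through a hypothetical binary compute unit.

    sources: per-pixel results keyed 'input', 'logic', 'reduction', 'median'.
    sel_a, sel_b: input multiplexer selections feeding the set element.
    out_sel: output multiplexer selection, 'set' or any key of `sources`.
    """
    if sel_a not in SET_INPUTS or sel_b not in SET_INPUTS:
        raise ValueError("set element inputs must be logic/reduction/median")
    set_result = SET_OPS[set_op](sources[sel_a], sources[sel_b])
    return {**sources, "set": set_result}[out_sel]

# Usage: one pixel whose logic result is 1 and whose reduction result is 0.
px = {"input": 1, "logic": 1, "reduction": 0, "median": 1}
diff = bcu_output(px, "logic", "reduction", "subtraction", "set")  # 1 AND NOT 0
```

Keeping the set element's inputs restricted to the three computed results, while the output multiplexer can also pass the original input, mirrors the asymmetry between the two multiplexer sets in the description.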



Fig. 4.4. (a) Architecture of the binary compute unit. (b) Architecture of the binary compute element.

When the size of the image processing block is $n \times n$, $n - 1$ line memories with a depth equal to the image width are needed to buffer the image signals. During video image processing, the input data are selected from the configuration register group or the SDRAM. The binary logic element can perform logic operations such as AND, OR, NOT, NAND, NOR, XOR, XNOR, and straight-through output. The reduction element performs reduction AND, reduction OR, reduction NAND, reduction NOR, reduction XOR, reduction XNOR, and straight-through output. The operating element set performs set operations such as union, intersection, complement, subtraction, and XOR. All results from the binary logic element, the reduction element, and the binary median filter are output through three multiplexers and synchronized with the other binary compute units. The binary compute element has a coarse-grained architecture featuring high performance and a short reconfiguration time. The binary compute unit is programmable and configurable because programmable logic is used in the design of the binary logic element, the reduction element, and the binary median filter in the binary compute element, as well as in the operating element set and the multiplexers. In summary, the binary compute unit is suitable for binary image processing because of its high performance, flexibility, and short reconfiguration time.
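The $n - 1$ line-memory requirement can be illustrated with a streaming model. This is a sketch, not the processor's implementation: `stream_windows` is a hypothetical name, pixels are assumed to arrive one per step in raster order, and image borders are simply skipped. With $n = 5$ it keeps exactly four line memories of depth equal to the image width, plus an $n \times n$ shift register of columns.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

def stream_windows(pixels: List[int], width: int, n: int
                   ) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Yield (center_row, center_col, n x n window) for every fully valid
    window of a raster-order 1-b pixel stream, using n - 1 line memories."""
    line_mems = [[0] * width for _ in range(n - 1)]  # oldest buffered row first
    columns: Deque[List[int]] = deque(maxlen=n)      # n x n column shift register
    r = n // 2
    for idx, pixel in enumerate(pixels):
        y, x = divmod(idx, width)
        if x == 0:
            columns.clear()                          # new line: restart window
        # Vertical column at x: n - 1 buffered rows plus the incoming pixel.
        column = [mem[x] for mem in line_mems] + [pixel]
        for i in range(n - 2):                       # shift buffered rows up
            line_mems[i][x] = line_mems[i + 1][x]
        if n > 1:
            line_mems[n - 2][x] = pixel              # newest row
        columns.append(column)
        if y >= n - 1 and len(columns) == n:
            window = [[columns[c][row] for c in range(n)] for row in range(n)]
            yield y - r, x - r, window

# Usage: 5 x 5 windows (four line memories) over a small 6 x 7 test image.
img = [[int((3 * y + 5 * x) % 4 == 0) for x in range(7)] for y in range(6)]
windows = list(stream_windows([p for row in img for p in row], width=7, n=5))
```

Each line memory needs one entry per pixel of a line, which is why the depth tracks the maximum supported image width.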

### B. Input and output control logic units

The image signals must be synchronized by the input control logic unit before they enter the reconfigurable binary processing module, because a one-to-one match is required between the pixels of different images. The input control logic unit selects the input images from the video input, the SDRAM, and the configuration registers, and synchronizes them through the synchronization circuit. The block diagram of the input control logic unit is shown in Fig. 4.5.



Fig. 4.5. Block diagram of the input control logic unit.



Fig. 4.6. The simulated architecture.

The unit consists of four data converters and a synchronization circuit. Data converters 1 and 2 convert 1-b serial image signals into 32-b parallel data, the same format as the data from the SDRAM and the configuration registers. Data converters 3 and 4 convert parallel data back into 1-b image signals, which are then synchronized by the synchronization circuit. To increase the processing rate, two down-sampling circuits down-sample the image signals before they are processed by data converters 1 and 2. The output control logic unit writes the parallel image data selected from the reconfigurable binary processing module into the SDRAM through bus interface 1.
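The serial-to-parallel conversion can be sketched as bit packing into 32-b words. The bit order (first pixel in the most significant bit), zero padding of a final partial word, and decimation by keeping every second pixel of a line are assumptions made for illustration; the function names are hypothetical.

```python
from typing import List

WORD = 32  # parallel word width stated in the text

def serial_to_parallel(bits: List[int]) -> List[int]:
    """Pack a 1-b serial pixel stream into 32-b words, first pixel in the
    most significant bit (assumed order). A final partial word is zero-padded."""
    words = []
    for start in range(0, len(bits), WORD):
        chunk = bits[start:start + WORD]
        chunk = chunk + [0] * (WORD - len(chunk))
        word = 0
        for b in chunk:
            word = (word << 1) | (b & 1)
        words.append(word)
    return words

def parallel_to_serial(words: List[int], n_bits: int) -> List[int]:
    """Inverse conversion (as in data converters 3 and 4): unpack n_bits pixels."""
    bits = [(w >> (WORD - 1 - i)) & 1 for w in words for i in range(WORD)]
    return bits[:n_bits]

def downsample(bits: List[int], factor: int = 2) -> List[int]:
    """Hypothetical 1-D decimation of one image line: keep every factor-th pixel."""
    return bits[::factor]

# Usage: an 80-pixel line is down-sampled to 40 pixels, packed into two words.
line = [int(x % 3 == 0) for x in range(80)]
ds = downsample(line)
words = serial_to_parallel(ds)
```

Packing 32 pixels per word lets one 32-b bus transfer move a whole word of binary pixels, which is why the converted data share the SDRAM word format.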

### C. Process control unit and configuration registers

The process control unit reads the configuration information from the configuration registers and controls the operation of the reconfigurable binary processing module. It also controls the input and output control logic units and the bus interface during data access. After the processed image data are written to the SDRAM, the process control unit sends interrupt requests to handle the interaction between the processor and external systems. The configuration register group is an extremely important part of the proposed processor: it holds control parameters, reconfiguration information, operating parameters, and interaction information. Most of the registers in the group are written by an external CPU via the system bus, and the rest are written by internal modules of the proposed processor.
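The interplay between the register group, the process control unit, and the interrupt can be sketched behaviorally. Everything here is hypothetical: the class names, the split of the 13 registers (the group size quoted in Section V) into twelve CPU-written registers and one internally written status flag, and the use of a Python list to stand in for SDRAM.

```python
from typing import Callable, Iterable, List

class RegisterGroup:
    """Hypothetical 32-b configuration register group. Indices in cpu_regs
    are written by the external CPU over the system bus; the remaining ones
    (here a 'done' status flag) only by internal modules."""

    def __init__(self, size: int, cpu_regs: Iterable[int]) -> None:
        self.regs = [0] * size
        self.cpu_regs = set(cpu_regs)

    def cpu_write(self, idx: int, value: int) -> None:
        if idx not in self.cpu_regs:
            raise PermissionError(f"register {idx} is written internally")
        self.regs[idx] = value & 0xFFFFFFFF

    def internal_write(self, idx: int, value: int) -> None:
        self.regs[idx] = value & 0xFFFFFFFF

class ProcessControlUnit:
    """Reads the configuration, runs one frame, writes results to a list
    standing in for SDRAM, then raises an interrupt request."""

    def __init__(self, regs: RegisterGroup, done_reg: int) -> None:
        self.regs, self.done_reg, self.irq = regs, done_reg, False

    def run_frame(self, process_line: Callable[[int, List[int]], int],
                  n_lines: int, sdram: List[int]) -> None:
        config = list(self.regs.regs)                 # read configuration
        for line in range(n_lines):
            sdram.append(process_line(line, config))  # write back over the bus
        self.regs.internal_write(self.done_reg, 1)    # status set internally
        self.irq = True                               # interrupt to external CPU

# Usage: register 12 is a hypothetical status flag; the CPU may not write it.
regs = RegisterGroup(13, cpu_regs=range(12))
regs.cpu_write(0, 0x5)                                # hypothetical parameter
try:
    regs.cpu_write(12, 1)
    cpu_wrote_status = True
except PermissionError:
    cpu_wrote_status = False
pcu, sdram = ProcessControlUnit(regs, done_reg=12), []
pcu.run_frame(lambda line, cfg: line * cfg[0], n_lines=4, sdram=sdram)
```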



Fig. 4.7. Modified architecture.

The processor can also be modified to operate in a dual-core mode, in which a dedicated processing core is added to achieve high real-time performance. The dedicated core can contain blocks such as image fusion or image convolution. Fig. 4.7 shows the architecture of the image processor with a separate image fusion processing module.

## V. CIRCUIT IMPLEMENTATION

In this section, an image processing system based on the proposed binary image processor is implemented on an Altera Stratix II EP2S180C4 field-programmable gate array (FPGA) to verify the performance and feasibility of the processor in binary image processing. The following shows the main circuit blocks of the system.

### A. Binary compute unit

Considering the size, generality, and usability of the processor, the width of the binary logic element is set to 32 b, and the maximum image processing block size is $5 \times 5$. The two binary inputs of each binary compute element are the configuration register and the video image signal, respectively. Each binary compute element contains four line memories of length 1280 (the maximum size of the processed image is $1280 \times 720$, depending on the resolution of the video camera). The reduction element computes a 32-b, 25-b, or 9-b reduction operation, as selected by the configuration registers, and the binary median filter performs 32-b, 25-b, or 9-b binary median filtering. The 25-b operation is valid from bit 4 to bit 28 of the binary logic element output, whereas the 9-b operation is valid from bit 12 to bit 20. For the straight-through operation of the reduction element, the output is bit 16 of the binary logic element output. As shown in Fig. 4.1, the reconfigurable binary processing module contains four binary compute units and two converters. The inputs of binary compute unit 1 are the outputs of the input control logic unit, and the inputs of binary compute units 2/3/4 are the outputs of the input control logic unit and of binary compute units 1/2/3, respectively. The multiplexers in each binary compute unit determine which binary input is processed. Each binary compute unit requires 116 bits of control and configuration parameters, and for the whole reconfigurable binary processing module a $13 \times 32$-bit configuration register group is used for reconfiguration control and image processing.
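The bit ranges above can be made concrete with a small model of the reduction element and binary median filter. The valid ranges (bits 4 to 28 for 25 b, bits 12 to 20 for 9 b, bit 16 for straight-through) come from the text; numbering bit 0 as the least significant bit, the tie rule for the 32-b median, and the `pack_5x5` layout are assumptions. Note that bit 16 is the center of both ranges, so the hypothetical packing puts the center pixel of a $5 \times 5$ window there.

```python
import operator
from functools import reduce
from typing import Dict, List, Tuple

# Valid bit ranges of the 32-b binary logic element output (from the text);
# bit 0 = least significant bit is an assumption.
RANGES: Dict[int, Tuple[int, int]] = {32: (0, 31), 25: (4, 28), 9: (12, 20)}
CENTER_BIT = 16  # straight-through output of the reduction element
_BASE = {"and": operator.and_, "or": operator.or_, "xor": operator.xor}
_INVERTED = {"nand": "and", "nor": "or", "xnor": "xor"}

def bits_of(word: int, width: int) -> List[int]:
    lo, hi = RANGES[width]
    return [(word >> i) & 1 for i in range(lo, hi + 1)]

def reduction(word: int, width: int, op: str) -> int:
    """Reduction AND/OR/XOR/NAND/NOR/XNOR over the valid bits, or 'through'."""
    if op == "through":
        return (word >> CENTER_BIT) & 1
    base = _INVERTED.get(op, op)
    result = reduce(_BASE[base], bits_of(word, width))
    return result ^ 1 if op in _INVERTED else result

def binary_median(word: int, width: int) -> int:
    """Majority vote over the valid bits; ties (only possible for 32 b)
    give 0, which is an assumed rule."""
    bits = bits_of(word, width)
    return int(sum(bits) > len(bits) // 2)

def pack_5x5(window: List[List[int]]) -> int:
    """Hypothetical packing: row-major pixel k of a 5 x 5 window goes to
    bit 4 + k, so the center pixel (k = 12) lands on bit 16."""
    word = 0
    for k, p in enumerate(v for row in window for v in row):
        word |= (p & 1) << (4 + k)
    return word

# Usage: a 3 x 3 block of ones centered in a 5 x 5 window.
window = [[int(1 <= y <= 3 and 1 <= x <= 3) for x in range(5)] for y in range(5)]
word = pack_5x5(window)
eroded = reduction(word, 25, "and")      # 5 x 5 erosion
dilated = reduction(word, 25, "or")      # 5 x 5 dilation
median = binary_median(word, 25)         # 9 ones out of 25
center = reduction(word, 25, "through")  # center pixel
```

With this packing, erosion and dilation with a $5 \times 5$ square structuring element reduce to the 25-b AND and OR reductions, and the binary median is a majority vote.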

### B. Image processing system

The binary image processing system built around the proposed processor is shown in Fig. 4.2. The system adopts a dual-bus architecture for efficient data access. SDRAM1 serves as the main memory of the CPU, and SDRAM2 stores the images. The CPU acts as the system controller, and register group 2 and the interrupt controller are also used to control the system. Dynamic reconfiguration is applied to reconfigure the binary image processor. Because of the mixed-grained architecture of the binary image processor, the reconfiguration parameters are reduced to $24 \times 32$ bits, and the reconfiguration time is less than 30 cycles.

### C. Fusion subsystem implementation

To implement the VTVA fusion method efficiently in FPGA hardware, the block-based system architecture shown in Fig. 5.1 was derived. The circuit is composed of seven blocks (marked in blue), where the Cholesky decomposition, the matrix inversion, and the transformation matrix are integrated in one subsystem, as defined in Section V. The overall architecture has four data inputs and four data outputs, and one control signal is also required, as listed in Table I.



Fig. 5.1. Block-based architecture of the VTVA fusion system.

TABLE I VTVA Fusion System I/O Signals

| Ports | No. Bits | Description |
|-------|----------|-------------|
| Inputs | | |
| $X_{1i}$ | 8 | The i-th component of the original population vector $X_1$ |
| $X_{2i}$ | 8 | The i-th component of the original population vector $X_2$ |
| $X_{3i}$ | 8 | The i-th component of the original population vector $X_3$ |
| $X_{4i}$ | 8 | The i-th component of the original population vector $X_4$ |
| EN | 1 | Active-high signal declaring valid input data |
| Outputs | | |
| $Y_{1i}$ | 8 | The i-th component of the transformed population vector $Y_1$ |
| $Y_{2i}$ | 8 | The i-th component of the transformed population vector $Y_2$ |
| $Y_{3i}$ | 8 | The i-th component of the transformed population vector $Y_3$ |
| $Y_{4i}$ | 8 | The i-th component of the transformed population vector $Y_4$ |
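This section does not spell out the VTVA equations, so the following is only a hedged illustration of how blocks with the names in Table III (covariance matrix, Cholesky decomposition and transformation matrix, linear transformation, scaling) can fit together. It uses a generic Cholesky-based transform that imposes a chosen output covariance; it should not be read as the published VTVA formulation. The target covariance `c_y` is a hypothetical user-chosen parameter, and the function names are illustrative. Inputs and outputs are four 8-bit components per pixel, as in Table I.

```python
import numpy as np

def correlation_controlled_transform(x: np.ndarray, c_y: np.ndarray) -> np.ndarray:
    """x: 4 x N array, rows are the population vectors X1..X4 (one column per
    pixel). c_y: target 4 x 4 output covariance (positive definite).
    Returns Y = A (X - mean(X)) with A = L_y L_x^-1, so that cov(Y) = c_y."""
    xc = x - x.mean(axis=1, keepdims=True)
    c_x = np.cov(xc)                  # covariance matrix block
    l_x = np.linalg.cholesky(c_x)     # C_x = L_x L_x^T
    l_y = np.linalg.cholesky(c_y)     # C_y = L_y L_y^T
    a = l_y @ np.linalg.inv(l_x)      # transformation matrix
    return a @ xc                     # linear transformation block

def scale_to_8bit(y: np.ndarray) -> np.ndarray:
    """Scaling block: stretch each output component linearly onto 0..255."""
    lo = y.min(axis=1, keepdims=True)
    hi = y.max(axis=1, keepdims=True)
    span = np.maximum(hi - lo, 1e-12)
    return np.round(255 * (y - lo) / span).astype(np.uint8)

# Usage: four 8-bit source bands of 1000 pixels and a hypothetical target
# covariance in which Y1 and Y2 are mildly correlated.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(4, 1000)).astype(float)
c_y = np.array([[400.0, 100.0, 0.0, 0.0],
                [100.0, 400.0, 0.0, 0.0],
                [0.0, 0.0, 400.0, 0.0],
                [0.0, 0.0, 0.0, 400.0]])
y = correlation_controlled_transform(x, c_y)
y8 = scale_to_8bit(y)  # 8-bit outputs
```

The choice $A = L_y L_x^{-1}$ works because $A C_x A^T = L_y L_x^{-1} L_x L_x^T L_x^{-T} L_y^T = L_y L_y^T = C_y$, which is the kind of user-controlled relation between output components that the conclusion attributes to the configurable fusion method.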

### D. Synthesis results

The processor was synthesized with the SMIC 0.18-$\mu$m cell library, and the synthesis results are shown in Table II. The processor was then implemented on an Altera Stratix II EP2S180C4 FPGA board for verification. The detailed hardware consumption of each component of the processor is shown in Table III. Eight $4 \times 1280$-b line memories are implemented with the block memories of the reconfigurable binary processing module. When the depth of the line memories is increased, the processor can process larger images; for example, if the maximum image width is 1920 pixels, the line memory depth can be set to 1920. The block memories of the input and output control logic units are used as buffers to synchronize images.

On the FPGA, one binary compute unit occupies 641 ALUTs, 457 registers, and 10 240 bits of memory. At a clock frequency of 100 MHz, dilation or erosion can be performed at 200 M pixels/s. This means that a frame rate of approximately 200 f/s is achieved when dilation or erosion with a $5 \times 5$ structuring element is applied to a $1024 \times 1024$ image. The number of binary compute units can be adjusted to meet a target processor performance.
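The figures above can be cross-checked with plain arithmetic. The sketch assumes, as the text implies, two binary compute elements per unit, four 1280-b line memories per element, and four binary compute units in the module, and it ignores video blanking intervals. It shows that 200 M pixels/s at 100 MHz corresponds to two pixels per clock cycle, that a $1024 \times 1024$ frame then takes about 5.24 ms (about 190.7 f/s, which the text rounds to 200 f/s), and that four units at 641 ALUTs and 457 registers each reproduce the reconfigurable binary processing module row of Table IV.

```python
from typing import List

# Figures quoted in the text; the rest is plain arithmetic.
CLOCK_HZ = 100e6          # clock frequency
PIXEL_RATE = 200e6        # dilation/erosion throughput, pixels per second
FRAME = 1024 * 1024       # pixels in a 1024 x 1024 frame

pixels_per_cycle = PIXEL_RATE / CLOCK_HZ     # pixels handled per clock
frame_time_ms = 1e3 * FRAME / PIXEL_RATE     # ignoring blanking intervals
frame_rate = PIXEL_RATE / FRAME              # frames per second

# Per-unit resources from the text; assumes 2 compute elements per unit,
# each with four 1280-b line memories, and four units in the module.
BCU_ALUTS, BCU_REGS = 641, 457
bcu_memory_bits = 2 * 4 * 1280
rbpm_totals = (4 * BCU_ALUTS, 4 * BCU_REGS, 4 * bcu_memory_bits)

def line_memory_depths(n: int, max_width: int) -> List[int]:
    """n - 1 line memories, each as deep as the widest supported image."""
    return [max_width] * (n - 1)

# Usage: a 5 x 5 block on 1920-pixel-wide images needs four 1920-deep lines.
hd_lines = line_memory_depths(5, 1920)
```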

## VI. HARDWARE SYSTEM SYNTHESIS RESULTS

The resource utilization of the normal processor is shown in Table III.

TABLE II Resource Utilization of the Normal Image Processor on an Altera Stratix II EP2S180C4 FPGA

| Process                | SMIC 0.18-μm |
|------------------------|--------------|
| Area (mm²)             | 2.56         |
| Gate count (K)         | 45           |
| Memory (mm²)           | 1.96         |
| Power consumption (mW) | 98.5         |
| Speed (MHz)            | 220          |

For the modified processor, consider the case of the fusion mechanism. The image fusion core alone requires the resources given in Table IV, so the FPGA must be programmed to accommodate both processors in a single system. This can be achieved by programming each logic cell separately.

TABLE III Occupied Resources of the Fusion Implementation on an Altera Stratix II EP2S180C4 FPGA

| System's Unit | Logic Registers (Available 34593) | LUT/LC | Logic Cells (Available 33216) | Memory Bits | Multiplier 9-bit |
|---------------|-----------------------------------|--------|-------------------------------|-------------|------------------|
| Covariance Matrix | 2590 (7.49%) | 217 | 2807 (8.45%) | 70 | 16 |
| Variance Y | 535 (1.55%) | 266 | 801 (2.41%) | 0 | 0 |
| Cholesky Decomposition / Transformation Matrix | 1725 (4.99%) | 1100 | 2825 (8.50%) | 0 | 11 |
| Linear Transformation | 660 (1.91%) | 240 | 900 (2.71%) | 0 | 30 |
| Scaling | 876 (2.53%) | 723 | 1599 (4.81%) | 0 | 0 |
| Control Unit | 80 (0.23%) | 282 | 362 (1.09%) | 0 | 0 |
| Overall System | 6466 (18.7%) | 2828 | 9294 (27.97%) | 70 | 57 |

The FPGA utilization of the normal image processor is given in Table IV.

TABLE IV Utilization of the Image Processor on an Altera Stratix II EP2S180C4 FPGA

| Module                                 | ALUTs | LC Registers | Block Memory (bits) |
|----------------------------------------|-------|--------------|---------------------|
| Reconfigurable binary processing module | 2564  | 1828         | 40960               |
| Input and output logic                 | 1331  | 844          | 19456               |
| Register and control logic             | 264   | 809          | 0                   |
| Whole processor                        | 4159  | 3481         | 60416               |

Table IV shows the total utilization of the modified dual-core image processor on the FPGA. It covers both the normal image processor and the add-on processor; in this case, the analysis is based on the image fusion architecture.

## VII. CONCLUSION

In this paper, a reconfigurable binary image processor was proposed to perform real-time binary image processing. The modified processor is fast enough for application-oriented real-time image processing, and it was implemented on an EP2S180 field-programmable gate array. The fusion method discussed, VTVA, is configurable: it allows the user to control the relation between the input images to obtain a better fusion result, and the fused result is a color image. A hardware realization based on FPGA technology provides a fast and compact solution for image fusion. The dedicated sections describe in detail the methodology used to transform the VTVA fusion method into a hardware-realizable process. Future work is planned to extend this approach to other image modalities and to evaluate image fusion methods objectively in real time.

The processor consists of a reconfigurable binary processing module, input and output image control units, and peripheral circuits. The reconfigurable binary processing module has a mixed-grained architecture with high efficiency and performance. Dynamic reconfiguration was used to increase the processor performance. Basic mathematical morphology operations and complicated algorithms can easily be implemented on it because of its simple structure. The processor, featuring high speed, a simple structure, and a wide application range, is suitable for binary image processing tasks such as object recognition, object tracking and motion detection, computer vision, identification, and authentication.

Optimizing and finalizing the processor configurations is left as future work.

## REFERENCES

- [1] Bin Zhang, Kuizhi Mei, and Nanning Zheng, "Reconfigurable Processor for Binary Image Processing," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 5, pp. 823-831, May 2013.
- [2] C. T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing Image Processing Algorithms on FPGAs," Eleventh Electronics New Zealand Conference, Palmerston North, New Zealand, pp. 118-123, 2004.
- [3] Bruce A. Draper, J. Ross Beveridge, A. P. Willem Böhm, Charles Ross, and Monica Chawathe, "Accelerated Image Processing on FPGAs," IEEE Transactions on Image Processing, vol. 12, no. 12, pp. 1543-1551, 2003.
- [4] T. Nakano, T. Morie, and A. Iwata, "A Face/Object Recognition System Using FPGA Implementation of Coarse Region Segmentation," SICE Annual Conference in Fukui, pp. 1418-1422, August 4-6, 2003.
- [5] Saisudheer A., "Object Tracking System Using Stratix FPGA," International Journal of Computer Engineering Science (IJCES), vol. 3, no. 10, pp. 1-9, ISSN 2250-3439, October 2013.
- [6] S. Nazeer Hussain, K. Sreenivasa Rao, and S. Mohammed Ashfaq, "The Hardware Implementation of Motion Object Detection Based on Background Subtraction," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 10, pp. 1297-1302, October 2013.
- [7] Dimitrios Besiris, Vassilis Tsagaris, Nikolaos Fragoulis, and Christos Theoharatos, "An FPGA-Based Hardware Implementation of Configurable Pixel-Level Color Image Fusion," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, pp. 362-373, February 2012.
- [8] C. Desmouliers, E. Oruklu, S. Aslan, J. Saniie, and F. M. Vallina, "Image and video processing platform for field programmable gate arrays using a high-level synthesis," IEEE Workshop on Embedded Computer Vision, pp. 441-452, June 2005.
- [9] Mainak Sen, Ivan Corretjer, Fiorella Haim, Sankalita Saha, Jason Schlessman, Shuvra S. Bhattacharyya, and Wayne Wolf, "Computer Vision on FPGAs: Design Methodology and its Application to Gesture Recognition," SICE Annual Conference in Fukui, pp. 1418-1422, August 2003.
- [10] Steffen Klupsch, Sorin A. Huss, M. Rumpf, and R. Strzod, "Real Time Image Processing based on Reconfigurable Hardware Acceleration," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 8, pp. 812-835, September 2012.
- [11] Andreas Ellmauthaler, Carla L. Pagliari, and Eduardo A. B. da Silva, "Multiscale Image Fusion Using the Undecimated Wavelet Transform With Spectral Factorization and Nonorthogonal Filter Banks," IEEE Transactions on Image Processing, vol. 22, no. 3, pp. 1005-1017, March 2013.
- [12] Mrityunjay Kumar and Sarat Dass, "A Total Variation-Based Algorithm for Pixel-Level Image Fusion," IEEE Transactions on Image Processing, vol. 18, no. 9, pp. 2137-2143, September 2009.
- [13] K. Kotwal and S. Chaudhuri, "Visualization of hyperspectral images using bilateral filtering," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2308-2316, May 2010.
- [14] K. Mei, B. Zhang, and C. Ge, "A hierarchical and parallel SoC architecture for vision processor," IEICE Electron. Express, vol. 6, no. 19, pp. 1380-1386, 2009.