VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
Cell's biomechanical features extraction from very high
Major: Computer Science
“I hereby declare that the work contained in this thesis is my own and has not been
previously submitted for a degree or diploma at this or any other higher education institution.
To the best of my knowledge and belief, the thesis contains no materials previously published
or written by another person except where due reference or acknowledgement is made.”
“I hereby approve that the thesis in its current form is ready for committee
examination as a requirement for the Bachelor of Computer Science degree at the University
of Engineering and Technology.”
Firstly, I would like to express my sincere gratitude to my supervisor, Assoc.
Prof. Le Thanh Ha of the University of Engineering and Technology, Vietnam National
University, Ha Noi, for their instruction, guidance and research experience.
Secondly, I am grateful to my co-supervisor, M.S. Tran Si Hoai Trung of the
Division of Solid State Physics and Nano Lund, Department of Physics, Lund
University, Sweden, for invaluable assistance and knowledge during our time working together.
Moreover, I am grateful to all the teachers of the University of Engineering
and Technology, VNU for their invaluable lessons which I have learnt during my
I would also like to thank my friends in the K59CA class, University of
Engineering and Technology, VNU.
I greatly appreciate the help and support from the Human Machine Interaction
Laboratory of the University of Engineering and Technology during this project.
Abstract: Interdisciplinary research has been a primary focus of study in recent years. In
particular, computational biology, which includes many aspects of bioinformatics, combines
mathematics, statistics and computer science to solve biology-based problems. Nowadays,
thanks to advanced technology, we are able to collect a vast volume of biological data, faster
than we can analyze it. There is a growing need to develop analytical methods for extracting
accurate quantitative biological features from these available data.
One of the basic kinds of biological data is cell information. Exploiting and interpreting
such data is often complicated and requires a lot of time and effort. Our project
studies cell classification with the goal of breaking new ground in cell
classification methods. As a basis for further study, we first aim at the pre-analysis of the
data: the extraction of accurate quantitative features that we can exploit from our data.
This thesis introduces a new method for extracting a cell’s biomechanical features in order to
interpret biological cell data. Despite the simplicity of the method and some drawbacks, it
yields accurate features that can be evaluated. These extracted biomechanical features
help us to understand and study the available cell data in the project.
Keywords: Bioinformatics, Biomechanical features extraction.
Summary: Interdisciplinary research has been one of the main directions of development in
recent years. Prominent within it is computational biology, which covers many aspects of
bioinformatics and combines mathematics, statistics and computer science to solve
biology-based problems. Today, thanks to modern inventions, we can extract a great deal of
biological data, faster than we can process and analyze it. There is a growing need to develop
methods and software that support the analysis and extraction of accurate quantitative
biological values.
One of the basic kinds of biological data is the biological cell. Studying and exploiting
cell data has always been a complex and difficult problem. Our project
studies cell classification with the goal of finding a new breakthrough
in cell classification methods. To investigate further, we first need to pre-process
the data and extract the possible, accurate features from the cell data.
This thesis introduces a new method for extracting the biomechanical features
of biological cells. Although the method is simple and still has some limitations, it produces
accurate feature results that have been evaluated. From these feature data, we gain a
deeper understanding of the biological cells in the project.
Keywords: Bioinformatics, Biomechanical features extraction.
Nowadays, interdisciplinary research has become a primary focus of study.
With the help of modern laboratory technology, biologists are able to collect a huge
amount of data, more quickly than they can analyze it. With the
improvement of the internet and cloud storage, it is possible to share and store data among
biology research centres. We now have a vast volume of biological data without the
means and techniques to interpret it.
This creates the need to develop new analytical methods for extracting
accurate quantitative features from these available biological data. Computational
biology, which includes many aspects of bioinformatics, combines
mathematics, statistics and computer science to solve biology-based problems. Computational
biologists develop and apply software tools, statistical physics and algorithm design to
analyze biological data.
With the help of statistical, mathematical and computational software,
biologists can make quantitative predictions from the available data, interpret it, and
explore more sophisticated and highly complex problems.
1.2. Contribution and thesis overview
In biology, images of cells obtained through a microscope are a basic form of data.
The study of these biological data through research has been a complex and
The purpose of this thesis is to propose a new algorithm that
extracts the biomechanical features of cells from high spatio-temporal videos. These features
help us to understand and study the characteristics of the cells in the data.
By using SIFT, affine transformation and several simple algorithms, we create an
overall interface to extract accurate, evaluable quantitative biomechanical features of
1.2.2. Thesis overview
The rest of this thesis consists of four parts.
Chapter 2 introduces our project overview and the biological cell data, which are
high spatio-temporal videos. We describe the basic attributes of the data:
what we can see and the standard information. We then present the challenges of analyzing
biomechanical features from these cell data that need to be solved.
Chapter 3 describes the methods we use for extracting the
biomechanical features of cells from high spatio-temporal videos. We extract two
kinds of features: shape features, which comprise the area, perimeter and deformability of
the cell, and speed features, which comprise the translation speed and rotation speed of the cell.
Chapter 4 discusses the resulting graphs of the biomechanical
features, the interface, the drawbacks and the evaluation of the accuracy of these extracted
Chapter 5 is the conclusion: what we have achieved, what we have not, and
how this thesis can be developed in the future.
2.1. Project experiment
The project studies different types of cells. A mixture of cells was
loaded onto a microfluidic sorting chip. Based on the typical geometry of the pillar array,
different fractions of cells were separated and then collected after the operation (Figure
2.1). To be able to evaluate the deformation of an individual cell, a high-speed camera
(up to 10,000 fps) was used to record and export the videos.
Our objective is to extract the cell’s biomechanical features from the
videos recorded at Capture 1, where the cells have not yet been separated. By retrieving
these features, we gain a better understanding of the data and
of the biological cells under study.
Each experiment takes one type of cell as input. For each input, we can control
the microscope to select the coordinates, scale and length of the recorded video. We can also
control the air pressure applied through the glass panel to create a variety of data. The
stronger the air pressure, the faster the cells move, and the more clearly we can see the
differences between cells in their biomechanical features.
Deformability-based cell separation experiment
Figure 2.1. Overview of the experiment of deformability-based cell separation. The
images of cells are captured by a high-speed camera and then extracted by our
2.2. Data extraction
The cell data are recorded by a high-speed camera at 9095 frames per
second and a scale of 0.108 µm per pixel. The computer automatically detects the frames
that contain cells and extracts them, then concatenates these frames into video data.
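As a quick illustration of how these two recording parameters turn pixel and frame measurements into physical units, the following sketch uses the values stated above. The helper names are hypothetical and are not taken from the project code.

```python
# Recording parameters quoted in the text above.
FRAMES_PER_SECOND = 9095
UM_PER_PIXEL = 0.108  # micrometres per pixel


def length_um(pixels):
    """Convert a length in pixels to micrometres."""
    return pixels * UM_PER_PIXEL


def area_um2(pixels):
    """Convert an area in pixels to square micrometres."""
    return pixels * UM_PER_PIXEL ** 2


def speed_um_per_s(pixels_per_frame):
    """Convert a displacement per frame into micrometres per second."""
    return pixels_per_frame * UM_PER_PIXEL * FRAMES_PER_SECOND
```

For example, a displacement of one pixel between consecutive frames corresponds to 0.108 x 9095, or about 982 µm per second.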
We control the air pressure in the glass panel, which increases or
decreases the water pressure and hence the speed of the cell. In this way, we extract four
types of data per cell type, depending on the air pressure: 300 mbar, 500 mbar,
700 mbar and 900 mbar.
Our biological cell data are single-band grayscale with a low amplitude of pixel
intensity. Below each video frame are recording details such as frames per second, cell type,
air pressure, and the date and time of the recording. To explore the cell’s biomechanical
features, we only need to analyze the cell. Therefore, we crop out the details
and keep only the part of the frame showing the cell, as in Figure 2.2.
Figure 2.2. (a) Original frame with a detail of the camera setting (b) The cropped
frame for our analysis
3.1. Workflow diagram
Figure 3.1. Cell’s biomechanical features extraction workflow diagram.
Starting from an input frame of the original videos, several processing and
analysis steps are needed to extract the cell’s biomechanical features. Different algorithms
and methods are used to keep the processing time short, such as background
removal, morphological transformations (erosion and dilation), SIFT and affine
transformation. In the final result, five features are extracted and divided into
two categories. The first category is shape features, which comprise the area, perimeter and
deformability of the cell. The second category is speed features, which comprise the
translation speed and rotation speed of the cell.
3.2. Background removal
The original cell data are grayscale with a low amplitude in pixel intensity.
Because the cell is mostly transparent, except for the border and some distinctive areas of
the cell, a large part of the cell blends in with the background pixel intensity. This
makes it difficult to analyze and interpret the biomechanical features on
In order to explore the cell further, we apply a background removal algorithm.
We remove the static background and set it to black, keeping the moving
foreground, which is the cell, and setting it to white. Because the cells are almost
transparent, they have nearly the same pixel intensity as the background. As a result, the
background removal algorithm detects only the distinctive parts of the cell. In our case,
however, this is enough for us to locate and further analyze the cell.
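The text does not name the specific background removal algorithm, so the following is only a minimal sketch of the idea under an assumed approach: the static background is estimated as the per-pixel median over several frames, and a pixel is marked as foreground (white) when it differs from that background by more than a threshold. The function names and the threshold value are illustrative assumptions.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel median over a list of grayscale frames (lists of rows)."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def remove_background(frame, background, threshold=10):
    """Return a binary mask: 255 where the frame differs from the background."""
    return [[255 if abs(pixel - bg) > threshold else 0
             for pixel, bg in zip(row, bg_row)]
            for row, bg_row in zip(frame, background)]
```

Because the median ignores values that appear in only a few frames, a cell that passes through a pixel briefly does not contaminate the background estimate. Transparent parts of the cell stay below the threshold, which matches the observation above that only the distinctive parts are detected.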
Figure 3.2. Background removal algorithm. (a) An original image (b) The image with
its background removed.
3.2. Shape features
3.2.1. Density cell (Erode and Dilate)
After we acquire the background-removed frame, we can see that the foreground cell is
not a dense object but a group of white pixel contours that are not connected to each other. To
further explore shape features such as the area and perimeter of the cell, we need to densify these
To create a fully dense object, we apply the binary erosion and binary dilation
operations from morphological image processing. The two operations are opposites of each
other. In binary erosion, if any pixel under the kernel is 0, the destination pixel is
set to 0; otherwise it is 1. In binary dilation, if at least one pixel under the kernel is 1, the
destination pixel is set to 1; otherwise it is 0.
Erosion computes a local minimum over the area of the kernel, which reduces
the size of the object.
Figure 3.3. Erosion example. Retrieved from
Dilation computes a local maximum over the area of the kernel, which enlarges
the size of the object.
Figure 3.4. Dilation example. Retrieved from
By applying erosion after dilation, we can close the holes between contours or
objects without significantly changing the structure of the cell.
Figure 3.5. Close example. Retrieved from
The gap that can be closed depends on the size of the kernel: the larger the kernel, the
bigger the gap we can close by applying erosion after dilation. However, a larger kernel also
risks adding more noise to the cell. We therefore want the kernel to be as small as possible
while still large enough to fill the holes between the cell’s contours. To find the optimal
kernel size, we keep increasing it until the cell is fully closed. In our data, we
found that the minimum kernel size needed to close the cell averages around 21 x 21.
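The erosion, dilation and kernel-size search described in this subsection can be sketched in plain Python on a small binary image. This is an illustrative sketch, not the project code: the function names are hypothetical, pixels outside the image are simply ignored, and "fully closed" is assumed here to mean that the foreground forms a single 4-connected region.

```python
from collections import deque


def _morph(img, k, combine):
    """Apply `combine` (any or all) over a k x k window centred on each pixel.

    Window positions that fall outside the image are ignored.
    """
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(combine(img[yy][xx]
                         for yy in range(max(0, y - r), min(h, y + r + 1))
                         for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)] for y in range(h)]


def dilate(img, k):
    """Binary dilation: 1 if at least one pixel under the kernel is 1."""
    return _morph(img, k, any)


def erode(img, k):
    """Binary erosion: 1 only if every pixel under the kernel is 1."""
    return _morph(img, k, all)


def close(img, k):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, k), k)


def count_components(img):
    """Count the 4-connected foreground regions of a binary image."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def smallest_closing_kernel(img, max_size=31):
    """Return (k, closed) for the smallest odd k whose closing gives one region."""
    for k in range(1, max_size + 1, 2):
        closed = close(img, k)
        if count_components(closed) == 1:
            return k, closed
    return None
```

On two fragments separated by a two-pixel gap, the search stops at a 3 x 3 kernel, because dilation grows each fragment by one pixel on every side and the following erosion shrinks the merged region back to the original outline with the gap filled.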
Figure 3.6. Density cell: (a) Background removal data (b) Density cell data.
3.2.2. Area and Perimeter
In the example above, we obtain only one big, dense contour, which is
the cell. However, the input video has many frames, and in some frames there may be
more than one dense contour or no contour at all.
In the first case, where there is more than one contour, the
contour with the biggest area is most likely the cell. The smaller contours in these cases are
likely microcells in the water, which cause noise.
If both contours are normal cells, our algorithm still calculates the area of the two
cells and chooses only the cell with the biggest area for extracting shape features. This is
also the cell whose speed features we analyze further. The same goes for three or more
cells: by choosing only the biggest cell, we limit analysis errors and
extract more accurate features.
In the special case where there is no cell in the frame, we may still detect
microcell contours. We set a minimum threshold so that we only explore a dense
contour further if its area is larger than this requirement. The
threshold depends on the scale of the recorded video. For our data, we set the minimum
area at 250 pixels, which corresponds to roughly 2.9 µm² at 0.108 µm per pixel.
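The selection rule above (keep the biggest dense region, and only if it reaches the minimum area) can be sketched as follows. The code is illustrative: the function names are hypothetical, regions are found with a simple 4-connected flood fill, and the area is taken as the pixel count, which may differ slightly from a contour-based area measure in an image processing library.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected foreground regions as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, region = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    region.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def select_cell(mask, min_area=250):
    """Return the largest region, or None if it is smaller than min_area."""
    regions = connected_components(mask)
    if not regions:
        return None
    biggest = max(regions, key=len)
    return biggest if len(biggest) >= min_area else None
```

Returning `None` for frames without a large enough region is one way to produce the null gaps between cells that appear in the feature graphs.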
Figure 3.7. Area and perimeter graphs.
In Figure 3.7, we can see that five cells are detected. Each cell’s area
and perimeter start from zero pixels, then rise sharply to their peaks when the
cell fully appears in the microscope’s field of view. At the end, when a cell leaves the
camera’s view, its area and perimeter quickly decrease to zero. There is a small gap of null
values between the appearances of consecutive cells, corresponding to the frames in which
no cell is detected in the data.
We can see that the area and perimeter graphs are highly similar. This is because these
features change only slightly when the cell collides with the pillars. Most of the
changes in the area and perimeter graphs occur when parts of the cell are outside the
microscope’s field of view, which leads to simultaneous increases and decreases in the area
and perimeter graphs. To see a more dramatic change in shape features, we need to
look at the next shape feature, deformability.
One of the crucial shape features we extract from the cell is its
deformability (roundness). The equation for the cell’s deformability is
given in Equation 3.1.
Equation 3.1. Deformability (Roundness) equation.
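The equation itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the common circularity measure 4πA/P², where A is the area and P the perimeter; this measure equals 1 for a perfect circle and decreases as the shape becomes less round. Whether the thesis uses exactly this form is an assumption, and the function name is hypothetical.

```python
import math


def deformability(area, perimeter):
    """Assumed circularity measure 4*pi*A / P**2 (1.0 for a perfect circle)."""
    if perimeter <= 0:
        return 0.0  # no contour detected
    return 4.0 * math.pi * area / perimeter ** 2
```

Under this assumed definition, a circle of any radius gives 1, while a square gives π/4, about 0.785. Values measured on pixel contours can deviate slightly from these ideal figures because the digitized perimeter is only an approximation.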
Figure 3.8. Deformability attributes example. Retrieve from:
This statistic ranges from 0 to 1. Based on the shape of the cell, the deformability
equals 1 for a circular cell and is less than 1 for a cell that departs from roundness. The
reason we care about this feature is that, for the speed features that follow, extracting
the speed of an irregular object is a difficult and complex task. However, by selecting
and exploring only the speed of near-round cells, we can get more accurate
Figure 3.9. Deformability graph.
The deformability graph in Figure 3.9 is calculated from the same five
cells as the area and perimeter graphs in Figure 3.7. We can see exactly five cells, with
their deformability ratio rising quickly when they appear from the left border of the video
and reaching its peak when the cell fully appears in the microscope’s field of view. Unlike in
the area and perimeter graphs, we see the change in the roundness of the cell more clearly. The
lower deformability sections in the middle are caused by the collision of the cell with the
pillars, which deforms the cell, or by a portion of the cell being outside the microscope’s
field of view. At the end, the deformability quickly decreases before each cell disappears at
one border of the map, showing the cell leaving the microscope’s field of view. There is a
small gap of null values between the appearances of consecutive cells, corresponding to the
frames in which no cell is detected in the data.
3.3. Speed features
To calculate the cell’s speed, we use the affine transformation
formula. Take two different frames of the same cell: P is a list of the cell’s coordinates
in frame 1 and Q is a list of the cell’s coordinates in frame 2. Because the affine
transformation we consider preserves the shape and structure of the cell, each
pixel in Q is the image of a pixel in P under a linear function.
In this thesis, based on the fact that most of the cells are round or nearly round,
we explore two components of the affine transformation: rotation speed and translation
speed. The linear function of the affine transformation equation is described in
Equation 3.2. Affine transformation equation.
P.x, P.y: are the coordinates of cell’s pixel in frame 1.
Q.x, Q.y: are the coordinates of cell’s pixel in frame 2.
𝜃 is the clockwise rotation speed of the cell (radians/T)
a is the horizontal translation speed along X axis (pixels/T)
b is the vertical translation speed along Y axis (pixels/T)
T is the frame gap between frame 1 and frame 2
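The equation itself is not reproduced in this text. With the symbols defined above, a rotation by 𝜃 combined with a translation (a, b) is commonly written as shown below. The sign convention is an assumption: it takes image coordinates with the Y axis pointing down, in which this matrix corresponds to a clockwise rotation on screen.

```latex
\begin{pmatrix} Q.x \\ Q.y \end{pmatrix}
=
\begin{pmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}
\begin{pmatrix} P.x \\ P.y \end{pmatrix}
+
\begin{pmatrix} a \\ b \end{pmatrix}
```

In this form the matrix has four entries that each depend on 𝜃, and the translation (a, b) is added after the rotation about the origin.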
With this equation, we need to find the list of cell coordinates P paired with the list of
cell coordinates Q. We then apply the least-squares algorithm, dividing the two matrices Q
and P, to obtain the linear function, which contains the speed features: 𝜃 for
rotation speed, a for horizontal translation speed and b for vertical translation speed.
Although there are four results for 𝜃, we calculate the average of 𝜃 from
sin(𝜃) and -sin(𝜃), because we also want to capture the
negative result, while cos(𝜃) always returns a positive rotation speed.
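The least-squares step can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the project code: it fits a general 2 x 2 matrix plus a translation through the normal equations, assumes the sine entries sit at positions (2,1) and (1,2) with opposite signs, and recovers 𝜃 as the average of the two arcsine estimates, which is valid for rotations smaller than a quarter turn between the two frames. All function names are hypothetical.

```python
import math


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate point set (for example collinear points)")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(P, Q):
    """Least-squares fit of Q ~ M P + t.

    Returns two rows: (m11, m12, a) for the x output and (m21, m22, b) for y.
    """
    rows = [(x, y, 1.0) for x, y in P]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    fitted = []
    for k in range(2):
        atq = [sum(r[i] * q[k] for r, q in zip(rows, Q)) for i in range(3)]
        fitted.append(solve3(ata, atq))
    return fitted


def _clamp(value):
    """Keep a sine estimate inside the domain of asin despite noise."""
    return max(-1.0, min(1.0, value))


def speed_features(P, Q):
    """Return (theta, a, b) per frame gap from matched point lists P and Q."""
    (m11, m12, a), (m21, m22, b) = fit_affine(P, Q)
    theta = 0.5 * (math.asin(_clamp(m21)) + math.asin(_clamp(-m12)))
    return theta, a, b
```

Using the two sine entries keeps the sign of the rotation, whereas an angle recovered from the cosine entries alone would always be non-negative. At least three non-collinear matched points are needed for the fit to be well defined.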
The difficulty with this equation is to find the pixel list P that is the exact counterpart
of the pixel list Q. In this thesis, we apply SIFT to locate scale-invariant key points
and their descriptors in the two frames. After that, we match the key points with the
highest descriptor similarity to get the pairwise pixels in P and Q. Our algorithm
finds about 60-80 matching key points per SIFT matching. However, not all of these
pairwise matches are accurate; there are outliers that we want to remove
to get more accurate results.
Figure 3.10. Matching key points with highly similar descriptors. (a) Detected key points
in frame 1 (b) Matching detected key points in frame 2.
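The matching step can be illustrated with a brute-force nearest-neighbour search over descriptor vectors. This sketch does not compute SIFT itself: it assumes the descriptors have already been produced by a SIFT implementation, and it uses short toy vectors in place of real 128-dimensional descriptors. The ratio check that discards ambiguous matches is a commonly used addition and is an assumption here, not something the text states; the function name is hypothetical.

```python
import math


def match_descriptors(desc1, desc2, ratio=0.75):
    """Match each descriptor in desc1 to its nearest neighbour in desc2.

    A match (i, j) is kept only if the best distance is clearly smaller
    than the second-best distance (ratio check).
    """
    matches = []
    for i, d1 in enumerate(desc1):
        distances = sorted((math.dist(d1, d2), j) for j, d2 in enumerate(desc2))
        if not distances:
            continue
        if len(distances) == 1 or distances[0][0] < ratio * distances[1][0]:
            matches.append((i, distances[0][1]))
    return matches
```

Each returned pair (i, j) gives the index of a key point in frame 1 and of its counterpart in frame 2, so the coordinates of those key points form the paired lists P and Q. Discarding ambiguous matches reduces, but does not eliminate, the outliers mentioned above.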
To reduce the outliers, we first choose T, the gap between the two frames, to be
as small as possible. In this thesis, T is 1, so frame 1 is the input frame immediately
preceding frame 2. There are two reasons for this: missing key points and locating the center
Missing key points: the cell is mostly transparent and inconsistent from
frame to frame, so some key points of the cell in frame 1 may be unavailable in
frame 2. This makes it difficult to select key points and descriptors for P and Q.
Our thesis uses the SIFT algorithm to detect scale-invariant key points and select
pairwise matching pixels. Because the cell is inconsistent, the bigger the gap between
the two frames, the more key points are missing and the more noise from similar descriptors
appears in the cell. We avoid this by making T as small as we can.
Locating the center of mass: in the affine transformation equation, the rotation center is
O(0,0) in the coordinate system. We want to calculate the rotation of the cell around its
center of mass instead of O, so the formula in Equation 3.2 changes to Equation 3.3.