Site Loader

Fuzzy Keyword Search over
Encrypted Data in Cloud Computing

 

Pranjuli Yavatkar, Nikita
Patil, Sneha Kale

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!


order now

Department of Computer
Engineering, Bharati Vidyapeeth College of Engineering, Navi Mumbai,
Maharashtra

 

 

Abstract –
As Cloud Computing becomes prevalent, more and more sensitive information are
being centralized into the cloud. For the protection of data privacy, sensitive
data usually have to be encrypted before outsourcing, which makes effective
data utilization a very challenging task. Although traditional searchable
encryption schemes allow a user to securely search over encrypted data through
keywords and selectively retrieve files of interest, these techniques support
only exact keyword search. That is, there is no tolerance of minor typos and
format inconsistencies which, on the other hand, are typical user searching
behavior and happen very frequently. This significant drawback makes existing
techniques unsuitable in Cloud Computing as it greatly affects system usability,
rendering user searching experiences very frustrating and system efficacy very
low. In this paper, for the first time we formalize and solve the problem of
effective fuzzy keyword search over encrypted cloud data while maintaining
keyword privacy. Fuzzy keyword search greatly enhances system usability by
returning the matching files when users’ searching inputs exactly match the
predefined keywords or the closest possible matching files based on keyword
similarity semantics, when exact match fails.

 

KEYWORDS: Encryption, Fuzzy Keyword, Cloud Computing

 

I.                   
INTRODUCTION

 

As Cloud Computing becomes prevalent, more and more
sensitive information are being centralized into the cloud, such as emails,
personal health records, government documents, etc. By storing their data into
the cloud, the data owners can be relieved from the burden of data storage and
maintenance so as to enjoy the on-demand high quality data storage service.
However, the fact that data owners and cloud server are not in the same trusted
domain may put the outsourced data at risk, as the cloud server may no longer
be fully trusted. It follows that sensitive data usually should be encrypted
prior to outsourcing for data privacy and combating unsolicited accesses.
However, data encryption makes effective data utilization a very challenging
task given that there could be a large amount of outsourced data files.
Moreover, in Cloud Computing, data owners may share their outsourced data with
a large number of users. The individual users might want to only retrieve
certain specific data files they are interested in during a given session. One
of the most popular ways is to selectively retrieve files through keyword-based
search instead of retrieving all the encrypted files back which is completely
impractical in cloud computing scenarios. Such keyword-based search technique
allows users to selectively retrieve files of interest and has been widely
applied in plaintext search scenarios, such as Google search. Unfortunately,
data encryption restricts user’s ability to perform keyword search and thus
makes the traditional plaintext search methods unsuitable for Cloud Computing.
Besides this, data encryption also demands the protection of keyword privacy
since keywords usually contain important information related to the data files.
Although encryption of keywords can protect keyword privacy, it further renders
the traditional plaintext search techniques useless in this scenario. To
securely search over encrypted data, searchable encryption techniques have been
developed in recent years. Searchable encryption schemes usually build up an
index for each keyword of interest and associate the index with the files that
contain the keyword. By integrating the trapdoors of keywords within the index
information, effective keyword search can be realized while both file content
and keyword privacy are well-preserved. Although allowing for performing
searches securely and effectively, the existing searchable encryption
techniques do not suit for cloud computing scenario since they support only
exact keyword search. That is, there is no tolerance of minor typos and format
inconsistencies.

It is quite common that users’ searching input might
not exactly match those pre-set keywords due to the possible typos,
representation inconsistencies, and/or her lack of exact knowledge about the
data. The naive way to support fuzzy keyword search is through simple spell
check mechanisms. However, this approach does not completely solve the problem
and sometimes can be ineffective due to the following reasons: on the one hand,
it requires additional interaction of user to determine the correct word from
the candidates generated by the spell check algorithm, which unnecessarily
costs user’s extra computation effort; on the other hand, in case that user
accidentally types some other valid keywords by mistake (for example, search
for “hat” by carelessly typing “cat”), the spell check algorithm would not even
work at all, as it can never differentiate between two actual valid words.
Thus, the drawbacks of existing schemes signifies the important need for new techniques
that support searching flexibility, tolerating both minor typos and format
inconsistencies.

In this paper, we focus on enabling effective yet
privacy-preserving fuzzy keyword search in Cloud Computing. To the best of our
knowledge, we formalize for the first time the problem of effective fuzzy
keyword search over encrypted cloud data while maintaining keyword privacy.
Fuzzy keyword search greatly enhances system usability by returning the
matching files when users’ searching inputs exactly match the predefined
keywords or the closest possible matching files based on keyword similarity
semantics, when exact match fails. More specifically, we use edit distance to
quantify keywords similarity and develop a novel technique, i.e., a
wildcard-based technique, for the construction of fuzzy keyword sets. This
technique eliminates the need for enumerating all the fuzzy keywords and the
resulted size of the fuzzy keyword sets is significantly reduced. Based on the
constructed fuzzy keyword sets, we propose an efficient fuzzy keyword search
scheme.

 

II.                
RELATED WORK

 

Plaintext fuzzy
keyword search: Recently, the importance of fuzzy
search has received attention in the context of plaintext searching in
information retrieval community 11–13. They addressed this problem in the
traditional information access paradigm by allowing user to search without
using try-and-see approach for finding relevant information based on
approximate string matching. At the first glance, it seems possible for one to
directly apply these string matching algorithms to the context of searchable
encryption by computing the trapdoors on a character base within an alphabet.
However, this trivial construction suffers from the dictionary and statistics
attacks and fails to achieve the search privacy.

 

Searchable
encryption: Traditional searchable encryption has
been widely studied in the context of cryptography. Among those works, most are
focused on efficiency improvements and security definition formalizations. The
first construction of searchable encryption was proposed by Song et al., in
which each word in the document is encrypted independently under a special
two-layered encryption construction. Goh proposed to use Bloom filters to
construct the indexes for the data files. To achieve more efficient search,
Chang et al. and Curtmola et al. both proposed similar “index” approaches,
where a single encrypted hash table index is built for the entire file
collection. In the index table, each entry consists of the trapdoor of a
keyword and an encrypted set of file identifiers whose corresponding data files
contain the keyword. As a complementary approach, Boneh et al. presented a
public-key based searchable encryption scheme. Note that all these existing
schemes support only exact keyword search, and thus are not suitable for Cloud
Computing.

 

III.             
PROBLEM FORMULATION

 

A. System Model:

 

 

 

In this paper, we
consider a cloud data system consisting of data owner, data user and cloud
server. Given a collection of n encrypted data files C = (F1, F2,…, FN)
stored in the cloud server, a predefined set of distinct keywords W = {w1, w2,
…, wp}, the cloud server provides the search service for the authorized users
over the encrypted data C. We assume the authorization between the data owner
and users is appropriately done. An authorized user types in a request to
selectively retrieve data files of his/her interest. The cloud server is
responsible for mapping the searching request to a set of data files, where
each file is indexed by a file ID and linked to a set of keywords. The fuzzy
keyword search scheme returns the search results according to the following
rules: 1) if the user’s searching input exactly matches the pre-set keyword,
the server is expected to return the files containing the keyword1; 2) if there
exist typos and/or format inconsistencies in the searching input, the server
will return the closest possible results based on pre-specified similarity
semantics. An architecture of fuzzy keyword search is shown in the Fig. 1.

 

B.
Threat Model

 

We consider a
semi-trusted server. Even though data files are encrypted, the cloud server may
try to derive other sensitive information from users’ search requests while
performing keyword-based search over C. Thus, the search should be conducted in
a secure manner that allows data files to be securely retrieved while revealing
as little information as possible to the cloud server. More specifically, it is
required that nothing should be leaked from the remotely stored files and index
beyond the outcome and the pattern of search queries.

 

C.
Design Goals

 

In this paper, we
address the problem of supporting efficient yet privacy-preserving fuzzy
keyword search services over encrypted cloud data. Specifically, we have the
following goals: i) to explore new mechanism for constructing storage efficient
fuzzy keyword sets; ii) to design efficient and effective fuzzy search scheme
based on the constructed fuzzy keyword sets; iii) to validate the security of
the proposed scheme.

 

IV. CONSTRUCTIONS OF EFFECTIVE
FUZZY KEYWORD SEARCH IN CLOUD

 

The key idea behind
our secure fuzzy keyword search is two-fold: 1) building up fuzzy keyword sets
that incorporate not only the exact keywords but also the ones differing
slightly due to minor typos, format inconsistencies, etc.; 2) designing an
efficient and secure searching approach for file retrieval based on the resulted
fuzzy keyword sets.

 

A.
Advanced Technique for Constructing Fuzzy Keyword Sets

 

To provide more
practical and effective fuzzy keyword search constructions with regard to both
storage and search efficiency, we now propose an advanced technique to improve
the straightforward approach for constructing the fuzzy keyword set. Without
loss of generality, we will focus on the case of edit distance d = 1 to
elaborate the proposed advanced technique. For larger values of d, the
reasoning is similar. Note that the technique is carefully designed in such a
way that while suppressing the fuzzy keyword set, it will not affect the search
correctness.

 

Wildcard-based
Fuzzy Set Construction

 

In the above
straightforward approach, all the variants of the keywords have to be listed
even if an operation is performed at the same position. Based on the above
observation, we proposed to use a wildcard to denote edit operations at the
same position. The wildcard-based fuzzy set of wi with edit distance
d is denoted as S wi,d ={S’wi,0, S’wi,2, ··· ,
S’wi,d }, where S’ wi ,?  denotes the set of words wi with ? wildcards.
Note each wildcard represents an edit operation on wi. For example,
for the keyword CASTLE with the pre-set edit distance 1, its wildcard-based
fuzzy keyword set can be constructed as SCASTLE,1 = {CASTLE, *CASTLE, *ASTLE,
C*ASTLE, C*STLE, ··· , CASTL*E, CASTL*, CASTLE*}. The total number of variants
on CASTLE constructed in this way is only 13 + 1, instead of 13 × 26 + 1 as in
the above exhaustive enumeration approach when the edit distance is set to be
1. Generally, for a given keyword wi with length l,
the size of S wi,1 will be only 2 l +1+1,
as compared to (2 l + 1) × 26 + 1 obtained in the
straightforward approach. The larger the pre-set edit distance, the more
storage overhead can be reduced: with the same setting of the example in the straightforward
approach, the proposed technique can help reduce the storage of the index from
30GB to approximately 40MB. In case the edit distance is set to be 2 and 3, the
size of S wi,2 and S wi,3 will be C1 l +1+C1 l ·
C1 l
+2C2 l +2 and
C1 l
+ C3 l + 2C2 l +
2C2 l
· C1 l . In other words, the number is only O( l d)
for the keyword with length l
and edit distance d.

 

B.
The Efficient Fuzzy Keyword Search Scheme

 

Based on the
storage-efficient fuzzy keyword sets, we show how to construct an efficient and
effective fuzzy keyword search scheme. The scheme of the fuzzy keyword search
goes as follows: 1) To build an index for wi with edit distance d, the data
owner first constructs a fuzzy keyword set Swi,d using the wildcard
based technique. Then he computes trapdoor set {Tw’i} for each w’i
? S
wi,d with a secret key sk shared between data owner and authorized users.
The data owner encrypts FIDwi as Enc(sk, FIDwi  || wi). The index table {({ Tw’i
}w’i? S
wi,d , Enc(sk, FIDwi || wi ))} wi ?W and encrypted data files are
outsourced to the cloud server for storage;

2) To search with
(w, k), the authorized user computes the trapdoor set {Tw’}w’
? Sw,k , where Sw,k is
also derived from the wildcard-based fuzzy set construction. He then sends {Tw’}w’
? Sw,k to the server;

3) Upon receiving
the search request {Tw’}w’ ? Sw,k, the server
compares them with the index table and returns all the possible encrypted file
identifiers {Enc(sk, FIDwi || wi)} according to the fuzzy
keyword definition in section III-D. The user decrypts the returned results and
retrieves relevant files of interest.

In this
construction, the technique of constructing search request for w is the same as
the construction of index for a keyword. As a result, the search request is a
trapdoor set based on Sw,k , instead of a single trapdoor as in the
straightforward approach. In this way, the searching result correctness can be
ensured.

 

V. CONCLUSION

 

In this paper, we formalize and solve the problem of
supporting efficient yet privacy-preserving fuzzy search for achieving
effective utilization of remotely stored encrypted data in Cloud Computing. We
design an advanced technique (i.e., wildcard-based technique) to construct the
storage-efficient fuzzy keyword sets by exploiting a significant observation on
the similarity metric of edit distance. Based on the constructed fuzzy keyword
sets, we further propose an efficient fuzzy keyword search scheme. Through
rigorous security analysis, we show that our proposed solution is secure and
privacy-preserving, while correctly realizing the goal of fuzzy keyword search.

 

REFERENCES

1 D. Song, D. Wagner, and A. Perrig, “Practical
techniques for searches on encrypted data,” in Proc. of IEEE Symposium on
Security and Privacy’00, 2000.

2 E.-J. Goh, “Secure indexes,” Cryptology ePrint
Archive, Report 2003/216, 2003, http://eprint.iacr.org/.

3 Fuzzy keyword search over encrypted data in cloud
computing”, Illinois Institute of Technology, ISSN: 2321- 8134.

4 Y.-C. Chang and M. Mitzenmacher, “Privacy
preserving keyword searches on remote encrypted data,” in Proc. of ACNS’05,
2005.

5 D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G.
Persiano, “Public key encryption with keyword search,” in Proc. of EUROCRYP’04,
2004.

6 R. Curtmola, J. A. Garay, S. Kamara, and R.
Ostrovsky, “Searchable symmetric encryption: improved definitions and efficient
constructions,” in Proc. of ACM CCS’06, 2006.

7 D. Boneh and B. Waters, “Conjunctive, subset, and
range queries on encrypted data,” in Proc. of TCC’07, 2007, pp. 535–554.

8 REVIEW PAPER ON FUZZY SEARCH OVER ENCRYPTED DATA
IN CLOUD COMPUTING by Neel Gala ISSN:2393-9842

9 F. Bao, R. Deng, X. Ding, and Y. Yang, “Private
query on encrypted data in multi-user settings,” in Proc. of ISPEC’08, 2008.

10 C. Li, J. Lu, and Y. Lu, “Efficient merging and
filtering algorithms for approximate string searches,” in Proc. of ICDE’08,
2008.

 

 

Post Author: admin

x

Hi!
I'm Russell!

Would you like to get a custom essay? How about receiving a customized one?

Check it out