
Keywords

Big-Data, Data Warehouse


COMPUTING RESEARCH HIGHLIGHT OF THE WEEK [July 17 - July 24]

RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems


Industry engineers and academic researchers from Facebook, Ohio State University, and the Institute of Computing Technology, Chinese Academy of Sciences, have developed a data placement structure called RCFile that efficiently stores increasingly large data sets in a large, distributed data warehouse. Data placement is a challenging problem for big-data management and for the organizations that depend on it, such as social networks, Web service providers, and online stores.

The collaborative team has identified "four critical requirements to the design and implementation of a data placement structure, namely 1) fast data loading, 2) fast query processing, 3) highly efficient storage space utilization, and 4) strong adaptivity to highly dynamic workload patterns." Their solution, RCFile, is able to satisfy all four requirements by balancing the merits and limits of various existing data placement structures.
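To make the idea of a data placement structure more concrete, here is a minimal sketch of a row-columnar layout of the kind RCFile is known for: a table is first split horizontally into row groups, and each row group is then stored column by column. This is only an illustration under stated assumptions, not the RCFile implementation. The function names `to_row_groups` and `read_column`, the in-memory dictionary representation, and the `group_size` parameter are all hypothetical, and details of the real format such as compression and on-disk metadata are omitted.

```python
from typing import Dict, List, Sequence

# Hypothetical in-memory model: one row group maps column name -> column values.
RowGroup = Dict[str, List[object]]


def to_row_groups(rows: Sequence[Sequence[object]],
                  columns: Sequence[str],
                  group_size: int) -> List[RowGroup]:
    """Split rows horizontally into groups, then lay out each group column-wise."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[RowGroup] = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Within one row group, values of the same column are stored together.
        groups.append({name: [row[i] for row in chunk]
                       for i, name in enumerate(columns)})
    return groups


def read_column(groups: Sequence[RowGroup], name: str) -> List[object]:
    """Read a single column across all row groups without touching other columns."""
    values: List[object] = []
    for group in groups:
        values.extend(group[name])
    return values


rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
groups = to_row_groups(rows, ["id", "name", "score"], group_size=2)
scores = read_column(groups, "score")
```

In this sketch a query that needs only `score` reads one list per row group, while all fields of a given row still live in the same row group. That is the kind of balance between row-oriented and column-oriented placement that the requirements above call for.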

RCFile and its open source implementation were documented in a paper presented at the 27th International Conference on Data Engineering in 2011. According to the Wikipedia article on RCFile and several related industry documents, RCFile has been widely used in real-world systems. For example, it has become the default data placement structure in Facebook's production data warehouse, which is so far the largest Hadoop data warehouse in the world. RCFile has also been adopted in two open source data analytics systems, Apache Hive and Apache Pig, which are used by major Internet services, including Facebook, LinkedIn, Taobao, Twitter, and Yahoo.

Researchers:

Yongqiang He (Facebook)
Rubao Lee (The Ohio State University)
Yin Huai (The Ohio State University)
Zheng Shao (Facebook)
Namit Jain (Facebook)
Xiaodong Zhang (The Ohio State University)
Zhiwei Xu (Institute of Computing Technology, Chinese Academy of Sciences)

Agencies (that have supported the research):
Facebook; The Ohio State University; Institute of Computing Technology, Chinese Academy of Sciences

 



Computing Research Highlight of the Week is a service of the Computing Community Consortium and the Computing Research Association designed to highlight some of the exciting and important recent research results in the computing fields. Each week, CRA and CCC staff and volunteers choose a new highlight from submissions by the computing community. Want your research featured? Submit it!