The development of indexing techniques for large scale data in the JavaScript object format

Please use this identifier to cite or link to this item: http://202.28.20.112/dspace/handle/123456789/974

Title:	The development of indexing techniques for large scale data in the JavaScript object format การพัฒนาเทคนิคสร้างดรรชนีสำหรับข้อมูลขนาดใหญ่ที่อยู่ในรูปแบบวัตถุของจาวาสคริปต์
Authors:	Jirawat Duangkaew (จิรวัฒน์ ดวงแก้ว); Bowonsak Srisungsittisunti (บวรศักดิ์ ศรีสังสิทธิสันติ), bowonsak.sr@up.ac.th; University of Phayao
Keywords:	Dense Indexing; Sparse Indexing; Large Scale Dataset; Not only Structured Query Language; JavaScript Object Notation
Issue Date:	4
Publisher:	University of Phayao
Abstract:	JavaScript Object Notation (JSON) has become increasingly popular as a data format in Not only Structured Query Language (NoSQL) databases. NoSQL has limitations, however, and applying the Dense Index technique (a one-to-one mapping) and the Sparse Index technique (a one-to-many mapping) to large-scale JSON files remains a significant challenge. This research proposes indexing techniques for large-scale data in the JavaScript object format. The researcher simulated a dataset of at least eight hundred thousand entries, with a data file size of at least thirty gigabytes, in JSON format. The data then went through an algorithm analysis and design process that combined selected relational database techniques, namely the Dense Index and the Sparse Index, with an Array structure and four search techniques: Linear Search (LS), Binary Search (BS), Binary Search Tree (BST), and Adelson-Velskii Landis Tree (AVL Tree). Python was used to test and compare index performance in terms of data retrieval time. The researcher incorporated these techniques into the development of an index for large-scale JSON files, including algorithms for building the Dense and Sparse Indexes and the LS procedure, and then adjusted the tree structures of the BS, BST, and AVL Tree techniques to suit the large-scale JSON dataset. The results show that the Dense Index reduced data access time by up to 98.57% compared with the Non Index method, and the Sparse Index reduced it by up to 98.45%. When per-keyword lookups were further optimized with the LS, BS, BST, and AVL Tree techniques, the AVL Tree technique was the fastest for the Dense Index, with an average time of 0.005 milliseconds, whereas the BS technique was the fastest for the Sparse Index, with an average time of 0.011 milliseconds. The LS technique was the slowest for both index types.
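
The difference between the two index types named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes a JSON Lines layout (one object per line), a hypothetical integer "id" key, records stored in ascending "id" order, and hypothetical function names and block size. The dense index keeps one (key, byte offset) entry per record; the sparse index keeps one entry per block of records and scans inside the block. Both use binary search (the standard `bisect` module) over the index keys.

```python
import bisect
import io
import json


def build_dense_index(f):
    """One (key, offset) entry per record: a one-to-one mapping."""
    pairs = []
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        pairs.append((json.loads(line)["id"], offset))
    pairs.sort()  # binary search needs sorted keys
    return [k for k, _ in pairs], [o for _, o in pairs]


def build_sparse_index(f, block_size):
    """One entry per block of block_size records: a one-to-many mapping.
    Assumes the file itself is sorted by "id"."""
    keys, offsets = [], []
    f.seek(0)
    n = 0
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        if n % block_size == 0:
            keys.append(json.loads(line)["id"])
            offsets.append(offset)
        n += 1
    return keys, offsets


def lookup_dense(f, index, key):
    keys, offsets = index
    i = bisect.bisect_left(keys, key)
    if i == len(keys) or keys[i] != key:
        return None
    f.seek(offsets[i])  # jump straight to the record
    return json.loads(f.readline())


def lookup_sparse(f, index, key, block_size):
    keys, offsets = index
    i = bisect.bisect_right(keys, key) - 1  # last block starting at or before key
    if i < 0:
        return None
    f.seek(offsets[i])
    for _ in range(block_size):  # linear scan inside one block only
        line = f.readline()
        if not line:
            break
        record = json.loads(line)
        if record["id"] == key:
            return record
        if record["id"] > key:
            break
    return None


# Small in-memory stand-in for a large JSON Lines file (illustrative data only).
records = [{"id": i, "name": f"item-{i}"} for i in range(0, 100, 2)]
data = io.BytesIO("".join(json.dumps(r) + "\n" for r in records).encode("utf-8"))
dense = build_dense_index(data)
sparse = build_sparse_index(data, block_size=8)
hit_dense = lookup_dense(data, dense, 42)
hit_sparse = lookup_sparse(data, sparse, 42, block_size=8)
```

In this sketch the dense index holds one entry per record, so a lookup costs one binary search and one seek, while the sparse index holds far fewer entries and adds a short scan within the matching block.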
URI:	http://202.28.20.112/dspace/handle/123456789/974
Appears in Collections:	School of Information and Communication Technology

Files in This Item:

File	Description	Size	Format
64024804.pdf		2.51 MB	Adobe PDF
