Overview of Source Code Plagiarism in Programming Courses
Deniz Kılınç1, Fatma Bozyiğit2, Alp Kut3, Muhammet Kaya4
1Deniz Kılınç, Celal Bayar University, Department of Software Engineering, Turkey.
2Fatma Bozyiğit, Celal Bayar University, Department of Software Engineering, Turkey.
3Alp Kut, Dokuz Eylul University, Department of Computer Engineering, Turkey.
4Muhammet Kaya, Celal Bayar University, Department of Software Engineering, Turkey.
Manuscript received on April 16, 2015. | Revised Manuscript received on April 26, 2015. | Manuscript published on March 05, 2015. | PP: 79-85 | Volume-5, Issue-2, May 2015. | Retrieval Number: B2610055215/2015©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract: Plagiarism of program source code is a problem in many areas of software development. In education in particular, plagiarism in programming courses has been observed to increase steadily. This study attempts to answer questions such as “which source codes are similar?” and “what are their similarity ratios?” in order to deter plagiarism among university students taking programming courses. The proposed methodology is based on the n-gram similarity calculation method and the Vector Space Model (VSM). An Information Retrieval (IR) system and the Cosine Normalization (CN) method were used to calculate similarity ratios. An experimental study was performed on a dataset produced by modifying source code examples in different ways. The results provide convincing evidence that the approach is fit for purpose.
Keywords: Source code plagiarism, n-gram, vector space model, cosine normalization.
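
To make the combination of n-grams, the vector space model, and cosine normalization concrete, the following is a minimal sketch and not the authors' implementation. It assumes character-level n-grams with n = 3, raw term-frequency weights, and whitespace collapsing as the only preprocessing; the abstract specifies none of these choices, and the function names (`ngram_vector`, `cosine_similarity`, `similarity_ratio`) are hypothetical.

```python
import math
from collections import Counter


def ngram_vector(source, n=3):
    """Map source text to a sparse vector of character n-gram counts.

    Assumption: runs of whitespace are collapsed to one space so that
    re-indenting the code does not change the vector.
    """
    normalized = " ".join(source.split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_ratio(code_a, code_b, n=3):
    """Similarity in [0, 1] between two source code fragments."""
    return cosine_similarity(ngram_vector(code_a, n), ngram_vector(code_b, n))


if __name__ == "__main__":
    original = "int total = a + b;"
    renamed = "int sum = a + b;"
    print(similarity_ratio(original, renamed))
```

Dividing by the vector norms keeps the score independent of file length, so a short submission copied into a longer file can still score highly. A system of the kind the abstract describes would compute this ratio for every pair of submissions and flag pairs above a chosen threshold for manual review.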