• Research article
  • Open access
  • Published: 27 July 2020

Testing of support tools for plagiarism detection

  • Tomáš Foltýnek 1,2,
  • Dita Dlabolová 1,
  • Alla Anohina-Naumeca 3,
  • Salim Razı 4,
  • Július Kravjar 5,
  • Laima Kamzola 3,
  • Jean Guerrero-Dib 6,
  • Özgür Çelik 7 &
  • Debora Weber-Wulff 8

International Journal of Educational Technology in Higher Education, volume 17, Article number: 46 (2020)


Abstract

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a widespread expectation that it must be simple for software to determine if a text is plagiarized or not. Software cannot determine plagiarism, but it can work as a support tool for identifying some text similarity that may constitute plagiarism. But how well do the various systems work? This paper reports on a collaborative test of 15 web-based text-matching systems that can be used when plagiarism is suspected. It was conducted by researchers from seven countries using test material in eight different languages, evaluating the effectiveness of the systems on single-source and multi-source documents. A usability examination was also performed. The sobering results show that although some systems can indeed help identify some plagiarized content, they clearly do not find all plagiarism and at times also identify non-plagiarized material as problematic.

Introduction

Teddi Fishman, former director of the International Center for Academic Integrity, has proposed the following definition for plagiarism: “Plagiarism occurs when someone uses words, ideas, or work products, attributable to another identifiable person or source, without attributing the work to the source from which it was obtained, in a situation in which there is a legitimate expectation of original authorship, in order to obtain some benefit, credit, or gain which need not be monetary” (Fishman, 2009, p. 5). Plagiarism constitutes a severe form of academic misconduct. In research, plagiarism is one of the three “cardinal sins”, FFP: fabrication, falsification, and plagiarism. According to Bouter, Tijdink, Axelsen, Martinson, and ter Riet (2016), plagiarism is one of the most frequent forms of research misconduct.

Plagiarism constitutes a threat to the educational process because students may receive credit for someone else’s work or complete courses without actually achieving the desired learning outcomes. Similar to the student situation, academics may be rewarded for work which is not their own. Plagiarism may also distort meta-studies, which draw conclusions based on the number or percentage of papers that confirm or refute a certain phenomenon. If these papers are plagiarized, then the number of actual experiments is lower and the conclusions of the meta-study may be incorrect.

There can also be other serious consequences for the plagiarist. The cases of politicians who had to resign in the aftermath of a publicly documented plagiarism case are well known, not only in Germany (Weber-Wulff, 2014) and Romania (Abbott, 2012), but also in other countries. Scandals involving such high-profile persons undermine citizens’ confidence in democratic institutions and trust in academia (Tudoroiu, 2017). Thus, it is in the great interest of academic institutions to invest effort both in plagiarism prevention and in its detection.

Foltýnek, Meuschke, and Gipp (2019) identify three important concerns in addressing plagiarism:

Similarity detection methods that, for a given suspicious document, are expected to identify possible source document(s) in a (large) repository;

Text-matching systems that maintain a database of potential sources, employ various detection methods, and provide an interface to users;

Plagiarism policies that are used for defining institutional rules and processes to prevent plagiarism or to handle cases that have been identified.

This paper focuses on the second concern. Users and policymakers expect what they call plagiarism detection software, which should more accurately be referred to as text-matching software, to use state-of-the-art similarity detection methods. The expected output is a report in which all the passages that are identical or similar to other documents are highlighted, together with links to and information about the potential sources. To determine how the source was changed and whether a particular case constitutes plagiarism or not, an evaluation by a human being is always needed, as many of the reported results are inconclusive or problematic. The output of such a system is often used as evidence in a disciplinary procedure. Therefore, both the clarity of the report and the trustworthiness of its content are important for the efficiency and effectiveness of institutional processes.

There are dozens of such systems available on the market, both free and paid services. Some can be used online, while others need to be downloaded and used locally. Academics around the globe are naturally interested in the question: How far can these systems reach in detecting text similarities and to what extent are they successful? In this study, we will look at the state-of-the-art text-matching software with a focus on non-English languages and provide a comparison based on specific criteria by following a systematic methodology.

The investigation was conducted by nine members of the European Network for Academic Integrity (ENAI) in the working group TeSToP (Testing of Support Tools for Plagiarism Detection). There was no external funding available; access to the various systems was provided to the research group free of charge by the companies marketing the support tools.

The paper is organized as follows. The next section provides a detailed survey of related work. This is followed by a specification of the methodology used to carry out the research and then a description of the systems used in the research. After reporting on the results acquired, discussion and conclusion points are given.

Survey of related work

Since the beginning of this century, considerable attention has been paid not only to the problem of plagiarism, but also to text-matching software that is widely used to help find potentially plagiarized fragments in a text. There are plenty of scientific papers that postulate in their titles that they offer a classification, a comparative study, an overview, a review, a survey, or a comparison of text-matching software tools. There are, however, issues with many of these papers. Some, such as Badge and Scott (2009) and Marjanović, Tomašević, and Živković (2015), simply refer to comparative tests performed by other researchers with the aim of demonstrating the effectiveness of such tools. Such works could be useful for novices in the field who are not familiar with such automated aids, but they are meaningless for those who want to make an informed choice of a text-matching tool for specific needs.

Many research works offer only a primitive classification of text-matching software tools into several categories or classes. Others provide a simple comparative analysis that is based on functional features. These are usually built on a description of the tools as given on their official websites (e.g. Nahas, 2017; Pertile, Moreira, & Rosso, 2016), a categorization given in another source (e.g. Chowdhury & Bhattacharyya, 2016; Lukashenko, Graudina, & Grundspenkis, 2007; Urbina et al., 2010), or a study of the corresponding literature combined with informed guesswork (e.g. Lancaster & Culwin, 2005). These types of research give good insight into the broad scope of the functional features, focus, accessibility, and shortcomings of text-matching software. However, they are still insufficient for guiding the selection of a tool, as they do not evaluate and compare the performance of software systems and their usability from the viewpoint of end-users.

The most frequently mentioned categorizations are as follows:

Software that checks text-based documents, source code, or both (Chowdhury & Bhattacharyya, 2016; Clough, 2000; Lancaster & Culwin, 2005; Lukashenko et al., 2007);

Software that is free, private, or available by subscription (Chowdhury & Bhattacharyya, 2016; Lancaster & Culwin, 2005; Lukashenko et al., 2007; Nahas, 2017; Pertile et al., 2016; Shkodkina & Pacauskas, 2017; Urbina et al., 2010; Vandana, 2018);

Software that is available online (web-based) or can be installed on a desktop computer (Lancaster & Culwin, 2005; Marjanović et al., 2015; Nahas, 2017; Pertile et al., 2016; Shkodkina & Pacauskas, 2017; Vandana, 2018);

Software that operates intra-corpally, extra-corpally, or both (Lancaster & Culwin, 2005; Lukashenko et al., 2007; Marjanović et al., 2015).

Additionally, some researchers include unconventional comparative criteria. Pertile et al. ( 2016 ) indicate if a tool can make a citation analysis, a content analysis, structural analysis, or a paraphrase analysis. Lancaster and Culwin ( 2005 ) take into account the number of documents that are processed together to generate a numeric value of similarity and the computational complexity of the methods employed to find similarities. McKeever ( 2006 ) classifies text-matching software tools into search-based systems, systems performing linguistic analysis, software based on collusion detection, and systems for detecting software plagiarism.

Shkodkina and Pacauskas (2017) have defined 28 comparison criteria that are divided into four categories: affordability, material support, functionality, and showcasing. They compared three tools based on the criteria defined. However, it is not clear how the comparison was actually conducted, whether only by studying information available on product websites or by trying out each tool. The descriptive part of their comparison does not contain references to information sources. Moreover, the set of criteria includes the ability of a tool to recognize different types of plagiarism (such as paraphrasing, translation, obfuscation, or self-plagiarism), but there are no indications of how these criteria were evaluated.

Shynkarenko and Kuropiatnyk ( 2017 ) have not compared available text-matching software tools, but they have defined more than 30 requirements for the development of such tools based on the analysis of other authors’ works and the documentation of tools. They provide a comparison between the 27 tools mentioned in their paper and the defined requirements.

It is rather surprising that with the variety of research work on text-matching software, only a few of them address the performance and usability of these tools. Moreover, some of them do not make a comparative analysis of performance, but mainly check working principles and capabilities of the tools based on testing them on different kinds of submissions. At the end of the last century, Denhart ( 1999 ) published a lively discussion of his check of three systems. He uploaded his senior thesis and a mini-essay made up of randomly selected sentences from four well-known authors with some slight paraphrasing to the systems. He found problems with properly quoted material and the inability to find many plagiarized sentences in the mini-essay. He also mentioned poor usability for one of the systems that otherwise had quite good performance results.

Culwin and Lancaster ( 2000 ) used a more carefully constructed text and checked four tools operating at that time using six sentences: four original sentences from two famous works widely available on the web, one paraphrased sentence from an essay available on a free essay site, and an original sentence from a newly indexed personal website. They checked the performance of tools and described if the text was found or not and at which sites. They also addressed some usability problems of systems for tutors and students.

Maurer, Kappe, and Zaka (2006) checked three tools in relation to verbatim plagiarism, paraphrasing, tabular information processing, translation plagiarism, image/multimedia processing, reference validity checks, and the possibility to exclude or select sources. Although they do not describe the experiments in detail, there is evidence that they used a prepared sample of texts. These included a paragraph from proceedings that was paraphrased using a simple automatic word replacement tool, text compiled from documents available on the Internet, tabular information, and text in languages with special characters. They conclude that the tools work reasonably well when the plagiarized text is available on the internet or in other electronic sources. However, text-matching software fails to match paraphrasing plagiarism, plagiarism based on documents that are not available electronically, and translation plagiarism. The tools also do not do well when processing tabular information and special characters.

Vani and Gupta ( 2016 ) used a small text fragment from the abstract of a scientific article and modified it based on four main types of obfuscation: verbatim plagiarism, random obfuscation, translation obfuscation, and summary obfuscation. Using the prepared text sample, they checked three tools and found that tools fail to find translation and summary obfuscations.

Křížková, Tomášková, and Gavalec (2016) made a comparative analysis of five systems by completing two test series that used the same eight articles. The first test consisted of the articles without any modifications; the second test included articles manually modified by reordering words in the text. Their analysis consisted mainly of the percentage of plagiarism found and the time spent by the systems checking the articles. They then applied multi-criteria decision-making for choosing the best system. However, there is no clear indication of the goal of the comparison, the plagiarism already present in each of the articles, or how much of the plagiarism found by the systems matched the plagiarism initially present. They also addressed usability by using a criterion “additional support” that includes the possibility to edit text directly on the website, multilingual checking, availability of extensive information about plagiarism, etc.

Bull, Collins, Coughlin, and Sharp (2001) used a well-planned methodology and checked five systems identified through a survey of academic staff from the higher education sector. They compared many functional features and also tested the performance of the tools. The criteria for evaluation included, among other issues, the clarity of the reports, the clarity of the instructions, the possibility to print the results, and the ease of interpreting the results, all of which refer to the usability of tools. To test the performance, they used eleven documents from six academic disciplines and grouped them into four categories according to the type of plagiarized material: essays from on-line essay banks, essays with verbatim plagiarism from the internet, essays written in collusion with others but with no internet material included, and essays written in collusion and containing some copied internet material. They tested the documents over a period of 3 months. In the end, they concluded that the tools were “effective in identifying the types of plagiarism that they are designed to detect” (Bull et al., 2001, p. 5). However, not all tools performed well in their experiments and they also reported some anomalies in their results.

Chaudhuri (2008) examined only one particular tool and used 50 plagiarized papers from many different sources (freely available databases, subscription databases, open access journals, open sources, search engines, etc.) in different file formats. The researcher found that the tool was unable to match papers from subscription databases and articles from open access journals, and that it could not properly process cited and quoted material.

Luparenko (2014) tested 22 tools that were selected as popular ones based on an analysis of scientific literature and web sources. She considered many criteria related to functional specification (such as type, availability of a free trial mode, need for mandatory registration on a website, number of users that have access to the program, database, acceptable file formats, etc.) and also checked the performance of the tools using one scientific paper in the Ukrainian language and another one in English. Moreover, the checking was done using three different methods: entering the text in a field on the website, uploading a file, and submitting the URL of the article. She measured the checking time, evaluated the quality of the report provided by the tools, and reported the percentage of unique text found in each of the articles.

The Croatian researchers Birkić, Celjak, Cundeković, and Rako (2016) tested four tools that are widely used in Europe and can be used at the national and institutional level. They compared such criteria as the existence of an API (application programming interface), the possibility to integrate the tool as a plug-in for learning management systems, database scope, size of the user community, and other criteria. The researchers tested the tools using two papers for each type of submission: journal articles, conference papers, master’s and doctoral theses, and student papers. However, they did not include different types of plagiarism, and they evaluated the checking process with a focus on quote recognition, tool limitations, and interface intuitiveness.

Kakkonen and Mozgovoy (2010) tested eight systems using 84 test documents from several sources (the internet, electronically unpublished books or the authors’ own prepared texts, paper mills) that contained several types of plagiarism: verbatim copying, paraphrasing (e.g. adding more spaces, making intentional spelling errors, deleting or adding commas, replacing words with synonyms, etc.), and technical tricks. The technical tricks included the use of homoglyphs, i.e. substituting similar-looking characters from different alphabets, adding white-colored characters in place of spaces, and including text as images. The authors provided a very detailed description of the experiments conducted and used a well-planned methodology. Their findings include problems with submissions from a paper mill, difficulties in identification of synonymous and paraphrased text, as well as in finding the source for text obfuscated by technical tricks.
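To illustrate the homoglyph trick, the following minimal Python sketch (our own illustration, not taken from the cited study; the character mappings are merely examples) shows how replacing a few Latin letters with visually identical Cyrillic ones defeats naive exact matching, and how a simple reverse mapping can normalize the text before comparison:

# Illustrative sketch (not from the cited study): a few Latin-to-Cyrillic
# homoglyph pairs that look identical on screen but have different code points.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "c": "\u0441",  # Cyrillic small es
}
REVERSE = {cyr: lat for lat, cyr in HOMOGLYPHS.items()}

def disguise(text: str) -> str:
    """Replace selected Latin letters with look-alike Cyrillic letters."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

def normalize(text: str) -> str:
    """Map known homoglyphs back to Latin before text matching."""
    return "".join(REVERSE.get(ch, ch) for ch in text)

original = "academic plagiarism"
disguised = disguise(original)
print(disguised == original)             # False: exact string matching is defeated
print(normalize(disguised) == original)  # True: normalization restores the text

A text-matching system that does not perform such normalization (or an equivalent Unicode-aware comparison) will report a disguised passage as original, which is exactly the weakness these test documents probe.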

However, the most methodologically sound comparisons were conducted by Debora Weber-Wulff and her team (Weber-Wulff, Möller, Touras, & Zincke, 2013) between 2004 and 2013 (see http://plagiat.htw-berlin.de/start-en/). In their last testing experiment in 2013, the researchers compared 15 tools that were selected based on previous comparisons. The testing set contained both plagiarized and original documents in English, German, and Hebrew and included various types of plagiarism from many different sources. They found serious problems with both false positives and false negatives, as well as usability problems such as the many clicks needed for simple tasks, unclear reports, and language issues.

Summarizing the related work discussed in this section, it is worth mentioning that available studies on text-matching software:

rarely address the evaluation of performance and usability of such tools, but mostly include a simple overview of their functional features, primitive categorization, or trivial comparisons;

infrequently provide justification for the selection of tools based on well-defined reasons but often only mention the popularity of the tools;

seldom use a well-planned scientific methodology and a well-considered corpus of texts in cases in which they evaluate the performance of the tools;

do not report “explicitly on experimental evaluations of the accuracy and false detection rates” (Kakkonen & Mozgovoy, 2010 , p. 139).

Taking into account the rapid changes in the field (some tools are already out of service, others have been continuously improved, and new tools are emerging) the need for comparative studies that in particular test the performance of the tools is constant. McKeever ( 2006 , p. 159) also notes that “ with such a bewildering range of products and techniques available, there is a compelling need for up-to-date comparative research into their relative effectiveness ”.

Methodology

The basic premise of this software test is that the actual usage of text-matching software in an educational setting is to be simulated. The following assumptions were made based on the academic experience of some members of the testing group before preparing for the test:

Students tend to plagiarize using documents found on the internet, especially Wikipedia.

Some students attempt to disguise their plagiarism.

Very few students use advanced techniques for disguising plagiarism (for example, homoglyphs).

Most plagiarizing students do not plagiarize an entire document from one source, but use multiple sources.

Instructors generally have many documents to test at one time.

There are legal restrictions on instructors submitting student work.

In some situations, the instructor only reviews the reports; submission is done either by the students themselves or by a teaching assistant.

Instructors do not have much time to spend on reviewing reports.

Reports must be stored in printed form in a student’s permanent record if a sanction is levied.

Universities wish to know how expensive the use of a system will be on a yearly basis.

Not all of these assumptions could be put to the test; for example, most systems would not tell us how much they charge, as this is negotiated on an individual basis with each institution.

Testing documents

In order to test the systems, a large collection of intentionally plagiarized documents in eight different languages was prepared: Czech, English, German, Italian, Latvian, Slovak, Spanish, and Turkish. The documents used various sources (Wikipedia, online articles, open access papers, student theses available online) and various plagiarism techniques (copy & paste, synonym replacement, paraphrase, translation). Various disguising techniques (white characters, homoglyphs, text as image) were used in additional documents in Czech. The testing set also contained original documents to check for possible false positives and a large document to simulate a student thesis.

One of the vendors noted in pre-test discussions that they perceived Turnitin’s exclusive access to publishers’ databases as an unfair advantage for that system. As we share this view and did not want to distort the results, documents with restricted access were deliberately not included.

All testing documents were prepared by TeSToP team members or their collaborators. All of them were obliged to adhere to the written guidelines. As a result, each language set contained at least these documents:

a Wikipedia article in a given language with 1/3 copy & paste, 1/3 with manual synonym replacement, and 1/3 manual paraphrase;

4–5 pages from any publicly available source in a given language with 1/3 copy & paste, 1/3 with manual synonym replacement, and 1/3 manual paraphrase;

translation of the English Wikipedia article on plagiarism detection, half using Google Translate and half translated manually;

an original document, i.e. a document which is not available online and has not been submitted previously to any text-matching software;

a multi-source document in three variations, once as a complete copy & paste, once with manual synonym replacement and once as a manual paraphrase.

The basic multi-source document was created as a combination of five different documents following the pattern ABCDE ABCDE ABCDE ABCDE ABCDE, where each letter represents a chunk of text from a specific source. Each chunk was one paragraph (approx. 100 words) long. The documents were taken from Wikipedia, open access papers, and online articles. In some languages, additional documents were included to test specific features of the systems.
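As an illustration of the ABCDE interleaving described above, the following minimal Python sketch (our own illustration; the actual test documents were assembled manually from real sources, and the paragraph texts here are placeholders) shows how such a document could be composed:

# Sketch of the ABCDE ABCDE ... interleaving used for the multi-source
# test documents; the real documents were assembled manually.
def interleave_sources(sources, rounds=5):
    """Take the next paragraph from each source A..E in turn, repeating
    the ABCDE pattern for the given number of rounds."""
    chunks = []
    for round_idx in range(rounds):
        for paragraphs in sources:  # sources ordered A, B, C, D, E
            chunks.append(paragraphs[round_idx])
    return "\n\n".join(chunks)

# Hypothetical input: five sources, each split into five ~100-word paragraphs.
sources = [[f"Source {label}, paragraph {i + 1} (approx. 100 words)."
            for i in range(5)] for label in "ABCDE"]

document = interleave_sources(sources)
print(document.splitlines()[0])  # the first chunk comes from source A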

Table 1 gives an overview of the testing documents and naming convention used for each language set.

Some language sets contained additional documents. Since many Slovak students study at Czech universities and the Czech and Slovak languages are very similar, a translation from Slovak to Czech was included in the Czech set and vice versa. There is also a significant Russian minority in Latvia, so a translation from Russian to Latvian was included as well. The German set contained a large document with known plagiarism to test the usability of the systems, but it is not included in the coverage evaluation.

The documents were prepared in PDF, DOCX, and TXT formats. By default, the PDF version was uploaded. If a system did not allow that format, DOCX was used. If DOCX was not supported, TXT was used. Some systems do not enable uploading documents at all; in these cases, the text was copied and pasted from the TXT file. Permission to use the sources in this manner was obtained for all original works; the permissions were either implicit (e.g. a Creative Commons license) or explicit consent was obtained from the author.

Testing process

Between June and September 2018, we contacted 63 system vendors. Out of these, 20 agreed to participate in the testing. Three systems had to be excluded because they do not consider online sources and one because it has a limit of 150 words for its web interface. In the next stage, the documents were submitted to the systems by authorized TeSToP members at a time unknown to the vendors. System default parameters were used at all times; if values such as the minimum word run were discernible, they were recorded. After submission of the documents, one system withdrew from testing. Thus 15 systems were tested using documents in eight languages.

For evaluation, the following aspects were considered:

coverage: How much of the known plagiarism was found? How did the system deal with the original text?

usability: How smooth was the testing process itself? How understandable are the reports? How expensive is the system? Other usability aspects.

To perform the coverage evaluation, the results were meticulously reviewed in both the online interface and the PDF reports, if available. Since the percentages of similarity reported do not include exact information on the actual extent of plagiarism and may even be misleading, a different evaluation metric was used. The coverage was evaluated by awarding 0–5 points for each test case for the amount of text similarity detected:

5 points: all or almost all text similarity detected;

4 points: a major portion;

3 points: more than half;

2 points: half or less;

1 point: a very minor portion;

0 points: one sentence or less.

For original work that produced false positives, the scale was reversed. Two or three team members independently examined each report for a specific language and discussed cases in which they did not agree. In some cases, it was difficult to assign points using the above-mentioned categories, especially for the systems which show only the matches found and hide the rest of the document. If the difference between evaluators was not higher than 1 point, the average was taken. The interpretation of the above-mentioned scale was continuously discussed within the whole team.
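The following minimal sketch illustrates the reconciliation rule described above for two evaluators (the function name and example scores are our own, not part of the study materials):

# Sketch of the scoring rule: scores are on the 0-5 coverage scale,
# the average is taken when two evaluators differ by at most 1 point,
# and larger disagreements are flagged for discussion within the team.
def reconcile(score_a, score_b):
    """Return the agreed coverage score, or None if discussion is needed."""
    if abs(score_a - score_b) <= 1:
        return (score_a + score_b) / 2
    return None  # difference > 1 point: resolved by discussion, not averaging

print(reconcile(4, 5))  # 4.5
print(reconcile(2, 4))  # None -> discussed within the team until consensus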

To perform a usability evaluation, we designed a set of qualitative measures stemming from available literature (e.g. Badge & Scott, 2009 ; Chowdhury & Bhattacharyya, 2016 ; Hage, Rademaker, & van Vugt, 2010 ; Martins, Fonte, Henriques, & da Cruz, 2014 ) and our experience. There were three major areas identified:

Testing process;

Test results;

Other aspects.

Two independent team members assessed all systems on all criteria, giving a point if the criterion was satisfied and no points if not. After that, they discussed all differences together with a third team member in order to reach a consensus as far as possible. If an agreement was not possible, half a point was awarded. If a system offered a given functionality, but the three researchers testing the systems were unable to find it without detailed guidance, 0.5 points were also awarded.

Our testing took place between November 2018 and May 2019. During this time, we tested both coverage and usability. An additional test of the multi-source documents took place between August and November 2019. Since the present research did not benefit from any funding, the researchers were expected to fulfil their institutional workloads during the research period. Given the size of the project team, spread across various countries, we could make significant progress only during semester breaks, which explains the length of the testing process. It should be noted that we tested what the systems offered at the time of data collection and used only the features that were available through the access given to us by the vendors.

The methodology was sent to all vendors, so that they were informed about the aim of the testing and other aspects of the process. The vendors were informed about the categories of our testing (coverage criteria and usability criteria), as well as the fact that we planned to use documents in multiple languages.

Since the analysis and interpretation of the data are quite sensitive, we approached this phase with the utmost care. As suggested by Guba and Lincoln (1989), member checking is an effective technique for establishing trustworthiness in qualitative studies. Therefore, having analyzed and reported the data, we sent a preprint of the results to the vendors. Team members closely evaluated the issues raised by the vendors. Not all of them could be addressed in this paper, but as many as possible were incorporated. Because of these rigorous efforts to establish the validity of the results and the reliability of the study, publication was further delayed.

Overview of systems

In this section, a brief description of all of the web-based systems involved in the test is given. The information presented here was provided by the companies operating the systems, either on their websites or upon request by telephone or email, using the list of questions documented in Weber-Wulff (2019). Links to the main pages of the systems can be found in the Appendix.

The Akademia system presents itself as an anti-plagiarism system. It is intended for use at all levels of educational institutions and also for commercial institutions. The primary focus is on the region of Kosovo and Albania. The system was introduced in 2018. It is run by the company Sh.PK Academy Platform located in Pristina, Kosovo (Innovation Centre Kosovo, 2018 ).

Copyscape declares itself to be a plagiarism checker. Its primary aim is to provide a tool for owners of websites to check whether their original content has been used by others. They also provide a service of regular checks and email alerts. Copyscape, which started in 2004 (Greenspan, 2019), is operated by a private company, Indigo Stream Technologies Ltd., which is apparently based in Gibraltar. It does not have its own database but uses Google services to crawl the web.

Docol©c describes itself as a system for finding “similarities between text documents on the Internet” (Docol©c, 2019). It is intended for institutional use and focuses on German-speaking countries. According to the company, the system is used by more than 300 educational institutions in Austria, Germany, and Switzerland, plus around 20 universities worldwide, and is integrated into the conference systems EDAS and OpenConf. Docol©c is operated by a private company, Docol©c UG (haftungsbeschränkt) & Co KG, based in Germany. It was developed in the years 2004–2005 at the University of Braunschweig, originally intended for personal use only. In 2006, it became available commercially. It uses MS Bing services to crawl the web and enables its customers to connect and browse their own databases. The license costs depend on the number of pages to be scanned per year and per institution.

DPV is part of the Slovenian National Portal of Open Science, which also provides an academic repository. The project, which is supported by the Slovenian higher education institutions, started in 2013. The detection software was developed by researchers from Slovenian universities. The operation of the system is partially funded by the European Union from the European Regional Development Fund and the Ministry of Education, Science and Sport (Ojsteršek et al., 2014 ).

Dupli Checker presents itself as a plagiarism checker. It is a free tool, but each search is limited to 1,000 characters. It does not focus on any specific users or purposes. There is no information available on the website about who operates it; we were also not able to obtain such information when we asked directly via email. The website offers a variety of tools such as a paraphrasing tool and many search engine optimization (SEO) and other website management tools. Additionally, according to a statement found on their website, they “have delivered over 1,000,000 pages of high-quality content which attracts large amounts of traffic to [their] client’s websites” (Dupli Checker, 2019), so it appears that they also offer a copywriting service.

The system intihal.net is operated by a Turkish private company, Asos Eğitim Bilişim Danışmanlik, and focuses on the Turkish language. According to direct information from its representatives, the system is used by 50 Turkish universities and has been in operation since approximately 2017.

PlagAware is operated by a German private company, PlagAware Unternehmergesellschaft (haftungsbeschränkt). PlagAware states that it has 39,000 active users and focuses on a wide range of customers: universities, schools, and businesses, which are offered institutional licenses, and individuals, who can use purchased credits for individual checks of documents. They promise to compare submissions against 10 billion online documents and reference texts provided by the user.

Plagiarism Software is operated by a private company based in Pakistan. They focus on all types of individual users and claim to have 500,000 users. According to information from a company representative, they started in approximately 2014 (on the web they claim to have seven years of experience, which would date back to 2012) and they use search engine services to browse the web. They offer five levels of pricing that differ according to the amount of content being compared.

PlagiarismCheck.org presents itself as a plagiarism checking tool. It is operated by a company based in the United Kingdom and has been on the market since 2011. Since around 2017 they have been focusing on the B2B market. They state that they have more than 77,000 users in 72 countries. They use MS Bing for online searches and, for the English language, the representatives claim that they are able to perform synonym detection. They provide three levels of institutional licenses.

PlagScan presents itself as a plagiarism checker. It is operated by the German company PlagScan GmbH and was launched in 2009. They state that they have more than 1,500 organizations as customers. Although they focus on higher education, high schools, and businesses, PlagScan is also available for single users. They search the internet using MS Bing, published academic articles, their so-called “Plagiarism Prevention Pool”, and optionally a customer’s own database. PlagScan offers multiple pricing plans for each type of customer, as well as options for a free trial.

StrikePlagiarism.com presents itself as a plagiarism detection system, operated by the Polish company Plagiat.pl. It provides its services to over 500 universities in 20 countries. Apart from universities, it is also used by high schools and publishers. They state that they are market leaders in Poland, Romania, and Ukraine. In 2018, they signed a Memorandum of Cooperation with the Ukrainian Ministry of Education and Science. The software searches in multiple databases and aggregators.

Turnitin was founded in 1999 by four students and grew to be an internationally known company. In 2014, they acquired the Dutch system Ephorus and “joined forces” (Ephorus, 2015). In 2019 they themselves were taken over by a US investment company, Advance (Turnitin, 2019). With a focus on institutional users only, they are used by 15,000 institutions in 150 countries. Turnitin uses its own crawler to search the web, including an archive of all previously indexed web pages (Turnitin, n.d.). Turnitin further compares the texts against published academic articles, as well as their own database of all assignments which have ever been submitted to the system, and optionally institutional databases. They are also developing many additional software tools for educators to use in teaching and giving feedback.

Unicheck, which declares itself to be a plagiarism checker, was launched in 2014 under the name Unplag. The brand name was changed in 2017. It is operated by the company UKU Group Ltd., registered in Cyprus (Opencorporates, 2019). It is used by 1,100 institutions in 69 countries, and apart from institutional users (high schools, higher education institutions, and businesses), they also offer their services for personal use. The pricing plans differ according to the type of user. Unicheck compares the documents with web content, open access sources and, for business customers, also with their private library. They also claim to perform homoglyph (similar-looking character replacement) detection.

Urkund, which presents itself as a fully-automated text-recognition system for dealing with the detection, handling, and prevention of plagiarism (Urkund, 2019), was founded in 1999. It is currently owned by a private equity fund, Procuritas Capital Investors VI, located in Stockholm. They claim to be a leader in the Nordic countries and to have clients in 70 countries worldwide, mainly academic institutions and high schools, including over 800 of Sweden’s high schools. They crawl the web “with the aid of an ordinary search engine” (Urkund, 2019) and they also compare the documents with student submissions to the system.

Viper presents itself as a plagiarism checker. It was founded in 2007. Viper focuses on all types of customers; the pricing is based on the pay-as-you-go principle. Currently, it is owned by All Answers Limited (2019), which, according to the information on its website, gives the impression of being an essay mill. It is interesting to trace the changes in how Viper’s “Terms and conditions” page describes the use of uploaded content. In 2016 the page stated “[w]hen you scan a document, you agree that 9 months after completion of your scan, we will automatically upload your essay to our student essays database which will appear on one of our network of websites so that other students may use it to help them write their own essays” (Viper, 2016). The time span was shortened to 3 months some time afterwards (Viper, 2019a). These paragraphs have been removed from the current version of the page (Viper, 2019b). On a different page, it is noted that “when you scan your work for plagiarism using Viper Premium it will never be published on any of our study sites” (Viper, 2019c). In e-mail communication, Viper claims that they do not use any essay without the author’s explicit consent.

Coverage results

This section discusses how much of the known text similarity was found by the systems. As the systems have various strengths and weaknesses, it is not possible to boil down the results to a single number that could easily be compared. Rather, the focus is on different aspects that will be discussed in detail. All tables in this section show the averages of the evaluation; the maximum possible score is therefore 5 and the minimum possible score is 0. Boldface indicates the maximum value achieved in each line, providing an answer to the question as to which system performed best for this specific criterion. All values are shaded from red (worst) to dark green (best), with yellow being intermediate.

Language comparison

Table 2 shows the aggregated results of the language comparisons based on the language sets. It can be seen that most of the systems performed better for English, Italian, Spanish, and German, whereas the results for the Latvian, Slovak, Czech, and Turkish languages are poorer in general. The only system which found a Czech student thesis from 2010, which is publicly available from a university webpage, was StrikePlagiarism.com. The Slovak paper in an open-access journal was not found by any of the systems. Urkund was the only system that found an open-access book in Turkish. It is worth noting that a Turkish system, intihal.net, did not find this Turkish source.

Unfortunately, our testing set did not contain documents in Albanian or Slovenian, so we were not able to evaluate the potential strengths of the national systems (Akademia and DPV). Due to restrictions on our account, it was not possible for us to process the Italian language in Akademia, although that should now be possible.

There are interesting differences between the systems depending on the language. PlagScan performed best on the English set, Urkund on the Spanish, Slovak, and Turkish sets, PlagAware on the German set, and StrikePlagiarism.com on the Czech set. Three systems (PlagiarismCheck.org, PlagScan, and StrikePlagiarism.com) achieved the same maximum score for the Italian set.

Besides the individual languages, we also evaluated language groups according to a standard linguistic classification, that is, Germanic (English and German), Romanic (Italian and Spanish), and Slavic (Czech and Slovak). Table 3 shows the results for these language subgroups. The systems achieved better and comparable results for the Germanic and Romanic languages, whereas the results for the Slavic languages are noticeably worse.

Types of plagiarism sources

This subsection discusses the differences between various types of sources, with the results given in Table 4. The testing set contained Wikipedia extracts, open-access papers, student theses, and online documents such as blog posts. The systems generally yielded the best results for Wikipedia sources. The scores between the systems vary due to their ability to detect paraphrased Wikipedia articles. Urkund scored best for Wikipedia, Turnitin found the most open access papers, StrikePlagiarism.com scored best in the detection of student theses, and PlagiarismCheck.org gave the best result for online articles.

Since it is assumed that Wikipedia is an important source for student papers, the Wikipedia results were examined in more detail. Table 5 summarizes the results from 3 × 8 single-source documents (one article per language) and Wikipedia scores from multi-source documents containing one-fifth of the text taken from the current version of Wikipedia. In general, most of the systems are able to find similarities to the text that has been copied and pasted from Wikipedia.

Over a decade ago, Bretag and Mahmud (2009, p. 53) wrote:

“The text-matching facility in electronic plagiarism detection software is only suited to detect ‘word-for-word’ or ‘direct’ plagiarism and then only in electronic form. The more subtle forms of plagiarism, plus all types of plagiarism from paper-based sources, are not able to be detected at present.”

Technological progress, especially in software development, advances rapidly. It is commonly expected that text-matching in the sense of finding both exact text matches and paraphrased ones should be a trivial task today. The testing results do not confirm this.

The results in Table 5 are quite surprising and indicate that the systems are insufficient. The performance on plagiarism from Wikipedia disguised by synonym replacement was generally poorer, and almost no system was able to satisfactorily identify manual paraphrase plagiarism. This is surely due to both the immense number of potential sources and the exponential explosion of potential changes to a text.

Plagiarism methods

The same aggregation as was done in Table 5 for Wikipedia was also done over all 16 single-source and eight multi-source documents. Not only copy & paste, synonym replacement, and manual paraphrase were examined, but also translation plagiarism.

Translations were done from English to all languages, as well as from Slovak to Czech, from Czech to Slovak and from Russian to Latvian. The results are shown in Table 6 , which confirms that software performs worse on synonym replacement and manual paraphrase plagiarism.

As has been shown in other investigations (Weber-Wulff et al., 2013), translation plagiarism is very seldom picked up by software systems. The worst performance of the systems in this test was indeed on translation plagiarism, with one notable exception: Akademia. This system is the only one that performs semantic analysis and allows users to choose the translation language. Unfortunately, its database, with respect to the languages of our testing, is much smaller than the databases of the other systems. However, the performance drop between copy & paste and translation plagiarism is much smaller for Akademia than for the other systems.

Given the very poor performance of the systems for translation plagiarism, it did not make sense to distinguish Google Translate and manual translation. The vast majority of the systems did not find text translated by either of them.

Some systems found a match in the translation from Slovak to Czech; nevertheless, this was due to words that are identical in both languages. For the other languages, the examination of the system outputs for translation plagiarism revealed that the only matches the systems found in translated documents were in the references. Matches in the references might be an indicator of translation plagiarism, although, of course, if two papers use the same source and follow the same style guide, the reference entries will be identical. This is an important result for educators.

Single-source vs. multi-source documents

One scenario that was considered in the test was a text compiled from short passages taken from multiple sources. This seems to be much closer to a real-world setting, in which plagiarism of a whole document is less likely, whereas ‘patch-writing’ or ‘compilation’ is a frequent strategy of student writers, especially second-language student writers (Howard, 1999, p. 117ff). Surprisingly, some systems performed differently for these two scenarios (see Table 7). To remove a bias caused by different types of sources, the Wikipedia-only portions were also examined in isolation (see Table 8); the results are consistent in both cases.

Usability results

The usability of the systems was evaluated using 23 objective criteria, which were divided into three groups related to the system workflow process, the presentation of the results, and additional aspects. The points were assigned based on the researchers’ findings during a specific period of time.

Workflow process usability

The first criteria group is related to the usability of the workflow process of using the systems. It was evaluated using the following questions:

Is it possible to upload and test multiple documents at the same time?

Does the system ask to fill in metadata for documents?

Does the system use original file names for the report?

Is there any word limit for the document testing?

Does the system display text in the chosen language only?

Can the system process large documents (for example, a bachelor thesis)?

The results are summarized in Table 9 . With respect to the workflow process, five systems were assigned the highest score in this category. The scores of only five systems were equal to or less than 3. Moreover, the most supported features are the processing of large documents (13 systems), as well as displaying text in the chosen language and having no word limits (12 systems). Uploading multiple documents is a less supported feature, which is unfortunate, as it is very important for educational institutions to be able to test several documents at the same time.

Result presentation usability

The presentation and understandability of the results reported by the systems were evaluated in a second usability criteria group. Since the systems cannot determine plagiarism, the results must be examined by one or more persons in order to determine whether plagiarism is present and a sanction warranted. It must be possible to download the result reports and to locate them again in the system. Some systems rename the documents, assigning internal numbering to them, which makes it extremely difficult to find the report again. Many systems have different formats for online and downloadable reports. It would be useful for the report review if the system kept the original formatting and page numbers of the document being analyzed in order to ease the evaluation effort.

It is assumed that the vast majority of universities require that the evidence of found similarities be documented in the report so that it can be printed out for a student’s permanent record. This evidence is examined by other members of the committee, who may not have access to the system at a disciplinary hearing.

The results related to the presentation group are summarized in Table 10 and all criteria are listed below:

Reports are downloadable.

Results are saved in the user’s account and can be reviewed afterwards.

Matched passages are highlighted in the online report.

Matched passages are highlighted in the downloaded report (offline).

Evidence of similarity is demonstrated side-by-side with the source in the online report.

Evidence of similarity is demonstrated side-by-side with the source in the downloaded report.

Document formatting is not changed in the report.

Document page numbers are shown in the report.

The report is not spoiled by false positives.

None of the systems was able to achieve the highest score in the usability group related to the test results. Two systems (PlagScan and Urkund) support almost all features, but six systems support half of the features or fewer. The most supported features are the possibility to download result reports and the highlighting of matched passages in the online report. Less supported features are the side-by-side demonstration of evidence in the downloaded and online reports, as well as keeping the document formatting.

Other usability aspects

Besides the workflow process and the result presentation, there are also other system usability aspects worth evaluating, specifically:

System costs are clearly stated on the system homepage.

Information about a free system trial version is advertised on the webpage.

The system can be integrated with a learning management system via an API.

The system can be integrated with the Moodle platform.

The system provides call support.

The call support is provided in English.

English is used properly on the website and in the reports.

There are no external advertisements.

In order to test the call support, telephone numbers were called from a German university telephone during normal European working hours (9:00–17:00 CET/GMT + 1). A checklist (Weber-Wulff, 2019) was used to guide the conversation if anyone answered the phone. English was used as the language of communication, even when calling German companies. For intihal.net, the call was not answered, but an hour later the call was returned. The person answering for StrikePlagiarism.com did not speak English, but organized someone who did; he refused, however, to give information in English and insisted that email support be used. Plagiarism Software publishes a number in Saudi Arabia, but returned the call from a Pakistani number. PlagAware only has an answering machine taking calls, but will respond to emails. The woman answering the Turnitin number kept repeating that all information could be found on the web pages and insisted that, since this was not a customer calling, the sales department be contacted. Each of these systems was awarded half a point. Akademia appears to have published a wrong number on their web page, as far as could be discerned, since the person answering the phone only spoke a foreign language. If we did not reach anyone via phone and thus could not assess the availability of support in English, we assigned 0 points for this criterion.

As shown in Table 11, only PlagiarismCheck.org and Unicheck fulfilled all criteria. Five systems were only able to support fewer than half of the defined features. The most commonly fulfilled criteria were proper use of English and the absence of external advertisements. Problematic areas are costs that are not clearly stated, unclear information about possible integration with Moodle, and the lack of call support in English.

Discussion

In the majority of the previous research on testing text-matching tools, the main focus has been on coverage. The problem with most of these studies is that they approach coverage from only one perspective: they aim solely at measuring the overall coverage performance of the detection tools. The present study approaches coverage from four perspectives: language-based coverage, language subgroup-based coverage, source-based coverage, and disguising technique-based coverage. This study also includes a usability evaluation.

It must be noted that both the coverage and the usability scores are based on work that was done with potentially older versions of the systems. Many companies have responded to say that they are now able to deal with various issues. This is good, but we can only report on what we saw when we evaluated the systems. If any part of the evaluation were to be repeated, it would have to be repeated for all systems. It should be noted that similar responses have come from vendors for all of Weber-Wulff’s tests, such as Weber-Wulff et al. (2013).

It must also be noted that the selection of usability criteria and their weights reflects the personal experience of the project team. We are fully aware that different institutions may have different priorities. To mitigate this limitation, we have published all usability scores, allowing for calculations using individual weights.
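As an example of such a recalculation, the sketch below (criterion names, scores, and weights are hypothetical, not taken from the published tables) computes a weighted usability score from the per-criterion points:

# Hypothetical example of re-weighting the published per-criterion usability
# points (0, 0.5, or 1 each) with institution-specific weights.
def weighted_usability(scores, weights):
    """Weighted average of per-criterion points, normalized by the total weight."""
    total_weight = sum(weights.values())
    return sum(scores[criterion] * weight
               for criterion, weight in weights.items()) / total_weight

scores = {"bulk_upload": 1.0, "report_download": 1.0, "moodle_plugin": 0.0}
weights = {"bulk_upload": 3.0, "report_download": 2.0, "moodle_plugin": 1.0}
print(round(weighted_usability(scores, weights), 2))  # 0.83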

Language-based coverage

With respect only to language-based coverage, the performance of the tools for eight languages was evaluated in order to determine which tools yield the best results for each particular language. The results showed that the best-performing tools with respect only to coverage are (three systems tied for Italian):

PlagAware for German,

PlagScan for English, Italian,

PlagiarismCheck.org for Italian, Latvian,

StrikePlagiarism.com for Czech, Italian and

Urkund for Slovak, Spanish, and Turkish.

It is worth noting that, in an overall sense, the text-matching tools tested yield better results for widely spoken languages. In the literature, language-based similarity detection mainly revolves around identifying plagiarism among documents in different languages. No study, to our knowledge, has been conducted specifically on the coverage of multiple languages. In this respect, these findings offer valuable insights to the readers. As for the language subgroups, the tested text-matching tools work best for Germanic languages and Romanic languages while results are not satisfactory for Slavic languages.

Source-based coverage testing

Source-based coverage testing was done using four types of sources: Wikipedia, open access papers, a student thesis, and online articles. For many students, Wikipedia is the starting point for research (Howard & Davies, 2009), and it can thus be regarded as one of the primary sources for plagiarists. Since the Wikipedia database is freely available, it is expected that Wikipedia texts should be easily identifiable. Testing the tools with Wikipedia texts demonstrates the fundamental ability to catch text matches.

Three articles were created per language, each made using a different disguising technique (copy & paste, synonym replacement, and manual paraphrase). The best performing tools for the sources tested over all languages were:

PlagiarismCheck.org for online articles,

StrikePlagiarism.com for the student thesis (although this may be because the student thesis was in Czech),

Turnitin for open-access papers and

Urkund for Wikipedia.

Since Wikipedia is assumed to be a widely used source, it was worth investigating Wikipedia texts in more depth. The results revealed that the majority of tools are successful at detecting similarity with copy & paste from Wikipedia texts, with the exception of intihal.net, DPV, and Dupli Checker. However, a considerable drop was observed for synonym replacement texts in all systems except Urkund, PlagiarismCheck.org, and Turnitin, which yielded promising results for such texts. This replicates the result of the study of Weber-Wulff et al. (2013), in which Urkund and Turnitin were found to have the best results among 16 tools.

As for the paraphrased texts, all systems fell short in catching similarity at a satisfactory level. PlagiarismCheck.org was the best performing tool in paraphrased texts compiled from Wikipedia. Overall, Urkund was the best performing tool at catching similarity in Wikipedia texts created by all three disguising techniques.

One aspect of Wikipedia sources that is not adequately addressed by the text-matching software systems is the proliferation of Wikipedia copies on the internet. As discussed in Weber-Wulff et al. ( 2013 ), this can lead to the appearance of many smallish text matches instead of one large one. In particular, this can happen if the copy of the ever-changing Wikipedia in the database of the software system is relatively old and the copies on the internet are from newer versions. A careless teacher may draw false conclusions if they focus only on the quantity of Wikipedia similarities in the report.

Disguising technique-based coverage

The next dimension of coverage testing is disguising technique-based coverage. In this phase, documents were created using copy & paste, synonym replacement, paraphrase, and translation techniques. For copy & paste documents, all systems achieved acceptable results except DPV, intihal.net and Dupli Checker. Urkund was the best tool at catching similarity in copy & paste texts. The success of some of the tested tools at catching similarity in copy & paste texts has also been confirmed by other studies, for example for Turnitin (Bull et al., 2001 ; Kakkonen & Mozgovoy, 2010 ; Maurer et al., 2006 ; Vani & Gupta, 2016 ) and Docol©c (Maurer et al., 2006 ).

For synonym replacement texts, the best-performing tools from the copy & paste texts continued their success with a slight decline in scores, except for PlagiarismCheck.org, which yielded better results for synonym replacement texts than for copy & paste texts. Plagiarism Software and Viper showed the sharpest decline in their scores for synonym replacement. Urkund and PlagiarismCheck.org were the best tools in this category.

For paraphrased texts, none of the systems was able to provide satisfactory results. However, PlagiarismCheck.org, Urkund, PlagScan and Turnitin scored somewhat better than the other systems. PlagScan (Křížková et al., 2016 ) and Turnitin (Bull et al., 2001 ) also scored well in paraphrased texts in some studies.

For translated texts, none of the systems was able to detect translation plagiarism, with the exception of Akademia, which offers users an option to check for potential translation plagiarism. The systems detected translation plagiarism mainly in the references, not in the texts. This is in line with previous research findings, and the situation has not improved since then. For example, Turnitin and Docol©c have previously been shown not to be efficient at detecting translation plagiarism (Maurer et al., 2006 ). To increase the chances of detecting translation plagiarism, paying extra attention to matches with the reference entries should be encouraged, since matches from the same source can be a significant indicator of translation plagiarism. However, it should be noted that some systems may omit matches with the reference entries by default.

Multi-source coverage testing

In the last phase of coverage testing, we tested the ability of the systems to detect similarity in documents that are compiled from multiple sources. It is assumed that plagiarized articles contain text taken from multiple sources (Sorokina, Gehrke, Warner, & Ginsparg, 2006 ). This type of plagiarism requires additional effort to identify. If a system is able to find all similarity in documents which are compiled from multiple sources, this is a significant indicator of its coverage performance.

The multi-source results show that Urkund, the best performing system for single-source documents, shares the top score with PlagAware for multi-source documents, while Dupli Checker, DPV and intihal.net yielded very unsatisfactory results. Surprisingly, only two systems (Akademia and Unicheck) demonstrated a sharp decline in performance for multi-source documents, whereas the performance of ten systems actually improved. This shows that the systems perform better at catching short fragments in a multi-source text than at catching a whole document taken from a single source.

As for the general testing, the results are highly consistent with the Wikipedia results, which contributes to the validity of the single-source and multi-source testing. Again, for single-source documents, Urkund obtained the highest score, while PlagAware is the best performing system for multi-source documents. Dupli Checker, DPV and intihal.net obtained the lowest scores in both categories. Most of the systems demonstrated better performance for multi-source documents than for single-source ones. This is most probably explained by the chance the systems had of having access to a source. If a single source was missing from a tool's database, the tool had no chance to identify the text match, whereas the use of multiple sources gave the tools multiple chances of identifying at least one of them. This points out quite clearly the issue of false negatives: even if a text-matching tool does not identify a source, the text can still be plagiarized.
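One way to see why multi-source documents are easier is the following back-of-the-envelope calculation, which is not taken from the study and rests on the simplifying assumption that each source is independently present in a tool's database with the same probability p:

```python
# Back-of-the-envelope sketch: chance that a tool finds at least one of k sources,
# assuming each source is independently in its database with probability p.
# The values below are hypothetical, not results from the study.
def p_at_least_one(p, k):
    return 1 - (1 - p) ** k

print(p_at_least_one(0.6, 1))   # 0.6    -> single-source document
print(p_at_least_one(0.6, 4))   # ~0.974 -> document compiled from four sources
```

Under this (admittedly crude) independence assumption, a document built from several sources is much more likely to produce at least one text match, which is consistent with the explanation given above.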

Overall coverage performance

Based on the total coverage performance, calculated as an average of the scores for each test document, we can divide the systems into four categories based on their overall placement on a scale of 0 (worst) to 5 (best), sorted alphabetically within each category (a small illustrative calculation follows the list below):

Useful systems - the overall score in [3.75 – 5.0]:

There were no systems in this category

Partially useful systems - the overall score in [2.5 – 3.75):

PlagAware, PlagScan, StrikePlagiarism.com, Turnitin, Urkund

Marginally useful systems - the overall score in [1.25 – 2.5):

Akademia, Copyscape, Docol©c, PlagiarismCheck.org, Plagiarism Software, Unicheck, Viper

Unsuited for academic institutions - the overall score in [0 – 1.25):

Dupli Checker, DPV, intihal.net
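To make the arithmetic behind this classification concrete, the following minimal sketch computes an overall coverage score as the average over test documents and maps it onto the four categories above; the per-document scores in the example are hypothetical placeholders, not results from the study:

```python
# Minimal sketch: overall coverage score as the mean of per-document scores (0-5 scale)
# and classification into the four categories used above.
def coverage_category(doc_scores):
    overall = sum(doc_scores) / len(doc_scores)   # average over all test documents
    if overall >= 3.75:
        label = "useful"
    elif overall >= 2.5:
        label = "partially useful"
    elif overall >= 1.25:
        label = "marginally useful"
    else:
        label = "unsuited for academic institutions"
    return overall, label

# Hypothetical example: a system scoring unevenly across test documents
print(coverage_category([4.0, 2.5, 1.0, 3.5, 2.0]))   # (2.6, 'partially useful')
```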

The second evaluation focus of the present study is on usability. The results can be interpreted in two ways, either in a system-based perspective or a feature-based one, since some users may prioritize a particular feature over others. For the system-based usability evaluation, Docol©c, DPV, PlagScan, Unicheck, and Urkund were able to meet all of the specified criteria. PlagiarismCheck.org, Turnitin, and Viper were missing only one criterion (PlagiarismCheck.org dropped the original file names and both Turnitin and Viper insisted on much metadata being filled in).

In the feature-based perspective, the ability to process large documents, the absence of word limitations, and the ability to use the system in the chosen language were the features most supported by the systems. Unfortunately, the uploading of multiple documents at the same time was the least supported feature. This is odd, because it is an essential feature for academic institutions.

A similar usability evaluation was conducted by Weber-Wulff et al. ( 2013 ). In that study, they created a 27-item usability checklist and evaluated the usability of 16 systems. Their checklist includes criteria similar to those of the present study, such as storing reports, side-by-side views, or effective support service. The two studies have eight systems in common. In the study of Weber-Wulff et al. ( 2013 ), the top three systems were Turnitin, PlagAware, and StrikePlagiarism.com, while in the present study Urkund, StrikePlagiarism.com, and Turnitin are the best scorers. Copyscape, Dupli Checker, and Docol©c were the worst scoring systems in both studies.

Another similar study (Bull et al., 2001 ) addressed the usability of five systems including Turnitin. For usability, the researchers set some criteria and evaluated the systems based on these criteria by assigning stars out of five. As a result of the evaluation, Turnitin was given five stars for the clarity of reports, five stars for user-friendliness, five stars for the layout of reports and four stars for easy-to-interpret criteria.

The similarity reports are the end products of the testing process and serve as crucial evidence for decision makers such as honour boards or disciplinary committees. Since affected students may decide to ask courts to evaluate the decision, it is necessary for there to be clear evidence: the offending text and a potential source presented in a synoptic (side-by-side) style, including metadata such as page numbers to ease verifiability. Thus, the similarity reports generated were the focus of the usability evaluation.

However, none of the systems managed to meet all of the stated criteria. PlagScan (no side-by-side layout in the offline report) and Urkund (did not keep the document formatting) scored seven out of eight points. They were closely followed by Turnitin and Unicheck which missed two criteria (no side-by-side layout in online or offline reports).

The features supported most were downloadable reports and some sort of highlighting of the text match in the online reports. Two systems, Dupli Checker and Copyscape, do not provide downloadable reports to the users. The side-by-side layout was the least supported feature. While four systems offer side-by-side evidence in their online reports, only one system (Urkund) supports this feature in the offline report. It can be argued that the side-by-side layout is an effective way to make a contrastive analysis in deciding whether a text match can be considered plagiarism or not, but this feature is not supported by most of the systems.

Along with the uploading process and the understandability of reports, we also aimed to address certain features that would be useful in academia. Eight criteria were included in this area:

clearly stated costs,

the offer of a free trial,

integration to an LMS (Learning Management System) via API,

Moodle integration (as this is a very popular LMS),

availability of support by telephone during normal European working hours (9–15),

availability of support by telephone in English,

proper English usage on the website and in the reports, and

no advertisements for other products or companies.

The qualitative analysis in this area showed that only PlagiarismCheck.org and Unicheck were able to achieve a top score. PlagScan scored seven points out of eight and was followed by PlagAware (6.5 points), StrikePlagiarism.com (6.5 points), Docol©c and Urkund (6 points). Akademia (2 points), DPV (2 points), Dupli Checker (3 points), intihal.net (3 points) and Viper (3 points) did not obtain satisfactory results.

Proper English usage was the most supported feature in this category, followed by the absence of external advertisements. The least supported feature was clearly stated system costs; only six systems fulfilled this criterion. While it is understandable that a company wants to be able to charge as much as it can get from a customer, it is in the interest of the customer to be able to compare the total cost of use per year up front before diving into extensive tests.

In order to calculate the overall usability score, the categories were ranked based on their impact on usability. In this respect, the interpretation of the reports was considered to have the most impact on usability, since similarity reports can be highly misleading (also noted by Razı, 2015 ) when they are not clear enough or have inadequate features. Thus, the scores from this category were weighted threefold. The workflow process criteria were weighted twofold and the other criteria were weighted by one. The maximum weighted score was thus 47. Based on these numbers, we classified the systems into three categories (the boundaries between the categories were 35, 23, and 11; a minimal computation sketch follows the list below):

Useful systems: Docol©c, PlagScan, Turnitin, Unicheck, Urkund;

Partially useful systems: DPV, PlagAware, PlagiarismCheck.org, StrikePlagiarism.com, Viper;

Marginally useful systems: Akademia, Dupli Checker, Copyscape, intihal.net, Plagiarism Software.

Unsuited for academic institutions: -
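The weighting described above can be illustrated with the following minimal sketch. The raw category scores in the example are hypothetical and the per-category criterion counts are not spelled out here; only the weighting scheme (report criteria counted threefold, workflow criteria twofold, other criteria once) and the category boundaries are taken from the text:

```python
# Minimal sketch of the weighted usability score, using hypothetical raw scores.
# Weights follow the text: report interpretation x3, workflow x2, everything else x1.
WEIGHTS = {"report": 3, "workflow": 2, "other": 1}
BOUNDARIES = [(35, "useful"), (23, "partially useful"), (11, "marginally useful")]

def usability_category(raw_scores):
    weighted = sum(WEIGHTS[cat] * score for cat, score in raw_scores.items())
    for boundary, label in BOUNDARIES:
        if weighted >= boundary:
            return weighted, label
    return weighted, "unsuited for academic institutions"

# Hypothetical example: 7 report points, 6 workflow points, 5 other points
print(usability_category({"report": 7, "workflow": 6, "other": 5}))   # (38, 'useful')
```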

Please note that these categories are quite subjective, as our evaluation criteria are subjective and so are the weightings. For other use cases, the criteria might be different.

Combined coverage and usability

If the results for coverage and usability are combined on a two-dimensional graph, Fig.  1 emerges. In this section, the details of the coverage and usability are discussed.

Figure 1. Coverage and usability combined. X-axis: total score for coverage; Y-axis: total weighted score for usability

Coverage is the primary limitation of a web-based text-matching tool (McKeever, 2006 ), and the usability of such a system has a decisive influence on its users (Liu, Lo, & Wang, 2013 ). Therefore, Fig. 1 presents a clear portrayal of the overall effectiveness of the systems. Having determined their own criteria for the coverage and usability of a web-based text-matching tool, clients can decide which system works best in their settings. Vendors are given an idea of the overall effectiveness of their systems among the tools tested, and the diagram presents an initial blueprint for how, and in which direction, to improve them.

One important result that can be seen in this diagram is that the usability performance of the systems is relatively better than their coverage performance (see Fig. 1 ). As for coverage, the systems demonstrated at best only average performance. Thus, it has been shown that the systems tested fall short in meeting the coverage expectations. They are useful in the sense that they find some text similarity that can be considered plagiarism, but they do not find all such text similarity and they also suffer from false positives.

Conclusions and recommendations

This study is the output of an intensive two-year collaboration and the systematic effort of scholars from a number of different European countries. Despite the lack of external funding, the enthusiasm-driven team performed a comprehensive test of web-based text-matching tools with the aim of offering valuable insights to academia, policymakers, users, and vendors. Our results reflect the state of the art of text-matching tools between November 2018 and November 2019. Testing text-matching tools is not a new endeavour; however, previous studies have generally fallen short of providing satisfactory results. This study tries to overcome the problems and shortcomings of previous efforts. It compares 15 tools using two main criteria (coverage and usability), analyzing test documents in eight languages, compiled from several sources, and using various disguising techniques.

A summary of the most important findings includes the following points:

Some systems work better for a particular language or language family. Coverage of sources written in major languages (English, German, and Spanish) is in general much better than coverage of minor language sources (Czech or Slovak).

The systems’ performance varies according to the source of the plagiarized text. For instance, most systems are good at finding similarity to current Wikipedia texts, but not as good for open access papers, theses, or online articles.

The performance of the systems is also different depending on the disguising technique used. The performance is only partially satisfactory in synonym replacement and quite unsatisfactory for paraphrased and translated texts. Considering that patchwriting, which includes synonym replacement and sentence re-arranging, is a common technique used by students, vendors should work to improve this shortcoming.

The systems appear to be better at catching similarity in multi-source documents than single-source ones, although the test material was presented in blocks and not mixed on a sentence-by-sentence level.

As for the usability perspective, this study clearly shows how important the similarity reports are and how user-friendly (or not) the testing processes of the systems are. Users can see which features are supported by the systems and which are not. Also, vendors can benchmark their features against other systems.

Based on our results, we offer the following recommendations for the improvement of the systems, although we realize that some of these are computationally impossible:

Detect more types of plagiarism, particularly those coming from synonym replacement, translation, or paraphrase. Some semantic analysis results look promising, although their use will increase the amount of time needed for evaluating a system.

Clearly identify the source location in the report, do not just report “Internet source” or “Wikipedia”, but specify the exact URL and date stored so that an independent comparison can be done.

Clearly identify the original sources of plagiarism when a text has been found similar to a number of different sources. For example, if Wikipedia and another page that has copied or used text from Wikipedia turn up as potential sources, the system should show both as possible sources of plagiarism, prioritizing showing Wikipedia first because it is more likely to be the real source of plagiarism. Once Wikipedia has been determined as a potential source, this particular article should be closely compared to see if there is more from this source.

Avoid asking users to enter metadata (for example, author, title, and/or subject) in the system along with the text or file as mandatory information. It is good to have this feature available, but it should not be mandatory.

Lose the single number that purports to identify the amount of similarity. It does not, and it is misused by institutions as a decision maker. Plagiarism is multi-dimensional and must be judged by an expert, not a machine. For example, a system could report the number of word-match sequences found, the longest one, the average length of the sequences, the number of apparent synonym substitutions, etc. (a small sketch of such metrics follows these recommendations).

Design useful reports and documentation. Reports must be readable and understandable both online and printed as a PDF. Special care should be taken with printed forms, as they will become part of a student's permanent record. Reports must show users the suspected text match side-by-side with the possible sources of plagiarism, highlighting the text that appears similar.

Distinguish false positives from real plagiarism. Many of these false positives occur due to commonly used phrases within the context or language employed, or because variant quotation styles (German or French quotation marks, or indentation) are ignored.
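As an illustration of the multi-dimensional reporting recommended above, the sketch below uses Python's standard difflib to derive several match statistics (number of word-match sequences, longest sequence, average length) from two hypothetical texts. It is only a toy example and not a description of how any of the tested systems work:

```python
# Sketch: report several match statistics instead of a single similarity percentage.
# Uses only the Python standard library; the texts are hypothetical examples.
from difflib import SequenceMatcher

def match_statistics(suspicious, source, min_words=3):
    a, b = suspicious.split(), source.split()
    matcher = SequenceMatcher(None, a, b)
    # Matching blocks of at least min_words consecutive words
    blocks = [m.size for m in matcher.get_matching_blocks() if m.size >= min_words]
    return {
        "match_sequences": len(blocks),
        "longest_sequence": max(blocks, default=0),
        "average_length": sum(blocks) / len(blocks) if blocks else 0.0,
        "words_matched": sum(blocks),
        "words_total": len(a),
    }

suspicious = "the quick brown fox jumps over the lazy dog near the river bank"
source = "a quick brown fox jumps over a sleeping dog near the river bank today"
print(match_statistics(suspicious, source))
```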

A number of important points for educators need to be emphasized:

Despite the systems being able to find a good bit of text overlap, they do not determine plagiarism . There is a prevalent misconception about these tools. In the literature, most of the studies use the term ‘plagiarism detection tools’. However, plagiarism and similarity are very different concepts. What these tools promise is to find overlapping texts in the document examined. Overlapping texts do not always indicate plagiarism, thus the decision about whether plagiarism is present or not should never be taken on the basis of a similarity percentage. The similarity reports of these tools must be inspected by an experienced human being, such as a teacher or an academic, because all systems suffer from false positives (correctly quoted material counted as similarity) and false negatives (potential sources were not found).

Translation plagiarism can sometimes be found by a number of matches in references.

Another problem related to these tools is the risk of their possible cooperation with essay mills; this is because technically a company can store uploaded documents and share them with third parties. In the ‘Terms and Conditions’ sections of some tools, this notion is clearly stated. Uploading documents to such websites can cause a violation of ethics and laws, and teachers may end up with legal consequences. Thus, users must be skeptical about the credibility of the tools before uploading any documents to retrieve a similarity report.

It is necessary to obtain the legal consent of students before uploading their work to third parties. Since this legal situation can be different from country to country or even from university to university, make sure that the relevant norms are being respected before using such systems.

Because of European data privacy laws, higher education institutions in the EU must make certain that the companies only use servers located in the EU if they are storing material.

Teachers must make sure that they do not violate a non-disclosure agreement by uploading student work to the text-matching software.

Detecting plagiarism happens far too late in the writing process. It is necessary to institute institution-wide efforts to prevent academic misconduct and to develop a culture of excellence and academic integrity. This encourages genuine learning and shows how academic communication can be done right, instead of focusing on policing and sanctioning.

Considering the number of participating systems, the number of test documents, and the language variety, this paper describes the largest test of this kind that has ever been conducted. We hope the results will be useful both for educators and for policymakers who decide which system to use at their institution. We plan to repeat the test in 3 years to see whether any improvements can be observed.

Acknowledgements

We are deeply indebted to the contributions made to this investigation by the following persons:

● Gökhan Koyuncu and Nil Duman from the Canakkale Onsekiz Mart University (Turkey) uploaded many of the test documents to the various systems;

● Jan Mudra from Mendel University in Brno (Czechia) contributed to the usability testing, and performed the testing of the Czech language set;

● Caitlin Lim from the University of Konstanz (Germany) contributed to the literature review;

● Pavel Turčínek from Mendel University in Brno (Czechia) prepared the Czech language set;

● Esra Şimşek from the Canakkale Onsekiz Mart University (Turkey) helped in preparing the English language set;

● Maira Chiera from University of Calabria (Italy) prepared the Italian language set;

● Styliani Kleanthous Loizou from the University of Nicosia (Cyprus) contributed to the methodology design;

● We wish to especially thank the software companies that provided us access to their systems free of charge and patiently extended our access as the testing took much more time than originally anticipated.

● We also wish to thank the companies that sent us feedback on an earlier version of this report. We are not able to respond to every issue raised, but are grateful for them pointing out areas that were not clear.

Additional information

A pre-print of this paper has been published on Arxiv: http://arxiv.org/abs/2002.04279

This research did not receive any external funding. HTW Berlin provided funding for openly publishing the data and materials.

Author information

Authors and affiliations.

Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00, Brno, Czechia

Tomáš Foltýnek & Dita Dlabolová

University of Wuppertal, Wuppertal, Germany

Tomáš Foltýnek

Riga Technical University, Riga, Latvia

Alla Anohina-Naumeca & Laima Kamzola

Canakkale Onsekiz Mart University, Çanakkale, Turkey

Slovak Centre for Scientific and Technical Information, Bratislava, Slovakia

Július Kravjar

Universidad de Monterrey, Mexico, Mexico

Jean Guerrero-Dib

Balikesir University, Balikesir, Turkey

Özgür Çelik

University of Applied Sciences HTW Berlin, Berlin, Germany

Debora Weber-Wulff


Contributions

TF managed the project and performed the overall coverage evaluation. DD communicated with the companies that are providing the systems. AAN and LK wrote the survey of related work. SR and ÖÇ wrote the discussion and conclusion. LK and DWW performed the usability evaluation. DWW designed the methodology, made all the phone calls, and improved the language of the final paper. All authors meticulously evaluated the similarity reports of the systems and contributed to the whole project. All authors read and approved the final manuscript. The contributions of others who are not authors are listed in the acknowledgements.

Corresponding author

Correspondence to Tomáš Foltýnek .

Ethics declarations

Competing interests.

Several authors are involved in the organization of the regular conferences Plagiarism across Europe and Beyond , which receive funding from Turnitin, Urkund, PlagScan and StrikePlagiarism.com. One team member received a "Turnitin Global Innovation Award" in 2015. These facts did not influence the research in any phase.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

Foltýnek, T., Dlabolová, D., Anohina-Naumeca, A. et al. Testing of support tools for plagiarism detection. Int J Educ Technol High Educ 17 , 46 (2020). https://doi.org/10.1186/s41239-020-00192-4

Download citation

Received : 13 February 2020

Accepted : 16 March 2020

Published : 27 July 2020

DOI : https://doi.org/10.1186/s41239-020-00192-4


Keywords

  • Text-matching software
  • Software testing
  • Plagiarism detection tools
  • Usability testing


Best Plagiarism Checkers of 2023 Compared

Published on June 7, 2023 by Koen Driessen .

The best plagiarism checker should be able to detect plagiarism the most accurately, even if the original phrasing has been altered. The tool should also provide a clear, comprehensive plagiarism report.

To identify which plagiarism checker is best, we conducted in-depth research comparing the performance of 10 checkers. We uploaded plagiarized texts that were either directly copied and pasted or edited to varying degrees. In total, we used 140 sources to construct our test documents.

Overview of total scores per plagiarism checker, based on the amount of detected plagiarism.

Our process for comparing checkers

In order to find the best plagiarism checker, we analyzed different aspects of the tools, focusing on both depth and breadth.

We based our analysis on the following factors:

  • Access to the biggest and most varied database
  • Ability to detect the most plagiarism for the most source types
  • Ability to detect plagiarism when the plagiarized texts have been paraphrased
  • Highest quality of matches
  • Level of user-friendliness and trustworthiness

We used the same test documents, test date (April/May 2023), evaluation criteria, and data analysis for each tool in order to objectively compare the plagiarism results side by side. This ensured that the results required very little interpretation on our part.

Also see our list of the best free plagiarism checkers and the best AI detectors .

“Catches plagiarism more accurately than any other checker”

Scribbr Plagiarism Checker

  • Finds the most plagiarism and works for edited texts, too
  • Does not store or sell documents
  • Offers a happiness guarantee and live support
  • Offers a Self-Plagiarism Checker to check for self-plagiarism
  • Offers a limited free version
  • Quality comes at a price
  • Cannot work directly in the tool

Quality of matches

Scribbr performed well for all source types relevant to students, such as journal articles and dissertations.

Most importantly, Scribbr’s checker was the most successful at detecting plagiarism in source texts that had been heavily edited to mimic accidental paraphrasing plagiarism.

Scribbr was also able to find full matches. This means the entire plagiarized portion is matched correctly to just one source, rather than multiple incorrect sources.

The results are presented in a clear, downloadable overview. Different colors are used for different sources, making it easy for users to assess each plagiarism issue separately.

Issues can be fixed with Scribbr’s free citation generator , which generates proper citations for any missed or improperly cited sources.

Users can also choose to combine the Plagiarism Checker with the Self-Plagiarism Checker , which is unique to Scribbr. This tool allows users to upload their own unpublished documents in addition to the public database.

Instead of requiring users to subscribe to their services, Scribbr charges per plagiarism check ($19.95–$39.95, depending on the word count).

However, users are unable to work directly in the tool, and it is not possible to re-check your document for free.

Trustworthiness

Scribbr does not store the uploaded documents, sell them to third parties, or share them with academic institutions. Data is automatically deleted after 30 days, or students can opt to manually delete their document after the check.

Scribbr has live and responsive customer support to assist students in multiple languages. There is easy access to a plagiarism checker guide and other free resources about plagiarism.

Scribbr also has a happiness guarantee, where students receive a new check or refund if they aren’t satisfied for any reason.


Plagaware

  • Detects most of the plagiarism
  • Documents not stored in database
  • Multiple support options, but no live support
  • Difficult to read PDF reports
  • Does not work well for scholarly sources

Plagaware was often able to find full matches and attribute the text to the correct source. However, it performed poorly with heavily edited texts.

Sometimes, the tool did not correctly distinguish between separate plagiarized sources (i.e., it incorrectly attributed two consecutive paragraphs to the same source).

The website report is easy to navigate as it uses different colors for sources. The tool has a function that tells you how many words a plagiarized section has and gives a similarity percentage.

Unfortunately, the downloadable report is confusing to read as it uses only one color and doesn’t clearly link plagiarized text with a source.

Plagaware does not store files and does not sell uploaded content. Their website contains a contact form and phone number.


QueText

  • Users can work directly in the tool
  • Offers a citation assistant that helps with adding missing citations
  • Partial matches where one source text is matched to multiple sources
  • Reports are hard to read

Quetext detected about half the plagiarized text but was often unable to fully match the entire source text to one source. Instead, individual sentences get attributed to different sources.

The website states that Quetext checks against webpages and academic sources, but the tool does not in fact perform well for academic sources.

The website report sometimes did not correctly distinguish between plagiarized paragraphs and attributed them to the same source.

Quetext differentiates by severity of plagiarism in its online report: orange for partial matches and red for full matches. Otherwise, the same colors are used for different sources.

Users can work directly in the tool, and Quetext offers a citation assistant that helps generate the missing citations.

The original layout of the text is lost in both the website report and the downloadable report.

The tool does not store or re-upload your text, and it offers a help center with FAQs. There is no live support, but users can submit a help request on the website.


Viper Plagiarism Checker

  • Ability to compare your newest document with previously uploaded documents
  • Different colors for different source types
  • Documents get published on external websites if you use the free version
  • Performs poorly for scholarly sources
  • Report can be hard to read, due to partial matches

Viper found about half of the plagiarism when the source text was directly copied or lightly edited, but it struggled to find plagiarism in moderately and heavily edited texts.

The tool had average performance for most source types but struggled specifically with scholarly source types such as journal articles and dissertations. This makes this tool less useful for students.

When Viper flagged plagiarism, it often matched the entire passage to one source. While the downloadable report was somewhat difficult to read, it helpfully uses different colors for different sources.

Viper also allows users to check their text against their own previously uploaded documents, which may help to catch instances of self-plagiarism.

Viper stores previous submissions and shows matches with those texts.

The tool does not sell your document if you use the paid version (prices start at $3.95 per 5,000 words). However, if you use the free version, the document is uploaded to an internal database. After three months, the text is published on an external website as an example for other students. This is not good if the content of your text is confidential.

Viper does not have live support, but there is a contact form on the website.


Grammarly

  • Offers a language and citation assistant
  • Does not sell or share documents with other parties
  • Finds quite a low percentage of plagiarism
  • 100,000 character limit (14,000–25,000 words)
  • Same colors for different sources

Grammarly found a relatively low amount of plagiarism. It performed best with lightly edited texts. However, it scored below 50% in all other rounds of testing.

When it did find plagiarism, the tool was often able to find the right source. However, the matches were usually only partial and rarely included the entire plagiarized section.

The design is not very clear. As the tool is primarily used as a language checker, the plagiarism checker function is somewhat difficult to find. The tool uses the same color for all sources, making it hard to read.

The subscription comes with a language and style tool and offers a citation assistant that helps generate the missing citations. In addition, there is a 100,000 character limit for both the monthly plan ($30 per month) and the yearly plan ($12 per month).

Grammarly does not store, sell, or share documents with other parties.

There is a support page with tips, tutorials, and FAQs, and it is possible to submit a question via a form. However, there is no live support.


Plagiarism detector

  • Does not store or sell your document
  • Difficult to find the plagiarized source in the report
  • Same highlight colors for different sources
  • Technical difficulties generating the report, and no live support

This tool had difficulty uploading the document for each round of testing (for the first round, it took over six hours to generate a report). It seems that Plagiarism Detector was unable to adequately process a document of this size, even though the document did not exceed their word limit of 25,000.

This tool was mostly only able to find partial matches, attributing individual sentences to one or more sources, rather than the entire section.

In the downloadable report, the format of the text is lost, making it hard to read. The list of sources is also difficult to cross-reference with the text above because the same colors are used for different sources.

Plagiarism Detector offers a rewrite tool to help resolve similarities, but the quality of this tool is questionable, and it does not help with the citation issues. You should resolve plagiarism by citing the relevant source, not by attempting to rewrite the text to disguise it.

Plagiarism Detector does not store or sell uploaded documents. There is no live support, but the website does offer a help request form.


Copyleaks

  • Text layout is kept intact in the online tool
  • Difficult to read reports
  • Performed poorly with academic sources
  • Unclear data protection policy

Copyleaks performed relatively poorly with all source types. It lists multiple possible sources for flagged sentences. However, it was sometimes able to attribute an entire source block to the correct source.

The report distinguishes between text that is “identical” to a source, text that contains “minor changes,” and text that is “paraphrased.” However, its judgment is sometimes wrong.

The site report and downloadable report differed (the downloaded report flagged content that was not flagged on the site report). On the website report, the source information function did not always work correctly, so it was sometimes impossible to check the accuracy of the attributions.

The reports were also difficult to read, as they use the same color highlights for all sources. While the source formatting is maintained in the online tool, it is altered in the downloadable report.

Copyleaks claim that they “will never steal your work.” To permanently remove uploaded content from their database, users must contact customer support. The website has a chat option and a contact form.


Smodin

  • Site has a citation generator
  • Very low word limit
  • Report is difficult to read
  • Scans individual sentences rather than full texts

Smodin performed relatively poorly in all rounds of testing. While it got some sources correct, it struggled to get full matches and frequently attributed the same text to multiple different sources. It did particularly poorly with scholarly sources.

The tool has a very low limit of 2,000 words per scan.

Both the online report and the downloadable PDF report are difficult to read. The downloadable report displays the uploaded document, but it doesn’t highlight the text that it flags as plagiarized. Instead, below this, it includes the individual parts of the text that it recognizes as plagiarism.

However, it is difficult to read as the same text is repeated over and over again if multiple sources were found for it. The sources also do not seem to be in a logical order.

Their data protection policy suggests they can/might sell or repurpose uploaded content. There is no live support, but there is a contact form on the site.


Compilatio

  • You can use your credits for multiple documents
  • Scans against other documents you have uploaded
  • Doesn’t highlight the plagiarized parts in the text
  • Hard to review and resolve instances of plagiarism

Compilatio was able to find a few of the plagiarized sources, but it struggled if the source text had been moderately or heavily edited. However, when it was able to identify a source, it was often correct.

It was not possible to determine how accurately Compilatio could match the source to a plagiarized source text, since the plagiarized parts were not highlighted. Instead, the report only shows the general area that matches the source. This may limit its helpfulness to users, since it’s hard to review and resolve potential instances of plagiarism.

As the report does not highlight the flagged text, it does not provide a good overview of the potential issues. Additionally, you cannot work in the tool, so it’s not possible to exclude similarities from the report.

The tool scans the text against other documents you have uploaded, which can help to avoid self-plagiarism.

Users can buy packages for 5,000 words for €3.99 (no $ value is available on the site), 25,000 words for €14.99, or 50,000 words for €24.99. These credits can be used for multiple documents and are valid for 12 months after purchase.

Compilatio does not share or sell submitted documents, and the documents are not used as comparison material for other users.

There is no live support available, but they do provide a helpdesk with FAQs and a request form.


Writer plagiarism checker

  • Part of a tool with many other useful functions
  • Allows you to edit in the tool
  • Text format is not altered
  • Doesn’t find much plagiarism
  • Doesn’t provide a downloadable report

Quality of matches 

This tool found very little plagiarism. It struggled particularly with scholarly sources like dissertations and academic journals.

It scans sentence by sentence, rather than whole paragraphs. As a result, the plagiarized text is often attributed to multiple sources, rather than just one.

The plagiarism checker function is part of a larger language tool that provides grammar and spelling checks. As the tool doesn’t display an overall plagiarism percentage, we couldn’t apply our normal testing methodology. 

The tool has a nice, clean design and was relatively easy to use. The format of the text remains intact, and you can work directly in the tool. However, it doesn’t provide a downloadable report, so you must work solely on the website.

Writer claims that its tool is secure and safe to use and that it won’t share your information with anybody. A detailed security policy is described on a separate page.

They provide a contact number and email address on the site, but no live support.


Frequently asked questions about plagiarism checkers

Plagiarism can be detected by your professor or readers if the tone, formatting, or style of your text is different in different parts of your paper, or if they’re familiar with the plagiarized source.

Many universities also use plagiarism detection software like Turnitin’s, which compares your text to a large database of other sources, flagging any similarities that come up.

It can be easier than you think to commit plagiarism by accident. Consider using a plagiarism checker prior to submitting your paper to ensure you haven’t missed any citations.

The accuracy depends on the plagiarism checker you use. Per our in-depth research , Scribbr is the most accurate plagiarism checker. Many free plagiarism checkers fail to detect all plagiarism or falsely flag text as plagiarism.

Plagiarism checkers work by using advanced database software to scan for matches between your text and existing texts. Their accuracy is determined by two factors: the algorithm (which recognizes the plagiarism) and the size of the database (with which your document is compared).
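As a highly simplified illustration of that idea, the sketch below compares the word 5-grams of a submitted text against a tiny in-memory "database" of sources; the texts and the source name are hypothetical, and real checkers rely on web-scale indexes and far more robust fingerprinting:

```python
# Toy sketch of n-gram matching against a small "database" of sources.
# Real plagiarism checkers use web-scale indexes and more robust fingerprints.
def ngrams(text, n=5):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_report(submission, sources, n=5):
    sub = ngrams(submission, n)
    report = {}
    for name, text in sources.items():
        overlap = sub & ngrams(text, n)
        if overlap:
            report[name] = len(overlap) / max(len(sub), 1)   # share of overlapping n-grams
    return report

# Hypothetical database and submission
sources = {"source_a": "plagiarism occurs when someone uses words or ideas of another person without attribution"}
submission = "a student plagiarises when someone uses words or ideas of another person without attribution in an essay"
print(similarity_report(submission, sources))
```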

Yes, Scribbr offers a limited free version of its plagiarism checker in partnership with Turnitin. It uses Turnitin’s industry-leading plagiarism detection technology and has access to most content databases.


If you’re a university representative, you can contact the sales department of Turnitin .

Scribbr is an authorized Turnitin partner

Your document will be compared to the world’s largest and fastest-growing content database , containing over:

  • 99.3 billion current and historical webpages.
  • 8 million publications from more than 1,700 publishers such as Springer, IEEE, Elsevier, Wiley-Blackwell, and Taylor & Francis.

Note: Scribbr does not have access to Turnitin’s global database with student papers. Only your university can add and compare submissions to this database.

Cite this Scribbr article


Driessen, K. (2023, June 07). Best Plagiarism Checkers of 2023 Compared. Scribbr. Retrieved April 8, 2024, from https://www.scribbr.com/plagiarism/best-plagiarism-checker/


J Korean Med Sci. 2020 Jul 13;35(27).

Similarity and Plagiarism in Scholarly Journal Submissions: Bringing Clarity to the Concept for Authors, Reviewers and Editors

Aamir Raoof Memon

Institute of Physiotherapy & Rehabilitation Sciences, Peoples University of Medical & Health Sciences for Women, Nawabshah (Shaheed Benazirabad), Sindh, Pakistan.

INTRODUCTION

What constitutes plagiarism? What are the methods to detect plagiarism? How do “plagiarism detection tools” assist in detecting plagiarism? What is the difference between plagiarism and the similarity index? These are probably the most common questions regarding plagiarism that research experts in scientific writing are faced with, yet a definitive answer to them is known to few. According to a report published in 2018, papers retracted for plagiarism have increased sharply over the last two decades, with higher rates in developing and non-English-speaking countries. 1 Several studies have reported similar findings, with Iran, China, India, Japan, Korea, Italy, Romania, Turkey, and France among the countries with the highest number of retractions due to plagiarism. 1,2,3,4 One study reported that duplication of text, figures or tables without appropriate referencing accounted for 41.3% of post-2009 retractions of papers published from India. 5 In Pakistan, the Journal of Pakistan Medical Association started a special section titled “Learning Research” and published a couple of papers on research writing skills, research integrity and scientific misconduct. 6,7 However, the problem has not been adequately addressed, and specific issues about it remain unresolved and unclear. According to unpublished data on 1,679 students from four universities in Pakistan, 85.5% did not have a clear understanding of the difference between the similarity index and plagiarism. Smart et al. 8 in their global survey of editors reported that around 63% had experienced some plagiarized submissions, with Asian editors experiencing the highest levels of plagiarized/duplicated content. In some papers, journals from non-English-speaking countries have specifically discussed cases of plagiarized submissions and have highlighted the drawbacks of relying on similarity checking programs. 9,10,11 The cases of plagiarism in non-English-speaking countries carry a strong message for honest researchers: they should improve their English writing skills and credit the sources they use by properly citing and referencing them. 12

Despite the growing literature on plagiarism from non-Anglophone countries, the answers to the aforementioned questions remain unclear. In order to answer these questions, it is important to have a thorough understanding of plagiarism and to bring clarity to the less well-known issues around it. Therefore, this paper aims to 1) define plagiarism and the growth in its prevalence as well as in the literature on it; 2) explain the difference between similarity and plagiarism; 3) discuss the role of similarity checking tools in detecting plagiarism and the flaws of relying on them completely; and 4) discuss the phenomenon called the Trojan citation. At the end, suggestions are provided for authors and editors from developing countries so that this issue may be collectively addressed.

Defining plagiarism and its prevalence in manuscripts

To begin with, plagiarism may be defined as “when somebody presents the published or unpublished work of others, including ideas, scholarly text, images, research design and data, as new and original rather than crediting the existing source of it.” 13 The common types of plagiarism, including direct, mosaic, paraphrasing, intentional (covert) or unintentional (accidental) plagiarism, and self-plagiarism, have been discussed in previous reviews. 14,15,16

Evidence suggests that the first paper accused of plagiarism was published in 1979, and there has been substantial growth in the number of cases of plagiarism over time. 1,2,3,4,5,8,17 Previous studies have pointed out that plagiarism is prevalent in developing and non-English-speaking countries, but its occurrence in developed countries suggests that it is in fact a global problem. 1,2,3,4,18,19,20 As of today (1 April 2020), a search of the Retraction Database ( http://retractiondatabase.org/RetractionSearch.aspx? ) for papers retracted for plagiarism returns 2,280 documents. Similarly, a Scopus search for plagiarism in the titles of journal articles returns 2,159 results. This suggests that the number of papers retracted for plagiarism is in fact higher than the number of papers published on this issue. However, what we see now may not necessarily be the whole picture; the number of cases of plagiarism might be higher than we know. Certainly, database searches for papers tagged for plagiarism are limited to indexed journals only, which keeps non-indexed journals (both low-quality and deceptive journals) out of focus. 5,21 Moreover, journal coverage may vary from one database to another, as reported in a recent paper on research dissemination in South Asia. 22 Therefore, both the prevalence of plagiarism and the literature published on it, as reported by database searches, are most likely “understated as of today.” 5

Reasons for plagiarism: lack of understanding and poor citing practices

Although the reasons for plagiarism are complex, previous papers have suggested possible causes of plagiarism by authors. 16,23,24,25,26 One of the major but less well-known reasons might be that students, naïve researchers, and even some faculty members either lack clarity about what constitutes plagiarism or are unable to differentiate the similarity index from plagiarism. 24,26,27 For example, a recent online survey of the participants in the AuthorAID MOOC on Research Writing found that 84.4% of the survey participants were unaware of the difference between the similarity index and plagiarism, though almost all of them reported having an understanding of plagiarism. 24 The same paper reported that one in three participants admitted that they had plagiarized at some point during their academic career. 24 Therefore, it is important to have clarity about what constitutes plagiarism and about the difference between the similarity index and plagiarism so that the increasing rates of plagiarism can be deterred.

The ‘existing source’ or ‘original source’ in the definition of plagiarism refers to the main (primary) source and not the (secondary) source from which the author extracts the information. For example, someone cites a paper for a passage on the mechanism of how exercise affects sleep, but the cited paper aims to determine the prevalence of sleep disorders and exercise levels rather than the mechanistic association. A thorough evaluation finds that the cited paper had used text from another review paper that discussed the mechanisms relating sleep to exercise behavior. This phenomenon of improper secondary (or indirect) citation may be common among students and novice researchers, particularly from developing countries, and should be discouraged. 27

SIMILARITY INDEX

Plagiarism vs. similarity index and the role of similarity checking tools

Plagiarism as defined above refers to the intentional (covert) or unintentional (accidental) theft of published or unpublished intellectual property (i.e., words or ideas), whereas the similarity index refers to “the extent of overlap or match between an author's work compared to other existing sources (books, websites, student theses, and research articles) in the databases of similarity checking tools.” 9,24 Advancements in information technology have given researchers access to various freely available (i.e., Viper, eTBLAST/HelioBLAST, PlagScan, PlagiarismDetect, Antiplagiat, Plagiarisma, DupliChecker) and subscription-based (i.e., iThenticate, Turnitin, Similarity Check) similarity checking tools. 8,24 Many journal editors use iThenticate and/or Similarity Check (Crossref) for screening submitted manuscripts for similarity, whereas Turnitin is commonly used by universities and faculty to assess text similarity in students' work; however, there is a fairness issue in that not every journal or university, particularly those from developing countries, can afford to pay for these subscription-based services. 28 For instance, an online survey found that only about 18% of participants could use Turnitin through their university subscription. 24 Another problem is the way these tools are commonly referred to, i.e., as plagiarism detection tools, plagiarism checking software, or plagiarism detection programs. Based on the function they perform, it would be more appropriate to call them something different, such as similarity checking tools, similarity checkers, text-matching tools, or simply text-duplicity detection tools. 5,8,23 This means that these tools help locate matching or overlapping text (similarity) in submitted work, without directly flagging up plagiarism. 24

Taking Turnitin as an example, these tools reflect text similarity through color codes, each linked to an online source; details have been described elsewhere. 23,28 Journal editors, universities and some organizations consider text above specific cutoff values for the percentage of similarity as problematic. According to one paper, 5% or less text similarity (overlap of the text in the manuscript with text in the online literature) is acceptable to some journal editors, while others might want to put the manuscript under scrutiny if the text similarity is over 20%. 29,30 Another paper observed that journal editors tend to reject a manuscript if text similarity is above 10%. 31 The study of participants completing the AuthorAID MOOC on Research Writing also found that some participants reported that their institutions consider text similarity of less than 20% as acceptable. 24 As an example, the guidelines of the University Grants Commission of India treat similarity of up to 10% as acceptable or minor (Level 0), but anything above this is categorized into different levels (based on the percentages), each with a separate list of repercussions for students and researchers. 32 This approach might miss cases where the acceptable similarity of 10% comes from a single source, especially if the editors relied on the numbers only. In addition, this approach has the potential to punish authors who have not committed plagiarism at all. To illustrate this, the randomly written text presented in Fig. 1 would be considered plagiarism based on the rule of cutoff values. Some authors opine that text with more than four consecutive words or a number of word strings should be treated as plagiarized. 28,33 This again is not a good idea, as the text “the International Physical Activity Questionnaire was used to measure …” would be the same in several papers, but this is definitely not plagiarism because the methodology of different papers on the same topic can be similar; so, the decision should not be based on the numbers reflected by similarity detection tools. 28 Therefore, it would be prudent not to set any cutoff values for text similarity, as this will lead to a slippery slope (“a course of action that seems to lead inevitably from one action or result to another with unintended consequences”, as defined by the Merriam-Webster Dictionary ) and give “a sense of impunity to the perpetrators.” 32

[Fig. 1 (file jkms-35-e217-g001.jpg): randomly written text used to illustrate the problem with similarity cutoff values]
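To make the over-flagging problem concrete, here is a minimal sketch (in Python) of the "more than four consecutive matching words" rule discussed above. It is purely illustrative and is not taken from any of the tools mentioned; the two example sentences and the five-word window are assumptions chosen to mirror the questionnaire example.

```python
# Minimal sketch of a "flag any run of more than four consecutive matching
# words" rule. Purely illustrative; not how any particular commercial tool works.

def word_ngrams(text, n=5):
    """Return the set of all n-word sequences in the text (n=5, i.e. 'more than four')."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

# Hypothetical example: a standard methods phrase that appears in many papers.
submission     = "the international physical activity questionnaire was used to measure weekly activity"
indexed_source = "the international physical activity questionnaire was used to measure sedentary time"

shared_runs = word_ngrams(submission) & word_ngrams(indexed_source)
print(len(shared_runs), "matching five-word runs found")
# The rule fires on a routine description of a standard instrument,
# even though reusing its name is not plagiarism.
```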

Drawbacks of similarity checking tools

There are a few drawbacks to relying completely on similarity checking tools. First, these tools are not foolproof and may miss instances of translational plagiarism and figure plagiarism. 24 Translational plagiarism is the most invisible type of copying in non-Anglophone countries, whereby an article published in a language other than English is copied (with or without minor modifications) and published in an English-language journal, or vice versa. 10 This is indeed an extremely difficult type of plagiarism to detect, and different approaches to addressing it (e.g., use of Google Translate) have recently been reported. 34, 35 Nevertheless, there may be cases where this practice is acceptable, such as the publication of policy papers (for example, "Identifying predatory or pseudo-journals" was published in the International Journal of Occupational and Environmental Medicine, the National Medical Journal of India, and Biochemia Medica in 2017 by authors affiliated with the World Association of Medical Editors (WAME), and "The revised guidelines of the Medical Council of India for academic promotions: Need for a rethink" was published in over ten journals during 2016 by four journal editors and endorsed by members (not all) of the Indian Association of Medical Journal Editors). Second, text similarity in some parts of a manuscript (i.e., methods and results) should be weighed differently from that in other sections (i.e., the introduction, discussion, and conclusions). 31 In addition, based on the personal experience of the author of this paper, some individuals use sophisticated techniques to avoid detection of high similarity, such as inappropriate synonyms, jargon, and deliberate grammatical and structural errors in the text of the manuscript. Third, plagiarism of ideas may be missed by these tools, as they can only detect plagiarism of words. 23, 32 Consequently, similarity checking tools tend to underestimate plagiarized text or sometimes flag non-plagiarized material as problematic (Fig. 1). 24, 36 It should be noted that these tools serve only as an aid for identifying suspected instances of plagiarism; the text of the manuscript should always be evaluated by experts, because "a careful human cannot be replaced." 31, 37 A few papers published in the Journal of Korean Medical Science have presented examples in which plagiarized content was missed by similarity checking tools and noticed only after a careful examination of the text. 9, 10 Finally, plagiarism of unpublished work cannot be detected by these tools, as they are limited to online sources. 23 This is particularly important in the context of developing countries, where students' research theses and dissertations are often not deposited in research repositories, and where commercial, predatory editing and brokering services exist. 10, 38 For example, the research repository of the Higher Education Commission of Pakistan accepts doctoral theses only, and fewer than five universities (out of over 150) across the country have a research repository for depositing scholarly content. 38 Recently, a troubling trend of predatory editing and brokering services has emerged: they offer clones of previously published papers, or unpublished work, to non-Anglophone or simply lazy authors who want a quick and easy route to publication for promotion and career advancement. 10 Although plagiarism of unpublished work is not easy for experts to detect, it may be possible through their previous experience and scholarly networks.
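As a counterpart to the over-flagging example above, the sketch below (again illustrative only, with made-up sentences) shows why word-for-word matching can report zero similarity once a copied passage has been run through crude synonym substitution, which is one reason a careful human reviewer remains necessary.

```python
# Illustrative counterpart: exact word matching misses a passage whose words
# have been swapped for synonyms, even though the idea and structure are copied.
import difflib

def word_ngrams(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

original  = "smoking significantly increases the risk of cardiovascular disease in older adults"
disguised = "tobacco use markedly raises the likelihood of heart disease in elderly people"

print(len(word_ngrams(original) & word_ngrams(disguised)))  # 0 shared five-word runs

# Even a character-level ratio (Python's difflib) sees only a weak resemblance.
ratio = difflib.SequenceMatcher(None, original, disguised).ratio()
print(round(ratio, 2))
```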

TROJAN CITATION: PERSONAL EXPERIENCE

A recent experience worth discussing in the context of plagiarism comes in the shape of the Trojan citation, where someone “ makes reference to a source one time in order to evade detection (by editors and readers) of bad intentions and provide cover for a deeper, more pervasive plagiarism. ” 39 This practice is particularly common among authors who intend to deceive readers and game the system. A few months ago, the author of this paper was invited by a journal to review a manuscript on predatory publishing. The content of the manuscript appeared suspicious but was not labelled "plagiarized" during the first round of review. During the second round, however, it became clear that this was a case of Trojan citation: the author(s) had cited the main source for a minor point and copied the major part of the manuscript, with slight modifications, from a paper published in Biochemia Medica (a Croatian journal). 40 The editor of the journal was informed, and the manuscript was rejected from further processing. This example suggests that careful human intervention by experts is required to identify cases of plagiarism.

In conclusion, what we know about the growth in the prevalence of plagiarism may be 'just the tip of the iceberg'. Therefore, a collective contribution from authors, reviewers, and editors, particularly from the Asia-Pacific region, is required. Authors from the Asia-Pacific region and developing countries who have expertise on this topic should play their part by supporting journal editors and through their mentorship skills. Furthermore, senior researchers should encourage and help their honors and master's students to publish their unpublished work before it is stolen by commercial brokering agencies. They should also work in close collaboration with universities and higher-education organizations in countries where this issue is not properly addressed, and should facilitate education and training sessions on plagiarism, as previous evidence suggests that workshops and online training sessions can be helpful. 5 Journal editors from the Asia-Pacific region and developing countries, for their part, should not judge manuscripts solely on the basis of the percentage of similarity reported by similarity checking services. They should maintain a database of experts so that manuscripts on specialized topics, such as plagiarism in scientific writing, can be sent for review to experts on the subject. As journal editors cannot be experts in all fields, networking and seeking help from experts would help avoid cases of plagiarism in the future. It would also be appropriate for journal editors and trainee editors, particularly those from resource-limited countries, to be educated about the concept of scientific misconduct and advances in knowledge in this area. Moreover, journal editors should publish and publicly discuss cases of plagiarism as a learning experience for others. The Journal of Korean Medical Science has used this approach for cases of plagiarism, and other journals from the region are encouraged to adopt it. 9, 10 Likewise, a paper discussing case scenarios of salami publication (i.e., “ a distinct form of redundant publication which is usually characterized by similarity of hypothesis, methodology or results but not text similarity ”) serves as a good example of how journal editors can use their mentorship skills to support authors and help journals educate researchers. 41 There should be strict penalties for plagiarism, and measures to protect whistleblowers should be in place and enforced. In this way, dishonest or lazy authors who bypass the system would be punished and honest authors would be served. Thus, the take-home message for editors from the Asia-Pacific region is that a collective effort and commitment from authors, reviewers, editors, and policy-makers is required to address the problem of plagiarism, especially in developing and non-English-speaking countries.

Disclosure: The author has no potential conflicts of interest to disclose.

Turnitin for Research & Publication

Everything you need to publish with confidence

Most complete database of scholarly work

Compare work against premier content from Open Access Journals and top publishers like Elsevier, Springer Nature, Wiley, Taylor & Francis, and IEEE.

Flexible exclusions for rigorous checking

Add efficiency to the manuscript review process with customizable exclusion options that let you evaluate only the most critical matches.

Streamline reviews & collaboration

Work with peers and advisers easily to review and revise submissions using a simple system of folders and folder sharing.

Safeguard your institution's reputation

Identify text similarities early in a high-stakes publication process so the quality of your manuscripts maintains your institution’s reputation.

The #1 plagiarism checker, trusted by researchers & publishers

iThenticate

This high-stakes plagiarism checking tool is the gold standard for academic researchers and publishers.

Used by leading academic publishers

Analyzing Non-Textual Content Elements to Detect Academic Plagiarism, pp. 121–148

Image-based Plagiarism Detection

  • Norman Meuschke 2  
  • First Online: 01 August 2023

This chapter is structured as follows. Section 4.1 presents related work on Image-based Plagiarism Detection to point out the research gap we address. Section 4.2 describes typical forms of image similarity we observed in practice.
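The chapter itself develops dedicated detection methods; purely as a rough illustration of what quantifying image similarity can look like, here is a minimal average-hash comparison in Python. It is a common baseline rather than the approach of this chapter, it assumes the Pillow library is installed, and the two figure filenames are hypothetical.

```python
# Illustrative baseline only: average-hash comparison of two figures.
# Not the method developed in this chapter; assumes Pillow is installed.
from PIL import Image

def average_hash(path, hash_size=8):
    """Downscale to hash_size x hash_size grayscale and threshold each pixel at the mean."""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(h1, h2):
    """Number of differing bits; lower means the images are more alike."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

# 'figure_a.png' and 'figure_b.png' are hypothetical example files.
dist = hamming_distance(average_hash("figure_a.png"), average_hash("figure_b.png"))
print(f"{dist} of 64 bits differ")
# A small distance (a handful of bits) suggests near-duplicate figures
# that deserve a manual look.
```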

Author information: Norman Meuschke, University of Göttingen, Göttingen, Germany (corresponding author).

Electronic supplementary material: Supplementary file 1 (PDF, 2626 kb).

© 2023 The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature

Cite this chapter: Meuschke, N. (2023). Image-based Plagiarism Detection. In: Analyzing Non-Textual Content Elements to Detect Academic Plagiarism. Springer Vieweg, Wiesbaden. https://doi.org/10.1007/978-3-658-42062-8_4

Research Paper Plagiarism Checker AcademicHelp

Secure the uniqueness of your scholarly work.

Advanced Scholarly Database Access

Contextual Analysis Technology

Source Integration Guidance

Bolster your research integrity.

Free plagiarism checker by EasyBib

Check for plagiarism, grammar errors, and more.

Check for accidental plagiarism

Avoid unintentional plagiarism. Check your work against billions of sources to ensure complete originality.

Find and fix grammar errors

Turn in your best work. Our smart proofreader catches even the smallest writing mistakes so you don't have to.

Get expert writing help

Improve the quality of your paper. Receive feedback on your main idea, writing mechanics, structure, conclusion, and more.

What students are saying about us

"Caught comma errors that I actually struggle with even after proofreading myself."

- Natasha J.

"I find the suggestions to be extremely helpful especially as they can instantly take you to that section in your paper for you to fix any and all issues related to the grammar or spelling error(s)."

- Catherine R.

Check for unintentional plagiarism

Easily check your paper for missing citations and accidental plagiarism with the EasyBib plagiarism checker. The EasyBib plagiarism checker:

  • Scans your paper against billions of sources.
  • Identifies text that may be flagged for plagiarism.
  • Provides you with a plagiarism score.

You can submit your paper at any hour of the day and quickly receive a plagiarism report.

What is the EasyBib plagiarism checker? 

Most basic plagiarism checkers review your work and calculate a percentage, meaning how much of your writing is indicative of original work. But, the EasyBib plagiarism checker goes way beyond a simple percentage. Any text that could be categorized as potential plagiarism is highlighted, allowing you time to review each warning and determine how to adjust it or how to cite it correctly.

You’ll even see the sources against which your writing is compared and the actual word for word breakdown. If you determine that a warning is unnecessary, you can waive the plagiarism check suggestion.
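To give a sense of what such a percentage can mean, here is a generic, hypothetical sketch of one way a "share of matching text" score could be computed. It is not EasyBib's actual algorithm; the sample sentences, the 0.8 threshold, and the sentence-level granularity are all assumptions made for illustration.

```python
# Generic illustration of a "percentage similarity" score: the share of a
# submission's sentences that closely match something in a source collection.
# NOT EasyBib's algorithm; sources, sentences, and threshold are made up.
import difflib

sources = [
    "Plagiarism is the act of using someone else's words or ideas without proper attribution.",
    "Proper citation gives credit to the original author and lets readers trace the source.",
]

submission = [
    "Plagiarism is the act of using someone else's words or ideas without proper attribution.",  # copied
    "I surveyed forty classmates about their citation habits during finals week.",               # original
]

def has_close_match(sentence, corpus, threshold=0.8):
    """True if any corpus sentence is at least `threshold` similar by character ratio."""
    return any(
        difflib.SequenceMatcher(None, sentence.lower(), src.lower()).ratio() >= threshold
        for src in corpus
    )

flagged = [s for s in submission if has_close_match(s, sources)]
print(f"{100 * len(flagged) / len(submission):.0f}% of sentences matched a source")  # 50%
```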

Plagiarism is unethical because it doesn’t credit those who created the original work; it violates intellectual property and serves to benefit the perpetrator. It is a severe enough academic offense that many faculty members use their own plagiarism checking tools for their students’ work. With the EasyBib Plagiarism checker, you can stay one step ahead of your professors and catch citation mistakes and accidental plagiarism before you submit your work for grading.

Why use a plagiarism checker? 

Imagine – it’s finals week and the final research paper of the semester is due in two days. You, being quite familiar with this high-stakes situation, hit the books, and pull together a ten-page, last-minute masterpiece using articles and materials from dozens of different sources.

However, in those late, coffee-fueled hours, are you fully confident that you correctly cited all the different sources you used? Are you sure you didn’t accidentally forget any? Are you confident that your teacher’s plagiarism tool will give your paper a 0% plagiarism score?

That’s where the EasyBib plagiarism checker comes in to save the day. One quick check can help you address all the above questions and put your mind at ease.

What exactly is plagiarism? 

Plagiarism has a number of possible definitions; it involves more than just copying someone else’s work. Improper citing, patchworking, and paraphrasing could all lead to plagiarism in one of your college assignments. Below are some common examples of accidental plagiarism that commonly occur.

Quoting or paraphrasing without citations

Not including in-text citations is another common type of accidental plagiarism. Quoting is taking verbatim text from a source. Paraphrasing is when you’re using another source to take the same idea but put it in your own words. In both cases, it’s important to always cite where those ideas are coming from. The EasyBib plagiarism checker can help alert you to when you need to accurately cite the sources you used.

Patchwork plagiarism

When writing a paper, you’re often sifting through multiple sources and tabs from different search engines. It’s easy to accidentally string together pieces of sentences and phrases into your own paragraphs. You may change a few words here and there, but it’s similar to the original text. Even though it’s accidental, it is still considered plagiarism. It’s important to clearly state when you’re using someone else’s words and work.

Improper citations

Depending on the class, professor, subject, or teacher, there are multiple correct citation styles and preferences. Some examples of common style guides that are followed for citations include MLA, APA, and Chicago style. When citing resources, it’s important to cite them accurately. Incorrect citations could make it impossible for a reader to track down a source and it’s considered plagiarism. There are EasyBib citation tools to help you do this.
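For instance, the same hypothetical book would look slightly different under two common style guides (a simplified illustration, not a substitute for the full guides):

  • MLA: Doe, Jane. Understanding Academic Integrity. Example Press, 2021.
  • APA: Doe, J. (2021). Understanding academic integrity. Example Press.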

Don’t fall victim to plagiarism pitfalls. Most of the time, you don’t even mean to commit plagiarism; rather, you’ve read so many sources from different search engines that it gets difficult to determine an original thought or well-stated fact versus someone else’s work. Or worse, you assume a statement is common knowledge, when in fact, it should be attributed to another author.

When in doubt, cite your source!

Time for a quick plagiarism quiz! 

Which of the following requires a citation?

  • A chart or graph from another source
  • A paraphrase of an original source
  • Several sources’ ideas summarized into your own paragraph
  • A direct quote
  • All of the above

If you guessed option E, then you’d be correct. Correct punctuation and citation of another individual’s ideas, quotes, and graphics are a pillar of good academic writing.

What if you copy your own previous writing?

Resubmitting your own original work for another class’s assignment is a form of self-plagiarism, so don’t cut corners in your writing. Draft an original piece for each class or ask your professor if you can incorporate your previous research.

What features are available with the EasyBib plagiarism checker? 

Along with providing warnings and sources for possible plagiarism, the EasyBib plagiarism checker works alongside the other EasyBib tools, including a grammar checker and a spell checker. You’ll receive personalized feedback on your thesis and writing structure too!

The  plagiarism checker compares your writing sample with billions of available sources online so that it detects plagiarism at every level. You’ll be notified of which phrases are too similar to current research and literature, prompting a possible rewrite or additional citation. You’ll also get feedback on your paper’s inconsistencies, such as changes in text, formatting, or style. These small details could suggest possible plagiarism within your assignment.

And speaking of citations, there are also  EasyBib citation tools  available. They help you quickly build your bibliography and avoid accidental plagiarism. Make sure you know which citation format your professor prefers!

Great! How do I start? 

Simply copy and paste or upload your essay into the checker at the top of this page. You’ll receive the first five grammar suggestions for free! To try the plagiarism checker for free, start your EasyBib Plus three-day free trial.* If you love the product and decide to opt for premium services, you’ll have access to unlimited writing suggestions and personalized feedback.

The EasyBib plagiarism checker is conveniently available 24 hours a day, seven days a week. You can cancel anytime. Check your paper for free today!

*See Terms and Conditions

Visit www.easybib.com for more information on helpful EasyBib writing and citing tools.

For informational guides and on writing and citing, visit the EasyBib guides homepage .

Amanda Hoover

Students Are Likely Writing Millions of Papers With AI

[Illustration: four hands holding pencils that are connected to a central brain]

Students have submitted more than 22 million papers that may have used generative AI in the past year, new data released by plagiarism detection company Turnitin shows.

A year ago, Turnitin rolled out an AI writing detection tool that was trained on its trove of papers written by students as well as other AI-generated texts. Since then, more than 200 million papers have been reviewed by the detector, predominantly written by high school and college students. Turnitin found that 11 percent of those papers may contain AI-written language in at least 20 percent of their content, with 3 percent of the total papers reviewed getting flagged for having 80 percent or more AI writing. (Turnitin is owned by Advance, which also owns Condé Nast, publisher of WIRED.) Turnitin says its detector has a false positive rate of less than 1 percent when analyzing full documents.

ChatGPT’s launch was met with knee-jerk fears that the English class essay would die. The chatbot can synthesize information and distill it near-instantly—but that doesn’t mean it always gets it right. Generative AI has been known to hallucinate, creating its own facts and citing academic references that don’t actually exist. Generative AI chatbots have also been caught spitting out biased text on gender and race. Despite those flaws, students have used chatbots for research, organizing ideas, and as a ghostwriter. Traces of chatbots have even been found in peer-reviewed, published academic writing.

Teachers understandably want to hold students accountable for using generative AI without permission or disclosure. But that requires a reliable way to prove AI was used in a given assignment. Instructors have tried at times to find their own solutions to detecting AI in writing, using messy, untested methods to enforce rules, and distressing students. Further complicating the issue, some teachers are even using generative AI in their grading processes.

Detecting the use of gen AI is tricky. It’s not as easy as flagging plagiarism, because generated text is still original text. Plus, there’s nuance to how students use gen AI; some may ask chatbots to write their papers for them in large chunks or in full, while others may use the tools as an aid or a brainstorm partner.

Students also aren't tempted by only ChatGPT and similar large language models. So-called word spinners are another type of AI software that rewrites text, and may make it less obvious to a teacher that work was plagiarized or generated by AI. Turnitin’s AI detector has also been updated to detect word spinners, says Annie Chechitelli, the company’s chief product officer. It can also flag work that was rewritten by services like spell checker Grammarly, which now has its own generative AI tool. As familiar software increasingly adds generative AI components, what students can and can’t use becomes more muddled.

Detection tools themselves have a risk of bias. English language learners may be more likely to set them off; a 2023 study found a 61.3 percent false positive rate when evaluating Test of English as a Foreign Language (TOEFL) exams with seven different AI detectors. The study did not examine Turnitin’s version. The company says it has trained its detector on writing from English language learners as well as native English speakers. A study published in October found that Turnitin was among the most accurate of 16 AI language detectors in a test that had the tool examine undergraduate papers and AI-generated papers.
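To see why even a small error rate matters at this scale, here is a rough back-of-the-envelope calculation. The 200 million figure comes from the reporting above; treating Turnitin's stated "less than 1 percent" full-document false positive rate as a flat 1 percent is a simplifying assumption for illustration only.

```python
# Back-of-the-envelope: papers that could be falsely flagged at scale.
# 200 million reviewed papers (figure quoted above); 1% is used as a rough,
# illustrative stand-in for the stated "less than 1 percent" rate.
papers_reviewed = 200_000_000
assumed_false_positive_rate = 0.01

print(f"{papers_reviewed * assumed_false_positive_rate:,.0f} potentially mis-flagged papers")
# About 2,000,000 -- which is why a flag is a starting point for a
# conversation, not proof of misconduct.
```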

Schools that use Turnitin had access to the AI detection software for a free pilot period, which ended at the start of this year. Chechitelli says a majority of the service’s clients have opted to purchase the AI detection. But the risks of false positives and bias against English learners have led some universities to ditch the tools for now. Montclair State University in New Jersey announced in November that it would pause use of Turnitin’s AI detector. Vanderbilt University and Northwestern University did the same last summer.

“This is hard. I understand why people want a tool,” says Emily Isaacs, executive director of the Office of Faculty Excellence at Montclair State. But Isaacs says the university is concerned about potentially biased results from AI detectors, as well as the fact that the tools can’t provide confirmation the way they can with plagiarism. Plus, Montclair State doesn’t want to put a blanket ban on AI, which will have some place in academia. With time and more trust in the tools, the policies could change. “It’s not a forever decision, it’s a now decision,” Isaacs says.

Chechitelli says the Turnitin tool shouldn’t be the only consideration in passing or failing a student. Instead, it’s a chance for teachers to start conversations with students that touch on all of the nuance in using generative AI. “People don’t really know where that line should be,” she says.


Do Plagiarism Checkers Detect AI Content?

Originality has always been a cornerstone of academic success. But the landscape is changing. With the rise of AI writing assistants, the line between helpful tool and sneaky shortcut is blurring. This raises a crucial question: can traditional plagiarism checkers, a student’s reliable partner in detecting plagiarism, keep pace with AI content generation? In this blog, we discuss the extent to which plagiarism detection works on AI-generated text and how to avoid overlooking AI’s involvement in academic writing.


Why can’t plagiarism detectors detect AI content?

AI-generated content often consists of entirely new combinations of words, making it difficult for traditional plagiarism detection tools to identify unintentional or self-plagiarism. These tools employ conventional text-matching and compare text against a database of existing content to detect plagiarised content. As AI-generated content is original and is not copied or paraphrased from existing content, it doesn’t match any entries in the database, making it challenging for plagiarism detection tools to flag it as plagiarized.

What are university guidelines on AI-generated content?

Universities, including Yale, have recognized the challenges associated with detecting AI-generated writing and have adapted their guidelines accordingly. 1 Yale’s guidance on the usage of AI acknowledges the difficulty of controlling the ease of AI writing through surveillance or detection technology. Notably, Yale has opted not to enable Turnitin’s AI detection feature in their Canvas system due to reliability concerns. This stance reflects a growing awareness among academic institutions of the limitations of existing detection tools in identifying AI-generated content accurately.

What ethics should researchers and PhD students practice when using AI writing tools?

Researchers and PhD students can leverage the benefits of AI writing tools while upholding academic integrity and producing high-quality, original research. Below are a few ethical practices for using AI responsibly in academic writing:

Human supervision

It’s essential to supervise AI-generated content closely, ensuring that it aligns with academic standards and does not deviate from original intentions. AI should not be allowed to run autonomously without human oversight. Maintaining human control ensures that AI-generated content upholds ethical standards and avoids unintended consequences.

Originality emphasis

Distinguish between using AI to generate new text and enhancing existing work. Originality remains paramount in academic submissions, emphasizing the importance of authentically crafted content.

Enhancing text responsibly

While AI can assist in enhancing text by simplifying language and structure, it’s crucial to avoid direct copying and ensure that the final output remains original and attributable to the author.

Staying informed

Stay informed about institutional policies and guidelines regarding the ethical use of AI tools in academic writing. Being aware of evolving guidelines helps navigate the ethical considerations effectively.

The discussion reflects a nuanced approach to the use of AI in academic writing, balancing the benefits of AI assistance with ethical considerations and academic integrity. While AI tools can be valuable aids in the writing process, they should be used responsibly and in accordance with institutional guidelines to ensure the production of original and ethically sound academic work.

References:

  • AI Guidance for Teachers – Yale Poorvu Centres for Teaching and Learning https://poorvucenter.yale.edu/AIguidance

Paperpal is a comprehensive AI writing toolkit that helps students and researchers achieve 2x the writing in half the time. It leverages 21+ years of STM experience and insights from millions of research articles to provide in-depth academic writing, language editing, and submission readiness support to help you write better, faster.  

Get accurate academic translations, rewriting support, grammar checks, vocabulary suggestions, and generative AI assistance that delivers human precision at machine speed. Try for free or upgrade to Paperpal Prime starting at US$19 a month to access premium features, including consistency, plagiarism, and 30+ submission readiness checks to help you succeed.  

Experience the future of academic writing – Sign up to Paperpal and start writing for free!  


ChatGPT Detector: How to Check and Remove Plagiarism by Paraphrasing

Whether you’re writing a blog post, a case study, or a research paper, you need to avoid any form of plagiarism.

With the rise of AI technologies like ChatGPT, ensuring the originality of content has become more crucial than ever.

Fortunately, using free plagiarism checkers and techniques of paraphrasing can help you make sure your article remains unique and better indexed by search engines.

Is Chat GPT plagiarism-free?

Well, let’s find out whether ChatGPT is plagiarized or not.

I asked it to write a 500-word article on writing strategies, and then checked it for plagiarism to see what would happen.

What is plagiarism?

Plagiarism is the act of using someone else’s ideas, words, or work without proper attribution.

Whether intentional or unintentional, plagiarism can result in failed assignments, poor search rankings, and legal consequences.

To safeguard against plagiarism, it’s important to understand its nuances and use effective detection and prevention strategies.

How to check for plagiarism?

While ChatGPT itself is not inherently plagiaristic, the content it produces may inadvertently contain plagiarized material if not used responsibly.

To determine the originality of text generated by ChatGPT, users can employ plagiarism detection tools.

Plagiarism checkers serve as invaluable tools for identifying duplicate content and ensuring originality in your work.

These tools leverage advanced algorithms to compare your text against vast databases of existing content, flagging any instances of potential plagiarism for further review.

Best free plagiarism checker

One good free plagiarism checker I found is Quetext .

Let’s copy and paste our ChatGPT text into it and see if the article is plagiarism-free.

This free plagiarism checker is quite handy because it tells you how much of your text matches other sources and provides links to those sources.

Even if you’re not worried about plagiarism, it’s actually quite fascinating to try out.

According to this tool, the ChatGPT article has a 23% plagiarism score.

But here’s the catch: you can only use it for free once before you have to upgrade.

How to remove plagiarism

So how can you avoid plagiarism?

One way is to make sure you cite and quote any sources you use in your writing.

Another method is to ask ChatGPT to rewrite the text without any plagiarism. I tried this and was surprised to find that it reduced the amount of plagiarism by half.

Another tip is to provide specific details in your instructions when asking ChatGPT to generate content. For instance, I included some common questions I found on Google:

  • What are the 5 writing strategies?
  • How do you grab a reader’s attention in writing?
  • How do you structure content writing?
  • What are the copywriting techniques for blogging?

It helped reduce the plagiarism rate to just 8%.

Another strategy is to go through any plagiarized text piece by piece and paraphrase it. You can do this manually, or you can ask ChatGPT to do it for you and then copy and paste the result back into the document.

By doing this, you’ll end up with text that is 100% original and doesn’t match anything online or in private databases.

However, remember that just because your content is original doesn’t automatically guarantee high search rankings or provide value to your readers.

A better way to use ChatGPT

The best way to use ChatGPT is as a starting point for your writing.

For example, ask it to write an outline for a research paper or article, then add the specific topics or questions you want to cover and ask ChatGPT to write an article based on that.

This method will give you a first draft that you can work on and improve step by step.


Christina Walker

A professional freelance web copywriter with several years’ experience in web marketing and SEO copywriting.

