International Journal of Speech Language and the Law, Vol 21, No 1 (2014)

A likelihood ratio-based evaluation of strength of authorship attribution evidence in SMS messages using N-grams

Shunichi Ishihara
Issued Date: 26 Jun 2014

Abstract


An experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework is described. The experiment attempts to determine the strength of authorship attribution evidence modelled with N-grams, perhaps one of the most basic automatic modelling techniques. SMS messages from multiple authors, selected from the SMS corpus compiled by the National University of Singapore, were used for same-author and different-author comparisons. I varied the number of words used for the N-gram modelling (200, 1000, 2000 or 3000 words) and assessed the performance of each condition. The performance of the LR-based FTC system was assessed with the log-likelihood-ratio cost (Cllr). This study shows that N-grams can be employed within an LR framework to discriminate same-author from different-author SMS texts, but that a fairly large amount of data is needed to do so well (i.e. to obtain Cllr < 0.75). It is concluded that the LR framework warrants further examination with different features and processing techniques.
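To make the idea of "N-gram modelling with a fixed number of words" concrete, here is a minimal Python sketch. It is not the author's model: the abstract does not specify the N-gram order, tokenisation or scoring method, so this sketch assumes lower-cased whitespace tokenisation, word bigrams, and cosine similarity between count vectors as a stand-in comparison score. The helper names `word_ngrams`, `truncate_to_words` and `cosine_score` are hypothetical. Note that a similarity score is not yet a likelihood ratio; in an LR system such scores would still have to be converted to LRs (for instance by logistic-regression calibration).

```python
from collections import Counter
import math


def truncate_to_words(text: str, max_words: int) -> str:
    """Keep only the first max_words tokens (mimics varying the sample size, e.g. 200 or 1000 words)."""
    return " ".join(text.split()[:max_words])


def word_ngrams(text: str, n: int = 2) -> Counter:
    """Count word n-grams in a lower-cased, whitespace-tokenised text.

    Texts shorter than n tokens yield an empty Counter.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_score(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (hypothetical score; 0.0 if either is empty)."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    known = "see u at the station later ok see u soon"
    questioned = "ok see u at the mall later"
    # Compare the two texts using at most 200 words each.
    score = cosine_score(
        word_ngrams(truncate_to_words(known, 200)),
        word_ngrams(truncate_to_words(questioned, 200)),
    )
    print(f"similarity score: {score:.3f}")
```

Truncating both texts before counting is one simple way to hold the amount of data constant across comparisons, which is the variable the experiment manipulates.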

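The log-likelihood-ratio cost mentioned in the abstract can be computed directly from the LRs a system outputs. Writing the likelihood ratio as LR = p(E | H_p) / p(E | H_d), where H_p is the same-author hypothesis and H_d the different-author hypothesis, Brümmer and du Preez define Cllr = ½ [ (1/N_ss) Σ log2(1 + 1/LR_ss) + (1/N_ds) Σ log2(1 + LR_ds) ], averaging over the N_ss same-author and N_ds different-author comparisons. The sketch below implements that formula; the function name `cllr` and its input format (two plain lists of positive LRs) are assumptions for illustration.

```python
import math
from typing import Sequence


def cllr(same_author_lrs: Sequence[float], diff_author_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost (Brümmer and du Preez).

    same_author_lrs: LRs from comparisons where the same-author hypothesis is true.
    diff_author_lrs: LRs from comparisons where the different-author hypothesis is true.
    """
    if len(same_author_lrs) == 0 or len(diff_author_lrs) == 0:
        raise ValueError("need at least one same-author and one different-author LR")
    if any(lr <= 0 for lr in (*same_author_lrs, *diff_author_lrs)):
        raise ValueError("likelihood ratios must be positive")
    # Same-author penalty: grows as LRs fall below 1 (evidence wrongly favours H_d).
    ss = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    # Different-author penalty: grows as LRs rise above 1 (evidence wrongly favours H_p).
    ds = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (ss + ds)


if __name__ == "__main__":
    # A system that always reports LR = 1 carries no information.
    print(cllr([1.0, 1.0], [1.0, 1.0]))
```

A system that always outputs LR = 1 obtains Cllr = 1, so values below 1 indicate that the LRs carry useful information, and lower values are better; the abstract's benchmark of Cllr < 0.75 should be read on this scale.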
DOI: 10.1558/ijsll.v21i1.23

Equinox Publishing Ltd - 415 The Workstation 15 Paternoster Row, Sheffield, S1 2BX United Kingdom