JStylo-Anonymouth

From PSAL


The JStylo and Anonymouth integrated open-source project (JSAN) resides on GitHub.


What is JSAN?

JSAN is a writing style analysis and anonymization framework. It consists of two parts:

  • JStylo - authorship attribution framework
  • Anonymouth - authorship evasion (anonymization) framework

Anonymouth uses JStylo as its underlying feature extraction and authorship attribution engine. It takes the stylometric features and classification results that JStylo produces and suggests changes that users can make to anonymize their writing style.
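The extract-then-suggest flow described above can be illustrated with a minimal sketch. This is not JStylo's or Anonymouth's actual code or API: the function names `letter_frequencies` and `suggest_changes`, the choice of letter frequencies as the only feature, and the `threshold` parameter are all assumptions made for illustration. The real tools use a much richer feature set and a trained classifier.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the relative frequency of each ASCII letter in text (case-insensitive).

    Hypothetical stand-in for a stylometric feature extractor.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    counts = Counter(letters)
    return {ch: (counts[ch] / total if total else 0.0)
            for ch in string.ascii_lowercase}


def suggest_changes(doc_features, target_features, threshold=0.01):
    """Compare a document's features to target values and list what to adjust.

    Hypothetical stand-in for the suggestion step: any feature that deviates
    from the target by more than `threshold` yields a suggestion.
    """
    suggestions = []
    for feature, value in doc_features.items():
        delta = value - target_features.get(feature, 0.0)
        if delta > threshold:
            suggestions.append((feature, "use fewer", delta))
        elif delta < -threshold:
            suggestions.append((feature, "use more", -delta))
    # Report the largest deviations first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)
```

A caller would extract features from the document to anonymize, obtain target values (for example, averages over other authors' documents), and pass both to `suggest_changes` to get a ranked list of features to adjust.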

Details about JSAN: Use Fewer Instances of the Letter "i": Toward Writing Style Anonymization. Andrew McDonald, Sadia Afroz, Aylin Caliskan, Ariel Stolerman and Rachel Greenstadt. Privacy Enhancing Technologies Symposium (PETS 2012)

Tutorial

JSAN tutorial: presented at 28C3 (video)

Download

Downloads:

  • Corpora - includes corpora and problem set XML files suitable for JStylo, for the following:
    • The Extended-Brennan-Greenstadt Adversarial Stylometry Corpus (45 Authors, 6500 words per author minimum)
    • The Brennan-Greenstadt Adversarial Stylometry Corpus (12 Authors, 5000 words per author minimum)
    • A subcorpus of the Enron email dataset (50 authors, 6500 words per author minimum)
  • JStylo - Authorship attribution analysis tool:
    • Includes JStylo and the Extended-Brennan-Greenstadt Adversarial Stylometry Corpus, the Brennan-Greenstadt Adversarial Stylometry Corpus and the Enron subcorpus detailed above.
    • JStylo v1.2
    • JStylo v1.1
  • JSAN (first edition) - includes:
    • JStylo v0.0.1 - Authorship attribution analysis tool.
    • Anonymouth v0.0.2 - Authorship recognition evasion tool.
    • The Extended-Brennan-Greenstadt Adversarial Stylometry Corpus and the Brennan-Greenstadt Adversarial Stylometry Corpus detailed above

If you use JStylo and/or Anonymouth in your research, please cite:

Andrew McDonald, Sadia Afroz, Aylin Caliskan, Ariel Stolerman and Rachel Greenstadt. Use Fewer Instances of the Letter "i": Toward Writing Style Anonymization. PETS 2012.

If you use the corpus in your research, please cite:

Michael Brennan and Rachel Greenstadt. Practical Attacks Against Authorship Recognition Techniques. In Proceedings of the Twenty-First Conference on Innovative Applications of Artificial Intelligence (IAAI), Pasadena, California, July 2009.

Developers

To set up the environment for developers, follow these steps:
