Build a profanity filter API with GraphQL
August 5, 2021 · 4 min read
Editor’s note: Examples of profanity in this article are represented by the word “profanity” in order to remain inclusive and appropriate for all audiences.
Detecting and filtering profanity is a task you are bound to run into while building applications where users post (or interact with) text. These can be social media apps, comment sections, or game chat rooms, just to name a few.
Being able to detect profanity so that you can filter it out is key to keeping communication spaces safe and, if your app requires it, age-appropriate.
This tutorial will guide you on building a GraphQL API to detect and filter profanity with Python and Flask. If you are just interested in the code alone, you can visit this GitHub repo for the demo application source code.
Prerequisites
To follow and understand this tutorial, you will need the following:
What is profanity?
Profanity (also known as curse words or swear words) refers to the offensive, impolite, or rude use of words and language. It is also used to express strong feelings about something. Profanity can make online spaces feel hostile to users, which is undesirable for an app designed for a wide audience.
Which words qualify as profanity is up to your discretion. This tutorial will explain how to filter words individually, so you have control over what type of language is allowed on your app.
What is a profanity filter?
A profanity filter is a software or application that helps detect, filter, or modify words considered profane in communication spaces.
Why do we detect and filter profanity?
Common problems faced when detecting profanity
Detecting profanity with Python
Using Python, let’s build an application that tells us whether a given string is profane or not, then proceed to filter it.
Creating a word-list-based profanity detector
To create our profanity filter, we will create a list of unaccepted words, then check if a given string contains any of them. If profanity is detected, we will replace the profane word with a censoring text.
Create a file named filter.py and save the following code in it:
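A minimal sketch of what such a word-list-based filter might look like (the word list, censor text, and the filter_profanity function name are placeholders for illustration):

```python
# filter.py -- a minimal, word-list-based sketch (placeholder word list)

# Words considered unacceptable; extend this list at your discretion
PROFANE_WORDS = ["profanity"]

# Text used to replace any profane word that is found
CENSOR_TEXT = "****"


def filter_profanity(sentence):
    """Return the sentence with every word found in PROFANE_WORDS censored."""
    filtered_words = []
    for word in sentence.split():
        # Compare case-insensitively so "Profanity" is also caught
        if word.lower() in PROFANE_WORDS:
            filtered_words.append(CENSOR_TEXT)
        else:
            filtered_words.append(word)
    return " ".join(filtered_words)
```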
Testing our word-list-based filter
Now let's pass a few sample strings to the function to see how it behaves.
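With the placeholder word list from the sketch above, the results would look something like this:

```python
from filter import filter_profanity

print(filter_profanity("That movie was so much profanity"))
# That movie was so much ****

print(filter_profanity("This is a perfectly clean sentence"))
# This is a perfectly clean sentence
```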
However, this approach has many problems, ranging from being unable to detect profanity outside its word list to being easily fooled by misspellings or word padding. It also requires us to maintain the word list ourselves on an ongoing basis, which only adds to the problems we already have. How do we improve on this?
Using the better-profanity Python library to improve our filter
Better-profanity is a blazingly fast Python library for checking for (and cleaning up) profanity in strings. It supports custom word lists, safelists, detection of profanity in modified word spellings (such as leetspeak) and Unicode characters, and even multilingual profanity detection.
Installing the better-profanity library
In the terminal, type:
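```bash
pip install better-profanity
```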
Integrating better-profanity into our filter
Now, update the filter.py file with the following code:
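A sketch of the updated filter.py, keeping the filter_profanity name from earlier and delegating the actual work to better-profanity:

```python
# filter.py -- now backed by the better-profanity library
from better_profanity import profanity

# Load the library's default list of censored words
profanity.load_censor_words()


def filter_profanity(sentence):
    """Return the sentence with profane words censored by better-profanity."""
    return profanity.censor(sentence)
```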
Testing the better-profanity-based filter
If you pass the same sample strings to the updated function, better-profanity censors them as expected.
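Continuing with the article's convention of using "profanity" as a stand-in for a real word on the library's list:

```python
from filter import filter_profanity

# "profanity" stands in for a real profane word on better-profanity's word list
print(filter_profanity("That movie was so much profanity"))
# That movie was so much ****

print(filter_profanity("This is a perfectly clean sentence"))
# This is a perfectly clean sentence
```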
As I mentioned previously, better-profanity supports profanity detection in modified word spellings, so examples like the following will be censored accurately:
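(Here, "pr0fanity" and "pr@fanity" are stand-ins for leetspeak variants of a real word on the list; better-profanity recognizes substitutions such as 0 for o and @ for a.)

```python
from filter import filter_profanity

print(filter_profanity("That movie was so much pr0fanity"))
# That movie was so much ****

print(filter_profanity("That movie was so much pr@fanity"))
# That movie was so much ****
```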
Better-profanity also has functionalities to tell if a string is profane. To do this, use:
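```python
from better_profanity import profanity

# Returns True if the string contains a word on the profane word list
profanity.contains_profanity("That movie was so much profanity")    # True (for a real profane word)
profanity.contains_profanity("This is a perfectly clean sentence")  # False
```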
Better-profanity also allows us to provide a character to censor profanity with. To do this, use:
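```python
from better_profanity import profanity

# The second argument to censor() is the character used for censoring
profanity.censor("That movie was so much profanity", '-')
# 'That movie was so much ----'  (for a real profane word)
```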
Building a GraphQL API for our filter
We have created a Python script to detect and filter profanity, but it’s pretty useless in the real world as no other platform can use our service. We’ll need to build a GraphQL API with Flask for our profanity filter, so we can call it an actual application and use it somewhere other than a Python environment.
Installing the application requirements
In the terminal, type:
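One plausible set of requirements for a Flask and graphene setup (the package choices beyond better-profanity are assumptions, not the article's exact list):

```bash
pip install flask graphene flask-graphql better-profanity
```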
Writing the application’s GraphQL schemas
Next, let’s write our GraphQL schemas for the API. Create a file named schema.py and save the following code in it:
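A minimal sketch of what schema.py could look like, assuming graphene and a single filteredText query field (the field name and resolver are illustrative):

```python
# schema.py -- a possible GraphQL schema for the profanity filter
import graphene
from better_profanity import profanity


class Query(graphene.ObjectType):
    # filtered_text is exposed as `filteredText` in the GraphQL API
    filtered_text = graphene.String(text=graphene.String(required=True))

    def resolve_filtered_text(self, info, text):
        # Censor any profane words in the supplied text
        return profanity.censor(text)


schema = graphene.Schema(query=Query)
```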
Configuring our application server for GraphQL
After that, create another file named server.py and save the following code in it:
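A matching server.py sketch, assuming flask-graphql's GraphQLView is used to serve the schema with the GraphiQL UI at the root URL:

```python
# server.py -- Flask server exposing the GraphQL schema
from flask import Flask
from flask_graphql import GraphQLView

from schema import schema

app = Flask(__name__)

# Serve the GraphQL endpoint (and the GraphiQL UI) at the root URL
app.add_url_rule(
    "/",
    view_func=GraphQLView.as_view("graphql", schema=schema, graphiql=True),
)

if __name__ == "__main__":
    app.run(debug=True)
```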
Running the GraphQL server
To run the server, execute the server.py script.
In the terminal, type:
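```bash
python server.py
```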
Your terminal should look like the following:
Testing the GraphQL API
After running the server.py file in the terminal, head to your browser and open the URL http://127.0.0.1:5000. You should have access to the GraphiQL interface and get a response similar to the image below:
We can proceed to test the API by running a query like the one below in the GraphiQL interface:
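Assuming the filteredText field from the schema sketch above, a query would look roughly like this:

```graphql
{
  filteredText(text: "That movie was so much profanity")
}
```

The response comes back as JSON, with the censored string under data.filteredText.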
The result should be a JSON response containing the censored version of the text you supplied.
Conclusion
This article taught us about profanity detection, its importance, and its implementation. In addition, we saw how easy it is to build a profanity detection API with Python, Flask, and GraphQL.
The source code of the GraphQL API is available on GitHub. You can learn more about the better-profanity Python library from its official documentation.
profanity-check
A fast, robust Python library to check for profanity or offensive language in strings. Read more about how and why profanity-check was built in this blog post. You can also test out profanity-check in your browser.
profanity-check uses a linear SVM model trained on 200k human-labeled samples of clean and profane text strings. Its model is simple but surprisingly effective, meaning profanity-check is both robust and extremely performant.
Why Use profanity-check?
No Explicit Blacklist
Many profanity detection libraries use a hard-coded list of bad words to detect and filter profanity. For example, profanity uses this wordlist, and even better-profanity still uses a wordlist. There are obviously glaring issues with this approach, and, while they might be performant, these libraries are not accurate at all.
Other libraries like profanity-filter use more sophisticated methods that are much more accurate but at the cost of performance. A benchmark (performed December 2018 on a new 2018 Macbook Pro) using a Kaggle dataset of Wikipedia comments yielded roughly the following results:
| Package | 1 Prediction (ms) | 10 Predictions (ms) | 100 Predictions (ms) |
| --- | --- | --- | --- |
| profanity-check | 0.2 | 0.5 | 3.5 |
| profanity-filter | 60 | 1200 | 13000 |
| profanity | 0.3 | 1.2 | 24 |
This table speaks for itself:
| Package | Test Accuracy | Balanced Test Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- | --- |
| profanity-check | 95.0% | 93.0% | 86.1% | 89.6% | 0.88 |
| profanity-filter | 91.8% | 83.6% | 85.4% | 70.2% | 0.77 |
| profanity | 85.6% | 65.1% | 91.7% | 30.8% | 0.46 |
See the How section below for more details on the dataset used for these results.
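A short sketch of the library's two entry points (the example string and output values are illustrative):

```python
from profanity_check import predict, predict_prob

# predict() labels each string: 1 = profane, 0 = clean
predict(["have a nice day"])       # e.g. array([0])

# predict_prob() gives the probability that each string is profane
predict_prob(["have a nice day"])  # e.g. a small value such as array([0.03])
```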
Note that both predict() and predict_prob() return NumPy arrays.
More on How/Why It Works
Special thanks to the authors of the datasets used in this project. profanity-check was trained on a combined dataset from 2 sources:
One simplified way you could think about why profanity-check works is this: during the training process, the model learns which words are "bad" and how "bad" they are because those words will appear more often in offensive texts. Thus, it's as if the training process is picking out the "bad" words out of all possible words and using those to make future predictions. This is better than just relying on arbitrary word blacklists chosen by humans!
This library is far from perfect. For example, it has a hard time picking up on less common variants of swear words like "f4ck you" or "you b1tch" because they don't appear often enough in the training corpus. Never treat any prediction from this library as unquestionable truth, because it does and will make mistakes. Instead, use this library as a heuristic.
rominf / profanity-filter
A Python library for detecting and filtering profanity
License: GNU General Public License v3.0
profanity-filter’s Introduction
profanity-filter: A Python library for detecting and filtering profanity
This library is no longer a priority for me. Feel free to fork it.
profanity-filter is a universal library for detecting and filtering profanity. Support for English and Russian is included.
Here are some basic examples of how to use the library. For more examples, please see the tests folder.
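A brief sketch of basic usage (the input string uses "profanity" as a stand-in for a real profane word):

```python
from profanity_filter import ProfanityFilter

pf = ProfanityFilter()

# Censor profane words in a string
pf.censor("That is profanity!")      # e.g. "That is *********!" for a word on the list

# Check whole strings
pf.is_clean("That is profanity!")    # False for a genuinely profane word
pf.is_profane("That is profanity!")  # True for a genuinely profane word
```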
Using as a part of Spacy pipeline
RESTful web service
The first two parts of the installation instructions are designed for users who want to filter English profanity. If you want to filter profanity in another language, you still need to read them.
For a minimal setup, you need to install profanity-filter, which is bundled with spacy, and download a spacy model for tokenization and lemmatization:
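A rough sketch of those two steps (the en shortcut is the model name used by older spacy releases; adjust for your spacy version):

```bash
pip install profanity-filter
python -m spacy download en
```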
For more info about Spacy models read: https://spacy.io/usage/models/.
To get the deep analysis functionality, install additional libraries and a dictionary for your language.
Firstly, install hunspell and hunspell-devel packages with your system package manager.
For Amazon Linux AMI run:
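Something along these lines (exact package names may vary by distribution):

```bash
sudo yum install hunspell hunspell-devel
```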
Other language support
Let’s take Russian for example on how to add new language support.
Russian language support
First, we need to provide the file profanity_filter/data/ru_badwords.txt, which contains a newline-separated list of profane words. For Russian it's already present, so we skip file generation.
Next, we need to download the appropriate Spacy model. Unfortunately, a Spacy model for Russian is not yet ready, so we will use an English model for tokenization. If you haven't installed the Spacy model for English yet, now is the right time to do so. As a consequence, even if you only want to filter Russian profanity, you need to specify English in the ProfanityFilter constructor, as shown in the usage examples.
Next, we download dictionaries in Hunspell format for deep analysis from the site https://cgit.freedesktop.org/libreoffice/dictionaries/plain/:
You need to install the polyglot package and its requirements for language detection. See https://polyglot.readthedocs.io/en/latest/Installation.html for more detailed instructions.
For Amazon Linux AMI run:
RESTful web service
If something is not right, you can import dependencies yourself to see the import exceptions:
English profane word dictionary: https://github.com/areebbeigh/profanityfilter/ (author Areeb Beigh).
Russian profane word dictionary: https://github.com/PixxxeL/djantimat (author Ivan Sergeev).
profanity-filter’s Issues
Minimize profane word dictionaries for deep analysis usage
TypeError when calling extra_profane_word_dictionaries
When supplying a dict (
Traceback (most recent call last):
  File " ", line 4, in
  File "/home/hwhite/frac37/lib/python3.7/site-packages/profanity_filter/profanity_filter.py", line 265, in custom_profane_word_dictionaries
    self.clear_cache()
  File "/home/hwhite/frac37/lib/python3.7/site-packages/profanity_filter/profanity_filter.py", line 384, in clear_cache
    self._update_profane_word_dictionary_files()
  File "/home/hwhite/frac37/lib/python3.7/site-packages/profanity_filter/profanity_filter.py", line 429, in _update_profane_word_dictionary_files
    profane_word_file = self._DATA_DIR / f'
Fails to detect phrases
What am I doing wrong?
I created a simple service using instructions from your readme but nothing works.
And I got errors when I call censor method.
P.S. I tried different ways but have no luck.
Try TinyFastSS
Optionally store cache in MongoDB
This will make parallelized censoring faster. This should be optional because the user will need to set up MongoDB and install additional dependencies.
Not working with auto-py-to-exe
Use more-itertools library
Can’t import profanity_filter
I’m getting this error when trying to import the library on a python terminal. Using both python 3.6.0 and 3.7.0
Some plurals not considered profane
Speedup initialization
The bottlenecks are:
Make tests faster
For every test, a new instance of the profanity filter is created. I think it should be possible to cache fixtures.
Unable to mark words as not profane (Customization / English)
Hey,
I have been using the Library to classify english texts.
The one problem I have been facing is that the tool is wrongly classifying words that have devil, hell or allah in it. I was wondering if I can remove those from the Library’s Dictionary.
Thanks,
Vyom
Show real profane word to a user
Hi Roman,
Thank you for sharing a code for your product. I learned a lot from it and
find it very powerful and reliable for the amount of features it provides. Did not try all of them yet though. 🙂
Have a suggestion.
Can we bring up the bad_word that was mutated by the user into result?
Ex, if I have «shiiiit» as an input, I would want to know what was the real bad_word that Levenshtein «had in mind» («shit»). This example is easy but sometimes there are cases when you cannot even guess why the word is censored.
Do you see a value in it? Do you think it makes sense to add it? Maybe by extra parameter if not always?
Thank you very much for being very responsive and providing an excellent support for your great product!
Windows 10, Python 3.8 can’t run console command
After installing, I’m not getting the console command to work.
It’s not in my C:\Python38\Scripts nor my C:\Users\abc\AppData\Roaming\Python\Python38\Scripts
Only first language in a list of languages is working
Invalid syntax in profanity_filter.py Class config
This error pops
SyntaxError: invalid syntax in File "/usr/local/lib/python3.5/dist-packages/profanity_filter/profanity_filter.py", line 102
censor_char: str = '*'
Is this a python3 issue? Does this only support python 2?
TypeError: __init__() got an unexpected keyword argument 'lang'
Bug in saving profane word in redis
Expected behavior
Profane word is saved in redis.
Real behavior
Exception is thrown.
How to reproduce
The _save_censored_word method will throw an exception.
Failed to detect number substitutions
When trying to identify profane words, sh1t is not getting identified as profane.
The Levenshtein approach should have identified the variation of the original profane word.
Also, I see that sh1t is listed under the profane word dictionary. Could you please see where the problem is?
Publish on Spacy website
Improve README.md
where to cd?
the Deep learning section contains code to cd into profanity_filter/data, where are these
Make REST webservice for profanity filtering
Also package it to the Docker.
Parallelize censoring
I think dask is a good solution because it has a nice API and can be used in a cluster.
The easiest and most effective parallelization is to map words after tokenization.
Make it possible to change DATA_DIR
It should be implemented as a settable property. Note that the cache should be cleared after setting the new value.
Get exception on particular input
For these inputs "deathfrom", "eskimobob", "piazza@gma" with pf.censor_whole_words=False, pf.censor_word throws the exception below.
Publish all dependencies on PyPI to avoid installation via git URLs
Refactor tests
Use the Spacy component for most tests, as it offers more information.
censor() and censor_word() give different results for profanity
How to explain this behavior in a current version?
Do not try to search profanity in compound words of dictionary words in emails and URLs
For example, these words should not be detected as profane: «deathfrom», «eskimobob» if they come as part of emails and URLs.
Profanity Filter for Laravel
Profanity Filter takes strings as input and removes any curse words the string might contain. It checks strings against a specific blacklist; a word must match as a separate word to be considered a curse word. If a curse word is found, it is replaced with a censor character of the user's choosing (the default is *).
This package is intended to be used with Laravel. Tested and working with Laravel 5.4.
This code is based on Fastwebmedia/Profanity-Filter. A major part of it is taken from there, and I added the things I thought it required.
Laravel
Add 'Sworup\ProfanityFilter\ProfanityServiceProvider' to your providers array.
If you wish to use the Facade then add ‘Profanity’ => ‘Sworup\ProfanityFilter\Profanity’
The package will automatically use the config file containing the list of banned words.
The above code would return:
Please see CHANGELOG for more information about what has changed recently.
Please see CONTRIBUTING and CONDUCT for details.
If you discover any security related issues, please email sworup.shakya@gmail.com instead of using the issue tracker.
The MIT License (MIT). Please see License File for more information.
About
Profanity filter package would help you censor some of the bad words users put in your posts and/or comments.
The Profanity Filter for Rails
This plugin will allow you to filter profanity using basic replacement or a dictionary term.
You can use it in your models:
Notice – there are two profanity filters, one is destructive. Beware the exclamation point (profanity_filter!).
Non-Destructive (filters content when called, original text remains in the database)
Destructive (saves the filtered content to the database)
You can also use the filter directly:
Inquiring minds can check out the simple benchmarks I've included so you can have an idea of what kind of performance to expect. I've included some quick scenarios covering strings of (100, 1000, 5000, 1000) words and dictionaries of (100, 1000, 5000, 25000, 50000, 100000) words.
You can run the benchmarks via:
May break ProfanityFilter out on its own
Clean up dictionary implementation and substitution (suboptimal and messy)
Move benchmarks into a rake task
Ability to supplement the profanity database (with a yaml outside of the gem) via @seankibler
Easy custom blacklists/dictionaries (essentially the same as above)
The Profanity Filter for Rails uses the MIT License. Please see the MIT-LICENSE file.
Created by Adam Bair (adam@intridea.com) of Intridea (www.intridea.com) in the open source room at RailsConf 2008. Originally called Fu-fu: The Profanity Filter for Rails.
About
A Rails plugin gem for filtering out profanity.