Pride, Prejudice, and Posts
Between Inclusion and Prejudice: Examining LGBTQ+ Discourse and Homophobia in Filipino Reddit Spaces
This initiative seeks to explore and critically examine the discourse surrounding LGBTQ+ experiences within Filipino Reddit communities. Our project analyzes how homophobia manifests in online spaces through natural language processing (NLP), investigates the overall sentiment of posts containing LGBTQ+ related terms, and evaluates how community engagement relates to the emotional tone of these discussions.
Our Research Focus
Description of the Dataset
The dataset consists of Reddit posts collected from the following subreddits:
Data Collection Process
The posts were filtered based on specific search inputs related to the LGBTQ+ community. The search inputs used are as follows:
The dataset contains the following fields, covering both the scraped metadata and the values derived during preprocessing:
- Subreddit: The subreddit from which the post originated.
- Title: The title of the post.
- Original Body: The main content of the post before preprocessing.
- Preprocessed Body: The main content of the post after preprocessing.
- Author: Author of the post.
- Link: The link URL that directs to the post.
- Created: Date when the post was created.
- Num_Comments: Number of comments on the post.
- Upvote_Ratio: Upvote ratio of the post.
- Sentiment: The sentiment label assigned using VADER.
- Compound_Score: The overall sentiment score computed by VADER.
- Homophobia Flag: Indicator assigned using a Hugging Face model.
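As a rough illustration of how the Sentiment label can relate to the Compound_Score, the sketch below maps a VADER compound score to one of three labels. The cutoffs of +0.05 and -0.05 are the ones commonly suggested in VADER's own documentation; the thresholds actually used for this dataset are not stated here, so the function name and the default threshold are assumptions.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score (range -1 to 1) to a coarse sentiment label.

    The 0.05 threshold is an assumed default, not a value confirmed by the project.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    # Scores strictly between -threshold and +threshold count as neutral.
    return "neutral"
```

Keeping the threshold as a parameter makes it easy to check how sensitive the neutral category is to the chosen cutoff.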
PREPROCESSING
We transformed the raw dataset into a well-structured and clean format to enable accurate analysis and meaningful insights. These preprocessing steps enhance data reliability and ensure the dataset is suitable for thorough exploration and modeling.
Steps in Preprocessing
1. Scraping Reddit Posts
- Collected Reddit posts using keyword-based search queries.
- Included metadata such as title, body, author, permalink, timestamp, and subreddit.
2. Cleaning Text Data
- Removed unnecessary characters such as hyperlinks, symbols, and formatting tags from titles and bodies.
- Standardized all text to lowercase to ensure uniformity across entries.
- This cleaning allowed us to focus solely on relevant textual content for downstream analysis.
3. Translating to English
- Translated posts written in Tagalog to English to ensure compatibility with our NLP models.
- This step was critical for enabling accurate sentiment detection and classification tasks.
4. Handling Missing Values
- Inspected columns with potential missing values, such as the body field.
- Removed posts that lacked essential information (e.g., missing body content) due to deletion or removal.
- This filtering step preserved the integrity and quality of our dataset.
5. Tokenization and Stop Word Removal
- Segmented text into individual tokens to allow term-level analysis.
- Removed commonly used English stop words (e.g., “the”, “is”, “at”) using NLTK’s predefined stop word list.
- This ensured we retained only the most meaningful and relevant words related to LGBTQ+ discourse.
6. Sentiment Analysis (VADER)
- Used the VADER sentiment analysis tool to determine emotional tone for each post.
- Generated a compound sentiment score to represent the overall polarity of text content.
7. Homophobia Flagging (Hugging Face Model)
- Applied a pretrained language model from Hugging Face to detect posts that might contain homophobic language.
- Flagged posts with a binary label to facilitate further filtering and analysis on hate speech.
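The cleaning, missing-value, and tokenization steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the function names, the "body" key, the "[deleted]" and "[removed]" placeholder strings, and the tiny stop word set (a stand-in for NLTK's full English list) are all assumptions.

```python
import re

# Small stand-in stop word set; the project used NLTK's English list instead.
STOP_WORDS = {"the", "is", "at", "a", "an", "and", "of", "to", "in"}

# Assumed placeholder strings for posts whose body was deleted or removed.
MISSING_BODIES = {"", "[deleted]", "[removed]"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Lowercase the text, then strip hyperlinks, symbols, and digits."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    # Keeps ASCII letters only, so accented characters are dropped as well.
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def tokenize(text: str) -> list:
    """Split cleaned text on whitespace and drop stop words."""
    return [tok for tok in clean_text(text).split() if tok not in STOP_WORDS]


def preprocess_posts(posts: list) -> list:
    """Drop posts with a missing body and attach a token list to the rest."""
    kept = []
    for post in posts:
        body = post.get("body")
        if body is None or body.strip() in MISSING_BODIES:
            continue  # skip deleted, removed, or empty posts
        kept.append({**post, "tokens": tokenize(body)})
    return kept
```

Translation, VADER scoring, and the Hugging Face classifier are left out of this sketch because they depend on external services or models that are not specified here.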
EXPLORATORY DATA ANALYSIS
Research Question #1
The word cloud reveals that homophobic posts frequently center around LGBT-related terms such as "gay," "lgbt," "lgbtqi," "lesbian," "bisexual," "partner," "community," and "gender," suggesting that discussions are heavily focused on sexual orientation and gender identity. Expressions of bias and discrimination are indicated by the prominence of words like "homophobic," "hate," "straight," and "stereotype." Furthermore, the recurring appearance of words such as "country," "family," "parent," "marriage," "law," and "study" implies that these posts often refer to broader societal, legal, and familial issues concerning homophobia.
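A word cloud is essentially a picture of term frequencies, so the counts behind it can be approximated with a short sketch like the one below. It assumes each post is a dictionary with a hypothetical "homophobia_flag" key (truthy when the post was flagged) and a "tokens" list produced during preprocessing; both names are illustrative and not taken from the project's code.

```python
from collections import Counter


def top_terms(posts: list, n: int = 10) -> list:
    """Return the n most frequent tokens among posts flagged as homophobic.

    Each post is assumed to be a dict with a "homophobia_flag" value and a
    "tokens" list; posts missing either key contribute nothing.
    """
    counts = Counter()
    for post in posts:
        if post.get("homophobia_flag"):
            counts.update(post.get("tokens", []))
    # most_common returns (token, count) pairs sorted by descending count.
    return counts.most_common(n)
```

The resulting (token, count) pairs could then be passed to a plotting or word cloud library of choice.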
Discussion
Examining our results, it’s clear that discussions around LGBTQ+ topics in Filipino Reddit spaces are complex, and community reactions vary depending on both sentiment and the presence of homophobia. Our analysis found that neutral, non-homophobic posts receive the highest levels of engagement and users are more likely to comment on and upvote these kinds of discussions. In contrast, posts flagged as homophobic, regardless of whether their tone is positive or negative, tend to receive less attention overall. This suggests that the community, on the whole, prefers balanced and respectful conversations and does not reward hate or hostility.
However, our study also indicates that homophobia still exists in these online discussions, sometimes in obvious ways, but often more subtly. Even if explicit hate isn’t always at the forefront, less visible forms of discrimination and prejudice can still impact readers, especially those seeking support or validation. Recognizing these subtleties is important for understanding the challenges faced by LGBTQ+ members both online and in real life.
For Filipinos, especially those in the LGBTQ+ community, our findings highlight that there is space for open conversation on Reddit, and these spaces can be supportive if users and moderators uphold a respectful environment. Our research underscores the importance of active moderation and the use of tools to identify harmful content, ensuring that online communities remain welcoming and safe.
It’s important to acknowledge the limitations of our work: our dataset was restricted to selected subreddits and specific keywords, and our language models are mostly trained on English, not Filipino languages. For future research, a more diverse linguistic approach and broader subreddit sampling would provide deeper insights.
Overall, our findings emphasize the ongoing need for inclusive spaces and thoughtful moderation. Healthy online discussions can go a long way toward building a better, more understanding community for everyone.
Conclusion
In summary, our exploration of LGBTQ+ discourse in Filipino Reddit communities reveals a nuanced landscape. Neutral, non-homophobic discussions consistently attract the most engagement, showing that users respond best to respectful and balanced conversations. Conversely, posts marked by homophobia or those with heightened emotional tones tend to receive less interaction, indicating a general community preference against divisive or hostile content.
Despite these positive trends, it is important to acknowledge that instances of homophobia, whether subtle or overt, still persist in these online spaces. This underscores the ongoing need for effective moderation and community guidelines to ensure that discussions remain welcoming, particularly for LGBTQ+ individuals seeking support.
While our study is limited by its focus on a select group of subreddits and primarily English keywords, it provides a strong foundation for future research. Expanding the scope to consider Filipino languages and additional communities could yield even deeper insights.
Ultimately, this research highlights that online spaces, when guided by thoughtfulness and inclusivity, can foster healthy discussions and community support. By building on these findings, Filipino online communities can continue to work toward greater understanding and acceptance for all members.
MEET THE TEAM
TJ Noval 🐣
Hello, I am Tj!
You can call me Tristan or Tj for short. I am a sophomore Computer Science student at the University of the Philippines Diliman.
I’m especially passionate about cybersecurity and can’t wait to dive deeper into this dynamic field—helping make the digital world a safer place for everyone.
Jarl Ricaforte 🫧
Hi, I’m Jarl! 👋
I’m Jarelle Ricaforte, a 2nd-year BS Computer Science student at the University of the Philippines Diliman, still exploring the world of tech and loving every bit of it.
I got into programming because I found it super cool and interesting—like solving puzzles with endless possibilities. Right now, I’m really interested in web development, even though I still have a lot to learn. I don’t have much experience yet, but I’m excited to keep learning how to build useful, interactive websites from the ground up!
Gabby Sacramento 🧸
Hey, I'm Gabby! 😊
I’m a Computer Science sophomore at UP Diliman, currently exploring UI/UX design and web development. I’m still early in the process and learning as I go, but I’m excited about where it’s heading! I’ve always been passionate about graphic design and was really into STEM back in high school, so I’m looking forward to finding creative ways to blend both worlds. Right now, I’m loving the challenge of turning ideas into real, interactive interfaces, even if I’m still figuring things out along the way.
Outside of tech, I’m into all things 2000s—movies, TV shows, music, and games. I’m also a huge cat person. 😸