Pride, Prejudice, and Posts

Between Inclusion and Prejudice: Examining LGBTQ+ Discourse and Homophobia in Filipino Reddit Spaces

This initiative seeks to explore and critically examine the discourse surrounding LGBTQ+ experiences within Filipino Reddit communities. Our project analyzes how homophobia manifests in online spaces through natural language processing (NLP), investigates the overall sentiment of posts containing LGBTQ+ related terms, and evaluates how community engagement relates to the emotional tone of these discussions.

Our Research Focus

LGBTQ+ rights and inclusion remain vital issues within Filipino society, and online spaces such as Reddit often serve as platforms for open expression and discussion. Our project centers on understanding how LGBTQ+ discourse evolves in Filipino Reddit communities, with a focus on the prevalence of homophobic content. By analyzing these conversations, we aim to identify patterns of inclusion and prejudice, offering insights that can inform more effective advocacy, support systems, and educational efforts for the LGBTQ+ community.

About Our Dataset

The dataset was compiled by scraping over 1,500 posts from Philippine-based subreddits, using keywords related to LGBTQ+ issues and various expressions of homophobia as search terms.

preprocessed_dataset.csv

Description of the Dataset

The dataset consists of Reddit posts collected from the following subreddits:

r/OffmychestPH

A Filipino community where we work to make it a safe space in which you can unload your burdens, as well as celebrate your wins and milestones.

941k

Members

r/phlgbt

Looking for a safe space for the LGBTQIA+ community in the Philippines? Well, pull out a seat and make yourself comfy 'cause you came to the right place!

39k

Members

r/alasjuicy

NSFW stories and confessions from Filipino redditors

337k

Members

r/AkoBaYungGago

Who's the asshole in your story? Go ahead and tell it! Based on r/AITA (the Am I The Asshole subreddit), but here the assholes are Filipino.

282k

Members

r/Philippines

The official Philippines subreddit

Members

r/relationship_advicePH

Need advice with your relationship? Whether it's romance, friendship, family, co-workers, or basic human interaction: we're here to help.

149k

Members

r/MentalHealthPH

A community of Filipinos here and abroad to find support, share stories, discuss mental health issues and more.

67k

Members

r/CasualPH

For Casual Philippine experience

544k

Members

Data Collection Process

The posts were filtered based on specific search inputs related to the LGBTQ+ community. The search inputs used are as follows:

bading bakla tomboy homophobic trans gay lesbian
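As a rough illustration of keyword-based filtering, the sketch below checks which of the search inputs appear in a post as whole words. It is not the project's actual scraping code: Reddit's own search handles matching during collection, and the names `KEYWORDS`, `KEYWORD_PATTERN`, and `matched_keywords` are hypothetical. The word-boundary check matters for short terms such as "trans", which would otherwise also match "translate".

```python
import re

# Search inputs listed in the Data Collection Process section.
KEYWORDS = ["bading", "bakla", "tomboy", "homophobic", "trans", "gay", "lesbian"]

# \b word boundaries keep "trans" from matching inside "translate" or "transfer".
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(text: str) -> set[str]:
    """Return the search keywords that appear as whole words in the text."""
    return {match.lower() for match in KEYWORD_PATTERN.findall(text)}
```

A post whose title and body both return an empty set would not have been retrieved by any of the search inputs under this assumption.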

The resulting dataset contains the following columns:

Subreddit: The subreddit from which the post originated.
Title: The title of the post.
Original Body: The main content of the post before preprocessing.
Preprocessed Body: The main content of the post after preprocessing.
Author: Author of the post.
Link: The link URL that directs to the post.
Created: Date when the post was created.
Num_Comments: Number of comments on the post.
Upvote_Ratio: Upvote ratio of the post.
Sentiment: The sentiment label assigned using VADER.
Compound_Score: The overall sentiment score computed by VADER.
Homophobia Flag: Indicator assigned using a Hugging Face model.

PREPROCESSING

We transformed the raw dataset into a well-structured and clean format to enable accurate analysis and meaningful insights. These preprocessing steps enhance data reliability and ensure the dataset is suitable for thorough exploration and modeling.

Steps in Preprocessing

1. Scraping Reddit Posts

Collected Reddit posts using keyword-based search queries.
Included metadata such as title, body, author, permalink, timestamp, and subreddit.

2. Cleaning Text Data

Removed unnecessary characters such as hyperlinks, symbols, and formatting tags from titles and bodies.
Standardized all text to lowercase to ensure uniformity across entries.
This cleaning allowed us to focus solely on relevant textual content for downstream analysis.
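The cleaning step could look roughly like the sketch below. It assumes that "hyperlinks, symbols, and formatting tags" means URLs, Markdown link syntax, and punctuation. The exact rules used in the project are not stated, so the patterns and the name `clean_text` are illustrative only.

```python
import re

MD_LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # [label](url) -> label
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a letter, digit, whitespace, or straight apostrophe.
# \w is Unicode-aware, so letters such as "ñ" survive; "_" is removed explicitly.
SYMBOL_RE = re.compile(r"[^\w\s']|_")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Strip links and symbols, lowercase, and collapse whitespace."""
    text = MD_LINK_RE.sub(r"\1", text)  # keep the visible link label
    text = URL_RE.sub(" ", text)        # drop bare URLs
    text = text.lower()
    text = SYMBOL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()
```

Keeping Unicode letters is a deliberate choice here, since cleaning runs before translation and Filipino text may contain characters outside plain ASCII.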

3. Translating to English

Translated posts written in Tagalog to English to ensure compatibility with our NLP models.
This step was critical for enabling accurate sentiment detection and classification tasks.

4. Handling Missing Values

Inspected columns with potential missing values, such as the body field.
Removed posts that lacked essential information (e.g., missing body content) due to deletion or removal.
This filtering step preserved the integrity and quality of our dataset.
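A minimal sketch of this filter follows, assuming each post is a dictionary keyed by the column names listed earlier and that deleted or removed posts show up with an empty body or Reddit's "[deleted]" / "[removed]" placeholder text. How the project actually detected such posts is not stated, and `PLACEHOLDERS`, `has_body`, and `drop_missing_bodies` are hypothetical names.

```python
# Body values treated as "no content" (an assumption for this sketch).
PLACEHOLDERS = {"", "[deleted]", "[removed]"}


def has_body(post: dict) -> bool:
    """True if the post still carries usable body text."""
    body = post.get("Original Body")
    if not isinstance(body, str):  # covers None and NaN-style missing values
        return False
    return body.strip().lower() not in PLACEHOLDERS


def drop_missing_bodies(posts: list[dict]) -> list[dict]:
    """Keep only posts whose body is present."""
    return [post for post in posts if has_body(post)]
```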

5. Tokenization and Stop Word Removal

Segmented text into individual tokens to allow term-level analysis.
Removed commonly used English stop words (e.g., “the”, “is”, “at”) using NLTK’s predefined stop word list.
This ensured we retained only the most meaningful and relevant words for analyzing LGBTQ+ discourse and homophobia.
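The sketch below illustrates tokenization followed by stop word removal. To stay self-contained it uses a tiny hand-written stop word set as a stand-in. The project used NLTK's predefined English list, which is normally loaded with `nltk.corpus.stopwords.words("english")` after downloading the corpus. The regular-expression tokenizer is likewise an assumption, not necessarily the tokenizer the project used.

```python
import re

# Stand-in for illustration only; NLTK's English stop word list is much longer.
STOP_WORDS = {"the", "is", "at", "a", "an", "and", "of", "to", "in", "my"}

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Split lowercased text into word tokens."""
    return TOKEN_RE.findall(text.lower())


def remove_stop_words(tokens: list[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Drop tokens that appear in the stop word set."""
    return [token for token in tokens if token not in stop_words]
```

For example, `remove_stop_words(tokenize("The law is at the center of my family"))` keeps only `law`, `center`, and `family`.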

6. Sentiment Analysis (VADER)

Used the VADER sentiment analysis tool to determine emotional tone for each post.
Generated a compound sentiment score to represent the overall polarity of text content.
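VADER's compound score ranges from -1 (most negative) to +1 (most positive). The sketch below shows one common way to turn it into a label, using the ±0.05 cutoffs conventionally suggested for VADER. Whether the project used these exact cutoffs or label names is an assumption, and `label_from_compound` is a hypothetical helper.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a sentiment label.

    The score itself would typically come from the vaderSentiment package:
        SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    The 0.05 threshold is the conventional default, assumed here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```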

7. Homophobia Flagging (Hugging Face Model)

Applied a pretrained language model from Hugging Face to detect posts that might contain homophobic language.
Flagged posts with a binary label to facilitate further filtering and analysis on hate speech.
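Hugging Face text-classification pipelines return, for each input, a dictionary with a `label` and a `score`. The sketch below shows how such a prediction could be reduced to a binary flag. The specific model is not named in this write-up, so the positive label name and the 0.5 threshold are placeholders that would need to match whichever model is used, and `to_flag` is a hypothetical helper.

```python
def to_flag(prediction: dict, positive_label: str, threshold: float = 0.5) -> int:
    """Return 1 if the classifier predicts the positive class confidently enough.

    `prediction` has the shape {"label": ..., "score": ...}.
    Label names are model-specific, so `positive_label` must be supplied.
    """
    is_positive = prediction["label"] == positive_label
    return int(is_positive and prediction["score"] >= threshold)
```

Raising the threshold trades recall for precision: fewer posts are flagged, but those that are flagged are more likely to be genuinely homophobic.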

EXPLORATORY DATA ANALYSIS

Research Question #1

The word cloud reveals that homophobic posts frequently center around LGBT-related terms such as "gay," "lgbt," "lgbtqi," "lesbian," "bisexual," "partner," "community," and "gender," suggesting that discussions are heavily focused on sexual orientation and gender identity. Expressions of bias and discrimination are indicated by the prominence of words like "homophobic," "hate," "straight," and "stereotype." Furthermore, the recurring appearance of words such as "country," "family," "parent," "marriage," "law," and "study" implies that these posts often refer to broader societal, legal, and familial issues concerning homophobia.

Discussion

Examining our results, it’s clear that discussions around LGBTQ+ topics in Filipino Reddit spaces are complex, and community reactions vary depending on both sentiment and the presence of homophobia. Our analysis found that neutral, non-homophobic posts receive the highest levels of engagement: users are more likely to comment on and upvote these kinds of discussions. In contrast, posts flagged as homophobic, regardless of whether their tone is positive or negative, tend to receive less attention overall. This suggests that the community, on the whole, prefers balanced and respectful conversations and does not reward hate or hostility.
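A comparison like this can be computed by grouping posts on their sentiment label and homophobia flag and averaging an engagement column. The sketch below is one possible way to do that with the column names listed earlier. It is not the project's analysis code, and `mean_engagement_by_group` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def mean_engagement_by_group(posts: list[dict], metric: str = "Num_Comments") -> dict:
    """Average an engagement column per (Sentiment, Homophobia Flag) group."""
    groups = defaultdict(list)
    for post in posts:
        key = (post["Sentiment"], post["Homophobia Flag"])
        groups[key].append(float(post[metric]))  # CSV values arrive as strings
    return {key: mean(values) for key, values in groups.items()}
```

Passing `metric="Upvote_Ratio"` gives the same breakdown for upvotes instead of comments.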

However, our study also indicates that homophobia still exists in these online discussions, sometimes in obvious ways, but often more subtly. Even if explicit hate isn’t always at the forefront, less visible forms of discrimination and prejudice can still impact readers, especially those seeking support or validation. Recognizing these subtleties is important for understanding the challenges faced by LGBTQ+ members both online and in real life.

For Filipinos, especially those in the LGBTQ+ community, our findings highlight that there is space for open conversation on Reddit, and these spaces can be supportive if users and moderators uphold a respectful environment. Our research underscores the importance of active moderation and the use of tools to identify harmful content, ensuring that online communities remain welcoming and safe.

It’s important to acknowledge the limitations of our work: our dataset was restricted to selected subreddits and specific keywords, and our language models are mostly trained on English, not Filipino languages. For future research, a more diverse linguistic approach and broader subreddit sampling would provide deeper insights.

Overall, our findings emphasize the ongoing need for inclusive spaces and thoughtful moderation. Healthy online discussions can go a long way toward building a better, more understanding community for everyone.

Conclusion

In summary, our exploration of LGBTQ+ discourse in Filipino Reddit communities reveals a nuanced landscape. Neutral, non-homophobic discussions consistently attract the most engagement, showing that users respond best to respectful and balanced conversations. Conversely, posts marked by homophobia or those with heightened emotional tones tend to receive less interaction, indicating a general community preference against divisive or hostile content.

Despite these positive trends, it is important to acknowledge that instances of homophobia, whether subtle or overt, still persist in these online spaces. This underscores the ongoing need for effective moderation and community guidelines to ensure that discussions remain welcoming, particularly for LGBTQ+ individuals seeking support.

While our study is limited by its focus on a select group of subreddits, a fixed set of search keywords, and language models trained mostly on English, it provides a strong foundation for future research. Expanding the scope to consider Filipino languages and additional communities could yield even deeper insights.

Ultimately, this research highlights that online spaces, when guided by thoughtfulness and inclusivity, can foster healthy discussions and community support. By building on these findings, Filipino online communities can continue to work toward greater understanding and acceptance for all members.

MEET THE TEAM

TJ Noval 🐣

Hello, I am Tj!

You can call me Tristan or Tj for short. I am a sophomore Computer Science student at the University of the Philippines Diliman.

I’m especially passionate about cybersecurity and can’t wait to dive deeper into this dynamic field—helping make the digital world a safer place for everyone.

Jarl Ricaforte 🫧

Hi, I’m Jarl! 👋

I’m Jarelle Ricaforte, a 2nd-year BS Computer Science student at the University of the Philippines Diliman, still exploring the world of tech and loving every bit of it.

I got into programming because I found it super cool and interesting—like solving puzzles with endless possibilities. Right now, I’m really interested in web development, even though I still have a lot to learn. I don’t have much experience yet, but I’m excited to keep learning how to build useful, interactive websites from the ground up!

Gabby Sacramento 🧸

Hey, I'm Gabby! 😊

I’m a Computer Science sophomore at UP Diliman, currently exploring UI/UX design and web development. I’m still early in the process and learning as I go, but I’m excited about where it’s heading! I’ve always been passionate about graphic design and was really into STEM back in high school, so I’m looking forward to finding creative ways to blend both worlds. Right now, I’m loving the challenge of turning ideas into real, interactive interfaces—even if I’m still figuring things out along the way.

Outside of tech, I’m into all things 2000s—movies, TV shows, music, and games. I’m also a huge cat person. 😸
