All Projects
project

SocialBi

A healthcare intelligence platform that scrapes patient discussions from Reddit, runs them through a multi-model NLP pipeline (Meditron + Qwen + OpenAI), and surfaces structured insights including disease trends, patient psychographics, treatment switch behaviour, and executive summaries for pharmaceutical decision-makers.

Role: Full Stack Developer Type: Internal / B2B SaaS Year: 2024–2025 Status: Internal Deployment
Next.jsPython ML MeditronQwen OpenAIReddit API MongoDB
Executive Summary

High-level overview dashboard for pharma stakeholders. Real-time metrics on posts processed, NLP accuracy, unique conditions detected, and trending topics extracted from patient communities.

socialbi.app / executive-summary
Analytics
📊 Executive Summary
🦠 Disease Insights
👤 Patient Psychographics
💊 Treatment Switch
🧬 Diagnostic Biomarker
💡 Treatment Insights
Config
⚙ User Input
⬛ Export
Executive Summary
Disease: Diabetes NLP: Active ↓ Export PDF
12.4k
Posts Analysed
↑ 840 new
89%
NLP Accuracy
↑ 3% vs last run
247
Unique Conditions
↑ 12 new
Type 2 Diabetes Hypertension Hypothyroidism High Cholesterol PCOS Fatty Liver Insulin Resistance Pre-Diabetes Kidney Disease Neuropathy
Post Volume by Week
Sentiment Distribution
Negative
58%
Neutral
28%
Positive
14%
Analytics Views

Two of the six analytics modules: Disease Insights shows condition frequency from patient posts, and Patient Psychographics reveals demographics, lifestyle patterns, and behavioural signals.

Disease Insights: Condition Frequency Analysis
Disease Insights
Diabetes Dataset
Condition Mention Frequency
Type 2 Diabetes
3,841
Hypertension
3,192
Hypothyroidism
2,498
High Cholesterol
1,964
PCOS
1,512
Neuropathy
982
Treatment Mentions
Metformin
2,140
Insulin
1,750
Ozempic
1,580
Jardiance
998
Berberine
705
Trulicity
462
Patient Psychographics: Demographics & Behaviour
Demographics
Age Group
18–34
38%
35–54
49%
55+
13%
Gender
Female
62%
Male
36%
Other
2%
Behavioural Signals
Diet & Nutrition focus
74%
Self-monitoring (CGM)
61%
Exercise mentions
55%
Doctor distrust
38%
Alternative medicine
31%
Insurance issues
28%
Key Features
🤖
Multi-Model NLP Pipeline
Meditron (clinical), Qwen (general), and OpenAI models work in ensemble to extract structured healthcare insights from unstructured Reddit text.
📡
Reddit Scraping
Automated crawling of disease-specific subreddits via the Reddit API. Configurable per-condition keyword sets.
🦠
Disease Insights
Frequency analysis of conditions, symptoms, and comorbidities across all scraped posts for a given search query.
👤
Patient Psychographics
Inferred demographics, behavioural patterns, lifestyle factors, and emotional signals from patient language.
💊
Treatment Switch Behavior
Tracks which drugs patients switch from/to and the stated reasons, a critical insight for pharma marketing teams.
🧬
Diagnostic Biomarker Analysis
Surfaces biomarkers co-mentioned with diseases, tracking patient awareness of diagnostic tests and lab values.
📋
Executive Summaries
Auto-generated MRI-style summaries of all analytics modules, ready to share with pharma stakeholders as PDFs.
Configurable Input
Users define disease keywords, subreddits to crawl, and time window. Pipeline re-runs with new parameters on demand.
Technical Breakdown
Next.js Frontend
Dashboard views with real-time data polling from the analytics API. Server components for initial data load, client for chart interactions.
Python ML Backend
FastAPI-based ML service running Meditron + Qwen for medical NER, sentiment analysis, entity extraction, and summarisation.
Meditron + Qwen
Open medical LLM (Meditron) for clinical NER; Qwen for general language understanding. Ensemble outputs merged and scored.
OpenAI API
GPT-4 used for executive summary generation and high-confidence entity disambiguation as a final pass.
Reddit API (PRAW)
Python Reddit API Wrapper fetches posts and comments from targeted subreddits. Rate-limited, paginated, deduplicated.
MongoDB
Stores raw posts, NLP annotations, extracted entities, and final structured results. Aggregations feed every chart and table.
Kollens All Projects AlgoMine