This repository contains the project for my thesis as well as the thesis itself. Information about this project beyond this README, for example about the project structure, can be found in the thesis itself. Technical documentation is done in-code.
The following steps need to be taken to launch the dashboard.
The following dependencies need to be installed and, if applicable, added to PATH or otherwise set up as described in their respective documentation.
- Python 3.6
- Python dependencies listed in `requirements.txt`
- Bower
- Bower dependencies listed in `bower.json`
- Docker 1.13.1
- Apache Spark 2.2.0
- Apache Kafka (go through the integration guide here)
It is strongly advised to use a `virtualenv` for the Python dependencies.
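For example, the environment could be set up along these lines (a sketch; it assumes `python3` and `bower` are on PATH and that the commands are run from the repository root):

```shell
python3 -m venv venv              # or: virtualenv venv
. venv/bin/activate
pip install -r requirements.txt   # Python dependencies
bower install                     # front-end dependencies from bower.json
```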
To enable the dashboard to connect to Twitter, create an app in Twitter Application Management, download the access information, and place it in the root directory as `twitter.access.json`.
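The exact structure of `twitter.access.json` is determined by what the dashboard code reads; as a hypothetical sketch, a file holding the four standard Twitter API credentials might look like this (all field names and values are placeholders, not confirmed by this README):

```json
{
  "consumer_key": "YOUR_CONSUMER_KEY",
  "consumer_secret": "YOUR_CONSUMER_SECRET",
  "access_token": "YOUR_ACCESS_TOKEN",
  "access_token_secret": "YOUR_ACCESS_TOKEN_SECRET"
}
```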
- Make sure Docker is running, then start Kafka by running `docker-compose up`.
- Set the environment variables:
  - `SPARK_HOME="/path/to/spark/"`
  - `PYSPARK_PYTHON=python3`
  - `PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell"`
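In a POSIX shell, these variables can be exported like so (adjust `SPARK_HOME` to wherever Spark 2.2.0 is installed; the path shown is a placeholder):

```shell
export SPARK_HOME="/path/to/spark/"   # placeholder -- point this at your Spark install
export PYSPARK_PYTHON=python3
export PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 pyspark-shell"
```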
- Run `python3 run.py` from `src/visualization/dashboard`.
If you see `NoBrokersAvailable`, PySpark cannot reach the Kafka message broker. Make sure `docker-compose up` ran without errors.
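One quick way to check whether the broker is listening is to probe its TCP port. The sketch below assumes the broker is exposed on `localhost:9092` (a common default; this README does not specify the port, so adjust it to match `docker-compose.yml`):

```shell
# Probe the Kafka broker port; host and port are assumptions, not confirmed by this README.
KAFKA_HOST=localhost
KAFKA_PORT=9092
if (exec 3<>"/dev/tcp/$KAFKA_HOST/$KAFKA_PORT") 2>/dev/null; then
    STATUS=up
else
    STATUS=down
fi
echo "Kafka broker at $KAFKA_HOST:$KAFKA_PORT is $STATUS"
```

Note that `/dev/tcp` is a bash feature; with other shells, use a tool such as `nc` instead.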
Set `PYSPARK_DRIVER_PYTHON=jupyter` and `PYSPARK_DRIVER_PYTHON_OPTS="notebook"` to run `pyspark` in notebook mode. Spark can now be used in Jupyter notebooks.
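The notebook configuration above amounts to the following (a sketch; it assumes Jupyter is installed in the same environment as PySpark):

```shell
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
# then launch `pyspark`, which opens a Jupyter notebook server instead of a REPL
```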
Add `--packages org.mongodb.spark:mongo-spark-connector_2.10:1.1.0` to `PYSPARK_SUBMIT_ARGS` to use Spark with MongoDB instead of a Kafka message queue. This is useful for development and debugging, since it does not require connecting to the actual Twitter stream.
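With both connectors, the variable could be set like this (Spark accepts a comma-separated list of Maven coordinates under a single `--packages` flag; versions as listed above):

```shell
export PYSPARK_SUBMIT_ARGS="--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2,org.mongodb.spark:mongo-spark-connector_2.10:1.1.0 pyspark-shell"
```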
All models are trained in notebooks under `/notebooks`.