Report

Warning

This report is a work in progress and is subject to change throughout the development of the group project. If you find any errors or inaccuracies we have not addressed, please correct and report them accordingly.

This report provides an overview of every section of the group project, titled "Theft Over Open Data - Analysis and Visualization". It covers the following sections, as defined in the evaluation criteria for the group project:

Data Exploration
Data Modelling
Predictive Model Building
Model Scoring and Evaluation
Model Deployment

The report is structured as follows:

Introduction & Summary - This section will provide an overview of the project, including the purpose, goals, and objectives of the project.
Data Exploration, Data Modelling Process - This section will provide a detailed analysis of the data, including but not limited to the following:
- Data sources used and exploration of the data.
- Data cleaning and preprocessing.
- Data transformation and feature engineering techniques.
- Data modeling and feature selection.
Predictive Model Building, Model Training and Evaluation, Model Deployment - This section will provide a detailed analysis of the predictive model, including but not limited to the following:
- Model selection and training.
- Model evaluation and scoring.
- Model deployment and deployment considerations.
Bibliography - This section will provide a list of references used in the report.
Appendix - This section will provide additional information and insights not covered in the report.
Conclusion - This section will provide a summary of the report and the findings.

Summary

This project, titled "Theft Over Open Data - Analysis and Visualization", aims to explore theft-related incidents using publicly available datasets. The primary goals are:

Perform comprehensive data exploration with provided visualizations.
Build predictive models that identify key patterns and trends in theft incidents.
Provide actionable insights and recommendations for improving theft prevention and response.
Develop a model that can be deployed with a frontend and backend.

Executive Summary (Brief)

In this project, we used the Toronto Theft Over dataset from the Toronto Police Service Public Safety Data Portal to analyze theft trends in Toronto from 2014 to 2023. The dataset records when and where each offence occurred, through features such as OCC_YEAR, OCC_MONTH, OCC_DAY, and various others. Our goal was to build a predictive model that predicts what type of crime would occur, given certain information passed to the model as input.

Columnar Analysis and Visualization

Column Name	Data Type	Column Description	Missing Values
OBJECTID	Numeric	Unique identifier for each record	0
EVENT_UNIQUE_ID	Object	Unique identifier for each theft event	0
REPORT_DATE	Object	Date and time when the theft was reported	0
OCC_DATE	Object	Date and time when the theft occurred	0
REPORT_YEAR	Numeric	Year when the theft was reported	0
REPORT_MONTH	Object	Month when the theft was reported	0
REPORT_DAY	Numeric	Day of the month when theft was reported	0
REPORT_DOY	Numeric	Day of the year when theft was reported	0
REPORT_DOW	Object	Day of the week when theft was reported	0
REPORT_HOUR	Numeric	Hour when the theft was reported	0
OCC_YEAR	Numeric	Year when the theft occurred	4
OCC_MONTH	Object	Month when the theft occurred	4
OCC_DAY	Numeric	Day of the month when theft occurred	4
OCC_DOY	Numeric	Day of the year when theft occurred	4
OCC_DOW	Object	Day of the week when theft occurred	4
OCC_HOUR	Numeric	Hour when the theft occurred	0
DIVISION	Object	Police division that was notified of the theft	0
LOCATION_TYPE	Object	Type of location where theft occurred	0
PREMISES_TYPE	Object	Type of premises where theft occurred	0
UCR_CODE	Numeric	Uniform Crime Reporting code	0
UCR_EXT	Numeric	UCR extension code	0
OFFENCE	Object	Description of the theft offense (categorized)	0
MCI_CATEGORY	Object	Major Crime Indicator category	0
HOOD_158	Object	Hood 158 identifier (newer)	0
NEIGHBOURHOOD_158	Object	Neighbourhood name (158 areas)	0
HOOD_140	Object	Hood 140 identifier (older)	0
NEIGHBOURHOOD_140	Object	Neighbourhood name (140 areas)	0
LONG_WGS84	Numeric	Longitude coordinate (WGS84)	0
LAT_WGS84	Numeric	Latitude coordinate (WGS84)	0
x	Numeric	X coordinate	0
y	Numeric	Y coordinate	0
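
A summary like the table above can be derived directly from a DataFrame. The sketch below is a minimal illustration that uses a small made-up frame in place of the real dataset; the sample values and the helper name `summarize_columns` are assumptions for illustration, not taken from the project notebook.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value count."""
    return pd.DataFrame(
        {
            "Data Type": df.dtypes.astype(str),
            "Missing Values": df.isna().sum(),
        }
    )


# Tiny stand-in for the real data (values are invented for illustration).
sample = pd.DataFrame(
    {
        "OBJECTID": [1, 2, 3],
        "OCC_YEAR": [2014.0, None, 2023.0],
        "OCC_MONTH": ["January", None, "March"],
        "PREMISES_TYPE": ["House", "Commercial", "Outside"],
    }
)

summary = summarize_columns(sample)
print(summary)
```

Running the same helper on the full dataset would be one way to confirm the missing-value counts listed for the OCC_* columns.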

Types of Visualizations

Here are some of the visualizations you might see throughout this project. Note that these are most commonly used in the data exploration stage of the project (which is stored in an .ipynb file):

Bar Plots/Charts: Illustrating the imbalances in the data, and exploration of relationships between potential features.
Heatmaps: Visualizing the relationship between various features, and their potential distributions / correlations.
Scatter Plots: Rough exploration of everything in the dataset.
and various other visualizations...
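
As an illustration of the bar-plot case, the sketch below counts records per category and draws one bar per category, which makes any imbalance visible at a glance. The sample values are invented and the choice of PREMISES_TYPE as the plotted column is an assumption; the project notebook may plot different columns.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample values; the real column has many more rows and categories.
premises = pd.Series(
    ["Commercial", "House", "Commercial", "Outside", "Commercial", "House"],
    name="PREMISES_TYPE",
)
counts = premises.value_counts()

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(list(counts.index), counts.values)  # one bar per category
ax.set_xlabel("PREMISES_TYPE")
ax.set_ylabel("Number of records")
ax.set_title("Record count per premises type (sample data)")
fig.tight_layout()
```

In a notebook the `matplotlib.use("Agg")` line can be dropped, since the figure renders inline.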

Models and Model Evaluation

We trained several models on the data. Because the classes are imbalanced, we balanced the dataset with SMOTE (Synthetic Minority Over-sampling Technique) and with SMOTENC (SMOTE for Nominal and Continuous features), which also handles categorical columns.
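
To show the idea behind SMOTE, the sketch below generates synthetic minority rows by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified NumPy illustration, not the implementation used in the project; the function name `smote_like_samples`, its parameters, and the sample points are all hypothetical.

```python
import numpy as np


def smote_like_samples(
    minority: np.ndarray, n_new: int, k: int = 2, seed: int = 0
) -> np.ndarray:
    """Create n_new synthetic rows by interpolating between minority neighbours.

    Assumes numeric features and no duplicate rows in `minority`.
    """
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # Distances from the chosen point to every minority point.
        dists = np.linalg.norm(minority - minority[i], axis=1)
        # Skip the first sorted index, which is the point itself (distance 0).
        neighbours = np.argsort(dists)[1 : k + 1]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)


# Invented minority-class points with two numeric features.
minority = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]])
new_rows = smote_like_samples(minority, n_new=6)
print(new_rows)
```

Interpolation only makes sense for numeric features. SMOTENC differs in that categorical features of a synthetic row are set to the most frequent category among the nearest neighbours instead of being interpolated.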

For our models we used the following:

Logistic Regression (SMOTE, SMOTENC)
DecisionTreeClassifier (SMOTE, SMOTENC)
RandomForestClassifier (SMOTE, SMOTENC)
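
The sketch below shows one way the three model types could be trained and compared with scikit-learn. It runs on synthetic, imbalanced stand-in data, not the theft dataset, and the hyperparameters and the macro F1 metric are assumptions chosen for illustration; the actual settings and results are in the project notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (roughly 90% / 10%); not the theft dataset.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Macro F1 weights each class equally, which suits imbalanced targets.
    scores[name] = f1_score(y_test, model.predict(X_test), average="macro")

for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```

When resampling is added, it is generally applied to the training split only, so that the test split still reflects the original class distribution.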

For the results and evaluation of these models, refer to the c309_r2_toodu_model.ipynb notebook in the server folder.

Bibliography

Theft Over Open Data - https://data.torontopolice.on.ca/datasets/7530d9b637c340059ccb81a782481c04_0/explore?location=43.696813%2C-79.323493%2C10.36
Pandas - https://pandas.pydata.org/
Seaborn - https://seaborn.pydata.org/
Matplotlib - https://matplotlib.org/
Scikit-learn - https://scikit-learn.org/stable/
Numpy - https://numpy.org/
