First release of the library

Moodoo library: Indoor Positioning Analytics for Characterising Classroom Teaching.
We are planning to release the Moodoo library by the end of July, 2020.
If you are interested in this project, please read the paper.
# Please cite as:
Martinez-Maldonado, R., Echeverria, V., Schulte, J., Shibani, A., Mangaroska, K. and Buckingham Shum, S. (2020).
Moodoo: Indoor Positioning Analytics for Characterising Classroom Teaching.
International Conference on Artificial Intelligence in Education, AIED 2020, 360-373.
http://martinezmaldonado.net/files/AIED20-MoodooLocalisationLibrary.pdf
Moodoo: Indoor Positioning Analytics for Characterising Classroom Teaching
https://www.springerprofessional.de/en/moodoo-indoor-positioning-analytics-for-characterising-classroom/18146278
# How to install
If you want to get a preliminary version of the code or want to collaborate, please contact the development team at: Roberto.MartinezMaldonado@monash.edu
### Project dependencies
1. python 3.8
2. pip3
How to install Python 3.8
- On macOS: `brew install python@3.8`
Create a virtual environment
- Use PyCharm to create a virtual environment
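- Alternatively, a virtual environment can be created with the standard library (a sketch assuming `python3.8` is on your PATH): `python3.8 -m venv venv`, then `source venv/bin/activate`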
Install pip
- `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`
- `python get-pip.py`
## Install dependencies
The `pycairo` library needs `pkg-config` and `cairo` installed.
### For macOS
Install `pkg-config`
- `brew install pkg-config`
Install `Cairo`
- `brew install --cc=clang cairo`
Install pip requirements
- `pip3 install -r requirements.txt`
### How to run the scripts
To run all the files using the test dataset, run the script test\demoMAIN.py:
`python demoMAIN.py --all`
(NOTE: It can take some time to complete the analysis)
To test the functions in each script and generate intermediate output files,
run the demo files (demo1 to demo5) in the following order:
test\demo1_preprocessing.py
test\demo2_stopsAndTransitions.py
test\demo3_fixedPoints.py
test\demo4_entropy.py
test\demo5_generateMetrics.py
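For example, to run only the first step and inspect its intermediate output (a sketch assuming the demo scripts take no extra arguments and are run from the test folder, as with demoMAIN.py):
`python demo1_preprocessing.py`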
The file info.ini contains important parameters that are used by the scripts.
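The scripts read these values with Python's built-in configparser from the `[parameters]` section. A minimal sketch of how a parameter is retrieved (the path is relative to your working directory; the library scripts themselves read `../info.ini`):

```python
import configparser

# Load the [parameters] section of info.ini (same mechanism the library scripts use).
config = configparser.ConfigParser()
config.read("info.ini")

# Values are stored as strings and cast on read, e.g. the stop-to-fixed-point threshold.
distance_tracker_fixed_point = float(config.get("parameters", "distance_tracker_fixed_point"))
room_x = float(config.get("parameters", "room_x"))
print(distance_tracker_fixed_point, room_x)
```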
To analyse your own data, example files are provided in the folder test\Merged dataset 2018-2019\.
Please format your data using these samples as a reference.
This folder contains the following csv files:
demo_dataset_2019.csv - contains the indoor positioning datapoints.
The POZYX positioning system was used to generate this file, but x and y positions from similar
systems can be formatted according to this example. Refer to the script _preprocessing.py for a description of the columns of this file.
demo_fixed_points_2019.csv - contains datapoints of fixed objects. These can be of type 'student'
(used, for example, to refer to tables where students sit, since tables do not move) or 'zone'
(used, for example, to mark particular classroom areas or objects such as whiteboards, benches, the
teacher's computer, etc.).
demo_phases_2019.csv - contains information about the phases of each session in the dataset. Phases are
used to trim the dataset to consider only datapoints within phases and to generate metrics per phase.
If your dataset has no phases, create at least one phase per session indicating the
beginning and the end of that session (e.g. the first and last datapoint of each session).
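As a quick check, the sample files can be loaded with pandas to inspect the expected columns before formatting your own data (a minimal sketch; paths are written with forward slashes and assume the repository root as the working directory):

```python
import pandas as pd

folder = "test/Merged dataset 2018-2019/"
positions = pd.read_csv(folder + "demo_dataset_2019.csv")           # indoor positioning datapoints
fixed_points = pd.read_csv(folder + "demo_fixed_points_2019.csv")   # 'student'/'zone' fixed objects
phases = pd.read_csv(folder + "demo_phases_2019.csv")               # phases per session

# Print the column names and a few rows of each file to use as a formatting reference.
for name, df in [("positions", positions), ("fixed points", fixed_points), ("phases", phases)]:
    print(name, list(df.columns))
    print(df.head(3), "\n")
```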
AUTHORS:
Roberto Martinez-Maldonado, Vanessa Echeverria, Jurgen Schulte, Antonette Shibani, Katerina Mangaroska and Simon Buckingham Shum
[parameters]
#DIMENSIONS OF THE ROOM
# size of the floor plan in millimeters -
# for example: 16,810 mm x 9,860 mm = 16.81 m x 9.86 m
room_x = 16810
room_y = 9860
# Position of the coordinate 0,0 in the floor plan (used to plot charts)
#Horizontal axis: Right or left?
HorizonalZero= right
#Vertical axis: Up or down?
VerticalZero= down
#PARAMETERS RELATED TO STOPS AND TRANSITIONS
# maximum distance to group positioning datapoints into a "STOP" (e.g. 1000 millimeters)
distance = 1000
# minimum time to group consecutive points into a "STOP" (e.g. 10 seconds = 00:00:10)
duration = 00:00:10
# maximum distance to consider a stop near a fixed point as a potential instance of interaction,
# for example, a teacher attending to a group of students (e.g. 1500 millimeters).
distance_tracker_fixed_point = 2000
#PARAMETERS RELATED TO ROTATION
# a string value that indicates which rotation variable will be used to calculate the rotation of
# the sensor on the floor plan. It can ONLY take the values: 'yaw', 'roll' or 'pitch'.
# If your dataset does not contain rotation data, you can add dummy values or omit running the function
# _preprocessing.add_rotation()
target_column = yaw
# the rotation in radians facing north (UPPER direction in the floor plan)
north=3.21
#PARAMETERS RELATED TO ENTROPY
#size of the grid cells used to calculate entropy (in millimeters)
size_of_grid_cells = 1000
#OUTPUT
#number of quantiles for analysing subsets -of equal duration- of datapoints in each phase
#for example, set to 4 for dividing the data into quartiles
numberOfQuantiles=3
#weighting the results: if 1, then all the final outputs are weighted based on the duration of each phase
#with reference to the shortest session. This is useful for reporting normalised metrics.
weighted=1
#number of phases (DOUBLE CHECK WHY IS THIS NEEDED AND CANNOT BE OBTAINED FROM THE PHASES DATASET)
#phases=3
appnope==0.1.0
backcall==0.1.0
cycler==0.10.0
decorator==4.4.2
ipython==7.13.0
ipython-genutils==0.2.0
jedi==0.16.0
kiwisolver==1.2.0
matplotlib==3.2.1
numpy==1.18.2
pandas==1.0.3
parso==0.6.2
pexpect==4.8.0
pickleshare==0.7.5
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycairo==1.19.1
Pygments==2.6.1
pyparsing==2.4.7
python-dateutil==2.8.1
python-igraph==0.8.0
pytz==2019.3
scipy==1.4.1
seaborn==0.10.0
six==1.14.0
texttable==1.6.2
traitlets==4.3.3
wcwidth==0.1.9
xlrd==1.2.0
"""Scripts to generate metrics related to "fixed" objects in a classroom
This script allows the user to
1) calculate distances between moving trackers and fixed coordinates (zones/objects or students) in the classroom.
2) calculate the index of dispersion between the moving trackers and fixed points tagged as "students".
This script requires that `pandas` be installed within the Python
environment you are running this script in.
This file can also be imported as a module and contains the following
functions:
* generate_fixed_points_stats (main) - creates a data frame (time each tracker was close to
'student' or 'zone' fixed points) that can be used to calculate the Gini index and to generate
general stats relative to fixed points
* calculate_gini_by_tracker (main) - processes the data frame returned by the function
generate_fixed_points_stats and calculates the index of dispersion by tracker
* calculate_gini_trackers_together (main) - processes the data frame returned by the function
generate_fixed_points_stats and calculates the index for all trackers together
* gini (auxiliary) - calculates the Gini index of a series (numpy array)
* get_closer_fixedpoint_stop (auxiliary) - identifies which fixed point is the closest to a stop
"""
import configparser
import numpy as np
import pandas as pd
import math
import datetime
from dateutil import parser
import time
import _util as util
#load parameters
config = configparser.ConfigParser()
config.read('../info.ini')
def generate_fixed_points_stats(df_stops_transitions,df_fixed_points):
"""This function creates a data frame with the time each tracker was close to a fixed point
This can be used to calculate the gini index if only student fixed points are selected.
This can also serve to generate metrics about fixed points in the classroom.
This function reads the following parameters from the configuration file:
distance_tracker_fixed_point
Parameters
----------
df_stops_transitions : Pandas Data Frame
The output from _stopsAndTransitions.get_stops_and_transitions() function
This is: a data frame of stops and transitions
df_fixed_points : Pandas Data Frame
Containing the coordinates of fixed objects in the classroom for each particular session.
It must contain the following columns:
session (identifier)
tag (string) name of the fixed object or position
x,y (coordinates)
It can contain the following columns (not yet used in the calculations)
time_start (datetime as "%Y-%m-%d_%H:%M:%S")
obj_type (string) type of fixed object or position (e.g. "zone")
Returns
-------
df_fixed_points_stats
returns a data frame with one row per session, tracker, phase and fixed point (including
fixed points that were never visited by the tracker), with the following columns:
session (identifier)
tracker (identifier)
phase (int)
tag (string) name of the fixed object or position
sum (float) total time, in seconds, of the stops close to that fixed point
count (float) number of stops close to that fixed point
obj_type (string) "student" or "zone"
"""
print ("Generating fixed-points related stats...")
#Select only stops from the df_stops_transitions dataframe
df1_fix = df_stops_transitions[['tracker', 'session','block','x','y','x_stdev','y_stdev','max_duration','type','timestamp']] # select columns
df1_fix = df1_fix.rename({'x': 'x_mean', 'y': 'y_mean'}, axis='columns')
df1_fix = df1_fix[df1_fix['type'] == 'stop']
#Merge datasets to obtain all the potential combinations between stops and fixed objects
merge = pd.merge(df1_fix, df_fixed_points, on='session')
#Calculate Euclidean distances
merge['dist_student'] = np.sqrt((merge['x_mean'] - merge['x']) ** 2 + (merge['y_mean'] - merge['y']) ** 2)
#Simplifying output data frame
merge = merge[['block', 'tracker', 'session', 'tag', 'dist_student', 'timestamp']]
obj_types = []
for index, row_df in merge.iterrows():
data=df_fixed_points.loc[(df_fixed_points['session'] ==row_df['session']) & (df_fixed_points['tag'] == row_df['tag'])]
obj_types.append(data['obj_type'].values[0])
merge['obj_type'] = obj_types
df_dist=merge
#df_dist includes distances between each "stop" (associated with each tracker)
# and each fixed object (coordinate). It contains the following columns:
# block - (int) the unique identifier of the stop
# session (identifier)
# tracker (identifier)
# tag (string) name of the fixed object or position
# dist_student (distance between the stop and the object in millimeters)
# timestamp (datetime as "%Y-%m-%d_%H:%M:%S")
# obj_type (string) "student" or "zone"
df_stops_transitions.sort_values(by=['session'], inplace=True)
# Create structure with stops only
stops = df_stops_transitions.loc[(df_stops_transitions['type'] == 'stop')][['block','session','tracker','timestamp','phase','quantile','max_duration_sec','x','y','type']]
stops.set_index("block", inplace = True)
stops.sort_values(by=['session'], inplace=True)
# identify closest fixed point to each stop
df_min_dis=get_closer_fixedpoint_stop(df_dist,stops)
# Remove distances over the parameter distance_tracker_fixed_point
distance_tracker_fixed_point= float(config.get('parameters','distance_tracker_fixed_point'))
df_min_dis = df_min_dis.loc[(df_min_dis['dist_student'] <= distance_tracker_fixed_point)]
# Calculate total time dedicated to each group of students
summary=df_min_dis.groupby(['session','tracker','phase','tag'])['max_duration_sec'].agg(['sum','count'])
summary.reset_index(inplace=True)
# Select only "student" points from the list of ALL fixed ppints
#df_fixed_points = df_fixed_points.loc[(df_fixed_points['obj_type'] == 'student')]
##Create structure that will hold the SUMMARY data frame plus the fixed points that were never visited.
col_names = ['session', 'tracker', 'phase', 'tag', 'sum', 'count', 'obj_type']
df_fixed_points_stats = pd.DataFrame(columns = col_names)
for index_fix, row_fixed in df_fixed_points.iterrows():
phases= summary.loc[(summary['session'] == row_fixed['session'])]['phase'].drop_duplicates()
trackers= summary.loc[(summary['session'] == row_fixed['session'])]['tracker'].drop_duplicates()
for i1, ph in phases.items():
for i2, tr in trackers.items():
df_fixed_points_stats=df_fixed_points_stats.append({'session':row_fixed['session'],'tracker':tr,'phase':ph,'tag':row_fixed['tag'],'sum':0.0,'count':0.0,'obj_type':row_fixed['obj_type']}, ignore_index=True)
for index_sum, r_sum in summary.iterrows():
for index_g, r_gini in df_fixed_points_stats.iterrows():
if (r_sum['session']==r_gini['session'] and r_sum['tracker']==r_gini['tracker'] and
r_sum['phase']==r_gini['phase'] and
r_sum['tag']==r_gini['tag']):
df_fixed_points_stats.loc[index_g, 'sum'] = r_sum['sum']
df_fixed_points_stats.loc[index_g, 'count'] = r_sum['count']
print ("Fixed points stas generation COMPLETED")
return (df_fixed_points_stats)
def calculate_gini_by_tracker(df_fixed_points_stats):
"""This function processes the data frame returned by the function
generate_fixed_points_stats and calculates the index of dispersion by tracker
Parameters
----------
df_fixed_points_stats : Pandas Data Frame
The output from generate_fixed_points_stats(), with the following columns:
session (identifier)
tracker (identifier)
phase (int)
tag (string) name of the fixed object or position
sum (float) total time, in seconds, of the stops close to the fixed point
count (float) number of stops close to the fixed point
obj_type (string) "student" or "zone"
Returns
-------
gini_output_separate_trackers
returns a data frame with the following columns
session (identifier)
tracker (identifier)
phase (int)
gini (float) the final result
"""
print ("Calculating gini index by tracker")
# Select only stops closer to a student
df_gini = df_fixed_points_stats.loc[(df_fixed_points_stats['obj_type'] == 'student')]
#CALCULATE GINI INDEX by session, tracker and phase
gini_output_separate_trackers=df_gini.groupby(['session','tracker','phase'])['count'].agg([gini])
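# (For reference) the result has a MultiIndex of (session, tracker, phase) and a single
# column named 'gini' holding the coefficient for each group.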
print ("Gini index by tracker COMPLETED")
return (gini_output_separate_trackers)
def calculate_gini_trackers_together(df_fixed_points_stats):
"""This function processes the data frame returned by the function generate_fixed_points_stats
grouped by session and phase (all trackers together)
Parameters
----------
df_fixed_points_stats : Pandas Data Frame
The output from generate_fixed_points_stats(), with the following columns:
session (identifier)
tracker (identifier)
phase (int)
tag (string) name of the fixed object or position
sum (float) total time, in seconds, of the stops close to the fixed point
count (float) number of stops close to the fixed point
obj_type (string) "student" or "zone"
Returns
-------
gini_output_joint_trackers
returns a data frame with the following columns
session (identifier)
phase (int)
gini (float) the final result
"""
print ("Calculating gini index all tracker together")
# Select only stops closer to a student
df_gini = df_fixed_points_stats.loc[(df_fixed_points_stats['obj_type'] == 'student')]
#CALCULATE GINI INDEX by session and phase (all trackers together)
gini_output_joint_trackers=df_gini.groupby(['session','phase'])['count'].agg([gini])
print ("Gini index for all trackers COMPLETED")
return (gini_output_joint_trackers)
def gini(array):
"""This function calculates the Gini coefficient of a SERIES ####numpy array.
Parameters
----------
array : series (float)
Returns
-------
gini coefficient (float)
"""
# Convert the input (typically a pandas Series from groupby().agg) to a float numpy array.
array = np.asarray(array, dtype=float)
# based on bottom eq:
# http://www.statsdirect.com/help/generatedimages/equations/equation154.svg
# from:
# http://www.statsdirect.com/help/default.htm#nonparametric_methods/gini.htm
# All values are treated equally, arrays must be 1d:
array = array.flatten()
if np.amin(array) < 0:
# Values cannot be negative:
array -= np.amin(array)
# Values cannot be 0:
array += 0.0000001
# Values must be sorted:
array = np.sort(array)
# Index per array element:
index = np.arange(1,array.shape[0]+1)
# Number of array elements:
n = array.shape[0]
# Gini coefficient:
return ((np.sum((2 * index - n - 1) * array)) / (n * np.sum(array)))
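# Example (for reference): an even spread of values gives a coefficient of 0, while full
# concentration on a single element tends towards (n-1)/n, e.g.
#   gini(pd.Series([5.0, 5.0, 5.0, 5.0]))   -> 0.0
#   gini(pd.Series([0.0, 0.0, 0.0, 20.0]))  -> ~0.75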
def get_closer_fixedpoint_stop(df_dist,df_stops):
"""This function merges dataframes of stops and distances between stops and fixed points.
It returns a list of stops with the closest fixed point to it and the distance to it.
Parameters
----------
df_dist : Pandas Data Frame
The data frame of distances built inside generate_fixed_points_stats() in this script.
It includes distances between each "stop" (associated with each tracker)
and each fixed object (coordinate). It contains the following columns:
block - (int) the unique identifier of the stop
session (identifier)
tracker (identifier)
tag (string) name of the fixed object or position
dist_student (distance between the stop and the object in millimeters)
timestamp (datetime as "%Y-%m-%d_%H:%M:%S")
obj_type (string) "student" or "zone"
df_stops : Pandas Data Frame
This is: a data frame of stops and transitions
Returns
-------
merge
returns a data frame with the following selected columns from the merge
block - (int) the unique identifier of the stop
session (identifier)
tracker (identifier)
x and y (coordinates)
tag (string) name of the fixed object or position
dist_student (distance between the stop and the object in millimeters)
timestamp (datetime as "%Y-%m-%d_%H:%M:%S")
max_duration_sec (float) duration of the stop in seconds
phase (int)
quantile (int)
obj_type (string) "student" or "zone"
type (string) "stop" in all cases
"""
### Select row with minimum distance to a fixed object grouped by block, tracker, session
df_min_dis = df_dist[df_dist['dist_student'].isin(df_dist.groupby(['block', 'tracker', 'session']).min()['dist_student'].values)].copy()  # group by a list of keys; .copy() avoids SettingWithCopyWarning on the sort below
# sort by session in ascending order
df_min_dis.sort_values(by=['session'], inplace=True)
#Merge both dataframes to identify what fixed point is the closest to a stop
merge = pd.merge(df_min_dis, df_stops, on=['tracker','session','block'])
merge = merge[['block','session','tracker','tag','dist_student','timestamp_x','obj_type','phase','quantile','max_duration_sec','x','y','type']]
merge = merge.rename({'timestamp_x': 'timestamp'}, axis='columns')
return (merge)
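# Example usage as a module (a sketch; df_stops_transitions is the output of
# _stopsAndTransitions.get_stops_and_transitions() and df_fixed_points is loaded
# from the fixed-points CSV, as described in the docstrings above):
#
#   df_stats = generate_fixed_points_stats(df_stops_transitions, df_fixed_points)
#   gini_per_tracker = calculate_gini_by_tracker(df_stats)
#   gini_all_trackers = calculate_gini_trackers_together(df_stats)
#
# See test\demo3_fixedPoints.py for a runnable demo on the sample dataset.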
"""Scripts to generate metrics related to entropy
This script allows the user to
1) calculate entropy on the pre-processed positioning dataset based on the dimensions of the room
This script requires that `pandas` be installed within the Python
environment you are running this script in.
This file can also be imported as a module and contains the following
functions:
* calculate_entropy_session_tracker_phase (main function) - ENTROPY grouped by session, tracker, phase
* calculate_entropy_session_tracker - ENTROPY grouped by session, tracker (created in case this is needed)
* get_entropy (auxiliary) - calculates entropy for a given DataFrame with a grid of proportions in column "grid".
* plot_charts_per_tracker - This function generates Voronoi, ConvexHull and Delaunay charts in the folder "output_figures"
per tracker.
"""
import configparser
import numpy as np
import pandas as pd
from scipy.stats import entropy
from scipy.spatial import Voronoi, voronoi_plot_2d, ConvexHull , convex_hull_plot_2d, Delaunay, delaunay_plot_2d
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.cm as cm
import csv
import math
import time
import _util as util
from pathlib import Path
#load parameters
config = configparser.ConfigParser()
config.read('../info.ini')
def calculate_entropy_session_tracker_phase(df_dist):
"""This function generates a grid for each session, tracker and phase to calculate the entropy of that tracker
in each "phase".
This function reads the following parameters from the configuration file:
room_x
room_y
size_of_grid_cells
Parameters
----------
df_dist : Pandas Data Frame
A localisation DataFrame with at least the following columns:
timestamp (datetime as "%Y-%m-%d_%H:%M:%S")
session (identifier)
tracker (identifier)
x and y (coordinates)
phase (int)
quantile (int) Set to 1 if not interested in using this column
Returns
-------
pairs_session_tracker
returns a data frame with the following columns
session (identifier)
tracker (identifier)
count (int) number of datapoints considered
grid - the m by n matrix that contains the proportion of data points in each cell;
the matrix is created based on the dimensions of the room and a cell size set in the configuration file.
entropy - unidimensional entropy calculated on the values of the grid
"""
print ("Calculating entropy.")
# Read room dimensions and grid size from config file
room_x= float(config.get('parameters','room_x'))
room_y= float(config.get('parameters','room_y'))
size_of_grid_cells= float(config.get('parameters','size_of_grid_cells'))
# Calculate number of columns and rows
n_gridsquares = int(round(room_x/size_of_grid_cells,0))
m_gridsquares = int(round(room_y/size_of_grid_cells,0))
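# For example, with the sample configuration (room_x = 16810 mm, room_y = 9860 mm,
# size_of_grid_cells = 1000 mm) this yields a 17 x 10 grid of cells.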
distinct_phase_quartile=df_dist.groupby(['session','tracker','phase']).size().reset_index().rename(columns={0:'count'})
# Create a proportional grid for each session, tracker and phase
grids = []
for index, row_pair in distinct_phase_quartile.iterrows():
##CREATE GRID WITH ZEROS
new_grid = [0] * m_gridsquares
for i in range(m_gridsquares):
new_grid[i] = [0] * n_gridsquares
#print ('Grid created')
##FILL GRID FROM THE DATASET- SUM of DATAPOINTS
for index, row_df in df_dist.iterrows():
x_var=int(math.floor(row_df['x']/size_of_grid_cells))
y_var=int(math.floor(row_df['y']/size_of_grid_cells))
if (x_var<n_gridsquares and y_var<m_gridsquares
and row_df['tracker']==row_pair['tracker']
and row_df['session']==row_pair['session']
and row_df['phase']==row_pair['phase']
#and row_df['quartile']==row_pair['quartile']
):
(new_grid[y_var])[x_var] = (new_grid[y_var])[x_var] + 1
#print ('Datapoints')
#print (pd.DataFrame(new_grid))
## CALCULATE the proportion of data points for the given period
#print ('...calculating proportions')
y_index=0
x_index=0
for row in new_grid:
for column in row:
#print ((new_grid[x_index])[y_index])
(new_grid[x_index])[y_index] = ( (new_grid[x_index])[y_index] * 100 / row_pair['count'])
#print ((new_grid[x_index])[y_index])
y_index+=1
x_index+=1
y_index=0
#print ('Proportions')
#print (pd.DataFrame(new_grid))
grids.append(new_grid)
##Append grids list to pairs-session_tracker structure
distinct_phase_quartile['grid'] = grids
#Calculate entropies
distinct_phase_quartile=get_entropy(distinct_phase_quartile)
print ("Entropy calculation per phase COMPLETED")
return (distinct_phase_quartile)
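# Example (for reference): the docstring above describes "entropy" as the unidimensional
# entropy of the grid values; with scipy.stats.entropy (imported above), an even spread
# over cells gives the maximum value and a single occupied cell gives 0, e.g.
#   entropy([0.25, 0.25, 0.25, 0.25])  -> ln(4) ~ 1.386
#   entropy([1.0, 0.0, 0.0, 0.0])      -> 0.0
# (This assumes get_entropy applies scipy.stats.entropy to the flattened grid, as the
# import and docstring suggest.)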