Quickstart

This is a notebook explaining the usage of the main features of mobilkit.

The features covered here are:

  • how to create a synthetic dataset from Microsoft's GeoLife dataset;

  • how to create a tessellation shapefile in case you only have a collection of centroids;

  • load data from a pandas dataframe;

  • tessellate the pings (assign them to a given location);

  • compute the land use of an urban area;

  • compute the resident population for each area and compare it with census figures;

  • compute user activity statistics and filter users accordingly;

  • compute the displacement figures in a given area.

To allow the publication of the data we use an open dataset, Microsoft's GeoLife. We augment the number of observed users using some functions from the mobilkit.loader module.

Depending on the case, we map each user/day or each user/week to a synthetic user performing the same events as in the original dataset at the same original time, just translated to the synthetic day or week.
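To illustrate the idea behind the daily expansion, here is a minimal pandas sketch (a hypothetical illustration, not the mobilkit implementation used below), assuming a dataframe with uid and datetime columns like the one loaded later in this notebook:

# Hypothetical sketch of the daily expansion: every (uid, calendar day) couple becomes a
# new synthetic user whose pings keep their original time of day but are moved to `selected_day`.
import pandas as pd

def to_synthetic_day(df, selected_day):
    out = df.copy()
    day = out["datetime"].dt.strftime("%Y-%m-%d")
    out["uid"] = out["uid"].astype(str) + "_" + day                  # one synthetic user per (uid, day)
    time_of_day = out["datetime"] - out["datetime"].dt.normalize()   # keep hour/minute/second
    out["datetime"] = pd.Timestamp(selected_day) + time_of_day       # translate to the synthetic day
    return out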

To continue, you have to download the GeoLife dataset and uncompress it in the data directory.

[1]:
%config Completer.use_jedi = False
%matplotlib inline

import os
import pytz
from datetime import datetime
import geopandas as gpd

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import mobilkit
from dask.distributed import Client
from shapely.geometry import Polygon, Point
from dask import dataframe as dd

import warnings
warnings.filterwarnings('ignore')


Create tessellation from points

We use the file with centroids found here, select some points in Beijing, and create a Voronoi tessellation from them.

[2]:
# Choose the spatial extent of your analysis and where to save the Voronoi tessellation
box = (116.20, 39.74, 116.56, 40.06)
voronoi_file = "../../data/Beijing/voronoi_points_beijing.shp"

df_china = gpd.read_file("../../data/Beijing/DT19new/PopCensus2010_township.shp")
if not os.path.exists(voronoi_file):
    poly_box = mobilkit.spatial.box2poly(box)
    df_china = df_china[df_china.within(poly_box)]
    poly_gdf = gpd.GeoDataFrame(["Region"], geometry=[poly_box], crs=df_china.crs)
    layer = mobilkit.spatial.makeVoronoi(df_china)
    layer.to_file(voronoi_file)
else:
    layer = gpd.read_file(voronoi_file)
    box = layer.unary_union.bounds
    poly_box = mobilkit.spatial.box2poly(box)
    df_china = df_china[df_china.within(poly_box)]
[3]:
ax = layer.plot()
df_china.plot(color="r", ax=ax)
[3]:
<Axes: >
../_images/examples_mobilkit_tutorial_4_1.png

Load/reload the geolife trajectories data

You should manually download the data from here, put them into the data/ folder, and unzip them there.

cd ../data/
wget https://download.microsoft.com/download/F/4/8/F4894AA5-FDBC-481E-9285-D5F8C4C4F039/Geolife%20Trajectories%201.3.zip
unzip Geolife%20Trajectories%201.3.zip
[4]:
# !wget -P ../../data/ https://download.microsoft.com/download/F/4/8/F4894AA5-FDBC-481E-9285-D5F8C4C4F039/Geolife%20Trajectories%201.3.zip
# !unzip -d ../../data/ ../../data/Geolife\ Trajectories\ 1.3.zip
[5]:
geolifePath = "../../data/Geolife Trajectories 1.3"
pkl_trajectories = "../../data/Geolife Trajectories 1.3/processed_traj.pkl"
if not os.path.exists(pkl_trajectories):
    df_geolife = mobilkit.loader.loadGeolifeData(geolifePath)
    df_geolife.to_pickle(pkl_trajectories)
else:
    df_geolife = pd.read_pickle(pkl_trajectories)

df_geolife.head()
[5]:
UTC acc datetime lat lng uid
0 1224730384 1 2008-10-23 10:53:04+08:00 39.984702 116.318417 0
1 1224730390 1 2008-10-23 10:53:10+08:00 39.984683 116.318450 0
2 1224730395 1 2008-10-23 10:53:15+08:00 39.984686 116.318417 0
3 1224730400 1 2008-10-23 10:53:20+08:00 39.984688 116.318385 0
4 1224730405 1 2008-10-23 10:53:25+08:00 39.984655 116.318263 0

Create synthetic days/week from the data

We perform two expansions of the data:

  • a weekly one, where each (user, week) couple is treated as a separate user and all the events are moved to a synthetic week, keeping the original weekday, hour, minute and second of the recorded point;

  • a daily one, where each (user, day) couple is treated as a separate user and all the events are moved to a synthetic day, keeping the original hour, minute and second of the recorded point.

[6]:
# One day with all the data projected to a single day
selected_day = datetime(2020, 6, 1, tzinfo=pytz.timezone("UTC"))
df_users_day = mobilkit.loader.syntheticGeoLifeDay(df_geolife, selected_day=selected_day)
df_users_day["uid"].nunique()
[6]:
11152
[7]:
# One week with all the data projected to a single week
selected_week = datetime(2020, 6, 4, tzinfo=pytz.timezone("UTC"))
df_users_week = mobilkit.loader.syntheticGeoLifeWeek(df_geolife, selected_week=selected_week)
df_users_week["uid"].nunique()
Anticipated the date to Monday:  2020-06-01 00:00:00+00:00
[7]:
2524

We now have two dataframes containing our data.

Each row is an event containing the spatial and temporal information.

[8]:
df_users_week.head(4)
[8]:
UTC acc datetime lat lng uid
0 1591267984 1 2020-06-04 10:53:04+00:00 39.984702 116.318417 0
1 1591267990 1 2020-06-04 10:53:10+00:00 39.984683 116.318450 0
2 1591267995 1 2020-06-04 10:53:15+00:00 39.984686 116.318417 0
3 1591268000 1 2020-06-04 10:53:20+00:00 39.984688 116.318385 0
[9]:
# Get a small fraction of the pings and see how they are distributed over the tessellation
ax = df_users_day.sample(frac=.01).plot("lng","lat",kind="scatter", alpha=.01)
ax = layer.plot(color="none", edgecolor="k", ax=ax)

box = layer.unary_union.bounds
plt.xlim(box[0], box[2])
plt.ylim(box[1], box[3])
[9]:
(39.74096801500008, 40.05733690900007)
../_images/examples_mobilkit_tutorial_13_1.png

Create dask client

If working on localhost, launch the worker and scheduler with:

dask-worker 127.0.0.1:8786 --nworkers -1 & dask-scheduler

If you get an error with Popen in dask-worker, add the option --preload-nanny multiprocessing.popen_spawn_posix to the first command.
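Alternatively (a sketch, not what this notebook uses), you can start a local cluster directly from Python instead of launching the scheduler and workers by hand:

# Start a local Dask cluster in-process and connect a client to it.
from dask.distributed import Client, LocalCluster

cluster = LocalCluster(n_workers=4, threads_per_worker=1)  # tune to your machine
client = Client(cluster)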

[10]:
client = Client(address="127.0.0.1:8786")
client
[10]:

Client

Client-ed631bd2-0f93-11ee-97cc-63c186d89b62

Connection method: Direct
Dashboard: http://127.0.0.1:8787/status

Scheduler

Scheduler-19415a23-7a4e-4501-a545-b2a63b4d907a

Comm: tcp://192.168.1.20:8786    Workers: 48
Dashboard: http://192.168.1.20:8787/status    Total threads: 48
Started: 18 minutes ago    Total memory: 188.55 GiB

(Per-worker details omitted: 48 local workers, each with 1 thread and 3.93 GiB of memory.)

Load events in dask

Here we use the dask API; see the loader module for how to load pings from raw files.

[11]:
dd_users_raw = dd.from_pandas(df_users_day, npartitions=20)
dd_week_raw = dd.from_pandas(df_users_week, npartitions=20)
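If your pings come as raw files instead of an in-memory pandas dataframe, a minimal dask sketch of loading them could look like the following (the file pattern and raw column names are assumptions; adapt them so that you end up with the uid, lat, lng and datetime columns used in this notebook):

import dask.dataframe as dd

# Hypothetical raw CSV layout; adapt the glob and column names to your files.
dd_raw = dd.read_csv("../../data/raw_pings/*.csv",
                     dtype={"user_id": str, "latitude": "f8", "longitude": "f8"})
dd_raw = dd_raw.rename(columns={"user_id": "uid",
                                "latitude": "lat",
                                "longitude": "lng"})
# Parse the (assumed) unix-seconds timestamp column into a timezone-aware datetime.
dd_raw["datetime"] = dd.to_datetime(dd_raw["timestamp"], unit="s", utc=True)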

Compute user stats

We first focus on the weekly dataframe because it spans more than one day (but has fewer users).

We show here how to compute the basic user stats.

See the mobilkit.stats.userStats documentation for details; in short, we get back some basic statistics for each user.

[12]:
# Use the .compute() to make it a pandas df
users_stats_df = mobilkit.stats.userStats(dd_week_raw).compute()
users_stats_df.head(4)
[12]:
uid min_day max_day pings daysActive daysSpanned pingsPerDay avg
0 82 2020-06-01 00:00:00+00:00 2020-06-07 00:00:00+00:00 14027 7 6 [779, 1814, 790, 1848, 852, 2096, 5848] 2003.857143
1 102 2020-06-01 00:00:00+00:00 2020-06-07 00:00:00+00:00 16598 7 6 [641, 2493, 3853, 1181, 2695, 2251, 3484] 2371.142857
2 111 2020-06-01 00:00:00+00:00 2020-06-07 00:00:00+00:00 11562 7 6 [1165, 1124, 1038, 1001, 2203, 1453, 3578] 1651.714286
3 141 2020-06-03 00:00:00+00:00 2020-06-06 00:00:00+00:00 3909 3 3 [218, 932, 2759] 1303.000000
[13]:
# We can plot the distribution of the number of pings and active days to see how many users we have in each quadrant
mobilkit.stats.plotUsersHist(users_stats_df, min_pings=100, min_days=2)
[13]:
<Axes: title={'center': 'ul: 40 - ur: 2180 - lr: 233 - ll: 71'}, xlabel='log10 #records', ylabel='timespan (days daysActive)'>
../_images/examples_mobilkit_tutorial_20_1.png

Filter users based on stats

We want to keep only users with at least 2 active days and 100 pings.

[14]:
# We can either filter users by hand, passing an explicit set of valid users
valid_users = list(users_stats_df.query("daysActive >= 2 & pings > 100")["uid"])
df_filtered = mobilkit.stats.filterUsersFromSet(dd_week_raw, users_set=valid_users)
[15]:
# Or have the stats computed and the filter applied in a single call
df_filtered, users_stats_df, valid_users = mobilkit.stats.filterUsers(dd_week_raw, minPings=100, minDaysActive=2)

Assign pings to an area

We first assign each ping to an area, passing the path of the shapefile to use.

With filterAreas=True we discard all the events that fall outside of our ROI.

[16]:
dd_week_with_zones, tessellation_gdf = mobilkit.spatial.tessellate(df_filtered,
                                                                   tesselation_shp=voronoi_file,
                                                                   filterAreas=True)

Compute home and work areas

Once we have each ping assigned to a given area, we can determine the home and work areas of each user by looking at where they spend most of their time during work hours and at night.

We first add the isHome and isWork columns, then we pass this df to the home location function to see where an agent lives.

[17]:
# Add the home/work columns: all the events within the given hours will be considered home/work
dd_week_hw = mobilkit.stats.userHomeWork(dd_week_with_zones,
                                         homeHours=(20, 8),
                                         workHours=(9,18))
[18]:
# Compute the locations and pass them to pandas:
# - the tile_IDs of the areas of home and work;
# - the lat and lon of the home and work locations;
df_hw_locs = mobilkit.stats.userHomeWorkLocation(dd_week_hw)
df_hw_locs_pd = df_hw_locs.compute()

The synthetic case

We now merge our population estimate with the one given in the original shapefile in the POP column.

Results are not beautiful, but remember that:

  • we are working on a small dataset (~200 original users) expanded to simulate many users in an arbitrary way;

  • the spatial tessellation may be different from the original one, as we reconstructed it with a Voronoi tessellation.

While we focus on the Beijing synthetic case here, the next section shows the estimates for the Mexico use case, where results are in very good agreement with the census figures.

[19]:
df_hw_locs_pd.head(4)
[19]:
tot_pings home_tile_ID lat_home lng_home home_pings work_tile_ID lat_work lng_work work_pings
uid
82 4297.0 106.0 40.004416 116.322909 1218.0 106.0 39.997723 116.323012 1040.0
102 16531.0 106.0 39.996643 116.325115 2471.0 106.0 40.000683 116.323300 1966.0
111 11562.0 106.0 40.000315 116.322960 2240.0 106.0 40.001965 116.322623 3066.0
141 2200.0 102.0 39.981531 116.338790 78.0 89.0 39.967705 116.339616 522.0
[20]:
population_per_area = df_hw_locs_pd.reset_index().groupby("home_tile_ID").agg({
                                                "uid": "nunique",
                                                "home_pings": "sum"}).reset_index()

population_per_area = population_per_area.rename(columns={
                                                "home_tile_ID": "tile_ID",
                                                "uid": "POP_DATA",
                                                "home_pings": "pings"})
population_per_area.head(2)
[20]:
tile_ID POP_DATA pings
0 0.0 6 1498.0
1 2.0 1 0.0
[21]:
# Merge with gdf
gdf_areas = pd.merge(tessellation_gdf, population_per_area, on="tile_ID", how="left")
gdf_areas["POP_DATA"] = gdf_areas["POP_DATA"].fillna(0)
gdf_areas.head(4)
[21]:
TID POP M F AGE0 AGE15 AGE65 Address geometry tile_ID POP_DATA pings
0 257 102402 51818 50584 11322 83441 7639 北京市大兴区清源街道 POLYGON ((116.29603 39.74097, 116.31217 39.771... 0 6.0 1498.0
1 259 168444 94211 74233 17898 144834 5712 北京市大兴区黄村 POLYGON ((116.22050 39.74097, 116.22050 39.786... 1 0.0 NaN
2 262 49612 27649 21963 4472 42847 2293 北京市大兴区瀛海 POLYGON ((116.39314 39.74097, 116.39185 39.750... 2 1.0 0.0
3 92 48076 24392 23684 4975 39191 3910 北京市丰台区南苑街道 POLYGON ((116.39185 39.75000, 116.37391 39.764... 3 2.0 0.0
[22]:
sns.lmplot(x="POP", y="POP_DATA", data=gdf_areas)
# plt.loglog()
plt.xlim(0, 250000)
plt.ylim(0, 50)

plt.xlabel("Census population")
plt.ylabel("Data population")
[22]:
Text(24.625000000000007, 0.5, 'Data population')
../_images/examples_mobilkit_tutorial_33_1.png
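To put a number on this (admittedly weak) agreement, one could compute a rank correlation between census and data-derived population, for example (a quick sketch using the gdf_areas built above):

# Spearman rank correlation between census population and the ping-based estimate.
valid = gdf_areas.dropna(subset=["POP", "POP_DATA"])
print(valid[["POP", "POP_DATA"]].corr(method="spearman"))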

The Mexican case

We load the results of the population analysis in Mexico for the Puebla earthquake and see the agreement between census and mobility estimation at different aggregation levels, from the smallest (AGEB, street blocks) to the largest (Municipios, city level).

Note that these data are not included in the repository to preserve users’ privacy.

This section is included as an example of how mobility data can be used to measure the spatial density of population.

[24]:
# Table-preview not shown so as not to disclose original dataset statistics
population_mexico_df = pd.read_csv("../../data/population_estimate_mexico.csv")
population_mexico_df.head(2)
[25]:
# The smallest aggregations (street block level)
mobilkit.viz.plot_pop(population_mexico_df, "AGEB")
plt.xlim(1e2, 3e4)
plt.ylim(1e-1, 1e2)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.075
Model:                            OLS   Adj. R-squared:                  0.074
Method:                 Least Squares   F-statistic:                     352.2
Date:                Wed, 02 Jun 2021   Prob (F-statistic):           1.25e-75
Time:                        08:16:16   Log-Likelihood:                -208.40
No. Observations:                4369   AIC:                             420.8
Df Residuals:                    4367   BIC:                             433.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.5443      0.042    -13.058      0.000      -0.626      -0.463
x1             0.2211      0.012     18.766      0.000       0.198       0.244
==============================================================================
Omnibus:                      454.477   Durbin-Watson:                   1.779
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              610.535
Skew:                           0.857   Prob(JB):                    2.65e-133
Kurtosis:                       3.645   Cond. No.                         41.4
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[25]:
(0.1, 100.0)
../_images/examples_mobilkit_tutorial_36_2.png
[26]:
# Aggregate to locality (districts) and municipio (city)
population_mexico_df["CVEGEO_LOC"] = population_mexico_df["CVEGEO"].apply(lambda s: s[:-4])
population_mexico_df["CVEGEO_MUN"] = population_mexico_df["CVEGEO_LOC"].apply(lambda s: s[:-4])

urban_areas_loc_gdf = population_mexico_df.groupby("CVEGEO_LOC").agg({
        "POP_HFLB": "sum",
        "POBTOT": "sum",
        "CVEGEO_MUN": "first",
        "CVE_LOC": "first",
        "CVE_ENT": "first",
        "CVE_MUN": "first",
    }).reset_index()

urban_areas_mun_gdf = population_mexico_df.groupby("CVEGEO_MUN").agg({
        "POP_HFLB": "sum",
        "POBTOT": "sum",
        "CVE_ENT": "first",
        "CVE_MUN": "first",
    }).reset_index()
[27]:
# The locality aggregation level (district level)
mobilkit.viz.plot_pop(urban_areas_loc_gdf, "LOC")
plt.xlim(1e1,3e6)
plt.ylim(7e-1,8e3)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.766
Model:                            OLS   Adj. R-squared:                  0.766
Method:                 Least Squares   F-statistic:                     1535.
Date:                Wed, 02 Jun 2021   Prob (F-statistic):          7.01e-150
Time:                        08:16:24   Log-Likelihood:                -113.13
No. Observations:                 470   AIC:                             230.3
Df Residuals:                     468   BIC:                             238.6
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -3.3410      0.098    -34.198      0.000      -3.533      -3.149
x1             0.9302      0.024     39.185      0.000       0.884       0.977
==============================================================================
Omnibus:                        8.056   Durbin-Watson:                   1.578
Prob(Omnibus):                  0.018   Jarque-Bera (JB):                8.871
Skew:                           0.229   Prob(JB):                       0.0118
Kurtosis:                       3.492   Cond. No.                         29.9
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[27]:
(0.7, 8000.0)
../_images/examples_mobilkit_tutorial_38_2.png
[ ]:
# The municipality aggregation level
mobilkit.viz.plot_pop(urban_areas_mun_gdf, "MUN")
plt.xlim(1e1,3e6)
plt.ylim(7e-1,8e3)
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.854
Model:                            OLS   Adj. R-squared:                  0.853
Method:                 Least Squares   F-statistic:                     1214.
Date:                Wed, 02 Jun 2021   Prob (F-statistic):           8.83e-89
Time:                        08:16:25   Log-Likelihood:                -52.923
No. Observations:                 210   AIC:                             109.8
Df Residuals:                     208   BIC:                             116.5
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -4.5717      0.156    -29.313      0.000      -4.879      -4.264
x1             1.1737      0.034     34.846      0.000       1.107       1.240
==============================================================================
Omnibus:                        1.063   Durbin-Watson:                   1.894
Prob(Omnibus):                  0.588   Jarque-Bera (JB):                0.891
Skew:                          -0.158   Prob(JB):                        0.640
Kurtosis:                       3.048   Cond. No.                         35.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
(0.7, 8000.0)
../_images/examples_mobilkit_tutorial_39_2.png
[ ]:

Displacement measures

Now we show how to track the location of users over time.

We have to specify how many initial days to use to determine the original home location.

Then, we specify how many days each window should cover to determine the dynamic home location of each user (and the minimum number of pings required for a night to be valid).

[28]:
# The parameters of the home location in window function
initial_days_home = 2
home_days_window = 2
start_date=None

# Compute running home location
running_home_df = mobilkit.temporal.homeLocationWindow(dd_week_hw,
                    initial_days_home=initial_days_home,
                    home_days_window=home_days_window,
                    start_date=None, stop_date=None)
Got the delta days distributed as: count    29331.000000
mean         3.639324
std          2.024058
min          0.000000
25%          2.000000
50%          4.000000
75%          5.000000
max          7.000000
Name: deltaDay, dtype: float64
Doing window 01 / 02
Doing window 02 / 02
[29]:
# We now have, for each user and time window (with its initial date), the location
# where they supposedly spent the night and how many pings were recorded there
running_home_df.head(4)
[29]:
pings tile_ID timeSlice uid window_date
0 9 106 0 1 2020-06-01 00:00:00+00:00
1 273 106 0 2 2020-06-01 00:00:00+00:00
2 314 106 0 4 2020-06-01 00:00:00+00:00
3 212 106 0 7 2020-06-01 00:00:00+00:00

Once we have determined the residence area for each user/night, we use mobilkit.temporal.computeDisplacementFigures to get four objects containing the results of the analysis.

The ones we are interested in are:

  • pivoted_df, telling for each night where a user slept;

  • count_users_per_area, telling for each area how many of the users originally residing there were active and how many were displaced on a given day.

[30]:
# Compute displacement figures

minimum_pings_per_night = 3

pivoted_df, original_home,\
    heaps, count_users_per_area = mobilkit.temporal.computeDisplacementFigures(
        running_home_df, minimum_pings_per_night=minimum_pings_per_night,
)

# The lat, lon of the ROI center (used as the epicenter)
epicenter = [tessellation_gdf.unary_union.centroid.xy[1][0],
             tessellation_gdf.unary_union.centroid.xy[0][0]]

# Assess displacement based on distance from epicenter
fig, gdf_enriched = mobilkit.temporal.plotDisplacement(count_users_per_area, pivoted_df,
                                   tessellation_gdf,
                                   epicenter=epicenter,
                                   bins=3)
../_images/examples_mobilkit_tutorial_45_0.png
[31]:
# Visualize the distance bins
fig, ax = plt.subplots(1,1,figsize=(12,12))
ax.set_aspect("equal")
gdf_enriched.plot("distance_bin", legend=True, ax=ax)
[31]:
<Axes: >
../_images/examples_mobilkit_tutorial_46_1.png
[32]:
# Visualize the displacement rate per area on a given date
dates_sorted = sorted(pivoted_df.columns)
selected_date_index = 2

gdf_enriched["displaced_at_date"] = gdf_enriched["tile_ID"].apply(lambda a:
                                                      count_users_per_area[a]["displaced"][selected_date_index]
                                                      / max(1, count_users_per_area[a]["active"][selected_date_index])
                                                      if a in count_users_per_area else None)

fig, ax = plt.subplots(1,1,figsize=(15,12))
ax.set_aspect("equal")
gdf_enriched.plot("displaced_at_date", legend=True, ax=ax, legend_kwds={'label': "Displaced fraction"})
plt.title("Date = %s" % dates_sorted[selected_date_index].strftime("%d/%m/%Y"))
[32]:
Text(0.5, 1.0, 'Date = 03/06/2020')
../_images/examples_mobilkit_tutorial_47_1.png
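Beyond a single date, one may also want the ROI-wide displaced fraction over time; here is a sketch (assuming, as in the cell above, that each area's "active" and "displaced" arrays are aligned with dates_sorted):

import numpy as np

# Sum the active and displaced counts over all areas for every date...
active = np.zeros(len(dates_sorted))
displaced = np.zeros(len(dates_sorted))
for counts in count_users_per_area.values():
    active += np.asarray(counts["active"], dtype=float)
    displaced += np.asarray(counts["displaced"], dtype=float)

# ...and plot the overall fraction of displaced users per night.
plt.plot(dates_sorted, displaced / np.maximum(active, 1))
plt.ylabel("Displaced fraction (whole ROI)")
plt.xlabel("Date")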

Land use

For this particular analysis we use the daily data because they cover more users (better statistics).

We start by assigning every ping to a location and then compute the activity profiles.

[33]:
dd_usr_with_zones, tessellation_gdf = mobilkit.spatial.tessellate(dd_users_raw,
                                                                  tesselation_shp=voronoi_file,
                                                                  filterAreas=True)
dd_usr_with_zones.head()
[33]:
UTC acc datetime lat lng uid tile_ID
0 1591008784 1 2020-06-01 10:53:04+00:00 39.984702 116.318417 0 99
1 1591008790 1 2020-06-01 10:53:10+00:00 39.984683 116.318450 0 99
2 1591008795 1 2020-06-01 10:53:15+00:00 39.984686 116.318417 0 99
3 1591008800 1 2020-06-01 10:53:20+00:00 39.984688 116.318385 0 99
4 1591008805 1 2020-06-01 10:53:25+00:00 39.984655 116.318263 0 99
[34]:
# This is the time period over which we want to compute the average activity of an area.
# normalization="total" tells the program to normalize the activity of each area by dividing it
# by the overall volume of pings or users found in the whole ROI. See the docs for other
# normalization strategies.
selected_profile_period = "day"
total_profiles_df = mobilkit.temporal.computeTemporalProfile(dd_usr_with_zones, timeBin="H",
                                                             byArea=True,
                                                             profile=selected_profile_period,
                                                            normalization="total").compute()
[35]:
# We compute the residual activity of users found in a given area
signal_column = "users"
results, mappings =  mobilkit.temporal.computeResiduals(total_profiles_df,
            signal_column=signal_column, profile=selected_profile_period)

Finally, we cluster these profiles into groups using hierarchical clustering of the residual activity profiles with the cosine metric.

This plot shows the score of the partitioning: the higher, the better.

Given that we do not have much data, we select n=4 clusters even though there is no clear maximum.

[36]:
signal_to_use = "residual"
metric = "cosine"  # The metric to be used in computing the distance matrix
results_clusters = mobilkit.tools.computeClusters(results,
                                                  signal_to_use,
                                                  metric=metric,
                                                  nClusters=range(2,11))

# Visualize score
ax_score = mobilkit.tools.checkScore(results_clusters)
Done n clusters = 02
Done n clusters = 03
Done n clusters = 04
Done n clusters = 05
Done n clusters = 06
Done n clusters = 07
Done n clusters = 08
Done n clusters = 09
Done n clusters = 10
../_images/examples_mobilkit_tutorial_53_1.png
[37]:
# Plot cluster profiles: we select 4 clusters and plot their activity profiles
nClusters = 4
ax = mobilkit.tools.visualizeClustersProfiles(results_clusters,
            nClusts=nClusters, showMean=False, showMedian=True, showCurves=True)
../_images/examples_mobilkit_tutorial_54_0.png
[38]:
# We compare the median activity profiles of the 4 clusters
ax = mobilkit.tools.visualizeClustersProfiles(results_clusters,
            nClusts=nClusters, showMean=False, showMedian=True, showCurves=False, together=True)
../_images/examples_mobilkit_tutorial_55_0.png
[39]:
# We plot the map of the clusters, with the same colors as before.
gdf_update, ax_mappa = mobilkit.tools.plotClustersMap(tessellation_gdf, results_clusters,
                                                      mappings, nClusts=nClusters)
../_images/examples_mobilkit_tutorial_56_0.png
[ ]: