1. Overview
In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.
In January 2014, the leaders of GAStech are celebrating their newfound fortune following the initial public offering of their very successful company. In the midst of this celebration, several GAStech employees go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.
2. Objectives
Historical vehicle tracking data and transaction data from loyalty and credit cards will be used to address the following:
- Identify the most popular locations and when they are popular
- Infer the owner of each credit card and loyalty card
- Identify potential informal or unofficial relationships among GAStech personnel
- Analyze suspicious activity of the missing personnel prior to the disappearance
3. Data Sources
The data sources are publicly available on the VAST Challenge 2021 website under the Mini-Challenge 2 subsection. The data used for the project are as follows:
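```r
# Load packages for data wrangling and for reading Excel files
library("tidyverse")
library("readxl")

# Read the five tabular data sets supplied for Mini-Challenge 2
emp_records <- read_excel("datasets/EmployeeRecords.xlsx")
car <- read_csv("datasets/car-assignments.csv")
cc <- read_csv("datasets/cc_data.csv")
gps <- read_csv("datasets/gps.csv")
loyalty <- read_csv("datasets/loyalty_data.csv")
```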
- Geospatial maps of Abila and Kronos Island
- Details of 54 GAStech employees
```r
knitr::kable(head(emp_records[,1:5]), "simple")
```
| Last name | First name | Birth date | Birth country | Gender |
|---|---|---|---|---|
| Bramar | Mat | 1981-12-19 | Tethys | Male |
| Ribera | Anda | 1975-11-17 | Tethys | Female |
| Pantanal | Rachel | 1984-08-22 | Tethys | Female |
| Lagos | Linda | 1980-01-26 | Tethys | Female |
| Mies Haber | Ruscella | 1964-04-26 | Kronos | Female |
| Forluniau | Carla | 1981-06-02 | Kronos | Female |
- Car assignments for 44 employees

| Last name | First name | Car ID | Employment type | Employment title |
|---|---|---|---|---|
| Calixto | Nils | 1 | Information Technology | IT Helpdesk |
| Azada | Lars | 2 | Engineering | Engineer |
| Balas | Felix | 3 | Engineering | Engineer |
| Barranco | Ingrid | 4 | Executive | SVP/CFO |
| Baza | Isak | 5 | Information Technology | IT Technician |
| Bergen | Linnea | 6 | Information Technology | IT Group Manager |
- 1490 credit card transactions for 55 unique credit card numbers

| Timestamp | Location | Price | Card number |
|---|---|---|---|
| 1/6/2014 7:28 | Brew’ve Been Served | 11.34 | 4795 |
| 1/6/2014 7:34 | Hallowed Grounds | 52.22 | 7108 |
| 1/6/2014 7:35 | Brew’ve Been Served | 8.33 | 6816 |
| 1/6/2014 7:36 | Hallowed Grounds | 16.72 | 9617 |
| 1/6/2014 7:37 | Brew’ve Been Served | 4.24 | 7384 |
| 1/6/2014 7:38 | Brew’ve Been Served | 4.17 | 5368 |
- 685169 GPS log records from 6 Jan 2014 to 19 Jan 2014

| Timestamp | Car ID | Latitude | Longitude |
|---|---|---|---|
| 01/06/2014 06:28:01 | 35 | 36.07623 | 24.87469 |
| 01/06/2014 06:28:01 | 35 | 36.07622 | 24.87460 |
| 01/06/2014 06:28:03 | 35 | 36.07621 | 24.87444 |
| 01/06/2014 06:28:05 | 35 | 36.07622 | 24.87425 |
| 01/06/2014 06:28:06 | 35 | 36.07621 | 24.87417 |
| 01/06/2014 06:28:07 | 35 | 36.07619 | 24.87406 |
- 1392 loyalty card transactions for 54 unique loyalty card numbers

| Date | Location | Price | Loyalty card number |
|---|---|---|---|
| 01/06/2014 | Brew’ve Been Served | 4.17 | L2247 |
| 01/06/2014 | Brew’ve Been Served | 9.60 | L9406 |
| 01/06/2014 | Hallowed Grounds | 16.53 | L8328 |
| 01/06/2014 | Coffee Shack | 11.51 | L6417 |
| 01/06/2014 | Hallowed Grounds | 12.93 | L1107 |
| 01/06/2014 | Brew’ve Been Served | 4.27 | L4034 |
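
As a first step towards the objective on popular locations, the credit card timestamps need to be parsed before visits can be counted by location, day, and hour. The following is a minimal sketch of that step, not the final implementation. It uses the six sample credit card rows shown above, and the column names `timestamp`, `location`, and `price` are assumptions that may differ from the headers in `cc_data.csv`.

```r
library(dplyr)
library(lubridate)

# Six sample rows copied from the credit card extract above.
# Column names are assumed for illustration.
cc_sample <- tibble::tibble(
  timestamp = c("1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35",
                "1/6/2014 7:36", "1/6/2014 7:37", "1/6/2014 7:38"),
  location  = c("Brew’ve Been Served", "Hallowed Grounds",
                "Brew’ve Been Served", "Hallowed Grounds",
                "Brew’ve Been Served", "Brew’ve Been Served"),
  price     = c(11.34, 52.22, 8.33, 16.72, 4.24, 4.17)
)

popularity <- cc_sample %>%
  mutate(
    timestamp = mdy_hm(timestamp),           # month/day/year hour:minute
    day       = wday(timestamp, label = TRUE),
    hour      = hour(timestamp)
  ) %>%
  count(location, day, hour, name = "visits") %>%  # visits per location/day/hour
  arrange(desc(visits))

popularity
```

Applied to the full data set, the same grouping would give one row per location, weekday, and hour, which is the shape needed for a popularity heatmap.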
4. Literature Review
4.1 Past MITB Visual Analytics projects were reviewed and evaluated prior to the assignment.
- In the assignments by (Ong 2016) and (Guan 2016), heatmaps were used to plot frequency over time. The heatmaps used a color gradient to fill each cell, showing the intensity of the counts over a time period. This visual overview allowed readers to identify patterns and trends easily. For example, the report by (Ong 2016) revealed that more messages were received over the weekend, especially on Sundays, and the report by (Guan 2016) revealed that camping3 and camping6 had more records than the other camps.
- However, static heatmaps do not let readers determine the exact count in a specific time slot. Because each cell encodes a discrete count as a shade on a color gradient, the specific value is difficult to read accurately. Making the heatmap interactive would display the details in a tooltip on hover, allowing granular data to be presented more clearly in the report.
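
The interactive heatmap idea can be sketched as follows. This is only an illustration under stated assumptions: the counts are synthetic (generated with `rpois`), and `ggplot2` with `plotly::ggplotly` is one possible toolchain rather than the one used in the reviewed projects.

```r
library(ggplot2)
library(plotly)

# Synthetic hourly counts per weekday, for illustration only.
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
visits <- expand.grid(day = factor(days, levels = days), hour = 0:23)
visits$count <- rpois(nrow(visits), lambda = 5)

# The `text` aesthetic is not drawn; ggplotly() uses it as the tooltip.
p <- ggplot(visits, aes(x = hour, y = day, fill = count,
                        text = paste0(day, " ", hour, ":00<br>Count: ", count))) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(x = "Hour of day", y = NULL, fill = "Count")

# Convert the static plot to an interactive widget with exact counts on hover.
heatmap_widget <- ggplotly(p, tooltip = "text")
```

Printing `heatmap_widget` in an R Markdown document renders the heatmap with the exact count shown when hovering over a cell.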
4.2 The solutions submitted for VAST Challenge 2014 were also reviewed on their repository webpage (“VAST Challenge 2014:MC2 - Patterns of Life Analysis” 2014).
The submission from the University of Buenos Aires - Tralice (Villordo et al. 2014) used a multi-layered horizontal bar graph that showed the GPS movement for each employment type. The background highlighting of weekends provided good contrast between weekdays and weekends.
The submission from KU Leuven (Chua et al. 2014) used a boxplot to visualise credit card spending at each location. A boxplot gives a clear view of outliers in transaction prices. It can also convey informative details such as the median and the 25th and 75th percentile prices for each location, but these were not visible in the submitted plot. Furthermore, the 10,000-dollar outliers stretched the y-axis so much that each individual boxplot became too small to read.
- Because the price per transaction at each location is much lower than the y-axis tick marks, features such as the median and percentiles could not be properly shown on the plot. The visualisation can be improved by applying a logarithmic transformation to the y-axis so that the box for each location is legible. This would still allow readers to see the outliers for each location.
- An interactive boxplot can provide micro-data such as the median and the 25th and 75th percentile prices in a tooltip when users hover over the plot. This reduces clutter while still making the micro-data available to readers.
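
The effect of the logarithmic y-axis can be illustrated with a small sketch. The prices and location names below are synthetic and hypothetical, with one 10,000 outlier added to mimic the situation described above. The quartile table shows the micro-data that a tooltip could expose.

```r
library(ggplot2)
library(dplyr)

# Synthetic spending data with hypothetical location names.
set.seed(42)
spend <- data.frame(
  location = rep(c("Cafe A", "Cafe B", "Store C"), each = 50),
  price    = c(rlnorm(50, log(10), 0.4),
               rlnorm(50, log(15), 0.4),
               rlnorm(50, log(80), 0.6))
)
spend$price[150] <- 10000   # one extreme outlier at "Store C"

# A log10 y-axis keeps the boxes legible while the outlier stays visible.
p_log <- ggplot(spend, aes(x = location, y = price)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(x = NULL, y = "Price (log scale)")

# Per-location quartiles: the micro-data a tooltip could display.
quartiles <- spend %>%
  group_by(location) %>%
  summarise(q25    = quantile(price, 0.25),
            median = median(price),
            q75    = quantile(price, 0.75))
```

On a linear axis the same data would compress all three boxes near zero. The log scale is a presentation choice only and does not change the underlying quartiles.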
The submission from the University of Calgary (Sahaf et al. 2014) used a parallel coordinate plot to show the interactions and relationships between different categorical and numerical variables. The visualisation provides storytelling insights across the different variables.
Most past submissions overlaid GPS lines and points on a map to show the movement of each car. I would like to highlight the submission from Central South University (Zhao et al. 2014), where the map used different colors for lines and dots to present the findings. The variation in color made the information for different employees easier to distinguish. However, because GPS data such as locations and GPS lines overlap, an interactive map with tooltips would allow for better interpretation of the findings.
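
An interactive map with tooltips could be sketched along the following lines. This is an assumption-laden illustration: it uses the `leaflet` package, which is not necessarily what any of the reviewed submissions used, and it plots only the six sample GPS points for car 35 listed in the Data Sources section. The column names `id`, `lat`, and `long` are assumed, and no base map tiles are added because Kronos is a fictional island.

```r
library(leaflet)

# Six sample GPS points for car 35, copied from the Data Sources section.
# Column names are assumed for illustration.
gps_sample <- data.frame(
  timestamp = c("01/06/2014 06:28:01", "01/06/2014 06:28:01",
                "01/06/2014 06:28:03", "01/06/2014 06:28:05",
                "01/06/2014 06:28:06", "01/06/2014 06:28:07"),
  id   = 35,
  lat  = c(36.07623, 36.07622, 36.07621, 36.07622, 36.07621, 36.07619),
  long = c(24.87469, 24.87460, 24.87444, 24.87425, 24.87417, 24.87406)
)

# Draw the route as a line, then add one hoverable marker per GPS point.
m <- leaflet(gps_sample)
m <- addPolylines(m, lng = ~long, lat = ~lat, color = "steelblue", weight = 2)
m <- addCircleMarkers(m, lng = ~long, lat = ~lat, radius = 3,
                      label = ~paste0("Car ", id, " at ", timestamp))
```

Hovering over a marker shows the car and timestamp, which helps to separate overlapping points that a static map would merge.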
Submissions from Fraunhofer IAIS and City University London (Andrienko, Andrienko, and Fuchs 2014) and RBEI-Bangalore (Singhal et al. 2014) both used network analysis and clustering to investigate the relationships between GAStech employees. Fraunhofer IAIS used an ego-centric graph, whereas RBEI used a combination of fragmented and node-only layouts to visualise the relationships by connecting employees. Network analysis is an informative visualisation that provides an overview of potential relationships between employees, or even connects employees to other entities such as locations or emails.
- Although network analysis provides an overview of the relationships between nodes, the plot is usually cluttered, which makes it difficult to drill down to specific or individual relationships. An alternative is to make the plot interactive so that readers can drill down on specific areas to investigate the relationships.
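
One way to make a network plot interactive is sketched below with the `visNetwork` package. The nodes and edges are hypothetical placeholders, not relationships found in the data, and the package choice is an assumption rather than the method of the reviewed submissions.

```r
library(visNetwork)

# Hypothetical employees and ties, for illustration only.
nodes <- data.frame(
  id    = 1:5,
  label = paste("Employee", LETTERS[1:5]),
  group = c("Engineering", "Engineering", "IT", "IT", "Executive")
)
edges <- data.frame(
  from  = c(1, 1, 2, 3, 4),
  to    = c(2, 3, 3, 4, 5),
  value = c(3, 1, 2, 1, 4)    # edge width, e.g. number of shared visits
)

net <- visNetwork(nodes, edges)
# Selecting a node highlights its neighbours; a dropdown allows choosing a node by id.
net <- visOptions(net, highlightNearest = TRUE, nodesIdSelection = TRUE)
# Add on-screen zoom and pan buttons.
net <- visInteraction(net, navigationButtons = TRUE)
```

Highlighting the nearest neighbours of a selected node is what allows a reader to drill down from the cluttered overview to an individual's relationships.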
The submission from the University of Buenos Aires - Alcoser (Flores, Lopez, and Forero 2014) used Sankey diagrams to visualise the locations that employees frequently visit. A Sankey diagram shows how quantities flow from one state to another and is usually used to show flows or processes.
- An alluvial plot is an alternative to a Sankey diagram that shows how a population of facts is allocated across categorical dimensions. Depending on the context, either a Sankey or an alluvial plot can be used.
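
A minimal alluvial plot can be sketched with the `ggalluvial` package, as shown below. The visit counts and location names are hypothetical, and the choice of department and location as the two categorical axes is an assumption made for illustration.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical visit counts by department and location.
flows <- data.frame(
  department = c("Engineering", "Engineering",
                 "Information Technology", "Information Technology",
                 "Executive"),
  location   = c("Cafe A", "Cafe B", "Cafe A", "Store C", "Store C"),
  visits     = c(12, 5, 8, 3, 6)
)

# Each axis is one categorical dimension; band thickness encodes visits.
p_alluvial <- ggplot(flows,
                     aes(axis1 = department, axis2 = location, y = visits)) +
  geom_alluvium(aes(fill = department), width = 1/12) +
  geom_stratum(width = 1/12, fill = "grey90", colour = "grey40") +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Department", "Location"),
                   expand = c(0.05, 0.05)) +
  labs(y = "Visits")
```

Further categorical dimensions, such as day of week, could be added as `axis3` and so on, which is where the alluvial form differs most from a two-stage Sankey diagram.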
The submission from the University of Buenos Aires - Croceri (Croceri and Guzzi 2014) used a scatter plot to show the distance average speed against the speed for each employee's route. The visualisation displayed extreme outliers effectively based on the car speed.
- An interactive scatter plot split by category and condition might yield more useful insights, for example by drilling down on certain departments or a specific time period.
The use cases and visualisation techniques above were all reviewed and evaluated for integration into the investigation for this report.
To be continued in Part 2…