I am working out a submission to the Land Transport Authority for their data analytics project.
I will ask them to provide the following data to me:
a) Passengers boarding and alighting from buses for 1 month - showing the passenger id, bus id, time and bus stop.
b) Buses arriving at bus stops showing the bus id, time and bus stop.
I expect to get 8 million records per day X 30 days, or about 240 million records, under (a) and 4,600 X 10 X 60 = 2.76 million (nearly 3 million) records under (b). This is a huge amount of data.
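As a quick check of the arithmetic, the two estimates can be reproduced in a few lines of Python. The variable names are only labels for illustration; the three factors under (b) are used exactly as quoted above.

```python
# Rough record-count estimate, using only the figures quoted in the post.
boardings_per_day = 8_000_000   # records per day under (a)
days = 30                       # one month of data

records_a = boardings_per_day * days

# The three factors for (b) are taken as given in the post.
records_b = 4_600 * 10 * 60

print(f"(a) {records_a:,} records")
print(f"(b) {records_b:,} records")
```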
I will compute the average delay time for each bus, adjusted by the time of the day. For example, we expect a bigger delay during peak hours, so this has to be adjusted.
The buses with higher delays can then be identified. A consistently higher delay could suggest that the bus is in poorer condition and needs more thorough maintenance.
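One possible way to do the time-of-day adjustment is sketched below. It assumes a delay in minutes has already been worked out for each arrival (for example against a timetable, which is not part of the data listed above), and it adjusts by subtracting the average delay of all buses in the same hour. The record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (bus_id, hour_of_day, delay_minutes).
records = [
    ("B1", 8, 6.0),
    ("B2", 8, 4.0),
    ("B1", 14, 2.0),
    ("B2", 14, 0.0),
]

def adjusted_delay_by_bus(records):
    """Average delay per bus after removing the typical delay for each hour."""
    # Step 1: typical delay for each hour of the day, across all buses.
    by_hour = defaultdict(list)
    for _, hour, delay in records:
        by_hour[hour].append(delay)
    hour_mean = {hour: mean(delays) for hour, delays in by_hour.items()}

    # Step 2: each bus's delay relative to the typical delay for that hour.
    by_bus = defaultdict(list)
    for bus, hour, delay in records:
        by_bus[bus].append(delay - hour_mean[hour])
    return {bus: mean(diffs) for bus, diffs in by_bus.items()}

result = adjusted_delay_by_bus(records)
print(result)
```

A positive value means the bus runs later than other buses at the same time of day, so peak-hour congestion alone does not push a bus to the top of the list.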
Some buses have faulty tracking systems, leading to non-reporting of their locations. They will be identified and the tracking system can be repaired or replaced.
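A simple way to spot such buses, sketched here under my own assumptions, is to compare the two data sets: if passengers tapped in on a bus at a stop but no arrival was reported for that bus at that stop around the same time, the tracker probably missed it. The record layout, the function name and the 5-minute tolerance are all hypothetical.

```python
from collections import defaultdict

def missing_report_rate(taps, arrivals, tolerance=5):
    """Share of passenger taps per bus with no arrival report nearby in time.

    taps and arrivals are iterables of (bus_id, stop_id, minute_of_day).
    """
    arrival_times = defaultdict(list)
    for bus, stop, minute in arrivals:
        arrival_times[(bus, stop)].append(minute)

    total = defaultdict(int)
    missing = defaultdict(int)
    for bus, stop, minute in taps:
        total[bus] += 1
        reported = any(
            abs(minute - arrived) <= tolerance
            for arrived in arrival_times[(bus, stop)]
        )
        if not reported:
            missing[bus] += 1
    return {bus: missing[bus] / total[bus] for bus in total}

# Hypothetical sample: bus B2 never reported its arrival at stop S2.
taps = [("B1", "S1", 480), ("B1", "S2", 490), ("B2", "S1", 500), ("B2", "S2", 510)]
arrivals = [("B1", "S1", 479), ("B1", "S2", 489), ("B2", "S1", 501)]

rates = missing_report_rate(taps, arrivals)
print(rates)
```

Buses with a high rate over the month would be the candidates for having the tracking system checked.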
I will also compute the average occupancy rate for each bus service and for peak and off-peak hours. We can transfer buses from services with lower occupancy rates to services with higher occupancy rates, so that the occupancy rates are more evenly distributed.
If the average occupancy rate is higher than a benchmark, more buses should be added to the system.
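The occupancy calculation could look something like the sketch below. It assumes the number of passengers on board has already been derived from the boarding and alighting records, and that the bus capacity is known. The peak hours, the 80% benchmark and the record layout are my own placeholders, not figures from the Land Transport Authority.

```python
from collections import defaultdict
from statistics import mean

PEAK_HOURS = {7, 8, 17, 18}   # assumed peak hours, for illustration only
BENCHMARK = 0.8               # assumed benchmark occupancy rate

def occupancy_by_service(observations):
    """Average occupancy rate per (service, period).

    observations: iterable of (service, hour, passengers_on_board, capacity).
    """
    groups = defaultdict(list)
    for service, hour, load, capacity in observations:
        period = "peak" if hour in PEAK_HOURS else "off-peak"
        groups[(service, period)].append(load / capacity)
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical observations for two services.
observations = [
    ("S10", 8, 72, 80),
    ("S10", 14, 40, 80),
    ("S20", 8, 40, 80),
    ("S20", 14, 20, 80),
]

rates = occupancy_by_service(observations)
over_benchmark = sorted(key for key, rate in rates.items() if rate > BENCHMARK)
print(rates)
print(over_benchmark)
```

Services that appear in `over_benchmark` are the ones that may need more buses, and those with the lowest rates are the natural donors.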
The data will also provide me with the means to carry out a simulation to change the bus routes and to introduce express services to reduce the travelling time.
Does this sound complicated? It seems to be quite a common sense approach, right?
Do you have any suggestions on what other analyses might be useful?
amazing mr. tan. with so much dedication and spread on topics, where do you find time? :)