I think sorting the file will disturb the order, so the first unique URL found won't be the same as the first unique one in the original file.
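One way to keep the original order is sketched below. It assumes the distinct URLs fit in memory, which may not hold for the file in the question. It counts occurrences in one pass, then walks the URLs in first-seen order. The name `first_unique_url` and the sample URLs are made up for illustration.

```python
from typing import Iterable, Optional


def first_unique_url(urls: Iterable[str]) -> Optional[str]:
    """Return the first URL that occurs exactly once, in input order."""
    counts = {}
    for url in urls:
        # dict preserves insertion order (Python 3.7+), so keys stay in first-seen order
        counts[url] = counts.get(url, 0) + 1
    for url, n in counts.items():
        if n == 1:
            return url
    return None


original = ["z.com", "b.com", "z.com", "a.com"]
print(first_unique_url(original))          # b.com
print(first_unique_url(sorted(original)))  # a.com, sorting changed the answer
```

The second call shows the problem with sorting first: the result is still a unique URL, but no longer the first one in file order.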
According to what you stated:
array-1 stores the size of the array.
But it's not displaying the size.
1. Introduce a Kafka server to take in the data coming from the server (the drivers' devices send requests to the server in the form s1, s2). - techdebugger.zg September 24, 2016
2. Each ride message is published to a topic created in Kafka.
3. The topic is partitioned by city, so that consumers can consume the data in a multi-threaded fashion.
4. These consumers write the data into the tables.
5. Here I am using an RDBMS database. We can have two tables: CUSTOMER and RIDE_INFO.
The CUSTOMER table will contain all the metadata of the customer, e.g. cust_id, name, age, address, create_date, last_ride_data
RIDE_INFO will hold the details of each ride, with cust_id as the foreign key; its columns are CITY, RIDE_TIME, FARE, CUST_ID
To optimize the queries, we need to partition RIDE_INFO by date and then sub-partition it by city.
We can also use sharding, which helps with horizontal scaling. Sharding should be done by city, so that all the data for a particular city sits on one server.
6. Now we can run GROUP BY queries on the RIDE_INFO table, assuming 25-SEP-16 is today's date (note that WHERE must come before GROUP BY):
To get total trips: select count(*), city from RIDE_INFO where ride_time >= '25-SEP-16' group by city
To get total fare: select sum(fare), city from RIDE_INFO where ride_time >= '25-SEP-16' group by city
To get fare from old clients: select sum(ri.fare), ri.city from RIDE_INFO ri, CUSTOMER c where ri.ride_time >= '25-SEP-16' and ri.cust_id = c.cust_id and c.create_date < '25-SEP-16' group by ri.city
To get fare from new clients: select sum(ri.fare), ri.city from RIDE_INFO ri, CUSTOMER c where ri.ride_time >= '25-SEP-16' and ri.cust_id = c.cust_id and c.create_date >= '25-SEP-16' group by ri.city
7. To display the results in the UI, use the MVC pattern.
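Steps 1 to 3 can be illustrated without a Kafka client. The sketch below only mimics keyed partitioning: a stable hash of the city picks the partition, so every ride for a given city lands in the same partition and one consumer thread can own it. `zlib.crc32` is a stand-in for Kafka's own partitioner, and `NUM_PARTITIONS`, `partition_for` and the sample rides are assumptions for illustration.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count for the ride topic


def partition_for(city: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a city key to a partition with a stable hash (stand-in for Kafka's partitioner)."""
    return zlib.crc32(city.encode("utf-8")) % num_partitions


# Hypothetical ride messages: (cust_id, city, fare)
rides = [(1, "Delhi", 120.0), (2, "Mumbai", 80.0), (3, "Delhi", 60.0), (1, "Pune", 45.0)]

# Group messages the way a keyed topic would: same city, same partition
partitions = defaultdict(list)
for ride in rides:
    partitions[partition_for(ride[1])].append(ride)

for pid in sorted(partitions):
    print(pid, partitions[pid])
```

Two different cities may still share a partition; the only guarantee shown here is that a single city never spreads across partitions.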
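The queries in step 6 can be tried end to end with SQLite. This is a sketch, not the schema from the answer: the column types, the ISO date strings (used instead of '25-SEP-16'), the helper `per_city` and the sample rows are all assumptions. WHERE is placed before GROUP BY, as SQL requires.

```python
import sqlite3

TODAY = "2016-09-25"  # the answer's assumed "today", written as an ISO date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (
    cust_id     INTEGER PRIMARY KEY,
    name        TEXT,
    create_date TEXT              -- ISO timestamp, assumed format
);
CREATE TABLE RIDE_INFO (
    cust_id   INTEGER REFERENCES CUSTOMER(cust_id),
    city      TEXT,
    ride_time TEXT,               -- ISO timestamp, assumed format
    fare      REAL
);
""")

conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?)", [
    (1, "old_rider", "2016-08-01 10:00:00"),
    (2, "new_rider", "2016-09-25 08:30:00"),
])
conn.executemany("INSERT INTO RIDE_INFO VALUES (?, ?, ?, ?)", [
    (1, "Delhi", "2016-09-25 09:00:00", 100.0),
    (2, "Delhi", "2016-09-25 09:30:00", 50.0),
    (1, "Mumbai", "2016-09-25 11:00:00", 70.0),
    (1, "Delhi", "2016-09-24 18:00:00", 999.0),  # yesterday, must be excluded
])


def per_city(sql, params):
    """Run an aggregate query that selects (value, city) and return {city: value}."""
    return {city: value for value, city in conn.execute(sql, params)}


trips = per_city(
    "SELECT COUNT(*), city FROM RIDE_INFO WHERE ride_time >= ? GROUP BY city",
    (TODAY,))
fare = per_city(
    "SELECT SUM(fare), city FROM RIDE_INFO WHERE ride_time >= ? GROUP BY city",
    (TODAY,))
old_fare = per_city(
    "SELECT SUM(ri.fare), ri.city FROM RIDE_INFO ri "
    "JOIN CUSTOMER c ON ri.cust_id = c.cust_id "
    "WHERE ri.ride_time >= ? AND c.create_date < ? GROUP BY ri.city",
    (TODAY, TODAY))
new_fare = per_city(
    "SELECT SUM(ri.fare), ri.city FROM RIDE_INFO ri "
    "JOIN CUSTOMER c ON ri.cust_id = c.cust_id "
    "WHERE ri.ride_time >= ? AND c.create_date >= ? GROUP BY ri.city",
    (TODAY, TODAY))

print(trips, fare, old_fare, new_fare)
```

With these sample rows, Delhi has 2 trips today and 150.0 in fares, of which 100.0 comes from the customer created before today and 50.0 from the customer created today.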