Analysis of Apps on the Google Play Store and the Apple App Store


The goal of this project is to analyze app store data to see which types of app attract the most users and are able to retain them over a period of time. We will focus on free apps, which generate their revenue through in-app purchases and in-app ads.

  • We begin by reading both AppleStore.csv and googleplaystore.csv into our session using open()
  • For each dataset, we save the data rows without the header as a list, named apple and android respectively
  • We then assign the headers of the datasets to apple_header and android_header respectively
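The loading step above can be sketched as follows. This is a minimal sketch that assumes the files are parsed with Python's built-in csv module alongside open() (the text only mentions open()); the helper name load_csv is hypothetical.

```python
import csv
import os


def load_csv(path):
    """Read a CSV file and return (header, rows), with the header removed from rows."""
    # newline="" lets the csv module handle line endings inside quoted fields
    with open(path, encoding="utf8", newline="") as f:
        data = list(csv.reader(f))
    return data[0], data[1:]


# Usage on the two files named above; skipped if they are not in the working directory
if os.path.exists("AppleStore.csv") and os.path.exists("googleplaystore.csv"):
    apple_header, apple = load_csv("AppleStore.csv")
    android_header, android = load_csv("googleplaystore.csv")
```

Using csv.reader rather than splitting each line on commas matters here: values such as install counts ("1,000+") contain commas inside quotes.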

With the datasets assigned to variables, we now want to look at a portion of each list to get a sense of the columns used to describe each app. To do this, we define a function explore_data with the parameters dataset, start, end and rows_and_columns. It loops over the rows of the dataset between start and end, printing each row with print(). To separate the rows visually, we call print('\n') after each one, which inserts blank lines between rows. An if statement on the parameter rows_and_columns prints the number of rows using len(dataset) and the number of columns using len(dataset[0]). By default, rows_and_columns is set to False, so this if statement is not executed.
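A sketch of explore_data as described above, assuming that start and end are used as slice bounds (so end is exclusive). The exact wording of the row and column printouts is an assumption.

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print dataset[start:end] one row at a time, optionally followed by its shape."""
    for row in dataset[start:end]:
        print(row)
        print('\n')  # prints extra blank lines so each row stands apart

    # Only runs when the caller asks for the dimensions (default is False)
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

For example, explore_data(apple, 0, 2, True) would print the first two Apple rows followed by the dataset's dimensions. Note that len(dataset[0]) assumes every row has the same number of fields as the first one.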

Below are the column headers and the first two entries of both the Apple App Store and Google Play datasets, along with the number of rows and columns in each dataset.

Google Play Column Names: Link

Column Name     Description
Category        Category the app belongs to
Rating          Overall user rating of the app (as of when scraped)
Reviews         Number of user reviews for the app (as of when scraped)
Size            Size of the app (as of when scraped)
Installs        Number of user downloads/installs for the app (as of when scraped)
Type            Paid or Free
Price           Price of the app (as of when scraped)
Content Rating  Age group the app is targeted at - Children / Mature 21+ / Adult
Genres          An app can belong to multiple genres (apart from its main category)
Last Updated    Date when the app was last updated on the Play Store (as of when scraped)
Current Ver     Current version of the app available on the Play Store (as of when scraped)
Android Ver     Minimum required Android version (as of when scraped)

Apple App Store Column Names: Link

Column Name       Description
id                App ID
track_name        App name
size_bytes        Size (in bytes)
currency          Currency type
price             Price amount
rating_count_tot  User rating counts (for all versions)
rating_count_ver  User rating counts (for current version)
user_rating       Average user rating value (for all versions)
user_rating_ver   Average user rating value (for current version)
ver               Latest version code
cont_rating       Content rating
prime_genre       Primary genre
sup_devices.num   Number of supporting devices
ipadSc_urls.num   Number of screenshots shown for display
lang.num          Number of supported languages
vpp_lic           VPP device-based licensing enabled

The columns that will be useful in our analysis are:

Google Play

  • App
  • Category
  • Rating
  • Reviews
  • Installs
  • Type
  • Content Rating

Apple App Store

  • track_name
  • price
  • rating_count_tot
  • user_rating
  • cont_rating
  • prime_genre
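One way to keep only these columns is to look up each column's position in the header and pull those positions out of every row. This is a minimal sketch; select_columns is a hypothetical helper, and the text does not say the columns were actually extracted this way.

```python
def select_columns(header, rows, wanted):
    """Return rows reduced to the columns named in `wanted`, in that order."""
    # Find each column's position once from the header row;
    # list.index raises ValueError if a name is misspelled
    indices = [header.index(name) for name in wanted]
    return [[row[i] for i in indices] for row in rows]


# Column lists taken from the text above
android_columns = ['App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Content Rating']
apple_columns = ['track_name', 'price', 'rating_count_tot', 'user_rating', 'cont_rating', 'prime_genre']
```

For example, select_columns(apple_header, apple, apple_columns) would return the Apple rows trimmed to the six columns listed.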

According to the discussion thread for the Android dataset (link), the entry for Life Made WI-FI Touchscreen Photo Frame contains an error: the value in its Category column was entered incorrectly, as shown in the printout below. Because we don't know the true value, it's simplest to delete the entire row from the dataset with the del statement.
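Rather than hard-coding the row's position, the faulty row can be located programmatically. This sketch assumes that the misplaced Category value leaves the row with a different number of fields than the header; that assumption should be confirmed by printing the flagged rows before deleting anything. Both helper names are hypothetical.

```python
def find_misaligned_rows(header, dataset):
    """Return (index, row) pairs for rows whose field count differs from the header's."""
    return [(i, row) for i, row in enumerate(dataset) if len(row) != len(header)]


def delete_rows(dataset, indices):
    """Delete rows in place, highest index first so earlier deletions don't shift later ones."""
    for i in sorted(indices, reverse=True):
        del dataset[i]
```

A typical use would be to print find_misaligned_rows(android_header, android), check that the only hit is the Life Made WI-FI Touchscreen Photo Frame entry, and then pass its index to delete_rows. Running the deletion more than once by index would remove a different, valid row, so it should be done exactly once.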

Another problem to sort out is duplicate entries in each dataset. To measure the extent of this problem, we loop over each dataset with a for loop containing an if statement. Starting from two empty lists, one for duplicate entries and one for unique entries, we use append() to add each entry to one list or the other: if the entry is already in the unique list, it is added to the duplicate list; otherwise, it is added to the unique list. Running this check on both datasets, we found a total of 1181 duplicate entries in the Google Play dataset, whereas the Apple App Store dataset had none.
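The duplicate check described above can be sketched as follows. The text compares "entries"; this sketch assumes entries are compared by the value in one column, chosen with a hypothetical key_index parameter (defaulting to the first column, which holds the app name in the Google Play data).

```python
def split_duplicates(dataset, key_index=0):
    """Return (unique_apps, duplicate_apps), following the two-list loop described above."""
    duplicate_apps = []
    unique_apps = []
    for row in dataset:
        name = row[key_index]
        if name in unique_apps:  # seen before, so this entry is a duplicate
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    return unique_apps, duplicate_apps
```

Calling unique, duplicates = split_duplicates(android) and printing len(duplicates) gives the duplicate count. Because `name in unique_apps` scans the whole list on every iteration, storing the seen names in a set would be noticeably faster on datasets of this size; the list version is kept here to mirror the description.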