Should I do an explainer about apps, SDKs, and the data economy?
Especially re: TikTok I see so much misinformation about how the personal data ecosystem operates and who the true big players are.
(Hint: companies you've never heard of have more data about you than FAANGs)
Like, do people generally know about Ad-IDs, MSISDNs, IMEIs, or EIDs?


In the data economy, there's a fundamental problem if you want to build a full profile of a person that combines online and offline data.
Online data is good, but not great, at mapping who you are and what ads to show you 1/
Most systems are built with this in mind and are designed to keep online and offline identities separate.
For example, every mobile phone has multiple identifiers: IMEI, EID, etc.
Advertisers and apps, in theory, shouldn't see these because they are impossible to change. 2/
Your phone is also identified by an Ad-ID, which is a standard created by the @iab.
These allow advertisers to gather data about you, but A) are anonymous and B) can be reset by the end users.
This sets up a fundamental battle in digital advertising. 3/ https://www.intego.com/mac-security-blog/how-to-reset-the-advertising-identifier-on-your-mac-ios-device-or-apple-tv/
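To make the reset point concrete, here is a minimal Python sketch of a profile store that only knows a device by its Ad-ID. Everything in it is a made-up illustration, not any real ad platform's API: the `ProfileStore` class, the event strings, and the choice to model an Ad-ID as a random UUID are all assumptions.

```python
import uuid
from collections import defaultdict


class ProfileStore:
    """Hypothetical ad-side store that only knows devices by their Ad-ID."""

    def __init__(self):
        self.events = defaultdict(list)

    def record(self, ad_id, event):
        self.events[ad_id].append(event)

    def profile(self, ad_id):
        # Return a copy so callers cannot mutate the stored history.
        return list(self.events.get(ad_id, []))


def new_ad_id():
    # Assumption: model the Ad-ID as a random UUID string.
    return str(uuid.uuid4())


store = ProfileStore()
ad_id = new_ad_id()
store.record(ad_id, "opened weather app")
store.record(ad_id, "viewed shoe ad")

# The user resets the identifier: the device now reports a fresh, unrelated ID.
reset_id = new_ad_id()
store.record(reset_id, "opened weather app")

old_profile = store.profile(ad_id)      # history tied to the old ID
new_profile = store.profile(reset_id)   # starts over under the new ID
```

Nothing in the store connects `reset_id` back to `ad_id`, which is why a resettable ID on its own makes for a weak long-term profile.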
So how do advertisers/platforms get so much data about users?
Three sources:
1) Software development kits (SDKs) embedded into an app.
2) Data brokers.
3) Usage on the platform/app.
4/
1) SDKs are little bits of code that are embedded into apps. Users may never know a particular app has an SDK embedded.
Have a free weather app that asks to use your location?
That app probably has a bunch of SDKs that capture and resell your location data. 5/
2) Data brokers. There are a bunch of companies that quietly gather and resell a ton of data about people.
Examples include Experian, Oracle Data Cloud, Acxiom.
They hoover up records about your offline behavior.
6/
3) Platforms.
NO INSIDER KNOWLEDGE
but it's reasonable to assume apps register everything you click, look at, and interact with to build a profile of you.
But there's a problem. 1) and 3) probably know your devices' IDs, but they can't integrate with 2) 7/


This creates incentives to learn your offline identity and associate that with your online identifiers. In advertiser speak, this allows for a far broader view of a person for segmentation.
8/
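A rough sketch of what that linking can look like, assuming the shared key is a hashed email address. That is my assumption for illustration, not something any specific company has confirmed, and all records, field names, and the `link_profiles` function are made up.

```python
import hashlib


def hash_email(email):
    # Normalize, then hash, so both sides can compare without raw emails.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical online records: known by Ad-ID; only some include a login email.
online = [
    {"ad_id": "ad-111", "email": "Pat@example.com", "interests": ["running"]},
    {"ad_id": "ad-222", "email": None, "interests": ["cooking"]},
]

# Hypothetical broker records: known by real name and email.
offline = [
    {"name": "Pat Doe", "email": "pat@example.com", "zip": "00000"},
]


def link_profiles(online_records, offline_records):
    """Join online and offline records on a hashed email, when one exists."""
    by_hash = {hash_email(r["email"]): r for r in offline_records}
    linked = []
    for rec in online_records:
        if rec["email"] is None:
            continue  # No shared key: this device stays pseudonymous.
        match = by_hash.get(hash_email(rec["email"]))
        if match is not None:
            linked.append({
                "ad_id": rec["ad_id"],
                "name": match["name"],
                "interests": rec["interests"],
                "zip": match["zip"],
            })
    return linked


linked = link_profiles(online, offline)
```

Only `ad-111` ends up with a real name attached. `ad-222` never supplied anything that overlaps with the offline record, so it cannot be joined.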
I'm not an expert on TikTok, but a valid concern is that it enables the Chinese government to associate and merge your online and offline identities.
That's scary! Because you can't change your name.
9/
But pending more knowledge, I don't see it as fundamentally different than what corporate policies such as Facebook's real name policy attempt to accomplish.
9/9
Addendum:
When reading an article about data, a good question is whether the data are from 1), 2), or 3).
If it's from 1) like the phone location data articles, the vendor is very unlikely to be able to associate them with your offline identity. https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html
If you read about a company or institution selling offline data, the customer is likely 2) or maybe 3). But likely 2).
Why?
Offline data is useless unless you can associate it with something. https://www.vice.com/en_us/article/jgxanx/lawmakers-california-dmv-selling-data