Week #6

Midterm

I was going to continue my eye-detection project for the midterm, but the presentation Jiaqi and I gave last class seemed to genuinely engage and interest people, so instead of stopping there we decided to keep improving it for our midterm.

UNDER SURVEILLANCE →

We started by refining our goal: raising awareness of how seemingly innocuous personal data can be used to manipulate web users. We chose to do that by showing people all the (somewhat easily) obtainable information we can get on them just by having them visit our website. (Side note: we didn't actually show every possible data point, because a lot of it didn't seem very useful for the way we used it. That said, using all of the data points in conjunction with one another could no doubt prove very useful in a different setup.) We then took all of that data, sent it to ChatGPT, and asked it to infer any and all kinds of things about the user.

Some improvements from last week: we came up with a "report" design that lists all the data, with the ChatGPT response shown as the "conclusion" of that report. We also looked for additional ways to gather information. We incorporated cookies to show how many times the client has visited the site, and we made use of the Whois API to get information about the network the client is connecting from. This can be immensely revealing in some cases: when accessing the web at school, for example, you can actually tell the client is using NYU's network, which suggests pointers to their affiliation, occupation, physical whereabouts and more. We also added a button for downloading the report as a PDF file, and of course, refined the design...
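The visit counter is just a cookie that gets read and incremented on each page load. That snippet isn't shown below, so here's a minimal sketch of the idea, with an illustrative cookie name and helper function rather than the exact code we used:

// Sketch of a visit-count cookie (names are illustrative, not the exact code we used).
// Read a cookie by name; returns undefined if it isn't set yet.
function getCookie(name) {
  const match = document.cookie
    .split("; ")
    .find((row) => row.startsWith(name + "="));
  return match ? decodeURIComponent(match.split("=")[1]) : undefined;
}

// Read the previous count, add this visit, and store it again for a year.
let visitCount = parseInt(getCookie("visitCount") || "0", 10) + 1;
document.cookie = `visitCount=${visitCount}; max-age=${60 * 60 * 24 * 365}; path=/`;

// visitCount can then be added to the client's data object and shown in the report.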

Technical stuff

It took a while to figure out how to incorporate both the ChatGPT API and the Whois API, but once we did, things worked relatively quickly. There wasn't one particular thing causing a lot of issues, but rather many small things that worked, then didn't, then worked again. I think the main problem was that at some point the code got quite long and quite cluttered, and it was hard to tell what was leading to what and in what order, especially as we were working together on the same files at the same time. Most of our issues were the kind where something is accidentally "undefined", suggesting data that isn't being sent, or isn't being sent correctly. Those were relatively easy to fix, as we just followed the path of the data and figured out where the problem was.

The issues that were more difficult to solve were things like internal server errors where we couldn't tell what caused the issue. At some point we had to start just removing lines of code until we didn't see the error anymore, then redo some parts and unfortunately give up on others.

We made an object for each client, consisting of key-value pairs for all the data points. The data isn't gathered in one place or all at once, so information is added to that object in several different parts of the code. For example, this is the data gathered from the browser upon connection to the WebSocket:

let clientDataCollection = {
  browser: navigator.userAgent,                  // full user-agent string
  browserName: navigator.appName,                // deprecated: reports "Netscape" in modern browsers
  browserEngine: navigator.product,              // deprecated: reports "Gecko" in modern browsers
  browserVersion: navigator.userAgent,
  browserLanguage: navigator.language,
  scrColorDepth: screen.colorDepth,
  windowWidth: window.innerWidth,
  windowHeight: window.innerHeight,
  timeOpened: new Date(),
  timezone: new Date().getTimezoneOffset() / 60, // offset from UTC, in hours
  previousSites: history.length,                 // number of entries in this tab's history
};

* Unfortunately we had to get rid of the geolocation data: it was causing permission issues and sometimes crashed the entire thing, and in any case it required asking the user for permission, which makes it less interesting to show since it's clearly consensual; the point was showing the more "invisible" stuff anyway.
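For reference, this is roughly what the Geolocation API call we dropped looks like; the permission prompt is triggered by the call itself, which is exactly why it doesn't fit the "invisible" data angle:

// Sketch only: calling getCurrentPosition() makes the browser ask the user for permission.
navigator.geolocation.getCurrentPosition(
  (position) => {
    console.log("latitude:", position.coords.latitude);
    console.log("longitude:", position.coords.longitude);
  },
  (error) => {
    // The user declined, or the lookup failed.
    console.log("geolocation unavailable:", error.message);
  }
);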

All the network data (things we can learn based on the IP address) comes from the Whois API, and is parsed like so:

// Pull the fields we care about out of the raw Whois response (one long string),
// store them as key-value pairs, and show each one in its corresponding HTML element.
function parseWhoisData(whoisData) {
  let parsedData = {};

  let netNameMatch = whoisData.match(/NetName:\s*(.+)/);
  if (netNameMatch) {
    parsedData.netName = netNameMatch[1];
    document.getElementById("network-name").innerHTML = netNameMatch[1];
  }

  let orgNameMatch = whoisData.match(/OrgName:\s*(.+)/);
  if (orgNameMatch) {
    parsedData.orgName = orgNameMatch[1];
    document.getElementById("org-id").innerHTML = orgNameMatch[1];
  }

  let netRangeMatch = whoisData.match(/NetRange:\s*(.+)/);
  if (netRangeMatch) {
    parsedData.netRange = netRangeMatch[1];
    document.getElementById("network-range").innerHTML = netRangeMatch[1];
  }

  let cidrMatch = whoisData.match(/CIDR:\s*(.+)/);
  if (cidrMatch) {
    parsedData.cidr = cidrMatch[1];
    document.getElementById("cidr").innerHTML = cidrMatch[1];
  }

  let netTypeMatch = whoisData.match(/NetType:\s*(.+)/);
  if (netTypeMatch) {
    parsedData.netType = netTypeMatch[1];
    document.getElementById("network-type").innerHTML = netTypeMatch[1];
  }

  let orgMatch = whoisData.match(/Organization:\s*(.+)/);
  if (orgMatch) {
    parsedData.organization = orgMatch[1];
    document.getElementById("organization").innerHTML = orgMatch[1];
  }

  let regDateMatch = whoisData.match(/RegDate:\s*(.+)/);
  if (regDateMatch) {
    parsedData.regDate = regDateMatch[1];
    document.getElementById("reg-date").innerHTML = regDateMatch[1];
  }

  let updatedMatch = whoisData.match(/Updated:\s*(.+)/);
  if (updatedMatch) {
    parsedData.updated = updatedMatch[1];
    document.getElementById("updated").innerHTML = updatedMatch[1];
  }

  let orgIdMatch = whoisData.match(/OrgId:\s*(.+)/);
  if (orgIdMatch) {
    parsedData.orgId = orgIdMatch[1];
    document.getElementById("org-id").innerHTML = orgIdMatch[1];
  }

  let addressMatch = whoisData.match(/Address:\s*(.+)/);
  if (addressMatch) {
    parsedData.address = addressMatch[1];
    document.getElementById("address").innerHTML = addressMatch[1];
  }

  let cityMatch = whoisData.match(/City:\s*(.+)/);
  if (cityMatch) {
    parsedData.city = cityMatch[1];
    document.getElementById("city").innerHTML = cityMatch[1];
  }

  let stateProvMatch = whoisData.match(/StateProv:\s*(.+)/);
  if (stateProvMatch) {
    parsedData.state = stateProvMatch[1];
    document.getElementById("state").innerHTML = stateProvMatch[1];
  }

  let postalCodeMatch = whoisData.match(/PostalCode:\s*(.+)/);
  if (postalCodeMatch) {
    parsedData.postalCode = postalCodeMatch[1];
    document.getElementById("postal-code").innerHTML = postalCodeMatch[1];
  }

  let countryMatch = whoisData.match(/Country:\s*(.+)/);
  if (countryMatch) {
    parsedData.country = countryMatch[1];
    document.getElementById("country").innerHTML = countryMatch[1];
  }

  return parsedData;
}

A quick overview of the weird-looking match method. Take this line for example:

let countryMatch = whoisData.match(/Country:\s*(.+)/);

whoisData.match() uses a regular expression (/Country:\s*(.+)/) to search within the whoisData string.

/Country:\s*(.+)/ is the regex pattern where:

  • Country: looks for this exact text.
  • \s* matches zero or more whitespace characters following "Country:".
  • (.+) is a capturing group that matches one or more characters after the whitespace, capturing the actual country name. This is the part of the text right after "Country:" and the whitespace.
  • If a match is found, countryMatch will be an array where:
    • countryMatch[0] contains the entire matched string (like "Country: United States").
    • countryMatch[1] contains the first capturing group, which is the country name (like "United States").

The raw Whois data comes in the form of one long string, so we need to extract the information we're interested in and parse it into something usable. We then need to add each piece of data to the clientDataCollection object as key-value pairs. The code above shows how we look for the specific data points we're interested in and also set their values in the corresponding HTML elements to show on the page. The parsedData is then sent to another function that adds the new information to the clientDataCollection object.
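That other function isn't shown here, but since everything is plain key-value pairs it essentially just copies parsedData's entries into the client's object; something along these lines (the function name is illustrative):

// Hypothetical merge step: copy every parsed Whois field into the client's data object.
function addToClientData(parsedData) {
  Object.assign(clientDataCollection, parsedData);
}

// e.g. once the Whois response comes back:
// addToClientData(parseWhoisData(whoisData));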

Of course, once we have the complete clientDataCollection, we convert the object back into a string so it can be sent to ChatGPT along with instructions on what it should do with that information.
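As a rough sketch of that step (the prompt wording and model name here are placeholders, and we're assuming the request goes through our own server so the API key isn't exposed to the client), it boils down to a chat-completion request like this:

// Server-side sketch: send the collected data to the OpenAI chat completions endpoint.
// The prompt text and model name are placeholders, not necessarily what we used.
async function askChatGPT(clientDataCollection) {
  const report = JSON.stringify(clientDataCollection); // the object becomes a plain string

  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [
        {
          role: "user",
          content:
            "Here is everything we know about a visitor to our site: " +
            report +
            " Infer whatever you can about this person and write it as the conclusion of a surveillance report.",
        },
      ],
    }),
  });

  const data = await response.json();
  return data.choices[0].message.content; // the text shown as the report's "conclusion"
}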

It's important to note that the network information depends on the IP address assigned by the router, and for that reason isn't always useful. But (and it's a big but) when used in conjunction with cookies (which may be phased out soon) and other data points, it might be possible to recognize the same client even when they connect from different locations, which in turn could be used to infer even more about the user. And this is just the "simple" stuff anyone with a website can get...

An example of when network data isn't particularly useful; it becomes much more revealing whenever the user connects from an organization's network such as a workplace, a university, a library, etc.

Considerations for future enhancements

  1. I think the design can be better. That means the visual design, but mostly the conceptual design and the copy. The words "official report" might come off as a little cringy and trying too hard, and we could explore more effective ways to convey our message.
  2. The way we built the data structure is supposed to allow us to have multiple clients' data at the same time. But we aren't sure how to handle that conceptually. While it offers exciting new opportunities (such as showing people data about other people currently online), it doesn't work perfectly with the current goal.
  3. The ChatGPT instructions we are sending could be further improved as well. We're currently guiding it a little bit, but given more time I think we could give better constructed instructions that might yield more interesting results.
  4. Researching more ways to gather data about our users: possibly incorporating live webcam input (sentiment analysis, image analysis / recognition tools), as well as mouse movement. (Can we listen to people and analyze that?)
  5. Testing whether it's more effective to "disguise" ourselves as something else or to be straightforward, as we are now. This is slightly worrying because some parts of this work are not explicitly legal, though it's only being done within the scope of an ITP class.
  6. Room for more thoughts we'll definitely have after showing it to people!
