Currently we have three guideline documents for the Open Humans community:
Some Open Humans members generously make their data public under the Creative Commons Zero Public Domain Dedication (v1.0 or later). The following usage guidelines are based on goodwill. These are not a legal contract, but we request you follow these guidelines when using data from our project.
If an individual has not intentionally shared their name/identity in their Open Humans profile, you should not make specific efforts to re-identify that individual from this data unless you meet the following guidelines:
If you have suggestions for changes to these guidelines, please feel free to contact us.
Members of Open Humans must choose names and usernames that meet the following guidelines. These names are public and can uniquely identify their account.
You may use your real name as your name or username (or both). You are not required to reveal your real name, but remember that your account's data and information might still be highly identifiable.
You may not use a name that implies you are a specific, identifiable person, unless that is your real name. If you are using your real name and share it with a well-known person to whom you are unrelated, state clearly on your userpage that you are unrelated to that person.
Examples of disruptive or offensive names or usernames are names that contain or imply profanity or personal attacks.
Your name or username should not promote an organization or product.
Your name or username should not be otherwise misleading or confusing. For example, you may not have a username that leads people to believe the account has permissions it does not have (e.g. "administrator" or "moderator").
We're also flattered by the enthusiasm demonstrated by usernames similar to our own organization's name (e.g. "OpenHuman1"), but we believe these could be misleading, so they are not allowed.
Open Humans has the following practices that it expects connected studies and other activities to follow.
Explain the data you'll receive
Give a plain English list of the data your activity will access and store. Describe the potential sensitivity and identifiability of this data. Give these lists to your participants or users, and (if you are a study) to your IRB or equivalent ethics board.
For example, instead of saying:
"We will access and store your Hypothetical Diet Tracker App data."
Say:
"We will access and store the following data from Hypothetical Diet Tracker App: your ZIP code, food diary, and weight log data."
Explain what you will do with the data you'll receive
Give a plain summary that explains what you will do with the data you will access. Describe the kind of study or activity you are running and why you would like to access the data.
For example, say:
"We will use this data to explore whether there is a correlation between a person’s location and their diet/weight."
Explain your data privacy and security
You are responsible for how your activity manages data.
Give a plain English description of how you will manage the data. Explain whether that data is identified or potentially identifiable, its sensitivity, and other privacy and security issues that may be relevant. Share this with your participants or users, and (if you are a study) with your IRB or equivalent ethics board.
For example:
Explain what happens with the data after a user leaves your activity
Users can leave your activity on Open Humans at any time. Explain what you will do with their data after this happens. (For example: will you delete your copies of their data?)
Be aware of existing de-identification standards
You should be aware of what types of data are considered "identifiable" when you're deciding which data to collect and how to manage it. Although you may have access to data without explicit personal identifiers, that data can still be highly identifiable or become identifiable if combined with other data elements. Explain how this could happen with the data you are requesting from the individual users.
"De-identification" refers to processing personal data to make it very difficult or impossible to re-identify an individual. Open Humans does NOT de-identify data. The best-known standard for data de-identification is the HIPAA Safe Harbor method.
Don't ask for more data than you need
When you're requesting data and information, be considerate. Don't needlessly increase the identifiability and/or sensitivity of the data you'll be collecting.
For example, avoid unnecessary granularity that makes data more identifiable. If someone's year of birth is sufficient for your activity, don't ask for the month and day.
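The granularity point above can be sketched in code. Ideally you would request only the year in the first place; this minimal sketch assumes a data source that only supplies a full birth date, and coarsens it before anything is stored. The record layout and the field names (`participant_id`, `birth_date`, `birth_year`, `weight_kg`) are hypothetical.

```python
from datetime import date


def coarsen_birth_date(birth_date: date) -> int:
    """Keep only the year of birth; the month and day are discarded."""
    return birth_date.year


def minimize_record(record: dict) -> dict:
    """Return a copy of a record with the full birth date replaced by the year."""
    # Copy every field except the full birth date (hypothetical field name).
    reduced = {key: value for key, value in record.items() if key != "birth_date"}
    reduced["birth_year"] = coarsen_birth_date(record["birth_date"])
    return reduced


# Hypothetical participant record, for illustration only.
record = {"participant_id": "p-001", "birth_date": date(1985, 7, 14), "weight_kg": 70.5}
minimized = minimize_record(record)
```

Only `minimized` would then be written to storage, so the month and day never reach your database.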
Share data with activity members
Open Humans supports the philosophy of "equal access": when generating data about individuals, we should try to give them access to that data. For example, we would like to support a study that wishes to give its participants access to the resulting raw genome data.
Activities can use our APIs to upload data for their activity members. The data you upload will be private in each member's account, where they will be able to manage it as an additional data source.
Organize data according to type
If you have data types that are very different, consider sharing them as separate units (e.g. "sequencing data" and "survey data") to facilitate your participants' downstream management of that data.
Minimize the use of personal data
It's trivial to identify a data set if someone's name or email is included. Avoid collecting or maintaining this information, if possible. When you do collect such information, try to minimize its use (e.g. don't include it in data analysis files).
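One way to keep names and emails out of analysis files is to drop those columns when the file is produced. The following is a minimal sketch, assuming CSV data and hypothetical column names (`name`, `email`); it is an illustration, not a complete de-identification procedure.

```python
import csv
import io

# Hypothetical names of columns that directly identify a person.
DIRECT_IDENTIFIERS = {"name", "email"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the rows without the direct-identifier columns."""
    return [
        {key: value for key, value in row.items() if key not in identifiers}
        for row in rows
    ]


# Made-up input data, for illustration only.
raw_csv = (
    "participant_id,name,email,weight_kg\n"
    "p-001,Jane Doe,jane@example.org,70.5\n"
)
rows = list(csv.DictReader(io.StringIO(raw_csv)))
analysis_rows = strip_identifiers(rows)

# Write the reduced rows to the analysis file (an in-memory buffer here).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(analysis_rows[0].keys()))
writer.writeheader()
writer.writerows(analysis_rows)
analysis_csv = out.getvalue()
```

Removing these columns does not make the remaining data anonymous; other fields may still be identifiable in combination.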
Use HTTPS
Use HTTPS (HTTP over TLS, the successor to SSL) to encrypt transmissions of data to and from your website. This is required to protect user information, tokens, passwords, and other sensitive data in transit.
If you are running your own website, audit your SSL/TLS configuration for weak encryption algorithms and for perfect forward secrecy support, using a tool like Qualys SSL Server Test.
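The same care applies when your own code acts as a client, for example when it calls an API over HTTPS. As a minimal sketch using Python's standard `ssl` module, the context below keeps certificate and hostname verification on and refuses protocol versions older than TLS 1.2. The TLS 1.2 floor is an assumption made for this illustration, not a stated Open Humans requirement.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context with verification enabled."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Assumed floor for this sketch: reject anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

The context can then be passed to a standard-library call such as `urllib.request.urlopen(url, context=context)`.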
Keep secrets secret
Your activity will have secret keys, codes, and tokens that are used to authenticate identity and encrypt interactions. These MUST be kept secret (e.g. stored as local files or environment variables). You should use encrypted communications to share these with other administrators.
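As a minimal sketch of the environment-variable approach, the helper below reads a secret at startup and fails loudly if it is missing, so the value never needs to appear in source code. The function name `load_secret` and the variable name `MY_ACTIVITY_CLIENT_SECRET` are hypothetical.

```python
import os


def load_secret(name: str) -> str:
    """Read a secret from the environment, failing clearly if it is not set."""
    value = os.environ.get(name)
    if not value:
        # Stop early rather than run with a missing or empty secret.
        raise RuntimeError(f"Missing required secret: set the {name} environment variable")
    return value


# Example call (hypothetical variable name), typically made once at startup:
# client_secret = load_secret("MY_ACTIVITY_CLIENT_SECRET")
```

Avoid printing or logging the returned value, since logs are often shared more widely than the secret should be.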
If a secret is accidentally leaked, e.g. in a code commit (even to a private repository!), remove it completely from the history (e.g. using git filter-branch). More importantly, invalidate the leaked secret and generate a new key or token.
Monitor and limit admin access
Have a policy for who has administrative access to servers and data. Revoke access when it is no longer needed.
Use standard software and services
Security features are hard to implement correctly, and standard web frameworks and packages implement them for you (e.g. password hashing). Standard services, like platform cloud hosting, can also help by implementing and updating standard security tools (e.g. SSL).
Stay up to date
Security practices need constant updating. Keep operating systems and software packages current so that you receive the latest security fixes.
Hash passwords
If you're using passwords for account management, there is no need to store them. Use a well-established salted cryptographic hash function (e.g. bcrypt) to verify passwords without storing them, to minimize the damage a database breach could cause.
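The idea can be sketched with Python's standard library alone. Note the substitution: bcrypt is a third-party package, so this sketch uses PBKDF2-HMAC-SHA256 from `hashlib` instead, and the iteration count is an assumed value you would tune for your own hardware. In practice, the password hashing built into a standard web framework is usually the better choice.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```

Only the salt and the digest are saved, so a database breach does not directly reveal any password.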
Back up your database
Perform regular backups, and regularly test the data restoration process. It's easy to think you are performing backups correctly, until it's too late.
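Testing a restore can be automated. The following is a minimal sketch that assumes an SQLite database (these guidelines do not name a database system) and a hypothetical `participants` table. It copies the database with the standard `sqlite3` backup API, then opens the copy and checks that it is intact.

```python
import os
import sqlite3
import tempfile


def backup_database(source: sqlite3.Connection, backup_path: str) -> None:
    """Copy the whole database to backup_path using SQLite's backup API."""
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # The table name must come from trusted code, never from user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def verify_backup(source: sqlite3.Connection, backup_path: str, table: str) -> bool:
    """Open the backup and check both its integrity and its row count."""
    restored = sqlite3.connect(backup_path)
    try:
        intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        return intact and row_count(restored, table) == row_count(source, table)
    finally:
        restored.close()


# Hypothetical example database, for illustration only.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE participants (id INTEGER PRIMARY KEY, birth_year INTEGER)")
source.executemany("INSERT INTO participants (birth_year) VALUES (?)", [(1985,), (1990,)])
source.commit()

backup_path = os.path.join(tempfile.mkdtemp(), "backup.sqlite3")
backup_database(source, backup_path)
restore_ok = verify_backup(source, backup_path, "participants")
```

A row count is only a basic check; a fuller restore test would also load the backup into a separate environment and run the application against it.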