Community Guidelines

Currently we have three guideline documents for the Open Humans community: Public Data Guidelines, Naming Guidelines, and Activity Guidelines.

Public Data Guidelines

Some Open Humans members generously make their data public under the Creative Commons Zero Public Domain Dedication (v1.0 or later). The following usage guidelines are based on goodwill. These are not a legal contract, but we request you follow these guidelines when using data from our project.

Don't re-identify

If an individual has not intentionally shared their name/identity in their Open Humans profile, you should not make specific efforts to re-identify that individual from this data unless you meet the following guidelines:

  • You inform Open Humans of your plans to perform this work.
  • An ethics review board (e.g. an institutional review board) reviews and approves your study as human subjects research (i.e. it is not considered exempt).
  • We privately receive any reports you plan to publish at least two weeks prior to publication and have an opportunity to contact you and your ethics board with concerns.

Acknowledge and cite where possible

  • Please acknowledge and cite the researchers who created the datasets shared by participants.
  • Do not misrepresent Open Humans data as something you generated yourself.
  • Please include a reference to these guidelines in any re-distribution of our data.

Share knowledge

  • Use standard identifiers where possible to assist others interested in finding work related to a sample or data set, and to enable participants to track research derived from their contributions.
  • We encourage open access publication when possible.

Use at your own risk

  • Our data is offered "as is", without warranty of any kind.
  • Our database will change, as participants may choose to withdraw and/or remove data.
  • Any use you make of the data must conform to all applicable laws and regulations in a given jurisdiction.

If you have suggestions for changes to these guidelines, please feel free to contact us.

Naming Guidelines

Members of Open Humans must choose names and usernames that meet the following guidelines. These names are public and can uniquely identify their account.

Real names are allowed, but not required

You may use your real name as your name or username (or both). You are not required to reveal your real name, but remember that your account's data and information might still be highly identifiable.

No impersonation

You may not use a name that implies you are a specific, identifiable person, unless that is your real name. If you are using your real name and you share it with a well-known person to whom you are unrelated, you should state clearly on your userpage that you are unrelated to that person.

Not disruptive or offensive

Examples of disruptive or offensive names or usernames are names that contain or imply profanity or personal attacks.

Not promotional

Your name or username should not promote an organization or product.

Not otherwise misleading or confusing

Your name or username should not be otherwise misleading or confusing. For example, you may not have a username that leads people to believe the account has permissions it does not have (e.g. "administrator" or "moderator").

We're also flattered by the enthusiasm demonstrated by usernames similar to the name of our organization (e.g. "OpenHuman1"), but we believe these could be misleading, so they are not allowed.

Activity Guidelines

Open Humans has the following practices that it expects connected studies and other activities to follow.


Data management

  • Explain the data you'll receive

    Give a plain English list of the data your activity will access and store. Describe the potential sensitivity and identifiability of this data. Give these lists to your participants or users, and (if you are a study) to your IRB or equivalent ethics board.

    For example, instead of saying:

    "We will access and store your Hypothetical Diet Tracker App data."

    Say:

    "We will access and store the following data from Hypothetical Diet Tracker App: your ZIP code, food diary, and weight log data."

  • Explain what you will do with the data you'll receive

    Give a plain summary that explains what you will do with the data you will access. Describe the kind of study or activity you are running and why you would like to access the data.

    For example, say:

    "We will use this data to explore whether there is a correlation between a person’s location and their diet/weight."

  • Explain your data privacy and security

    You are responsible for how your activity manages data.

    Give a plain English description of how you will manage the data. Explain whether that data is identified or potentially identifiable, its sensitivity, and other privacy and security issues that may be relevant. Share this with your participants or users, and (if you are a study) with your IRB or equivalent ethics board.

    For example:

    "Raw data will be managed privately, accessed only by study staff and other authorized individuals. To further enhance privacy and security, this private data will not have your name associated with it, although it is possible someone could identify you from the data."

  • Explain what happens with the data after a user leaves your activity

    Users can leave your activity on Open Humans at any time. Explain what you will do with their data after this happens. (For example: will you delete your copies of their data?)

  • Be aware of existing de-identification standards

    You should be aware of what types of data are considered "identifiable" when you're deciding which data to collect and how to manage it. Although you may have access to data without explicit personal identifiers, that data can still be highly identifiable or become identifiable if combined with other data elements. Explain how this could happen with the data you are requesting from the individual users.

    "De-identification" refers to processing personal data to make it very difficult or impossible to re-identify an individual. Open Humans does NOT de-identify data. The best-known standard for data de-identification is HIPAA's safe harbor guidelines.

  • Don't ask for more data than you need

    When you're requesting data and information, be considerate. Don't needlessly increase the identifiability and/or sensitivity of the data you'll be collecting.

    For example, avoid unnecessary granularity that makes data more identifiable. If someone's year of birth is sufficient for your activity, don't ask for the month and day.

  • Share data with activity members

    Open Humans supports the philosophy of "equal access": when generating data about individuals, we should try to give them access to that data. For example, we would like to support a study that wishes to give its participants access to the resulting raw genome data.

    Activities can use our APIs to upload data for their activity members. The data you upload will be private in each member's account, where they will be able to manage it as an additional data source.

  • Organize data according to type

    If you have data types that are very different, consider sharing them as separate units (e.g. "sequencing data" and "survey data") to facilitate your participants' downstream management of that data.
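
To illustrate the "Don't ask for more data than you need" practice above, here is a minimal Python sketch of keeping only the fields an activity needs and coarsening a date of birth to a year. The record layout, the field names (`birth_date`, `weight_log`), and the `minimize_record` helper are hypothetical, chosen only for illustration; they are not part of any Open Humans API.

```python
from datetime import date


def minimize_record(record: dict) -> dict:
    """Return a reduced copy of a (hypothetical) participant record.

    Only the fields this example activity needs are kept, and the full
    date of birth is coarsened to the year alone.
    """
    return {
        "birth_year": record["birth_date"].year,  # drop month and day
        "weight_log": list(record["weight_log"]),  # copy, do not alias
    }


# Hypothetical input: contains more detail than the activity needs.
raw = {
    "name": "Example Person",
    "email": "person@example.org",
    "birth_date": date(1980, 5, 17),
    "weight_log": [70.2, 69.8],
}

reduced = minimize_record(raw)
print(reduced)
```

The reduced record carries no name, email address, or full date of birth, so analysis files built from it are less identifiable than the raw input. This does not de-identify the data; it only avoids adding identifiability that the activity does not need.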


Security

  • Minimize the use of personal data

    It's trivial to identify whose data set it is if a name or email address is included. Avoid collecting or maintaining this information, if possible. When you do collect such information, try to minimize its use (e.g. don't include it in data analysis files).

  • Use HTTPS

    Use HTTPS (HTTP over TLS, the successor to SSL) to encrypt transmissions of data to and from your website. This is required to protect user information, tokens, passwords, and other sensitive data in transit.

    If you are running your own website, audit its SSL/TLS configuration for weak encryption algorithms and for support of perfect forward secrecy, using a tool like Qualys SSL Server Test.

  • Keep secrets secret

    Your activity will have secret keys, codes, and tokens that are used to authenticate identity and encrypt interactions. These MUST be kept secret (e.g. as local files or environment variables). You should use encrypted communications to share these with other administrators.

    If a secret is accidentally leaked, e.g. in a code commit (even to a private repository!), make sure its removal is complete (e.g. using git filter-branch) and, more importantly, make sure the secret is invalidated and a new key or token is generated.

  • Monitor and limit admin access

    Have a policy for who has administrative access to servers and data. Revoke access when it is no longer needed.

  • Use standard software and services

    Security is hard to implement correctly, and standard web frameworks and packages exist that implement common security features for you (e.g. password hashing). Standard services, like platform cloud hosting, can also help by implementing and updating standard security tools (e.g. SSL).

  • Stay up to date

    Security practices constantly need updating. Be sure to update operating systems and software packages to stay up to date with the latest security updates.

  • Hash passwords

    If you're using passwords for account management, there is no need to store them. Use a well-established, salted cryptographic hash function designed for passwords (e.g. bcrypt) to verify passwords without storing them. This minimizes the damage a database breach could cause.

  • Back up your database

    Perform regular backups, and regularly test the data restoration process. It's easy to think you are performing backups correctly, until it's too late.
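
As an illustration of the "Hash passwords" practice above, here is a minimal Python sketch of verifying a password without storing it. It is an assumption-laden example, not a recommended implementation: to stay within the standard library it uses PBKDF2-HMAC-SHA256 in place of the bcrypt mentioned above, and the iteration count and the helper names `hash_password` and `verify_password` are illustrative choices. In practice, the password handling built into a standard web framework is usually the better option.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for a password; store these, not the password."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare it."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))
print(verify_password("wrong guess", salt, digest))
```

Only the salt and the digest would be written to the database. Because each password gets its own random salt, two users with the same password end up with different stored digests.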