Tutorial for Beginners


Before using the DIR framework, you may want to go through this tutorial to get familiar with it.

What You Should Know

[Figure: Main Page]
Health care datasets are critical for analyzing and improving health care outcomes and effectiveness. However, these datasets are scattered across many websites and are usually difficult to understand from a quick look at their official sites. They are often described in complex documents, and most of them require an agreement or a purchase before use. This is especially challenging for students and newcomers to the health data analytics field. The purpose of this resource is to bring together useful knowledge about health care datasets, with a specific focus on helping students and new researchers in the early phases of finding and understanding available datasets and learning to analyze them.

In this first phase of development, we have included a few large datasets, both free and proprietary. Most of them require users to sign data use agreements.


This version of the DIR resource provides several useful features: asking sample questions with a single click (recommended) and searching with semantic queries. In addition, you can get more information by browsing the dataset pages and by following the use cases, with code, in our blogs.


Sample Questions (recommended)

[Figure: Main Page]
[Figure: Sample Questions Page]
[Figure: A Sample Question--What does a dataset talk about]
[Figure: A Sample Question--What datasets I can apply the method to]
A set of sample questions with pre-set queries has been created for non-technical users. With these questions, users do not need to write any queries themselves, so this is currently the easiest and most recommended way to search in DIR. To try a sample question, go to the main page or to the sample questions page, which has the full list.

The set of sample questions is limited; if you need more, please feel free to contact us.

Semantic Search

[Figure: Main Page]
[Figure: Semantic Search Page]
Semantic search, supported by SPARQL-like queries, is much more complex than a keyword search. However, if you are familiar with SPARQL or SQL, you will be able to get started quickly and take advantage of this more expressive search. For a tutorial on semantic search, please refer to the Semantic MediaWiki website.
Try the following query in Semantic Search:
[[Category:Summary Level]][[-Dct:isVersionOf::<q>[[Category:Version Level]][[Subject number::>=100000]]</q>]]
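
One reading of the query above: it asks for summary level pages that are the target of a Dct:isVersionOf link from at least one version level page meeting the Subject number condition. In Semantic MediaWiki, the leading "-" marks an inverse property and the <q>...</q> pair wraps a subquery. For readers who prefer scripts to the search form, the sketch below builds a request URL for Semantic MediaWiki's `ask` API module. It is only a sketch under assumptions: the endpoint `https://example.org/w/api.php` is a placeholder, and whether the DIR wiki exposes its API publicly is not confirmed here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the api.php address of the wiki you query.
API_ENDPOINT = "https://example.org/w/api.php"

# The query from this tutorial, copied unchanged.
QUERY = (
    "[[Category:Summary Level]]"
    "[[-Dct:isVersionOf::<q>[[Category:Version Level]]"
    "[[Subject number::>=100000]]</q>]]"
)


def build_ask_url(query, endpoint=API_ENDPOINT):
    """Return a URL that sends `query` to the Semantic MediaWiki ask module."""
    params = {"action": "ask", "query": query, "format": "json"}
    # urlencode percent-encodes the brackets, colons and angle brackets.
    return endpoint + "?" + urlencode(params)


url = build_ask_url(QUERY)
print(url)
```

The script only constructs the URL and makes no network call, so it can be run safely before you know the real endpoint.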

Dataset Pages

[Figure: A Summary Level Page--MIMIC]
Information and knowledge about datasets are represented in a structured way. For each dataset, there are three levels of information (pages): (1) summary level, (2) version level, and (3) distribution level. Summary level pages contain the most general information, while distribution level pages contain the most detailed information about a specific version.

A summary level page (e.g., MIMIC) gives you a basic description, dataset website links, publications, methods, etc. On this page, the Version linking block links to a version level page, which shows more details about the current version of the dataset (e.g., MIMIC-III v1.3).

[Figure: A Version Level Page--MIMIC-III v1.3]
A version level page (e.g., MIMIC-III v1.3) gives you version-specific information, such as a description, the date of issue and a link to the landing page. It also links to older versions of the same dataset (e.g., MIMIC-II v2.6 in the Version linking block) and to the distribution level, in case you decide to get started with a specific form (e.g., the database distribution, MIMIC-III v1.3 db, and the csv distribution, MIMIC-III v1.3 csv, both in the Distribution description block).

[Figure: A Distribution Level Page--MIMIC-III v1.3 database]
A distribution level page (e.g., MIMIC-III v1.3 db) describes a specific form of a specific version (e.g., MIMIC-III v1.3) of a dataset (e.g., MIMIC). You can identify the file format (e.g., database format) as well as the version on this page. In addition, you may be interested in the direct links for getting access to a distribution (in the Documentations block) and for downloading it (in the File directory block).
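
The three page levels described above form a simple tree: a summary has versions, and a version has distributions. The sketch below models that tree with the MIMIC examples from this tutorial. The class and field names (`Summary`, `Version`, `Distribution`, `file_format`) are illustrative assumptions, not the property names DIR actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """Most detailed level: one concrete form of one version."""
    name: str
    file_format: str


@dataclass
class Version:
    """Middle level: version-specific information."""
    name: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Summary:
    """Most general level: one page per dataset."""
    name: str
    versions: List[Version] = field(default_factory=list)


# The example pages named in this tutorial.
mimic = Summary(
    name="MIMIC",
    versions=[
        Version(
            name="MIMIC-III v1.3",
            distributions=[
                Distribution("MIMIC-III v1.3 db", "database"),
                Distribution("MIMIC-III v1.3 csv", "csv"),
            ],
        ),
    ],
)


def distribution_paths(summary):
    """List the click path from the summary page to each distribution page."""
    return [
        f"{summary.name} > {version.name} > {dist.name}"
        for version in summary.versions
        for dist in version.distributions
    ]


for path in distribution_paths(mimic):
    print(path)
```

Each printed path mirrors the navigation on the wiki: start at the summary level page, follow the Version linking block, then choose a distribution.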


[Figure: A Summary Level Page--HCUP]
[Figure: A Blog of HCUP]
To help users get started with a dataset, we have also included several blogs with instructions, code and usage examples that anyone can follow. Blogs can be found both on the main page and in the Blog block of summary level pages (e.g., see the Blog block in HCUP).

For More Information

If this tutorial does not answer your questions, please feel free to contact us (see Support).