Clinical Trial TREC 2022 Data

This is a VERY preliminary analysis, based on basic processing of the data found here.


Year by Year Counts

Total number of clinical trials registered for each year


Trial Types

Interventional vs Observational


Leading Clinical Trial Sponsors: Top 100

This shows the top 100 clinical trial sponsors across all years, ranked by how much research they sponsor.


IDF Terms

Approximately 300k-plus terms were identified from the IDF analysis. For interpretation, low IDF scores, say a 1 or 4, mean the term is common and not good at distinguishing between documents or studies, while a higher IDF means the term is rare and can distinguish between docs. To display this information more efficiently, I created buckets to group IDF scores and then counted how many terms fall into each bucket. Low is a bad distinguisher. High is a good distinguisher.

IDF Bin     Term Counts
0-3         91
3-5         1,067
5-7         3,951
7-9         11,789
9-10        12,976
10-11       23,518
11-12       86,353
12+         221,524
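
The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the code used for this analysis: it assumes the unsmoothed natural-log definition idf = ln(N / df), assumes half-open buckets (a score of exactly 3 falls in 3-5), and uses a made-up toy corpus. The names compute_idf, bucket_label, and bucket_counts are hypothetical.

```python
import math
from collections import Counter

# Bucket edges copied from the IDF Bin table; the last bucket (12+) is open-ended.
BUCKET_EDGES = [0, 3, 5, 7, 9, 10, 11, 12]


def compute_idf(docs):
    """Return {term: ln(N / df)} for a list of tokenized documents.

    Assumption: plain natural-log IDF with no smoothing.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def bucket_label(score):
    """Map an IDF score to a bucket label such as '3-5' or '12+'."""
    if score < BUCKET_EDGES[0]:
        raise ValueError("IDF score below the lowest bucket edge")
    for low, high in zip(BUCKET_EDGES, BUCKET_EDGES[1:]):
        if low <= score < high:  # assumption: lower edge inclusive
            return f"{low}-{high}"
    return f"{BUCKET_EDGES[-1]}+"


def bucket_counts(idf):
    """Count how many terms fall into each IDF bucket."""
    return Counter(bucket_label(score) for score in idf.values())


# Toy corpus (hypothetical): three tiny tokenized "studies".
docs = [
    ["randomized", "controlled", "trial", "cefotaxime"],
    ["randomized", "trial", "depression"],
    ["randomized", "observational", "data"],
]
idf = compute_idf(docs)
counts = bucket_counts(idf)
```

With only three documents every score stays below ln(3), so all terms land in the 0-3 bucket. The higher buckets only fill up on a large corpus, where a term that appears in one or two documents out of hundreds of thousands scores above 12.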


IDF Table of Example Terms

Terms at the beginning of the table, in bucket 0-3, are the worst terms and are not good at distinguishing between documents or studies. The terms at the end of the table, in buckets 10-11, 11-12, and 12+, are the best terms, those that are good for distinguishing between documents or studies. The purpose of this table is just to provide examples.

IDF Bucket  Term Example
0-3         population
0-3         data
0-3         randomized
0-3         controlled
0-3         cell
3-5         behaviors
3-5         pd
3-5         routine
3-5         depression
3-5         physical
5-7         indicate
5-7         vital
5-7         ministry
5-7         march
5-7         planning
7-9         aminotransferase
7-9         spanishspeaking
7-9         rejecting
7-9         mca
7-9         lifespan
9-10        naliri
9-10        10session
9-10        cefotaxime
9-10        embodiment
9-10        diseasecopd
10-11       hypovolaemia
10-11       invehicle
10-11       selfweighing
10-11       sf12v2
10-11       doublearmed
11-12       asm8
11-12       sequestrant
11-12       postemergent
11-12       musculoaponeurotic
11-12       souza
12+         erhebungseinheit
12+         mannwhitneyutests
12+         000165
12+         immunoglobuli
12+         lichenien

Visit the GitHub repo.
