Internet Search and Set Operations Discussion

Student: I was doing an Internet search for information about soccer. I put the word soccer in a keyword search and got about 160,000 results. That's too many!


Mentor: What do you want to know about soccer?

Student: I want to find some information about college soccer teams. Oh, let me try soccer team as a phrase. I will use quotation marks to find only those Web pages where the two words appear next to each other. There are about 30,000 results for that!


Mentor: Let us try again. This time, we will try to get the pages that contain both the word college and the phrase soccer team. Do you know how to do it?

Student: Yes, it's not hard. I just have to put plus signs before each word or phrase. Oh, this is better. About 10,000 results.


Mentor: This kind of refined search can help you a lot! How many results do you predict for the search for "college"?

Student: A lot! A million, maybe? Let me try. Wow! 1,563,571 links. I guess the numbers can be different with different search engines, though.


Mentor: There is an interesting mathematical model for the searches you just did. Let us draw a picture with two ovals, one for college search results, another for soccer team results. If you'll excuse me, I won't draw all million and a half links for college and won't even try to keep the proportions between the two sets of links:

[Figure: two overlapping ovals, one for college results and one for soccer team results.]

Student: Oh, I see. When we do a search using plus signs, we only get documents that contain both college and soccer team.
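
As an aside, the effect of the plus signs can be imitated with a few lines of Python. This is only a rough sketch: the four pages and the helper name search_all are made up for illustration, and a real search engine matches words in far more sophisticated ways than the simple substring test used here.

```python
# Hypothetical miniature "Web": page number -> page text (made up for illustration).
pages = {
    1: "college admissions guide",
    2: "local soccer team schedule",
    3: "college soccer team roster",
    4: "soccer rules for beginners",
}

def search_all(pages, terms):
    """Return the numbers of the pages whose text contains every term."""
    return {number for number, text in pages.items()
            if all(term in text for term in terms)}

# Like searching with plus signs in front of both the word and the phrase.
print(sorted(search_all(pages, ["college", "soccer team"])))  # [3]

# Like searching for the phrase alone.
print(sorted(search_all(pages, ["soccer team"])))             # [2, 3]
```

Adding a required term can only keep the list of results the same or make it shorter, which matches what the Student saw.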

Mentor: In mathematics, this operation is called the intersection of sets. Do you see why?

Student: I can see it on the picture! It is harder to express it in words, though.

Mentor: That's why mathematicians are so fond of pictures. By the way, there is a special picture, or sign, for this operation. The signs in scientific language are often used to write (and read) faster. Let us use C for college, and ST for soccer team. Then the documents you found on your last search would contain C and ST, or using the special sign,

C ∩ ST

Student: So ∩ stands for and. Easy enough. But let us return to our search. 10,000 links is still too many.
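
For readers who like to experiment, the intersection of two sets can be tried out in Python, whose built-in set type has an intersection operator written &. The page numbers below are invented stand-ins for Web pages, not real search results.

```python
# Hypothetical toy data: each number stands for one Web page.
college = {1, 2, 3, 4, 5, 6}      # pages that contain the word college
soccer_team = {4, 5, 6, 7, 8}     # pages that contain the phrase soccer team

# Intersection: pages that belong to both sets at once.
both = college & soccer_team
print(sorted(both))  # [4, 5, 6]
```

Only the pages that appear in both sets survive, which is the overlapping region of the two ovals.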


Mentor: Let us force the search to be even more specific. Are you interested in college soccer teams from around the world, or not?

Student: I only want to check on the teams from the USA. So, I am going to refine the search even more. This time, I am looking for the documents that contain all of the words: college, soccer team, and USA. I am using plus signs again:

+college +"soccer team" +USA

Mentor: Can you draw a picture for your search, as we did before? Such pictures are called Venn diagrams.

Student: Sure. I will only use the first letters for the links. This time, I will have three...

Mentor: Sets.

Student: Right, sets. One set for each word or phrase I used. By the way, I got about 1,300 documents this time, because the search engine selected only those of the 10,000 documents with college and soccer team in them that also contained USA. There are a lot of documents that have the word USA, and a lot that have the word college, and a lot that have the phrase soccer team, but a much smaller number of documents contain all three!

Mentor: So here we have the intersection of three sets: the set of documents that have the phrase soccer team, the set of documents that contain the word USA, and the set of documents that contain the word college. We can write it using the symbol for intersection:

C ∩ ST ∩ USA
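
The narrowing effect of a third set can be checked with the same kind of toy example in Python. The page numbers are again made up for illustration; only the pattern matters, not the particular values.

```python
# Hypothetical toy data: each number stands for one Web page.
college = {1, 2, 3, 4, 5, 6}      # pages that contain the word college
soccer_team = {4, 5, 6, 7, 8}     # pages that contain the phrase soccer team
usa = {5, 6, 8, 9}                # pages that contain the word USA

# First keep the pages with both college and soccer team ...
two_terms = college & soccer_team
# ... then keep only those that also contain USA.
three_terms = two_terms & usa

print(sorted(two_terms))    # [4, 5, 6]
print(sorted(three_terms))  # [5, 6]
```

Every page in the three-set intersection is also in the two-set intersection, so the count can only go down or stay the same, just as 10,000 results shrank to about 1,300.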


Mentor: By the way, can you highlight on the diagram what happens if you search for college, soccer team and USA without using plus signs?

Student: I will see the documents that have at least one of these words. There should be a lot of documents that do! Here is the picture for that:

[Figure: three overlapping ovals for C, ST, and USA, with all three ovals highlighted.]

Mentor: This operation is called the union of sets. There is a special sign for that, of course:

C ∪ ST ∪ USA

means that we are talking about all the documents that contain the word college or the word USA or the phrase soccer team.
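
The union can be tried in Python as well, where the operator is written |. The page numbers are invented for illustration and do not come from a real search.

```python
# Hypothetical toy data: each number stands for one Web page.
college = {1, 2, 3, 4, 5, 6}      # pages that contain the word college
soccer_team = {4, 5, 6, 7, 8}     # pages that contain the phrase soccer team
usa = {5, 6, 8, 9}                # pages that contain the word USA

# Union: pages that belong to at least one of the three sets.
any_term = college | soccer_team | usa
# Intersection, for comparison: pages that belong to all three sets.
all_terms = college & soccer_team & usa

print(len(any_term))   # 9
print(len(all_terms))  # 2
```

A page that is in several sets is still counted only once in the union, so the union has 9 pages here rather than 6 + 5 + 4 = 15.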


Mentor: The last search option I would like to discuss is using the minus sign. Suppose you want to search for documents that contain soccer team but not college...

Student: If I want that, I will use the plus sign in front of soccer team and the minus sign in front of college:

+"soccer team" -college

Mentor: Can you draw a Venn diagram for that, highlighting the parts we will find?

Student: Sure:

[Figure: two overlapping ovals for ST and C, with the part of ST that lies outside C highlighted.]

Student: Now tell me, what is the special sign mathematicians use for this one?

Mentor: This operation is called the difference of sets. Here is the sign:

ST \ C

It reads: "The difference between the set of documents that have soccer team and the set of documents that have college."
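
The difference of sets also has a Python counterpart, written with an ordinary minus sign. As before, the page numbers are made-up stand-ins for Web pages.

```python
# Hypothetical toy data: each number stands for one Web page.
college = {1, 2, 3, 4, 5, 6}      # pages that contain the word college
soccer_team = {4, 5, 6, 7, 8}     # pages that contain the phrase soccer team

# Difference ST \ C: pages with soccer team that do not contain college.
soccer_not_college = soccer_team - college
# The order matters: C \ ST is a different set.
college_not_soccer = college - soccer_team

print(sorted(soccer_not_college))  # [7, 8]
print(sorted(college_not_soccer))  # [1, 2, 3]
```

Unlike intersection and union, the difference depends on which set comes first, which is why the plus and minus signs in the search cannot be swapped without changing the results.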