Core data lab methods

2/19/2024

Hashing is a core operation in most online databases, like a library catalogue or an e-commerce website. A hash function generates codes that directly determine the location where data would be stored. So, using these codes, it is easier to find and retrieve the data.

However, because traditional hash functions generate codes randomly, sometimes two pieces of data can be hashed with the same value. This causes collisions, in which searching for one item points a user to many pieces of data with the same hash value. It takes much longer to find the right one, resulting in slower searches and reduced performance.

Certain types of hash functions, known as perfect hash functions, are designed to place the data in a way that prevents collisions. But they are time-consuming to construct for each dataset and take more time to compute than traditional hash functions.

Since hashing is used in so many applications, from database indexing to data compression to cryptography, fast and efficient hash functions are critical. So, researchers from MIT and elsewhere set out to see if they could use machine learning to build better hash functions.

They found that, in certain situations, using learned models instead of traditional hash functions could result in half as many collisions. These learned models are created by running a machine-learning algorithm on a dataset to capture specific characteristics. The team’s experiments also showed that learned models were often more computationally efficient than perfect hash functions.

“What we found in this work is that in some situations we can come up with a better tradeoff between the computation of the hash function and the collisions we will face. In these situations, the computation time for the hash function can be increased a bit, but at the same time its collisions can be reduced very significantly,” says Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Their research, which will be presented at the 2023 International Conference on Very Large Databases, demonstrates how a hash function can be designed to significantly speed up searches in a huge database. For instance, their technique could accelerate computational systems that scientists use to store and analyze DNA, amino acid sequences, or other biological information.

Sabek is the co-lead author of the paper with Department of Electrical Engineering and Computer Science (EECS) graduate student Kapil Vaidya. They are joined by co-authors Dominik Horn, a graduate student at the Technical University of Munich; Andreas Kipf, an MIT postdoc; Michael Mitzenmacher, professor of computer science at the Harvard John A. Paulson School of Engineering and Applied Sciences; and senior author Tim Kraska, associate professor of EECS at MIT and co-director of the Data, Systems, and AI Lab.
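To make the ideas of hashing and collisions described in this post more concrete, here is a minimal Python sketch. It is not the researchers' method. It compares a traditional hash that scatters keys pseudo-randomly with a very crude "learned" hash that models where each key falls between the smallest and largest key in the dataset. The dataset is synthetic (evenly spaced IDs with small random gaps), and all names (`traditional_hash`, `make_learned_hash`, `count_collisions`) are hypothetical and chosen only for illustration.

```python
import hashlib
import random


def count_collisions(slots):
    """Count keys that land in a slot already occupied by an earlier key."""
    seen = set()
    collisions = 0
    for slot in slots:
        if slot in seen:
            collisions += 1
        seen.add(slot)
    return collisions


def traditional_hash(key, num_slots):
    """Pseudo-random placement: scramble the key bytes, then reduce modulo the table size."""
    digest = hashlib.blake2b(key.to_bytes(8, "big"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_slots


def make_learned_hash(keys, num_slots):
    """Fit a straight-line model of the key distribution (an assumed stand-in for a learned model)."""
    lo, hi = min(keys), max(keys)
    span = hi - lo + 1

    def learned_hash(key):
        # Estimated relative position of the key, scaled to a slot index.
        # Assumes the key lies within the range seen when the model was built.
        return (key - lo) * num_slots // span

    return learned_hash


# Synthetic, assumed dataset: roughly evenly spaced IDs with small random gaps.
rng = random.Random(0)
num_keys = 1000
keys = [1000 * i + rng.randrange(100) for i in range(num_keys)]
num_slots = num_keys  # one slot per key

learned_hash = make_learned_hash(keys, num_slots)
traditional_slots = [traditional_hash(k, num_slots) for k in keys]
learned_slots = [learned_hash(k) for k in keys]

traditional_collisions = count_collisions(traditional_slots)
learned_collisions = count_collisions(learned_slots)

print("traditional hash collisions:", traditional_collisions)
print("learned hash collisions:", learned_collisions)
```

On this kind of predictable data, the model-based hash spreads keys more evenly than the pseudo-random one, so it produces fewer collisions. On data whose distribution the model captures poorly, that advantage would not be expected to hold, which is in keeping with the post's point that the gains appear only in certain situations.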
